Table of Contents

I. OVERVIEW

II. INTRODUCTION
1. Features
2. Conventions
3. Reading This Manual

III. OPERATING PROCEDURE
1. Writing Grammar Description Files for PCYACC
2. Generating the Object-Oriented Parsers
3. Writing Scanner Description Files
4. Generating the Object-Oriented Lexers
5. Integration of All Source Files

IV. PCLEX
1. C Code Structure Generated by PCLEX
2. Code Generated by PCLEX in C++
3. Structure of Generated C++ Code
4. Synopsis for ABXLex Class
a. Description
b. Example
c. Public Constructor and Destructor
d. Public Member Functions

V. PCYACC
1. C++ Code Generated with PCYACC C++ Skeleton
2. Generating C++ Code by Using PCYTOOL
3. Synopsis for ABXYacc Class
a. Description
b. Example
c. Public Constructor and Destructor
d. Public Member Functions

VI. SYMBOL TABLE
1. Introduction
2. Synopsis for ABXSymbolTable Class
a. Description
b. Symbol Table Entry Definition
c. Private Class Member
d. Public Constructor and Destructor
e. Public Member Functions

VII. ERROR HANDLER
1. Introduction
a. Error Reporting
b. Error Recovery
2. Functions for Error Reporting
3. Functions for Error Recovery
4. Synopsis for ABXError Class
a. Description
b. ABXError Class Definition
c. Public Constructor and Destructor
d. Public Member Functions

VIII. PARSE TREE NODE
1. Analyze Parse Tree Node Class ABXParseTreeNode
2. Structure for ABXParseTreeNode Class
3. Structure for ABXLeaf Class
4. Expression Classes ABXExprNode
5. Structure for Parse Tree Class ABXParseTree

IX. Java Parser and Lexer
1. Introduction
2. Java Class Library
a. JavaLex Class
b. JavaYacc Class
c. JavaError Class
d. JavaParseTree Class
(i). JavaParseTreeNode Class
(ii). JavaLeaf Class
(iii). JavaLeafList Class
(iv). JavaExprNode Class
(v). JavaExprNodeList Class
(vi). JavaParseTree Class
e. JavaSymbolTable Class
3. Example

X. Delphi Parser and Lexer
1. Introduction
2. Delphi Unit Library
a. DelphiLex Unit
b. DelphiYacc Unit
3. Example

XI. VBScript Parser and Lexer
1. Introduction
2. Structure of VBScript Parser and Lexer
a. VBScript Lex Modules
b. VBScript Yacc Modules
c. VBScript Error Report Modules
3. Example

XII. Pascal Parser and Lexer
1. Introduction
2. Pascal Library
a. Pascal Lexer
b. Pascal Parser

XIII. Basic Parser and Lexer
1. Introduction
2. Structure of VBasic Parser and Lexer
a. VBasic Lex Modules
b. VBasic Yacc Modules
c. VBasic Error Report Modules

XIV DESIGN REQUIREMENT FOR YACC
1. Objective
2. Scope
3. Command Line Options

XV DESIGN REQUIREMENT FOR LEX
1. Objective
2. Scope
3. Command Line Options

APPENDIX I. HOW TO CREATE C PARSER AND LEXER
1. Command Line Format for PCYACC
2. Command Line Options for PCYACC
3. Command Line Format for PCLEX
4. Command Line Options for PCLEX
5. Default Skeleton File

APPENDIX II. HOW TO CREATE C++ PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX III. HOW TO CREATE JAVA PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX IV. HOW TO CREATE DELPHI PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX V. HOW TO CREATE PASCAL PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX VI. HOW TO CREATE VISUAL BASIC SCRIPT PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX VII. HOW TO CREATE BASIC PARSER AND LEXER
1. Command Line Format for PCYTOOL
2. Command Line Options for PCYTOOL
3. Command Line Format for PCLTOOL
4. Command Line Options for PCLTOOL
5. Default Skeleton File

APPENDIX VIII. ERROR MESSAGES FOR PCYTOOL
Error Code: Error Message and Explanation

APPENDIX IX. ERROR MESSAGES FOR PCLTOOL
Error Code: Error Message and Explanation

APPENDIX X. BIBLIOGRAPHY


 


I. OVERVIEW

PCLEX and PCYACC are widely used software tools for developing compilers. Currently, almost all versions of PCYACC generate output code in C, and so miss the well-known advantages of object-oriented programming, such as data abstraction, encapsulation, and inheritance. The PCYACC OBJECT ORIENTED TOOLKIT Library offers object-oriented versions of lexical analyzers, syntax parsers, and error handling facilities. Symbol table management is a commonly used technique in compiler construction, so the library also provides parse tree and symbol table classes as compiler construction tools. The PCYACC OO TOOLKIT Library provides five basic classes:

1). Lexical Analyzer Class: This class serves as a code skeleton for PCLEX.

2). Syntax Parser Class: This class supports syntactic parser PCYACC.

3). Symbol Table Class: This class is used for symbol table management.

4). Error Handling Class: This class is responsible for error reporting.

5). Parse Tree Class: This class is available when the user wants to construct parse trees.


PCYACC OO TOOLKIT Library takes advantage of data abstraction, encapsulation and inheritance in object oriented software design. The structure of our PCYACC OO TOOLKIT Library is shown below:

A more detailed explanation of the PCYACC OO TOOLKIT Library is given in sections IV through VIII, from both theoretical and practical points of view.


II. INTRODUCTION

1. Features

PCLEX and PCYACC have proven for decades to be very efficient and elegant tools for constructing compilers and interpreters. When they were first created, C was the only target language. With the emergence of object-oriented programming languages, especially C++, the programming world quickly switched gears to take advantage of the new programming method. To incorporate C++ features into lexical analyzers and syntactic parsers, C++ classes can be inserted into the user-defined part of the processing. The resulting lexical analyzers and syntactic parsers are in C++ and take advantage of object-oriented programming features. Here we provide a package called the PCYACC OO TOOLKIT Library, which provides five classes:

            • ABXLex (a class for lexical analyzer)

            • ABXYacc (a class for syntactic parser)

            • ABXSymbolTable (a class for symbol table management)

            • ABXError (a class for error reporting and error recovery)

            • ABXParseTree (a class for parse tree)

However, because of the special nature of PCYACC and PCLEX, the classes that users work with in their C++ code are generated from special source files: scanner description files and grammar description files. It is therefore impossible to define one general class for all the objects in the same category, as other class libraries usually do. The reasons are explained in detail later.

2. Conventions

All Abraxas Software class names start with the letters “ABX”. All function names start with a lowercase letter, followed by uppercase letters and underscores. For example, the YACC class is named ABXYacc and the parser function is yyParse(). To make names easy to remember and understand, abbreviations are not used.

3. Reading This Manual

This manual is intended to serve as a tutorial for the PCYACC Object Oriented Toolkit Library. It describes all the classes and member functions in the library and explains how to use the PCYTOOL and PCLTOOL utility programs to create parsers and lexers in different languages. The discussion of the classes is generally more detailed than actual use requires, because the manual is meant to familiarize the user not only with the usage of the PCYACC OO Toolkit Library but also with its internal design, in support of later upgrades.


III. OPERATING PROCEDURE

Abraxas Software provides the PCYACC and PCLEX tools to create C parsers and lexers. With the appearance of C++ and object-oriented programming, the industry trend is to upgrade products from non-object-oriented designs to object-oriented designs. To take advantage of the data encapsulation, inheritance, and other features that C++ provides, Abraxas Software supplies C++ PCYACC classes with which the user can create C++ lexers and parsers. The following sections describe in more detail how to use the PCYACC OO TOOLKIT Library to create C++ versions of a lexer and parser.

1. Writing Grammar Description Files for PCYACC

If you only need a single parser in your main program there is no change in the requirements for the grammar description file.

If you plan to generate multiple parsers and use them in the same main program, write one grammar description file for each parser. Do not include the main function in any of the grammar description files; instead, put it in a separate .cpp file. For lexers to support multiple parsers, the tokens defined in the grammar files must be different for each parser. Otherwise tokens will be redefined, which makes it impossible to support multiple parsers. Referencing external variables or functions defined in a parser or lexer description file from a user function is also discouraged.

2. Generating the Object-Oriented Parsers

There are two methods for creating a C++ parser:

            • Use a class skeleton file with -p on the PCYACC command line.

            • Use the utility program PCYTOOL to translate a C parser into a C++ parser with the default class skeleton file.

In the first approach, simply make use of the -p command line option of PCYACC (assuming DOS PCYACC) to include the skeleton C++ code “pcy_sk.cpp”. Everything else is the same as the procedure for generating C code. Users can insert their own skeleton class definition files in place of "pcy_sk.cpp".


The file translation is illustrated as follows, assuming the grammar description filename is “myparser.y”,

The default skeleton file provided is "pcy_sk.cpp". If you would like to use your own class skeleton file, simply follow the -p option with your own skeleton filename. Everything else is the same as the usual procedure for using the PCYACC tool.

The second approach provided by Abraxas Software separates the creation of a C++ parser into distinct steps, which makes it easier to understand how the utility program PCYTOOL works with the C parser. In addition, the new -k option can be used to create parsers in other languages, such as Java, Borland Delphi, Basic, and Pascal. The -k options are described in detail in later sections. The default is -k1, which creates a C++ parser.

The second approach creates a C++ parser in two phases. In the first phase, given a grammar description file, simply invoke the PCYACC tool on the command line to create a C parser. In the second phase, the utility program PCYTOOL generates a C++ parser from the C parser produced by PCYACC. PCYTOOL uses the default class skeleton file “pcy_sk.cpp”, which is actually the parser class declaration file. The corresponding class header file is “pcy_sk.hpp”.


If the grammar description file is named “myparser.y”, the following diagram shows how the second approach creates the C++ parser.

The second approach to creating a C++ parser may look inconvenient for the user. However, separating the generation of the C parser from that of the C++ parser makes the process of generating a C++ parser easier to understand. It also makes it convenient to produce parsers in other languages: the -k option tells PCYTOOL in which language to generate the parser.

3. Writing Scanner Description Files

If you only need a single lexer in your main program there is no change in the requirements for the scanner description file. Include the corresponding header file (e.g., y1.h) for the tokens expected by the supported parser (e.g., y1.y).

If you plan to generate multiple lexers and use them in the same main program, create one scanner description file for each lexer. Each lexer file should include the token definition header file of the parser it supports. A lexer can support only one set of tokens. No lexer externals should be referenced in the user's main function.

4. Generating the Object-Oriented Lexers

There are two methods for creating a C++ lexer:

            • Use a class skeleton file with -p on the PCLEX command line.

            • Use the utility program PCLTOOL to translate a C lexer into a C++ lexer with a default class skeleton file.

In the first approach, simply use the -p command line option of PCLEX (assuming DOS PCLEX) to include the skeleton C++ code “pcl_sk.cpp”. Everything else is the same as when generating C code.

The file translation is illustrated below, assuming the scanner description filename is “mylexer.l”,

The default skeleton file provided is "pcl_sk.cpp". If you would like to use your own class skeleton file, simply follow the -p option with your skeleton filename. Everything else is the same as the usual procedure for using the PCLEX tool.

The second approach provided by Abraxas Software separates the creation of a C++ lexer into distinct steps, which makes it easier to understand how the utility program PCLTOOL works with the C lexer. In addition, the new -k option can be used to create lexers in other languages, such as Java, Borland Delphi, Basic, and Pascal. The -k option is described in detail in later sections. The default is -k1, which creates a C++ lexer.

The second approach creates a C++ lexer in two phases. In the first phase, given a scanner description file, simply invoke the PCLEX tool on the command line to create the C lexer. In the second phase, the utility program PCLTOOL generates a C++ lexer from the C lexer produced by PCLEX. PCLTOOL uses the default class skeleton file “pcl_sk.cpp”, which is actually the lexer class declaration file. The corresponding class header file is “pcl_sk.hpp”.

If the scanner description file is named “mylexer.l”, the following diagram shows how the second approach creates a C++ lexer.

The second approach to creating a C++ lexer may look inconvenient for the user. However, separating the generation of the C lexer from that of the C++ lexer makes the process of generating a C++ lexer easier to understand. It also makes it convenient to produce lexers in other languages: the -k option tells PCLTOOL in which language to generate the lexer.

5. Integration of All Source Files

All the source files generated in step 2 and step 4 plus the user's main function file form the complete set of source code for generating a parser application.


IV. PCLEX

PCLEX™ is a program generator for writing lexical analyzers. A lexical analyzer reads a stream of characters and separates it into the symbols of a target language, also known as tokens. PCLEX translates a lexical analyzer description (a lex (.l) file) written in the Scanner Description Language (SDL) into C. SDL is a special high-level language oriented toward string matching. Scanner descriptions can be extended with code sections written in C or C++ to accommodate the different needs of different languages. SDL lets software developers concentrate on what the scanner recognizes rather than on the details of how, which can reduce the work needed to bring a project to completion. Almost all LEX implementations use C as the host language. To exploit the advantages of an object-oriented programming language like C++, we provide a lexical scanner class as a skeleton that can be inserted into the output file by specifying the -p option on the PCLEX command line. When PCLEX is run on the SDL file (the lex (.l) file) with the -p option, the generated code is in C++.

1. C Code Structure Generated by PCLEX

The C code of a lexical analyzer generated by PCLEX has the following layout:

1). Macro definitions: This part defines some macros that are actions of the lexical analyzer. It defines symbolic names and gives users a chance to redefine them to meet their special requirements.

2). Code segment copied directly from the declaration section of lexer file: This part contains the macros defined by user, the declarations of variables, functions and types to be used in embedded actions. It varies with different lexical analyzers and it is optional.

3). Data tables: This part consists of the data tables for driving the Deterministic Finite Automaton (DFA) simulator. They are different for different lexical analyzers.

4). Global variables: This part consists of a variable representing the input stream buffer and pointers that indicate the status of input being scanned. They should be almost the same for different lexical analyzers.

5). Auxiliary functions: This part defines the functions that are only called by lexer function yylex().

6). Function yylex(): This part defines the function yylex().

7). Code segment copied directly from the function section of lexer file: This part contains the possible function definitions by the user. It is also optional.

2. Code Generated by PCLEX in C++

There are two approaches to obtaining a C++ lexical analyzer: 1). Use the -p option of PCLEX, which creates a C++ version of the lexical analyzer from the pcl_sk.cpp skeleton file. 2). Modify the C lexer generated by PCLEX to convert it to a C++ lexer.

The first approach is quick and its operation is more familiar to users. The second approach allows new features to be added to PCLEX. For example, we can create a utility program, PCLTOOL, that converts the C lexer code generated by PCLEX into C++ lexer code with the default option -k1. Lexers in other languages can then be generated through the -k option.

The ideal situation would be a general class ABXLex that fits all lexical analyzers, just as all sets of objects are instances of the same class SET. However, there are major obstacles to reaching that goal. A lexical analyzer is a simulator of a DFA, and it depends on data tables to drive its transitions. Each specific lexical analyzer has its own data tables; in other words, the data tables vary from analyzer to analyzer. Their values are determined after PCLEX has scanned the lex (.l) file. They are referenced by function yylex(), and they are always read-only. If there were one general class for all lexical analyzers, every instance of that class would have the same copies of these tables. But the tables must differ from one lexical analyzer to another, because their initial values are decided by running PCLEX. This is one reason why each lexical analyzer needs its own class. Another reason is the embedded user actions, which also vary with different lexical analyzers. These two reasons make it almost impossible to have a general class for all lexical analyzers.

Besides the data tables, the other important parts of a lexical analyzer are the input buffer and the pointers indicating the scanning position. To separate the input stream into tokens we need to give both the value representing the token and the token itself. The value representing the token will be returned by function yylex(). The token will be a variable of type YYSTYPE.
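As a rough illustration of this split between the value representing the token and the token itself, the sketch below uses a toy scanner function that returns a token code and stores the token's value separately. The YYSTYPE members, the token codes NUMBER and IDENT, and the function nextToken are assumptions made for this example only; they are not the code PCLEX generates.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical token codes; a real parser header (e.g., y1.h) defines its own.
enum { END_OF_INPUT = 0, NUMBER = 257, IDENT = 258 };

// Hypothetical value type; the real YYSTYPE comes from the grammar file.
union YYSTYPE {
    int  intValue;   // used for NUMBER
    char name[32];   // used for IDENT
};

// Toy scanner: returns the token code and stores the token value in lval.
// cursor is advanced past the characters that were consumed.
int nextToken(const char*& cursor, YYSTYPE& lval) {
    assert(cursor != nullptr);
    while (*cursor == ' ') ++cursor;   // skip blanks
    if (*cursor == '\0') return END_OF_INPUT;
    if (std::isdigit(static_cast<unsigned char>(*cursor))) {
        lval.intValue = 0;
        while (std::isdigit(static_cast<unsigned char>(*cursor)))
            lval.intValue = lval.intValue * 10 + (*cursor++ - '0');
        return NUMBER;
    }
    if (std::isalpha(static_cast<unsigned char>(*cursor))) {
        std::size_t n = 0;
        while (std::isalnum(static_cast<unsigned char>(*cursor))) {
            if (n + 1 < sizeof lval.name) lval.name[n++] = *cursor;  // truncate long names
            ++cursor;
        }
        lval.name[n] = '\0';
        return IDENT;
    }
    // Any other character serves as its own token code.
    return static_cast<unsigned char>(*cursor++);
}
```

A caller would invoke nextToken repeatedly until it returns END_OF_INPUT, reading lval only for token codes that carry a value.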

3. Structure of Generated C++ Code

Like the C code generated by PCLEX, the C++ code has a definite structure. Some variables and functions that are global in the C code are moved into the class and become member variables and member functions in the C++ code. The C++ lexer has the structure listed below, assuming that the class definition resides in the default header file, pcl_sk.hpp.

    #include "pcl_sk.hpp"
    Code from section 1 of .l file
    Data tables
    Definition of the lexical analyzer class' public member functions

Header file pcl_sk.hpp contains the definition of the default lexical analyzer class ABXLex. All these data tables are read-only for the lexical analyzer. This makes it possible to have all instances of that class share the same copies of these large tables without each needing to have a copy.
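One common C++ way to let all instances share a single read-only table is a static const data member. The sketch below shows that idea with a hypothetical class DemoLex and a made-up 3-by-2 transition table; it is an assumption about how such sharing can be expressed, not the actual declaration found in pcl_sk.hpp.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated lexer class; the real ABXLex
// declaration lives in pcl_sk.hpp and has many more members.
class DemoLex {
public:
    // Looks up the next DFA state; the table is never written to.
    int nextState(int state, int symbolClass) const {
        assert(state >= 0 && state < kStates);
        assert(symbolClass >= 0 && symbolClass < kClasses);
        return transition[state][symbolClass];
    }

    // Exposes the table address so that sharing can be observed.
    const void* tableAddress() const { return transition; }

private:
    static const int kStates = 3;
    static const int kClasses = 2;
    // One copy for the whole class, not one copy per object.
    static const int transition[kStates][kClasses];
};

// Made-up table contents, standing in for the tables a generator would emit.
const int DemoLex::transition[DemoLex::kStates][DemoLex::kClasses] = {
    {1, 0},
    {1, 2},
    {2, 2},
};
```

Two DemoLex objects report the same tableAddress(), so creating many lexer objects does not multiply the storage used by the table.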


The structure of class ABXLex is shown below:

Private part:
    Declare variables YYlval and YYval
    Declare data tables
    Declare YY_JAM and YY_JAM_BASE variables used for lexer
    Define macros in C lexer that will be used for C++ lexer
    Declare input streams
    Declare buffer and buffer pointers
    Declare a variable yyLineNo counting the line number of input
    Declare a variable yyText containing the content of lexer matching pattern

YYlval and YYval are internal variables of the ABXLex class. They represent the yylval and yyval variables of a C lexer, respectively.

Data tables are declared as read-only arrays. These arrays are const for the lifetime of a particular ABXLex object, but not for the ABXLex class as a whole.

YY_JAM and YY_JAM_BASE are private members of the ABXLex class that are used by the lexing function. In a C lexer, the PCLEX tool defines these two as macros.

Macros defined in a C lexer are put into the ABXLex class as private data members. Placing the C lexer macros in class scope follows the principle of data encapsulation in object-oriented programming.

Four types of input streams are provided in the ABXLex class: input from stdin, from a file, from a character string, or from an istream.
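A minimal sketch of how one class can accept these four kinds of input is shown below, using one constructor per kind. The class name InputSource, its constructor signatures, and the get() member are hypothetical and chosen only for illustration; the actual ABXLex constructors are described in the ABXLex synopsis.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: one constructor per kind of input.
class InputSource {
public:
    // Input from stdin.
    InputSource() : file_(stdin), stream_(nullptr) {}

    // Input from an already opened file; the caller keeps ownership.
    explicit InputSource(std::FILE* file) : file_(file), stream_(nullptr) {
        assert(file != nullptr);
    }

    // Input from a character string (copied into an internal stream).
    explicit InputSource(const std::string& text)
        : file_(nullptr), text_(text), stream_(&text_) {}

    // Input from an istream supplied by the caller.
    explicit InputSource(std::istream& in) : file_(nullptr), stream_(&in) {}

    // stream_ may point at text_, so copying would leave it dangling.
    InputSource(const InputSource&) = delete;
    InputSource& operator=(const InputSource&) = delete;

    // Returns the next character, or EOF when the input is exhausted.
    int get() {
        if (file_ != nullptr) return std::fgetc(file_);
        return stream_->get();
    }

private:
    std::FILE* file_;           // set for stdin and file input
    std::istringstream text_;   // holds the copy for string input
    std::istream* stream_;      // set for string and istream input
};
```

Whichever constructor is used, the scanning code reads characters through get() alone and does not need to know where they come from.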

The buffer is the memory segment where the input stream is held for scanning. It can only be accessed by the lexer function yyLex. The size of the buffer is set by a constant, which is defined by a macro. To change the size of the buffer, redefine the macro to a new constant. The pointer indicates the position of the character that has been scanned most recently.

The line number yyLineNo is used to keep track of the actual input line number the scanner has processed.

Variable yyText contains the content of the lexer matching pattern. The corresponding variable in a C lexer is yytext.

Public part:
    Declare constructor function
    Declare destructor function
    Declare function get_yyLineNo()
    Declare function get_yyText()
    Declare function get_yyBufferPtrC()
    Declare function input()
    Declare function unput()
    Declare function set_YYSTYPEInstance()
    Declare function yyCheck()
    Declare function yyInit()
    Declare function yyLex()
    Declare function yyPeer()
    Declare function yySetBuffer()
    Declare function yySetInput()
    Declare function yySearch()
Protected part:
    Declare virtual function yyWrap()

Currently, ABXLex objects support four types of input stream, and each type has its own corresponding constructor. All the private data members of the ABXLex class are initialized inside the constructor.

The destructor function simply frees all the memory that is allocated in the constructor.

Function get_yyLineNo returns the current line number in the input stream. This information is especially useful when reporting an error message.
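As an illustration of error reporting with the line number, the sketch below builds a message from any lexer-like object. It assumes that get_yyLineNo takes no arguments and returns an integer, which this manual does not state explicitly. The helper formatError and the stand-in FakeLexer are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds "line N: message" from any object that exposes get_yyLineNo().
// Assumption: get_yyLineNo() takes no arguments and returns an integer.
template <typename Lexer>
std::string formatError(Lexer& lexer, const std::string& message) {
    std::ostringstream out;
    out << "line " << lexer.get_yyLineNo() << ": " << message;
    return out.str();
}

// Hypothetical stand-in so the sketch compiles without a generated lexer.
struct FakeLexer {
    int line;
    int get_yyLineNo() const { return line; }
};
```

Because formatError is a template, the same helper could be tried with a generated lexer object, provided the assumed signature holds.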

Function get_yyText returns the lex token buffer, which contains the token currently being processed by the lexer.

Function get_yyBufferPtrC returns the current index into the token buffer being processed by the lexer.

Function input gets the next character from the input, filling the input buffer of the lexer.

Function unput puts a character back in the logical input stream.
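The interplay between input and unput can be sketched with a small pushback stack. The class SketchInput, its string-based source, and the use of -1 as the end-of-input value are assumptions made for illustration. They do not describe the actual ABXLex implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the lexer's character source.
class SketchInput {
public:
    explicit SketchInput(const std::string& text) : text_(text), pos_(0) {}

    // Next character, or -1 at end of input.
    // Characters that were pushed back are delivered first.
    int input() {
        if (!pushed_.empty()) {
            int c = pushed_.back();
            pushed_.pop_back();
            return c;
        }
        if (pos_ >= text_.size()) return -1;
        return static_cast<unsigned char>(text_[pos_++]);
    }

    // Put a character back into the logical input stream.
    void unput(int c) {
        assert(c >= 0);  // the end-of-input marker cannot be pushed back
        pushed_.push_back(c);
    }

private:
    std::string text_;
    std::string::size_type pos_;
    std::vector<int> pushed_;
};
```

A character returned by unput is the next one delivered by input, so a scanner can look one character ahead and then back out.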

Function set_YYSTYPEInstance sets the two instances of YYSTYPE that are shared between the ABXLex and ABXYacc instances.

Function yyCheck outputs the content of the lexeme currently being examined by the lexer. It maintains its own line number count, which it increments whenever a newline character '\n' is encountered. It should also be able to manage the output formatting.
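The line counting described for yyCheck can be sketched as follows. The class SketchChecker and its member names are hypothetical, and starting the count at line 1 is an assumption. The sketch only shows a lexeme being echoed while a private line count is incremented for each newline.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in that mimics the behaviour described for yyCheck.
class SketchChecker {
public:
    SketchChecker() : lineNo_(1) {}  // assumption: counting starts at line 1

    // Echo the lexeme to `out` and count every '\n' it contains.
    void check(const std::string& lexeme, std::ostream& out) {
        for (std::string::size_type i = 0; i < lexeme.size(); ++i) {
            if (lexeme[i] == '\n') ++lineNo_;
        }
        out << lexeme;
    }

    // The checker's own line count, independent of the lexer's.
    int lineNo() const { return lineNo_; }

private:
    int lineNo_;
};
```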

Function yyInit resets the position of the buffer pointers and clears the buffer. This function is useful when the lexer needs to switch the input from one file to another.

Function yyLex is the lexer itself. It returns a token from the input stream to the parser whenever the parser needs one.

Function yyPeer allows users to get the next n characters from the input stream. The number of characters that will be fetched is specified by the first parameter of this function. The fetched characters will be stored in a string that is specified by the second parameter of the function.
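A sketch of this kind of lookahead is given below. The class SketchPeeker is hypothetical, and so are two choices made here that this manual does not confirm: the fetched characters are left unconsumed, and the destination is a std::string rather than a character array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a lexer with an n-character lookahead.
class SketchPeeker {
public:
    explicit SketchPeeker(const std::string& text) : text_(text), pos_(0) {}

    // Copy up to n upcoming characters into `out` without consuming them.
    // Returns the number of characters actually copied.
    std::size_t peer(std::size_t n, std::string& out) const {
        assert(pos_ <= text_.size());
        out = text_.substr(pos_, n);  // substr stops at the end of the text
        return out.size();
    }

    // Consume one character so that later lookahead starts further along.
    void advance() {
        if (pos_ < text_.size()) ++pos_;
    }

private:
    std::string text_;
    std::string::size_type pos_;
};
```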

Function yySetBuffer allows the user to reset the input buffer size according to project requirements. The buffer can be shrunk or enlarged. Calling this member function is more convenient than changing the size by hand.

Function yySetInput enables a lexer to switch input from one file to another, even in the middle of processing a file. Doing this by hand would require the user to back up and restore the buffer and function; yySetInput does this work instead.
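The backup and restoration involved in switching input can be sketched with a stack of saved states. The class SketchSwitcher, its use of strings in place of files, and the automatic return to the saved input when the nested one ends are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer that can switch input mid-stream.
class SketchSwitcher {
public:
    explicit SketchSwitcher(const std::string& text) {
        cur_.text = text;
        cur_.pos = 0;
    }

    // Back up the current source and start reading `text` instead.
    void setInput(const std::string& text) {
        saved_.push_back(cur_);
        cur_.text = text;
        cur_.pos = 0;
    }

    // Next character, or -1 when every source is exhausted.
    // When a nested source ends, the saved one is restored.
    int next() {
        while (cur_.pos >= cur_.text.size()) {
            if (saved_.empty()) return -1;
            cur_ = saved_.back();
            saved_.pop_back();
        }
        return static_cast<unsigned char>(cur_.text[cur_.pos++]);
    }

private:
    struct State {
        std::string text;
        std::string::size_type pos;
    };
    State cur_;
    std::vector<State> saved_;
};
```

Saving the read position together with the text is what lets scanning resume in the middle of the first source.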

Function yySearch is provided by the user. It searches the token list for the current token from the lexical analyzer.
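Since yySearch is written by the user, its exact form depends on the project. One possible shape is a linear lookup in a keyword table, sketched below. The function searchToken, the table kTokenList, and the token codes are all hypothetical; real token codes would come from the yacc header file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One entry of the token list: the token text and its code.
struct TokenEntry {
    const char* text;
    int code;
};

// Illustrative token codes, made up for this sketch.
enum { TOK_IDENT = 257, TOK_IF = 258, TOK_WHILE = 259 };

static const TokenEntry kTokenList[] = {
    { "if", TOK_IF },
    { "while", TOK_WHILE }
};

// Return the code for `lexeme` if it is in the list, else TOK_IDENT.
int searchToken(const char* lexeme) {
    assert(lexeme != 0);
    const std::size_t n = sizeof(kTokenList) / sizeof(kTokenList[0]);
    for (std::size_t i = 0; i < n; ++i) {
        if (std::strcmp(kTokenList[i].text, lexeme) == 0) {
            return kTokenList[i].code;
        }
    }
    return TOK_IDENT;
}
```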

Function yyWrap always returns 1, which indicates that there is no more input and the program is done.
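Because yyWrap is declared as a protected virtual function, a derived class could presumably replace the default behaviour. The sketch below uses hypothetical classes SketchLexBase and MultiFileLex rather than ABXLex itself. It follows the usual lex convention that a return value of 0 means more input is available, which this manual does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class that mirrors the default yyWrap behaviour.
class SketchLexBase {
public:
    virtual ~SketchLexBase() {}

    // Called at end of input: true means scanning is finished.
    bool finishedAtEof() { return yyWrap() == 1; }

protected:
    virtual int yyWrap() { return 1; }  // default: no more input
};

// Hypothetical derived lexer that continues with queued file names.
class MultiFileLex : public SketchLexBase {
public:
    explicit MultiFileLex(const std::vector<std::string>& pending)
        : pending_(pending) {}

    std::size_t remaining() const { return pending_.size(); }

protected:
    virtual int yyWrap() {
        if (pending_.empty()) return 1;  // nothing left: done
        pending_.pop_back();             // a real lexer would open this file
        return 0;                        // lex convention: more input follows
    }

private:
    std::vector<std::string> pending_;
};
```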

A second approach to creating a C++ lexer is to use PCLTOOL with the default -k1 option on. You must first create a C lexer with PCLEX. PCLTOOL then hooks the C lexer and inserts the default C++ lexer skeleton into it, generating a C++ lexer in the way described above. The following command lines show how to create a C++ lexer; make sure that the C++ lexer skeleton files (pcl_sk.cpp and pcl_sk.hpp) reside in the current working directory.

PCLTOOL -k1 yacc.h lex.l lex.c

Or

PCLTOOL -k1 lex.c

yacc.h is a yacc header file that contains the token and YYSTYPE union definitions. lex.l is a scanner description file. lex.c is the lexical analyzer generated by PCLEX.

As a result, users get the source code file lex.cpp. To declare an instance of class ABXLex, include the default header file pcl_sk.hpp in the source code so that the class is defined.

The structure of the generated C++ code in file lex.cpp is as follows:

001: #include "pcl_sk.hpp"

002: Code copied from section 1 of file lex.l (if there is any)

003: YY_JAM and YY_JAM_BASE constants

004: Data tables

005: ABXLex::ABXLex(FILE *, FILE *) { }

006: ABXLex::ABXLex(istream *, FILE *) { }

007: ABXLex::ABXLex(char *, FILE *) { }

008: A