Table of Contents
PREFACE.............................................................................................................................................................. 1
I. OVERVIEW................................................................................................................................................... 2
1. Typographical Conventions....................................................................................................................................... 2
2. Manual Organization................................................................................................................................................. 3
II. OVERVIEW OF PCYACC........................................................................................................................................ 5
1. History.......................................................................................................................................................................... 5
2. What Does PCYACC Do ?........................................................................................................................................ 6
III. COMMAND LINE AND OPTIONS....................................................................................................................... 8
1. Command Line Format............................................................................................................................................... 8
2. Command Line Options............................................................................................................................................. 8
IV. BASICS OF PROGRAMMING LANGUAGES................................................................................................. 16
1. What Is A Computer Programming Language?................................................................................................ 16
2. What Are Compilers................................................................................................................................................ 17
V. GETTING STARTED -- A SMALL EXAMPLE................................................................................................... 19
1. Grammar Description File for SACALC............................................................................................................. 19
2. Building the Executable........................................................................................................................................... 24
3. Sample Session of SACALC at Work.................................................................................................................. 25
VI. BASIC CONCEPTS REVISITED........................................................................................................................ 26
1. General Concepts..................................................................................................................................................... 26
2. BNF -- A Language for Writing CFG's................................................................................................................ 28
3. PCYACC Terminology’s -- A Short Review........................................................................................................ 30
VII. INTOPOST -- A SECOND EXAMPLE............................................................................................................. 32
1. Problem Statement................................................................................................................................................... 33
2. Defining the infixel Language............................................................................................................................... 35
3. Writing PCYACC Grammar Description For infixel....................................................................................... 36
4. Converting Grammar Descriptions Into C Programs....................................................................................... 42
5. Building the Executable........................................................................................................................................... 43
VIII. PRINCIPLES BEHIND PCYACC..................................................................................................................... 44
1. Introduction to Formal Language Theories......................................................................................................... 44
2. Context-free Grammars.......................................................................................................................................... 45
3. Context-free Languages.......................................................................................................................................... 49
4. Parse Trees............................................................................................................................................................... 49
5. Canonical Derivation and Canonical Reduction................................................................................................. 51
6. Top-down and Bottom-up Parsing.......................................................................................................................... 52
7. Ambiguities of Context-free Grammars............................................................................................................... 53
8. LR Parsers................................................................................................................................................................ 54
9. PCYACC -- From LALR Grammars to LALR Parsers..................................................................................... 56
IX. WRITING PCYACC GRAMMAR DESCRIPTIONS....................................................................................... 57
1. Structure of Grammar Description Programs................................................................................................... 57
2. The Declaration Section.......................................................................................................................................... 58
3. The Grammar Rule Section.................................................................................................................................... 60
4. The Program Section............................................................................................................................................... 60
5. Associating Actions with Grammar Rules.......................................................................................................... 60
X. MORE ON PCYACC PROGRAMMING............................................................................................................. 64
1. Mandatory Supporting Functions.......................................................................................................................... 64
2. The Role of the Drive Routine................................................................................................................................ 65
3. The Role of the Lexical Analyzer.......................................................................................................................... 66
4. The Role of PCYACC Generated Token Definitions......................................................................................... 69
5. The Role of the Error Processing Routine.......................................................................................................... 69
6. Data Types of Grammar Symbols.......................................................................................................................... 69
7. Defining YYSTYPE.................................................................................................................................................. 70
8. Associating Types with Grammar Symbols......................................................................................................... 71
9. Manipulating Values of Grammar Symbols........................................................................................................ 72
10. Ambiguity Resolution Mechanisms.................................................................................................................... 73
11. Resolving Shift/Reduce Conflicts....................................................................................................................... 76
12. Resolving Reduce/Reduce Conflicts................................................................................................................... 78
13. Resolving Ambiguities -- A Summary................................................................................................................ 80
14. Error Recovery Utilities........................................................................................................................................ 81
15. Imbedded Actions.................................................................................................................................................... 84
XI. DEBUGGING -- TOOLS AND TECHNIQUES................................................................................................... 85
1. Correcting Syntax Errors....................................................................................................................................... 85
2. Correct Symbol Usage Errors................................................................................................................................ 87
3. Correcting Grammar Rule Errors........................................................................................................................ 89
XII. CONSTRUCTING COMPILERS -- REVISITED............................................................................................. 98
1. Basic Architecture................................................................................................................................................... 98
2. Lexical Analysis..................................................................................................................................................... 100
3. Syntax Analysis...................................................................................................................................................... 102
4. Semantic Analysis................................................................................................................................................. 103
5. Code Generation..................................................................................................................................................... 107
6. Symbol Table Management................................................................................................................................... 109
7. Error Diagnostics.................................................................................................................................................. 113
XIII. YAEC -- YET ANOTHER EXAMPLE COMPILER...................................................................................... 115
1. Global Definition Head File.................................................................................................................................. 116
2. Lexical Token Definition Header File................................................................................................................ 117
3. Lexical Analysis Module....................................................................................................................................... 118
4. Syntactical Analysis Module................................................................................................................................ 120
5. Semantical Checking............................................................................................................................................ 122
6. Main Routine and Error Handling Module........................................................................................................ 125
7. Execution Module................................................................................................................................................... 126
XIV. UNIQUE FEATURES OF PCYACC................................................................................................................ 129
1. Quick Syntax Check Option................................................................................................................................ 129
2. Generating Parse Trees Using PCYACC......................................................................................................... 129
4. Lexical Analysis Caveats - Combining Lex & Yacc......................................................................................... 132
XV. ERROR PROCESSING WITH PCYACC...................................................................................................... 134
1. Error Reporting...................................................................................................................................................... 134
1.1. Integration with a Lexical Scanner................................................................................................................. 135
1.2. YYERROR Calling Conventions...................................................................................................................... 135
2. Error Handling....................................................................................................................................................... 136
2.1. Simple Recovery.................................................................................................................................................. 137
2.2. Improved Recovery.............................................................................................................................................. 137
2.3. Doing Your Own Parser Recovery.................................................................................................................. 139
2.4. The ANSI C Parser............................................................................................................................................ 140
3. Error Recovery....................................................................................................................................................... 140
3.1. Error Recovery in IMPROVED.Y.................................................................................................................... 141
3.2. Error Recovery in PIC....................................................................................................................................... 141
4. Building Parsers.................................................................................................................................................... 142
4.1. TOKENS program.............................................................................................................................................. 143
4.2. Command Line Format....................................................................................................................................... 143
4.3. Command Line Options..................................................................................................................................... 143
4.4. Using Command Line Options.......................................................................................................................... 144
5. Wrapup..................................................................................................................................................................... 144
XVI. USING PCYACC WITH C++ AND MICROSOFT WINDOWS............................................................... 146
XVII. PCYACC AND PCLEX RECURSION......................................................................................................... 147
XVIII. PCYACC CROSS REFERENCING WITH - YACC TOOL..................................................................... 149
1. Create GRAMMAR FOREST from YTOOL.EXE............................................................................................ 150
XIX. Invoking pcYaccDeBugger - YDB.................................................................................................................. 151
APPENDIX I. INSTALLING PCYACC................................................................................................................... 153
1. System Requirements........................................................................................................................................... 153
2. Unpack and Backup the PCYACC PROGRAM Disk...................................................................................... 153
3. Installing PCYACC............................................................................................................................................... 155
APPENDIX II. ERROR MESSAGES................................................................................. 156
APPENDIX III. ANNOTATED BIBLIOGRAPHY................................................................................................. 160
1. GENERAL REFERENCES..................................................................................................................................... 160
2. Error Recovery References.................................................................................................................................. 163
APPENDIX IV. GLOSSARIES................................................................................................................................. 164
i. General Glossaries................................................................................................................................................ 164
ii. PCYACC Specific Glossaries............................................................................................................................ 166
iii. PCYACC INPUT SYNTAX SUMMARY.......................................................................................................... 169
The ABRAXAS PCYACC is a parser generator. A Parser Generator is a program that produces syntactical analyzers automatically for language translators from high level descriptions of the languages. Like any other program, PCYACC accepts inputs and produces outputs. The inputs to PCYACC are syntax specifications of the programming languages to be developed. Syntax specifications are written using a special purpose language which is referred to by PCYACC as the grammar description language (GDL). The outputs from PCYACC are C implementations of corresponding language recognizers.
PCYACC is a powerful and practical programming tool for software developers. It can dramatically reduce the amount of work in programming language development projects. As a result, it can reduce the production cycle time and hence the cost of software projects.
This manual describes ABRAXAS PCYACC. In addition to the most important theme of providing a reference resource to this software product, the manual includes material that explains the basic principles of the program. The contents are useful to readers with a wide range of expertise, from professional software developers to novice programmers.
PCYACC is largely independent of machine hardware characteristics. It will run on most micro computer systems, and can easily be adapted to others. Therefore, our presentation will also maintain system independence, wherever possible. When we need to refer to a particular system configuration in order to make discussions concrete, we use a MSDOS command shell.
In the appendix, we will provide some tips on system dependent characteristics of PCYACC, such as installations on different machines, and how to take advantage of other application tools such as intelligent editors.
Throughout this manual, we will use the following typefaces and syntax conventions to add clarity:
1). Words or phrases that have important technical meaning or that have special interpretations in PCYACC will be underlined when they appear the first time in the manual. (Appropriate definitions and/or explanations are also given at this time, using normal text style.)
Example: PCYACC programs are written in the grammar description language, or GDL, which is a BNF like language suited to syntactic specifications of programming languages.
2). When illustrating user-machine interactions, we will print the text to be typed by the user in italics. Messages or prompts from the machine will be indicated with a double underline. File names and operating system commands will be shown in uppercase (except when case is sensitive). The enter key will be represented as <ENTER>.
Example: PCYACC SACALC.Y <ENTER>
3). Displays, such as diagrams or code listings, will be printed in mono spaced font. They will be indented to help you distinguish them from surrounding text.
Example: The following is a header file for one of the examples you will see later in this manual.
#include <stdio.h>
#include <ctype.h>
#define LINE 001
#define BOX 002
...
4). Keywords in GDL will be printed in boldface when they appear in text.
Example: token, type, ...
This manual contains fourteen (14) chapters. The contents of these chapters are as follows:
Chapter 2 is an overview of PCYACC; its origin, history, and evolution.
Chapter 3 contains information about PCYACC invocation, command line format and options.
Chapter 4 presents the most basic concepts about computer programming languages and compilers.
Chapter 5 steps through a simple arithmetic calculator program example constructed using PCYACC.
Chapter 6 is a continuation and/or extension of Chapter 4. It summarizes existing basic concepts about languages and translations and introduces some new ones.
Chapter 7 is a slightly more complicated example program that translates infix arithmetic expressions into their postfix notations. A step by step procedure is provided for the entire program development process. This Chapter is intended to give you more hands-on experience using PCYACC.
Chapter 8 is devoted to discussing fundamental principles of PCYACC (or LALR parser generators in general). The discussion is brief and informal, so serious readers need to consult relevant literature for more information. (Compilers: Principles, Techniques, and Tools, the "Dragon" book, is highly recommended for those interested in detailed theory.)
Chapter 9 describes the macro structures of PCYACC grammar description files. These files are created for input to PCYACC specifying the intended language syntax. Formats for writing grammar descriptions and the parts required for such descriptions are outlined.
Chapter 10 details PCYACC internals from the programmer's viewpoint. It overviews and describes features of PCYACC, illustrating how these features work. Several examples are also included.
Chapter 11 describes debugging techniques for program development using PCYACC. It lists common errors that a programmer might encounter and suggests ways to fix them. It also explains the debugging facility provided by PCYACC: parsing tables revealing internal behaviors of parsers generated by PCYACC.
Chapter 12 looks at the general principles of compiler construction. It describes a typical architecture for compilers and outlines each component's function.
Chapter 13 is intended as another hands on project example. A simple picture specification language is defined and a compiler for it is presented.
Chapter 14 describes unique features of PCYACC. It also provides some tips on how to use these features more effectively with existing software tools.
Appendix I states configuration requirements and provides installation instructions.
Appendix II is a list of error messages from PCYACC and their explanations.
Appendix III suggests a list of related literature for further reference.
Appendix IV is a glossary defining the PCYACC vocabulary.
This Chapter provides a brief historical overview of PCYACC and a description of its functionality at an abstract level.
PCYACC is a descendant of YACC, which is the acronym of "Yet Another Compiler-Compiler". YACC was originally designed and implemented by Stephen C. Johnson of AT&T Bell Laboratories over ten years ago. It is one of the standard language tools provided in almost all UNIX operating systems environments. PCYACC then, is a MSDOS implementation of the original UNIX YACC as defined by Johnson.
During the past decade or so, numerous software projects, large and small, have been developed using YACC. As such, YACC has historically been one of the most valuable tools available to programming language developers. YACC can dramatically reduce the time and complexity normally involved with compiler development. Compiler writing tools, such as YACC, provide developers with a practical means of writing the so called LR parsers. LR parsers have a number of advantages over more traditional technologies, such as recursive descent parsing.
Because YACC is such a useful tool, many developers have ported it to their own systems, or re-implemented it in their own software environments. PCYACC is an example of an implementation of YACC on micro computer systems.
PCYACC is designed and implemented to be upward compatible with YACC. All the features of YACC are retained in PCYACC, some of which have been improved. In addition, a number of new functions have been incorporated into the design of PCYACC. Although PCYACC has an almost identical programming interface with its ancestor, there are a few differences. These will be discussed as we go along.
PCYACC is no more than a computer program. In this respect, it is like any ordinary program; it takes inputs, computes values and produces results. There is a special class of programs, called compilers, which take computer programs as input, usually written in a high level programming language, and produces as results corresponding low level programs. (We will use the word compiler loosely to include other types of translators, unless otherwise specified. In the next Chapter we present a taxonomy.)
PCYACC is even more specialized, because it is a compiler generator, a program that writes compilers. PCYACC is a software tool that you will find useful when you wish to write your own compilers, or other similar programs such as command interpreters. (Or, for that matter, any program that needs to be able to read and understand one of the numerous high level languages. "Pretty printers", formatting utilities, cross reference generators, and specialized programs such as "LINT" are examples. Many of the GDL's necessary to parse high level languages are included in the Professional version of PCYACC.)
Based on what it does, PCYACC can still be classified as a compiler, a program that translates programs in one language into programs in another language. PCYACC uses a special programming language, which is referred to as a grammar description language (GDL). The object language of PCYACC is the C/C++ programming language, however the generation of Java, Basic, or Pascal is possible by using our Object Oriented Toolkit. Put simply, PCYACC takes programs written in GDL as input, and produces C programs as output by default. The resulting C/C++ programs are equivalent to the input GDL programs, in that a GDL program is a specification of the formal syntax for a programming language. The C program generated by PCYACC will be a parser for the corresponding programming language that satisfies the specification. The idea is illustrated by the following diagram:

NOTE: Although PCYACC (or YACC) is said to be a compiler generator, or compiler compiler as implied by its name, it is really a parser generator. PCYACC (YACC) can only generate a portion, often referred to as the front end, of a compiler.
PCYACC is invoked by simply typing PCYACC while running under MSDOS command shell. This Chapter explains the command line format and possible options.
PCYACC can be invoked by typing PCYACC, followed by zero or more command line options, followed by a file name. For example:
PCYACC [options] <gdf_name>
Where <gdf_name> is the name of the file containing a grammar description program (GDP), a program written in GDL, and [options] represents zero or more command line options. Files used to hold GDP's are called grammar description files, or GDF's for short.
Although there is no restriction on the format of input file names, it is a good practice to give PCYACC GDF's an extension field distinguishable from other kinds of files. Recommended extensions are ".y", ".Y", ".pcy" and ".PCY" and we will use ".y" throughout this document. PCYACC compiles the GDF <gdf_name> and produces a C program that is an LALR (look-ahead LR) parser for the language defined by the GDP. By default, the generated parser is kept in a file with an extension ".c" with the same basename as the GDF <gdf_name>.
If PCYACC is invoked without a grammar description file, it will display a short message advising you of the correct command line format.
Command line options are used to override default actions or file name conventions, or to indicate actions you want PCYACC to perform in addition to what it does automatically. Available options are described below:
-c: This option overrides default C file name. Instead of using the basename of the grammar description file plus the ".c" extension, it uses "yytab.c". This option is provided to maintain compatibility with earlier versions.
-C<cf>: Like -c, this option overrides default C file name, but uses the name provided by the user, <cf>.
-d: This option tells PCYACC to produce a C header file, using the default file name "yytab.h", in addition to the C code file. This header file is used primarily by your lexical analysis routine yylex(). The definitions generated by PCYACC are used globally at parse time unless your yylex() routine is local to your grammar. PCYACC basically enumerates all of the tokens declared in the grammar, and these enumerated values are used as messages between yyparse() and yylex().
-D<hf>: Like -d, this option produces a C header file, but with a different file name convention. If no <hf> is provided, PCYACC will use the basename of the grammar description file with an extension ".h"; otherwise <hf> will be used instead.
-h: Print a help screen.
-n: Disable #line numbers from the .C output of PCYACC. This option is quite useful if your trying to use a source code debugger. In normal operation the output .C file uses #line to make the output relative to the original .y file, this normally causes source code debuggers like CodeView to generate strange results.
-p<pf>: Use the user provided parser skeleton contained in <pf> file instead of the system default ( internal skeleton ). A sample parser skeleton is supplied in the \src directory of the PROGRAM diskette. ( yaccpar.c ). The external parser skeleton is a commonly used to support multiple parsers.
-P<pf>: Same as -p<pf>.
-r: Report progress during execution. This is a good idea for huge grammars that take seemingly forever to compile.
-R: Report progress during execution.
-s: This option instructs PCYACC produces short integer internal arrays for the parser. The default type for the internal arrays is long integer.
-S: This option overrides PCYACC's default action. Instead of processing the grammar description file, it quits after the syntax analysis phase. This option is useful for doing syntax debugging on large grammar description files, especially when coupled with an extensible text editor (see Chapter XIV).
-t: This option tells PCYACC to construct the parser in such a way that it will build a parse tree for the program being processed. The parse tree, by default, is saved to the file "yy.ast". ( not compatible with the -p switch, requires internal skeleton parser ). The parse tree is not actually generated until the parser is executed.
-T<tf>: Same as option -t, except with different file name conventions. If <tf> is not provided, the parse tree is saved to the file named by the basename of the grammar description file with an ".ast" extension; otherwise, it is saved to <tf>.
-v: This option produces a textual parsing table, in addition to the C parser, using the default file name "yy.lrt". The parsing table is a required tool in debugging PCYACC grammars.
-V<vf>: Same as option -v, except that the parsing table is saved to either <vf> or a file named by the basename of the grammar description file and the extension ".lrt".
NOTE: Command options may change over time. Consult the "readme.now" file in the root of the distribution disk for the latest information.
3. Using Command Line Options
This section shows you how to use command line options. The following example, HELLO.Y, is used throughout this section:
%token WORLD
%token HELLO
%start greetings
%%
greetings : HELLO ',' WORLD ;
At its default setting, PCYACC produces the C file HELLO.C for this example grammar.
3.1 Override Output C File Name Convention (-c and -C)
-c switch overwrites the output C file name convention of PCYACC. The C output is written to YYTAB.C.
Example: PCYACC - c hello.y <ENTER>
produces YYTAB.C in your current folder.
-C<cf> option lets you specify a name for the C output of PCYACC.
Example: PCYACC -C greet.c hello.y <ENTER>
produces GREET.C in your current folder.
3.2 Produce Token Definition File (-d and -D)
These two options let you have a separate file containing token definitions requested by the grammar file. The -d option instructs PCYACC to send the output to YYTAB.H.
Example: PCYACC -d hello.y <ENTER>
produces YYTAB.H in your current folder.
-D also gives you a header file. It allows you to specify a name for the file, or to use a different default.
Example: PCYACC -D hello.y <ENTER>
sends the header definitions to the file HELLO.H.
Example: PCYACC -D hello.y <ENTER>
produces the header definitions to the file GREET.H.
The contents of the three files, YYTAB.H, HELLO.H and GREET.H, are identical:
#define WORLD 257
#define HELLO 258
3.3 Ask PCYACC for Help (-h and -H)
The two options, -h and -H, do exactly the same thing. They ask PCYACC to print out a help message on the screen. This feature is especially useful to new users.
Example: PCYACC -h <ENTER>
displays the following message:
Abraxas Software (R) PCYACC version - 7.01.
Copyright (C) Abraxas Software, Inc. 1986 - 1997, All rights reserved
Usage: PCYACC options gram.y
Available options:
-c : use file "yytab.c" for C output
-Cf: use file f for C output
-d : produce a header file in "yytab.h"
-Df: produce a header file in f or grm.h
-h : display this help message
-H : display this help message
-pf: use skeleton f as parser driver
-Pf: use skeleton f as parser driver
-s : produce short integer internal arrays
-S : quick syntax check on input file grm.y
-t : build parser with parse tree generation using "yy.ast"
-Tf: build parser with parse tree generation using f or grm.ast
-v : produce a parsing table for the parser using "yy.lrt"
-Vf: produce a parsing table for the parser using f or grm.lrt
3.4 Override System Default Parser Skeleton (-p and -P)
The -p option and the -P option have the same function. They let you write your own parser skeleton to drive the LALR table generated by PCYACC.
Example: PCYACC -p myparser hello.y <ENTER>
will use MYPARSER to produce HELLO.C.
3.5 Generate Short Integer Arrays (-s)
The option -s instructs PCYACC to generate short integer arrays for the parser. The default type for the internal arrays is integer. This feature is useful on systems that have severe memory constraints. (Note: "int" may mean different things to different C compilers; check your manual and change "int" to "short" if necessary. Apple MPW C thinks that "int" means a 32-bit value. Lightspeed™ C thinks that "int" means a 16-bit value.)
Example: PCYACC -s hello.y < ENTER>
will generate short integer arrays in HELLO.C. Most 32-bit compilers treat int as a long ( 32-bit ) and short as 16-bit. In general this switch is used when saving space is essential and your working with small tables.
3.6 Quick Syntax Check (-S)
The -S option is useful for large grammar files. It instructs PCYACC to do a syntax check on the grammar file. No other processing is performed when this switch is used.
Example: PCYACC -S hello.y < ENTER>
will perform a quick syntax check on HELLO.Y. No HELLO.C is produced.
3.7 Build Parser with Parse Tree Generation Ability (-t, -T)
The -t option is useful for debugging grammar files and studying the language structures of the source language. With this option, the parser generated by PCYACC will produce textual form parse trees for source programs, using the default file YY.AST.
Example: PCYACC -t hello.y <ENTER>
will make HELLO.C have the ability to produce parse trees for source programs. The tree is saved to the file YY.AST in this example when HELLO.C is compiled an executed.
The -T options has the same function as the -t function. It uses a different naming convention for the file to dump the parse tree information.
Example: PCYACC -T hello.y < ENTER>
will make the parser use the file HELLO.AST to store the parse tree information when HELLO.EXE is executed.
Example: PCYACC -T greet.ast hello.y < ENTER>
will make the parser use the file GREET.AST to store the parse tree information.
When the program generated from HELLO.Y in this example is applied to the following program:
hello, world
The file GREET.AST (YY.AST, or HELLO.AST) is created with the following information:
greetings HELLO , WORLD
which represents the parse tree for the input program:

Abstract Syntax Tree for Hello.Y
3.8 Use Verbose Mode (-v and -V)
The -v option produces verbose output in YY.LRT. The output resembles an LALR parsing table with additional information.
Example: PCYACC -v hello.y <ENTER>
generates a text file YY.LRT in your current folder.
The -V option has the same function as the -v option. -V lets you redirect the verbose output to either the file HELLO.LRT or a file of your choice.
Example: PCYACC -V hello.y <ENTER>
creates the verbose output in HELLO.LRT.
Example: PCYACC -Vgreet.lrt hello.y <ENTER>
produces the verbose output in GREET.LRT.
Partial contents of the file GREET.LRT (HELLO.LRT, or YY.LRT) is shown next:
-*-=-*-=-*-=-*- LALR PARSING TABLE -*-=-*-=-*-=-*-
+-------------------- STATE 0 ---------------------+
+ CONFLICTS:
+ RULES:
$accept : ^greetings $end
+ ACTIONS AND GOTOS:
HELLO : shift & new state 2
: error
greetings : goto state 1
...
+-------------------- STATE 4 ---------------------+
+ CONFLICTS:
+ RULES:
greetings : HELLO , WORLD^ (rule 1)
+ ACTIONS AND GOTOS:
: reduce by rule 1
==================== SUMMARY ====================
grammar description file = hello.y
number of terminals used = 5; limit = 500
number of nonterminals = 1; limit = 500
number of grammar rules = 2; limit = 1000
number of states = 5; limit = 1000
number of s/r errors = 0
number of r/r errors = 0
...
-*-=-*-=-*-=-*- END OF TABLE -*-=-*-=-*-=-*-
IV. BASICS OF PROGRAMMING LANGUAGES
This Chapter introduces basic concepts related to computer programming languages. Some important terminology’s to be used in later chapters are also explained.
1. What Is A Computer Programming Language?
The types of work that can be done and the speed at which these types of work are done by computers are quite overwhelming. Nevertheless, computers do not have intelligence -- they can do nothing without being given step by step instructions, called programs. These programs that make computers appear to be miracles are presented to the computer in the form of computer languages.
Computers can only understand instructions written in a special kind of language, called machine languages. Machine language sentences are phrased in terms of binary digits, (0's and 1's). Machine languages are so primitive that we as computer users find them difficult to understand. Consequently, writing instructions that tell computers what to do, or programming, had been considered a tedious task, before people started to explore the possibility of teaching computers secondary languages. These secondary languages, called high level languages, use sentences that make more sense to human minds.
Computers can't really be taught to understand secondary languages. As a result we have developed translators for computers, which are capable of converting computer programs written in high level languages into instructions in a computer's native language -- machine language, or low level language. There is another kind of low level language, called assembly language. Assembly language is directly derived from machine language by assigning a mnemonic name to each machine instruction.
Translators don't have to be restricted to do the translations from high level languages to low level languages. There are different kinds of translators for computers, and they have been given special names. Translators that translate assembly language programs into machine language programs are called assemblers. Conversely, translators that translate machine language programs into assembly language programs are called disassemblers. Translators that translate high level language programs into low level language programs are called compilers. And translators that translate programs in one high level language into programs in another high language are called preprocessors (such as C++).
For a given translator, programs to be translated are called source programs, and the results of the translation are called object programs. The language in which source programs are written is referred to as the source language, and the language in which object programs are composed is referred to as the object language.
With compilers, human users no longer need to deal with machine languages in order to have computers do useful things, and programming has become a much more pleasant task. Carrying out the effects prescribed by programs written in high level languages is divided into two phases: a compilation phase and an execution phase. During the compilation phase, high level language programs are first translated into machine languages.
During the execution phase, the machine language programs are executed. Schematically, this process can be illustrated by the following diagram:

V. GETTING STARTED -- A SMALL EXAMPLE
The purpose of this Chapter is to give you an idea of how to use PCYACC in a language development project. To achieve that, we assume you are familiar with the C programming language. We also assume you have an , ABRAXAS PCYACC, , and a C compiler.
This Chapter gives you an overview of the program development process using PCYACC. The example used in this Chapter is a simple calculator capable of doing ordinary arithmetic operations. The example will show you the PCYACC program listing for SACALC, the name given to the simple arithmetic calculator program, and illustrate how to build the executable SACALC using PCYACC and a C compiler. In a later Chapter we will detail the development procedure with a slightly more advanced example.
1. Grammar Description File for SACALC
The following is the listing of the PCYACC GDF for the calculator example, SACALC.Y. For reference, line numbers are added to the statements (line numbers should NOT be included in your GDL's).
01: %{
02:
03: #define YYSTYPE double /* stack data type */
04:
05: %}
06:
07: %token NUMBER
08: %left '+' '-' /* left associative */
09: %left '*' '/' /* left associative */
10: %left UNARYMINUS
11:
12: %%
13:
14: list: /* nothing */
15: | list '\n'
16: | list expr '\n'
17: { printf("\t%.8g\n", $2); }
18: | list error '\n'
19: { yyerrok; }
20: ;
21:
22: expr: NUMBER
23: { $$ = $1; }
24: | '-' expr %prec UNARYMINUS
25: { $$ = -$2; }
26: | expr '+' expr
27: { $$ = $1 + $3; }
28: | expr '-' expr
29: { $$ = $1 - $3; }
30: | expr '*' expr
31: { $$ = $1 * $3; }
32: | expr '/' expr
33: { $$ = $1 / $3; }
34: | '(' expr ')'
35: { $$ = $2; }
36: ;
37:
38: %%
39:
40: #include <stdio.h>
41: #include <ctype.h>
42: char *progname; /* for error messages */
43: int lineno = 1;
44:
45: main(argc, argv)
46: char *argv[];
47: {
48: progname = argv[0];
49: yyparse();
50: }
51:
52: yylex()
53: {
54: int c;
55:
56: while ((c=getchar()) == ' ' || c == '\t' );
57:
58: if (c == EOF)
59: return 0;
60: if (c == '.' || isdigit(c)) { /* number */
61: ungetc(c, stdin);
62: scanf("%lf", &yylval);
63: return NUMBER;
64: }
65: if (c == '\n')
66: lineno++;
67: return c;
68: }
69:
70: yyerror(s) /* called on syntax error */
71: char *s;
72: {
73: warning (s, (char *) 0);
74: }
75:
76: warning(s, t) /* print warning message */
77: char *s, *t;
78: {
79: fprintf(stderr, "%s: %s", progname, s);
80: if (t) fprintf(stderr, " %s", t);
81: fprintf(stderr, " near line %d\n", lineno);
82: }
83:
This example, even though extremely small, exhibits the typical structure and components of a PCYACC grammar description program. Lines 1 through 11 form the so called declaration section, where token symbols, operator precedences, etc., are declared. Lines 12 through 37 make up the grammar rule section, where the grammar rules specifying the language being developed are placed. Lines 38 through 83 form the program section, where supporting C routines for the compiler are written. As illustrated by this example, a grammar description program is made up of three sections: a declaration section, a grammar rule section and a program section. Two of the three sections, the declaration section and the program section, can be empty. (If there is no declaration section, the delimiter "%%" will still be needed to tell PCYACC to start processing the grammar rule section.)
The symbol pair on line 1 and line 5 is a delimiter that is used in the declaration section to include C statements, such as preprocessor directives, global data structure definitions or variable declarations. (In this example, there is a preprocessor directive.) PCYACC will not look inside these delimiters. Everything inside the delimiters will be passed to the C program that is generated without any change.
Line 7 declares NUMBER to be a token, a grammar symbol that cannot be used as the left hand side of grammar rules. (When yylex is called, it is supposed to return the type of token that it found. NUMBER will be declared in a #define clause in the generated program. It can also be put into an include file if yylex is not part of the generated program. Token types like NUMBER and also 0, which signifies end of file, are what the generated parser is expecting to receive when yylex is called.)
Lines 8 through 10 define the associativity of the arithmetic operators involved. All the operators are declared to be left associative (i.e., if the statement is a+b+c, then a+b will be calculated first). These statements also convey the following information: addition (+) and subtraction (-) have the same precedence; multiplication (*) and division (/) have the same precedence and it is higher than addition or subtraction; the unary operator, namely the negation sign, has the highest precedence.
Line 12 is a delimiter separating the declaration section from the grammar rule section. Lines 14 through 20 are short hand for the following four grammar rules:
(1) list : ;
(2) list : list '\n' ;
(3) list : list expr '\n' ;
(4) list : list error '\n' ;
These rules say that a list can either be empty (1), be a list followed by a new line character (2), be a list followed by an expression and a new line character (3), or be a list followed by something which, in this case, is an error (4). In short hand notation, the colon (:) is used to delimit the left hand side of grammar rules. The vertical bar (|) is used to delimit the grammar rules that have a common left hand side nonterminal grammar symbol, or the alternatives of this common nonterminal symbol. The semicolon (;) is used to terminate grammar rules. Under the grammar rules for list, on lines 17 and 19, there are commands enclosed in braces. These are called actions. Similarly, lines 22 through 36 define grammar rules for expressions.
The second %% delimiter separates the grammar rule section from the program section. Everything in the program section is also copied to the output of PCYACC. This section defines three C functions: main(), yylex() and yyerror(). Note that these three functions are always required to provide support for the parser generated by PCYACC. (They can be in separate files and simply linked during program generation.) The following discussion will help you understand how to combine what PCYACC produces with the supporting functions written by the programmer to make an integrated C program.
What PCYACC produces is a C function, yyparse(), which is, technically, an LALR parser for the language defined by the grammar rules of in the GDF. The function of the user provided main function, main(), is to activate the parser, perform necessary initialization before activation, and to clean up after activation. The lexical analyzer, yylex(), is the front end of the parser. The lexical analyzer is responsible for decomposing raw text strings into meaningful lexical units, called tokens, and passing this information to the parser. The error processing routine, yyerror(), is called by the parser when a syntax error is uncovered during parsing. The relationship among these four routines is depicted by the following diagram:

The three sections, the declaration section, the grammar rule section and the program section, will be discussed in detail in later chapters.
2. Building the Executable SACALC
To invoke PCYACC on the grammar description file SACALC.Y, issue the following command:
PCYACC SACALC.Y< ENTER>
The result of this operation is a C program. A file with the name SACALC.C will be created in the current folder, which is the C program for the calculator. To obtain the executable version of the SACALC, invoke the C compiler as follows (assuming the MICROSOFT C compiler here):
CL SACALC.C <ENTER>
This will generate the object file. Now use the LINK command to produce the actual MSDOS tool:
LINK SACALC.OBJ <ENTER>
(Note: there is a "make" file, SACALC.MAK, that can be invoked from MSDOS to automatically generate an tool. If you use a different C compiler, you will need to follow your compiler's procedures for building an application. This application will need some means, possibly a dialog, to get input and display results.)
3. Sample Session of SACALC at Work
After the executable version of the SACALC is built successfully (in our example, we would have a SACALC in the current folder), we can use it to perform simple arithmetic. The following is a sample session of SACALC at work:
SACALC <ENTER>
1 + 2 <ENTER>
3
2.5 + 4 * 1.5 <ENTER>
8.5
2 / 4 - 3 * 0.5 / 3 + 5 <ENTER>
5
This Chapter reviews and summarizes important concepts related to PCYACC that we have encountered so far, plus some new ones. We will list basic technical terms to prepare for more formal discussion of these subjects in a later Chapter.
A programming language is an artificial language used to write instructions for computers to perform certain tasks. Programs, which can have any number of instructions, are the basic form in which these instructions are presented to computers for execution. There are two classes of programming languages: machine languages and high level languages. Programs written in machine languages are directly understood by computers and can be executed immediately. Programs written in high level languages have two different modes of execution: interpreted execution and compiled execution. In an interpreted execution mode, instructions in a program are interpreted one at a time. Each step includes translation of one instruction followed by machine actions dictated by the instruction. This interpretive execution process is controlled and performed by another program, called an interpreter. A compiled execution mode is divided into two phases: a compilation phase and an execution phase. During the compilation phase, a compiler first translates the source program into the object program. In the execution phase, the object program is then executed.
A programming language can be defined in a number of ways. For example, you could list all legal programs if there are only a finite number of them. A better way is to use the so called context-free grammars (CFG's), which have been one the most versatile techniques for defining programming languages. There are several advantages to using context-free grammars as opposed to some other means. First, the formal theoretical aspects of CFG's have been studied intensively and are well understood. Second, CFG's are founded on a formal basis and are very precise, as far as defining syntax is concerned. Third, CFG's are powerful enough to encompass most programming languages in practice.
Each programming language has a particular set of predefined vocabularies with which programs can be written. A context-free grammar defining a programming language also uses the same set of vocabularies, which are referred to as terminal symbols, or tokens, of the CFG. In addition to tokens, the CFG needs a separate set of vocabularies so that it can express intermediate constructs needed for defining the language. These additional vocabularies are called nonterminal symbols of the CFG. A grammar symbol can be used to mean either a terminal symbol or a nonterminal symbol. The key components of a CFG is a set of production rules, or grammar rules, constructed in the following form:
R: X --> X1 X2 ... Xn
With this form, a grammar rule R is defined. X is called the lefthand-side (LHS) of R and must be a nonterminal symbol. X1 through Xn form the righthand-side (RHS) of R. The RHS of a grammar rule can also be empty. When not empty, each component of the RHS can either be a terminal symbol or be a nonterminal symbol. The last element in a context-free grammar is a start symbol, which is a special nonterminal symbol used to indicate the most significant construct of the language being defined.
2. BNF -- A Language for Writing CFG's
Backus-Naur-Forms (BNF's) are often used to write context-free grammars. An extended version, referred to as Extended Backus-Naur-Forms (EBNF's), will be used to illustrate how to write a context-free grammar.
The basic conventions of using EBNF's to write a context-free grammar are the following:
1). terminal symbols are always quoted using the single quotes(');
Example: '+', '-', '*'
2). Non-terminal symbols are always enclosed using the angle brackets (< and >);
Example: <list>, <expr>
3). a space is used to separate two consecutive grammar symbols;
Example: <list> '\n'
4). a special symbol, ::=, is used to separate the LHS and the RHS of a grammar rule;
Example: <list> ::= <list> '\n'
5). two consecutive grammar symbols in a row have a meaning of syntactic concatenation;
Example: <list> ::= <list> <expr> '\n'
6). two grammar symbols separated by a vertical bar, |, have a meaning of syntactic alternative;
Example: <list> ::= <list> '\n' | <list> <expr> '\n'
7). parentheses can be used to group grammar symbols;
Example: <list> ::= <list> ('\n' | <expr> '\n')