CS445/CS545 Compiler Construction
Lectures: TR 9:30-10:45am Science Center 354
hours: Science Center 373 TR 10:45-12:15,
Pre-requisites: CS344, CS345.
SYLLABUS: A study of compiler design. Overview of the compilation
process. Formal definition of syntax, lexical scanning, parsing including LL and
LR grammars, run-time structures, intermediate code generation, and storage
allocation. Students are expected to develop a compiler for a substantial subset
of a high-level language using compiler tools such as lex and yacc.
GRADING Project; Exams; Miscellaneous;
- Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. "Compilers: Principles,
Techniques, and Tools," Addison-Wesley (1986).
- Brian W. Kernighan and Dennis M. Ritchie, "The C Programming Language,"
2nd edition, Prentice-Hall (1988) (on reserve)
- John Levine, Tony Mason, and Jason Brown, "lex & yacc," 2nd edition,
O'Reilly (1992) (on reserve)
- Richard P. Paul, "SPARC Architecture, Assembly Language Programming, &
C," 2nd edition, Prentice-Hall (1994) (on reserve)
- Language translation. Compilation. Overview.
- Lexical Analysis. Regular expressions.
- Syntax Analysis. Theory of Parsing: top-down LL(1); bottom-up SLR(1),
LR(1) (and LALR(1)).
- Semantic Analysis. Parse trees. Syntax-directed translation. Type
- Code generation. Runtime environments.
The objective of this course is to learn the
fundamentals of compiler theory and its application in constructing a real
working compiler. The specific outcomes are:
- Proficient in C programming.
- Knowledge of useful UNIX tools for software design and compiler
construction (make, lex, yacc).
- Understanding of the basics of SPARC assembly language and its run-time
- Ability to build a Pascal-like compiler, written in C, for the SPARC
architecture in the UNIX programming environment.
Although attendance is not mandatory, students are responsible
for all course materials covered in lectures and any exams given during class
periods. Students that need to make up missing course work must follow the
official Clarkson policies regarding these matters. All students must submit
their own work; the exchange of ideas are encouraged but ultimately the
submitted work must be the student's own. Please refer to the Clarkson
University Regulations for more guidelines on academic integrity and related
- Pascal lex scanner & yacc parser:
- adapt the Pascal language description from Appendix A (Dragon book) to
create a lex scanner and yacc parser
- some grammar changes include: allowing subprograms to nest within each
other, allow array access on RHS of assignment statements, handling of unary
minus, handling of procedure calls with no arguments, etc.
- some additional lexical tasks: handling of real numbers, allowing
- yacc parser + symbol table:
- implement a stack of hash tables for handling scoping rules
- connect the symbol table module with the yacc parser to detect
identifier problems, such as: redeclaration of names within the same scope,
usage of undeclared names, potential improper usage of names, etc.
- semantic analyzer & type checker:
see the list of checks that we need
to handle (note the important distinction placed by Pascal with respect to
functions and procedures).
- build a syntax (parse) tree representation of parts of the input program
- type checking of expressions and statements
- code generator:
- implement a SPARC code generator for expression trees; see Section 9.10
pp 557 of the Dragon book.
- handle function calls and return values.
- implement code generator for statements.
- handle non-local access of names within nested scopes.
- handle Input/Output (read and write procedure calls in Pascal).
- Changes to the grammar:
- Allow nested subprogram declarations in subprogram declaration.
- Allow array access to appear on the RHS of an assignment statement.
- Split the num token into integer inum and real
- Adapt the rules involving the sign variable.
- Arrange for unary minus to be handled in factor.
- Error reporting tips for lex &
yacc (courtesy of R. Clarvoe).
- The real Pascal compiler is alive and well and it is called pc.
It's best to not call your executable pc to avoid unintentional
overloading. Use names like yapc (Yet Another Pascal Compiler) or
mypc or related names.
- For Linux users:
Use yacc or bison (for yacc) and flex (for
The only library flag we need is -lfl (instead of -ll -ly).
older versions, the yacc/bison parser uses the prefix filename to create its C
and header files. So the command yacc -dv pc.y will create
pc.tab.c and pc.tab.h (and pc.output) instead of y.tab.c,
y.tab.h and y.output.
We might not be able to override certain things (like
Unused token attributes will create errors.
- Code optimization on DAGs
- Adding (variant) records and pointers to the Pascal source language
- Using C for the source language
- Using Intel as the underlying architecture
READINGS/NOTES (under construction)
- Review of course and project: we are going to build our own Dragon
(compiler) for a Pascal-like programming language (source language) that will
produce SPARC assembly (target language) output. Our compiler (language
translator) will be implemented in C. Core components of the compiler include:
- Scanner: module that tokenize the input stream (of characters) into a
stream of tokens (legal lexical elements of the language). This component
can be speficied using the tools of Regular Expressions and the module
itself is a DFA program. We will build our scanner using lex (or
flex under Linux/BSD).
- Parser: module that checks whether the stream of tokens is syntactically
correct according to the grammar of the source language. This component can
be specified using the tools of Context-free Grammars and the module itself
is a PDA program. A very important task of the parser is also to build the
parse (syntax) tree representation of the input program. We will build our
parser using yacc (or bison under Linux/BSD)
- Symbol Table: module that stores information about names that appear in
the input program. A common data structure used for this module is a hash
table with chaining.
- Semantic analyzer: module that performs semantic checks on the parse
tree. Typical checks include undeclared names, type mismatches, illegal
usage of subroutines, and others. An important sub-module called the Type
Checker is usually part of this module.
- Code generator: module that converts an intermediate representation of
the input program (syntax tree, in our case) to a sequence of commands in
the target language (SPARC assembly code).
- (Theory) Read Chapter 1 (Dragon book).
- (Practice) Read and familiarize yourself with C: either pickup Kernighan
& Ritchie's classic book or check the books that I've put on reserve at
the library; I will also give some examples in class. Things to focus about C:
I/O (file or otherwise), pointers and structures, memory allocations,
preprocessor directives, and style. Some UNIX-related things that might
be useful for us: make and makefiles, splitting source files, debugging
tools like gdb, profiling tools, and accessing the manual pages using
- (Theory) Read Chapter 2 (Dragon book) on syntax-directed translation and
- (Practice) Read and familiarize yourself with lex and yacc. The O'Reilly
book is quite helpful. Experiment with the calculator program example. Then
write the lex & yacc files for the Pascal grammar given in Appendix A
- (Theory) Read Chapter 4 (Dragon book) on various method for parsing. The
sequence that we will cover includes LL(1) top-down parsing, SLR(1) bottom-up
parsing, and LR(1) parsing. If time permits, we will cover LALR(1) parsing,
the method that yacc uses. We will skip the material on Operator Precedence
- (Practice) Build your stack of symbol (hash) tables. Use the hashpjw()
function from Chapter 7 (due to Peter J. Weinberger). Make sure you build a
scaffolding around this module before you connect it to the yacc Pascal parse
(not the lex scanner!).
- (Theory) Read bits and pieces of Chapters 5 and 6 (Dragon book). There is
a nice theory about building the abstract syntax trees and defining general
type expressions. We will only require the most basic things about syntax
trees and type expressions but it is good to know the bigger picture.
- (Practice) Build your syntax trees, mainly for statements in the grammar
and some of the declarative rules. The main reason we are building these trees
are so that we can perform semantic checks easily and so that the code
generator does not have a heart attack when it is called. The type checking
that has to be done is quite strict (and hence simple) but you might want to
think about some nice structures to capture "types".
- (Theory) Read Chapter 7 (Dragon book) on Run-Time Environments. Basically,
we need to get familiar with memory allocations, the run-time stack, the
notion of an activation record, flow of control, and some basic sense of the
von Neumann model.
- (Practice) We will look at many examples of some SPARC assembly programs
(especially ones that are generated by the GNU C compiler). We will follow
almost all of the conventions of this open source compiler (most importantly,
its convention for the activation record). The main reason is that we will use
GCC as our assembler!
architecture manual (note: this is for Version 9!)some SPARC questions
A local copy of
the manual in PDF format
(courtesy of C. Isomaki)
- (Theory) Read Chapter 9 (Dragon book). Our focus will be on a simple
algorithm for generating code from DAG (read: syntax trees) given in Section
10. We will adapt this for our SPARC-based compiler. Details will be covered
- enter the Dragon
THE DRAGON PROJECT
The goal of the project is
to build a compiler for a large chunk of the Pascal programming
language that will output a SPARC assembly program. This project will
give you a chance to apply skills you learned from other courses along with
hopefully new ones:
To synchronize the compiler project, make your
compiler take a single input Pascal file (with suffix extension of ".p", say
"main.p") and make it write the SPARC assembly output to file with the same name
except for the suffix (which is changed to ".s", so "main.s"). We will then use
GCC to produce the executable by "gcc main.s". So if your compiler executable is
called yapc then the following sequence should be used
to test all our final executables.
- Automata theory (regular expressions and context-free grammars)
- Data structures and algorithms (lists, stacks, hash tables, binary trees,
and any combination of these)
- Programming (hacking and debugging our way through the maze of several
hundred lines of C code)
- Languages (Pascal [high-level], SPARC assembly [low-level], and C)
- UNIX environments and tools (Lex [lexical analyzer] and yacc [Yet Another
report containing at least the following items:
- Specification document (describes the software)
- Design document (describes the overall design of the software)
- Testing document (describes the testing and verification process
for the software; also bug status report)
- User document (user manual for compilation and proper usage of the
- Complete source listing (hardcopy and compressed tgz file via
During the last week, schedule a meeting
to present the online demonstration of the compiler. Be prepared for some extra
testing and questions about your compiler project.