CS445/CS545 Compiler Construction
Instructor: Christino Tamon
Lectures: MWF 10am Science Center 356
Laboratory meets: F 10am SUN lab Rowley 142 and/or COSI lab SC346
Office hours: Science Center 373 MW 8-10am, 11-noon F 9-10am
Pre-requisites: CS344, CS345.
A study of compiler design. Overview of the compilation process. Formal
definition of syntax, lexical scanning, parsing including LL and LR
grammars, run-time structures, intermediate code generation, and storage
allocation. Students are expected to develop a compiler for a substantial
subset of a high-level language using compiler tools such as lex and yacc.
Course grading: Project 50%, Exams 40%, Miscellaneous 10%
- Aho, Sethi and Ullman. Compilers. Addison-Wesley Publishing Company, 1986.
- Kernighan and Ritchie. The C Programming Language. Second edition. Prentice-Hall, 1988. (on reserve)
- Levine, Mason, and Brown. lex & yacc. O'Reilly and Associates, Inc., 1992.
- Paul. SPARC Architecture, Assembly Language Programming, & C. Prentice-Hall, 1994.
- Language translation. Compilation. Overview.
- Lexical Analysis. Regular expressions.
- Syntax Analysis. Theory of Parsing: top-down LL(1); bottom-up SLR(1), LR(1) (and LALR(1)).
- Semantic Analysis. Parse trees. Syntax-directed translation. Type checking.
- Code generation. Runtime environments.
Readings and notes
- Review of course and project: we are going to build our own Dragon (compiler)
for a Pascal-like programming language (source language) that will produce
SPARC assembly (target language) output. Our compiler (language translator)
will be implemented in C. Core components of the compiler include:
- Scanner: module that tokenize the input stream (of characters) into a stream
of tokens (legal lexical elements of the language). This component can be
speficied using the tools of Regular Expressions and the module itself is
a DFA program. We will build our scanner using lex (or flex under
- Parser: module that checks whether the stream of tokens is syntactically
correct according to the grammar of the source language. This component can
be specified using the tools of Context-free Grammars and the module itself
is a PDA program. A very important task of the parser is also to build the
parse (syntax) tree representation of the input program. We will build our
parser using yacc (or bison under Linux/BSD)
- Symbol Table: module that stores information about names that appear in
the input program. A common data structure used for this module is a hash
table with chaining.
- Semantic analyzer: module that performs semantic checks on the parse tree.
Typical checks include undeclared names, type mismatches, illegal usage of
subroutines, and others. An important sub-module called the Type Checker is
usually part of this module.
- Code generator: module that converts an intermediate representation of the
input program (syntax tree, in our case) to a sequence of commands in the
target language (SPARC assembly code).
- (Theory) Read Chapter 1 (Dragon book).
- (Practice) Read and familiarize yourself with C: either pickup Kernighan & Ritchie's
classic book or check the books that I've put on reserve at the library; I
will also give some examples in class. Things to focus about C: I/O (file
or otherwise), pointers and structures, memory allocations, preprocessor directives,
and style. Some UNIX-related things that
might be useful for us: make and makefiles, splitting source files, debugging
tools like gdb, profiling tools, and accessing the manual pages using man.
- (Theory) Read Chapter 2 (Dragon book) on syntax-directed translation and
- (Practice) Read and familiarize yourself with lex and yacc. The O'Reilly
book is quite helpful. Experiment with the calculator program example. Then
write the lex & yacc files for the Pascal
grammar given in Appendix A (Dragon book).
- (Theory) Read Chapter 4 (Dragon book) on various method for parsing. The
sequence that we will cover includes LL(1) top-down parsing, SLR(1) bottom-up
parsing, and LR(1) parsing. If time permits, we will cover LALR(1) parsing,
the method that yacc uses. We will skip the material on Operator Precedence
- (Practice) Build your stack of symbol (hash) tables. Use the hashpjw() function
from Chapter 7 (due to Peter J. Weinberger). Make sure you build a scaffolding
around this module before you connect it to the yacc Pascal parse (not the
- (Theory) Read bits and pieces of Chapters 5 and 6 (Dragon book). There is
a nice theory about building the abstract syntax trees and defining general
type expressions. We will only require the most basic things about syntax
trees and type expressions but it is good to know the bigger picture.
- (Practice) Build your syntax trees, mainly for statements in the grammar
and some of the declarative rules. The main reason we are building these trees
are so that we can perform semantic checks easily and so that the code generator
does not have a heart attack when it is called. The type checking that has
to be done is quite strict (and hence simple) but you might want to think about
some nice structures to capture "types".
- (Theory) Read Chapter 7 (Dragon book) on Run-Time Environments. Basically,
we need to get familiar with memory allocations, the run-time stack, the notion
of an activation record, flow of control, and some basic sense of the von Neumann
- (Practice) We will look at many examples of some SPARC assembly programs
(especially ones that are generated by the GNU C compiler). We will follow
almost all of the conventions of this open source compiler (most importantly,
its convention for the activation record). The main reason is that we will
use GCC as our assembler!
manual (note: this is for Version 9!)
A local copy of the manual in PDF format (courtesy of C. Isomaki)
- (Theory) Read Chapter 9 (Dragon book). Our focus will be on a simple algorithm
for generating code from DAG (read: syntax trees) given in Section 10. We
will adapt this for our SPARC-based compiler. Details will be covered in lectures.
Our Dragon course project
The goal of the project is to build a compiler for a large chunk of the
Pascal programming language that will output a SPARC assembly
This project will give you a chance to
apply skills you learned from other courses along with hopefully new ones:
To synchronize the compiler project, make your compiler take a single input Pascal file
(with suffix extension of ".p", say "main.p") and make it write the SPARC assembly output
to file with the same name except for the suffix (which is changed to ".s", so "main.s").
We will then use GCC to produce the executable by "gcc main.s". So if your compiler executable
is called yapc then the following sequence should be used to test
all our final executables.
- Automata theory (regular expressions and context-free grammars)
- Data structures and algorithms (lists, stacks, hash tables, binary trees, and any combination of these)
- Programming (hacking and debugging our way through the maze of several hundred lines of C code)
- Languages (Pascal [high-level], SPARC assembly [low-level], and C)
- UNIX environments and tools (lex & yacc).
Full report (due before dead week) containing at least the following items:
- Specification document (describes the software)
- Design document (describes the overall design of the software)
- Testing document
(describes the testing and verification process for the software; also bug status report)
- User document (user manual for compilation and proper usage of the software)
- Complete source listing (hardcopy and compressed targz file via email)
During the dead and final exam week,
schedule a meeting to present the online demonstration of the compiler.
Be prepared for some extra testing and questions about your compiler project.
Assignments (mini projects)