CS445/CS545 Compiler Construction
Lectures: TR 9:30-10:45am Science Center 342
hours: Science Center 373 TR 8:30-9:30,
Pre-requisites: CS344, CS345.
SYLLABUS: A study of compiler design. Overview of the compilation
process. Formal definition of syntax, lexical scanning, parsing including LL and
LR grammars, run-time structures, intermediate code generation, and storage
allocation. Students are expected to develop a compiler for a substantial subset
of a high-level language using compiler tools such as lex and yacc.
GRADING (tentative) Assignments & Quizzes 50%; Midterm 20%;
Final Exam 30%
- (Main Text) Alfred V. Aho, Ravi Sethi and Jeffrey
D. Ullman. "Compilers: Principles, Techniques, and Tools," Addison-Wesley
- Brian W. Kernighan and Dennis M. Ritchie, "The C Programming Language,"
2nd edition, Prentice-Hall (1988) (on reserve)
- John Levine, Tony Mason, and Jason Brown, "lex & yacc," 2nd edition,
O'Reilly (1992) (on reserve)
- Richard P. Paul, "SPARC Architecture, Assembly Language Programming, &
C," 2nd edition, Prentice-Hall (1994) (on reserve)
- Language translation. Compilation. Overview.
- Lexical Analysis. Regular expressions.
- Syntax Analysis. Theory of Parsing: top-down LL(1); bottom-up SLR(1),
LR(1) (and LALR(1)).
- Semantic Analysis. Parse trees. Syntax-directed translation. Type
- Code generation. Runtime environments.
The objective of this course is to learn the
fundamentals of compiler theory and its application in constructing a real
working compiler. The specific outcomes are:
- Proficient in C programming.
- Knowledge of useful UNIX tools for software design and compiler
construction (make, lex, yacc).
- Understanding of the basics of SPARC assembly language and its run-time
- Ability to build a Pascal-like compiler, written in C, for the SPARC
architecture in the UNIX programming environment.
Although attendance is not mandatory, students are responsible
for all course materials covered in lectures and any exams given during class
periods. Students that need to make up missing course work must follow the
official Clarkson policies regarding these matters. All students must submit
their own work; the exchange of ideas are encouraged but ultimately the
submitted work must be the student's own. Please refer to the Clarkson
University Regulations for more guidelines on academic integrity and related
READINGS/NOTES (under construction)
- Assignment 1:
Implement the symbol table module which
should be a stack of hash tables.
The hash table should support bucket
chaining using the hashpjw function (page 436, Chapter 7, Dragon book). Run
some statistics on this hash function to see how it distributes a large amount
of strings; some test file will be supplied soon. The goal here is to
familiarize ourselves with C and the UNIX programming environment, and to
start building a piece of the compiler.
Timeline: 1-2 weeks
- Assignment 2:
Implement the lex scanner and yacc
parser for the simple Pascal subset given in Appendix A of the Dragon
Follow the updates mentioned in lectures. Then, fold the symbol
table into the mix with the yacc parser. Make sure the scoping rules work.
Timeline: 1 week
- Assignment 3:
Implement the syntax tree module for
certain rules of the grammar.
Start with the grammar rules that describe
expressions (expression, simple_expression, term, factor). Some tree node
types that will be useful are:
integer number, real number, string name, operator (relational,
additive, and multiplicative), function call, array indexing. As
in the other modules, do a scaffold test of the tree module before you
incorporate it into the yacc parser. Beware of memory management problems. Use
the GNU profiler "gprof" to help with some profiling tasks. For example,
create a print_tree function for visual testing and make the parse show the
expression tree as output.
Timeline: 1 week
- Assignment 4:
Implement the semantic analyzer for the
Build additional syntax trees for the rules involving statements,
such as, compound-statement, statement-list, statement, and other rules
dependent on these; variable declarations, such as, identifier-list,
arguments, and parameter-list. Use the syntax trees to perform the required
semantic checks; see checks.
Timeline: 2-3 weeks
- Assignment 5:
Implement the code generator.
gencode algorithm from Chapter 9.10 for the SPARC assembly language. This
algorithm handles the conversion of syntax trees for expressions into SPARC
assembly sequence of instructions. Then, extend this to handle code generation
for statements, bodies of subprograms, and beyond. Experiment with SPARC using
the gcc compiler (which we will use as our assembler).
Timeline: 2-3 weeks
THE DRAGON PROJECT
The goal of the project is to build a compiler for a high-level programming
language that outputs a low-level assembly program. This project will give us a
chance to use knowledge from earlier (pre-requisite) courses and to acquire some
new skills, such as:
- Automata theory (regular expressions and context-free grammars)
- Data structures and algorithms (lists, stacks, hash tables, binary trees,
and any combination of these)
- Programming (coding and debugging through a maze of several hundred lines
of C code)
- Languages (Pascal [high-level], SPARC assembly [low-level], and C
- UNIX environment and tools (lex [lexical analyzer] and yacc [Yet Another
Submit a full report
(hardcopy) containing at least the following items:
Submit the complete source
listing (compressed tar file) via email.
- Specification document (description of the software)
- Design document (description of the overall design of the software)
- Testing document (description of the testing and verification
process for the software, including a bug status report)
- User document (description of a user manual for compilation and
proper usage of the software)
During the last two weeks, schedule a meeting to
present an online demonstration of the compiler. Be prepared for some extra
testing and questions about your compiler project.
Here are some blueprints for the Pascal part
of the compiler project:
To synchronize this component of the project, make your
compiler take a single input Pascal file (with suffix extension of ".p", say
"main.p") and make it write the SPARC assembly output to file with the same name
except for the suffix (which is changed to ".s", so "main.s"). We will then use
GCC to produce the executable by running "gcc main.s". So if the compiler
executable is called "mypc", then the following UNIX sequence should be used for
testing all source files:
- Lex scanner & Yacc parser:
- adapting the Pascal-like source language description (Dragon book,
Appendix A) and including some grammar changes, such as:
- Split the number token into separate integer and real components.
- Allow nested subprogram declarations in subprogram declaration.
- Allow array access to appear on the RHS of an assignment statement.
- Adapt the rules involving the sign variable.
- Arrange for unary minus to be handled in factor.
- Allowing procedure calls with no arguments.
- adding extra lexical tasks: handling of real numbers, allowing comments,
- Yacc parser + Symbol table:
- implementing a stack of hash tables for handling nested scoping
- connecting the symbol table to the yacc parser for detecting identifier
problems, such as redeclaration of names within the same scope, usage of
undeclared names, potential improper usage of names, etc.
- Semantic analyzer & Type checker:
- generating a syntax (parse) tree representation of (parts of) the input
- type checking of expressions and statements
- Code generator:
- implementing a SPARC code generator for expression trees (Section 9.10,
pp 557, Dragon book).
- allowing function calls and return values.
- handling statements.
- handling non-local access of names within nested scopes.
- handling Input/Output (read and write procedure calls).
mypc main.p; gcc main.s; a.out