CS445/CS545 Compiler Construction
Instructor: Christino Tamon
Lectures: MWF 8-9am Science Center 354
Laboratory meets: F 8am Rowley 142
Office hours: Science Center 373 MW 9-10am, 11-noon F 9-10am
Pre-requisites: CS344, CS345.
A study of compiler design. Overview of the compilation process. Formal
definition of syntax, lexical scanning, parsing including LL and LR
grammars, run-time structures, intermediate code generation, and storage
allocation. Students are expected to develop a compiler for a substantial
subset of a high-level language using compiler tools such as lex and yacc.
Course grading: Project 60%, Exams 30%, Miscellaneous 10%
- Aho, Sethi and Ullman. Compilers. Addison-Wesley Publishing Company, 1986.
- Appel. Modern Compiler Implementation in C. Cambridge University Press, 1998.
- Kernighan and Ritchie. The C Programming Language. Second edition. Prentice-Hall, 1988. (on reserve)
- Levine, Mason, and Brown. lex & yacc. O'Reilly and Associates, Inc., 1992.
- Paul. SPARC Architecture, Assembly Language Programming, & C. Prentice-Hall, 1994.
- Kernighan and Pike. The Practice of Programming. Addison-Wesley, 1999.
- Language translation. Compilation. Overview.
- Lexical Analysis. Regular expressions.
- Syntax Analysis. Theory of Parsing: top-down LL(1); bottom-up SLR(1), LR(1) (and LALR(1)).
- Semantic Analysis. Parse trees. Syntax-directed translation. Type checking.
- Code generation. Runtime environments.
Readings and notes
- Review of course and project: we are going to build our own Dragon (compiler)
for a Pascal-like programming language (source language) that will produce SPARC assembly (target language)
output. Our compiler (language translator) will be implemented in C. Core components of the compiler include:
- Scanner: module that tokenize the input stream (of characters) into a stream of tokens
(legal lexical elements of the language). This component can be speficied using
the tools of Regular Expressions and the module itself is a DFA program. We will build
our scanner using lex (or flex under Linux/BSD).
- Parser: module that checks whether the stream of tokens is syntactically correct according
to the grammar of the source language. This component can be specified using the tools
of Context-free Grammars and the module itself is a PDA program. A very important task
of the parser is also to build the parse (syntax) tree representation of the input program.
We will build our parser using yacc (or bison under Linux/BSD)
- Symbol Table: module that stores information about names that appear in the input program.
A common data structure used for this module is a hash table with chaining.
- Semantic analyzer: module that performs semantic checks on the parse tree. Typical checks
include undeclared names, type mismatches, illegal usage of subroutines, and others.
An important sub-module called the Type Checker is usually part of this module.
- Code generator: module that converts an intermediate representation of the input program
(syntax tree, in our case) to a sequence of commands in the target language (SPARC assembly code).
- (Theory) Read Chapter 1 (Dragon book).
- (Practice) Read and familiarize yourself with C: either pickup Kernighan & Ritchie's
classic book or check the books that I've put on reserve at the library; I will also give some
examples in class. Things to focus about C: I/O (file or otherwise), pointers and structures,
memory allocations, preprocessor directives, and style. Some UNIX-related things that
might be useful for us: make and makefiles, splitting source files, debugging tools
like gdb, profiling tools, and accessing the manual pages using man.
The goal of the project is to build a compiler for a large chunk of a Pascal-like programming language.
The output of the compiler is an executable SPARC assembly program. If the compiler is called
yapc (Yet Another Pascal-like Compiler), then a typical happy ending scenario is as follows:
The first line should never result in a segmentation fault (i.e. the compiler should never crash)
and the second line will execute whatever is meant to be done in the Pascal source code
Since we are generating SPARC assembly code, some local machines that would be able to run these are
crux and atlantis.
% yapc -o boo main.p
Assignments (mini projects)
- C & UNIX
- Scanner (deadline: 02/02/01)
- Symbol table (deadline: 02/15/01)
- Parser (deadline: 02/28/01)
- Semantic analyzer; type checker (deadline: 03/30/01)
- Code generator (deadline: 04/25/01)