CS 3723
Programming Languages 
  Compiler Overview   

Outline of the Actions of a Compiler
Classic Form -- More Often Implemented as a Hybrid
[See Programming Language Translation for a restatement of the concepts below.]

In the first statement below, the two identifiers initial and rate are assumed to have floating point values from an earlier part of the program. In an actual compiler, the actions shown below are usually not separate passes. Instead, each phase takes output from the previous phase as it is produced and feeds input to the next phase as it is needed. (We will see how this works.)
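The "as it is produced, as it is needed" arrangement can be sketched with Python generators. This is only an illustration under simplifying assumptions: the two stage functions and the log list are hypothetical, and the "phases" here just pass whitespace-separated words along unchanged.

```python
def lexer(text, log):
    """Hypothetical first phase: hands out one token at a time, on demand."""
    for word in text.split():
        log.append(f"lex {word}")
        yield word

def parser(tokens, log):
    """Hypothetical second phase: pulls a token only when it needs one."""
    for tok in tokens:
        log.append(f"parse {tok}")
        yield tok

def run(text):
    """Drive the pipeline and return the order in which work was done."""
    log = []
    for _ in parser(lexer(text, log), log):
        pass
    return log
```

Calling run("a := b") gives a log in which the "lex" and "parse" entries alternate token by token, rather than all lexing finishing before any parsing starts.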

Lexical Analyzer: A relatively simple part of the compiler, this breaks the input sequence of characters into a sequence of tokens, which are the units of input fed to the next phase of the compiler. Each identifier is looked up in the growing SYMBOL TABLE (ST) at the left, and entered there if it is new. Each identifier token (denoted id1, id2, id3) also carries a pointer into the ST that says which identifier it is. The actual output of the lexical analyzer is in internal symbolic form, so it does not output the characters "id" or any such thing.
See Lexical Analysis for more detail.
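A minimal sketch of such a lexical analyzer follows. It rests on assumptions not in these notes: the token kinds and the regular expression are illustrative, the symbol table is a plain Python list, identifier tokens carry a 0-based index into it (so index 0 plays the role of id1), and the target name total in the test statement is a placeholder.

```python
import re

# Hypothetical token pattern; the kind names are illustrative only.
TOKEN_RE = re.compile(r"""
    (?P<id>[A-Za-z_][A-Za-z0-9_]*)   # identifiers
  | (?P<num>\d+(?:\.\d+)?)           # numeric constants
  | (?P<assign>:=)                   # assignment operator
  | (?P<op>[+*])                     # arithmetic operators
  | (?P<ws>\s+)                      # whitespace, discarded
""", re.VERBOSE)

def tokenize(source, symbol_table):
    """Yield (kind, value) tokens one at a time.

    Identifier tokens carry an index into symbol_table instead of their text.
    """
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue
        if kind == "id":
            if text not in symbol_table:
                symbol_table.append(text)   # the table grows as we go
            yield ("id", symbol_table.index(text))
        else:
            yield (kind, text)
```

Because tokenize is a generator, a later phase can pull tokens one at a time instead of waiting for the whole input to be scanned.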


Syntax Analyzer: This builds a syntax tree that represents how the statement is put together. In this case, it means (among other things) that the "*" operator has the highest precedence, "+" the next highest, and the ":=" assignment the lowest (dealt with last). In an actual compiler the syntax tree is often not explicitly constructed, but exists only implicitly. (We'll see how this works.)
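The precedence rules just described can be sketched as a small recursive-descent parser. This is one common technique, not necessarily the one used in this course. As assumptions: tokens are plain strings, the tree is built explicitly as nested tuples for the sake of illustration, and the function name and the target name total are hypothetical.

```python
def parse_assignment(tokens):
    """Parse `target := expr` into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # An operand: any token that is not an operator.
        if peek() is None or peek() in (":=", "+", "*"):
            raise SyntaxError(f"expected operand, got {peek()!r}")
        return advance()

    def term():
        # "*" binds tightest, so it is handled at the lowest level.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # "+" binds more loosely, so its operands are whole terms.
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != ":=":
        raise SyntaxError("expected ':='")
    advance()
    tree = (":=", target, expr())   # assignment is dealt with last
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

The nesting of the function calls mirrors the precedence levels: expr calls term, which calls factor, so a "*" subtree always ends up below a "+" node.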


Semantic Analyzer: This takes the meaning of constructs into account. In this case one can't directly multiply a float by an int, so the integer constant 60 must first be converted to float (written inttoreal(60) in the later steps).
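A minimal sketch of this type check, assuming expression trees are nested tuples such as ("*", "rate", "60"), that only the two types "int" and "float" exist, and that a hypothetical types dictionary stands in for the symbol table. The function name annotate is also an assumption.

```python
def annotate(node, types):
    """Return (new_node, type), wrapping an int operand in inttoreal
    wherever it meets a float operand."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"          # integer literal such as "60"
        return node, types[node]        # identifier: look up its declared type
    op, left, right = node
    left, left_type = annotate(left, types)
    right, right_type = annotate(right, types)
    if left_type != right_type:
        # Mixed int/float: convert the int side so both operands are float.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "float"
    return (op, left, right), left_type
```

Only the arithmetic expression is handled here. An assignment would need its own rule, since the target cannot be converted.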


Intermediate Code Generator: Converts the tree into a sequence of statements in a simple machine-oriented language. This step might be skipped.
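One simple machine-oriented form is three-address code, where each statement applies at most one operator. The sketch below walks a nested-tuple tree and emits such statements. The tree shape, the temp1, temp2, ... naming, and the function name gen_tac are assumptions made for illustration.

```python
import itertools

def gen_tac(tree):
    """Flatten an assignment tree into a list of three-address statements."""
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"temp{next(counter)}"

    def walk(node):
        # Leaves (identifiers, constants) need no code; just name them.
        if isinstance(node, str):
            return node
        if node[0] == "inttoreal":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, rhs = tree               # assumes the root is ":="
    code.append(f"{target} := {walk(rhs)}")
    return code
```

Every interior node of the tree gets its own temporary, which is why this unoptimized output uses more temporaries than strictly necessary.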


Code Optimizer: Tries to get the computation done with as few statements as possible, eliminating two temporary variables in the process. Here it also converts the "inttoreal(60)" into the float constant "60.0" at compile time, so that no conversion is needed at run time. This stage might be carried out in several places.
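The two improvements mentioned here can be sketched as a single pass over the intermediate code. As assumptions: each statement is a hypothetical tuple (dest, op, arg1, arg2), a plain copy uses the made-up op name "copy", and a temporary that is copied immediately after being computed is not used anywhere else. A real optimizer would have to verify that last point.

```python
def optimize(code):
    """Fold inttoreal of a literal and merge a trailing copy into the
    statement that computed its source."""
    consts = {}      # temporaries known to hold a compile-time constant
    out = []
    for dest, op, a, b in code:
        # Replace any operand that is a known constant temporary.
        a = consts.get(a, a)
        b = consts.get(b, b)
        if op == "inttoreal" and a.isdigit():
            consts[dest] = a + ".0"      # do the conversion at compile time
            continue
        if op == "copy" and out and out[-1][0] == a and a.startswith("temp"):
            # "t := x op y; d := t" becomes "d := x op y"
            # (assumes t is not used again later).
            prev = out.pop()
            out.append((dest,) + prev[1:])
            continue
        out.append((dest, op, a, b))
    return out
```

Applied to the four-statement form of the example, this leaves two statements and one temporary. The surviving temporary keeps its original number, since this sketch does not renumber.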


Code Generator: This generates machine code. In practice it is often true machine code rather than assembler code as shown here, since otherwise the compiler would need to feed this output into an assembler, increasing compile time. (Some compilers do take the assembler route: it was common in the "old" (Paleolithic) days, and GCC, for example, still works this way.)
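A deliberately naive sketch of this last step, emitting assembler-style text for readability. Everything about the target is an assumption: the mnemonics MOVF, ADDF and MULF, the single register R0, the "#" prefix for constants, and the (dest, op, arg1, arg2) statement tuples are all hypothetical and do not describe a real machine.

```python
# Hypothetical mnemonics for a made-up two-address floating point machine.
MNEMONIC = {"+": "ADDF", "*": "MULF"}

def operand(name):
    """Constants get an immediate-mode '#' prefix; identifiers are used as-is."""
    return f"#{name}" if name[0].isdigit() else name

def gen_asm(code):
    """Translate (dest, op, arg1, arg2) statements one at a time:
    load arg1 into R0, apply op with arg2, store R0 into dest."""
    asm = []
    for dest, op, a, b in code:
        asm.append(f"MOVF {operand(a)}, R0")
        asm.append(f"{MNEMONIC[op]} {operand(b)}, R0")
        asm.append(f"MOVF R0, {dest}")
    return asm
```

This version stores every result back to memory. A better generator would keep temporaries in registers and avoid the extra store and reload.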

( Revision date: 2014-05-21. Please use ISO 8601, the International Standard.)