CS 3723 Programming Languages
Programming Language Translation
The structure of a typical compiler is illustrated in Figure 3.2 of the
text. The three basic phases of translation are lexical analysis,
syntactic analysis, and semantic analysis, as shown in the figure. In many
cases there are additional code generation phases such as optimization and
translation from an intermediate form to the code for a target platform.
Some general characteristics of the phases of translation are:
- Lexical analysis (sometimes also called "scanning")
- The objective of the lexical analysis phase is to convert the input
character string (text) representing a program into meaningful chunks
such as reserved words, identifiers, operators, symbols, and literals.
The output of the lexical analyzer is a sequence of tokens, where
each token encodes the type (reserved word, identifier, etc.) and value
(which reserved word, what operator, etc.) of the token.
- Syntactic analysis (sometimes also called "parsing")
- The syntactic analysis phase analyzes the sequence of tokens to
determine the structure and semantic components of the program, such as
if statements, function definitions, while loops, and assignment
statements. The output of the parser is an encoding, such as a parse
tree (discussed later), that represents the syntactic structure of the
program.
- Semantic analysis
- The semantic analysis phase gives meaning to the program's internal
syntactic representation by translating the parse tree or other
output from the parser into code for a virtual or actual computer.
If code for an actual computer is produced then this is the first
phase that is platform dependent. If code for a virtual machine is
produced then the first three phases are all platform independent.
- Optimization
- Optimization is optional and might be omitted or combined
with another phase. Optimization can be platform independent, such as
eliminating the repeated calculation of expressions whose value does not
change, or platform dependent, such as eliminating redundant loads for
values that are already in registers, or using an increment instruction
rather than adding 1.
- Code generation
- If it is not a part of semantic analysis, code generation is just the
process of translating the internal virtual machine code into code for
a specific platform.
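As a rough illustration of how these phases fit together, the following Python sketch runs a toy expression language through each of them. Everything in it is an assumption made for illustration and does not come from the text: the token types (NUM, ID, OP), the grammar (integers, identifiers, +, *, and parentheses), the nested-tuple parse tree, constant folding as the sample platform-independent optimization, and the PUSH/LOAD/ADD/MUL instructions of a hypothetical virtual stack machine.

```python
import re

# Hypothetical token pattern: an integer, an identifier, or any other
# single non-whitespace character. Whitespace between tokens is skipped.
TOKEN_RE = re.compile(r"(\d+)|([A-Za-z_]\w*)|(\S)")

def tokenize(text):
    """Lexical analysis: character string -> list of (type, value) tokens."""
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif name:
            tokens.append(("ID", name))
        elif symbol in "+*()":
            tokens.append(("OP", symbol))
        else:
            raise SyntaxError(f"unexpected character {symbol!r}")
    return tokens

def parse(tokens):
    """Syntactic analysis: tokens -> parse tree (nested tuples)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():                      # expr := term ('+' term)*
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():                      # term := factor ('*' factor)*
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():                    # factor := NUM | ID | '(' expr ')'
        kind, value = advance()
        if kind in ("NUM", "ID"):
            return (kind, value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

def fold(tree):
    """Optimization (platform independent): evaluate constant subexpressions."""
    if tree[0] in ("NUM", "ID"):
        return tree
    op, left, right = tree[0], fold(tree[1]), fold(tree[2])
    if left[0] == "NUM" and right[0] == "NUM":
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("NUM", value)
    return (op, left, right)

def translate(tree):
    """Semantic analysis in the sense used above: parse tree -> virtual machine code."""
    kind = tree[0]
    if kind == "NUM":
        return [("PUSH", tree[1])]
    if kind == "ID":
        return [("LOAD", tree[1])]
    op = "ADD" if kind == "+" else "MUL"
    return translate(tree[1]) + translate(tree[2]) + [(op,)]

tree = parse(tokenize("x + 2 * 3"))
print(tree)                   # ('+', ('ID', 'x'), ('*', ('NUM', 2), ('NUM', 3)))
print(translate(fold(tree)))  # [('LOAD', 'x'), ('PUSH', 6), ('ADD',)]
```

In this sketch every function is platform independent. A separate code generation phase would map the PUSH/LOAD/ADD/MUL instructions to the instructions of a particular machine.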
The symbol table and other internal tables provide overall information about
identifiers and other components that are used in the program. For example,
each variable that is used in the program has an entry in the symbol table
giving the attributes of the variable that are known (type, scope, etc.).
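One possible shape for such a table is sketched below in Python. The class name, its methods, and the attribute names ("type", "depth") are hypothetical choices for this example; real translators record many more attributes and often use different data structures.

```python
class SymbolTable:
    """Minimal scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]           # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        # Record the attributes known so far for this identifier.
        scope[name] = {"type": type_name, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")

table = SymbolTable()
table.declare("count", "int")
table.enter_scope()
table.declare("count", "float")      # shadows the outer declaration
print(table.lookup("count"))         # {'type': 'float', 'depth': 1}
table.exit_scope()
print(table.lookup("count"))         # {'type': 'int', 'depth': 0}
```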
Because the lexical and syntactic phases are separate, the syntactic phase
does not have to worry about the mundane aspect of identifying the
meaningful sequences of characters that are of interest, and it can
concentrate on identifying the structure and the semantic components of the
program. The separation also allows a different algorithmic paradigm to be
used in each phase, which improves efficiency because typical parsing
algorithms are not the most efficient algorithms for lexical analysis.
The different phases of a translator can be done as separate passes or as
subprograms that pass data to each other. That is, the lexical analyzer can
convert the entire input text to tokens, then the parser can translate the
sequence of input tokens to a parse tree or other encoding, and the semantic
analyzer can then translate the entire parser output to code in separate
passes over the program. Alternatively, the tokens can be fed one at a time
to the parser, which can then produce all of the parse tree and send it to
the semantic analyzer or send parts of the parse tree to the semantic
analyzer as they are produced.
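The difference between the two organizations can be sketched with Python generators. The "lexer" and "parser" here are deliberately trivial stand-ins invented for this example (the lexer splits on whitespace, and the parser groups tokens into statements ending in ";"); the log only records the order in which the two phases do their work.

```python
def lex(text, log):
    """Toy lexer: yields whitespace-separated tokens one at a time."""
    for word in text.split():
        log.append(f"lex {word}")
        yield word

def parse(tokens, log):
    """Toy parser: groups tokens into statements terminated by ';'."""
    statement = []
    for tok in tokens:
        log.append(f"parse {tok}")
        if tok == ";":
            yield statement
            statement = []
        else:
            statement.append(tok)

def separate_passes(text):
    log = []
    tokens = list(lex(text, log))          # pass 1: whole input -> all tokens
    statements = list(parse(tokens, log))  # pass 2: all tokens -> all statements
    return statements, log

def pipelined(text):
    log = []
    # The parser pulls tokens from the lexer one at a time, on demand.
    statements = list(parse(lex(text, log), log))
    return statements, log

print(separate_passes("a = 1 ;")[1])
# ['lex a', 'lex =', 'lex 1', 'lex ;', 'parse a', 'parse =', 'parse 1', 'parse ;']
print(pipelined("a = 1 ;")[1])
# ['lex a', 'parse a', 'lex =', 'parse =', 'lex 1', 'parse 1', 'lex ;', 'parse ;']
```

Both organizations produce the same statements; only the interleaving of the phases differs.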
Note that an interpreter would normally have the first three phases and perhaps
some optimization as well. An interpreter usually simulates the execution of
virtual machine code produced during semantic analysis, rather than directly
executing source code.
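A minimal sketch of such a simulation, in Python, is shown below. The instruction set (PUSH, LOAD, ADD, MUL) and the stack-based design are assumptions made for this example; they stand in for whatever virtual machine code a real interpreter's earlier phases would produce.

```python
def run(code, env):
    """Simulate a hypothetical stack machine executing a list of instructions."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":                 # push a constant
            stack.append(instr[1])
        elif op == "LOAD":               # push the value of a variable
            stack.append(env[instr[1]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()

# Virtual machine code that the earlier phases might produce for "x + 2 * 3".
code = [("LOAD", "x"), ("PUSH", 2), ("PUSH", 3), ("MUL",), ("ADD",)]
print(run(code, {"x": 4}))   # 10
```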
It is interesting to note that some components of a translator can be shared
among different translators. For example, lexical analysis depends only on
the input character set and the rules for combining sequences of characters and
identifying special sequences such as reserved words and operators. Thus the
translators for languages such as Java, C, and C++ might all be able to use the
same lexical analyzer if it were constructed properly. The syntactic
analyzer (parser) is independent of the target platform, as is the lexical
analyzer, so
these two components can be used in a translator for any target platform.
If the semantic analyzer produces code for a virtual machine, then it too
is platform independent and can be used in translators for multiple
platforms. Likewise, if the parsers for two different languages produce parse
trees using the same encoding, it might be possible to use the same semantic
analyzer in the translator for more than one language. Similarly, a code
generator is target platform specific in that it generates code for a
specific computer, but if the internal code produced by the translators
for two different
languages is the same, then the same code generator could be used in
the translators for the two languages.
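The last point can be sketched in Python as follows. The two toy source languages (postfix and prefix arithmetic), the shared internal code (PUSH/ADD/MUL tuples), and the textual output of the code generator for an imagined stack-based target are all hypothetical, chosen only to show two front ends sharing one back end.

```python
OPS = {"+": "ADD", "*": "MUL"}

def front_end_postfix(text):
    """Toy language A: postfix notation, e.g. "1 2 +"."""
    return [(OPS[w],) if w in OPS else ("PUSH", int(w)) for w in text.split()]

def front_end_prefix(text):
    """Toy language B: prefix notation with binary operators, e.g. "+ 1 2"."""
    words = iter(text.split())

    def expr():
        w = next(words)
        if w in OPS:
            left = expr()
            right = expr()
            return left + right + [(OPS[w],)]
        return [("PUSH", int(w))]

    return expr()

def generate(code):
    """Shared code generator: internal code -> text for an imagined target."""
    lines = []
    for instr in code:
        if instr[0] == "PUSH":
            lines.append(f"push {instr[1]}")
        else:
            lines.append(instr[0].lower())
    return "\n".join(lines)

# Both front ends produce the same internal code, so one generator serves both.
print(front_end_postfix("1 2 +") == front_end_prefix("+ 1 2"))   # True
print(generate(front_end_prefix("* 3 + 1 2")))
# push 3
# push 1
# push 2
# add
# mul
```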
(Revision date: 2013-12-20.
Please use ISO
8601, the International Standard.)