|
|
CS 3723
Programming Languages
Fall 2013 |
Course Overview
|
The main objective of the semester's first 8 weeks is
an understanding of syntax-directed translation.
Here is an online definition:
In simplest terms, 'Syntax Directed Translation' means
driving the entire compilation (translation) process with
the syntax recognizer (the parser).
Using a number of specific examples we will see how this
works in practice. This includes the entire overall approach
to the work in Parts I and II below.
I. Regular Expression Processor.
During the first 2 weeks we will show how a programming
language can handle a regular expression.
Specifically, we will show how the language can take an
input regular expression R and determine the match data
for an input string S.
See material about regular expressions:
As an example, suppose
R = /ab|aac/ and
S = cccaabaccc. Here we are looking for
the first occurrence of either "ab" or "aac" in the string.
In case of a match, we want what comes before the match,
the data that matched, and what comes after the match.
In this case, "ccca" comes before, "ab" is the match,
and "accc" comes after.
A simple Ruby program that carries out this match
is: abaac.rb.
We want a program that will handle an arbitrarily long and complex
regular expression. The running example will be /ab|aac/.
In order to get the match data and to match an interior portion
of a string, it is necessary to add .*
to the start of
R so that anything can come at the beginning of the string.
This makes R look like:
The parentheses are required because the "or" operator
( | )
has lower precedence than the "star" operator
( * ).
The concatenation operation (with only an implicit operator)
has precedence between these two.
The processing has four main steps:
- Unravel the Syntax of the
Regular Expression R. The term for this is to parse
the regular expression. Later in the course we will see how
to use a formal grammar to describe all regular expressions.
Then we will have a parsing algorithm that will allow
to build a parse tree for R. For the specific
example above, the parse tree would look like the following
diagram. Here the leaves (in green)
are single characters
(or in the case of dot, a single character class, namely
any character). The other nodes are operators
(in red). A plus
sign is used for concatenation.
Parse Tree for:
.*(ab/aac)
|
+
/ \
/ \
* |
/ / \
. + \
/ \ \
a b +
/ \
+ c
/ \
a a
|
We will be studying grammars and parsers in the weeks ahead,
but we haven't done it yet. In the meantime we will
simply convert the regular expression R = /.*(ab/aac)/
to RPN (reverse Polish notation):
.*ab+aa+c+|+$, where $ is used as an end marker.
- Process the RPN Version of the
Regular Expression R.
For this we use an auxiliary stack to hold indexes of the
start and terminal states of intermediate NFAs.
Basically, the stack holds intermediate versions of the
growing final NFA.
The RPN is scanned from left to right. Operands cause a
stand-alone NFA to be pushed onto the stack.
Operators are applied to the top NFA on the stack
(in the case of *) or to the top two NFAs on the
stack (in the case of + or |). As the RPN is
processed, the program will carry out the next
part of the construction.
- Build FA.
As indicated in part 2 above, the NFA will be built up during
the evaluation of the RPN.
Later, after we study parsers, we will see that steps
1, 2, and 3 above can be carried out with simple additions
to the parser. There will be no need for the intermediate
RPN, and no need for the auxiliary stack (which is replaced
by recursion.) This is the syntax-directed translation
process mentioned at the start.
- Simulate FA. Next simulate the
actions of the NFA N
as it processes the string S, character-by-character.
In particular, as the NFA processes each character in turn,
our program keeps track of all the possible states
that the NFA might be in.
You will be getting an overview of how the above process works.
You can see how the steps work, and what an implementing
program looks like. However, you will at most be programming
a few small pieces of all these steps.
II. Compiler for a Small Language.
During the next 5 or 6 weeks we will write a
compiler program. This will be much more challenging than
Part I because you will write complete working code in C
or in Java. It will follow the steps in
the diagram Compilers,
but several steps will be left out and other parts simplified.
Still, it will be a "real" compiler and will illustrate how
the process works. In particular:
- The lexical analyzer (also known as a scanner)
will be extremely simple, and
there will be no symbol table.
- We will have a proper syntax analyzer
(also known as a parser), although not as complex
as those in current use.
- Almost no semantic analysis.
- No intermediate code generator.
- Generate MIPS assembly code for floating point numbers
(doubles). This generation will be accomplished
using syntax-directed translation.
III. Investigate Several Other
"Strange" Languages.
These may include Lisp and Ruby.
Revision date: 2013-07-23.
(Please use ISO
8601, the International Standard.)
|