CS 3723 Programming Languages
Shift-Reduce Parsers
Overview: Parsers are algorithms
and programs that
unravel the syntax of a sentence described by a formal grammar.
Most parsers will not handle an arbitrary grammar, but place limitations
on the form of grammar allowed. The parsers described here go through the
motions of building a parse tree without actually generating the tree.
While the parse is going on, extra code can carry out a variety of
tasks related to translating the source into some other form.
Shift-reduce (S-R) Parsers:
These include the LALR
parsers, which are the
most sophisticated parsers in common use today (the "industry standard").
What you see on this page uses some of the same ideas, but is
orders of magnitude simpler.
These parsers start with an unambiguous grammar and with
a sentence in the language defined by the grammar.
They construct a rightmost derivation backwards by
performing a series of reductions. As shown below, the setup
consists of a sequence of input symbols (all terminals)
that are used up one at a time, referred to below as
the Rest of Input. There is a place for the
Current Symbol under consideration (also a terminal), and
a Stack of terminals and non-terminals.
At each step, the symbol on top of the stack is looked up against
the current symbol in a table called the Shift-Reduce Table,
which has four kinds of entries:
- s for "shift", meaning to shift the current symbol onto
the top of the stack and let the next input symbol become the
new current symbol.
- r for "reduce", meaning to find the right side of a grammar
rule on the top of the stack and replace (reduce) it with the left
side.
- acc for "accept", meaning to accept the input.
- a blank, which means an error has occurred.
A final rule is needed when two different reductions are possible
at the top of the stack, as happens in the example below when E + T
ends up on top: one could reduce by E --> E + T or by E --> T.
In that case use the longest match, so in this example one would
reduce E + T to E, rather than reducing just T to E.
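As one way to make this loop concrete, here is a rough sketch in Python.
The function name sr_parse, the dict-of-dicts table format, and the
(left side, right-side list) rule format are just illustrative choices;
matching encodings of the example grammar and table appear further down
this page.

    def sr_parse(tokens, table, rules):
        """Shift-reduce parse of a token list that starts and ends with '$'.
        Prints one line per step: stack, current symbol, rest of input, action."""
        stack = [tokens[0]]                  # initially shift the starting $
        pos = 1                              # index of the current symbol
        while True:
            curr, rest = tokens[pos], tokens[pos + 1:]
            entry = table.get(stack[-1], {}).get(curr)   # blank entry means error
            if entry == "s":                 # shift
                action = "shift"
            elif entry == "r":               # reduce by the longest right side
                # assumes at least one rule's right side matches the stack top
                lhs, rhs = max(((l, r) for l, r in rules if stack[-len(r):] == r),
                               key=lambda rule: len(rule[1]))
                action = "reduce: " + lhs + " --> " + " ".join(rhs)
            else:
                action = "accept" if entry == "acc" else "error"
            print(f"{' '.join(stack):<14} {curr:<4} {' '.join(rest):<18} {action}")
            if entry == "s":                 # push current symbol, advance input
                stack.append(curr)
                pos += 1
            elif entry == "r":               # pop the right side, push the left side
                stack[-len(rhs):] = [lhs]
            elif entry == "acc":             # the sentence is in the language
                return True
            else:                            # blank table entry: syntax error
                raise SyntaxError(f"no table entry for ({stack[-1]}, {curr})")

The max(...) call implements the longest-match rule: among all grammar rules
whose right side sits on top of the stack, it picks the one with the longest
right side.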
The problem with the specific approach given here is that
the collection of grammars one can use is severely limited.
A refinement of this approach (the LALR parsers mentioned above)
allows a much broader collection of grammars (though not all
context-free grammars), and the grammars allowed are sufficient for all
programming language needs that I know of. (Well, as with Python, some
issues are often handled by the scanner (= lexical analyzer).)
Initial Example:
Consider the following grammar:
Grammar: Arithmetic Expressions

    P --> E
    E --> E + T | T
    T --> T * F | F
    F --> ( E ) | id
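For use with the sr_parse sketch above, this grammar can be written down
as plain data; the name RULES and the tuple format are again just
illustrative choices:

    # Each rule is (left side, list of right-side symbols).
    RULES = [
        ("P", ["E"]),
        ("E", ["E", "+", "T"]),
        ("E", ["T"]),
        ("T", ["T", "*", "F"]),
        ("T", ["F"]),
        ("F", ["(", "E", ")"]),
        ("F", ["id"]),
    ]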
Use the following table for this grammar:
Parser: Shift-Reduce Table

                   Current Symbol
       +-----+-----+-----+-----+-----+-----+
       | id  |  *  |  +  |  (  |  )  |  $  |
--+----+-----+-----+-----+-----+-----+-----+
S | P  |     |     |     |     |     | acc |   (s = "shift")
t | E  |     |     |  s  |     |  s  |  r  |
a | T  |     |  s  |  r  |     |  r  |  r  |   (r = "reduce")
c | F  |     |  r  |  r  |     |  r  |  r  |
k | id |     |  r  |  r  |     |  r  |  r  |   (acc = "accept")
  | *  |  s  |     |     |  s  |     |     |
T | +  |  s  |     |     |  s  |     |     |   (empty = "error")
o | (  |  s  |     |     |  s  |     |     |
p | )  |     |  r  |  r  |     |  r  |  r  |
  | $  |  s  |     |     |  s  |     |     |
--+----+-----+-----+-----+-----+-----+-----+
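Similarly, the table above can be written as nested dictionaries (the name
TABLE and this layout are illustrative, not anything standard); blank cells
are simply omitted, so a failed lookup plays the role of the error entry:

    # TABLE[stack_top][current_symbol] is "s", "r", or "acc"; a missing
    # entry corresponds to a blank (error) cell in the table above.
    TABLE = {
        "P":  {"$": "acc"},
        "E":  {"+": "s", ")": "s", "$": "r"},
        "T":  {"*": "s", "+": "r", ")": "r", "$": "r"},
        "F":  {"*": "r", "+": "r", ")": "r", "$": "r"},
        "id": {"*": "r", "+": "r", ")": "r", "$": "r"},
        "*":  {"id": "s", "(": "s"},
        "+":  {"id": "s", "(": "s"},
        "(":  {"id": "s", "(": "s"},
        ")":  {"*": "r", "+": "r", ")": "r", "$": "r"},
        "$":  {"id": "s", "(": "s"},
    }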
The table below shows the shift-reduce parse of the following sentence,
giving the stack, current symbol, remaining symbols,
and next action to take at each stage. The sentence has the
extra artificial symbol $ stuck in at the beginning
and at the end. Any sentence described by the grammar can be parsed
in exactly the same way as this example.
Input Sentence

    $ ( id + id ) * id $
(You should initially shift the starting $.)
Shift-Reduce Actions

Stack            Curr  Rest of Input       Action
(top at right)   Sym
----------------------------------------------------------------
$                (     id + id ) * id $   shift
$ (              id    + id ) * id $      shift
$ ( id           +     id ) * id $        reduce: F --> id
$ ( F            +     id ) * id $        reduce: T --> F
$ ( T            +     id ) * id $        reduce: E --> T
$ ( E            +     id ) * id $        shift
$ ( E +          id    ) * id $           shift
$ ( E + id       )     * id $             reduce: F --> id
$ ( E + F        )     * id $             reduce: T --> F
$ ( E + T        )     * id $             reduce: E --> E + T
$ ( E            )     * id $             shift
$ ( E )          *     id $               reduce: F --> ( E )
$ F              *     id $               reduce: T --> F
$ T              *     id $               shift
$ T *            id    $                  shift
$ T * id         $                        reduce: F --> id
$ T * F          $                        reduce: T --> T * F
$ T              $                        reduce: E --> T
$ E              $                        reduce: P --> E
$ P              $                        accept
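Assuming the sr_parse sketch together with the RULES and TABLE encodings
given earlier, a call like the following should print essentially this
same trace:

    # Assumes sr_parse, RULES, and TABLE as sketched earlier on this page.
    sentence = "$ ( id + id ) * id $".split()
    sr_parse(sentence, TABLE, RULES)    # one printed line per step; returns True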
Notice that the sequence of reductions gives the following rightmost
derivation in reverse:

Rightmost Derivation of ( id + id ) * id

P ==> E
==> T
==> T * F
==> T * id
==> F * id
==> ( E ) * id
==> ( E + T ) * id
==> ( E + F ) * id
==> ( E + id ) * id
==> ( T + id ) * id
==> ( F + id ) * id
==> ( id + id ) * id
S-R Parsers in Practice:
In actual practice, S-R parsing would be carried out with
a Parser Generator Program, using the much fancier
LALR(1) bottom-up approach (or perhaps the LL(1) top-down approach).
Such a program reads an
input grammar and outputs a skeleton parser program,
which includes a table like the one above to guide the parse.
This skeleton parser carries out
the parse on any input sentence at all,
rather than doing it by hand as we show above.
The most important work of such a parser generator program
(typically 5000 lines of C) is to
create the tables, which are much too complicated to generate by hand.
The program would also identify problems with the input
grammar, especially ambiguities, but other problems as well.
An early such program was
yacc (for "Yet Another Compiler Compiler"), which appeared as part of the
Unix system in the early 70s. There are now very many tools of this
sort:
compiler-compilers.
One time in teaching a compiler course, I used a table-generating
program, but had the students write the LALR parser itself.
(Not too hard to write by hand; almost all the difficulty is in
creating the tables.) These students actually liked this approach
because they had complete control of the parser (hey, they wrote it).
With yacc you have to live with the idiosyncrasies of
the parser it generates.
(Revision date: 2014-09-28.
Please use ISO 8601,
the International Standard Date and Time Notation.)