 |
CS 3723
Programming Languages |
Derivations, Parse Trees,
and Ambiguity |
Material about Grammars:
This is the second of 3 pages about grammars.
- Formal Grammars (Introduction)
- Derivations, Parse Trees,
Ambiguity (this page)
- Types of Grammars
Derivations:
First let's look at a simple grammar that describes a limited
collection of arithmetic expressions. (This could have been
written with a single rule using more "or"s.)
Grammar: Arith. Exp. |
E ----> E + E
E ----> E * E
E ----> ( E )
E ----> a | b | c | ...
|
Here is a sequence of replacements that leads to a sequence of
terminal symbols.
E ===>
E * E ===>
( E ) * E ===>
( E + E) * E ===>
( a + E ) * E ===>
( a + b ) * E ===>
( a + b ) * c
In this case, we obtained the final sentence ( a + b ) * c
starting from E. One says that "E derives ( a + b ) * c" or
(to use passive voice) "( a + b ) * c is derived from E".
The sequence is called a derivation sequence.
In this case it is a leftmost derivation sequence, because
at each stage the leftmost non-terminal was replaced. Each
non-terminal that is replaced is colored
red above. Sometimes
there was only one non-terminal, so it is leftmost on an honorary
basis.) Here is another derivation:
E ===>
E * E ===>
E * c ===>
( E ) * c ===>
( E + E ) * c ===>
( E + b ) * c ===>
( a + b ) * c
This was a rightmost derivation sequence, because
at each stage the rightmost non-terminal was replaced.
The sentence ( a + b ) * c has a unique leftmost
derivation, a unique (different) rightmost derivation and
the unique parse tree shown below:
Parse Tree:
( a + b ) * c |
E
/|\
E * E
/|\ \
( E ) c
/|\
E + E
| |
a b |
E
/|\
/ | \
/ | \
/ | \
/ | \
E | E
/|\ | |
/ | \ | |
/ E \ | |
/ /|\ \ | |
| / | \ || |
| E | E || |
| | | | || |
( a + b )* c |
Ambiguity:
There are other sentences derived from E above that
have more than one parse tree, and corresponding
left- and rightmost derivations. For example,
the very simple sentence a + b * c.
The table looks at leftmost derivations and parse trees:
1st Leftmost Der.
| 2nd Leftmost Der. |
E ===> E + E
===> a + E
===> a + E * E
===> a + b * E
===> a + b * c |
E ===> E * E
===> E + E * E
===> a + E * E
===> a + b * E
===> a + b * c |
1st Parse Tree
| 2nd Parse Tree |
E
/|\
/ | \
E + E
| /|\
| / | \
a E * E
| |
b c
|
E
/|\
/ | \
E * E
/|\ |
/ | \ |
E + E c
| |
a b
|
Even if some parse trees are unique, if there are multiple parse
trees for any sentence, then the grammar is called ambiguous.
In a programming language it is not acceptable to have more than
one possible reading of a construct. We can't flip a coin to
decide which parse tree to use. There are several ways around this
problem:
- Rewrite the grammar so that it is no longer ambiguous
yet still accepts exactly the same language. This is not
always possible.
- Introduce extra rules that allow the program to decide which
of multiple parse trees to use. These are called
disambiguating rules. (Ah, yes, "disambiguating", one of
my favorite words.)
- An ambiguous grammar may signal problems with language
design, and the programming language itself might be changed.
Here is a new grammar that is not ambiguous and accepts the same
language as the previous grammar.:
Grammar: Arith. Exp. |
E ----> E + T | T
T ----> T * F | F
F ----> ( E )
| a | b | c ...
|
It's can be very hard (or impossible) to prove results such
as determining whether or not a grammar is ambiguous. In practice
there are usually no problems. With this grammar every sentence
has a unique leftmost and rightmost derivation and a unique parse tree.
For example, the above sentence a + b * c that caused
a problem has the following leftmost derivation and
parse tree on the left (along with the its twin
a * b + c on the right):
Leftmost Derivations |
a + b * c
| a * b + c |
E ===> E + T
===> T + T
===> F + T
===> a + T
===> a + T * F
===> a + F * F
===> a + b * F
===> a + b * c |
E ===> E + T
===> T + T
===> T * F + T
===> F * F + T
===> a * F + T
===> a * b + T
===> a * b + F
===> a * b + c |
1st Parse Tree
| 2nd Parse Tree |
E
/|\
/ | \
E + T
| /|\
T T * F
| | |
F F c
| |
a b
|
E
/|\
/ | \
E + T
| |
T F
/|\ |
/ | \ c
T * F
| |
F b
|
a
|
The Parsing Problem:
Given an unambiguous grammar and a sentence derived from the start symbol
of the grammar, the parsing problem is the problem
of calculating
- the parse tree, or
- the leftmost derivation, or
- the rightmost derivation.
Each of the above three items starts with "the" because each is
unique.
It is important to realize that the grammar rules can be
examined in any order, but the sentence must be scanned from
left to right. The parsing problem is to find an algorithm
that will carry out the parse.
There are two basic approaches to parsing, namely top-down
and bottom-up:
Top-Down Parsing:
In this case we do the following:
- Construct the parse tree from the top down and from left to right, or
- Construct the leftmost derivation from left to right, or
In practice the parse tree itself is almost never constructed (too
much computation), although all the information needed to construct
it will be available.
The problem becomes: at each stage there may be several
choices of replacements
for the leftmost non-terminal. How do we decide which replacement to
use? Here are two diagrams illustrating the top-down parsing
process, without showing how to make the choices along the way.
Top-Down Parse of:
a + b * c |
Parse Tree |
E
|
E
/|\
E + T
|
E
/|\
E + T
|
T
|
F
|
a
|
E
/|\
/ | \
E + T
| /|\
T T * F
|
F
|
a
|
E
/|\
/ | \
E + T
| /|\
T T * F
| |
F F
| |
a b
|
E
/|\
/ | \
E + T
| /|\
T T * F
| | |
F F |
| | |
a b c
|
Step Num |
0 |
1 |
2-4 |
5 |
6-7 |
8 |
Der. Steps |
E=> |
E + T=> |
T + T=>
F + T=>
a + T=> |
a + T * F=> |
a + F * F=>
a + b * F=> |
a + b * c |
| |
Leftmost Derivation |
E ==>
E + T ==>
T + T ==>
F + T ==>
a + T ==>
a + T * F ==>
a + F * F ==>
a + b * F ==>
a + b * c |
|
Bottom-Up Parsing:
In this case we do the following:
- Construct the parse tree from the bottom up and from left to right,
and
- Construct the rightmost derivation from right to left, that is, backwards.
(This is hard to think about: because we're doing a rightmost derivation
backwards, that is, right-to-left, we are really once again working
on the input sentence from left-to-right, as we must.)
As with top-down parsing, we don't normally construct the parse tree itself.
This type of parsing may be harder for you to visualize at first,
but it is the most common type used.
We will be using the opposite of a replacement. This opposite
action is called a
reduction. Here one starts
with the sentence, instead of the start symbol. One repeatedly looks
for a match of the right side of a grammar rule (in general,
a sequence of terminals and non-terminals) with the stuff at
the scan point in the input sentence. Then the matched right side
is replaced with the single left side. This process ends when
the input sentence is used up and the derivations sequence is
reduced to the start symbol.
Here the parsing problem is: at each stage there may be
several several choices of what to
reduce, and there may be several choices of what to reduce to.
In each case how do we make the necessary choices?
As with the top-down process, here are two diagrams illustrating
the bottom-up parsing
process, without showing how to make the choices along the way.
Bottom-Up Parse of:
a + b * c |
Parse Tree |
a+b*c
|
E
|
T
|
F
|
a + b * c
|
E
|
T T
| |
F F
| |
a + b * c
|
E
|
T T
| |
F F F
| | |
a + b * c
|
E T
| /|\
| / | \
T T | |
| | | |
F F | F
| | | |
a + b * c
|
E
/|\
/ | \
/ | \
/ | \
E | T
| / /|\
| / / | \
T | T | |
| | | | |
F | F | F
| | | | |
a + b * c
|
Step Num |
0 |
1-3 |
4-5 |
6 |
7 |
8 |
Reduc- tions |
Start |
T <= E
F <= T
a <= F |
F <= T
b <= F |
c <= F |
T * F <= T |
E + T <= E |
| |
Rightmost Derivation
backwards |
a + b * c <==
F + b * c <==
T + b * c <==
E + b * c <==
E + F * c <==
E + T * c <==
E + T * F <==
E + T <==
E
Red = what is reduced,
NOT what reduced TO.
|
|
Revision date: 2013-09-14.
(Please use ISO
8601, the International Standard.)
|