 |
CS 3723
Programming Languages |
II. Derivations, Parse Trees,
and Ambiguity |
Material about Grammars:
This is the second of 4 pages about grammars.
Derivations:
First let's look at a simple grammar that describes a limited
collection of arithmetic expressions. (This could have been
written with a single rule using more "or"s.)
Grammar: Arith. Exp.
Ambiguous |
E −−> E + E
E −−> E * E
E −−> ( E )
E −−> a | b | c | ...
|
Here is a sequence of replacements that leads to a sequence of
terminal symbols.
Leftmost
Derivation: (a+b)*c |
E ==>
E * E ==>
( E ) * E ==>
( E + E) * E ==>
( a + E ) * E ==>
( a + b ) * E ==>
( a + b ) * c |
In this case, we obtained the final sentence ( a + b ) * c
starting from E. One says that "E derives ( a + b ) * c" or
(to use passive voice) "( a + b ) * c is derived from E".
The sequence is called a derivation sequence.
In this case it is a leftmost derivation sequence, because
at each stage the leftmost non-terminal was replaced. Each
non-terminal that is replaced is colored
red above. Sometimes
there was only one non-terminal, so it is leftmost (on an honorary
basis). Here is another derivation:
Rightmost
Derivation: (a+b)*c |
E ==>
E * E ==>
E * c ==>
( E ) * c ==>
( E + E ) * c ==>
( E + b ) * c ==>
( a + b ) * c |
This was a rightmost derivation sequence, because
at each stage the rightmost non-terminal was replaced.
The sentence ( a + b ) * c has a unique leftmost
derivation, a unique (different) rightmost derivation and
the unique parse tree shown below:
Parse Tree:
( a + b ) * c |
E
/|\
E * E
/|\ \
( E ) c
/|\
E + E
| |
a b |
E
/|\
/ | \
/ | \
/ | \
/ | \
E | E
/|\ | |
/ | \ | |
/ E \ | |
/ /|\ \ | |
| / | \ || |
| E | E || |
| | | | || |
( a + b )* c |
Ambiguity:
There are other sentences derived from E above that
have more than one parse tree, and corresponding
left- and rightmost derivations. For example,
the very simple sentence a + b * c.
The table looks at leftmost derivations and parse trees:
1st Leftmost Der.
| 2nd Leftmost Der. |
E ==> E + E
==> a + E
==> a + E * E
==> a + b * E
==> a + b * c |
E ==> E * E
==> E + E * E
==> a + E * E
==> a + b * E
==> a + b * c |
1st Parse Tree
| 2nd Parse Tree |
E
/|\
/ | \
E + E
| /|\
| / | \
a E * E
| |
b c
|
E
/|\
/ | \
E * E
/|\ |
/ | \ |
E + E c
| |
a b
|
Even if some parse trees are unique, if there are multiple parse
trees for any sentence, then the grammar is called ambiguous.
In a programming language it is not acceptable to have more than
one possible reading of a construct. We can't flip a coin to
decide which parse tree to use. There are several ways around this
problem:
- Rewrite the grammar so that it is no longer ambiguous
yet still accepts exactly the same language. This is not
always possible.
- Introduce extra rules that allow the program to decide which
of multiple parse trees to use. These are called
disambiguating rules. (Ah, yes, "disambiguating", one of
my favorite words.)
- An ambiguous grammar may signal problems with language
design, and the programming language itself might be changed.
Unambiguous Grammar:
Here is a new grammar that is not ambiguous and accepts the same
language as the previous grammar:
Grammar:
Arith. Exp.
Unambiguous =
Not Ambiguous |
E −−> E + T | T
T −−> T * F | F
F −−> ( E )
| a | b | c ...
|
It's can be very hard (or impossible) to prove results such
as determining whether or not a grammar is ambiguous. In practice
there are usually no problems. With this grammar every sentence
has a unique leftmost and rightmost derivation and a unique parse tree.
For example, the above sentence a + b * c that caused
a problem has the following leftmost derivation and
parse tree on the left (along with the its twin
a * b + c on the right):
Leftmost Derivations (Unique) |
a + b * c
| a * b + c |
E ==> E + T
==> T + T
==> F + T
==> a + T
==> a + T * F
==> a + F * F
==> a + b * F
==> a + b * c |
E ==> E + T
==> T + T
==> T * F + T
==> F * F + T
==> a * F + T
==> a * b + T
==> a * b + F
==> a * b + c |
1st Parse Tree
| 2nd Parse Tree |
E
/|\
/ | \
E + T
| /|\
T T * F
| | |
F F c
| |
a b
|
E
/|\
/ | \
E + T
| |
T F
/|\ |
/ | \ c
T * F
| |
F b
|
a
|
(Revision date: 2014-09-20.
(Please use ISO
8601, the International Standard.)
|