CS 3723: Derivations, Parse Trees, Ambiguity

	CS 3723 Programming Languages
Derivations, Parse Trees, and Ambiguity

Material about Grammars:

This is the second of 3 pages about grammars.

Formal Grammars (Introduction)
Derivations, Parse Trees, Ambiguity (this page)
Types of Grammars

Derivations: First let's look at a simple grammar that describes a limited collection of arithmetic expressions. (This could have been written with a single rule using more "or"s.)

Grammar: Arith. Exp.
E ----> E + E E ----> E * E E ----> ( E ) E ----> a \| b \| c \| ...

Here is a sequence of replacements that leads to a sequence of terminal symbols.

E ===> E * E ===> ( E ) * E ===> ( E + E) * E ===> ( a + E ) * E ===> ( a + b ) * E ===> ( a + b ) * c

In this case, we obtained the final sentence ( a + b ) * c starting from E. One says that "E derives ( a + b ) * c" or (to use passive voice) "( a + b ) * c is derived from E". The sequence is called a derivation sequence. In this case it is a leftmost derivation sequence, because at each stage the leftmost non-terminal was replaced. Each non-terminal that is replaced is colored red above. Sometimes there was only one non-terminal, so it is leftmost on an honorary basis.) Here is another derivation:

E ===> E * E ===> E * c ===> ( E ) * c ===> ( E + E ) * c ===> ( E + b ) * c ===> ( a + b ) * c

This was a rightmost derivation sequence, because at each stage the rightmost non-terminal was replaced.

The sentence ( a + b ) * c has a unique leftmost derivation, a unique (different) rightmost derivation and the unique parse tree shown below:

Parse Tree: ( a + b ) * c
*E /\|\ E E /\|\ \ ( E ) c /\|\ E + E \| \| a b**	E /\|\ / \| \ / \| \ / \| \ / \| \ E \| E /\|\ \| \| / \| \ \| \| / E \ \| \| / /\|\ \ \| \| \| / \| \ \|\| \| \| E \| E \|\| \| \| \| \| \| \|\| \| ( a + b )* c

Ambiguity: There are other sentences derived from E above that have more than one parse tree, and corresponding left- and rightmost derivations. For example, the very simple sentence a + b * c. The table looks at leftmost derivations and parse trees:

1st Leftmost Der.

2nd Leftmost Der.

E ===> E + E ===> a + E ===> a + E * E ===> a + b * E ===> a + b * c

E ===> E * E ===> E + E * E ===> a + E * E ===> a + b * E ===> a + b * c

1st Parse Tree

2nd Parse Tree

E /|\ / | \ E + E | /|\ | / | \ a E * E | | b c

E /|\ / | \ E * E /|\ | / | \ | E + E c | | a b

Even if some parse trees are unique, if there are multiple parse trees for any sentence, then the grammar is called ambiguous. In a programming language it is not acceptable to have more than one possible reading of a construct. We can't flip a coin to decide which parse tree to use. There are several ways around this problem:

Rewrite the grammar so that it is no longer ambiguous yet still accepts exactly the same language. This is not always possible.
Introduce extra rules that allow the program to decide which of multiple parse trees to use. These are called disambiguating rules. (Ah, yes, "disambiguating", one of my favorite words.)
An ambiguous grammar may signal problems with language design, and the programming language itself might be changed.

Here is a new grammar that is not ambiguous and accepts the same language as the previous grammar.:

Grammar: Arith. Exp.
E ----> E + T \| T T ----> T * F \| F F ----> ( E ) \| a \| b \| c ...

It's can be very hard (or impossible) to prove results such as determining whether or not a grammar is ambiguous. In practice there are usually no problems. With this grammar every sentence has a unique leftmost and rightmost derivation and a unique parse tree. For example, the above sentence a + b * c that caused a problem has the following leftmost derivation and parse tree on the left (along with the its twin a * b + c on the right):

Leftmost Derivations
a + b * c	a * b + c
*E ===> E + T ===> T + T ===> F + T ===> a + T ===> a + T F ===> a + F * F ===> a + b * F ===> a + b * c**	*E ===> E + T ===> T + T ===> T F + T ===> F * F + T ===> a * F + T ===> a * b + T ===> a * b + F ===> a * b + c**
1st Parse Tree	2nd Parse Tree
*E /\|\ / \| \ E + T \| /\|\ T T F \| \| \| F F c \| \| a b**	*E /\|\ / \| \ E + T \| \| T F /\|\ \| / \| \ c T F \| \| F b \| a**

The Parsing Problem: Given an unambiguous grammar and a sentence derived from the start symbol of the grammar, the parsing problem is the problem of calculating

the parse tree, or
the leftmost derivation, or
the rightmost derivation.

Each of the above three items starts with "the" because each is unique.

It is important to realize that the grammar rules can be examined in any order, but the sentence must be scanned from left to right. The parsing problem is to find an algorithm that will carry out the parse.

There are two basic approaches to parsing, namely top-down and bottom-up:

Top-Down Parsing: In this case we do the following:

Construct the parse tree from the top down and from left to right, or
Construct the leftmost derivation from left to right, or

In practice the parse tree itself is almost never constructed (too much computation), although all the information needed to construct it will be available.

The problem becomes: at each stage there may be several choices of replacements for the leftmost non-terminal. How do we decide which replacement to use? Here are two diagrams illustrating the top-down parsing process, without showing how to make the choices along the way.

Top-Down Parse of: a + b * c
Parse Tree	E	E /\|\ E + T	E /\|\ E + T \| T \| F \| a	E /\|\ / \| \ E + T \| /\|\ T T * F \| F \| a	E /\|\ / \| \ E + T \| /\|\ T T * F \| \| F F \| \| a b	E /\|\ / \| \ E + T \| /\|\ T T * F \| \| \| F F \| \| \| \| a b c
Step Num	0	1	2-4	5	6-7	8
Der. Steps	E=>	E + T=>	T + T=> F + T=> a + T=>	a + T * F=>	a + F * F=> a + b * F=>	a + b * c

Leftmost Derivation
E ==> E + T ==> T + T ==> F + T ==> a + T ==> a + T * F ==> a + F * F ==> a + b * F ==> a + b * c

Bottom-Up Parsing: In this case we do the following:

Construct the parse tree from the bottom up and from left to right, and
Construct the rightmost derivation from right to left, that is, backwards. (This is hard to think about: because we're doing a rightmost derivation backwards, that is, right-to-left, we are really once again working on the input sentence from left-to-right, as we must.)

As with top-down parsing, we don't normally construct the parse tree itself.

This type of parsing may be harder for you to visualize at first, but it is the most common type used.

We will be using the opposite of a replacement. This opposite action is called a reduction. Here one starts with the sentence, instead of the start symbol. One repeatedly looks for a match of the right side of a grammar rule (in general, a sequence of terminals and non-terminals) with the stuff at the scan point in the input sentence. Then the matched right side is replaced with the single left side. This process ends when the input sentence is used up and the derivations sequence is reduced to the start symbol.

Here the parsing problem is: at each stage there may be several several choices of what to reduce, and there may be several choices of what to reduce to. In each case how do we make the necessary choices?

As with the top-down process, here are two diagrams illustrating the bottom-up parsing process, without showing how to make the choices along the way.

Bottom-Up Parse of: a + b * c

Parse
Tree

a+b*c

E
|
T
|
F
|
a + b * c

E
|
T   T
|   |
F   F
|   |
a + b * c

E
|
T   T
|   |
F   F   F
|   |   |
a + b * c

E      T
|     /|\
|    / | \
T   T  |  |
|   |  |  |
F   F  |  F
|   |  |  |
a + b  *  c

     E
    /|\
   / | \
  /  |  \
 /   |   \
E    |   T
|    /  /|\
|   /  / | \
T  |  T  |  |
|  |  |  |  |
F  |  F  |  F
|  |  |  |  |
a  +  b  *  c

Step
Num

1-3

4-5

Reduc-
tions

Start

T <= E
F <= T
a <= F

F <= T
b <= F

c <= F

T * F <= T

E + T <= E

Rightmost Derivation backwards
a + b * c <== F + b * c <== T + b * c <== E + b * c <== E + F * c <== E + T * c <== E + T * F <== E + T <== E Red = what is reduced, NOT what reduced TO.

Revision date: 2013-09-14. (Please use ISO 8601, the International Standard.)