CS 3723
 Programming Languages 
   II. Derivations, Parse Trees,  
and Ambiguity


Material about Grammars:

This is the second of 4 pages about grammars.


Derivations: First let's look at a simple grammar that describes a limited collection of arithmetic expressions. (This could have been written with a single rule using more "or"s.)

Grammar: Arith. Exp.
Ambiguous
E −−> E + E
E −−> E * E
E −−> ( E )
E −−> a  |  b  |  c  |  ...

Here is a sequence of replacements that leads to a sequence of terminal symbols.

Leftmost Derivation: (a+b)*c
E ==> E * E ==> ( E ) * E ==> ( E + E) * E ==>
( a + E ) * E ==> ( a + b ) * E ==> ( a + b ) * c

In this case, we obtained the final sentence ( a + b ) * c starting from E. One says that "E derives ( a + b ) * c" or (to use passive voice) "( a + b ) * c is derived from E". The sequence is called a derivation sequence. In this case it is a leftmost derivation sequence, because at each stage the leftmost non-terminal was replaced. Each non-terminal that is replaced is colored red above. Sometimes there was only one non-terminal, so it is leftmost (on an honorary basis). Here is another derivation:

Rightmost Derivation: (a+b)*c
E ==> E * E ==> E * c ==> ( E ) * c ==>
( E + E ) * c ==> ( E + b ) * c ==> ( a + b ) * c

This was a rightmost derivation sequence, because at each stage the rightmost non-terminal was replaced.

The sentence ( a + b ) * c has a unique leftmost derivation, a unique (different) rightmost derivation and the unique parse tree shown below:

Parse Tree: ( a + b ) * c
     E
    /|\
   E * E
  /|\   \
 ( E )   c 
  /|\
 E + E
 |   |
 a   b
            E
           /|\
          / | \
         /  |  \
        /   |   \
       /    |    \
      E     |     E 
     /|\    |     |
    / | \   |     |
   /  E  \  |     |
  /  /|\  \ |     |
 |  / | \  ||     |
 |  E | E  ||     |
 |  | | |  ||     |
 (  a + b  )*     c


Ambiguity: There are other sentences derived from E above that have more than one parse tree, and corresponding left- and rightmost derivations. For example, the very simple sentence a + b * c. The table looks at leftmost derivations and parse trees:

1st Leftmost Der. 2nd Leftmost Der.
E ==> E + E
  ==> a + E
  ==> a + E * E
  ==> a + b * E 
  ==> a + b * c
E ==> E * E
  ==> E + E * E
  ==> a + E * E
  ==> a + b * E 
  ==> a + b * c
1st Parse Tree 2nd Parse Tree
       E
      /|\
     / | \
    E  +  E
    |    /|\
    |   / | \
    a  E  *  E
       |     |
       b     c
          E
         /|\
        / | \
       E  *  E
      /|\    |
     / | \   |
    E  +  E  c
    |     |
    a     b

Even if some parse trees are unique, if there are multiple parse trees for any sentence, then the grammar is called ambiguous. In a programming language it is not acceptable to have more than one possible reading of a construct. We can't flip a coin to decide which parse tree to use. There are several ways around this problem:

  1. Rewrite the grammar so that it is no longer ambiguous yet still accepts exactly the same language. This is not always possible.
  2. Introduce extra rules that allow the program to decide which of multiple parse trees to use. These are called disambiguating rules. (Ah, yes, "disambiguating", one of my favorite words.)
  3. An ambiguous grammar may signal problems with language design, and the programming language itself might be changed.


Unambiguous Grammar: Here is a new grammar that is not ambiguous and accepts the same language as the previous grammar:

Grammar: Arith. Exp.
Unambiguous =
Not Ambiguous
E −−> E + T  |  T
T −−> T * F  |  F
F −−> ( E )   |  a  |  b  |  c ...

It's can be very hard (or impossible) to prove results such as determining whether or not a grammar is ambiguous. In practice there are usually no problems. With this grammar every sentence has a unique leftmost and rightmost derivation and a unique parse tree. For example, the above sentence a + b * c that caused a problem has the following leftmost derivation and parse tree on the left (along with the its twin a * b + c on the right):

Leftmost Derivations (Unique)
a + b * c a * b + c
E ==> E + T
  ==> T + T
  ==> F + T
  ==> a + T
  ==> a + T * F
  ==> a + F * F 
  ==> a + b * F
  ==> a + b * c
E ==> E + T
  ==> T + T
  ==> T * F + T
  ==> F * F + T
  ==> a * F + T
  ==> a * b + T
  ==> a * b + F 
  ==> a * b + c
1st Parse Tree 2nd Parse Tree
       E
      /|\
     / | \
    E  +  T
    |    /|\
    T   T * F
    |   |   |
    F   F   c
    |   |
    a   b
          E
         /|\
        / | \
       E  +  T
       |     |
       T     F
      /|\    |
     / | \   c
    T  *  F
    |     |
    F     b
    |
    a

(Revision date: 2014-09-20. (Please use ISO 8601, the International Standard.)