CS 3723
 Programming Languages 
   if-then-else Ambiguity  


Material about Grammars:

This is the third of 4 pages about grammars.


Ambiguous if-then-else in a Grammar: Here are grammar rules for an if-then-else statement, sort of like C, except that there is a required then keyword. These statements are just a few of a large number of statements describing and entire programming language. The details don't matter here. A curly bracket block statement behaves like a single statement.

<if-stmt>: Grammar Rules
<stmt>       −−>  <if-stmt>  |  <block-stmt> | . . .
<block-stmt> −−>  {  <stmt-list>  }
<stmt-list>  −−>  <stmt>  <stmt-list>  |  <stmt>

<if-stmt>    −−>  if <cond> then <stmt> 
<if-stmt>    −−>  if <cond> then <stmt> else <stmt>

It turns out that the above if-stmt makes the grammar ambiguous. It's hard to show that a grammar is not ambiguous, but an easy way to show it is ambiguous is to give a sentence with two parse trees. Assuming that c1 and c2 stand for "conditional expressions", and that s1 and s2 stand for statements, then the string of terminals at the bottom of the two diagrams below give a sentence with two distinct parse trees.

correct: else matches nearest then
              <if-stmt>
                  |
+----+---+--------+-------+
|    |   |                |
|    |   |             <stmt>
|    |   |                |
|    |   |            <if-stmt>
|    |   |                |
|    |   |   +----+----+--+-+---+----+
|    |   |   |    |    |    |   |    |
| <cond> |   | <cond>  | <stmt> | <stmt>
|    |   |   |    |    |    |   |    |
if  c1 then if   c2  then  s1 else  s2
 
not correct: else matches outer then
               <if-stmt>
                   |
+----+---+---------+--+---------+----+
|    |   |            |         |    |
|    |   |         <stmt>       |    |
|    |   |            |         |    |
|    |   |        <if-stmt>     |    |
|    |   |            |         |    |
|    |   |   +----+---++----+   |    |
|    |   |   |    |    |    |   |    |
| <cond> |   | <cond>  | <stmt> | <stmt>
|    |   |   |    |    |    |   |    |
if  c1 then if   c2  then  s1 else  s2

This is a common problem, present in most of the "standard" languages: C, C++, Java, C#, but not Ruby or Python. It is possible to rewrite the grammar so that it is no longer ambiguous but the language is the same. but this is only done in course exercises. In practice (even with the compiler) one uses the standard disambiguating rule: match each unmatched "else" to the nearest unmatched "then". This issue will come up again soon when we discuss parsers for compilers.


An Example From C: Here is some C code that illustrates this ambiguity, where in this case there is no "then" keyword. Each of the code segments below are legal C (unless I mistyped -- I took them from the book C: A Reference Manual by Harrison and Steele).

Example C Code: Four Segments
// not getting what was intended, or the logic is messed up.
// Also indenting of "else" very confusing
if ((k >= 0) && (k < TABLE_SIZE))
   if (table[k] >= 0)
      printf("Entry %d is %d\n", k, table[k]);
else printf("Error: index %d out of range.\n", k);

// perhaps not getting what was intended at first, // but here the logic is correct (changed error message) if ((k >= 0) && (k < TABLE_SIZE)) if (table[k] >= 0) printf("Entry %d is %d\n", k, table[k]); else printf("Error: entry %d is negative.\n", k);

// perhaps getting what was first intended, based on the // original error message. The { } ARE required if ((k >= 0) && (k < TABLE_SIZE)) { if (table[k] >= 0) printf("Entry %d is %d\n", k, table[k]); } else printf("Error: index %d out of range.\n", k);

// perhaps getting what was intended at first, but new error // message. The {} are NOT required but make it less confusing if ((k >= 0) && (k < TABLE_SIZE)) { if (table[k] >= 0) printf("Entry %d is %d\n", k, table[k]); else printf("Error: entry %d is negative.\n", k); }

In the above code segments, the first, second, and fourth segments are essentially the same, except that the second has a sensible error message, and the fourth also uses (unneeded) curly brackets to make the code clearer. The third segment shows required curly brackets to make the "else" match the outer "then", so that the original error message makes sense. The third and fourth segments could be combined to cover all the possibilities:

Example C Code: Cover all Possibilities
// cover both cases
// The {} are NOT required but make it less confusing
if ((k >= 0) && (k < TABLE_SIZE)) {
   if (table[k] >= 0)
      printf("Entry %d is %d\n", k, table[k]);
   else printf("Error: entry %d is negative.\n", k);
}
else printf("Error: index %d out of range.\n", k);


The Same Example in Python: This example uses a Python list, which only resembles the Java array. The list below has values for the index from 0 to 5 (inclusive), but indexes from -6 to -1 are also legal and give the same list.

Example C Code: Cover all Possibilities
# if_then.py: check if-else 
import sys
TABLE_SIZE = 6
table = [ i-2 for i in range(6) ]  # table = [ -2, -1, 0, 1, 2, 3]

def check(n):
    sys.stdout.write("Example " + str(n) + ": ")
    if (k >= 0) and (k < TABLE_SIZE):
        if table[k] >= 0:
            sys.stdout.write("Entry %d is %d\n" % (k, table[k]) )
        else:
            sys.stdout.write("Error: entry %d is negative.\n" % k)
    else:
        sys.stdout.write("Error: index %d out of range.\n" % k)

k = 3
check(1)
k = 1
check(2)
k = 8
check(3)
k = -2
check(4)
sys.stdout.write("Entry  4 is " + str(table[4]) + "\n")
sys.stdout.write("Entry -2 is " + str(table[-2]) + "\n")
Output
Example 1: Entry 3 is 1
Example 2: Error: entry 1 is negative.
Example 3: Error: index 8 out of range.
Example 4: Error: index -2 out of range.
Entry 4 is 2
Entry -2 is 2  # table[4} same as table[-2]

(Revision date: 2014-09-20. (Please use ISO 8601, the International Standard.)