 |
CS 3723
Programming Languages |
if-then-else Ambiguity |
Material about Grammars:
This is the third of 4 pages about grammars.
Ambiguous if-then-else in a Grammar:
Here are grammar rules for an if-then-else statement,
sort of like C, except that there is a required then keyword.
These statements are just a few of a large number of statements
describing and entire programming language. The details don't
matter here. A curly bracket
block statement behaves like a single statement.
<if-stmt>:
Grammar Rules |
<stmt> −−> <if-stmt> | <block-stmt> | . . .
<block-stmt> −−> { <stmt-list> }
<stmt-list> −−> <stmt> <stmt-list> | <stmt>
<if-stmt> −−> if <cond> then <stmt>
<if-stmt> −−> if <cond> then <stmt> else <stmt>
|
It turns out that the above if-stmt makes the grammar
ambiguous. It's hard to show that a grammar is not ambiguous,
but an easy way to show it is ambiguous is to give a sentence with
two parse trees. Assuming that c1 and c2 stand
for "conditional expressions", and that s1 and s2
stand for statements, then the string of terminals at the bottom of
the two diagrams below give a sentence with two distinct parse trees.
correct:
else matches nearest then |
<if-stmt>
|
+----+---+--------+-------+
| | | |
| | | <stmt>
| | | |
| | | <if-stmt>
| | | |
| | | +----+----+--+-+---+----+
| | | | | | | | |
| <cond> | | <cond> | <stmt> | <stmt>
| | | | | | | | |
if c1 then if c2 then s1 else s2
|
| |
not correct:
else matches outer then |
<if-stmt>
|
+----+---+---------+--+---------+----+
| | | | | |
| | | <stmt> | |
| | | | | |
| | | <if-stmt> | |
| | | | | |
| | | +----+---++----+ | |
| | | | | | | | |
| <cond> | | <cond> | <stmt> | <stmt>
| | | | | | | | |
if c1 then if c2 then s1 else s2
|
|
This is a common problem, present in most of the "standard"
languages: C, C++, Java, C#, but not Ruby or Python.
It is possible to rewrite the grammar so that it is no longer
ambiguous but the language is the same. but this is only done in course
exercises. In practice (even with the compiler) one
uses the standard disambiguating rule: match each
unmatched "else" to the nearest unmatched "then". This issue
will come up again soon when we discuss parsers for compilers.
An Example From C:
Here is some C code that illustrates this ambiguity, where
in this case there is no "then" keyword. Each of the code segments below
are legal C (unless I mistyped -- I took them from the book
C: A Reference Manual by Harrison and Steele).
Example
C Code: Four Segments |
// not getting what was intended, or the logic is messed up.
// Also indenting of "else" very confusing
if ((k >= 0) && (k < TABLE_SIZE))
if (table[k] >= 0)
printf("Entry %d is %d\n", k, table[k]);
else printf("Error: index %d out of range.\n", k);
// perhaps not getting what was intended at first,
// but here the logic is correct (changed error message)
if ((k >= 0) && (k < TABLE_SIZE))
if (table[k] >= 0)
printf("Entry %d is %d\n", k, table[k]);
else printf("Error: entry %d is negative.\n", k);
// perhaps getting what was first intended, based on the
// original error message. The { } ARE required
if ((k >= 0) && (k < TABLE_SIZE)) {
if (table[k] >= 0)
printf("Entry %d is %d\n", k, table[k]);
}
else printf("Error: index %d out of range.\n", k);
// perhaps getting what was intended at first, but new error
// message. The {} are NOT required but make it less confusing
if ((k >= 0) && (k < TABLE_SIZE)) {
if (table[k] >= 0)
printf("Entry %d is %d\n", k, table[k]);
else printf("Error: entry %d is negative.\n", k);
} |
In the above code segments, the first, second, and fourth segments
are essentially the same, except that the second has a sensible
error message, and the fourth also uses (unneeded) curly
brackets to make the code clearer. The third segment shows
required curly brackets to make the "else" match the outer
"then", so that the original error message makes sense.
The third and fourth segments could be combined to cover all
the possibilities:
Example
C Code: Cover all Possibilities |
// cover both cases
// The {} are NOT required but make it less confusing
if ((k >= 0) && (k < TABLE_SIZE)) {
if (table[k] >= 0)
printf("Entry %d is %d\n", k, table[k]);
else printf("Error: entry %d is negative.\n", k);
}
else printf("Error: index %d out of range.\n", k);
|
The Same Example in Python:
This example uses a Python list, which only resembles the
Java array. The list below has values for the index
from 0 to 5 (inclusive), but indexes from -6
to -1 are also legal and give the same list.
Example
C Code: Cover all Possibilities |
# if_then.py: check if-else
import sys
TABLE_SIZE = 6
table = [ i-2 for i in range(6) ] # table = [ -2, -1, 0, 1, 2, 3]
def check(n):
sys.stdout.write("Example " + str(n) + ": ")
if (k >= 0) and (k < TABLE_SIZE):
if table[k] >= 0:
sys.stdout.write("Entry %d is %d\n" % (k, table[k]) )
else:
sys.stdout.write("Error: entry %d is negative.\n" % k)
else:
sys.stdout.write("Error: index %d out of range.\n" % k)
k = 3
check(1)
k = 1
check(2)
k = 8
check(3)
k = -2
check(4)
sys.stdout.write("Entry 4 is " + str(table[4]) + "\n")
sys.stdout.write("Entry -2 is " + str(table[-2]) + "\n")
|
Output |
Example 1: Entry 3 is 1
Example 2: Error: entry 1 is negative.
Example 3: Error: index 8 out of range.
Example 4: Error: index -2 out of range.
Entry 4 is 2
Entry -2 is 2 # table[4} same as table[-2]
|
(Revision date: 2014-09-20.
(Please use ISO
8601, the International Standard.)
|