CS 3723: Parsing

	CS 3723 Programming Languages
Parsing

The Parsing Problem: Given an unambiguous grammar and a sentence (= a string of terminals) derived from the start symbol of the grammar, the parsing problem is the problem of calculating

the parse tree, or
the leftmost derivation, or
the rightmost derivation.

Each of the above three items starts with "the" because each is unique.

It is important to realize that the grammar rules can be examined in any order, but the sentence must be scanned from left to right. The parsing problem is to find an algorithm that will carry out the parse.

There are two basic approaches to parsing, namely top-down and bottom-up:

Top-Down Parsing: In this case we do the following:

Construct the parse tree from the top down and from left to right, and
Construct the leftmost derivation from left to right.

In practice the parse tree itself is almost never constructed (too much computation), although all the information needed to construct it will be available.

The top-down parsing problem is:
At each stage there may be several choices of replacements for the leftmost
non-terminal. How do we decide which replacement to use?

Here are two diagrams illustrating the top-down parsing process, without showing how to make the choices along the way.

Top-Down Parse of: a + b * c
Parse Tree	E	E /\|\ E + T	E /\|\ E + T \| T \| F \| a	E /\|\ / \| \ E + T \| /\|\ T T * F \| F \| a	E /\|\ / \| \ E + T \| /\|\ T T * F \| \| F F \| \| a b	E /\|\ / \| \ E + T \| /\|\ T T * F \| \| \| F F c \| \| a b
Step Num	0	1	2-4	5	6-7	8
Der. Steps	E=>	E + T=>	T + T=> F + T=> a + T=>	a + T * F=>	a + F * F=> a + b * F=>	a + b * c

Leftmost Derivation
E ==> E + T ==> T + T ==> F + T ==> a + T ==> a + T * F ==> a + F * F ==> a + b * F ==> a + b * c

Bottom-Up Parsing: In this case we do the following:

Construct the parse tree from the bottom up and from left to right, and
Construct the rightmost derivation from right to left, that is, backwards. (This is hard to think about: because we're doing a rightmost derivation backwards, that is, right-to-left, we are really once again working on the input sentence from left-to-right, as we must.)

As with top-down parsing, we don't normally construct the parse tree itself.

This type of parsing may be harder for you to visualize at first, but it is the most common type used.

We will be using the opposite of a replacement. This opposite action is called a reduction. Here one starts with the sentence, instead of the start symbol. One repeatedly looks for a match of the right side of a grammar rule (in general, a sequence of terminals and non-terminals) with the stuff at the scan point in the input sentence. Then the matched right side is replaced with the single left side. This process ends when the input sentence is used up and the derivations sequence is reduced to the start symbol.

The bottum-up parsing problem is:
At each stage there may be several choices of what to reduce, and there may be several
choices of what to reduce to. In each case how do we make the necessary choices?

As with the top-down process, here are two diagrams illustrating the bottom-up parsing process, without showing how to make the choices along the way.

Bottom-Up Parse of: a + b * c

Parse
Tree

a+b*c

E
|
T
|
F
|
a + b * c

E
|
T   T
|   |
F   F
|   |
a + b * c

E
|
T   T
|   |
F   F   F
|   |   |
a + b * c

E      T
|     /|\
|    / | \
T   T  |  |
|   |  |  |
F   F  |  F
|   |  |  |
a + b  *  c

     E
    /|\
   / | \
  /  |  \
 /   |   \
E    |   T
|    /  /|\
|   /  / | \
T  |  T  |  |
|  |  |  |  |
F  |  F  |  F
|  |  |  |  |
a  +  b  *  c

Step
Num

1-3

4-5

Reduc-
tions

Start

T <= E
F <= T
a <= F

F <= T
b <= F

c <= F

T * F <= T

E + T <= E

Rightmost Derivation backwards
a + b * c <== F + b * c <== T + b * c <== E + b * c <== E + F * c <== E + T * c <== E + T * F <== E + T <== E Red = what is reduced, NOT what is reduced to.

(Revision date: 2014-09-30. Please use ISO 8601, the International Standard.)