 |
CS 3723
Programming Languages |
Parsing |
The Parsing Problem:
Given an unambiguous grammar and a sentence
(= a string of terminals) derived from the start symbol
of the grammar, the parsing problem is the problem
of calculating
- the parse tree, or
- the leftmost derivation, or
- the rightmost derivation.
Each of the above three items starts with "the" because each is
unique.
It is important to realize that the grammar rules can be
examined in any order, but the sentence must be scanned from
left to right. The parsing problem is to find an algorithm
that will carry out the parse.
There are two basic approaches to parsing, namely top-down
and bottom-up:
Top-Down Parsing:
In this case we do the following:
- Construct the parse tree from the top down and from left to right, and
- Construct the leftmost derivation from left to right.
In practice the parse tree itself is almost never constructed (too
much computation), although all the information needed to construct
it will be available.
The top-down parsing problem is:
At each stage there may be several choices of replacements
for the leftmost non-terminal. How do we decide which replacement to
use? |
Here are two diagrams illustrating the top-down parsing
process, without showing how to make the choices along the way.
Top-Down Parse of:
a + b * c |
Parse Tree |
E
|
E
/|\
E + T
|
E
/|\
E + T
|
T
|
F
|
a
|
E
/|\
/ | \
E + T
| /|\
T T * F
|
F
|
a
|
E
/|\
/ | \
E + T
| /|\
T T * F
| |
F F
| |
a b
|
E
/|\
/ | \
E + T
| /|\
T T * F
| | |
F F c
| |
a b
|
Step Num |
0 |
1 |
2-4 |
5 |
6-7 |
8 |
Der. Steps |
E=> |
E + T=> |
T + T=>
F + T=>
a + T=> |
a + T * F=> |
a + F * F=>
a + b * F=> |
a + b * c |
| |
Leftmost Derivation |
E ==>
E + T ==>
T + T ==>
F + T ==>
a + T ==>
a + T * F ==>
a + F * F ==>
a + b * F ==>
a + b * c |
|
Bottom-Up Parsing:
In this case we do the following:
- Construct the parse tree from the bottom up and from left to right,
and
- Construct the rightmost derivation from right to left, that is, backwards.
(This is hard to think about: because we're doing a rightmost derivation
backwards, that is, right-to-left, we are really once again working
on the input sentence from left-to-right, as we must.)
As with top-down parsing, we don't normally construct the parse tree itself.
This type of parsing may be harder for you to visualize at first,
but it is the most common type used.
We will be using the opposite of a replacement. This opposite
action is called a
reduction. Here one starts
with the sentence, instead of the start symbol. One repeatedly looks
for a match of the right side of a grammar rule (in general,
a sequence of terminals and non-terminals) with the stuff at
the scan point in the input sentence. Then the matched right side
is replaced with the single left side. This process ends when
the input sentence is used up and the derivations sequence is
reduced to the start symbol.
The bottum-up parsing problem is:
At each stage there may be several choices of what to
reduce, and there may be several choices of what to reduce to.
In each case how do we make the necessary choices?
|
As with the top-down process, here are two diagrams illustrating
the bottom-up parsing
process, without showing how to make the choices along the way.
Bottom-Up Parse of:
a + b * c |
Parse Tree |
a+b*c
|
E
|
T
|
F
|
a + b * c
|
E
|
T T
| |
F F
| |
a + b * c
|
E
|
T T
| |
F F F
| | |
a + b * c
|
E T
| /|\
| / | \
T T | |
| | | |
F F | F
| | | |
a + b * c
|
E
/|\
/ | \
/ | \
/ | \
E | T
| / /|\
| / / | \
T | T | |
| | | | |
F | F | F
| | | | |
a + b * c
|
Step Num |
0 |
1-3 |
4-5 |
6 |
7 |
8 |
Reduc- tions |
Start |
T <= E
F <= T
a <= F |
F <= T
b <= F |
c <= F |
T * F <= T |
E + T <= E |
| |
Rightmost Derivation
backwards |
a + b * c <==
F + b * c <==
T + b * c <==
E + b * c <==
E + F * c <==
E + T * c <==
E + T * F <==
E + T <==
E
Red = what is reduced,
NOT what is reduced to.
|
|
(Revision date: 2014-09-30.
Please use ISO
8601, the International Standard.)
|