|
CS 3721, Spring 2004
Programming Languages
Recitation 4: February 2, 4
Formal Grammars
MW 09:00-09:50 pm, SB 3.01.04
Due by email:
2004-02-09 23:59:59
|
Recitation 4 must be sent by email
following directions at: email submissions on or before
- 2004-02-09 23:59:59 (that's Monday, 9 February 2004, 11:59:59 pm)
for full credit.
- 2004-02-13 23:59:59 (that's Friday, 13 February 2004, 11:59:59 pm)
for 75% credit.
|
Overview:
This recitation covers the formal (mathematical)
description of the syntax (that is, the appearance) of programming
languages. The tool used is called a formal grammar.
This remarkable tool allows a finite description
of infinitely many different objects. It also helps in
the construction of a compiler.
Online Resources: So far I haven't found much online
about grammars.
Notation: The main concept has many different names:
formal grammar or context-free (CF) grammar or
Backus-Naur-Form (BNF) grammar.
A grammar is made up of a sequence of rules or productions.
The rules are made up of symbols: non-terminals,
terminals, and metasymbols. Each rule consists
of a single non-terminal, followed by an arrow metasymbol,
followed by a sequence of one or more terminals or non-terminals.
The rules are used to make replacements one-by-one,
to form a replacement sequence or a derivation.
Each replacement consists of replacing a non-terminal on the
left side of a rule with all the symbols on the right side.
Replacements are made one-at-a-time until only a sequence of
terminals remains, so that no more replacements are possible.
The sequence is written with arrows using a double line,
as shown in class.
Each grammar must have a special start symbol
that is used as the start of every derivation.
The final sequence of terminals is said to be described
or derived by the grammar. Notice that there can be
infinitely many possible
derived sequences, each finite in length.
A sequence of terminals described by the grammar is called
a sentence
and the set of all possible sentences is called the
language described by the grammar.
In addition to the derivation, one can also construct a
parse tree corresponding to the derivation.
Start with the start symbol as a root node, and insert
a sub-tree corresponding to each replacement.
A leftmost derivation replaces the leftmost non-terminal
at each stage, and similarly for a rightmost derivation.
In case there is more than one parse tree (or more than one
leftmost derivation, or more than one rightmost derivation)
for any string of terminals described by the grammar,
the grammar is said to be ambiguous. For most (pleasant) grammars, it is
possible to disambiguate the grammar, either by rewriting
it or by giving special disambiguating rules.
Recitation Problems:
- In each case below give an informal description of the language
defined by the given grammar. "Informal" means short and simple.
One way to get started is to try various derivation sequences
and get intuition about what sentences can be generated.
For example, in 1.a. below, one can write A ==> a A c ==> a b c,
or A ==> a A c ==> a a A c c ==> a a b c c, and so forth.
-
A ---> a A c (A is start symbol)
A ---> b
-
O ---> a | a E (O is start symbol)
E ---> a O
-
S ---> A B (S is start symbol)
A ---> a A | a
B ---> b B c | b c
- (This one is really hard and is just to show how complicated
grammars can get. Don't spend too much time on it.)
S ---> a B | b A (S is start symbol)
A ---> a S | b A A | a
B ---> b S | a B B | b
- Consider the following grammar for an assignment statement
(A), with an operator ^ representing exponentiation):
A ---> <id> = E
E ---> E + T | E - T | T
T ---> T * S | T / S | S
S ---> F ^ S | F
F ---> ( E ) | <id>
<id> ---> a | b | c | d | . . .
- What are the terminal symbols?
The non-terminal symbols? The metasymbols?
- Construct parse trees for each of the following sentences:
(But don't write them down, since this is
too irritating to do in a text file. Instead, just use them
to answer question c below.)
- d = a + b + c
- d = a ^ b ^ c
- d = a + b * c ^ d
- d = a * (b + c)
- What do the trees in ii. and iii. above say about the
precedence and associativity of the new operator ^?
- Lists of arbitrarily many items often come up in programming languages,
using different separators, such as blanks, commas, or semicolons.
- For example, consider a language consisting of simple declarations of
int identifiers, as in C or Java:
int <id>;
int <id>, <id>;
int <id>, <id>, <id>;
and so forth with any number of identifiers.
(Here int is a reserved word,
and <id> stands for an identifier token.)
Write a simple (recursive) grammar for this language.
- There is a convenient notation that is often used
with grammar rules: Use {items} to mean "zero or more
repetitions of items", that is:
items or items items
or items items items, and so forth.
Here items could be anything at all: a sequence
of several symbols.
Use this notation to describe the language in a. above in a
simpler way.
- This question users the examples in the separate web page
Examples.
- Show that the grammar in item 2 is ambiguous by giving a
sentence (sequence of terminal symbols) with more than one
parse tree.
- Show that the grammar in item 13 is ambiguous.
- Show that the grammar in item 14 is ambiguous.
What strange and undesirable property does this grammar have?
- What is the problem with the grammar in item 15?
- A regular grammar has only rules of the form:
A ---> a (A any non-terminal, a any terminal)
A ---> a B (A any non-terminal, a any terminal, B any non-terminal)
- Write a regular grammar for identifiers.
- Write a regular grammar for integer constants.
- Here is an algorithm to convert any regular grammar to
a finite state machine:
- For each non-terminal A,
construct a circled state labeled with A.
- Construct and extra double circled terminal state T.
- For each rule A ---> a, draw an arrow labeled
with a from the state labeled with A
to the state labeled with T.
- For each rule A ---> b B, draw an arrow labeled
with b from the state labeled with A
to the state labeled with B.
Convert the following regular grammar to a finite state machine:
A ---> a A | a B | b B | b
B ---> b B | c
- There is a very similar algorithm to convert a finite state
machine to a regular grammar. Figure out the algorithm (you don't
need to state it formally) and use it to convert the finite state
machine for a C-style comment to a regular grammar.
(See Comment FSM here.)
- (This question is also hard!)
Consider the following grammar:
A ---> a A b | a b
First give the language described by this grammar.
Then argue that no finite state machine can describe this language.
(If there are 10 states in the machine, consider the
sentence with 11 a's followed by 11 b's.
As the first 11 a's are processed, you must
be in the same state twice. From this state you must be able
to get to the final state as you use up the remaining a's
and all the b's.) Thus this is not a regular grammar.
What you should email to
cs3723@cs.utsa.edu:
Refer to the email submissions directions and to
deadlines at the top of this page. The text file that you send
should first have Your Name, the Course Number,
and the Recitation Number. The rest of the file
should have answers to the questions above in it.
Contents of email submission
for Recitation 4:
Last Name, First Name; Course Number; Recitation Number (4).
Answers to problems 1, 2, 3, 4, and 5 above (where each has several
subparts). Problems should be in the order given and should
be clearly labeled, with 1, 2, etc., and a., b., etc.
|
Key ideas: A formal grammar allows one to describe the
syntax of a programming language in a formal (mathematical) way
that is not subject to misinterpretations like an English description would be.
They allow an infinite collection of objects (a language) to be described
by a relatively simple short grammar.
These grammars are now essential
descriptive tools and are also directly used to construct compilers,
as we will see in the next few recitations.
Revision date: 2004-01-01.
(Use ISO 8601,
an International Standard.)