CS 3721 -- Formal Grammars

CS 3721 Programming Languages
Formal Grammars

Overview: This sectionn covers the formal (mathematical) description of the syntax (that is, the appearance) of programming languages. This concept is called a formal grammar. This remarkable tool allows a finite description of infinitely many different objects. It also helps in the construction of a compiler.

Best Reference: Here is a 17-page tutorial that covers most of what I will be presenting, in more-or-less the way I will be presenting it:

Page	114	115	116	117	118	119	120	121	122	123	124	125	126	127	128	129	130
Small	S	S	S	S	S	S	S	S	S	S	S	S	S	S	S	S	S
Medium	M	M	M	M	M	M	M	M	M	M	M	M	M	M	M	M	M
Large	L	L	L	L	L	L	L	L	L	L	L	L	L	L	L	L	L

These pages are good, but they have a few potentially confusing misprints: In the captions to Figures 3.2 and 3.3, A = B + (A * C) should be A = B + C * A. In the caption to Figures 3.3, A = B + (A + C) should be A = B + C + A.

Online Resources: So far I haven't found much online about grammars.

Tutorial originally from Rochester (Jan 16 1996 / Tom LeBlanc / leblanc@cs.rochester.edu)
Generating and Recognizing Recursive Descriptions of Patterns with Context-Free Grammars
- A context-free grammar is a set of recursive rewriting rules (or productions) used to generate patterns of strings. Context-free grammars are often used to define the syntax of programming languages.
- A parse tree displays the structure used by a grammar to generate an input string. Parse trees are typically used within a compiler to give structure to an input program in terms of the syntactic rules used to define valid programs.
- A parser is an algorithm that determines whether a given input string is in a language (and, as a side-effect, usually produces a parse tree for the input). There is a procedure for generating a parser from a given context-free grammar.
Examples
More examples
21 Slides (in PDF)

Notation: The main concept has many different names: formal grammar or context-free (CF) grammar or Backus-Naur-Form (BNF) grammar.

A grammar is made up of a sequence of rules or productions. The rules are made up of symbols: non-terminals, terminals, and metasymbols. Each rule consists of a single non-terminal, followed by an arrow metasymbol, followed by a sequence of one or more terminals or non-terminals. The rules are used to make replacements one-by-one, to form a replacement sequence or a derivation. Each replacement consists of replacing a non-terminal on the left side of a rule with all the symbols on the right side. Replacements are made one-at-a-time until only a sequence of terminals remains, so that no more replacements are possible. The sequence is written with arrows using a double line, as shown in class. Each grammar must have a special start symbol that is used as the start of every derivation.

The final sequence of terminals is said to be described or derived by the grammar. Notice that there can be infinitely many possible derived sequences, each finite in length. A sequence of terminals described by the grammar is called a sentence and the set of all possible sentences is called the language described by the grammar.

In addition to the derivation, one can also construct a parse tree corresponding to the derivation. Start with the start symbol as a root node, and insert a sub-tree corresponding to each replacement.

A leftmost derivation replaces the leftmost non-terminal at each stage, and similarly for a rightmost derivation. In case there is more than one parse tree (or more than one leftmost derivation, or more than one rightmost derivation) for any string of terminals described by the grammar, the grammar is said to be ambiguous. For most (pleasant) grammars, it is possible to disambiguate the grammar, either by rewriting it or by giving special disambiguating rules.

Other treatments of this subject: Here are a few links to other similar discussions of this area:

Revision date: 2004-07-09. (Use ISO 8601, an International Standard.)

CS 3721 Programming Languages Formal Grammars

CS 3721 Programming Languages
Formal Grammars