Online Resources: So far I haven't found much online about grammars. Here are three files, though:
Notation: The main concept has many different names: formal grammar, context-free (CF) grammar, or Backus-Naur form (BNF) grammar.
A grammar is made up of a sequence of rules or productions. The rules are made up of symbols: non-terminals, terminals, and metasymbols. Each rule consists of a single non-terminal, followed by an arrow metasymbol, followed by a sequence of one or more terminals or non-terminals. The rules are used to make replacements one at a time, forming a replacement sequence or derivation. Each replacement takes a non-terminal that appears as the left side of some rule and replaces it with all the symbols on that rule's right side. Replacements are made one at a time until only a sequence of terminals remains, so that no more replacements are possible. The steps of a derivation are written with double-line arrows (==>), as shown beneath each of the examples in the text. Each grammar must have a special start symbol that is used as the start of every derivation.
The final sequence of terminals is said to be described or derived by the grammar. Notice that there can be infinitely many possible derived sequences, each finite in length. A sequence of terminals described by the grammar is called a sentence and the set of all possible sentences is called the language described by the grammar.
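For instance, here is a small illustrative grammar, one of its derivations, and the language it describes:

S ---> a S b     (S is start symbol)
S ---> c

One derivation, written with the double-line arrow, is

S ==> a S b ==> a a S b b ==> a a c b b

so a a c b b is a sentence. The language described by this grammar is the set of all sentences consisting of some number (possibly zero) of a's, then a single c, then the same number of b's.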
In addition to the derivation, one can also construct a parse tree corresponding to the derivation. Start with the start symbol as the root node; each time a non-terminal is replaced, give its node one child for each symbol on the right side of the rule used.
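The derivation above, for example, gives this parse tree (its leaves, read left to right, spell out the sentence a a c b b):

      S
     /|\
    a S b
     /|\
    a S b
      |
      c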
A leftmost derivation replaces the leftmost non-terminal at each stage, and similarly for a rightmost derivation. If some string of terminals described by the grammar has more than one parse tree (or, equivalently, more than one leftmost derivation, or more than one rightmost derivation), the grammar is said to be ambiguous. For most (pleasant) grammars, it is possible to disambiguate the grammar, either by rewriting it or by giving special disambiguating rules.
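A standard illustrative example of an ambiguous grammar is

E ---> E + E | a     (E is start symbol)

The sentence a + a + a has two different leftmost derivations,

E ==> E + E ==> E + E + E ==> a + E + E ==> a + a + E ==> a + a + a
E ==> E + E ==> a + E ==> a + E + E ==> a + a + E ==> a + a + a

corresponding to two different parse trees (one groups the first two a's under the lower +, the other groups the last two), so the grammar is ambiguous.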
Recitation Problems:
A ---> a A c     (A is start symbol)
A ---> b
O ---> a | a E   (O is start symbol)
E ---> a O
S ---> A B       (S is start symbol)
A ---> a A | a
B ---> b B c | b c
S ---> a B | b A (S is start symbol)
A ---> a S | b A A | a
B ---> b S | a B B | b
A    ---> <id> = E
E    ---> E + T | E - T | T
T    ---> T * S | T / S | S
S    ---> F ^ S | F
F    ---> ( E ) | <id>
<id> ---> a | b | c | d | . . .
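As an illustration of how this grammar builds in the usual operator precedence, here is one leftmost derivation of the sentence a = b + c * d (written over several lines):

A ==> <id> = E ==> a = E ==> a = E + T ==> a = T + T ==> a = S + T ==> a = F + T
  ==> a = <id> + T ==> a = b + T ==> a = b + T * S ==> a = b + S * S ==> a = b + F * S
  ==> a = b + <id> * S ==> a = b + c * S ==> a = b + c * F ==> a = b + c * <id> ==> a = b + c * d

Notice that the * is introduced inside the T that is an operand of the +, so in the parse tree the multiplication sits below the addition: the grammar forces * to bind more tightly than +.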
int <id>;
int <id>, <id>;
int <id>, <id>, <id>;
and so forth with any number of identifiers. (Here int is a reserved word, and <id> stands for an identifier token.) Write a simple (recursive) grammar for this language.
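One possible answer, given here only as a sketch (many equivalent grammars work), is

D ---> int L ;           (D is start symbol)
L ---> <id> , L | <id>

where the recursive rule for L generates a comma-separated list of one or more identifiers.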
A regular grammar is one in which every rule has one of the two forms:
A ---> a         (a any terminal)
A ---> a B       (a any terminal, B any non-terminal)
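In outline, a regular grammar can be turned into a (possibly nondeterministic) finite state machine by the standard construction: make one state for each non-terminal plus one extra accepting state, let the start symbol's state be the start state, and add transitions as follows:

A ---> a B   gives a transition from state A to state B on input a
A ---> a     gives a transition from state A to the accepting state on input a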
Convert the following regular grammar to a finite state machine:
A ---> a A | a B | b B | b
B ---> b B | c
A ---> a A b | a b
First give the language described by this grammar.
Then argue that no finite state machine can describe this language. (If there are 10 states in the machine, consider the sentence with 11 a's followed by 11 b's. As the first 11 a's are processed, you must be in the same state twice. From this state you must be able to get to the final state as you use up the remaining a's and all the b's; but then you could skip the loop between the two visits of that state and still reach the final state, so the machine would also accept a sentence with fewer a's than b's, which is not in the language.) Thus this language cannot be described by any finite state machine, and so not by any regular grammar either.
Key ideas: A formal grammar allows one to describe the syntax of a programming language in a formal (mathematical) way that is not subject to misinterpretation the way an English description would be. These grammars are now essential descriptive tools, and they are also used to construct compilers.