CS 3723
Programming Languages  
Fall 2014
H5. Context-free Grammars
Week 5: Sep 22 - 26

Submit following directions at: submissions and rules at: rules. Deadlines are:
  • 2014-10-06  23:59:59 (that's Mon, 6 Oct 2014, 11:59:59 pm) for full credit.
  • 2014-10-10  23:59:59 (that's Fri,  10 Oct 2014, 11:59:59 pm) for 75% credit.

Note: You should do all 6 of the problems below. I'd announced that I dropped 2 problems from this recitation, but that was before it was posted. I dropped problems 7 and 8.


Context-free (BNF, CF, Formal) Grammars: Study the following pages:

In the three links above, familiarize yourself with the terms:

Basics of context-free Grammars (also called: CFG, Formal Grammar, BNF Grammar, Backus-Naur grammar, or just a grammar):

  • Symbols,
  • Alphabet,
  • Sentence,
  • Language

Context-free grammar (CFG):

  • Non-Terminal Symbols ( = Non-Terminals)
    Non-Terminals = exactly the symbols on the left side of rules
  • Terminal Symbols ( = Terminals)
    Terminals = all other symbols except for meta-symbols
  • Meta-Symbols (for now just −−> and |)
    Meta-Symbols: later we'll add { } ( ) as additional meta-symbols
  • Replacement Rules ( = Rewriting Rules, or Productions, or just Rules)
  • Start Symbol
    This is usually the non-terminal on the left side of the first grammar rule
  • Derivation ( = Derivation Sequence, or Replacement Sequence)
    Sequence starting with the start symbol, resulting from repeated replacements using rules
  • Sentence = any sequence of terminals that can be the result of a derivation sequence

Derivations, Parse Trees, and Ambiguity:

  • Rightmost Derivation,
  • Leftmost Derivation,
  • Parse Tree,
  • Ambiguity, that is, Ambiguous Grammar.
    If there is any sentence at all with more than one parse tree or leftmost derivation?

Syntax Diagrams:

Note: In work you submit for this course, I insist that you use −−> for grammar rules, and ==> for derivations. Don't mix up the notations.


  1. In each case below give three distinct derivations leading to a sequence of terminals (a sentence in the language generated by the grammar). From this initial work come up with an informal description of the language defined by the given grammar. "Informal" means short and simple, perhaps using an English language description or set-theory notation. Note that there are no blank symbols used, since I'll usually stay away from blanks (which are handled by the scanner anyway).
    [For example, in 1.a. below, one such derivation is:
      A ==> a A c ==> a a A c c ==> a a b c c .
    You need two more derivations and then need to guess the language. Notice that we didn't need to specify the start symbol, since it follows our convention of being the first non-terminal encountered.]

    1.      A −−> a A c                 (A is start symbol)
           A −−> b
         
    2.      O −−> a  |  a E             (O is start symbol)
           E −−> a O                     (Hint: Why use symbols E and O?)
         
    3.      L −−> L A | A               (L is start symbol)
           A −−> a
         
    4.      N −−> ( C )                 (N is start symbol)
           C −−> C , A | A             (Anything not a non-terminal and not
           A −−> a | b | c | d          a meta-symbol must be a terminal.)
         
    5.      S −−> A B                   (S is start symbol)
           A −−> a A  |  a
           B −−> b B c  |  b c
         

  2. Here is a very simple example grammar:
      
      E  −−> E + E
      E  −−> E * E 
      E  −−> ( E ) | a | b | c | d 
      
    1. Give the unique leftmost derivation and the unique parse tree for:

        a * ( b + c )
    2. Give the two distinct leftmost derivations and the two distinct parse trees for:

        a * b + c 

  3. Here is a "standard" simple example grammar that I will often use:
      
      E  −−> E + T |  T
      T  −−> T * F |  F
      F  −−> ( E ) | a | b | c | d 
      
    In each case below, given the sentence, construct the leftmost derivation sequence ( = replacement sequence = derivation) and the parse tree. (These are both unique.)

    1.   a + b + c

    2.   a * b + c * d


  4. Consider the following grammar for an assignment statement (A), with an operator ^ representing exponentiation. (This is a "beefed-up" version of the standard grammar.)
      
      A  −−> L = E
      E  −−> E + T | E - T | T
      T  −−> T * S | T / S | S
      S  −−> F ^ S | F
      F  −−> ( E ) | L
      L  −−> a | b | c | d 
      
    1. What are the terminal symbols? The non-terminal symbols? The metasymbols?

    2. Construct leftmost derivations and parse trees for each of the following sentences:

      1.    d = a ^ b ^ c

      2.    d = a + b * c ^ d

    3. What do the derivations and trees in i. and ii. above say about the precedence and associativity of the new operator ^?


  5. Refer to the first link at the top of this page. Using the grammar for a double at the end of that page, give the parse tree for the following double. (You should ignore blanks in the grammar rules. Blanks are not usually handled by the grammar, but instead by the lexical level, which we will do later in the course.)

    1.  2.45E−3


  6. Refer to the third link at the top of this page. Using either one of the first two grammars on the page, give the parse tree for the following sentence. (Assume that "id" and"const" are terminals, since they would be normally be handled by the scanner. In the real source, each of these would be an actual identifier or constant, not all the same. Again ignore blanks.)

    1. if ( id ) { while ( id ) id = id + const ; }

( Revision date: 2014-09-23. (Please use ISO 8601, the International Standard.)