CS 3723
 Programming Languages 
   III. BNF Notation  


Material about Grammars:

This is the third of 4 pages about grammars.


Fragments of a BNF Grammar for C: Here we present a small subset of a BNF grammar for the C language, showing several simple notation variations.

In classroom examples we've been using a single capital letter to represent a non-terminal. With only 26 letters this is unacceptably limiting. The first C language grammar used below defines 65 non-terminals, plus others defined by the lexical analyzer (identifiers, constants). Grammars usually use an identifier for a non-terminal, distinguishing from terminals in several ways as shown below.

The four examples below each have the same terminal symbols and each define the same language (= set of finite strings of terminal symbols). The first three examples use different simple (and pretty obvious) strategies to tell a terminal from a non-terminal. The last example uses a single capital letter for each non-terminal, and even in this simple example we see how few such letters there are.

In each example, the first non-terminal defined (stat, or <stat>, or stat or S) is the start symbol. Notice that the first two use ":" and "::=" where we've been using an arrow.

Non-terminals in Lower Case,
Terminals in Quotes,  Full Grammar
stat       : cmpd_stat
           | if_stat
           | iter_stat
           | assgn_stat

stat_list  : stat
           | stat_list stat

cmpd_stat  : '{' stat_list '}'

if_stat    : 'if' '(' exp ')' stat
           | 'if' '(' exp ')' stat 'else' stat

iter_stat  : 'while' '(' exp ')' stat

assgn_stat : id '=' exp ';'

exp        : exp op exp
           | id
           | const

op         : '+' | '-' | '*' | '/' (etc.)
 
Non-terminals in < > Brackets
Full Grammar
<stat>       ::= <cmpd_stat>
             | <if_stat>
             | <iter_stat>
             | <assgn_stat>

<stat_list>  ::= <stat>
             | <stat_list> <stat>

<cmpd_stat>  ::= { <stat_list> }

<if_stat>    ::= if ( <exp> ) <stat>
             | if ( <exp> ) <stat> else <stat>

<iter_stat>  ::= while ( <exp> ) <stat>

<assgn_stat> ::= <id> = <exp> ;

<exp>        ::= <exp> <op> <exp>
             | <id>
             | <const>

<op>         ::= + | - | * | / (etc.)

Non-terminals in Italics,
Terminals in Bold
stat −−> cmpd_stat
     | if_stat
     | iter_stat
     | assgn_stat

stat_list −−> stat
     | stat_list stat

cmpd_stat −−> { stat_list }

if_stat −−> if ( exp ) stat
     | if ( exp ) stat  else  stat

iter_stat −−> while ( exp ) stat

assgn_stat −−> id  =  exp ;

exp −−> exp  op  exp
     | id
     | const

op −−> + | - | * | / (etc.)

  
Non-terminals in
Capital Letters
S  −−> C  |  I  |  W  |  A

L  −−> S  |  L S

C  −−> { L }

I  −−> if ( E ) S  |
       if ( E ) S else S

W  −−> while ( E ) S

A  −−> id = E ;

E  −−> E P E  |  id  | const

P  −−> + | - | * | / (etc.)
Correspondence
S is stat
L is stat_list
C is cmpd_stat
I is if_stat
W is iter_stat
A is assgn_stat
E is exp
P is op

 

Additional Common Notations: The grammars above use only two metasymbols (= symbols that have special meaning within the grammar, distinct from the terminal and non-terminal symbols): "−−>" and "|". As you introduce more powerful possibilities for notation, you have to introduce more metasymbols. Here are some common examples:

  • Use curly brackets to mean "zero or more occurrences of " what is inside the brackets.
  • Use square brackets to mean " zero or one occurrence of ", that is, an optional item.
  • Use parentheses for grouping, so that ('+' | '-') ident means the same as: '+' ident | '-' ident.
  • Use regular expressions.

Syntax Diagrams: The diagrams below show another, quite different, way to represent BNF Grammars, but the syntax diagrams are completely equivalent to the other notations. In the example below, the "normal" grammar is on the left. On the right are the syntax diagrams. There are endless variations with these diagrams, but here each separate diagram defines a non-terminal. Non-terminals are in boxes and terminals are in ovals. To use the definition you just follow the arrows.

Expression Grammar
<expression> ::= <term> | 
            <expression> "+" <term>

<term>       ::= <factor> |
                 <term> "*" <factor>

<factor>     ::= <constant> |
                 <variable> |
                 "(" <expression> ")"

<variable>   ::= "x" | "y" | "z" 

<constant>   ::= <digit> |
                 <digit> <constant>

<digit>      ::= "0" | "1" | ... | "9"
 

 

Python Syntax Description: The description of the Python language uses a BNF grammar like the examples above. The specific notation conventions are given at Notation (Python3). The lexical level (scanning) handles whitespace, newlines, comments, etc. just as with other languages. Python's weird indenting used for block structure poses unique problems that are handled by the scanner. Basically, after processing, all whitespace used for indenting has been replaced by NEWLINE, INDENT and DEDENT tokens that indicate the block structure, that is, identify each "suite" or block in the program. These details don't appear in the BNF itself. See Indentation. Below are a few fragments from the grammar for Python:

Fragments of Python Grammar
compound_stmt ::=  if_stmt
                   | while_stmt
                   | for_stmt
                   | try_stmt
                   | with_stmt
                   | funcdef
                   | classdef
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]

if_stmt ::=  "if" expression ":" suite
             ( "elif" expression ":" suite )*
             ["else" ":" suite]
while_stmt ::=  "while" expression ":" suite
                ["else" ":" suite]
for_stmt ::=  "for" target_list "in" expression_list ":" suite
              ["else" ":" suite]

Notice that a "suite" can be the following tokens:

    NEWLINE INDENT statement+ DEDENT,
where the INDENT and DEDENT replace all the whitespace used for indenting. The ":" before the suite is placed just before the suite in the various compound statements.

(Revision date: 2014-09-20. (Please use ISO 8601, the International Standard.)