CS 3323 Topics in Programming Languages
Lisp Interpreter Project


LISP Interpreter 2--Parser and Translator
Due February 24, 1999

The purpose of this assignment is to write an interactive S-expression reader/writer, the functions of which are

The conversion from external to internal form has two subparts:
  1. The first is a parser or recognizer of Lisp S-expressions. As the name implies, it will recognize the syntax of a legal S-expression (and reject illegal ones). C code for this subpart is based on a context-free grammar (or BNF grammar) for S-expressions. (See below.)
  2. The second subpart actually translates the S-expression from external form into linked list form. The C code for the second subpart is intermixed with that for the first subpart.
The translation from linked list form back to external form is similar to the other translation, but is simpler since there are no errors to check.

Internal or linked list form: Use the following C type declarations for the internal form of S-expressions. They should be in a header file for the lisp parser.

     typedef enum {l, a, n} Stype;
     typedef struct scell* Sp;
     typedef struct scell {
                Stype celltype;
                union {
                     struct {
                             Sp first;
                             Sp rest;
                     } lcell;
                     char * alfp;
                     Numtype nump;
                } ptrs;
             } Scell;
It is convenient to represent the null S-expression ( "()" or "NIL") just by the null C pointer "NULL". The way in which S-expressions are represented in linked list form will be discussed in class. Notice that the union is important here, because it saves storage for lisp cells, and there are many such cells.

Parser or recognizer for S-expressions: The syntax of an S-expression is given by the following context-free grammar:

    SEXPR   --->  alpha-atom   |   numeric-atom   |   "("  TAIL

    TAIL    --->  ")"          |    SEXPR    TAIL
Here alpha-atom, numeric-atom, "(", and ")" are the four tokens returned by the scanner.

The framework for the translator is a recursive descent parser based on the above grammar. This parser has a procedure for each non- terminal symbol (SEXPR and TAIL above). The C code for each procedure just mirrors the grammar rule for the symbol. Thus procedure sexpr looks for either an alphabetic atom, or a numeric atom, or a left parenthesis followed by stuff that it uses procedure tail to handle. Similarly, procedure tail expects to find either a right parenthesis, or else something sexpr can handle (recursive call) followed by something tail (another recursive call) can handle. Each time a token is handled, gettoken must be called for the next token.

Translator using C: For the translation itself, each of sexpr and tail should be made into functions that return Sp pointers. One possible organization for your program might be as follows. (Note that below, some of these functions were buried in my source file for the parser, and so were declared static, while others had a prototype in the header file for the parser. You may want to organize your files differently.)

     long gettoken (Token* tok);
     /* this function was written during part 1 of the assignment.
        The only time any reading is done by getsexpr is when there
        is a call to this procedure. */

     static Sp getsexpr (void);
     /* reads an S-expression from input stream using gettoken and
        functions sexpr and tail below.  Translates to internal linked
        list form as it proceeds  */

     Sp sexpr(void);
     /* reads and translates an S-expression.  The translated form is
        returned as the function value */

     static Sp tail (void);
     /* reads a list of S-expressions, given that the first "("
        has been read.  May call SEXPR and TAIL recursively */
The translation should be carried out using three additional functions:

     static Sp newalfatom (Alftype );
     /* creates a Lisp cell containing an alphabetic atom */

     Sp newnumatom(Numtype);
     /* creates a Lisp cell containing a numeric atom */

     Sp cons(Sp, Sp);
     /* creates a new list which is the "cons" of lists S1 and S2 */
Only these last three functions should actually update the internal structure of the Lisp cells. Then we can regard Lisp cells as abstract data types which are acted on only by these three functions. This approach is important. Pointers are similar to goto's in that their unrestricted use can be very confusing and error prone. In both cases such programs are difficult to prove correct and difficult to debug.

In order to convert the internal form to external form and to write it out, use the following outline, similar to the above:

     Sp car(Sp);
     /* returns the "car" of S-expression S */

     Sp cdr(Sp);
     /* returns the "cdr" of S-expression S */

     static void wsexpr(Sp );
     /* write out an S-expression */

     static void wtail(Sp );
     /* writes the list S, assuming leftmost "(" already written.
        May call WSEXPR and WTAIL recursively */
As before, only car, cdr and perhaps a few other functions should manipulate the internal details of Lisp cells. I used two additional functions:

     char * chars(Sp);
     /* assuming Sp points to an alpha atom, return the actual
        characters of the atom */

     Numtype number(Sp);
     /* assuming Sp points to a numeric atom, return the floating
        point number pointed to by Sp */
What to turn in: You should exercise your procedures and functions with a simple driver program that reads and writes S-expressions. Extra blanks will be discarded by the scanner, and so will not appear in the S-expression when it is written out. The individual S-expressions in a list should each be separated by one blank. Don't worry if your output doesn't look exactly like that in some standard text, but it should be close. (It is permissible for your output to have a few extra blanks here and there.) The only error situation besides scanner errors is a right parenthesis at the start of an S-expression. If there are missing right parentheses to close the S-expression, your program should just wait for more to be entered.
Revision date: 1/6/99