CS 3723/3721 -- Tiny Compiler 2: Semantic Actions

(Click or use local copy.)

CS 3723/3721
Programming Languages
Spring 2005
Recitation 6
Tiny Compiler 2: Translate Assignments
    Week 6: Feb 21-25
Due (on time): 2005-02-28  23:59:59
Due (late):        2005-03-04  23:59:59

Recitation 6 must be submitted following directions at: submissions with deadlines

2005-02-28 23:59:59 (that's Monday, 28 February 2005, 11:59:59 pm) for full credit.
2005-03-04 23:59:59 (that's Friday, 4 March 2005, 11:59:59 pm) for 75% credit.

Overview: For a description and examles of semantic action or syntax-directed translation, refer to syntax directed translation.

Initial Recitation work: For this recitation and the next one, you are to translate the language described in the previous recitation into MIPS assembly code.

For this recitation you should ignore all statements except assignments and output statements, which are of the form:

< expression ; (print an expression)
< B ;          (print a blank)
< N ;          (print a newline)
< T ;          (print a tab)

The recitation is also ignoring the ^ operator. (You can use the parser for the full language from Recitation 5, or you can truncate the parser if you like.)

Form of assembler output: In writing a compiler, you have to decide what form to use for the output code, in this case MIPS assembly language. There are many forms the output could take. I am suggesting a form here that I chose because it was the simplest I could think of, but it's not a very flexible method.

One big decision is how to handle variables. In MIPS, a variable is sometimes thought to correspond to a register, but since there are only 32 registers, and many of those cannot be used, it allows too few variables. Instead one should use a memory location for a variable, or in MIPS terms, an element of a specific array for a variable. The language Tiny® for this recitation only has 26 variables (the lower-case letters), so that simplifies things. Also there are only 10 constants: the digits from 0 to 9.

Use a large array for storage, say


        la      $s1,    M
        .data
M:      .word	0,1,2,3,4,5,6,7,8,9  # constants
        .space  104                  # 26 variables a to z
        .space  400                  # 100 temporaries

There are other, better ways to handle variables: in general one should use a symbol table for the variables that actually occur in the program, and map each variable at compile time to a memory location at run time. This would allow arbitrarily many variables.

Simlarly one could use the same table or a different table for constants as they occur. Alternatively you could handle the constants one-by-one as they arise, just hard-wiring the constant into the assembly code.

For the rest of the discussion, though, I'm assuming you use the simple scheme above, with a dedicated memory location for each of the 26 variables and each of the 10 constants.

The code g = 2; might be translated to:


        lw      $t1,    8($s1)       # load constant 2 into $t1
        sw      $t1,    64($s1)      # store into g (which is M[16]

Similarly, h = f + g; might be translated to:


        lw      $t1,    60($s1)      # load f (= M[15]) into $t1
        lw      $t2,    64($s1)      # load g (= M[16]) into $t2
        add     $t3,    $t1,    $t2  # form sum in $t3
        sw      $t3,    176($s1)     # store $t3 into temporary M[44]
        lw      $t1,    176($s1)     # load temporary M[44] into $t1
        sw      $t1,    68($s1)      # store $t1 into h (= M[17])

Output < g; might be translated to:


        li	$v0,    1            # 1 to print an int
        lw	$a0,    64($s1)      # load g's value (M[16]) into $a0
        syscall                      # now print

Finally < B; (print a blank) might be translated to:


        li	$v0,    4            # 4 to print a string
        la	$a0,    Blank        # load address of string
        syscall                      # now print

where elsewhere you need:


        .data
Blank:  .asciiz " "

You can combine all the data together, and you will need the framework of a main function.

How to do the actual translation: This first part of the recitation is only using part of the grammar (just for assignments and output):

Simplified Tiny® Grammar
M ---> { S } '#' S ---> A \| P \| C A ---> lower-case '=' E ';' P ---> '<' E ';' C ---> '<' upper-case ';' E ---> T {('+' \| '-') T} T ---> U {('*' \| '/' \| '%') U} U ---> F F ---> '(' E ')' \| lower-case \| digit

Simplified Tiny® Grammar

M  --->  { S } '#'
S  --->  A  |  P  |  C
A  --->  lower-case '=' E ';'
P  --->  '<'  E  ';'
C  --->  '<'  upper-case ';'
E  --->  T {('+' | '-') T}
T  --->  U {('*' | '/' | '%') U}
U  --->  F
F  --->  '(' E ')'  |  lower-case  |  digit

As a hint for completing the recitation, you can work with the code to evaluate an arithmetic expression, producing a double: C source file: .c; with a file showing a run: .txt, .html, .ps, .pdf. (The code you need is actually quite similar to the code above.) Each of the various functions of the parser must return an integer giving the location in memory where the given operand or expression is to be found: memory locations 0 to 9 for constants 0 to 9, locations 10 to 35 for variables a to z, and locations 36 to 85 for 50 temporaries.

In order to do the translation, the functions of the parser involving expressions must return an integer, which will be the memory location where the expression's value will be found at run time.

For example, in your parser, when the function F finds a variable, say g, it should return a 16, which is the index in the memory array M where the value of the variable g will be at run time. Similarly if F finds a constant, say 7, it should return a 7, which again is the index in the memory array M where the value of the constant 7 will be at run time.

In the more complicated case where F encounters a parenthesized expression, the call to E should return an integer representing the location in the M array where the expression's value will be found at run time. This might be a constant (locations 0-9), a variable (locations 10-35), or a temporary (locations from 36 up).

First Sample Input:

Sample Input 1 (simple):

     f = 5 + 8;
     g = f + 8;
     h = f + g;
     < h; < N;
     $

Possible output spim code (many forms):

     ### Compiled on: Fri Sep 27 11:39:34 CDT 2002
     main:   addu    $s7,    $ra,	$zero
             la      $s1,    M
     ### Start of compiled code
     # M[36] = M[5] + M[8]
             lw      $t1,    20($s1)
             lw      $t2,    32($s1)
             add     $t3,    $t1,    $t2
             sw      $t3,    144($s1)
     # M[15] = M[36]
             lw      $t1,    144($s1)
             sw      $t1,    60($s1)
     # M[37] = M[15] + M[8]
             lw      $t1,    60($s1)
             lw      $t2,    32($s1)
             add     $t3,    $t1,    $t2
             sw      $t3,    148($s1)
     # M[16] = M[37]
             lw      $t1,    148($s1)
             sw      $t1,    64($s1)
     # M[38] = M[15] + M[16]
             lw      $t1,    60($s1)
             lw      $t2,    64($s1)
             add     $t3,    $t1,    $t2
             sw      $t3,    152($s1)
     # M[17] = M[38]
             lw      $t1,    152($s1)
             sw      $t1,    68($s1)
     # Print M[17]
             li      $v0,    1
             lw      $a0,    68($s1)
             syscall
     # Print NewL as ASCII char
             li      $v0,    4
             la      $a0,    NewL
             syscall
     ### End of complied code
             addu    $ra,	$s7,	$zero
             jr      $ra
             .data
     M:      .word   0,1,2,3,4,5,6,7,8,9 # constants   
             .space  104  # variables a to z
             .space  500  # temporaries
     Blank:  .asciiz " "
     NewL:   .asciiz "\n"
     Tab:    .asciiz "\t"
     Colon:  .asciiz ":"
     SemiC:  .asciiz ";"
     Star:   .asciiz "*"

Output when MIPS executed by spim:

Detailed Analysis: Here I will give a detailed analysis of the generation of the MIPS code above for the statement: g = f + 8;:

Consider the parse tree for this statement, using the simplified grammar above:


          A
          |
   +------+--------------+
   |      |              |
   |      |              E
   |      |              |        
   |      |       +------+-------+
   |      |       |      |       |
   |      |       T      |       T
   |      |       |      |       |
   |      |       U      |       U
   |      |       |      |       |
   |      |       F      |       F
   |      |       |      |       |
   g     '='      f     '+'      8

The function F will be returning an integer 15 up the line from f and it will be returning an 8 up the line from 8.

Now try to picture what might go on inside the function E at compile time. It might look like this (not the actual code, but just sort of an indication of what is happening at compile time inside E in this particular case):


int E() {
   res1 = T(); // getting a 15 returned from T
   // since in the case above there is a +, E calls T again
   res2 = T(); // getting an 8 returned from T
   // generate the following MIPS instructions:
      lw  $t1, 4*res1($s1)  // 4*res1 is 4*15 = 60
      lw  $t2, 4*res2($s1)  // 4*res2 is 4*8 = 32
      add $t3, $t1, $t2
      sw  $t3, 4*temp($s1)  // next new temp = 37. 4*37 = 148
   return temp;  // which is 37 in this case
}

Similarly, picture what might go on inside the function A at compile time. It might look like this:


int A() {
   loc = location of initial letter;  // a 16
   scan past '=';
   res = E();   // get 37 back
   generate the following MIPS instructions:
      lw  $t1, 4*res($s1)   // 4*res is 4*37 = 148
      sw  $t1, 4*loc($s1)   // 4*loc = 4*16 = 64
}

Compare the above two snapshots of functions with the actual code that was generated, as was given above:


     # M[37] = M[15] + M[8]
             lw      $t1,    60($s1)
             lw      $t2,    32($s1)
             add     $t3,    $t1,    $t2
             sw      $t3,    148($s1)
     # M[16] = M[37]
             lw      $t1,    148($s1)
             sw      $t1,    64($s1)

My own program to carry out this translation output the comments also, just to help with debugging. Notice that the functions E and A always generate pretty much the same MIPS code. Since the statement being compiled is so simple, just g = f + 8, all the code is generated in a single call to E and a single call to A. With more complicated code, such as g = f*4 + (f/8), code would be generated by other parts of the parser before the final four instructions output by E as shown above. In the case of A, there will be arbitrarily much code generated by the call to E from within A, but A always outputs the same two instructions, with the only difference being the offsets from the register $s1.

Remaining Sample Inputs:

Sample Input 2 (intermediate):

     f = 5 + 8;
     g = f + 8;
     h = f + g;
     a = f*f + g*g;                                    
     b = g*h + f*g;
     c = g*g + h*h;
     f = a*a + b*b;
     g = b*c + a*b;
     h = c*c + b*b;
     < h; < N;
     $

Output when MIPS executed by spim:

Sample Input 3 (complicated):

d = 5*2; c = d*d*d*d*d*d*d; b = 2*c; a = 4*c; k = 0; x = 8*2; t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6); s = t; < s; < N; k = 1; t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6); s = s + t/x; < s; < N; k = 2; t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6); s = s + t/(x*x); < s; < N; k = 3; t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6); s = s + t/(x*x*x); < s; < N; $

Output when MIPS executed by spim:

31333334 31414225 31415874 31415924

What you should submit: Refer to the submissions directions and to deadlines at the top of this page. The text file that you submit should first have Your Name, the Course Number, and the Recitation Number. The rest of the file should have the following in it, in the order below, and clearly labeled, including at the beginning the appropriate item letters: a, b, c, etc.

Contents of submission for Recitation 6:
Last Name, First Name; Course Number; Recitation Number (6).
a. Java or C++ code for your translator, perhaps in multiple files.
b. Results of a run of your program using Sample Input 1 above, giving the complete resulting MIPS code.
c. Results of executing the MIPS code in b using the SPIM simulator.
d. Results of a run of your program using Sample Input 2 above, giving the complete resulting MIPS code.
e. Results of executing the MIPS code in d using the SPIM simulator.
f. Results of a run of your program using Sample Input 3 above, giving the complete resulting MIPS code.
g. Results of executing the MIPS code in f using the SPIM simulator.

Key ideas: It is relatively easy to add code to the recursive-descent parser that will carry out tasks, such as translating input assignment statements into MIPS assembler code. This process is known as syntax-directed traslation. It is an extremely powerful technique for writing compilers and for other similar tasks.

Revision date: 2005-02-25. (Use ISO 8601, an International Standard.)