CS 3721
Programming Languages  
Spring 2013
Recitation 6. 
 Semantics 1:
Assignments and I/O
Week 6: Feb 20 - Feb 22

Submit following directions at: submissions and rules at: rules. Deadlines are:
  • 2013-03-01  23:59:59 (that's Fri, 01 Mar 2013, 11:59:59 pm) for full credit.
  • 2013-03-04  23:59:59 (that's Mon, 04 Mar 2013, 11:59:59 pm) for 75% credit.

NOTE: A sample Java Parser for this assignment is at Bare parser.
(Soon a C parser.)


Overview: For this recitation, you are to translate the language described in Problem 3 of Recitation 5 into MIPS assembly code. In that previous recitation you wrote a recursive-descent parser for the language. Now you will add semantic actions to the parser code. This recitation is ignoring the "raise-to-power" operator ^, the "mod" operator %, and the input statement (using the rule: G  --->  '>'  lower-case ';'). The next recitation will add these along with an if-optional-else and a while statement.

For convenience we give this simplified version of the grammar again here in slightly different form (operators ^ and % are left off as is the grammar rule for G, all of which where in your recursive-descent parser):

Simplified Tiny® Grammar
M  --->  { S } '$'
S  --->  A  |  P  |  C
A  --->  lower-case '=' E ';'
P  --->  '<'  E  ';'
C  --->  '<'  ( 'B' | 'N' | 'T' ) ';'
E  --->  T {('+' | '-') T}
T  --->  U {('*' | '/' ) U}
U  --->  F
F  --->  '(' E ')'  |  lower-case  |  digit


Initial Recitation work: For this recitation you should ignore all statements except assignments and output statements. These are of the form:

    lower-case = expression ; (asign expression to lower-case variable)
    < expression ;            (print value of an expression)
    < B ;                     (print a blank)
    < N ;                     (print a newline)
    < T ;                     (print a tab)
    


MIPS Assembly Language: Here are three resources:

  • My page about MIPS and its assembley simulator: spim.
  • Documentation on spim: SPIM Manual.
  • An appendix to the book on computer organization by Patterson and Hennessy: Appendix A.

Form of assembler output: In writing a compiler, you have to decide what form to use for the output code, in this case MIPS assembly language. (There is a separate page about MIPS that you should also look over. It presents some of the same material that is found here.) There are many forms the output could take. I am suggesting a form here that I chose because it was the simplest I could think of, but it's not a very flexible method.

One big decision is how to handle variables. In MIPS, a variable is sometimes thought to correspond to a register, but since there are only 32 registers, and many of those cannot be used, it allows too few variables. Instead one should use a memory location for a variable, or in MIPS terms, an element of a specific array for a variable. The language Tiny® for this recitation only has 26 variables (the lower-case letters), so that simplifies things. Also there are only 10 constants: the digits from 0 to 9.

  • Use a large array for storage, say

       
              la      $s1, M
              .data
      M:      .word	0,1,2,3,4,5,6,7,8,9  # constants
              .space  104                  # 26 variables a to z
              .space  400                  # 100 temporaries
        
    Again, the page MIPS talks at length about the use of a single address, called M as the start of all the memory requirements.

    For the rest of the discussion, I'm assuming you use the simple address scheme above, with a dedicated memory location for each of the 10 constants, the 26 variables, and however many temporary locations we might need.

  • The code   g = 2;, to assign g the value 2, might be translated to:

      
              lw      $t1, 8($s1)    # load constant 2 into $t1
              sw      $t1, 64($s1)   # store into g (which is M[16]
        

  • The assignment statement:    h = f + g; might be translated to:

      
              lw      $t1, 60($s1)   # load f (= M[15]) into $t1
              lw      $t2, 64($s1)   # load g (= M[16]) into $t2
              add     $t3, $t1, $t2  # form sum in $t3
              sw      $t3, 176($s1)  # store $t3 into temporary M[44]
              lw      $t1, 176($s1)  # load temporary M[44] into $t1
              sw      $t1, 68($s1)   # store $t1 into h (= M[17])
        

  • Output   < g; might be translated to:

      
              li	$v0, 1         # 1 to print an int
              lw	$a0, 64($s1)   # load g's value (M[16]) into $a0
              syscall                # now print
        

  • Finally   < B; (print a blank) might be translated to:

      
              li	$v0, 4         # 4 to print a string
              la	$a0, Blank     # load address of string
              syscall                # now print
        

    where elsewhere you need:

      
              .data
      Blank:  .asciiz " "
        

  • You can combine all the data together, and you will need the framework of a main function.


How to do the actual translation: This first part of the recitation is only using part of the grammar (just for assignments and output):

As a hint for completing the recitation, you can examine the code to evaluate an arithmetic expression: evaluate. The code you need is actually quite similar to the code on that page. In the evaluate program, each of the various parser functions returned a double, representing the value of what below that point in the parse tree. For the compiler here, each function must return an integer giving the location in memory where the given operand or expression is to be found.

In order to do the translation, the functions of the parser involving expressions must return an integer, which will be the memory location where the expression's value will be found at run time.

For example, in your parser, when the function F finds a variable, say g, it should return a 16, which is the index in the memory array M where the value of the variable g will be at run time. Similarly if F finds a constant, say 7, it should return a 7, which again is the index in the memory array M where the value of the constant 7 will be at run time. To get the actual number of bytes from the start of M, each of the above numbers must be multiplied by 4.

In the more complicated case where F encounters a parenthesized expression, the call to E should return an integer representing the temporary location in the M array where the expression's value will be found at run time. This might be a constant (locations 0-9), a variable (locations 10-35), or a temporary (locations from 36 up).


First Sample Input:
Sample Input 3:
     f = 5 + 8;
     g = f + 8;
     h = f + g;
     < h; < N;
     $

Possible output spim code:
### Compiled on:
###Thu Feb 14 20:04:57 CST 2013
main:   addu    $s7, $ra, $zero
        la      $s1, M
### Start of compiled code
# M[36] = M[5] + M[8]
        lw      $t1, 20($s1)
        lw      $t2, 32($s1)
        add     $t3, $t1, $t2
        sw      $t3, 144($s1)
# M[15] = M[36]
        lw      $t1, 144($s1)
        sw      $t1, 60($s1)
# M[37] = M[15] + M[8]
        lw      $t1, 60($s1)
        lw      $t2, 32($s1)
        add     $t3, $t1, $t2
        sw      $t3, 148($s1)
# M[16] = M[37]
        lw      $t1, 148($s1)
        sw      $t1, 64($s1)

# M[38] = M[15] + M[16]
        lw      $t1, 60($s1)
        lw      $t2, 64($s1)
        add     $t3, $t1, $t2
        sw      $t3, 152($s1)
# M[17] = M[38]
        lw      $t1, 152($s1)
        sw      $t1, 68($s1)
# Print M[17]
        li      $v0, 1
        lw      $a0, 68($s1)
        syscall
# Print NewL as ASCII char
        li      $v0, 4
        la      $a0, NewL
        syscall
### End of complied code
        addu    $ra, $s7, $zero
        jr      $ra
        .data
M:      .word   0,1,2,3,4,5,6,7,8,9
        .space  104  # a to z
        .space  500  # temps
Blank:  .asciiz " "
NewL:   .asciiz "\n"
Tab:    .asciiz "\t"
### End of MIPS source

Output when MIPS executes:
     34

 


Detailed Analysis: Here I will give a detailed analysis of the generation of the MIPS code above for the statement: g = f + 8;:

Consider the parse tree for this statement, using the simplified grammar above:

              A
              |
       +------+--------------+------------+
       |      |              |            |
       |      |              E            |
       |      |              |            |
       |      |       +------+-------+    |
       |      |       |      |       |    |
       |      |       T      |       T    |
       |      |       |      |       |    |
       |      |       U      |       U    |
       |      |       |      |       |    |
       |      |       F      |       F    |
       |      |       |      |       |    |
       g     '='      f     '+'      8   ';'
    

The function F will be returning an integer 15 up the line from f and it will be returning an 8 up the line from 8. This is because the location reserved for f is 15 integer locations beyond the start of M, and because the location 8 integer locations beyond the start of M holds the integer constant 8.

Now try to picture what might go on inside the function E at compile time. It might look like this (not the actual code, but just sort of an indication of what is happening at compile time inside E in this particular case):

    
    int E() {
       res1 = T(); // getting a 15 returned from T
       // since in the case above there is a +, E calls T again
       res2 = T(); // getting an 8 returned from T
       // generate the following MIPS instructions:
       //  lw  $t1, 4*res1($s1)  // 4*res1 is 4*15 = 60
       //  lw  $t2, 4*res2($s1)  // 4*res2 is 4*8 = 32
       //  add $t3, $t1, $t2
       //  sw  $t3, 4*temp($s1)  // next new temp = 36. 4*37 = 148
       return temp;  // which is 36 in this case
    }
    

Similarly, picture what might go on inside the function A at compile time. It might look like this:

    
    int A() {
       loc = location of initial letter;  // a 16
       scan past '=';
       res = E();   // get 36 back
      //  generate the following MIPS instructions:
      //    lw  $t1, 4*res($s1)   // 4*res is 4*36 = 144
      //    sw  $t1, 4*loc($s1)   // 4*loc = 4*16 = 64
    }
    

Compare the above two snapshots of functions with the actual code that was generated, as was given above:

    
         # M[36] = M[15] + M[8]
                 lw      $t1,    60($s1)
                 lw      $t2,    32($s1)
                 add     $t3,    $t1,    $t2
                 sw      $t3,    144($s1)
         # M[16] = M[36]
                 lw      $t1,    144($s1)
                 sw      $t1,    64($s1)
    

My own program to carry out this translation outputs the comments also, just to help with debugging. Notice that the functions E and A always generate pretty much the same MIPS code. Since the statement being compiled is so simple, just g = f + 8, all the code is generated in a single call to E and a single call to A. With more complicated code, such as g = f*4 + (f/8), code would be generated by other parts of the parser before the final four instructions output by E as shown above. In the case of A, there will be arbitrarily much code generated by the call to E from within A, but A always outputs the same two instructions, with the only difference being the offsets from the register $s1.


How To Get Started. I would start with your parser from Recitation 5. Then initially I would make only a few changes:
  1. Change F so that it returns the proper integer for a lower-case letter and the proper one for a digit.
  2. Change U, T, E so that they pass the value coming to them from below on up by returning it.
  3. Change A so that it decides which number belongs to the variable on the left side. Save this number. Get the number returned from a call to E. Multiply all the integers by 4 to get byte offsets from the start of M. Finally generate (print) two MIPS instructions, the first to load the number at the locations returned by E into a register, and second to store this number into the location given by the identifier at the right side of the assignment. So start with the input: f = 7;$, which should be a legal input to your parse. With the changes above, your program should output the two MIPS instructions at the far left. Then with the same program, try the input in the middle. For the input on the right, you must add to the function E so that when there is an addition, it will produce an add instruction, along with three other instructions

    Sample Input 0:
    Input: f = 7; $
    Should produce output:
    # M[15] = M[7]
            lw      $t1, 28($s1)
            sw      $t1, 60($s1)
    
     
    Sample Input 1:
    Input: f = 7; g = f; $
    Should produce output:
    # M[15] = M[7]
            lw      $t1, 28($s1)
            sw      $t1, 60($s1)
    # M[16] = M[15]
            lw      $t1, 60($s1)
            sw      $t1, 64($s1)
    
     
    Sample Input 2:
    Input: g = f + 8 ; $
    Should produce output:
    # M[36] = M[15] + M[8]
            lw      $t1, 60($s1)
            lw      $t2, 32($s1)
            add     $t3, $t1, $t2
            sw      $t3, 144($s1)
    # M[16] = M[36]
            lw      $t1, 144($s1)
            sw      $t1, 64($s1)
    
    In order to successfully execute these using spim, you must add the stuff at the beginning and end of the complete sample programs.

    Sample Input 4:
    f = 5 + 8;
    g = f + 8;
    h = f + g;
    a = f*f + g*g; 
    b = g*h + f*g;
    c = g*g + h*h;
    f = a*a + b*b;
    g = b*c + a*b;
    h = c*c + b*b;
    < h; < N;
    $

    Output from spim:
    3524578
     
    Sample Input 5 (complicated):
    d = 5*2;
    c = d*d*d*d*d*d*d;
    b = 2*c;
    a = 4*c;
    k = 0;
    x = 8*2;
    t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6);
    s = t; < s; < N;
    k = 1;
    t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6);
    s = s + t/x; < s; < N;
    k = 2;
    t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6);
    s = s + t/(x*x); < s; < N;
    k = 3;
    t = a/(8*k+1) - b/(8*k+4) - c/(8*k+5) - c/(8*k+6);
    s = s + t/(x*x*x); < s; < N;
    $

    Output when MIPS executed by spim:
    31333334
    31414225
    31415874
    31415924

     


What you should submit: Refer to the submissions directions and to deadlines at the top of this page. If two people are working together, be sure that both signed in as a pair and that both names appear at the top of the submission.

You should first give the compiler source program. Then for the sample programs in turn, the MIPS code and the output from running the MIPS code. You should complete as many of the sample programs as you can.

My MIPS code for sample input 5 was about 500 lines long (yours should be at least 400 lines), so do NOT include a listing of this with your submission.


Revision date: 2013-02-14. (Use ISO 8601, an International Standard.)