(Soon a C parser.) Overview: For this recitation, you are to translate the language described in Problem 3 of Recitation 5 into MIPS assembly code. In that previous recitation you wrote a recursive-descent parser for the language. Now you will add semantic actions to the parser code. This recitation is ignoring the "raise-to-power" operator ^, the "mod" operator %, and the input statement (using the rule: G ---> '>' lower-case ';'). The next recitation will add these along with an if-optional-else and a while statement. For convenience we give this simplified version of the grammar again here in slightly different form (operators ^ and % are left off as is the grammar rule for G, all of which where in your recursive-descent parser):
Initial Recitation work: For this recitation you should ignore all statements except assignments and output statements. These are of the form: lower-case = expression ; (asign expression to lower-case variable) < expression ; (print value of an expression) < B ; (print a blank) < N ; (print a newline) < T ; (print a tab) MIPS Assembly Language: Here are three resources:
Form of assembler output: In writing a compiler, you have to decide what form to use for the output code, in this case MIPS assembly language. (There is a separate page about MIPS that you should also look over. It presents some of the same material that is found here.) There are many forms the output could take. I am suggesting a form here that I chose because it was the simplest I could think of, but it's not a very flexible method. One big decision is how to handle variables. In MIPS, a variable is sometimes thought to correspond to a register, but since there are only 32 registers, and many of those cannot be used, it allows too few variables. Instead one should use a memory location for a variable, or in MIPS terms, an element of a specific array for a variable. The language Tiny® for this recitation only has 26 variables (the lower-case letters), so that simplifies things. Also there are only 10 constants: the digits from 0 to 9.
lw $t1, 8($s1) # load constant 2 into $t1 sw $t1, 64($s1) # store into g (which is M[16] lw $t1, 60($s1) # load f (= M[15]) into $t1 lw $t2, 64($s1) # load g (= M[16]) into $t2 add $t3, $t1, $t2 # form sum in $t3 sw $t3, 176($s1) # store $t3 into temporary M[44] lw $t1, 176($s1) # load temporary M[44] into $t1 sw $t1, 68($s1) # store $t1 into h (= M[17]) li $v0, 1 # 1 to print an int lw $a0, 64($s1) # load g's value (M[16]) into $a0 syscall # now print li $v0, 4 # 4 to print a string la $a0, Blank # load address of string syscall # now printwhere elsewhere you need: .data Blank: .asciiz " " How to do the actual translation: This first part of the recitation is only using part of the grammar (just for assignments and output): As a hint for completing the recitation, you can examine the code to evaluate an arithmetic expression: evaluate. The code you need is actually quite similar to the code on that page. In the evaluate program, each of the various parser functions returned a double, representing the value of what below that point in the parse tree. For the compiler here, each function must return an integer giving the location in memory where the given operand or expression is to be found. In order to do the translation, the functions of the parser involving expressions must return an integer, which will be the memory location where the expression's value will be found at run time. For example, in your parser, when the function F finds a variable, say g, it should return a 16, which is the index in the memory array M where the value of the variable g will be at run time. Similarly if F finds a constant, say 7, it should return a 7, which again is the index in the memory array M where the value of the constant 7 will be at run time. To get the actual number of bytes from the start of M, each of the above numbers must be multiplied by 4. In the more complicated case where F encounters a parenthesized expression, the call to E should return an integer representing the temporary location in the M array where the expression's value will be found at run time. This might be a constant (locations 0-9), a variable (locations 10-35), or a temporary (locations from 36 up). First Sample Input:
Detailed Analysis: Here I will give a detailed analysis of the generation of the MIPS code above for the statement: g = f + 8;: Consider the parse tree for this statement, using the simplified grammar above: A | +------+--------------+------------+ | | | | | | E | | | | | | | +------+-------+ | | | | | | | | | T | T | | | | | | | | | U | U | | | | | | | | | F | F | | | | | | | g '=' f '+' 8 ';'The function F will be returning an integer 15 up the line from f and it will be returning an 8 up the line from 8. This is because the location reserved for f is 15 integer locations beyond the start of M, and because the location 8 integer locations beyond the start of M holds the integer constant 8. Now try to picture what might go on inside the function E at compile time. It might look like this (not the actual code, but just sort of an indication of what is happening at compile time inside E in this particular case): int E() { res1 = T(); // getting a 15 returned from T // since in the case above there is a +, E calls T again res2 = T(); // getting an 8 returned from T // generate the following MIPS instructions: // lw $t1, 4*res1($s1) // 4*res1 is 4*15 = 60 // lw $t2, 4*res2($s1) // 4*res2 is 4*8 = 32 // add $t3, $t1, $t2 // sw $t3, 4*temp($s1) // next new temp = 36. 4*37 = 148 return temp; // which is 36 in this case }Similarly, picture what might go on inside the function A at compile time. It might look like this: int A() { loc = location of initial letter; // a 16 scan past '='; res = E(); // get 36 back // generate the following MIPS instructions: // lw $t1, 4*res($s1) // 4*res is 4*36 = 144 // sw $t1, 4*loc($s1) // 4*loc = 4*16 = 64 }Compare the above two snapshots of functions with the actual code that was generated, as was given above: # M[36] = M[15] + M[8] lw $t1, 60($s1) lw $t2, 32($s1) add $t3, $t1, $t2 sw $t3, 144($s1) # M[16] = M[36] lw $t1, 144($s1) sw $t1, 64($s1)My own program to carry out this translation outputs the comments also, just to help with debugging. Notice that the functions E and A always generate pretty much the same MIPS code. Since the statement being compiled is so simple, just g = f + 8, all the code is generated in a single call to E and a single call to A. With more complicated code, such as g = f*4 + (f/8), code would be generated by other parts of the parser before the final four instructions output by E as shown above. In the case of A, there will be arbitrarily much code generated by the call to E from within A, but A always outputs the same two instructions, with the only difference being the offsets from the register $s1. How To Get Started. I would start with your parser from Recitation 5. Then initially I would make only a few changes:
What you should submit: Refer to the submissions directions and to deadlines at the top of this page. If two people are working together, be sure that both signed in as a pair and that both names appear at the top of the submission. You should first give the compiler source program. Then for the sample programs in turn, the MIPS code and the output from running the MIPS code. You should complete as many of the sample programs as you can. My MIPS code for sample input 5 was about 500 lines long (yours should be at least 400 lines), so do NOT include a listing of this with your submission. Revision date: 2013-02-14. (Use ISO 8601, an International Standard.) |