CS 3721 Recitation 2: Finite State Machines

CS 3721
Programming Languages
Spring 2013

Recitation 2. FSM

Week 2: Jan 23-25

Submit following directions at: submissions and rules at: rules. Deadlines are:

2013-01-28 23:59:59 (that's Mon, 28 Jan 2013, 11:59:59 pm) for full credit.
2013-01-01 23:59:59 (that's Fri, 01 Feb 2013, 11:59:59 pm) for 75% credit.

List of names of students submitting, after final due date: Rec2 Submissions.

Overview: A compiler first breaks the sequence of input characters of a program into a similar sequence of basic units called tokens, in a process called lexical analysis or scanning. Examples of tokens are identifiers, keywords, constants, operators, and special characters. This recitation is focusing on the most complex token, a floating point constant. See lexical analysis for more information.

Writing a recognizer for individual tokens: The most common way to write such a recognizer is to first create a finite state machine (FSM) that describes the token. Then the FSM is converted to a program, either by hand or using a automated software tool to produce the program (such as lex in Unix, or more recently flex and many others).

Regular expressions (RE) are formally equivalent to FSMs, and they are also regularly used in lexical analysis.

C-style comments: C-style comments use an initial /* and a terminal */ to form a comment, This is a very good illustration of the concept of using a FSM to help construct a recognizer, that is, a program that can tell when it has encountered a comment. A recognizer for these comments is illustrated here: C-style comments. Notice that this recognizer can be significantly simplified, and you are allowed to do this.

I will suggest in class that the code given above be written in a slightly different form.

Floating point constants: This is the most complex token that will be worked with in this and the next recitation. For our purposes here, a floating point constant consists initially of any number (including 0) of digits, followed by an optional decimal point(.), followed by any number (including 0) of digits, followed by an optional exponent part, which is an e or E, followed by an optional sign (+ or -), followed by 1 or more digits for the exponent. There must be at least one digit before or after the decimal point (or both), and if there is no decimal point there must be at least one digit. There must be either a decimal point or the exponent part (or both).

This and the next recitation will include an integer constant as a special case, since it just has no decimal point and no exponent part.

Actual floating point constants in Java can also have an optional trailing f or F for float constants and the optional trailing d or D for double constants. (With no optional trailing letter the constant is double by default.) You should ignore these possibilities. You should also ignore the limitation on the size of the exponent, so the last illegal constant in the Java program below (not illegal in C) would be accepted by your FSM. Any initial optional sign (+ or -) is always treated as a separate operator, and it should not be present in the input for this recitation.

Here is a short program that tries out various floating point constants in C and Java. The program only shows most of the possible forms. All the constants shown are legal except for the last 3 in C and the last 5 in Java (all commented out). Your code should accept all but the last 3 on either side.

C Program

Java Program

// double.c: try out double constants
#include <stdio.h>
int main() {   /* below "d" stands for "digit" */
   printf("11.75e2 = %f: normal\n", 11.75e2);
   printf("  .75e2 =   %f: no d before .\n", .75e2);
   printf("  11.e2 = %f: no d after .\n", 11.e2);
   printf("   11e2 = %f: no .\n", 11e2);
   printf("11.5e-2 =    %f: neg after e\n", 11.5e-2);
   printf("11.5e+2 = %f: pos after e\n", 11.5e+2);
   printf("11.75E2 = %f: cap E\n", 11.75E2);
   printf("  11.75 =   %f: no e\n", 11.75);
   printf("    .75 =    %f: no e, no d before .\n",.75);
   printf("    11. =   %f: no e, no d after .\n", 11.);
   printf("     11 =   %f: no e, no .\n", 11);
   printf( 
     "333333333.222222222e-2 = %20.16f: too many d's\n",
      333333333.222222222e-2);
   printf("1.0e-444 = %20.16f:  underflow\n", 1.0e-444);
   printf("1.0e+444 = %20.16f:  overflow\n", 1.0e+444);
/* printf(".e-2 = %f: only .\n", .e-2);
   printf("e-2  = %f: only e\n",  e-2);
   printf("1.2e = %f: e, no d\n", 1.2e);
 */
}

// Double: try out double constants
public class Double {
   public static void main(String[] args) {
      System.out.println("11.75e2 = "+""+ 11.75e2);
      System.out.println("  .75e2 = "+"  "+ .75e2);
      System.out.println("  11.e2 = "+""+   11.e2);
      System.out.println("   11e2 = "+""+    11e2);
      System.out.println("11.5e-2 = "+"   "+11.5e-2);
      System.out.println("11.5e+2 = "+""+ 11.5e+2);
      System.out.println("11.75E2 = "+""+ 11.75E2);
      System.out.println("  11.75 = "+"  "+ 11.75);
      System.out.println("    .75 = "+"   "+  .75);
      System.out.println("    11. = "+"  "+   11.);
      System.out.println("     11 = "+"  "+    11);
      System.out.println( 
         "333333333.222222222e-2 = " +
          333333333.222222222e-2);
   /* System.out.println("1.0e-444  = " + 1.0e-444);
      System.out.println("1.0e+444  = " + 1.0e+444);
      System.out.println(".e-2  = " + .e-2);
      System.out.println("e-2   = " + e-2);
      System.out.println("1.2e  = " + 1.2e);
    */
   }
}

C Run and Output

Java Run and Output

% cc -o double double.c -Wall
double.c: In function ‘main’:
double.c:15:4: warning: format ‘%f’ expects type
   'double', but argument 2 has type 'int'
double.c:19:4: warning: floating constant
   truncated to zero
double.c:20:4: warning: floating constant
   exceeds range of 'double'
% ./double
11.75e2 = 1175.000000: normal
  .75e2 =   75.000000: no d before .
  11.e2 = 1100.000000: no d after .
   11e2 = 1100.000000: no .
11.5e-2 =    0.115000: neg after e
11.5e+2 = 1150.000000: pos after e
11.75E2 = 1175.000000: cap E
  11.75 =   11.750000: no e
    .75 =    0.750000: no e, no d before .
    11. =   11.000000: no e, no d after .
     11 =   11.000000: no e, no .
333333333.222222222e-2 = 3333333.3322222223505378:
   too many d's
1.0e-444 =   0.0000000000000000:  underflow
1.0e+444 =                  inf:  overflow

% javac Double.java
19: floating point number too small
  ... println("1.0e-444  = " + 1.0e-444);
                               ^
20: floating point number too large
  ... println("1.0e+444  = " + 1.0e+444);
                               ^
21: illegal start of expression
  ... println(".e-2  = " + .e-2);
                           ^
21: ')' expected
  ... println(".e-2  = " + .e-2);
                                ^
23: malformed floating point literal
  ... println("1.2e  = " + 1.2e);
                           ^
% java Double
11.75e2 = 1175.0
  .75e2 =   75.0
  11.e2 = 1100.0
   11e2 = 1100.0
11.5e-2 =    0.115
11.5e+2 = 1150.0
11.75E2 = 1175.0
  11.75 =   11.75
    .75 =    0.75
    11. =   11.0
     11 =   11
333333333.222222222e-2 = 3333333.3322222224

The three commented-out constants in the Java code produced error messages:

double.c:21:26: error: expected expression before '.' token
double.c:22:27: error: 'e' undeclared (first use in this function)
double.c:23:26: error: exponent has no digits

One method for constructing a program to recognize floating point constants starts with a complete FSM that exactly recognizes these constants. Such an FSM is fairly complex, because of the many constraints (such "at least one digit before or after the decimal point"). In our example, a string of digits, which would normally be recognized as an int, instead will be treated as a double. Here is a FSM for this case:

FSM for doubles

This method is similar to the example for recognizing C-style comments, but even more complicated.

An easier approach is to start with a "skeleton" FSM that has the basic parts: initial digits, decimal point, trailing digits, letter e, sign on exponent, exponent. Then the constraints can be handled with flags or counters or by other means, so that if there are no initial digits and the constant starts with a decimal point, the program knows that there must be trailing digits, and similarly for other constraints.

Recitation work: You should write a program in either C or in Java (or in C++) that recognizes floating point constants.

Your program should save the characters of the constant as it goes along, so that the actual character string making up the constant in the program can be printed.

What you should submit: Refer to the submissions directions and to deadlines at the top of this page. The text file that you submit should first have Your Name, the Course Number, and the Recitation Number. The rest of the file should have the following in it, in the order below, and clearly labeled, including at the beginning the appropriate item letters: a and b.

Contents of submission for Recitation 2:
Last Name, First Name; Course Number; Recitation Number (2)
Last Name, First Name (of second person, if two are working together).
1. The Java or C++ source file or files for your program. Everything should be run together into one file, with reasonable separators between components (the separate source files and the output). The code should be reasonably organized and written, with special emphasis on header comments. (Not much emphasis on inline comments.)
2. You should give the results of a run using the following source file for input. In case your program works for some inputs but not for the complete source file below, you should explain this and use simpler source that your program will handle (for some loss of credit).

11 11. 62.47 .14 11.75e2 .75e2 11.e2 11e2 11.5e-2 11.5e+2 11.75E2 11.75 .75 .23 .84e-2 666 345.578e5 1234.5678 .e-2 e-2 1.2e $
Output format: In b above, your output should be the string that represents each scanned floating point constant. For example, with the input in 1 above, your output could have the form:
Next double:"11" Next double:"11." Next double:"62.47" Next double:".14" Next double:"11.75e2" Next double:".75e2" Next double:"11.e2" Next double:"11e2" Next double:"11.5e-2" Next double:"11.5e+2" Next double:"11.75E2" Next double:"11.75" Next double:".75" Next double:".23" Next double:".84e-2" Next double:"666" Next double:"345.578e5" Next double:"1234.5678" (plus several things for the last three items)
End-of-file problems: Sometimes students have trouble recognizing end-of-file and spend a lot of time on this simple problem. Instead, you may use a "$" character to represent end-of-file (as shown above), or you may use the system EOF that is provided. (So do not get hung up on end-of-file.)

Key ideas: Finite state machines (FSMs) can be used to aid in constructing a program to recognize tokens of a programming language such as floating point constants or C-style comments.

Revision date: 2013-01-22. (Please use ISO 8601, the International Standard.)