|
CS 3721
Programming Languages
Spring 2013 |
Recitation 2.
FSM
|
Week 2: Jan 23-25
|
Submit following directions at:
submissions
and rules at:
rules.
Deadlines are:
- 2013-01-28 23:59:59 (that's Mon, 28 Jan 2013, 11:59:59 pm)
for full credit.
- 2013-01-01 23:59:59 (that's Fri, 01 Feb 2013, 11:59:59 pm)
for 75% credit.
|
List of names of students submitting, after final due date:
Rec2 Submissions.
Overview:
A compiler first breaks the sequence of input characters of a program into
a similar sequence of basic units called tokens, in a process called
lexical analysis or scanning.
Examples of tokens are identifiers, keywords, constants, operators,
and special characters.
This recitation is focusing on the most complex token,
a floating point constant.
See lexical analysis
for more information.
Writing a recognizer for individual tokens:
The most common way to write such a recognizer is to first
create a finite state machine (FSM) that describes the token.
Then the FSM is converted to a program, either by hand or using
a automated software tool to produce the program
(such as lex in Unix, or more recently
flex and many others).
Regular expressions (RE) are formally equivalent to FSMs, and
they are also regularly used in lexical analysis.
C-style comments: C-style comments
use an initial /* and a terminal */ to form a comment,
This is a very good illustration of the concept of using a FSM to help
construct a recognizer, that is, a program that can tell when it has
encountered a comment.
A recognizer for these comments is illustrated here:
C-style comments.
Notice that this recognizer can be significantly simplified,
and you are allowed to do this.
I will suggest in class that the code given above be written in
a slightly different form.
Floating point constants:
This is the most complex token that
will be worked with in this and the next recitation. For our purposes here,
a floating point constant consists initially of
any number (including 0) of digits,
followed by an optional decimal point(.), followed by
any number (including 0) of digits, followed by an optional
exponent part, which is an e or E,
followed by
an optional sign (+ or -),
followed by 1 or more digits
for the exponent. There must be at least one digit before or
after the decimal point (or both), and if there is no decimal point
there must be at least one digit. There must be either a
decimal point or the exponent part (or both).
This and the next recitation will include an integer constant
as a special case, since it just has no decimal point and no
exponent part.
Actual floating point constants in Java can also have an
optional trailing f or F
for float
constants and the optional trailing d or D
for double
constants. (With no optional trailing letter the constant is
double by default.) You should ignore these possibilities.
You should also ignore the limitation on the size
of the exponent, so the last illegal constant in the Java program
below (not illegal in C) would be accepted by your FSM.
Any initial optional
sign (+ or -) is always
treated as a separate operator, and it should not be present in
the input for this recitation.
Here is a short program that tries out various floating point
constants in C and Java. The program only shows most of the
possible forms.
All the constants shown are legal except for the last 3 in C
and the last 5 in Java (all commented out). Your code should accept
all but the last 3 on either side.
C Program |
Java Program |
// double.c: try out double constants
#include <stdio.h>
int main() { /* below "d" stands for "digit" */
printf("11.75e2 = %f: normal\n", 11.75e2);
printf(" .75e2 = %f: no d before .\n", .75e2);
printf(" 11.e2 = %f: no d after .\n", 11.e2);
printf(" 11e2 = %f: no .\n", 11e2);
printf("11.5e-2 = %f: neg after e\n", 11.5e-2);
printf("11.5e+2 = %f: pos after e\n", 11.5e+2);
printf("11.75E2 = %f: cap E\n", 11.75E2);
printf(" 11.75 = %f: no e\n", 11.75);
printf(" .75 = %f: no e, no d before .\n",.75);
printf(" 11. = %f: no e, no d after .\n", 11.);
printf(" 11 = %f: no e, no .\n", 11);
printf(
"333333333.222222222e-2 = %20.16f: too many d's\n",
333333333.222222222e-2);
printf("1.0e-444 = %20.16f: underflow\n", 1.0e-444);
printf("1.0e+444 = %20.16f: overflow\n", 1.0e+444);
/* printf(".e-2 = %f: only .\n", .e-2);
printf("e-2 = %f: only e\n", e-2);
printf("1.2e = %f: e, no d\n", 1.2e);
*/
}
| // Double: try out double constants
public class Double {
public static void main(String[] args) {
System.out.println("11.75e2 = "+""+ 11.75e2);
System.out.println(" .75e2 = "+" "+ .75e2);
System.out.println(" 11.e2 = "+""+ 11.e2);
System.out.println(" 11e2 = "+""+ 11e2);
System.out.println("11.5e-2 = "+" "+11.5e-2);
System.out.println("11.5e+2 = "+""+ 11.5e+2);
System.out.println("11.75E2 = "+""+ 11.75E2);
System.out.println(" 11.75 = "+" "+ 11.75);
System.out.println(" .75 = "+" "+ .75);
System.out.println(" 11. = "+" "+ 11.);
System.out.println(" 11 = "+" "+ 11);
System.out.println(
"333333333.222222222e-2 = " +
333333333.222222222e-2);
/* System.out.println("1.0e-444 = " + 1.0e-444);
System.out.println("1.0e+444 = " + 1.0e+444);
System.out.println(".e-2 = " + .e-2);
System.out.println("e-2 = " + e-2);
System.out.println("1.2e = " + 1.2e);
*/
}
}
| C Run and Output |
Java Run and Output |
% cc -o double double.c -Wall
double.c: In function ‘main’:
double.c:15:4: warning: format ‘%f’ expects type
'double', but argument 2 has type 'int'
double.c:19:4: warning: floating constant
truncated to zero
double.c:20:4: warning: floating constant
exceeds range of 'double'
% ./double
11.75e2 = 1175.000000: normal
.75e2 = 75.000000: no d before .
11.e2 = 1100.000000: no d after .
11e2 = 1100.000000: no .
11.5e-2 = 0.115000: neg after e
11.5e+2 = 1150.000000: pos after e
11.75E2 = 1175.000000: cap E
11.75 = 11.750000: no e
.75 = 0.750000: no e, no d before .
11. = 11.000000: no e, no d after .
11 = 11.000000: no e, no .
333333333.222222222e-2 = 3333333.3322222223505378:
too many d's
1.0e-444 = 0.0000000000000000: underflow
1.0e+444 = inf: overflow
|
% javac Double.java
19: floating point number too small
... println("1.0e-444 = " + 1.0e-444);
^
20: floating point number too large
... println("1.0e+444 = " + 1.0e+444);
^
21: illegal start of expression
... println(".e-2 = " + .e-2);
^
21: ')' expected
... println(".e-2 = " + .e-2);
^
23: malformed floating point literal
... println("1.2e = " + 1.2e);
^
% java Double
11.75e2 = 1175.0
.75e2 = 75.0
11.e2 = 1100.0
11e2 = 1100.0
11.5e-2 = 0.115
11.5e+2 = 1150.0
11.75E2 = 1175.0
11.75 = 11.75
.75 = 0.75
11. = 11.0
11 = 11
333333333.222222222e-2 = 3333333.3322222224
|
The three commented-out constants in the Java code produced error messages:
double.c:21:26: error: expected expression before '.' token
double.c:22:27: error: 'e' undeclared (first use in this function)
double.c:23:26: error: exponent has no digits
One method for constructing a program to recognize floating point
constants starts with a complete FSM that exactly recognizes
these constants.
Such an FSM is fairly complex, because of the many constraints
(such "at least one digit before or after the decimal point").
In our example, a string of digits, which would normally be recognized
as an int, instead will be treated as a double.
Here is a FSM for this case:

FSM for doubles
This method is similar to the
example for recognizing C-style
comments, but even more complicated.
An easier approach is to
start with a "skeleton" FSM that has the basic parts:
initial digits, decimal point, trailing digits, letter e,
sign on exponent, exponent. Then the constraints can be handled
with flags or counters or by other means, so that if there are no
initial digits and the constant starts with a decimal point,
the program knows that there must be trailing digits, and
similarly for other constraints.
Recitation work:
You should write a program in either C or in Java (or in C++) that
recognizes floating point constants.
Your program should save the characters of the constant as it goes
along, so that the actual character string making up the constant
in the program can be printed.
What you should submit:
Refer to the submissions directions and to
deadlines at the top of this page. The text file that you submit
should first have Your Name, the Course Number,
and the Recitation Number. The rest of the file
should have the following in it, in the order below, and clearly labeled,
including at the beginning the appropriate item letters:
a and b.
Contents of submission
for Recitation 2:
Last Name, First Name; Course Number; Recitation Number (2)
Last Name, First Name (of second person, if two are working together).
1. The Java or C++ source file or files for your program. Everything
should be run together into one file, with reasonable separators between
components (the separate source files and the output).
The code should be reasonably organized and written, with
special emphasis on header comments. (Not much emphasis on inline comments.)
2. You should give the results of a run using the following
source file for input. In case your program works for some inputs
but not for the complete source file below, you should explain this
and use simpler source that your program will handle (for some loss
of credit).
11 11. 62.47 .14
11.75e2 .75e2 11.e2
11e2 11.5e-2 11.5e+2
11.75E2 11.75 .75
.23 .84e-2 666
345.578e5
1234.5678
.e-2
e-2
1.2e
$
Output format:
In b above, your output should be the string that represents
each scanned floating point constant.
For example, with the input in 1 above, your output
could have the form:
Next double:"11"
Next double:"11."
Next double:"62.47"
Next double:".14"
Next double:"11.75e2"
Next double:".75e2"
Next double:"11.e2"
Next double:"11e2"
Next double:"11.5e-2"
Next double:"11.5e+2"
Next double:"11.75E2"
Next double:"11.75"
Next double:".75"
Next double:".23"
Next double:".84e-2"
Next double:"666"
Next double:"345.578e5"
Next double:"1234.5678"
(plus several things
for the last three items)
End-of-file problems:
Sometimes students have trouble recognizing end-of-file and
spend a lot of time on this simple problem. Instead,
you may use a "$" character to represent end-of-file (as shown above),
or you may use the system EOF that is provided.
(So do not get hung up on end-of-file.)
|
Key ideas:
Finite state machines (FSMs) can be used to aid in constructing
a program to recognize tokens of a programming language such
as floating point constants or C-style comments.
Revision date: 2013-01-22.
(Please use ISO
8601, the International Standard.)
|