|
 |
CS 3723
Programming Languages |
Finite Automata |
Languages: Start with
a finite set of symbols called an alphabet, such
as {0, 1}, or {a, b, c, d}, or the set of all
Ascii characters.
A finite string of these symbols is called a sentence.
A language is a set of sentences. Thus a language is
just some set of strings of symbols. For example, the set E of
all strings of 0s and 1s with an even number
of 1s is a language. (E = even parity bit-strings.)
Notice that everything
is finite except for the language, where in the interesting
cases there are infinitely many sentences, as with the example above.
We want to give finite descriptions of languages,
including infinite languages. Consider another example:
the set L of all strings of as and bs
that end with "abb".
The previous (English) sentence is an informal finite description.
It would (of course) be impossible to list the elements of
this language, but we could start: L =
{abb, aabb, babb, aaabb, ababb, baabb, bbabb, aaaabb, aababb, ...}.
The language L can be described or generated by the
regular expression
(a|b)*abb.
If Σ denotes the alphabet,
then (using regular expression
notation), Σ*
denotes the set of all finite strings
of symbols from the alphabet. This includes the empty
string "", which is denoted by the lower-case
Greek letter
epsilon = ε.
Thus any subset of Σ* is a language.
(In particular, the sets Σ* of
all finite strings, and the empty set Ø of no strings are
languages.)
Finite Automata (FA)
(singular: "finite automaton"):
These are also called Finite State Machines (FSM).
An FA consists of a finite set of states and finite set
of transitions from state to state that occur as input symbols
for the alphabet are processed. A transition can go from a state
to itself. One state (never more than one) must be designated
the start state or initial state. Any number of
states (but at least one) must be designated the
terminal states or final states or accepting states.
Here is a diagram of an FA:

NFA for (a|b)*abb
Here the alphabet is {a, b} and the set of states is
{0, 1, 2, 3}. The start state is 0, denoted by an
initial arrow. There is only one terminal state, 3, denoted
by a double circle around it.
- NFAs.
The particular FA above is called a
Non-Deterministic Finite Automaton (NFA).
This NFA is said to recognize the language L above.
In order to recognize strings from L with the NFA above, we
process an input string character-at-a-time, starting with the
0th character. Start with the start state 0. For
each character processed, follow a path labeled with that character.
If we can manage (somehow) to end up in the terminal state 3 just
as we finish all the characters in the string, we accept
the string as belonging to L. Otherwise we reject
the string.
This automaton is called non-deterministic because there
is a case (there must be at least one)
with two possible choices for the arrow to take.
Suppose we are processing the symbol a in
the string "abaabaabb". How
do we know when to leave state 0 and finish the string?
(We could decide with a look-ahead buffer, but we want a simpler
solution.)
- DFAs
Given an NFA, it is always possible to create what is
called a Deterministic Finite Automaton
(DFA) that recognizes the same language.
For the language L above, here is the corresponding DFA:

DFA for (a|b)*abb
This automaton is called deterministic because in each
case there is exactly one choice for which arrow to follow.
The process is uniquely determined. If you end up in the
terminal state 3, and have reached the end of the string,
then you accept the input string as belonging to L and otherwise
you reject it.
There is a clever algorithm for constructing
a DFA that recognizes the same language as a given NFA
(the "subset algorithm") which we will cover in the course.
These two examples above of an
NFA and corresponding DFA both have the same numbered
states and look very similar. This similarity is an artifact of
such simple examples. In general the corresponding DFA
will be much more complex than the original NFA. In the worst
case it will be exponentially larger.
Revision date: 2013-06-07.
(Please use ISO
8601, the International Standard.)
|