|
 |
CS 3723
Programming Languages |
Finite Automata |
Languages: Start with
a finite set of symbols called an alphabet, such
as {0, 1}, or {a, b, c, d}, or the set of all
Ascii characters.
A finite string of these symbols is called a sentence.
A language is a set of sentences. Thus a language is
just some set of strings of symbols. For example, the set E of
all strings of 0s and 1s with an even number
of 1s is a language. (E = even parity bit-strings.)
Notice that everything
is finite except for the language, where in the interesting
cases there are infinitely many sentences, as with the example above.
We want to give finite descriptions of languages,
including infinite languages. Consider another example:
the set L of all strings of as and bs
that end with "abb".
The previous (English) sentence is an informal finite description.
It would (of course) be impossible to list the elements of
this language, but we could start: L =
{abb, aabb, babb, aaabb, ababb, baabb, bbabb, aaaabb, aababb, ...}.
The language L can be described or generated by the
regular expression
(a|b)*abb.
If Σ denotes the alphabet,
then (using regular expression
notation), Σ*
denotes the set of all finite strings
of symbols from the alphabet. This includes the empty
string "", which is denoted by the lower-case
Greek letter
epsilon = ε.
Thus any subset of Σ* is a language.
(In particular, the sets Σ* of
all finite strings, and the empty set Ø of no strings are
languages.)
Finite Automata (FA)
(singular: "finite automaton"):
These are also called Finite State Machines (FSM).
[These are defined informally below to make the concepts easier.
It is possible to give a formal, precise, and mathematical
definition of FSMs.]
An FA consists of:
- a finite set of states, each represented by a circle, with the
name of the state inside. We will usually use successive integers
starting with 0 as names, but the name can be anything.
- a finite set of transitions from state to state,
each represented by an arrow going from state to state.
A transition can go from a state to itself.
The transition arrows are labeled with one or more symbols.
The transition arrows are followed as input symbols for the alphabet
are processed. The input symbol must match the label on the
transition.
- One state (never more than one) must be designated
the start state or initial state. This state is
identified by an arrow with no start, but ending at the state.
- Any number of states (but at least one) must be designated the
terminal states or final states or accepting states.
These are identified using a double circle. The start state can
also be a terminal state.
Here is a diagram of an FA:

NFA for the RE (a|b)*abb
Here the alphabet is {a, b} and the set of states is
{0, 1, 2, 3}. The start state is 0, denoted by an
initial arrow. There is only one terminal state, 3, denoted
by a double circle around it.
There is a clever algorithm for constructing
a DFA that recognizes the same language as a given NFA
(the "subset algorithm") which we will cover in the course.
These two examples above of an
NFA and corresponding DFA both have the same numbered
states and look very similar. This similarity is an artifact of
such simple examples. In general the corresponding DFA
will be much more complex than the original NFA. In the worst
case it will be exponentially larger.
FAs = REs:
Finite automata are exactly equivalent to regular expressions.
If a given FA recognizes a certain language, there always is
an equivalent RE that describes exactly the same language.
Similarly, given an RE that describes a given language, there
is always an FA that recognizes the same language.
These languages are much simpler than the languages
described by CF grammars, which we will study in a few weeks.
CF grammars include REs and FAs as special cases.
(Revision date: 2014-06-20.
(Please use ISO
8601, the International Standard.)
|