|
 |
CS 3723
Programming Languages |
Finite Automata:
Subset Algorithm |
Introduction: This page
is a continuation of the page
Finite Automata,
so you should read that first. On this page we describe an
algorithm that converts an arbitrary NFA into
a DFA that accepts the same language as the NFA.
Example 1:
Below on the left is the NFA
defined by the regular expression
(a|b)*abb. In this
example we will show how to convert this NFA into the
DFA at the right so that the same language is accepted.
The states of the new DFA will each
be given by a subset of the states of the NFA,
that is, subsets of the set {0,1,2,3}. There are 16 of these
subsets. Here are all the subsets that include 0:
{0}, {0,1}, {0,2}, {0,3},
{0,1,2}, {0,2,3}, {0,1,3}, and {0,1,2,3).
The algorithm creates a DFA, so that as it processes
characters of an input string, it
keeps track of all possible states that the original NFA might be in,
up to that point. Initially, the program is in the start state
0, so with the DFA it is in the state {0}.
On input b, the NFA can only go to state 0, so there is
a transition on b from {0} to {0} in the DFA.
On input a, however, the NFA can
go to state 0 or to state 1,
so there is a transition on a from {0} to the new
state {0,1}. This is shown below with the DFA as arrows
going out of the state {0}. The remaining arrows are constructed
the same way. For example, if the machine is in state {0,2}
and the input is a b,
then from state 0 it could only go to 0,
and from state 2 is could only go to 3. Thus the transition
on b is from state {0,2} to state {0,3}.
Since this latter
state contains a terminal state of the original NFA, it
is a terminal state of the new NFA, since it's possible to
be in a terminal state at that point. Note that we only needed
to use 4 of the 16 possible states.
We ignore the ones we don't need.

NFA for (a|b)*abb
| |

DFA for (a|b)*abb
|
Subset Algorithm
(Tables) |
NFA,
(a|b)*abb |
State | a | b |
0 (start) | 0,1 | 0 |
1 | − | 2 |
2 | − | 3 |
3 (term) | − | − |
| |
DFA,
(a|b)*abb |
States | a | b |
{0} (start) |
{0,1} | {0} |
{0,1} |
{0,1} | {0,2} |
{0,2} |
{0,1} | {0,3} |
{0,3} (term) |
{0,1} | {0} |
|
See All States for the same example of
the subset algorithm showing all possible states and
all transitions.]
Example 2:
Below on the left is another example: the NFA
defined by the regular expression
(a|b)*(aa|bb)(a|b)*.
Let's work out one transition in this example.
Suppose the DFA on the right is in state {0,1}.
Suppose the next input symbol is a b.
Then if the original NFA was in state 0,
it could transition to either 0 or to 1.
If it was in state 1, there is no transition.
So the new transition on b in the new DFA
goes from state {0,1} to state {0,2}.

NFA for (a|b)*(aa|bb)(a|b)*
| |

DFA for
(a|b)*(aa|bb)(a|b)*
|
Again we represent the NFA using a table at the left.
This makes it easier
to see how the corresponding DFA at the right is constructed.
Assuming 0 is the NFA start state,
{0} will be the DFA start state.
After processing each line of the DFA table, you add
any new subset states that came into being as new lines at the right.
This process has to terminate because there are only finitely
many subsets. At each stage, you take each element of the
DFA subset at the right, say it is {0,1}. Then take the union
of the corresponding states in the table at the left.
So the result of symbol a from {0,1} will be the set
containing 0 and 1, along with 3, or {0,1,3}.
Each entry is handled this way.
Subset Algorithm
(Tables) |
NFA,
(a|b)*(aa|bb)(a|b)* |
State | a | b |
0 (start) | 0,1 | 0,2 |
1 | 3 | − |
2 | − | 3 |
3 (term) | 3 | 3 |
| |
DFA,
(a|b)*(aa|bb)(a|b)* |
States | a | b |
{0} (start) |
{0,1} | {0,2} |
{0,1} |
{0,1,3} | {0,2} |
{0,2} |
{0,1} | {0,2,3} |
{0,1,3} (term) |
{0,1,3} | {0,2,3} |
{0,2,3} (term) |
{0,1,3} | {0,2,3} |
|
Formal Algorithm:
In studying this algorithm, we will keep referring to the specific
Example 2 above.
Let N be an NFA. This means we have a set Σ
of symbols, a list A of states, which are just non-negative integers,
a start state, which we assume is 0, and a list B of
terminal states, which is a sublist of A.
(Possibly B = A. B must be non-empty.
Possibly 0 is in B.)
N also has a transition function T. For each
state x, and for each symbol u, T(x,u)
is a list of states of N, possibly an empty list.
Starting with state x, and given the symbol u,
this list consists
of the states that can be transitioned to, from x given
symbol u. Above on the left
is a definition of T for Example 2.
We will construct a DFA D that accepts the same language as N.
D will have the same set Σ of symbols as N.
The states of D will be sets of states of N,
including possibly the empty set.
The start state of D will be {0}, assuming 0 is the
start state of N.
The algorithm below will construct a list A' of states of D,
where each state is a set of the states of N.
We will use a queue Q holding states of D, as they are produced
by the algorithm. The algorithm will also define the
transition function T' of D, and the list B'
of terminal states of N.
Algorithm:
Given N as above, construct D. |
Add {0} to A'. // add start state of D to list of states of D
Insert {0} into Q. // put state state of D into queue
while (Q is not empty) {
R = remove(Q). // R = state of D = set of states of N
for (each symbol u) { // define T'(R,u) for each symbol u
S = Ø. // S = state of D to be constructed, initially empty
for (each x in R) { // x = integer, a state of N
S = {T(x,u)} union S.
}
if (S not on the list A') {
Add S to A'. // new state S in list A'
Insert S into Q. // new state S in list queue Q
}
T'(R,u) = S.
}
}
B' = all states of A' containing a state of B. |
Note: Algorithm changed 2014-09-16. [S = Ø Moved
down into outer for loop. Error noticed by Mr. Wimberly.]
State Minimization:
It turns out that the DFA constructed by the subset
algorithm is not necessarily as simple as possible.
That's the case for the DFA on the right above.
It is equivalent to the DFA below, obtained by identifying
the two terminal states. For simplicity I also renamed
the states. There exists a state minimizing algorithm,
which I won't present here. Any NFA has a unique
equivalent DFA with a minimum number of states.
(Well, unique up to renaming things.)

Minimal DFA for
(a|b)*(aa|bb)(a|b)*
With States Renamed
Example 3:
This time start with a slightly different NFA
defined by the same regular expression from Example 2:
(a|b)*(aa|bb)(a|b)*.
The corresponding DFA, given by the subset algorithm, is much
more complicated. The final minimal DFA is the same
as the last DFA in the previous section (just above this sentence).
Again the minimal DFA is obtained by identifying all 6
of the terminal states.

NFA for (a|b)*(aa|bb)(a|b)*
| |

DFA for
(a|b)*(aa|bb)(a|b)*
|
Subset Algorithm
(Tables) |
NFA,
(a|b)*(aa|bb)(a|b)* |
State | a | b |
0 (start) | 0,3 | 0,1 |
1 | − | 2 |
2 (term) | 2 | 2 |
3 | 4 | − |
4 (term) | 4 | 4 |
| |
DFA,
(a|b)*(aa|bb)(a|b)* |
States | a | b |
{0} (start) |
{0,3} | {0,1} |
{0,3} |
{0,3,4} | {0,1} |
{0,1} |
{0,3} | {0,1,2} |
{0,3,4} (term) |
{0,3,4} | {0,1,4} |
{0,1,2} (term) |
{0,2,3} | {0,1,2} |
{0,1,4} (term) |
{0,3,4} | {0,1,2,4} |
{0,2,3} (term) |
{0,2,3,4} | {0,1,2} |
{0,1,2,4} (term) |
{0,2,3,4} | {0,1,2,4} |
{0,2,3,4} (term) |
{0,2,3,4} | {0,1,2,4} |
|
Examples 4:
Here is a different NFA, obtained from the
regular expression (a|b)*a(a|b)(a|b).
The DFA on the right below comes from the same subset
algorithm. In this case it is already the minimum
state DFA. (The diagram on the right gives the subsets
without commas or curly brackets.)

NFA for (a|b)*a(a|b)(a|b)
|

DFA for
(a|b)*a(a|b)(a|b)
|
|
Next, start with a similar NFA with 1 additional state.
In this case the DFA has twice as many states, and it again
is minimal.

NFA for (a|b)*a(a|b)(a|b)(a|b)
|

DFA for
(a|b)*a(a|b)(a|b)(a|b)
Note: Label on arrow from (03) to ((04))
should be "b".
|
|
In general, suppose one starts with a regular expression built from
(a|b)*a
followed by n copies of (a|b).
This NFA has states 0, 1, ... , n, and
the corresponding minimal DFA has 2n states.
In fact the subsets needed by this DFA are exactly the
2n
subsets of the set {0,1,...,n} that contain 0.
(Revision date: 2014-08-28.
Please use ISO
8601, the International Standard.)
|