CS 3723 Python, FAs

CS 3723
Programming Languages

9.1 Simulating FAs

Note: This program uses no classes. See 9.2 Using a Class for the same program as a Python class.

Simulating Finite Automat in Python: I decided to use finite automata as an early exercise in studying Python. I would write code to simulate the actions of a finite automaton. These would include DFAs, NFAs, and NFAs with ε-moves. First step is to read a file representation of the automaton as a graph, and translate it into an internal form. We need to be able to search for information about the FA.

Here I'm only interested in the graph of an FA, but the code below works for an arbitrary graph, with limitations about the names of components. This code only reads in a static version of the graph and does not handle dynamic changes.

I have working code in Java and in C for this problem. Suppose there are n vertices (states for the FA). My representation using these traditional languages had an array of n items, each a class instance containing information about the vertex and containing a link to the adjacency list, a linked list. This linked list required another self-referential class.

In Python, a list or a tuple (= list that can't be modified) can serve as a class, since the elements of the list can be anything at all. You don't get the initialization that a class has, but otherwise this is very simple.

A Specific Example of a NFA: First see the FA as a diagram:

Sample NFA with ε-moves
NFA for (a\|b\|c)*(ab\|aac). Note: the dot on the arrow from state 0 to state 1 represents (a\|b\|c)

Next is the file representation of the graph and the internal representation. The file starts with the number of vertices (16), followed by a string giving the symbols on transitions (abc) not including the symbol for ε, which is @. Next comes a string giving the status of each vertex (1 = start, 2 = terminal, 3 = start and terminal). Finally in a list of all the transitions.

Vertices must be labeled with successive integers starting with 0. There must be a single start state, which doesn't have to be 0. There can be any number >= 1 of terminal states. The start state can also be terminal.

The internal form is a list of 16 items (one for each vertex). Each item is tuple, giving the status followed by the adjacency list for that vertex. The adjacency list is a list of tuples.

FAs in Python

File Format

Internal Form

16
abc
0010000000000002
0 a 1
0 b 1
0 c 1
1 @ 3
1 @ 0
2 @ 0
2 @ 3
3 @ 14
4 a 5
5 @ 6
6 b 7
7 @ 15
8 a 9
9 @ 10
10 a 11
11 @ 12
12 c 13
13 @ 15
14 @ 4
14 @ 8

[(0, [('a', 1), ('b', 1), ('c', 1)]),
 (0, [('@', 3), ('@', 0)]),
 (1, [('@', 0), ('@', 3)]),
 (0, [('@', 14)]),
 (0, [('a', 5)]),
 (0, [('@', 6)]),
 (0, [('b', 7)]),
 (0, [('@', 15)]),
 (0, [('a', 9)]), 
 (0, [('@', 10)]),
 (0, [('a', 11)]),
 (0, [('@', 12)]),
 (0, [('c', 13)]),
 (0, [('@', 15)]),
 (0, [('@', 4), ('@', 8)]),
 (2, [])]

Here is Python code that reads the file and converts the external representation of the graph to the internal representation.

FAs in Python, Internal Form

# fa.py: build internal fa
import sys
import re

dfa = True
fa = []
ss = 0 # start state
stn = None  # num of states
terms = []  # list of terminal states
symbs = None   # array of symbols
eps = False
eps_symb = '@'

# fa is the actual finite automata
# for i in range(0,stnum):
#     fa[i] is the ith vertex data
#     fa[i][0] gives status of ith vertex
#     fa[i][1] gives the ith adjacency list

def read_fa(filename):
    f = open(filename,'r')
    line = f.readline()
    # get first digit of line 0
    r1 = re.compile(R"\s*(\d+).*")
    m1 = r1.search(line)
    global stn  # the number of states
    stn = int(m1.group(1))
    
    global symbs # all symbs in string
    # @ = espilon handled as special case
    symbs = f.readline()
    # stn many ints on separate lines,
    #  giving the status of each state
    #  1 = start, 2 = term, 3 = start and term
    # stp is the status list for each state
    stp = f.readline() # each state status
    ssn = 0  # num of start states (must be 1)
    tn  = 0  # num of term states (at least 1)
    for i in range(0,stn):
        fa.append((int(stp[i]), []))
        if stp[i] == '1' or stp[i] == '3':
            global ss
            ss = i
            ssn = ssn + 1
        if stp[i] == '2' or stp[i] == '3':
            terms.append(i)
            tn = tn + 1
    if ssn != 1:
        sys.stdout.write("ERR, != 1 start st")
    if tn < 1:
        sys.stdout.write("ERR, 0 term states")
            
    # now work on transition data
    r2 = re.compile(R"\s*(\d+)\s+(\S)\s+(\d+)")
    for line in f:
        # read tail_state symb head_state
        m2 = r2.search(line)
        st1 = int(m2.group(1))
        st2 = int(m2.group(3))
        sym = m2.group(2)
        if sym == eps_symb: # check for eps
            global eps
            eps = True
        #  append onto the adjacency list
        fa[st1][1].append( (sym, st2) )
    global dfa
    dfa = check_dfa()

def check_dfa():
    for i in range(0,stn):
        for ch in symbs: # string of all symbs
            t = lookup(i, ch)
            if t != None and len(t) >= 2:
                return False
    return True
                  
def lookup(state, symb):
    ret = []
    if state < 0 or state >= len(fa):
        return None
    elif len(fa[state][1]) == 0:
        return None
    else:
        t = len(fa[state][1])
        for i in range(0,t):
            if fa[state][1][i][0] == symb:
                ret.append(fa[state][1][i][1])
        return ret

def get_start_state():
    return ss

def terminals():
    return terms

def is_terminal(state):
    if fa[state][0] == 2 or fa[state][0] == 3:
        return True
    return False

def is_dfa():
    return dfa
    
def printfa():
    if eps:
        sys.stdout.write("NFA with epsilon: ")
    elif dfa:
        sys.stdout.write("DFA: ")
    else:
        sys.stdout.write("NFA: ")
    sys.stdout.write("States " + str(stn) +
         ", Symbols: " + symbs + "\n")
    for i in range(0,stn):
        if i < 10:
            sys.stdout.write(" " + str(i))
        else:
            sys.stdout.write(str(i))
        if i == ss:
            sys.stdout.write("s")
        else:
            sys.stdout.write(" ")
        if is_terminal(i):
            sys.stdout.write("t:")
        else:
            sys.stdout.write(" :")
        print(fa[i])

Finally, there are two Python programs that use the program fa.py as a library module: dfa.py which handles DFAs, and nfa.py which handles NFAs, with or without ε-moves. Of course, as far as simulation is concerned, an NFA is just a special case of a DFA, so the NFA program will also handle all the DFA examples, producing single-set results at each stage.

dfa.py: code to simulate DFA

# dfa.py: simulate a dfa
import auto
import sys

auto.read_fa("dfa4.auto")
auto.printfa()
sys.stdout.write("\n")
f = open("dfa4.input",'r')
instr = f.readline()
sys.stdout.write("Input: " + instr + "\n")
sys.stdout.write("Simulation Run ...\n")
s1 = auto.get_start_state()
for ch in instr:
    if ch != '$':
        s2 = auto.lookup(s1, ch)[0]
        sys.stdout.write(str(s1) + " " + 
          ch + " " + str(s2))
        if auto.is_terminal(s2):
            sys.stdout.write(" term")
        sys.stdout.write("\n")
        s1 = s2
    else:
        break

DFA Runs: sample runs.

nfa.py: code to simulate NFA

# nfa.py: simulate an nfa
import auto
import sys

def is_term(s):  # s set, contains term?
    if not s:
        return False
    else:
        for x in s:
            if auto.is_terminal(x):
                return True
        return False

def eps_cl(s): # s a set, epsilon closure
    # could use s for b, save one list
    t = list(s) # list t is stack or queue
    b = set(t)  # set of "black" states
    while len(t) != 0:
        u = t.pop() # remove end elt
        # use @ for epsilon
        r = auto.lookup(u,auto.eps_symb)
        if r != None:
            for v in r:
                if not (v in  b):
                    b.add(v) # add
                    t.append(v) # push
    return b
    
auto.read_fa("dfa1.auto")
auto.printfa()
sys.stdout.write("\n")

# OK, let's simulate this sucker
f = open("dfa1.input",'r')
instr = f.readline()
ss = auto.get_start_state()
x = []
x.append(auto.get_start_state())
s1 = set(x)
s1 = eps_cl(s1) # start state
sys.stdout.write("Start:\n   " +
       str(list(s1)) + "\n")
i = 0
while instr[i] != '$':
    s2 = set()
    for y in s1:
        s = auto.lookup(y, instr[i])
        if not s:
            s = set()
        else:
            s = set(s)
        s2 = s2 | s  # as sets, union
    # now got set, need epsilon colsure
    s2 = eps_cl(s2)
    sys.stdout.write(instr[i] + ": " +
          str(list(s2)))
    if is_term(s2):
        sys.stdout.write(" term ")
    sys.stdout.write("\n")
    s1 = s2
    i = i + 1 # next instr

NFA Runs: sample runs.

(Revision date: 2014-06-17. Please use ISO 8601, the International Standard.)