CS 3723
  Programming Languages  
  Debugging REs   

Suppose we want to use regular expressions for some type of transformation or conversion.


Example Problem: Converting Date and Time: Without using library functions, we want a Python program that will convert an "American Style" time and date to the corresponding "International Style." (Python has an extensive datetime module, but we are trying to learn programming here.) Here are specific formats:

"American" style              International style    Comments
10:03 pm, April 20, 2004      2004-04-20 22:03:00    (random)
 8:04 am, January 4, 1998     1998-01-04 08:04:00    (random)
12:00 pm, July 4, 2012        2012-07-04 12:00:00    This is noon
11:59 pm, December 31, 2003   2003-12-31 23:59:00    1 min. < next entry
12:00 am, January 1, 2004     2004-01-01 00:00:00    This is midnight
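Apart from rearranging fields, the only arithmetic in the conversion is the change from a 12-hour to a 24-hour clock, and the noon and midnight rows above are the awkward cases. Here is a minimal sketch of that one rule, assuming the hour and the am/pm marker have already been pulled out as separate values. The function name to_24_hour is hypothetical and is not part of the course code.

```python
def to_24_hour(hour, meridiem):
    """Convert a 12-hour clock hour (1-12) plus 'am' or 'pm' to 0-23."""
    hour = hour % 12      # 12 am and 12 pm both become 0 at this point
    if meridiem == "pm":
        hour += 12        # afternoon hours shift by 12, so 12 pm ends up as 12
    return hour

# The five hours from the table above, in the same order.
for h, ap in [(10, "pm"), (8, "am"), (12, "pm"), (11, "pm"), (12, "am")]:
    print("%2d %s -> %02d" % (h, ap, to_24_hour(h, ap)))
```

Taking the hour modulo 12 first means noon and midnight need no special-case branches.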

First we need a regular expression to tear apart data formatted like the dates on the left above. It is not a good idea to write the regular expression and then an entire program based on it, only to find mistakes in the original regular expression. It is better to "debug" the regular expression as an initial step. For this purpose, I decided to use a simplified version of the debug routine from the initial page on regular expressions. I ran it on my first version of the full regular expression with the 5 data items, and got a "No match" message for each example.

Then I simplified and tried to match just the first 2 fields, as shown explicitly below. As you can see, this run also failed. Then I tried just the first field by itself, which failed as well. (That run is not shown below.) Finally (I should have done this at the start) I looked carefully at the first part of the regular expression and saw a character that shouldn't be there: the stray ] inside {1,2]}. Correcting this error (by deleting the character) produced the expected output, which is the second run below.
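The strategy in this paragraph (shrink the expression until the faulty piece is isolated) can also be automated. This is a sketch, assuming the expression is written as a list of pieces, each of which is a valid regular expression on its own. The helper name first_failing_piece is hypothetical and is not the debug routine from the course pages.

```python
import re

def first_failing_piece(pieces, samples):
    """Return the index of the first piece whose addition makes some sample
    stop matching, or None if every prefix matches every sample."""
    for n in range(1, len(pieces) + 1):
        prefix = re.compile("".join(pieces[:n]))
        if any(prefix.match(s) is None for s in samples):
            return n - 1
    return None

samples = ["10:03", " 8:04", "12:00", "11:59"]

# Only the first 2 fields, split into pieces; the middle piece of "bad"
# contains the stray "]".
bad = [r'\s*', r'([0-9]{1,2]})', r':([0-9]{2})']
good = [r'\s*', r'([0-9]{1,2})', r':([0-9]{2})']

print(first_failing_piece(bad, samples))    # index of the faulty piece
print(first_failing_piece(good, samples))   # None when nothing fails
```

The helper only narrows down where to look. Reading the suspect piece character by character is still what finds the actual mistake.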

Debugging Regular Expressions

Program, with data:
#!/usr/bin/python
import re
import sys

def regtest(reg, dat):
    """Search dat for reg and print every group of the match, or None."""
    sys.stdout.write("\nInputs: \"" + reg +
           "\", \"" + dat + "\"\n")
    r = re.compile(reg)
    s = r.search(dat)
    sys.stdout.write("Search: ")
    if s is not None:
        j = s.lastindex or 0   # lastindex is None if the pattern has no groups
        for i in range(0, j + 1):
            g = s.group(i)
            sys.stdout.write("group(" +
                  str(i) + "): ")
            sys.stdout.write(g)
            if i != j:
                sys.stdout.write("\n        ")
        sys.stdout.write("\n")
    else:
        sys.stdout.write("None\n")

# data hardwired in as function calls
# (faulty version: note the stray "]" inside "{1,2]}")
regexp = r'\s*([0-9]{1,2]}):([0-9]{2})'
regtest(regexp, "10:03")
regtest(regexp, " 8:04")
regtest(regexp, "12:00")
regtest(regexp, "11:59")
regtest(regexp, "12:00")
Output, with the faulty regular expression:
% python date.test0.py
Inputs: "\s*([0-9]{1,2]}):([0-9]{2})", "10:03"
Search: None
Inputs: "\s*([0-9]{1,2]}):([0-9]{2})", " 8:04"
Search: None
Inputs: "\s*([0-9]{1,2]}):([0-9]{2})", "12:00"
Search: None
Inputs: "\s*([0-9]{1,2]}):([0-9]{2})", "11:59"
Search: None
Inputs: "\s*([0-9]{1,2]}):([0-9]{2})", "12:00"
Search: None
Output, after correcting the regular expression:
% python date.test0.py
Inputs: "\s*([0-9]{1,2}):([0-9]{2})", "10:03"
Search: group(0): 10:03
        group(1): 10
        group(2): 03
Inputs: "\s*([0-9]{1,2}):([0-9]{2})", " 8:04"
Search: group(0):  8:04
        group(1): 8
        group(2): 04
Inputs: "\s*([0-9]{1,2}):([0-9]{2})", "12:00"
Search: group(0): 12:00
        group(1): 12
        group(2): 00
Inputs: "\s*([0-9]{1,2}):([0-9]{2})", "11:59"
Search: group(0): 11:59
        group(1): 11
        group(2): 59
Inputs: "\s*([0-9]{1,2}):([0-9]{2})", "12:00"
Search: group(0): 12:00
        group(1): 12
        group(2): 00

So it now worked for the first 2 fields. At this point, I tried my entire regular expression again with the full data, and it worked perfectly. (Which of course does not guarantee that there are no errors.)
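It may help to see why the faulty expression produced "None" instead of an error message. In CPython's re module, a { that does not begin a well-formed {m,n} repeat appears to be treated as an ordinary character, so [0-9]{1,2]} quietly means "one digit followed by the literal text {1,2]}". The short check below is an illustration of that reading and is not part of the original test program.

```python
import re

bad = r'\s*([0-9]{1,2]}):([0-9]{2})'
good = r'\s*([0-9]{1,2}):([0-9]{2})'

# The faulty pattern compiles without complaint, but finds nothing in real data.
print(re.search(bad, "10:03"))

# It does match input that contains the text "{1,2]}" literally after a digit.
print(re.search(bad, "1{1,2]}:03").groups())

# The corrected pattern picks out the hour and minute fields.
print(re.search(good, "10:03").groups())
```

A pattern that compiles is therefore not evidence that it means what was intended, which is one more reason to test each piece against known data.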

Of course I'm not showing the full regular expression -- that's part of your work.

( Revision date: 2014-07-20. Please use ISO 8601, the International Standard Date and Time Notation.)