CS 3723
  Programming Languages  
  0. Getting Started   
(with hidden files)


How to Get Started: As a start you should type in the first Python program given below and run it. This work has several objectives:

  • Start working on the Python language. In particular, the Python program you are to copy uses three specific programming methods:

    • Reading from a text file.
    • Using regular expressions to extract fields from a string.
    • Using Python's "list" data type. (A Python list has all the features of an array and of a linked list, plus many more.)

  • Learn to write and run Python programs. Note: You can program in Python on almost any platform: Python is usually already present on Apple and Linux machines, and it is available for download on Windows.

Note: A "copy" program may seem weird and useless, but you really should type it in from scratch. (There is significant benefit from typing it yourself. If you make typing errors, so much the better.)


Running a Python Program: Assume your program is in the file "h0.py" on a Unix/Linux system. Execute the command:
    % python h0.py
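where "%" stands for the prompt. Better is to use the command:
    % python -tt h0.py
Options must come before the file name; anything after "h0.py" is passed to your program rather than to Python. With the "-t" option, Python 2 issues a warning if the indentation mixes tabs and spaces inconsistently, and "-tt" makes this an error. (Python 3 always treats inconsistent mixing of tabs and spaces as an error.)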

        Alternatively, you can add the following as the first line of the file h0.py:
          #!/usr/bin/python -tt
          

        And then type:

          % chmod +x h0.py # make h0.py executable
          % ./h0.py # execute, using first line to find python
          

        I mostly won't use this method (more trouble than it's worth for writing sample programs), but it is very useful for systems programming and larger production work.


Python Copy Program (copy.py). The line numbers at the left are for reference only; do not type them:

 1  # copy.py: H0 program to copy
 2  import sys # for sys.stdout.write
 3  import re # for regular expressions
 4  f = open("students.txt",'r') # open for reading
 5  # reg expr: matches student name plus 3 int scores
 6  r = re.compile(r"(\w+ \w+)\s+(\d+)\s+(\d+)\s+(\d+)")
 7  ave = [] # empty list
 8  sys.stdout.write("Name           " +
 9         "  E1" + "  E2" + " Final" + "   Grade\n\n");
10  for line in f: # iterate through each line in file
11      m = r.search(line) # m is match data
12      if m != None:
13          e1 = float(m.group(2)) # exam 1
14          e2 = float(m.group(3)) # exam 2
15          f  = float(m.group(4)) # final
16          sum = e1+e2+f # total raw score
17          aver = sum/3.5 # percent score
18          ave.append(aver) # add entry to list
19          sys.stdout.write(line.strip() + "   ")
20          sys.stdout.write("%6.2f" % aver)
21          sys.stdout.write("\n")
22      else:
23          sys.stdout.write("No match!\n")
24          break
25  tot = 0
26  for av in ave: # iterate through array
27      tot += av # add student scores
28  sys.stdout.write("\nCourse Ave: %6.2f\n"
29        % (tot/len(ave)) ) # system demanded extra ()

Input File: "students.txt"
% cat students.txt
Bruce Wayne      85  67  134
Peter Parker     72  71  129 
Bruce Banner     55  65  114
Clark Kent       91  88  143
Princess Diana   70  62  131
Barbara Gordon   96  89  147
Selina Kyle      77  74  105
Output
% python copy.py
Name             E1  E2 Final   Grade

Bruce Wayne      85  67  134    81.71
Peter Parker     72  71  129    77.71
Bruce Banner     55  65  114    66.86
Clark Kent       91  88  143    92.00
Princess Diana   70  62  131    75.14
Barbara Gordon   96  89  147    94.86
Selina Kyle      77  74  105    73.14

Course Ave:  80.20

Notes:

  • In order to produce just this output, the list "ave" is not needed, since the program could compute a running sum. I wanted to illustrate lists.
  • Python mostly has no declarations of variables, and a given variable can mostly be used for any type you like, or even for several types in the same program.
  • When you are inside parens (as in lines 8-9 and 28-29 above), you may indent on the second line in any way you like. Otherwise, the indenting should be exactly as shown above, with exactly 4 spaces for each level of indenting, and with no tab characters. (I was able to set my editor so that the Tab key always produces 4 spaces and no tabs.)
  • My favorite mistake is to leave off the ":" character just before a new level of indenting, or to use ";".
  • The "strip( )" in line 19 strips off any whitespace characters from the start or end of the character string "line". In this case there was only a newline at the end. The "strip( )" method makes a new copy of the string for use in line 19, but the variable "line" retains the newline. You would need to write "line = line.strip( )" to get rid of the newline from the variable "line".
  • Python regular expressions are mostly the same as in any other scripting language, but there are significant notational differences. Python has nothing like the "$" variables that can be used in Perl to refer to the different matching groups. Instead you must use the "group( )" method as shown on lines 13-15. Thus  "m.group(3)"  takes on the role that  "$3"  has in Perl. The result is a string, and lines 13-15 use the function  "float( )"  to convert this string to floating point (a double).
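
A few of the notes above can be checked directly. The following is only a small sketch; the sample string and the variable names "stripped", "final_text", "final" and "x" are chosen for illustration and are not part of the course program:

```python
import re

line = "Bruce Wayne      85  67  134\n"

# strip() returns a new string; "line" itself keeps its newline
stripped = line.strip()
print(repr(line))
print(repr(stripped))

# group(n) returns a string, so convert it before doing arithmetic
r = re.compile(r"(\w+ \w+)\s+(\d+)\s+(\d+)\s+(\d+)")
m = r.search(line)
final_text = m.group(4)      # a string of digits
final = float(final_text)    # now a floating point number
print(type(final_text).__name__, type(final).__name__)

# no declarations: the same name may hold different types over time
x = "134"
x = float(x)
print(x)
```

Printing with repr( ) makes the trailing newline visible, which is an easy way to see whether a string has been stripped.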


Parts of Python Illustrated by this Program:

  • Reading from a text file: "students.txt" is a file of records in Linux/Unix where each record is terminated with a newline ("\n"). (In Windows, records are often terminated with a carriage return and a linefeed.)

    In the listing below, line 5 (in file.py) iterates over the file, providing each record in sequence. The record retains its newline at the end (in Unix). Lines 12-17 (in file2.py) form a while loop that explicitly reads each record with "readline( )" and explicitly stores the record in the variable "line". (The name "line" is an arbitrary choice.)

    Read and Output a File, Three Programs:

     1  # file.py: file: open, read, write
     2  import sys
     3
     4  f = open("students.txt",'r')
     5  for line in f:
     6      sys.stdout.write(line) # "line" has "\n" at end
     7
     8  # file2.py: explicitly read
     9  import sys
    10
    11  f = open("students.txt",'r')
    12  while True:
    13      line = f.readline() # at EOF return empty string
    14      if not line:
    15          break
    16      else:
    17          sys.stdout.write(line)
    18
    19  # file_stdin.py: read from stdin
    20  import sys
    21
    22  for line in sys.stdin:
    23      sys.stdout.write(line) # "line" has "\n" at end

    Output:
    % python file.py
    Bruce Wayne      85  67  134
    Peter Parker     72  71  129 
    Bruce Banner     55  65  114
    Clark Kent       91  88  143
    Princess Diana   70  62  131
    Barbara Gordon   96  89  147
    Selina Kyle      77  74  105
    
    % python file2.py
    Bruce Wayne      85  67  134
    Peter Parker     72  71  129
    . . .
    Selina Kyle      77  74  105

    % python file_stdin.py < students.txt
    Bruce Wayne      85  67  134
    Peter Parker     72  71  129
    . . .
    Selina Kyle      77  74  105
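
None of the three programs above closes the file explicitly. A common alternative, not used in the course listings, is the "with" statement, which closes the file when its block ends. The sketch below writes its own two-line sample file to a temporary directory so that it runs on its own; the temporary path is a stand-in for "students.txt", and the name "lines" is just for illustration:

```python
import os
import sys
import tempfile

# Create a small sample file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "students.txt")
with open(path, "w") as out:
    out.write("Bruce Wayne      85  67  134\n")
    out.write("Peter Parker     72  71  129\n")

lines = []
with open(path, "r") as f:     # f is closed when this block ends
    for line in f:             # same iteration as in file.py
        lines.append(line)     # each "line" still ends with "\n"
        sys.stdout.write(line)
```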

    Below is a program that fakes having a file, "opening" it, and "reading" from it. This can be convenient if you interact with a terminal window that has Python but doesn't support input. Notice that the only thing changed below is the new definition of the variable f, which is now a list of strings rather than an open file.

    Fake a file:

    # fakefile.py: fake file input
    import sys

    f = [ "Bruce Wayne      85  67  134\n",
          "Peter Parker     72  71  129\n",
          "Bruce Banner     55  65  114\n",
          "Clark Kent       91  88  143\n",
          "Princess Diana   70  62  131\n",
          "Barbara Gordon   96  89  147\n",
          "Selina Kyle      77  74  105\n" ]
    for line in f:
        sys.stdout.write(line)

    Output:
    % python fakefile.py
    Bruce Wayne      85  67  134
    Peter Parker     72  71  129 
    Bruce Banner     55  65  114
    Clark Kent       91  88  143
    Princess Diana   70  62  131
    Barbara Gordon   96  89  147
    Selina Kyle      77  74  105
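
A list of strings supports iteration but not "readline( )". If the faked file also has to work with an explicit read loop, one option is the standard library's io.StringIO, which wraps a string in a file-like object. This is a sketch assuming Python 3; the names "first" and "rest" are for illustration only:

```python
import io
import sys

# Python 3 assumed: io.StringIO makes a str behave like an open text file
f = io.StringIO("Bruce Wayne      85  67  134\n"
                "Peter Parker     72  71  129\n")

first = f.readline()           # works like a real file's readline()
sys.stdout.write(first)

rest = [line for line in f]    # iteration continues after that line
for line in rest:
    sys.stdout.write(line)
```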
    


  • Formatted input using a regular expression: Here we don't show the input, but show how the fields of one string are extracted.

    A regular expression (RE) is what comes between the quote marks in  r"xxx" ; in that case the RE is "xxx". (The "r" prefix makes a raw string, so backslashes reach the RE unchanged.) Each RE describes a collection of character strings. (The RE "xxx" describes only a triple of "x"s.) In the example here, we want to describe the contents of lines in the file. Short-hand notation gives us "\s" for a white space character, "\d" for a digit, and "\w" for a character in a word (a letter, a digit, or an underscore). Adding a "+" to the right means "one or more occurrences of". Anything inside parentheses is matched as a group and is available for use afterward. (A literal parenthesis is represented by "\(" or "\)".) The "compile" function makes the RE ready for use, and the "search" method looks for the first place in the string where the RE matches. There's a lot more to this, and we'll go over it later.

    Use REs to Extract Fields From a String:

    # regexp.py: use RE to extract fields
    import sys
    import re

    def printstr(s):
        sys.stdout.write("\"" + s + "\"\n")

    r = re.compile(r"(\w+ \w+)\s+(\d+)\s+(\d+)\s+(\d+)")
    m = r.search("Bruce Wayne      85  67  134")
    if m != None:
        for i in range(0,5):
            printstr(m.group(i))
        t = r.split("Bruce Wayne      85  67  134")
        sys.stdout.write(str(t) + "\n")

    Output:
    % python regexp.py
    "Bruce Wayne      85  67  134"
    "Bruce Wayne"
    "85"
    "67"
    "134"
    ['', 'Bruce Wayne', '85', '67', '134', '']
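
It can help to see what the same RE does with lines that do not fit the expected pattern. The sketch below assumes Python 3; the helper "fields" and the two extra input strings are made up for illustration and are not part of the course files:

```python
import re

r = re.compile(r"(\w+ \w+)\s+(\d+)\s+(\d+)\s+(\d+)")

def fields(s):
    """Hypothetical helper: the four groups as strings, or None."""
    m = r.search(s)
    if m is None:
        return None
    return m.groups()

print(fields("Bruce Wayne      85  67  134"))
# ('Bruce Wayne', '85', '67', '134')

print(fields("Bruce Wayne      85  67"))        # only two scores
# None

print(fields("Mary Jane Watson 90  80  120"))   # three-word name
# ('Jane Watson', '90', '80', '120')

print(re.findall(r"\w+", "exam_1 85"))          # \w covers digits and "_"
# ['exam_1', '85']
```

The third call shows that "search" accepts a match starting anywhere in the string, so the first word of a three-word name is silently dropped rather than reported as an error.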
    


  • Making use of a Python list: In Python, the common and versatile data structure "list" is a combination of all the features of a linked list and of an array, along with much more besides.

List in Python

List Notation:

# ave.py: calculate average of list
import sys

ave = [81.71, 77.71, 66.86, 92.00,
       75.14, 94.86, 73.14]
tot = 0
for av in ave:
    tot += av
sys.stdout.write("Average: %6.2f\n"
      % (tot/len(ave)) )

Array Notation:

# ave.py: calculate average of list
import sys

ave = [81.71, 77.71, 66.86, 92.00,
       75.14, 94.86, 73.14]
tot = 0
for i in range(0,len(ave)):
    tot += ave[i]
sys.stdout.write("Average: %6.2f\n"
      % (tot/len(ave)) )

Common Output:
% python ave.py
Average:  80.20
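
Beyond the two loops above, a list supports a number of other operations. The following sketch shows a few of them on a shortened copy of the data; the names "first", "last", "middle", "removed" and "mean" are for illustration only:

```python
ave = [81.71, 77.71, 66.86, 92.00]   # first four grades from above

ave.append(75.14)            # grow at the end
first = ave[0]               # index like an array
last = ave[-1]               # a negative index counts from the end
middle = ave[1:3]            # slice: a new list holding items 1 and 2
ave.insert(0, 100.0)         # insert at the front (shifts the other items)
removed = ave.pop(0)         # remove and return the front item again
mean = sum(ave) / len(ave)   # built-in sum() replaces the explicit loop
print("Average: %6.2f" % mean)
```

Note that the copy program uses "sum" as a variable name, which hides the built-in function of the same name for the rest of that program.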


Suppose you need to use Python from a browser: The best case is to use the ideone Python 2 and 3 simulator. The link is set for Java by default. You have to change a box from Java to Python or to Python 3.

You can copy the whole students.txt file into their spot for stdin. You have to read from the file sys.stdin. One way is to use the line

    for  line  in  sys.stdin:
as shown above, in place of line 10 in the original program. (You also need to delete or comment out line 4, the line that opens the file.)
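
One way to make the same code work with a file, a faked list, or sys.stdin is to put the loop in a function that accepts any sequence of lines. This is only a sketch under that assumption; the function name "averages" is hypothetical and does not appear in the course program:

```python
import re
import sys

r = re.compile(r"(\w+ \w+)\s+(\d+)\s+(\d+)\s+(\d+)")

def averages(lines):
    """Hypothetical helper: percent score for each matching line."""
    ave = []
    for line in lines:       # a file, a list of strings, or sys.stdin
        m = r.search(line)
        if m is None:
            break            # same early exit as the original program
        total = float(m.group(2)) + float(m.group(3)) + float(m.group(4))
        ave.append(total / 3.5)
    return ave

# Demonstration with a faked file; in a browser, pass sys.stdin instead:
#     result = averages(sys.stdin)
result = averages(["Bruce Wayne      85  67  134\n",
                   "Peter Parker     72  71  129\n"])
for aver in result:
    sys.stdout.write("%6.2f\n" % aver)
```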

( Revision date: 2015-01-02. Please use ISO 8601, the International Standard.)