CS 3723
  Programming Languages  
  3.1 Strings   

Reference links given below will be to the online book: Building skills in Python


3.1. Strings: See especially Strings for a lot of material not covered below.


3.1.1. Extended Slices: See Extended Slices. These apply equally to strings, tuples, and lists. But keep in mind that strings and tuples are immutable, so copies must be made in those cases.

  • slice operator [ ] with a single operand: this picks off the element indexed by the operand. A sequence a (string, tuple, or list) has its elements counted by either one of:

            0,         1, ..., len(a)-2, len(a)-1
      -len(a), -len(a)+1, ...,       -2,       -1

    Any index i satisfying i > len(a)-1. or i < -len(a) will raise an IndexError exception.

  • slice operator with a two operands: this looks like [start:end], and picks off a contiguous string of elements from start up to but not including end. Of course these have to be in range, they can be variables, start defaults to 0 and end defaults to len(a). The program below shows a few simple cases of this.

  • slice operator with a three operands: this looks like [start:end:step] ("step" is also called "stride"). step defaults to 1 and cannot be 0, but can be negative. Here, the construct steps through the sequence forward (positive step) or backward (negative step), picking off elements. Again some examples are given below.

    By far the most important example is to use the operator [::-1] to reverse a sequence (string, list, or tuple). (There doesn't seem to be a reverse() method for strings, though there is one for lists.)

Use of Extended Slices
Source Output
# slice.py: extended slices
import sys

a = "abcdefghij"
sys.stdout.write("a=       " + a + '\n')
sys.stdout.write("a[1:5]=  " + a[1:5] + '\n')
sys.stdout.write("a[3:]=   " + a[3:] + '\n')
sys.stdout.write("a[:4]=   " + a[:4] + '\n')
sys.stdout.write("a[-5:-2]=" + a[-5:-2] + '\n')
sys.stdout.write("a[-5:7]= " + a[-5:7] + '\n')
sys.stdout.write("a[-5:2]= " + a[-5:2] + '(empty)\n')

abc = "abcdefghijklmnopqrstuvwxyz"
sys.stdout.write("abc=        " + abc + '\n')
sys.stdout.write("abc[::2]=   " + abc[::2] + '\n')
sys.stdout.write("abc[1::2]=  " + abc[1::2] + '\n')
sys.stdout.write("abc[::-1]=  " + abc[::-1] + '\n')
sys.stdout.write("abc[-1::-2]=" + abc[-1::-2] + '\n')
sys.stdout.write("abc[:-8:3]= " + abc[:-8:3] + '\n')
% python slice0.py



a=       abcdefghij
a[1:5]=  bcde
a[3:]=   defghij
a[:4]=   abcd
a[-5:-2]=fgh
a[-5:7]= fg
a[-5:2]= (empty)


abc=        abcdefghijklmnopqrstuvwxyz
abc[::2]=   acegikmoqsuwy
abc[1::2]=  bdfhjlnprtvxz
abc[::-1]=  zyxwvutsrqponmlkjihgfedcba
abc[-1::-2]=zxvtrpnljhfdb
abc[:-8:3]= adgjmp


3.1.2. Exchanging Rows and Columns (transpose):

I was playing around with extended slices, trying a succession of incremented slices, and hit on the idea of arranging a list of names by column as is done by default in Linux/Unix using the ls utility to output file names in a directory. Looked at another way, this does a sort of matrix transpose, except that the number of columns depends on how long the individual file names are and how wide the available field is. It's interesting that the Python function works with any sequence -- I show it for three cases: strings, lists of integers, and lists of strings. (For strings, it's invoking str() on something already a string, but that does nothing.)

Use of Extended Slices
Source Output
# exch.py: extended slices
import sys

def exch(a, n):
    sys.stdout.write("exch(.," +
      str(n)+'):\n')
    m = [None] * n
    for j in range(0,n):
        m[j] = a[j::n]
    for j in range(0,n):
        sys.stdout.write(str(m[j])+'\n')
    sys.stdout.write('\n')
    # sys.stdout.write(str(m) + '\n')
    
a = "abcdefghijklmnopqrstuvwxyz"
# sys.stdout.write("a=       " + a + '\n')
exch(a,3)
exch(a,4)
exch(a,5) 
exch(a,7) 

b = []
for i in range(0,22):
    b.append(i)
# sys.stdout.write(str(b) + '\n\n')
exch(b,3)
exch(b,4)
exch(b,5)
exch(b,7)
% python exch.py
exch(.,3):
adgjmpsvy
behknqtwz
cfilorux

exch(.,4):
aeimquy
bfjnrvz
cgkosw
dhlptx

exch(.,5):
afkpuz
bglqv
chmrw
dinsx
ejoty

exch(.,7):
ahov
bipw
cjqx
dkry
elsz
fmt
gnu
(a few blanks inserted below)
exch(.,3):
[0, 3, 6,  9, 12, 15, 18, 21]
[1, 4, 7, 10, 13, 16, 19]
[2, 5, 8, 11, 14, 17, 20]

exch(.,4):
[0, 4,  8, 12, 16, 20]
[1, 5,  9, 13, 17, 21]
[2, 6, 10, 14, 18]
[3, 7, 11, 15, 19]

exch(.,5):
[0, 5, 10, 15, 20]
[1, 6, 11, 16, 21]
[2, 7, 12, 17]
[3, 8, 13, 18]
[4, 9, 14, 19]

exch(.,7):
[0,  7, 14, 21]
[1,  8, 15]
[2,  9, 16]
[3, 10, 17]
[4, 11, 18]
[5, 12, 19]
[6, 13, 20]
Result of printing m in each case:
['adgjmpsvy', 'behknqtwz', 'cfilorux']
['aeimquy', 'bfjnrvz', 'cgkosw', 'dhlptx']
['afkpuz', 'bglqv', 'chmrw', 'dinsx', 'ejoty']
['ahov', 'bipw', 'cjqx', 'dkry', 'elsz', 'fmt', 'gnu']
[[0, 3, 6, 9, 12, 15, 18, 21], [1, 4, 7, 10, 13, 16, 19], [2, 5, 8, 11, 14, 17, 20]]
[[0, 4, 8, 12, 16, 20], [1, 5, 9, 13, 17, 21], [2, 6, 10, 14, 18], [3, 7, 11, 15, 19]]
[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17], [3, 8, 13, 18], [4, 9, 14, 19]]
[[0, 7, 14, 21], [1, 8, 15], [2, 9, 16], [3, 10, 17], [4, 11, 18], [5, 12, 19], [6, 13, 20]]

Here is the program invoked for a list of file names. In the output I inserted blanks by hand to make it look more like it normally would be, but that would be an additional requirement. This program is just an illustration of the use of extended slices, and is not how one would want to program it in practice.

Write a list of names alphabetic by columns
# exch2.py: directory listing
import sys

def exch(a, n):
    sys.stdout.write("exch(.," +
      str(n)+'):\n')
    m = [None] * n
    for j in range(0,n):
        m[j] = a[j::n]
    for j in range(0,n):
        sys.stdout.write(str(m[j])+'\n')
    sys.stdout.write('\n')

dir = ["arctanz.gif", "arrays.html", "books.html", "ceder", "classes",
       "containers.html", "copy", "debug", "dict", "ex",
       "fa", "fa0", "fa2", "fp", "fp2",
       "fsm", "func_prog.txt", "future.txt", "index.html", "intro.html",
       "lists.html", "logos", "mips.py.s", "parse", "pi1.gif",
       "piproduct.gif", "pl1.crop.png", "pl2.png", "pl.html", "psum0.pl",
       "psum0.py", "psum.pl", "psum.py", "pvp.html", "random2.html",
       "random.html", "re", "readchar", "recs", "refs",
       "rpn.py", "stack", "start.html", "strings", "stuff.txt",
       "summerfield", "testit.py", "timedate", "tiny", "top10pl.jpg",
       "xkcd.html"] # len(dir) == 51

exch(dir,8)  # want 7 cols. ceil(51/7) == 8
exch(dir,11) # want 5 cols. ceil(51/5) == 11
% python exch.py
exch(.,8):
['arctanz.gif',     'dict', 'func_prog.txt', 'pi1.gif',       'psum.py',      'rpn.py',     'tiny']
['arrays.html',     'ex',   'future.txt',    'piproduct.gif', 'pvp.html',     'stack',      'top10pl.jpg']
['books.html',      'fa',   'index.html',    'pl1.crop.png',  'random2.html', 'start.html', 'xkcd.html']
['ceder',           'fa0',  'intro.html',    'pl2.png',       'random.html',  'strings']
['classes',         'fa2',  'lists.html',    'pl.html',       're',           'stuff.txt']
['containers.html', 'fp',   'logos',         'psum0.pl',      'readchar',     'summerfield']
['copy',            'fp2',  'mips.py.s',     'psum0.py',      'recs',         'testit.py']
['debug',           'fsm',  'parse',         'psum.pl',       'refs',         'timedate']

exch(.,11):
['arctanz.gif',     'fa0',           'mips.py.s',     'pvp.html',     'stuff.txt']
['arrays.html',     'fa2',           'parse',         'random2.html', 'summerfield']
['books.html',      'fp',            'pi1.gif',       'random.html',  'testit.py']
['ceder',           'fp2',           'piproduct.gif', 're',           'timedate']
['classes',         'fsm',           'pl1.crop.png',  'readchar',     'tiny']
['containers.html', 'func_prog.txt', 'pl2.png',       'recs',         'top10pl.jpg']
['copy',            'future.txt',    'pl.html',       'refs',         'xkcd.html']
['debug',           'index.html',    'psum0.pl',      'rpn.py']
['dict',            'intro.html',    'psum0.py',      'stack']
['ex',              'lists.html',    'psum.pl',       'start.html']
['fa',              'logos',         'psum.py',       'strings']
(last listing as it would look in Unix/Linux without extra separators)

arctanz.gif     fa0           mips.py.s     pvp.html     stuff.txt
arrays.html     fa2           parse         random2.html summerfield
books.html      fp            pi1.gif       random.html  testit.py
ceder           fp2           piproduct.gif re           timedate
classes         fsm           pl1.crop.png  readchar     tiny
containers.html func_prog.txt pl2.png       recs         top10pl.jpg
copy            future.txt    pl.html       refs         xkcd.html
debug           index.html    psum0.pl      rpn.py
dict            intro.html    psum0.py      stack
ex              lists.html    psum.pl       start.html
fa              logos         psum.py       strings

 


3.1.3. Example: Palindromes: The year 2015 is a palindrome base 2. So which years near 2015 are palindromes to some base? Here is a Python program that produces those for years from 1900 to 2115 and for bases from 2 to 10:

Which years (1900-2115) are palindromes to some base (2-10)
n Source Output
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# palin.py: which years are palindromes
import sys

def base(n, b): # n base b
    res = ""
    while n > 0:
        res = str(n%b) + res
        n = n / b
    return res

def baser(n, b): # reversed
    res = ""
    while n > 0:
        res = res + str(n%b)
        n = n / b
    return res

for b in range(2,11):
    sys.stdout.write("BASE: " + str(b) + "\n")
    for i in range(1900,2116):
        if base(i, b) == baser(i, b):
            sys.stdout.write(" " + str(i)+"=")
            sys.stdout.write(base(i, b) +
              "<sub>" + str(b) + "</sub>\n")
    sys.stdout.write("\n")
% python palin.py
BASE: 2
 1911= 111011101112
 1935= 111100011112
 1967= 111101011112
 2015= 111110111112
 2047= 111111111112
 2049=1000000000012
BASE: 3
 1913=21212123
 1940=21222123
 1952=22000223
 1979=22010223
 2006=22020223
 2042=22101223
 2069=22111223
 2096=22121223

BASE: 4
 1965=1322314
 2045=1333314
 2050=2000024
BASE: 5
 1903=301035
 1928=302035
 1953=303035
 1978=304035
 2008=310135
 2033=311135
 2058=312135
 2083=313135
 2108=314135

BASE: 6
 1921=125216
 1963=130316
 1999=131316
 2035=132316
 2071=133316
 2107=134316
BASE: 7
 1944=54457
 2000=55557
 2056=56657
 2064=60067

BASE: 8
 1971=36638
 2043=37738
 2052=40048

BASE: 9
 1910=25529
 2000=26629
 2090=27729

BASE: 10
 1991=199110
 2002=200210
 2112=211210

Here are ones that are closest to 2015:

Before 2015After 2015
1971 = 36638
1978 = 304035
1979 = 22010223
1991 = 199110
1999 = 131316
2000 = 26629
2000 = 55557

2002 = 200210
2006 = 22020223
2008 = 310135
2015 = 111110111112
2033 = 311135
2035 = 132316
2042 = 22101223
2043 = 37738
2045 = 1333314
2047 = 111111111112
2049 = 1000000000012
2050 = 2000024
2052 = 40048
2056 = 56657
2058 = 312135

So we must wait at least 18 years for the next one, while we had 6 of them in the 16 years before 2015. In fact the gap from 2015 to 2033 is the longest one in this restricted range. Notice that 2000 is the only year that appears twice. Over a longer range (1492-2200) there are lots of other pairs, but only one longer gap: 1850-1872

Items of Interest or for study:

  • reverse: Python has a reverse() method for lists and tuples, but apparently not for strings. Comparing a string with its reversal is a standard way to check for a palindrome, though not the only way (one could, say, compare initial and terminal symbols).

  • In the program above, base converts to a given base, and it was easy to have another function baser that outputs the same characters in reverse order.

  • The program outputs a little bit of actual HTML so that the subscripts will look correct on the web page.

  • Online, someone suggested using s[::-1] to reverse a string s. "This is extended slice syntax. It works by doing [begin:end:step] -- by leaving begin and end off and specifying a step of -1, it reverses a string." See Extended Slices. These are objects that are powerful, and fairly easy to use and to understand.

  • Using the previous item, the function baser can have the shorter (but not simpler) form:

      def baser(n, b): # reversed
            return base(n,b)[::-1]

  • With the above in mind, here's the whole program rewritten, using recursion for base and using the extended slice operator [::-1] to reverse a string (the output is exactly the same):

    Years that are palindromes
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    # palin2.py: which years are palindromes
    import sys
    
    def base(n, b): # incorrect for n == 0
        if n == 0:
            return ""
        return base(n/b,b) + str(n%b)
    
    for b in range(2,11):
        sys.stdout.write("BASE: " + str(b) + "\n")
        for i in range(1900,2116):
            res = base(i,b)
            if res == res[::-1]:
                sys.stdout.write(" " + str(i) + "=")
                sys.stdout.write(res +
                  "<sub>" + str(b) + "</sub>\n")
        sys.stdout.write("\n")
    

(Revision date: 2015-02-04. Please use ISO 8601, the International Standard.)