Quick Notes About Hashing

Suppose you want to look up names (strings) in an array, where
the name is a key of a record (key uniquely determines the record).
For example:

     +--------+------------+
     |  Key   | Extra Data |
     +--------+------------+
        ...
  17 | ralph  |   blah     |
  18 | wayne  |   blew     |
  19 | henry  |   blow     |
        ...

Suppose also that you have a "magic" function "hash" such that
   hash("henry") = 19
Then we could look up "henry" easily using the function "hash".
But instead, we could plan from the beginning to insert the
henry record at array index 19, where we can easily use
"hash" to find the record.

Problems:
1. Need hash function (lots of these -- see below)
2. Collisions: two strings have same hash value, for example,
   if hash("roger") = 19 also.
   Two solutions for collisions:
   a. (chaining or bucketing) Use a linked list of all records
      whose keys hash to the same number
   b. (open addressing) Store any additional record in the array
      "near" the proper location

Hash function properties:
   Want results well-distributed.
   Want result to depend on every byte of argument
    (If a String is used as argument, then a single
     change to one character, such at "b" for "a" or "2" for "1",
     should produce a completely independent result.)

Example hash function:

   int m = 50; // size of hash table
   int p = 1234321; // multiplier
   public int hash(String s, int p, int m) {
      int result = 0;
      for (int i = 0; i < s.length(); i++) {
         result = (result + s.charAt(i)*p)%m;
      } 
      return result;
   }

Notice that the character is converted to an Ascii code
for use in the calculation.  At each stage "%m" divides
by m and takes the remainder.  Except for integer overflow,
we could do a single "%m" at the end.

Example results:
Input to hash: "jadams":
# ascii char
0 106   j
1 97    a
2 100   d
3 97    a
4 109   m
5 115   s

Input to hash: "Faraz Ali":
# ascii char
0 70    F
1 97    a
2 114   r
3 97    a
4 122   z
5 32    (blank)
6 65    A
7 108   l
8 105   i


Distribution of 38 names over the 50 slots (0 through 49 inclusive):

p = 1234321, m = 50
0 0 2 1 0 0 0 0 2 0 2 2 0 1 1 1 1 0 1 0 2 1 0 2 0 0 1 0 0 1 2 0 1 1 1 1 1 0 1
2 2 1 0 0 0 2 0 0 0 2

p = 1, m = 50
0 1 0 1 1 0 1 0 1 2 2 0 2 2 0 1 1 0 0 2 2 1 0 1 0 0 0 0 1 0 2 0 0 0 1 1 0 0 0
0 2 2 1 1 0 2 1 0 2 1

m = 365
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 
0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 
0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 
0 2 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 
0 0 1 0 0 0 0 0 0 0 0 0 0 0 

Birthday paradox: If 23 people with random birthdays are assembled,
the probability is > 0.5 the two of them will have the same birthday.

Lookup and Insert:

   // lookUpAndInsert: insert if new and return index
   public int lookUpAndInsert(String s) {
      if (n == (m - 1)) { // hash table is full
         System.out.println("Hash table overflow");
         System.exit(1);
      }
      int i = hash(s);
      while (true) {
         if (h[i].length() == 0) { // empty entry, insert
            h[i] = s;
            n++;
            return i;
         }
         else if (h[i].compareTo(s) == 0) { // found
            return i;
         }
         // look upward for empty entry
         i--; if (i < 0) i += m;
      }
   }