Quick Notes About Hashing Suppose you want to look up names (strings) in an array, where the name is a key of a record (key uniquely determines the record). For example: +--------+------------+ | Key | Extra Data | +--------+------------+ ... 17 | ralph | blah | 18 | wayne | blew | 19 | henry | blow | ... Suppose also that you have a "magic" function "hash" such that hash("henry") = 19 Then we could look up "henry" easily using the function "hash". But instead, we could plan from the beginning to insert the henry record at array index 19, where we can easily use "hash" to find the record. Problems: 1. Need hash function (lots of these -- see below) 2. Collisions: two strings have same hash value, for example, if hash("roger") = 19 also. Two solutions for collisions: a. (chaining or bucketing) Use a linked list of all records whose keys hash to the same number b. (open addressing) Store any additional record in the array "near" the proper location Hash function properties: Want results well-distributed. Want result to depend on every byte of argument (If a String is used as argument, then a single change to one character, such at "b" for "a" or "2" for "1", should produce a completely independent result.) Example hash function: int m = 50; // size of hash table int p = 1234321; // multiplier public int hash(String s, int p, int m) { int result = 0; for (int i = 0; i < s.length(); i++) { result = (result + s.charAt(i)*p)%m; } return result; } Notice that the character is converted to an Ascii code for use in the calculation. At each stage "%m" divides by m and takes the remainder. Except for integer overflow, we could do a single "%m" at the end. Example results: Input to hash: "jadams": # ascii char 0 106 j 1 97 a 2 100 d 3 97 a 4 109 m 5 115 s Input to hash: "Faraz Ali": # ascii char 0 70 F 1 97 a 2 114 r 3 97 a 4 122 z 5 32 (blank) 6 65 A 7 108 l 8 105 i Distribution of 38 names over the 50 slots (0 through 49 inclusive): p = 1234321, m = 50 0 0 2 1 0 0 0 0 2 0 2 2 0 1 1 1 1 0 1 0 2 1 0 2 0 0 1 0 0 1 2 0 1 1 1 1 1 0 1 2 2 1 0 0 0 2 0 0 0 2 p = 1, m = 50 0 1 0 1 1 0 1 0 1 2 2 0 2 2 0 1 1 0 0 2 2 1 0 1 0 0 0 0 1 0 2 0 0 0 1 1 0 0 0 0 2 2 1 1 0 2 1 0 2 1 m = 365 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 Birthday paradox: If 23 people with random birthdays are assembled, the probability is > 0.5 the two of them will have the same birthday. Lookup and Insert: // lookUpAndInsert: insert if new and return index public int lookUpAndInsert(String s) { if (n == (m - 1)) { // hash table is full System.out.println("Hash table overflow"); System.exit(1); } int i = hash(s); while (true) { if (h[i].length() == 0) { // empty entry, insert h[i] = s; n++; return i; } else if (h[i].compareTo(s) == 0) { // found return i; } // look upward for empty entry i--; if (i < 0) i += m; } }