Textbook: Introduction to the Design and Analysis of Algorithms
 CS 3343/3341
 Analysis of Algorithms
 Spring 2012

 Recitation 12
 Hashing, Reductions

    Partial Answers  


  1. Look at the writeup Zero-Knowledge Proofs and in particular at the table on page 146.

    1. Give simple pseudo-code showing how anyone could prove that the "Cycle" column represents a Hamiltonian cycle. (This should be an algorithm not using pictures.)

      (Informal version)
      First check that the vertices listed in the "Cycle"
      column include every vertex exactly once.
      Second, check that each vertex in the "Cycle" column
      also appears in the same row under "Edges".
      Then it has to be a Hamiltonian cycle.
      

      // More formal version (Java-like). Cycle[i] is the vertex claimed
      // to follow vertex i on the cycle; Edges[i] lists the vertices
      // adjacent to vertex i.
      boolean isHamilton(int[][] Edges, int[] Cycle) {
         int n = Cycle.length;                   // 20 in the example
         boolean[] visited = new boolean[n];     // all false initially
         for (int i = 0; i < n; i++) {
            boolean found = false;               // is Cycle[i] an edge of i?
            for (int v : Edges[i])
               if (v == Cycle[i]) found = true;
            if (!found) return false;            // no such edge exists
            if (visited[Cycle[i]]) return false; // vertex repeated
            visited[Cycle[i]] = true;
         }
         return true;   // every vertex appears exactly once, via real edges
      }
      

    2. Is your algorithm polynomial time? Sure, O(E+V).
    3. Assuming that it is, what does this fact alone say about the status of the problem Hamiltonian cycle? HC is in NP. This does NOT show that HC is NP-complete, and a problem satisfying this condition might be in P.
    4. What data does Alice show to Bob in case he asks to see the Hamiltonian cycle? She shows each integer in the Cycle column, and shows that the same integer is in the Edges list in the same row. (Bob must verify that each integer appears exactly once in the Cycle column.)
    5. How many trials would Bob need to give him at least 99.9% confidence that Alice is not cheating? In each trial, a cheating Alice escapes detection with probability 1/2, so after n trials that all check out, the chance she has been cheating all along is at most 1/2^n. We need 1/2^n < 1/1000, and since 1/2^10 = 1/1024 < 1/1000, 10 trials suffice.
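      A quick sanity check of that arithmetic, as a throwaway C sketch:

      #include <stdio.h>
      int main() {
         double p = 1.0;       /* chance a cheater survives n trials */
         int n = 0;
         while (p >= 0.001) {  /* stop at 99.9% confidence */
            p /= 2.0;
            n++;
         }
         printf("n = %d trials, survival chance %g\n", n, p);  /* n = 10 */
         return 0;
      }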


  2. Hashing

    1. Suppose you are using the hash function hash(anything) = 3, which returns 3 no matter what the input. Explain what happens with both chaining and with open addressing in this case. With chaining, you would simply have a single linked list, hanging from slot 3, that contained all the entries. With open addressing (assuming linear probing), you would fill the hash table one element at a time starting at index 3, eventually wrapping back around to index 2; see the sketch below.
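      A minimal C sketch of that open-addressing behavior with linear
      probing; the table size, the empty-slot marker, and the routine
      name are made up for illustration:

      #define M 16             /* made-up table size */
      int table[M];            /* 0 marks an empty slot; keys nonzero */

      /* Insert with linear probing; hash(key) is the constant 3. */
      int insert(int key) {
         int i = 3;            /* every key hashes to slot 3 */
         for (int probes = 0; probes < M; probes++) {
            if (table[i] == 0) { table[i] = key; return i; }
            i = (i + 1) % M;   /* try the next slot, wrapping around */
         }
         return -1;            /* table is completely full */
      }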

    2. Suppose you use hashing with open addressing in a table of size m. Explain what happens as you insert m items into this table. (This assumes a normal hash function, rather than the function in part (i) above.) You would fill up the table completely. Toward the end you would be chasing through long runs of occupied entries looking for an empty spot; under uniform hashing, the expected number of probes for an insertion grows like 1/(1 - α), where α is the load factor. For this reason, if you use open addressing you should not let the table get too full.

    3. The Birthday Paradox asks how many people with random birthdays it takes to have better than even odds that at least two have the same birthday (same month and same day). (This ignores February 29.)

      It is easier to calculate the probability that among n people, no two have the same birthday. Denote this by N(n). For two people, the second one will have a birthday different from the first with probability:
              N(2) = 364/365.
      If there is a third person, the probability that he has a birthday different from the first two (now assumed distinct) is 363/365, so that
              N(3) = (364/365)*(363/365).
      Similarly,
              N(4) = (364/365)*(363/365)*(362/365),
      and in general
              N(n) = N(n-1)*(365 - n + 1)/365.
      Of course, N(1) = 1.

      The probability, call it S(n), that at least two people from among n have the same birthday is just:
              S(n) = 1 - N(n).

      Write a simple program that will calculate and print S(n) for n = 1, 2, 3, ..., 50. These should be calculated as doubles.

      #include <stdio.h>
      int main() {
         double N[51];   /* N[i] = probability that i birthdays all differ */
         N[1] = 1;       /* S(1) = 0, so printing starts at n = 2 */
         int i;
         for (i = 2; i <= 50; i++) {
            /* person i avoids the i-1 earlier, distinct birthdays */
            N[i] = N[i-1]*(365. - i + 1)/365.;
            printf("S(%i) = %10.8f\n",
               i, 1 - N[i]);
         }
         return 0;
      }
      S(2)  = 0.00273973
      S(3)  = 0.00820417
      S(4)  = 0.01635591
      S(5)  = 0.02713557
      S(6)  = 0.04046248
      S(7)  = 0.05623570
      S(8)  = 0.07433529
      S(9)  = 0.09462383
      S(10) = 0.11694818
      
      S(11) = 0.14114138
      S(12) = 0.16702479
      S(13) = 0.19441028
      S(14) = 0.22310251
      S(15) = 0.25290132
      S(16) = 0.28360401
      S(17) = 0.31500767
      S(18) = 0.34691142
      S(19) = 0.37911853
      S(20) = 0.41143838
      S(21) = 0.44368834
      S(22) = 0.47569531
      S(23) = 0.50729723
      S(24) = 0.53834426
      S(25) = 0.56869970
      S(26) = 0.59824082
      S(27) = 0.62685928
      S(28) = 0.65446147
      S(29) = 0.68096854
      S(30) = 0.70631624
      
      S(31) = 0.73045463
      S(32) = 0.75334753
      S(33) = 0.77497185
      S(34) = 0.79531686
      S(35) = 0.81438324
      S(36) = 0.83218211
      S(37) = 0.84873401
      S(38) = 0.86406782
      S(39) = 0.87821966
      S(40) = 0.89123181
      S(41) = 0.90315161
      S(42) = 0.91403047
      S(43) = 0.92392286
      S(44) = 0.93288537
      S(45) = 0.94097590
      S(46) = 0.94825284
      S(47) = 0.95477440
      S(48) = 0.96059797
      S(49) = 0.96577961
      S(50) = 0.97037358
      

    4. What does Part iii of this exercise say about using open addressing to insert into a hash table of size 365? If you have a hash table of size 365, then after inserting 23 items, you'll have more than a 50% chance of a collision. After 50 items, there is a 97% chance of a collision.

      This doesn't mean that collisions are terrible events that must be avoided. Rather, it means that collisions will occur even when the hash table is far larger than the number of entries. So our algorithms have to deal with collisions, as they do.


  3. Reductions. Suppose you are writing software for a high-precision floating-point engine. (Instead of 64-bit floating-point numbers, perhaps 1024-bit fp numbers -- most of these bits used for the significant part, and the remainder for the sign and exponent.) Suppose you have written reasonably efficient addition, subtraction, and negation routines for these numbers, and you can also efficiently multiply and divide by powers of 2. Now you want to add other arithmetic functions, especially multiplication and division.

    We will look at 4 problems:

       Pm: multiplication (x*y)
       Pd: division (x/y)
       Pr: reciprocal (1/x)
       Ps: squaring (x^2)

    We will be writing things like P1 ≤ P2 to mean a reduction of P1 to P2, similar to the book's ≤p notation. However, here it's not a polynomial-time reduction, but rather an efficient one, comparable in cost to the other operations on these numbers. Intuitively, this means: if you can compute P2 efficiently, then you can compute P1 efficiently.

    1. If 1000 binary bits are used for the significant part, roughly how many decimal digits of accuracy would we have? On average there are log2(10) = 3.321928 bits per decimal digit, so 1000 bits gives 1000/log2(10) = 301.02999 digits.

    2. Explain why it is trivial to see that Ps ≤ Pm. (Squaring is just the special case x*x of multiplication.)

    3. Use the formula below (with either part on the right) to show that Pm ≤ Ps.
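      A standard identity offering "either part on the right" (presumably
      the one intended) is

              x*y = ((x+y)^2 - (x-y)^2)/4 = ((x+y)^2 - x^2 - y^2)/2,

      and either form computes a product using only squarings, additions,
      subtractions, and a division by a power of 2. A minimal sketch of
      the reduction, with made-up bignum routine names:

      /* Pm <= Ps: multiplication from squaring. The bignum type and the
         square, add, sub, div_pow2 routines are hypothetical primitives
         assumed efficient for these high-precision numbers. */
      bignum mult(bignum x, bignum y) {
         bignum s = square(add(x, y));    /* (x+y)^2 */
         bignum d = square(sub(x, y));    /* (x-y)^2 */
         return div_pow2(sub(s, d), 2);   /* divide by 2^2 = 4 */
      }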

    4. Use the formula below to show that Ps ≤ Pr.
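      Presumably the intended formula is something like

              x^2 = 1/(1/x - 1/(x+1)) - x,

      which works because 1/x - 1/(x+1) = 1/(x^2 + x): three reciprocals,
      together with additions and subtractions, yield the square. (Special
      cases such as x = 0 or x = -1 would need separate handling.)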

    5. Recall that we used the following iteration formula from Newton's method to find the reciprocal. Argue that this shows that Pr ≤ Pm. [Note: Until Tuesday morning this was written Pd ≤ Pm in error.]

      You should explain in some detail how you intend to handle the initial guess for Newton's method, which should not be too far from the correct answer.
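      The iteration in question is presumably the standard Newton step for
      f(x) = 1/x - a:

              x_{k+1} = x_k*(2 - a*x_k),

      which uses only multiplications and a subtraction, so efficient
      multiplication gives efficient reciprocals. For the initial guess:
      if a = m*2^e with 1/2 <= m < 1, take x_0 = 2^(-e), which only
      requires negating the exponent; the relative error then starts
      below 1/2 and squares on every iteration. A sketch using ordinary
      doubles, where frexp and ldexp play the role of exponent handling:

      #include <stdio.h>
      #include <math.h>

      /* Newton's iteration for 1/a, using only multiply, subtract,
         and powers of 2 (illustrated with doubles; a > 0 assumed). */
      double reciprocal(double a) {
         int e;
         frexp(a, &e);                /* a = m * 2^e, 1/2 <= m < 1  */
         double x = ldexp(1.0, -e);   /* initial guess x_0 = 2^(-e) */
         for (int i = 0; i < 6; i++)  /* error squares each step    */
            x = x*(2.0 - a*x);        /* x_{k+1} = x_k*(2 - a*x_k)  */
         return x;
      }

      int main() {
         printf("%.15f\n", reciprocal(7.0));   /* about 0.142857142857143 */
         return 0;
      }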

    6. Argue that if you can do both Pm and Pr efficiently, then you can also do Pd efficiently.
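      The natural route is x/y = x*(1/y): one reciprocal followed by one
      multiplication. In the sketch style used above (mult and reciprocal
      being the hypothetical routines from parts iii and v):

      bignum divide(bignum x, bignum y) {
         return mult(x, reciprocal(y));   /* one Pr call, one Pm call */
      }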

    7. Explain why if we can do Ps efficiently, then we can also do the other three efficiently: Pm, Pr, and Pd. (This would be a quick and dirty way to do a half-way efficient implementation.) [Note: Part iv above is just an extra fact that shows that Pm, Pr, and Ps are equivalent. Part iv is not needed to show this part vii.]
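      (Concretely: Ps gives Pm by part iii, Pm gives Pr by part v, and Pm
      together with Pr gives Pd by part vi.)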


Revision date: 2012-04-20. (Please use ISO 8601, the International Standard.)