Textbook: Introduction to the Design and Analysis of Algorithms
 CS 3343/3341
 Analysis of Algorithms
 Spring 2012

 Recitation 12
 Hashing, Reductions

    Partial Answers  


  1. Look at the writeup Zero-Knowledge Proofs and in particular at the table on page 146.

    1. Give simple pseudo-code showing how anyone could prove that the "Cycle" column represents a Hamiltonian cycle. (This should be an algorithm not using pictures.)

      (Informal version)
      First check that the vertices listed in the "Cycle"
      column include every vertex exactly once.
      Second, check that each vertex in the "Cycle" column
      also appears in the same row under "Edges".
      Then it has to be a Hamiltonian cycle.
      

      // More formal version (Java-like). Cycle[i] is the vertex claimed
      // to follow vertex i on the cycle; Edges[i] lists the vertices
      // adjacent to vertex i.
      boolean isHamilton(int[][] Edges, int[] Cycle) {
         int n = Cycle.length;                   // 20 in the example
         boolean[] visited = new boolean[n];     // all false initially
         for (int i = 0; i < n; i++) {
            boolean found = false;               // is Cycle[i] an edge of i?
            for (int v : Edges[i])
               if (v == Cycle[i]) found = true;
            if (!found) return false;            // no such edge exists
            if (visited[Cycle[i]]) return false; // vertex repeated
            visited[Cycle[i]] = true;
         }
         return true;   // every vertex appears exactly once, via real edges
      }
      

    2. Is your algorithm polynomial time? Sure, O(E+V).
    3. Assuming that it is, what does this fact alone say about the status of the problem Hamiltonian cycle? HC is in NP. This does NOT show that HC is NP-complete, and a problem satisfying this condition might be in P.
    4. What data does Alice show to Bob in case he asks to see the Hamiltonian cycle? She shows each integer in the Cycle column, and shows that the same integer is in the Edges list in the same row. (Bob must verify that each integer appears exactly once in the Cycle column.)
    5. How many trials would Bob need to give him at least 99.9% confidence that Alice is not cheating? In each trial, a cheating Alice escapes detection with probability 1/2, so after n trials that all check out, the chance she has been cheating all along is at most 1/2^n. We need 1/2^n < 1/1000, and since 1/2^10 = 1/1024 < 1/1000, 10 trials suffice.
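      A quick sanity check of that arithmetic, as a throwaway C sketch:

      #include <stdio.h>
      int main() {
         double p = 1.0;       /* chance a cheater survives n trials */
         int n = 0;
         while (p >= 0.001) {  /* stop at 99.9% confidence */
            p /= 2.0;
            n++;
         }
         printf("n = %d trials, survival chance %g\n", n, p);  /* n = 10 */
         return 0;
      }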


  2. Hashing

    1. Suppose you are using the hash function hash(anything) = 3, which returns 3 no matter what the input. Explain what happens with both chaining and with open addressing in this case. With chaining, you would simply have a single linked list, hanging from slot 3, that contained all the entries. With open addressing (assuming linear probing), you would fill the hash table one element at a time starting at index 3, eventually wrapping back around to index 2; see the sketch below.
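      A minimal C sketch of that open-addressing behavior with linear
      probing; the table size, the empty-slot marker, and the routine
      name are made up for illustration:

      #define M 16             /* made-up table size */
      int table[M];            /* 0 marks an empty slot; keys nonzero */

      /* Insert with linear probing; hash(key) is the constant 3. */
      int insert(int key) {
         int i = 3;            /* every key hashes to slot 3 */
         for (int probes = 0; probes < M; probes++) {
            if (table[i] == 0) { table[i] = key; return i; }
            i = (i + 1) % M;   /* try the next slot, wrapping around */
         }
         return -1;            /* table is completely full */
      }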

    2. Suppose you use hashing with open addressing in a table of size m. Explain what happens as you insert m items into this table. (This assumes a normal hash function, rather than the function in part (i) above.) You would fill up the table completely. Toward the end you would be chasing through long runs of occupied entries looking for an empty spot; under uniform hashing, the expected number of probes for an insertion grows like 1/(1 - α), where α is the load factor. For this reason, if you use open addressing you should not let the table get too full.

    3. The Birthday Paradox asks how many people with random birthdays it takes to have better than even odds that at least two have the same birthday (same month and same day). (This ignores February 29.)

      It is easier to calculate the probability that among n people, no two have the same birthday. Denote this by N(n). For two people, the second one will have a birthday different from the first with probability:
              N(2) = 364/365.
      If there is a third person, the probability that he has a birthday different from the first two (now assumed distinct) is 363/365, so that
              N(3) = (364/365)*(363/365).
      Similarly,
              N(4) = (364/365)*(363/365)*(362/365),
      and in general
              N(n) = N(n-1)*(365 - n + 1)/365.
      Of course, N(1) = 1.

      The probability, call it S(n), that at least two people from among n have the same birthday is just:
              S(n) = 1 - N(n).

      Write a simple program that will calculate and print S(n) for n = 1, 2, 3, ..., 50. These should be calculated as doubles.

      #include <stdio.h>
      int main() {
         double N[51];   /* N[i] = probability that i birthdays all differ */
         N[1] = 1;       /* S(1) = 0, so printing starts at n = 2 */
         int i;
         for (i = 2; i <= 50; i++) {
            /* person i avoids the i-1 earlier, distinct birthdays */
            N[i] = N[i-1]*(365. - i + 1)/365.;
            printf("S(%i) = %10.8f\n",
               i, 1 - N[i]);
         }
         return 0;
      }
      S(2)  = 0.00273973
      S(3)  = 0.00820417
      S(4)  = 0.01635591
      S(5)  = 0.02713557
      S(6)  = 0.04046248
      S(7)  = 0.05623570
      S(8)  = 0.07433529
      S(9)  = 0.09462383
      S(10) = 0.11694818
      
      S(11) = 0.14114138
      S(12) = 0.16702479
      S(13) = 0.19441028
      S(14) = 0.22310251
      S(15) = 0.25290132
      S(16) = 0.28360401
      S(17) = 0.31500767
      S(18) = 0.34691142
      S(19) = 0.37911853
      S(20) = 0.41143838
      S(21) = 0.44368834
      S(22) = 0.47569531
      S(23) = 0.50729723
      S(24) = 0.53834426
      S(25) = 0.56869970
      S(26) = 0.59824082
      S(27) = 0.62685928
      S(28) = 0.65446147
      S(29) = 0.68096854
      S(30) = 0.70631624
      
      S(31) = 0.73045463
      S(32) = 0.75334753
      S(33) = 0.77497185
      S(34) = 0.79531686
      S(35) = 0.81438324
      S(36) = 0.83218211
      S(37) = 0.84873401
      S(38) = 0.86406782
      S(39) = 0.87821966
      S(40) = 0.89123181
      S(41) = 0.90315161
      S(42) = 0.91403047
      S(43) = 0.92392286
      S(44) = 0.93288537
      S(45) = 0.94097590
      S(46) = 0.94825284
      S(47) = 0.95477440
      S(48) = 0.96059797
      S(49) = 0.96577961
      S(50) = 0.97037358
      

    4. What does Part iii of this exercise say about using open addressing to insert into a hash table of size 365? If you have a hash table of size 365, then after inserting 23 items, you'll have more than a 50% chance of a collision. After 50 items, there is a 97% chance of a collision.

      This doesn't mean that collisions are terrible events that must be avoided. Rather, it means that collisions will occur even when the hash table is far larger than the number of entries. So our algorithms have to deal with collisions, as they do.


  3. Reductions. Suppose you are writing software for a high-precision floating-point engine. (Instead of 64-bit floating-point numbers, perhaps 1024-bit fp numbers -- most of these bits used for the significant part, and the remainder for the sign and exponent.) Suppose you have written reasonably efficient addition, subtraction, and negation routines for these numbers, and you can also efficiently multiply and divide by powers of 2. Now you want to add other arithmetic functions, especially multiplication and division.

    We will look at 4 problems:

       Pm: multiplication (x*y)
       Pd: division (x/y)
       Pr: reciprocal (1/x)
       Ps: squaring (x^2)

    We will be writing things like P1 ≤ P2 to mean a reduction of P1 to P2, similar to the book's ≤p notation. However, here it's not a polynomial-time reduction, but rather an efficient one, comparable in cost to the other operations on these numbers. Intuitively, this means: if you can compute P2 efficiently, then you can compute P1 efficiently.

    1. If 1000 binary bits are used for the significant part, roughly how many decimal digits of accuracy would we have? On average there are log2(10) = 3.321928 bits per decimal digit, so 1000 bits gives 1000/log2(10) = 301.02999 digits.

    2. Explain why it is trivial to see that Ps ≤ Pm. (Squaring is just the special case x*x of multiplication.)

    3. Use the formula below (with either part on the right) to show that Pm ≤ Ps.
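      A standard identity offering "either part on the right" (presumably
      the one intended) is

              x*y = ((x+y)^2 - (x-y)^2)/4 = ((x+y)^2 - x^2 - y^2)/2,

      and either form computes a product using only squarings, additions,
      subtractions, and a division by a power of 2. A minimal sketch of
      the reduction, with made-up bignum routine names:

      /* Pm <= Ps: multiplication from squaring. The bignum type and the
         square, add, sub, div_pow2 routines are hypothetical primitives
         assumed efficient for these high-precision numbers. */
      bignum mult(bignum x, bignum y) {
         bignum s = square(add(x, y));    /* (x+y)^2 */
         bignum d = square(sub(x, y));    /* (x-y)^2 */
         return div_pow2(sub(s, d), 2);   /* divide by 2^2 = 4 */
      }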

    4. Use the formula below to show that Ps ≤ Pr.
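      Presumably the intended formula is something like

              x^2 = 1/(1/x - 1/(x+1)) - x,

      which works because 1/x - 1/(x+1) = 1/(x^2 + x): three reciprocals,
      together with additions and subtractions, yield the square. (Special
      cases such as x = 0 or x = -1 would need separate handling.)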

    5. Recall that we used the following iteration formula from Newton's method to find the reciprocal. Argue that this shows that Pr ≤ Pm. [Note: Until Tuesday morning this was written Pd ≤ Pm in error.]

      You should explain in some detail how you intend to handle the initial guess for Newton's method, which should not be too far from the correct answer.
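      The iteration in question is presumably the standard Newton step for
      f(x) = 1/x - a:

              x_{k+1} = x_k*(2 - a*x_k),

      which uses only multiplications and a subtraction, so efficient
      multiplication gives efficient reciprocals. For the initial guess:
      if a = m*2^e with 1/2 <= m < 1, take x_0 = 2^(-e), which only
      requires negating the exponent; the relative error then starts
      below 1/2 and squares on every iteration. A sketch using ordinary
      doubles, where frexp and ldexp play the role of exponent handling:

      #include <stdio.h>
      #include <math.h>

      /* Newton's iteration for 1/a, using only multiply, subtract,
         and powers of 2 (illustrated with doubles; a > 0 assumed). */
      double reciprocal(double a) {
         int e;
         frexp(a, &e);                /* a = m * 2^e, 1/2 <= m < 1  */
         double x = ldexp(1.0, -e);   /* initial guess x_0 = 2^(-e) */
         for (int i = 0; i < 6; i++)  /* error squares each step    */
            x = x*(2.0 - a*x);        /* x_{k+1} = x_k*(2 - a*x_k)  */
         return x;
      }

      int main() {
         printf("%.15f\n", reciprocal(7.0));   /* about 0.142857142857143 */
         return 0;
      }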

    6. Argue that if you can do both Pm and Pr efficiently, then you can also do Pd efficiently.
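      The natural route is x/y = x*(1/y): one reciprocal followed by one
      multiplication. In the sketch style used above (mult and reciprocal
      being the hypothetical routines from parts iii and v):

      bignum divide(bignum x, bignum y) {
         return mult(x, reciprocal(y));   /* one Pr call, one Pm call */
      }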

    7. Explain why if we can do Ps efficiently, then we can also do the other three efficiently: Pm, Pr, and Pd. (This would be a quick and dirty way to do a half-way efficient implementation.) [Note: Part iv above is just an extra fact that shows that Pm, Pr, and Ps are equivalent. Part iv is not needed to show this part vii.]
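      (Concretely: Ps gives Pm by part iii, Pm gives Pr by part v, and Pm
      together with Pr gives Pd by part vi.)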


Revision date: 2012-04-20. (Please use ISO 8601, the International Standard.)