CS 3343/3341, Subset Sum Problem

CS3343/3341
Analysis of Algorithms
Spring 2012

Dynamic Programming
Subset Sum Problem

Dynamic Programming: In dynamic programming [Text, page 357 on], we try to solve a problem by combining solutions to subproblems. The proper combination of optimal solutions to subproblems will be an optimal solution to the main problem. Here is a problem that lends itself to an especially simple form of dynamic programming.

Subset Sum Problem: Start with an array A of positive integers. Take the sum of a subset of these integers, say, numbers in an array B. The Subset Sum Problem asks: Given A and given a possible sum of a subset of the numbers in A, determine a subset that adds up to the given sum (there may be several solutions), or determine that no such subset exists (no solutions).

Let's look at an example: Take A of size 100 with a sum of 50 of those numbers. The numbers in A are give below, with the ones chosen to add up displayed in red. The sum of the red numbers is 2618000. (There's no significance to the fact that this number ends in "000".)

10207 11296 11587 12426 12823 13174 15859 15899 16006 16489 16493 17285 18091 18673 19133 19560 20346 20768 20949 22800 22857 23206 23547 24354 24927 25778 27320 27516 29356 30849 32221 32233 32447 32609 32644 33346 34591 35506 36451 37398 38235 40590 41208 42990 44137 44383 44436 44484 46011 46177 46205 46660 47870 48665 50851 51656 52005 53697 54719 56623 56894 57124 59791 60085 60313 62182 64244 65648 66814 66898 67950 68502 69640 70523 72041 73413 73454 76055 76652 77123 77600 77701 78465 81628 81866 82388 84351 84622 87382 88730 90015 90365 90435 91420 91977 94664 96331 97454 99313 99971

10207 11296 11587 12426 12823 13174 15859 15899 16006 16489 16493 17285 18091 18673 19133 19560 20346 20768 20949 22800 22857 23206 23547 24354 24927 25778 27320 27516 29356 30849 32221 32233 32447 32609 32644 33346 34591 35506 36451 37398 38235 40590 41208 42990 44137 44383 44436 44484 46011 46177 46205 46660 47870 48665 50851 51656 52005 53697 54719 56623 56894 57124 59791 60085 60313 62182 64244 65648 66814 66898 67950 68502 69640 70523 72041 73413 73454 76055 76652 77123 77600 77701 78465 81628 81866 82388 84351 84622 87382 88730 90015 90365 90435 91420 91977 94664 96331 97454 99313 99971

Given all the numbers above, and given the sum, do you think you could find the subset, the red numbers? Or could you find some other subset that added up to the sum? In any such sum, you can only use any given number at most once. The total number of all possible sums is 2¹⁰⁰ = 1.26765e30. Suppose we knew that the sum was made up of exactly 50 of the numbers from A. Maybe that would help a lot. There are C(100,50) ( = number of combinations of 100 items taken 50 at a time) such sums. C(100,50) = 100!/(50! * 50!) = 1.0089e29. How big is this number? Suppose you could try out 10¹⁸ (a billion billion) of these sums each second. Then it would still take you 10e11 seconds, or over 300 years to finish the problem.

Dynamic Programming Approach: The techniques of dynamic programming will let us tackle this problem. Consider first an extremely simple example, where A = {1, 9, 5, 3, 8}, and the sum of a subset is 13. Below I show contents of a work array W, where W[k] != -1 means that k is the sum of elements of A. Notice that indexes of elements in the W array correspond to sums of elements of the A array.

Possible #s <-------------------- W Array Index ----------------------------------------> in sum 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Initial: 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1}: 0 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9}: 0 1 -1 -1 -1 -1 -1 -1 -1 9 9 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5}: 0 1 -1 -1 -1 5 5 -1 -1 9 9 -1 -1 -1 5 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5,3}: 0 1 -1 3 3 5 5 -1 3 9 9 -1 3 3 5 5 -1 3 3 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5,3,8}: 0 1 -1 3 3 5 5 -1 3 9 9 8 3 3 5 5 8 3 3 -1 8 8 8 8 -1 8 8

Possible #s <-------------------- W Array Index ----------------------------------------> in sum 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Initial: 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1}: 0 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9}: 0 1 -1 -1 -1 -1 -1 -1 -1 9 9 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5}: 0 1 -1 -1 -1 5 5 -1 -1 9 9 -1 -1 -1 5 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5,3}: 0 1 -1 3 3 5 5 -1 3 9 9 -1 3 3 5 5 -1 3 3 -1 -1 -1 -1 -1 -1 -1 -1 {1,9,5,3,8}: 0 1 -1 3 3 5 5 -1 3 9 9 8 3 3 5 5 8 3 3 -1 8 8 8 8 -1 8 8

This displays the sums that are possible using the elements of A in turn:

In the 0th row, initial values for W, empty seet yields sum of 0.
In the 1st row, possible elements {1} yield sums of 0,1.
In the 2nd row, possible elements {1,9} yield sums of 0,1,9,10.
In the 3rd row, possible elements {1,9,5} yield sums of 0,1,5,6,9,10,14,15.
In the 4th row, possible elements {1,9,5,3} yield sums of 0,1,3,4,5,6,8,9,10,12,13,14,15,17,18.
In the 5th row, possible elements {1,9,5,3,8} yield sums of all but 2,7,19,24.

Calculating the Work Array W: We repeatedly process the entire array W in j successive steps, for each element A[ j ] of the array of numbers to use for sums, as show above.

At the jth step, we are considering an element A[ j ].

a = A[ j ]

For each j, use a variable k to march through the elements of W in turn. There are several constraints on what we find in the array W.

We need W[k] != -1, since otherwise, a sum of k is still not possible. Assuming W[k] != -1, we know that the program has already found a sum of elements from A that add up to k.
We are going to try adding a into the sum to get a new sum. Thus we are interested in the value k+a, and in the work array entry W[k+a], In case the new sum is acceptable, we are going to set W[k+a] = a to show that a is part of this new sum. Now we get more constraints:
We need W[k] != a, because otherwise, a is already a part of the sum, and we're only using each element of A at most once.
Now focus on the entry W[k+a].
We want W[k+a] == -1 so that we stop processing once we find a sum. (If we go ahead and process when W[k+a] != -1, we will just be overwriting a previous sum that was discovered.)

Summarizing the conditions above, and remembering that we set a = A[ j ], we have:

if (W[k] != -1) and (W[k] != a) and (W[k+a] == -1) then
set W[k+a] = a

And summarizing why we need each of the three parts of the if-condition:

We need (W[k] != -1), because if (W[k] == -1) then we don't have any sum to work with.
We need (W[k] != a), because if we allow (W[k] == a) then we will be using a in the sum a second time.
Leave off this condition to allow using elements of A more than once.
if (W[k+a] != -l) then we already have a sum yielding k+a.
Leave off this condition if you want to consider all sums in turn, rather than just the first one. (In this case, since each new sum overwrites the previous one, we end up with the last sum.)

Code for Calculating the Work Array W:

Building the Work Array W
// calcWork: build work array W int[] calcWork(int[] A, int wsize) { int[] W = new int[wsize]; // work array W[0] = 0; // empty set yields sum of 0 for (int k = 1; k < W.length; k++) W[k] = -1; // sums not attained for (int j = 0; j < A.length; j++) { // each elt of A in turn int a = A[j]; // simplify notation for (int k = 0; k < W.length; k++) { // through all of W if ( (W[k] != -1) // nothing sums to k && (W[k] != a) // a not already in sum (no repetitions) && (k + a < wsize) // don't go past end of W && (W[k + a] == -1) // look for one sum only ) { W[k + a] = a; } } } return W; }

Building the Work Array W

// calcWork: build work array W
int[] calcWork(int[] A, int wsize) {
   int[] W = new int[wsize]; // work array
   W[0] = 0; // empty set yields sum of 0
   for (int k = 1; k < W.length; k++) W[k] = -1; // sums not attained

   for (int j = 0; j < A.length; j++) { // each elt of A in turn 
      int a = A[j]; // simplify notation
      for (int k = 0; k < W.length; k++) { // through all of W

         if (   (W[k] != -1) // nothing sums to k
             && (W[k] !=  a) // a not already in sum (no repetitions)
             && (k + a < wsize) // don't go past end of W
             && (W[k + a] == -1) // look for one sum only
               )
            { W[k + a] = a; }
      }
   }
   return W;
}

Determining the Elements in the Sum: Now comes something really tricky. From the entries in the work array W, for any index k we can completely determine the elements of A that will sum to k. In the simple example, we wanted to find numbers from A that would add up to 13. Now W[13] = 3, so 3 was the last number added to the sum. How did 3 get there. Go back 3 entries to find an entry W[10] = 9. This means we has elements from A that added up to 10. Adding in 3 gave a sum of 13. From W[10], go back 9 entries to W[1], which is 1. So a sum of 1 added up to 1. Going back one more entry gets to W[0] = 0, the only zero entry. Thus we've reconstructed the sum: 3 + 9 + 1 = 13.

If A had been arranged as: A = {1, 9, 5, 8, 3}, then in processing 8, when we got to W[5], we would have added an entry W[13] = 8, and we would have seen that 5 + 8 = 13. The W array only lets you recover a single sum that achieved any given value.

In this way we have a clever linked list of elements of A threaded into the entries of W. I've seen exactly such a tricky linked list in other algorithms.

The First Example From the Start: Consider the example at the start of this page. When I ran the algorithm on this example, I got the following numbers from A that added up to the sum 2618000. (I'm showing these as red entries as before.)

10207 11296 11587 12426 12823 13174 15859 15899 16006 16489 16493 17285 18091 18673 19133 19560 20346 20768 20949 22800 22857 23206 23547 24354 24927 25778 27320 27516 29356 30849 32221 32233 32447 32609 32644 33346 34591 35506 36451 37398 38235 40590 41208 42990 44137 44383 44436 44484 46011 46177 46205 46660 47870 48665 50851 51656 52005 53697 54719 56623 56894 57124 59791 60085 60313 62182 64244 65648 66814 66898 67950 68502 69640 70523 72041 73413 73454 76055 76652 77123 77600 77701 78465 81628 81866 82388 84351 84622 87382 88730 90015 90365 90435 91420 91977 94664 96331 97454 99313 99971

This is a sum of 69 entries, rather than of 50 as in the original sum. This is because the system processed the entries of A in increasing order, so it considered smaller numbers first. In processing the numbers from A in this order, these red ones are the first ones encountered that added up to the given sum. This indicates that there must be a very large number of sums that add up to the given value -- not surprising since there is an astronomical number of such sums.

An Example That Recovers the Original Red Numbers: Suppose the A array has these 20 numbers:

131743 172851 191333 207683 228574 273208 326094 333462 364517 441372 443833 460112 466602 486659 508518 547196 571243 846222 919774 974546

131743 172851 191333 207683 228574 273208 326094 333462 364517 441372 443833 460112 466602 486659 508518 547196 571243 846222 919774 974546

The 10 red entries have sum 4016874. In this case my program recovered the original sum. Notice that here there are only 2²⁰ = 1048576 sums altogether, and only 184756 of length 10.

Revision date: 2012-02-26. (Please use ISO 8601, the International Standard Date and Time Notation.)