CS 3343, NP-Complete Problems

CS 3343
Analysis of Algorithms

The Class NP of Problems

Algorithms versus Problems. It is much easier to talk about an algorithm than about a problem. Suppose we can prove that an algorithm is O(n²) or even Θ(n²). We then say the problem this algorithm solves can be solved in time O(n²). At least this gives an upper bound. We know the problem is no harder than O(n²). But there might be another algorithm that solves the problem more efficiently. It is hard to show that a problem cannot be solved more efficiently than by a given algorithm.

Polynomial Time Algorithms. Assume that the input to a problem is of size given by a variable n. A Polynomial Time Algorithm has worst-case time performance of O(n^k) for a fixed constant k. See hierarchy for a list of problems and algorithms, going from the easiest, through the polynomial-time algorithms, past the NP-complete ones, past the provably exponential-time ones, and finally to the undecidable algorithms, which cannot be solved at all.

In saying that a problem is of size n, one assumes a succinct representation of the problem, meaning that the size of one representation is a polynomial time function of the size of any other. A succinct representation must not use an exponential amount of space compared with another representation. For example, a very sparse graph might have a representation of size O(n), whereas the incidence matrix representation would be of size O(n²), with most entries 0. However, this makes no difference, because the two representations are within a polynomial multiple of one another.

On the other hand, if we take a fixed integer i in binary, its representation will be of size n = O(log(i)), since i is represented with log(i) bits. Here i = 2ⁿ. On the other hand, a representation using i "tick marks" will have size O(i). The ratio of sizes i / log(i) is the same as the ratio 2ⁿ / n. Thus the "tick marks" representation is exponentially bigger that the normal one. For large enough n, 2ⁿ will be larger than n^k for any fixed k. Thus "tick marks" does not qualify as a succinct representation.

Consider an algorithm that prints all 2ⁿ integers starting with 1, and going up to 2ⁿ. Printing 2ⁿ items will always take time O(2ⁿ).

The class P: Polynomial Time Problems. The class P consists of all problems that can be solved in polynomial time (by some polynomial time algorithm). Most of the problems we have looked are in P. Consider however, the Subset Sum problem. For an input list of n numbers, each (say) at most n bits long, the dynamic programming algorithm will require 2ⁿ amount of space, and so of course O(2ⁿ) time. This is an exponential algorithm. (Obviously so because even the tables used by the algorithm take up an exponential amount of space.) There is no known polynomial-time algorithm for Subset Sum, and it is not known if Subset Sum is a polynomial time problem.

Decision Problems. In the theory of NP-completeness, we will be dealing with decision problems, meaning problems with a Yes or No answer. In practice we are interested in optimization problems, where we want a specific optimal solution to the problem. For example, even with Subset Sum we might ask if a subset of the numbers will add up to a given specific sum, but we would be interested in the specific numbers that make up the sum and not just whether or not the numbers exist.

As another example, the page Zero-Knowledge Proofs looked at the Hamiltonian Circuit problem. There we saw that if you had an efficient (= polynomial time) algorithm to always answer the question "Is there a Hamiltonian Circuit" with a "Yes" or "No", then you could get an efficient algorithm to construct the actual circuit. We can often covert an algorithm to solve a decision problem into an algorithm to solve the corresponding optimization problem.

The class NP: Nondeterministic Polynomial Time Problems. This is a difficult class of problems to understand. NP is the class of problems solvable on a nondeterministic computer. From the beginning you should realize that no such actual hardware computer can exist in the real world.

A nondeterministic computer can carry out a nondeterministic algorithm. Such an algorithm proceeds in two stages:

Guessing stage: the algorithm "guesses" a structure S, which either is the answer to the problem or will help compute the answer.
Checking stage: the computer proceeds in a normal deterministic fashion to use S in arriving at a Yes-No answer to the problem. The checking stage must be done in polynomial time. In effect S is the answer and the checking stage verifies in polynomial time that S is the answer.

Summary of the difference between P and NP:

An algorithm in P produces the answer to a problem in polynomial time.
A nondeterministic algorithm in NP verifies in polynomial time that a guess is the answer to a problem.

The above shows that P is a subset of NP: every problem in P is also in NP. Experts conjecture that P is not equal to NP, but no one has been able to prove this (the proof is worth a million dollars right now, plus fame).

For example, if you are given a Hamiltonian circuit as the result of a guess, it is a trivial polynomial time computation to check that this is actually such a circuit.

Arbitrary parallelism: another view of NP. Another model of a nondeterministic computer is one which at any stage of computation allows an arbitrary degree of parallelism. This means that the computation can fork off any number of separate parallel computations. If any one of these separate computations completes successfully, the whole computation completes. This model is similar to a hardware model with separate cores and threads, but here there is no overhead at all to the separate computations, and no limit to the number of them. For example, in a recursive backtracking search (as with depth-first search, for example), each time a choice appears, the computation takes both paths "at once". Using this model, we could imagine starting a computation in parallel with each possible choice in the previous model. Thus these two models are equivalent. It should be emphasized that neither of these models can be realized (implemented) in the real world.

Summary. If you can verify in polynomial time that a given solution to a problem is correct, then the problem is in NP. In most cases this verification is easy. Any problem in P is also in NP.

Examples.

Revision date: 2012-04-15. (Please use ISO 8601, the International Standard Date and Time Notation.)