CATEGORII DOCUMENTE 

Bulgara  Ceha slovaca  Croata  Engleza  Estona  Finlandeza  Franceza 
Germana  Italiana  Letona  Lituaniana  Maghiara  Olandeza  Poloneza 
Sarba  Slovena  Spaniola  Suedeza  Turca  Ucraineana 
All of the algorithms we have studied thus far have been polynomialtime algorithms: on inputs of size n, their worstcase running time is O(n^{k}) for some constant k. It is natural to wonder whether all problems can be solved in polynomial time. The answer is no. For example, there are problems, such as Turing's famous 'Halting Problem,' that cannot be solved by any computer, no matter how much time is provided. There are also problems that can be solved, but not in time O(n^{k}) for any constant k. Generally, we think of problems that are solvable by polynomialtime algorithms as being tractable, and problems that require superpolynomial time as being intractable.
The subject of this chapter, however, is an interesting class of problems, called the 'NPcomplete' problems, whose status is unknown. No polynomialtime algorithm has yet been discovered for an NPcomplete problem, nor has anyone yet been able to prove a superpolynomialtime lower bound for any of them. This socalled P NP question has been one of the deepest, most perplexing open research problems in theoretical computer science since it was posed in 1971.
Most theoretical computer scientists believe that the NPcomplete problems are intractable. The reason is that if any single NPcomplete problem can be solved in polynomial time, then every NPcomplete problem has a polynomialtime algorithm. Given the wide range of NPcomplete problems that have been studied to date, without any progress toward a polynomialtime solution, it would be truly astounding if all of them could be solved in polynomial time.
To become a good algorithm designer, you must understand the rudiments of the theory of NPcompleteness. If you can establish a problem as NPcomplete, you provide good evidence for its intractability. As an engineer, you would then do better spending your time developing an approximation algorithm (see Chapter 37) rather than searching for a fast algorithm that solves the problem exactly. Moreover, many natural and interesting problems that on the surface seem no harder than sorting, graph searching, or network flow are in fact NPcomplete. Thus, it is important to become familiar with this remarkable class of problems.
This chapter studies the aspects of NPcompleteness that bear most directly on the analysis of algorithms. In Section 1, we formalize our notion of 'problem' and define the complexity class P of polynomialtime solvable decision problems. We also see how these notions fit into the framework of formallanguage theory. Section 2 defines the class NP of decision problems whose solutions can be verified in polynomial time. It also formally poses the P NP question.
Section 3 shows how relationships between problems can be studied via polynomialtime 'reductions.' It defines NPcompleteness and sketches a proof that one problem, called 'circuit satisfiability,' is NPcomplete. Having found one NPcomplete problem, we show in Section 4 how other problems can be proven to be NPcomplete much more simply by the methodology of reductions. The methodology is illustrated by showing that two formulasatisfiability problems are NPcomplete. A variety of other problems are shown to be NPcomplete in Section 5.
We begin our study of NPcompleteness by formalizing our notion of polynomialtime solvable problems. These problems are generally regarded as tractable. The reason why is a philosophical, not a mathematical, issue. We can offer three supporting arguments.
First, although it is reasonable to regard a problem that requires time (n^{100}) as intractable, there are very few practical problems that require time on the order of such a highdegree polynomial. The polynomialtime computable problems encountered in practice typically require much less time.
Second, for many reasonable models of computation, a problem that can be solved in polynomial time in one model can be solved in polynomial time in another. For example, the class of problems solvable in polynomial time by the serial randomaccess machine used throughout most of this book is the same as the class of problems solvable in polynomial time on abstract Turing machines.^{1 }It is also the same as the class of problems solvable in polynomial time on a parallel computer, even if the number of processors grows polynomially with the input size.
^{1}See Hopcroft and Ullman [104] or Lewis and Papadimitriou [139] for a thorough treatment of the Turingmachine model. of calls to polynomialtime subroutines, the running time of the composite algorithm is polynomial.
Third, the class of polynomialtime solvable problems has nice closure properties, since polynomials are closed under addition, multiplication, and composition. For example, if the output of one polynomialtime algorithm is fed into the input of another, the composite algorithm is polynomial. If an otherwise polynomialtime algorithm makes a constant number of calls to polynomialtime subroutines, the running time of the composite algorithm is polynomial.
To understand the class of polynomialtime solvable problems, we must first have a formal notion of what a 'problem' is. We define an abstract problem Q to be a binary relation on a set I of problem instances and a set S of problem solutions. For example, consider the problem SHORTESTPATH of finding a shortest path between two given vertices in an unweighted, undirected graph G = (V, E). An instance for SHORTESTPATH is a triple consisting of a graph and two vertices. A solution is a sequence of vertices in the graph, with perhaps the empty sequence denoting that no path exists. The problem SHORTESTPATH itself is the relation that associates each instance of a graph and two vertices with a shortest path in the graph that connects the two vertices. Since shortest paths are not necessarily unique, a given problem instance may have more than one solution.
This formulation of an abstract problem is more general than is required for our purposes. For simplicity, the theory of NPcompleteness restricts attention to decision problems: those having a yes/no solution. In this case, we can view an abstract decision problem as a function that maps the instance set I to the solution set . For example, a decision problem PATH related to the shortestpath problem is, 'Given a graph G = (V, E), two vertices u, v V, and a nonnegative integer k, does a path exist in G between u and v whose length is at most k?' If i = <G, u, v, k> is an instance of this shortestpath problem, then PATH(i) = 1 (yes) if a shortest path from u to v has length at most k, and PATH(i) = 0 (no) otherwise.
Many abstract problems are not decision problems, but rather optimization problems, in which some value must be minimized or maximized. In order to apply the theory of NPcompleteness to optimization problems, we must recast them as decision problems. Typically, an optimization problem can be recast by imposing a bound on the value to be optimized. As an example, in recasting the shortestpath problem as the decision problem PATH, we added a bound k to the problem instance.
Although the theory of NPcompleteness compels us to recast optimization problems as decision problems, this requirement does not diminish the impact of the theory. In general, if we can solve an optimization problem quickly, we can also solve its related decision problem quickly. We simply compare the value obtained from the solution of the optimization problem with the bound provided as input to the decision problem. If an optimization problem is easy, therefore, its related decision problem is easy as well. Stated in a way that has more relevance to NPcompleteness, if we can provide evidence that a decision problem is hard, we also provide evidence that its related optimization problem is hard. Thus, even though it restricts attention to decision problems, the theory of NPcompleteness applies much more widely.
If a computer program is to solve an abstract problem, problem instances must must be represented in a way that the program understands. An encoding of a set S of abstract objects is a mapping e from S to the set of binary strings.^{2 }For example, we are all familiar with encoding the natural numbers N = as the strings . Using this encoding, e(l7) = 10001. Anyone who has looked at computer representations of keyboard characters is familiar with either the ASCII or EBCDIC codes. In the ASCII code, e(A) = 1000001. Even a compound object can be encoded as a binary string by combining the representations of its constituent parts. Polygons, graphs, functions, ordered pairs, programsall can be encoded as binary strings.
^{2 }The codomain of e need not be binary strings; any set of strings over a finite alphabet having at least 2 symbols will do.
Thus, a computer algorithm that 'solves' some abstract decision problem actually takes an encoding of a problem instance as input. We call a problem whose instance set is the set of binary strings a concrete problem. We say that an algorithm solves a concrete problem in time O(T(n)) if, when it is provided a problem instance i of length n = i, the algorithm can produce the solution in at most O(T(n)) time. A concrete problem is polynomialtime solvable, therefore, if there exists an algorithm to solve it in time O(n^{k}) for some constant k.
We can now formally define the complexity class P as the set of concrete decision problems that are solvable in polynomial time.
We can use encodings to map abstract problems to concrete problems. Given an abstract decision problem Q mapping an instance set I to , an encoding e : I ^{*} can be used to induce a related concrete decision problem, which we denote by e(Q). If the solution to an abstractproblem instance i I is Q(i) , then the solution to the concreteproblem instance e(i) ^{* }is also Q(i). As a technicality, there may be some binary strings that represent no meaningful abstractproblem instance. For convenience, we shall assume that any such string is mapped arbitrarily to 0. Thus, the concrete problem produces the same solutions as the abstract problem on binarystring instances that represent the encodings of abstractproblem instances.
We would like to extend the definition of polynomialtime solvability from concrete problems to abstract problems using encodings as the bridge, but we would like the definition to be independent of any particular encoding. That is, the efficiency of solving a problem should not depend on how the problem is encoded. Unfortunately, it depends quite heavily. For example, suppose that an integer k is to be provided as the sole input to an algorithm, and suppose that the running time of the algorithm is (k). If the integer k is provided in unarya string of k 1'sthen the running time of the algorithm is O(n) on lengthn inputs, which is polynomial time. If we use the more natural binary representation of the integer k, however, then the input length is n = lg k. In this case, the running time of the algorithm is (k) = (2^{n}), which is exponential in the size of the input. Thus, depending on the encoding, the algorithm runs in either polynomial or superpolynomial time.
The encoding of an abstract problem is therefore quite important to our understanding of polynomial time. We cannot really talk about solving an abstract problem without first specifying an encoding. Nevertheless, in practice, if we rule out 'expensive' encodings such as unary ones, the actual encoding of a problem makes little difference to whether the problem can be solved in polynomial time. For example, representing integers in base 3 instead of binary has no effect on whether a problem is solvable in polynomial time, since an integer represented in base 3 can be converted to an integer represented in base 2 in polynomial time.
We say that a function f : ^{* }^{*} is polynomialtime computable if there exists a polynomialtime algorithm A that, given any input x ^{*}, produces as output f(x). For some set I of problem instances, we say that two encodings e_{ }and e_{ }are polynomially related if there exist two polynomialtime computable functions f_{12} and f_{21} such that for any i I, we have f_{12}(e_{1}(i)) = e_{2}(i) and f_{21}(e_{2}(i)) = e_{1}(i). That is, the encoding e_{2}(i) can be computed from the encoding e_{1}(i) by a polynomialtime algorithm, and vice versa. If two encodings e_{1} and e_{2} of an abstract problem are polynomially related, which we use makes no difference to whether the problem is polynomialtime solvable or not, as the following lemma shows.
Lemma 1
Let Q be an abstract decision problem on an instance set I, and let e_{1} and e_{2} be polynomially related encodings on I. Then, e_{1}(Q) P if and only if e_{2}(Q) P.
Proof We need only prove the forward direction, since the backward direction is symmetric. Suppose, therefore, that e_{1}(Q) can be solved in time O(n^{k}) for some constant k. Further, suppose that for any problem instance i, the encoding e_{1}(i) can be computed from the encoding e_{2}(i) in time O(n^{c}) for some constant c, where n = e_{1}(i). To solve problem e_{2}(Q), on input e_{2}(i), we first compute e_{1}(i) and then run the algorithm for e_{1}(Q) on e_{1}(i). How long does this take? The conversion of encodings takes time O(n^{c}), and therefore e_{1}(i) = O(n^{c}), since the output of a serial computer cannot be longer than its running time. Solving the problem on e_{1} (i) takes time O(e_{1}(i)^{k}) = O(n^{ck}), which is polynomial since both c and k are constants.
Thus, whether an abstract problem has its instances encoded in binary or base 3 does not affect its 'complexity,' that is, whether it is polynomialtime solvable or not, but if instances are encoded in unary, its complexity may change. In order to be able to converse in an encodingindependent fashion, we shall generally assume that problem instances are encoded in any reasonable, concise fashion, unless we specifically say otherwise. To be precise, we shall assume that the encoding of an integer is polynomially related to its binary representation, and that the encoding of a finite set is polynomially related to its encoding as a list of its elements, enclosed in braces and separated by commas. (ASCII is one such encoding scheme.) With such a 'standard' encoding in hand, we can derive reasonable encodings of other mathematical objects, such as tuples, graphs, and formulas. To denote the standard encoding of an object, we shall enclose the object in angle braces. Thus, G denotes the standard encoding of a graph G.
As long as we implicitly use an encoding that is polynomially related to this standard encoding, we can talk directly about abstract problems without reference to any particular encoding, knowing that the choice of encoding has no effect on whether the abstract problem is polynomialtime solvable. Henceforth, we shall generally assume that all problem instances are binary strings encoded using the standard encoding, unless we explicitly specify the contrary. We shall also typically neglect the distinction between abstract and concrete problems. The reader should watch out for problems that arise in practice, however, in which a standard encoding is not obvious and the encoding does make a difference.
One of the convenient aspects of focusing on decision problems is that they make it easy to use the machinery of formallanguage theory. It is worthwhile at this point to review some definitions from that theory. An alphabet is a finite set of symbols. A language L over is any set of strings made up of symbols from . For example, if = , the set L = is the language of binary representations of prime numbers. We denote the empty string by , and the empty language by . The language of all strings over is denoted *. For example, if = , then * = is the set of all binary strings. Every language L over is a subset of *.
There are a variety of operations on languages. Settheoretic operations, such as union and intersection, follow directly from the settheoretic definitions. We define the complement of L by . The concatenation of two languages L_{1} and L_{2} is the language
L = .
The closure or Kleene star of a language L is the language
L^{*} = L L^{2} L^{3} ^{ . . } ,
where L^{k} is the language obtained by concatenating L to itself k times.
From the point of view of language theory, the set of instances for any decision problem Q is simply the set *, where = . Since Q is entirely characterized by those problem instances that produce a 1 (yes) answer, we can view Q as a language L over = , where
L = .
For example, the decision problem PATH has the corresponding language
PATH = .
(Where convenient, we shall sometimes use the same namePATH in this caseto refer to both a decision problem and its corresponding language.)
Even if language L is accepted by an algorithm A, the algorithm will not necessarily reject a string x L provided as input to it. For example, the algorithm may loop forever. A language L is decided by an algorithm A if every binary string is either accepted or rejected by the algorithm. A language L is accepted in polynomial time by an algorithm A if for any lengthn string x L, the algorithm accepts x in time O(n^{k}) for some constant k. A language L is decided in polynomial time by an algorithm A if for any lengthn string x *, the algorithm decides x in time O(n^{k}) for some constant k. Thus, to accept a language, an algorithm need only worry about strings in L, but to decide a language, it must accept or reject every string in *.
As an example, the language PATH can be accepted in polynomial time. One polynomialtime accepting algorithm computes the shortest path from u to v in G, using breadthfirst search, and then compares the distance obtained with k. If the distance is at most k, the algorithm outputs 1 and halts. Otherwise, the algorithm runs forever. This algorithm does not decide PATH, however, since it does not explicitly output 0 for instances in which the shortest path has length greater than k. A decision algorithm for PATH must explicitly reject binary strings that do not belong to PATH. For a decision problem such as PATH, such a decision algorithm is easy to design. For other problems, such as Turing's Halting Problem, there exists an accepting algorithm, but no decision algorithm exists.
We can informally define a complexity class as a set of languages, membership in which is determined by a complexity measure, such as running time, on an algorithm that determines whether a given string x belongs to language L. The actual definition of a complexity class is somewhat more technicalthe interested reader is referred to the seminal paper by Hartmanis and Stearns [95].
Using this languagetheoretic framework, wee can provide an alternative definition of the complexity class P:
P = * : there exists an algorithm A
that decides L in polynomial time} .
In fact, P is also the class of languages that can be accepted in polynomial time.
Theorem 2
P = .
Proof Since the class of languages decided by polynomialtime algorithms is a subset of the class of languages accepted by polynomialtime algorithms, we need only show that if L is accepted by a polynomialtime algorithm, it is decided by a polynomialtime algorithm. Let L be the language accepted by some polynomialtime algorithm A. We shall use a classic 'simulation' argument to construct another polynomialtime algorithm A' that decides L. Because A accepts L in time O(n^{k}) for some constant k, there also exists a constant c such that A accepts L in at most T = cn^{k} steps. For any input string x, the algorithm A' simulates the action of A for time T. At the end of time T, algorithm A' inspects the behavior of A. If A has accepted x, then A' accepts x by outputting a 1. If A has not accepted x, then A' rejects x by outputting a 0. The overhead of A' simulating A does not increase the running time by more than a polynomial factor, and thus A' is a polynomialtime algorithm that decides L.
Note that the proof of Theorem 2 is nonconstructive. For a given language L P, we may not actually know a bound on the running time for the algorithm A that accepts L. Nevertheless, we know that such a bound exists, and therefore, that an algorithm A' exists that can check the bound, even though we may not be able to find the algorithm A' easily.
11
Define the optimization problem LONGESTPATHLENGTH as the relation that associates each instance of a undirected graph and two vertices with the length of the longest simple path between the two vertices. Define the decision problem LONGESTPATH = . Show that the optimization problem LONGESTPATHLENGTH can be solved in polynomial time if and only if LONGESTPATH P.
12
Give a formal definition for the problem of finding the longest simple cycle in an undirected graph. Give a related decision problem. Give the language corresponding to the decision problem.
13
Give a formal encoding of directed graphs as binary strings using an adjacencymatrix representation. Do the same using an adjacencylist representation. Argue that the two representations are polynomially related.
14
Is the dynamicprogramming algorithm for the 01 knapsack problem that is asked for in Exercise 17.22 a polynomialtime algorithm? Explain your answer.
15
Suppose that a language L can accept any string x L in polynomial time, but that the algorithm that does this runs in superpolynomial time if x L. Argue that L can be decided in polynomial time.
16
Show that an algorithm that makes at most a constant number of calls to polynomialtime subroutines runs in polynomial time, but that a polynomial number of calls to polynomialtime subroutines may result in an exponentialtime algorithm.
17
Show that the class P, viewed as a set of languages, is closed under union, intersection, concatenation, complement, and Kleene star. That is, if L_{l}, L_{2} P, then L_{l} L_{2} P, etc.
We now look at algorithms that 'verify' membership in languages. For example, suppose that for a given instance G, u, v, k of the decision problem PATH, we are also given a path p from u to v. We can easily check whether the length of p is at most k, and if so, we can view p as a 'certificate' that the instance indeed belongs to PATH. For the decision problem PATH, this certificate doesn't seem to buy us much. After all, PATH belongs to Pin fact, PATH can be solved in linear timeand so verifying membership from a given certificate takes as long as solving the problem from scratch. We shall now examine a problem for which we know of no polynomialtime decision algorithm yet, given a certificate, verification is easy.
The problem of finding a hamiltonian cycle in an undirected graph has been studied for over a hundred years. Formally, a hamiltonian cycle of an undirected graph G = (V, E) is a simple cycle that contains each vertex in V. A graph that contains a hamiltonian cycle is said to be hamiltonian; otherwise, it is nonhamiltonian. Bondy and Murty [31] cite a letter by W. R. Hamilton describing a mathematical game on the dodecahedron (Figure 1(a)) in which one player sticks five pins in any five consecutive vertices and the other player must complete the path to form a cycle containing all the vertices. The dodecahedron is hamiltonian, and Figure 1 (a) shows one hamiltonian cycle. Not all graphs are hamiltonian, however. For example, Figure 1 (b) shows a bipartite graph with an odd number of vertices. (Exercise 22 asks you to show that all such graphs are nonhamiltonian.)
We can define the hamiltoniancycle problem, 'Does a graph G have a hamiltonian cycle?' as a formal language:
HAMCYCLE = .
How might an algorithm decide the language HAMCYCLE? Given a problem instance G, one possible decision algorithm lists all permutations of the vertices of G and then checks each permutation to see if it is a hamiltonian path. What is the running time of this algorithm? If we use the 'reasonable' encoding of a graph as its adjacency matrix, the number m of vertices in the graph is , where n = G is the length of the encoding of G. There are m! possible permutations of the vertices, and therefore the running time is , which is not O(n^{k}) for any constant k. Thus, this naive algorithm does not run in polynomial time, and in fact, the hamiltoniancycle problem is NPcomplete, as we shall prove in Section 5.
Consider a slightly easier problem, however. Suppose that a friend tells you that a given graph G is hamiltonian, and then offers to prove it by giving you the vertices in order along the hamiltonian cycle. It would certainly be easy enough to verify the proof: simply verify that the provided cycle is hamiltonian by checking whether it is a permutation of the vertices of V and whether each of the consecutive edges along the cycle actually exists in the graph. This verification algorithm can certainly be implemented to run in O(n^{2}) time, where n is the length of the encoding of G . Thus, a proof that a hamiltonian cycle exists in a graph can be verified in polynomial time.
We define a verification algorithm as being a twoargument algorithm A, where one argument is an ordinary input string x and the other is a binary string y called a certificate. A twoargument algorithm A verifies an input string x if there exists a certificate y such that A(x, y) = 1. The language verified by a verification algorithm A is
L = {x * : there exists y * such that A(x, y) = 1} .
Intuitively, an algorithm A verifies a language L if for any string x L, there is a certificate y that A can use to prove that x L. Moreover, for any string x L there must be no certificate proving that x L. For example, in the hamiltoniancycle problem, the certificate is the list of vertices in the hamiltonian cycle. If a graph is hamiltonian, the hamiltonian cycle itself offers enough information to verify this fact. Conversely, if a graph is not hamiltonian, there is no list of vertices that can fool the verification algorithm into believing that the graph is hamiltonian, since the verification algorithm carefully checks the proposed 'cycle' to be sure.
The complexity class NP is the class of languages that can be verified by a polynomialtime algorithm.^{3 }More precisely, a language L belongs to NP if and only if there exists a twoinput polynomialtime algorithm A and constant c such that
L = * : there exists a certificate y with y = 0(x^{c})
such that A(x,y) = 1} .
We say that algorithm A verifies language L in polynomial time.
^{3}The name 'NP' stands for 'nondeterministic polynomial time.' The class NP was originally studied in the context of nondeterminism, but this book uses the somewhat simpler yet equivalent notion of verification. Hopcroft and Ullman [104] give a good presentation of NPcompleteness in terms of nondeterministic models of computation.
From our earlier discussion on the hamiltoniancycle problem, it follows that HAMCYCLE NP. (It is always nice to know that an important set is nonempty.) Moreover, if L P, then L NP, since if there is a polynomialtime algorithm to decide L, the algorithm can be easily converted to a twoargument verification algorithm that simply ignores any certificate and accepts exactly those input strings it determines to be in L. Thus, P NP.
It is unknown whether P = NP, but most researchers believe that P and NP are not the same class. Intuitively, the class P consists of problems that can be solved quickly. The class NP consists of problems for which a solution can be verified quickly. You may have learned from experience that it is often more difficult to solve a problem from scratch than to verify a clearly presented solution, especially when working under time constraints. Theoretical computer scientists generally believe that this analogy extends to the classes P and NP, and thus that NP includes languages that are not in P.
There is more compelling evidence that P NPthe existence of 'NPcomplete' languages. We shall study this class in Section 3.
Many other fundamental questions beyond the P NP question remain unresolved. Despite much work by many researchers, no one even knows if the class NP is closed under complement. That is, does L NP imply ? We can define the complexity class coNP as the set of languages L such that . The question of whether NP is closed under complement can be rephrased as whether NP = coNP. Since P is closed under complement (Exercise 17), it follows that P NP coNP. Once again, however, it is not known whether P = NP coNP or whether there is some language in NP coNP  P. Figure 2 shows the four possible scenarios.
Thus, our understanding of the precise relationship between P and NP is woefully incomplete. Nevertheless, by exploring the theory of NPcompleteness, we shall find that our disadvantage in proving problems to be intractable is, from a practical point of view, not nearly so great as we might suppose.
21
Consider the language GRAPHISOMORPHISM = . Prove that GRAPHISOMORPHISM NP by describing a polynomialtime algorithm to verify the language.
22
Prove that if G is an undirected bipartite graph with an odd number of vertices, then G is nonhamiltonian.
23
Show that if HAMCYCLE P, then the problem of listing the vertices of a hamiltonian cycle, in order, is polynomialtime solvable.
24
Prove that the class NP of languages is closed under union, intersection, concatenation, and Kleene star. Discuss the closure of NP under complement.
25
Show that any language in NP can be decided by an algorithm running in time 2^{O}^{(nk)} for some constant k.
26
A hamiltonian path in a graph is a simple path that visits every vertex exactly once. Show that the language HAMPATH = belongs to NP.
27
Show that the hamiltonianpath problem can be solved in polynomial time on directed acyclic graphs. Give an efficient algorithm for the problem.
28
Let be a boolean formula constructed from the boolean input variables x_{1}, x_{2}, . . . , x_{k}, negations (), AND's (), OR's (), and parentheses. The formula is a tautology if it evaluates to 1 for every assignment of 1 and 0 to the input variables. Define TAUTOLOGY as the language of boolean formulas that are tautologies. Show that TAUTOLOGY coNP.
29
Prove that P coNP.
210
Prove that if NP coNP, then P NP.
211
Let G be a connected, undirected graph with at least 3 vertices, and let G^{3 }be the graph obtained by connecting all pairs of vertices that are connected by a path in G of length at most 3. Prove that G^{3} is hamiltonian. (Hint: Construct a spanning tree for G, and use an inductive argument.)
Perhaps the most compelling reason why theoretical computer scientists believe that P NP is the existence of the class of 'NPcomplete' problems. This class has the surprising property that if any one NPcomplete problem can be solved in polynomial time, then every problem in NP has a polynomialtime solution, that is, P = NP. Despite years of study, though, no polynomialtime algorithm has ever been discovered for any NPcomplete problem.
The language HAMCYCLE is one NPcomplete problem. If we could decide HAMCYCLE in polynomial time, then we could solve every problem in NP in polynomial time. In fact, if NP  P should turn out to be nonempty, we could say with certainty that HAMCYCLE NP  P.
The NPcomplete languages are, in a sense, the 'hardest' languages in NP. In this section, we shall show how to compare the relative 'hardness' of languages using a precise notion called 'polynomialtime reducibility.' First, we formally define the NPcomplete languages, and then we sketch a proof that one such language, called CIRCUITSAT, is NPcomplete. In Section 5, shall use the notion of reducibility to show that many other problems are NPcomplete.
Intuitively, a problem Q can be reduced to another problem Q' if any instance of Q can be 'easily rephrased' as an instance of Q', the solution to which provides a solution to the instance of Q. For example, the problem of solving linear equations in an indeterminate x reduces to the problem of solving quadratic equations. Given an instance ax + b = 0, we transform it to 0x^{2} + ax + b = 0, whose solution provides a solution to ax + b = 0. Thus, if a problem Q reduces to another problem Q', then Q is, in a sense, 'no harder to solve' than Q'.
Returning to our formallanguage framework for decision problems, we say that a language L_{1} is polynomialtime reducible to a language L_{2}, written L_{1} _{P} L_{2}, if there exists a polynomialtime computable function f : * * such that for all x *,
x L_{1 }if and only if f(x) L_{2} .
We call the function f the reduction function, and a polynomialtime algorithm F that computes f is called a reduction algorithm.
Figure 3 illustrates the idea of a polynomialtime reduction from a language L_{1} to another language L_{2}. Each language is a subset of *. The reduction function f provides a polynomialtime mapping such that if x L_{1}, then f(x) L_{2}. Moreover, if x L_{1}, then f(x) L_{2}. Thus, the reduction function maps any instance x of the decision problem represented by the language L_{1} to an instance f(x) of the problem represented by L_{2}. Providing an answer to whether f(x) L_{2} directly provides the answer to whether x L_{1}.
Polynomialtime reductions give us a powerful tool for proving that various languages belong to P.
Lemma 3
If L_{1}, L_{2} ^{*} are languages such that L_{1} L_{2}, then L_{2} P implies L_{1} P.
Proof Let A_{2} be a polynomialtime algorithm that decides L_{2}, and let F be a polynomialtime reduction algorithm that computes the reduction function f. We shall construct a polynomialtime algorithm A_{1} that decides L_{1}.
Figure 4 illustrates the construction of A_{1}. For a given input x *, the algorithm A_{1} uses F to transform x into f(x), and then it uses A_{2} to test whether f(x) L_{2}. The output of A_{2} is the value provided as the output from A_{1}.
The correctness of A_{1} follows from condition (1). The algorithm runs in polynomial time, since both F and A_{2} run in polynomial time (see Exercise 16).
Polynomialtime reductions provide a formal means for showing that one problem is at least as hard as another, to within a polynomialtime factor. That is, if L_{1} _{p} L_{2}, then L_{1} is not more than a polynomial factor harder than L_{2}, which is why the 'less than or equal to' notation for reduction is mnemonic. We can now define the set of NPcomplete languages, which are the hardest problems in NP.
A language L ^{*} is NPcomplete if
1. L NP, and
2. L' _{p} L for every L' NP.
If a language L satisfies property 2, but not necessarily property 1, we say that L is NPhard. We also define NPC to be the class of NPcomplete languages.
As the following theorem shows, NPcompleteness is at the crux of deciding whether P is in fact equal to NP.
Theorem 4
If any NPcomplete problem is polynomialtime solvable, then P = NP. If any problem in NP is not polynomialtime solvable, then all NPcomplete problems are not polynomialtime solvable.
Proof Suppose that L P and also that L NPC. For any L' NP, we have L' _{p} L by property 2 of the definition of NPcompleteness. Thus, by Lemma 3, we also have that L' P, which proves the first statement of the lemma.
To prove the second statement, suppose that there exists an L NP such that L P. Let L' NPC be any NPcomplete language, and for the purpose of contradiction, assume that L' P. But then, by Lemma 3, we have L _{p} L', and thus L P.
It is for this reason that research into the P NP question centers around the NPcomplete problems. Most theoretical computer scientists believe that P NP, which leads to the relationships among P, NP, and NPC shown in Figure 5. But for all we know, someone may come up with a polynomialtime algorithm for an NPcomplete problem, thus proving that P = NP. Nevertheless, since no polynomialtime algorithm for any NPcomplete problem has yet been discovered, a proof that a problem is NPcompete provides excellent evidence for its intractability.
We have defined the notion of an NPcomplete problem, but up to this point, we have not actually proved that any problem is NPcomplete. Once we prove that at least one problem is NPcomplete, we can use polynomialtime reducibility as a tool to prove the NPcompleteness of other problems. Thus, we now focus on demonstrating the existence of an NPcomplete problem: the circuitsatisfiability problem.
Unfortunately, the formal proof that the circuitsatisfiability problem is NPcomplete requires technical detail beyond the scope of this text. Instead, we shall informally describe a proof that relies on a basic understanding of boolean combinational circuits. This material is reviewed at the beginning of Chapter 29.
Figure 6 shows two boolean combinational circuits, each with three inputs and a single output. A truth assignment for a boolean combinational circuit is a set of boolean input values. We say that a oneoutput boolean combinational circuit is satisfiable if it has a satisfying assignment: a truth assignment that causes the output of the circuit to be 1. For example, the circuit in Figure 6(a) has the satisfying assignment x_{1} = 1, x_{2} = 1, x_{3} = 0, and so it is satisfiable. No assignment of values to x_{1}, x_{2}, and x_{3} causes the circuit in Figure 6(b) to produce a 1 output; it always produces 0, and so it is unsatisfiable.
The circuitsatisfiability problem is, 'Given a boolean combinational circuit composed of AND, OR, and NOT gates, is it satisfiable?' In order to pose this question formally, however, we must agree on a standard encoding for circuits. One can devise a graphlike encoding that maps any given circuit C into a binary string C whose length is not much larger than the size of the circuit itself. As a formal language, we can therefore define
CIRCUITSAT =
.
The circuitsatisfiability problem has great importance in the area of computeraided hardware optimization. If a circuit always produces 0, it can be replaced by a simpler circuit that omits all logic gates and provides the constant 0 value as its output. A polynomialtime algorithm for the problem would have considerable practical application.
Given a circuit C, we might attempt to determine whether it is satisfiable by simply checking all possible assignments to the inputs. Unfortunately, if there are k inputs, there are 2^{k} possible assignments. When the size of C is polynomial in k, checking each one leads to a superpolynomialtime algorithm. In fact, as has been claimed, there is strong evidence that no polynomialtime algorithm exists that solves the circuitsatisfiability problem because circuit satisfiability is NPcomplete. We break the proof of this fact into two parts, based on the two parts of the definition of NPcompleteness.
Lemma 5
The circuitsatisfiability problem belongs to the class NP.
Proof We shall provide a twoinput, polynomialtime algorithm A that can verify CIRCUITSAT. One of the inputs to A is (a standard encoding of) a boolean combinational circuit C. The other input is a certificate corresponding to an assignment of boolean values to the wires in C.
The algorithm A is constructed as follows. For each logic gate in the circuit, it checks that the value provided by the certificate on the output wire is correctly computed as a function of the values on the input wires. Then, if the output of the entire circuit is 1, the algorithm outputs 1, since the values assigned to the inputs of C provide a satisfying assignment. Otherwise, A outputs 0.
Whenever a satisfiable circuit C is input to algorithm A, there is a certificate whose length is polynomial in the size of C and that causes A to output a 1. Whenever an unsatisfiable circuit is input, no certificate can fool A into believing that the circuit is satisfiable. Algorithm A runs in polynomial time: with a good implementation, linear time suffices. Thus, CIRCUITSAT can be verified in polynomial time, and CIRCUITSAT NP.
The second part of proving that CIRCUITSAT is NPcomplete is to show that the language is NPhard. That is, we must show that every language in NP is polynomialtime reducible to CIRCUITSAT. The actual proof of this fact is full of technical intricacies, and so we shall settle for a sketch of the proof based on some understanding of the workings of computer hardware.
A computer program is stored in the computer memory as a sequence of instructions. A typical instruction encodes an operation to be performed, addresses of operands in memory, and an address where the result is to be stored. A special memory location, called the program counter, keeps track of which instruction is to be executed next. The program counter is automatically incremented whenever an instruction is fetched, thereby causing the computer to execute instructions sequentially. The execution of an instruction can cause a value to be written to the program counter, however, and then the normal sequential execution can be altered, allowing the computer to loop and perform conditional branches.
At any point during the execution of a program, the entire state of the computation is represented in the computer's memory. (We take the memory to include the program itself, the program counter, working storage, and any of the various bits of state that a computer maintains for bookkeeping.) We call any particular state of computer memory a configuration. The execution of an instruction can be viewed as mapping one configuration to another. Importantly, the computer hardware that accomplishes this mapping can be implemented as a boolean combinational circuit, which we denote by M in the proof of the following lemma.
Lemma 6
The circuitsatisfiability problem is NPhard.
Proof Let L be any language in NP. We shall describe a polynomialtime algorithm F computing a reduction function a that maps every binary string x to a circuit C = a(x) such that x L if and only if C CIRCUITSAT.
Since L NP, there must exist an algorithm A that verifies L in polynomialtime. The algorithm F that we shall construct will use the twoinput algorithm A to compute the reduction function a.
Let T(n) denote the worstcase running time of algorithm A on lengthn input strings, and let k 1 be a constant such that T(n) = O(n^{k}) and the length of the certificate is O(n^{k}). (The running time of A is actually a polynomial in the total input size, which includes both an input string and a certificate, but since the length of the certificate is polynomial in the length n of the input string, the running time is polynomial in n.)
The basic idea of the proof is to represent the computation of A as a sequence of configurations. As shown in Figure 7, each configuration can be broken into parts consisting of the program for A, the program counter and auxiliary machine state, the input x, the certificate y, and working storage. Starting with an initial configuration c_{0}, each configuration c_{i} is mapped to a subsequent configuration c_{i+}_{1} by the combinational circuit M implementing the computer hardware. The output of the algorithm A0 or 1is written to some designated location in the working storage when A finishes executing, and if we assume that thereafter A halts, the value never changes. Thus, if the algorithm runs for at most T(n) steps, the output appears as one of the bits in c_{T}_{(n)}.
The reduction algorithm F constructs a single combinational circuit that computes all configurations produced by a given initial configuration. The idea is to paste together T(n) copies of the circuit M. The output of the ith circuit, which produces configuration c_{i}, is fed directly into the input of the (i + 1)st circuit. Thus, the configurations, rather than ending up in a state register, simply reside as values on the wires connecting copies of M.
Recall what the polynomialtime reduction algorithm F must do. Given an input x, it must compute a circuit C = a(x) that is satisfiable if and only if there exists a certificate y such that A(x, y) = 1. When F obtains an input x, it first computes n = x and constructs a combinational circuit C' consisting of T(n) copies of M. The input to C' is an initial configuration corresponding to a computation on A(x, y), and the output is the configuration c_{T}_{(n)}.
The circuit C = a(x) that F computes is obtained by modifying C' slightly. First, the inputs to C' corresponding to the program for A, the initial program counter, the input x, and the initial state of memory are wired directly to these known values. Thus, the only remaining inputs to the circuit correspond to the certificate y. Second, all outputs to the circuit are ignored, except the one bit of c_{T}_{(n)} corresponding to the output of A. This circuit C, so constructed, computes C(y) = A(x, y) for any input y of length O(n^{k}). The reduction algorithm F, when provided an input string x, computes such a circuit C and outputs it.
Two properties remain to be proved. First, we must show that F correctly computes a reduction function a. That is, we must show that C is satisfiable if and only if there exists a certificate y such that A(x, y) = 1. Second, we must show that F runs in polynomial time.
To show that F correctly computes a reduction function, let us suppose that there exists a certificate y of length O(n^{k}) such that A(x, y) = 1. Then, if we apply the bits of y to the inputs of C, the output of C is C(y) = A(x, y) = 1. Thus, if a certificate exists, then C is satisfiable. For the other direction, suppose that C is satisfiable. Hence, there exists an input y to C such that C(y) = 1, from which we conclude that A(x, y) = 1. Thus, F correctly computes a reduction function.
To complete the proof, we need only show that F runs in time polynomial in n = x. The first observation we make is that the number of bits required to represent a configuration is polynomial in n. The program for A itself has constant size, independent of the length of its input x. The length of the input x is n, and the length of the certificate y is O(n^{k}). Since the algorithm runs for at most O(n^{k}) steps, the amount of working storage required by A is polynomial in n as well. (We assume that this memory is contiguous; Exercise 34 asks you to extend the argument to the situation in which the locations accessed by A are scattered across a much larger region of memory and the particular pattern of scattering can differ for each input x.)
The combinational circuit M implementing the computer hardware has size polynomial in the length of a configuration, which is polynomial in O(n^{k}) and hence is polynomial in n. (Most of this circuitry implements the logic of the memory system.) The circuit C consists of at most t = O(n^{k}) copies of M, and hence it has size polynomial in n. The construction of C from x can be accomplished in polynomial time by the reduction algorithm F, since each step of the construction takes polynomial time.
The language CIRCUITSAT is therefore at least as hard as any language in NP, and since it belongs to NP, it is NPcomplete.
Theorem 7
The circuitsatisfiability problem is NPcomplete.
Proof Immediate from Lemmas 5 and 6 and the definition of NPcompleteness.
31
Show that the p relation is a transitive relation on languages. That is, show that if L_{1} p L_{2} and L_{2} p L_{3}, then L_{1} p L_{3}.
32
Prove that if and only if .
33
Show that a satisfying assignment can be used as a certificate in an alternative proof of Lemma 5. Which certificate makes for an easier proof?
34
The proof of Lemma 6 assumes that the working storage for algorithm A occupies a contiguous region of polynomial size. Where in the proof is this assumption exploited? Argue that this assumption does not involve any loss of generality.
35
A language L is complete for a language class C with respect to polynomialtime reductions if L C and L' p L for all L' C. Show that and * are the only languages in P that are not complete for P with respect to polynomialtime reductions.
36
Show that L is complete for NP if and only if is complete for coNP.
37
The reduction algorithm F in the proof of Lemma 6 constructs the circuit C = f(x) based on knowledge of x, A, and k. Professor Sartre observes that the string x is input to F, but only the existence of A and k is known to F (since the language L belongs to NP), not their actual values. Thus, the professor concludes that F can't possibly construct the circuit C and that the language CIRCUITSAT is not necessarily NPhard. Explain the flaw in the professor's reasoning.
The NPcompleteness of the circuitsatisfiability problem relies on a direct proof that L _{P} CIRCUITSAT for every language L NP. In this section, we shall show how to prove that languages are NPcomplete without directly reducing every language in NP to the given language. We shall illustrate this methodology by proving that various formulasatisfiability problems are NPcomplete. Section 5 provides many more examples of the methodology.
The following lemma is the basis of our method for showing that a language is NPcomplete.
Lemma 8
If L is a language such that L' _{p} L for some L' NPC, then L is NPhard. Moreover, if L NP, then L NPC.
Proof Since L' is NPcomplete, for all L' NP, we have L' p L'. By supposition, L' _{p} L, and thus by transitivity (Exercise 31), we have L' _{p} L, which shows that L is NPhard. If L NP, we also have L NPC.
In other words, by reducing a known NPcomplete language L' to L, we implicitly reduce every language in NP to L. Thus, Lemma 8 gives us a method for proving that a language L is NPcomplete:
1. Prove L NP.
2. Select a known NPcomplete language L'.
3. Describe an algorithm that computes a function f mapping every instance of L' to an instance of L.
4. Prove that the function f satisfies x L' if and only if f(x) L for all x ^{*}.
5. Prove that the algorithm computing f runs in polynomial time.
This methodology of reducing from a single known NPcomplete language is far simpler than the more complicated process of providing reductions from every language in NP. Proving CIRCUITSAT NPC has given us a 'foot in the door.' Knowing that the circuitsatisfiability problem is NPcomplete now allows us to prove much more easily that other problems are NPcomplete. Moreover, as we develop a catalog of known NPcomplete problems, applying the methodology will become that much easier.
We illustrate the reduction methodology by giving an NPcompleteness proof for the problem of determining whether a boolean formula, not a circuit, is satisfiable. This problem has the historical honor of being the first problem ever shown to be NPcomplete.
We formulate the (formula) satisfiability problem in terms of the language SAT as follows. An instance of SAT is a boolean formula composed of
1. boolean variables: x_{1}, x_{2}, . . . . ;
2. boolean connectives: any boolean function with one or two inputs and one output, such as (AND), (OR), (NOT), (implication), (if and only if); and
3. parentheses.
As in boolean combinational circuits, a truth assignment for a boolean formula is a set of values for the variables of , and a satisfying assignment is a truth assignment that causes it to evaluate to 1. A formula with a satisfying assignment is a satisfiable formula. The satisfiability problem asks whether a given boolean formula is satisfiable; in formallanguage terms,
SAT = .
As an example, the formula
= ((x_{1} x_{2}) ((x_{1} x_{3}) x_{4})) x_{2}
has the satisfying assignment x_{1 }= 0, x_{2 }= 0, x_{3 }= 1, x_{4 }= 1, since
= ((0 0) ((0 1) 1)) 0
= (1 (1 1)) 1
= (1 0) 1
= 1 ,
and thus this formula belongs to SAT.
The naive algorithm to determine whether an arbitrary boolean formula is satisfiable does not run in polynomial time. There are 2^{n} possible assignments in a formula with n variables. If the length of is polynomial in n, then checking every assignment requires superpolynomial time. As the following theorem shows, a polynomialtime algorithm is unlikely to exist.
Theorem 9
Satisfiability of boolean formulas is NPcomplete.
Proof We shall argue first that SAT NP. Then, we shall show that CIRCUITSAT _{p} SAT; by Lemma 8, this will prove the theorem.
To show that SAT belongs to NP, we show that a certificate consisting of a satisfying assignment for an input formula can be verified in polynomial time. The verifying algorithm simply replaces each variable in the formula with its corresponding value and then evaluates the expression, much as we did in equation (2) above. This task is easily doable in polynomial time. If the expression evaluates to 1, the formula is satisfiable. Thus, the first condition of Lemma 8 for NPcompleteness holds.
To prove that SAT is NPhard, we show that CIRCUITSAT _{p} SAT. In other words, any instance of circuit satisfiability can be reduced in polynomial time to an instance of formula satisfiability. Induction can be used to express any boolean combinational circuit as a boolean formula. We simply look at the gate that produces the circuit output and inductively express each of the gate's inputs as formulas. The formula for the circuit is then obtained by writing an expression that applies the gate's function to its inputs' formulas.
Unfortunately, this straightforward method does not constitute a polynomialtime reduction. Shared subformulas can cause the size of the generated formula to grow exponentially (see Exercise 41). Thus, the reduction algorithm must be somewhat more clever.
Figure 8 illustrates the basic idea of the reduction from CIRCUITSAT to SAT on the circuit from Figure 6(a). For each wire x_{i} in the circuit C, the formula has a variable x_{i}. The proper operation of a gate can now be expressed as a formula involving the variables of its incident wires. For example, the operation of the output AND gate is x_{10} (x_{7} x_{8 }x_{9}).
The formula produced by the reduction algorithm is the AND of the circuitoutput variable with the conjunction of clauses describing the operation of each gate. For the circuit in the figure, the formula is
= x_{10} (x_{4} x_{3})
(x_{5 } (x_{1} x_{2))}
(x_{6 } x_{4)}
(x_{7 } (x_{1} x_{2} x_{4))}
(x_{8 } (x_{5} x_{6}))
(x_{9} (x_{6} x_{7}))
(x_{10} (x_{7} x_{8} x_{9})) .
Given a circuit C, it is straightforward to produce such a formula in polynomial time.
Why is the circuit satisfiable exactly when the formula is satisfiable? If C has a satisfying assignment, each wire of the circuit has a welldefined value, and the output of the circuit is 1. Therefore, the assignment of wire values to variables in makes each clause of evaluate to 1, and thus the conjunction of all evaluates to 1. Conversely, if there is an assignment that causes to evaluate to 1, the circuit C is satisfiable by an analogous argument. Thus, we have shown that CIRCUITSAT _{p} SAT, which completes the proof.
Many problems can be proved NPcomplete by reduction from formula satisfiability. The reduction algorithm must handle any input formula, though, and this can lead to a huge number of cases that must be considered. It is often desirable, therefore, to reduce from a restricted language of boolean formulas, so that fewer cases need be considered. Of course, we must not restrict the language so much that it becomes polynomialtime solvable. One convenient language is 3CNF satisfiability, or 3CNFSAT.
We define 3CNF satisfiability using the following terms. A literal in a boolean formula is an occurrence of a variable or its negation. A boolean formula is in conjunctive normal form, or CNF, if it is expressed as an AND of clauses, each of which is the OR of one or more literals. A boolean formula is in 3conjunctive normal form, or 3CNF, if each clause has exactly three distinct literals.
For example, the boolean formula
(x_{1 }_{ }x_{1 }_{ }x_{2}) (x_{3} x_{2} x_{4}) (x_{1} _{ }x_{3 }_{ }x_{4})
is in 3CNF. The first of its three clauses is (x_{1 }x_{1 }x_{2 }), which contains the three literals x_{1}, x_{1}, and x_{2}.
In 3CNFSAT, we are asked whether a given boolean formula in 3CNF is satisfiable. The following theorem shows that a polynomialtime algorithm that can determine the satisfiability of boolean formulas is unlikely to exist, even when they are expressed in this simple normal form.
= ((x_{1} x_{2}) ((x_{1} x_{3}) x_{4})) x_{2 .}
Theorem 10
Satisfiability of boolean formulas in 3conjunctive normal form is NPcomplete.
Proof The argument we used in the proof of Theorem 9 to show that SAT NP applies equally well here to show that 3CNFSAT NP. Thus, we need only show that 3CNFSAT is NPhard. We prove this by showing that SAT _{p} 3CNFSAT, from which the proof will follow by Lemma 8.
The reduction algorithm can be broken into three basic steps. Each step progressively transforms the input formula closer to the desired 3conjunctive normal form.
The first step is similar to the one used to prove CIRCUITSAT _{p }SAT in Theorem 9. First, we construct a binary 'parse' tree for the input formula , with literals as leaves and connectives as internal nodes. Figure 9 shows such a parse tree for the formula
= ((x_{1} x_{2}) ((x_{1} x_{3}) x_{4})) x_{2 }.
Should the input formula contain a clause such as the OR of several literals, associativity can be used to parenthesize the expression fully so that every internal node in the resulting tree has 1 or 2 children. The binary parse tree can now be viewed as a circuit for computing the function.
Mimicking the reduction in the proof of Theorem 9, we introduce a variable y_{i} for the output of each internal node. Then, we rewrite the original formula as the AND of the root variable and a conjunction of clauses describing the operation of each node. For the formula (3), the resulting expression is
' = y_{1} (y_{1} (y_{2} x_{2}))
(y_{2} (y3 y_{4}))
(y_{3} (x_{1} x_{2}))
(y_{4} y_{5})
(y_{5} (y_{6} y_{4}))
(y_{6} (x_{1} x_{3})).
Observe that the formula ' thus obtained is a conjunction of clauses ,_{ }each of which has at most 3 literals. The only additional requirement is that each clause be an OR of literals.
y_{1} y_{2} x_{2} (y_{1} (y_{2} x_{2}))

1 1 1 0
1 1 0 1
1 0 1 0
1 0 0 0
0 1 1 1
0 1 0 0
0 0 1 1
0 0 0 1
The second step of the reduction converts each clause into conjunctive normal form. We construct a truth table for by evaluating all possible assignments to its variables. Each row of the truth table consists of a possible assignment of the variables of the clause, together with the value of the clause under that assignment. Using the truthtable entries that evaluate to 0, we build a formula in disjunctive normal form (or DNF)an OR of AND'sthat is equivalent to . We then convert this formula into a CNF formula by using DeMorgan's laws (5.2) to complement all literals and change OR's into AND's and AND's into OR's.
In our example, we convert the clause into CNF as follows. The truth table for '_{i} is given in Figure 10. The DNF formula equivalent to
(y_{1} y_{2} x_{2}) (y_{1} y_{2} x_{2}) (y_{1} y_{2} x_{2}) (y_{1} y_{2} x_{2}) .
Applying DeMorgan's laws, we get the CNF formula
which is equivalent to the original clause .
Each clause of the formula ' has now been converted into a CNF formula , and thus ' is equivalent to the CNF formula ' consisting of the conjunction of the . Moreover, each clause of ' has at most 3 literals.
The third and final step of the reduction further transforms the formula so that each clause has exactly 3 distinct literals. The final 3CNF formula ''' is constructed from the clauses of the CNF formula '. It also uses two auxiliary variables that we shall call p and q. For each clause C_{i}_{ }of ', we include the following clauses in ''':
If C_{i} has 3 distinct literals, then simply include C_{i} as a clause of '''.
If C_{i} has 2 distinct literals, that is, if Ci = (l_{1} l_{2}), where l_{1} and l_{2 }are literals, then include (l_{1} l_{2} p) (l_{1} l_{2} p) as clauses of f(). The literals p and p merely fulfill the syntactic requirement that there be exactly 3 distinct literals per clause: (l_{1} l_{2} p) (l_{1} l_{2} p) is equivalent to (l_{1} l_{2}) whether p = 0 or p = 1.
If C_{i} has just 1 distinct literal l, then include (l p q) (l p q) (l p q) (l p q) as clauses of '''. Note that every setting of p and q causes the conjunction of these four clauses to evaluate to l.
We can see that the 3CNF formula ''' is satisfiable if and only if is satisfiable by inspecting each of the three steps. Like the reduction from CIRCUITSAT to SAT, the construction of ' from in the first step preserves satisfiability. The second step produces a CNF formula ' that is algebraically equivalent to '. The third step produces a 3CNF formula ''' that is effectively equivalent to ', since any assignment to the variables p and q produces a formula that is algebraically equivalent to '.
We must also show that the reduction can be computed in polynomial time. Constructing ' from introduces at most 1 variable and 1 clause per connective in . Constructing ' from ' can introduce at most 8 clauses into ' for each clause from ', since each clause of ' has at most 3 variables, and the truth table for each clause has at most 2^{3} = 8 rows. The construction of ''' from ' introduces at most 4 clauses into ''' for each clause of '. Thus, the size of the resulting formula ''' is polynomial in the length of the original formula. Each of the constructions can easily be accomplished in polynomial time.
41
Consider the straightforward (nonpolynomialtime) reduction in the proof of Theorem 9. Describe a circuit of size n that, when converted to a formula by this method, yields a formula whose size is exponential in n.
42
Show the 3CNF formula that results when we use the method of Theorem 10 on the formula (3).
43
44
Show that the problem of determining whether a boolean formula is a tautology is complete for coNP. (Hint: See Exercise 36.)
45
Show that the problem of determining the satisfiability of boolean formulas in disjunctive normal form is polynomialtime solvable.
46
Suppose that someone gives you a polynomialtime algorithm to decide formula satisfiability. Describe how to use this algorithm to find satisfying assignments in polynomial time.
47
Let 2CNFSAT be the set of satisfiable boolean formulas in CNF with exactly 2 literals per clause. Show that 2CNFSAT P. Make your algorithm as efficient as possible. (Hint: Observe that x y is equivalent to x y. Reduce 2CNFSAT to a problem on a directed graph that is efficiently solvable.)
NPcomplete problems arise in diverse domains: boolean logic, graphs, arithmetic, network design, sets and partitions, storage and retrieval, sequencing and scheduling, mathematical programming, algebra and number theory, games and puzzles, automata and language theory, program optimization, and more. In this section, we shall use the reduction methodology to provide NPcompleteness proofs for a variety of problems drawn from graph theory and set partitioning.
Figure 11 outlines the structure of the NPcompleteness proofs in this section and Section 4. Each language in the figure is proved NPcomplete by reduction from the language that points to it. At the root is CIRCUITSAT, which we proved NPcomplete in Theorem 7.
A clique in an undirected graph G = (V, E) is a subset V' V of vertices, each pair of which is connected by an edge in E. In other words, a clique is a complete subgraph of G. The size of a clique is the number of vertices it contains. The clique problem is the optimization problem of finding a clique of maximum size in a graph. As a decision problem, we ask simply whether a clique of a given size k exists in the graph. The formal definition is
CLIQUE = .
A naive algorithm for determining whether a graph G = (V, E) with V vertices has a clique of size k is to list all ksubsets of V, and check each one to see whether it forms a clique. The running time of this algorithm is , which is polynomial if k is a constant. In general, however, k could be proportional to V, in which case the algorithm runs in superpolynomial time. As one might suspect, an efficient algorithm for the clique problem is unlikely to exist.
Theorem 11
The clique problem is NPcomplete.
Proof To show that CLIQUE NP, for a given graph G = (V, E), we use the set V' V of vertices in the clique as a certificate for G. Checking whether V' is a clique can be accomplished in polynomial time by checking whether, for every pair u, v V', the edge (u, v) belongs to E.
We next show that the clique problem is NPhard by proving that 3CNFSAT p CLIQUE. That we should be able to prove this result is somewhat surprising, since on the surface logical formulas seem to have little to do with graphs.
The reduction algorithm begins with an instance of 3CNFSAT. Let =_{ }C_{1} C_{2} . . . C_{k} be a boolean formula in 3CNF with k clauses. For r = 1, 2, . . . , k, each clause C_{r} has exactly three distinct literals . We shall construct a graph G such that is satisfiable if and only if G has a clique of size k.
The graph G = (V, E) is constructed as follows. For each clause in , we place a triple of vertices in V. We put an edge between two vertices if both of the following hold:
are in different triples, that is, r s, and
their corresponding literals are consistent, that is, is not the negation of .
This graph can easily be computed from in polynomial time. As an example of this construction, if we have
= (x_{1} x_{2} x_{3}) (x_{1} x_{2} x_{3}) (x_{1} x_{2} x_{3}) ,
then G is the graph shown in Figure 12.
We must show that this transformation of into G is a reduction. First, suppose that has a satisfying assignment. Then, each clause C_{r} contains at least one literal that is assigned 1, and each such literal corresponds to a vertex . Picking one such 'true' literal from each clause yields a set of V' of k vertices. We claim that V' is a clique. For any two vertices , where r s, both corresponding literals are mapped to 1 by the given satisfying assignment, and thus the literals cannot be complements. Thus, by the construction of G, the edge belongs to E.
Conversely, suppose that G has a clique V' of size k. No edges in G connect vertices in the same triple, and so V' contains exactly one vertex per triple. We can assign 1 to each literal such that without fear of assigning 1 to both a literal and its complement, since G contains no edges between inconsistent literals. Each clause is satisfied, and so satisfied. (Any variables that correspond to no vertex in the clique may be set arbitrarily.)
In the example of Figure 12, a satisfying assignment of is x_{1} = 0, x_{2} = 0, x_{3} = 1. A corresponding clique of size k = 3 consists of the vertices corresponding to x_{2} from the first clause, x_{3} from the second clause, and x_{3} from the third clause.
A vertex cover of an undirected graph G = (V, E) is a subset V' Vsuch that if (u, v) E, then u V' or v V' (or both). That is, each vertex 'covers' its incident edges, and a vertex cover for G is a set of vertices that covers all the edges in E. The size of a vertex cover is the number of vertices in it. For example, the graph in Figure 13(b) has a vertex cover of size 2.
The vertexcover problem is to find a vertex cover of minimum size in a given graph. Restating this optimization problem as a decision problem, we wish to determine whether a graph has a vertex cover of a given size k. As a language, we define
VERTEXCOVER = .
The following theorem shows that this problem is NPcomplete.
Theorem 12
The vertexcover problem is NPcomplete.
Proof We first show that VERTEXCOVER NP. Suppose we are given a graph G = (V, E) and an integer k. The certificate we choose is the vertex cover V' V itself. The verification algorithm affirms that V' = k, and then it checks, for each edge (u, v) E, whether u V' or v V'. This verification can be performed straightforwardly in polynomial time.
We prove that the vertexcover problem is NPhard by showing that CLIQUE _{P} VERTEXCOVER. This reduction is based on the notion of the 'complement' of a graph. Given an undirected graph G = (V, E), we define the complement of G as . In other words, is the graph containing exactly those edges that are not in G. Figure 13 shows a graph and its complement and illustrates the reduction from CLIQUE to VERTEXCOVER.
The reduction algorithm takes as input an instance G, k of the clique problem. It computes the complement , which is easily doable in polynomial time. The output of the reduction algorithm is the instance , of the vertexcover problem. To complete the proof, we show that this transformation is indeed a reduction: the graph G has a clique of size k if and only if the graph has a vertex cover of size V  k.
Suppose that G has a clique V' V with V' = k. We claim that V  V' is a vertex cover in . Let (u, v) be any edge in . Then, (u, v) E, which implies that at least one of u or v does not belong to V', since every pair of vertices in V' is connected by an edge of E. Equivalently, at least one of u or v is in V  V', which means that edge (u,v) is covered by V  V'. Since (u, v) was chosen arbitrarily from , every edge of is covered by a vertex in V  V'. Hence, the set V  V', which has size V  k, forms a vertex cover for .
Conversely, suppose that has a vertex cover V' V , where V' = V  k. Then, for all u, v V, if (u, v) , if , then u V' or v V' or both. The contrapositive of this implication is that for all u, v V, if u V' and v V', then (u, v) E. In other words, V  V' is a clique, and it has size V  V' = k.
Since VERTEXCOVER is NPcomplete, we don't expect to find a polynomialtime algorithm for finding a minimumsize vertex cover. Section 37.1 presents a polynomialtime 'approximation algorithm,' however, which produces 'approximate' solutions for the vertexcover problem. The size of a vertex cover produced by the algorithm is at most twice the minimum size of a vertex cover.
Thus, we shouldn't give up hope just because a problem is NPcomplete. There may be a polynomialtime approximation algorithm that obtains nearoptimal solutions, even though finding an optimal solution is NPcomplete. Chapter 37 gives several approximation algorithms for NPcomplete problems.
The next NPcomplete problem we consider is arithmetic. In the subsetsum problem, we are given a finite set S N and a target t N. We ask whether there is a subset S' S whose elements sum to t. For example, if S = and t = 3754, then the subset S' = is a solution.
As usual, we define the problem as a language:
SUBSETSUM =
.
As with any arithmetic problem, it is important to recall that our standard encoding assumes that the input integers are coded in binary. With this assumption in mind, we can show that the subsetsum problem is unlikely to have a fast algorithm.
Theorem 13
The subsetsum problem is NPcomplete.
Proof To show that SUBSETSUM is in NP, for an instance S, t of the problem, we let the subset S' be the certificate. Checking whether t = _{s}S's can be accomplished by a verification algorithm in polynomial time.
We now show that VERTEXCOVER _{P} SUBSETSUM. Given an instance G, k of the vertexcover problem, the reduction algorithm constructs an instance S, t of the subsetsum problem such that G has a vertex cover of size k if and only if there is a subset of S whose sum is exactly t.
At the heart of the reduction is an incidencematrix representation of G. Let G = (V, E) be an undirected graph and let V = and E = . The incidence matrix of G is a V E matrix B = (bij) such that
For example, Figure 14(b) shows the incidence matrix for the undirected graph of Figure 14(a). The incidence matrix is shown with lowerindexed edges on the right, rather than on the left as is conventional, in order to simplify the formulas for the numbers in S.
Given a graph G and an integer k, the reduction algorithm computes a set S of numbers and an integer t. To understand how the reduction algorithm works, let us represent numbers in a 'modified base4' fashion. The E loworder digits of a number will be in base4 but the highorder digit can be as large as k. The set of numbers is constructed in such a way that no carries can be propagated from lower digits to higher digits.
The set S consists of two types of numbers, corresponding to vertices and edges respectively. For each vertex v_{i} V, we create a positive integer x_{i} whose modified base4 representation consists of a leading 1 followed by E digits. The digits correspond to v_{i}'s row of the incidence matrix B = (b_{ij}) for G, as illustrated in Figure 14(c). Formally, for i = 0, 1, . . . , V  1,
For each edge e_{j} E, we create a positive integer y_{j} that is just a row of the 'identity' incidence matrix. (The identity incidence matrix is the E E matrix with 1's only in the diagonal positions.) Formally, for j = 0, 1, . . . , E  1,
y_{j} = 4^{j} .
The first digit of the target sum t is k, and all E lowerorder digits are 2's. Formally,
All of these numbers have polynomial size when we represent them in binary. The reduction can be performed in polynomial time by manipulating the bits of the incidence matrix.
We must now show that graph G has a vertex cover of size k if and only if there is a subset S' S whose sum is t. First, suppose that G has a vertex cover V' V of size k. Let V' = , and define S' by
S' =
.
To see that _{s}S_{'}S = t, observe that summing the k leading 1's of the x_{im }S' gives the leading digit k of the modified base4 representation of t. To get the loworder digits of t, each of which is a 2, consider the digit positions in turn, each of which corresponds to an edge e_{j}. Because V' is a vertex cover, e_{j} is incident on at least one vertex in V'. Thus, for each edge e_{j}, there is at least one x_{im} S' with a 1 in the jth position. If e_{j} is incident on two vertices in V', then both contribute a 1 to the sum in the jth position. The jth digit of y_{j} contributes nothing, since e_{j} is incident on two vertices, which implies that y_{j} S'. Thus, in this case, the sum of S' produces a 2 in the jth position of t. For the other casewhen e_{j }is incident on exactly one vertex in V'we have y_{j} S', and the incident vertex and y_{j} each contribute 1 to the sum of the jth digit of t, thereby also producing a 2. Thus, S' is a solution to the subsetsum instance S.
Now, suppose that there is a subset S' S that sums to t. Let S = . We claim that m = k and that V' = is a vertex cover for G. To prove this claim, we start by observing that for each edge e_{j} E, there are three 1's in set S in the e_{j} position: one from each of the two vertices incident on e_{j}, and one from y_{j}. Because we are working with a modified base4 representation, there are no carries from position e_{j} to position e_{j}_{+1}. Thus, for each of the E loworder positions of t, at least one and at most two x_{i} must contribute to the sum. Since at least one x_{i} contributes to the sum for each edge, we see that V' is a vertex cover. To see that m = k, and thus that V' is a vertex cover of size k, observe that the only way the leading k in target t can be achieved is by including exactly k of the x_{i} in the sum.
In Figure 14, the vertex cover V' = corresponds to the subset S' = . All of the y_{j} are included in S', with the exception of y_{1}, which is incident on two vertices in V'.
We now return to the hamiltoniancycle problem defined in Section 2.
Theorem 14
The hamiltonian cycle problem is NPcomplete.
Proof We first show that HAMCYCLE belongs to NP. Given a graph G = (V, E), our certificate is the sequence of V vertices that make up the hamiltonian cycle. The verification algorithm checks that this sequence contains each vertex in V exactly once and that with the first vertex repeated at the end, it forms a cycle in G. This verification can be performed in polynomial time.
Our first widget is the subgraph A shown in Figure 15(a). Suppose that A is a subgraph of some graph G and that the only connections between A and the remainder of G are through the vertices a, a', b, and b'. Furthermore, suppose that graph G has a hamiltonian cycle. Since any hamiltonian cycle of G must pass through vertices z_{1}, z_{2}, z_{3}, and z_{4} in one of the ways shown in Figures 15(b) and (c), we may treat subgraph A as if it were simply a pair of edges (a, a') and (b, b') with the restriction that any hamiltonian cycle of G must include exactly one of these edges. We shall represent widget A as shown in Figure 15(d).
The subgraph B in Figure 16 is our second widget. Suppose that B is a subgraph of some graph G and that the only connections from B to the remainder of G are through vertices b_{1}, b_{2}, b_{3}, and b_{4}. A hamiltonian cycle of graph G cannot traverse all of the edges (b_{1}, b_{2}), (b_{2}, b_{3}), and (b_{3}, b_{4}), since then all vertices in the widget other than b_{1}, b_{2}, b_{3}, and b_{4} would be missed. A hamiltonian cycle of G may, however, traverse any proper subset of these edges. Figures 16(a)(e) show five such subsets; the remaining two subsets can be obtained by performing a toptobottom flip of parts (b) and (e). We represent this widget as in Figure 16(f), the idea being that at least one of the paths pointed to by the arrows must be taken by a hamiltonian cycle.
The graph G that we shall construct consists mostly of copies of these two widgets. The construction is illustrated in Figure 17. For each of the k clauses in , we include a copy of widget B, and we join these widgets together in series as follows. Letting b_{ij} be the copy of vertex b_{j} in the ith copy of widget B, we connect b_{i}_{,4} to b_{i}_{+1,1} for i = 1, 2, . . . , k  1.
Then, for each variable x_{m} in , we include two vertices . We connect these two vertices by means of two copies of the edge , which we denote by e_{m} and to distinguish them. The idea is that if the hamiltonian cycle takes edge e_{m}, it corresponds to assigning variable x_{m} the value 1. If the hamiltonian cycle takes edge , the variable is assigned the value 0. Each pair of these edges forms a twoedge loop; we connect these small loops in series by adding edges for m = 1, 2, . . . , n  1. We connect the left (clause) side of the graph to the right (variable) side by means of two edges , which are the topmost and bottommost edges in Figure 17.
We are not yet finished with the construction of graph G, since we have yet to relate the variables to the clauses. If the jth literal of clause C_{i} is x_{m}, then we use an A widget to connect edge (b_{ij}, b_{i,j}_{+1}) with edge e_{m}. If the jth literal of clause C_{i} is x_{m}, then we instead put an A widget between edge (b_{ij}, b_{i,j}_{+1}) and edge . In Figure 17, for example, because clause C_{2 }is (x_{1 }x_{2} x_{3}), we place three A widgets as follows:
between (b_{2,1}, b_{2,2}) and e_{1},
between (b_{2,2}, b_{2,3}) and , and
between (b_{2,3}, b_{2,4}) and e_{3}.
Note that connecting two edges by means of A widgets actually entails replacing each edge by the five edges in the top or bottom of Figure 15(a) and, of course, adding the connections that pass through the z vertices as well. A given literal l_{m} may appear in several clauses (x_{m} in Figure 17, for example), and thus an edge e_{m} or may be be influenced by several A widgets (edge , for example). In this case, we connect the A widgets in series, as shown in Figure 18, effectively replacing edge e_{m} or by a series of edges.
We claim that formula is satisfiable if and only if graph G contains a hamiltonian cycle. We first suppose that G has a hamiltonian cycle h and show that is satisfiable. Cycle h must take a particular form:
First, it traverses edge to go from the top left to the top right.
It then follows all of the vertices from top to bottom, choosing either edge e_{m} or edge , but not both.
It next traverses edge to get back to the left side.
Finally, it traverses the B widgets from bottom to top on the left.
(It actually traverses edges within the A widgets as well, but we use these subgraphs to enforce the either/or nature of the edges it connects.)
Given the hamiltonian cycle h, we define a truth assignment for as follows. If edge e_{m} belongs to h, then we set x_{m} = 1. Otherwise, edge belongs to h, and we set x_{m} = 0.
We claim that this assignment satisfies . Consider a clause C_{i} and the corresponding B widget in G. Each edge (b_{i}_{,j}b_{i,j}_{+l}) is connected by an A widget to either edge e_{m} or edge , depending on whether x_{m} or x_{m} is the jth literal in the clause. The edge (b_{i,j},b_{i,j}_{+1}) is traversed by h if and only if the corresponding literal is 0. Since each of the three edges (b_{i}_{,1},b_{i}_{,2}),(b_{i}_{,2},b_{i}_{,3}),(b_{i}_{,3},b_{i}_{,4}) in clause C_{i} is also in a B widget, all three cannot be traversed by the hamiltonian cycle h. One of the three edges, therefore, must have a corresponding literal whose assigned value is 1, and clause C_{i} is satisfied. This property holds for each clause C_{i,} i = 1, 2, . . . , k, and thus formula ø is satisfied.
Conversely, let us suppose that formula ø is satisfied by some truth assignment. By following the rules from above, we can construct a hamiltonian cycle for graph G: traverse edge e_{m} if x_{m} = 1, traverse edge if x_{m} = 0, and traverse edge (b_{i,j},b_{i,j}_{+1}) if and only if the jth literal of clause C_{i} is 0 under the assignment. These rules can indeed be followed, since we assume that s is a satisfying assignment for formula ø.
Finally, we note that graph G can be constructed in polynomial time. It contains one B widget for each of the k clauses in ø. There is one A widget for each instance of each literal in ø, and so there are 3k A widgets. Since the A and B widgets are of fixed size, the graph G has O(k) vertices and edges and is easily constructed in polynomial time. Thus, we have provided a polynomialtime reduction from 3CNFSAT to HAMCYCLE.
In the travelingsalesman problem, which is closely related to the hamiltoniancycle problem, a salesman must visit n cities. Modeling the problem as a complete graph with n vertices, we can say that the salesman wishes to make a tour, or hamiltonian cycle, visiting each city exactly once and finishing at the city he starts from. There is an integer cost c(i, j) to travel from city i to city j, and the salesman wishes to make the tour whose total cost is minimum, where the total cost is the sum of the individual costs along the edges of the tour. For example, in Figure 19, a minimumcost tour is u, w, v, x, u, with cost 7. The formal language for the travelingsalesman problem is
TSP = .
The following theorem shows that a fast algorithm for the travelingsalesman problem is unlikely to exist.
The travelingsalesman problem is NPcomplete.
Proof We first show that TSP belongs to NP. Given an instance of the problem, we use as a certificate the sequence of n vertices in the tour. The verification algorithm checks that this sequence contains each vertex exactly once, sums up the edge costs, and checks whether the sum is at most k. This process can certainly be done in polynomial time.
To prove that TSP is NPhard, we show that HAMCYCLE _{p} TSP. Let G = (V, E) be an instance of HAMCYCLE. We construct an instance of TSP as follows. We form the complete graph G'= (V,E'), where E' = , and we define the cost function c by
The instance of TSP is then (G', c, 0), which is easily formed in polynomial time.
We now show that graph G has a hamiltonian cycle if and only if graph G' has a tour of cost at most 0. Suppose that graph G has a hamiltonian cycle h. Each edge in h belongs to E and thus has cost 0 in G'. Thus, h is a tour in G' with cost 0. Conversely, suppose that graph G' has a tour h' of cost at most 0. Since the costs of the edges in E' are 0 and 1, the cost of tour h' is exactly 0. Therefore, h' contains only edges in E. We conclude that h is a hamiltonian cycle in graph G.
51
The subgraphisomorphism problem takes two graphs G_{1} and G_{2} and asks whether G_{1} is a subgraph of G_{2}. Show that the subgraphisomorphism problem is NPcomplete.
52
Given an integer mbyn matrix A and an integer mvector b, the 01 integerprogramming problem asks whether there is an integer nvector x with elements in the set such that Ax b. Prove that 01 integer programming is NPcomplete. (Hint: Reduce from 3CNFSAT.)
53
Show that the subsetsum problem is solvable in polynomial time if the target value t is expressed in unary.
54
The setpartition problem takes as input a set S of numbers. The question is whether the numbers can be partitioned into two sets A and such that . Show that the setpartition problem is NPcomplete.
55
Show that the hamiltonianpath problem is NPcomplete.
56
The longestsimplecycle problem is the problem of determining a simple cycle (no repeated vertices) of maximum length in a graph. Show that this problem is NPcomplete.
57
Professor Marconi proclaims that the subgraph used as widget A in the proof of Theorem 14 is more complicated than necessary: vertices z_{3} and z_{4} of Figure 15(a) and the vertices above and below them are not needed. Is the professor correct? That is, does the reduction work with this smaller version of the widget, or does the 'either/or' property of the widget disappear?
361 Independent set
An independent set of a graph G = (V,E) is a subset V' V of vertices such that each edge in E is incident on at most one vertex in V'. The independentset problem is to find a maximumsize independent set in G.
a. Formulate a related decision problem for the independentset problem, and prove that it is NPcomplete. (Hint: Reduce from the clique problem.)
b. Suppose that you are given a subroutine to solve the decision problem you defined in part (a). Give an algorithm to find an independent set of maximum size. The running time of your algorithm should be polynomial in V and E, where queries to the black box are counted as a single step.
Although the independentset decision problem is NPcomplete, certain special cases are polynomialtime solvable.
c. Give an efficient algorithm to solve the independentset problem when each vertex in G has degree 2. Analyze the running time, and prove that your algorithm works correctly.
d. Give an efficient algorithm to solve the independentset problem when G is bipartite. Analyze the running time, and prove that your algorithm works correctly. (Hint: Use the results of Section 27.3.)
362 Graph coloring
A kcoloring of an undirected graph G = (V,E) is a function c : V such that c(u) c(v) for every edge (u,v) E. In other words, the numbers 1, 2, . . . , k represent the k colors, and adjacent vertices must have different colors. The graphcoloring problem is to determine the minimum number of colors needed to color a given graph.
a. Give an efficient algorithm to determine a 2coloring of a graph if one exists.
b. Cast the graphcoloring problem as a decision problem. Show that your decision problem is solvable in polynomial time if and only if the graphcoloring problem is solvable in polynomial time.
c. Let the language 3COLOR be the set of graphs that can be 3colored. Show that if 3COLOR is NPcomplete, then your decision problem from part (b) is NPcomplete.
To prove that 3COLOR is NPcomplete, we use a reduction from 3CNFSAT. Given a formula ø of m clauses on n variables x_{1}, x_{2}, . . . , x_{n}, we construct a graph G = (V,E) as follows. The set V consists of a vertex for each variable, a vertex for the negation of each variable, 5 vertices for each clause, and 3 special vertices: TRUE, FALSE, and RED. The edges of the graph are of two types: 'literal' edges that are independent of the clauses and 'clause' edges that depend on the clauses. The literal edges form a triangle on the special vertices and also form a triangle on x_{i}, x_{i}, and RED for i = 1, 2, . . . , n.
d. Argue that in any 3coloring c of a graph containing the literal edges, exactly one of a variable and its negation is colored c(TRUE) and the other is colored c(FALSE). Argue that for any truth assignment for ø, there is a 3coloring of the graph containing just the literal edges.
The widget shown in Figure 20 is used to enforce the condition corresponding to a clause (x V y V z). Each clause requires a unique copy of the 5 vertices that are heavily shaded in the figure; they connect as shown to the literals of the clause and the special vertex TRUE.
e. Argue that if each of x, y, and z is colored c(TRUE) or c(FALSE), then the widget is 3colorable if and only if at least one of x, y, or z is colored c(TRUE).
f. Complete the proof that 3COLOR is NPcomplete.
Garey and Johnson [79] provide a wonderful guide to NPcompleteness, discussing the theory at length and providing a catalogue of many problems that were known to be NPcomplete in 1979. (The list of NPcomplete problem domains at the beginning of Section 5 is drawn from their table of contents.) Hopcroft and Ullman [104] and Lewis and Papadimitriou [139] have good treatments of NPcompleteness in the context of complexity theory. Aho, Hopcroft, and Ullman [4] also cover NPcompleteness and give several reductions, including a reduction for the vertexcover problem from the hamiltoniancycle problem.
The class P was introduced in 1964 by Cobham [44] and, independently, in 1965 by Edmonds [61], who also introduced the class NP and conjectured that P NP. The notion of NPcompleteness was proposed in 1971 by Cook [49], who gave the first NPcompleteness proofs for formula satisfiability and 3CNF satisfiability. Levin [138] independently discovered the notion, giving an NPcompleteness proof for a tiling problem. Karp [116] introduced the methodology of reductions in 1972 and demonstrated the rich variety of NPcomplete problems. Karp's paper included the original NPcompleteness proofs of the clique, vertexcover, and hamiltoniancycle problems. Since then, hundreds of problems have been proven to be NPcomplete by many researchers.
The proof of Theorem 14 was adapted from Papadimitriou and Steiglitz [154].

Vizualizari: 1625
Importanta:
Termeni si conditii de utilizare  Contact
© SCRIGROUP 2019 . All rights reserved
Distribuie URL
Adauga cod HTML in site