The Knapsack Problem, Cryptography, and the Presidential Election

by Brandon S. McMillen

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in the Mathematics Program

YOUNGSTOWN STATE UNIVERSITY

May, 2012

The Knapsack Problem, Cryptography, and the Presidential Election

Brandon S. McMillen

I hereby release this thesis to the public. I understand that this thesis will be made available from the OhioLINK ETD Center and the Maag Library Circulation Desk for public access. I also authorize the University or other individuals to make copies of this thesis as needed for scholarly research.

Signature: Brandon S. McMillen, Student

Approvals:
Dr. Nathan Ritchey, Thesis Advisor
Dr. Jozsi Jalics, Committee Member
Dr. Jacek Fabrykowski, Committee Member
Peter J. Kasvinsky, Dean of School of Graduate Studies & Research

© Brandon S. McMillen 2012

ABSTRACT

The 0-1 Knapsack Problem is an NP-hard optimization problem that has been studied extensively since the 1950s due to its real-world significance. The basic problem is that a knapsack with a weight capacity $c$ is to be filled with a subset of $n$ items, where each item $i$ has a weight $w_i$ and a profit $p_i$. The goal is to maximize the total profit without having the total weight exceed the capacity. In this thesis, the 0-1 Knapsack Problem is introduced and some of the research on, and applications of, the problem are given. Pisinger's branch-and-bound algorithm, which converges to an optimal solution, is presented. One of the earliest applications of the knapsack problem, the knapsack cryptosystem, is then discussed. The earliest knapsack cryptosystem, the Merkle-Hellman Cryptosystem, is described, along with how Adi Shamir broke it. Generating functions are then used to count the number of solutions of a knapsack problem. Using the generating function of the knapsack problem, the thesis concludes with an application to the Electoral College.

Contents

1 0-1 Knapsack Problem
  1.1 Branch-and-Bound Algorithm
    1.1.1 Introduction
    1.1.2 Pisinger's Algorithm
2 Cryptosystems
  2.1 Basic Merkle-Hellman Knapsack Cryptosystem
    2.1.1 Creating a Trapdoor Knapsack
  2.2 Attack on the Basic Merkle-Hellman Cryptosystem
    2.2.1 Rationale for Breaking the Merkle-Hellman Cryptosystem
3 Generating Functions
  3.1 Introduction
  3.2 Predicting the Presidential Election
4 Conclusion
5 References

1 0-1 Knapsack Problem

The family of knapsack problems requires that a subset of some given items be chosen such that a profit sum is maximized without exceeding the capacity of the particular knapsack. There are many different types of knapsack problems, but in this thesis we discuss the 0-1 Knapsack Problem. The 0-1 Knapsack Problem is the problem of choosing a subset of $n$ items, where each item has a profit $p_i$ and a weight $w_i$, that will fit into a knapsack with capacity $c$. The goal is to maximize the profit sum without having the weight sum exceed the capacity $c$. This problem is formulated as the following maximization problem:

maximize $z = \sum_{i=1}^{n} p_i x_i$
subject to $\sum_{i=1}^{n} w_i x_i \leq c$,
$x_i \in \{0,1\}$, $i = 1, \ldots, n$,   (1.1)

where $x_i$ equals 1 if item $i$ is chosen to be in the knapsack, and 0 if it is not.
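To make the formulation concrete, the following minimal Python sketch (our own illustration; none of the names come from the thesis) evaluates a candidate 0-1 vector against (1.1) by computing its profit sum and checking the capacity constraint.

def evaluate_candidate(p, w, c, x):
    """Evaluate a candidate solution of (1.1).

    p, w : profits and weights of the n items
    c    : knapsack capacity
    x    : 0-1 vector, x[i] = 1 if item i+1 is packed
    Returns the pair (profit sum, feasible?).
    """
    profit = sum(pi * xi for pi, xi in zip(p, x))
    weight = sum(wi * xi for wi, xi in zip(w, x))
    return profit, weight <= c

# Packing items 1 and 2 of a hypothetical three-item instance:
print(evaluate_candidate([10, 5, 7], [8, 3, 6], 10, [1, 1, 0]))   # -> (15, False): weight 11 > 10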
A particular case of the 0-1 Knapsack Problem arises when $p_i = w_i$. The objective is then to find a subset of the weights whose sum is as close to $c$ as possible without exceeding it. This problem is generally referred to as the Subset-Sum Problem. It is related to the diophantine equation

$\sum_{i=1}^{n} w_i x_i = \hat{c}$, where $x_i \in \{0,1\}$, $i = 1, \ldots, n$,   (1.2)

in that the optimal solution to the Subset-Sum Problem is the largest $\hat{c} \leq c$ for which (1.2) has a solution.

The Knapsack Problem has been studied extensively due to its simple structure. In some sense, knapsack problems are among the simplest integer programming problems, and they also appear as subproblems in many complex programs and integer programming algorithms. In the 1950s, Bellman's [2] dynamic programming theory produced the first algorithm to solve the knapsack problem exactly. In 1957, Dantzig [5] studied the knapsack problem and created a method for determining an upper bound on $z$ that was used in various later studies of the problem. The first branch-and-bound algorithm was derived in 1967 by Kolesar [11]. The first polynomial-time approximation algorithm was proposed by Johnson [8] in 1974. The main results of the eighties dealt with creating algorithms for large-sized problems. The knapsack problem continues to be studied; quite recently, in 2010, Howgrave-Graham and Joux [7] developed a generic algorithm for hard knapsack problems with the best running time known at that point.

Knapsack problems have also been studied intensively over the past fifty years because of their immediate applications in industry and financial management. Some of these applications include capital budgeting, selection of financial portfolios, cargo loading, the cutting stock problem, and the bin-packing problem. The knapsack problem has also been used to solve many combinatorial optimization problems, appearing, for example, as a subproblem of famous optimization problems such as the traveling salesperson problem. Knapsack problems were also used in some of the first public-key cryptosystems, which we discuss later in this thesis. They can even be used to predict which presidential candidate will win an election, as we show later in this thesis. The following example demonstrates a simple application of the Knapsack Problem.

Example 1.1 Suppose you want to invest c = $20 and you are considering six investment opportunities. Investment 1 costs $8 with a return of $10, investment 2 costs $9 with a return of $10, investment 3 costs $3 with a return of $5, investment 4 costs $5 with a return of $2, investment 5 costs $7 with a return of $9, and investment 6 costs $3 with a return of $1. How can you maximize the return on your investment without spending more than c dollars? This problem can be modeled as a knapsack problem as follows:

maximize $z = 10x_1 + 10x_2 + 5x_3 + 2x_4 + 9x_5 + x_6$
subject to $8x_1 + 9x_2 + 3x_3 + 5x_4 + 7x_5 + 3x_6 \leq 20$,
$x_i \in \{0,1\}$, $i = 1, \ldots, 6$,   (1.3)

where $x_i$ equals 1 if investment $i$ is chosen to be in the knapsack, and 0 if it is not.

The above example is so small that we are able to solve the problem by exhaustive search, or brute force. On larger problems, however, enumerating every solution is quite time consuming and not computationally feasible: a problem with $n$ items has $2^n$ possible solution vectors. So, although the 0-1 Knapsack Problem has a very simple structure, it cannot, as of yet, be solved in time bounded by a polynomial in $n$.
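As an illustration of how quickly the instance of Example 1.1 can be settled by brute force, and of why this approach breaks down as $n$ grows, the following self-contained Python sketch (our own illustration; the function name is not from the thesis) enumerates all $2^6 = 64$ candidate vectors of (1.3).

from itertools import product

def brute_force_knapsack(p, w, c):
    """Enumerate all 2**n 0-1 vectors and return (best profit, best vector)."""
    n = len(p)
    best_profit, best_x = 0, (0,) * n
    for x in product((0, 1), repeat=n):          # 2**n candidate vectors
        weight = sum(wi * xi for wi, xi in zip(w, x))
        if weight <= c:                          # feasibility check
            profit = sum(pi * xi for pi, xi in zip(p, x))
            if profit > best_profit:
                best_profit, best_x = profit, x
    return best_profit, best_x

# Data of Example 1.1 / formulation (1.3)
p = [10, 10, 5, 2, 9, 1]
w = [8, 9, 3, 5, 7, 3]
print(brute_force_knapsack(p, w, 20))            # -> (25, (1, 1, 1, 0, 0, 0))

Already at $n = 6$ the search examines 64 vectors, and doubling the number of items squares the size of the search space; this growth is what motivates the branch-and-bound approach of the next section.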
The 0-1 Knapsack Problem is NP-hard, meaning that it is very unlikely that researchers will ever devise polynomial-time algorithms that solve all instances of this problem. The following proof that the 0-1 Knapsack Problem is NP-hard is adapted from Martello and Toth [13]. In order to show that a problem $P$ is NP-hard, we show that its recognition version, $R(P)$, is NP-hard. We do this by showing that a basic NP-hard problem can be polynomially transformed into $R(P)$. We will use the following basic NP-hard problem, known as the Partition Problem, which was originally treated by Karp [10]. The Partition Problem is described as follows: given $n$ positive integers $w_1, \ldots, w_n$, is there a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} w_i = \sum_{i \notin S} w_i$?

Theorem 1.1 The 0-1 Knapsack Problem is NP-hard.

Proof. Consider the recognition version of the Subset-Sum Problem, i.e.: given $n + 2$ integers $w_1, \ldots, w_n, c$, and $a$, is there a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} w_i \leq c$ and $\sum_{i \in S} w_i \geq a$? By setting $c = a = \frac{1}{2}\sum_{i=1}^{n} w_i$, any instance of the Partition Problem can be polynomially transformed into an instance of the recognition version of the Subset-Sum Problem. Thus, the Subset-Sum Problem is NP-hard. And since the Subset-Sum Problem is the particular case of the 0-1 Knapsack Problem in which $p_i = w_i$ for $i \in \{1, \ldots, n\}$, the 0-1 Knapsack Problem is NP-hard. □

In the next section, we provide an algorithm to solve these problems. Note that numerous methods have been developed to solve instances of the 0-1 Knapsack Problem. The algorithm presented here is a branch-and-bound algorithm; it is used because it helps us converge to an optimal solution more efficiently, without always having to enumerate all possible solutions, and because it exhibits several attributes that the 0-1 Knapsack Problem possesses.

1.1 Branch-and-Bound Algorithm

1.1.1 Introduction

Since knapsack problems are NP-hard, no known exact solution technique avoids, in the worst case, checking an exponential number of candidate solutions to see which is optimal. However, we can reduce the computational time by using a branch-and-bound algorithm. A branch-and-bound algorithm is a search method in which one has a way of computing a bound on the optimal value of an instance of the problem and a way of dividing the feasible region into smaller subproblems; the problem is then analyzed in a systematic manner. A branch-and-bound algorithm is basically a complete enumeration, except that we identify nodes that cannot lead to an improved solution, thus reducing the computational time. Before proceeding with the description of the algorithm, let us first define some terms.

Definition 1.1 The efficiency $e_i$ of a knapsack item $i$ is $p_i / w_i$, where $p_i$ is the profit of item $i$ and $w_i$ is the weight of item $i$.

Definition 1.2 The profit sum $\bar{p}_i$ and the weight sum $\bar{w}_i$ of the items up to $i$ are defined as

$\bar{p}_i = \sum_{j=1}^{i} p_j$, $\quad \bar{w}_i = \sum_{j=1}^{i} w_j$, $\quad i = 0, \ldots, n$.

Definition 1.3 When we order the knapsack items from greatest efficiency to smallest efficiency (i.e., $e_i > e_j$ when $i < j$), the break item $b$ is the first item that does not fit into the knapsack, that is, $b = \min\{i : \bar{w}_i > c\}$. The break solution $x'$ is obtained by packing the items greedily in this order up to the break item: $x'_i = 1$ for $i < b$ and $x'_i = 0$ for $i \geq b$.

1.1.2 Pisinger's Algorithm

Pisinger's algorithm [15] starts from the break solution and searches outward from the break item. At each stage it maintains two indices $s$ and $t$ with $s < b \leq t$, together with the current profit sum $\bar{p}$ and weight sum $\bar{w}$. If $\bar{w} \leq c$, we will insert an item $i \geq t$ into the knapsack; if $\bar{w} > c$, we will remove an item $i \leq s$ from the knapsack. If we inserted an item $i$, set $t = i + 1$; if we removed an item $i$, set $s = i - 1$. Doing this assures that we only insert items after $i$ or remove items before $i$.
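Before turning to the bounding rule, the following Python sketch (our own illustration; the names are not Pisinger's) carries out the preprocessing just described: it orders the items by efficiency, locates the break item $b$, and builds the break solution $x'$ together with its profit and weight sums.

def break_solution(p, w, c):
    """Order the items by efficiency, find the break item
    b = min{i : w_bar_i > c}, and build the break solution x'
    (everything expressed in the sorted order)."""
    items = sorted(zip(p, w), key=lambda pw: pw[0] / pw[1], reverse=True)
    n = len(items)
    b = n + 1                      # sentinel: every item fits, no break item
    w_prefix = 0
    for i, (_, wi) in enumerate(items, start=1):
        w_prefix += wi             # running weight sum w_bar_i
        if w_prefix > c:
            b = i                  # first item that does not fit
            break
    x_prime = [1 if i < b else 0 for i in range(1, n + 1)]
    p_bar = sum(items[i][0] for i in range(b - 1))
    w_bar = sum(items[i][1] for i in range(b - 1))
    return items, b, x_prime, p_bar, w_bar

# Data of Example 1.1: the efficiency order is items 3, 5, 1, 2, 4, 6, the break
# item is the fourth in that order (original item 2), and the break solution
# packs items 3, 5 and 1, so x' = (1, 1, 1, 0, 0, 0) with p_bar = 24, w_bar = 18.
print(break_solution([10, 10, 5, 2, 9, 1], [8, 9, 3, 5, 7, 3], 20))

The search described next starts from this break solution.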
We backtrack in the algorithm if the upper bound $u$ does not exceed $z$, where $z$ is the profit sum of the best solution found so far, i.e., if

$u = \left\lfloor \bar{p} + (c - \bar{w})\frac{p_t}{w_t} \right\rfloor \leq z$, when $\bar{w} \leq c$,   (1.5)

$u = \left\lfloor \bar{p} + (c - \bar{w})\frac{p_s}{w_s} \right\rfloor \leq z$, when $\bar{w} > c$.   (1.6)

That is, if our upper bound $u$ does not exceed $z$, we cannot improve $z$ on the current branch and we must backtrack to try to improve $z$ elsewhere. A problem occurs if no bound stops the branching: $s$ may drop below 1 or $t$ may grow above $n$. To prevent this from happening, we introduce stop items 0 and $n+1$, where $(p_0, w_0) = (1, 0)$ and $(p_{n+1}, w_{n+1}) = (0, 1)$. This ensures that if $s = 0$ or $t = n + 1$, the corresponding upper bound is so bad that it forces us to backtrack.

Throughout the algorithm, we do not need to keep track of the current solution vector $x$; we only keep track of the items $i$ that differ from the break solution $x'$, i.e., the items where $x_i \neq x'_i$. We add these items to an exception list $E$ each time an improved solution is found. That is, if an improved solution is found after adding an item $i$, we set $E = E \cup \{i\}$, which means that the solution vector $x$ differs from $x'$ by including item $i$. Similarly, $E = E \cup \{i\}$ if the improved solution was found after removing item $i$. Thus, at the end of the algorithm, we have the list $E$ that tells us which entries of $x'$ to flip. If we find an optimal solution $x$, no bounding will stop the iteration; the process will only stop if one of the following occurs:

$E = \emptyset$,   (1.7)

or $\bar{w} > c$ and $i \in E \neq \emptyset : i \geq b$,   (1.8)

or $\bar{w} \leq c$ and $i \in E \neq \emptyset : i < b$.   (1.9)

Example 1.2 applies the algorithm to a small six-item instance. Starting from the break solution $x'$, the algorithm alternately inserts and removes items; at the first stage the weight sum satisfies $\bar{w} > c$, as shown in Figure 1. Continuing in this manner, we find that our optimal solution is $z = 25$.

[Figure 1: Branching tree for Example 1.2]

In order to find our optimal solution vector $x$, we look at our set $E$ and change $x'$ accordingly. Figure 1 shows that $E = \{2, 4\}$: the 2 records that knapsack item 2 was removed and the 4 records that knapsack item 4 was added, as indicated in Figure 1. Thus, our optimal solution vector is $x = (1,0,1,1,0,0)$. Note that the upward-pointing arrows in Figure 1 mark the points at which we check whether an improved solution has been found. Also notice that after initially adding item $t = 4$, then removing item $s = 3$, and finally adding item $t = 5$ to the knapsack, the bound $u$ does not exceed the current $z$, so we must backtrack the algorithm before eventually reaching the optimal solution found above.

Although this algorithm helps us converge to an optimal solution efficiently, we may often come across problems for which a complete enumeration of the branching tree is needed. This is due to the fact that the knapsack problem is NP-hard.
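To tie the pieces of this section together, here is a compact Python sketch of a depth-first branch-and-bound solver in the spirit of the discussion above. It is a simplified illustration rather than Pisinger's actual algorithm: it branches on items in efficiency order and prunes with the linear-relaxation bound that underlies (1.5), and all function names are our own.

def branch_and_bound_knapsack(profits, weights, c):
    """Simplified depth-first branch-and-bound for the 0-1 Knapsack Problem."""
    n = len(profits)
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    p = [profits[i] for i in order]     # items in decreasing efficiency order
    w = [weights[i] for i in order]
    best = [0]                          # incumbent value z

    def upper_bound(k, profit, weight):
        # Greedy linear-relaxation bound (cf. (1.5)): fill the remaining capacity
        # in efficiency order, then add a fractional share of the item that no longer fits.
        for i in range(k, n):
            if weight + w[i] <= c:
                profit += p[i]
                weight += w[i]
            else:
                return profit + (c - weight) * p[i] / w[i]
        return profit

    def search(k, profit, weight):
        if weight > c:
            return                                   # infeasible branch
        if profit > best[0]:
            best[0] = profit                         # improved solution found
        if k == n or upper_bound(k, profit, weight) <= best[0]:
            return                                   # bound cannot beat z: backtrack
        search(k + 1, profit + p[k], weight + w[k])  # branch: take item k
        search(k + 1, profit, weight)                # branch: skip item k

    search(0, 0, 0)
    return best[0]

# On the instance of Example 1.1 this returns the optimal value 25.
print(branch_and_bound_knapsack([10, 10, 5, 2, 9, 1], [8, 9, 3, 5, 7, 3], 20))

For brevity the sketch returns only the optimal value $z$; bookkeeping such as the exception list $E$ used above to recover the solution vector is omitted, and the full expanding-core machinery of Pisinger's algorithm [15] is likewise beyond this illustration.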
2 Cryptosystems

Included among the many applications of the knapsack problem is the ability to create cryptosystems. Some of the earliest public-key cryptosystems were formed from the knapsack problem, since it is NP-hard and was believed to be computationally difficult to solve. With the creation of these cryptosystems also came people who wanted to break them. In this section, we discuss the earliest knapsack cryptosystem, created by Ralph C. Merkle and Martin E. Hellman [14] in 1978. We then discuss how, a few years later, Adi Shamir [16] broke the basic Merkle-Hellman knapsack cryptosystem, which led to the fall of knapsack cryptosystems.

2.1 Basic Merkle-Hellman Knapsack Cryptosystem

The basic idea of the Merkle-Hellman cryptosystem is to create a knapsack problem which appears to be hard to solve but in actuality is easy to solve. Merkle and Hellman refer to this special knapsack as a trapdoor knapsack.

2.1.1 Creating a Trapdoor Knapsack

First, a private super-increasing set of nonnegative integers $S = \{s_i : 1 \leq i \leq n\}$ is created, where for all $i$

$s_i > \sum_{j=1}^{i-1} s_j$.

Then, for any knapsack problem of the form $\sum_{i=1}^{n} s_i x_i = e$, where $e \in \mathbb{Z}^+$ and $x_i \in \{0,1\}$, the solution can be read off greedily from the largest weight down: for $i = n, n-1, \ldots, 1$, we have $x_i = 1$ if and only if $e - \sum_{j=i+1}^{n} s_j x_j \geq s_i$. Note that $n$ has to be large enough that this structure does not give the system away; if $n$ were small, one could simply enumerate all possible solutions and find the correct one.

Example 2.1 Let $S = \{40, 116, 215, 458, 982, 2024\}$ be a super-increasing set. Find the solution to

$40x_1 + 116x_2 + 215x_3 + 458x_4 + 982x_5 + 2024x_6 = 2279$,

where $x_i \in \{0,1\}$ for $1 \leq i \leq 6$. Working from the largest weight down:

$2279 \geq 2024 \Rightarrow x_6 = 1$;
$2279 - 2024 = 255$ is less than 982 and 458, so $x_5 = x_4 = 0$;
$255 \geq 215 \Rightarrow x_3 = 1$;
$255 - 215 = 40$ is less than 116, so $x_2 = 0$; and
$40 \geq 40 \Rightarrow x_1 = 1$.

Therefore, the only solution to the knapsack problem is $(x_1, \ldots, x_6) = (1, 0, 1, 0, 0, 1)$.

In order to produce a trapdoor knapsack problem, one must choose a decryption pair $(m, w)$, which is kept private, where

$m > \sum_{i=1}^{n} s_i$,   (2.1)

$2^{dn} \leq m \leq 2^{dn+1}$,   (2.2)

where $d \geq 1$ is called the expansion factor, $1 \leq w < m$, and $\gcd(w, m) = 1$, so that $w$ has a multiplicative inverse $w^{-1}$ modulo $m$. The public knapsack set $A = \{a_i : 1 \leq i \leq n\}$ is then formed by setting $a_i \equiv w s_i \pmod{m}$. Because of the multiplication by $w$, the public set no longer looks super-increasing, so a knapsack problem over $A$ appears hard to solve. To send an $n$-bit message $x = (x_1, \ldots, x_n)$, the sender computes the ciphertext $e^* = \sum_{i=1}^{n} a_i x_i$ from the public set $A$. To decrypt, the receiver computes $e^* w^{-1} \equiv \sum_{i=1}^{n} s_i x_i \pmod{m}$; by (2.1) this congruence determines the subset sum exactly, and the easy super-increasing knapsack can then be solved as in Example 2.1.

As an illustration, suppose Alice builds a trapdoor knapsack from the super-increasing set of Example 2.1 with a private decryption pair $(m, w)$ and publishes the resulting set $A$, and suppose Bob uses $A$ to encrypt the message $e = 110010$, sending the ciphertext $e^*$ to Alice. To decrypt, Alice computes $e^* w^{-1} \pmod{m}$, obtaining $40 + 116 + 982 = 1138$, and solves the easy knapsack:

$1138 > 982 \Rightarrow x_5 = 1$;
$1138 - 982 = 156 > 116 \Rightarrow x_2 = 1$;
$156 - 116 = 40 \Rightarrow x_1 = 1$.

Thus, Alice finds that the message is $e = 110010$, which is the message that Bob sent. Therefore, the encryption and decryption were successful.

For their cryptosystem to be secure, Merkle and Hellman suggested that the knapsack should contain at least $n = 100$ items, that $m$ be chosen uniformly from $[2^{2n+1} + 1, 2^{2n+2} - 1]$, that $s_1$ be chosen uniformly from $[1, 2^n]$, $s_2$ uniformly from $[2^n + 1, 2 \cdot 2^n]$, $s_3$ uniformly from $[3 \cdot 2^n + 1, 4 \cdot 2^n]$, and $s_i$, for $4 \leq i \leq n$, uniformly from $[(2^{i-1} - 1)2^n + 1, 2^{i-1} \cdot 2^n]$. This method ensures that $m$ is still greater than the sum of the $s_i$'s. To make the system more secure, one could also apply a permutation $\sigma$ to the set $A$ in order to hide the correspondence with the easy knapsack weights; $\sigma$ would then also be part of the trapdoor information.

Not long after Merkle and Hellman created their knapsack cryptosystem, Adi Shamir [16] broke the cryptosystem, which started a chain reaction leading to the failure of various knapsack cryptosystems.
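Before moving on to the attack, the construction above is easy to experiment with. The following self-contained Python sketch (our own illustration, not part of the thesis) builds a toy trapdoor knapsack from the super-increasing set of Example 2.1; the modulus m = 4001 and multiplier w = 588 are arbitrary small choices satisfying (2.1) and are far too small to be secure. It encrypts the message 110010 from the worked example and decrypts it with the greedy rule.

# Toy Merkle-Hellman trapdoor knapsack (illustrative parameters only).
S = [40, 116, 215, 458, 982, 2024]      # private super-increasing set (Example 2.1)
m, w = 4001, 588                        # private pair: m > sum(S), gcd(w, m) = 1
w_inv = pow(w, -1, m)                   # modular inverse of w (Python 3.8+)
A = [(w * s) % m for s in S]            # public knapsack set

def encrypt(bits):
    """Ciphertext e* = sum of the public weights selected by the message bits."""
    return sum(a for a, x in zip(A, bits) if x)

def solve_superincreasing(S, target):
    """Greedy solution of an easy knapsack: scan weights from largest to smallest."""
    x = [0] * len(S)
    for i in range(len(S) - 1, -1, -1):
        if target >= S[i]:
            x[i] = 1
            target -= S[i]
    return x if target == 0 else None

def decrypt(cipher):
    """Multiply by w^{-1} mod m to recover the easy knapsack, then solve it."""
    return solve_superincreasing(S, (cipher * w_inv) % m)

message = [1, 1, 0, 0, 1, 0]            # the message e = 110010 from the worked example
cipher = encrypt(message)
assert decrypt(cipher) == message       # round trip succeeds

The assertion reproduces the decryption steps of the worked example: multiplying the ciphertext by $w^{-1}$ modulo $m$ yields $40 + 116 + 982 = 1138$, and the greedy rule then recovers the bits $x_5$, $x_2$, and $x_1$.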
2.2 Attack on the Basic Merkle-Hellman Cryptosystem

Without the trapdoor information $m$, $w^{-1}$, and $S$, Merkle and Hellman thought that it would be difficult to crack this cryptosystem, and since solving general knapsack problems is hard, they believed that their system was secure. However, in 1981 Adi Shamir [16] discovered a strong attack on the basic Merkle-Hellman cryptosystem. The object of this attack is to find a decryption pair $(w^*, m^*)$ such that $\frac{w^*}{m^*}$ is sufficiently close to $\frac{w^{-1}}{m}$. As we will see, this is possible because any basic knapsack cryptosystem has infinitely many decryption pairs. The following algorithm highlights the attack that Shamir [16] provided; in his paper, Lagarias [12] outlined the performance of Shamir's attack on the cryptosystem and considered all of its steps. Let us refer to this algorithm as Algorithm S, as Lagarias does.

Step 1. Estimate the modulus $m$ by

$\tilde{m} = \max_{1 \leq i \leq n} a_i$   (2.6)

and estimate the expansion factor $d$ by

$d^* = \frac{1}{n}\log_2(n^2 \tilde{m})$.   (2.7)

Step 2. Set

$g = \max\{d^* + 2, 5\}$.   (2.8)

In this step, we want to find the $a_i$ in the public set $A$ that correspond to the $g$ smallest super-increasing elements of the private set $S$. Since in most cases the person creating the cryptosystem will use a permutation $\sigma$ on the set $A$ in order to hide the order of the elements, we must run the rest of the algorithm up to $\binom{n}{g}$ times, until a solution is found.

Step 3. Solve the integer program

$|k_i a_1 - k_1 a_i| \leq b$, $\quad 2 \leq i \leq g$,   (2.9)

$1 \leq k_1 \leq b - 1$,   (2.10)

where

$b = \lfloor 2^{-n+g}\, \tilde{m} \rfloor$.   (2.11)

If a solution $(k_1^{(0)}, \ldots, k_g^{(0)})$ is found, we create and attempt to solve two new integer programs by replacing (2.10) with the constraints $1 \leq k_1 \leq k_1^{(0)} - 1$ and $k_1^{(0)} + 1 \leq k_1 \leq b - 1$, respectively, so that all solutions of the integer program can eventually be found. Each solution determines a narrow interval of candidate ratios $\frac{w^*}{m^*}$ close to $\frac{w^{-1}}{m}$; a trial pair $(w^*, m^*)$ taken from such an interval is accepted as a decryption pair if it transforms the public weights $a_i$ into values $s_i^*$ that form a super-increasing set with $m^* > \sum_{i=1}^{n} s_i^*$.

Returning to the example of Section 2.1.1, suppose an eavesdropper Eve intercepts the ciphertext $e^*$ and carries out this search. She finds a trial pair whose recovered values $s_i^*$ are super-increasing and satisfy $m^* > \sum_{i=1}^{n} s_i^*$; therefore, Eve has found a decryption pair. Now Eve is able to decrypt $e^*$ by calculating $e^* w^{*-1}_{30764} \pmod{m^*_{30764}}$, which is 693805651968. Thus,

$693805651968 > 628250247612 \Rightarrow x_5 = 1$,
$693805651968 - 628250247612 = 65555404356 > 48746326316 \Rightarrow x_3 = 1$,
$65555404356 - 48746326316 = 16809078040 \Rightarrow x_1 = 1$.

Therefore, Eve finds that the message Bob was trying to send to Alice was $e = 110010$.

2.2.1 Rationale for Breaking the Merkle-Hellman Cryptosystem

We will now discuss the rationale for Algorithm S. Notice that the decryption congruence, for $1 \leq i \leq n$,

$w^{-1} a_i \equiv s_i \pmod{m}$   (2.18)

is equivalent to

$w^{-1} a_i - m t_i = s_i$   (2.19)

for some $t_i \in \mathbb{Z}^+$. Multiplying (2.19) by $\frac{1}{m a_i}$ gives

$\frac{w^{-1}}{m} - \frac{t_i}{a_i} = \frac{s_i}{m a_i}$.   (2.20)

Then, subtracting (2.20) from the same equation with $i = 1$ and multiplying this difference by $a_1 a_i$, we get

$t_i a_1 - t_1 a_i = \frac{1}{m}(s_1 a_i - s_i a_1)$.   (2.21)

Since we know from (2.1) that $m > s_1 + s_2 + \cdots + s_n$, each quantity $\frac{s_i}{m a_i}$ in (2.20) is small, so the ratios $\frac{t_i}{a_i}$ closely approximate the unknown ratio $\frac{w^{-1}}{m}$.
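Relation (2.20) is easy to check numerically. The sketch below (again our own illustration, reusing the toy parameters of the earlier sketch rather than realistic ones) computes each $t_i$ from (2.19) and verifies that $\frac{w^{-1}}{m} - \frac{t_i}{a_i} = \frac{s_i}{m a_i}$ holds exactly; this quantity is already small here and astronomically small for parameters of realistic size, and the resulting closeness of every $t_i / a_i$ to $w^{-1}/m$ is precisely what the integer program (2.9)-(2.11) exploits.

from fractions import Fraction

# Toy trapdoor parameters (same illustrative values as in the earlier sketch).
S = [40, 116, 215, 458, 982, 2024]
m, w = 4001, 588
w_inv = pow(w, -1, m)
A = [(w * s) % m for s in S]

target = Fraction(w_inv, m)                  # the unknown ratio w^{-1}/m
for i, (a, s) in enumerate(zip(A, S), start=1):
    t = (w_inv * a - s) // m                 # t_i from (2.19): w^{-1} a_i - m t_i = s_i
    assert w_inv * a - m * t == s            # (2.19) holds exactly
    gap = target - Fraction(t, a)            # left-hand side of (2.20)
    assert gap == Fraction(s, m * a)         # (2.20) holds exactly
    print(f"i={i}: t_i/a_i = {t}/{a}, gap = s_i/(m a_i) = {float(gap):.6f}")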
5 References

[2] R. Bellman, Some applications of the theory of dynamic programming - a review, Operations Research, vol. 2, 1954, pp. 275-288.
[3] E. F. Brickell, Breaking Iterated Knapsacks, Advances in Cryptology - Proc. CRYPTO 84, Springer-Verlag, Berlin, 1985, pp. 342-358.
[4] B. Chor and R. Rivest, A knapsack type cryptosystem based on arithmetic in finite fields, IEEE Trans. Inform. Theory, vol. IT-34, no. 5, 1988, pp. 901-909.
[5] G. B. Dantzig, Discrete variable extremum problems, Operations Research, vol. 5, 1957, pp. 266-277.
[6] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.
[7] N. Howgrave-Graham and A. Joux, New generic algorithms for hard knapsacks, EUROCRYPT 2010, 2010, pp. 235-256.
[8] D. S. Johnson, Approximation algorithms for combinatorial problems, Journal of Computer and System Sciences, vol. 9, 1974, pp. 256-278.
[9] R. Kannan, Improved Algorithms for Integer Programming and Related Lattice Problems, Proc. 15th Annual ACM Symposium on Theory of Computing, 1983, pp. 193-206.
[10] R. M. Karp, Reducibility among combinatorial problems, in R. E. Miller and J. W. Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York, 1972, pp. 85-103.
[11] P. J. Kolesar, A branch and bound algorithm for the knapsack problem, Management Science, vol. 13, 1967, pp. 723-735.
[12] J. C. Lagarias, Performance Analysis of Shamir's Attack on the Basic Merkle-Hellman Knapsack Cryptosystem, Proc. 11th International Colloquium on Automata, Languages and Programming (ICALP), Lecture Notes in Computer Science, vol. 172, Springer-Verlag, Berlin, 1984, pp. 312-323.
[13] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations, Wiley, Chichester, England, 1990.
[14] R. C. Merkle and M. E. Hellman, Hiding Information and Signatures in Trapdoor Knapsacks, IEEE Trans. Inform. Theory, vol. 24, 1978, pp. 525-530.
[15] D. Pisinger, Algorithms for Knapsack Problems, Ph.D. Thesis, DIKU, University of Copenhagen, Report 95/1, 1995.
[16] A. Shamir, A Polynomial Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem, IEEE Trans. Inform. Theory, vol. IT-30, 1984, pp. 699-704.
[17] A. Tucker, Applied Combinatorics (5th ed.), John Wiley & Sons, 2007.
[18] S. Vaudenay, Cryptanalysis of the Chor-Rivest Cryptosystem, Journal of Cryptology, vol. 14, 2001, pp. 87-100.