Abstract
The 0-1 knapsack problem is one of the classical NP-hard problems and has been one of the hot research topics in the last few decades. It offers many practical applications in optimization and applied mathematics, such as project selection, resource distribution, investment decision-making and so on. Simultaneity in previous studies DNA molecular computation usually be used to solve NP-complete continuous path search problems (for example HPP, traveling salesman problem), rarely for NP problems with discrete solutions result, such as the 0-1 knapsack problem, graph coloring problem. In this paper, we present a new algorithm for solving the 0-1 knapsack problem with DNA molecular operations. For 0-1 knapsack problem with n distinct items, weight capacity C and total profits W, we reasonably design fixed length DNA strands representing items, take appropriate steps and get the solutions of the problem in proper length range using O(n + C + W) time. We extend the application of DNA molecular operations and simultaneity simplify the complexity of the computation.
Introduction
DNA biological computing is a newly emerging crossdisciplinenary science that uses DNA molecular biotechnologies to solve conundrum problems of computer science and computational mathematics. DNA computing can execute billions of operations simultaneously. The massive parallelism of DNA computing comes from the large number of molecules which chemically interact in a small volume. DNA also provides a huge storage capacity since they encode information on the molecular scale. Meanwhile DNA has a great application prospect for having wide range of abundant resources. NP (nondeterministic polynomial time) problems are a class of mathematical problems which have most likely exponential complexity, for which no efficient solution has been found yet [1]. DNA computing has the advantages of parallel computing, huge storage capacity and low energy consumption. Adleman [2], Liption [3] and Sanches [41] had proved that DNA computing can solve complex NP problems with polynomial time complexity, comparing to exponential complexity of other classical algorithms [43]. As the pioneering work for DNA computing, Adleman [2] presented an idea of solving the Hamiltonian path problem of size n in O(n) steps using DNA molecules. Lipton [3] demonstrated that Adleman’s experiment could be used to figure out the NP-complete satisfiability problem. In recent years, lots of papers have occurred for designing DNA procedures and algorithms to solve various NP-complete problems [4–30]. However, most of the previous works in DNA computing are concentrated on solving the path search problems that the optimum results are continuous head-to-tail ligation edges or vertices sets. For example, Lee [11] first designs different length’s strands representing paths values and cities, takes molecular operations to generate strands standing for all possible paths, then uses biochemical techniques, such as denaturation temperature gradient polymerase chain reaction and temperature gradient gel, to get the optimum solutions of the traveling salesman problem. To solve the shortest path problem, Narayanan [12] respectively carries out DNA reaction to get the strands for a list of series paths, chooses the shortest length strands as the solution through DNA biotechnologies. The previous researches have some insufficient factors. One is that the strands for the possible paths are usually very long, while too long DNA strands can lead to error-prone in annealing and separation procedures using modern biotechniques. The other is that previous research problems are optimum path search ones, so that the possible solutions can be relatively easily represented by DNA strands, while the 0-1 knapsack problem is a discrete mathematics problem with discontinuous solutions. So representation of discrete data with DNA strands is an important issue toward expanding the capability of DNA computing to solve many optimization problems.
The 0-1 knapsack problem is a problem of central importance in optimization and computational sciences. The 0-1 knapsack problem is NP-hard [13], and it is considered unlikely to have an efficient algorithm for solving it up to now. Zou [36], with the aid of two important operations: position updating and genetic mutation with a small probability, proposes a novel global harmony search algorithm to solve 0-1 knapsack problems. Captivo [37] examines the performances of a new labeling algorithm to 4nd all the efficient paths of the bicriteria 0-1 knapsack problem. Zhang [38] introduced an amoeboid organism model to solve the problem. It has three steps: First, the 0-1 knapsack problem is converted into a directed graph by the network converting algorithm. Then, for the purpose of using the amoeboid organism model, the longest path problem is transformed into the shortest path problem. Finally, the shortest path problem can be well handled by the amoeboid organism algorithm. There are n different items available and each item i has a weight wi and a profit pi. The knapsack can hold weight sum at most C. The problem is to find an optimal set of items so as to maximize the total profits subject to the knapsacks weight capacity. The profits, weights, and capacities are positive integers. Let xi (1 ≤ i ≤ n) be binary variables given as follows:
The 0-1 knapsack problem can be mathematically formulated as follows:
The binary decision variables xi are used to indicate whether item i is included in the knapsack or not. For instance, eg. 1 is as:
The rest of this paper is organized as follows. In Section 2, the Adleman-Lipton model is introduced in detail. Section 3 uses a DNA molecular algorithm for solving the 0-1 knapsack problem and the complexity of the proposed algorithm is described in Section 4. We get conclusions in Section 5.
The Adleman-Lipton model
DNA molecular computers work at the molecular level. Because biological and mathematical operations have some similarities, DNA, the genetic material that encodes for living organisms, is stable and predictable in its reactions and can be used to encode information for mathematical systems.
DNA is the major information storage molecule in living cells, and billions of years of evolution have tested and refined both this wonderful informational molecule and highly special enzymes that can either duplicate the information in DNA molecules or transmit this information to other DNA molecules.
A DNA (deoxyribonucleic acid) is a polymer, which is strung together from monomers called deoxyribonucleotides [14]. Distinct nucleotides are detected only with their bases. Those bases are, respectively, abbreviated as adenine (A), guanine (G), cytosine (C), and thymine (T). Two strands of DNA can form (under appropriate conditions) a double strand, if the respective bases are the Watson-Crick complements of each other: A matches T and C matches G; also 3′end matches 5′ end, e.g., the singled strands 5′CTGCAGTACACC3′ and 3′GACGTCATGTGG5′ can form a double strand. We also call the strand 3′GACGTCATGTGG5′ as the complementary strand of 5′CTGCAGTACACC3′. The length of a single stranded DNA is the number of nucleotides comprising the single strand. Thus, if a single stranded DNA includes 20 nucleotides, it is called a 20mer.
The DNA operations proposed by Aldeman [2] and Lipton [3] are described below. These operations will be used for figuring out solutions of the unbalanced assignment problem in this paper. In the Adleman-Lipton model: a tube is a set of molecules of DNA. Given a tube, we can perform the following operations [31–33]: Merge (T1, T2): for two given test tubes T1 and T2, it stores the union T1 ∪ T2 in T1 and leaves T2 empty; Copy (T1, T2): for a given test tube T1, it produces a test tube T2 with the same contents as T1; Detect (T): given a test tube T, it outputs “yes” if T contains at least one strand, otherwise, outputs “no”; Separation (T1, X, T2): for a given test tube T1 and a given set of strings X, it removes all single strands containing a string in X from T1, and produces a test tube T2 with the removed strands; Selection (T1, L, T2): for a given test tube T1 and a given integer L, it removes all strands with length L from T1, and produces a test tube T2 with the removed strands; Sort (T1, T2, T3): for a given test tube T1, it choose the shortest length strands in the tube T2, the longest strands in T3 and the remaining strands in T1; Annealing (T): for a given test tube T, it produces all feasible double strands in T. The produced double strands are still stored in T after annealing; Denaturation (T): for a given test tube T, it dissociates each double strand in T into two single strands; Ligation (T): for a given tube T, the operation is used to ligate together the strands in T; Discard (T): for a given test tube T, it discards the tube T; Read (T): for a given tube T, the operation is used to describe a single molecule, which is contained in the tube T. Even if T contains many different molecules each encoding a different set of bases, the operation can give an explicit description of exactly one of them; Append-tail (T, z): for a given test tube T and a given DNA singled strand zit appends z onto the end of every strand in the tube T.
Since these twelve manipulations are implemented with a constant number of biological steps for DNA strands [34, 40– 42], we assume that each manipulation is in O(1) time complexity.
DNA algorithm for the 0-1 knapsack problem
Thinking process
In the following, the symbols #,
Detailed DNA algorithm
For a graph with n items, every possible subset of the items {x1, x2, …, xn} can be expressed by a n-digit binary number. A bit set to 1 represents the item in the subset, while a bit set to 0 represents the item out of the subset. For example, in eg. 1, the items subset {x2, x4, x5} can be expressed by the binary number 01011. In this way, we transform all possible subsets in a n-item knapsack problem into an ensemble of all n-digit binary numbers. We call this as datapool.
(1) We choose all possible subsets of items. (1-1) Merge (P,Q); (1-2) Aneealing (P); (1-3) Ligation (P); (1-4) Denaturation (P); (1-5) Separation (P, {# A1}, T1); (1-6) Discard (P); (1-7) Separation (T1, {B
n
#}, P).
After the above seven steps of manipulations, the singled strands in tube P will encode all possible subsets of items. For example, in eg. 1, we have singled strands:
They denote the subset of items {x1, x2, x4} corresponding to the binary number 11010. This operation can be finished in O(1) steps since each manipulation above works in O(1) steps.
(2) Each singled strand in tube P denotes one possible item subset after step (1). However the 0-1 knapsack problem requires that the sum of weight is not more than the given capacity C. If xi is selected in a subset, we should append the strands ci at the end of previous strands. For example, in eg. 1, the singled strands # A11B1A20B2A30B3A41B4A51B5 # (representing the subset of items {x1, x4, x5}) should be appended with stands c1, c4, c5. Only so, we could choose all eligible items subsets in the next step. For k = 1 to k = n (2-1) Separation (P, {A
i
1B
i
}, T2); (2-2) Append (T2, c
i
); (2-3) Merge (P, T2); (2-4) Discard (T2). End for
After the above operations, the singled strands in tube P are all possible items subsets with denoting weight stands. Meanwhile we use one “For” clauses, thus this step can be finished in O(n) steps since each single manipulation above works in O(1) steps.
(3) The solution of 0-1 knapsack problem should be a items subset which satisfies the sum of weight is not more than given capacity C. So we should check all the items subsets whether to satisfy the above restrictive conditions. At the same time, in order not to affect the profits measurement, we should make the eligible strands at the same length before the next step. This is done by the following operations: For i = 1 to i = C (3-1) Selection (P, (3n + 2)t + i,T3); (3-2) If (Detect (T3) = “Yes”)
Then (3-3) to (3-5)
(3-3) Append (T3, X); (3-4) Merge ( P,T3); (3-5) Discard (T3);
End for
(3-6) Selection (P, (3n + 2)t + C,T4);
(3-7) Discard (P);
(3-8) Copy (T4, P);
For the eg. 1, the singled strands
In the above operation, we use one “For” clause, thus this operation can be executed in O(C) steps since each single manipulation above works in O(1) steps.
(4) The 0-1 knapsack problem is to find an optimal subset of items so as to maximize the total profits subject with the knapsacks weight capacity restriction. In order to denote the total profits of the items subsets, we append strands Yi at the end of strands if xi is selected in the subset. Taking the eg. 1 for example, the singled strands in P
This is done by the following manipulations: For i = 1 to i = n (4-1) Separation (P, {A
i
1B
i
}, T5); (4-2) Append (T5, Y
i
); (4-3) Merge (P,T5); (4-4) Discard (T5).
End for
In the above operations, we use one “For” clause, thus this operation can be finished in O(n) steps since each single manipulation above works in O(1) steps.
(5) We take out the singled strands in P with maximum length, which represent the solutions of 0-1 knapsack problem. For example, in eg. 1, the singled strands in P with maximum length are
Therefore, solutions of 0-1 knapsack problem for the eg. 1 are {x1, x2, x5} with the profits sum 15. We choose the maximum length items subset from all eligible subsets. For i = W to i = 1
(5-1) Selection (P, (3n + 2)t + C + i, T6);
(5-2) If (Detect (T6) = “Yes”) Break;
End for
In the above operation, W is all items profit sum. We use one “For” clause, the worst conditions is that the algorithm stop at i = 1, thus this operation can be finished less than O(W) steps since each single manipulation above works in O(1) steps. In the above operation, this operation can be finished in O(1) time steps since each single manipulation above works in O(1) steps. Finally the “Read” operation is applied to giving the final solutions to the unbalanced assignment problem.
(6) Finally the “Read” operation is used to get the exact solutions of the 0-1 knapsack problem. For example, for the eg. 1, the maximum profits items subset is {x1, x2, x5}. This operation works in O(1) steps.
(6-1) Read (T6).
The complexity and feasibility of the proposed DNA algorithm
The following theorems prove that the algorithm proposed above really can get solutions of the 0-1 knapsack problem in O(n+C+W) steps using DNA molecules operation.
After the operations of second step, all the strands in P denoting possible items subset will be appended weight strands to satisfy weight sum restriction. Then the strands can be described:
We find the strands satisfying the sum weight restriction at the third step. Simultaneity in order not to affect the profits measurement at the fourth step, we make the eligible strands keep the same length by append strands X
So we define S as the strands after the Step (4). Then S can be described:
The number of appending X times is decided by the existing items weight information on the strands. Due to we get the eligible items set strands satisfying the weight sum restriction at the same length, so we append the profits information strands at end of previous strands as so to get the optimal result strands. Then the strands can be described:
We reasonably design the length
So the length of strands which denote containing eligible items information must be between (3n+2)t+C+1 and (3n+2)t+C+W. The maximal length strands is the solution of the 0-1 knapsack problem. Then we can get the solution in Step (6) in appropriate length range.
So the total time complexity T is as:
In conclusion, we can get the solution of 0-1 knapsack problem with n items in O(n + C+W) time complexity.
Conclusions
In this paper, we present DNA algorithms for solving the 0-1 knapsack problem based on biological operations in the Adleman-Lipton model. The proposed algorithms have two advantages. Firstly, the proposed algorithm actually has a lower rate of errors for hybridization because we generate fixed reasonable DNA sequences for generating the solutions of the problem. Secondly, the proposed algorithms can works in O(n+C+W) steps for the 0-1 knapsack problem with n items, Comparing exponential level time by electronic computer. Correspondingly, Azad M A K [39] use a binary version of the artificial fish swarm algorithm to solve the problem with O(nˆ2) to O(mn+nˆ2); Gao [40] presented a quantum-inspired artificial immune system with O(mnˆ2). With the increasing in the scale of the problem, our proposed DNA computing algorithm has more advantages in operational efficiency compared to other algorithms. In addition, huge storage capacity and low energy consumption of reaction are obvious characteristics of our proposed algorithm. All our results in this paper are based on a theoretical model. However, the ability to perform complex operations in solution might help us learn more about the nature of computation and lead to the development of better DNA based computation, capable of solving a wide range of complex problems.
Copyright
Authors submitting a manuscript do so in the understanding that they have read and agreed to the terms of the IOS Press Author Copyright Agreement posted in the ‘Authors Corner’ on www.iospress.nl.
Quoting from other publications
An author, when quoting from someone else’s work or when considering reproducing figures or tables from a book or journal article, should make sure that he/she is not infringing a copyright. Although in general an author may quote from other published works, he should obtain permission from the holder of the copyright if he wishes to make substantial extracts or to reproduce tables, plates or other figures. If the copyright holder is not the author of the quoted or reproduced material, it is recommended that the permission of the author should also be sought. Material in unpublished letters and manuscripts is also protected and must not be published unless permission has been obtained. Submission of a paper will be interpreted as a statement that the author has obtained all the necessary permission. A suitable acknowledgement of any borrowed material must always be made.
Footnotes
Acknowledgments
It was supported by the Open Research Fund of State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research (Grant Nos. IWHR-SKL-201606, 2016TS07, 2016CG04) National Key Research and Development Program of China (grant No. 2017YFC0405501), and the Research and Development Projects on Application Technology of Heilongjiang Province(GZ16B011). The project was financially supported by Natural Science Foundation of Zhejiang Fund Project (Grant Nos. LY13H180012, LY15B070011), National Natural Science Foundation of China (Grant Nos. 21207103, 11501359 and 11601324) and Scientific Project of Wenzhou (Grant Nos. G20110004, C20150007). The project was also financially supported by the Education Department of Guizhou province science and technology talents support programs (KJH KY[2016]088). The project was also supported by National Natural Science Foundation of China (Grant No. 41671431).
