Abstract
The vertex coloring problem is a well-known combinatorial problem that requires each vertex to be assigned a corresponding color so that the colors on adjacent vertices are different, and the total number of colors used is minimized. It is a famous NP-hard problem in graph theory. As of now, there is no effective algorithm to solve it. As a kind of intelligent computing algorithm, DNA computing has the advantages of high parallelism and high storage density, so it is widely used in solving classical combinatorial optimization problems. In this paper, we propose a new DNA algorithm that uses DNA molecular operations to solve the vertex coloring problem. For a simple n-vertex graph and k different kinds of color, we appropriately use DNA strands to indicate edges and vertices. Through basic biochemical reaction operations, the solution to the problem is obtained in the O (kn2) time complexity. Our proposed DNA algorithm is a new attempt and application for solving Nondeterministic Polynomial (NP) problem, and it provides clear evidence for the ability of DNA calculations to perform such difficult computational problems in the future.
Introduction
As early as the 1950s, Feynman, one of the founders of nanotechnology, proposed the idea of manipulating matter on a small scale [1]. However, it was not until 1994 that Adleman achieved molecular-scale calculations [2]. Due to Adleman’s pioneering work, DNA computing has gradually evolved. Adleman firstly used DNA molecules to solve the directed Hamilton path problem (HPP). Adleman’s experiments show that the computation with DNA molecules is highly parallel and computationally intensive. The parallelism of DNA computing enables us to solve Non-deterministic Polynomial (NP)-hard problems with DNA molecules in the time of linear growth, and the time needed to solve these problems on the Turing machine tends to increase exponentially with the scale of the problem.
In this section, we briefly introduce the research progress of DNA computing, the existing algorithms of vertex coloring problem, and summarize the research content of this paper.
Research progress of DNA computing
In 1995, Lipton [3] proposed a DNA computing model to solve SAT problems with DNA computing by constructing contact network diagrams. DNA computing has become a hot topic in the field of science. Remarkable achievements have been made in the field of computing, which make the NP-hard problems become a vital research direction of the DNA computing model. Some typical DNA computing models, such as Adleman-Lipton model [2, 3], sticker model [4], surface-based model [5], hairpin model [6] and self-assembly model [7], have been proposed to promote the development of related research. Based on these models, there are many papers that design DNA programs and algorithms to solve various NP-hard problems [8–29]. In terms of DNA computing modeling, after years of development, scientists in computer, mathematics, biology and other related fields have proposed a variety of models for DNA computing, and these models are generally divided into three categories: one is based on the formalization theory, which abstracts biological operations in DNA computing into formal symbols, and then based on the formal derivation and mathematical modeling, the final calculation results are obtained; the other type of model is based on biological theory, which is to realize all the operations related to DNA computing in the way of biological operation, and obtain the calculation results through the actual operation in the biological laboratory; the third type is the calculation based on the structure and characteristics of DNA molecules model, used to reduce DNA strand operations effectively. To make better use of the power of biological computing, it is necessary to use DNA biological computing to solve more NP-hard problems that have not been involved.
Chai et al. [43] proposed an image encryption algorithm based on chaotic system and deoxyribonucleic acid (DNA) sequence operations. They encoded ordinary images into a DNA matrix and used a two-dimensional Logistic chaotic map to generate chaotic sequences for row cycle permutation (RCP) and column cycle permutation (CCP). Then, the line by line image diffusion method at DNA level was used. Finally, the spread DNA matrix was decoded to get the cipher image. Experimental results and security analysis showed that the algorithm has a good encryption effect and could resist various typical attacks. Liang et al. [44] compared and analyzed the differences between molecular computing and bioinformatics, which are two critical interdisciplinary subjects of molecular and computer. There is a close relationship between them, but they are developing in different directions. The molecules are only a few nanometers in size and inexpensive, making it possible to make DNA chips containing billions or even trillions of switches and modules. Xu et al. [45] used a 3-color graph with 61 vertices to illustrate the ability of the DNA computing model. The experimental results show that when constructing the initial solution space, not only all the graph solutions are found, but also more than 99% of the false solutions are eliminated. Liu et al. [46] classified biological networks according to the types of networks used by several standard computing methods, and introduced the computing methods used by every type of network. They used the advantages of the DNA computing model, such as the construction of non-enumerable initial solution space, partition and combination of subgraphs, parallel biological operation, etc., to solve a graph vertex coloring problem with the computational complexity of O (359).
Algorithms of vertex coloring problem
The vertex coloring problem (VCP) is a famous subject of modern graph theory in discrete mathematics. It has good theoretical research significance and engineering application value, such as scheduling, circuit wiring, frequency arrangement, scheduling and storage, etc. However, the vertex coloring problem is an NP-hard problem [30], and there is no precise algorithm of polynomial time. Therefore, the application of heuristic algorithm to find its approximate solution has become a hot topic for many scholars and experts. The vertex coloring problem requires each vertex to be assigned a corresponding color from given k kinds so that the colors on adjacent vertices are different, and the total number of colors used is minimized. For instance, the undirected graph G in Fig. 1 defines such a problem. And the solution to VCP of Fig. 1 is the color set for {v1, v3} ⟶ “ color1 ", {v2, v6} ⟶ “ color2 ", {v3, v5} ⟶ “ color3 ".

An undirected graph G with 6 vertices and 3 kinds color restriction.
Karp [31], Garey and Johnson [30] proved that the vertex coloring problem is an NP-hard problem in graph theory, so many approximate and exact solutions were proposed. Hertz and De Werra [32] proposed a tabu search algorithm that maintains the division of graph vertices in each iteration. In this algorithm, each block of a partition is given a different color, which does not always guarantee that it is an independent set. Therefore, the algorithm is suitable for solutions that are not necessarily feasible. Caramia and Dell’Olmo [33] proposed a local search algorithm HCD based on tabu search. The basic idea behind HCD is to use the tabu concept instead of expressing the tabu list explicitly. In contrast, the dynamic assignment of the vertices’ priorities in the graph performs the same task, avoiding duplication in subsequent movements of the algorithm. Voudouris and Tsang [34] proposed a local guided search (GLS) method for solving combinatorial optimization problems. Caramia et al. [35] proposed a priority search algorithm called CHECKCOL, which reduces runtime by avoiding unnecessary searches in most areas of the graph. Galiner et al. [36] gave an adaptive algorithm AMACOL to solve the problem of coloring graphs. The adaptive memory algorithm is a hybrid evolutionary heuristic algorithm that uses a central memory to store stable sets derived from the coloring generated in the previous stage of the search. Caramia and Dell’Olmo [37] also proposed a two-stage local search for vertex coloring. The algorithm alternately performs two closely related functions, random and deterministic local search.
Based on the Adleman-Lipton model [2, 3], this paper uses a new parallel DNA algorithm to solve the vertex coloring problem. Its calculation time complexity is O (kn2), compared to the exponential time complexity of other electronic computing algorithms.
The rest of the paper is arranged as follows. Section 2 introduces the background of DNA computing and the Adleman-Lipton model. Section 3 presents the detailed steps of the DNA algorithm to solve the vertex coloring problem. Section 4 gives the proof and complexity of the proposed algorithm. The simulation results of DNA algorithm are shown in Section 5. We conclude in Section 6.
Background of DNA computing and adlema-liption model
An overlap between life sciences and engineering is a striking characteristic of the development of modern science and technology. DNA computing is a new calculation method using biomolecule DNA as a calculation medium and biochemical reaction as a calculation tool. Generally considering, although the ability of the classical digital computer is unassailable when it executes the serial task, DNA computing shows a natural advantage compared with the classical digital computer in solving the problems that all possible solutions should be verified, which exist everywhere. When used to solve the parallel processing in large-size computing, it is of tremendous advantages.
DNA computing is a controllable biochemical reaction process that uses the double helix structure of DNA and the complementary matching rule of bases to encode information, map the information to be calculated into DNA molecular chains, generate various data pools with the help of biological enzymes, and then map the data operation of the original problem into DNA molecular chains in high parallel according to specific rules. Finally, molecular biotechnologies (such as PCR, ultrasonic degradation, affinity chromatography, cloning, mutation, molecular purification, electrophoresis, magnetic separation, etc.) are used to screen the required results. Mathematically speaking, A single strand DNA can be regarded as a string composed of symbols A, C, G and T, which can be represented as a set of four letters to decode and calculate information just like the codes 0 and 1 in an electronic computer. The two different ends of the DNA sequence are named after the 5’and 3’ ends, respectively. As a pair, nucleotides A with T and nucleotides C with G are considered complementary. Two complementary single-stranded DNA sequences of opposite polarity join together to form a double helix during annealing. In the opposite process, a double helix separates into two composed single strands, called melting. Besides, the length of a single-stranded DNA is the number of nucleotides composed of a single strand. So if a single-stranded DNA contains 20 nucleotides, it is called 20mer. The length of double-stranded DNA (where each nucleotide is a base pair) is counted in the number of base pairs.
In the Adleman-Lipton model, one tube has many DNA strands with an alphabet set {A, G, C, T}. Given such tubes, we can perform some operations with the O (1) time complexity [38]. The DNA operations are included but not limited to: Merge, Copy, Separation, Selection, Detect, Ligation, Discard, Annealing, Denaturation, Read and Append - tail. The above operations of specific biological meaning can refer to [39]. Specific enzyme reactions can act as "software" to complete all kinds of information processing. In this way, a new type of computer based on DNA chip can be developed by carrying out rich, precise and controllable chemical reactions on DNA double helix to complete various kinds of operation processes. At least, in theory, it can be proved that DNA computing is universal and can solve all the problems that the Turing machine can solve.
DNA computing algorithm of the vertex coloring problem
Description of the vertex coloring problem
For simplicity of discussion, here is a formal description of the vertex coloring problem: for a simple undirected graph G without self-loops, its vertex set V (G) = {v k |k = 1, 2, …, n} and edge set E (G) = {ei,j|1 ⩽ i ≠ j ⩽ n} are known. The coloring function f is the mapping f from the vertex to the color set C = {1, 2, . . . k}: V (G) → C satisfies: ∀ei,j ∈ E (G) , v i , v j ∈ V (G), then f (u) ≠ f (v); this is the k-vertex coloring problem of the graph. Generally, k is considered to be a constant value. For example, in the commonly studied graph vertex 3 coloring problem, k is 3.
Basic idea of the algorithm
The basic idea of solving the vertex coloring problem is as follows: the DNA chains representing the edges and vertices of the vertex coloring problem are put into the test tube, and the biological chains of various possible coloring schemes are obtained by the biochemical reaction. Then, the biological chains that do not meet the constraint conditions of the problem are screened out, and the color information contained in each feasible chain is recorded by the number of color types, to judge and search for the optimal solution. Specifically, the algorithm is implemented in four steps:
(1) The biological chain corresponding to various coloring schemes of vertex coloring problem is generated;
(2) The biological chains that do not meet the feasible solution conditions are deleted;
(3) The number of colors in each possible solution chain was recorded by adding a biological chain;
(4) Find the optimal solution.
The flow chart of our proposed algorithm is shown in Fig. 2.

The flow chart of proposed algorithm.
In this paper, we use the symbols
T1 = {1, ⋯ , k, # A1, B n # , Bt-1A t |t = 2, ⋯ , n},
T3 = {yi,j|1 ⩽ i < j ⩽ n, ei,j ∈ E},
T4 = {X} .
Algorithm steps
Procedure for production of all possible vertex coloring sets. Filter and exclude combination chains with the same color in adjacent vertices. For i = 1 to i = n - 1 For j = i + 1 to j = n Then For t = 1 to t = k End For End for End for Add biological chains to represent the number of colors. For t = 1 to t = k End for Find the optimal solution according to the length of biological chain. For j = 1 to j = k break; Else Continue; End for
Illustrate the steps
For a simple graph with n vertices, each possible coloring set of vertex V G can be represented by an n-bit number. A bit set to 1 indicates that the color of the vertex is the “color 1", a bit set to 2 indicates that the vertex is the “color 2". Correspondingly, a bit set to k indicates that the color of the vertex is the “color k". For instance, in Fig. 1, the 6-digit 132131 indicates that the color of vertex {v1, v4, v6} is “color1", the color of vertex {v3} is “color2", and the color of vertex {v2, v5} is “color3". By this method, we convert all kinds of coloring sets in an n-vertex graph to different representations with n-bit numbers. We mean this as the transformation of information.
After the seven steps of Algorithm 3.4.1, single strands in the tube T1 will encode all possible vertex coloring sets. Taking Fig. 1 as an example, the DNA strand # A12B1A21B2A31B3A43B4A52B5A63B6 # ∈T1 denotes the vertex coloring set: {v2, v3} ⟶ “ color1 ", {v1, v5} ⟶ “ color2 ", {v4, v6} ⟶ “ color3 ".
A feasible solution to the vertex coloring problem requires different colors on adjacent vertices. Therefore, the operation should check whether vertex coloring sets meet the above constraints. If ei,j ∈ E, we should discard DNA strands whose vertices v i and v j are the same color. As shown in Fig. 1, # A13B1A21B2A32B3A43B4A52B5A62B6 # (representing the vertex coloring set: {v2} ⟶ “ color1 ", {v3, v5, v5} ⟶ “ color2 ", {v1, v4} ⟶ “ color3 ") should be excluded as the vertices v5 and v6 connected by edge e5,6 are having the same “color 2". Only then all eligible vertex coloring sets can be selected. Therefore, algorithm 3.4.2 completes this work.
The optimal solution for the vertex coloring problem is to have the smallest type of color combination. We recorded the number of each color and selected the optimal coloring set. If one kind of color exists in the coloring set, we attach chain X to the end of corresponding DNA strand one time in Algorithm 3.4.3, and use the length of the strand to represent the amount of color kinds. Taking Fig. 1 for example, the strand # A12B1A21B2A34B3A43B4A54B5A62B6 # in T1 represent the vertex coloring set with four kinds color for {v1, v5} ⟶ “2 ", {v2, v6} ⟶ “1 ", {v3} ⟶ “4 ", {v4} ⟶ “3 ", then it should be appended X four times, shown as # A12B1A21B2A34B3A43B4A54B5 A62B6 # XXXX.
In Algorithm 3.4.4, we take the shortest single chain in T1 and get the solution of the VCP. In the example of Fig. 1, the singled strand in T1 with shortest length is # A11B1A22B2A33B3A41B4A53B5A62B6 # XX- XX. Therefore, the solution of Fig. 1 is the color set for {v1, v3} ⟶ “1 ", {v2, v6} ⟶ “2 ", {v3, v5} ⟶ “3 ".
Proofs and computational complexity of proposed algorithm
The following theorems show that the algorithm can obtain solutions to the vertex coloring problem in O (kn2) time computing complexity of DNA molecular operations from a limited length range strands.
Meanwhile, every strands in T1 after the second step represents a eligible vertex coloring set. And we originally design the chain length of # , 1, 2, ⋯ , k, A
k
, B
k
and X, for
Thereby, the solution of the vertex coloring problem in a limited length range is obtained.
In summary, we can solve the vertex coloring problem in O (kn2) time computing complexity, where k is the number of colors in the vertex coloring problem and n is the number of vertices.
Simulation experiment of DNA algorithms
The calculation of DNA depends on the accuracy of the biochemical molecules’ operations; otherwise, it will lead to the accumulation and expansion of errors in biochemical reactions. Therefore, designing a suitable DNA sequence is a necessary basis for ensuring accuracy with DNA calculations. To this end, we used the sequence design methods in Ref. [38–40].
In this paper, we use the computational molecular biology tool Biopython as the system development platform, and Braich’s program to generate “appropriate DNA sequences” suitable for biological computing algorithm, which can be used to solve the vertex coloring problem. When generating one DNA chain, the first step is to determine whether the chain satisfies the restrictions specified in references [38–40]. If the produced DNA sequence does not meet the restrictions, the program will continue to generate another new sequence. When these restrictions can be met, the sequence is accepted for subsequent biochemical reactions. Using this method, we can get the “appropriate DNA sequences” which meet the restriction conditions to improve the accuracy of biochemical reactions.
Therefore, taking the vertex coloring problem (Fig. 1) as an example, the program generates random four-base sequences to form A i , B i , #, c1, c2, c3 and X as shown on Table 1. Table 2 demonstrates node sequences composed of Braich’s methods. The program also calculates enthalpy, entropy, free energy of DNA strands under various interactions. Table 3 calculates them for each probe bound to the homologous region in the strands.
Strands for representing A
i
, B
i
, #, c1, c2, c3 and X (i ∈ {1, 2, ⋯ , 6}) for the vertex coloring problem
Strands for representing A i , B i , #, c1, c2, c3 and X (i ∈ {1, 2, ⋯ , 6}) for the vertex coloring problem
Sequences for representing the A i B i and B i A j (i < j) for the vertex coloring problem
Energy used to combine each probe with its corresponding region on the library strand
Table 4 shows their average deviation and standard deviation level. Then, we get DNA solution chains for the vertex coloring problem, which are shown on Table 5.
Energy in all probe/library chain interactions
Solution strands to the vertex coloring problem in Fig. 1
In this paper, we propose a new DNA vertex coloring algorithm based on biological operations. This algorithm has two advantages. First of all, because a relatively fixed-length DNA sequence is used to express the vertices and edges of the problem, the algorithm has a lower error rate for connection and degeneration. Secondly, for vertex coloring problems with n vertices and k color restrictions, the proposed algorithm can work with O (kn2) time complexity, which is obviously superior to the previous algorithms (O (1 .415 n ) [41], O (1 .3289 n ) [42], O (1 .7504 n ) [47], O (n5) [48] and so on). Our algorithm can solve the vertex coloring problem in the polynomial time range, while the other algorithms mentioned above can mostly be in the exponential time complexity range (see Table 6). From this point, our algorithm has obvious advantages. Through comparative analysis, we can find that DNA-based computing may be a good choice for large-scale combinatorial optimization problems.
Time complexity of different algorithms for vertex coloring problem
The key to reduce the computational complexity of our algorithm lies in the parallel operation of biological chains to generate all possible solutions, which is more efficient than the serial operation of traditional computer. Therefore, our proposed DNA computing algorithm points out a parallel intelligent algorithm for generating all feasible solutions of the problem. In order to solve the complex NP problems such as the assignment of set elements and continuous path search, it can learn from our DNA algorithm to try to solve them. Now, some intelligent algorithms for vehicle routing problem and task assignment problem have been proposed. Our algorithm also has particular promotion and reference significance to solve these problems. Moreover, we also believe that more complex issues will be solved by using our DNA computing algorithm, which can effectively improve the computational efficiency of these problems. What we have done is the application and attempt of new methods in intelligent computing algorithms, which also promotes the research progress in this field to a certain extent.
Because of the remarkable parallelism and mass storage of DNA computing, it can solve complex computing problems in polynomial time. However, the current strategy of all DNA computing models is based on exhausting all candidate solutions and then eliminating non-solutions through detection. However, as the number of calculated variables increases, the initial data pool size of this algorithm will increase exponentially. Therefore, as the scale of the problem continues to expand, this method of violent exhaustion will not be feasible. So the integration of intelligent technology will break the barrier of this kind exhaustive violent method, get a reasonable solution from a relatively small initial data pool, and avoid exhausting all candidate solutions.
At present, much research on DNA computing is still on paper, many ideas and schemes are idealized, and there is no condition to put them into experiments. There are still many technical obstacles in how to realize DNA computing and make DNA computers. Further research is needed on the reality and potential of DNA computing construction, reducing errors in DNA computing, effective general algorithms and human-computer interaction. Especially in DNA computing, the error code is generated randomly according to probability and can be amplified step by step. The bit error rate has a direct impact on the accuracy of DNA computing, which can not be effectively overcome at present. Perhaps the DNA computer only acts as an arithmetic unit. Even so, the complementary computers of DNA computing and traditional computers will have an immeasurable impact.
Footnotes
Acknowledgment
It was supported by the Open Research Fund of State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research (Grant No. IWHR-SKL-201905, SKL2020ZY08). The project is also funded by research Start-up project of Wenzhou Business School (RC202002) and National Natural Science Foundation of China (Grant No. 11701363).
