A memetic gravitation search algorithm for solving DNA fragment assembly problems

Abstract

The DNA fragment assembly (DFA) problem is among the most critical problems in computational biology. Because it is NP-hard, it is commonly tackled with meta-heuristic algorithms such as the gravitation search algorithm (GSA), a state-of-the-art swarm-based algorithm that is well suited to NP-hard combinatorial optimization problems. This paper proposes a new memetic GSA called MGSA. MGSA follows the overlap-layout-consensus model and uses tabu search for population initialization. To increase the diversity of MGSA, we adopt two time-varying operators in the GSA procedure: one for the number of attracting agents (Kbest) and one for the maximum velocity. Finally, we adopt simulated annealing-based variable neighborhood search (SA-VNS) to refine the best solution. The proposed MGSA was evaluated on 19 DNA fragment benchmark instances, with the goal of maximizing the overlap score. In comparisons with state-of-the-art algorithms, the simulation results demonstrate that MGSA can achieve the best overlap scores.

Keywords

Gravitation search algorithm; DNA sequence fragment assembly; meta-heuristic algorithm; memetic algorithm

1 Introduction

A single-stranded molecule of deoxyribonucleic acid (DNA) comprises a sequence of four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). The DNA fragment assembly (DFA) problem is the task of reconstructing a DNA sequence from a set of DNA fragments. Complete DNA sequences are frequently longer than 3000 base pairs (bps). Important industrial applications of DFA include DNA cryptography [1] and gel electrophoresis [2]. The DFA problem has been proven NP-hard [3], and it can be formulated as an asymmetric traveling salesman problem [4].

DFA algorithms can be divided into two categories: overlap-layout-consensus (OLC) and de Bruijn graph (DBG) models. Since its introduction by Sanger et al. [5], the OLC model has become increasingly important and complex, and it has recently been applied to sequencing newly discovered genomes [6]. For the DFA problem, OLC assemblers usually adopt a greedy approach, for example PHRAP [7], GAP [8], TIGR Assembler [9], CAP3 [10], STROLL [11], Celera Assembler [12], and ARACHNE [13]. However, although the greedy strategy successfully assembles small or medium sequences by merging pairs of fragments with high overlap scores, it is less effective at assembling larger DNA sequences.

Recently, meta-heuristic algorithms [14] have become increasingly popular for solving large NP-hard combinatorial optimization problems. Consequently, they have been incorporated into OLC algorithms to find the best overlap, or reassembly, of all DNA fragments. Such algorithms include hierarchical clustering [15], simulated annealing (SA) [16, 17], tabu search (TS) [18], genetic algorithms (GA) [19–25], ant colony optimization (ACO) [26–28], artificial bee colony algorithms (ABC) [29], particle swarm optimization (PSO) algorithms [30–34], artificial immune systems (AIS) [35], the bee algorithm [29], cuckoo search [36, 37], the firefly algorithm [38], grid algorithms [39], hybrid algorithms [17, 41], iterative algorithms [42, 43], and the LHK algorithm [44]. The second broad DFA category comprises DBG models [45]. Unlike OLC models, DBG models are based on k-mer graphs; they search the DBG and compress redundant sequences [46, 47]. For the DFA problem, DBG models have been solved with the greedy algorithm EULER [48], parallel strategies [49, 50], maximum matching [51], dynamic hashing [52], the exSPAnder algorithm [53], a generic algorithm [54], the MaSuRCA assembler [55], a MapReduce approach [56], and IWP methods [57]. The OLC and DBG models are compared in detail in [47]. This paper, however, aims to find the best overlap among all fragments and does not address computation time. Recently, some new techniques [58–60] have been proposed that offer compression or feature selection to avoid over-fitting.

Compared with the approaches above, gravitation search algorithm (GSA) [61] approaches to the DFA problem have received little attention. Unlike other swarm-based algorithms such as particle swarm optimization (PSO) [62], GSA is memory-less. Only the current positions of the agents play a role in the updating procedure, which can help reduce premature convergence. Thus, this paper proposes a memetic GSA (MGSA) approach based on the OLC model that solves DFA problems by maximizing the overlap score. GSA is made suitable for DFA problems by introducing the smallest position value (SPV) rule [63], which converts a vector of continuous values into a DNA fragment order. The proposed MGSA initializes the population using tabu search [64]. To increase the diversity of MGSA, we adopt two time-varying operators in the GSA procedure: one for the number of attracting agents (Kbest) and one for the maximum velocity. Lastly, the current best solution is fine-tuned by coupling the SA method [65] with a local search method called variable neighborhood search (VNS) [14]. The proposed method was experimentally validated on 19 different DNA fragment benchmark instances. MGSA improved the overlap scores relative to those of existing algorithms. The remainder of this paper is organized as follows. Section 2 provides background information and discusses related studies. The proposed MGSA algorithm is presented and evaluated in Sections 3 and 4, respectively. Brief conclusions and ideas for future studies are presented in Section 5.

2 Background knowledge and related studies

This section provides the background of this research and discusses the related literature. The overview focuses on six main topics:

Memetic Algorithm

The DNA Fragment Assembly Problem (DFA)

Gravitation Search Algorithm (GSA)

Tabu Search (TS)

Simulated Annealing (SA)

Particle Swarm Optimization (PSO)

2.1 Memetic algorithm

This subsection introduces the memetic algorithm, which underlies the evolutionary computation in this paper.

The term "meme" was coined by Dawkins [66] to denote units of cultural transmission such as tunes, ideas, catchphrases, clothing fashions, and styles of making pots or building arches. Genes propagate in the gene pool by leaping from body to body via sperm or eggs. In the same way, memes propagate in the meme pool by leaping from brain to brain via a process that, in a broad sense, can be called imitation.

The memetic algorithm [67–69] therefore attempts to mimic cultural evolution within stochastic global-search meta-heuristic algorithms. Its main idea is to hybridize a meta-heuristic with local search algorithms. This reduces the time taken to obtain good-quality solutions to a variety of NP-hard optimization problems.

2.2 The DNA fragment assembly problem

DNA sequences longer than one hundred bps cannot be sequenced accurately and rapidly in one piece, so they must first be randomly divided into several fragments. The original DNA sequence is then reconstructed from these fragments, a technique called shotgun sequencing. The OLC model for DFA is a three-step approach, as described below.

Overlap (O): Find the common (overlapping) subsequence for every possible pair of fragments.

Layout (L): Determine the order of the fragments from their overlap scores. The overlap scores are computed with sequence alignment approaches [70], including global alignment, semi-global alignment, and local alignment.

Consensus (C): Derive the consensus sequence from the layout step.

The DFA approach to DNA sequencing is illustrated in Fig. 1. First, the DNA strand to be sequenced is randomly cut into four fragments. The algorithm then finds the overlaps, determines the layout of the fragments, and finally reconstructs the original DNA sequence [33].

The quality of the consensus is measured by the coverage in Equation (1). Here n denotes the number of fragments, f_i is the i-th fragment, and L is the total length of the target sequence. The coverage measures the mean redundancy of the fragment sequences: the higher the coverage, the higher the overlap scores and the smaller the number of sequence gaps.

$Coverage = \frac{\sum_{i=1}^{n} |f_{i}|}{L}$ (1)
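As a quick illustration of Equation (1), the Python sketch below computes the coverage from fragment lengths and the target length. The function name `coverage` is hypothetical. The fragment strings in the example call are made up and are not taken from the benchmarks.

```python
from typing import Sequence


def coverage(fragments: Sequence[str], target_length: int) -> float:
    """Coverage (Equation (1)): total fragment length divided by the target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return sum(len(f) for f in fragments) / target_length


# Hypothetical example: 15 bases of fragments over a 5-base target.
print(coverage(["ACGTAC", "GTACGT", "ACG"], 5))  # prints 3.0
```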

2.3 Gravitation search algorithm

The gravitation search algorithm is a relatively recent swarm-based meta-heuristic. It was pioneered by Rashedi et al. [61] in 2009 and is inspired by Newton's law of gravity and laws of motion. Unlike in the PSO approach [62], interactions among individual agents (masses) are governed by gravitational forces and the laws of motion. Gravity attracts the agents toward heavier masses, so mass is regarded as a performance measure: heavier masses indicate better solutions.

Given A agents whose positions and velocities are denoted by X and V, respectively, the position and velocity of agent i in iteration t are defined by Equations (2) and (3).

$X_{i}^{t} = (x_{i,1}^{t}, x_{i,2}^{t}, \dots, x_{i,j}^{t}, \dots, x_{i,d}^{t}), \quad i = 1, 2, \dots, A$ (2)

$V_{i}^{t} = (v_{i,1}^{t}, v_{i,2}^{t}, \dots, v_{i,j}^{t}, \dots, v_{i,d}^{t}), \quad i = 1, 2, \dots, A$ (3)

Here d is the dimension of the search space, and $x_{i,j}^{t}$ and $v_{i,j}^{t}$ are the position and velocity of agent i in dimension j.

Each agent represents one candidate solution. Initially, the GSA assigns a random solution to each agent.

To evaluate the fitness of the current population, the mass of each agent is calculated using Equations (4)–(7).

$m_{i}^{t} = \frac{fit_{i}^{t} - worst_{t}}{best_{t} - worst_{t}}$ (4)

$M_{i}^{t} = \frac{m_{i}^{t}}{\sum_{c=1}^{A} m_{c}^{t}}$ (5)

$best_{t} = \min_{i = 1, \dots, A} fit_{i}^{t}$ (6)

$worst_{t} = \max_{i = 1, \dots, A} fit_{i}^{t}$ (7)

Equations (6) and (7) correspond to a minimization problem. For a maximization objective such as the overlap score, the min and max operators are exchanged [61].
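The mass computation can be sketched in a few lines of Python. This is a minimal sketch: `gsa_masses` is a hypothetical name. The `maximize` flag follows the standard GSA convention of exchanging min and max in Equations (6)–(7) for maximization. Returning equal masses when all agents have the same fitness is an assumption, because Equation (4) is undefined in that case.

```python
from typing import List


def gsa_masses(fitness: List[float], maximize: bool = True) -> List[float]:
    """Normalized gravitational masses M_i (Equations (4)-(7))."""
    # For maximization the best fitness is the largest; for minimization, the smallest.
    best = max(fitness) if maximize else min(fitness)
    worst = min(fitness) if maximize else max(fitness)
    if best == worst:
        # All agents equally fit: Eq. (4) is undefined, so use equal masses (an assumption).
        return [1.0 / len(fitness)] * len(fitness)
    m = [(f - worst) / (best - worst) for f in fitness]  # Eq. (4)
    total = sum(m)
    return [mi / total for mi in m]  # Eq. (5)
```

For example, the overlap scores [10, 20, 30] give normalized masses of approximately [0, 1/3, 2/3]. The worst agent therefore exerts no attraction.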

The acceleration of each agent is determined by applying the law of gravity:

$a_{i,j}^{t} = \frac{F_{i,j}^{t}}{M_{i}^{t}} = \sum_{c \in Kbest,\, c \neq i} G_{t}\, r_{1} \frac{M_{c}^{t}}{R_{i,c}^{t} + \varepsilon} (x_{c,j}^{t} - x_{i,j}^{t})$ (8)

Here r_1 is a uniform random variable in the interval [0, 1].

Subsequently, the search of each agent in the next iteration is governed by Equations (9) and (10).

$v_{i,j}^{t+1} = r_{2} \times v_{i,j}^{t} + a_{i,j}^{t}$ (9)

$x_{i,j}^{t+1} = x_{i,j}^{t} + v_{i,j}^{t+1}$ (10)

Here r₂ is a uniform random variable in the interval [0, 1], and t and t + 1 denote the current and next iterations, respectively. The system comprises A agents in a d-dimensional search space. The gravitational coefficient is updated by Equation (11):

$G_{t} = G_{0} \exp\left(-\alpha \frac{t}{t_{\max}}\right), \quad \alpha < 1$ (11)

In Equation (11), G_0 is the initial gravitational coefficient and t_max is the maximum number of iterations.

To control the search accuracy, the gravitational coefficient G is not constant; instead, it is a decreasing function of time.

The remaining important GSA notations are listed below:

${fit}_{i}^{t}$ : Fitness value of agent i in iteration t.

best_t: Best fitness value among all agents in iteration t.

worst_t: Worst fitness value among all agents in iteration t.

$m_{i}^{t}$ : Mass of an individual agent i.

$M_{i}^{t}$ : Normalized mass of agent i in iteration t.

$R_{i, c}^{t}$ : Euclidean distance between agents i and c.

$F_{i, j}^{t}$ : Force acting on each agent i in dimension j in iteration t.

Kbest: the K fittest agents (agents with highest gravitational mass).

ɛ: A small constant, introduced to avoid division by zero.

G_t: Gravitational coefficient in iteration t.

$a_{i, j}^{t}$ : Acceleration of each agent i in dimension j in iteration t.

α: Shrinking constant.
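Putting Equations (8)–(11) together, the sketch below performs one synchronous update of all agents. It is a minimal illustration under several assumptions:

- A fresh uniform random number is drawn per attracting agent and dimension in Equation (8), and per dimension in Equation (9), as in the original GSA formulation.
- The normalized masses `M` and the index list `kbest` are computed beforehand.
- `gravitational_constant` and `gsa_step` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def gravitational_constant(t: int, t_max: int, g0: float, alpha: float) -> float:
    """Equation (11): G_t = G_0 * exp(-alpha * t / t_max)."""
    return g0 * math.exp(-alpha * t / t_max)


def gsa_step(X: List[Vector], V: List[Vector], M: Sequence[float],
             kbest: Sequence[int], G: float, eps: float = 1e-12,
             rng=random) -> Tuple[List[Vector], List[Vector]]:
    """One synchronous GSA move of all agents (Equations (8)-(10))."""
    A, d = len(X), len(X[0])
    new_X, new_V = [], []
    for i in range(A):
        acc = [0.0] * d
        for c in kbest:
            if c == i:
                continue  # an agent does not attract itself
            R = math.dist(X[i], X[c])  # Euclidean distance R_{i,c}
            for j in range(d):
                # Eq. (8): M_i cancels, so only the attracting mass M_c remains.
                acc[j] += G * rng.random() * M[c] / (R + eps) * (X[c][j] - X[i][j])
        v = [rng.random() * V[i][j] + acc[j] for j in range(d)]  # Eq. (9)
        x = [X[i][j] + v[j] for j in range(d)]  # Eq. (10)
        new_V.append(v)
        new_X.append(x)
    return new_X, new_V
```

In MGSA the updated position vectors would then be decoded into fragment orders by the SPV rule before their fitness is evaluated.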

2.4 Tabu search

Tabu search (TS) is a meta-heuristic algorithm for combinatorial optimization that was first proposed by Glover et al. [64]. TS is motivated by the tendency of local search to revisit the same solutions and become trapped in local optima, and it is used to solve large, complex combinatorial problems. Its most important strategy is a short-term memory structure that records recent moves (or visited solutions) and forbids them for a number of iterations. This short-term memory structure is called a tabu list.
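The paper gives no implementation details for TS, so the following is only a generic sketch of tabu search over fragment permutations. The swap neighborhood, the tabu tenure of 5, and the aspiration criterion are assumptions. Only the 20-iteration budget comes from the complexity analysis in Section 3.7.3. The name `tabu_search` and the toy objective in the example call are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def tabu_search(init: Sequence[int], objective: Callable[[Sequence[int]], float],
                iterations: int = 20, tenure: int = 5) -> Tuple[List[int], float]:
    """Maximize `objective` over permutations with swap moves and a tabu list."""
    current = list(init)
    best, best_val = list(current), objective(current)
    tabu: List[Tuple[int, int]] = []  # short-term memory of recently used swaps
    for _ in range(iterations):
        move, cand, cand_val = None, None, float("-inf")
        for i, j in itertools.combinations(range(len(current)), 2):
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            val = objective(neighbor)
            # Tabu moves are skipped unless they beat the best so far (aspiration).
            if (i, j) in tabu and val <= best_val:
                continue
            if val > cand_val:
                move, cand, cand_val = (i, j), neighbor, val
        if cand is None:
            break  # every move is tabu
        current = cand  # move to the best admissible neighbor, even if it is worse
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)  # forget the oldest move
        if cand_val > best_val:
            best, best_val = list(cand), cand_val
    return best, best_val


# Toy objective (hypothetical): number of items already in their own slot.
print(tabu_search([2, 0, 1, 3], lambda p: sum(k == v for k, v in enumerate(p))))
```

In MGSA the objective would be the overlap score of Equation (14). The resulting order would then be mapped to position values with the SPV rule.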

2.5 Simulated annealing

The simulated annealing (SA) algorithm is a meta-heuristic method proposed by Kirkpatrick et al. [65]. The concept of SA derives from the simulation of the annealing of a solid proposed in [71]. SA is widely applied to complex combinatorial problems. It perturbs the current solution and accepts worse solutions with a probability that decreases as the temperature is lowered.

2.6 Particle swarm optimization

Particle swarm optimization (PSO) is a population-based swarm intelligence algorithm that was first presented by Kennedy and Eberhart [62]. The main concept of PSO is that each particle updates its current solution with reference to its own best experience and the best experience of the swarm, as shown in Equations (12) and (13).

$v_{i,j}^{t+1} = \omega v_{i,j}^{t} + z_{1} r_{5} (p_{i,j}^{t} - x_{i,j}^{t}) + z_{2} r_{6} (p_{gbest,j}^{t} - x_{i,j}^{t})$ (12)

$x_{i,j}^{t+1} = x_{i,j}^{t} + v_{i,j}^{t+1}$ (13)

The terms in Equations (12) and (13) are defined as follows:

- ω is the inertia weight.
- z_1 and z_2 are acceleration coefficients.
- r_5 and r_6 are uniform random variables in [0, 1].
- p_i is the best position found so far by particle i.
- p_gbest is the best position found so far by the swarm.
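For contrast with the memory-less GSA update, here is a minimal sketch of Equations (12) and (13) for one particle. The default values of `w`, `z1`, and `z2` are illustrative assumptions, not settings from this paper. The name `pso_update` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pso_update(x: Sequence[float], v: Sequence[float], p_best: Sequence[float],
               g_best: Sequence[float], w: float = 0.7, z1: float = 1.5,
               z2: float = 1.5, rng=random) -> Tuple[List[float], List[float]]:
    """Equations (12)-(13) for one particle (illustrative default coefficients)."""
    new_v = [w * v[j]
             + z1 * rng.random() * (p_best[j] - x[j])  # pull toward own best (memory)
             + z2 * rng.random() * (g_best[j] - x[j])  # pull toward swarm best
             for j in range(len(x))]
    new_x = [x[j] + new_v[j] for j in range(len(x))]  # Eq. (13)
    return new_x, new_v
```

The `p_best` argument is the per-particle memory that GSA does without.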

3 The proposed methods

This paper presents an MGSA algorithm for solving DFA problems. The solution of one agent is initially obtained by the tabu search algorithm, and the remaining agents are assigned random values. The best global solution is then computed by the SPV rule and the evolutionary GSA algorithm, and this process iterates until the stopping criterion is reached. The SA-VNS local search improves the quality of the best global solution generated by the GSA algorithm.

3.1 Objective function

The DFA problem is solved by finding the fragment layout, and hence the consensus sequence, that maximizes the overlap scores of all fragments. The DFA problem is formally described below.

Given a set of n fragments f = {f₀, f₁, …, f_{n-1}}, the overlap score between two fragments i and j is w_{i,j}.

In this study, we adopt the objective function proposed by Parsons et al. [72]:

$\text{Maximize } F(X_{i}) = \sum_{p=0}^{n-2} w(f_{p}, f_{p+1})$ (14)

Here X_i denotes agent i, f_p is the fragment at position p of the layout decoded from X_i, and w(f_p, f_{p+1}) is the overlap score between two adjacent fragments, calculated using a sequence alignment approach. In this study, the overlap score is computed by semi-global alignment [73], with match and mismatch scores of 2 and 0, respectively, and a gap penalty of 2. For example, in the semi-global alignment of two DNA sequences S₁ and S₂, the overlap score between S₁ and S₂ is 6, and S₁ is updated to AT-CG, where the symbol "-" indicates a gap in the DNA sequence.
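The sketch below shows one common way to compute such an overlap score and the objective of Equation (14). It uses the stated scores (match 2, mismatch 0, gap penalty 2) and assumes an ends-free semi-global alignment, in which leading and trailing gaps in either sequence are not penalized. The exact variant used in [73] may differ. In particular, this variant is symmetric and does not distinguish which fragment comes first, whereas a strictly directed suffix-prefix overlap would. The names `semi_global_score` and `layout_score`, and the 0-based `order` list, are hypothetical.

```python
from typing import Sequence

MATCH, MISMATCH, GAP = 2, 0, -2  # scores stated in the text (gap penalty 2)


def semi_global_score(s1: str, s2: str) -> int:
    """Ends-free (semi-global) alignment score of s1 and s2."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # Row 0 and column 0 stay 0, so leading gaps cost nothing.
    D = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            diag = D[i - 1][j - 1] + (MATCH if s1[i - 1] == s2[j - 1] else MISMATCH)
            D[i][j] = max(diag, D[i - 1][j] + GAP, D[i][j - 1] + GAP)
    # Taking the best cell of the last row or column makes trailing gaps free.
    return max(max(D[-1]), max(row[-1] for row in D))


def layout_score(order: Sequence[int], fragments: Sequence[str]) -> int:
    """Equation (14): sum of overlap scores of adjacent fragments in the layout."""
    return sum(semi_global_score(fragments[order[p]], fragments[order[p + 1]])
               for p in range(len(order) - 1))


print(semi_global_score("ACGT", "CGTA"))  # 6: the suffix CGT matches the prefix CGT
```

Each call fills a (j + 1) × (k + 1) table, which matches the O(jk) per-alignment cost used in Section 3.7.2.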

3.2 Solution representation

The swarm-based meta-heuristic GSA was originally designed for continuous problems. An agent's position is a vector of continuous values rather than a permutation, so GSA is not directly compatible with the DFA problem. To map an agent's position directly to a DNA fragment order, we adopt a heuristic called the SPV rule [63], which orders the fragments by ascending position value.

3.3 Initial population

In general, the population is randomly initialized according to Equation (15).

$x_{i,c} = x_{\min} + (x_{\max} - x_{\min}) \cdot r_{a}, \quad c = 1, 2, \dots, d$ (15a)

$v_{i,c} = v_{\min} + (v_{\max} - v_{\min}) \cdot r_{b}, \quad c = 1, 2, \dots, d$ (15b)

Here d is the dimension of each agent (the number of fragments), and x_{i,c} and v_{i,c} are the c-th position value and the c-th velocity value, respectively, of agent i. r_a and r_b are uniform random variables in the interval [0, 1]. The limits are x_min = -1.0, x_max = 1.0, v_min = -1.0, and v_max = 1.0, where min and max denote the lower and upper bounds, respectively, of the position or velocity values.

To ensure that the initial population is diverse and also contains a high-quality solution, one agent is initialized with the solution found by the TS algorithm. Its fragment sequence is mapped to position values following the SPV rule. In Table ??, dimension 3 has a rank value of 1, so its position becomes -0.79. The rank value of dimension 1 is 2, so its position is reassigned as -0.39. The remaining values are assigned in the same way, according to their ranking order. In this way, the SPV rule transforms the DNA sequence into the position values [-0.39, 0.22, -0.79, 0.13, 0.15].
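The mapping between fragment orders and position values in this example can be sketched as follows. `random_positions` implements Equation (15a). `spv` lists the 1-based dimensions in ascending order of position value. `encode` performs the inverse assignment used to seed the TS agent. All three names are hypothetical. The fragment order [3, 1, 4, 5, 2] is the order implied by the example above rather than one stated in the text. Python's built-in sort stands in for the quicksort mentioned in Section 3.7.1.

```python
import random
from typing import List, Sequence


def random_positions(d: int, x_min: float = -1.0, x_max: float = 1.0,
                     rng=random) -> List[float]:
    """Equation (15a): d uniform random position values in [x_min, x_max]."""
    return [x_min + (x_max - x_min) * rng.random() for _ in range(d)]


def spv(positions: Sequence[float]) -> List[int]:
    """SPV rule: 1-based dimensions listed in ascending order of position value."""
    return [j + 1 for j in sorted(range(len(positions)), key=lambda j: positions[j])]


def encode(sequence: Sequence[int], values: Sequence[float]) -> List[float]:
    """Inverse of SPV: the fragment at rank r receives the r-th smallest value."""
    ranked = sorted(values)
    positions = [0.0] * len(sequence)
    for rank, fragment in enumerate(sequence):
        positions[fragment - 1] = ranked[rank]
    return positions


print(spv([-0.39, 0.22, -0.79, 0.13, 0.15]))  # [3, 1, 4, 5, 2]
```

Decoding with `spv` after encoding with `encode` returns the original order whenever the position values are distinct.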

3.4 The GSA procedure

Each agent moves in a direction governed by the gravitational masses of the other agents. The gravitational masses and accelerations are given by Equations (4)–(5) and (8), respectively. In Equation (8), Kbest is the set of the K fittest agents, that is, those with the largest gravitational masses. In the original GSA algorithm, K is initialized to the number of agents A and decreases linearly to 1. In this paper, we adopt a new operator that updates K according to Equation (16) [74]:

$K = \left\lfloor \left(\gamma + \left(1 - \frac{t}{t_{\max}}\right)(1 - \gamma)\right) A \right\rfloor$ (16)

Here γ ∈ (0, 1] controls the linear decrease of K, from A at t = 0 to ⌊γA⌋ at t = t_max.
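A minimal sketch of the K schedule in Equation (16) follows. The small tolerance before flooring and the lower bound of one agent are safeguards added here as assumptions; they are not part of the equation. `kbest_size` is a hypothetical name.

```python
import math


def kbest_size(t: int, t_max: int, A: int, gamma: float) -> int:
    """Equation (16): K falls linearly from A at t = 0 to floor(gamma * A) at t = t_max."""
    frac = gamma + (1 - t / t_max) * (1 - gamma)
    # Tiny tolerance so that, e.g., 0.02 + 0.98 does not floor to A - 1 (an assumption).
    k = math.floor(frac * A + 1e-9)
    return max(1, k)  # keep at least one attracting agent (also an assumption)
```

With A = 30 and γ = 0.5, for example, K starts at 30, is 22 halfway through the run, and ends at 15.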

During successive iterations, each agent searches the solution space and updates its velocity according to Equation (9). This paper adopts a time-varying maximum velocity from a previous study [75]:

$V_{\max 0} = \rho \times (x_{\max} - x_{\min})$ (17a)

$V_{\max} = \left(1 - \left(\frac{t}{t_{\max}}\right)^{h}\right) \times V_{\max 0}$ (17b)

Here the exponent h is a positive constant, and ρ sets the initial maximum velocity as a fraction of the position range.
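A short sketch of Equations (17a) and (17b) follows. The defaults ρ = 0.5 and h = 2 are illustrative assumptions. Clipping each velocity component to [-V_max, V_max] is also an assumption about how the bound is applied after Equation (9). The names are hypothetical.

```python
from typing import List, Sequence


def v_max(t: int, t_max: int, x_min: float = -1.0, x_max: float = 1.0,
          rho: float = 0.5, h: float = 2.0) -> float:
    """Equations (17a)-(17b): maximum velocity shrinking from V_max0 to 0."""
    v_max0 = rho * (x_max - x_min)  # Eq. (17a)
    return (1 - (t / t_max) ** h) * v_max0  # Eq. (17b)


def clamp_velocity(v: Sequence[float], limit: float) -> List[float]:
    """Clip each velocity component to [-limit, limit]."""
    return [max(-limit, min(limit, vj)) for vj in v]
```

A larger h keeps V_max near V_max0 for longer before it shrinks, which prolongs the exploration phase.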

Once the position values of each agent have been updated by Equation (10), the new DNA sequence is rearranged using the SPV rule, as shown in Table ??.

3.5 SA-VNS local search

VNS [14] is a local search strategy that systematically changes the neighborhood structure. It improves the quality of the solutions produced by a meta-heuristic while maintaining adequate diversity. In this paper, VNS uses four common neighborhood perturbation operators, namely swap, insertion, inversion, and displacement, as described below. For a survey of VNS applications, the reader is referred to [14].

Swap: Randomly choose two different positions from the permutation sequence and swap them.

Insertion: Randomly choose two different positions in the permutation sequence and insert the element at the front position ahead of the element at the back position.

Inversion: Invert the subsequence between two random positions in the permutation sequence.

Displacement: Randomly select one subsequence and one position, and insert the subsequence before the chosen position.
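The four operators can be sketched on a Python list as follows. Each function returns a new permutation and leaves its input unchanged. `insertion` follows one literal reading of the description above: the element at the front position is moved directly ahead of the element at the back position. `displacement` reinserts the removed block at a random index of the remaining list. Both readings, and the function names, are assumptions.

```python
import random
from typing import List, Sequence


def swap(perm: Sequence[int], rng=random) -> List[int]:
    """Swap the elements at two random positions."""
    p = list(perm)
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p


def insertion(perm: Sequence[int], rng=random) -> List[int]:
    """Move the front element directly ahead of the back element."""
    p = list(perm)
    i, j = sorted(rng.sample(range(len(p)), 2))
    p.insert(j - 1, p.pop(i))
    return p


def inversion(perm: Sequence[int], rng=random) -> List[int]:
    """Reverse the subsequence between two random positions (inclusive)."""
    p = list(perm)
    i, j = sorted(rng.sample(range(len(p)), 2))
    p[i:j + 1] = p[i:j + 1][::-1]
    return p


def displacement(perm: Sequence[int], rng=random) -> List[int]:
    """Cut out a random subsequence and reinsert it at a random position."""
    p = list(perm)
    i, j = sorted(rng.sample(range(len(p)), 2))
    block, rest = p[i:j + 1], p[:i] + p[j + 1:]
    k = rng.randrange(len(rest) + 1)
    return rest[:k] + block + rest[k:]
```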

In MGSA, the GSA procedure is performed first, and the best solution it produces is then refined by VNS local search with SA-based acceptance. This ordering strengthens the local search ability and helps avoid premature convergence. Embedding SA in the VNS process helps maintain an appropriate balance between exploration and exploitation at different temperatures.
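The following sketch shows one way to embed SA in VNS for a permutation, as described above. The section does not specify the SA details, so the following choices are all assumptions:

- initial temperature 10 and geometric cooling factor 0.95;
- a 200-iteration budget;
- Metropolis acceptance;
- a rule that restarts from the first neighborhood after an improvement and otherwise switches to the next one.

To keep the block self-contained, only swap and inversion moves are implemented. Any of the four operators described above could be passed in `moves` with the same signature. `sa_vns`, `swap_move`, and `inversion_move` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Move = Callable[[List[int], random.Random], List[int]]


def swap_move(p: List[int], rng: random.Random) -> List[int]:
    q = list(p)
    i, j = rng.sample(range(len(q)), 2)
    q[i], q[j] = q[j], q[i]
    return q


def inversion_move(p: List[int], rng: random.Random) -> List[int]:
    q = list(p)
    i, j = sorted(rng.sample(range(len(q)), 2))
    q[i:j + 1] = q[i:j + 1][::-1]
    return q


def sa_vns(init: Sequence[int], objective: Callable[[Sequence[int]], float],
           moves: Sequence[Move], t0: float = 10.0, cooling: float = 0.95,
           iterations: int = 200, seed: int = 0) -> Tuple[List[int], float]:
    """Maximize `objective`: VNS over `moves` with SA (Metropolis) acceptance."""
    rng = random.Random(seed)
    current, cur_val = list(init), objective(init)
    best, best_val = list(current), cur_val
    temp, k = t0, 0
    for _ in range(iterations):
        cand = moves[k](current, rng)
        val = objective(cand)
        delta = val - cur_val  # > 0 means improvement (maximization)
        # SA acceptance: always take improvements, sometimes take worse moves.
        if delta > 0 or rng.random() < math.exp(delta / temp):
            current, cur_val = cand, val
            if cur_val > best_val:
                best, best_val = list(current), cur_val
        # VNS rule: restart at the first neighborhood after an improvement,
        # otherwise switch to the next one.
        k = 0 if delta > 0 else (k + 1) % len(moves)
        temp = max(temp * cooling, 1e-12)  # geometric cooling, kept positive
    return best, best_val
```

In MGSA the initial permutation would be the best solution from the GSA procedure, decoded by the SPV rule, and the objective would be Equation (14).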

3.6 The proposed MGSA algorithm

A flowchart of the MGSA algorithm is presented in Fig. 3. Initially, the solution of one agent is found by TS, while the remaining agents are assigned random values. The global solution is then repeatedly optimized by the SPV rule and the evolutionary GSA algorithm until the stopping criterion is reached. Local search by SA-VNS improves the quality of the best global solution generated by GSA.

3.7 Time complexity of MGSA

The MGSA algorithm can be simply divided into three parts: the initial population part; the GSA procedure part; and, SA-VNS local search part. The complete time complexity of the MGSA algorithm is analysed in the following.

3.7.1 Parameter setting

Assume there are A agents, n fragments, and I iterations. In this paper, we used quicksort to implement the SPV rule; thus, the worst-case complexity of the SPV rule is O(n²).

3.7.2 Semi-global alignment

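Assume that sequences S₁ and S₂ have lengths j and k, respectively. One semi-global alignment fills a j × k dynamic-programming table, which takes O(jk) time. Evaluating a layout of n fragments requires n − 1 alignments. Accordingly, evaluating one solution by semi-global alignment takes O((n − 1)jk) = O(njk).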

3.7.3 Initial population

First, one agent is obtained by TS within 20 iterations, and the remaining agents are generated randomly and decoded by the SPV rule. The time complexity of TS is O(20(n − 1)jk) = O(njk). Thus, the time complexity of the initial population is O(njk) + O((A − 1)n²) = O(njk + An²).

3.7.4 GSA procedure

Second, in each iteration of the GSA procedure, F, M, a, and the SPV rule must each be computed once. The costs are as follows:

- F requires the force between every pair of agents in n dimensions, which takes O(A · An) = O(A²n).
- M requires one semi-global alignment evaluation of every agent, one search for the best and worst fitness values, and one normalization, which takes O(A · njk) + O(A) + O(A) + O(A) = O(Anjk).
- a takes O(A).

Thus, assuming the alignment cost dominates (A ≤ jk and n ≤ jk), the time complexity of the GSA procedure is O(IA²n) + O(IAnjk) + O(IA) + O(IAn²) = O(IAnjk).

3.7.5 SA-VNS local search

Assume SA-VNS runs for T iterations. Each iteration evaluates one candidate solution at a cost of O(njk), so the time complexity of SA-VNS is O(Tnjk).

Combining the three parts above, the time complexity of MGSA is O(njk + An²) + O(IAnjk) + O(Tnjk) = O(IAnjk), assuming T ≤ IA and n ≤ Ijk. Furthermore, the original GSA has the same time complexity, O(IAnjk).

4 Experimental results

This section evaluates the performance of the proposed MGSA algorithm. The test instances are 19 benchmark DNA fragment sets from the NCBI (http://www.ncbi.nlm.nih.gov/), provided in shotgun format with fragment overlaps. The DFA benchmarks are generated by GenFrag [76], which splits a DNA sequence into different instances and provides a set of overlapping fragments, as shown in Table 1. The experiments were performed on a computer with an Intel Core 2 Quad Q9400 CPU at 2.66 GHz and 2 GB of memory running Microsoft Windows 7.

4.1 Parameter settings

To ensure a fair comparison, the same parameter settings, shown in Table 2, were used in all experiments.

All results were obtained from 30 independent runs of 1000 iterations each. Performance was assessed by the average overlap score and the average computation time, t_avg, in seconds. The match score is 2, the mismatch score is 0, and the gap penalty is 2 [73].

4.2 Comparison of MGSA with SGSA, GSA, DSAPSO, and TPSOSV

To validate the effectiveness of the MGSA algorithm, in this paper we also propose a variant of GSA, named SGSA. In addition to the SGSA, the original GSA and two PSO-based algorithms are compared in this section. The following abbreviations represent the five algorithms considered:

MGSA: The proposed algorithm in this paper.

SGSA: GSA using SA as the initialization algorithm with VNS local search.

GSA: The standard GSA [61].

DSAPSO: The PSO algorithm for solving DFA [30].

TPSOSV: PSO using TS as the initialization algorithm with SA-VNS local search [33].

4.2.1 Comparison of the overlap scores

The computational results are shown in Table 3. MGSA achieves better overlap scores than SGSA on most benchmarks; the exceptions are TNFRSF19(4), TNFRSF19(7), X60189(4), and X60189(7). These results indicate that GSA with TS initialization and SA-VNS local search is more robust than SGSA. Moreover, in average overlap score over all instances, MGSA outperforms the standard GSA, DSAPSO, and TPSOSV by 10%, 75%, and 62%, respectively. Its advantage reaches almost 80% on instances BX842596(4), BX842596(7), M15421(5), M15421(6), M15421(7), NC001807(4), and NC001807(7). From these simulation results, we conclude that MGSA is more effective than SGSA, the original GSA, and the two PSO-based algorithms. Note that SGSA also outperforms DSAPSO and TPSOSV.

4.2.2 Comparison for the computation time

Although MGSA outperforms the other algorithms in overlap score, this comes at the cost of computation time. Table 6 shows that the computation time of MGSA is longer than that of SGSA, the original GSA, DSAPSO, and TPSOSV by about 37%, 48%, 107%, and 32%, respectively. Further, Table 3 shows that SGSA requires only 2% more computation time than TPSOSV while achieving an almost 40% higher overlap score. From these simulation results, we conclude that the GSA-based algorithms offer a better trade-off between overlap score and computation time than the PSO-based algorithms.

4.3 Comparison of results obtained by MGSA and PH-PALS

The previous subsection demonstrated that MGSA provides the best solutions among the compared algorithms. MGSA was therefore also compared with another state-of-the-art algorithm, PH-PALS [17], on the same 19 benchmarks. The results, reported in Table 5, indicate that MGSA obtains better overlap scores than PH-PALS on most benchmarks.

4.4 Statistical verification

Table 6 reports two-sided Wilcoxon rank-sum tests [77] comparing MGSA with SGSA, GSA, DSAPSO, TPSOSV, and PH-PALS at the α = 0.001 significance level. Significant differences are indicated by +. Table 6 shows that the superior performance of MGSA is statistically significant.
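To show how such a test can be carried out on the overlap scores from the 30 runs, the sketch below implements the two-sided Wilcoxon rank-sum test with the normal approximation. Tied values receive average ranks, and no tie correction is applied to the variance. The text does not state whether the exact or approximate form of the test was used, so this is only an illustration. `rank_sum_test` is a hypothetical name. SciPy's `scipy.stats.ranksums` offers a similar normal-approximation test.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Tied values receive average ranks; no tie correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda item: item[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, 2 * (1 - Phi(|z|))
    return z, p
```

With 30 runs per algorithm, a pair with p < 0.001 would be marked + in a table like Table 6. The sign of z indicates which algorithm tends to score higher.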

5 Conclusions and future studies

This paper proposed a memetic gravitation search algorithm (MGSA) for solving DFA problems. The algorithm converts continuous position values into fragment sequences by the SPV rule, initializes the population by tabu search, and adopts simulated annealing within VNS as the local search method. These adaptations balance exploitation and exploration. The simulation results demonstrate that the proposed MGSA optimizes the overlap score on 19 benchmark instances. One drawback of MGSA is its higher computation time compared with existing algorithms. In future studies, we will consider two approaches for reducing the computation time, DNA sequence compression and fuzzy entropy. We will also adapt MGSA to the de Bruijn graph (DBG) model.


Acknowledgments

This work was supported in part by the National Science Council, Taiwan, R.O.C., under grants MOST 103-2221-E-006-145-MY3, and MOST 103-2221-E-006-181.

References

1. Zhang, Fu and Zhang, DNA cryptography based on DNA fragment assembly, In Information Science and Digital Content Technology (ICIDT), 2012 8th International Conference on, volume 1, 2012, pp. 179–182.

2. Wang R.-Y., Shi Z.-Y., Guo Y.-Y., Chen J.-C. and Chen G.-Q., DNA fragments assembly based on nicking enzyme system, PLoS One 8(3) (2013), e57943.

3. Pevzner P.A., Computational molecular biology - an algorithmic approach, MIT Press, 2000.

4. Kato and Hasegawa, Performance of heuristic methods driven by chaotic dynamics for ATSP and applications to DNA fragment assembly, Nonlinear Theory and Its Applications, IEICE 2(4) (2011), 485–496.

5. Sanger, Coulson A.R., Hong G.F., Hill D.F. and Petersen G.B., Nucleotide sequence of bacteriophage lambda DNA, Journal of Molecular Biology 162(4) (1982), 729–773.

6. Hassanien A.E., Al-Shammari E.T. and Ghali N.I., Computational intelligence techniques in bioinformatics, Computational Biology and Chemistry 47 (2013), 37–47.

7. Green P., PHRAP, 1994.

8. Bonfield J.K., Smith K.F. and Staden, A new DNA sequence assembly program, Nucleic Acids Res 23 (1995), 4992–4999.

9. Sutton G.G., White, Adams M.D. and Kerlavage A.R., TIGR Assembler: A new tool for assembling large shotgun sequencing projects, Genome Science and Technology 1(1) (1995), 9–19.

10. Huang and Madan, CAP3: A DNA sequence assembly program, Genome Research 9(9) (1999), 868–877.

11. Chen and Skiena S.S., A case study in genome-level fragment assembly, Bioinformatics 16 (2000), 494–500.

12. Myers E.W., A whole-genome assembly of Drosophila, 2000, pp. 2196–2204.

13. Batzoglou, Jaffe D.B., Stanley, Butler, Gnerre, Mauceli, Berger, Mesirov J.P. and Lander E.S., ARACHNE: A whole-genome shotgun assembler, Genome Res 12(1) (2002), 177–189.

14. Gendreau and Potvin J.-Y., Handbook of Metaheuristics, Springer Publishing Company, Incorporated, 2nd edition, 2010.

15. Schmitt K.R.B., Zimin A.V., Marçais, Yorke J.A. and Girvan, A hierarchical network heuristic for solving the orientation problem in genome assembly, ArXiv E-Prints (2013).

16. Alba, Luque and Khuri, Assembling DNA fragments with parallel algorithms, In IEEE Congress on Evolutionary Computation, volume 1, 2005, pp. 57–64.

17. Minetti, Leguizamón and Alba, An improved trajectory-based hybrid metaheuristic applied to the noisy DNA fragment assembly problem, Information Sciences (2014), 273–283.

18. Błażewicz, Formanowicz, Kasprzak M., Markiewicz and Węglarz, Tabu search for DNA sequencing with false negatives and false positives, European Journal of Operational Research 125(2) (2000), 257–265.

19. Wang, Fang S.C. and Zhong, A genetic algorithm approach to solving DNA fragment assembly problem, Journal of Computational and Theoretical Nanoscience 2 (2005), 499–505.

20. Alba and Luque, A hybrid genetic algorithm for the DNA fragment assembly problem, In Recent Advances in Evolutionary Computation for Combinatorial Optimization, 2008, pp. 101–112.

21. Minetti, Alba and Luque, Seeding strategies and recombination operators for solving the DNA fragment assembly problem, Information Processing Letters 108(3) (2008), 94–100.

22. Luque and Alba, Parallel GAs in bioinformatics: Assembling DNA fragments, Studies in Computational Intelligence 367(9) (2011), 135–147.

23. Hughes, Houghten, Mallen-Fullerton G.M. and Ashlock, Recentering and restarting genetic algorithm variations for DNA fragment assembly, In Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on, 2014, pp. 1–8.

24. Hughes, A study of ordered gene problems featuring DNA error correction and DNA fragment assembly with a variety of heuristics, genetic algorithm variations, and dynamic representations, 2014.

25. Rathee and Kumar T.V.V., DNA fragment assembly using multi-objective genetic algorithms, International Journal of Applied Evolutionary Computation 5(3) (2014), 84–108.

26. Wetcharaporn, Chaiyaratana and Tongsima, DNA fragment assembly by ant colony and nearest neighbour heuristics, In Proceedings 8th International Conference Artificial Intelligence and Soft Computing, volume 4029, 2006.

27. Ibrahim and Kurniawan T.B., Implementation of an ant colony system for DNA sequence optimization, Journal of Artif Life Robotics (2009), 293–296.

28. Baidya and RajatKumar, A novel locally guided genome reassembling technique using an artificial ant system, Applied Intelligence 43(2) (2015), 397–411.

29.

Firoz

J.S.

, Rahman

M.S.

and Saha

T.K.

, Bee algorithms for solving DNA fragment assembly problem with noisy and noiseless data. In Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, 2012, pp. 201–208.

30.

Verma

R.S.

, Singh

and Kumar

, Dna sequence assembly using particle swarm optimization, International Journal of Computer Applications28(10) (2011), 33–38.

31.

Firoz

J.S.

, Rahman

M.S.

and Saha

T.K.

, Hybrid metaheuristics for DNA fragment assembly problem for noiseless data. In Informatics, Electronics Vision (ICIEV), 2012 International Conference on, 2012, pp. 652–656.

32.

Mallén-Fullerton

G.M.

and Fernandez-Anaya

, DNA fragment asbly using optimization. In Evolutionary Computation (CEC), 2013 IEEE Congress on, 2013, pp. 1570–1577.

33.

Huang

K.-W.

, Chen

J.-L.

, Yang

C.-S.

and Tsai

C.-W.

, A memetic particle swarm optimization algorithm for solving the dna fragment assembly problem, pp, Neural Computing and Applications, 2014, pp. 1–12.

34.

Rajagopal

and Sankareswaran

U.M.

, An adaptive particle swarm optimization algorithm for solving dna fragment assembly problem, Current Bioinformatics10(1) (2015), 97–105.

35.

Nazri

M.Z.A.

, Huri

M.D.

, Bakar

A.A.

, Abdullah

, Dan

M.A.

and Kurniawan

T.B.

, DNA sequence design using artificial immune systems, Journal of Engineering and Applied Sciences8(2) (2013), 49–57.

36.

Kartous

and Chikhi

, Improved cuckoo search algorithm for dna fragment assembly problem, Networking and Advanced Systems, 2015, p. 117.

37.

Indumathy

, Maheswari

and Subashini

, Nature-inspired novel cuckoo search algorithm for genome sequence assembly, Sadhana40(1) (2015), 1–14.

38. Vidal and Olivera, A parallel discrete firefly algorithm on GPU for permutation combinatorial optimization problems, 485 (2014), 191–205.

39. Nebro A.J., Luque, Luna and Alba, DNA fragment assembly using a grid-based genetic algorithm, Computers & Operations Research 35(9) (2008), 2776–2790.

40. Luque, Dorronsoro, Alba and Bouvry, A self-adaptive cellular memetic algorithm for the DNA fragment assembly problem. In IEEE Congress on Evolutionary Computation, 2008, pp. 2651–2658.

41. Nemati, Basiri M.E., Ghasem-Aghaee and Aghdam M.H., A novel ACO-GA hybrid algorithm for feature selection in protein function prediction, Expert Systems with Applications 36(10) (2009), 12086–12094.

42. Dorronsoro, Bouvry and Alba, Iterated local search for de novo genomic sequencing. In Rutkowski L., Scherer R., Tadeusiewicz R., Zadeh L.A. and Zurada J.M., editors, Artificial Intelligence and Soft Computing, volume 6114 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2010, pp. 428–436.

43. Kubalik, Buryan and Wagner, Solving the DNA fragment assembly problem efficiently using iterative optimization with evolved hypermutations. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO’10, 2010, pp. 213–214.

44. Mallén-Fullerton G.M., Hughes J.A., Houghten and Fernández-Anaya, Benchmark datasets for the DNA fragment assembly problem, International Journal of Bio-Inspired Computation 5(6) (2013), 384–394.

45. Compeau P.E.C., Pevzner P.A. and Tesler, How to apply de Bruijn graphs to genome assembly, Nature Biotechnology 29(11) (2011), 987–991.

46. Miller J.R., Koren and Sutton, Assembly algorithms for next-generation sequencing data, Genomics 95(6) (2010), 315–327.

47. , Chen, Mu, Yuan, Shi, Zhang, Gan, Li, Hu, Liu, et al., Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-Bruijn-graph, Briefings in Functional Genomics 11(1) (2012), 25–37.

48. Pevzner P.A., Tang and Waterman M.S., An Eulerian path approach to DNA fragment assembly, Proceedings of the National Academy of Sciences 98(17) (2001), 9748–9753.

49. Simpson, Wong, Jackman, Schein, Jones and Birol, ABySS: A parallel assembler for short read sequence data, Genome Research 19 (2009), 1117.

50. Georganas, Buluc, Chapman, Oliker, Rokhsar and Yelick, Parallel de Bruijn graph construction and traversal for de novo genome assembly. In High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for, 2014, pp. 437–448.

51. Couto A.D., Ribeiro Cerqueira, Guerra R.L., Goncalves L.B., De Castro Goulart, Siqueira-Batista, Dos Santos Ferreira and De Paiva Oliveira, Theoretical basis of a new method for DNA fragment assembly in k-mer graphs. In Chilean Computer Science Society (SCCC), 2012 31st International Conference of the, 2012, pp. 69–77.

52. Zhao, Liu, Voss and Muller-Wittig, A dynamic hashing approach to build the de Bruijn graph for genome assembly. In TENCON 2013 - 2013 IEEE Region 10 Conference, 2013, pp. 1–4.

53. Prjibelski A.D., Vasilinetc, Bankevich, Gurevich, Krivosheeva, Nurk, Pham, Korobeynikov, Lapidus and Pevzner P.A., ExSPAnder: A universal repeat resolver for DNA fragment assembly, Bioinformatics 30(12) (2014), 293–301.

54. Gritsenko A.A., Nijkamp J.F., Reinders M.J.T. and de Ridder, GRASS: A generic algorithm for scaffolding next-generation sequencing assemblies, Bioinformatics 28(11) (2012), 1429–1437.

55. Zimin A.V., Marçais, Puiu, Roberts, Salzberg S.L. and Yorke J.A., The MaSuRCA genome assembler, Bioinformatics 29(21) (2013), 2669–2677.

56. , Gao and Chunyan, An efficient algorithm for DNA fragment assembly in MapReduce, Biochemical and Biophysical Research Communications 426(3) (2012), 395–398.

57. Hassan, Majid Z.A., Halim A.K. and Ibrahim, Design and development of DNA fragment assembly using IWP method. In Control and System Graduate Research Colloquium (ICSGRC), 2013 IEEE 4th, 2013, pp. 63–68.

58. Wang X.-Z. and Dong C.-R., Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy, IEEE Transactions on Fuzzy Systems 17(3) (2009), 556–567.

59. Zhu, Zhou, Ji and Shi Y.-H., DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm, IEEE Transactions on Evolutionary Computation 15(5) (2011), 643–658.

60. Zhai J.-H., Xu H.-Y. and Xi-Zhao, Dynamic ensemble extreme learning machine based on sample entropy, Soft Computing 16(9) (2012), 1493–1502.

61. Rashedi, Nezamabadi-Pour and Saryazdi, GSA: A gravitational search algorithm, Information Sciences 179(13) (2009), 2232–2248.

62. Kennedy and Eberhart, Particle swarm optimization. In IEEE International Conference on Neural Networks 4 (1995), 1942–1948.

63. Fatih Tasgetiren, Liang Y.-C., Sevkli and Gencyilmaz, A particle swarm optimization algorithm for makespan and total flowtime minimization in the permutation flowshop sequencing problem, European Journal of Operational Research 177(3) (2007), 1930–1947.

64. Glover and Laguna, Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1997.

65. Kirkpatrick, Gelatt C.D. and Vecchi M.P., Optimization by simulated annealing, Science 220 (1983), 671–680.

66. Dawkins, The Selfish Gene. Oxford University Press, Oxford, UK, 1976.

67. Moscato, On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms, 1989.

68. Moscato and Cotta, A gentle introduction to memetic algorithms, 57 (2003), 105–144.

69. Krasnogor, Aragón and Pacheco, Memetic algorithms, 36 (2006), 225–248.

70. Alimehr, The performance of sequence alignment algorithms. Master’s thesis, Uppsala University, Department of Information Technology, 2013.

71. Metropolis, Rosenbluth A.W., Rosenbluth M.N., Teller A.H. and Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21(6) (1953), 1087–1092.

72. Parsons R.J., Forrest and Burks, Genetic algorithms, operators, and DNA fragment assembly, Machine Learning 21(1–2) (1995), 11–33.

73. Coull S.E. and Szymanski B.K., Sequence alignment for masquerade detection, Computational Statistics & Data Analysis 52(8) (2008), 4116–4131.

74. Gao, Vairappan, Wang, Cao and Tang, Gravitational search algorithm combined with chaos for unconstrained numerical optimization, Applied Mathematics and Computation 231 (2014), 48–62.

75. Khajehzadeh, Taha M.R., El-Shafie and Eslami, A modified gravitational search algorithm for slope stability analysis, Engineering Applications of Artificial Intelligence 25(8) (2012), 1589–1597.

76. Engle M.L. and Burks, Artificially generated data sets for testing DNA sequence assembly algorithms, Genomics 16(1) (1993), 286–288.

77. Derrac, García, Molina and Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm and Evolutionary Computation 1 (2011), 3–18.