Protein Folding Prediction in a Cubic Lattice in Hydrophobic-Polar Model

Abstract

The tertiary structure of a protein determines its function. Therefore, predicting the tertiary structure of a protein from its primary amino acid sequence has long been one of the most important and challenging problems in biochemistry, molecular biology, and biophysics. One of the most popular protein structure prediction models, the Hydrophobic-Polar (HP) model, is based on the observation that, in a polar environment, hydrophobic amino acids gather in the core of the molecule, in contact with each other, while the more polar amino acids are in contact with the polar environment. In this study, we present a new mixed integer programming formulation, an exact algorithm, and two heuristic algorithms for the protein folding problem, stated as a combinatorial optimization problem on a simple cubic lattice. The results of computational runs on a set of benchmarks compare favorably with known algorithms for the 3D lattice HP model, such as genetic algorithms, the ant colony optimization algorithm, and the Monte Carlo algorithm.

1. Introduction

Predicting the tertiary structure of a protein from its amino acid sequence is one of the most important problems in computational biology. A protein's function depends on its tertiary structure, which in turn depends on its primary structure. Errors in the folding process produce proteins with abnormal conformations, which are a main cause of many diseases, such as cystic fibrosis, Alzheimer's disease, and mad cow disease. If we can predict the tertiary structures of proteins from their primary structure with high accuracy, we will be better able to treat these diseases. In addition, knowledge of protein tertiary structures can be used in drug design (Dill, 1985).

Determining the function of a protein from its amino acid sequence is one of the fundamental problems in computational biology, molecular biology, biochemistry, and physics. Even the experimental determination of protein conformations is often difficult and time-consuming. It is therefore common practice to predict tertiary structure with models that simplify the search space of possible conformations. These models reflect different global characteristics of protein structure.

The Hydrophobic-Polar (HP) model (Dill and Lau, 1989) describes a protein sequence on the basis of the fact that hydrophobic amino acids tend to have less contact with water than polar amino acids. This leads to the formation of a hydrophobic core in the tertiary structure of proteins.

In the HP model, the optimal conformation is the one with the maximum number of contacts between hydrophobic amino acids, which corresponds to the lowest energy value (Fig. 1).

FIG. 1.

Optimal conformation for an HP sequence of length 36 in (a) a 2D lattice (14 contacts) and (b) a 3D lattice (18 contacts). HP, Hydrophobic-Polar.

It has been proved that the protein folding problem in the HP model is NP-hard in both 2D and 3D (Berger and Leighton, 1998; Blazewick et al., 2004).

In 2D, the heuristic algorithm of Traykov et al. (2016) generates folds that are better than those obtained by approximate algorithms such as the Monte Carlo algorithm, Newman's algorithm, and the Hart–Istrail algorithm, and close to those obtained by the Mixed Search algorithm and the Genetic algorithm (Toma and Toma, 1996; Istrail et al., 2000; Chen and Huang, 2005; Istrail and Lam, 2009; Custódio et al., 2004) (Fig. 2).

FIG. 2.

Protein folds for a sequence of length 36 in 2D: (a) Hart-Istrail algorithm (Istrail and Lam, 2009), (b) Newman's algorithm (Istrail and Lam, 2009), and (c) Traykov et al. (2016) algorithm. Light = hydrophobic; dark = hydrophilic.

Figures 2a and 2b are taken from the survey by Istrail and Lam (2009), which we recommend.

The structures above, called self-avoiding paths (walks), are drawn on the square lattice Z² without overlaps, and only hydrophobic interactions are modeled. In the sections below, we (i) formalize the problem, (ii) state it as a problem on a so-called alignment graph and give a mixed integer linear programming model, (iii) describe new heuristics for the cubic lattice case, and (iv) present results from computational runs.

2. Folds in HP Model

The processes involved in protein folding are very complex, and only a few of them have been explained and understood by scientists. For this reason, simplified models, such as Dill's HP model, have become one of the main tools for studying proteins (Dill and Lau, 1989). The HP model is based on the observation that the hydrophobic interaction between amino acids is the driving force of protein folding. In this model, the 20 amino acids are reduced to two types: H (hydrophobic) and P (hydrophilic). The energy of a conformation is defined as the negative of the number of contacts between hydrophobic amino acids (H-H contacts) that are not neighbors in the protein sequence. The optimal conformation is the one with the lowest energy value.
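The energy definition above can be sketched in a few lines of Python. The sketch assumes that a fold is stored as a list of integer (x, y, z) lattice coordinates, one per residue; this representation and the helper name `hp_energy` are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

def hp_energy(seq: str, coords: List[Coord]) -> int:
    """Return minus the number of H-H contacts of a fold (hypothetical helper).

    A contact is a pair of H residues on adjacent lattice points that are
    not neighbors in the sequence. seq is a string over {"H", "P"}.
    """
    if len(seq) != len(coords):
        raise ValueError("one lattice point per residue is required")
    residue_at = {c: k for k, c in enumerate(coords)}  # lattice point -> residue index
    if len(residue_at) != len(coords):
        raise ValueError("two residues share a lattice point")
    contacts = 0
    for (x, y, z), k in residue_at.items():
        # Looking only in the +x, +y, +z directions counts each adjacent pair once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = residue_at.get((x + dx, y + dy, z + dz))
            if j is not None and abs(j - k) > 1 and seq[k] == seq[j] == "H":
                contacts += 1
    return -contacts
```

For example, the four residues of `"HHHH"` placed on the corners of a unit square, `[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]`, give an energy of -1: only residues 1 and 4 form a non-consecutive contact.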

In lattice models, protein conformations in the HP model are restricted to self-avoiding paths. A self-avoiding path is a sequence of moves on the lattice that never visits the same position more than once. In a cubic lattice, each move goes from a node (x, y, z) to one of its six neighbors (x ± 1, y, z), (x, y ± 1, z), or (x, y, z ± 1). The goal is to find the path that minimizes the energy. Figure 3 shows a schematic representation of a self-avoiding path in a cubic lattice (Zhang and Skolnick, 2004; Yanev et al., 2011).

FIG. 3.

Three-dimensional lattice in the HP model (Thilagavathi and Amudha, 2015).

Briefly, the HP model of protein folding is as follows: given an amino acid sequence S = s1, s2, …, sn (a sequence of letters over the alphabet {H, P}) and a lattice, find a conformation of S (an alignment of the letters with a subset of the lattice points) with the lowest energy value, that is,

Maximize:

The number of H-H contacts

Subject to:

1. Assignment: Each amino acid must occupy one lattice point.

2. Nonoverlapping: No two amino acids may share the same lattice point.

3. Connectivity: Any two amino acids that are consecutive in the protein sequence must occupy adjacent lattice points.
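For very short sequences, the optimization problem above can be solved by plain exhaustive search over self-avoiding walks, which makes the three constraints concrete. This is a brute-force sketch for illustration only, not the exact algorithm of this paper (which relies on the integer programming model of Section 3); its running time grows exponentially with n, and the name `max_contacts` is hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def max_contacts(seq: str) -> int:
    """Maximum number of H-H contacts of seq on the cubic lattice, by exhaustive search."""
    n = len(seq)
    if n < 2:
        return 0
    # Assignment: residue -> lattice point. The first bond is fixed by symmetry.
    occupied: Dict[Coord, int] = {(0, 0, 0): 0, (1, 0, 0): 1}
    best = 0

    def extend(k: int, last: Coord, contacts: int) -> None:
        nonlocal best
        if k == n:
            best = max(best, contacts)
            return
        for dx, dy, dz in STEPS:  # connectivity: residue k goes next to residue k-1
            p = (last[0] + dx, last[1] + dy, last[2] + dz)
            if p in occupied:  # nonoverlapping constraint
                continue
            gained = 0
            if seq[k] == "H":
                for ex, ey, ez in STEPS:
                    j = occupied.get((p[0] + ex, p[1] + ey, p[2] + ez))
                    # H neighbors placed earlier, excluding the chain predecessor k-1
                    if j is not None and j < k - 1 and seq[j] == "H":
                        gained += 1
            occupied[p] = k
            extend(k + 1, p, contacts + gained)
            del occupied[p]

    extend(2, (1, 0, 0), 0)
    return best
```

For instance, `max_contacts("HHHHHHHH")` returns 5: the eight residues can fill a 2 × 2 × 2 cube, whose 12 lattice edges include the 7 chain bonds.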

A number of optimization algorithms have been proposed for solving the protein folding problem in the HP model, including Evolutionary algorithms (Krasnogor et al., 1998), Monte Carlo algorithms (Liang and Wong, 2001), and Ant-Colony Optimization algorithms (ACOs) (Shmygelska and Hoos, 2005; Thilagavathi and Amudha, 2015; Ramyachitra and Veeralakshmi, 2014).

3. An Integer Programming Formulation

The key ingredients of the problem are an (arbitrary) sublattice and a sequence of H and P letters; below, they are used to create a problem on graphs that can be solved either as an integer programming problem or by means of graph theory alone. [In the 2D case, the sublattice is a square, and in 3D, it is a cube whose nodes are painted alternately black (set V_b) and white (set V_w).] Let G_c = (V_c, E_c) be a graph with $V_c = V_b \cup V_w = \{1, 2, \ldots, m\}$, where node i corresponds to the ith node of the sublattice (under an arbitrary numeration of the sublattice nodes), and (i, j) ∈ E_c if i and j are neighbors in the sublattice. The simple paths in G_c (each node is visited at most once) are called self-avoiding paths.
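A minimal sketch of how G_c might be built for a box-shaped sublattice 0 ≤ x ≤ d1, 0 ≤ y ≤ d2, 0 ≤ z ≤ d3 (the box form used in Section 4). Numbering the nodes in lexicographic order of their coordinates and calling a node black when x + y + z is even are our own conventions; any numeration and either color choice would do. The function name `cubic_sublattice` is hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

def cubic_sublattice(d1: int, d2: int, d3: int) -> Tuple[
        List[Tuple[int, int, int]], List[Tuple[int, int]], Set[int], Set[int]]:
    """Build G_c = (V_c, E_c) and its black/white coloring.

    Returns (coords, edges, black, white): node i (1-based) sits at coords[i - 1],
    edges are pairs (i, j) with i < j of lattice neighbors.
    """
    coords = list(product(range(d1 + 1), range(d2 + 1), range(d3 + 1)))
    number = {c: i for i, c in enumerate(coords, start=1)}  # one arbitrary numeration
    edges = []
    for c, i in number.items():
        for axis in range(3):  # neighbor one step further along each axis
            nb = list(c)
            nb[axis] += 1
            j = number.get(tuple(nb))
            if j is not None:
                edges.append((min(i, j), max(i, j)))
    black = {i for c, i in number.items() if sum(c) % 2 == 0}
    white = set(number.values()) - black
    return coords, edges, black, white
```

Because every lattice step changes x + y + z by exactly one, every edge joins a black node to a white one, which is the bipartite structure the model exploits below.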

Let S be a sequence of n letters over the alphabet {0, 1} (0 for P and 1 for H). Let G_S = (V_S, E_S) be the graph associated with S, with node set V_S = {1, 2, …, n} and (i, j) ∈ E_S if and only if |i − j| ≥ 2 and S[i] = S[j] = 1. Let G be the complete bipartite graph whose two parts are V_S and V_c. A matching M = {e_1, e_2, …, e_n} of G with |M| = n (in this case, a one-to-one mapping of V_S into V_c) is feasible if the nodes of V_c matched to 1, 2, …, n, in this order, form a self-avoiding path. For matching edges e_i = (i, k) and e_j = (j, l), define z(e_i, e_j) = z_ikjl = 1 if (i, j) ∈ E_S and (k, l) ∈ E_c, and z_ikjl = 0 otherwise. Finally, define

$$v(M) = \sum_{e_i, e_j \in M,\ i < j} z(e_i, e_j). \qquad (1a)$$

Let $\tilde{M}$ be the set of feasible matchings. Then, the problem of finding the optimal folding over the chosen lattice is as follows:

Folding problem on graph: find $v = \max_{M \in \tilde{M}} v(M)$.
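The objective v(M) can be evaluated directly from these definitions. In this sketch, S is a 0/1 string as defined above, the matching is given as a mapping from sequence position to lattice node, and each unordered pair {i, j} of E_S is counted once; the helper names `contact_edges` and `v_of_matching` are hypothetical, and feasibility of the matching is not checked.

```python
from typing import Dict, Iterable, Set, Tuple

def contact_edges(S: str) -> Set[Tuple[int, int]]:
    """E_S for a 0/1 string S: pairs (i, j), i < j, with j - i >= 2 and S[i] = S[j] = 1."""
    ones = [i for i, ch in enumerate(S, start=1) if ch == "1"]
    return {(i, j) for i in ones for j in ones if j - i >= 2}

def v_of_matching(S: str, match: Dict[int, int], E_c: Iterable[Tuple[int, int]]) -> int:
    """v(M): number of pairs in E_S whose matched lattice nodes are adjacent in G_c.

    match[i] is the lattice node assigned to sequence position i (1-based).
    """
    lattice_edges = {frozenset(e) for e in E_c}
    # Each (i, j) in E_S contributes z = 1 when its images form an edge of G_c.
    return sum(frozenset((match[i], match[j])) in lattice_edges
               for i, j in contact_edges(S))
```

For example, with the four nodes of a unit square (E_c = {(1, 2), (2, 3), (3, 4), (1, 4)}) and the identity matching, S = "1111" gives v(M) = 1: of the pairs (1, 3), (1, 4), and (2, 4) in E_S, only (1, 4) lands on a lattice edge.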

Converting the HP problem into an optimization problem on graphs allows various integer programming models to be built. Most of them introduce binary variables x_ik, with x_ik = 1 if lattice node i hosts sequence element k, that model the feasible matchings above as 0–1 solutions of simple linear constraints:

Assignment:

$$\sum_{i=1}^{m} x_{ik} = 1, \quad k = 1, \ldots, n. \qquad (2)$$

Nonoverlapping:

$$\sum_{k=1}^{n} x_{ik} \le 1, \quad i = 1, \ldots, m. \qquad (3)$$

The objective function can be expressed by linearizing the products $x_{ik} x_{jl}$ (one product per z arc) and/or by partitioning the sum of z into subsums in different ways. We do not present all possible integer programming models, but for the sake of completeness, we add the following constraints to finish modeling the self-avoiding paths:

$$x_{ik} \le \sum_{j \in n(i)} x_{j,k+1}, \quad i = 1, \ldots, m;\ k = 1, \ldots, n-1, \qquad (4)$$

where n(i) is the set of neighbors of node i in G_c.
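To make constraints (2)–(4) concrete, the following sketch checks whether a given 0–1 assignment satisfies them. It reads x_ik = 1 as "node i hosts element k", the reading implied by (2) and (3); the function name `satisfies_2_to_4` and the dictionary representation of x are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def satisfies_2_to_4(x: Dict[Tuple[int, int], int], m: int, n: int,
                     nbrs: Dict[int, List[int]]) -> bool:
    """Check a 0-1 assignment against constraints (2)-(4).

    x[(i, k)] = 1 means lattice node i hosts sequence element k (missing keys
    count as 0); nbrs[i] is the neighbor set n(i) of node i, for i = 1..m.
    """
    def val(i: int, k: int) -> int:
        return x.get((i, k), 0)

    nodes, elements = range(1, m + 1), range(1, n + 1)
    # (2) assignment: each element k is placed on exactly one node
    if any(sum(val(i, k) for i in nodes) != 1 for k in elements):
        return False
    # (3) nonoverlapping: each node hosts at most one element
    if any(sum(val(i, k) for k in elements) > 1 for i in nodes):
        return False
    # (4) connectivity: element k on node i forces element k+1 onto a neighbor of i
    return all(val(i, k) <= sum(val(j, k + 1) for j in nbrs[i])
               for i in nodes for k in range(1, n))
```

Together, (2)–(4) say that the elements occupy distinct nodes and that consecutive elements sit on adjacent nodes, which is exactly a self-avoiding path.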

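An easily derived bound on v follows from a parity observation. Let ODD be the set of odd i such that S[i] = 1, and let EVEN be the set of even i such that S[i] = 1. Since the sublattice is bipartite and consecutive elements of S occupy adjacent nodes, colors alternate along any self-avoiding path; hence, without loss of generality, even elements of S are assigned to black nodes and odd elements to white ones, and z_ikjl can be equal to 1 only for even–odd pairs i, j. Each node has at most 2 (square lattice) or 4 (cubic lattice) neighbors other than its chain neighbors, so C_2D(S) = 2 × min{|ODD|, |EVEN|} [C_3D(S) = 4 × min{|ODD|, |EVEN|} in 3D] is an upper bound on v, apart from one additional possible contact for each H at an end of the chain, since an end element has only one chain neighbor.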
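The bound can be computed with a single pass over S. This sketch reproduces C_2D(S) and C_3D(S) from the text; the `count_ends` option, which adds the extra contact available to an H at either end of the chain, is our own addition, and the function name is hypothetical.

```python
def hp_upper_bound(seq: str, dim: int = 3, count_ends: bool = False) -> int:
    """Parity bound on the number of H-H contacts (1-based positions).

    count_ends=False gives C_2D(S) = 2*min(|ODD|, |EVEN|) or C_3D(S) = 4*min(...);
    count_ends=True also allows one extra contact for an H at either chain end.
    """
    free = {2: 2, 3: 4}[dim]  # lattice neighbors not taken by chain neighbors
    n = len(seq)
    slots = {0: 0, 1: 0}  # position parity -> contact slots available to that class
    for i, ch in enumerate(seq, start=1):
        if ch == "H":
            slots[i % 2] += free + (1 if count_ends and i in (1, n) else 0)
    # Every contact joins an even and an odd H, so it uses a slot on each side.
    return min(slots.values())
```

For sequence 1 of Table 1 (length 20, five H at odd and five H at even positions), C_3D(S) = 20, well above the best known value of 11 contacts reported in Table 2.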

Returning to the HP folding problem and its conversion into the problem of finding a matching that maximizes the number of overlapping edges, one can see a strong similarity with another problem, known as Contact Map Overlap, which (in our case) reads as follows: given two graphs G_S and G_c, find an embedding (matching) of V_S into V_c that maximizes the number of common (overlapping) edges.

A suitable platform for building integer programming models is the so-called alignment graph $G = (V_c \times V_S, E)$ with $E = \{((i, k), (j, l)) : (i, j) \in E_c,\ (k, l) \in E_S\}$. If we call the arcs in E z arcs, then the problem is to find the path in G that activates the maximum number of z arcs. By decomposing the sum (1a), we can obtain different integer programming models, such as the following one. Without loss of generality, assume that the set EVEN has the smaller cardinality and that its elements are assigned to V_b. Let y_ij ∈ {0, 1}, i ∈ V_b, j ∈ V_w, correspond to the sum of the z arcs between rows i and j (Fig. 4). Then the problem is to maximize

$$\sum_{(i, j) \in E_c,\ i \in V_b,\ j \in V_w} y_{ij}, \qquad (1)$$

FIG. 4.

Partial subgraph of the alignment graph G. H = hydrophobic; P = hydrophilic.

subject to the following constraints, in addition to Eqs. (2)–(4), which allow y_ij to be equal to 1:

$$v_i = \sum_{k \in EVEN} x_{ik}, \quad i \in V_b, \qquad (5)$$

$$v_i = \sum_{k \in ODD} x_{ik}, \quad i \in V_w, \qquad (6)$$

$$y_{ij} \le v_i, \quad y_{ij} \le v_j, \quad i \in V_b,\ j \in V_w. \qquad (7)$$

The partial subgraph of G shown in Figure 4 is intended to clarify the relation of y_ij to the z arcs (contacts): y_ij accounts for the sum of these arcs, which can be 0 or 1. The model above also counts the contacts between consecutive H elements, but for a given input sequence this only adds a constant (the number of consecutive H-H pairs in S) to the previously defined energy. The contacts reported in Table 2 are between even–odd nonconsecutive H elements only. In this toy example, the sets V_w and V_b are {2, 4, 6, 8} and {1, 3, 5, 9}, corresponding to the black/white coloring of the sublattice nodes (from the left).
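The constant offset can be checked on any fold. At an optimum of Model (1)–(7), y_ij = 1 exactly for the lattice edges whose two nodes both host H, so the sum of y_ij counts every H-H lattice neighbor pair, consecutive ones included. The sketch below assumes a fold given as (x, y, z) coordinates, and its function names are hypothetical; it shows that subtracting the number of consecutive H-H pairs recovers the contact count.

```python
from typing import List, Tuple

def y_sum_and_contacts(seq: str, coords: List[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Return (sum of y_ij, number of H-H contacts) for a fold given by coordinates."""
    residue_at = {c: k for k, c in enumerate(coords)}
    y_sum = contacts = 0
    for (cx, cy, cz), k in residue_at.items():
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):  # each lattice edge once
            j = residue_at.get((cx + dx, cy + dy, cz + dz))
            if j is not None and seq[k] == seq[j] == "H":
                y_sum += 1  # y_ij = 1: both end nodes host H
                if abs(j - k) > 1:
                    contacts += 1  # nonconsecutive: a contact in the HP sense
    return y_sum, contacts

def consecutive_hh(seq: str) -> int:
    """The constant offset: number of positions k with S[k] = S[k+1] = H."""
    return sum(a == b == "H" for a, b in zip(seq, seq[1:]))
```

For `"HHHH"` on the corners of a unit square, the sum of y_ij is 4, there are 3 consecutive H-H pairs, and 4 − 3 = 1 contact remains.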

The model above is a mixed integer linear programming problem with binary x_ik; appropriate solvers such as CPLEX (www.ibm.com/software/products/en/ibmilogcpleoptistud) or GUROBI (www.gurobi.com/documentation) can find optimal folds for sequences of up to 100 elements on a computer with average capabilities. A challenge for the reader is to prove that the number of binary variables can be reduced from 0.5nm to 0.25nm.

4. Algorithms for Solving the Problem

The main components of the algorithms outlined below are as follows:

• a model builder, which creates an LP file for Model (1)–(7)

• a MIP (mixed integer programming) solver such as CPLEX or GUROBI
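The paper does not list its model builder, so the following is only a hypothetical sketch of one that writes Model (1)–(7) in the LP text format read by CPLEX and Gurobi. Assumptions, all ours: the sublattice is the box of the next paragraph, even positions go to black nodes (x + y + z even) and odd positions to white ones, x_ik is created only for such parity-compatible pairs (about 0.5nm binaries), and y and v are left continuous, since at an optimum y_ij = min(v_i, v_j) is 0 or 1 anyway.

```python
from itertools import product
from typing import Dict, List, Tuple

def _wrap(tokens: List[str], per_line: int = 8) -> str:
    """Join tokens, starting a new indented line every per_line tokens."""
    return "\n   ".join(" ".join(tokens[s:s + per_line])
                         for s in range(0, len(tokens), per_line))

def _row(name: str, terms: List[str], sense: str, rhs: int) -> str:
    """One constraint; terms look like '+ x_1_2' or '- v_3'."""
    first = terms[0][2:] if terms[0].startswith("+ ") else terms[0]
    return f" {name}: {_wrap([first] + terms[1:])} {sense} {rhs}"

def build_lp(seq: str, dims: Tuple[int, int, int]) -> str:
    """Write Model (1)-(7) for an HP string on the box 0<=x<=d1, 0<=y<=d2, 0<=z<=d3."""
    n = len(seq)
    coords = list(product(*(range(d + 1) for d in dims)))
    if len(coords) < max(n, 2):
        raise ValueError("sublattice too small for the sequence")
    node = {c: i for i, c in enumerate(coords, start=1)}
    black = {i for c, i in node.items() if sum(c) % 2 == 0}
    nbrs: Dict[int, List[int]] = {i: [] for i in node.values()}
    edges: List[Tuple[int, int]] = []  # stored as (black node, white node)
    for c, i in node.items():
        for axis in range(3):
            nb = list(c)
            nb[axis] += 1
            j = node.get(tuple(nb))
            if j is not None:
                nbrs[i].append(j)
                nbrs[j].append(i)
                edges.append((i, j) if i in black else (j, i))

    def ok(i: int, k: int) -> bool:
        # Parity reduction: even elements on black nodes, odd elements on white nodes.
        return (i in black) == (k % 2 == 0)

    hydrophobic = [k for k in range(1, n + 1) if seq[k - 1] == "H"]
    rows: List[str] = []
    for k in range(1, n + 1):  # (2) assignment
        rows.append(_row(f"a{k}", [f"+ x_{i}_{k}" for i in nbrs if ok(i, k)], "=", 1))
    for i in nbrs:  # (3) nonoverlapping
        terms = [f"+ x_{i}_{k}" for k in range(1, n + 1) if ok(i, k)]
        if terms:
            rows.append(_row(f"o{i}", terms, "<=", 1))
    for i in nbrs:  # (4) connectivity, written as x_ik - sum_j x_j,k+1 <= 0
        for k in range(1, n):
            if ok(i, k):
                terms = [f"+ x_{i}_{k}"] + [f"- x_{j}_{k + 1}" for j in nbrs[i]]
                rows.append(_row(f"c{i}_{k}", terms, "<=", 0))
    for i in nbrs:  # (5)/(6): v_i = number of H of the node's parity class on node i
        terms = [f"+ v_{i}"] + [f"- x_{i}_{k}" for k in hydrophobic if ok(i, k)]
        rows.append(_row(f"v{i}", terms, "=", 0))
    for i, j in edges:  # (7): y_ij <= v_i and y_ij <= v_j
        rows.append(_row(f"b{i}_{j}", [f"+ y_{i}_{j}", f"- v_{i}"], "<=", 0))
        rows.append(_row(f"w{i}_{j}", [f"+ y_{i}_{j}", f"- v_{j}"], "<=", 0))
    obj_terms = [f"+ y_{i}_{j}" for i, j in edges]  # (1)
    obj_terms[0] = obj_terms[0][2:]
    binaries = [f"x_{i}_{k}" for i in nbrs for k in range(1, n + 1) if ok(i, k)]
    return "\n".join(["Maximize", " obj: " + _wrap(obj_terms), "Subject To", *rows,
                      "Binary", " " + _wrap(binaries), "End"]) + "\n"
```

The returned text can be saved to a file with the `.lp` extension and passed to the solver; long rows are wrapped across lines, which the LP format permits.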

The common input is a cubic sublattice d₁ ≥ x ≥ 0, d₂ ≥ y ≥ 0, d₃ ≥ z ≥ 0 related to the length n of the input HP sequence S. For the heuristics HE-1 and HE-2 (see below), S is split into segments of predefined lengths, that is, S = S₁ ∪ S₂ ∪ … ∪ S_k with S_i ∩ S_{i+1} = ∅.
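The text does not say how the segment lengths are chosen or how HE-1 and HE-2 fold and join the segments, so the sketch below stops at the split itself. It assumes equal-length consecutive segments (the last one possibly shorter); `split_segments` is a hypothetical name.

```python
from typing import List

def split_segments(seq: str, length: int) -> List[str]:
    """Split seq into consecutive, non-overlapping segments S_1, ..., S_k.

    All segments have the given length except possibly the last one.
    """
    if length < 1:
        raise ValueError("segment length must be positive")
    return [seq[s:s + length] for s in range(0, len(seq), length)]
```

Concatenating the segments in order gives back S, and no position belongs to two segments.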

The time complexity of these algorithms depends on the segment size, the lattice size, and the computer's capabilities. On average laptops, both heuristics can find low-energy folds for proteins of any length if the segment length is between 20 and 30. On such computers, the use of the EA is limited to proteins of up to 100 characters.

5. Computational Experiments

In this section, we compare our algorithms with a Genetic algorithm, an ACO algorithm, and an Evolutionary algorithm with Backtracking on the 3D lattice. For the computational experiments, we use eight HP sequences that are well-known benchmarks for the 3D lattice HP model (Jiang et al., 2003; Garza-Fabre et al., 2003) and one additional HP sequence of length 102, which we propose as a new benchmark (Table 1).

Table 1.

Hydrophobic-Polar Sequences for Three-Dimensional Lattice

Seq.	Length	Protein sequence
1	20	(HP)₂PH(HP)₂(PH)₂HP(PH)₂
2	24	H₂P₂(HP₂)₆H₂
3	25	P₂HP₂(H₂P₄)₃H₂
4	36	P(P₂H₂)₂P₅H₅(H₂P₂)₂P₂H(HP₂)₂
5	46	P₂H₃PH₃P₃HPH₂PH₂P₂HPH₄PHP₂H₅PHPH₂P₂H₂P
6	48	P₂H(P₂H₂)₂P₅H₁₀P₆(H₂P₂)₂HP₂H₅
7	50	H₂(PH)₃PH₄PH(P₃H)₂P₄(HP₃)₂HPH₄(PH)₃PH₂
8	60	P(PH₃)₂H₅P₃H₁₀PHP₃H₁₂P₄H₆PH₂PHP
9	102	PH₂P₅H₂P₂H₂PHP₂HP₇HP₃H₂PH₂P₆HP₂HPHP₂HP₅H₃P₄H₂PH₂P₅H₂P₄H₄PHP₈H₅P₂HP₂

In Table 1, the symbols H_i, P_i, and (…)_i denote i repetitions of the character or subsequence.
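The compressed notation of Table 1 can be expanded mechanically into plain H/P strings. The following parser is a sketch (its names are hypothetical); it accepts both Unicode subscripts, as printed in Table 1, and ordinary digits, and the `TABLE_1` dictionary copies the nine sequences keyed by their lengths.

```python
from typing import Dict, List

_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

def expand(notation: str) -> str:
    """Expand compressed HP notation such as '(HP)₂PH' into 'HPHPPH'."""
    text = notation.translate(_SUBSCRIPTS)
    pos = 0

    def read_count() -> int:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        return int(text[start:pos]) if pos > start else 1  # no digits means 1

    def parse() -> str:
        nonlocal pos
        parts: List[str] = []
        while pos < len(text) and text[pos] != ")":
            ch = text[pos]
            pos += 1
            if ch == "(":
                inner = parse()  # recursive: groups may be nested
                if pos >= len(text) or text[pos] != ")":
                    raise ValueError("missing ')'")
                pos += 1
                parts.append(inner * read_count())
            elif ch in "HP":
                parts.append(ch * read_count())
            else:
                raise ValueError(f"unexpected character {ch!r}")
        return "".join(parts)

    result = parse()
    if pos != len(text):
        raise ValueError("unmatched ')'")
    return result

TABLE_1: Dict[int, str] = {
    20: "(HP)₂PH(HP)₂(PH)₂HP(PH)₂",
    24: "H₂P₂(HP₂)₆H₂",
    25: "P₂HP₂(H₂P₄)₃H₂",
    36: "P(P₂H₂)₂P₅H₅(H₂P₂)₂P₂H(HP₂)₂",
    46: "P₂H₃PH₃P₃HPH₂PH₂P₂HPH₄PHP₂H₅PHPH₂P₂H₂P",
    48: "P₂H(P₂H₂)₂P₅H₁₀P₆(H₂P₂)₂HP₂H₅",
    50: "H₂(PH)₃PH₄PH(P₃H)₂P₄(HP₃)₂HPH₄(PH)₃PH₂",
    60: "P(PH₃)₂H₅P₃H₁₀PHP₃H₁₂P₄H₆PH₂PHP",
    102: "PH₂P₅H₂P₂H₂PHP₂HP₇HP₃H₂PH₂P₆HP₂HPHP₂HP₅H₃P₄H₂PH₂P₅H₂P₄H₄PHP₈H₅P₂HP₂",
}
```

Expanding every entry and comparing the result with its key is a quick consistency check of the table: each expanded sequence has exactly the length listed in Table 1.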

In Table 2, we compare the results obtained by our algorithms (columns EA [Exact algorithm], HE-1, and HE-2) with results reported in the literature for the meta-heuristic Ant-Colony Optimization algorithm (column ACO-metaheuristic) (Shmygelska and Hoos, 2005; Thilagavathi and Amudha, 2015), the Genetic algorithm (column GA) (Lin and Su, 2011), and the Evolutionary algorithm with Backtracking (column Backtracking-EA) (Cotta, 2003). The column BKS (best known solution) shows the best known solution for each of the tested HP sequences.

Table 2.

Computational Results Obtained for Nine Hydrophobic-Polar Sequences in Three-Dimensional Lattice in Hydrophobic-Polar Model

(All entries except Length are numbers of H-H contacts.)
Length	BKS	GA	Backtracking-EA	ACO-metaheuristic	EA	HE-1	HE-2
20	11	11	11	10	11	11	11
24	13	13	13	8	13	13	13
25	9	9	9	6	9	9	9
36	18	18	18	10	18	18	18
46	32	—	—	21	33^a	31	31
48	29	25	25	—	31^a	31^a	28
50	26	23	23	—	28^a	28^a	28^a
60	49	37	39	—	55^a	55^a	52^a
102	—	—	—	—	53	45	50

ACO, Ant-Colony Optimization algorithm; BKS, best known solution; EA, Exact algorithm; GA, Genetic algorithm; HE-1, Heuristic algorithm-1; HE-2, Heuristic algorithm-2.

^a Result that improves the BKS for the corresponding sequence.

Remark. The protein sequence of length 102 in Table 2 is a benchmark sequence that we propose; for this reason, the columns BKS, GA, Backtracking-EA, and ACO-metaheuristic contain no values for it.

The column EA in Table 2 shows the optimal number of contacts obtained by the EA. According to the table, HE-1 also reaches the optimal number of contacts for all tested sequences except those of length 46 and 102, and HE-2 reaches it for all sequences except those of length 46, 48, 60, and 102.

For the sequences of length 48, 50, and 60, HE-1 generates folds that improve the BKS (Fig. 7 shows the fold for length 60), and HE-2 does the same except for the sequence of length 48. Figure 5 presents the solution for the sequence of length 102 in analytical form, where $x/\bar{x}$, $y/\bar{y}$, or $z/\bar{z}$ denotes a step of plus/minus one in the corresponding coordinate relative to the previous amino acid.

FIG. 5.

Solution for the protein sequence of length 102.
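The move notation of Figure 5 can be turned back into lattice coordinates, which makes a reported fold easy to verify. Since the figure's exact text encoding is not reproduced here, this sketch assumes an ASCII stand-in: `x`, `y`, `z` for the +1 steps and `X`, `Y`, `Z` for the barred (−1) steps. The names `decode` and `contacts` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed ASCII encoding: lowercase = +1 step, uppercase = barred (-1) step.
STEP: Dict[str, Tuple[int, int, int]] = {
    "x": (1, 0, 0), "X": (-1, 0, 0),
    "y": (0, 1, 0), "Y": (0, -1, 0),
    "z": (0, 0, 1), "Z": (0, 0, -1),
}

def decode(moves: str) -> List[Tuple[int, int, int]]:
    """Turn n-1 relative moves into n lattice coordinates starting at the origin."""
    coords = [(0, 0, 0)]
    for m in moves:
        dx, dy, dz = STEP[m]
        x, y, z = coords[-1]
        coords.append((x + dx, y + dy, z + dz))
    if len(set(coords)) != len(coords):
        raise ValueError("moves do not describe a self-avoiding path")
    return coords

def contacts(seq: str, moves: str) -> int:
    """Number of nonconsecutive H-H lattice neighbors in the decoded fold."""
    if len(moves) != len(seq) - 1:
        raise ValueError("need exactly len(seq) - 1 moves")
    at = {c: k for k, c in enumerate(decode(moves))}
    total = 0
    for (x, y, z), k in at.items():
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):  # each edge once
            j = at.get((x + dx, y + dy, z + dz))
            if j is not None and abs(j - k) > 1 and seq[k] == seq[j] == "H":
                total += 1
    return total
```

For example, `contacts("HPPH", "xyX")` returns 1: the moves trace a unit square, bringing the two H ends together. Applied to a transcription of Figure 5 and the sequence of length 102, the same check would report the number of contacts of that fold.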

Figures 6 and 7 show the conformations obtained for the protein sequences of length 102 and 60, respectively.

FIG. 6.

Protein fold for the sequence of length 102 (53 contacts). Light = hydrophobic; dark = hydrophilic.

FIG. 7.

Protein fold for the sequence of length 60 (55 contacts). Light = hydrophobic; dark = hydrophilic.

The computational experiments were run on a laptop with an Intel(R) Core(TM) i7-3632QM processor (2.20 GHz, 6 MB cache) and 8 GB RAM.

6. Conclusion

A novel mixed integer programming model for the HP folding problem is proposed; it is successfully used in an exact algorithm for cubic lattices and as a component of efficient heuristics. Without any further improvements, the exact algorithm is capable of finding (using solvers such as CPLEX; www.ibm.com/software/products/en/ibmilogcpleoptistud) the optimal number of contacts for sequences of up to 100 HP characters. This limit is overcome by the heuristics, which (at least on the available benchmarks) produce near-optimal folds.


Acknowledgments

This work is partially supported by the project of the Bulgarian National Science Fund, entitled: “Bioinformatics research: protein folding, docking and prediction of biological activity,” code NSF I02/16, 12.12.14.

Author Disclosure Statement

No competing financial interests exist.

References

Berger and Leighton. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. J. Comput. Biol., 5, 27–40.

Blazewick, Dill, Lukasiak, et al. 2004. A tabu search strategy for finding low energy structures of proteins in HP-model. CMST, 10, 7–19.

Chen and Huang. 2005. A branch and bound algorithm for the protein folding problem in the HP lattice model. Genomics Proteomics Bioinformatics, 3, 225–230.

Cotta. 2003. Protein structure prediction using evolutionary algorithms hybridized with backtracking. Artif. Neural Nets Probl. Solving Methods, 2687, 321–328.

Custódio, F.L., Barbosa, H.J.C., and Dardenne, L.E. 2004. Investigation of the three-dimensional lattice HP protein folding model using a genetic algorithm. Genet. Mol. Biol., 27, 611–615.

Dill, K.A. 1985. Theory for the folding and stability of globular proteins. Biochemistry, 24, 1501–1509.

Dill, K.A., and Lau. 1989. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules, 22, 3986–3997.

Garza-Fabre, Toscano-Pulido, and Rodriguez-Tello. 2003. Benchmark sequences for the HP model of protein structure prediction: 2D square and 3D cubic lattices.

Istrail, Hurd, Lippert, et al. 2000. Prediction of self-assembly of energetic tiles and dominoes: Experiments, mathematics, and software. Technical Report SAND2002. Sandia National Laboratories.

Istrail and Lam. 2009. Combinatorial algorithms for protein folding in lattice models: A survey of mathematical results. Commun. Inf. Syst., 9, 303–346.

Jiang, Cui, Shi, et al. 2003. Protein folding simulations for the hydrophobic-hydrophilic model by combining tabu search with genetic algorithms. J. Chem. Phys., 119, 4592–4596.

Krasnogor, Pelta, Lopez, P.M., et al. 1998. Genetic algorithms for the protein folding problem: A critical view. Proc. Eng. Intell. Syst., 1, 353–360.

Liang and Wong, W.H. 2001. Evolutionary Monte Carlo for protein folding simulations. J. Chem. Phys., 115, 444–451.

Lin and Su. 2011. Protein 3D HP model folding simulation using a hybrid of genetic algorithm and particle swarm optimization. Int. J. Fuzzy Syst., 13, 140–147.

Ramyachitra and Veeralakshmi. 2014. Computational analysis of protein structure prediction and folding. Int. J. Comput. Sci. Inform. Technol. Secur., 4, 116–127.

Shmygelska and Hoos. 2005. An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics, 6, Article No. 30.

Thilagavathi and Amudha. 2015. ACO-metaheuristic for 3D-HP protein folding optimization. ARPN J. Eng. Appl. Sci., 10, 4948–4953.

Toma and Toma. 1996. Contact interactions method: A new algorithm for protein folding simulations. Protein Sci., 5, 147–153.

Traykov, Angelov, and Yanev. 2016. A new heuristic algorithm for protein folding in the HP model. J. Comput. Biol., 23, 662–668.

Yanev, Milanov, and Mirchev. 2011. Integer programming approaches to HP folding. Serdica J. Comput., 5, 359–366.

Zhang and Skolnick. 2004. Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins. Biophys. J., 87, 2647–2655.