Abstract
The detection of complexes in protein–protein interaction (PPI) network bears great significance in predicting unknown protein and providing theoretical basis for various researches on a wide array of diseases. In this paper, a PPI network complexes detection algorithm based on Harmony Search Clustering Optimization Model (HMS-CD) is being proposed. Compared with the traditional harmony search algorithm, dynamic parameter of HMCR and BW is introduced to improve the search strategy. The method to find a set of nodes with a larger aggregation coefficient in PPI network is regarded as the objective function of the proposed algorithm. The experimental result on a real dataset like yeast PPI network shows that the proposed algorithm achieves better detection accuracy than traditional harmony search (HMS) algorithm and typical MCODE algorithm, and it can detect complexes in PPI network more effectively.
Introduction
Protein is an essential part in all cell and tissue structures of which serves as the executor of the physiological functions and is considered as the most important component in basic activities. A number of studies show that each protein in the cell does not exist in isolation but interacts with other proteins to fulfil its physiological functions. The network of interactions between proteins in organisms is called PPI (Protein–Protein Interaction) network. Current studies take protein network as an undirected graph. Each node consists of a protein, and each edge represents a pair of protein interaction. The PPI network is regarded as a kind of complex network which has a complex network topology structure feature, showing small-world and scale-free features (Watts et al. [14]). Studies have pointed out that the large-scale complex network, including biological network of significant modular characteristics, and a large number of protein complexes and functional modules (Graves et al. [5]) are shown in the PPI network. Those proteins connect to each other closely. Detecting complexes in the PPI network [13] help in predicting function of an unknown protein and providing an important theoretical basis of the research on disease prevention and treatment. According to the different implementation principles, the existing detection methods are mainly divided into four categories: based on (1) hierarchy [2], based on (2) density [1], based on (3) graph partitioning [20] and based on (4) core-attachment [3]. Some scholars have to adopt the intelligent algorithms such as Artificial Bee Colony Mechanism (Wu et al. [15]) and Ant Colony Clustering (Ji et al. [6]) to solve the problem of detecting functional modules in the PPI network.
Harmony Search (Geem et al. [4]) algorithm is a heuristic intelligent optimization algorithm designed by imitating the musicians playing wonderful music and has been successfully applied to many practical optimization problems such as Information network optimization (Mahdavi et al. [10]) and neural network training (Kulluk et al. [7]). Improved Harmony Search algorithm [9] has also been successfully designed for Automatic generation control of power system (Shiva et al. [12]), flowline manufacturing cell scheduling problem (Li et al. [8]) and other engineering problems. Similar to genetic algorithms that mimic biological evolution, the harmony search algorithm simulate the music playing with the variables of optimization problem corresponding to the music notes; the solution vectors corresponding to the harmony; the optimal solution corresponding to the beautiful sound and the optimization process corresponding to playing splendid music. The creation of a new harmony in the mechanism of the harmony search algorithm takes all the current harmony in harmony memory into account and adjusts to variables independent [18]. The biggest characteristic and advantage of this algorithm is that it can perform global both random searches and local searches as well [17].
Based on the advantage of the harmony search, the PPI network complexes detection algorithm based on Harmony Search Clustering Optimization Model (Harmony Search for Complexes Detection, HMS-CD) which is proposed in this paper aims to detect complexes clusters in the PPI network. The proposed algorithm overcomes the shortcomings of falling into local optimum easily by improving the search strategy of the traditional harmony search algorithm. It uses clustering coefficient of protein complex centre’s node to measure the clustering effect and impact the method of searching for the optimal clustering centre’s node as the objective function of the proposed algorithm to obtain the final detection result. The second section of this paper introduces the traditional harmony search (HMS) algorithm and the design of the objective function for the complexes detection in the PPI network which then introduces the framework of the proposed algorithm. In the third section, the result of experiment and analysis is presented. The fourth section summarizes the full text.
Proposed algorithm
This section introduces the traditional harmony search algorithm and the improved search strategy first and then applies the proposed algorithm to the studies of complexes detection in the PPI network. The objective function and the framework are therefore introduced toward the end of this section.
Design the complexes detection clustering model in PPI
Harmony Search algorithm is a kind of heuristic intelligent optimization algorithm. This algorithm initializes the harmony memory on a preset value scale firstly and next random search with the probability of HMCR in the harmony memory, then a new harmony is randomly generated by the probability of PAR and step length of
In this paper, the objective of the strategy in this proposed algorithm is to set the parameter of HMCR, PAR and
Designing the objective function of the proposed algorithm
One of the most important characteristics of biological data is that proteins with similar functions are linked to each other very closely. In the PPI network, the aggregation coefficient (Zhang et al. [19]) of each node reflects the aggregation degree of the node in the network. It is calculated by the number of real edges that exist on the adjacent nodes dividing by the number of possible edges, expressed in the formula (2):
In this paper, the method of searching for the optimal cluster center node is regarded as the objective function and the aggregation coefficient of the center node is used to evaluate the clustering effect. The larger the aggregation coefficient of the center node is, the closer the connection between the proteins in the complexes is and it is likelier that these proteins belong to the same complexes.
Algorithm of harmony search clustering optimization model
By improving the search strategy of the traditional harmony search algorithm, the harmony search clustering optimization model is proposed and applied for the complexes detection in the PPI network. The algorithm passes through certain phases. Firstly, it needs to initialize and work out the quantity of all nodes in the PPI network and identifying the number on these nodes one by one which is a benefit for improving the efficiency of the algorithm. The proposed algorithm begins with initializing the HMS. Each HMS stores a harmony corresponding to the centre’s node of a certain complexes cluster, then search a new harmony as the centre’s node of this cluster with probability of HMCR and the edge aggregation coefficient of the centre’s node is calculated as the objective function value. If the new objective function value is better than the value of the original node, replace the original node with the new one. Keep searching for the new harmony until the algorithm termination conditions are met which means all the objective function value of HMS is greater than the given threshold (
According to the above description, the specialized process of the proposed complexes detection algorithm based on Harmony Search clustering optimization model is shown in Algorithm 1.

The Framework of HMS-CD
This paper takes the yeast protein–protein interaction network as the research object. The PPI network data for experiment are from DIP (Xenarios et al. [16]), including 5093 proteins, 24743 interactive edges, and the protein complexes data used for experimental comparison come from CYC2008 (Pu et al. [11]) which contains 408 complexes.
Evaluation method of complexes detection
The selection of an appropriate evaluation method for complexes detection in the PPI network is the most important. Different methods will generate different results. Even the same method with a different parameter value will also generate different results. Now the most widely used evaluation indicator is precision and recall.
Precision is the ratio of the maximum matching of data set that map the result of experimental data set to standard data sets and the experimental data set. Formula (3) as follows:
Recall is the ratio of the maximum matching of data set that map the result of the experimental data set to standard data sets and the standard data set. Formula (4) as follows:
Among them x is one of the experiment result date sets,
In general, the more nodes the result data set of the experiment has, the higher the recall tends to be. On the contrary, the less nodes, the higher the precision is. In this paper, the harmonic means of the precision and recall is used to measure the performance of a method synthetically. It is computed as follows:
Parameter setting
In the proposed algorithm, the function of perturbation probability (PAR) is to determine whether the local search is performed or not. As a smaller PAR readily increases the possibility of local search, the PAR is set to 0.03 in the experiment. If the current iteration number reaches the maximum iterations, the algorithm will be terminated. In the experiment, the parameter MAXITE of maximum iterations was set to 5000. The 5-fold cross validation was carried out in the experiment. When
Experiment Result of
Experiment Result of
It is not difficult to find out that the algorithm achieves higher accuracy, up to 85% complexes can be obtained, and the advantage of the proposed algorithm is apparent. With the increase as the value of HMS, more complexes are obtained; the proportion of effective complexes will be smaller, and precision value will be decreased. Instead, the protein of matching rate that in known complexes will increase, thus, the recall value will show a rising trend. Therefore, when the HMS value is 50, the value of f-measure is larger, and the detection results are better.
Next, the effect of

Influence of

Influence of
The greater the value of

Influence of
From the experimental result as shown in Fig. 1 to Fig. 3 that when
In this paper, the algorithm was improved on the basis of the traditional harmony search algorithm (HMS). In order to verify the effectiveness of the improved algorithm parameters, the proposed HMS-CD algorithm was compared with the HMS algorithm through experiments. The experiments were carried out independently for 20 times, take the average of each experiment result as a comparison value, the result of precision, recall and f-measure under

Compared between HSM-CD and HMS on precision.

Compared between HSM-CD and HMS on recall.

Compared between HSM-CD and HMS on f-measure.
From the experimental result as shown in Fig. 4 to Fig. 5, compared the HMS algorithm, the proposed HSM-CD algorithm achieves higher value of precision and recall. Next, the influence of f-measure is shown in Fig. 6.
Figure 6 shows the comparative result on the value of f-measure, the proposed HMS-CD algorithm achieves higher value than HMS algorithm. It fully proves the effect of parameter improvement.
In order to verify the effectiveness of the complexes detection, the proposed HMS-CD algorithm was compared with the HMS algorithm and the MCODE (Bader et al. [1]) algorithm which is based on the density of the typical, The experiments were carried out independently for 10 times, the result of average value is shown in Fig. 7.

Compared HSM-CD with HMS and MCODE on f-measure.
As it can be seen in Fig. 7, the proposed HMS-CD algorithm achieves higher value of f-measure. It fully concludes the performance of this algorithm and the effectiveness of complexes detection in the PPI network.
In this paper, by improving and applying the traditional harmony search to detect complexes in PPI network, an algorithm for complexes detection in protein–protein interaction network based on harmony search clustering optimization has been presented. The experimental results show that the proposed algorithm not only has an ability for local searches but also has stronger the ability for global searches. Next work is to verify the reliability of the algorithm on a number of data sets, and the further optimization with algorithm time complexity shall be done.
Footnotes
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (No. 41362015) and Science & Technology Projects of Department of Education of Jiangxi Province (Grant Nos. GJJ14431, GJJ14432, GJJ14434 and GJJ14458).
