ARAZ: A software modules clustering method using the combination of particle swarm optimization and genetic algorithms

Abstract

A considerable percentage of software costs is usually related to maintenance. Program comprehension is a prerequisite of software maintenance, and maintainers spend a considerable amount of time comprehending the structure and behavior of the software when the source code is the only product available. Program comprehension is a difficult and challenging task, especially in the absence of design documents for the software system. Clustering of software modules is an effective reverse-engineering method for extracting the software architecture and structural model from the source code. Finding the best clustering is considered to be a multi-objective NP-hard optimization problem, and different meta-heuristic algorithms have been used to solve it. Convergence to local optima, insufficient quality, insufficient performance and insufficient stability are the main shortcomings of the previous methods. The main goals of this study are attaining higher software clustering quality, a higher success rate in clustering software modules, higher stability of the obtained results and faster convergence to optimal clusters. In this study, a hybrid meta-heuristic method (ARAZ) that combines the particle swarm optimization algorithm and the genetic algorithm (PSO-GA) is proposed to find the best clustering of software modules. An extensive series of experiments on 10 standard benchmark programs has been conducted. According to the experimental results, the proposed method outperforms the other methods in terms of clustering quality, stability, success rate and convergence speed.

Keywords

Software maintenance, software module clustering, particle swarm optimization algorithm, clustering quality, convergence speed

1. Introduction

Software changes several times during the development phase and especially after it has been delivered. Changing a software product or component after delivery is called software maintenance. Software maintenance is performed to repair software faults, adapt the software to a different operating environment and improve the software functionality. About 60% to 70% of software costs are usually related to maintenance [6, 15, 19]. Before changing a piece of software, the maintainer should gain a thorough understanding of its structure and behavior (program comprehension). Program comprehension is a prerequisite of software maintenance, and maintainers spend a considerable amount of time comprehending the structure and behavior of the software [7, 15, 18, 20]. Program comprehension is a difficult and challenging task, especially in the absence of design documents for the software system. Program comprehension techniques and tools can reverse engineer the existing code of a software system to generate an abstract structural model; hence, the software maintenance cost can be notably reduced by using the generated structural models.

Software clustering is a helpful method for understanding a large software system when the source code is the only product available. Software clustering decomposes the software system into smaller, manageable subsystems containing similar modules. Clustering of software modules is an effective method for extracting the software architecture and structural model [1, 3, 10]. Software clustering is carried out according to the relations among the modules, which are represented by module dependency graphs (MDGs). The rationale behind clustering software modules is to produce clusters with maximum cohesion (intra-cluster relations) and minimum coupling (inter-cluster relations) [1, 3, 4, 7]. Finding the best clustering is considered to be a multi-objective NP-hard optimization problem. Hence, different meta-heuristic algorithms have been applied to solve it [3, 4, 5, 9]. The related works conducted so far have considered different objectives for this research problem. Some of these objectives are as follows:

• Obtaining higher quality for software module clustering (MQ)
• Providing higher stability of the values obtained from clustering methods
• Obtaining a higher success rate in achieving the best MQ value by the clustering methods

To fulfil the above-mentioned objectives, different studies have proposed several meta-heuristic algorithms such as hill climbing (HC), genetic algorithms (GA), particle swarm optimization (PSO), multi-objective evolutionary algorithms and the artificial bee colony algorithm (ABC) [2, 3, 4, 5, 11, 14, 16, 17, 18]. The drawbacks of these methods are the main motivations of this study. According to the results of the conducted experiments, the main drawbacks of these methods are as follows:

• In some previous methods, such as the HC-based method, the results are likely to be local optima despite the high clustering speed.
• Late convergence to the optimal results is another shortcoming of some previous methods such as GA.
• Inefficient performance of some previous methods in clustering software with a large number of modules and communications is another important drawback.
• Considerable variance among the results of different executions of a method is regarded as instability of the method. Instability and a low success rate in achieving the best clustering quality are further shortcomings of the previous methods.

The fitness function proposed in [4] has been the most widely used in clustering studies. In the present study, a method (ARAZ) is proposed for clustering software modules based on a hybrid algorithm of PSO and GA. The purposes of the present study are as follows:

• Attaining higher values for software clustering quality (MQ)
• Attaining a higher success rate in clustering software modules with the best MQ value
• Attaining higher stability of the obtained results
• Attaining faster convergence to optimal clusters

The main contributions of this study are as follows:

• Combining PSO and GA for optimal software clustering
• Producing high-quality clusters of software modules, especially for large software
• Generating more stable results compared with the previous heuristic methods
• Attaining a higher convergence speed and success rate compared with previous heuristic-based software clustering methods

The paper is organized as follows: Section 1 introduces the study, including its significance, justification and purpose. Section 2 briefly reviews the related works. Section 3 introduces and clarifies the proposed method (ARAZ), which improves the clustering of software modules by combining PSO and GA. Section 4 reports the simulation of the proposed hybrid algorithm in detail and the experiments on 10 real datasets; it also discusses the results and compares them with those of PSO and GA. Finally, Section 5 summarizes the contributions and findings of the study and gives directions for further research.

2. Related works

Mancoridis et al. proposed methods for automatic clustering of software modules using HC and GA algorithms [8]; this method responded well to several software systems. Then, they developed the Bunch clustering tool in 1999, which can automatically analyze and cluster different software systems. In 2002, Mitchell [3] introduced several heuristic algorithms to automatically analyze source code and then cluster it into subsystems. The input of the Bunch tool is the module dependency graph (MDG) of the input program, and the tool includes three fitness functions: BasicMQ, TurboMQ and ITurboMQ [2]. Also, they reported better results for the HC algorithm than for GA.

In 2011, Praditwong et al. introduced two novel multi-objective approaches for clustering software modules, namely the maximum cluster approach (MCA) and the equal cluster approach (ECA) [4]. The results of the study indicated that the multi-objective approach was significantly better than the single-objective approach. These two approaches, MCA and ECA, were regarded as a milestone in software clustering studies. The graphs used in this approach included both weighted and unweighted graphs. The main objective of MCA was to achieve high cohesion and low coupling in such a way that the number of produced clusters is maximized and the number of single-module clusters is minimized. ECA, on the other hand, encourages the production of clusters with an approximately equal number of modules. The objective function applied in this study was the most well-known objective function in the realm of clustering software modules. The comparison of ECA and MCA in the multi-objective method with the single-objective HC method indicated that:

• The multi-objective approach is able to present notably better results for the weighted and unweighted graphs.
• Although the multi-objective method produces desirable results, the single-objective method outperformed it on the unweighted graphs.
• For achieving maximum cohesion and minimum coupling, ECA provided better results in the multi-objective approach.
• The multi-objective approach required about twice the effort to obtain better results.

In 2016, Kumari et al. proposed a method for automatic clustering of software modules [5]. They introduced two new multi-objective formulations for clustering software modules in which cohesion and coupling were considered separately. The results of this experimental study revealed that the multi-objective approach provides significantly better solutions than the single-objective approach. In 2017, a multi-objective clustering method was proposed for software modules [11]. Its purpose was to produce automatic clustering solutions which simultaneously optimize several conflicting clustering criteria. Until the development of this method, the multi-objective evolutionary algorithm (MOEA) was considered the best choice for solving this type of problem. However, it was observed that the performance of MOEA degrades when more than two objective functions are optimized. To solve this problem, the artificial bee colony algorithm (ABC) was proposed with five objective functions. The comparison of the results indicated that the ABC method significantly outperforms the other existing methods.

In 2017, a method based on the particle swarm optimization (PSO) algorithm was proposed for optimizing the clustering of software modules [9]. It improved clustering with respect to the following factors: the relations within clusters, the relations among clusters, the number of clusters and the number of modules within each cluster. The results obtained in this study were compared with those of the GA, HC and simulated annealing (SA) algorithms. The results indicated that the proposed method was notably effective and promising for clustering software modules. To address the problems of slow convergence speed, poor clustering results and algorithmic complexity, in 2018 Sun et al. used probability selection, in which the software system is converted into a complex network diagram [10]. Then, merger, adjustment and optimization operations are used for clustering software modules.

In [13], the ant colony optimization (ACO) algorithm was used for the optimal clustering of software modules. Each independent cluster (a subsystem) includes highly dependent modules. ACO is a heuristic algorithm based on swarm intelligence which is used for solving several search-based optimization problems. In the proposed algorithm, each ant is regarded as a possible solution to the clustering problem. A clustering with maximum quality has maximum cohesion and minimum coupling. Intra-connections indicate the relations among the modules within a cluster and inter-connections show the relations among the modules of different clusters. This method was evaluated on a limited number of datasets. The results of the experiments conducted on three real datasets confirm the performance and stability of this method compared with the previous heuristic methods. In [8], a new clustering method named Neighborhood tree was proposed; this method creates a neighborhood tree using the knowledge available in an ADG and uses this tree for clustering the modules of the software. The results of the experiments indicate the success of the algorithm in extracting an acceptable architecture in a reasonable time compared with some of the previous methods. In this method, the size of the generated subsystems was not taken into account; also, this method was not compared with recent meta-heuristic-based methods. The comparison of the experimental results indicates the simplicity of the algorithm, its low time complexity and its high convergence speed. Indeed, this algorithm provides a simple but effective method for solving the problem of clustering software modules.

3. Proposed method (ARAZ)

Software modules are clustered based on the relations among the modules, which are depicted via module dependency graphs (MDGs). In each MDG, nodes represent modules and edges stand for the relations among the modules. The graph edges can be weighted or unweighted. If they are unweighted, the weights of all the edges are assumed to be equal to one and the edges illustrate one-way or two-way relations among modules. If the graph edges are weighted, the weights indicate the number of relations among the modules. These relations correspond to calls between modules. The resulting graph is the input of the clustering algorithm [1, 2]. Source code analysis tools are used to produce the MDG automatically. Some examples of these tools are CIA for C, Acacia for C and C++, and Chava for Java. These tools analyze and parse the source code and store the entities and the relations among them in a file. Then, according to the information obtained from the file, the intended MDG can be constructed [1]. As shown in Fig. 1, in the proposed clustering method, each clustering is defined in the form of an array (clustering array) whose length is equal to the number of modules in the MDG. In this array, the index stands for the number of a module in the MDG, and the content of each cell is the number of the cluster in which that module is located.

Figure 1.

Illustrating a clustering array.

Figure 2.

A clustering of a software system with nine modules.

Figure 3.

The clustering array of the clustered software shown in Fig. 2.
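
As a minimal sketch of this encoding, the following Python snippet stores one cluster number per module and groups the module indices by cluster. The array contents are illustrative and are not taken from Fig. 3, modules are indexed from 0 for convenience, and the helper name `clusters_from_array` is hypothetical.

```python
from collections import defaultdict

def clusters_from_array(clustering):
    """Group module indices by the cluster number stored at each index."""
    groups = defaultdict(list)
    for module, cluster in enumerate(clustering):
        groups[cluster].append(module)
    return dict(groups)

# Hypothetical clustering array: nine modules assigned to three clusters.
clustering = [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(clusters_from_array(clustering))
```

Because the array length always equals the number of modules, every array of this form is a valid clustering, which is what allows GA operators to be applied to it directly.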

3.1 Software module clustering using combination of PSO and GA

To overcome the drawbacks of previous software clustering methods (inadequate MQ, slow convergence, inadequate success rate and low stability), we propose a heuristic-based software module clustering method using PSO and GA. The proposed method uses the capabilities of both heuristic algorithms. PSO is a particle swarm-based optimization algorithm [12]. This algorithm was modeled on the group flight of birds and the group movement of fish. Each member of the group is defined by two vectors in the search space, i.e. speed (velocity) and position. In every iteration, the position of each particle is updated by its speed vector. The update uses the best position found by the particle itself (X ${}^{\text{local best}}$ ) and the best position found by the best particle of the group (X ${}^{\text{global best}}$ ). Figure 4 shows the optimization process in PSO.

Figure 4.

Optimizing the position of each particle in PSO algorithm using the best experience of each individual particle and the best experience of all the particles.
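
For reference, the canonical continuous PSO update depicted in Fig. 4 can be sketched as follows. The inertia weight `w` and the acceleration coefficients `c1` and `c2` are standard PSO parameters whose default values are assumed here; they are not values reported in this paper, and the function name `pso_step` is hypothetical.

```python
import random

def pso_step(x, v, local_best, global_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update of a particle's speed and position vectors."""
    new_v, new_x = [], []
    for xi, vi, li, gi in zip(x, v, local_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # Pull towards the particle's own best and towards the swarm's best.
        vi = w * vi + c1 * r1 * (li - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi)
        new_x.append(xi + vi)
    return new_x, new_v
```

This arithmetic update assumes real-valued positions, whereas a clustering array holds discrete cluster numbers.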

In this study, to optimize clustering quality and achieve fast convergence to the optimal solution, we propose a hybrid algorithm that includes both PSO and GA. In this algorithm, in the stage of updating the speed vector of the particles, two well-known operators of the genetic algorithm, i.e. crossover and mutation, are used for updating and optimizing the position of the particles. GA is one of the most significant population-based meta-heuristic algorithms used for solving optimization problems [21]. The structure of the genetic algorithm includes the chromosome, fitness function, population size and selection algorithm. A chromosome is a string or sequence of bits which is the coded form of a possible solution to the problem. The objective/fitness function is a function into which the values of the problem variables are inserted; hence, by defining and applying this function, the genetic algorithm identifies the best possible solution to the problem [14]. Crossover and mutation are the two major operators of GA for generating new chromosomes from their parents. In the problem of clustering software modules, each chromosome is implemented by the clustering array shown in Fig. 1.

Among the different objective/fitness functions introduced by researchers in the related works, the one used in [2, 3, 8, 13] was applied in this study for clustering software modules. The purpose of this function is to reduce the relations among clusters and enhance the relations among modules within the clusters as much as possible. This fitness function, or MQ criterion, strikes a balance between the cohesion and coupling of clusters: enhancing the cohesion and reducing the coupling improves the MQ score. Equation (1) illustrates the fitness function (MQ) used in this study. In Eq. (2), $MF_{k}$ refers to the clustering factor of cluster $k$, which relates the number of internal edges to the number of external edges of that cluster; variable $i$ indicates the number of relations among the modules within the cluster (cohesion) and variable $j$ refers to the number of external relations of the cluster with other clusters (coupling). Since an external relation always involves two clusters, the number of external relations is divided by two. Equation (3) illustrates the way of measuring the fitness (MQ) for the clustered software shown in Fig. 2. The input of the fitness function is the set of relations among the modules of the software, which is provided by the MDG. Each clustering is represented by a clustering array and its MQ is computed by the fitness function (Eq. (1)).

$\displaystyle MQ=\sum_{k=1}^{n}MF_{k}$ (1)

$\displaystyle MF_{k}=\left\{\begin{array}{ll}0&\text{if }i=0\\ \frac{i}{i+j/2}&\text{if }i>0\end{array}\right.$ (2)

$\displaystyle MQ=\sum_{k=1}^{3}\frac{i}{i+\left(\frac{j}{2}\right)}=\frac{2}{2+\left(\frac{1}{2}\ast 1\right)}+\frac{1}{1+\left(\frac{1}{2}\ast 3\right)}+\frac{4}{4+\left(\frac{1}{2}\ast 2\right)}=2$ (3)
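
The MQ computation can be sketched in Python as follows, assuming an unweighted MDG given as a list of module pairs and taking $MF_{k}$ as 0 for a cluster without internal edges. The function name `modularization_quality` and the example graph are hypothetical; the graph is not the one in Fig. 2 but was chosen so that the per-cluster values of $i$ and $j$ match those used in Eq. (3).

```python
def modularization_quality(edges, clustering):
    """Compute MQ (Eq. (1)) as the sum of the cluster factors MF_k (Eq. (2))."""
    intra = {c: 0 for c in set(clustering)}  # i: edges inside cluster k
    inter = {c: 0 for c in set(clustering)}  # j: edges leaving cluster k
    for u, v in edges:
        cu, cv = clustering[u], clustering[v]
        if cu == cv:
            intra[cu] += 1
        else:
            # An external edge is counted once for each cluster it touches.
            inter[cu] += 1
            inter[cv] += 1
    mq = 0.0
    for cluster in intra:
        i, j = intra[cluster], inter[cluster]
        if i > 0:
            mq += i / (i + j / 2)
    return mq

# Hypothetical nine-module MDG with (i, j) = (2, 1), (1, 3) and (4, 2).
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (7, 8), (8, 5),
         (2, 3), (4, 5), (3, 6)]
clustering = [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(round(modularization_quality(edges, clustering), 4))  # 2.0
```

For a weighted MDG, the same sketch would add the edge weight instead of 1 to the corresponding counter.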

The flowchart of the proposed PSO-GA algorithm is given in Fig. 5.

Figure 5.

Flowchart of the proposed PSO_GA algorithm.

3.1.1 Initial population

The execution of evolutionary algorithms begins with the production of the initial population. In this algorithm, before the initial population is produced, the best particle of the group (X ${}^{\text{global best}}$ ) is produced randomly and the value of the fitness function is measured for it. In this way, we have a provisional value for identifying the actual best particle. Next, the initial population of particles (birds) is produced randomly according to the initial population size. The best position found by each particle is set at the same time as the particle is produced; at first, this position is equal to the position of the particle itself. The MQ value is measured for each particle as the initial population is produced.
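
The initialization described above might look like the following sketch. The dictionary layout of a particle, the function name `init_swarm` and the `fitness` callback are illustrative assumptions; promoting a better random particle to the global best during initialization is also an assumption, made so that the returned global best is never worse than any particle.

```python
import random

def init_swarm(n_modules, n_clusters, swarm_size, fitness, rng=random):
    """Randomly create particles, their local bests and an initial global best."""
    def random_clustering():
        return [rng.randint(1, n_clusters) for _ in range(n_modules)]

    # A random particle gives a provisional global best before the swarm exists.
    global_best = random_clustering()
    global_best_mq = fitness(global_best)

    swarm = []
    for _ in range(swarm_size):
        position = random_clustering()
        mq = fitness(position)
        # Initially the local best is the particle's own position.
        swarm.append({"position": position, "mq": mq,
                      "local_best": list(position), "local_best_mq": mq})
        # Assumption: keep the global best in step with the new particles.
        if mq > global_best_mq:
            global_best, global_best_mq = list(position), mq
    return swarm, global_best, global_best_mq
```

In practice `fitness` would be the MQ function of Eq. (1) bound to the MDG of the program being clustered.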

3.1.2 Updating position of each particle with GA algorithm

After the first iteration of the algorithm, the position of the best group member (X ${}^{\text{global best}}$ ), the best position found by each particle (X ${}^{\text{local best}}$ ) and the MQ value of every particle are determined. Now, for each particle, we have two positions, i.e. X ${}^{\text{local best}}$ and X ${}^{\text{global best}}$ , and the MQ value of its current position, X ${}^{\text{old}}$ . In the PSO algorithm, a particle that seeks the optimal position should use either X ${}^{\text{local best}}$ , which is the individual experience of the particle, or X ${}^{\text{global best}}$ , which is the best experience of all the particles. In this study, the stages of updating the position of each particle are as follows:

Stage 1:
applying the crossover operator to X ${}^{\text{local best}}$ and X ${}^{\text{global best}}$ .

As shown in Fig. 6, to use both the personal experience of each particle and the experience of all the group members, we used the 2-point crossover operator, which is one of the main operators of the genetic algorithm. Accordingly, X ${}^{\text{local best}}$ is considered as one chromosome and X ${}^{\text{global best}}$ as the other; the crossover operator is applied to these two chromosomes. The newly produced chromosome, or particle, includes two sections of the position of the best particle of the group and one section of the best position found by that particle. This experiment was also carried out with the single-point crossover operator; however, the best results were obtained with the 2-point crossover operator.

Figure 6.
Executing crossover operator on the position of the best local X and the best global X for finding the next best position or X ${}^{\text{new}}$ .
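
A minimal sketch of this 2-point crossover is given below. It assumes that the two outer sections come from the global best and the middle section from the local best, which is one reading of the description above; the function name `two_point_crossover` is hypothetical, and the arrays must have at least three cells.

```python
import random

def two_point_crossover(local_best, global_best, rng=random):
    """Build X_new from two outer sections of global_best and the middle of local_best."""
    n = len(local_best)
    # Two distinct cut points with 0 < a < b < n, so all three sections are non-empty.
    a, b = sorted(rng.sample(range(1, n), 2))
    return global_best[:a] + local_best[a:b] + global_best[b:]

# Illustrative parents: nine modules, clusters numbered 1 to 3.
local_best = [1, 1, 2, 2, 2, 3, 3, 3, 1]
global_best = [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(two_point_crossover(local_best, global_best))
```

The child always has the same length as its parents, so it is again a valid clustering array.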

Once the position X ${}^{\text{new}}$ is obtained, the value of the fitness function is measured for it. If the obtained MQ value is higher than that of X ${}^{\text{local best}}$ , X ${}^{\text{new}}$ replaces it and the stored fitness value is updated. If the MQ value of X ${}^{\text{new}}$ is also better than that of X ${}^{\text{global best}}$ , X ${}^{\text{new}}$ replaces X ${}^{\text{global best}}$ as well. If X ${}^{\text{new}}$ improves on neither, the algorithm proceeds to the next optimization stage, which is the application of the mutation operator.

Stages 2 to 5:
applying the mutation operator to X ${}^{\text{global best}}$ , X ${}^{\text{local best}}$ , X ${}^{\text{new}}$ and X ${}^{\text{old}}$ .

As the problem relations become more complicated, all four mutation stages can be applied, on X ${}^{\text{global best}}$ , X ${}^{\text{local best}}$ , X ${}^{\text{new}}$ and X ${}^{\text{old}}$ . If the problem relations are simple, only two stages of mutation, on X ${}^{\text{global best}}$ and X ${}^{\text{local best}}$ , need to be applied. This choice affects the execution of the proposed algorithm. The mutation operator is applied to reach the whole search space and to recover clusters lost from the population. This operator also reduces the probability of the solutions being trapped in local optima. Thus, if the crossover operator does not find a better clustering than X ${}^{\text{global best}}$ and X ${}^{\text{local best}}$ , the mutation operator is applied to the particle X ${}^{\text{global best}}$ . Then, the comparisons made after the crossover operator are repeated for the particle produced by the mutation. If a better solution is found, the position of the new particle replaces X ${}^{\text{global best}}$ or X ${}^{\text{local best}}$ , and the algorithm moves on to the next particle or the next iteration. If mutation of X ${}^{\text{global best}}$ fails to yield an improvement, mutation is applied to X ${}^{\text{local best}}$ . If there is still no improvement, mutation is carried out on X ${}^{\text{new}}$ and X ${}^{\text{old}}$ , in that order. The probability of the mutation operation for all the particles ranges from 0 to 1; that is, if an improvement is obtained by the crossover operation, the mutation operator is not applied to any of the particles. Finally, after all the iterations of the algorithm have been executed, the best solution is X ${}^{\text{global best}}$ .
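
The five stages can be summarized by the following sketch of one particle update. It is an interpretation of the description above rather than the authors' implementation: the particle is assumed to be a dictionary with the hypothetical keys `position`, `local_best` and `local_best_mq`, the operators are passed in as the callbacks `fitness`, `crossover` and `mutate`, and moving the particle to X_new after the crossover is an assumption.

```python
def update_particle(particle, global_best, global_best_mq,
                    fitness, crossover, mutate):
    """Try crossover first, then mutate global best, local best, X_new and X_old in turn."""
    def accept(candidate):
        """Install the candidate if it beats the local or the global best."""
        nonlocal global_best, global_best_mq
        mq = fitness(candidate)
        improved = False
        if mq > particle["local_best_mq"]:
            particle["local_best"], particle["local_best_mq"] = list(candidate), mq
            improved = True
        if mq > global_best_mq:
            global_best, global_best_mq = list(candidate), mq
            improved = True
        return improved

    x_old = particle["position"]
    # Stage 1: crossover of the local best and the global best.
    x_new = crossover(particle["local_best"], global_best)
    particle["position"] = x_new  # assumption: the particle moves to X_new
    if not accept(x_new):
        # Stages 2 to 5: stop at the first mutation that yields an improvement.
        for source in (global_best, particle["local_best"], x_new, x_old):
            if accept(mutate(source)):
                break
    return global_best, global_best_mq
```

Restricting the tuple in the loop to its first two entries gives the two-stage variant described for programs with simple relations.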

In Eq. (4), the number of variables (cells of the clustering array) that are mutated in each chromosome is denoted by Nmu, the mutation rate is denoted by mu and the number of modules in the program is denoted by nVar. The mu rate depends on the number of modules: in this study, it ranges from 0.03 to 0.05 for programs with few modules and from 0.01 to 0.02 for programs with many modules. For programs with many modules, mutating a larger number of cells breaks up the extracted clusters; it also prevents the optimal clustering from being reached or delays reaching it.

$\displaystyle\textit{Nmu}=\textit{ceil}(\textit{mu}\times\textit{nVar})$ (4)
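
A mutation operator that follows Eq. (4) could be sketched as below. Reassigning the selected modules to uniformly random clusters is an assumption, since the exact reassignment rule is not specified here, and the function name `mutate` and its parameters are hypothetical.

```python
import math
import random

def mutate(clustering, n_clusters, mu, rng=random):
    """Reassign Nmu = ceil(mu * nVar) randomly chosen modules to random clusters."""
    n_var = len(clustering)
    n_mu = math.ceil(mu * n_var)
    mutant = list(clustering)
    for index in rng.sample(range(n_var), n_mu):
        # The new cluster may equal the old one, leaving that cell unchanged.
        mutant[index] = rng.randint(1, n_clusters)
    return mutant

# With the 174 modules of INCL and mu = 0.02, four cells are mutated.
print(math.ceil(0.02 * 174))  # 4
```

With the 13 modules of Compiler and mu = 0.05, the same formula gives a single mutated cell, so small programs are perturbed by at least one module per mutation.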

Table 1
The list of 10 benchmark MDGs along with the number of modules and edges of the corresponding programs

Applications	Number of modules	Number of edges
RCS	29	169
INCL	174	360
GRAPPA	86	295
Modulizer	26	66
Compiler	13	32
Mtunis	20	57
Bison	37	179
Boxer	18	29
acqCIGNA	114	180
Ispell	24	103

Table 2
Data obtained from 10 executions of the PSO-GA, PSO and GA methods on the RCS, INCL, GRAPPA* and Modulizer MDGs for three different numbers of clusters

Program name	Clustering algorithm	Number of clusters	MQ in 10 runs
			Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
RCS	PSO_GA	4	1.9963	2.0080	2.0094	1.9892	1.9963	2.0094	2.0076	1.9963	1.9892	2.0002
Modules:29		3	1.8980	1.8980	1.8980	1.8762	1.8980	1.8980	1.8922	1.8922	1.8922	1.8980
Edges:169		2	1.7454	1.6411	1.6022	1.6113	1.7454	1.6047	1.7454	1.6411	1.6113	1.7454
	PSO	4	1.9892	2.009	2.002	2.0076	1.9915	2.0094	2.0094	1.9892	1.9963	1.9892
		3	1.8715	1.8882	1.8922	1.8922	1.8769	1.8847	1.7172	1.5866	1.898	1.898
		2	1.7454	1.6139	1.6000	1.7454	1.4864	1.6113	1.7454	1.6113	1.6139	1.6139
	GA	4	1.9888	1.9768	1.9963	2.0094	2.0076	2.0002	1.9859	1.9650	1.9963	2.0076
		3	1.8853	1.8980	1.8757	1.8874	1.8922	1.8882	1.8922	1.7400	1.8826	1.8891
		2	1.6075	1.6411	1.7454	1.6097	1.6411	1.6113	1.6113	1.6139	1.6139	1.5931
INCL	PSO_GA	10	6.2841	6.1268	6.6113	6.6759	6.1973	6.3291	5.8666	6.1435	6.3645	6.3529
Modules:174		8	5.4869	5.2038	4.9543	5.2038	5.2064	4.9835	5.1746	4.8657	5.2807	5.2038
Edges:360		5	4.1185	3.8850	4.0910	3.9690	3.9748	4.1572	4.0382	3.9624	3.9978	3.8710
	PSO	10	6.0966	6.1134	6.3496	6.6489	5.9726	6.0568	5.8273	6.0775	6.6964	6.3899
		8	5.4181	5.2326	5.3430	5.0914	5.2889	5.2454	5.6094	4.9616	5.3330	5.5600
		5	4.0870	4.0011	4.0622	3.9117	4.1424	4.0751	3.9753	4.0159	4.0570	3.9456
	GA	10	5.7995	6.3005	5.9550	6.1846	5.9753	5.9753	5.9549	5.2825	6.0595	5.3307
		8	4.8432	4.9429	5.2921	5.0018	4.5840	5.3779	4.8224	5.2096	4.6153	5.4724
		5	3.6583	4.0174	3.6387	4.3506	3.9867	3.4783	3.7814	3.8159	4.0090	3.9329
GRAPPA	PSO_GA	4	3.8645	3.7930	3.8853	3.7531	3.6465	3.8677	3.8231	3.9114	3.7640	3.6341
Modules:86		5	4.6767	4.7400	4.7339	4.6276	4.7339	4.8177	4.7215	4.5919	4.7597	4.6450
Edges:295		6	5.6917	5.5772	5.7118	5.7244	5.3862	5.7121	5.6965	5.5052	5.6690	5.7069
	PSO	4	3.8325	3.7730	3.5877	3.6766	3.5779	3.7758	3.8047	3.5458	3.6110	3.5912
		5	4.8359	4.7194	4.5673	4.5640	4.8489	4.7371	4.8489	4.5628	4.7313	4.6952
		6	5.7566	5.4731	5.6841	5.7859	5.3972	5.7004	5.5571	5.7865	5.6205	5.6841
	GA	4	3.6555	3.8106	3.5125	3.6152	3.7290	3.6898	3.4816	3.0353	3.5523	3.6321
		5	4.6542	4.5131	4.5131	4.5133	4.5014	4.3128	4.4697	4.5815	4.4911	4.4278
		6	5.0547	5.1021	5.2636	5.5285	5.0148	5.2647	5.1563	5.3047	4.9137	5.3028
Modulizer	PSO_GA	7	3.1842	3.0610	3.1842	3.1842	3.0610	3.1842	3.1094	3.1842	3.1842	3.1094
Modules:26		6	3.0353	3.0353	3.0144	3.0353	3.0353	3.0353	2.4854	2.9204	2.9121	3.0353
Edges:66		3	2.2086	1.9552	2.2086	1.9396	2.2086	2.1974	2.2086	2.2086	1.9396	2.2086
	PSO	7	3.1842	3.1094	3.1094	3.1842	3.1094	3.1842	3.1842	3.0610	3.1094	3.0610
		6	3.0353	2.9121	3.0353	3.0353	2.9121	2.6194	3.0144	2.9121	3.0353	2.6194
		3	1.9552	1.9396	1.9272	1.9396	2.2086	1.8592	2.1974	2.2086	1.9582	1.8400
	GA	7	3.1094	3.1094	3.1842	3.1037	3.1842	3.1842	3.1037	3.1089	3.1842	3.1842
		6	2.9264	3.0144	3.0353	3.0353	2.9702	2.8682	2.8994	2.9503	2.9552	3.0353
		3	2.2086	2.1974	2.9552	1.9396	2.1678	2.1974	2.2086	1.9396	1.9396	1.9396

Table 3
Data obtained from 10 executions of the PSO-GA, PSO and GA methods on the Compiler, Mtunis, Bison and Boxer MDGs for three different numbers of clusters

Program name	Clustering algorithm	Number of clusters	MQ in 10 runs
			Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
Compiler	PSO_GA	5	1.7083	1.6578	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083
Modules:13		3	1.4804	1.4251	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804
Edges:32		2	1.2471	1.2471	1.2000	1.2471	1.2271	1.1867	1.2471	1.2471	1.2471	1.2271
	PSO	5	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.6578	1.7083	1.7083	1.6578
		3	1.4804	1.4804	1.4251	1.4804	1.4804	1.4804	1.4251	1.4804	1.4804	1.4804
		2	1.2471	1.2644	1.2644	1.2163	1.2271	1.2471	1.2471	1.2163	1.2471	1.2644
	GA	5	1.4251	1.7083	1.7083	1.6650	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083
		3	1.4251	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804
		2	1.2471	1.2471	1.2644	1.2471	1.2644	1.2471	1.2471	1.2471	1.2471	1.2644
Mtunis	PSO_GA	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
Modules:20		3	2.0579	2.1248	2.1248	2.0810	2.1248	2.1248	2.1248	2.1248	2.1248	2.1248
Edges:57		2	1.8588	1.8588	1.8588	1.8588	1.8588	1.8588	1.8328	1.8588	1.8588	1.8588
	PSO	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
		3	2.1248	2.5790	2.1248	2.1248	2.1248	2.5790	2.1248	2.0810	2.1248	2.1248
		2	1.8588	1.8588	1.7926	1.8588	1.8588	1.7926	1.8588	1.8588	1.8588	1.8588
	GA	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
		3	2.0579	2.1248	2.1248	2.1248	2.1248	2.1248	2.1009	2.5490	2.8100	2.1248
		2	1.7926	1.8585	1.8588	1.8588	1.8588	1.8328	1.8588	1.8588	1.8588	1.8168
Bison	PSO_GA	10	2.5876	2.5876	2.6008	2.5993	2.5904	2.5993	2.5712	2.6008	2.5876	2.6008
Modules:37		8	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344
Edges:179		5	2.1338	2.1681	2.1367	2.1367	2.1574	2.1367	2.1574	2.1367	2.1148	2.1375
	PSO	10	2.5876	2.4750	2.5993	2.6008	2.5876	2.5904	2.5541	2.5904	2.5931	2.5876
		8	2.5344	2.5344	2.5344	2.4065	2.5344	2.4038	2.3944	2.5322	2.4065	2.5344
		5	2.1180	2.1148	2.1497	2.1367	2.1305	2.1467	2.1574	2.0852	2.1574	2.1681
	GA	10	2.5515	2.5876	2.5872	2.4891	2.4633	2.4231	2.5480	2.6008	2.5396	2.5091
		8	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.4065	2.5344	2.5344
		5	2.1208	2.1293	2.1367	2.1367	2.1131	2.1681	2.0133	2.1148	2.1498	2.1293
Boxer	PSO_GA	4	2.9822	2.9822	2.9822	2.9822	2.8397	2.9822	2.9822	2.9822	2.8417	2.9822
Modules:18		5	2.7504	2.7504	2.6889	2.6889	2.7504	2.7504	2.6889	2.6889	2.6889	2.7504
Edges:29		6	2.4250	2.4250	2.0828	2.4250	2.0827	2.4250	2.4250	2.4250	2.4250	2.4250
	PSO	4	2.9822	2.9822	2.9822	2.8000	2.9822	2.9568	2.9822	2.9822	2.9822	2.9822
		5	2.7504	2.6889	2.7504	2.6889	2.7504	2.6889	2.7504	2.6889	2.6889	2.7504
		6	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.0828	2.4250	2.4250	2.4250
	GA	4	2.9822	2.9822	2.9222	2.9222	2.9822	2.7504	2.9822	2.9222	2.9822	2.9822
		5	2.7504	2.7504	2.7504	2.6889	2.7504	2.7504	2.7504	2.6889	2.7504	2.6889
		6	2.1434	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250

Table 4
Data obtained from 10 executions of the PSO-GA, PSO and GA methods on the acqCIGNA and Ispell MDGs for three different numbers of clusters

Program name	Clustering algorithm	Number of clusters	MQ in 10 runs
			Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
acqCIGNA	PSO_GA	7	7.2864	7.1924	6.9719	7.2405	7.0754	6.7911	7.1431	7.2613	6.6932	7.0511
Modules:114		6	6.1544	6.1382	5.7840	5.9798	5.6162	5.6934	6.2131	5.8319	5.9682	6.0349
Edges:180		3	4.3011	4.3209	4.5822	4.4124	4.5338	4.5198	4.4350	4.5566	4.8966	4.4877
	PSO	7	7.1870	6.9328	6.9396	6.9166	6.6902	6.4940	7.0656	6.9344	7.2526	7.2306
		6	6.0984	5.9782	5.5597	5.8390	6.1019	5.9385	5.7688	5.4663	6.1085	5.9722
		3	4.6865	4.2962	4.8340	4.3555	4.7738	4.2728	4.3486	4.8298	4.5102	4.3589
	GA	7	6.3293	6.4986	6.1026	6.7561	6.7675	6.1516	6.6671	6.3025	5.9468	6.5720
		6	5.2496	5.5865	5.2840	5.3065	5.1841	4.9949	5.2281	5.2352	5.6685	5.1495
		3	4.4044	4.3598	4.2988	4.1178	4.2440	4.1481	4.2408	3.9454	4.3839	4.1215
Ispell	PSO_GA	5	2.2567	2.2727	2.2922	2.2629	2.2610	2.2491	2.2805	2.2629	2.2922	2.2727
Modules:24		3	2.1851	2.1943	2.2021	2.1724	2.1748	2.1827	2.1943	2.1589	2.1851	2.1764
Edges:103		2	1.9369	1.9369	1.9206	1.9350	1.9369	1.8958	1.9163	1.9369	1.8967	1.9369
	PSO	5	2.2825	2.1589	2.2825	2.2629	2.2629	2.2688	2.2727	2.2825	2.2825	2.2629
		3	2.1827	2.1822	2.2021	2.1724	2.1822	2.2021	2.1618	2.1702	2.2021	2.1851
		2	1.9206	1.9024	1.9206	1.9369	1.9024	1.9024	1.9369	1.8996	1.3472	1.3818
	GA	5	2.2761	2.2467	2.2747	2.2907	2.2922	2.2890	2.2932	2.2888	2.2883	2.2591
		3	2.1822	2.1724	2.2021	2.1677	2.1383	2.1943	2.1943	2.1827	2.1827	2.1924
		2	1.9163	1.9369	1.9350	1.9350	1.9350	1.9350	1.9369	1.9163	1.9369	1.9350

4. Experiments and results

Applications	Number of modules	Number of edges
RCS	29	169
INCL	174	360
GRAPPA	86	295
Modulizer	26	66
Compiler	13	32
Mtunis	20	57
Bison	37	179
Boxer	18	29
acqCIGNA	114	180
Ispell	24	103

Program name	Clustering algorithm name	Number of clusters	MQ in 10 runs
			Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
RCS	PSO_GA	4	1.9963	2.0080	2.0094	1.9892	1.9963	2.0094	2.0076	1.9963	1.9892	2.0002
Modules:29		3	1.8980	1.8980	1.8980	1.8762	1.8980	1.8980	1.8922	1.8922	1.8922	1.8980
Edges:169		2	1.7454	1.6411	1.6022	1.6113	1.7454	1.6047	1.7454	1.6411	1.6113	1.7454
	PSO	4	1.9892	2.0090	2.0020	2.0076	1.9915	2.0094	2.0094	1.9892	1.9963	1.9892
		3	1.8715	1.8882	1.8922	1.8922	1.8769	1.8847	1.7172	1.5866	1.8980	1.8980
		2	1.7454	1.6139	1.6000	1.7454	1.4864	1.6113	1.7454	1.6113	1.6139	1.6139
	GA	4	1.9888	1.9768	1.9963	2.0094	2.0076	2.0002	1.9859	1.9650	1.9963	2.0076
		3	1.8853	1.8980	1.8757	1.8874	1.8922	1.8882	1.8922	1.7400	1.8826	1.8891
		2	1.6075	1.6411	1.7454	1.6097	1.6411	1.6113	1.6113	1.6139	1.6139	1.5931
INCL	PSO_GA	10	6.2841	6.1268	6.6113	6.6759	6.1973	6.3291	5.8666	6.1435	6.3645	6.3529
Modules:174		8	5.4869	5.2038	4.9543	5.2038	5.2064	4.9835	5.1746	4.8657	5.2807	5.2038
Edges:360		5	4.1185	3.8850	4.0910	3.9690	3.9748	4.1572	4.0382	3.9624	3.9978	3.8710
	PSO	10	6.0966	6.1134	6.3496	6.6489	5.9726	6.0568	5.8273	6.0775	6.6964	6.3899
		8	5.4181	5.2326	5.3430	5.0914	5.2889	5.2454	5.6094	4.9616	5.3330	5.5600
		5	4.0870	4.0011	4.0622	3.9117	4.1424	4.0751	3.9753	4.0159	4.0570	3.9456
	GA	10	5.7995	6.3005	5.9550	6.1846	5.9753	5.9753	5.9549	5.2825	6.0595	5.3307
		8	4.8432	4.9429	5.2921	5.0018	4.5840	5.3779	4.8224	5.2096	4.6153	5.4724
		5	3.6583	4.0174	3.6387	4.3506	3.9867	3.4783	3.7814	3.8159	4.0090	3.9329
GRAPPA	PSO_GA	4	3.8645	3.7930	3.8853	3.7531	3.6465	3.8677	3.8231	3.9114	3.7640	3.6341
Modules:86		5	4.6767	4.7400	4.7339	4.6276	4.7339	4.8177	4.7215	4.5919	4.7597	4.6450
Edges:295		6	5.6917	5.5772	5.7118	5.7244	5.3862	5.7121	5.6965	5.5052	5.6690	5.7069
	PSO	4	3.8325	3.7730	3.5877	3.6766	3.5779	3.7758	3.8047	3.5458	3.6110	3.5912
		5	4.8359	4.7194	4.5673	4.5640	4.8489	4.7371	4.8489	4.5628	4.7313	4.6952
		6	5.7566	5.4731	5.6841	5.7859	5.3972	5.7004	5.5571	5.7865	5.6205	5.6841
	GA	4	3.6555	3.8106	3.5125	3.6152	3.7290	3.6898	3.4816	3.0353	3.5523	3.6321
		5	4.6542	4.5131	4.5131	4.5133	4.5014	4.3128	4.4697	4.5815	4.4911	4.4278
		6	5.0547	5.1021	5.2636	5.5285	5.0148	5.2647	5.1563	5.3047	4.9137	5.3028
Modulizer	PSO_GA	7	3.1842	3.0610	3.1842	3.1842	3.0610	3.1842	3.1094	3.1842	3.1842	3.1094
Modules:26		6	3.0353	3.0353	3.0144	3.0353	3.0353	3.0353	2.4854	2.9204	2.9121	3.0353
Edges:66		3	2.2086	1.9552	2.2086	1.9396	2.2086	2.1974	2.2086	2.2086	1.9396	2.2086
	PSO	7	3.1842	3.1094	3.1094	3.1842	3.1094	3.1842	3.1842	3.0610	3.1094	3.0610
		6	3.0353	2.9121	3.0353	3.0353	2.9121	2.6194	3.0144	2.9121	3.0353	2.6194
		3	1.9552	1.9396	1.9272	1.9396	2.2086	1.8592	2.1974	2.2086	1.9582	1.8400
	GA	7	3.1094	3.1094	3.1842	3.1037	3.1842	3.1842	3.1037	3.1089	3.1842	3.1842
		6	2.9264	3.0144	3.0353	3.0353	2.9702	2.8682	2.8994	2.9503	2.9552	3.0353
		3	2.2086	2.1974	2.9552	1.9396	2.1678	2.1974	2.2086	1.9396	1.9396	1.9396

Program name	Clustering algorithm name	Number of clusters	MQ in 10 runs
			Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10
Compiler	PSO_GA	5	1.7083	1.6578	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083
Modules:13		3	1.4804	1.4251	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804
Edges:32		2	1.2471	1.2471	1.2000	1.2471	1.2271	1.1867	1.2471	1.2471	1.2471	1.2271
	PSO	5	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083	1.6578	1.7083	1.7083	1.6578
		3	1.4804	1.4804	1.4251	1.4804	1.4804	1.4804	1.4251	1.4804	1.4804	1.4804
		2	1.2471	1.2644	1.2644	1.2163	1.2271	1.2471	1.2471	1.2163	1.2471	1.2644
	GA	5	1.4251	1.7083	1.7083	1.6650	1.7083	1.7083	1.7083	1.7083	1.7083	1.7083
		3	1.4251	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804	1.4804
		2	1.2471	1.2471	1.2644	1.2471	1.2644	1.2471	1.2471	1.2471	1.2471	1.2644
Mtunis	PSO_GA	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
Modules:20		3	2.0579	2.1248	2.1248	2.0810	2.1248	2.1248	2.1248	2.1248	2.1248	2.1248
Edges:57		2	1.8588	1.8588	1.8588	1.8588	1.8588	1.8588	1.8328	1.8588	1.8588	1.8588
	PSO	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
		3	2.1248	2.5790	2.1248	2.1248	2.1248	2.5790	2.1248	2.0810	2.1248	2.1248
		2	1.8588	1.8588	1.7926	1.8588	1.8588	1.7926	1.8588	1.8588	1.8588	1.8588
	GA	4	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145	2.3145
		3	2.0579	2.1248	2.1248	2.1248	2.1248	2.1248	2.1009	2.5490	2.8100	2.1248
		2	1.7926	1.8585	1.8588	1.8588	1.8588	1.8328	1.8588	1.8588	1.8588	1.8168
Bison	PSO_GA	10	2.5876	2.5876	2.6008	2.5993	2.5904	2.5993	2.5712	2.6008	2.5876	2.6008
Modules:37		8	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344
Edges:179		5	2.1338	2.1681	2.1367	2.1367	2.1574	2.1367	2.1574	2.1367	2.1148	2.1375
	PSO	10	2.5876	2.4750	2.5993	2.6008	2.5876	2.5904	2.5541	2.5904	2.5931	2.5876
		8	2.5344	2.5344	2.5344	2.4065	2.5344	2.4038	2.3944	2.5322	2.4065	2.5344
		5	2.1180	2.1148	2.1497	2.1367	2.1305	2.1467	2.1574	2.0852	2.1574	2.1681
	GA	10	2.5515	2.5876	2.5872	2.4891	2.4633	2.4231	2.5480	2.6008	2.5396	2.5091
		8	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.5344	2.4065	2.5344	2.5344
		5	2.1208	2.1293	2.1367	2.1367	2.1131	2.1681	2.0133	2.1148	2.1498	2.1293
Boxer	PSO_GA	4	2.9822	2.9822	2.9822	2.9822	2.8397	2.9822	2.9822	2.9822	2.8417	2.9822
Modules:18		5	2.7504	2.7504	2.6889	2.6889	2.7504	2.7504	2.6889	2.6889	2.6889	2.7504
Edges:29		6	2.4250	2.4250	2.0828	2.4250	2.0827	2.4250	2.4250	2.4250	2.4250	2.4250
	PSO	4	2.9822	2.9822	2.9822	2.8000	2.9822	2.9568	2.9822	2.9822	2.9822	2.9822
		5	2.7504	2.6889	2.7504	2.6889	2.7504	2.6889	2.7504	2.6889	2.6889	2.7504
		6	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.0828	2.4250	2.4250	2.4250
	GA	4	2.9822	2.9822	2.9222	2.9222	2.9822	2.7504	2.9822	2.9222	2.9822	2.9822
		5	2.7504	2.7504	2.7504	2.6889	2.7504	2.7504	2.7504	2.6889	2.7504	2.6889
		6	2.1434	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250	2.4250


In this study, 10 benchmark programs, i.e. RCS, INCL, GRAPPA, Modulizer, Compiler, Mtunis, Bison, Boxer, acqCIGNA and Ispell, were used as the inputs of three clustering algorithms: PSO, GA and the proposed PSO-GA algorithm. Each benchmark includes the MDG of the corresponding program. The edges in each MDG are considered unweighted. Table 1 gives the characteristics of the benchmark programs, namely the number of modules and edges of each program. Each benchmark MDG was clustered by the three clustering methods with three different numbers of clusters. This was done to evaluate the effectiveness of the clustering methods in finding the best clustering. Each experiment was executed more than 10 times. The proposed PSO-GA clustering method was implemented in Matlab. Tables 2–4 give the best clustering quality (MQ) value obtained in each of 10 executions of the PSO, GA and proposed PSO-GA algorithms. To compare the performance of the different clustering algorithms, we examined these best MQ values.

4.1 Evaluating the proposed method based on the MQ criterion

Table 5 reports the average MQ values obtained by the different clustering algorithms on the MDGs of the benchmark programs. All three clustering algorithms were able to identify the best software clustering when the program had fewer than 38 modules and fewer than 180 edges. Hence, the competition among the clustering algorithms is concentrated on the programs with a higher number of modules and edges (real-world size programs). According to the average MQ values in Table 5, the PSO-GA algorithm outperformed the PSO algorithm in 60% of the programs. The PSO algorithm also outperformed the GA algorithm in some of the benchmark programs. Hence, it can be argued that the PSO-GA algorithm produced better results than the other two algorithms and that PSO produced better results than GA.

The particle swarm optimization (PSO) algorithm is a metaheuristic based on the concept of swarm intelligence; it has been successfully applied in many areas, such as solving complex mathematical problems in engineering. In PSO, the search is driven by the velocity of each particle. The disadvantages of PSO are that it easily falls into a local optimum in high-dimensional spaces and has a low convergence rate in the iterative process. The low quality of the obtained solutions is another drawback of PSO. On the other hand, mutation and crossover are the two significant operators of GA, which are used for exploring the search space. Combining GA and PSO makes it possible to exploit the advantages of both algorithms, overcome their individual problems and improve the overall performance; the result is a compound algorithm of practical value. Regarding the results of our experiments, the combination of PSO and GA outperforms the previous methods in the software module clustering problem. Indeed, the results confirm the higher performance of PSO-GA in this problem. Table 5 depicts the quality of the clusters generated by the different methods, so that the effectiveness of the methods in clustering the modules of a software system can be compared directly.
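To illustrate how the velocity drives the search, the following minimal Python sketch performs one update of a single particle in the widely used inertia-weight form of PSO. It is not the ARAZ implementation (which was written in Matlab); the function name pso_step, the continuous encoding and the default values of w, c1 and c2 are assumptions chosen only for illustration.

```python
import random


def pso_step(position, velocity, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update for a single particle.

    w, c1 and c2 are illustrative values (assumptions), not the
    settings used in the paper.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # inertia term + pull toward the particle's own best
        # + pull toward the best position found by the swarm
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# A particle at rest on both best positions does not move.
pos, vel = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0])
```

The GA operators (crossover and mutation) that the hybrid method applies to the particles are not shown in this sketch.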

Table 5
The measured clustering quality for each method with 10 executions

Benchmark applications	PSO_GA	PSO [9]	HC [7]	GA [14]	Best method
RCS	2.0002	1.9993	2.180	1.9934	HC
INCL	6.2952	6.2229	5.480	5.8817	PSO-GA
GRAPPA	5.5531	4.8489	5.501	4.6542	PSO-GA
Modulizer	3.1446	3.1296	3.180	3.1456	GA
Compiler	1.7033	1.7083	1.693	1.7083	PSO and GA
Mtunis	2.3145	2.3145	2.250	2.3145	All three methods
Bison	2.5925	2.5766	2.590	2.5299	PSO-GA
Boxer	2.9339	2.9614	2.962	2.9410	PSO
acqCIGNA	7.0706	6.9643	6.029	6.4094	PSO-GA
Ispell	2.2703	2.2619	2.252	2.2639	PSO-GA

Table 6
The standard deviation among the obtained values of MQ by different clustering methods

Applications	PSO-GA	PSO [9]	GA [14]	Best method
RCS	0.0080	0.0091	0.0145	PSO-GA
INCL	0.2354	0.2878	0.3327	PSO-GA
GRAPPA	0.1131	0.1324	0.1782	PSO-GA
Modulizer	0.0536	0.0505	0.0407	GA
Compiler	0.1600	0.0213	0.0891	PSO
Mtunis	0.0000	0.0000	0.0000	All three methods
Bison	0.0096	0.0379	0.0582	PSO-GA
Boxer	0.0597	0.0572	0.0727	PSO
acqCIGNA	0.2007	0.2392	0.2871	PSO-GA
Ispell	0.0145	0.0372	0.0158	PSO-GA

Figure 7.

The stability of PSO-GA, PSO and GA clustering algorithms on 10 benchmark MDGs.


4.2 Evaluating the proposed method based on the standard deviation of the results

Standard deviation (SD) is a measure of dispersion that indicates how far the data lie from their mean. The lower the standard deviation, the lower the dispersion of the data and hence the higher the stability. Conversely, a large SD indicates more dispersion and less stable responses from the clustering algorithm. Table 6 shows the standard deviation of the MQ values obtained from 10 executions of each clustering algorithm on the different benchmark MDGs. As given in Table 6, the highest stability is observed for the Mtunis MDG, with 20 modules and 57 edges, for which the SD value is 0. According to Table 6, the PSO-GA algorithm was more stable than the PSO and GA algorithms in 60% of the programs. It was also found that the PSO algorithm was more stable than the GA algorithm. Figure 7 illustrates the stability curves of the different clustering algorithms on the 10 benchmark MDGs over 10 executions. In sum, the stability of the PSO-GA algorithm was better than that of PSO, and the stability of PSO was better than that of GA.
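The stability measure can be sketched in a few lines of Python. The example assumes that the tables report the sample standard deviation (n - 1 in the denominator); under that assumption the 4-cluster PSO_GA runs on RCS give the RCS entries of Tables 5 and 6. The helper name summarize is hypothetical.

```python
from statistics import mean, stdev

# MQ values of the 10 PSO_GA runs on RCS with 4 clusters (Table 2)
rcs_pso_ga_4 = [1.9963, 2.0080, 2.0094, 1.9892, 1.9963,
                2.0094, 2.0076, 1.9963, 1.9892, 2.0002]
# MQ values of the 10 PSO_GA runs on Mtunis with 4 clusters (Table 3)
mtunis_pso_ga_4 = [2.3145] * 10


def summarize(mq_runs):
    """Return (average MQ, sample standard deviation), rounded to 4 digits."""
    return round(mean(mq_runs), 4), round(stdev(mq_runs), 4)


rcs_summary = summarize(rcs_pso_ga_4)        # (2.0002, 0.008)
mtunis_summary = summarize(mtunis_pso_ga_4)  # (2.3145, 0.0)
```

Identical MQ values in every run, as for Mtunis, give an SD of 0, which is the most stable outcome possible.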

4.3 Evaluating the proposed method based on the success rate in achieving the best clustering quality

According to the success percentages (rates) given in Table 7, the success percentage of the PSO-GA algorithm was higher than that of the PSO algorithm in 40% of the programs and higher than that of the GA algorithm in 50% of the programs. In 30% of the programs, all three algorithms had identical success percentages. Furthermore, in 20% of the programs, the results of the PSO algorithm were better than those of the GA algorithm, while in 30% of the programs the GA algorithm performed better than the PSO algorithm. In the rows marked Unknown, the clustering algorithms produced different results in different iterations; hence, the success percentage was not measurable for those programs. In sum, the PSO-GA algorithm had better results than the other two algorithms, and the success percentage of the GA algorithm was 10% higher than that of the PSO algorithm.
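One possible reading of the success percentage is the share of runs that reach the best MQ observed for a program. The Python sketch below applies this reading to the 4-cluster Boxer runs of Table 3 and obtains the Boxer row of Table 7; the definition itself, the tolerance tol and the function name success_rate are assumptions, and other rows of Table 7 may rest on a different reference value.

```python
def success_rate(mq_runs, best_mq, tol=5e-5):
    """Percentage of runs whose MQ reaches best_mq (within tol)."""
    hits = sum(1 for mq in mq_runs if abs(mq - best_mq) <= tol)
    return 100.0 * hits / len(mq_runs)


# MQ values of the 10 runs on Boxer with 4 clusters (Table 3)
boxer_4 = {
    "PSO_GA": [2.9822, 2.9822, 2.9822, 2.9822, 2.8397,
               2.9822, 2.9822, 2.9822, 2.8417, 2.9822],
    "PSO": [2.9822, 2.9822, 2.9822, 2.8000, 2.9822,
            2.9568, 2.9822, 2.9822, 2.9822, 2.9822],
    "GA": [2.9822, 2.9822, 2.9222, 2.9222, 2.9822,
           2.7504, 2.9822, 2.9222, 2.9822, 2.9822],
}

# Reference value: best MQ seen by any algorithm in any run (assumption)
best = max(max(runs) for runs in boxer_4.values())
rates = {name: success_rate(runs, best) for name, runs in boxer_4.items()}
# rates == {"PSO_GA": 80.0, "PSO": 80.0, "GA": 60.0}
```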

Table 7
Success percentage of the clustering algorithms in finding the optimal clustering

Applications	PSO-GA	PSO [9]	GA [14]	Best method
RCS	70%	20%	10%	PSO_GA
INCL	Unknown	Unknown	Unknown	Unknown
GRAPPA	Unknown	Unknown	Unknown	Unknown
Modulizer	60%	40%	30%	PSO_GA
Compiler	90%	80%	90%	PSO_GA
Mtunis	100%	100%	100%	Unknown
Bison	100%	50%	80%	PSO_GA
Boxer	80%	80%	60%	PSO_GA
acqCIGNA	Unknown	Unknown	Unknown	Unknown
Ispell	50%	20%	20%	PSO_GA

4.4 Evaluating the proposed method based on the convergence to optimal response

Fast convergence to the best response means that the clustering algorithm is able to identify the best clustering in a small number of iterations. Figure 8 depicts the convergence of the PSO-GA, PSO and GA algorithms to the optimal response on the 10 benchmark programs. In 90% of the programs, the PSO-GA algorithm converged faster than the other two algorithms. The PSO algorithm also converged to the optimal response faster than the GA algorithm. In sum, the PSO-GA algorithm converged to the optimal response faster than the PSO and GA algorithms.
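Convergence speed in this sense can be quantified as the first iteration at which the best-so-far MQ reaches its final value. The Python sketch below shows one way to compute it; the two curves are hypothetical values made up for illustration (they are not data from Figure 8), and the function name and tolerance are likewise assumptions.

```python
def iterations_to_converge(best_so_far, tol=1e-4):
    """First iteration (1-based) whose best-so-far MQ is within tol of
    the final value. Assumes a non-decreasing best-so-far curve."""
    final = best_so_far[-1]
    for iteration, mq in enumerate(best_so_far, start=1):
        if final - mq <= tol:
            return iteration
    return len(best_so_far)


# Hypothetical best-so-far MQ curves of two algorithms
fast_curve = [1.2, 1.8, 2.3, 2.3, 2.3]
slow_curve = [1.0, 1.4, 1.9, 2.2, 2.3]

fast_iters = iterations_to_converge(fast_curve)  # 3
slow_iters = iterations_to_converge(slow_curve)  # 5
```

The comparison is only meaningful when both curves end at the same MQ value; an algorithm that stalls early at a poorer clustering would otherwise appear to converge quickly.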

Figure 8.

Convergence of PSO-GA, PSO and GA algorithms on 10 benchmark MDGs.


5. Conclusion

In this paper, we focused on software module clustering, which refers to grouping interdependent software modules into clusters. The rationale behind software clustering is to produce clusters with the highest internal relations among the modules of a cluster (cohesion) and the minimum external relations with the other clusters of the software (coupling). Identifying the best clustering is considered to be an NP-hard multi-objective problem. We introduced the PSO-GA algorithm to improve the speed of convergence to the optimal response, to enhance stability and to optimize software clustering quality. We conducted experiments on 10 standard benchmark MDGs, and each experiment was replicated 10 times. Finally, the results of the PSO-GA algorithm were compared with those of PSO and GA. According to the results of the experiments, the PSO-GA algorithm outperforms the other algorithms in terms of MQ, stability, success rate and convergence speed.

6. Drawbacks and directions for further research

On average, in 90% of the programs, the proposed method (ARAZ), which uses the PSO-GA algorithm, reached the optimal response better than the PSO and GA algorithms. However, since the proposed PSO-GA algorithm produced poorer responses than the PSO algorithm on the INCL program, further studies and experiments should be conducted to investigate the proposed algorithm on programs with a higher number of modules and edges. Hence, future studies should use benchmark programs with more than 300 edges. Furthermore, the proposed PSO-GA algorithm can be studied on weighted graphs. Another shortcoming of the proposed method is that the size of the generated clusters is not considered in the fitness function; hence, taking the size of the clusters into account is another direction for future study.

This study was not funded by any third party and the authors declare that they have no conflict of interest.

References

1. Mancoridis, Mitchell B.S., Rorres, Chen, Gansner E.R., Using Automatic Clustering to Produce High-Level System Organizations of Source Code, Department of Mathematics & Computer Science, Drexel University, Philadelphia, PA, USA.
2. Mancoridis, Mitchell B.S., Chen Y.F., Gansner E.R., Bunch: a clustering tool for the recovery and maintenance of software system structures, in: Proceedings of the IEEE International Conference Software Maintenance, 1999.
3. Mitchell B.S., A Heuristic Search Approach to Solving the Software Clustering Problem, A Thesis Submitted to the Faculty of Drexel University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, 2003.
4. Praditwong, Harman, Yao, Software module clustering as a multi-objective search problem, IEEE Transactions on Software Engineering, 37(2) (2011).
5. Kumari A.C., Srinivas, Hyper-heuristic approach for multi-objective software module clustering, Systems and Software, 117 (July 2016).
6. Akhlaq, Yousaf M.U., Impact of Software Comprehension in Software Maintenance and Evolution, Master Thesis, School of Computing, Blekinge Institute of Technology, Sweden, 2010.
7. Huang, Liu, A similarity-based modularization quality measure for software module clustering problems, Information Sciences, 342 (2016).
8. Mohammadi, Izadkhah, A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code, Information and Software Technology, 105 (2019), 252–256.
9. Prajapati, Kumar Chhabra, A particle swarm optimization-based heuristic for software module clustering problem, Arabian Journal for Science and Engineering, 43(12) (2017).
10. Sun, Ling, Software module clustering algorithm using probability selection, Wuhan University Journal of Natural Sciences, 23(2) (2018).
11. Prajapati, Chhabra J.K., TA-ABC: two-archive artificial bee colony for multi-objective software module clustering problem, J. Intelligent Systems, 27(4) (2018), 619–641.
12. Kennedy, Eberhart, Particle Swarm Optimization, in: Proceedings of IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
13. Hatami, Arasteh, An efficient and stable method to cluster software modules using ant colony optimization algorithm, Journal of Supercomputing, 76(9) (2020), 6786–6808.
14. Doval, Mancoridis, Mitchell B.S., Automatic Clustering of Software Systems Using a Genetic Algorithm, in: Proceedings of the IEEE Conference on Software Technology and Engineering Practice, 1999.
15. Storey, Theories, methods and tools in program comprehension: past, present and future, in: 13th International Workshop on Program Comprehension (IWPC’05), USA, 2005, pp. 181–191.
16. Xie, Gong, Tang, Lei, Liu, Wang, Enhancing Evolutionary Multifactorial Optimization Based on Particle Swarm Optimization, in: IEEE Congress on Evolutionary Computation (CEC), 2016.
17. Amarjeet, Chhabra J.K., FP-ABC: Fuzzy-Pareto Dominance Driven Artificial Bee Colony Algorithm for Many-Objective Software Module Clustering, in: Computer Languages, Systems & Structures, Vol. 52, 2018, pp. 1–21.
18. Amarjeet, Chhabra J.K., Improving modular structure of software system using structural and lexical dependency, Information and Software Technology, 82 (2017).
19. Austin M.A., Samadzadeh M.H., Software comprehension/maintenance: an introductory course, in: 18th International Conference on Systems Engineering (ICSEng’05), Las Vegas, USA, 2005, pp. 414–419.
20. Arasteh, Sadegi, Arasteh, Bölen: software module clustering method using the combination of shuffled frog leaping and genetic algorithm, Data Technologies and Applications, Online publication, November 2020.
21. McCall, Genetic algorithms for modelling and optimization, Journal of Computational and Applied Mathematics, 184(1) (2005).
