Abstract
Estimation of software cost (ESC) is a crucial task in the software management life cycle, alongside time and quality. Before a software project is developed, precise estimates are required in terms of person-months and time. Over the last few decades, various parametric (algorithmic) and non-parametric (non-algorithmic) approaches to software cost estimation have been developed. Among them, the constructive cost model (COCOMO-II) is a commonly used method for estimating software cost. To further improve the accuracy of this model, researchers and practitioners have applied numerous computational intelligence algorithms to optimize its parameters. However, the accuracy of this model remains an open problem. In this paper, we propose a biogeography-based optimization (BBO) method to optimize the current coefficients of COCOMO-II for better estimation of software project cost or effort. The experiments are conducted on two standard data sets: NASA-93 and Turkish Industry software projects. The performance of the proposed algorithm, called BBO-COCOMO-II, is evaluated using performance indicators including the Manhattan distance (MD) and the mean magnitude of relative error (MMRE). Simulation results reveal that the proposed algorithm achieves higher accuracy and significantly lower error than the original COCOMO-II, particle swarm optimization, genetic algorithm, flower pollination algorithm, and various other baseline cost estimation models.
Keywords
Introduction
Software cost estimation is one of the key elements of the software management life cycle and is widely used for planning and managing project cost and time [1, 3]. Reliable estimation supports efficient and accurate project decisions, such as resource allocation, reduction of project failures, the level of reliability required, the tools used for a software project, and the capability of the programmers [4, 5]. Several software cost estimation methods and procedures have been proposed, and they fall into two categories: parametric (algorithmic) and non-parametric (non-algorithmic). The most commonly used non-algorithmic methods are estimation by analogy, top-down estimation, bottom-up estimation, Parkinson's law, and price-to-win. Some of the prevalent parametric methods are COCOMO (Constructive Cost Model), Bailey-Basili (BB), the Walston-Felix model (WF), Halstead, Doty, Putnam, function point analysis, SEL, and COCOMO-II. COCOMO-II, which is discussed in Section 3, is the most widely used algorithmic estimation model. However, this model still lacks accuracy, and several algorithms have been studied to improve it. In this paper, a recently developed derivative-free method, the biogeography-based optimization (BBO) algorithm, is employed to train the coefficients of the COCOMO-II model, and the proposed model is compared with the baseline COCOMO-II model. Other well-known derivative-free methods, including the genetic algorithm (GA) and particle swarm optimization (PSO), are used to establish a fair comparison between the proposed methodology and various other software effort estimation models such as Doty, IVR, Halstead, Bailey-Basili, and SEL. Two historical datasets, NASA-93 and Turkish Industry software projects, were used in our experiments.
The main contributions of this paper are as follows:
The BBO algorithm is employed for the first time to find well-optimized parameters of COCOMO-II for software cost estimation. Experimental results of the proposed methodology are compared with several derivative-free methods, including GA, FPA, and PSO, and with other models of the same nature such as Doty, IVR, Halstead, Bailey-Basili, and SEL. Two standard datasets, NASA-93 and Turkish Industry software projects, are used to demonstrate the effectiveness of the proposed methodology.
The rest of this paper is arranged as follows: Section 1 presents the background of the study; Section 2 discusses related work on software project estimation; Section 3 describes the COCOMO-II model; Section 4 presents the proposed methodology; and Sections 5 and 6 present the results analysis, the conclusion, and future recommendations.
Many optimization techniques have been used to tune the current coefficients of the COCOMO model [6, 7], such as PSO, GA, tabu search, intelligent water drop algorithms, and many more. In [8], a cuckoo search algorithm was applied to optimize the coefficients of COCOMO-II. Experiments were conducted using 18 datasets from NASA-93 software projects. The experimental results demonstrate that the cuckoo search algorithm performs better than Bailey-Basili, Doty, Halstead, and the original COCOMO-II. Sweta and Pushkar [9] proposed a hybrid approach of cuckoo search and artificial neural networks to optimize the existing COCOMO-II parameters. They used NASA-93 projects for their experimental work. A hybrid technique that combines tabu search with GA was presented in [10] for optimizing the parameters of COCOMO-II. Urbanek et al. [11] used analytical programming with differential evolution techniques for software cost estimation. Their results demonstrate that the proposed method outperforms the differential evolution technique.
Dalal et al. [12] presented a generalized reduced gradient nonlinear optimization algorithm with best-fit analysis to improve the current coefficients of COCOMO-II. They applied the optimized COCOMO-II to NASA-93 software projects. Their results demonstrate that the optimized COCOMO-II performs better than the original COCOMO-II. Pereira et al. [13] applied COCOMO-II to effort estimation in an organizational case study from the aeronautical industry. They concluded that COCOMO-II estimates real software projects better than other estimation techniques. Shepperd et al. [14] proposed an expert-judgment method to calculate the cost of software projects. Amelia Effendi et al. [17] proposed a bat algorithm for the optimization of COCOMO-II. Sumera et al. [18] proposed a hybrid whale-crow optimization-based optimal regression method for estimating software project cost. They used datasets from four software industries in their experiments. According to their experimental results, the proposed model performs better than some other estimation models. Sheta et al. [19] proposed a soft-computing method for estimating software project cost. They used PSO to optimize the coefficients of COCOMO. They also used fuzzy logic to create a set of linear models over the domain of possible software LOC. The proposed algorithms were compared with baseline methods such as the HS, WF, BB, and Doty models. However, COCOMO-II still lacks accuracy.
In this study, we propose BBO for the optimization of COCOMO-II coefficients. The proposed algorithm is compared with conventional COCOMO-II and some other baseline methods on NASA-93 and Turkish Industry software projects. Simulation results reveal that the proposed algorithm gives better results than the original COCOMO-II, PSO, GA, and some other baseline models in terms of MMRE and MD.
The process of the COCOMO-II model.
The COCOMO model was first developed by Boehm in 1981 [20]. It is an algorithmic technique used for project cost or effort estimation. The COCOMO model consists of three layers: basic, intermediate, and detailed. COCOMO-II is a newer form of COCOMO, proposed by Boehm in 1995 [21]. It provides better cost estimation of software projects than COCOMO. Figure 1 presents the overall process of COCOMO-II. The effort estimate for a project, in person-months (PM), is given as:

$$PM = A \times (\text{Size})^{E} \times \prod_{i=1}^{17} EM_i$$

Here $A$ is a multiplicative constant, Size is the project size in KLOC, $E$ is the scaling exponent, and $EM_i$ are the 17 effort multipliers.
COCOMO-II model effort multipliers [21]
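The exponent $E$ is given as:

$$E = B + 0.01 \times \sum_{j=1}^{5} SF_j$$

Here $B$ is a constant and $SF_j$ are the five scale factors.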
COCOMO-II model scale factors [21]
There are numerous uncertainties in software project cost estimation using COCOMO-II. In COCOMO-II, the constants A and B need to be optimized to obtain better estimates. The aim of this study is to optimize these constants with the BBO method in order to improve the performance of COCOMO-II. The performance of BBO-COCOMO-II is evaluated by comparing it with PSO, GA, and various other cost estimation models, such as IVR, SEL, Bailey-Basili, Doty, and Halstead [22, 23, 24, 25]. NASA-93 and Turkish Industry software projects are used for the experiments. The input consists of the project size in KLOC, the measured effort, 17 effort multipliers, and 5 scale factors, and the output consists of the new optimized values of A and B.
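The effort calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example project, and the default values a = 2.94 and b = 0.91 (the published COCOMO II.2000 calibration) are not taken from this paper, and the scale-factor ratings are illustrative.

```python
from math import prod


def cocomo2_effort(kloc, effort_multipliers, scale_factors, a=2.94, b=0.91):
    """Return the estimated effort in person-months for one project.

    a and b are the constants that the optimizer is meant to tune;
    the defaults are an assumption (COCOMO II.2000 calibration).
    """
    exponent = b + 0.01 * sum(scale_factors)  # E = B + 0.01 * sum(SF_j)
    return a * kloc ** exponent * prod(effort_multipliers)


# Hypothetical project: 10 KLOC, all 17 effort multipliers at 1.0,
# and five illustrative scale-factor ratings.
example_ems = [1.0] * 17
example_sfs = [3.72, 3.04, 4.24, 3.29, 4.68]
effort = cocomo2_effort(10.0, example_ems, example_sfs)
print(f"Estimated effort: {effort:.2f} person-months")
```

Keeping a and b as keyword arguments means the same function can serve as the prediction step inside any optimizer that proposes new coefficient values.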
Fitness function
In cost estimation, a project is more likely to be completed successfully when the predicted effort closely matches the real effort. Higher accuracy therefore corresponds to lower values of MMRE and MD, and our goal is to minimize MD and MMRE between the actual and predicted efforts. The fitness functions used in our experiments are given in Eqs (5)–(7).
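As a rough illustration of these error measures, the sketch below assumes the standard definitions: MRE as the absolute error divided by the actual effort, MMRE as the mean of the MRE values, and MD as the sum of absolute errors. The function names and the three example projects are hypothetical.

```python
def mre(actual, predicted):
    """Magnitude of relative error for a single project (as a fraction)."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def manhattan_distance(actuals, predictions):
    """Sum of absolute differences between actual and predicted effort."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions))


# Hypothetical efforts in person-months for three projects.
actual_effort = [100.0, 200.0, 50.0]
predicted_effort = [110.0, 180.0, 50.0]
print("MMRE:", mmre(actual_effort, predicted_effort))
print("MD:", manhattan_distance(actual_effort, predicted_effort))
```

Either measure can be passed to an optimizer as the value to minimize; multiplying the MRE by 100 gives the percentage form.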
BBO algorithm
BBO is a stochastic computational algorithm proposed by Simon in 2008 [25]. In BBO, habitats are treated as individuals, which are generated randomly. Each individual has a habitat suitability index (HSI), i.e., a fitness value that indicates how good it is. A habitat with a high HSI represents a good solution, while a habitat with a low HSI represents a poor solution. Factors that affect the HSI of a habitat, such as rainfall, diversity of vegetation, topographic diversity, land area, and temperature, are called suitability index variables (SIVs). Features emigrate from high-HSI habitats to low-HSI habitats. Two migration operators, emigration and immigration, are used to evaluate and improve solutions of the given optimization problem. In a high-HSI habitat the emigration rate is high and the immigration rate is low, while in a low-HSI habitat the immigration rate is high and the emigration rate is low [28, 29]. In BBO, a population of habitats is first generated randomly. Each habitat is denoted by an N-dimensional integer vector.
Two operators, migration (i.e., immigration and emigration) and mutation, are used to produce the new population for the next generation. Mutation is applied to a habitat probabilistically and tends to increase the biological diversity of the population. The mutation rate ‘m’ is calculated by the following formula.
Where “
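The migration and mutation steps described in this section can be sketched as a small Python program that tunes the two COCOMO-II constants. This is a simplified illustration, not the MATLAB implementation used in the experiments. It assumes a linear migration model, a fixed mutation probability in place of the probability-based mutation rate m, elitism, real-valued SIVs, and synthetic projects. All names, bounds, and the starting coefficients 2.94 and 0.91 are hypothetical choices.

```python
import random


def predict(a, b, kloc, em_product, sf_sum):
    """COCOMO-II effort for one project with coefficients a and b."""
    return a * kloc ** (b + 0.01 * sf_sum) * em_product


def mmre(habitat, projects):
    """Fitness of a habitat [a, b]: a lower MMRE means a higher HSI."""
    a, b = habitat
    errors = [abs(actual - predict(a, b, kloc, em, sf)) / actual
              for kloc, em, sf, actual in projects]
    return sum(errors) / len(errors)


def bbo(projects, bounds, start=None, pop_size=20, generations=100,
        mutation_prob=0.05, elite=2, seed=0):
    """Simplified BBO loop (assumptions are listed in the lead-in)."""
    rng = random.Random(seed)
    dims = len(bounds)

    def random_siv(d):
        low, high = bounds[d]
        return rng.uniform(low, high)

    pop = [[random_siv(d) for d in range(dims)] for _ in range(pop_size)]
    if start is not None:
        pop[0] = list(start)  # seed with the current coefficients

    for _ in range(generations):
        pop.sort(key=lambda h: mmre(h, projects))  # best habitat first
        # Linear migration model: good habitats emigrate, poor ones immigrate.
        mu = [(pop_size - rank) / pop_size for rank in range(pop_size)]
        lam = [1.0 - m for m in mu]
        next_pop = [h[:] for h in pop]
        for i in range(elite, pop_size):  # elites are kept unchanged
            for d in range(dims):
                if rng.random() < lam[i]:
                    # Roulette-wheel choice of the emigrating habitat.
                    j = rng.choices(range(pop_size), weights=mu)[0]
                    next_pop[i][d] = pop[j][d]
                if rng.random() < mutation_prob:
                    next_pop[i][d] = random_siv(d)  # fixed-rate mutation
        pop = next_pop
    return min(pop, key=lambda h: mmre(h, projects))


# Synthetic projects (not from NASA-93 or the Turkish Industry data):
# each tuple is (KLOC, product of EMs, sum of SFs, actual effort).
data_rng = random.Random(1)
projects = []
for _ in range(15):
    kloc = data_rng.uniform(5.0, 200.0)
    em = data_rng.uniform(0.7, 1.5)
    sf = data_rng.uniform(10.0, 25.0)
    projects.append((kloc, em, sf, predict(3.2, 1.05, kloc, em, sf)))

bounds = [(1.0, 5.0), (0.5, 1.5)]  # hypothetical search ranges for A and B
start = [2.94, 0.91]               # assumed starting coefficients
best = bbo(projects, bounds, start=start)
print("start MMRE:", mmre(start, projects))
print("best A, B:", best, "MMRE:", mmre(best, projects))
```

Because the elite habitats are copied unchanged and the starting coefficients are placed in the initial population, the returned habitat is never worse on the training projects than the coefficients the search started from.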
Data set description
Two datasets, NASA-93 and Turkish Industry software projects [26, 27], were used for the experiments. These software projects contain KLOC, actual effort, estimated effort, 17 cost drivers or effort multipliers (EM), and 5 scale factors (SF), with rating levels from very low to extra high.
Parameter settings in BBO algorithm
Parameter settings in genetic algorithm framework
Parameter settings in PSO framework
Parameter settings in FPA framework
Magnitude of relative errors for estimations using BBO, COCOMO-II and other models using NASA-93 datasets
To evaluate the accuracy of BBO-COCOMO-II and to compare it with other approaches, including the original COCOMO-II, we used the most common evaluation measures: MRE, MMRE, and MD. MRE is the absolute percentage error between the actual and estimated effort for each reference software project. The parameter settings of the proposed BBO algorithm, GA, FPA, and PSO are given in Tables 3–6, respectively. The MATLAB implementation of BBO introduced by Dan Simon [24] is used in this paper.
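MMRE of software projects is given as

$$MMRE = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| AE_i - PE_i \right|}{AE_i}$$

MD is the difference between predicted effort and actual effort, which can be calculated as

$$MD = \sum_{i=1}^{N} \left| AE_i - PE_i \right|$$

where $AE_i$ is the actual effort of project $i$, $PE_i$ is its predicted effort, and $N$ is the number of projects. Each term of the MMRE sum is the MRE of project $i$.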
Magnitude of relative errors for estimations using BBO, COCOMO-II and other models using Turkish Industry datasets
MMRE and MD for proposed BBO, PSO, FPA, GA, COCOMO-II and other models using NASA datasets.
Comparison among actual efforts vs BBO, PSO, FPA, GA and COCOMO-II for NASA (c) and Turkish Industry (d) datasets.
MD and MMRE for proposed BBO, PSO, GA, COCOMO-II and other models using Turkish Industry datasets.
The experimental work was conducted on a PC with a 3.3 GHz Core i7 processor and 16 GB of RAM. The proposed algorithm was implemented in MATLAB 2013. The experiments were carried out using NASA-93 and Turkish Industry software projects. For the NASA-93 software projects, we used the first 40 projects as training data and 35 randomly selected software projects for testing. For the Turkish Industry software projects, 12 projects were used overall. In our experiments, we obtained A
MMRE (mean MMRE) and MD (Manhattan distance) comparison for proposed BBO, PSO, GA and other various models using NASA-93 datasets
MMRE (mean MMRE) and MD (Manhattan distance) comparison for Proposed BBO, PSO, GA and other various models using Turkish Industry datasets
It can be clearly seen that the proposed BBO algorithm reduces the error efficiently. The performance of the PSO and GA algorithms is also reported in detail for the NASA-93 and Turkish Industry software projects. Fig. 3 is the graphical representation of the effort results for both datasets. The graphs show that the effort predicted by BBO-COCOMO-II is much closer to the actual effort than the efforts predicted by the original COCOMO-II parameters and by COCOMO-II optimized using the PSO and GA algorithms. In these figures, the red curve represents the proposed BBO algorithm. Figs 2 and 4 show the MMRE and MD comparison of the proposed BBO and other cost estimation techniques using NASA-93 and Turkish Industry software projects, respectively.
Moreover, Tables 9 and 10 present a comparative analysis of MRE among COCOMO-II, BBO-COCOMO-II, PSO, GA, FPA, and other cost estimation models in terms of effort for 30 projects from the NASA dataset and 12 projects from the Turkish Industry software projects dataset.
In this study, we applied a novel approach, the biogeography-based optimization algorithm, as a nature-inspired optimization algorithm for optimizing the current parameters of COCOMO-II. The proposed BBO-COCOMO-II was tested using NASA-93 and Turkish Industry software projects. Simulation results show that the proposed BBO-based method outperforms conventional COCOMO-II, PSO, GA, FPA, and other cost estimation techniques. In the future, we intend to use other evolutionary algorithms for optimizing the coefficients of the constructive cost model and the constructive quality estimation model.
Acknowledgments
We thank the editor and reviewers for their thorough reviews, thoughtful comments, and constructive suggestions.
