Abstract
Software cost estimation is the process of predicting the most realistic and valid amount of effort required to develop a software system. Cost estimation is difficult because many factors affect the estimation process. Over more than a decade, many cost estimation models have been proposed in the literature to improve the accuracy of software cost estimates. However, these models often fail to estimate development cost accurately because of the uncertainty and imprecision associated with them. In this paper, the Alaa F. Sheta models, which are modified versions of Boehm’s well-known COCOMO model, are optimized. The parameters of the Sheta models are tuned by the proposed method to capture and minimize the influence of the different factors that affect overall software development cost. Experiments were carried out in MATLAB, and the results are analyzed using the Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), Prediction at level 0.25 (PRED(0.25)) and Value Accounted For (VAF). Estimation accuracy is evaluated on a NASA software project dataset. The proposed method shows better estimation capability than other state-of-the-art cost estimation models.
Introduction
In software development, the objective of a cost estimation model is to estimate accurately, early in the software development life cycle, the cost, time, effort and staff expertise needed for a project. Accurate estimation is very important because both underestimation and overestimation harm the project: underestimating the cost may reduce the quality and performance of the software product, while overestimating it may lead to misuse of funds.
Cost estimation techniques are categorized as algorithmic or non-algorithmic. The Constructive Cost Model (COCOMO) [1, 2] and software life cycle management [3] are algorithmic techniques.
The COCOMO model is a mathematical relationship that expresses software development time and effort as a function of program size and maintenance effort [4, 5].
Program size is expressed in kilo lines of code (KLOC). COCOMO applies a basic regression approach whose parameters are obtained from statistical analysis of historical project datasets and the features of the current project.
The non-algorithmic techniques include expert judgment, price-to-win and soft computing techniques. Soft computing techniques include fuzzy logic, neural networks and evolutionary computation. Fuzzy logic and neural networks are widely used to build effective cost estimation models [6, 7]. Evolutionary computation techniques have also been applied to software cost estimation, and several novel models based on them have been introduced [8].
Although several cost estimation techniques have been developed in recent years, changing software requirements make estimation increasingly challenging. Hence, precise, valid and reliable cost estimation remains an ongoing challenge in software engineering.
In general, the objective of a software cost estimation technique is to obtain an accurate estimate of the cost of software development. The basic input to software cost estimation is the size of the software to be developed; this size is then converted into the effort needed for development.
Effort is measured in person-months (PM), which can easily be converted into software cost. Several models in the literature describe the relationship between size and effort.
Project managers track project progress and ensure that the available resources are used well. In this scenario, effort estimation is of prime importance among all cost drivers, and effort is affected by the Developed Line of Code (DLOC). DLOC includes all program instructions and formal statements.
In this paper, the Environmental Adaption Method for Dynamic Environment (EAMD) [10] is used to optimize the parameters of the Sheta models [9] by finding a generalized optimal value for each parameter. The parameters are optimized so that accurate cost estimates can be obtained across different software projects.
We also propose a new architectural framework for cost estimation. The framework uses the Magnitude of Relative Error (MRE) and the Mean Magnitude of Relative Error (MMRE) with EAMD to assign a quality value to each parameter of the Sheta models. This improves the overall performance of the existing COCOMO-based models, makes them more reliable and reduces the negative effect of noise in software cost measurements.
The rest of the paper is organized as follows. Section 2 reviews related work on software cost estimation. Section 3 provides background details. Section 4 describes the work done to improve estimation accuracy for different types of projects. Section 5 analyzes the results, and Section 6 concludes the paper.
Related work
Many models have been developed over more than a decade to estimate software cost under changing requirements. Several assessment techniques have been applied to approximate factors such as the effort in man-months, the time and the expertise needed to make software functional. Considerable research has explored techniques for solving estimation problems. This section discusses some of the most relevant and useful work on cost/effort estimation.
Pedrycz et al. [11] used fuzzy sets to develop software cost estimation models. They introduced granular models of cost estimation and also proposed an augmentation of the well-known class of COCOMO cost estimation models.
In 2010, Sheta et al. [12] proposed a software cost estimation model based on Genetic Programming (GP). The model incorporates the effects of both the developed lines of code and the methodology used during development. They applied this approach to estimate the effort of several NASA software projects and obtained good results. A comparison with well-known models in the literature showed that the GP-based model outperformed them.
In 2010, Papatheocharous et al. [13] proposed a hybrid approach that combines Ridge Regression (RR) with a Genetic Algorithm (GA). They applied their hybrid cost model to the ISBSG dataset of software projects, and their results showed that eliminating redundant attributes may improve accuracy.
Sheta [9] developed two new models for estimating the parameters of the COCOMO model using a genetic algorithm. He also proposed an improved version of the popular COCOMO model that accounts for the effect of the adopted development methodology on effort computation, and used these models to estimate the effort required to develop software projects.
Borte et al. [14] framed software effort estimation as a collective accomplishment. Their paper argues that software effort estimation should be understood as a complex, sense-making activity rather than one based only on assumed information.
In 2010, Basha et al. [15] published their work on Empirical Software Effort Estimation (ESEE) models. Their study concluded that no single technique is best for all situations, so a careful comparison of the results of several techniques is required to obtain valid estimates.
Attarzadeh et al. [16] introduced a fuzzy-logic-based software effort estimation model. Their approach shortens the lengthy estimation process generally needed by conventional estimation techniques.
Kumar et al. [17] applied fuzzy estimation theory and neural networks to software engineering project management and control, using the Manpower Buildup Index (MBI) estimation model. Selection in the MBI estimation model is based on 64 Fuzzy Associative Memory (FAM) rules. Fuzzy estimation theory is used to model three fuzzy parameters, namely inverse application complexity (IAC), task concurrency (TC) and schedule pressure (SP), when estimating the MBI. They also demonstrated how FAM works in software project management and in MBI estimation.
In 2008, Idri et al. [18] proposed software cost estimation models using radial basis function neural networks, addressing imprecision and uncertainty as the main issues. They noted that estimating software development effort remains a complex problem that continues to attract considerable research attention, and that improving the accuracy of the effort estimation models available to project managers would allow more effective control of time and budgets during software development.
Background details
Constructive Cost Model (COCOMO)
The COCOMO model is a well-known and widely used cost and schedule estimation model proposed by Dr. Barry W. Boehm in 1981 [19, 20]. It is based on an analysis of 63 software projects from different domains during the 1970s and early 1980s. In COCOMO, effort is expressed in man-months (MM), or person-months, and software projects are classified by complexity into three categories: organic, semi-detached and embedded. The basic COCOMO estimate of software cost (effort) is given in Equation 1:
$$\text{Effort} = a \times (\mathrm{KLOC})^{b} \qquad (1)$$
In Equation (1), a and b are constants whose values are determined by regression analysis of historical and current software projects. The values of a and b vary across project types (organic, semi-detached and embedded), and these parameters have been optimized with various algorithmic and non-algorithmic techniques. Unlike models such as SLIM [21] and SEER-SEM [20], COCOMO is easy to understand. However, the COCOMO model has the following limitations:
- The attributes used to predict software development effort, and the relationships among them, are time dependent and differ between development environments [22].
- Existing models require accurate size estimates, in terms of source lines of code (SLOC), number of user screens, interfaces, complexity and so on, at a very early stage of development, when uncertainty is greatest [23].
- The models cannot handle data specified categorically or by a range of values and, most importantly, lack the logical reasoning needed to draw conclusions or make judgments from newly available data [22, 23].
Because of these limitations, many researchers have explored non-algorithmic techniques to build efficient and valid cost estimation models.
Sheta proposed evolutionary models that use a Genetic Algorithm (GA) to estimate software effort [9]. He applied the GA to estimate the parameters of the COCOMO effort estimation model and tested the resulting models on a dataset of 18 NASA software projects [27]. He provided new estimates for the parameters of the basic COCOMO model given in Equation 1, chosen so that a single generalized effort computation applies to all projects. The changes to the COCOMO model are discussed in Section 4.3.
Environmental Adaption Method for Dynamic Environment (EAMD)
EAMD is a population-based, nature-inspired, randomized algorithm that works on real-valued parameters [10, 24]. It is an improved version of the Environmental Adaption Method (EAM) [25, 26]. Unlike in EAM, the environment of the search space in EAMD is highly dynamic. In EAM, the environment is bounded by the average fitness of the population, whereas EAMD uses an “Environmental Window” that is fully dynamic and governed directly by environmental changes; the window represents how strongly the environment supports life. Over generations, as environmental constraints make the environment tougher, the window size decreases gradually, which keeps the environment dynamic.
The environmental window represents the environment in which its inhabitants must survive. The nature of the environment depends on the window size: when the window is large, species survive easily, and as the window shrinks the environment becomes tougher. To survive, individuals change their phenotypic structure in response to environmental changes and gain better fitness over time. Individuals that cannot adapt to the changes do not survive and are eliminated from the environment.
EAMD has two operators: adaption and selection. Adaption is applied first to the set of solutions; it improves their fitness within the range given by the environmental (adaption) window and forms a new intermediate set of improved solutions. Selection is then applied to the union of the initial and intermediate solutions: the merged set is sorted, and the best n solutions are selected for the next generation. This process is repeated over successive generations until either the optimal solution is found or the maximum number of fitness evaluations is reached. The working model of EAMD is shown in Fig. 1.

Working model of EAMD [24].
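The adaption/selection cycle described above can be sketched in Python. This is a simplified illustration, not the published EAMD: the adaption rule used here (each solution moves a random fraction of the way toward the current best, plus a uniform random step bounded by an environmental window that shrinks geometrically every generation) and the names `eamd_sketch`, `window` and `shrink` are our assumptions. Selection follows the description in the text: old and adapted solutions are merged, sorted by fitness and truncated to the best n.

```python
import random
from typing import Callable, List, Sequence


def eamd_sketch(
    fitness: Callable[[Sequence[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    pop_size: int = 20,
    generations: int = 200,
    shrink: float = 0.97,
    seed: int = 0,
) -> List[float]:
    """Minimize `fitness` with a simplified adaption/selection loop.

    Illustrative only: the adaption rule below is an assumption,
    not the published EAMD operator.
    """
    rng = random.Random(seed)
    dim = len(lower)
    # Random initial population inside the search domain.
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
           for _ in range(pop_size)]
    # Environmental window: starts as wide as the search domain.
    window = [upper[j] - lower[j] for j in range(dim)]

    for _ in range(generations):
        best = min(pop, key=fitness)
        # Adaption: build an intermediate set of adapted solutions.
        adapted = []
        for x in pop:
            child = []
            for j in range(dim):
                pull = rng.random() * (best[j] - x[j])        # move toward best
                step = rng.uniform(-window[j], window[j]) / 2  # bounded by window
                v = x[j] + pull + step
                child.append(min(max(v, lower[j]), upper[j]))  # clamp to domain
            adapted.append(child)
        # Selection: merge old and adapted solutions, keep the best pop_size.
        pop = sorted(pop + adapted, key=fitness)[:pop_size]
        # The environment gets tougher: the window shrinks every generation.
        window = [w * shrink for w in window]

    return min(pop, key=fitness)


if __name__ == "__main__":
    def sphere(x: Sequence[float]) -> float:
        return sum(v * v for v in x)

    best = eamd_sketch(sphere, [-5.0, -5.0], [5.0, 5.0])
    print(best, sphere(best))
```

Because selection keeps parents alongside adapted solutions, the best fitness never gets worse from one generation to the next; the shrinking window only narrows how far adaption may move a solution.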
Software cost/effort estimation is a challenging task in the software development process and in software project management. Many models, e.g. COCOMO I, COCOMO II and the Sheta models, have been proposed to calculate total effort. These models use adjustable parameters whose values can significantly affect the accuracy of effort estimation. Software cost depends on many factors, such as labour cost (effort), hardware cost, project category, language, storage constraints and software tools, but effort is the most significant and dominant of them. Effort is the amount of labour, measured in person-months, required to complete a software project.
Objective
The objectives of a software project estimation technique are to estimate the effort required to complete a project successfully and to calculate the total cost of the project.
Characteristics of a good software cost estimation technique
A good software cost estimation technique should have the following characteristics:
- It should have the support and acceptance of the project manager, the development team and the stakeholders.
- It should rely on a good software cost model.
- It should be based on relevant historical projects that support building cost-effective estimation techniques.
- It should be well-defined, so that the necessary actions can be taken to minimize the risks that arise during cost estimation.
Challenges
The most basic and significant challenges that software cost/effort estimation techniques face in providing valid and authentic estimates are:
- Validity and uncertainty of data: the data must be authentic, valid and unambiguous.
- Limited time to prepare the estimate: estimation must be completed on time, because the available time is very limited and delays may affect the overall development process.
- Proper use of resources.
Solving approach
An accurate estimate of software cost/effort is highly desirable, but no statistical model can predict the cost/effort of software development exactly, because of the uncertainty and imprecision associated with software projects. Sheta [9] noted that estimating software cost/effort is an optimization problem and used a genetic algorithm to solve it.
Many researchers have explored nature-inspired optimization techniques such as evolutionary computation, swarm intelligence and differential evolution to build better estimation models. The strong exploration and exploitation capabilities of these techniques have motivated researchers to use them to improve the accuracy of existing cost/effort models. EAMD is one such optimization technique: it randomly generates candidate solutions and iteratively adapts them toward higher fitness.
Proposed work
The parameters (a, b, c, d) of the models that Sheta proposed as extensions of the basic COCOMO model are optimized by EAMD to obtain accurate effort estimates. The experiments use a dataset of 18 NASA software projects [27], from which three attributes are considered: Kilo Developed Line of Code (KDLOC), Methodology (ME) and the Measured Effort.
Fitness function
The fitness function is the evaluation criterion used to measure the performance of the proposed algorithm. In other words, it is the objective function used to identify the best solution among the feasible solutions obtained for the problem. The fitness function is vital in optimization problems because it determines the validity and effectiveness of the solutions the algorithm finds.
In the proposed work, the Mean Magnitude of Relative Error (MMRE) [9, 32] is used as the fitness function for evaluating the cost estimation models. EAMD searches for the values of parameters a, b, c and d of the Sheta models that minimize MRE and MMRE relative to existing models. We also measure the Value Accounted For (VAF) [9] and the prediction at level L (PRED(L)) [33] to check the accuracy of the proposed work.
Evaluation technique
Relative Error (RE) is a ratio measure that expresses the prediction error per unit of measured effort. Because it normalizes by project size, larger projects, which tend to have larger absolute errors, do not dominate the evaluation. The measures MRE [9, 33], MMRE and PRED [33] are all based on RE [9, 33], which is calculated as follows:
$$RE = \frac{E - \hat{E}}{E} \qquad (2)$$
where E is the measured (actual) effort and $\hat{E}$ is the estimated effort.
The Magnitude of Relative Error (MRE) is the absolute value of the relative error, with E the measured and $\hat{E}$ the estimated effort:
$$MRE = \left| \frac{E - \hat{E}}{E} \right| \qquad (3)$$
The Mean Magnitude of Relative Error (MMRE) is the average MRE over n observations:
$$MMRE = \frac{1}{n} \sum_{i=1}^{n} MRE_i \qquad (4)$$
A lower MMRE indicates a better cost estimation technique.
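Prediction at level L, PRED(L), is also used to measure the performance of an estimation technique. It is the percentage of projects whose MRE does not exceed L:
$$PRED(L) = \frac{k}{n} \times 100\% \qquad (5)$$
where k is the number of projects with $MRE_i \le L$ and n is the total number of projects.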
L is set to 0.25, the value commonly used for PRED. A higher PRED indicates a better estimation method. VAF [9] is another measure used to assess an estimation technique; it compares the measured effort E with the estimated effort $\hat{E}$ to quantify the accuracy of the estimates. VAF is calculated as follows:
$$VAF = \left[ 1 - \frac{\operatorname{var}(E - \hat{E})}{\operatorname{var}(E)} \right] \times 100\% \qquad (6)$$
The variance (var) in Equation 6 is calculated as:
$$\operatorname{var}(x) = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \qquad (7)$$
In Equation 7, x is the quantity whose variance is needed in Equation 6 (the measured effort or the estimation error), $\bar{x}$ is its mean and n is the total number of values of x.
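The evaluation measures in Equations 3 to 6 translate directly into code. The Python sketch below is our illustration, and the function names are hypothetical. MMRE is returned as a fraction, so multiply by 100 to compare it with percentages such as those reported in the results. It uses `statistics.variance`, which divides by n − 1; because VAF is a ratio of two variances computed the same way, the choice between n and n − 1 cancels out.

```python
from statistics import mean, variance
from typing import Sequence


def mre(measured: float, estimated: float) -> float:
    """Magnitude of relative error (Equation 3)."""
    return abs(measured - estimated) / measured


def mmre(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean MRE over n projects (Equation 4), returned as a fraction."""
    return mean(mre(m, e) for m, e in zip(measured, estimated))


def pred(measured: Sequence[float], estimated: Sequence[float],
         level: float = 0.25) -> float:
    """PRED(L) (Equation 5): percentage of projects with MRE <= level."""
    hits = sum(1 for m, e in zip(measured, estimated) if mre(m, e) <= level)
    return 100.0 * hits / len(measured)


def vaf(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """VAF (Equation 6): (1 - var(E - E_hat) / var(E)) * 100."""
    residuals = [m - e for m, e in zip(measured, estimated)]
    return (1.0 - variance(residuals) / variance(measured)) * 100.0


if __name__ == "__main__":
    measured = [100.0, 200.0, 50.0, 80.0]   # illustrative values only
    estimated = [110.0, 150.0, 50.0, 100.0]
    print(mmre(measured, estimated), pred(measured, estimated),
          vaf(measured, estimated))
```

With n = 18 projects, PRED(0.25) can only take multiples of 100/18 ≈ 5.56%, so a PRED of 72.22% corresponds to 13 of the 18 projects having MRE ≤ 0.25.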
Boehm proposed the Constructive Cost Model (COCOMO) in 1981. The model characterizes projects by type, classifying them into three categories: organic, semidetached and embedded [4, 31].
In recent years, many researchers have modified and tuned Boehm’s model to improve the accuracy of software cost estimation. Sheta is one of them: he used a Genetic Algorithm to develop a generalized form of Boehm’s model that computes the effort needed for all types of projects.
Sheta proposed new estimates of the parameters and also added new parameters to Boehm’s basic cost estimation model (see Equation 1). The cost estimation models proposed and tuned by Sheta are as follows:
Model 1: Only the values of parameters a and b are optimized by Sheta:
$$EE = a \times (DLOC)^{b} \qquad (8)$$
Model 2: This model adds a new parameter c and a new term for the methodology (ME) taken from the NASA dataset:
$$EE = a \times (DLOC)^{b} + c \times ME \qquad (9)$$
In the second model, the methodology (ME) is treated as an element that contributes to the computation of the software development effort. The model keeps the power-law term of basic COCOMO and adds a linear methodology term; including the effect of methodology improves prediction quality over the basic COCOMO model.
In this model, a, b and c are the three parameters optimized by Genetic Algorithms (GAs) to estimate the development effort of software projects accurately, and ME is linearly related to the effort. The model is further improved by adding a bias term, as in classes of regression models, to stabilize it; this also helps to minimize the effect of noise in the effort measurements. The resulting model is:
Model 3: This model adds a bias parameter d to Equation (9):
$$EE = a \times (DLOC)^{b} + c \times ME + d \qquad (10)$$
In the above equations, EE, DLOC and ME stand for the estimated effort, the developed line of code (size) and the methodology, respectively.
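The three Sheta model forms described above (a·DLOC^b, plus a linear c·ME term in Model 2, plus a bias d in Model 3) can be captured in one function. This is a minimal sketch; the function name and the parameter values in the example are hypothetical and are not the fitted values of Sheta or of the proposed method.

```python
from typing import Sequence


def estimated_effort(model: int, params: Sequence[float],
                     dloc: float, me: float) -> float:
    """Estimated effort EE for Sheta's Models 1-3 (Equations 8-10).

    params holds (a, b) for Model 1, (a, b, c) for Model 2 and
    (a, b, c, d) for Model 3.
    """
    if model not in (1, 2, 3):
        raise ValueError("model must be 1, 2 or 3")
    ee = params[0] * dloc ** params[1]   # a * DLOC^b (all models)
    if model >= 2:
        ee += params[2] * me            # + c * ME (Models 2 and 3)
    if model == 3:
        ee += params[3]                 # + d, the bias term (Model 3)
    return ee
```

For example, `estimated_effort(3, (2.0, 1.0, 0.5, -1.0), 10.0, 5.0)` evaluates 2.0 × 10.0 + 0.5 × 5.0 − 1.0 = 21.5.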
The objective of the proposed work is to find generalized optimal values of all parameters that give the highest accuracy in estimating the effort required for all types of software projects. The experiments are conducted on the dataset shown in Table 1, which lists the measured effort and the corresponding developed lines of code and methodology of 18 software projects. Table 2 shows the terms used in the proposed work, which were also used by Sheta [9].
NASA Data for 18 software projects [27]
Terms used for software cost estimation
EAMD is applied to the models proposed by Sheta and Boehm, and their parameters are tuned by EAMD to obtain optimal results. Parameter tuning is driven by the resulting MMRE. In every generation the parameter values change, and the updated values are stored temporarily in memory. At the end of the final generation, we obtain the minimum MMRE and the optimized values of the four parameters. The working model of the proposed work is shown in Fig. 2.

(a) Prototype model of cost estimation. (b) Detailed view of the internal working of cost estimation model.
The working model of the proposed work is divided into three steps, as follows:
A population of size PS × D is initialized randomly, where PS is the population size and D (the dimension) is the number of parameters. Here D = 4: each individual is a group of four parameters (a, b, c and d), each drawn randomly within the search domain shown in Table 2. Each group is then processed as shown in Fig. 2.
1. Select one of the 18 projects at random.
2. Calculate the MRE of this project for every parameter set (a, b, c, d) in the population.
3. Select the minimum of the MREs calculated in step 2.
4. Find the parameter set (a, b, c, d) corresponding to this minimum MRE.
5. Apply the parameter set obtained in step 4 to all 18 projects.
6. Compute the 18 MREs resulting from step 5.
7. Compute the average of the 18 MREs found in step 6 and assign it as the MMRE.
The same process continues in subsequent generations.
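The random initialization and steps 1 to 7 above can be sketched as follows. This is our reading of the procedure, not the authors’ MATLAB code. Assumptions: each project is a `(dloc, me, measured_effort)` tuple, the Model 3 form is used (setting c = d = 0 gives Model 1), and `init_population`, `select_parameter_set` and the bounds used in the example are hypothetical placeholders rather than the limits of Table 2.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Project = Tuple[float, float, float]  # (dloc, me, measured_effort)


def init_population(ps: int, lower: Sequence[float], upper: Sequence[float],
                    seed: Optional[int] = None) -> List[List[float]]:
    """Random PS x D population; D = len(lower) parameters per individual."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper bounds must have the same length")
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(ps)]


def effort(p: Sequence[float], dloc: float, me: float) -> float:
    """Model 3 form a*DLOC^b + c*ME + d (c = d = 0 gives Model 1)."""
    return p[0] * dloc ** p[1] + p[2] * me + p[3]


def mre(measured: float, estimated: float) -> float:
    """Magnitude of relative error."""
    return abs(measured - estimated) / measured


def select_parameter_set(population: List[List[float]],
                         projects: Sequence[Project],
                         rng: random.Random) -> Tuple[List[float], float]:
    """Steps 1-7: return the chosen parameter set and its MMRE."""
    dloc, me, measured = rng.choice(projects)                        # step 1
    mres = [mre(measured, effort(p, dloc, me)) for p in population]  # step 2
    best_idx = min(range(len(population)), key=mres.__getitem__)    # steps 3-4
    best = population[best_idx]
    all_mre = [mre(m, effort(best, d, e)) for d, e, m in projects]   # steps 5-6
    return best, mean(all_mre)                                       # step 7
```

Note that step 1 makes the choice of parameter set depend on a single randomly chosen project; the MMRE in step 7 then scores that choice on all projects.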
Proposed algorithm
The proposed algorithm searches for globally optimal values of a, b, c and d, with minimum MMRE, so as to minimize the difference between the actual (measured) effort and the estimated effort across all types of software projects. The steps of the proposed algorithm are as follows:
For i = 1, 2, …, PS
    // Do the following operations
    Calculate EE_i    // Equations 8–10
    Calculate MRE_i   // Equation 3
End for
In the proposed algorithm, we calculate the estimated effort for each project and compare it with the measured (actual) effort to obtain the magnitude of relative error (MRE). The four parameters are optimized in every generation to minimize MMRE. First, an initial population of four-parameter vectors is generated; then, according to the model type, each row vector of parameters is applied to calculate the MRE of all 18 projects, the corresponding MMRE is calculated, and the minimum MMRE is stored in memory together with its parameters.
In each successive generation, every row vector is optimized by EAMD and the whole process is repeated. After each generation, the current MMRE is compared with the stored value, and the memory is updated with the lower MMRE and its parameters. This continues until the final generation is reached. Finally, the minimum MMRE and the corresponding optimized parameters are stored as the optimal result.
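The per-individual loop of the pseudocode and the best-so-far memory described above can be combined into a runnable sketch. Assumptions: the Model 3 form, projects as `(dloc, me, measured_effort)` tuples, and a caller-supplied `evolve` function standing in for one EAMD generation; `optimize` and `mmre_of` are hypothetical names.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Params = List[float]
Project = Tuple[float, float, float]  # (dloc, me, measured_effort)


def mmre_of(p: Sequence[float], projects: Sequence[Project]) -> float:
    """MMRE of one parameter row over all projects (Model 3 form)."""
    return mean(abs(m - (p[0] * d ** p[1] + p[2] * e + p[3])) / m
                for d, e, m in projects)


def optimize(population: List[Params], projects: Sequence[Project],
             evolve: Callable[[List[Params]], List[Params]],
             generations: int) -> Tuple[Optional[Params], float]:
    """Track the lowest MMRE seen across generations and its parameters."""
    best_params: Optional[Params] = None
    best_mmre = float("inf")
    for _ in range(generations):
        for p in population:                 # for i = 1, ..., PS
            score = mmre_of(p, projects)     # EE_i and MRE_i -> MMRE_i
            if score < best_mmre:            # update the memory
                best_params, best_mmre = list(p), score
        population = evolve(population)      # stand-in for one EAMD generation
    return best_params, best_mmre
```

In practice `evolve` would apply EAMD’s adaption and selection; passing `lambda pop: pop` simply re-evaluates the same population. Because the memory is only overwritten by a strictly lower MMRE, a generation that makes the population worse cannot lose the best result found so far.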
Result analysis
In this section, we present the initial population of the four parameters and the results of applying EAMD to optimize them for better estimation accuracy. For all three models proposed by Sheta, the estimated effort and MRE obtained by the proposed algorithm are better than those of the existing techniques: the estimated effort is very close to the measured effort, and the MRE is much lower than that of the other methods.
The MMRE, PRED and VAF values are shown in the tables below, and the results of the proposed work are superior to those of the other methods. A distinct optimal parameter set is obtained for each model. Model 2 gives slightly better estimates than model 1, and model 3 gives slightly better estimates than model 2. The tuned parameters therefore provide better estimates than the other estimation techniques.
All three models proposed by Sheta have been optimized by the proposed algorithm, with their parameters tuned by EAMD. The optimized models keep the forms of Equations 8–10; their parameter values are listed in Tables 4, 9 and 14 for models 1, 2 and 3, respectively.
Here, EE is the estimated effort, DLOC is the developed line of code and ME is the methodology added to the basic COCOMO model to improve its prediction accuracy. Table 3 shows the random initial population for all models, generated within the upper and lower limits of the search domain and used for the experimental work and analysis.
Initial random population of parameters within the specified limit
Table 4 shows the final optimized values of the parameters used to calculate the estimated effort for model 1. The estimated effort obtained by the proposed method, shown in Table 5, is better than that of Sheta and Sharma et al. [32]. Table 6 shows the magnitude of relative error of all methods; the MRE obtained by the proposed method is much lower than that of the other methods. We attribute the near-optimal value obtained for every project to the better convergence rate of the proposed method.
Optimized value of parameters by proposed method for model 1
Comparison of EE for model 1
Comparison of MRE for model 1
Table 7 lists the MMRE and PRED results of the three methods; the values obtained by the proposed method are better than those of the other two methods. The proposed method also gives a better VAF than the others, as shown in Table 8.
Comparison of MMRE and PRED for model 1
Comparison of VAF for model 1
Table 9 shows the final optimized parameter values for model 2. Table 10 shows the effort estimated by the different techniques; the proposed method gives better results than Sheta and Sharma et al. The MREs obtained by the different methods, shown in Table 11, confirm the superiority of the proposed method: its project-wise estimates are very close to the measured effort. In Table 12, the MMRE and PRED results of the proposed method are superior to those of the Sheta and Sharma et al. methods. The VAF of the proposed technique is comparable to that of the other models, as presented in Table 13.
Optimized value of parameters by proposed method for model 2
Comparison of EE for model 2
Comparison of MRE for model 2
Comparison of MMRE and PRED for model 2
Comparison of VAF for model 2
Table 14 shows the optimized values of all four parameters of model 3. In Table 15, the effort estimated by the proposed method is compared with the estimates of Sheta, Sharma et al. and Brajesh et al. [33]. The effort estimated by the proposed method is very close to the measured (actual) effort and is better than the other estimates.
Optimized value of parameters by proposed method for model 3
Comparison of EE for model 3
Table 16 shows that the proposed method also gives a lower MRE than the other existing methods. Table 17 presents the MMRE and PRED obtained by all methods in the experiments; the results show the superiority of the proposed method, which performs best on every measure. Table 18 shows the VAF of all techniques, and EAMD again gives better results than the other techniques.
Comparison of MRE for model 3
Comparison of MMRE and PRED for model 3
Comparison of VAF for model 3
The proposed algorithm was also compared with Boehm’s basic COCOMO model for the Organic, Semidetached and Embedded project classes; this comparison is presented in Section 5.4.
Table 19 lists the optimized values of parameters a and b used to compute the estimated effort, MRE, MMRE and PRED for the Organic, Semidetached and Embedded classes of Boehm’s basic COCOMO model. These results, shown in Tables 20–22, indicate that the proposed algorithm performs better than Boehm’s models.
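For reference, Equation 1 with Boehm’s standard basic-COCOMO constants for the three project classes (organic a = 2.4, b = 1.05; semidetached a = 3.0, b = 1.12; embedded a = 3.6, b = 1.20) is sketched below. These are the textbook basic-COCOMO values, not the tuned values of this work, and the names `BASIC_COCOMO` and `basic_cocomo_effort` are hypothetical.

```python
# Boehm's standard basic-COCOMO constants (a, b) per project class, Equation 1.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc: float, mode: str) -> float:
    """Effort in person-months, a * KLOC**b, for the given project class."""
    if kloc <= 0:
        raise ValueError("size in KLOC must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


if __name__ == "__main__":
    for mode in BASIC_COCOMO:
        print(mode, round(basic_cocomo_effort(10.0, mode), 1))
```

For a 10-KLOC project the three classes give roughly 26.9, 39.5 and 57.1 person-months, showing how effort grows with the class constants a and b.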
Optimized value of the parameters for the Boehm’s model and proposed model
Comparison of EE for the Boehm’s model and proposed model
Comparison of MRE for the Boehm’s model and proposed model
Comparison of MMRE and PRED for the Boehm’s model and proposed model
The proposed work estimates software cost with the aim of minimizing the difference between the estimated (predicted) and measured (actual) cost/effort of different projects. To this end, the parameters a, b, c and d of the Alaa F. Sheta models are tuned by EAMD, with MMRE as the fitness function that measures the closeness between predicted and actual results. The results show that the values obtained by the proposed method are more accurate than those of the Sheta, Sharma et al. and Brajesh et al. models. The proposed algorithm reduces MMRE by 10.38% compared with Sheta and 8.30% compared with Sharma et al. for model 1; by 15.68% compared with Sheta and 2.42% compared with Sharma et al. for model 2; and by 69.09% compared with Sheta, 3.25% compared with Sharma et al. and 10.59% compared with Brajesh et al. for model 3.
The minimum MMRE obtained by the proposed method (using EAMD) is 19.67%, which is lower than that of the other techniques compared. The best PRED(L) at L = 25% is 72.22%, and the VAF is 97.92%, both better than those of the other techniques. The proposed method also gives better results than the basic COCOMO model for Organic, Semidetached and Embedded software projects, as shown in Tables 19, 20, 21 and 22. Overall, the proposed algorithm is effective at predicting project cost, with minimum MMRE and high PRED and VAF.
In future work, we will apply new optimization techniques to improve the performance of different cost estimation models and provide better estimates with lower error rates.
