Software reusability metrics prediction and cost estimation by using machine learning algorithms

Abstract

In this research, a highly robust and efficient software design optimization model has been proposed for object-oriented programming based software solutions while considering the importance of quality and reliability. Due to a piece of information that software component reusability has allowed cost and time-efficient software design. The software reusability metrics prediction and cost estimation play a vital role in the software industry. Software quality prediction is an important feature that can be achieved a novel machine learning approach. It is a process of gathering and analyzing recurring patterns in software metrics. Machine learning techniques play a crucial role in intelligent decision making and proactive forecasting. This paper focuses on analyzing software reusability and cost estimation metrics by providing the data set. In the present world software, cost estimation and reusability prediction problem has been resolved using various newly developed methods. This paper emphasizes to solve the novel machine learning algorithms as well as improved Output layer self-connection recurrent neural networks (OLSRNN) with kernel fuzzy c-means clustering (KFCM). The investigational results confirmed the competence of the proposed method for solving software reusability and cost estimation.

Keywords

Object-Oriented Metrics software reusability metrics machine learning techniques software cost estimation

1. Introduction

Software reusability is a characteristic that refers to the projected reuse possibly of a software module. A module can be well-thought-out, a self-governing replaceable piece of the application that gives a distinct time function. A module of a software can be logically put together, which is created separately and distributed as components. The separated components can be associated unaffected with other apparatus to create an extensive system. Software reusability enhances the efficiency as well as can impact on the excellence and maintainability of software product. Developers have frequently used the concept of software reuse because of time and money constraints. Within a short period, developing the new system is a challenging and costlier one. It is generally assumed that the reprocessing of existing software will improve the dependability of a new software application. The real challenge involved in Software metrics is processing a large amount of historical data. Initially, the authors have taken the data set “sourceforge.net.”. In this paper, the author considered the WoS (Web of Service) projects. There are 100 numbers of WoS JAVA projects taken and extracted the software metrics. The role of data analytics (ML) Machine learning algorithms are used to perform so that the data is processed and patterns are generated. In this paper, we have extracted the software metrics by using the Chidambaram and Kemerer (CK) metrics, which are popularly known as CKJM. There are six well defined class oriented metrics are available like: -WMC (Weighted Method Per Class), DIT (Depth of Inheritance), NOC (Number of Classes), CBO (Coupling Between Objects), LCOM (Lack of Cohesion). These metrics are used to get the structural investigation of OO (Object – Oriented) based software mechanism. The primary objective of SE (software engineering) is to construct high-quality reusable software within the given period, with the least amount of price, and that satisfies the users requirement. To improve the quality of the software, the number of methods should be assumed. In the present trend, a set of experimental tools are used to determine the aspects of software with the help of recent developed technologies. Software measurement considers size, complexity, reliability and provides useful information concerning the exterior aspects of software like reusability, maintainability and testability.

2. Literature review

Dissimilar structures are enumerated in a variety of learning due to their belongings on the software merits. Padhy et al. [2] represents the concept of software reusability metrics and its implementation by using novel evolutionary algorithms. They proposed a set of parameters that allows estimating the measurement from the software code. Priyalakshmi and Latha [3] present the Evaluation of Software Reusability Based on Coupling and Cohesion. They have used several techniques to estimate the reusability metrics in this paper, they have focused cohesion primarily and coupling metrics as well as defined their metrics for extracting the quality information from the package. Li and Dai [4] present the article which demonstrates the objective of software reusability metrics and its importance in the software industry. They proposed a set of parameters that can calculate the Message length-based causal discovery, and others one is Ensemble structure discovery. Padhy et al. [5] focused more on software reusability criteria. They have done the systematic literature review (a decade review from 2000–2018) and found the essential reusability influencing factors. Hudaib et al. [6] discussed the software reusability metrics prediction by using a self-organization map (SOM). The SOM is one of the known pattern matching algorithms and ANN representation that is based on competitive knowledge and is an unsupervised learning paradigm. Padhy et al. [7] describe the complexity estimation using software met- rics. They have taken a set of 20 numbers of programs of each C# and Python and estimated the metrics form each other, further demonstrated software reusability metrics. Bou Nassif et al. [12] demonstrate how effort estimation can be done by using novel machine learning algorithms. Padhy et al. [13] proposed a set of reusability metrics by considering a six CK metrics set, which is defined by Chidambaram (1994). They have demonstrated the importance of software reusability in the industry. Padhy et al. [14] proposed evolutionary algorithms and models which able to estimate the reusability cost estimation by using novel machine learning algorithms. Przemyslaw et al. [16] discussed the software cost and project estimation by using unique machine learning algorithms Padhy et al. [17] identified the reusable components from the Component-Based System. They have proposed several metrics and model which will determine the reusable parts from the repository system. Padhy et al. [18] discusses the software reusability metrics from the MobileApps by using evolutionary algorithms. They have taken the case study of mobile Apps, which is called RozGaar. They have taken the evolutionary algorithm, which allows making a prediction of job seeker as well as a provider. Padhy et al. [19] demonstrated the software estimation procedure and parameter settings. Authors have used the different algorithms, notations to represent reusability metrics. Several algorithms have been used to get the reusability prediction.

Figure 1.

Software metric.

3. Proposed software metrics model

Software metrics can be used throughout a software development life cycle to assist in the estimation, quality control, productivity assessment, and project control. Figure 1 presents the software metrics life cycle, that demonstrate the metric software concept with measurement based techniques.

Table 1
CK metrics

Metric	Description
CBO	It represents the total number of different classes to which specific class is coupled with a particular class used to be combined with other classes in case the approaches of one class employ the attributes of each other. It encompasses inheritance based on the coupling.
RFC	Response for a Class (RFC) indicates towards approaches in the response set of the class, where the response set of specific category comprises the set of approaches of the class and the cluster approaches directly summoned by the criteria in set $M$ . A alternative of RFC rule outs techniques indirectly called up by a method in class set $M$ .
DIT	DIT metrics represent the Dept of Inheritance. Tree, it measures the total height from a parent node to the leaf node of the given tree.
NOC	The total number of children in the class available in the class diagram.
WMC	It stands for method measurement in the class hierarchy. The sum of all the methods includes in the class. Even though the developers of this metric left the weighting approach as a completion decision, but if no weighting method is employed, WMC turns out to be the number of strategies used in particular classes.
LCOM	Lack of Cohesion of Methods metric (LCOM) estimates the whole amount of pairs of approaches inside the class, which there is no attributes in common, minus the number of pair’s methods that do.

Figure 2.

Different characteristics of different metrics.

3.1 Research question

RQ 1:
Is it possible to predict the software reusability metrics
RQ 2:
Is it feasible to estimation the software cost?
RQ 2:
Is it feasible to derive the software cost estimation by using novel K-based fuzzy clustering?

In contrast, Metrics play a vital role in measuring quality products [15]. It can be employed throughout SDLC to assist fault and software quality estimation, quality monitoring and control, product evaluation, etc.

Figure 1 represents the flow of metrics and their classification. The above diagram indicates the hierarchical relationship of metrics. There are different types of parameters described; among them, CK metrics is one of them. In this paper, our prime concentration is CK metrics. The CK metrics consist of six other metrics, which we have been used to estimate the metrics from the data set. These are WMC (Weighted Method Per Class), DIT (Depth of Inheritance), NOC (Number of Children), CBO (Coupling between Object) RFC (Response for Class), and LCOM (Lack of Cohesion). The detailed classification of software metrics represents in the Fig. 2.

In this paper, the following six CK metrics have been used for fault detection and classification, as well as prediction purposes. The objective of choosing the attribute subsection is to access the complete feature set by utilizing enriched principles. Generally, the assortment of attribute subdivision includes two main fragments: the calculation of the feature subsections and examination of the overall existing space. Likewise, the evaluation of the specific features where the exploration contains an empirical model established on extensive survey methodology. The collection of assets signifies the method to decrease the laid-off attributes of the primary data and used to calculate the attribute subsections for assessing the improved value in the function.

OSM (Object-Oriented Software Metrics) are used to compute an assortment of characteristics of software like size, complexity, reliability, quality, and so on. They participate in a significant responsibility in analyzing and improving software code. Software measurement also offers useful information on exterior eminence aspects of software such as its maintainability, reusability, and reliability [8, 9].
4. Importance of software measurement

The fundamental requirement of any metric is to measure a precise issue. Here, the primary task is to identify the metrics and formulate what we expect to achieve in the course of reusability.

The most important plans in the construction of reusable mechanism are applicability, competence, ease of use, and reduce the efforts and comfort of maintenance.

The subsequent are the key aims for reuse metrics. The importance is on straightforwardness and ease

1.
To make available practical procedures of reuse
2.
To estimate the reimbursement of reuse. The parameters help to judge the approximation of the benefits from specific factor
3.
To offer a response to developers and management
4.
To give easy, effortless the recognition of values.

5. Research objective

In this section, the authors have discussed the state-of-the-art issues of Object-Oriented Metrics.

The principles of this research work are sketched below:

1.
To recommend the diversified ML (Machine learning) performance based on software reusability estimation replica for OO (Object-Oriented) software using use case point approach
2.
To evaluate the efficiency of machine learning techniques for reusability estimation of WoS (Web of Service) projects and validate the result using the dataset, and collected from “Sourceforgue.net.”

6. Machine learning techniques for reusability estimation

In this paper, we have used machine learning algorithms to estimate the software reusability metrics from the data set. There are several algorithms used, which are specially used for prediction purposes. The choice regarding deciding the machine learning technique for accomplishment reason in the planned investigate is carried out based on the historical research study done in the literature survey [10, 11].

6.1 Decision tree technique

This technique is called a prediction technique where an intelligent model characterized, and the dependent variable used as a set of predictor variables. Initially, it was proposed by “Morgan and Sonquist in 1963,” and at that time, it was referred to as Automatic Interaction Detection (AID).

The primary objective of the DT algorithm is to optimize every node locally rather than globally for the whole tree structure diagram. It provides better results accuracy As compared to other models. It may understand the ill belongings of the over-fitting issues.

6.2 Random forest technique

It is another kind of classification and regression analysis technique. The random forest (RF) falls under ensemble learning algorithms. It builds several decision trees throughout the preparation period and decides the concluding class by choosing e the mode of the levels produced by distinct trees. To get better results which are competitive than the results from individual DT models, ensemble model joins the consequences from dissimilar models of similar type or different types.

Table 2
Performance evaluation criteria

Parameter	Mathematical expression	Definition
Accuracy	$\frac{(TN+TP)}{(TN+FN+FP+TP)}$	Accuracy measure is the proportion of predicted fault-prone modules that are inspected out of all
		modules.
Precision	$\frac{TP}{(TP+FP)}$	It is used to measure the degree to which the repeated measurements under unchanged conditions
		show the same results.
F-measure	$2.\frac{\textit{Recall. Precision}}{\textit{Recall}+\textit{Precision}}$	F-measure combines the precision and recall numeric value to give a single score, which is
		defined as the harmonic mean of the recall and precision.
Recall	$\frac{TP}{(TP+FN)}$	Recall indicates how many of the relevant item that is to be identified.
Specificity	$\frac{TN}{(TN+FP)}$	Specificity focus on how effectively a classifier identifies the negative labels.

6.3 Support Vector Machines (SVM)

It is the groups of learning machines, supportive of implementing the structural hazard minimization inductive standard in arrange to get a good generalization on a limited number of learning prototypes. For the regression type, the authors have used the SVM (Support Vector Machine). The earlier drawbacks found in the literature can be overcome by the SVM model. Weighted Gradient Boost Support Vector Machine Classification Technique After relevant feature selection, Weighted Gradient Boost Support Vector Machine Classification (WGB-SVMC) Technique is proposed in GLSFE-WGBSC Technique to attain improved classification efficiency for brain tumor disease identification.

On the contrary to well-known works, WGB-SVMC Technique is designed by combining Support Vector Machine (SVM) and gradient boosting classification.

On the contrary to existing gradient boosting classification, WGB-SVMC Technique assigns a weight value for each base SVM classifier based on loss function and thus finds the robust classifier for brain tumor disease classification. The WGB-SVMC Technique combines a set of base SVM classifiers (i.e., weak classifiers) to form a reliable and robust classifier. An SVM is a weak classifier, which is a prediction model with relatively poor performance (e.g., in terms of accuracy) that may lead to wrong conclusions due to the high rate of misclassification error. To convert a weak learner to a strong one, the predictions of several weak independent learners have to be combined. This combination accomplished by taking the higher weight of every prediction of all vulnerable learners as the final robust classifier. SVM predicts the reuse-proneness of the OO-software classes. To model SVM based prediction the following generic function is applied

$\displaystyle Y^{\prime}=w*\phi(x)+b$ (1)

Where represents the non-linear transform. The prime objective of this model is to estimate the value is obtained by the minimization of regression risk.

$\displaystyle R_{\textit{reg}}(Y^{\prime})=C*\sum^{l}_{i=0}\gamma(Y^{\prime}_{% i}-Y_{i})+\frac{1}{2}*||w||^{2}$ (2)

It signifies the cost function, and the penalties for the estimation error (small and significant value of $C$ make the cost of misclassification low (“soft margin”) and high (“hard margin”), respectively). Here it is estimated as:

$\displaystyle w=\sum^{l}_{j=1}({\alpha}_{j}-{\alpha}^{*}_{j})\phi(x_{j})$ (3)

7. Materials and methods

Association – Identifying correlations among data and establishing relationships between data that exist 284 together in a given record. Classification – Identify and 285 sorting data into groups based on similarities of data classification is one of the most common applications of data mining. The idea is to build a model to predict future outcomes through the classification of database records into several predefined classes based on specific criteria. Clustering – Finding and visually presenting groups of facts previously unknown or left unnoticed. Heterogeneous data is segmented into several homogenous clusters. Standard tools used for clustering include neural networks and survival analysis Forecasting – Discovering patterns and data that may lead to reasonable predictions. It estimates the future value based on a record’s design. It deals with the continuously valued outcome. Forecasting relates to modeling and the logical relationships of the model and sometime in the future.

7.1 Performance evaluation and its criteria

7.1.1 Mean absolute error (MAE)

$\displaystyle\textit{MAE}=\frac{1}{n}\sum^{n}_{i=1}(|Y^{\prime}_{i}-Y_{i}|)$ (4)

It presents the calculated output, and it gives the actual value as expected.

7.1.2 Mean magnitude of the relative error

Mathematically, it is presented as

$\displaystyle\textit{MMRE}=\frac{1}{n}\sum^{n}_{i=1}{\frac{|Y^{\prime}_{i}-Y_{% i}|}{Y_{i}}}$ (5)

It presents the calculated output and shows the actual value as expected. In the experiment, to alleviate any possibility of overflowing in the denominator, 0.01 is added. Thus, it becomes

$\displaystyle\textit{MMRE}=\frac{1}{n}\sum^{n}_{i=1}{\frac{|Y^{\prime}_{i}-Y_{% i}|}{Y_{i}+0.01}}$ (6)

7.1.3 Standard error of the mean (SEM)

Mathematically, SEM is presented as below

$\displaystyle\textit{SEM}=\frac{\sigma}{\sqrt{N}}$ (7)

Table 3

Software reusability metrics comparison using machine learning algorithm

Techniques	MAE	MMRE	RMSE	SEM
Decision tree	0.1209	0.8092	0.0330	0.0404
SVM (linear)	0.1648	0.6494	0.0549	0.0494
SVM (quadratic)	0.0989	0.4895	00110	0.0413
SVM (polynomial)	0.1099	0.6993	0.0220	0.0397
RF	0.1219	0.6620	0.1050	0.0230

In the above equation refers to the standard deviation of the mean and presents the total number of samples.

In this paper, we have used several parameters that will estimate the accuracy, precision, F-measure, and recall. The corresponding mathematical expression has been given in the tabular form. Apart from this, there are other parameters that are used, which are called errors.

Figure 3.

Software reusability comparison graph.

8. Data and methods

The artificial Intelligence-based ANN model has developed for three machine learning algorithms, specifically SVM, RF, and DT, where their efficiency measured. The overall representation has been drawn in terms of the confusion matrix, which represents in Fig. 4. It shows the value of Accuracy of SVM is 93.41%, and F-Measure is 95.45%, which obtained for training, validation, testing, and overall, respectively.

Figure 4.

SVM.

Figure 5.

Random forest.

Similarly, the overall representation has been drawn in terms of the confusion matrix, which is represented in Fig. 5. It shows the value of Accuracy of RF is 83.41%, and F-Measure is 89.45%, which obtained for training, validation, testing and overall respectively. As a comparison from these two models, it reveals that SVM dominates RF technique.

Similarly, the overall representation has been drawn in terms of the confusion matrix, which is represented in Fig. 6. It shows the value of Accuracy of DT is 92.40%, and F-Measure is 95.45%, which obtained for training, validation, testing and overall respectively. A comparison in terms of DT and SVM from these two models it reveals that SVM dominates all other techniques which are clearly visualized in Fig. 7.

9. Proposed software reused prediction model

During data pre-processing techniques, the raw data must be transformed into target data. Then after the target data must be converted by using the selection technique so that pre-processed data will generate. In this paper, our raw data are 100 WoS projects developed using an object-oriented programming approach. To convert a sophisticated tool has been designed. A CKJM (Chidambaram and Kemerer) metrics extracted, then normalization is done, which range must be 0 to 1. At the end of the stage, machine learning reusability proneness techniques used.

10. Software cost estimation using kernel-based fuzzy clustering model

Papamichail et al. [21] discussed the data set about software reusability based on the static analysis metrics. They have explored the data set and presents the reuse rate information. They have discussed 62 maven projects and examined the number of classes and packages. Their primary focus on the source meter tool and finally retrieved 24,000 classes and 2000 packages. The primary purpose of this function is to use employing kernel function is to rewrite to get the intended service of the kernel-based fuzzy C-means clustering (KFCM) algorithm.

$\displaystyle J({P,U,C})=\sum\limits_{i=1}^{c}\sum\limits_{j=1}^{n}{u_{ij}^{m}% }k({p_{j},u_{j}}){}+k({c_{i},c_{i}})-2k({p_{j},c_{i}})$ (8)

In the above equation, the parameter $k$ points out the kernel function and’ denotes the weighting exponent of each fuzzy membership function and the fuzzy partition matrix denoted as ‘U’. The algorithm further divided into two different forms. As a comparison to the kernel, RBF outperforms and gives non-complex derivation to compute the cluster centers.

Figure 6.

Decision tree.

Figure 7.

Software reused prediction model.

Figure 8.

Structure of hidden layer for OLSRNN model.

Moreover, the partition matrix and cluster centers can be computed using alternating optimization [8] as follows:

$\displaystyle u_{ij}=\frac{({1-k({p_{j},c_{i}})})^{-\frac{1}{({m-1})}}}{\sum% \limits_{l=1}^{c}{({1-k({p_{j},c_{l}})})^{-\frac{1}{({m-1})}}}}$ (9) $\displaystyle c_{i}=\frac{\sum\limits_{j=1}^{n}{u_{ij}^{m}}k({p_{j},c_{i}})p_{% j}}{\sum\limits_{l=1}^{c}{u_{ij}^{m}k({p_{j},c_{i}})}}$ (10)

In this paper, the polynomial kernel has also been examined right through this make inquiries. To achieve this research work, the PKF (Polynomial Kernel Function) is used to redraft the objective function (i.e. Eq. (8)). And then, making an allowance for the FPM (Fuzzy Partition Matrix) and CC (Cluster Centers), it is being optimized. After optimization, we got the fresh CC and FPM. This can be acquired as below:

$\displaystyle u_{ij}=\frac{\left.\begin{array}[]{c}(k({p_{j},p_{j}})+k({c_{i},% c_{i}})\\ -2k({p_{j},c_{i}}))^{-\frac{1}{({m-1})}}\\ \end{array}\right.}{\left.\begin{array}[]{c}\sum\limits_{l=1}^{c}(k(p_{j},p_{j% })+k(c_{i},c_{i})\\ -2k(p_{j},c_{i}))^{-\frac{1}{(m-1)}}\\ \end{array}\right.}$ (11) $\displaystyle c_{i}=\frac{\sum\limits_{j=1}^{n}{u_{ij}^{m}}p({\langle{p_{j},c_% {i}}\rangle+1})^{q-1}}{\sum\limits_{l=1}^{c}{u_{ij}^{m}k({\langle{c_{i},c_{i}}% \rangle+1})^{q-1}}}$ (12)

Table 4

Statistical properties of used datasets

Software projects/dataset	Features	Effort data
		Completed unit	Min	Max		Mean		Median		Skew
[P1]	12	Months	1	105		22		12		2	.2
[P2]	18	Hours	12	54620		3921		1829		3	.92
[P3]	13	Months	3	138	.3	49	.47	26	.5	0	.57
[P4]	14	Months	4	11400		686		98		4	.4
[P5]	19	Hours	453	63694		8223	.2	5189	.5	3	.26

Figure 9.

KFCM-OLSRNN based block diagram for software cost estimation.

10.1 The hypothesis of the problem and its solution for cost estimation

In this section, the proposed model is constructed where a single training sample can be indicated as $({u,v})$ , in which the feature vectors sequence is denoted as $u$ depicting $\{{u^{1},u^{2},\ldots,u^{T}}\}$ , and a D-dimensional label vectors sequence is denoted as $v$ depicting $\{{v^{1},v^{2},\ldots,v^{T}}\}$ . $T$ can be represented as the Input sequence length, and D is termed as the output layer. The corresponding label vector is represented as $v^{t}$ and $t^{h}$ e feature vector of the input sequence is denoted as $u^{t}$ ; whereas, the right label dimensionality is represented as 1, and the dimensionality that corresponds to other labels is denoted as 0, respectively. Figure 7 represents the hidden layer structure, and the training sample $({u,v})$ can be visualized in the Fig. 8.

10.2 Software cost estimation by using KFCM

There are several authors focused on software cost estimation techniques by taking sample software project datasets. Padhy et al. [3] demonstrated the model for software cost estimation and reusability prediction. The author [15] focused on software cost estimation techniques by taking sample data sets. FCM stated as one of the premier fuzzy clustering algorithms, but it is one that’s why it is not widely used as well as multifaceted intrinsic uniqueness cannot proceed as well on polarimetric data clustering. It is known as linearly separable patterns. This part of this paper investigates the new cost prediction techniques, which are called KFCM, and it is derived from the COCOMO model. This model primarily required the following parameters, and these are as follows:

•
Person team size
•
Work proceeds time in months
•
Person efforts in the month

The above parameters can be represented in numerical ways:

$\displaystyle\textit{Effort}=a\times[\textit{SIZE}]^{b}\times\prod\limits_{i=1% }^{15}{EM_{i}}$ (13)

In the above equation, the coefficient ‘a’ represents the productivity, and ‘b’ indicates the scaling factor parameter. The accuracy of the project cost will be increased with 15 cost drivers mentioned as $EM_{I}$ . It is essential to normalize all the training datasets in the range [0, 1], before the application of the algorithm. The below-mentioned Fig. 9 represents block diagram for software cost estimation model.

In this part of this article provides the necessary dataset information. The primary dataset refers to PROMISE, but in this paper, a sample of students’ lab assignments considered to be the data set of GIET. To assess the performance of the proposed approach, we employed five software measurement datasets of the PMS-Repository of GIET. Table 4 offers the statistical information for these datasets in terms of a numeral of features in each dataset, dataset size, mean, minimum, maximum and skewness of effort.

Padhy et al. [20] focused on the reusability metric estimation as well as threshold derivation. Authors have also described the evaluation of threshold measurement as well as the threshold parameter setting mechanism. Maggo and Gupta [22] discussed the software reusability prediction model by using novel machine learning algorithms based on certain OOPs (Object-Oriented Programming System). They have developed the optimisation model for software reusability prediction. Papamichail et al. [23] discussed the software user-perceived reusability estimation techniques. They have estimated metrics from the software repositories. Again they have used several machine learning algorithms for estimation software metrics at both class and package level.

Table 5
Comparison of machine learning models for cost estimation model

Models MMRE MdMRE PRED (0.5)

Training Testing Training Testing Training Testing

SVM 0.38 0.44 0.41 0.38 0.90 0.98

RF 0.36 0.51 0.48 0.46 0.71 0.70

DT 0.38 0.48 0.50 0.32 0.86 0.87

GRU 0.50 0.50 0.42 0.34 0.98 0.80

10.3 Performance measures

Models	MMRE	MdMRE	PRED (0.5)
	Training	Testing	Training	Testing	Training	Testing
SVM	0.38	0.44	0.41	0.38	0.90	0.98
RF	0.36	0.51	0.48	0.46	0.71	0.70
DT	0.38	0.48	0.50	0.32	0.86	0.87
GRU	0.50	0.50	0.42	0.34	0.98	0.80

To identify the best possible (cost-efficient and effective reusability estimation model), the performance of the proposed system is assessed in two phases:

•
Accuracy related performance assessment.
•
Cost-efficiency related performance assessment.
•
A brief of these performance evaluation approaches is given as follows.

Accuracy related performance assessment

In this phase, performance is evaluated in terms of prediction Accuracy, Precision, F-Measure, Recall and Specificity and different error-profiles such as mean absolute error (MAE), the mean magnitude of the relative error, standard error of the mean (SEM), and root mean square error (RMSE). To retrieve performance measures, confusion metrics are obtained for each classifier. The key variables of the confusion metrics are given in Table 2.

Metrics-1

The performance of the above model can be measured by using the below metrics, and it can accurately identifying model derivation which is called as MMRE (mean magnitude of relative error)

$\displaystyle\textit{MMRE}=\frac{1}{N}\sum\limits_{X=1}^{N}{\frac{\left|{t\arg et% _{i}-\textit{output}_{i}}\right|}{t\arg et_{i}}}$ (14)

In the above equation, the parameter ‘N’ stands for the number of projects, and it $\text{target}_{i}$ is the desired output for the project $i$ .

Metrics-II

By taking the above equation, we can derive the new metrics which estimate the median value of all the mean magnitude errors, which termed as MdMRE. It can be represented as below:

$\displaystyle\textit{MdMRE}=\textit{Median}(\textit{MRE})$ (15)

Metrics-III

The third metric used in this work is defined as the percentage of predictions falling within the actual known value, referred to as PRED, which is given by

$\displaystyle\textit{PRED}(l)=\frac{100}{N}\sum\limits_{i=1}^{N}{D_{i}}$ (16)

Where

$\displaystyle D_{i}=\left\{{{\begin{array}[]{ll}1,&\textit{if }(\textit{MMRE}_% {i})<\frac{l}{100}\\ 0,&\textit{otherwise}\\ \end{array}}}\right.$

Where $N$ is the total number of projects in which term $l$ is equalled to 25 (then the PRED metric is defined as PRED (25)).

We utilise the cross-entropy between the predicted probability $o_{k}^{T}$ and the actual label $y_{k}^{T}$ as the cost function of models

$\displaystyle J({u,v})=-\frac{1}{T}\sum\limits_{t=1}^{T}\sum\limits_{k=1}^{K}y% _{k}^{t}\log({o_{k}^{t}})$ (17)

It seems that different models convergence rates and their essential differences are obviously depicted. Since in the proposed reusability prediction model the total number of reusable class identified by the proposed reusability prediction model is equal to the sum of true positive (TP) and false positive (FP) value, it is must estimate the cost incurred at testing and verification process at class level. In this manner, the cost of evaluation would be equivalent to the cost of unit testing ( $C_{u}$ ), and therefore the total unit-testing cost would be:

$\displaystyle\textit{Cost}_{\textit{Unit}}=(TP+FP)C_{u}$ (18)

Figure 10.
Machine learning models for cost estimation model.

In practice, where there can be 1000s of classes particularly in WoS solutions where there can be more functions or classes working dependently, it becomes infeasible to identify 100% of reusable components (due to higher complexity and inter-module dependency), it can be possible that a few of the correctly predicted faulty class may remain undetected in unit testing. Additionally, there is the possibility that such faulty classes and the defective classes which were predicted as non-reusable classes (number of FN) are identified by the predictor in later phases of testing such as system testing ( $C_{s}$ ) and field testing. Therefore, the final system-level reusability estimation cost can be estimated as:

$\displaystyle\textit{Cost}_{\textit{System}}={\delta}_{s}C_{s}$ (19) $\displaystyle\quad(FN+(1-{\delta}_{u})TP)$

In testing, if prediction analysis is avoided then the unit testing of each class becomes a must. Thus the total cost on unit testing turns out to be:

$\displaystyle\textit{Cost}_{\textit{Unit}}=M_{p}C_{u}TC$ (20) $\displaystyle\quad(\textit{Total number of classes})$

The experimental results suggested that improved results are achieved by combining the KFCM algorithm with the OLSRNN method than the other traditional methods such as KFCM-LSTM, KFCM-SRNN, and KFCM-GRU. In other words, for all the datasets, the KFCM-OLSRNN process provides more PRED (0.25) values, MdMRE, and MMRE values. Also, it is more significant for the selection of the validation technique.
11. Conclusion and future scope

In this article, software reusability prediction and cost estimation model have developed for object-oriented design based Web of Service software systems. The proposed method applies several novelties, such as a web of service software metrics estimation and reusability prediction. The overall results presented in the form of a confusion matrix in Fig. 6. The confusion matrix represents the accuracy of the Decision tree is 92.40%, and F-Measure is 95.45%, which obtained for training, validation, testing phases. Similarly, Random Forest and SVM algorithms are used for software cost estimation and reusability prediction. This research work revealed that is it feasible to derive an efficient and robust Reusability prediction model for Web-service products using OO-CK metrics. Here, it was also found that OO-CK metrics, particularly complexity, cohesion, and coupling related metrics can help predict Reusability in Web-services software products.

In the future, researchers may use deep learning algorithms to detect software reusability metrics prediction and cost estimation. The other researchers may focus early prediction whether the system enters into the aging (fault) or not? In the future, other reusability threshold estimation, feature selection, and evolutionary computing based classification algorithms can be explored to provide a more efficient solution.

References

Gill

N.S.

, Importance of software component characterization for better software reusability, ACM SIGSOFT Software Engineering Notes 31(1) (Jan 2006), 1–3.

Padhy

Singh

R.P.

and Satapathy

, Software reusability metrics estimation: algorithms, models and optimization techniques, Elsevier, Computers & Electrical Engineering 69 (2018), 653–668.

Priyalakshmi

and Latha

, Evaluation of Software Reusability Based on Coupling and Cohesion, 28(10) (2018), 1455–1485,

and Dai

H.H.

, What will affect of software: a casual model analysis, International Journal of Software Engineering and Knowledge Engineering 14(3) (2004), 351–364.

Padhy

Satapathy

and Singh

R.P.

, State-of-the-Art Object-Oriented Metrics and Its Reusability: A Decade Review, Smart Innovation, Systems and Technologies, Vol. 77, Springer, Singapore, 2018.

Hudaib

Huneiti

and Othman

, Software reusability classification and predication using self-organizing map (SOM), Communications and Network 8 (2016), 179–192.

Padhy

Satapathy

and Singh

R.P.

, Utility of an object-oriented metrics component: examining the feasibility of.Net and C# object-oriented program from the perspective of mobile learning, International Journal of Mobile Learning and Organization (IJMLO) 12(3) (2019), 2018.

Halstead

M.H.

, Elements of Software Science, New York, Elsevier North-Holland, 1977.

McCabe

T.J.

, A complexity measure, IEEE Trans. On Software Engg. SE-2(4) (Dec. 1976), 308–320.

10.

Corazza

Di Martino

Ferrucci

Gravino

and Mendes

, Investigating the use of support vector regression for web effort estimation, Empirical Software Engineering 16(2) (2011), 211–243.

11.

Braga

P.L.

LI Oliveira

and Meira

S.R.L.

, Age-based feature selection and parameter optimization for support vector regression applied to software effort estimation, in: Proceedings of the 2008 ACM Symposium on Applied Computing, ACM, 2008, pp. 1788–1792.

12.

Bou Nassif

Fernando Capretz

and Azzeh

, A tree boost model for software effort estimation based on use case points, in: Machine Learning and Applications (ICMLA), 2012 11th International Conference on, IEEE, Vol. 2, 2012, pp. 314–319.

13.

Padhy

Satapathy

and Singh

R.P.

, The utility of object-oriented reusability metrics and estimation complexity, Indian Journal of Science and Technology 10(3) (2017).

14.

Padhy

Singh

R.P.

and Satapathy

, Cost-effective and fault-resilient reusability prediction model by using adaptive genetic algorithms based neural network for the web of service application, Springer Cluster computing, 2018.

15.

Vahid Khatibi

Jawawi Dayang

N.A.

Hashim

S.Z.M.

and Khatibi

, A new fuzzy clustering-based method to increase the accuracy of software development effort estimation, World Appl Sci J 14 (2011), 1265–1275.

16.

Pospieszny

Czarnacka-Chrobot

and Kobylinski

, An effective approach for software project effort and duration estimation with machine learning algorithms, Journal of Systems and Software 137 (March 2018), 184–196.

17.

Padhy

Panigrahi

and Satapathy

S.C.

, Identifying the Reusable Components from Component-Based System: Proposed Metrics and Model, Information Systems Design and Intelligent Applications, Advances in Intelligent Systems and Computing, 2019, 863.

18.

Padhy

Panigrahi

and Satapathy

, Software Reusability Metrics Prediction by using Evolutionary Algorithms: RozGaar an Interactive Mobile Learning Application of IOS Press, Netherland, Scopus, ESCI, ACM, DBLP Indexing, Scopus, 2018.

19.

Padhy

Singh

R.P.

and Satapathy

S.C.

, Cluster Computing, Enhanced evolutionary computing based artificial intelligence model for web-solutions software reusability estimation, 2017.

20.

Padhy

Panigrahi

and Neeraja

, Evolutionary Intelligence, Threshold estimation from software metrics by using evolutionary techniques and its proposed algorithms, models, 2019.

21.

Papamichail

M.D.

Diamantopoulos

and Symeonidis

A.L.

, Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information, Data in Brief, 104687, Elsevier, 2019.

22.

Maggo

and Gupta

S.C.

, Machine learning-based efficient software reusability prediction model for java-based object-oriented software, International Journal of Information Technology and Computer Science (IJITCS) 6 (2014).

23.

Papamichail

Diamantopoulos

Chrysovergis

Philippos

and Andreas

, User-perceived reusability estimation based on analysis of software repositories, IEEE, Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), 2018.

Software reusability metrics prediction and cost estimation by using machine learning algorithms

Abstract

Keywords

1. Introduction

2. Literature review

Table 1 CK metrics

6.1 Decision tree technique

6.2 Random forest technique

Table 2 Performance evaluation criteria

7.1 Performance evaluation and its criteria

7.1.1 Mean absolute error (MAE)

10. Software cost estimation using kernel-based fuzzy clustering model

10.2 Software cost estimation by using KFCM

Accuracy related performance assessment

Metrics-1

Metrics-II

Metrics-III

References

Table 1
CK metrics

Table 2
Performance evaluation criteria