Software defect prediction based on weighted extreme learning machine

Abstract

The uncertainty of developers' activities can lead to engineering problems during software development, such as an increased number of software defects. Software practitioners therefore need advanced approaches to discovering defects in order to improve software systems. This paper describes a novel framework named Weighted Supervised-And-Unsupervised Extreme Learning Machine (WSAU-ELM), which comprises a supervised weighted extreme learning machine for software defect prediction (WELM-SDP) and an unsupervised weighted extreme learning machine with spectral clustering for software defect prediction (WELMSC-SDP), and which performs significantly better than previous software defect prediction methods. The key advantages of this work are: (i) both algorithms show better learning capability and computational efficiency; (ii) the supervised algorithm handles data sets more accurately and faster than common models, saving time and resources for software companies; (iii) the unsupervised algorithm improves accuracy over the current method; and (iv) the paper also discusses software defect priority for the defective data and provides detailed priority levels, which have not been discussed before. Experimental results on benchmark data sets show that the proposed framework is not only more effective than existing work but also extends the study with a priority analysis of software defects.

Keywords

Software defect; software defect prediction; weighted extreme learning machine; software defect priority

1. Introduction

Software maintenance has been regarded as the most difficult and costly activity in the software development life cycle [5]. During the maintenance period, it is difficult to maintain and evolve a software system with minimal environmental impact, a sound economic balance and well-managed knowledge. Moreover, the more bugs or defects a system contains, the more complex and time-consuming it becomes to maintain. Hence, effective solutions are needed that can discover software bugs quickly and reduce development cost.

Yang and Qian [16] proposed approaches to software defect prediction based on supervised learning algorithms. Popular classifiers, including Naive Bayes [6], decision trees [24], logistic regression [1], support vector machines (SVM) [25], k-nearest neighbours (KNN) [18] and ensemble methods [3], have all been used for this classification task. Although these supervised approaches achieve intelligent prediction, their training time becomes unacceptable to developers as software data keep growing, and their performance suffers from the imbalanced class distribution of software defect data sets. Besides supervised learning, unsupervised learning can also be used to predict defect proneness. The typical steps are: 1) clustering the software data into k clusters; and 2) labelling each cluster as defective or clean. However, high data dimensionality usually degrades the prediction performance.

Reviewing the existing learning models applied to software defect prediction, this paper finds that they suffer from either low classification performance or high time consumption. This paper therefore proposes more effective and efficient models, named WSAU-ELM. Extreme learning machine (ELM) is selected as the baseline classifier based on three observations: 1) its generalisation ability and classification performance are better than, or at least comparable to, those of SVM and the multilayer perceptron (MLP) [10]; 2) it requires far less training time than other classifiers [12]; and 3) weighted ELM offers an efficient strategy for imbalanced data distributions. WSAU-ELM first adopts the idea of cost-sensitive learning, using the weighted extreme learning machine (WELM) [32] as the base learner to address the class imbalance problem in software defect prediction. It then uses the ELM algorithm to construct a supervised learning framework. Next, it combines the advantages of spectral clustering and ELM to reduce the data dimension and build an effective weighted unsupervised learning framework. Finally, the defect priority of the defective data is analysed by Euclidean distance computation, which further supports software maintenance. Experiments on imbalanced NASA data sets demonstrate that the proposed framework is generally more effective and efficient than several state-of-the-art learning algorithms specifically designed for software defect prediction.

To address these problems, the WSAU-ELM framework comprises a supervised and an unsupervised process. For the supervised process, WELM-SDP is shown to be not only faster than other methods in running time but also competitive with state-of-the-art algorithms in terms of accuracy. For the unsupervised process, WELMSC-SDP not only reduces the data dimension but also improves the clustering accuracy (ACC) over the current unsupervised method. Moreover, the defect priority of the defective data is discussed on the basis of Euclidean distance, to further support software maintenance.

The contributions of this study are as follows:

•
Proposing novel methods, WELM-SDP and WELMSC-SDP, for software defect prediction on benchmark datasets in a unified framework.
•
Presenting an empirical study to evaluate the WELM-SDP and WELMSC-SDP approaches against existing software defect prediction approaches.

The rest of this paper is organised as follows. Section 2 presents the preliminaries and the proposed approach. Section 3 provides experimental results and analysis. Section 4 reviews related work on software defect prediction. Finally, Section 5 concludes the paper and indicates future work.
2. Proposed approach

This section first presents the preliminaries, namely the extreme learning machine, the weighted extreme learning machine and spectral clustering, and then describes the proposed algorithmic models.

2.1 Extreme learning machine

The extreme learning machine, proposed by Huang et al. [10], is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). The main characteristic that distinguishes ELM from conventional SLFN learning algorithms is that its hidden nodes are generated randomly. ELM therefore does not need to iteratively tune parameters towards their optimal values, which gives it a faster learning speed and better generalisation ability. Previous research has indicated that ELM achieves generalisation ability and classification performance better than, or at least comparable to, SVM and the multilayer perceptron (MLP), while consuming only a small fraction (roughly one tenth to one hundredth) of their training time.

Consider a classification problem with $M$ training instances belonging to $n$ categories. The $i$th training instance is represented as $(x_{i},t_{i})$, where $x_{i}$ is an $m\times 1$ input vector and $t_{i}$ is the corresponding $n\times 1$ target (output) vector. Suppose ELM has $L$ hidden nodes whose weights and biases are all generated randomly. Then, for instance $x_{i}$, the hidden-layer output is the row vector $h(x_{i})=[h_{1}(x_{i}),h_{2}(x_{i}),\ldots,h_{L}(x_{i})]$, and the mathematical model of ELM is:

$\displaystyle H\beta=T$ (1)

where $H$ is the $M\times L$ hidden-layer output matrix whose $i$th row is $h(x_{i})$, $\beta$ is the $L\times n$ output-layer weight matrix, and $T=[t_{1},t_{2},\ldots,t_{M}]^{T}$ is the $M\times n$ target matrix. Since $\beta$ is the only unknown in Eq. (1), it can be obtained by least squares as follows.

$\displaystyle\beta=H^{\dagger}T=\left\{\begin{array}{ll}H^{T}(HH^{T})^{-1}T&\text{when}\ M\leqslant L\\ (H^{T}H)^{-1}H^{T}T&\text{when}\ M>L\end{array}\right.$ (2)

where $H^{\dagger}$ is the Moore-Penrose generalised inverse of $H$, which guarantees that the solution is the minimum-norm least-squares solution of Eq. (1).
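As an illustration of Eqs. (1) and (2), the following NumPy sketch builds a random sigmoid hidden layer and computes $\beta$ with the two closed-form branches of Eq. (2). The uniform $[-1,1]$ range of the random weights and all function names are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def hidden_output(X, A, b):
    """Sigmoid hidden-layer outputs: row i is h(x_i), giving the M x L matrix H."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def elm_output_weights(H, T):
    """Least-squares output weights beta = H^dagger T, following the two cases of Eq. (2)."""
    M, L = H.shape
    if M <= L:
        # Full row rank: H^dagger = H^T (H H^T)^{-1}
        return H.T @ np.linalg.solve(H @ H.T, T)
    # Full column rank: H^dagger = (H^T H)^{-1} H^T
    return np.linalg.solve(H.T @ H, H.T @ T)

def elm_fit(X, T, L, seed=0):
    """Draw random input weights and biases (never trained), then solve Eq. (1)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))  # assumed weight range
    b = rng.uniform(-1.0, 1.0, size=L)
    H = hidden_output(X, A, b)
    return A, b, elm_output_weights(H, T)
```

When $H$ has full rank, both branches agree with `np.linalg.pinv(H) @ T`; solving the smaller of the two linear systems is usually cheaper than forming the pseudo-inverse explicitly.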

From the optimisation viewpoint, ELM can also be trained by simultaneously minimising $||H\beta-T||^{2}$ and $||\beta||^{2}$, which leads to the following problem.

$\displaystyle\text{Minimise}:L_{P_{\textit{ELM}}}=\frac{1}{2}||\beta||^{2}+C\frac{1}{2}\sum^{M}_{i=1}||\xi_{i}||^{2}$ (3) $\displaystyle\text{Subject to}:h(x_{i})\beta=t_{i}^{T}-\xi_{i}^{T},i=1,2,\ldots,M$

where $\xi_{i}=[\xi_{i,1},\xi_{i,2},\ldots,\xi_{i,n}]^{T}$ is the training error vector of the $n$ output nodes with respect to training instance $x_{i}$, and $C$ is the penalty factor representing the trade-off between maximising generalisation ability and minimising training error. This is a standard equality-constrained quadratic programming problem that can be solved with the Karush-Kuhn-Tucker (KKT) theorem [11], giving the solution below.

$\displaystyle\beta=\left\{\begin{array}{ll}H^{T}\left(\frac{I}{C}+HH^{T}\right)^{-1}T&\text{when}\ M\leqslant L\\ \left(\frac{I}{C}+H^{T}H\right)^{-1}H^{T}T&\text{when}\ M>L\end{array}\right.$ (4)
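The two branches of Eq. (4) are algebraically identical, by the push-through identity $H^{T}(I/C+HH^{T})^{-1}=(I/C+H^{T}H)^{-1}H^{T}$, so an implementation can simply solve whichever linear system is smaller. The NumPy sketch below assumes $H$ and $T$ are already available; the function name is illustrative.

```python
import numpy as np

def regularised_output_weights(H, T, C):
    """Output weights of the optimisation-based ELM, Eq. (4)."""
    M, L = H.shape
    if M <= L:
        # beta = H^T (I/C + H H^T)^{-1} T : an M x M system
        return H.T @ np.linalg.solve(np.eye(M) / C + H @ H.T, T)
    # beta = (I/C + H^T H)^{-1} H^T T : an L x L system
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
```

As $C\to\infty$ the $I/C$ term vanishes and Eq. (4) reduces to Eq. (2); a finite $C$ keeps both systems well conditioned even when $H$ is rank-deficient.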

2.2 Weighted extreme learning machine

The weighted extreme learning machine, which can be regarded as a cost-sensitive version of ELM, is an effective way to handle imbalanced data. Similar to the cost-sensitive SVM (CS-SVM), the main idea of WELM is to assign different penalties to different categories: the minority class receives a larger penalty factor $C$ and the majority class a smaller one. WELM thus focuses more on the training errors of minority instances, moving the classification hyperplane to a less biased position. A weight matrix $W$ is used to regulate the parameter $C$ for different instances, i.e., Eq. (3) can be rewritten as:

$\displaystyle\text{Minimise}:L_{P_{\textit{ELM}}}=\frac{1}{2}||\beta||^{2}+C\frac{1}{2}\sum_{i=1}^{N}W_{ii}||\xi_{i}||^{2}$ (5) $\displaystyle\text{Subject to}:h(x_{i})\beta=t_{i}^{T}-\xi_{i}^{T},i=1,2,\ldots,N$

where $N$ is the number of training instances and $W$ is an $N\times N$ diagonal matrix whose diagonal entry $W_{ii}$ is the regulation weight of the parameter $C$ for training instance $x_{i}$. Zong et al. [32] provided two weighting strategies, described as follows.

$\displaystyle\text{WELM1}:W_{ii}=1/\#(t_{i})$ (6)

and

$\displaystyle\text{WELM2}:W_{ii}=\left\{\begin{array}{ll}0.618/\#(t_{i})&\text{if}\ \#(t_{i})>\text{AVG}(t_{i})\\ 1/\#(t_{i})&\text{if}\ \#(t_{i})\leqslant\text{AVG}(t_{i})\end{array}\right.$ (7)

where $\#(t_{i})$ is the number of instances belonging to class $t_{i}$, $\text{AVG}(t_{i})$ is the average number of instances per class, and 0.618 is the golden-section value (the reciprocal of the golden ratio). WELM1 is more practical and more widely used than WELM2. The solution is then given by:

$\displaystyle\beta=\left\{\begin{array}{ll}H^{T}\left(\frac{I}{C}+WHH^{T}\right)^{-1}WT&\text{when}\ N\leqslant L\\ \left(\frac{I}{C}+H^{T}WH\right)^{-1}H^{T}WT&\text{when}\ N>L\end{array}\right.$ (8)
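The following NumPy sketch computes the WELM1 and WELM2 weights of Eqs. (6) and (7) and then the output weights of Eq. (8). Storing $W$ as a vector of its diagonal entries and scaling rows of $H$ avoids forming the $N\times N$ matrix; that choice and the function names are illustrative assumptions.

```python
import numpy as np

def welm_weights(y, scheme="WELM1"):
    """Diagonal entries W_ii of Eqs. (6)-(7), one per training instance."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    size_of = dict(zip(classes.tolist(), counts.tolist()))
    n_i = np.array([size_of[c] for c in y.tolist()], dtype=float)  # #(t_i) per instance
    w = 1.0 / n_i                                                  # WELM1
    if scheme == "WELM2":
        avg = counts.mean()                                        # AVG(t_i)
        w = np.where(n_i > avg, 0.618 / n_i, w)                    # damp larger classes
    return w

def welm_output_weights(H, T, w, C):
    """Eq. (8) with W = diag(w); T is reshaped to an N x n target matrix."""
    N, L = H.shape
    T = np.asarray(T, dtype=float).reshape(N, -1)
    WT = w[:, None] * T                                  # W T
    if N <= L:
        # beta = H^T (I/C + W H H^T)^{-1} W T
        return H.T @ np.linalg.solve(np.eye(N) / C + w[:, None] * (H @ H.T), WT)
    # beta = (I/C + H^T W H)^{-1} H^T W T
    return np.linalg.solve(np.eye(L) / C + H.T @ (w[:, None] * H), H.T @ WT)
```

With eight majority and two minority instances, WELM1 gives weights 1/8 and 1/2, so each minority error costs four times as much as a majority error.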

2.3 Spectral clustering

Unsupervised models rely on clustering, a common way to discover groups of similar entities. Frequently used clustering algorithms include k-means and hierarchical clustering. K-means clustering is often applied to high-dimensional data that are linearly separable [13], whereas hierarchical clustering generates clusters from the structure of a similarity matrix [9]. Recently, spectral clustering has become a more effective way to perform clustering [4] because, unlike traditional clustering methods, it can handle a variety of data distributions.

Unlike k-means clustering, which is based on Euclidean distance, spectral clustering partitions a data set according to the connectivity among its entities. The similarity [22] used in spectral clustering can be defined as

$\displaystyle w_{ij}=x_{i}\cdot x_{j}=\sum^{m}_{k=1}{a_{ik}a_{jk}}$ (9)

where $x_{i}$ and $x_{j}$ denote the metric vectors of software entities $i$ and $j$, $a_{ik}$ is the value of the $k$th metric of the $i$th software entity, and $m$ is the total number of metrics. This similarity can be regarded as an unnormalised Pearson correlation coefficient between nodes $i$ and $j$ [23]. It is left unnormalised because it makes little sense to normalise values across different metrics belonging to the same software entity.
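Eq. (9) is simply the Gram matrix $XX^{T}$ of the metric matrix $X$ (one row per software entity, one column per metric). The sketch below computes it both in vectorised form and as the explicit double sum, to make the index pattern visible; the function names are illustrative. Note that the graph Laplacian built from these weights is guaranteed to be positive semi-definite when all $w_{ij}\geq 0$, which holds, for example, when all metric values are non-negative.

```python
import numpy as np

def similarity_matrix(X):
    """Eq. (9): w_ij = sum_k a_ik * a_jk, i.e. the Gram matrix X X^T."""
    X = np.asarray(X, dtype=float)
    return X @ X.T

def similarity_loop(X):
    """The same quantity written as the explicit double sum of Eq. (9)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = sum(X[i, k] * X[j, k] for k in range(m))
    return S
```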

Following [15], a popular solution for spectral clustering is to minimise the normalised cut, a disassociation measure of the cost of cutting a graph. The three main steps of the algorithm are:

•

Compute the Laplacian matrix, which represents the graph, from the weighted adjacency matrix;

•

Compute the eigenvalues and eigenvectors of the Laplacian matrix; and

•

Threshold the eigenvector of the second smallest eigenvalue to bipartition the graph, with the threshold tuned experimentally.
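A minimal sketch of these three steps, assuming the Shi-Malik formulation in which the relaxed normalised cut is solved through the generalised eigenproblem $(D-W)y=\lambda Dy$, with $D$ the diagonal degree matrix. It uses SciPy's `scipy.linalg.eigh`, which accepts a second, positive definite matrix for generalised symmetric problems. The default threshold of 0 is an assumption, since the paper tunes the threshold experimentally.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W, threshold=0.0):
    """Split a graph in two by thresholding the second generalised eigenvector."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                   # node degrees (must be positive)
    D = np.diag(d)
    L = D - W                           # step 1: unnormalised Laplacian
    _, vecs = eigh(L, D)                # step 2: solve L y = lambda D y, eigenvalues ascending
    y = vecs[:, 1]                      # eigenvector of the second smallest eigenvalue
    return (y > threshold).astype(int)  # step 3: threshold to obtain the bipartition
```

For two densely connected groups joined by weak edges, the second eigenvector takes opposite signs on the two groups, so a zero threshold separates them; the sign of an eigenvector is arbitrary, so only the grouping (not the 0/1 labels) is meaningful.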

2.4 Proposed approach

In this section, a WSAU-ELM framework including supervised prediction and unsupervised prediction based on extreme learning machine (ELM) is described.

As shown in Fig. 1, the data sets are processed first. For the supervised WELM-SDP, the weighted ELM algorithm is applied to build the learning model. First, the features and labels are extracted from the existing data sets. Second, the relationship between features and labels is learned from the training data, yielding a learning model. Third, the obtained model is applied to classify software defects, and its performance is evaluated on the testing data.

Figure 1.

WSAU-ELM framework.

Besides the supervised process, the proposed framework also supports an unsupervised process. In the unsupervised WELMSC-SDP process, since data labels are assumed to be unknown, the labels are first removed from the existing data sets. Then, weighted ELM is used to reduce the data dimension in order to improve prediction accuracy. Next, spectral clustering is used to build the model on the unlabelled data, and the testing data are used to evaluate the model.

Meanwhile, the defects are also divided into different priorities to support software maintenance. The following subsections will illustrate the models in detail.

2.5 Supervised defect prediction process

This subsection describes the detailed procedure of the supervised algorithm based on the weighted extreme learning machine, called WELM-SDP. Its procedure is outlined in Algorithm 1.

Algorithm 1: WELM-SDP
Input: Training defect set $\Theta=\{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{N},y_{N})\}$, where $y_{i}\in\{+,-\}$; penalty factor $C$; number of hidden nodes $L$.
Output: A WELM-SDP classifier
Begin
//Step 1: Data collection
For each $\Theta$ in data sets
{
Divide it into two sets: $\Theta^{+}$ contains only positive (defective) instances, and $\Theta^{-}$ contains only negative (clean) instances;
}
//Step 2: Data normalisation
For $\Theta^{+}$ and $\Theta^{-}$ in data sets
{
Normalise the training data by Eq. (10);
}
//Step 3: WELM parameter settings
If (WELM selected)
{
Calculate the hidden-layer output matrix H and the weight matrix W by Eq. (6), then obtain $\beta$ by Eq. (8);
}
//Step 4: Training model
If (Parameters set)
{
Train a WELM-SDP classifier by Eq. (5) with the given parameters C and L, then output the decision function.
}
End
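Algorithm 1 can be condensed into the end-to-end NumPy sketch below. It is an illustration under stated assumptions, not the authors' MATLAB implementation: labels are encoded as +1 (defective) and -1 (clean), WELM1 weights are used, the sigmoid hidden layer draws weights uniformly from $[-1,1]$, and the defaults $C=2^{10}$ and $L=50$ are arbitrary points from the grid of Section 3.2.

```python
import numpy as np

def _scale(X, xmax, xmin, span):
    """Eq. (10): map each metric (column) onto [-1, 1] using training min/max."""
    return (2.0 * X - xmax - xmin) / span

def _hidden(Xn, A, b):
    """Sigmoid hidden-layer output matrix H."""
    return 1.0 / (1.0 + np.exp(-(Xn @ A + b)))

def welm_sdp_train(X, y, C=2.0 ** 10, L=50, seed=0):
    """Steps 1-4 of Algorithm 1; y holds +1 (defective) and -1 (clean)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 2: normalise with statistics from the training data only
    xmax, xmin = X.max(axis=0), X.min(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)   # guard constant metrics
    # Step 3: random hidden layer and WELM1 weights of Eq. (6);
    # Step 1's split into Theta+ and Theta- is implicit in the class sizes
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = _hidden(_scale(X, xmax, xmin, span), A, b)
    w = np.where(y > 0, 1.0 / np.sum(y > 0), 1.0 / np.sum(y < 0))
    # Step 4: output weights by Eq. (8), solving the smaller linear system
    N = H.shape[0]
    if N <= L:
        beta = H.T @ np.linalg.solve(np.eye(N) / C + w[:, None] * (H @ H.T), w * y)
    else:
        beta = np.linalg.solve(np.eye(L) / C + H.T @ (w[:, None] * H), H.T @ (w * y))
    return {"A": A, "b": b, "beta": beta, "xmax": xmax, "xmin": xmin, "span": span}

def welm_sdp_predict(model, X):
    """Label modules +1 (defect-prone) or -1 (clean) by the sign of H beta."""
    Xn = _scale(np.asarray(X, dtype=float), model["xmax"], model["xmin"], model["span"])
    scores = _hidden(Xn, model["A"], model["b"]) @ model["beta"]
    return np.where(scores >= 0.0, 1, -1)
```

Because WELM1 gives the two classes equal total weight, the minority (defective) class is not drowned out by the majority during the least-squares fit.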

In Step 2 of Algorithm 1, the data are normalised by Eq. (10) to eliminate the classification bias caused by the differing value ranges of the metrics.

$\displaystyle x_{\textit{norm}}=\frac{2x_{i}-\max(x)-\min(x)}{\max(x)-\min(x)}$ (10)

where $i=1,2,\ldots,k$, and $\max(x)$ and $\min(x)$ are the maximum and minimum values of $x$. Eq. (10) maps each value of $x$ linearly onto the interval $[-1,1]$.
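A column-wise implementation of Eq. (10), assuming that $x$ denotes one software metric (one column of the data matrix). The guard for constant columns, which would otherwise divide by zero, is our own assumption; the paper does not discuss that case.

```python
import numpy as np

def scale_to_pm1(X):
    """Eq. (10) applied per column: the minimum maps to -1 and the maximum to +1."""
    X = np.asarray(X, dtype=float)
    xmax, xmin = X.max(axis=0), X.min(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # constant column -> all zeros
    return (2.0 * X - xmax - xmin) / span
```

For example, a column with values 0, 5 and 10 becomes -1, 0 and 1.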

2.5.1 Unsupervised defect prediction process

Existing unsupervised defect prediction methods mostly use clustering algorithms to build the classification model. However, high data dimensionality remains a problem in the clustering process. This subsection therefore combines the advantages of spectral clustering and ELM in an improved unsupervised model, which uses WELM to discover the underlying structure of the original data and reduce the data dimension, and then applies spectral clustering to identify software defects. The algorithm, based on the weighted extreme learning machine and spectral clustering, is called WELMSC-SDP; its procedure is outlined in Algorithm 2.

In Step 4 of Algorithm 2, two cases are considered for the value of $\beta$:

•
If the number of hidden neurons ($H_{n}$) is less than the number of training instances ($T_{n}$), find the generalised eigenvectors $v_{2},v_{3},\ldots,v_{n+1}$ corresponding to the $n+1$ smallest eigenvalues (discarding the first), and let $\beta=[\tilde{v}_{2},\tilde{v}_{3},\ldots,\tilde{v}_{n+1}]$, where $\tilde{v}_{i}=v_{i}/||Hv_{i}||$, $i=2,\ldots,n+1$;
•
If $H_{n}>T_{n}$, find the generalised eigenvectors $u_{2},u_{3},\ldots,u_{n+1}$ corresponding to the $n+1$ smallest eigenvalues, and let $\beta=H^{T}[\tilde{u}_{2},\tilde{u}_{3},\ldots,\tilde{u}_{n+1}]$, where $\tilde{u}_{i}=u_{i}/||HH^{T}u_{i}||$, $i=2,\ldots,n+1$.

Algorithm 2: WELMSC-SDP
Input: Training defect set $\Theta=\{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{N},y_{N})\}$, where $y_{i}\in\{+,-\}$; penalty factor $C$; number of hidden nodes $L$.
Output: The label vector of defective and clean clusters, $Y_{\textit{cluster}}$
Begin
//Step 1: Data collection
For each $\Theta$ in data sets
{
Divide it into two sets: $\Theta^{+}$ contains only positive instances, and $\Theta^{-}$ contains only negative instances;
}
//Step 2: Similarity calculation
For $\Theta^{+}$ and $\Theta^{-}$ in data sets
{
Compute the similarities of the training data by Eq. (9);
}
//Step 3: Laplacian calculation
For each sample in data sets
{
Construct the graph Laplacian $L$ from $X_{\textit{training}}$;
}
//Step 4: Eigenvector selection
Find the generalised eigenvectors, select those of the smallest eigenvalues and determine $\beta$ (see the two cases above);
//Step 5: Embedding matrix $E$
$E=H\beta$;
//Step 6: Decide $Y_{\textit{cluster}}$
For each row of $E$
{
Cluster the rows of $E$ into $k$ clusters using spectral clustering, and determine $k$;
Let $Y_{\textit{cluster}}$ be the label vector of cluster indices for all points.
}
END
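Steps 2 to 5 can be sketched as follows for the first case of Step 4 ($L\leq N$). The paper does not spell out the generalised eigenproblem, so the sketch assumes the unsupervised ELM (US-ELM) objective, whose solution satisfies $(I_{L}+\lambda H^{T}\mathcal{L}H)v=\gamma H^{T}Hv$, where $\mathcal{L}$ is the graph Laplacian and $\lambda$ a trade-off parameter. The value of $\lambda$, the sigmoid hidden layer and all names are hypothetical, and the clustering of Step 6 is left out.

```python
import numpy as np
from scipy.linalg import eigh

def welmsc_embedding(X, n_dims=2, L=20, lam=0.1, seed=0):
    """Steps 2-5 of Algorithm 2 for the case L <= N (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if L > N:
        raise ValueError("this sketch covers only the first case of Step 4 (L <= N)")
    # Step 2: dot-product similarity of Eq. (9)
    S = X @ X.T
    # Step 3: unnormalised graph Laplacian D - S
    lap = np.diag(S.sum(axis=1)) - S
    # Random sigmoid hidden layer, as in the supervised model
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    # Step 4 (assumed US-ELM form): (I + lam * H^T lap H) v = gamma * H^T H v
    _, V = eigh(np.eye(L) + lam * H.T @ lap @ H, H.T @ H)
    V = V[:, 1:n_dims + 1]                    # discard v_1, keep v_2 ... v_{n+1}
    beta = V / np.linalg.norm(H @ V, axis=0)  # v_i / ||H v_i||
    # Step 5: embedding matrix E = H beta (one row per software module)
    return H @ beta
```

Because the generalised eigenvectors are orthonormal with respect to $H^{T}H$, the columns of the embedding $E$ are orthonormal, which is easy to check numerically.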

In Step 5 of Algorithm 2, the embedding matrix $E=H\beta$ is computed, which projects the data into a lower-dimensional space. Finally, in Step 6, spectral clustering performs the unsupervised task in the embedded space. The number of clusters $k$ must be chosen; this paper uses the sum of squares of deviations (SSD) [8] to decide it. A smaller SSD indicates more compact, stable clusters. SSD reaches its largest value when all samples are placed in one cluster, decreases as the number of clusters increases, and becomes 0 when each sample forms its own cluster.
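A minimal sketch of the SSD criterion, assuming SSD is the total within-cluster sum of squared deviations from each cluster mean (the usual definition). On a toy example it reproduces the boundary behaviour described above.

```python
import numpy as np

def ssd(X, labels):
    """Sum over clusters of squared deviations of members from the cluster mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total
```

For the one-dimensional points 0, 2, 10 and 12, one cluster gives SSD = 36 + 16 + 16 + 36 = 104, the split {0, 2} and {10, 12} gives 4, and four singleton clusters give 0.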
2.6 Defect priority partition

Software developers can fix defects more effectively if they know the detailed priority of each defect. This subsection therefore proposes a distance-based method to assign priority levels. The IEEE has defined software defect priorities No. 1 to No. 5 as a standard [28], so it is helpful to label the defective data with these five priority levels.

To this end, the Euclidean distance is first computed between each defective sample and each sample in the non-defective cluster. For each defective sample, the value “distance_every_ave” is defined as the average of its distances to all non-defective samples, and this value is computed for every defective sample in the cluster. Second, these values are sorted and the defect level is decided by comparing distances. A defective sample that lies closer to the non-defective data is more similar to it, so its fix is considered less urgent. The defective data are thus labelled from Priority 1 (highest priority) to Priority 5 (lowest priority).
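The sketch below computes “distance_every_ave” for each defective sample and turns it into five priority levels. The paper does not state how the sorted distances are cut into levels, so the equal-width binning used here, and the choice to give every sample the lowest priority when all distances coincide, are hypothetical rules for illustration only.

```python
import numpy as np

def defect_priorities(X_def, X_clean, n_levels=5):
    """Return (distance_every_ave, priority) for each defective sample."""
    X_def = np.asarray(X_def, dtype=float)
    X_clean = np.asarray(X_clean, dtype=float)
    # Pairwise Euclidean distances: rows = defective samples, columns = clean samples
    diffs = X_def[:, None, :] - X_clean[None, :, :]
    dist_ave = np.sqrt((diffs ** 2).sum(axis=2)).mean(axis=1)
    lo, hi = dist_ave.min(), dist_ave.max()
    if hi == lo:  # all equally distant: hypothetically assign the lowest priority
        return dist_ave, np.full(dist_ave.shape, n_levels)
    # Hypothetical rule: equal-width bins over [lo, hi]
    bins = np.floor((dist_ave - lo) / (hi - lo) * n_levels).astype(int)
    bins = np.minimum(bins, n_levels - 1)  # the maximum falls into the top bin
    # Farthest bin -> Priority 1 (most urgent); nearest bin -> Priority n_levels
    return dist_ave, n_levels - bins
```

With one clean sample at the origin and defective samples at distances 1 to 5, the farthest sample receives Priority 1 and the nearest Priority 5.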

3. Experiments and analysis

3.1 Data sets

The study uses the NASA benchmark data sets [2] to validate the effectiveness of the proposed algorithms. Table 1 lists the 10 data sets together with their implementation language, number of instances, number of defects and defective ratio.

Table 1
NASA data sets

Data	Language	Number of instances	Number of defects	Defective (%)
mc2	C++	125	44	35.20
kc2	C++	522	107	20.50
jm1	C	10885	2106	19.35
kc3	Java	194	36	18.56
kc1	C++	2109	326	15.46
pc3	C	1077	134	12.44
pc4	C	1458	178	12.21
mw1	C	264	27	10.23
cm1	C	498	49	9.83
pc1	C	1109	77	6.94

All algorithms are implemented in MATLAB R2015a, and experiments are conducted on an Intel(R) Core(TM) i7-6700HQ CPU (4 cores, 8 threads, 2.60 GHz base frequency) with 32 GB RAM.

3.2 Experimental settings

To validate the effectiveness and superiority of the two proposed algorithms, this paper compares them with several representative learning algorithms, listed below.

•
Ensemble learning [27]: the standard Bayes-based ensemble process for supervised software defect prediction, used without any additional operations.
•
KNN [26]: the standard KNN algorithm, used without any additional operations to train a supervised prediction model on the training set.
•
Logistic regression (LR) [21]: the standard LR algorithm, used without any additional operations to train a supervised prediction model on the training set.
•
Decision tree (DT) [20]: the standard DT algorithm, used without any additional operations to train a supervised prediction model on the training set.
•
Spectral clustering [13]: the standard spectral clustering algorithm, used without any additional operations to build an unsupervised prediction model on the training set.

Before training any classifier, each data set was scaled into the [0, 1] interval. For evaluation, the study adopts Accuracy for supervised learning and clustering accuracy (ACC) for unsupervised learning, defined as follows.

$\displaystyle\textit{Accuracy}=\frac{\textit{TP}+\textit{TN}}{\textit{TP}+\textit{TN}+\textit{FP}+\textit{FN}}$ (11)

where TP (true positive) is the number of defective samples classified as defect-prone; FP (false positive) is the number of non-defective samples classified as defect-prone; TN (true negative) is the number of non-defective samples correctly classified as non-defect-prone; and FN (false negative) is the number of defective samples incorrectly classified as non-defect-prone.
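Eq. (11) can be computed from the four confusion counts as in the sketch below, which assumes labels are encoded as 1 (defective) and 0 (clean); the function names are illustrative.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = defective / defect-prone, 0 = clean."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Eq. (11): (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)
```

Because Accuracy weights both classes by their sizes, a predictor that labels every pc1 module clean (77 of 1109 defective, Table 1) already scores 1032/1109, or about 0.931.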

$\displaystyle\textit{ACC}=\frac{\sum_{i=1}^{N}\delta(y_{i},\textit{map}(c_{i}))}{N}$ (12)

where

$\displaystyle\delta(y,c)=\left\{\begin{array}{ll}1,&\text{if}\ y=c\\ 0,&\text{otherwise}\end{array}\right.$

$N$ is the number of training instances, $y_{i}$ and $c_{i}$ are the true category label and the predicted cluster of $x_{i}$, and $\textit{map}(\cdot)$ is the optimal permutation function, found by the Hungarian algorithm [26], that maps each cluster label to a category label.
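Eq. (12) can be computed with SciPy's `scipy.optimize.linear_sum_assignment`, an implementation of the assignment problem solved by the Hungarian algorithm. The sketch builds a cluster-by-label contingency table and picks the one-to-one mapping that maximises the number of correctly mapped points; when there are more clusters than classes, the unmatched clusters simply count as errors. Function and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(y_true, clusters):
    """Eq. (12): accuracy under the best one-to-one cluster-to-label mapping."""
    y_true = np.asarray(y_true)
    clusters = np.asarray(clusters)
    labels = np.unique(y_true)
    cluster_ids = np.unique(clusters)
    # counts[i, j] = number of points in cluster i whose true label is labels[j]
    counts = np.array([[np.sum((clusters == c) & (y_true == t)) for t in labels]
                       for c in cluster_ids])
    # Maximise matched points by minimising the negated counts
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / len(y_true)
```

For true labels (0, 0, 1, 1, 1) and clusters (1, 1, 0, 0, 1), mapping cluster 1 to label 0 and cluster 0 to label 1 matches four of five points, so ACC = 0.8.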

Finally, to compare the supervised algorithms fairly, 100 repetitions of randomised external 10-fold cross-validation are used, and the final results are reported as mean $\pm$ standard deviation. For each WELM-based algorithm, a sigmoid function is used to compute the hidden-layer output matrix, and the two main parameters $L$ and $C$ are determined by grid search, where $L\in\{10,20,\ldots,200\}$ and $C\in\{2^{-4},2^{-2},\ldots,2^{20}\}$. For the unsupervised process, the average and best results are calculated for $\delta=$ 0.2, 0.3, 0.4 and 0.5 [16].
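The parameter grid and the randomised repeated 10-fold protocol can be written down as follows. The fold construction (plain, non-stratified random splits via `np.array_split`) is an assumption, since the paper does not say whether its folds were stratified; the names are illustrative.

```python
import numpy as np

L_GRID = list(range(10, 201, 10))              # L in {10, 20, ..., 200}
C_GRID = [2.0 ** e for e in range(-4, 21, 2)]  # C in {2^-4, 2^-2, ..., 2^20}

def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for randomised repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)       # fresh shuffle for every repetition
        for test_idx in np.array_split(order, n_folds):
            train_idx = np.setdiff1d(order, test_idx)
            yield train_idx, test_idx
```

The grid has 20 × 13 = 260 (L, C) pairs and the protocol yields 100 × 10 = 1000 train/test splits, so evaluating every pair under the full protocol would amount to 260,000 WELM fits per data set.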
3.3 Comparison of prediction performance

Tables 2 and 3 compare the performance of the two proposed algorithms with the baselines on the NASA data sets, and Table 4 shows the results of the defect priority analysis. The following conclusions can be drawn from these results.

Table 2
Accuracy (%) of WELM-SDP and other algorithms

ID	WELM-SDP	Ensemble	KNN	LR	DT
1	87.57 $\pm$ 6.75	83.03 $\pm$ 4.30	80.57 $\pm$ 3.16	78.78 $\pm$ 9.96	82.66 $\pm$ 4.11
2	81.67 $\pm$ 2.30	80.02 $\pm$ 1.17	79.94 $\pm$ 0.92	76.55 $\pm$ 16.95	75.57 $\pm$ 1.32
3	53.33 $\pm$ 0.14	50.33 $\pm$ 0.19	50.00 $\pm$ 0.29	40.00 $\pm$ 0.19	45.00 $\pm$ 0.25
4	85.67 $\pm$ 2.36	83.34 $\pm$ 1.37	83.28 $\pm$ 1.53	84.04 $\pm$ 1.94	83.24 $\pm$ 1.67
5	76.74 $\pm$ 9.10	72.04 $\pm$ 4.42	70.06 $\pm$ 3.75	75.00 $\pm$ 4.87	71.29 $\pm$ 3.72
6	76.98 $\pm$ 9.86	76.10 $\pm$ 6.41	74.68 $\pm$ 8.25	64.85 $\pm$ 8.61	72.58 $\pm$ 6.60
7	82.01 $\pm$ 0.22	81.52 $\pm$ 0.12	80.08 $\pm$ 0.28	80.33 $\pm$ 0.39	80.48 $\pm$ 0.45
8	85.70 $\pm$ 1.70	83.92 $\pm$ 1.42	82.46 $\pm$ 1.51	81.17 $\pm$ 1.75	80.03 $\pm$ 1.36
9	68.22 $\pm$ 2.07	69.86 $\pm$ 1.81	66.54 $\pm$ 1.96	63.79 $\pm$ 2.53	62.89 $\pm$ 2.33
10	71.28 $\pm$ 2.91	79.34 $\pm$ 1.60	76.32 $\pm$ 1.28	74.73 $\pm$ 2.05	78.02 $\pm$ 1.73

Table 3

ACC of WELMSC-SDP and spectral clustering

ID	Spectral ($\delta=$ 0.2)	WELMSC-SDP ($\delta=$ 0.2)	Spectral ($\delta=$ 0.3)	WELMSC-SDP ($\delta=$ 0.3)	Spectral ($\delta=$ 0.4)	WELMSC-SDP ($\delta=$ 0.4)	Spectral ($\delta=$ 0.5)	WELMSC-SDP ($\delta=$ 0.5)
1	0.50	0.62	0.63	0.69	0.61	0.66	0.50	0.67
2	0.74	0.86	0.74	0.87	0.74	0.85	0.74	0.87
3	0.46	0.52	0.47	0.56	0.48	0.61	0.47	0.59
4	0.41	0.73	0.41	0.56	0.41	0.64	0.41	0.60
5	0.32	0.65	0.38	0.42	0.37	0.58	0.37	0.58
6	0.43	0.48	0.40	0.46	0.48	0.50	0.55	0.52
7	0.62	0.83	0.45	0.59	0.62	0.64	0.62	0.85
8	0.56	0.74	0.56	0.77	0.56	0.63	0.56	0.73
9	0.52	0.71	0.53	0.71	0.53	0.71	0.52	0.70
10	0.72	0.79	0.73	0.81	0.73	0.80	0.73	0.79
11	0.64	0.87	0.49	0.83	0.48	0.91	0.64	0.93

Table 4

Priority of defective data

Defect priority level	kc1	kc2	kc3	mc1	mc2	mw1	pc1	pc2	pc3	pc4
Priority 1	16	1	1	1	39	5	39	16	5	1
Priority 2	36	24	5	5	44	1	44	36	1	5
Priority 3	92	71	10	10	70	22	70	92	22	10
Priority 4	197	113	27	27	232	170	232	197	170	27
Priority 5	388	206	115	115	647	39	647	388	39	115

As Table 2 shows, all five techniques achieve reasonable Accuracy for software defect prediction on the NASA data sets. The proposed WELM-SDP algorithm performs better than the other four algorithms on data sets 1 to 8. Since these data sets have relatively high class imbalance ratios (Table 1), this suggests that WELM-SDP copes well with class imbalance. On the remaining two data sets, which have lower class imbalance ratios, ensemble learning obtains the best results; WELM-SDP ranks second on data set 9 but trails all four baselines on data set 10. Overall, WELM-SDP achieves the best results on 8 of the 10 data sets, indicating that it not only achieves good prediction performance but also handles the imbalanced data problem in software defect data sets.

Table 3 reports the clustering accuracy of the proposed unsupervised process and of spectral clustering. WELMSC-SDP achieves a higher ACC than spectral clustering on every data set at every value of $\delta$ except data set 6 at $\delta=$ 0.5 (0.52 versus 0.55). This indicates that WELM can reduce the data dimension in the unsupervised prediction process while retaining the feature information useful for spectral clustering.

Table 4 lists the number of defective samples at each priority level for each data set. kc1 and pc2 have the same number of defects at every priority level; kc2 differs from all other data sets; kc3 is the same as mc1 and pc4; mc2 is the same as pc1 and pc5; and mw1 is the same as pc3. Based on these results, software developers can group data sets by their priority distributions and, combined with their development experience, fix defects in a more reasonable order. This indicates that, once prediction results are available, Euclidean distance can be used to measure similarity for prioritising software defective data.

3.4 Discussion about parameter K

Next, this paper examines the effect of the number of clusters $K$ on unsupervised prediction performance. As described in Section 2.5.1, the sum of squares of deviations (SSD) is used for this analysis. Figure 2 plots SSD against the number of clusters for the eleven data sets. For all data sets, SSD decreases only slightly and stays stable once the number of clusters approaches 5. Therefore, $K$ is set to 5.

Figure 2.

Analysis of $k$ selection on SSD values.

3.5 Discussion about training time

For comparison, this paper records the training time of 100 runs for WELM-SDP, Ensemble, KNN, LR and DT. According to Table 5, WELM-SDP is the fastest of all compared classifiers. Taking the data set mw1 (ID 1) as an example, Ensemble takes over six times as long as WELM-SDP (78.13 s versus 12.50 s), KNN seven times as long (87.50 s), LR about 164 times as long (2053.13 s), and DT nearly eight times as long (96.88 s). The other data sets likewise show that WELM-SDP saves considerable training time, which can help software developers carry out software maintenance quickly and effectively.
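The ratios quoted above can be re-derived directly from Table 5. This short Python snippet only divides each baseline's training time by that of WELM-SDP for every data set ID; it introduces no new measurements.

```python
# Training time (s) from Table 5: (WELM-SDP, Ensemble, KNN, LR, DT) per data set ID
TABLE5 = {
    1: (12.50, 78.13, 87.50, 2053.13, 96.88),
    2: (9.38, 87.50, 95.31, 16100.00, 109.38),
    3: (9.38, 75.00, 93.75, 1170.31, 100.00),
    4: (10.94, 96.88, 96.88, 6196.88, 106.25),
    5: (6.25, 84.38, 92.19, 1639.06, 98.44),
    6: (4.69, 73.44, 84.38, 1757.81, 60.94),
    7: (10.94, 79.69, 98.44, 2559.38, 89.06),
    8: (7.81, 95.31, 92.19, 4875.00, 100.00),
    9: (9.38, 98.44, 93.75, 7192.19, 90.63),
    10: (9.38, 104.69, 95.31, 9831.25, 85.94),
}
NAMES = ("Ensemble", "KNN", "LR", "DT")

def speedups(row):
    """Ratio of each baseline's training time to that of WELM-SDP."""
    welm, *others = row
    return {name: t / welm for name, t in zip(NAMES, others)}

for ds_id, row in TABLE5.items():
    print(ds_id, {name: round(r, 2) for name, r in speedups(row).items()})
```

For ID 1 this gives 6.25, 7.0, 164.25 and 7.75, and every ratio in the table exceeds 1, consistent with WELM-SDP being the fastest classifier on all ten data sets.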

Similarly, Table 6 reports the training time of the unsupervised methods. Because WELMSC-SDP additionally computes the eigenvectors and reduces the data dimension, it takes longer than spectral clustering, up to about 3.2 times as long in Table 6. Although it is slower than spectral clustering, its time cost is still acceptable in practice.

Table 5
Comparison of WELM-SDP and other learning algorithms on training time (s)

ID	WELM-SDP	Ensemble	KNN	LR	DT
1	12.50	78.13	87.50	2053.13	96.88
2	9.38	87.50	95.31	16100.00	109.38
3	9.38	75.00	93.75	1170.31	100.00
4	10.94	96.88	96.88	6196.88	106.25
5	6.25	84.38	92.19	1639.06	98.44
6	4.69	73.44	84.38	1757.81	60.94
7	10.94	79.69	98.44	2559.38	89.06
8	7.81	95.31	92.19	4875.00	100.00
9	9.38	98.44	93.75	7192.19	90.63
10	9.38	104.69	95.31	9831.25	85.94

Table 6
Comparison of WELMSC-SDP and spectral clustering (SC) on training time (s)

ID	SC ($\delta = 0.2$)	WELMSC-SDP ($\delta = 0.2$)	SC ($\delta = 0.3$)	WELMSC-SDP ($\delta = 0.3$)	SC ($\delta = 0.4$)	WELMSC-SDP ($\delta = 0.4$)	SC ($\delta = 0.5$)	WELMSC-SDP ($\delta = 0.5$)
1	6.24	16.85	5.79	13.10	5.12	14.54	6.48	12.62
2	123.10	323.59	124.26	245.07	132.44	244.67	144.16	265.25
3	2.82	5.87	3.14	5.96	3.60	7.00	3.11	8.36
4	189.69	443.90	183.94	323.35	184.30	324.73	154.98	324.37
5	17.40	27.69	14.70	37.90	24.70	25.28	16.22	36.29
6	4.29	9.06	3.02	8.48	3.91	7.66	3.79	8.46
7	63.58	112.34	62.03	101.42	71.18	102.60	64.04	133.70
8	32.73	58.37	27.58	55.12	24.25	47.85	26.70	55.29
9	34.68	109.95	40.44	70.95	37.54	79.91	35.72	72.34
10	69.15	185.20	64.76	136.14	62.08	133.84	64.25	127.83

4. Related work

Since the PROMISE repository [33] was created in 2005, researchers have used its defect prediction data sets to build comparable models. A large body of research has since been devoted to metrics describing code modules and to learning algorithms for building SDP models.

Software defect prediction can be seen as a binary classification problem in machine learning. Each sample carries one of two labels: defective or non-defective. A model learned from the data then predicts whether new input samples contain defects. Hence, a variety of machine learning methods have been proposed and compared for SDP problems. Supervised and unsupervised learning are the two main machine learning techniques that have been used to build learning models for software defects [4, 5, 6, 7]; the main difference between them is whether the data sets are labelled. Compared with traditional manual work, SDP models can reduce the time and improve the accuracy of defect prediction. However, no single method has been found to be the best, owing to differences in software types, algorithm settings, and the performance criteria used to assess the models. Overall, Random Forest appears to be a good choice for large data sets and Naïve Bayes performs well for small data sets. Yan et al. proposed automated change-prone class prediction using an unsupervised method that is more suitable for cross-project prediction [19]. Gray et al. [7] discussed just-in-time software defect prediction on practical code changes. Although all the above approaches can implement SDP, they ignore the effect of class imbalance.

In addition, ensemble algorithms and their cost-sensitive variants have been studied and shown to be effective when a proper cost ratio is set [29]. Lu et al. [31] handled class imbalance by changing the data distribution with an adaptive ensemble undersampling-boost framework. Issam et al. [17] performed software defect prediction using ensemble learning on features chosen by greedy forward selection.

In summary, current SDP studies cover feature selection on SDP data sets, data sampling techniques for class imbalance in SDP, and ensemble algorithm design for SDP. This study mainly focuses on the imbalanced SDP problem and introduces an improved classifier, WELM, with relative density measurement and fuzzy sets. Unlike Euclidean distance-based measures, this weighting is insensitive to the scale of the data distribution in the feature space, which makes it more robust. Compared with widely used SDP classifiers such as Random Forest and Naïve Bayes, WELM also has better generalisation ability. Moreover, this single classifier is also compared with representative ensemble learning algorithms for imbalanced SDP, such as DNC.
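For readers unfamiliar with WELM, here is a minimal NumPy sketch of a weighted ELM in the form given by Zong et al. [32]: a random, untrained sigmoid hidden layer $H$, a diagonal weight matrix $W$, and closed-form output weights $\beta = (I/C + H^\top W H)^{-1} H^\top W T$. It assumes the W1 scheme ($W_{ii} = 1/\#(t_i)$, the inverse size of sample $i$'s class), one-vs-all targets of $\pm 1$, and the formula variant for more samples than hidden nodes. The relative-density and fuzzy-set weighting used by WELM-SDP is not described in this section and is not reproduced; `n_hidden`, `C` and the synthetic data are illustrative choices.

```python
import numpy as np


class WeightedELM:
    """Weighted ELM sketch (after Zong, Huang and Chen [32]) with W1 weights."""

    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden, self.C = n_hidden, C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random, untrained sigmoid hidden layer: H = g(XA + b)
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)  # one-vs-all targets
        counts = {c: np.sum(y == c) for c in self.classes_}
        w = np.array([1.0 / counts[c] for c in y])                     # W1: W_ii = 1/#(t_i)
        self.A = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        HtW = H.T * w                      # H^T W without forming the N x N matrix W
        lhs = np.eye(self.n_hidden) / self.C + HtW @ H
        self.beta = np.linalg.solve(lhs, HtW @ T)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


# Imbalanced synthetic data: 200 "non-defective" vs 20 "defective" modules.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(4.0, 0.5, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
model = WeightedELM(n_hidden=50, C=2.0 ** 10).fit(X, y)
pred = model.predict(X)
print("minority recall:", (pred[y == 1] == 1).mean())
```

Training reduces to a single regularised linear solve over the hidden-layer outputs, which is consistent with (though not by itself proof of) the short WELM-SDP training times reported in Table 5.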

5. Conclusion and future work

This paper applied the WELM technique to software defect prediction and presented an SDP framework named WSAU-ELM. First, compared with four existing learning techniques, WELM-SDP not only has remarkable training efficiency, saving time and resources, but also achieves the best classification accuracy on eight of the ten data sets, with ensemble learning the second-best method. Although ensemble techniques combine many single classifiers and are generally recognised as a better choice for classification, the proposed single WELM-SDP classifier remains competitive on the benchmark data sets.

Then, with respect to existing unsupervised software defect prediction, the experimental results also show that the WELMSC-SDP model achieves better accuracy than the popular spectral clustering method, which indicates that data dimensionality affects the clustering result and that WELM can be incorporated into the unsupervised process. For both prediction models, the results show that WELM can not only satisfy the requirements of supervised and unsupervised software defect prediction, but also outperform common learning algorithms. In addition, the priority of defective data has been discussed to assist software developers during software maintenance.

However, this paper only used sample open-source project data to evaluate the proposed approach and considered only within-project prediction. Practical software systems are more complex. Therefore, a challenge for future work is to develop more intelligent methods or learning algorithms for complex defect prediction in practical software systems.

Acknowledgments

This work was supported in part by the Scientific Research Foundation for the introduction of talent of Jiangsu University of Science and Technology, China; Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 18JKB520011); Primary Research & Development Plan (Social Development) of Zhenjiang City, China (Grant No. SH2019021); Natural Science Foundation of Jiangsu Province, China (Grant No. BK20191457).

Authors’ Bios

	Jinjing Gai is a master's candidate at the School of Computer Science, Jiangsu University of Science and Technology, China. Her research focuses on software data mining.
	Shang Zheng received his B.Sc., M.Sc., and Ph.D. from Northeast Normal University and Jilin University in China and De Montfort University in the UK, respectively. He is now a Lecturer at the School of Computer Science, Jiangsu University of Science and Technology, China. His research interests include software data analysis and testing, reliable software evolution, and software data analytics. To date, he has published over ten papers in international journals and conferences.
	Hualong Yu received the B.Sc. degree in computer science from Heilongjiang University, Harbin, China, in 2005, and the M.Sc. and Ph.D. degrees in computer science from Harbin Engineering University, Harbin, China, in 2008 and 2010, respectively. His research interests include machine learning, data mining and bioinformatics. He is a reviewer for more than 20 high-quality international journals and a member of the organizing committees of several international conferences. He is also a member of ACM, the China Computer Federation and the Youth Committee of the Chinese Association of Automation.
	Hongji Yang received the B.Sc. and M.Sc. degrees in computer science from Jilin University, China, in 1982 and 1985, respectively, and the Ph.D. degree in computer science from Durham University, UK, in 1994. He joined the faculty of Jilin University, China, in 1985, De Montfort University, UK, in 1993, and Bath Spa University, UK, in 2013. He is currently a professor in the School of Informatics, University of Leicester, UK. He has published well over 400 refereed journal and conference papers. His research interests include software engineering, creative computing, and web and distributed computing. He has been an IEEE Computer Society Golden Core member since 2010 and is the Editor-in-Chief of the International Journal of Creative Computing.

References

1. Schein A.I. and Ungar L.H., Active learning for logistic regression: an evaluation, Machine Learning 68(3) (2007), 235–265.

2. Kaur, Sandhu P.S. and Bra A.S., Early software fault prediction using real time defect data, International Conference on Machine Vision, Dubai, United Arab Emirates, IEEE, Dec 2009, pp. 242–245.

3. Krogh and Vedelsby, Neural network ensembles, cross validation, and active learning, Advances in Neural Information Processing Systems 7(10) (1995), 231–238.

4. A.Y., Jordan M.I. and Weiss, On spectral clustering: analysis and an algorithm, Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, Vancouver, British Columbia, Canada, MIT Press, Dec 2001, pp. 849–856.

5. Seiffert, Khoshgoftaar T.M. and Hulse J.V., Improving software-quality predictions with data sampling and boosting, IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 39(6) (2009), 1283–1294.

6. Lewis D.D. and Gale A.W., A sequential algorithm for training text classifiers, International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, Springer, July 1994, pp. 3–12.

7. Gray, Bowes, Davey, Sun and Christianson, The misuse of the NASA metrics data program data sets for automated software defect prediction, 15th Annual Conference on Evaluation & Assessment in Software Engineering, Durham, UK, IET, Nov 2011, pp. 96–103.

8. Carrizosa, Mladenović and Todosijević, Variable neighborhood search for minimum sum-of-squares clustering on networks, European Journal of Operational Research 230(2) (2013), 356–363.

9. Murtagh and Contreras, Algorithms for hierarchical clustering: an overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(1) (2012), 86–97.

10. Huang G.B., Zhu Q.Y. and Siew C.K., Extreme learning machine: theory and applications, Neurocomputing 70(1–3) (2006), 489–501.

11. Feng G.C., Lin Z.H. and Yu, Existence of an interior pathway to a Karush–Kuhn–Tucker point of a nonconvex programming problem, Nonlinear Analysis 32(6) (1998), 761–768.

12. Huang G.B., Song and You, Trends in extreme learning machines: a review, Neural Networks 61 (2015), 32–48.

13. Jiang, Zhang, Ren and Lo, A more accurate model for finding tutorial segments explaining APIs, IEEE International Conference on Software Analysis, Suita, Japan, IEEE, March 2016, pp. 157–167.

14. Dhillon, Guan and Kulis, Kernel k-means, spectral clustering and normalized cuts, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, ACM, August 2004, pp. 551–556.

15. Shi and Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000), 888–905.

16. Yang and Qian, Defect prediction on unlabeled datasets by using unsupervised clustering, IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Sydney, NSW, Australia, IEEE, Dec 2016, pp. 465–472.

17. Issam, Alshayeb and Ghouti, Software defect prediction using ensemble learning on selected features, Information and Software Technology 58 (2015), 388–402.

18. Lindenbaum, Markovitch and Rusakov, Selective sampling for nearest neighbour classifiers, Machine Learning 54(2) (2004), 125–152.

19. Yan, Zhang X.H., Liu, Ling, Yang M.N. and Yang, Automated change-prone class prediction on unlabelled dataset using unsupervised method, Information and Software Technology 92 (2017), 1–16.

20. P.L. and Chung J.Y., A new decision-tree classification algorithm for machine learning, Proceedings of the Fourth International Conference on Tools with Artificial Intelligence (TAI ’92), Arlington, VA, USA, IEEE, Nov 1992, pp. 370–377.

21. Bibi, Tsoumakas, Stamelos and Vlahavas, Software defect prediction using regression via classification, IEEE/ACS International Conference on Computer Systems and Applications, Dubai/Sharjah, UAE, IEEE, March 2006, pp. 330–336.

22. Liao S.H., Chu P.H. and Hsiao P.Y., Data mining techniques and applications – a decade review from 2000 to 2011, Expert Systems with Applications 39(12) (2012), 11303–11311.

23. Borgatti S.P. and Everett M.G., Models of core/periphery structures, Social Networks 21(4) (2000), 375–395.

24. Rathore S.S. and Kumar, A decision tree regression based approach for the number of software faults prediction, ACM SIGSOFT Software Engineering Notes 41(1) (2016), 1–6.

25. Tong and Koller, Support vector machine active learning with applications to text classification, Journal of Machine Learning Research 2(1) (2002), 999–1006.

26. Yang, Jian, Ding, Zha and Giles C.L., IKNN: Informative K-Nearest Neighbor Pattern Classification, The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, Springer, Sep 2007, pp. 248–264.

27. Dietterich T.G., Ensemble methods in machine learning, International Workshop on Multiple Classifier Systems, Cagliari, Italy, Springer, June 2000, pp. 1–15.

28. Menzies and Marcus, Automated severity assessment of software defect reports, IEEE International Conference on Software Maintenance, Beijing, China, IEEE, Oct 2008, pp. 92–97.

29. Khoshgoftaar T.M., Geleyn, Nguyen and Bullard, Cost-sensitive boosting in software quality modelling, 7th IEEE International Symposium on High Assurance Systems Engineering, Tokyo, Japan, IEEE, Jan 2003, pp. 51–60.

30. Richard, Combinatorial optimization: algorithms and complexity (Christos H. Papadimitriou and Kenneth Steiglitz), SIAM Review 25(3) (1983), 424–425.

31. and Chu J.H., Adaptive Ensemble Undersampling-Boost: a novel learning framework for imbalanced data, Journal of Systems and Software 132 (2017), 272–282.

32. Zong, Huang G.B. and Chen, Weighted extreme learning machine for imbalance learning, Neurocomputing 101 (2013), 229–242.

33. Shirai, Nichols and Kasunic, Initial evaluation of data quality in a TSP software engineering project data repository, Proceedings of the 2014 International Conference on Software and System Process, Nanjing, China, ACM, May 2014, pp. 25–29.
