Abstract
High dimensional data have brobdingnagian number of features, but not all features are useful. Irrelevant and redundant features may even reduce the classification accuracy. Feature selection is a process of selecting a subset of relevant features to decrease the dimensionality of data. When applied on high dimensional datasets (Big Data) the feature selection methods perceives many challenges and it is pertinent to come up with the new methods or revamp the existing methods. In this study, a new method ‘Newtonian particle swarm optimization (NPSO)’ has been proposed. In the proposed method Newton’s second law of motion has been used to update the learning mechanism of PSO. In NPSO, particle not only learn from the position but also from the mass and acceleration of neighboring particles. The proposed method is mathematically validated at equilibrium using eigen values. Further, the proposed method has been applied on high dimensional microarray gene expression dataset. The NPSO is also compared with other state of art feature selection methods. Selected features, classification accuracy and dimension reduction are used to appraise the goodness of the proposed method. Mathematical validation and experimental results clearly validates the merits of the proposed method in field of feature selection. This paper show the classwise analysis of SRBCT, Brain1, 11-Tumor and 14-Tumor datasets. When number of classes increased dimension reduction is increased but classification accuracy of dataset is decreased.
Keywords
Introduction
High dimensional data means number of feature is high as compared to the number of samples. In real world situations, relevant features are often unknown a priori [1]. A relevant feature is neither irrelevant nor redundant to target concept. However, irrelevant and redundant features are not useful for classification and they may even reduce the classification performance due to the large search space known as the curse of dimensionality [2]. Feature selection can address this problem by selecting only the relevant feature and reducing the irrelevant features. Feature selection is a difficult task because complex interaction among features. An individual relevant feature may become redundant when working together with other features [2]. Features selection task is challenging because of the large search space. It is reported in literature that in case of high dimensional data nature inspired methods are giving promising results [3]. Nature inspired techniques are well known for their global search ability. Particle swarm optimization (PSO) is a nature inspired computing method which is based on swarm intelligence [4]. PSO is computationally less expensive because of its simple calculation.
Despite of many advantages and benefits, PSO has some bottlenecks which causes the local optimization. Several variants of PSO have been reported in literature to solve the drawbacks of PSO. Therefore, a brief survey of PSO for dimensionally reduction has been conducted and it is concluded from survey that velocity update equation of PSO need some reconsideration. Hence, in this paper a novel Newtonian PSO (NPSO) method has been proposed which works on the Newton’s law of motion. In NPSO, velocity update equation is rewritten in terms of mass, position and time step of local and global best particles. NPSO has been mathematically validated using equilibrium concept of the dynamic system which depends on the eigen values. The proposed method is applied on the gene expression microarray cancer dataset to select the most informative genes. A comparative study is also done to validate the goodness of the proposed method.
This paper is organized as follows, Section 2 presents the basic PSO and a survey of PSO variants. Section 3 provides the problem definition and proposes the novel NPSO method. Section 5 covers the analysis of the proposed algorithm. Section 6 deals with the simulation part of this paper. Section 7 shown the classwise analysis. Finally, Section 8 concluded the entire work.
Particle swarm optimization
Particle swam optimization(PSO) is a nature inspired methods that is computationally less expensive and faster than other methods [4]. PSO start with population of particles. Particle moves in search space to search the optimal solution by updating each particle based on its own experience and experience of its neighboring particles. Each particle is depicted using position and velocity vectors. Position vector of i
th
is elucidated as X
i
= (xi1, xi2, . . . , x
id
) where, d is the dimensionality of search space. The velocity of i
th
particle is defined as V
i
= (vi1, vi2, . . . , v
id
). There are two terms namely; pbest and gbest which represent the personal and global learning receptively. pbest(pb
i
) is the personal best position obtained by any particle(i
th
) which is defined as pb
i
= (pbi1, pbi2, . . . , pb
id
). gbest (gb) is global best position obtained by swarm which is defined as gb = (gb1, gb2, . . . , gb
d
). There are two equations to update position and velocity of each particle which are given as Equations (1) and (2).
In 1998, inertia weight(ω) has been added to the velocity update Equation (2) and modified equation is given as Equation (3).
This section describes different variants of PSO implemented in various field of research.
Binary PSO(BPSO)
BPSO [5], is a binary version of PSO to solve discrete problems. The position of each particle is encoded by binary form. The values of x
id
, pb (pbest) and gb (gbest) in the form of 0 and 1. In BPSO, velocity is transformed using sigmoid function according to Equation (4). The position vector is updated according to Equation (5).
Catfish BPSO [6], is developed to avoid the trapping in the local optimum solution. Discrete PSO is applied to discrete and combinatorial optimization problems [7]. Discrete PSO has a high success rate in solving integer programming problems as compare with other methods. It has a quick convergence and better performance.
Neighborhood search PSO, utilizes one local and two global neighborhoods search strategies. It includes two operations, first it generates the three trial particles corresponding to each particle in swarm using the local and global search strategies, then the best one among the three trial particles and the current particle is selected.
Guaranteed convergence PSO has an additional particle to search the region around the current best position [8]. Neighborhoods GCPSO model the structure of social networks [9]. To implement the neighborhoods in the standard PSO velocity update equation of basic PSO is replaced with Equation (6).
In QBPSO [10], implicit space decomposition is adopted, according to this method the whole swarm is divided into several sub swarms. Particles in each subswarm will search in different areas of whole search space, therefore, maintains the diversity of the search. Agarwal & Ranjan [11–13], have also proposed the quantum behaved ternary particle swarm optimization on the map reduce platform. Li et al. [14], have also developed a map reduced quantum based PSO for complex optimization problem.
Fuzzy logic based PSO
In fuzzy PSO [15], position of particle is defined using a matrix of membership values to solve the travel salesman problem. Fuzzy K-mean PSO [16], fuzzy adaptive turbulent PSO [17], fuzzy adaptive catfish PSO [18], and PSO with fuzzy controller [19], are some of the fuzzy logic and PSO hybrid methods applied on different fields of research. In fuzzy rule binary PSO (FRBPSO), the strength of fuzzy logic is used for the decision making of feature selection in PSO [20]. In FRBPSO, position bit of particle is updated using fuzzy inference system.
Multi objective PSO
In multiobjective optimization (MO) problems, multiple objective functions are optimized independently from each other and the best solution for each function is obtained separately which in turn results into a group of alternative solutions [21]. This group of alternative solutions is called Pareto optimal solution set. Non-dominated sorting PSO (NSPSO) is used for better multi objective optimization [22].
Accelerated particle swarm optimization
The basic PSO is based on local and global best experience. Individual best is used to solve the non linear problem and to increase the diversity of the search. However, diversity can be pushed using some randomness. Therefore, in accelerate PSO (APSO) [23] only global best is used.
Some other PSO variants
Some other PSO variants
Some other PSO variants
A brief summary of merits and demerits of basic variants of PSO
There are many other methods apart from PSO which are very effectively used for feature selection. Zang et al [34] have developed the fuzzy rough set-based information entropy for feature selection in a heterogeneous dataset.
Li et al. [35] has proposed a knowledge reduction framework for incomplete decision contexts by constructing a discernibility matrix and its associated Boolean function.
Cai et al. [36] have computed the type-1 and type-2 characteristic matrices for calculating the second and sixth lower and upper approximations in dynamic covering approximation spaces.
Problem description
Let X is a dataset defined as X = {x1, x2, . . . , x N }, where N is the number of samples. The i th sample of the dataset is denoted as x i ϵ R D , where; D is the dimensionality of the dataset. The associated class labels of dataset are y i ϵ {1, 2, . . . , c}, where; c is the numbers of classes.
The objective of the proposed method is to select the most relevant features with highest fitness in terms of classification. To achieve this goal a Newton’s second law based PSO method called Newtonian particle swarm optimization (NPSO) has been proposed. The aim of the proposed method is to improve the PSO based search to select the relevant feature which in turn optimize the k-nearest neighbor (k-NN) based classification model. It is a wrapper feature selection method in which feature selection is done using NPSO and testing validation is done using k-NN based classifier.
Problem in existing variants of PSO
PSO standard equation for velocity update Equation (3) has three components. First is old velocity, second and third are learning from itself and neighbor particle respectively. The second and third components are the main ingredients which are contributing in the change of velocity therefore, Equation (3) is rewritten as Equation (7).
On the other hand according to law of motion v = u + at where v, u and at are new velocity, old velocity and change in velocity due to acceleration. If Equation (3) is compared with this equation then it is observed that Equation (3) does not follow the law of motion concept because it is not considering the acceleration of each particles while calculating the new velocity. It could be confidently stated that there should some acceleration because velocity is not constant and it is changing in every iteration.
Therefore to incorporate the time and acceleration in the PSO equation NPSO is proposed to enhance the strength of PSO by utilizing the Newton’s law of motion.
Newton’s second law states that the acceleration of an object depends upon two variables the net force (F) acting upon the object and the mass (m) of the object. The acceleration of an object depends directly upon the net force acting upon the object, and inversely upon the mass of the object. As the force acting upon an object is increased, the acceleration of the object is also increased as shown in Equation (8).
Put the value of velocity from Equation (13) in Eq (7), and a new velocity update Equation (14) in terms of force (F) and duration of the period of time (t) and mass (m
i
) for i
th
particle for d
th
dimension is given as Equation (14).
g
timestep
is defined as the time difference between current iteration number and the iteration number of obtaining the previous global best. g
timestep
= (current _ iteration – iteration _ of _ previous _ gb).
The complete algorithm of the proposed NPSO is shown in algorithm (1). It shows, step by step process of NPSO algorithm. The velocity equation of basic PSO is depend on personal best and global best of particles. But in NPSO velocity update equation depends on personal best position, mass, p timestep , global best position, global best mass and g timestep . Algorithm (1) starts with initialization of population of agents (particles) which are uniformly distributed, then it evaluates each particles position using LOOCV-kNN(Leave-one-out cross validation using k-NN classifier). If particles current position is better than its previous position, update it. Determine the global best particle then update particles velocities and move particles to new positions.
Summary of experimental gene expression datasets
Summary of experimental gene expression datasets
This section covers the two main objectives of equilibrium and convergence analysis. First objective is to check the convergence point of NPSO and second objective is to find the time complexity of the algorithm. Mathematical approaches of Trelea [37, 38], has been used for analysis of algorithm.
Equilibrium and convergence analysis
Let’s consider NPSO velocity update equation as one dimension since each dimension is updated in NPSO independently from other dimension. Velocity update Equation (22) of NPSO rewritten in generalized one dimensional form Equation (23). For deterministic analysis, parameters are set to b1 = b2 = b, r1 = r2 = r, pb = gb = p, pb
mass
=gb
mass
=m
i
=m, p
timestep
=g
timestep
=1.
Initialize the mass (m), ptimestep, gtimestep, inertia weight (w) and number of particles(p)
Calculate fitness of particles using LOOCV-kNN
pbi = xi
Calculate ptimestep
pbmass = mi
No change in pb, ptimestep and mi
gb = pbi
Calculate gtimestep
gb mass = mi
No change in gb, gtimestep and mi
/* Update the particle*/
Now, calculate the eigen values of the system.
Dynamics theory says that equilibrium of system with respect to time depends on the eigenvalues of the dynamic values. Both eigen values, {0, ω} where, ω =0.5 are less than one. Therefore, it is stated that system is moving to the stable equilibrium point.
Time complexity of the proposed methods is mainly depends on the time given to compute the new velocity and position of each dimension of each particle. Let us consider, I is the total number of iterations, D is initial dimension and P is the total number of particles. So the time complexity of the proposed methods is O(IDP). In algorithm (1), while loop is terminate when maximum iteration(I) is reached. For loop is terminate when loop reach to total no of partitions(P) and another for loop is depend on the dimension(D). Therefore, total complexity of the proposed methods is the multiplication of I, P and D.
Space complexity
Space complexity of proposed methods is O(DP), where, D is initial dimension and P is the total number of particles. Since this algorithms only store a matrix with the entire particle population which is called cost matrix.
Experiment and results
Datasets
The performance of the proposed NPSO is tested on different microarray gene expression dataset. Table 3 show some different type of experimental datasets. SRBCT is characterized by the large features information. It consist of small round cells that on staining appears blue. Frequency of these tumors is high in children as compare to adults. It has 83 samples and 2308 number of features. The SRBCT dataset has four subcategories of small round blue cell tumors. These four classes are Ewing’s sarcoma (EWS: 29 samples), Burkitt’s lymphoma (BL: 11 samples), neuroblastoma (NB: 18 samples) and rhabdomyosarcoma (RMS: 25 samples). DLBCL has 77 samples and 5469 number of features. It has two subcategories of diffuse large B-cell lymphoma. These two classes are (DLBCL:58 samples) and Follicular lymphoma (FL: 19 samples). Brain1-Tumor has 90 samples and 5920 number of features. Brain2-Tumor has total 50 samples and 10367 number of features.
Leukemia1 has 72 samples and 5327 number of features. Leukemia2 has 72 samples and 11225 number of features. Prostate Tumor has 102 samples and 10509 number of features. It has two classes which are normal tissue (normal: 50 samples) and prostate tumor (tumor: 52 samples). All the datasets have been downloaded from http://www.gems-system.org/.
Experimental setup
Experiment 1
This experiment has been conducted with the hope that NPSO selects the highly informative features from all datasets. To this end, NPSO is applied on SRBCT dataset, DLBCL dataset, Brain1-Tumor dataset, Brain2-Tumor dataset, Leukemia1 dataset, Leukemia2 dataset and Prostate Tumor dataset. NPSO is heuristic search method therefore, NPSO is applied multiple (five times) times on each datasets.
Experiment 2
NPSO is also compared with other state-of-art methods. The classification accuracy acquired from all the features (kNN-without feature subset selection) is also obtained to demonstrate the benefit of dimensionality reduction. The proposed method is compared with kNN [43], IBPSO [44], correlation coefficient [45], evolutionary computing [46], genetic algorithm [47], relieff [48], information gain [49], MIBPSO, FRBPSO and EVPSO.
Correlation coefficient, relieff and information gain are filter feature selection methods. Filter feature selection methods just ranks the features according to their performance. Therefore, top ranked feature are selected and given to the kNN to find the classification accuracy. The number of selected features from ranked features is kept equal to the number of features selected by NPSO-kNN. This experimental configuration enables a simple comparison and allows to investigate the discriminative power given by the selected subset of features from individual filter feature selection method. The evolutionary computing and genetic algorithm methods are performed using Weka tool. Weka has (μ, λ) evolutionary algorithm with random initialization, binary tournament selection operator, single point crossover operator, bit flip mutation and generational replacement.
In this experiment IBPSO, MIBPSO, FRBPSO, EVPSO, NPSO are wrapper class of feature selection methods in which kNN is used as classifier to calculate the classification accuracy. The feature selected by the NPSO is given to the SVM, to check the strength of selected feature using SVM classifier. All the simulations has been conducted on Matlab 7.11.1.866 R2010b, Licence No. 691568. For filter feature selection methods like (information gain, correlation coefficient and relieff) Weka 3.7.13 has been used. In filter feature selection the number of selected features are kept same as number of feature selected by the NPSO, to make the comparison simple. It allows to investigate the informative strength of given number of features. Maximum iteration used in NPSO is 150 and number of particles in the swarm are set to 40. Cognitive and social parameters are given the value of 2. Inertia weight is taken as ω = 0.5. The complete parameter setting used here, is already optimized in the literature by Shi and Eberhart [50], and also used by Chung et al. [44], for PSO based feature selection. In all the wrapper feature selection methods (IBPSO, MIBPSO, FRBPSO, and EVPSO) same parameter set is used along with leave one out cross validation.
Simulation results
Experiment 1
NPSO is a heuristic nature inspired search method. Therefore, it has been applied five times on all datasets. Classification accuracy of the selected features with leave one out cross validation (LOOCV) using kNN is reported in the Table 4. Table 5 shows the result of classification accuracy obatined using SVM with features which are selected by NPSO.
NPSO-kNN is the proposed NPSO which has leave one out cross validation (LOOCV) using kNN and NPSO-SVM means selected features from NPSO is given to the SVM. Therefore, number of selected features in both the table are same.
Classification accuracy obtained after feature selection from microarray datasets using NPSO-kNN
Classification accuracy obtained after feature selection from microarray datasets using NPSO-kNN
Classification accuracy obtained after feature selection from microarray datasets using NPSO-SVM
Table 6, shows the results of comparative study which shows the revealing performance of NPSO.
The reported results reveal that NPSO in combination with kNN and SVM respectively perform equally as well, or even better compared to other previously available feature selection methods on the same datasets. Table 6, reveals the blessings of feature selection with increased classification accuracy as compare to classification with all features (classification accuracy 91.57 with kNN). The highest classification accuracy with minimum features is shown in bold in Table 6. Since correlation coefficient, information gain and relieff the filter feature selection method. Therefore, number of selected features obtained from these methods are kept same as the number of features obtained using NPSO. This allows to investigate the classification benefit given by the limited number of the features. NPSO shows the highest reduction in dimension (average dimension reduction 98.59%) and highest information (average classification accuracy 98.56%) conveyed by the selected features.
Average classification accuracy, number of selected features, reduction in dimension using different feature selection methods and NPSO
Average classification accuracy, number of selected features, reduction in dimension using different feature selection methods and NPSO
This section shows a classwise analysis of datasets through proposed method. Classwise analysis has been performed on SRBCT, Brain1, 11-tumors and 14-tumor dataset. The information of SRBCT and Brain1 datasets is available in 3. Information of 11-tumors and 14-tumor datasets are shown in Table 7 with number of classes, number of features and number of samples. The main objective of classwise analysis to check the performance of NPSO with increasing numbers of classes. To perform this analysis many different datasets with different number classes are generated from each dataset. For example SRBCT has four classes hence, three new datasets are generated from SRBCT dataset. In two class SRBCT dataset only two classes are retained, resulting in 54 samples out of 83 samples, similarly in three class SRBCT dataset only three classes are retained which in turn results in 65 samples out of 83 samples. Finally all 83 samples are considered to perform analysis for four classes. Similarly all the datasets are fragmented to generate different set of multi class datasets.
Summary of classwise analysis datasets
Summary of classwise analysis datasets
Table 8 shows the classwise average classification accuracy of NPSO on SRBCT. The average classification of NPSO on two class dataset is 100% and average dimension reduction is 89.60%. Dimension reduction means (Total number of features-Selected features)/(Total Number of features). The average classification accuracy of three class SRBCT data is 100% and dimension reduction is 94.34. In four class (complete SRBCT dataset) dataset average classification accuracy is 98.09% and dimension reduction is 96.44.
Average classwise classification accuracy obtained using NPSO on SRBCT dataset
Table 9, shows the average classwise classification accuracy of Brain1-Tumor dataset. Brain1-Tumor have 70 samples in two class, 80 samples in three class, 84 samples in four class, 90 samples in five class. Total number of features in every class is 5921. Table 9 shows when number of classes increases classification accuracy decreases, but dimension reduction increases.
Average classwise classification accuracy obtained using NPSO on Brain1-Tumor dataset
Table 10, shows the classwise analysis of 11-Tumor dataset. Table 10, has ten datasets generated from 11-Tumor dataset. Table 10, also shows that two class and three class datasets have 100% classification accuracy with considerable dimension reduction. As the number of classes are increased in 11-Tumor dataset, performance of NPSO in turn of classification accuracy decreases but considerable dimension reduction is achieved.
Average classwise classification accuracy obtained using NPSO on 11-Tumor dataset
Table 11, shows the classwise analysis of 14-Tumors dataset. 14-tumor has twenty six different classes of tumor hence twenty five different datasets are generated from 14-tumor dataset for classwise analysis. Upto three classes NPSO is performed with 100% classification accuracy. When number of classes are increased dimension reduction is increased.
The NPSO is proposed with the aim of dimension reduction. From classwise analysis it is clear that NPSO is able to reduce dimension of very complex datasets like 14-tumor datasets which has 26 classes and 15010 features. This analysis clearly reveals the merits of NPSO for dimension reduction of high dimensional datasets.
Average classwise classification accuracy obtained using NPSO on 14-Tumor dataset
The proposed novel variant of PSO for feature selection (Newtonian PSO) incorporates the Newton’s law of motion in the PSO. Therefore, velocity equation is rewritten considering the particle’s own and neighboring particle’s mass and acceleration. After incorporating these factors PSO velocity update equation completely satisfies all the aspects of the Newton’s law of motion. The proposed method is mathematically validated and also applied on the gene expression microaaray datasets for gene selection. Mathematical validation show that the proposed method is converging to the equilibrium point. Experimental results show that the NPSO achieves both accurate classification and sufficient dimensionality reduction. This shows that, NPSO preferentially selects the most informative features which in turn, boosting the performance of classifier. From classwise analysis it is clear that NPSO is able to achieve very good dimension reduction with considerable classification accuracy from high dimensional complex datasets.
