Leveraging local receptive fields based random weights networks for hyperspectral image classification

Abstract

This paper proposes a hyperspectral image (HSI) classification approach using random weights networks (RWN) based on local receptive fields (LRF). Because spectral features are locally correlated, introducing LRF is a promising way to improve HSI classification performance. To the best of our knowledge, this is the first application of an LRF-based RWN structure to HSI classification. The proposed classification framework consists of four layers: an input layer, a convolution layer, a pooling layer, and an output layer. The convolution and pooling layers perform feature extraction, and the output layer serves as the classifier. Experimental results on two real hyperspectral image datasets confirm the effectiveness of the proposed HSI classification method.

Keywords

Hyperspectral image classification, random weights networks, local receptive field

1 Introduction

Urban land-use and land-cover classification is a key technique in remote sensing applications. Its main idea is to assign a land-cover class label to each pixel of a remote sensing image. In recent years, land-cover classification has been widely applied in ground mapping, urban planning, mineral resource exploration, environmental monitoring, and many other fields.

Hyperspectral remote sensing technology emerged in the 1980s and has become a frontier technology in remote sensing. A hyperspectral image (HSI) is acquired by an imaging spectrometer covering a wide wavelength range. It usually contains tens or hundreds of bands, each of which is an image at a specific wavelength. Thus, an HSI contains both spatial (image) information and spectral information. Since spectral information can significantly improve the performance of land-cover classification, HSI classification has received growing attention from the remote sensing community [1, 2]. However, the large number of bands and the relatively small number of training samples pose challenges to conventional remote sensing classification methods.

As a fast learning algorithm for single-hidden layer feedforward neural networks (SLFNs), the random weights networks (RWN) model was first proposed by Schmidt et al. [3], and it has been regarded as a promising learning algorithm for pattern classification. A similar idea came from Pao's group, who termed such randomized learning models random vector functional-link (RVFL) nets [4]; further investigations of RVFL nets were given in [5, 6]. Compared to traditional SLFN learning algorithms, RWN has the following advantages: (1) high efficiency, (2) easy implementation, (3) a unified treatment of classification and regression, and (4) a unified treatment of binary and multi-class classification. Many works have demonstrated the capabilities of RWN in classification and regression tasks [7–13].

In the past few years, the local receptive field (LRF) has been widely applied in neural learning models [22–25]. With LRFs, neurons can extract low-level visual features, which are then combined into higher-level features in subsequent layers. By considering the local correlations of spectral features, it is possible to further improve the performance of HSI classification. Thus, in this paper, we propose a novel HSI classification scheme that introduces the LRF concept into the RWN model, called LRF-based random weights networks (RWN-LRF). The main contributions of our work are summarized as follows.

A new RWN structure using LRFs is proposed for hyperspectral image classification. To the best of our knowledge, this is the first application of an LRF-based RWN structure to HSI classification problems.

Contextual information of the HSI data is incorporated into the LRF-based RWN to further improve classification performance.

Experiments on two real HSI datasets (the Pavia University dataset and the Salinas dataset) demonstrate that the proposed approach outperforms the ordinary RWN and other classical or state-of-the-art classification methods.

The rest of the paper is organized as follows. Section 2 introduces the basic idea of random weights networks. Section 3 describes our proposed HSI classification method using LRF-based RWN. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.

2 Brief review of random weights networks

Random weights networks (RWN) can be regarded as SLFNs with randomly assigned input weights. Unlike the back-propagation (BP) algorithm, RWN does not require the input weights to be adjusted. In general, an RWN is a feedforward neural network with a simple three-layer structure: an input layer, a hidden layer, and an output layer. The schematic diagram of RWN is depicted in Fig. 1.

Let $X = \{x_1, x_2, \ldots, x_N \mid x_i \in \mathbb{R}^{D},\ i = 1, 2, \ldots, N\}$ denote the training set containing N training samples, and let $T = \{t_1, t_2, \ldots, t_N \mid t_i \in \mathbb{R}^{c},\ i = 1, 2, \ldots, N\}$ be the training target matrix, where $t_i$ is the vectorized label and c is the number of classes. The RWN model with L hidden neurons and activation function g(x) can then be expressed as

$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = t_i, \quad i = 1, 2, \ldots, N \tag{1}$$

where $w_j$ is the weight vector connecting the input neurons to the j-th hidden neuron, $\beta_j$ is the weight vector connecting the j-th hidden neuron to the output neurons, and $b_j$ is the bias of the j-th hidden neuron. These N equations can be written compactly as

$$H\beta = T \tag{2}$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ g(w_1 \cdot x_2 + b_1) & \cdots & g(w_L \cdot x_2 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_L \end{bmatrix}_{L \times c}, \quad T = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix}_{N \times c}.$$

H is the hidden layer output matrix of the SLFN and β is the output weight matrix. Solving the linear system (2) gives the optimal output weights

$$\hat{\beta} = H^{\dagger} T \tag{3}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse [16] of the hidden layer output matrix H; $\hat{\beta}$ is the minimum-norm least-squares solution of (2).

During RWN training, $w_j$ and $b_j$ are generated randomly and kept unchanged, so β is the only parameter to be trained. Once β has been solved for, the RWN training process is complete.
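The training procedure of Equations (1)–(3) fits in a few lines of NumPy. The sketch below assumes a sigmoid activation g and standard-normal random input weights and biases (the text fixes neither choice); the function names `rwn_fit` and `rwn_predict` are illustrative only, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rwn_fit(X, T, L, seed=0):
    """Train a basic RWN: draw (w_j, b_j) once at random, then solve Eq. (3) for beta."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # column j is the input weight vector w_j (assumed N(0, 1))
    b = rng.standard_normal(L)                # hidden biases b_j (assumed N(0, 1))
    H = sigmoid(X @ W + b)                    # N x L hidden-layer output matrix of Eq. (2)
    beta = np.linalg.pinv(H) @ T              # Moore-Penrose solution, L x c
    return W, b, beta

def rwn_predict(X, W, b, beta):
    """Forward pass; each sample is assigned the index of its largest output."""
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)

# Toy usage: two classes in R^5 with one-hot targets (synthetic data).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
y = (X[:, 0] > 0).astype(int)
T = np.eye(2)[y]
W, b, beta = rwn_fit(X, T, L=40)
pred = rwn_predict(X, W, b, beta)
```

Only `beta` is computed from data; `W` and `b` are drawn once and never updated, which is the property that distinguishes RWN from BP training.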

To obtain better generalization performance, a regularized RWN model is often used by adding a constraint on the output weights [17]. The optimization objective becomes

$$\min_{\beta \in \mathbb{R}^{L \times c}} \; \frac{1}{2}\|\beta\|_F^2 + \frac{\lambda}{2}\|T - H\beta\|_F^2 \tag{4}$$

where $\|\cdot\|_F$ is the Frobenius norm and λ > 0 is a parameter controlling the trade-off between the training error and the norm of the output weights.

The output weights can then be calculated as [9, 15]

$$\hat{\beta} = \begin{cases} H^{T}\left(\dfrac{I}{\lambda} + HH^{T}\right)^{-1} T, & \text{if } N \le L \\[2mm] \left(\dfrac{I}{\lambda} + H^{T}H\right)^{-1} H^{T} T, & \text{if } N > L \end{cases} \tag{5}$$

where I is the identity matrix (of size N × N in the first case and L × L in the second).
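The two cases of Equation (5) are algebraically equivalent, since $H^{T}(I/\lambda + HH^{T})^{-1} = (I/\lambda + H^{T}H)^{-1}H^{T}$; the choice only decides whether an N × N or an L × L system is solved. A minimal NumPy sketch, keeping the I/λ term exactly as written in Equation (5) and using a linear solve rather than an explicit inverse; the function name `output_weights` is hypothetical.

```python
import numpy as np

def output_weights(H, T, lam):
    """Regularized output weights of Eq. (5); lam is the trade-off parameter lambda."""
    N, L = H.shape
    if N <= L:
        # Dual form: only an N x N system has to be solved.
        return H.T @ np.linalg.solve(np.eye(N) / lam + H @ H.T, T)
    # Primal form: only an L x L system has to be solved.
    return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ T)

# Toy usage with N = 8 <= L = 12, so the dual branch is taken.
rng = np.random.default_rng(0)
H = rng.random((8, 12))
T = np.eye(3)[rng.integers(0, 3, size=8)]
beta = output_weights(H, T, lam=10.0)
```

Solving with `np.linalg.solve` avoids forming the inverse explicitly, which is both cheaper and numerically safer for the moderately sized systems that arise here.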

3 HSI classification using local receptive fields based RWN

3.1 Local receptive fields based RWN

The main idea of the local receptive field (LRF) is that each unit in a layer receives inputs from a set of units located in a small neighborhood of the previous layer. The idea of connecting units to LRFs on the input goes back to the perceptron in the early 1960s, almost simultaneously with Hubel and Wiesel's discovery of locally sensitive, orientation-selective neurons in the mammalian visual system [18, 19]. The LRF is also supported by more recent biological evidence showing that a visual cortex cell responds only to a sub-region of the retina (i.e., the input layer) [20, 21].

The LRF has been used extensively in neural models of visual learning, such as Fukushima's Neocognitron [22], Poggio's HMAX [23], LeCun's CNN [24], and Olshausen's sparse coding model [25]. With local receptive fields, neurons can extract elementary visual features such as oriented edges, corners, and endpoints, which are then combined into higher-order features in subsequent layers. By introducing the LRF concept into random weights networks, RWN-LRF captures local structures and generates more meaningful hidden-layer representations for image processing and similar tasks.

The schematic diagram of RWN-LRF is shown in Fig. 2. To represent the input effectively, K different sets of input weights are adopted, yielding K diverse feature maps. The hidden layer consists of the convolutional nodes in these feature maps. Input weights are shared within a feature map, while different feature maps are generated by distinct input weights (convolution kernels).

Assume the input data have size d₁ × d₂ and the convolution kernel (i.e., the receptive field) has size r₁ × r₂; the feature map produced by the convolution then has size (d₁ - r₁ + 1) × (d₂ - r₂ + 1). Let $X \in \mathbb{R}^{d_1 \times d_2}$ be the input data. The node (i, j) in the k-th convolution feature map, $C_{i,j,k}$, is calculated as

$$C_{i,j,k} = \sum_{m=0}^{r_1-1} \sum_{n=0}^{r_2-1} X_{i+m,\, j+n} \cdot W_{m,n,k}, \qquad i = 0, 1, \ldots, d_1 - r_1;\ j = 0, 1, \ldots, d_2 - r_2 \tag{6}$$

where $W \in \mathbb{R}^{r_1 \times r_2 \times K}$ holds the K random convolution kernels.
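Equation (6) is a "valid" 2-D correlation applied with K kernels. The sketch below computes it directly with loops for readability rather than speed; drawing the kernels from a standard normal distribution is an assumption, as the text only states that they are random and fixed, and `conv_feature_maps` is a hypothetical name.

```python
import numpy as np

def conv_feature_maps(X, W):
    """Valid 2-D correlation of Eq. (6).

    X: (d1, d2) input; W: (r1, r2, K) random kernels.
    Returns C with shape (d1 - r1 + 1, d2 - r2 + 1, K).
    """
    d1, d2 = X.shape
    r1, r2, K = W.shape
    C = np.zeros((d1 - r1 + 1, d2 - r2 + 1, K))
    for i in range(d1 - r1 + 1):
        for j in range(d2 - r2 + 1):
            patch = X[i:i + r1, j:j + r2]  # local receptive field of node (i, j)
            # The same patch is weighted by each of the K shared kernels.
            C[i, j, :] = np.tensordot(patch, W, axes=([0, 1], [0, 1]))
    return C

rng = np.random.default_rng(0)
X = rng.random((10, 4))
W = rng.standard_normal((3, 2, 5))  # K = 5 random kernels, drawn once and kept fixed
C = conv_feature_maps(X, W)
```

With d₁ × d₂ = 10 × 4 and r₁ × r₂ = 3 × 2, each of the five feature maps has the size (10 - 3 + 1) × (4 - 2 + 1) = 8 × 3 given in the text.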

Next, square/square-root pooling is applied to form the combinatorial nodes, which make up the pooling maps. With pooling size s₁ × s₂, each pooling map has size $\frac{d_1 - r_1 + 1}{s_1} \times \frac{d_2 - r_2 + 1}{s_2}$ (assuming d₁ - r₁ + 1 and d₂ - r₂ + 1 are divisible by s₁ and s₂, respectively). The node (i, j) in the k-th pooling map, $P_{i,j,k}$, is calculated as

$$P_{i,j,k} = \sqrt{\sum_{m=0}^{s_1-1} \sum_{n=0}^{s_2-1} C_{i s_1 + m,\; j s_2 + n,\; k}^{2}} \tag{7}$$
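Because the pooling windows in Equation (7) do not overlap, the pooling can be written as a reshape followed by a sum of squares. A short sketch, assuming the divisibility condition stated above (it is checked explicitly); `sqrt_pool` is a hypothetical name.

```python
import numpy as np

def sqrt_pool(C, s1, s2):
    """Non-overlapping square/square-root pooling of Eq. (7).

    C: (h, w, K) feature maps, with h divisible by s1 and w by s2.
    Returns P with shape (h // s1, w // s2, K).
    """
    h, w, K = C.shape
    assert h % s1 == 0 and w % s2 == 0, "pooling size must divide the map size"
    # blocks[i, m, j, n, k] == C[i * s1 + m, j * s2 + n, k]
    blocks = C.reshape(h // s1, s1, w // s2, s2, K)
    return np.sqrt(np.sum(blocks ** 2, axis=(1, 3)))

C = np.arange(24, dtype=float).reshape(4, 2, 3)
P = sqrt_pool(C, s1=2, s2=1)
```

Squaring before summing makes every pooled node non-negative regardless of the sign of the random convolution outputs, which is the rectification effect mentioned in the text.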

Square/square-root pooling introduces rectification nonlinearity and translational invariance into the network, and its effectiveness has been demonstrated in [26] and [27].

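The values of all combinatorial nodes are simply concatenated into a vector, and the pooling layer (i.e., the combinatorial layer) is fully connected to the output layer. Here H denotes the matrix whose N rows are the concatenated pooling vectors of the training samples, so it has K(d₁ - r₁ + 1)(d₂ - r₂ + 1)/(s₁s₂) columns. The output weight matrix can then be calculated analytically in the same way as Equation (5):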

1) If $N \le K(d_1 - r_1 + 1)(d_2 - r_2 + 1)/(s_1 s_2)$:

$$\hat{\beta} = H^{T}\left(\frac{I}{\lambda} + HH^{T}\right)^{-1} T \tag{8}$$

2) If $N > K(d_1 - r_1 + 1)(d_2 - r_2 + 1)/(s_1 s_2)$:

$$\hat{\beta} = \left(\frac{I}{\lambda} + H^{T}H\right)^{-1} H^{T} T \tag{9}$$
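As a worked check of which branch applies in practice, take the settings reported in Section 4: d₂ = r₂ = s₂ = 1, pooling size s₁ = 2, the final K = 150, kernel size 20 with 103 bands and 200 training pixels per class (9 classes) for Pavia University, and kernel size 21 with 204 bands and 50 training pixels per class (16 classes) for Salinas:

```latex
\begin{aligned}
\text{Pavia University:}\quad & K\,\frac{(d_1-r_1+1)(d_2-r_2+1)}{s_1 s_2} = 150\cdot\frac{(103-20+1)\cdot 1}{2\cdot 1} = 6300,
  \qquad N = 9\cdot 200 = 1800 \le 6300,\\
\text{Salinas:}\quad & K\,\frac{(d_1-r_1+1)(d_2-r_2+1)}{s_1 s_2} = 150\cdot\frac{(204-21+1)\cdot 1}{2\cdot 1} = 13800,
  \qquad N = 16\cdot 50 = 800 \le 13800.
\end{aligned}
```

In both cases Equation (8) applies, so only an N × N system (1800 × 1800 or 800 × 800) has to be solved. Both kernel sizes also make d₁ - r₁ + 1 (84 and 184) divisible by the pooling size 2.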

3.2 HSI classification based on RWN-LRF

The critical challenge of pattern classification is to model the intra-class variation in the appearance and shape of objects. Since each band of HSI data is usually correlated with its neighboring bands, HSI classification performance can be further improved by considering the local correlations of spectral features. For HSI data, each pixel can be regarded as a 2D image whose height (i.e., d₂) is 1. Thus, the size of the input layer is d₁ × 1, where d₁ is the number of spectral bands. Fig. 3 depicts the proposed RWN-LRF based HSI classification method.

The training process of the proposed RWN-LRF based HSI classification framework is as follows. The feature maps in the convolution layer are calculated by convolution with shared weights; that is, the nodes in the same feature map share the same convolution kernel. The convolution is performed along the spectral dimension, so r₂, like d₂, equals 1. Note that the input weights of RWN-LRF are randomly generated and kept unchanged. The nodes in the pooling layer are then obtained by square/square-root pooling, with s₂ also set to 1. Next, the nodes in the pooling maps are concatenated into a feature vector and connected to c output nodes, where c is the number of land-cover classes. Training is complete once the output weights have been calculated analytically.

Once the architecture and weights have been determined, the RWN-LRF model can be used as a classifier for HSI data. As in Equation (2), the outputs are obtained with a single forward-propagation step, and the index of the output node with the largest value is taken as the predicted label of the current pixel.
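The training and classification steps above can be sketched end to end for 1-D spectra (d₂ = r₂ = s₂ = 1). This is a minimal sketch under assumptions the text does not fix: standard-normal kernels, no bias term, no further processing of the random kernels, and hypothetical helper names. The toy spectra in the usage lines are synthetic Gaussian peaks, not HSI data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lrf_features(X, W, s):
    """Eq. (6) with d2 = r2 = 1, then Eq. (7) with s2 = 1, then concatenation.

    X: (N, d1) pixel spectra; W: (r1, K) random kernels; s: pooling size s1.
    Returns H with shape (N, K * (d1 - r1 + 1) // s).
    """
    r, K = W.shape
    windows = sliding_window_view(X, r, axis=1)  # (N, d1 - r1 + 1, r1) spectral patches
    C = windows @ W                               # (N, d1 - r1 + 1, K) feature maps
    N, h, _ = C.shape
    assert h % s == 0, "s1 must divide d1 - r1 + 1"
    P = np.sqrt((C.reshape(N, h // s, s, K) ** 2).sum(axis=2))  # square/square-root pooling
    return P.reshape(N, -1)                       # one row of H per pixel

def rwn_lrf_fit(X, y, n_classes, r, s, K, lam, seed=0):
    """Draw fixed random kernels, build H, and solve Eq. (8) or Eq. (9) for beta."""
    W = np.random.default_rng(seed).standard_normal((r, K))
    H = lrf_features(X, W, s)
    T = np.eye(n_classes)[y]                      # one-hot targets, N x c
    N, M = H.shape
    if N <= M:                                    # Eq. (8)
        beta = H.T @ np.linalg.solve(np.eye(N) / lam + H @ H.T, T)
    else:                                         # Eq. (9)
        beta = np.linalg.solve(np.eye(M) / lam + H.T @ H, H.T @ T)
    return W, beta

def rwn_lrf_predict(X, W, beta, s):
    """Forward pass; each pixel gets the index of its largest output node."""
    return np.argmax(lrf_features(X, W, s) @ beta, axis=1)

# Toy usage: class 0 peaks near band 10, class 1 near band 30 (40 bands).
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
centers = np.where(y == 0, 10, 30)
X = np.exp(-0.5 * ((np.arange(40) - centers[:, None]) / 3.0) ** 2)
X = X + 0.05 * rng.standard_normal(X.shape)
W, beta = rwn_lrf_fit(X, y, n_classes=2, r=5, s=2, K=10, lam=100.0)
pred = rwn_lrf_predict(X, W, beta, s=2)
```

Here each pixel yields 10 × (40 - 5 + 1)/2 = 180 hidden features for N = 40 training pixels, so the Eq. (8) branch is used.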

4 Experiments

In this section, we evaluate the classification performance of the proposed RWN-LRF based method, first by comparing it with the ordinary RWN [29]. We then add contextual information of the HSI data to the RWN and RWN-LRF models; the resulting contextual methods are called CRWN and CRWN-LRF, respectively. Contextual information is incorporated through a square neighborhood window: with a w × w window, the spectral reflectance of each pixel is replaced by the mean of the w² pixels in its neighborhood. Comparisons are also made with other widely used or state-of-the-art methods, including SVM [30], contextual SVM (CSVM) [31], SVM with composite kernels (SVM-CK) [32], and the simultaneous orthogonal matching pursuit (SOMP) algorithm [33]. Two benchmark hyperspectral images are used for the evaluation.
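The w × w neighborhood averaging used by CRWN and CRWN-LRF can be sketched as a box filter over the image cube. The text does not say how border pixels are handled, so edge replication is an assumption here, and `neighborhood_mean` is a hypothetical name.

```python
import numpy as np

def neighborhood_mean(cube, w):
    """Replace each pixel spectrum by the mean of its w x w neighborhood.

    cube: (rows, cols, bands) image; w: odd window size, e.g. 3, 5, 7 or 9.
    Border pixels use edge replication (an assumption, not stated in the text).
    """
    assert w % 2 == 1, "window size must be odd"
    p = w // 2
    rows, cols, _ = cube.shape
    padded = np.pad(cube, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros(cube.shape, dtype=float)
    for di in range(w):  # accumulate the w * w shifted copies of the image
        for dj in range(w):
            out += padded[di:di + rows, dj:dj + cols, :]
    return out / (w * w)

rng = np.random.default_rng(0)
cube = rng.random((6, 7, 4))
smoothed = neighborhood_mean(cube, w=3)
```

The smoothed spectra are then fed to RWN or RWN-LRF unchanged, so contextual information enters only through this preprocessing step.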

4.1 HSI data description

1) Pavia University Dataset: This dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor in 2003 with a spatial resolution of 1.3 m/pixel. It consists of 610 × 340 pixels with 103 spectral bands covering 0.43–0.86 μm (12 noisy bands were removed from the original 115). The dataset contains 9 land-cover classes, whose spectral signatures are illustrated in Fig. 4. The hyperspectral 3D cube and the ground-truth image are shown in Fig. 5(a) and (b), respectively. In the experiments, 200 samples per class are randomly chosen as training data, and the rest are used as test samples. The number of samples in each class is listed in Table 2.

2) Salinas Dataset: This dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1998 over the Salinas Valley, CA, USA. It consists of 512 × 217 pixels with a spatial resolution of 3.7 m/pixel. The original data comprise 224 bands with a spectral range of 0.4–2.5 μm. In the experiments, the corrected 204-band dataset is used, obtained by discarding the 20 water absorption bands [108–112], [154–167], and 224. The dataset contains 16 land-cover classes, whose spectral signatures are illustrated in Fig. 6. The hyperspectral 3D cube and the ground-truth image are shown in Fig. (a) and (b), respectively. For this dataset, 50 samples per class are randomly chosen as training data, and the rest are used as test samples. The number of samples in each class is listed in Table 4.
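The per-class random sampling described for both datasets (200 training pixels per class for Pavia University, 50 for Salinas, the rest for testing) can be sketched as follows. The sketch assumes integer class labels for the labeled pixels only and that every class has more than `n_per_class` pixels; `per_class_split` is a hypothetical name, and the toy labels are not the real class sizes.

```python
import numpy as np

def per_class_split(labels, n_per_class, seed=0):
    """Randomly pick n_per_class training pixels from each class; the rest are test pixels.

    labels: 1-D array of class labels of the labeled pixels (background excluded).
    Returns (train_idx, test_idx) as index arrays into labels.
    """
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.append(rng.choice(idx, size=n_per_class, replace=False))
    train_idx = np.sort(np.concatenate(train))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

labels = np.repeat(np.arange(3), [300, 250, 400])  # toy labels, not the real datasets
train_idx, test_idx = per_class_split(labels, n_per_class=200)
```

Changing `seed` gives a different random split; averaging over several such splits corresponds to the repeated runs reported in Section 4.4.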

4.2 Experimental settings

Among the proposed approach and the compared methods, four methods (RWN, RWN-LRF, SVM, and SOMP) use only spectral information, while the other methods use both spectral and contextual information. The neighborhood window size w of the contextual methods was chosen from {3, 5, 7, 9}. For RWN and CRWN, the two main parameters are the trade-off coefficient λ and the number of hidden neurons L; λ was chosen from $\{10^{-4}, 10^{-3}, \ldots, 10^{4}\}$ and L from {100, 200, ..., 3000}. The best parameters were determined by five-fold cross validation and then used for training and testing. In the RWN-LRF and CRWN-LRF methods, the convolution kernel size was set to 20 for the Pavia University dataset and 21 for the Salinas dataset. The pooling size was set to 2, and the number of feature maps K was chosen from {10, 30, ..., 150} for both datasets; the trade-off parameter λ was again chosen from $\{10^{-4}, 10^{-3}, \ldots, 10^{4}\}$. For SVM and CSVM, we applied the one-versus-one strategy using the LIBSVM toolbox [34] with a radial basis function (RBF) kernel; the penalty term and the RBF kernel width were selected by grid search over $\{2^{-5}, \ldots, 2^{15}\}$ and $\{2^{-15}, \ldots, 2^{3}\}$, respectively. For SVM-CK and SOMP, we used the settings reported in [32] and [33], respectively.
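The five-fold cross-validation step can be sketched for λ alone. This assumes plain (unstratified) random folds and uses a regularized sigmoid RWN with a fixed L as the model being tuned; the same loop would wrap RWN-LRF with K added to the grid. Function names and the toy data are hypothetical.

```python
import numpy as np

def rwn_fit_predict(Xtr, ytr, Xte, lam, n_classes, L=200, seed=0):
    """Regularized sigmoid RWN trained with Eq. (5); returns predicted labels for Xte."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Xtr.shape[1], L))
    b = rng.standard_normal(L)
    Htr = 1.0 / (1.0 + np.exp(-(Xtr @ W + b)))
    T = np.eye(n_classes)[ytr]
    N = Htr.shape[0]
    if N <= L:
        beta = Htr.T @ np.linalg.solve(np.eye(N) / lam + Htr @ Htr.T, T)
    else:
        beta = np.linalg.solve(np.eye(L) / lam + Htr.T @ Htr, Htr.T @ T)
    Hte = 1.0 / (1.0 + np.exp(-(Xte @ W + b)))
    return np.argmax(Hte @ beta, axis=1)

def select_lambda(X, y, grid, n_folds=5, seed=0):
    """Return the lambda with the best mean validation accuracy over n_folds folds."""
    n_classes = int(y.max()) + 1  # labels assumed to be 0, ..., c - 1
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for lam in grid:
        accs = []
        for k in range(n_folds):
            val = folds[k]
            tr = np.concatenate([folds[m] for m in range(n_folds) if m != k])
            pred = rwn_fit_predict(X[tr], y[tr], X[val], lam, n_classes)
            accs.append(np.mean(pred == y[val]))
        scores.append(float(np.mean(accs)))
    return grid[int(np.argmax(scores))], scores

grid = [10.0 ** e for e in range(-4, 5)]  # {10^-4, 10^-3, ..., 10^4}
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
best_lam, scores = select_lambda(X, y, grid)
```

The selected value is then used to retrain on the full training set before testing, as described above.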

All experiments were implemented in MATLAB R2014a and run on a computer with two eight-core Intel Xeon E5-2650 processors at 2.0 GHz and 128 GB of memory.

4.3 Evaluation metrics

To quantitatively evaluate the quality of classification results, several measures are commonly used: the confusion matrix, overall accuracy, average accuracy, and the Kappa coefficient. The confusion matrix, also called the error matrix, is the basis of other measures such as overall accuracy and the Kappa coefficient. Let M be the confusion matrix; its element $m_{ij}$ denotes the number of pixels whose actual label is j and whose predicted label is i. The confusion matrix can also be expressed in percentages.
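The sketch below builds M with exactly the index convention stated above (row i = predicted label, column j = actual label), which is the transpose of the convention used by, for example, scikit-learn's `confusion_matrix`. Labels are assumed to be integers 0, ..., c - 1, and the function name is illustrative.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """M[i, j] = number of pixels with actual label j and predicted label i (as in the text)."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(M, (predicted, actual), 1)  # unbuffered add handles repeated (i, j) pairs
    return M

actual = np.array([0, 0, 1, 1, 2, 2])
predicted = np.array([0, 1, 1, 1, 2, 0])
M = confusion_matrix(actual, predicted, 3)
```

Under this convention each column sums to the number of test pixels of that actual class.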

Overall accuracy (OA) is a global evaluation of the classification results, defined as the ratio of correctly classified pixels to all test pixels:

$$OA = \frac{1}{N}\sum_{i=1}^{c} m_{ii} \tag{10}$$

where c is the number of classes, $m_{ii}$ are the diagonal elements of the confusion matrix, and $N = \sum_{i=1}^{c}\sum_{j=1}^{c} m_{ij}$ is the number of all test samples. Average accuracy (AA) is the mean of the per-class accuracies, where the accuracy of class j is the fraction of test pixels of class j that are correctly classified, i.e., $m_{jj} / \sum_{i=1}^{c} m_{ij}$.
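OA and AA follow directly from M. This sketch assumes the convention above (columns = actual class), so the per-class accuracy divides each diagonal entry by its column sum; the 2 × 2 matrix is a made-up example.

```python
import numpy as np

def overall_accuracy(M):
    """Eq. (10): correctly classified pixels (the diagonal) over all test pixels."""
    return np.trace(M) / M.sum()

def average_accuracy(M):
    """Mean per-class accuracy; class j's accuracy is m_jj over the pixels actually in class j."""
    return np.mean(np.diag(M) / M.sum(axis=0))

M = np.array([[50, 10],
              [5, 35]])  # rows = predicted class, columns = actual class
# overall_accuracy(M) -> 0.85; average_accuracy(M) = (50/55 + 35/45) / 2 ≈ 0.8434
```

AA weights every class equally, so it is more sensitive than OA to errors on small classes.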

The Kappa coefficient [28] is based on a discrete multivariate analysis technique and involves all elements of the confusion matrix, so it is regarded as a more objective metric. It is defined as

$$\kappa = \frac{N\sum_{i=1}^{c} m_{ii} - \sum_{i=1}^{c} m_{i+} m_{+i}}{N^{2} - \sum_{i=1}^{c} m_{i+} m_{+i}} \tag{11}$$

where $m_{i+}$ and $m_{+i}$ denote the sums of the elements in row i and column i of the confusion matrix, respectively. A higher Kappa coefficient indicates better classification performance.
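Equation (11) translates directly into a few array operations. For the made-up matrix below, N = 100, the diagonal sums to 85, and $\sum_i m_{i+}m_{+i} = 60 \cdot 55 + 40 \cdot 45 = 5100$, so κ = (8500 - 5100)/(10000 - 5100) = 3400/4900.

```python
import numpy as np

def kappa(M):
    """Eq. (11): kappa coefficient from the confusion matrix M."""
    M = np.asarray(M, dtype=float)
    N = M.sum()
    chance = np.sum(M.sum(axis=1) * M.sum(axis=0))  # sum_i m_{i+} m_{+i}
    return (N * np.trace(M) - chance) / (N ** 2 - chance)

M = np.array([[50, 10],
              [5, 35]])
# kappa(M) = 3400 / 4900 ≈ 0.6939
```

Because swapping rows and columns leaves $\sum_i m_{i+}m_{+i}$ unchanged, κ, unlike AA, does not depend on which axis of M holds the actual labels.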

4.4 Classification results

The classification performance of the compared methods on the Pavia University dataset is summarized in Table 3, which reports the accuracy for each class, OA, AA, and the κ coefficient. These results are averaged over ten runs, and the standard deviations are also reported. We make several observations. First, by introducing the LRF concept, RWN-LRF provides better classification results than the ordinary RWN, with the OA increasing from 77.17% to 80.85%. Second, taking contextual information into account significantly enhances classification performance, especially for classes such as Asphalt, Meadows, and Bricks: compared to RWN, RWN-LRF, and SVM, the CRWN, CRWN-LRF, and CSVM methods achieve improvements of about 5%, 12%, and 5% in terms of OA, respectively. Overall, the CRWN-LRF method provides the best classification performance, with an OA of 92.71%, an AA of 94.95%, and a κ of 0.9034. The classification maps of the different methods are shown in Fig. 8. The maps of RWN, RWN-LRF, SVM, and SOMP suffer from salt-and-pepper noise because no contextual information is involved, so they are not smooth. This problem is largely overcome by taking contextual information into account, as shown in Fig. 8(e, f, h and i). It can also be observed that CRWN-LRF markedly reduces the confusion between Meadows and Bare Soil compared to the other methods.

The classification accuracies for the Salinas dataset are reported in Table 3. As the results show, the CRWN, CRWN-LRF, and CSVM methods yield higher classification accuracy than RWN, RWN-LRF, and SVM, respectively, which again stresses the importance of contextual information for HSI classification. For the Salinas dataset, the CRWN-LRF method provides considerably better accuracies than all other methods, with an OA of 96.69%, an AA of 98.27%, and a κ of 0.9574. The classification maps of the different methods are shown in Fig. 9.

4.5 Parameter analysis

In this subsection, we evaluate the effects of the two key parameters of the proposed CRWN-LRF method, i.e., the trade-off parameter λ and the number of feature maps K. Figure 11 depicts the overall accuracies obtained by CRWN-LRF with different values of λ on the Pavia University and Salinas datasets, with K fixed at 150 for both datasets. The optimal λ is 0.01 for the Pavia University dataset and 10 for the Salinas dataset.

Next, we analyze the effect of the number of feature maps K. We set λ = 0.01 for the Pavia University image and λ = 10 for the Salinas image, and vary K from 10 to 150 for both datasets. Figure 10(a) and (b) illustrate the effects of K on running time and overall accuracy, respectively. Fig. 10 shows that the running time of CRWN-LRF increases approximately linearly with K, while the overall accuracy barely increases once K exceeds 130. Thus, K is set to 150 for both datasets in the experiments above.

5 Conclusion

In this paper, we propose an HSI classification method using local receptive fields based RWN. Experiments on the Pavia University and Salinas datasets demonstrate the superior performance of the proposed approach compared to the ordinary RWN and other classical or state-of-the-art methods. In future work, a hierarchical RWN-LRF framework with multiple convolution and pooling layers will be studied to further improve HSI classification results.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61125201, 61303070, U1435219, 61402507, and 61202127. The authors would like to thank Prof. P. Gamba for providing the ROSIS image of Pavia University. They also would like to thank the anonymous reviewers for their comments and suggestions, which greatly helped us to improve the technical quality and presentation of this paper.

References

1. Bioucas-Dias J.M., Plaza, Camps-Valls, Scheunders, Nasrabadi N.M. and Chanussot, Hyperspectral remote sensing data analysis and future challenges, IEEE Geoscience and Remote Sensing Magazine 1(2) (2013), 6–36.

2. Zhang L.P., Du and Zhang L.F., Hyperspectral Remote Sensing Image Processing, Science Press, Beijing, 2014.

3. Schmidt W.F., Kraaijveld M.A. and Duin R.P., Feedforward neural networks with random weights, 11th IAPR International Conference on Pattern Recognition, Vol. II. Conference B: Pattern Recognition Methodology and Systems, The Hague, Netherlands, 1992, pp. 1–4.

4. Pao Y.H. and Takefuji, Functional-link net computing, IEEE Computer Journal 25(5) (1992), 76–79.

5. Pao Y.H., Park G.H. and Sobajic D.J., Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6(2) (1994), 163–180.

6. Igelnik and Pao Y.H., Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Transactions on Neural Networks 6(6) (1995), 1320–1329.

7. Zhao, Wang, Cao and Wang, A local learning algorithm for random weights networks, Knowledge-Based Systems 74 (2015), 159–166.

8. , Zhao and Cao, Extended feed forward neural networks with random weights for face recognition, Neurocomputing 136 (2014), 96–102.

9. Cao, Ye and Wang, A probabilistic learning algorithm for robust modeling using neural networks with random weights, Information Sciences 313 (2015), 62–78.

10. Cao, Tan and Cai, Sparse algorithms of random weight networks and applications, Expert Systems with Applications 41(5) (2014), 2457–2462.

11. Wan, Zhou, Zhao and Cao, A novel face recognition method: Using random weight networks and quasi-singular value decomposition, Neurocomputing 151 (2015), 1180–1186.

12. Truong T.K., Li K.L. and Xu Y.M., Chemical reaction optimization with greedy strategy for the 0-1 knapsack problem, Applied Soft Computing 13(4) (2013), 1774–1780.

13. Wang Y.G., Cao F.L. and Yuan Y.B., A study on effectiveness of extreme learning machine, Neurocomputing 74(16) (2011), 2483–2490.

14. Cao, Wang, Zhu and Wang, An iterative learning algorithm for feedforward neural networks with random weights, Information Sciences 328 (2016), 546–557.

15. Huang G.B., Zhu, Ding and Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42(2) (2012), 513–529.

16. Rao C.R. and Mitra S.K., Generalized Inverse of Matrices and Its Applications, Wiley, New York, 1971.

17. Bartlett P.L., The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network, IEEE Transactions on Information Theory 44(2) (1998), 525–536.

18. Hubel D.H. and Wiesel T.N., Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, Journal of Physiology 160(1) (1962), 106–154.

19. Hubel D.H. and Wiesel T.N., Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat, Journal of Neurophysiology 28(2) (1965), 229–289.

20. Kara and Reid R.C., Efficacy of retinal spikes in driving cortical responses, Journal of Neuroscience 23(24) (2003), 8547–8557.

21. Sosulski D.L., Bloom M.L., Cutforth, Axel and Datta S.R., Distinct representations of olfactory information in different cortical centres, Nature 472(7342) (2011), 213–216.

22. Fukushima and Miyake, Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recognition 15(6) (1982), 455–469.

23. Serre, Wolf and Poggio, Object recognition with features inspired by visual cortex, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005, pp. 994–1000.

24. LeCun, Bottou, Bengio and Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11) (1998), 2278–2324.

25. Olshausen B.A. and Field D.J., Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381(6583) (1996), 607–609.

26. Ngiam, Chen, Chia, Koh P.W., Le Q.V. and Ng A.Y., Tiled convolutional neural networks, 24th Annual Conference on Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, 2010, pp. 1279–1287.

27. Saxe, Koh P.W., Chen, Bhand, Suresh and Ng A.Y., On random weights and unsupervised feature learning, 28th International Conference on Machine Learning (ICML), Bellevue, Washington, 2011, pp. 1089–1096.

28. Thompson W.D. and Walter S.D., A reappraisal of the kappa coefficient, Journal of Clinical Epidemiology 41(10) (1988), 949–958.

29. Pal, Extreme-learning-machine-based land cover classification, International Journal of Remote Sensing 30(14) (2009), 3835–3841.

30. Melgani and Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on Geoscience and Remote Sensing 42(8) (2004), 1778–1790.

31. C.H., Kuo B.C., Lin C.T. and Huang C.S., A spatial–contextual support vector machine for remotely sensed image classification, IEEE Transactions on Geoscience and Remote Sensing 50(3) (2012), 784–799.

32. Camps-Valls, Gomez-Chova, Muñoz-Marí, Vila-Francés and Calpe-Maravilla, Composite kernels for hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters 3(1) (2006), 93–97.

33. Chen, Nasrabadi N.M. and Tran T.D., Hyperspectral image classification using dictionary-based sparse representation, IEEE Transactions on Geoscience and Remote Sensing 49(10) (2011), 3973–3985.

34. Chang C.C. and Lin C.J., LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2(3) (2011), Article 27.