Abstract
This paper proposes a classification approach for hyperspectral images using local receptive fields based random weights networks (RWN). Considering the local correlations of spectral features, introducing local receptive fields (LRF) is a promising way to improve the performance of hyperspectral image (HSI) classification. This is the first time such an LRF-based RWN structure has been applied to HSI classification. The proposed classification framework consists of four layers: an input layer, a convolution layer, a pooling layer, and an output layer. The convolution and pooling layers are used for feature extraction, and the last layer is used as the classifier. Experimental results on two real hyperspectral image datasets confirm the effectiveness of the proposed HSI classification method.
Introduction
Urban land-use and land-cover classification is a key technique in remote sensing application systems. Its main idea is to assign a land-cover class label to each pixel in the remote sensing image. In recent years, land-cover classification has been widely applied in ground mapping, urban planning, mineral resources exploration, environment monitoring, and many other fields.
Hyperspectral remote sensing technology emerged in the 1980s and has become a frontier technology in remote sensing. A hyperspectral image (HSI) is obtained by a spectrometer with a large wavelength range. It usually contains tens or hundreds of bands, and the map in each band is an image at a specific wavelength. Thus, an HSI includes both image information and spectral information. Since introducing spectral information can significantly improve the performance of land-cover classification, HSI classification has received growing attention from the remote sensing community [1, 2]. However, the large number of bands and the relatively small training sample size bring challenges to conventional remote sensing classification methods.
As a fast learning algorithm for single-hidden layer feedforward neural networks (SLFNs), the random weights networks (RWN) model was first proposed by Schmidt et al. in [3], and it has been considered a promising learning algorithm for pattern classification. A similar idea came from Pao’s group, who termed such randomized learning models random vector functional-link (RVFL) nets [4]. Further investigations of RVFLs were given in [5, 6]. Compared to traditional SLFN learning algorithms, RWN has the following advantages: (1) high efficiency, (2) ease of implementation, (3) unification of classification and regression, and (4) unification of binary and multi-class classification. Many works have shown the capabilities of RWN in classification and regression tasks [7–13].
In the past few years, the local receptive field (LRF) has been widely applied in neural learning models [22–25]. With LRFs, neurons can extract low-level visual features, which are then combined to obtain higher-level features in the subsequent layers. By considering the local connections of spectral features, it is possible to further improve the performance of HSI classification. Thus, in this paper, we propose a novel HSI classification scheme that introduces the LRF concept into the RWN model, which we call LRF-based random weights networks (RWN-LRF). The main contributions of our work are summarized as follows: (1) A new RWN structure using the LRF is proposed for hyperspectral image classification; to the best of our knowledge, this is the first time such an LRF-based RWN structure has been applied to HSI classification problems. (2) Contextual information of HSI data is brought into the LRF-based RWN to further improve classification performance. (3) Experiments on two real HSI datasets (the Pavia University dataset and the Salinas dataset) demonstrate that our proposed approach outperforms the ordinary RWN and other classical or state-of-the-art classification methods.
The rest of the paper is organized as follows. Section 2 introduces the basic idea of random weights networks. Section 3 describes our proposed HSI classification method using LRF-based RWN. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.
Brief review of random weights networks
Random weights networks (RWN) can be regarded as an SLFN with randomly assigned input weights. Unlike the back-propagation (BP) algorithm, it does not require adjustment of the input weights. In general, RWN is a feedforward neural network with a simple three-layer structure: an input layer, a hidden layer, and an output layer. The schematic diagram of RWN is depicted in Fig. 1.
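Let $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$ denote the training set containing N training samples, and let $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^{T}$ be the training data target matrix, where $\mathbf{t}_i$ is the target vector of the i-th sample $\mathbf{x}_i$.
During the training of RWN, the input weights are randomly assigned and kept fixed, so only the output weights need to be calculated.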
To obtain better generalization performance, a regularized RWN model is often used, in which a constraint is added on the output weights [17]. The optimization objective becomes:
And the output weights can be calculated as [9, 15]:
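The training procedure described in this section can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes a sigmoid hidden layer, uniformly distributed random weights, and the common ridge-regularized closed form β = (I/λ + HᵀH)⁻¹HᵀT, in which a larger λ means weaker regularization. The function names `rwn_train` and `rwn_predict` and the placement of λ are assumptions made for illustration.

```python
import numpy as np

def rwn_train(X, T, n_hidden, lam, rng):
    """Train a regularized RWN (sketch).

    X: (N, d) inputs, T: (N, c) one-hot targets,
    lam: trade-off coefficient (assumed convention: I/lam is the ridge term).
    """
    d = X.shape[1]
    # Input weights and biases are drawn once at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # hidden-layer output matrix
    # Only the output weights are learned, via regularized least squares.
    beta = np.linalg.solve(np.eye(n_hidden) / lam + H.T @ H, H.T @ T)
    return W, b, beta

def rwn_predict(X, W, b, beta):
    """Forward pass; the predicted label is the index of the largest output."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

Because the hidden layer is fixed, training reduces to solving one linear system, which is the source of the efficiency claimed for RWN.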
Local receptive fields based RWN
The main idea of the local receptive field (LRF) is that each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer. The idea of connecting units to LRFs on the input goes back to the perceptron in the early 1960s, and it was almost simultaneous with Hubel and Wiesel’s discovery of locally sensitive, orientation-selective neurons in the mammalian visual system [18, 19]. The LRF has also been justified by more recent biological evidence, which shows that cells in the visual cortex respond only to a sub-region of the retina (i.e., the input layer) [20, 21].
The LRF has been used extensively in neural models of visual learning, such as Fukushima’s Neocognitron [22], Poggio’s HMAX [23], LeCun’s CNN [24], and Olshausen’s sparse coding [25] model. With local receptive fields, neurons can extract elementary visual features such as oriented edges, corners, and endpoints. These features are then combined to obtain higher-order features in the subsequent layers. By introducing the LRF concept into random weights networks, RWN-LRF learns the local structures and generates more meaningful representations at the hidden layer when dealing with image processing and similar tasks.
The schematic diagram of RWN-LRF is shown in Fig. 2. To represent the input effectively, K different input weights are adopted and K diverse feature maps are obtained. The hidden layer consists of convolutional nodes in feature maps. The input weights are shared within the same feature map, while different feature maps are generated by distinct input weights (convolution kernels).
Assuming the input data has size d1 × d2 and the convolution kernel (i.e., the receptive field) has size r1 × r2, the feature map after the convolution operation has size (d1 - r1 + 1) × (d2 - r2 + 1). Let X be the input data; then the node (i, j) in the k-th convolution feature map, Ci,j,k, can be calculated as:
Next, the square-root pooling structure is applied to form the combinatorial nodes. The pooling maps are composed of combinatorial nodes. Assuming the pooling size is s1 × s2, the size of each pooling map is (d1 - r1 + 1)/s1 × (d2 - r2 + 1)/s2. The node (i, j) in the k-th pooling map, Pi,j,k, is calculated as follows:
Square-root pooling introduces rectification nonlinearity and translational invariance into the network, which has been shown to be effective in [26] and [27].
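The convolution and square-root pooling steps can be illustrated with a short NumPy sketch. It rests on two assumptions that are not stated explicitly in the text: the convolution is a "valid" sliding dot product without kernel flipping, and pooling is taken over non-overlapping s1 × s2 blocks (which matches a pooling map of size (d1 - r1 + 1)/s1 × (d2 - r2 + 1)/s2), with each pooled node being the square root of the sum of squared feature-map values in its block. The function names are hypothetical.

```python
import numpy as np

def lrf_feature_map(X, kernel):
    """Valid 2-D convolution (no kernel flip) of input X with one random kernel.

    X: (d1, d2) input, kernel: (r1, r2) -> output (d1 - r1 + 1, d2 - r2 + 1).
    """
    r1, r2 = kernel.shape
    # windows[i, j] is the r1 x r2 receptive field whose top-left corner is (i, j).
    windows = np.lib.stride_tricks.sliding_window_view(X, (r1, r2))
    return np.einsum("ijmn,mn->ij", windows, kernel)

def sqrt_pool(C, s1, s2):
    """Square-root pooling over non-overlapping s1 x s2 blocks (assumed layout).

    Rows/columns that do not fill a complete block are dropped.
    """
    n1, n2 = C.shape[0] // s1, C.shape[1] // s2
    blocks = C[: n1 * s1, : n2 * s2].reshape(n1, s1, n2, s2)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
```

Squaring removes the sign of the filter response (rectification), and summing over a block makes the output insensitive to small shifts within that block.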
By simply concatenating the values of all combinatorial nodes into a vector, the pooling layer (i.e., the combinatorial layer) is fully connected to the output layer. The output weight matrix can be calculated analytically in a similar way to Equation (5):
1) If N ≤ K(d1 - r1 + 1)(d2 - r2 + 1)/(s1 s2)
2) If N > K(d1 - r1 + 1)(d2 - r2 + 1)/(s1 s2)
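The two cases above compare the number of training samples N with the length of the concatenated feature vector, M = K(d1 - r1 + 1)(d2 - r2 + 1)/(s1 s2). A minimal sketch of such a case split is given below, assuming the usual pair of ridge-regression closed forms: β = Hᵀ(I/λ + HHᵀ)⁻¹T when N ≤ M, and β = (I/λ + HᵀH)⁻¹HᵀT otherwise. The two expressions are algebraically equivalent; the split only selects the smaller linear system. The function name and the I/λ convention are assumptions.

```python
import numpy as np

def solve_output_weights(H, T, lam):
    """Regularized least-squares output weights (sketch).

    H: (N, M) matrix of concatenated pooling-layer features,
    T: (N, c) target matrix, lam: trade-off parameter (assumed I/lam convention).
    """
    N, M = H.shape
    if N <= M:
        # Fewer samples than features: solve an N x N system.
        return H.T @ np.linalg.solve(np.eye(N) / lam + H @ H.T, T)
    # More samples than features: solve an M x M system.
    return np.linalg.solve(np.eye(M) / lam + H.T @ H, H.T @ T)
```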
The critical challenge of pattern classification is to model the intra-class appearance and shape variation of objects. Since each band of HSI data is usually correlated with its neighboring bands, it is possible to further improve the performance of HSI classification by considering the local correlations of spectral features. For HSI data, each pixel can be regarded as a 2D image whose height (i.e., d2) is 1. Thus, the size of the input layer is d1 × 1, where d1 is the number of spectral bands. Fig. 3 depicts our proposed RWN-LRF based HSI classification method.
The training process of the proposed RWN-LRF based HSI classification framework is described as follows. The feature maps in the convolution layer are calculated through the convolution operation in a weight-sharing way. That is to say, the nodes in the same feature map share the same convolution kernel. The convolution operation is performed along the spectral dimension, which means the aforementioned r2 also has the value 1, like d2. Moreover, it is notable that the input weights of RWN-LRF are randomly generated and kept unchanged. The nodes in the pooling layer are then obtained by the square-root pooling operation, where s2 is also set to 1. Next, the nodes in the pooling maps are combined into a feature vector and connected to c output nodes, where c is the number of land-cover classes. The training is done when the output weights have been calculated analytically.
Once the architecture and weights are specified, the RWN-LRF model can be used as a classifier for HSI data. As expressed in Equation (2), the classification results can be obtained with a forward-propagation step. The index of the node with the largest value in the output layer is regarded as the predicted label of the current pixel.
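The training and prediction steps for pixel spectra can be put together as follows. This is a minimal end-to-end sketch under stated assumptions rather than the authors' code: kernels are drawn from a standard normal distribution, pooling uses non-overlapping blocks of length s along the spectral axis, targets are one-hot encoded, and the output weights use the ridge form (I/λ + HᵀH)⁻¹HᵀT. All function names are hypothetical.

```python
import numpy as np

def spectral_features(X, kernels, s):
    """Random 1-D convolution along the spectral axis plus square-root pooling.

    X: (N, d1) pixel spectra, kernels: (K, r1) fixed random kernels, s: pooling size.
    Returns an (N, K * ((d1 - r1 + 1) // s)) feature matrix.
    """
    r1 = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(X, r1, axis=1)  # (N, d1-r1+1, r1)
    C = np.einsum("nir,kr->nki", windows, kernels)                     # (N, K, d1-r1+1)
    n = C.shape[2] // s
    blocks = C[:, :, : n * s].reshape(C.shape[0], C.shape[1], n, s)
    P = np.sqrt((blocks ** 2).sum(axis=3))                             # (N, K, n)
    return P.reshape(X.shape[0], -1)  # concatenate all pooling maps

def rwn_lrf_train(X, labels, n_classes, K, r1, s, lam, rng):
    """Draw K random kernels, then solve for the output weights analytically."""
    kernels = rng.standard_normal((K, r1))  # random and kept unchanged
    H = spectral_features(X, kernels, s)
    T = np.eye(n_classes)[labels]           # one-hot targets (assumption)
    beta = np.linalg.solve(np.eye(H.shape[1]) / lam + H.T @ H, H.T @ T)
    return kernels, beta

def rwn_lrf_predict(X, kernels, s, beta):
    """Forward propagation; the largest output node gives the predicted label."""
    return np.argmax(spectral_features(X, kernels, s) @ beta, axis=1)
```

With 103 bands, a kernel size of 20, and a pooling size of 2, each feature map has 84 nodes and each pooling map 42, so K feature maps yield a 42K-dimensional feature vector per pixel.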
Experiments
In this section, we first evaluate the classification performance of the proposed RWN-LRF based method by comparing it with the ordinary RWN [29]. Experiments are then carried out with contextual information of the HSI data added to the RWN and RWN-LRF models. These two contextual methods are called CRWN and CRWN-LRF, respectively. The contextual methods are implemented with a square neighbor window. Assuming the size of the neighbor window is w × w, the spectral reflectance value of a pixel becomes the mean value of the w² pixels in the neighborhood. Comparisons are also made with other widely used or state-of-the-art methods, including SVM [30], contextual SVM (CSVM) [31], SVM with composite kernels (SVM-CK) [32], and the simultaneous orthogonal matching pursuit (SOMP) algorithm [33]. Two benchmark hyperspectral images were used for the evaluation.
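The neighbor-window averaging used by the contextual methods can be sketched as a per-band mean filter. The sketch below assumes an odd window size w and reflective padding at the image border; the text does not say how border pixels are handled, so the padding mode is an assumption, as is the function name.

```python
import numpy as np

def contextual_mean(cube, w):
    """Replace each pixel spectrum by the mean over its w x w spatial neighborhood.

    cube: (rows, cols, bands) hyperspectral cube, w: odd window size.
    Border handling (reflect padding) is an assumption.
    """
    pad = w // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # windows has shape (rows, cols, bands, w, w): one w x w patch per pixel and band.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (w, w), axis=(0, 1))
    return windows.mean(axis=(-2, -1))
```

The smoothed cube keeps the original shape, so it can be passed to RWN or RWN-LRF unchanged, which gives the CRWN and CRWN-LRF variants.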
HSI data description
1) Pavia University Dataset: The dataset was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor in 2003 with a spatial resolution of 1.3 m/pixel. It consists of 610×340 pixels in 103 spectral bands (12 noisy bands were removed from the total of 115 spectral bands) covering 0.43 μm–0.86 μm. The HSI data has 9 land-cover classes, whose spectral signatures are illustrated in Fig. 4. The hyperspectral 3D cube and the ground-truth image of the dataset are shown in Fig. 5(a) and (b), respectively. In the experiments, 200 samples of each class are randomly chosen as the training data, and the rest are used as the test samples. The numbers of samples of each class are shown in Table 2.
2) Salinas Dataset: This dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1998 over the Salinas Valley, CA, USA. It consists of 512×217 pixels with a spatial resolution of 3.7 m/pixel. The original dataset is composed of 224 bands, with a spectral range from 0.4 μm to 2.5 μm. In the experiments, the corrected 204-band dataset is used, obtained by discarding the 20 water absorption bands: [108–112], [154–167], 224. The HSI data has 16 land-cover classes, whose spectral signatures are illustrated in Fig. 6. The hyperspectral 3D cube and the ground-truth image of the dataset are shown in Fig. (a) and (b), respectively. For this dataset, 50 samples of each class are randomly chosen as the training data, and the rest are used as the test samples. The numbers of samples of each class are shown in Table 4.
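The per-class random split described for both datasets (200 training samples per class for Pavia University, 50 for Salinas) can be sketched as below. The sketch assumes the ground truth is an integer label map in which 0 marks unlabeled pixels, which is a common convention for these benchmarks but is not stated in the text; the function name is hypothetical.

```python
import numpy as np

def split_per_class(gt, n_train, rng):
    """Randomly pick n_train labeled pixels per class; the rest become test pixels.

    gt: 2-D integer ground-truth map, with 0 assumed to mean "unlabeled".
    Returns flat pixel indices (train_idx, test_idx) into gt.ravel().
    """
    flat = gt.ravel()
    train_idx, test_idx = [], []
    for c in np.unique(flat[flat > 0]):
        idx = rng.permutation(np.flatnonzero(flat == c))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)
```

Repeating the split with a different random generator state for each run gives the independent runs over which accuracies can be averaged.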
Experimental settings
Among the proposed approach and the compared methods, four methods (RWN, RWN-LRF, SVM, and SOMP) use only spectral information, while the other methods use both spectral and contextual information. The neighbor window size w of the contextual methods was chosen from {3, 5, 7, 9}. For RWN and CRWN, the two main parameters are the trade-off coefficient λ and the number of hidden neurons L. The parameter λ was chosen from {10^-4, ⋯, 10^4} and L from {100, 200, ⋯, 3000}. The best parameters were determined by five-fold cross validation and were then used in the training and testing process. In the RWN-LRF and CRWN-LRF methods, the convolution kernel size was set to 20 for the Pavia University dataset and 21 for the Salinas dataset. The pooling size was set to 2, and the number of feature maps K was chosen from {10, 30, ⋯, 150} for both datasets. The trade-off parameter λ was also chosen from {10^-4, ⋯, 10^4} for both datasets. For SVM and CSVM classification, we applied the one-versus-one strategy using the LIBSVM Toolbox [34]. A radial basis function (RBF) kernel was adopted; the penalty term and the RBF kernel width were selected using grid search within the sets {2^-5, ⋯, 2^15} and {2^-15, ⋯, 2^3}, respectively. For SVM-CK and SOMP, we used the settings reported in the corresponding references.
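The parameter grids and the five-fold cross validation can be written down as a small, model-agnostic sketch. It assumes that the λ grid uses integer exponents from -4 to 4, and that the `fit` and `predict` callables are supplied by the user; `grid_search` and `kfold_indices` are hypothetical helper names, not part of any toolbox mentioned in the text.

```python
import numpy as np
from itertools import product

# Search grids as listed in the experimental settings (integer exponents assumed).
lambdas = [10.0 ** p for p in range(-4, 5)]   # 10^-4, ..., 10^4
hidden_sizes = list(range(100, 3001, 100))    # L for RWN / CRWN
feature_maps = list(range(10, 151, 20))       # K for RWN-LRF / CRWN-LRF
rwn_grid = list(product(lambdas, hidden_sizes))

def kfold_indices(n, k, rng):
    """Shuffle n sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def grid_search(X, y, grid, fit, predict, k=5, seed=0):
    """Return the parameter tuple with the best mean k-fold validation accuracy.

    fit(X_train, y_train, *params) -> model; predict(model, X_val) -> labels.
    """
    folds = kfold_indices(len(y), k, np.random.default_rng(seed))
    best_params, best_acc = None, -1.0
    for params in grid:
        accs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[trn], y[trn], *params)
            accs.append(np.mean(predict(model, X[val]) == y[val]))
        if np.mean(accs) > best_acc:
            best_params, best_acc = params, float(np.mean(accs))
    return best_params, best_acc
```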
All experiments were implemented in MATLAB R2014a and run on a computer with two eight-core Intel Xeon E5-2650 processors at 2.0 GHz and 128GB of memory.
Evaluation metrics
To quantitatively evaluate the quality of classification results, several measures are commonly used, i.e., the confusion matrix, overall accuracy, average accuracy, and the Kappa coefficient. The confusion matrix, also called the error matrix, is the basis of other measures such as overall accuracy and the Kappa coefficient. Let M be the confusion matrix; its element $m_{ij}$ denotes the number of pixels whose actual label is j and whose predicted label is i. The confusion matrix can also be expressed in percentage form.
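Overall accuracy (OA) is the global evaluation of the classification results, defined as the ratio of correctly classified pixels to all test pixels. This can be expressed as
$$\mathrm{OA} = \frac{\sum_{i} m_{ii}}{\sum_{i}\sum_{j} m_{ij}}.$$
Average accuracy (AA) is the mean of the per-class accuracies.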
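The Kappa coefficient [28] is based on the discrete multivariate analysis technique and involves all elements of the confusion matrix, so it is regarded as a more objective metric. The definition of the Kappa coefficient can be written as
$$\kappa = \frac{n \sum_{i} m_{ii} - \sum_{i} m_{i+}\, m_{+i}}{n^{2} - \sum_{i} m_{i+}\, m_{+i}},$$
where $n = \sum_{i}\sum_{j} m_{ij}$ is the number of test pixels, and $m_{i+}$ and $m_{+i}$ are the sums of the i-th row and the i-th column of M, respectively.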
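The metrics of this subsection can be computed directly from the confusion matrix. The following is a minimal sketch using the convention stated above (rows index the predicted label, columns the actual label) and the standard definitions of overall accuracy, average accuracy, and Cohen's Kappa; the function names are hypothetical, and it is assumed that every class has at least one test pixel.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """M[i, j] = number of pixels with predicted label i and actual label j."""
    M = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(M, (y_pred, y_true), 1)
    return M

def overall_accuracy(M):
    """Correctly classified pixels divided by all test pixels."""
    return np.trace(M) / M.sum()

def average_accuracy(M):
    """Mean of per-class accuracies (diagonal over column sums, i.e. actual counts)."""
    return np.mean(np.diag(M) / M.sum(axis=0))

def kappa(M):
    """Cohen's Kappa coefficient computed from the confusion matrix."""
    n = M.sum()
    chance = (M.sum(axis=0) * M.sum(axis=1)).sum()
    return (n * np.trace(M) - chance) / (n ** 2 - chance)
```

Unlike OA, AA weights every class equally, so it is more sensitive to errors on classes with few test pixels.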
The classification performance of the four methods on the Pavia University dataset is summarized in Table 3, where the classification accuracy for each class, OA, AA, and the κ coefficient are reported. These results are averaged over ten runs, and the standard deviation is also reported. We make several observations. First, by introducing the LRF concept, RWN-LRF provides better classification results than the ordinary RWN, with the OA increasing from 77.17% to 80.85%. Second, taking contextual information into account can significantly enhance the classification performance, especially for several classes such as Asphalt, Meadows, and Bricks. Compared to RWN, RWN-LRF, and SVM, the CRWN, CRWN-LRF, and CSVM methods achieve improvements of about 5%, 12%, and 5% in terms of OA, respectively. In general, the CRWN-LRF method provides the best classification performance, with an OA of 92.71%, an AA of 94.95%, and a κ of 0.9034. The classification maps of the different methods are further illustrated in Fig. 8. As shown in the figure, RWN, RWN-LRF, SVM, and SOMP suffer from salt-and-pepper noise because no contextual information is involved. As a result, their classification maps are not smooth. This problem is largely overcome by taking contextual information into account, as shown in Fig. 8(e, f, h and i). It can also be observed that CRWN-LRF dramatically reduces the confusion between Meadows and Bare Soil compared to the other methods.
The classification accuracies for Salinas are reported in Table 3. As the results show, compared to RWN, RWN-LRF, and SVM, the CRWN, CRWN-LRF, and CSVM methods yield higher classification accuracy, which again stresses the importance of contextual information for HSI classification. For the Salinas dataset, the CRWN-LRF method provides considerably better accuracies than all other methods, with an OA of 96.69%, an AA of 98.27%, and a κ of 0.9574. The classification maps of the different methods are further illustrated in Fig. 9.
Parameter analysis
In this subsection, we evaluate the effects of the two key parameters of the proposed CRWN-LRF method, i.e., the trade-off parameter λ and the number of feature maps K. Figure 11 depicts the overall accuracies obtained by the CRWN-LRF method with different values of λ on the Pavia University and Salinas datasets. Note that K is fixed to 150 for both datasets. It is observed that the optimal λ is 0.01 for the Pavia University dataset and 10 for the Salinas dataset.
Next, we analyze the effect of the number of feature maps K. We set λ = 0.01 for the Pavia University image and λ = 10 for the Salinas image, and we vary K from 10 to 150 for both datasets. Figure 10(a) and (b) illustrate the effects of K on running time and overall accuracy, respectively. It can be concluded from Fig. 10 that the running time of CRWN-LRF increases approximately linearly as K increases, while the overall accuracy barely rises once K is larger than 130. Thus, K is set to 150 for both datasets in the above-mentioned experiments.
Conclusion
In this paper, we propose an HSI classification method using local receptive fields based RWN. Experiments on the Pavia University and Salinas datasets demonstrate the excellent performance of our proposed approach compared to the ordinary RWN and other classical or state-of-the-art methods. In future work, a hierarchical RWN-LRF framework with multiple convolution and pooling layers will be studied to further improve HSI classification results.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grants 61125201, 61303070, U1435219, 61402507, and 61202127. The authors would like to thank Prof. P. Gamba for providing the ROSIS image of Pavia University. They also would like to thank the anonymous reviewers for their comments and suggestions, which greatly helped us to improve the technical quality and presentation of this paper.
