A lightweight 3D-2D convolutional neural network for spectral-spatial classification of hyperspectral images

Abstract

A hyperspectral image (HSI) is usually composed of hundreds of wavelength bands, which not only increases the size of the HSI rapidly but also makes it harder to classify objects accurately. Moreover, traditional machine learning schemes use only the spectral features for HSI classification and therefore neglect the spatial features, which have a significant impact on classification accuracy. To address these issues, in this paper we employ principal component analysis (PCA), the baseline feature extraction method, and a carefully designed stacked autoencoder, a deep learning-based feature extraction approach, to reduce the high dimensionality of the HSI. We then propose a novel lightweight 3D-2D convolutional neural network (CNN) framework that concurrently exploits both spatial and spectral features of the dimensionality-reduced HSI for classification. In particular, the proposed 3D-2D CNN combines 3D and 2D convolution operations to extract subtle spatial and spectral features for efficient classification. We tune the proposed 3D-2D CNN architecture, perform extensive experiments on three benchmark HSI datasets, and compare our approach with state-of-the-art classical and deep learning methods. Experimental results show that we achieve an overall accuracy of 99.73%, 99.90%, and 99.32% on the Indian Pines, Pavia University, and Kennedy Space Center datasets, respectively, which outperforms the classical machine learning and the independent 2D and 3D CNN-based state-of-the-art methods.

Keywords

Feature extraction; principal component analysis; deep learning; stacked autoencoder; classification; convolutional neural network

1 Introduction

Hyperspectral imaging is one of the most pivotal fields in satellite remote sensing, as it makes it possible to identify different ground objects and analyze their structures in great detail from far away. In particular, the hyperspectral image (HSI) is an emerging remote sensing data source. Owing to modern advances in hyperspectral camera sensor technology, hyperspectral cameras can now capture thousands of high-resolution spectral wavelength bands over the same spatial area. These high-resolution spectral bands preserve the crucial aspects of the spectrum and make it possible to differentiate between the materials and objects on the ground surface. For instance, the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) hyperspectral camera sensor can capture 224 contiguous spectral bands with wavelengths from 400 to 2500 nanometers, which range from visible to infrared light in the electromagnetic spectrum [1–4]. Owing to this enormous number of spectral bands, HSI is used extensively in vegetation analysis, precision agriculture, deforestation analysis, the identification and analysis of soil and rock compounds, and surveillance [5–9]. However, effective HSI classification faces some inherent challenges, which need to be resolved to take full advantage of the imaging system. First, neighboring spectral bands are highly correlated and therefore create a huge amount of redundant information [10–13]. Secondly, not all spectral bands are equally important for all types of applications, which implies that some bands are more prominent than others for a specific application [14, 15]. Thirdly, an insufficient number of training samples often creates an imbalance between the training set and the spectral bands, which causes the Hughes phenomenon and reduces the classification accuracy [16]. Fourthly, the large number of spectral bands leads to a high dimensionality, which makes the computational cost much higher [17]. Finally, and most importantly, combining the spectral and spatial features is critical for more accurate classification of the HSI [18–21].

To tackle the correlated bands and high dimensionality, feature selection and feature extraction (and sometimes their combination) are mostly considered [1, 22–27]. Principal component analysis (PCA) is the most widely used feature extraction technique in HSI; it extracts orthogonal features and lessens the high dimensionality. Datta [28, 29] found that PCA reduces the dimensionality of the HSI by 85%. Besides, there are several other versions of PCA, such as kernel PCA (KPCA) [30], iterative PCA (IPCA) [31], segmented PCA (SPCA) [32], folded PCA (FPCA) [33], and many more. Though these methods are memory-efficient, they are more time-consuming than the baseline PCA [34]. In addition, PCA also performs well in combination with deep learning (DL) methods [22, 23]. As such, we utilize PCA as one of the baseline feature extraction approaches in this paper. Along with that, we also carefully design a DL-based feature extraction method called a stacked autoencoder [35]. A stacked autoencoder learns a latent space representation of the original input and compresses the high-dimensional input while retaining the information needed to reconstruct it; these strengths motivate us to design a stacked autoencoder for HSI.

Most conventional machine learning approaches exploit only spectral features and, therefore, ignore the spatial features for HSI classification. Owing to the recent breakthrough of DL models in image classification, researchers are leaning towards using DL models to analyze HSIs with more precision [36–38]. Chen [39] was the first to introduce a deep belief network that incorporates both spectral and spatial features to provide more precise classification. After that, different types of DL architectures were used in HSI classification, such as the convolutional neural network (CNN) [40], the recurrent neural network (RNN) [41], the long short-term memory (LSTM) network [42], and a combination of CNN and RNN [43]. Among these, CNN has emerged as one of the most prominent architectures because of its remarkable success in image classification [44] and its use of shared local connections, which leads to fewer training parameters. Makantasis [23] used a hybrid approach that incorporates PCA and CNN for HSI classification. Though it produces a significant result, it uses a 2D CNN structure, which struggles to distinguish regions with a similar texture. The authors of [45] used a deep CNN in combination with the random forest method, where the same deep CNN architecture was used separately for each of the feature subspaces extracted by the random forest. However, this combination of CNN and random forest results in an enormous number of training parameters for such a limited number of training samples and may lead to overfitting. Zhang [46] used a dual-channel convolution layer, with one channel consisting of a 1D CNN and the other of a 2D CNN, to provide good-quality hierarchical features and improve the accuracy. However, this pair of 1D and 2D CNNs also increases the network size enormously, as there are dual channels in a single network, and the features extracted by the 1D CNN are not quite up to the mark, as they only consider the spectral bands. Recently, 3D CNNs have been introduced [47, 48] in the HSI classification field because of their ability to extract deep spectral and spatial features. Nevertheless, the 3D networks cannot extract spatial features as fine as those of 2D networks because of their structure. A dual graph CNN (GCN) [49] has been proposed, in which the first GCN extracts features from hyperspectral images and the second GCN uses label distribution learning that allows it to work with a limited number of training samples. However, a limited number of training samples sometimes leads to underfitting. Another variant of CNN, known as the multi-scale 2D CNN, was presented in [22]; it extracts multiple higher-quality spatial contexts, but the number of training parameters increases exponentially because multiple spatial contexts are taken into the model simultaneously. Training a model with an enormous number of parameters using a limited number of training samples demands a large amount of time and often overfits the model [50], which decreases the generalization performance.

From the above discussion, it can be seen that although different kinds of CNNs have been used extensively for HSI classification, 2D and 3D CNNs have been used separately. The 2D CNN has achieved remarkable accuracy, but its convolutional kernels move over only the two spatial dimensions and not over the spectral dimension, which leads to mediocre spectral features. On the other hand, although the convolutional kernels of the 3D CNN move across the spectral dimension, it does not provide as good inter-spectral learning as the 2D CNN. These observations motivate us to merge a 3D CNN and a deep 2D CNN to exploit the advantages of both in a single architecture. As such, in this paper, we propose a novel 3D-2D CNN architecture for HSI classification. The proposed hybrid 3D-2D CNN exploits both the spatial and spectral features of the HSI more precisely, starting from the features extracted by PCA and the stacked autoencoder. The number of trainable parameters in our proposed architecture is far smaller than in other state-of-the-art CNN approaches, which makes it less time-consuming and curbs its tendency to overfit. We consider three different benchmark HSI datasets for extensive performance evaluation of our proposed framework. We have achieved an accuracy of 99.73%, 99.80%, and 99.32% on the Indian Pines, Pavia University, and Kennedy Space Center (KSC) datasets, respectively, which is superior to the investigated state-of-the-art independent 2D and 3D approaches. The main reason behind this result is that our novel hybrid 3D-2D CNN makes proper use of spatial and spectral features at the same time. To summarize, the main contributions of this paper are the following:

modelling and adjusting a novel 3D-2D deep CNN architecture for spatial-spectral classification,

performance investigation of the proposed 3D-2D CNN using the PCA features,

performance analysis of the proposed 3D-2D CNN using the designed stacked autoencoder features, and

extensive and detailed experiments on three benchmark datasets comparing with the classical machine learning, and independent 2D and 3D CNN-based state-of-the-art approaches.

The rest of this paper is structured as follows. Section 2 describes the dimensionality reduction and our novel 3D-2D deep CNN architecture for spectral-spatial classification. The experiments and results are described and analyzed in Section 3, while Section 4 concludes the paper, summarizes the observations, and points out some potential future research directions.

2 Proposed approach

2.1 Approach overview

Proper classification of HSIs is challenging due to the increasing number of correlated spectral bands. First, the HSI and the corresponding ground truth are collected. Then, the high dimensionality of the HSI is reduced by PCA and by the stacked autoencoder separately. After that, 3D spatial patches are created from the dimensionality-reduced HSI. Next, the patches are split into train and test samples. Finally, the train samples are used to train the proposed 3D-2D CNN architecture, which exploits both the spectral and spatial features in great detail, and the test samples are used to evaluate the trained model. Figure 1 depicts the workflow of our proposed approach.

Fig. 1

Overview of the proposed approach for performing the HSI classification based on the spectral-spatial features.

2.2 Dimensionality reduction

2.2.1 Principal component analysis

PCA is an unsupervised feature extraction technique that is mainly used to reduce correlated spectral bands, which convey almost the same spectral information about a particular region or object [8, 51]. PCA reduces the original dataset in such a way that the structure of the original data is preserved [52]. It exploits the statistical properties of the hyperspectral bands and reduces the dimensionality through the eigenvalue decomposition of the covariance matrix formed from the spectral bands of the original HSI. First, the HSI hypercube is converted into a two-dimensional matrix, A, of dimension N × M, where N is the number of spectral bands and M is the product of the length and width of the original HSI band image. Each pixel in the image is described by a single spectral vector, i.e., a column of A, which can be written as x_n = [x_n1, x_n2, x_n3, ..., x_nN]^T with n ∈ {1, ..., M}. The mean vector (μ) of the matrix A is determined as follows.
$μ = \frac{1}{M} \sum_{n = 1}^{M} x_{n} .$ (1)
After that, the mean-adjusted HSI, Y, is calculated by subtracting the mean vector from every column of the data matrix, A.
$Y = A - μ .$ (2)
Now, the covariance matrix C is determined as follows.
$C = \frac{1}{M} \sum_{n = 1}^{M} (x_{n} - μ) (x_{n} - μ)^{T} = \frac{1}{M} Y Y^{T},$ (3)
where the dimensionality of the covariance matrix is N × N. Afterwards, the eigenvalues and eigenvectors are determined using the eigenvalue decomposition, which takes the following form.
$C = Z D Z^{T},$ (4)
where D is the diagonal matrix of the eigenvalues and Z is the matrix containing all orthogonal eigenvectors, also known as principal components. The eigenvectors are then arranged in descending order of their eigenvalues, so that the first principal component covers the highest variance and the last one covers the lowest variance [24]. Such a distribution of principal components allows us to take the first t components, where t << N, and create a matrix w of size N × t. To obtain the transformed data matrix, A′, of size t × M, we multiply the transpose of w with the mean-adjusted data Y as follows.
$A^{'} = w^{T} Y .$ (5)
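The steps in Eqs. (1)-(5) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `pca_reduce`, the toy cube size, and the choice t = 4 are hypothetical, and the cube is random data rather than a real HSI.

```python
import numpy as np

def pca_reduce(cube, t):
    """Reduce an (H, W, N) cube to (H, W, t) following Eqs. (1)-(5)."""
    h, w, n_bands = cube.shape
    # A is N x M: one column per pixel spectrum, M = H * W.
    A = cube.reshape(-1, n_bands).T.astype(np.float64)
    mu = A.mean(axis=1, keepdims=True)      # Eq. (1): mean spectral vector
    Y = A - mu                              # Eq. (2): mean-adjusted data
    C = (Y @ Y.T) / A.shape[1]              # Eq. (3): N x N covariance matrix
    eigvals, Z = np.linalg.eigh(C)          # Eq. (4): C = Z D Z^T (C is symmetric)
    order = np.argsort(eigvals)[::-1]       # descending variance
    w_mat = Z[:, order[:t]]                 # N x t matrix of the first t components
    A_reduced = w_mat.T @ Y                 # Eq. (5): t x M transformed data
    return A_reduced.T.reshape(h, w, t), eigvals[order]

rng = np.random.default_rng(0)
cube = rng.normal(size=(6, 5, 20))          # toy cube (hypothetical sizes)
reduced, variances = pca_reduce(cube, t=4)
print(reduced.shape)
```

Because the columns of w are orthogonal eigenvectors of C, the t retained features are mutually uncorrelated and their variances equal the t largest eigenvalues.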

2.2.2 Stacked autoencoder

Although PCA is computationally inexpensive, it becomes inefficient at extracting proper features when the dataset is non-linearly distributed. Alternatively, the autoencoder is an unsupervised feature extraction technique that mainly uses three layers (input, hidden, and output) to reconstruct the original input data at the output layer via the reduced intermediate layer [53]. The main purpose of an autoencoder is to learn a latent space representation of the original data. The hidden layer of the autoencoder preserves the information about the input data in a much lower dimensionality. As the original data can be reconstructed from this lower-dimensional data, this provides an efficient and accurate way of reducing the dimensionality of high-dimensional data [54].

The stacked autoencoder is a variant of the autoencoder in which, instead of a single hidden layer, multiple hidden layers are used between the input and output layers. This leads to more precise and accurate feature extraction through the abstraction of the hidden layers. The stacked autoencoder is used in HSI to reduce the correlation and noise in hyperspectral images and thus to provide more useful features for HSI classification [55]. The stacked autoencoder is trained through the back-propagation algorithm to extract the required features. We describe the working principle of the stacked autoencoder for reducing the dimensionality of the HSI as follows. First, the hyperspectral cube is reshaped into a 2D data matrix, C, of dimension D × E, where D is the product of the length and width of the band image and E is the number of spectral bands in the HSI. The stacked autoencoder used in this experiment is illustrated in Fig. 2, where the original input matrix C is reduced to a total of t features through three successive hidden layers, called the encoder, and then reconstructed into the signal C′ through two hidden layers, called the decoder. Each row of C, i.e., the spectrum of a single pixel, is fed into the stacked autoencoder as a column vector P. The encoder and decoder are symmetric most of the time and consist of the same number of hidden layers and nodes. The training process for the autoencoder is based upon the reconstruction of the original signal at the final output layer [35]. This training process allows the hidden layers to learn the coding representation of the entire input matrix. The reconstruction of the original signal at the output layer can be determined by Equation (6).
$\begin{matrix} R = f (W_{r} P + B_{r}) \\ L = f (W_{l} R + B_{l}) \\ Q = f (W_{q} L + B_{q}) \\ T = f (W_{t} Q + B_{t}) \\ U = f (W_{u} T + B_{u}) \\ Y = f (W_{y} U + B_{y}) \end{matrix}$ (6)
where W_r, W_l, W_q, W_t, W_u and W_y are the weights and B_r, B_l, B_q, B_t, B_u and B_y are the biases of the layers R, L, Q, T, U and Y, respectively, P is the input layer, and f () denotes the activation function of a layer. The goal of the training is to make the signal C′ as close to C as possible, that is, to reduce the error between C′ and C. The back-propagation updates the weights and biases in such a way that the error between C′ and C reduces to a minimum [35]. As such, the stacked autoencoder is trained by optimizing the parameters to minimize the error as follows.
$\min_{W_{r}, W_{l}, W_{q}, W_{t}, W_{u}, W_{y}, B_{r}, B_{l}, B_{q}, B_{t}, B_{u}, B_{y}} [error (C^{'}, C)] .$ (7)

Fig. 2

Graphical representation of the stacked autoencoder designed in the experiment.

The middle layer of the stacked autoencoder, also known as the bottleneck layer, is mainly responsible for reducing the dimensionality of the original data. Once the training process is completed and the error between the original signal C and the reconstructed signal C′ is minimized, the t nodes of the bottleneck layer Q can be used to represent the primary data C, as the neurons of hidden layer Q have encoded the structure of the original data that is required for reconstruction. In this way, the stacked autoencoder is utilized for reducing the number of spectral bands from E to t (t << E), abating the correlation and noise in the HSI. Note that in this case only the spectral bands are affected.
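The layer-by-layer computation of Eq. (6) and the role of the bottleneck layer Q can be illustrated with a small NumPy forward pass. This is a sketch under assumptions rather than the trained model: the layer widths, the tanh activation, and the mean squared error are hypothetical choices, the weights are random, and the back-propagation training that minimizes Eq. (7) is omitted.

```python
import numpy as np

def init_layers(sizes, rng):
    """Create one (W, B) pair per pair of consecutive layer sizes."""
    return [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros((n_out, 1)))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, P, f=np.tanh):
    """Apply out = f(W @ inp + B) for each layer; return every activation."""
    activations = [P]
    for W, B in layers:
        activations.append(f(W @ activations[-1] + B))
    return activations

E, t = 20, 4                              # spectral bands and bottleneck size (toy values)
sizes = [E, 12, 8, t, 8, 12, E]           # layers P, R, L, Q, T, U, Y (assumed widths)
rng = np.random.default_rng(0)
layers = init_layers(sizes, rng)

C = rng.uniform(-1.0, 1.0, size=(30, E))  # 30 toy pixel spectra, one per row
acts = forward(layers, C.T)               # each column is one input vector P
Q = acts[3]                               # bottleneck features, shape t x D
C_rec = acts[-1].T                        # reconstruction C', shape D x E
error = float(np.mean((C_rec - C) ** 2))  # one possible error(C', C)
print(Q.shape, C_rec.shape, error)
```

After training, only the encoder half (up to Q) would be applied to every pixel to obtain the t reduced features.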

2.3 3D and 2D convolutional neural network

Among the deep neural network architectures, CNN has emerged as the most promising architecture in the field of image classification [44]. This is mainly because of its significant performance in the ImageNet challenge [56] and the fact that it is inspired by the human visual system. A CNN mainly consists of three parts. The first is the convolution layer, where the convolution operation is performed between the input image and different kernels in order to produce feature maps [57]. This is the main component that leads to deep, abstract, hierarchical features. The second is the pooling layer, which provides translation invariance and reduces the dimensionality of the feature maps. Finally, there is a fully connected layer that summarizes which features are present in the image [58]. In a 2D CNN, the kernel only moves in two directions over the input image, i.e., length-wise and height-wise, producing superior spatial features in the case of HSI. This also provides the opportunity for cross-channel learning in HSIs. The convolutional operation of the 2D CNN can be denoted by the following equation.
$G_{i, j}^{x, y} = f (\sum_{n} \sum_{h = 0}^{H_{i} - 1} \sum_{w = 0}^{W_{i} - 1} K_{i, j, n}^{h, w} G_{(i - 1), n}^{(x + h), (y + w)} + b_{i, j}),$ (8)
where $K_{i, j, n}^{h, w}$ denotes the value at position (h, w) of the j^th kernel of the i^th convolutional layer connected to the n^th feature map of the previous layer, H_i and W_i represent the height and width of the filter, $G_{(i - 1), n}^{(x + h), (y + w)}$ symbolizes the value of the n^th feature map of the previous layer at position (x + h, y + w), $G_{i, j}^{x, y}$ represents the output of the j^th feature map of the i^th layer at position (x, y), b_i,j represents the bias of the j^th feature map of the i^th layer, and n indexes the feature maps of the previous layer. Finally, f () denotes the ReLU activation function, which is responsible for introducing non-linearity into the feature maps. Different convolutional layers consist of different numbers of kernels and kernel sizes, optimized by a hyperparameter tuning strategy. Each 2D feature map contains the features extracted by a specific convolutional kernel. As the input HSI is 3D (length, width, and spectral band), the output after multiple convolutional kernels is also 3D, but the third dimension of the output feature maps does not refer to spectral bands; rather, it represents the number of kernels used in the convolutional layer. In the 3D CNN, the kernels move across three dimensions, i.e., length, width, and spectral depth, which eventually leads to better spectral features than an ordinary 2D CNN [47]. The 3D convolution operation can be specified by the equation below.
$V_{i, j}^{x, y, z} = f (\sum_{τ = 1}^{d_{i - 1}} \sum_{λ = - ν}^{ν} \sum_{ρ = - γ}^{γ} \sum_{φ = - δ}^{δ} K_{i, j, τ}^{λ, ρ, φ} V_{(i - 1), τ}^{(x + λ), (y + ρ), (z + φ)}),$ (9)
where $V_{i, j}^{x, y, z}$ represents the value of the j^th feature map of the i^th layer at location (x, y, z), d_{i-1} denotes the number of 3D feature maps produced at the (i - 1)^th layer, and K_i,j,τ represents the kernel of the j^th feature map of the i^th layer that is applied to the τ^th feature map of the previous layer. 2ν + 1, 2γ + 1 and 2δ + 1 represent the height, width and depth of the kernel used in the 3D convolutional layer, respectively. Finally, f () denotes the activation function used in the 3D convolutional layer. The 3D CNN extracts more accurate spectral features than the 2D CNN while also extracting spatial features of the HSI. A properly tuned hybrid architecture that combines the benefits of both 3D and 2D CNNs is thus more likely to produce high-quality spectral and spatial features that lead to a more accurate HSI classification.
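To make the 3D convolution of Eq. (9) concrete, the following NumPy sketch evaluates it with explicit loops. It is an illustration only: the function name `conv3d_valid` is hypothetical, no padding and unit stride are assumed, ReLU is assumed for f, and the window is anchored at its corner rather than its center, which only shifts the output indices relative to Eq. (9).

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def conv3d_valid(V_prev, K, f=relu):
    """Naive 3D convolution without padding.

    V_prev: (maps_in, X, Y, Z) feature maps of the previous layer.
    K:      (maps_out, maps_in, kx, ky, kz) kernels of the current layer.
    """
    m_out, m_in, kx, ky, kz = K.shape
    _, X, Y, Z = V_prev.shape
    out = np.zeros((m_out, X - kx + 1, Y - ky + 1, Z - kz + 1))
    for j in range(m_out):                       # output feature map index
        for x in range(out.shape[1]):
            for y in range(out.shape[2]):
                for z in range(out.shape[3]):
                    # Sum over all input maps and all kernel offsets.
                    window = V_prev[:, x:x + kx, y:y + ky, z:z + kz]
                    out[j, x, y, z] = np.sum(K[j] * window)
    return f(out)

patch = np.ones((1, 5, 5, 9))                    # one 5 x 5 patch with 9 bands
kernels = np.ones((2, 1, 3, 3, 3))               # two 3 x 3 x 3 kernels
features = conv3d_valid(patch, kernels)
print(features.shape)
```

Restricting the kernel depth to the full number of input channels and dropping the z loop gives the 2D case of Eq. (8), where the kernel slides over the two spatial dimensions only.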

2.4 Proposed hybrid 3D-2D CNN architecture

We propose a hybrid 3D-2D CNN to exploit the advantages of both the 3D CNN and the 2D CNN together. To make the 3D hyperspectral cube compatible with the CNN input, we create several small 3D spatial patches of dimension S × S × N, where S represents the size of the spatial context and N represents the number of spectral bands. The extraction process is illustrated in Fig. 3. The ground truth for a specific patch is that of its center pixel. Note that the spatial size, S, could be 3, 5, 7, 9, 11, 13, etc., and its appropriate value must be tuned. If the size is too small, we might not have enough spatial context; on the other hand, if the size is too big, spatial noise might be included while classifying the HSI. We perform an exhaustive search over the available values of the spatial context to adjust it. After creating the 3D spatial patches, we pass the patches through four 3D convolutional layers to acquire finer spectral features along the depth dimension. The number of layers, the number of filters used in each layer, and the size of the filters are optimized through a hyperparameter search for extracting higher-quality spatial and spectral features. With each successive 3D convolution layer, we decrease the depth dimension of the filter in order to extract more abstract features. There are mainly two ways of combining 3D and 2D CNNs. The first is to perform the 3D convolutional operations and then the 2D convolutional operations. The second is to perform the 2D convolutional operations first, reshape the output feature map, and then perform the 3D convolutional operations. However, in the second way, we lose some information about the third dimension of the feature maps, which leads to poorer spectral and spatial features. As such, we perform the 3D convolutional operations first and then the 2D convolutional operations.

Fig. 3

Extraction of 3D spatial patches from dimensionality reduced HSI.
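The patch extraction can be sketched as follows. This is a minimal illustration with assumptions that the text does not specify: the function name `extract_patches` is hypothetical, border pixels are handled by zero-padding the cube, and a label of 0 is taken to mean "unlabelled".

```python
import numpy as np

def extract_patches(cube, labels, S):
    """Cut one S x S x N patch around every labelled pixel.

    cube:   (H, W, N) dimensionality-reduced HSI.
    labels: (H, W) ground truth, 0 = unlabelled (assumption).
    """
    assert S % 2 == 1, "S must be odd so that the patch has a center pixel"
    r = S // 2
    # Zero-pad the two spatial axes so that border pixels also get full patches.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="constant")
    patches, patch_labels = [], []
    for i, j in zip(*np.nonzero(labels)):
        patches.append(padded[i:i + S, j:j + S, :])   # centered on pixel (i, j)
        patch_labels.append(labels[i, j])             # label of the center pixel
    return np.stack(patches), np.array(patch_labels)

cube = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)   # toy 4 x 4 image, 3 bands
labels = np.zeros((4, 4), dtype=int)
labels[0, 0], labels[1, 2] = 1, 3
X, y = extract_patches(cube, labels, S=3)
print(X.shape, y)
```

Each returned patch carries the class of its center pixel, so the patches and labels can be split directly into train and test samples.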

Afterwards, we reshape the 3D feature maps produced by the 3D convolutional layers and pass them through a 2D convolutional layer. This combination leads to more accurate discrimination and enhancement of the spatial features than using only 3D convolutional layers. The 2D convolutional layer also provides a way for cross-channel learning that helps tune the spatial features across the spectral bands. In each convolutional layer, the ReLU activation function is used to introduce non-linearity into the extracted features. The number of 2D convolutional layers, the number of kernels in each layer, and the filter size are determined through an extensive grid search based upon previously known 2D networks. Then, we flatten the 2D feature maps produced by the 2D convolutional layer into a one-dimensional tensor containing the spectral and spatial features extracted by the 3D-2D hybrid architecture. Finally, a multilayer perceptron (MLP) network with two hidden layers is used to perform the HSI classification based on the spectral and spatial features extracted by the hybrid 3D-2D CNN. We also use ReLU activation functions in the hidden layers of the MLP. The number of nodes in the output layer depends upon the number of classes in each hyperspectral image. We use the Softmax activation function in the output layer, as each HSI in our experiments contains more than two ground classes and Softmax provides a multinomial probability distribution, which gives a more intuitive indication of the class to which a sample belongs. Figure 4 illustrates the detailed design of our proposed hybrid 3D-2D architecture and Table 1 lists the number and sizes of the filters used in each convolutional layer.

Fig. 4

Structure of the proposed hybrid 3D-2D CNN architecture, where S and N respectively denote the spatial context and number of spectral bands in the dimensionality reduced image.

Table 1

Description of filters used in each convolutional layer

Filter Information	3D Convolutional Layer 1	3D Convolutional Layer 2	3D Convolutional Layer 3	3D Convolutional Layer 4	2D Convolutional Layer
Number of Filters	8	16	32	64	128
Size of Filters	3×3×9	3×3×7	3×3×5	3×3×3	3×3
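The filter configuration in Table 1 can be turned into a quick shape and parameter count. The sketch below is only an arithmetic illustration under assumptions that the text does not state: convolutions without padding and with unit stride, a single input channel, one bias per filter, and illustrative values S = 13 and N = 30. The fully connected layers are not counted because their widths are not given here.

```python
# Filters and kernel sizes (height, width, depth) taken from Table 1.
CONV3D = [(8, (3, 3, 9)), (16, (3, 3, 7)), (32, (3, 3, 5)), (64, (3, 3, 3))]
CONV2D = (128, (3, 3))

def trace(S, N):
    """Return (layer type, output shape, parameter count) for each conv layer."""
    shape = (S, S, N, 1)                     # height, width, depth, channels
    report = []
    for filters, (kh, kw, kd) in CONV3D:
        h, w, d, c = shape
        params = filters * (kh * kw * kd * c + 1)          # weights plus biases
        shape = (h - kh + 1, w - kw + 1, d - kd + 1, filters)
        report.append(("conv3d", shape, params))
    h, w, d, c = shape
    channels_2d = d * c                      # reshape: fold depth into channels
    filters, (kh, kw) = CONV2D
    params = filters * (kh * kw * channels_2d + 1)
    report.append(("conv2d", (h - kh + 1, w - kw + 1, filters), params))
    return report

report = trace(S=13, N=30)
for name, shape, params in report:
    print(name, shape, params)
total = sum(params for _, _, params in report)
print("convolutional parameters:", total)
```

Under these assumptions the four 3D layers account for 87,168 parameters, and the single 2D layer dominates the count because the reshape folds the remaining spectral depth into its input channels.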

3 Experimental result and analysis

3.1 Experimental setting

We consider three benchmark HSI datasets, i.e., Indian Pines, Pavia University and Kennedy Space Center (KSC), for the experimental validation of our proposed spectral-spatial classification framework against the state-of-the-art DL-based approaches. We first perform standalone experiments using our 3D-2D CNN architecture preceded by the PCA and stacked autoencoder-based feature reduction methods on all three datasets. We denote the pair of PCA and our proposed 3D-2D CNN by PCA-Hybrid 3D-2D CNN and the pair of stacked autoencoder and our 3D-2D CNN by SAE-Hybrid 3D-2D CNN. In this way, we first fine-tune the hyperparameters associated with our proposed framework and then conduct a comparative assessment of the 3D-2D CNN against the baseline independent 2D and 3D CNN-based approaches. We also include some classical machine learning based approaches, such as the Support Vector Machine (SVM), in the comparative study. We consider average accuracy (AA), overall accuracy (OA) and Cohen’s Kappa score as the evaluation metrics. All of these metrics are based on the confusion matrix of the test dataset and can be used to evaluate the performance on any multiclass problem. AA is calculated by taking the average of the individual accuracies of the classes, and OA denotes the number of correctly predicted samples divided by the total number of predicted samples. AA and OA are complementary to one another, as together they provide a clear picture of which classes are more difficult to predict in a multiclass environment. Cohen’s Kappa indicates the difference between the obtained overall accuracy of a classification model and the overall accuracy achieved by random guessing, i.e., the gap between the actual estimation and the chance estimation. There is a positive correlation between the Kappa score and the efficacy of the model, i.e., a high Kappa score indicates a better classification model. In the case of multiclass classification, where the receiver operating characteristic (ROC) curves become difficult to interpret, the Kappa score can be used as a good alternative. OA and the Kappa score are determined as follows.
$\begin{matrix} OA = \frac{\sum_{j = 1}^{ψ} {CM}_{jj}}{T} \\ Kappa = \frac{T \times \sum_{j = 1}^{ψ} {CM}_{jj} - \sum_{j = 1}^{ψ} C_{j +} C_{+ j}}{T^{2} - \sum_{j = 1}^{ψ} C_{j +} C_{+ j}} \end{matrix}$ (10)
where $C_{j +} = \sum_{k = 1}^{ψ} {CM}_{jk}$, $C_{+ j} = \sum_{k = 1}^{ψ} {CM}_{kj}$, ψ is the total number of classes in the confusion matrix, T is the total number of test samples, and CM_jj denotes the number of samples belonging to the j^th class and classified into the j^th class. C_j+ and C_+j respectively represent the sums of the elements of the j^th row and the j^th column of the confusion matrix, CM.
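As a worked illustration of Eq. (10), the three metrics can be computed from a confusion matrix in plain Python. The matrix below is made-up toy data, not a result from the experiments, and the code assumes that rows hold the true classes and columns the predicted classes.

```python
def classification_metrics(cm):
    """Return (OA, AA, Kappa) for a square confusion matrix given as nested lists."""
    psi = len(cm)                                     # number of classes
    T = sum(sum(row) for row in cm)                   # total number of test samples
    diag = sum(cm[j][j] for j in range(psi))          # correctly classified samples
    row_tot = [sum(cm[j]) for j in range(psi)]        # C_{j+}
    col_tot = [sum(cm[k][j] for k in range(psi)) for j in range(psi)]  # C_{+j}
    oa = diag / T
    # AA: mean of the per-class accuracies (rows assumed to be true classes).
    aa = sum(cm[j][j] / row_tot[j] for j in range(psi)) / psi
    chance = sum(r * c for r, c in zip(row_tot, col_tot))
    kappa = (T * diag - chance) / (T * T - chance)
    return oa, aa, kappa

cm = [[50, 2, 0],
      [3, 40, 5],
      [0, 4, 46]]                                     # toy 3-class confusion matrix
oa, aa, kappa = classification_metrics(cm)
print(round(oa, 4), round(aa, 4), round(kappa, 4))
```

For this toy matrix T = 150 and 136 samples lie on the diagonal, so OA = 136/150, while Kappa is lower than OA because part of that agreement is expected by chance.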

For the implementation, we use Google Colaboratory as the online computing platform, which provides GPU support for training the DL models. It generally offers 13 GB of RAM and 11-16 GB of graphics memory. We build our proposed hybrid 3D-2D architecture using Keras, a Python framework for creating and deploying DL architectures that runs on top of TensorFlow, another framework developed by Google.

3.2 Dataset description

3.2.1 Indian Pines dataset

The Indian Pines dataset was acquired by the AVIRIS sensor over the Indian Pines test site in north-western Indiana [59]. This HSI consists of 224 contiguous spectral bands with wavelengths from 400 to 2500 nanometers and has a size of 145 × 145 pixels with a spatial resolution of 20 meters per pixel. The ground truth for this dataset mainly consists of agriculture, forests, and other natural perennial vegetation. There are a total of 16 different classes. We work with 200 spectral bands after excluding the spectral bands covering the water absorption regions. A total of nine classes are considered for this experiment, omitting the classes with very few training and testing samples. Figure 5 depicts a sample band image and the ground truth reference. The numbers of training and testing samples for our experiment are provided in Table 2.

Fig. 5

Indian Pines HSI dataset. (a) Sample color image. (b) Ground truth map.

Table 2

Number of training and testing samples for Indian Pines HSI

Class Name	Train samples	Test samples
Corn-notill	856	572
Corn-mintill	498	332
Grass-pasture	290	193
Grass-trees	438	292
Hay-windrowed	287	191
Soybean-notill	583	389
Soybean-mintill	1473	982
Soybean-clean	359	237
Woods	759	506

3.2.2 Pavia university dataset

The Pavia University dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, northern Italy, during a flight campaign. This dataset comprises 102 spectral bands covering wavelengths from 430 to 960 nanometers. The size of the image is 610 × 610 pixels with a spatial resolution of 1.3 meters per pixel, but some of the samples contain no information. Therefore, they are discarded before further analysis, resulting in a dataset with a size of 610 × 340. There is a total of 9 different classes representing different kinds of objects and vegetation in the ground truth. From the hyperspectral signatures of the different classes, it is evident that some of the classes have a similar spectral signature. That is why it is essential to introduce spatial information while performing classification. The original Pavia University dataset has been provided by Prof. Paolo Gamba of Pavia University. Figure 6 portrays the image and ground truth for this dataset. The numbers of training and testing samples used in our experiment are given in Table 3.

Fig. 6

Pavia University Dataset. (a) Sample image. (b) Ground truth map.

Table 3

Number of training and testing samples for Pavia University HSI

Class Name	Train samples	Test samples
Asphalt	3979	2652
Meadows	11189	7460
Gravel	1260	839
Trees	1838	1226
Painted metal sheets	807	538
Bare Soil	3017	2012
Bitumen	798	532
Self-Blocking Bricks	2209	1473
Shadows	568	379

3.2.3 KSC dataset

The KSC dataset was also collected by the AVIRIS sensor. This HSI was captured by NASA over the Kennedy Space Center (KSC) area in Florida. The image was acquired in 224 contiguous spectral bands, each of which is 10 nanometers wide. After removing the water absorption bands and the bands with a low signal-to-noise ratio, the total number of spectral bands is 176. The size of the image is 512 × 614 pixels with a spatial resolution of 18 meters per pixel. There are 13 classes in total, representing various kinds of land cover in the swamp and dry land of that area. Some of the classes have mixed hyperspectral signatures that are very hard to distinguish in low dimension. For this reason, discerning between land covers in this image using only the spectral signature is very difficult. Figure 7 provides a representation of the image and ground truth reference. The distribution of training and testing samples for our experiment is provided in Table 4.

Fig. 7

KSC HSI dataset. (a) Sample image. (b) Ground truth map.

Table 4

Number of training and testing samples for KSC dataset

Class Name	Train Samples	Test Samples
Scrub	456	305
Willow-swamp	146	97
Cabbage-palm-hammock	154	102
Cabbage-palm/oak-hammock	151	101
Slash-pine	97	64
Oak/broadleaf-hammock	137	92
Hardwood-swamp	63	42
Graminoid-marsh	259	172
Spartina-Marsh	312	208
Cattail-marsh	242	162
salt-marsh	251	168
Mud-Flats	302	201
Water	556	371

3.3 Hyperparameter tuning

Tuning the proposed hybrid 3D-2D architecture is one of the most crucial stages for accurate classification. First, we need to fix the numbers of 3D and 2D convolutional layers in our deep architecture. We start with one 3D convolutional layer and one 2D convolutional layer and then increase the numbers of both. The results are provided in Fig. 8, which shows that, for the Indian Pines dataset, the highest accuracy is achieved with three 3D convolutional layers and one 2D convolutional layer. We apply the same tuning strategy to the other datasets and find that the combination of three 3D convolutional layers and one 2D convolutional layer yields the best classification result in every case.

Fig. 8

Graphical representation of accuracy vs number of 3D and 2D convolutional layers in the proposed hybrid 3D-2D CNN using S=7.

Usually, the number of filters increases and the filter size decreases in deeper convolutional layers. This follows from the hierarchy of features in the feature maps: in the first few layers, the number of features is comparatively low but their size is moderate; as the layers go deeper, the number of features rises and their size declines. In this architecture, we do not use any pooling layer because the spatial patches are already small and there is no scale variance within a spatial patch. The ReLU activation function is used in each convolutional layer. We consider different optimizers to minimize the categorical cross-entropy loss for our multiclass classification problem. Generally, the Adam and stochastic gradient descent (SGD) optimizers are used in HSI classification. Table 5 shows that the Adam optimizer performs better than SGD with our proposed hybrid 3D-2D CNN on all the datasets used for the evaluation of the model.

Table 5

Impact of different optimizers on PCA-Hybrid 3D-2D CNN

Dataset Name	Adam	SGD
Indian Pines	99.7	98.0
Pavia University	99.8	98.5
KSC	98.62	97.28

The training dataset is split into two parts: training and validation. The validation data is used to monitor overfitting while training the model. As the number of samples is small, we use dropout and L2 regularization in each layer to prevent the model from overfitting. We use a batch size of 128 samples, which led to higher accuracy than the other batch sizes we tried. We use early stopping and a variable learning rate throughout the training process. Initially, the learning rate is set to 10^-4. After that, it is gradually decreased based on the early stopping criteria. We monitor the loss function and stop training once the loss has remained constant for a certain period. In this way, we save a noticeable amount of computation time. The number of epochs varies with the dataset, i.e., 15, 20 and 36 for the Indian Pines, Pavia University and KSC datasets, respectively, using the PCA-Hybrid 3D-2D CNN module, and 25, 35 and 42 for the Indian Pines, Pavia University and KSC datasets, respectively, using the SAE-Hybrid 3D-2D CNN module. The training and validation loss curves in Fig. 9 show that the training loss and validation loss are almost identical and, therefore, we can conclude that the model is neither overfitting nor underfitting. We can also see that, after a certain point, the training and validation losses stop decaying, which is when we stop the training process.
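
The stopping rule described above can be sketched as a small controller that watches the validation loss, lowers the learning rate when the loss plateaus, and signals a stop once the plateau persists. This is only an illustrative sketch: the class name `PlateauController`, the reduction factor and both patience values are assumptions, since the text gives only the initial learning rate of 10^-4.

```python
class PlateauController:
    """Track validation loss; reduce the learning rate on a plateau, then stop."""

    def __init__(self, lr=1e-4, factor=0.5, lr_patience=2, stop_patience=5,
                 min_delta=1e-4):
        self.lr = lr                        # initial learning rate from the text
        self.factor = factor                # assumed multiplicative reduction
        self.lr_patience = lr_patience      # assumed epochs between reductions
        self.stop_patience = stop_patience  # assumed epochs before stopping
        self.min_delta = min_delta          # smallest change counted as progress
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        if self.stale_epochs % self.lr_patience == 0:
            self.lr *= self.factor
        return self.stale_epochs >= self.stop_patience


# Made-up validation losses that flatten out after the third epoch.
controller = PlateauController()
losses = [0.90, 0.50, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30]
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if controller.update(loss):
        stopped_at = epoch
        break
print(stopped_at, controller.lr)
```

In a deep learning framework the same behaviour is usually obtained from built-in callbacks rather than hand-written logic; the sketch only makes the decision rule explicit.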

Fig. 9

Training and validation loss curve showing the training progress of our approach on three different benchmark datasets.

3.4 Impact of feature extraction on the hybrid 3D-2D CNN

In this part of the experiments, we evaluate the impact of the two aforementioned feature extraction methods, i.e., PCA and stacked autoencoder, on our proposed 3D-2D CNN framework. First, we perform PCA on each HSI dataset and then feed the reduced HSI into the proposed hybrid 3D-2D CNN. The number of principal components (PCs) is determined from the cumulative variance graph depicted in Fig. 10. For Indian Pines and Pavia University, 30 and 20 PCs, respectively, cover 99% of the total variance of each dataset. As such, we select 30 and 20 PCs for the Indian Pines and Pavia University datasets, respectively. However, we need at least 100 PCs to cover about 90% of the total variance for the KSC dataset. Therefore, we choose the first 100 PCs for the KSC dataset.
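
The selection rule used above, picking the smallest number of PCs whose cumulative variance reaches a target share, can be written in a few lines. The sketch below assumes the eigenvalues of the data covariance matrix (the per-component variances) are already available; the function name `components_for_variance` and the toy eigenvalues are hypothetical and are not taken from the datasets.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, target=0.99):
    """Return the smallest k whose top-k eigenvalues reach `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= target:
            return k
    return len(ordered)


# Toy spectrum of eigenvalues (assumed values for illustration only).
toy_eigenvalues = [60.0, 25.0, 10.0, 4.5, 0.3, 0.2]
k_99 = components_for_variance(toy_eigenvalues, target=0.99)
k_90 = components_for_variance(toy_eigenvalues, target=0.90)
print(k_99, k_90)
```

A dataset whose variance is spread over many components, as reported for KSC, makes this rule return a large k even for a modest target, which is the behaviour seen in Fig. 10.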

Fig. 10

Graphical representation of the cumulative variance of PCs of the three benchmark datasets.

Second, we implement a stacked autoencoder for each of the datasets to refine the relevant features and feed these features to the proposed hybrid 3D-2D CNN architecture. In a stacked autoencoder, there are no hard and fast rules for selecting the number of nodes in the bottleneck layer, which is essentially the number of features extracted by the stacked autoencoder. We use a grid search strategy to find the appropriate number of nodes in the bottleneck layer. The search space is determined from heuristics and from previous works that apply autoencoders to these datasets. Fig. 11 shows that 30, 20 and 19 nodes in the bottleneck layer yield the maximum classification accuracy for the Indian Pines, Pavia University and KSC datasets, respectively.

Fig. 11

Graphical representation of accuracy vs number of nodes in the bottleneck layer of the proposed SAE-Hybrid 3D-2D CNN architecture.

From the above discussion, it is clear that, for the KSC dataset, the number of PCs is far greater than the number of nodes used in the bottleneck layer of the stacked autoencoder. Although the PCA approach outperforms the stacked autoencoder by a slight margin, the network built with the stacked autoencoder takes much less time to train, because the smaller number of features extracted by the stacked autoencoder drastically reduces the total number of trainable parameters. The large number of PCs is mainly due to the heavy non-linearity that exists in the KSC dataset; as a linear method, PCA cannot reduce the number of spectral bands as effectively as the stacked autoencoder. On the other hand, PCA performs comparatively better than the stacked autoencoder on the Indian Pines and Pavia University datasets, where the number of PCs is much smaller than the number of original spectral bands.

3.5 Classification result and evaluation

We use three accuracy metrics, average accuracy (AA), overall accuracy (OA) and Cohen's Kappa score, to evaluate our proposed 3D-2D CNN architecture on the three benchmark datasets with the two dimensionality reduction methods (PCA and stacked autoencoder). The accuracy results are provided in Tables 6, 7 and 8, which show that our proposed 3D-2D CNN architecture paired with PCA or stacked autoencoder achieves high classification accuracy on all three benchmark datasets. In particular, the proposed 3D-2D CNN framework paired with PCA obtains an overall accuracy of 99.73%, 99.90% and 99.32% for the Indian Pines, Pavia University, and KSC datasets, respectively. These are slightly higher than the accuracies achieved by pairing the stacked autoencoder with our proposed 3D-2D CNN architecture, which are 99.70%, 99.87%, and 97.93% for the three datasets, respectively. The Cohen's Kappa score is also high for our 3D-2D CNN architecture, which indicates the robust and unbiased performance of the proposed model. We also consider increasing the spatial context size, S, beyond the values reported in the tables. Accuracy degrades as S grows further, mainly because a larger window introduces more noise into the feature space. Figure 12 portrays the relationship between average accuracy and S. From these experimental results, we conclude that the best classification accuracy is achieved when S equals 7 and that accuracy declines for other values. Fig. 12 also shows that the number of trainable parameters increases dramatically with S for all three benchmark datasets.
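
For reference, the three metrics can be computed from a confusion matrix as in the sketch below. It uses the standard definitions: OA is the fraction of correctly classified samples, AA is the unweighted mean of the per-class accuracies, and Kappa corrects OA for chance agreement. The function name `classification_scores` and the toy matrix are illustrative, and the code assumes every class has at least one test sample.

```python
def classification_scores(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    overall = correct / total
    # Per-class accuracy (recall), averaged without weighting by class size.
    average = sum(
        confusion[i][i] / sum(confusion[i]) for i in range(n_classes)
    ) / n_classes

    # Chance agreement from the row (true) and column (predicted) marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (overall - expected) / (1 - expected)
    return overall, average, kappa


# Toy two-class confusion matrix (assumed counts, not results from the paper).
toy = [[45, 5], [10, 40]]
oa, aa, kappa = classification_scores(toy)
print(round(oa, 4), round(aa, 4), round(kappa, 4))
```

Because AA weights every class equally, it drops sharply when a small class is poorly classified, whereas OA is dominated by the large classes.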

Table 6

Classification accuracy of the proposed hybrid 3D-2D CNN architecture using PCA and stacked autoencoder on Indian Pines dataset

Class Name	PCA-Hybrid 3D-2D CNN			SAE-Hybrid 3D-2D CNN
	S=3	S=5	S=7	S=3	S=5	S=7
Corn-notill	95.97	98.59	99.98	90.73	98.95	99.53
Corn-mintill	94.27	99.21	99.71	83.45	97.59	99.69
Grass-pasture	96.30	99.98	99.98	97.92	97.92	99.98
Grass-trees	99.33	99.98	99.98	98.28	99.31	99.98
Hay-windrowed	99.98	99.98	99.98	99.98	99.98	99.98
Soybean-notill	93.05	98.67	99.78	77.65	95.13	99.24
Soybean-mintill	96.74	98.23	99.24	84.12	95.11	99.89
Soybean-clean	94.93	98.33	99.98	86.91	92.40	99.15
Woods	99.80	99.98	99.98	99.61	99.80	99.98
Average Accuracy	96.72	99.22	99.84	90.96	97.35	99.71
Overall Accuracy	96.67	99.89	99.70	89.36	97.13	99.73
Kappa Score	96.18	99.30	99.74	90.50	97.10	99.70

Table 7

Classification accuracy of the proposed hybrid 3D-2D CNN architecture using PCA and stacked autoencoder on Pavia University dataset

Class Name	PCA-Hybrid 3D-2D CNN			SAE-Hybrid 3D-2D CNN
	S=3	S=5	S=7	S=3	S=5	S=7
Asphalt	99.02	99.71	99.98	99.05	98.98	99.84
Meadows	99.54	99.94	99.94	99.87	99.91	99.98
Gravel	89.86	97.51	99.88	94.89	98.80	99.04
Trees	98.85	99.42	99.85	98.45	99.18	99.83
Painted metal sheets	99.62	99.98	99.97	99.98	99.96	99.97
Bare Soil	99.20	99.98	99.95	98.80	99.90	99.95
Bitumen	95.86	99.62	99.46	95.30	99.26	99.65
Self-Blocking Bricks	94.63	98.91	99.66	97.62	98.55	99.81
Shadows	99.98	99.20	99.73	99.73	99.97	99.98
Average Accuracy	97.38	99.36	99.82	98.19	99.37	99.78
Overall Accuracy	98.35	99.64	99.90	98.94	99.52	99.87
Kappa Score	97.81	99.52	99.86	98.59	99.36	99.83

Table 8

Classification accuracy of the proposed hybrid 3D-2D CNN architecture using PCA and Stacked Autoencoder on KSC dataset

Class Name	PCA-Hybrid 3D-2D CNN			SAE-Hybrid 3D-2D CNN
	S=3	S=5	S=7	S=3	S=5	S=7
Scrub	98.03	99.67	99.98	99.11	97.70	99.95
Willow-swamp	91.75	95.90	97.95	92.88	98.96	99.98
Cabbage-palm-hammock	96.07	98.23	99.98	93.13	99.97	98.70
Cabbage-palm/oak-hammock	79.20	86.13	96.20	74.25	75.84	82.77
Slash-pine	50.20	78.12	89.98	67.18	68.75	82.91
Oak/broadleaf-hammock	69.56	95.65	99.96	77.17	90.31	95.65
Hardwood-swamp	92.85	99.98	99.98	92.86	99.96	99.45
Graminoid-marsh	97.09	98.25	99.97	94.19	97.19	99.41
Spartina-Marsh	99.53	99.98	99.98	99.52	99.51	99.98
Cattail-marsh	99.38	99.44	99.97	93.83	97.73	99.97
salt-marsh	99.40	99.98	99.96	98.81	97.81	99.72
Mud-Flats	98.50	99.97	99.94	96.02	99.96	97.66
Water	99.98	99.95	99.98	99.98	99.98	99.97
Average Accuracy	90.12	96.25	98.75	90.69	94.12	96.62
Overall Accuracy	94.58	97.93	99.32	94.29	96.35	97.93
Kappa Score	93.96	97.70	99.25	93.64	95.94	97.70

Fig. 12

Graphical representation of the accuracy vs size of spatial context in our proposed PCA-Hybrid 3D-2D CNN.

Table 9 lists the number of trainable parameters of the standalone hybrid 3D-2D CNN, PCA-Hybrid 3D-2D CNN, SAE-Hybrid 3D-2D CNN and other state-of-the-art models, namely PCA-Multi Scale CNN [22] and ResNet50 [60]. SAE-Hybrid 3D-2D CNN has the lowest number of trainable parameters, far fewer than the other state-of-the-art approaches. The greater the number of trainable parameters, the longer it takes to train the model and the more computational cost it incurs. From this analysis, we can say that our proposed approach requires less computation time and memory than the other state-of-the-art approaches because it has fewer trainable parameters. This is why we employ dimensionality reduction techniques (PCA and stacked autoencoder): they cut down the large number of trainable parameters of the standalone hybrid 3D-2D CNN. The numbers of trainable parameters are the same in both frameworks (PCA-Hybrid 3D-2D CNN and SAE-Hybrid 3D-2D CNN) for the Indian Pines and Pavia University datasets, because the same number of components is extracted by both methods. However, the number of trainable parameters for the KSC dataset is lower in SAE-Hybrid 3D-2D CNN because fewer components are extracted. From this discussion, we conclude that the stacked autoencoder is preferable when PCA requires many more PCs than the number of features extracted by the stacked autoencoder.
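
The link between the number of extracted spectral components and the number of trainable parameters follows from the standard parameter-count formulas for convolutional and dense layers. The sketch below applies them to a hypothetical hybrid layout in which the spectral axis of the last 3D feature map is folded into the channel axis before a 3 × 3 2D convolution; the filter counts (32 and 64) and the spectral depths are assumed for illustration and are not the layer sizes behind Table 9.

```python
from math import prod


def conv_params(kernel_shape, in_channels, out_channels):
    """Weights plus one bias per filter for a 2D or 3D convolution layer."""
    return (prod(kernel_shape) * in_channels + 1) * out_channels


def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units


def conv2d_after_reshape(spectral_depth, filters_3d=32, filters_2d=64):
    """Parameters of a 3x3 2D convolution whose input channels are the
    spectral depth times the 3D filter count (hypothetical sizes)."""
    return conv_params((3, 3), spectral_depth * filters_3d, filters_2d)


# Parameter count of the 2D layer for three assumed spectral depths.
counts = {depth: conv2d_after_reshape(depth) for depth in (20, 30, 100)}
for depth, count in counts.items():
    print(depth, count)
```

Under these assumptions the count grows linearly with the spectral depth reaching the 2D layer, which is why feeding 100 components instead of about 20 inflates the model.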

Table 9

Comparison of the total number of trainable parameters among Hybrid 3D-2D CNN, PCA-Hybrid 3D-2D CNN, SAE-Hybrid 3D-2D CNN, PCA Multi-Scale CNN and ResNet50

Dataset Name	Hybrid 3D-2D CNN	PCA-Hybrid 3D-2D CNN	SAE-Hybrid 3D-2D CNN	PCA Multi-Scale CNN	ResNet50
Indian Pines	15,538,953	3,005,193	3,005,193	13,924,682	25,610,826
Pavia University	8,387,337	2,267,913	2,267,913	9,570,122	25,600,956
KSC	13,769,481	8,166,669	2,194,701	44,130,122	25,640,050

3.6 Extended comparison

Having obtained promising results with our hybrid 3D-2D CNN framework on the three HSI datasets, we compare our approach with the following state-of-the-art independent 2D and 3D CNN-based methods: PCA-multiscale-CNN (PCA-MS-CNN) [22], dual channel CNN [46], random forest with CNN [45], 3D CNN [47], PCA with 2D CNN [23] and, lastly, SVM with a non-linear RBF kernel. The results are provided in Figs. 13, 14 and 15, which show that, on all three datasets, our proposed PCA-Hybrid 3D-2D CNN significantly outperforms the independent 2D and 3D CNN-based state-of-the-art approaches, and that SAE-Hybrid 3D-2D CNN also moderately surpasses those methods. Our proposed hybrid 3D-2D CNN architecture achieves superior results in comparison to the investigated methods because (i) the 3D convolutional layers used in the architecture extract not only spatial features but also fine spectral features; (ii) the 3D convolutional layers also fine-tune the spectral features, making the classification more accurate; and (iii) the 2D convolutional layer facilitates spatial feature learning across the spectral bands and enhances the spatial features extracted earlier. Note that the feature reduction techniques, PCA and stacked autoencoder, reduce the high dimensionality of the HSI and contribute to the reduced number of parameters of the proposed hybrid 3D-2D CNN architecture. The number of training samples is very limited for hyperspectral images, which may lead to overfitting; this is why we use dropout and regularization techniques. Lastly, if the spatial context size, S, is too large, the patch of a test sample may overlap with the patch of a training sample; this is why we restrict S to a certain extent.

Note that our main goal in this work is to build a lightweight CNN with far fewer parameters that achieves performance similar to that of heavyweight CNNs. As such, the results of our proposed lightweight CNN are superior to, but very close to, those of the existing methods. Therefore, we have not compared the visual classification maps of our method with those of the existing methods, as the differences would hardly be visible.
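
The overlap concern raised above for large spatial context sizes can be made concrete with a short check. Two S × S windows centred on different pixels share at least one pixel exactly when the centres are fewer than S rows and fewer than S columns apart. The sketch below counts how many test pixels have a window that intersects some training window; the helper names and the toy coordinates are assumptions for illustration.

```python
def patches_overlap(center_a, center_b, size):
    """True if two size x size windows centred on the given pixels share any pixel."""
    (row_a, col_a), (row_b, col_b) = center_a, center_b
    return abs(row_a - row_b) < size and abs(col_a - col_b) < size


def count_overlapping_test_pixels(train_pixels, test_pixels, size):
    """Number of test pixels whose window intersects at least one training window."""
    return sum(
        any(patches_overlap(test, train, size) for train in train_pixels)
        for test in test_pixels
    )


# Toy (row, column) coordinates, assumed for illustration only.
train = [(10, 10), (40, 40)]
test = [(12, 13), (18, 10), (40, 47), (80, 80)]
overlaps = {
    size: count_overlapping_test_pixels(train, test, size) for size in (3, 7, 11)
}
for size, count in overlaps.items():
    print(size, count)
```

The count can only grow as S increases, so a smaller spatial context limits how much training information leaks into the test patches.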

Fig. 13

Comparison of the proposed hybrid 3D-2D CNN architecture with the independent 2D and 3D CNN-based methods on Indian Pines dataset.

Fig. 14

Comparison of the proposed hybrid 3D-2D CNN architecture with the independent 2D and 3D CNN-based methods on Pavia University dataset.

Fig. 15

Comparison of the proposed hybrid 3D-2D CNN architecture with the independent 2D and 3D CNN-based methods on KSC dataset.

4 Conclusion and future research

In this paper, we have proposed a novel, well-tuned hybrid 3D-2D deep CNN architecture that has been evaluated on three benchmark datasets (Indian Pines, Pavia University and KSC) and has achieved a notable improvement in classification accuracy. We have used PCA and stacked autoencoder to reduce the dimensionality of the HSI and provided an extensive comparison based on different accuracy measures (AA, OA and Kappa score) and the number of trainable parameters. We have also given guidance on when to use PCA and when to use the stacked autoencoder for HSI classification with our 3D-2D CNN framework. In the proposed architecture, we have used three 3D convolutional layers followed by one 2D convolutional layer to extract high-quality spectral and spatial features and merge them in a way that improves the classification of different land covers and vegetation in the HSI. We have achieved 99.73%, 99.90%, and 99.32% accuracy for the Indian Pines, Pavia University, and KSC datasets, respectively, which outperforms the other state-of-the-art approaches in both classical machine learning (SVM with RBF kernel) and deep learning (independent 2D and 3D CNN-based approaches). Our proposals have improved the classification accuracy and reduced the high computational cost by means of feature reduction techniques. Choosing the size of the spatial context in the 3D-2D CNN adaptively remains a promising direction for future research. In future, we will also investigate our approaches in other fields, ranging from 3D medical imaging to satellite surveillance.

References

1. Hossain M.A., Jia, Pickering, Subspace detection using a mutual information measure for hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters 11(2) (2013), 424–428.
2. Uddin M.P., Mamun M.A., Hossain M.A., Feature extraction for hyperspectral image classification, In 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pages 379–382, IEEE, 2017.
3. …, Ma, Wang, Hyperspectral image denoise based on curvelet transform combined with weight coefficient method, Journal of Intelligent & Fuzzy Systems 37(4) (2019), 4425–4429.
4. Uddin M.P., Mamun M.A., Hossain M.A., Afjal M.I., Improved folded-PCA for efficient remote sensing hyperspectral image classification, Geocarto International (just-accepted) (2021), 1–28.
5. Chen G.Y., Xie W.F., Hyperspectral face recognition with minimum noise fraction, histogram of oriented gradient features and collaborative representation-based classifier, Journal of Intelligent & Fuzzy Systems 37(1) (2019), 635–643.
6. …, Zhang, Pan, Studies on hyperspectral face recognition in visible spectrum with feature band selection, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 40(6) (2010), 1354–1361.
7. …, Dao P.D., Liu, He, Shang, Recent advances of hyperspectral imaging technology and applications in agriculture, Remote Sensing 12(16) (2020), 2659.
8. Uddin M.P., Mamun M.A., Hossain M.A., PCA-based feature reduction for hyperspectral remote sensing image classification, IETE Technical Review, pages 1–21, 2020.
9. van der Werff H.M.A., Knowledge-based remote sensing of complex objects: recognition of spectral and spatial patterns resulting from natural hydrocarbon seepages, Citeseer, 2006.
10. …, Niu, Dou, Xu, Xia, Leveraging local receptive fields based random weights networks for hyperspectral image classification, Journal of Intelligent & Fuzzy Systems 31(2) (2016), 1017–1028.
11. Uddin M.P., Mamun M.A., Hossain M.A., Improved feature extraction using segmented FPCA for hyperspectral image classification, In 2017 2nd International Conference on Electrical & Electronic Engineering (ICEEE), pages 1–4, IEEE, 2017.
12. Uddin M.P., Mamun M.A., Hossain M.A., Segmented FPCA for hyperspectral image classification, In 2017 3rd International Conference on Electrical Information and Communication Technology (EICT), pages 1–6, IEEE, 2017.
13. Singh, Gaurav, Rai A.K., Beg, Machine learning to estimate surface roughness from satellite images, Remote Sensing 13(19) (2021), 3794.
14. Zheng, Yuan, Lu, Dimensionality reduction by spatial–spectral preservation in selected bands, IEEE Transactions on Geoscience and Remote Sensing 55(9) (2017), 5185–5197.
15. Anowar, Sadaoui, Selim, Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE), Computer Science Review 40 (2021), 100378.
16. Hughes, On the mean accuracy of statistical pattern recognizers, IEEE Transactions on Information Theory 14(1) (1968), 55–63.
17. Gasteiger, Emde, Mayer, Buras, Buehler S.A., Lemke, Representative wavelengths absorption parameterization applied to satellite channels and spectral bands, Journal of Quantitative Spectroscopy and Radiative Transfer 148 (2014), 99–115.
18. Yang, Zhao Y.-Q., Chan J.C.-W., Learning and transferring deep joint spectral–spatial features for hyperspectral classification, IEEE Transactions on Geoscience and Remote Sensing 55(8) (2017), 4729–4742.
19. …, Du, Huang, Tan, A deep translation (GAN) based change detection network for optical and SAR remote sensing images, ISPRS Journal of Photogrammetry and Remote Sensing 179 (2021), 14–34.
20. Kuang, Xu, Combined multiple spectral–spatial features and multikernel support tensor machine for hyperspectral image classification, Journal of Applied Remote Sensing 14(3) (2019), 032603.
21. Xue, Zeng, Chen, Wang, Zhang, A new dataset and deep residual spectral spatial network for hyperspectral image classification, Symmetry 12(4) (2020), 561.
22. Haque M.R., Mishu S.Z., Spectral-spatial feature extraction using PCA and multi-scale deep convolutional neural network for hyperspectral image classification, In 2019 22nd International Conference on Computer and Information Technology (ICCIT), pages 1–6, IEEE, 2019.
23. Makantasis, Karantzalos, Doulamis, Deep supervised learning for hyperspectral data classification through convolutional neural networks, In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 4959–4962, IEEE, 2015.
24. Mishu S.Z., Ahmed, Hossain M.A., Uddin M.P., Effective subspace detection based on the measurement of both the spectral and spatial information for hyperspectral image classification, International Journal of Remote Sensing 41(19) (2020), 7541–7564.
25. …, Hu, Yu, A framework of multiple kernel ensemble learning for classification using two-stage feature selection method, Journal of Intelligent & Fuzzy Systems 33(5) (2017), 2737–2747.
26. Uddin M.P., Mamun M.A., Afjal M.I., Hossain M.A., Information-theoretic feature selection with segmentation-based folded principal component analysis (PCA) for hyperspectral image classification, International Journal of Remote Sensing 42(1) (2020), 286–321.
27. Uddin M.P., Mamun M.A., Hossain M.A., Effective feature extraction through segmentation-based folded-PCA for hyperspectral image classification, International Journal of Remote Sensing 40(18) (2019), 7190–7220.
28. Datta, Ghosh, PCA, kernel PCA and dimensionality reduction in hyperspectral images, In Advances in Principal Component Analysis, pages 19–46, Springer, 2018.
29. … C.-F., Liu, Lei Y.-M., Yin J.-Y., Zhao, Sun X.-K., Clustering for HSI hyperspectral image with weighted PCA and ICA, Journal of Intelligent & Fuzzy Systems 32(5) (2017), 3729–3737.
30. Scholkopf, Smola, Muller K.-R., Kernel principal component analysis, In International Conference on Artificial Neural Networks, pages 583–588, Springer, 1997.
31. …, Fowler J.E., Low-complexity principal component analysis for hyperspectral image compression, The International Journal of High Performance Computing Applications 22(4) (2008), 438–448.
32. …, Chang C.-I., Segmented PCA-based compression for hyperspectral image analysis, In Chemical and Biological Standoff Detection, volume 5268, pages 274–281, International Society for Optics and Photonics, 2004.
33. Zabalza, Ren, Yang, Zhang, Wang, Marshall, Han, Novel folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing 93 (2014), 112–122.
34. Cao L.J., Chua K.S., Chong W.K., Lee H.P., Gu Q.M., A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine, Neurocomputing 55(1-2) (2003), 321–336.
35. Zabalza, Ren, Zheng, Zhao, Qing, Yang, Du, Marshall, Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging, Neurocomputing 185 (2016), 1–10.
36. Jacob N.V., Sowmya, Soman K.P., Effect of denoising on hyperspectral image classification using deep networks and kernel methods, Journal of Intelligent & Fuzzy Systems 36(3) (2019), 2067–2073.
37. Yuksel M.E., Basturk N.S., Badem, Caliskan, Basturk, Classification of high resolution hyperspectral remote sensing data using deep neural networks, Journal of Intelligent & Fuzzy Systems 34(4) (2018), 2273–2285.
38. Ghafari, Tarnik M.G., Yazdi H.S., Robustness of convolutional neural network models in hyperspectral noisy datasets with loss functions, Computers & Electrical Engineering 90 (2021), 107009.
39. Chen, Zhao, Jia, Spectral–spatial classification of hyperspectral data based on deep belief network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8(6) (2015), 2381–2392.
40. Paoletti M.E., Haut J.M., Plaza, Deep&dense convolutional neural network for hyperspectral image classification, Remote Sensing 10(9) (2018), 1454.
41. Zhang, Sun, Jiang, Li, Jiao, Zhou, Spatial sequential recurrent neural network for hyperspectral image classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11(11) (2018), 4141–4155.
42. Zhou, Hang, Liu, Yuan, Hyperspectral image classification using spectral-spatial LSTMs, Neurocomputing 328 (2019), 39–47.
43. Sun, Zheng, Lu, Wu, Spectral–spatial attention network for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 58(5) (2019), 3232–3245.
44. Krizhevsky, Sutskever, Hinton G.E., ImageNet classification with deep convolutional neural networks, Communications of the ACM 60(6) (2017), 84–90.
45. Wang, Wang, Chen, Hyperspectral image classification based on convolutional neural network and random forest, Remote Sensing Letters 10(11) (2019), 1086–1094.
46. Zhang, Li, Zhang, Shen, Spectral-spatial classification of hyperspectral imagery using a dual-channel convolutional neural network, Remote Sensing Letters 8(5) (2017), 438–447.
47. Ahmad, A fast 3D CNN for hyperspectral image classification, arXiv preprint arXiv:2004.14152, 2020.
48. Kanthi, Sarma T.H., Bindu C.S., A 3D-deep CNN based feature extraction and hyperspectral image classification, In 2020 IEEE India Geoscience and Remote Sensing Symposium (InGARSS), pages 229–232, IEEE, 2020.
49. …, Chen, Ghamisi, Dual graph convolutional network for hyperspectral image classification with limited training samples, IEEE Transactions on Geoscience and Remote Sensing, 2021.
50. Paoletti M.E., Haut J.M., Plaza, Deep learning classifiers for hyperspectral imaging: A review, ISPRS Journal of Photogrammetry and Remote Sensing 158 (2019), 279–317.
51. Rodarmel, Shan, Principal component analysis for hyperspectral image classification, Surveying and Land Information Science 62(2) (2002), 115–122.
52. Han, Kamber, Pei, Data mining concepts and techniques third edition, The Morgan Kaufmann Series in Data Management Systems 5(4) (2011), 83–124.
53. Meng, Catchpoole, Skillicorn, Kennedy P.J., Relational autoencoder for feature extraction, In 2017 International Joint Conference on Neural Networks (IJCNN), pages 364–371, IEEE, 2017.
54. Zheng, Peng, An autoencoder-based image reconstruction for electrical capacitance tomography, IEEE Sensors Journal 18(13) (2018), 5464–5474.
55. Zhou, Han, Cheng, Zhang, Learning compact and discriminative stacked autoencoder for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 57(7) (2019), 4823–4833.
56. Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115(3) (2015), 211–252.
57. Aghdam H.H., Heravi E.J., Convolutional neural networks, In Guide to Convolutional Neural Networks, pages 85–130, Springer, 2017.
58. …, Zhang, Gu, Pan, Overfitting remedy by sparsifying regularization on fully-connected layers of CNNs, Neurocomputing 328 (2019), 69–74.
59. Baumgardner M.F., Biehl L.L., Landgrebe D.A., 220 band AVIRIS hyperspectral image data set: June 12, 1992 Indian Pine test site 3, Sep 2015.
60. Firat, Hanbay, Classification of hyperspectral images using 3D CNN based ResNet50, In 2021 29th Signal Processing and Communications Applications Conference (SIU), pages 1–4, IEEE, 2021.