A saliency model-oriented convolution neural network for cloud detection in remote sensing images

Abstract

Remote sensing is an indispensable means of monitoring earth resources and environmental changes. However, optical remote sensing images often contain a large amount of cloud, especially over tropical rain forest areas, which makes it difficult to obtain completely cloud-free images. Accurate cloud detection is therefore of great research value for optical remote sensing applications. In this paper, we propose a saliency model-oriented convolution neural network for cloud detection in remote sensing images. First, we adopt Kernel Principal Component Analysis (KPCA) for unsupervised pre-training of the network. Second, a small number of labeled samples are used to fine-tune the network, and the remote sensing images are segmented with a super-pixel approach before cloud detection to eliminate irrelevant backgrounds and non-cloud objects. Third, the image blocks are input into the trained convolutional neural network (CNN) for cloud detection, after which the segmented image is recovered. Fourth, we fuse the detection result with the saliency map of the raw image to further improve detection accuracy. Experiments show that the proposed method detects cloud accurately and is more robust than other state-of-the-art cloud detection methods.

Keywords

Cloud detection, remote sensing, convolutional neural network, kernel principal component analysis, saliency model, super-pixel segmentation

1. Introduction

With the rapid development of remote sensing technology, remote sensing images have been widely used in scientific research, social services and other fields [1, 2, 3]. In China, the performance indexes of domestic satellites represented by Gaofen-1, Gaofen-2 and Resource-3 are close to the international advanced level [4]. The quality and data volume of domestic satellite images are increasing rapidly, and they are far more commercially available than before, so they play an important role in many fields of society. However, not all remote sensing images can be used directly in engineering projects and research practice. One important limiting factor is that images acquired by satellite-borne sensors generally contain clouds, and cloud cover accounts for a relatively large proportion, generally more than 50% [5]. Clouds not only hide surface information but also affect image registration, fusion, information extraction and other processing. It is therefore necessary to detect and remove clouds from remote sensing images. However, current cloud detection and removal techniques still have bottlenecks [6].

Cloud is ubiquitous in remote sensing images. By shielding ground objects, it weakens the recording of the real radiation characteristics of the surface by satellite sensors and thus changes the spectral information of ground objects [7, 8, 9]. Cloud identification is therefore a critical data preprocessing step, and its accuracy has an important impact on land use classification, change detection and the quantitative extraction of remote sensing parameters.

In this paper, we propose a saliency model-oriented convolution neural network for cloud detection in remote sensing images. The paper is organized as follows. Related work is reviewed in Section 2. The CNN, its training process and super-pixel segmentation are introduced in Section 3. In Section 4, the super-pixel segmentation method is adopted to obtain the cloud saliency map. In Section 5, experiments are carried out on remote sensing images from Google Earth, and the proposed method is compared with state-of-the-art cloud detection methods in both subjective and quantitative evaluations. Finally, conclusions are drawn in Section 6.

2. Literature review

In recent years, many scholars have proposed cloud detection methods for remote sensing images, among which the spectral threshold method is one of the simplest and most effective. The method mainly uses the spectral features of the remote sensing image: according to the reflectivity of cloud in the visible or infrared bands, its brightness temperature and other spectral characteristics, cloud and non-cloud pixels are separated by a brightness threshold.

Cloud detection methods fall mainly into two categories. One is the traditional method. For example, Wang [10] proposed an improved cloud detection method combining K-means clustering with a multi-spectral threshold approach. On the basis of landmark spectrum analysis, MODIS data were first divided into two major classes by the K-means method: the first class included clouds, smoke and snow, and the second included vegetation, water and land. A multi-spectral threshold detection was then applied to the first class to eliminate interference such as smoke and snow. Li [11] presented a spectral-spatial classification strategy to enhance onboard cloud screening of hyperspectral images by integrating a threshold exponential spectral angle map (TESAM), an adaptive Markov random field (AMRF) and dynamic stochastic resonance (DSR). TESAM roughly classified cloud pixels based on spectral information. AMRF then optimized the result using spatial information, which improved classification performance significantly. Nevertheless, misclassifications occurred due to noisy data in the onboard environment, so DSR was employed to eliminate the noise produced by AMRF in the binary labeled images. The other is the deep learning method. For example, Jedlovec [12] proposed the bispectral composite threshold (BCT) technique, which used only the 11 $\mu$m and 3.9 $\mu$m channels, and composite imagery generated from these channels, in a four-step cloud detection procedure to produce a binary cloud mask at single-pixel resolution. This kind of cloud detection method has the advantages of simple calculation and high monitoring efficiency. However, it also has defects, such as requiring images to have thermal infrared band information, slightly lower monitoring accuracy and a tendency to misjudge.

Because clouds and ground objects differ in texture, they can be distinguished by extracting a combination of features (such as the gray-level co-occurrence matrix and Gabor texture features), which is another effective cloud detection approach for remote sensing images. Başeski [13] proposed a method that made use of both color and texture characteristics of cloud regions. The image was divided into sub-images in order to perform initial color and edge analysis. Liu [14] designed a super-pixel level cloud detection method based on a convolutional neural network (CNN) and deep forest. First, remote sensing images were segmented into super-pixels through a combination of SLIC and SEEDS. Structured forests were used to compute the edge probability of each pixel, based on which super-pixels were segmented more precisely. The segmented super-pixels composed a super-pixel level remote sensing database. However, due to the diversity of cloud features in optical remote sensing images, the distribution of different cloud features in a feature combination is not typical, so it is still difficult to use texture differences for cloud detection. Some modified texture-based cloud detection algorithms improve detection accuracy to a certain extent, but they have defects such as time-consuming and laborious classifier training and difficulty in automatically processing massive image data. The convolutional neural network (CNN) is a typical deep learning algorithm whose model parameters can be obtained through network training by gradient descent, avoiding complex image pre-processing. Moreover, a trained CNN can fully mine the features in the image and efficiently complete cloud detection in remote sensing images.

To solve the above problems, we propose a saliency model-oriented convolution neural network for cloud detection in remote sensing images. We use KPCA for unsupervised pre-training of the network, and the network is then fine-tuned. Remote sensing images are segmented with a super-pixel approach before cloud detection to eliminate irrelevant backgrounds and non-cloud objects. The image blocks are input into the trained CNN for cloud detection. Finally, we fuse the detection result with the saliency map of the raw image to further improve detection accuracy.

3. Proposed CNN and super-pixel segmentation

3.1 Convolution neural network (CNN)

The convolution neural network (CNN) is inspired by biology and neuroscience [15, 16, 17]. It borrows their structural principles, combines them with artificial neural networks, and is one of the pioneering results of this line of research. A CNN is a feed-forward artificial neural network with deep learning ability. A typical CNN model includes an input layer, convolution layers, sampling layers and a fully connected layer. Compared with the traditional neural network, the CNN has strong applicability, performs feature extraction and classification at the same time, generalizes well and has fewer parameters to train, and it has gradually become a research hot-spot in the field of deep learning [18, 19]. Its structure is shown in Fig. 1.

Figure 1.

CNN framework.

The convolutional neural network works as follows.

First, the image is convolved in the convolution layer, and the output feature map is down-sampled by the sampling layer [20]. After a number of convolution and sampling operations, the extracted feature maps are classified through the fully connected layer, which outputs the target result.

In this paper, $G_{i}$ denotes the feature map of the $i$-th layer of the convolutional neural network. The convolution process can be described as:

$\displaystyle G_{i}=f(G_{i-1}\otimes W_{i}+b_{i})$ (1)

where the symbol $\otimes$ represents the convolution of the kernel of the $i$-th layer with the feature map of the $(i-1)$-th layer, $W_{i}$ represents the weight vector of the convolution kernel at the $i$-th layer, and $b_{i}$ represents the offset vector. Applying the activation function $f(\cdot)$ to the result yields the feature map $G_{i}$ of the $i$-th layer.
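Eq. (1) can be illustrated with a minimal NumPy sketch for a single feature map and a single kernel. The function name `conv_layer`, the "valid" border handling and the default `tanh` activation are assumptions made for illustration; the paper does not state which activation or padding it uses.

```python
import numpy as np

def conv_layer(feature_map, kernel, bias, activation=np.tanh):
    """Compute f(G_prev (*) W + b) for one 2-D feature map and one kernel.

    Uses 'valid' borders (an assumption), so the output shrinks by
    kernel_size - 1 in each dimension.
    """
    kh, kw = kernel.shape
    out_h = feature_map.shape[0] - kh + 1
    out_w = feature_map.shape[1] - kw + 1
    flipped = kernel[::-1, ::-1]  # a true convolution flips the kernel
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = feature_map[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * flipped) + bias
    return activation(out)

# Toy example: with this kernel and an identity activation,
# each output value equals the input value one row below it.
g_prev = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[0.0, 1.0], [0.0, 0.0]])
g_next = conv_layer(g_prev, w, bias=0.0, activation=lambda x: x)
```

In a real network each layer holds many kernels and sums over input channels; the loop above only shows the arithmetic of one term.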

The feature map computed by the CNN can have millions of dimensions. If classifiers are trained on it directly, over-fitting easily occurs [21, 22]. It is therefore necessary to down-sample the feature map after the convolution operation, which is the pooling operation. To limit the loss of information during pooling, probability maximization is usually used.

Assume that one neuron in the sampling block $X_{n}$ is in the activated state; then the sampling value $P_{n}$ generates a response, and otherwise it does not. The pooling process can be expressed as:

$\displaystyle P(P_{n}|i)=e^{G_{i-1}\otimes W_{i}+b_{i}}/(1+e^{G_{i-1}\otimes W_{i}+b_{i}})$ (2)

The traditional fully connected layer has a huge number of parameters, which may lead to over-fitting. Therefore, when designing the network framework, this paper considers replacing the final fully connected layer with a fully maximized pooling layer. The structure of the deep convolutional neural network designed in this paper is shown in Fig. 2.

Figure 2.

Proposed CNN network in this paper.

3.2 SCNN training

Training a CNN requires a large number of samples, while building a real sample base requires a lot of time and manpower. To solve this problem, this paper adopts a semi-supervised classification method to train the CNN, which can effectively improve training efficiency without requiring a large labeled sample base [23].

This paper adopts an unsupervised pre-training scheme based on kernel principal component analysis. Assume that $M$ scenes of size $m\times n$ are input into the convolutional neural network and that the convolutional filter size is $g_{1}\times g_{2}$. In the $i$-th training scene, all image blocks of size $g_{1}\times g_{2}$ are taken and expressed in vector form as $X_{i}=(x_{i,1},x_{i,2},\cdots,x_{i,nm})$. After mean filtering, the image blocks of $X_{i}$ become $\bar{X}_{i}=\{\bar{x}_{i,1},\bar{x}_{i,2},\cdots,\bar{x}_{i,nm}\}$, so the image blocks of the training data can be denoted as:

$\displaystyle X=\{\bar{x}_{1},\bar{x}_{2},\cdots,\bar{x}_{n}\}$ (3)

Kernel principal component analysis method is adopted to minimize reconstruction errors and solve eigenvectors.

$\displaystyle\mathop{\min}\limits_{V\in R^{g_{1}g_{2}\times H}}||X-VV^{T}X||^{2},\ \text{s.t. }V^{T}V=I_{H}$ (4)

where $I_{H}$ is the $H\times H$ identity matrix and $V$ consists of the top $H$ eigenvectors of the covariance matrix $XX^{T}$, which represent the main features of the input image blocks. Kernel principal component analysis initializes the filters $W_{h}$ of the convolutional neural network, which can be denoted as:

$\displaystyle W_{h}=m_{g_{1}g_{2}}(V_{h}),h=1,2,\cdots,H$ (5)

where $m_{g_{1}g_{2}}(V_{h})$ represents the mapping of the vector $V_{h}$ to the $g_{1}\times g_{2}$ matrix $W_{h}$, and $V_{h}$ represents the $h$-th main feature of the image.
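The filter initialization of Eqs (3)–(5) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `pca_filters` is hypothetical, the "mean filter" step is interpreted as subtracting each block's own mean, and the eigen-decomposition follows Eq. (4) literally on $XX^{T}$ without any kernel function, since the text does not specify one.

```python
import numpy as np

def pca_filters(images, g1, g2, num_filters):
    """Learn num_filters kernels of size g1 x g2 from all image blocks.

    Returns V (g1*g2 x H, orthonormal columns) and W (H x g1 x g2).
    """
    columns = []
    for img in images:
        rows, cols = img.shape
        for r in range(rows - g1 + 1):
            for c in range(cols - g2 + 1):
                block = img[r:r + g1, c:c + g2].reshape(-1)
                # Assumption: "mean filter" means removing the block mean.
                columns.append(block - block.mean())
    X = np.stack(columns, axis=1)              # shape (g1*g2, number of blocks)
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_filters]
    V = eigvecs[:, order]                       # top-H eigenvectors of X X^T
    W = V.T.reshape(num_filters, g1, g2)        # Eq. (5): W_h = reshape(V_h)
    return V, W

rng = np.random.default_rng(0)
training_scenes = [rng.random((8, 8)) for _ in range(3)]
V, W = pca_filters(training_scenes, g1=3, g2=3, num_filters=4)
```

The returned `W` would serve as the initial convolution kernels before fine-tuning with labeled data.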

The network is first pre-trained without supervision through kernel principal component analysis to obtain its initial weight parameters. The whole network is then fine-tuned with labeled data. The process of training the convolutional neural network in this paper is shown in Fig. 3.

Figure 3.

The new detection process in this paper. Abbreviations: CNN: Convolutional Neural Network. KPCA: Kernel Principal Component Analysis. SLIC: Simple Linear Iterative Clustering.

3.3 Super-pixel segmentation

To avoid the low efficiency of inputting the whole image into the convolutional neural network, the remote sensing image is pre-processed into blocks before cloud detection. On the basis of the spectral and spatial characteristics of the remote sensing image, simple linear iterative clustering (SLIC) is used to generate super-pixels [24] to improve the classification accuracy.

A super-pixel consists of spatially adjacent pixels with similar spectra. Assume that the remote sensing image $(x_{i})_{i=1}^{M}$ includes $N$ bands and $M$ pixels, where $x_{i}$ represents the spectral feature of the $i$-th pixel. The SLIC algorithm generates $Q$ super-pixels $(v_{i})_{i=1}^{Q}$, each of which contains about $M/Q$ pixels.

The spectral feature is described by the Euclidean distance between the pixel spectrum and the super-pixel spectrum over the $N$ bands, which can be expressed as:

$\displaystyle d_{j}=\sqrt{\sum_{j=1}^{N}{(x_{kj}-v_{ij})^{2}}}$ (6)

where $j$ indexes the image bands, $k$ denotes the $k$-th pixel of the original image, $i$ denotes the $i$-th super-pixel, and $d_{j}$ denotes the distance between the pixel spectrum and the super-pixel spectrum accumulated over the $N$ bands.

The spatial feature is described by the distance between the pixel and the super-pixel in the coordinate system, and the expression is:

$\displaystyle d_{i}=\sqrt{(x_{x}-v_{x})^{2}+(x_{y}-v_{y})^{2}}$ (7)

The final SLIC clustering function can be defined as:

$\displaystyle\textit{final}(f)=d_{j}+(M/S)\cdot d_{i}$ (8)

where $S\in[1,40]$ is a parameter that controls the compactness of the super-pixels. This parameter represents the maximum possible value in XY space. To prevent the clustering center of a super-pixel from falling on a boundary point or a noisy pixel, the point with the smallest spectral gradient is searched for in a $4\times 4$ neighborhood of the clustering center and taken as the seed point of the new super-pixel. The super-pixels are then generated iteratively.
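The distance terms of Eqs (6)–(8) can be sketched in plain Python. The function names are hypothetical, and the factor $M/S$ of Eq. (8) is passed in as a single `weight` argument so that the sketch does not commit to particular values of $M$ and $S$.

```python
import math

def spectral_distance(pixel_bands, center_bands):
    """Eq. (6): Euclidean distance between two N-band spectra."""
    return math.sqrt(sum((x - v) ** 2 for x, v in zip(pixel_bands, center_bands)))

def spatial_distance(pixel_xy, center_xy):
    """Eq. (7): Euclidean distance between image coordinates."""
    return math.hypot(pixel_xy[0] - center_xy[0], pixel_xy[1] - center_xy[1])

def slic_distance(pixel_bands, pixel_xy, center_bands, center_xy, weight):
    """Eq. (8): spectral term plus the weighted spatial term (weight = M/S)."""
    return (spectral_distance(pixel_bands, center_bands)
            + weight * spatial_distance(pixel_xy, center_xy))

# Toy example: a 3-band pixel at (0, 0) against a super-pixel center at (3, 4).
example_distance = slic_distance((1.0, 2.0, 2.0), (0, 0), (0.0, 0.0, 0.0), (3, 4), weight=0.5)
```

In the clustering loop, each pixel would be assigned to the super-pixel center that minimizes this value within the local search window.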

4. Obtaining cloud saliency map

The human visual attention mechanism quickly locates objects of interest in a complex visual scene. Current computational models of visual attention fall mainly into two categories: a) bottom-up models based on low-level visual features, of which the typical example is the Itti model [25] and whose results are not ideal when the background is cluttered; b) top-down selective attention models based on the high-level visual features of a specific object. In remote sensing images, clouds have typical shapes and significant texture features. Therefore, this paper uses a visual model to calculate region saliency, taking the cloud area as the visual search object and shape and texture as the mid- and low-level features. The resulting saliency serves as the index for judging whether a region is a cloud region, and the location of the cloud region is finally determined by measuring this index.

To highlight the texture of the cloud area and better distinguish it from other disturbing textures, two processing approaches exist. 1) Exploiting the multi-scale, multi-direction and translation-invariant properties of the nonsubsampled contourlet transform (NSCT), the local texture energy of each scale and direction sub-band after NSCT decomposition is taken as the texture feature, and the saliency map is obtained by maximizing the information content of each feature. However, more global features are lost under complex backgrounds. 2) On the basis of “global isolation and local compactness” of the saliency region, saliency detection is transformed into a Markov random walk problem, giving an algorithm that detects salient objects by a random walk on a mixed graph. However, because the algorithm constructs a fully connected mixed graph, its time complexity is $O(n^{2})$, so it is not suitable for fast extraction of saliency regions when the input image has a high resolution and a large number of blocks.

In order to rapidly calculate the saliency features, the QDCT (quaternion discrete cosine transform) method [26] is adopted to extract the saliency map in this paper.

$\displaystyle S_{\textit{QDCT}}(I_{Q})=g\cdot[T(I_{Q})\circ\tilde{T}(I_{Q})]$ (9)

$\displaystyle T(I_{Q})=\textit{Inverse}(\textit{QDCT})^{L}(\text{sgn}(\textit{QDCT}^{L}(I_{Q})))$ (10)

where $S_{\textit{QDCT}}$ represents the extracted saliency map, $g$ is the Gaussian filter, $I_{Q}$ is the image matrix converted from the channels of the original image $I$, $\circ$ represents the Hadamard product, and $\tilde{T}(I_{Q})$ is the conjugate of $T(I_{Q})$. sgn is the sign function, and QDCT and $\textit{Inverse}(\textit{QDCT})$ are the QDCT transform and the inverse QDCT transform.
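The structure of Eqs (9) and (10) can be illustrated with a simplified NumPy sketch. Note the substitution: instead of a true quaternion DCT, the sketch applies an ordinary real 2-D DCT to each channel separately and sums the squared reconstructions, which is only an approximation of the quaternion formulation. The function names, the Gaussian width `sigma` and the 3-sigma kernel radius are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, C.T @ y inverts it."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def signature_saliency(image, sigma=2.0):
    """image: (rows, cols, channels) array; returns a (rows, cols) saliency map.

    Assumes rows and cols are at least as large as the Gaussian kernel.
    """
    rows, cols, channels = image.shape
    c_r, c_c = dct_matrix(rows), dct_matrix(cols)
    acc = np.zeros((rows, cols))
    for ch in range(channels):
        coeffs = c_r @ image[:, :, ch] @ c_c.T   # forward 2-D DCT
        recon = c_r.T @ np.sign(coeffs) @ c_c    # Eq. (10): inverse DCT of the signs
        acc += recon * recon                     # T o conj(T) for a real signal
    kernel = gaussian_kernel(sigma)
    # Eq. (9): separable Gaussian smoothing along rows and columns.
    smooth = np.apply_along_axis(np.convolve, 0, acc, kernel, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="same")
    return smooth

rng = np.random.default_rng(0)
toy_image = rng.random((32, 32, 3))
saliency_map = signature_saliency(toy_image)
```

Thresholding `saliency_map` would then give the saliency and non-saliency regions used in the next subsection.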

4.1 Super-pixel segmentation for saliency map

A single pixel is not enough to represent the feature information of the cloud, so the cloud needs to be segmented. To speed up subsequent processing, this paper combines super-pixel blocks with the visual saliency map. Based on the SLIC algorithm, the image is partitioned into blocks with a bias given by the obtained saliency map: dense super-pixels are generated in saliency regions and relatively sparse super-pixels in non-saliency regions. This reduces the number of global pixels while retaining edge details, which benefits the subsequent extraction and processing.

The saliency region $SR$ and the non-saliency region $NSR$ are obtained by thresholding the saliency map $\textit{S\_map}$. The steps of the super-pixel segmentation algorithm are as follows:

Grid initialization. The grid spacings $S_{1}$ and $S_{2}$ are set for the non-saliency region $NSR$ and the saliency region $SR$, respectively, so that the saliency region receives the finer grid. The grid center is selected as the initial super-pixel center. The grid center saliency of the $NSR$ region is set to 0, and that of the $SR$ region is set to 1.

$\displaystyle S_{1}=\sqrt{(N+c(t-1))/K}$ (11)

$\displaystyle S_{2}=S_{1}/\sqrt{t}$ (12)

where $N$ represents the total number of pixels in the image, $c$ is the number of saliency pixels, $t$ denotes the ratio of the super-pixel size between $NSR$ and $SR$, and $K$ represents the total number of super-pixel blocks.

Clustering. Under the defined distance function $D_{i,k}$, a distance is calculated for each pixel in the search area corresponding to the super-pixel center $C_{k}$, and the pixel is assigned to the nearest super-pixel block. Each search covers the rectangular area centered on the super-pixel center, and the size of the search area varies according to the saliency of the super-pixel center. If the center saliency is 0, the search area is set to $2S_{1}\times 2S_{1}$. Otherwise, the search area is set to $2S_{2}\times 2S_{2}$.

$\displaystyle D_{i,k}=\sqrt{(d_{c}^{2}+(d_{s}/s)^{2}/m^{2})}$ (13)

$\displaystyle m=40e^{(-(\textit{S\_map}(i))^{2}/(2\sigma^{2}))}$ (14)

where $d_{c}$ represents the color distance between the pixel $i$ and the super-pixel center $C_{k}$, $d_{s}$ denotes the spatial distance between the pixel $i$ and $C_{k}$, $s$ is the grid size, and $\textit{S\_map}(i)$ represents the saliency of pixel $i$ in the image.

Edge refinement. Following the SLIC algorithm, the edge pixel $P_{i}$ is found in the search range of the super-pixel center $C_{k}$, and the distance $D_{i,k}$ between $P_{i}$ and $C_{k}$ is calculated. If $D_{i,k}$ is greater than a certain threshold, $P_{i}$ is reallocated to the nearest super-pixel center $C_{k}^{\prime}$. As the iteration progresses, the edges of the super-pixels are continuously refined until the super-pixel error function converges.
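The quantities in Eqs (11)–(14) can be sketched in plain Python. The function names are hypothetical, Eq. (13) is implemented exactly as printed, and $\sigma$ in Eq. (14) is treated as a free parameter because the text does not give its value.

```python
import math

def grid_spacings(total_pixels, salient_pixels, size_ratio, num_superpixels):
    """Eqs (11)-(12): return the two grid spacings (S1, S2)."""
    s1 = math.sqrt((total_pixels + salient_pixels * (size_ratio - 1)) / num_superpixels)
    s2 = s1 / math.sqrt(size_ratio)
    return s1, s2

def compactness(saliency_value, sigma):
    """Eq. (14): m = 40 * exp(-S_map(i)^2 / (2 * sigma^2))."""
    return 40.0 * math.exp(-(saliency_value ** 2) / (2.0 * sigma ** 2))

def adaptive_distance(d_color, d_space, grid_size, saliency_value, sigma):
    """Eq. (13) as printed: sqrt(d_c^2 + (d_s / s)^2 / m^2)."""
    m = compactness(saliency_value, sigma)
    return math.sqrt(d_color ** 2 + (d_space / grid_size) ** 2 / m ** 2)

# Toy example: 10000 pixels, 2000 of them salient, size ratio t = 4, K = 100.
s1, s2 = grid_spacings(10000, 2000, 4, 100)
# Tiling the salient pixels at spacing S2 and the rest at spacing S1
# yields K super-pixels in total (assumed reading of the two spacings).
superpixel_count = 2000 / s2 ** 2 + 8000 / s1 ** 2
```

With these toy numbers the count check gives 100, which matches $K$ and suggests that the finer spacing $S_{2}$ belongs to the saliency region.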

4.2 Final cloud detection process

We call the proposed cloud detection method SCNN (saliency model-oriented convolution neural network). The SCNN cloud detection procedure is given in Algorithm 1.

Algorithm 1. SCNN-based cloud detection.
Input: Remote sensing image
Step 1. Train the CNN network using KPCA;
Step 2. Perform block preprocessing on the remote sensing images to be detected;
Step 3. Perform super-pixel segmentation on the remote sensing images to be detected;
Step 4. Taking each super-pixel clustering center as the center, divide the image into 64 $\times$ 64 pixel and 32 $\times$ 32 pixel blocks;
Step 5. Input the image blocks into the trained convolutional neural network for cloud detection;
Step 6. Splice the image blocks after detection;
Step 7. Obtain the saliency map of the cloud with QDCT and the saliency region with SLIC;
Step 8. Fuse the results of Step 6 and Step 7 to obtain the final result.
Output: Cloud detection result.
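Step 4 of Algorithm 1 can be sketched as follows. The function names are hypothetical, and the border handling (shifting the window so that it stays inside the image) is an assumption, since the paper does not say how centers near the image edge are treated.

```python
import numpy as np

def crop_block(image, center_row, center_col, size):
    """Crop a size x size block around a center, shifted to stay inside the image.

    Assumes the image is at least size x size pixels.
    """
    half = size // 2
    top = min(max(center_row - half, 0), image.shape[0] - size)
    left = min(max(center_col - half, 0), image.shape[1] - size)
    return image[top:top + size, left:left + size]

def blocks_for_centers(image, centers, sizes=(64, 32)):
    """Return, for every super-pixel center, one block per requested size."""
    return [[crop_block(image, r, c, s) for s in sizes] for r, c in centers]

# Toy example: a 100 x 120 three-band image and three cluster centers,
# two of which lie near the image border.
toy_scene = np.zeros((100, 120, 3))
toy_centers = [(5, 5), (50, 60), (99, 119)]
toy_blocks = blocks_for_centers(toy_scene, toy_centers)
```

Each pair of blocks would then be passed to the trained CNN in Step 5.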

5. Experiments and analysis

The experiments are conducted in Python on Ubuntu 16.04 with the Caffe deep learning framework. The hardware environment is an Intel Core i7 CPU, 16 GB of memory and a GeForce GTX 1080 GPU.

The convolutional neural network constructed in this paper consists of two convolutional layers, two pooling layers and a fully connected layer. The number of filters is set to 340 in the first convolutional layer and to 200 in the second. Filter sizes are set to 5 $\times$ 5. The sampling interval is set to 1. Probability maximization is used for pooling. Before cloud detection, the remote sensing image to be detected is first clustered with SLIC to generate a super-pixel image. Taking each super-pixel clustering center as the center, the image is divided into 64 $\times$ 64 pixel and 32 $\times$ 32 pixel blocks. Finally, the cloud features learned by the convolutional neural network are input into the Softmax classifier for cloud detection. We compare the overall accuracy of CNN, KPCA $+$ CNN, KPCA $+$ CNN $+$ FCN (full connection layer) and KPCA $+$ CNN $+$ FCN (fully maximized pooling layer) under the same data set and experimental parameters. The results are shown in Fig. 5. The overall accuracy is calculated as:

$\displaystyle\text{Overall accuracy}=(\text{correctly detected pixels})/(\text{total detected pixels})\times 100\%$ (15)

Figure 4.

Visual comparisons of the cloud detection results with different methods.

As can be seen from Fig. 5, the proposed network structure improves the performance of the trained network, which verifies the rationality of the network constructed in this paper.

To verify the effectiveness of the proposed cloud detection method, we also compare it with GAGC [27], SLCD [28] and COF [29]. GAGC treated clouds as a special kind of object and eliminated human labeling in two steps. First, it adaptively computed thresholds for each cloud image, which automatically labeled some pixels as “cloud” or “clear sky” with high confidence. Then, those labeled pixels served as hard constraint seeds for the subsequent graph cut algorithm. SLCD contained two modules: feature data simulation, and cloud detector learning and application. It first simulated a kind of cubic structural data by stacking different fundamental image features, including color, statistical information, texture and structure. Such data synthesized different image features and were used for training and applying the cloud detector, which was designed by minimizing the residual error between the feature data and its labels. COF extracted the color, texture and statistical features of the remote sensing images with the color transform, dark channel estimation, Gabor filtering and local statistical analysis. Then, an initial cloud detection map was obtained by applying a support vector machine (SVM) to the stacked features, where the SVM was trained with a set of samples automatically labeled by processing the dark channel of the original image with several thresholding and morphological operations. Finally, guided filtering was used to refine the boundaries in the initial detection map. The images are obtained from Google Earth, and the ground truth of all images is manually marked. The cloud detection results of the different methods are shown in Fig. 4.

In this paper, precision, recall (recall $=$ TPR), F-measure, FPR and ROC (receiver operating characteristic curve) are used to evaluate the quality of the extracted cloud. The precision rate $P$ represents the proportion of detected information that is correct, and the recall rate $R$ represents the proportion of the effective information that is correctly retrieved. The calculation formulas are shown in Eqs (16)–(18).

$\displaystyle\textit{Precision}=\frac{TP}{TP+FP}$ (16)

$\displaystyle\textit{Recall}=\frac{TP}{TP+FN}$ (17)

$\displaystyle\textit{FPR}=\frac{FP}{TN+FP}$ (18)

where TP denotes the case where the positive class is judged as positive, FP the case where the negative class is judged as positive, FN the case where the positive class is judged as negative, and TN the case where the negative class is judged as negative. The precision and recall rates range from 0 to 1, and the closer the value is to 1, the better the algorithm performs. However, the two measures trade off against each other, so the evaluation also uses their weighted average, the $F$ value. The calculation formula of the $F$ value is shown in Eq. (19), where $\beta$ is the introduced weighting parameter. When $\beta$ is 1, the calculated $F$ value is the F1-measure.

$\displaystyle\textit{F-measure}=\frac{(1+\beta^{2})\cdot P\cdot R}{\beta^{2}\cdot P+R}$ (19)
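The evaluation measures of Eqs (15)–(19) can be computed from pixel counts as in the following plain-Python sketch. The function name and the example counts are illustrative only and are not taken from the experiments; the F-measure uses the usual $F_{\beta}$ definition, which gives the F1-measure at $\beta=1$, and the sketch assumes that none of the denominators is zero.

```python
def detection_metrics(tp, fp, tn, fn, beta=1.0):
    """Return precision, recall, FPR, overall accuracy and F-measure.

    tp, fp, tn, fn are pixel counts of true/false positives/negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = ((1 + beta ** 2) * precision * recall
                 / (beta ** 2 * precision + recall))
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts for a 200-pixel image.
example_metrics = detection_metrics(tp=90, fp=10, tn=80, fn=20)
```

Sweeping the decision threshold of the detector and plotting recall against FPR at each threshold would trace out the ROC curve.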

Figure 5.

Overall comparison.

The original images contain thin cloud, thick cloud, broken cloud and translucent thin cloud, which are difficult to distinguish. Some images include both cloud and snow or ice, which are easily confused. It can be seen from Fig. 4 that the GAGC method identifies thick clouds effectively but performs poorly on thin clouds, and it tends to identify bright ground objects, such as an underlying surface covered with snow or ice, as clouds. The SLCD method combines feature data simulation with cloud detector learning, which overcomes the interference of some noise. However, it is easily affected by the super-pixel segmentation, which introduces an initial error, so pixel-level detection accuracy cannot be obtained and broken clouds are easily missed. COF is an unsupervised method for cloud detection. It extracts the color, texture and statistical features of the remote sensing images with the color transform, dark channel estimation, Gabor filtering and local statistical analysis, obtains an initial cloud detection map with a support vector machine (SVM), and refines the boundaries of this map with guided filtering to improve detection accuracy. However, it loses some features in the sampling process, resulting in blurred and over-smoothed cloud detection results.

Table 1

Performance comparison

Method	Precision	Recall	F-measure	Time/s
GAGC	0.856	0.873	0.864	4.6
SLCD	0.928	0.914	0.874	3.5
COF	0.945	0.928	0.936	3.1
SCNN	0.954	0.964	0.959	2.2

Abbreviations: COF: Coarse-to-Fine Method. GAGC: Ground-Based Cloud Detection Using Automatic Graph Cut. SCNN: Saliency Model-oriented Convolution Neural Network. SLCD: Scene Learning for Cloud Detection.

Table 2

Average accuracy and error comparison

Method	GAGC	SLCD	COF	SCNN
Accuracy	0.897	0.917	0.952	0.963
Error	0.19	0.15	0.12	0.08

Figure 6.

ROC comparison.

The cloud detection result of the proposed method is better than that of the other three methods. The SCNN method adopts KPCA for unsupervised pre-training of the network, and a small number of labeled samples are used to fine-tune it. Its main advantage is that it fuses the saliency map to further improve the detection result. Even under complex conditions (complex underlying surfaces, a large amount of thin cloud, etc.), the cloud can still be extracted more completely.

As can be seen from Table 1, the recall and precision rates of the proposed algorithm are the highest among the compared methods. The F-measure of the proposed method is improved by the introduction of super-pixels and the saliency model, and its time cost is also the lowest. Table 2 compares average accuracy and error, where the proposed cloud detection method again gives the best result. Figure 6 compares the ROC curves of the different methods. For clouds, the ROC value of SCNN is greater than those of GAGC, SLCD and COF. A likely explanation is that the super-pixel segmentation approach of SCNN is more suitable for remote sensing objects than those of the other methods. Better cloud detection results are therefore obtained with SCNN.

6. Conclusions

This paper proposes a new cloud detection method based on a saliency model and a convolution neural network. The experimental analysis shows that this method can make up for the shortcomings of traditional cloud detection methods. The proposed method has high detection accuracy, few misjudgments and a low error detection rate. Cloud shadows can be detected effectively, with good generalization compared to other state-of-the-art methods. However, if the overlapping clouds are very complex, the proposed method has a low detection rate. In the future, we will study more advanced deep learning methods to improve detection of complex clouds.

Author’s Bios

	Jun Zhang was born in Gansu Province in 1982. He graduated from Nanjing University of Science and Technology in 2013 with a Master’s degree. He is now with the Office of Academic Affairs at Zhengzhou University of Science and Technology. His major is Computer Control Technology.
	Junjun Liu was born in Hebei Province in 1984. She graduated from Shenyang Aerospace University in 2010 with a Master’s degree. She is now with the College of Information Engineering at Zhengzhou University of Science and Technology. Her major is signals and systems and digital signal processing.

References

1. Nogueira, Penatti and Santos, Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognition 61 (2017), 539–556.
2. Napoletano, Visual descriptors for content-based retrieval of remote-sensing images, International Journal of Remote Sensing 39(5) (2017), 1–34.
3. Zhang Q.C., Yang L.T., Chen Z.K. et al., An adaptive dropout deep computation model for industrial IoT big data learning with crowdsourcing to cloud computing, IEEE Transactions on Industrial Informatics 15(4) (2019), 2330–2337.
4. Tao, Dan and Yu, Enhanced IT2FCM algorithm using object-based triangular fuzzy set modeling for remote-sensing clustering, Computers & Geoscience 118 (2018), 14–26.
5. Zhu and Woodcock C.E., Automated cloud, cloud shadow, and snow detection in multitemporal Landsat data: An algorithm designed specifically for monitoring land cover change, Remote Sensing of Environment 152 (2014), 217–234.
6. Williams J.A., Dawood A.S. and Visser S.J., FPGA-based cloud detection for real-time onboard remote sensing, in: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings., 2002, pp. 110–116.
7. Shen et al., Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery, Remote Sensing of Environment 191 (2017), 342–358.
8. Yin, Zhang and Karim, Large scale remote sensing image segmentation based on fuzzy region competition and Gaussian mixture model, IEEE Access 6 (2018), 26069–26080.
9. Zheng et al., Object-based cloud detection of multitemporal high-resolution stationary satellite images, Optical Engineering 56(7) (2017), 073103.
10. Wang, Song, Liu S.X. et al., A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method, Spectroscopy & Spectral Analysis 31(4) (2011), 1061–1064.
11. Zheng, Han et al., Onboard spectral and spatial cloud detection for hyperspectral remote sensing images, Remote Sensing 10(1) (2018), 152.
12. Jedlovec and Haines, Spatial and temporal varying thresholds for cloud detection in satellite imagery, in: 2007 IEEE International Geoscience and Remote Sensing Symposium, 2007, pp. 3329–3332.
13. Başeski and Cenaras Ç., Texture and color based cloud detection, in: 2015 7th International Conference on Recent Advances in Space Technologies (RAST), 2015, pp. 311–315.
14. Liu, Zeng and Tian, Super-Pixel Cloud Detection Using Hierarchical Fusion CNN, in: 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM), 2018, pp. 1–6.
15. Laves, Bicker, Kahrs et al., A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation, International Journal of Computer Assisted Radiology and Surgery 14(3) (2019), 483–492.
16. Chen, Yang L.T. et al., Deep convolutional computation model for feature learning on big data in internet of things, IEEE Transactions on Industrial Informatics 14(2) (2018), 790–798.
17. Chen, Yang L.T. et al., An incremental deep convolutional computation model for feature learning on industrial big data, IEEE Transactions on Industrial Informatics 15(3) (2019), 1341–1349.
18. Dey, Chatterjee, Dalai et al., A deep learning framework using convolution neural network for classification of impulse fault patterns in transformers with increased accuracy, IEEE Transactions on Dielectrics & Electrical Insulation 24(6) (2018), 3894–3897.
19. Zhang, Yang L.T., Yan et al., An efficient deep learning model to predict cloud workload for industry informatics, IEEE Transactions on Industrial Informatics 14(7) (2018), 3170–3178.
20. Gao and Chen, A canonical polyadic deep convolutional computation model for big data feature learning in Internet of Things, Future Generation Computer Systems 99 (2019), 508–516.
21. Xin, Cong, Xing et al., Combining pixel- and object-based machine learning for identification of water-body types from urban high-resolution remote-sensing imagery, IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 8(5) (2017), 2097–2110.
22. Chen, Yang L.T. et al., An improved stacked auto-encoder for network traffic flow classification, IEEE Network 32(6), 22–27.
23. Gao and Li, Approximate event detection over multi-modal sensing data, Journal of Combinatorial Optimization 32 (2016), 1002–1016.
24. and Choi, Acceleration of simple linear iterative clustering using early candidate cluster exclusion, Journal of Real-Time Image Processing 5 (2016), 1–12.
25. Mahapatra and Sun, Rigid registration of renal perfusion images using a neurobiology-based visual saliency model, Eurasip Journal on Image & Video Processing 1 (2010), 1–16.
26. Yasuda and Ahmad M.O., Development of an Active Shape Model Using the Discrete Cosine Transform, ICIAR 2017. Lecture Notes in Computer Science, vol 10317. Springer, Cham, 2017, pp. 259–267.
27. Shuang, Zhong, Xiao et al., Ground-based cloud detection using automatic graph cut, IEEE Geoscience & Remote Sensing Letters 12(6) (2017), 1342–1346.
28. and Shi, Scene learning for cloud detection on remote-sensing images, IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 8(8) (2015), 4206–4222.
29. Kang, Gao, Hao and Li, A coarse-to-fine method for cloud detection in remote sensing images, IEEE Geoscience and Remote Sensing Letters 16(1) (2019), 110–114.