Abstract
This paper introduces an end-to-end approach for land cover classification utilizing high-resolution remote sensing images (HRRSI), leveraging an Interval Type-2 Fuzzy Convolutional Neural Network (IT2FCNN). This method employs fuzzy logic for nonlinear pixel mapping and adaptively identifies the bounds of Interval Type-2 Fuzzy Sets through fuzzy convolution operations. By incorporating multivariate Type-1 membership functions into conventional convolutional kernels, we have engineered fuzzy convolutional kernels. These kernels, along with fuzzy rule libraries, activate features derived from fuzzy convolutions, facilitating the iterative refinement of the model’s fuzzy sets. This hierarchical process culminates in the development of the IT2FCNN model. When applied to the Wuhan dense labeling dataset (WHDLD), our proposed method outperformed the latest Interval Type-2 Fuzzy Neural Network by 5.27% in accuracy across nine land cover categories. Furthermore, it demonstrated a 17.52% increase in accuracy on the UC Merced Land Use Dataset (UCM Dataset), particularly in dense residential areas, and an 18.3% improvement in sparse residential areas across eleven land cover categories. These results highlight the approach’s effectiveness in mitigating the impact of regional noise on land cover classification, showcasing its strong generalization capability and superior classification accuracy.
Introduction
Land Use/Land Cover (LULC) classification is a pivotal semantic classification task in remote sensing that assigns a semantic label to each pixel in an image [1]. Expanding application prospects have made rapid, accurate, and detailed classification essential across research areas such as national basic geographical information, disaster studies, and climate research [2–6]. Meanwhile, advances in remote sensing sensors, notably high-resolution optical satellites, have yielded images with greater spatial, spectral, and temporal resolution [7, 8]. Coupled with a substantial increase in data volume, these developments have made land cover in remote sensing images more intricate and uncertain [9, 10]. The classification process faces three challenges: (1) Feature confusion: the diversity of surface features and their spectral similarity make feature types harder to differentiate and features harder to extract, which limits the accuracy and robustness of classification algorithms [11]. (2) Greater uncertainty in pixel classification: as image resolution improves, finer surface details become visible and each pixel covers a smaller area, which significantly increases pixel classification uncertainty and degrades the precision and stability of classification models [12]. (3) Greater uncertainty in training sample features within homogeneous areas: variability in micro-surface features and imaging conditions challenges the model’s generalization capability and predictive accuracy.
Despite progress, traditional methods like Fuzzy C-Means [13–17] face challenges with High-Resolution Remote Sensing Images (HRRSI) due to data variability. Recent studies employing Interval Type-2 fuzzy sets have shown promise in managing uncertainty more effectively, leading to advancements in Interval Type-2 Fuzzy Clustering (IT2FCM) [24–27] and Neural Networks (IT2FNN) [28–31]. Nonetheless, IT2FCM grapples with sensitivity to initial values, hyperparameter dependence, and sample imbalance issues [32, 33].
The IT2FNN model, merging Interval Type-2 fuzzy logic with neural networks, addresses these challenges by adeptly managing uncertainty and large-scale data. Its applications have proven superior in nonlinear system modeling and adaptive inverse control [34–42], though its complexity, computational demands, and parameter sensitivity limit its wider application.
To advance the application of IT2FNN in processing high-resolution remote sensing images, Wang et al. utilized the upper and lower bounds of the Interval Type-2 fuzzy membership of training samples and the baseline model membership as inputs to the IT2FNN model for HRRSI classification, achieving favorable outcomes [43]. Building upon this, Wang et al. developed an Interval Type-2 Gaussian Regression Neural Network model to further mitigate the impact of spatially correlated features on classification results, surpassing the 2018 Interval Type-2 fuzzy model in classification accuracy and simplifying input parameters, training, and inference processes [44]. Despite these breakthroughs, limitations remain: (1) The relatively shallow model structure and restricted neighboring pixel window limit the effective capture of global image information. (2) The reliance on empirical values for hyperparameter selection and boundary determinations reduces model flexibility and adaptability. (3) The absence of an end-to-end operation and inconsistency between the Type-1 membership function and the neural network’s loss function potentially leads to optimization objective discrepancies at different training stages, affecting overall model performance.
Addressing the limitations of Interval Type-2 Fuzzy Neural Networks, this paper, drawing inspiration from convolutional neural networks, innovatively integrates the membership function into the convolution kernel. This not only deepens the model to capture more complex features but also expands the model’s receptive field, enabling a more comprehensive understanding of global image information. Furthermore, by introducing fuzzy inference mechanisms to activate fuzzy convolutional kernel features, the prior impact of the membership function on classification performance is diminished, boosting the model’s flexibility and task adaptability. Additionally, this study unifies the loss function, achieving end-to-end classification, eradicating previous inconsistencies, streamlining the training process, and enhancing overall model performance. The main contributions of this paper are as follows:
(1) The membership function is integrated into the convolution kernel to form fuzzy convolutional kernels, showcasing the potential to extend fuzzy neural networks into deep models and to use complex nonlinear functions in lieu of traditional linear convolutional kernels.
(2) Fuzzy rule bases are introduced as activation functions to activate the features learned by fuzzy convolutional kernels, and the Interval Type-2 fuzzy sets are determined adaptively, markedly boosting the model’s expressive power and adaptability.
(3) This research not only presents a new effective method for remote sensing image land cover classification but also furthers the application of Interval Type-2 fuzzy theory in remote sensing image processing. Compared to traditional convolutional neural networks, this model offers enhanced interpretability, facilitating a deeper comprehension of its operational principles and performance advantages, and providing fresh insights and perspectives for related research fields.
The paper is structured as follows: Section 1 outlines the background of land cover classification in HRRSI and Interval Type-2 fuzzy theory applications. Section 2 explains the modeling approach, network structure, and training strategies of the proposed fuzzy convolutional neural network. Section 3 describes the setup for simulation experiments, including dataset selection and experimental configurations. Section 4 discusses the experimental results, highlighting the proposed method’s advantages and drawbacks. Finally, Section 5 summarizes the research and suggests directions for future work.
Methodology
Core components of the fuzzy convolutional neural network
Fuzzy convolutional kernel
To facilitate the learning of features within the network, a fuzzy convolutional kernel of size $fk_h \times fk_w$ is first applied to perform fuzzy filtering on the grayscale HRRSI. Assuming the image has dimensions H × W, each pixel is denoted hrrsi(h, w), where h = 0, 1, …, H − 1 and w = 0, 1, …, W − 1 are the row and column indices of the pixel, respectively. The fuzzy convolution operation on the image yields an output feature matrix A represented by:
Here, A is a Type-1 fuzzy set defined by a Type-1 multivariate membership function $\mu_A$. R is the set of pixel indices within the neighborhood window, while $k_h$ and $k_w$ are the offsets of pixel indices within the fuzzy convolution kernel. Additionally, ⌊·⌋ denotes the floor operation. The expression for the Type-1 multivariate membership function $\mu_A$ is given by:
where $(h + k_h, w + k_w) \in R$.
Here, the means m, m′ and variances σ, σ′ serve as iteration coefficients for the Type-1 multivariate membership function $\mu_A$. γ is a weight hyperparameter for pixels within the neighborhood window, determining the weight proportions between neighboring pixels and the central pixel.
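The equations for the fuzzy convolution are not reproduced in this text, so the following is only a minimal sketch of the idea, not the authors' formulation. It assumes a Gaussian form for the membership function, a convex blend of the central-pixel and neighborhood memberships controlled by γ, and a window that simply drops indices falling outside the image. The names `gaussian`, `fuzzy_conv`, `m_n` and `sigma_n` (standing in for m′ and σ′) are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian membership degree of x, in (0, 1]."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def fuzzy_conv(image, fk_h, fk_w, m, sigma, m_n, sigma_n, gamma):
    """Map every pixel to a membership degree from its value and its window.

    image: list of rows of grayscale values scaled to [0, 1].
    (m, sigma): assumed Gaussian parameters for the central pixel.
    (m_n, sigma_n): assumed Gaussian parameters for neighboring pixels.
    gamma in [0, 1]: share of the result contributed by the neighbors
    (an assumed reading of the weight hyperparameter).
    """
    H, W = len(image), len(image[0])
    rh, rw = fk_h // 2, fk_w // 2  # floor(fk / 2) offsets on each side
    out = [[0.0] * W for _ in range(H)]
    for h in range(H):
        for w in range(W):
            center = gaussian(image[h][w], m, sigma)
            neigh = []
            for kh in range(-rh, rh + 1):
                for kw in range(-rw, rw + 1):
                    if kh == 0 and kw == 0:
                        continue  # the central pixel is handled separately
                    hh, ww = h + kh, w + kw
                    if 0 <= hh < H and 0 <= ww < W:  # keep indices inside the image
                        neigh.append(gaussian(image[hh][ww], m_n, sigma_n))
            # With no neighbors (1 x 1 kernel) fall back to the central degree.
            neigh_mean = sum(neigh) / len(neigh) if neigh else center
            out[h][w] = (1.0 - gamma) * center + gamma * neigh_mean
    return out
```

In this sketch the output has the same H × W size as the input, and every value stays in [0, 1] as long as γ does, so the result can be fed to another fuzzy convolution.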
As illustrated in Fig. 1, the activation function follows fuzzy logic principles and applies an a priori rule base for fuzzy reasoning on the extracted fuzzy features. Specifically, the M fuzzy rules in the rule base (indexed m = 1, …, M) are merged to establish a mapping from the fuzzy input set A to the fuzzy output set B. The input is first matched against the antecedent set of each fuzzy rule, generating an activation set G. The activation set is then combined with the consequent set of the rule to produce the final output set B. The membership function of the antecedent mapping, $\mu_{A_m \to G_m}$, is defined as follows:

Fig. 1. Fuzzy logic activation function.
Here, t is the index of the membership function involved in the m-th rule, and ⊗ is the t-norm operator for fuzzy intersection. The membership function for the fuzzy output set B of the m-th rule is obtained as:
When dealing with membership functions, common composition methods include the maximum t-norm, minimum t-norm, and product t-norm. In this context, the activation set G involves multiple sets of membership functions, while the output set B involves the hyperplane membership functions $\mu_{F_m}$ and the activation set G. Figure 2 illustrates the feature mapping generated using the parameters from Table 1. Specifically, Fig. 2(a) and 2(b) depict the feature mapping obtained by applying the maximum t-norm preconditions with two and three sets of membership functions defined in Equation (1), respectively. Figures 2(c) and 2(d) correspond to the posterior activation feature maps of Fig. 2(a) and 2(b). Figure 2(c) employs the minimum t-norm rule, while Fig. 2(d) utilizes the product t-norm rule for fuzzy logical operations. Observation of Fig. 2 reveals that the maximum t-norm can identify significant features in the image, while the product t-norm effectively avoids the issue of feature gradient vanishing. Therefore, this paper adopts the maximum t-norm method to generate the precondition activation set and employs the product t-norm to obtain the final output set.
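The choice of operators described above can be illustrated with a small sketch. It only shows how the three composition operators behave on scalar membership degrees; the function names and the scalar `consequent_degree` (standing in for the hyperplane membership value) are hypothetical, and the paper's own term "maximum t-norm" is kept even though the maximum is conventionally classed as a t-conorm.

```python
import math


def antecedent_activation(degrees, mode="max"):
    """Combine the antecedent membership degrees of one rule into G."""
    if mode == "max":
        return max(degrees)
    if mode == "min":
        return min(degrees)
    if mode == "product":
        return math.prod(degrees)
    raise ValueError(f"unknown mode: {mode}")


def rule_output(degrees, consequent_degree, mode="max"):
    """Output membership of one rule.

    The activation G is combined with the consequent degree by the product
    operator, mirroring the combination adopted in the text.
    """
    return antecedent_activation(degrees, mode) * consequent_degree


# Two antecedent degrees and one consequent degree for a single rule.
strong = rule_output([0.2, 0.9], 0.5, mode="max")  # max keeps the dominant feature
weak = rule_output([0.2, 0.9], 0.5, mode="min")    # min keeps the weakest feature
```

One way to read the gradient remark: a product passes a nonzero gradient to both of its arguments, whereas a minimum passes gradient only to the smaller argument.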

Fig. 2. Feature mapping of fuzzy convolution. The x-axis represents the central pixel value within the fuzzy convolution kernel, the y-axis represents the neighborhood window pixel value, and the z-axis represents the resulting membership degree after activation.
Table 1. Parameters for fuzzy convolutional feature mapping
The output feature sets of successive fuzzy convolutions form a set of Type-1 fuzzy sets, namely an Interval Type-2 Fuzzy Set. While expanding the receptive field, this fuzzy convolution operation adaptively determines the upper and lower bounds of the Interval Type-2 Fuzzy Set based on input data features. Taking B′ as the output fuzzy set from the preceding layer, B″ as the final output fuzzy set for this layer, and B‴ as the output fuzzy set for the next layer, sets B″ and B‴ can be used to form an Interval Type-2 Fuzzy Set. Specifically, by employing Equations (1)–(3), expressions for B″ and B‴ can be derived as shown in Equation (5). The upper and lower bounds of the Interval Type-2 Fuzzy Set can then be obtained, as demonstrated in Equation (6).
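Equation (6) is not reproduced in this text. As a hedged illustration, the sketch below uses the standard construction for enclosing a footprint of uncertainty between two Type-1 membership maps: the element-wise minimum as the lower bound and the element-wise maximum as the upper bound. Whether the paper uses exactly this form is an assumption, and `interval_bounds` is a hypothetical name.

```python
def interval_bounds(mu_a, mu_b):
    """Element-wise lower and upper membership bounds of two Type-1 maps.

    mu_a, mu_b: lists of rows of membership degrees with identical shape,
    for example the outputs of two successive fuzzy convolutions.
    """
    lower = [[min(a, b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(mu_a, mu_b)]
    upper = [[max(a, b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(mu_a, mu_b)]
    return lower, upper


# Two small membership maps standing in for successive layer outputs.
lower, upper = interval_bounds([[0.2, 0.9]], [[0.5, 0.4]])
```

The gap `upper - lower` at each pixel is then a simple measure of how uncertain the membership of that pixel is.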
Figure 3 presents fuzzy convolution feature mapping using parameters from Table 2. Here, Fig. 3(a)–(c) correspond to the feature mappings of the B′, B″, and B‴ iterations of fuzzy convolution, respectively, and Fig. 3(d) displays the uncertain region formed by B′, B″, and B‴. The neighborhood value slices for Fig. 3(d) are shown in Fig. 3(e)–(f). Figure 3(a)–(d) show how feature extraction evolves as the model iteratively processes features: with each fuzzy convolution operation, the membership degrees of the feature map change. The multiple fuzzy convolutions create an Interval Type-2 Fuzzy Model with clear upper and lower bounds (Fig. 3(d)). The feature slice graphs (Fig. 3(e)–(f)) show that the fuzzy convolution process continuously alters the shape and region of the slice graph, which further indicates the model’s ability to capture fine image details.

Fig. 3. Feature mapping of fuzzy convolution and its slices. (a-c) Feature maps generated using 1st-, 2nd-, and 3rd-order fuzzy convolutions. (d) The uncertainty region formed by these feature maps. (e-h) Different slices for the neighborhood values in (d).
Table 2. Parameters for fuzzy convolution feature mapping
The Interval Type-2 Fuzzy Convolutional Neural Network (IT2FCNN) architecture, as illustrated in Fig. 4, incorporates a feature extraction phase characterized by four consecutive fuzzy convolution sets, enhanced by two skip connections. Each set encompasses fuzzy convolution kernels, fuzzy activation functions, and batch normalization components. The fuzzy convolution layers leverage membership functions to capture the uncertainty inherent in pixel memberships, while fuzzy activation functions, informed by a fuzzy rule base, bolster the network’s robustness. Batch normalization optimizes the stability of data distribution and expedites the training process, thereby facilitating convergence. The integration of skip connections amalgamates shallow and deep feature layers, effectively leveraging both low-level and high-level feature information to mitigate the vanishing gradient problem and boost overall model performance.

Fig. 4. Network model architecture.
Significantly, the feature extraction process within this model maintains the original image size, achieving feature compression solely through the fuzzy convolution layers. This ensures the feature map size remains constant across layers, varying only in channel number. As the network deepens, the dimensions of the fuzzy convolution kernels incrementally expand, emphasizing comprehensive feature representation. This facilitates multi-scale feature extraction and enhances the network’s robustness.
The network architecture also includes fully connected layers designed to discern the nonlinear relationships among membership features across various levels and scales. This approach adeptly circumvents the spatial correlation loss typically associated with fully connected layers in conventional convolutional neural networks. Moreover, the fuzzy convolution layers, beyond compressing the image during feature extraction, are adept at extracting spatial information across multiple scales. This dual capability allows the network to concurrently learn intricate local details and broader regional texture features. To address potential salt-and-pepper noise introduced by fully connected layers and elevate the quality of image processing, a post-processing step employing the mode substitution method is utilized, ensuring accurate final label determination. Furthermore, the cross-entropy loss function is selected for model backpropagation, optimizing the learning process and model performance.
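The mode substitution step can be sketched as a majority filter over the predicted label map. The text does not give the window size or the tie-breaking rule, so the 3 × 3 default, the "keep the current label on ties" rule, and the name `mode_substitution` are assumptions made for illustration.

```python
from collections import Counter


def mode_substitution(labels, size=3):
    """Replace each label by the most frequent label in its size x size window.

    labels: list of rows of integer class labels.
    A pixel keeps its label when that label is already among the most
    frequent ones in the window (assumed tie-breaking rule).
    """
    H, W = len(labels), len(labels[0])
    r = size // 2
    out = [row[:] for row in labels]  # work on a copy, read from the original
    for h in range(H):
        for w in range(W):
            window = [labels[hh][ww]
                      for hh in range(max(0, h - r), min(H, h + r + 1))
                      for ww in range(max(0, w - r), min(W, w + r + 1))]
            counts = Counter(window)
            best = max(counts.values())
            if counts[labels[h][w]] < best:
                out[h][w] = counts.most_common(1)[0][0]
    return out


# A single isolated label surrounded by another class is smoothed away.
cleaned = mode_substitution([[1, 1, 1], [1, 2, 1], [1, 1, 1]])
```

Because the filter reads from the original map and writes to a copy, the result does not depend on the order in which pixels are visited.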
Experimental dataset
This study utilizes the UC Merced Land Use Dataset [45] and the Wuhan Dense Labeling Dataset (WHDLD) [46, 47] for experiments. The UC Merced dataset encompasses 21 categories, each comprising 100 images, and is augmented by the DLRSD [48] dataset, which provides 17 pixel-level labels such as airplanes, buildings, and water. The WHDLD consists of 4940 images across six categories: bare soil, buildings, pavement, roads, vegetation, and water. To evaluate the network’s performance on non-color features, images from both datasets are converted to grayscale to focus on structure and texture. Figure 5 showcases samples from the UC Merced dataset alongside pixel-level labels from DLRSD.

Fig. 5. Images of various land cover types from the UCM Dataset and pixel-level classification labels provided by DLRSD.
To evaluate the performance of the proposed algorithm, this study conducts comparisons against a diverse set of algorithms, including Deep Forest (DF) [49], Interval Type-2 Fuzzy Model with Gaussian Membership (IT2FM_GM) [44], Interval Type-2 Fuzzy Neural Network (IT2FNN) [50], Interval Type-2 Fuzzy Membership Function Model with Adaptive Weighted Average (IT2FM_AWA) [43], Interval Type-2 Fuzzy Neural Network Model with Gaussian Regression Membership (IT2FNN_GRM) [44], Interval Type-2 Fuzzy Membership Function Model with Adaptive Gaussian Regression (IT2FM_GA) [51], U-Net [52], and an Improved U-Net for Semantic Segmentation of Remote Sensing Images Based on a Combined Attention Mechanism (CTMU-Net) [53].
DF, a deep model that does not rely on neural networks, is acknowledged for its context and structural awareness. This characteristic allows DF to adapt its model complexity to the dataset, and it performs notably well across various datasets. On some small-scale datasets, DF surpasses certain deep neural network algorithms even with default settings, and it therefore serves as the baseline algorithm in this study.
IT2FM_GM, IT2FNN, IT2FM_AWA, IT2FNN_GRM, and IT2FM_GA, all supervised image classification algorithms, are grounded in Interval Type-2 Fuzzy Sets. These sets provide superior feature representation capabilities over Type-1 Fuzzy Sets, facilitating more effective management of uncertain information in remote sensing images. Specifically, IT2FNN enhances IT2FM_GM by integrating neural networks for end-to-end learning and prediction. IT2FM_AWA enhances noise robustness by incorporating neighborhood-weighted averages into the base membership functions, effectively leveraging spatial information. The IT2FNN_GRM algorithm, by combining the Gaussian Regression Model (GRM) with IT2FNN, improves feature representation for images with complex spatial details. In this experiment, IT2FNN and IT2FM_GA adaptively iterate to identify optimal parameters, while IT2FNN_GRM, IT2FM_GM, and IT2FM_AWA require manual parameter adjustments for optimal classification results.
U-Net, a seminal deep learning structure, adopts VGG16 [54] as its foundational network framework in this paper, with feature map quantities constrained to ensure fair comparison with the proposed model. CTMU-Net, an enhancement of U-Net, features a dual-branch structure and a combination attention module, adept at extracting both local and global information. As with U-Net, feature map numbers are limited to ensure a fair comparison with the proposed model.
For the UC Merced Land Use Dataset, all methods randomly allocate 40% of the samples for training. The WHDLD contains a much larger volume of images, and methods such as DF lack GPU acceleration support, so the dataset is divided into 20 groups. Within each group, 40% of the samples are randomly selected for training, and overall performance on the dataset is reported as the average across the groups.
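The grouped protocol for the WHDLD can be sketched as follows. How the images are assigned to groups is not stated, so the random shuffling before grouping, the fixed seed, and the names `grouped_split` and `average_score` are assumptions of this illustration.

```python
import random


def grouped_split(n_samples, n_groups=20, train_fraction=0.4, seed=0):
    """Split sample indices into groups, then draw a training subset per group.

    Returns a list of (train_indices, test_indices) pairs, one per group.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # assumed: groups are formed at random
    splits = []
    for g in range(n_groups):
        group = indices[g::n_groups]  # every n_groups-th index, starting at g
        n_train = round(train_fraction * len(group))
        train = set(rng.sample(group, n_train))
        test = [i for i in group if i not in train]
        splits.append((sorted(train), test))
    return splits


def average_score(scores):
    """Average a per-group metric into one dataset-level value."""
    return sum(scores) / len(scores)


# 4940 images in 20 groups of 247, each with 99 training and 148 test images.
splits = grouped_split(4940)
```

Each method would then be trained and evaluated once per group, with `average_score` applied to the per-group results.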
Experimental performance evaluation metrics
This study employs four distinct metrics to assess the performance of various algorithms in classification tasks across different datasets. These metrics are Overall Accuracy (OA), Average Accuracy (AA), the Kappa coefficient, and the F1-score. OA measures the total classification accuracy, representing the proportion of correctly classified instances in the dataset. AA signifies the model’s average performance across different classes, providing insight into how uniformly the model performs across all categories. The Kappa coefficient acts as a statistical measure that assesses the accuracy of model classification while correcting for chance agreement. The F1-score is a comprehensive metric that combines Precision and Recall, reflecting the balance between the model’s ability to correctly classify positive cases and its tendency to avoid misclassifying negative cases, as detailed in Equations (7)–(9).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
Here, TP, FP and FN represent the quantities of True Positives, False Positives, and False Negatives, respectively, in the confusion matrix.
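For reference, the four metrics can be computed from a confusion matrix as in the sketch below. It uses the standard definitions: OA as the trace divided by the total, AA as the mean per-class recall, Cohen's Kappa, and F1 as the harmonic mean of Precision and Recall. Averaging F1 over classes with equal weight (macro F1) is an assumption, since the text does not say how per-class values are aggregated, and `classification_metrics` is a hypothetical name.

```python
def classification_metrics(cm):
    """Compute OA, AA, Kappa and macro F1 from a square confusion matrix.

    cm[i][j] is the number of pixels of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(k)]
    actual = [sum(cm[i]) for i in range(k)]                          # TP + FN
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # TP + FP

    recall = [tp[i] / actual[i] if actual[i] else 0.0 for i in range(k)]
    precision = [tp[i] / predicted[i] if predicted[i] else 0.0 for i in range(k)]
    f1 = [2 * p * r / (p + r) if (p + r) else 0.0
          for p, r in zip(precision, recall)]

    oa = sum(tp) / total
    aa = sum(recall) / k
    # Expected agreement by chance, from the row and column marginals.
    pe = sum(a * p for a, p in zip(actual, predicted)) / (total * total)
    kappa = (oa - pe) / (1.0 - pe) if pe < 1.0 else 0.0
    return {"OA": oa, "AA": aa, "Kappa": kappa, "F1": sum(f1) / k}


# Two-class example: 40 + 45 of 100 pixels lie on the diagonal.
metrics = classification_metrics([[40, 10], [5, 45]])
```

In the two-class example the diagonal holds 85 of 100 pixels, so OA is 0.85, and the chance agreement is 0.5, which gives a Kappa of 0.7.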
UCM dataset analysis: dense residential area
In dense residential areas, pixel-level classification encompasses labels for Buildings, Cars, Grass, Trees, Bare-soil, and Pavement, each represented by distinct colors and detailed in proportions in Table 3. Buildings are identified as the predominant feature, whereas Cars, Grass, and Trees appear in comparatively lower proportions, with Bare-soil and Pavement being more evenly distributed.
Table 3. Proportions and corresponding color labels of pixel-level labels in the dense residential area
To evaluate classification efficacy across these complex scenes, various algorithms were compared, including DF, IT2FM_GM, IT2FNN, IT2FM_AWA, IT2FNN_GRM, IT2FM_GA, U-Net, CTMU-Net, and the proposed algorithm. The results, detailed in Table 4 and Fig. 6, reveal the challenges presented by the diversity of semantic information and strong pixel correlations. Specifically, the IT2FM_GM method, reliant on pixel grayscale information, encounters significant salt-and-pepper noise, impacting its accuracy negatively. While the inclusion of a neural network model in IT2FNN improves performance, limitations arise due to the network’s depth. Conversely, fuzzy neural network models like IT2FM_AWA and IT2FNN_GRM, which incorporate neighborhood relationships, demonstrate considerable advantages, notably enhancing classification accuracy over algorithms that only analyze single-pixel features.

Fig. 6. Pixel-level classification results in the dense residential area. (a0-f0) Images of the dense residential area. (a1-f1) Pixel-level classification labels from DLRSD. (a2-f2) DF model. (a3-f3) IT2FM_GM model. (a4-f4) IT2FNN model. (a5-f5) IT2FM_AWA model. (a6-f6) IT2FNN_GRM model. (a7-f7) IT2FM_GA. (a8-f8) U-Net. (a9-f9) CTMU-Net. (a10-f10) Our model.
U-Net attains commendable overall accuracy (OA) but struggles with imbalanced sample distributions, particularly when classifying Cars and Grass. This results in lower average accuracy (AA) and F1 scores than the model proposed in this paper. CTMU-Net, an enhanced version of U-Net with a modified loss function, learns better from under-represented categories and achieves a high AA. However, it still does not match the proposed model in AA, Kappa, and F1 scores.
The proposed model significantly improves the network’s learning efficiency by integrating fuzzy convolutional kernels and fuzzy activation functions. It effectively addresses the common issue of salt-and-pepper noise in image classification through the feature compression capability of fuzzy convolutional kernels and a majority substitution post-processing strategy, thus markedly enhancing accuracy. Notably, this model achieves exceptional performance without reducing feature map dimensions.
Challenges in classifying Cars, Grass, and Trees, as observed in most comparative algorithms, largely arise from their similar grayscale characteristics in images, complicating quick visual differentiation. Additionally, the heterogeneity in building rooftops, due to varied lighting angles and materials, adds to the classification complexity. The proposed model adeptly overcomes these challenges by leveraging fuzzy membership functions to create fuzzy convolutional kernels, enhancing feature extraction and generalization capabilities. As a result, the proposed method demonstrates superior classification performance across all categories. Compared to other algorithms in Table 4, our model achieves a significant increase in overall accuracy—about 2% higher than U-Net and CTMU-Net, approximately 20% higher than DF, and around 16% higher than IT2FNN_GRM. Furthermore, the proposed model excels in additional evaluation metrics, achieving an average accuracy of 88.39%, a Kappa coefficient of 0.9252, and an F1 score of 0.9186.
Table 4. Overall classification evaluation metrics and classification accuracy of different pixel-level labels in the dense residential area
UCM dataset analysis: sparse residential area
In comparison to dense and medium-density residential areas, sparse residential regions present a more intricate challenge for land cover classification due to the diversity of 11 pixel-level categories, including Buildings, Court, Chaparral, Cars, Grass, Bare Soil, Trees, Field, Water, Sand, and Pavement. Detailed pixel proportions are presented in Table 5. The complexity of these environments, coupled with a rich variety of labels, sets a demanding benchmark for classification algorithms. Our model, as detailed in Tables 6 and 7, distinguishes itself with exceptional performance across key evaluation metrics, achieving an Overall Accuracy (OA) of 91.48% and an Average Accuracy (AA) of 88.98%. This demonstrates its consistent effectiveness in accurately identifying and classifying a wide range of land cover types. The Kappa coefficient, at 0.9015, attests to the model’s reliability, while an F1 score of 0.9101 underscores its balanced precision and recall capabilities.
Table 5. Proportions and corresponding color labels of pixel-level labels in the sparse residential area
Table 6. Classification result accuracy evaluation metrics for sparse residential area
Table 7. Classification accuracy of different pixel-level labels in sparse residential area
While U-Net exhibits superior performance in terms of OA and the Kappa coefficient, indicating greater efficiency in broad image segmentation tasks, it significantly underperforms in precisely segmenting specific categories such as Court and Cars. This limitation leads to U-Net’s AA and F1 score being notably lower than those achieved by our model. Conversely, CTMU-Net, with an AA of 89.52%, demonstrates robust performance. However, as depicted in Fig. 7, particularly through examples e9 and e10, CTMU-Net faces challenges in fully capturing and depicting complex terrain details in grayscale images despite its advanced dual-branch structure and attention module. This underscores the difficulties inherent in accurately classifying land cover in sparse residential areas and highlights the necessity for sophisticated modeling approaches capable of navigating the intricacies of such diverse environments.

Fig. 7. Pixel-level classification results in the sparse residential area. (a0-f0) Images of the sparse residential area. (a1-f1) Pixel-level classification labels from DLRSD. (a2-f2) DF model. (a3-f3) IT2FM_GM model. (a4-f4) IT2FNN model. (a5-f5) IT2FM_AWA model. (a6-f6) IT2FNN_GRM model. (a7-f7) IT2FM_GA. (a8-f8) U-Net. (a9-f9) CTMU-Net. (a10-f10) Our model.
Ablation study
A series of ablation experiments was conducted to isolate the impact of individual components of the fuzzy convolutional neural network on overall model performance. The findings are illustrated in Fig. 8 and Table 8. The first focus was the fuzzy activation function. Comparing the standard model, which is equipped with a fuzzy activation function, against a variant without it (referred to as the NF-AF Model) showed a notable degradation in the latter’s performance on image segmentation tasks. This decline underscores the role of fuzzy activation functions in the model’s learning process and in its adaptation to the complex nonlinear relationships inherent in the training data. In particular, using specific membership functions as complex nonlinear operators through fuzzy activation functions enables the model to capture essential features effectively and significantly mitigates training instability.

Fig. 8. Pixel-level classification results in the sparse residential area. (a0-f0) Images of the sparse residential area. (a1-f1) Pixel-level classification labels from DLRSD. (a2-f2) NF-AF Model. (a3-f3) 1-Layer Model. (a4-f4) 2-Layer Model. (a5-f5) 3-Layer Model. (a6-f6) Our model.
Table 8. Ablation study: classification result accuracy evaluation metrics for sparse residential area
Further investigation into the impact of network depth on model performance was conducted, examining models with varying depths, from 1 to 5 layers. A progressive enhancement in performance correlating with increased network depth was observed, highlighting the critical importance of deepening the network structure to enhance the model’s perceptual capabilities and proficiency in managing complex tasks. Such findings not only affirm the viability of developing deep fuzzy neural networks but also suggest the potential superiority of employing complex nonlinear functions over traditional linear convolutional mappings in enhancing model performance. This exploration into the effects of different network components not only validates existing theories but also opens avenues for future research aimed at optimizing fuzzy convolutional neural network architectures for advanced image segmentation and classification tasks.
WHDLD dataset analysis
In comparison to the UC Merced Land Use Dataset, the primary challenge in pixel classification for the WHDLD dataset arises from increased pixel uncertainty due to variations in illumination under grayscale conditions. These variations lead to decreased distinguishability of pixel features across different land cover categories. Table 9 details the distribution proportion of each category label. Despite such uncertainty, the model proposed in this study demonstrates exceptional performance across all key performance evaluation metrics. Specifically, the overall classification accuracy of the model reached 89.52%, with an average accuracy of 84.85%, a Kappa coefficient of 0.8275, and an F1 score of 0.8310, as shown in Table 10.

Fig. 9. Pixel-level classification results in the WHDLD. (a0-f0) Images from the WHDLD. (a1-f1) Pixel-level classification labels from the WHDLD. (a2-f2) DF model. (a3-f3) IT2FM_GM model. (a4-f4) IT2FNN model. (a5-f5) IT2FM_AWA model. (a6-f6) IT2FNN_GRM model. (a7-f7) IT2FM_GA. (a8-f8) U-Net. (a9-f9) CTMU-Net. (a10-f10) Our model.
Table 9. Proportions and corresponding color labels of pixel-level labels in the WHDLD
Table 10. Overall classification evaluation metrics and classification accuracy of different pixel-level labels in the WHDLD
Although the U-Net model exhibits slight advantages in segmentation precision for certain land cover types (e.g., “Water” and “Building”), it performs poorly on imbalanced sample types (e.g., “Bare soil” and “Road”). In contrast, the model proposed in this research exhibits superior adaptability, not only offering specialized adjustments for imbalanced sample issues but also demonstrating robust feature extraction capability. Notably, the design of this model does not incorporate pooling layers; instead, it employs fuzzy convolution and a simplified mode substitution post-processing technique to effectively preserve complex boundary information in the presence of slight salt-and-pepper noise. Panels d0 to d10 of Fig. 9 clearly illustrate the model’s superiority in retaining boundary information, providing a more reliable solution for pixel-level land cover classification tasks in practical applications.
Conclusion
Addressing the semantic classification challenges inherent in the diversity and uncertainty of land surface cover in High-Resolution Remote Sensing Images (HRRSI), this paper introduces a deep-structured Interval Type-2 Fuzzy Convolutional Neural Network (IT2FCNN) model. This innovative model weaves pixel membership functions into convolutional kernels and utilizes a fuzzy rule base for the activation of extracted fuzzy features, thereby augmenting the network’s robustness. By continuously learning fuzzy convolutional kernels, the model adeptly identifies the upper and lower bounds of the Interval Type-2 Fuzzy Model, facilitating precise representation of uncertain features within complex datasets.
Experimental analysis conducted on the UC Merced Land Use Dataset and the WHDLD dataset underscores the proposed model’s significant superiority in critical performance metrics—overall accuracy, average accuracy, Kappa coefficient, and F1 score—over existing techniques. These results affirm the feasibility of extending fuzzy neural networks into deep learning paradigms and the benefits of enriching convolutional kernels with complex nonlinear functions. Nonetheless, the model faces limitations, such as potential information loss due to the cross-entropy loss function’s incompatibility with fuzzy information and an increase in computational complexity from linear layer parameters used in the defuzzification process, which may hinder generalization capabilities.
In light of these challenges, future research will pivot towards incorporating a wider array of membership functions and an enriched rule base tailored for interval type-2 fuzzy logic systems. This will involve exploring more effective loss functions and defuzzification strategies to boost the model’s proficiency in navigating complex scenarios and deepening its interpretative depth. Additionally, integrating cutting-edge feature learning mechanisms inspired by Cov-Net [55] and TDKD-Net [56], such as asymmetric convolutions, attention mechanisms, tensor decomposition, and knowledge distillation techniques, is anticipated to further refine the feature learning and representation capabilities of fuzzy convolutional kernels. This approach aims to deliver a more accurate and robust framework for the semantic classification of high-resolution remote sensing imagery, enhancing our understanding and interaction with the Earth’s surface.
Acknowledgment
This research was funded by the National Natural Science Foundation of China Youth Project, grant number 41801368, and Fundamental Research Youth Project of the Education Department of Liaoning Province, grant number LJKOZ2021154.
Conflicts of interest
The authors declare no conflict of interest.
