Abstract
Neighborhood classifier, a common classification method, is applied in pattern recognition and data mining. The neighborhood classifier mainly relies on the majority voting strategy to judge each category. This strategy only considers the number of samples in the neighborhood but ignores the distribution of samples, which leads to a decreased classification accuracy. To overcome the shortcomings and improve the classification performance, D-S evidence theory is applied to represent the evidence information support of other samples in the neighborhood, and the distance between samples in the neighborhood is taken into account. In this paper, a novel attribute reduction method of neighborhood rough set with a dynamic updating strategy is developed. Different from the traditional heuristic algorithm, the termination threshold of the proposed reduction algorithm is dynamically optimized. Therefore, when the attribute significance is not monotonic, this method can retrieve a better value, in contrast to the traditional method. Moreover, a new classification approach based on D-S evidence theory is proposed. Compared with the classical neighborhood classifier, this method considers the distribution of samples in the neighborhood, and evidence theory is applied to describe the closeness between samples. Finally, datasets from the UCI database are used to indicate that the improved reduction can achieve a lower neighborhood decision error rate than classical heuristic reduction. In addition, the improved classifier acquires higher classification performance in contrast to the traditional neighborhood classifier. This research provides a new direction for improving the accuracy of neighborhood classification.
Introduction
Granular computing simulates human thinking and solves large-scale and complex problems, which has broad application prospects in fields such as artificial intelligence, knowledge discovery, and image processing. As the main tools of granular computing, rough set, soft set, and fuzzy set are combined to solve uncertain decision-making problems. For example, the possibility m-polar fuzzy soft set is the combination of soft set and fuzzy set [15]. This model effectively handles the uncertainties during the decision-making problems. Soft β-rough set was generated by the combination of soft set and rough set. Through soft β-rough approximation and soft β-rough topology, this model determines the people most likely to be infected with new coronary pneumonia [9]. Based on fuzzy soft set, a new fuzzy soft expert system for lung cancer disease prediction is proposed [16].
Many strategies are applied to solve uncertainty and ambiguity issues [3]. The rough set is also applied to describe and address uncertain problems. The classical rough set model was first proposed by Polish mathematician Z. Pawlak [40, 41] in the last century. Using upper and lower approximation sets, the target concept can be characterized. Rough set theory has been further developed in recent years. For example, the target concept is integrated with topology to construct the generalized approximation operator [10]. In addition, rough set has also been researched in practical applications. By means of j-neighborhoods, complementary j-neighborhoods and j-adhesions, several types of j-covering approximations based rough set is proposed. The application of the novel types of covering-based rough set to the rheumatic fever is developed [20]. Although the rough set has developed to a certain degree, when encountering complex real-world issues, the traditional rough set is limited due to the exact demands of the definitions of lower and upper approximation sets. To address complex issues such as multisource data, incomplete data, numeric data, and fuzzy data, the classical rough set method has been further developed.
(1) Classical rough set cannot reflect the tolerance of the concept. To fill this gap, Yao et al. first combined probabilistic rough set and decision-making and proposed the Decision-Theoretic Rough Set (DTRS) model [35]. In light of DTRS, the framework of three-way decision was proposed [35], and many research results has been achieved [27]. For example, Huang et al. proposed a novel three-way neighborhood decision model to deal with incomplete hybrid information [25]. Wang et al. proposed a three-way decision model based on cumulative prospect theory [31]. Li et al. adopted a deep neural network framework for three-way decision image data analysis [11].
(2) Considering different levels of granularity, many research results have been achieved [39, 19]. For example, Zhu et al. developed a multi-granularity distance learning technique and learned distance variable in different granular spaces [21]. Huang et al. proposed two novel MG-IF-DTRS models, which are the fusion of multi-granulation, intuitionistic fuzzy, and risk decision [1]. Xu et al. constructed a novel information fusion approach, and then the uncertainty measures of fusion progression were studied [32]. Kong et al. also studied the multi-granulation method of information fusion in multi-source decision information system [26].
(3) As far as fuzzy environment is concerned, fuzzy rough set has been proposed, and some related issues have been investigated [6, 7]. For example, Ye et al. proposed a novel fuzzy rough set method, which handled multi-criteria decision-making problems under uncertain and fuzzy environments [13]. Zhan et al. developed a covering-based variable precision fuzzy rough set model to address the problem of misclassifications in decision-making [14]. Wang et al. proposed four uncertainty measures by introducing the notion of self-information to fuzzy rough approximations [5].
(4) In some cases, the data is presented in an incomplete form, because data measurement errors often occur in practical applications. Therefore, how to use rough set theory to mine knowledge from incomplete information systems is a popular topic [8]. For instance, Luo et al. proposed a novel dynamic probabilistic rough set model for the sake of processing incomplete data [2]. Qian et al. introduced a multi-granulation notion to an information system, which applies multiple tolerance relations to approximate the target concept [37]. Luo et al. studied several semantics issues between incomplete information system and rough set theory [12]. Moreover, Yang et al. performed systematic and landmark work on incomplete information systems and rough set theory, of which several achievements are referred to by researchers in the field of granular computing and knowledge discovery [34].
In addition, as far as numeric data are considered, the traditional approach usually involves transforming them into discrete data, which usually excludes some important information. To solve this issue, Lin introduced the neighborhood notion to rough set theory and proposed a novel rough set model [30]. Hu et al. redefined and interpreted Lin’s neighborhood rough set model from the perspective of the granular computing of mixed data [22]. In light of the k-nearest neighbor classification approach, Hu et al. proposed a rough set-based classification strategy called the neighborhood classifier [23]. Neighborhood rough set is an effective tool to tackle practical problems, and many scholars focus on this popular research topic [17, 18]. The classical neighborhood rough set also has several shortcomings. The classification method is simple, and the classification performance of attribute reduction reduct is insufficient. To solve these problems, some scholars have conducted additional research. For example, Yang et al. fused the pseudolabel to the neighborhood relationship to improve the performance of the neighborhood rough set model [33]. Wang et al. constructed new uncertainty measures in a neighborhood rough set framework and adopted it for attribute reduction [4]. These studies optimize the neighborhood relation and attribute reduction part of the neighborhood rough set, add restrictions to the neighborhood relation, and propose a new attribute importance index in the attribute reduction part. It should be noted that the attribute reduction and classification processes based on neighborhood rough set still have several shortcomings. The detail issues are as follows:
First, traditional heuristic attribute reduction may be insufficient when the attribute significance index is not monotonous. Because its termination threshold is fixed, it cannot retrieve an optimal value of classification. The neighborhood decision error rate (NDER) is selected as an example, as shown in Fig. 1.

NDER variation with attribute increasing.
When the NDER value of the subset decreases to the threshold, i.e., the curve reaches point A, the algorithm is terminated. Compared with point B, the neighborhood decision error of A is far larger than that of point B. Therefore, the classical heuristic algorithm lacks good reduction performance for this category of data.
Second, in the traditional neighborhood classifier, majority voting strategy is widely applied, which is based on the number of samples in the neighborhood. It is simple, however, the distribution of samples has not been considered. As shown in Fig. 2, according to the traditional voting mechanism, one may assign the label of x to “+”. However, from the distribution of data in the neighborhood, one may realize that the data points with label “Δ” are closer than other data points with label “+”. From this point of view, the class label of x may be “Δ”. Therefore, the data distribution information should be considered when classification method is constructed.

Majority voting strategy.
For the issues mentioned above, this paper puts forward a novel solution in two stages. On the one hand, a dynamic updating method is proposed to improve the attribute reduction algorithm when the attribute significance index is nonmonotonic. The termination threshold of the proposed reduction algorithm is dynamically optimized based on the searched values. Therefore, this method can retrieve the optimal value according to the termination threshold. Different from the traditional reduction set, which maintains the same classification effect as the original attribute set, the reduction set obtained by this algorithm can achieve a classification performance far superior to the original attribute set. The reduction set obtained by the proposed algorithm can better represent the resolution of the original attribute set and obtain good classification performance. On the other hand, to describe the closeness between samples, D-S evidence theory is also an effective method. For example, Denoeux first applied D-S evidence theory to the kNN algorithm [29]. Then, Zhang et al. further developed D-S evidence theory and applied it to an ensemble classification algorithm to improve the performance of the k-nearest neighbor classification method [38]. In light of the advantage of D-S evidence theory, this paper introduces D-S evidence theory into data information fusion. Through the information distribution of samples in the neighborhood, different classes of sample information are fused to obtain the evidence support information of samples to improve the accuracy of classification. Then, an improved D-S evidence theory-based neighborhood rough classification approach is discussed. The improved classification method refers to more information, which is more scientific and comprehensive, and can effectively improve the accuracy of classification.
The main structure of this paper is organized as follows: Section 2 mainly introduces the neighborhood rough set approach and its classifier implementation method. Section 3 proposes a novel classification method based on a neighborhood rough set that includes the implementations of attribute reduction and D-S neighborhood classification. Section 4 presents several comparisons and analyses the proposed algorithm of this paper through nine UCI datasets. Finally, we summarize the full paper in Section 5.
Neighborhood rough set and classification mechanism
Generally, the structural data of information system can be defined as IS =< U, AT, V, f >, where U is a non-empty set of samples, i.e., U = {x1, x2, . . . , x n }, and it is called universe. AT is the set of attributes, and can be denoted as: AT = {a1, a2, . . . , a m , d}, in which conditional attribute set C = {a1, a2, . . . , a m } and decision attribute set D = {d} are considered simultaneously. V is the value range of attribute a, and f is the attribute mapping, f : U × a → V a .
Δ (x
i
, x
j
) ≥0; Δ (x
i
, x
j
) =0, if x
i
= x
j
; Δ (x
i
, x
j
) = Δ (xj,x
i
); Δ (x
i
, x
k
) ≤ Δ (x
i
, x
j
) + Δ (x
j
, x
k
).
In this paper, the radial basis function kernel distance is selected. This is also known as Gaussian kernel distance. In this framework, the distance between any two samples x i , x j is written as
In the neighborhood classification process, let δ
B
(x
i
) be the neighborhood of testing sample x
i
, let x
j
become the sample within neighborhood δ
B
(x
i
), and let its class label be expressed as ω
j
. In terms of majority voting strategy, the class label of x
i
can be obtained as follows:
D-S evidence theory is a reasoning theory, that was first proposed by Dempster in 1967 and further developed by Shafer in 1976. Hence, it is also known as Dempster-Shafer’s evidence theory [29]. Suppose Ω is an exhaustive set of all possible values of variable x, and the elements in Ω are mutually exclusive. In evidence theory, Ω is called frame of discernment.
In above definition, M is called the basic probability distribution function of 2 Ω and M (A) is the basic probability number of A. In addition, for any A ⊆ Ω, if M (A) ≠0, then A is a focal elementof M.
In virtue of the basic probability distribution function, the belief function Bel and the plausibility function Pl of the proposition can be defined as follows:
Bel (A) represents the overall trust degree of A, and Pl (A) indicates the degree of not against A.
An improved attribute reduction approach
Attribute reduction is one of the core issues of rough set theory [24]. By deleting redundant attributes, the data are reduced in dimension to achieve the purpose of data processing. Candidate attributes are selected by comparing the significance of attributes. Generally, approximate quality and information entropy are common indicators of attribute significance. Hu et al. considered that the samples in the boundary domain are not necessarily misclassified, and only a few samples are misclassified [24]. Therefore, in light of neighborhood rough set, Hu et al. proposed the neighborhood classification error rate notion (NDER) [24] from the perspective of sample error classification, and used it as an index to assess attribute subspace.
Let δ
B
(x
i
) be the neighborhood of sample x
i
, and P (ω
j
|δ
B
(x
i
)) (j = 1, 2, . . . , c) is the class probability of δ
B
(x
i
) under class ω
j
. Owing to the class probability, the class label of sample x
i
can be assigned, i.e., if
Therefore, the neghborhood error rate can be denoted as follows:
In light of the neighborhood decision error rate, Hu et al. further proposed a novel attribute reduction approach, which is based on the neighborhood decision error rate [24]. The detailed definition can be found as follows.
NDER
A
≤ NDER
AT
; ∀A′ ⊂ A, then NDERA′ > NDER
A
.
The reduct set defined above deletes the redundant attributes in the information system, and reduces the neighborhood decision error rate of attribute set. In an information system, the reduced attribute subspace based on the neighborhood decision error rate can completely replace the original attribute set.
Having the definition of neighborhood decision error rate reduction in mind, ∀B ⊆ AT, a ∈ AT - B, the significance of attribute A with respect to attribute subset B is
To achieve the above attribute significance, the traditional heuristic algorithm can be applied. The classical heuristic algorithm is a commonly employed method for attribute reduction. In this strategy, attributes are selected by finding the maximum attribute significance. The classical heuristic algorithm is shown in
The time complexity of Algorithm 1 is O (|U|2 × |AT|2) where the length of attributes tested is |AT| and the complexity of calculating significance of candidate attributes is O (|U|2 × |AT|). The classical heuristic algorithm is employed in most situations. However, there are some shortcomings of Algorithm 1 in some cases, which can be indicated in
In the rough set model, many indexes of attribute significance do not change monotonically with the increase of attributes, as does the neighborhood error rate. To obtain the attribute reduct set of this category of dataset, an improved heuristic reduction is developed by dynamically updating the standard to gain the minimum value of the neighborhood error rate. The algorithm of searching attribute reduct is shown in
The time complexity of Algorithm 2 is the same as that of Algorithm 1 where the consumption is O (|U|2 × |AT|2). Compared with the classical heuristic reduction algorithm, Algorithm 2 does not increase the time consumption but effectively improves the attribute reduction performance. The improved reduction set is deeper and more representative for attribute set data dimension reduction. It is able to effectively obtain regional extrema and greatly improve the progress of classification.
The neighborhood classification method proposed by Hu et al. adopts the majority voting strategy, which ignores the case of different distributions of data. In this section, the majority voting strategy is altered under the framework of neighborhood classification, and the neighborhood rough classification method based on D-S evidence theory is proposed. The key to applying evidence theory to classification is to judge samples belonging to the same category based on the similarity of the samples. Within the k-nearest neighbor classification framework, Denoeux developed a novel approach to represent the relationship between categories and neighborhood samples [29].
For a testing sample, assume that π ={ X1, X2, … X
d
} is the partition of universe U on decision attribute D, and d is the number of categories. ω
i
={ 1, 2, … d } is the class label of the sample. For any sample x
i
in the neighborhood, its class label ω
i
is marked as q. Then, (x
i
, X
q
) can be used as independent evidence to support the classification of x
t
. The evidence information contained in it can be defined as follows:
There are multiple samples in the neighborhood of testing sample x
t
, and the evidence support of multiple samples for x
t
is fused by evidence combination. Let
Similarly, if different types of evidence
On these basis notions, the belief function and plausibility function of sample x
t
to a certain class X
q
can be obtained
Let δ
B
(x
i
) be the neighborhood of sample x
i
. x
j
is the sample within the neighborhood, and its class label is ω
j
. Then, due to D-S evidence theory, the class label of x
i
can be obtained as follows:
In light of the above discussions, the detailed process of D-S evidence theory based neighborhood rough classification can be shown in
The time complexity of Algorithm 3 is mainly reflected in calculating the distance between the sample to be tested and the training sample. The calculated information is based on the distance. According to the above discussion, the time complexity of Algorithm 3 is O (|U| · |AT| · d).
To verify the effectiveness of the proposed attribute reduction method and D-S classification approach, several datasets are employed from UCI database. The detail information of datasets are shown in Table 1.
Datasets from UCI
Datasets from UCI
In this paper, the data are scattered by using 5-fold cross-validation. The dataset is divided into two parts: training set and testing set. The detailed process is as follows. Each dataset is partitioned into 5 disjoint groups such that U ={ U1, U2, U3, U4, U5 }. For example, in the first round of calculation, U2 ∪ U3 ∪ U4 ∪ U5 is called the training set which is the supervised dataset used to learn the neighborhood rough set model and neighborhood classifier. U1 is regarded as the testing set which is used to assess the classification prediction performance of the proposed method. The classification error rate is the measure employed in the experiment. In summary, the experiment is expected to verify that the proposed approach can improve the classification performance.
First, the performance of the improved attribute reduction approach is presented. Figure 3 shows the neighborhood decision error rate (NDER) values of the original, classical and improved reduction versions, with increasing universe. The red line represents the NDER values of improved attribute reduction, the blue line represents the NDER values of the classical heuristic reduction, and the black line denotes the NDER values of the original data. It is not difficult to note that, no matter how the universe increases, the improved reduction can achieve a lower NDER than the original dataset and heuristic reduction set, respectively. For example, in the dataset Sonar, the value of NDER from the original data is higher than 0.45 while the value of NDER deprived from classical reduction is higher than 0.2 and the value of NDER derived from improved reduction is lower than 0.2. The NDER derived from improved reduction is far lower than that of classical reduction and original data, which means that the improved reduction is able to avert several errors of classification. By observing nine datasets in Fig. 3, it is not difficult to find that the improved attribute reduction greatly improves the performance of neighborhood classification.

NDER values comparisons of different reduction algorithm.
The results of nine datasets deprived from different reduction algorithms are recorded in Table 2. In other words, through different algorithms, the size of attributes and classification performance are shown. It is easy to reach the following two remarks.
Attribute reduction comparison with different reduction algorithms
(1) From the viewpoint of attribute numbers, it is not difficult to note that both the improved reduction set and the heuristic reduction set have reduced the dimension of the dataset. However, compared with classical heuristic reduction, the improved reduction set retains further depth for attribute set reduction. For example, in the first dataset, the original dataset has 21 attributes while the classical reduction set owns 5 attributes, and the improved reduction set obtain 9 attributes. Compared with the classical reduction method, the improved approach can break through the original constraints, and find a more in-depth and representative attribute reduct set that gains better discrimination ability than the original set. Therefore, the reduction set can possess better classification performance.
(2) By comparing the values of NDER, it is easy to find that the neighborhood decision error rate of the improved reduction set is lower than that of the classical heuristic reduction set. For instance, in the third dataset, the value of NDER from the original data is approximately 0.37 when the value of NDER from classical reduction is approximately 0.2, and that from improved reduction is only approximately 0.16. Therefore, it is shown that the classification performance of the improved reduction method is better than that of the classical reduction approach. Moreover, in the first dataset, the fluctuation range of NDER deprived from improved reduction is 0.0228 while the range of NDER from classical reduction is 0.0169, and the range of original data is 0.0153. It is shown that the fluctuation of the neighborhood error rate of the improved reduction set is smaller and more stable than that of the classical heuristic reduction set. Therefore, the improved reduction algorithm improves the attribute reduction performance.
In the neighborhood classification comparisons, the results of the reduction set and original set in the traditional neighborhood classifier [23] and D-S improved classifier are shown in Fig. 4. For example, in the dataset Cardiotocography, the black curve is the classification accuracy of the data employed by the D-S improved classifier. The red curve is that of data classified by the neighborhood classifier. The black curve is always above the red curve regardless of whether the dataset is reduced. It is not difficult to find that the classification accuracy of the D-S neighborhood classifier is higher than that of the traditional neighborhood classifier. The neighborhood classification assessment of several datasets is shown in Fig. 4. The improved attribute set has higher classification performance than the original attribute set.

Classification accuracies comparison with different algorithms.
Finally, as we all know, the classical neighborhood classifier ignores the distribution of samples in the neighborhood, resulting in a decrease in classification performance, so this method is proposed to solve this problem. To prove the effectiveness of the DSNEC algorithm, it is necessary to judge whether this method improves the classification performance effectively. The proposed classification algorithm DSNEC is further compared with several state-of-the-art classification approaches, such as: the classical voting based kNN (voting-kNN), weighted kNN [36], DS-kNN [29] and NEC [23]. Voting-kNN is represented as the classical kNN algorithm using a voting strategy. This algorithm is suitable for multiclassification problems with high accuracy and insensitivity to outliers. In the weighted kNN, the distance of each sample is considered. The classification accuracy is improved by adding weight to the distance. In the DS-kNN, k-neighbor evidence support is obtained through D-S evidence theory to improve the performance of the kNN algorithm. In the neighborhood classifier, the neighborhood is determined by distance and classification is performed by the principle of majority strategy. It is widely applied in classification applications. These classification methods are excellent and widely
applied in practical classification tasks. The proposed method in this paper is further improved according to common algorithms. These classification methods are employed as a comparative experiment which can further improve the rationality and persuasion of the experiment. Apparently, the first three classification methods are related with the parameter of k. In this paper, without loss of generality, the value of k is set as
Classification comparison with several popular classification algorithms Data ID
Neighborhood classifier is a common classification method but it also has some shortcomings. The distribution of samples in the neighborhood is ignored, which decreases the accuracy of classification. To solve this problem, this paper introduces D-S evidence theory into the neighborhood classifier. Then a novel classification approach based on D-S evidence theory is proposed. In this approach, D-S evidence theory is applied to describe the closeness of samples. According to the distribution of samples in the neighborhood, the evidence information of samples is obtained which improves the classification performance. In addition, the dynamic updating strategy is applied to improve the attribute reduction algorithm. When the attribute significance index is not monotonous, the classical heuristic algorithm cannot retrieve the optimal value. However, in the improved algorithm, the termination threshold of the reduction algorithm is continuously optimized and then the method can achieve an effective value. Compared with traditional methods, the reduction set obtained by the proposed method has superior classification performance. Experimental results show that the improved reduction algorithm can not only improve the classification effect, but also represents the resolution of the original attribute set. In contrast to several common classification algorithms, the classification performance of the improved method based on D-S evidence theory is superior. Through the proposed method, the effect of classification is improved. In future work, this method can be applied to a tourism demand forecasting model, and improve the accuracy of time-series forecasting [28]. To further improve the method, the following directions can be followed: In this paper, neighborhood decision error rate is employed as an index of attribute significance, other non-monotonic indexes will be studied in the future. The improved reduction method has obtained better performance than the traditional heuristic algorithm. The maximum value will be obtained by the improved method.
Footnotes
Acknowledgments
The authors would like to express the sincere appreciation to the editor and anonymous reviewers for their insightful comments, which greatly improve the quality of this paper. This work is supported by the Natural Science Foundation of China (Nos. 62006128, 61976120, 62076111), Jiangsu Innovation and Entrepreneurship Program, Natural Science Foundation of Jiangsu Province (No. BK20191445), Natural Science Foundation of Jiangsu Higher Education Institutions of China (No. 20KJB520009), Qing Lan Project of Jiangsu Province, Science and Technology Project of Nantong City (No. JC2020141), Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX21_1447) and College Students’ Innovative Entrepreneurial Training Plan Program (No. 202110304024Z).
