Abstract
A heart attack is a common cause of death globally. It can be treated successfully through a simple and accurate diagnosis. Getting the right diagnosis at the right time is very important for the treatment of heart failure. Currently, the conventional method of diagnosing heart disease is not reliable. Machine learning is a type of artificial intelligence that can be used to analyze the data collected by sensors. Data mining is another type of technology that can be utilized in the healthcare industry. These techniques help predict heart disease based on various factors. We developed a prediction and recommendation model aimed at predicting heart disease using the Optimized Deep Belief Network. It does so by taking into account the various features of the heart disease UCI and Stalog database. Finally, the proposed method classifies healthy people and people with heart illness with an accuracy of 97.91%.
Keywords
Introduction
Rough set theory (RST), brought forward by Pawlak [21, 22], can effectively deal with the uncertainty of an information system (IS). The application of RST is mostly related to ISs [12, 36].
Attribute reduction (A-reduction) as a technology of data ming is to use the attribute evaluation function to evaluate the attributes, and filter the redundant attributes in the conditional attribute set, so as to reduce the data dimension and to improve the generalization performance and efficiency of algorithms. The core step of A-reduction is to construct the attribute evaluation function. This function can be used to select key representative attributes from high-dimensional data. So far, there have been many outstanding results. Yao et al. [33] brought up class-specific A-reduction in RST. Lang et al. [14] gave A-reduction in dynamic fuzzy covering information systems based on homomorphism. Li et al. [15] investigated A-reduction for heterogeneous data based on information entropy. Trabelsi et al. [26] put forward heuristic method for A-reduction from partially uncertain data using rough sets.
As the extension of RST, fuzzy rough sets (FR-sets) can effectively deal with the uncertainty and ambiguity in the actual scene, and has been successfully applied to A-reduction. Wang et al. [29] proposed a fitting model for A-reduction based on FR-sets. Chen et al. [4] presented a novel A-reduction algorithm based on FR-sets. Dai et al. [6] presented maximal-discernibility-pair-based approach to A-reduction based on FR-sets. Wang et al. [28] constructed A-reduction based on FR-sets using distance measures.
Usually, RST deals with MV-data in the following way. Similarity measure in an MSVDIS is introduced in the sense of the similarity between information values which is fed back to the object set. Then, the tolerance relations on the object set are constructed. But these kinds of tolerance relations have deficiencies when they are used in fuzzy rough computation.
Given an MSVDIS and a threshold λ. This paper defines the fuzzy symmetry relations on the object set based on the number of attributes about the similarity between information values which is not less than λ, where λ controls the similarity between information values. This leads to FR-sets for MV-data. Fuzzy positive regions and fuzzy dependency functions associated with the FR-sets are constructed. Thus, fuzzy rough iterative computation model (FRIC-model) for MV-data is obtained. A-reduction algorithms in an MSVDIS based on FRIC-model are given. Finally, the experimental results show that the given algorithm is more effective than some existing algorithms.
This paper is organized as follows. Section 2 recalls some notions about FR-sets, multisets and probability distribution sets. Section 3 introduces an MSVDIS. Section 4 introduces fuzzy symmetric relations in an MSVDIS. Section 5 defines some evaluation functions in an MSVDIS. Section 6 establishes FRIC-model. Section 7 presents A-reduction algorithm based on FRIC-model. Section 8 carries out numerical experiments to show the feasibility of the presented algorithm. Section 9 executes Friedman test and post-hoc test to further verify the stability of classification of the presented algorithm. Section 10 concludes this paper.
Preliminaries
In this section, we give an overview of FR-sets, multisets and probability distribution sets.
Throughout this paper, O signifies a finite set, 2
O
means the family of all subsets of O and |X| denotes the cardinality of X ∈ 2
O
. Put
FR-sets
Suppose that R is a fuzzy relation on O. Then R may be represented by M(R) =(R(o
i
, o
j
)) n×n, where R(o
i
, o
j
) ∈ I means the similarity between two objects o
i
and o
j
. the lower and upper fuzzy rough approximations of the fuzzy set F are defined as
Fuzzy-rough set model is the generalization of classical rough set model.
Multisets and probability distribution sets
In order to facilitate,
If M(x) = m, then x appears m times in M, we denoted it by m/x ∈ M or x ∈ m M.
Given X = {x1, x2, …, x
s
}. If M(x
i
) = m
i
(i = 1, 2, ⋯ , s), then M is denoted by {m1/x1, m2/x2, ⋯ , m
n
/x
s
}, i.e.,
(1) A = B⇔ A(x) = B(x) (x ∈ X) ;
(2) A⊑ B ⇔ A(x) ≤ B(x) (x ∈ X) ;
(3) (A⊔ B)(x) = max {A(x) , B(x)} (x ∈ X) ;
(4) (A ⊔ B)(x) = min {A(x) , B(x)} (x ∈ X) .
P can be seen as a map P : X → [0, 1], i.e., ∀ i, P(x i ) = p i .
(1) If s = t and ∀ i, x i = y i , p i = q i , then P and Q are said to be equal. Denote P = Q.
(2) If s = t and ∀ i, x i = y i , then P and Q are said to be approximately equal. Denote P ≃ Q.
Obviously, P = Q ⇒ P ≃ Q.
Obviously,
MSVDISs
(O, C, d) is named as a decision information system (DIS), if (O, C ∪ {d} is an IS, where C and d are a set of conditional attributes and a decision attribute.
Suppose that * means a missing value. If ∃ a ∈ C, * ∈ V a , but * ∉ V d = {f d (o) : o ∈ O}, then a DIS (O, C, d) is an incomplete decision information system (IDIS).
Let (O, C, d) be an IDIS. ∀ a ∈ C, indicate
An IDIS
An IDIS
If P ⊆ C, then (O, P, d) is known as a subsystem of (O, C, d).
Let M a = {f a (o1) , f a (o2) , ⋯ , f a (o n )} - {*} be a multiset (i.e., if f a (o1) = f a (o3), then it is allowed).
Suppose that ∀ i, m
i
expresses the number of occurrences of x
i
in M
a
. Then
If f
a
(o) =*, then f
a
(o) is replaced by {m1/x1, m2/x2, ⋯ , m
s
/x
s
}; if f
a
(o) = x
j
, then f
a
(o) is replaced by
After this treatment, (O, C, d) is an MSVDIS. We call it the MSVDIS induced by the IDIS (O, C, d).
The following example shows that an IIS induces an MSVDIS. Thus, an MSVDIS can be teated like an IIS.
An MSVDIS (O, A)
For convenience, denote
Clearly,
If θ = |P|, then
(1) If P1 ⊆ P2 ⊆ C, then ∀ λ ∈ [0, 1],
(2) If 0 ≤ λ1 < λ2 ≤ 1, then ∀ P ⊆ C,
Since P1 ⊆ P2, we have ∀ o, o′ ∈ O,
So ∀ o, o′ ∈ O,
Thus,
(2) By Definition 4.1,
Since 0 ≤ λ1 < λ2 ≤ 1, we have ∀ o, o′ ∈ O,
So ∀ o, o′ ∈ O,
Thus,
Proposition 4.2 illustrates that
Some evaluation functions in an MSVDIS
This part presents some evaluation functions in an MSVDIS.
(3) If 0 ≤ λ1 < λ2 ≤ 1, then ∀ P ⊆ C, ∀ X ∈ 2
O
,
Then
So ∀ o ∈ O,
Thus ∀ o ∈ X,
Note that ∀ u ∉ X,
This implies that ∀ o ∈ O,
Thus
(ii) Obviously,
Then ∀ u ∉ X,
Note that ∀ o ∈ X,
This implies that ∀ o ∈ O,
Thus
From the above,
(2) (i) Since P1 ⊆ P2 ⊆ C, by Proposition 4.2, we have
Then ∀ o ∈ O, o′ ∉ X,
This implies that ∀ o ∈ O,
Thus ∀ o ∈ O,
Therefore,
(ii) Since P1 ⊆ P2 ⊆ C, by Proposition 4.2, we have
Then ∀ o ∈ O, o′ ∈ X,
So
Thus
Therefore,
(3) (i) Since 0 ≤ λ1 < λ2 ≤ 1, by Proposition 4.2, we have
Then ∀ o ∈ O, o′ ∉ X,
This implies that ∀ o ∈ O,
Thus ∀ o ∈ O,
Therefore,
(ii) Since P1 ⊆ P2 ⊆ C, by Proposition 4.2, we have
Then ∀ o ∈ O, o′ ∈ X,
So
Thus
Therefore,
(1) If P1 ⊆ P2 ⊆ C, then ∀ λ ∈ [0, 1],
(2) If 0 ≤ λ1 < λ2 ≤ 1, then ∀ P ⊆ C,
(1) If P1 ⊆ P2 ⊆ C, then ∀ λ ∈ [0, 1],
(2) If 0 ≤ λ1 < λ2 ≤ 1, then ∀ P ⊆ C,
FRIC-model in an MSVDIS
This section gives FRIC-model in an MSVDIS.
Let (O, C, d) be an MSVDIS. Given P ⊆ C and λ ∈ [0, 1]. Suppose |P| ≤ θ1 < θ2 < ⋯ ≤ |C|. Then ∀ i, denote
Similarly,
Then,
Thus
Thus
Then
Thus
Thus
A-reduction in an MSVDIS
This part studies A-reduction in an MSVDIS and proposes the corresponding algorithm.
Algorithm 1 Computing
REQUIRE (O, C, d), P ⊆ C and λ ∈ [0, 1].
ENSURE
1: foro, o′ ∈ O
2: fora ∈ C
3: Calculate HD(Pf a (o), Pf a (o′));
4: ifHD(Pf a (o), Pf a (o′)) ≤ λ
5:
6: end if
7: end for
8: Obtain
9: Calculate
10: end for
11: return
It can be seen that Algorithm 1 has two cycles, and the time complexity is mainly determined by the scale of O and C. Therefore, the time complexity of the overall algorithm is O(|C| * |O|2). Based on the above FRIC-model, we propose Algorithm 2 for a reduct. This algorithm uses FRIC-model to solve the MV-data problem, so we call this algorithm MFRIC.
Algorithm 2 Obtain a reduct in an MSVDIS (FRIC)
REQUIRE (O, C, d), D ∈ O/d,
ENSURE A reduct P.
1: let P =∅;
2: while true do
3: θ = |P|;
4: fora ∈ C - P
5: Compute
6: end for
7: Find a* ∈ C - P such that
Then compute
8: ifsig(a*, P, d) > δ
9: P = P ∪ {a*
10: else
11: break;
12: end if
13: end while
14: return P.
By Algorithm 2 (MFRIC), we obtain a reduct about data. It adds a parameter δ in step 10, which is a threshold and usually take (0.05, 0.3). The δ is mainly used to control the significance. When δ takes different values, different reducts can be obtained. The time complexity of Algorithm 2 is polynomial. The time complexity from step(4) to step(7) is O(|C - P| * |O|2). The time complexity from step(8) is O(|C - P|). There are no loops in the remaining steps, and the time complexity is O(1). From step 2 to step 15, attribute reduction is realized, and the number of attributes is reduced from |C| to |P|. The time complexity is
Experiments
In order to verify the effectiveness of the proposed algorithm, we will introduce the results of numerical experiments in this section. All numerical experiments were completed on Lenovo computer with Intel Core i7-4790cup @ 3.60GHZ and 16G memory. Eight classified data were selected from UCI [27] machine learning repository for experiment. Since there is no MV-data in UCI, we choose incomplete data to replace MV-data. These eight data sets are detailed in Table 3.
A description of data sets
A description of data sets
In order to facilitate operation and observe the impact of deletion rate on the algorithm, 5%, 10% and 20% of the information are deleted randomly in the all data set. Our goal is to reduce data attributes while keeping the classification accuracy of data unchanged. The subset with the largest classification accuracy is recorded while reducing the data set, and the subset with the largest classification accuracy is taken as the optimal subset. Therefore, MFRIC algorithm is programmed with MATLAB to find the best reduction set. Because the proposed algorithm has a key parameter λ, the selection of λ will directly affect the reduced subset. Parameter λ is selected from 0.5 to 0.9 in steps of 0.05.
The attribute subset with the highest classification accuracy is selected as the final result. The number of reduced subsets and the value of parameter λ are shown in Table 4. At the same time, in order to verify the advantages and disadvantages of the proposed algorithm, we looked for the other three algorithms to compare with it. Because there are few literatures on multiset-valued information systems, algorithms KGIRA-M [35], FSRS [24] and FSNTDJE [25] were selected for comparison. KGIRA-M and FSNTDJE discuss incomplete information systems, and FSRS studies set-valued information systems. The average reduction subsets of the other three algorithms under the three deletion rates are shown together in Table 4.
Number of selected attributes
It can be seen from Table 4 that the average number of reduced attributes of algorithm MFRIC in the case of data loss of 5%, 10% and 20% is 6.38, 7.75 and 9.50 respectively, and the overall average number is 7.86. The reduction effect of this algorithm is equivalent to that of the other three algorithms, and can be considered to be effective.
Next, we will discuss the classification accuracy of the reduced subsets of the four algorithms under the same classifier. Two classifiers Classification And Regression Trees (CART) and Support Vector Machines (SVM) are selected to calculate the classification accuracy of reduced subsets. 5-fold cross validation is used to test the stability of the four algorithms. Tables 5 and 6 show the accuracy comparison of the four algorithms after reduction under the two classifiers. For comparison, the classification accuracy of all sets is calculated and the results are placed in column 2. Black bold font indicates the best result for this line. From the tables, it can be seen that MFRIC calculates slightly more quantities with higher accuracy than the other three algorithms. The accuracy of CART classifier is significantly higher than that of SVM. Further, it can be found that when the deletion rate increases gradually, the classification accuracy decreases, which is consistent with the intuitive understanding. However, does this mean that the proposed algorithm must be better than other algorithms? Statistical analysis is discussed in the next section. Through statistical analysis, we can judge whether the performance of the algorithm is significant.
Comparison of classification accuracies of reduced data with Classification And Regression Trees (CART)
Comparison of classification accuracies of reduced data with Support Vector Machines (SVM)
In the proposed algorithm, the parameter λ affects the change of classification accuracy. The relationship between λ and classification accuracy under different data sets is observed by drawing. Figure 1 shows the relationship between parameter λ and classification accuracy. It can be seen from the figure that the curve drawn by CART is slightly higher than SVM under the same deletion rate, which also shows that CART is slightly better than SVM in classification effect. The curve in the figure has no obvious change law, which shows that the value of parameter λ will affect the classification accuracy, but which value will have a greater impact depends on the data set and deletion rate.

Classification accuracy of reduced subset of algorithm MFRIC under classifier CART and SVM.
In order to compare the classification accuracy of the four algorithms, the classification accuracy of all the best reduced subsets is found, and the corresponding receiver operation characteristic curves(ROC) were drawn. The deletion rate of 5% was used for all experimental data. The ROC curves obtained by using CART classifier for the reduced subsets of the four algorithms are shown in Figure 2. The Area Under Curve (AUC) is calculated separately and displayed on the legend of each subgraph. The horizontal axis of Figure 2 is false positive rate and the vertical axis is true positive rate. It can be seen from Figure 2 that most of the areas are greater than 0.5, indicating that the classification effect is very nice. In most cases, the AUC of MFRIC algorithm is slightly larger than that of other algorithms, which shows that it not only has higher classification accuracy, but also has good classification balance. In order to confirm the effect of cart classifier, we also selected four data sets to verify whether the ROC curve and AUC under SVM classifier are as obvious as each other. As can be seen from Figure 3, the effect of classifier SVM is slightly worse than CART, but the AUC still exceeds 0.5. This shows that although the classification effect of SVM is not as good as CART, it also meets the requirements of classification.

ROC curve and AUC value under CART classifier.

ROC curve and AUC value under SVM classifier.
Friedman test is an nonparametric hypothesis test which is applied to compare the performance of classification algorithms statistically [5]. If Friedman test result is significant, then the null hypothesis “all algorithms have the same performance” will be rejected. Thus a post-hoc test such as Nemenyi test should be carried out to further test which algorithms perform better.
For T data sets and G algorithms, C
i
is the average rank of the i-th algorithm with i = 1, 2, ⋯ , G. The original Friedman statistic is defined as
Since the original Friedman test result is generally not significant when the value of G is too small, a different Friedman test with a test statistic F
F
is usually used. The statistic F
F
follows the F-distribution with G - 1 and (G - 1)(T - 1) degrees of freedom and it is defined as
The Friedman tests results with CART and SVM
The Friedman tests results with CART and SVM
Nemenyi test can be used to test whether there is difference between the performance of two algorithms. The critical value of Nemenyi test is defined as
The ranks of the proposed algorithms with different classifiers can be calculated by their classification accuracies from Tables 5 and 6. They are showed in Tables 7 and 8.
Ranks of different algorithms with CART
Ranks of different algorithms with SVM
The Friedman tests results with CART and SVM in Table 8. F F > F0.05(3, 69), p - value< α. The results of statistical analysis point out that the null hypothesis of “there is no significant difference between the classification accuracies of the four algorithms" is rejected both with CART and with SVM. This means that two Nemenyi tests are needed. The results of the Nemenyi tests in Figures 4 and 5 are illustrated as follows:
(1) Under the CART classifier
a) The algorithm MFRIC had a higher classification accuracy than that of KGIRA-M, FSRS, and FSNTDJE;
b) There was no significant difference in classification accuracy among algorithms KGIRA-M, FSRS, FSNTDJE.
(2) Under the SVM classifier
a) The algorithm MFRIC had a higher classification accuracy than that of FSRS and FSNTDJE;
b) The algorithm KGIRA-M had a higher classification accuracy than that of FSRS and FSNTDJE;
c) There was no significant difference in classification accuracy between algorithms MFRIC and KGIRA-M;
d) There was no significant difference in classification accuracy between algorithms FSRS and FSNTDJE.

The result of Nemenyi test with CART.

The result of Nemenyi test with SVM.
In this paper, the fuzzy symmetry relations on object set of an MSVDIS have been established based on the similarity between information values which is fed back to the attribute set, and FR-sets have been proposed. FR-sets use the fuzzy symmetry relations to define fuzzy rough approximations, fuzzy positive region and fuzzy dependency and then overcome the deficiency of classical rough sets. In the fuzzy symmetry relation, there is a parameter that controls the similarity between two objects. FRIC-model in an MSVDIS has been presented based on the iteration of fuzzy positive region and fuzzy dependency. A-reduction algorithm based on FRIC-model has been given. Experiments on eight datasets from UCI have been carried out. The experimental results and statistical analysis show that the given algorithm is fast and occupies few memory. The advantage of the given algorithm is to define an iterative formula for A-reduction algorithm in an MSVDIS. In the near future, we will develop FRIC-model for other type of data.
Footnotes
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by Natural Science Foundation of Guangxi (2021GXNSFAA220114).
