Abstract
Based on the quaternion system, we give a new representation of the complex vague soft set, together with related logical operations. This representation carries more information than the earlier complex representation. Three quaternion distance measures are proposed and a decision model is established. The model is applied to breast cancer diagnosis to demonstrate its advantages. By comparing the diagnostic errors under the different distance measures, the most suitable distance measure for this dataset is selected.
Introduction
In real life, uncertain data are everywhere. When we make decisions from such data using classical theory alone, we sometimes fail to achieve our goals. Therefore, many mathematicians began to study uncertain data. Over time, a systematic body of fuzzy theory has gradually formed to address these problems, including intuitionistic fuzzy set theory, rough set theory and so on.
In 1999, in order to deal with uncertainties, Molodtsov [1] introduced the concept of soft set as a new mathematical tool. Since then, soft set theory has received extensive attention. The concept and basic properties of soft set theory were presented in [1, 2]. Maji et al. [3] were the first to introduce the concept of fuzzy soft sets, and Majumdar and Samanta [4] further generalized it. Yang et al. [5] introduced the notion of interval-valued fuzzy soft set. Xiao et al. [6] introduced the notion of exclusive disjunctive soft sets. Maji et al. [7, 8] gave the concept of intuitionistic fuzzy soft sets. Wang and Qu [9] introduced several measures on vague soft sets and studied their properties.
The theory of vague sets was first proposed by Gau and Buehrer [10]. A vague set is defined by a closed interval formed by a combination of membership function and non-membership function. In fact, vague set is an extension of fuzzy set [9]. The basic concepts and some applications of vague set theory and its extensions can be found in [10–18]. As mentioned in [19], vague sets are equivalent to intuitionistic fuzzy sets (IFS). So we conclude that the equivalence relation still holds after combining the above two sets with the concept of soft set [9, 11].
In 2011, Tamir et al. [20] gave the representation of the pure complex grade of membership in the polar coordinate system and the Cartesian coordinate system, respectively, and proposed a new concept of complex fuzzy class (CFC). Other researchers have studied related concepts on this basis. Among them, Selvachandran et al. [21] proposed the concept of complex vague soft set (CVSS) based on the polar representation of the complex membership function. In 2016, Tamir et al. [22] used complex numbers to combine the degrees of membership and non-membership in IFS, and proposed several important properties. As an extension of the complex number system in Cartesian form, the quaternion system expands the dimension of the space from two to four and can therefore contain more information. Based on the advantages of the quaternion system, Ngan et al. [23] combined the system with IFSs to include more fuzzy information.
Inspired by the work above, in this paper we combine the advantages of the quaternion system with CVSS and introduce a new representation of complex vague soft sets with the help of the quaternion system. We then propose three different distance measures and give a new decision model to which they are applied. Compared to the representation of VSS, the new representation extends the dimension of fuzzy information. Compared to the previous CVSS representation, the main contributions of the quaternion representation are: (1) it combines the advantages of the quaternion system and provides a compact representation of CVSS; (2) it combines membership and non-membership functions for a more comprehensive analysis, making the final result more accurate. Finally, the practicality of the model in medicine is shown through examples in medical diagnosis.
Preliminaries
In this section, we recall several basic notions that are needed in this paper.
A vague set $A$ over a universe $U$ is written as $A = \{ (x, [t_A(x), 1 - f_A(x)]) \mid x \in U \}$, i.e. $A(x) = [t_A(x), 1 - f_A(x)]$, where $0 \le t_A(x) + f_A(x) \le 1$ for all $x \in U$. Then $t_A(x)$ and $f_A(x)$ are called the degree of membership and the degree of non-membership of the element $x$ to the vague set $A$, respectively.
To illustrate the idea, consider the following example:
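The definition can be sketched in a few lines of Python. This is only an illustration: the class name `VagueValue` and the sample numbers are assumptions made here, not taken from the paper. A vague value is stored as the pair $(t, f)$ and exposed as the interval $[t, 1 - f]$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VagueValue:
    """Vague membership of one element: degree t and non-membership degree f."""
    t: float
    f: float

    def __post_init__(self):
        # Both degrees lie in [0, 1] and their sum may not exceed 1.
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("t and f must lie in [0, 1]")
        if self.t + self.f > 1.0 + 1e-12:
            raise ValueError("t + f must not exceed 1")

    def interval(self):
        """Return the closed interval [t, 1 - f] that defines A(x)."""
        return (self.t, 1.0 - self.f)


# Hypothetical vague set over U = {x1, x2}.
A = {"x1": VagueValue(0.6, 0.3), "x2": VagueValue(0.2, 0.5)}
intervals = {x: v.interval() for x, v in A.items()}
```

The width of each interval, $1 - f - t$, is the part of the evidence that supports neither membership nor non-membership.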
New representation of complex vague soft set based on quaternion numbers
Based on the above basic concepts, in this section we first rewrite the definition and the properties of the vague soft set using a complex function in Cartesian coordinates. Secondly, we combine the quaternion system in 4-dimensional space, introduced by Hamilton in 1843 [25], with the concept of complex vague soft sets to propose a new definition of CVSSs and give a new order relation. Finally, the advantages of the new expressions are demonstrated through concrete examples.
$(F, A) = \{ (x, z_{F_a}) : z_{F_a} = t_{F_a}(x) + j (1 - f_{F_a}(x)),\ x \in U,\ a \in A \}$,
$A \subseteq B$, $\forall a \in A$, $z_{F_a} \le z_{G_a} \Leftrightarrow t_{F_a}(x) \le t_{G_a}(x),\ f_{F_a}(x) \ge f_{G_a}(x),\ \forall x \in U$.
This relation is denoted by $(F, A) \subseteq (G, B)$. In particular, if both $(F, A) \subseteq (G, B)$ and $(G, B) \subseteq (F, A)$ hold, then $(F, A)$ and $(G, B)$ are called vague soft equal, denoted by $(F, A) = (G, B)$.
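The complex encoding and the inclusion test can be illustrated with a short Python sketch. The dictionary layout `{a: {x: (t, f)}}`, the function names and the sample data are assumptions chosen for illustration only.

```python
def to_complex(t, f):
    """Encode a vague value as z = t + j(1 - f)."""
    return complex(t, 1.0 - f)


def is_vague_soft_subset(F, G):
    """Check (F, A) ⊆ (G, B) for soft sets stored as {a: {x: (t, f)}}."""
    # Every parameter of F must also be a parameter of G (A ⊆ B).
    if not set(F) <= set(G):
        return False
    for a, row in F.items():
        for x, (t1, f1) in row.items():
            t2, f2 = G[a][x]
            # z_F <= z_G: membership no larger, non-membership no smaller.
            if not (t1 <= t2 and f1 >= f2):
                return False
    return True


# Hypothetical data: F has one parameter, G has two.
F = {"a1": {"x1": (0.3, 0.5), "x2": (0.2, 0.6)}}
G = {"a1": {"x1": (0.4, 0.4), "x2": (0.2, 0.6)},
     "a2": {"x1": (0.9, 0.0), "x2": (0.1, 0.8)}}

z = to_complex(*F["a1"]["x1"])
subset = is_vague_soft_subset(F, G)
equal = subset and is_vague_soft_subset(G, F)
```

With this data, `subset` is true while `equal` is false, because $G$ carries the extra parameter `a2`.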
$\alpha_{F_a}(x),\ \beta_{F_a}(x),\ \omega_{F_a}(x),\ \gamma_{F_a}(x) \in [0, 1]$,
$\alpha_{F_a}(x) + \beta_{F_a}(x) \le 1$,
$\omega_{F_a}(x) + \gamma_{F_a}(x) \le 1$,
$\alpha_{F_a}(x) + \omega_{F_a}(x) \le 1$,
$\beta_{F_a}(x) + \gamma_{F_a}(x) \le 1$.
The values $\alpha_{F_a}(x)$, $\beta_{F_a}(x)$, $\omega_{F_a}(x)$ and $\gamma_{F_a}(x)$ are the degrees of real membership, imaginary membership, real non-membership and imaginary non-membership, respectively, of $x \in U$ under $a \in A$. Then $(F^{Q}, A)$ is called the complex vague soft set based on quaternion function (CVSS-Q), denoted as follows:
It can be seen that in $q_{F_a}$, if $\beta_{F_a} = \gamma_{F_a} = 0$, then $q_{F_a} = \alpha_{F_a} + j (1 - \omega_{F_a}) = t_{F_a} + j (1 - f_{F_a})$, where $\alpha_{F_a}(x),\ \omega_{F_a}(x),\ \alpha_{F_a}(x) + \omega_{F_a}(x) \in [0, 1]$. Hence, the representation of VSSs in Definition 3.1 is a special case of the proposed model.
Here, we take an example to check the condition "$t_{F_a} + f_{F_a} \in [0, 1]$" of Definition 3.1 for the introduced sets, i.e., the inequalities of Definition 3.3.
Take $q_{F_{a_1}}(x_3)$ for example. Since $0.9 + 0.1 = 1 \le 1$ and $0 + 0.9 = 0.9 \le 1$, $(x_3, F^{Q}(a_1))$ satisfies the following conditions: $\alpha_{F_a}(x) + \beta_{F_a}(x) \le 1$, $\omega_{F_a}(x) + \gamma_{F_a}(x) \le 1$, $\alpha_{F_a}(x) + \omega_{F_a}(x) \le 1$, $\beta_{F_a}(x) + \gamma_{F_a}(x) \le 1$. Similarly, we can conclude that the conditions are still satisfied for the other $a_i \in A$, $x_i \in U$. So $(F^{Q}, A)$ indeed satisfies the conditions of a complex vague soft set in Definition 3.3.
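A check of this kind is easy to automate. The sketch below is illustrative only: the type `QValue`, the helper `satisfies_cvss_q` and the component values assigned to $x_3$ are assumptions, chosen so that the sums $0.9 + 0.1$ and $0 + 0.9$ quoted above appear.

```python
from typing import NamedTuple


class QValue(NamedTuple):
    """One CVSS-Q entry: (alpha, beta, omega, gamma)."""
    alpha: float  # real membership
    beta: float   # imaginary membership
    omega: float  # real non-membership
    gamma: float  # imaginary non-membership


def satisfies_cvss_q(q, tol=1e-9):
    """Return True if q meets the range and the four pairwise-sum conditions."""
    in_range = all(0.0 <= v <= 1.0 for v in q)
    sums_ok = (
        q.alpha + q.beta <= 1.0 + tol
        and q.omega + q.gamma <= 1.0 + tol
        and q.alpha + q.omega <= 1.0 + tol
        and q.beta + q.gamma <= 1.0 + tol
    )
    return in_range and sums_ok


# Assumed component values for x3 under a1 (not taken from the paper's table).
q_x3 = QValue(alpha=0.9, beta=0.0, omega=0.1, gamma=0.9)
valid = satisfies_cvss_q(q_x3)

# A counterexample: alpha + beta = 1.2 violates the first sum condition.
invalid = satisfies_cvss_q(QValue(0.7, 0.5, 0.2, 0.1))
```

The small tolerance only guards against floating-point rounding when a sum equals 1 exactly.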
$(F^{Q}_{1}, A)$ is defined to be a subset of $(F^{Q}_{2}, A)$ if and only if for all $a \in A$,
Let $(F, A)$, $(G, B)$ be two CVSSs-Q over a universe $U$. If $A \subseteq B$ and $q_{F_a} \le q_{G_a}$ holds for all $a \in A$ (as defined in Definition 3.5), then we get $z_{F_a} \le z_{G_a}$ (as defined in Definition 3.2), where $z_{F_a} = \alpha_{F_a} + j (1 - \omega_{F_a})$ and $z_{G_a} = \alpha_{G_a} + j (1 - \omega_{G_a})$. Therefore, the proposed quaternion representation is able to better evaluate the relationship between VSSs.
In order to show the superiority of quaternion representation over the complex one, here we give an example.
From Definitions 3.4 and 3.5, we have $(F^{Q}, A) \subseteq (G^{Q}, A)$. But from Definition 3.2, we have $(F, A) = (G, A)$.
From the above example, it can be seen that in the Cartesian coordinate system, the conclusion obtained under complex number representation is not as accurate as that under quaternion representation. Therefore, the new representation proposed in this article contains more fuzzy information and leads to a more accurate conclusion.
Some algebraic operations
In this section, we give new logical operations on CVSSs-Q and analyze their properties.
$(P^{Q}, C)$ and $(H^{Q}, C)$ are characterized by the quaternion function as follows:
In $A - B$,
In $B - A$,
In $A \cap B$,
(1) (idempotent property) $(F^{Q}_{1}, A_1) \cup (F^{Q}_{1}, A_1) = (F^{Q}_{1}, A_1)$, $(F^{Q}_{1}, A_1) \cap (F^{Q}_{1}, A_1) = (F^{Q}_{1}, A_1)$.
(2) (commutative property) $(F^{Q}_{1}, A_1) \cup (F^{Q}_{2}, A_2) = (F^{Q}_{2}, A_2) \cup (F^{Q}_{1}, A_1)$, $(F^{Q}_{1}, A_1) \cap (F^{Q}_{2}, A_2) = (F^{Q}_{2}, A_2) \cap (F^{Q}_{1}, A_1)$.
(3) (associative property) $((F^{Q}_{1}, A_1) \cup (F^{Q}_{2}, A_2)) \cup (F^{Q}_{3}, A_3) = (F^{Q}_{1}, A_1) \cup ((F^{Q}_{2}, A_2) \cup (F^{Q}_{3}, A_3))$, $((F^{Q}_{1}, A_1) \cap (F^{Q}_{2}, A_2)) \cap (F^{Q}_{3}, A_3) = (F^{Q}_{1}, A_1) \cap ((F^{Q}_{2}, A_2) \cap (F^{Q}_{3}, A_3))$.
(4) (distributive property) $((F^{Q}_{1}, A_1) \cap (F^{Q}_{2}, A_2)) \cup (F^{Q}_{3}, A_3) = ((F^{Q}_{1}, A_1) \cup (F^{Q}_{3}, A_3)) \cap ((F^{Q}_{2}, A_2) \cup (F^{Q}_{3}, A_3))$, $((F^{Q}_{1}, A_1) \cup (F^{Q}_{2}, A_2)) \cap (F^{Q}_{3}, A_3) = ((F^{Q}_{1}, A_1) \cap (F^{Q}_{3}, A_3)) \cup ((F^{Q}_{2}, A_2) \cap (F^{Q}_{3}, A_3))$.
(5) (negative property) $((F^{Q}_{1}, A_1) \cup (F^{Q}_{2}, A_2))^{N} = (F^{Q}_{1}, A_1)^{N} \cap (F^{Q}_{2}, A_2)^{N}$, $((F^{Q}_{1}, A_1) \cap (F^{Q}_{2}, A_2))^{N} = (F^{Q}_{1}, A_1)^{N} \cup (F^{Q}_{2}, A_2)^{N}$.
(1) For all $a \in A_1 \cap A_2$,
So we can get
(2) For all $a \in A_1 - A_2$ or $a \in A_2 - A_1$, the equation obviously still holds.
Thus, we can conclude that property (5) holds. □
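Property (5) can be spot-checked numerically for a single entry with $a \in A_1 \cap A_2$. The operations used below are assumptions, since the defining formulas are not reproduced in this text: union is taken as the componentwise maximum of the membership parts and minimum of the non-membership parts, intersection as the reverse, and the complement as swapping membership with non-membership. The sketch does not verify that these assumed operations keep every result inside the sum conditions of Definition 3.3.

```python
def q_union(p, q):
    """Assumed union: max of memberships, min of non-memberships."""
    return (max(p[0], q[0]), max(p[1], q[1]), min(p[2], q[2]), min(p[3], q[3]))


def q_intersection(p, q):
    """Assumed intersection: min of memberships, max of non-memberships."""
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[2], q[2]), max(p[3], q[3]))


def q_complement(p):
    """Assumed complement: swap membership and non-membership parts."""
    alpha, beta, omega, gamma = p
    return (omega, gamma, alpha, beta)


# Hypothetical entries (alpha, beta, omega, gamma) for one pair (a, x).
p = (0.6, 0.2, 0.3, 0.5)
q = (0.4, 0.3, 0.5, 0.1)

# (P ∪ Q)^N compared with P^N ∩ Q^N.
lhs_union = q_complement(q_union(p, q))
rhs_union = q_intersection(q_complement(p), q_complement(q))

# (P ∩ Q)^N compared with P^N ∪ Q^N.
lhs_inter = q_complement(q_intersection(p, q))
rhs_inter = q_union(q_complement(p), q_complement(q))

de_morgan_holds = lhs_union == rhs_union and lhs_inter == rhs_inter
```

Because only `max`, `min` and component swaps are involved, the comparison is exact and needs no floating-point tolerance.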
Quaternion distance measure
In this section, we propose and study three quaternion-based distance measures on CVSSs-Q (Euclidean, Hamming and Hausdorff) and relate them to the order relation on CVSS-Q.
$d((F^{Q}, A), (G^{Q}, A)) = d((G^{Q}, A), (F^{Q}, A))$;
$d((F^{Q}, A), (G^{Q}, A)) \in [0, 1]$;
$d((F^{Q}, A), (G^{Q}, A)) = 0 \Leftrightarrow (F^{Q}, A) = (G^{Q}, A)$;
$(F^{Q}, A) \subseteq (G^{Q}, A) \subseteq (P^{Q}, A) \Rightarrow d((F^{Q}, A), (P^{Q}, A)) \ge \max\big(d((F^{Q}, A), (G^{Q}, A)),\ d((G^{Q}, A), (P^{Q}, A))\big)$.
The Euclidean quaternion distance:
The Hamming quaternion distance:
The Hausdorff quaternion measure:
(1) From the definition of $d_E((F^{Q}, A), (G^{Q}, A))$, we get $d_E((F^{Q}, A), (G^{Q}, A)) = d_E((G^{Q}, A), (F^{Q}, A))$.
(2) From the definition of $q_{F_a}$ and $q_{G_a}$, we know that $\alpha, \beta, \gamma, \omega \in [0, 1]$ for all $a_i \in A$, $x_j \in U$. Thus, $(\alpha_{F_{a_i}}(x_j) - \alpha_{G_{a_i}}(x_j))^2$, $(\beta_{F_{a_i}}(x_j) - \beta_{G_{a_i}}(x_j))^2$, $(\omega_{F_{a_i}}(x_j) - \omega_{G_{a_i}}(x_j))^2$, $(\gamma_{F_{a_i}}(x_j) - \gamma_{G_{a_i}}(x_j))^2 \in [0, 1]$. So we can conclude that $d_E((F^{Q}, A), (G^{Q}, A)) \in [0, 1]$.
(3) Similar to (2), we can prove that
(4) $d_E((F^{Q}, A), (G^{Q}, A)) = 0$ holds if and only if, for all $a_i \in A$, $x_j \in U$, $\alpha_{F_{a_i}} - \alpha_{G_{a_i}} = 0$, $\beta_{F_{a_i}} - \beta_{G_{a_i}} = 0$, $\gamma_{F_{a_i}} - \gamma_{G_{a_i}} = 0$, $\omega_{F_{a_i}} - \omega_{G_{a_i}} = 0$. Thus, for all $a_i \in A$, $x_j \in U$, we have $q_{F_{a_i}}(x_j) = q_{G_{a_i}}(x_j)$, which is equivalent to $(F^{Q}, A) = (G^{Q}, A)$.
(5) By Definition 3.5, $(F^{Q}, A) \subseteq (G^{Q}, A) \subseteq (P^{Q}, A)$ implies $q_{F_{a_i}} \le q_{G_{a_i}} \le q_{P_{a_i}}$ for all $a_i \in A$. Combining Definition 3.4 and the expression of the Euclidean quaternion distance measure between two CVSSs-Q, we can easily conclude that $d_E((F^{Q}, A), (P^{Q}, A)) \ge \max\big(d_E((F^{Q}, A), (G^{Q}, A)),\ d_E((G^{Q}, A), (P^{Q}, A))\big)$.
Similarly, we can obtain proofs of the other two distance measures. □
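To make the three measures concrete, the following Python sketch implements one plausible normalized form of each. The exact formulas are not reproduced in this text, so the normalizations below are assumptions modeled on the usual intuitionistic fuzzy distances: the squared or absolute differences of the four components are averaged over all pairs $(a_i, x_j)$ so that each result stays in $[0, 1]$. The function names and sample tables are likewise hypothetical.

```python
import math


def _pairs(F, G):
    """Yield matching entries of two CVSS-Q tables {a: {x: (al, be, om, ga)}}."""
    for a in F:
        for x in F[a]:
            yield F[a][x], G[a][x]


def euclidean_q(F, G):
    """Assumed form: root of the mean squared component difference."""
    diffs = [sum((u - v) ** 2 for u, v in zip(p, q)) for p, q in _pairs(F, G)]
    return math.sqrt(sum(diffs) / (4 * len(diffs)))


def hamming_q(F, G):
    """Assumed form: mean absolute component difference."""
    diffs = [sum(abs(u - v) for u, v in zip(p, q)) for p, q in _pairs(F, G)]
    return sum(diffs) / (4 * len(diffs))


def hausdorff_q(F, G):
    """Assumed form: mean over entries of the largest component difference."""
    diffs = [max(abs(u - v) for u, v in zip(p, q)) for p, q in _pairs(F, G)]
    return sum(diffs) / len(diffs)


# Hypothetical CVSSs-Q with one parameter and two objects.
F = {"a1": {"x1": (0.6, 0.2, 0.3, 0.5), "x2": (0.1, 0.4, 0.7, 0.2)}}
G = {"a1": {"x1": (0.4, 0.3, 0.5, 0.1), "x2": (0.1, 0.4, 0.7, 0.2)}}

d_e = euclidean_q(F, G)
d_h = hamming_q(F, G)
d_hd = hausdorff_q(F, G)
```

Under these assumed forms, symmetry and the range $[0, 1]$ follow directly from the component bounds, mirroring steps (1) and (2) of the proof.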
Decision making model based on Quaternion distance measures
Based on the CVSS represented by quaternion numbers and the three quaternion distance measures proposed above, this section proposes a new decision-making model. We refer to the quaternion decision model as QDM.
A complete medical decision-making model consists of the following parts:
1. Collecting patient data:
(1) Encoding the patient;
(2) Each group of patient data is processed by CVSS-Q.
2. Constructing disease diagnostic criteria on CVSS-Q through the training dataset.
3. Calculating the distance between the patient dataset and the disease diagnosis criteria, and then giving medical diagnoses.
The QDM model is performed in C-QDM. The detailed description is as follows:
1. Quaternionification: Consider indicators $I_j$ ($j = 1, 2, \ldots, n$) of a disease $D$ and the data corresponding to $m$ groups of patients $p_i$ ($i = 1, 2, \ldots, m$). The dataset encoded in the Cartesian form of quaternion numbers is shown as follows:
Here $(\alpha_{ij}, \beta_{ij}, \omega_{ij}, \gamma_{ij}) = (\alpha_j(p_i), \beta_j(p_i), \omega_j(p_i), \gamma_j(p_i))$ and $y_i$ represent the data of patient $p_i$ corresponding to the indicator $I_j$ and the final diagnostic result of $p_i$, respectively. The values $\alpha_{ij}$, $\beta_{ij}$, $\omega_{ij}$, $\gamma_{ij}$ are calculated as follows:
and
where
2. Training process: Assume that there is a training dataset containing $s + t$ groups of data. For every indicator $I_j$ ($j = 1, 2, \ldots, n$), its judgment standard can be expressed as $(\alpha_j, \beta_j, \omega_j, \gamma_j)$, which is calculated as follows:
$\alpha_j = f(\alpha_{ij})$, $\beta_j = \min\{ 1 - \alpha_j,\ f(\beta_{ij}) \}$, $\omega_j = \min\{ 1 - \alpha_j,\ f(\omega_{ij}) \}$, $\gamma_j = \min\{ 1 - \beta_j,\ 1 - \omega_j,\ f(\gamma_{ij}) \}$
where
3. Testing process: Let the dataset of patient $p_i$ be $P_i$, and the diagnosis criteria of the disease $D$ be $D^*$. Then
The degree of fit with disease $D$ is divided into the following stages:
where e ∈ [0, 1] is the best threshold selected by the training process.
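The training and testing steps can be sketched as follows. Two points are assumptions made only for this illustration: the aggregation function $f$ is not specified in this text, so the arithmetic mean is used as a placeholder, and the decision rule is taken to be "a distance of at most $e$ indicates the disease". The function names and sample rows are hypothetical.

```python
from statistics import mean


def build_criterion(rows, f=mean):
    """Compute (alpha_j, beta_j, omega_j, gamma_j) for one indicator.

    rows: list of (alpha_ij, beta_ij, omega_ij, gamma_ij) over training patients.
    f: aggregation function; the mean is only a placeholder assumption.
    """
    alphas, betas, omegas, gammas = zip(*rows)
    alpha = f(alphas)
    beta = min(1 - alpha, f(betas))
    omega = min(1 - alpha, f(omegas))
    gamma = min(1 - beta, 1 - omega, f(gammas))
    return (alpha, beta, omega, gamma)


def diagnose(distance, e):
    """Assumed decision rule: distance <= e means a positive diagnosis."""
    return distance <= e


# Hypothetical training rows for a single indicator.
rows = [(0.8, 0.1, 0.1, 0.6), (0.6, 0.3, 0.3, 0.8)]
criterion = build_criterion(rows)

# Hypothetical distances compared against a threshold e = 0.25.
positive = diagnose(0.10, 0.25)
negative = diagnose(0.30, 0.25)
```

The `min` clamps ensure that the resulting criterion satisfies the four sum conditions of a CVSS-Q entry whatever aggregation function is chosen.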
Patient dataset of the cold
The class of the cold is defined as follows:
According to the quaternion Euclidean distance formula of the model, we can calculate as follows:
Then, from the model, we can give the final diagnosis: patient p4 has a cold but the others do not. This is consistent with the actual diagnostic results in the table.
However, if we calculate the Euclidean distance measure of vague soft set, we can get:
Thus, we can give the final diagnosis: patients p1, p2 and p4 have a cold, which contradicts the actual diagnostic results in the table.
In this section, we validate the decision model proposed in Section 6 with real-world patient data. In addition to the Euclidean distance measure used in the model, we substitute the other two distance measures into the model and apply them to real cases, resulting in three sets of diagnostic results. The three sets of results are compared, and the best choice under different circumstances is proposed.
The real patient data we use are the Breast Cancer Wisconsin (Original) Dataset (BCWD) from the University of California, Irvine (UCI) Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php).
Experimental process
The description of BCWD is shown as below:
Here we delete the instances containing "?" and divide the remaining instances into two groups, used for the training process and the testing process, respectively. The experimental process is then divided into the following steps:
step1. Use the quaternion function to quaternionize the whole dataset.
step2. Construct the disease diagnostic criteria from the training dataset.
step3. Calculate the distance between the training dataset and the criteria.
step4. Determine the value of e according to the distances and the class attributes.
step5. Calculate the distance between the testing dataset and the criteria.
step6. Give the medical diagnosis by comparing the results of the previous step with e, and derive the accuracy.
step7. Replace the distance measure in the model with the other two measures and repeat the above steps.
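The data cleaning, threshold selection and accuracy computation in these steps can be sketched as below. This is a minimal illustration under stated assumptions: the threshold $e$ is chosen among the observed training distances so as to maximize training accuracy, a distance of at most $e$ is read as a positive diagnosis, and all names and numbers are hypothetical rather than taken from BCWD.

```python
def drop_missing(rows, marker="?"):
    """Remove instances that contain the missing-value marker."""
    return [r for r in rows if marker not in r]


def accuracy(distances, labels, e):
    """Share of cases where (distance <= e) matches the true label."""
    hits = sum((d <= e) == y for d, y in zip(distances, labels))
    return hits / len(labels)


def best_threshold(distances, labels):
    """Pick e among the observed distances that maximizes training accuracy."""
    candidates = sorted(set(distances))
    return max(candidates, key=lambda e: accuracy(distances, labels, e))


# Hypothetical raw rows; the second one has a missing value.
raw = [["5", "1", "2"], ["3", "?", "4"], ["8", "10", "4"]]
clean = drop_missing(raw)

# Hypothetical precomputed distances to the criteria (True = disease present).
train_d = [0.10, 0.15, 0.40, 0.55]
train_y = [True, True, False, False]
e = best_threshold(train_d, train_y)

test_d = [0.12, 0.50, 0.20]
test_y = [True, False, True]
test_acc = accuracy(test_d, test_y, e)
```

Repeating the last four lines with distances from a different measure corresponds to step7.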
Experimental results
We implemented the training and testing processes in Python; the program outputs the distance between each patient and the disease criteria. The obtained results are then compared with the actual diagnostic results in Excel, which gives the test results for the different distance measures in the new model. In addition, we include the distance measures of the vague soft set for comparison. According to the testing results, we draw the graph shown in Fig. 1. The first column of each group represents the real number of diagnoses in the testing dataset; the second to fourth columns represent the numbers of diagnoses under the $d_E$, $d_v$, $d_H$ distance measures, respectively. The fifth to seventh columns represent the numbers of diagnoses under the Euclidean distance measure(

Fig. 1. Testing results.
From the figure, we can see that all the distance measures under the quaternion system give much higher accuracy in disease diagnosis than those under the complex system. After calculating, we find that the accuracies of $d_E$, $d_v$, $d_H$ are 90.2%, 90.6%, 5.5%, while the accuracy of
Starting from the importance of the concept of VSS, in this article we first use the complex number function to represent the VSS and then, by using the quaternion function, introduce a new representation of CVSSs. In this way, we can simply express the degrees of real membership, imaginary membership, real non-membership and imaginary non-membership of the initial elements under each parameter. Based on these concepts, we give some logical operations on CVSS-Q. Then, we give three quaternion distance measures between two CVSSs-Q and apply them to a decision model, which performs better than the model based on the earlier representation. The model can also be applied in many other important areas, such as decision making, pattern recognition, cluster analysis, fuzzy control, comprehensive decision analysis, etc.
However, the work presented in this article still has limitations: many other distance measures under the quaternion representation remain to be studied, and the accuracy of the model needs to be improved. Future research will focus on these points.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgements
The work described in this paper is supported by the National Natural Science Foundation of China under Grant nos. 11501444 and 11726019.
