Parallel FCM clustering algorithm of fuzzy number based on cut set

Abstract

To address the complexity and low efficiency of existing methods for classifying fuzzy numbers, a parallel Fuzzy C-Means (FCM) clustering method based on cut sets is proposed. First, according to the decomposition theorem, each fuzzy number is sliced horizontally and expressed as a union of interval numbers. The interval numbers are then transformed into crisp “real” data, and a parallel FCM clustering algorithm is used to classify the fuzzy numbers. Theoretical analysis and application show that the method achieves good classification accuracy and efficiency for fuzzy data clustering.

Keywords

Fuzzy number, fuzzy C-means clustering, interval number, influence factor

1. Introduction

Fuzzy numbers are a special kind of data that is widespread in the information world. They express imprecise and uncertain descriptions of things, so they can faithfully reflect the characteristics of objective phenomena; examples include meteorological observation data and linguistic data such as “about” or “almost” that arise in everyday communication. The analysis and processing of this kind of data has gradually attracted the interest of researchers, and a number of partial results have been obtained [1, 2, 3, 4]. However, because fuzzy numbers are inherently complex, much of the work is still preliminary [5], and the relevant methods and technologies need further improvement. On the expression and conversion of fuzzy numbers, Auephanwiriyakuk et al. [6] expressed a fuzzy number as $(m,\alpha,\beta)$ (mean, left offset and right offset), which complicates the subsequent processing of the fuzzy number; Sinova et al. [7] gave the maximum asymptotic bias method for location estimation of fuzzy numbers, but did not describe in detail how the spreads of the left and right endpoints of the fuzzy number are chosen. On clustering fuzzy numbers and its practical application, scholars at home and abroad have proposed a series of algorithms [8, 9, 10, 11, 12, 13, 14]. These algorithms are effective to some degree and offer useful reference points for related clustering problems, but they also have shortcomings. For example, Gao et al. [8] proposed a clustering method that converts fuzzy numbers into interval numbers based on the fuzzy decomposition theorem and sketched a parallel structure model over multiple interval numbers, but they did not specify how the intervals affect the clustering results in a parallel implementation. Yazdi et al. [9] proposed a hierarchical fuzzy number clustering method based on an extended tree (ET), in which a clustering tree is formed from $\alpha$-cut sets; the method is more robust for samples containing noise, but the median position of the model cannot be determined well, which results in low clustering accuracy. In addition, Duan [10] and Fan [11] proposed FCM clustering methods for multi-attribute information based on polyline and triangular fuzzy numbers, and these methods also have some shortcomings.

In 1973, Bezdek proposed the Fuzzy C-Means (FCM) clustering algorithm, which is based on an objective function derived from fuzzy theory and fuzzy relations. Since then, the FCM algorithm has been studied continuously and applied very widely; for details, readers are referred to the relevant literature. Based on a comprehensive analysis of the fuzzy number processing methods proposed by scholars at home and abroad, and inspired by the interval number clustering given in [8], this paper proposes a parallel FCM clustering algorithm based on cut sets and gives its implementation for one-dimensional bell-shaped fuzzy numbers with normal distribution and for triangular fuzzy numbers with multi-dimensional attributes. The method has a clear design idea and a sound theoretical basis. Experiments show that it achieves high clustering accuracy and has reference value for the analysis and processing of uncertain data in the information world.

2. Transforming fuzzy number into interval number

2.1 The relationship between fuzzy number and interval number

The fuzzy number is a basic concept in fuzzy set theory: a special kind of uncertain data defined on a data set. Common fuzzy numbers include normal, triangular and trapezoidal fuzzy numbers. According to the extension principle in fuzzy mathematics, it is intuitive and natural to extend ordinary algebraic operations to fuzzy numbers, but in practice this is difficult to implement. Fuzzy numbers are, however, closely connected with interval numbers, whose calculation is relatively simple. Therefore, before clustering fuzzy numbers, we can convert them to interval numbers and then cluster the interval numbers to achieve the fuzzy clustering analysis.

Let $R^{+}=[0,+\infty)$ denote the set of non-negative real numbers, let $F(R^{+})$ be the fuzzy power set of $R^{+}$, and let $\tilde{A}\in F(R^{+})$ be a fuzzy number. According to the definition of a fuzzy number and the decomposition theorem (see any standard text on fuzzy mathematics for details), $\tilde{A}$ is a normal, closed, convex fuzzy set on the universe, so every cut set of $\tilde{A}$ is a closed interval, that is:

$\displaystyle\tilde{A}=\mathop{\cup}\limits_{\lambda\in[0,1]}\lambda\cdot[\tilde{A}_{\lambda}^{-},\tilde{A}_{\lambda}^{+}]$ (1)

Here, $\tilde{A}_{\lambda}^{-}$ and $\tilde{A}_{\lambda}^{+}$ denote the left and right endpoints of the interval formed by the $\lambda$-cut of the fuzzy number $\tilde{A}$. In practical applications, $\tilde{A}$ can be approximated by the union of a finite number of intervals, that is:

$\displaystyle\tilde{A}\approx\mathop{\cup}\limits_{i=1}^{h}\lambda_{i}\cdot[\tilde{A}_{\lambda_{i}}^{-},\tilde{A}_{\lambda_{i}}^{+}]$ (2)

In this way, the fuzzy number $\tilde{A}$ becomes a stepwise approximation: the union of the products of interval numbers and the levels $\lambda_{i}$. The larger $h$ is, that is, the more levels $\lambda_{i}$ are used, the closer the approximation is to the true fuzzy number. The value of $h$ can be chosen according to the actual situation. Moreover, Eq. (2) shows that the fuzzy number $\tilde{A}$ is now a union of multiple interval numbers, so the clustering analysis of fuzzy numbers is transformed into the clustering of interval numbers.

2.2 Interval representation of two kinds of fuzzy numbers

The relationship between fuzzy numbers and interval numbers was given above. Representations are now designed for two kinds of fuzzy numbers, the triangular fuzzy number with multi-attribute indices and the normal distribution fuzzy number; other kinds, such as the trapezoidal fuzzy number, can be handled analogously.

Multi-attribute index triangular fuzzy number: Let $\tilde{A}$ be a triangular fuzzy number, expressed as $\tilde{A}=(\tilde{A}^{L},\tilde{A}^{C},\tilde{A}^{R})$ , where $\tilde{A}^{L},\tilde{A}^{R}$ respectively represent the left and right endpoints of the fuzzy number, $\tilde{A}^{C}$ represents the point value with the highest membership, and its membership function is:

$\displaystyle\tilde{A}(x)=\left\{\begin{array}{ll}0,&x<\tilde{A}^{L}\\ (x-\tilde{A}^{L})/(\tilde{A}^{C}-\tilde{A}^{L}),&\tilde{A}^{L}\leqslant x\leqslant\tilde{A}^{C}\\ (x-\tilde{A}^{R})/(\tilde{A}^{C}-\tilde{A}^{R}),&\tilde{A}^{C}\leqslant x\leqslant\tilde{A}^{R}\\ 0,&x>\tilde{A}^{R}\end{array}\right.$

Since every cut set of a normal fuzzy number is an interval number, the $\lambda$-cut set of $\tilde{A}$ is the interval number $\tilde{A}_{\lambda}=[\tilde{A}^{L}+\lambda(\tilde{A}^{C}-\tilde{A}^{L}),\tilde{A}^{R}+\lambda(\tilde{A}^{C}-\tilde{A}^{R})]$, where $\lambda\in[0,1]$. When the point $\tilde{A}^{C}$ with the highest membership is added to the interval number, it becomes a three-parameter interval number, that is:

$\displaystyle\tilde{A}_{\lambda}=[\tilde{A}^{L}+\lambda(\tilde{A}^{C}-\tilde{A}^{L}),\tilde{A}^{C},\tilde{A}^{R}+\lambda(\tilde{A}^{C}-\tilde{A}^{R})]$ (3)

In addition, suppose the data set $S=\{\tilde{A}_{1},\cdots,\tilde{A}_{k},\cdots,\tilde{A}_{n}\}$ contains $n$ fuzzy numbers, where each fuzzy number $\tilde{A}_{k}=(\tilde{A}_{k1},\tilde{A}_{k2},\cdots,\tilde{A}_{kp})$ consists of $p$ attributes and each attribute is a triangular fuzzy number. According to Eq. (3), the $\lambda$-cut set of attribute $\tilde{A}_{kl}$ $(l=1,2,\cdots,p)$ is $[\tilde{A}_{kl}^{L}+\lambda(\tilde{A}_{kl}^{C}-\tilde{A}_{kl}^{L}),\tilde{A}_{kl}^{C},\tilde{A}_{kl}^{R}+\lambda(\tilde{A}_{kl}^{C}-\tilde{A}_{kl}^{R})]$. When clustering interval numbers (see Section 3.1), each interval number is transformed into real data composed of the interval median and the interval size. Here the point of maximum membership $\tilde{A}_{kl}^{C}$ can be regarded as the interval median, and the interval size is:

$\displaystyle[\tilde{A}_{kl}^{R}+\lambda(\tilde{A}_{kl}^{C}-\tilde{A}_{kl}^{R})]-[\tilde{A}_{kl}^{L}+\lambda(\tilde{A}_{kl}^{C}-\tilde{A}_{kl}^{L})]=(1-\lambda)(\tilde{A}_{kl}^{R}-\tilde{A}_{kl}^{L})$

Therefore, the two-dimensional number composed of the interval median and the interval size after the transformation is $(\tilde{A}_{kl})_{\lambda}=(\tilde{A}_{kl}^{C},\alpha(1-\lambda)(\tilde{A}_{kl}^{R}-\tilde{A}_{kl}^{L}))$, where $\alpha$ is the interval size influence factor defined in Section 3.1.
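The conversion of a triangular fuzzy number into a two-dimensional real point can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `triangular_cut` and `cut_to_point` and the sample values are hypothetical and do not come from the paper, and $\alpha$ is passed in as a given constant.

```python
def triangular_cut(left, center, right, lam):
    """Three-parameter lambda-cut of a triangular fuzzy number, as in Eq. (3)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    lower = left + lam * (center - left)
    upper = right + lam * (center - right)
    return lower, center, upper


def cut_to_point(left, center, right, lam, alpha):
    """Map the cut to (median, alpha * size).

    The size upper - lower equals (1 - lam) * (right - left).
    """
    lower, mid, upper = triangular_cut(left, center, right, lam)
    return mid, alpha * (upper - lower)


# Hypothetical triangular fuzzy number (0.2, 0.5, 0.9) cut at lam = 0.5.
print(triangular_cut(0.2, 0.5, 0.9, 0.5))
print(cut_to_point(0.2, 0.5, 0.9, 0.5, alpha=0.25))
```

For a sample with $p$ attributes, the same mapping would be applied to each attribute in turn, giving a real vector of length $2p$ per cut level.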

Normal distribution fuzzy number: Suppose that an observation sample set $S=\{\tilde{A}_{1},\cdots,\tilde{A}_{k},\cdots,\tilde{A}_{n}\}$ contains $n$ samples, and each sample $\tilde{A}_{k}$ is a one-dimensional normal distribution fuzzy number whose membership function is $\tilde{A}_{k}(x)=e^{-\left(\frac{x-a}{\sigma}\right)^{2}}$ $(\sigma>0,x\in R)$, where $a$ is the mean of the normal distribution and $\sigma$ is its standard deviation; we write $\tilde{A}_{k}=N(a,\sigma)$. It is converted into interval numbers as described in Section 2.1.
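As a hedged illustration of this conversion, the sketch below assumes the usual squared-exponent membership $e^{-((x-a)/\sigma)^{2}}$, for which solving $\tilde{A}_{k}(x)\geqslant\lambda$ gives the cut $[a-\sigma\sqrt{-\ln\lambda},\,a+\sigma\sqrt{-\ln\lambda}]$ for $0<\lambda\leqslant 1$. The helper names and the evenly spaced levels $\lambda_{i}=i/h$ are assumptions made for the example; the paper does not state how the levels are spaced.

```python
import math


def normal_cut(a, sigma, lam):
    """Lambda-cut of exp(-((x - a) / sigma) ** 2), valid for 0 < lam <= 1."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]; the 0-cut is unbounded")
    half_width = sigma * math.sqrt(-math.log(lam))
    return a - half_width, a + half_width


def cut_levels(h):
    """Evenly spaced levels lam_i = i / h for i = 1..h (an assumed spacing)."""
    return [i / h for i in range(1, h + 1)]


# Hypothetical sample N(0.25, 0.02) decomposed into h = 5 interval numbers.
for lam in cut_levels(5):
    print(lam, normal_cut(0.25, 0.02, lam))
```

The level $\lambda=0$ is excluded because the support of a normal fuzzy number is unbounded, so its 0-cut is not a finite interval.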

3. Parallel FCM clustering of fuzzy numbers based on cut sets

3.1 Interval number conversion into real data

The interval number is also a special type of uncertain data. It can be transformed into real data, after which the traditional FCM algorithm is called for clustering. Consider the interval number $\bar{A}_{\lambda}=[\tilde{A}_{\lambda}^{-},\tilde{A}_{\lambda}^{+}]$ in Eq. (1). So that no information is lost during clustering, both the interval median and the interval size need to be taken into account. The interval number can therefore be mapped to a point $A_{\lambda}$ in the real space $R(\dot{A}_{\lambda},\hat{A}_{\lambda})$ spanned by the interval median $\dot{A}_{\lambda}$ and the interval size $\hat{A}_{\lambda}$, that is:

$\displaystyle{M:}\bar{A}_{\lambda}\in I(R^{+})\to A_{\lambda}\in R^{2}$ (4)

where $I(R^{+})$ is the set of one-dimensional interval numbers and $R^{2}$ is the set of two-dimensional real numbers. The transformed sample is $A_{\lambda}=(\dot{A}_{\lambda},\alpha\hat{A}_{\lambda})$, an ordinary two-dimensional real point, and $\alpha$ is a weighting factor for the influence of the interval size. This important parameter controls the effect of the interval size on clustering. The idea behind $\alpha$ is as follows: for a fixed interval median, the larger the interval, that is, the greater the distance between the median and the left and right endpoints, the greater the impact of the interval size on the clustering of interval numbers. For example, if today’s temperature is about 25 $^{\circ}$C, then 25 $^{\circ}$C is the interval median, and the range of variation can be the interval number [24, 26] or [23, 27]. Clearly, the larger the range of variation, the greater its effect on interval number clustering. Based on this, $\alpha$ can be expressed as:

$\displaystyle\alpha=(\dot{A}_{\lambda}-\tilde{A}_{\lambda}^{-})/\dot{A}_{\lambda}=(\tilde{A}_{\lambda}^{+}-\dot{A}_{\lambda})/\dot{A}_{\lambda}$ (5)

Substituting the interval median $\dot{A}_{\lambda}=(\tilde{A}_{\lambda}^{-}+\tilde{A}_{\lambda}^{+})/2$ into Eq. (5), we get:

$\displaystyle\alpha=(\tilde{A}_{\lambda}^{+}-\tilde{A}_{\lambda}^{-})/(\tilde{A}_{\lambda}^{-}+\tilde{A}_{\lambda}^{+})$ (6)

It can be seen from Eq. (6) that $\alpha\in[0,1]$. The larger the interval width of the interval number, the greater the degree of uncertainty, and therefore the greater its impact on the clustering results once the interval is converted into real data. When $\alpha=1$ (which occurs when $\tilde{A}_{\lambda}^{-}=0$), the interval size and the interval median are equally important. When the interval narrows until its width reaches 0, that is, $\tilde{A}_{\lambda}^{-}=\tilde{A}_{\lambda}^{+}$, we have $\alpha=0$ and the interval number becomes ordinary real data; the interval size then has no effect on the clustering. Thus $\alpha$ can be regarded as a measure of the influence of interval size on interval number clustering.
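A small numerical sketch of Eq. (6) and of the mapping in Eq. (4), using the temperature intervals mentioned above, may make the role of $\alpha$ concrete. The function names are hypothetical. Note also that this sketch computes $\alpha$ separately for each interval, whereas the experiments in Section 4 report a single value of $\alpha$ per data set.

```python
def influence_factor(lower, upper):
    """Interval-size influence factor alpha from Eq. (6)."""
    if lower < 0 or upper < lower or lower + upper == 0:
        raise ValueError("expected 0 <= lower <= upper with a positive median")
    return (upper - lower) / (lower + upper)


def interval_to_point(lower, upper):
    """Map [lower, upper] to (median, alpha * size), following Eq. (4)."""
    median = (lower + upper) / 2
    size = upper - lower
    return median, influence_factor(lower, upper) * size


# Same median (25), increasing width: alpha grows from 0 towards 1.
for interval in [(25, 25), (24, 26), (23, 27), (0, 50)]:
    print(interval, influence_factor(*interval), interval_to_point(*interval))
```

For [24, 26] this gives $\alpha=2/50=0.04$ and for [23, 27] it gives $\alpha=4/50=0.08$, so the wider interval contributes a larger second coordinate, as the text describes.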

3.2 Parallel FCM clustering algorithm for fuzzy numbers based on cut sets

With the relationship between fuzzy numbers and interval numbers and the method of converting interval numbers into real data established, a parallel FCM clustering method based on the interval values of fuzzy numbers is proposed. Suppose the observation sample set is $S=\{\tilde{A}_{1},\cdots,\tilde{A}_{k},\cdots,\tilde{A}_{n}\}$. According to Eq. (2), each fuzzy number $\tilde{A}_{k}$ can be expressed as the union of $h$ interval numbers given by its $\lambda_{i}$-level cut sets, and the larger $h$ is, the closer the union is to the true fuzzy number. FCM clustering is therefore performed at each level on the interval numbers of all fuzzy numbers, and the $h$ clustering results are then combined to approximate the FCM clustering of the true fuzzy set. That is, the objective function of fuzzy number FCM clustering is defined as:

$\displaystyle\min J_{m}(U,\tilde{P})=\sum_{i=1}^{h}{\lambda_{i}}\left(\sum_{j=1}^{c}\sum_{k=1}^{n}(u_{jk})^{m}D^{2}[(\tilde{A}_{k})_{\lambda_{i}},(\tilde{P}_{j})_{\lambda_{i}}]\right)$ (7)

In the above formula, the fuzzy weighting exponent is $m\in[1,\infty)$, and the term in parentheses on the right is the interval number FCM clustering objective function formed by the $\lambda_{i}$-cut set. Eq. (7) is thus composed of $h$ interval-based FCM clustering objective functions. Because the minimizations of these objective functions are independent of each other, each term can be minimized separately for its own value of $\lambda_{i}$. The final union operation yields the objective function of fuzzy number clustering, as shown in Fig. 1.

Figure 1.

Fuzzy number clustering algorithm based on interval value parallel FCM.

It can be seen from Fig. 1 that the FCM algorithm for fuzzy numbers can be implemented with the interval number FCM algorithm. The algorithm structure is parallel, and the accuracy of the algorithm can be adjusted through the value of $h$. As noted in Section 2.1, the larger $h$ is, the closer the union of interval numbers is to the fuzzy number and the higher the accuracy of the algorithm.
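To make the separable structure of Eq. (7) explicit, the following sketch evaluates the objective level by level. It assumes that $D^{2}$ is the squared Euclidean distance between the transformed real points of Section 3.1, which the text does not state explicitly; the function names and the toy data are hypothetical.

```python
def fcm_objective(points, centers, memberships, m=2.0):
    """Inner sum of Eq. (7): sum over j, k of u_jk^m times squared distance."""
    total = 0.0
    for j, center in enumerate(centers):
        for k, point in enumerate(points):
            dist2 = sum((p - c) ** 2 for p, c in zip(point, center))
            total += memberships[j][k] ** m * dist2
    return total


def fuzzy_objective(levels, m=2.0):
    """Eq. (7): lambda-weighted sum of independent per-level objectives.

    `levels` is a list of (lam, points, centers, memberships) tuples.
    """
    return sum(lam * fcm_objective(pts, ctr, u, m) for lam, pts, ctr, u in levels)


# Toy data: two points, two centers, crisp versus maximally fuzzy memberships.
pts = [(0.0, 0.0), (1.0, 0.0)]
ctr = [(0.0, 0.0), (1.0, 0.0)]
crisp = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.5], [0.5, 0.5]]
print(fuzzy_objective([(0.5, pts, ctr, half), (1.0, pts, ctr, crisp)]))
```

Because each level contributes its own non-negative term with its own variables, minimizing the terms one by one minimizes the weighted sum, which is what allows the levels to be assigned to different computing nodes.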

3.3 Parallel FCM algorithm implementation steps

Using the method of converting interval numbers into real data, and following Fig. 1, parallel FCM clustering based on the interval values of fuzzy numbers is implemented as follows:

Step 1:
Set a value of $h$ and, for the levels $\lambda_{i}$ $(i=1,2,\cdots,h)$, decompose each fuzzy number in the fuzzy data set into the $h$ interval numbers formed by its $\lambda_{i}$-cuts;
Step 2:
Transform each interval number obtained in Step 1 into ordinary real data according to Eq. (4), setting a reasonable value of the influence factor $\alpha$ according to Eq. (6). This yields the $h$ real data sets corresponding to the $h$ interval data sets;
Step 3:
Initialize FCM clustering for each real data set: set the iteration termination threshold $\varepsilon$, initialize the cluster centers $\tilde{v}^{(0)}$ and set the iteration counter $b=0$. Then iterate according to the classic FCM algorithm. Once the termination condition is met, the optimal membership partition $(u_{jk}^{\ast})_{\lambda_{i}}$ and the cluster centers $(\tilde{v}_{j}^{\ast})_{\lambda_{i}}=(\dot{v}_{j},\alpha\hat{v}_{j})_{\lambda_{i}}$ of the samples are obtained;
Step 4:
Apply the inverse transform to the cluster centers obtained in Step 3 to recover the left and right endpoints of the interval of each cluster center $\tilde{v}_{j}^{\ast}$, that is:

$\displaystyle(v_{j}^{-})^{\ast}=\dot{v}_{j}-\hat{v}_{j}/2,\quad(v_{j}^{+})^{\ast}=\dot{v}_{j}+\hat{v}_{j}/2,\quad j=1,2,\cdots,c$
Step 5:
After each computing node completes its assigned clustering task, take the union of the clustering results of all intervals, as in Fig. 1, to obtain the clustering result of the fuzzy number set.
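The five steps can be sketched end to end in standard-library Python. This is a simplified illustration under several assumptions, not the MATLAB implementation used in the experiments: the samples are normal fuzzy numbers with membership $e^{-((x-a)/\sigma)^{2}}$, the levels are evenly spaced ($\lambda_{i}=i/h$), a single constant $\alpha$ is used, the initial centers are taken at quantiles of the sorted data, and a thread pool stands in for the computing nodes. All names and the synthetic data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def fcm(points, c, m=2.0, eps=1e-3, max_iter=200):
    """Classic FCM on 2-D points; returns (centers, memberships[j][k])."""
    n = len(points)
    ordered = sorted(points)
    # Assumed initialization: evenly spaced quantiles of the sorted points.
    centers = [ordered[(2 * j + 1) * n // (2 * c)] for j in range(c)]
    for _ in range(max_iter):
        # Membership update: u_jk = 1 / sum_l (d_jk / d_lk)^(2 / (m - 1)).
        u = [[0.0] * n for _ in range(c)]
        for k, p in enumerate(points):
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            for j in range(c):
                u[j][k] = 1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(c))
        # Center update: mean of the points weighted by u_jk^m.
        new_centers = []
        for j in range(c):
            w = [u[j][k] ** m for k in range(n)]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * p[i] for wk, p in zip(w, points)) / total for i in range(2)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return centers, u


def cluster_level(task):
    """Steps 2-4 for one lambda-cut: transform, cluster, inverse transform."""
    lam, intervals, c, alpha = task
    points = [((lo + hi) / 2, alpha * (hi - lo)) for lo, hi in intervals]
    centers, _ = fcm(points, c)
    # Undo the alpha scaling, then recover the endpoints of each center interval.
    restored = sorted((mid - (s / alpha) / 2, mid + (s / alpha) / 2) for mid, s in centers)
    return lam, restored


def parallel_fuzzy_fcm(samples, h, c, alpha):
    """samples: list of (a, sigma); returns {lam: sorted center intervals}."""
    tasks = []
    for i in range(1, h + 1):  # Step 1: h cut levels per fuzzy number
        lam = i / h
        hw = math.sqrt(-math.log(lam))
        intervals = [(a - s * hw, a + s * hw) for a, s in samples]
        tasks.append((lam, intervals, c, alpha))
    with ThreadPoolExecutor() as pool:  # Step 5: gather the per-level results
        return dict(pool.map(cluster_level, tasks))


# Synthetic two-class data, loosely modelled on Experiment 1 (values assumed).
rng = random.Random(42)
samples = ([(rng.gauss(0.25, 0.05), rng.uniform(0.005, 0.02)) for _ in range(50)]
           + [(rng.gauss(0.75, 0.05), rng.uniform(0.005, 0.02)) for _ in range(50)])
result = parallel_fuzzy_fcm(samples, h=5, c=2, alpha=0.5)
for lam, centers in sorted(result.items()):
    print(lam, centers)
```

Threads are used here only to show that the per-level tasks share no state; for CPU-bound workloads in Python, a process pool or separate machines would be the more realistic counterpart of the computing nodes in Fig. 1.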

4. Analysis of experimental results

To verify the effectiveness of the proposed fuzzy number parallel FCM clustering algorithm, we ran repeated experiments on multiple fuzzy data sets, including artificial data sets, standard test data sets, real data sets with prior categories and data sets without prior knowledge. Owing to space limitations, this section analyzes the experimental process and results for only two fuzzy data sets. One is an artificially constructed one-dimensional fuzzy data set; the other, taken from [1], is a triangular fuzzy data set with multi-dimensional attribute indices formed from tea grade evaluations at a certain place in Taiwan. Experimental environment: the hardware is a computer with an i7 CPU and 4 GB of memory, and the implementation tool is MATLAB 2014a. All clustering runs are completed on the same machine. Because the algorithm is executed in parallel, the order in which the interval numbers generated by the $\lambda_{i}$-cuts are clustered does not affect the clustering results. Two evaluation indicators are used. The first is clustering accuracy, that is, the closeness of the experimental result to the true result: for the bell function data set it is expressed as the agreement between the cluster centers of the data and the actual centers, and for the tea data set as the percentage of correctly classified samples. The second is running time, that is, how the system running time varies with the number of cut levels $h$. The aim is to ensure clustering accuracy without an excessively long running time, which means that $h$ should not be too large.

4.1 Experiment 1 Artificial data set

This experiment constructs 100 fuzzy numbers belonging to 2 classes in a one-dimensional space. The membership function of each fuzzy number is a bell function with a normal distribution. The central values of the bell functions are generated by Gaussian random numbers from N(0.25, 0.1) and N(0.75, 0.1), respectively, and the variances of the bell functions are generated by random numbers uniformly distributed on [0, 0.02]. The cut-set fuzzy number parallel FCM clustering algorithm proposed in this paper is used for the experiment. The value of $h$ takes the 8 values 5, 10, 15, …, 40, and parallel clustering is performed separately for each. The time and clustering results of each experiment are recorded for comparison. In the experiment, the influence factor is $\alpha=0.5$ according to Eq. (6), the iteration termination condition (the difference between the cluster centers in two successive iterations) is $\varepsilon=0.001$, and each experiment (that is, each value of $h$) is repeated 10 times and averaged. The cluster centers of the two classes of fuzzy numbers obtained for the 8 values of $h$ are (0.241, 0.768), (0.245, 0.769), (0.251, 0.263), (0.250, 0.762), (0.251, 0.755), (0.250, 0.753), (0.252, 0.752) and (0.251, 0.754). After these are converted to clustering accuracy, the accuracy and running time of fuzzy clustering under different $h$ are shown in Table 1.

Table 1 shows that, for fuzzy numbers with bell-shaped membership functions, the clustering accuracy increases with $h$. The time needed to finish the clustering also increases. However, once $h$ reaches 30, further increases no longer change the clustering accuracy significantly while the time overhead keeps growing, so $h=30$ or so is sufficient. For $h=30$, the membership functions of the cluster centers are shown in Fig. 2; they match the mathematical model used to generate the fuzzy data, which supports the feasibility and effectiveness of the proposed cut-set-based fuzzy number parallel FCM clustering algorithm.

Table 1
Clustering accuracy and running time of bell function fuzzy numbers under different $h$

$h$	5	10	15	20	25	30	35	40
Clustering accuracy	0.92	0.94	0.96	0.98	0.98	0.99	0.99	0.99
Running time	15.9	33.4	46.7	61.2	72.4	78.6	90.2	101.3

4.2 Experiment 2 Taiwanese tea dataset

This data set concerns tea classification in Taiwan. It contains 69 types of tea, each described by 4 attributes: appearance, color, soup color and fragrance. The 4 attributes of each tea were evaluated by 10 experts. All teas are divided into five grades (perfect, good, medium, poor and bad), and the experts’ linguistic evaluations are expressed as triangular fuzzy numbers, as shown in Table 2. In the triangular fuzzy number of each attribute, the first value is the central evaluation value (the point of maximum membership of the triangular fuzzy number), the second value is the maximum offset of the evaluation and the third is the minimum offset. The detailed evaluation scheme and the fuzzy data set are given in [1].

Table 2
Expert evaluation data set for 69 types of tea

Tea number	Tea name	Appearance	Color	Soup color	Fragrance
1	Bai Mao Hou	(0.87, 0.23, 0.16)	(0.36, 0.25, 0.16)	(0.31, 0.26, 0.19)	(0.61, 0.24, 0.18)
2	Hei Mao HMou	(0.65, 0.24, 0.21)	(0.33, 0.20, 0.16)	(0.40, 0.28, 0.19)	(0.64, 0.19, 0.23)
3	Qing Xin Hei Nou	(0.4, 0.22, 0.20)	(0.38, 0.22, 0.21)	(0.34, 0.23, 0.21)	(0.70, 0.19, 0.26)
4	Qui keng Bai Mao	(0.66, 0.25, 0.21)	(0.32, 0.26, 0.23)	(0.36, 0.26, 0.23)	(0.69, 0.21, 0.27)
5	Da Nan Bai Mao	(0.88, 0.17, 0.16)	(0.19, 0.21, 0.18)	(0.17, 0.19, 0.13)	(0.64, 0.19, 0.14)
	⋮	⋮	⋮	⋮	⋮

Figure 2.

Membership of 2 cluster centers obtained by parallel FCM clustering.

First, the cut set parallel FCM algorithm proposed in this paper (CPFCM for short) is used for the tea classification experiment. The experimental environment is the same as in Experiment 1. Experiment 1 showed that good clustering accuracy is obtained when $h$ is about 30, so this experiment sets $h=30$ and the iteration termination threshold $\varepsilon=0.001$. The interval size influence factor is calculated according to Eq. (6), and $\alpha=0.25$ is found to be suitable. The classification results are shown in Table 3. They are consistent with the results obtained in [1], that is, the clustering accuracy is $P=100\%$.

Table 3

Classification results of 69 kinds of tea

Tea number	Category	Corresponding grade
1–19	1	Excellent
20–38	2	Good
39–54	3	Medium
55–59	4	Worse
60–69	5	Failed

In addition, to test the robustness of the algorithm to noise samples and the time efficiency of the parallel algorithm, one of the best-quality teas, named “white tip”, is added to the above tea varieties, forming a data set of 70 triangular fuzzy numbers. This tea is assigned serial number 70, and the triangular fuzzy numbers of its 4 attributes are (0.9120, 0.1362, 0.2230), (0.9260, 0.1284, 0.2640), (0.8765, 0.1834, 0.2650) and (0.8354, 0.2109, 0.2210), respectively. Experiments are performed with the CPFCM algorithm, the AFCN algorithm proposed in [1] and the FCN algorithm proposed in [15]. Yang M. S. first proposed a classical fuzzy clustering algorithm for fuzzy numbers in [15] and later improved it into an adaptive fuzzy number clustering algorithm in [1]. Both algorithms have been applied successfully in fuzzy number clustering, so they are suitable for experimental comparison. The classification results are shown in Table 4.

Table 4

Classification results of 70 kinds of tea by 3 algorithms

Algorithm	Class 1	Class 2	Class 3	Class 4	Class 5	Correct rate	Time required
FCN	70	1–19	20–38	39–59	60–69	56%	25.36
AFCN	1–19	20–38	39–59	60–64	65–69	85%	28.92
CPFCM	1–19	20–38	39–54	55–59	60–69	100%	56.31

Table 4 shows that, after the exceptional noise sample “white tip” is added, the results of the algorithm proposed in this paper remain consistent with those in Table 3 and the clustering accuracy is still 100%; the noise sample (tea No. 70) is not assigned to any category. The AFCN algorithm likewise does not assign tea No. 70 to any category, but its classification of the remaining samples differs from the classification obtained without the noise sample. This indicates that the algorithm in this paper is robust to noise. The FCN algorithm places the noise sample alone in the first category, so that the other 69 teas are divided into only 4 grades, which is inconsistent with the actual classification and indicates that the algorithm is sensitive to noise. In terms of completion time, the parallel FCM algorithm proposed in this paper takes longer than the FCN and AFCN algorithms, but a moderate increase in time overhead is worthwhile when it secures classification accuracy. The time overhead can also be reduced by running on multiple computing nodes in parallel, at the cost of additional node resources; that is, computing space is exchanged for computing time.

5. Conclusions

Fuzzy numbers are an important kind of data in the information world. In this paper, a clustering method based on cut-set parallel FCM is proposed for fuzzy numbers, and its validity is verified by clustering analysis of an artificial data set and by tea evaluation experiments. The paper makes two contributions. The first is the method of converting interval numbers into real data, including the design of the interval size influence factor, the relationship between triangular fuzzy numbers and three-parameter interval numbers, and the conversion rules between three-parameter interval numbers and real data. The second is the implementation mechanism of parallel FCM clustering for fuzzy numbers. It should be noted that, because interval clustering must be performed for multiple cut sets, the time required to complete the clustering grows as the number of cut sets increases (that is, as $h$ becomes larger). However, the extra computing time is worthwhile when it ensures the quality of clustering. In addition, for multi-attribute triangular fuzzy number clustering, the proposed algorithm does not consider that different attributes may affect the clustering results differently; all attributes are given the same importance weight. For example, appearance, color, soup color and fragrance are treated as equally important for Taiwanese tea, which may lead to incorrect classification results in practical applications. Addressing these problems is the goal of further research.

Acknowledgments

This work was supported by the Industrial Support Project of Xinyu Science and Technology Bureau, Jiangxi Province: Research on Reliability Control Technology for Train Communication Network Transmission System (2019).

References

1. Hung W.L. and Yang M.S., Fuzzy clustering on LR-type fuzzy numbers with an application in Taiwanese tea evaluation, Fuzzy Sets and Systems 150(3) (2015), 561–577.

2. Seekhwan, Son and Chung, A digital forensic model based on the generated fuzzy number using FCM clustering, Ubiquitous Information Technologies and Application, 2014, 581–586.

3. Lan, The distance of fuzzy information and its applications, Xi’an: Xi’an University of Electronic Technology, 2013, 61–67.

4. Peykani, Mohammadi, Emrouznejad et al., Fuzzy Data Envelopment Analysis: An Adjustable Approach, Expert Systems with Applications, 2019.

5. […] and Zheng, Research on dynamic fuzzy data types, Journal of Computer Research and Development 35(8) (1998), 714–718.

6. Auephanwiriyakuk and Keller J.M., Analysis and efficient implementation of a linguistic fuzzy c-means, IEEE Transactions on Fuzzy Systems 10(5) (2002), 563–582.

7. Sinova and Aelst S.V., Empirical analysis of the maximum asymptotic bias of location estimators for fuzzy number valued data, International Journal of Approximate Reasoning, 2019, 1–13.

8. Gao and Xie, A Novel FCM Clustering Algorithm for Interval-Valued Data and Fuzzy Valued Data, in: 5th International Conference on Signal Proceeding, Springer Nature Switzerland, Vols. I–III, 2000, pp. 1551–1555.

9. Yazdi H.S., Gol M.G., Effati et al., Hierarchical tree clustering of fuzzy number, Journal of Intelligent & Fuzzy Systems 26 (2014), 541–550.

10. Duan and Wang, FCM clustering algorithm based on polyline fuzzy multi-attribute index information, System Engineering Theory and Practice 36(12) (2016), 3220–3228.

11. Fan and You, A FCM clustering algorithm based on triangular fuzzy number multiple index information, Control and Decision 19(12) (2004), 1407–1411.

12. […] and Weng, Uncertainty data clustering algorithm based on triangular fuzzy numbers, Journal of Zhejiang University of Technology 44(4) (2016), 405–409.

13. Ivokhin V.E. and Apanasenko V.D., Clustering of composite fuzzy numbers aggregate based on sets of scalar and vector levels, Journal of Automation and Information Sciences 50(10) (2018), 47–59.

14. Screenivasan and Balamurugan B.J., Computing cluster centers of trapezoidal fuzzy numbers through fuzzy c means and kernel based fuzzy c means clustering algorithm with two metric distances using Matlab, International Journal of Civil Engineering and Technology 9(10) (2018), 1322–1330.

15.

Yang