Abstract
Fuzzy cognitive maps (FCMs) have been widely applied to knowledge representation and reasoning. However, in real life, reasoning is always accompanied by hesitation, which derives from uncertainty and fuzziness. In particular, when online data are processed, internal and external interference can considerably change the distribution and characteristics of sequence data over time, which further increases the difficulty of modeling. In this article, based on intuitionistic fuzzy set theory, a new dynamic intuitionistic fuzzy cognitive map (DIFCM) scheme is proposed for online data prediction. Combined with a novel concept drift detection algorithm, the structure of DIFCM is adaptively updated by an online learning scheme, which improves the representation of online information by capturing real-time changes in the sequence data. Moreover, to handle possible hesitancy in the modeling process, the intuitionistic fuzzy set is applied in the construction of the dynamic FCM, where the hesitation degree serves as a quantitative index that explicitly expresses hesitancy. Finally, a series of experiments on public data sets verifies the effectiveness of the proposed method.
Introduction
A fuzzy cognitive map is represented by a graph structure [1] consisting of nodes and arcs. Nodes can be concepts, entities, etc., and arcs represent the causal relationships between concepts. Unlike traditional intelligent computing methods, FCMs use weight matrices and causal relationships, so knowledge can be conveniently represented and inferred. As a combination of fuzzy logic and neural networks (NNs), an FCM can be regarded as a single-layer, object-oriented neural network with feedback, and it therefore inherits the advantages of neural networks. Meanwhile, compared with NNs, FCMs have stronger semantics and interpretability, so complex systems can be modeled more intuitively and conveniently. To date, fuzzy cognitive maps have been applied in various fields, including the modeling of complex problems in aviation service management [2], strategy analysis [3], and control systems [4]. In recent years, scholars have done a lot of research on time series prediction and put forward many models.
In recent years, the research field of time series, including prediction [5] and classification [6, 7], has developed rapidly. For time series prediction, information granules are abstracted from the time series and taken as the concepts of the fuzzy cognitive map. In the process of information granulation, linguistic labels and fuzzy sets [8] are used to make the model easier to understand. For example, the information granulation of time series in [9] clusters the data into a fixed number of clusters in the two-dimensional space of amplitude and amplitude change, and each cluster represents an information granule from which the model is built. Although the concepts of the fuzzy cognitive map can be constructed by clustering, it is difficult to obtain the weight matrix of the model. Generally, the weight matrix between concepts of an FCM is given by experts' knowledge and experience in the field [10]. For the numerical data of time series, it is difficult for experts to capture the changing trends and characteristics of different data in different fields, so learning from historical data has been proposed. For learning the weight matrix, Aguilar [11] proposed an adaptive random training algorithm in 2002, in which the model structure was changed adaptively as the causal model changed and the expert knowledge was updated. In addition, heuristic algorithms such as bee colony [12, 13, 14], genetic algorithms [15] and particle swarm optimization [16, 17] can be applied. For instance, [18] used FCMs as a modeling tool for time series prediction, where the information granules were constructed with the aid of fuzzy C-means and the parameters were learned by particle swarm optimization.
In short, the purpose of learning is to get a reasonable weight matrix and make the model more consistent with the actual situation. In this paper, particle swarm optimization algorithm is also used for model weight learning.
Intuitionistic fuzzy sets, as a method of dealing with fuzzy information, have developed rapidly in recent years. Building on the fuzzy set, Atanassov put forward the concept of intuitionistic fuzzy sets by adding a new attribute parameter, i.e. the non-membership function, which can describe the ambiguity of the real world more objectively. The advantage of intuitionistic fuzzy set theory in representing and processing fuzziness has made it widely used in decision-making and reasoning. In 2014, Balasubramaniam et al. [19] proposed a method to fuse two or more images using the maximum and minimum operations of intuitionistic fuzzy sets to tackle the uncertainties in digital images. In 2016, Marasini et al. [20] applied intuitionistic fuzzy sets to questionnaire analysis, where they served as a theoretical framework for the evaluation of public administration. By introducing the intuitionistic fuzzy set into the fuzzy cognitive map, the intuitionistic fuzzy cognitive map (IFCM) was constructed for prediction [21]. The main advantage of IFCM is that it takes into account the hesitation degree of elements with respect to their attribution to sets, where the hesitation value is calculated from the membership degree and the non-membership degree. To date, there are two kinds of IFCM. One is IFCM-I, which is built by introducing hesitation into the relationships between nodes [22]. On this basis, IFCM-II applies the hesitation degree not only to the weights but also when determining the concept values, which further propagates the effect of hesitation to the final decision.
Time series arising from real applications are usually not stationary, i.e. the distribution of the data may change over time. For instance, in online business activities, customers' purchasing interests change over time. In the field of network security, network access patterns also change with the habits of the users. A scenario in which the distribution of sequence data changes over time is known as concept drift [23]. Since a well-trained prediction model becomes invalid when a drift occurs, concept drift considerably affects the prediction of time series. To prevent the loss of prediction accuracy caused by concept drift, a change detection algorithm can be adopted to monitor the prediction process statistically. When the data distribution is stable, the model does not need to be adjusted. When concept drift occurs, the old model is no longer used, and the model must be updated with new data to improve the prediction accuracy. Although several algorithms have been proposed, such as the performance method [24], the distance method [25] and the property method [26], how to detect the occurrence of drift efficiently is still an open problem. In this paper, the performance method is used to judge whether concept drift occurs, and the model is then changed according to the prediction error.
The construction of a traditional static FCM relies on expert knowledge and lacks learning ability. Because of the subjectivity of experts, the application scope of FCM has been limited. For time series prediction, it is impractical to build an FCM from expert knowledge because of the volatility and uncertainty of time series data. To overcome this limitation, historical data are used to construct the FCM model. Since there are uncertainties in the construction of the model, the intuitionistic fuzzy set is introduced, which takes into account the influence of hesitation in the data reasoning. The existing IFCMs for decision-making and prediction were generally constructed on the basis of experts' experience, and their structure is fixed. As mentioned above, a static FCM cannot cope with the variability of time series. Therefore, in this article, a dynamic IFCM is proposed for the first time, and the main contributions are as follows:
First, a dynamic intuitionistic FCM is proposed for time series prediction. Instead of being constructed from domain experts' knowledge, it is established by learning directly from online time series, and both the structure and the weights of the FCM are adaptively adjusted as sequence data arrive. Second, to capture changes in the time series, a novel drift detection method with hesitation degree is incorporated into the learning process of the FCM; it detects the occurrence of drift and triggers the learning mechanism. Drift detection also avoids redundant and invalid learning, which improves the efficiency of the algorithm. Third, the hesitation degree of the intuitionistic fuzzy set is calculated directly from the raw data rather than from experts' experience, which reduces subjective factors. Besides the construction of the FCM, the intuitionistic fuzzy set is also applied in the drift detection algorithm.
The rest of this paper consists of five parts. The second part presents the related work. The third part briefly reviews the basic knowledge and the traditional FCM. The fourth part introduces the modified drift detection and the DIFCM model for time series prediction. In the fifth part, several open data sets are used to describe the prediction results of the proposed model in detail, and the sixth part gives a brief summary.
In recent years, research on time series has attracted the attention of many researchers. As an important knowledge-based model, the fuzzy cognitive map has been applied successfully to time series prediction. Papageorgiou and Poczeta [27] combined a structural optimization algorithm with an artificial neural network to improve the effectiveness of evolutionary FCMs, and proposed a two-stage prediction model for multivariate time series. Wu and Liu [28] employed the least absolute shrinkage and selection operator (lasso) to learn FCMs, and studied synthetic data with different sizes and densities. The experimental results showed that the method had an appropriate learning performance on both noisy and noiseless time series. Zhang and Luo [29] made real-time predictions of time series by dynamically adjusting the fuzzy cognitive map. Feng et al. [30] proposed a simple and robust method to learn FCMs, especially large-scale FCMs, from noisy data. The algorithm equivalently transformed the FCM learning problem into a classical constrained convex optimization problem, and a series of experiments on noisy data sets indicated the effectiveness of the method. Peng et al. [31] proposed a model based on primary sub fuzzy cognitive maps for multidimensional time series to study the formation of haze. This method quantitatively revealed the causal relationships of haze formation and effectively predicted haze pollution. Shen et al. [32] proposed a preference-based iterative threshold evolutionary bi-objective optimization algorithm to learn FCMs, where the two objectives were the minimum measurement error and the minimum number of nonzero entries. Nair et al. [33] introduced generalized fuzzy cognitive maps (GFCMs) to account for the time lag between causes and effects.
By analyzing the social, economic and technical consequences of heavy rainfall in Kampala, Uganda, they confirmed that the addition of time lags enhances the reliability of GFCMs as a quantitative method for complex system dynamics. Luo et al. [34] presented a time series prediction model based on the intuitionistic fuzzy cognitive map. The algorithm did not rely on expert experience, but constructed the conceptual structure of the cognitive map and the weight matrix directly from the original data. On this basis, combined with the dynamic membership degree and the Fermi formula, a real-time adjustable fuzzy degree calculation scheme was proposed. Hajek and Froelich [35] developed an intuitionistic fuzzy grey cognitive map (IFGCM) model to predict interval-valued time series (ITS). An analysis of open stock market data showed that the model was highly efficient.
Preliminaries
Fuzzy cognitive map
A fuzzy set on a universe of discourse
where
A fuzzy cognitive map is a directed graph obtained by adding a fuzzy reasoning mechanism to the cognitive map. Fig. 1 depicts an FCM with five nodes and six weighted arcs. Nodes
The activation level of the fuzzy cognitive map at the current moment
Normally,
A simple model of fuzzy cognitive map with five nodes.
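The reasoning step of a conventional FCM can be sketched in a few lines. This is a minimal illustration, assuming the common update rule in which each node's next activation is a sigmoid of the weighted sum of the current activations; the function names `sigmoid` and `fcm_step`, the layout `weights[j, i]` (influence of node j on node i) and the omission of a self-memory term are assumptions made for the example, not definitions taken from this paper.

```python
import numpy as np

def sigmoid(x, steepness=5.0):
    """Logistic activation; the default steepness of 5 follows the value stated in this paper."""
    return 1.0 / (1.0 + np.exp(-steepness * x))

def fcm_step(activations, weights, steepness=5.0):
    """One synchronous FCM update.

    activations: shape (n,), current activation level of every node.
    weights:     shape (n, n), weights[j, i] is the influence of node j on node i
                 (this index convention is an assumption of the sketch).
    """
    return sigmoid(activations @ weights, steepness)

# Usage: a three-node map where only node 0 influences node 1.
a = np.array([0.2, 0.8, 0.5])
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
next_a = fcm_step(a, W)
```

Nodes that receive no input settle at 0.5 under this activation, which is why some FCM variants add the node's previous value inside the sigmoid.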
An IFS [38] can be considered as a generalized fuzzy set. Given a universe of discourse
where
For each
where
Hesitation can be regarded as the degree of uncertainty, and it is usually explained more intuitively than non-membership. There are some computations of intuitionistic fuzzy sets, such as summation and multiplication of two fuzzy sets, which are defined as:
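As an illustration, the following sketch represents an intuitionistic fuzzy value by its membership and non-membership degrees and derives the hesitation degree from them. It assumes that the summation and multiplication meant here are Atanassov's standard operations; the class name `IFV` and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IFV:
    """Intuitionistic fuzzy value with membership mu and non-membership nu."""
    mu: float
    nu: float

    def __post_init__(self):
        # mu, nu must lie in [0, 1] and their sum must not exceed 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0
                and self.mu + self.nu <= 1.0 + 1e-12):
            raise ValueError("require 0 <= mu, nu <= 1 and mu + nu <= 1")

    @property
    def pi(self):
        """Hesitation degree: what is left after membership and non-membership."""
        return 1.0 - self.mu - self.nu

    def __add__(self, other):
        # Assumed standard intuitionistic fuzzy sum.
        return IFV(self.mu + other.mu - self.mu * other.mu, self.nu * other.nu)

    def __mul__(self, other):
        # Assumed standard intuitionistic fuzzy product.
        return IFV(self.mu * other.mu, self.nu + other.nu - self.nu * other.nu)

# Usage
a = IFV(0.6, 0.3)
b = IFV(0.5, 0.2)
s = a + b   # membership grows, non-membership shrinks
p = a * b   # membership shrinks, non-membership grows
```

Both operations keep the result inside the valid region, so the hesitation degree of the result is always non-negative.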
The overall model can be divided into two parts. First, a prototype FCM is constructed by means of fuzzy C-means clustering and the PSO algorithm, with the involvement of the intuitionistic fuzzy set. Second, based on the drift detection algorithm, dynamic fuzzy C-means clustering is applied to adaptively adjust the parameters of the model. The specific process is shown in Fig. 2. It should be noted that, in this article, the proposed FCM model is constructed from time series data alone; no domain expert or domain knowledge is involved.
Flow chart of constructing dynamic intuitionistic fuzzy cognitive map.
To construct the dynamic intuitionistic fuzzy cognitive map, an initial intuitionistic fuzzy cognitive map is first built from historical data and then updated. IFCM-I [39] was proposed for medical decision making, where it links symptoms to diagnoses. Experts base their decisions not only on the analysis of patients' symptoms but also on their intuition, so hesitation affects how the different symptoms are weighed.
However, the hesitation degree in time series prediction cannot be obtained from expert knowledge as in medical decision making. It reflects the degree of hesitation about the obtained value of the membership degree, which contributes to the modeling of uncertainty. Figure 3 shows an IFCM-I with four nodes. Here, according to the negative impact of the hesitation between concepts, the inference form of IFCM-I can be shown as follows:
where
IFCM-I of four nodes [36].
IFCM-I only considers the hesitation degree on the weight relationship between nodes. As an extension, besides weights, IFCM-II also includes the hesitation to the concepts [22], based on which we propose the DIFCM.
The weight matrix and hesitation degree of DIFCM are obtained by learning from time series. Generally, we assume that a high membership degree has a low hesitation degree, which is in accordance with common sense. In order to improve the readability, the simplified representation of
The membership is activated by a Fermi formula
Like IFCM-I,
Here the arithmetic sum and multiplication are replaced by Eqs (7) and (8). Using Eqs (8) and (13), one obtains:
And the right term of the equation is calculated by using Eq. (7) to calculate the intuitionistic fuzzy summation described by symbol
Here,
Applying Eq. (16) recursively, one obtains:
According to Eq. (17), one can get:
A DIFCM model with four nodes is taken as an example, as shown in Fig. 4. Using Eq. (19), the calculation of
The inference equation of the model is obtained by substituting
By using Eq. (15), the hesitation can be quantified such that the obtained results will provide more information where.
DIFCM with four concepts.
Here, the time series is represented by its amplitude and change of amplitude, i.e. the data are processed in a two-dimensional space
The construction of graph structure is based on fuzzy C-means clustering [40]. Fuzzy C-means forms
The membership matrix
Here
After the model is built, each data corresponds to an activation vector for the model, and each activation vector is associated with the
For the prediction model, the activation vector of the next time is predicted by using the current input data. The closer the predicted activation vector is to the target value, the better the performance of the proposed model.
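The construction described above can be sketched as follows: the scalar series is mapped to (amplitude, amplitude change) pairs, fuzzy C-means finds the cluster centers that act as concepts, and each row of the membership matrix is the activation vector of one time step. This is a minimal sketch under stated assumptions: the change is taken as the first difference, the fuzzifier is m = 2, and the names `to_amplitude_change` and `fuzzy_c_means` are hypothetical.

```python
import numpy as np

def to_amplitude_change(series):
    """Map x_1..x_n to rows (x_t, x_t - x_{t-1}) for t = 2..n (assumed representation)."""
    x = np.asarray(series, dtype=float)
    return np.column_stack([x[1:], np.diff(x)])

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; returns (centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((len(data), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Centers are membership-weighted means of the data.
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)          # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

# Usage: four concepts from a synthetic series.
series = np.sin(np.linspace(0, 6 * np.pi, 120))
data = to_amplitude_change(series)
centers, activations = fuzzy_c_means(data, n_clusters=4)
# activations[t] is the activation vector at step t; the prediction task is to
# estimate activations[t + 1] from activations[t].
```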
Regarding the optimization of parameters, we only adjust the weight matrix, which directly affects the prediction performance. The steepness coefficient here is set to 5. Here, the particle swarm optimization algorithm is applied for the parameter optimization.
The particle swarm optimization (PSO) algorithm derives from research on the foraging behavior of birds, where the group achieves the optimal goal through collective cooperation. Each particle can be regarded as a search individual in the D-dimensional search space, and the current position of the particle is a candidate solution of the corresponding optimization problem. The velocities and positions of the particles are initialized randomly. After each iteration, the fitness value of each particle is calculated, and the position and velocity of each particle are adjusted to reach the optimal fitness.
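A compact sketch of PSO with a linearly decreasing inertia weight, the strategy this paper adopts, is given here for orientation. The swarm size of 40 and the 250 iterations follow the experimental settings reported in this paper; the inertia range 0.9 to 0.4, the acceleration coefficients of 1.5, the velocity clamp and the search bounds are assumptions of the sketch, and the fitness is taken to be an error that is minimised.

```python
import numpy as np

def pso(fitness, dim, n_particles=40, n_iter=250, bounds=(-1.0, 1.0),
        w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over a box; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    v_max = 0.5 * (hi - lo)                      # assumed velocity clamp
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest = pos.copy()                           # individual best positions
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for k in range(n_iter):
        # Inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * k / max(n_iter - 1, 1)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(pbest_val.argmin())
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val

# Usage: each particle would hold the flattened weight (and hesitation) matrix;
# a simple quadratic stands in for the prediction error here.
target = np.array([0.3, -0.2, 0.5])
best, best_val = pso(lambda p: float(np.sum((p - target) ** 2)), dim=3)
```

A large inertia weight early on favours exploration, while the smaller weight near the end lets the swarm settle around the best position found.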
It is assumed that there is a swarm consisting of N particles in the target search space of
where
The velocity of the
The best position found so far by the ith particle is the individual extremum:
The best position found so far by the whole particle swarm is the global extremum:
When finding these two optimal values, particles update their speed and position according to Eqs (26) and (27):
where
In this paper, the dynamic inertia weight value of PSO [41, 42] adopts a linear decrement weight strategy. Inertia weight
When the PSO algorithm is used to train the weight matrix of the model, the weight interval is first initialized. Then, the elements of each particle's position vector are defined as the weights of the proposed model for iterative optimization. Here, the iteration terminates when the maximum number of iterations is reached. In order to optimize the weight matrix (including the hesitation matrix), the fitness function is selected as follows:
where,
For unstable time series in practice, concept drift occurs when the distribution of the sequence data changes. Drawing on [43], we propose an extended version of drift detection with hesitation degree for online data and incorporate it into the implementation of DIFCM. Unlike the existing work [43], the main idea of the applied drift detection is to monitor the change of the hesitation degrees of error rates (HDER) of the prediction instead of the error rates themselves. When the time series remains stably distributed, the HDER of the prediction model gradually decreases with the passage of time. Otherwise, when the HDER increases, it is assumed that the probability distribution has changed and drift has occurred.
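For orientation, the sketch below implements the classic error-rate drift detection method (DDM) with a warning level and a drift level. It is not the HDER-based variant proposed here, whose formula is not reproduced in this sketch, and treating [43] as a DDM-style scheme is an assumption suggested by the model name DIFCM-IDDM. The class name `DDM`, the levels of 2 and 3 standard deviations and the minimum of 30 samples are the customary defaults of that method.

```python
import math

class DDM:
    """Classic drift detection on a stream of 0/1 prediction-error indicators."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one outcome; returns 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:      # remember the best state seen
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                         # start over after a drift
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"

# Usage: 10% errors for 200 steps, then every prediction is wrong.
detector = DDM()
stream = [i % 10 == 0 for i in range(200)] + [True] * 100
states = [detector.update(e) for e in stream]
first_drift = states.index("drift")
```

In a regression setting such as this one, an outcome could count as an error when the prediction error exceeds a preset threshold; data collected between the warning and the drift can then be used to relearn the model.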
Two thresholds are required: one indicates the warning phase and the other determines the occurrence of drift. When the
Assuming that the real sample sequence is
When
where
After the initial intuitionistic fuzzy cognitive map is obtained, the positions of the concepts are adaptively adjusted by dynamic fuzzy C-means clustering [44] according to the changes in the time series. Each cluster is represented by its cluster center, and a cardinality is assigned to record the membership degree of the data for the cluster.
where
When an input sample
The initial value of
Flow chart of dynamic adjustment of fuzzy cognitive map.
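One way to picture the dynamic clustering step is an incremental update in which every incoming sample pulls each cluster center toward it in proportion to its membership, while the cluster's cardinality accumulates that membership. The rule below is only a plausible sketch of such an update; the exact formulas of [44] are not reproduced here, and the names `memberships` and `update_centers` as well as the fuzzifier m = 2 are assumptions.

```python
import numpy as np

def memberships(x, centers, m=2.0):
    """Fuzzy C-means membership of one sample x in every cluster."""
    d = np.maximum(np.linalg.norm(centers - x, axis=1), 1e-12)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def update_centers(x, centers, cardinality, m=2.0):
    """Return new (centers, cardinality) after seeing sample x.

    cardinality[i] accumulates the fuzzified memberships assigned to cluster i,
    so well-populated clusters move less than sparse ones.
    """
    u = memberships(x, centers, m) ** m
    new_card = cardinality + u
    new_centers = (cardinality[:, None] * centers + u[:, None] * x) / new_card[:, None]
    return new_centers, new_card

# Usage: a sample close to the first center moves that center slightly.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
cardinality = np.array([10.0, 10.0])
centers2, cardinality2 = update_centers(np.array([0.1, 0.0]), centers, cardinality)
```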
Our method of calculating the error [45] is as follows:
The activation level (membership degree) at
When the value of
The overall process is shown in Fig. 5.
In the following, several public data sets are used to verify the effectiveness of the proposed model, and the detailed process of constructing DIFCM-IDDM is shown. In all experiments, the swarm size is 40 and 250 iterations are conducted. All algorithms are run in Python on an Intel(R) Core(TM) i7-6700 CPU at 3.4 GHz with 8 GB of memory.
Copper time series
Copper [46] is a widely investigated public time series that records the monthly Celsius temperature of copper deposits. It consists of 257 observations in total, from 1933 to 1976, and its fluctuations over time are approximately constant.
Figure 6a shows the scalar time series. Figure 6b shows the amplitude change of the time series. Figure 6c shows the processed time series, expressed in the two-dimensional space of amplitude and amplitude change. Figure 6d shows the cluster prototypes obtained by fuzzy C-means clustering.
Representation of the copper sequence.
Performance index and minimum error (training data).
Error values obtained by three kinds of model in copper test data.
Copper’s predicted values and true values (orange line represents true values, green line represents predicted values).
In this data set, 30% of the data points are used for model training and the remaining 70% are left for testing. First, the 30% training portion of the processed two-dimensional data is clustered with the fuzzy C-means algorithm, and the number of clusters is
Finally, the clustering centers obtained by clustering are as follows:
The PSO algorithm is used to learn the weight matrix and obtain the initial model. A simple strategy is to adjust the model whenever the prediction error at the current time exceeds the preset threshold. However, the distribution of the data may not have changed, in which case the adjustment is unnecessary and may even cause a greater error. Therefore, the model should be observed before it is changed, which is why drift detection is introduced. Figure 8a shows the error values of the static fuzzy cognitive map, which is never changed. Figure 8b shows the error values of the dynamic fuzzy cognitive map that is changed whenever the error value at the current time exceeds the threshold. Figure 8c shows the error values of the dynamic fuzzy cognitive map with drift detection. The red line marks the moment when drift occurs, after which the weight matrix of the model is changed according to the data collected between the warning and the drift. Figure 8d compares the three methods. It shows that the model with drift detection is more effective, and that the error values are clearly reduced by adjusting the model after drift is detected.
Finally, the predicted data are obtained from the predicted membership degrees by defuzzification. As shown in Fig. 9, the predicted values broadly follow the trend of the real values, but some gaps remain. The prediction at the maximum and minimum amplitudes is not satisfactory, which is closely related to the positions of the cluster centers, but the fluctuations between the peaks are still captured well.
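The defuzzification step can be illustrated with a membership-weighted average of the cluster-center amplitudes. This centroid-style rule is an assumption made for the sketch, since the exact defuzzification formula is not given in the text, and the function name `defuzzify` is hypothetical.

```python
import numpy as np

def defuzzify(memberships, center_amplitudes):
    """Weighted average of cluster-center amplitudes, weighted by predicted memberships."""
    u = np.asarray(memberships, dtype=float)
    c = np.asarray(center_amplitudes, dtype=float)
    return float(u @ c / u.sum())

# Usage: three concepts centered at amplitudes 10, 20 and 30.
value = defuzzify([0.1, 0.6, 0.3], [10.0, 20.0, 30.0])
```

Under such a rule the output can never leave the range spanned by the cluster centers, which is consistent with the weaker predictions observed at the extreme amplitudes.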
This data set [47] records the monthly change of sunspots from 1749 to 1983, consisting of 2820 observations in total. The data are split into a training set (70% of all data) and a testing set (30%). The running times for prediction are 0.012 s and 0.06 s for the static FCM and the FCM with intuitionistic fuzzy set, respectively. For DIFCM-IDDM, since detecting drift in the data takes some additional time, the prediction process requires a longer running time of 0.469 s.
Figure 10a shows the scalar time series. Figure 10b shows the amplitude change of the time series. Figure 10c shows the processed time series, represented in the two-dimensional space of amplitude and amplitude change. Figure 10d shows the cluster prototypes obtained by fuzzy C-means clustering.
Representation of the sunspot data.
When
Figure 12a shows the error values of the static fuzzy cognitive map, which is never changed. Figure 12b shows the error values of the dynamic fuzzy cognitive map that is changed whenever the error value at the current time exceeds the threshold. Figure 12c shows the error values of the dynamic fuzzy cognitive map with drift detection. Figure 12d compares the three methods and shows that the model with drift detection is more effective.
A comparison figure of the sunspot test data for whether to consider HDER.
Error values obtained by three kinds of model in sunspot test data.
Sunspot’s predicted values and true values (orange line represents true values, green line represents predicted values).
Finally, the predicted data are obtained from the predicted membership degrees by defuzzification. As shown in Fig. 13, the predictions fit the true values closely.
The third time series, Oldman, comes from the repository [48] and includes 1,460 observations. It describes the average daily flow of the Oldman River reported from January 1st, 1988 to December 31st, 1991. Here, 70% of the data are used as the training set and the rest as the test set. Figure 14a shows the scalar time series. Figure 14b shows the amplitude change of the time series. Figure 14c shows the processed time series, represented in the two-dimensional space of amplitude and amplitude change. Figure 14d shows the cluster prototypes obtained by fuzzy C-means clustering.
Representation of the Oldman sequence.
Figure 15a shows the error values of the static fuzzy cognitive map, which is never changed. Figure 15b shows the error values of the dynamic fuzzy cognitive map that is changed whenever the error value at the current time exceeds the threshold. Figure 15c shows the error values of the dynamic fuzzy cognitive map with drift detection, where the weight matrix of the model is changed based on the data collected between the warning and the drift. Figure 15d compares the three methods. The model with drift detection is more effective, and the error values are clearly reduced by adjusting the model after drift.
Error values obtained by three kinds of model in Oldman test data.
Oldman’s predicted values and true values (orange line represents true values, green line represents predicted values).
Finally, the predicted data are obtained from the predicted membership degrees by defuzzification. Fig. 16 shows that the model predicts both the high and the low trends of the Oldman series well.
The fourth time series describes monthly milk production [47] from January 1962 to December 1975, with 187 observations in total. We split the data into a training set (70% of all data) and a testing set (30%). Figure 17a shows the scalar time series. Figure 17b shows the amplitude change of the time series. Figure 17c shows the processed time series, represented in the two-dimensional space of amplitude and amplitude change. Figure 17d shows the cluster prototypes obtained by fuzzy C-means clustering.
Representation of the milk sequence.
Two methods of dynamically adjusting the model (testing data).
The constructed DIFCM.
Figure 18a shows the error values when the model is changed as soon as the error exceeds the specified threshold. Figure 18b shows the error values of the model with drift detection: when the error exceeds the specified threshold, the model continues to be observed without changing its current state. When the warning level is reached, this signals that the data distribution may have changed, and the model is updated once the drift level is reached. In this data set, however, the HDER gradually decreases as data arrive, indicating that the sample data are stably distributed. Therefore, it is unnecessary to change the model, and the previous weights meet the needs of the model. By comparison, the model with the drift mechanism achieves better prediction.
Figure 19 is Milk’s dynamic intuitionistic fuzzy cognitive map, which consists of four nodes and is described in Cartesian language [23]. If the data is a small negative number expressed in
Finally, Fig. 20 shows that the predicted effect is good. In order to verify the performance of the proposed DIFCM-IDDM, we selected four data sets and compared them using four modeling methods. The population size
Error values obtained from the discussion of the population number
The average error values obtained by four models of four data sets (
Milk’s predicted values and true values (orange line represents true values, green line represents predicted values).
This paper proposes a new dynamic intuitionistic fuzzy cognitive map for predicting online time series, which is established from historical data and avoids the subjective influence of experts. By means of the intuitionistic fuzzy set, the hesitation degree allows the prediction model to be described more comprehensively and improves the results. Considering the diversity of time series, dynamic fuzzy C-means clustering is used to adjust the structure and the weights of the model dynamically, so that the model can adapt to the changing trend of the time series. A novel drift detection method with hesitation degree is introduced into the learning process to determine whether the weights of the model should be updated. When drift occurs, the distribution of the data has changed, and the prediction model is adjusted to improve prediction efficiency and accuracy. Finally, four groups of experiments verify the feasibility and effectiveness of the proposed scheme. Since noise and missing values are common in practical applications, future research will attempt to further improve the robustness and anti-noise performance of the model. We hope the current results can enrich the research in this community and inspire future work.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61402267) and the Shandong Provincial Natural Science Foundation (ZR2019MF020).
