Abstract
The key problem of time series classification is the similarity measure between time series. In recent years, efficient and accurate similarity measurement methods of time series have attracted extensive attention from researchers. According to the different similarity measure strategies, the existing time series classification methods can be roughly divided into shape-based (original value) methods and structure-based (symbol transformation) methods. Shape-based methods usually use Euclidean distance (ED), dynamic time warping (DTW), or other methods to measure the global similarity between sequences. The disadvantage of these methods is that their measurement process does not necessarily achieve local sensible matchings of time series, which leads to a decrease in their accuracy and interpretability. To better capture the local information of the sequence, the structure-based methods discretize or symbolize the local value of the time sequence, which leads to the loss of the original information of the sequence. To address these problems, this paper proposes a novel similarity measurement method named dynamic time warping based on the local morphological pattern (MPDTW), which first decomposes the local subsequences of time series using discrete wavelet transforms for extracting the local structure information. Then, the decomposed subsequence will be encoded by the morphological pattern. Finally, the ED between points and their local structure difference based on morphological pattern will be weighted and applied to the DTW algorithm to measure the similarity between sequences. Experiments have been carried out on the classification tasks of the UCR datasets and the results show that our method outperforms the existing baselines.
Introduction
With the development of modern industry, information technology, and the continuous advancement of data generation and collection technology, massive amounts of data are constantly being generated from many areas of daily production and life. These data can essentially be regarded as time series, and the corresponding classification technologies has attracted increasing attention in many practical fields, such as finance, medicine, geological monitoring, climate science, and aerospace [1, 2, 3, 4, 5, 6].
Time series classification is one of the most important research topics in the field of data mining and has received extensive attention from many researchers in recent years [7, 8, 5, 6, 9]. At present, some methods of time series classification have been proposed. According to the different similarity measure strategies used in the classification process, these methods can be divided into two types: shape-based methods and structure-based methods [10, 11]. The former mainly focuses on the global similarity between time series, using the original value of the series or its slope and other derivative values to measure the similarity and apply it to the classifiers. At present, similarity measure techniques commonly used in such classifiers include ED, longest common subsequence (LCSS) [12], and DTW [13]. The One Nearest Neighbor (1NN) classifiers using these distance measures are very easy to implement and have been widely used in time series classification. However, this type of time series classification method that relies on the global similarity of the sequence fails to handle the local structure information of the sequence during the calculation of the similarity. This will eventually decrease the interpretability and accuracy of the classification results. Figure 1 shows the sequence matching results of the similarity measure of two time series in the DTW-1NN classification model. Some local structures between the two sequences are not well matched (the area marked by the green box in Fig. 1).
The similarity measure results of DTW in 1NN.
Structure-based classification methods pay more attention to the local structure information of the time series in the similarity measure. These classifiers first learn local features and perform feature space transformation on the time series, such as transforming the local subsequence of the time series into a symbol sequence. Then, the similarity measure between time series is calculated. Among these methods, symbolic aggregation approximation (SAX) [14], symbolic Fourier approximation (SFA) [1], and bag of pattern (BOP) [15], etc. are commonly used. Compared with shape-based methods, structure-based methods highlight the local structure information of the sequence. However, because the original data information of the time series is discarded during the feature space transformation, the structure-based methods also have the problem of sequence information loss during the similarity measure of the sequence. In addition, the feature space complexity of this type of method is often high, and at the same time, it is accompanied by a more complicated process of selecting the optimal parameters, which is not conducive to the classification results.
To address the above problems, we propose a new time series similarity measure method consisting of two sequential steps: (1) local morphological pattern encoding (LMP) and distance weighting. By using this similarity method in the 1NN classifier, it is possible to achieve accurate and interpretable time series classification. Specifically, when we measure the similarity between sequences, we first extract subsequences around each time stamp and then use discrete wavelet transform (DWT) [16] to decompose the subsequences. To extract the structure information, we propose the LMP to encode the subsequences to represent the local structure information at that timestamp. (2) In the distance weighting step, we assign appropriate weights to the ED and local morphological pattern difference between any two points in the time series and perform weighting to replace the traditional single ED. The proposed method alleviates the imbalance between the original information and local structure information in traditional time series similarity measurement methods.
The main contributions of this paper are summarized as follows:
We use DWT to decompose the time series subsequences, which can effectively extract the local structure information of the sequence. In addition, the noise reduction characteristics of DWT effectively alleviate the influence of data noise on the time series similarity measurement process. The proposed LMP was successfully applied to the local structure information encoding of time series and realized effective representation of the local structure information of the sequence. The proposed similarity measurement method effectively alleviates the imbalance between the original sequence information and local structural information in traditional similarity measurement methods by weighted ED and local morphological pattern differences. In addition, the 1NN classifier using MPDTW shows accurate, efficient, and interpretable classification results in the test using the UCR datasets.
The rest of the paper is organized as follows. Related work is discussed in Section 2. Section 3 gives a formal description of the time series classification problem and its related definitions. Section 4 elaborates on the similarity measurement method proposed in this paper. Section 5 introduces the experimental design and results. Finally, the conclusions of this article are drawn in Section 6.
In recent years, some methods for time series classification have been proposed. Here, according to similarity measure strategies, we divide these methods into two categories: shape-based methods and structure-based methods. Specifically, the shape-based methods rely on the global similarity of time series to classify them. At present, most of the research on this type of methods focus on alternative elastic distance measure methods that measure the global similarity of sequences. For example, typical ones are ED, LCSS, and DTW, which have been successfully applied to classify time series that have global similarity. However, these point-to-point matching methods are unreliable and error-prone due to the possibility that local different subsequences may be aligned. To improve DTW, in [17], derivative dynamic time warping (DDTW) is proposed to achieve a reasonable alignment, which converts the original sequence into a first-order derivative sequence. In [18], the authors developed a globally weighted dynamic time warping (WDTW) algorithm that assigns lower weights to points closer to the diagonal. In [19], the authors described a means of weighting a distance measure complexity invariant distance (CID) to compensate for differences in the complexity in the two series being compared. In addition, the time warp edit (TWE) [20] implements the elastic measure of similarity between sequences by imposing a penalty on the distance between point pairs. Górecki et al. [21] proposed the derivative transform distance (DTD), which extended the sine-cosine transform to DDTW to measure the similarity of time series. In particular, to achieve a reasonable match of local shape features, shapeDTW [22] converted the sequence into shape feature codes and then aligned them. These measurement methods are designed to use elastic distance metrics to compensate for small deviations between sequences and are usually used with the nearest neighbor classifier. When there are discriminative features in the entire sequence, a classification method that relies on the global similarity measurement is appropriate. However, when the discriminative characteristics of the sequence are contained in the local structure of the sequence, this type of method cannot capture the local structure information of the sequence well, which causes the loss of the sequence information during the measurement and affects the final classification efficiency. In addition, this type of algorithms are more sensitive to noise.
Structure-based methods focus on extracting the local structure information of the time series. The local feature representation after feature transformation usually has a better smoothing effect and is more resistant to noise. For example, in [23], the authors proposed time series forest (TSF) to overcome the problem of the large interval feature space by employing a random forest approach, using summary statistics of each interval as features. The time series bag of features (TSBF) [24] is an extension of TSF that has multiple stages. Similar to TSF and TSBF, learning pattern similarity (LPS) [3] is also based on intervals, but the main difference is that subseries become attributes rather than cases. In addition, bag of pattern (BOP) [15] works by applying SAX to subsequences to form words and using the distribution of words over a series to form a count histogram to classify samples. Different from the BOP forms term frequencies over the series, symbolic aggregate approximation-vector space model (SAXVSM) [14] forms term frequencies over classes and weights these by the inverse document frequency. DTW features (DTW-F) [25] combines DTW distances to training cases and SAX histograms. In addition, [26] proposed an extension of the decision tree shapelet approach [27], fast shapelets (FS) to speed up the shapelet feature discovery. [28] proposed a shapelet transformation that separates shapelet discovery from the classifier by finding the top
Furthermore, considering achieving higher accuracy, some researchers have proposed ensembles that are highly competitive on general classification problems [30]. Typical examples include EE, BOSS, and COTE [5]. To obtain better accuracy, ensembles usually need to run multiple classifiers on each dataset, which causes their time complexity and space complexity to be much higher than that of a single classifier. Similarly, Fawaz et al. [6] introduced some classification techniques based on deep learning, such as ResNet, FCN, and Encoder. The characteristic of this kind of method is to regard the time series as image data and then use more mature image recognition and classification technology for classification, and some models have achieved promising classification accuracy. However, this type of classifier also one-sidedly emphasize accuracy and are often accompanied by a complicated hyperparameter adjustment process, and the classification result is often not interpretable.
Definition and related technology
In this section, some definitions and technologies related to time series classification will be introduced in detail.
Basic definition
It should be noted that when both
Morphological pattern
The morphological pattern (MP) [11] is a classic time series encoding method that can encode the rate of change of the time series upward trend and downward trend into a series of discrete values, thereby effectively reflecting the overall time series trend. For the time series
where
Based on the segmented limit theorem and the central limit theorem of the piecewise aggregate approximation (PAA), Lin et al. combined the normal distribution characteristics of time series and proposed the symbolic representation of time series (SAX) [31]. SAX has been proven to be a fast and effective tool to solve the problem of time series classification. It can convert a time series
Symbolization process of time series.
Haar wavelet transform of time series.
Step 1: SAX is usually used for time series obeying the standard normal distribution, so it is necessary to standardize the time series with Eq. (4).
where NX represents the sequence obtained by standardizing
Step 2: As shown in Fig. 2, the standardized sequence NX needs to be divided into
where
Step 3: Divide the distribution space of the segment mean obtained in Step 2 into equal probability or width intervals and assign a symbol to each interval. For example, all PAA segment averages below the minimum breakpoint (the boundary value of the division interval) can be mapped to the symbol “
The discrete wavelet transform (DWT) is widely used in the field of signal and image processing. It can decompose signals and images into low-frequency components and high-frequency components. The low-frequency components contain the trend information of the signal, and the noise in the signal is concentrated in the high-frequency components [32]. Due to various factors in the environment, the time series obtained in real-life production often contains considerable noise. Given the abovementioned characteristics of DWT technology, we extend it to the local information extraction of time series. Concretely, we use DWT to extract the corresponding low-frequency components of time series subsequences to express the local structure information. Due to the different wavelet coefficients, there are many discrete wavelet transform strategies. For simplicity, this paper uses the Haar discrete wavelet transform to extract and represent the local structure information of the subsequence [33]. Figure 3 shows the extraction result of the Haar wavelet transformation on the trend information of the time series. The trend information of the series after the wavelet transform is retained, and the noise is significantly alleviated.
Here is the extraction process of the local trend component (low frequency component) of the time series
Step 1: In the original time series, each point is taken as the center point to extract the subsequence
Step 2: Given any subsequence
According to Eq. (6), the low-frequency component of all subsequences in time series
DTW is a dynamic programming algorithm that can measure the similarity between sequences through nonlinear distortions and find the optimal matching path between time series. In addition, DTW can be applied to both univariate and multivariate time series. For simplicity, only univariate time series are considered in this paper.
LMP encoding process.
Given two time series
where
In this section, we specifically introduce the proposed similarity measure method MPDTW. Concretely, as shown in Fig. 4, the LMP encoding algorithm is first used to extract the local structure information of the time series. Then, based on the encoding results, we use the LMPD algorithm to calculate the local structural difference between time series points. Finally, the MPDTW measures the time series similarity by weighting the LMPD and ED.
Local morphological pattern encoding
As mentioned in Section 2, the morphological pattern can well reflect the overall trend of the sequence by encoding the upward and downward trends of the time series into a series of discrete values. At the same time, the symbolic representation method SAX can achieve a simple and effective representation of the time series by segmenting and symbolizing the sequence, which smooths the sequence and reduces noise. To extract the local structure information of the sequence as efficiently as possible while avoiding the influence of data noise, we designed the extraction method of the local structure information of the time series with reference to MP and SAX, which is named local morphological pattern encoding (LMP). LMP symbolizes and encodes subsequences based on the morphological pattern framework. Figure 4 specifically shows the process of LMP encoding time series.
Given a time series
Step 1: Extract the subsequence
Step 2: Use DWT to decompose the subsequence extracted in Step 1 and extract the trend component
Step 3: The trend component ST of each subsequence extracted in Step 2 is coded by the LMP encoding Eqs (10) and (11), and the symbol aggregation approximate representation of the trend component of the subsequence can be obtained, namely,
Through LMP encoding, the symbolic representation of the local structure information of the time series
where
Algorithm 1 shows the LMP encoding algorithm. The first line is the initialization of the conversion sequence; Lines 2–4 are the DWT processing on the subsequence
: LMP encoding
[1] time series subsequence
Traditional DTW algorithm usually uses ED to calculate the distance between time series point pairs. For example, given two time series
where
Algorithm 4.2 shows the calculation process of MPDTW. Specifically, the first line is the initialization of the distance measurement matrix; Lines 2–10 are the entire process of the similarity measure of sequences
: Calculation of MPDTW
[1] time series
The proposed MPDTW considers the local structure information of the time series when calculating the similarity, and its time complexity is
We conduct extensive experiments on 85 univariate time series datasets to prove the effectiveness of MPDTW. These datasets are all from the UCR1
https://www.timeseriesclassification.com.
The proposed MPDTW includes three parameters: division point
First, we tested the impact of division point
The influence of local morphology breakpoint 
The influence of local morphological pattern weight 
The LMPD weight
For the subsequence length
The effect of the change of subsequence length l on classification accuracy.
In summary, according to the test results, the division point
In this section, we use MPDTW in 1NN to perform classification tests on 85 time series datasets in UCR and compare the classification results with popular baselines to analyze the performance of the proposed method. According to different classification strategies, we divided the existing benchmark classifiers into four types to compare with 1NN-based MPDTW. The detailed evaluation results are as follows.
(1) Shape-based classifier: Shape-based classification methods measure and classify time series based on global similarity, such as ED, LCSS, DTW, DDTW, WDTW, CID, TWE, DTD, and shapeDTW.
Classification accuracy of models based on DTW
Classification accuracy of models based on DTW
Essentially, the proposed MPDTW is based on the DTW framework, so here, we first compare the classification accuracy of MPDTW with DDTW, WDTW, and shapeDTW, which are also based on the DTW measure framework. The results are shown in attached Table 1 (considering the space, the results of DTW are not listed in the table). The accuracy rate in bold in the table indicates that the accuracy rate of the corresponding model is the best among the four models. The statistical results of the data in Table 1 show that DDTW, WDTW, shapeDTW, and MPDTW have the best classification accuracy on 12, 21, 24, and 35 datasets, respectively, and the optimal number of MPDTW is significantly more than other classification models. In particular, compared with the other three models, the accuracy of MPDTW is improved by more than 10% on 16, 10, and 6 datasets.
Comparison of shape-based classifiers.
Critical difference graph of shape-based classifiers.
To further analyze the performance of MPDTW, we draw a scatter plot comparing MPDTW with other shape-based methods (as shown in Fig. 8, each red point in the figures represents a dataset, and the points below the diagonal line indicate that the accuracy of MPDTW is higher). Figure 8 shows that compared with the traditional shape-based method, the accuracy of the proposed MPDTW is significantly better than ED (57/1/27), LCSS (48/1/36), DTW (55/3/27), DDTW (47/1/37), WDTW (48/3/34), CID (44/2/39), TWE (46/1/38), DTD-C (46/2/37), and shapeDTW (48/4/33). The numbers in parentheses indicate the number of datasets that win, equal and lose in sequence. For example, “(55/3/27)” means that the proposed MPDTW has a higher accuracy rate on 55 datasets, the accuracy rate on 3 datasets is consistent with the compared method, and the accuracy rate on 27 datasets is lower than that of the compared method. In addition, we use the significant difference plot proposed by DemÅ¡ar to score these classification methods [34]. The scoring results are shown in Fig. 9. We can find that MPDTW achieves the best overall average rank of 4.6941, which is lower than DTD, CID, and shapeDTW, etc.
(2) Structure-based classifiers: Structure-based classifiers usually first extract discriminative subsequences from the sequence, then perform symbolization or other conversions on the subsequences, and finally perform similarity measurements in the new feature space after the conversion. Typical structure-based classifiers include FS, BOP, SAXVSM, LS, DTW-F and WEASEL [35]. We also compare the classification performance of these classifiers with the proposed MPDTW through the critical difference graph. The comparison result is shown in Fig. 10. Except for the significant difference from WEASEL, MPDTW has no significant difference compared with the classic models of DTW-F and LS and is significantly better than FS, BOP and SAXVSM.
Critical difference graph of structure-based classifiers.
(3) Ensemble classifier: The ensemble classifier can obtain good classification performance by integrating a series of single classifiers. Currently, the popular ensemble classifiers in time series classification tasks include ACF, PS, LPS, TSF, TSBF, BOSS, EE, COTE and ST. It can be seen from Fig. 11 that although it is slightly weaker than ST, BOSS and COTE, the proposed MPDTW has no significant difference compared with the ensemble classifiers such as EE, TSF, and TSBF. In addition, as a single classifier, MPDTW can be integrated into these ensemble classifiers to improve their performance.
Critical difference graph of ensemble classifiers.
(4) Deep learning-based classifiers: In addition to the aforementioned traditional classifiers, in recent years, some researchers have also proposed many deep learning-based classifiers, such as multilayer perceptron (MLP) [36], MCNN [37], TleNet [38], MCDCNN [39], CNN [40], TWISN [41], ResNet, FCN and Encoder. Figure 12 shows the critical difference graph comparing MPDTW with these classification models, from which we can see that in addition to being slightly weaker than ResNet, MPDTW has no significant difference with FCN and Encoder, and is significantly better than the others. More importantly, the proposed MPDTW has better interpretability, which is an advantage that the classification model based on deep learning does not have.
Critical difference graph of deep learning-based classifiers
Through the previous comparison of classification accuracy, it can be seen that MPDTW is better than most classifiers. Although the results are weaker than individual classifiers, this does not mean that MPDTW is completely weaker than these classifiers. To prove this point, we compared these classifiers that are superior to MPDTW in the critical difference graph with MPDTW in terms of running time. Classifiers based on deep learning, due to the need to adjust a large number of hyperparameters, will not be tested here. The test is performed on 10 datasets. For objectivity, we take the average of 10 runs of the classifier on each dataset. The test result is shown in Fig. 13, from which we can see that, compared with ST and COTE, MPDTW has at least an order of magnitude advantage in running time, and for BOSS, MPDTW also has a running time shorter than it on most datasets.
The average running time of BOSS, COTE, WEASAL, ST, and MPDTW on 10 datasets.
The effect of Gaussian noise on classifier performance.
Besides the strong competitiveness in terms of accuracy and efficiency, our method can effectively reduce data noise due to the DWT method adopted in the similarity measurement process. For verification, we randomly selected three datasets Symbols, WordSynonyms, and BeetleFly and added Gaussian noise to them. Then, we tested the noise resistance of MPDTW with the noise-added datasets. Specifically, we compare MPDTW with current representative baselines, including shapeDTW, WEASEL, and ResNet, which are all competitive models in different types of single classifiers. The comparison results are shown in Fig. 14, where SNR is the signal-to-noise ratio, the larger the SNR, the smaller the noise. In particular, when the value of the x-axis is “Original”, it indicates the classification results without adding Gaussian noise. As can be seen from the figures, after adding different degrees of noise to the data, the classification accuracy of MPDTW on the three datasets is generally better than other methods, which indicates that MPDTW is more capable of dealing with noise.
Case study
The BeetleFly [28] dataset in UCR classifies insects by image contours. The time series in the dataset contains many peak and valley shapes, which enriches its local structure information. We tested the alignment capabilities of DTW, shapeDTW, and MPDTW on this dataset. The test result is shown in Fig. 15. In the part enclosed by the green box in the figures, we can see that there are obvious alignment errors in the alignment of DTW and shapeDTW, while MPDTW achieves a more accurate alignment, which shows that the classification result based on MPDTW is more interpretable.
In addition, we also show the superiority of the proposed MPDTW through the t-SNE diagram [42]. Detailed information on the dataset used is shown in Table 2. The three datasets contain 2, 4, and 6 time series types. It can be seen from Fig. 16 that MPDTW has the best distinguishing ability for all datasets among the three methods. Although the performance of shapeDTW on the dataset FaceFour is similar to MPDTW, there is still a small amount of error distribution in shapeDTW. In contrast, MPDTW distinguishes these time series completely and accurately.
Details of datasets
Details of datasets
Comparison of alignment effects of BeetleFly.
Comparison of measurement results between MPDTW, DTW and shapeDTW.
In this paper, we propose a classic time series local structure information coding method LMP, which uses the ideas of MP and SAX to realize the discretization and symbolic representation of the local structure information of the time series, and at the same time, the DWT technology is used to effectively extract the local trend information and filter noise alleviate of the time series. Based on LMP, we propose the time series similarity measurement method MPDTW, which weights the original numerical information and local structure information of the time series. MPDTW comprehensively considers the important role of time series original numerical information and local structure information in time series similarity measurement and to a large extent alleviates the limitations of traditional methods that cannot take into account the two factors. Experimental results show that MPDTW has excellent performance in time series classification tasks. In addition, the case study results show that the classification model based on this method has good interpretability.
Footnotes
Acknowledgments
This work is supported by the National Key R&D Program of China (No. 2022YFE0200400), the Beijing Natural Science Foundation (Nos. 4214067, 4182052), the National Natural Science Foundation of China (Nos. 61702030), and Fundamental Research Funds for the Central Universities (2022JBMC011).
