Local morphological patterns for time series classification

Abstract

The key problem of time series classification is the similarity measure between time series. In recent years, efficient and accurate similarity measurement methods for time series have attracted extensive attention from researchers. According to their similarity measure strategies, the existing time series classification methods can be roughly divided into shape-based (original value) methods and structure-based (symbol transformation) methods. Shape-based methods usually use Euclidean distance (ED), dynamic time warping (DTW), or other methods to measure the global similarity between sequences. The disadvantage of these methods is that their measurement process does not necessarily achieve locally sensible matchings of time series, which decreases their accuracy and interpretability. To better capture the local information of the sequence, the structure-based methods discretize or symbolize the local values of the time series, which leads to the loss of the original information of the sequence. To address these problems, this paper proposes a novel similarity measurement method named dynamic time warping based on the local morphological pattern (MPDTW), which first decomposes the local subsequences of time series using discrete wavelet transforms to extract the local structure information. Then, the decomposed subsequence is encoded by the morphological pattern. Finally, the ED between points and their local structure difference based on the morphological pattern are weighted and applied to the DTW algorithm to measure the similarity between sequences. Experiments have been carried out on the classification tasks of the UCR datasets, and the results show that our method outperforms the existing baselines.

Keywords

1. Introduction

With the development of modern industry and information technology, and the continuous advancement of data generation and collection technology, massive amounts of data are constantly being generated in many areas of daily production and life. These data can essentially be regarded as time series, and the corresponding classification technologies have attracted increasing attention in many practical fields, such as finance, medicine, geological monitoring, climate science, and aerospace [1, 2, 3, 4, 5, 6].

Time series classification is one of the most important research topics in the field of data mining and has received extensive attention from many researchers in recent years [7, 8, 5, 6, 9]. At present, some methods of time series classification have been proposed. According to the different similarity measure strategies used in the classification process, these methods can be divided into two types: shape-based methods and structure-based methods [10, 11]. The former mainly focuses on the global similarity between time series, using the original value of the series or its slope and other derivative values to measure the similarity and apply it to the classifiers. At present, similarity measure techniques commonly used in such classifiers include ED, longest common subsequence (LCSS) [12], and DTW [13]. The One Nearest Neighbor (1NN) classifiers using these distance measures are very easy to implement and have been widely used in time series classification. However, this type of time series classification method that relies on the global similarity of the sequence fails to handle the local structure information of the sequence during the calculation of the similarity. This will eventually decrease the interpretability and accuracy of the classification results. Figure 1 shows the sequence matching results of the similarity measure of two time series in the DTW-1NN classification model. Some local structures between the two sequences are not well matched (the area marked by the green box in Fig. 1).

Figure 1.

The similarity measure results of DTW in 1NN.

Structure-based classification methods pay more attention to the local structure information of the time series in the similarity measure. These classifiers first learn local features and perform feature space transformation on the time series, such as transforming the local subsequence of the time series into a symbol sequence. Then, the similarity measure between time series is calculated. Among these methods, symbolic aggregation approximation (SAX) [14], symbolic Fourier approximation (SFA) [1], and bag of pattern (BOP) [15], etc. are commonly used. Compared with shape-based methods, structure-based methods highlight the local structure information of the sequence. However, because the original data information of the time series is discarded during the feature space transformation, the structure-based methods also have the problem of sequence information loss during the similarity measure of the sequence. In addition, the feature space complexity of this type of method is often high, and at the same time, it is accompanied by a more complicated process of selecting the optimal parameters, which is not conducive to the classification results.

To address the above problems, we propose a new time series similarity measure consisting of two sequential steps: (1) local morphological pattern (LMP) encoding and (2) distance weighting. Using this similarity measure in the 1NN classifier makes it possible to achieve accurate and interpretable time series classification. Specifically, in the LMP encoding step, we first extract a subsequence around each timestamp and then use the discrete wavelet transform (DWT) [16] to decompose it. To extract the structure information, we propose the LMP to encode the decomposed subsequence, which represents the local structure information at that timestamp. In the distance weighting step, we assign appropriate weights to the ED and the local morphological pattern difference between any two points of the time series and use their weighted sum in place of the traditional single ED. The proposed method alleviates the imbalance between the original information and local structure information in traditional time series similarity measurement methods.

The main contributions of this paper are summarized as follows:

We use DWT to decompose the time series subsequences, which can effectively extract the local structure information of the sequence. In addition, the noise reduction characteristics of DWT effectively alleviate the influence of data noise on the time series similarity measurement process.

The proposed LMP was successfully applied to the local structure information encoding of time series and realized effective representation of the local structure information of the sequence.

The proposed similarity measurement method effectively alleviates the imbalance between the original sequence information and local structural information in traditional similarity measurement methods by weighted ED and local morphological pattern differences. In addition, the 1NN classifier using MPDTW shows accurate, efficient, and interpretable classification results in the test using the UCR datasets.

The rest of the paper is organized as follows. Related work is discussed in Section 2. Section 3 gives a formal description of the time series classification problem and its related definitions. Section 4 elaborates on the similarity measurement method proposed in this paper. Section 5 introduces the experimental design and results. Finally, the conclusions of this article are drawn in Section 6.

2. Related work

In recent years, some methods for time series classification have been proposed. Here, according to similarity measure strategies, we divide these methods into two categories: shape-based methods and structure-based methods. Specifically, the shape-based methods rely on the global similarity of time series to classify them. At present, most research on this type of method focuses on alternative elastic distance measures of the global similarity of sequences. Typical examples are ED, LCSS, and DTW, which have been successfully applied to classify time series that have global similarity. However, these point-to-point matching methods are unreliable and error-prone because locally different subsequences may be aligned. To improve DTW, derivative dynamic time warping (DDTW) was proposed in [17] to achieve a reasonable alignment by converting the original sequence into a first-order derivative sequence. In [18], the authors developed a globally weighted dynamic time warping (WDTW) algorithm that assigns lower weights to points closer to the diagonal. In [19], the authors described the complexity invariant distance (CID), a means of weighting a distance measure to compensate for differences in complexity between the two series being compared. In addition, the time warp edit (TWE) [20] implements the elastic measure of similarity between sequences by imposing a penalty on the distance between point pairs. Górecki et al. [21] proposed the derivative transform distance (DTD), which extended the sine-cosine transform to DDTW to measure the similarity of time series. In particular, to achieve a reasonable match of local shape features, shapeDTW [22] converted the sequence into shape feature codes and then aligned them. These measurement methods are designed to use elastic distance metrics to compensate for small deviations between sequences and are usually used with the nearest neighbor classifier.

When there are discriminative features in the entire sequence, a classification method that relies on the global similarity measurement is appropriate. However, when the discriminative characteristics of the sequence are contained in its local structure, this type of method cannot capture the local structure information well, which causes a loss of sequence information during the measurement and affects the final classification performance. In addition, this type of algorithm is more sensitive to noise.

Structure-based methods focus on extracting the local structure information of the time series. The local feature representation after feature transformation usually has a better smoothing effect and is more resistant to noise. For example, in [23], the authors proposed time series forest (TSF) to overcome the problem of the large interval feature space by employing a random forest approach, using summary statistics of each interval as features. The time series bag of features (TSBF) [24] is an extension of TSF that has multiple stages. Similar to TSF and TSBF, learning pattern similarity (LPS) [3] is also based on intervals, but the main difference is that subseries become attributes rather than cases. In addition, bag of pattern (BOP) [15] works by applying SAX to subsequences to form words and using the distribution of words over a series to form a count histogram to classify samples. Unlike BOP, which forms term frequencies over series, the symbolic aggregate approximation-vector space model (SAXVSM) [14] forms term frequencies over classes and weights them by the inverse document frequency. DTW features (DTW-F) [25] combines DTW distances to training cases with SAX histograms. In addition, [26] proposed fast shapelets (FS), an extension of the decision tree shapelet approach [27], to speed up shapelet feature discovery. [28] proposed a shapelet transformation that separates shapelet discovery from the classifier by finding the top $k$ shapelets in a single run. [29] described a shapelet discovery algorithm, learned shapelets (LS), which adopts a heuristic gradient descent shapelet search procedure rather than enumeration. However, focusing on the local structure information of the sequence leads these methods to ignore the original numerical information of the sequence. At the same time, these methods are often accompanied by a large feature space and complex parameter selection processes, which can adversely affect the final classification results.

Furthermore, to achieve higher accuracy, some researchers have proposed ensembles that are highly competitive on general classification problems [30]. Typical examples include EE, BOSS, and COTE [5]. To obtain better accuracy, ensembles usually need to run multiple classifiers on each dataset, which makes their time and space complexity much higher than that of a single classifier. Similarly, Fawaz et al. [6] introduced some classification techniques based on deep learning, such as ResNet, FCN, and Encoder. These methods regard the time series as image data and then apply more mature image recognition and classification technology, and some models have achieved promising classification accuracy. However, this type of classifier also one-sidedly emphasizes accuracy and is often accompanied by a complicated hyperparameter adjustment process, and its classification results are often not interpretable.

3. Definition and related technology

In this section, some definitions and technologies related to time series classification will be introduced in detail.

3.1 Basic definition

Definition 1 (Time series and its subsequences): The time series $T=(t_{1},t_{2},\ldots,t_{n})$ is an ordered real-valued sequence of length $n$ . The subsequence $S=(t_{i},t_{i+1},\ldots,t_{i+l-1})$ of a time series is a sequence consisting of $l$ consecutive ordered values from position $i$ in $T$ , where $1\leqslant l\leqslant n$ , $1\leqslant i\leqslant n-l+1$ .

Definition 2 (Time series dataset): A dataset is a set of time series $T_{i}$ and their class labels $c_{i}$ . Usually, $D=\left\{\left\langle T_{1},c_{1}\right\rangle,\left\langle T_{2},c_{2}\right\rangle,\ldots,\left\langle T_{N},c_{N}\right\rangle\right\}$ , where $N$ is the number of time series in $D$ .

Definition 3 (Distance between time series points): Given two time series $X=(x_{1},x_{2},\ldots,x_{m})$ , $Y=(y_{1},y_{2},\ldots,y_{n})$ , the distance between any point $x_{i}$ in the sequence $X$ and any point $y_{j}$ in the sequence $Y$ can be expressed by Eq. (1).

$\displaystyle\textit{dist}\left(x_{i},y_{j}\right)=\left|x_{i}-y_{j}\right|.$ (1)

It should be noted that when both $x_{i}$ and $y_{j}$ are represented by symbols, the difference calculation ( $x_{i}-y_{j}$ ) depends on the index determination of the symbol value in the alphabet. For example, for a given sequence, its point values are represented by the characters in a given ordered alphabet $\left\{a,b,c,d,e\right\}$ . Then, the difference between the symbols “ $a$ ” and “ $b$ ” is represented as $\left|{a-b}\right|=1$ .

Definition 4 (Equal-length subsequence distance): Given two subsequences $S_{1}=(u_{1},u_{2},\ldots,u_{l})$ , $S_{2}=(v_{1},v_{2},\ldots,v_{l})$ , the distance between $S_{1}$ and $S_{2}$ can be formulated as

$\displaystyle\textit{Dist}\left(S_{1},S_{2}\right)=\sqrt{\sum\limits_{i=1}^{l}\textit{dist}\left(u_{i},v_{i}\right)^{2}}.$ (2)
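Definitions 3 and 4 can be illustrated with a minimal Python sketch. The function names `dist` and `subseq_dist` and the default alphabet are chosen here for illustration only; the symbol case follows the alphabet-index convention described after Eq. (1).

```python
import math

ALPHABET = "abcde"  # ordered alphabet from the example after Eq. (1)

def dist(x, y, alphabet=ALPHABET):
    """Point distance of Eq. (1); symbols are compared by their alphabet index."""
    if isinstance(x, str) and isinstance(y, str):
        return abs(alphabet.index(x) - alphabet.index(y))
    return abs(x - y)

def subseq_dist(s1, s2, alphabet=ALPHABET):
    """Equal-length subsequence distance of Eq. (2)."""
    if len(s1) != len(s2):
        raise ValueError("subsequences must have equal length")
    return math.sqrt(sum(dist(u, v, alphabet) ** 2 for u, v in zip(s1, s2)))
```

For instance, `dist("a", "b")` evaluates to 1, matching the example given after Eq. (1).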

Definition 5 (Time series classification): Given a time series set $D=\left\{T_{1},T_{2},\ldots,T_{N}\right\}$ , each sequence $T_{i}$ is associated with a class label $c_{i}$ ( $i=1,2,\ldots,N$ ). The goal of time series classification is to use existing labeled time series to train a classification model and label new time series.
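As an illustration of Definition 5, the 1NN classifier mentioned in the introduction could be sketched as follows. This is a minimal sketch, not the implementation used in the experiments: the names `euclidean` and `one_nn_predict` are hypothetical, and ED is used only as a placeholder distance into which DTW or MPDTW could be substituted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series (placeholder distance)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def one_nn_predict(train, query, distance=euclidean):
    """Return the label of the training series closest to `query`.

    train: list of (series, label) pairs, i.e. the dataset D of Definition 2.
    distance: any series-to-series distance function.
    """
    _, label = min(train, key=lambda pair: distance(pair[0], query))
    return label
```

Because the classifier has no training phase beyond storing the labeled series, its accuracy and interpretability depend entirely on the distance function that is passed in.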

3.2 Related technology

3.2.1 Morphological pattern

The morphological pattern (MP) [11] is a classic time series encoding method that can encode the rate of change of the time series upward trend and downward trend into a series of discrete values, thereby effectively reflecting the overall time series trend. For the time series $X=(x_{1},x_{2},\ldots,x_{n})$ , the MP sequence $F=(f_{1},f_{2},\ldots,f_{n})$ can be obtained by Eq. (3).

$\displaystyle f_{i}=\left\{\begin{array}{ll}3,&\left(x_{i}-x_{i-1}\right)/t>1\\ 2,&\left(x_{i}-x_{i-1}\right)/t=1\\ 1,&0<\left(x_{i}-x_{i-1}\right)/t<1\\ 0,&x_{i}=x_{i-1}\\ -1,&-1<\left(x_{i}-x_{i-1}\right)/t<0\\ -2,&\left(x_{i}-x_{i-1}\right)/t=-1\\ -3,&\left(x_{i}-x_{i-1}\right)/t<-1\end{array}\right.,$ (3)

where $t$ is the index interval between two adjacent sampling time points.
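Eq. (3) can be sketched in a few lines of Python. The function name `mp_encode` is hypothetical, and the sketch assumes that the first point, which has no predecessor, receives no code, so a series of length $n$ yields $n-1$ codes.

```python
def mp_encode(x, t=1.0):
    """Morphological pattern codes of Eq. (3) for consecutive points of x.

    Assumption: the first point has no predecessor and is skipped,
    so the result has len(x) - 1 entries.
    """
    codes = []
    for prev, cur in zip(x, x[1:]):
        slope = (cur - prev) / t
        if slope > 1:
            codes.append(3)
        elif slope == 1:
            codes.append(2)
        elif slope > 0:
            codes.append(1)
        elif slope == 0:
            codes.append(0)
        elif slope > -1:
            codes.append(-1)
        elif slope == -1:
            codes.append(-2)
        else:
            codes.append(-3)
    return codes
```

With real-valued data, the exact-equality cases (codes 2 and -2) are rarely hit, so in practice the encoding mostly distinguishes steep, gentle, and flat changes.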

3.2.2 Symbolic aggregate approximation

Based on the segmented limit theorem and the central limit theorem of the piecewise aggregate approximation (PAA), Lin et al. combined the normal distribution characteristics of time series and proposed the symbolic aggregate approximation (SAX) [31]. SAX has been proven to be a fast and effective tool for time series classification. It converts a time series $X$ of length $n$ into a symbol sequence of length $w$ ( $w\ll n$ ), which simplifies the representation of the time series while smoothing it and reducing its noise. The symbolization process of a time series is shown in Fig. 2 and mainly includes 3 steps:

Figure 2.

Symbolization process of time series.

Figure 3.

Haar wavelet transform of time series.

Step 1: SAX is usually used for time series obeying the standard normal distribution, so it is necessary to standardize the time series with Eq. (4).

$\displaystyle\textit{NX}=\frac{{X-\mu}}{\sigma},$ (4)

where NX represents the sequence obtained by standardizing $X$ , $\mu$ is the mean value of all points in $X$ , and $\sigma$ is the standard deviation of the value of these points.

Step 2: As shown in Fig. 2, the standardized sequence NX needs to be divided into $w$ segments of equal length by PAA. Then the means of these segments are used to represent the corresponding sequence segments. The calculation of the segment means is shown in Eq. (5).

$\displaystyle\bar{x}_{i}=\frac{w}{n}\sum\limits_{j=\frac{n}{w}\left(i-1\right)+1}^{\frac{n}{w}i}x_{j},$ (5)

where ${\bar{x}_{i}}$ is the mean value of the $i$th segment, and $x_{j}$ is the $j$th point of the standardized sequence NX.

Step 3: Divide the distribution space of the segment mean obtained in Step 2 into equal probability or width intervals and assign a symbol to each interval. For example, all PAA segment averages below the minimum breakpoint (the boundary value of the division interval) can be mapped to the symbol “ $a$ ” and all the segment averages greater than or equal to the minimum breakpoint and less than the second minimum breakpoint can be mapped to the symbol “ $b$ ”. The symbolized result of the time series in Fig. 2 is “eaaddddc”.
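The three SAX steps can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions rather than the reference SAX implementation: the function name `sax` is hypothetical, the series length is assumed to be a multiple of $w$, the population standard deviation is used in Eq. (4), and equal-probability breakpoints of the standard normal distribution are used in Step 3.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

def sax(x, w, alphabet="abcde"):
    """Symbolize series x into w symbols (Steps 1-3)."""
    n = len(x)
    if n % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    mu, sigma = mean(x), pstdev(x)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    # Step 1: standardization, Eq. (4).
    nx = [(v - mu) / sigma for v in x]
    # Step 2: PAA segment means, Eq. (5).
    seg = n // w
    paa = [mean(nx[i * seg:(i + 1) * seg]) for i in range(w)]
    # Step 3: equal-probability breakpoints under N(0, 1) (assumption).
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # A mean equal to a breakpoint falls into the upper interval.
    return "".join(alphabet[bisect_right(breakpoints, m)] for m in paa)
```

A call such as `sax(series, w=8)` returns a string of eight symbols from the alphabet, comparable in form to the "eaaddddc" result shown in Fig. 2.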

3.2.3 Discrete wavelet transform

The discrete wavelet transform (DWT) is widely used in the field of signal and image processing. It can decompose signals and images into low-frequency components and high-frequency components. The low-frequency components contain the trend information of the signal, and the noise in the signal is concentrated in the high-frequency components [32]. Due to various factors in the environment, the time series obtained in real-life production often contains considerable noise. Given the abovementioned characteristics of DWT technology, we extend it to the local information extraction of time series. Concretely, we use DWT to extract the corresponding low-frequency components of time series subsequences to express the local structure information. Due to the different wavelet coefficients, there are many discrete wavelet transform strategies. For simplicity, this paper uses the Haar discrete wavelet transform to extract and represent the local structure information of the subsequence [33]. Figure 3 shows the extraction result of the Haar wavelet transformation on the trend information of the time series. The trend information of the series after the wavelet transform is retained, and the noise is significantly alleviated.

Here is the extraction process of the local trend component (low frequency component) of the time series $T=\left\{{{t_{1}},{t_{2}},\ldots,{t_{n}}}\right\},{T\in{R^{n}}}$ , which includes the following 2 steps:

Step 1: In the original time series, each point is taken as the center point to extract the subsequence $S_{i}$ with a length $l$ (for simplicity, $l$ is set to an even number here).

Step 2: Given any subsequence $S=\left\{{{s_{1}},{s_{2}},\ldots,{s_{l}}}\right\},{S\in{R^{l}}}$ , $\textit{SL}=\left\{{s_{1}^{L},s_{2}^{L},\ldots,s_{l/2}^{L}}\right\}$ represents the low-frequency component of the subsequence S, and each low-frequency component element ${s_{i}^{L}}$ of the subsequence can be extracted according to Eq. (6).

$\displaystyle{s_{i}^{L}=\frac{1}{2}\left({{s_{2i-1}}+{s_{2i}}}\right)}.$ (6)

According to Eq. (6), the low-frequency components of all subsequences in time series $T$ can be obtained, and $T$ can finally be represented as $\textit{TL}=\left\{\textit{SL}_{1},\textit{SL}_{2},\ldots,\textit{SL}_{n}\right\}$ , $\textit{TL}\in R^{(l/2)\times n}$ , since each $\textit{SL}_{i}$ contains $l/2$ elements.
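The two extraction steps can be sketched as follows. The text does not specify how windows are handled at the series boundaries or how an even-length window is centred, so this sketch makes two labeled assumptions: the window around 0-based index $i$ covers indices $i-l/2$ to $i+l/2-1$, and out-of-range indices are clamped to the nearest boundary value. All function names are hypothetical.

```python
def centered_subsequence(t, i, l):
    """Window of even length l around 0-based index i.

    Assumptions: the window spans i - l/2 .. i + l/2 - 1, and indices
    outside the series are clamped to the first or last point.
    """
    half, n = l // 2, len(t)
    return [t[min(max(k, 0), n - 1)] for k in range(i - half, i + half)]

def haar_low(s):
    """Low-frequency component of Eq. (6): means of consecutive pairs."""
    return [(s[2 * k] + s[2 * k + 1]) / 2 for k in range(len(s) // 2)]

def trend_components(t, l):
    """Low-frequency components SL_1..SL_n of all centred subsequences of t."""
    return [haar_low(centered_subsequence(t, i, l)) for i in range(len(t))]
```

Each returned component has $l/2$ elements, one per pair of adjacent points in the window.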

3.2.4 Dynamic time warping

DTW is a dynamic programming algorithm that can measure the similarity between sequences through nonlinear distortions and find the optimal matching path between time series. In addition, DTW can be applied to both univariate and multivariate time series. For simplicity, only univariate time series are considered in this paper.

Figure 4.

LMP encoding process.

Consider two time series $P=\left\{p_{1},p_{2},\ldots,p_{m}\right\}$ , $P\in R^{m}$ and $Q=\left\{q_{1},q_{2},\ldots,q_{n}\right\}$ , $Q\in R^{n}$ . The distance matrix of the points between $P$ and $Q$ is defined as $\textit{DIST}\in R^{m\times n}$ , where each element $\textit{dist}\left(p_{i},q_{j}\right)$ represents the distance between $p_{i}$ and $q_{j}$ $\left(1\leqslant i\leqslant m,1\leqslant j\leqslant n\right)$ . The purpose of aligning $P$ and $Q$ is to find an alignment path $W=\left(\left(e_{1},f_{1}\right),\left(e_{2},f_{2}\right),\ldots,\left(e_{l},f_{l}\right)\right)$ , $\textit{max}\left(m,n\right)\leqslant l\leqslant\left(m+n-1\right)$ , in which the point with index $e_{i}$ in sequence $P$ and the point with index $f_{i}$ in sequence $Q$ are matched. The optimal path $W$ minimizes $\sum_{i=1}^{l}\textit{dist}\left(p_{e_{i}},q_{f_{i}}\right)$ subject to the boundary, monotonicity, and continuity conditions shown in Eq. (7).

$\displaystyle\left\{\begin{array}{l}\left(e_{1},f_{1}\right)=\left(1,1\right)\\ \left(e_{l},f_{l}\right)=\left(m,n\right)\\ \left(e_{i+1},f_{i+1}\right)-\left(e_{i},f_{i}\right)\in\left\{\left(1,0\right),\left(1,1\right),\left(0,1\right)\right\}\end{array}\right.$ (7)

The optimal path can be found by dynamic programming. Let $\gamma\left(i,j\right)$ denote the cumulative distance of the optimal alignment between the first $i$ points of $P$ and the first $j$ points of $Q$ , with $\gamma\left(0,0\right)=0$ and $\gamma\left(i,0\right)=\gamma\left(0,j\right)=\infty$ for $i,j\geqslant 1$ . The recursive process of $\gamma\left(i,j\right)$ is shown in Eqs (8) and (9).

$\displaystyle\gamma\left(i,j\right)=\textit{dist}\left(p_{i},q_{j}\right)+\textit{min}\left\{\gamma\left(i-1,j-1\right),\gamma\left(i-1,j\right),\gamma\left(i,j-1\right)\right\},$ (8)

$\displaystyle\textit{DTW}\left(P,Q\right)=\gamma\left(m,n\right).$ (9)
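The recursion of Eqs (8) and (9) can be sketched in Python as a plain dynamic program. The function name `dtw` is chosen here for illustration; the sketch uses the absolute difference of Eq. (1) as the point distance and the standard boundary values of zero at the origin and infinity along the borders.

```python
import math

def dtw(p, q):
    """Cumulative DTW distance between series p and q, Eqs (8) and (9)."""
    m, n = len(p), len(q)
    # gamma has one extra row and column holding the boundary values.
    gamma = [[math.inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(p[i - 1] - q[j - 1])  # Eq. (1)
            gamma[i][j] = cost + min(gamma[i - 1][j - 1],
                                     gamma[i - 1][j],
                                     gamma[i][j - 1])
    return gamma[m][n]
```

The three arguments of `min` correspond to the three admissible steps of the continuity condition in Eq. (7), which is why a repeated value in one series can be absorbed at no cost.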

4. Proposed method

In this section, we specifically introduce the proposed similarity measure method MPDTW. Concretely, as shown in Fig. 4, the LMP encoding algorithm is first used to extract the local structure information of the time series. Then, based on the encoding results, we use the LMPD algorithm to calculate the local structural difference between time series points. Finally, the MPDTW measures the time series similarity by weighting the LMPD and ED.

4.1 Local morphological pattern encoding

As mentioned in Section 3.2, the morphological pattern can well reflect the overall trend of the sequence by encoding the upward and downward trends of the time series into a series of discrete values. At the same time, the symbolic representation method SAX can achieve a simple and effective representation of the time series by segmenting and symbolizing the sequence, which smooths the sequence and reduces noise. To extract the local structure information of the sequence as efficiently as possible while avoiding the influence of data noise, we designed an extraction method for the local structure information of the time series with reference to MP and SAX, which is named local morphological pattern encoding (LMP). LMP symbolizes and encodes subsequences based on the morphological pattern framework. Figure 4 shows the process of LMP encoding of a time series.

Given two time series $X=\left(x_{1},x_{2},\ldots,x_{m}\right)$ and $Y=\left(y_{1},y_{2},\ldots,y_{n}\right)$ , the local structure information extraction process is as follows.

Step 1: Extract the subsequence $S=\left(s_{1},s_{2},\ldots,s_{l}\right)$ of length $l$ with each point $x_{i}$ as the center point in $X$ , as shown in Fig. 4 (the subsequence extraction method of $Y$ is the same as that of $X$ ). For simplicity, $l$ only takes even values.

Step 2: Use DWT to decompose the subsequence extracted in Step 1 and extract the trend component $\textit{ST}=\left({{t_{1}},{t_{2}},\ldots,t_{l/2}}\right)$ of the subsequence. The extraction method is shown in Eq. (6).

Step 3: The trend component ST of each subsequence extracted in Step 2 is encoded by the LMP encoding Eqs (10) and (11), which yields the symbolic aggregate approximation of the trend component of the subsequence, namely, $\textit{SM}=\left(f_{1},f_{2},\ldots,f_{l/2}\right)$ . Here, $f_{j}$ represents the symbol value corresponding to each value $t_{j}$ in the trend component ST after LMP encoding, and the symbols belong to the given alphabet $\left\{a,b,c,d,e,f,g\right\}$ . To preserve the correlation between the local structure information and the center point, we use the difference between each point in the trend component and the center point of the corresponding original subsequence instead of the difference between adjacent points used in the traditional MP. In short, we use $\left(t_{j}-x_{i}\right)$ to replace $\left(t_{j}-t_{j-1}\right)$ . $\Delta t$ represents the index difference between each trend component point $t_{j}$ and the center point of the trend component, as defined in Eq. (11), and $\varepsilon$ is the distribution space division point of the segmented mean of the time series. Figure 4 shows the LMP encoding process of a subsequence in the time series, and as shown in the figure, the final LMP encoding result can be expressed as “egggfaeegg”.

$\displaystyle f_{j}=\left\{\begin{array}{ll}g,&\left(t_{j}-x_{i}\right)/\Delta t>\varepsilon\\ f,&\left(t_{j}-x_{i}\right)/\Delta t=\varepsilon\\ e,&0<\left(t_{j}-x_{i}\right)/\Delta t<\varepsilon\\ d,&t_{j}=x_{i}\\ c,&-\varepsilon<\left(t_{j}-x_{i}\right)/\Delta t<0\\ b,&\left(t_{j}-x_{i}\right)/\Delta t=-\varepsilon\\ a,&\left(t_{j}-x_{i}\right)/\Delta t<-\varepsilon\end{array}\right.\quad 1\leqslant j\leqslant l/2,$ (10)

$\displaystyle\Delta t=\left\{\begin{array}{ll}\left|j-\left\lceil l/4\right\rceil\right|,&j\neq\left\lceil l/4\right\rceil\\ 1,&j=\left\lceil l/4\right\rceil\end{array}\right..$ (11)

Through LMP encoding, the symbolic representation of the local structure information of the time series $X$ and $Y$ at any points $x_{i}$ and $y_{j}$ can be obtained. We then use this representation to quantify the local structural difference between the sequences $X$ and $Y$ at points $x_{i}$ and $y_{j}$ . Specifically, the local structural difference LMPD between $x_{i}$ and $y_{j}$ is the squared distance of Eq. (2) between their LMP encoding sequences, as expressed in Eq. (12).

$\displaystyle\textit{LMPD}\left(x_{i},y_{j}\right)=\textit{Dist}\left(\textit{SM}_{X,i},\textit{SM}_{Y,j}\right)^{2}=\sum\limits_{k=1}^{l/2}\textit{dist}\left(\eta_{i,k},\beta_{j,k}\right)^{2}=\sum\limits_{k=1}^{l/2}\left|\eta_{i,k}-\beta_{j,k}\right|^{2},$ (12)

where $\textit{SM}_{X,i}$ and $\textit{SM}_{Y,j}$ represent the LMP encoding sequence of the sequence $X$ and $Y$ at the two points of $x_{i}$ and $y_{j}$ , ${\eta_{i,k}}$ and ${\beta_{j,k}}$ represent the symbols in the LMP encoding sequence of the two points of $x_{i}$ and $y_{j}$ , and the calculation method of the distance between symbols is shown in Eq. (1).
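The LMPD computation can be sketched as a sum of squared alphabet-index differences between two LMP code strings of equal length. The function name `lmpd` is hypothetical, and the example strings in the usage note are made up for illustration.

```python
def lmpd(sm_x, sm_y, alphabet="abcdefg"):
    """Local structural difference between two LMP code strings.

    Symbols are compared by their index in the ordered alphabet (Eq. (1)),
    and the squared differences are summed without a square root.
    """
    if len(sm_x) != len(sm_y):
        raise ValueError("LMP code sequences must have equal length")
    return sum((alphabet.index(a) - alphabet.index(b)) ** 2
               for a, b in zip(sm_x, sm_y))
```

For example, `lmpd("adgg", "bdff")` sums the squared index differences 1, 0, 1, and 1, giving 3.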

Algorithm 1 shows the LMP encoding algorithm. The first line is the initialization of the conversion sequence; Lines 2–4 are the DWT processing on the subsequence $S$ to extract structural information; Lines 5–7 are LMP processing on the subsequence after DWT processing according to Eqs (10) and (11), and finally, the LMP encoding of $S$ can be obtained.

Algorithm 1: LMP encoding

Input: time series subsequence $S$ ; division point $\varepsilon$ .
Output: LMP coding sequence $M^{\prime}$ of subsequence $S$ .
1: Initialize the DWT conversion sequence $S^{\prime}[\textit{length}(S)/2]$ and LMP coding sequence $M^{\prime}[\textit{length}(S)/2]$ of $S$ ;
2: for ( $i=1$ ; $i<=\textit{length}(S)/2$ ; $i++$ ) do
3:     Calculate each point $S^{\prime}[i]$ in $S^{\prime}$ according to Eq. (6) and subsequence $S$ ;
4: end for
5: for ( $i=1$ ; $i<=\textit{length}(S)/2$ ; $i++$ ) do
6:     Calculate each point $M^{\prime}[i]$ in $M^{\prime}$ according to Eqs (10) and (11);
7: end for
8: RETURN $M^{\prime}$
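The LMP encoding algorithm can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: the function name `lmp_encode` is hypothetical, the center value $x_{i}$ is passed explicitly as `center`, and the center index of the trend component is taken to be $\lceil l/4\rceil$ so that $\Delta t$ is never zero.

```python
import math

def lmp_encode(s, center, eps):
    """LMP code string for a subsequence s of even length l.

    s: subsequence centred on the original point x_i
    center: the value x_i
    eps: division point of Eq. (10)
    """
    half = len(s) // 2
    # DWT step, Eq. (6): means of consecutive pairs.
    trend = [(s[2 * k] + s[2 * k + 1]) / 2 for k in range(half)]
    mid = math.ceil(len(s) / 4)  # assumed 1-based center index of the trend
    codes = []
    for j, t in enumerate(trend, start=1):
        dt = abs(j - mid) or 1   # Eq. (11): the center itself uses 1
        slope = (t - center) / dt
        if slope > eps:
            codes.append("g")
        elif slope == eps:
            codes.append("f")
        elif slope > 0:
            codes.append("e")
        elif slope == 0:
            codes.append("d")
        elif slope > -eps:
            codes.append("c")
        elif slope == -eps:
            codes.append("b")
        else:
            codes.append("a")
    return "".join(codes)
```

With the made-up input `lmp_encode([0, 0, 1, 1, 2, 2, 3, 3], center=1.0, eps=0.5)`, the trend component is (0, 1, 2, 3) and the resulting code is "adgg". As with MP, the exact-equality symbols "f" and "b" are rarely produced for real-valued data.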

4.2 Similarity measure algorithm based on LMP

The traditional DTW algorithm usually uses the ED to calculate the distance between time series point pairs. For example, given two time series $P=\left\{p_{1},p_{2},\ldots,p_{m}\right\}$ , $P\in R^{m}$ , and $Q=\left\{q_{1},q_{2},\ldots,q_{n}\right\}$ , $Q\in R^{n}$ , the distance between any two points $p_{i}$ and $q_{j}$ is expressed as $\textit{dist}\left(p_{i},q_{j}\right)=\left|p_{i}-q_{j}\right|$ . Although this method effectively alleviates the timing offset problem in the time series measurement process through dynamic programming, it fails to capture the local structural information of the sequence effectively. To address this problem, we adopt a weighting strategy for the distance calculation of point pairs between sequences. Specifically, as shown in Eq. (13), we use the ED between each point pair and the LMPD of the subsequences centered on the point pair to represent the total distance between the point pair.

$\displaystyle d_{M}\left(p_{i},q_{j}\right)=\left(1-\alpha\right)\cdot\textit{dist}\left(p_{i},q_{j}\right)^{2}+\alpha\cdot\textit{LMPD}\left(p_{i},q_{j}\right)=\left(1-\alpha\right)\cdot\left|p_{i}-q_{j}\right|^{2}+\alpha\cdot\sum\limits_{k=1}^{l/2}\left|\eta_{i,k}-\beta_{j,k}\right|^{2},$ (13)

where $\alpha$ is the weight of the LMPD, ( $1-\alpha$ ) is the weight of the ED, and $d_{M}\left(p_{i},q_{j}\right)$ represents the total distance between the point pair. Afterwards, the distance between the time series can be calculated by the recursion in Eqs (14) and (15). It should be noted that this similarity measure takes into account both the global and the local information of the sequence.
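As a purely illustrative check of Eq. (13), consider hypothetical values that do not come from the experiments: $\alpha=0.06$ , $p_{i}=1.0$ , $q_{j}=0.5$ , and an LMPD of 3. The weighted point distance then works out as follows.

```latex
d_{M}(p_{i},q_{j}) = (1-0.06)\cdot\left|1.0-0.5\right|^{2} + 0.06\cdot 3
                   = 0.94\cdot 0.25 + 0.18
                   = 0.415
```

In this example the LMPD term contributes 0.18 of the total 0.415 even though $\alpha$ is small, because squared symbol-index differences are integers that can be considerably larger than squared differences between standardized values.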

$\displaystyle\gamma\left(i,j\right)=d_{M}\left(p_{i},q_{j}\right)+\textit{min}\left\{\gamma\left(i-1,j-1\right),\gamma\left(i-1,j\right),\gamma\left(i,j-1\right)\right\},$ (14)

$\displaystyle\textit{MPDTW}\left(P,Q\right)=\gamma\left(m,n\right).$ (15)

Algorithm 2 shows the calculation process of MPDTW. Specifically, the first line is the initialization of the distance measurement matrix; Lines 2–10 are the entire process of the similarity measure of sequences $P$ and $Q$ ; Line 11 is the last cumulative distance in the distance matrix, which is also the distance between sequences $P$ and $Q$ .

Algorithm 2: Calculation of MPDTW

Input: time series $P$ and $Q$ ; division point $\varepsilon$ ; subsequence length $l$ ; LMPD weight $\alpha$ .
Output: similarity measurement result MPDTW between sequences $P$ and $Q$ .
1: Initialize the distance measure matrix $\gamma\left[\textit{length}\left(P\right)\right]\left[\textit{length}\left(Q\right)\right]$ ;
2: for ( $i=1$ ; $i<=\textit{length}(P)$ ; $i++$ ) do
3:     for ( $j=1$ ; $j<=\textit{length}(Q)$ ; $j++$ ) do
4:         Extract the subsequences $P_{i}$ , $Q_{j}$ of length $l$ in $P$ , $Q$ with the points $p_{i}$ , $q_{j}$ as the centers;
5:         $M_{i}\leftarrow$ Algorithm 1( $P_{i}$ , $\varepsilon$ , $p_{i}$ );
6:         $M_{j}\leftarrow$ Algorithm 1( $Q_{j}$ , $\varepsilon$ , $q_{j}$ );
7:         Calculate the distance $d_{M}\left(p_{i},q_{j}\right)$ between $p_{i}$ and $q_{j}$ according to Eq. (13);
8:         Calculate the cumulative distance $\gamma\left(i,j\right)$ of the sequences $P$ and $Q$ at the two points $p_{i}$ and $q_{j}$ according to Eq. (14);
9:     end for
10: end for
11: $\textit{MPDTW}\left(P,Q\right)\leftarrow\gamma\left(\textit{length}\left(P\right),\textit{length}\left(Q\right)\right)$ ;
12: RETURN $\textit{MPDTW}\left(P,Q\right)$

The proposed MPDTW considers the local structure information of the time series when calculating the similarity, and its time complexity is $O\left({{n^{2}}+l{n^{2}}}\right)$ . Generally, $l$ is much smaller than $n$ . Thus, the complexity of MPDTW is close to the $O\left({{n^{2}}}\right)$ of the traditional DTW, and the running time does not increase significantly.

5. Experiments

We conduct extensive experiments on 85 univariate time series datasets to evaluate the effectiveness of MPDTW. These datasets are all from the UCR time series database [5] (https://www.timeseriesclassification.com). Each dataset contains two parts: the training set and the test set. The training set is used to train the classification model, and the test set is used to test the classification model. Notably, the datasets differ in three main respects: the length of the time series varies between 24 and 2709; the size of the training set varies between 16 and 8926; and the number of class labels varies from 2 to 60.

5.1 Parameter analysis and selection

The proposed MPDTW includes three parameters: division point $\varepsilon$ , LMPD weight $\alpha$ , and subsequence length $l$ . We used 10 datasets to analyze all the parameters.

First, we tested the impact of division point $\varepsilon$ on the classification accuracy. At the same time, according to our experience, the initial values of $l$ and $\alpha$ are set to 22 and 0.06, respectively, and the test range of $\varepsilon$ is set to [0.015, 0.15]. Figure 5 shows the test results. The purple curve in the figure is the change in the average accuracy of the classifier on the 10 datasets. As seen from the figure, when $\varepsilon$ changes, the accuracy on these datasets (especially WormsTwoClass, Wine, and Lightning7) fluctuates significantly. Specifically, the accuracy of most datasets is relatively stable when $\varepsilon\leqslant$ 0.105, and then the accuracy of some datasets begins to decline. The average accuracy curve also reflects this trend. In view of this, we set the value of the parameter $\varepsilon$ to 0.045, in which the average accuracy reaches the maximum value.

Figure 5.

The influence of local morphology breakpoint $\varepsilon$ on classification accuracy.

Figure 6.

The influence of local morphological pattern weight $\alpha$ on classification accuracy.

The LMPD weight $\alpha$ reflects the proportion of local structural information when using MPDTW to calculate the similarity. Figure 6 shows the effect of different $\alpha$ values on the final classification results when $\varepsilon=$ 0.045 and $l=$ 22, in which the purple curve reflects the trend of the average accuracy. As seen from Fig. 6 ( $\alpha$ - $A$ ) and ( $\alpha$ - $B$ ), the influence of the weight $\alpha$ on the classification results shows two distinct situations according to the different datasets. Note that $\alpha=$ 0 means that only the ED is used to measure the similarity of the time series; $\alpha=$ 1 means that only the LMPD is used to measure the similarity of the time series. For $\alpha$ - $A$ , it can be seen from Fig. 6 (left) that when the weight $\alpha$ changes, the accuracy of this part of the dataset is relatively stable. Specifically, when $\alpha\leqslant$ 0.35, as $\alpha$ increases, the classification accuracy of most datasets increases significantly; when $\alpha>$ 0.35, the accuracy of some datasets decreases to varying degrees; and when $\alpha=$ 0.22, the average accuracy is close to the maximum value. Therefore, we take $\alpha$ - $A=$ 0.22 as a candidate value of $\alpha$ . For $\alpha$ - $B$ , it can be seen in Fig. 6 (right) that the accuracy of this part of the datasets only shows a significant increase before the weight $\alpha=$ 0.02 and then decreases significantly and eventually stabilizes. It shows that for some datasets, a very small number of local morphological pattern components can significantly improve the result of the sequence similarity measure, and as the proportion of morphological pattern components increases, this effect begins to weaken. Considering that the average accuracy is close to the maximum when $\alpha$ - $B=$ 0.02 in Fig. 6 (right), we take $\alpha$ - $B=$ 0.02 as another candidate value of $\alpha$ .

For the subsequence length $l$ , we tested it in two cases of $\alpha$ - $A$ and $\alpha$ - $B$ . The test results are shown in Fig. 7, where $l$ - $A$ shows the effect of $l$ on the classification accuracy when $\alpha=$ 0.22 and $\varepsilon=$ 0.045; $l$ - $B$ shows the effect of $l$ on the classification accuracy when $\alpha=$ 0.02 and $\varepsilon=$ 0.045. For the test range of $l$ , based on the existing experience, we set it as $2\leqslant l\leqslant 50$ . It can be seen from the figures that in the two cases of $\alpha$ - $A$ and $\alpha$ - $B$ , the classification accuracy of these datasets fluctuates significantly with the change of $l$ and the accuracy of most datasets first increases and then decreases with the increase of $l$ . In addition, as shown by the purple curve in the figures, the average accuracy of the datasets in both cases is close to the maximum when $l=$ 22. In view of this, we set $l$ in the MPDTW to 22.

Figure 7.

The effect of the change of subsequence length $l$ on classification accuracy.

In summary, according to the test results, the division point $\varepsilon$ and length $l$ of the subsequence of MPDTW are set to 0.045 and 22, respectively. For $\alpha$ , its two values, 0.22 and 0.02, indicate that different datasets are affected by the morphological pattern to different degrees. In view of this situation, we select the specific value of $\alpha$ through cross-validation in the experiment. Specifically, when classifying different datasets, we first randomly select a small part of the training set to compare the classification results under the two values of $\alpha=$ 0.22 and $\alpha=$ 0.02 and then select the better one as the value of $\alpha$ . Because the amount of data used in cross-validation is much smaller than the total amount of data in the dataset and only two values of one parameter need to be verified, the process does not have a significant impact on the overall efficiency of the classification results.
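The selection of $\alpha$ described above can be sketched as follows. The text does not spell out the exact protocol, so this sketch assumes one plausible reading: a small random part of the training set is held out, each candidate $\alpha$ is scored by 1NN accuracy on that part against the remaining training series, and the better candidate is kept. The names `one_nn_accuracy`, `select_alpha`, `make_dist`, and the 20% holdout fraction are hypothetical.

```python
import random

def one_nn_accuracy(train, valid, dist):
    """train/valid: lists of (series, label); dist(a, b) returns a float."""
    correct = 0
    for series, label in valid:
        nearest = min(train, key=lambda item: dist(series, item[0]))
        correct += nearest[1] == label
    return correct / len(valid)

def select_alpha(train, make_dist, candidates=(0.22, 0.02), frac=0.2, seed=0):
    """Pick the candidate alpha with the best holdout 1NN accuracy.

    make_dist(alpha) must return a distance function for that alpha,
    for example a wrapper around an MPDTW implementation.
    """
    if len(train) < 2:
        raise ValueError("need at least two training series")
    rng = random.Random(seed)
    idx = list(range(len(train)))
    rng.shuffle(idx)
    # Hold out a small part, but always leave at least one reference series.
    k = max(1, min(len(train) - 1, int(frac * len(train))))
    valid = [train[i] for i in idx[:k]]
    rest = [train[i] for i in idx[k:]]
    scores = {a: one_nn_accuracy(rest, valid, make_dist(a)) for a in candidates}
    best = max(candidates, key=lambda a: scores[a])  # ties keep the first candidate
    return best, scores
```

Because only two candidate values are compared on a small holdout, the extra cost is two small 1NN passes per dataset.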

5.2 Classifier performance evaluation

In this section, we use MPDTW in 1NN to perform classification tests on 85 time series datasets in UCR and compare the classification results with popular baselines to analyze the performance of the proposed method. According to different classification strategies, we divided the existing benchmark classifiers into four types to compare with 1NN-based MPDTW. The detailed evaluation results are as follows.

(1) Shape-based classifier: Shape-based classification methods measure and classify time series based on global similarity, such as ED, LCSS, DTW, DDTW, WDTW, CID, TWE, DTD, and shapeDTW.

Table 1
Classification accuracy of models based on DTW

Datasets	DDTW	WDTW	shapeDTW	MPDTW	Datasets	DDTW	WDTW	shapeDTW	MPDTW
Adiac	0.672	0.617	0.731	0.619	MedicalImages	0.668	0.751	0.736	0.761
ArrowHead	0.871	0.811	0.823	0.731	MiddlePhalanxOA	0.591	0.566	0.740	0.506
Beef	0.610	0.524	0.733	0.867	MiddlePhalanxOC	0.781	0.753	0.750	0.715
BeetleFly	0.850	0.804	0.800	0.950	MiddlePhalanxTW	0.507	0.509	0.571	0.487
BirdChicken	0.853	0.831	0.950	0.850	MoteStrain	0.756	0.848	0.890	0.911
Car	0.750	0.720	0.867	0.867	NonInvasiveECG1	0.709	0.817	0.781	0.526
CBF	0.545	0.993	0.920	0.962	NonInvasiveECG2	0.846	0.888	0.860	0.719
ChlorineConcent	0.690	0.648	0.645	0.693	OliveOil	0.784	0.868	0.900	0.900
CinCECGtorso	0.944	0.908	0.651	0.927	OSULeaf	0.865	0.643	0.868	0.847
Coffee	0.982	0.986	0.964	1.000	PhalangesOC	0.780	0.763	0.739	0.730
Computers	0.688	0.687	0.644	0.664	Phoneme	0.255	0.223	0.264	0.283
CricketX	0.617	0.779	0.792	0.769	Plane	0.994	1.000	1.000	1.000
CricketY	0.573	0.750	0.774	0.810	ProximalPhaOA	0.781	0.773	0.790	0.815
CricketZ	0.610	0.784	0.792	0.815	ProximalPhaOC	0.857	0.814	0.794	0.801
DiatomSizeRed	0.920	0.958	0.931	0.941	ProximalPhaTW	0.750	0.731	0.725	0.732
DistalPhalanxOAG	0.699	0.734	0.767	0.691	RefrigerationDev	0.566	0.570	0.493	0.451
DistalPhalanxOC	0.759	0.754	0.772	0.681	ScreenType	0.555	0.465	0.475	0.413
DistalPhalanxTW	0.622	0.619	0.710	0.604	ShapeletSim	0.662	0.682	0.972	0.889
Earthquakes	0.700	0.695	0.742	0.719	ShapesAll	0.856	0.811	0.888	0.818
ECG200	0.855	0.864	0.900	0.820	SmallKitchenApp	0.675	0.679	0.699	0.723
ECG5000	0.926	0.927	0.929	0.936	SonyAIBORobotS1	0.745	0.811	0.807	0.684
ECGFiveDays	0.685	0.824	0.943	0.978	SonyAIBORobotS2	0.852	0.853	0.826	0.858
ElectricDevices	0.773	0.791	0.600	0.619	StarlightCurves	0.964	0.913	0.900	0.959
FaceAll	0.936	0.960	0.762	0.785	Strawberry	0.959	0.954	0.949	0.965
FaceFour	0.732	0.860	0.909	0.966	SwedishLeaf	0.906	0.858	0.915	0.814
FacesUCR	0.873	0.923	0.919	0.948	Symbols	0.935	0.942	0.961	0.984
FiftyWords	0.763	0.765	0.758	0.822	SyntheticControl	0.563	0.989	0.847	0.967
Fish	0.910	0.817	0.949	0.966	ToeSegmentation1	0.702	0.728	0.899	0.833
FordA	0.732	0.677	0.721	0.558	ToeSegmentation2	0.804	0.862	0.862	0.915
FordB	0.717	0.663	0.739	0.559	Trace	0.995	1.000	1.000	1.000
GunPoint	0.973	0.956	0.993	0.980	TwoLeadECG	0.940	0.910	0.994	0.984
Ham	0.691	0.747	0.543	0.619	TwoPatterns	0.997	1.000	0.999	1.000
HandOutlines	0.853	0.855	0.794	0.768	UWaveGestureLAll	0.923	0.961	0.942	0.965
Haptics	0.323	0.406	0.377	0.429	UWaveGestureLX	0.740	0.775	0.737	0.735
Herring	0.505	0.550	0.500	0.516	UWaveGestureLY	0.659	0.687	0.642	0.718
InlineSkate	0.453	0.404	0.384	0.424	UWaveGestureLZ	0.673	0.684	0.662	0.711
InsectWingbeatS	0.531	0.553	0.416	0.504	Wafer	0.995	0.996	0.990	0.995
ItalyPowerDemand	0.956	0.934	0.897	0.948	Wine	0.850	0.885	0.463	0.759
LargeKitchenApp	0.768	0.795	0.840	0.864	WordSynonyms	0.728	0.731	0.740	0.765
Lightning2	0.691	0.837	0.885	0.869	Worms	0.638	0.579	0.525	0.688
Lightning7	0.616	0.754	0.767	0.781	WormsTwoClass	0.710	0.677	0.713	0.831
Mallat	0.933	0.945	0.938	0.875	Yoga	0.835	0.858	0.883	0.878
Meat	0.821	0.971	0.900	0.783	Best	12	21	24	35

Essentially, the proposed MPDTW is based on the DTW framework, so here, we first compare the classification accuracy of MPDTW with DDTW, WDTW, and shapeDTW, which are also based on the DTW measure framework. The results are shown in attached Table 1 (considering the space, the results of DTW are not listed in the table). The accuracy rate in bold in the table indicates that the accuracy rate of the corresponding model is the best among the four models. The statistical results of the data in Table 1 show that DDTW, WDTW, shapeDTW, and MPDTW have the best classification accuracy on 12, 21, 24, and 35 datasets, respectively, and the optimal number of MPDTW is significantly more than other classification models. In particular, compared with the other three models, the accuracy of MPDTW is improved by more than 10% on 16, 10, and 6 datasets.

Figure 8.

Comparison of shape-based classifiers.

Figure 9.

Critical difference graph of shape-based classifiers.

To further analyze the performance of MPDTW, we draw scatter plots comparing MPDTW with other shape-based methods (as shown in Fig. 8, each red point in the figures represents a dataset, and the points below the diagonal line indicate that the accuracy of MPDTW is higher). Figure 8 shows that, compared with the traditional shape-based methods, the proposed MPDTW wins on more datasets than it loses against ED (57/1/27), LCSS (48/1/36), DTW (55/3/27), DDTW (47/1/37), WDTW (48/3/34), CID (44/2/39), TWE (46/1/38), DTD-C (46/2/37), and shapeDTW (48/4/33). The numbers in parentheses indicate the numbers of datasets on which MPDTW wins, ties, and loses, in that order. For example, “(55/3/27)” means that the proposed MPDTW has a higher accuracy on 55 datasets, the same accuracy as the compared method on 3 datasets, and a lower accuracy on 27 datasets. In addition, we use the critical difference graph proposed by Demšar to score these classification methods [34]. The scoring results are shown in Fig. 9. MPDTW achieves the best overall average rank of 4.6941, which is lower than those of DTD, CID, shapeDTW, and the other methods.
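The win/tie/loss counts and the average ranks behind a critical difference graph can be computed as in the sketch below. It uses four rows of Table 1 (Beef, Coffee, Plane, Adiac) purely as example input, and it covers only the rank-averaging step; the Friedman and post-hoc tests from [34] are not included. The function names are hypothetical.

```python
def win_tie_loss(ours, other):
    """Count datasets where `ours` is higher, equal, or lower than `other`."""
    wins = sum(a > b for a, b in zip(ours, other))
    ties = sum(a == b for a, b in zip(ours, other))
    return wins, ties, len(ours) - wins - ties

def average_ranks(acc_by_method):
    """acc_by_method: dict name -> accuracies, all in the same dataset order.

    Rank 1 is the most accurate method; tied methods share the mean rank.
    """
    names = list(acc_by_method)
    n_datasets = len(next(iter(acc_by_method.values())))
    totals = dict.fromkeys(names, 0.0)
    for d in range(n_datasets):
        scores = [acc_by_method[m][d] for m in names]
        for m, s in zip(names, scores):
            higher = sum(t > s for t in scores)
            equal = sum(t == s for t in scores)  # includes the method itself
            totals[m] += higher + (equal + 1) / 2
    return {m: totals[m] / n_datasets for m in names}

# Rows taken from Table 1, in the order Beef, Coffee, Plane, Adiac.
table = {
    "DDTW":     [0.610, 0.982, 0.994, 0.672],
    "WDTW":     [0.524, 0.986, 1.000, 0.617],
    "shapeDTW": [0.733, 0.964, 1.000, 0.731],
    "MPDTW":    [0.867, 1.000, 1.000, 0.619],
}
print(win_tie_loss(table["MPDTW"], table["shapeDTW"]))  # (2, 1, 1)
print(average_ranks(table))
```

On these four rows MPDTW obtains an average rank of 1.75 and shapeDTW 2.25; a lower average rank is better.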

(2) Structure-based classifiers: Structure-based classifiers usually first extract discriminative subsequences from the sequence, then perform symbolization or other conversions on the subsequences, and finally perform similarity measurements in the new feature space after the conversion. Typical structure-based classifiers include FS, BOP, SAXVSM, LS, DTW-F and WEASEL [35]. We also compare the classification performance of these classifiers with the proposed MPDTW through the critical difference graph. The comparison result is shown in Fig. 10. Except for the significant difference from WEASEL, MPDTW has no significant difference compared with the classic models of DTW-F and LS and is significantly better than FS, BOP and SAXVSM.

Figure 10.

Critical difference graph of structure-based classifiers.

(3) Ensemble classifier: The ensemble classifier can obtain good classification performance by integrating a series of single classifiers. Currently, the popular ensemble classifiers in time series classification tasks include ACF, PS, LPS, TSF, TSBF, BOSS, EE, COTE and ST. It can be seen from Fig. 11 that although it is slightly weaker than ST, BOSS and COTE, the proposed MPDTW has no significant difference compared with the ensemble classifiers such as EE, TSF, and TSBF. In addition, as a single classifier, MPDTW can be integrated into these ensemble classifiers to improve their performance.

Figure 11.

Critical difference graph of ensemble classifiers.

(4) Deep learning-based classifiers: In addition to the aforementioned traditional classifiers, in recent years, researchers have also proposed many deep learning-based classifiers, such as multilayer perceptron (MLP) [36], MCNN [37], t-LeNet [38], MCDCNN [39], CNN [40], TWIESN [41], ResNet, FCN and Encoder. Figure 12 shows the critical difference graph comparing MPDTW with these classification models, from which we can see that, apart from being slightly weaker than ResNet, MPDTW has no significant difference from FCN and Encoder and is significantly better than the others. More importantly, the proposed MPDTW has better interpretability, an advantage that classification models based on deep learning do not have.

Figure 12.

Critical difference graph of deep learning-based classifiers.

5.3 Running time

The previous comparison of classification accuracy shows that MPDTW is better than most classifiers. Although its accuracy is lower than that of a few individual classifiers, this does not mean that MPDTW is inferior to them in every respect. To show this, we compared the running time of MPDTW with that of the classifiers that rank above it in the critical difference graphs. Classifiers based on deep learning are not tested here because they require a large number of hyperparameters to be tuned. The test is performed on 10 datasets. For objectivity, we take the average of 10 runs of each classifier on each dataset. The test result is shown in Fig. 13, from which we can see that, compared with ST and COTE, MPDTW has at least an order of magnitude advantage in running time, and MPDTW also runs faster than BOSS on most datasets.

Figure 13.

The average running time of BOSS, COTE, WEASEL, ST, and MPDTW on 10 datasets.

Figure 14.

The effect of Gaussian noise on classifier performance.

5.4 Noise immunity

Besides the strong competitiveness in terms of accuracy and efficiency, our method can effectively reduce data noise due to the DWT method adopted in the similarity measurement process. For verification, we randomly selected three datasets Symbols, WordSynonyms, and BeetleFly and added Gaussian noise to them. Then, we tested the noise resistance of MPDTW with the noise-added datasets. Specifically, we compare MPDTW with current representative baselines, including shapeDTW, WEASEL, and ResNet, which are all competitive models in different types of single classifiers. The comparison results are shown in Fig. 14, where SNR is the signal-to-noise ratio, the larger the SNR, the smaller the noise. In particular, when the value of the x-axis is “Original”, it indicates the classification results without adding Gaussian noise. As can be seen from the figures, after adding different degrees of noise to the data, the classification accuracy of MPDTW on the three datasets is generally better than other methods, which indicates that MPDTW is more capable of dealing with noise.
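The noise-injection step can be sketched as below. The text does not state how the SNR is defined, so this sketch assumes the common convention that the SNR is given in decibels and that the signal power is the mean squared value of the series; `add_gaussian_noise` is a hypothetical helper name.

```python
import math
import random

def add_gaussian_noise(series, snr_db, seed=None):
    """Return a noisy copy of `series` at a target SNR (assumed to be in dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in series) / len(series)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in series]
```

A larger `snr_db` gives a smaller noise standard deviation, which matches the reading of the x-axis in Fig. 14.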

5.5 Case study

The BeetleFly [28] dataset in UCR classifies insects by image contours. The time series in the dataset contains many peak and valley shapes, which enriches its local structure information. We tested the alignment capabilities of DTW, shapeDTW, and MPDTW on this dataset. The test result is shown in Fig. 15. In the part enclosed by the green box in the figures, we can see that there are obvious alignment errors in the alignment of DTW and shapeDTW, while MPDTW achieves a more accurate alignment, which shows that the classification result based on MPDTW is more interpretable.

In addition, we also show the superiority of the proposed MPDTW through the t-SNE diagram [42]. Detailed information on the dataset used is shown in Table 2. The three datasets contain 2, 4, and 6 time series types. It can be seen from Fig. 16 that MPDTW has the best distinguishing ability for all datasets among the three methods. Although the performance of shapeDTW on the dataset FaceFour is similar to MPDTW, there is still a small amount of error distribution in shapeDTW. In contrast, MPDTW distinguishes these time series completely and accurately.

Table 2
Details of datasets

Datasets	Type	Length	Total	Train	Test	Classes
FaceFour	IMAGE	350	56	28	28	2
Fish	IMAGE	463	112	24	88	4
Coffee	SPECTRO	286	350	175	175	7

Figure 15.

Comparison of alignment effects of BeetleFly.

Figure 16.

Comparison of measurement results between MPDTW, DTW and shapeDTW.

6. Conclusion

In this paper, we propose LMP, a method for encoding the local structure information of time series. It uses the ideas of MP and SAX to discretize and symbolically represent the local structure information, while the DWT is used to extract the local trend information and to alleviate noise. Based on LMP, we propose the time series similarity measurement method MPDTW, which weights the original numerical information and the local structure information of the time series. MPDTW takes both kinds of information into account in the similarity measurement and, to a large extent, alleviates the limitation of traditional methods that cannot consider the two factors together. Experimental results show that MPDTW performs well in time series classification tasks. In addition, the case study results show that the classification model based on this method has good interpretability.

Acknowledgments

This work is supported by the National Key R&D Program of China (No. 2022YFE0200400), the Beijing Natural Science Foundation (Nos. 4214067, 4182052), the National Natural Science Foundation of China (No. 61702030), and the Fundamental Research Funds for the Central Universities (2022JBMC011).

References

1. T.L. Nguyen, Gsponer and Ifrim, Time series classification by sequence learning in all-subsequence space, in: Proceedings of the IEEE 33rd International Conference on Data Engineering, 2017, pp. 947–958.

2. Brill, Fluschnik, Froese, Jain, Niedermeier and Schultz, Exact mean computation in dynamic time warping spaces, Data Mining and Knowledge Discovery 33(1) (2019), 252–291.

3. M.G. Baydogan and Runger, Time series representation and similarity based on local autopatterns, Data Mining and Knowledge Discovery 30(2) (2016), 476–509.

4. Faouzi and Janati, pyts: A python package for time series classification, Journal of Machine Learning Research 21(46) (2020), 1–6.

5. Bagnall, Lines, Bostrom, Large and Keogh, The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Mining and Knowledge Discovery 31(3) (2017), 606–660.

6. Ismail Fawaz, Forestier, Weber, Idoumghar and P.-A. Muller, Deep learning for time series classification: a review, Data Mining and Knowledge Discovery 33(4) (2019), 917–963.

7. Yuan, Douzal-Chouakria, Varasteh Yazdi and Wang, A large margin time series nearest neighbour classification under locally weighted time warps, Knowledge and Information Systems 59(1) (2019), 117–135.

8. Yuan, Lin, Zhang and Wang, Locally slope-based dynamic time warping for time series classification, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1713–1722.

9. Kurbalija, Radovanović, Geler and Ivanović, The influence of global constraints on similarity measures for time-series databases, Knowledge-Based Systems 56 (2014), 49–67.

10. Fang, Wang and Wang, Efficient learning interpretable shapelets for accurate time series classification, in: Proceedings of the IEEE 34th International Conference on Data Engineering, IEEE, 2018, pp. 497–508.

11. Yin, Wang, Zheng, Yang and Xu, A new time series similarity measurement method based on the morphological pattern and symbolic aggregate approximation, IEEE Access 7 (2019), 109751–109762.

12. C.A. Ratanamahatana and Keogh, Three myths about dynamic time warping data mining, in: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM, 2005, pp. 506–510.

13. D.J. Berndt and Clifford, Using dynamic time warping to find patterns in time series, in: KDD Workshop, Vol. 10, Seattle, WA, USA, 1994, pp. 359–370.

14. Senin and Malinchik, SAX-VSM: Interpretable time series classification using SAX and vector space model, in: Proceedings of the IEEE 13th International Conference on Data Mining, IEEE, 2013, pp. 1175–1180.

15. Lin, Khade and Li, Rotation-invariant similarity in time series using bag-of-patterns representation, Journal of Intelligent Information Systems 39(2) (2012), 287–315.

16. Wang, Liu, M.F. She, Nahavandi and Kouzani, Bag-of-words representation for biomedical time series classification, Biomedical Signal Processing and Control 8(6) (2013), 634–644.

17. E.J. Keogh and M.J. Pazzani, Derivative dynamic time warping, in: Proceedings of the 2001 SIAM International Conference on Data Mining, SIAM, 2001, pp. 1–11.

18. Y.-S. Jeong, M.K. Jeong and O.A. Omitaomu, Weighted dynamic time warping for time series classification, Pattern Recognition 44(9) (2011), 2231–2240.

19. G.E. Batista, E.J. Keogh, O.M. Tataw and de Souza, CID: an efficient complexity-invariant distance for time series, Data Mining and Knowledge Discovery 28(3) (2014), 634–669.

20. P.-F. Marteau, Time warp edit distance with stiffness adjustment for time series matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2) (2008), 306–318.

21. Górecki and Łuczak, Non-isometric transforms in time series classification using DTW, Knowledge-Based Systems 61 (2014), 98–108.

22. Zhao and Itti, shapeDTW: Shape dynamic time warping, Pattern Recognition 74 (2018), 171–184.

23. Deng, Runger, Tuv and Vladimir, A time series forest for classification and feature extraction, Information Sciences 239 (2013), 142–153.

24. M.G. Baydogan, Runger and Tuv, A bag-of-features framework to classify time series, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(11) (2013), 2796–2802.

25. R.J. Kate, Using dynamic time warping distances as features for improved time series classification, Data Mining and Knowledge Discovery 30(2) (2016), 283–312.

26. Rakthanmanon and Keogh, Fast shapelets: A scalable algorithm for discovering time series shapelets, in: Proceedings of the 2013 SIAM International Conference on Data Mining, SIAM, 2013, pp. 668–676.

27. Ye and Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Mining and Knowledge Discovery 22(1–2) (2011), 149–182.

28. Hills, Lines, Baranauskas, Mapp and Bagnall, Classification of time series by shapelet transformation, Data Mining and Knowledge Discovery 28(4) (2014), 851–881.

29. Grabocka, Schilling, Wistuba and Schmidt-Thieme, Learning time-series shapelets, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 392–401.

30. Yuan, Shi, Wang, Liu and Li, Random pairwise shapelets forest: an effective classifier for time series, Knowledge and Information Systems, 2022, pp. 1–32.

31. Lin, Keogh, Lonardi and Chiu, A symbolic representation of time series, with implications for streaming algorithms, in: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 2–11.

32. Nigam, Singh and Misra, Efficient facial expression recognition using histogram of oriented gradients in wavelet domain, Multimedia Tools and Applications 77(21) (2018), 28725–28747.

33. Dastourian, Discrete wavelet transforms of Haar’s wavelet, International Journal of Scientific & Technology Research 3(9) (2014), 247–251.

34. Demšar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006), 1–30.

35. Schäfer and Leser, Fast and accurate time series classification with WEASEL, in: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017, pp. 637–646.

36. Wang, Yan and Oates, Time series classification from scratch with deep neural networks: A strong baseline, in: Proceedings of the 2017 International Joint Conference on Neural Networks, IEEE, 2017, pp. 1578–1585.

37. Cui, Chen and Chen, Multi-scale convolutional neural networks for time series classification, arXiv preprint arXiv:1603.06995 (2016).

38. Le Guennec, Malinowski and Tavenard, Data augmentation for time series classification using convolutional neural networks, in: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.

39. Zheng, Liu, Chen and Leon, Exploiting multi-channels deep convolutional neural networks for multivariate time series classification, Frontiers of Computer Science 10 (2016), 96–112.

40. Zhao, Chen, Liu and Wu, Convolutional neural networks for time series classification, Journal of Systems Engineering and Electronics 28(1) (2017), 162–169.

41. Tanisaro and Heidemann, Time series classification using time warping invariant echo state networks, in: Proceedings of the 15th IEEE International Conference on Machine Learning and Applications, IEEE, 2016, pp. 831–836.

42. Van der Maaten and Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9(11) (2008).
