Univariate and multivariate time series classification with parametric integral dynamic time warping

Abstract

The dynamic time warping (DTW) distance measure is a popular and efficient distance measure used in time series classification algorithms. It is frequently applied together with various transformations of the input data. In this paper we propose a combination of the DTW distance measure with a (discrete) integral transformation: the new distance measure IDTW is simply the value of DTW computed on the integrated input time series. Used on its own, however, this distance does not give good classification results. We therefore propose a parametric integral dynamic time warping distance measure ID_DTW, which is a parametric combination of the distances DTW and IDTW. The combined distance is used in the nearest neighbor (1NN) classification method for both univariate and multivariate time series. Computational experiments performed on both one-dimensional and multidimensional datasets show that this approach reduces the classification error significantly in comparison with the component methods. The parametric approach allows the new distance to be adapted to each dataset while showing no significant overfitting effects. The contribution and main motivation of the paper is to show that a transformation as simple as the integral transform carries useful information about the examined time series and can be used to significantly improve classification performance for both univariate and multivariate time series data. The results are confirmed by graphical and statistical comparisons.

Keywords

Time series classification univariate and multivariate time series data dynamic time warping parametric integral distance measure

1 Introduction

Dynamic Time Warping (DTW) is a popular distance measure used in data mining [4]. The Nearest Neighbor (NN) method with DTW is frequently used in classification and clustering of time series data [1, 32]. It can be applied to both univariate and multivariate analysis of one-dimensional and multidimensional time series [28, 31]. The results of classification using the pair NN and DTW are usually very good and hard to surpass with other distance measures [13]. The DTW distance is often used together with various transformations of the input data. For example, computing DTW on the derivative of the raw data gives the Derivative Dynamic Time Warping (DDTW) distance measure [20]. However, DTW on the transformed data rarely yields a universal distance measure that can compete with DTW on a large number of datasets. Therefore, in order to use the information in both the raw and the transformed data, combined methods are used: we can create a distance measure which is a combination of DTW on raw and on transformed time series data. Examples of such combined distances can be found in the literature: a combination of raw and derivative data with the discrete derivative of the first degree for both one-dimensional and multidimensional time series [14, 17], a combination with a sine/cosine transform [16], and others [30].

In this paper we examine an integral transformation of time series data. We introduce the Integral Dynamic Time Warping (IDTW) distance measure, whose value is computed as the DTW on the integrated input data. This distance measure can be defined for both one-dimensional and multidimensional time series. We then introduce a parametric combination of DTW and IDTW, called the Parametric Integral Dynamic Time Warping (ID_DTW) distance measure. It is a parametric convex combination of the DTW distance measure on the raw and on the integrated data. The share of each distance component is controlled by a single real-valued parameter. Such a parametric distance can be used for both univariate and multivariate time series data. The ID_DTW distance will be used in the classification process with the one nearest neighbor (1NN) method. The parameter of the combination is computed in the learning phase on the training subset by leave-one-out cross-validation.

We perform computational experiments on two large benchmark databases, one for one-dimensional and one for multidimensional time series. The classification error rates obtained on these datasets clearly show that the proposed combined distance ID_DTW gives better results than the component distances DTW and IDTW alone. The results are presented by graphical comparison and confirmed statistically. Since the parameter of ID_DTW is located outside the distance components, the computation time in the learning phase (cross-validation) can be reduced significantly. We can also easily obtain lower bounds for the combined distance, which can be used to further reduce the computational cost of the method.

The contribution and main motivation of the paper is to show that a transformation as simple as the integral transform carries useful information about the examined time series and can be used to significantly improve classification performance. In addition, we show that only a parametric approach can combine the information from raw and transformed data for both univariate and multivariate time series data sets.

The presented method can be used in classification of time series originating from all fields, including price series, complementing other price prediction models [8–10]. It also seems that it may be used in future research with granular computing techniques [2, 33] to solve univariate and multivariate time series classification problems.

The remainder of the paper is organized as follows. We first (Section 2) review the concept of time series and the dynamic time warping algorithm for univariate and multivariate time series data. In the same section we introduce our parametric distance based on integral transformation, and explain the optimization process and properties of the new distance measure. In Section 3 the benchmark datasets used in the empirical comparison of methods are described, and we explain the experimental setup. Later in that section we present the results of our experiments on the described time series, as well as statistical analysis of the examined methods. We conclude in Section 4 with discussion of possible future extensions of the work.

2 Parametric integral dynamic time warping (ID_DTW)

A one-dimensional (univariate) time series is a sequence of observations aligned in time or space [6]. In this paper we shall assume that time series are discrete, i.e. they are finite sequences of real numbers: $x = \{x (i) \in ℝ : i = 1, 2, \dots, n\},$ where $n \in ℕ$ is the length of the time series.

Multidimensional (multivariate) time series are defined as finite sequences of one-dimensional time series: $X = (x_{1}, x_{2}, \dots, x_{m}),$ where $m \in ℕ$ is the dimensionality of the multi-series X, i.e. the number of variables of the multi-series. In this paper we shall assume that all time series (dimensions, variables) of a multi-series have the same length n for all elements of the dataset.

2.1 Dynamic time warping

The dynamic time warping distance measure (DTW) is a popular distance measure used to calculate the similarity/dissimilarity of time series [4]. To calculate the DTW for two one-dimensional time series of length $n \in ℕ$ $\begin{matrix} x & = & \{x (i) \in ℝ : i = 1, 2, \dots, n\}, \\ y & = & \{y (i) \in ℝ : i = 1, 2, \dots, n\} \end{matrix}$ we proceed as follows. We define a local cost function d, a real-valued function of two real variables which computes a distance between a point of the time series x and a point of the time series y. For the standard DTW it is usually defined as: $d (x (i), y (j)) = (x (i) - y (j))^{2} .$ (1)

Then we construct a square matrix D with dimensionality n × n consisting of the local cost function values D (i, j) = d (x (i) , y (j)). The matrix element D (i, j) corresponds to the alignment between values x (i) and y (j) of the time series. Then we create a warping path W = {w₁, w₂, …, w_K} ( $K \in ℕ$ ) with elements of the matrix D. In standard DTW the warping path is required to satisfy three conditions:

w₁ = D (1, 1) and w_K = D (n, n) (boundary conditions);

if w_k = D (i_k, j_k) and w_{k+1} = D (i_{k+1}, j_{k+1}) then i_{k+1} - i_k ≤ 1 and j_{k+1} - j_k ≤ 1 (continuity);

i_{k+1} - i_k ≥ 0 and j_{k+1} - j_k ≥ 0 (monotonicity).
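The three conditions can be checked mechanically. The following is a minimal Python sketch, not part of the original method description; the function name `is_valid_warping_path` and the representation of a path as a list of 1-based index pairs (i_k, j_k) are assumptions made for illustration. The additional rule that a path never repeats a cell is also an assumption, since the three conditions as stated do not exclude it explicitly.

```python
def is_valid_warping_path(path, n):
    """Check a warping path given as 1-based (i, j) index pairs into the n x n matrix D."""
    # Boundary conditions: start at D(1, 1) and end at D(n, n).
    if not path or path[0] != (1, 1) or path[-1] != (n, n):
        return False
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        # Monotonicity: indices never decrease.
        if di < 0 or dj < 0:
            return False
        # Continuity: each index advances by at most one.
        if di > 1 or dj > 1:
            return False
        # Assumption (not stated in the text): a step must move to a new cell.
        if di == 0 and dj == 0:
            return False
    return True


diagonal = [(1, 1), (2, 2), (3, 3)]
print(is_valid_warping_path(diagonal, 3))
```

A path that skips a row or column, such as [(1, 1), (3, 3)], violates continuity and is rejected.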

To get a warping path we start at the element D (1, 1) and, moving forward by at most one index in each direction at every step, we finish at the element D (n, n) (Fig. 1). The path which minimizes the warping cost gives the value of the DTW distance measure: $DTW (x, y) = \min_{W} \left\{ \sum_{k = 1}^{K} w_{k} \right\} .$ (2)

Fig.1

Time series alignment and the corresponding warping path.

Sometimes the DTW is defined as the square root of (2).

In practice, we calculate the value of DTW by building a cumulative distance matrix Γ by dynamic programming with the following recursion: $Γ (i, j) = D (i, j) + \min \{Γ (i - 1, j - 1), Γ (i - 1, j), Γ (i, j - 1)\}, \quad i, j = 1, 2, \dots, n,$ with initial conditions: $\begin{matrix} Γ (0, 0) = 0, \\ Γ (0, j) = \infty \quad (j = 1, 2, \dots, n), \\ Γ (i, 0) = \infty \quad (i = 1, 2, \dots, n) . \end{matrix}$

The value of DTW is found at position (n, n) of the matrix Γ: $\begin{matrix} DTW (x, y) & = & Γ (n, n) \\ (DTW (x, y) & = & \sqrt{Γ (n, n)}) . \end{matrix}$
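The dynamic programming scheme can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used in the experiments: the function name `dtw` and the `take_sqrt` flag are hypothetical, the local cost is the squared difference (1), and the border of Γ is initialized in the standard way (Γ(0, 0) = 0, all other border cells infinite).

```python
import math


def dtw(x, y, take_sqrt=False):
    """DTW distance between two sequences with squared-difference local cost."""
    n, m = len(x), len(y)
    # gamma has one extra row and column holding the initial conditions.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # local cost D(i, j)
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # step in i only
                gamma[i][j - 1],      # step in j only
            )
    total = gamma[n][m]
    # Optional variant: DTW defined as the square root of the accumulated cost.
    return math.sqrt(total) if take_sqrt else total


# Two aligned cells, each with local cost 1.
print(dtw([0.0, 0.0], [1.0, 1.0]))  # 2.0
```

The sketch needs O(n²) time and memory; keeping only two rows of Γ would reduce the memory to O(n) without changing the result.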

The DTW distance measure does not satisfy the triangle inequality, hence it is not a metric. However, it holds that DTW (x, x) = 0 and DTW (x, y) = DTW (y, x) for the cost function (1).

2.2 Integral dynamic time warping

The distance measure computed as DTW on integrated input time series (the discrete analogue of the indefinite integral of a function) will be called the Integral Dynamic Time Warping distance measure (IDTW): $IDTW (x, y) = DTW (I (x), I (y)),$ (3) where I (x) is the integral of the time series x. In the case of discrete time series, integration is simply the cumulative sum. For z = I (x): $\begin{matrix} z (1) & = & x (1), \\ z (i) & = & z (i - 1) + x (i), \quad i = 2, 3, \dots, n \end{matrix}$ or $z (i) = \sum_{j = 1}^{i} x (j), \quad i = 1, 2, \dots, n .$ (4)

The metric conditions for IDTW are the same as for DTW.
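As a minimal sketch of definitions (3) and (4), the discrete integral can be computed with a cumulative sum and passed to any DTW routine. The helper names `integrate` and `idtw` are hypothetical, and the compact `dtw` function below is only an illustrative implementation with the squared-difference cost (1).

```python
import math
from itertools import accumulate


def dtw(x, y):
    """DTW with squared-difference local cost (illustrative implementation)."""
    n, m = len(x), len(y)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][m]


def integrate(x):
    """Discrete integral (4): the cumulative sum of the series."""
    return list(accumulate(x))


def idtw(x, y):
    """IDTW (3): DTW computed on the integrated series."""
    return dtw(integrate(x), integrate(y))


print(integrate([1.0, 2.0, 3.0]))  # [1.0, 3.0, 6.0]
print(idtw([1.0, 1.0], [1.0, 2.0]))
```

Because integration accumulates values, a small constant offset between two series grows linearly in the integrated data, which is one reason the series are normalized before the distances are combined.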

2.3 Multivariate dynamic time warping

With the assumption that the univariate time series of all dimensions of a multivariate time series have the same length we can view a multi-series as a one-dimensional trajectory in an m-dimensional Euclidean space:

$X = \{X (i) = (x_{1} (i), x_{2} (i), \dots, x_{m} (i)) \in ℝ^{m} : i = 1, 2, \dots, n\} .$ (5)

Then we can define the DTW distance measure between two multi-series X and Y [17] in the same way as for the one-dimensional case, but with a local cost function d defined as

$d (X (i), Y (j)) = \sum_{k = 1}^{k = m} (x_{k} (i) - y_{k} (j))^{2},$ (6) i.e. as the squared Euclidean distance of two m-dimensional vectors formed by the values along the dimensions of the multi-series at positions i and j (Fig. 2).

Fig.2

Multivariate time series alignment and the cost function d.

Similarly to the one-dimensional case, we define the Integral Dynamic Time Warping distance measure as DTW computed on the integrated multi-series X = (x₁, x₂, …, x_m), i.e. we integrate each component one-dimensional time series (dimension, variable) separately by formula (4): $\begin{matrix} I (X) = (I (x_{1}), I (x_{2}), \dots, I (x_{m})), \end{matrix}$

$IDTW (X, Y) = DTW (I (X), I (Y)) .$ (7) The multidimensional IDTW has properties similar to those of the multidimensional DTW.
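The multivariate definitions (5)-(7) differ from the univariate ones only in the local cost. A minimal sketch follows, assuming that a multi-series is stored as a list of m univariate lists of equal length (one list per variable); the names `dtw_multi` and `idtw_multi` are hypothetical.

```python
import math
from itertools import accumulate


def dtw_multi(X, Y):
    """Multivariate DTW with the squared Euclidean local cost (6)."""
    m = len(X)                       # number of variables
    n, n_y = len(X[0]), len(Y[0])    # series lengths
    gamma = [[math.inf] * (n_y + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n_y + 1):
            # Squared Euclidean distance between the m-dimensional points X(i) and Y(j).
            cost = sum((X[k][i - 1] - Y[k][j - 1]) ** 2 for k in range(m))
            gamma[i][j] = cost + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][n_y]


def idtw_multi(X, Y):
    """Multivariate IDTW (7): every variable is integrated separately by (4)."""
    IX = [list(accumulate(x)) for x in X]
    IY = [list(accumulate(y)) for y in Y]
    return dtw_multi(IX, IY)


X = [[0.0, 0.0], [0.0, 0.0]]   # two variables, length 2
Y = [[1.0, 1.0], [2.0, 2.0]]
print(dtw_multi(X, Y), idtw_multi(X, Y))
```

A single warping path is shared by all variables here, which corresponds to treating the multi-series as one trajectory in m-dimensional space.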

Table 1

Characteristics of the one-dimensional datasets used in the experiments (#cl – number of classes, #el – number of elements (size), len – time series length)

dataset	#cl	#el	len	dataset	#cl	#el	len
50Words	50	905	270	MedicalImages	10	1141	99
Adiac	37	781	176	MiddlePhalanxOutlineAgeGroup	3	554	80
ArrowHead	3	211	251	MiddlePhalanxOutlineCorrect	2	891	80
Beef	5	60	470	MiddlePhalanxTW	6	553	80
BeetleFly	2	40	512	MoteStrain	2	1272	84
BirdChicken	2	40	512	NonInvasiveFatalECG Thorax1	42	3765	750
Car	4	120	577	NonInvasiveFatalECG Thorax2	42	3765	750
CBF	3	930	128	OliveOil	4	60	570
ChlorineConcentration	3	4307	166	OSU Leaf	6	442	427
CinC ECG torso	4	1420	1639	PhalangesOutlinesCorrect	2	2658	80
Coffee	2	56	286	Phoneme	39	2110	1024
Computers	2	500	720	ProximalPhalanxOutlineAgeGroup	3	605	80
Cricket X	12	780	300	ProximalPhalanxOutlineCorrect	2	891	80
Cricket Y	12	780	300	ProximalPhalanxTW	6	605	80
Cricket Z	12	780	300	RefrigerationDevices	3	750	720
DiatomSizeReduction	4	322	345	ScreenType	3	750	720
DistalPhalanxOutlineAgeGroup	3	539	80	ShapeletSim	2	200	500
DistalPhalanxOutlineCorrect	2	876	80	ShapesAll	60	1200	512
DistalPhalanxTW	6	539	80	SmallKitchenAppliances	3	750	720
Earthquakes	2	461	512	SonyAIBORobot Surface	2	621	70
ECG200	2	200	96	SonyAIBORobot SurfaceII	2	980	65
ECG5000	5	5000	140	StarLightCurves	3	9236	1024
ECGFiveDays	2	884	136	Strawberry	2	983	235
ElectricDevices	7	16637	96	Swedish Leaf	15	1125	128
FaceAll	14	2250	131	Symbols	6	1020	398
FaceFour	4	112	350	Synthetic Control	6	600	60
Fish	7	350	463	ToeSegmentation1	2	268	277
FordA	3	4921	500	ToeSegmentation2	2	166	343
FordB	3	4446	500	Trace	4	200	275
Gun Point	2	200	150	Two Patterns	4	5000	128
Ham	2	214	431	TwoLeadECG	2	1162	82
HandOutlines	2	1370	2709	uWaveGestureLibrary X	8	4478	315
Haptics	5	463	1092	uWaveGestureLibrary Y	8	4478	315
Herring	2	128	512	uWaveGestureLibrary Z	8	4478	315
InlineSkate	7	650	1882	UWaveGestureLibraryAll	8	4478	945
InsectWingbeatSound	11	2200	256	Wafer	2	7174	152
ItalyPowerDemand	2	1096	24	Wine	2	111	234
LargeKitchenAppliances	3	750	720	WordsSynonyms	25	905	270
Lightning-2	2	121	637	Worms	5	258	900
Lightning-7	7	143	319	WormsTwoClass	2	258	900
MALLAT	8	2400	1024	Yoga	2	3300	426
Meat	3	120	448

2.4 Combining raw and integrated data

Before combining into one distance measure, both raw and integrated data are normalized. The normalized time series will be a new time series denoted N (x): $N (x) = \text{z-norm} (x),$ where z-norm denotes z-normalization by: $\text{z-norm} (x) = \frac{x - μ (x)}{σ (x)},$ where μ (x) is the mean of the (univariate) time series x, and σ (x) is the standard deviation of x.

The normalized multivariate time series will be the series for which every dimension is normalized separately: $N (X) = (N (x_{1}), N (x_{2}), \dots, N (x_{m})) .$ Therefore the strict definitions of the distances are: $\begin{matrix} NDTW (a, b) & = DTW (N (a), N (b)), \\ NIDTW (a, b) & = NDTW (I (N (a)), I (N (b))), \end{matrix}$ where a and b are either both univariate or both multivariate time series. However, to simplify notation, in the rest of the work we shall always use the symbols DTW and IDTW (instead of NDTW and NIDTW) with the assumption that the data is always normalized before computation of the distance measure.

We can now define the parametric integral dynamic time warping distance measure (for both univariate and multivariate time series) as a convex combination of the distances DTW and IDTW:

${ID}_{DTW} (a, b) = (1 - α) DTW (a, b) + α IDTW (a, b),$ where a and b are either both univariate or both multivariate time series and α ∈ [0, 1] is a real-valued parameter. For α = 0 the measure reduces to DTW and for α = 1 to IDTW.

The distance function ID_DTW can be used in the nearest neighbor classification method, where the parameter α is tuned in the learning phase. In this work the parameter α will be chosen by leave-one-out cross-validation on the training dataset.
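The following univariate sketch puts the normalization and the convex combination together. It follows the strict definitions NDTW and NIDTW given above (normalize, integrate, normalize again); the function names are hypothetical, and two details are assumptions not fixed by the text: the population standard deviation is used in `z_norm`, and a constant series is mapped to zeros to avoid division by zero.

```python
import math
from itertools import accumulate


def z_norm(x):
    """z-normalization N(x); population standard deviation is an assumption."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sigma == 0.0:
        return [0.0] * len(x)  # assumption: a constant series maps to zeros
    return [(v - mu) / sigma for v in x]


def dtw(x, y):
    """DTW with squared-difference local cost (illustrative implementation)."""
    n, m = len(x), len(y)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][m]


def id_dtw(a, b, alpha):
    """Convex combination (1 - alpha) * DTW + alpha * IDTW on normalized data."""
    na, nb = z_norm(a), z_norm(b)
    d_raw = dtw(na, nb)                      # NDTW(a, b)
    ia = z_norm(list(accumulate(na)))        # N(I(N(a)))
    ib = z_norm(list(accumulate(nb)))        # N(I(N(b)))
    d_int = dtw(ia, ib)                      # NIDTW(a, b)
    return (1.0 - alpha) * d_raw + alpha * d_int


a = [1.0, 2.0, 3.0, 2.0]
b = [2.0, 1.0, 0.0, 3.0]
print(id_dtw(a, b, 0.0), id_dtw(a, b, 0.5), id_dtw(a, b, 1.0))
```

Since α only weights the two component distances, they can be computed once per pair of series and reused for every value of α.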

Table 2
Summary of multidimensional datasets used in the experiments

Data sets	Variables	Max length	Min length	Classes	Size	Source
Arabic Digits	13	93	4	10	8800	UCI
Australian Language	22	136	45	95	2565	UCI
BCI	28	500	500	2	416	Blankertz
Character Trajectories	3	205	109	20	2858	UCI
CMU Subject 16	62	580	127	2	58	CMUMC
ECG	2	152	39	2	200	Olszewski
Graz	3	1152	1152	3	140	Leeb
Japanese Vowels	12	29	7	9	640	UCI
Libras	2	45	45	15	360	UCI
Non-Invasive Fetal ECG	2	750	750	42	3765	UCR
Pen Digits	2	8	8	10	10992	UCI
Robot Failure LP1	6	15	15	4	88	UCI
Robot Failure LP2	6	15	15	5	47	UCI
Robot Failure LP3	6	15	15	4	47	UCI
Robot Failure LP4	6	15	15	3	117	UCI
Robot Failure LP5	6	15	15	5	164	UCI
uWaveGestureLibrary	3	315	315	8	4478	UCR
Wafer	6	198	104	2	1194	Olszewski

Since α is outside the distances DTW and IDTW, to compute values of ID_DTW for all parameters from the interval [0, 1] we need to calculate DTW and IDTW only once. This allows some optimization of the computation time in the learning phase of the NN method. The optimized algorithm for leave-one-out cross-validation on the training dataset is shown in Fig. 3.

Fig.3

Implementation of the optimized algorithm (for univariate and multivariate time series) for the leave-one-out cross-validation routine (Matlab code).
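The idea behind the optimized routine can also be sketched in Python. This is not a translation of the Matlab code in Fig. 3 but an illustration of the same principle under stated assumptions: the pairwise training distance matrices `d_raw` (DTW) and `d_int` (IDTW) are computed once beforehand, and the function names are hypothetical.

```python
def loo_error(dist, labels):
    """Leave-one-out 1NN error rate for a precomputed distance matrix."""
    n = len(labels)
    errors = 0
    for i in range(n):
        # Nearest neighbor of element i among all other training elements.
        nearest = min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n


def loo_errors_over_alpha(d_raw, d_int, labels, alphas):
    """LOO error for each alpha; DTW and IDTW matrices are reused for every alpha."""
    n = len(labels)
    result = []
    for alpha in alphas:
        combined = [
            [(1.0 - alpha) * d_raw[i][j] + alpha * d_int[i][j] for j in range(n)]
            for i in range(n)
        ]
        result.append(loo_error(combined, labels))
    return result


# Toy matrices: the raw distance misleads 1NN, the integrated one does not.
labels = [0, 0, 1, 1]
d_raw = [[0, 5, 1, 9], [5, 0, 9, 1], [1, 9, 0, 5], [9, 1, 5, 0]]
d_int = [[0, 1, 5, 9], [1, 0, 9, 5], [5, 9, 0, 1], [9, 5, 1, 0]]
print(loo_errors_over_alpha(d_raw, d_int, labels, [0.0, 0.25, 0.75, 1.0]))
```

With the matrices precomputed, each additional value of α costs only O(N²) arithmetic operations for N training elements, with no further DTW computations.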

2.5 Metric conditions and lower bounds

Since the DTW distance measure is not a metric (it does not satisfy the triangle inequality), the new combined distance measure ID_DTW is also not a metric. However, similarly to DTW, the following identities hold: $\begin{matrix} {ID}_{DTW} (X, X) & = 0, \\ {ID}_{DTW} (X, Y) & = {ID}_{DTW} (Y, X), \end{matrix}$ for every fixed parameter α ∈ [0, 1].

To shorten the calculation time of the NN method with the distance ID_DTW we can use a lower bound. If LB is a lower bound for DTW and LBI is a lower bound for IDTW then

${LB}_{α} (a, b) = (1 - α) LB (a, b) + α LBI (a, b)$ (8)

is a lower bound for the distance ID_DTW (for every fixed α ∈ [0, 1] in both the univariate and multivariate cases). There are many good lower bounds of the DTW for one-dimensional time series, such as LB_Keogh [18] and LB_Improved [22]. By our definition of DTW for multidimensional time series (5), (6) we can easily transform those univariate lower bounds to multivariate DTW lower bounds. Then, by (8), we can also find good lower bounds for ID_DTW.
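Formula (8) holds because both weights 1 - α and α are non-negative, so LB ≤ DTW and LBI ≤ IDTW imply (1 - α) LB + α LBI ≤ (1 - α) DTW + α IDTW. The sketch below illustrates this with a deliberately simple bound rather than LB_Keogh or LB_Improved: every warping path contains the cells (1, 1) and (n, n), so the sum of their local costs cannot exceed DTW. The function names are hypothetical, and normalization is omitted for brevity (the inputs are assumed to be normalized already).

```python
import math
from itertools import accumulate


def dtw(x, y):
    """DTW with squared-difference local cost (illustrative implementation)."""
    n, m = len(x), len(y)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][m]


def lb_first_last(x, y):
    """Simple lower bound: local costs of the first and last cells of any path."""
    first = (x[0] - y[0]) ** 2
    if len(x) == 1 and len(y) == 1:
        return first  # first and last cell coincide
    return first + (x[-1] - y[-1]) ** 2


def integrate(x):
    return list(accumulate(x))


def id_dtw(a, b, alpha):
    return (1.0 - alpha) * dtw(a, b) + alpha * dtw(integrate(a), integrate(b))


def lb_alpha(a, b, alpha):
    """Combined lower bound (8) built from bounds on the raw and integrated data."""
    lb = lb_first_last(a, b)
    lbi = lb_first_last(integrate(a), integrate(b))
    return (1.0 - alpha) * lb + alpha * lbi


a = [0.0, 1.0, 2.0, 1.0, -1.0]
b = [1.0, 1.0, 0.0, 2.0, 0.5]
print(lb_alpha(a, b, 0.5), id_dtw(a, b, 0.5))
```

In a nearest neighbor search, a candidate can be skipped without computing ID_DTW whenever its lower bound already exceeds the best distance found so far.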

3 Results

3.1 Experimental setup

We conducted computational experiments for both one-dimensional and multidimensional time series data [19].

In the univariate case, we performed experiments on 85 data sets from the UCR Time Series Classification Archive [7]. This is a database with labeled time series data from a very broad range of fields, including medicine, finance, multimedia and engineering. Each dataset from the database is split into training and testing subsets. All data is z-normalized. Information on the time series used is presented in Table 1.

For the classification process the nearest neighbor method (1NN) is used for all compared distances: DTW, IDTW and ID_DTW. We use the cross-validation (leave-one-out) method to find the best parameter α in our classifier ID_DTW on a training subset. If the minimal error rate is the same for more than one value of α, we choose the median of those values. A finite subset of the parameter α is chosen, ranging from 0 to 1 with a fixed step size of 0.01.
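The selection rule for α can be written down directly. A minimal sketch, assuming the leave-one-out error has already been computed for every grid value; the function name `choose_alpha` is hypothetical.

```python
from statistics import median

# Grid of candidate values: 0, 0.01, ..., 1 (101 values).
ALPHAS = [k / 100 for k in range(101)]


def choose_alpha(alphas, errors):
    """Return the median of all alpha values that attain the minimal error."""
    best = min(errors)
    candidates = [a for a, e in zip(alphas, errors) if e == best]
    return median(candidates)


# Three grid values share the minimal error; the middle one is selected.
print(choose_alpha([0.0, 0.25, 0.5, 0.75, 1.0], [0.3, 0.1, 0.1, 0.1, 0.2]))  # 0.5
```

When an even number of values attain the minimum, `statistics.median` returns the mean of the two middle ones, which may fall between grid points; whether the original experiments handle this case in the same way is not stated.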

Table 3
Test errors of the compared methods (in %) for univariate time series

Dataset	DTW	IDTW	ID_DTW	Dataset	DTW	IDTW	ID_DTW
50words	30.99	49.67	26.59	MedicalImages	26.32	36.84	26.05
Adiac	39.64	64.19	39.13	MiddlePhalanxOutlineAgeGroup	25.00	28.00	25.00
ArrowHead	29.71	38.29	26.86	MiddlePhalanxOutlineCorrect	35.17	31.00	29.50
Beef	36.67	50.00	33.33	MiddlePhalanxTW	41.60	39.85	42.36
BeetleFly	30.00	40.00	25.00	MoteStrain	16.53	25.24	23.80
BirdChicken	25.00	35.00	25.00	NonInvasiveFatalECG Thorax1	20.97	43.66	20.15
Car	26.67	41.67	31.67	NonInvasiveFatalECG Thorax2	13.54	36.28	12.98
CBF	0.33	1.11	0.11	OliveOil	16.67	16.67	20.00
ChlorineConcentration	35.16	44.69	36.33	OSULeaf	40.91	66.12	40.08
CinC ECG torso	34.93	43.62	32.68	PhalangesOutlinesCorrect	27.16	30.77	27.27
Coffee	0.00	14.29	0.00	Phoneme	77.16	83.44	77.00
Computers	30.00	48.40	34.40	Plane	0.00	15.24	0.00
Cricket X	24.62	57.69	24.62	ProximalPhalanxOutlineAgeGroup	19.51	23.90	23.41
Cricket Y	25.64	49.49	25.64	ProximalPhalanxOutlineCorrect	21.65	25.43	22.68
Cricket Z	24.62	55.64	24.10	ProximalPhalanxTW	26.25	30.00	26.00
DiatomSizeReduction	3.27	13.73	2.61	RefrigerationDevices	53.60	59.47	53.60
DistalPhalanxOutlineAgeGroup	20.75	22.50	20.00	ScreenType	60.27	63.47	58.13
DistalPhalanxOutlineCorrect	23.17	29.50	23.17	ShapeletSim	35.00	47.22	41.67
DistalPhalanxTW	29.00	29.75	28.00	ShapesAll	23.17	41.67	21.17
Earthquakes	25.78	35.71	25.78	SmallKitchenAppliances	35.73	36.53	31.47
ECG200	23.00	22.00	19.00	SonyAIBORobotSurface	27.45	27.12	26.79
ECG5000	7.56	10.29	7.78	SonyAIBORobotSurfaceII	16.89	19.73	17.94
ECGFiveDays	23.23	27.29	22.42	StarLightCurves	9.34	13.82	9.06
ElectricDevices	40.42	38.65	35.52	Strawberry	6.04	14.19	6.85
FaceAll	19.23	24.38	19.29	SwedishLeaf	20.80	45.44	17.60
FaceFour	17.05	23.86	15.91	Symbols	5.03	17.69	5.73
FacesUCR	9.51	15.76	9.90	synthetic control	0.67	3.33	0.67
FISH	17.71	50.29	21.14	ToeSegmentation1	22.81	29.82	29.39
FordA	43.79	41.66	41.21	ToeSegmentation2	16.15	21.54	14.62
FordB	40.59	38.81	40.04	Trace	0.00	26.00	0.00
Gun Point	9.33	9.33	10.67	Two Patterns	0.00	0.00	0.00
Ham	53.33	41.90	51.43	TwoLeadECG	9.57	32.13	9.83
HandOutlines	20.20	23.70	20.10	uWaveGestureLibrary X	27.25	33.84	24.09
Haptics	62.34	68.18	57.14	uWaveGestureLibrary Y	36.60	41.71	32.52
Herring	46.88	51.56	48.44	uWaveGestureLibrary Z	34.17	43.44	32.72
InlineSkate	61.64	72.18	62.18	UWaveGestureLibraryAll	10.83	10.47	4.83
InsectWingbeatSound	64.49	56.46	51.92	wafer	2.01	1.56	1.23
ItalyPowerDemand	4.96	9.14	4.86	Wine	42.59	50.00	42.59
LargeKitchenAppliances	20.53	42.40	20.53	WordsSynonyms	35.11	53.61	31.82
Lighting2	13.11	24.59	13.11	Worms	53.59	67.40	53.59
Lighting7	27.40	39.73	24.66	WormsTwoClass	33.70	43.65	37.02
MALLAT	6.61	16.12	5.97	yoga	16.33	23.33	15.57
Meat	6.67	10.00	6.67

In the multivariate case, the experiments are carried out on the 18 data sets listed in Table 2, all of which have labels given. The data sets originate from different domains, including medicine, robotics, handwriting recognition, etc. Information on the time series used is presented in Table 2 (UCI: [3]; CMU MOCAP: [5, 25]). Most data is not z-normalized. There is no split into training and testing subsets.

The multivariate time series samples in each dataset are of different lengths. For each dataset, the samples are extended to the length of the longest time series in the data set. We extend all variables of the multidimensional series to the same length. For a short time series instance x with length n, we extend it to a long instance y with length n_max by

$y (j) = x (i), \quad i = \left\lceil \frac{j - 1}{n_{\max} - 1} (n - 1) + 0.5 \right\rceil, \quad j = 1, 2, \dots, n_{\max} .$ Some of the values in the sample are duplicated in order to extend a time series. In this way, all the values in the original multivariate time series sample appear in the extended sample.
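The extension rule can be illustrated with a short Python sketch. The function name `extend_series` is hypothetical; the index is computed with integer arithmetic, which is algebraically the same as the ceiling formula above but avoids floating-point rounding at the boundaries.

```python
def extend_series(x, n_max):
    """Stretch a series of length n to length n_max by duplicating values."""
    n = len(x)
    if n_max == 1:
        return list(x)  # nothing to stretch
    y = []
    for j in range(1, n_max + 1):
        # i = ceil((j - 1) / (n_max - 1) * (n - 1) + 0.5), written with integers:
        numerator = 2 * (j - 1) * (n - 1) + (n_max - 1)
        denominator = 2 * (n_max - 1)
        i = -(-numerator // denominator)  # integer ceiling division
        y.append(x[i - 1])                # x is 0-based in Python
    return y


print(extend_series([10, 20, 30], 5))  # [10, 10, 20, 20, 30]
```

For a multivariate sample the same function would be applied to every variable separately, using the length of the longest sample in the data set as `n_max`.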

For the classification process the nearest neighbor method (1NN) is used for all compared distances: DTW, IDTW and ID_DTW. For each dataset we calculated the classification error rate using the 10-fold cross-validation method (1NN classifier). We use the cross-validation (leave-one-out) method to find the best parameter α for our classifier ID_DTW on a training subset for each split of the 10-fold CV. If the minimal error rate is the same for more than one value of α, we choose the median of those values. A finite subset of the parameter α is chosen, ranging from 0 to 1 with a fixed step size 0.01.

3.2 Experimental results

The results for one-dimensional time series are presented in Table 3. The columns of the table (from left) show: the testing error rate (as a percentage) for the component methods DTW and IDTW and for the examined parametric method ID_DTW. In Fig. 4 a graphical comparison of the combined method with the component methods is presented.

Fig.4

Graphical comparison of error rates: DTW vs ID_DTW and IDTW vs ID_DTW for one-dimensional time series data.

For statistical confirmation of the performance of ID_DTW, p-values were calculated with the Wilcoxon signed-ranks test, which [12] recommends for the statistical comparison of two classifiers over multiple data sets. The Wilcoxon signed-ranks test is a non-parametric alternative to the paired t-test: it ranks the absolute differences in the performance of the two classifiers on each dataset and compares the rank sums of the positive and the negative differences. The p-values are 0.0272 for the pair DTW vs. ID_DTW and 5.6043 × 10^-14 for the pair IDTW vs. ID_DTW. Both values are below 0.03, which means that the parametric combined distance ID_DTW is better than the component distances DTW and IDTW at a confidence level of 97% (significance level 0.03).

For multidimensional time series data, the results are shown in Table 4. The columns (from left) show the testing error rate (as a percentage) for the component distances DTW and IDTW and for the new parametric combined distance ID_DTW. In Fig. 5 a graphical comparison of the examined method with the component methods is shown.

Table 4

Test errors of the compared methods (in %) for multivariate time series

Dataset	DTW	IDTW	ID_DTW
ArabicDigits	0.22	2.23	0.22
AUSLAN	23.20	47.72	22.11
BCI	46.54	45.22	51.23
CharacterTrajectories	1.50	0.87	0.74
CMUsubject16	0.00	1.67	0.00
ECG	16.00	28.00	17.00
Graz	34.29	46.43	40.71
JapaneseVowels	36.09	22.50	19.84
Libras	18.61	17.50	11.67
NonInvasiveFetalECG Thorax	9.99	21.04	10.01
PenDigits	0.63	1.70	0.62
RobotFailure LP1	28.06	22.64	14.58
RobotFailure LP2	34.00	45.50	28.00
RobotFailure LP3	48.00	33.50	25.50
RobotFailure LP4	16.21	22.88	13.64
RobotFailure LP5	36.54	34.19	33.53
uWaveGestureLibrary	1.90	4.20	1.92
Wafer	3.85	4.18	3.68

Fig.5

Graphical comparison of error rates: DTW vs ID_DTW and IDTW vs ID_DTW for multidimensional time series data.

As in the univariate case, we assess the performance of ID_DTW statistically with the Wilcoxon test. The calculated p-values are 0.0703 for the comparison of DTW vs. ID_DTW and 0.0012 for IDTW vs. ID_DTW. We can see that the combined method ID_DTW outperforms the component methods DTW and IDTW at a confidence level of about 93% (both p-values are at most 0.0703).
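The test can be illustrated on the DTW and ID_DTW columns of Table 4. The sketch below is a plain Python version of the signed-ranks procedure under several assumptions that the text does not specify: zero differences are discarded, tied absolute differences receive average ranks, and the two-sided p-value comes from the normal approximation without continuity or tie correction. Statistical packages differ in these details, so the value need not coincide exactly with the one reported above, although under these assumptions it should come out close to 0.07.

```python
import math

# Error rates (in %) from Table 4, in row order.
DTW = [0.22, 23.20, 46.54, 1.50, 0.00, 16.00, 34.29, 36.09, 18.61,
       9.99, 0.63, 28.06, 34.00, 48.00, 16.21, 36.54, 1.90, 3.85]
ID_DTW = [0.22, 22.11, 51.23, 0.74, 0.00, 17.00, 40.71, 19.84, 11.67,
          10.01, 0.62, 14.58, 28.00, 25.50, 13.64, 33.53, 1.92, 3.68]


def wilcoxon_signed_rank(a, b):
    """Return (W, two-sided p) using the normal approximation."""
    # Rounding removes floating-point noise so that equal differences tie.
    diffs = [round(u - v, 10) for u, v in zip(a, b)]
    diffs = [d for d in diffs if d != 0]  # assumption: zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, p


w, p = wilcoxon_signed_rank(DTW, ID_DTW)
print(w, p)
```

Of the 18 data sets, two have identical error rates for both methods and are dropped, so the statistic is based on 16 non-zero differences.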

We can present the contribution of the component distances to the combined distance ID_DTW. Figure 6 shows graphs of the parameter α for example univariate time series. The α corresponding to the minimal error rate is different for each data set but we can see that the minimum of the error is well-positioned — there is only one minimum for each error curve. The testing error rate generally corresponds to the cross-validation training error rate, so we can predict quite well the best value of the parameter α. The combined method adapts well to different data sets without showing signs of overfitting. Similar behavior can be observed for multivariate time series.

Fig.6

Correspondence of the parameter α and error rates for the ID_DTW method on some univariate datasets. Dashed line: training subset (CV) error; solid line: test subset error.

4 Conclusions and future work

In this paper we have proposed and examined a parametric distance measure based on a convex combination of the dynamic time warping distance measure (DTW) and the integral dynamic time warping distance measure (IDTW). The distance IDTW, computed as DTW on the integrated data, does not give good results in time series classification when used on its own. However, there exists quite a large group of datasets for which using the IDTW distance is very favorable. The experiments performed show that the parametric combination ID_DTW outperforms the component distance measures DTW and IDTW in the case of both univariate and multivariate time series data and significantly reduces the classification error rate. This is confirmed by graphical and statistical comparison. The parametric approach used in the ID_DTW method makes it possible to combine the advantages and avoid the disadvantages of the component methods. The new distance can be adapted to individual datasets, giving the best classification performance without showing signs of overfitting. We can see a very good correspondence between the values of the distance parameter on the training and testing data subsets. The method is easy to implement and to interpret. Its computational cost can be reduced both by using lower bounds and by exploiting the structure of the distance measure ID_DTW directly. The main motivation of the paper is to show that a transformation as simple as the integral transform carries useful information about the examined time series and can be used to significantly improve classification performance.

A disadvantage of the examined method is that its computational complexity is higher than that of the component methods DTW and IDTW. Tuning the distance parameter requires some additional computation in the learning phase. However, in the testing phase, the computational complexity is the same as for the component distances. The new parametric combined method seems to be especially interesting in cases where we have precomputed integral data of the examined time series or where the computation time of integrals is negligible, for example in systems with fast software or even hardware integrators.

Future investigation of the parametric integral dynamic time warping distance measure ID_DTW may take several directions. We may try to adapt the method to the unsupervised case. Clustering methods often require a different approach than supervised methods in order to address certain specific problems [24]. We can also construct a parametric distance measure for integrals of higher degree, as was done for derivatives in [15]. It may be interesting to examine different definitions of the discrete integral and their influence on the performance of the distance measure. We may also seek further methods of reducing computational complexity in the training phase of classification.

References

1. Aghabozorgi, Shirkhorshidi A.S. and Wah T.Y., Time-series clustering – A decade review, Information Systems 53 (2015), 16–38.

2. Ahmad S.S.S. and Pedrycz, The development of granular rule-based systems: A study in structural model compression, Granular Computing 2(1) (2017), 1–12.

3. Bache and Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], University of California, School of Information and Computer Science, Irvine, CA, 2013.

4. Berndt D.J. and Clifford, Using dynamic time warping to find patterns in time series, AAAI Workshop on Knowledge Discovery in Databases (1994), 229–248.

5. Blankertz, Curio and Müller K.R., Classifying single trial EEG: Towards brain computer interfacing, In: Dietterich T.G., Becker and Ghahramani (Eds.), Advances in Neural Inf. Proc. Systems 14 (NIPS 01), 2002. Available from: http://www.bbci.de/competition/ii/.

6. Box G.E.P., Jenkins G.M. and Reinsel G.C., Time series analysis: Forecasting and control, Wiley, 2008.

7. Chen, Keogh, Hu, Begum, Bagnall, Mueen and Batista, The UCR Time Series Classification Archive, www.cs.ucr.edu/~eamonn/time_series_data/ (2015).

8. Chen M.Y. and Chen B.T., A hybrid fuzzy time series model based on granular computing for stock price forecasting, Information Sciences 294 (2015), 227–241.

9. Chen M.Y. and Chen B.T., Online fuzzy time series analysis based on entropy discretization and a fast Fourier transform, Applied Soft Computing 14 (2014), 156–166.

10. Chen M.Y., A high-order fuzzy time series forecasting model for Internet stock trading, Future Generation Computer Systems - The International Journal of eScience 37 (2014), 461–467.

11. Carnegie Mellon University Motion Capture Database (2014). Available from: http://mocap.cs.cmu.edu/.

12. Demšar, Statistical Comparisons of Classifiers over Multiple Data Sets, Journal of Machine Learning Research 7 (2006), 1–30.

13. Ding, Trajcevski, Scheuermann, Wang and Keogh, Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures, In Proc 34th Int Conf on Very Large Data Bases, 2008, pp. 1542–1552.

14. Górecki and Łuczak, Using derivatives in time series classification, Data Mining and Knowledge Discovery 26(2) (2013), 310–331.

15. Górecki and Łuczak, First and second derivative in time series classification using DTW, Communications in Statistics-Simulation and Computation 43(9) (2014a), 2081–2092.

16. Górecki and Łuczak, Non-isometric transforms in time series classification using DTW, Knowledge-Based Systems 61 (2014b), 98–108.

17. Górecki and Łuczak, Multivariate time series classification with parametric derivative dynamic time warping, Expert Systems with Applications 42(5) (2015), 2305–2312.

18. Keogh, Exact indexing of dynamic time warping, In 28th International Conference on Very Large Data Bases, 2002, pp. 406–417.

19. Keogh and Kasetty, On the need for time series data mining benchmarks: A survey and empirical demonstration, Data Mining and Knowledge Discovery 4(7) (2003), 349–371.

20. Keogh and Pazzani, Dynamic Time Warping with Higher Order Features, In First SIAM International Conference on Data Mining (SDM'2001), Chicago, USA, 2001.

21. Leeb, Lee, Keinrath, Scherer, Bischof and Pfurtscheller, Brain-computer communication: Motivation, aim, and impact of exploring a virtual apartment, IEEE Transactions on Neural Systems and Rehabilitation Engineering 15 (2007), 473–482. Available from: http://www.bbci.de/competition/iv/.

22. Lemire, Faster retrieval with a two-pass dynamic-time-warping lower bound, Pattern Recognition 42(9) (2009), 2169–2180.

23. Liu, Gegov and Cocea, Rule-based systems: A granular computing perspective, Granular Computing 1(4) (2016), 259–274.

24. Łuczak, Hierarchical clustering of time series data with parametric derivative dynamic time warping, Expert Systems with Applications 62 (2016), 116–130.

25. Olszewski R.T., Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, 2001. Available from: http://www.cs.cmu.edu/~bobski.

26. Peters and Weber, DCC: A framework for dynamic granular clustering, Granular Computing 1(1) (2016), 1–11.

27. Petitjean, Forestier, Webb G.I., Nicholson A.E., Chen and Keogh, Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm, Knowledge and Information Systems 47(1) (2016), 1–26.

28. Shokoohi-Yekta, Wang and Keogh, On the Non-Trivial Generalization of Dynamic Time Warping to the Multi-Dimensional Case, Proceedings of the 2015 SIAM International Conference on Data Mining, 2015, p. 9.

29. Skowron, Jankowski and Dutta, Interactive granular computing, Granular Computing 1(2) (2016), 95–113.

30. Taylor, Zhou, Rouphail N.M. and Porter R.J., Method for investigating intradriver heterogeneity using vehicle trajectory data: A Dynamic Time Warping approach, Transportation Research Part B: Methodological 73 (2015), 59–80.

31. Warren Liao, Clustering of time series data – a survey, Pattern Recognition 38(11) (2005), 1857–1874.

32. Xing, Pei and Keogh, A brief survey on sequence classification, ACM SIGKDD Explorations Newsletter 12(1) (2010), 40–48.

33. Yao, A triarchic theory of granular computing, Granular Computing 1(2) (2016), 145–157.

2 Parametric integral dynamic time warping (ID_DTW)