Abstract
The dynamic time warping (DTW) distance measure is one of the popular and efficient distance measures used in algorithms of time series classification. It frequently occurs with different kinds of transformations of input data. In this paper we propose a combination of the DTW distance measure with a (discrete) integral transformation. This means that the new distance measure IDTW is simply calculated as the value of DTW on the integrated input time series. However, this design means that the distance cannot in itself give good classification results. We therefore propose to construct a parametric integral dynamic time warping distance measure IDDTW which is a parametric combination of the distances DTW and IDTW. Such a combined distance is used in the nearest neighbor (1NN) classification method in the case of both univariate and multivariate time series analysis. Computational experiments performed on both one-dimensional and multidimensional datasets show that this approach reduces the classification error significantly in comparison with the component methods. The parametric approach allows the new distance to be adapted to each dataset, while showing no significant overfitting effects. The contribution and the main motivation of the paper is to show that the simple transformation as the integral transform can include a bit information about examined time series data and can be used to significantly improve performance of the classification process both for univariate and multivariate time series data. The results are confirmed by graphical and statistical comparisons.
Keywords
Introduction
Dynamic Time Warping (DTW) is a popular distance measure used in data mining [4]. The Nearest Neighbor (NN) method with DTW is frequently used in classification and clustering of time series data [1, 32]. It can be applied to both univariate and multivariate analysis of one-dimensional and multidimensional time series [28, 31]. The results of classification using the pair NN and DTW are usuallyvery good and hard to surpass by other distance measures [13]. The DTW distance often occurs with various kinds of transformations of the input data. For example, computing DTW on the derivative of the raw data is called the Derivative Dynamic Time Warping (DDTW) distance measure [20]. However, the DTW distance measure on the transformed data rarely gives us a universal distance measure which can compete with DTW on a large number of datasets. Therefore, in order to use the information in both the raw and transformed data, combined methods are used. We can create a distance measure which is a combination of DTW on raw and transformed time series data. Examples of such combined distances can be found in the literature: a combination of raw and derivative data with the discrete derivative of the first degree for both one-dimensional and multidimensional time series [14, 17], a combination with a sine/cosine transform [16], and others [30].
In this paper we examine an integral transformation of time series data. We introduce the Integral Dynamic Time Warping (IDTW) distance measure, whose value is computed as the DTW on the integrated input data. This distance measure can be defined for both one-dimensional and multidimensional time series. We then introduce a parametric combination of DTW and IDTW, called the Parametric Integral Dynamic Time Warping (IDDTW) distance measure. It is a parametric convex combination of the DTW distance measure on the raw and integrated data. The share of each of the distance component is controlled by a single real-valued parameter. Such a parametric distance can be used for both univariate and multivariate time series data. The IDDTW will be used in the classification process with the one nearest neighbor (1NN) method. The parameter of the combination is computed in the learning phase on the training subset by leave-one-out cross-validation.
We perform computational experiments on two large benchmark data bases for one-dimensional and multidimensional time series. The classification error rates obtained on these datasets clearly show that the proposed combined distance IDDTW gives better results than the component distances DTW and IDTW alone. The results are presented by graphical comparison and confirmed statistically. Since the parameter of IDDTW is located outside the distance components, there can be a significant reduction of computation time in the learning phase (cross-validation). We can easily get lower bounds for the combined distance, which can also be used to reduce the computational complexity of the method.
The contribution and the main motivation of the paper is to show that that the simple transformation as the integral transform can include a bit information about examined time series data and can be used to significantly improve performance of the classification process. On the second hand, we show that only parametrical approach can combine information from raw and transformed data both for univariate and multivariate time series data sets.
The presented method can be used in classification of time series originated from all fields, including price series, complementing other price prediction models [8–10]. It also seems that it may be used in future research with granular computing techniques [2, 33] to solve univariate and multivariate time series classification problems.
The remainder of the paper is organized as follows. We first (Section 2) review the concept of time series and the dynamic time warping algorithm for univariate and multivariate time series data. In the same section we introduce our parametric distance based on integral transformation, and explain the optimization process and properties of the new distance measure. In Section 3 the benchmark datasets used in the empirical comparison of methods are described, and we explain the experimental setup. Later in that section we present the results of our experiments on the described time series, as well as statistical analysis of the examined methods. We conclude in Section 4 with discussion of possible future extensions of the work.
Parametric integral dynamic time warping (IDDTW)
A one-dimensional (univariate) time series is a sequence of observations aligned in time or space [6]. In this paper we shall assume that time series are discrete i.e. they are finite sequences of real numbers:
Multidimensional (multivariate) time series are defined as finite sequence of one-dimensional time series:
Dynamic time warping
The dynamic time warping distance measure (DTW) is a popular distance measure used to calculate the similarity/dissimilarity of time series [4]. To calculate the DTW for the two one-dimensional time series with length
Then we construct a square matrix D with dimensionality n × n consisting of the local cost function values D (i, j) = d (x (i) , y (j)). The matrix element D (i, j) corresponds to the alignment between values x (i) and y (j) of the time series. Then we create a warping path W = {w1, w2, …, w
K
} ( w1 = d (1, 1) and w
K
= D (n, n) (boundary conditions); if w
k
= D (i
k
, j
k
) and wk+1 = D (ik+1, jk+1) then ik+1 - i
k
≤ 1 and jk+1 - j
k
≤ 1 (continuity); ik+1 - i
k
≥ 0 and jk+1 - j
k
≥ 0 (monotonicity).
To get a warping path we start at the element D (1, 1) and shifting at most one index forward we finish at the element D (n, n) (Fig. 1). The path which minimizes the warping cost gives the value of the DTW distance measure:

Time series alignment and the corresponding warping path.
Sometimes the DTW is defined as the square root of (2).
In practice, we calculate the value of DTW by building a cumulative distance matrix Γ by dynamic programming with the following recursion:
The value of DTW is found at position (n, n) of the matrix Γ:
The DTW distance measure does not satisfy the triangle inequality hence it is not a metric. However, it holds that DTW (x, x) =0 and DTW (x, y) = DTW (y, x) for the cost function (1).
The distance measure computed as DTW on an integrated input time series (indefinite integral of a function) will be called the Integral Dynamic Time Warping distance measure (IDTW):
The metric conditions for IDTW are the same as for DTW.
With the assumption that the univariate time series of all dimensions of a multivariate time series have the same length we can view a multi-series as a one-dimensional trajectory in an m-dimensional Euclidean space:
Then we can define the DTW distance measure between two multi-series X and Y [17] in the same qay as for the one-dimensional case, but with a local cost function d defined as

Multivariate time series alignment and the cost function d.
Similarly to the one-dimensional case, we define the Integral Dynamic Time Warping distance measure as DTW computed on the integrated multi-series X = (x1, x2, …, x
m
), i.e. we integrate each component one-dimensional time series (dimension, variable) separately by formula (4):
Characteristics of the one-dimensional datasets used in the experiments (#cl – number of classes, #el – number of elements (size), len – time series length)
Before combining into one distance measure, both raw and integrated data are normalized. The normalized time series will be a new time series denoted N (x):
The normalized multivariate time series will be the series for which every dimensionality is normalized separately:
We can now define the parametric integral dynamic time warping distance measure (for both univariate and multivariate time series) as a convex combination of the distances DTW and IDTW:
The distance function IDDTW can be used in the nearest neighbor classification method, where the parameter α is tuned in the learning phase. In this work the parameter α will be chosen by leave-one-out cross-validation on the training dataset.
Summary of multidimensional datasets used in the experiments
Summary of multidimensional datasets used in the experiments
Since α is outside the distances DTW and IDTW, to compute values of IDDTW for all parameters from the interval [0, 1] we need to calculate DTW and IDTW only once. This allows some optimization of the computation time in the learning phase of the NN method. The optimized algorithm for leave-one-out cross-validation on the training dataset is shown in Fig. 3.

Implementation of the optimized algorithm (for univariate and multivariate time series) for the leave-one-out cross-validation routine (Matlab code).
Since the DTW distance measure is not a metric (it does not satisfy the triangle inequality), the new combined distance measure IDDTW is also not a metric. However, similarly to DTW, the following identities hold:
To shorten the calculation time of the NN method with the distance IDDTW we can use a lower bound. If LB is a lower bound for DTW and LBI is a lower bound for IDTW then
is a lower bound for the distance IDDTW (for every fixed α ∈ [0, 1] in both the univariate and multivariate cases). There are many good lower bounds of the DTW for one-dimensional time series, such as LB_Keogh [18] and LB_Improved [22]. By our definition of DTW for multidimensional time series (5), (6) we can easily transform those univariate lower bounds to multivariate DTW lower bounds. Then, by (8), we can also find a good lower boundsfor IDDTW.
Experimental setup
We conducted computational experiments for both one-dimensional and multidimensional time series data [19].
In the univariate case, we performed experiments on 85 data sets from the UCR Time Series Classification Archive [7]. This is a database with labeled time series data from a very broad range of fields, including medicine, finance, multimedia and engineering. Each dataset from the database is split into training and testing subsets. All data is z-normalized. Information on the time series used is presented in Table 1.
For the classification process the nearest neighbor method (1NN) is used for all compared distances: DTW, IDTW and IDDTW. We use the cross-validation (leave-one-out) method to find the best parameter α in our classifier IDDTW on a training subset. If the minimal error rate is the same for more than one value of α, we choose the median of those values. A finite subset of the parameter α is chosen, ranging from 0 to 1 with a fixed step size of 0.01.
Test errors of the compared methods (in %) for univariate time series
Test errors of the compared methods (in %) for univariate time series
In the multivariate case, the experiments are carried out on 16 data sets, all of which have labels given. The data sets originate from different domains, including medicine, robotics, handwriting recognition, etc. Information on the time series used is presented in Table 2 (UCI — [3], CMU MOCAP — [5, 25]). Most data is not z-normalized. There is no split into training and testing subsets.
The multivariate time series samples in each dataset are of different lengths. For each dataset, the samples are extended to the length of the longest time series in the data set. We extend all variables of the multidimensional series to the same length. For a short time series instance x with length n, we extend it to a long instance y with lengthnmax by
For the classification process the nearest neighbor method (1NN) is used for all compared distances: DTW, IDTW and IDDTW. For each dataset we calculated the classification error rate using the 10-fold cross-validation method (1NN classifier). We use the cross-validation (leave-one-out) method to find the best parameter α for our classifier IDDTW on a training subset for each split of the 10-fold CV. If the minimal error rate is the same for more than one value of α, we choose the median of those values. A finite subset of the parameter α is chosen, ranging from 0 to 1 with a fixed step size 0.01.
The results for one-dimensional time series are presented in Table 3. The columns of the table (from left) show: the testing error rate (as a percentage) for the component methods DTW and IDTW and for the examined parametric method IDDTW. In Fig. 4 a graphical comparison of the combined method with the component methods is presented.

Graphical comparison of error rates: DTW vs IDDTW and IDTW vs IDDTW for one-dimensional time series data.
For statistical confirmation of the performance of IDDTW, p-values were calculated for the Wilcoxon test. To statistically compare two classifiers over multiple data sets, [12] recommends the Wilcoxon signed-ranks test. The Wilcoxon signed-ranks test is a non-parametric alternative to the paired t-test, which ranks the differences in the performances of two classifiers for each dataset, ignoring the signs, and compares the ranks for the positive and the negative differences. The p-values are: 0.0272 for the pair DTW vs. IDDTW and 5.6043 × 10-14 for the pair IDTW vs. IDDTW. This means that the parametric combined distance IDDTW is better than the compared component distances DTW and IDTW at a 97% significance level.
For multidimensional time series data, the results are shown in Table 4. The columns (from left) show the testing error rate (as a percentage) for the component distances DTW and IDTW and for the new parametric combined distance IDDTW. In Fig. 5 a graphical comparison of the examined method with the component methods is shown.
Test errors of the compared methods (in %) for multivariate time series

Graphical comparison of error rates: DTW vs IDDTW and IDTW vs IDDTW for multidimensional time series data.
As in the univariate case, we confirm statistically the superior performance of IDDTW by the Wilcoxon test. The calculated p-values are: 0.0703 for the comparison of DTW vs. IDDTW and 0.0012 for IDTW vs. IDDTW. We can see that the combined method IDDTW outperforms the component methods DTW and IDTW at a significance level of 93.
We can present the contribution of the component distances to the combined distance IDDTW. Figure 6 shows graphs of the parameter α for example univariate time series. The α corresponding to the minimal error rate is different for each data set but we can see that the minimum of the error is well-positioned — there is only one minimum for each error curve. The testing error rate generally corresponds to the cross-validation training error rate, so we can predict quite well the best value of the parameter α. The combined method adapts well to different data sets without showing signs of overfitting. Similar behavior can be observed for multivariate time series.

Correspondence of the parameter α and error rates for the IDDTW method on some univariate datasets. Dashed line: training subset (CV) error; solid line: test subset error.
In this paper we have proposed and examined a parametrical distance measure based on a convex combination of the dynamic time warping distance measure (DTW) and the integral dynamic time warping distance measure (IDTW). The distance IDTW, computed as DTW on the integrated data, when used separately does not give good results in the process of time series classification. However, there exist quite a large group of datasets for which using the IDTW distance is very favorable. The experiments performed show that the parametrical combination IDDTW outperforms the component distance measures DTW and IDTW in the case of both univariate and multivariate time series data and significantly reduces the classification error rate. This is confirmed by graphical and statistical comparison. The parametric approach used in the IDDTW method makes it possible to combine the advantages and avoid the disadvantages of the component methods. The new distance can be adapted to individual datasets giving the best classification performance without showing signs of overfitting. We can see a very good correspondence between the values of the distance parameter on the training and testing data subsets. The method seems to be easily implemented and interpreted. There exist methods of computational complexity reduction resulting both from the existence of lower bounds and directly from the structure of the distance measure IDDTW. The main motivation of the paper is to show that that the simple transformation as the integral transform can include a bit information about examined time series data and can be used to significantly improve performance of the classification process.
A disadvantage of the examined method is that its computational complexity is higher than that of the component methods DTW and IDTW. Tuning the distance parameter requires some additional computation in the learning phase. However, in the testing phase, the computational complexity is the same as for the component distances. The new parametric combined method seems to be especially interesting in cases where we have precomputed integral data of the examined time series or where the computation time of integrals is negligible, for example in systems with fast software or even hardware integrators.
Future investigation of the parametric integral dynamic time warping distance measure IDDTW may take several directions. We may try to adapt the method to the unsupervised classification case. Clustering methods often require a different approach (to address certain specific problems) than for supervised methods [24]. We can also construct a parametrical distance measure for higher degrees of integrals, similarly as for derivatives in the paper [15]. It may be interesting to examine different definitions of the discrete integral and their influence on the performance of the distance measure. We may also seek further methods of reducing computational complexity in the training phase of classification.
