Multivariate dynamic time warping in automotive applications: A review

Abstract

The use of multivariate time series generation in industrial settings such as the automotive industry continues to increase. The complexity of data analysis requirements in such industries has led to an urgent need to develop effective methods for extracting structural information from data based on the clustering of system behavior time series. Because there are complex interactions between vehicle data variables, the time series clustering of single variables can lead to insufficient results. To the best of our knowledge, only univariate dynamic time warping (DTW) approaches have thus far been applied in an automotive context. To close this research gap, this paper presents a review of generic approaches in multivariate dynamic time warping (MDTW) to determine the most promising approaches for use in the automotive domain. Four approaches are found to be particularly useful for tasks such as the objective assessment of subjective driving perceptions.

Keywords

Multivariate dynamic time warping, time series clustering, automotive applications, vehicle data analysis, objective assessment of subjective driving perceptions

1. Introduction

With the increasing power of data storage and processing, data in many real-world applications, for instance in finance and automotive engineering, are stored not only as sets of data points but also as time series. Therefore, the field of multivariate time series analysis is promising in these applications [1, 2]. This is especially true for automotive applications because vehicular behavior is influenced by various variables. For specific driving maneuvers, the values of these variables vary over time and evoke complex processes. Therefore, the analysis of multivariate time series is highly promising and essential for understanding and controlling such vehicular behavior. There are two major applications for knowledge discovery in multivariate time series: clustering of these time series is used to discover insights regarding the structure of the data and time-related patterns [1, 2]; classification based on multivariate time series is a second application [3].

However, analyzing multivariate time series is nontrivial. This is due to the potentially complex interrelations between the variables, which can even vary over time [4]. In [5] three major challenges for time series analysis are given. First, many methods only accept input data as a vector of features. Unfortunately, there are no explicit features in sequence data. Second, the feature selection is far from trivial because even with feature selection methods, the dimensionality of the feature space can be very high and the computation can be costly. Third, an interpretable classifier or clustering method is often desired. With no explicit features, it is difficult to build an interpretable sequence classifier or clustering method.

Raw-data-based clustering and classification approaches based on multivariate dynamic time warping (MDTW) are promising because they do not require feature selection or discretization and instead use the raw data of the entire time series [5]. To the authors’ best knowledge, there has to date been no specific MDTW approach developed for automotive applications. To help bridge this gap, this paper reviews generic MDTW approaches to find the most promising ones for use in the automotive domain. The main contribution of this paper is to review current MDTW approaches with regard to automotive applications in order to present the recent progress in this field and to derive possibilities for future research.

The structure of this paper is shown in Fig. 1. The main focus of the paper is a review of generic MDTW approaches based on automotive requirements. Section 2 compares clustering of static data with the clustering of time series data. Section 3 describes the use-case of objective assessment and discusses the current use of univariate dynamic time warping (DTW) in automotive applications. Section 4 describes and classifies the current research approaches to MDTW, which are further discussed in terms of their respective research applicability. These approaches are rated with respect to automotive requirements in Section 5, leading to the selection of the four generic MDTW approaches most suitable for automotive applications. Finally, the conclusion is presented and future work described in Section 6.

Figure 1.

Deriving the most suitable generic MDTW approaches for automotive applications.

2. Basics of time series clustering

This section explains the clustering of static data, describes how it differs from the clustering of time series data and presents the terminology that is used in this paper.

2.1 Clustering

The goal of clustering is to partition a set of data points into groups whose members are as similar as possible [6]. In [7], clustering approaches are divided according to their input into similarity-based clustering and feature-based clustering. In similarity-based clustering, the input to the clustering algorithm is an $N\times N$ distance matrix, while in feature-based clustering, the input is an $N\times D$ feature matrix. Similarity-based clustering allows for the easy inclusion of domain-specific similarity or kernel functions, and feature-based clustering is useful for potentially noisy data [7]. Further, clustering methods can also be divided according to the clustering output. In flat clustering, the objects are partitioned into disjoint sets, and in hierarchical clustering, a nested tree of partitions is created. In general, flat clustering is faster, whereas hierarchical clustering can be more useful because different numbers of clusters can be analyzed in one clustering [7].

2.2 Time series clustering

In contrast to static data, discussed in Section 2.1, the time series of a variable comprises values that change over time. Although time series have high significance in many industrial applications, the majority of work on clustering has focused on static data [2]. In general, most algorithms for time series have been derived by modifying existing algorithms for clustering static data to adapt them for processing time series data [2].

The focus of this paper is on applications for which an automated process is required to minimize user impact on the clustering process and on defining suitable multivariate automated clustering processes for use in driving applications. An example that explains the benefits of an automated clustering process can be found in Section 3.1. For such applications, expert guessing of parameters is undesirable, as it could significantly change the output; thus, a suitable clustering method must be selected. Three types of time series clustering approaches have been identified: feature-based, model-based, and raw-data-based [2].

In feature-based approaches, an expert must engineer features such as the maximum and minimum of the time series, and a clustering method such as $k$-means is applied to these features. The advantage of this approach is that it can use existing approaches designed for the clustering of static data; however, feature extraction is highly dependent on domain expertise to achieve a high accuracy of clustering, which makes automated feature-based processes infeasible [8].

In model-based approaches, the time series is approximated using a model upon which clustering is conducted. An advantage of this approach is that models such as continuous time Bayesian networks can be used to represent temporal dynamics by allowing discrete states to evolve in continuous time. Such discrete states can provide an understanding of the system and reduce noise [9]. Unfortunately, the states must be defined by an expert, a limitation that also occurs in other model-based approaches. In any case, as a model is essentially an approximation of reality, some expert assumptions must be made [10]; as with the feature-based approach, such expert influence impedes the automated process because the clustering becomes highly dependent on the user’s model assumptions.

In raw-data-based approaches, clustering is performed using a distance measure on the raw data. By definition, such approaches use the highest potential amount of information and, by working directly with raw data, provide an approach in which clustering is achieved without the need for expert input. However, raw data are potentially prone to noise, and the choice of a suitable distance metric is highly dependent on the characteristics of the time series data.

Nevertheless, raw-data-based methods are probably the optimal approach to implementing automated processes, as the raw data can be processed without expert knowledge and the highest information gain is obtained.

Raw-data-based time series clustering approaches are distinguishable by the underlying similarity measure used. Similarity measures can be divided into lock-step measures, elastic measures, threshold-based measures, and pattern-based measures [11]. Threshold-based measures are not considered in this work because the threshold must be defined by the user, leading to a high degree of user-dependence of the clustering result. Pattern-based measures search for recurring patterns and therefore are more suitable for long time series. However, because the time series focused upon in this work, such as the variables measured during a lane change, are too short for significant recurring patterns to appear, pattern-based measures are not considered here.

Lock-step measures such as the Euclidean distance compare the $i$-th point of time series $A$ with the $i$-th point of time series $B$ to calculate the overall distance [12]. Some lock-step measures do not require parameter tuning, making them suitable for use in automated processes. However, in many automotive applications, such as the analysis of lane changes based on vehicle data, time series of different lengths in which time shifts can occur must be processed (e.g., not every lane change has the same duration). The disadvantage of lock-step measures is that they cannot directly deal with time series of different lengths and are sensitive to distortion along the time axis [13].
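The limitation of lock-step measures can be illustrated with a minimal sketch. The function name `euclidean_lockstep` and the toy series are hypothetical and serve only to show that a one-step time shift is penalized and that unequal lengths cannot be handled directly.

```python
import math


def euclidean_lockstep(a, b):
    """Lock-step Euclidean distance: point i of a is compared only with point i of b."""
    if len(a) != len(b):
        # A lock-step measure has no defined pairing for series of unequal length.
        raise ValueError("lock-step measures need series of equal length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical toy series: the same bump, shifted by one time step.
a = [0.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 0.0]

print(euclidean_lockstep(a, a))  # identical series
print(euclidean_lockstep(a, b))  # the shifted copy is treated as dissimilar
```

Although both series contain the same pattern, the shifted copy receives a nonzero distance, which is the sensitivity to distortion along the time axis noted above.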

Elastic measures such as dynamic time warping (DTW) compare the $i$-th point of time series $A$ with the $j$-th point of time series $B$, resulting in a distance matrix from which the optimal warping path can be calculated [14]. DTW is therefore able to accommodate time shifts in the data at the cost of a higher computational complexity than, e.g., the Euclidean distance.

For applications in which time shifts and observations of different lengths occur, elastic measures are optimal because they have the capacity to deal directly with time series of different lengths and compensate for small offsets in comparable patterns. Elastic measures can be further divided into dynamic time warping and edit distance-based measures [11]. In the former, time steps are only stretched, whereas in the latter, time steps are also skipped; however, skipping time steps can result in the loss of potentially valuable information, and therefore edit distance-based measures are excluded from further consideration in this work.

Thus, DTW is the most promising approach for applications in which an automated time series clustering process without an educated guess is required for analyzing data with time shifts and varying temporal lengths without skipping time steps.

Figure 2.

Integrating DTW into the research context of time series clustering.

Figure 2 illustrates the integration of DTW into the context of time series clustering. When dealing with vehicle data, single time series are logged in parallel with other variables, leading to multivariate time series. The clustering of lane change maneuvers is an example of this, in which, in addition to other variables such as longitudinal and lateral acceleration, velocity and steering angle are logged. In applications such as these, clustering of one variable does not lead to sufficient results; in the lane-changing case, for example, the process of changing a lane is not sufficiently represented by a single variable. In [15], this problem is confirmed; it states that valid information can be lost when only one variable of a multivariate time series is considered. To group multivariate time series data, a multivariate time series clustering approach, namely multivariate dynamic time warping (MDTW), is required. To provide an understanding of MDTW, Section 2.3 presents the terminology and the problem definition for this paper, and Section 2.4 explains the univariate case of DTW.

2.3 Terminology and problem definition

This paper uses the following terminology: scalar $x$, column vector $\bm{x}$, row vector $\bm{x}^{T}$, and matrix $\bm{M}$. A multivariate time series $\bm{X}$ is a time series with $n$ variables and $t$ discrete time steps, where $\bm{x}_{j}$ is the $j$-th variable and $x_{ij}$ is the value of the $j$-th variable at the $i$-th time step, with $\bm{X}=(\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{j},\ldots,\bm{x}_{n})$, $\bm{x}_{j}=(x_{1j},x_{2j},\ldots,x_{ij},\ldots,x_{tj})^{T}$, and, for the observation at time step $i$, $\bm{x}_{i}=(x_{i1},x_{i2},\ldots,x_{ij},\ldots,x_{in})$ [4]. The problem solved using MDTW is the search for similarity (or minimal distance) between multivariate time series (MTS). Given two MTS $\bm{X}$ and $\bm{Y}$ with

$\displaystyle\bm{X}=\left(\begin{array}{ccccc}x_{11}&\ldots&x_{1j}&\ldots&x_{1n}\\ \vdots&\ddots&\vdots&&\vdots\\ x_{i1}&\ldots&x_{ij}&\ldots&x_{in}\\ \vdots&&\vdots&\ddots&\vdots\\ x_{t1}&\ldots&x_{tj}&\ldots&x_{tn}\end{array}\right),\quad\bm{Y}=\left(\begin{array}{ccccc}y_{11}&\ldots&y_{1j}&\ldots&y_{1n}\\ \vdots&\ddots&\vdots&&\vdots\\ y_{l1}&\ldots&y_{lj}&\ldots&y_{ln}\\ \vdots&&\vdots&\ddots&\vdots\\ y_{m1}&\ldots&y_{mj}&\ldots&y_{mn}\end{array}\right)$ (1)

the distance measure $D(\bm{X},\bm{Y})$ calculates the accumulated distance between the two MTS over $x_{ij}$ and $y_{lj}$ and therefore quantifies their (dis-)similarity [16]. In the problem definition of time series clustering given in [17, 1], a dataset $\bm{X^{q}}=\{\bm{X_{1}},\bm{X_{2}},\ldots,\bm{X_{q}}\}$ of $q$ time series is given. Clustering this dataset is the unsupervised partitioning of $\bm{X^{q}}$ into $R$ mutually exclusive clusters $C=\{c_{1},c_{2},\ldots,c_{R}\}$ with $\bm{X^{q}}=\bigcup_{c_{r}\in C}c_{r}$ and $c_{r}\cap c_{d}=\emptyset$ $\forall r\neq d$. Given a dataset $\bm{Y}^{q},q\in\mathbb{N}$, with $q$ time series $\bm{Y}$, the distance of every series in $\bm{Y}^{q}$ to a new MTS $\bm{X}$ must be calculated to find the closest match. The MTS $\bm{Y}^{*}$ that is most similar to $\bm{X}$ can be defined as

$\displaystyle\bm{Y}^{*}=\textit{argmin}\ D(\bm{X},\bm{Y}^{q})$ (2)

With univariate time series, $D(\bm{X},\bm{Y})$ is clearly defined [13, 14]; with multivariate time series, however, there is no unique solution for the problem, as there are many potential combinations of warping dimensions. In the following subsection, the DTW process for the univariate case is explained; this is necessary because MDTW approaches use univariate DTW as a basis.
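The search in Eq. (2) can be sketched as a linear scan over the dataset. The sketch below is only an illustration: the names `closest_match` and `mean_profile_distance` are hypothetical, an MTS is assumed to be stored as a list of time steps with $n$ values each, and the distance used here is a deliberately simple placeholder (not DTW) that merely stands in for $D(\bm{X},\bm{Y})$.

```python
def closest_match(x, candidates, distance):
    """Return (index, distance) of the candidate closest to x, as in Eq. (2)."""
    best_index, best_dist = None, float("inf")
    for q, y in enumerate(candidates):
        d = distance(x, y)
        if d < best_dist:
            best_index, best_dist = q, d
    return best_index, best_dist


def mean_profile_distance(x, y):
    """Placeholder distance (an assumption, not DTW): compares per-variable means."""
    n = len(x[0])
    mean_x = [sum(row[j] for row in x) / len(x) for j in range(n)]
    mean_y = [sum(row[j] for row in y) / len(y) for j in range(n)]
    return sum(abs(mx - my) for mx, my in zip(mean_x, mean_y))


# Hypothetical data: one query MTS and two candidates of different lengths.
query = [[0.0, 1.0], [2.0, 3.0]]
dataset = [
    [[5.0, 5.0]],
    [[1.0, 2.0], [1.0, 2.5], [1.0, 1.5]],
]
print(closest_match(query, dataset, mean_profile_distance))
```

Any distance function with the same call signature, including the MDTW variants reviewed in Section 4, could be passed in place of the placeholder.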

2.4 Univariate dynamic time warping

This subsection presents the formal definition of DTW. For further information on the univariate case of DTW, see [13, 18]. The aim of DTW is to align two time series $\bm{x}$ and $\bm{y}$ of length $t$ and $m$ , where

$\displaystyle\bm{x}=(x_{1},x_{2},\ldots,x_{i},\ldots,x_{t})^{T},\quad\bm{y}=(y_{1},y_{2},\ldots,y_{l},\ldots,y_{m})^{T}$ (3)

Hence, a $t$-by-$m$ matrix is constructed, where the $(i,l)$-th element contains the distance $d(x_{i},y_{l})$ between the two points $x_{i}$ and $y_{l}$. An example is $d(x_{i},y_{l})=|x_{i}-y_{l}|$. In this matrix, every element $(i,l)$ corresponds to the alignment between the points $x_{i}$ and $y_{l}$. To obtain the optimal alignment of the two univariate time series (UTS), a warping path $W$ is created as a sequence of $K$ matrix elements $w_{k}=(i,l)_{k}$, where

$\displaystyle W=w_{1},w_{2},\ldots,w_{k},\ldots,w_{K},\qquad\textit{max}(m,t)\leqslant K<m+t-1.$ (4)

The warping path must follow three constraints:

• Boundary conditions: $w_{1}=(1,1)$ and $w_{K}=(t,m)$. The warping path starts with the first time step and ends with the last time step of the time series.

• Continuity: The allowable steps in the warping path are limited to the adjacent cells. If $w_{k}=(a,b)$ is given, then $w_{k-1}=(a^{\prime},b^{\prime})$ where $a-a^{\prime}\leqslant 1$ and $b-b^{\prime}\leqslant 1$.

• Monotonicity: The points in $W$ must be monotonically spaced in time. If $w_{k}=(a,b)$ is given, then $w_{k-1}=(a^{\prime},b^{\prime})$ with $a-a^{\prime}\geqslant 0$ and $b-b^{\prime}\geqslant 0$.
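The three constraints can be checked mechanically for a candidate path. The following is a minimal sketch; the function name `is_valid_warping_path` is hypothetical, cells are written as 1-based pairs $(i,l)$, and the rejection of a repeated cell is an additional assumption that the constraints as stated above do not spell out.

```python
def is_valid_warping_path(path, t, m):
    """Check boundary, continuity and monotonicity for a list of 1-based (i, l) cells."""
    # Boundary conditions: start at (1, 1) and end at (t, m).
    if not path or path[0] != (1, 1) or path[-1] != (t, m):
        return False
    for (a_prev, b_prev), (a, b) in zip(path, path[1:]):
        da, db = a - a_prev, b - b_prev
        if da > 1 or db > 1:
            return False  # continuity: only adjacent cells are allowed
        if da < 0 or db < 0:
            return False  # monotonicity: indices never decrease
        if da == 0 and db == 0:
            return False  # assumption: the same cell is not visited twice in a row
    return True


print(is_valid_warping_path([(1, 1), (2, 2), (3, 2)], t=3, m=2))
print(is_valid_warping_path([(1, 1), (3, 2)], t=3, m=2))
```

The first path satisfies all three constraints, whereas the second skips a time step and therefore violates continuity.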

Many warping paths $W$ potentially fulfill these constraints. However, to achieve the optimal alignment, the warping costs must be minimized in terms of the number and magnitude of elements $w_{k}$ :

$\displaystyle\textit{DTW}(\bm{x},\bm{y})=\textit{min}\left\{\sum_{k=1}^{K}w_{k}\right\}$ (5)

This warping path can be found using a dynamic programming approach in which the cumulative distance $\gamma(i,l)$ is defined as the sum of the distance $d(x_{i},y_{l})$ of the current cell and the minimum of the cumulative distances in the adjacent cells:

$\displaystyle\gamma(i,l)=d(x_{i},y_{l})+\textit{min}\{\gamma(i-1,l-1),\gamma(i-1,l),\gamma(i,l-1)\}.$ (6)

As the DTW distance does not satisfy the triangle inequality, it is not a metric in the strict sense. In the following section, the univariate DTW approaches used in automotive applications are reviewed.
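The recursion in Eq. (6) can be sketched in a few lines of Python. This is a minimal illustration rather than an optimized implementation: the function name `dtw` is hypothetical, the point-wise distance defaults to $|x_{i}-y_{l}|$ as in the example above, and the padding of the cumulative matrix with infinity is an implementation choice that enforces the boundary condition $w_{1}=(1,1)$.

```python
def dtw(x, y, d=lambda a, b: abs(a - b)):
    """Univariate DTW distance via the cumulative-distance recursion of Eq. (6)."""
    t, m = len(x), len(y)
    inf = float("inf")
    # gamma[i][l] holds the cumulative distance; row 0 and column 0 are padding.
    gamma = [[inf] * (m + 1) for _ in range(t + 1)]
    gamma[0][0] = 0.0
    for i in range(1, t + 1):
        for l in range(1, m + 1):
            cost = d(x[i - 1], y[l - 1])
            gamma[i][l] = cost + min(
                gamma[i - 1][l - 1],  # diagonal step
                gamma[i - 1][l],      # step along x only
                gamma[i][l - 1],      # step along y only
            )
    return gamma[t][m]


# Hypothetical toy series: the same bump, shifted by one time step.
print(dtw([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]))
# Series of different lengths can be compared directly.
print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Both calls return a distance of zero because warping absorbs the time shift and the repeated value. The runtime and memory of this sketch grow with $t\cdot m$, which reflects the higher computational cost of DTW compared with lock-step measures.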

3. The need for multivariate time series clustering in automotive applications

This section begins with an automotive example, the objective assessment of subjective driving perception (Section 3.1), where an MDTW approach is required. As there is no current MDTW approach specifically tailored to automotive challenges, the example helps to describe the characteristics of the data and the automotive requirements. These requirements will be applied in Section 5 to select the most suitable generic MDTW approaches. Section 3.2 reviews univariate DTW approaches for automotive applications to provide evidence for the method’s usefulness in the automotive context. The major research directions in MDTW are used in these subsections to classify the approaches explored and are discussed in detail in Section 4.

3.1 An automotive example: objective assessment of subjective driving perceptions

The objective assessment of subjective driving perceptions in Advanced Driver Assistance Systems (ADAS) is an automotive example in which MDTW-based time series clustering appears to be promising but has not yet been applied. The gain in comfort and safety provided by ADAS can only be appreciated when the driver perceives the driving behavior of the ADAS to be positive; therefore, subjective perception must be assessed in the development of a customer-centered ADAS. Furthermore, the objective assessment of ADAS requires the correlation of subjective perceptions with objectively measurable variables that occur in time series. This in turn necessitates the clustering of time series of measured observations to extract recurring patterns in terms of, e.g., acceleration. An automated clustering process such as the one described in Section 2.2 is therefore essential for this use-case. The aim of clustering time series in objective assessment is to determine the most influential objectively measurable variables; using an expert-dependent approach such as feature-based clustering could lead to expert-dependent variation in the clustering. Such human-induced variation is by definition subjective and can therefore distort any objective process. This example is used as representative of various scientific problems in the automotive industry. Based on data obtained from objective assessment, Table 1 summarizes the most important automotive data characteristics focused upon in this research.

Table 1
Description of automotive data used for objective assessment

Characteristic	Description
Length of the time series	Up to approx. 2,000 time steps (e.g., 40 s at a sampling interval of 20 ms)
Variation of length	High (difference of 0–1,000 time steps)
Number of observations	20–500
Number of variables	$>1$
Occurrence of phase shifts	Similar events occur with phase shifts across observations
Level of noise	Low (or filtered to a low level)

Working with vehicle data poses special requirements in terms of the methods used for time series analysis; the requirements for generic MDTW approaches used in this study are based on informal expert interviews at BMW AG and can be divided into four main aspects:

1 (a) Time dependency of the variables: In automotive applications, many variables, for instance steering angle and lateral acceleration, have a high degree of interdependence. To obtain a realistic description of system behavior, all variables must be considered at specific time steps to capture their interaction.

1 (b) Interpretability of the system: In automotive use-cases such as objective assessment, an attempt is made to understand the different types of system behavior through representation using clusters with several time series. Correspondingly, the process of clustering needs to be highly traceable.

2 (a) Variables with different magnitudes and units: The units and magnitudes of data such as velocity and acceleration can differ significantly (e.g., velocity, 0–200 km/h, and acceleration, 0–3 m/s$^{2}$). An appropriate method will take these different scales into account.

2 (b) Variables with different influence: As stated in requirement 1 (a), variables can influence each other significantly. However, the intensity of interdependence is generally not the same for all variables, and it is often useful to know how interdependencies vary. Variable influence can also be driving maneuver-specific.

In Section 5, requirements 1 (a) and 1 (b) will be used to assess the general research directions, with the respective approaches analyzed in more detail based on requirements 2 (a) and 2 (b). The terminology and the mathematical definition of the distance between two multivariate time series are given in Section 2.3; the following subsection reviews the use of univariate DTW in automotive applications.

3.2 Univariate DTW in automotive applications

Because there is currently no MDTW approach focusing on automotive applications, the authors conducted a review of univariate DTW applications in this domain. The review was subsequently extended to MDTW approaches from other domains (Section 4), which are evaluated against the defined requirements (Section 5). DTW originally emerged from speech recognition [18] but has since been widely applied in other fields; this subsection provides an overview of the use of univariate DTW in the automotive context.

In [19], fast DTW was used in combination with spectral clustering to cluster multiple velocity profiles to enable the prediction of behavior or the future state of a vehicle. Because velocity profiles can be considered time series, the use of DTW was suggested as a highly accurate method for finding a distance measurement between time series.

In [20], the focus was on the detection of “aggressive” and “non-aggressive” driving styles using a smartphone-based sensor fusion of an accelerometer, gyroscope, magnetometer, GPS, and video. Such recognition of driving styles and maneuvers via DTW is useful in vehicle safety systems.

In [21], the problem of tracking fine-grained speed variations of vehicles was tackled. This research proposed a technique based on derivative DTW that aligns a received signal strength trace from a moving cell phone handset with a reference trace for a given road segment.

In [22], DTW was used to measure the similarity between two time series of speeds taken from two vehicles driven on the same road segment. This method allows for early detection of vehicles driving at abnormal speeds.

In [23], a $k$-nearest neighbor classifier with DTW as the distance measure was used to detect lane changes. The method can be applied to vehicle dynamics signals such as steering angle and vehicle speed extracted from the Controller Area Network (CAN) bus.

In [19, 20, 21, 22], it was demonstrated that DTW performs very well in terms of analyzing velocity and acceleration time series. According to [23], using time series of these variables in combination with the steering angle leads to results sufficient for detecting lane changes.

In [24], the DTW approach was adopted to measure the similarity of two magnetic signatures of a wireless magnetic sensor network for vehicle speed estimation.

In [25], a metric based on DTW was proposed to compare the time histories of the outputs of simulation models with time histories obtained from experimental tests with an emphasis on vehicle safety applications.

In [26], the use of DTW was presented to process electric motor current signals to detect and quantify common faults in a downstream two-stage reciprocating compressor.

In [27], a method using the scale of matched SURF image features and DTW was proposed to perform stable ego-localization for driver assistance and autonomous driving systems.

In [28], driver heterogeneity in car-following behavior and heterogeneous situation-dependent behavior while driving were examined. The DTW algorithm was used to calibrate a microscopic simulation model by synthesizing driver trajectory data.

In [24, 25, 26, 27, 28], the use of DTW was studied for variables beyond those in the previous works cited.

From this brief review, it is seen that DTW is a promising approach to tackling a range of problems in the automotive domain. One challenge is that under the original definition of DTW, 1-D time series are compared, meaning that each element of a sequence is described in a 1-D space (see Section 2.4). The inherent inability of DTW to handle observations of differing or higher dimensionality limits the application of DTW under its original definition. As stated in Section 1, in real-world applications (e.g., capturing data from multiple sensors) high-dimensional data are common [29]. This is especially true for the introductory example of objective assessment (Section 3.1) because, in most driving scenarios, the perception of the driver is influenced by more than one objectively measurable variable. Therefore, a univariate DTW approach leads to insufficient results. In such applications, each element of a sequence is described in an $n$ -D space [30], which means that the DTW algorithm must be extended to a multivariate case. Such extension is not trivial because there are multiple approaches to implementing it, each of which can lead to different distance calculations and, consequently, different clustering (for a demonstration of this problem on a database, the reader is referred to [31]). Thus, finding the most suitable approach for a certain application is necessary but challenging. To identify the most promising approaches in the automotive domain, this paper provides a qualitative comparison of generic approaches based on automotive requirements in Section 5. The following section discusses current research directions for generic MDTW approaches.

4. Review of the generic MDTW approaches

In Section 3.2, it was noted that the univariate version of DTW does not provide sufficient results for applications in which multiple variables with high degrees of interrelation occur. The importance of this is striking when considering that most dynamical systems are characterized by multivariate time series [32]. Thus, obtaining sufficient clustering results requires the use of MDTW approaches, which in turn leads to the challenge of determining the appropriate approach for a given application. The three major directions in MDTW research that have emerged in the last few years are also discussed in this section.

Figure 3.

Respective warping paths of $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$ [31].

In [33], one of the first studies referring to a multidimensional DTW concept is documented. The primary goal of this work was to directly control the warping function curvature by augmenting the dimensionality of DTW. Because the dimensionality increase was used to increase the robustness of speech recognition against colored noise, it represented only a generalization of DTW for similar variables.

In [34], a multipattern DTW was presented in which an optimum path in multidimensional space was determined to increase noise robustness in speech recognition. Similar to that in [33], this approach compared only multiple versions of univariate time series.

The approaches in [33, 34] were in fact the first MDTW approaches to appear in the literature. However, they did not provide a generalization of MDTW compatible with the focus of this work, as they involved the use of only one type of variable (speech signals); as discussed in the preceding section, approaches involving multiple variables of differing types (e.g., steering angle and velocity) are more appropriate to automotive applications and are therefore focused on in this review.

The first multidimensional MDTW approach was reported in [30], where two primary MDTW methods are given. Under the first, called $\textit{DTW}_{I}$ (where $I$ stands for “independent”) [31], the DTW is computed $n$ times, or once per dimension (see left-hand side of Fig. 3). In $\textit{DTW}_{I}$ , each time series is aligned separately, leading to $n$ alignments of two multidimensional time series. This alignment is indicated in Fig. 3 as a series of black boxes in the matrix that represents the warping path. The second method is called $\textit{DTW}_{D}$ (where $D$ stands for “dependent”) [31]. In $\textit{DTW}_{D}$ , an $n$ -dimensional $\delta$ is used in the computation of the DTW. In this case, a single alignment of the two sequences is made and the cost function $\delta$ is used to compare vectors of values instead of scalars. $\textit{DTW}_{D}$ is illustrated on the right-hand side of Fig. 3. $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$ represent the two primary research directions in MDTW. According to [31], they can produce different classifications, and neither dominates the field. In the following subsections, the two MDTW approaches are further discussed, with $\textit{DTW}_{I}$ presented in Section 4.1 and $\textit{DTW}_{D}$ , presented in Section 4.2. Furthermore, we show that there are additional approaches that cannot be classified as either $\textit{DTW}_{I}$ or $\textit{DTW}_{D}$ but instead combine some elements of both. These integrated approaches are presented in Section 4.3.

The following subsections describe current MDTW approaches, which, as identified above, are divided into $\textit{DTW}_{I}$ , $\textit{DTW}_{D}$ , and integrated approaches.

4.1 Independent dynamic time warping ($\textit{DTW}_{I}$)

This subsection describes $\textit{DTW}_{I}$ and presents the approaches in which it is used. In independent DTW (Eq. (7)), each dimension $j$ is warped independently using a univariate distance measure $d(x_{ij},y_{lj})$, after which the warping costs of all dimensions are summed. The dimensions can be weighted using the factors $c_{j}$ [31].

$\displaystyle\textit{DTW}_{I}(\bm{X},\bm{Y})=\sum_{j=1}^{n}c_{j}\cdot\textit{DTW}(\bm{x}_{j},\bm{y}_{j}),\text{ with }d(x_{ij},y_{lj})$ (7)
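Eq. (7) can be sketched as a weighted sum of per-dimension DTW distances. The sketch below is illustrative only: the names `dtw` and `dtw_independent` are hypothetical, an MTS is assumed to be stored as a list of time steps with $n$ values each (the row layout of Eq. (1)), and the point-wise distance is assumed to be the absolute difference.

```python
def dtw(x, y):
    """Univariate DTW with d(x_i, y_l) = |x_i - y_l| (assumed point-wise distance)."""
    t, m = len(x), len(y)
    inf = float("inf")
    gamma = [[inf] * (m + 1) for _ in range(t + 1)]
    gamma[0][0] = 0.0
    for i in range(1, t + 1):
        for l in range(1, m + 1):
            cost = abs(x[i - 1] - y[l - 1])
            gamma[i][l] = cost + min(gamma[i - 1][l - 1], gamma[i - 1][l], gamma[i][l - 1])
    return gamma[t][m]


def dtw_independent(X, Y, weights=None):
    """Weighted sum of per-dimension DTW distances, following Eq. (7)."""
    n = len(X[0])
    if weights is None:
        weights = [1.0] * n  # unweighted sum
    total = 0.0
    for j in range(n):
        x_j = [row[j] for row in X]  # j-th variable of X over time
        y_j = [row[j] for row in Y]  # j-th variable of Y over time
        total += weights[j] * dtw(x_j, y_j)  # each dimension is warped on its own
    return total


# Hypothetical two-variable series of different lengths.
X = [[0.0, 10.0], [1.0, 10.0], [0.0, 20.0]]
Y = [[0.0, 12.0], [0.0, 12.0], [1.0, 22.0], [0.0, 22.0]]
print(dtw_independent(X, Y))
print(dtw_independent(X, Y, weights=[1.0, 0.5]))
```

Because each dimension obtains its own warping path, the first variable is aligned perfectly here while the second contributes its constant offset. Setting all weights to one or to $1/n$ yields an unweighted sum or an arithmetic mean, respectively.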

In [35], a DTW kNN classifier for time series with missing data was presented in which a DTW is calculated for each dimension and later aggregated into one distance measurement using the Euclidean distance. In this case, normalization is necessary, particularly for variables with different scales, and therefore a standard deviation vector $\bm{\sigma}=(\sigma_{1},\ldots,\sigma_{n})$ is introduced. As not all variables are equally important, weighting is also used, with the Pearson product-moment correlation coefficient adopted as an indicator of variable importance to create a weighting matrix, $\bm{W}$, that indicates the correlation between each pair of variables.

$\displaystyle\textit{DTW}_{I}(\bm{X},\bm{Y})=\sqrt{\sum_{j=1}^{n}\left(\frac{\textit{DTW}(\bm{x}_{j},\bm{y}_{j})}{\bm{\sigma}_{j}}\cdot\bm{W}_{j}\right)^{2}}$ (8)
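The aggregation in Eq. (8) can be sketched as follows. This is a simplified illustration, not the implementation of [35]: the weight $\bm{W}_{j}$ is treated here as one scalar per dimension, the values of `sigma` and `w` are chosen by hand instead of being derived from the data, and the function names are hypothetical.

```python
from math import inf, sqrt


def dtw(a, b):
    """Univariate DTW with the assumed local cost |x - y|."""
    t, m = len(a), len(b)
    acc = [[inf] * (m + 1) for _ in range(t + 1)]
    acc[0][0] = 0.0
    for i in range(1, t + 1):
        for l in range(1, m + 1):
            cost = abs(a[i - 1] - b[l - 1])
            acc[i][l] = cost + min(acc[i - 1][l], acc[i][l - 1], acc[i - 1][l - 1])
    return acc[t][m]


def dtw_independent_normalized(X, Y, sigma, w):
    """Eq. (8): Euclidean aggregation of scaled and weighted per-dimension DTW costs.

    sigma[j] is the scale of variable j; w[j] is an assumed scalar weight.
    """
    total = 0.0
    for x_j, y_j, sigma_j, w_j in zip(X, Y, sigma, w):
        total += (dtw(x_j, y_j) / sigma_j * w_j) ** 2
    return sqrt(total)


# Toy data: the per-dimension DTW costs are 3 and 4.
X = [[0, 1, 2], [10, 10, 20]]
Y = [[0, 1, 2, 5], [10, 20, 24]]
print(dtw_independent_normalized(X, Y, sigma=[1.0, 1.0], w=[1.0, 1.0]))
print(dtw_independent_normalized(X, Y, sigma=[3.0, 4.0], w=[1.0, 1.0]))
```

Dividing by `sigma[j]` keeps a variable with a large numeric range from dominating the aggregated distance, which is the motivation given above for the normalization.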

In [31, 36], a threshold-based learning approach was used for which either $\textit{DTW}_{I}$ or $\textit{DTW}_{D}$ can be the preferable method for obtaining optimal time series classification accuracy. In this approach, $\textit{DTW}_{I}$ is used to find the cumulative distances of all dimensions independently measured under the DTW.

$\displaystyle\textit{DTW}_{I}(\bm{X},\bm{Y})=\sum_{j=1}^{n}\textit{DTW}(\bm{x}_{j},\bm{y}_{j})$ (9)

In [37], both $\textit{DTW}_{I}$, following the definition of [31] in which the DTW algorithm is applied to each dimension, and $\textit{DTW}_{D}$, in which the DTW algorithm is applied to the sum of the dimensions, were applied. In this study, the performances of $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$ were compared on the task of single-character recognition, with the arithmetic mean used to calculate the overall distance in $\textit{DTW}_{I}$.

$\displaystyle\textit{DTW}_{I}(\bm{X},\bm{Y})={\textstyle\frac{1}{n}}\sum_{j=1}^{n}\textit{DTW}(\bm{x}_{j},\bm{y}_{j})$ (10)

In [8], MDTW was applied in human activity recognition based on the use of the DTW to compute a distance for each dimension, with the resulting vector of distances treated as a feature vector. A dimensionality reduction algorithm such as principal component analysis (PCA) is then applied to this feature vector and a classifier such as a support vector machine (SVM) is used to predict the correct class. This method therefore differs fundamentally from the aggregation-based approaches described above.

$\displaystyle\textit{DTW}_{I}(\bm{X},\bm{Y})=\textit{PCA}\left(\begin{array}{c}\textit{DTW}(\bm{x}_{1},\bm{y}_{1})\\ \vdots\\ \textit{DTW}(\bm{x}_{j},\bm{y}_{j})\\ \vdots\\ \textit{DTW}(\bm{x}_{n},\bm{y}_{n})\\ \end{array}\right)$ (11)

4.2 Dependent dynamic time warping ($\textit{DTW}_{D}$)

This subsection describes the $\textit{DTW}_{D}$ method and presents some approaches in which it is used. Under dependent DTW Eq. (12), multivariate time series are treated as single series with $n$-dimensional vectors. In this case, only a single warping is conducted. This warping requires a cost function, $\delta(\bm{x}_{i},\bm{y}_{l})$, that can compare vectors of values; thus, $\delta(\bm{x}_{i},\bm{y}_{l})$ is multivariate, unlike the univariate $d(x_{ij},y_{lj})$ described in Section 4.1 [30].

$\displaystyle\textit{DTW}_{D}(\bm{X},\bm{Y})=\textit{DTW}(\bm{X},\bm{Y}),\text{ with }\delta(\bm{x}_{i},\bm{y}_{l})$ (12)

The $p$ -norm Eq. (13) is the cost function for $\textit{DTW}_{D}$ that appears most frequently in current research. It is sometimes extended by a weighting factor $c_{j}$ to adjust the interrelations of the variables. For $p=$ 2 and $c_{j}=$ 1, Eq. (13) becomes the Euclidean distance. All of the following approaches employ some variation of the $p$ -norm.

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=\left(\sum_{j=1}^{n}{c_{j}|x_{ij}-y_{lj}|^{p}}\right)^{{\textstyle\frac{1}{p}}}$ (13)
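A minimal sketch of $\textit{DTW}_{D}$ with the weighted $p$-norm of Eq. (13) as the local cost is given below. Here the data layout differs from the independent case: each series is a list of time steps, and every time step is a vector of $n$ values. The function names, the default weights $c_{j}=1$, and the toy data are assumptions made for illustration.

```python
from math import inf


def p_norm_cost(x, y, p=2.0, c=None):
    """Eq. (13): weighted p-norm between two n-dimensional samples."""
    if c is None:
        c = [1.0] * len(x)  # assumption: equal weights
    return sum(c_j * abs(x_j - y_j) ** p for c_j, x_j, y_j in zip(c, x, y)) ** (1.0 / p)


def dtw_dependent(X, Y, p=2.0, c=None):
    """Eq. (12): a single warping path over vectors of values.

    X and Y are lists of time steps; X[i] is the vector observed at time i.
    """
    t, m = len(X), len(Y)
    acc = [[inf] * (m + 1) for _ in range(t + 1)]
    acc[0][0] = 0.0
    for i in range(1, t + 1):
        for l in range(1, m + 1):
            cost = p_norm_cost(X[i - 1], Y[l - 1], p, c)
            acc[i][l] = cost + min(acc[i - 1][l], acc[i][l - 1], acc[i - 1][l - 1])
    return acc[t][m]


X = [(0.0, 0.0), (3.0, 4.0)]
Y = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
print(dtw_dependent(X, Y, p=2.0))                # Euclidean local cost
print(dtw_dependent(X, Y, p=1.0))                # 1-norm local cost
print(dtw_dependent(X, Y, p=2.0, c=[1.0, 0.0]))  # second variable switched off
```

Setting a weight $c_{j}$ to zero removes variable $j$ from the alignment entirely, which shows how the weights steer both the warping path and the resulting distance.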

In [38], DTW was used as a tool to match movement patterns. In this study, such patterns were represented as sequences of feature vectors, and the distance between the $i$-th feature vector of the reference pattern and the $j$-th feature vector of the test pattern was calculated to produce a $t$-by-$n$ matrix in which univariate DTW was performed. Although the authors referred to weighting of the dimensions in $\delta(\bm{x}_{i},\bm{y}_{l})$, they did not provide an explicit value for $\delta$.

In [39], multivariate time series were treated as sequences of feature vectors, in a manner similar to that in [38]. Stating that in principle any $p$-norm Eq. (13) can be used to calculate the distance between two feature vectors, the authors used the $1$-norm, i.e., the sum of the absolute differences in all dimensions. As a preparation step, each dimension was normalized to zero mean and unit variance, rendering all of the dimensions comparable.

In [40], an algorithm was proposed for similarity search in trajectories and archival data. This paper was the first to extend DTW to cases of greater than three dimensions and applied the $p$ -norm Eq. (13).

In [41], the dimensionality of DTW was expanded by indexing multidimensional time series for the task of efficient retrieval and analysis of trajectory similarities. These authors described an extension from the classical one dimensional DTW to a two dimensional search space to obtain optimal alignment using the $p$ -norm Eq. (13).

In [42], the approach of [39] was applied by using the $1$ -norm to measure image texture similarity based on an assessment of the texture similarities of images with structured textures.

In [43], an approach similar to that in [38, 39] was presented in which time series were treated as sequences of feature vectors. As had been demonstrated in [39], the Euclidean distance with weights $c_{j}$ for each dimension Eq. (13) was used to calculate the distances between the test and the class templates. The cosine correlation coefficient and a weight vector were also used in this method.

In [30], the two possible types of MDTW discussed in Section 4 were recognized, but $\textit{DTW}_{D}$ was identified as more promising for the analysis of radiometric series in satellite image time series. In this study, vectors were compared based on the $\delta$ obtained from the Euclidean distance Eq. (13).

In [44], the DTW algorithm uWave for gesture recognition using a three-axis accelerometer was developed. In this method, a feature vector with three elements per time sample corresponding to the components of acceleration along the three spatial axes is used. The algorithm employs the Euclidean distance Eq. (13) for matching quantized time series of acceleration; as all three dimensions represent acceleration signals, no weight vector is needed.

In [16], a measure was proposed for the discrepancy $\delta(\bm{x}_{i},\bm{y}_{l})$ between pairs of multidimensional time points, $\bm{x}_{i}$ and $\bm{y}_{l}$ . After applying this distance measure, time warping is conducted. The Euclidean distance Eq. (13) is used to calculate the distances between feature vectors, which are combined with global alignment kernels to find a global alignment solution.

In [45], a DTW-based algorithm was presented to classify any $n$ -dimensional signal and automatically compute a classification threshold. In this method, the Euclidean distance Eq. (13) is used to generalize the DTW to the multivariate case.

In [46], regular and derivative DTW (DDTW) were combined into one parametric distance measure, allowing the contributions of the two respective methods to be defined individually for any data set. In this method, it is assumed that multivariate time series are one-dimensional trajectories in an $n$ -dimensional Euclidean space, and the Euclidean distance Eq. (13) is used to calculate the distances between feature vectors. Using the derivative dynamic time series distance between two multivariate time series defined as $\textit{DDTW}(\bm{X},\bm{Y})=\textit{DTW}(\bm{X}^{\prime},\bm{Y}^{\prime})$ , a convex combination of the distances DTW and DDTW can be calculated as

$\displaystyle\textit{DD}_{\textit{DTW}}(\bm{X},\bm{Y})=(1-\alpha)\textit{DTW}_{D}(\bm{X},\bm{Y})+\alpha\textit{DDTW}(\bm{X},\bm{Y})$ (14)
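The convex combination in Eq. (14) can be sketched as follows. The derivative $\bm{X}^{\prime}$ is approximated here by a plain first difference, which is an assumption made for brevity and not necessarily the estimator used in [46]; the function names are likewise hypothetical.

```python
from math import inf, sqrt


def dtw_dependent(X, Y):
    """Single-path DTW over vectors with a Euclidean local cost."""
    t, m = len(X), len(Y)
    acc = [[inf] * (m + 1) for _ in range(t + 1)]
    acc[0][0] = 0.0
    for i in range(1, t + 1):
        for l in range(1, m + 1):
            cost = sqrt(sum((x_j - y_j) ** 2 for x_j, y_j in zip(X[i - 1], Y[l - 1])))
            acc[i][l] = cost + min(acc[i - 1][l], acc[i][l - 1], acc[i - 1][l - 1])
    return acc[t][m]


def first_difference(X):
    """Assumed derivative estimate: difference of consecutive samples, per variable."""
    return [tuple(b - a for a, b in zip(X[k], X[k + 1])) for k in range(len(X) - 1)]


def dd_dtw(X, Y, alpha):
    """Eq. (14): (1 - alpha) * DTW on the raw series + alpha * DTW on the derivatives."""
    raw = dtw_dependent(X, Y)
    derivative = dtw_dependent(first_difference(X), first_difference(Y))
    return (1.0 - alpha) * raw + alpha * derivative


# Y equals X shifted by a constant offset: same shape, different level.
X = [(0.0,), (1.0,), (2.0,)]
Y = [(1.0,), (2.0,), (3.0,)]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, dd_dtw(X, Y, alpha))
```

In this toy case the raw distance is nonzero while the derivative distance vanishes, so increasing $\alpha$ shifts the measure from comparing levels toward comparing shapes.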

In [47], DTW was applied in a clinical test to measure balance and mobility. In this case, the data were generated by a wearable inertial sensor unit, and the Euclidean distance Eq. (13) was used to generalize to the multivariate case.

As previously stated in Section 4.1, two studies [31, 36] applied both $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$, with the Euclidean distance Eq. (13) used for $\textit{DTW}_{D}$.

In [48], a new lip-reading system was presented based on the classification of lip geometry features using a template probabilistic multidimensional DTW approach. In the opinion of the authors, it would have been possible in principle to apply DTW to each feature separately and subsequently select the class with the shortest distance, but in their use-case this did not yield satisfactory results; therefore, a new MDTW distance designed specifically for lip-reading was developed, Eq. (15):

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=\sum_{j=1}^{n}{\left|{\textstyle\frac{t\cdot x_{ij}}{\sum_{i=1}^{t}{x_{ij}}}}-{\textstyle\frac{m\cdot y_{lj}}{\sum_{l=1}^{m}{y_{lj}}}}\right|},\quad 1\leqslant i\leqslant t,\;1\leqslant l\leqslant m$ (15)

The distance function used in this study differed significantly from those used in other approaches in that it was highly tuned to the use-case of classifying lip geometry features; correspondingly, it is not further considered in this paper.

As mentioned in Section 4.1, both $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$ were also investigated in [37], with the DTW algorithm applied to the sum of the dimensions Eq. (16) for single-character recognition under $\textit{DTW}_{D}$ :

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=\sum_{j=1}^{n}x_{ij}-\sum_{j=1}^{n}y_{lj}$ (16)

In [49], a new ensemble classifier was proposed based on the DTW, and a method for combining information from time series extracted from multiple sensors was demonstrated. In this method, the signal vector magnitude (SVM) for the three-dimensional case is calculated and used to generalize to Eq. (17) for the $n$-dimensional case:

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=\sqrt{\sum_{j=1}^{n}x^{2}_{ij}}-\sqrt{\sum_{j=1}^{n}y^{2}_{lj}}$ (17)
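The following sketch contrasts the local costs of Eq. (16) and Eq. (17) with the Euclidean distance on a single pair of samples. Both equations are written above as signed differences; taking the absolute value so that the result can serve as a non-negative DTW cost is an assumption of this sketch, as are the function names.

```python
from math import sqrt


def sum_cost(x, y):
    """Eq. (16), with an assumed absolute value: difference of the summed variables."""
    return abs(sum(x) - sum(y))


def magnitude_cost(x, y):
    """Eq. (17), with an assumed absolute value: difference of the vector magnitudes."""
    return abs(sqrt(sum(v * v for v in x)) - sqrt(sum(v * v for v in y)))


def euclidean_cost(x, y):
    """Eq. (13) with p = 2 and c_j = 1, for comparison."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two samples whose variables are swapped: same sum, same magnitude.
x = (3.0, 4.0)
y = (4.0, 3.0)
print(sum_cost(x, y))
print(magnitude_cost(x, y))
print(euclidean_cost(x, y))
```

Both aggregated costs are zero for this pair although the samples differ in every variable, whereas the Euclidean cost is not; collapsing the variables before comparison hides which variable contributes to a difference.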

In [50], an MDTW measure was proposed based on the Mahalanobis distance Eq. (18) for use in data-driven fault diagnosis. A metric learning algorithm was used to learn the static feature vectors in measurement signals to obtain the Mahalanobis distance over the feature space. In this paper, the authors discussed the Euclidean distance approach described earlier in this section and noted that this method fails to assign different weights to each variable. According to them, the assumption that every variable is equally important does not hold for process monitoring and fault diagnosis. Their approach was further developed in [51].

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=(\bm{x}_{i}-\bm{y}_{l})^{T}\bm{M}(\bm{x}_{i}-\bm{y}_{l})$ (18)
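The quadratic form of Eq. (18) can be sketched as below. In [50] the matrix is learned with a metric learning algorithm; here it is simply set by hand to a diagonal matrix of inverse squared scales, which is an assumption for illustration only, and the function name is hypothetical.

```python
def mahalanobis_cost(x, y, M):
    """Eq. (18): (x - y)^T M (x - y) for two n-dimensional samples.

    M is an n-by-n matrix given as a list of rows; it should be positive
    semi-definite so that the cost is non-negative.
    """
    d = [x_j - y_j for x_j, y_j in zip(x, y)]
    n = len(d)
    return sum(d[r] * M[r][c] * d[c] for r in range(n) for c in range(n))


# Two variables with very different magnitudes (scales assumed to be 1 and 100).
x = (1.0, 100.0)
y = (2.0, 300.0)

identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[1.0, 0.0], [0.0, 1.0 / 100.0 ** 2]]  # hand-picked, not learned

print(mahalanobis_cost(x, y, identity))  # squared Euclidean distance
print(mahalanobis_cost(x, y, scaled))    # second variable down-weighted by its scale
```

With the identity matrix the large-magnitude variable dominates the cost; a suitable matrix rebalances the variables, and its off-diagonal entries can additionally encode relations between them.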

4.3 Integrated approaches or combinations of $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$

In [52], DTW approaches were divided into methods employing early and late fusion of signals, a classification corresponding to $\textit{DTW}_{D}$ and $\textit{DTW}_{I}$, respectively. A hybrid approach that combines the advantages of $\textit{DTW}_{D}$ and $\textit{DTW}_{I}$ was designed but found to be limited to three dimensions. The authors claimed to have developed the first approach using a search space of greater than two dimensions to align time series. In this method, $\textit{DTW}_{D}$ and $\textit{DTW}_{I}$ are combined by calculating the squared Euclidean distance over the features of each of the two modalities and adding the two terms, with the second term weighted by the factor $w$, Eq. (19):

$\displaystyle\delta(\bm{x}_{i},\bm{y}_{l})=\sum_{j=1}^{n}{(x_{ij}-y_{lj})^{2}}+w\cdot\sum_{j=1}^{\tilde{n}}{(\tilde{x}_{ij}-\tilde{y}_{lj})^{2}}$ (19)
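A minimal sketch of the local cost in Eq. (19) follows. Only the cost for one pair of samples is shown, not the alignment procedure of [52]; the argument names `x_tilde` and `y_tilde` for the second modality and the toy values are hypothetical.

```python
def hybrid_cost(x, y, x_tilde, y_tilde, w):
    """Eq. (19): squared distance of modality 1 plus w times that of modality 2."""
    first = sum((a - b) ** 2 for a, b in zip(x, y))
    second = sum((a - b) ** 2 for a, b in zip(x_tilde, y_tilde))
    return first + w * second


# Modality 1 has two features, modality 2 has one feature.
x, y = (1.0, 2.0), (1.0, 4.0)
x_tilde, y_tilde = (0.0,), (3.0,)

print(hybrid_cost(x, y, x_tilde, y_tilde, w=0.0))  # second modality ignored
print(hybrid_cost(x, y, x_tilde, y_tilde, w=0.5))
```

The single factor $w$ balances the two modalities as a whole; it cannot express the influence of an individual variable within a modality.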

In [31], it was discussed how the two primary modalities of MDTW identified in [30] can produce different classifications, with neither dominating the other. In this study, $\textit{DTW}_{I}$ was defined as the cumulative distances of all dimensions independently measured under DTW Eq. (9), while $\textit{DTW}_{D}$ was calculated in the same manner as DTW in the univariate case (compare Section 2.4), except with the cumulative squared Euclidean distance of dimensions Eq. (13) used as a distance measure. A framework was proposed for determining whether $\textit{DTW}_{I}$ or $\textit{DTW}_{D}$ should be used for best results in classification. This framework was used in [53] for C. elegans search behavior analysis and was further refined in [36].

In [4], a fundamentally different approach was presented, one that cannot be classified into either $\textit{DTW}_{I}$ or $\textit{DTW}_{D}$. Stating that the correlations between variables carry the real information obtainable through multivariate comparison, the authors developed an algorithm called correlation-based dynamic time warping (CBDTW). This approach applies PCA segmentation (see [54]), in which multivariate time series are segmented based on PCA-related costs, where the covariance matrices of the segments, $\bm{F}_{i}$, are calculated as follows:

$\displaystyle\bm{F}_{i}={\textstyle\frac{1}{b_{i}-a_{i}}}\sum_{k=a_{i}}^{b_{i}}(\bm{x}_{k}-\bm{v}_{i})(\bm{x}_{k}-\bm{v}_{i})^{T}$ (20)

where $\bm{v}_{i}$ is the mean of the $i$-th segment and $a_{i}$ and $b_{i}$ are its start and end points, respectively. The covariance matrix is decomposed as $\bm{F}_{i}=\bm{U}_{i}\Lambda_{i}\bm{U}_{i}^{T}$, where the columns of $\bm{U}_{i}$ are the eigenvectors and $\Lambda_{i}$ is the diagonal matrix of the corresponding eigenvalues. To compare the PCA segments of two multivariate time series, the Krzanowski similarity measure Eq. (21) is used, in which $\bm{U}_{i,p}$ and $\bm{U}_{j,p}$ contain the first $p$ principal components of the two segments:

$\displaystyle S_{\textit{PCA}}={\textstyle\frac{1}{p}}\textit{trace}(\bm{U}_{i,p}^{T}\bm{U}_{j,p}\bm{U}_{j,p}^{T}\bm{U}_{i,p})$ (21)
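Eqs. (20) and (21) can be sketched with NumPy as follows. The sketch covers only the comparison of two given segments, not the segmentation or the warping of [4]; the function names and the toy segments are assumptions, and $\bm{U}_{i,p}$ is taken to hold the $p$ eigenvectors with the largest eigenvalues.

```python
import numpy as np


def segment_covariance(segment):
    """Eq. (20): covariance of one segment (rows are time steps a_i..b_i)."""
    X = np.asarray(segment, dtype=float)
    centered = X - X.mean(axis=0)               # subtract the segment mean v_i
    return centered.T @ centered / (len(X) - 1)  # len(X) - 1 equals b_i - a_i


def principal_axes(cov, p):
    """Columns are the p eigenvectors of cov with the largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1][:p]
    return eigenvectors[:, order]


def pca_similarity(U_i, U_j):
    """Eq. (21): trace(U_i^T U_j U_j^T U_i) / p."""
    p = U_i.shape[1]
    return float(np.trace(U_i.T @ U_j @ U_j.T @ U_i) / p)


# Segment A: both variables rise together; segment B: they move in opposite directions.
A = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
B = [[0.0, 0.0], [1.0, -1.0], [2.0, -2.0]]

U_a = principal_axes(segment_covariance(A), p=1)
U_b = principal_axes(segment_covariance(B), p=1)
print(pca_similarity(U_a, U_a))  # identical correlation structure
print(pca_similarity(U_a, U_b))  # orthogonal principal directions
```

The measure depends only on the directions of the principal axes, not on the magnitudes or units of the variables. If a cost is needed for warping, one option is $1-S_{\textit{PCA}}$, which is an assumption of this note.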

In [29], a method called deep canonical time warping (DCTW) was proposed in which multiple sequences are aligned to discover complex hierarchical representations. DTW-based temporal alignment methods are extended to encompass heterogeneous collections of features that can be connected via non-linear hierarchical mappings. This method was the first deep learning approach to temporal alignment and thus represented a fundamental break with earlier approaches.

5. Discussion

Section 4 presented what the authors believe to be the most relevant publications regarding MDTW. The methodologies examined in this section can be classified as either $\textit{DTW}_{I}$ , $\textit{DTW}_{D}$ , or integrated approaches. As the goal of this paper is to assess these generic MDTW approaches with a focus on automotive applications, requirements for working with vehicle data (Section 3.2) from, e.g., the CAN-bus will be necessary for a qualitative comparison of these approaches. Based on these requirements, the most promising approaches for this application domain can then be recommended. A two-phase comparison is applied: in the first phase, requirements 1(a) and 1(b) from Section 3.1 are used to compare research directions; in the second phase, requirements 2(a) and 2(b) are used to compare specific approaches.

Table 2
Comparison of MDTW research directions in terms of requirements 1(a) and 1(b)

	1(a) Time dependency of variables	1(b) Interpretability of the system
$\textit{DTW}_{I}$	Suitable for systems in which the interrelations between the variables are time-independent.	The warping of $\textit{DTW}_{I}$ is traceable because there is a warping path for each dimension that can be supervised.
$\textit{DTW}_{D}$	Suitable for systems in which the interrelations between variables are time-dependent.	Since the warping is conducted over multiple variables, the “correct” warping path cannot be supervised easily.
Integrated approaches	Suitable for systems in which the interrelations between variables are time-dependent [52, 4, 29]. However, [31] is suitable for time-independent and time-dependent variables, as this approach combines $\textit{DTW}_{I}$ and $\textit{DTW}_{D}$ .	With [52, 31], the warping is traceable, but with [4, 29], it is problematic.

Table 2 shows a comparison of the research directions for $\textit{DTW}_{I}$, $\textit{DTW}_{D}$, and the integrated approaches in terms of requirements 1(a) and 1(b). It is striking that $\textit{DTW}_{I}$ is preferable to $\textit{DTW}_{D}$ and the integrated approaches only when the variables are loosely interrelated in the time domain, because warping each variable separately discards these interrelations.

Table 3

Comparison of $\textit{DTW}_{I}$ approaches in terms of requirements 2(a) and 2(b)

$\textit{DTW}_{I}$	2(a) Variables with different magnitudes and units	2(b) Variables with different influence	Application examples
[35]	Yes, the variables are normalized.	The influence of the variables on the distance measure can be determined with the weighting.	Robot arm, shuttle
[36]	No, the variables are not normalized.	As there is no weighting, an equal influence is assumed.	Gesture recognition (acceleration), human activity recognition
[37]	Yes, the variables are normalized.	As there is no weighting, an equal influence is assumed.	Word recognition (biometric smart pen)
[8]	Yes, PCA is invariant in magnitude and units.	The influence of the variables cannot be detected using PCA owing to the transformation into principal components.	Human activity recognition based on smartphone data (acceleration and angular velocity)

From this perspective, $\textit{DTW}_{D}$ and the integrated approaches appear to be more promising for working with vehicle data because they meet requirement 1(a). However, with respect to system interpretability (requirement 1(b)), the actions of $\textit{DTW}_{I}$ are easier to trace than the actions of either $\textit{DTW}_{D}$ or the integrated approaches.

Table 4

Comparison of $\textit{DTW}_{D}$ approaches in terms of requirements 2(a) and 2(b)

$\textit{DTW}_{D}$	2(a) Variables with different magnitudes and units	2(b) Variables with different influence	Application examples
[38, 39, 40, 41, 42, 43, 30, 44, 16, 45, 46, 47, 36]	Yes, the weighting factors $c_{j}$ compensate for different magnitudes and units.	Yes, the influence can be determined with the weighting factor $c_{j}$ . However, $c_{j}$ is also influenced by the different variable magnitudes.	Human activity and gesture recognition, multi-sensor fusion
[37]	No, the variables from different dimensions are calculated together and not normalized.	No, as all variables are summed, the influence of individual variables cannot be determined.	Measuring the mobility of patients (acceleration, angular velocity)
[49]	No, the variables from different dimensions are calculated together and not normalized.	No, as all variables are summed, the influence of the individual variables cannot be determined.	Human activity recognition (acceleration sensors)
[51]	Yes, the different magnitudes can be compensated for in the Mahalanobis matrix, $\bm{M}$ .	Yes, the Mahalanobis matrix, $\bm{M}$ , shows the relation between the variables.	Robot execution failures.

This increased interpretability occurs under $\textit{DTW}_{I}$ because in this approach each dimension is warped independently and can consequently be supervised individually. By contrast, under $\textit{DTW}_{D}$ and the integrated approaches all dimensions are warped together, resulting in a warping that is more difficult to trace and therefore less comprehensible to the user.

This first comparison has shown that all three research directions are in principle suitable for working with vehicle data, with efficacy varying depending on the emphasis placed on requirement 1(a) or 1(b). Now follows a detailed qualitative comparative analysis of the respective approaches.

Table 3 shows a comparison of the $\textit{DTW}_{I}$ approaches in terms of requirements 2(a) and 2(b). The approaches from [35, 37, 8] are all designed to manipulate variables with different magnitudes and units, fulfilling requirement 2(a). However, the approach of [36] does not fulfill this requirement because there is no normalization or weighting of the variables, which makes it rather difficult to work with varying magnitudes. The only $\textit{DTW}_{I}$ approach that fulfills requirement 2(b) is the one in [35], as it can determine the influence of variables via weighting. Based on this comparison, the approach of [35] is the most promising $\textit{DTW}_{I}$ approach for meeting automotive requirements.

Table 4 shows a comparison of the $\textit{DTW}_{D}$ approaches in terms of requirements 2(a) and 2(b). The $p$-norm approaches ([38, 39], etc.) and the approach of [51] fulfill requirement 2(a) by, respectively, introducing a weighting factor $c_{j}$ or compensating for the different magnitudes in the Mahalanobis matrix, $\bm{M}$. By contrast, the approaches in [37, 49] do not fulfill requirement 2(a) because variables from different dimensions are calculated together and not normalized, making these approaches prone to overweighting the variables of the highest magnitude.

This characteristic also causes these two approaches to fail requirement 2(b); because they sum all variables, the influence of a single variable cannot be determined. However, the $p$ -norm approaches ([38, 39], etc.) and the approach of [51] do fulfill requirement 2(b). The weighting factor $c_{j}$ in the $p$ -norm approaches can be used as an indicator for the influence of the different variables on the distance measurement. However, because the weighting factor is also used to compensate for different magnitudes, it can be challenging to separate these two effects. Nevertheless, the weighting factor approach is the most cited approach in the literature, which indicates its wide range of applications. The approach of [51] also fulfills requirement 2(b), as the Mahalanobis matrix reveals the relations between variables. Based on this comparison, it appears that the $p$ -norm approaches ([38, 39], etc.) and the approach of [51] are the most suitable $\textit{DTW}_{D}$ approaches to fulfilling automotive requirements.

Table 5 shows a comparison of the integrated approaches in terms of requirements 2(a) and 2(b). The approaches of [4, 29] are the only integrated approaches that fulfill requirement 2(a). Because PCA is magnitude- and unit-invariant, the former approach has no problem handling variables of different magnitudes and units. In the latter approach, the variables are transformed into a feature space that is unit- and magnitude-invariant. The approach of [52] and the $\textit{DTW}_{D}$ approach in [31] do not normalize the variables and therefore do not fulfill requirement 2(a). Requirement 2(b) tends to be a problem for all of the integrated approaches.

Table 5

Comparison of integrated approaches in terms of requirements 2(a) and 2(b)

Integrated approaches	2(a) Variables with different magnitudes and units	2(b) Variables with different influence	Application examples
[52]	No, the variables are not normalized.	Partial; because there is only one weighting factor, the overall influence of the variables cannot be determined.	Speech/gesture recognition
[31]	No, the variables in $\textit{DTW}_{D}$ are not normalized.	Partial; only in the case of $\textit{DTW}_{I}$ is there a weight factor, which allows the influence to be determined.	Gesture recognition (acceleration), human activity recognition
[4]	Yes, PCA is invariant to magnitudes and units.	No, the influence of the variables cannot be detected using PCA.	Fault detection (gas turbine), drunk driving detection
[29]	Yes, the variables are transformed into a feature space.	No, owing to use of a neural network, the interpretability is limited.	Alignment of acoustic recordings

By definition, these approaches are more complex, making it more difficult to trace the influence of different variables. The weighting factor in [52] can solve this problem partially but, as there is only one weighting factor, this solution is limited. In the approach of [31], only $\textit{DTW}_{I}$ incorporates a weight factor, which also partially solves this problem. The method of [4] uses a PCA approach, making it impossible to trace the influence of the variables owing to the transformation into principal components. The use of a neural network in [29] leads to a low degree of interpretability. In general, neural networks require high numbers of observations to avoid overfitting. In some automotive applications (compare Section 3.1), relatively few observations are available, which makes the approach of [29] less suitable for these applications. As, according to Table 1, the number of observations ranges from 20 to 500, this approach is not further considered. Apart from that used in [29], the only approach that fulfills requirement 2(a) is the one presented in [4]; this approach is therefore the most promising of the integrated approaches. However, requirement 2(b) cannot be fulfilled under this approach.

To summarize the discussion in this section, for automotive applications the most promising $\textit{DTW}_{I}$ approach is that presented in [35], while the most promising $\textit{DTW}_{D}$ approaches are the $p$-norm approaches ([38, 39], etc.) and the approach of [51]. Among the integrated approaches, the approach of [4] is recommended. An overview of the formulas used in the most promising approaches is given in Table 6.

Table 6

Overview of distance functions used in the most promising approaches

$\textit{DTW}_{I}$	$\textit{DTW}_{I}(\bm{X},\bm{Y})=\sqrt{\sum_{j=1}^{n}\left({\textstyle\frac{\textit{DTW}(\bm{x}_{j},\bm{y}_{j})}{\bm{\sigma}_{j}}}\cdot\bm{W}_{j}\right)^{2}}$ [35]
$\textit{DTW}_{D}$	$\delta(\bm{x}_{i},\bm{y}_{l})=\left(\sum_{j=1}^{n}{c_{j}|x_{ij}-y_{lj}|^{p}}\right)^{{\textstyle\frac{1}{p}}}$ [38, 39, 40, 41, 42, 43, 30, 44, 16, 45, 46, 47, 36]
	$\delta(\bm{x}_{i},\bm{y}_{l})=(\bm{x}_{i}-\bm{y}_{l})^{T}\bm{M}(\bm{x}_{i}-\bm{y}_{l})$ [51]
Integrated approaches	$\bm{F}_{i}={\textstyle\frac{1}{b_{i}-a_{i}}}\sum_{k=a_{i}}^{b_{i}}(\bm{x}_{k}-\bm{v}_{i})(\bm{x}_{k}-\bm{v}_{i})^{T}$, $S_{\textit{PCA}}={\textstyle\frac{1}{p}}\textit{trace}(\bm{U}_{i,p}^{T}\bm{U}_{j,p}\bm{U}_{j,p}^{T}\bm{U}_{i,p})$ [4]

6. Conclusion and future work

The goal of this paper was to review the current approaches in MDTW and to identify the most appropriate of these with respect to automotive applications. The paper began with a discussion of the need for an MDTW approach in the automotive domain. Subsequently, the current research directions in terms of generic MDTW approaches were discussed, followed by a classification of approaches and an assessment of the respective approaches with regard to automotive applications. Although no approach specifically designed to handle vehicle data could be found, four existing approaches that fulfill the defined automotive requirements were identified.

A core insight of this literature review is that no approach currently exists that fulfills all of the requirements of automotive applications. There is a high demand from industry, where a massive amount of time series data is created. Although the research on MDTW is relatively new, three main research directions have evolved. The first one, $\textit{DTW}_{I}$, is suitable for systems in which the variables are independent of each other in the time domain. However, this assumption holds for only a few dynamic systems. Since a separate warping path is calculated for every dimension, the paths can easily be supervised. The second one, $\textit{DTW}_{D}$, is suitable for dynamic systems in which the variables depend on each other in the time domain. This assumption holds for the majority of dynamic systems. However, conducting a single warping over all variables makes the warping hard to trace. Since the integrated approaches (the third direction) differ greatly from each other, the influence of time dependency and interpretability is highly dependent on the respective approach. The $p$-norm ($\textit{DTW}_{D}$) has been widely adopted in the research community, while other approaches are designed for a special use-case. Correlation-based dynamic time warping [4], for example, is suitable for high-dimensional data with strong interrelations, where the interrelations do not have to be explicitly modeled, and is therefore promising for automotive applications.

Possibilities for future work lie in an experimental comparison of the MDTW approaches on an open source database with automotive time series data to achieve deeper insights for the user. Furthermore, it is necessary to analyze how well the MDTW approaches handle high-dimensional data and highly noisy data without filtering. This can indicate the use-cases for which the MDTW approaches are also promising. In the next step, the applicability for these use-cases must be evaluated with experimental data. In addition, the data preparation that is necessary for MDTW must be analyzed in depth, and the possibility of combining the MDTW approaches with other machine learning methods in order to create a hybrid approach should be explored.

References

Aghabozorgi

Seyed Shirkhorshidi

and Ying Wah

, Time-series clustering – a decade review, Inform Syst 53 (2015), 16–38. doi: 10.1016/j.is.2015.04.007.

Warren Liao

, Clustering of time series data – a survey, Pattern Recogn 38(11) (2005), 1857–1874. doi: 10.1016/j.patcog.2005.01.025.

Xing

Pei

and Keogh

, A brief survey on sequence classification, ACM SIGKDD Explorations Newsletter 12(1) (2010), 40. doi: 10.1145/1882471.1882478.

Bankó

and Abonyi

, Correlation based dynamic time warping of multivariate time series, Expert Syst Appl 39(17) (2012), 12814–12823.

Xing

Pei

and Yu

P.S.

, Early prediction on time series: A nearest neighbor approach, IJCAI International Joint Conference on Artificial Intelligence, 2009, pp. 1297–1302.

Aggarwal

C.C.

and Reddy

C.K.

, Data clustering: Algorithms and applications. Boca Raton Fla. u.a.: CRC Press, 2014.

Murphy

K.P.

, Machine learning: A probabilistic perspective, The MIT Press, Massachusetts, 2012.

Seto

Zhang

and Zhou

, Multivariate time series classification using dynamic time warping template selection for human activity recognition, IEEE SSCI, 2015, pp. 1399–1406. doi: 10.1109/SSCI.2015.199.

Stella

and Amer

, Continuous time bayesian network classifiers, J. Biomed. Inform 45(6) (2012), 1108–1119. doi: 10.1016/j.jbi.2012.07.002.

10.

Wit

van den Heuvel

and Romeijn

J.-W.

, All models are wrong…: an introduction to model uncertainty. Statistica Neerlandica 66(3) (2012), 217-236. DOI: 10.1111/j.1467-9574.2012.00530.x.

11.

Wang

Mueen

Ding

Trajcevski

Scheuermann

and Keogh

, Experimental comparison of representation methods and distance measures for time series data, Data Min Knowl Disc 26(2) (2013), 275–309. doi: 10.1007/s10618-012-0250-5.

12.

Izakian

Pedrycz

and Jamal

, Fuzzy clustering of time series data using dynamic time warping distance, Eng Appl Artif Intel. 39 (2015), 235–244. doi: 10.1016/j.engappai.2014.12.015.

13.

Keogh

and Ratanamahatana

C.A.

, Exact indexing of dynamic time warping, Knowl Inf Syst 7(3) (2005), 358–386. doi: 10.1007/s10115-004-0154-9.

14.

Berndt

D.J.

and Clifford

, Using dynamic time warping to find patterns in time series, KDD Workshop, 1994, 359–370.

15.

Cao

and Liu

, Research on dynamic time warping multivariate time series similarity matching based on shape feature and inclination angle, J Cloud Comp 5(1) (2016), 155. doi: 10.1186/s13677-016-0062-z.

16.

Kale

D.C.

Gong

Che

Liu

Medioni

Wetzel

and Ross

, An examination of multivariate time series hashing with applications to health care, IEEE ICDM, 2014, pp. 260–269. doi: 10.1109/ICDM.2014.153.

17.

Arbelaitz

Gurrutxaga

Muguerza

Pérez

J.M.

and Perona

, An extensive comparative study of cluster validity indices, Pattern Recogn 46(1) (2013), 243–256. doi: 10.1016/j.patcog.2012.07.021.

18.

Rabiner

L.R.

and Juang

B.-H.

, Fundamentals of speech recognition. Delhi: Pearson, 1993.

19.

Lohrer

and Lienkamp

, Building representative velocity profiles using fastdtw and spectral clustering, ITST, 2015, pp. 45–49. doi: 10.1109/ITST.2015.7377398.

20.

Johnson

D.A.

and Trivedi

M.M.

, Driving style recognition using a smartphone as a sensor platform, IEEE ITSC 2015, pp. 1609–1615. doi: 10.1109/ITSC.2011.6083078.

21.

Chandrasekaran

Varshavsky

Gruteser

Martin

R.P.

Yang

and Chen

, Tracking vehicular speed variations by warping mobile phone signal strengths, IEEE PerCom, 2011, pp. 213–221. doi: 10.1109/PERCOM.2011.5767589.

22. Tin T.T., Hien N.T. and Vinh V.T., Measuring similarity between vehicle speed records using dynamic time warping, KSE (2015), 168–173. doi: 10.1109/KSE.2015.69.

23. Zheng and Hansen J.H.L., Lane-change detection from steering signal using spectral segmentation and learning-based classification, IEEE Trans Intell Veh 2(1) (2017), 14–24. doi: 10.1109/TIV.2017.2708600.

24. Zhang, Zhao and Yuan, A vehicle speed estimation algorithm based on dynamic time warping approach, IEEE Sensors J 17(8) (2017), 2456–2463. doi: 10.1109/JSEN.2017.2672735.

25. Sarin, Kokkolaras, Hulbert, Papalambros, Barbat and Yang R.-J., A comprehensive metric for comparing time histories in validation of simulation models with emphasis on vehicle safety applications, ASME, 2008, pp. 1275–1286. doi: 10.1115/DETC2008-49669.

26. Zhen, Wang and Ball A.D., Fault diagnosis of motor drives using stator current signal analysis based on dynamic time warping, Mech Syst Signal Pr 34(1-2) (2013), 191–202. doi: 10.1016/j.ymssp.2012.07.018.

27. Wong, Deguchi, Ide and Murase, Single camera vehicle localization using surf scale and dynamic time warping, IEEE IV 1 (2014), 681–686. doi: 10.1109/IVS.2014.6856545.

28. Taylor, Zhou, Rouphail N.M. and Porter R.J., Method for investigating intradriver heterogeneity using vehicle trajectory data: A dynamic time warping approach, Transport Res B-Meth 73 (2015), 59–80. doi: 10.1016/j.trb.2014.12.009.

29. Trigeorgis, Nicolaou, Zafeiriou and Schuller, Deep canonical time warping for simultaneous alignment and representation learning of sequences, IEEE TPAMI 40(5) (2018), 1128–1138.

30. Petitjean, Inglada and Gancarski, Satellite image time series analysis under time warping, IEEE Trans Geosci Remote Sensing 50(8) (2012), 3081–3095. doi: 10.1109/TGRS.2011.2179050.

31. Shokoohi-Yekta, Wang and Keogh, On the non-trivial generalization of dynamic time warping to the multi-dimensional case, in: SDM, pp. 289–297. doi: 10.1137/1.9781611974010.33.

32. Tapinos and Mendes, A method for comparing multivariate time series with different dimensions, PLoS ONE 8(2) (2013), e54201. doi: 10.1371/journal.pone.0054201.

33. Stettiner, Malah and Chazan, Dynamic time warping with path control and non-local cost, IAPR, 1994, pp. 174–177. doi: 10.1109/ICPR.1994.577150.

34. Ulhas Nair and Sreenivas T.V., Joint decoding of multiple speech patterns for robust speech recognition, IEEE ASRU, 2007, pp. 93–98. doi: 10.1109/ASRU.2007.4430090.

35. Oehmcke, Zielinski and Kramer, KNN ensembles with penalized dtw for multivariate time series imputation, IJCNN, 2016, pp. 2774–2781. doi: 10.1109/IJCNN.2016.7727549.

36. Shokoohi-Yekta, Jin, Wang and Keogh, Generalizing dtw to the multi-dimensional case requires an adaptive approach, Data Min Knowl Disc 31(1) (2017), 1–31. doi: 10.1007/s10618-016-0455-0.

37. Bashir and Kempf, Reduced dynamic time warping for handwriting recognition based on multidimensional time series of a novel pen device, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering 2(9) (2008), 1839–1845.

38. Gavrila D.M. and Davis L.S., Towards 3-d model-based tracking and recognition of human upper body movement: a multi-view approach, Int Workshop on Face and Gesture Recognition, Zurich, 1995. doi: 10.1109/ISCV.1995.477010.

39. Ten Holt G.A., Reinders and Hendriks E.A., Multi-dimensional dynamic time warping for gesture recognition, ASCI 300.

40. Sherkat and Rafiei, On efficiently searching trajectories and archival data for historical similarities, Proc VLDB Endow 1(1) (2008), 896–908. doi: 10.14778/1453856.1453953.

41. Vlachos, Hadjieleftheriou, Gunopulos and Keogh, Indexing multi-dimensional time-series with support for multiple distance measures, ACM SIGKDD, 2003, p. 216. doi: 10.1145/956750.956777.

42. de Mello R.F. and Gondra, Multi-dimensional dynamic time warping for image texture similarity, SBIA 19 (2008), 23–32.

43. Ko M.H., West, Venkatesh and Kumar, Using dynamic time warping for online temporal fusion in multisensor systems, Inform Fusion 9(3) (2008), 370–388. doi: 10.1016/j.inffus.2006.08.002.

44. Liu, Zhong, Wickramasuriya and Vasudevan, uWave: Accelerometer-based personalized gesture recognition and its applications, Pervasive and Mob. Comp 5(6) (2009), 657–675. doi: 10.1016/j.pmcj.2009.07.007.

45. Gillian and Knapp R.B., Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping, NIME, 2011, pp. 337–342.

46. Górecki and Łuczak, Multivariate time series classification with parametric derivative dynamic time warping, Expert Syst Appl 42(5) (2015), 2305–2312. doi: 10.1016/j.eswa.2014.11.007.

47. Al-Jawad, Adame M.R., Romanovas, Hobert, Maetzler, Traechtler, Moeller and Manoli, Using multi-dimensional dynamic time warping for tug test instrumentation with inertial sensors, IEEE MFI, 2012, pp. 212–218. doi: 10.1109/MFI.2012.6343011.

48. Ibrahim M.Z. and Mulvaney D.J., Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping, J Vis Commun Image R 30 (2015), 219–233. doi: 10.1016/j.jvcir.2015.04.013.

49. McGlynn and Madden M.G., An ensemble dynamic time warping classifier with application to activity recognition, in: Bcs Conf Series, pp. 339–352. doi: 10.1007/978-0-85729-130-1_26.

50. Mei, Hou, Karimi H.R. and Huang, A novel data-driven fault diagnosis algorithm using multivariate dynamic time warping measure, Abstr Appl Anal (5) (2014), 1–8. doi: 10.1155/2014/625814.

51. Mei, Liu, Wang Y.-F. and Gao, Learning a Mahalanobis distance-based dynamic time warping measure for multivariate time series classification, IEEE TCYB 46(6) (2016), 1363–1374. doi: 10.1109/TCYB.2015.2426723.

52. Wöllmer, Al-Hames, Eyben, Schuller and Rigoll, A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams, Neurocomputing 73, pp. 366–380. doi: 10.1016/j.neucom.2009.08.005.

53. Wang, Smith, Shao, Huang, Furst, Raicu and Kim, C. elegans search behavior analysis using multivariate dynamic time warping, pp. 1569–1576. doi: 10.1109/BIBM.2016.7822754.

54. Abonyi, Feil, Nemeth and Arva, Principal component analysis based time series segmentation – a new sensor fusion algorithm, IEEE CYBERNETICSCOM.
