Two-dimensional dynamic time warping algorithm for matrices similarity

Abstract

Dynamic time warping (DTW) provides an effective way to measure the similarity between signals of unequal size. However, it cannot directly handle high-dimensional samples such as matrices. Flattening a matrix into a one-dimensional vector as the input to DTW reduces the accuracy of the measure because the position information in the matrix is lost. To address this problem, this paper proposes a two-dimensional dynamic time warping algorithm (2D-DTW) that measures the similarity between matrices directly. In the 2D-DTW algorithm, a three-dimensional distance-cuboid is constructed, and its mapped distance matrix is defined by cutting and compressing the distance-cuboid. Dynamic programming is then used to search for the shortest warping path in the mapped matrix, and the corresponding shortest distance is taken as the similarity measure. The experimental results suggest that the 2D-DTW distance outperforms the traditional Euclidean distance and that its warping alignment mechanism improves the accuracy of the similarity between matrices. The 2D-DTW algorithm extends the application range of traditional DTW and is especially suitable for high-dimensional data.

Keywords

Pattern recognition, similarity, distance, dynamic programming, DTW

1. Introduction

As a significant similarity measure, distance is widely used in many clustering and classification algorithms in machine learning and pattern recognition [6, 8]. Well-known distances, such as the Euclidean, Manhattan, Chebyshev, Minkowski and Hamming distances and the cosine similarity [1], describe similarity from different perspectives. However, these distances share a strict precondition: the samples (generally vectors) must be of the same length. Vectors produced by a feature extraction method usually satisfy this constraint easily. Many raw signals obtained directly from sampling, however, have unequal lengths, for example raw speech time series and gene sequences, so these distances cannot be applied to them. Fortunately, dynamic time warping (DTW) [2] solves this problem and has become one of the important distance measures for vectors of unequal length.

The DTW algorithm locally stretches or compresses the time axes and searches for the optimal alignment of points that maps one time series onto another of different length. For a speech signal, for example, differences in tone and environment are unavoidable even for a single speaker. In other words, almost all speech time series differ, even when they are sampled from the same person speaking the same content. It is therefore impossible for traditional methods to align the similar points of such signals one-to-one. Because time warping aligns sequences nonlinearly, DTW can measure the similarity between such unequal-length time series reliably and thus achieve high recognition accuracy. Owing to the same nonlinear alignment mechanism, our improved 2D-DTW algorithm is superior to traditional distances even for equal-length samples (which is discussed and analyzed in the experimental section).

With the rapid development of intelligent technology, the DTW algorithm has been widely applied in many fields in the last few years, including the recognition of complex sequences in machine vision and bioinformatics. For gait recognition in machine vision, Ahmed et al. proposed a DTW-based kernel and rank-level fusion in 2015 [3]. Zhao et al. achieved region correction for train coupler buffer images in 2017 [14]. For image similarity, Wang et al. proposed a measure of graph similarity based on vertical dimension sequence DTW in 2018 [11]. That is to say, in addition to speech recognition, the DTW algorithm has wide-ranging applications and good identification performance in image processing.

Like many rapidly developing algorithms, DTW continues to develop in theory alongside advances in intelligent technology, and several improved hybrid algorithms based on DTW have been explored. For example, in 2011, Jeong et al. proposed the weighted DTW, which penalizes points with a higher phase difference between a reference point and a testing point [13]. Zhao et al. proposed a shape DTW algorithm in 2018, which tries to match locally similar points and to avoid points with a different neighborhood structure [7]. In 2018, Li et al. proposed a filtering search method to balance matching precision and computational efficiency [15]. In 2020, Li et al. developed an adaptively constrained DTW to adjust the correspondence between two trajectories [5].

As an active similarity measure, DTW provides excellent performance for many distance-based machine learning algorithms such as support vector machines and fuzzy clustering. However, because traditional DTW can only process one-dimensional samples, matrices generally require feature extraction in advance to obtain a one-dimensional vector before the optimization procedure can be carried out. The information lost during feature extraction decreases the measurement accuracy. It is therefore desirable to overcome this limitation and calculate the similarity between matrices directly. Here, we propose a two-dimensional dynamic time warping algorithm that designs a distance-cuboid between matrices and searches for the shortest path with dynamic programming (for convenience, we use the short term 2D-DTW for this new algorithm hereinafter). Moreover, the input data of 2D-DTW are extended from one-dimensional sequences to two-dimensional matrices, which broadens the application range and makes the algorithm especially suitable for high-dimensional data.

2. Related works

The DTW algorithm is a branch of nonlinear programming theory. It studies the relationship between signals of unequal length by using dynamic programming to obtain the shortest cumulative distance as the similarity measure.

Suppose two sequences $S$ and $C$ have lengths $n_{S}$ and $n_{C}$ respectively. Without loss of generality, let $n_{S}\neq n_{C}$ . Obviously, no one-to-one correspondence exists between the elements $s_{i}\in S(i=1,2,\cdots,n_{S})$ and $c_{j}\in C(j=1,2,\cdots,n_{C})$ of the two sequences. To calculate the distance between sequences $S$ and $C$ correctly, the DTW algorithm constructs a distance matrix D0 from the single-point distances between $s_{i}$ and $c_{j}$ . The matrix D0 has size $n_{S}\times n_{C}$ and can be expressed as follows:

$\displaystyle D0=\left[\begin{array}{cccc}d_{1,1}&d_{1,2}&\cdots&d_{1,n_{C}}\\ d_{2,1}&d_{2,2}&\cdots&d_{2,n_{C}}\\ \vdots&\vdots&d_{i,j}&\vdots\\ d_{n_{S},1}&d_{n_{S},2}&\cdots&d_{n_{S},n_{C}}\end{array}\right]$ (1)

The element $d_{i,j}$ of matrix D0 is the pairwise distance between data points $s_{i}(s_{i}\in S,i=1,2,\cdots,n_{S})$ and $c_{j}(c_{j}\in C,j=1,2,\cdots,n_{C})$ , which is generally calculated as the Euclidean distance $d_{i,j}=\sqrt{(s_{i}-c_{j})^{2}}$ (for scalar points this equals $|s_{i}-c_{j}|$ ). Based on the distance matrix D0, the DTW algorithm can obtain the optimal match between sequences $S$ and $C$ . More specifically, DTW searches for the shortest path in matrix D0 by dynamic programming and then defines the accumulated distance along that path as the DTW distance between sequences $S$ and $C$ .
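As a minimal sketch of Eq. (1), the following Python builds D0 for two scalar sequences of unequal length. The function name `pairwise_distance_matrix` and the sample values are illustrative assumptions, not taken from the paper; indices are 0-based in the code, whereas the text counts from 1.

```python
def pairwise_distance_matrix(S, C):
    """Return D0 as a list of rows, with D0[i][j] = sqrt((S[i] - C[j])**2).

    For scalar points the square root of the square is the absolute value.
    """
    return [[abs(s - c) for c in C] for s in S]


# Hypothetical sequences with n_S = 4 and n_C = 3 (illustrative values only).
S = [1.0, 3.0, 4.0, 2.0]
C = [1.0, 4.0, 2.0]
D0 = pairwise_distance_matrix(S, C)  # 4 rows, 3 columns
```

Each row of `D0` corresponds to one point of $S$ and each column to one point of $C$, so the matrix is rectangular whenever the sequences differ in length.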

Algorithm 1: DTW algorithm
Part 1
Read D0 // input the distance matrix D0
[n_S, n_C] = size(D0) // get the size of matrix D0
g = zeros(n_S, n_C) // accumulated shortest-distance matrix
g[1, 1] = D0[1, 1] // the path starts at v_{1,1}
for i = 2 to n_S step 1
    do g[i, 1] = g[i - 1, 1] + D0[i, 1] // first column
for j = 2 to n_C step 1
    do g[1, j] = g[1, j - 1] + D0[1, j] // first row
for i = 2 to n_S step 1
    for j = 2 to n_C step 1
        do g[i, j] = D0[i, j] + min(g[i - 1, j], g[i, j - 1], g[i - 1, j - 1]) // Eq. (2)
dist = g[n_S, n_C] // output DTW distance
Part 2
i = n_S; j = n_C // start backtracking from the end point
k = 1; path[1] = [i, j] // initialize the backtracking path
while i + j != 2 do
    if i - 1 == 0
        do j = j - 1 // on the first row, move left
    elseif j - 1 == 0
        do i = i - 1 // on the first column, move up
    else
        do index = argmin([g[i - 1, j], g[i, j - 1], g[i - 1, j - 1]]) // position of the smallest predecessor
        switch index
            case 1 do i = i - 1
            case 2 do j = j - 1
            case 3 do i = i - 1; j = j - 1
    k = k + 1
    path[k] = [i, j] // output the DTW shortest path (in reverse order)

We now describe in detail how the shortest path in matrix D0 is found by dynamic programming [12, 9]. According to the pairwise similarity of elements $s_{i}$ and $c_{j}$ , the alignment procedure between sequences $S$ and $C$ can be expressed as successive stages $V_{I}(I=1,2,\cdots,\text{end})$ of dynamic programming, and the position taken at each stage corresponds to a state $d_{i,j}\in D0$ . On this basis, the DTW algorithm obtains the shortest path from the first position $v_{1,1}$ of the initial stage $V_{1}$ to the last position $v_{n_{S},n_{C}}$ of the end stage $V_{\text{end}}$ . For the position $v_{i,j}$ of stage $V_{I}$ and its corresponding state $d_{i,j}$ , the accumulated distance is given by the recurrence in Eq. (2).

$\displaystyle g_{i,j}=d_{i,j}+\min\{{g_{i-1,j},g_{i,j-1},g_{i-1,j-1}}\}$ (2)

Here $g_{i,j}$ is an element of matrix g (the accumulated shortest-distance matrix). Eq. (2) shows that matrix g is filled in iteratively from the initial stage to the end stage. That is to say, $g_{i,j}$ is the accumulated shortest distance over all stages from $V_{1}$ to $V_{I}$ . Therefore, $g_{n_{S},n_{C}}$ at stage $V_{\text{end}}$ is the shortest distance corresponding to the shortest path, which is exactly the DTW distance between sequences $S$ and $C$ . Furthermore, using the path backtracking method of dynamic programming, the shortest path consists of the point set $\mathbf{path}=\{v_{i,j}\mid 1\leqslant i\leqslant n_{S},\,1\leqslant j\leqslant n_{C}\}$ , and path also represents the alignment between sequences $S$ and $C$ .

To guarantee that the DTW algorithm converges in finite time and is complete, the following constraint conditions should be satisfied.

Boundary conditions: the warping path must start at the point $v_{1,1}$ and end at $v_{n_{S},n_{C}}$ , so that every point has a corresponding best matching point.

Continuity conditions: the position at stage $V_{I+1}$ must lie in the neighborhood of the position at the previous stage $V_{I}$ , so that no element of either sequence is skipped. Generally, the step length is one unit.

Monotonicity conditions: for the current location $v_{i,j}$ of stage $V_{I}$ and any possible location $v_{{i}^{\prime},{j}^{\prime}}$ of the next stage $V_{I}^{\prime}$ , $i\leqslant{i}^{\prime}$ and $j\leqslant{j}^{\prime}$ should be satisfied, so that the dynamic search converges to the location $v_{n_{S},n_{C}}$ and the stage $V_{\text{end}}$ .
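The three conditions can be expressed as a small validity check on a candidate path. The sketch below is an illustration under stated assumptions: the function name `is_valid_warping_path` is hypothetical, positions are written as 1-based `(i, j)` pairs as in the text, and the allowed steps are taken to be the three predecessor moves that appear in Eq. (2).

```python
def is_valid_warping_path(path, n_s, n_c):
    """Check boundary, continuity and monotonicity for a 1-based warping path."""
    # Boundary: the path must start at v_{1,1} and end at v_{n_S,n_C}.
    if not path or path[0] != (1, 1) or path[-1] != (n_s, n_c):
        return False
    for (i, j), (i_next, j_next) in zip(path, path[1:]):
        di, dj = i_next - i, j_next - j
        # Monotonicity: neither index may decrease.
        # Continuity: each index advances by at most one unit per stage.
        # Assumption: a stage that does not move at all is rejected.
        if di not in (0, 1) or dj not in (0, 1) or (di, dj) == (0, 0):
            return False
    return True


ok = is_valid_warping_path([(1, 1), (2, 2), (3, 2)], 3, 2)   # satisfies all three
skipped = is_valid_warping_path([(1, 1), (3, 2)], 3, 2)      # jumps over row 2
```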

In compliance with the theory of DTW and its constraint conditions, the pseudocode of the DTW algorithm is given in Algorithm 1.

The DTW algorithm has two main parts: part 1 calculates the shortest accumulated distance according to Eq. (2), and part 2 backtracks the shortest path (the so-called "warping path") from the last state, which holds the minimum cumulative distance, to the initial state.
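The two parts can be sketched in runnable Python as follows. This is an illustrative translation rather than the authors' implementation: the function name `dtw` and the sample matrix are assumptions, indices are 0-based, the backtracking follows the accumulated matrix g, and ties between equally good predecessors are broken arbitrarily by tuple comparison.

```python
def dtw(D0):
    """Return (dist, path) for a distance matrix D0 given as a list of rows."""
    n_s, n_c = len(D0), len(D0[0])

    # Part 1: accumulate the shortest distance according to Eq. (2).
    g = [[0.0] * n_c for _ in range(n_s)]
    g[0][0] = D0[0][0]
    for i in range(1, n_s):
        g[i][0] = g[i - 1][0] + D0[i][0]
    for j in range(1, n_c):
        g[0][j] = g[0][j - 1] + D0[0][j]
    for i in range(1, n_s):
        for j in range(1, n_c):
            g[i][j] = D0[i][j] + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])

    # Part 2: backtrack from the end point to the start point.
    i, j = n_s - 1, n_c - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            candidates = [
                (g[i - 1][j], i - 1, j),
                (g[i][j - 1], i, j - 1),
                (g[i - 1][j - 1], i - 1, j - 1),
            ]
            _, i, j = min(candidates)  # predecessor with the smallest g
        path.append((i, j))
    path.reverse()
    return g[n_s - 1][n_c - 1], path


# Hypothetical D0 for S = [1, 3, 4, 2] and C = [1, 4, 2] with d = |s - c|.
D0 = [[0.0, 3.0, 1.0], [2.0, 1.0, 1.0], [3.0, 0.0, 2.0], [1.0, 2.0, 0.0]]
dist, path = dtw(D0)
```

In this example the second and third points of $S$ are both matched to the second point of $C$, which is the kind of local compression that a one-to-one comparison cannot express.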

By using dynamic programming to compare single elements, the DTW algorithm provides an effective way to obtain the distance between sequences of unequal length, and a more accurate distance for sequences of equal length. However, the limitation of traditional DTW is that it cannot directly handle high-dimensional samples such as matrices.

3. 2D-DTW algorithm

A one-dimensional sequence cannot store or process the position information in a matrix, which limits the application of traditional DTW to some practical problems. This motivates our work: we develop a similarity measure between two-dimensional matrices to improve on the traditional method. The proposed algorithm, which is based on dynamic programming theory, is named the 2D-DTW algorithm.

Unlike other pattern recognition methods such as deep learning or neural networks, the 2D-DTW algorithm can obtain the similarity between two matrices without any training samples. Because it does not need a large number of reliable samples as training data, 2D-DTW can be applied directly to unsupervised learning without prior knowledge. In theory, the similarity between two matrices computed by the 2D-DTW algorithm is independent of the size of the dataset, whether it is small or large. By contrast, the performance of an ANN is seriously affected if the training data are insufficient or of low quality.

Moreover, 2D-DTW can process unequal-sized samples through its warping alignment mechanism, which is essentially different from methods that use a one-to-one correspondence between equal-sized samples. Many approaches, such as ANN, have difficulty dealing with unequal-sized samples: special steps are required to convert them to equal-sized samples, which causes a loss of information in the matrix and thus impairs the result.

3.1 Construct distance-cuboid

Suppose two matrices $T$ and $M$ are given, where $T$ has size $r\times q$ and $M$ has size $p\times q$ , and $(r,p,q)$ satisfies the following condition

$\displaystyle\{(r,p,q)\mid r\geqslant 2,\ p\geqslant 2,\ q\geqslant 2,\ r\neq p\ \text{or}\ r=p\}$ (3)

To construct the distance-cuboid, matrices $T$ and $M$ must have the same number of columns, but their numbers of rows $r$ and $p$ need not be equal. Without loss of generality, $r\neq p$ is assumed in the following description of 2D-DTW. Under this condition, no one-to-one matching of data points exists between matrices $T$ and $M$ . In any case, however, matrices $T$ and $M$ can be put together (aligned) along their common number of columns. That is, if we put matrices $T$ and $M$ in two orthogonal planes, we obtain a three-dimensional cuboid with $T$ and $M$ as its top and back sides respectively, as shown in Fig. 1.

Figure 1.

Three-dimensional cuboid by matrix alignment.

If the cuboid is regarded as a three-dimensional matrix, the single-point distances between matrices $T$ and $M$ can be stored at the corresponding positions in the cuboid. Here we define the elements of the cuboid as single-point Euclidean distances, which express the pairwise similarity. Consequently, we define a three-dimensional matrix cuboid-D between matrices $T$ and $M$ and give its element values in Eq. (4)

$\displaystyle\textit{cuboid-D}_{j,i,k}=\sqrt{(T_{j,k}-M_{i,k})^{2}}$ (4)

where $1\leqslant i\leqslant p$ , $1\leqslant j\leqslant r$ , $1\leqslant k\leqslant q$ . The distance values in cuboid-D depend on the elements of matrices $T$ and $M$ , which form the top and back sides of the three-dimensional cuboid respectively. Note that cuboid-D stores single-point comparisons only: each element compares one element of $T$ with one element of $M$ in the same column $k$ .
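A minimal sketch of Eq. (4) in plain Python is given below. The function name `build_cuboid` and the two small matrices are hypothetical, and the code uses 0-based indices in the order `[j][i][k]` to mirror the subscripts of cuboid-D.

```python
def build_cuboid(T, M):
    """Return cuboid[j][i][k] = sqrt((T[j][k] - M[i][k])**2) as nested lists."""
    q = len(T[0])
    if any(len(row) != q for row in T + M):
        raise ValueError("T and M must have the same number of columns")
    return [
        [[abs(t_row[k] - m_row[k]) for k in range(q)] for m_row in M]
        for t_row in T
    ]


# Illustrative matrices: T is r x q = 2 x 3, M is p x q = 3 x 3.
T = [[1, 2, 3], [4, 5, 6]]
M = [[1, 2, 3], [0, 5, 9], [4, 4, 4]]
cuboid = build_cuboid(T, M)  # shape r x p x q = 2 x 3 x 3
```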

3.2 The mapped distance matrix

To obtain the shortest path in cuboid-D by dynamic programming, we use a cutting and compressing technique to map the three-dimensional cuboid-D onto one plane and obtain a two-dimensional mapped matrix. The method of accumulating the mapped values of the cuboid is detailed below.

In Eq. (4), the index $k$ , which indicates the column of matrices $T$ and $M$ , is shared by both matrices in the assignment of cuboid-D. To obtain compressed information that covers both $T$ and $M$ , cuboid-D should be mapped onto a plane perpendicular to both the top and back sides, such as the left side. We therefore first cut cuboid-D parallel to matrix $T$ to obtain $p$ planes, and then cut each plane by row (parallel to matrix $M$ ) to obtain $r$ vectors of length $q$ . That is to say, there are $r\times p$ vectors, and each vector corresponds to one position in the mapping plane. We then calculate the sum of the elements of each vector as the cumulative mapped value. The cutting of cuboid-D for the compressive mapping is visualized in Fig. 2.

Figure 2.

The demonstration of cutting cuboid-D.

The cutting method in Fig. 2 can also be regarded as cutting the matrices $T$ and $M$ by rows to obtain the vectors $\textit{row}_{j}^{(T)}\in T(j=1,2,\cdots,r)$ and $\textit{row}_{i}^{(M)}\in M(i=1,2,\cdots,p)$ , where vectors $\textit{row}_{j}^{(T)}$ and $\textit{row}_{i}^{(M)}$ have the same length. We define $\textit{row\_d}(\textit{row}_{j}^{(T)},\textit{row}_{i}^{(M)})$ to measure the similarity between $\textit{row}_{j}^{(T)}$ and $\textit{row}_{i}^{(M)}$ ; it accumulates the single-point Euclidean distances of Eq. (4) between the corresponding elements of the two vectors. The relationship between $\textit{row}_{j}^{(T)}$ and $\textit{row}_{i}^{(M)}$ is shown in Fig. 3.

Figure 3.

The relationship between $\textit{row}_{j}^{(T)}$ and $\textit{row}_{i}^{(M)}$ .

After cuboid-D has been cut and all $\textit{row\_d}(\textit{row}_{j}^{(T)},\textit{row}_{i}^{(M)})$ have been obtained, the accumulated values of all elements in cuboid-D have been compressed onto the left side. We then use $\textit{row\_d}(\textit{row}_{j}^{(T)},\textit{row}_{i}^{(M)})$ to define the mapped distance matrix D, whose elements are given by Eq. (5).

$\displaystyle D_{j,i}=\textit{row\_d}(\textit{row}_{j}^{(T)},\textit{row}_{i}^{(M)})\quad(j=1,2,\cdots,r;\ i=1,2,\cdots,p)$ (5)

Substituting Eq. (4) into Eq. (5), we obtain Eq. (6)

$\displaystyle D_{j,i}=\sum\limits_{k=1}^{q}\sqrt{(T_{j,k}-M_{i,k})^{2}}\quad(j=1,2,\cdots,r;\ i=1,2,\cdots,p)$ (6)
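Eq. (6) can be sketched directly, without materializing the cuboid, by summing the element-wise distances of each pair of rows. The function name `mapped_distance_matrix` and the sample matrices are illustrative assumptions; indices are 0-based.

```python
def mapped_distance_matrix(T, M):
    """Return D[j][i] = sum over k of sqrt((T[j][k] - M[i][k])**2), as in Eq. (6)."""
    return [
        [sum(abs(t - m) for t, m in zip(t_row, m_row)) for m_row in M]
        for t_row in T
    ]


# Illustrative matrices with the same number of columns (q = 3).
T = [[1, 2, 3], [4, 5, 6]]             # r = 2 rows
M = [[1, 2, 3], [0, 5, 9], [4, 4, 4]]  # p = 3 rows
D = mapped_distance_matrix(T, M)       # r x p = 2 x 3
```

Each entry of `D` is the sum of one length-$q$ vector cut from the cuboid, so `D` has one row per row of $T$ and one column per row of $M$.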

3.3 Distance and row alignment between matrices

The distance matrix D contains all the comparisons between $\textit{row}_{j}^{(T)}$ and $\textit{row}_{i}^{(M)}$ , which form the basis of matrix similarity, especially for matrices with different location information. The subsequent work of the 2D-DTW algorithm is therefore to search for the shortest path in matrix D and to obtain the row alignment between matrices $T$ and $M$ . The compressed distance matrix D of Eq. (6) plays the same role in the 2D-DTW algorithm as the distance matrix of Eq. (1) does in the DTW algorithm. The shortest path (row alignment) in matrix D is therefore also found by dynamic programming, which also yields the shortest distance between matrices $T$ and $M$ .

Although the distance $D_{j,i}$ in Eq. (5) is defined by row comparison, column information is still contained in the corresponding row sequence. Eq. (6) shows that the column index $k$ is accumulated from 1 to $q$ in the summation. The shortest distance in matrix D therefore contains comparative information about both rows and columns.

Note that the value of the shortest distance varies with the size of the input matrices. To eliminate the effect of matrix size and obtain an unbiased similarity measure, we define a standard distance Dist in Eq. (7) by normalizing the shortest distance.

$\displaystyle\text{Dist}=\frac{\text{dist}}{r\ast q\ast p}$ (7)

Here dist is the shortest distance obtained directly from matrix D. We have thus obtained the expected similarity between two matrices, together with the row alignment of the input matrices. Hereinafter, the term 2D-DTW distance refers to the standard distance Dist.
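Putting Eqs. (2), (6) and (7) together, a compact end-to-end sketch might look as follows. The function name `two_d_dtw` and the sample matrices are hypothetical, the code is an illustration rather than the authors' implementation, and it returns only the distances (not the row alignment).

```python
def two_d_dtw(T, M):
    """Return (Dist, dist): the normalized and raw 2D-DTW distances."""
    r, p, q = len(T), len(M), len(T[0])
    if any(len(row) != q for row in T + M):
        raise ValueError("T and M must have the same number of columns")

    # Mapped distance matrix D, Eq. (6).
    D = [[sum(abs(t - m) for t, m in zip(tr, mr)) for mr in M] for tr in T]

    # Accumulated shortest distance over D, Eq. (2).
    g = [[0.0] * p for _ in range(r)]
    for j in range(r):
        for i in range(p):
            if j == 0 and i == 0:
                best = 0.0
            elif j == 0:
                best = g[0][i - 1]
            elif i == 0:
                best = g[j - 1][0]
            else:
                best = min(g[j - 1][i], g[j][i - 1], g[j - 1][i - 1])
            g[j][i] = D[j][i] + best

    dist = g[r - 1][p - 1]
    return dist / (r * q * p), dist  # normalization of Eq. (7)


T = [[1, 2, 3], [4, 5, 6]]             # r = 2, q = 3
M = [[1, 2, 3], [0, 5, 9], [4, 4, 4]]  # p = 3, q = 3
Dist, dist = two_d_dtw(T, M)
Dist_same, dist_same = two_d_dtw(T, T)  # identical matrices
```

For these illustrative inputs the raw distance is divided by $r\ast q\ast p=18$, and comparing a matrix with itself gives a distance of zero.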

The 2D-DTW distance obtained from Eq. (7) is an unbiased similarity measure; it can be regarded as a dimensionless measure that does not vary with data granularity. Regardless of the size of the matrices, the more similar two matrices are, the smaller their 2D-DTW distance is. Of course, fine-grained data generally produce a more accurate distance because more reference comparisons are available, but the algorithm then needs more time to search for the optimal matching between all elements. That is to say, data granularity affects the calculation speed and the similarity accuracy, but the comparison of the dimensionless measures is independent of data granularity.

The 2D-DTW algorithm presented above only describes the cutting and compressing method for matrices with the same number of columns. For matrices with the same number of rows, cuboid-D can likewise be cut and mapped onto the bottom side to obtain the compressed distance matrix. Because the steps are the same, the algorithm is not described again. The experiments in the following section are based on matrices with the same number of columns.

4. Experimental results and analysis

The theoretical rationality of the 2D-DTW algorithm is first analyzed using unequal-sized matrices. Then, for convenient comparison with a traditional method such as the Euclidean distance, equal-sized matrices are used for the remaining experiments. To verify the performance of the 2D-DTW algorithm in practical applications and its sensitivity to noise and missing data, we performed comparison experiments based on the CIFAR-10 database, handwritten digits and real fingerprints.

4.1 Theoretical rationality experiment

Two random integer matrices $T$ and $M$ , of size $4\times 5$ and $5\times 5$ respectively, are used here to verify the theoretical feasibility of the proposed 2D-DTW algorithm for the similarity between unequal-sized matrices:

$\displaystyle T=\left[\begin{array}{ccccc}81&63&95&95&42\\ 90&10&96&49&91\\ 13&28&16&80&79\\ 91&55&97&15&95\end{array}\right]\quad M=\left[\begin{array}{ccccc}65&76&70&82&44\\ 4&74&4&69&38\\ 85&39&28&32&76\\ 93&65&5&95&79\\ 68&17&10&4&19\end{array}\right]$

The distance between matrices $T$ and $M$ obtained from the 2D-DTW algorithm is 454 according to Eq. (7), and the row alignment $\textit{path}(r^{(T)},r^{(M)})$ is shown in Fig. 4.

Figure 4.

Row alignment between matrices $T$ and $M$ .

Fig. 4 shows that rows 1 and 2 of matrix $T$ align to row 1 of matrix $M$ , row 3 of $T$ aligns to rows 2, 3 and 4 of $M$ , and row 4 of $T$ aligns to row 5 of $M$ .
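The alignment just described can be written as a warping path of 1-based (row of $T$, row of $M$) pairs and grouped by row of $T$. The sketch below only transcribes the alignment stated in the text; the helper name `rows_aligned_to` is hypothetical.

```python
from collections import defaultdict

# Row alignment reported for Fig. 4, as 1-based (row of T, row of M) pairs.
path = [(1, 1), (2, 1), (3, 2), (3, 3), (3, 4), (4, 5)]


def rows_aligned_to(path):
    """Group a warping path into {row of T: [rows of M aligned to it]}."""
    groups = defaultdict(list)
    for t_row, m_row in path:
        groups[t_row].append(m_row)
    return dict(groups)


alignment = rows_aligned_to(path)
# Every step advances each index by 0 or 1, as the continuity and
# monotonicity conditions of DTW require.
steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
```

The path has six points for a $4\times 5$ and a $5\times 5$ matrix: two rows of $T$ share one row of $M$ (compression) and one row of $T$ is stretched over three rows of $M$.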

4.2 2D-DTW and Euclidean distance based on CIFAR-10 database

In many applications, the foreground objects in the sampled matrices are not of the same size because of uncontrollable factors. We compared the 2D-DTW distance and the traditional Euclidean distance on the CIFAR-10 image database [4]. The No. 1 image in data_batch_2 of the airplane dataset serves as the template sample, which is shown in Fig. 5a. The 10 test samples randomly selected from the airplane dataset are No. 500, No. 85, No. 789, No. 29, No. 914, No. 719, No. 481, No. 570, No. 234 and No. 845, which are renumbered as images 1–10 in Fig. 5b (the new numbers 1–10 are used below).

Figure 5.

Airplane images of template and test sample.

We preprocess the airplane images by binarization with Matlab tools [10] to obtain binary images whose foreground contains only the airplane. The template and the samples are of size 32 $\times$ 32 (excluding the black frame of the template). Samples 1–9 show a black foreground airplane on a white background, apart from some erroneous black background regions, whereas sample 10 is reversed, with a white foreground airplane on a black background. The 2D-DTW distance and the Euclidean distance between the template and the 10 test samples are then calculated and shown in Table 1.

Table 1

The comparison of 2D-DTW distance and Euclidean distance

Airplane no.	2D-DTW distance	Euclidean distance
1	0.004180908	0.007996
2	0.015350342	0.01535
3	0.013641357	0.01535
4	0.010986328	0.011444
5	0.009979248	0.010651
6	0.004364014	0.00589
7	0.00994873	0.011383
8	0.012084961	0.015381
9	0.008911133	0.009766
10	0.01965332	0.02063

Table 2

The computing time of 2D-DTW distance and Euclidean distance (second)

Samples no.	2D-DTW distance	Euclidean distance
1	0.020079	0.001166
2	0.004952	0.000711
3	0.006106	0.000907
4	0.002897	0.000093
5	0.00446	0.000277
6	0.00318	0.000047
7	0.00354	0.000042
8	0.002874	0.000054
9	0.003902	0.000055
10	0.002625	0.000031

Table 1 shows that the No. 10 airplane is the least similar to the template by both the 2D-DTW distance and the Euclidean distance, primarily because its foreground and background are reversed. According to the distance values in Table 1, the No. 1 and No. 6 airplanes are the most similar to the template by 2D-DTW distance, with the approximately equal small values 0.004180908 and 0.004364014. By the smallest Euclidean distance, 0.00589, however, the No. 6 airplane is the most similar to the template. The second smallest Euclidean distance belongs to the No. 1 airplane (0.007996), but it is considerably larger than that of the No. 6 airplane.

We then further analyze the disagreement on the No. 1 airplane. The No. 1 airplane displays a visible aircraft outline and body shape, which resemble the template aircraft with its body pointing to the upper left, and the wings are also somewhat similar. The main difference between them is that the aircraft in the template is smaller than that in No. 1. This difference does not affect the judgment of the 2D-DTW algorithm, thanks to its warping alignment mechanism for matching similar points in the matrix. In contrast, the unequal-sized foreground increases the Euclidean distance between the No. 1 airplane and the template. The No. 6 aircraft, judged most similar by Euclidean distance, displays a visible aircraft outline of approximately the same size and the same upper-left orientation. According to the statistics, the proportion of foreground pixels is 894/1024 in the template and 841/1024 in the No. 6 image. The similar percentages of black pixels in the two images lead to the above result for the Euclidean distance.

In short, the 2D-DTW algorithm takes more account of the matching of shape, orientation and size during the distance calculation. With the one-to-one point comparison of the Euclidean distance, the judgment depends on the percentage of object pixels, which conceals important information such as shape, orientation and size.

Additionally, the computing time of the 10 test samples under the same environment is shown in Table 2.

Table 2 shows that the 2D-DTW algorithm takes a total of 0.054615 seconds to compute all 10 samples, an average of 0.0054615 seconds per sample. The total time for the Euclidean distance is 0.003383 seconds, an average of 0.0003383 seconds. The alignment mechanism of the 2D-DTW algorithm increases the time consumption in exchange for a better matching.
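The totals and averages quoted above can be reproduced from the entries of Table 2 with a few lines of Python; the variable names are illustrative. The same arithmetic shows that, on these 10 samples, 2D-DTW takes roughly 16 times as long as the Euclidean distance.

```python
# Computing times in seconds, copied from Table 2 (samples 1 to 10).
dtw_times = [0.020079, 0.004952, 0.006106, 0.002897, 0.00446,
             0.00318, 0.00354, 0.002874, 0.003902, 0.002625]
euclid_times = [0.001166, 0.000711, 0.000907, 0.000093, 0.000277,
                0.000047, 0.000042, 0.000054, 0.000055, 0.000031]

total_dtw = sum(dtw_times)
total_euclid = sum(euclid_times)
mean_dtw = total_dtw / len(dtw_times)
mean_euclid = total_euclid / len(euclid_times)
ratio = total_dtw / total_euclid  # relative cost of 2D-DTW

print(f"2D-DTW:    total {total_dtw:.6f} s, mean {mean_dtw:.7f} s")
print(f"Euclidean: total {total_euclid:.6f} s, mean {mean_euclid:.7f} s")
print(f"ratio:     {ratio:.1f}")
```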

4.3 Experiment on real fingerprints

The real fingerprint images come from a semiconductor fingerprint sensor. The six fingerprint samples are collected from three different overlapping areas of the right thumb and three areas of the left thumb, and are stored as 176 $\times$ 176 grey images. The six fingerprint samples are numbered as in Fig. 6.

The left thumb and the right thumb can be viewed as two categories, and we performed the identification experiment with the different distances. The pairwise similarities by 2D-DTW distance are shown in Table 3 and those by Euclidean distance in Table 4.

Table 3
The pairwise similarity by 2D-DTW algorithm

No.	Left 1	Left 2	Left 3	Right 1	Right 2	Right 3
Left 1	0.000	83.527	79.051	83.159	77.315	82.491
Left 2	83.527	0.000	76.130	79.164	76.165	82.810
Left 3	79.051	76.130	0.000	80.803	77.984	85.891
Right 1	83.159	79.164	80.803	0.000	72.236	80.257
Right 2	77.315	76.165	77.984	72.236	0.000	73.311
Right 3	82.491	82.810	85.891	80.257	73.311	0.000

Figure 6.

Real fingerprint images from semiconductor sensor.

Table 4

The pairwise similarity by Euclidean distance

No.	Left 1	Left 2	Left 3	Right 1	Right 2	Right 3
Left 1	0.000	98.295	100.173	96.995	94.758	95.647
Left 2	98.295	0.000	94.661	94.905	94.163	96.881
Left 3	100.173	94.661	0.000	96.329	95.981	100.169
Right 1	96.995	94.905	96.329	0.000	91.809	95.622
Right 2	94.758	94.163	95.981	91.809	0.000	90.789
Right 3	95.647	96.881	100.169	95.622	90.789	0.000

For each fingerprint image, the distances between it and all other images are calculated; the zero values in Tables 3 and 4 indicate the comparison of an image with itself. For the fingerprint indicated by each column heading, the most similar result (other than itself) is highlighted in bold. If the fingerprint indicated by the row heading of a highlighted cell belongs to the same category as the column heading, the identification is considered correct. The results in Tables 3 and 4 reveal that the 2D-DTW algorithm correctly identifies 5 samples (Left 1 is wrong), while the Euclidean distance identifies 4 samples (Left 1 and Left 2 are wrong).
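The identification rule can be checked against the values of Tables 3 and 4 with a short nearest-neighbor sketch. The helper name `correct_identifications` is hypothetical; the numbers are copied from the two tables, and the category of a sample is taken from the first word of its label.

```python
labels = ["Left 1", "Left 2", "Left 3", "Right 1", "Right 2", "Right 3"]

table3 = [  # pairwise 2D-DTW distances
    [0.000, 83.527, 79.051, 83.159, 77.315, 82.491],
    [83.527, 0.000, 76.130, 79.164, 76.165, 82.810],
    [79.051, 76.130, 0.000, 80.803, 77.984, 85.891],
    [83.159, 79.164, 80.803, 0.000, 72.236, 80.257],
    [77.315, 76.165, 77.984, 72.236, 0.000, 73.311],
    [82.491, 82.810, 85.891, 80.257, 73.311, 0.000],
]
table4 = [  # pairwise Euclidean distances
    [0.000, 98.295, 100.173, 96.995, 94.758, 95.647],
    [98.295, 0.000, 94.661, 94.905, 94.163, 96.881],
    [100.173, 94.661, 0.000, 96.329, 95.981, 100.169],
    [96.995, 94.905, 96.329, 0.000, 91.809, 95.622],
    [94.758, 94.163, 95.981, 91.809, 0.000, 90.789],
    [95.647, 96.881, 100.169, 95.622, 90.789, 0.000],
]


def correct_identifications(dist):
    """Return the samples whose nearest other sample is from the same thumb."""
    correct = []
    for col, name in enumerate(labels):
        others = [(dist[row][col], row) for row in range(len(labels)) if row != col]
        _, nearest = min(others)  # smallest distance in this column, excluding itself
        if labels[nearest].split()[0] == name.split()[0]:
            correct.append(name)
    return correct


ok_2d_dtw = correct_identifications(table3)
ok_euclid = correct_identifications(table4)
```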

Once again, the 2D-DTW algorithm achieves the expected performance thanks to its warping alignment technique. Fingerprint images from the same thumb share the same ridge characteristics rather than one-to-one similar points, so the optimal matching of points can only be achieved by local stretching or compression. This explains why the 2D-DTW algorithm is superior to the Euclidean distance in this practical application.

4.4 Sensitivity analysis of 2D-DTW based on handwritten digit

We then verify the sensitivity of the 2D-DTW algorithm to noise and missing data. Figure 7a is a handwritten image of the digit 0, Fig. 7b is a synthetic sample with 10% random noise, and Fig. 7c is a synthetic sample with 10% of the foreground data missing. We also construct 10 template digit images in Times New Roman font. The handwritten digit samples and the 10 template digits are shown in Fig. 7.

Figure 7.

Handwritten digits and 10 template digits. (a) Handwritten digit (b) Handwritten digit with 10% noise (c) Handwritten digit with 10% data missing in foreground (d) Template digits.

In Fig. 7, all handwritten images and template images are stored as matrices of size 28 $\times$ 28. The Euclidean distance, traditional DTW and 2D-DTW are used to calculate three different distances between each test sample and the 10 template images. The 2D-DTW distance is calculated directly on the matrices, whereas DTW and the Euclidean distance require the image matrix to be flattened into a one-dimensional vector first. The recognition results list the five digits most similar to each test sample, in order of increasing distance, as shown in Table 5.

Table 5

The comparative recognition results obtained from three distances

	Five most similar recognition digits
	2D-DTW distance	DTW distance	Euclidean distance
Sample (a)	0 9 8 6 5	3 6 5 0 9	6 4 0 5 1
Sample (b)	0 9 6 4 8	2 8 5 9 3	6 4 5 1 0
Sample (c)	1 0 5 6 3	4 7 6 0 5	1 4 6 5 7

Among the five most similar digits, the closer the digit 0 is to the left, the better the corresponding algorithm performs. Table 5 shows that the recognition results for sample (c) are the worst, which suggests that all three methods are sensitive to samples with missing data. Noise mainly affects the DTW algorithm and has little effect on 2D-DTW and the Euclidean distance.
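The criterion above (the position of the digit 0 in each top-five list) can be tabulated from Table 5 as follows. The helper name `rank_of_zero` is hypothetical, and `None` marks a list in which 0 does not appear among the five most similar digits.

```python
# Top-five recognition lists copied from Table 5.
table5 = {
    "Sample (a)": {"2D-DTW": [0, 9, 8, 6, 5], "DTW": [3, 6, 5, 0, 9], "Euclidean": [6, 4, 0, 5, 1]},
    "Sample (b)": {"2D-DTW": [0, 9, 6, 4, 8], "DTW": [2, 8, 5, 9, 3], "Euclidean": [6, 4, 5, 1, 0]},
    "Sample (c)": {"2D-DTW": [1, 0, 5, 6, 3], "DTW": [4, 7, 6, 0, 5], "Euclidean": [1, 4, 6, 5, 7]},
}


def rank_of_zero(top_five):
    """Return the 1-based position of digit 0, or None if it is absent."""
    return top_five.index(0) + 1 if 0 in top_five else None


ranks = {
    sample: {method: rank_of_zero(digits) for method, digits in row.items()}
    for sample, row in table5.items()
}

for sample, row in ranks.items():
    print(sample, row)
```

In this tabulation 2D-DTW places the digit 0 first for samples (a) and (b) and second for sample (c).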

By matching similar points through warping, the 2D-DTW algorithm achieves the best recognition results. In contrast, the Euclidean distance, which uses one-to-one point comparison, is ineffective for roughly similar images. DTW is limited to one-dimensional input, and some location information is lost when the image matrix is flattened, which is the main reason for its poor recognition.

5. Conclusion

A new distance measure is defined in the proposed 2D-DTW algorithm to improve the traditional method. Where the input data of 2D-DTW is two dimensional matrices and the output is the similarity between matrices. By constructing a distance-cuboid and searching shortest path using dynamic programming in the mapped matrix, 2D-DTW algorithm can directly obtain the distance between matrices. It is worth mentioning that 2D-DTW can used for the especial unequal size matrices such as that have equal column but unequal row or vice versa. In additions, by normalizing the accumulated shortest distance, the 2D-DTW distance can eliminate the effect of input matrix size and get the unbiased similarity. By virtue of the introduction of warping alignment mechanism, 2D-DTW algorithm can measure the similarity of matrices more accurately than traditional Euclidean distance.

The experimental results based on the CIFAR-10 image database and real fingerprints show that the 2D-DTW algorithm achieves the expected performance within an acceptable computing time. Setting the comparison aside, unequal-sized matrices (with the same number of rows or columns) can certainly be used in practical problems. By taking matrices directly as input data, the 2D-DTW algorithm simplifies the feature extraction or preprocessing step in machine learning and pattern recognition. In short, the 2D-DTW algorithm based on dynamic programming provides a promising method for measuring the similarity of equal-sized matrices and special unequal-sized matrices.

In the proposed method, the distance-cuboid is constructed from complete matrices. In practice, when measuring the similarity between matrices in a specific domain, the search for the shortest path can be further simplified if the matrices can be partitioned. Taking the similarity of two images as an example, if domain knowledge is available, such as the matching of local regions or special points, this prior knowledge can be used to pre-segment the matrices, and the calculation time between sub-matrices can then be greatly reduced. Our future research will focus on computational efficiency.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61402202, 61772013) and the China Postdoctoral Science Foundation (2015M581724).

References

1. Alpaydin, Introduction of Machine Learning, China Machine Press, Beijing, 2014, 95–97.

2. Keogh and Ratanamahatana C.A., Exact indexing of dynamic time warping, Knowledge & Information Systems 7 (2005), 358–386.

3. Ahmed, Paul P.P. and Gavrilova M.L., DTW-based kernel and rank-level fusion for 3D gait recognition using Kinect, The Visual Computer 6 (2015), 915–924.

4. Liu et al., CIFAR10-DVS: An event-stream dataset for object classification, Frontiers in Neuroscience 11 (2017), 309.

5. Liu, Yang et al., Adaptively constrained dynamic time warping for time series classification and clustering, Information Sciences 534 (2020), 97–116.

6. Yang, Zhang et al., An initialization method based on hybrid distance for k-means algorithm, Neural Computation 29 (2017), 3094–3117.

7. Zhao and Itti, shapeDTW: Shape dynamic time warping, Pattern Recognition 74 (2018), 171–184.

8. Luo, Lin and Xi, Review of classification algorithms in data mining, Computer Engineering 31 (2005), 3–5.

9. Kasthuri, Kumar S.B.R. and Khaddaj, PLIS: Proposed Language Independent Stemmer for Information Retrieval Systems Using Dynamic Programming, in: 2017 World Congress on Computing and Communication Technologies (WCCCT), Tamil Nadu, India, 2017, pp. 132–135.

10. Gonzalez R.C., Woods R.E., Eddins S.L. et al., Digital Image Processing Using MATLAB, 3rd edition, Publishing House of Electronics Industry, Beijing, 2005, pp. 305–310.

11. Wang, OuYang and Chen, Measurement of graph similarity based on vertical dimension sequence dynamic time warping method, Journal of Jilin University (Engineering and Technology Edition) 48 (2018), 1199–1205.

12. et al., Fundamentals and applications of operations research, Sixth Edition, Higher Education Press, Beijing, 2014, 198–215.

13. Jeong Y.S., Jeong M.K. and Omitaomu O.A., Weighted dynamic time warping for time series classification, Pattern Recognition 44 (2011), 2231–2240.

14.