Abstract
In technical systems the analysis of similar load situations is a promising technique to gain information about the system’s state, its health or wearing. Very often, load situations are challenging to be defined by hand. Hence, these situations need to be discovered as recurrent patterns within multivariate time series data of the system under consideration. Unsupervised algorithms for finding such recurrent patterns in multivariate time series must be able to cope with very large data sets because the system might be observed over a very long time. In our previous work we identified discretization-based approaches to be very interesting for variable length pattern discovery because of their low computing time due to the simplification (symbolization) of the time series. In this paper we propose additional preprocessing steps for symbolic representation of time series aiming for enhanced multivariate pattern discovery. Beyond that we show the performance (quality and computing time) of our algorithms in a synthetic test data set as well as in a real life example with 100 millions of time points. We also test our approach with increasing dimensionality of the time series.
Keywords
Introduction
The purpose of time series pattern discovery algorithms is the detection of recurring similar subsequences within long sequences of data samples without having prior knowledge of the patterns in general. Depending on the background of the user the impression of a pattern can vary a lot and therefore lead to misunderstandings. One of the most used names in the wide field of pattern discovery in time series is motif discovery. In Section 2 both terms motif and pattern discovery will be defined and the differences will be highlighted.
In the automotive industry, as well as in many other industries, the available amount of data is increasing extremely fast. Beside video data also time series data is highly interesting, because it contains a lot of information concerning the system’s state and behavior in all kinds of situations. By the use of pattern discovery methods it is possible to identify different, recurring load situations with (almost) no expert knowledge. Subsequently this opens up the opportunity for automatized analyses regarding anomaly and aging detection or usage of various systems. For those analyses it is necessary to develop algorithms which reliably detect the potentially noisy and time warped patterns (situations) within an acceptable amount of time. Depending on the use case it is sometimes more important to discover patterns with a high similarity, while in other cases the computing time is more critical. Therefore, we need an adjustable algorithm to face both requirements. Furthermore, it is necessary to analyze multivariate time series, because in most technical use cases a system’s state can only be described by multiple signals. Beyond that we assume the pattern length to be a priori unknown and the time series length at least in a 100 million sample scale. In the automotive context this data set length is typical for the whole life of a passenger car.
We now formulate our requirements and expectations concerning a method, that solves the problem of pattern discovery. In contrast to other publications, e.g. Matrix Profile [21], we demand the patterns to be similar not just in shape, but also in amplitude and offset. This is crucial for anomaly detection, due to the fact that technical systems usually are stressed very differently when the amplitude or offset of a pattern is varied. Imagine a vehicle performing three acceleration events with qualitatively similar shape. First, the vehicle accelerates from 0 to 30 kph. The second acceleration is from 30 to 60 kph. Hence, it has an offset of 30 kph in comparison to the first event. Within the third event the vehicle accelerates from 0 to 50 kph and, thus, the event has a higher amplitude than acceleration events one and two. For the powertrain system these three situations are definitely causing different stress due to different characteristics in e.g. torque and engine speed. This is why the patterns need to be invariant to offset and amplitudinal scaling. Beyond this, a certain amount of time scaling is allowed within the members of a pattern. Hence, the members duration may vary. We also expect parts of a member to be stretched, while other parts of the same member may be compressed in comparison to another member of the same pattern. We refer to this characteristic as time warping, because of its similarity regarding Dynamic Time Warping technique to calculate a distance of two subsequences [15]. Nevertheless it would be beneficial to be able to limit the amount of time warping that is accepted within a pattern. Furthermore, we require the method to discover patterns with variable length in multivariate time series. In general, all these requirements are targeted to distinguish different stress patterns as precise as necessary, while maintaining some blur. The blur within patterns is necessary due to the diversity of real life data.
The definition of multidimensionality in the context of pattern discovery strongly varies. In [21] the authors classified the objectives into three categories guided, constrained and unconstrained search. Guided search means discovery of patterns in
In our previous work [12] we already explored and evaluated different approaches for univariate pattern discovery. We identified the discretization based methods to be very interesting because of their capability to run extremely fast by reason of the simplification. In this paper we want to address the weaknesses of those approaches like symbolic toggling and insufficient handling of time warping. Beyond that we present solutions for the handling of multivariate time series. To fit the aforementioned requirements, we developed additional preprocessing steps, based on discretized time series.
The paper is structured as followed. The next section first defines important terms like motif and pattern and then reviews related work concerning pattern discovery in time series. Section 3 introduces new preprocessing steps to increase the quality of discretization based pattern discovery algorithms. We demonstrate their performance in Section 4 using two synthetic and one real life time series. Section 5 finally gives conclusions and an outlook for further research topics.
Background and related work
Background
As mentioned before, the term pattern discovery can be interpreted in many different ways. To create the necessary transparency, at first the fundamental terms time series as well as subsequence need to be defined:
.
A time series
.
A subsequence
Following these definitions every time series
One of the most used names in the wide field of pattern discovery in time series is motif discovery. Unfortunately, definitions of time series motifs, or just motifs, are not consistent in the literature. For example within the Matrix Profile approach it is defined as “the most similar subsequence pair of a time series” [21], while Lin et al. described it as “a pattern that consists of two or more similar subsequences based on some distance threshold” [8]. To solve this inconsistency this work assumes a definition of motifs based on Matrix Profile [21]:
.
A time series motif is the most similar subsequence pair of a time series. Formally,
A trivial match is a subsequence that is matched to itself or to a subsequence largely overlapping with itself. Hence, a subsequence and its adjacent subsequence can not be considered as the motif pair. Note that the original definition of a motif is slightly adapted here, because of the limitation to equal length motifs. In general, two subsequences of a motif can be of unequal length if the corresponding similarity measure is able to cope with it. An algorithm that solves the problem of motif discovery can always be extended to find the k-best motifs like e.g. in [13]. The result is a set of k motifs with the highest similarity or smallest distance scores. In contrast to the term motif, a pattern is now defined by following the explanations of [8]:
.
A time series pattern
Hence, the definition of a pattern includes the definition of a motif. Consequently an algorithm that solves the problem of motif discovery can most likely be extended to do pattern discovery.
Related work
In this paper we focus on discretization based pattern discovery approaches. However, there are other interesting approaches for pattern discovery in time series, e.g. approaches based on Dynamic Time Warping [19, 6] or Euclidean Distance [21]. In contrast to discretization-based approaches they do not need any preprocessing of the time series. On the downside they need much more computing time as we demonstrated in previous work [12]. Beside these classical attempts, there are also some publications dealing with pattern discovery by applying artificial Neural Networks, more precisely convolutional autoencoders like in [1].
The research in discretization based pattern discovery has earned more and more attention since the first introduction of the Symbolic Aggregate Approximation algorithm (SAX) in [9]. It combines the filtering of time series by a Piecewise Aggregate Approximation (PAA) and subsequently equiprobable discretization. Based on this symbolic representation of time series many authors developed methods to find recurring patterns, initially with fixed length and later with variable length. The approaches for fixed length pattern discovery mostly used the random projection algorithm like in [10], which is not suitable for an extension to variable length pattern discovery, due to its massive memory usage. For this kind of requirements the Sequitur algorithm, first introduced in [11], is much better suited. It extracts a hierarchical structure from the symbolic time series and at the same time compresses it by replacing every discovered pattern by a new symbol. Hence, its memory usage reduces with every pattern while running the algorithm. Using these two algorithms, SAX and Sequitur, and adapting them for different purposes became popular. This concept was used for example in [4] and extended by a numerosity reduction technique. Senin et al. also used it in [16] but focused more on the framework and the visualization. The authors of [14] chose an additional preprocessing step by applying linear differential operators and then used SAX and Sequitur to detect the hierarchical structure of the time series. Beyond that in [17] an adaption of the SAX was firstly proposed to calculate a multi-resolution representation, called iSAX. Gao and Lin went a step further in [3] and proposed not only an alternative multi resolution SAX but additionally a data structure for its storage, called adaptive SAX-Forest, and an extension of the Sequitur algorithm, called distance propagation Sequitur. Their main focus was simplifying the search for the best parameters and enabling the handling of large scale time series.
The implementation of multivariate time series pattern discovery can be handled in different ways. As already mentioned, the authors of [21] named three different queries: guided, constrained and unconstrained pattern search in multivariate time series. They proposed a variation of the Matrix Profile taking into account not only each dimension, but every possible subset of dimensions, which is a solution for fixed length pattern discovery. Another common solution is the analysis of every dimension separately with a post processing method to match patterns of different dimensions. A similar approach was already proposed by Höppner in [5] with the use of Allen’s interval algebra, which classifies the relationship between patterns of different dimensions. In [20] the authors proposed the calculation of a coincidence graph, based on univariate random projection results, and a graph clustering method. Minnen et al. [10] first calculated all dimensional patterns and then proposed a relevance estimation for every dimension. Beside these approaches, there is the common practice to reduce the multivariate time series to a univariate time series representation e.g. by applying Principle Component Analysis (PCA) like in [7, 18].
Methodology
Overview of the pattern discovery toolchain including new preprocessing steps.
In this chapter we introduce our new approaches within a general workflow. Figure 1 depicts an overview of our pattern discovery tool chain including all of our newly proposed preprocessing functions which are marked in red. Our baseline methods, that have already been tested in [12], are shown as bordered boxes. The proposed path through the framework is additionally marked by red arrows. While the whole toolchain is highly automated, the user still has the opportunity to influence the results by adapting the parametrization. Especially in case of discretization, domain knowledge can be helpful in order to have the right amount of abstraction without losing important information of the time series.
After a short introduction to our basic tool chain in sec:basic, we describe challenges, that we faced while working on the pattern discovery and discuss possible solutions. In sec:hyst the problem of symbolic toggling is described and solved by the introduction of Univariate Hysteresis. sec:multi proposes the Unification and therefore the handling of multivariate time series. Last but not least, the problem of time warping is introduced in sec:timewarping and solved by the proposition of Logarithmic Reduction technique. Finally, in sec:pd the pattern discovery in symbolic time series is briefly described which is based on previous work [12].
Our basic tool chain is described in our previous work in [12] which includes algorithms for discretization, symbolic reduction and pattern discovery. Later in sec:evaluation we will refer to it as baseline method. Note that this tool chain is only capable of discovering patterns in univariate time series. The problem of multidimensionality is addressed in Section 3.3. Until then only univariate time series are addressed.
In this paper we use the simple approach for equidistant discretization for the univariate time series. In the following sections we will use the term symbol for a discretized value and the number of bins will be denoted as no_steps. Unlike other approaches that use e.g. SAX for discretization, which includes an averaging by PAA, we do not apply any filtering in terms of our baseline method. Symbolic reduction, also known as numerosity reduction technique, deletes all but one symbol in each consecutive sequence of identical symbols. Doing so, the pattern discovery is able to discover patterns that are scaled in time. Unfortunately this reduction approach deletes all information regarding the duration of a sequence. By applying the Sequitur or the Pattern Enumeration algorithm from [12] we are able to discover patterns in univariate time series.
Problem of symbolic toggling
Problem of symbolic toggling (left) and solution by hysteresis (right).
One of the biggest problems in discretizing time series is the symbolic toggling that results from noisy data which is visualized in Fig. 2. The left side shows a subsequence of a univariate time series which is classified into the symbols A and B by a specified discretization boundary. Below the axes the resulting symbolic time series is represented. Obviously there is a time interval with symbols jumping alternatingly between A and B which we call symbolic toggling. Because of the irregularities due to the toggling the patterns can not be identified in a trustworthy way. This is why we need a method to reduce this toggling.
Common solutions are filtering methods to reduce the noise of the time series before discretizing it. One possibility to do it is to apply PAA, which averages the time series and, hence, is able to reduce the toggling. PAA applies a sliding window of size windowsize and calculates the mean value of the values within the window. Starting at the beginning of the time series, the window slides over the whole time series. This sliding is parametrized by stepsize, which defines the number of samples the window is pushed further after the mean of a window is calculated. On the downside PAA smooths the dynamic processes and thus is a limitation for the discovery of highly dynamic patterns. In the automotive context especially the dynamic patterns are of interest for the engineers, which is why we want to avoid these limitations. Nevertheless it still can be a powerful filtering method to reduce the symbolic toggling when the parameters are selected carefully. The issue of smoothing the dynamic processes can be solved by Piecewise Linear Approximation (PLA). However, when using PLA the parameters also have to be set carefully to have the right amount of smoothing while preserving the dynamic processes. Furthermore, implying that the input time series is interpolated at equidistant time steps, it destroys the regular temporal structure of the time series. Note that this kind of temporal structure is helpful for the quality of the patterns due to the inability of the pattern discovery algorithms to take into account the time explicitly. The temporal dimension is handled in more detail in sec:timewarping. To resume, a filtering method is needed which preserves the temporal structure and denoises the time series. Therefore, we could use e.g. a frequency filter based on Fast Fourier Transformation (FFT) or wavelet denoising. Both methods approximately work in the same way, they transform the time series into another spectrum. FFT transforms it into the frequency spectrum. By setting small frequencies to zero and transforming it back to a time series, the signal will be denoised. The cut-of frequency is the parameter that needs to be chosen. In contrast to FFT, the wavelet transformation transforms the time series not only into a sine and cosine frequency spectrum but other base functions. The cut-of frequency will be referred to as denoiselevel. For our parameter experiments in sec:evaluation we test the wavelet denoising on different levels denoiselevel and the PAA with varying windowsize and stepsize.
Beside these common filtering approaches, we now introduce another solution to the problem of symbolic toggling. The concept, that is visualized on the right side of Fig. 2, is known as hysteresis or mathematically speaking backward bifurcation. Hysteresis is basically the delayed effect of a system after a change in the cause and is widely used to describe the physical behavior of e.g. the magnetization of iron. While in most cases the hysteresis can only be observed, we can use this concept to solve the problem of symbolic toggling. Therefore, we assume we already have a discretized time series
The variable
In the example of Fig. 2 the algorithm detects an HBT within the maximum duration when checking the first SVC from A to B. Therefore, the SVC is valid and the symbolic values from index scv to index hbt are changed to the value B. Thus the symbolic toggling is eliminated within the interval
The other two cases are sketched in Fig. 3. The blue box shows a case with no HBT, but instead the duration of value
Behavior of the algorithm if the signal value never breaks through the hysteresis boundary including the special case if instead the value stays in the hysteresis region for at least 
Example of the unification of two dimensional discrete time series.
After proposing solutions the problem of symbolic toggling, we now discuss options for the multivariate pattern discovery. As already mentioned there are basically two approaches in terms of using the pattern discovery algorithms that were already described. The first one is to run the pattern discovery separately on each dimension and then analyze the sequence of the patterns in different dimensions. The second approach is to somehow reduce the dimensionality of the time series to one dimension and then run the pattern discovery just once. The first case is interesting because it opens up the possibility to find not only all-dimensional patterns, like in the second approach, but also shifted in time. Hence, it solves a more general problem than the second approach but also requires much more computation power. Due to the runtime optimization we want to focus on the second approach in this work.
Dealing with dimensionality reduction, Principal Component Analysis (PCA) or Independent Component Analysis (ICA) are usually good choices. For highly correlated multivariate time series, the reduction by PCA or ICA to one dimension will still preserve most of the information contained in the multivariate time series. But the assumption of high correlations between all time series seems to be unrealistic. Hence, even if it is possible to reduce the dimensionality by PCA or ICA to a few dimensions, we need a method to compress these to just one univariate time series, which we call unification. The authors of [2, 14] had these problems in the domains classification and symbolic analysis of time series and already proposed similar solutions. Input for the unification are the discretized and optionally filtered (e.g. by hysteresis) time series of each dimension as visualized in Fig. 1. A more detailed example for a two dimensional subsequence is plotted in Fig. 4. The left hand side shows the discretization process including the hysteresis of signal 1 (top) and signal 2 (bottom) separately. On the right hand side both symbolic time series are unified by transforming them into a new discretization space. This is done by calculating the Cartesian product of all symbolic vectors
Note that every dimension
Example for the standard symbolic reduction with index relation.
As we already showed in [12], we need a numerosity reduction technique, which we call symbolic reduction, in order to identify time warped patterns. Figure 5 shows an example of our baseline technique with three different subsequences being symbolically reduced to the same short symbolic sequence
In this case a subsequence contains only one kind of symbol and
Following these rules, we implemented the algorithm as sketched in pseudologred. Having a subsequence of equal consecutive symbols and a given logarithmic basis logbasis, the algorithms runs a WHILE loop starting in line 7. Previously, in lines 2 to 6 a few initial steps are performed. This includes the vector new_symbols which is as long as the subsequence and will later be used to store the new reduced symbols. The variable
Doing so we have a high resolution (low compression) near to SVCs and a low resolution (high compression) in the center of long sequences of equal symbols. Note that the longer the subsequence is, the higher the compression will be. For an example see Fig. 6. The three subequences of Fig. 5 are logarithmically reduced by our algorithm. Even though they have basically the same sequence of symbols
Logarithmic Reduction[1] LogarithmicReductionsubsequence, logbasis
Note that there are two possibilities to handle the new symbols
Example for the logarithmic symbolic reduction to the basis of Euler’s number using the subsequences from Fig. 5.
After preprocessing the multivariate time series we finally apply the pattern discovery. As already mentioned earlier, the algorithms Sequitur and Pattern Enumeration proposed in previous work [12] are implemented. Both algorithms process symbolic sequences in an offline fashion to discover recurrent symbolic sequences with variable length. Due to the reduction of complexity within the preprocessing it is not necessary anymore to search for similar subsequences. Instead, both pattern discovery algorithms search for identical symbolic sequences, which is computationally less heavy than calculating similarities over and over again.
The pattern enumeration algorithm iteratively scans the whole symbolic time series with increasing symbolic lengths. It extracts every symbolic subsequence that at least occurs twice. Every pattern is documented in a dictionary containing the symbolic subsequence as well as the starting and ending indices of every member corresponding to their occurrence in the original time series. Sequitur is classified as a compression algorithm and is based on two constraints. The first constraint is called digram uniqueness which means that every digram (a pair of two adjacent symbols) is allowed to occur only once in the symbolic time series. If a digram appears more than once then a rule is created that replaces the digram by a new symbol. All the rules are stored in a dictionary. Every rule has to fulfill the second constraint, the rule utility. It says that every rule has to occur at least twice in order to eliminate redundant rules. The algorithm iteratively replaces digrams until no digram occurs twice. Hence, the symbolic time series is compressed with every replacement while all the information to reconstruct it are stored in the dictionary.
Due to the additional preprocessing steps it is now possible to discover time warped patterns with variable length in noisy multivariate time series.
Experimental evaluation
In the following sections the pattern discovery framework will be extensively tested in three experiments. We aim to show the benefit of the proposed preprocessing methods hysteresis, unification and logarithmic reduction in comparison to common methods. sec:synthetic contains a parameter experiment based on a 2-dimensional synthetic data set. Additionally, multiple quality criteria are introduced to evaluate the quality of patterns as well as the quality of parameter settings based on every discovered pattern. By performing grid search the quality of parameter settings is evaluated and discussed. This section is especially focused on the evaluation of filtering approaches including the univariate hysteresis. sec:reallife proposes an experiment based on a real life data set with
Synthetic randomized data set
Experiment description
Our first test case for the pattern discovery is based on synthetic data and the testing framework of our previous work in [12] with just small changes. In general, it includes the creation of a data set with random patterns and random fluctuation in between the patterns, the application of the pattern discovery with multiple parameter sets and finally the evaluation via Jaccard index. The data set consists of many different patterns which are created by applying a random walk with a random step size and random number of steps. Beyond that, we modified the random walk by setting limits for the minimum and maximum value. A varying number of copies for each pattern is generated by duplicating it and by adding some distortion (noise and time warping). Finally the patterns and its noisy copies are concatenated in a random order to create the test data set. In our previous work the patterns were concatenated by inserting a noisy linear function between the last sample of the previous pattern and the first sample of the following pattern. In this work we replaced the linear function by our modified random walk, which runs until the current sample is in a similar value region as the first sample of the following pattern. In doing so, we added more randomness to the time series. Furthermore, we adapted the framework to create multivariate patterns, meaning the modified random walk runs for each dimension independently but with the same number of steps. Hence, a pattern contains a sequence of univariate samples in every dimension with the same number of samples for every dimension. Beside these two adaptions, we created more patterns per data set to challenge the methods even more. However, this is also why the following results are not exactly comparable to the results of our previous work. In addition to the Jaccard index, which we usually call overlap, we added the true positive rate (TPR, also known as recall) and the false discovery rate (FDR) to our focus. These three characteristics are defined as:
with:
While comparing one predefined pattern with one discovered pattern,
Exemplary visualization of index sets for the calculation of overlap, TPR and FDR.
Before going into detail, for visual validation, in Fig. 8 16 exemplary patterns with all their members are shown that were discovered in a synthetic data set with two dimensions. The blue subsequences correspond to the first dimension, while the black ones belong to the second dimension. To preprocess the time series all the proposed techniques were used, namely univariate hysteresis, unification and logarithmic reduction. The method is able to discover visually valid patterns in 2D with variable length.
Exemplary visualization of patterns discovered in an 2D synthetic data set by preprocessing the time series with discretization, univariate hysteresis, unification and logarithmic reduction.
In this experiment we searched for the best performing parameters concerning the characteristics overlap, TPR, FDR and runtime within the parameter space, which is shown in Fig. 1. By performing grid search, we evaluated 6750 different parameter settings including some settings for our baseline approach. For every parameter combination the same synthetic time series is tested, which has roughly 700.000 samples in each of two dimensions and 200 randomly created patterns having between 10 and 30 repetitions.
In Fig. 9 the results concerning the overlap, TPR, FDR and runtime (in seconds) are shown. A good parameter setting should result in a high overlap, low runtime, high TPR and low FDR. In general the pattern enumeration algorithm dominates the adapted Sequitur algorithm in every evaluation characteristic. But note that, independently from the pattern discovery algorithm, the result can be highly improved by adding the proposed preprocessing steps. This can be seen by comparing the results with the baseline results that are highlighted in the figure. For further evaluation we focus on the pattern enumeration algorithm because of its dominance.
Results of parameter experiments (red: all parameter settings using the pattern enumeration algorithm; blue: all settings using the adapted Sequitur algorithm).
Comparison of filtering methods.
Figure 10 shows a more detailed view on the results, highlighting the differences in quality and runtime between the filtering methods PAA, wavelet, hysteresis univariate and no filtering at all. In general any filtering method is better than no filtering if it is parametrized well. The higher the number of bins is, the more important is filtering. Comparing wavelet and PAA filtering, Fig. 10 shows that in the best case PAA works better while also running faster. It also shows clearly that the results along the pareto front are dominated by tests using our proposed univariate hysteresis approach. For a more detailed view see Fig. 11. There we compare nine parameter settings with fixed parameters except for the filtering parameters, which is meaningful for the comparison under equal circumstances. The fixed parameters no_steps and logbasis for each setting are written below the figure. A logbasis of NaN refers to the standard reduction technique. Noticeable is the high variance of the overlap using PAA. Hence, a badly parametrized PAA can also produce results worse than without filtering at all. Focusing on the univariate hysteresis the variance is less than in case of the PAA. Generally speaking, high values of the parameter
Overlap of different filter parameters, while fixing all other parameters. Parameter settings 1, 2, 4, 5, 7 and 8 used the logarithmic reduction. Parameter settings 3, 6 and 9 used the standard reduction technique.
To complete the analysis of this experiment in Fig. 12 the results concerning the symbolic reduction are shown. The standard symbolic reduction, proposed in previous work, leads to good results concerning the overlap and the TPR. On the downside it also produces extremely high FDR values when the parameters are not set carefully, which is a consequence of the extreme reduction. By the use of the logarithmic symbolic reduction it is possible to reduce FP and consequently the FDR in those cases. In this synthetic test case the higher accuracy also leads to a decrease in overlap and TPR, because the predefined patterns are designed to be time warped. Hence, the same predefined pattern which can be perfectly detected by the standard symbolic reduction will be divided into multiple discovered patterns when using the logarithmic approach. To show the advantages of this effect in more detail, we introduce a real life test in the next section.
Comparison of number of bins and symbolic reduction techniques.
Voltage measurements with charging events marked in black.
Experiment description
This experiment focuses on the evaluation of the logarithmic symbolic reduction in comparison to the standard symbolic reduction in terms of the trade off between improved quality and runtime. Therefore, we choose a real life data set of a battery voltage with roughly 130 mio. samples and more than 2000 labeled charging events. An extract of this data set is shown in Fig. 13 with charging events which are marked in black. The goal of this experiment is to refind all the charging events by the use of the pattern discovery. Note that this use case is just a synthetic one in order to evaluate the performance of the method regarding a labeled data set. The actual task of finding charging events could be easily done without the pattern discovery. However, this is a hard task for the pattern discovery because the voltage within a charging event does not always behave in the same way. It depends on the State of Charge (SOC) at the beginning and in the end of a charging event, on the temperature, on the charging current and on some other effects. Hence, the start and end voltage, the gradient and the duration of a charging event strongly vary, which will be shown subsequently. Because of this variation it is not the goal to parametrize the pattern discovery in a way to put all charging events in one pattern, but to find a set of patterns which describes the charging events as precisely as possible. Therefore, we additionally define the precision also known as Positive Predicted Value (PPV) as:
We tested the same data set with six different parameter settings varying the parameters no_steps and logbasis while fixing the other parameters regarding the results of the previous synthetic experiments. Hence, we used the univariate hysteresis as the only filtering technique and the pattern enumeration algorithm. In Table 1 the results of this experiment are shown with an assignment concerning the parameters. We evaluated for every parameter setting the number of patterns which are necessary to describe more than 90 percent of the charging events. For all those patterns we calculated the PPV and TPR. The average of the PPV and TPR for each parameter setting is shown in the table. Furthermore, these characteristics are complemented by combined PPV and combined TPR. Therefore, all pattern are summarized in one global pattern. For this global pattern PPV and TPR are calculated in order to obtain the comined PPV and TPR. Additionally, the runtime of all preprocessing functions and the pattern discovery algorithm overall is shown.
Evaluation of real life experiment with 6 different parameter settings. The reduction with the value NaN means the usage of the standard symbolic reduction while the other values specify the parameter logbasis of the logarithmic symbolic reduction
left: Representatives of every rediscovered charging pattern found by the pattern discovery algorithm (
As expected, the number of patterns to represent the charging events is increasing with the usage of the logarithmic symbolic reduction and its parameter logbasis. Hence, looking e.g. at the first column of Table 1, each pattern represents only a few charging events which also results in a low average TPR. To further visualize the variation of the charging events Fig. 14 (left) shows one member of every 31 patterns in a different color for the parameter setting of the second column. Note that the colors repeat for different patterns due to insufficient availability of colors. On the right side of Fig. 14 one exemplary pattern with its members and a representative is visualized. However, the average PPV as well as the combined PPV are much higher with the use of the logarithmic reduction in comparison to the standard reduction, meaning less false positives per pattern. At the same time every parameter setting is able to represent a similar amount of charging events that is shown with the values of the combined TPR. The pattern discovery is now able to differentiate between a dynamic symbolic value change and a slow long term value change by the use of our new proposed reduction technique. Because of the higher precision the logarithmic reduction technique can be seen as an additional built in clustering method. This of course comes with costs in runtime, because of the lower time series compression and the higher amount of discoverable patterns. Beyond that, the logarithmic reduction has a more positive impact when using a low number of bins.
However, what does that improvement mean to our initial use case of anomaly and aging detection? A first task for anomaly detection in vehicle data is for example: Discover similar stress patterns in the voltage signal of a battery-electric vehicle. By applying the pattern discovery it is possible to identify different situations in the voltage signal that occur frequently. Within similar situations, which are members of a pattern, analyses regarding the system’s behavior can be performed. These analyses aim, for example, to differentiate between normal and abnormal behavior or to identify aging of components. These patterns can, for instance, represent dynamic behavior while the vehicle is driven as well as long-term situations like charging events. As we demonstrated previously, by means of the additional preprocessing methods, it is possible to differentiate between different kinds of charging events with less false positives than with previous approaches. The differentiation is necessary due to the high variance of charging events, as shown in Fig. 14 (left). Because the anomaly and aging detection is based on patterns, it is crucial to have a narrow definition of patterns. This is achieved by the additional preprocessing methods.
Test of the pattern discovery with varying number of dimensions.
Problem of transitional states in multivariate time series.
Additionally, we made a third experiment concerning the results in a growing dimensional space to show that this approach works in an environment with more than two dimensions. Therefore, we generated a synthetic 5-dimensional time series by applying the random data generator from 4.1 with roughly 0.5 million samples per dimension. Afterwards we tested the pattern discovery with a limited number of parameter settings and a varying number of dimensions, starting with 2 dimensions. The results are shown in Fig. 15. As expected, the quality of the results is slightly decreasing with increasing number of dimensions. This is because of the increasing complexity and the appearance of transitional states in the unified symbolic time series which is sketched in Fig. 16. These two sequences are qualitatively similar to the human eye. But if we go through our tool chain with both of these sequences, we see that they have not the same sequence of symbols after the unification process. Even though they both go from symbol 2 to symbol 3, they have different transitional states, which makes it impossible for the pattern discovery to detect both sequences as the same pattern. This problem gets worse the more dimensions are analyzed and, hence, is a bottleneck of our proposed unification approach. Nevertheless, Fig. 15 shows that our pattern discovery tool chain has acceptable results even for 5-dimensional analyses. A strategy for good results in high dimensional spaces is the reduction of the number of bins per dimension.
Conclusion and outlook
In this paper we proposed additional preprocessing steps for the symbolic time series representation aiming for enhanced multivariate pattern discovery. Therefore, we introduced a new filtering approach, called hysteresis, and showed that its usage improves the pattern discovery results in comparison to common filtering approaches like PAA or wavelet. Furthermore, we presented the unification, a method to reduce multivariate time series to an univariate symbolic representation, which enables common pattern discovery algorithms to be used for multivariate analyses. Beyond that, we showed that the introduced logarithmic symbolic reduction is able to improve the quality and similarity of the patterns while not adding too much computing time. Using this reduction technique the pattern discovery is now able to differentiate between a dynamic symbolic value change and a slow long term value change.
The performance of the additional preprocessing methods was proven in three different experiments. First of all, an extensive parameter test based on a synthetic data set was performed to evaluate the quality of the newly proposed methods in comparison to alternative preprocessing methods. The quality was evaluated by multiple criteria like overlap, True-Positive-Rate, False-Discovery-Rate and Positive-Predicted-Value based on implanted patterns within the synthetic data set. In a second experiment real life data of a vehicle was used to show especially the benefit of logarithmic reduction as well as the ability of the whole framework to process large amounts of time series data (130 mio. samples) in short time (1–30 minutes). Additionally, we used this experiment for a short excursus regarding anomaly and aging detection in vehicle data based on patterns. For this application it is crucial to have a narrow definition of patterns. This is achieved by the additional preprocessing methods. Furthermore, a third experiment evaluated the performance of the method with increasing number of dimensions based on a 5-dimensional synthetic data set. It showed that our pattern discovery tool chain has acceptable results even for 5-dimensional analyses.
In our future work we will focus on solutions for the elimination of transitional states which we identified as a bottleneck for the multivariate pattern discovery in our framework. Furthermore, we will enable our algorithms for distributed computing and optimize them e.g. regarding the automatized estimation of optimal parameter settings. Moreover, an extension to multivariate pattern discovery with irrelevant dimensions is highly interesting.
