Efficient methods to set decay factor of time decay model over data streams

Abstract

The time decay model (TDM) is frequently used for mining frequent patterns on data streams, because the information embedded in the data from new transactions is particularly valuable. However, some existing methods set the decay factor of the TDM randomly, so their results are unstable. Other methods focus only on 100% recall or 100% precision of the algorithm and ignore the corresponding precision or recall. In order to balance high recall and high precision while ensuring the stability of the result, a novel average decay factor is designed. In addition, to further increase the weights of the latest transactions and reduce the weights of historical transactions, a novel Gaussian decay factor is proposed. Hence, based on an analysis of existing decay factors, this paper designs two novel decay factors and two novel TDMs. Algorithms based on these two TDMs are designed to discover frequent patterns over data streams. The methods are evaluated via experiments on high density and low density data streams. The findings show that the average decay factor can balance the high recall and high precision of the algorithm, and that the Gaussian decay factor can produce better performance than existing algorithms.

Keywords

Data streams mining; frequent pattern mining; decay factor; time decay model; Gaussian function

1. Introduction

A data stream is ordered, changing, massive and unbounded, so the knowledge it contains changes over time. Typically, recent transactions contain more valuable knowledge than historical transactions [1, 19]. Therefore, mining frequent patterns on a data stream emphasizes the patterns generated by recent transactions. For this reason, several models, namely landmark window models (LWM) [28], sliding window models (SWM) [9], and time decay models (TDM, also called damped window models) [26], have been used to find recent interesting information in data streams.

The sliding window model and the time decay model are commonly used to discover patterns. The SWM captures a fixed number of recent transactions in a window and focuses on discovering the patterns within the window. Based on the sliding window model, methods have been proposed to find frequent patterns [4, 27], high-utility patterns [8, 28] or sequential patterns [10, 16] over data streams. In the time decay model, each transaction is assigned a weight that decreases over time, so recent transactions are more important than old ones. Some algorithms have been proposed to find frequent patterns based on time decay models [9, 26]. The sliding window model emphasizes frequent pattern mining in the recent sliding window, but the weights of all transactions in the window are the same. The time decay model differentiates the weights of recent and historical transactions within the window.

However, it is difficult to choose an appropriate decay function or decay rate for a TDM. There are three ways to set the decay factor, as follows.

(1) The value of the factor is defined as a random value in the range (0, 1] [14, 26]. This may cause instability of the results, because different random values lead to different outcomes.

(2) The value of the factor is defined as a constant value that is independent of the window size [3, 11]. Its disadvantage is that the performance of the algorithm may vary widely as the window size changes.

(3) The value of the factor is defined as a boundary value. The upper or lower bound is obtained by assuming 100% recall or 100% precision of the algorithm [5, 15]. The drawback is that high recall is obtained at the cost of low precision, or high precision at the cost of low recall.

In summary, based on TDM and SWM, the study of mining frequent patterns on data streams should focus on two things. The first is to discover the relationship among several parameters: the sliding window size, the minimum support threshold, the maximal support error threshold and the decay factor. Even after the values of the first three parameters are determined, it is difficult to determine the value of the decay factor. The second is to balance high recall and high precision of the algorithm and to ensure the stability of the results. In this paper, we summarize and analyze the existing ways of setting decay factors, and we propose two new factors. Our contributions can be summarized as follows:

(1) A new average decay factor f_average is proposed in order to balance high recall and high precision of the algorithm.

(2) A novel Gaussian decay factor f_gauss is proposed to increase the weights of recent transactions and to reduce the weights of historical transactions in the sliding window. Meanwhile, we explore the close relation among the decay factor f, the variance σ² and the sliding window size N.

(3) Two new kinds of TDMs based on f_average and f_gauss are proposed. Algorithms based on these two TDMs are proposed to mine frequent patterns on data streams with different data characteristics.

(4) An experimental study on real-world and synthetic data streams shows the potential of the proposed algorithms.

The rest of this paper is organized as follows. Section 2 presents background knowledge about frequent pattern mining on data streams and the time decay model, and summarizes and analyzes three categories of methods for setting decay factors. Two new decay factors and comparisons of different decay factors are introduced in Section 3. Section 4 describes the experiments and explains the experimental results to verify the reasonableness of the novel decay factors. Section 5 concludes the work.

2. Related work

2.1. Preliminaries

A data stream DS = <T₁, T₂, …, T_i, …> is a continuous and unbounded sequence of transactions, where T_i (i = 1, 2, …) is the ith transaction. Let A = {a₁, a₂, …, a_n} be the set of items in DS; each T_i (i = 1, 2, …) is a subset of A. A transaction is a tuple, denoted as (TID, itemset), that contains a unique transaction identifier TID and an itemset, as shown in Table 1.

Table 1
Four transactions in a data stream


TID	itemset
1	1 3 4
2	2 3 5
3	1 2 3 5
4	2 3 4 5

In applications of time-sensitive data streams, users are most interested in the recently arrived transactions, so the sliding window model is suitable for such cases. A sliding window SW of size N contains the most recent N transactions in DS. A pattern P = {p₁, p₂, …, p_k} is a subset of A. The frequency of P, denoted as freq(P), is the number of transactions in SW in which P occurs. The support of P in SW, denoted as sup(P), is defined as freq(P)/N. P is a frequent pattern in SW if sup(P) ≥ θ, where θ (0 ≤ θ ≤ 1) is a minimum support threshold. P is a frequent closed pattern in SW if P is a frequent itemset in SW and there exists no itemset Y in SW such that P ⊂ Y and freq(P) = freq(Y). In this paper, we focus on mining frequent closed patterns over data streams on the basis of the sliding window model and the time decay model.

Because a data stream is continuous and unbounded, the knowledge it contains may change over time. Under normal circumstances, a recent transaction is more valuable than a historical one. Therefore, it is necessary to increase the weight of recent transactions. A TDM gradually decays the occurrence frequency of the patterns contained in the transactions over time [5].

Let the decay ratio of the frequency per unit time be the decay factor f (f ∈ (0, 1]). When transaction T_i arrives, the decay frequency of a pattern P is denoted as freq_d(P, T_i). When a transaction T arrives, freq_d(P, T) is initialized to 1 if T contains P, and to 0 otherwise. Each time a new transaction arrives, freq_d(P, T) is multiplied by the decay factor f. When the mth transaction T_m arrives, the decay frequency freq_d(P, T_m) of P is calculated by Formulas 1 and 2, where r is 1 if T_m contains P and 0 otherwise.

${freq}_{d}(P, T_{m}) = \begin{cases} r, & \text{if } m = 1 \\ {freq}_{d}(P, T_{m-1}) \times f + r, & \text{if } m \geq 2 \end{cases}$ (1)

$r = \begin{cases} 1, & \text{if } P \subseteq T_{m} \\ 0, & \text{otherwise} \end{cases}$ (2)

From the above formulae, it can be concluded that the decay frequency of pattern P is no greater than its original frequency, that is, freq_d(P) ≤ freq(P). For instance, set the sliding window size N = 1000 and the minimum support threshold θ = 0.01. If freq(P) ≥ θ × N = 10, P is a frequent pattern. However, if the decay factor is f = 0.8, then freq_d(P) < 1/(1 - 0.8) = 5, as calculated by Formula 3. That means no pattern can reach the threshold of 10 under the time decay model, so no pattern is mined.

$\begin{aligned} {freq}_{d}(P, T_{m}) &= {freq}_{d}(P, T_{m-1}) \times f + r = \sum_{i=1}^{m} r_{i} \times f^{m-i} \\ &= r_{1} \times f^{m-1} + r_{2} \times f^{m-2} + \dots + r_{m} \\ &\leq f^{m-1} + f^{m-2} + \dots + 1 \leq \frac{1}{1-f} \end{aligned}$ (3)
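The bound in Formula 3 can be checked numerically. The sketch below applies the recurrence of Formula 1 to the most favourable case, a pattern that occurs in every transaction; the function name `decayed_frequency` and the flag list are illustrative and do not come from the paper.

```python
def decayed_frequency(occurrences, f):
    """Apply Formula 1: freq_d <- freq_d * f + r for each arriving transaction.

    occurrences: iterable of 0/1 flags r (1 if the transaction contains P).
    f: decay factor in (0, 1].
    """
    freq_d = 0.0
    for r in occurrences:
        freq_d = freq_d * f + r
    return freq_d


# Hypothetical best case for P: it occurs in all N = 1000 transactions.
N, theta, f = 1000, 0.01, 0.8
freq_d = decayed_frequency([1] * N, f)
bound = 1 / (1 - f)        # right-hand side of Formula 3
threshold = theta * N      # count needed for P to be frequent
print(freq_d, bound, threshold)
```

Even in this best case the decayed frequency stays at the bound of 5, below the threshold of 10, which is the situation the text describes for f = 0.8.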

With TDM, some possible frequent patterns may be lost with only minimum support threshold. In order to reduce the number of missing possible patterns, frequent patterns and sub-frequent patterns need to be maintained during mining process. In addition, in order to reduce the cost of maintaining patterns, non-frequent patterns need to be abandoned. The definitions of frequent patterns are shown as follows. By this method, the possible error of lost patterns is no greater than the maximal support error ɛ [5].

Definition 1. Let θ (θ ∈ (0, 1]) be the minimum support and ɛ (ɛ ∈ (0, θ)) be the maximal support error. If freq_d(P) ≥ θ × N, P is a frequent pattern. If ɛ × N ≤ freq_d(P) < θ × N, P is a sub-frequent pattern. Otherwise, if freq_d(P) < ɛ × N, P is a non-frequent pattern.
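Definition 1 amounts to a three-way threshold test on the decayed frequency. A minimal sketch follows; the function name `classify` and the string labels are assumptions made for illustration.

```python
def classify(freq_d, n, theta, eps):
    """Label a pattern by its decayed frequency, following Definition 1."""
    if not (0 < eps < theta <= 1):
        raise ValueError("expected 0 < eps < theta <= 1")
    if freq_d >= theta * n:
        return "frequent"
    if freq_d >= eps * n:
        return "sub-frequent"
    return "non-frequent"
```

For example, with N = 1000, θ = 0.05 and ɛ = 0.005, the two cut-offs are 50 and 5, so a decayed frequency of 20 falls in the sub-frequent band.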

Example 1. There are 4 transactions in the data stream, as shown in Table 1. Let the minimum support θ = 0.5, the maximal support error ɛ = 0.1 × θ and the decay factor f = 0.8. Using Formula 1 to generate decay frequencies, the mining process on the data stream is shown in Fig. 1. ClosedTable, which consists of three fields (Cid, CP and SCP), is used to maintain the information of closed itemsets. Each closed itemset CP is assigned a unique closed identifier Cid, and its decay frequency is denoted as SCP. When a new transaction T_m arrives, the frequencies of all itemsets in ClosedTable need to be updated. The details are shown in FrequentTable in Fig. 1. For instance, when T₃ arrives, we update ClosedTable first. For itemsets not contained in T₃, r⁽³⁾ = 0, so freq_d({1 3 4}) = 0.8 × f + r⁽³⁾ = 0.64. For itemsets contained in T₃, r⁽³⁾ = 1, so freq_d({2 3 5}) = 1 × f + r⁽³⁾ = 1.8 and freq_d({3}) = 1.8 × f + r⁽³⁾ = 2.44. The next step is adding the new itemset {1 2 3 5} into ClosedTable with freq_d({1 2 3 5}) = r⁽³⁾ = 1. Repeating this for each arriving transaction generates the frequencies of all itemsets.
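The arithmetic of Example 1 can be replayed directly from Table 1. The sketch below tracks only the four itemsets named in the example and updates each of them on every arrival; the helper name `decayed_frequencies` is illustrative, and the full ClosedTable of Fig. 1 may hold more entries than these.

```python
def decayed_frequencies(transactions, patterns, f):
    """Return {pattern: freq_d} after all transactions, using Formulas 1 and 2."""
    freq = {p: 0.0 for p in patterns}
    for t in transactions:
        for p in patterns:
            r = 1 if p <= t else 0      # Formula 2: r = 1 iff P is a subset of T_m
            freq[p] = freq[p] * f + r   # Formula 1
    return freq


table1 = [frozenset({1, 3, 4}), frozenset({2, 3, 5}),
          frozenset({1, 2, 3, 5}), frozenset({2, 3, 4, 5})]
patterns = [frozenset({1, 3, 4}), frozenset({2, 3, 5}),
            frozenset({3}), frozenset({1, 2, 3, 5})]

# State right after the third transaction arrives, with f = 0.8.
after_t3 = decayed_frequencies(table1[:3], patterns, f=0.8)
for p in patterns:
    print(sorted(p), round(after_t3[p], 4))
```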

Fig.1

Mining frequent patterns on data stream based on TDM.

2.2. Decay function and decay factor

A decay function takes the initial weight and the age of an itemset. Therefore, the weight of an itemset in the sketch can decrease with time according to a user-specified decay function.

Definition 2. A decay function factor (f, x) takes two parameters, the decay factor f ≥ 0 and an integral age x ≥ 0, which should satisfy the following conditions. (1) factor (f, x) ≥0 for all f, x; (2) if f₁ > f₂, then factor (f₁, x) ≥ factor (f₂, x); (3) if x₁ > x₂, then factor (f, x₁) ≤ factor (f, x₂).

The most commonly used decay function has the form factor(f, x) = f^x, where x is the age and f is the decay factor. There are three ways to set the decay factor f: a user-specified value, a constant value and a variable value.

(1) Set the decay factor to a user-specified value in the range of (0, 1].

For instance, algorithm IncSpam [16] mines sequential patterns over a sliding window by using a lexicographical tree. It reflects the greater importance of more recent information by means of a static decay function with a user-defined decay value of 0.999. Framework GUIDE [17] mines maximal high utility itemsets from data streams by assuming a decay factor of 0.9. DUF-streaming [14] uses the time fading window to mine frequent itemsets from uncertain data streams, setting the time fading factor f to a user-specified value in the range (0, 1]. MPM [26] defines a damping factor, which is a real number ranging from 0 to 1. The importance of transactions generated from data streams is then determined by this factor: as time passes after a transaction arrives from a data stream, the importance of the transaction (frequency, utility, etc.) is multiplied by the factor.

Setting the decay factor to a user-specified value may lead to very different function values. For example, let f = 0.9999, 0.999, 0.99 or 0.9 and x = 10, 100 and 1000; the values of factor(f, x) then differ widely, as shown in Table 2. If f = 0.999 (as in algorithm IncSpam [16]) and x = 1000, then factor(f, x) = 0.36769542, while if f = 0.9 (as in GUIDE [17]) and x = 1000, then factor(f, x) = 1.7479E-46. The gap between these two decay function values is large. Therefore, setting f to user-specified values may cause unstable mining results.

Table 2
Values of decay function factor (f, x)

f\x	10	100	1000
0.9999	0.999	0.990049339	0.90483289
0.999	0.990045	0.904792147	0.36769542
0.99	0.904382	0.366032341	4.3171E-05
0.9	0.348678	2.65614E-05	1.7479E-46
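The entries of Table 2 follow from factor(f, x) = f^x and can be regenerated in a few lines. The sketch assumes ages of 10, 100 and 1000, since the first data column matches x = 10 (for example 0.9^10 ≈ 0.348678); the names `factor`, `ages` and `table2` are illustrative.

```python
def factor(f, x):
    """Exponential decay function factor(f, x) = f**x."""
    return f ** x


ages = [10, 100, 1000]
table2 = {f: [factor(f, x) for x in ages] for f in (0.9999, 0.999, 0.99, 0.9)}

for f, row in table2.items():
    print(f, ["%.6g" % v for v in row])
```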

(2) Set the decay factor to a constant value regardless of parameter N.

Chang [3] and Lee [13] define a decay factor by two parameters: a decay-base b and a decay-base-life h. The decay-base b determines the amount of weight reduction per decay-unit and is greater than 1. When the weight of the current information is set to 1, the decay-base-life h is defined as the number of decay-units after which the current weight becomes b^-1. Based on these two parameters, a decay factor f_d is defined as shown in Formula 4, where b > 1, h ≥ 1 and b^-1 ≤ f_d < 1.

$f_{d} = b^{-(\frac{1}{h})}$ (4)

HewaNadungodage [11] suggested that the half-life should be used to set the decay factor so as to mine frequent patterns over uncertain data streams. The half-life, denoted by t_1/2, is the period of time it takes for a substance undergoing decay to decrease by half. An exponential decay process can be described by the following formula:

$f = e^{-α}$ (5)

where α is a positive number called the decay constant of the decaying quantity, and its value is $α = \frac{\ln 2}{t_{1/2}}$.

The value of the decay factor is then given by Formula 6.

$f_{halflife} = e^{-(\frac{\ln 2}{t_{1/2}})}$ (6)
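Formulas 4 and 6 are both one-line computations. The sketch below evaluates them; the function names are illustrative, and the parameter checks simply mirror the ranges stated in the text.

```python
import math


def decay_factor_base_life(b, h):
    """Formula 4: f_d = b ** (-1 / h), with b > 1 and h >= 1."""
    if b <= 1 or h < 1:
        raise ValueError("expected b > 1 and h >= 1")
    return b ** (-1.0 / h)


def decay_factor_half_life(t_half):
    """Formula 6: f_halflife = exp(-ln 2 / t_half)."""
    if t_half <= 0:
        raise ValueError("expected t_half > 0")
    return math.exp(-math.log(2) / t_half)


print(decay_factor_base_life(2, 10000))
print(decay_factor_half_life(10) ** 10)   # weight left after one half-life
```

Note that with b = 2 the two definitions coincide, because a decay-base-life with base 2 is exactly a half-life. Neither formula involves the window size N, which is the limitation discussed next.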

These methods set the decay factor f without considering its relationship with the sliding window size. Therefore, with the change of the window size, the performance of algorithm may be quite different.

(3) Set the decay factor to variable value related to parameters N, θ and ɛ.

Some of the frequent patterns retrieved by stream mining algorithm are sub-frequent or non-frequent. To measure the quality of approximate mining results, two performance metrics recall and precision [5, 6] are commonly used. Given a set of true frequent patterns Q and a set of retrieved patterns O, then $recall = \frac{| Q \cap O |}{| Q |}, precision = \frac{| Q \cap O |}{| O |} .$ (7)
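Formula 7 translates directly into set operations. In the sketch below the function name and the convention of returning 1.0 for an empty denominator are assumptions, not part of the paper.

```python
def recall_precision(true_patterns, retrieved_patterns):
    """Formula 7 on two collections of patterns (each pattern a frozenset)."""
    q, o = set(true_patterns), set(retrieved_patterns)
    hits = len(q & o)
    recall = hits / len(q) if q else 1.0       # assumed convention for empty Q
    precision = hits / len(o) if o else 1.0    # assumed convention for empty O
    return recall, precision


# Hypothetical example: four true patterns, three retrieved, two in common.
Q = [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({2, 3})]
O = [frozenset({1}), frozenset({2}), frozenset({5})]
recall, precision = recall_precision(Q, O)
print(recall, precision)
```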

Two conditions are used to set the decay factor. Firstly, assuming that the recall of the algorithm is 100%, all frequent patterns are discovered under the time decay model. Consider a situation in which only θ × N transactions contain the pattern P; then P is frequent because freq(P) ≥ θN. In order to get 100% recall, P should be selected even when the decayed frequency of P reaches its minimum. Theorem 1 shows a lower bound for the decay factor if 100% recall is assumed.

Theorem 1. With the time decay model, recall will be 100% if the decay factor f satisfies the following condition: $f \geq \sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}}$ (8)

Proof. Suppose the θN transactions that contain P are the oldest transactions in SW, as shown in Fig. 2(a). Then the minimum decay frequency of P is:

$\begin{aligned} {freq}_{d}(P, T_{n}) &= f^{N - θN}(1 + f + \dots + f^{θN - 1}) \\ &= f^{N - θN} + f^{N - θN + 1} + \dots + f^{N - 1} \end{aligned}$ (9)

In this case, all the frequent patterns will be discovered if the minimum decay frequency is no smaller than (θ - ɛ)N. So the following condition should be satisfied:

$f^{N - θN} + f^{N - θN + 1} + \dots + f^{N - 1} \geq (θ - ɛ)N$ (10)

According to the inequality of arithmetic and geometric means [5], shown in Inequality 11, Inequality 12 follows. The exponents N - θN, …, N - 1 form an arithmetic sequence of θN terms whose sum is θN(2N - θN - 1)/2, which gives the power of f on the right-hand side of Inequality 12. Combining Inequality 10 with Inequality 12, Inequality 10 holds whenever Inequality 13 holds.

$A_{k} = \frac{x_{1} + x_{2} + \dots + x_{k}}{k}, \quad G_{k} = \sqrt[k]{x_{1} \times x_{2} \times \dots \times x_{k}} \Rightarrow A_{k} \geq G_{k}$ (11)

$\begin{aligned} & \frac{f^{N - θN} + f^{N - θN + 1} + \dots + f^{N - 1}}{θN} \geq \sqrt[θN]{f^{N - θN} \times f^{N - θN + 1} \times \dots \times f^{N - 1}} \\ \Rightarrow\; & f^{N - θN} + f^{N - θN + 1} + \dots + f^{N - 1} \geq θN \times \sqrt[θN]{f^{N - θN} \times f^{N - θN + 1} \times \dots \times f^{N - 1}} = θN \times f^{\frac{2N - θN - 1}{2}} \end{aligned}$ (12)

$θN \times f^{\frac{2N - θN - 1}{2}} \geq (θ - ɛ)N \Rightarrow f \geq \sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}}$ (13) □

Fig.2

The model of mining a sliding window of data stream, (a) the oldest θN transactions, (b) the newest θN-1 transactions.

The upper bound of the decay factor can be obtained by assuming 100% precision. For this, the maximal decay frequency of a non-frequent pattern P needs to be taken into account, which occurs when the newest θN - 1 transactions contain P, as shown in Fig. 2(b). If P is excluded from the mining result no matter how large its decay frequency is, then none of the other non-frequent patterns will be selected. Theorem 2 shows the upper bound for the decay factor when 100% precision is assumed.

Theorem 2. [5] With the time decay model, precision will be 100% if the decay factor f satisfies the following condition: $f < \frac{(θ - ɛ) N - 1}{(θ - ɛ) N}$ (14)

Proof. Suppose the newest θN - 1 transactions in the sliding window contain P; then the decayed frequency of P takes its maximal value, which is: ${freq}_{d}(P, T_{n}) = 1 + f^{1} + \dots + f^{θN - 2}$ (15)

To ensure that P will not be discovered, Inequality 16 should be met.

$\begin{aligned} & 1 + f^{1} + \dots + f^{θN - 2} < (θ - ɛ)N \\ \Rightarrow\; & \frac{1 - f^{θN - 1}}{1 - f} < (θ - ɛ)N \\ \Rightarrow\; & 1 - f^{θN - 1} < (θ - ɛ)N \times (1 - f) = (θ - ɛ)N - f(θ - ɛ)N \\ \Rightarrow\; & f(θ - ɛ)N - f^{θN - 1} < (θ - ɛ)N - 1 \end{aligned}$ (16)

Since f^(θN-1) > 0, Inequality 16 holds whenever the following holds. $f(θ - ɛ)N < (θ - ɛ)N - 1 \Rightarrow f < \frac{(θ - ɛ)N - 1}{(θ - ɛ)N}$ (17) □

These methods set the decay factor by considering the relationship between f and the window size, the minimum support and the maximal error, so the scalability of the result is good. The common method sets f to the lower bound obtained by assuming 100% recall [5]. The shortcoming of this method is a low degree of decay and low precision of the algorithm, because it only takes the 100% recall of the algorithm into consideration.
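The two bounds can be checked numerically against the worst cases used in the proofs. The sketch below uses N = 1000, θ = 0.05 and ɛ = 0.1 × θ as illustrative parameters; the function and variable names are not from the paper.

```python
def f_recall(n, theta, eps):
    """Lower bound of Theorem 1 (Formula 8)."""
    return ((theta - eps) / theta) ** (2.0 / (2 * n - theta * n - 1))


def f_precision(n, theta, eps):
    """Upper bound of Theorem 2 (Formula 14)."""
    c = (theta - eps) * n
    return (c - 1) / c


n, theta = 1000, 0.05
eps = 0.1 * theta
k = round(theta * n)                 # theta * N = 50 supporting transactions

# Worst case for recall: P occurs only in the oldest theta*N transactions,
# whose ages run from N - theta*N to N - 1.
lo = f_recall(n, theta, eps)
oldest_sum = sum(lo ** x for x in range(n - k, n))

# Worst case for precision: P occurs in the newest theta*N - 1 transactions,
# whose ages run from 0 to theta*N - 2.
hi = f_precision(n, theta, eps)
newest_sum = sum(hi ** x for x in range(k - 1))

print(lo, oldest_sum, hi, newest_sum, (theta - eps) * n)
```

With the lower bound, the oldest-case sum stays just above (θ - ɛ)N = 45, so the truly frequent pattern is kept. With the upper bound, the newest-case sum stays below 45, so the non-frequent pattern is rejected.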

3. Novel ways to set the decay factors

The existing methods described above set the decay factor to a user-specified value, a constant value, or a boundary value obtained by assuming 100% recall or 100% precision of the algorithm. The shortcomings of these methods are: (1) the former two do not consider the relationship between the decay factor and the other parameters; (2) the latter achieves high recall or high precision at the cost of low precision or low recall, respectively.

In order to balance high recall and high precision, we propose two novel ways to set the decay factor. Firstly, we set the decay factor to the average of the bounds obtained by assuming 100% recall and 100% precision. Secondly, to further increase the weight of new transactions and reduce the weight of historical transactions, we propose a Gaussian function method to set the decay factor.

3.1. Average decay factor

This section focuses on the parameters: the sliding window size N, the minimum support threshold θ (θ ∈ (0, 1)) and the maximal support error ɛ (ɛ ∈ (0, θ)). How is the value of the decay factor f determined? Common approaches set the decay factor to the boundary values f₁ and f₂, as shown in Formula 18 and Formula 19. The value f₁, known as the lower bound of the decay factor, is obtained by assuming 100% recall of the algorithm. The upper bound f₂ is obtained by assuming 100% precision of the algorithm.

Because recall and precision cannot both be 100% simultaneously, a balance between the two should be considered when setting f. The first novel decay factor proposed in this paper is based on the average of the upper and lower bounds; it is called f_average and is shown in Formula 20. Because the value 1 is much smaller than the value $(θ - ɛ)N(\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + 1)$, it can be neglected. Therefore, Formula 20 can be simplified to Formula 21. The value f₃ is the average of f_recall and f_precision, so the recall and precision of the algorithm should be balanced in theory. Meanwhile, it avoids the randomness of setting the decay factor.

$f_{1} = f_{recall} = \sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}}$ (18)

$f_{2} = f_{precision} = \frac{(θ - ɛ)N - 1}{(θ - ɛ)N}$ (19)

$\begin{aligned} f_{3} = f_{average} &= (f_{recall} + f_{precision})/2 \\ &= \frac{\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + \frac{(θ - ɛ)N - 1}{(θ - ɛ)N}}{2} \\ &= \frac{(θ - ɛ)N(\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + 1) - 1}{2N(θ - ɛ)} \end{aligned}$ (20)

$\Rightarrow f_{3} \approx \frac{\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + 1}{2}$ (21)
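To see how Formula 20 and its simplification in Formula 21 behave for concrete parameters, the sketch below computes both. The function name `f_average` and the `simplified` flag are illustrative; the parameters N = 10K, θ = 0.05 and ɛ = 0.1 × θ are the ones used later in Example 3.

```python
def f_average(n, theta, eps, simplified=False):
    """Average decay factor: Formula 20, or Formula 21 if simplified=True."""
    ratio_root = ((theta - eps) / theta) ** (2.0 / (2 * n - theta * n - 1))
    if simplified:
        return (ratio_root + 1) / 2             # Formula 21
    c = (theta - eps) * n
    return (ratio_root + (c - 1) / c) / 2       # Formula 20


exact = f_average(10_000, 0.05, 0.005)
approx = f_average(10_000, 0.05, 0.005, simplified=True)
print(exact, approx)
```

For these parameters the unsimplified Formula 20 gives about 0.99888, which agrees with the f_average entry for T₁ in Table 4, while the simplified form is slightly larger. Computing both side by side lets a reader judge whether the dropped term matters for a given window size.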

In order to further investigate the ways to set the decay factor, let us rewrite Formula 1 as Formula 22. The function factor(f, x) is the decay function, as shown in Formulas 22 and 23, where the parameter x = m - i, i is the arrival time of T_i (i = 1, 2, …, n, …, m) and m is the arrival time of the current transaction T_m. Let the set D represent the accumulated values of factor(f, x); when T_m arrives, the values in D are as shown in Expression 24.

$\begin{aligned} {freq}_{d}(P, T_{m}) &= {freq}_{d}(P, T_{m-1}) \times f + r \\ &= \sum_{i} r_{i} \times f^{m-i} = \sum_{i} r_{i} \times factor(f, m - i) \end{aligned}$ (22)

$factor(f, x) = f^{x}$ (23)

$\begin{aligned} D &= \{factor(f, m - 1), factor(f, m - 2), \dots, factor(f, 1), factor(f, 0)\} \\ &= \{f^{m-1}, f^{m-2}, \dots, f^{1}, 1\} \end{aligned}$ (24)

The forms of the above decay functions are similar to factor(f, x) = f^x. The trends of the decay function values with f_recall, f_precision and f_average over time are shown in Table 4. The values for f_recall are the highest and the values for f_precision are the lowest, while the values for f_average lie between them.

3.2. Gaussian decay factor

In order to further increase the weights of recent transactions and reduce the weights of historical transactions, the second novel way to set the decay factor, based on the Gaussian function $f(x) = \frac{1}{σ\sqrt{2π}} e^{-\frac{(x - μ)^{2}}{2σ^{2}}}$, is proposed in this paper. It sets the decay function as $factor(f_{gauss}, x) = \frac{A}{σ\sqrt{2π}} e^{-\frac{(x - μ)^{2}}{2σ^{2}}}$.

To make the decay function take values in (0, 1], a constant A is added. When the new transaction T_m arrives, the value of the decay function should be 1, and the decay function should depend on the sliding window size N, so we set the mean μ = 0 and the variance σ² = B × N, B = 1, 2, 3, … Then the decay function is as follows: $factor(f_{gauss}, x) = \frac{A}{\sqrt{2πBN}} e^{-\frac{x^{2}}{2BN}}$,

where A > 0, B = 1, 2, 3, …, N > 0. This decay function satisfies Definition 2, so its use is reasonable. In conclusion, the Gaussian decay factor is shown as f₄ in Formula 25. $f_{4} = f_{gauss} = \frac{A}{\sqrt{2πBN}} e^{-\frac{1}{2BN}}$ (25)
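The constant A is not given a closed form in the text, but it appears to be pinned down by the stated requirement that the decay function equal 1 for the newly arrived transaction (age x = 0). Under that reading, which is an inference rather than a statement from the paper, the derivation is:

```latex
\begin{aligned}
factor(f_{gauss}, 0) = \frac{A}{\sqrt{2\pi BN}}\, e^{0} = 1
  &\;\Rightarrow\; A = \sqrt{2\pi BN}, \\
factor(f_{gauss}, x) &= e^{-\frac{x^{2}}{2BN}}, \\
f_{gauss} = factor(f_{gauss}, 1) &= e^{-\frac{1}{2BN}} .
\end{aligned}
```

As a consistency check, N = 1000 and B = 1 give e^(-1/2000) ≈ 0.9995, which matches the f_gauss entry for N = 1K and B = 1 in Table 3.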

Example 2. Let the parameters be θ = 0.05, ɛ = 0.1 × θ and B = 1, 2, 3; the values of the different decay factors are shown in Table 3. For instance, when N = 1K we obtain f_recall = 0.999892, f_precision = 0.977778, f_gauss|_B=1 = 0.9995, f_gauss|_B=2 = 0.99975 and f_gauss|_B=3 = 0.999833. From this table, it can be concluded that the values of f_gauss lie between the values of f_recall and f_precision, and that they are closer to f_recall. Under the same N, the value of f_gauss is larger with B = 3 than with B = 1. However, when B = 5 and N = 1K, the value of f_gauss is 0.9999, which is larger than f_recall. Therefore, setting B to a large number results in a lower degree of fade than f_recall. This is unreasonable, because we need a higher degree of fade than f_recall. In this paper, we only consider setting B to 1, 2 and 3.

Table 3

Decay factors f_recall, f_precision and f_gauss

N	f_recall	f_precision	f_gauss (B=1)	f_gauss (B=2)	f_gauss (B=3)	f_gauss (B=4)	f_gauss (B=5)
1K	0.999892	0.977778	0.9995	0.99975	0.999833	0.999875	0.9999
2K	0.999946	0.988889	0.99975	0.999875	0.999917	0.999938	0.99995
3K	0.999964	0.992593	0.999833	0.999917	0.999944	0.999958	0.999967
4K	0.999973	0.994444	0.999875	0.999938	0.999958	0.999969	0.999975
5K	0.999978	0.995556	0.9999	0.99995	0.999967	0.999975	0.99998
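The f_recall and f_gauss columns of Table 3 can be regenerated with a short script. The sketch assumes the normalisation A = sqrt(2πBN), under which f_gauss = exp(-1/(2BN)); that assumption and all names below are illustrative, although the resulting values agree with the table.

```python
import math


def f_gauss(n, b):
    """Gaussian decay factor at age 1, assuming A = sqrt(2*pi*B*N)."""
    return math.exp(-1.0 / (2 * b * n))


def f_recall(n, theta, eps):
    """Lower bound of Theorem 1 (Formula 8)."""
    return ((theta - eps) / theta) ** (2.0 / (2 * n - theta * n - 1))


theta = 0.05
eps = 0.1 * theta
rows = []
for n in (1000, 2000, 3000, 4000, 5000):
    rows.append((n, f_recall(n, theta, eps), [f_gauss(n, b) for b in range(1, 6)]))

# Smallest B for which f_gauss exceeds f_recall, per window size.
crossover = {n: next(b for b, g in enumerate(gs, start=1) if g > fr)
             for n, fr, gs in rows}
print(crossover)
```

For every window size in the table the crossover happens at B = 5, in line with the observation in Example 2 that B = 5 already fades less than f_recall.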

The trends of the decay function values with different decay factors over time are shown in Table 4. For recent transactions, the values for f_gauss are higher than those for f_precision and lower than those for f_recall. For historical transactions, however, the values for f_gauss may be lower than both. This means that the degree of fade with f_gauss is lower than with f_precision for recent transactions, but higher than with f_precision for historical transactions. Therefore, setting the decay factor to f_gauss can increase the weights of new transactions and reduce the weights of old transactions.

3.3. Comparison of different decay factors

The commonly used decay factors include f_d [3, 13], f_halflife [11], f_recall and f_precision [5]. This section compares the two novel decay factors we propose with these four decay factors.

First, let us collect the decay functions factor(f, x) into a single expression, shown in Formula 26. The parameter x = m - i is the time distance between i and m, where i is the arrival time of T_i (i = 1, 2, …, n, …, m) and m is the arrival time of the current transaction T_m. There are six kinds of decay functions in Formula 26. When f = f_d, f_halflife, f_recall, f_precision or f_average, the function has the power form f^x. When f = f_gauss, it has the form of the Gaussian function.

Example 3. Let N = 10K, the minimum support θ = 0.05, the maximal support error ɛ = 0.1 × θ, B = 1, 2, 3, b = 2, h = 10000 [2] and f_halflife = 0.93 [11]. When T₁ arrives, we get factor(f_d, 1) = 0.999930688, factor(f_halflife, 1) = 0.93, factor(f_recall, 1) = 0.999989, factor(f_precision, 1) = 0.997778, factor(f_average, 1) = 0.998884 and factor(f_gauss|_B=1, 1) = 0.9995. After T₄ arrives, the accumulated decay function values in the set D are as shown in Table 4. The degrees of fade are low with all these decay factors except f_halflife after T₄ arrives. When T₅₀ arrives, factor(f_recall, 50) = 0.99945, which differs only slightly from factor(f_recall, 1). Meanwhile, the weight is 0.286505 with f = f_gauss|_B=1, which is far lower than the other values except that of f_halflife. That means the weights of historical transactions are lower than with the other decay factors.

Table 4
Values of D with different decay factors

T_m	f_d	f_halflife	f_recall	f_precision	f_average	f_gauss (B=1)	f_gauss (B=2)	f_gauss (B=3)
T₁	0.999930688	0.93	0.999989	0.997778	0.998884	0.9995	0.999750031	0.999833347
T₂	0.99986138	0.8649	0.999978	0.995561	0.997769	0.998002	0.9990005	0.999333556
T₃	0.999792077	0.804357	0.999967	0.993349	0.996656	0.99551	0.997752529	0.998501124
T₄	0.99972278	0.74805201	0.999956	0.991142	0.995543	0.992032	0.996007989	0.997336886
…	…	…	…	…	…	…	…	…
T₅₀	0.996540263	0.026555069	0.994564529	0.894739	0.945699	0.286505	0.535261429	0.65924063

The values of these six kinds of decay factors over time are shown in Table 4, where the parameters are N = 10K, θ = 0.05, ɛ = 0.1 × θ, b = 2, h = 10000 and B = 1, 2 and 3. From this table it can be concluded that: (1) when setting f = f_recall or f = f_d, the degree of fade of historical transactions over time is tiny; (2) when setting f = f_halflife, the degree of fade of historical transactions is great; (3) when setting f = f_average, the degree of fade lies between those of f = f_recall and f = f_precision; (4) when setting f = f_gauss, the degree of fade of recent transactions is lower than with f_precision and f_average, but the degree of fade of historical transactions is higher than with these two. In other words, f_gauss emphasizes the importance of recent transactions and reduces the importance of historical transactions.

$factor(f, x) = \begin{cases} (b^{-(\frac{1}{h})})^{x} = b^{-(\frac{x}{h})}, & f = f_{d} \\ (e^{-(\frac{\ln 2}{t_{1/2}})})^{x} = e^{-(\frac{x \ln 2}{t_{1/2}})}, & f = f_{halflife} \\ (\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}})^{x} = (\frac{θ - ɛ}{θ})^{\frac{2x}{2N - θN - 1}}, & f = f_{recall} \\ (\frac{(θ - ɛ)N - 1}{(θ - ɛ)N})^{x}, & f = f_{precision} \\ (\frac{\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + 1}{2})^{x}, & f = f_{average} \\ \frac{A}{\sqrt{2πBN}} e^{-\frac{x^{2}}{2BN}}, & f = f_{gauss} \end{cases}$ (26)
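The six cases of Formula 26 can be put side by side in code. The sketch below uses the parameters of Example 3 (N = 10K, θ = 0.05, ɛ = 0.1 × θ, b = 2, h = 10000, f_halflife = 0.93, B = 1). It assumes the normalisation A = sqrt(2πBN) for the Gaussian case and uses the unsimplified average of Formula 20; all names are illustrative, and the sketch is not meant to regenerate Table 4.

```python
import math


def power_decay(f):
    """Exponential form factor(f, x) = f**x, shared by five of the six factors."""
    return lambda x: f ** x


def gauss_decay(n, b):
    """Gaussian form, assuming A = sqrt(2*pi*B*N) so that factor(0) = 1."""
    return lambda x: math.exp(-x * x / (2.0 * b * n))


n, theta = 10_000, 0.05
eps = 0.1 * theta
root = ((theta - eps) / theta) ** (2.0 / (2 * n - theta * n - 1))
c = (theta - eps) * n

decay = {
    "f_d": power_decay(2 ** (-1.0 / 10_000)),              # b = 2, h = 10000
    "f_halflife": power_decay(0.93),
    "f_recall": power_decay(root),
    "f_precision": power_decay((c - 1) / c),
    "f_average": power_decay((root + (c - 1) / c) / 2),    # Formula 20 form
    "f_gauss": gauss_decay(n, b=1),
}
weights_at_50 = {name: fn(50) for name, fn in decay.items()}
for name, w in weights_at_50.items():
    print(name, round(w, 6))
```

The power-form factors lose weight at a constant rate per step, whereas the Gaussian weight falls slowly at small ages and faster as the age approaches the scale sqrt(BN), which is the qualitative difference the comparison above describes.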

3.4. Algorithm TDMPDS

Two novel TDMs based on f_average and f_gauss are described in Expressions 27 and 28. The value of f_average is affected by the sliding window size N, the minimum support threshold θ and the maximal error rate ɛ. The value of f_gauss is affected by the sliding window size N and the parameter B. Compared with some existing decay factors, the value of f_average lies between f_recall and f_precision in order to balance the high precision and the high recall of the algorithm. The characteristic of f_gauss is that it leads to a lower degree of fade on recent transactions and a higher degree of fade on historical transactions than some existing decay factors.

$\begin{aligned} {freq}_{d}(P, T_{m}) &= \sum_{i} r_{i} \times \left(\frac{\sqrt[(2N - θN - 1)]{[(θ - ɛ)/θ]^{2}} + 1}{2}\right)^{m - i}, \quad f = f_{average} \\ r_{i} &= \begin{cases} 1, & \text{if } P \subseteq T_{i} \\ 0, & \text{otherwise} \end{cases} \end{aligned}$ (27)

$\begin{aligned} {freq}_{d}(P, T_{m}) &= \sum_{i} r_{i} \times \left(\frac{A}{\sqrt{2πBN}} e^{-\frac{(m - i)^{2}}{2BN}}\right), \quad f = f_{gauss} \\ r_{i} &= \begin{cases} 1, & \text{if } P \subseteq T_{i} \\ 0, & \text{otherwise} \end{cases} \end{aligned}$ (28)

Figure 3 shows the process of generating frequent patterns based on the two novel TDMs we propose. Each entry in Fig. 3 has the form <{CP} (SCP)>, where CP is the frequent itemset and SCP is the frequency of CP. There are five transactions: the first four transactions shown in Table 1 and a new transaction T₅ = {1 3 4 7}. The process differs from Example 1: instead of updating the whole ClosedTable, it updates only the itemsets associated with the new transaction. For instance, the itemset <{1 3 4} (1)> is generated after transaction T₁ arrives, and it only needs to be updated after transaction T₅ arrives. The frequency of {1 3 4} is then updated to (1 × f⁴ + 1), which uses the value f⁴ as the weight. Compared with Example 1, the proposed algorithm can reduce the time complexity of mining frequent patterns on a data stream.
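The deferred update of {1 3 4} can be written as a small helper that stores, next to each decayed frequency, the time at which it was last touched. This is a sketch under the assumption of the power-form decay f^x; the tuple layout and the name `lazy_update` are illustrative, and f = 0.8 is borrowed from Example 1.

```python
def lazy_update(entry, now, f):
    """Bring one (freq_d, last_seen) entry up to time `now` and count a new hit.

    Valid for the exponential form factor(f, x) = f**x only, because
    f**a * f**b == f**(a + b); a Gaussian weight does not factor this way.
    """
    freq_d, last_seen = entry
    return (freq_d * f ** (now - last_seen) + 1.0, now)


entry = (1.0, 1)                          # <{1 3 4} (1)> created when T1 arrived
entry = lazy_update(entry, now=5, f=0.8)  # T5 = {1 3 4 7} contains {1 3 4}
print(entry)
```

The result corresponds to 1 × f⁴ + 1: the four skipped decay steps are applied in a single multiplication when the itemset is next encountered.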

Fig.3

Frequent pattern mining by the proposed algorithm TDMPDS.
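The decayed frequency in Expressions 27 and 28 can also be evaluated directly as a weighted sum, which is convenient for checking the worked example. The sketch below uses the four transactions of Table 1 plus T₅ = {1 3 4 7}. The helper name `decayed_frequency`, the value f = 0.9, and the Gaussian settings B = 1, N = 1000 with the weight normalized to 1 at age 0 are illustrative assumptions.

```python
import math


def decayed_frequency(pattern, transactions, weight):
    """Sum weight(m - i) over the transactions T_i (i = 1..m) that contain the pattern."""
    m = len(transactions)
    p = set(pattern)
    return sum(weight(m - i)
               for i, t in enumerate(transactions, start=1)
               if p <= set(t))


# T1..T4 from Table 1, followed by the new transaction T5.
stream = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 3, 4, 5}, {1, 3, 4, 7}]

f = 0.9  # illustrative constant per-step decay factor
exp_freq = decayed_frequency({1, 3, 4}, stream, lambda x: f ** x)

B, N = 1, 1000  # illustrative Gaussian parameters, weight normalized to 1 at x = 0
gauss_freq = decayed_frequency({1, 3, 4}, stream,
                               lambda x: math.exp(-x * x / (2 * B * N)))
```

With a constant factor, `exp_freq` equals f⁴ + 1, the value given for {1 3 4} in the example, because only T₁ (age 4) and T₅ (age 0) contain the itemset.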

The method we propose is called TDMPDS (TDM-based frequent Pattern mining over Data Streams). It consists of adding new transactions to the sliding window and dropping old transactions from it. Algorithm TDMPDSADD(T_m) processes the new transaction T_m in two main steps.

Step 1. Generate the frequent itemsets O referring to T_m and ClosedTable.

Step 2. For each frequent itemset o ∈ O.

if o is a new frequent itemset then add it to ClosedTable.

else update the frequency of o.

Algorithm TDMPDSREMOVE() is used to drop historical transactions. To improve efficiency and reduce running time, it processes old transactions only once every P steps; that is, TDMPDSREMOVE() is called whenever the time m satisfies m % P = 0. Its main task is to drop the non-frequent itemsets from ClosedTable.
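The two procedures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Candidate itemsets are generated by intersecting the new transaction with the stored itemsets, an assumed strategy borrowed from incremental closed-itemset miners. The pruning threshold `min_freq` is hypothetical. The lazy update (old count × f^(elapsed steps) + 1) assumes a constant per-step decay factor such as the base of f_average. All class and attribute names are invented for the example.

```python
class ClosedTableSketch:
    """Toy ClosedTable: itemset -> (decayed frequency, id of the last update)."""

    def __init__(self, f, min_freq, period):
        self.f = f                # constant per-step decay factor (assumption)
        self.min_freq = min_freq  # hypothetical pruning threshold
        self.period = period      # P: prune once every P transactions
        self.table = {}

    def _decay(self, entry, tid):
        """Bring a stored (frequency, last_tid) entry forward to time tid."""
        freq, last = entry
        return freq * self.f ** (tid - last)

    def add(self, tid, transaction):
        """TDMPDSADD-like step: touch only itemsets related to the new transaction."""
        t = frozenset(transaction)
        old = dict(self.table)  # snapshot taken before this transaction
        # Step 1: candidates are the transaction and its non-empty intersections.
        candidates = {t} | {t & c for c in old if t & c}
        # Step 2: update known itemsets, insert new ones.
        for o in candidates:
            if o in old:
                base = self._decay(old[o], tid)
            else:
                # A new itemset inherits the largest decayed count among its
                # stored proper supersets (0 if there is none).
                base = max((self._decay(old[c], tid) for c in old if o < c),
                           default=0.0)
            self.table[o] = (base + 1.0, tid)
        if tid % self.period == 0:
            self.remove(tid)

    def remove(self, tid):
        """TDMPDSREMOVE-like step: drop itemsets whose decayed count is too low."""
        self.table = {o: entry for o, entry in self.table.items()
                      if self._decay(entry, tid) >= self.min_freq}


ct = ClosedTableSketch(f=0.9, min_freq=0.0, period=10)
for tid, items in enumerate([{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5},
                             {2, 3, 4, 5}, {1, 3, 4, 7}], start=1):
    ct.add(tid, items)
```

After the fifth transaction, the entry for {1 3 4} holds 1 × f⁴ + 1 and was touched only at T₁ and T₅, which mirrors the update described for Fig. 3.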

4. Experimental analysis

To compare the pros and cons of different methods of setting the decay factor, real and synthetic data streams are used in this paper. The first is msnbc, a high density data stream with a high pattern frequency. It describes the page visits of users who visited msnbc.com on September 28, 1999, and is available from UCI (http://archive.ics.uci.edu/ml). Visits are recorded in order at the level of URL category. There are 989,818 transactions and the average transaction length is 5.7. It is a high density and highly similar data stream. The second is Kosarak, a low density data stream in which the pattern frequency is low. It is a very large dataset containing 990,000 sequences of click-stream data from a Hungarian news portal. The dataset in its original format can be found at http://fimi.ua.ac.be/data/. The third is Bmswebview (http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php), which is also a low density data stream. It contains 59,601 sequences of click-stream data from an e-commerce site, with 497 distinct items. The average sequence length is 2.42 items with a standard deviation of 3.22.

The synthetic datasets were generated with the IBM data generator. There are four synthetic data streams with different average pattern and transaction sizes: T5I5D1000K, T10I5D1000K, T10I10D1000K and T20I5D1000K. These data streams were used to analyze the performance of the algorithms on data streams of different density. The parameters are as follows: D is the total number of transactions, I is the average size of the maximal potential patterns, and T is the average length of the transactions. For example, T10I5D1000K means that the average transaction length is 10, the average length of the maximal potential patterns is 5, and the number of transactions is 1000K.
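The naming convention of the synthetic data streams can be decoded mechanically. The helper below is a small sketch; the function name `parse_stream_name` and the dictionary keys are hypothetical, and it assumes the D field is always given in thousands with a trailing K, as in the names listed above.

```python
import re


def parse_stream_name(name):
    """Split a name such as 'T10I5D1000K' into its T, I and D parameters."""
    match = re.fullmatch(r"T(\d+)I(\d+)D(\d+)K", name)
    if match is None:
        raise ValueError(f"unrecognised data stream name: {name!r}")
    t, i, d = (int(group) for group in match.groups())
    return {
        "avg_transaction_length": t,
        "avg_max_pattern_length": i,
        "num_transactions": d * 1000,  # D is written in thousands (K)
    }


params = parse_stream_name("T10I5D1000K")
```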

4.1. Parameters

Several methods of setting the decay factor in the four algorithms are compared in this paper. (1) Set f to f_d with parameters b = 2 and h = 10000 as used in algorithm estDec [2], denoted as estDec|f_d. (2) Set f to f_halflife with parameter t = 1 as used in algorithm UHS [11], denoted as UHS|f_h. (3) Set f to the lower bound as used in algorithm SWP [5], denoted as SWP|f_r. (4) To balance the recall and precision of the algorithm, set f to f_average in the proposed algorithm TDMPDS, denoted as TDMPDS|f_a. (5) To emphasize the importance of recent transactions and reduce the weights of historical transactions, set f to the Gaussian decay factor with mean μ = 0 and variance δ² = B × N, denoted as TDMPDS|f_g.

The minimum support is set to θ = 0.06 and θ = 0.001 in order to mine the majority of frequent patterns. The former is used to discover frequent patterns on high density data streams and the latter on low density data streams. The maximal support error is ɛ = 0.1 × θ, and the sliding window size is N = 1000, 2000, 3000 and 4000.

4.2. Performance

The first experiment analyzes the parameter B of the Gaussian decay factor. Figure 4 shows the performance of algorithm TDMPDS on the high density data stream msnbc. Among B = 1, 2, 3, B = 1 achieves the best precision and the worst recall, whereas B = 3 achieves the best recall and the worst precision. The analysis of performance under different sliding window sizes shows that B = 3 gives the best overall performance and the most balanced recall and precision, as shown in Fig. 4(c). B = 2 comes next, while B = 1 gives the largest gap between recall and precision. Similar conclusions are obtained on the low density data stream Kosarak.

Fig.4

Variation of recall and precision with different Bs and windows on msnbc, (a) recalls (b) precisions (c) average recall and precision.
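The quantities plotted in these experiments can be computed from two sets of patterns. The sketch below assumes the usual set-based definitions: recall is the fraction of truly frequent patterns that were found, and precision is the fraction of reported patterns that are truly frequent. The reference set is taken to be an exact mining result, which is an assumption about the evaluation protocol, and the function name is hypothetical.

```python
def recall_precision(mined, truth):
    """Return (recall, precision, their average) for two collections of itemsets."""
    mined, truth = set(mined), set(truth)
    hits = len(mined & truth)
    recall = hits / len(truth) if truth else 1.0
    precision = hits / len(mined) if mined else 1.0
    return recall, precision, (recall + precision) / 2


# Illustrative pattern sets; itemsets are frozensets so they can be set members.
truth = {frozenset({3}), frozenset({2, 3, 5}), frozenset({3, 4})}
mined = {frozenset({3}), frozenset({2, 3, 5}), frozenset({1, 3}), frozenset({1, 3, 4})}

recall, precision, average = recall_precision(mined, truth)
```

In this toy case two of the three true patterns are found (recall 2/3) and two of the four reported patterns are correct (precision 1/2).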

Next, the decay factors of the four algorithms are compared on the high density data stream msnbc. With B = 3, the performances of the algorithms are shown in Fig. 5. It can be concluded that: (1) The performances of TDMPDS|f_a and TDMPDS|f_g are less influenced by the sliding window size. (2) SWP|f_r and estDec|f_d reach almost 100% recall; however, their precisions are the two lowest. (3) UHS|f_h obtains the highest precision, but its recall is the lowest. (4) The recall of TDMPDS|f_a is lower than that of SWP|f_r, while its precision is higher. (5) The average of recall and precision of TDMPDS|f_g is the best among the compared methods.

Fig.5

Performance comparison for different methods on msnbc, (a) recalls with different windows (b) precisions with different windows.

Fig.6

Comparison of average recall and precision for the different methods on Kosarak and Bmswebview, (a) Kosarak (b) Bmswebview.

The third experiment processes synthetic data streams; the performances of the algorithms are shown in Fig. 7. Four data streams with different lengths of transactions or patterns are used: T10I4, T10I5, T10I10 and T20I5. Ranked from best to worst, the algorithms are: TDMPDS|f_g ⟶ estDec|f_d ⟶ SWP|f_r. In detail, the performances of estDec|f_d and TDMPDS|f_a are almost the same, and the average values of recall and precision of UHS|f_h are lower than those of the other algorithms.

Fig.7

Performance comparison for the different methods on synthetic data streams.

From the experiments above, we can draw the following conclusions:

Changing the sliding window size has comparatively little influence on the performance of the algorithms that use f_a and f_g.

High recall can be achieved when f is set to f_r or f_d, but the low precision means that the result may contain many non-frequent patterns.

High precision can be achieved when f is set to f_h, but the low corresponding recall means that some frequent patterns may be lost.

Compared with setting f to a boundary value, the novel method that sets f = f_a gives more balanced recall and precision.

The Gaussian decay factor with mean μ = 0 and variance δ² = B × N yields good performance, so it is reasonable to set the decay factor in this novel way.

The experimental results show that, on both real and synthetic data streams, the Gaussian decay factor gives the best performance compared with some classical methods.

5. Conclusion

The efficiency of the time decay model depends on how the decay factor is set. Existing methods set the decay factor to a user-specified value, to a constant value independent of the window size, or to a variable value that assumes 100% recall or 100% precision. This paper has summarized the existing methods and analyzed their disadvantages, and then proposed two novel ways to set the decay factor. The first is the average decay factor f_average, which balances the recall and precision of the algorithm. The second is the Gaussian decay factor f_gauss, which is based on the Gaussian function; detailed discussions on setting its parameters are also provided. The Gaussian decay factor further highlights the importance of recent transactions and reduces the importance of historical transactions. Two TDMs based on f_average and f_gauss are proposed in this paper and used to discover frequent patterns on real and synthetic data streams. The analysis of the experimental results shows that the two proposed decay factors are better than some existing methods.

Acknowledgments

Project supported by the National Natural Science Foundation of China (61563001) and the Ningxia Natural Science Foundation (NZ17115).

References

1. Bifet, Adaptive stream mining: Pattern learning and mining from evolving data stream, Frontiers in Artificial Intelligence and Applications (2010), IOS Press.

2. J.H. Chang and W.S. Lee, Finding recent frequent itemsets adaptively over online data streams, In SIGKDD'03, 2003, pp. 487–492.

3. J.H. Chang and W.S. Lee, Finding recently frequent itemsets adaptively over online transactional data streams, Information Systems 31(8) (2006), 849–869.

4. T.P. Chang, A sliding-window method to discover recent frequent query patterns from XML query streams, International Journal of Software Engineering and Knowledge Engineering 24(6) (2014), 955–980.

5. Chen, L.C. Shu, et al., Mining frequent patterns in a varying-size sliding window of online transactional data streams, Information Sciences 215 (2012), 15–36.

6. Cheng, Ke and Ng, A survey on algorithms for mining frequent itemsets over data streams, Knowledge and Information Systems 16 (2008), 1–27.

7. Cormode, Tirthapura and Xu, Time-decaying sketches for robust aggregation of sensor data, SIAM Journal on Computing 34(9) (2009), 1309–1339.

8. Dawar, Sharma and Goyal, Mining top-k high-utility itemsets from a data stream under sliding window model, Applied Intelligence 47(4) (2017), 1240–1255.

9. Han, Ding and Li, TDMCS: An efficient method for mining closed frequent patterns over data streams based on time decay model, International Arab Journal of Information Technology 14(6) (2017), 851–860.

10. Hassani, Töws and Seid, Understanding the bigger picture: Batch-free exploration of streaming sequential patterns with accurate prediction, Symposium (2017), 866–869.

11. HewaNadungodage, Xia, et al., Hyper-structure mining of frequent patterns in uncertain data streams, Knowledge and Information Systems 37(1) (2013), 219–244.

12. Kim and Yun, Mining high utility itemsets based on the time decaying model, Intelligent Data Analysis 20(5) (2016), 1157–1180.

13. Lee and Lee, Finding maximal frequent itemsets over online data streams adaptively, In ICDM'05, 2005, pp. 266–273.

14. C.K.S. Leung and Jiang, Frequent itemset mining of uncertain data streams using the damped window model, In SAC'11, 2011, pp. 950–955.

15. G.H. Li, Mining the frequent patterns in an arbitrary sliding window over online data streams, Journal of Software 19(19) (2008), 2585–2596.

16. H.F. Li, C.C. Ho, et al., A single-scan algorithm for mining sequential patterns from data streams, International Journal of Innovative Computing, Information and Control 8(3A) (2012), 1799–1820.

17. H.M. Nabil, A.S. Eldin and M.A.E. Belal, Mining frequent itemsets from online data streams: Comparative study, International Journal of Advanced Computer Science and Applications 4(7) (2013), 117–125.

18. Nori, Deypir and M.H. Sadreddini, A sliding window based algorithm for frequent closed itemset mining over data streams, Journal of Systems and Software 86(3) (2013), 615–623.

19. Ramírez-Gallego, Krawczyk, et al., A survey on data preprocessing for data stream mining: Current status and future directions, Neurocomputing 239(C) (2017), 39–57.

20. Ryang and Yun, High utility pattern mining over data streams with sliding window technique, Expert Systems with Applications 57 (2016), 214–231.

21. B.E. Shie, P.S. Yu and V.S. Tseng, Efficient algorithms for mining maximal high utility itemsets from data streams with different models, Expert Systems with Applications 39 (2012), 12947–12960.

22. M.J. Sutha and F.R. Dhanaseelan, Mining frequent, maximal and closed frequent itemsets over data stream - a review, International Journal of Data Science and Applications 9(1) (2017), 46–62.

23. J.D. Yuan, Z.H. Wang, et al., An effective pattern-based Bayesian classifier for evolving data stream, Neurocomputing 295 (2018), 17–28.

24. Yun, Lee and Yoon, Efficient high utility pattern mining for establishing manufacturing plans with sliding window control, IEEE Transactions on Industrial Electronics 64(9) (2017), 7239–7249.

25. Yun, Kim, et al., Mining recent high average utility patterns based on sliding window from stream data, Journal of Intelligent and Fuzzy Systems 30(6) (2016), 3605–3617.

26. Yun, Kim, et al., Damped window based high average utility pattern mining over data streams, Knowledge-Based Systems 144 (2018), 188–205.

27. Yun and Yoon, An efficient approach for mining weighted approximate closed frequent patterns considering noise constraints, International Journal of Uncertainty Fuzziness and Knowledge-Based Systems 22(6) (2014), 879–912.

28. Zihayat, Chen and An, Memory-adaptive high utility sequential pattern mining over data streams, Machine Learning 106(6) (2017), 799–836.

T_m	D
	f = f_d	f = f_halflife	f = f_recall	f = f_precision	f = f_average	f = f_gauss (B = 1)	f = f_gauss (B = 2)	f = f_gauss (B = 3)
T₁	0.999930688	0.93	0.999989	0.997778	0.998884	0.9995	0.999750031	0.999833347
T₂	0.99986138	0.8649	0.999978	0.995561	0.997769	0.998002	0.9990005	0.999333556
T₃	0.999792077	0.804357	0.999967	0.993349	0.996656	0.99551	0.997752529	0.998501124
T₄	0.99972278	0.74805201	0.999956	0.991142	0.995543	0.992032	0.996007989	0.997336886
…	…	…	…	…	…	…	…	…
T₅₀	0.996540263	0.026555069	0.994564529	0.894739	0.945699	0.286505	0.535261429	0.65924063


Table 1
Four transactions in a data stream

TID	itemset
1	1 3 4
2	2 3 5
3	1 2 3 5
4	2 3 4 5

Table 2
Values of decay function factor (f, x)

f \ x	10	100	1000
0.9999	0.999	0.990049339	0.90483289
0.999	0.990045	0.904792147	0.36769542
0.99	0.904382	0.366032341	4.3171E-05
0.9	0.348678	2.65614E-05	1.7479E-46
