Abstract
We consider the problem of efficiently online computing/filtering or analysis multimedia streams. In this scenario, we register a large scale of continuous analysis queries to filter pornographic stream items. Each query is a conjunction of filters. For instance, the query “does this image contain a people basking in the beach?” can be resolved by applying the conjunction of water, people, sand, sea filters successively on the stream item. However, the online evaluation of multimedia filters is indeed very expensive, fortunately there usually exist multiple filters shared among a lot of queries. In other words, each filter may occur in multiple queries. An open problem in such a filtering scenario is how to order the filters in an optimal sequence to achieve significant performance. Existing methods are based on a greedy strategy which orders the filters according to three factors (selectivity, popularity, cost). Although all these methods achieve good results, there are still some problems that haven’t addressed yet. First, the selectivity factor is set empirically, which can not adaptively adjust with multimedia stream. Second, the proportion relationships among the three factors (selectivity, cost, popularity) were not considerably explored. Under these observations,in this paper, we propose a Dynamic-Analytic hierarchy process Framework (DAF) which use a time-based compositional forecasting method, which is based on the idea of exponential smoothing, to deal with the factors’ proportion relationships dynamics. Experiments on both synthetic and real lift multimedia streams demonstrate that our proposed framework (DAF) provides much great adaptability in modeling the factors proportion relationships changing over multimedia stream environment.
Introduction
Many popular applications deal with multimedia that is pushed continuously and needs to be processed in real-time. Multimedia stream filtering which is a typical application example that users subscribe to many kinds of multimedia streams such as image streams, video streams, …, text streams. These streams may contain insalubrious items. To filter these items, we should register many continuous queries on these streams. Each query is a conjunction of filters and each filter occur in multiple queries. For instance, we register four queries just like in Table 1. Query
Query example:
,
,
,
Query example:
So we consider the problem of evaluating multiple overlapping queries defined on multimedia streams. Each query is a conjunction of filters and each expensive filter share across queries. We could achieve significant performance by sharing filters results among queries. If all filters contained a query return true, we say that the query was satisfied. Shared filter ordering is our central optimization problems. The goal of our optimization is to minimize the overall cost of evaluating the filters through sharing filters result across multiple queries. Through our optimization, we could decide the subset of queries that are satisfied by the coming data stream item.
As shown in Fig. 1, we need to decide all queries that are satisfied by the coming stream tuple. (In Fig. 1, the evaluation of a specific filter on the coming tuple in the multimedia stream results in a Boolean value).
This shows a figure consisting of four queries (
Shared filters ordering problem is very common in stream applications [14,29]. In general,the main factors of filters is cost, selectivity and popularity. These factors usually vary considerably in multimedia stream environment.
selectivity: For every filter, the selectivity of the filter is refer to average probability that a stream item satisfies the filter. We preferably evaluate the filters with lower selectivity value, since they are more likely to decide queries if the evaluation result false. cost: The time taken to evaluate a given filter. We should preferably evaluate filters with lower cost. popularity: The number of queries that contain a given filter. We should preferably evaluated filters with higher popularity value. Since filters with higher popularity could decide more queries.
The Shared Filter Ordering problem should be solved by taking all of the factors (selectivity, cost, popularity) into account. A simple execution strategy for such ordering problem is to evaluate every filter in each query independently on all incoming stream item. However, motivated by the need for efficient stream filtering, we could use a shared execution strategy, In which the filters are evaluated on every stream item in some shared result order for a set of queries. In this shared strategy, the evaluation result of All filters are shared across all queries registered in filtering system. In this paper, we tackle the central problem of finding the shared ordering in which filters are evaluated for every data stream item.
The shared filter ordering problem was introduced by [9]. They prove that it is NP-hard problem to find an optimal ordering with given filters and queries. They imply that efficient polynomial time algorithms for solving the shared filter ordering problem are impossible to exist.
However, the previously proposed greedy-based approaches have a most important limitations that we seek to address in this work:
Most greedy-based [8, 10] approaches set a fix value on all factors. Hence, these approaches result in a fixed filter evaluation ordering, which would be very inefficient in today’s rapidly changing environment context. [10, 8] propose their typical ordering evaluation method is by performing a simple weight function joint all the fixed factors, which cannot reflect the real priority weight among all factors.
Shared Filter Ordering problem poses the following major challenges:
The adaptivity re-evaluation of selectivity The selectivity of filters is set empirically, which can not adaptively adjust with multimedia stream environment. The characteristics of streams is ever-changing, it is of primary importance to be able to follow re-evaluate the selectivity over time as to enable the system to respond adaptivity and continuously. The proportion relationships among factors dynamic For a given filter, these factors may give contradictory suggestions for the result of filter ordering. So we must evaluate the proportion relationships among all factors (selectivity, cost, popularity) to taken all factor in a unify form. In multimedia stream environment, filter characteristics may change unpredictably during the evaluation on stream. The proportion relationships among all factors always not stabilize. It is important to be able to devise an strategy to take the proportion relationships among all factors in a unify form adaptively.
Therefore, to fill in this niche and to better deal with the factor priorities change over time in today’s rapidly changing environment, In this paper, we present a Dynamic Analytic hierarchy process Framework (DAF) which achieve efficiently pipeline filter ordering, DAF uses an analytic hierarchy process (AHP [7])-based ordering strategy for evaluating the factors’ priorities precisely. By carefully designing the underlying infrastructure and algorithms, DAF is able to unify the priorities factors of filters. In DAF, we propose a time-based compositional forecasting method, which is based on the fundamental of compositional data, to deal with the AHP [7] priority dynamics. Our proposed method is particularly useful when there is a large number of historical data in multimedia stream environment. Our framework addresses the above limitations of previously proposed approaches, including the following key features:
We develop an effective method to re-evaluate the selectivity factor. We define a notion of an uniform priorities function and show that DAF can effectively evaluate the ordering of filters use our method. We show by experiment that choosing the optimal filter ordering can improve performance. We demonstrate how DAF can adaptively change the evaluation plan as continuous queries run and that an adaptive ordering can outperform a static plan significantly when characteristics of incoming stream change.
We also develop a suite of variants on preview work [8, 10]. We have implemented our DAF algorithm in our multimedia data stream filtering system. Through experiment,we show that our DAF algorithm could significantly improve upon the performance of filter ordering. It is also shown that our proposed method provides much greater adaptability in modeling the factors priorities changing over multimedia stream compared to that of recently developed methods.
The rest of the paper is organized as follows: We formulate the shared filter ordering problem in Section 2. In Section 3 we present a brief description of the fundamentals which is used in this paper; The main contribution of this paper, which is the design the dynamic AHP[7]-based proprietors function, will be elaborated in Section 4. In Section 5, we report the experimental results obtained through our DAF framework. We discuss the related work in Section 6 and conclude the paper in Section 7.
Problem definition
Let query
In rapidly changing multimedia stream environment, it is very likely that the AHP priorities derived at one point in time might change in next stream window. A timely update strategy has to be carried out in order not to make a fallacious decision thereafter. In other words, to enable the multimedia data stream filtering system to respond adaptively and continuously over time of its filtering operations, there is a significant need to adaptive the changes over stream windows as to better anticipate the future stream tuples. Our goal is to decide the subset of queries that are satisfied by each incoming stream item
is the query set where
Simple solution
In this section, we introduce two simple solution to ordering the shared filters in multimedia data stream environment. Specifically,we first formulate a generic strategy by taking all factors static just like [10]. After that,we can formulate a more effective strategy by set the selectivity value adaptive.
Edge-Coverage Based Greedy algorithm (ECBG) ECBG algorithm will not stop evaluating filters until all the queries have been decided. The choice of next being computed filter is determined by the current residual graph. Specifically, let Entry-Point Adjust Base Slider-window stream (EPABS) We find that the selectivity of every filter is set by prior-knowledge in ECBG, so the selectivity of filters will not change over multimedia stream. This strategy can’t reflex the local information of multimedia streams. EPABS considers the special characteristics of multimedia window stream. For example, let
Simplex sample space and operations
The sample space of the compositional data is called the Simplex space[5, 6]. Specifically,in our DAF framework, we use the
Where
We define four important terminologies in terms of the operations in our simplex space, namely, the closure operator, the perturbation operation, the power transformation, and the inner product. For any vector
Let
The inner product of two vectors such as can be denoted as follows:
Adaptive Composition Priorities Forecasting (ACPF)
In this section, we propose our priorities method – Adaptive Composition Priorities Forecasting (ACPF), which can be applied using a simple forecasting procedure to fit the historical priorities data on current multimedia stream. We use
constraint, then
Following the widely known single exponential smoothing formula[2] and moving average method, Our adaptive composition priorities forecasting (ACPF) formula can be analogously expressed as in:
Note that
Symbol
In brief, in our shared filter ordering algorithm we use the last price function to finish the filter ordering evaluation.
General adaptive procedure of DAF
The proposed priorities method, which was described in the last subsections, was applied using a simple forecasting procedure. Generally, the procedure to use the proposed method can be described in Algorithm 3.3. In line 9–18, we collect the necessary historical result data on current multimedia stream, that is, the AHP final priorities of the entities or alternatives over stream window. This step uses the AHP to evaluation the priorities of all factors for a certain period of time, and the AHP was used to derive the importance of factors with respect to the existing condition during that period. In addition, these priorities evaluate from current multimedia stream over time should come from consistent judgments, as prescribed by the AHP internal mechanism. In line 22, we obtain a visual view of the change of the historical AHP priorities data of all factors over stream windows. A time series plot can be drawn using Cartesian coordinate system or a ternary diagram. In line 23, we use the proposed method, namely, the adaptive composition priorities forecasting (ACPF) method, which is described in the next subsections, to fit the historical priorities data on current multimedia stream. In this step, we select the best coefficient of the model based on which that gives the lowest value of fitting error. We use Aitchison distance[1] to evaluate the lowest fitting parameter. Next, we fit the historical priorities data using the optimal model parameter, and obtain the next period forecast.
[!hbp]
int curItemSize
init_priorit();//init the vector of all history priorities factors use AHP
(!S.Empty()) AND curItemSize
Record_history(vHistory);//Record all AHP history result data into vHistory;
sliding window
(Stream.BEGIN())
Obtain_view();//Obtain a visual view of the change of the historical priorities. Forecast_acpf(
(!Stream.Empty())
StreamItem
QuerySet qs
record(qs,t);
(Stream.END())
ReCompute_priority();//Re-Compute the priorities of all factors change in current window. Record_history(vHistory);//Record all history priority of all factors into vHistory.
;
Experiment
In this section, we describe a number of experiments we have run to evaluate the performance of DAF. We report experimental results and comparison of the simple solutions and our proposed DAF framework. The results show that our dynamic AHP-based algorithm consistently outperforms other algorithms under various settings. In this section, we first describe our evaluation methodology, and then present the experimental results in detail.
Evaluation methodology
Our prototype system can support various formats of multimedia data streams and their associated filtering operators. For our experiments, we use a set of 240 filters built through our similarity filtering system.We build the similarity filtering data structure by vectors extracted from training photos and each of these filtering operators detects the presence of one specific concept in an image. The set of filters contains all the concepts existed in Multimedia Analysis and Retrieval System (IMARS). For our test dataset, we crawling online image, video, text from Internet use larbin [24] spider and push them into our filtering system. We implement our prototype system for evaluating queries over multimedia streams and conduct extensive experiments in the context of image stream filtering. The experiment of our filtering system is conducted on a Linux server with a Duo 2.66 GHz CPU and 2 GB memory, using a variety of queries generated from different models and parameters. Our data stream filtering system consist of a server that keeps all queries plan and filter results. Our primary objectives were to evaluate all the queries in a low time cost by properly ordering all filters and by how much DAF can outperform those previously-proposed Greedy-based approach (EPABS, ECBC). All these algorithms are used to order pipelined filters over a single multimedia stream. In experiment comparison,the most important metric is the evaluation time, which is defined as the average time taken by our filtering system to evaluate one image item against all queries. The biggest difference between EPABS and ECBG is that EPABS adjusts the selectivity value of filters after a stream window has been evaluated. To achieve a fair and meaningful comparison, in one set of experiments, we vary only one factor parameter while fixing the other parameters as the feasible value. We repeat our experiments for 20 times and the results presented are the average of multiple runs.
Experiment results
Robustness to filters and queries
The first question we seek to answer in our experiments is how well different algorithms perform when the queries and filters grows. So we measure the scale of the graph by the number of filters and the number of queries. We conduct two groups of experiments to investigate these two aspects respectively. Firstly, we fix the number of queries as 4000 and increase the number of unique filters from 20 to 320, at every point we exhaust all available pre-defined image filters. Secondly, we fix 120 filters and gradually increase the number of queries from 200 to 7000. The experimental results are shown in Figs 2 and 3. We can observe from the figure that our DAF outperforms the other two algorithms in all conditions. When we increase the filters in our prototype system, the performance of system becomes more sensitive compared to that of the queries. This is because many queries share the same filters. Lastly, we can see that when the scale of filtering bipartite graph grows, all three algorithms can achieve sub-linear increase. Moreover, both our DAF framework and EPABS perform much better than ECBG in these experiments. This suggests that our improve strategy of heap-adjust number and priority computation number is helpful in finding the filters whose evaluation is most beneficial.
Evaluation time vs. Number of queries.
Evaluation time vs. Number of filters.
In this section, we study the impact of Slide Window Size (i.e., the number of stream items in one stream window). We fix all the other parameters while varying the window size in our experiments. Given that all three algorithms are designed for the multimedia stream filtering, we show the evaluation time of different window size in Fig. 4. We simulate the stream window in ECBG by dividing all stream items. Intuitively, the more stream items contained by one stream window, the better the performance one would expect from our DAF framework. Figure 4 results of evaluation time with different window sizes. We can make two observations from it. Firstly, our DAF framework performs best under various window sizes. Secondly, as the window size continues to increase, the performance gap between EPABS and DAF framework shrinks. Inversely, the performance gap between ECBG and DAF framework expands. This is because with bigger stream window size, many filters’ selectivity become more appropriate for current multimedia stream. It is much better than setting selectivity value of all filters by prior knowledge. ECBG is not able to reflex the accurate local information of the multimedia streams because of setting selectivity by prior knowledge.
Filtering items vs. Number of filters.
Evaluation time vs. Number of filters.
Filtering items vs. Number of filters.
Evaluation time vs. Number of filters.
We use a large scale of filters with different selectivities and periodically change the filters’s selectivities. Figure 5 shows a timeline from experiments where we varies the stream and filter characteristics over time. For instance, Fig. 5 shows that a change has been made after our filtering system have already processed 1000000 stream tuples. The y-axis in Fig. 5 shows the number of filters evaluated per 2500 multimedia stream tuples. As expected,when multimedia stream environment dynamic changes, Fig. 6 shows that DAF is always converges to more optimal ordering must faster than any other algorithms such as EPABS,ECBG. In general, DAF has higher run-time overhead but faster adaptivity to changes than EPABS or ECBG. Figure 7 shows this tradeoff by plotting on the y-axis the total time to process a workload where filter characteristics are varied periodically, with the period shown on the x-axis. The lower this period, the higher the rate of change, we used 160 filters, these filters’s parameter is: 100 filters’ cost is ten, 60 filters’ cost is two. We randomly change the selectivity of all filters. Figure 7 shows that when the rate of change is high, the faster adaptivity of DAF enable it to significantly outperform EPABS and ECBG. However, the advantage of DAF diminishes when we reduce the rate of change.
Run-time overhead comparison
Figures 3 and 6 show the processing rate and run-time break-down of all our adaptive algorithms under convergence, varying n. We use a drop-profiling probability of 0.07 in this experiment to better illustrate the differences between these algorithms. Now we see the benefits of the lower run-time overhead of EPABS and ECBG. Figure 3 shows that the gap between DAF and EPABS and ECBG increases with n. Figure 6 shows that this widening gap is solely because of DAF’s runtime overhead.
Related work
The problem of multi-query optimization has been tackled in[10, 11]. These works focused on relational operators of database. This is the most important difference between our problem and the multi-query in relational database. We consider the operators as arbitrary action. Adaptive Continuous Queries [12] wanted to optimize the evaluation of continuous queries defined on data streams by sharing relational operators across queries. Their strategy adapted to changes in operator costs and selectivity over time. In order to optimize a large number of continuous queries in the Internet, [12] and its extension [13, 14] proposed different grouping mechanisms for continuous queries. But they also considered the operation as the relational operator. Hence, many of their strategies are not applicable in our context. The term “shared filter ordering” was introduced by [9]. They present two different strategies to solve this problem and showed that the adaptive strategy outperformed the non-adaptive strategy. The adaptive strategy in [9] was a query-coverage base greedy algorithm. Then [10] improved upon those of [9, 10] proposed an edge-coverage based greedy algorithm whose performance was better than [9] under various settings. Our problem also generalizes the shared ordering problem [8, 9, 10].
Conclusion
In multimedia data streams analysis/filtering/computing scenario, each filter occur in multiple analysis queries and each query is a conjunction of filters. There usually exist multiple continuous analysis queries and the evaluation of online multimedia stream filters is indeed very expensive. An open problem in such a filtering scenario is how to order the filters in an optimal sequence to achieve significant performance. Existing works are based on a greedy strategy which orders the filters according to three factors. Although all these methods achieve good results, there are still some problems that haven’t addressed yet. First, the selectivity factor is set empirically, which can not adaptively adjust with multimedia stream.Second, the proportion relationships among the three factors were not considerably explored. Under these observations, in this paper, we propose a Dynamic-Analytic hierarchy process Framework (DAF) which use a time-based compositonal forecasting method, which is based on the idea of exponential smoothing, to deal with the factors’ proportion relationships dynamics. Experiments on both synthetic and real lift multimedia streams demonstrate that our proposed analysis framework (DAF) provides much great adaptability in modeling the factors proportion relationships changing over multimedia data stream environment.
Footnotes
Acknowledgments
This work is supported by the Joint Funds of the National Natural Science Foundation of China (Grant No. U1936111).
