Multidimensional skyline analysis based on agree concept lattices

Abstract

The Skyline concept has been introduced in order to exhibit the best objects according to all the criterion combinations and makes it possible to analyze the relationships between Skyline objects. Like the data cube, the Skycube groups all the multidimensional Skylines and it is so voluminous that reduction approaches are a necessity. In this paper, we define an approach which partially materializes the Skycube. The underlying idea is to discard from the representation the Skycuboids which can be computed again easily. To meet this reduction objective, we characterize a formal framework: the Agree Concept Lattice. It provides a formal framework which makes it possible to improve computation time, reduce representation and easily navigate through the Hasse diagram in order to focus on the most relevant skycuboids. This structure is generic, applies to various database analysis problems and combines both formal concept analysis and database theory. It makes use of the concepts of agree set and database partition. They are associated to define the Agree Concept of a database relation. The set of all the Agree Concepts is organized within the Agree Concept Lattice. From this structure, we derive the Skyline concept lattice which is one of its constrained instances for efficient multidimensional Skyline analysis. The strong points of our approach are: (i) it is attribute oriented; (ii) it provides a boundary for the number of lattice nodes; (iii) it facilitates the navigation within the Skycuboids.

Keywords

Concept lattices; databases; data cube; Skyline; OLAP-mining

1. Introduction

Formal concept analysis has been successfully used for solving various database, data mining and data warehouse problems. Examples include the discovery of frequent itemsets and association rules [31, 38] and their concise representation, the extraction of functional dependencies [24] (exact, approximate or conditional), the computation of constrained closed cubes and quotient cubes [11, 26, 10, 23], and multidimensional database analysis through the concept of skycube [35, 34, 33, 43]. What all these issues have in common is the manipulation and storage of very voluminous data sets and the complex and costly computations they require. All these approaches take advantage of the expressiveness of formal concept analysis [19] in order to soundly characterize the tackled problems, devise efficient algorithms and propose reduced representations as well as visual tools for navigating through the solution space. Our work shares a similar spirit. In this paper, our aim is to propose a theoretical foundation for multidimensional skyline analysis and for the lossless partial materialization of the skycube. Moreover, the approach is validated through experimental evaluations which measure both the size reduction and the regeneration time. Let us recall the context in which the studied issue fits.

In a decisional context, some queries do not yield results because the user is interested in tuples (objects) for which the values of certain criteria are optimal. Such queries are fruitless because of their “multi-criterion” feature. Indeed, a tuple can be optimal for a given criterion but not for another one; it is then eliminated from the result whereas it could be relevant for the user. For instance, if we consider a housing database, searching for an “ideal” housing can combine conditions on the price (the lowest possible), the surface (the largest possible) and the distance from the work place (the shortest possible). Of course, this ideal housing is unlikely to exist and therefore the query does not yield any result. However, some housings could be relevant for the user because they are located in a close, but not neighboring, area and satisfy the criteria of maximal surface and minimal price.

In order to answer this kind of queries, the Skyline operator [9] has been introduced in a database context. It proposes a solution for the multi-criterion optimization queries over very large databases. The result of such queries is exactly the Pareto front. It considers the set of the criteria chosen for a search as being preferences and extracts the tuples which are optimal for this set of preferences. Therefore, instead of searching an ideal hypothetic solution, it extracts the candidates which are the closest ones to the user expectations. Its general principle is based on the notion of dominance. An object or a tuple is dominated by another one if, for all the criteria relevant for the decision maker, it is less optimal than this other one. Such a tuple is discarded from the result not because it is not relevant for one of the criteria but because it is not optimal according to the combination of all the criteria. In other words, a better solution exists and will be kept.

Like the data cube [21] which exhibits the relationships existing between the various levels of aggregation, a multidimensional generalization of the Skyline has been proposed: the Skycube [35, 29, 34, 33, 43]. This structure groups all the possible SKYLINES according to the various criterion combinations. Then, it is possible to efficiently retrieve the dominating objects according to criterion combinations. This structure also makes it possible to observe the behavior of the dominating objects through the multidimensional space and hence analyze and understand the different dominance factors.

For yielding a Skyline, a naive solution is to compute on line the Skyline query results. Unfortunately, this approach is so time consuming that it cannot apply in practice. On the other side of the coin, another solution is to compute the whole Skycube. Then answering Skyline queries can be easily and efficiently performed because their results have just to be picked out. However, the main drawback of this approach is the tremendous volume of data which must be produced and stored. Between these two extreme cases, a good compromise is to find the best balance between a too costly on line execution time and a too voluminous stored data. The idea under such a compromise is to partially materialize the Skycube. This means that the size of kept data is reduced and for some queries the results are yet available. Nevertheless, when considering the queries for which the result is not yet computed, the issue is to retrieve their result at the lowest cost.

In this paper we characterize the Agree Concept Lattice as a theoretical framework for multidimensional Skyline analysis. From database theory, we make use of the concepts of partition and agree set [5, 36, 24]. The partition of a relation according to a set of attributes $X$ is a set of parts (or equivalence classes) in which all the stored objects (or tuples) share the same value for $X$ . An agree set is a set of attributes for which certain tuples agree, i.e. share the same value. The agree sets and partitions are combined through a particular Galois connection in order to constitute the Agree Concepts. The set of all the Agree Concepts is provided with a twofold order relationship: inclusion between attribute sets and refinement between database partitions. The result is the Agree Concept Lattice.

We define an approach based on the Agree Concept Lattice which makes it possible to partially materialize Skycubes and therefore offers a representation with both a reduced size and the guarantee that the whole result can be rebuilt later. In contrast to [34, 35], our reduction approach is attribute (criterion) oriented and thus is provided with the same navigation qualities as the full Skycube. Many approaches address the optimisation of Skyline queries and propose efficient algorithms [39, 4, 14]. Nevertheless, when the number of criteria increases, a “curse of dimensionality” can occur [40, 12, 6, 1]. This case is probably more frequent when a Skyline over a join is to be processed [39, 37] or when several data sets have to be explored in a distributed context [22]. Our work supplements these approaches by proposing subsets of possibly relevant criteria and by helping users choose the best ones.

This paper is an extended and merged version of the work about the Agree Concept Lattice [28] and the Skyline concept lattice [29]. These new structures are studied in more depth and experimental evaluations are performed. The remainder of this article is organized as follows. In Section 2, we recall the context of our work: the concepts of Skyline and Skycube. Then, we propose a theoretical foundation for multidimensional database analysis (Section 3) and apply it to the partial materialization of Skycubes (Sections 4 and 5). It is compared with related work in Section 6. In conclusion, we highlight the advantages of our approach and outline further research work. In Appendix A, we recall the formal background in which our approach fits.

2. The Skycube for the multidimensional analysis of Skylines

In this section, we firstly present the Skyline operator as well as the tackled issue. Then, we describe the multidimensional analysis of Skylines through the concept of the Skycube.

2.1 The Skyline operator

Before we formally define the Skyline operator, the application context of our issue must be well described. Indeed, the operator does not apply to any relation. In order to perform the required comparisons, the various semantic domains used to define the attributes, choice criteria of the user, must be totally ordered. For the sake of simplicity and from now on, we consider that these attributes are provided with numerical values.

The tuples of relations which can be used by the Skyline operator are as follows: $t=(d_{1},d_{2},\ldots,d_{k},c_{1},c_{2},\ldots,c_{l})$ where the $d_{i}$ stand for the dimensions not used by the Skyline operator whereas the $c_{i}$ represent the criteria involved in the user choice.

Table 1
The relation Housing

RowId	Owner	City	Price	Distance	Consumption	Neighbour
1	Dupont	Marseilles	220	15	275	5
2	Dupond	Paris	100	15	85	1
3	Martinez	Marseilles	150	7	180	1
4	Sanchez	Aubagne	340	7	85	3
5	Rodriguez	Paris	100	7	180	1

.

The relation depicted in Table 1 lists various housings and is typical for illustrating the use of the Skyline operator. The classical dimensional attributes are here Owner and City and the choice criteria for finding the best housing are: the sale Price in thousands of euros, the Distance from the work place in kilometers, the energy Consumption in kilowatt-hours per year and square meter, the number of Neighbours.

(Dominance relationship).

Let $\mathcal{C}=\{c_{1},c_{2},\ldots,c_{d}\}$ be the set of criteria concerned by the Skyline operator.²

²
Without any loss of generality, we only consider from now on the case where all the criteria must be minimized.

Let $t$ and $t^{\prime}$ be two tuples, the dominance relationship according to the set of criteria $\mathcal{C}$ is defined as follows:

$t\succeq_{\mathcal{C}}t^{\prime}\;\Leftrightarrow\;t[c_{1}]\leqslant t^{\prime}[c_{1}]\;\text{and}\;t[c_{2}]\leqslant t^{\prime}[c_{2}]\;\text{and}\;\ldots\;\text{and}\;t[c_{d}]\leqslant t^{\prime}[c_{d}]$

When $t\succeq_{\mathcal{C}}t^{\prime}$ and $\exists\ c_{i}\in\mathcal{C}$ such that $t[c_{i}]<t^{\prime}[c_{i}]$, the dominance is strict and is noted $t\succ_{\mathcal{C}}t^{\prime}$.

When a tuple $t$ dominates another tuple $t^{\prime}$ (i.e. $t\succeq_{\mathcal{C}}t^{\prime}$ ), it means that $t$ is “equivalent to” or “better than” the tuple $t^{\prime}$ for all the chosen criteria. Since we assume that all the criteria must be minimized, the values of $t$ for all the criteria are less than or equal to the values of $t^{\prime}$ . Then, when a multi-criterion search is performed, tuples dominated by other ones (at least one) are not relevant and are eliminated from the result returned by the Skyline operator.

.

With our relation example (cf. Table 1) and the following criteria $\mathcal{C}=\{\text{Distance},\text{Price}\}$, $t_{5}\succeq_{\mathcal{C}}t_{4}$ because $t_{5}[\text{Distance}]\leqslant t_{4}[\text{Distance}]$ and $t_{5}[\text{Price}]\leqslant t_{4}[\text{Price}]$. As $t_{5}[\text{Price}]<t_{4}[\text{Price}]$, the tuple $t_{5}$ strictly dominates $t_{4}$.
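The dominance test can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `dominates` and `strictly_dominates` and the dictionary encoding of the tuples are assumptions made here, and all the criteria are taken to be minimized as stated above.

```python
def dominates(t, u, criteria):
    """t is at least as good as u on every criterion (all criteria minimized)."""
    return all(t[c] <= u[c] for c in criteria)


def strictly_dominates(t, u, criteria):
    """t dominates u and is strictly better on at least one criterion."""
    return dominates(t, u, criteria) and any(t[c] < u[c] for c in criteria)


# Tuples t4 and t5 of Table 1, restricted to the two criteria of the example.
t4 = {"Distance": 7, "Price": 340}
t5 = {"Distance": 7, "Price": 100}
criteria = ["Distance", "Price"]

print(dominates(t5, t4, criteria))
print(strictly_dominates(t5, t4, criteria))
print(strictly_dominates(t4, t5, criteria))
```

The two functions mirror the two relations of the definition: the non-strict test compares every criterion, and the strict test additionally requires one strict inequality.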

(The Skyline operator).

Let $r$ be a relation, the Skyline of $r$ is the set of tuples which are not dominated by any other tuple according to the criterion set $\mathcal{C}$ :

$SKY_{\mathcal{C}}(r)=\{t\in r\;|\;\nexists\;t^{\prime}\in r\smallsetminus\{t\},\;t^{\prime}\succ_{\mathcal{C}}t\}$

.

With our relation example (cf. Table 1) and the following criteria $\mathcal{C}=\{\text{Distance},\text{Price}\}$,³ $SKY_{\mathcal{C}}(\text{Housing})=\{t_{5}\}$ because the tuple $t_{5}$ dominates all the other ones. It is therefore the best possible housing for the user.

³
From now on and in order to simplify notations, when there is no ambiguity, we note the sets without braces and commas. For instance, $\{A,B\}$ is noted $AB$.
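The definition of the Skyline translates directly into a naive quadratic computation. The Python sketch below is given for illustration only: the names `HOUSING`, `strictly_dominates` and `skyline` are assumptions of this sketch, the relation is the criterion part of Table 1 keyed by RowId, and no optimization is attempted.

```python
# Criteria of the relation Housing (Table 1), keyed by RowId.
HOUSING = {
    1: {"Price": 220, "Distance": 15, "Consumption": 275, "Neighbour": 5},
    2: {"Price": 100, "Distance": 15, "Consumption": 85, "Neighbour": 1},
    3: {"Price": 150, "Distance": 7, "Consumption": 180, "Neighbour": 1},
    4: {"Price": 340, "Distance": 7, "Consumption": 85, "Neighbour": 3},
    5: {"Price": 100, "Distance": 7, "Consumption": 180, "Neighbour": 1},
}


def strictly_dominates(t, u, criteria):
    """t is no worse than u on every criterion and better on at least one."""
    return (all(t[c] <= u[c] for c in criteria)
            and any(t[c] < u[c] for c in criteria))


def skyline(relation, criteria):
    """RowIds of the tuples that no other tuple strictly dominates."""
    return {
        i for i, t in relation.items()
        if not any(strictly_dominates(u, t, criteria)
                   for j, u in relation.items() if j != i)
    }


print(skyline(HOUSING, ["Distance", "Price"]))  # {5}
```

Every tuple is compared with every other one, which is precisely the cost that motivates the materialization strategies discussed in this paper.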

By using the Skyline Of clause [9], which is already integrated in PostgreSQL [15], the SQL formulation of the previous example is the following:

SELECT *

FROM HOUSING

SKYLINE OF DISTANCE MIN, PRICE MIN

This query is equivalent to the following standard SQL query:

SELECT *

FROM HOUSING H1

WHERE NOT EXISTS (SELECT *

FROM HOUSING H2

WHERE H2.DISTANCE <= H1.DISTANCE

AND H2.PRICE <= H1.PRICE

AND (H2.DISTANCE < H1.DISTANCE

OR H2.PRICE < H1.PRICE));

2.2 The Skycube

The Skyline operator is a fundamental tool for database multi-criteria analysis. However, when several Skylines have to be computed from a very same set of data, the operator does not take advantage of the links existing between the different Skylines. This is why a structure called Skycube has been introduced [43, 33]. In fact, the relationship between the Skyline operator and the Skycube is similar to the one between the Group-By operator and the data cube: a multidimensional generalization. When the objective is to efficiently answer all the queries expressed on a set of related Skylines, materializing the associated Skycube is of great interest.

(Skyline subspace).

Let $\mathcal{B}\subseteq\mathcal{C}$ ($\mathcal{B}\neq\emptyset$) be a criterion subset forming a $|\mathcal{B}|$-dimensional subspace of $\mathcal{C}$. For a tuple $p\in r$, the projection of $p$ in the subspace $\mathcal{B}$, noted $p[\mathcal{B}]$, belongs to the subspace Skyline according to $\mathcal{B}$ if no tuple $q[\mathcal{B}]$ (with $q\in r$) strictly dominates $p[\mathcal{B}]$. We note $SKY_{\mathcal{B}}(r)$ the set of tuples of the subspace Skyline according to $\mathcal{B}$ for the relation $r$.

(Skycube).

A Skycube is the set of all the Skylines in all the possible not empty subspaces of $\mathcal{C}$ :

$\textit{Skycube}(r,\mathcal{C})=\{(C,\textit{SKY}_{C}(r))\mid C\subseteq\mathcal{C}\}$

$\textit{SKY}_{C}(r)$ is called the cuboid Skyline (or Skycuboid) of the subspace $C$ . By convention the cuboid according to the empty criterion set is empty (i.e. $\textit{SKY}_{\emptyset}(r)=\emptyset$ ).

The structure of the Skycube can be represented by a lattice similar to the lattice used for the data cube (cf. Fig. 1). The cuboids of the Skycube are grouped into levels according to their number of criteria. These levels are numbered beginning with the lattice bottom (cuboids encompassing a single criterion) and going up to the lattice top (cuboid according to the whole set of criteria). Let us consider two cuboids: $SKY_{\mathcal{U}}(r)$ according to the dimension set $\mathcal{U}$ and $SKY_{\mathcal{V}}(r)$ according to $\mathcal{V}$. If $\mathcal{U}\subset\mathcal{V}$ we say that the cuboids have a parent-child link: $SKY_{\mathcal{V}}(r)$ is called the ancestor of $SKY_{\mathcal{U}}(r)$ and $SKY_{\mathcal{U}}(r)$ the descendant of $SKY_{\mathcal{V}}(r)$. Moreover, in the case where $|\mathcal{U}|=|\mathcal{V}|-1$, $SKY_{\mathcal{V}}(r)$ is called the father cuboid of $SKY_{\mathcal{U}}(r)$ and $SKY_{\mathcal{U}}(r)$ the son cuboid of $SKY_{\mathcal{V}}(r)$.

(Skycube).

Figure 1 gives the Skycube associated to the relation Housing (cf. Table 1). The criteria are symbolized by their initial.

Figure 1.

Representation as a lattice of the Skycube of the relation Housing

The multidimensional Skyline queries yield the subset of tuples of the original relation making up the Skylines for a given subspace. Once the Skycube is computed, it is possible to efficiently answer these queries for any subspace.
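As an illustration, the full Skycube of the relation Housing can be obtained by computing one naive Skyline per non-empty criterion subset. The following Python sketch rests on assumptions of ours: the names `CRITERIA`, `project`, `skyline` and `skycube` are hypothetical, each criterion is designated by its initial, and the cuboids are computed independently, without sharing any computation between them.

```python
from itertools import combinations

CRITERIA = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def project(t, subspace):
    """Values of tuple t on the criteria of the subspace."""
    return tuple(t[CRITERIA.index(c)] for c in subspace)


def skyline(relation, subspace):
    """RowIds not strictly dominated in the subspace (all criteria minimized)."""
    def dominated(i):
        p = project(relation[i], subspace)
        return any(
            q != p and all(a <= b for a, b in zip(q, p))
            for q in (project(u, subspace) for u in relation.values())
        )
    return {i for i in relation if not dominated(i)}


def skycube(relation, criteria=CRITERIA):
    """Map every non-empty criterion subset to its Skycuboid."""
    return {
        "".join(sub): skyline(relation, sub)
        for k in range(1, len(criteria) + 1)
        for sub in combinations(criteria, k)
    }


cube = skycube(HOUSING)
for name in sorted(cube, key=lambda s: (len(s), s)):
    print(name, sorted(cube[name]))
```

With four criteria the cube holds $2^{4}-1=15$ cuboids; this exponential growth is what makes the complete materialization so voluminous.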

2.3 Problems related to the multidimensional analysis of the Skylines

In general, if a tuple $t$ belongs to the Skylines of the subspaces $C_{1}$ and $C_{2}$ such that $C_{1}\subset C_{2}$, can we claim that $t$ will also belong to the Skyline of any subspace $C$ located between $C_{1}$ and $C_{2}$ ($C_{1}\subset C\subset C_{2}$)? Such a property would be especially interesting since it could significantly reduce the multidimensional Skyline computation. Unfortunately it does not hold in the general case.

.

With the relation Housing as shown in Fig. 1, by considering the Skylines according to (Price, Distance, Neighbor) and (Distance, Neighbor) we have:

•
(Price, Distance, Neighbor) $\supseteq$ (Distance, Neighbor) and
•
$\textit{SKY}_{\textit{PDN}}(\text{Housing})\subseteq\textit{SKY}_{\textit{DN}}(\text{Housing})$.

In contrast, if we consider the following Skylines (Price, Distance, Consumption, Neighbor) and (Price, Distance, Consumption), we have:

•
(Price, Distance, Consumption, Neighbor) $\supseteq$ (Price, Distance, Consumption) and
•
$\textit{SKY}_{\textit{PDCN}}(\text{Housing})\supseteq\textit{SKY}_{\textit{PDC}}(\text{Housing})$.

The observation highlighted by the previous example is the following: belonging to a Skyline is not monotone, i.e. a tuple $t$ in a cuboid $\textit{SKY}_{\mathcal{U}}(r)$ is not necessarily present in the ancestors of this cuboid.
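The two inclusions of the example, as well as the failure of the property discussed at the beginning of this section, can be checked mechanically. The Python sketch below is an illustration under our own assumptions (hypothetical names `skyline`, `sky` and `witnesses`, criteria designated by their initials); it searches the relation Housing for tuples belonging to the Skylines of $C_{1}$ and $C_{2}$ but not to the Skyline of an intermediate subspace $C$.

```python
from itertools import combinations

CRITERIA = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def project(t, subspace):
    return tuple(t[CRITERIA.index(c)] for c in subspace)


def skyline(relation, subspace):
    """RowIds not strictly dominated in the subspace (all criteria minimized)."""
    def dominated(i):
        p = project(relation[i], subspace)
        return any(
            q != p and all(a <= b for a, b in zip(q, p))
            for q in (project(u, subspace) for u in relation.values())
        )
    return {i for i in relation if not dominated(i)}


subspaces = ["".join(s) for k in range(1, 5) for s in combinations(CRITERIA, k)]
sky = {s: skyline(HOUSING, s) for s in subspaces}

# (C1, C, C2, t) with C1 strictly inside C strictly inside C2, where t is in
# the Skylines of C1 and C2 but is missing from the Skyline of C.
witnesses = [
    (c1, c, c2, t)
    for c1 in subspaces for c in subspaces for c2 in subspaces
    if set(c1) < set(c) < set(c2)
    for t in sorted((sky[c1] & sky[c2]) - sky[c])
]

print(sky["PDN"] <= sky["DN"], sky["PDC"] <= sky["PDCN"])
print(witnesses[:3])
```

For instance, the tuple $t_{3}$ belongs to the Skylines according to $D$ and DCN but not to the Skyline according to DC, where it is dominated by $t_{4}$.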

As for the data cube, the Skycube can enclose superfluous information [23, 42, 25, 27]. This issue has motivated the proposal of reduced representations of the Skycube presented in [34]. Our contribution addresses the same issue of reduced representation and also combines formal concept analysis [19] and the Skyline. In order to avoid the significant cost of reconstructing the Skyline cuboids which results from the value oriented grouping of [34], our reduction method chooses a criterion oriented grouping based on the agree sets.

3. The Agree Concept Lattice of a database relation

In this section, our objective is to give the definition of a formal framework combining both the concept of agree set and the one of concept lattice. We propose a new structure, the Agree Concept Lattice of a relation [28], which can be used to solve several multidimensional database analysis problems. Then we soundly characterize the Agree Concept Lattice.

3.1 The Agree Concepts of a database relation

Our objective is to define a particular concept lattice which is based on the agree sets and the database partitions, which are recalled in the appendix. In order to meet this goal, we characterize an instance of the Galois connection between, on the one hand, the lattice of the power set of the attribute set and, on the other hand, the lattice of the partitions of the tuple identifier set. This connection makes it possible to define dual closure operators, introduce the Agree Concept and characterize the Agree Concept Lattice.

.

Let $Rowid:r\rightarrow\mathbb{N}$ be a total function which associates to each tuple a unique natural number and $Tid(r)=\{Rowid(t)\mid t\in r\}$. Let $f$, $g$ be two total functions between the ordered sets $\langle\Pi(Tid(r)),\sqsubseteq\rangle$ and $\langle\mathscr{P}(\mathcal{R}),\subseteq\rangle$,⁴ which are defined as follows:

⁴
$\mathscr{P}(\mathcal{R})$ is the powerset lattice of the attribute set of the database relation $r$.

$\begin{array}{lrcl}f:&\langle\Pi(Tid(r)),\sqsubseteq\rangle&\longrightarrow&\langle\mathscr{P}(\mathcal{R}),\subseteq\rangle\\ &\pi&\longmapsto&\displaystyle\bigcap_{[t]\in\pi}\textsc{Agr}(\{t_{i}\mid i\in[t]\})\\ g:&\langle\mathscr{P}(\mathcal{R}),\subseteq\rangle&\longrightarrow&\langle\Pi(Tid(r)),\sqsubseteq\rangle\\ &X&\longmapsto&\displaystyle\{[t]_{X}\mid t\in r\}\end{array}$

As stated in the previous section, the Rowid is a virtual attribute which identifies the tuples of a relation. This attribute is generally provided by the database systems.

For an attribute set $X$ , the equivalence class set according to $X$ forms a partition of $Tid(r)$ . The function $g$ associates this partition of identifiers to $X$ . This partition is noted $\pi_{X}$ and defined as follows: $\pi_{X}=g(X)$ . The set of all the possible partitions $\pi_{X}$ is noted $\Pi_{\mathscr{P}(\mathcal{R})}$ [24]. The function $f$ performs the opposite association.

.

With the relation Housing, by considering the following attribute sets: $D$ , DC, DCN and PN, we have:

•
$g(D)=\{12,345\}$,⁵
•
$g(\textit{DC})=\{1,2,35,4\}$,
•
$g(\textit{DCN})=\{1,2,35,4\}$ and
•
$g(\textit{PN})=\{1,25,3,4\}$.

⁵
$t_{1}$ and $t_{2}$ are in the same equivalence class. $t_{3}$, $t_{4}$ and $t_{5}$ belong to another class.

With the partitions $\{1,2,35,4\}$ and $\{1,2,345\}$ , we have:

•

$f(\{1,2,35,4\})=\textit{DCN}$ and

•

$f(\{1,2,345\})=D$ .
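The two functions can be sketched in Python as follows. This is an illustration under assumptions of ours: the names `g`, `agree`, `f` and `show` are hypothetical helpers, only the four criteria of Housing are considered as attributes, and the agree set of a class reduced to a single tuple is taken to be the whole attribute set.

```python
ATTRS = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def g(X):
    """Partition of the RowIds: tuples with equal projections on X share a class."""
    classes = {}
    for i, t in HOUSING.items():
        key = tuple(t[ATTRS.index(a)] for a in X)
        classes.setdefault(key, set()).add(i)
    return {frozenset(c) for c in classes.values()}


def agree(ids):
    """Attributes on which all the tuples identified by ids share the same value."""
    return {a for k, a in enumerate(ATTRS)
            if len({HOUSING[i][k] for i in ids}) == 1}


def f(pi):
    """Attributes on which every class of pi agrees, listed in the order of ATTRS."""
    common = set(ATTRS)
    for cls in pi:
        common &= agree(cls)
    return "".join(a for a in ATTRS if a in common)


def show(pi):
    """Compact notation of the text, e.g. ['12', '345']."""
    return sorted("".join(map(str, sorted(c))) for c in pi)


print(show(g("D")), show(g("DC")), show(g("PN")))
print(f([{1}, {2}, {3, 5}, {4}]), f([{1}, {2}, {3, 4, 5}]))
```

Partitions are encoded as sets of frozen sets so that two partitions can be compared with `==` regardless of the order of their classes.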

.

The couple of functions $gc=(f,\ g)$ is a Galois connection between the attribute power set lattice of $\mathcal{R}$ and the lattice of partitions of Tid(r).

Proof.

Due to the definition of the order relationship between partitions and the definition of $f$ , it is trivial to show that $\pi\sqsubseteq\pi_{X}\Leftrightarrow X\subseteq f(\pi)$ . According to [19], $gc=(f,\ g)$ is a Galois connection. ∎

(Closure operators).

The couple $gc=(f,\ g)$ is a particular case of the Galois connection. The compositions $f\circ g$ and $g\circ f$ of the two functions are closure operators [19]. They are defined below:

$\begin{array}{lrcl}h:&\mathscr{P}(\mathcal{R})&\longrightarrow&\mathscr{P}(\mathcal{R})\\ &X&\longmapsto&\displaystyle f(g(X))=\bigcap_{\begin{subarray}{c}X^{\prime}\in\textit{AGREE}(r)\\ X\subseteq X^{\prime}\end{subarray}}X^{\prime}\\ h^{\prime}:&\Pi(Tid(r))&\longrightarrow&\Pi(Tid(r))\\ &\pi&\longmapsto&\displaystyle g(f(\pi))=\operatorname*{\bullet}_{\begin{subarray}{c}\pi^{\prime}\in\Pi_{\mathscr{P}(\mathcal{R})}\\ \pi\sqsubseteq\pi^{\prime}\end{subarray}}\pi^{\prime}\end{array}$

$h$ and $h^{\prime}$ satisfy the following properties:

$X\subseteq X^{\prime}\Rightarrow h(X)\subseteq h(X^{\prime})$ and $\pi\sqsubseteq\pi^{\prime}\Rightarrow h^{\prime}(\pi)\sqsubseteq h^{\prime}(\pi^{\prime})$ (monotonicity)

$X\subseteq h(X)$ and $\pi\sqsubseteq h^{\prime}(\pi)$ (extensivity)

$h(X)=h(h(X))$ and $h^{\prime}(\pi)=h^{\prime}(h^{\prime}(\pi))$ (idempotence)

.

With the relation Housing, by considering the attribute sets DC and DCN, according to the previous example, we have:

•

$\textit{h(DC)}=\textit{f(g(DC))}=f(\{1,2,35,4\})=\textit{DCN}$

•

$\textit{h(DCN)}=\textit{f(g(DCN))}=f(\{1,2,35,4\})=\textit{DCN}$

With the partitions $\{1,2,35,4\}$ and $\{1,2,345\}$ , we have:

•

$h^{\prime}(\{1,2,35,4\})=g(f(\{1,2,35,4\}))=\textit{g(DCN)}=\{1,2,35,4\}$

•

$h^{\prime}(\{1,2,345\})=g(f(\{1,2,345\}))=g(D)=\{12,345\}$
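The closure operators are simply the two compositions of $f$ and $g$. The Python sketch below is illustrative and relies on the same assumptions as before (hypothetical helper names, attributes restricted to the four criteria of Housing, agree set of a singleton class equal to the whole attribute set). It also enumerates the closed attribute sets, i.e. the sets $X$ such that $h(X)=X$.

```python
from itertools import combinations

ATTRS = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def g(X):
    """Partition of the RowIds induced by the attribute set X."""
    classes = {}
    for i, t in HOUSING.items():
        classes.setdefault(tuple(t[ATTRS.index(a)] for a in X), set()).add(i)
    return {frozenset(c) for c in classes.values()}


def f(pi):
    """Attributes on which every class of the partition pi agrees."""
    common = set(ATTRS)
    for cls in pi:
        common &= {a for k, a in enumerate(ATTRS)
                   if len({HOUSING[i][k] for i in cls}) == 1}
    return "".join(a for a in ATTRS if a in common)


def h(X):
    """Closure of an attribute set."""
    return f(g(X))


def h_prime(pi):
    """Closure of a partition."""
    return g(f(pi))


subsets = ["".join(s) for k in range(len(ATTRS) + 1)
           for s in combinations(ATTRS, k)]
closed = sorted({h(X) for X in subsets}, key=lambda s: (len(s), s))

print(h("DC"), h("DCN"))
print(closed)
```

Since an Agree Concept $(X,\pi)$ satisfies $X=f(\pi)$ and $\pi=g(X)$, its intent is necessarily one of these closed sets.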

(Agree concepts).

An Agree Concept of a database relation $r$ is a couple $(X,\pi)$ associating a set of attributes to a partition of identifiers (i.e. $X\in\mathscr{P}(\mathcal{R})$ and $\pi\in\Pi(\textit{Tid(r)})$ ). The elements of this couple must be related by the following conditions:

$\left\{\begin{array}[]{rcl}X&=&f(\pi)\\ \pi&=&g(X)=\pi_{X}\end{array}\right.$

Let $c_{a}=(X_{c_{a}},\pi_{c_{a}})$ be an Agree Concept of $r$; we call $\pi_{c_{a}}$ the extent of $c_{a}$ (noted $\textit{ext}(c_{a})$) and $X_{c_{a}}$ its intent (noted $\textit{int}(c_{a})$). The set of all the Agree Concepts of a relation $r$ is noted AgreeConcepts($r$).

(Agree Concept Lattice).

Let AgreeConcepts($r$) be the set of Agree Concepts of a relation $r$. The ordered set $\langle\textsc{AgreeConcepts}(r),\leqslant\rangle$ forms a complete lattice called the Agree Concept Lattice.⁶ $\forall P\subseteq\textsc{AgreeConcepts}(r)$, the infimum or lower bound ($\bigwedge$) and supremum or upper bound ($\bigvee$) are given below:

$\displaystyle\bigwedge P=\Big(\bigcap_{c_{a}\in P}\textit{int}(c_{a}),\ h^{\prime}\big(\operatorname*{+}_{c_{a}\in P}\textit{ext}(c_{a})\big)\Big)$

$\displaystyle\bigvee P=\Big(h\big(\bigcup_{c_{a}\in P}\textit{int}(c_{a})\big),\ \operatorname*{\bullet}_{c_{a}\in P}\textit{ext}(c_{a})\Big)$

⁶
Let $(X_{1},\pi_{1})$, $(X_{2},\pi_{2})\in\textsc{AgreeConcepts}(r)$, $(X_{1},\pi_{1})\leqslant(X_{2},\pi_{2})\Leftrightarrow X_{1}\subseteq X_{2}$ (or in an equivalent way $\pi_{2}\sqsubseteq\pi_{1}$).

Proof.

Since the couple $gc=(f,\ g)$ is a Galois connection, the Agree Concept Lattice is a concept lattice according to the fundamental theorem of Wille [19]. ∎

Figure 2.

Hasse diagram of the Agree Concept Lattice of the relation Housing.

.

Figure 2 gives the Hasse diagram of the Agree Concept Lattice of the relation Housing. The couple $(\textit{DCN},\{1,2,35,4\})$ is an Agree Concept because, according to Example 6, we have $g(\textit{DCN})=\{1,2,35,4\}$ and $f(\{1,2,35,4\})=\textit{DCN}$. In contrast, the couple $(\textit{DC},\{1,2,35,4\})$ is not an Agree Concept because $f(\{1,2,35,4\})\neq\textit{DC}$. Let $c_{a}=(\textit{DCN},\{1,2,35,4\})$ and $c_{b}=(\textit{PN},\{1,25,3,4\})$ be two Agree Concepts, we have:

$\displaystyle c_{a}\wedge c_{b}=(\textit{DCN}\cap\textit{PN},\ h^{\prime}(\{1,2,35,4\}\operatorname*{+}\{1,25,3,4\}))=(N,\ h^{\prime}(\{1,235,4\}))=(N,\ \{1,235,4\})$

$\displaystyle c_{a}\vee c_{b}=(h(\textit{DCN}\cup\textit{PN}),\ \{1,2,35,4\}\operatorname*{\bullet}\{1,25,3,4\})=(h(\textit{PDCN}),\ \{1,2,3,4,5\})=(\textit{PDCN},\ \{1,2,3,4,5\})$
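The product ($\bullet$) and the sum ($+$) of partitions used in this example can be sketched as follows. The Python code is an illustration only: the function names `product`, `partition_sum` and `show` are our own, the product is computed as the common refinement of the two partitions and the sum by merging every pair of overlapping classes.

```python
def product(p, q):
    """Common refinement: the non-empty pairwise intersections of the classes."""
    return {frozenset(a & b) for a in p for b in q if a & b}


def partition_sum(p, q):
    """Finest partition coarser than both p and q: overlapping classes are merged."""
    classes = []
    for block in list(p) + list(q):
        block = set(block)
        untouched = []
        for c in classes:
            if c & block:
                block |= c      # absorb every class that overlaps the block
            else:
                untouched.append(c)
        classes = untouched + [block]
    return {frozenset(c) for c in classes}


def show(pi):
    """Compact notation of the text, e.g. ['1', '235', '4']."""
    return sorted("".join(map(str, sorted(c))) for c in pi)


ext_a = [{1}, {2}, {3, 5}, {4}]   # extent of c_a = (DCN, {1,2,35,4})
ext_b = [{1}, {2, 5}, {3}, {4}]   # extent of c_b = (PN, {1,25,3,4})

print(show(partition_sum(ext_a, ext_b)))
print(show(product(ext_a, ext_b)))
```

On the two extents of the example, the sum merges the classes 35 and 25 into 235, whereas the product separates all the identifiers.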

.

For any attribute set $X\subseteq\mathcal{R}$ , the associated partition $\pi_{X}$ is identical to the partition of its closure.

$\forall X\subseteq\mathcal{R},\pi_{X}=\pi_{h(X)}$

Proof.

By definition, $\forall X\subseteq\mathcal{R}$, $\pi_{X}=g(X)$ and $h(X)=f(g(X))$. Thus, we have $\pi_{h(X)}=g(f(g(X)))$. Since the couple $gc=(f,\ g)$ is a Galois connection, we have $g\circ f\circ g=g$ [19]. Then, $\pi_{h(X)}=g(f(g(X)))=g(X)=\pi_{X}$. ∎

.

With the relation Housing, by considering the set of attributes DC, according to the Examples 6 and 7, we have: $\pi_{\textit{DC}}=\textit{g(DC)}=\{1,2,35,4\}$ and $\pi_{\textit{h(DC)}}=\textit{g(h(DC))}=\textit{g(DCN)}=\{1,2,35,4\}$ .

The previous proposition means that the closure of a set of attributes $X$ can be seen as the greatest super-set of $X$ provided with the same partition.

4. The Skyline concept lattice

The lattice of the Skyline concepts is a constrained lattice of Agree concepts. We prove a fundamental property of our lattice which makes it possible to partially materialize the Skycube by discarding certain cuboids. Then, we show how such cuboids can be easily built again. As in the original definition of the Skycube [43, 33], the dimension attributes are ignored. Without any performance drawback, we can project these attributes out and retrieve them by a simple join thanks to the Rowid virtual attribute.

4.1 Skyline concept lattice for efficient multidimensional Skylines analysis

Let $\pi_{C}$ be the partition of $r$ according to the set of criteria $C$. By definition, the tuples of a same equivalence class $[t]_{C}\in\pi_{C}$ cannot be distinguished over $C$ (they share the same projection onto $C$). If $t$ is not dominated on $C$, none of the tuples of its class is dominated. It is therefore sufficient to check the dominance of a single tuple of an equivalence class to know whether the whole class belongs or not to the Skycuboid according to $C$. Thus, for optimizing the computation of a Skycuboid according to $C$ from its partition $\pi_{C}$, we only preserve a single representative of each equivalence class. This set of representatives is denoted by $\textit{reps}(\pi_{C})$. With this set, we reduce the size of the input by discarding tuples which would give rise to a great number of useless comparisons. In order to take into account the features of the Skyline operator computation from a partition, we introduce the following operator.

.

Let $C$ be a set of criteria and $\pi_{C}$ the associated partition of $r$. We define the new operator $\pi$-Sky in the following way:

$\pi\text{-}\textsc{Sky}_{C}(\pi_{C})=\{[t_{i}]\in\pi_{C}\mid\forall t_{j}\in r\text{ we have }t_{j}\nsucc_{C}t_{i}\}=\{[t_{i}]\in\pi_{C}\mid t_{i}\in\textsc{Sky}_{C}(r)\}$

(Skyline concepts).

Let $c_{a}=(C,\pi)\in\textsc{AgreeConcepts}(r)$ be an Agree concept of a relation $r$. The associated Skyline concept $c_{s}$ is defined as follows:

$c_{s}=(C,\pi\text{-}\textsc{Sky}_{C}(\pi))$

There are exactly as many Skyline concepts as there are Agree concepts. The set of Skyline concepts associated to the Agree concepts of $r$ is noted SkyConcepts($r$).

The Skyline concepts are Agree concepts for which the partitions have been constrained. Thus, this type of concept is not necessarily ordered by the order relation $\sqsubseteq$ between partitions. The order relation $\leqslant$ between Skyline concepts is expressed in the following way: for all $(C_{1},\pi\text{-}\textsc{Sky}_{C_{1}}(\pi_{1}))$, $(C_{2},\pi\text{-}\textsc{Sky}_{C_{2}}(\pi_{2}))\in\textsc{SkyConcepts}(r)$, we have $(C_{1},\pi\text{-}\textsc{Sky}_{C_{1}}(\pi_{1}))\leqslant(C_{2},\pi\text{-}\textsc{Sky}_{C_{2}}(\pi_{2}))\Leftrightarrow C_{1}\subseteq C_{2}$.

(Lattice of Skyline concepts).

The ordered set $\langle\textsc{SkyConcepts($r$)},\leqslant\rangle$ forms a complete lattice called lattice of Skyline concepts. It is isomorphic to the lattice of Agree concepts.

.

Figure 3 gives the Hasse diagram of the Skyline concept lattice of the relation Housing. According to Example 8, the couple $c_{a}=(\textit{DCN},\{1,2,35,4\})$ is an Agree concept. Its associated Skyline concept is $c_{s}=(\textit{DCN},\pi\text{-}\textsc{Sky}_{\textit{DCN}}(\{1,2,35,4\}))=(\textit{DCN},\{2,35,4\})$. The identifier $1$ is eliminated from the extent by the $\pi$-Sky operator because the tuple $t_{1}$ is dominated by $t_{2}$.
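The $\pi$-Sky operator can be sketched as follows: one representative is kept per equivalence class, the dominance tests are restricted to these representatives, and a class is kept as a whole when its representative is not dominated. The Python code is an illustration under our own assumptions (hypothetical names `project`, `partition`, `pi_sky` and `show`; the representative of a class is arbitrarily chosen as its smallest RowId).

```python
CRITERIA = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def project(i, C):
    """Values of the tuple identified by i on the criteria of C."""
    return tuple(HOUSING[i][CRITERIA.index(c)] for c in C)


def partition(C):
    """Equivalence classes of RowIds sharing the same projection on C."""
    classes = {}
    for i in HOUSING:
        classes.setdefault(project(i, C), set()).add(i)
    return [frozenset(c) for c in classes.values()]


def pi_sky(C):
    """Classes of the partition according to C whose tuples belong to the Skyline."""
    reps = {min(cls): cls for cls in partition(C)}  # one representative per class
    kept = set()
    for i, cls in reps.items():
        p = project(i, C)
        # Representatives have pairwise distinct projections, so being no worse
        # on every criterion already means strict dominance.
        dominated = any(
            all(a <= b for a, b in zip(project(j, C), p))
            for j in reps if j != i
        )
        if not dominated:
            kept.add(cls)
    return kept


def show(pi):
    return sorted("".join(map(str, sorted(c))) for c in pi)


print(show(pi_sky("DCN")))  # ['2', '35', '4']
```

Only four representatives are compared instead of five tuples here; the gain grows with the size of the equivalence classes.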

Figure 3.

Hasse diagram of the Skyline concept lattice of the relation Housing.

4.2 Partial materialization of the Skycube

In this section, we propose the Skyline concept lattice as a partial materialization approach for the Skycubes. The underlying idea for obtaining a reduced representation is to eliminate the cuboids which can be the most efficiently rebuilt.

(Disagree condition).

Let $r$ be a relation and $C$ a criterion set. The disagree condition $\textsc{Cna}_{C}(r)$ is satisfied when:

$\nexists\ t_{i},\ t_{j}\in r\text{ such that }t_{i}[C]=t_{j}[C]\text{ with }i\neq j,\ C\subseteq\mathcal{C}$

When $\textsc{Cna}_{C}(r)$ is verified, $\textsc{Cna}_{X}(r)$ also holds for any superset $X$ of $C$.

This disagree condition is a weakened version of the condition of distinct values [43] because it applies on the projections instead of the individual values of each criterion.
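Checking the disagree condition amounts to verifying that the projections on $C$ are pairwise distinct. The following Python sketch is illustrative: the function name `cna` and the encoding of the relation are assumptions of ours. It also checks, on the relation Housing, that the condition propagates to the supersets.

```python
from itertools import combinations

CRITERIA = "PDCN"  # Price, Distance, Consumption, Neighbour
HOUSING = {
    1: (220, 15, 275, 5),
    2: (100, 15, 85, 1),
    3: (150, 7, 180, 1),
    4: (340, 7, 85, 3),
    5: (100, 7, 180, 1),
}


def cna(relation, C):
    """Disagree condition: no two tuples share the same projection on C."""
    projections = [tuple(t[CRITERIA.index(c)] for c in C)
                   for t in relation.values()]
    return len(set(projections)) == len(projections)


subsets = ["".join(s) for k in range(1, 5) for s in combinations(CRITERIA, k)]

# Every superset of a criterion set satisfying the condition satisfies it too.
upward_closed = all(
    cna(HOUSING, X)
    for C in subsets if cna(HOUSING, C)
    for X in subsets if set(C) <= set(X)
)

print(cna(HOUSING, "PD"), cna(HOUSING, "DCN"), upward_closed)
```

In Housing, the condition fails for DCN because $t_{3}$ and $t_{5}$ share the same projection, while it holds for PD and hence for every superset of PD.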

(Dominance under Cna).

Let $C\subseteq\mathcal{C}$ be a set of criteria and $r$ a relation. The dominance relationship $t_{j}\succ_{C}t_{i}$ under the $\textsc{Cna}_{C}(r)$ hypothesis can be expressed more simply:

$\text{Let }C\text{ such that }\textsc{Cna}_{C}(r),\ \forall t_{i},\ t_{j}\in r\text{ with }i\neq j\text{ we have }t_{j}\succ_{C}t_{i}\text{ iff }\forall c\in C,\ t_{j}[c]\leqslant t_{i}[c]$

.

Let $r$ be a relation. For any criterion set $C\subseteq\mathcal{C}$ verifying the disagree condition $\textsc{Cna}_{C}(r)$ , we have $\textsc{Sky}_{C}(r)\subseteq\textsc{Sky}_{C\cup c_{0}}(r)$ with $c_{0}\in{\cal C}\smallsetminus C$ .

Proof.

Under the hypothesis $\textsc{Cna}_{C}(r)$ , we have: $t_{i}\in\textsc{Sky}_{C}(r)$ $\Rightarrow$ $\forall t_{j}\in r$ with $i\neq j$ , we have $t_{j}\nsucc_{C}t_{i}$ $\Rightarrow$ $\forall t_{j}\in r$ with $i\neq j$ , $\exists c\in C$ such that $t_{j}[c]>t_{i}[c]$ (cf. Definition 12) $\Rightarrow$ $\forall t_{j}\in r$ with $i\neq j$ , $\exists c\in C\cup c_{0}$ such that $t_{j}[c]>t_{i}[c]$ $\Rightarrow$ $\forall t_{j}\in r$ with $i\neq j$ , we have $t_{j}\nsucc_{C\cup c_{0}}t_{i}$ $\Rightarrow$ $t_{i}\in\textsc{Sky}_{C\cup c_{0}}(r)$ . Under the hypothesis $\textsc{Cna}_{C}(r)$ , we have $\textsc{Sky}_{C}(r)\subseteq\textsc{Sky}_{C\cup c_{0}}(r)$ . ∎

The following counter-example shows that the reverse property does not hold.

.

Let $r=\{t_{1},t_{2}\}$ be a relation (with $t_{1}=(0,\ 1)$ and $t_{2}=(1,\ 0)$) and $\mathcal{C}=\{A,B\}$ the set of its criteria. The condition $\textsc{Cna}_{A}(r)$ holds. We have $t_{2}\notin\textsc{Sky}_{A}(r)$ because $t_{1}\succ_{A}t_{2}$, whereas $t_{2}$ belongs to $\textsc{Sky}_{A\cup B}(r)$.
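The inclusion and its counter-example can be replayed on this two-tuple relation. The sketch below is a naive quadratic Skyline, not one of the algorithms used in the paper; it assumes that smaller values are preferred, and the function names are ours.

```python
def dominates(u, t, criteria):
    """General dominance (smaller is better): u is no worse than t on every
    criterion and strictly better on at least one."""
    return (all(u[c] <= t[c] for c in criteria)
            and any(u[c] < t[c] for c in criteria))


def skyline(relation, criteria):
    """Indices of the tuples that no other tuple dominates on `criteria`."""
    return {i for i, t in enumerate(relation)
            if not any(dominates(u, t, criteria)
                       for j, u in enumerate(relation) if j != i)}


A, B = 0, 1
r = [(0, 1), (1, 0)]         # t1 and t2 of the counter-example

sky_a = skyline(r, [A])      # only t1 (index 0)
sky_ab = skyline(r, [A, B])  # t1 and t2 (indices 0 and 1)

print(sky_a <= sky_ab)       # True: the inclusion of the lemma
print(sky_ab <= sky_a)       # False: the reverse inclusion fails
```

Since $\textsc{Cna}_{A}(r)$ holds here, the strict part of the test in `dominates` is redundant on criterion $A$: two distinct tuples can never tie on their projection.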

.

Let $C$ be a criterion set such that the $\textsc{Cna}_{C}(r)$ condition holds. Then every $X$ with $C\subseteq X$ satisfies the condition $\textsc{Cna}_{X}(r)$, and we have $\textsc{Sky}_{C}(r)\subseteq\textsc{Sky}_{X}(r)$.

This property holds for every superset of $C$, up to the full set of criteria $\mathcal{C}$.

(Fundamental theorem).

Let $r$ be a relation, $C$ a set of criteria and $h(C)$ its closure. Then:

$\forall C\subseteq\mathcal{C}\text{ we have }\textsc{Sky}_{C}(r)\subseteq\textsc{Sky}_{h(C)}(r)$

Proof.

By definition, $t_{i}\in\textsc{Sky}_{C}(r)\text{ iff }\exists[t]\in\pi\text{-}\textsc{Sky}_{C}(\pi_{C})\text{ such that }i\in[t]$. We would like to show that $\pi\text{-}\textsc{Sky}_{C}(\pi_{C})\subseteq\pi\text{-}\textsc{Sky}_{C}(\pi_{h(C)})$. We know that for any $X$ such that $C\subseteq X\subseteq h(C)$ we have $\pi_{X}=\pi_{C}$. Let $E=\{t_{i}\in r\mid i\in\textit{reps}(\pi_{C})\}$ be the set of the representative tuples of this partition. We can make use of $E$ for the computation of the Skylines according to $X$ only in the interval $C\subseteq X\subseteq h(C)$; outside this interval, the equivalence classes are no longer equal and it is not possible to ensure the correct construction of the whole Skyline from the representatives of the equivalence classes. Moreover, $\textsc{Cna}_{X}(E)$ is verified because each representative tuple can be distinguished from the other ones. Therefore, according to Lemma 1, we have $\textsc{Sky}_{C}(E)\subseteq\textsc{Sky}_{h(C)}(E)$, hence $\textsc{Sky}_{C}(r)\subseteq\textsc{Sky}_{h(C)}(r)$. ∎
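The central step of the proof, computing a Skyline from one representative per equivalence class, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the relation `r` is hypothetical, criteria are column positions, smaller values are preferred, and the function names are ours.

```python
from collections import defaultdict


def partition(relation, criteria):
    """Partition pi_C: classes of tuple indices sharing the same projection."""
    classes = defaultdict(list)
    for i, t in enumerate(relation):
        classes[tuple(t[c] for c in criteria)].append(i)
    return list(classes.values())


def skyline_by_classes(relation, criteria):
    """Skyline on `criteria`, comparing a single representative per class."""
    classes = partition(relation, criteria)
    reps = [relation[cls[0]] for cls in classes]
    result = set()
    for k, t in enumerate(reps):
        # Representatives differ pairwise on `criteria` (Cna holds on them),
        # so "no worse everywhere" is enough to decide dominance.
        dominated = any(all(u[c] <= t[c] for c in criteria)
                        for m, u in enumerate(reps) if m != k)
        if not dominated:
            result.update(classes[k])  # the whole class enters the Skyline
    return result


# Hypothetical relation: criterion 0 determines criterion 1 but not criterion 2.
r = [(1, 2, 5), (1, 2, 3), (2, 1, 4), (3, 3, 1)]

print(sorted(skyline_by_classes(r, [0])))     # [0, 1]
print(sorted(skyline_by_classes(r, [0, 1])))  # [0, 1, 2]
```

The number of comparisons depends on the number of classes rather than on the number of tuples: the first two tuples are handled as one group in both calls.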

.

In the relation Housing, we have:

• $\textsc{Sky}_{\textit{DC}}(\textsc{Housing})=\{t_{4}\}$

Moreover, the following properties hold:

• $h(\textit{DC})=\textit{DCN}$ and $\textsc{Sky}_{\textit{DCN}}(\textsc{Housing})=\{t_{2},t_{3},t_{4},t_{5}\}$
• $\textsc{Sky}_{\textit{DC}}(\textsc{Housing})\subseteq\textsc{Sky}_{h(\textit{DC})}(\textsc{Housing})$

In contrast, we have $\textsc{Sky}_{D}(\textsc{Housing})=\{t_{3},t_{4},t_{5}\}$ with $h(D)=D$. Note the non-inclusion $\textsc{Sky}_{D}(\textsc{Housing})\nsubseteq\textsc{Sky}_{\textit{DC}}(\textsc{Housing})$: the inclusion is only guaranteed between a criterion set and its closure.

The theorem above establishes an inclusion between some Skyline cuboids. More precisely, for any set of criteria, there is a chain of inclusions from its minimal generators to its closure. This means that a cuboid can be computed from another one which contains it: instead of scanning the original relation, we only consider a limited subset of it. The Skyline concepts are the largest cuboids (according to the inclusion relationship) which make it possible to compute other cuboids. The theorem thus provides a local monotonicity w.r.t. inclusion. By preserving only the Skyline concepts (i.e. by not materializing the non-closed cuboids), missing cuboids can be quickly rebuilt merely by finding the closed cuboid from which they can be computed. Furthermore, thanks to the equivalence classes of Skyline concepts, we avoid a great number of useless comparisons. Tuples which cannot be distinguished are handled as groups, and the computation complexity no longer depends on the number of tuples but on the number of groups (equivalence classes).

The lattice of Skyline concepts constitutes a partial materialization of the Skycube, from which the information which is not materialized can be efficiently computed.
5. Experimental evaluations

In this section, our aim is to validate our theoretical materialization approach through various experiments. The latter are conducted according to a twofold objective: on the one hand, measuring the size reduction and, on the other hand, evaluating the response time for answering Skyline queries.

For our experiments, we generate synthetic databases⁷ of one million tuples and ten dimensions.

⁷ The synthetic data generator is given at http://illimine.cs.uiuc.edu/.

Figure 4. Query response time.

Figure 5. Size reduction.

5.1 Size reductions

In this experiment, we measure both the size of the Skycube and the size of our Skyline Concept Lattice in order to compare them. Fig. 5 shows the results obtained when the cardinality of the criteria varies from one hundred to ten thousand. The more the cardinality increases, the more attractive our representation becomes in terms of the storage space it requires, whereas the Skycube keeps growing.

.

With our relation example, the Skycube of Housing according to Price, Distance, Consumption and Neighbors encompasses 29 tuples. The size of the associated Skyline Concept Lattice is 22 tuples.

5.2 Query response time

In this experiment, we compare the response time of a Skyline query computed with the SFS [13] algorithm (a short description is given in Appendix A.3) when there is no materialization and when our Skyline Concept Lattice is available. With our partial materialization, two cases have to be considered. In the simplest one, the result of the query is stored in our materialization and the cost of the query is therefore minimal. In the second one, the result of the query must be computed from the associated closed skycuboid by using Theorem 2. To obtain meaningful results, each point on the curve in Fig. 4 stands for the average execution time of hundreds of queries, while the cardinality of the criteria varies from one hundred to ten thousand. To simulate user queries, we draw the number of criteria of each query from a binomial distribution centered on half the number of possible criteria, and then we pick the combination of criteria randomly. This way, the processed queries often involve several (but not too many) criteria, which makes them more meaningful than queries drawn uniformly at random. The results show that our Skyline Concept Lattice is effective: it quickly answers any Skyline query, and its gain factor is constant.
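The query simulation described above can be sketched as follows. The exact generator used in the experiments is not given, so several points are assumptions of this sketch: a success probability of 1/2 (which centers the binomial distribution on half the number of criteria), the redrawing of empty queries, the fixed seed and the function name `random_query`.

```python
import random


def random_query(n_criteria, rng):
    """Draw the query size from Binomial(n, 1/2), then pick that many
    criteria uniformly at random. Empty draws are redrawn (our assumption)."""
    k = 0
    while k == 0:
        k = sum(rng.random() < 0.5 for _ in range(n_criteria))
    return sorted(rng.sample(range(n_criteria), k))


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
queries = [random_query(10, rng) for _ in range(500)]
mean_size = sum(len(q) for q in queries) / len(queries)

print(queries[:3])
print(round(mean_size, 2))  # close to 5 for ten criteria
```

With ten criteria, most simulated queries involve between three and seven criteria, which matches the intent of favoring queries with several, but not too many, criteria.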

For instance, if we are interested in the least expensive housings, we have to compute $\textsc{Sky}_{P}(\textsc{Housing})$, which is not materialized in the Skyline Concept Lattice. However, it can be quickly computed from $\textsc{Sky}_{\textit{PN}}(\textsc{Housing})$ (cf. Theorem 2). Thus $\textsc{Sky}_{P}(\textsc{Housing})$ encompasses the tuples with RowId 2 and 5.

6. Related work and comparison

The Skyline operator [9] is intended for retrieving the most relevant objects according to some criteria in a database context. It originates from the maximal vector problem [3, 7]. However, due to the specific features of this context, dedicated algorithms have been developed. The most famous ones are Bnl [9], Sfs [13], Less [20] and Salsa [4]. Other more efficient algorithms like Bbs [30] have been proposed but make use of index structures which are costly to maintain.

When computing Skyline queries with numerous criteria, the risk is to obtain a huge number of Skyline objects which are not necessarily informative. Several approaches attempt to solve this problem. [40] proposes a ranking approach of Skyline objects for a Skycube in order to focus on the most informative objects. A Skyline graph is built up and captures the dominance relationships between Skyline objects belonging to different subspaces.

The "curse of dimensionality" is likely to be more frequent when several relations are scanned (Skyline over join) or when several databases are queried in a distributed context [22]. Some approaches address Skyline query optimization and propose efficient algorithms to compute the Skyline of several relations without performing the join. [39] proposes a novel algorithm to reach such an objective. For the sake of efficiency, the algorithm does not access all the tuples of the relations and the join result is not entirely generated. Moreover, it is robust and terminates early. Semi-Skyline [16, 17] discards, from the relations, the tuples which never belong to the result. Obviously, this operation reduces the relation size and the cost of computing later joins. A semi-Pareto preference is defined for preference queries combined with selections. As mentioned above, a critical problem is to deal with numerous criteria. Thus a main question is how to supply the user with the most relevant criteria.

The Skyline operator is a decision making tool and the user will likely compute several Skylines before he finds the ones which are really interesting. In order to address such an issue, the Skycube concept [43, 33] has been proposed. The underlying general idea is to compute all the possible Skylines in advance, therefore it is necessary to reduce as much as possible the storage cost of the result.

Reduction approaches have been proposed by [34, 35]. Their objective is to solve the problem related to objects belonging to several Skycuboids. Their solutions are a representation of the Skycube, based on formal concept analysis, where each node corresponds to a couple made of an equivalence class (set of objects) and the subspaces in which they belong to the Skyline. The main drawback of this value-oriented approach is that the number of nodes is bounded only by the cardinality of the powerset lattice of the tuples ($|\mathscr{P}(Tid(r))|$). The Skyline objects are considered first; then, to build a Skycuboid, a large number of lattice nodes must be accessed. In attribute-oriented approaches, like ours, the number of nodes is much smaller since it is bounded by the size of the power set of the criterion set, $|\mathscr{P}(\mathcal{C})|$.
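To give an order of magnitude for these two bounds, the setting of Section 5 (one million tuples, ten dimensions) can be used purely as an illustration. Both figures are upper bounds on the number of nodes, not measured lattice sizes:

```latex
\[
|\mathscr{P}(\mathcal{C})| = 2^{|\mathcal{C}|} = 2^{10} = 1024
\qquad\text{whereas}\qquad
|\mathscr{P}(Tid(r))| = 2^{|r|} = 2^{10^{6}} .
\]
```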

Another interesting approach is proposed in [41] and is cuboid oriented (attribute oriented). Its objective is to obtain a very important reduction of the Skycube by discarding the elements considered as redundant in the different cuboids. Its drawback is that the reconstruction of a Skycuboid is difficult and relatively costly because the underlying data structure must be scanned to retrieve redundant elements. Moreover, in contrast with the approach described in [34, 32] and ours, the representation is not based on a theoretical support as sound as formal concept analysis. However, in addition to the great reduction of the storage cost, one of the main interests of the approach is the efficiency of data updates.

In addition to the Skyline materialization, our approach can take advantage of efficient existing algorithms in order to suggest criterion sets that are likely to be interesting and to compute the associated Skylines. An objective of our approach is Skycuboid reconstruction at the least cost, and it behaves ideally when such a task must be performed. Despite this targeted orientation, it is a good compromise between data updates and storage space reduction.

7. Conclusion

In this paper, we have defined a new theoretical foundation for multidimensional Skyline analysis which is at the crossroads of formal concept analysis and database theory.

We have presented the Agree Concept Lattice [28] by combining on the one hand the concepts of database partition [36] and agree set [5] and on the other hand the formal concept lattice [19]. This structure can be redefined with pattern structures [18].

Then we have introduced the lattice of Skyline concepts which is a constrained instance of the Agree Concept Lattice. This new structure offers the partial and lossless materialization of Skycubes as well as the efficient reconstruction of missing cuboids. An additional advantage of our lattice is that the number of its nodes depends on the number of dimensions, generally small and constant, and not on the number of the dimension values.

One of the main features of our Agree Concept Lattice structure is its genericity: it can be applied to several application fields, in particular for mining both exact and approximate functional dependencies or computing data cubes. Our idea when proposing this structure is to make use of database concepts to solve database and Olap problems. This makes it possible to take advantage of existing and efficient database tools. Various approaches address Skyline query optimisation, and our work can be seen as a complementary step which aims to help the decision maker choose the most relevant criteria.

We can easily extend our materialization approach to the cases of $\epsilon$ -Skylines or approximate Skylines [30], by generalizing the definition of agree set: the strict equality is then replaced by the $\epsilon$ -equality. With such an extension, the user can relax the dominance constraint when the number of results is not sufficient. The Skyline ranking approach [40] applies in the context of the dimension subset lattice. We believe that the Skyline concept lattice can be an interesting framework for such an issue because it is a more reduced space.

Appendix

Formal background

In this section, we present the fundamental concepts of our approach: the partition lattice [8] and the agree set [5], which originate from database theory.

References

1. C.C. Aggarwal, Hinneburg and D.A. Keim, On the surprising behavior of distance metrics in high dimensional spaces, in: ICDT, volume 1973 of Lecture Notes in Computer Science, J.V. den Bussche and Vianu, eds, Springer, 2001, pp. 420–434.

2. Barbut, Partitions d’un ensemble fini: leur treillis (cosimplexe) et leur représentation géométrique, Mathématiques et Sciences Humaines 22 (1968), 5–22.

3. Barndorff-Nielsen and Sobel, On the Distribution of the Number of Admissible Points in a Vector Random Sample, Theory of Probability and its Applications 11 (1966), 249.

4. Bartolini, Ciaccia and Patella, Efficient sort-based Skyline evaluation, ACM Trans. Database Syst. 33(4) (2008).

5. Beeri, Dowd, Fagin and Statman, On the structure of armstrong relations for functional dependencies, J. ACM 31(1) (1984), 30–46.

6. R.E. Bellman, Dynamic Programming, Dover Publications, Incorporated, 2003.

7. J.L. Bentley, H.T. Kung, Schkolnick and C.D. Thompson, On the average number of maxima in a set of vectors and applications, J. ACM 25(4) (1978), 536–543.

8. Birkhoff, Lattice Theory, volume XXV of AMS Colloquium Publications, American Mathematical Society, third (new) edition, 1970.

9. Börzsönyi, Kossmann and Stocker, The Skyline operator, in: ICDE, IEEE Computer Society, 2001, pp. 421–430.

10. Casali, Cicchetti and Lakhal, Extracting semantics from data cubes using cube transversals and closures, in: KDD, Getoor, T.E. Senator, Domingos and Faloutsos, eds, ACM, 2003, pp. 69–78.

11. Casali, Nedjar, Cicchetti and Lakhal, Closed cube lattices, Annals of Information Systems 3(1) (2009), 145–164. New Trends in Data Warehousing and Data Analysis.

12. C.Y. Chan, H.V. Jagadish, K.-L. Tan, A.K.H. Tung and Zhang, On high dimensional Skylines, in: EDBT, volume 3896 of Lecture Notes in Computer Science, Y.E. Ioannidis, M.H. Scholl, J.W. Schmidt, Matthes, Hatzopoulos, Böhm, Kemper, Grust and Böhm, eds, Springer, 2006, pp. 478–495.

13. Chomicki, Godfrey, Gryz and Liang, Skyline with presorting, in: ICDE, Dayal, Ramamritham and T.M. Vijayaraman, eds, IEEE Computer Society, 2003, pp. 717–816.

14. Chomicki, Godfrey, Gryz and Liang, Skyline with presorting: Theory and optimizations, in: Intelligent Information Systems, M.A. Klopotek, S.T. Wierzchon and Trojanowski, eds, Advances in Soft Computing, Springer, 2005, pp. 595–604.

15. Eder and Wei, Evaluation of Skyline algorithms in postgresql, in: IDEAS, ACM International Conference Proceeding Series, B.C. Desai, Saccà and Greco, eds, ACM, 2009, pp. 334–337.

16. Endres and Kießling, Semi-Skyline optimization of constrained Skyline queries, in: ADC, volume 115 of CRPIT, H.T. Shen and Zhang, eds, Australian Computer Society, 2011, pp. 7–16.

17. Endres and Kießling, Skyline snippets, in: FQAS, volume 7022 of Lecture Notes in Computer Science, Christiansen, G.D. Tré, Yazici, Zadrozny, Andreasen and H.L. Larsen, eds, Springer, 2011, pp. 246–257.

18. Ganter and S.O. Kuznetsov, Pattern structures and their projections, in: ICCS, volume 2120 of Lecture Notes in Computer Science, H.S. Delugach and Stumme, eds, Springer, 2001, pp. 129–142.

19. Ganter and Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999.

20. Godfrey, Shipley and Gryz, Algorithms and analyses for maximal vector computation, VLDB J. 16(1) (2007), 5–28.

21. Gray, Chaudhuri, Bosworth, Layman, Reichart, Venkatrao, Pellow and Pirahesh, Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub totals, Data Min. Knowl. Discov. 1(1) (1997), 29–53.

22. Hose and Vlachou, A survey of Skyline processing in highly distributed environments, VLDB J. 21(3) (2012), 359–384.

23. L.V.S. Lakshmanan, Pei and Han, Quotient cube: How to summarize the semantics of a data cube, in: VLDB, F.H. Lochovsky and Shan, eds, Morgan Kaufmann, 2002, pp. 778–789.

24. Lopes, J.-M. Petit and Lakhal, Functional and approximate dependency mining: database and fca points of view, J. Exp. Theor. Artif. Intell. 14(2-3) (2002), 93–114.

25. Morfonios and Y.E. Ioannidis, Cure for cubes: Cubing using a rolap engine, in: VLDB, U. Dayal, K.-Y. Whang, D.B. Lomet, Alonso, G.M. Lohman, M.L. Kersten, S.K. Cha and Y.-K. Kim, eds, ACM, 2006, pp. 379–390.

26. Nedjar, Casali, Cicchetti and Lakhal, Emerging cubes: Borders, size estimations and lossless reductions, Information Systems 34(6) (2009), 536–550.

27. Nedjar, Cicchetti and Lakhal, Extracting semantics in olap databases using emerging cubes, Information Sciences 181(10) (2011), 2036–2059.

28. Nedjar, Pesci, Cicchetti and Lakhal, The agree concept lattice for multidimensional database analysis, in: ICFCA 2011 – 9th International Conference on Formal Concept Analysis, Lecture Notes in Artificial Intelligence, Springer-Verlag, 2011.

29. Nedjar, Pesci, Cicchetti and Lakhal, Treillis des concepts Skylines: Analyse multidimensionnelle des Skylines fondée sur les ensembles en accord, in: EGC 2011 – Extraction et gestion des connaissances, Revue des Nouvelles Technologies de l’Information, Cépaduès-Éditions, 2011.

30. Papadias, Tao, Fu and Seeger, Progressive Skyline computation in database systems, ACM Trans. Database Syst. 30(1) (2005), 41–82.

31. Pasquier, Y. Bastide, Taouil and Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24(1) (1999), 25–46.

32. Pei, A.W.-C. Fu, Lin and Wang, Computing compressed multidimensional Skyline cubes efficiently, in: ICDE, Dogac, Ozsu and Sellis, eds, IEEE, 2007, pp. 96–105.

33. Pei, Jin, Ester and Tao, Catching the best views of Skyline: A semantic approach based on decisive subspaces, in: VLDB, 2005, pp. 253–264.

34. Pei, Yuan, Lin, Jin, Ester, Liu, Wang, Tao, J.X. Yu and Zhang, Towards multidimensional subspace Skyline analysis, ACM Trans. Database Syst. 31(4) (2006), 1335–1381.

35. Raïssi, Pei and Kister, Computing closed skycubes, PVLDB 3(1) (2010), 838–847.

36. Spyratos, The partition model: A deductive database model, ACM Trans. Database Syst. 12(1) (1987), 1–37.

37. Spyratos, Sugibuchi, Simonenko and Meghini, Computing the Skyline of a relational table based on a query lattice, in: Formal Concept Analysis, volume 7278 of Lecture Notes in Computer Science, F. Domenach, D.I. Ignatov and Poelmans, eds, Springer, 2012.

38. Stumme, Taouil, Bastide, Pasquier and Lakhal, Computing iceberg concept lattices with titanic, Data Knowl. Eng. 42(2) (2002), 189–222.

39. Vlachou, Doulkeridis and Polyzotis, Skyline query processing over joins, in: SIGMOD Conference, T.K. Sellis, R.J. Miller, Kementsietsidis and Velegrakis, eds, ACM, 2011, pp. 73–84.

40. Vlachou and Vazirgiannis, Ranking the sky: Discovering the importance of Skyline points through subspace dominance relationships, Data Knowl. Eng. 69(9) (2010), 943–964.

41. Xia and Zhang, Refreshing the sky: the compressed skycube with efficient support for frequent updates, in: SIGMOD Conference, Chaudhuri, Hristidis and Polyzotis, eds, ACM, 2006, pp. 491–502.

42. Xin, Shao, Han and Liu, C-cubing: Efficient computation of closed cubes by aggregation-based checking, in: ICDE, Liu, Reuter, K.-Y. Whang and Zhang, eds, IEEE Computer Society, 2006, pp. 4.

43. Yuan, Lin, Liu, Wang, J.X. Yu and Zhang, Efficient computation of the Skyline cube, in: VLDB, 2005, pp. 241–252.
