Relaxing of flexible RDF queries: A relative proximity relation-based approach

Abstract

In recent years there have been a great deal of interests in effectively extracting information from RDF repositories. However, users often experience empty answer problems (EAP) when they are not familiar with the structure or content of a RDF dataset and issue a restrictive query to the system. A solution is to allow users to incorporate fuzzy conditions into the query to make it more flexible. Although a fuzzy query is more or less satisfactory, there are still occasions when no or too few entities in a RDF dataset satisfy the query criteria with a degree more than zero, which is called a fuzzy-EAP problem. In this paper, we propose a query relaxation approach based on relative proximity relations to tackle this problem. We present a framework of fuzzy query relaxation over RDF datasets, and present query relaxation algorithms for atomic f-SPARQL queries and complex f-SPARQL queries. Moreover, this paper reports on a prototype implementation on an extension of LUBM dataset to verify the feasibility and effectiveness of the proposed method. The experimental results show a relatively high recall as more relevant results are returned after relaxation.

Keywords

Query relaxation f-SPARQL flexible RDF queries relative proximity relation minimal failing subqueries

1 Introduction

With more and more domain knowledge being encoded in RDF (Resource Description Framework) data formats [23], retrieving data and information from RDF datasets has become an important research topic. SPARQL (Simple Protocol and RDF Query Language) is the W3C proposed standard RDF query language, which provides users with the capability to extract information from graph-based data [25].

However, ordinary users may not fully understand the structure and content of a RDF dataset on account of its growing scale and complexity. In such cases, even if users are accurately aware of their own query intentions, there are still chances when the query result is empty or too few in amount, which is called an Empty Answer Problem (EAP). In order to avoid users from adopting inefficient “trial and error” method to get more results, there have been query systems [19, 20] using RDFS ontology entailment to automatically relax an initial SPARQL query to obtain more results.

Another solution for EAP is to allow users to issue queries with flexible/fuzzy conditions. The idea of introducing flexibility into queries is gaining more and more attention in the Semantic Web community. In particular, there have been researches on applying fuzzy set theory [29] to RDF query languages to increase the ability of expressing fuzzy query conditions and thus obtain so called flexible queries for RDF, e.g., a flexible extension of SPARQL called f-SPARQL [8] which allows to construct fuzzy FILTER constraints by means of fuzzy terms, fp-SPARQL [28] and SPARQLf-p [22] which further extends f-SPARQL with preferred expressions, and Fuzzy SPARQL [26] which addresses queries to fuzzy RDFS graphic data. Although a fuzzy query is more or less satisfactory, there are still occasions when there is no or too few entities in a RDF dataset that satisfy the query criteria with a degree more than zero, which is called a fuzzy-EAP problem. Inspired partly by studies from relational database research [2, 6], we endeavor to achieve automatic relaxation of a failing fuzzy RDF query to obtain more results. This paper makes the following major contributions:

To the best of our knowledge, it presents the first query relaxation framework for flexible queries over RDF datasets. The framework consists of relaxation algorithms for atomic fuzzy queries and complex fuzzy queries. The relaxation lattice and the weights of fuzzy conditions is introduced to determine the execution order of relaxed queries, and the MFSs (Minimal Failing Subqueries) is introduced to avoid unnecessary execution of some relaxed queries.

It presents a prototype implementation on an extension of LUBM dataset to verify the feasibility and effectiveness of the proposed fuzzy query relaxation method. The experimental results show a relatively high recall as more relevant results are returned after relaxation.

The remainder of this paper is structured as follows. Section 2 presents related works. Section 3 briefly review some necessary background knowledge of RDF, SPARQL and its flexible extension f-SPARQL, and the relative proximity relationship and the relaxation principle of fuzzy sets. Sections 4 and 5 provide the relaxation algorithms for atomic and complex f-SPARQL queries respectively. Section 6 provides an empirical evaluation of the effectiveness of proposed method. We wrap the paper with a brief conclusion and some directions of future works.

2 Related works

When users are unclear of the content and structure of RDF datasets, they frequently encounter the problem of too few or even no results when submitting a very specific query, which is in this scenario called a failing query. A common and intuitive solution for this problem is query relaxation, which means trying repeatedly a more general query until expected number of results is obtained.

In the area of Semantic Web, relaxation of RDF queries, especially SPARQL queries, has been extensively studied. Hurtado et al. [20, 21] put forward a query relaxation method based on RDFS entailment, which automatically relaxes a failing query to generate more general queries and returns approximate query results relevant to the original query. Poulovassilis et al. [24] extend the query relaxation method in [20] to support regular path queries, and show its combination with query approximation. Calì et al. [7] extend the work in [24] to propose an extension of SPARQL 1.1 with approximation and relaxation operators which is applicable to regular expressions for querying property paths. Building on the work of [20], Huang et al. [18, 19] propose a ranking strategy which computes the similarity degrees between relaxed queries and the original query and use them to score the potential relevant answers. Dolog et al. [10] propose query rewriting rules by including domain knowledge (not encoded in RDFS) and user’s preference to handle the problem of excessive restrictions on query conditions in RDF queries.

The abovementioned query relaxation methods based on RDFS semantics [18 , 24] or domain knowledge [10] are for categorical attributes in RDF datasets and are not applicable to numeric attributes. In addition, these solutions for crisp SPARQL queries cannot be applied to the relaxation of fuzzy queries directly.

In the area of database, the relaxation of fuzzy queries is well studied in the last nineteenth. The seminal work comes from Andreasen and Pivert [1]. The approach is based on relaxation of one or several fuzzy predicates using a fuzzy linguistic modifier which extends the support of the original predicate to obtain more answers. Fuzzy selectivity is used to guide the execution order of relaxed queries by finding the query as close as possible to the original query. In a series of papers, Bosc et al. propose two relaxation paradigms for fuzzy relational queries based on relative proximity relation [2, 3] and absolute proximity relation [6], respectively. The methods show how a tolerance relation modeled by an appropriate parameterized proximity relation can transform a given fuzzy term into an enlarged one by widening its support. Voglozin et al. [27] propose a method to repair failing queries addressed to data summaries. The method is based on a specified distance and generates alternative queries by modifying fuzzy labels in a query. This method does not provide any termination condition for relaxation process. De Calmès et al. [9] presents an information system, called PRETI, which is applied to a house renting database. The system provides a flexible querying module which use a case-based method to avoid empty answer. Hachani et al. [15] present a FCA (Formal Concept Analysis) based cooperative approach to handling empty answers. The approach provides users with the nearest answers to the original query and gives the cause of a failing query.

Much less attention has been paid to the relaxation of fuzzy queries over crisp RDF datasets. Very few works are concerned with this problem. Hogan et al. [17] introduce a real-life use case that require approximate answers for fuzzy queries over crisp RDF data. They use query relaxation to match incomplete description of entities (i.e., fuzzy queries) to structured descriptions encoded in RDF (i.e., the data). Fuzzy constraints involved in [17] are categorical attribute values prefixed by modifiers (e.g., “slightly”, “looks like”, and “may be”) to form the fuzzy semantics. In this paper, we consider fuzzy constraints in a more general sense, i.e., fuzzy terms (e.g., “high”, “short”, and “young”) that correspond to fuzzy sets.

3 Preliminaries

In this section, we give a brief introduction of RDF data model and the RDF standard query language SPARQL, fuzzy set theory and the relaxation principle of fuzzy sets, and finally a flexible extension of SPARQL called f-SPARQL.

3.1 RDF and SPARQL

RDF is designed as a metadata data model for describing of Web resources. Its basic building block is an entity-attribute-value triple, called a statement. More formally, let I, B, and L be pairwise disjointing infinite sets of IRIs (Internationalized URIs), blank nodes and RDF literals, respectively, a triple (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L) is called an RDF triple, where s is called the subject, p the predicate (or property), and o the object. The RDF triple implies “a subject s has a predicate (or property) p with value o”.

SPARQL is the W3C standard query language and data access protocol, which is specially developed for RDF. A SPARQL query is actually a graph pattern to be matched in a RDF dataset. A graph pattern is constituted of triple patterns and their conjunctions and/or disjunctions. A triple pattern is a RDF triple that may contain variables. Among others, SPARQL allows for select queries, formed in a SELECT-FROM-WHERE manner. The result form represents the set of variables appearing in the SELECT, the dataset forms the FROM part, constituted by a set of IRIs of RDF datasets, while the graph pattern forms the WHERE part. The detailed syntax of SPARQL can be found in [25].

3.2 Fuzzy set theory and principle of relaxation

We introduce some basic concept of fuzzy set theory, a special kind of fuzzy sets called relative proximity relations, and the principle of fuzzy set relaxation.

3.2.1 Fuzzy set theory

In order to quantitatively describe fuzzy concepts and fuzzy phenomena, L.A. Zadeh [29] proposed the concept of fuzzy set to expand the membership relationship in the classic set. The membership degree of an elements to a set is described by a membership function.

Definition 1. (membership function) Let U be a universe of discourse, a fuzzy subset (or simply fuzzy set) A on U is characterized by a membership function μ_A : U → [0, 1], which assigns a membership degree to each element u in U, denoted by μ_A (u) or simply A (u).

Definition 2. (support and core of a fuzzy set) Given A a fuzzy set, the support of A is a set of elements with non-zero membership degrees, denoted by supp (A) = {u ∈ U|μ_A (u) >0}, the core of A is a set of the elements that completely belong to A, denoted by core (A) = {u ∈ U|μ_A (u) =1}.

Definition 3. (convex, normal and fuzzy number) A fuzzy set A w.r.t. U is called convex, iff for all u₁, u₂ ∈ U, μ_A (αu₁ + (1 - α) u₂) ≥ min(μ_A (u₁) , μ_A (u₂)), where α ∈ [0, 1]. A fuzzy set A w.r.t. U is called normal, if ∃u ∈ U, s.t. μ_A (u) =1. A fuzzy number is a convex and normal fuzzy set.

Membership functions are always named after their shapes, such as rectangular and half rectangular, trapezoidal and half trapezoidal, parabolic, and ridge shape membership functions. The triangular and trapezoidal shapes of membership functions are used most often for representing fuzzy numbers. For ease of narration and without loss of generality, in the following, we use trapezoidal membership function as an example to discuss the proposed method. We adopt the quadruple expression F (A, B, a, b) to represent the trapezoidal membership function shown in Fig. 1, where a, b, A, B ≥ 0, the core of F is [A, B], and the support of F is [A - a, B + b].

Fig.1

The trapezoidal membership function of F (A, B, a, b).

Example 1. The trapezoidal membership function of the fuzzy set “close_to_185” is defined as F (184, 186, 5, 5).

3.2.2 Relative proximity relation

The tolerance indicator Z is an important application of fuzzy relations, and is the foundation of fuzzy-set-based relaxation [3].

Definition 4. (proximity relation) A proximity relation (or a tolerance relation) is a fuzzy relation E on domain U that is reflective and symmetric, i.e., for any x, y ∈ U, μ_E (x, x) =1 and μ_E (x, y) = μ_E (y, x).

The value of μ_E (x, y) is used to measure the degree that x and y are approximately equal. The closer the values of x and y, the closer the membership degree μ_E (x, y) is to 1. In a continuous domain U, a relative proximity relation characterizes to what extent the ration x/y is close to 1.

Definition 5. (relative proximity relation) A relative proximity relation E can be modeled as μ_E (x, y) = μ_Z (x/y), where Z is a fuzzy number “close to 1” and is called the tolerance indicator.

The fuzzy number Z has following properties to ensure the reflectivity and symmetry of E.

Reflexivity: μ_Z (1) =1, which implies that the proximity degree of x to itself is 1.

Symmetry: μ_Z (r) = μ_Z (1/r), which implies that μ_E (x, y) = μ_E (y, x).

μ_Z (r) =0 if r ≤ 0 which assumes that two numbers which are close should have the same sign.

Property (3.2.2) implies the core of Z is [1,1]. Property (3.2.2) implies the support of Z is [1 - ɛ, 1/(1 - ɛ)], where ɛ is a rational within [0,1] and is called the relative tolerance parameter. The quadruple representation of Z is thus Z (1, 1, ɛ, ɛ/(1 - ɛ)). As is suggested in [16], ɛ can be deliberately chosen to ensure supp (Z) fall in the interval $V = [(\sqrt{5} - 1) / 2, (\sqrt{5} + 1) / 2]$ called validity interval, which in turn ensures that the proximity relation E is more restrictive than the “not negligible” relation. This means that if supp (Z) ⊈ V, then Z is not in accordance with the intuitive semantics underlying this notion [6]. To ensure supp (Z) ⊆ V, ɛ should lie in $[0, (3 - \sqrt{5}) / 2]$ .

3.2.3 Fuzzy set relaxation

Fuzzy set relaxation is the basis of the relaxation of fuzzy RDF queries. There have been various ways to achieve this goal, e.g, by using fuzzy modifiers, and relative or absolute proximity relations [5]. In this paper, we adopt a relative-proximity-relation-based relaxation. By applying a relative proximity relation E to a given fuzzy set F on domain U, we obtain a more general fuzzy set F′. The newly generated fuzzy set gathers not only all the elements in the original fuzzy set, but also elements close to a certain element in F, which is formally defined as “for each u ∈ U, there exists a element v ∈ U, such that F (v) ∧ E (u, v) holds”. The semantics of this statement is formulated as $\forall u \in U, μ_{F^{'}} (u) = sup_{v \in U} min (μ_{F} (v), μ_{E} (u, v)) .$

If E is modeled by a fuzzy number Z (“close to 1”), we obtain a fuzzy set F^Z. F^Z can be expressed as F^Z = F ⊕ Z, where ⊕ denotes the addition operation on fuzzy sets [11], i.e., $μ_{F^{Z}} (u) = sup_{v \in U} min (μ_{F} (v), μ_{E} (u, v))$ $= sup_{v \in U} min (μ_{F} (u)$ , μ_Z (u/v)) = μ_F⊕Z (u).

If the quadruple expressions of F and Z are (A, B, a, b) and (1, 1, ɛ, ɛ/(1 - ɛ)) respectively, then F^Z = (A, B, a + A * ɛ, b + B * ɛ/(1 - ɛ)) according to the definition of ⊕.

3.3 f-SPARQL

f-SPARQL [8] is a fuzzy extension of SPARQL. As is shown in Table 1, f-SPARQL extends three elements of SPARQL, i.e., the “Query”, the “TripleBlock”, and the “Constrain”. A “Query” element is extended with the element QueryType, which is defined as #FQ# (to denote flexible queries) and #top-k FQ# with k (to denote top-k flexible queries). The [with weight] is used to assign weight for a triple pattern. The “Constraint” element is extended with an additional “FlexibleExpression” element to form a so-called fuzzy FILTER constraint (or called a fuzzy query condition), which is characterized by one or several fuzzy terms. It should be noted that a fuzzy term here is either a atomic fuzzy term (or called a fuzzy predicate), e.g. “young” and “tall”, a modified fuzzy term, e.g. “very young” or “more or less tall”, which is described by a simple fuzzy term and a fuzzy modifier, or a compound fuzzy term such as “young or very young”, which is represented by simple fuzzy terms or modified fuzzy terms connected by or (union), and (intersection), or not (complement) connectors. The membership function of a modified fuzzy term or a compound fuzzy term is not defined but computed from the membership function of atomic fuzzy term by taking into account the fuzzy modifier or fuzzy connectors [8]. As the calculation of membership degrees is not the main focus of this paper, we do not distinguish these three types of fuzzy terms and call them fuzzy terms as a whole.

Table 1
Syntax of f-SPARQL

Query ::= Prologue (QueryType SelectQuery |

ConstructQuery |

DescribeQuery| AskQuery)

QueryType ::= ‘♯FQ♯’ | ‘♯ top-k FQ♯’ with k

TriplesBlock ::= TriplesSameSubject (‘.’ [with weight]

TriplesBlock?)?

Filter ::= ‘FILTER’ Constraint

Constraint ::= BrackettedExpression | BuiltInCall |

FunctionCall | FlexibleExpression

FlexibleExpression ::= (FuzzyTermExpression)?

FuzzyTermExpression ::= (Var = FuzzyTerm) [with threshold]

An f-SPARQL query Q is called atomic if it contains only one fuzzy term. The basic form of the fuzzy FILTER constraint in an atomic f-SPARQL query is of the form FILTER (?X = FT) [WITHα ], where ?X is a query variable, FT a fuzzy term, which corresponds to a fuzzy number. The optional parameter [WITH α] indicates the threshold of the minimum membership degree, where α∈[0,1]. For the cause of brevity, we represent an atomic f-SPARQL query as Q = 〈F, α〉.

An complex f-SPARQL query Q contains more than one fuzzy FILTER constraints, and is abbreviated as Q = 〈F₁, α₁, w₁〉 ∧ … ∧ 〈F_m, α_m, w_m〉, where F_i is a fuzzy term, α_i is the threshold for F_i, w_i is the weight of F_i which indicate the important degree of i-th fuzzy FILTER constraint, and 1 ≤ i ≤ m.

Example 2. A user query Q “retrieve the top 100 young and close_to_185 full professors that are prolific in publication” can be expressed in f-SPARQL as #top-k FQ# with 100

SELECT ?X ?Age ?Height ?NumPub

WHERE{?X rdf:type ub:FullProfessor.

?X ub:hasAge ?Age. with 0.5

FILTER(?Age=young) with 0.9

?X ub:hasHeight ?Height. with 0.3

FILTER(?Height=close_to_185) with 0.9

?X ub:numberOfPublications ?NumPub. with 0.2

FILTER(?NumPub=prolific) with 0.9}

The query Q is abbreviated as Q = 〈young, 0.9, 0.5〉 ∧ 〈close _ to _ 185, 0.9, 0.3〉 ∧ 〈prolific, 0.9, 0.2〉.

As for the query evaluation, a set of rules is proposed in [8] to translate an f-SPARQL query into a crisp SPARQL query so as to take advantage of existing implementations of SPARQL. More specifically, fuzzy FILTER constraints are translated into interval FILTER conditions by means of α-cut of a fuzzy number.

If the result set of an f-SPARQL query contains too few results (less than a given number k) or is empty if k is not specified, which means that no or only a few entities in the RDF dataset match fuzzy constraints in Q, then Q is called a failing query and needs to be relaxed to obtain more results.

4 Query relaxation of atomic f-SPARQL queries

This section focuses on query relaxation of an atomic f-SPARQL query. Our query relaxation method rests on the relative proximity relation-based fuzzy set relaxation introduced in Section 3.2.3. Firstly, for each atomic f-SPARQL query, we propose a relaxation operation characterized by a relative proximity relation and the notion for determining the parameter ɛ in the tolerance indicator Z. Then, we discuss the termination conditions for the relaxation process and provide the relaxation algorithm for atomic f-SPARQL queries.

4.1 Relaxation operation on fuzzy terms

We introduce for each fuzzy term F (which is actually a fuzzy set) a relaxation operation T^↑ and use T^↑ (F) to denote the relaxation of F using T^↑. The relaxation operation T^↑ relating to F should have the following three characteristics [6] to satisfy the semantics of fuzzy set relaxation.

For any element u in the domain of F, T^↑ will increase the membership degree of u or keep it unchanged, which is formally represented as μ_T^↑(F) (u) ≥ μ_F (u).

T^↑ must broaden the support of F, which is formally represented as supp (F) = {u|μ_F (u) >0} ⊂ supp (T^↑ (F)) = {u|μ_T^↑(F) (u) >0}.

In order to keep the semantic consistency with the initial fuzzy term as far as possible, T^↑ will preserve the core of F, which is formally represented as: core (F) = {u|μ_F (u) =1} = core (T^↑ (F)) = {u|μ_T^↑(F) (u) =1}.

Relative proximity relation E mentioned in Section 3.2.2 is a fuzzy set obtained by parameterizing ɛ in the tolerance indicator Z (1, 1, ɛ, ɛ/(1 - ɛ)). It is trivially true that Z satisfies the characteristics mentioned above. We thus, in this paper, choose Z to characterize the relaxation operation T^↑. After applying the relaxation operation T^↑ on the fuzzy set F, a more general fuzzy set F′ is obtained, i.e., F′ = T^↑ (F) = F^Z = F ⊕ Z.

The relative tolerance parameter ɛ directly determines the efficiency of fuzzy query relaxation and the quality of relaxation results. One intuitive way consists in requesting the user to specify the granularity of the relaxation operation. The granularity is given in terms of the maximal relaxation step ω, which ensures that relaxed queries are not too far, semantically speaking, from the original one. The larger the value of ω, the finer the granularity. When ω is specified by the user, ɛ is calculated as ɛ = ɛ_max/ω. As is shown in 3.2.3, to ensure the support of Z lies in the validity interval V, $ɛ_{\max} = (3 - \sqrt{5}) / 2 \approx 0.634$ .

4.2 Relaxation algorithm for atomic fuzzy queries

The relaxation of an atomic f-SPARQL query Q = 〈F, α〉 is actually relaxation of the fuzzy set F by the relaxation operation T^↑ to F′ = T^↑ (F) = F^Z = F ⊕ Z. Q is then relaxed to a less restrictive query Q′ = 〈F′, α〉 = 〈F ⊕ Z, α〉. The query after n times of relaxation is Qⁿ = 〈F⁽ⁿ⁾, α〉 = 〈T^↑(n) (F) , α〉 = 〈F^(n)Z, α〉 = 〈F ⊕ ⁿZ, α〉. The relaxation is conducted repeatedly until enough results are returned or termination conditions are met. After n times of query relaxation, the relaxed query may be too far from the initial query. In order to prevent the relaxation process from proceeding endless, even the result set remains empty or contains too few results, we need to stop the relaxation operation. Several termination conditions are taken into account to ensure the fuzzy query relaxation algorithmcontrollable.

The relaxation process terminates either the number of results has meet the user request, the maximal relaxation step is reached, or the support of relaxed fuzzy set overlaps with the core of a fuzzy set F_P of non-authorized values.

We introduce a fuzzy set of non-authorized values as a judgment condition to control the relaxation process [6]. For each fuzzy term F, there is a non-authorization fuzzy set F_P of unauthorized values, which is specified by domain experts, for indicating the degree that an element u does not belong to F. If μ_{F
_P} (u) =1, then u is completely not affiliated with F. After introducing F_P, the relaxed fuzzy set F⁽ⁿ⁾ is reformulated as F⁽ⁿ⁾⁺ = F⁽ⁿ⁾ ∩ (F_P) ^c, where (F_P) ^c is the complement of F_P. If ∩ is interpreted as minimum, the membership function of F⁽ⁿ⁾⁺ is μ_{F
⁽ⁿ⁾⁺} (u) = min(μ_{F
⁽ⁿ⁾} (u) , 1 - μ_{F
_P} (u)). The relaxation process is stopped if one of the following conditions is satisfied.

Condition 1: the number of results satisfies the user’s request.

Condition 2: ɛ reaches its maximum ɛ_max.

Condition 3: the support of the relaxed fuzzy set F⁽ⁿ⁾ overlaps with the core of the F_P.

Condition 3 enables a RDF query system to effectively filter out entities that are completely unsatisfied with the fuzzy FILTER constraint in an f-SPARQL query, and to return entities to some extent satisfy F⁽ⁿ⁾, since elements in the core of F_P are completely unsatisfied with the fuzzy FILTER constraint.

Example 3. The non-authorization fuzzy set F_P of F (184, 186, 5, 5) in Example 1 is defined as (0, 160, 0, 4) ∪ (200, 250, 4, 0) and is shown in combination with the obtained relaxed fuzzy set F⁽ⁿ⁾ in Fig. 2.

Fig.2

The membership functions of F, F⁽ⁿ⁾, and F_P.

The relaxation algorithm for atomic fuzzy terms is shown in Algorithm 1. In Line 1, the ɛ is calculated as shown in Section 4.1 and the tolerant indicator Z is obtained. The function Exec (Q) in line 2 executes the query Q over the input RDF dataset D and returns the results. In Line 3, the fuzzy set F is relaxed to F⁽ⁿ⁾ after n times of relaxation. If F = (A, B, a, b), then F⁽ⁿ⁾ = (A, B, a + n * A * ɛ, b + n * B * ɛ/(1 - ɛ)). If (F_P) ^c = (C, D, c, d), then Condition 3 is formally represented as C - c ≥ A - a - n * A * ɛ and D + d ≤ B + b + n * B * ɛ/(1 - ɛ) and the execution condition for the while loop is C - c < A - a - n * A * ɛ or D + d > B + b + n * B * ɛ/(1 - ɛ). It should be noted that, the membership function of F in Fig. 2 contains both left support and right support. If the left support of a fuzzy set overlaps with the core of F_P, while the right support does not overlap the core of F_P, then only relax the right support until the number of the results is met or the right support overlaps with core of F_P. In this case, the tolerance indicator Z is no longer symmetric. The relaxation operation is then only applied on the right support of F with the tolerance indicator Z = (1, 1, 0, ɛ/(1 - ɛ)).

Algorithm 1 The relaxation algorithm for atomic fuzzy queries

Input: Q = {〈F, α〉} with F = (A, B, a, b); a RDF dataset D; The maximal relaxation step ω specified by the user; number of results required k.

Output: Result set Results (Q);

1: n = 0; ɛ = ɛ_max/ω; Z = (1, 1, ɛ, ɛ/(1 - ɛ)); F⁽ⁿ⁾ = F; F⁽ⁿ⁾⁺ = F⁽ⁿ⁾ ∩ (F_P) ^c; Q⁽ⁿ⁾⁺ = 〈F⁽ⁿ⁾⁺, α〉;

2: Results (Q) = Exec (Q⁽ⁿ⁾⁺);

3: while |Results (Q) | < k ∧ (C - c < A - a - n * A * ɛ or D + d > B + b + n * B * ɛ/(1 - ɛ) do

4: n = n + 1;

5: F⁽ⁿ⁾ = T^↑(n) (F) = F^(n)Z = F ⊕ ⁿZ;

6: F⁽ⁿ⁾⁺ = F⁽ⁿ⁾ ∩ (F_P) ^c;

7: Q⁽ⁿ⁾⁺ = 〈F⁽ⁿ⁾⁺, α〉;

8: Results (Q) = Exec (Q⁽ⁿ⁾⁺);

9: end while

10: return Results (Q);

5 Relaxation of complex fuzzy queries

An complex f-SPARQL query contains multiple fuzzy terms, each of which is assigned a weight to reflect its importance. In practice, we focus on the following four main issues in the process of complex f-SPARQL query relaxation.

To determine the execution order of multiple relaxed queries derived from the initial complex query.

To ensure that the relaxation degree of each fuzzy term is basically the same.

To ensure that every execution of a relaxed query has effect on the final results.

To ensure each relaxation operation is controllable to preserve the semantics of query results.

An f-SPARQL complex query is of the form Q = 〈F₁, α₁, w₁〉 ∧ … ∧ 〈F_m, α_m, w_m〉. In the following, when the threshold and the weight of a fuzzy term is not relevant to the illustration of proposed method, they are sometimes not included and the abbreviated form of Q = F₁ ∧ … ∧ F_m is adopted.

5.1 Determination of execution order of multiple relaxed queries

In this paper, we adopt the local relaxation method in the process of the relaxation of complex queries. Local relaxation is more fine-grained than the global relaxation in that the latter one relaxes all the terms in one relaxation step [4].

In order to determine the execution order of relaxation operations on different fuzzy terms in a relaxed query, we use a lattice structure to represent the derivation relationship of all relaxed queries. The root of a lattice structure is the initial failing query, and the other nodes are relaxed queries derived from the initial query through one or several relaxation operations. The layer of the lattice structure in which a relaxed query lies indicates the times for which the relaxation operation is applied.

Example 4. Given the query Q in Example 2, the relaxed query lattice structure of Q is shown in Fig. 3. For the reason of space, only two layers of relaxed queries are shown.

Fig.3

The lattice of relaxed queries.

Fig.4

The lattice of relaxed queries with w₁ > w₂ > w₃.

The execution order of relaxed queries is roughly determined by their priorities [4]. The priority of a relaxed query is in accordance with the times of relaxation operations applied on it. The priorities of two relaxed queries Q′ and Q″ satisfy Proposition 1.

Proposition 1. If $Σ_{i = 1}^{k} count (T_{i}^{↑}$ in $Q^{'}) < Σ_{i = 1}^{k} count (T_{i}^{↑}$ in , then .

$Σ_{i = 1}^{k} count (T_{i}^{↑} in Q^{'})$ denotes the times of relaxation operations transforming the initial query Q to the relaxed query Q′. Obviously, the relaxed queries obtained by fewer relaxation operations are situated in the higher layer of the lattice structure and are given higher priority. We adopt the breadth-first strategy to execute relaxed queries in each layer from top to bottom, which makes the relaxation degree of all fuzzy conditions are basically the same.

Proposition 2 defines the execution order of relaxed queries in the same layer of the query relaxation lattice.

Proposition 2. Q′ and Q″ are situated in the same layer of the query relaxation lattice. If $Σ_{i = 1}^{k} (w_{i} * count (T_{i}^{↑}$ about F_i in Q′))< $Σ_{i = 1}^{k} (w_{i} * count (T_{i}^{↑}$ about F_i in , then .

In Proposition 2, w_i is the weight of the i-th fuzzy condition. The higher the weight w_i of a fuzzy term, the more attention the user pays to the result of F_i and the more accurate information the user wants to obtain on the attribute related to F_i, thus the later it is executed to ensure the result close to the initial query intent. Assume Q′ and Q″ are in the same layer of the query relaxation lattice, then they are derived from the initial query through same times of relaxation. We calculate the weighted means of the relaxation times of each fuzzy condition to determine the priority. As suggested by Proposition 2, the higher the mean value, the lower the execution priority.

Example 5. As is shown in Example 2, w₁ > w₂ > w₃, the query relaxation lattice in Fig. 3 is then transformed into the lattice shown in Fig. 4.

5.2 Minimal failing subqueries

Minimal Failing Subqueries (MFSs) give the reason why the initial query fails [4, 12]. We apply MFSs to our fuzzy query relaxation algorithm for deciding the necessity of the execution of a relaxed query, so as to reduce the times of execution of relaxed queries.

Definition 6. (MFS) Let Q = F₁ ∧ … ∧ F_m be a failing fuzzy query, if Q′ is a failing subquery of Q and all the subqueries of Q′ do not fail, then Q′ is a MFS of Q. A MFS of Q is represented as Q′ = mfs (Q) = F_i ∧ … ∧ F_j, where {i, …, j} ⊆ {1, …, m}.

In the following, we discuss the method of how to calculate all the MFSs of a failing fuzzy query Q over a RDF dataset. A failing fuzzy query Q = F₁ ∧ … ∧ F_m is represented as a set S = {F₁, …, F_m}, where F_i (1 ≤ i ≤ m) is a fuzzy term. Generally, a failing query has one or several MFSs. The algorithm for finding MFSs is based on the lattice structure of S, which is composed of the power set of S, denoted as 2^S, with S the top node and ∅ the bottom node in the lattice. The lattice structure of a query with four fuzzy terms is shown in Fig. 5.

Fig.5

The lattice structure for finding MELs.

Finding MFSs of a failing query can be mapped to the MEL (Minimal Element in the Lattice) problem [4]. A test function T is defined on 2^S, which is monotonic w.r.t set subsumption, i.e., if T (A) = True for a node A in the lattice of S, then for each ancestor B of A, T (B) = True. We call a minimal element in the lattice w.r.t T an MEL. If A is a MEL of S, then T (C) = False for any successor C of A. The unary relation T in this scenario is defined as testing whether a subquery in the lattice fails. We adopt depth-first and top-down traversal of the lattice of S to find a MEL that causes the failure of the initial query. For example, the test path that finds the MEL F₄ is shown in Fig. 5 with solid lines. An underline suggests a termination of the search procession where the test of the underlined set returns False. A set is boxed if the test return True and findAllMels () is called recursively. The algorithm for finding all MELs of S is shown in Algorithm 2.

Algorithm 2 Finding all MELs findAllMels (S)

Input: S: the set representation of a failing complex f-SAPRQL query Q

Output: AllMels: the set of all MELs

1: for all F_i in S do

2: isMel : = True;

3: if (T (S - {F_i}) = True) then

4: findAllMels (S - {F_i});

5: end if

6: isMel : = False;

7: end for

8: if isMel = True then

9: AllMels = AllMels ∪ S;

10: end if

The set of MELs of the initial query is obtained by Algorithm 2, then the set of MFSs of a failing query Q is formally defined as mfs (Q) = {M|M ⊆ Q is a MEL of Q}.

5.3 Controllable relaxation algorithm based on MFSs

The relaxation lattice structure after several times of relaxation operations contains a large search space and results in exponential time complexity. In order to improve the efficiency of query relaxation, we use MFSs of the initial query to determine the necessity of execution of a relaxed query in the relaxation lattice. The MFSs of a relaxed query in the query relaxation lattice can be speculated by the MFSs of the initial query without execution of MFS finding algorithm [6].

Given a failing fuzzy query Q = F₁ ∧ … ∧ F_m, we use T^↑ (Q) to denote the relaxed query of Q, and SQ^[j] to denote a subquery of Q, which is obtained by deleting the j-th fuzzy term of Q. The mfs (Q) of the initial query Q is obtained in Section 5.2 and mfs (T^↑ (Q)) is constructed by using elements of mfs (Q). Given mfs (Q) = {F_{i
₁} ∧ … ∧ F_{j
₁}, …, F_{i
_h} ∧ … ∧ F_{j
_h}}, the construction rules of mfs (T^↑ (Q)) is based on Propositions 3 and 4, which are proved in [3].

Proposition 3. If $T^{↑} (Q) = {SQ}^{[j]} \land T_{j}^{↑} (F_{j})$ , j ∈ {i_r, …, j_r}, 1 ≤ r ≤ h, then replacing F_j in all elements of mfs (Q) with $T_{j}^{↑} (F_{j})$ leads to mfs (T^↑ (Q)).

Proposition 4. If $T^{↑} (Q) = {SQ}^{[j]} \land T_{j}^{↑} (F_{j})$ , j ∉ {i_r, …, j_r}, for each 1 ≤ r ≤ h, then mfs (T^↑ (Q)) = mfs (Q).

Example 6. Given a fuzzy query Q = F₁ ∧ F₂ ∧ F₃ ∧ F₄ and mfs (Q) = {F₁ ∧ F₃, F₁ ∧ F₄} is determined by Algorithm 2, if the relaxed query on F₁ is $T^{↑} (Q) = T_{1}^{↑} (F_{1}) \land {SQ}^{[1]} = T_{1}^{↑} (F_{1}) \land F_{2} \land F_{3} \land F_{4}$ , then $mfs (T^{↑} (Q)) = {T_{1}^{↑} (F_{1}) \land F_{3}, T_{1}^{↑} (F_{1}) \land F_{4}}$ . If $T^{↑} (Q) = T_{2}^{↑} (F_{2}) \land {SQ}^{[2]} = F_{1} \land T_{2}^{↑} (F_{2}) \land F_{3} \land F_{4}$ is the relaxed query on F₂, then mfs (T^↑ (Q)) = mfs (Q) = {F₁ ∧ F₃, F₁ ∧ F₄}.

Table 2
Query instances

Queries

Q ₁ #top-k FQ# with 100

SELECT ?X ?Height

WHERE {?X rdf:type ub:UndergraduateStudent.

?X ub:hasHeight ?Height.

FILTER(?Height=close_to_185) with 0.9}

Q ₂ #top-k FQ# with 100

SELECT ?X ?NumPub

WHERE{?X rdf:type ub:GraduateStudent.

?X ub:hasHeight ?Height. with 0.2

FILTER(?Height=close_to_185) with 0.9

?X ub:numberOfPublications ?NumPub. with 0.8

FILTER(?NumPub=prolific) with 0.9}

Q ₃ #top-k FQ# with 100

SELECT ?X ?Age ?Height ?NumPub

WHERE{?X rdf:type ub:Lecturer

?X ub:hasAge ?Age. with 0.2

FILTER(?Age=young) with 0.9

?X ub:hasHeight ?Height. with 0.3

FILTER(?Height=close_to_185) with 0.9

?X ub:numberOfPublications ?NumPub. with 0.5

FILTER(?NumPub=prolific) with 0.9}

Q ₄ #top-k FQ# with 100

SELECT ?X ?Age ?Height ?Salary ?NumPub

WHERE{?X rdf:type ub:FullProfessor

?X ub:hasAge ?Age. with 0.1

FILTER(?Age=young) with 0.9

?X ub:hasHeight ?Height. with 0.2

FILTER(?Height=close_to_185) with 0.9

?X ub:hasSalary ?Salary. with 0.3

FILTER(?Salary=well-paid) with 0.9

?X ub:numberOfPublications ?NumPub. with 0.4

FILTER(?NumPub=prolific) with 0.9}

Q ₅ #top-k FQ# with 100

SELECT ?X ?Age ?Height ?NumPub

WHERE{?X rdf:type ub:FullProfessor.

?X ub:hasAge ?Age. with 0.5

FILTER(?Age=young) with 0.9

?X ub:hasHeight ?Height. with 0.3

FILTER(?Height=close_to_185) with 0.9

?X ub:numberOfPublications ?NumPub. with 0.2

FILTER(?NumPub=prolific) with 0.9}

	Queries
Q ₁	#top-k FQ# with 100
	SELECT ?X ?Height
	WHERE {?X rdf:type ub:UndergraduateStudent.
	?X ub:hasHeight ?Height.
	FILTER(?Height=close_to_185) with 0.9}
Q ₂	#top-k FQ# with 100
	SELECT ?X ?NumPub
	WHERE{?X rdf:type ub:GraduateStudent.
	?X ub:hasHeight ?Height. with 0.2
	FILTER(?Height=close_to_185) with 0.9
	?X ub:numberOfPublications ?NumPub. with 0.8
	FILTER(?NumPub=prolific) with 0.9}
Q ₃	#top-k FQ# with 100
	SELECT ?X ?Age ?Height ?NumPub
	WHERE{?X rdf:type ub:Lecturer
	?X ub:hasAge ?Age. with 0.2
	FILTER(?Age=young) with 0.9
	?X ub:hasHeight ?Height. with 0.3
	FILTER(?Height=close_to_185) with 0.9
	?X ub:numberOfPublications ?NumPub. with 0.5
	FILTER(?NumPub=prolific) with 0.9}
Q ₄	#top-k FQ# with 100
	SELECT ?X ?Age ?Height ?Salary ?NumPub
	WHERE{?X rdf:type ub:FullProfessor
	?X ub:hasAge ?Age. with 0.1
	FILTER(?Age=young) with 0.9
	?X ub:hasHeight ?Height. with 0.2
	FILTER(?Height=close_to_185) with 0.9
	?X ub:hasSalary ?Salary. with 0.3
	FILTER(?Salary=well-paid) with 0.9
	?X ub:numberOfPublications ?NumPub. with 0.4
	FILTER(?NumPub=prolific) with 0.9}
Q ₅	#top-k FQ# with 100
	SELECT ?X ?Age ?Height ?NumPub
	WHERE{?X rdf:type ub:FullProfessor.
	?X ub:hasAge ?Age. with 0.5
	FILTER(?Age=young) with 0.9
	?X ub:hasHeight ?Height. with 0.3
	FILTER(?Height=close_to_185) with 0.9
	?X ub:numberOfPublications ?NumPub. with 0.2
	FILTER(?NumPub=prolific) with 0.9}

The construction of the relaxation lattice of the initial query makes the relaxed queries being executed in a certain order, and the introduction of MFSs makes it possible to avoid some nodes in the relaxation lattice from being executed. In fact, if a node contains any element in the MFS set of its parent node, then the relaxed query in this node is a failing query and should not be executed.

Algorithm 3 Relaxation Algorithm for Complexf-SPARQL Queries

Input: A complex failing f-SPARQL query Q = F₁ ∧ … ∧ F_m with mfs (Q) = {F_{i
₁} ∧ … ∧ F_{j
₁}, …, F_{i
_h} ∧ … ∧ F_{j
_h}}; Number of intended results k;

Output: The result set Results (Q);

1: i : =0; Layer (0) : = Q; failing : = True; Results (Q) : = Exec (Q);

2: if |Results (Q) | > = k then

3: failing : = False;

4: end if

5: while (failing) do

6: i : = i + 1; $Layer (i) : = {Q_{i}^{1}, \dots, Q_{i}^{n_{i}}}$ ;

7: for all $Q_{i}^{j}$ in Level (i) do

8: for all fQ in $father (Q_{i}^{j})$ do

9: if $mfs (Q_{i}^{j})! = mfs (fQ)$ then

10: $Results (Q) = Results (Q) \cup Exec (Q_{i}^{j})$ ;

11: if |Results (Q) | > = k then

12: failing = False; break;

13: end if

14: end if

15: if failing = False then

16: break;

17: end if

18: end for

19: end for

20: end while

21: return Results (Q);

We obtain an efficient and controllable relaxation algorithm (see Algorithm 3), by considering the MFSs and the fuzzy set of non-authorized values.

The function Exec (Q) evaluates Q against a RDF dataset and returns results. Layer (i) is the set of all the relaxed queries in the i-th layer of the lattice structure. Elements in Layer (i) are ordered by considering relaxed query execution priorities introduced in Section 5.1. Each fuzzy term in a query is not relaxed further if its support overlaps with the core of the corresponding fuzzy set F_P of non-authorized values (see Section 4.2). The father (Q) represents the set of parent nodes of Q. If the set of MFSs of Q is equivalent to that of any of its parent node, which means the relaxation operation is applied to the fuzzy term that is not the cause of failure, then it should not be executed.

6 Experimental evaluation

Experiment setup. We use Java to implement the proposed algorithm, Jena TDB for storing RDF datasets, and Jena ARQ API for processing crisp SPARQL queries translated from f-SPARQL queries. The system is running on WIN 7 SP1 64-bit operating system with i7 3.40 GHz CPU and 8G RAM.

Datasets. We evaluate our relaxation algorithm using an extension of Lehigh University Benchmark (LUBM) [14] to measure its performance. The dataset contains two parts. The first part is generated by LUBM that contains about 8M distinct triples based on a university domain ontology. The second part is generated by adding a set of triples concerning some DatatypeProperty, e.g., hasHeight, hasAge, hasSalary and numberOfPublications.

The values of numberOfPublications property are calculated by using aggregate function COUNT, while the values of properties hasHeight, hasAge and hasSalary are set randomly within a reasonable range. For example, for the ub:hasAge property of an UndergraduateStudent, we request the value of it within a reasonable range of [15, 25].

Query instances. We compose five f-SPARQL queries (see Table 2), where Q₁ is an atomic query, and Q₂–Q₅ are complex queries. There are four fuzzy terms in the query instances, e.g., “close_to_185”, “young”,“well-paid”, and “prolific”, which are related to the properties of hasHeight, hasAge, hasSalary, and numberOfPublications, respectively. The membership functions and the complements (F_P) ^c of the corresponding non-authorized fuzzy sets F_P are listed in Table 3.

As mentioned in many fuzzy systems, the rigorous specification of membership functions can be a difficult task. Often the parameterization of a membership function is summarily described as being chosen by users or being assigned by experts according to the previous experience. Also, in our work, the parameters of fuzzy sets are assigned based on the previous existing experience, e.g., the parameter “25” is often considered as the critical point of the membership function “young” in Table 3 in many fuzzy systems.

Evaluation. We assess our fuzzy query relaxation (FQR) method by means of Recall and Precision for flexible RDF queries which are defined in [22]. The experimental results of Recall and Precision are summarized in Table 4. The proposed method has a relative high recall as more relevant results are returned after relaxation. $\begin{matrix} Recall & = & \frac{relevanttriplesinqueryresults}{relevant triples in the dataset} \\ Precision & = & \frac{top - k triples \cap relevant triples}{k} \end{matrix}$

Table 3
The membership functions of four fuzzy terms

fuzzy terms membership functions (F_P) ^c

close_to_185 (184, 186, 5, 5) (164,196,4,4)

young (0, 25, 0, 25) (0,50,0,5)

well-paid (8000,10000,4000,0) (3000,10000,1000,0)

prolific (15, 20, 5, 0) (5,20,2,0)

fuzzy terms	membership functions	(F_P) ^c
close_to_185	(184, 186, 5, 5)	(164,196,4,4)
young	(0, 25, 0, 25)	(0,50,0,5)
well-paid	(8000,10000,4000,0)	(3000,10000,1000,0)
prolific	(15, 20, 5, 0)	(5,20,2,0)

Table 4

Recall and precision

Query	Recall	Precision
Q ₁	0.97	0.86
Q ₂	0.89	0.82
Q ₃	0.88	0.90
Q ₄	0.77	0.83
Q ₅	0.82	0.86
Average	0.87	0.85

We make a comparison of the response time with the method for processing f-SPARQL queries (abbreviated as FQ) [8] as no similar implementations are concerned with this problem. The experimental results are shown in Fig. 6.

Fig.6

The response time of FQR and FQ.

The response time of FQR is a little longer than that of FQ, as the query relaxation algorithm contains additional procedures, including (i) calculating relaxed queries, (ii) constructing the relaxation lattice structure, and (iii) finding MFSs. The evaluation of Q₁ involves only procedure (i), while Q₂–Q₅ involve all of them. Procedure (i) and (ii) are in O (n³) time complexity with regard to the number of fuzzy atoms, while Procedure (iii) is in exponential time complexity.

Example 7. We set the maximal relaxation step ω to 4, and assume w.l.o.g. that we obtain enough results after the lattice of relaxed queries in Fig. 4 is constructed. The response time of Q₅ is thus consists of

The execution time of Q₅, including the time of translating an f-SPARQL query into a SPARQL query and the execution time of the latter.

The time of sorting relaxed queries in the same layer by calculating priorities shown in Propositions 1 and 2, which is trivial.

The time of calculating each relaxed query in the lattice.

The execution time of each relaxed queries.

The execution time of the Algorithm 2 for finding MFSs, which executes only once.

7 Conclusion and prospect

In this paper, we investigate query relaxation of f-SPARQL queries over RDF data, when the query results are too few or empty.

Firstly, for each atomic fuzzy query, we provide a relaxation operation for the fuzzy term to relax it to a more tolerant fuzzy set. A fuzzy set of non-authorized values is introduced as a termination condition to make the relaxation algorithm controllable.

Secondly, on the basis of atomic fuzzy query relaxation strategy and under the local relaxation, we propose the relaxation algorithm for complex f-SPARQL queries, by considering weights for fuzzy terms, MFSs, and the fuzzy set of non-authorized values.

Finally, we implement the proposed algorithms to evaluate the proposed fuzzy query relaxation method. We show the effectiveness of our proposed method on an extension of LUBM dataset. The experimental results show a relatively high recall as more relevant results are returned after relaxation. We analyze the contribution of different steps to the overall running time of the query relaxation process.

Further research concerns automatically determining the weight of a fuzzy term in a complex f-SPARQL query. In this paper, by requiring the user to specify weights for various fuzzy terms, we assume that the user is knowledgeable enough to ascertain these to a fine degree. It is not a pragmatic approach when there are quite a few fuzzy terms related to different numeric attributes. A technique is required by which the weight of a fuzzy term is automatically specified by considering the user preferences and/or distribution of the RDF dataset queried. Another interesting topic for research is the parameterization of membership functions and its influence on the query relaxation process.

Footnotes

Acknowledgments

The work is supported by the National Natural Science Foundation of China (61672139) and Fundamental Research Funds for the Central Universities (N140404005, N140404010, N151704001).

References

Andreasen

and Pivert

, On the weakening of fuzzy relational queries, In Proceedings of 8th International Symposium on Methodologies for Intelligent Systems (ISMIS ’94), 1994, pp. 144–153.

Bosc

, HadjAli

and Pivert

, Fuzzy closeness relation as a basis for weakening fuzzy relational queries, In Proceedings of 6th International Conference on Flexible Query Answering Systems (FQAS 2004), Springer, 2004, pp. 41–53.

Bosc

, Hadjali

and Pivert

, Towards a tolerance-based technique for cooperative answering of fuzzy queries against regular databases, Proceedings of OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (OTM 2005), Springer, 2005, pp. 256–273.

Bosc

, HadjAli

and Pivert

, Relaxation paradigm in a flexible querying context, In Proceedings of 7th International Conference on Flexible Query Answering Systems (FQAS 2006), 2006, pp. 39–50.

Bosc

, Hadjali

and Pivert

, Empty versus overabundant answers to flexible relational queries, Fuzzy Sets and Systems 159(12) (2008), 1450–1467.

Bosc

, HadjAli

and Pivert

, Weakening of fuzzy relational queries: An absolute proximity relation-based approach, Mathware & Soft Computing 14(1) (2008), 35–55.

Calì

, Frosini

, Poulovassilis

and Wood

P.T.

, Flexible querying for SPARQL, In Proceedings of OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (OTM 2014), 2014, pp. 473–490.

Cheng

, Ma

Z.M.

and Yan

, f-sparql: A flexible extension of sparql, In 21st International Conference Database and Expert Systems Applications (DEXA 2010), 2010, pp. 487–494.

de Calmès

, Dubois

, Hüllermeier

, Prade

and Sèdes

, Flexibility and fuzzy case-based evaluation in querying: An illustration in an experimental setting, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 11(1) (2003), 43–66.

10.

Dolog

, Stuckenschmidt

, Wache

and Diederich

, Relaxing RDF queries based on user and domain preferences, J Intell Inf Syst 33(3) (2009), 239–260.

11.

Dubois

and Prade

, Possibility theory. JohnWiley & Sons Inc, 1988.

12.

Fokou

, Jean

, Hadjali

and Baron

, Cooperative techniques for SPARQL query relaxation in RDF databases, In Proceedings of 12th European Semantic Web Conference (ESWC 2015), 2015, pp. 237–252.

13.

Godfrey

, Minimization in cooperative response to failing database queries, Int J Cooperative Inf Syst 6(2) (1997), 95–149.

14.

Guo

, Pan

and Heflin

, Lubm: A benchmark for owl knowledge base systems, Web Semantics: Science, Services and Agents on the World Wide Web 3(2-3) (2005), 158–182.

15.

Hachani

, Hassine

M.A.B.

, Chettaoui

and Ounelli

, Cooperative answering of fuzzy queries, J Comput Sci Technol 24(4) (2009), 675–686.

16.

HadjAli

, Dubois

and Prade

, Qualitative reasoning based on fuzzy relative orders of magnitude, IEEE Trans Fuzzy Systems 11(1) (2003), 9–23.

17.

Hogan

, Mellotte

, Powell

and Stampouli

, Towards fuzzy query-relaxation for RDF, In Proceedings of 9th European Semantic Web Conference (ESWC 2012), 2012, pp. 687–702.

18.

Huang

, Liu

and Zhou

, Computing relaxed answers on RDF databases, In Proceedings of 9th International Conference on Web Information Systems Engineering (WISE 2008), 2008, pp. 163–175.

19.

Huang

, Liu

and Zhou

, Approximating query answering on RDF databases, World Wide Web 15(1) (2012), 89–114.

20.

Hurtado

C.A.

, Poulovassilis

and Wood

P.T.

, A relaxed approach to RDF querying, In Proceedings of 5th International Semantic Web Conference (ISWC 2006), 2006, pp. 314–328.

21.

Hurtado

C.A.

, Poulovassilis

and Wood

P.T.

, Query relaxation in RDF, J Data Semantics 10 (2008), 31–61.

22.

, Jia

, Cheng

and Angryk

R.A.

, Sparql queries on rdf with fuzzy constraints and preferences, Journal of Intelligent & Fuzzy Systems 1–13 (Preprint).

23.

Manola

, Miller

, McBride

, et al., Rdf primer, W3C Recommendation 10(1-107) (2004), 6.

24.

Poulovassilis

and Wood

P.T.

, Combining approximation and relaxation in semantic web path queries, In Proceedings of 9th International Semantic Web Conference (ISWC 2010), 2010, pp. 631–646.

25.

Prud’Hommeaux and A. Seaborne, Sparql query language for rdf, W3C Recommendation 15 (2008).

26.

Straccia

, Fuzzy sparql: A query language for fuzzy rdfs graphs, In Foundations of Fuzzy Logic and Semantic Web Languages, Chapman and Hall/CRC, 2014, pp. 305–309.

27.

Voglozin

W.A.

, Raschia

, Ughetto

and Mouaddib

, Querying the saintetiq summaries – dealing with null answers, In Proceedings of IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2005), 2005, pp. 585–590.

28.

Wang

, Ma

and Cheng

, fp-sparql: An rdf fuzzy retrieval mechanism supporting user preference, In Proceedings of 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012), IEEE, 2012, pp. 443–447.

29.

Zadeh

L.A.

, Fuzzy sets, Information and Control 8(3) (1965), 338–353.

Query	::= Prologue (QueryType SelectQuery \|
	ConstructQuery \|
	DescribeQuery\| AskQuery)
QueryType	::= ‘♯FQ♯’ \| ‘♯ top-k FQ♯’ with k
TriplesBlock	::= TriplesSameSubject (‘.’ [with weight]
	TriplesBlock?)?
Filter	::= ‘FILTER’ Constraint
Constraint	::= BrackettedExpression \| BuiltInCall \|
	FunctionCall \| FlexibleExpression
FlexibleExpression	::= (FuzzyTermExpression)?
FuzzyTermExpression	::= (Var = FuzzyTerm) [with threshold]

Relaxing of flexible RDF queries: A relative proximity relation-based approach

Abstract

Keywords

1 Introduction

2 Related works

3 Preliminaries

3.1 RDF and SPARQL

3.2 Fuzzy set theory and principle of relaxation

3.2.1 Fuzzy set theory

3.2.3 Fuzzy set relaxation

3.3 f-SPARQL

4.1 Relaxation operation on fuzzy terms

4.2 Relaxation algorithm for atomic fuzzy queries

5.1 Determination of execution order of multiple relaxed queries

Table 3 The membership functions of four fuzzy terms fuzzy terms membership functions (F P ) c close_to_185 (184, 186, 5, 5) (164,196,4,4) young (0, 25, 0, 25) (0,50,0,5) well-paid (8000,10000,4000,0) (3000,10000,1000,0) prolific (15, 20, 5, 0) (5,20,2,0)

Footnotes

Acknowledgments

References

Table 3
The membership functions of four fuzzy terms

fuzzy terms membership functions (F_P) ^c

close_to_185 (184, 186, 5, 5) (164,196,4,4)

young (0, 25, 0, 25) (0,50,0,5)

well-paid (8000,10000,4000,0) (3000,10000,1000,0)

prolific (15, 20, 5, 0) (5,20,2,0)