Abstract
In recent years there have been a great deal of interests in effectively extracting information from RDF repositories. However, users often experience empty answer problems (EAP) when they are not familiar with the structure or content of a RDF dataset and issue a restrictive query to the system. A solution is to allow users to incorporate fuzzy conditions into the query to make it more flexible. Although a fuzzy query is more or less satisfactory, there are still occasions when no or too few entities in a RDF dataset satisfy the query criteria with a degree more than zero, which is called a fuzzy-EAP problem. In this paper, we propose a query relaxation approach based on relative proximity relations to tackle this problem. We present a framework of fuzzy query relaxation over RDF datasets, and present query relaxation algorithms for atomic f-SPARQL queries and complex f-SPARQL queries. Moreover, this paper reports on a prototype implementation on an extension of LUBM dataset to verify the feasibility and effectiveness of the proposed method. The experimental results show a relatively high recall as more relevant results are returned after relaxation.
Keywords
Introduction
With more and more domain knowledge being encoded in RDF (Resource Description Framework) data formats [23], retrieving data and information from RDF datasets has become an important research topic. SPARQL (Simple Protocol and RDF Query Language) is the W3C proposed standard RDF query language, which provides users with the capability to extract information from graph-based data [25].
However, ordinary users may not fully understand the structure and content of a RDF dataset on account of its growing scale and complexity. In such cases, even if users are accurately aware of their own query intentions, there are still chances when the query result is empty or too few in amount, which is called an Empty Answer Problem (EAP). In order to avoid users from adopting inefficient “trial and error” method to get more results, there have been query systems [19, 20] using RDFS ontology entailment to automatically relax an initial SPARQL query to obtain more results.
Another solution for EAP is to allow users to issue queries with flexible/fuzzy conditions. The idea of introducing flexibility into queries is gaining more and more attention in the Semantic Web community. In particular, there have been researches on applying fuzzy set theory [29] to RDF query languages to increase the ability of expressing fuzzy query conditions and thus obtain so called flexible queries for RDF, e.g., a flexible extension of SPARQL called f-SPARQL [8] which allows to construct fuzzy FILTER constraints by means of fuzzy terms, fp-SPARQL [28] and SPARQLf-p [22] which further extends f-SPARQL with preferred expressions, and Fuzzy SPARQL [26] which addresses queries to fuzzy RDFS graphic data. Although a fuzzy query is more or less satisfactory, there are still occasions when there is no or too few entities in a RDF dataset that satisfy the query criteria with a degree more than zero, which is called a fuzzy-EAP problem. Inspired partly by studies from relational database research [2, 6], we endeavor to achieve automatic relaxation of a failing fuzzy RDF query to obtain more results. This paper makes the following major contributions: To the best of our knowledge, it presents the first query relaxation framework for flexible queries over RDF datasets. The framework consists of relaxation algorithms for atomic fuzzy queries and complex fuzzy queries. The relaxation lattice and the weights of fuzzy conditions is introduced to determine the execution order of relaxed queries, and the MFSs (Minimal Failing Subqueries) is introduced to avoid unnecessary execution of some relaxed queries. It presents a prototype implementation on an extension of LUBM dataset to verify the feasibility and effectiveness of the proposed fuzzy query relaxation method. The experimental results show a relatively high recall as more relevant results are returned after relaxation.
The remainder of this paper is structured as follows. Section 2 presents related works. Section 3 briefly review some necessary background knowledge of RDF, SPARQL and its flexible extension f-SPARQL, and the relative proximity relationship and the relaxation principle of fuzzy sets. Sections 4 and 5 provide the relaxation algorithms for atomic and complex f-SPARQL queries respectively. Section 6 provides an empirical evaluation of the effectiveness of proposed method. We wrap the paper with a brief conclusion and some directions of future works.
Related works
When users are unclear of the content and structure of RDF datasets, they frequently encounter the problem of too few or even no results when submitting a very specific query, which is in this scenario called a failing query. A common and intuitive solution for this problem is query relaxation, which means trying repeatedly a more general query until expected number of results is obtained.
In the area of Semantic Web, relaxation of RDF queries, especially SPARQL queries, has been extensively studied. Hurtado et al. [20, 21] put forward a query relaxation method based on RDFS entailment, which automatically relaxes a failing query to generate more general queries and returns approximate query results relevant to the original query. Poulovassilis et al. [24] extend the query relaxation method in [20] to support regular path queries, and show its combination with query approximation. Calì et al. [7] extend the work in [24] to propose an extension of SPARQL 1.1 with approximation and relaxation operators which is applicable to regular expressions for querying property paths. Building on the work of [20], Huang et al. [18, 19] propose a ranking strategy which computes the similarity degrees between relaxed queries and the original query and use them to score the potential relevant answers. Dolog et al. [10] propose query rewriting rules by including domain knowledge (not encoded in RDFS) and user’s preference to handle the problem of excessive restrictions on query conditions in RDF queries.
The abovementioned query relaxation methods based on RDFS semantics [18, 24] or domain knowledge [10] are for categorical attributes in RDF datasets and are not applicable to numeric attributes. In addition, these solutions for crisp SPARQL queries cannot be applied to the relaxation of fuzzy queries directly.
In the area of database, the relaxation of fuzzy queries is well studied in the last nineteenth. The seminal work comes from Andreasen and Pivert [1]. The approach is based on relaxation of one or several fuzzy predicates using a fuzzy linguistic modifier which extends the support of the original predicate to obtain more answers. Fuzzy selectivity is used to guide the execution order of relaxed queries by finding the query as close as possible to the original query. In a series of papers, Bosc et al. propose two relaxation paradigms for fuzzy relational queries based on relative proximity relation [2, 3] and absolute proximity relation [6], respectively. The methods show how a tolerance relation modeled by an appropriate parameterized proximity relation can transform a given fuzzy term into an enlarged one by widening its support. Voglozin et al. [27] propose a method to repair failing queries addressed to data summaries. The method is based on a specified distance and generates alternative queries by modifying fuzzy labels in a query. This method does not provide any termination condition for relaxation process. De Calmès et al. [9] presents an information system, called PRETI, which is applied to a house renting database. The system provides a flexible querying module which use a case-based method to avoid empty answer. Hachani et al. [15] present a FCA (Formal Concept Analysis) based cooperative approach to handling empty answers. The approach provides users with the nearest answers to the original query and gives the cause of a failing query.
Much less attention has been paid to the relaxation of fuzzy queries over crisp RDF datasets. Very few works are concerned with this problem. Hogan et al. [17] introduce a real-life use case that require approximate answers for fuzzy queries over crisp RDF data. They use query relaxation to match incomplete description of entities (i.e., fuzzy queries) to structured descriptions encoded in RDF (i.e., the data). Fuzzy constraints involved in [17] are categorical attribute values prefixed by modifiers (e.g., “slightly”, “looks like”, and “may be”) to form the fuzzy semantics. In this paper, we consider fuzzy constraints in a more general sense, i.e., fuzzy terms (e.g., “high”, “short”, and “young”) that correspond to fuzzy sets.
Preliminaries
In this section, we give a brief introduction of RDF data model and the RDF standard query language SPARQL, fuzzy set theory and the relaxation principle of fuzzy sets, and finally a flexible extension of SPARQL called f-SPARQL.
RDF and SPARQL
RDF is designed as a metadata data model for describing of Web resources. Its basic building block is an entity-attribute-value triple, called a statement. More formally, let I, B, and L be pairwise disjointing infinite sets of IRIs (Internationalized URIs), blank nodes and RDF literals, respectively, a triple (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L) is called an RDF triple, where s is called the subject, p the predicate (or property), and o the object. The RDF triple implies “a subject s has a predicate (or property) p with value o”.
SPARQL is the W3C standard query language and data access protocol, which is specially developed for RDF. A SPARQL query is actually a graph pattern to be matched in a RDF dataset. A graph pattern is constituted of triple patterns and their conjunctions and/or disjunctions. A triple pattern is a RDF triple that may contain variables. Among others, SPARQL allows for select queries, formed in a SELECT-FROM-WHERE manner. The result form represents the set of variables appearing in the SELECT, the dataset forms the FROM part, constituted by a set of IRIs of RDF datasets, while the graph pattern forms the WHERE part. The detailed syntax of SPARQL can be found in [25].
Fuzzy set theory and principle of relaxation
We introduce some basic concept of fuzzy set theory, a special kind of fuzzy sets called relative proximity relations, and the principle of fuzzy set relaxation.
Fuzzy set theory
In order to quantitatively describe fuzzy concepts and fuzzy phenomena, L.A. Zadeh [29] proposed the concept of fuzzy set to expand the membership relationship in the classic set. The membership degree of an elements to a set is described by a membership function.
Membership functions are always named after their shapes, such as rectangular and half rectangular, trapezoidal and half trapezoidal, parabolic, and ridge shape membership functions. The triangular and trapezoidal shapes of membership functions are used most often for representing fuzzy numbers. For ease of narration and without loss of generality, in the following, we use trapezoidal membership function as an example to discuss the proposed method. We adopt the quadruple expression F (A, B, a, b) to represent the trapezoidal membership function shown in Fig. 1, where a, b, A, B ≥ 0, the core of F is [A, B], and the support of F is [A - a, B + b].

The trapezoidal membership function of F (A, B, a, b).
The tolerance indicator Z is an important application of fuzzy relations, and is the foundation of fuzzy-set-based relaxation [3].
The value of μ E (x, y) is used to measure the degree that x and y are approximately equal. The closer the values of x and y, the closer the membership degree μ E (x, y) is to 1. In a continuous domain U, a relative proximity relation characterizes to what extent the ration x/y is close to 1.
The fuzzy number Z has following properties to ensure the reflectivity and symmetry of E. Reflexivity: μ
Z
(1) =1, which implies that the proximity degree of x to itself is 1. Symmetry: μ
Z
(r) = μ
Z
(1/r), which implies that μ
E
(x, y) = μ
E
(y, x). μ
Z
(r) =0 if r ≤ 0 which assumes that two numbers which are close should have the same sign.
Property (3.2.2) implies the core of Z is [1,1]. Property (3.2.2) implies the support of Z is [1 - ɛ, 1/(1 - ɛ)], where ɛ is a rational within [0,1] and is called the relative tolerance parameter. The quadruple representation of Z is thus Z (1, 1, ɛ, ɛ/(1 - ɛ)). As is suggested in [16], ɛ can be deliberately chosen to ensure supp (Z) fall in the interval
Fuzzy set relaxation
Fuzzy set relaxation is the basis of the relaxation of fuzzy RDF queries. There have been various ways to achieve this goal, e.g, by using fuzzy modifiers, and relative or absolute proximity relations [5]. In this paper, we adopt a relative-proximity-relation-based relaxation. By applying a relative proximity relation E to a given fuzzy set F on domain U, we obtain a more general fuzzy set F′. The newly generated fuzzy set gathers not only all the elements in the original fuzzy set, but also elements close to a certain element in F, which is formally defined as “for each u ∈ U, there exists a element v ∈ U, such that F (v) ∧ E (u, v) holds”. The semantics of this statement is formulated as
If E is modeled by a fuzzy number Z (“close to 1”), we obtain a fuzzy set F
Z
. F
Z
can be expressed as F
Z
= F ⊕ Z, where ⊕ denotes the addition operation on fuzzy sets [11], i.e.,
If the quadruple expressions of F and Z are (A, B, a, b) and (1, 1, ɛ, ɛ/(1 - ɛ)) respectively, then F Z = (A, B, a + A * ɛ, b + B * ɛ/(1 - ɛ)) according to the definition of ⊕.
f-SPARQL
f-SPARQL [8] is a fuzzy extension of SPARQL. As is shown in Table 1, f-SPARQL extends three elements of SPARQL, i.e., the “Query”, the “TripleBlock”, and the “Constrain”. A “Query” element is extended with the element QueryType, which is defined as
Syntax of f-SPARQL
Syntax of f-SPARQL
An f-SPARQL query Q is called atomic if it contains only one fuzzy term. The basic form of the fuzzy FILTER constraint in an atomic f-SPARQL query is of the form
An complex f-SPARQL query Q contains more than one fuzzy FILTER constraints, and is abbreviated as Q = 〈F1, α1, w1〉 ∧ … ∧ 〈F m , α m , w m 〉, where F i is a fuzzy term, α i is the threshold for F i , w i is the weight of F i which indicate the important degree of i-th fuzzy FILTER constraint, and 1 ≤ i ≤ m.
The query Q is abbreviated as Q = 〈young, 0.9, 0.5〉 ∧ 〈close _ to _ 185, 0.9, 0.3〉 ∧ 〈prolific, 0.9, 0.2〉.
As for the query evaluation, a set of rules is proposed in [8] to translate an f-SPARQL query into a crisp SPARQL query so as to take advantage of existing implementations of SPARQL. More specifically, fuzzy FILTER constraints are translated into interval FILTER conditions by means of α-cut of a fuzzy number.
If the result set of an f-SPARQL query contains too few results (less than a given number k) or is empty if k is not specified, which means that no or only a few entities in the RDF dataset match fuzzy constraints in Q, then Q is called a failing query and needs to be relaxed to obtain more results.
This section focuses on query relaxation of an atomic f-SPARQL query. Our query relaxation method rests on the relative proximity relation-based fuzzy set relaxation introduced in Section 3.2.3. Firstly, for each atomic f-SPARQL query, we propose a relaxation operation characterized by a relative proximity relation and the notion for determining the parameter ɛ in the tolerance indicator Z. Then, we discuss the termination conditions for the relaxation process and provide the relaxation algorithm for atomic f-SPARQL queries.
Relaxation operation on fuzzy terms
We introduce for each fuzzy term F (which is actually a fuzzy set) a relaxation operation T↑ and use T↑ (F) to denote the relaxation of F using T↑. The relaxation operation T↑ relating to F should have the following three characteristics [6] to satisfy the semantics of fuzzy set relaxation. For any element u in the domain of F, T↑ will increase the membership degree of u or keep it unchanged, which is formally represented as μT↑(F) (u) ≥ μ
F
(u). T↑ must broaden the support of F, which is formally represented as supp (F) = {u|μ
F
(u) >0} ⊂ supp (T↑ (F)) = {u|μT↑(F) (u) >0}. In order to keep the semantic consistency with the initial fuzzy term as far as possible, T↑ will preserve the core of F, which is formally represented as: core (F) = {u|μ
F
(u) =1} = core (T↑ (F)) = {u|μT↑(F) (u) =1}.
Relative proximity relation E mentioned in Section 3.2.2 is a fuzzy set obtained by parameterizing ɛ in the tolerance indicator Z (1, 1, ɛ, ɛ/(1 - ɛ)). It is trivially true that Z satisfies the characteristics mentioned above. We thus, in this paper, choose Z to characterize the relaxation operation T↑. After applying the relaxation operation T↑ on the fuzzy set F, a more general fuzzy set F′ is obtained, i.e., F′ = T↑ (F) = F Z = F ⊕ Z.
The relative tolerance parameter ɛ directly determines the efficiency of fuzzy query relaxation and the quality of relaxation results. One intuitive way consists in requesting the user to specify the granularity of the relaxation operation. The granularity is given in terms of the maximal relaxation step ω, which ensures that relaxed queries are not too far, semantically speaking, from the original one. The larger the value of ω, the finer the granularity. When ω is specified by the user, ɛ is calculated as ɛ = ɛ
max
/ω. As is shown in 3.2.3, to ensure the support of Z lies in the validity interval V,
Relaxation algorithm for atomic fuzzy queries
The relaxation of an atomic f-SPARQL query Q = 〈F, α〉 is actually relaxation of the fuzzy set F by the relaxation operation T↑ to F′ = T↑ (F) = F Z = F ⊕ Z. Q is then relaxed to a less restrictive query Q′ = 〈F′, α〉 = 〈F ⊕ Z, α〉. The query after n times of relaxation is Q n = 〈F(n), α〉 = 〈T↑(n) (F) , α〉 = 〈F(n)Z, α〉 = 〈F ⊕ n Z, α〉. The relaxation is conducted repeatedly until enough results are returned or termination conditions are met. After n times of query relaxation, the relaxed query may be too far from the initial query. In order to prevent the relaxation process from proceeding endless, even the result set remains empty or contains too few results, we need to stop the relaxation operation. Several termination conditions are taken into account to ensure the fuzzy query relaxation algorithmcontrollable.
The relaxation process terminates either the number of results has meet the user request, the maximal relaxation step is reached, or the support of relaxed fuzzy set overlaps with the core of a fuzzy set F P of non-authorized values.
We introduce a fuzzy set of non-authorized values as a judgment condition to control the relaxation process [6]. For each fuzzy term F, there is a non-authorization fuzzy set F P of unauthorized values, which is specified by domain experts, for indicating the degree that an element u does not belong to F. If μ F P (u) =1, then u is completely not affiliated with F. After introducing F P , the relaxed fuzzy set F(n) is reformulated as F(n)+ = F(n) ∩ (F P ) c , where (F P ) c is the complement of F P . If ∩ is interpreted as minimum, the membership function of F(n)+ is μ F (n)+ (u) = min(μ F (n) (u) , 1 - μ F P (u)). The relaxation process is stopped if one of the following conditions is satisfied.
Condition 1: the number of results satisfies the user’s request.
Condition 2: ɛ reaches its maximum ɛ max .
Condition 3: the support of the relaxed fuzzy set F(n) overlaps with the core of the F P .
Condition 3 enables a RDF query system to effectively filter out entities that are completely unsatisfied with the fuzzy FILTER constraint in an f-SPARQL query, and to return entities to some extent satisfy F(n), since elements in the core of F P are completely unsatisfied with the fuzzy FILTER constraint.

The membership functions of F, F(n), and F P .
The relaxation algorithm for atomic fuzzy terms is shown in Algorithm 1. In Line 1, the ɛ is calculated as shown in Section 4.1 and the tolerant indicator Z is obtained. The function Exec (Q) in line 2 executes the query Q over the input RDF dataset D and returns the results. In Line 3, the fuzzy set F is relaxed to F(n) after n times of relaxation. If F = (A, B, a, b), then F(n) = (A, B, a + n * A * ɛ, b + n * B * ɛ/(1 - ɛ)). If (F P ) c = (C, D, c, d), then Condition 3 is formally represented as C - c ≥ A - a - n * A * ɛ and D + d ≤ B + b + n * B * ɛ/(1 - ɛ) and the execution condition for the while loop is C - c < A - a - n * A * ɛ or D + d > B + b + n * B * ɛ/(1 - ɛ). It should be noted that, the membership function of F in Fig. 2 contains both left support and right support. If the left support of a fuzzy set overlaps with the core of F P , while the right support does not overlap the core of F P , then only relax the right support until the number of the results is met or the right support overlaps with core of F P . In this case, the tolerance indicator Z is no longer symmetric. The relaxation operation is then only applied on the right support of F with the tolerance indicator Z = (1, 1, 0, ɛ/(1 - ɛ)).
1: n = 0; ɛ = ɛ max /ω; Z = (1, 1, ɛ, ɛ/(1 - ɛ)); F(n) = F; F(n)+ = F(n) ∩ (F P ) c ; Q(n)+ = 〈F(n)+, α〉;
2: Results (Q) = Exec (Q(n)+);
3:
4: n = n + 1;
5: F(n) = T↑(n) (F) = F(n)Z = F ⊕ n Z;
6: F(n)+ = F(n) ∩ (F P ) c ;
7: Q(n)+ = 〈F(n)+, α〉;
8: Results (Q) = Exec (Q(n)+);
9:
10:
An complex f-SPARQL query contains multiple fuzzy terms, each of which is assigned a weight to reflect its importance. In practice, we focus on the following four main issues in the process of complex f-SPARQL query relaxation. To determine the execution order of multiple relaxed queries derived from the initial complex query. To ensure that the relaxation degree of each fuzzy term is basically the same. To ensure that every execution of a relaxed query has effect on the final results. To ensure each relaxation operation is controllable to preserve the semantics of query results.
An f-SPARQL complex query is of the form Q = 〈F1, α1, w1〉 ∧ … ∧ 〈F m , α m , w m 〉. In the following, when the threshold and the weight of a fuzzy term is not relevant to the illustration of proposed method, they are sometimes not included and the abbreviated form of Q = F1 ∧ … ∧ F m is adopted.
Determination of execution order of multiple relaxed queries
In this paper, we adopt the local relaxation method in the process of the relaxation of complex queries. Local relaxation is more fine-grained than the global relaxation in that the latter one relaxes all the terms in one relaxation step [4].
In order to determine the execution order of relaxation operations on different fuzzy terms in a relaxed query, we use a lattice structure to represent the derivation relationship of all relaxed queries. The root of a lattice structure is the initial failing query, and the other nodes are relaxed queries derived from the initial query through one or several relaxation operations. The layer of the lattice structure in which a relaxed query lies indicates the times for which the relaxation operation is applied.

The lattice of relaxed queries.

The lattice of relaxed queries with w1 > w2 > w3.
The execution order of relaxed queries is roughly determined by their priorities [4]. The priority of a relaxed query is in accordance with the times of relaxation operations applied on it. The priorities of two relaxed queries Q′ and Q″ satisfy Proposition 1.
Proposition 2 defines the execution order of relaxed queries in the same layer of the query relaxation lattice.
In Proposition 2, w i is the weight of the i-th fuzzy condition. The higher the weight w i of a fuzzy term, the more attention the user pays to the result of F i and the more accurate information the user wants to obtain on the attribute related to F i , thus the later it is executed to ensure the result close to the initial query intent. Assume Q′ and Q″ are in the same layer of the query relaxation lattice, then they are derived from the initial query through same times of relaxation. We calculate the weighted means of the relaxation times of each fuzzy condition to determine the priority. As suggested by Proposition 2, the higher the mean value, the lower the execution priority.
Minimal Failing Subqueries (MFSs) give the reason why the initial query fails [4, 12]. We apply MFSs to our fuzzy query relaxation algorithm for deciding the necessity of the execution of a relaxed query, so as to reduce the times of execution of relaxed queries.
In the following, we discuss the method of how to calculate all the MFSs of a failing fuzzy query Q over a RDF dataset. A failing fuzzy query Q = F1 ∧ … ∧ F m is represented as a set S = {F1, …, F m }, where F i (1 ≤ i ≤ m) is a fuzzy term. Generally, a failing query has one or several MFSs. The algorithm for finding MFSs is based on the lattice structure of S, which is composed of the power set of S, denoted as 2 S , with S the top node and ∅ the bottom node in the lattice. The lattice structure of a query with four fuzzy terms is shown in Fig. 5.

The lattice structure for finding MELs.
Finding MFSs of a failing query can be mapped to the MEL (Minimal Element in the Lattice) problem [4]. A test function T is defined on 2 S , which is monotonic w.r.t set subsumption, i.e., if T (A) = True for a node A in the lattice of S, then for each ancestor B of A, T (B) = True. We call a minimal element in the lattice w.r.t T an MEL. If A is a MEL of S, then T (C) = False for any successor C of A. The unary relation T in this scenario is defined as testing whether a subquery in the lattice fails. We adopt depth-first and top-down traversal of the lattice of S to find a MEL that causes the failure of the initial query. For example, the test path that finds the MEL F4 is shown in Fig. 5 with solid lines. An underline suggests a termination of the search procession where the test of the underlined set returns False. A set is boxed if the test return True and findAllMels () is called recursively. The algorithm for finding all MELs of S is shown in Algorithm 2.
1:
2: isMel : = True;
3:
4: findAllMels (S - {F i });
5:
6: isMel : = False;
7:
8:
9: AllMels = AllMels ∪ S;
10:
The set of MELs of the initial query is obtained by Algorithm 2, then the set of MFSs of a failing query Q is formally defined as mfs (Q) = {M|M ⊆ Q is a MEL of Q}.
The relaxation lattice structure after several times of relaxation operations contains a large search space and results in exponential time complexity. In order to improve the efficiency of query relaxation, we use MFSs of the initial query to determine the necessity of execution of a relaxed query in the relaxation lattice. The MFSs of a relaxed query in the query relaxation lattice can be speculated by the MFSs of the initial query without execution of MFS finding algorithm [6].
Given a failing fuzzy query Q = F1 ∧ … ∧ F m , we use T↑ (Q) to denote the relaxed query of Q, and SQ[j] to denote a subquery of Q, which is obtained by deleting the j-th fuzzy term of Q. The mfs (Q) of the initial query Q is obtained in Section 5.2 and mfs (T↑ (Q)) is constructed by using elements of mfs (Q). Given mfs (Q) = {F i 1 ∧ … ∧ F j 1 , …, F i h ∧ … ∧ F j h }, the construction rules of mfs (T↑ (Q)) is based on Propositions 3 and 4, which are proved in [3].
Query instances
Query instances
The construction of the relaxation lattice of the initial query makes the relaxed queries being executed in a certain order, and the introduction of MFSs makes it possible to avoid some nodes in the relaxation lattice from being executed. In fact, if a node contains any element in the MFS set of its parent node, then the relaxed query in this node is a failing query and should not be executed.
1: i : =0; Layer (0) : = Q; failing : = True; Results (Q) : = Exec (Q);
2:
3: failing : = False;
4:
5:
6: i : = i + 1;
7:
8:
9:
10:
11:
12: failing = False; break;
13:
14:
15:
16: break;
17:
18:
19:
20:
21:
We obtain an efficient and controllable relaxation algorithm (see Algorithm 3), by considering the MFSs and the fuzzy set of non-authorized values.
The function Exec (Q) evaluates Q against a RDF dataset and returns results. Layer (i) is the set of all the relaxed queries in the i-th layer of the lattice structure. Elements in Layer (i) are ordered by considering relaxed query execution priorities introduced in Section 5.1. Each fuzzy term in a query is not relaxed further if its support overlaps with the core of the corresponding fuzzy set F P of non-authorized values (see Section 4.2). The father (Q) represents the set of parent nodes of Q. If the set of MFSs of Q is equivalent to that of any of its parent node, which means the relaxation operation is applied to the fuzzy term that is not the cause of failure, then it should not be executed.
The values of numberOfPublications property are calculated by using aggregate function
As mentioned in many fuzzy systems, the rigorous specification of membership functions can be a difficult task. Often the parameterization of a membership function is summarily described as being chosen by users or being assigned by experts according to the previous experience. Also, in our work, the parameters of fuzzy sets are assigned based on the previous existing experience, e.g., the parameter “25” is often considered as the critical point of the membership function “young” in Table 3 in many fuzzy systems.
The membership functions of four fuzzy terms
The membership functions of four fuzzy terms
Recall and precision
We make a comparison of the response time with the method for processing f-SPARQL queries (abbreviated as FQ) [8] as no similar implementations are concerned with this problem. The experimental results are shown in Fig. 6.

The response time of FQR and FQ.
The response time of FQR is a little longer than that of FQ, as the query relaxation algorithm contains additional procedures, including (i) calculating relaxed queries, (ii) constructing the relaxation lattice structure, and (iii) finding MFSs. The evaluation of Q1 involves only procedure (i), while Q2–Q5 involve all of them. Procedure (i) and (ii) are in O (n3) time complexity with regard to the number of fuzzy atoms, while Procedure (iii) is in exponential time complexity.
The execution time of Q5, including the time of translating an f-SPARQL query into a SPARQL query and the execution time of the latter. The time of sorting relaxed queries in the same layer by calculating priorities shown in Propositions 1 and 2, which is trivial. The time of calculating each relaxed query in the lattice. The execution time of each relaxed queries. The execution time of the Algorithm 2 for finding MFSs, which executes only once.
In this paper, we investigate query relaxation of f-SPARQL queries over RDF data, when the query results are too few or empty.
Firstly, for each atomic fuzzy query, we provide a relaxation operation for the fuzzy term to relax it to a more tolerant fuzzy set. A fuzzy set of non-authorized values is introduced as a termination condition to make the relaxation algorithm controllable.
Secondly, on the basis of atomic fuzzy query relaxation strategy and under the local relaxation, we propose the relaxation algorithm for complex f-SPARQL queries, by considering weights for fuzzy terms, MFSs, and the fuzzy set of non-authorized values.
Finally, we implement the proposed algorithms to evaluate the proposed fuzzy query relaxation method. We show the effectiveness of our proposed method on an extension of LUBM dataset. The experimental results show a relatively high recall as more relevant results are returned after relaxation. We analyze the contribution of different steps to the overall running time of the query relaxation process.
Further research concerns automatically determining the weight of a fuzzy term in a complex f-SPARQL query. In this paper, by requiring the user to specify weights for various fuzzy terms, we assume that the user is knowledgeable enough to ascertain these to a fine degree. It is not a pragmatic approach when there are quite a few fuzzy terms related to different numeric attributes. A technique is required by which the weight of a fuzzy term is automatically specified by considering the user preferences and/or distribution of the RDF dataset queried. Another interesting topic for research is the parameterization of membership functions and its influence on the query relaxation process.
Footnotes
Acknowledgments
The work is supported by the National Natural Science Foundation of China (61672139) and Fundamental Research Funds for the Central Universities (N140404005, N140404010, N151704001).
