Abstract
With the widespread acceptance of RDF (Resource Description Framework) as the de-facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the Web, a huge amount of RDF data is being proliferated and becoming available. Efficient querying of RDF data is therefore of increasing importance. Preference queries and fuzzy queries are required for personalized queries, and have been extensively investigated in the context of databases. Focusing on SPARQL (Simple Protocol and RDF Query Language), we propose a fuzzy extension of SPARQL with preferences in this paper. First, we extend SPARQL with fuzzy sets by allowing the occurrence of fuzzy terms, fuzzy operators and fuzzy relations. Second, we introduce multi-dimensional preferences into the fuzzy SPARQL. Based on fuzzy set theory, we also propose a set of translation rules to convert the fuzzy SPARQL with preferences into crisp SPARQL. Finally we propose an effective scoring function to calculate the order of query results, which integrates the preferences and membership degrees. The proposed approach is evaluated with experiments as well.
Introduction
With the explosion of data on the Web, the Semantic Web has been proposed to support uniform access to Web data sources. Being a data model representing objects and their relationships in network resources, RDF is a labeled graph data format, in which the RDF abstract syntax is a set of triples called the RDF graph. Nowadays more and more Web resources are represented by RDF and the queries of massive RDF data is becoming more and more important. Several RDF query languages such as RDQL [21], RQL [7], SeRQL [5], N3QL [3], Triple [23] and SPARQL [19] have been developed. Among these RDF query languages, the W3C SPARQL standard is a vendor-independent query language for the RDF triple data model, and can supply four kinds of query results: SELECT, CONSTRUCT, ASK and DESCRIBE [19]. These solution modifiers in SPARQL play an important role in improving the query precision in different ways.
It should be noted that that the classical SPARQL processes information only in a crisp way and SPARQL FILTERs restrict solutions to those for which the FILTER expression evaluates to TRUE. As a result, existing SPARQL implementations do not allow users to form queries with preferences or vagueness, which would be desirable in the real-world applications. In the community of databases, fuzzy data queries have been widely investigated and applied [14]. In the context of RDF data, there are some efforts in RDF queries with preferences [13, 22], but the work of fuzzy RDF queries is scattered. In [6], SPARQL is extended by allowing fuzzy query conditions and f-SPARQL is proposed. With f-SPARQL, users are required to provide crisp thresholds. It would be common situation where it is not easy for users to choose appropriate thresholds. In addition, f-SPARQL in [6] supports the fuzzy query conditions only containing fuzzy terms and fuzzy operators, and does not support complex fuzzy query conditions. Focusing on approximate answers, Hogan et al. in [12] investigate query relaxation for fuzzy queries over crisp RDF data. They present a proof-of-concept framework for enabling relaxation of structured entity-lookup queries, evaluating different distance measures for performing relaxations. Complex fuzzy query conditions are not considered in [12]. A query language for fuzzy RDFS graphs named Fuzzy SPARQL is proposed in [24]. Unlike [6] and [12], Fuzzy SPARQL in [24] is issued to fuzzy RDFS graphic data instead of crisp RDF data in [6, 12]. Also Fuzzy SPARQL applies simple fuzzy query conditions only containing fuzzy terms and fuzzy operators, and does not support complex fuzzy query conditions. While the works in [6, 24] do not support user’s preferences in the fuzzy RDF queries in an explicit way, preference SPARQL query over a set of data Web services is investigated in [1, 2] and preferences are modeled thanks to fuzzy sets. The SPARQL query with fuzzy preferences (called as SPARQLp) is discussed in [1, 2]. Unlike [6] and [24], the query conditions of SPARQLp are crisp and fuzziness only appears in the preferences. In addition, the fuzzy preferences of SPARQLp only contain fuzzy terms and do not support complex fuzzy preferences as well.
This paper presents a flexible extension to crisp SPARQL in two ways. We first extend SPARQL with fuzzy sets by allowing the occurrence of fuzzy terms, fuzzy operators and fuzzy relations. The extended SPARQL is called SPARQLf. Second, we further introduce multi-dimensional preferences into the SPARQLf. The SPARQLf with preferences is called SPARQLf-p, which provides users with a flexible way to represent their query intentions. Since the queries described with SPARQLf-p cannot be supported by the existing SPARQL implementations, we then propose a set of translation rules to convert SPARQLf-p into crisp SPARQL. Finally we define an effective scoring function for calculating the order of query results, which integrates the preferences and membership degrees.
This paper differs from [6, 24] in three major aspects. First, the orderly linguistic values are applied instead of the crisp thresholds of [0, 1]. Second, the fuzzy relations between two variables are also accommodated into the fuzzy constrains in addition to fuzzy terms and fuzzy operators. Finally, the dimensional preferences for users are explicitly introduced and applied in ranking the returned results. This paper differs from [1, 2] in two major aspects. First, the fuzzy constraints are directly applied to represent users’ fuzzy requirements instead of crisp query conditions. Second, complex fuzzy preferences are discussed. It is not difficult to differentiate between two fuzzy SPARQL languages, which are the fuzzy SPARQLf in this paper and the f-SPARQL in [6]. We use the term fuzzy SPARQL in the rest of this paper to generally refer to a kind of flexible SPARQL with fuzzy querying conditions.
The remainder of this paper is organized as follows. Section 2 briefly reviews the necessary background knowledge of fuzzy sets, RDF and SPARQL as well as the related work. Section 3 presents the syntax of fuzzy SPARQL with preferences. Section 4 shows how to translate the fuzzy SPARQL queries with preferences into crisp ones. Section 5 describes the top-k ranking algorithm. Section 6 provides an experimental evaluation of the fuzzy SPARQL with preferences. Section 7 concludes this paper.
Preliminaries
RDF and SPARQL
RDF is a directed, labeled graph data format for representing Web resources [15]. The RDF abstract syntax is a set of triples, called the RDF graph. A RDF triple is defined as (s ; p ; o) ∈ (I ∪ B) × I × (I ∪ B ∪ L). Here I, B and L are infinite sets of IRIs, blank nodes and RDF literals, respectively. In a triple, s is called the subject, p the predicate (or property), and o the object.
For RDF data sets, SPARQL [19] provides tools to obtain information from URIs, blank nodes, text of no format and RDF sub-graph, and construct a new RDF graph according to the information from the query graph. A SPARQL query can be defined as a quadruple Q = (V, P, DS, SM), where V declares a result form, P represents a graph pattern, DS is a data set and SM a set of solution modifiers [6].
A SPARQL query has the basic format of SELECT-FROM-WHERE. Among these components, the SELECT component is applied to represent the set of variables being shown in the result, the FROM component is employed to represent the dataset constituted by a set of IRIs of RDF documents, and the WHERE component is used to represent the graph pattern constituted by a set of RDF triples. In addition to these major components above, there is an additional solution modifier SPARQL FILTERs. SPARQL FILTERs restrict solutions to those for which the FILTER expression evaluates to TRUE. One can refer to [19] for the syntax of SPARQL.
Fuzzy sets and orderly linguistic values
In the classical sets theory, two-valued logic {0, 1} is applied. As a result, an element either belongs to a set or does not belong to a set for certain. But in the fuzzy sets theory [25, 29], an element belongs to a set with a membership degree of [0, 1] instead of {0, 1} was introduced, which is described by a membership function.
Let A be a fuzzy set in the universe of discourse U with the membership function μ
A
: U → [0, 1]. Then we have the following notions. Support: the support of A is a set of the elements that have non-zero degrees of membership in A, denoted by supp (A) = {x|x ∈ U and μ
A
(x) >0}. Core: the core of A is a set of the elements that completely belong to A, denoted by core (A) = {x|x ∈ U and μ
A
(x) =1}. α-Cut: the strong (weak) α-cut of A is a set of the elements which degrees of membership in A are greater than (greater than or equal to) α, where 0 ⩽ α < 1 (0 < α ⩽ 1), denoted by Aα+ = {x|x ∈ U and μ
A
(x) > α} and Aα+ = {x|x ∈ U and μ
A
(x) ⩾ α}.
With α-Cut, a fuzzy set (say F) is converted to a classical one (say F α or F α +).
Let A be a fuzzy set in the universe of discourse U with the membership function μ A : U → [0, 1]. We say A is convex if and only if for all μ1, μ2 in U, μ A (λu1 + (1 - λ) u2) ⩾ min (μ A (u1) , μ A (u2)), where λ ∈ [0, 1]. In addition, we say A is a normal fuzzy set if ∃ u ∈ U, μ A (u) =1. A fuzzy number is a convex and a normal fuzzy set. Let A and B be two fuzzy numbers in U and let A α and B α be the α-Cuts of A and B, respectively. Then we have (A ∪ B) α = A α ∪ B α and (A ∩ B) α = A α ∩ B α .
A fuzzy set is applied to represent a fuzzy notation, which generally corresponds to a linguistic value [26–28]. With some linguistic values, we may get orderly linguistic values.
In the orderly linguistic values, the middle linguistic value (i.e., sτ/2) means “approximately 0.5”, and it is assumed that the other values are placed symmetrically around it. In particular, s0 and s τ denote the lower and upper limits of linguistic values. Actually orderly linguistic values generally represent adverbs in the natural language.
Let s
i
and s
j
be two linguistic values. Then we have the following operations on linguistic values. neg (s
i
) = s
j
, j = τ - i
max (s
i
, s
j
) = s
i
, if s
i
⩾ s
j
min(s
i
, s
j
) = s
i
, if s
i
⩽ s
j
Here neg () represents a negation operator, and max () and min () represents the maximazation operator and the minimazation operator, respectively.
The cardinality of S (i.e., τ) are odd such as 7 and 9. The limit of cardinality is 11. Generally it must be small enough so as not to impose useless precision on decision making, but it must also be rich enough in order to allow a discrimination of the performances of each object in a limited number of grades. According to Miller decision theory, generally 5 up to 7 linguistic values on an ordered scale are used [16]. In this paper, we adopt 7 linguistic values to make up the orderly linguistic values. Then we have S = {s0 = none, s1 = very low, s2 = low, s3 = medium, s4 = high, s5 = very high, s6 = perfect}.
Viewed from the processing of orderly linguistic values, it is required that each linguistic value (say s i ) is associated with a fuzzy set defined with a membership function. In principle, membership functions can be of different shape, but in practice, trapezoidal and triangular membership functions are most frequently used. Trapezoid membership function is generally described with (a i , b i , c i , d i ), where a i and d i represent minimal and maximal value in membership interval of an element of an order i, b i and c i form an interval to show the membership degree equals to 1.
The notations and semantics of the orderly linguistic values S = {s0 = none, s1 = very low, s2 = low, s3 = medium, s4 = high, s5 = very high, s6 = perfect} are presented in Table 1. The trapezoid membership functions of the orderly linguistic values in S are given in Fig. 1.
Related work
In recent years with expansion of data Web applications, problems of RDF queries come to forefront. In [6], SPARQL is extended with an easy-to-grasp yet powerful way for querying the RDF data sets. In the extended SPARQL, the occurrence of fuzzy terms and fuzzy operators is allowed. Also, in order to express users’ weight preferences on fuzzy constraints, user-defined weights in the form of ∑ω i = 1 are provided in [6]. Finally it is shown in [6] how to translate fuzzy terms and fuzzy operators into crisp ones and how to efficiently calculate the top-k answers of fuzzy queries with regard to membership degrees as well as user-defined weights. In [12], for fuzzy queries over RDF data, a proof-of-concept framework is presented for enabling relaxation of structured entity-lookup queries, evaluating different distance measures for performing relaxations. To query the fuzzy DL-Lite ontologies, in [18], an extended SPARQL is proposed, which allows threshold and user-defined weights in the query.
In addition to the efforts of fuzzy constraints mentioned above, there are some efforts in the expression constructed with users’ preferences in the context of the Semantic Web. In [22], a method to querying the Semantic Web with multi-dimensional preferences is proposed. In their method, the preferences are divided into two types, which are the Boolean preferences and the scoring preferences. By using the Preferring clause which defines some independent preferences separated by the AND construct, users can specify the preferences. In addition, the atomic preferences can be nested using the CASCADE construct. It should be noted that the ranking method in the Preferring modifier may meet the situation that the queries cannot be handled in [22]. Assume that, for example, we just need one more result in the current result set of a four-dimension preference. Also we assume that there is a best solution in each dimension, but there are not any domination relations between any two of them. For this case, it is difficult for the ranking to be finished (see [22] for more details about ranking). In [13], the preference problem of fuzzy and multiple conditions are considered in practical query, and the relevancy computational formula is provided, which can realize preference query through sequencing.
This paper differs from the efforts above in three aspects: first, the orderly linguistic values are applied instead of the crisp thresholds in [0, 1]; second, the fuzzy relations between two variables are also introduced into the fuzzy constrains in addition to fuzzy terms and fuzzy operators; finally, the dimensional preferences for users are explicitly introduced and applied in ranking the returned results.
Fuzzy SPARQL with preferences
In this section, we present the fuzzy SPARQL with preferences, which is an extension to the classical SPARQL in two ways. First, we present the fuzzy SPARQL (denoted SPARQLf), which is a flexible extension to the classical SPARQL language. In the FILTER component of SPARQLf, the original expression is extended by allowing fuzzy terms and fuzzy operators. Furthermore fuzzy relations are introduced into SPARQLf. Second, a structure used to express users’ multi-dimensional preference is introduced in to SPARQLf. The SPARQLf with preferences is denoted SPARQLf -p.
Fuzzy terms and fuzzy operators
The fuzzy extension of SPARQL mainly takes place in the FILTER constraint component. In [6], the basic form of FILTER constraint allowing fuzzy terms is as follows.
FILTER (?X θ FT [WITH α])
Among them, FT denotes a fuzzy term (e.g., “tall” or “young”), which corresponds to a fuzzy set using a trapezoidal membership function, and θ is an operator of >, < , = , ⩾ , ⩽ , ! = , between and not between . If the operator θ is between or notbetween, the syntax of FILTER constraint is a little different, i.e.,
FILTER (?X between/notbetween FT1 and FT2) [WITH α],
where FT1 and FT2 are two fuzzy numbers, respectively.
The optional parameter [WITH α] indicates the condition that must be satisfied as the minimum membership degree in [0, 1]. It is required in [6] that users need to choose an appropriate value of α to express their requirements. If not specified, 1 is used as default. However, it is not easy for users to understand the semantics of a crisp α and to make an appropriate choice. For this reason, we apply orderly linguistic values in FILTER constraint, which has the form as follows.
FILTER (?X θ FT [WITH s i ]) or
FILTER (?X between/notbetween FT1 and FT2) [WITH s i ]
The fuzzy terms applied in SPARQLf can be classified into three types, which are simple (atomic) fuzzy term, modified (composite) fuzzy term and compound fuzzy term. A simple fuzzy term such as “young” or “tall” is defined in terms of a fuzzy number with a membership function. A modified fuzzy term, e.g. “very young” or “more or less tall”, is described by a simple fuzzy term and a fuzzy modifier together. A compound fuzzy term is represented by simple fuzzy terms or modified fuzzy terms connected by or (union), and (intersection), or not (complement) connectors, such as “young or very young”.
Now we consider fuzzy operators within the FILTER constraint, which is formalized as follows.
FILTER (?X Y) [WITH s i ]
Here denotes a fuzzy operator, and Y denotes a string, an integer or any other data types allowed in RDF. Then the fuzzy operator and the crisp value Y constitute a fuzzy number represented by Y. We, in this paper, mainly discuss three types of fuzzy relations, which are “close to (around)”, “at least”, and “at most”.
Fuzzy relation
It is worth noting that the fuzzy terms and fuzzy operators discussed above are applied in single variables of the fuzzy constraints. In “Age = old” and “Age is around 50”, for example, fuzzy term “old” and fuzzy operator “around 50” are related to a single variable “Age”. In practice, there exist many fuzzy semantics constraints within two variables. Let us look at an example. Suppose that we have two variables, say “supplier” and “customer”, and there is a fuzzy constraint relation between them, say “customer likes supplier”. In order to handle this kind of fuzzy constraints, we further introduce fuzzy relations into SPARQLf.
Basically we can identify two major kinds of fuzzy relations, which are simple fuzzy relation and compound fuzzy relation. A simple fuzzy relation between elements from two universes of discourse and its membership function are generally defined directly by the domain experts. Compound fuzzy relations can be obtained with simple fuzzy relations.
Suppose that there is a compound fuzzy relation R © R, meaning that x is far greater than y. Then we have R © R (x, y) = ∨ z (R (x, z) ∧ R (z, y)) and would get
In order to accommodate the fuzzy relations into SPARQL, we apply FILTER (Fuzzy Relation (?X, ?Y) [WITH s i ]) to express the fuzzy constraints between two variables in the FILTER expression.
#FQ#
SELECT ?name, ?height,? weight WHERE {
?x foaf:name ?name.
?x ex:hasHeight ?height.
?x ex:hasWeight ?weight.
FILTER (?height=medium tall WITH s3 &&
Health (?height, ?weight) WITH s5)
}
In the query above, #FQ# is placed before the SELECT clause, indicating that the SPARQL query is a fuzzy one. In addition, Health (?height, ?weight) WITH s5 is used to indicate a fuzzy relation between height and weight, and s5 is used to indicate the threshold of constraints in the query condition.
Preferences
It is very common to use preferences in human natural language. Thus to support users’ demands well, it is very necessary to accommodate preferences in queries. In this section, we apply the orderly linguistic values for dimensional preferences.
In order to construct users’ preferences in SPARQL, we introduce two keywords AND and SUPERIOR.
The keyword AND is used to express independent preferences and SUPERIOR is used to signify prioritized preferences. We define AND and SUPERIOR as binary relation connection identifiers. Let P i , P j and P k be three atomic preferences. For AND, we have the followings.
Law 1 (commutative law): P i AND P j = P j AND P i ;
Law 2 (associative law): P i AND (P j AND P k ) = (P i AND P j ) AND P k ;
Law 3 (idempotent law): P i AND P i = P i ;
For SUPERIOR, we have:
P i SUPERIOR (P j SUPERIOR P k ) = (P i SUPERIOR P j ) SUPERIOR P k .
Note that SUPERIOR does not satisfy the commutative law because of the strict partial order relationship between the two atomic preferences. So P i SUPERIOR P j is not equal to P j SUPERIOR P i . When AND as well as SUPERIOR appears together, we have the followings.
Law 4: P i SUPERIOR (P j AND P k ) = (P i SUPERIOR P j ) AND (P i SUPERIOR P k )
Law 5: P i AND (P j SUPERIOR P k ) ≠ (P i AND P j ) SUPERIOR (P i AND P k )
Now we investigate how to quantitatively convert a dimensional preference to a weight vector in a semantic perspective. For this purpose, we first define two relational operators: ⊕ for AND and ⊗ for SUPERIOR. Also we use S Δi to indicate the orderly linguistic value of P i . We have the following rules to convert an original preference to a weight vector.
Rule 1: P i AND P j ≅ S Δi ⊕ S Δj ;
Rule 2: P i SUPERIOR P j ≅ S Δi ⊗ S Δj ;
In order to obtain the relatively independent weight of atomic preference after SUPERIOR, we define a notion of conditional probability.
With CP (B|A), we have the independent weight of B, denoted CP (B|A) * SvartriangleB. Here CP (B|A) can be calculated by counting N (AB) and N (A). The former represents the number of results satisfying both A and B, and the later represents the number of resultsmeeting A.
The relational operators ⊕ and × and the conditional probabilities satisfy the following rules.
Rule 3: Svartrianglei ⊕ Svartrianglei = Svartrianglei;
Rule 4: Svartrianglei ⊗ Svartrianglej = Svartrianglei ⊕ CP (P j |P i ) * Svartrianglej;
Rule 5: CP (P k |P i ) * Svartrianglek ⊗ CP (P k |P i P j ) * Svartrianglek = indent CP (P k |P i P j ) * Svartrianglek;
Rule 6: CP (P k |P i ) * Svartrianglek ⊗ Svartrianglej = CP (P k |P i ) * indent Svartrianglek ⊗ CP (P j |P i P k ) * Svartrianglej;
Rule 4 means that the conditional probability must be considered while calculating the weight of the right operand with respect to ⊗. Rule 5 describes a preference which must satisfy the most restrictedconditions.
Using the rules presented above, we can obtain a preference with the form of r1 * Svartriangle1 ⊕ r2 * Svartriangle2 ⊕ … ⊕ r n * Svartrianglen. Here n indicates the number of atomic preferences, and ri indicates a weight coefficient, either a conditional probability or a product of probabilities. Then we use the vector of (r1 * Svartriangle1, r2 * Svartriangle2, … , r n * Svartrianglen) to represent this preference.
Syntax of SPARQLf-p
The SPARQLf with preferences is denoted as SPARQLf-p. In this section, we summarize the syntax of SPARQLf-p. The SPARQLf-p consists of three parts, which are “Query”, “Constraint” and “PREFER”.
In SPARQLf-p, element QueryType is introduced into SELECT query [6] to indicate the query type, either the fuzzy type or the top-k fuzzy type. We use keywords #FQ# to declare that the query is fuzzy, and use #top-k FQ# to declare that the query is top-k fuzzy.
In addition to the three original elements (i.e., BracketedExpression, BuiltInCall and FunctionCall) in the classical SPARQL, the “Constraint” element is extended with a “FlexibleExpression” element, which contains fuzzy terms elements and fuzzy relation element. Element FuzzyTermExpression is a representation of fuzzy terms that is consisted of symbols =, ≠, >, ≥, <and ≤. The FuzzyOperatorExpression is an expression with simple fuzzy operator or modified fuzzy one. The structure of FuzzyRelationExpression is SimpleFuzzyTerm (Var, Var) [WITH s i ], which describes the fuzzy relation between twovariables.
The PREFER is an additional part to construct preference expression, which is added to the SolutionModifier part. At this point, the keyword AND constructs the independent preferences and SUPERIOR constructs the prioritized preferences. And its syntax form is SuperiorPreference (‘AND’ SuperiorPreference)*, where the SuperiorPreference is AtomicPreference (‘Superior’ AtomicPreference)*, and the AtomicPreference element is an atomic preference with weight of orderly linguistic value.
Table 2 presents the syntax of SPARQLf-p.
#FQ#
SELECT ?X WHERE {?X ex:has Age ?age.
FILTER (?age <40) WITH s3.
?X ex:hasHeight ?height.
FILTER (?height close to 175 WITH s5).
?X ex:hasWeight ?weight.
FILTER (HEALTH (?height,?weight) WITH s3)
}
PREFER
(?age <40) WITH s3
AND
(?height = medium) WITH s5
SUPERIOR (?weight) WITH s4.
Fuzzy query translations
By extending the classical SPARQL, the fuzzy SPARQL can present users’ fuzzy query requirements with preferences well. But the fuzzy SPARQL cannot be identified and supported by the classical SPARQL query engines. In order to implement the fuzzy SPARQL queries, we basically have two ways. The first method is to design and develop a completely new query engines which fully support the fuzzy SPARQL directly. So far, to our best knowledge, there are not any such fuzzy SPARQL query systems. The second method is to use the classical SPARQL query engines instead of developing a new fuzzy SPARQL query engine. The second method has been widely applied in the fuzzy database queries because it has some obvious advantages such as low costs and short times in system implementations [14].
In this paper, we follow the second method mentioned above to implement the fuzzy SPARQL queries. For this method, a core issue is how the fuzzy SPARQL can be identified and supported by the classical SPARQL query engines because the classical SPARQL query engines can only identify and support classical SPARQL. So we need to translate the original fuzzy SPARQL into the classical SPARQL. Clearly, the translated classical SPARQL must be semantically closed to the original fuzzy SPARQL. In the following we investigate how to translate the fuzzy SPARQL into the classical SPARQL in a way of semanticpreservation.
The difference between the fuzzy SPARQL and the classical SPARQL is that the fuzzy SPARQL applies the fuzzy constraints, which contain fuzzy numbers. The fuzzy numbers in the fuzzy SPARQL are convex and normal and so it is possible to translate fuzzy SPARQL to classical SPARQL.
Translation of fuzzy terms and fuzzy operators
First we concentrate on the translation of simple fuzzy terms and modified fuzzy term. For FILTER (?X θ FT [WITH s i ]), s i is converted into a fuzzy number with membership function according to Table 1. Then FT with s i an interval, written as FT (s i ) = [x, y], and fuzzy queries can be translated into crisp ones according to the rules given in Table 3.
Let FT1 and FT2 be two fuzzy terms and s i be an orderly linguistic value. Also let FT1 (s i ) = [x1, y1] and FT1 (s i ) = [x2, y2]. For FILTER (?X between FT1 and FT1 WITH s i ) in the constraints, it is converted into ?X ≥ FT1 WITH s i && ?X ≤ FT2 WITH s i and then into ?X ≥ y1 && ?X ≤ x2 according to Table 4. Table 4 gives the translation rules for the compound fuzzy terms.
For the translations of “FT1 or FT2” and “not FT”, ones can refer to [6].
Second we focus on the translation of fuzzy operators. Here we mainly discuss three kinds of fuzzy operators, which are “closeto”, “atleast” and “atmost”.
For FILTER (?X Y)[WITH s i ], the operator represents “closeto”. This fuzzy operator along with the crisp value Y actually forms a fuzzy term, which corresponds to a membership function. The membership function of “closeto” is defined as μ closetoY (Y) (u) = (1 + ((u - Y)/b) -2) -1. According to the membership interval [λ1, λ2] of s i , we choose λ1 as a threshold and then get α-cut sets [x, y]. With the interval [x, y], we can translate fuzzy queries into crisp ones.
#FQ#
SELECT ?X WHERE {
?X rdf:type WorkPos.
?X ex:monthSalary ?salary.
FILTER (?salary close to 10000 WITH s5).
}
Here we choose 1000 as the value of b in the above membership function. That is μ closetoY (Y) (u) = (1 + ((u - Y)/1000) -2) -1. According to the mapping rules between orderly linguistic values and trapezoidal member degree in Table 1, we get λ = 0.71 based on s5. Then the interval of λ-cut set can be computed and we have Y λ = [936, 10639]. Furthermore, ?salary close to 10000 WITH S9 is converted into ?salary ≥ 9361 and ?salary ≤ 10639.
We can directly make translations from “at most Y” to “≤ Y” and from “at least Y” to “≥ Y”, respectively.
Translation of fuzzy relationships
Let R (x, y) be a fuzzy relationship between variables x and y. To translate the fuzzy relationship into a crisp one, we identify the following three cases for R (x, y).
For the given s i , an interval [λ1, λ2] of the membership degree can be obtained. Then the reciprocal fuzzy relationship is converted into λ1 ≤ R (?X, ?Y) ≤ λ2.Such a expression containing arithmetical operations on variables is allowed in the traditional FILTER constraints and is written as R (?X, ?Y) ≥ λ1 && R (?X, ?Y) ≤ λ2.
First we make a preliminary translation and get λ1 ⩽ R (x, y) ⩽ λ2. Suppose that the fuzzy constraint is only on variable x, where x has a crisp range. Then using the inequalities on x and y with the range of x, we can get the expressions of y represented by x. The expressions about y can be translated as y ∈ [f1 (x) , f2 (x)], y ∈ [min(U) , f1 (x)] or y ∈ [f2 (x) , max(U)]. Among them, min (U) and max (U) mean the minimum value and maximum values of y in its domain, respectively. Finally we can combine these two expressions together and get yθf (x) to represent the constraints on x and y.
the translation result of the single variable u is u∈ [4.08, 5.92]; the translation result of the fuzzy relation R (v, u) is 1 + 100 * (v - u) -2 ⩽ 1/0.76 && v > u; the complete translation of FILTER (u close to 5 WITH s5 and R (v, u) WITH s4) is FILTER (u ≥ 4.08 && u ≤ 5.92 && 1 + 100 * (v - u) -2 ⩽ 1/0.76 && v > u).
Two variables in this case need to be translated, respectively. The translation result of x is x θ f (y) and the translation result of y is y θ g (x). The final constraints are translated to x θ f (y) && y θ g (x).
For the case that there is not a direct simple fuzzy relationship between two variables given in the query, additional fuzzy compositional operations are needed before translating. The basic idea is to first convert the indirect fuzzy relationship to a simple fuzzy relationship and then the translation results can be obtained according to the different cases discussed above.
Ranking
In order to rank the returned query results, the work in [6] proposes a scoring function with a simple method of weighted average value. But it is possible that all weights may be equal each other. A ranking method is proposed in [13] for Semantic Web search based on fuzzy logic. Their scoring strategy combines the membership degrees with user preferences and they define a non-increasing order of “degree of relevance”. We propose our strategy, a combination of user preferences and levels of satisfaction based on membership degrees.
We use Q (X, f
ω
) as the formal expression of user query, in which X means the bounded variables in query conditions and f
ω
indicates the weighting function. This weighting function has some desirable properties (see [13] for more details). In order to satisfy the equation ∑ω
i
= 1, the weighting (r1 * Svartriangle1, r2 * Svartriangle2, …, r
m
* Svartrianglem) has to be reduced to (ω1, ω2, …, ω
m
) by the same scaling. Without loss of generality, for a weighting ω = (ω1, ω2, …, ω
m
), we assume ω1⩾ ω2⩾ … ⩾ ω
m
. Then we have
In the above, f (A1 (x1) , A2 (x2) , …, A
m
(x
m
)) is an aggregation function of A
i
(x
i
), indicating the degree of membership. The aggregation function may be min, product, averaged sum, or even one defined according to real applications. Here we apply min and then the aggregation function is defined as follows.
Experiments
To evaluate our proposed approach, we use Eclipse as development environment to implement the prototype system. The prototype runs on a Windows XP Professional System with Intel Pentium 1.8 G CPU and with 1 G RAM. Also we use TDB (a database to store RDF dataset) and Jena ARQ (a query engine to execute crisp SPARQL queries) for the evaluation. We generate the RDF dataset from the Lehigh University Benchmark LUBM [10], which contains about 6,000k triples.
We first introduce four SPARQLf queries shown in Table 5.
In the four SPARQLf queries above, we further use in the experiments the following preference.
PREFER
(?X = busy) WITH s4
AND
(Ratio (UndergraduateStudent, ?X) is moderate)
WITH s3
SUPERIOR (Ratio (GraduateStudent, ?X) is
almost 3.5) WITH s3.
Combining this preference, the four SPARQLf queries Q1’, Q2’, Q3’ and Q4’ become four SPARQLf-p queries Q1, Q2, Q3 and Q4.
We also apply three membership functions in our experiments, which are defined as follows.
Busy (n) = 2/(1 + exp (–0.4n)) − 1,
Ratiomoderate (x, y) = [1+((x - 11y)/10)2]–1
Ratioalmost3.5 (x, y) = [1 + ((x - 3.5y)/10)2]–1
Here n indicates the number of courses taken by a student or taught by a lecturer, and x represents the number of undergraduates and gradates, and y represents the number of the faculty.
We use Recall and Precision to assess our approach. Recall and Precision have been widely adopted for querying evaluation and they can be defined as follows.
The developed prototype takes the four SPARQLf-p queries Q1, Q2, Q3 and Q4 as inputs, respectively, and calls Jena ARQ to search the corresponding query results from the RDF dataset, which is generated from the LUBM and stored in TDB. For the four SPARQLf-p queries Q1, Q2, Q3 and Q4, their experimental results of recall and precision are summarized in Table 6. It can be seen that, for the given sample queries, the average recall of our approach is 0.7375 and the average precision of our approach is 0.845. So our approach is good at dealing with the fuzzy SPARQL queries with preferences.
In addition, we compare our approach using the SPARQLf-p (named OLVP for short) with the approach using f-SPARQL proposed in [6] (named NVP for short) on the response time of queries. The experimental results are shown in Fig. 2.
It is clearly shown in Fig. 2 that OLVP takes a little longer time than NVP. The reason why OLVP takes a little longer time than NVP is that compared with NVP, OLVP can provide a more powerful querying means, and this needs more complex processing and hereby more time. First, in addition to the fuzzy terms and the fuzzy operators used by NVP, OLVP uses the fuzzy relations as well. Generally, converting and then evaluating a more complex fuzzy SPARQL query in OLVP obviously takes more time than a simple fuzzy SPARQL query in NVP. Second, OLVP explicitly supports user preferences which are considered in ranking query results. Processing multi-dimensional preferences and ranking query results with preferences in OLVP do need additional time compared to the simple query result ranking in NVP.
Summaries
In order to provide users with an efficient way to express their intentions in querying RDF data, in this paper we present a fuzzy extension to SPARQL with preferences. We extend SPARQL first by allowing the occurrence of fuzzy terms, fuzzy operators and fuzzy relations and then by allowing the occurrence of preferences. The fuzzy SPARQL with preferences is called the SPARQLf-p. With fuzzy set theory, we also define a set of translation rules, converting SPARQLf-p into crisp ones. With the translation rules, we can take advantage of existing implementations of SPARQL to process SPARQLf-p queries instead of developing new implementations of SPARQL. Finally we propose a scoring function for calculating the order of query results, which contains membership degrees andpreferences.
In this paper, we use LUBM dataset to verify our approach. As pointed by Cudre-Mauroux et al. [8], there exist several benchmarks for RDF data management research, and, in addition to LUBM, Berlin Benchmark [4], DBpedia Benchmark [17] and SP2Bench [20] are also widely used benchmarks. We will use other datasets such as DBpedia and SP2Bench datasets to evaluate the performances of our approach. Furthermore our future work will investigate possible optimization strategies for querying RDF and evaluating their performances with these benchmarks.
