Abstract
With the rapid development of environmental, meteorological and marine data management, fuzzy spatiotemporal data has received considerable attention. Although progress has been made in querying such data, several problems remain unsolved. Semantic and structural heterogeneity may exist among different data sources, which can lead to incomplete query results. In addition, users' query intentions and query conditions are often ambiguous. This paper proposes a fuzzy spatiotemporal data semantic model. Based on this model, relational data and XML data are first mapped to RDF local semantic models, which are then converted into an RDF global semantic model. Existing methods mainly convert relational data directly to RDF Schema, whereas our approach converts relational data to XML Schema and then to RDF, exploiting the semi-structured nature of XML Schema to resolve the structural heterogeneity between data sources. The integration process enables global queries over different data sources. In the proposed query algorithms, the query conditions entered by the user are converted into exact queries before the results are returned. Finally, we conduct extensive experiments, compute the recall, precision and F-Score of the results, and compare our approach with other state-of-the-art query methods. The results demonstrate the importance of the proposed data integration method and the effectiveness of the proposed query method.
Introduction
Spatial and temporal information play an important role in spatiotemporal applications, which are popular in many fields such as geographic, meteorological and environmental information management. Spatial and temporal entities are represented by spatiotemporal data models. Because the relationships between such entities are often fuzzy, traditional modeling methods [2, 5] cannot meet the needs of fuzzy spatiotemporal data. It is therefore important to establish a reasonable and effective fuzzy spatiotemporal data model for specific industries and research fields. Song et al. [20] propose a cadastral spatiotemporal data model by extending the space-time composite model. Xu et al. [23] establish a fuzzy spatiotemporal XML data model. However, an XML data model cannot solve the problem of semantic heterogeneity among data sources. Therefore, [14] introduces the concept of ontology. The goal of an ontology is to acquire, describe and represent the knowledge of a domain, provide a common understanding of that knowledge, identify the commonly recognized terms in the domain, and give clear definitions of these terms and their interrelationships at different levels of formalization [9]. An ontology can thus describe concepts within a domain, or even in a broader scope, together with the relations between them. It can also describe the data and semantic content of different data sources in an integrated way, thereby resolving the semantic heterogeneity between data sources [12]. The terminology component of the ontology then serves as the global schema of the system and as a reference model for formulating queries [1]. Ontology-based data integration provides users with a unified view of multiple data sources in the form of a global application domain ontology [8, 19]. Ontologies have been used in database integration with promising results in fields such as biomedicine and bioinformatics [17, 21].
In addition, Pinkel et al. [13] address the problem of semantic heterogeneity. RDF is a W3C metadata standard and an ontology description language. It provides rich semantic definition capabilities: it defines the properties of resources and the classes that describe them, and constrains the possible combinations of classes and relationships [3, 11]. Kondylakis and Plexousakis [11] provide algorithms for rewriting queries across different ontology versions and present an algorithm based on MiniCon that uses these rewritings and is guaranteed to find the set of maximally-contained rewritings for the sources. Their extension of MiniCon does not significantly increase computational complexity and remains scalable. In RDF, data has explicit semantics and can be processed automatically, which makes ontology-based data integration more efficient [6, 10]. The fuzziness of meteorological entities and the relations between them are particularly important topics in environmental data management systems and geographic information systems. For instance, the position of a weather front changes over time, and so do the area and intensity of a storm [7]. However, most users do not know the actual state of the data when querying it, and their query intent is often unclear. As a result, query criteria are vague. There are three kinds of fuzzy concepts: simple, compound and combined. Simple fuzzy concepts, such as “cold” and “warm”, are defined by fuzzy sets with membership functions. Compound fuzzy concepts, such as “very cold” and “more or less warm”, consist of a vague modifier (such as “very” or “more or less”) applied to a simple fuzzy term. Combined fuzzy concepts, such as “cold ∪ very cold”, are composed of simple or compound fuzzy concepts connected by union, intersection or complement operators.
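The three kinds of fuzzy concepts can be illustrated with a minimal Python sketch. The membership function for “cold” below is hypothetical (no concrete function is given for it), and the modifiers and set operators use the classic Zadeh definitions (“very” as squaring, “more or less” as square root, union as max, intersection as min, complement as 1 − μ). These are assumptions for illustration, not necessarily the operators used by the authors.

```python
import math
from typing import Callable

# A membership function maps a crisp value (here a temperature in °C) to [0, 1].
Membership = Callable[[float], float]


def cold(t: float) -> float:
    """Hypothetical simple fuzzy concept "cold": fully cold at or below 0 °C,
    not cold at or above 10 °C, linear in between."""
    if t <= 0.0:
        return 1.0
    if t >= 10.0:
        return 0.0
    return (10.0 - t) / 10.0


# Compound fuzzy concepts: a vague modifier applied to a simple concept
# (Zadeh's concentration and dilation hedges, assumed here).
def very(mu: Membership) -> Membership:
    return lambda x: mu(x) ** 2


def more_or_less(mu: Membership) -> Membership:
    return lambda x: math.sqrt(mu(x))


# Combined fuzzy concepts: simple or compound concepts joined by set operators.
def union(m1: Membership, m2: Membership) -> Membership:
    return lambda x: max(m1(x), m2(x))


def intersection(m1: Membership, m2: Membership) -> Membership:
    return lambda x: min(m1(x), m2(x))


def complement(mu: Membership) -> Membership:
    return lambda x: 1.0 - mu(x)


if __name__ == "__main__":
    cold_or_very_cold = union(cold, very(cold))  # "cold ∪ very cold"
    for t in (-5.0, 5.0, 12.0):
        print(t, cold(t), very(cold)(t), more_or_less(cold)(t), cold_or_very_cold(t))
```

Building concepts as composable functions mirrors the grammar above: a compound concept wraps a simple one, and a combined concept wraps two concepts with a set operator.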
In addition, there are fuzzy relations such as “close to”, “at least” and “at most” [4]. The existing query method ARQ [15] cannot handle fuzzy spatiotemporal data with fuzzy relations. Hence, this paper extends the FILTER constraint of SPARQL to allow fuzzy operands and fuzzy operators. The extended form is closer to natural language and helps users express their query intentions more effectively and conveniently [16].
In this study, we define the meteorological spatiotemporal data model and describe the temporal, spatial and fuzzy attributes of meteorological data. We then integrate meteorological data from different data sources. First, relational data is transformed into XML Schema according to mapping rules, and the XML Schema of XML data sources is obtained by transforming their DTD files; by exploiting the semi-structured nature of XML, this step resolves the structural heterogeneity between data sources. The XML Schema is then mapped into RDF Schema using the corresponding transformation methods, and a local ontology is constructed from the RDF Schema of each local data source. Finally, the local ontologies are mapped to a global ontology to complete the integration process, which resolves the semantic heterogeneity between data sources. On the integrated global ontology, we answer queries based on the fuzzy query conditions entered by users. We propose two query condition conversion (QCC) algorithms, one for fuzzy operands and one for fuzzy operators, which transform fuzzy query conditions into exact query conditions, and we implement the query process in SPARQL. We also verify the accuracy and effectiveness of the query method experimentally.
Unlike existing methods, our approach exploits the semi-structured nature of XML Schema to resolve the structural heterogeneity between data sources, and the integration process enables global queries across them.
The rest of this paper is organized as follows. Section 2 establishes the global semantic model and presents the process for integrating different data sources. Section 3 proposes two query condition conversion algorithms. Section 4 presents the experimental evaluation, and Section 5 concludes the paper.
The integration of multi-source heterogeneous data
Meteorological data sources are usually independent and autonomous, and many applications may already be built on them, so modifying the existing data source models is often not allowed. There is no unified specification for data source models, and semantic heterogeneity is widespread. To better integrate the various spatiotemporal data sources, a unified semantic model is needed to standardize the description of the data models. Focusing on the meteorological domain, this paper establishes a domain ontology that unifies, at the semantic level, the global semantic model, the meteorological domain concepts and the relationships between them.
In addition, the semantic heterogeneity of each local spatiotemporal data source can be handled through the semantic mapping relationship of the model.
Multi-source heterogeneous data integration proceeds as follows: each data source is mapped into XML Schema according to an XML mapping file; the XML Schema is converted into RDF Schema according to an RDF mapping file; a local ontology is constructed from the RDF Schema; and the local ontologies are mapped into a global ontology according to merging rules. Based on this process, we propose the integration framework shown in Fig. 1.

The integration framework of multi-source heterogeneous data.
This paper takes fuzzy spatiotemporal meteorological data as an example to illustrate the steps for building the ontology. First, determine the domain scope of the integration: here, multi-source heterogeneous fuzzy spatiotemporal meteorological data (FSTMD). Second, once the domain has been identified, identify its key terms. FSTMD is represented by a mapping FSTD over the spatial, temporal and fuzzy membership dimensions, FSTD: spatial × time × fuzzy ⟶ FSTD. Finally, define the global semantic model. This step addresses two issues: defining the classes and their hierarchy, and defining the attributes of the classes. The class hierarchy is defined top-down. At the top level, the global semantic model mainly includes the unique identifier OID and the classes Pos, Motion and Action of fuzzy spatiotemporal meteorological data. The subclasses of OID are HT and Attr.
Next, the attributes of the classes OID, HT, Attr, Pos, Motion and Action are defined; their values describe the internal structure of fuzzy spatiotemporal meteorological data. Oid, the primary key of OID, is of type int and serves as a foreign key in all feature classes. The attribute domains of the classes are as follows:
OID: primary key Oid (int); the remaining attributes are of type string.
Pos: primary key Pid (int); PST and PET are of type Datetime; the remaining attributes are of type float.
Motion: primary key Mid (int); MST and MET are of type Datetime; the remaining attributes are of type float.
Attr: primary key Atid (int); ACST and ACET are of type Datetime; Co_CiF and Cl_CF are of type float; the remaining attributes are of type string.
Action: primary key Aid (int); AST and AET are of type Datetime; the remaining attributes are of type bool.
HT: primary key Hid (int); HST and HET are of type Datetime; the remaining attributes are of type string.
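The attribute domains of the global semantic model can be captured as a small lookup table. The sketch below encodes only the keys, Datetime fields, explicitly typed fields and default types stated above; the function name `attribute_type` and any attribute names not listed above (such as `rain`) are hypothetical placeholders.

```python
from datetime import datetime

# Per-class declarations from the global semantic model description:
# primary key (int), Datetime fields, explicitly typed fields, default type.
GLOBAL_MODEL = {
    "OID": {"key": "Oid", "times": (), "typed": {}, "default": str},
    "Pos": {"key": "Pid", "times": ("PST", "PET"), "typed": {}, "default": float},
    "Motion": {"key": "Mid", "times": ("MST", "MET"), "typed": {}, "default": float},
    "Attr": {"key": "Atid", "times": ("ACST", "ACET"),
             "typed": {"Co_CiF": float, "Cl_CF": float}, "default": str},
    "Action": {"key": "Aid", "times": ("AST", "AET"), "typed": {}, "default": bool},
    "HT": {"key": "Hid", "times": ("HST", "HET"), "typed": {}, "default": str},
}


def attribute_type(cls: str, attr: str) -> type:
    """Return the Python type matching the declared domain of cls.attr."""
    spec = GLOBAL_MODEL[cls]
    if attr == spec["key"] or attr == "Oid":  # primary key, or the Oid foreign key
        return int
    if attr in spec["times"]:
        return datetime
    # Explicitly typed attributes first, otherwise the class-wide default.
    return spec["typed"].get(attr, spec["default"])
```

For example, `attribute_type("Attr", "Cl_CF")` returns `float`, while any other non-key, non-time attribute of `Attr` falls back to `str`.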
Based on the above process, the global semantic model of the fuzzy spatiotemporal meteorological data is established as Fig. 2.

The global semantic model of FSTMD.
To define a local semantic model, we need to map different data sources into XML Schema, transform XML Schema into RDF Schema, and finally construct a local ontology based on RDF Schema, that is, a local semantic model.
First, consider the relational data source FST1, whose relational schema is mapped to XML Schema. Its tables and attributes are as follows:
OID (Oname, Ostate, Time)
ATTR (cloud_color, covered_city, fuzzy, time)
Position (Oid, min, max, time, possibility)
Motion (OID, direction, velocity, Time, $P_{od}$, $P_{ov}$)
FST1 is then converted to XML Schema according to the mapping rules in [18].
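The rules of [18] are not detailed here. As a simplified, hypothetical stand-in, the sketch below maps a table to an `xsd:element` with a `complexType` and each column to an `xsd:attribute`, mirroring the shape of the XML Schema fragment given for FST2. The SQL column types and the `SQL_TO_XSD` correspondence are assumptions, since the column types of FST1 are not listed.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed correspondence between SQL column types and XML Schema datatypes.
SQL_TO_XSD = {
    "INT": "xsd:int",
    "FLOAT": "xsd:float",
    "VARCHAR": "xsd:string",
    "DATETIME": "xsd:dateTime",
}


def table_to_xsd(table, columns):
    """Map one relational table to an XML Schema element whose columns become attributes.

    columns: list of (column_name, sql_type) pairs.
    """
    ET.register_namespace("xsd", XSD_NS)  # serialize with the conventional xsd: prefix
    schema = ET.Element(f"{{{XSD_NS}}}schema")
    element = ET.SubElement(schema, f"{{{XSD_NS}}}element", name=table)
    complex_type = ET.SubElement(element, f"{{{XSD_NS}}}complexType")
    for column, sql_type in columns:
        ET.SubElement(complex_type, f"{{{XSD_NS}}}attribute",
                      name=column, type=SQL_TO_XSD[sql_type.upper()])
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    # The ATTR table of FST1; the SQL types here are assumptions.
    print(table_to_xsd("ATTR", [("cloud_color", "VARCHAR"), ("covered_city", "VARCHAR"),
                                ("fuzzy", "FLOAT"), ("time", "VARCHAR")]))
```

Mapping columns to attributes rather than child elements keeps each tuple flat, which matches the attribute-based fragment used for the XML source.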
For XML data sources, either DTD or XML Schema can be used to describe the metadata; this paper uses XML Schema, obtained by directly converting the DTD file. The XML Schema fragment of the XML data source FST2 is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Cloud">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ATTR">
          <xsd:complexType>
            <xsd:attribute name="cloud_color" type="xsd:string"/>
            <xsd:attribute name="covered_city" type="xsd:string"/>
            <xsd:attribute name="fuzzy" type="xsd:float"/>
            <xsd:attribute name="time" type="xsd:string"/>
            <!-- ...... -->
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="name" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Because these XML Schema files are derived from different data sources, semantic heterogeneity remains among them. To resolve it, the XML Schema is converted into RDF Schema according to the mapping rules in [22]. The XML Schema fragment is converted into RDF triples, and the result is shown in Table 1.
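As a hypothetical, simplified illustration of XML Schema to RDF Schema conversion (not the exact rules of [22]), the sketch below turns every named `xsd:element` into an `rdfs:Class` and every `xsd:attribute` into an `rdf:Property` whose `rdfs:domain` is the enclosing element and whose `rdfs:range` is its XSD datatype. Triples are plain string tuples rather than objects from an RDF library, and the function name `xsd_to_rdfs` is illustrative.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"


def xsd_to_rdfs(xsd_text):
    """Convert an XML Schema document into (subject, predicate, object) triples."""
    triples = []

    def walk(node, owner):
        # owner is the name of the nearest enclosing xsd:element (None at the top level).
        for child in node:
            if child.tag == XSD + "element":
                name = child.get("name")
                triples.append((name, "rdf:type", "rdfs:Class"))
                walk(child, name)
            elif child.tag == XSD + "attribute" and owner is not None:
                prop = child.get("name")
                triples.append((prop, "rdf:type", "rdf:Property"))
                triples.append((prop, "rdfs:domain", owner))
                triples.append((prop, "rdfs:range", child.get("type")))
            else:
                # complexType, sequence, ...: descend without changing the owner.
                walk(child, owner)

    walk(ET.fromstring(xsd_text), None)
    return triples
```

Applied to the FST2 fragment, this yields classes `Cloud` and `ATTR`, with `fuzzy` as a property of `ATTR` ranging over `xsd:float`. A fuller mapping would qualify property names by their owner, because the same attribute name can appear under several elements.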
The RDF triplet of FSTACT
Fig. 3 shows the local ontology constructed from the RDF model of the local fuzzy spatiotemporal meteorological data source FST1.

The local ontology of FST1.
Fig. 4 shows the local ontology constructed from the RDF model of the local fuzzy spatiotemporal meteorological data source FST2.

The local ontology of FST2.
Semantic mapping associates the concepts of different local ontologies according to their semantic relationships. The local ontologies are first aligned through semantic mapping and then merged into the global ontology.
There are three types of mappings:
Super Class: A super B means that A is the parent concept of B.
Sub Class: A sub B means that A is a sub-concept of B.
Equal Class: A equal B means that A is conceptually equivalent to B.
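A minimal sketch of how the three mapping types could drive the merge into a global ontology. It reads “A super B” as making A the parent of B and “A sub B” as making A a child of B. Equal classes are collapsed into one global class with a small union-find, and Super/Sub mappings become (child, parent) subclass edges. The source-prefixed class names and the function name `merge_ontologies` are hypothetical.

```python
def merge_ontologies(local_classes, mappings):
    """Merge local ontology classes into a global ontology.

    local_classes: iterable of source-qualified class names, e.g. "FST1:ATTR".
    mappings: (A, relation, B) triples with relation in {"super", "sub", "equal"}.
    Returns (global_classes, subclass_edges), each edge being (child, parent).
    """
    parent = {c: c for c in local_classes}

    def find(c):
        # Follow parent links to the canonical class, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Equal Class: A and B denote the same concept, so unify them.
    for a, rel, b in mappings:
        if rel == "equal":
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # deterministic canonical name

    edges = set()
    for a, rel, b in mappings:
        if rel == "sub":      # A is a sub-concept of B
            edges.add((find(a), find(b)))
        elif rel == "super":  # A is the parent concept of B
            edges.add((find(b), find(a)))
        elif rel != "equal":
            raise ValueError(f"unknown mapping type: {rel}")

    return {find(c) for c in parent}, edges
```

Resolving all Equal mappings before adding subclass edges ensures that every edge points at canonical global classes, whichever local name the mapping used.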
Fig. 5 shows the global ontology obtained by merging the local ontologies of FST1 and FST2 through semantic mapping.

The global ontology of FSTACT.
This section introduces two kinds of fuzzy query conditions, those with fuzzy operators and those with fuzzy operands, each illustrated by an example, and proposes a conversion method for each that turns the fuzzy query into an exact SPARQL query.
Conversion of fuzzy query conditions with fuzzy operators
In this subsection, we discuss three main fuzzy relations: “close to”, “at least” and “at most”. The membership functions γ, ɛ and σ required for the fuzzy transformation are defined as follows, where a, b and c are set to the values that domain experts consider most appropriate.
For fuzzy query conditions that contain fuzzy operators, we propose the query condition conversion (QCC) algorithm for fuzzy operators, shown as Algorithm 1.
In the algorithm, Qr denotes a query condition with a fuzzy relation, and Qr1, Qr2 and Qr3 denote the operators “close to”, “at least” and “at most”, respectively (Lines 1–5). The algorithm then examines the query condition entered by the user. If the fuzzy operator is “close to”, the lower and upper bounds of the query interval are calculated with function (1) and a classic SPARQL query is performed (Lines 6–9). If the fuzzy operator is “at least”, the bounds are calculated with function (2) and a classic SPARQL query is performed (Lines 10–13). Otherwise, the threshold is calculated with function (3) and the query results are returned to the user (Lines 14–18).
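The membership functions themselves are not shown, so the following is only a hedged sketch of the control flow described for Algorithm 1. It assumes hypothetical piecewise-linear membership functions with a `spread` parameter and an acceptance threshold `alpha`; these do not correspond to the paper's γ, ɛ, σ or a, b, c, and the variable name `temp` is illustrative. Each fuzzy operator is turned into the crisp range where membership is at least `alpha`, then emitted as a SPARQL FILTER.

```python
def close_to_interval(v, spread, alpha):
    """'close to v': assumed triangular membership mu(x) = max(0, 1 - |x - v| / spread).
    mu(x) >= alpha  <=>  |x - v| <= (1 - alpha) * spread."""
    d = (1.0 - alpha) * spread
    return v - d, v + d


def at_least_bound(v, spread, alpha):
    """'at least v': assumed mu(x) = 1 for x >= v, falling linearly to 0 at v - spread."""
    return v - (1.0 - alpha) * spread


def at_most_bound(v, spread, alpha):
    """'at most v': assumed mu(x) = 1 for x <= v, falling linearly to 0 at v + spread."""
    return v + (1.0 - alpha) * spread


def qcc_operator(var, op, v, spread, alpha):
    """Convert a fuzzy-operator condition into an exact SPARQL FILTER clause."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if op == "close to":
        lo, hi = close_to_interval(v, spread, alpha)
        return f"FILTER(?{var} >= {lo:g} && ?{var} <= {hi:g})"
    if op == "at least":
        return f"FILTER(?{var} >= {at_least_bound(v, spread, alpha):g})"
    if op == "at most":
        return f"FILTER(?{var} <= {at_most_bound(v, spread, alpha):g})"
    raise ValueError(f"unsupported fuzzy operator: {op}")
```

For example, `qcc_operator("temp", "close to", 20, 5, 0.5)` produces `FILTER(?temp >= 17.5 && ?temp <= 22.5)`, which can be placed in the WHERE clause of an ordinary SPARQL query.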
In natural language, words such as “very” and “extremely” are added as prefixes to adjust the degree of certainty of a word’s meaning, turning the original word into a new one; such a prefix is called a modal operator. Fuzzy terms built from modal operators are used in fuzzy queries to express degree. For example, the linguistic variable x = “cold” can be adjusted into the set of fuzzy terms W(x) = {“Absolutely cold”, “Extremely cold”, “Very cold”, “cold”}. These fuzzy terms are ordered linguistic values, so each term in W(x) can be mapped to its corresponding subdomain in Table 2 to determine the membership interval of the term.
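The following sketch illustrates the idea of mapping ordered linguistic values to subdomains using hypothetical values, not those of Table 2. It assumes that the base domain of “cold” is split into equal-width, contiguous subdomains ordered from the most extreme term to the mildest. The domain bounds of −30 °C to 10 °C and the function name `ordered_subdomains` are made up for illustration.

```python
def ordered_subdomains(terms, lo, hi):
    """Assign each ordered term an equal-width subdomain of [lo, hi].

    terms are ordered from the most extreme (closest to lo) to the mildest.
    Returns a dict mapping term -> (start, end).
    """
    if not terms or hi <= lo:
        raise ValueError("need at least one term and lo < hi")
    width = (hi - lo) / len(terms)
    return {t: (lo + k * width, lo + (k + 1) * width) for k, t in enumerate(terms)}


W = ["Absolutely cold", "Extremely cold", "Very cold", "cold"]
# Hypothetical base domain for "cold", in °C.
SUBDOMAINS = ordered_subdomains(W, -30.0, 10.0)
```

With these assumptions, “Very cold” maps to [−10, 0] and “cold” to [0, 10]. A real table would use expert-chosen, possibly unequal, boundaries.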
Subdomains of ordered linguistic values
For query conditions that contain fuzzy concepts (fuzzy operands), we propose the QCC algorithm for fuzzy operands, shown as Algorithm 2.
Algorithm 2 converts fuzzy queries as follows. It first examines the query condition entered by the user. If the condition is a simple or compound fuzzy concept, Translation Rule 1 of [5] for a fuzzy term Y with $Y_a = [m, n]$ is applied to convert the fuzzy term into a classic query interval, and the query result is returned (Lines 1–9). Otherwise, Translation Rule 2 of [5] for a fuzzy term Y with $Y_a = [m, n] \cup [p, q]$ is applied to convert the fuzzy term into classic query intervals, and the query result is returned (Lines 10–13). Finally, the result is submitted to the user (Line 14).
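Translation Rules 1 and 2 are defined in [5] and are not restated here. Assuming the fuzzy term has already been resolved to its crisp interval(s) Y_a, a minimal sketch of the final step might emit one range test for [m, n] (Rule 1 style) or a disjunction of two range tests for [m, n] ∪ [p, q] (Rule 2 style). The function name `qcc_operand`, the variable names and the interval values in the examples are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def qcc_operand(var: str, intervals: List[Interval]) -> str:
    """Turn the crisp interval(s) of a fuzzy term into a SPARQL FILTER.

    One interval   -> Rule 1 style:  m <= ?var <= n
    Two intervals  -> Rule 2 style:  (m <= ?var <= n) || (p <= ?var <= q)
    """
    if len(intervals) not in (1, 2):
        raise ValueError("expected [m, n] or [m, n] ∪ [p, q]")
    tests = []
    for m, n in intervals:
        if m > n:
            raise ValueError(f"empty interval [{m}, {n}]")
        tests.append(f"(?{var} >= {m:g} && ?{var} <= {n:g})")
    return "FILTER(" + " || ".join(tests) + ")"
```

For instance, `qcc_operand("temp", [(-10, 0), (25, 30)])` yields `FILTER((?temp >= -10 && ?temp <= 0) || (?temp >= 25 && ?temp <= 30))`. The disjunction keeps a combined concept such as a union of two terms in a single FILTER.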
Experimental setup
The experiments are conducted on a computer running Windows 2007 with an Intel P4 3.2 GHz CPU and 4 GB of RAM. All algorithms are implemented in Java with MySQL. We use one real dataset and one virtual dataset to evaluate our approach. For the real dataset, we set up a meteorological information database (MIDB) containing 58,000 tuples from the World-Wide-Meteorological Information Site. Following the proposed fuzzy spatiotemporal model, we also create 600 virtual records (VDB), stored in a relational database and in XML files.
Experimental results
The experiments on fuzzy operators aim to determine the optimal values of the three parameters ɛ, γ and μ of the membership functions used by Algorithm 1. For the fuzzy operator “at least”, we assign five values {0.5, 0.6, 0.7, 0.8, 0.9} to ɛ and set the temperature to 15°C, 20°C, 25°C, 30°C and 35°C, giving 25 groups of experiments for this operator. Likewise, 25 groups are launched for each of the fuzzy operators “at most” and “close to”. Using the method for resolving data heterogeneity proposed in Section 2, we integrate the real and virtual datasets from different data sources separately, forming two unified RDF files. Queries with the three fuzzy operators are performed on the RDF files of the real dataset MIDB and the virtual dataset VDB. Recall, precision and F-Score are used to evaluate the effectiveness of Algorithm 1 and to select the optimal parameters. They are calculated as follows. For each query criterion, the user selects the 20 most qualified records, denoted $R_{u}$, and 20 further records are added at random, forming a set of 40 results relevant to the query; most of the added records, denoted $R_{rand}$, also satisfy the query criterion. $R_{QCC}$ denotes the number of results returned by Algorithm 1, and $R_{ret}$ denotes the number of user-selected results within $R_{QCC}$. Recall, the ratio of $R_{ret}$ to the sum of $R_{u}$ and $R_{rand}$, measures the completeness of the query results; precision, the ratio of $R_{ret}$ to $R_{QCC}$, measures their accuracy; and the F-Score, the harmonic mean of recall and precision, serves as a comprehensive indicator:
$$\text{Recall} = \frac{R_{ret}}{R_{u} + R_{rand}}, \qquad \text{Precision} = \frac{R_{ret}}{R_{QCC}}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
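As a worked illustration of the three measures, the sketch below computes them from the four counts, assuming that precision is taken over the results returned by Algorithm 1 (R_QCC), as in the standard definition. The function name `evaluate` is illustrative, and the counts in the example are invented, not results from the experiments.

```python
def evaluate(r_ret, r_u, r_rand, r_qcc):
    """Recall, precision and F-Score from the experiment's counts.

    r_ret  : user-selected results among those returned by the algorithm
    r_u    : results selected by the user (20 in the experiments)
    r_rand : randomly added results (20 in the experiments)
    r_qcc  : total number of results returned by the algorithm
    """
    recall = r_ret / (r_u + r_rand)
    precision = r_ret / r_qcc if r_qcc else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return recall, precision, f_score


# Hypothetical counts: 36 of the 40 relevant results are among 50 returned ones.
recall, precision, f_score = evaluate(r_ret=36, r_u=20, r_rand=20, r_qcc=50)
print(f"recall={recall:.2f} precision={precision:.2f} F={f_score:.2f}")
```

In this invented case, recall is 36/40 and precision is 36/50, so the harmonic mean lies between them, closer to the lower value.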
As Figs. 6–11 show, recall is positively correlated with the parameter value and precision is negatively correlated with it, because according to functions (1)–(3) the query interval widens as the parameter value increases. The F-Scores for all parameter values exceed 76% on average. Hence, the results for a given query condition closely match the user's intention, and Algorithm 1 is effective. As shown in Figs. 6 and 7, setting ɛ to 0.5 yields a relatively high recall (over 90% on average) and a relatively low precision (over 72% on average).

Recall, Precision, and F-Score of the “at least” Queries in the MIDB.

Recall, Precision, and F-Score of the “at least” Queries in the VDB.

Recall, Precision, and F-Score of the “at most” Queries in the MIDB.

Recall, Precision, and F-Score of the “at most” Queries in the VDB.

Recall, Precision, and F-Score of the “close to” Queries in the MIDB.

Recall, Precision, and F-Score of the “close to” Queries in the VDB.
In summary, the five parameter values selected in this experiment achieve the expected precision and recall. The query results on both MIDB and VDB show that recall is positively related and precision negatively related to the parameter value, because according to functions (1)–(3) the query interval widens as the parameter value increases. The F-Score exceeds 76% on average for all parameter values. The results for a given query condition therefore closely match the user's intention, which shows that the QCC algorithm for fuzzy operators is effective. Comparing F-Scores yields the best parameter values, which return the most effective query results.
Following the method proposed in this paper, data from the relational data source is first converted into XML Schema, which is then converted into RDF data. We integrate MIDB into RDF file 1 and VDB into RDF file 2. Using the integration method of [11], MIDB is integrated into RDF file 3 and VDB into RDF file 4. This experiment uses Algorithm 2 to query different wind speeds on RDF file 1 and RDF file 3, with fuzzy operands W = {low, fairly low, fairly high, high}. The same conditional query is then performed on RDF file 2 and RDF file 4 with the MiniCon algorithm. In addition, Algorithm 2 is used to query different temperatures on RDF file 1 and RDF file 3, with fuzzy operands T = {very cold, cold, hot, very hot}, and the same experiment is performed on RDF file 2 and RDF file 4 with the MiniCon algorithm.
As Figs. 12–15 show, Algorithm 2 achieves the expected high recall (90% on average) and acceptable precision (more than 70% on average). This shows that Algorithm 2 is effective and that the returned results meet the user's query intention. The recall of Algorithm 2 is significantly higher than that of MiniCon, which indicates that our data integration approach is more accurate than MiniCon. The F-Score of Algorithm 2 is also higher, indicating that our query approach is more effective than that of [11].

Recall, Precision, and F-Score of the Querying Wind Speed in the MIDB.

Recall, Precision, and F-Score of the Querying Wind Speed in the VDB.

Recall, Precision, and F-Score of the Querying Temperature in the MIDB.

Recall, Precision, and F-Score of the Querying Temperature in the VDB.
Our method transforms the relational data source into XML. XML is plain text that can be marked up with a variety of definitions, such as strings and child elements. This mitigates the large differences between the data types of the various kinds of information in the database, so the data can be processed more comprehensively during mapping. Therefore, our query results are more accurate.
In the experimental section, we performed 10 groups of experiments, shown in Figs. 6–15. Because this paper concerns querying multi-source heterogeneous fuzzy spatiotemporal data, the experiments cover its multi-source, heterogeneous, fuzzy, spatial and temporal features. The resulting precision, recall and F-Score values are consistent with the expected behavior of querying such data. The first six groups of experiments determine the optimal values of the three parameters ɛ, γ and μ of the membership functions of fuzzy query conditions. The results show that the query interval widens as the parameter value increases, according to functions (1)–(3). The F-Scores for all parameter values exceed 76% on average. Setting ɛ to 0.5 yields a relatively high recall (over 90% on average) and a relatively low precision (over 72% on average). Overall, the selected parameter values achieve the expected precision and recall.
Conclusion
In this paper, we propose a global semantic model for fuzzy spatiotemporal meteorological data (FSTMD). Based on this model, we complete the data integration process for different data sources and resolve the semantic heterogeneity between them. Furthermore, we propose QCC algorithms for fuzzy operands and fuzzy operators, which address the problem of inaccurate results caused by fuzzy query conditions. Experiments on a real meteorological dataset identified the optimal parameter values of the QCC algorithm for fuzzy operators and showed that the query approach returns results that more accurately match the user's query intention. Experiments on virtual datasets demonstrated that the QCC algorithm for fuzzy operands is effective and that our data integration and query approach is more adaptable.
We are currently exploring several interesting research directions. For example, how can our approach query multi-source heterogeneous fuzzy spatiotemporal data in dynamic RDF graphs? One possible solution is to combine our approach with integrity constraints.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61402087), the Natural Science Foundation of Hebei Province (F2019501030), the Natural Science Foundation of Liaoning Province (2019-MS-130), the Key Project of Scientific Research Funds in Colleges and Universities of Hebei Education Department (ZD2020402), the Fundamental Research Funds for the Central Universities (N2023019), and in part by the Program for 333 Talents in Hebei Province (A202001066).
The authors would also like to express their gratitude to the anonymous reviewers for providing very helpful suggestions.
