Abstract
With the growing importance of fuzzy spatiotemporal data in information applications, there is an increasing need for research on methods for integrating multi-source heterogeneous fuzzy spatiotemporal data. In this paper, we first propose a fuzzy spatiotemporal RDF graph model based on RDF (Resource Description Framework), which was proposed by the World Wide Web Consortium (W3C) to represent data as triples (subject, predicate, object). Secondly, we analyze and classify the heterogeneity problems of multi-source heterogeneous fuzzy spatiotemporal data, and use the fuzzy spatiotemporal RDF graph model to define rules that solve them. In addition, based on the characteristics of RDF triples, we analyze the heterogeneity problems that arise when multi-source heterogeneous fuzzy spatiotemporal data are integrated in RDF triples, and provide the integration method FRDFG. Finally, we report experimental results that validate our approach and show its superiority over the compared methods.
Introduction
With the rapid development of the Internet, the amount of data generated in every area is growing exponentially. To make efficient use of such massive data, researchers have studied data integration methods [9, 22]. There are three mature technologies for integrating multi-source heterogeneous data: federated databases [2], data warehouses [7] and middleware [20]. Federated databases [1, 25] are an early and comparatively simple way to integrate heterogeneous data. A federated database is a collection of independent but cooperating component databases; that is, the data sources are independent of each other and are mapped one by one through interfaces defined for data exchange. For example, Haas et al. [8] use federated database technology to manage information from various life science databases. The biggest advantage of federated databases is that they are relatively easy to implement, but they require a large number of interfaces for data interaction, which entails a heavy workload. Data warehouse technology [12] preprocesses and transforms copies of the data from multiple sources, unifies them according to the schema of the data warehouse, and stores the processed data in the warehouse. A data warehouse is essentially a subject-oriented, integrated and relatively stable data collection. Data integration based on middleware technology [6, 15] is similar to the data warehouse approach, but the architectures differ: in middleware technology, the data remain in the heterogeneous sources, and the integration system provides a virtual integrated view and answers queries against that view. For example, Yang et al. [29] study heterogeneous data integration based on mediators and wrappers, using a transformation algorithm to convert the data into a unified XML format and thereby eliminate heterogeneity.
In real-world applications, a large amount of spatiotemporal information is vague or ambiguous. Much research on fuzzy spatiotemporal data has appeared, and most previous work focuses on modeling and querying fuzzy spatiotemporal data [4, 24]. Sözer et al. [24] apply an intelligent database architecture, which combines an object-oriented database with a knowledge base, to a meteorological database application for modeling and querying spatiotemporal objects. Cheng et al. [4] propose a model for representing fuzzy spatiotemporal objects and their topological relations; based on this model, they design basic and complex fuzzy query operators that can describe the evolution of fuzzy spatiotemporal objects over time. Ma et al. [16] extend XML Schema to describe fuzzy spatiotemporal data and capture structural information in fuzzy XML documents. Unfortunately, these models are weak in representing topological relations among data.
RDF (Resource Description Framework) [13] is a language proposed by the World Wide Web Consortium (W3C) that is inclusive and exchangeable, and easy to extend, control and integrate in data processing. Modeling spatiotemporal data based on RDF is therefore of great significance, and several works address it [5, 26]. For example, stRDF [11] specifies how spatiotemporal data are represented in RDF and makes querying spatiotemporal data more standardized. Di et al. [5] combine spatiotemporal information with RDF and present a new representation model of spatiotemporal RDF. However, none of the works mentioned above deal with fuzziness. Ma et al. [18] formally put forward a fuzzy RDF model and its algebra, which provides a way to express fuzzy information in RDF. Besides, Wang et al. [27] propose an uncertain spatiotemporal data model and define a corresponding constraint framework for it. Research on modeling and querying homogeneous fuzzy spatiotemporal data is thus approaching maturity.
In practice, however, fuzzy spatiotemporal information is often heterogeneous and comes from multiple sources. There are works on integrating heterogeneous data [17, 28], spatial data [3], temporal data [23] and spatiotemporal data [10], but little attention has been paid to integrating multi-source heterogeneous fuzzy spatiotemporal data, so how to integrate and store such data remains an open problem. As with homogeneous fuzzy spatiotemporal data, RDF is a good choice for integrating multi-source heterogeneous fuzzy spatiotemporal data. In this paper, we therefore investigate how to model and integrate multi-source heterogeneous fuzzy spatiotemporal data based on RDF.
The contributions of this paper are as follows. We define the concept of fuzzy spatiotemporal data and construct a corresponding RDF-based model to represent multi-source heterogeneous fuzzy spatiotemporal data. We analyze the semantic conflicts that arise during integration and give the corresponding solutions. We then put forward the integration algorithms FRDFG and FSTR, which divide nodes into four categories for processing. Finally, we conduct comprehensive experiments to demonstrate the benefits of the proposed approach over previous approaches.
The remainder of this paper is organized as follows. Section 2 devises an RDF-based fuzzy spatiotemporal data model. The integration method for fuzzy spatiotemporal data is presented in Sections 3 and 4. Section 5 gives the experimental evaluation, and Section 6 concludes the paper.
Fuzzy spatiotemporal data model based on RDF
According to the characteristics of multi-source heterogeneous fuzzy spatiotemporal data, this section introduces the following components used to construct the RDF-based fuzzy spatiotemporal data model.
Oid: the identifier of the fuzzy spatiotemporal data, which describes its changing states;
Attr: the set of attribute names of the fuzzy spatiotemporal data, which denote its general properties;
Motion: describes the next motions of the fuzzy spatiotemporal data;
Rs: describes the resource of the fuzzy spatiotemporal data;
Sp: describes the spatial information of the fuzzy spatiotemporal data;
T: describes the temporal information of the fuzzy spatiotemporal data.
Vet is a finite set of vertices;
E ⊆ Vi × Vj is a set of directed edges, where Vi, Vj ⊆ Vet;
Level is the set of labels at vertices and edges;
μ: Vet ⟶ [0, 1] is a fuzzy subset of Vet;
ρ: E ⟶ [0, 1] is a fuzzy relation on the fuzzy subset μ.
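As a minimal sketch of the fuzzy graph tuple above, the Python class below stores μ over vertices and ρ over directed edges and lists the graph as fuzzy triples. The class name `FuzzyRDFGraph`, its methods and the example identifiers are hypothetical. The check ρ(u, v) ≤ min(μ(u), μ(v)) is an assumption taken from the usual meaning of "a fuzzy relation on a fuzzy subset" in fuzzy graph theory; the text does not state it explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyRDFGraph:
    """Hypothetical container for (Vet, E, Level, mu, rho)."""
    mu: dict = field(default_factory=dict)     # vertex -> membership degree
    rho: dict = field(default_factory=dict)    # (u, v) edge -> membership degree
    level: dict = field(default_factory=dict)  # vertex or edge -> label

    def add_vertex(self, v, label, degree):
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        self.mu[v] = degree
        self.level[v] = label

    def add_edge(self, u, v, label, degree):
        if u not in self.mu or v not in self.mu:
            raise KeyError("both endpoints must already be vertices")
        # Assumed fuzzy-graph constraint: rho(u, v) <= min(mu(u), mu(v)).
        if not 0.0 <= degree <= min(self.mu[u], self.mu[v]):
            raise ValueError("edge degree exceeds its endpoint memberships")
        self.rho[(u, v)] = degree
        self.level[(u, v)] = label

    def triples(self):
        """Return (subject, predicate, object, degree) for every edge."""
        return [(u, self.level[(u, v)], v, d) for (u, v), d in self.rho.items()]


g = FuzzyRDFGraph()
g.add_vertex("typhoon_01", "Oid", 1.0)
g.add_vertex("sp_01", "Sp", 0.8)
g.add_edge("typhoon_01", "sp_01", "hasLocation", 0.7)
print(g.triples())  # [('typhoon_01', 'hasLocation', 'sp_01', 0.7)]
```

Storing the edge label in the same `level` map as the vertex labels mirrors the single Level set in the definition.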
$T_p$ is the time point of the temporal information;
$T_i = [t_s, t_e]$ is the time interval representing the temporal information, where $t_s$ is the start time and $t_e$ is the end time of $T$;
$p \in [0, 1]$ represents the possibility of the temporal information of the fuzzy spatiotemporal data.
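A small sketch of the temporal component T = (Tp, Ti, p) as a Python value object, assuming times are plain numbers (for example hours); the text fixes no time unit. The name `FuzzyTime` is hypothetical, and because the definition does not require Tp to lie inside Ti, no such check is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyTime:
    """Hypothetical encoding of T = (Tp, Ti, p) with Ti = [ts, te]."""
    tp: float  # time point Tp
    ts: float  # start time ts of the interval Ti
    te: float  # end time te of the interval Ti
    p: float   # possibility degree in [0, 1]

    def __post_init__(self):
        if self.ts > self.te:
            raise ValueError("start time must not exceed end time")
        if not 0.0 <= self.p <= 1.0:
            raise ValueError("possibility must lie in [0, 1]")

    def interval(self):
        return (self.ts, self.te)


t = FuzzyTime(tp=12.0, ts=10.0, te=14.0, p=0.8)
print(t.interval())  # (10.0, 14.0)
```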
Vet’ is a copy of Vet, a finite set of vertices;
E’ is a copy of E;
Level’ is a copy of Level, the set of labels at vertices and edges;
μ’: Vet’ ⟶ [0, 1] is the fuzzy subset of Vet’;
ρ’: E’ ⟶ [0, 1] is the fuzzy relation on the fuzzy subset μ’.

Fuzzy spatiotemporal RDF graph.
The next two sections propose the integration method of multi-source heterogeneous fuzzy spatiotemporal data, and the flowchart is shown in Fig. 2.

Flowchart of the integration method.
This section summarizes the semantic conflicts in multi-source heterogeneous fuzzy spatiotemporal data integration and the corresponding solutions. There are five types: integration of fuzzy spatiotemporal data from the same source, data source conflicts, node/label naming conflicts, node value conflicts and fuzzy value conflicts.
If the existing graph contains a node that represents the same data as G’, then G’ is added after node $G_{ft}$ ($G_{ft}$ is the last node that has the same type as G) and the member properties of G’ are preserved; otherwise, G’ is added as the last child node of the root in the existing FSRG, with its properties preserved.
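The placement rule above can be sketched in Python as follows. The root's children are modelled as an ordered list of dicts with a `"type"` key, and the `same_data` predicate (here, Oid equality) is an assumption, since the text does not say how "the same data" is detected; `integrate_same_source` and `same_oid` are hypothetical names.

```python
def integrate_same_source(root_children, g_new, same_data):
    """Place sub-graph g_new among the root's children (hypothetical layout).

    root_children: ordered list of dicts, each with at least a "type" key.
    same_data: predicate deciding whether two nodes represent the same data.
    """
    if any(same_data(child, g_new) for child in root_children):
        # Index of G_ft, the last existing node with the same type as g_new.
        same_type = [i for i, c in enumerate(root_children) if c["type"] == g_new["type"]]
        if same_type:
            root_children.insert(same_type[-1] + 1, g_new)
            return root_children
    # Otherwise g_new, with its properties intact, becomes the root's last child.
    root_children.append(g_new)
    return root_children


def same_oid(a, b):
    """Hypothetical notion of 'same data': identical Oid values."""
    return a["oid"] == b["oid"]


children = [
    {"oid": "t1", "type": "typhoon"},
    {"oid": "t2", "type": "typhoon"},
    {"oid": "r1", "type": "rainfall"},
]
integrate_same_source(children, {"oid": "t2", "type": "typhoon", "p": 0.9}, same_oid)
print([c["oid"] for c in children])  # ['t1', 't2', 't2', 'r1']
```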
Step 1: A new node motion is created, and the attribute nodes shared by the two fuzzy RDF sub-graphs are selected to form triples with motion.
Step 2: The two fuzzy spatiotemporal RDF sub-graphs become sub-graphs of the new node, and the other attribute nodes and the information contained in the child nodes remain unchanged.
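The two steps above can be sketched as follows, under two assumptions of ours: "the same attribute nodes" means attributes with the same name and the same value, and the shared attributes are copied to motion rather than removed from the children (the text says the other attribute nodes remain unchanged). The dict layout and the name `merge_under_motion` are hypothetical.

```python
def merge_under_motion(g1, g2):
    """Create a hypothetical 'motion' node over two sub-graphs.

    Each sub-graph is a dict with an "attrs" mapping {name: value}.
    """
    shared = {
        name: value
        for name, value in g1["attrs"].items()
        if name in g2["attrs"] and g2["attrs"][name] == value
    }
    # Step 1: shared attribute nodes form triples with the new node motion.
    triples = [("motion", name, value) for name, value in sorted(shared.items())]
    # Step 2: both sub-graphs hang under motion; their own attributes are untouched.
    return {"node": "motion", "triples": triples, "children": [g1, g2]}


a = {"oid": "typhoon_A", "attrs": {"category": "typhoon", "wind": 35}}
b = {"oid": "typhoon_B", "attrs": {"category": "typhoon", "wind": 40}}
merged = merge_under_motion(a, b)
print(merged["triples"])  # [('motion', 'category', 'typhoon')]
```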

Fuzzy spatiotemporal RDF graph of Example 2.
The node $ASPO_{gt}$ (ASPO as in Definition 5) is added to graph G, and the existing graph and its fuzzy values remain unchanged.
Step 1: A new node is created and assigned the properties shared by both RDF sub-graphs, with their labels and values unchanged.
Step 2: The two fuzzy spatiotemporal RDF sub-graphs become children of the new node, and the properties of the other child nodes remain unchanged.

Fuzzy spatiotemporal RDF graph of Example 3.
A new node is created and assigned the properties shared by the two RDF sub-graphs, and one of the conflicting node values is taken as the final value. The two sub-graphs become children of the new node, and their other attributes remain unchanged.
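A sketch of this node value rule: properties present in both sub-graphs move to a new parent, and when their values differ one of them is kept as the final value. The text does not say which value wins, so the default policy `keep_first` is a hypothetical choice, passed in as a parameter so that another policy can be substituted; the attribute names are illustrative.

```python
def keep_first(v1, v2):
    """Hypothetical default policy: keep the value from the first sub-graph."""
    return v1


def merge_with_value_conflict(g1, g2, choose=keep_first):
    """Build a new parent node over attribute dicts g1 and g2."""
    parent, conflicts = {}, {}
    for name in sorted(g1.keys() & g2.keys()):
        v1, v2 = g1[name], g2[name]
        if v1 == v2:
            parent[name] = v1
        else:
            parent[name] = choose(v1, v2)  # one conflicting value becomes final
            conflicts[name] = (v1, v2)
    # Both sub-graphs become children; their other attributes stay unchanged.
    return {"attrs": parent, "conflicts": conflicts, "children": [g1, g2]}


s1 = {"station": "54511", "temperature": 21.5}
s2 = {"station": "54511", "temperature": 22.0}
node = merge_with_value_conflict(s1, s2)
print(node["attrs"])  # {'station': '54511', 'temperature': 21.5}
```

Recording the discarded alternatives in `conflicts` is an extra convenience of this sketch, not part of the rule.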
$ASPO_{gt}$ is considered a possible instance describing a specific type, so the node $ASPO_G$ needs to be added. Using the data structure of the existing RDF graph G of fuzzy spatiotemporal data, the candidates $ASPO_{gt}$ and $ASPO_G$ of fuzzy objects are retained, and the corresponding fuzzy degree values in the RDF graph are integrated. We then determine whether the fuzzy values of the node properties are equal. If they are, the property values are merged into the same node; otherwise, the case is handled as Conflict 5.
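The equality test described above can be sketched as follows, assuming each candidate node maps a property name to a (value, fuzzy degree) pair. `integrate_candidates` is a hypothetical name, and using `math.isclose` for "equal" fuzzy values is our assumption about how floating-point degrees should be compared.

```python
import math


def integrate_candidates(aspo_g, aspo_gt):
    """Merge candidate property values of ASPO_G and ASPO_gt (hypothetical layout).

    Each argument maps a property name to (value, fuzzy degree).
    Returns the merged node and the properties deferred to Conflict 5.
    """
    merged = {prop: ([value], degree) for prop, (value, degree) in aspo_g.items()}
    deferred = []
    for prop, (value, degree) in aspo_gt.items():
        if prop not in merged:
            merged[prop] = ([value], degree)
        elif math.isclose(merged[prop][1], degree):
            # Equal fuzzy values: keep both candidate values in the same node.
            if value not in merged[prop][0]:
                merged[prop][0].append(value)
        else:
            # Different fuzzy values: handled as a fuzzy value conflict.
            deferred.append(prop)
    return merged, deferred


g_node = {"weather": ("rain", 0.7), "wind": ("strong", 0.6)}
gt_node = {"weather": ("storm", 0.7), "wind": ("strong", 0.8)}
merged, deferred = integrate_candidates(g_node, gt_node)
print(merged["weather"], deferred)  # (['rain', 'storm'], 0.7) ['wind']
```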

Fuzzy spatiotemporal RDF graph of Example 4.
The two fuzzy spatiotemporal RDF sub-graphs describe the same fuzzy spatiotemporal data from the same data source, but similar nodes have different fuzzy degrees. Here we assume that there are no naming or attribute value conflicts, because these can be resolved beforehand. The membership degree conflict can then be resolved with Zadeh’s intersection operator [30].
If there is a fuzzy value conflict between similar data nodes of the fuzzy spatiotemporal RDF graphs gt and G, then $ASPO_{gt}$ is a duplicate of $ASPO_G$. Assuming that the membership degrees of the similar data nodes in $ASPO_{gt}$ and $ASPO_G$ are $\sigma_{gt}$ and $\sigma_G$, the fuzzy degree of $ASPO_G$ after integration is $\min(\sigma_{gt}, \sigma_G)$.
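Zadeh's intersection operator makes this rule a one-line minimum per node. The sketch below applies it to every node shared by the two duplicates; keeping the degree of a node that appears in only one of them is our assumption, since the text only covers similar (shared) nodes. The dict layout and function name are hypothetical.

```python
def integrate_duplicate(degrees_g, degrees_gt):
    """Resolve fuzzy value conflicts with Zadeh's intersection (min).

    Both arguments map node names of ASPO_G / ASPO_gt to membership degrees.
    """
    merged = dict(degrees_g)
    for node, sigma_gt in degrees_gt.items():
        if not 0.0 <= sigma_gt <= 1.0:
            raise ValueError(f"degree of {node} is outside [0, 1]")
        # sigma after integration = min(sigma_gt, sigma_G) for shared nodes.
        merged[node] = min(merged[node], sigma_gt) if node in merged else sigma_gt
    return merged


print(integrate_duplicate({"Sp": 0.8, "T": 0.6}, {"Sp": 0.7, "T": 0.9}))
# {'Sp': 0.7, 'T': 0.6}
```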

Fuzzy spatiotemporal RDF graph of Example 5.
Above, the heterogeneity problems are analyzed and represented with the fuzzy RDF graph model, and corresponding solutions are proposed. The analysis and classification of heterogeneous conflicts is shown in Algorithm 1.
Integration of root nodes in the fuzzy spatiotemporal RDF graph
Before integration, the similarity of the objects described by the RDF graphs of fuzzy spatiotemporal data must be determined. Each fuzzy spatiotemporal object has a unique identifier Oid, which can be used for this purpose.
In Rule 6, the $Oid_i$ node values in the graph are extracted, and the similarity of the $Oid_i$ nodes in two RDF graphs is calculated by formula (1).
In formula (1), G1 and G2 are name variables of fuzzy spatiotemporal RDF graphs. If $Oid_i = Oid_{i+1}$, the two graphs represent the same object. If $Oid_i \neq Oid_{i+1}$, the two graphs describe different objects, and a new node needs to be added to transform the two RDF graphs into one.
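A sketch of the root-node decision, assuming that formula (1) reduces to an exact Oid match returning 1 or 0; the formula itself is not reproduced here, so `oid_similarity` is a hypothetical stand-in. `integrate_roots` and the `"sources"`/`"children"` keys are likewise illustrative names.

```python
def oid_similarity(g1, g2):
    """Hypothetical stand-in for formula (1): 1 if the Oids match, else 0."""
    return 1 if g1["oid"] == g2["oid"] else 0


def integrate_roots(g1, g2, new_root_id):
    """Integrate the root nodes of two fuzzy spatiotemporal RDF graphs."""
    if oid_similarity(g1, g2) == 1:
        # Same object: keep a single root; later rules merge the remaining nodes.
        return {"oid": g1["oid"], "sources": [g1, g2]}
    # Different objects: a new node joins the two graphs into one.
    return {"oid": new_root_id, "children": [g1, g2]}


a = {"oid": "typhoon_01"}
b = {"oid": "typhoon_02"}
print(integrate_roots(a, b, "root")["oid"])  # root
```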
Rules 6 and 7 judge whether nodes in the fuzzy spatiotemporal RDF graph are similar and integrate the root nodes of the fuzzy spatiotemporal RDF graph. Their application is shown in Example 6.

Fuzzy spatiotemporal RDF graph of Example 6.
Fuzzy spatiotemporal objects described in RDF can be taken as subjects. However, because fuzzy spatiotemporal data change over time, the same subject may be associated with data describing different states. This section therefore focuses on subject nodes with multiple states.
Step 1: Assume that i = 1, 2 and that the intersection of the $T_i$ nodes is $(t_1, t_2) \cap (t_3, t_4) = \varnothing$, where $t_1 < t_2 < t_3 < t_4$ or $t_3 < t_4 < t_1 < t_2$. This means that $C_i$ has different states, so S is created as the object node of the fuzzy spatiotemporal RDF graph $C_i$. Step 2: As shown in Fig. 8, new right object nodes of S, denoted $S_{i+1}, S_{i+2}, \ldots, S_{k_i}$, are created, where $k_i$ denotes the subject node in state $k_i$.
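The state-integration cases in this subsection differ only in how the two time intervals relate: disjoint (the case above), nested ($t_1 < t_3 < t_4 < t_2$) or partially overlapping (intersection $(t_3, t_2)$). A hypothetical classifier for choosing the case is sketched below; treating intervals as open (start, end) pairs, so that touching intervals count as disjoint and identical intervals as nested, is our assumption.

```python
def interval_relation(a, b):
    """Classify two open time intervals given as (start, end) pairs.

    "disjoint": separate states of the subject;
    "nested":   one interval lies inside the other;
    "overlap":  partial overlap, e.g. (t1, t2) ∩ (t3, t4) = (t3, t2).
    """
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("each interval needs start < end")
    if e1 <= s2 or e2 <= s1:
        return "disjoint"
    if (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2):
        return "nested"
    return "overlap"


for pair in [((1, 2), (3, 4)), ((3, 4), (1, 2)), ((1, 5), (2, 3)), ((1, 3), (2, 4))]:
    print(pair, interval_relation(*pair))
```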

Fuzzy spatiotemporal RDF graph of Example 7.
Step 1: Assume that i = 1, 2 and that the intersection of the $T_i$ nodes is $(t_1, t_2) \cap (t_3, t_4) \neq \varnothing$, where $t_1 < t_3 < t_4 < t_2$; then S is taken as the subject node of $C_1$ and $C_2$. Step 2: New left sub-nodes of S, denoted $S_{i+1.j+1}, S_{i+1.j+2}, \ldots, S_{i+1.k_j}$, are created, where $k_j$ denotes state $k_j$ of the subject node in state $i+1$ in the same RDF graph, as shown by S1 and S1.1 in Fig. 9.
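The dotted labels in Fig. 9 (S1, S1.1, ...) can be generated mechanically once the nested sub-states are known. In the sketch below, ordering sub-states by start time is our assumption, and `label_substates` is a hypothetical helper that also checks that every sub-state lies inside its parent interval.

```python
def label_substates(parent_label, parent_interval, substates):
    """Assign hierarchical labels parent.1, parent.2, ... to nested sub-states.

    Sub-states are ordered by start time; each must lie inside the parent.
    """
    ps, pe = parent_interval
    for s, e in substates:
        if not (ps <= s < e <= pe):
            raise ValueError(f"({s}, {e}) is not inside ({ps}, {pe})")
    ordered = sorted(substates, key=lambda iv: iv[0])
    return {f"{parent_label}.{j}": iv for j, iv in enumerate(ordered, start=1)}


print(label_substates("S1", (0, 10), [(6, 8), (2, 4)]))
# {'S1.1': (2, 4), 'S1.2': (6, 8)}
```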

Fuzzy spatiotemporal RDF graph of Example 8.
Rule 9 addresses the integration of similar subject nodes in the fuzzy spatiotemporal RDF graph whose sub-states intersect.
Step 1: Assume that i = 1, 2 and that the intersection of the $T_i$ is $(t_1, t_2) \cap (t_3, t_4) = (t_3, t_2)$, i.e. $t_1 < t_3 < t_2 < t_4$; then S is created as the subject node of $C_1$ and $C_2$. Step 2: New left sub-nodes of S, denoted $S_{i+1.j+1.l+1}, S_{i+1.j+2.l+2}, \ldots$, are created, where $i+1.j+1$ denotes state $j+1$ of the subject node in state $i+1$ in the same fuzzy spatiotemporal RDF graph, as shown by S1, S1.1, S1.1.1 and so on in Fig. 10.
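For the partially overlapping case, the shared sub-state is simply the intersection of the two intervals, $(\max(t_1, t_3), \min(t_2, t_4))$, which equals $(t_3, t_2)$ when $t_1 < t_3 < t_2 < t_4$. The hypothetical helpers below compute that intersection and build three-level labels such as S1.1.1.

```python
def overlap_substate(a, b):
    """Return the intersection of two open intervals, or None if it is empty."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def state_label(*indices, root="S"):
    """Build a dotted state label such as S1.1.1 from state indices."""
    return root + ".".join(str(i) for i in indices)


# t1 < t3 < t2 < t4, so (t1, t2) ∩ (t3, t4) = (t3, t2).
t1, t2, t3, t4 = 1, 4, 3, 6
print(overlap_substate((t1, t2), (t3, t4)))  # (3, 4)
print(state_label(1, 1, 1))                  # S1.1.1
```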

Fuzzy spatiotemporal RDF graph of Example 9.
Rule 10 ensures that FSTR preserves the integrity of information when integrating the different states of fuzzy spatiotemporal data.
Formula (1) judges whether two fuzzy RDF graphs describe the same fuzzy spatiotemporal data. Although the Oid values of the same fuzzy spatiotemporal data in the examples above are identical, each fuzzy spatiotemporal object also has a fuzzy value describing its possibility degree, so conflicts can arise between subjects and their properties. This subsection therefore proposes a method to deal with them.
Assume that the fuzzy information in the property nodes s1 and s2 has been extracted; the similarity of the property nodes is then calculated with formula (2).
Rule 11 supplements the integration method FSTR for property nodes. In formula (2), s1 and s2 denote property nodes, Oid1 and Oid2 are the property node names, and P1 and P2 are their fuzzy values.
Owing to the spatial and temporal characteristics of spatiotemporal data, the similarity measurement is divided into two parts: the integration of fuzzy temporal information and the integration of fuzzy spatial information.
In Definition 3, the temporal information of the fuzzy spatiotemporal RDF graph is represented by T(Tp, Ti, p). In general, p can be inferred from the fuzziness degree and state of Tp and Ti. Assume that F1 and F2 are fuzzy spatiotemporal RDF subgraphs from different data sources, and that F1((t1, t3), Ti1) and F2((t2, t4), Ti2) are the fuzzy temporal information of the same fuzzy spatiotemporal data, where t1 ≤ Ti1 ≤ t3 and t2 ≤ Ti2 ≤ t4.
In addition, this subsection studies the integration of fuzzy spatial object nodes. In Definition 4, the fuzzy spatiotemporal RDF graph G is formed by Sp(la, lo, p). Because the spatial information of fuzzy spatiotemporal data changes over time, the integration of spatial information must take the temporal information into account. Let Sp1 and Sp2 describe the spatial information of two RDF subgraphs of the same fuzzy spatiotemporal data. The fuzzy spatial information can then be integrated according to the following rules:
If la1 ≠ la2 or lo1 ≠ lo2, the fuzzy spatial node $Sp_i$ should store the fuzzy spatiotemporal data and the corresponding fuzzy values. If the two time intervals overlap, i.e. $(t_1, t_2) \cap (t_3, t_4)$ is one of $(t_1, t_2)$, $(t_1, t_4)$, $(t_3, t_2)$ or $(t_3, t_4)$, a new fuzzy temporal object node T is established to merge the two fuzzy time intervals. The value of the integrated fuzzy spatial object node $Sp_i$ is: $Sim_o(O_1, O_2) = P_1$, $la = \min(la_1, la_2)$, $lo = \max(lo_1, lo_2)$.
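The value assignment for the integrated node $Sp_i$ can be transcribed literally, as in the sketch below: the fuzzy value comes from the first subgraph, la is the minimum and lo the maximum of the two inputs. `FuzzySpatial` and `integrate_spatial` are hypothetical names, and la and lo are treated as plain numbers. When both coordinates coincide, the same formula leaves them unchanged, so no separate branch is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzySpatial:
    """Hypothetical encoding of Sp = (la, lo, p)."""
    la: float
    lo: float
    p: float


def integrate_spatial(sp1, sp2):
    """Apply the stated rule: Sim_o = P1, la = min(la1, la2), lo = max(lo1, lo2)."""
    return FuzzySpatial(la=min(sp1.la, sp2.la), lo=max(sp1.lo, sp2.lo), p=sp1.p)


merged = integrate_spatial(FuzzySpatial(30.5, 120.2, 0.8), FuzzySpatial(31.0, 119.9, 0.6))
print(merged)  # FuzzySpatial(la=30.5, lo=120.2, p=0.8)
```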
Cases 4–6 concern the integration of fuzzy temporal object nodes, and Case 7 concerns the integration of fuzzy spatial object nodes. Rules 13–22 thus extend the integration method FSTR to object nodes.
The core of this section is the integration method FSTR for multi-source heterogeneous fuzzy spatiotemporal data in the fuzzy spatiotemporal RDF graph. The implementation of FSTR is shown in Algorithm 2, and the integration operations are illustrated in Example 10.

Fuzzy spatiotemporal RDF graph of Example 10.
Experimental setup
In this subsection, we present evaluations based on a meteorological application. All evaluations were implemented in Eclipse 4.4.1 with JDK 1.8 and performed on a 3.2 GHz Intel Core i5 processor with 8 GB RAM running Windows 7. We implemented several groups of query experiments in Java and MySQL.
For this assessment, we established a meteorological information database containing 55,000 tuples from global meteorological information sites. Our tests used two data sets: the first is an unintegrated data set, denoted UIDB, and the second is an integrated data set, denoted IFDB. The experiments are divided into three groups, and in two of them FSTR is compared with other processing methods for spatiotemporal data. Because the data processing methods τ-SPARQL [28] and Deep integration [17] require the data to be restructured, two further data sets were derived from UIDB after preprocessing: a temporal data set for τ-SPARQL and a spatial data set for Deep integration.
In these experiments, the feasibility and effectiveness of the integration method FSTR are verified by querying the data sets. The query conditions are divided into three types, which together contain at least 12 query tuples. These query tuples are chosen to reflect the characteristics of multi-source heterogeneous fuzzy spatiotemporal data, covering the general, temporal and spatial attributes of fuzzy spatiotemporal data. In addition, since the spatial attributes of fuzzy spatiotemporal data change over time, query conditions may combine attributes of fuzzy spatiotemporal data with temporal attributes. The query conditions are shown in Table 1.
Query conditions of user queries
Comparison of FSTR between data sets UIDB and IFDB
This section uses query conditions G1, G2, G3 and G4, which mainly target general attribute queries on fuzzy spatiotemporal data. The corresponding query results are obtained by running FSTR on UIDB and IFDB, respectively. In Fig. 12, the Precision, Recall and F-score of the experimental results on IFDB are higher than those on UIDB. In Fig. 13, the Execution Time and Memory Cost for the four query conditions on IFDB are lower than on UIDB. The experimental comparison in this section therefore indicates that the multi-source heterogeneous fuzzy spatiotemporal data integration method is effective and accurate.
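For reference, the reported Precision, Recall and F-score can be computed from the set of retrieved tuples and the set of relevant tuples for each query. The sketch below assumes the standard set-based definitions and the balanced F-score (F1); the text does not specify which F variant was used, and the tuple identifiers are illustrative.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based Precision, Recall and balanced F-score (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```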

The Precision, Recall and F-Score of the Queries in UIDB and IFDB.

Execution Time and Memory Cost of G1 - G4.
Because fuzzy spatiotemporal data differ from general data, most research on the integration of fuzzy spatiotemporal data focuses on either their temporal or their spatial attributes. For example, Tappolet et al. propose τ-SPARQL, an integration method that combines RDF with temporal data and is of great significance to research on spatiotemporal data integration. In this section, we compare FSTR with τ-SPARQL to verify the effectiveness and superiority of the proposed FSTR.
This section uses G5, G6, G7 and G8 query conditions, which are mainly focused on the temporal attribute query of fuzzy spatiotemporal data. In Fig. 14, the Precision, Recall and F-score of the experimental results of FSTR are higher than τ-SPARQL.

The Precision, Recall, F-Score of the Temporal Queries in the IFDB.
In Fig. 15, the Execution Time and Memory Cost of FSTR are lower than τ-SPARQL. Therefore, the experimental comparison in this section shows that FSTR has better effectiveness and accuracy in temporal data integration than τ-SPARQL.

Execution Time and Memory Cost of G5 - G8.
The experiments in this section compare FSTR with Deep integration, the spatial attribute processing method proposed by Brodt et al. [38]. This section uses query conditions G9, G10, G11 and G12, which mainly target spatial attribute queries on fuzzy spatiotemporal data. In Fig. 16, the Precision, Recall and F-score of the experimental results of FSTR are higher than those of Deep integration. In Fig. 17, the Execution Time and Memory Cost of FSTR are lower than those of Deep integration. The experimental comparison in this section therefore indicates that FSTR is more effective and accurate than Deep integration in spatial data integration.

The Precision, Recall, F-Score of the Spatial Queries in the IFDB.

Execution Time and Memory Cost of G9 - G12.
From the experimental results we obtained the Precision, Recall and F-Score of the query results; these, together with the Execution Time and Memory Cost measured during the experiments, are consistent with the expected behavior of the integration method. Overall, the analysis of these results indicates that the proposed multi-source heterogeneous fuzzy spatiotemporal data integration method meets its goals and achieves the expected effect.
However, the proposed integration method for multi-source heterogeneous fuzzy spatiotemporal data has some limitations. On the one hand, this work focuses on fuzzy spatiotemporal data and considers only fuzziness, which is one kind of uncertain information; other kinds of uncertain information, such as inconsistency, imprecision, vagueness, uncertainty and ambiguity, are ignored. On the other hand, we omit further heterogeneity problems in integrating multi-source heterogeneous fuzzy spatiotemporal data; for example, system heterogeneity and pattern heterogeneity deserve more in-depth and comprehensive study. We will address these problems in future work.
Conclusion
This paper proposes an integration method for multi-source heterogeneous fuzzy spatiotemporal data. To support the integration, a fuzzy spatiotemporal RDF model was proposed. Then the integration algorithm FRDFG was proposed, which covers five types of semantic conflicts and their solutions and thereby resolves these heterogeneity problems. Furthermore, we put forward methods for integrating subject nodes, property nodes and object nodes in RDF graphs to improve FRDFG. Finally, experimental results confirm the superiority of our approach.
Future work will concentrate on the following aspects: (i) this paper mainly focuses on fuzzy data, so we will take other uncertain information (e.g., inconsistency and imprecision) into consideration; (ii) our model only deals with semantic heterogeneity, so further heterogeneity problems (e.g., system heterogeneity and pattern heterogeneity) will be considered.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61402087), the Natural Science Foundation of Hebei Province (F2019501030), the Natural Science Foundation of Liaoning Province (2019-MS-130), the Key Project of Scientific Research Funds in Colleges and Universities of Hebei Education Department (ZD2020402), and the Fundamental Research Funds for the Central Universities (N2023019).
