Dealing with temporal indeterminacy in relational databases: An AI methodology

Abstract

Time is pervasive in the human way of approaching reality, and it has been widely studied in many research areas, including AI and relational Temporal Databases (TDB). While temporally imprecise information has been widely studied by the AI community, only a few approaches have addressed temporal indeterminacy (in particular, “don’t know exactly when” indeterminacy) in TDBs. As we show in this paper, the treatment of time in general, and of temporal indeterminacy in particular, introduces implicit forms of data representation in TDBs. We therefore propose a new AI-style methodology to cope with temporal indeterminacy in TDBs. Specifically, we show that typical AI notions and techniques, such as making the semantics of the representation formalism explicit and adopting symbolic manipulation techniques based on that semantics, can be fruitfully exploited to develop a “principled” treatment of indeterminate time in relational databases.

Keywords

Temporal data; data representation and semantics; query semantics; symbolic manipulation

1. Introduction

Time is pervasive in reality and plays a fundamental role in many intelligent tasks, so it has been widely investigated by the AI community, which, since its early years, has developed many different methodologies for temporal representation and reasoning. Since the mid-1980s, the DB community has also recognized that time has a special status with respect to other data, so that its treatment within a relational database context requires dedicated techniques [20,21]. This has led to the development of many different approaches to cope with time in the area of temporal relational databases (TDBs in the following; see, e.g., [16,27]). As an example, the 1998 cumulative bibliography on TDBs lists more than 2000 papers [27]. These approaches cover a wide range of solutions, proposing, e.g., different data models and algebraic operations to query them. However, TDB approaches have developed almost entirely independently of AI methodologies. This is probably because, to the best of our knowledge, no TDB approach has explicitly taken into account that adding time to a relational DB adds implicit knowledge (i.e., the semantics of time) to it. This is particularly true when temporal indeterminacy (i.e., “don’t know exactly when” indeterminacy [10]) is considered, since indeterminacy gives rise to many alternative possibilities and, for the sake of space and computational efficiency, no TDB approach makes all of them explicit. In this paper we argue that, since a high degree of implicit information is present in temporally indeterminate DB data, a temporally indeterminate DB is quite close to a (simplified) knowledge base, so that AI techniques can be exploited to properly cope with it. Specifically, we propose an AI-based methodology to cope with temporal indeterminacy, making the following contributions:

(i) We formally define the snapshot semantics for temporal data [20] and extend it to cope with temporal indeterminacy;

(ii) We propose a First Normal Form (1NF) representation model for “interval-based” temporal indeterminacy;

(iii) We analyse the semantics of the representation model, showing that (at least) two alternative semantics are possible;

(iv) We address the definition of the relational algebraic operators (which perform symbolic manipulation on the model) to query the representation model under both alternative semantics, showing that only under one of them is it possible to devise a relational algebra that is both closed with respect to the model and correct with respect to the semantics.

Result (iv) reinforces the core message of our approach: in TDBs, the representation model contains implicit (temporal) information. Thus, AI techniques can (and should) be used to analyse its semantics, and to devise algebraic operators that perform symbolic manipulation on the representation model, consistently with that semantics. In other words, to the best of our knowledge, this paper proposes the first AI methodology developed specifically for the relational DB context to cope with time and temporal indeterminacy.

The paper is organized as follows. In Section 2, we briefly overview the related work, focusing on the TDB approaches to temporal indeterminacy. In Section 3, we informally introduce our AI-style methodology to cope with temporal indeterminacy in TDBs, motivating it. In Section 4, we start with the technical contributions, proposing a “functional” (data and query) semantics for determinate time in TDB. In Section 5, we extend such a semantics to cover temporal indeterminacy. In Section 6, we propose a compact representation (in First Normal Form – 1NF for short) for an important class of temporally indeterminate data, and define temporal algebraic operators to query it, consistently with the semantics of the representation. In Section 7, we provide experimental results, showing that our treatment of temporal indeterminacy only adds a negligible overhead to the “standard” TSQL2 treatment of determinate temporal data. Finally, Section 8 contains conclusions and future work. The Appendix briefly presents the “consensus” BCDM semantics for determinate time, elaborated by the TDB community [15].

2. TDB approaches to valid time temporal indeterminacy

Many different approaches have been devoted to the treatment of time in TDBs. One of the first milestones was the distinction between the time when facts are inserted into or deleted from the DB (termed transaction time) and the time when such facts occurred in the modelled mini-world (termed valid time) (see, e.g., [22]). In the following, we only consider the latter. Despite their variety, most TDB approaches to date have focused on individual occurrences of facts whose valid time is exactly known (i.e., determinate time). However, as is well known in the AI field, in many real-world cases the exact time of occurrence of facts is not known and can only be approximated, so that temporal indeterminacy (i.e., “don’t know exactly when” indeterminacy [10]) has to be faced. Temporal indeterminacy is so important that “support for temporal indeterminacy” was one of the eight explicit goals of the data types in TSQL2 [20], the milestone “consensus” approach devised by the TDB community. Indeed, temporal indeterminacy in TDBs has various possible sources, including scale, dating techniques, future planning, unknown or imprecise event times, and clock measurements (this non-exhaustive list is taken from the TSQL2 book [20]). Since temporal indeterminacy is pervasive in many application domains, a plethora of AI approaches have been devised to cope with it since the 1980s (to mention just a few examples, see the early surveys in [1,13,26]). However, in the area of relational databases, the number of approaches coping with temporal indeterminacy is more limited (see, e.g., the surveys in [10,15]), and current approaches have several limitations.

In the earliest TDB work on temporal indeterminacy, an indeterminate instant was modelled as a set of possible chronons [19]. Dutta [9] introduced a fuzzy set approach. Gadia et al. [14] proposed a model to support value and temporal incompleteness. In the TSQL2 “consensus” book [20], Chapter 18 presents a 1NF data model for temporal indeterminacy and an extension of SQL, but does not provide a relational algebra. Dyreson and Snodgrass [11] and Dekhtyar et al. [7] have proposed probabilistic approaches coping with different forms of temporal indeterminacy. Dyreson and Snodgrass cope with valid-time indeterminacy by associating a period of indeterminacy with a tuple. A period of indeterminacy is a period between two indeterminate instants, each consisting of a range of granules and a probability distribution over it. However, that approach does not propose a relational algebra to query temporally indeterminate data. Dekhtyar et al. introduce temporal probabilistic tuples to cope with a quite specific form of temporal indeterminacy, concerning instantaneous events only (i.e., data such as “the tuple d is in relation r at some point of time in the interval $[t_{i}, t_{j}]$ with probability between $p$ and $p'$”), and provide relational algebraic operators for their data model. Anselma et al. [4] have proposed a general semantic model for temporal indeterminacy in TDBs. They identified different forms of temporal indeterminacy and proposed a family of achievable representational models and algebras for such forms. However, that approach is semantics-oriented, abstract and not in 1NF (thus inefficient and not suitable for direct implementation). A 1NF approach for a form of temporal indeterminacy has been proposed in [3], but no semantics for the model has been presented.

3. Towards an AI-style semantic-based methodology to cope with time in relational DBs

A premise is important when starting a discussion about the semantics of temporal relational DBs. Seen from an AI perspective, a “traditional” non-temporal relational database is just an elicitation of all and only (given the closed-world assumption [18]) the facts that are true in the modelled mini-world. In this sense, the semantics of a non-temporal DB is “trivial”, since the DB does not contain any implicit data or information. Since all the data are explicit, no “AI-style” reasoning mechanism is required, and (algebraic) query operators are enough to extract the relevant data from a DB. However, this “easy” scenario changes drastically when time is introduced into DBs by associating with each fact its valid time, i.e., the time when it holds or occurs. Roughly speaking, in such a case, eliciting all true facts explicitly would amount to eliciting, for each possible point in time, all the facts that hold at that point. Despite the extreme variety of TDB approaches in the literature, almost all of them are based, explicitly or (in many cases) implicitly, on this idea, commonly termed “snapshot semantics”: a temporal database is a set of “standard” (non-temporal) databases, each one considering a snapshot of time and eliciting all facts (tuples) true in the modelled mini-world at that time. As an example, the “consensus” TSQL2 presents the BCDM semantic model, which supports the “snapshot semantics” mentioned above, and proves that this semantics underlies many different approaches in the literature [20]. For the sake of completeness, the BCDM semantics is briefly reported in the Appendix.

Of course, for space and time efficiency reasons, no TDB approach in the literature directly implements temporal databases by making all data explicit: representational models are used to encode facts in a more compact and efficient form. Notably, this is a major departure from “traditional” DB concepts: a TDB is not just an elicitation of all facts that hold in the modelled mini-world, but a compact, implicit representation of them. This consideration becomes even more important when temporal indeterminacy is considered. In TDBs, temporally indeterminate facts are facts whose time of occurrence can only be approximated. Therefore, they may involve many alternative possibilities and, for the sake of space and computational efficiency, no TDB approach aims at explicitly storing and managing all of them. As a consequence, a high degree of implicit information is present in all TDB data models for temporal indeterminacy. The adoption of representation models that are not fully “explicit”, but involve a degree of implicit information, makes a temporally indeterminate TDB closer to the AI notion of a knowledge base (in the sense that implicit information is involved). Therefore, in this paper, we suggest that a new, “semantic-based” and “AI-based” methodology should be used when approaching indeterminate time in relational DBs. First of all,

(M1) a semantics making explicit the intended meaning of the representational models must be devised.

In such a context, the algebraic query operators cannot simply select and extract data (since some data are implicit). Making all data explicit before or while answering queries is certainly not a good option (for the sake of space and time efficiency). As a consequence,

(M2) algebraic operators must operate on the (implicit) representation;

(M3) algebraic operators must provide an output expressed in the given representation (i.e., the representation formalism must be closed with respect to the algebraic operators);

(M4) algebraic operators must be correct with respect to the semantics of the representation.

In the following, we make the above discussion more concrete and formal. We start from the definition of a semantics for determinate time, and then we extend it to consider temporal indeterminacy.

4. Snapshot semantics for determinate time DBs: A “functional” perspective

In BCDM [20], a semantics for determinate-time TDBs is provided (see also the Appendix). It encodes the “snapshot semantics” discussed above, and has been proven to encompass the semantics of many different TDB approaches [20]. However, this semantics is deliberately “operational”, aiming to stay close to possible implementations. In this section we propose a “functional” specification of the snapshot semantics, which is more abstract and, at the same time, more suitable for

being extended to cope with temporal indeterminacy and

specifying the semantics of temporal algebraic operators in terms of their non-temporal counterparts.

4.1. Data semantics

We first introduce the notion of tuple, relation, and database. We then move to the definition of time, and define the notion of (semantics of) a temporal database.

Definition 1 ((non-temporal) Database, Relation, Tuple).

A (non-temporal) relational database $DB$ is a set of relations over the relational schema $\sigma = (R_{1} : s_{i}, \dots, R_{k} : s_{j})$, where $s_{i}, \dots, s_{j} \in S$ are the sorts of $R_{1}, \dots, R_{k}$, respectively. A relation $R(x_{1}, \dots, x_{k}) : s$ of sort $s \in S$ is a sequence of attributes $x_{1}, \dots, x_{k}$, each with values in a proper domain $D_{1}, \dots, D_{k}$. An instance $r(R : s)$ of a relation $R(x_{1}, \dots, x_{k})$ of sort $s \in S$ is a set $\{a_{1}, \dots, a_{n}\}$ of tuples, where each tuple $a_{i}$ is a sequence $\langle v_{1}, \dots, v_{k} \rangle$ of values in $D_{1} \times \dots \times D_{k}$.

Notation. In the following, we denote by $DB_{\sigma}$ the domain of all possible database instances over a schema $\sigma$.

As in many TDB approaches, including TSQL2 [20], and in the BCDM “consensus” semantics [15], we assume that time is discrete and bounded.

Definition 2 (Temporal domain $D_{T}$ ).

We assume a limited precision for time, and call the basic time unit a chronon (as in TSQL2 [20]). The domain of chronons is totally ordered and isomorphic to a subset of the natural numbers. The domain of valid times $D_{T}$ is given as a set $D_{T} = \{c_{1}, \dots, c_{k}\}$ of chronons.

In the (commonly agreed) snapshot semantics, a temporal database is a set of conventional (non-temporal) databases, one for each chronon of time. In this paper, we propose to specify such a concept formally through the introduction of functions, relating each time to the facts holding/occurring at that time.

Definition 3 (Temporal database (semantic notion)).

Given a relational schema $\sigma = (R_{1} : s_{i}, \dots, R_{k} : s_{j})$, a temporal database $DB^{T}$ is a function $f_{\sigma, D_{T}} : D_{T} \to DB_{\sigma}$.

Analogously a temporal relation is a function from $D_{T}$ to the tuples that hold at each chronon in $D_{T}$ .

Definition 4 (Time slice).

Given a temporal database $DB^{T}$, a temporal relation $r^{T} \in DB^{T}$, and a chronon $c \in D_{T}$, we define the time slice of $DB^{T}$ (denoted by $DB^{T}(c)$) and of $r^{T}$ (denoted by $r^{T}(c)$) as the result of applying the functions $DB^{T}$ and $r^{T}$ to the chronon $c$.

Example 1.
As a simple running example, let us consider a database $DB_{1}^{T}$ modelling patient symptoms. The database contains a single relation $SYM$ with schema $\langle Patient, Symptom, Value \rangle$ and models two facts:
John had high fever from 10 to 12 of 1/1/2018

Mary had moderate fever from 11 to 13 of 1/1/2018
(in the example, we assume that chronons are at the granularity of hours, and hour 1 represents the first hour of 1/1/2018).

The temporal database (semantic notion) modelling this state of affairs is the following (for the sake of clarity and simplicity, we omit the chronons in $D_{T}$ at which no tuple holds, and we omit the name of the relation): $\begin{array}{ll} 10 \to & \{\langle John, fever, high \rangle\} \\ 11 \to & \{\langle John, fever, high \rangle, \langle Mary, fever, moderate \rangle\} \\ 12 \to & \{\langle John, fever, high \rangle, \langle Mary, fever, moderate \rangle\} \\ 13 \to & \{\langle Mary, fever, moderate \rangle\} \end{array}$

In this example, $DB_{1}^{T}(10) = SYM^{T}(10) = \{\langle John, fever, high \rangle\}$.
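To make Definitions 3 and 4 concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a determinate temporal relation is stored as a dictionary from chronons to frozensets of plain tuples, with chronons at which nothing holds simply omitted (as in the listing above); the names `TemporalRelation` and `time_slice` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Row = Tuple[str, str, str]
# Hypothetical encoding of a determinate temporal relation (Definition 3):
# a function from chronons to the set of tuples holding at that chronon.
TemporalRelation = Dict[int, FrozenSet[Row]]

JOHN = ("John", "fever", "high")
MARY = ("Mary", "fever", "moderate")

# SYM^T from Example 1 (chronons are hours of 1/1/2018).
SYM_T: TemporalRelation = {
    10: frozenset({JOHN}),
    11: frozenset({JOHN, MARY}),
    12: frozenset({JOHN, MARY}),
    13: frozenset({MARY}),
}


def time_slice(rel: TemporalRelation, c: int) -> FrozenSet[Row]:
    """Time slice r^T(c) (Definition 4): omitted chronons hold no tuples."""
    return rel.get(c, frozenset())


print(time_slice(SYM_T, 10))  # frozenset({('John', 'fever', 'high')})
```

Storing one entry per chronon mirrors the semantic notion directly; it is deliberately not a compact representation.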

Notably, Definition 3 above is a purely “semantic” definition. Other definitions of the snapshot semantics for TDBs, such as the one in BCDM, are more “operational” and are closer to actual representations/implementations (see the Appendix). Indeed, our “functional” definition is similar to the “relation-stamped representation” of abstract temporal databases in the work by Chomicki and Toman [5].

4.2. Query semantics

The semantics of queries is commonly specified in terms of relational algebraic operators. Codd designated as complete any query language that is as expressive as his set of five relational algebraic operators: relational union (∪), relational difference (−), selection ($\sigma_{P}$), projection ($\pi_{X}$), and Cartesian product (×) [6]. Different approaches have generalized such operators to cope with temporal databases as well. Though quite different approaches have been proposed (depending on the chosen representation for a temporal database), there is common agreement that temporal algebraic operators (i) should behave exactly as Codd’s non-temporal ones at each point (chronon) of time. Roughly speaking, this requirement is usually expressed formally in TDBs by the reducibility property, stating that temporal algebraic operators should reduce to standard Codd operators when time is removed (see [17,20], and also the Appendix). Notably, property (i) above (as well as reducibility) also has very important practical implications, since it ensures the possible interoperability of temporal databases with standard non-temporal DBs [20]. Given our “functional” definition of temporal databases above, we can define this property formally (see below).

Definition 5 (Relational algebraic operators on temporal databases (“semantic” notion)).

Denoting by $Op^{C}$ a Codd operator, and by $Op^{T}$ its corresponding temporal operator, $Op^{T}$ must be defined in such a way that the following holds: $\forall c \in D_{T}:\ Op^{T}(r^{T}, s^{T})(c) = Op^{C}(r^{T}(c), s^{T}(c))$

In Definition 5 above, we assume that $r^{T}$ and $s^{T}$ are temporal relations in a temporal database $D B^{T}$ , and that $O p$ is a binary operator. $r^{T} (c)$ represents the time slice of $r^{T}$ at the chronon c. The definition of unary operators is analogous.
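Definition 5 can be read as a recipe for lifting any Codd operator to determinate temporal relations, chronon by chronon. The following Python sketch assumes the same kind of hypothetical dictionary encoding (chronon to frozenset of tuples); `lift_unary` and `lift_binary` are illustrative names, not part of any TDB system. Only chronons present in at least one input are visited, which is sound because Codd's operators map empty inputs to an empty output.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Row = Tuple[str, ...]
Rel = FrozenSet[Row]
TemporalRelation = Dict[int, Rel]


def lift_unary(op_c: Callable[[Rel], Rel]) -> Callable[[TemporalRelation], TemporalRelation]:
    """Build Op^T such that Op^T(r)(c) == Op^C(r(c)) for every chronon c."""
    def op_t(r: TemporalRelation) -> TemporalRelation:
        result = {c: op_c(rows) for c, rows in r.items()}
        return {c: rows for c, rows in result.items() if rows}  # omit empty chronons
    return op_t


def lift_binary(op_c: Callable[[Rel, Rel], Rel]) -> Callable[[TemporalRelation, TemporalRelation], TemporalRelation]:
    """Build Op^T such that Op^T(r, s)(c) == Op^C(r(c), s(c)) for every chronon c."""
    def op_t(r: TemporalRelation, s: TemporalRelation) -> TemporalRelation:
        result = {}
        for c in set(r) | set(s):  # chronons absent from both inputs yield empty output
            rows = op_c(r.get(c, frozenset()), s.get(c, frozenset()))
            if rows:  # keep the "omit empty chronons" convention
                result[c] = rows
        return result
    return op_t


# Three of Codd's operators on plain sets of tuples, lifted to time.
union_t = lift_binary(lambda a, b: a | b)
difference_t = lift_binary(lambda a, b: a - b)
select_high_t = lift_unary(lambda a: frozenset(t for t in a if t[2] == "high"))

J = ("John", "fever", "high")
M = ("Mary", "fever", "moderate")
r = {10: frozenset({J}), 11: frozenset({J})}
s = {11: frozenset({M}), 12: frozenset({M})}
print(union_t(r, s))
```

Because every lifted operator loops over chronons, the sketch also illustrates why this purely semantic definition is costly.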

Of course, the above “purely semantic” definition of temporal relational algebraic operators is highly inefficient, since snapshot(s) of the underlying relation(s) at every single chronon (e.g., day, millisecond) are computed. As a consequence, more “operational” definitions of algebraic operators have been proposed in the literature. Notably, however, the “commonly agreed” BCDM definition of the semantics of algebraic operators (see Appendix) is consistent with Definition 5 above.

5. Snapshot semantics for temporal indeterminacy in TDB

In TDBs, the notion of temporal indeterminacy is usually paraphrased as “don’t know exactly when” indeterminacy (consider, e.g., the Encyclopedia survey in [10]): facts hold at times that are not exactly known. An example is reported in the following:

Example 2.
As a second running example, let us consider a database $DB^{IT}$ modelling patient symptoms with temporal indeterminacy. The database contains a single relation $SYM^{IT}$ with schema $\langle Patient, Symptom, Value \rangle$ and models two facts:
John had high fever at 10 and 11, and possibly at 12, or 13, or both.

Mary had moderate fever at 12 and 13, and possibly at 11
(in the example, we assume that chronons are at the granularity of hours, and hour 1 represents the first hour of 1/1/2018).

5.1. Data semantics of indeterminate time DBs

Of course, we can still retain the definition of the temporal domain $D_{T}$ provided in Section 4 (Definition 2). However, the definition of an indeterminate temporal database is different: informally speaking, an indeterminate TDB is simply a set of alternative (determinate) TDBs, each one encoding one of the different possibilities. Technically, this requires the introduction of a set of functions.

Definition 6 (Indeterminate temporal database (semantic notion)).

Given a relational schema $\sigma = (R_{1} : s_{i}, \dots, R_{k} : s_{j})$, an indeterminate temporal database $DB^{IT}$ is a set $DB^{IT} = \{f_{1}, \dots, f_{k}\}$ of functions $f_{\sigma, D_{T}} : D_{T} \to DB_{\sigma}$.

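Analogously, a temporally indeterminate relation $r^{IT}$ is a set $S(r^{IT})$ of functions from $D_{T}$ to the set of tuples of $r^{IT}$ that hold at each chronon in $D_{T}$. As an example, eight functions are necessary to cover all the alternative possibilities (henceforth called scenarios) for Example 2: John’s fact admits four alternatives (with neither, one, or both of the chronons 12 and 13) and Mary’s fact admits two (with or without chronon 11), giving 4 × 2 = 8 scenarios.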

Example 2 (cont).

The indeterminate temporal database $DB^{IT}$ (semantic notion) modelling Example 2 is the following (for the sake of brevity, we denote by “J” the tuple $\langle John, fever, high \rangle$ and by “M” the tuple $\langle Mary, fever, moderate \rangle$): $\begin{array}{cccc} f_{1} & f_{2} & f_{3} & f_{4} \\ 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} \\ 11 \to \{J\} & 11 \to \{J\} & 11 \to \{J\} & 11 \to \{J\} \\ 12 \to \{M\} & 12 \to \{J, M\} & 12 \to \{M\} & 12 \to \{J, M\} \\ 13 \to \{M\} & 13 \to \{M\} & 13 \to \{J, M\} & 13 \to \{J, M\} \\ f_{5} & f_{6} & f_{7} & f_{8} \\ 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} \\ 11 \to \{J, M\} & 11 \to \{J, M\} & 11 \to \{J, M\} & 11 \to \{J, M\} \\ 12 \to \{M\} & 12 \to \{J, M\} & 12 \to \{M\} & 12 \to \{J, M\} \\ 13 \to \{M\} & 13 \to \{M\} & 13 \to \{J, M\} & 13 \to \{J, M\} \end{array}$

For the technical treatment that follows, it is useful to introduce the notion of scenario slice.

Definition 7 (Scenario slice).

Given an indeterminate temporal database $DB^{IT} = \{f_{1}, \dots, f_{k}\}$, a temporal relation $r^{IT} \in DB^{IT}$, and any $f \in \{f_{1}, \dots, f_{k}\}$, we define the scenario slice $f$ of $DB^{IT}$ (denoted by $DB_{f}^{IT}$) and of $r^{IT}$ (denoted by $r_{f}^{IT}$) as the determinate temporal database and the determinate temporal relation obtained by considering only the scenario $f$ of $DB^{IT}$.

For example, considering Example 2 above, $DB_{f_{1}}^{IT} = SYM_{f_{1}}^{IT} = \{10 \to \{J\}, 11 \to \{J\}, 12 \to \{M\}, 13 \to \{M\}\}$.
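The eight scenarios of Example 2 can be generated mechanically. The Python sketch below rests on our own illustrative assumptions: each fact is described by the chronons at which it certainly holds plus a list of chronons at which it possibly holds, and every subset of the possible chronons is taken to be an admissible alternative, which matches the reading of Example 2 used above. Each generated dictionary is one scenario, i.e., a scenario slice in the sense of Definition 7. `FACTS`, `chronon_choices` and `scenarios` are hypothetical names.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Row = Tuple[str, str, str]
Scenario = Dict[int, FrozenSet[Row]]

J = ("John", "fever", "high")
M = ("Mary", "fever", "moderate")

# fact -> (chronons where it certainly holds, chronons where it possibly holds)
FACTS: Dict[Row, Tuple[Set[int], List[int]]] = {
    J: ({10, 11}, [12, 13]),
    M: ({12, 13}, [11]),
}


def chronon_choices(certain: Set[int], possible: List[int]) -> Iterator[Set[int]]:
    """Yield the certain chronons plus every subset of the possible ones."""
    for picks in product([False, True], repeat=len(possible)):
        yield certain | {c for c, keep in zip(possible, picks) if keep}


def scenarios(facts: Dict[Row, Tuple[Set[int], List[int]]]) -> List[Scenario]:
    """Combine the per-fact alternatives into scenarios f: chronon -> tuples."""
    per_fact = [[(row, cs) for cs in chronon_choices(cert, poss)]
                for row, (cert, poss) in facts.items()]
    result = []
    for combo in product(*per_fact):
        f: Dict[int, Set[Row]] = {}
        for row, chronons in combo:
            for c in chronons:
                f.setdefault(c, set()).add(row)
        result.append({c: frozenset(rows) for c, rows in f.items()})
    return result


S = scenarios(FACTS)
print(len(S))  # 8
```

The count grows multiplicatively with the number of facts and possible chronons, which is why no TDB approach stores scenarios explicitly.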

5.2. Query semantics

Of course, for the algebraic query operators we can still retain all the general requirements discussed so far for determinate time. However, we have to generalize the above approach to account for the fact that a set of alternative (determinate) temporal databases (scenarios) is involved. Therefore, given two temporally indeterminate relations $r^{IT}$ and $s^{IT}$, binary temporal algebraic operators must consider, at each chronon, all the possible combinations of the scenarios $f_{r} \in S(r^{IT})$ of $r^{IT}$ and $f_{s} \in S(s^{IT})$ of $s^{IT}$. At any chronon $c$, the result of the temporal operator should be the result obtained by applying the corresponding Codd operator to each pair of scenarios, considered at time $c$.

Definition 8 (Relational algebraic operators on indeterminate temporal databases (“semantic” notion)).

(In Definition 8, $r^{I T}$ and $s^{I T}$ are temporal relations in a temporally indeterminate database $D B^{I T}$ , and $O p$ is a binary operator. $f_{r} (c)$ represents the time slice at the chronon c of the scenario $f_{r}$ of $r^{I T}$ . The definition of unary operators is simpler.)

Indeed, indeterminacy intrinsically involves alternative possibilities, so the approach usually followed in the TDB area (as well as, of course, in AI) is to introduce modalities that ask for possible (i.e., valid in at least one of the possible scenarios) or necessary (i.e., valid in all the possible scenarios) answers (see, e.g., [4]). With the introduction of the modalities POSS (possible) and NEC (necessary), the semantics of algebraic query operators in the indeterminate (temporal) context can still be easily expressed in terms of their non-temporal Codd counterparts, as shown in Definition 9 below.

Definition 9 (Relational algebraic operators on indeterminate temporal databases (“semantic” notion)).

Denoting by $Op^{C}$ a Codd operator, and by $Op^{IT}$ its corresponding temporal operator for indeterminate time, $Op^{IT}$ must be defined in such a way that the following holds: $\begin{array}{l} \forall c \in D_{T}:\ POSS(Op^{IT}(r^{IT}, s^{IT})(c)) = \bigcup_{f_{r} \in S(r^{IT}) \land f_{s} \in S(s^{IT})} Op^{C}(r_{f_{r}}^{IT}(c), s_{f_{s}}^{IT}(c)) \\ \forall c \in D_{T}:\ NEC(Op^{IT}(r^{IT}, s^{IT})(c)) = \bigcap_{f_{r} \in S(r^{IT}) \land f_{s} \in S(s^{IT})} Op^{C}(r_{f_{r}}^{IT}(c), s_{f_{s}}^{IT}(c)) \end{array}$

(In Definition 9 above, $r^{I T}$ and $s^{I T}$ are temporal relations in a temporally indeterminate database $D B^{I T}$ , and $O p$ is a binary operator. $f_{r} (c)$ represents the time slice at the chronon c of the scenario $f_{r}$ of $r^{I T}$ . The definition of unary operators is simpler.)

Roughly speaking, the above definition dictates that, to preserve the snapshot semantics in the indeterminate context, the evaluation of a binary temporal operator $O p^{I T} (r^{I T}, s^{I T})$ must provide the result that would be obtained by

applying the corresponding Codd operator at each chronon, and considering each combination $\langle f_{r}, f_{s} \rangle$ of the scenarios $S(r^{IT})$ of $r^{IT}$ and $S(s^{IT})$ of $s^{IT}$ (notice that $r_{f_{r}}^{IT}(c)$ denotes the time slice at the chronon $c$ of the relation $r^{IT}$, considering only the scenario $f_{r}$).

In the NEC modality (at each chronon), the results of such applications in the different scenarios are intersected, to ensure that the facts hold in all the scenarios. In the POSS modality (at each chronon), their union is taken, since we want to report as output (at the given chronon) all the facts that hold in at least one scenario.
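The POSS and NEC clauses of Definition 9 can be evaluated directly, if inefficiently, by enumerating all pairs of scenarios. The following Python sketch assumes that each indeterminate relation is given as an explicit, non-empty list of scenarios (dictionaries from chronons to frozensets of tuples); `poss_nec` is a hypothetical helper. As an illustration, it takes $r^{IT}$ to contain only John's fact and $s^{IT}$ only Mary's fact from Example 2, and uses union as the Codd operator.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple

Row = Tuple[str, str, str]
Rel = FrozenSet[Row]
Scenario = Dict[int, Rel]


def poss_nec(op_c: Callable[[Rel, Rel], Rel], r: List[Scenario],
             s: List[Scenario], c: int) -> Tuple[Rel, Rel]:
    """Return (POSS, NEC) of Op^IT(r, s) at chronon c, following Definition 9."""
    assert r and s, "each indeterminate relation needs at least one scenario"
    results = [op_c(fr.get(c, frozenset()), fs.get(c, frozenset()))
               for fr, fs in product(r, s)]
    poss = frozenset().union(*results)      # holds for at least one pair of scenarios
    nec = frozenset.intersection(*results)  # holds for every pair of scenarios
    return poss, nec


def union(a: Rel, b: Rel) -> Rel:
    return a | b


J = ("John", "fever", "high")
M = ("Mary", "fever", "moderate")
# Scenarios of John's fact alone and of Mary's fact alone (Example 2).
john = [{c: frozenset({J}) for c in cs}
        for cs in ({10, 11}, {10, 11, 12}, {10, 11, 13}, {10, 11, 12, 13})]
mary = [{c: frozenset({M}) for c in cs} for cs in ({12, 13}, {11, 12, 13})]

for c in (10, 11, 12, 13):
    print(c, poss_nec(union, john, mary, c))
```

At chronon 12, for instance, NEC contains only Mary's tuple, because in two of John's four alternatives he has no fever at 12, while POSS contains both tuples.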

Notably, we regard Definition 9 above as one of the major results of this paper: until now, no approach in the TDB community has been able to clarify the semantics of temporal algebraic operators on indeterminate time in terms of their Codd’s counterparts.

Obviously, however, what has been defined so far is the data and query semantics: a direct implementation of the data model and algebraic operators as defined above would be highly inefficient in terms of both space and time. As a consequence, “compact” implementations are needed. We address this issue in the next section, showing that, in any case, an in-depth semantic analysis is necessary to provide correct implementations.

6. Possible “compact” approaches to temporal indeterminacy

Very different realizations of temporal relational databases for determinate time have been proposed in the literature. Essentially all of them (except a few “pioneering” approaches) respect the “snapshot” semantics and provide an efficient implementation of it. The large majority of such approaches enforce at least two key requirements to achieve efficiency:

The First Normal Form (1NF) is used to represent data¹

¹ A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain [12].

Temporal algebraic operators directly manipulate the representation

In this section we adopt the same requirements to face temporal indeterminacy.

The most frequently adopted representational model to cope with (valid) time in a compact, 1NF way is the interval-based representation (consider, e.g., the TSQL2 “consensus” representational model). A time interval (compactly modelled by a starting and an ending time) is associated with each temporal tuple, to denote that the (fact represented by the) tuple holds at each time point in the interval. In the indeterminate time context, such an interval-based representation has also been used, e.g., in [3,11]: four temporal attributes (say T1, T2, T3, and T4) are associated with each temporal tuple (see Definition 10).

Definition 10 (Temporally indeterminate Database, Relation, Tuple (representation level)).

At the representation level, a temporally indeterminate relational database $DB^{IT}$ can be modelled by a set of (temporally indeterminate) relations over the relational schema $\sigma = (R_{1} : s_{i}, \dots, R_{k} : s_{j})$, where $s_{i}, \dots, s_{j} \in S$ are the sorts of $R_{1}, \dots, R_{k}$, respectively. A relation $R(x_{1}, \dots, x_{k} \mid T1, T2, T3, T4) : s$ of sort $s \in S$ is a sequence of non-temporal attributes $x_{1}, \dots, x_{k}$, each with values in a proper domain $D_{1}, \dots, D_{k}$, and temporal attributes $T1$, $T2$, $T3$, $T4$ with domain $D_{T}$. An instance $r(R : s)$ of a relation $R(x_{1}, \dots, x_{k} \mid T1, T2, T3, T4) : s$ is a set $\{t_{1}, \dots, t_{n}\}$ of tuples, where each tuple $t_{i}$ is a sequence $\langle v_{1}, \dots, v_{k} \mid t_{1}, t_{2}, t_{3}, t_{4} \rangle$ of values in $D_{1} \times \dots \times D_{k} \times D_{T} \times D_{T} \times D_{T} \times D_{T}$.

Example 3.
In the temporally indeterminate context, the relation SYM (called ${SYM}^{I T}$ ) may be represented with the schema $⟨ Patient, Symptom, Value | T_{1}, T_{2}, T_{3}, T_{4} ⟩$

Intuitively and roughly speaking, the semantics of such a compact 1NF “interval-based” representation of temporal indeterminacy is the following:
(sem1) the fact represented by the tuple $\langle v_{1}, \dots, v_{k} \mid t_{1}, t_{2}, t_{3}, t_{4} \rangle$ possibly occurs in $[t_{1}, t_{2})$ and in $[t_{3}, t_{4})$, and certainly occurs in $[t_{2}, t_{3})$.
In the following, we show that a “rough” semantics like (sem1) above is not enough: it must be fully formalized (e.g., in terms of the semantic model proposed in Section 5 above) as a starting point for devising a “proper” representational model and algebra, following the methodological requirements M1–M4 above.
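The rough reading (sem1) can be coded in a few lines, which also shows what it leaves open. The Python sketch below uses a hypothetical `ITuple` row type carrying the four temporal attributes of Definition 10, read as half-open intervals; it only computes which chronons are certain and which are merely possible, and says nothing about which combinations of possible chronons form admissible scenarios, which is exactly the ambiguity discussed next.

```python
from typing import NamedTuple, Set, Tuple


class ITuple(NamedTuple):
    """Hypothetical 1NF row <v1, ..., vk | t1, t2, t3, t4> (Definition 10)."""
    values: Tuple[str, ...]
    t1: int
    t2: int
    t3: int
    t4: int

    def certain(self) -> Set[int]:
        """Chronons in [t2, t3), where the fact certainly holds under (sem1)."""
        return set(range(self.t2, self.t3))

    def possible(self) -> Set[int]:
        """Chronons in [t1, t2) or [t3, t4), where the fact possibly holds under (sem1)."""
        return set(range(self.t1, self.t2)) | set(range(self.t3, self.t4))


john = ITuple(("John", "fever", "high"), 10, 10, 12, 14)
print(john.certain(), john.possible())  # {10, 11} {12, 13}
```

Both readings discussed below agree on these two sets; they differ only in how the possible chronons may be combined.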

6.1. “Single occurrence” semantics

A first way of interpreting the “ambiguous” semantics (sem1) above is the following:

(sem1′) the fact represented by the tuple $\langle v_{1}, \dots, v_{k} \mid t_{1}, t_{2}, t_{3}, t_{4} \rangle$ certainly occurs in the time interval $[t_{2}, t_{3})$, but it may start earlier, at a chronon in the interval $[t_{1}, t_{2})$, and end later, at a chronon in the interval $[t_{3}, t_{4})$.

Under this semantics, all the scenarios include the chronons in $[t_{2}, t_{3})$, as well as (possibly) a set of chronons in $[t_{1}, t_{2}) \cup [t_{3}, t_{4})$, provided that such chronons extend the interval $[t_{2}, t_{3})$ to the left, to the right, or in both directions, without creating any gap. This notion can be formalized in terms of scenarios as follows.

Definition 11 (Semantics sem1′).

The semantics of a tuple $⟨v_1, \dots, v_k | t_1, t_2, t_3, t_4⟩$ is the set of functions $F = \{f' : [c_s, c_e) \to \{⟨v_1, \dots, v_k⟩\} \mid t_1 ⩽ c_s ⩽ t_2 \land t_3 ⩽ c_e ⩽ t_4\}$, where $[c_s, c_e)$ denotes the chronons $c_s, c_s + 1, \dots, c_e - 1$.
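The scenarios of Definition 11 can be enumerated directly. A minimal sketch, assuming integer chronons and the half-open reading ($[c_s, c_e)$ with $t_1 ⩽ c_s ⩽ t_2$ and $t_3 ⩽ c_e ⩽ t_4$), which is the reading that reproduces the scenarios listed for (f2) and (f1) below. Each function $f'$ is represented by its domain only, since its range is always $\{⟨v_1, \dots, v_k⟩\}$; the function name is hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Quad = Tuple[int, int, int, int]  # (t1, t2, t3, t4)


def scenarios_single_occurrence(q: Quad) -> Set[FrozenSet[int]]:
    """Chronon sets [c_s, c_e) with t1 <= c_s <= t2 and t3 <= c_e <= t4 (sem1')."""
    t1, t2, t3, t4 = q
    return {
        frozenset(range(c_s, c_e))
        for c_s in range(t1, t2 + 1)  # start anywhere in [t1, t2]
        for c_e in range(t3, t4 + 1)  # exclusive end anywhere in [t3, t4]
    }


mary = scenarios_single_occurrence((11, 12, 14, 14))
john = scenarios_single_occurrence((10, 10, 12, 14))
print(sorted(sorted(s) for s in mary))  # [[11, 12, 13], [12, 13]]
print(frozenset({10, 11, 13}) in john)  # False: that would be two occurrences
```

Because every scenario is a single gap-free interval, the second check fails for the scenario in which John has fever at 10, 11 and 13.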

This is probably the most intuitive notion of temporal indeterminacy in TDBs: each tuple represents a single occurrence of a fact, and temporal indeterminacy concerns its starting and ending chronons. In such a context, it is natural to impose $t_1 ⩽ t_2 < t_3 ⩽ t_4$, thus guaranteeing that there is at least one chronon in which the fact certainly occurs (see, e.g., [11]).

Example 4.
Given the temporally indeterminate relation $SYM^{IT}$ with the semantics (sem1′) above, the fact (f2) “Mary had moderate fever at 12 and 13, and possibly at 11” can be represented by the tuple $⟨Mary, fever, moderate | 11, 12, 14, 14⟩$. The semantics of such a tuple consists of the two scenarios below, one per column (where “M” stands for $⟨Mary, fever, moderate⟩$): $\begin{array}{ll} 11 \to \{M\} & \\ 12 \to \{M\} & 12 \to \{M\} \\ 13 \to \{M\} & 13 \to \{M\} \end{array}$

Notably, if we assume the semantics (sem1′), the fact (f1)
John had high fever at 10 and 11, and possibly at 12, or 13, or both
cannot be represented in the representational model. Indeed, under (sem1′) the tuple $⟨John, fever, high | 10, 10, 12, 14⟩$ would be interpreted as the compact representation of the three scenarios below, one per column (where “J” stands for $⟨John, fever, high⟩$): $\begin{array}{lll} 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} \\ 11 \to \{J\} & 11 \to \{J\} & 11 \to \{J\} \\ 12 \to \{J\} & 12 \to \{J\} & \\ 13 \to \{J\} & & \end{array}$ while the scenario $⟨10 \to \{J\}, 11 \to \{J\}, 13 \to \{J\}⟩$ would not be part of its semantics. In fact, under (sem1′) each tuple represents a single occurrence of a fact, whereas the scenario $⟨10 \to \{J\}, 11 \to \{J\}, 13 \to \{J\}⟩$ represents two separate occurrences, one in $[10, 12)$ and one in $[13, 14)$.

Fig. 1.
Algebraic operators for indeterminate time (independent chronons semantics).

Of course, the specification of the semantics is fundamental also for the definition of the algebraic operators. In particular, we must guarantee that such operators

(i) are correct with respect to the semantics, and

(ii) are closed with respect to the representational model.

Notably, if we assume the semantics (sem1′) for the representational model in Definition 10, there is no way to satisfy both requirements (i) and (ii).²
²
One can show that correct algebraic operators that are closed with respect to the representational model cannot be defined even if one admits that facts in the TDB do not necessarily occur, i.e., even if one imposes $t_1 ⩽ t_2 ⩽ t_3 ⩽ t_4$ in the representational model.

A simple counter-example, concerning algebraic difference, is discussed in the following.
Example 5.
Consider the difference between two relations $r^{IT}$ and $s^{IT}$ having the same schema $(A_1, \dots, A_k | T1, T2, T3, T4)$. Let $r^{IT} = \{⟨a_1, \dots, a_k | 1, 3, 5, 7⟩\}$ and $s^{IT} = \{⟨a_1, \dots, a_k | 3, 3, 8, 8⟩\}$ (i.e., the two tuples are value-equivalent, and the tuple in $s^{IT}$ is determinate: it starts at 3 and ends at 7). In such a case, the result of the difference $r^{IT} -^{IT} s^{IT}$ should be the fact $⟨a_1, \dots, a_k⟩$, which may not occur at all, or occur in $\{2\}$, or in $\{1, 2\}$. A tuple with such a semantics cannot be represented in the given formalism: with the constraint $t_2 < t_3$, every tuple certainly occurs in at least one chronon, while here the fact may not occur at all. Therefore, this example suffices to show that (correct) difference is not closed with respect to the given formalism (with semantics sem1′ above).
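Example 5 can be checked by brute force. The sketch below is an illustration under stated assumptions, not part of the approach itself: it reads Definition 11 with half-open intervals $[c_s, c_e)$ and $t_3 ⩽ c_e ⩽ t_4$, represents each scenario by its set of chronons, computes the snapshot difference over every pair of scenarios, and then searches a small window of quadruples satisfying $t_1 ⩽ t_2 < t_3 ⩽ t_4$ for one with the same semantics. The search is bounded, so it only illustrates the claim; the general argument is that with $t_2 < t_3$ every scenario contains $t_2$, so no tuple admits the empty scenario. Function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Set, Tuple

Quad = Tuple[int, int, int, int]


def sem1_prime(q: Quad) -> Set[FrozenSet[int]]:
    """Scenarios of a tuple under sem1', as half-open chronon intervals [c_s, c_e)."""
    t1, t2, t3, t4 = q
    return {frozenset(range(a, b)) for a in range(t1, t2 + 1) for b in range(t3, t4 + 1)}


def snapshot_difference(r: Quad, s: Quad) -> Set[FrozenSet[int]]:
    """Chronon sets of r - s over every pair of scenarios (value-equivalent tuples)."""
    return {a - b for a, b in product(sem1_prime(r), sem1_prime(s))}


target = snapshot_difference((1, 3, 5, 7), (3, 3, 8, 8))
print(sorted(sorted(x) for x in target))  # [[], [1, 2], [2]]

# Search every quadruple with t1 <= t2 < t3 <= t4 over chronons 0..9.
matches = [
    q for q in product(range(0, 10), repeat=4)
    if q[0] <= q[1] < q[2] <= q[3] and sem1_prime(q) == target
]
print(matches)  # []
```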

6.2. “Independent chronons” semantics

A different way of interpreting the “rough” semantics (sem1) above is the following:

the fact represented by the tuple $⟨v_1, \dots, v_k | t_1, t_2, t_3, t_4⟩$ certainly holds in each chronon in $[t_2, t_3)$ (if any), and may hold in each one of the chronons in $[t_1, t_2)$ and in $[t_3, t_4)$, independently of the others.

In this semantics, each scenario includes the chronons in $[t_2, t_3)$, possibly together with any subset of the chronons in $[t_1, t_2) \cup [t_3, t_4)$. Hence, there are as many scenarios as subsets of the chronons in $[t_1, t_2) \cup [t_3, t_4)$, i.e., $2^{(t_2 - t_1) + (t_4 - t_3)}$.

Definition 12 (Semantics sem1′′).

The semantics of a tuple $⟨v_1, \dots, v_k | t_1, t_2, t_3, t_4⟩$ is the set of functions $F = \{f' : C_S \to \{⟨v_1, \dots, v_k⟩\} \mid C_S = \{t_2, t_2 + 1, \dots, t_3 - 1\} \cup S'\}$, where $S' \in PS(\{t_1, t_1 + 1, \dots, t_2 - 1\} \cup \{t_3, t_3 + 1, \dots, t_4 - 1\})$, and $PS(S)$ denotes the power set of a set $S$.
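Definition 12 translates directly into a power-set enumeration. A minimal sketch, assuming integer chronons and representing each function $f'$ by its domain $C_S$ (its range is always $\{⟨v_1, \dots, v_k⟩\}$); the function names are hypothetical. It reproduces the four scenarios of (f1) in Example 6 and the count $2^{(t_2 - t_1) + (t_4 - t_3)}$.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, Set, Tuple

Quad = Tuple[int, int, int, int]


def power_set(items: Iterable[int]):
    """All subsets of items (PS in Definition 12), as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))


def scenarios_independent(q: Quad) -> Set[FrozenSet[int]]:
    """Domains C_S = [t2, t3) plus any subset of [t1, t2) and [t3, t4) (sem1'')."""
    t1, t2, t3, t4 = q
    certain = frozenset(range(t2, t3))
    optional = list(range(t1, t2)) + list(range(t3, t4))
    return {certain | frozenset(extra) for extra in power_set(optional)}


john = scenarios_independent((10, 10, 12, 14))
print(len(john))                        # 4 == 2 ** ((t2 - t1) + (t4 - t3))
print(frozenset({10, 11, 13}) in john)  # True: chronons are chosen independently
```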

In this semantics, there is no notion of single occurrence at all: chronons are simply regarded independently of each other. In such a context, it is natural to impose $t_1 ⩽ t_2 ⩽ t_3 ⩽ t_4$ and $t_1 < t_4$, so that the fact is possible in at least one chronon, but it may be certain in no chronon (in case $t_2 = t_3$).

Example 6.
Given the temporally indeterminate relation $SYM^{IT}$ with the semantics (sem1′′) above, the fact (f1)
John had high fever at 10 and 11, and possibly at 12, or 13, or both
is represented in the representational model by the tuple $⟨John, fever, high | 10, 10, 12, 14⟩$, which has exactly the intended semantics of (f1), i.e., the four scenarios below, one per column: $\begin{array}{llll} 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} & 10 \to \{J\} \\ 11 \to \{J\} & 11 \to \{J\} & 11 \to \{J\} & 11 \to \{J\} \\ 12 \to \{J\} & 13 \to \{J\} & 12 \to \{J\} & \\ 13 \to \{J\} & & & \end{array}$

With such a semantics for the representational model, it is possible to define correct and closed algebraic operators as follows:
Definition 13 (Algebraic operators for indeterminate time (independent chronons semantics)).

Let r and s denote relations of the proper sort and $⟨v | t_1, t_2, t_3, t_4⟩$ a tuple with non-temporal part v and temporal part $t_1$, $t_2$, $t_3$, $t_4$. Algebraic operators between r and s are defined as in Fig. 1. The difference function (used in the definition of $-^{IT}$) can be defined as in Algorithm 1 (where, overloading notation, s is a function that returns the starting point of an interval and e returns its ending point).

Algorithm 1:

$difference(p_1, n_1, p_2, n_2)$

The difference function accepts as parameters two time intervals for the minuend ($p_1$ and $n_1$) and two time intervals for the subtrahend ($p_2$ and $n_2$). $p_1$ and $p_2$ are the possible intervals, i.e., they contain the chronons that are in at least one scenario, and $n_1$ and $n_2$ are the necessary (certain) intervals, i.e., they contain the chronons that are in every scenario (thus $n_1 \subseteq p_1$ and $n_2 \subseteq p_2$). The function is based on the following idea: a chronon of the minuend that does not belong to the subtrahend keeps its status, while, if a chronon is both in the minuend and in the subtrahend, and in the subtrahend such a chronon is (i) necessary (i.e., it belongs to $n_2$), it will not be in the result; (ii) only possible (i.e., it belongs to $p_2$ but not to $n_2$), it will be possible in the result. The first part of the difference function (lines 1–2) follows from (i) and the fact that $n_1 \subseteq p_1$; the second part (lines 3–7) follows from (ii); the third part (lines 8–17) follows from (i), (ii) and the fact that $n_2 \subseteq p_1$ (in particular, since $n_2 \subseteq p_1$, the minuend “breaks” into two (pairs of) intervals); the fourth part (lines 18–22) follows from (i), (ii) and the fact that $n_2 \not\subseteq p_1$.
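Algorithm 1 itself is not reproduced in this text. As an illustration only, the sketch below is not the authors' interval-based algorithm but a chronon-by-chronon rendering of the rule just described, assuming integer chronons and two value-equivalent tuples. It computes each chronon's status in the result and then re-encodes the result as one or more quadruples, each holding at most one block of necessary chronons. Splitting the result over several value-equivalent tuples assumes that, under the independent chronons semantics, such tuples are read as the union of their chronons; all names (`status`, `pack`, `chronon_difference`) are hypothetical.

```python
from typing import Dict, List, Tuple

Quad = Tuple[int, int, int, int]  # possible chronons [t1, t4), necessary [t2, t3)
ABSENT, POSSIBLE, NECESSARY = 0, 1, 2


def status(q: Quad, c: int) -> int:
    """Status of chronon c for a tuple under the independent chronons semantics."""
    t1, t2, t3, t4 = q
    if t2 <= c < t3:
        return NECESSARY
    if t1 <= c < t4:
        return POSSIBLE
    return ABSENT


def pack(result: Dict[int, int]) -> List[Quad]:
    """Re-encode chronon statuses as quadruples with at most one necessary block each."""
    quads: List[Quad] = []
    chronons = sorted(result)
    i = 0
    while i < len(chronons):
        j = i
        while j + 1 < len(chronons) and chronons[j + 1] == chronons[j] + 1:
            j += 1  # extend the maximal run of consecutive chronons
        run = chronons[i:j + 1]
        start, end = run[0], run[-1] + 1
        blocks: List[List[int]] = []  # half-open [b, e) blocks of necessary chronons
        for c in run:
            if result[c] == NECESSARY:
                if blocks and blocks[-1][1] == c:
                    blocks[-1][1] = c + 1
                else:
                    blocks.append([c, c + 1])
        if not blocks:
            quads.append((start, start, start, end))  # only possible chronons
        prefix = start
        for k, (b, e) in enumerate(blocks):
            t4 = end if k == len(blocks) - 1 else e  # last block takes the trailing run
            quads.append((prefix, b, e, t4))
            prefix = e  # possible chronons before the next block go with it
        i = j + 1
    return quads


def chronon_difference(r: Quad, s: Quad) -> List[Quad]:
    """Temporal part of r -IT s for two value-equivalent tuples (hypothetical helper)."""
    result: Dict[int, int] = {}
    for c in range(r[0], r[3]):  # only chronons possible in the minuend matter
        sr, ss = status(r, c), status(s, c)
        if sr == ABSENT or ss == NECESSARY:
            continue  # (i) necessary in the subtrahend: removed
        result[c] = POSSIBLE if ss == POSSIBLE else sr  # (ii) possible there: downgraded
    return pack(result)


print(chronon_difference((10, 10, 12, 14), (12, 12, 13, 13)))
# [(10, 10, 12, 12), (13, 13, 13, 14)]
print(chronon_difference((0, 0, 6, 6), (2, 2, 2, 4)))
# [(0, 0, 2, 2), (2, 4, 6, 6)]
```

In the second call, chronons that are only possible in the subtrahend split the necessary interval of the minuend, which is why the result may need more than one tuple.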

In such a context, the NEC and POSS operators can be defined as in Definition 14.

Definition 14 (NEC and POSS operators for indeterminate time (independent chronons semantics)).

Let r denote an indeterminate time relation and $⟨v | t_1, t_2, t_3, t_4⟩$ a tuple with non-temporal part v and temporal part $t_1$, $t_2$, $t_3$, $t_4$. $\begin{array}{ll} NEC(r) = & \{⟨v | t_2, t_2, t_3, t_3⟩ \mid ⟨v | t_1, t_2, t_3, t_4⟩ \in r \land t_2 < t_3\} \\ POSS(r) = & \{⟨v | t_1, t_1, t_1, t_4⟩ \mid ⟨v | t_1, t_2, t_3, t_4⟩ \in r\} \end{array}$

Roughly speaking, since the semantics of the representation is (sem1′′) above, the chronons in which a fact necessarily occurs/holds are the ones in $[t_2, t_3)$. A fact with a determinate time $[t_2, t_3)$ can be represented, in our formalism, by a tuple $⟨v | t_2, t_2, t_3, t_3⟩$, in which the intervals for possible chronons (i.e., $[t_2, t_2)$ and $[t_3, t_3)$) are empty, and the interval for necessary chronons is $[t_2, t_3)$. Analogously, the chronons in which the fact may occur (i.e., in which the fact possibly or necessarily occurs) are the ones in $[t_1, t_2) \cup [t_2, t_3) \cup [t_3, t_4)$, i.e., in $[t_1, t_4)$. A fact with determinate time $[t_1, t_4)$ can be represented, in our formalism, by a tuple $⟨v | t_1, t_1, t_1, t_4⟩$, in which the first interval for possible chronons (i.e., $[t_1, t_1)$) and the interval for necessary chronons (i.e., $[t_1, t_1)$) are empty, while the second interval for possible chronons is $[t_1, t_4)$. Notably, other semantically equivalent representations are possible.
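Definition 14 can be sketched in a few lines of Python. Assumptions: a relation is a list of `(v, (t1, t2, t3, t4))` pairs with integer chronons, and the function names `nec`/`poss` and the sample rows are hypothetical (the `Ann` row is invented to show a tuple with no necessary chronon). The output encodings follow Definition 14 literally.

```python
from typing import List, Tuple

Quad = Tuple[int, int, int, int]
Row = Tuple[Tuple[str, ...], Quad]  # (non-temporal part v, (t1, t2, t3, t4))


def nec(r: List[Row]) -> List[Row]:
    """NEC(r): <v | t2, t2, t3, t3> for every tuple with t2 < t3 (Definition 14)."""
    return [(v, (t2, t2, t3, t3)) for v, (t1, t2, t3, t4) in r if t2 < t3]


def poss(r: List[Row]) -> List[Row]:
    """POSS(r): <v | t1, t1, t1, t4> for every tuple (Definition 14)."""
    return [(v, (t1, t1, t1, t4)) for v, (t1, t2, t3, t4) in r]


sym = [
    (("John", "fever", "high"), (10, 10, 12, 14)),
    (("Mary", "fever", "moderate"), (11, 12, 14, 14)),
    (("Ann", "cough", "mild"), (3, 5, 5, 8)),  # hypothetical: never certain
]
print(nec(sym))   # the Ann tuple is dropped, since [5, 5) is empty
print(poss(sym))
```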

Property 1.
The algebraic operators in Definitions 13 and 14 are correct (with respect to the semantics defined so far) and are closed with respect to the representational model.

Fig. 2.
Cartesian product. Comparisons between our approach and TSQL2 as regards execution time (on the left) and I/O (on the right), considering datasets of different dimensions.
Proof (sketch).
Let us consider, for example, the case of Cartesian product. The closure of such an operation directly follows from its definition: the output is a set of tuples having as non-temporal values the concatenation of the non-temporal values being paired, and as temporal part a quadruple of values $t_1$, $t_2$, $t_3$, $t_4$ such that $t_1 ⩽ t_2 ⩽ t_3 ⩽ t_4$ and $t_1 < t_4$ (notably, in case $\max(t'_1, t''_1) ⩾ \min(t'_4, t''_4)$, the tuple is not part of the output). As regards correctness with respect to the semantics, we have to prove that, given any chronon $c \in D_T$,
a tuple $t = ⟨ v | t_{1}, t_{2}, t_{3}, t_{4} ⟩$ belongs to $NEC (r \times^{I T} s) (c)$ if and only if it belongs to $⋂_{f_{r} \in S (r^{I T}) \land f_{s} \in S (s^{I T})} (r_{f_{r}}^{I T} (c) \times^{C} s_{f_{s}}^{I T} (c))$

a tuple $t = ⟨ v | t_{1}, t_{2}, t_{3}, t_{4} ⟩$ belongs to $POSS (r \times^{I T} s) (c)$ if and only if it belongs to $⋃_{f_{r} \in S (r^{I T}) \land f_{s} \in S (s^{I T})} (r_{f_{r}}^{I T} (c) \times^{C} s_{f_{s}}^{I T} (c))$
(where $\times^{C}$ denotes Codd’s Cartesian product). For brevity, we consider only one direction of the proof: if a tuple $t = ⟨v | t_1, t_2, t_3, t_4⟩$ belongs to $NEC(r \times^{IT} s)(c)$, then it must belong to $\bigcap_{f_r \in S(r^{IT}) \land f_s \in S(s^{IT})} (r_{f_r}^{IT}(c) \times^{C} s_{f_s}^{IT}(c))$. By definition of NEC, all tuples in $NEC(r)$ are of the form $⟨v | t_2, t_2, t_3, t_3⟩$. If $t = ⟨v | t_2, t_2, t_3, t_3⟩$ and $t \in NEC(r \times^{IT} s)(c)$, we have that (i) $c$ must belong to $[t_2, t_3)$ and (ii) there must be a tuple $t' \in (r \times^{IT} s)$ such that $t' = ⟨v | t_1, t_2, t_3, t_4⟩$. Given (i), the semantics (sem1′′) implies that $c \to \{v\}$ belongs to all the scenarios of $(r \times^{IT} s)$. Since $t'$ belongs to $(r \times^{IT} s)$, by the definition of $\times^{IT}$ there must be a tuple $t_r = ⟨v_r | t_1^r, t_2^r, t_3^r, t_4^r⟩ \in r$ and a tuple $t_s = ⟨v_s | t_1^s, t_2^s, t_3^s, t_4^s⟩ \in s$ such that $v = v_r \cdot v_s$ and $t_2 = \max(t_2^r, t_2^s) \land t_3 = \min(t_3^r, t_3^s)$ (i.e., $[t_2, t_3) = [t_2^r, t_3^r) \cap [t_2^s, t_3^s)$). Since $c \in [t_2, t_3)$ and $[t_2, t_3) = [t_2^r, t_3^r) \cap [t_2^s, t_3^s)$, we have that $c \in [t_2^r, t_3^r)$ and $c \in [t_2^s, t_3^s)$. Since $c \in [t_2^r, t_3^r)$, given (sem1′′), $c \to \{v_r\}$ belongs to all the scenarios $f_r \in S(r^{IT})$ of $r^{IT}$. Since $c \in [t_2^s, t_3^s)$, given (sem1′′), $c \to \{v_s\}$ belongs to all the scenarios $f_s \in S(s^{IT})$ of $s^{IT}$.
As a consequence, given the definition of Codd’s Cartesian product, in each combination of scenarios for $r^{IT}$ and $s^{IT}$, $v = v_r \cdot v_s$ belongs to the output, so that $v = v_r \cdot v_s$ belongs to the intersection of all the outputs.

The proof in the other direction is similar, and is not reported for the sake of brevity. □
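The chronon-wise argument of the proof can be mirrored in a small sketch of the temporal part of $\times^{IT}$ for one pair of tuples. It is consistent with the characterization used above ($t_2 = \max$, $t_3 = \min$, pair dropped when $\max(t'_1, t''_1) ⩾ \min(t'_4, t''_4)$), but since Fig. 1 is not reproduced here, two details are assumptions: the possible span is taken as $[\max(t'_1, t''_1), \min(t'_4, t''_4))$, and a pair whose necessary intervals do not overlap is encoded as $⟨v | t_1, t_1, t_1, t_4⟩$. Function names are hypothetical.

```python
from typing import Optional, Set, Tuple

Quad = Tuple[int, int, int, int]  # possible [t1, t4), necessary [t2, t3)


def product_time(a: Quad, b: Quad) -> Optional[Quad]:
    """Temporal part of the pairing of two tuples in r x^IT s (hypothetical helper)."""
    t1, t4 = max(a[0], b[0]), min(a[3], b[3])  # intersection of possible spans
    if t1 >= t4:
        return None  # no common possible chronon: the pair is not in the output
    t2, t3 = max(a[1], b[1]), min(a[2], b[2])  # intersection of necessary spans
    if t2 >= t3:
        t2 = t3 = t1  # assumed encoding when no chronon is necessary
    return (t1, t2, t3, t4)


def necessary(q: Quad) -> Set[int]:
    return set(range(q[1], q[2]))


def possible(q: Quad) -> Set[int]:
    return set(range(q[0], q[3]))


a, b = (10, 10, 12, 14), (9, 11, 13, 16)
p = product_time(a, b)
print(p)  # (10, 11, 12, 14)
# NEC side of the proof: c is necessary in the product iff necessary in both inputs.
print(necessary(p) == necessary(a) & necessary(b))  # True
# POSS side: c is possible in the product iff possible in both inputs.
print(possible(p) == possible(a) & possible(b))  # True
```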
7. Experimental evaluations

As discussed in Section 3, the treatment of temporal indeterminacy involves managing implicit data during query answering (i.e., in the algebraic operators), so that AI symbolic manipulation techniques can be used. Specifically, in Section 6.2, we have defined new algebraic operators for indeterminate time, and we have proven their correctness with respect to the underlying semantics. In this section, we discuss some experimental evaluations showing that, though significantly more expressive, our approach adds only a negligible overhead with respect to TSQL2, in which only determinate (i.e., exact) valid time is considered (Chapter 18 of the TSQL2 book [20] does consider temporal indeterminacy, but it proposes no extension of the algebraic operators to cope with it).

Fig. 3.

Difference. Comparisons between our approach and TSQL2 as regards execution time (top left figure), I/O (top right figure), and answer set (bottom left figure), considering datasets of different dimensions.

We have implemented the data model and algebra discussed in Section 6.2, as well as the TSQL2 model and algebra for determinate time, using PL/pgSQL stored procedures in PostgreSQL, and we have experimentally evaluated the performance of our temporal algebra. Experiments have been performed on an Intel Core i7-6700HQ CPU, 2.60 GHz, 8 GB RAM, Windows 10, using the PostgreSQL 10 DBMS with default settings (effective cache size 500 MB, 8 KB page size and 500 MB shared buffers).

We have generated datasets of different sizes to test and compare the scalability of our approach with respect to TSQL2. In particular, we have focused our experiments on the algebraic operators that manipulate the temporal attributes, i.e., Cartesian product and difference. As regards Cartesian product in our approach, we have generated two tables T1 and T2 (of increasing size: 500, 1000, 2500, 5000, 10000, 25000, and 50000 tuples each) such that 10% of the tuples of T1 intersect (with tuples of T2) as regards the possible chronons and, among them, 50% intersect also as regards certain chronons. In TSQL2, the two tables T1′ (corresponding to T1) and T2′ (corresponding to T2) have been implemented with only two temporal attributes (modeling the start and the end of the valid time). They contain the same tuples as T1 and T2 as regards the non-temporal attributes, and have as valid time the possible valid time of the corresponding tuples in T1 and T2. The left part of Fig. 2 shows the execution times of Cartesian product in our approach (blue solid line) and in TSQL2 (red dotted line), expressed in milliseconds. The right part of Fig. 2 shows the physical I/O (number of blocks). The answer set is not shown, since it is the same for the two approaches. Notably, the I/O is slightly greater in our approach, because our data model has two additional temporal attributes (to model both possible and certain time) with respect to TSQL2 (in which only determinate time is modeled). This also explains the slight overhead in execution time, which is further due to the fact that computing the intersection requires more operations in our approach than in TSQL2.

As regards temporal difference, we have generated two tables T1 and T2 of increasing size (10000, 25000, 50000, 100000, 250000, 500000, 1000000 tuples) in such a way that 10% of the tuples in T1 have a value-equivalent counterpart (i.e., a tuple with the same values for the non-temporal attributes) in T2. Among them, 20% temporally intersect as regards possible time, and 10% intersect also as regards certain time. As above, the corresponding TSQL2 tables contain the same tuples as T1 and T2 as regards the non-temporal attributes, and have as valid time the possible valid time of the corresponding tuples in T1 and T2. Notably, the answer set of our approach is significantly larger than that of TSQL2. This is a natural consequence of the different semantics of data. In TSQL2, only certain time can be considered, so that, e.g., the difference between a time interval t and a time interval t′ containing t is empty. On the other hand, in case t and t′ are possible times, their difference is t (every chronon of t remains possible). As a consequence, the output of our approach can be larger. Given the larger answer set, and the additional memory needed to store four temporal attributes (instead of two, as in TSQL2), the I/O (and the execution time) of our approach is also higher than in TSQL2. Notably, however, Fig. 3 shows that the I/O and execution-time overhead of our approach is negligible.

To summarize, our experimental evaluations show that, while adding the possibility of modeling and querying indeterminate time, our AI-based approach only adds a negligible overhead to the “consensus” TSQL2 approach, in which only exact time is considered.

8. Conclusions and future work

In this paper, we propose an innovative AI approach in which a semantic-based AI-style methodology is applied to the context of TDBs in order to cope with temporal indeterminacy. Specifically:

We have proposed a new “functional” semantic definition for indeterminate time in TDBs, which allows us to express the semantics of temporal algebraic operators in terms of their Codd’s counterparts (thus formally providing a “snapshot semantics” for indeterminate time TDBs).

We have proposed a new AI-style methodology for the treatment of TDBs, using it to develop a semantically-grounded 1NF approach (data model plus algebra) to cope with “interval-based” temporal indeterminacy.

We have experimentally shown that our AI-based approach only adds a negligible overhead to the “consensus” TSQL2 approach, while adding significant additional expressiveness (i.e., indeterminate time with respect to exact time).

Indeed, in this paper we have extensively discussed the fact that, when introducing the temporal dimension, TDBs have to cope with implicit information, which has to be symbolically manipulated by algebraic operators to answer queries. As a consequence, we have proposed an innovative AI-based methodology to cope with time in relational DBs. We are confident that our methodology can be fruitfully applied to other types of temporal information in TDBs (e.g., “now-relative” data [2], implicit representation of periodically repeated data [23,24], temporal data with preferences [25]), and possibly to other forms of indeterminacy, thus leading to a new AI stream of research on indeterminate/implicit data in relational DBs.

Footnotes

BCDM semantics

BCDM (Bitemporal Conceptual Data Model) [15] is a unifying data model that isolates the “core” semantics underlying many temporal relational approaches, including TSQL2. In BCDM, tuples are associated with valid time and transaction time [22]. For both domains, a limited precision is assumed (the chronon is the basic time unit). Both time domains are totally ordered and isomorphic to subsets of the natural numbers. The domain of valid times $D_{VT}$ is given as a set $D_{VT} = \{c_1, \dots, c_k\}$ of chronons, and the domain of transaction times $D_{TT}$ is given as $D_{TT} = \{c'_1, \dots, c'_j\} \cup \{UC\}$ (where UC, Until Changed, is a distinguished value). In general, the schema of a BCDM relation $R = (A_1, \dots, A_n | T)$ consists of an arbitrary number of non-timestamp (explicit henceforth) attributes $A_1, \dots, A_n$, encoding some fact, and of a timestamp attribute $T$, with domain $D_{TT} \times D_{VT}$; the explicit attributes and the timestamp attribute are separated by the symbol “|”. Thus, a tuple $x = (v_1, \dots, v_n | t_b)$ in a BCDM relation $r(R)$ on the schema $R$ consists of a number of attribute values associated with a set of bitemporal chronons $c_{b_l} = (c'_h, c_i)$, with $c'_h \in D_{TT}$ and $c_i \in D_{VT}$, to denote that the fact $v_1, \dots, v_n$ is current (present in the database) at time $c'_h$ and valid at time $c_i$. Empty timestamps and value-equivalent tuples (i.e., tuples which are equal as regards the values of the non-temporal attributes [20]) are not admitted.

Valid-time, transaction-time and atemporal tuples are special cases, in which either the transaction time, or the valid time, or both of them are absent. In the following, we restrict our attention to valid time (in fact, temporal indeterminacy cannot affect transaction time). In BCDM, each tuple is directly associated with all the chronons in which it holds, so that the semantics of Example 1 above would be modeled as follows: $\begin{array}{l} \{⟨John, fever, high | \{10, 11, 12\}⟩, \\ ⟨Mary, fever, moderate | \{11, 12, 13\}⟩\} \end{array}$

Codd designated as complete any query language that was as expressive as his set of five relational algebraic operators: relational union (∪), relational difference (−), selection ( $σ_{P}$ ), projection ( $π_{X}$ ), and Cartesian product (×) [6]. BCDM generalizes these operators to cover bitemporal relations. BCDM dictates that, to support the “snapshot semantics”, temporal operators must behave as standard non-temporal operators on the non-temporal attributes, and apply set operators on the temporal component of tuples. Cartesian product involves the intersection of the temporal components, projection and union involve their union, and difference the difference of temporal components.
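A minimal sketch of these five operators for valid-time relations only, in the spirit of the description above. Assumptions (not part of BCDM's own formalization): chronons are integers, and a relation is a Python dict from the non-temporal part of a tuple to its set of valid-time chronons, which merges value-equivalent tuples and makes it easy to drop tuples whose timestamp becomes empty. Function names are hypothetical; the data are those of Example 1.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Values = Tuple[str, ...]
Relation = Dict[Values, FrozenSet[int]]  # hypothetical: non-temporal part -> chronons


def _nonempty(r: Relation) -> Relation:
    """Drop tuples with an empty timestamp, which BCDM does not admit."""
    return {v: ts for v, ts in r.items() if ts}


def select(r: Relation, pred: Callable[[Values], bool]) -> Relation:
    return {v: ts for v, ts in r.items() if pred(v)}  # timestamps untouched


def project(r: Relation, positions: Tuple[int, ...]) -> Relation:
    out: Relation = {}
    for v, ts in r.items():
        key = tuple(v[i] for i in positions)
        out[key] = out.get(key, frozenset()) | ts  # union of timestamps
    return out


def union(r: Relation, s: Relation) -> Relation:
    out = dict(r)
    for v, ts in s.items():
        out[v] = out.get(v, frozenset()) | ts  # union of timestamps
    return out


def difference(r: Relation, s: Relation) -> Relation:
    return _nonempty({v: ts - s.get(v, frozenset()) for v, ts in r.items()})


def product(r: Relation, s: Relation) -> Relation:
    return _nonempty({vr + vs: tr & ts  # intersection of timestamps
                      for vr, tr in r.items() for vs, ts in s.items()})


sym = {("John", "fever", "high"): frozenset({10, 11, 12}),
       ("Mary", "fever", "moderate"): frozenset({11, 12, 13})}
print(sorted(project(sym, (1,))[("fever",)]))  # [10, 11, 12, 13]
```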

This definition can be motivated by a sequenced snapshot semantics [8]: results should be valid independently at each point in time. This property is formally established by proving reducibility.

To prove reducibility, BCDM [15] first introduces timeslice operators. For the sake of simplicity, here we consider valid-time relations only, so that only the valid timeslice operator is introduced.

The valid timeslice operator, given a valid-time BCDM relation and a time t, retains only the tuples whose valid time contains t and removes their temporal part.
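The valid timeslice operator, and the reducibility property it supports, can be illustrated with a small sketch. Assumptions: integer chronons, a relation as a dict from non-temporal values to chronon sets (value-equivalent tuples merged), hypothetical function names, and a hypothetical second relation `s`; the first relation holds the data of Example 1. The loop checks reducibility for Cartesian product on a few chronons only, so it illustrates rather than proves the property.

```python
from typing import Dict, FrozenSet, Set, Tuple

Values = Tuple[str, ...]
Relation = Dict[Values, FrozenSet[int]]  # hypothetical: non-temporal part -> chronons


def valid_timeslice(r: Relation, t: int) -> Set[Values]:
    """Keep the tuples whose valid time contains t and drop their temporal part."""
    return {v for v, ts in r.items() if t in ts}


def temporal_product(r: Relation, s: Relation) -> Relation:
    """BCDM-style Cartesian product: concatenate values, intersect chronon sets."""
    pairs = {vr + vs: tr & ts for vr, tr in r.items() for vs, ts in s.items()}
    return {v: ts for v, ts in pairs.items() if ts}


r = {("John", "fever", "high"): frozenset({10, 11, 12}),
     ("Mary", "fever", "moderate"): frozenset({11, 12, 13})}
s = {("ward", "A"): frozenset({12, 13, 14})}  # hypothetical second relation

# Reducibility at each chronon: slicing the temporal result equals the
# non-temporal (Codd) product of the slices.
for t in range(9, 16):
    lhs = valid_timeslice(temporal_product(r, s), t)
    rhs = {vr + vs for vr in valid_timeslice(r, t) for vs in valid_timeslice(s, t)}
    assert lhs == rhs
print(sorted(valid_timeslice(r, 11)))
# [('John', 'fever', 'high'), ('Mary', 'fever', 'moderate')]
```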

References

1. J.F. Allen, Time and time again: The many ways to represent time, International Journal of Intelligent Systems 6(4) (1991), 341–355. doi:10.1002/int.4550060403.

2. Anselma, Piovesan, Sattar, Stantic and Terenziani, A comprehensive approach to “now” in temporal relational databases: semantics and representation, IEEE Transactions on Knowledge and Data Engineering 28(10) (2016), 2538–2551. doi:10.1109/TKDE.2016.2588490.

3. Anselma, Piovesan and Terenziani, A 1NF temporal relational model and algebra coping with valid-time temporal indeterminacy, J. Intell. Inf. Syst. 47(3) (2016), 345–374. doi:10.1007/s10844-015-0367-2.

4. Anselma, Terenziani and R.T. Snodgrass, Valid-time indeterminacy in temporal relational databases: semantics and representations, IEEE Transactions on Knowledge and Data Engineering 25(12) (2013), 2880–2894. doi:10.1109/TKDE.2012.199.

5. Chomicki and Toman, Temporal logic in information systems, in: Logics for Databases and Information Systems (the Book Grew Out of the Dagstuhl Seminar 9529: Role of Logics in Information Systems, 1995), Chomicki and Saake, eds, Kluwer, 1998, pp. 31–70.

6. E.F. Codd, Relational completeness of data base sublanguages, in: R. Rustin (ed.), Database Systems, Prentice Hall, pp. 65–98, and IBM Research Report RJ 987, San Jose, California, 1972.

7. Dekhtyar, R.B. Ross and V.S. Subrahmanian, Probabilistic temporal databases, I: Algebra, ACM Trans. Database Syst. 26(1) (2001), 41–95. doi:10.1145/383734.383736.

8. Dunn, Davey, Descour and R.T. Snodgrass, Sequenced subset operators: definition and implementation, in: ICDE, 2002, pp. 81–92.

9. Dutta, Generalized events in temporal databases, in: Proceedings of the Fifth International Conference on Data Engineering, 1989, pp. 118–125. doi:10.1109/ICDE.1989.47207.

10. Dyreson, Temporal indeterminacy, in: Encyclopedia of Database Systems, Liu and M.T. Özsu, eds, Springer, Boston, MA, 2009, pp. 2973–2976. ISBN 978-0-387-39940-9, 978-0-387-35544-3.

11. C.E. Dyreson and R.T. Snodgrass, Supporting valid-time indeterminacy, ACM Trans. Database Syst. 23(1) (1998), 1–57. doi:10.1145/288086.288087.

12. R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson, 2003.

13. E.A. Emerson, Temporal and modal logic, in: Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics (B), Elsevier, 1990, pp. 995–1072.

14. S.K. Gadia, S.S. Nair and Y.-C. Poon, Incomplete information in relational temporal databases, in: VLDB, 1992, pp. 395–406.

15. C.S. Jensen and R.T. Snodgrass, Semantics of Time-Varying Information, Information Systems 21 (1996), 311–352.

16. Liu and M.T. Özsu (eds), Encyclopedia of Database Systems, Springer, 2009. ISBN 978-0-387-35544-3.

17. L.E. McKenzie and R.T. Snodgrass, Evaluation of relational algebras incorporating the time dimension in databases, ACM Comput. Surv. 23(4) (1991), 501–543. doi:10.1145/125137.125166.

18. Minker, On indefinite databases and the closed world assumption, in: 6th Conference on Automated Deduction, Proceedings, New York, USA, June 7–9, 1982, D.W. Loveland, ed., Lecture Notes in Computer Science, Vol. 138, Springer, 1982, pp. 292–308. ISBN 978-3-540-11558-8. doi:10.1007/BFb0000066.

19. R.T. Snodgrass, Monitoring Distributed Systems: A Relational Approach, PhD thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 1982.

20. R.T. Snodgrass (ed.), The TSQL2 Temporal Query Language, Kluwer, 1995. ISBN 0-7923-9614-6.

21. R.T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 1999. ISBN 1-55860-436-7.

22. R.T. Snodgrass and Ahn, Temporal databases, IEEE Computer 19(9) (1986), 35–42. doi:10.1109/MC.1986.1663327.

23. Terenziani, Nearly periodic facts in temporal relational databases, IEEE Trans. Knowl. Data Eng. 28(10) (2016), 2822–2826. doi:10.1109/TKDE.2016.2585483.

24. Terenziani, Irregular indeterminate repeated facts in temporal relational databases, IEEE Trans. Knowl. Data Eng. 28(4) (2016), 1075–1079. doi:10.1109/TKDE.2015.2509976.

25. Terenziani, Andolina and Piovesan, Managing temporal constraints with preferences: Representation, reasoning, and querying, IEEE Trans. Knowl. Data Eng. 29(9) (2017), 2067–2071. doi:10.1109/TKDE.2017.2697852.

26. Vila, A survey on temporal reasoning in artificial intelligence, AI Commun. 7(1) (1994), 4–28. doi:10.3233/AIC-1994-7102.

27. Wu, Jajodia and X.S. Wang, Temporal database bibliography update, in: Temporal Databases, Dagstuhl, 1997, pp. 338–366. doi:10.1007/BFb0053709.

¹
A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain [12].