The distribution semantics in probabilistic logic programming and probabilistic description logics: a survey

Abstract

Representing uncertain information is crucial for modeling real world domains. This has been fully recognized both in the field of Logic Programming and of Description Logics (DLs), with the introduction of probabilistic logic languages and various probabilistic extensions of DLs respectively. Several works have considered the distribution semantics as the underlying semantics of Probabilistic Logic Programming (PLP) languages and probabilistic DLs (PDLs), and have then targeted the problem of reasoning and learning in them. This paper is a survey of inference, parameter and structure learning algorithms for PLP languages and PDLs based on the distribution semantics. A few of these algorithms are also available as web applications.

Keywords

Statistical relational learning · distribution semantics · probabilistic logic programming · probabilistic description logics

1 Introduction

Recently, in the field of Machine Learning, expressive knowledge representations have been considered to cope with a variable number of entities as well as the relationships that hold among them. These representations are mostly based on logic that provides a high expressivity useful in relational domains, and an excellent theoretical foundation for learning; however, it has limitations when reasoning on uncertain domains. These limitations have been lifted, allowing a multitude of different formalisms combining probability theory with logic, databases or logic programming. This direction has been taken, for instance, in the fields of Statistical Relational Learning (SRL) [37], starting from the nineties, for reasoning and learning in domains with complex and relational structure, and of probabilistic Description Logics (DLs), for reasoning and learning in ontologies.

Probabilistic Logic Programming (PLP) [80] has recently received increasing attention inside the SRL field for its ability to incorporate probability in logic programming, allowing one to exploit logic programming techniques. Among the various proposals, the one based on the distribution semantics [87] has gained popularity as the basis of languages such as Probabilistic Horn Abduction (PHA) [66], PRISM [88], Independent Choice Logic (ICL) [65], pD [33], Probabilistic Logic Programs [21], Logic Programs with Annotated Disjunctions (LPADs) [102], ProbLog [26] and CP-logic [101]. This semantics is particularly appealing for its intuitiveness and because efficient inference algorithms have started to appear [26, 70]. In many cases they find explanations for queries and compute their probability by building a Binary Decision Diagram (BDD) [26, 84]. Subsequently, various works addressed the problem of learning the parameters of probabilistic logic programs under the distribution semantics, such as LeProbLog [36] and LFI-ProbLog [37] for the ProbLog language, and of learning their structure directly from data, such as [25] for ProbLog and [60] for ground LPADs.

This paper surveys several inference and learning systems for PLP languages under the distribution semantics, focusing on those developed for LPADs. As for inference, we cover in detail an algorithm for lifted inference [9] and implementations of the “maximum-a-posteriori” and “most probable explanation” inference tasks [7] in the PITA reasoner [85]. As for parameter and structure learning, we describe the EMBLEM [12], SLIPCASE [11], SLIPCOVER [13] and LEMUR [30] algorithms. EMBLEM learns the parameters of LPADs by means of the BDDs that are built for inference. SLIPCASE, SLIPCOVER and LEMUR learn the structure of LPADs, relying on EMBLEM for parameter learning.

PLP languages are a suitable framework to handle uncertain information, but usually require expensive inference and learning procedures. For this reason, in the last decade many languages that impose limitations on the form of sentences have been proposed. A possible way to pursue this goal is the application of learning from interpretations [14, 24] instead of the classical setting of learning from entailment. A system that learns from interpretations is Inductive Constraint Logic (ICL) [23], based on the language of Constraint Logic Theories, i.e. models in the form of sets of integrity constraints. [78, 81] propose a probabilistic version of the integrity constraints and an algorithm, PASCAL, that learns them.

The adoption of logical-statistical languages has been particularly useful for modeling domains such as social network analysis, link prediction, entity recognition, collective classification and information extraction, to name a few. In fact, the datasets used for testing algorithms for those languages (see the above-mentioned publications) often represent domains of this kind.

Formalisms for dealing with uncertainty have started to play an important role also in research related to the Semantic Web, where knowledge may come from different sources with different reliability. The main idea of the Semantic Web is making information available in a form that is understandable and automatically manageable by machines [42]. In order to realize this vision, the W3C has supported the development of a family of knowledge representation formalisms of increasing complexity for defining ontologies, called Web Ontology Language (OWL). Ontologies have played a decisive part in the development of the Semantic Web as a means for defining shared terms in web resources, and efficient reasoners [38, 92] are used to extract implicit information from the modeled ontologies. In particular, representation and reasoning with uncertain information has been investigated by various authors both in the general case of First-Order Logic and in the case of restricted logics, such as Description Logics. DLs are perhaps best known as the basis for ontology languages such as OWL [45]. Description Logics possess nice computational properties such as decidability and/or low complexity [3]. In this direction, the distribution semantics and the inference techniques developed for PLP have been applied to DLs, with the proposal of a “distribution semantics for probabilistic ontologies” called DISPONTE [74, 104], which annotates the axioms of a knowledge base (KB) with a probability. Several reasoners were then developed for computing the probability of queries from uncertain KBs by encoding their explanations in BDDs or as a pinpointing formula: BUNDLE [76, 83], TRILL [105], TRILL^P [105], TORNADO [106]. The implementations of all the previously mentioned systems have been subjected to an extensive experimental evaluation on benchmarks and real-life databases, showing that they can tackle problems of realistic size, often with better performance and lower running times than other state-of-the-art tools (see the respective publications for details about performance comparison). All systems are publicly available at https://ml.unife.it/software/ and many of them also as web applications.

The paper is organized as follows. Section 2 introduces background on First-Order Logic. Section 3 presents the distribution semantics for Probabilistic Logic Programs and Probabilistic Description Logics. Section 4 surveys inference and learning systems for LPADs. Section 5 surveys inference and learning in DLs under DISPONTE. Section 6 concludes the paper.

2 Background on First-Order Logic (FOL)

For a more detailed background on FOL we refer the reader to [55]. A first-order alphabet Σ is a set of predicate symbols and function symbols (or functors) together with their arity. A functor with arity 0 is called a constant. A term is either a variable or a functor applied to a tuple of terms of length equal to the arity of the functor. An atom A is a predicate symbol applied to a tuple of terms of length equal to the arity of the predicate. A literal L is either an atom A or its negation ∼A; in the latter case it is called a negative literal. Here we use the logic programming conventions of indicating predicates and constants with alphanumeric strings starting with a lowercase character and variables with alphanumeric strings starting with an uppercase character. The connectives are ∼ (negation), ∧ (conjunction), ∨ (disjunction), → (implication) and ↔ (equivalence), while the quantifiers are ∃ (existential) and ∀ (universal). A (well-formed) formula is defined inductively: an n-ary predicate symbol applied to a tuple of n terms (i.e., an atom) is a formula; if F and G are formulas, so are ∼F, F ∧ G, F ∨ G, F → G, F ↔ G; if F is a formula and X is a variable, then ∃X F ("there exists an X") and ∀X F ("for all X") are formulas. A clause is a formula of the form ∀X₁ … ∀X_s (L₁ ∨ … ∨ L_m), where each L_i is a literal and X₁, …, X_s are all the variables occurring in L₁ ∨ … ∨ L_m. A special clause notation is used in logic programming: ∀X₁ … ∀X_s (A₁ ∨ … ∨ A_k ∨ ∼B₁ ∨ … ∨ ∼B_n), where the A_i and B_i are atoms and X₁, …, X_s are all the variables occurring in them, is represented as A₁, …, A_k ← B₁, …, B_n. A normal logic program is a program containing clauses (called normal clauses) of the form A ← B₁, …, B_n, where A is an atom and the B_i are literals, i.e., atoms or negations of atoms. A unit clause or fact is a clause of the form A ←, that is, a clause with an empty body. A goal or query is a clause of the form ← B₁, …, B_n; each B_i is called a subgoal of the goal.

A term, atom, literal, clause or query is ground if it does not contain variables. A substitution θ is an assignment of variables to terms: θ = {V₁/t₁, . . . , V_n/t_n}. The application of a substitution to a term, atom, literal, query or clause C, indicated with Cθ, is the replacement of the variables appearing in C and in θ with the terms specified in θ.

3 The Distribution Semantics

The distribution semantics [87] is one of the most interesting approaches to the integration of logic programming and probability. It was introduced for the PRISM language [88] but is shared by many other languages (see Introduction). A program in one of these languages defines a probability distribution over normal logic programs called worlds. Each normal program is assumed to have a total well-founded model [98], thus each program can be associated with a Herbrand interpretation (a world) that is its model. The distribution is extended to queries and the probability of a query is obtained by marginalizing the joint distribution of the query and the programs. The distribution semantics has been defined both for programs that do not contain function symbols, and thus have a finite set of worlds, and for programs that contain them, that have an infinite set of worlds. We review here the first case for the sake of simplicity. For the treatment of function symbols see [86]; see also examples 40 and 41 of [72].

Languages relying on the distribution semantics differ in the way they define the distribution over logic programs. Probabilistic Logic Programs, PHA, ICL, PRISM allow probability distributions over facts, ProbLog allows probability distributions over facts and the head of clauses, while LPADs allow probability distributions over the heads of disjunctive clauses. All these languages have the same expressive power: there are transformations with linear complexity that can convert each one into the others [22, 100].

In this paper we will use LPADs [102] for their general syntax. In LPADs the alternatives are encoded in the head of clauses in the form of a disjunction in which each atom is annotated with a probability. We consider only sound LPADs, in which every possible world has a two-valued well-founded model, i.e. a total model according to the well-founded semantics [98]. In this way, uncertainty is modeled only by means of the disjunctions in the head and not by the semantics of negation. LPADs without negation in clauses’ bodies are sound. When a program is not sound, a different semantics is required, such as the three-valued semantics proposed in [39] or the credal semantics [57, 58]. Formally, a Logic Program with Annotated Disjunctions consists of a finite set of annotated disjunctive clauses. An annotated disjunctive clause C_i is of the form $h_{i1} : Π_{i1}; \dots; h_{in_{i}} : Π_{in_{i}} : - b_{i1}, \dots, b_{im_{i}},$ where h_{i1}, …, h_{in_i} are logical atoms and Π_{i1}, …, Π_{in_i} are real numbers in the interval [0, 1] such that $\sum_{k = 1}^{n_{i}} Π_{ik} \leq 1$; b_{i1}, …, b_{im_i} is indicated with body (C_i). If $\sum_{k = 1}^{n_{i}} Π_{ik} < 1$, the head implicitly contains an extra atom null that does not appear in the body of any clause and whose annotation is $1 - \sum_{k = 1}^{n_{i}} Π_{ik}$. We denote by ground (T) the grounding of an LPAD T.

An atomic choice [65] is a triple (C_i, θ_j, k) where C_i ∈ T, θ_j is a substitution that grounds C_i and k ∈ {1, …, n_i} identifies one of the atoms in the head of C_i. (C_i, θ_j, k) means that, for the ground clause C_i θ_j, the head h_ik was chosen. In practice C_i θ_j corresponds to a multi-valued random variable X_ij and an atomic choice (C_i, θ_j, k) to an assignment X_ij = k. A set of atomic choices κ is consistent if (C, θ, i) ∈ κ, (C, θ, j) ∈ κ ⇒ i = j, i.e., only one head is selected from a ground clause; we assume independence between the different choices. A composite choice κ is a consistent set of atomic choices. The probability P (κ) of a composite choice κ is the product of the probabilities of the independent atomic choices, i.e. $P (κ) = \prod_{(C_{i}, θ_{j}, k) \in κ} Π_{ik}$ . A selection σ is a composite choice that, for each clause C_i θ_j in ground (T), contains an atomic choice (C_i, θ_j, k). Since T does not contain function symbols, ground (T) is finite and so is each σ. Let S_T be the set of all selections σ of a program T. A selection σ identifies a normal logic program w_σ, called a world of T (an instance of the LPAD in [102]), as w_σ = {(h_ik ← body (C_i)) θ_j| (C_i, θ_j, k) ∈ σ}.

Let W_T be the set of all worlds of T. Since selections are composite choices, we can assign a probability to worlds: $P (w_{σ}) = P (σ) = \prod_{(C_{i}, θ_{j}, k) \in σ} Π_{ik} .$ (1)

Let P (W_T) be the distribution over worlds; we also write w_σ ⊨ Q to mean that the query Q is true in the well-founded model of the program w_σ. The probability of a query Q given a world w is P (Q|w) = 1 if w ⊨ Q and 0 otherwise. The probability of a query Q can be defined by marginalizing the joint probability of the query and the worlds: $P (Q) = \sum_{w \in W_{T}} P (Q, w) = \sum_{w \in W_{T}} P (Q | w) P (w) = \sum_{w \in W_{T} : w ⊨ Q} P (w)$ (2) Here P (w) is computed as in Eq. 1, as the product of the annotations Π_ik of the atoms selected in σ.

Example 1. The following LPAD encodes a very simple model of the development of an epidemic or pandemic:

$\begin{matrix} C_{1} & = & epidemic : 0.6; pandemic : 0.3 : - flu (X), cold . \\ C_{2} & = & cold : 0.7 . \\ C_{3} & = & flu (david) . \\ C_{4} & = & flu (robert) . \end{matrix}$

If somebody has the flu and the climate is cold, there is the possibility that an epidemic or a pandemic arises. We are uncertain about whether the climate is cold but we know for sure that David and Robert have the flu. Clause C₁ has two groundings, C₁ θ₁ with θ₁ = {X/david} and C₁ θ₂ with θ₂ = {X/robert}, corresponding to the multi-valued random variables X₁₁ and X₁₂. Clause C₂ has only one grounding, C₂ θ, so there is one random variable X₂₁. X₁₁ and X₁₂ can take on three values each, since C₁ has three disjuncts in the head (‘epidemic’, ‘pandemic’ and ‘null’); similarly, X₂₁ can take on two values, since C₂ has two disjuncts in the head (‘cold’ and ‘null’). T has 3 · 3 · 2 = 18 possible worlds; the query Q = epidemic is true in 5 of them and its probability is P (epidemic) = 0.6 · 0.6 · 0.7 + 0.6 · 0.3 · 0.7 + 0.6 · 0.1 · 0.7 + 0.3 · 0.6 · 0.7 + 0.1 · 0.6 · 0.7 = 0.588. This sum takes into account the 5 worlds where the query is true. For instance, the first term 0.6 · 0.6 · 0.7 corresponds to the world where the atom ‘epidemic’ was chosen from the head of C₁ θ₁ and from the head of C₁ θ₂ (each with probability 0.6), and ‘cold’ was chosen from the head of C₂ θ (with the associated probability 0.7). So world w₁ is:

$\begin{matrix} epidemic & : - & flu (david), cold . \\ epidemic & : - & flu (robert), cold . \\ cold . \\ flu (david) . \\ flu (robert) . \end{matrix}$ and P (w₁) =0.6 · 0.6 · 0.7 = 0.252. The same operation can be repeated for the other 4 worlds.
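The computation of Example 1 can be reproduced by brute-force enumeration of the selections, as in the following Python sketch. The dictionary layout, the names `choices`, `worlds` and `entails`, and the hand-coded entailment test are assumptions made for this illustration; a real system would evaluate each world with a logic programming engine.

```python
from itertools import product

# Ground probabilistic clauses of Example 1 with their head alternatives;
# "null" receives the probability mass not assigned in the head.
choices = {
    "C1_david": {"epidemic": 0.6, "pandemic": 0.3, "null": 0.1},
    "C1_robert": {"epidemic": 0.6, "pandemic": 0.3, "null": 0.1},
    "C2": {"cold": 0.7, "null": 0.3},
}

def worlds():
    """Yield (selection, probability) for every selection, as in Eq. 1."""
    names = list(choices)
    for heads in product(*(choices[n] for n in names)):
        selection = dict(zip(names, heads))
        prob = 1.0
        for name, head in selection.items():
            prob *= choices[name][head]
        yield selection, prob

def entails(selection, atom):
    """Hand-coded test, valid only for atom in {'epidemic', 'pandemic'}:
    flu(david) and flu(robert) are certain, so each body reduces to cold."""
    cold = selection["C2"] == "cold"
    return cold and atom in (selection["C1_david"], selection["C1_robert"])

all_worlds = list(worlds())
# Eq. 2: sum the probabilities of the worlds where the query is true.
p_epidemic = sum(p for s, p in all_worlds if entails(s, "epidemic"))
n_true = sum(1 for s, _ in all_worlds if entails(s, "epidemic"))
```

Under these assumptions the enumeration visits 18 selections, 5 of which make the query true, and `p_epidemic` agrees with the value 0.588 up to floating-point rounding.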

In general it is infeasible to enumerate all the worlds where a query Q is entailed, since their number grows exponentially with the number of ground probabilistic clauses, so Eq. 2 cannot be applied directly in practice. Instead, inference algorithms find explanations for Q, i.e. composite choices κ such that Q is entailed in all the worlds whose selections are supersets of κ. Explanations, however, unlike possible worlds, are not necessarily mutually exclusive, so they must first be made disjoint before a summation (as in Eq. 2) can be computed. Binary Decision Diagrams have been extensively used in PLP [26, 84] for performing inference effectively since, thanks to their structure, they make the explanations disjoint. A BDD is a rooted, directed acyclic graph representing a function taking Boolean values on a set of Boolean variables; it has two terminal nodes, labeled 0 and 1, and a set of variable nodes. Each variable node is associated with the variable of its level and has two children, one for each possible value of the variable. Given values for all the variables, we can compute the value of the function by traversing the graph starting from the root and returning the value associated with the leaf that is reached. By means of a dynamic programming algorithm [26] applied over these diagrams, the probability of the query is obtained at the root of the BDD.
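To give an intuition of how a decision diagram makes explanations disjoint, the sketch below computes the probability of a disjunction of explanations by Shannon expansion with memoization, which is the decomposition underlying the dynamic programming over BDDs. It is only an illustration under simplifying assumptions: all random variables are Boolean and independent, no BDD package is used, and the function name `query_probability` is hypothetical. Multi-valued LPAD variables would first need a Boolean encoding, which is not shown.

```python
from functools import lru_cache

def query_probability(explanations, probs):
    """Probability of a disjunction of explanations.

    explanations: list of dicts {variable: required truth value}
    probs: dict {variable: probability that the variable is true};
           it must cover every variable used in the explanations.
    """
    order = sorted(probs)  # fixed variable order, as in an ordered BDD

    def restrict(formula, var, value):
        """Condition the set of explanations on var = value."""
        result = set()
        for expl in formula:
            assignment = dict(expl)
            if var in assignment:
                if assignment[var] != value:
                    continue  # explanation contradicted: drop it
                del assignment[var]
            result.add(frozenset(assignment.items()))
        return frozenset(result)

    @lru_cache(maxsize=None)
    def prob(formula, level):
        if frozenset() in formula:
            return 1.0  # some explanation is fully satisfied (leaf 1)
        if not formula:
            return 0.0  # no explanation can still hold (leaf 0)
        var = order[level]
        p = probs[var]
        high = prob(restrict(formula, var, True), level + 1)
        low = prob(restrict(formula, var, False), level + 1)
        # The two branches are mutually exclusive, so their weighted
        # contributions can be summed.
        return p * high + (1.0 - p) * low

    formula = frozenset(frozenset(e.items()) for e in explanations)
    return prob(formula, 0)

# Two overlapping explanations: {a = true} and {b = true}.
p = query_probability([{"a": True}, {"b": True}], {"a": 0.4, "b": 0.3})
```

Simply adding the probabilities of the two explanations would give 0.7 and count the case in which both hold twice; the expansion yields 0.4 + 0.6 · 0.3 instead.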

The application of the distribution semantics for probabilistic logic programming to DLs gave rise to a novel semantics for probabilistic DLs, named DISPONTE (DIstribution Semantics for Probabilistic ONTologiEs) [8, 104].

DLs are usually represented using a syntax based on concepts and roles. A concept corresponds to a set of individuals of the domain while a role corresponds to a set of pairs of individuals of the domain. A probabilistic knowledge base $\mathcal{K}$ is a set of certain axioms and/or probabilistic axioms. Certain axioms take the form of regular DL axioms. Probabilistic axioms take the form p :: E, where p is a real number in [0, 1] and E is a DL axiom.

The idea of DISPONTE is to associate independent Boolean random variables with the probabilistic axioms. By assigning values to every random variable we obtain a world, the set of axioms whose random variables are assigned the value 1. Every formula obtained from a certain axiom is included in a world w. For each probabilistic axiom, we decide whether to include it or not in w. A world therefore is a non-probabilistic KB that can be assigned a semantics in the usual way. A query is entailed by a world if it is true in every model of the world. The probability p can be interpreted as an epistemic probability, i.e., as the degree of our belief in the axiom E. For example, a probabilistic concept membership axiom p :: a : C means that we have degree of belief p in individual a belonging to concept C.

To illustrate the semantics, we mirror the definitions given for the case of PLP. An atomic choice is a pair (E_i, k) where E_i is the ith probabilistic axiom and k ∈ {0, 1}; k indicates whether E_i is included in a world (k = 1) or not (k = 0). A set of atomic choices containing one choice for each probabilistic axiom (a selection σ) identifies a world w including all certain axioms and all probabilistic axioms that have been selected. Let W be the set of all worlds. P (w) is a probability distribution over worlds, i.e., $\sum_{w \in W} P (w) = 1$. The probability of a query Q can be defined, as usual, as the sum of the probabilities of the worlds where the query is true. A query Q over a KB $\mathcal{K}$ is usually an axiom for which we want to test the entailment from the KB, i.e. $\mathcal{K}$ ⊨ Q. The probability of each world is given by: $P (w) = P (σ) = \prod_{(E_{i}, 1) \in σ} p_{i} \prod_{(E_{i}, 0) \in σ} (1 - p_{i})$, where σ is the selection identifying w. The probability is obtained by multiplying the probabilities associated with each axiom as these are considered independent of each other.

Example 2. Consider the following KB, inspired by the “people+pets” ontology proposed in [5]. Suppose the user has the certain knowledge

$\begin{matrix} \exists hasAnimal . Pet ⊑ NatureLover \\ (kevin, fluffy) : hasAnimal \\ Cat ⊑ Pet \end{matrix}$ indicating that the individuals that own an animal which is a pet are nature lovers, kevin owns the animal fluffy and cats are pets. Moreover, there are two sources with different reliability that provide the information that fluffy is a cat. On one source the user has a degree of belief of 0.4, i.e., he thinks it is correct with a 40% probability, while on the other one he has a degree of belief of 0.3. The user can add the following statements to his KB: E₁ = 0.4 :: fluffy : Cat and E₂ = 0.3 :: fluffy : Cat, representing independent evidence on fluffy being a cat.

The query axiom Q = kevin : NatureLover is true in 3 out of the 4 worlds, those corresponding to the selections {(E₁, 1) , (E₂, 1)} , {(E₁, 1) , (E₂, 0)} , {(E₁, 0) , (E₂, 1)} . So P (Q) =0.4 · 0.3 + 0.4 · 0.7 + 0.6 · 0.3 = 0.58 .
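The same value can be obtained from the explanations of the query rather than from the worlds. As a worked illustration (the names κ₁ and κ₂ are introduced here only for this purpose), Q has the two explanations κ₁ = {(E₁, 1)} and κ₂ = {(E₂, 1)}. They are not mutually exclusive, so their probabilities cannot simply be added (that would give 0.7); splitting on E₁ makes them disjoint:

$P (Q) = P (E_{1}) + (1 - P (E_{1})) \cdot P (E_{2}) = 0.4 + 0.6 \cdot 0.3 = 0.58 = 1 - (1 - 0.4) (1 - 0.3)$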

Finally, [73] applies DISPONTE to Datalog+/-, a variant of Datalog for defining ontologies [17]. Here two types of probabilistic annotations are considered, the epistemic type and the statistical type.

A probabilistic ontology (D, T) consists of a database D and a set T of certain formulas, that take the form of a Datalog+/- TGD (tuple-generating dependency), NC (negative constraint) or EGD (equality-generating dependency), of epistemic probabilistic formulas of the form p_i ::_e F_i where p_i is a real number in [0, 1] and F_i is a TGD, NC or EGD, and of statistical probabilistic formulas of the form p_i ::_s F_i where p_i is a real number in [0, 1] and F_i is a TGD. The epistemic probability (indicated by subscript e) represents a degree of confidence in the formula as a whole, while the statistical probability (indicated by subscript s) considers the populations to which the formula is applied. These two types of statements can be related to the work of Halpern [41]: an epistemic statement is a Type 2 statement and a statistical one is a Type 1 statement. For instance, an epistemic probabilistic concept inclusion TGD of the form p ::_e c (X) → d (X) represents the fact that we believe in the truth of c ⊑ d with probability p. A statistical probabilistic concept inclusion TGD of the form p ::_s c (X) → d (X) instead means that a random individual of class c has probability p of belonging to d, thus representing the statistical information that a fraction p of the individuals of c belongs to d.

To obtain a world w of a probabilistic ontology (D, T), every certain formula is included in w; each epistemic formula can be either included in w or not; for each statistical formula, all the substitutions for the variables universally quantified in the outermost quantifier are generated and for each grounding it is decided whether or not to include it in w. A query in this case is a Boolean Conjunctive Query (BCQ) of the form q = ∃Y Φ(Y), where Φ(Y) is a conjunction of atoms having as arguments variables Y and constants; Boolean means that q has arity 0. The probability of a BCQ q can be computed in practice from its set of explanations: a composite choice κ is an explanation for q if q is entailed by the database D together with every world compatible with κ.

Several works investigate inference in probabilistic databases or probabilistic ontologies. [50] enriches DL concept assertions and role assertions with probabilities in a semantics "that is similar to well-known probabilistic versions of Datalog". [15] introduces a refinement of OpenPDBs (Open Probabilistic DataBases) using Datalog+/- ontologies to express additional background knowledge. [18] also investigates inference in probabilistic databases by relying on the possible worlds semantics. The authors consider ontology-mediated queries (OMQs), which enrich the class of unions of conjunctive queries (UCQs) with ontological rules based on Datalog+/-; the tasks investigated there, however, are most probable explanation and maximum-a-posteriori inference.

4 Inference and Learning for Probabilistic Logic Programs under the distribution semantics

As there are transformations with linear complexity that can convert each language under the distribution semantics into the others, the algorithms presented in the following subsections, specifically focusing on LPADs, can be applied to other PLP languages.

4.1 Inference

Lifted Inference. Research in probabilistic logic languages has made it very clear that it is crucial to design models that can support efficient inference while preserving intensional and declarative modeling. The idea is to take advantage of the regularities in structured models to decrease the number of operations. Lifted inference [28, 96] is one of the major advances in this respect. Originally, the idea was proposed as an extension of variable elimination (VE) to solve a probabilistic query without grounding the model.

[9] performs efficient inference via lifted VE for PLP languages under the distribution semantics. To support reasoning compliant with the distribution semantics, two novel operators (named heterogeneous lifted multiplication and sum) are introduced in the Prolog Factor Language of [35], and the GC-FOVE algorithm [93] is modified for computing them. The resulting system, called LP² (for "Lifted Probabilistic Logic Programming"), allows one to perform inference in time linear in the number of individuals of the program domain. An experimental comparison with ProbLog2 [31] and PITA can be found in [9], while a survey and experimental comparison with C-FOVE-AP [93] and WFOMC (Weighted First-Order Model Counting [95]) is presented in [82]. C-FOVE-AP uses a representation formalism based on undirected graphical models, whereas WFOMC is presented for ProbLog, a PLP language based on the distribution semantics.

MPE/MAP Inference. In PLP the most commonly studied inference task is to compute the marginal probability of a ground query atom Q given evidence e on a subset of the other atoms, P (Q|e). In the absence of e, this is also known as the success probability P (Q) of a query, defined as the sum of the probabilities of all the worlds that entail Q (see Sec. 3). However, two other important tasks are the maximum-a-posteriori (MAP) and the most probable explanation (MPE) tasks. In general terms, given a joint probability distribution over a set of random variables, values for a subset of the variables (evidence), and another disjoint subset of the variables (query), the MAP problem consists of finding the most probable values for the query variables given the evidence. The MPE problem is the MAP problem where the set of query variables is the complement of the set of evidence variables. In the following, several techniques for addressing these tasks for PLP languages under the distribution semantics are presented. As LPADs offer a general syntax we will refer to them without loss of generality.
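In symbols, and using notation introduced here only for illustration ($\mathbf{X}$ for the set of all random variables, $\mathbf{E} \subseteq \mathbf{X}$ for the evidence variables with observed values e, and $\mathbf{Y} \subseteq \mathbf{X} \setminus \mathbf{E}$ for the query variables), the two tasks can be written as

$\mathrm{MAP}: \ y^{*} = \arg\max_{y} P (\mathbf{Y} = y \mid \mathbf{E} = e), \qquad \mathrm{MPE}: \ \text{the same problem with} \ \mathbf{Y} = \mathbf{X} \setminus \mathbf{E} .$

Since P (E = e) does not depend on y, maximizing the conditional probability is equivalent to maximizing the joint probability P (Y = y, E = e).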

[7] presents an algorithm, included in the PITA reasoner, which solves these tasks by means of BDDs on LPADs.

Example 3. According to the syntax proposed in [7], the query random variables in a LPAD are indicated by prepending map_query to the probabilistic rules. Given the program of Example 1 where all the random variables are query variables: $\begin{matrix} map_query epidemic : 0.6; pandemic : 0.3 : - flu (_), cold . \\ map_query cold : 0.7 . \\ ev : - epidemic . \\ flu (david) . \\ flu (robert) . \end{matrix}$ the evidence ev has the MPE assignment:

rule(0,epidemic,[epidemic:0.6,pandemic:0.3,'':0.1],(flu(robert),cold))
rule(0,epidemic,[epidemic:0.6,pandemic:0.3,'':0.1],(flu(david),cold))
rule(1,cold,[cold:0.7,'':0.3],true)

where predicate rule/4 specifies clause number (zero-based), selected head, ground clause head, ground clause body, in that order. For this assignment, P = 0.252, meaning that the most probable explanation is the one corresponding to the MPE assignment above with a probability of 0.252, given the evidence ‘epidemic’. Note that this assignment corresponds to the world w₁ reported in Example 1: in fact, the MPE problem can be expressed as taking the truth of some atoms as evidence and finding the world that has the highest probability among those that entail the evidence. In this case, among the 5 worlds that entail ‘epidemic’, the one with the highest probability has probability 0.252.

Instead, given the program: $\begin{matrix} epidemic : 0.6; pandemic : 0.3 : - flu (_), cold . \\ map_query cold : 0.7 . \\ ev : - epidemic . \\ flu (david) . \\ flu (robert) . \end{matrix}$ where only a subset of the variables are query variables (map_query is used only for the second clause), the evidence ev has the MAP assignment: rule(1,cold,[cold:0.7,'':0.3],true) with probability, given the evidence, of 0.588. The assignment shows only the clauses preceded by map_query.
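The two assignments of Example 3 can be checked by exhaustive enumeration, as in the Python sketch below. This is not the BDD-based algorithm of [7]; it is a brute-force illustration in which the names `choices`, `evidence_holds` and `best_assignment` and the hand-coded evidence test are assumptions. The value returned with each assignment is the joint probability of the assignment and the evidence.

```python
from itertools import product

# Ground probabilistic clauses of the program; "null" is the implicit head.
choices = {
    "C1_david": {"epidemic": 0.6, "pandemic": 0.3, "null": 0.1},
    "C1_robert": {"epidemic": 0.6, "pandemic": 0.3, "null": 0.1},
    "C2": {"cold": 0.7, "null": 0.3},
}

def selections():
    """Yield (selection, probability) for every selection."""
    names = list(choices)
    for heads in product(*(choices[n] for n in names)):
        sel = dict(zip(names, heads))
        p = 1.0
        for name, head in sel.items():
            p *= choices[name][head]
        yield sel, p

def evidence_holds(sel):
    """Hand-coded for this program: ev :- epidemic, and epidemic needs
    cold plus at least one grounding of the first clause choosing it."""
    heads = (sel["C1_david"], sel["C1_robert"])
    return sel["C2"] == "cold" and "epidemic" in heads

def best_assignment(query_vars):
    """Return the assignment to query_vars with the highest joint
    probability with the evidence, summing out all other variables."""
    totals = {}
    for sel, p in selections():
        if evidence_holds(sel):
            key = tuple((v, sel[v]) for v in query_vars)
            totals[key] = totals.get(key, 0.0) + p
    best = max(totals, key=totals.get)
    return dict(best), totals[best]

# MPE: every random variable is a query variable.
mpe_assignment, mpe_prob = best_assignment(list(choices))
# MAP: only the variable of the clause for cold is a query variable.
map_assignment, map_prob = best_assignment(["C2"])
```

Under these assumptions the MPE assignment selects ‘epidemic’ in both groundings and ‘cold’, with value 0.252, while the MAP assignment selects ‘cold’ with value 0.588, in agreement with the figures reported above.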

[7] provides an experimental comparison with the version of ProbLog presented in [91], which supports annotated disjunctions in the head of clauses (as LPADs do) and can perform the MPE (and MAP) task as well. For answering MAP queries, ProbLog uses a different strategy, resorting to Decision Theoretic ProbLog (DTProbLog) [97], which exploits Algebraic Decision Diagrams.

4.2 Parameter and structure learning

One typically distinguishes two problems within the SRL field. First, there is the problem of parameter estimation, where the goal is to estimate appropriate values for the parameters of a model whose structure is fixed; second, there is the problem of structure learning, where the learner must infer both the structure and the parameters of the model from data. In the case of LPADs, the former amounts to learning the probabilities Π_ik, the latter to learning the disjunctive clauses C_i themselves together with their probabilities.

EMBLEM ("EM over Bdds for probabilistic Logic programs Efficient Mining") [12] applies the algorithms for performing Expectation Maximization [29] over BDDs proposed in [48, 94] to the problem of learning the parameters of LPADs.

The typical input for EMBLEM is a set of target predicates, a set of mega-examples and a LPAD with random parameters Π_ik. Among the predicates of the domain, the user has to indicate which are target: the facts for these predicates in the mega-examples will form the queries Q for which the BDDs are built, encoding the disjunction of their explanations. The mega-examples are sets of ground facts describing a portion of the domain and must also contain negative atoms for target predicates, expressed as neg(atom).

The algorithm requires two traversals of the BDD to compute the probability P (Q) of a query, so its cost is linear in the number of nodes. P (Q) is returned at the root of the graph. Also, the Π_ik parameters are computed for all rules. In particular, given a LPAD T with unknown parameters Π = {Π_{i1}, …, Π_{in_i}} for all clauses C_i, and two sets I⁺ = {e₁, …, e_M} and I⁻ = {e_{M+1}, …, e_Q} of ground facts (positive and negative), EMBLEM finds the value of the parameters Π of T that maximize the likelihood of the facts, i.e., solves $\arg\max_{Π} P (I^{+}, \sim I^{-}) = \arg\max_{Π} \prod_{m = 1}^{M} P (e_{m}) \prod_{m = M + 1}^{Q} P (\sim e_{m})$.

The log-likelihood is obtained by taking the logarithm of the product in the formula above, i.e., by summing log P (e_m) over the positive facts and log P (∼e_m) over the negative ones. The predicates for the facts in I⁺ and I⁻ are called target because the objective is to better predict their truth value.
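The objective can be illustrated with a few lines of Python. The sketch assumes that the query probabilities P (e_m) have already been computed by some inference procedure and that P (∼e_m) = 1 − P (e_m); the function name `log_likelihood` and the numeric values are made up for the example and do not come from EMBLEM.

```python
import math

def log_likelihood(pos_probs, neg_probs):
    """Log-likelihood of the examples.

    pos_probs: P(e_m) for each positive fact e_m
    neg_probs: P(e_m) for each negative fact e_m, so that the
               probability of its negation is 1 - P(e_m)
    """
    ll = sum(math.log(p) for p in pos_probs)
    ll += sum(math.log(1.0 - p) for p in neg_probs)
    return ll

# Illustrative values only: two positive facts and one negative fact.
example_ll = log_likelihood([0.588, 0.9], [0.2])
```

Working with the sum of logarithms rather than the product of probabilities avoids numerical underflow when the number of examples is large.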

SLIPCASE (“Structure LearnIng of ProbabilistiC logic progrAmS with Em over BDDs”) [11] and SLIPCOVER (“Structure LearnIng of Probabilistic logic programs by searChing OVER the clause space”) [13] learn the structure and the parameters of an LPAD by starting from an empty program. Like EMBLEM, SLIPCASE takes as input a set of mega-examples and an indication of which predicates are target, i.e., those for which we want to optimize the predictions of the final LPAD. The mega-examples must contain positive and negative examples for the target predicates. The algorithm performs a beam search in the space of refinements of the programs guided by the log-likelihood of the data.

SLIPCOVER is an “evolution” of SLIPCASE in terms of search strategy. SLIPCASE is based on a simple search strategy that refines LPADs by trying all possible program revisions. SLIPCOVER instead uses bottom clauses to guide the refinement process, thus reducing the number of revisions and exploring the search space more effectively. It takes as input a set of mega-examples and the target predicates, and learns an LPAD by first identifying good candidate clauses and then searching for an LPAD guided by the log-likelihood of the data. The search for candidate clauses is performed according to the language bias defined by the user.

Since the search space of SLIPCASE and SLIPCOVER is extremely large, in [30] the authors investigate the application of a new approximate search method, called LEMUR. LEMUR ("LEarning with a Monte carlo Upgrade of tRee search") sees the problem of learning the structure of a PLP as a tree-structured multi-armed bandit problem. During structure learning, the addition of a new clause to the current program is represented by a tree search problem in which a multi-armed bandit problem [16] is solved to choose the clause. Each legal clause is an "arm" with unknown reward: starting from the empty program, clauses are iteratively added to it, with each addition yielding an observable payoff that corresponds to the log-likelihood computed by EMBLEM.
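To make the multi-armed bandit view concrete, the sketch below shows the generic UCB1 selection rule, a standard bandit policy. It is only an illustration under stated assumptions: the exact policy used by LEMUR is the one described in [30], rewards are assumed to be rescaled to [0, 1] (raw log-likelihoods are negative), and the name `ucb1_select` is hypothetical.

```python
import math

def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the index of the arm (candidate clause) to try next.

    counts[a]: how many times arm a has been tried
    totals[a]: sum of the rewards observed for arm a (assumed in [0, 1])
    """
    # Try every arm once before comparing confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    scores = [
        totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    # Pick the arm with the largest upper confidence bound.
    return max(range(len(counts)), key=scores.__getitem__)

# Arm 1 has a lower average reward but was tried only once,
# so its exploration bonus dominates.
print(ucb1_select([100, 1], [50.0, 0.4]))
```

The exploration term shrinks as an arm is tried more often, so the rule balances re-trying clauses that scored well against exploring clauses that were rarely evaluated.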

All the previous algorithms are available in the cplint on SWISH web application (https://cplint.eu, [79]).

Until now the aim has been to learn PLP models that predict specific predicates of the domain, called "target". However, it might also be useful to learn classifiers for interpretations as a whole. To this end, [78, 81] consider the models produced by the Inductive Constraint Logic (ICL) system [27], called Constraint Logic Theories (CLTs) and represented by sets of integrity constraints (ICs), and propose a probabilistic version of them. Each integrity constraint is annotated with a probability, and the resulting probabilistic logical constraint model assigns to each interpretation a probability of being positive. The setting considered is the "learning from interpretations" setting of Inductive Logic Programming [14, 24]: given a set of positive interpretations (positive examples), a set of negative interpretations (negative examples), a normal logic program BG (background knowledge expressing some general knowledge about the domain) and a hypothesis space $\mathcal{H}$, one wants to find a hypothesis $H \in \mathcal{H}$ that discriminates the positive from the negative interpretations.

A set of probabilistic ICs R_i of the form $p_{i} : : L_{1}, \dots, L_{b} \to \exists (P_{1}); \dots; \exists (P_{n}); \forall \neg (N_{1}); \dots; \forall \neg (N_{m}),$ where p_i ∈ [0, 1] is the probability, each L_q is a logical literal and each P_j and N_k is a conjunction of literals, is called a Probabilistic Constraint Logic Theory (PCLT). A PCLT defines a probability distribution on ground constraint logic theories, called possible theories, in this way: for each grounding of the body of each IC, the IC is included in a possible theory with probability p_i, and all groundings are independent. A probability p_i means that the sum of the probabilities of the possible theories where a grounding of the constraint is present is p_i. It can be demonstrated that the probability P(⊕ | w, I) of an interpretation being positive (indicated by ⊕), given the interpretation I, a possible background knowledge BG and a possible theory w, can be computed as $\prod_{i = 1}^{n} (1 - p_{i})^{m_{i}}$, where m_i is the number of groundings of the ith constraint that are not satisfied in I. The advantage is that this probability can be computed in time O(n log m), where n is the number of ICs and m is the maximum number of groundings of ICs that are violated.

Consider the PCLT $R_{1} = 0.5 : : triangle (T), square (S), in (T, S) \to false.$ If there is only a triangle T inside a square S in the interpretation I, the body of the IC R₁ is true for one substitution, thus m₁ = 1 and P(⊕ | w, I) = 0.5. If there were three pairs (triangle, square) in I for which the body of R₁ is true, then m₁ = 3 and P(⊕ | w, I) = 0.125.
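The two values in this example can be reproduced with a few lines of Python that evaluate the product $\prod_{i} (1 - p_{i})^{m_{i}}$ directly. This is only a sketch: the function name `prob_positive` and the representation of a PCLT as a list of (p_i, m_i) pairs are assumptions made for illustration, and the counts m_i are taken as given rather than computed from an interpretation.

```python
def prob_positive(constraints):
    """Probability that an interpretation is positive.

    constraints: list of (p_i, m_i) pairs, where p_i is the probability
    of the i-th integrity constraint and m_i is the number of its
    groundings that are violated in the interpretation.
    """
    result = 1.0
    for p, m in constraints:
        result *= (1.0 - p) ** m
    return result

# R1 = 0.5 :: triangle(T), square(S), in(T, S) -> false
print(prob_positive([(0.5, 1)]))  # one violated grounding
print(prob_positive([(0.5, 3)]))  # three violated groundings
```

A constraint with no violated groundings (m_i = 0) contributes a factor of 1, so only the violated constraints lower the probability.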

To learn both the structure and the parameters of such probabilistic models the system PASCAL (“ProbAbiliStic inductive ConstrAint Logic”) is presented and experimentally compared with the previous (and other) SRL systems in [78, 81]. Parameter learning can be performed using gradient descent or Limited-memory BFGS (Broyden–Fletcher–Goldfarb–Shanno) [62].

5 Inference and Learning in DISPONTE Description Logics

5.1 Inference

As illustrated for PLP, the variables representing the selection of the probabilistic axioms are independent Boolean random variables X. The probability of the query on an uncertain KB can be computed by translating a Boolean formula f (X) over these variables, representing the set of the explanations for the query, into a BDD.

BUNDLE (“Binary decision diagrams for Uncertain reasoNing on Description Logic thEories”) [76, 83], written in Java, performs inference over probabilistic KBs based on the DISPONTE semantics. It first finds a set of explanations for a query and converts them into BDDs. Finally, it computes the probability from the BDD with the dynamic programming algorithm of [26].
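The final step, computing a probability from a BDD, can be sketched as a memoized bottom-up recursion: the probability of a node is p · P(high child) + (1 − p) · P(low child), where p is the probability of the node's variable. The code below is a minimal illustration of this recursion and not BUNDLE's implementation; the dictionary encoding of the BDD and the names `BDD`, `PROB` and `bdd_probability` are hypothetical.

```python
# Hypothetical BDD for the formula x1 OR x2.
# Each internal node maps to (variable, low child, high child);
# the terminals are the integers 0 (false) and 1 (true).
BDD = {
    "n1": ("x1", "n2", 1),
    "n2": ("x2", 0, 1),
}
PROB = {"x1": 0.4, "x2": 0.3}

def bdd_probability(root, bdd, prob):
    """Probability that the Boolean function rooted at `root` is true."""
    cache = {}

    def visit(node):
        if node in (0, 1):          # terminal node
            return float(node)
        if node not in cache:       # each node is evaluated only once
            var, low, high = bdd[node]
            p = prob[var]
            cache[node] = p * visit(high) + (1.0 - p) * visit(low)
        return cache[node]

    return visit(root)

print(bdd_probability("n1", BDD, PROB))
```

Because every node is visited once thanks to the cache, the cost is linear in the number of BDD nodes. For this toy diagram the result agrees with 1 − (1 − 0.4)(1 − 0.3).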

The problem of finding explanations for a query from a DL KB has been investigated by various authors [40 , 52] and was called axiom pinpointing by [89]. In particular, they define minimal axiom sets or MinAs for short. BUNDLE is based on Pellet [92] and uses it for enumerating all MinAs.

Reasoners written in Prolog can exploit its backtracking facilities for performing the search of MinAs, as has been observed in various works [6, 56]. In this category fall TRILL and TRILL^P [105], which offer a Prolog implementation of the tableau algorithm [46], an approach usually adopted by DL reasoners. Such an algorithm decides whether an axiom is entailed or not by a KB by refutation: axiom E is entailed if ∼E has no model in the KB. TRILL and TRILL^P use the Thea2 library [99] for parsing OWL in its various dialects. Thea2 translates OWL files into a Prolog representation in which each axiom is mapped to a fact. In TRILL, independent Boolean random variables are assigned to the axioms and the explanations for a query are defined by a Boolean formula f(X) in Disjunctive Normal Form (DNF). TRILL^P, instead, directly computes a pinpointing formula [2], a monotone Boolean formula that represents the set of all explanations for a query by associating a unique propositional variable with every axiom of the KB. Irrespective of which representation of the explanations we choose, a DNF or a general pinpointing formula, it is possible to transform it into a BDD, from which one can compute the probability of the query with the dynamic programming algorithm of [26], which is linear in the size of the BDD.
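The reason a DNF of explanations is compiled into a BDD, rather than summing the probabilities of the single explanations, is that explanations sharing axioms are not mutually exclusive. The Python sketch below illustrates this with Shannon expansion over independent variables, which is the decomposition a BDD encodes. It is not the TRILL implementation: the function `dnf_probability`, the variable names and the probabilities are hypothetical, and explanations are assumed to contain only positive variables.

```python
def dnf_probability(explanations, prob):
    """Probability of a monotone DNF over independent Boolean variables.

    explanations: iterable of sets of variable names (one set per explanation)
    prob: dict mapping each variable name to its probability
    """
    cache = {}

    def solve(expls):
        if not expls:                 # no explanation left: formula is false
            return 0.0
        if frozenset() in expls:      # an explanation fully satisfied: true
            return 1.0
        if expls not in cache:
            var = min(min(e) for e in expls)   # deterministic variable choice
            p = prob[var]
            high = frozenset(e - {var} for e in expls)          # var is true
            low = frozenset(e for e in expls if var not in e)   # var is false
            cache[expls] = p * solve(high) + (1.0 - p) * solve(low)
        return cache[expls]

    return solve(frozenset(frozenset(e) for e in explanations))

# Two explanations sharing axiom "a": {a, b} and {a, c}
probs = {"a": 0.5, "b": 0.4, "c": 0.2}
print(dnf_probability([{"a", "b"}, {"a", "c"}], probs))
```

For these two explanations, naively adding 0.5 · 0.4 and 0.5 · 0.2 would give 0.3, whereas the expansion yields 0.5 · (1 − 0.6 · 0.8) = 0.26, because the case in which both explanations hold is counted only once.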

Both these Prolog-based probabilistic reasoners and BUNDLE can achieve significant results in terms of scalability and speed [105]. [106] presents TORNADO for “Trill powered by pinpOinting foRmulas and biNAry DecisiOn diagrams”, an improvement of TRILL^P in which the BDD representing the pinpointing formula built for a query is directly generated during the construction of the tableau, speeding up the overall inference process.

BUNDLE is available at https://bundle.ml.unife.it/; TRILL, TRILL^P and TORNADO are all available in the TRILL on SWISH web application at https://trill.ml.unife.it/ [10].

5.2 Parameter and structure learning

DISPONTE applies the distribution semantics for probabilistic logic programming to DLs. However, specifying the values of the probabilities is a difficult task for humans. On the other hand, data is usually available about the domain, that can be leveraged for tuning the parameters. [75] presents EDGE, for “Em over bDds for description loGics paramEter learning”, for learning the parameters of probabilistic ontologies from data. It is targeted to DLs following DISPONTE. [77] considers the problem of learning both the structure and the parameters of PDLs under DISPONTE by means of the LEAP system, based on the combination of CELOE [54] for building new (equivalence and subsumption) axioms to be added to the KB, and EDGE to learn their parameters. [19] presents LEAP^MR, its distributed version.

Other algorithms available for these tasks are [64], which learns parameters and structure of CR$\mathcal{ALC}$ (Credal $\mathcal{ALC}$) knowledge bases [4, 20], where parameters are learned using the EM algorithm as in EDGE, and [32, 103], which use an algorithm (GoldMiner) that exploits Association Rules [1] for building ontologies.

6 Discussion and Conclusions

Probabilistic Logic Programming allows the integration of logic and probability and combines the ability of the first to represent complex relations among entities with the ability of the latter to model uncertainty over attributes and relations. Logic programming provides a Turing complete language based on logic and thus represents an excellent candidate for this integration. Also, it becomes possible to exploit logic programming techniques in PLP systems for reasoning and learning. Languages following the distribution semantics use an approach to logic-probability integration which is simple and coherent across the languages themselves, but powerful enough to be useful in a variety of domains with richly structured data. PLP and more in general SRL techniques have been successfully applied in social networks analysis, entity recognition, collective classification and information extraction, to name a few.

On the other hand, various authors have advocated the use of probabilistic ontologies, see e.g. [63], and many proposals have been put forward for allowing ontology languages, and OWL in particular, to represent uncertainty in the Semantic Web. Ontologies are machine-interpretable semantic models, and the interplay of probabilities and ontologies has proved fruitful for representing and reasoning on data extracted from the Semantic Web, which is often uncertain because of the unreliability of many web data sources. OWL is based on Description Logics, which are a decidable fragment of FOL and are therefore amenable to automated reasoning and to many reasoning tasks such as classification, retrieval, consistency checking, subsumption checking and satisfiability.

This paper reviewed the distribution semantics for expressing and reasoning in Probabilistic Logic Programs and probabilistic Description Logics. Several learning algorithms for parameter and structure of Probabilistic Logic Programs based on the distribution semantics are presented, with particular reference to Logic Programs with Annotated Disjunctions. These algorithms can be applied to all languages that are based on the distribution semantics, as there are transformations that can convert one language into another.

The distribution semantics was also applied to Description Logics to model ontologies probabilistically and to perform inference and learning over them with techniques based on Binary Decision Diagrams.

Acknowledgements

This paper surveys the work done in the field of Statistical Relational Learning until today, at the Department of Engineering of the University of Ferrara. This work has greatly benefited from collaboration with a number of people whom I would like to thank. First of all, my PhD supervisor, Fabrizio Riguzzi, who led me into the realm of PLP and machine learning. Thanks to Professor Evelina Lamma for her supervision throughout my entire work. Finally, thanks to my colleagues at the ML@unife research group, with whom I have worked in all these years.

Footnotes

https://www.w3.org/

see [] for the detailed syntax of TGD, NC, EGD.

A Disjunctive Normal Form is a logical formula consisting of a disjunction of conjunctions.

References

1. Agrawal and Srikant, Fast algorithms for mining association rules in large databases. In International Conference on Very Large Data Bases, Morgan Kaufmann, 1994, pp. 487–499.

2. Baader and Peñaloza, Axiom pinpointing in general tableaux, Journal of Logic and Computation 20(1) (2010), 5–34.

3. Baader, Calvanese, McGuinness D.L., Nardi and Patel-Schneider P.F., editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, New York, NY, USA, 2003.

4. Baader and Nutt, Basic description logics. In Description Logic Handbook, Cambridge University Press, 2002, pp. 47–100.

5. Bechhofer, Horrocks and Patel-Schneider P.F., Tutorial on OWL, 2003. https://www.cs.man.ac.uk/~horrocks/ISWC2003/Tutorial/

6. Beckert and Posegga, leanTAP: Lean tableau-based deduction, Journal of Automated Reasoning 15(3) (1995), 339–358.

7. Bellodi, Alberti, Riguzzi and Zese, MAP inference for probabilistic logic programming, Theory and Practice of Logic Programming 20(5) (2020), 641–655.

8. Bellodi, Lamma, Riguzzi and Albani, A distribution semantics for probabilistic ontologies. In Proceedings of the 7th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2011), Bonn, Germany, 2011, pp. 75–86.

9. Bellodi, Lamma, Riguzzi, Costa V.S. and Zese, Lifted variable elimination for probabilistic logic programming, Theory and Practice of Logic Programming 14(4-5) (2014), 681–695.

10. Bellodi, Lamma, Riguzzi, Zese and Cota, A web system for reasoning with probabilistic OWL, Software: Practice and Experience 47(1) (2017), 125–142.

11. Bellodi and Riguzzi, Learning the structure of probabilistic logic programs. In S.H. Muggleton, A. Tamaddoni-Nezhad and F.A. Lisi, editors, 22nd International Conference on Inductive Logic Programming, volume 7207 of LNCS, Springer Berlin Heidelberg, 2012, pp. 61–75.

12. Bellodi and Riguzzi, Expectation maximization over binary decision diagrams for probabilistic logic programs, Intelligent Data Analysis 17(2) (2013), 343–363.

13. Bellodi and Riguzzi, Structure learning of probabilistic logic programs by searching the clause space, Theory and Practice of Logic Programming 15(2) (2015), 169–212.

14. Blockeel, Raedt L.D., Jacobs and Demoen, Scaling up inductive logic programming by learning from interpretations, Data Mining and Knowledge Discovery 3(1) (1999), 59–93.

15. Borgwardt, İsmail İlkan Ceylan and T. Lukasiewicz, Ontology-mediated queries for probabilistic databases. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, AAAI Press, 2017, pp. 1063–1069.

16. Bubeck and Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning 5(1) (2012), 1–122.

17. Calì, Gottlob and Lukasiewicz, Tractable query answering over ontologies with Datalog+/-. In B.C. Grau, I. Horrocks, B. Motik and U. Sattler, editors, Proceedings of the 22nd International Workshop on Description Logics (DL 2009), Oxford, UK, July 27-30, volume 477 of CEUR Workshop Proceedings, Aachen, Germany, 2009.

18. Ceylan İ.İ., Borgwardt and Lukasiewicz, Most probable explanations for probabilistic database queries. In C. Sierra, editor, Proceedings of the 26th International Joint Conference on Artificial Intelligence, AAAI Press, 2017, pp. 950–956.

19. Cota, Zese, Bellodi, Lamma and Riguzzi, Learning probabilistic ontologies with distributed parameter learning. In E. Bellodi and A. Bonfietti, editors, Doctoral Consortium (DC) co-located with the 14th Conference of the Italian Association for Artificial Intelligence (AI*IA 2015), volume 1485 of CEUR Workshop Proceedings, Aachen, Germany, 2015, pp. 7–12.

20. Cozman F.G. and Polastro R.B., Loopy propagation in a probabilistic description logic. In S. Greco and T. Lukasiewicz, editors, Scalable Uncertainty Management, Berlin, Heidelberg, Springer Berlin Heidelberg, 2008, pp. 120–133.

21. Dantsin, Probabilistic logic programs and their semantics. In Russian Conference on Logic Programming, volume 592 of LNCS, Springer, 1991, pp. 152–164.

22. De Raedt, Demoen, Fierens, Gutmann, Janssens, Kimmig, Landwehr, Mantadelis, Meert, Rocha, Santos Costa, Thon and Vennekens, Towards digesting the alphabet-soup of statistical relational learning. In NIPS 2008 Workshop on Probabilistic Programming, 2008.

23. De Raedt and Van Laer, Inductive constraint logic. In 6th Conference on Algorithmic Learning Theory (ALT 1995), volume 997 of LNAI, Springer, 1995, pp. 80–94.

24. Raedt L.D. and Dzeroski, First-order jk-clausal theories are PAC-learnable, Artificial Intelligence 70(1-2) (1994), 375–392.

25. Raedt L.D., Kersting, Kimmig, Revoredo and Toivonen, Compressing probabilistic Prolog programs, Machine Learning 70(2-3) (2008), 151–168.

26. Raedt L.D., Kimmig and Toivonen, ProbLog: A probabilistic Prolog and its application in link discovery. In M.M. Veloso, editor, 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), AAAI Press, 2007, volume 7, pp. 2462–2467.

27. Raedt L.D. and Laer W.V., Inductive constraint logic. In K.P. Jantke, T. Shinohara and T. Zeugmann, editors, Algorithmic Learning Theory, Berlin, Heidelberg, Springer Berlin Heidelberg, 1995, pp. 80–94.

28. Rodrigo de Salvo Braz, Amir and Roth, Lifted first-order probabilistic inference. In L.P. Kaelbling and A. Saffiotti, editors, 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Professional Book Center, 2005, pp. 1319–1325.

29. Dempster A.P., Laird N.M. and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B 39 (1977), 1–38.

30. Mauro N.Di., Bellodi and Riguzzi, Bandit-based Monte-Carlo structure learning of probabilistic logic programs, Machine Learning 100(1) (2015), 127–156.

31. Fierens, den Broeck G.V., Renkens, Shterionov Sht., Gutmann, Thon, Janssens and De Raedt, Inference and learning in probabilistic logic programs using weighted Boolean formulas, Theory and Practice of Logic Programming 15(3) (2015), 358–401.

32. Fleischhacker and Völker, Inductive learning of disjointness axioms. In On the Move to Meaningful Internet Systems: OTM 2011, Springer, 2011, pp. 680–697.

33. Fuhr, Probabilistic datalog: Implementing logical information retrieval for advanced applications, Journal of the American Society for Information Science 51 (2000), 95–110.

34. Getoor and Taskar, editors. Introduction to Statistical Relational Learning. MIT Press, 2007.

35. Gomes and Costa V.S., Evaluating inference algorithms for the Prolog factor language. In F. Riguzzi and F. Železný, editors, 21st International Conference on Inductive Logic Programming (ILP 2012), volume 7842 of LNCS, Springer, 2012, pp. 74–85.

36. Gutmann, Kimmig, Kersting and De Raedt, Parameter learning in probabilistic databases: A least squares approach. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2008), volume 5211 of LNCS, Springer, 2008, pp. 473–488.

37. Gutmann, Thon and Raedt L.De., Learning the parameters of probabilistic logic programs from interpretations. In D. Gunopulos, T. Hofmann, D. Malerba and M. Vazirgiannis, editors, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2011), volume 6911 of LNCS, Springer, 2011, pp. 581–596.

38. Haarslev, Hidde, Möller and Wessel, The RacerPro knowledge representation and reasoning system, Semantic Web 3(3) (2012), 267–277.

39. Hadjichristodoulou and Warren D.S., Probabilistic logic programming with well-founded negation. In D.M. Miller and V.C. Gaudet, editors, 42nd IEEE International Symposium on Multiple-Valued Logic (ISMVL 2012), 2012, pp. 232–237.

40. Halaschek-Wiener, Kalyanpur and Parsia, Extending tableau tracing for ABox updates. Technical report, University of Maryland, 2006.

41. Halpern J.Y., An analysis of first-order logics of probability, Artificial Intelligence 46(3) (1990), 311–350.

42. Hitzler, Krötzsch and Rudolph, Foundations of semantic web technologies. Chapman & Hall/CRC, 2009.

43. Horridge, Justification Based Explanation in Ontologies. Ph.D. thesis, University of Manchester, 2011.

44. Horridge, Parsia and Sattler, Explaining inconsistencies in OWL ontologies. In L. Godo and A. Pugliese, editors, Scalable Uncertainty Management, Berlin, Heidelberg, Springer Berlin Heidelberg, 2009, pp. 124–137.

45. Horrocks, Patel-Schneider P.F. and Harmelen F.V., From SHIQ and RDF to OWL: The making of a web ontology language, Web Semantics 1(1) (2003), 7–26.

46. Horrocks and Sattler, A tableau decision procedure for SHOIQ, Journal of Automated Reasoning 39(3) (2007), 249–276.

47. Hustadt, Motik and Sattler, Deciding expressive description logics in the framework of resolution, Information and Computation 206(5) (2008), 579–601.

48. Inoue, Sato, Ishihata, Kameya and Nabeshima, Evaluating abductive hypotheses using an EM algorithm on BDDs. In 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), Morgan Kaufmann Publishers Inc., 2009, pp. 810–815.

49. Ishihata, Kameya, Sato and Minato, Propositionalizing the EM algorithm by BDDs. In Late Breaking Papers of the 18th International Conference on Inductive Logic Programming (ILP 2008), 2008, pp. 44–49.

50. Jung J.C. and Lutz, Ontology-based access to probabilistic data with OWL QL. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J.X. Parreira, J. Hendler, G. Schreiber, A. Bernstein and E. Blomqvist, editors, The Semantic Web – ISWC 2012, Berlin, Heidelberg, Springer Berlin Heidelberg, 2012, pp. 182–197.

51. Kalyanpur, Parsia, Sirin and Hendler J.A., Debugging unsatisfiable classes in OWL ontologies, Journal of Web Semantics 3(4) (2005), 268–293.

52. Kalyanpur, Debugging and Repair of OWL Ontologies. PhD thesis, The Graduate School of the University of Maryland, 2006.

53. Kimmig, Costa V.S., Rocha, Demoen and Raedt L.De., On the efficient execution of ProbLog programs. volume 5366 of LNCS, Springer, 2008, pp. 175–189.

54. Lehmann, Auer, Bühmann and Tramp, Class expression learning for ontology engineering, Journal of Web Semantics 9(1) (2011), 71–81.

55. Lloyd J.W., Foundations of Logic Programming, 2nd Edition. Springer, 1987.

56. Lukácsy and Szeredi, Efficient description logic reasoning in Prolog: The DLog system, Theory and Practice of Logic Programming 9(3) (2009), 343–414.

57. Lukasiewicz, Probabilistic description logic programs. In L. Godo, editor, Symbolic and Quantitative Approaches to Reasoning with Uncertainty, 8th European Conference, ECSQARU 2005, Barcelona, Spain, Proceedings, volume 3571 of LNCS, Springer, 2005, pp. 737–749.

58. Lukasiewicz, Probabilistic description logic programs, International Journal of Approximate Reasoning 45(2) (2007), 288–307.

59. Meert, Struyf and Blockeel, CP-Logic theory inference with contextual variable elimination and comparison to BDD based inference methods. volume 5989 of LNCS, Springer, 2010, pp. 96–109.

60. Meert, Struyf and Blockeel, Learning ground CP-Logic theories by leveraging Bayesian network learning techniques, Fundamenta Informaticae 89(1) (2008), 131–160.

61. Milch, Zettlemoyer L.S., Kersting, Haimes and Kaelbling L.P., Lifted probabilistic inference with counting formulas. In D. Fox and C.P. Gomes, editors, 23rd AAAI Conference on Artificial Intelligence (AAAI 2008), AAAI Press, 2008, pp. 1062–1068.

62. Nocedal, Updating Quasi-Newton matrices with limited storage, Mathematics of Computation 35(151) (1980), 773–782.

63. Obrst, McCandless, Stoutenburg, Fox, Nichols, Prausa and Sward, Evolving use of distributed semantics to achieve net-centricity. In AAAI Fall Symposium, 2007, pp. 108–111.

64. Ochoa-Luna J.E., Revoredo and Cozman F.G., Learning probabilistic description logics: A framework and algorithms. In Advances in Artificial Intelligence, Springer, 2011, pp. 28–39.

65. Poole, The Independent Choice Logic for modelling multiple agents under uncertainty, Artificial Intelligence 94 (1997), 7–56.

66. Poole, Logic programming, abduction and probability - a top-down anytime algorithm for estimating prior and posterior probabilities, New Generation Computing 11(3) (1993), 377–400.

67. Poole, First-order probabilistic inference. In G. Gottlob and T. Walsh, editors, IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, Morgan Kaufmann Publishers, 2003, pp. 985–991.

68. Riguzzi, A top down interpreter for LPAD and CP-logic. In 10th Congress of the Italian Association for Artificial Intelligence (AI*IA 2007), volume 4733 of LNAI, Springer, 2007, pp. 109–120.

69. Riguzzi, A top down interpreter for LPAD and CP-logic. In 10th Congress of the Italian Association for Artificial Intelligence (AI*IA 2007), volume 4733 of LNAI, Springer, 2007, pp. 109–120.

70. Riguzzi, Inference with logic programs with annotated disjunctions under the well founded semantics. volume 5366 of LNCS, Springer, 2008, pp. 667–771.

71. Riguzzi, Extended semantics and inference for the independent choice logic, Logic Journal of the IGPL 17(6) (2009), 589–629.

72. Riguzzi, Foundations of Probabilistic Logic Programming: Languages, semantics, inference and learning. Gistrup, Denmark, 2018.

73. Riguzzi, Bellodi and Lamma, Probabilistic Datalog+/- under the Distribution Semantics. In Proceedings of the 2012 International Workshop on Description Logics, volume 846 of CEUR Workshop Proceedings, Sun SITE Central Europe, 2012, pp. 519–529.

74. Riguzzi, Bellodi, Lamma and Zese, Epistemic and statistical probabilistic ontologies. In Proceedings of the 8th International Workshop on Uncertainty Reasoning for the Semantic Web, Boston, USA, volume 900 of CEUR Workshop Proceedings, Sun SITE Central Europe, 2012, pp. 3–14.

75. Riguzzi, Bellodi, Lamma and Zese, Parameter Learning for Probabilistic Ontologies. In W. Faber and D. Lembo, editors, 7th International Conference on Web Reasoning and Rule Systems (RR 2013), Mannheim, Germany, volume 7994 of LNCS, Springer Berlin Heidelberg, 2013, pp. 265–270.

76. Riguzzi, Bellodi, Lamma and Zese, Reasoning with probabilistic ontologies. In Q. Yang and M. Wooldridge, editors, 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), Palo Alto, California USA, AAAI Press, 2015, pp. 4310–4316.

77. Riguzzi, Bellodi, Lamma, Zese and Cota, Learning probabilistic description logics. In F. Bobillo, R.N. Carvalho, P.C.G. Costa, C. d’Amato, N. Fanizzi, K.B. Laskey, K.J. Laskey, T. Lukasiewicz, M. Nickles and M. Pool, editors, Uncertainty Reasoning for the Semantic Web III, volume 8816 of LNCS, Springer International Publishing, Berlin, Heidelberg, 2014, pp. 63–78.

78. Riguzzi, Bellodi, Lamma, Zese and Cota, Probabilistic inductive constraint logic. In K. Inoue, H. Ohwada and A. Yamamoto, editors, Inductive Logic Programming - 25th International Conference, ILP 2015, Kyoto, Japan, August 20-22, 2015, Revised Selected Papers, volume 9575 of Lecture Notes in Computer Science, Springer, 2016, pp. 30–45.

79. Riguzzi, Bellodi, Lamma, Zese and Cota, Probabilistic logic programming on the web, Software: Practice and Experience 46(10) (2016), 1381–1396.

80. Riguzzi, Bellodi and Zese, A history of probabilistic inductive logic programming, Frontiers in Robotics and AI 1(6) (2014).

81. Riguzzi, Bellodi, Zese, Alberti and Lamma, Probabilistic inductive constraint logic, Machine Learning 110 (2021), 1–32.

82. Riguzzi, Bellodi, Zese, Cota and Lamma, A survey of lifted inference approaches for probabilistic logic programming under the distribution semantics, International Journal of Approximate Reasoning 80 (2017), 313–333.

83. Riguzzi, Lamma, Bellodi and Zese, BUNDLE: A reasoner for probabilistic ontologies. In W. Faber and D. Lembo, editors, 7th International Conference on Web Reasoning and Rule Systems (RR 2013), Mannheim, Germany, volume 7994 of LNCS, Springer Berlin Heidelberg, 2013, pp. 183–197.

84. Riguzzi and Swift, The PITA system: Tabling and answer subsumption for reasoning under uncertainty, Theory and Practice of Logic Programming 11(4-5) (2011), 433–449.

85. Riguzzi and Swift, The PITA system: Tabling and answer subsumption for reasoning under uncertainty, Theory and Practice of Logic Programming 11(4-5) (2011), 433–449.

86. Riguzzi and Swift, Well definedness and efficient inference for probabilistic logic programming under the distribution semantics, Theory and Practice of Logic Programming 13(2) (2013), 279–302.

87. Sato, A statistical learning method for logic programs with distribution semantics. In L. Sterling, editor, Logic Programming, Proceedings of the Twelfth International Conference on Logic Programming, Tokyo, Japan, MIT Press, 1995, pp. 715–729.

88. Sato and Kameya, PRISM: A language for symbolic-statistical modeling. In 15th International Joint Conference on Artificial Intelligence (IJCAI 1997), volume 97, Morgan Kaufmann Publishers Inc, 1997, pp. 1330–1339.

89. Schlobach and Cornet, Non-standard reasoning services for the debugging of description logic terminologies. In G. Gottlob and T. Walsh, editors, IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc, 2003, pp. 355–362.

90. Shearer, Motik and Horrocks, HermiT: A highly efficient OWL reasoner. In C. Dolbear, A. Ruttenberg and U. Sattler, editors, Proceedings of the Fifth OWLED Workshop on OWL: Experiences and Directions, collocated with the 7th International Semantic Web Conference (ISWC-2008), Karlsruhe, Germany, volume 432 of CEUR Workshop Proceedings, Sun SITE Central Europe, 2008, pp. 1–10.

91. Shterionov Sht., Renkens, Vlasselaer, Kimmig, Meert and Janssens, The most probable explanation for probabilistic logic programs with annotated disjunctions. In J. Davis and J. Ramon, editors, 24th International Conference on Inductive Logic Programming (ILP 2014), volume 9046 of LNCS, Berlin, Heidelberg, Springer, 2015, pp. 139–153.

92.

Sirin

, Parsia

, Cuenca-Grau

, Kalyanpur

and Katz

, Pellet: A practical OWL-DL reasoner, Journal of Web Semantics 5(2) (2007), 51–53.

93.

Taghipour

, Fierens

, Davis

and Blockeel.

, Lifted variable elimination: Decoupling the operators from the constraint language, Journal of Artificial Intelligence Research 47 (2013), 393–439.

94.

Thon

, Landwehr

and De Raedt

, A simple model for sequences of relational state descriptions, In European Conference on Machine Learning and Knowledge Discovery in Databases, volume 5212 of LNCS, Springer, 2008, pp. 506–521.

95.

den Broeck

G.V.

, Meert

and Darwiche

, Skolemization for weighted first-order model counting. In C. Baral, G. De Giacomo and T. Eiter, editors, 14th International Conference on Principles of Knowledge Representation and Reasoning (KR 2014), AAAI Press, 2014, pp. 111–120.

96.

Van den Broeck

G.

, Taghipour

, Meert

, Davis

and De Raedt

, Lifted probabilistic inference by first-order knowledge compilation. In T. Walsh, editor, 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), IJCAI/AAAI, 2011, pp. 2178–2185.

97.

Van den Broeck

G.

, Thon

, van Otterlo

and De Raedt

, DTProbLog: A decision-theoretic probabilistic Prolog. In M. Fox and D. Poole, editors, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI Press, 2010, pp. 1217–1222.

98.

Van Gelder

A.

, Ross

K.A.

and Schlipf

J.S.

, The well-founded semantics for general logic programs, Journal of the ACM 38(3) (1991), 620–650.

99.

Vassiliadis

, Wielemaker

and Mungall

, Processing OWL2 ontologies using Thea: An application of logic programming. In Proceedings of the 6th International Workshop on OWL: Experiences and Directions, volume 529 of CEUR Workshop Proceedings, Sun SITE Central Europe, 2009, pp. 1–10.

100.

Vennekens

and Verbaeten

, Logic programs with annotated disjunctions. Technical Report CW386, KU Leuven, 2003.

101.

Vennekens

, Denecker

and Bruynooghe

, CP-logic: A language of causal probabilistic events and its relation to logic programming, Theory and Practice of Logic Programming 9(3) (2009), 245–308.

102.

Vennekens

, Verbaeten

and Bruynooghe

, Logic programs with annotated disjunctions. In B. Demoen and V. Lifschitz, editors, 20th International Conference on Logic Programming (ICLP 2004), volume 3131 of LNCS, Springer, 2004, pp. 431–445.

103.

Völker

and Niepert

, Statistical schema induction. In The Semantic Web: Research and Applications, Springer Berlin Heidelberg, 2011, pp. 124–138.

104.

Zese

, Bellodi

, Lamma

, Riguzzi

and Aguiari

, Semantics and inference for probabilistic description logics. In F. Bobillo, R.N. Carvalho, P.C.G. Costa, C. d’Amato, N. Fanizzi, K.B. Laskey, K.J. Laskey, T. Lukasiewicz, M. Nickles, and M. Pool, editors, Uncertainty Reasoning for the Semantic Web III, Cham, Springer International Publishing, 2014, pp. 79–99.

105.

Zese

, Bellodi

, Riguzzi

, Cota

and Lamma

, Tableau reasoning for description logics and its extension to probabilities, Annals of Mathematics and Artificial Intelligence 82(1–3) (2018), 101–130.

106.

Zese

, Cota

, Lamma

, Bellodi

and Riguzzi

, Probabilistic DL reasoning with pinpointing formulas: A Prolog-based approach, Theory and Practice of Logic Programming 19(3) (2019), 449–476.