Abstract
Representing uncertain information is crucial for modeling real world domains. This has been fully recognized both in the field of Logic Programming and of Description Logics (DLs), with the introduction of probabilistic logic languages and various probabilistic extensions of DLs respectively. Several works have considered the distribution semantics as the underlying semantics of Probabilistic Logic Programming (PLP) languages and probabilistic DLs (PDLs), and have then targeted the problem of reasoning and learning in them. This paper is a survey of inference, parameter and structure learning algorithms for PLP languages and PDLs based on the distribution semantics. A few of these algorithms are also available as web applications.
Keywords
Introduction
Recently, in the field of Machine Learning, expressive knowledge representations have been considered to cope with a variable number of entities as well as the relationships that hold among them. These representations are mostly based on logic that provides a high expressivity useful in relational domains, and an excellent theoretical foundation for learning; however, it has limitations when reasoning on uncertain domains. These limitations have been lifted, allowing a multitude of different formalisms combining probability theory with logic, databases or logic programming. This direction has been taken, for instance, in the fields of Statistical Relational Learning (SRL) [37], starting from the nineties, for reasoning and learning in domains with complex and relational structure, and of probabilistic Description Logics (DLs), for reasoning and learning in ontologies.
Probabilistic Logic Programming (PLP) [80] has recently received increasing attention inside the SRL field for its ability to incorporate probability in logic programming, allowing one to exploit logic programming techniques. Among the various proposals, the one based on the distribution semantics [87] has gained popularity as the basis of languages such as Probabilistic Horn Abduction (PHA) [66], PRISM [88], Independent Choice Logic (ICL) [65], pD [33], Probabilistic Logic Programs [21], Logic Programs with Annotated Disjunctions (LPADs) [102], ProbLog [26] and CP-logic [101]. Such semantics is particularly appealing for its intuitiveness and because efficient inference algorithms have started to appear [26, 70]. In many cases they find explanations for queries and compute their probability by building a Binary Decision Diagram (BDD) [26, 84]. Next, various works started to appear on the problem of learning parameters of probabilistic logic programs based on the distribution semantics, such as LeProbLog [36] and LFI-ProbLog [37] for the ProbLog language, and of learning their structure directly from data, with works such as [25] for ProbLog and [60] for ground LPADs.
This paper surveys several inference and learning systems for PLP languages under the distribution semantics developed for LPADs. As for inference, it will be covered in detail an algorithm for lifted inference [9] and implementations of the “maximum-a-posteriori” and the “most probable explanation” [7] inference tasks in the PITA reasoner [85]. As for parameter and structure learning, the EMBLEM [12], SLIPCASE [11], SLIPCOVER [13], LEMUR [30] algorithms will be described. EMBLEM learns LPADs parameters by means of the BDDs that are built for inference. SLIPCASE, SLIPCOVER and LEMUR learn LPADs structure by relying on EMBLEM for parameter learning.
PLP languages are a suitable framework to handle uncertain information, but usually require expensive inference and learning procedures. For this reason, in the last decade many languages that impose limitations on the form of sentences have been proposed. A possible way to pursue this goal is the application of learning from interpretations [14, 24] instead of the classical setting of learning from entailment. A system that learns from interpretations is Inductive Constraint Logic (ICL) [23], based on the language of Constraint Logic Theories, i.e. models in the form of sets of integrity constraints. [78, 81] propose a probabilistic version of the integrity constraints and an algorithm, PASCAL, that learns them.
The adoption of logical-statistical languages has been particularly useful for modeling social networks analysis, link prediction, entity recognition, collective classification and information extraction, to name a few. In fact, the datasets used for testing algorithms for those languages (see the above-mentioned publications) often represent such kind of domains.
Formalisms for dealing with uncertainty have started to play an important role also in research related to the Semantic Web, where knowledge may come from different sources with different reliability. The main idea of the Semantic Web is making information available in a form that is understandable and automatically manageable by machines [42]. In order to realize this vision, the W3C 1 has supported the development of a family of knowledge representation formalisms of increasing complexity for defining ontologies, called Web Ontology Language (OWL). Ontologies have played a decisive part in the development of the Semantic Web as a means for defining shared terms in web resources, and efficient reasoners [38, 92] are used to extract implicit information from the modeled ontologies. In particular, representation and reasoning with uncertain information has been investigated by various authors both in the general case of First-Order Logic and in the case of restricted logics, such as Description Logics. DLs are perhaps best known as the basis for ontology languages such as OWL [45]. Description Logics possess nice computational properties such as decidability and/or low complexity [3]. In this direction, the distribution semantics and the inference techniques developed for PLP have been applied to DLs, with the proposal of a “distribution semantics for probabilistic ontologies” called DISPONTE [74, 104], which annotates the axioms of a knowledge base (KB) with a probability. Several reasoners were then developed for computing the probability of queries from uncertain KBs by encoding their explanations in BDDs or as a pinpointing formula: BUNDLE [76, 83], TRILL [105], TRILL P [105], TORNADO [106]. The implementations of all the previously mentioned systems have been subjected to an extensive experimental evaluation on benchmarks and real-life databases, showing that they can tackle problems of realistic size with often higher performance and lower time than other state-of- the-art existing tools (see the respective publications for details about performance comparison). All works are publicly available at https://ml.unife.it/software/ and many of them as web applications.
The paper is organized as follows. Section 2 introduces background on First-Order Logic. Section 3 presents the distribution semantics for Probabilistic Logic Programs and Probabilistic Description Logics. Section 4 surveys inference and learning systems for LPADs. Section 5 surveys inference and learning in DLs under DISPONTE. Section 6 concludes the paper.
Background on First-Order Logic (FOL)
For a more detailed background on FOL we refer the reader to [55]. A first-order alphabet Σ is a set of predicate symbols and function symbols (or functors) together with their arity. A functor with arity 0 is called a constant. A term is either a variable or a functor applied to a tuple of terms of length equal to the arity of the functor. An atom A is a predicate symbol applied to a tuple of terms of length equal to the arity of the predicate. A literal L is either an atom A or its negation ¬A; in the latter case it is called a negative literal. Here we use the logic programming conventions of indicating predicates and constants with alphanumeric strings starting with a lowercase character and variables with alphanumeric strings starting with an uppercase character. The connectives are ∼ (negation), ∧ (conjunction), lor (disjunction), → (implication) and ↔ (equivalence), while the quantifiers are ∃ (existential) and ∀ (universal). A (well-formed) formula is an n-ary predicate symbol applied to a tuple of n terms (i.e., an atom); if F and G are formulas, so are ∼F, F ∧ G, FlorG, F → G, F ↔ G; if F is a formula and X is a variable, then ∃X F ("there exists an X") and ∀X F ("for all X") are formulas. A clause is a formula of the form ∀X1 . . . ∀ X s (L1lor . . . lorL m ), where each L i is a literal and X1 . . . X s are all the variables occurring in L1lor . . . lorL m . A special clause notation is used in logic programming: ∀X1 . . . ∀ X s (A1lor . . . lorA k lor ∼ B1lor . . . lor ∼ B n ), where A i and B i are literals and X1 . . . X s all the variables occurring in them, is represented as A1, . . . , A k ← B1, . . . . , B n . A normal logic program is a program containing clauses (called definite clauses) of the form A ← B1, . . . , B n , where A is an atom and the B i s are literals, i.e., atoms or negations of atoms. A unit clause or fact is a clause of the form A←, that is a clause with an empty body. A goal or query is a clause of the form ←B1, . . . . , B n ; each B i is called a subgoal of the goal.
A term, atom, literal, clause or query is ground if it does not contain variables. A substitution θ is an assignment of variables to terms: θ = {V1/t1, . . . , V n /t n }. The application of a substitution to a term, atom, literal, query or clause C, indicated with Cθ, is the replacement of the variables appearing in C and in θ with the terms specified in θ.
The Distribution Semantics
The distribution semantics [87] is one of the most interesting approaches to the integration of logic programming and probability. It was introduced for the PRISM language [88] but is shared by many other languages (see Introduction). A program in one of these languages defines a probability distribution over normal logic programs called worlds. Each normal program is assumed to have a total well-founded model [98], thus each program can be associated with a Herbrand interpretation (a world) that is its model. The distribution is extended to queries and the probability of a query is obtained by marginalizing the joint distribution of the query and the programs. The distribution semantics has been defined both for programs that do not contain function symbols, and thus have a finite set of worlds, and for programs that contain them, that have an infinite set of worlds. We review here the first case for the sake of simplicity. For the treatment of function symbols see [86]; see also examples 40 and 41 of [72].
Languages relying on the distribution semantics differ in the way they define the distribution over logic programs. Probabilistic Logic Programs, PHA, ICL, PRISM allow probability distributions over facts, ProbLog allows probability distributions over facts and the head of clauses, while LPADs allow probability distributions over the heads of disjunctive clauses. All these languages have the same expressive power: there are transformations with linear complexity that can convert each one into the others [22, 100].
In this paper we will use LPADs [102] for their general syntax. In LPADs the alternatives are encoded in the head of clauses in the form of a disjunction in which each atom is annotated with a probability. We consider only sound LPADs, in which every possible world has a two-valued well-founded model, i.e. a total model according to the well-founded semantics [98]. When a program is not sound assigning a semantics to probabilistic logic programs requires different semantics, such as [39], who proposes a three-valued-logic semantics, or the credal semantics [57, 58]. In this way, uncertainty is modeled only by means of the disjunctions in the head and not by the semantics of negation. LPADs without negation in clauses’ bodies are sound. Formally, a Logic Program with Annotated Disjunctions consists of a finite set of annotated disjunctive clauses. An annotated disjunctive clause C
i
is of the form
An atomic choice [65] is a triple (C
i
, θ
j
, k) where C
i
∈ T, θ
j
is a substitution that grounds C
i
and k ∈ {1, …, n
i
} identifies one of the atoms in the head of C
i
. (C
i
, θ
j
, k) means that, for the ground clause C
i
θ
j
, the head h
ik
was chosen. In practice C
i
θ
j
corresponds to a multi-valued random variable X
ij
and an atomic choice (C
i
, θ
j
, k) to an assignment X
ij
= k. A set of atomic choices κ is consistent if (C, θ, i) ∈ κ, (C, θ, j) ∈ κ ⇒ i = j, i.e., only one head is selected from a ground clause; we assume independence between the different choices. A composite choice κ is a consistent set of atomic choices. The probability P (κ) of a composite choice κ is the product of the probabilities of the independent atomic choices, i.e.
Let W
T
be the set of all worlds of T. Since selections are composite choices, we can assign a probability to worlds:
Let P (W
T
) be the distribution over worlds; we also write w
σ ⊨ Q to mean that the query Q is true in the well–founded model of the program w
σ. The probability of a query Q given a world w is P (Q|w) =1 if w ⊨ Q and 0 otherwise. The probability of a query Q can be defined by marginalizing the joint probability of the query and the worlds:
It is unfeasible to enumerate all the worlds where a query Q is entailed, so Eq. 2 cannot be applied in practice. Instead, inference algorithms find explanations for Q, i.e. composite choices κ such that Q is entailed in all the worlds whose selections are a superset of them. Explanations however, differently from possible worlds, are not necessarily mutually exclusive with respect to each other so they have first to be made disjoint so that a summation (as in Eq. 2) can be computed. Binary Decision Diagrams have been extensively used in PLP [26, 84] for performing inference in PLP effectively as, thanks to their structure, they make the explanations disjoint. A BDD is a rooted, directed acyclic graph representing a function taking Boolean values on a set of Boolean variables; it has two terminal nodes labeled 0/1, and a set of variable nodes. Each variable node is associated with the variable of its level and has two children, one for each possible value of the variable. Given values for all the variables we can compute the value of the function by traversing the graph starting from the root and returning the value associated with the leaf that is reached. By means of a dynamic programming algorithm [26] that can be applied over these diagrams, the probability of the query can be returned at the root of the BDD.
The application of the distribution semantics for probabilistic logic programming to DLs originated a novel semantics for probabilistic DLs, named DISPONTE (DIstribution Semantics for Probabilistic ONTologiEs) [8, 104].
DLs are usually represented using a syntax based on concepts and roles. A concept corresponds to a set of individuals of the domain while a role corresponds to a set of pairs of individuals of the domain. A probabilistic knowledge base cK is a set of certain axioms and/or probabilistic axioms. Certain axioms take the form of regular DL axioms. Probabilistic axioms take the form p : : Elabelpax where p is a real number in [0, 1] and E is a DL axiom.
The idea of DISPONTE is to associate independent Boolean random variables to the probabilistic axioms. By assigning values to every random variable we obtain a world, the set of axioms whose random variables are assigned the value 1. Every formula obtained from a certain axiom is included in a world w. For each probabilistic axiom, we decide whether to include it or not in w. A world therefore is a non probabilistic KB that can be assigned a semantics in the usual way. A query is entailed by a world if it is true in every model of the world. The probability p can be interpreted as an epistemic probability, i.e., as the degree of our belief in the axiom E. For example, a probabilistic concept membership axiom p : : a : C means that we have degree of belief p in individual a belonging to concept C.
To illustrate the semantics, we reflect the definitions given for the case of PLP. An atomic choice is a couple (E
i
, k) where E
i
is the ith probabilistic axiom and k ∈ {0, 1}. k indicates whether E
i
is included in a world (k = 1) or not (k = 0). A set of atomic choices identifies a world w including all certain axioms and all probabilistic axioms that have been selected. Let W be the set of all worlds. P (w) is a probability distribution over worlds, i.e., ∑w∈WP (w) =1. Given a world w, the probability of a query Q can be defined, as usual, as the sum of the probabilities of the worlds where the query is true. A query Q over a KB cK is usually an axiom for which we want to test the entailment from the KB, i,.e. cK ⊨ Q. The probability of each world is given by:
The query axiom Q = kevin : NatureLover is true in 3 out of the 4 worlds, those corresponding to the selections {(E1, 1) , (E2, 1)} , {(E1, 1) , (E2, 0)} , {(E1, 0) , (E2, 1)} . So P (Q) =0.4 · 0.3 + 0.4 · 0.7 + 0.6 · 0.3 = 0.58 .
Finally, [73] applies DISPONTE to Datalog+/-, a variant of Datalog for defining ontologies [17]. Here two types of probabilistic annotations are considered, the epistemic type and the statistical type.
A probabilistic ontology (D, T) consists of a database D and a set T of certain formulas, that take the form of a Datalog+/- TGD (tuple-generating dependency), NC (negative constraints) or EGD (equality-generating dependencies), of epistemic probabilistic formulas of the form p i : : e F i where p i is a real number in [0, 1] and F i is a TGD, NC or EGD, and of statistical probabilistic formulas of the form p i : : s F i where p i is a real number in [0, 1] and F i is a TGD. The epistemic probability (indicated by subscript e) represents a degree of confidence in the formula as a whole, while the statistical probability (indicated by subscript s) considers the populations to which the formula is applied. These two types of statements can be related to the work of Halpern [41]: an epistemic statement is a Type 2 statement and a statistical one is a Type 1 statement. For instance, an epistemic probabilistic concept inclusion TGD of the form p : : e c (X) → d (X) represents the fact that we believe in the truth of c ⊑ d with probability p. A statistical probabilistic concept inclusion TGD of the form p : : s c (X) → d (X) instead means that a random individual of class c has probability p of belonging to d, thus representing the statistical information that a fraction p of the individuals of c belongs to d.
To obtain a world w of a probabilistic ontology (D, T), every certain formula is included in w; for each epistemic formula, it can be included or not in w; for each statistical formula, all the substitutions for the variables universally quantified in the outermost quantifier
2
are generated and for each grounding it is decided whether or not to include it in w. A query in this case is a BCQ, a Boolean Conjunctive Query of the form q = ∃
Several works investigate inference in probabilistic databases or probabilistic ontologies. [50] enriches DL concept assertions and role assertions with probabilities in a semantics "that is similar to well-known probabilistic versions of Datalog". [15] introduces a refinement of OpenPDBs (Open Probabilistic DataBases) using Datalog+/- ontologies to express additional background knowledge. Also [18] investigate inference in probabilistic databases by relying on the possible worlds semantics. They consider ontology-mediated queries (OMQ), which enrich the class of unions of conjunctive queries (UCQ) with ontological rules based on Datalog+/-, however the tasks investigated here are the most probable explanation and the maximum-a-posteriori inference.
As there are transformations with linear complexity that can convert each language under the distribution semantics into the others, the algorithms presented in the following subsections, specifically focusing on LPADs, can be applied to other PLP languages.
Inference
Lifted Inference Research in probabilistic logic languages has made it very clear that it is crucial to design models that can support efficient inference, while preserving intensional, and declarative modeling. The idea is to take advantage of the regularities in structured models to decrease the number of operations. Lifted inference ([28, 96]) is one of the major advances in this respect. Originally, the idea was proposed as an extension of variable elimination (VE) to solve a probabilistic query without grounding the model.
[9] exploits efficient inference via lifted VE for PLP languages under the distribution semantics. To support reasoning compliant with the distribution semantics, two novel operators are introduced (named heterogeneous lifted multiplication and sum) in the Prolog Factor Language of [35], and the GC-FOVE algorithm [93] is modified for computing them. The resulting system, called LP2 (for "Lifted Probabilistic Logic Programming") allows one to perform inference in a time linear with the number of individuals of the program domain. An experimental comparison with ProbLog2 [31] and PITA can be found in [9], while a survey and experimental comparison with C-FOVE-AP [93] and WFOMC (Weighted First-Order Model Counting [95]) is presented in [82]; while C-FOVE-AP uses a representation formalism based on undirected graphical models, WFOMC is presented for ProbLog, a PLP language based on the distribution semantics.
MPE/MAP Inference In PLP the most commonly studied inference task is to compute the marginal probability of a ground query atom Q given evidence e on a subset of the other atoms, P (Q|e). In the absence of e, this is also known as the success probability of a query P (Q), defined as the sum of the probabilities of all the worlds that entail Q (see Sec. 3). However, two other important tasks are the maximum-a-posteriori (MAP) and the most probable explanation (MPE) tasks. In general terms, given a joint probability distribution over a set of random variables, values for a subset of the variables (evidence), and another disjoint subset of the variables (query), the MAP problem consists of finding the most probable values for the query variables given the evidence. The MPE problem is the MAP problem where the set of query variables is the complement of the set of evidence variables. In the following, several techniques for addressing these tasks for PLP languages under the distribution semantics are presented. As LPADs offer a general syntax we will refer to them without loss of generality.
[7] presents an algorithm, included in the PITA reasoner, which solves these tasks by means of BDDs on LPADs.
rule(0,epidemic,[epidemic:0.6,pandemic:0.3,”:0.1], (flu(robert),cold)) rule(0,epidemic,[epidemic:0.6,pandemic:0.3,”: 0.1],(flu(david),cold)) rule(1,cold,[cold:0.7, ”: 0.3], true) where predicate
Instead, given the program:
[7] provides an experimental comparison with the version of ProbLog presented in [91], which supports annotated disjunctions in the head of clauses (such as LPADs), allowing to perform the MPE (and MAP) task as well. For answering MAP queries ProbLog uses a different strategy resorting to Decision Theoretic ProbLog (DTProbLog) [97], that exploits Algebraic Decision Diagrams.
Parameter and structure learning
One typically distinguishes two problems within the SRL field. Firstly, there is the problem of parameter estimation, where the goal is to estimate appropriate values for the parameters of a model, whose structure is fixed, and secondly, there is the problem of structure learning, where the learner must infer both the structure and the parameters of the model from data. In the case of LPADs, the problem is that of learning the probabilities Π i or directly the disjunctive clauses C i .
EMBLEM ("EM over Bdds for probabilistic Logic programs Efficient Mining") [12] applies the algorithms for performing Expectation Maximization [29] over BDDs proposed in [48, 94] to the problem of learning the parameters of LPADs.
The typical input for EMBLEM is a set of target predicates, a set of mega-examples and a LPAD with random parameters Π ik . Among the predicates of the domain, the user has to indicate which are target: the facts for these predicates in the mega-examples will form the queries Q for which the BDDs are built, encoding the disjunction of their explanations. The mega-examples are sets of ground facts describing a portion of the domain and must contain also negative atoms for target predicates, expressed as neg(atom).
The algorithm requires two traversals of the BDD to compute the probability of a query P (Q), so its cost is linear in the number of nodes. P (Q) is returned at the root of the graph. Also, the Π
ik
parameters are computed for all rules. In particular, given a LPAD T with unknown parameters Π = {Πi1, . . . , Π
in
i
} for all clauses C
i
, and two sets I+ = {e1, . . . , e
M
} and I- = {eM+1, . . . , e
Q
} of ground facts (positive and negative), EMBLEM finds the value of the parameters Π of T that maximize the likelihood of the facts, i.e., solves
The "log-likelihood" is computed using log P (e m ) in the formula above. The predicates for the facts in I+ and I- are called target because the objective is to be able to better predict the truth value for them.
SLIPCASE (“Structure LearnIng of ProbabilistiC logic progrAmS with Em over BDDs”) [11] and SLIPCOVER (“Structure LearnIng of Probabilistic logic programs by searChing OVER the clause space”) [13] learn the structure and the parameters of a LPAD by starting from an empty program. As EMBLEM, SLIPCASE takes as input a set of mega-examples and an indication of which predicates are target, i.e., those for which we want to optimize the predictions of the final LPAD. The mega-examples must contain positive and negative examples for the target predicates. The algorithm performs a beam search in the space of refinements of the programs guided by the log-likelihood of the data. SLIPCOVER is an “evolution” of SLIPCASE in terms of search strategy. SLIPCASE is based on a simple search strategy that refines LPADs by trying all possible program revisions. SLIPCOVER instead uses bottom clauses to guide the refinement process, thus reducing the number of revisions and exploring more effectively the search space. It takes as input a set of mega-examples, the target predicates and learns a LPAD by first identifying good candidate clauses and then by searching for a LPAD guided by the log-likelihood of the data. The search of candidate clauses is performed according to the language bias defined by the user. Since SLIPCASE and SLIPCOVER search space is extremely large, in [30] the authors investigate the application of a new approximate search method, called LEMUR. LEMUR ("LEarning with a Monte carlo Upgrade of tRee search") sees the problem of learning the structure of a PLP as a tree-structured multi-armed bandit problem. During structure learning, the addition of a new clause to the current program is represented by a tree search problem in which a multi-armed bandit problem [16] is solved to choose the clause. Each legal clause is an "arm" with unknown reward: starting from the empty program, clauses are iteratively added to it if obtaining an observable payoff, corresponding to the log-likelihood computed by EMBLEM.
All the previous algorithms are available in the cplint on SWISH web application (https://cplint.eu, [79]). Until now the aim has been to learn PLP models to predict specific predicates of the domain, called "target". However, it might also be useful to learn classifiers for interpretations as a whole: to this end, [78, 81] consider the models (Constraint Logic Theories (CLTs)) produced by the Inductive Constraint Logic (ICL) system [27], represented by sets of integrity constraints (ICs), and propose a probabilistic version of them. Each integrity constraint is annotated with a probability, and the resulting probabilistic logical constraint model assigns a probability of being positive to interpretations. In fact, the "learning from interpretations" setting of Inductive Logic Programming [14, 24] is considered, where, given a set of positive interpretations (positive examples), a set of negative interpretations (negative examples), a normal logic program BG (background knowledge expressing some general knowledge about the domain), a hypothesis space
A set of probabilistic ICs R
i
of the form
To learn both the structure and the parameters of such probabilistic models the system PASCAL (“ProbAbiliStic inductive ConstrAint Logic”) is presented and experimentally compared with the previous (and other) SRL systems in [78, 81]. Parameter learning can be performed using gradient descent or Limited-memory BFGS (Broyden–Fletcher–Goldfarb–Shanno) [62].
Inference and Learning in DISPONTE Description Logics
Inference
As illustrated for PLP, the variables representing the selection of the probabilistic axioms are independent Boolean random variables
BUNDLE (“Binary decision diagrams for Uncertain reasoNing on Description Logic thEories”) [76, 83], written in Java, performs inference over probabilistic KBs based on the DISPONTE semantics. It first finds a set of explanations for a query and converts them into BDDs. Finally, it computes the probability from the BDD with the dynamic programming algorithm of [26].
The problem of finding explanations for a query from a DL KB has been investigated by various authors [40, 52] and was called axiom pinpointing by [89]. In particular, they define minimal axiom sets or MinAs for short. BUNDLE is based on Pellet [92] and uses it for enumerating all MinAs.
Reasoners written in Prolog can exploit its backtracking facilities for performing the search of MinAs, as has been observed in various works ([6, 56]). In this category fall TRILL and TRILL
P
[105], that offer a Prolog implementation of the tableau algorithm [46], an approach usually adopted by DL reasoners. Such an algorithm decides whether an axiom is entailed or not by a KB by refutation: axiom E is entailed if ∼E has no model in the KB. TRILL and TRILL
P
use the Thea2 library [99] for parsing OWL in its various dialects. Thea2 translates OWL files into a Prolog representation in which each axiom is mapped to a fact. In TRILL independent Boolean random variables are assigned to the axioms and the explanations for a query will be defined by a DNF
3
Boolean formula f (
Both these Prolog-based probabilistic reasoners and BUNDLE can achieve significant results in terms of scalability and speed [105]. [106] presents TORNADO for “Trill powered by pinpOinting foRmulas and biNAry DecisiOn diagrams”, an improvement of TRILL P in which the BDD representing the pinpointing formula built for a query is directly generated during the construction of the tableau, speeding up the overall inference process.
BUNDLE is available at https://bundle.ml.unife.it/; TRILL, TRILL P and TORNADO are all available in the TRILL on SWISH web application at https://trill.ml.unife.it/ [10].
Parameter and structure learning
DISPONTE applies the distribution semantics for probabilistic logic programming to DLs. However, specifying the values of the probabilities is a difficult task for humans. On the other hand, data is usually available about the domain, that can be leveraged for tuning the parameters. [75] presents EDGE, for “Em over bDds for description loGics paramEter learning”, for learning the parameters of probabilistic ontologies from data. It is targeted to DLs following DISPONTE. [77] considers the problem of learning both the structure and the parameters of PDLs under DISPONTE by means of the LEAP system, based on the combination of CELOE [54] for building new (equivalence and subsumption) axioms to be added to the KB, and EDGE to learn their parameters. [19] presents LEAP MR , its distributed version.
Other algorithms available for these tasks are [64], which learns parameters and structure of CR
Discussion and Conclusions
Probabilistic Logic Programming allows the integration of logic and probability and combines the ability of the first to represent complex relations among entities with the ability of the latter to model uncertainty over attributes and relations. Logic programming provides a Turing complete language based on logic and thus represents an excellent candidate for this integration. Also, it becomes possible to exploit logic programming techniques in PLP systems for reasoning and learning. Languages following the distribution semantics use an approach to logic-probability integration which is simple and coherent across the languages themselves, but powerful enough to be useful in a variety of domains with richly structured data. PLP and more in general SRL techniques have been successfully applied in social networks analysis, entity recognition, collective classification and information extraction, to name a few.
On the other hand, various authors have advocated the use of probabilistic ontologies, see e.g. [63], and many proposals have been put forward for allowing ontology languages, and OWL in particular, to represent uncertainty in the Semantic Web. Ontologies are machine-interpretable semantic models, and the interplay of probabilities and ontologies has been shown fruitful for representation and reasoning on data that was extracted from the semantic web, which can be often uncertain because of the unreliability of many web data sources. OWL is based on Description Logics which are a decidable fragment of FOL, therefore, amenable for automated reasoning and for many reasoning tasks such as classification, retrieval, consistency checking, subsumption checking, satisfiability.
This paper reviewed the distribution semantics for expressing and reasoning in Probabilistic Logic Programs and probabilistic Description Logics. Several learning algorithms for parameter and structure of Probabilistic Logic Programs based on the distribution semantics are presented, with particular reference to Logic Programs with Annotated Disjunctions. These algorithms can be applied to all languages that are based on the distribution semantics, as there are transformations that can convert one language into another.
The distribution semantics was also applied to Description Logics to model ontologies probabilistically and to perform inference and learning over them with techniques based on Binary Decision Diagrams.
Acknowledgements
This paper surveys the work done in the field of Statistical Relational Learning until today, at the Department of Engineering of the University of Ferrara. This work has greatly taken advantage from collaboration with a number of people I would like to thank. First of all, my PhD supervisor, Fabrizio Riguzzi, who led me in the realm of PLP and machine learning. Thanks to Professor Evelina Lamma for her supervision throughout my entire work. Finally, thanks to my colleagues at the ML@unife research group, with whom I worked in all these years.
