Abstract
Ontohub is a repository engine for managing distributed heterogeneous ontologies. The distributed nature enables communities to share and exchange their contributions easily. The heterogeneous nature makes it possible to integrate ontologies written in various ontology languages. Ontohub supports a wide range of formal logical and ontology languages, as well as various structuring and modularity constructs and inter-theory (concept) mappings, building on the Object Management Group’s Distributed Ontology, Model and Specification Language (DOL). Ontohub repositories are organised as Git repositories, thus inheriting all features of this popular version control system. Moreover, Ontohub is the first repository engine meeting a substantial amount of the requirements formulated in the context of the Open Ontology Repository (OOR) initiative, including an API for federation as well as support for logical inference and axiom selection.
Introduction
Ontologies play a central role for enriching data with a conceptual semantics and hence form an important backbone of the Semantic Web. The number of ontologies that are being built or are already in use is steadily growing. This means that capabilities for organizing ontologies into repositories, as well as searching, maintaining, structuring, linking and modularizing are becoming more important.
However, for ontology developers there are only limited options to store and maintain their ontologies. Bioportal (Noy et al., 2009) is one of the most successful ontology repositories available. However, it is limited to ontologies for the life sciences that are written in the Web Ontology Language (
See
For a more detailed discussion of Bioportal, the use of Github for hosting ontologies, and other related work, see Section 2.5.
The main idea behind Ontohub is to offer the ontology community a place where ontologies on any subject and written in a variety of formal ontology languages may be published and maintained, and which offers services that make it easier to develop, reason with and test ontologies. In a nutshell, Ontohub is a repository engine that is developed with the needs of the ontology community in mind. Thus, Ontohub provides services similar to Github in that it offers users a distributed version control system for managing their ontologies. Users have complete control over their (public and private) repositories, and Ontohub allows other users to find and reuse published ontologies. Yet, in addition, it is able to parse all major ontology languages and provides reasoning capabilities for many of them. Last, but not least, it supports the reuse and structuring of ontologies via the Distributed Ontology, Model and Specification Language (
The Ontohub architecture is open and flexible because it is decentralised into several communicating RESTful Semantic Web services. Moreover, Ontohub follows linked open data principles (see Section 2.3). Thus ontological and Semantic Web principles have been employed during the design of Ontohub itself. Surprisingly, this is not standard for ontology repository engines.
The main purpose of this paper is to provide an overview of Ontohub.3
This paper is an updated and extended version of Mossakowski et al. (2014). Among the new case studies presented here, we include one that appeared in Kuksa and Mossakowski (2016).
Ontohub is a web-based semantic repository engine. Users of Ontohub can upload, browse, search and annotate basic ontologies in various languages via a web frontend.4
See
The sources are available under
The design of Ontohub was strongly influenced by the Open Ontology Repository (OOR) initiative (Baclawski and Schneider, 2009).6
See
“[OOR aims at] promot[ing] the global use and sharing of ontologies by (i) establishing a hosted registry-repository; (ii) enabling and facilitating open, federated, collaborative ontology repositories, and (iii) establishing best practices for expressing interoperable ontology and taxonomy work in registry-repositories, where an ontology repository is a facility where ontologies and related information artifacts can be stored, retrieved and managed.” (Baclawski and Schneider, 2009)
See
See
The main implementation used by OOR is (a cosmetically adapted) BioPortal, which, however, does not follow the OOR principles to a significant extent. There are no OOR implementation efforts beyond Ontohub.

Architecture of the Open Ontology Repository (OOR).
Ontohub’s overall design was conceived to satisfy a substantial subset of the requirements set out in the OOR initiative. Ontohub is the first implementation of a repository engine to meet these requirements, including an API for federation as well as support for logical inference and axiom selection. Moreover, it extends the initial OOR vision by several features suggested by the development of the DOL language. First and foremost, this includes abstracting particular ontology languages based on general model-theoretic semantics. This provides a principled logic-based support for heterogeneity in ontology design. This abstraction layer allows for a uniform treatment of different logics, both syntactically and semantically, and facilitates elegant and general solutions for ontology translation, structuring, aligment, and approaches to establish consistency.
A good sample use case for these novel capabilities is given by the recent FOUST initiative (‘FOundational STance’),10
See
See
See
See
See
See
See
Based on the developed architecture, Ontohub offers the following central features:
ontologies can be organized in multiple repositories, each with its own management of editing and ownership rights. version control of ontologies is supported via interfacing the Git version control system. one and the same IRI is used for referencing an ontology, downloading it (for use with tools), and for user-friendly presentation in the browser.
Ontohub is decoupled into different services.
ontologies can be formalized in various logics like RDF, OWL, Common Logic, the Unified Modeling Language (UML),17
See
See
See
intended consequences of ontologies can be proved.
Finally, Ontohub fully supports
See also dol-omg.org.
In the RDF world, linked open data has become standard (Heath and Bizer, 2011; Auer et al., 2017). Linked open data is based on several principles: (1) data should be open source, (2) IRIs (generalised URLs) should be used to identify entities, such that (3) useful descriptions and representations can be downloaded (via HTTP) at these IRIs (which therefore should be HTTP IRIs), (4) different formats are provided under the same IRI, depending on the MIME type of the HTTP request, and (5) relationships to other entities are provided. Altogether, these principles ensure that linked open data can be semantically queried.
However, in the OWL ontology world, to our knowledge, linked open data standards are generally not followed. While OWL ontologies often follow principles (1) and (2), (3) is not always followed, and (4) and (5) practically are never implemented. Usually, for download of an ontology in raw format and in HTML, one has to use different IRIs.
By contrast, in Ontohub, one and the same IRI is used for referencing an ontology, downloading it (for use with tools), and for user-friendly presentation in the browser. Depending on the MIME type of the request, under the IRI the raw ontology file is available, but also a HTML version for display in a browser and a Javascript Object Notation (JSON) version for processing with tools.
Ontohub eases the generation of such linked-data compliant IRIs: they are automatically generated through Ontohub’s repository structure. Ontohub uses IRIs of the form
In the future, this may change into
While Ontohub’s design is inspired by OOR, one major difference is that Ontohub is not a repository, but a repository engine. This means that Ontohub ontologies are organized into repositories. The screenshot in Fig. 2 shows a partial view of Ontohub’s currently available repositories. Some of them, e.g. Bioportal or COLORE, are mirrors of repositories hosted elsewhere (as indicated with the mirror icons), while the others are native Ontohub repositories. The organisation into repositories has several advantages:

Overview of Ontohub repositories.
Firstly, repositories provide a certain structuring of ontologies, whether it be thematically or in terms of organisation. Access rights can be given to users or teams of users per repository. Typically, read access is given to everyone, and write access only to a restricted set of users and teams. However, also completely open world-writeable repositories are possible, as well as private repositories visible only to a restricted set of users and teams. Since creation of repositories is done easily with a few clicks, this supports a policy of many but small repositories (which of course does not preclude the existence of very large repositories). Note also that structuring within repositories is possible, since each repository is a complete file system tree.
Secondly, Ontohub repositories are Git repositories. Git is a popular decentralised version control system. With any Git client, the user can clone a repository to her local hard disk, edit it with any editor, and push the changes back to Ontohub. Alternatively, the web frontend can be used directly to edit ontologies; pushing will then be done automatically in the background. Parallel edits of the same file are synchronized and merged via Git; handling of merge conflicts can be done with Git merge tools.
Thirdly, ontologies can be searched globally in Ontohub, or in specific repositories. Additionally, user-supplied metadata can be used for searching.
For these reasons repositories are utilised as the main structuring elements of Ontohub. In this sense Ontohub is more similar to Github than to individual ontology repositories like Bioportal. We will compare Ontohub to Bioportal and other tools in more detail in the next section.
Apart from related, mostly theoretical work on modularisation and structuring (Cuenca Grau et al., 2008; Kutz et al., 2010; Khan and Keet, 2015), the most extensive conceptual work on ontology repository engines has been done by the Open Ontology Repository (OOR) initiative, which we discussed in detail in Section 2.1.
Comparison of features of ontology repository engines
Comparison of features of ontology repository engines
To sum-up and compare features of different approaches to designing ontology repositories, Table 1 presents an overview of several central Ontohub features in comparison with three other web-based ontology repository engines. BioPortal (Noy et al., 2009) is a repository engine that originates in the biomedical domain, but now has instances for various domains. Beyond browsing and searching, it provides means for annotating and aligning ontologies. Besides OWL, also related languages like the OBO flat file format (Mungall et al., 2012) are supported. WebProtégé (Tudorache et al., 2013) has been inspired by the well-known tool Protégé, but offers only part of Protégé’s functionality. Although github is a repository engine not dedicated to ontologies, we have included it, because it is used by COLORE, a repository of Common Logic ontologies.
Table 1 compares various aspects of these four different repository engines. Version control, difference viewing and synchronous editing concern facilities to process the raw ontology file (as well as any other text files accompanying the Ontologies). Search is usually done over various ontologies or even repositories. The set of supported ontology languages (for expressing unstructured ontologies) greatly varies, as well as support for structuring and alignment of ontologies. Only Ontohub supports a larger variety of languages. Some metrics and visualisation for OWL are provided by some repository engines; the generalisation of this to more ontology languages is an interesting future problem. Theorem proving is supported by Ontohub only. Finally, there are features that reach beyond ontologies proper: metadata, reviews, bug trackers, wikis etc. Also, some repository engines (WebProtégé and github) have a plugin architecture. This would be also a useful feature for Ontohub.
The development and maintenance of ontologies are expensive and time consuming. For these reasons it is imperative to reuse existing ontologies. However, there are often two challenges for reuse: the existing ontologies need to be adapted to fit their new purpose and there is a lack of interoperability between existing ontologies. For example, the adaption may require the extension of an ontology by new axioms, the extraction of a subset of the ontology, a change of semantics from open world to closed world, or the translation from one ontology language to another. The lack of interoperability is the consequence of the fact that ontology development is usually driven by project specific requirements. Thus, two ontologies that were developed independently will likely represent the same domain in different and often conflicting ways.
One of the major goals of Ontohub is to make the reuse of ontologies easier. This functionality is supported by its implementation of the Distributed Ontology, Model and Specification Language (
It is beyond the scope of this paper to provide a detailed presentation of
Ontology languages and logics
As discussed in Section 2 one of the major goals of OOR and of the development of Ontohub is to provide a repository that supports a wide variety of ontology languages. On the most basic level that means that Ontohub is able to store and parse ontologies that are written in different serialisations (e.g., Turtle and RDF XML) or different languages (e.g., OWL 2 DL or Common Logic). More interestingly, the implementation of
Ontohub is able to support these features, because most ontology languages are based on logics. Ontohub and
A logic syntax can be complemented with a model theory, which introduces semantics for the language and gives a satisfaction relation between the models and the sentences of a signature. The result is an institution (Goguen and Burstall, 1992). Similarly, we can complement a logic syntax with a proof theory, introducing a derivability relation between sentences, thus obtaining an entailment system (Meseguer, 1989). In particular, this can be done for all logics in use in Ontohub.
There is also a notion of logic mapping, formalised as an institution comorphism (Goguen and Rosu, 2002). Logic mappings can be classified according to their accuracy (Mossakowski and Kutz, 2011). Sublogics are the most accurate mappings in that they are syntactic subsets. Embeddings come close to sublogics, like injective functions come close to subsets. A mapping can be faithful in the sense that logical consequence (or logical deduction) is preserved and reflected, that is, inference systems and engines for the target logic can be reused for the source logic (along the mapping). (Weak) exactness is a technical property that guarantees this faithfulness even in the presence of ontology structuring operations (Borzyszkowski, 2002).
Ontohub supports ontologies written in the ontology languages and logics shown in Fig. 3. A logic tab (see the Ontohub screenshot in Fig. 4) allows the display of ontologies sorted by their logic. As discussed above,

The logic translation graph of currently Ontohub-supported languages.

ontohub.org portal: overview of logics.

The logic translation graph for DOL-conforming languages.
Figure 5 extends Fig. 3. It is a revised and extended version of the graph of logics and translations (Mossakowski and Kutz, 2011) and visualises the future plans for Ontohub. New nodes include UML class diagrams,
This concludes our discussion of
Figure 6 depicts the Ontohub architecture. The most challenging part of Ontohub’s implementation is the complex tool integration. The key feature of the OOR architecture (see Fig. 1) is the decoupling into decentralised services, which are ontologically described (thus arriving at Semantic Web services). With Ontohub, we are moving towards this architecture, while keeping a running and usable system. The OOR services depicted in Fig. 1 are visualised in the Ontohub architecture in Fig. 6: the large boxes with boldface inscriptions show how the OOR services in Fig. 1 are realised with Ontohub components. We now briefly describe these components and services.

Ontohub in a network of web services.
The services are centrally integrated by the Ontohub integration layer, which is a Ruby on Rails application that also includes the presentation layer, i.e. a front-end providing the web interface, as well as the administration layer, i.e. user rights management and authorisation.
The persistence layer is based on Git (via git-svn, also Subversion repositories can be used) and an SQL database. The database backend is PostgreSQL. For the Git integration into the web application, a custom Git client was implemented in Ruby to be less prone to errors due to changes in new versions of the Git command line client. Efficient indexing and searching (the find layer) is done via Elasticsearch22
See
A federation API allows the data exchange among different Ontohub and also with BioPortal instances. We therefore have generalised the OWL-based BioPortal API to arbitrary ontology languages, e.g. by abstracting classes and object properties to symbols of various kinds. Figure 6 indicates federation by showing several Ontohub and BioPortal instances.
Parsing and static analysis is a RESTful service of its own provided by the Heterogeneous Tool Set (Hets; Mossakowski et al., 2007).23
Available at
We have integrated the OntOlogy Pitfall Scanner! (OOPS!; Poveda-Villalón et al., 2012) as an ontology evaluation service (for OWL only), and from the OOPS! API, we have derived a generalised API for use with other evaluation services.
Inference is done by encapsulating standard batch-processing reasoners (Pellet,24
See
See
See
See
In this section we will illustrate the novel capabilities of Ontohub by discussing three use cases: ontology alignment, the representation of competency questions, and Ontohub’s integrated theorem proving capability.
Ontology alignment in Ontohub
The foundational ontology (FO) repository Repository of Ontologies for MULtiple USes (ROMULUS)28
See
See
See
See
This and the other examples from this section can be found at:
We can then combine the ontologies while taking into account the semantic dependencies given by the alignments using

Combination of ontologies along alignments.

Diagrams of
The Ontohub screenshot in Fig. 7 shows the graph of links between ontologies created by Ontohub as a result of analysis of the
We omit here an explanation of the ontology morphisms labeling the edges of the W-diagram. Details on the construction of this diagram and on semantics of
‘Competency questions’ is the name for a methodology that supports behavior-driven development of ontologies. The approach can be, somewhat simplified, summarized in the following steps (Grüninger and Fox, 1995):
The use cases for the soon-to-be-developed ontology are captured in the form of scenarios. Each scenario describes a possible state of the world and raises a set of competency questions. The answers to these competency questions should follow logically from the scenario – provided the knowledge that is supposed to be represented in the ontology.
A scenario and its competency questions are formalized or an existing formalization is refined.
The ontology is developed.
An automatic theorem prover is used to check whether the competency questions logically follow from the scenario and the ontology.
Steps (2–4) are repeated until all competency questions can be proven from the combination of the ontology and their respective scenarios.
Ontohub enables the representation and execution of competency questions with the help of

Scenario and Competency Question.

Competency Questions for an OWL ontology represented in
Ontohub supports the use of competency questions during ontology development in two ways. First, in Ontohub competency questions can be stored and maintained in the same repository as the ontologies that they refer to. Second, Ontohub provides automatic reasoning capabilities for many languages, which allows Ontohub to evaluate the ontology with the help of the competency questions.
The support of competency questions is realized in Ontohub with
For example, Fig. 10 illustrates how the competency questions about family relationship ontology from the example above may be encoded in
Ontohub analyses the

Representation of the Competency Questions in Ontohub.

Overview of the statuses of all theorems.

Configuration page for a proof attempt.
Ontohub recognises proof obligations in ontologies, like the competency questions above, and allows the user to invoke automated theorem provers to attempt to prove these conjectures and thus turn them into theorems. For simplicity, open conjectures and proven theorems are both summarised under a “theorems” tab in the web application. Note that the term conjecture emphasises that the ontology is viewed as correct and fixed, and its logical consequences are to be discovered. From the perspective of competency questions, conjectures should be called assertions, because they assert that an ontology should imply certain logical consequences. So, the assertion is viewed as correct and fixed, while the ontology needs to be adapted if the consequence cannot be proved. In practice, often both the ontology and its intended consequences need to be corrected. Hence, we subsume both conjectures and assertions under the term conjecture.
When an ontology has been analysed, a “theorems” tab listing all conjectures of the ontology is shown under the “content” tab of the ontology page, as displayed in Fig. 12. There, the user can either choose to prove all conjectures at once or only a specific one. To perform such an action, the next step is to configure the proof attempts, as shown in Fig. 13. Above the actual configuration options, the selected “theorems” to be proved are listed with their names and their definitional text. This can be, for example, the competency questions of the previous section.
Multiple provers can be selected which are invoked in parallel for each selected conjecture. If no prover is actively selected by the user, a default prover chosen by Hets is used, e.g. SPASS for first-order logic.
A timeout for the automated theorem prover is used to limit the prover’s resources. When the given amount of time is exceeded and no proof or refutation has been found yet, the prover is stopped. A different configuration may lead to a successful proof attempt.
Axiom selection can be used to restrict a proof attempt to only include a subset of the ontology’s axioms for proving. This can reduce proving time and (in some cases) make proving feasible at all, which is particularly important for large ontologies. Axioms can be selected manually or automatically with a heuristic.
The manual method allows the user to select every axiom that may be needed for a proof individually. Figure 13 displays how the manual axiom selection is configured: For each axiom of the ontology, there is a check-box that allows the prover to use it. These check-boxes are labeled by the names of the axioms. Hovering the mouse over the check-box label reveals the definitional text of the axiom. Furthermore, the names and definitional texts of the axioms can be seen in the “axioms” tab of the ontology’s “content” page. This manual approach can be useful for small ontologies, but can also result in tedious work for ontologies with many axioms.
The automatic heuristic is a prover-independent implementation of the SInE algorithm (Hoder and Voronkov, 2011). A very simplified view of the SInE algorithm is that it recursively selects axioms that share a symbol with the conjecture or with axioms that have been selected in previous recursion steps. It expects three parameters to be set by the user that influence how many axioms are selected. The configured axiom selection is shared between all the selected provers.
When a proof attempt is finished, it is assigned a proof status telling the result in a single word, as displayed in Fig. 12: For instance, the first conjecture (stating that Chris is a Father) has been evaluated with the resulting status “THM”. These statuses are defined in the SZS ontology (Sutcliffe, 2008). The most common statuses used by provers are
THM (Theorem): All models of the axioms are models of the conjecture. That is, a proof of the conjecture has been found. CSA (CounterSatisfiable): Some models of the axioms are models of the conjecture’s negation. That is, a counterexample has been found, and the conjecture has been disproved. TMO (Timeout): The prover exceeded the time limit.
Of these statuses, “THM” and “CSA” indicate a successful prover run, while “TMO” shows that the prover did not finish successfully by exceeding the given amount of time. We extended the SZS ontology34
Our modified SZS ontology can be found on
CSAS (CounterSatisfiableWithSubset): Some models of the selected axioms are models of the conjecture’s negation (“A counterexample has been found, but not all axioms have been used”).
Note that this also holds for inconsistent ontologies: from them, everything can be proved – even though some OWL provers will return an error when an inconsistent ontology is presented to them.
Details of each proof attempt, including the proof itself if the attempt finished, can be inspected on the proof attempts page. There, the user can see, for example, the configuration with the selected axioms and the axioms that were used.
While the example “CQbase” describes a workflow of ontology development, it is too small to reveal the benefits of axiom selection. Therefore, we show a second example, concerning the Suggested Upper Merged Ontology (SUMO; Niles and Pease, 2001). It has been extracted from the TPTP library v6.4.0 (Sutcliffe, 2009), namely problem “CSR005^0” (it is named “sumo_ problem” in the figures). It contains over 5000 axioms and one conjecture. If a proof attempt is started with the configuration shown in Fig. 14, which is the default configuration for SInE, only 430 axioms are selected for proving and the proof is found within a second. The details of this proof attempt are displayed in Fig. 15. If, on the other hand, all axioms are selected, the prover runs into a timeout and returns with no result. These two proof attempts are shown in Fig. 16, where the attempt with SInE has the number 1 and the attempt without SInE (using all axioms) has the number 2. This is one of many examples of the positive effect of axiom selection in theorem proving.

Configuration of SInE for a theorem proving problem from SUMO.

Details of the finished proof attempt with SInE.

Overview of the proof attempts on the SUMO problem. Proof attempt 1 is with SInE, proof attempt 2 is without SInE.
Ontohub is a repository engine that is being developed with the needs of the ontology community in mind. Ontohub has features similar to a general purpose web-based Git repository engine. Users may create their own Git repositories to store, publish, and manage their ontologies. However, in addition Ontohub provides many ontology-specific services. It is linked-data compliant, it is able to analyse ontologies in all major ontology languages, and it provides automatic theorem proving capability for many of them. Further, Ontohub’s support for
Ontohub is on its way from a research prototype to productive use. For example, the FOIS 2014 ontology competition has used Ontohub as platform for uploading ontologies used in submissions.36
See
See
See
Future work will improve stability and useability, and include the completion of full
Currently, a re-implementation of Ontohub is under way. The goal is to advance the splitting of the infrastructure into different services. Most prominently, we plan to implement the Ontohub frontend completely in React/Javascript, while Ruby on Rails is only used for the JSON API served by the backend. Communication with Hets, which is currently done by the background process manager Sidekiq,39
See
See
