Abstract
The potential of large language models (LLM) has captured the imagination of the public and researchers alike. In contrast to previous generations of machine learning models, LLMs are general-purpose tools, which can communicate with humans. In particular, they are able to define terms and answer factual questions based on some internally represented knowledge. Thus, LLMs support functionalities that are closely related to ontologies. In this perspective article, I will discuss the consequences of the advent of LLMs for the field of applied ontology.
Introduction
With the release of ChatGPT by Open AI in November of 2022, generative AI captured the attention and the imagination of the public. McKinsey claims that generative AI will have an impact across all economic sectors and estimates that it will add the equivalent of 2.6 to 4.4 trillion USD to the world economy (Chui et al., 2023). According to Bubeck et al. (2023), the large language model (LLM) underlying one version of ChatGPT, GPT-4, shows ‘sparks of general artificial intelligence’. In an open letter famous researchers and investors in tech companies stated that contemporary AI systems (like GPT-4) are human-competitive at general tasks and warned of their potentially catastrophic effects on society (Bengio et al., 2023). In the ‘Scientific American’ Tamylyn Hunt claims that AI systems might be already on their path to achieving ‘superintelligence’ and warns: ‘Any defenses or protections we attempt to build into these AI “gods,” on their way toward godhood, will be anticipated and neutralized with ease by the AI once it reaches superintelligence status.’ (Hunt, 2023) Of course, many disagree with that outlook, including some well-known members of our community (Landgrebe and Smith, 2022; Hastings, accepted).
While an AI overlord may take over the world eventually, it is easy to find examples that show that at this time LLMs are anything but superintelligent (see Section 4.1). Nevertheless, it is hard to deny that they are extremely impressive and versatile tools: they seem to understand human language and can answer questions in a way that requires significant background knowledge. In particular, these tools are able to provide definitions of terms and to translate natural language sentences into formal languages like OWL. Hence, LLMs support functionalities that are associated with the development of ontologies: collecting definitions of terms and knowledge representation, both in natural and formal languages. However, in contrast to ontologies, LLMs are general-purpose tools, which do not require manual development and maintenance by domain experts. Thus, there arise some questions, which may be rather uncomfortable to some readers of Applied Ontology: In the era of LLMs are ontologies becoming obsolete? And even if ontologies are still useful, it seems that LLMs can generate ontologies automatically with the help of a few prompts. Thus, will ontologists become obsolete?
In this paper, I argue why ontologies are here to stay and why their development requires humans in the loop. LLMs will likely become valuable tools for supporting ontology development, but this will be likely as part of semi-automatic toolchains, because there are significant challenges for any attempt to develop ontologies that rely exclusively on the output of LLMs.
Will ontologies become obsolete?
To state the obvious: whether ontologies are made obsolete by LLMs, depends on their purpose. Historically, one early motivation for the development of ontologies was the goal of creating artificial general intelligence (AGI). In particular, the Cyc project tried to achieve this goal by (1) creating an ontology that contains millions of everyday terms, concepts, facts, and rules that comprise common sense and (2) creating on top of that ontology a system that is able to communicate in a natural language, like English, which enables the system to learn from humans (Lenat, 2001). Systems based on LLMs like ChatGPT, Google Bard, and Microsoft Copilot can successfully communicate with humans in natural languages and answer questions that require common sense (and sometimes even quite specialized) knowledge. Thus, without creating an ontology, these systems are closer to a realization of Lenat’s goal than Cyc has come even after 40 years of development. Thus, it is reasonable to conclude that to create AGI LLMs are a more promising technology than ontologies.
However, most ontologies are developed for other purposes. For example, reference ontologies are intended to provide a standardized, controlled vocabulary for a given domain, e.g., the Gene Ontology (Ashburner et al., 2000), the Chemical Entities of Biological Interest ontology (ChEBI) (Hastings et al., 2016), the Open Energy Ontology (Booshehri et al., 2021) and the Financial Industry Business Ontology (Bennett, 2013). These ontologies provide a public resource that specify the semantics of their vocabulary in both machine and human-readable form. In addition, they may also represent empirical knowledge.1
For example, ChEBI contains the terminological knowledge that ethanol is kind of alcohol. In addition, it contains the empirical knowledge that ethanol may reduce the activity of the central nervous system.
In short, while for some use cases ontologies may be replaced by LLMs, in many cases that is not possible. However, even though LLMs may not be able to be used as one-to-one replacements of ontologies, it may be possible to use LLMs to generate ontologies that represent the knowledge explicitly that is represented within LLMs in a subsymbolic way. If so, ontologies would not be obsolete, but ontology engineers would be redundant. In the next session we will discuss why for many use cases, in particular for reference ontologies, that is an unrealistic scenario.
Automated ontology generation with LLMs seems to be a trivial task. For example, based on the prompt ‘Please provide a classification of tools in the form of an ontology in OWL Manchester Syntax’, ChatGPT generates an output which, with a bit of manual editing, may be opened in Protege and contains a classification of tools (see Fig. 1). Experienced ontologists will immediately recognize problems with this ontology: e.g., classes do not contain natural language definitions, some classes (e.g., hammer) are represented as individuals, and the classification criterion seems rather inconsistent since some subclasses seem to be defined by their purpose and some by their power source and some by their typical location. Nevertheless, given that the ontology in Fig. 1 is the result of less than 30 seconds of work, it is easy to see the large potential of LLMs for automatic generation of ontologies and knowledge bases, which is now actively explored (Pan et al., 2023; Caufield et al., 2023; Mateiu and Groza, 2023; Babaei Giglou et al., 2023; Chen et al., 2023; Lopes et al., 2023). Shall we, thus, expect that in the future ontologies will be generated by LLMs automatically, and ontologists to become obsolete?

A small ontology generated by ChatGPT.
Before we jump to such a conclusion, we should recognize that attempts to generate ontologies automatically have been studied in the field of ontology learning at least since the mid-1990s. Asim et al. (2018) surveyed 140 papers and summarized the general methodology of ontology learning as follows: The starting point is unstructured text (e.g., a corpus of relevant research papers on a subject); after some reprocessing step, terms and concepts, relations, and axioms are extracted; afterward, the result is evaluated and then stored as ontology. This general methodology is instantiated by different approaches to ontology learning by implementing it with different linguistic, statistical, and logical methods. LLMs enable new possibilities to extract knowledge from unstructured text and, thus, likely will improve the results of ontology learning.
However, despite the progress that has been made over the last decades, as far as I am aware, there are no widely adopted ontologies that are automatically generated by ontology learning techniques.This is the case because the methodology of ontology learning, as stated above, rests on a fundamental misconception of what ontology development requires. This misconception becomes apparent in the following quote by Asim et al. (2018):
[...] instead of handcrafting ontologies, research trend is now shifting toward automatic ontology learning. Whenever an author writes something in the form of text, he is actually doing it by following a domain model in his mind. He knows the meanings behind various concepts of particular domain, and then using that model, he transfers some of that domain information in text, both implicitly and explicitly. Ontology learning is a reverse process as domain model is reconstructed from input text [...]
This task is a challenge for any automatic approach. Even recognizing that the same term is used in connected, but slightly different meanings within a community is difficult. But assuming that a system was able to detect that a term T is used by authors in a community to mean A, B, and C, how should it resolve that situation automatically? There are several options, but none of them are good: (a) Adding the definitions
As we will discuss in Section 4.2, a LLM would learn to use T in the appropriate sense depending on the context. However, as we will see below, this does not resolve the issue of ambiguity for the purpose of generating an ontology.
Of course, ontologists would not choose any of these solutions. Instead, they would meet with relevant experts and help them to agree on a definition of T, likely one that differs from A, B, C. However, that is not within the scope of the methodology of ontology learning and it would be difficult to automate.
In addition to the technical challenge, there is a social challenge for the automation of developing ontologies, in particular reference ontologies or other ontologies whose success depends on wide adoption. Establishing such an ontology requires building a consensus on what terms mean in a given community. Creating an ontology by some automatic process and then telling everybody that they should just accept the result is not a promising strategy to achieve that goal, because people typically do not appreciate being told by outsiders that they should change the terminology that they are used to. Instead building such consensus requires an open, participatory process, where stakeholders can influence the definitions of the process. Facilitating this process and supporting the creation of a consensus is part of the responsibility of an ontologist (see Neuhaus and Hastings (2022) for a more detailed discussion).
In short, the advent of LLMs creates new opportunities for ontology learning, but it does not change the fundamental picture. Since they are not designed to help a group of domain experts to resolve their differences of opinion or to help to establish a consensus within a community, they will not replace ontologists. However, they might be able to make ontologists more productive by making ontology development easier.
While LLMs will not automate ontology development, they will hopefully make ontology development easier. As an example, Mateiu and Groza (2023) are working on a Protege Plugin that uses GPT to translate natural language sentences into OWL. This kind of functionality promises to be very helpful since axioms could be automatically generated based on natural language definitions. In addition, it would enable a new kind of quality check, which involves comparing the axioms that are associated with an entity with the axioms that are generated based on its natural language definition and other annotations. Another possible application of LLMs is to predict possible extensions of an ontology (possibly as part of a semi-automatic setup where proposed extensions are manually reviewed). For example, we have predicted extensions of the ChEBI ontology using transformers, which are a kind of deep-learning architecture that is also used by LLMs (Glauer et al., accepted-b; Hastings et al., 2021).
While I am generally optimistic about the potential of LLMs to make ontology development easier, there are important challenges for extracting knowledge from LLMs to build ontologies. Ontologies are supposed to contain unambiguous definitions and to be logically consistent. LLMs are not well designed to support either.
Challenge 1: Lack of logical consistency
Here is a riddle: An astronaut is on the way to a solar system, which contains three inhabited planets. One planet is inhabited by Aargs, another by Blorgs, and the third by both. The astronaut has a map, where the planets are labeled ‘Aarg’, ‘Blorg’, and ‘Aarg and Blorg’, but he knows that all of these labels are incorrect. Before he arrives at his destination he needs to figure out which people live on which planet. For this purpose, he can radio one person on one of the planets and ask which species they belong to. What is the best strategy?
The quoted prompt above contains a variant of a widely known logic riddle.3
A solution is included at the end of this paper.
The answers of ChatGPT, MS CoPilot, and Google Bard are not just false, their answers contradict the premises of the riddle. This distinguishes them from humans, who would rather admit that they did not solve the riddle than answer something that is obviously logically inconsistent with the question. Not surprisingly, the ‘logical blind spots’ of these systems are not just limited to consistency but also apply to logical inferences, as the ‘proof’ in Fig. 2 illustrates: based on the fact that all primes larger than 2 are odd, ChatGPT falsely infers that all odd numbers larger than 2 are prime.

Logical reasoning with ChatGPT.
ChatGPT, MS CoPilot, and Google Bard fail to detect logical inconsistencies and fallacies because of the way these systems work: the core functionality of an LLM is to map some input text (i.e., a sequence of tokens4
In the context of LLM ‘token’ means ‘text chunk’. In the simplest case a token can be a word, but it can also be a part of a word or involve other symbols like question marks.
LLMs are not built on top of some ontology in the sense of a concrete document, such as an OWL file. However, LLMs represent knowledge (in a non-symbolic way) and that knowledge seems to be organized in some form; since, for example, ChatGPT will readily agree that all dogs are mammals and that no mammals are fish. Thus, the question arises whether the responses to these questions reflect a consistent ontological view, which can be utilized to build ontologies.
As an experiment, I tried to explore the categories that are used by ChatGPT to organize its knowledge by asking it questions like: ‘What is the difference between a monkey and a hammer?’ ChatGPT responds, quite reasonably, that one is a living being and the other is an inanimate tool.5
Of course, one can ask ChatGPT directly: ‘What are the most general categories of entities?’ However, it responds to these kinds of prompts with content from the philosophical literature, rather than with a list of the categories that it uses in organizing its knowledge.
After using these kinds of questions, there are several observations to be made: ChatGPT is usually consistent in the categorizations it uses. Changing examples or recombining parts of different questions usually lead to the expected responses. Further, by exploring ChatGPT in this way, one encounters many categories that are familiar to ontologists: abstract object, physical object, event, concept, biological entity, artifact etc.
However, small differences may lead to completely different responses. E.g., during my experiments the question ‘Is a cave (as geological feature) a physical object?’ led to a negative response. In the same session directly as a follow-up, the prompt ‘Is the cave (as geological feature) a kind of physical object?’ led to an affirmative response. First, it argued that a cave is a ‘composite of physical elements and structures, and the term is used to describe the overall geological formation rather than a distinct, standalone object’, then it argued that a cave ‘as a geological feature refers to a complex physical entity composed of various elements’ and, thus, ‘[...] is indeed a kind of physical object or formation within the Earth’s crust.’ In one case exactly the same question (‘Is a soccer team a physical entity or an abstract entity?’) led to opposite answers in different sessions. Another example where the response seemed to be dependent on the context is the question whether litter is a kind of artifact or not.
Further, there is sometimes a disconnect between the way ChatGPT categorizes entities and how it explains its answers. E.g., it defines physical entity as anything ‘that occupies space and has mass, thus having a tangible, material existence’ and abstract entities as ‘conceptual or non-physical entity that exists in thought or within a specific framework but lacks a tangible, concrete presence in the physical world.’ It follows from these definitions that these categories are disjoint. However, if asked whether some entities are both physical and abstract, ChatGPT lists many examples including national flags, physical laws, and coins. Interestingly, while this response contradicts its previous statements, the contradiction disappears (or is at least alleviated) when one asks ChatGPT to categorize the examples. E.g., when then asked whether a coin is both physical and abstract, it usually6
The answer varies depending on context and the exact formulation.
These results seem to be confusing. However, they can be explained by considering the differences between ontologies and LLMs. Ontologies resolve ambiguities by distinguishing between terms and the entities they denote. LLMs do not resolve ambiguities, rather they learn to navigate them during their training by adjusting the mapping from texts to probability distributions of tokens appropriately. For this reason, ChatGPT categorizes ‘book’, dependent on the context, as a physical entity or as a kind of literary work that is subject to copyright law. A major benefit of this probabilistic approach is that it enables LLMs to deal with the kind of ambiguities that are ubiquitous in natural language. This feature also enables it to manage inconsistent uses of technical terms like ‘artifact’ in the literature. However, as a consequence, LLMs may give inconsistent responses. And, thus, although the ability to handle ambiguity is a strength of LLMs, this strength is an obstacle to the goal of supporting ontology development, since ontologies rely on unambiguous logical definitions, which are applied consistently.
With little effort, ChatGPT can be prompted to generate (almost) syntactically valid OWL files, which contain a taxonomy of some domain. Further, a more in-depth exploration of the way ChatGPT categorizes its knowledge shows that its responses are often quite reasonable. Hence, there is no doubt LLMs provide exciting opportunities for supporting ontology development, and there are already tools developed that use them for that purpose. Applied Ontology’s upcoming special issue on “Ontologies and Large Language Models” will be covering this important topic from many angles.7
However, this does not mean that either ontologies or ontologists are going to be obsolete. Further, when we attempt to use LLMs for ontology development, we need to be aware that LLMs do not implicitly hide an ontology within their billions of parameters, which we just need to tap into. The training data of any LLM likely contains a variety of divergent views on any given subject matter and LLMs are not designed to ensure logical consistency. Therefore, we should not expect the output of an LLM to reflect a logically consistent and ontologically sound view of any given domain.
Footnotes
Acknowledgements
I would like to thank Martin Glauer, Janna Hastings, and Till Mossakowski for their helpful feedback.
Solution to the riddle
To solve the riddle one needs first to realize that since all labels are incorrect, asking an inhabitant of the planet labeled ‘Aarg and Blorg’ enables one to determine which species lives on it. The second step requires a distinction of cases: In case the inhabitants of the planet labeled ‘Aarg and Blorg’ are Aargs, then the planet labeled ‘Blorg’ must be inhabited by both races, because otherwise its label would be correct. Consequently, this leaves the planet labeled ‘Aarg’ as the home of the Blorgs. The other case works analogously.
