Abstract
An increasing amount of data is now available from public and private sources, and the types, formats, and number of data sources are also increasing. In the last few years, techniques for extracting, storing, processing, and analyzing such data have been developed to manage this bewildering variety, based on a structure called a knowledge graph. Industry has devoted a great deal of effort to the development of knowledge graphs, which are now critical to the functions of intelligent virtual assistants such as Siri, Alexa, and Google Assistant. The goal of the Ontology Summit 2020 was to understand not only what knowledge graphs are but also where they originated, why they are so popular, what the current issues are, and what their future prospects are. The summit sessions examined many examples of knowledge graphs and surveyed the relevant standards, both existing and in development. The purpose of this Communiqué is to summarize our understanding from the Summit in order to foster research and development of knowledge graphs.
Introduction
While there is a long history of the use of knowledge graphs (KGs) across various domains, in the last few years they have proven to be an especially important tool for semantic technologies and related research areas. As structured representations of semantic knowledge stored in a graph, KGs are lightweight versions of semantic networks that can potentially scale to massive data repositories such as the entire World Wide Web (“Semantic Network”, 2020). Industry has devoted a great deal of effort to the development of knowledge graphs, and they are now critical to the functions of intelligent virtual assistants such as Siri, Alexa, and Google Assistant. Research communities for which KGs are relevant include applied ontology, big data, linked data, Open Knowledge Network (OKN), artificial intelligence (AI), and deep learning.
The Ontology Summit 2020 examined KGs from several perspectives during a series of virtual sessions held from September 2019 to June 2020. This Communiqué synthesizes and summarizes the findings of this series. The Ontology Summit 2020 was organized around question words whose answers are considered basic for information gathering, problem solving, or establishing a context. These questions include the traditional Five Ws, to which we added “How”, “Whence”, and “Whither”, as shown in Fig. 1. Accordingly, this Communiqué is organized around these question words. To promote a consistent terminology for the notion of a KG, we begin in Section 2 by proposing a practical answer to “What”: a definition of a KG based on definitions published in the literature as well as on presentations by invited speakers and discussions during the summit. Section 3 offers some suggestions for “Why” KGs have recently become popular, as well as “Whence” KGs arose. Section 4 addresses “How”, “Who”, “Where”, and “When” by examining examples of techniques and tools used in the many activities of KG systems. Sections 5, 6, and 7 are concerned with “Whither”. Section 5 lists standards and standardization efforts relevant to KGs, Section 6 lists some of the problems and challenges of KGs identified by the Ontology Summit, and Section 7 speculates about the future prospects of KGs. The Communiqué ends with a conclusion and acknowledgments.

Fig. 1. The context questions.
The Ontology Summit 2020 covered a great deal of material. This Communiqué only outlines the main points of the 32 sessions that occurred over 9 months. Consequently, much of what took place is not covered in this article. It is our intention to present the findings of the summit more completely and in more detail in a series of articles to be published separately.
We begin by addressing the question of what a knowledge graph is. Unfortunately, a great many academic papers, websites, and companies have proposed different definitions. To synthesize a coherent definition that helps frame the discussion about KGs, we reviewed the definitions in references (Krötzsch and Thost, 2016; Paulheim, 2017; Blumauer, 2014; Färber, Ell, Menne, Rettinger, and Bartscherer, 2018; Pujara, Miao, Getoor, and Cohen, 2013; Rohrseitz, 2019; Aijal, 2019; Bergman, 2019; Aasman, 2019). The definitions have the following common features:
All of the definitions emphasize that a KG represents interrelationships.
In every definition, a KG uses techniques to extract knowledge from one or more sources.
The organization of knowledge is based on a graph.
A KG has a schema.
A KG supports various graph-computing, search, and query interfaces.
To see how these features are specified in a KG definition, consider the definition of a KG given by Nicola Rohrseitz:
“A Knowledge Graph is a set of datapoints linked by relations that describe a domain, for instance a business, an organization, or a field of study…. Knowledge Graphs are secondary or derivate datasets: They are obtained by analyzing and filtering the original data…. Knowledge Graphs are also sometimes called semantic networks. Semantic emphasizes the fact that the meaning is encoded together with the corresponding data. This is done through the taxonomies and ontologies …” (Rohrseitz, 2019)
This definition begins by specifying that a KG is “a set of datapoints linked by relations” thereby specifying that a KG represents interrelationships and is organized as a graph. The fact that a KG extracts knowledge is specified by stating that KGs are “secondary or derivate datasets”. The schema is “done through the taxonomies and ontologies”, and the purpose of the taxonomies and ontologies is to encode meaning together with the corresponding data. This definition of a KG specifies that KGs are constructed by analyzing and filtering source data. Rohrseitz (2019) follows the definition of a KG with a discussion of other operations on KGs beyond those operations that are used for constructing KGs.
From these features it is apparent that a KG is not simply another way to represent facts. It involves a software architecture that includes active capabilities for extracting and processing the facts. Jans Aasman (2019) classified the operations of a KG as (1) Generation, (2) Storage, and (3) Applications, and he further subclassified the Generation operations into Collection and Processing. Examples of Collection operations include ingestion (of one or more sources), web extraction, catalog extraction, and ontology importation. Processing operations include schema mapping, entity resolution, and data cleaning. Application operations include querying, graph mining, recommendation generation, searching, and question answering. Statistical and machine learning techniques are frequently employed by many of these operations.
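To make this classification concrete, here is a minimal Python sketch of the three groups of operations applied to two toy sources. It is an illustration under stated assumptions, not a description of any particular KG product: the source records, the field-to-predicate mapping, the alias table used for entity resolution, and all function names are hypothetical, and storage is reduced to an in-memory set of triples.

```python
# Hypothetical source records with differing field names (Collection).
SOURCE_A = [{"name": "ACME Corp.", "city": "Boston"}]
SOURCE_B = [{"company": "Acme Corporation", "ceo": "J. Smith"}]

# Schema mapping: source field -> KG predicate (hypothetical vocabulary).
FIELD_MAP = {"city": "locatedIn", "ceo": "hasCEO"}
KEY_FIELDS = {"name", "company"}

# Entity resolution: normalized surface form -> canonical entity identifier.
ALIASES = {"acme corp.": "ex:Acme", "acme corporation": "ex:Acme"}


def clean(value):
    """Data cleaning: trim whitespace and normalize case for matching."""
    return value.strip().lower()


def resolve(surface):
    """Map a surface form to a canonical ID, minting one if it is unknown."""
    key = clean(surface)
    return ALIASES.get(key, "ex:" + key.replace(" ", "_"))


def generate(sources):
    """Generation = Collection (ingest records) + Processing (map, resolve, clean)."""
    triples = set()
    for records in sources:
        for record in records:
            subject_field = next(f for f in record if f in KEY_FIELDS)
            subject = resolve(record[subject_field])
            for field_name, value in record.items():
                if field_name in FIELD_MAP:
                    triples.add((subject, FIELD_MAP[field_name], value))
    return triples


def query(store, subject=None, predicate=None):
    """Application: a simple pattern match over the stored triples."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
    )


# Storage: here simply an in-memory set of triples.
store = generate([SOURCE_A, SOURCE_B])
print(query(store, subject="ex:Acme"))
# [('ex:Acme', 'hasCEO', 'J. Smith'), ('ex:Acme', 'locatedIn', 'Boston')]
```

In practice each step is far more elaborate (entity resolution, for example, often relies on statistical or machine learning techniques rather than a lookup table), but the division of labor between generation, storage, and applications is the same.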
The fact that its operations are the primary distinguishing feature of a KG may explain the confusion about what a KG is. It has also led many to characterize KGs as “nothing new” and as simply another buzzword. The term “knowledge graph” is partly to blame, since it suggests that a KG is no more than a special kind of graph or network. It may therefore help to use a less confusing term. Accordingly, we define a knowledge graph architecture (KGA) to be the fundamental structures of a KG together with the operations that can be performed on it. A KGA functions as a blueprint for the KG, specifying the tasks to be executed and how they are related to each other. Figure 2 is an example of a KGA. The left-hand column of Fig. 2 shows the various kinds of sources for the KG. The boxes at the bottom of the middle column represent the operations that extract the KG from the sources. The KG itself is shown in the center of the figure; it consists of a graph (“data layer”) and a schema (“pattern layer”). The right-hand column represents the operations that can be performed on the KG.

Fig. 2. A knowledge graph architecture from (Yuan, Zhang, Dai, Peng and Zhao, 2018). HTML is the HyperText Markup Language and XML is the Extensible Markup Language.
The architecture of a KG system is analogous to that of a data warehouse (DW). DW technologies have been used to integrate and harmonize data so that analysts and users can reliably extract meaning from large enterprise datasets. But although well established, the DW approach involves significant up-front and ongoing costs, as well as serious risks. Further, because of data complexity, DWs do not cover significant areas of enterprise data. Data warehousing also relies on older technologies that lack the flexibility of KGs, making DWs too slow to meet the ever-changing demands of Big Data. KG systems offer a more modern, flexible, and dynamic approach to data sharing and integration, although, to be fair, the KG approach can also have significant costs and complexity. As discussed in Section 4, many different methods and technologies are employed by KG systems.
As KGs and KGAs are engineering artifacts, they have associated processes for development, testing, validation, management, and the overall lifecycle. By analogy with DataOps (Liebmann, 2020), the set of practices that combines all of these processes with the KGs might be called KnowOps. Despite the name “knowledge graph”, there is no requirement that the statements be implemented as a graph. Moreover, a collection of KG systems could themselves be the sources for an overarching KG that fuses the source KGs.
Ontologies can play a number of roles in a KGA. Some KGs incorporate an ontology as part of their structure, in which case the notions of KG and ontology are essentially equivalent. In other cases, the KG and the ontology are decoupled, and one KG can have more than one associated ontology, so that an ontology plays a role analogous to a relational database view.
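The view-like role of a decoupled ontology can be illustrated with a small sketch in which the same stored triples are read through two different ontologies. Each ontology is modeled, as a simplifying assumption, as a mapping from stored predicates to its own terms, and predicates it does not map are hidden from it, much as a relational view exposes only some columns. The predicates, prefixes, and mapping tables are hypothetical.

```python
# One set of stored statements (the KG).
KG = [
    ("ex:alice", "emp_of", "ex:acme"),
    ("ex:alice", "age_yrs", "34"),
    ("ex:bob", "emp_of", "ex:acme"),
]

# Two hypothetical ontologies, each acting like a database view:
# a mapping from stored predicates to the ontology's own terms.
HR_ONTOLOGY = {"emp_of": "hr:employedBy", "age_yrs": "hr:age"}
ORG_ONTOLOGY = {"emp_of": "org:memberOf"}


def view(kg, ontology):
    """Return the KG as seen through one ontology; unmapped predicates are hidden."""
    return [(s, ontology[p], o) for s, p, o in kg if p in ontology]


hr_view = view(KG, HR_ONTOLOGY)
org_view = view(KG, ORG_ONTOLOGY)
print(org_view)
# [('ex:alice', 'org:memberOf', 'ex:acme'), ('ex:bob', 'org:memberOf', 'ex:acme')]
```

Neither view changes the stored data, so a further ontology can be attached to the same KG without migrating it.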
We summarize the discussion in this section with a proposal for the definition of a KG and KGA as follows:
A KG is a representation of a set of statements in the form of a node- and edge-labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A collection of definitional statements specifying the meaning of the knowledge graph’s labels is called its schema.
A KGA provides a combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a reasonable way that is both human- and machine-readable.
A KGA together with a collection of KGs is a KG system (KGS).
While it is beyond the scope of this Communiqué to give formal mathematical definitions for all of the terms used in the definitions of KG and KGA above, we can give a formal definition of the most fundamental term, namely the notion of a “node- and edge-labeled directed multigraph” as follows:
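A node- and edge-labeled directed multigraph is an 8-tuple $(V, E, s, t, L_V, L_E, \ell_V, \ell_E)$, where $V$ is a set of nodes, $E$ is a set of edges, $s\colon E \to V$ and $t\colon E \to V$ assign to each edge its source and target node, $L_V$ and $L_E$ are sets of node labels and edge labels, and $\ell_V\colon V \to L_V$ and $\ell_E\colon E \to L_E$ are the node- and edge-labeling functions.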
Note that there is no assumption that V and E must be disjoint sets. Indeed, it can be useful for an edge to also be a node. By doing so, an edge can have properties and can be related to other nodes, some of which may themselves be edges.
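The formal definition can be made concrete with a short Python sketch that stores a multigraph directly as its components: the node and edge sets, the source and target maps, and the node- and edge-labeling functions. Because every edge has its own identifier, two heterogeneous edges can join the same pair of nodes, and because nodes and edges are not required to be disjoint, an edge can also be added as a node and given properties of its own. The class name, method names, and example data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledMultigraph:
    """Node- and edge-labeled directed multigraph (illustrative sketch)."""
    nodes: set = field(default_factory=set)         # V
    edges: set = field(default_factory=set)         # E (may overlap with V)
    source: dict = field(default_factory=dict)      # s: E -> V
    target: dict = field(default_factory=dict)      # t: E -> V
    node_label: dict = field(default_factory=dict)  # node-labeling function
    edge_label: dict = field(default_factory=dict)  # edge-labeling function

    def add_node(self, v, label):
        self.nodes.add(v)
        self.node_label[v] = label

    def add_edge(self, e, src, tgt, label):
        # Edge identifiers are explicit, so parallel edges are allowed.
        if src not in self.nodes or tgt not in self.nodes:
            raise ValueError("source and target must be nodes")
        self.edges.add(e)
        self.source[e], self.target[e] = src, tgt
        self.edge_label[e] = label

    def edges_between(self, src, tgt):
        return sorted(e for e in self.edges
                      if self.source[e] == src and self.target[e] == tgt)


g = LabeledMultigraph()
g.add_node("n1", "Person")
g.add_node("n2", "Organization")
# Two heterogeneous edges between the same pair of nodes.
g.add_edge("e1", "n1", "n2", "worksFor")
g.add_edge("e2", "n1", "n2", "founded")
# Make edge e1 also a node so that it can carry its own properties.
g.add_node("e1", "Employment")
g.add_node("d1", "Date")
g.add_edge("e3", "e1", "d1", "startDate")

print(g.edges_between("n1", "n2"))          # ['e1', 'e2']
print("e1" in g.nodes and "e1" in g.edges)  # True
```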
We now examine the question of why KGs and KG systems have become popular. This is part of the broader question of why one bothers with information at all, a question provocatively asked by Matthew West (West, 2020). Within that broader context, Jans Aasman suggested several reasons why KGs have recently become so popular (Aasman, 2019).
In business, information is used to support decisions. If information required for a decision is missing or inaccurate, the risk of a mistake increases. So, to support a decision, information needs to be fit for purpose, which means information management is a quality management process where information is the product. However, determining the information requirements is problematic. It turns out that asking people for their requirements gives unreliable results. A better approach is to document the processes to the level where key decisions are explainable. It is then possible to document the information requirements for those decisions.
Information has a lot of properties, but only some of them are critical for its use in supporting decisions. One of the hardest properties to achieve is consistency. If data is consistent, then when it arrives from different sources it can just be brought together and used immediately. Consistent data uses the same data model and reference data (or, if you prefer, knowledge graphs of the same ontology). However, if the sources are not consistent, either individually or with each other, then one must not only extract information from sources but also resolve the inconsistencies. Consequently, it is necessary to develop a set of tools for this purpose. In other words, there is a need for a software architecture for the information.
Given that one needs a system for capturing knowledge, a natural question is what made KGs so popular. One potential reason is that graph database technology is now commonly available, due at least in part to the advent of social media companies. Another possible reason is that people are now more accustomed to classifications, perhaps in part because of the prevalence and acceptance of tagging in social media by the general population. As the number of classifications increases, it is natural to classify the classifications, in other words, to create a taxonomy. Still another possible reason for the popularity of derived databases such as KGs is the advent of tools for entity extraction and Natural Language Processing (NLP) that are easy to obtain and to deploy in applications. Finally, systems that provide sophisticated processing and analysis are easier to develop now that machine learning and advanced analytics tools are easy to obtain and to use in applications. While some of these trends are not specifically about KGs, their compatibility with KGs may be contributing to the increasing popularity of KGs.
Note that a capability for reasoning/inference is not in this rationale. Indeed, there are successful KG systems that either have a minimal schema or do not have significant emphasis on the schema. That said, there is general agreement about the usefulness of ontologies for KG systems.
Knowledge graph techniques and tools
In this section we provide a sample of the kinds of techniques and tools being used in and being developed for KGs. Many of the projects described in this section are being supported by the Open Knowledge Network (OKN). The OKN is a program of the U.S. National Science Foundation with the goal of developing an advanced science data infrastructure that is interoperable and has an open architecture, making it easier to access and link heterogeneous data products (Baru, 2020). More succinctly, the goal of the OKN is a “Siri for Science”.
Section 4.1 describes various forms of reasoning and mathematical techniques from probability theory and category theory for KGs. One important challenge for KGs is spatial and temporal reasoning, and Section 4.2 presents two projects addressing this. The rest of this section is devoted to projects in some of the many domains for which KG techniques have been applied. Section 4.3 is concerned with extracting KGs from scientific publications, Section 4.4 is concerned with KGs in product design and manufacturing, Section 4.5 describes two applications of KGs to government problems, and Section 4.6 proposes to use KGs for a new kind of dynamically interactive textbook. For more details about each project, please see the link to the associated slide or video presentation given by the cited reference.
Techniques
Despite the varying definitions of the notion of a KG, there is a common goal: to use KGs to gain important insights and make data-driven discoveries. Anirudh Prabhu defines “insights” as important patterns, trends, and concordant information obtained from KGs, especially in cases where such features are not obvious from simple data exploration tasks (Prabhu, 2020). Using reasoners to gain insights and make inferences about the data is a method commonly known and used in the Semantic Web community. But by using methods (both visual and analytical) from network science, one can identify previously unseen patterns and trends and use these insights to generate or validate hypotheses and aid in scientific discoveries. Global metrics are used to gain insights about the entire network structure and to compare two or more networks with each other. Local metrics are used to inspect individual nodes and their neighborhoods and to find important trends within a network. Community detection algorithms are used to mathematically identify groups of nodes within a network, usually based on how these nodes are connected to each other. Lastly, Prabhu examines (both visually and mathematically) the evolution of a network as a specific data feature (e.g., time, pressure, or temperature) changes, to identify how the addition or removal of a node (or set of nodes) affects the overall network structure.
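The three kinds of network-science analysis mentioned above can be illustrated on a toy graph. The sketch below computes one global metric (density), one local metric (degree centrality), and finds communities by exhaustively maximizing Newman's modularity over all splits of the nodes into at most two groups. The graph is hypothetical, and exhaustive search is feasible only for very small graphs; it stands in for practical community detection algorithms such as Louvain and is not a description of Prabhu's method.

```python
from itertools import combinations

# Hypothetical undirected graph: two triangles joined by one bridge edge (c-d).
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]


def degrees(nodes, edges):
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def density(nodes, edges):
    """Global metric: fraction of possible undirected edges that are present."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Local metric: each node's degree divided by n - 1."""
    n = len(nodes)
    return {v: d / (n - 1) for v, d in degrees(nodes, edges).items()}


def modularity(nodes, edges, groups):
    """Newman modularity: Q = sum over groups of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    deg = degrees(nodes, edges)
    q = 0.0
    for group in groups:
        internal = sum(1 for u, v in edges if u in group and v in group)
        total_degree = sum(deg[v] for v in group)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def best_bipartition(nodes, edges):
    """Exhaustive modularity maximization over splits into at most two groups."""
    first, rest = nodes[0], nodes[1:]
    best_q, best_groups = float("-inf"), None
    for k in range(len(rest) + 1):
        for chosen in combinations(rest, k):
            left = {first, *chosen}
            right = set(nodes) - left
            groups = [g for g in (left, right) if g]
            q = modularity(nodes, edges, groups)
            if q > best_q:
                best_q, best_groups = q, groups
    return best_q, best_groups


print(round(density(NODES, EDGES), 3))           # 0.467
print(degree_centrality(NODES, EDGES)["c"])      # 0.6
q, groups = best_bipartition(NODES, EDGES)
print(round(q, 3), [sorted(g) for g in groups])  # 0.357 [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The bridge nodes c and d have the highest degree centrality, while the modularity-optimal split separates the two triangles, which is the kind of group structure that is not obvious from inspecting individual triples.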
Another approach to gain insights is to use probabilistic KGs as presented by Srihari (2020). These KGs incorporate statistical models for relational data. Triples are assumed to be incomplete and noisy. There are two main types of models: latent feature models and Markov random fields (MRFs). Latent feature models can be trained using deep learning. MRFs can be derived from Markov Logic Representations of facts in a database.
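To illustrate how an MRF can arise from a Markov logic representation, the sketch below grounds a single weighted formula, Smokes(A) implies Cancer(A), for one constant A. Each possible world (a truth assignment to the ground atoms) receives a weight equal to the exponential of the summed weights of the formulas it satisfies, and probabilities are obtained by normalizing over worlds. The predicates, the weight 1.5, and the brute-force enumeration are illustrative assumptions; realistic KGs require approximate inference.

```python
import math
from itertools import product

ATOMS = ["Smokes(A)", "Cancer(A)"]

# Weighted ground formulas: (weight, test of whether a world satisfies it).
FORMULAS = [
    (1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"]),  # Smokes(A) => Cancer(A)
]


def worlds(atoms):
    """Every truth assignment to the ground atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def weight_of(world, formulas):
    """Unnormalized MRF potential: exp of the weights of the satisfied formulas."""
    return math.exp(sum(wt for wt, holds in formulas if holds(world)))


def probability(query, evidence, atoms=ATOMS, formulas=FORMULAS):
    """P(query atom is true | evidence), by brute-force enumeration of worlds."""
    numerator = denominator = 0.0
    for world in worlds(atoms):
        if all(world[a] == v for a, v in evidence.items()):
            w = weight_of(world, formulas)
            denominator += w
            if world[query]:
                numerator += w
    return numerator / denominator


print(round(probability("Cancer(A)", {"Smokes(A)": True}), 4))  # 0.8176
```

Because the formula is soft rather than strict, a world that violates it is merely less probable, which matches the assumption that triples are incomplete and noisy.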
Yet another technique for gaining insights is to use the mathematical theory of categories and functors. In “Composing Knowledge Graphs, inside and out”, Spencer Breiner explained how some of the limitations of graph-based knowledge representations can be addressed formally by using foundational methods from category theory (Breiner, 2020). While category theory is regarded as very abstract even among mathematicians, categories are in fact closely related to KGs. A category consists of a collection of objects and arrows (directed links) between them, together with a composition operation on consecutive arrows and an identity arrow for each object; its underlying structure is therefore a directed graph. This approach can be applied to practical problems: as an illustration, the open-shop scheduling problem from operations research was presented using category theory.
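One common way to connect categories to KGs, used here as an assumption rather than as a summary of Breiner's talk, is to read a schema graph as presenting a category (its arrows compose along paths) and a data set as an assignment of a set to each object and a function to each arrow. Constraints on the data can then be stated as path equations, that is, equalities between two composite arrows, and checked mechanically. The schema, the data, and the constraint below (an employee's manager works in the same department as the employee) are hypothetical.

```python
# Objects of the schema and, for this instance, the set assigned to each.
SETS = {
    "Employee": {"alice", "bob", "carol"},
    "Department": {"sales", "it"},
}

# Arrows of the schema: name -> (source object, target object, function as a dict).
ARROWS = {
    "manager": ("Employee", "Employee",
                {"alice": "alice", "bob": "alice", "carol": "carol"}),
    "worksIn": ("Employee", "Department",
                {"alice": "sales", "bob": "sales", "carol": "it"}),
}


def compose(path):
    """Compose arrows left to right; return (source, target, function)."""
    src, tgt, _ = ARROWS[path[0]]
    for name in path[1:]:
        next_src, next_tgt, _ = ARROWS[name]
        if next_src != tgt:
            raise ValueError(f"cannot compose: {tgt} != {next_src}")
        tgt = next_tgt

    def apply(x):
        for name in path:
            x = ARROWS[name][2][x]
        return x

    return src, tgt, {x: apply(x) for x in SETS[src]}


def path_equation_holds(lhs, rhs):
    """True if two parallel paths denote the same function in this instance."""
    return compose(lhs) == compose(rhs)


print(path_equation_holds(["manager", "worksIn"], ["worksIn"]))  # True
ARROWS["worksIn"][2]["bob"] = "it"  # bob moves, but his manager does not
print(path_equation_holds(["manager", "worksIn"], ["worksIn"]))  # False
```

The point of the categorical view is that the constraint is stated once at the schema level, as an equation between paths, and every instance can be checked against it.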
Temporal and spatial projects
One challenge facing KGs is the problem of representing time and space. Even very powerful AI systems can falter in dealing with time. If you ask Google “How old is Joe Biden?” or “How old is Mitch McConnell?”, you get the correct answers; but if you ask “Who is older, Joe Biden or Mitch McConnell?”, all you get are links to articles that mention both politicians. The problem is that while KGs typically do include temporal features of entities, they are treated as little more than textual strings with no other semantics. Furthermore, many features and relations that are in fact time-dependent, such as the spatial extent of countries, are treated as timeless. This situation is surprising since temporal reasoning is highly developed in AI and database management. Some aspects of temporal and spatial reasoning were covered in the Ontology Summit 2018 on Context and Ontologies (Baclawski et al., 2018). Additionally, standards bodies have been developing temporal and spatial representation and reasoning standards as discussed in Section 5. Unfortunately, within KGs time is an afterthought if it is included at all. Spatial reasoning has similar challenges, although the need for spatial reasoning is less common than for temporal reasoning. The KG research community should explore all aspects of time and space, from the abstract to the concrete, from general purpose reasoning to highly specific applications. In the long run, the benefits to AI systems of effective, flexible temporal and spatial reasoning will be large (Davis, 2020). The following two projects are attempting to respond to this challenge.
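The age comparison above fails when dates are stored as opaque strings. The sketch below stores birth dates as typed ISO 8601 date literals and parses them, so that “who is older” becomes a simple date comparison. The triple layout, the identifiers, and the predicate name are hypothetical; the birth dates (20 February 1942 for Mitch McConnell and 20 November 1942 for Joe Biden) are public record.

```python
from datetime import date

# Birth dates stored as typed, xsd:date-style literals (ISO 8601), not free text.
KG = [
    ("ex:Joe_Biden", "birthDate", "1942-11-20"),
    ("ex:Mitch_McConnell", "birthDate", "1942-02-20"),
]


def birth_date(entity):
    for s, p, o in KG:
        if s == entity and p == "birthDate":
            return date.fromisoformat(o)
    raise KeyError(entity)


def age_on(entity, when):
    """Age in whole years on a given date."""
    born = birth_date(entity)
    had_birthday = (when.month, when.day) >= (born.month, born.day)
    return when.year - born.year - (0 if had_birthday else 1)


def older(a, b):
    """Return whichever entity has the earlier birth date."""
    return a if birth_date(a) < birth_date(b) else b


print(older("ex:Joe_Biden", "ex:Mitch_McConnell"))  # ex:Mitch_McConnell
print(age_on("ex:Joe_Biden", date(2020, 6, 1)))     # 77
```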
The KnowWhereGraph, developed by Krzysztof Janowicz, is a project that extends geographic information systems (GIS) with open graph-based linking and semantic enrichment technologies that go far beyond pre-defined data themes and silos (Janowicz, 2020). The ultimate goal is to understand how to engineer meaningful features (independent variables) via a KG-based GIS that includes spatiotemporal semantics, for use in downstream models such as supply chain forecasting or soil health mapping.
Sean Gordon is part of a team that is prototyping an OKN for spatial decision support (Gordon, 2020). Building on existing work by members of the team, four case study sub-teams were created to perform needs analysis for multi-stakeholder organizations focused on three core environmental themes (water quality, wildland fire, and biodiversity) in different regions of the western U.S., and one case study sub-team was created to work on a professional body of knowledge for geographic information science and technology. Each of the four case study sub-teams used interviews and/or workshops to have collaborators identify need-to-know concerns and questions. This approach helped prioritize (a) the schema for a KG that will support decision making for each theme, (b) spatial decision support resources to add to the KG, and (c) particular use cases.
Scientific publishing
The main output of science is publications. There are around 30 000 journals, and about two million papers are published every year. Efforts to extract the knowledge in the scientific record predate the World Wide Web (Baclawski et al., 1993a,b). Yolanda Gil describes seven ontologies that provide essential capabilities, but much work remains to be done to capture the scientific record more comprehensively. Are we far from a day when each scientific article will be properly linked to hypotheses, models, software, provenance, workflows, and other key scientific entities on the Web, as shown in Fig. 3? Will AI research tools then be able to access this information to generate new results? Will AI systems ultimately be capable of writing scientific papers autonomously? (Gil, 2020)

Fig. 3. Links among entities of scientific knowledge from (Gil, 2020).
The mission of the Manufacturing Open Knowledge Graph (MOKN) project is to structure the world’s public information on product design and manufacturing (Starly, 2020). MOKN’s broader impact is to make information available for sourcing critical part components, for instantly gathering specific manufacturing capabilities and the locations of those services, and for determining the availability of resources. The global pandemic crisis illustrates the value of this knowledge, especially for alternate sourcing and prequalification of vendors, with implications for public health and national security. Accessibility also empowers rural and suburban communities that depend on manufacturing services.
Government
Matthew West is involved with an ambitious attempt in the UK to develop a Digital Twin of the entire national infrastructure. The aim is to establish a distributed Digital Twin of consistent data so that authorized users can construct queries across the Digital Twins in order to answer questions like “Which Tower Blocks have the same type of cladding as Grenfell Tower?” The Information Management Landscape sets out the information needed to support the critical properties of data and the information quality management process. Part of this infrastructure is an integration architecture that allows the distributed National Digital Twin to be virtualized so users can see it as a single database with access only to the data they are authorized to see (West, 2020). In effect, this is a KG system for which the underlying source data is extracted from a large collection of KG systems, each of which is devoted to a single city or small region.
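The virtualization idea can be sketched as a federated query: each regional store is queried separately, only the stores a user is authorized to see are included, and the results are merged so that the user sees what looks like a single database. All store names, building names other than Grenfell Tower, cladding identifiers (which are placeholders), and permissions below are hypothetical, and the sketch does not describe the actual National Digital Twin design.

```python
# Hypothetical regional stores, each holding its own triples.
STORES = {
    "region-north": [
        ("Grenfell Tower", "hasCladding", "cladding-type-1"),
        ("Tower B", "hasCladding", "cladding-type-2"),
    ],
    "region-south": [("Tower C", "hasCladding", "cladding-type-1")],
    "region-east": [("Tower D", "hasCladding", "cladding-type-1")],
}

# Hypothetical access rights: user -> stores that user may query.
PERMISSIONS = {
    "analyst": {"region-north", "region-south"},
    "auditor": set(STORES),
}


def visible_triples(user):
    """Virtual single database: the union of the stores the user may see."""
    for name in sorted(PERMISSIONS.get(user, set())):
        yield from STORES[name]


def same_cladding_as(user, building):
    """Buildings, visible to the user, that share the given building's cladding type."""
    triples = list(visible_triples(user))
    types = {o for s, p, o in triples if s == building and p == "hasCladding"}
    return sorted(s for s, p, o in triples
                  if p == "hasCladding" and o in types and s != building)


print(same_cladding_as("analyst", "Grenfell Tower"))  # ['Tower C']
print(same_cladding_as("auditor", "Grenfell Tower"))  # ['Tower C', 'Tower D']
```

The same question yields different answers for different users, because authorization is applied when the virtual database is assembled rather than inside each regional store.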
The Rich Context project described by Paco Nathan is the KGA of the Administrative Data Research Facility (ADRF) platform that is currently used by 50 federal, state, and local agencies in the U.S. to identify people with specific expertise (Nathan, 2020). ADRF was cited as the first example of Secure Access to Confidential Data in the final report of the Commission on Evidence-Based Policymaking.
Education
College students today face the challenge of mastering concepts in new subject areas and relating those concepts across multiple disciplines, yet their textbooks take a “one size fits all” approach. In “Textbook Open Knowledge Network”, Vinay K. Chaudhri presented Intelligent Textbooks (ITB), which use AI and KGs to address these problems. Students can dynamically interact with the textbook content, which increases their ability to understand concepts and their engagement, thereby improving academic performance (Chaudhri, 2020).
Standards
We now discuss some standards that are relevant for KGs. In general, “Standards are documented agreements containing technical guidelines to ensure that materials, products, processes, representations, and services are fit for their purpose” (Allen and Sriram, 2000). In the context of KGs, the main purposes of standards are to support KG construction and interoperability. Generally speaking, standards for KGs focus either on representation of KGs and their schemas (i.e., the syntax) or on the meaning of KGs (i.e., the semantics). One kind of syntax standard specifies how to represent a KG and its schema. Such a syntax is called an interchange format. Unfortunately, interchange formats have proliferated, leading to the need for another kind of syntax standard which specifies how to map from one format to another one. Semantic standards range from simple vocabulary lists to sophisticated ontologies, and ontologies range from very general “foundation” ontologies to very specific domain ontologies.
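As a small illustration of a mapping between interchange formats, the sketch below converts a few statements from N-Triples, the W3C line-based syntax for RDF, into a simple JSON layout and back. The JSON layout is hypothetical (it is not JSON-LD), and the parser is deliberately partial: it handles only IRIs and plain string literals, without escape sequences, blank nodes, language tags, or datatypes.

```python
import json
import re

# Matches: <subject-IRI> <predicate-IRI> (<object-IRI> | "literal") .
NT_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def parse_ntriples(text):
    """Parse the supported subset of N-Triples into a list of dicts."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = NT_LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line}")
        s, p, o_iri, o_lit = m.groups()
        obj = {"iri": o_iri} if o_iri is not None else {"literal": o_lit}
        triples.append({"s": s, "p": p, "o": obj})
    return triples


def to_ntriples(triples):
    """Serialize the dict layout back to N-Triples, one statement per line."""
    lines = []
    for t in triples:
        o = t["o"]
        obj = f'<{o["iri"]}>' if "iri" in o else f'"{o["literal"]}"'
        lines.append(f'<{t["s"]}> <{t["p"]}> {obj} .')
    return "\n".join(lines) + "\n"


source = """
<http://example.org/alice> <http://example.org/worksFor> <http://example.org/acme> .
<http://example.org/alice> <http://example.org/name> "Alice" .
"""

as_json = json.dumps(parse_ntriples(source), indent=2)
round_trip = to_ntriples(json.loads(as_json))
print(round_trip)
```

A round trip that reproduces the original statements is a simple sanity check for any such mapping; a full mapping standard must also specify how every construct of one format is expressed in the other.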
This section outlines the landscape of existing and proposed standards for KGs. Section 5.1 describes some of the reasons why it is useful to have agreed-upon standards for KGs. Section 5.2 surveys the organizations that are relevant for KG standards. These organizations and their standards efforts are discussed in the rest of the section. Section 5.3 surveys syntax standards that are relevant for KGs. The next two subsections are concerned with semantic standards. Since every ontology is, in principle, a possible schema for KGs, it would not be possible to survey every ontology that is relevant for KGs, nor would it be very informative to list them. Instead, the summit examined two ontologies in more depth. In Section 5.4, a prominent foundation ontology is discussed, and in Section 5.5, a specialized domain ontology is covered. Finally, Section 5.6 revisits the advantages and disadvantages of the standardization of ontologies and proposes how ontologies could benefit not only KGs but also standards in general.
Advantages for KG standardization
As explained above, standards for KGs serve primarily for KG construction and interoperability. For example, one can develop a standard to represent objects and relationships for a manufacturing KG, which can be used all over the world to develop KGs in that domain. These KGs can then be more easily integrated at a later stage. KG systems differ not only in the sources for their knowledge (e.g., the Web, sensor data in some domains, commercial transaction data) but also in the operations for generating, processing, and utilizing the results. For example, does the KG system support reasoning? If it does, then what kind of reasoning? When reasoning or inferencing is used, there is an expectation that its results will be consistent with the expected interpretation(s). Such interpretations are based on distinctions among the entities involved in the inference and are expressed via the labels (usually in natural language) used in the representation.
A KG is created to meet certain needs and uses in some context, though the context may not be sufficiently or explicitly recognized (or represented). Consequently, a KG will necessarily have limitations of coverage (i.e., scope) and completeness (level of detail), which will hinder interoperability. This is known as the “silo” problem; namely, an ontology that is narrowly focused on specific needs in a vaguely defined context will generally not be interoperable with any other ontologies. One can ameliorate this problem by employing an ontological analysis that identifies relevant standard ontologies on which to base the KG.
Another advantage of standardization is that it can foster innovation in a field, and there are many examples of this phenomenon. For example, standardized musical notation has spurred hundreds of years of creative music composition. More recent examples include the communication standards for the Internet, and the HTML and XML standards. For a standard to have this benefit, it is important for it to be introduced at the right time (Allen and Sriram, 2000).
Standards organizations
Standards are developed by standards developing organizations (SDOs). An SDO is any organization that develops and approves documented standards, using a wide variety of methods to establish consensus among its participants. There are hundreds of SDOs. Such organizations may be accredited, international treaty-based, international private-sector-based, international consortia, or government agencies. An example of an accredited SDO that is relevant to KG standardization is the International Committee for Information Technology Standards, which is accredited by the American National Standards Institute. The Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) and the International Civil Aviation Organization are international treaty-based. International private-sector-based SDOs include the International Organization for Standardization (ISO), the International Electrotechnical Commission (IEC), and the Institute of Electrical and Electronics Engineers (IEEE). International consortia include the Object Management Group (OMG), the Organization for the Advancement of Structured Information Standards (OASIS), the Internet Engineering Task Force (IETF), and the World Wide Web Consortium (W3C). U.S. government agencies that may be relevant for KGs include the Department of Defense, the Department of Homeland Security, and the National Institute of Standards & Technology (NIST; Carnahan, 2020).
Knowledge graph syntax standards
The most commonly used standard for representing KGs is the Resource Description Framework (RDF). RDF Schema (RDFS) and the Web Ontology Language (OWL) are ontology languages layered on RDF. All of these standards are from the W3C.
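To illustrate what it means for RDFS to be layered on RDF, the sketch below applies two of the RDFS entailment rules, rdfs11 (transitivity of rdfs:subClassOf) and rdfs9 (propagation of rdf:type along rdfs:subClassOf), to a set of RDF triples until no new triples are produced. The ex: terms are hypothetical, and a complete RDFS reasoner implements many more rules.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain rules rdfs9 and rdfs11 until a fixpoint is reached."""
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for x, p, a in closed:
            if p == TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closed:
            return closed
        closed |= new


data = {
    ("ex:Manager", SUBCLASS, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:Manager"),
}
inferred = rdfs_closure(data) - data
for t in sorted(inferred):
    print(t)
```

The three inferred triples (Manager is a subclass of Person, and alice is both an Employee and a Person) follow from the RDFS semantics of the vocabulary, not from any additional data.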
Although the OMG is best known for software engineering standards such as the Unified Modeling Language (UML), the Meta-Object Facility (MOF), and the Model Driven Architecture (MDA), the OMG has been a very active developer of standards relevant to KGs. The group within the OMG that is responsible for KG- and ontology-related standards is the Ontology Platform Special Interest Group (OPSIG). The OPSIG has been an active, contributing working group of the OMG for over 15 years. Among the standards developed by OPSIG is the Ontology Definition Metamodel (ODM), which is a mapping standard that allows one to specify an ontology using UML. The Distributed Ontology, Model and Specification Language (DOL) is a standard for mapping between different languages for ontologies and models. The MOF to RDF Mapping (MOF2RDF) is a lightweight approach to transform MOF-compliant metamodels into OWL ontologies (Kendall, 2020).
Although inference and reasoning are not universally supported by KGAs, they are nevertheless commonly included as features. Various logical languages have been used for inference and reasoning, but all of them are either equivalent to or subsets of first-order logic (FOL). Common Logic (CL) is the ISO/IEC standard for FOL (ISO/IEC 24707:2018). In “Knowledge Graphs and Logic”, John Sowa gave an overview of CL and related standards for logic (Sowa, 2020). The CL standard includes specifications for three dialects: the Common Logic Interchange Format (CLIF), the Conceptual Graph Interchange Format (CGIF), and an XML-based notation for Common Logic (XCL).
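Because a KG statement (s, p, o) can be read as the atomic formula p(s, o), a finite KG can serve as a first-order structure over which sentences are evaluated. The sketch below checks the sentence "for every x, if Employee(x) then there exists y such that worksFor(x, y)" against a small hypothetical KG. The CLIF rendering in the comment is our own illustration, and evaluation by enumeration over the finite domain is an assumption suited only to small examples; it is model checking, not a general FOL theorem prover.

```python
# A small hypothetical KG; rdf:type statements become unary predicates.
KG = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

# Domain of discourse: every term mentioned as a subject or an object.
DOMAIN = {s for s, _, _ in KG} | {o for _, _, o in KG}


def holds(pred, *args):
    """Truth of an atomic formula in the structure defined by the KG."""
    if len(args) == 1:
        return (args[0], "rdf:type", pred) in KG
    return (args[0], pred, args[1]) in KG


def every_employee_works_somewhere():
    # CLIF: (forall (x) (if (Employee x) (exists (y) (worksFor x y))))
    return all(
        (not holds("ex:Employee", x))
        or any(holds("ex:worksFor", x, y) for y in DOMAIN)
        for x in DOMAIN
    )


print(every_employee_works_somewhere())  # False: bob has no employer
KG.add(("ex:bob", "ex:worksFor", "ex:acme"))
print(every_employee_works_somewhere())  # True
```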
The Basic Formal Ontology
A foundation ontology is an ontology that consists of very general terms (such as “object”, “property”, and “relation”) that are common across all domains. A foundation ontology is also known as an upper ontology or a top-level ontology. The most popular foundation ontology is the Basic Formal Ontology (BFO), which is part of the standard ISO/IEC 21838 “Information technology – Top-level ontologies”. Barry Smith is the developer of BFO, and he spoke at the Ontology Summit about his experiences with improving the interoperability of KGs (Smith, 2020). Ontologies have been enormously successful in the biomedical field for some 20 years. The Gene Ontology (GO), the first version of which was created in 1998, was referred to from the very beginning as a “directed acyclic graph” representing knowledge about genes and gene products, and BFO serves as its foundation ontology. With the growing impact of the data from the human and other model-organism genome projects, the data-annotation needs of biomedical informatics expanded tremendously, which led to the creation of new ontologies, for example for proteins, cell types, and diseases. This expansion of ontology development continues to this day, as with the new COVID-19 ontology. The influence of BFO in non-medical domains is also indicated by the reception of the ISO/IEC 21838 standard in areas such as digital manufacturing, particularly through the creation of the Industrial Ontologies Foundry (IOF). Under the auspices of the IOF, work is ongoing to relate BFO to current developments in the Standard for the Exchange of Product model data (ISO 10303) and the manufacturing technology standard (MTConnect) for factory device data (“Industrial Ontology Foundry”, 2020).
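Because an ontology like GO is a directed acyclic graph rather than a tree, a term can have several parents, and queries typically need all of its ancestors. The sketch below computes them by breadth-first traversal over `is_a` edges. The term names are hypothetical placeholders, not real GO identifiers, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
from collections import deque

# Hypothetical is_a hierarchy: each term maps to its direct parents.
IS_A = {
    "term_D": ["term_B", "term_C"],  # two parents, so this is a DAG, not a tree
    "term_B": ["term_A"],
    "term_C": ["term_A"],
    "term_A": [],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following is_a edges upward."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:  # a shared ancestor such as term_A is visited only once
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


print(sorted(ancestors("term_D", IS_A)))
```

This kind of traversal is how an annotation to a specific term is commonly propagated to all of its more general ancestors.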
Standards for the financial services industry
The financial services industry is an exceptionally large, mature, and data-intensive industry that affects virtually everybody. Michael Bennett presented an overview of KGs in the financial sector (Bennett, 2020). While most of the historical standards in the financial services industry deal with messaging requirements or data formats, there are also industry standards for formal semantics. The Financial Industry Business Ontology (FIBO) was conceived to provide a common language across these messaging standards, while a more recent initiative from the ISO Technical Committee dealing with financial services aims to supplement the existing ISO 20022 XML messaging standard with formal semantics. FIBO arose out of a need to harmonize terms across the industry as a common language for reusing data in reporting, risk management, and compliance. This need stemmed from the realization that, although it was hard to reach agreement on common terms, the concepts themselves were well understood. FIBO and the related Financial Instrument Global Identifier (FIGI) are OMG standards.
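The idea of a common language across messaging standards can be sketched as a mapping from each format's local field names to shared concept identifiers, so that records from different formats become directly comparable. Everything here is hypothetical: the two formats, their field names, and the `ex:` concept identifiers are invented for illustration and are not taken from FIBO or ISO 20022.

```python
# Hypothetical field-name mappings for two message formats onto shared concepts.
FORMAT_A = {"ctpty_lei": "ex:counterpartyIdentifier", "notional_amt": "ex:notionalAmount"}
FORMAT_B = {"CounterpartyLEI": "ex:counterpartyIdentifier", "Notional": "ex:notionalAmount"}


def harmonize(record: dict[str, object], mapping: dict[str, str]) -> dict[str, object]:
    """Re-key a record by shared concept identifiers; refuse unmapped fields."""
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise ValueError(f"no shared concept for fields: {sorted(unmapped)}")
    return {mapping[k]: v for k, v in record.items()}


a = harmonize({"ctpty_lei": "X1", "notional_amt": 1000}, FORMAT_A)
b = harmonize({"CounterpartyLEI": "X1", "Notional": 1000}, FORMAT_B)
print(a == b)
```

Rejecting unmapped fields, rather than silently passing them through, is a deliberate choice in this sketch: it surfaces terms that have not yet been agreed on.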
While the financial industry is a specific domain, it provides important lessons that are relevant to ontologies and KGs in general. For example, one distinction is whether or not to provide a deep hierarchy of foundationally primitive terms based on a Top-Level Ontology (TLO). Such hierarchies are typically not needed for OWL applications and have been removed from the OMG FIBO standard. Another distinction is whether the ontology represents real-world ‘truth-makers’ (assertions that give rise to the meaning of a class of things) or data about things. For example, to be a bank is to hold certain legal capacities and capabilities, whereas to know that something is a bank is to interrogate the available data for some suitable ‘data signature’ that such capacities exist, in this case in the form of a banking license. Ontologies therefore may be foundational, for use as a point of reference, or application-focused; and they may be predicated on subject matter or on data about that subject matter. Developers may dismiss these distinctions as unimportant, but failing to address them can severely inhibit interoperability.
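The bank example can be made concrete with a small sketch of the “data about things” reading: the legal capacities that make an entity a bank are not directly observable, so a system classifies an entity as a known bank only when the data carries a suitable signature, here an active banking-license record. The `LicenseRecord` type, its field names, and the sample entities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LicenseRecord:
    """Hypothetical licensing data; the data signature, not the truth-maker itself."""
    holder: str
    license_type: str
    active: bool


def known_banks(licenses: list[LicenseRecord]) -> set[str]:
    """Entities for which the available data provides evidence of being a bank."""
    return {r.holder for r in licenses if r.license_type == "banking" and r.active}


records = [
    LicenseRecord("Acme Bank", "banking", True),
    LicenseRecord("Acme Insurance", "insurance", True),
    LicenseRecord("Old Bank", "banking", False),
]
print(known_banks(records))
```

Note what the function does not claim: an entity missing from the result is not thereby shown not to be a bank, only not known to be one from this data. Conflating the two readings is exactly the kind of error the distinction is meant to prevent.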
Developing ontologically sound standards
In “Standards and Ontologies”, Michael Grüninger discussed the advantages and disadvantages of standardizing ontologies (Grüninger, 2020). The problem with de facto standards is that ontologies will be adopted simply because they are popular and widely used, even if they were not properly developed with sufficient evaluation and analysis. The risk with this approach is that ontologies could be used that contain ontological errors, unintended models, or omitted models, or that incorporate implicit ontological commitments that prevent reuse. The standards we need are, therefore, those that enable the evaluation and comparison of ontologies. The first are standards for ontology representation languages with formal semantics, such as Common Logic (ISO/IEC 24707) and OWL. The second are standards for specifying mappings between ontologies and between logics, a prime example being DOL from the OMG. Finally, there are standardized axiomatizations of ontologies, in particular ISO 18629 (Process Specification Language) and ISO/IEC 21838 (Top-Level Ontologies).
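One simple way to look for unintended models is to enumerate every interpretation of an axiomatization over a tiny domain and inspect those that satisfy the axioms but violate the intended meaning. The sketch below assumes a hypothetical ontology whose axioms were meant to define a partial order but accidentally omit antisymmetry; brute force only scales to very small domains, and serious evaluation uses model finders and theorem provers instead.

```python
from itertools import product

DOMAIN = [0, 1]
PAIRS = [(a, b) for a in DOMAIN for b in DOMAIN]


def reflexive(r):
    return all((a, a) in r for a in DOMAIN)


def transitive(r):
    return all((a, c) in r for (a, b) in r for (b2, c) in r if b == b2)


def antisymmetric(r):
    return all(a == b for (a, b) in r if (b, a) in r)


# The intended theory is a partial order, but antisymmetry was left out.
AXIOMS = [reflexive, transitive]


def all_relations():
    """Every binary relation on DOMAIN, as a frozenset of ordered pairs."""
    for bits in product([False, True], repeat=len(PAIRS)):
        yield frozenset(p for p, keep in zip(PAIRS, bits) if keep)


models = [r for r in all_relations() if all(ax(r) for ax in AXIOMS)]
unintended = [r for r in models if not antisymmetric(r)]
print(len(models), "models;", len(unintended), "unintended:", [sorted(r) for r in unintended])
```

Here the axioms admit the relation in which 0 and 1 precede each other, a model no partial order allows, which reveals the missing axiom.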
Much work remains to be done in the standards arena. Recently, the International Association for Ontology and its Applications (IAOA) established the Industry and Standards Technical Committee (ISTC). This committee has two core purposes:
1. To foster the use of applied ontology in standardization initiatives.
2. To facilitate interactions between people in industry and in applied ontology research.
Activities within the ISTC include disseminating information about initiatives, with the aim of gathering experts interested in developing ontologically sound standards. The ISTC also organizes virtual and physical meetings and events to discuss how to understand and apply ontological approaches and methodologies, both in general and for KG systems in particular.
Challenges
Methods of building a KG take us from raw, messy, and disconnected data and information that is hard to query, analyze, and visualize to a more refined, organized, cleaned, and linked product that is easier to visualize, query, and analyze. Challenges exist at every step in this process, including the repeated passes through earlier steps that occur because building a KG is a life cycle rather than a one-time effort. In this section we briefly list these challenges in Table 1. The first column in Table 1 is an operation of a KGA. These operations are labeled “KG step” because they are usually the steps in a pipeline of operations such as the one shown in Fig. 4. For each KG step there may be many problems and issues; we list the most significant of these in the second column of Table 1. The next column describes the context of the problem or issue, and the last column cites some references. More details about these challenges will be published in a separate article.
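The idea of a pipeline of KG steps can be sketched as a list of functions applied in order, each turning the output of the previous step into a more refined product. The steps below (cleaning, a deliberately naive entity-linking step, and triple generation) and the toy records are hypothetical and much simpler than the pipeline in Fig. 4; extraction from sources is omitted, and a real pipeline would be run iteratively rather than once.

```python
from typing import Callable

Record = dict[str, str]
Triple = tuple[str, str, str]


def clean(records: list[Record]) -> list[Record]:
    """Normalize whitespace and case in names; drop records without a name."""
    out = []
    for r in records:
        name = " ".join(r.get("name", "").split()).title()
        if name:
            out.append({**r, "name": name})
    return out


def link(records: list[Record]) -> list[Record]:
    """Naive entity resolution: records with the same cleaned name are merged."""
    merged: dict[str, Record] = {}
    for r in records:
        merged.setdefault(r["name"], {}).update(r)
    return list(merged.values())


def to_triples(records: list[Record]) -> list[Triple]:
    """Emit one triple per non-name field, with the cleaned name as the subject."""
    return [(r["name"], k, v) for r in records for k, v in sorted(r.items()) if k != "name"]


def run_pipeline(raw: list[Record], steps: list[Callable]) -> object:
    """Feed the output of each step into the next."""
    data = raw
    for step in steps:
        data = step(data)
    return data


raw = [
    {"name": "  ada lovelace ", "born": "1815"},
    {"name": "Ada Lovelace", "field": "mathematics"},
    {"name": "   "},
]
triples = run_pipeline(raw, [clean, link, to_triples])
print(triples)
```

Even this toy version shows where challenges arise: merging by exact name, for instance, conflates distinct people who share a name and misses variant spellings.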
Knowledge graph challenges
In this section we propose some possibilities for the future development and uses of KGs, primarily in industry but also for the KG research community.
1. At the time of this Communiqué, there are multiple definitions and/or types of entities labeled ‘Knowledge Graph’, which have different (and sometimes incompatible) capabilities. Section 2 proposed a definition that captures the most important features of a large selection of published definitions. Having a single standard definition would help to reduce confusion and misconceptions about KGs, thus promoting confidence and acceptance among potential adopters.
There will be general acceptance of an effective definition of “knowledge graph”.
2. The current development paradigms of KGs suggest a continuation of data silos and persisting non-interoperability. However, if KGs are based on well-designed and well-engineered schemas (preferably, ontologies), there is a greater potential for interoperability. While the invited speakers at the Ontology Summit 2020 were not necessarily a representative sample, it is encouraging that there was a consensus among them that ontologies were useful for KGs. Basing a KG on well-developed ontologies is the only realistic way to avoid the proliferation of KG silos.
KG developers will understand the need for a well thought out schema and how ontologies, or at least ontological analysis, can aid in this.

A pipeline for building a KG based on a drawing by Pedro Szekely.
3. Natural language terms and phrases are the common mechanism (as symbols) for labeling entities in a KG. However, they should be used with more care, given the inherent ambiguity of natural language. Such care is all the more warranted given that natural language tools are frequently used by KG projects and given the proliferation of virtual assistants. Including some basic linguistic analysis in the selection of entity labels can help minimize potential ambiguity and better convey an intended interpretation.
KG developers will make use of linguistic analyses to help overcome the ambiguities of the use of natural language terms (and identifiers).
4. As a follow-on to possibility 3, the unfortunate practice of relying on assumed common interpretations of natural language terms and phrases is a source of misunderstanding, non-computability, and non-interoperability. Natural language terms and phrases can have many interpretations, which vary with the domain of use or other contexts. In human discourse, context serves as an effective mechanism for disambiguation (e.g., by asking for clarification). When terms and phrases are used as labels in a KG, there is usually no longer an explicit or computable context available to disambiguate them. In an information system, this lack of explicit semantics prevents KGs from being used to their full potential.
KG developers will incorporate explicit formal computable distinctions for the intended interpretations of the natural language terms and phrases used (as symbols) for the labels of entities in a KG.
5. KGs are already in use, in particular wherever knowledge needs to be represented and used in information systems. For example, virtual assistants such as Siri, Alexa, and Google Assistant make use of KGs. The success of virtual assistants suggests that KGs will be employed in other types of information systems, especially for automation and decision making, where logical reasoning (i.e., inferencing) is required.
KGs will be used in the creation and operation of software intensive systems (e.g., decision tools, analyses, representation of user interfaces).
6. As the development and deployment of systems providing new services accelerates, information systems gain a significant advantage by being more dynamic (e.g., accommodating changes in schema or new uses) and better able to represent knowledge. Unlike traditional relational database systems, KGs are highly flexible and so are better suited to this need for dynamism. But to take advantage of the dynamism available from KGs, the overall architecture of their use will need to be addressed.
Information systems architects will better exploit KGs and their infrastructure to support more dynamic and knowledge based information systems.
7. As the uses of information and knowledge continue to expand (at a non-linear rate), the underlying information system architectures used by organizations will grow in proportion. However, this continued expansion of information use often takes place in the context of existing processes and infrastructure. To support such expansion and growth, new information system architectures, and importantly transition architectures, will become critical for large organizations seeking to exploit the capabilities provided by KGs.
Architectures will be developed to aid enterprises and their extensive information systems in a transition to the use of KGs.
8. Given the more robust capabilities of KGs compared with relational database systems, and the expanding expectations of information systems (e.g., virtual assistants), KGs are likely to become the dominant persistence mechanism for information systems. However, it is hard to predict how long the transition will take, since there are a great many legacy systems, as well as systems that use legacy techniques. As explained in the preceding possibilities, KGs are a good tool for transitioning from legacy systems and techniques into a more sustainable and effective information environment.
KGs will have a significant effect on data and knowledge management in general.
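Possibilities 3 and 4 can be illustrated with a small sketch: first flag labels that name more than one entity, then resolve a label only against an explicit, computable context rather than an assumed common interpretation. The entity identifiers, labels, and context names below are hypothetical, and real systems would use much richer linguistic analysis than case-folded string matching.

```python
from collections import defaultdict

# Hypothetical entities: distinct identifiers, shared natural-language label.
ENTITIES = {
    "ex:Bank_financial": {"label": "bank", "context": "finance"},
    "ex:Bank_river": {"label": "bank", "context": "geography"},
    "ex:Loan": {"label": "loan", "context": "finance"},
}


def ambiguous_labels(entities: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each label used by more than one entity to the identifiers sharing it."""
    by_label: dict[str, set[str]] = defaultdict(set)
    for ident, info in entities.items():
        by_label[info["label"].casefold()].add(ident)
    return {label: ids for label, ids in by_label.items() if len(ids) > 1}


def resolve(label: str, context: str, entities: dict[str, dict[str, str]]) -> str:
    """Return the single entity with this label in this context, or fail loudly."""
    matches = [i for i, info in entities.items()
               if info["label"].casefold() == label.casefold() and info["context"] == context]
    if len(matches) != 1:
        raise LookupError(f"{label!r} is not uniquely resolvable in context {context!r}")
    return matches[0]


print(ambiguous_labels(ENTITIES))
print(resolve("Bank", "geography", ENTITIES))
```

Failing when a label cannot be resolved uniquely, instead of guessing, keeps the ambiguity visible to the system rather than hidden in an assumed interpretation.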
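For the transition from legacy systems described in possibilities 7 and 8, one well-known starting point is to lift relational rows into triples. The sketch below is loosely modeled on the conventions of the W3C “A Direct Mapping of Relational Data to RDF” recommendation (a subject IRI built from the table name and primary key, one predicate per column), but it omits much of that specification, including IRI percent-encoding, foreign-key references, and literal datatypes. The base IRI, table, and data are hypothetical.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/db/"  # hypothetical base IRI


def rows_to_triples(conn: sqlite3.Connection, table: str, pk: str):
    """Yield (subject, predicate, value) triples for every row of `table`."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{pk}={record[pk]}"
        yield (subject, RDF_TYPE, f"{BASE}{table}")  # each row is typed by its table
        for col, value in record.items():
            if value is not None:  # SQL NULLs produce no triple
                yield (subject, f"{BASE}{table}#{col}", value)


# A stand-in for a legacy relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)")
conn.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")

triples = list(rows_to_triples(conn, "People", "ID"))
for t in triples:
    print(t)
```

A mapping like this lets existing systems keep running while their data is exposed as a graph, which is the essence of a transition architecture; richer, ontology-aligned mappings can then replace the direct one incrementally.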
This is an exciting time to be developing knowledge-rich software and applications. There are immense repositories of data and many tools for extracting valuable knowledge. However, there are many pitfalls and challenges to achieving effective, scalable, and dynamic information and knowledge-based systems. This Communiqué presents the findings extracted from a year-long series of virtual presentations, panel sessions, and online discussions on the topic of Knowledge Graphs in particular, and knowledge-based systems in general.
Perhaps the most important contribution of this Communiqué is a succinct practical definition of a KG that captures the essential features of the main published definitions. The wide diversity of opinions about what a KG is has led to confusion and criticism of the field of KGs. The proposed definition presented here should help to clarify the subject and to foster better-quality research, development, acceptance, and use.
By presenting both theoretical and pragmatic foundations for KGs, the Communiqué serves as a useful source for answering questions about KGs. The first question is why KGs have emerged as a popular technique for capturing knowledge and modernizing information systems in general. The Ontology Summit considered this question as well as the broader question of why one should bother with information at all, and the findings are summarized in this Communiqué.
The next few questions addressed by the Communiqué are about how one develops KGs and who is developing them. Given the large number and diversity of projects that are developing KGs and KG tools, the Ontology Summit selected a sample of tools, applications, and projects that can serve as a source of case studies for individuals, teams, and organizations who are considering the development of new projects, as well as for those who are capturing knowledge implicit in existing systems or projects.
Standards are important whether one is developing an open source project or a commercial product, since customers often make choices based on support for standards, or more generally on interoperability. The Communiqué surveyed both existing standards that are relevant to KG development and standards activities that may impact KGs.
It is difficult to predict the future, especially in a rapidly evolving field like information technology. Nevertheless, to the extent that challenges and future possibilities were identified during the Ontology Summit, these have been summarized in this Communiqué. The hope is that research and development efforts will attempt to make progress on the challenges listed here.
Acknowledgements
Certain commercial software systems are identified in this paper. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology (NIST) or by the organizations of the authors or the endorsers of this Communiqué; nor does it imply that the products identified are necessarily the best available for the purpose. Further, any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NIST or any other supporting U.S. government or corporate organizations.
We wish to acknowledge the support of the ontology community, especially the invited speakers and participants who contributed to the Ontology Summit. There were 22 invited speakers: Jans Aasman, Andreas Blumauer, Barry Smith, Ernest Davis, Anirudh Prabhu, Sargur Srihari, Spencer Breiner, Paco Nathan, Vinay K. Chaudhri, Krzysztof Janowicz, Binil Starly, Sean Gordon, Michael Uschold, Yolanda Gil, Matthew West, Chaitanya Baru, Lisa Carnahan, Elisa Kendall, Michael Grüninger, John F. Sowa, Michael Bennett, and Elise Stickles. The complete list of sessions, speakers, and links to presentation slides and video recordings is available at
A partial list of other contributors includes: Kingsley Idehen, Janet Singer, Doug Foxvog, Jack Hodges Jr, Alex Shkotin, Sjir Nijssen, Paul Tyson, Michael DeBellis, Edward Barkmeyer, Azamat Abdoullaev, Evan K. Wallace, Amit Sheth, Pascal Hitzler, Alessandro Oltramari, Jack Park, George Hurlburt, Russell Reinsch, Mariya Evtimova, and Bruce Bargmeyer. We especially thank Todd Schneider for suggesting the topic of knowledge graphs for the Ontology Summit 2020, and Kristina Rigopoulos for drawing Fig.
.
