Abstract
Knowledge graph reasoning, or completion, aims to infer missing facts from existing ones in a knowledge graph (KG). In this work, we focus on open-world knowledge graph reasoning, a task that reasons about entities which are absent from the KG at training time (unseen entities). Unfortunately, most existing reasoning methods perform unsatisfactorily on this problem. Recently, some works have used graph convolutional networks to obtain embeddings of unseen entities for prediction tasks. Graph convolutional networks gather information from an entity’s neighborhood; however, they neglect the unequal importance of neighboring nodes. To resolve this issue, we present an attention-based method named NAKGR, which leverages neighborhood information to generate entity and relation representations. The proposed model follows an encoder-decoder architecture. Specifically, the encoder uses a graph attention mechanism to aggregate neighboring nodes’ information as a weighted combination. The decoder employs an energy function to predict the plausibility of each triplet. Benchmark experiments show that NAKGR achieves significant improvements on open-world reasoning tasks. In addition, our model also performs well on closed-world reasoning tasks.
Introduction
Recently, numerous knowledge graphs (KGs) have been constructed, such as Freebase [3], WordNet [25], DBpedia [21], and YAGO [35]. They have been applied in various applications including relation extraction [29], recommender systems [5], and intelligent question answering systems [48]. A typical KG is composed of numerous triplets (head, relation, tail), e.g., (Washington D.C., isCapitalOf, America). However, most KGs currently in use are incomplete in the sense that they do not contain all true triplets, or they contain false facts. For instance, in Freebase, 75% of persons have unknown nationality [43]. Therefore, how to complete KGs or detect false triplets through reasoning methods is an important and challenging task.
While KGs are widely adopted in intelligent applications, a major bottleneck hindering their usage is the incompleteness of manually curated facts, which has led to extensive studies on knowledge graph reasoning (KGR). In recent years, many machine learning methods for KGR tasks have been explored, achieving promising performance. These models predict missing facts by learning entity and relation embeddings. A key problem with most existing methods is that the plausibility of links can be determined for known entities only. However, new entities and relations arise with time. For instance, in the DBpedia 2016-04 release, about 200 new entities emerge every day [33]. If a triplet is not contained in the KG, this does not imply that the corresponding fact is false, only that it is unknown (open-world assumption). Therefore, given the evolving nature of KGs, reasoning models need to infer knowledge about entities not observed in the KG.
This research explores the task of open-world KGR, which is a critical but relatively unexplored problem. One major challenge of this task is how to represent unseen entities, since these entities are not available during training. It seems promising to embed such an entity by aggregating the vectors of its neighbors. For example, in Figure 1, suppose there is an unseen entity “Carles Puyol” (marked gray) which is connected to existing entities but is not included in the KG. We want to infer more facts from existing triplets and answer questions like “What’s Carles Puyol’s nationality?”. By aggregating information from his neighborhood, we can build a representation for Carles Puyol and then predict his nationality.

A motivating example of an unseen entity. Solid-lined circles and arrows represent entities and relations that already exist in the KG, while dashed ones represent the unseen entity, which is not observed in the KG, and some of its known relations to other existing entities.
So far, only a few pieces of research have studied the open-world KGR task [31, 45]. Although these methods achieve competitive results, they require external resources, such as an entity’s name and description, to generate embeddings for unseen entities. Hamaguchi et al. [13] apply graph neural networks to neighbor nodes to obtain representations of unseen entities without using external information. However, they treat each node equally without considering differences in importance, which is inconsistent with practical scenarios. As shown in Figure 1, if we intend to infer Carles Puyol’s nationality, the triplet (Carles Puyol, Born_in, Catalunya) is more informative than (Carles Puyol, career, Soccer_player). Therefore, we propose an attention-based aggregator that assigns different importance to neighboring nodes during feature aggregation. Our model can be interpreted as an encoder-decoder architecture in which the attention-based aggregator and TransE act as the encoder and decoder, respectively. The encoder employs a graph attention network (GAT) to build representations for unseen entities by aggregating their neighboring nodes. The decoder defines the task-oriented objective function. In this study, we choose the TransE [4] model as the decoder, but other KGR models can also be adopted.
The contributions of our research are outlined as follows: (1) We draw attention to an important but relatively unexplored problem, open-world KGR. In particular, we present an attention-based aggregator which differs from previous work and is more suitable for practical scenarios. (2) In contrast to previous reasoning methods, our proposed method captures both entity and relation features in the aggregation steps. (3) Experiments demonstrate the effectiveness of assigning different weights to neighbors.
The rest of this paper is structured as follows. We first review the related work in Section 2. Then, the notations and background are presented in Section 3. The architecture and details of our model are introduced in Section 4. Datasets and experimental results are summarized in Section 5. Finally, we conclude the paper in Section 6.
Related work
Our method is related to embedding-based methods, approaches to reasoning about unseen entities, recent advances in applying graph convolutional networks to graph-structured data, and temporal knowledge graph (TKG) reasoning methods.
Embedding-based methods
For knowledge representation learning and reasoning, embedding-based methods have attracted a lot of interest in the past few years. Chen et al. [6] provide a comprehensive review of these methods. Embedding-based models usually embed entities and relations into a low-dimensional semantic vector space and then measure the plausibility of each triplet in that space. Among them, TransE [4] is a classical model. TransE interprets a relation as a translation vector between the head entity h and the tail entity t of each triplet (h, r, t). As shown in Figure 2, it wants the embeddings of h, r, and t to satisfy h + r ≈ t when (h, r, t) holds.

A simple illustration of TransE.
Despite its simplicity, TransE has flaws in dealing with complex relations, such as one-to-many, many-to-one, and many-to-many relations. To deal with this problem, different variants of TransE have been derived. TransH [41] proposes that entities should have distinct representations when involved in different relations. It projects head and tail entities onto the hyperplane of one specific relation. However, TransH represents entities and relations in the same space, which prevents it from modeling entities and relations precisely. TransR [22] observes that an entity may have multiple aspects depending on specific relations and projects entities and relations into different vector spaces. Although TransR improves significantly on TransE and TransH, it also has several flaws. First, head and tail entities share the same mapping matrix, which ignores the different attributes of entities. Second, TransR has higher complexity than TransE and TransH due to matrix-vector multiplication. To resolve these problems, TransD [15] constructs dynamic mapping matrices by defining two vectors for each entity-relation pair and replaces matrix-vector multiplication with vector operations to reduce model complexity.
Recently, some works have tried to incorporate external information into embedding-based models. TKRL [47] encodes hierarchical type information into KG representations. IKRL [46] combines images with knowledge graphs for KGR. Its promising performance indicates the significance of visual information for KGR. An et al. [1] propose an accurate text-enhanced KGR method that handles the semantic variety of entities and relations in distinct triplets by exploiting entity descriptions and triple-specific relation mentions.
However, these embedding-based algorithms can only handle entities that are present in the KG at training time. This limitation prevents them from building representations for unseen entities.
To address the issue of emerging entities, several methods that generalize to reasoning about unseen entities have been proposed. DKRL [45] uses an entity’s description to build a representation for an entity that is absent from the KG. It encodes the semantics of entity descriptions using CBOW and CNN models. ConMask [33] uses relationship-dependent content masking, fully convolutional neural networks, and semantic averaging to extract relationship-dependent embeddings for unseen entities from the textual features of entities and relationships in KGs. Shah et al. [31] propose the OWE model, which maps the embeddings of an entity’s name and description to the graph-based embedding space so that OWE can perform open-world KGR. Hamaguchi et al. [13] present a Graph-NN model that builds embeddings for unseen entities by exploiting existing elements, without using external resources. However, they assume all local neighbors contribute equally to the entity embedding, whereas heterogeneous neighbors could have different influence. Therefore, it is crucial to design a model that effectively captures the differing impact of local neighbors.
Graph neural network-based methods
During the past few years, different variants of graph neural networks (GNNs) have been developed, with graph convolutional networks (GCN) [19] being one of them. Graph neural networks [7] can build representations for entities by encoding local graph structures. GCN learns features by conducting convolution on neighboring nodes for node classification. A variant of GCN, named R-GCN, is proposed by [30] to model multi-relational data. R-GCN performs reasoning by reconstructing an edge with an autoencoder architecture and using a parameterized score function. However, the above methods cannot generalize to unseen nodes. In contrast, Hamilton et al. [14] propose GraphSAGE, which can generate representations for previously unseen data by leveraging node attribute information. However, it cannot be directly applied to KGs with multi-relational edges. Shang et al. [32] propose an end-to-end structure-aware convolutional network (SACN), which introduces a weighted GCN to capture the structural information in KGs by utilizing KG node structure, node attributes, and relation types. However, these models are not applicable to open-world KGR.
TKG reasoning methods
Temporal KGR is another line of work related to our study, as new entities and relations arise with time. Recent studies have demonstrated that incorporating time information into the embeddings can boost the performance of KGR tasks. t-TransE [17] and TAE [16] learn time-aware embeddings by imposing temporal order constraints based on a translation-based score function. To encode temporal information directly in the learned embeddings, HyTE [8] associates each timestamp with a corresponding hyperplane. Unlike existing models, which are usually restricted to one time granularity, TA-TransE [11] can deal with temporal facts of varying time granularities by using an LSTM to encode time digits and relations. Recently, TPmod [2] infers missing events by using a gated recurrent unit (GRU) to model temporal dependency and achieves SOTA results.
Background
This section describes the preliminaries of KGR and GATs. Table 1 lists the key symbols used in this paper.
List of notations used in this paper
A KG can usually be represented by a collection of triplets $G = \{(e_h, r, e_t) \mid e_h, e_t \in E, r \in R\}$, where $E$ and $R$ are the entity set and the relation set, and $e_h$, $r$, $e_t$ represent the head entity, relation, and tail entity, respectively. Note that, unlike in previous methods, in the open-world KGR task $e_h$ (or $e_t$) may be a previously unseen entity.
For an entity $e_i$, we denote its neighborhood by
where $R_{ij}$ represents the relation set. We use bold lowercase letters and bold uppercase letters to denote vectors and matrices, respectively.
Given a KG, we would like to learn a neighborhood aggregator $A$ that acts as follows: for an entity $e_i$, $A$ aggregates features from its neighborhood to build a representation for $e_i$. For a triplet $(e_i, r, e_j)$ containing an emerging entity, the aggregator $A$ is used to generate an embedding for the previously unseen entity in order to predict the plausibility of the triplet.
When an unseen entity connected to existing entities and relations emerges, we can apply $A$ to its newly established neighborhood to obtain its representation and infer new facts about it.
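As a minimal sketch of the input such an aggregator works on, the snippet below collects, for every entity, the relations and entities it is connected to. The function name `build_neighborhoods`, the direction tags, and the `part_of` triplet are illustrative assumptions; only the two Carles Puyol triplets mirror the example in Figure 1.

```python
from collections import defaultdict

def build_neighborhoods(triplets):
    """Map each entity to a list of (relation, neighbor, direction) tuples."""
    neighborhoods = defaultdict(list)
    for head, relation, tail in triplets:
        # Keeping the edge direction is an assumption made for illustration.
        neighborhoods[head].append((relation, tail, "out"))
        neighborhoods[tail].append((relation, head, "in"))
    return dict(neighborhoods)

# Hypothetical triplet already in the KG.
existing = [("Catalunya", "part_of", "Spain")]
# Triplets that connect the unseen entity to existing entities.
auxiliary = [
    ("Carles Puyol", "born_in", "Catalunya"),
    ("Carles Puyol", "career", "Soccer_player"),
]

neighborhoods = build_neighborhoods(existing + auxiliary)
print(neighborhoods["Carles Puyol"])
```

The unseen entity has no trained embedding of its own, so everything the aggregator knows about it comes from this neighborhood list.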
GCN simply assumes that all neighboring nodes contribute equally when aggregating features from a node’s neighborhood. To overcome this disadvantage, Veličković et al. [39] introduce GAT. GAT weighs “important” neighbors more, rather than assigning equal importance to each neighboring node.
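The input to a single graph attentional layer is a set of node features, $\mathbf{h} = \{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_N\}$, $\mathbf{h}_i \in \mathbb{R}^{F}$, where $N$ is the number of nodes and $F$ is the number of features per node.
To achieve a higher-level representation, GAT applies a shared node-wise feature transformation, specified by a weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$, and computes attention coefficients

$$c_{ij} = a(\mathbf{W}\mathbf{h}_i, \mathbf{W}\mathbf{h}_j),$$

where $a$ is an attention function. To make coefficients easily comparable across different nodes, GAT normalizes them across all the values in the neighborhood using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(c_{ij}) = \frac{\exp(c_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(c_{ik})},$$

where $\mathcal{N}_i$ is the neighborhood of node $i$. The output feature of node $i$ is

$$\mathbf{h}'_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W}\mathbf{h}_j\Big),$$

in which $\sigma$ is an activation function, and $\alpha_{ij}$ specifies the weighting factor (importance) of node $j$ to node $i$. To make the learning process stable, GAT applies a multi-head attention mechanism that concatenates $K$ attention heads, which is defined as:

$$\mathbf{h}'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \mathbf{W}^{k}\mathbf{h}_j\Big),$$

where $\Vert$ represents the concatenation operation, $\alpha_{ij}^{k}$ are the normalized attention coefficients computed by the $k$-th attention head, and $\mathbf{W}^{k}$ is the corresponding weight matrix.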
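The single-head computation of a graph attentional layer can be sketched in plain Python as follows. This is an illustration under assumptions rather than a reference implementation: the attention function is taken to be a single-layer feedforward network with a LeakyReLU nonlinearity (as in the original GAT formulation), `tanh` stands in for the activation σ, and every node is assumed to have at least one neighbor. All function and variable names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer.

    features:  list of input vectors, one per node
    neighbors: neighbors[i] is a non-empty list of neighbor indices of node i
    W:         shared weight matrix (list of rows)
    a:         attention weight vector, twice as long as a transformed feature
    Returns (new_features, attention); attention[i][k] is the weight that
    node i assigns to its k-th neighbor.
    """
    transformed = [matvec(W, h) for h in features]
    new_features, attention = [], []
    for i, nbrs in enumerate(neighbors):
        # Raw coefficient: a^T [W h_i, W h_j], passed through LeakyReLU.
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, transformed[i] + transformed[j])))
            for j in nbrs
        ]
        alphas = softmax(scores)
        # Weighted combination of the transformed neighbor features.
        aggregated = [
            sum(al * transformed[j][d] for al, j in zip(alphas, nbrs))
            for d in range(len(W))
        ]
        new_features.append([math.tanh(v) for v in aggregated])
        attention.append(alphas)
    return new_features, attention

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1, 2], [0], [0, 1]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.2, 0.3, 0.4]
new_features, attention = gat_layer(features, neighbors, W, a)
print(attention[0])  # the weights of node 0's two neighbors sum to 1
```

A multi-head variant would run this function once per head with separate `W` and `a` and concatenate the resulting vectors.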
Framework
As illustrated in Figure 3, NAKGR follows an encoder-decoder architecture. Given a triplet $(e_h, r, e_t)$, the encoder embeds entities and relations into a low-dimensional vector space and outputs their embeddings. The decoder measures the plausibility of the triplet and can be substituted by a number of existing KGR models. This setting guarantees the flexibility and extensibility of NAKGR.

The architecture of NAKGR model.
Hamaguchi et al. [13] demonstrate the significance of encoding neighborhood information for KGR. Their encoder takes the average of the feature representations of neighboring nodes as the embedding of an unseen entity. Despite its desirable performance, it neglects the different importance of neighboring entities. In light of this issue, we introduce GATs to aggregate information from the neighborhood while assigning different weights to different neighbors. Although GAT has proven useful in many applications, one deficiency is that it ignores relation features when computing node embeddings. As KGs provide semantic relations between entities, it is natural to incorporate the semantics of relations into fact modeling.
In this paper, we enhance GAT by capturing both entity and relation features, since relations are an important part of the KG. As shown in Figure 3, the aggregator takes entity $e_h$ with its neighborhood
Specifically, we first perform linear transformations, parameterized by two weight matrices
where $c_{h,j}$ represents the attention value of the entity pair $(e_h, e_{hj})$, [,] is the concatenation operation, and $a$ is an attention function. In the experiments, we implement $a$ as a feedforward neural network, parametrized by a weight vector
To get the relative attention values, we also apply a softmax function to $c_{h,j}$, as defined in Eq.(8). The calculation process of the relative attention values $\alpha_{h,j}$ is shown in Figure 4.

Illustration of attention mechanism.
Finally, we get Ne($e_h$), the output for entity $e_h$, which has already aggregated information from neighboring nodes. To encapsulate more information about the neighborhood and stabilize the learning process, we also apply multi-head attention, i.e.,
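One way a relation-aware, multi-head aggregator of this kind could look is sketched below. The exact formulas are not reproduced here, so several choices are assumptions made purely for illustration: the attention score is computed from the concatenation of the transformed head entity, relation, and neighbor entity; the score passes through a LeakyReLU; each neighbor's message is the sum of its transformed entity and relation vectors; and the input vector for $e_h$ is taken as given. The names `attention_head`, `multi_head`, `W1`, `W2`, and `a` are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(e_h, neighborhood, W1, W2, a, slope=0.2):
    """One attention head over a list of (relation_vector, entity_vector) pairs.

    W1 transforms entity vectors, W2 transforms relation vectors, and a is the
    weight vector of a single-layer feedforward attention function.
    """
    head = matvec(W1, e_h)
    scores, messages = [], []
    for r_j, e_j in neighborhood:
        rel, ent = matvec(W2, r_j), matvec(W1, e_j)
        # Assumed score: a^T [W1 e_h, W2 r_j, W1 e_j] followed by LeakyReLU.
        z = sum(w * v for w, v in zip(a, head + rel + ent))
        scores.append(z if z > 0 else slope * z)
        # Assumed message: transformed entity plus transformed relation.
        messages.append([x + y for x, y in zip(ent, rel)])
    alphas = softmax(scores)
    output = [
        sum(al * msg[d] for al, msg in zip(alphas, messages))
        for d in range(len(head))
    ]
    return output, alphas

def multi_head(e_h, neighborhood, heads):
    """Concatenate the outputs of several heads; heads is a list of (W1, W2, a)."""
    output = []
    for W1, W2, a in heads:
        out, _ = attention_head(e_h, neighborhood, W1, W2, a)
        output.extend(out)
    return output

identity = [[1.0, 0.0], [0.0, 1.0]]
e_h = [0.0, 0.0]
neighborhood = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [1.0, -1.0])]
heads = [(identity, identity, [0.1] * 6), (identity, identity, [-0.1] * 6)]
print(multi_head(e_h, neighborhood, heads))  # vector of length 2 heads * 2 dims
```

Because the relation vector enters the score, two neighbors with similar entity embeddings can still receive different weights when they are linked through different relations.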
We train our model using the following margin-based loss function:

$$\mathcal{L} = \sum_{(e_h, r, e_t) \in S} \; \sum_{(e'_h, r, e'_t) \in S'} \max\big(0,\ \gamma + f(e_h, r, e_t) - f(e'_h, r, e'_t)\big),$$

where $f(\cdot)$ represents the energy function and $\gamma$ is the margin hyperparameter. $S$ is the set of positive triplets, while $S'$ denotes the set of negative triplets, obtained by randomly replacing the head entity $e_h$ or the tail entity $e_t$ in $(e_h, r, e_t)$.
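The margin-based objective and the construction of negative triplets can be illustrated with a short sketch. It assumes one negative triplet per positive triplet and a 50/50 choice between corrupting the head and the tail; both are common conventions rather than details stated in the text, and the function names are hypothetical.

```python
import random

def corrupt(triplet, entities, rng):
    """Build a negative triplet by replacing the head or the tail at random."""
    head, relation, tail = triplet
    candidates = [e for e in entities if e != head and e != tail]
    replacement = rng.choice(candidates)
    if rng.random() < 0.5:
        return (replacement, relation, tail)
    return (head, relation, replacement)

def margin_loss(positive_energies, negative_energies, gamma):
    """Sum of hinge terms max(0, gamma + f(positive) - f(negative))."""
    return sum(
        max(0.0, gamma + pos - neg)
        for pos, neg in zip(positive_energies, negative_energies)
    )

rng = random.Random(0)
negative = corrupt(("Carles Puyol", "born_in", "Catalunya"),
                   ["Carles Puyol", "Catalunya", "Spain", "Soccer_player"], rng)
print(negative)

# A pair already separated by more than the margin contributes nothing.
print(margin_loss([1.0], [3.0], gamma=1.0))   # 0.0
# A pair that violates the margin contributes the size of the violation.
print(margin_loss([2.0], [2.5], gamma=1.0))   # 0.5
```

The loss only pushes on pairs where the negative triplet's energy is not at least γ above the positive triplet's energy.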
The decoder predicts the plausibility of a training triplet based on the embeddings of the head and tail entities output by the encoder. We select the TransE model as the decoder. TransE is one of the most typical embedding-based reasoning models, and we adopt it for its simplicity and ease of training. The decoder measures the plausibility of a training triplet $(e_h, r, e_t)$ with an energy function $f(e_h, r, e_t)$. The energy function of TransE is defined as

$$f(e_h, r, e_t) = \lVert \mathbf{e}_h + \mathbf{r} - \mathbf{e}_t \rVert,$$

where $\lVert \cdot \rVert$ is the L1 or L2 norm; a lower energy indicates a more plausible triplet.
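For concreteness, the TransE energy can be computed as below: it is the L1 or L2 norm of the head embedding plus the relation embedding minus the tail embedding, so a triplet that satisfies the translation exactly has energy zero. The function name and the `norm` parameter are illustrative.

```python
import math

def transe_energy(e_h, r, e_t, norm=1):
    """TransE energy ||e_h + r - e_t|| under the L1 (norm=1) or L2 (norm=2) norm."""
    diff = [h + rel - t for h, rel, t in zip(e_h, r, e_t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))

# The translation holds exactly, so the energy is zero.
print(transe_energy([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))          # 0.0
# A mismatched tail yields a positive energy.
print(transe_energy([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], norm=1))  # 2.0
```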
KGR can be roughly categorized into the following two kinds of tasks: first, predicting the truth value of triplets (triplet classification), and second, inference of missing entity (entity prediction). We evaluate NAKGR on both tasks under open-world settings and closed-world settings.
Datasets
For the open-world KGR tasks, we need datasets whose test sets contain entities that are unseen during training. For the task of triplet classification, we conduct experiments on the datasets released by [13], which are constructed based on WN11. Table 2 shows the details of these datasets. For the entity prediction task, experiments are conducted on a new dataset constructed from FB15k, following a procedure similar to that of [13]:
Entities and triples statistics of datasets released by [13]. The numbers of triples include negative triples
1. Sampling unseen entities. We first select different ratios (5%, 10%, 20%) of the original test triplets as a new test set
2. Filtering and splitting datasets. After finishing the first step, we need to make sure that unseen entities do not appear in the final training and validation sets. The original training dataset is split into the new training dataset and an auxiliary dataset. For a triplet $(e_h, r, e_t)$, if $e_h \in E \wedge e_t \in E$, we add it to the new training dataset. If
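The two steps above can be sketched roughly as follows. Since the description is incomplete, the sketch rests on assumptions: unseen entities are taken to be the heads (Head-R) or tails (Tail-R) of the sampled test triplets, training triplets with exactly one unseen entity are routed to the auxiliary set, and triplets whose head and tail are both unseen are dropped. The function names are hypothetical.

```python
import random

def sample_unseen(test_triplets, ratio, position, rng):
    """Sample a share of test triplets and mark their heads or tails as unseen."""
    k = int(len(test_triplets) * ratio)
    sampled = rng.sample(test_triplets, k)
    index = 0 if position == "head" else 2
    return {triplet[index] for triplet in sampled}

def split_training(train_triplets, unseen):
    """Split the original training set into a new training set and an auxiliary set."""
    new_train, auxiliary = [], []
    for head, relation, tail in train_triplets:
        n_unseen = (head in unseen) + (tail in unseen)
        if n_unseen == 0:
            # Both entities are seen: keep the triplet for training.
            new_train.append((head, relation, tail))
        elif n_unseen == 1:
            # Assumption: links between an unseen and a seen entity are auxiliary.
            auxiliary.append((head, relation, tail))
        # Assumption: triplets with two unseen entities are discarded.
    return new_train, auxiliary

train = [("a", "r1", "b"), ("a", "r2", "c"), ("c", "r1", "d")]
new_train, auxiliary = split_training(train, unseen={"c"})
print(new_train)   # [('a', 'r1', 'b')]
print(auxiliary)   # [('a', 'r2', 'c'), ('c', 'r1', 'd')]
```

At test time, the auxiliary triplets supply the neighborhood from which an unseen entity's representation is aggregated.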
Numbers of entities and relations of our processed FB15k dataset
For the closed-world KGR tasks, we evaluate the NAKGR model on the benchmark WN11 [34], FB13 [34], FB15k [4], and WN18 [4] datasets. WN11 and WN18 are extracted from WordNet [25], which provides semantic relations among words. In WordNet, each entity is a synset consisting of several words and expressing a distinct concept. The semantic relations among synsets include hypernym, hyponym, meronym, and holonym. FB13 and FB15k are two subsets of Freebase [3], which represents general world knowledge. For example, the triplet (Mark Twain, writer_of, The Adventures of Tom Sawyer) denotes the fact that Mark Twain is the writer of The Adventures of Tom Sawyer. Table 4 lists statistics of the four datasets.
Statistics of datasets for closed-world tasks
For open-world KGR tasks, we compare our NAKGR model against the following baselines:
•DKRL [45] proposes a reasoning method for KGs under the zero-shot setting taking advantage of entity descriptions.
•ConMask [33] learns embeddings of the entity’s name and parts of their text-description to connect unseen entities to the KG.
•TransE-OWE [31] extends embedding-based KGR models to predict unseen entities by mapping the embeddings of an entity’s name and description to the graph-based embedding space.
•Graph-NNs [13] use a GNN to build representations for unseen entities by exploiting existing triplets in the KG, without relying on external resources.
For closed-world KGR tasks, our model is compared with the following five baseline methods:
•TransE [4] is a well-known embedding-based model for reasoning by capturing the features of entities and relations.
•DistMult [49] proposes a framework for knowledge reasoning task, which models relation composition using a simple formulation of bilinear model.
•ComplEx [37] takes complex valued embeddings into consideration when employing eigenvalue decomposition. It has been shown to achieve SOTA performance on both FB15k and WN18.
•R-GCN [30] is one of the earliest approaches to use GNNs for the KGR task. It introduces a relational graph convolutional network that produces locality-sensitive embeddings, which are then passed to a decoder that predicts missing links in the KG.
•TransE-NMM [27] is a mixture model which encodes an entity as a weighted hybrid representation of its neighborhoods.
Open-world KGR
Triplet classification
The triplet classification task aims to judge whether a given triplet $(e_h, r, e_t)$ is correct or not, which can be viewed as a binary classification task.
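A common way to turn an energy function into a binary classifier is to compare the energy of a triplet against a threshold tuned on validation data, often one threshold per relation. The text here does not spell out the decision rule, so the sketch below is an assumption for illustration, and the function names are hypothetical.

```python
def classify(energy, threshold):
    """Label a triplet as correct when its energy does not exceed the threshold."""
    return energy <= threshold

def best_threshold(energies, labels):
    """Pick the threshold that maximizes accuracy on validation triplets."""
    best, best_accuracy = None, -1.0
    for candidate in sorted(set(energies)):
        correct = sum(
            classify(energy, candidate) == label
            for energy, label in zip(energies, labels)
        )
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best, best_accuracy = candidate, accuracy
    return best, best_accuracy

# Toy validation energies for one relation; True marks a correct triplet.
energies = [0.5, 1.0, 2.0, 3.0]
labels = [True, True, False, False]
print(best_threshold(energies, labels))  # (1.0, 1.0)
```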
Evaluation results on open-world triplet classification. Bold indicates the best scores for each dataset
Entity prediction under the open-world assumption aims to infer the missing head entity $e_h$ or tail entity $e_t$ for a triplet $(e_h, r, e_t)$ where $e_h$ or $e_t$ is absent from the KG. To tackle this task, we first hide the head entity (tail entity) of each testing triplet in Head-R (Tail-R) to produce a missing part. Then we replace the missing part with each entity in the KG and calculate the score function value according to Eq.(12). Finally, we rank these entities in ascending order of score and obtain the rank of the original correct triplet.
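The ranking step can be sketched as follows: given the energy of every candidate entity for the hidden position, the rank of the correct entity is one plus the number of candidates with strictly lower energy. Mean reciprocal rank and Hits@k are included as commonly used summary metrics; whether these are the exact metrics reported here is an assumption, and the function names are hypothetical.

```python
def rank_of_correct(energy_by_candidate, correct):
    """Rank of the correct entity when candidates are sorted by ascending energy."""
    target = energy_by_candidate[correct]
    return 1 + sum(1 for energy in energy_by_candidate.values() if energy < target)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / rank for rank in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Toy energies for the query (Carles Puyol, nationality, ?); the values are made up.
energies = {"Spain": 0.2, "Italy": 0.5, "France": 0.9}
print(rank_of_correct(energies, "Italy"))  # 2

ranks = [1, 2, 4]
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, 3))
```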
Evaluation results on open-world entity prediction
Because all entities are observed at training time under the closed-world assumption, the NAKGR model can also perform closed-world KGR tasks. Therefore, we also compare NAKGR with TransE, DistMult, ComplEx, R-GCN, and TransE-NMM on the triplet classification and entity prediction tasks. For the triplet classification task, WN11 and FB13 are used for evaluation. For the entity prediction task, FB15k and WN18 are the benchmark datasets.
Closed-world triplet classification accuracy on WN11 and FB13
Closed-world entity prediction results on FB15k and WN18
Effectiveness of relations on Head-10
Different scoring function on Head-10

Results on Head-R and Tail-R with different proportion of unseen entities.
In this paper, we introduce a novel method for open-world KGR. We present NAKGR, an attention-based aggregator that leverages neighborhood information to efficiently build representations for previously unseen entities. Additionally, NAKGR captures both entity and relation features in a given entity’s neighborhood. Further analysis shows that our encoder can easily be extended to existing models such as ComplEx and Analogy without introducing extra parameters. This makes it possible for our encoder to serve as a component of other KGR models. Experimental results on benchmark datasets demonstrate that NAKGR achieves competitive performance on the open-world KGR task and performs well on the closed-world reasoning task.
There are two major limitations in this study that could be addressed in future research. First, the NAKGR model only aggregates information from immediate neighbors, while multi-hop neighbors can help the model iteratively accumulate knowledge. Second, NAKGR only considers neighborhood information in the feature aggregation steps, while there is rich information, like visual and textual information which could be integrated into our model.
For future work, we will investigate a more expressive architecture which updates an entity’s representation by aggregating information not only from its direct neighbors, but from its multi-hop neighborhood. Furthermore, we intend to incorporate side information of entities in other modalities into the graph structures.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant No.72071145 and the National Key R&D Program of China under Grant No.2019YFB1704402.
OpenKE: github.com/thunlp/OpenKE
