Abstract
Knowledge graph embedding aims to capture the semantic information of entities by modeling the structural information between them. For long-tail entities, which lack sufficient structural information, general knowledge graph embedding models often perform relatively poorly in link prediction. To address this problem, this paper proposes a general knowledge graph embedding framework that learns the structural information and the attribute information of entities simultaneously. Under this framework, we put forward the H-AKRL (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning) model, in which a hypergraph neural network models the higher-order correlation between entities and attributes. H-AKRL exploits the complementary relationship between attribute information and structural information to improve link prediction performance. Experiments on multiple real-world data sets show that H-AKRL significantly improves link prediction performance, especially for long-tail entities.
Introduction
Knowledge representation and reasoning is a process, inspired by human problem-solving, of symbolizing knowledge so that intelligent systems acquire the ability to solve complex tasks. The knowledge graph, an important research topic in knowledge representation and reasoning, is a structured representation of facts that describes real-world entities (concepts, people or things) and their relations in the form of a graph, where nodes represent entities and edges represent relations. A knowledge graph is usually expressed as a set of triplet
A toy example of a knowledge graph.
Due to the inevitable incompleteness of the knowledge graph [5], the link prediction task, which aims to predict the missing triplets in the knowledge graph, is particularly important. A popular way to accomplish this task is Knowledge Graph Embedding (KGE), which learns low-dimensional representations of all entities and relations and then uses them to predict new facts. These methods have been shown to be scalable and effective [44]. Generally speaking, the low-dimensional representations of entities and relations are obtained by optimizing a scoring function that assigns higher scores to valid facts than to invalid facts, where invalid facts are usually generated by randomly replacing the head entity or tail entity of a true fact. As shown in Fig. 1, the score assigned to the valid fact (The Forbidden City, isLocatedin, Beijing) should be greater than the score assigned to the invalid fact (The Confucian Temple, isLocatedin, Beijing), which is generated by replacing the head entity “The Forbidden City” of the valid fact with another entity, “The Confucian Temple”. According to their scoring functions, knowledge graph embedding models can be divided into three categories: translation models, semantic matching models and neural network-based models. Translation models [13, 14, 23, 24, 28, 50, 59, 62] are inspired by the word representation tool word2vec, and generally assume that the difference between the vectors of two entities reflects the relationship between the two entities in the semantic space. These models adopt a distance-based scoring function and regard the plausibility of a fact as the distance between the two entities after translation through the relation.
Semantic matching models [3, 30, 40, 41, 54, 65] use a similarity-based scoring function to measure the plausibility of facts by matching the underlying semantics of the entities and relations represented in the vector space. These models usually use multiplication operators to model the interaction between entity embeddings and relation embeddings. Therefore, translation models are also called additive models, while semantic matching models are also called multiplicative models. Neural network-based models [12, 31, 47, 48] automatically learn feature representations of knowledge graph triplets with the help of the powerful learning ability of neural networks, which can map the distribution of the input data from the original space to the feature space through nonlinear transformation. With the development of deep learning, knowledge graph embedding methods based on convolutional neural networks [11, 38, 46, 57], recurrent neural networks [37], recurrent skipping networks [18], reinforcement learning [10], and graph neural networks [36, 45, 55, 58] have been proposed and have achieved certain performance improvements.
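The distinction between additive and multiplicative scoring, and the generation of invalid facts by head or tail replacement, can be illustrated with a minimal sketch. The two scoring functions follow the commonly cited TransE and DistMult forms; the two-dimensional embeddings are hand-picked for illustration (an assumption, not learned values), and the helper name corrupt is hypothetical.

```python
import math
import random

def transe_score(h, r, t):
    """Additive (translation) score: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """Multiplicative (semantic matching) score: trilinear product of h, r and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def corrupt(triplet, entities, known, rng):
    """Replace the head or the tail with a random entity until the result is not a known fact."""
    h, r, t = triplet
    while True:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:
            return candidate

# Hand-picked toy embeddings (assumption): h + r lands exactly on t for the valid fact.
entity_emb = {
    "The Forbidden City": [1.0, 0.0],
    "The Confucian Temple": [0.0, 1.0],
    "Beijing": [1.0, 1.0],
}
relation_emb = {"isLocatedin": [0.0, 1.0]}

valid = ("The Forbidden City", "isLocatedin", "Beijing")
invalid = ("The Confucian Temple", "isLocatedin", "Beijing")

def score(triplet):
    h, r, t = triplet
    return transe_score(entity_emb[h], relation_emb[r], entity_emb[t])

print(score(valid) > score(invalid))  # the valid fact should receive the higher score

negative = corrupt(valid, list(entity_emb), {valid}, random.Random(0))
print(negative)
```

In practice the embeddings are learned so that this ordering holds for as many valid/invalid pairs as possible.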
In the latest research on graph neural networks, references [2, 15, 25] proposed encoding higher-order data correlation by using hypergraphs to model nodes and relations. Unlike a simple graph, where one edge can connect only two nodes, each hyperedge in a hypergraph can connect any number of nodes, which better models group relationships among nodes, as shown in Fig. 2. Take a cooperative research network as an example: each node represents a researcher, and each hyperedge connects multiple nodes to represent a paper co-authored by those researchers. In this way, researchers who cooperate more closely are connected through more hyperedges. Hypergraph learning has achieved state-of-the-art results in tasks such as node classification, image classification and network embedding. For instance, in recent research on network embedding [60], Wu et al. unify the representation of different information sources of nodes and achieve outstanding performance. The experimental results in [25] have also demonstrated the superiority of high-order correlation in node representation learning. However, to the best of our knowledge, there is no research on incorporating hypergraphs into knowledge graph embedding models.
Illustration of graph and hypergraph.
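The co-authorship example can be written down as an incidence matrix, which is the usual way a hypergraph is handed to a learning algorithm. The sketch below uses hypothetical researchers and papers (assumptions for illustration only) and counts how many hyperedges two researchers share.

```python
# Hypothetical toy data: each paper (hyperedge) lists its co-authors (nodes).
papers = {
    "paper_1": ["Ann", "Bob", "Cid"],
    "paper_2": ["Ann", "Bob"],
    "paper_3": ["Cid", "Dee"],
}
researchers = sorted({name for authors in papers.values() for name in authors})

# Incidence matrix: incidence[i][j] = 1 if researcher i co-authored paper j.
incidence = [
    [1 if name in authors else 0 for authors in papers.values()]
    for name in researchers
]

def shared_hyperedges(a, b):
    """Number of papers (hyperedges) that contain both researchers."""
    row_a = incidence[researchers.index(a)]
    row_b = incidence[researchers.index(b)]
    return sum(x * y for x, y in zip(row_a, row_b))

print(shared_hyperedges("Ann", "Bob"))  # co-authors of two papers
print(shared_hyperedges("Ann", "Dee"))  # never co-authored
```

A simple graph would record only that Ann and Bob are connected; the hypergraph additionally keeps the fact that paper_1 is a three-way collaboration.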
In addition, there are a large number of long-tail entities in the knowledge graph [18]. An entity is said to be long-tail if it is connected to fewer than three other entities. The structural information of an entity is usually represented by the edges connected to the entity and the other entities connected to these edges. As shown in Fig. 1, the structural information of the entity “The Forbidden City” is provided by the edge “MainCollection” and its connected entity “Riverside Scene at Qingming Festival”, together with the edge “isLocatedin” and its connected entity “Beijing”. Due to the lack of structural information, traditional knowledge graph embedding models have insufficient ability to represent long-tail entities, which in turn affects the performance of downstream tasks such as link prediction. An intuitive idea is to use additional auxiliary information to supplement the semantics of long-tail entities and strengthen the structural-view embedding. Related research includes DKRL [64] and KDCoE [8], which add text information; MT-KGNN [51], KBLRN [16] and AKRL [68], which add attribute information; and IKRL [63] and MKBE [42], which add image information. Building on this previous research, this paper also uses entity attributes as auxiliary information to strengthen the structural-view embedding, and assumes that entities with similar attributes are also similar in category. As shown in Fig. 1, the attribute information of the entity “Beijing” includes the value “21.536 million” on the attribute “Population”, the value “16410.54 km
Specifically, the H-AKRL model (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning) proposed in this paper uses an encoder-decoder framework: in the encoder, the entity embedding is learned independently from the structural-view and the attribute-view and then sent to the fusion module to complete the encoding; in the decoder, ConvKB [38] performs the decoding. Under the structural-view, as in most knowledge graph embedding models, the knowledge graph is regarded as a multi-relational directed graph, and the structural encoding of an entity is obtained through the interaction between entities. Under the attribute-view, by contrast, the knowledge graph is regarded as a hypergraph: each attribute is modeled as a hyperedge, and entities on the same hyperedge share the attribute represented by that hyperedge. The main contributions of this paper are three-fold:
(1) A unified knowledge graph embedding framework is proposed, which can simultaneously learn the structural information and the attribute information of entities;
(2) A hypergraph neural network is used to learn the attribute information of the knowledge graph. To the best of our knowledge, this is the first work to apply hypergraph learning to knowledge graph embedding;
(3) The H-AKRL model is applied to link prediction tasks on multiple real-world data sets. Compared with the latest link prediction models, H-AKRL achieves substantial improvements.
The rest of the paper is arranged as follows: Section 2 gives the formal expression of the problem and reviews related works, mainly focusing on the existing knowledge graph embedding models with entity attributes; Section 3 explains in detail the proposed model in this paper; Section 4 reports the experiment settings and result analysis; Section 5 concludes the paper and gives future directions of our model.
This section first gives the definitions and notations of the knowledge graph and the link prediction task, and then introduces related work. The research ideas behind knowledge graph embedding and knowledge graph embedding with entity attributes are analyzed in detail.
Problem formulation
Given a knowledge graph
Knowledge graph embedding
As mentioned in the Introduction Section, using
The scoring functions of several knowledge graph embedding models
Lin et al. [29] are the first to study the content of
The above-mentioned models of
In the study of representation learning for network structure,
Methodology
In this section, we first describe the framework of our proposed model H-AKRL (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning), and then introduce the specific process of each module in detail in subsections. Let us suppose that the number of entities contained in the knowledge graph
Overall architecture of H-AKRL.
The overall architecture of the H-AKRL model is shown in Fig. 3. It is composed of three main steps: encoding step, fusion step and decoding step.
First, in the encoding step, all entities in the entity set pass through the structure learning module and the attribute learning module to obtain the structural-view embedding and the attribute-view embedding independently. Attribute learning is performed by the hypergraph neural network, and structure learning by the TransE [5] model.
Secondly, due to the heterogeneity of the information used in the structural-view and the attribute-view, the vector of the same entity in the two separate semantic spaces needs to be fused through the fusion module to achieve dual-channel entity representation. The fused entity vector can retain the structural information and the attribute information simultaneously.
Lastly, the obtained entity embedding matrix will be sent to the decoder for the knowledge graph link prediction task. Since we choose ConvKB [38] as the decoder in this paper, additional relational embedding matrices need to be provided.
Let
Structure learning module
The structure learning module preserves the relationship between entities by learning the structural triplets of the knowledge graph. We adopt TransE [5] model as our structure learning module and interpret the relation as a translation vector from the head entity to the tail entity. Given the structural triplet
where
where
Classical graph learning theory defines an edge as a point-to-point relationship on the graph: an edge can connect only two nodes and can model only a binary relationship. In hypergraph learning theory, a hyperedge models the group relationship between nodes and can connect any number of nodes, so the hypergraph can model the higher-order correlation between them. Hypergraph learning assumes that, for any two nodes in the hypergraph, the more hyperedges link them, the more closely connected they are and the stronger the data association between them. This assumption is consistent with our research intuition: the more attributes two entities in the knowledge graph have in common, the more similar the entity types to which they belong, and the closer their embeddings are in the semantic space.
Inspired by hypergraph learning, given the set of attribute triplets
where,
The attribute learning module based on the hypergraph neural network can learn entity embedding and attribute embedding simultaneously, so that the information of entities can be transferred to each other through the shared hyperedge, thereby achieving the purpose of updating the entity embedding. The attribute embedding is also updated along with the updating of entity embedding. Let
The Eq. (4) shows that the attribute embedding
Entity embedding is updated once after each hyperedge convolutional layer. Assuming that after the
where
It has been proved that HNN [15] is a spectral convolution operation defined on a hypergraph. Hypergraph neural network can effectively extract the high-order correlation between entities through the entity-attribute-entity conversion and aggregation on each layer.
We assume that the attribute learning module is formed by stacking K-layer hyperedge convolutional layers. The initial entity embedding matrix is
Figure 4 vividly depicts the data flow of the attribute learning module. The figure contains a total of 6 entities and 3 attributes, of which attribute “a1” is shared by entities 1, 2, and 6, so the hyperedge corresponding to attribute “a1” connects the nodes corresponding to entities 1, 2, and 6. In the same way, the hyperedge corresponding to attribute “a2” connects the nodes corresponding to entities 1, 3, and 4, and the hyperedge corresponding to attribute “a3” connects the nodes corresponding to entities 3, 4, 5, and 6. The hypergraph neural network in the figure is composed of K hyperedge convolutional layers with embedding dimension
Schematic diagram of the attribute learning module.
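The data flow in Fig. 4 can be traced by hand on the six-entity, three-attribute example. The sketch below is a simplified stand-in for one hyperedge convolutional layer: it uses plain mean aggregation for the entity-to-attribute and attribute-to-entity steps, and omits the learned weights, normalization and nonlinearity of the actual model, so it should be read as an illustration of the information flow rather than the paper's update rule. The one-hot initial embeddings are an assumption.

```python
# Hyperedges from the Fig. 4 example: attribute -> entities that share it.
hyperedges = {"a1": [1, 2, 6], "a2": [1, 3, 4], "a3": [3, 4, 5, 6]}
entities = [1, 2, 3, 4, 5, 6]

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def hyperedge_layer(entity_emb):
    """One simplified entity -> attribute -> entity aggregation step."""
    # Step 1: each attribute embedding aggregates the entities on its hyperedge.
    attr_emb = {
        a: mean([entity_emb[e] for e in members])
        for a, members in hyperedges.items()
    }
    # Step 2: each entity embedding aggregates the attributes it belongs to.
    new_entity_emb = {
        e: mean([attr_emb[a] for a, members in hyperedges.items() if e in members])
        for e in entity_emb
    }
    return new_entity_emb, attr_emb

# One-hot initial embeddings (assumption) make the mixing easy to read off.
x0 = {e: [1.0 if i == e - 1 else 0.0 for i in range(6)] for e in entities}
x1, attrs = hyperedge_layer(x0)

print(x1[3] == x1[4])  # entities 3 and 4 lie on exactly the same hyperedges
print(x1[5])           # entity 5 only receives information through a3
```

After one layer, entities 3 and 4 become indistinguishable in this toy setting because they share both a2 and a3, which mirrors the intuition that entities with more common attributes end up closer in the embedding space.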
The attribute learning module is trained through a logistic regression model, so the loss function is defined as follows:
where,
Combining the above two modules, the structure learning module captures entity similarity based on entity relationships, while the attribute learning module captures entity similarities based on entity attributes. The structure learning module and the attribute learning module together constitute the encoder part, and the overall optimization goal of joint learning is:
This paper uses ConvKB [38] as the decoder. Before decoding, since the two learning modules construct two mutually independent semantic spaces, it is necessary to go through the fusion module to map the entity embeddings from different learning modules into the same vector space.
For any
where,
ConvKB uses convolution operations to analyze the global embedding features in each dimension of the triplet, while retaining the translation characteristics of the triplet. The scoring function is defined as follows:
where
The complete training process of H-AKRL is shown in Algorithm 1.
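The decoding step can be made concrete with a small sketch of a ConvKB-style score, written from the commonly described formulation of ConvKB [38]: the head, relation and tail embeddings are stacked as a k×3 matrix, each 1×3 filter is slid over the rows, the ReLU feature maps are concatenated, and the result is combined with a weight vector. The filter values and weights below are hypothetical, and whether a higher or lower score denotes a valid fact depends on the loss used, which is not assumed here.

```python
def convkb_score(h, r, t, filters, w):
    """ConvKB-style score.

    filters: list of (w_h, w_r, w_t, bias) tuples, one per 1x3 filter.
    w: weight vector of length len(filters) * len(h).
    """
    features = []
    for w_h, w_r, w_t, bias in filters:
        # Slide the 1x3 filter over each embedding dimension (row of [h, r, t]).
        for h_i, r_i, t_i in zip(h, r, t):
            features.append(max(0.0, w_h * h_i + w_r * r_i + w_t * t_i + bias))
    # Combine the concatenated feature maps with the weight vector.
    return sum(f * w_i for f, w_i in zip(features, w))

# With the filter (1, 1, -1) each row computes h_i + r_i - t_i, which is why
# the convolution can retain the translation characteristic of the triplet.
translation_filter = [(1.0, 1.0, -1.0, 0.0)]
print(convkb_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], translation_filter, [1.0, 1.0]))
```

In a trained model the filters and the weight vector are learned jointly with the embeddings; several filters are typically used so that each captures a different interaction pattern across the three columns.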
Datasets
This paper constructs two data sets, FB15k-237-Attr and FB24K, to evaluate the link prediction performance of the proposed H-AKRL model.
Statistics of datasets
The relevant statistical information of the constructed data sets is shown in Table 2.
The goal of the link prediction task is to predict a triplet when the head entity or tail entity is missing, i.e., given a combination
We assume that the entity embedding dimension of the structural learning module, the entity embedding dimension and the relation embedding dimension of the fusion module are the same, namely
We select TransE [5], DistMult [65] and ConvE [11] as our baseline models. Following the experimental results of Rossi et al. [44], we also select current state-of-the-art link prediction models for comparison, including ComplEx [54], RotatE [50], TuckER [3], and RSNs [18]. To allow these models to process attribute information, we follow the approach of Chao et al. [46] and express attributes as attribute nodes connecting related entities.
We use the adaptive moment estimation (Adam) algorithm [26] to train the model, and follow Bordes et al. [5] in using the filtered setting for evaluation.
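The filtered setting can be sketched as follows: when ranking the correct entity for a test query, other candidate entities that also form known true triplets are skipped, so the model is not penalized for ranking them highly. The function names, the toy scores and the assumption that a higher score is better are illustrative choices, not details taken from the experiments.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` among candidates, ignoring other known-true candidates.

    scores: dict mapping candidate entity -> score (higher is better here).
    known_true: candidates (other than target) that also complete a true triplet.
    """
    target_score = scores[target]
    better = sum(
        1
        for entity, s in scores.items()
        if entity != target and entity not in known_true and s > target_score
    )
    return better + 1

def summarize(ranks, k=10):
    """Mean rank, mean reciprocal rank and Hits@k over a list of ranks."""
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        f"Hits@{k}": sum(r <= k for r in ranks) / n,
    }

# Toy query: "A" outranks the target "C" but is itself a known true answer.
scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
print(filtered_rank(scores, "C", known_true=set()))   # raw rank
print(filtered_rank(scores, "C", known_true={"A"}))   # filtered rank
print(summarize([1, 2, 4], k=3))
```

Without filtering, the target in this toy query would be pushed down by a candidate that is in fact also correct.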
Results and analysis
Table 3 shows the overall link prediction performance of the H-AKRL model and the selected baseline models on the two data sets.
Table 3 shows that on the FB15k-237-Attr data set, H-AKRL achieves the best results on all indicators, while on the FB24K data set, H-AKRL performs best on Hits@10 and MR and achieves sub-optimal results on the remaining three indicators. This illustrates the superiority of the H-AKRL model in link prediction.
In addition, we find that RotatE also shows good performance on these two data sets. We believe the main reason is that RotatE defines entity embedding and relation embedding in the complex domain. Operators on the complex domain have stronger expressive power and can better model symmetric relations. Therefore, in our future work, we can consider defining our model on the complex domain and using RotatE as our structural learning module.
Link prediction for FB15k-237-attr and FB24K datasets
Previous work has demonstrated that TuckER is fully expressive when the entity embedding dimension is larger than the number of entities and the relation embedding dimension is larger than the number of relations; that is, it has the theoretical potential to learn any valid graph correctly, without being hindered by intrinsic limitations. However, the experimental results in Table 3 show that TuckER does not perform well on link prediction on these data sets. A possible reason is that the entity embedding dimension used in our experiments is much smaller than the number of entities, so the expressive capability of TuckER is considerably weakened. As stated by Meilicke et al. [6], when the embedding dimension is low, a fully expressive model does not necessarily perform better on link prediction. In addition, expressing attributes as attribute nodes increases both the number of entities and the number of relations, potentially further degrading TuckER’s performance.
In order to illustrate the outstanding effect of the H-AKRL model on long-tail entities, we take the FB15k-237-Attr data set as an example for in-depth analysis. Figure 5 shows the distribution of entity degree in the FB15k-237-Attr dataset. It can be seen that there are a large number of long-tail entities in the FB15k-237-Attr dataset, where the degree of an entity is defined as the number of structural triplets containing the entity.
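The degree definition used here, the number of structural triplets containing an entity, can be computed directly from the triplet list. The sketch below uses the two structural triplets around “The Forbidden City” from Fig. 1 as toy input; counting a triplet once even if the head and tail coincide is an assumption about how self-loops would be handled.

```python
from collections import Counter

# Structural triplets taken from the Fig. 1 toy example.
triplets = [
    ("The Forbidden City", "MainCollection", "Riverside Scene at Qingming Festival"),
    ("The Forbidden City", "isLocatedin", "Beijing"),
]

def entity_degrees(triplets):
    """Degree of an entity = number of structural triplets that contain it."""
    degrees = Counter()
    for head, _, tail in triplets:
        for entity in {head, tail}:  # a self-loop triplet is counted once
            degrees[entity] += 1
    return degrees

degrees = entity_degrees(triplets)
distribution = Counter(degrees.values())  # degree -> number of entities

print(degrees["The Forbidden City"])
print(sorted(distribution.items()))
```

Plotting `distribution` for a full data set gives a degree histogram of the kind shown in Fig. 5.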
We select the TransE, RotatE and H-AKRL models and carry out comparative link prediction experiments on entities with degrees of 1, 2, and 3 respectively. The experimental results are shown in Table 4.
Link prediction results by degrees
Distribution of entity degree.
Note that in the table, TransE and RotatE each report two sets of results in every comparative experiment. One is obtained by using only structural triplets; the other is obtained by following the “bridge” approach of Chao et al. [46], which combines entity structural information and attribute information. Table 4 clearly shows that, in all comparative experiments and for both TransE and RotatE, the link prediction results that combine attribute information are better than those that use only structural information. This shows that entity attribute information and entity structural information are complementary, and that attribute information can enhance link prediction performance.
In addition, although RotatE shows strong performance in link prediction on all entities in Table 3, it still has shortcomings in the representation learning of long-tail entities, as shown in Table 4. Compared with the two baseline models, the performance of H-AKRL on long-tail entities is significantly improved in all aspects. Taking the Hits@10 index on entities with a degree of 3 as an example, H-AKRL achieves an increase of 22.2% over TransE and 6.7% over RotatE. This shows that the proposed H-AKRL model can effectively use entity attribute information to enrich the semantics of long-tail entities, and further indicates that the hypergraph neural network can better model the similarity between entities in the attribute-view.
Ablation study result.
For further illustration, we conducted the following ablation study to show the important role of the attribute learning module in our model. We remove the attribute learning module and the fusion module from our model, so that the entity embedding obtained by the structure learning module is sent directly to the decoder to complete the link prediction task. The result is shown in Fig. 6.
For compactness of presentation, we normalized the experimental results: the results obtained by TransE in Table 3 are all set to 1, and the results of TransE+ConvKB and H-AKRL are scaled proportionally. The scaled results therefore represent the ratio of each result to that of TransE.
Fig. 6 shows that H-AKRL achieves the best performance on all five indicators. The results of TransE+ConvKB lie between those of H-AKRL and TransE, and TransE has the worst link prediction performance. This further illustrates two points: (1) Compared with a separate structure learning module, the encoder-decoder architecture proposed in this paper can effectively improve the quality of the entity embeddings. We speculate that, although TransE has drawbacks in dealing with one-to-many and many-to-many relation types, the convolution operation on the decoder side of the encoder-decoder framework enhances the correlation between entities and relations, thus mitigating the inherent drawbacks of TransE in relation processing and improving link prediction performance; (2) The attribute learning module based on the hypergraph neural network can better extract entity similarity under the attribute-view, enriching the semantic information of the entities and showing clear advantages in improving link prediction performance.
This paper studies the problem of knowledge graph embedding with entity attributes, and proposes H-AKRL, a knowledge graph embedding model based on a hypergraph neural network that can simultaneously learn the attribute information and the structural information of entities. In the encoder, TransE and the hypergraph neural network serve as the structure learning module and the attribute learning module respectively, and ConvKB is selected as the decoder. H-AKRL achieves significant improvements in experiments on multiple real-world data sets, especially in the link prediction of long-tail entities, which indicates that the hypergraph neural network can more effectively model entity similarity under the attribute-view.
Due to the inherent flaws of TransE’s model assumptions, it cannot correctly model one-to-many and many-to-many relations. Using other structural learning models will be our future work.
In addition, in the attribute learning module, attributes differ in how well they distinguish between entities. Before constructing the incidence matrix, we can consider using an attention mechanism to assign different attention coefficients to different attributes, which is another direction in which our model can be improved in the future.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61806221).
