Knowledge graph embedding with entity attributes using hypergraph neural networks

Abstract

Knowledge graph embedding aims to capture the semantic information of entities by modeling the structural information between them. For long-tail entities, which lack sufficient structural information, general knowledge graph embedding models often perform relatively poorly in link prediction. To address this problem, this paper proposes a general knowledge graph embedding framework that learns the structural information and the attribute information of entities simultaneously. Under this framework, an H-AKRL (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning) model is put forward, in which a hypergraph neural network models the higher-order correlation between entities and attributes. H-AKRL exploits the complementary relationship between attribute information and structural information to improve link prediction performance. Experiments on multiple real-world data sets show that H-AKRL significantly improves link prediction performance, especially for long-tail entities.

Keywords

Attribute network, hypergraph neural network, knowledge graph embedding, long-tail entity

1. Introduction

Knowledge representation and reasoning is a process, inspired by human problem-solving, of symbolizing knowledge so that intelligent systems acquire the ability to solve complex tasks. A knowledge graph, an important research topic in knowledge representation and reasoning, is a structured representation of facts that describes real-world entities (concepts, people or things) and their relations in the form of a graph, where nodes represent entities and edges represent relations. A knowledge graph is usually expressed as a set of triplets $(h,r,t)$ , where each triplet represents the fact that there exists a relation $r$ between the “head entity” $h$ and the “tail entity” $t$ . Figure 1 shows a toy example of a knowledge graph. There is a relation “isLocatedin” between the head entity “The Forbidden City” and the tail entity “Beijing”, which represents the fact that the Forbidden City is located in Beijing. Real-world knowledge graphs include Freebase [4], WordNet [34], YAGO [20, 33, 49], DBpedia [1] and NELL [7, 35]. At present, knowledge graphs have become one of the basic technologies of intelligent services such as relation extraction [56], intelligent question answering [9], standardization [67], recommendation systems [69], information retrieval [39] and dialogue generation [43].

Figure 1.

A toy example of a knowledge graph.

Due to the inevitable incompleteness of knowledge graphs [5], the link prediction task, which aims to predict the missing triplets in a knowledge graph, is particularly important. A popular way to accomplish this task is Knowledge Graph Embedding (KGE), which learns low-dimensional representations of all entities and relations and then uses them to predict new facts. These methods have been shown to be scalable and effective [44]. Generally speaking, low-dimensional representations of entities and relations are obtained by optimizing a scoring function that assigns higher scores to valid facts than to invalid facts, where invalid facts are usually generated by randomly replacing the head entity or tail entity of true facts. As shown in Fig. 1, the score assigned to the valid fact (The Forbidden City, isLocatedin, Beijing) should be greater than the score assigned to the invalid fact (The Confucian Temple, isLocatedin, Beijing), which is generated by replacing the head entity “The Forbidden City” of the valid fact with another entity “The Confucian Temple”. According to their scoring functions, knowledge graph embedding models can be divided into three categories: translation models, semantic matching models and neural network-based models. Translation models [13, 14, 23, 24, 28, 50, 59, 62] are inspired by the word representation tool word2vec, and generally assume that the difference between the vectors of two entities reflects the relationship between the two entities in the semantic space. These models adopt a distance-based scoring function and regard the rationality of a fact as the distance between two entities after translation through the relation.
Semantic matching models [3, 30, 40, 41, 54, 65] use a similarity-based scoring function to measure the rationality of facts by matching the underlying semantics of the entities and relations represented in the vector space. These models usually use multiplication operators to model the interaction between entity embeddings and relation embeddings. Therefore, translation models are also called additive models, while semantic matching models are also called multiplicative models. Neural network-based models [12, 31, 47, 48] automatically learn feature representations of knowledge graph triplets with the help of the learning ability of neural networks, which can map the distribution of the input data from the original space to the feature space through nonlinear transformation. With the development of deep learning, knowledge graph embedding methods based on convolutional neural networks [11, 38, 46, 57], recurrent neural networks [37], recurrent skipping networks [18], reinforcement learning [10] and graph neural networks [36, 45, 55, 58] have been proposed and have achieved certain performance improvements.
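
The generation of invalid facts described above can be sketched as follows. This is a minimal illustration in plain Python, not the sampling code used in this paper; the function name `corrupt`, the 50/50 choice between head and tail replacement, and the tiny entity list are assumptions made for the example.

```python
import random

def corrupt(triplet, entities, known, rng):
    """Return one invalid triplet by replacing the head or the tail entity."""
    h, r, t = triplet
    while True:
        e = rng.choice(entities)
        # Assumption: head and tail are corrupted with equal probability.
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:  # reject candidates that are known true facts
            return candidate

entities = ["The Forbidden City", "The Confucian Temple", "Beijing", "Nanjing"]
valid = ("The Forbidden City", "isLocatedin", "Beijing")
known = {valid}
rng = random.Random(0)
negatives = [corrupt(valid, entities, known, rng) for _ in range(5)]
```

Each element of `negatives` keeps the relation and differs from the valid fact in the head or the tail. A training step would then push the score of `valid` above the scores of these corrupted triplets.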

In recent research on graph neural networks, references [2, 15, 25] encode higher-order data correlation by using hypergraphs to model nodes and relations. Unlike a simple graph, where one edge can connect only two nodes, each hyperedge in a hypergraph can connect any number of nodes and can therefore better model group relationships among nodes, as shown in Fig. 2. Take a research collaboration network as an example: each node represents a researcher, and each hyperedge connects multiple nodes to represent a paper co-authored by the corresponding researchers. In this way, researchers who cooperate more closely are connected through more hyperedges. Hypergraph learning has achieved state-of-the-art results in tasks such as node classification, image classification and network embedding. For instance, in recent research on network embedding [60], Wu et al. unify the representation of different information sources of nodes and achieve outstanding performance. The experimental results in [25] have also demonstrated the benefit of high-order correlation in node representation learning. However, to the best of our knowledge, there is no research on incorporating hypergraphs into knowledge graph embedding models.

Figure 2.

Illustration of graph and hypergraph.

In addition, there are a large number of long-tail entities in the knowledge graph [18]. An entity is said to be long-tail if it is connected to fewer than three other entities. The structural information of an entity is usually represented by the edges connected to the entity and the other entities connected to these edges. As shown in Fig. 1, the structural information of the entity “The Forbidden City” is provided by the edge “MainCollection” and its connected entity “Riverside Scene at Qingming Festival”, together with the edge “isLocatedin” and its connected entity “Beijing”. Due to the lack of structural information, traditional knowledge graph embedding models have insufficient ability to represent long-tail entities, which in turn affects the performance of downstream tasks such as link prediction. An intuitive idea is to use additional auxiliary information to supplement the semantics of long-tail entities and strengthen the structural-view embedding. Related research includes DKRL [64] and KDCoE [8], which add text information; MT-KGNN [51], KBLRN [16] and AKRL [68], which add attribute information; and IKRL [63] and MKBE [42], which add image information. On the basis of previous research, this paper also uses entity attributes as auxiliary information to strengthen the structural-view embedding, and assumes that entities with similar attributes are also similar in category. As shown in Fig. 1, the attribute information of the entity “Beijing” includes the value “21.536 million” on the attribute “Population”, the value “16410.54 km ${}^{2}$ ” on the attribute “Area”, the value “3537.13 billion” on the attribute “GRP”, and so on. In addition, the entity “Beijing” and the entity “Nanjing” have the same attributes “Population”, “Area”, “GRP”, etc. Semantically, they both belong to the category of city, so intuitively these two entities should be more similar in the semantic space.
Entity category information has been shown [17, 19, 32] to effectively improve the performance of entity embedding. Therefore, implicitly learning entity categories with the help of the attribute learning module should also improve the performance of knowledge graph embedding.
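
The long-tail criterion stated above (fewer than three connected entities) can be checked with a short sketch. The helper name `long_tail_entities`, the treatment of edges as undirected for counting, and the abstract entity names are assumptions for illustration; entities that appear in no structural triplet are not covered by this simplified version.

```python
from collections import defaultdict

def long_tail_entities(triplets, threshold=3):
    """Entities connected to fewer than `threshold` distinct other entities."""
    neighbours = defaultdict(set)
    for h, _r, t in triplets:
        neighbours[h].add(t)
        neighbours[t].add(h)
    # Self-loops do not count as connections to *other* entities.
    return {e for e, ns in neighbours.items() if len(ns - {e}) < threshold}

toy_triplets = [
    ("e1", "r1", "e2"),
    ("e1", "r2", "e3"),
    ("e1", "r1", "e4"),
    ("e2", "r2", "e3"),
]
tail = long_tail_entities(toy_triplets)
```

In this toy graph only `e1` has three distinct neighbours, so every other entity is classified as long-tail.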

Specifically, the H-AKRL model (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning) proposed in this paper uses an encoder-decoder framework: in the encoder, the entity embedding is learned independently from the structural-view and the attribute-view and sent to the fusion module to complete the encoding; in the decoder, ConvKB [38] performs the decoding. Under the structural-view, as in most knowledge graph embedding models, the knowledge graph is regarded as a multi-relational directed graph, and the structural encoding of an entity is obtained through the interaction between entities. Under the attribute-view, the knowledge graph is regarded as a hypergraph: each attribute is modeled as a hyperedge, and entities on the same hyperedge share the attribute represented by that hyperedge. The main innovations of this paper are three-fold:

• A unified knowledge graph embedding framework is proposed, which can simultaneously learn the structural information and attribute information of entities;

• A hypergraph neural network is used to learn the attribute information of the knowledge graph. To the best of our knowledge, this is the first research to apply hypergraph learning to knowledge graph embedding;

• The H-AKRL model is applied to link prediction tasks on multiple real-world data sets. Compared with the latest link prediction models, H-AKRL achieves substantial improvements.

The rest of the paper is arranged as follows: Section 2 gives the formal expression of the problem and reviews related works, mainly focusing on the existing knowledge graph embedding models with entity attributes; Section 3 explains in detail the proposed model in this paper; Section 4 reports the experiment settings and result analysis; Section 5 concludes the paper and gives future directions of our model.

2. Related works

This section firstly gives the definitions and notations of the knowledge graph and link prediction task before introducing related works. Then the research ideas of knowledge graph embedding and knowledge graph embedding with entity attributes are analyzed in detail.

2.1 Problem formulation

A knowledge graph is given as $G=\{E,R,A,V,T_{S},T_{A}\}$ , where $E$ represents the entity set, $R$ represents the relation set, $A$ represents the attribute set, and $V$ represents the attribute value set. A relation $r$ connects two entities $h$ and $t$ to form a structural triplet $(h,r,t)\in T_{S}$ , which represents that there is a directed relation $r$ from the head entity $h$ to the tail entity $t$ , where $h\in E$ , $r\in R$ , $t\in E$ , $T_{S}\subseteq E\times R\times E$ . An attribute $a$ connects entity $e$ and attribute value $v$ to form an attribute triplet $(e,a,v)\in T_{A}$ , which represents that the value of entity $e$ on attribute $a$ is $v$ , where $e\in E$ , $a\in A$ , $v\in V$ , $T_{A}\subseteq E\times A\times V$ . Link prediction on knowledge graph $G$ is to predict valid structural triplets that are not in $T_{S}$ ; the candidates form the set ${T}^{\prime}_{S}=\{(h,r,t)|h\in E,r\in R,t\in E,(h,r,t)\notin T_{S}\}$ .
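
To make the notation concrete, the sets of the formulation can be written down for a toy graph. This is only an illustrative sketch with made-up identifiers (`e1`, `r1`, `a1`, `v1`, ...); enumerating the full candidate set is feasible only for tiny graphs, and practical models rank candidates instead.

```python
from itertools import product

E = ["e1", "e2", "e3"]                               # entity set
R = ["r1"]                                           # relation set
T_S = {("e1", "r1", "e2"), ("e2", "r1", "e3")}       # structural triplets
T_A = {("e1", "a1", "v1"), ("e2", "a1", "v2")}       # attribute triplets
A = {a for _e, a, _v in T_A}                         # attribute set
V = {v for _e, _a, v in T_A}                         # attribute value set

def candidate_triplets(entities, relations, known):
    """All (h, r, t) in E x R x E that are not already structural triplets."""
    return {c for c in product(entities, relations, entities) if c not in known}

candidates = candidate_triplets(E, R, T_S)
```

Here `candidates` plays the role of ${T}^{\prime}_{S}$ : it holds the $3\times 1\times 3$ possible triplets minus the two known ones.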

2.2 Knowledge graph embedding

As mentioned in the Introduction, using knowledge graph embedding for link prediction has been extensively studied. Let ${\rm{\bf h}}$ and ${\rm{\bf t}}$ denote the embedding vectors of the head entity $h$ and the tail entity $t$ respectively, and let $f_{r}({\rm{\bf h}},{\rm{\bf t}})$ denote the scoring function of a knowledge graph embedding model. TransE [5] interprets the relation $r$ as a translation vector ${\rm{\bf r}}$ , so that the embedded entities ${\rm{\bf h}}$ and ${\rm{\bf t}}$ can be connected by ${\rm{\bf r}}$ with low error, that is, ${\rm{\bf h}}+{\rm{\bf r}}\approx{\rm{\bf t}}$ holds when $(h,r,t)$ is valid. DistMult [65] expresses each relation as a diagonal matrix and models the three-way interaction between the head entity, the relation and the tail entity. ComplEx [54] extends DistMult by introducing complex-valued embeddings to better model asymmetric relations. TuckER [3] views the whole knowledge graph as a binary tensor and applies Tucker decomposition to accomplish the link prediction task. RotatE [50] defines each relation as a rotation from the head entity to the tail entity in the complex space, that is, ${\rm{\bf h}}\circ{\rm{\bf r}}\approx{\rm{\bf t}}$ holds when $(h,r,t)$ is valid. ConvE [11] uses a multi-layer convolutional network model that converts the embeddings of $(h,r)$ pairs into a matrix and treats it as a feature map processed by convolution kernels. RSNs [18] employ a skipping mechanism to bridge the gaps between entities and integrate recurrent neural networks with residual learning to efficiently capture the long-term relational dependencies within KGs. Table 1 summarizes the scoring functions of the above-mentioned knowledge graph embedding methods. For more information about knowledge graph embedding models, Rossi et al. [44] survey the state-of-the-art models and compare their effectiveness and efficiency through extensive comparative experiments.

Table 1
The scoring functions of several knowledge graph embedding models

Model	Entity representation	Relation representation	$f_{r}({\rm{\bf h}},{\rm{\bf t}})$
TransE	${\rm{\bf h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf r}}\in{\rm R}^{d}$	$-\|{\rm{\bf h}}+{\rm{\bf r}}-{\rm{\bf t}}\|_{1/2}$
RotatE	${\rm{\bf h}},{\rm{\bf t}}\in{\rm C}^{d}$	${\rm{\bf r}}\in{\rm C}^{d}$	$-\|{\rm{\bf h}}\circ{\rm{\bf r}}-{\rm{\bf t}}\|$
DistMult	${\rm{\bf h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf r}}\in{\rm R}^{d}$	${\rm{\bf h}}^{T}\textit{diag}({\rm{\bf r}}){\rm{\bf t}}$
ComplEx	${\rm{\bf h}},{\rm{\bf t}}\in{\rm C}^{d}$	${\rm{\bf r}}\in{\rm C}^{d}$	$Re({\rm{\bf h}}^{T}\textit{diag}({\rm{\bf r}}){\rm{\bf\bar{t}}})$
TuckER	${\rm{\bf h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf r}}\in{\rm R}^{d}$	${\rm{\bf W}}\times_{1}{\rm{\bf h}}\times_{2}{\rm{\bf r}}\times_{3}{\rm{\bf t}}$
ConvE	${\rm{\bf M}}_{h}\in{\rm R}^{d_{w}\times d_{h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf M}}_{r}\in{\rm R}^{d_{w}\times d_{h}}$	$\sigma(\textit{vec}(\sigma([{\rm{\bf M}}_{h};{\rm{\bf M}}_{r}]*\omega)){\rm{\bf W}}){\rm{\bf t}}$
ConvKB	${\rm{\bf h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf r}}\in{\rm R}^{d}$	$\textit{concat}(\sigma([{\rm{\bf h}},{\rm{\bf r}},{\rm{\bf t}}]*\omega))\cdot{\rm{\bf w}}$
RSNs	${\rm{\bf h}},{\rm{\bf t}}\in{\rm R}^{d}$	${\rm{\bf r}}\in{\rm R}^{d}$	$\sigma(\textit{rsn}({\rm{\bf h}},{\rm{\bf r}})\times{\rm{\bf t}})$
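
As an illustration of the difference between an additive and a multiplicative scoring function in Table 1, the TransE and DistMult scores can be computed for small vectors. This is a plain-Python sketch; the function names and the two-dimensional example vectors are chosen for the example only.

```python
def transe_score(h, r, t, norm=1):
    """TransE: negative L1 or L2 distance between h + r and t."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -sum(d * d for d in diffs) ** 0.5

def distmult_score(h, r, t):
    """DistMult: h^T diag(r) t, a three-way element-wise product."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

h = [1.0, 0.0]
r = [0.5, 1.0]
t_valid = [1.5, 1.0]    # equals h + r, so the TransE distance is zero
t_invalid = [0.0, 0.0]

s_valid = transe_score(h, r, t_valid)
s_invalid = transe_score(h, r, t_invalid)
s_mult = distmult_score(h, r, t_valid)
```

For TransE the valid tail receives the highest possible score (zero distance), while the invalid tail is penalized by its distance from ${\rm{\bf h}}+{\rm{\bf r}}$ .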

2.3 Knowledge graph embedding with entity attributes

Lin et al. [29] are the first to study knowledge graph embedding with entity attributes. Let ${\rm{\bf e}}$ , ${\rm{\bf a}}$ and ${\rm{\bf v}}$ denote the embedding vectors of the entity $e$ , the attribute $a$ and the attribute value $v$ , respectively. Lin et al. [29] model the correlation between entities and attributes as a classification task, define a scoring function $f_{a}({\rm{\bf e}},{\rm{\bf v}})=-\|{\sigma({{\rm{\bf e}}{\rm{\bf W}}_{a}+{\rm{\bf b}}_{a}})-{\rm{\bf v}}_{av}}\|+b$ for attribute triplets, and propose the KR-EAR model. Here, ${\rm{\bf W}}_{a}$ is the translation matrix and ${\rm{\bf b}}_{a}$ is the bias vector for attribute $a$ . The MT-KGNN [51] and KBLRN [16] models use a shared embedding space to encode the attribute values while learning the structural representation of entities. The embeddings of attribute values are obtained by constructing a neural network prediction model for attribute value regression. Wu and Wang [61] also define the attribute-view embedding as a prediction problem, and complete the representation learning under the attribute-view through a linear regression model. Shang et al. [46] express attributes as attribute nodes, which act as “bridges” connecting related entities. The entity embeddings can be propagated over these “bridges” to merge attribute information. Trisedya et al. [53] use TransE’s model assumptions to process attribute information, interpret attribute $a$ as a translation from entity $e$ to attribute value $v$ , and use a composite function $f_{a}(v)$ to encode the attribute value, that is, ${\rm{\bf e}}+{\rm{\bf a}}\approx f_{a}(v)$ holds when $(e,a,v)$ is valid. The MultiKE model [66] uses a convolutional neural network to extract features from the attributes and values of entities, and defines the scoring function of the attribute-view as $f_{a}({\rm{\bf e}},{\rm{\bf v}})=-\|{{\rm{\bf e}}-\textit{CNN}({\langle{{\rm{\bf a}};{\rm{\bf v}}}\rangle})}\|$ .
AKRL [68] also uses a deep convolutional neural network model to encode the attributes, and uses attribute information as well as structural information to generate attribute-based entity representations.

The above-mentioned models of knowledge graph embedding with entity attributes mostly adopt the method of generating vectors for attribute values to strengthen knowledge representation learning, but this approach has two potential problems. Firstly, the number of attributes for each entity is usually small and different from each other, so the attribute value vector will be very sparse. Secondly, the zero value in the attribute value vector may have ambiguous meaning: the entity does not have this attribute or the attribute value of this entity is zero, which will affect the accuracy of embedding.

In the study of representation learning for network structure, attributed network embedding [21, 22, 27] considers only the 0–1 matrix of nodes and attributes to construct the attribute correlation between nodes, and thereby strengthens the embedding of network structure. Node attributes have been shown to benefit tasks such as node classification and community analysis. Thus, to overcome the shortcomings of existing models of knowledge graph embedding with entity attributes, this paper follows the idea of attributed network embedding. In the attribute learning module, we consider only which attributes the entities have and ignore the values of the entities on these attributes. The intuition is that the more attributes two entities have in common, the more similar their entity types are, and the closer their embeddings should be in the semantic space under the attribute-view. We believe that this strengthens the structural-view embedding of the knowledge graph.

3. Methodology

In this section, we first describe the framework of our proposed model H-AKRL (Hypergraph Neural Networks based Attribute-embodied Knowledge Representation Learning), and then introduce the specific process of each module in detail in subsections. Let us suppose that the number of entities contained in the knowledge graph $G$ is $n_{e}$ , the number of relations is $n_{r}$ , and the number of attributes is $n_{a}$ , namely, $n_{e}=|E|$ , $n_{r}=|R|$ , $n_{a}=|A|$ . Our goal is to learn deep embedding in a low-dimensional space for each entity, while retaining the structural information and attribute information of the entity.

Figure 3.

Overall architecture of H-AKRL.

3.1 Overall architecture

The overall architecture of the H-AKRL model is shown in Fig. 3. It is composed of three main steps: encoding step, fusion step and decoding step.

Firstly, in the encoding step, all entities in the entity set pass through the structure learning module and the attribute learning module to obtain the structural-view embedding and the attribute-view embedding independently. Structure learning is performed by the TransE [5] model, and attribute learning by the hypergraph neural network.

Secondly, due to the heterogeneity of the information used in the structural-view and the attribute-view, the vector of the same entity in the two separate semantic spaces needs to be fused through the fusion module to achieve dual-channel entity representation. The fused entity vector can retain the structural information and the attribute information simultaneously.

Lastly, the obtained entity embedding matrix will be sent to the decoder for the knowledge graph link prediction task. Since we choose ConvKB [38] as the decoder in this paper, additional relational embedding matrices need to be provided.

Let $d_{s},d_{a},d_{f},d_{r}$ respectively denote the entity embedding dimension of the structure learning module, the entity embedding dimension of the attribute learning module, the entity embedding dimension after fusion module, and the relationship embedding dimension in the decoder. Next, we will introduce the modules used in these three steps in detail.

3.2 Structure learning module

The structure learning module preserves the relationship between entities by learning the structural triplets of the knowledge graph. We adopt TransE [5] model as our structure learning module and interpret the relation as a translation vector from the head entity to the tail entity. Given the structural triplet $(h,r,t)$ in the knowledge graph, the following scoring function is used to measure the rationality of the given triplet:

$\displaystyle f_{r}^{S}({\rm{\bf h}}^{S},{\rm{\bf t}}^{S})=-\|{{\rm{\bf h}}^{S}+{\rm{\bf r}}^{S}-{\rm{\bf t}}^{S}}\|$ (1)

where $\|{\cdot}\|$ represents the L1 or L2 norm, and the superscript $S$ represents the structural-view. The parameters are optimized by minimizing a margin-based negative sampling loss, namely:

$\displaystyle L_{S}=\sum\limits_{(h,r,t)\in T_{S}}{\left({-\log\sigma(\gamma+f_{r}^{S}({\rm{\bf h}}^{S},{\rm{\bf t}}^{S}))-\sum\limits_{i=1}^{p}{\frac{1}{p}\log\sigma(-f_{r}^{S}({\rm{\bf{h}^{\prime}}}_{i}^{S},{\rm{\bf{t}^{\prime}}}_{i}^{S})-\gamma)}}\right)}$ (2)

where $\gamma$ is a fixed margin and $\sigma$ is the sigmoid function. For each valid triplet $(h,r,t)$ , $p$ negative triplets are generated by randomly replacing the head entity or the tail entity; $(h_{i}^{\prime},r,t_{i}^{\prime})$ denotes the $i$ -th negative triplet and satisfies $h_{i}^{\prime}\in E,t_{i}^{\prime}\in E,(h_{i}^{\prime},r,t_{i}^{\prime})\notin T_{S}$ .
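
The behaviour of this loss for a single valid triplet can be sketched as follows. The sketch is written in terms of the distance $d=\|{\rm{\bf h}}+{\rm{\bf r}}-{\rm{\bf t}}\|_{1}$ , that is, the negated score of Eq. (1), so that valid triplets are pushed below the margin and negative triplets above it; this reading, the L1 norm, and all function names are assumptions of the example rather than the implementation used in this paper.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def distance(h, r, t):
    """L1 distance ||h + r - t||, i.e. the negated TransE score."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def structural_loss(pos, negatives, gamma):
    """Loss of one valid triplet and its p corrupted (head, tail) pairs."""
    h, r, t = pos
    loss = -log_sigmoid(gamma - distance(h, r, t))
    for h_neg, t_neg in negatives:
        loss -= log_sigmoid(distance(h_neg, r, t_neg) - gamma) / len(negatives)
    return loss

h, r = [0.0, 0.0], [1.0, 1.0]
t_near, t_far = [1.0, 1.0], [5.0, 5.0]
good = structural_loss((h, r, t_near), [(h, t_far)], gamma=2.0)
bad = structural_loss((h, r, t_far), [(h, t_near)], gamma=2.0)
```

When the valid tail is close to ${\rm{\bf h}}+{\rm{\bf r}}$ and the corrupted tail is far away, the loss is small (`good`); swapping the roles yields a much larger loss (`bad`).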

3.3 Attribute learning module

Classical graph learning theory defines an edge as a point-to-point relationship on the graph: an edge can connect only two nodes and can model only a binary relationship. In hypergraph learning theory, a hyperedge models the group relationship between nodes and can connect any number of nodes, so a hypergraph can model the higher-order correlation between them. Hypergraph learning assumes that, for any two nodes in the hypergraph, the more hyperedges link them, the stronger the data association between them. This assumption is consistent with our research intuition: the more attributes two entities in the knowledge graph have in common, the more similar their entity types are, and the closer their embeddings are in the semantic space.

Inspired by hypergraph learning, given the set of attribute triplets $T_{A}$ , we firstly construct the incidence matrix ${\rm{\bf H}}$ of the attribute learning module, which is defined as follows, for any $e\in E$ , $a\in A$ :

$\displaystyle{\rm{\bf H}}(e,a)=\left\{{\begin{array}{ll}1,&\text{if }\exists v\ \text{s.t.}\ (e,a,v)\in T_{A}\\ 0,&\text{otherwise}\end{array}}\right.$ (3)

where, ${\rm{\bf H}}\in{\rm R}^{n_{e}\times n_{a}}$ . The attribute learning module constructs a hyperedge for each attribute, and entities with the same attribute are each other’s neighbor on this specific hyperedge. Under the attribute-view, the degree of an entity $e$ is defined as $d(e)=\sum\limits_{a\in A}{{\rm{\bf H}}(e,a)}$ , namely, the number of different attributes that entity $e$ possesses; the degree of an attribute $a$ is defined as $d(a)=\sum\limits_{e\in E}{{\rm{\bf H}}(e,a)}$ , namely, the number of different entities with that attribute. ${\rm{\bf D}}_{e}$ and ${\rm{\bf D}}_{a}$ respectively represent the diagonal matrix of entity degree and attribute degree.
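
The construction of ${\rm{\bf H}}$ and of the two degree vectors can be sketched in a few lines. The example reuses the attributes named for “Beijing” and “Nanjing” in the Introduction; the values for “Nanjing” are left as placeholders because they are not given in the text and are ignored by Eq. (3) anyway. The function name `incidence` is an assumption of the example.

```python
def incidence(entities, attributes, attr_triplets):
    """Build the 0-1 incidence matrix H of Eq. (3) as a list of rows."""
    e_idx = {e: i for i, e in enumerate(entities)}
    a_idx = {a: j for j, a in enumerate(attributes)}
    H = [[0] * len(attributes) for _ in entities]
    for e, a, _value in attr_triplets:   # the attribute value is ignored
        H[e_idx[e]][a_idx[a]] = 1
    return H

entities = ["Beijing", "Nanjing", "The Forbidden City"]
attributes = ["Population", "Area", "GRP"]
T_A = [
    ("Beijing", "Population", "21.536 million"),
    ("Beijing", "Area", "16410.54 km2"),
    ("Beijing", "GRP", "3537.13 billion"),
    ("Nanjing", "Population", None),     # placeholder value
    ("Nanjing", "Area", None),           # placeholder value
    ("Nanjing", "GRP", None),            # placeholder value
]

H = incidence(entities, attributes, T_A)
d_e = [sum(row) for row in H]            # entity degrees d(e)
d_a = [sum(col) for col in zip(*H)]      # attribute degrees d(a)
```

An entity without any attribute triplet, such as the third row here, has degree zero; an implementation has to decide how to treat such rows before inverting ${\rm{\bf D}}_{e}$ , for example by filtering those entities out.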

The attribute learning module based on the hypergraph neural network can learn entity embedding and attribute embedding simultaneously, so that the information of entities can be transferred to each other through the shared hyperedge, thereby achieving the purpose of updating the entity embedding. The attribute embedding is also updated along with the updating of entity embedding. Let ${\rm{\bf X}}^{(0)}$ denote the entity embedding in the initial state, ${\rm{\bf X}}^{(k)}\in{\rm R}^{n_{e}\times d_{a}^{(k)}}$ denote the entity embedding after $k$ hyperedge convolutional layers, where $d_{a}^{(k)}$ denotes the embedding dimension of the $k$ -th hyperedge convolutional layer. The hypergraph neural network defines attribute embedding as the aggregation of the embeddings of related entities. Specifically, let ${\rm{\bf Y}}^{(k)}\in{\rm R}^{n_{a}\times d_{a}^{(k)}}$ denote the attribute embedding after k hyperedge convolutional layers, according to Feng et al. [15], we have:

$\displaystyle{\rm{\bf Y}}^{(k)}={\rm{\bf H}}^{T}{\rm{\bf D}}_{e}^{-1/2}{\rm{\bf X}}^{(k)}$ (4)

Equation (4) shows that the attribute embedding ${\rm{\bf Y}}^{(k)}$ is obtained through a linear transformation of the relevant entity embedding ${\rm{\bf X}}^{(k)}$ .

Entity embedding is updated once after each hyperedge convolutional layer. Assuming that after the $k+1$ -th hyperedge convolutional layer, the entity embedding is updated to ${\rm{\bf X}}^{(k+1)}$ , and the update formula is as follows:

$\displaystyle{\rm{\bf X}}^{(k+1)}=\sigma\left({{\rm{\bf D}}_{e}^{-1/2}{\rm{\bf H}}{\rm{\bf D}}_{a}^{-1}{\rm{\bf Y}}^{(k)}{\rm{\bf W}}^{(k+1)}}\right)$ (5)

where ${\rm{\bf W}}^{(k+1)}\in{\rm R}^{d_{a}^{(k)}\times d_{a}^{(k+1)}}$ represents the linear transformation matrix of the $(k+1)$ -th layer, and $\sigma$ is the nonlinear activation function. The updated entity embedding ${\rm{\bf X}}^{(k+1)}$ can be regarded as the aggregation of the embeddings of related attributes ${\rm{\bf Y}}^{(k)}$ in the $k$ -th layer.

It has been proved that HNN [15] is a spectral convolution operation defined on a hypergraph. Hypergraph neural network can effectively extract the high-order correlation between entities through the entity-attribute-entity conversion and aggregation on each layer.

We assume that the attribute learning module is formed by stacking $K$ hyperedge convolutional layers. The initial entity embedding matrix is ${\rm{\bf X}}^{(0)}$ , and the final entity embedding matrix obtained by the attribute learning module is ${\rm{\bf X}}^{(K)}$ . For simplicity, we also assume that the embedding dimension of each hyperedge convolutional layer is the same, namely, $d_{a}^{(1)}=\ldots=d_{a}^{(K)}=d_{a}$ .

Figure 4 depicts the data flow of the attribute learning module. The figure contains a total of 6 entities and 3 attributes, of which attribute “a1” is shared by entities 1, 2, and 6, so the hyperedge corresponding to attribute “a1” connects the nodes corresponding to entities 1, 2, and 6. In the same way, the hyperedge corresponding to attribute “a2” connects the nodes corresponding to entities 1, 3, and 4, and the hyperedge corresponding to attribute “a3” connects the nodes corresponding to entities 3, 4, 5, and 6. The hypergraph neural network in the figure is composed of $K$ hyperedge convolutional layers with embedding dimension $d_{a}=4$ . The entity embedding matrix is ${\rm{\bf X}}^{(k)}\in{\rm R}^{6\times 4},k=0,1,\ldots K$ , where each row represents an entity embedding vector, and the attribute embedding matrix is ${\rm{\bf Y}}^{(k)}\in{\rm R}^{3\times 4},k=0,1,\ldots K$ , where each row represents an attribute embedding vector. A hyperedge convolutional layer mainly performs node feature gathering and edge feature gathering to complete the update of entity representations and attribute representations. Node feature gathering and edge feature gathering are given by Eqs (4) and (5) respectively.

Figure 4.

Schematic diagram of the attribute learning module.
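
One hyperedge convolutional layer on the hypergraph of Fig. 4 can be traced with a small plain-Python sketch of Eqs (4) and (5). The all-ones initial embedding, the identity weight matrix and the choice of ReLU for $\sigma$ are assumptions made to keep the numbers easy to follow; they are not the settings used in the experiments.

```python
import math

# Hyperedges of Fig. 4 (1-based entity ids): a1, a2, a3.
HYPEREDGES = [{1, 2, 6}, {1, 3, 4}, {3, 4, 5, 6}]
N_E, D_A = 6, 4

H = [[1.0 if e + 1 in edge else 0.0 for edge in HYPEREDGES] for e in range(N_E)]
d_e = [sum(row) for row in H]            # entity degrees
d_a = [sum(col) for col in zip(*H)]      # attribute (hyperedge) degrees

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def scale_rows(A, factors):
    """Left-multiply A by the diagonal matrix diag(factors)."""
    return [[f * x for x in row] for row, f in zip(A, factors)]

def hyperedge_conv(X, W):
    inv_sqrt_de = [1.0 / math.sqrt(d) for d in d_e]
    inv_da = [1.0 / d for d in d_a]
    # Eq. (4): node feature gathering, Y = H^T D_e^{-1/2} X
    Y = matmul(transpose(H), scale_rows(X, inv_sqrt_de))
    # Eq. (5): edge feature gathering, X' = sigma(D_e^{-1/2} H D_a^{-1} Y W)
    Z = matmul(scale_rows(matmul(H, scale_rows(Y, inv_da)), inv_sqrt_de), W)
    X_next = [[max(0.0, z) for z in row] for row in Z]   # ReLU (assumed)
    return X_next, Y

X0 = [[1.0] * D_A for _ in range(N_E)]
W1 = [[1.0 if i == j else 0.0 for j in range(D_A)] for i in range(D_A)]
X1, Y0 = hyperedge_conv(X0, W1)
```

`Y0` has one row per attribute ( $3\times 4$ ) and `X1` one row per entity ( $6\times 4$ ), matching the matrix shapes described for Fig. 4. Entity 2, which lies only on hyperedge “a1”, receives exactly the degree-normalized embedding of that hyperedge.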

The attribute learning module is trained through a logistic regression model, so the loss function is defined as follows:

$\displaystyle L_{A}=\sum\limits_{(e,a,v)\in T_{A}}{\left({-\log g_{a}({\rm{\bf e}}^{A},{\rm{\bf a}}^{A})-\sum\limits_{i=1}^{p}{\frac{1}{p}\log\left({1-g_{a}({\rm{\bf e}}^{A},{\rm{\bf{a}^{\prime}}}_{i}^{A})}\right)}}\right)}$ (6)

where $g_{a}$ is the logistic regression discriminant function. For each true triplet $(e,a,v)$ , we generate $p$ negative examples $(e,{a}_{i}^{\prime},v)$ by replacing the attribute, each satisfying ${\rm{\bf H}}(e,{a}_{i}^{\prime})=0$ .
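
A minimal sketch of this loss for one entity-attribute pair is given below. The exact form of $g_{a}$ is not spelled out above, so the sketch assumes the common choice of a sigmoid applied to the inner product of the entity and attribute embeddings; this choice and all function names are assumptions of the example.

```python
import math
import random

def g(e_vec, a_vec):
    """Assumed discriminant: sigmoid of the inner product <e, a>."""
    z = sum(x * y for x, y in zip(e_vec, a_vec))
    return 1.0 / (1.0 + math.exp(-z))

def sample_negative_attributes(h_row, p, rng):
    """Pick p attribute indices a' with H(e, a') = 0 for the given row of H."""
    absent = [j for j, flag in enumerate(h_row) if flag == 0]
    return [rng.choice(absent) for _ in range(p)]

def attribute_loss(e_vec, a_vec, neg_a_vecs):
    """Cross-entropy of one true pair and its p negative attributes."""
    loss = -math.log(g(e_vec, a_vec))
    for a_neg in neg_a_vecs:
        loss -= math.log(1.0 - g(e_vec, a_neg)) / len(neg_a_vecs)
    return loss

rng = random.Random(0)
neg_ids = sample_negative_attributes([1, 0, 1, 0], p=3, rng=rng)
loss = attribute_loss([1.0, 1.0], [2.0, 2.0], [[-2.0, -2.0]])
```

The sampler fails if an entity possesses every attribute, since no negative attribute exists in that case; a full implementation would need to skip such entities.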

3.4 Training objective

Combining the above two modules, the structure learning module captures entity similarity based on entity relationships, while the attribute learning module captures entity similarities based on entity attributes. The structure learning module and the attribute learning module together constitute the encoder part, and the overall optimization goal of joint learning is:

$\displaystyle L_{En}=L_{S}+L_{A}$ (7)

3.5 Fusion module and decoder

This paper uses ConvKB [38] as the decoder. Before decoding, since the two learning modules construct two mutually independent semantic spaces, it is necessary to go through the fusion module to map the entity embeddings from different learning modules into the same vector space.

For any $e\in E$ , ${\rm{\bf e}}^{S}$ and ${\rm{\bf e}}^{A}$ respectively represent the entity embedding obtained from the structure learning module and the attribute learning module. The entity embedding after the fusion module is:

$\displaystyle{\rm{\bf e}}^{F}={\rm{\bf W}}^{F}({{\rm{\bf e}}^{S}||{\rm{\bf e}}^{A}})$ (8)

where, $||$ represents the concatenating operation, and ${\rm{\bf W}}^{F}\in{\rm R}^{d_{f}\times(d_{s}+d_{a})}$ represents the linear transformation matrix of the fusion module.
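
The fusion step of Eq. (8) amounts to a concatenation followed by a matrix-vector product, as the following sketch shows. The dimensions ( $d_{s}=2$ , $d_{a}=1$ , $d_{f}=2$ ) and the weight values are arbitrary illustrative choices.

```python
def fuse(e_s, e_a, W_f):
    """Eq. (8): e^F = W^F (e^S || e^A), with W^F of shape d_f x (d_s + d_a)."""
    concat = e_s + e_a                   # list concatenation realizes e^S || e^A
    return [sum(w * x for w, x in zip(row, concat)) for row in W_f]

e_s = [0.5, -1.0]                        # structural-view embedding, d_s = 2
e_a = [2.0]                              # attribute-view embedding, d_a = 1
W_f = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 1.0]]                  # d_f = 2 rows, d_s + d_a = 3 columns
e_f = fuse(e_s, e_a, W_f)
```

The second output coordinate mixes one structural and one attribute coordinate, which is how the fused vector carries information from both views.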

ConvKB uses convolution operations to analyze the global embedding features in each dimension of the triplet, while retaining the translation characteristics of the triplet. The scoring function is defined as follows:

$\displaystyle f_{r}^{De}({\rm{\bf h}}^{F},{\rm{\bf t}}^{F})=\left({\mathop{||}\limits_{m=1}^{\Omega}\mbox{ReLU}\left({[{\rm{\bf h}}^{F},{\rm{\bf r}}^{De},{\rm{\bf t}}^{F}]\ast\omega^{m}}\right)}\right)\cdot{\rm{\bf W}}^{De}$ (9)

where $\omega^{m}$ represents the $m$ -th convolution kernel, $\Omega$ represents the total number of convolution kernels, $\ast$ represents the convolution operation, ${\rm{\bf W}}^{De}\in{\rm R}^{\Omega d_{f}\times 1}$ represents the final linear transformation matrix for calculating the score of the triplet, and ${\rm{\bf r}}^{De}$ represents the embedding of the relation $r$ , which requires random initialization. The loss function of the decoder part is:

$\displaystyle L_{De}=\sum\limits_{(h,r,t)\in T_{S}}{\left({\log\left({1+\exp(f_{r}^{De}({\rm{\bf h}}^{F},{\rm{\bf t}}^{F}))}\right)+\sum\limits_{i=1}^{p}{\frac{1}{p}\log\left({1-\exp(f_{r}^{De}({\rm{\bf{h}^{\prime}}}^{F},{\rm{\bf{t}^{\prime}}}^{F}))}\right)}}\right)}$ (10)

The complete training process of H-AKRL is shown in Algorithm 1.

Algorithm 1: Training process of H-AKRL
Input: Knowledge graph $G$, max epochs $Q$
Get randomly initialized entity embedding
for $q=1,2,\ldots,Q$ do
    Minimize $L_{En}$ under the relation view
    Minimize $L_{En}$ under the attribute view
end for
Get combined entity embedding
Get randomly initialized relation embedding
for $q=1,2,\ldots,Q$ do
    Minimize $L_{De}$ for decoder
end for
Link prediction for $G$
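The two-stage schedule of Algorithm 1 can be sketched as below. This is only an outline of the control flow, not the authors' implementation: the four callables are hypothetical placeholders for one optimization step under the relation view, one under the attribute view, the fusion step of Eq. (8), and one decoder step on $L_{De}$.

```python
from typing import Callable, List


def train_h_akrl(relation_view_step: Callable[[], None],
                 attribute_view_step: Callable[[], None],
                 fuse_embeddings: Callable[[], None],
                 decoder_step: Callable[[], None],
                 max_epochs: int) -> List[str]:
    """Run the encoder stage, fuse once, then run the decoder stage.

    Returns a log of the phases executed, in order.
    """
    log: List[str] = []
    # Stage 1: train the encoder, alternating between the two views each epoch.
    for _ in range(max_epochs):
        relation_view_step()
        log.append("encoder:relation")
        attribute_view_step()
        log.append("encoder:attribute")
    # Combine the two entity embeddings into one vector space.
    fuse_embeddings()
    log.append("fuse")
    # Stage 2: train the decoder on the fused embeddings.
    for _ in range(max_epochs):
        decoder_step()
        log.append("decoder")
    return log


def noop() -> None:
    """Stand-in for a real optimization step."""


print(train_h_akrl(noop, noop, noop, noop, max_epochs=2))
```

The point of the sketch is the ordering: the decoder is trained only after the encoder loops have finished and the embeddings have been fused.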

4. Experiments and results

4.1 Datasets

This paper constructs two data sets, FB15k-237-Attr and FB24K, to evaluate the link prediction performance of the proposed H-AKRL model.

FB15k-237-Attr: It is mainly composed of the FB15k-237 data set [52] and related attribute triplets [16]. The FB15k-237 data set mostly describes facts about movies, actors, awards, sports and sports teams; its relation triplets are extracted from Freebase, with the inverse relations in FB15K [5] deleted. So that relation triplets and attribute triplets can supplement each other, this paper only selects entities that appear in both data sets.

FB24K: The original data set was created by Lin et al. [29] and is also extracted from Freebase. Compared with FB15k-237-Attr, it has richer attribute information. We apply the same filtering as for FB15k-237-Attr.

Table 2
Statistics of datasets

Dataset	#Entity	#Relation	#Attribute	#Attr-triplets	#Train	#Valid	#Test
FB15k-237-Attr	12402	161	115	29247	163153	9384	10934
FB24K	19480	509	281	119330	181977	8738	9518

The relevant statistical information of the constructed data sets is shown in Table 2.

4.2 Experiment settings

The goal of the link prediction task is to predict a triplet whose head entity or tail entity is missing, i.e., given a combination $(?,r,t)$ of a relation $r$ and a tail entity $t$, predict the head entity $h$, or given a combination $(h,r,?)$ of a head entity $h$ and a relation $r$, predict the tail entity $t$. In this task, we first remove the head entity or the tail entity of each triplet in the validation set and test set, and replace it with every other entity to form corrupted triplets. We then score these corrupted triplets together with the original triplet and sort them in descending order of score. Finally, the rank of the correct entity is recorded. The link prediction task emphasizes the final ranking of the correct entity rather than just finding the best entity. We choose mean rank (MR) and the proportion of test triplets whose correct entity is ranked in the top N (Hits@N, $\text{N}=1,3,10$) as evaluation metrics. Since MR is highly sensitive to outliers, we also choose mean reciprocal rank (MRR) as an evaluation metric to obtain a more stable result. Lower MR values and higher MRR or Hits@N values indicate better model performance.
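The ranking protocol and the three metrics can be sketched as follows. The function names are illustrative assumptions. Candidates are ranked by descending score as described above, candidates already known to form true triplets can be skipped (the filtered setting of Bordes et al. [5]), and ties are resolved in favour of the correct entity, which is one of several possible conventions.

```python
from typing import Dict, Iterable, Sequence, Set


def rank_of_correct(scores: Dict[str, float], correct: str,
                    known_true: Iterable[str] = ()) -> int:
    """Rank (1 = best) of the correct entity among all candidate entities.

    Candidates in known_true are ignored; pass nothing for the raw setting.
    """
    skip: Set[str] = set(known_true)
    target = scores[correct]
    better = sum(1 for entity, score in scores.items()
                 if entity != correct and entity not in skip and score > target)
    return better + 1


def summarize(ranks: Sequence[int], ns: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MR, MRR and Hits@N from the recorded ranks."""
    count = len(ranks)
    metrics = {
        "MR": sum(ranks) / count,
        "MRR": sum(1.0 / rank for rank in ranks) / count,
    }
    for n in ns:
        metrics[f"Hits@{n}"] = sum(rank <= n for rank in ranks) / count
    return metrics


# Toy candidate scores for one query (hypothetical values).
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
print(rank_of_correct(scores, "c"))                     # raw rank: 3
print(rank_of_correct(scores, "c", known_true={"a"}))   # filtered rank: 2
print(summarize([1, 4, 20]))
```

In the last call a single rank of 20 pulls MR up to about 8.3 while MRR stays near 0.43, which illustrates why MRR is the more stable of the two.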

We assume that the entity embedding dimension of the structure learning module, and the entity embedding dimension and the relation embedding dimension of the fusion module, are the same, namely $d_{s}=d_{f}=d_{r}=d$, where $d$ is set by grid search over $d\in\{25,40,50\}$. The entity embedding dimension of the attribute learning module $d_{a}$ and the learning rate $lr$ are also set by grid search, with $d_{a}\in\{64,100,128\}$ and $lr\in\{0.01,0.005,0.001\}$. Other hyperparameter settings are: negative sample size $p=10$, fixed margin in the structure learning module $\gamma=24$, number of hyperedge convolution layers in the attribute learning module $K=2$, total number of decoder convolution kernels $\Omega=64$, and a dropout value of 0.5 for the attribute learning module and the decoder.
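The search space above contains $3\times 3\times 3=27$ configurations. A minimal sketch of enumerating them is given below; the dictionary keys and the helper name `configurations` are illustrative assumptions, while the values are the ones listed in this section.

```python
from itertools import product
from typing import Dict, Iterator, List

# Values searched over (from the text); key names are hypothetical.
GRID: Dict[str, List[float]] = {
    "d": [25, 40, 50],            # d_s = d_f = d_r
    "d_a": [64, 100, 128],        # attribute module embedding dimension
    "lr": [0.01, 0.005, 0.001],   # learning rate
}

# Values held fixed (from the text).
FIXED: Dict[str, float] = {
    "p": 10,          # negative sample size
    "gamma": 24,      # margin in the structure learning module
    "K": 2,           # hyperedge convolution layers
    "omega": 64,      # decoder convolution kernels
    "dropout": 0.5,
}


def configurations(grid: Dict[str, List[float]],
                   fixed: Dict[str, float]) -> Iterator[Dict[str, float]]:
    """Yield one full hyperparameter dictionary per grid point."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        config = dict(fixed)
        config.update(zip(keys, values))
        yield config


all_configs = list(configurations(GRID, FIXED))
print(len(all_configs))  # 27
```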

We select TransE [5], DistMult [65] and ConvE [11] as our baseline models. Following the experimental results of Rossi et al. [44], we also compare against current state-of-the-art link prediction models, including ComplEx [54], RotatE [50], TuckER [3] and RSNs [18]. To allow these models to process attribute information, we follow the approach of Shang et al. [46] and express attributes as attribute nodes connecting related entities.

We train the model with the adaptive moment estimation (Adam) algorithm [26] and, following Bordes et al. [5], use the filtered setting for evaluation.

4.3 Results and analysis

Table 3 shows the overall link prediction performance of the H-AKRL model and the selected baseline models on the two data sets.

It can be seen from Table 3 that on the FB15k-237-Attr data set, the H-AKRL model achieves the best results on all indicators, while on the FB24K data set it performs best on Hits@10 and MR and second best, behind RotatE, on the remaining three indicators. This illustrates the advantage of the H-AKRL model in link prediction.

In addition, we find that RotatE also performs well on these two data sets. We believe the main reason is that RotatE defines entity embeddings and relation embeddings in the complex domain. Operators on the complex domain have stronger expressive power and can better model symmetric relations. Therefore, in future work, we can consider defining our model on the complex domain and using RotatE as our structure learning module.

Table 3
Link prediction for FB15k-237-Attr and FB24K datasets

Model	FB15k-237-Attr					FB24K
	Hits@1	Hits@3	Hits@10	MR	MRR	Hits@1	Hits@3	Hits@10	MR	MRR
TransE	0.107	0.199	0.351	281.43	0.185	0.177	0.332	0.484	252.25	0.283
DistMult	0.090	0.170	0.296	607.31	0.158	0.142	0.276	0.432	549.58	0.239
ComplEx	0.087	0.170	0.299	803.91	0.156	0.201	0.330	0.462	699.78	0.290
ConvE	0.139	0.243	0.398	260.78	0.223	0.213	0.342	0.499	219.02	0.308
RotatE	0.147	0.246	0.375	359.82	0.224	0.249	0.383	0.522	290.02	0.343
TuckER	0.116	0.192	0.290	437.74	0.176	0.090	0.151	0.251	577.29	0.143
RSNs	0.101	0.187	0.327	397.31	0.175	0.218	0.345	0.493	305.09	0.310
H-AKRL	0.148	0.260	0.414	199.31	0.235	0.227	0.368	0.534	142.62	0.329

${}^{*}$ The best score is in bold and second best score is underlined.

Previous work has shown that TuckER is fully expressive when the entity embedding dimension is larger than the number of entities and the relation embedding dimension is larger than the number of relations; that is, it has the theoretical potential to learn any valid graph correctly, without being hindered by intrinsic limitations. However, the experimental results in Table 3 show that TuckER does not perform well on link prediction on these data sets. A possible reason is that the entity embedding dimension used in this paper is much smaller than the number of entities, which considerably weakens the expressive capability of TuckER. As stated by Meilicke et al. [6], when the embedding dimension is low, a fully expressive model does not necessarily perform better on link prediction. In addition, expressing attributes as attribute nodes increases both the number of entities and the number of relations, potentially further hurting TuckER’s performance.

To illustrate the effect of the H-AKRL model on long-tail entities, we take the FB15k-237-Attr data set as an example for in-depth analysis. Figure 5 shows the distribution of entity degree in the FB15k-237-Attr data set, where the degree of an entity is defined as the number of structural triplets containing the entity. It can be seen that the data set contains a large number of long-tail entities.
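The degree definition used here can be sketched in a few lines. The function names and the toy triplets are hypothetical; a triplet whose head and tail are the same entity is counted once, since the definition counts triplets rather than endpoints.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def entity_degrees(triplets: Iterable[Triplet]) -> Counter:
    """Count, for each entity, the structural triplets that contain it."""
    degrees: Counter = Counter()
    for head, _relation, tail in triplets:
        degrees[head] += 1
        if tail != head:  # a self-loop is still a single triplet
            degrees[tail] += 1
    return degrees


def entities_with_degree(degrees: Counter, degree: int) -> List[str]:
    """Return the entities whose degree equals the given value, sorted by name."""
    return sorted(entity for entity, count in degrees.items() if count == degree)


# Toy graph (hypothetical entities and relations).
toy = [("A", "r1", "B"), ("A", "r2", "C"), ("B", "r1", "C"), ("D", "r3", "A")]
degrees = entity_degrees(toy)
print(dict(degrees))                     # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(entities_with_degree(degrees, 1))  # ['D']
```

Selecting the entities of degree 1, 2 and 3 in this way yields the subsets on which the long-tail comparison is run.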

We select the TransE, RotatE and H-AKRL models and carry out comparative link prediction experiments on entities with degrees of 1, 2 and 3, respectively. The experimental results are shown in Table 4.

Table 4
Link prediction results by degrees

Model	Degree = 1			Degree = 2			Degree = 3
	MRR	MR	Hits@10	MRR	MR	Hits@10	MRR	MR	Hits@10
TransE w/o attr.	0.00297	7346.16	0.000	0.06620	2033.86	0.179	0.06659	1227.24	0.151
TransE w/ attr.	0.06416	1812.47	0.079	0.07623	865.40	0.180	0.11241	940.50	0.220
RotatE w/o attr.	0.00069	8045.26	0.000	0.08050	3474.46	0.054	0.14037	2129.61	0.220
RotatE w/ attr.	0.05597	2590.21	0.101	0.14054	839.71	0.232	0.14355	1097.46	0.252
H-AKRL	0.06430	1543.24	0.105	0.18405	674.94	0.268	0.18045	416.20	0.269

${}^{*}$ The best score is in bold.

Figure 5. Distribution of entity degree.

It should be noted that Table 4 reports two sets of results for TransE and RotatE in each comparative experiment: one obtained using only structural triplets, and the other obtained by following the “bridge” approach of Shang et al. [46], which combines entity structural information and attribute information. Table 4 clearly shows that, in all comparative experiments and for both TransE and RotatE, link prediction with attribute information outperforms link prediction with structural information alone. This shows that entity attribute information and entity structural information are complementary, and that attribute information can enhance link prediction performance.

In addition, although RotatE shows strong performance in link prediction over all entities in Table 3, it still has shortcomings in the representation learning of long-tail entities, as shown in Table 4. Compared with the two baseline models, the performance of the H-AKRL model on long-tail entities is improved on every metric. Taking Hits@10 on entities with a degree of 3 as an example, H-AKRL (0.269) achieves relative increases of 22.2% and 6.7% over TransE (0.220) and RotatE (0.252) with attributes, respectively. This shows that the H-AKRL model proposed in this paper can effectively use entity attribute information and enrich the semantics of long-tail entities. It also further suggests that the hypergraph neural network can better model the similarity between entities in the attribute view.

Figure 6. Ablation study result.

For further illustration, we conduct an ablation study to show the role of the attribute learning module in our model. We delete the attribute learning module and the fusion module, so that the entity embedding obtained by the structure learning module is sent directly to the decoder to complete the link prediction task. The result is shown in Fig. 6.

For compactness, we normalize the experimental results: the results obtained from TransE in Table 3 are all set to 1, and the results of TransE+ConvKB and H-AKRL are scaled proportionally, so that each scaled result is the ratio of the corresponding result to that of TransE.
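The normalization described above is a per-metric division by the TransE value. The sketch below uses the FB15k-237-Attr rows of Table 3 for TransE and H-AKRL purely as sample input; which data set Fig. 6 is drawn from is not stated here, and the TransE+ConvKB values are not reported in the text, so they are omitted. The function name is an illustrative assumption.

```python
from typing import Dict


def normalize_to_baseline(results: Dict[str, float],
                          baseline: Dict[str, float]) -> Dict[str, float]:
    """Express each metric as a ratio to the baseline's value for that metric."""
    return {metric: value / baseline[metric] for metric, value in results.items()}


# FB15k-237-Attr rows of Table 3.
transe = {"Hits@1": 0.107, "Hits@3": 0.199, "Hits@10": 0.351, "MR": 281.43, "MRR": 0.185}
h_akrl = {"Hits@1": 0.148, "Hits@3": 0.260, "Hits@10": 0.414, "MR": 199.31, "MRR": 0.235}

ratios = normalize_to_baseline(h_akrl, transe)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.3f}")
```

Because lower MR is better, a ratio below 1 on MR is an improvement over TransE, whereas for Hits@N and MRR an improvement shows up as a ratio above 1.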

It can be seen from Fig. 6 that H-AKRL achieves the best performance on all five indicators, the results of TransE+ConvKB lie between those of H-AKRL and TransE, and TransE has the worst link prediction performance. This further illustrates two points: (1) Compared with a separate structure learning module, the encoder-decoder architecture proposed in this paper can effectively improve the embedding performance of the entity. We speculate that, although TransE has drawbacks in dealing with one-to-many and many-to-many relation types, the convolution operation at the decoder side of the encoder-decoder framework adopted by H-AKRL enhances the correlation between entities and relations, thus mitigating the inherent drawbacks of TransE in relation processing and improving link prediction performance; (2) The attribute learning module based on the hypergraph neural network can better extract entity similarity under the attribute view, enriching the semantic information of the entity and showing clear advantages in improving link prediction performance.

5. Conclusions

This paper studies the problem of knowledge graph embedding with entity attributes and proposes H-AKRL, a knowledge graph embedding model based on hypergraph neural networks that simultaneously learns the attribute information and structural information of entities. In the encoder, TransE and a hypergraph neural network serve as the structure learning module and the attribute learning module respectively, and ConvKB is selected as the decoder. H-AKRL achieves significant improvements in experiments on multiple real-world data sets, especially in the link prediction of long-tail entities, which indicates that the hypergraph neural network can effectively model entity similarity under the attribute view.

Due to the inherent flaws of its model assumptions, TransE cannot correctly model one-to-many and many-to-many relations. Using other structure learning models will be our future work.

In addition, in the attribute learning module, attributes differ in how well they distinguish between entities. Before constructing the incidence matrix, we can consider using an attention mechanism to assign different attention coefficients to different attributes, which is another direction in which our model can be improved.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61806221).

References

1. Auer et al., Dbpedia: A nucleus for a web of open data, The semantic web, 2007, 722–735.
2. Bai, Zhang and Torr P.H., Hypergraph convolution and hypergraph attention, Pattern Recognition 110 (2021), 107637.
3. Balaževic, Allen and Hospedales T.M., TuckER: Tensor Factorization for Knowledge Graph Completion, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019.
4. Bollacker et al., Freebase: a collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.
5. Bordes et al., Translating embeddings for modeling multi-relational data, Neural Information Processing Systems, 2013.
6. Meilicke et al., Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion, in: International Semantic Web Conference, 2018, pp. 3–20.
7. Carlson et al., Toward an architecture for never-ending language learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2010.
8. Chen et al., Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018.
9. Cohen W.W. et al., Scalable Neural Methods for Reasoning with a Symbolic Knowledge Base, in: International Conference on Learning Representations, 2019.
10. Das et al., Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning, in: International Conference on Learning Representations, 2018.
11. Dettmers et al., Convolutional 2d knowledge graph embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
12. Dong et al., Knowledge vault: A web-scale approach to probabilistic knowledge fusion, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
13. Fan et al., Transition-based knowledge graph embedding with relational mapping properties, in: Proceedings of the 28th Pacific Asia Conference On Language, Information and Computing, 2014.
14. Feng et al., Knowledge graph embedding by flexible translation, in: Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning, 2016.
15. Feng et al., Hypergraph neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
16. Garcia-Duran and Niepert, Kblrn: End-to-end learning of knowledge base representations with latent, relational, and numerical features, arXiv preprint arXiv:1709.04676, 2017.
17. Guan, Song and Liao, Knowledge graph embedding with concepts, Knowledge-Based Systems 164 (2019), 38–44.
18. Guo, Sun and Hu, Learning to exploit long-term relational dependencies in knowledge graphs, in: International Conference on Machine Learning, 2019.
19. Hao et al., Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.
20. Hoffart et al., YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia, Artificial Intelligence 194 (2013), 28–61.
21. Hong et al., Deep attributed network embedding by preserving structure and attribute information, in: IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
22. Hou and Tang, RoSANE: Robust and scalable attributed network embedding for sparse networks, Neurocomputing 409 (2020), 231–243.
23. et al., Knowledge graph completion with adaptive sparse transfer matrix, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
24. et al., Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015.
25. Jiang et al., Dynamic Hypergraph Neural Networks, IJCAI, 2019, 2635–2641.
26. Kingma D.P. and Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.
27. Yin and Chen, SEAL: Semisupervised Adversarial Active Learning on Attributed Graphs, in: IEEE Transactions on Neural Networks and Learning Systems, 2020.
28. Lin et al., Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2015.
29. Lin, Liu and Sun, Knowledge representation learning with entities, attributes and relations, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016.
30. Liu and Yang, Analogical inference for multi-relational embeddings, in: Proceedings of the 34th International Conference on Machine Learning, 2017.
31. Liu et al., Probabilistic reasoning via deep learning: Neural association models, arXiv preprint arXiv:1603.07704, 2016.
32. et al., Differentiating Concepts and Instances for Knowledge Graph Embedding, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
33. Mahdisoltani, Biega and Suchanek, Yago3: A knowledge base from multilingual wikipedias, in: 7th Biennial Conference on Innovative Data Systems Research, 2014.
34. Miller G.A., WordNet: a lexical database for English, Communications of the ACM, 1995, 39–41.
35. Mitchell et al., Never-ending learning, Communications of the ACM, 2018, 103–115.
36. Nathani et al., Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
37. Neelakantan, Roth and McCallum, Compositional Vector Space Models for Knowledge Base Completion, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015.
38. Nguyen D.Q. et al., A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.
39. Nguyen D.Q., Nguyen T.D. and Phung, A Relational Memory-based Embedding Model for Triple Classification and Search Personalization, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
40. Nickel, Rosasco and Poggio, Holographic embeddings of knowledge graphs, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
41. Nickel, Tresp and Kriegel H.P., A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on Machine Learning, 2011.
42. Pezeshkpour, Chen and Singh, Embedding Multimodal Relational Data for Knowledge Base Completion, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
43. Rastogi et al., Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
44. Rossi et al., Knowledge graph embedding for link prediction: A comparative analysis, ACM Transactions on Knowledge Discovery from Data, 2021, 1–49.
45. Schlichtkrull et al., Modeling relational data with graph convolutional networks, in: European Semantic Web Conference, 2018.
46. Shang et al., End-to-end structure-aware convolutional networks for knowledge base completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
47. Shi and Weninger, ProjE: Embedding projection for knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2017.
48. Socher et al., Reasoning with neural tensor networks for knowledge base completion, in: Proceedings of the 26th International Conference on Neural Information Processing Systems, 2013.
49. Suchanek F.M., Kasneci and Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, 2007.
50. Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, in: International Conference on Learning Representations, 2018.
51. Tay et al., Multi-task neural network for non-discrete attribute prediction in knowledge graphs, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017.
52. Toutanova and Chen, Observed versus latent features for knowledge base and text inference, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, 2015.
53. Trisedya B.D. and Zhang, Entity alignment between knowledge graphs using attribute embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
54. Trouillon et al., Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on Machine Learning, 2016.
55. Vashishth et al., Composition-based Multi-Relational Graph Convolutional Networks, in: International Conference on Learning Representations, 2019.
56. Vashishth et al., RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
57. et al., A capsule network-based embedding model for knowledge graph completion and search personalization, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.
58. Wang et al., Knowledge graph embedding via graph attenuated attention networks, IEEE Access 8 (2019), 5212–5224.
59. Wang et al., Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2014.
60. et al., Dual-view hypergraph neural networks for attributed graph learning, Knowledge-Based Systems, 2021, 107185.
61. and Wang, Knowledge graph embedding with numeric attributes of entities, in: Proceedings of The Third Workshop on Representation Learning for NLP, 2018.
62. Xiao, Huang and Zhu, From one point to a manifold: knowledge graph embedding for precise link prediction, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016.
63. Xie et al., Image-embodied knowledge representation learning, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017.
64. Xie et al., Representation learning of knowledge graphs with entity descriptions, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
65. Yang et al., Embedding entities and relations for learning and inference in knowledge bases, in: International Conference on Learning Representations, 2015.
66. Zhang et al., Multi-view knowledge graph embedding for entity alignment, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019, pp. 5429–5435.
67. Zhang et al., Distilling knowledge from well-informed soft labels for neural relation extraction, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
68. Zhang et al., Representation learning of knowledge graphs with entity attributes, IEEE Access 8 (2020), 7435–7441.
69. Zhou et al., Improving conversational recommender systems via knowledge graph based semantic fusion, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
