HPRE: Leveraging hierarchy-aware paired relation vectors for knowledge graph embedding

Abstract

Knowledge graphs exhibit a typical hierarchical structure and are widely applied across artificial intelligence domains. However, large-scale knowledge graphs are often incomplete, which limits their performance in downstream tasks. Knowledge graph embedding methods have emerged as a primary solution for improving knowledge graph completeness. These methods represent entities and relations as low-dimensional vectors and focus on handling relation patterns and multi-relation types. Hierarchical relationships, a crucial feature of real-world knowledge graphs, have received comparatively little attention. To bridge this gap, we propose a novel knowledge graph embedding model called Hierarchy-Aware Paired Relation Vectors Knowledge Graph Embedding (HPRE). By leveraging 2D coordinates, HPRE models relation patterns, multi-relation types, and hierarchical features in the knowledge graph. Specifically, HPRE employs paired relation vectors to capture the distinct characteristics of head and tail entities, facilitating a better fit for relation patterns and multi-relation scenarios. Additionally, HPRE employs angular coordinates to differentiate entities at various levels of the hierarchy, effectively representing the hierarchical nature of the knowledge graph. The experimental results show that the HPRE model can effectively learn the hierarchical features of the knowledge graph and achieves state-of-the-art results on multiple real-world datasets for the link prediction task.

Keywords

Knowledge graph completion, link prediction, knowledge graph embedding, knowledge graph representation

1 Introduction

A knowledge graph (KG) is a knowledge base composed of nodes and edges that represents complex real-world relationships in the form of a graph. With the rapid development of artificial intelligence, knowledge graphs have been widely used in various fields, such as information extraction [1], question answering [2], and recommendation systems [3]. Knowledge graphs are potent representations of structured information, capturing relationships between entities and providing a foundation for various applications. However, these knowledge graphs are often incomplete, with important relations and facts missing. The incompleteness of knowledge graphs significantly hampers the performance of downstream tasks that rely on accurate and comprehensive knowledge representations. For instance, in question-answering systems, the absence of certain relationships can lead to incorrect or incomplete responses. In recommendation systems, missing information about user preferences or item characteristics leads to suboptimal recommendations. Similarly, incomplete knowledge graphs limit the effectiveness of search and retrieval algorithms in information retrieval. Knowledge graph completion (KGC) alleviates this incompleteness by using the existing data in the knowledge base to infer relations or entities that are missing from the graph. The primary objective of KGC is to enrich the data of the knowledge graph through this inference process. KGC can be formulated as predicting missing relations or entities in a knowledge graph. We denote the knowledge graph as KG = {(h, r, t)}, where h represents the head entity, r represents the relation, and t represents the tail entity. The objective of KGC is to predict the missing triples (h, r, t) in the knowledge graph.

Researchers widely use knowledge graph representation learning in the knowledge graph completion task. This approach learns representations of entities and relations in a low-dimensional continuous vector space to obtain improved semantic representations. Knowledge graph embedding offers better generalization and greater ease of transfer to downstream tasks. An example of a knowledge graph is illustrated in Fig. 1, which contains multiple relationships and entities arranged hierarchically, for instance “City,” “Honolulu,” “State,” and “California.” The hierarchy of “City” is higher than that of “Honolulu,” while the hierarchy of “State” is higher than that of “California.”

Fig. 1

An example of the knowledge graph.

To address knowledge graph incompleteness, many researchers have proposed knowledge graph embedding models, such as TransE [4], TransH [5], and RotatE [6]. These models represent entities and relations as low-dimensional vectors such that the head entity, tail entity, and relation vectors satisfy specific mathematical relations. The RotatE model makes full use of the spatial relationship by modeling a relation as a rotation from the head entity to the tail entity. This mechanism allows it to better model symmetry/anti-symmetry, inverse, and composition patterns. In practical applications, however, knowledge graphs often contain multiple relation types, which the RotatE model cannot model accurately [7]. The hierarchical relationship between entities is also crucial for model training, yet it is not considered in the RotatE model [8]. Multi-dimensional space can increase the expressive power of a model. MRotatE [7] and StructurE [9] make full use of 2D space and achieve excellent performance in link prediction tasks. Rotate3D [10] maps relations and entities into 3D vector space, further improving performance. These models, however, still do not incorporate hierarchical information.

Three main issues have been identified in current models: (1) Translation models fail to handle some relation patterns and relation types; for example, the TransE model cannot handle symmetric/anti-symmetric relation patterns, and the RotatE model cannot handle complex relations. (2) Translation models cannot differentiate between entities at different semantic levels. For instance, the head entity “palm” and the tail entity “tree” in the triple (palm, _hypernym, tree) lie at distinct levels of the semantic hierarchy, with “palm” at a lower level than “tree”. (3) Translation models cannot differentiate between entities at the same level. For instance, although the head entities of the two triples (palm, _hypernym, tree) and (olive, _hypernym, tree) lie at the same semantic level, they represent different meanings. A model’s performance can be enhanced by capturing such semantic distinctions within a level. Hierarchical features are therefore significant for knowledge graph embedding models.

The Hierarchy-Aware Paired Relation Vectors Knowledge Graph Embedding (HPRE) model proposed in this paper incorporates hierarchical features while fully modeling multiple relation patterns and relation types, addressing the issues mentioned above. The HPRE model is composed of two parts: the paired relation part and the hierarchy-aware part. The paired relation part follows the PairRE [11] model, which splits each relation into two vectors: a head entity relation vector and a tail entity relation vector. By assigning different roles to the two vectors, the model can fit symmetry/anti-symmetry, inverse, and composition patterns. At the same time, the paired relation part can handle multiple relation types. However, this relation vector decomposition ignores the hierarchical features of the knowledge graph. Therefore, we designed a hierarchy-aware part. By mapping the relation vector to the polar coordinate space and using the polar angle to model the hierarchical features of the elements in the knowledge graph, the expressive ability of the model is enhanced. Figure 2 shows the mapping of entities in the knowledge graph to polar coordinates. Different entities in the knowledge graph have tree-like hierarchical relationships. These relationships are mapped to polar coordinates, and angles distinguish elements at different levels.

Fig. 2

Illustration of entities at different levels of the hierarchy.

The contributions of this paper are summarized as follows:

The HPRE model framework we present effectively models several aspects of relational data. It captures simple and complex relation patterns, multi-relation types, and hierarchical features. This covers both simple one-to-one relationships between entities and more complex multi-relation patterns involving multiple entities and relationships. Furthermore, the framework is designed to account for the hierarchical nature of many datasets, enabling it to capture the relationships between individual entities and the broader hierarchical structure of the knowledge graph.

In the HPRE model, capturing hierarchical features is essential for accurately representing knowledge graphs. Traditional coordinate systems, such as Cartesian coordinates, may not be the best choice for capturing such features because of their linear and orthogonal nature. In contrast, polar coordinates offer a more natural way of describing hierarchical structures that exhibit circular or radial symmetry. Using polar coordinate pairs, the HPRE model can better capture the multi-scale and multi-level features common in complex systems.

The experimental results indicate that the HPRE model achieves commendable performance in the link prediction task and effectively discriminates the hierarchical characteristics within the knowledge graph.

The rest of this paper is organized as follows. Section 2 reviews related work on knowledge graph embedding. Section 3 introduces the problem definition, the notation used in this paper, and the proposed model. Section 4 describes the experimental process and results. Finally, Section 5 presents our conclusion and future work.

2 Related work

Knowledge graph embedding is widely used in link prediction tasks. At present, most researchers project relations and entities into low-dimensional vectors and define operations between them [4–6]. Many methods improve the expressive ability of the model by adding auxiliary information, such as path information [12–14], text descriptions [15–17], and type features [18, 19]. In this paper, we do not discuss models that use additional information. The main goal of a knowledge graph embedding model is to represent entities and relations through low-dimensional vectors. In the rest of this section, we introduce the related work on knowledge graph embedding, which we divide into three types: Translation models, Multiplication models, and Network models. For each type, we describe representative models and their connection to the HPRE model.

Translation models. Translation models treat a relation as a transformation between the head entity and the tail entity in a geometric vector space. TransE [4] is the first translation model; it uses h + r ≈ t to express the interaction between relations and entities, so that the position of the vector t is close to h + r. The idea of treating relations as operations between the head and tail entities has inspired many researchers, and many models have been proposed based on TransE. TransH [5] maps entities to different vector spaces according to the relation type in order to handle complex relation patterns. It regards each relation as a hyperplane and calculates the difference between the projections of the head and tail entity vectors onto that hyperplane. In a knowledge graph, entities have multiple meanings, and the same entity can have different meanings under different relations. Therefore, entities that are semantically similar in the entity space may be far apart in a particular relation space. To solve this problem, TransR [20] defines separate entity and relation spaces and performs translation in the relation space. To further handle complex relation patterns, RotatE [6] treats a relation as a rotation from the head entity vector to the tail entity vector in complex vector space. It can be shown that RotatE models the symmetry/anti-symmetry, inverse, and composition patterns. QuatE [21] captures the interactions between entities and relations using the Hamilton product; it rotates in four-dimensional space, which gives individual components more expressiveness. Rotate3D [10] extends RotatE by projecting entity vectors into three-dimensional space and can handle complex relation patterns and reasoning more efficiently.
The MRotatE [7] model proposes a unified framework that models both relation patterns and multi-relation types. The PairRE [11] model divides each relation into two parts, a head entity relation vector and a tail entity relation vector, and fits relation patterns by training the relation vectors of the two roles separately. StructurE [22] improves link prediction performance by capturing both relational structure-context and edge structure-context. DualE [44] leverages dual quaternions, a mathematical framework that extends traditional quaternions by incorporating both rotational and translational components. By representing entities and relations using dual quaternions, the model can capture the semantic meaning and geometric transformations associated with the relations in the knowledge graph. ModE [8] leverages a modular design that captures the hierarchical relations between entities and relations. It decomposes the embedding space into multiple modules, where each module focuses on learning representations for a specific level of the knowledge graph hierarchy. HAKE [8] takes a different approach by explicitly modeling the hierarchical organization of the knowledge graph: it maps entities into the polar coordinate system. DiriE [45] utilizes Bayesian inference to measure the relations between entities and learns binary embeddings of knowledge graphs to model complex relation patterns. DensE [43] decomposes each relation into an SO(3) group-based rotation operator and a scaling operator in three-dimensional (3-D) Euclidean space.

Multiplication models. Multiplication models are another important family of knowledge graph embedding models. DistMult [23] performs a bilinear interaction between the entity vectors and the relation vector, where the relation vector is treated as a diagonal matrix. ComplEx [24] learns the features of a knowledge graph by using complex-valued embeddings and latent factorization. HolE [25] uses circular correlation as the compositional operator, which can better capture contextual features and makes training faster than DistMult. ANALOGY [26] proposes a multiplication framework for knowledge graph embedding that models the analogical properties of the embedded components. SimplE [27] learns two embeddings for each entity, offering linear complexity and interpretability.

Network models. Neural network models are also widely used in knowledge graph embedding. ConvE [28] and ConvKB [29] use convolutional neural networks to learn features of triples. Convolutional neural networks can better fit contextual features and capture the implicit relationships among the head entity, relation, and tail entity. ConvR [30] uses the relation as the filter of the convolutional neural network, enhancing the interaction between entities and relations. ConEx [31] combines ComplEx [24] and ConvE [28] and achieves state-of-the-art experimental results. HypER [32] uses a hyper-network to map relations to complex vector space and combines it with a convolutional neural network to obtain good results on the link prediction task. Graph convolutional networks (GCNs) are also widely used in knowledge graph embedding. R-GCN [33], A2N [34], KBGAT [35], CompGCN [36], and EIGAT [37] use GCNs to extract the structural features of the graph under an encoder-decoder framework. CapsE [41] leverages capsule networks to capture hierarchical relations and improve the expressiveness of the embeddings. Table 1 summarizes the details of several advanced KG completion models.

Table 1
Details of several knowledge graph embedding models

Model	Score function	Parameters
DistMult	$h^{⊤} diag (r) t$	$h, r, t \in ℝ^{k}$
ComplEx	$Re (h^{⊤} diag (r) \bar{t})$	$h, r, t \in ℂ^{k}$
TransE	${‖ h + r - t ‖}_{2}^{2}$	$h, r, t \in ℝ^{k}$
TransH	${‖ (h - w_{r}^{⊤} h w_{r}) + d_{r} - (t - w_{r}^{⊤} t w_{r}) ‖}_{2}^{2}$	$h, r, t, w_{r}, d_{r} \in ℝ^{k}$
TransR	${‖ h M_{r} + r - t M_{r} ‖}_{2}^{2}$	$h, r, t \in ℝ^{k}$
ConvE	$f (vec (f ([r, h] ⊗ ω)) W) t$	$h, r, t \in ℝ^{k}$
ConvKB	$f ([h, r, t] ⊗ ω) t$	$h, r, t \in ℝ^{k}$
CapsE	$capsnet (f ([h, r, t] ⊗ ω))$	$h, r, t \in ℝ^{k}$
RotatE	${‖ h \circ r - t ‖}_{2}$	$h, r, t \in ℂ^{k}$
QuatE	$h \odot r^{\triangle} \cdot t$	$h, r, t \in ℍ^{k}$
DualE	$< a_{h}^{p}, a_{t} > + < b_{h}^{p}, b_{t} > + < c_{h}^{p}, c_{t} > + < d_{h}^{p}, d_{t} >$	$h, r, t \in ℍ^{k}$
PairRE	${‖ h \circ r^{H} - t \circ r^{T} ‖}_{2}$	$h, t, r^{H}, r^{T} \in ℝ^{k}$
HPRE	$α {‖ h_{p} \circ r_{p}^{head} - t_{p} \circ r_{p}^{tail} ‖}_{2} + β {‖ \sin ((h_{h} + r_{h} - t_{h}) / 2) ‖}_{1}$	$h_{p}, t_{p}, r_{p}^{head}, r_{p}^{tail} \in ℝ^{k}$; $h_{h}, r_{h}, t_{h} \in [0, 2 π)^{k}$
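
To make the notation of Table 1 concrete, three of the simpler score functions can be sketched in plain Python. This is only an illustrative sketch: the function names and toy vectors are our own assumptions, and the RotatE relation is parameterized by phase angles so that every relation component has unit modulus. For the distance-based functions (TransE, RotatE) a value near zero indicates a good fit, whereas for DistMult a larger value does.

```python
import math

def transe_score(h, r, t):
    """Squared L2 distance ||h + r - t||_2^2 (TransE row of Table 1)."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))

def distmult_score(h, r, t):
    """Trilinear product h^T diag(r) t (DistMult row of Table 1)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def rotate_score(h, phases, t):
    """L2 distance ||h o r - t||_2 with r_i = exp(i * phase_i) (RotatE row)."""
    rotated = [hi * complex(math.cos(p), math.sin(p)) for hi, p in zip(h, phases)]
    return math.sqrt(sum(abs(x - ti) ** 2 for x, ti in zip(rotated, t)))

# Toy real-valued vectors (illustrative assumptions, not learned embeddings).
h = [1.0, 2.0]
r = [0.5, -1.0]
t = [1.5, 1.0]
transe_value = transe_score(h, r, t)      # h + r equals t exactly here
distmult_value = distmult_score(h, r, t)

# Toy complex-valued vectors: rotating hc by the phases reproduces tc.
hc = [complex(1, 0), complex(0, 1)]
phases = [math.pi / 2, math.pi]
tc = [complex(0, 1), complex(0, -1)]
rotate_value = rotate_score(hc, phases, tc)

print(transe_value, distmult_value, rotate_value)
```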

HPRE differs from other link prediction models in several key respects. Unlike traditional translation and multiplication models, HAKE explicitly considers the hierarchical structure present in knowledge graphs. HAKE incorporates the hierarchical relations between entities, enabling it to capture both fine-grained and high-level semantic information. This hierarchy-aware approach enhances the representation power of HAKE and improves its ability to model complex relationships. Compared with the HAKE model, HPRE explicitly models the pairwise interactions between entities, which HAKE does not. Meanwhile, HPRE generates relation-specific representations for each entity and relation type. By learning distinct embeddings for different relations, HPRE can capture the unique characteristics associated with each relation, which enables the model to distinguish between relation types and make more accurate predictions. Therefore, compared to the HAKE model, HPRE has advantages in handling complex relation patterns, particularly many-to-many relations. Compared to neural network models, the HPRE model exhibits stronger interpretability, superior performance, and faster execution speed. Overall, the HPRE model stands out in the link prediction task by incorporating hierarchy-aware embeddings and joint entity-relation embeddings and by achieving state-of-the-art performance. These features make HPRE a promising approach for knowledge graph completion and link prediction tasks.

3 Proposed approach

In this section, we present the implementation details of the HPRE model. The HPRE model comprises two main components: the paired relation part and the hierarchy-aware part. We begin by introducing the problem formulation and the definitions used in this paper. Subsequently, we describe the two model components and the fusion process that integrates them. Next, we present the loss function of the HPRE model. Finally, we show analytically that the model captures relation patterns, handles multiple relation types, and leverages hierarchical features.

3.1 Problem formulation and notations

The link prediction task is crucial in evaluating knowledge graph embedding. This task aims to predict the missing triples in the knowledge graph by leveraging the existing triples. To address the incompleteness of the knowledge graph, we pose queries of the form (h, r, ?) or (?, r, t). We consider the set of triples in the knowledge graph as positive samples, while triples that do not exist in the knowledge graph form the negative samples. In this paper, we represent the entity set as $E$, the relation set as $R$, and the triple set as $T$. Each triple consists of a head entity, a relation, and a tail entity, represented as h, r, and t, respectively. We use the notation (h, r, t) to represent a triple, where $h, t \in E$, $r \in R$, and $(h, r, t) \in T$. Furthermore, we assign vectors to the head entity, relation, and tail entity, denoted as h, r, and t, respectively. Hence, we represent the vectors of a triple as (h, r, t).
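
As an illustration of how negative samples can be drawn, the following minimal sketch corrupts either the head or the tail of a positive triple and rejects candidates that already exist in the knowledge graph. The uniform corruption strategy, the function name `corrupt_triple`, and the `max_tries` parameter are assumptions made for this example; the text above only states that non-existing triples serve as negatives.

```python
import random

def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a triple absent from known_triples, built by replacing the
    head or the tail of `triple` with a uniformly sampled entity."""
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        # Corrupt the head or the tail with equal probability.
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known_triples:
            return candidate
    raise RuntimeError("could not find a negative sample")

# Toy knowledge graph built from the examples used in this paper.
triples = {("palm", "_hypernym", "tree"), ("olive", "_hypernym", "tree")}
entities = sorted({e for h, _, t in triples for e in (h, t)})

rng = random.Random(0)  # fixed seed for reproducibility
positive = ("palm", "_hypernym", "tree")
negative = corrupt_triple(positive, entities, triples, rng)
print(negative)
```

Note that this treats every unobserved triple as false, which is only an approximation for an incomplete knowledge graph.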

To leverage the hierarchical features present in knowledge graphs, we propose the HPRE model. The HPRE model comprises two key components: the paired relation part and the hierarchy-aware part. This model represents the head entity embedding, relation embedding, and tail entity embedding by h, r, and t, respectively. We assign distinct symbols to distinguish the embeddings of the different components. Within the paired relation part, we use h_p and t_p to denote the embeddings of the head and tail entity, respectively. Additionally, $r_{p}^{head}$ and $r_{p}^{tail}$ represent the relation feature embeddings for the head and tail entities within the paired relation part. In the hierarchy-aware part, h_h and t_h represent the embeddings of the head entity and tail entity, respectively, and the relation embedding is denoted by r_h. Consequently, we can describe the entity and relation embeddings as follows:

$h = [h_{p}, h_{h}]$ (1)

$r = [r_{p}^{head}, r_{p}^{tail}, r_{h}]$ (2)

$t = [t_{p}, t_{h}]$ (3)

where [·] denotes concatenation. In the HPRE model, each entity embedding is thus divided into two parts and each relation embedding into three parts, which serve as inputs to the paired relation and hierarchy-aware components.
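
The decomposition in Eqs. (1)-(3) can be sketched as simple slicing. The sketch below assumes that every part has the same dimension k, so that an entity embedding has length 2k and a relation embedding has length 3k; the helper names are hypothetical.

```python
def split_entity(vec, k):
    """Split an entity embedding [e_p, e_h] of length 2k into its
    paired relation part e_p and hierarchy-aware part e_h."""
    if len(vec) != 2 * k:
        raise ValueError("entity embedding must have length 2k")
    return vec[:k], vec[k:]

def split_relation(vec, k):
    """Split a relation embedding [r_p_head, r_p_tail, r_h] of length 3k."""
    if len(vec) != 3 * k:
        raise ValueError("relation embedding must have length 3k")
    return vec[:k], vec[k:2 * k], vec[2 * k:]

k = 2
h = [0.3, -1.2, 0.5, 3.0]            # [h_p, h_h]
r = [1.0, 2.0, -1.0, 0.5, 0.1, 6.0]  # [r_p_head, r_p_tail, r_h]

h_p, h_h = split_entity(h, k)
r_p_head, r_p_tail, r_h = split_relation(r, k)
print(h_p, h_h, r_p_head, r_p_tail, r_h)
```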

Relation Pattern. We introduce several relation patterns that significantly influence knowledge graphs. (1) Symmetry/anti-symmetry. A relation r is symmetric if $(e_{1}, r, e_{2}) \in T \Leftrightarrow (e_{2}, r, e_{1}) \in T$ and anti-symmetric if $(e_{1}, r, e_{2}) \in T \Rightarrow (e_{2}, r, e_{1}) \notin T$. (2) Inverse. Relations r_1 and r_2 are inverse to each other, expressing opposite directions between entities, if $(e_{1}, r_{1}, e_{2}) \in T \Leftrightarrow (e_{2}, r_{2}, e_{1}) \in T$. (3) Composition. Relation r_3 is the composition of relations r_1 and r_2 if $(e_{1}, r_{1}, e_{2}) \in T \land (e_{2}, r_{2}, e_{3}) \in T \Rightarrow (e_{1}, r_{3}, e_{3}) \in T$.
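
These definitions can be checked directly on a set of triples. The following sketch is an illustration only; the toy triples and relation names are hypothetical, and each test additionally requires the relation to occur at least once so that the pattern does not hold vacuously.

```python
def pairs_of(triples, r):
    """All (head, tail) pairs connected by relation r."""
    return {(h, t) for h, rel, t in triples if rel == r}

def is_symmetric(triples, r):
    pairs = pairs_of(triples, r)
    return bool(pairs) and all((t, h) in pairs for h, t in pairs)

def is_antisymmetric(triples, r):
    pairs = pairs_of(triples, r)
    return bool(pairs) and all((t, h) not in pairs for h, t in pairs)

def are_inverse(triples, r1, r2):
    p1, p2 = pairs_of(triples, r1), pairs_of(triples, r2)
    return bool(p1) and p1 == {(t, h) for h, t in p2}

def composes(triples, r1, r2, r3):
    """True if every path e1 -r1-> e2 -r2-> e3 implies (e1, r3, e3)."""
    p1, p2, p3 = (pairs_of(triples, r) for r in (r1, r2, r3))
    implied = {(a, c) for a, b in p1 for b2, c in p2 if b == b2}
    return bool(implied) and implied <= p3

# Hypothetical toy triples.
triples = {
    ("alice", "married_to", "bob"), ("bob", "married_to", "alice"),
    ("palm", "_hypernym", "tree"), ("tree", "_hyponym", "palm"),
    ("honolulu", "city_in", "hawaii"), ("hawaii", "state_of", "usa"),
    ("honolulu", "located_in_country", "usa"),
}

print(is_symmetric(triples, "married_to"))
print(is_antisymmetric(triples, "_hypernym"))
print(are_inverse(triples, "_hypernym", "_hyponym"))
print(composes(triples, "city_in", "state_of", "located_in_country"))
```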

Multi-Relation Types. This paper introduces multi-relation types encompassing the following categories: (1) One-to-one relation. In this type of relation, each entity is directly associated with precisely one other entity. It represents a one-to-one correspondence between entities, establishing a unique mapping. (2) One-to-many relation. This relation indicates that a single entity can establish multiple connections with other entities. It implies that one entity is the source, linking to multiple target entities. (3) Many-to-one relation. In this type of relation, multiple entities connect to a single entity. It signifies that several source entities converge onto a common target entity. (4) Many-to-many relations. This relation portrays a complex and diverse connectivity pattern, where multiple entities are associated with multiple other entities. It allows for intricate and varied relations among entities in a knowledge graph. We aim to comprehensively understand and analyze the diverse connection patterns in knowledge graphs by exploring these multi-relation types.
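
In practice, a relation can be assigned to one of these four categories from the data. The sketch below uses the average number of tails per head (tph) and heads per tail (hpt) with a threshold of 1.5, a convention commonly used in the knowledge graph embedding literature; the threshold, the function name, and the toy triples are assumptions of this example rather than part of the definitions above.

```python
from collections import defaultdict

def classify_relation(triples, r, threshold=1.5):
    """Classify relation r as one-to-one, one-to-many, many-to-one,
    or many-to-many from its average tails-per-head and heads-per-tail."""
    tails_of = defaultdict(set)
    heads_of = defaultdict(set)
    for h, rel, t in triples:
        if rel == r:
            tails_of[h].add(t)
            heads_of[t].add(h)
    if not tails_of:
        raise ValueError(f"relation {r!r} does not occur in the triples")
    tph = sum(len(v) for v in tails_of.values()) / len(tails_of)
    hpt = sum(len(v) for v in heads_of.values()) / len(heads_of)
    head_side = "one" if hpt < threshold else "many"
    tail_side = "one" if tph < threshold else "many"
    return f"{head_side}-to-{tail_side}"

# Hypothetical toy triples.
triples = {
    ("palm", "_hypernym", "tree"), ("olive", "_hypernym", "tree"),
    ("usa", "has_state", "hawaii"), ("usa", "has_state", "california"),
    ("honolulu", "capital_of", "hawaii"),
}

for relation in ("_hypernym", "has_state", "capital_of"):
    print(relation, classify_relation(triples, relation))
```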

3.2 Overall architecture

Figure 3 illustrates the overall architecture of the model, showing its components and data flow. The input layer comprises the embedding vectors of the head entity, the relation, and the tail entity. Pink dots denote the head entity vector, red dots denote the relation vector, and orange dots denote the tail entity vector. The model is partitioned into the paired relation part and the hierarchy-aware part. The embedding vectors of the head entity and the tail entity are each split into two separate vectors, which serve as inputs to the paired relation part and the hierarchy-aware part, respectively. Similarly, the embedding vector of the relation is divided into three vectors, two of which are input to the paired relation part and one to the hierarchy-aware part. The paired relation part handles multiple relation types and patterns, while the hierarchy-aware part captures the hierarchical features of relations. The model derives the final prediction result by combining the outputs of these two parts.

Fig. 3

Illustration of the architecture of HPRE.

3.3 The paired relation part

The paired relation part is crucial in handling relation patterns and multiple relation types. In this part, we map the head and tail entities to a low-dimensional vector space, denoted as $h_{p}, t_{p} \in ℝ^{k}$, where k is the dimension of the entity embedding. Each relation is represented by a pair of vectors $r_{p}^{head}, r_{p}^{tail} \in ℝ^{k}$, where k is likewise the dimension of the relation embedding. We can formulate the paired relation part as follows:

$h_{p} \circ r_{p}^{head} = t_{p} \circ r_{p}^{tail}$ (4)

where ∘ is the Hadamard product. The paired relation vectors represent the features of the head and tail entities separately, which enables the relation embedding to handle relation patterns, including symmetry/anti-symmetry, inverse, and composition. For a symmetric relation, where $(e_{1}, r_{1}, e_{2}) \in T$ and $(e_{2}, r_{1}, e_{1}) \in T$, we can derive the following relationship:

$\begin{matrix} e_{1} \circ r_{p 1}^{head} = e_{2} \circ r_{p 1}^{tail} \land e_{2} \circ r_{p 1}^{head} = e_{1} \circ r_{p 1}^{tail} \\ \Rightarrow {r_{p 1}^{head}}^{2} = {r_{p 1}^{tail}}^{2} \end{matrix}$ (5)

For an anti-symmetric relation, where $(e_{1}, r_{1}, e_{2}) \in T$ and $(e_{2}, r_{1}, e_{1}) \notin T$, we can derive the following relationship:

$\begin{matrix} e_{1} \circ r_{p 1}^{head} = e_{2} \circ r_{p 1}^{tail} \land e_{2} \circ r_{p 1}^{head} \neq e_{1} \circ r_{p 1}^{tail} \\ \Rightarrow {r_{p 1}^{head}}^{2} \neq {r_{p 1}^{tail}}^{2} \end{matrix}$ (6)

For inverse relations, where $(e_{1}, r_{1}, e_{2}) \in T$ and $(e_{2}, r_{2}, e_{1}) \in T$, we can derive the following relationship:

$\begin{matrix} e_{1} \circ r_{p 1}^{head} = e_{2} \circ r_{p 1}^{tail} \land e_{2} \circ r_{p 2}^{head} = e_{1} \circ r_{p 2}^{tail} \\ \Rightarrow r_{p 1}^{head} \circ r_{p 2}^{head} = r_{p 1}^{tail} \circ r_{p 2}^{tail} \end{matrix}$ (7)

For composition relations, where $(e_{1}, r_{1}, e_{2}) \in T$, $(e_{2}, r_{2}, e_{3}) \in T$, and $(e_{1}, r_{3}, e_{3}) \in T$, we can derive the following relationship:

$\begin{matrix} e_{1} \circ r_{p 1}^{head} = e_{2} \circ r_{p 1}^{tail} \land \\ e_{2} \circ r_{p 2}^{head} = e_{3} \circ r_{p 2}^{tail} \land \\ e_{1} \circ r_{p 3}^{head} = e_{3} \circ r_{p 3}^{tail} \\ \Rightarrow r_{p 1}^{tail} \circ r_{p 2}^{tail} \circ r_{p 3}^{head} = r_{p 1}^{head} \circ r_{p 2}^{head} \circ r_{p 3}^{tail} \end{matrix}$ (8)
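
The symmetry and anti-symmetry conditions of Eqs. (5) and (6) can be checked numerically. In the sketch below, all vectors are hand-picked toy values (assumptions, not learned embeddings), and the helper `tail_from_head` solves Eq. (4) for the tail entity under the assumption that no component of the tail relation vector is zero. When the element-wise squares of the two relation vectors agree, the reversed triple also satisfies Eq. (4); when they differ, it does not.

```python
import math

def hadamard(a, b):
    """Element-wise (Hadamard) product of two vectors."""
    return [x * y for x, y in zip(a, b)]

def tail_from_head(e_head, r_head, r_tail):
    """Solve e_head o r_head = e_tail o r_tail for e_tail (r_tail non-zero)."""
    return [e * rh / rt for e, rh, rt in zip(e_head, r_head, r_tail)]

def close(a, b):
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b))

e1 = [0.5, 4.0, -3.0]
r_head = [2.0, -0.5, 1.0]

# Symmetric case: r_tail = +/- r_head element-wise, so the squares are equal.
r_tail_sym = [-2.0, 0.5, 1.0]
e2 = tail_from_head(e1, r_head, r_tail_sym)
forward_sym = close(hadamard(e1, r_head), hadamard(e2, r_tail_sym))
backward_sym = close(hadamard(e2, r_head), hadamard(e1, r_tail_sym))

# Anti-symmetric case: the squares differ in the first component.
r_tail_anti = [1.0, 0.5, 1.0]
e2_anti = tail_from_head(e1, r_head, r_tail_anti)
forward_anti = close(hadamard(e1, r_head), hadamard(e2_anti, r_tail_anti))
backward_anti = close(hadamard(e2_anti, r_head), hadamard(e1, r_tail_anti))

print(forward_sym, backward_sym, forward_anti, backward_anti)
```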

The paired relation part can handle multiple relation types, including one-to-many, many-to-one, and many-to-many relations. A relation naturally carries features of both its head entities and its tail entities, and handling multiple relation types essentially amounts to associating a head entity (tail entity) with several tail entities (head entities) under a specific relation. Drawing on this characteristic, our model splits each relation into two parts during training, a design that our experiments validate as effective. The score function of the triples in the paired relation part is defined as follows:

$f_{r, p} (h_{p}, t_{p}) = {∥ h_{p} \circ r_{p}^{head} - t_{p} \circ r_{p}^{tail} ∥}_{2}$ (9)

The purpose of this score function is to measure how well a triple satisfies Eq. (4). Because Eq. (9) is a distance, a positive sample should receive a small value and a negative sample a large value. Let (h_i, r_i, t_i) represent a true triple in the knowledge base, and (h_j, r_j, t_j) represent a false one. Our objective is to minimize $f_{r_{i}, p} (h_{i, p}, t_{i, p})$ and maximize $f_{r_{j}, p} (h_{j, p}, t_{j, p})$. While the paired relation part deals well with relation patterns and multiple relation types, it cannot address the hierarchical features in the knowledge base.
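
A minimal sketch of Eq. (9) is given below. The vectors are toy values chosen by hand (assumptions, not learned embeddings) to illustrate a many-to-one case: because the second component of the head relation vector is zero, two different head entities are projected onto the same point, so Eq. (4) holds exactly for both and the value of Eq. (9) is zero, while a corrupted tail yields a positive distance.

```python
import math

def paired_score(h_p, r_head, r_tail, t_p):
    """Eq. (9): L2 norm of h_p o r_head - t_p o r_tail."""
    diff = [h * rh - t * rt for h, rh, t, rt in zip(h_p, r_head, t_p, r_tail)]
    return math.sqrt(sum(d * d for d in diff))

# Toy embeddings for (palm, _hypernym, tree) and (olive, _hypernym, tree).
palm = [2.0, 5.0]
olive = [2.0, -3.0]
tree = [2.0, 0.0]
car = [7.0, 1.0]      # hypothetical unrelated entity used as a corrupted tail
r_head = [1.0, 0.0]   # zero component ignores the dimension where heads differ
r_tail = [1.0, 1.0]

score_palm = paired_score(palm, r_head, r_tail, tree)
score_olive = paired_score(olive, r_head, r_tail, tree)
score_corrupt = paired_score(palm, r_head, r_tail, car)
print(score_palm, score_olive, score_corrupt)
```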

3.4 The hierarchy-aware part

The hierarchy-aware part of the model models the hierarchical features present in knowledge graphs. Within this component, the head and tail entity embeddings are $h_{h}, t_{h} \in [0, 2 π)^{k}$ and the relation embedding is $r_{h} \in [0, 2 π)^{k}$, where k is the embedding dimension. In the hierarchy-aware part, the embeddings of a true triple are expected to satisfy:

$(h_{h} + r_{h}) \mod 2 π = t_{h}$ (10)

In the HPRE model, the hierarchical relationships among entities are captured using a polar coordinate system. This coordinate system provides a robust framework for representing the hierarchical structure present in the knowledge graph. In the polar coordinate system, entities are represented using radial and angular coordinates. The radial coordinate is crucial in modeling entities at different hierarchy levels. It captures an entity’s hierarchical depth, or distance from a reference point such as the root of the hierarchy. Entities closer to the reference point have smaller radial coordinates, indicating their higher position in the hierarchy, while entities further away have larger radial coordinates, signifying their lower position. The angular coordinate, on the other hand, is employed to distinguish entities at the same level of the hierarchy. It enables the model to capture the variations within a particular hierarchical level. By assigning different angular coordinates to entities within the same level, the HPRE model can represent their unique characteristics within the hierarchy. The corresponding score function is:

$f_{r, h} (h_{h}, t_{h}) = {∥ sin ((h_{h} + r_{h} - t_{h}) / 2) ∥}_{1}$ (11)

where sin(·) is the sine function. Incorporating the sine function into the base score function enables it to capture and express various levels of periodicity. The use of a score function with periodic regularity is inspired by the pRotatE [6] model, which has demonstrated its efficacy in handling such hierarchical patterns.
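The score function of Eq. (11) can be illustrated with a minimal sketch in plain Python. The function name `hierarchy_score` and the toy phase vectors are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math

def hierarchy_score(h_phase, r_phase, t_phase):
    """L1 norm of sin((h + r - t) / 2) over all dimensions, as in Eq. (11)."""
    return sum(
        abs(math.sin((h + r - t) / 2.0))
        for h, r, t in zip(h_phase, r_phase, t_phase)
    )

# Toy phases in [0, 2*pi); the second dimension wraps past 2*pi.
h = [0.5, 6.0]
r = [1.0, 1.0]

# A tail that satisfies (h + r) mod 2*pi = t exactly gives a score of ~0.
t = [(a + b) % (2 * math.pi) for a, b in zip(h, r)]
exact = hierarchy_score(h, r, t)

# Shifting every tail phase by pi is the largest possible mismatch:
# |sin(pi / 2)| = 1 in each dimension.
t_far = [(x + math.pi) % (2 * math.pi) for x in t]
worst = hierarchy_score(h, r, t_far)
```

Because |sin(x / 2)| is unchanged when x shifts by 2π, the score does not depend on whether h + r has wrapped around, which is why no explicit modulo appears in Eq. (11).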

3.5 Jointly training

The paired relation and hierarchy-aware parts exhibit limitations when applied individually in knowledge graph applications. However, their functionalities are complementary, motivating the integration of both parts in the HPRE model. The HPRE model can simultaneously address relational patterns, multi-relation scenarios, and hierarchy-level features by combining the paired relation part and the hierarchy-aware part. Incorporating relation pairs and hierarchical features enhances the interaction between entities and relations, improving the robustness and performance of the HPRE model. We present the formulation of HPRE as follows:

$\left\{ \begin{matrix} h_{p} \circ r_{p}^{head} = t_{p} \circ r_{p}^{tail} & h_{p}, t_{p}, r_{p}^{head}, r_{p}^{tail} \in ℝ^{k} \\ (h_{h} + r_{h}) \mod 2 π = t_{h} & h_{h}, r_{h}, t_{h} \in [0, 2 π)^{k} \end{matrix} \right.$ (12)

The composite score function for HPRE captures the complex relationships between entities and relations in the knowledge graph while considering the relational patterns and hierarchy features associated with relations. The score function combines the paired relation part score function and the hierarchy-aware part score function to generate a total score for triple evaluation. The composite score function is defined as follows:

$\begin{matrix} f_{r} (h, t) & = α f_{r, p} (h_{p}, t_{p}) + β f_{r, h} (h_{h}, t_{h}) \\ & = α {∥ h_{p} \circ r_{p}^{head} - t_{p} \circ r_{p}^{tail} ∥}_{2} + β {∥ \sin ((h_{h} + r_{h} - t_{h}) / 2) ∥}_{1} \end{matrix}$ (13)

where α and β are the weight parameters of the HPRE model. The values of α and β determine the relative contributions of the paired relation part and the hierarchy-aware part, so the impact of each part on the experimental results can be assessed by varying them. Overall, the composite score function combines the paired relation score function and the hierarchical score function, enabling the model to handle both hierarchical information and complex relationships in knowledge graphs. To optimize the score function of the HPRE model, we employ self-adversarial training [6] along with the negative sampling loss function:

$L = - \log σ (γ - f_{r} (h, t)) - \sum_{i = 1}^{n} p (h_{i}^{'}, r, t_{i}^{'}) \log σ (f_{r} (h_{i}^{'}, t_{i}^{'}) - γ)$ (14)

where γ is a fixed margin, σ is the sigmoid activation function, and $(h_{i}^{'}, r, t_{i}^{'})$ is the i-th negative triple. $p (h_{i}^{'}, r, t_{i}^{'})$ is the probability distribution of negative sampling triples, which can be represented as follows:

$p (h_{j}^{'}, r, t_{j}^{'} \mid \{(h_{i}, r_{i}, t_{i})\}) = \frac{\exp ξ f_{r} (h_{j}^{'}, t_{j}^{'})}{\sum_{i} \exp ξ f_{r} (h_{i}^{'}, t_{i}^{'})}$ (15)

where ξ is the temperature of sampling. Algorithm 1 shows the optimization process of the HPRE model.

Algorithm 1 HPRE Model Algorithm

Input: Training set $T = \{(h, r, t)\}$ , entity set $E$ , relation set $R$ , margin γ, embedding dimension k.

Output: Embeddings of entities $E^{'} = \{e_{1}, e_{2}, \dots, e_{‖ E ‖}\}$ , embeddings of relations $R^{'} = \{r_{1}, r_{2}, \dots, r_{‖ R ‖}\}$ .

1: Initialize:

2: $r \leftarrow zeros (‖ R ‖, k)$ for each $r \in R$

3: $r \leftarrow uniform (r, (- \frac{γ}{k}, \frac{γ}{k}))$ for each $r \in R$

4: $e \leftarrow zeros (‖ E ‖, k)$ for each $e \in E$

5: for iteration = 1, 2, …, N do

6: $e \leftarrow uniform (e, (- \frac{γ}{k}, \frac{γ}{k}))$

7: /* Sample a minibatch of size b */

8: T_batch ← sample (T, b)

9: /* Initialize the set of triple pairs */

10: U_batch ← ∅

11: for (h, r, t) ∈ T_batch do

12: /* Sample a corrupted triple */

13: $(h^{'}, r, t^{'}) \leftarrow sample (T_{(h, r, t)}^{'})$

14: U_batch ← U_batch ⋃ { ((h, r, t) , (h′, r, t′)) }

15: end for

16: Compute the score $f_{r} (h, t) = α {‖ h_{p} \circ r_{p}^{head} - t_{p} \circ r_{p}^{tail} ‖}_{2} + β {‖ \sin ((h_{h} + r_{h} - t_{h}) / 2) ‖}_{1}$

17: Update the embeddings with respect to the loss $L = - \log σ (γ - f_{r} (h, t)) - \sum_{i = 1}^{n} p (h_{i}^{'}, r, t_{i}^{'}) \log σ (f_{r} (h_{i}^{'}, t_{i}^{'}) - γ)$

18: end for
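Steps 16 and 17 of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration that follows Eqs. (13)–(15) literally, not the authors' PyTorch code: the function names, the dictionary layout of the embeddings, and the toy values are all assumptions, and the sampling weights are treated as constants with no gradient step shown. Whether the weights of Eq. (15) should instead be computed from a plausibility score such as γ − f_r, as is common in implementations of self-adversarial sampling, is not stated in the text.

```python
import math

def paired_score(h_p, r_head, t_p, r_tail):
    """L2 norm of h_p * r_head - t_p * r_tail (element-wise products)."""
    return math.sqrt(sum((h * rh - t * rt) ** 2
                         for h, rh, t, rt in zip(h_p, r_head, t_p, r_tail)))

def hierarchy_score(h_h, r_h, t_h):
    """L1 norm of sin((h_h + r_h - t_h) / 2)."""
    return sum(abs(math.sin((h + r - t) / 2.0))
               for h, r, t in zip(h_h, r_h, t_h))

def hpre_score(e, alpha, beta):
    """Composite score of Eq. (13); `e` is a dict of embedding lists."""
    return (alpha * paired_score(e["h_p"], e["r_head"], e["t_p"], e["r_tail"])
            + beta * hierarchy_score(e["h_h"], e["r_h"], e["t_h"]))

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_adversarial_loss(pos_score, neg_scores, gamma, xi):
    """Loss of Eq. (14) with the softmax weights of Eq. (15)."""
    shift = max(xi * s for s in neg_scores)  # subtract max for stability
    exps = [math.exp(xi * s - shift) for s in neg_scores]
    total = sum(exps)
    weights = [v / total for v in exps]
    loss = -log_sigmoid(gamma - pos_score)
    loss -= sum(w * log_sigmoid(s - gamma)
                for w, s in zip(weights, neg_scores))
    return loss, weights

# Toy positive triple that satisfies both constraints of Eq. (12).
positive = {
    "h_p": [1.0, 2.0], "r_head": [2.0, 1.0],
    "t_p": [2.0, 1.0], "r_tail": [1.0, 2.0],
    "h_h": [0.5], "r_h": [1.0], "t_h": [1.5],
}
pos = hpre_score(positive, alpha=3.5, beta=1.0)
loss, weights = self_adversarial_loss(pos, [5.0, 9.0], gamma=6.0, xi=1.0)
```

In this sketch a lower score means a better fit, so the positive triple is pushed below the margin γ and the negatives above it, matching the signs in Eq. (14).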

4 Experiments

In this section, we present the procedure and results of the experiments, focusing on the evaluation metrics, comparison of results with other models in the link prediction task, and further analysis of the effects of various components of the HPRE model.

4.1 Datasets

To evaluate the effectiveness of the HPRE model, we conducted experiments on five datasets: WN18 [4], WN18RR [28], FB15k [38], FB15k-237 [39], and YAGO3-10 [40]. Table 2 presents their statistics.

The WN18 and WN18RR datasets are derived from WordNet, a lexical database that organizes words and concepts based on their semantic relationships. WN18 includes 40,943 entities and 18 unique relations; each triple represents a relationship between two entities, which cover lexical categories such as nouns, verbs, adjectives, and adverbs. WN18RR, derived from WN18, comprises 40,943 entities and 11 relations. It addresses the data leakage caused by inverse relations in WN18: by removing them, WN18RR is better suited to evaluating a model's ability to capture relational patterns without the confounding effect of inverse relations. Since WordNet exhibits a hierarchical structure with various types of relations, it aligns well with the goals of the HPRE model.

The FB15k dataset is a subset of the extensive Freebase database, containing roughly 15,000 entities and 1,345 relations. It represents a diverse range of real-world entities and their connections, making it suitable for assessing models on complex and varied relationship patterns. FB15k-237 is a subset of FB15k designed to mitigate the impact of data leakage; it provides a more reliable evaluation by removing relations prone to leakage. The extensive coverage of entities and relations allows the HPRE model to learn complex patterns and relations within the knowledge graph.

The YAGO3-10 dataset, sourced from Wikipedia, WordNet, and GeoNames, is known for its distinct semantic hierarchy. This hierarchical structure allows link prediction methods to be evaluated on hierarchical relations, which, together with the dataset's comprehensive knowledge coverage, makes it well suited to assessing the ability of the HPRE model to capture and leverage hierarchical features for accurate link prediction.

Table 2
Statistics of datasets

Dataset	Entities	Relations	Train	Valid	Test
FB15k	14,541	1,345	483,142	50,000	59,071
FB15k-237	14,541	237	272,115	17,535	20,466
WN18	40,943	18	141,442	5,000	5,000
WN18RR	40,943	11	86,835	3,034	3,134
YAGO3-10	123,182	37	1,079,040	5,000	5,000

4.2 Evaluation metrics

When assessing the performance of models in the link prediction task, several evaluation metrics are commonly used. Three widely used metrics are Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@N. The Mean Rank metric measures the average rank of the correct answer among all possible answers. For each test triple, the model ranks the true tail entity among all candidate tail entities, and MR is the average of these ranks across all test triples. A lower MR value indicates better performance, meaning the model ranks the correct answer higher. The MR is defined as follows:

$MR = \frac{1}{| T |} \sum_{i = 1}^{| T |} {rank}_{i}$ (16)

where $T$ denotes the set of test triples, $| T |$ denotes the number of triples, and rank_i represents the link prediction rank of the i-th triple. The Mean Reciprocal Rank (MRR) metric calculates the average reciprocal rank of the correct answers. As with MR, the model ranks the true tail entities. However, instead of averaging the ranks directly, MRR takes the reciprocal of each rank and then averages them. A higher MRR value indicates better performance. The MRR is defined as follows:

$MRR = \frac{1}{| T |} \sum_{i = 1}^{| T |} \frac{1}{{rank}_{i}}$ (17)

The Hits@N metric measures the percentage of test triples for which the true entity is ranked within the top N predicted entities. Typical values for N are 1, 3, and 10. A higher Hits@N value indicates better performance. We use $I (\cdot)$ to denote the indicator function, which outputs one if the condition is true and zero otherwise. Hits@N is defined as follows:

$Hits @ N = \frac{1}{| T |} \sum_{i = 1}^{| T |} I ({rank}_{i} \leq N)$ (18)
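The three metrics can be computed directly from a list of ranks. The following sketch assumes that the rank of the correct entity has already been obtained for each test triple; the function names and the example ranks are illustrative only and are not results from the paper.

```python
def mean_rank(ranks):
    """Average rank of the correct entity, Eq. (16); lower is better."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank, Eq. (17); higher is better."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Fraction of test triples ranked within the top n, Eq. (18)."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Hypothetical ranks of the correct entity for five test triples.
ranks = [1, 2, 4, 10, 50]

mr = mean_rank(ranks)              # (1 + 2 + 4 + 10 + 50) / 5 = 13.4
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 0.25 + 0.1 + 0.02) / 5 ≈ 0.374
hits1 = hits_at_n(ranks, 1)        # 1 of 5 triples -> 0.2
hits10 = hits_at_n(ranks, 10)      # 4 of 5 triples -> 0.8
```

The example shows why MR is sensitive to outliers: the single rank of 50 dominates the mean rank, while it contributes only 0.02 to the sum behind MRR.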

4.3 Training protocol

We implemented the HPRE model in Python using the PyTorch deep learning framework. According to our experiments, the optimal hyperparameter settings are as follows. For the FB15k-237 dataset, the embedding dimension is 1000; the learning rate is 5e-4; the hidden dimension and negative sample size are 500 and 1024; the hyperparameters ξ and γ are 1.0 and 11.0; and the hyperparameters α and β are 3.5 and 1.0. For the FB15k dataset, the embedding dimension is 1000; the learning rate is 5e-4; the hidden dimension and negative sample size are 500 and 1024; ξ and γ are 1.5 and 17.0; and α and β are 0.8 and 0.3. For the WN18RR dataset, the embedding dimension is 1000; the learning rate is 5e-4; the hidden dimension and negative sample size are 1000 and 1024; ξ and γ are 0.5 and 8.0; and α and β are 0.5 and 0.5. For the WN18 dataset, the embedding dimension is 1000; the learning rate is 5e-4; the hidden dimension and negative sample size are 1000 and 1024; ξ and γ are 1.0 and 8.0; and α and β are 0.5 and 0.5. For the YAGO3-10 dataset, the embedding dimension is 1000; the learning rate is 2e-4; the hidden dimension and negative sample size are 1000 and 1024; ξ and γ are 1.0 and 28.0; and α and β are 1.0 and 0.5.

4.4 Main results on link prediction

In the link prediction task, we performed experiments on five datasets; Tables 3–7 present the HPRE model's experimental results on these datasets. We compared HPRE against DistMult [23], ComplEx [24], ConvE [28], TransE [4], TransH [5], TransR [20], ConvKB [29], CapsE [41], RotatE [6], QuatE [21], ModE [8], HAKE [8], PairRE [11], DensE [43], DualDE [44], and DiriE [45]. The following parts provide a detailed analysis of the experimental results.

Table 3
Link prediction results on the FB15k-237

Models	MR	MRR	Hits@10	Hits@3	Hits@1
DistMult	254	0.241	0.419	0.263	0.155
ComplEx	339	0.247	0.428	0.275	0.158
ConvE	244	0.325	0.501	0.356	0.237
TransE	357	0.294	0.465	-	-
TransH	348	0.284	0.488	-	-
TransR	310	0.310	0.506	-	-
ConvKB	254	0.418	0.532	-	-
RotatE	177	0.338	0.533	0.375	0.241
QuatE	87	0.348	0.550	0.382	0.248
ModE	-	0.341	0.534	0.380	0.244
HAKE	-	0.346	0.542	0.381	0.250
PairRE	160	0.351	0.544	0.387	0.256
DualDE	80	0.537	0.629	0.423	0.240
DensE	-	0.351	0.544	-	-
HPRE	183	0.357	0.551	0.379	0.269

Table 3 presents the HPRE model's results on the FB15k-237 dataset. Compared with PairRE, HPRE improves MRR by 0.6 percentage points, which we attribute to its effective use of the hierarchical features inherent in the knowledge graph. HPRE also improves on HAKE by 1.1 points on MRR, 0.9 points on Hits@10, and 1.9 points on Hits@1. Similarly, compared to ModE, HPRE gains 1.6 points on MRR, 1.7 points on Hits@10, and 2.5 points on Hits@1. These results indicate that HPRE leverages the relation patterns embedded in the knowledge graph more effectively than both HAKE and ModE.

HPRE improves MRR by 6.3, 7.3, and 4.7 percentage points over TransE, TransH, and TransR, respectively. Compared to QuatE and RotatE, it gains 0.9 and 1.9 points on MRR, respectively. Compared to DualDE, HPRE exhibits a 2-point improvement in MRR, and it outperforms DensE by 0.6 points on MRR. We attribute this performance to HPRE's effective handling of relational patterns and hierarchical features. TransE is limited to handling a single relation type and cannot address complex relation patterns or hierarchical features, so HPRE significantly outperforms it. Apart from HAKE, the other translation models cannot handle hierarchical features, which is why HPRE outperforms most of them. Furthermore, the multiplicative models, which utilize less relational information, are inferior to the HAKE model, as the data in the table show.

HPRE also has performance advantages compared to neural network models. Compared with ConvE, HPRE improves by 3.2 percentage points on MRR, 5.0 points on Hits@10, 2.3 points on Hits@3, and 3.2 points on Hits@1. The table shows that ConvKB achieves a higher MRR than HPRE, primarily because it utilizes global relationships among the entity and relation embeddings. Additionally, we observed that ConvKB effectively generalizes the transitional characteristics of transition-based embedding models. Our future research will focus on incorporating hierarchical features into convolutional neural networks to improve performance further.

On the FB15k-237 dataset, HPRE achieves a lower MR than most models, meaning that it ranks correct triples higher on average. However, models such as RotatE and QuatE currently achieve a lower MR than HPRE, which we attribute to a slight decrease in model stability after adding hierarchical features. QuatE represents entities and relations in a four-dimensional space, which yields a more stable representation and a lower MR than HPRE.

Table 4 shows the results of HPRE on the FB15k dataset, where it achieves competitive results on various metrics. Compared with PairRE and RotatE, HPRE improves by 1.3 and 2.7 percentage points on MRR, 0.2 and 1.4 points on Hits@10, 0.6 and 2.1 points on Hits@3, and 0.8 and 2.7 points on Hits@1, respectively. RotatE fits relation patterns through relational rotation but lacks hierarchical features. HPRE also improves significantly over the translation models: compared with TransR, it gains 21.1 points on Hits@10. HPRE employs a more sophisticated score function than TransR, which captures complex relation patterns and provides a more nuanced representation of the data. Compared to the neural network model ConvE, HPRE improves MRR by 16.7 points. This gain stems from HPRE's handling of relational patterns and its integration of hierarchical features, whereas ConvE focuses solely on extracting implicit features of entities and relations.

Table 4

Link prediction results on the FB15k

Models	MR	MRR	Hits@10	Hits@3	Hits@1
DistMult	42	0.798	0.893	-	-
ComplEx	-	0.692	0.840	0.759	0.599
ConvE	51	0.657	0.831	0.723	0.558
TransE	-	0.463	0.749	0.578	0.297
TransH	40	0.734	0.867	0.796	0.651
TransR	77	-	0.687	-	-
RotatE	40	0.797	0.884	0.830	0.746
PairRE	37.8	0.811	0.896	0.845	0.765
HPRE	51	0.824	0.898	0.851	0.773

Table 5 presents the results of the HPRE model on the WN18RR dataset, where it benefits from paired relations that provide additional relation features. HPRE improves MRR by 1.3 and 1.7 percentage points over DualE and QuatE, respectively. It handles relational patterns and incorporates hierarchical features, allowing it to capture more nuanced and complex relationships within the knowledge graph; this representation capability enables it to outperform DualE and QuatE on MRR. Compared to CapsE, HPRE improves by 9.0 points on MRR, 1.8 points on Hits@10, and 12.2 points on Hits@1, which we attribute to the combination of hierarchical feature integration and effective handling of relational patterns. CapsE tends to focus on extracting implicit features of entities and relations rather than explicitly modeling the relationships, so it cannot fully leverage the relation-specific features and interactions in the knowledge graph.

On WN18RR, the neural network models ConvKB and CapsE generalize better than HPRE in terms of MR: the MR of HPRE is significantly higher. Noise or outliers in the WN18RR dataset can affect models differently. If HPRE is more sensitive to such instances, its MR increases, whereas the neural network models may be more robust to them because they learn more flexible representations. DualE achieves a higher Hits@10 than HPRE, while HPRE outperforms DualE on the other metrics. We attribute this to data fluctuations.

Table 5

Link prediction results on the WN18RR

Models	MR	MRR	Hits@10	Hits@3	Hits@1
DistMult	5110	0.425	0.491	0.440	0.390
ComplEx	5261	0.444	0.507	0.460	0.410
ConvE	4187	0.433	0.515	0.440	0.400
TransE	3384	0.226	0.501	-	-
TransH	3048	0.286	0.503	-	-
TransR	3348	0.303	0.513	-	-
ConvKB	763	0.253	0.567	-	-
CapsE	719	0.415	0.560	-	0.337
RotatE	3340	0.476	0.571	0.488	0.422
QuatE	2314	0.488	0.582	0.508	0.438
DualE	2270	0.492	0.584	0.513	0.444
ModE	-	0.472	0.564	0.486	0.427
HAKE	-	0.497	0.582	0.516	0.452
DiriE	-	0.495	0.448	0.657	-
DensE	-	0.492	0.586	-	-
HPRE	3895	0.505	0.578	0.527	0.459

Table 6 shows the results of HPRE on the WN18 dataset. Compared with QuatE and DualE, HPRE improves MRR by 0.6 and 0.4 percentage points, respectively, and compared with RotatE it improves MRR by 0.9 points. Compared with DistMult and ComplEx, HPRE improves by 15.9 and 1.5 points on MRR and by 1.4 and 1.3 points on Hits@10, respectively. DistMult and ComplEx use simple score functions based on inner products or bilinear forms. While these functions capture fundamental interactions between entities and relations, they may fail to capture more nuanced relationships and higher-order interactions. HPRE outperforms DistMult and ComplEx on MRR and Hits@10 thanks to its incorporation of hierarchical features, improved modeling of relation patterns, ability to handle multiple relation types, and optimized score function. On WN18RR, HPRE exhibits slightly higher MR values than other models. We attribute these larger MR values to the integration of many features, which reduces the model's stability. In future work, we will focus on enhancing model stability to address this issue.

Table 6

Link prediction results on the WN18

Models	MR	MRR	Hits@10	Hits@3	Hits@1
DistMult	655	0.797	0.946	-	-
ComplEx	-	0.941	0.947	0.945	0.936
ConvE	374	0.943	0.956	0.946	0.935
TransE	-	0.946	0.943	0.888	0.113
TransH	388	-	0.823	-	-
TransR	225	-	0.920	-	-
RotatE	184	0.947	0.961	0.953	0.938
QuatE	162	0.950	0.959	0.954	0.945
DualE	156	0.952	0.962	0.956	0.946
QuatE	199	0.956	0.960	0.958	0.949
DiriE	-	0.955	0.950	0.967	-
HPRE	199	0.956	0.960	0.958	0.949

Table 7 shows the results of HPRE on the YAGO3-10 dataset. Compared with HAKE and ModE, HPRE improves MRR by 0.4 and 3.9 percentage points, respectively. Compared with the rotation-based model RotatE, HPRE improves by 5.4 points on MRR, 4.3 points on Hits@10, 4.0 points on Hits@3, and 6.7 points on Hits@1. Compared with DensE, HPRE improves MRR by 0.8 points. A limitation of DensE relative to HPRE is that it does not effectively capture hierarchical features: it relies on dense embeddings of entities and relations and neglects the hierarchical structure in the data. HPRE and HAKE perform comparably on the hierarchically structured YAGO3-10 dataset, with no significant difference between them, because both models are designed to exploit hierarchical structure; this leaves room for further improvement. In contrast, HPRE performs significantly better than ModE and RotatE, owing to its ability to fit structural features. On YAGO3-10, HPRE also achieves a significantly lower MR than the neural network model ConvE (1216 versus 2792), a notable advantage of the hierarchy-aware approach.

Table 7

Link prediction results on the YAGO3-10

Models	MR	MRR	Hits@10	Hits@3	Hits@1
DistMult	5926	0.34	0.54	0.38	0.24
ComplEx	6351	0.36	0.55	0.40	0.26
ConvE	2792	0.44	0.62	0.49	0.35
RotatE	-	0.495	0.670	0.550	0.402
ModE	-	0.510	0.660	0.562	0.421
HAKE	-	0.545	0.694	0.596	0.462
DensE	-	0.541	0.678	-	-
HPRE	1216	0.549	0.713	0.590	0.469

4.5 Analysis on hierarchy features

The visualization of triple embeddings verifies that the HPRE model effectively utilizes the hierarchical features of the knowledge graph. In this section, we conduct a visualization analysis of triple embeddings from three models: RotatE, PairRE, and HPRE. The entity vectors have a dimensionality of 1000. We project each model's head and tail entity vectors onto a 2D space using a rectangular coordinate system, so that each entity is mapped to 1000 points, and observe the distribution of these points. We judge whether a model has learned the hierarchical features of the knowledge graph by whether the points of the head and tail entities are separated by a clear boundary.

Figure 4 illustrates the visualization results obtained from the WN18RR dataset. Fig. 4a, 4d, and 4g present the embedding visualization using the RotatE model; Fig. 4b, 4e, and 4h use the PairRE model; and Fig. 4c, 4f, and 4i use the HPRE model. In the figure, the blue dots represent the head entity, while the orange dots represent the tail entity. We project the entity vector onto a coordinate system, and the distribution of points reflects the hierarchical features of the entity. Notably, since all modulus values are less than 1, a larger radius in the figure corresponds to a smaller modulus.

Fig. 4a, 4b, and 4c show the triple (syngnathidae, _member_meronym, syngnathus). The distributions of points in Fig. 4a and 4b do not exhibit a clear distinction. However, in Fig. 4c, a clear differentiation between the points representing the head and tail entities can be observed. Specifically, the entity “syngnathidae” sits at a higher hierarchical level under the relation “_member_meronym” than the entity “syngnathus.” Consequently, in Fig. 4c, the tail entity “syngnathus” is located closer to the center, while the head entity “syngnathidae” is positioned farther away. Fig. 4d, 4e, and 4f illustrate the triple (advantageous, _similar_to, meanwhile). Under the relation “_similar_to”, the head entity “advantageous” and the tail entity “meanwhile” are at the same hierarchical level, and there is no prominent differentiation between the uniformly distributed points of the two entities. Fig. 4g, 4h, and 4i present the triple (inoculate, _hypernym, stick). The distributions of points in Fig. 4g and 4h lack clear differentiation. However, in Fig. 4i, there is a clear distinction between the points representing the head entity and those representing the tail entity. Under the relation “_hypernym”, the hierarchical level of “inoculate” is low compared to that of “stick”. Consequently, in Fig. 4i, the head entity “inoculate” is positioned closer to the center, while the tail entity “stick” is situated farther away. These figures demonstrate the ability of HPRE to capture hierarchical semantic features accurately. Conversely, distinguishing hierarchical semantic features proves challenging in the RotatE and PairRE models, as the hierarchies of different entities are not easily discernible.

Fig. 4

Visualization of the embedding of triples from the WN18RR dataset.

Convolutional neural networks can fit contextual features and capture implicit relationships. Specifically, CNNs aggregate local information to obtain general information and capture implicit semantic features from context. In contrast, the hierarchical features captured by the HPRE model are tree-like and constitute an explicit structure. In a knowledge graph, hierarchical relationships are the relationships between entities in a tree-like structure where each entity is a child or a descendant of a parent entity. Semantic or contextual relationships, on the other hand, are relationships between entities based on their meaning or context. In summary, hierarchical relationships in a knowledge graph are based on a fixed classification hierarchy, while semantic relationships are based on similarities, associations, or co-occurrences between entities.

4.6 Analysis on multi relation

To demonstrate the advantages of the HPRE model on the 1-to-1, 1-to-N, N-to-1, and N-to-N relation types, we verified the model's performance on each type. Figure 5 shows the performance comparison between PairRE, HAKE, and HPRE on these relation types. As the figure shows, HPRE significantly outperforms the other models on multiple relation types. To examine the effectiveness of HPRE in more detail, we also evaluated specific relations of each type; Table 8 shows the results. The superior performance of HPRE across relation types is mainly due to the effective learning of the vectors represented by paired relations. HPRE achieves good results on specific relations compared to PairRE and HAKE, and it outperforms HAKE on complex relations because it represents relation vectors as pairs. The experiments suggest that the HPRE model excels at handling multiple relation types.
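The text does not state how relations are assigned to the four types. A common convention in the knowledge graph embedding literature, assumed here and not confirmed for this paper, computes for each relation the average number of tails per head and heads per tail, and treats a side as "N" when the average is at least 1.5. The sketch below illustrates that convention; the function name, the threshold default, and the toy triples are hypothetical.

```python
from collections import defaultdict

def relation_categories(triples, threshold=1.5):
    """Label each relation as 1-to-1, 1-to-N, N-to-1, or N-to-N."""
    tails_of = defaultdict(lambda: defaultdict(set))  # r -> h -> {t}
    heads_of = defaultdict(lambda: defaultdict(set))  # r -> t -> {h}
    for h, r, t in triples:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)

    categories = {}
    for r in tails_of:
        # Average number of distinct tails per head, and heads per tail.
        tph = sum(len(ts) for ts in tails_of[r].values()) / len(tails_of[r])
        hpt = sum(len(hs) for hs in heads_of[r].values()) / len(heads_of[r])
        head_side = "1" if hpt < threshold else "N"
        tail_side = "1" if tph < threshold else "N"
        categories[r] = f"{head_side}-to-{tail_side}"
    return categories

# Hypothetical toy triples, one pattern per relation.
toy = [
    ("a", "capital_of", "b"), ("c", "capital_of", "d"),
    ("league", "teams", "t1"), ("league", "teams", "t2"),
    ("p1", "nationality", "x"), ("p2", "nationality", "x"),
]
cats = relation_categories(toy)
```

Under this convention, a relation such as a league's teams has many tails per head and is labeled 1-to-N, while a relation such as nationality has many heads per tail and is labeled N-to-1.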

Fig. 5

Performance comparison between RotatE, PairRE, and HPRE.

Table 8

Hits@10 and MR on some 1-to-1, 1-to-N, N-to-1, and N-to-N relations in FB15k-237

Relation type	Example relation	MR (PairRE/HAKE/HPRE)	Hits@10 (PairRE/HAKE/HPRE)
1-to-1	/film/film/prequel	4.77/16.15/17.05	0.88/0.90/0.91
	/education/educational_institution/campuses	1.0/1.0/1.0	1.0/1.0/1.0
	/location/hud_county_place/place	1.0/1.0/1.0	1.0/1.0/1.0
	/education/educational_institution_campus/educational_institution	1.0/1.0/1.0	1.0/1.0/1.0
1-to-N	/sports/sports_league/teams./sports/sports_league_participation/team	19.84/26.38/28.25	0.87/0.87/0.89
	/education/field_of_study/students_majoring./education/education/student	253.13/364.39/393.26	0.26/0.34/0.38
	/organization/organization/child./organization/organization_relationship/child	55.52/54.45/53.97	0.35/0.35/0.35
N-to-1	/film/film/release_date_s./film/film_regional_release_date/film_release_distribution_medium	54.32/53.21/52.31	0.52/0.52/0.52
	/location/location/time_zones	137.33/201.14/226.89	0.63/0.61/0.62
	/film/film/produced_by	239.78/216.47/207.56	0.53/0.48/0.50
	/people/person/nationality	161.74/179.24/182.60	0.54/0.53/0.56
N-to-N	/location/location/contains	218.88/321.78/362.57	0.59/0.61/0.63
	/organization/organization_member/member_of./organization/organization	8.25/9.47/10.33	0.89/0.89/0.91
	/film/film/country	112.39/101.49/115.93	0.49/0.48/0.52
	/music/genre/parent_genre	33.02/40.78/47.51	0.42/0.48/0.52

4.7 Analysis on relation pattern

To verify the effectiveness of the HPRE model on relation patterns, we visualize the relation vectors. Figure 6 shows the distribution of the entries of each relation vector. Figure 6a shows the symmetric relation _similar_to; the obtained vectors are approximately symmetric. Figure 6b shows the property of inverse relations. We use _hypernym^-1 to denote the inverse relation of _hypernym and plot the distributions of both. The distributions of the two relations are approximately symmetric, which shows that HPRE can effectively distinguish inverse relations. Figure 6c shows the property of composition relations. In Fig. 6c, winner represents the relation /award/award_category/winners./award/award_honor/award_winner, for₁ represents the relation /award/award_nominee/award_nominations./award/award_nomination/nominated_for, and for₂ represents the relation /award/award_category/nominees./award/award_nomination/nominated_for. ⊗ denotes the composition operation. In the FB15k-237 dataset, for₂ is a composition of for₁ and winner. Fig. 6c shows that the vector entry distribution is symmetrical and that the composition of for₁ and winner is very similar to for₂.
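As a brief worked derivation, the three patterns can be related to the hierarchy-aware constraint of Eq. (10) taken in isolation. The paired relation part imposes its own conditions, which are not covered here, and the subscripts 1, 2, and 3 on the relation phases are introduced only for this illustration. All congruences hold element-wise modulo 2π:

$\begin{matrix} \text{symmetry:} & h_{h} + r_{h} \equiv t_{h},\ t_{h} + r_{h} \equiv h_{h} & \Rightarrow & 2 r_{h} \equiv 0 \\ \text{inversion:} & h_{h} + r_{h, 1} \equiv t_{h},\ t_{h} + r_{h, 2} \equiv h_{h} & \Rightarrow & r_{h, 1} + r_{h, 2} \equiv 0 \\ \text{composition:} & h_{h} + r_{h, 1} \equiv m_{h},\ m_{h} + r_{h, 2} \equiv t_{h},\ h_{h} + r_{h, 3} \equiv t_{h} & \Rightarrow & r_{h, 3} \equiv r_{h, 1} + r_{h, 2} \end{matrix}$

Under this reading, every entry of a symmetric relation's phase vector is either 0 or π, the phase vectors of two mutually inverse relations mirror each other around 0, and the phase vector of a composed relation is the sum of its components' phase vectors.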

Fig. 6

Histograms of angles corresponding to some relation embeddings.
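
The three patterns read off the histograms can also be checked numerically. The following is a minimal sketch, assuming a rotation-style parameterization in which each relation entry is a phase in [0, 2π), as in RotatE-like models. It does not reproduce HPRE's paired relation vectors or its composition operator ⊗, and all function names and toy vectors are hypothetical.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def wrap(theta):
    """Map angles to the interval [0, 2*pi)."""
    return np.mod(theta, TWO_PI)


def circular_gap(a, b):
    """Mean angular distance between two phase vectors (0 means identical)."""
    d = np.abs(wrap(a) - wrap(b))
    return float(np.mean(np.minimum(d, TWO_PI - d)))


def symmetry_gap(r):
    # Symmetric relation: applying r twice is the identity, so 2r = 0 (mod 2*pi),
    # which forces every entry to sit near 0 or pi.
    return circular_gap(2.0 * r, np.zeros_like(r))


def inverse_gap(r, r_inv):
    # Inverse pair: r + r_inv = 0 (mod 2*pi), so the two histograms mirror each other.
    return circular_gap(r + r_inv, np.zeros_like(r))


def composition_gap(r1, r2, r3):
    # Composition: r1 followed by r2 should land on r3, i.e. r1 + r2 = r3 (mod 2*pi).
    return circular_gap(r1 + r2, r3)


# Toy phase vectors (hypothetical stand-ins, not learned embeddings).
rng = np.random.default_rng(0)
similar_to = rng.choice([0.0, np.pi], size=8)
hypernym = rng.uniform(0.0, TWO_PI, size=8)
hypernym_inv = wrap(-hypernym)
for_1 = rng.uniform(0.0, TWO_PI, size=8)
winner = rng.uniform(0.0, TWO_PI, size=8)
for_2 = wrap(for_1 + winner)

print("symmetry gap of similar_to:", symmetry_gap(similar_to))
print("inverse gap of hypernym pair:", inverse_gap(hypernym, hypernym_inv))
print("composition gap of for_1, winner, for_2:", composition_gap(for_1, winner, for_2))
print("symmetry gap of hypernym (not symmetric):", symmetry_gap(hypernym))
```

On the toy data the first three gaps are numerically zero because the vectors were constructed to satisfy each pattern, while the last one is not. With learned embeddings the gaps would only be approximately zero, which matches the "approximately symmetric" wording used for Fig. 6.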

4.8 Analysis on complexity

We measured the running time of the models and counted their parameters using the fvcore toolkit. Table 9 reports the results. RotatE has the shortest running time (16 s) and the fewest parameters (17.64 M), whereas HPRE and HAKE have the same running time (22 s) and the same number of parameters (29.79 M). In terms of prediction performance, HPRE outperforms RotatE, albeit at the cost of slower running speed. In future work, we will focus on improving the running speed of the HPRE model without compromising its accuracy.

Table 9
Running time and parameters

Models	Time/s	Parameters/M
HPRE	22	29.79
RotatE	16	17.64
HAKE	22	29.79
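
For intuition about the parameter column, the size of an embedding model is dominated by its embedding tables. The following is a rough sketch, assuming the model stores 2d values per entity and 3d values per relation with d = 1000, evaluated on the FB15k-237 statistics from Table 2. These widths, the value of d, and the choice of dataset are assumptions made for illustration; the paper does not state the configuration behind Table 9, and the function name is hypothetical.

```python
def embedding_parameters(n_entities, n_relations, entity_width, relation_width):
    """Count the parameters held in the entity and relation embedding tables."""
    return n_entities * entity_width + n_relations * relation_width


# FB15k-237 statistics as listed in Table 2.
N_ENTITIES = 14_541
N_RELATIONS = 237

# Assumed layout (not confirmed by the paper): 2d values per entity,
# 3d values per relation, with a hypothetical hidden size d = 1000.
d = 1000
total = embedding_parameters(N_ENTITIES, N_RELATIONS, 2 * d, 3 * d)

print(f"{total:,} parameters = {total / 1e6:.2f} M")
```

Under these assumptions the count is 14,541 × 2,000 + 237 × 3,000 = 29,793,000, or about 29.79 M, which coincides with the HPRE and HAKE entries in Table 9. This agreement is suggestive only and should not be read as the authors' reported setup.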

5 Conclusion and future work

We proposed HPRE, a knowledge graph embedding approach for link prediction that models multiple relation types, relation patterns, and hierarchical features in a single embedding space. The HPRE model consists of two main components: the paired-relation part and the hierarchy-aware part. In the paired-relation part, we leverage paired relation vectors to capture multiple relation types and discern relation patterns within the knowledge graph. The hierarchy-aware part models the hierarchical characteristics of different entities by assigning them specific angles in a polar coordinate space. To integrate these components, we introduced a composite score function tailored to the HPRE model. We conducted extensive experiments to validate the effectiveness of HPRE. The experimental results demonstrate that our model outperforms most current models on link prediction. Moreover, our model can simultaneously handle multi-relation types, relation patterns, and hierarchical features within a unified framework.

In the future, we plan to enhance the capabilities of our model by incorporating first-order logic rules to encode more intricate relationships within the knowledge graph. First-order predicate logic allows us to define and manipulate variables, constants, predicates, and quantifiers. Using first-order predicate logic, we can capture intricate patterns, dependencies, and constraints within a knowledge graph or domain. By combining the expressive power of first-order logic with the flexibility and representation learning capabilities of neural network models, we aim to achieve a more comprehensive understanding of complex relationship patterns and improve the performance of our model accordingly.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61976032).

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Data availability

The datasets generated during and/or analysed during the current study are available in the GitHub repository, .
