Abstract
Knowledge Graph Embedding (KGE), which aims to embed the entities and relations of a knowledge gxraph into a low-dimensional continuous space, has been proven to be an effective method for completing a knowledge graph and improving the quality of the knowledge graph. The translation-based models represented by TransE, TransH, TransR and TransD have achieved great success in this regard. There is still potential for improvement in dealing with complex relations. In this paper, we find that the lack of flexibility in entity embedding limits the model’s ability to model complex relations. Therefore, we propose single-directional-flexible (sdf) models and multi-directional-flexible (mdf) models to increase the flexibility and expressiveness of entity embeddings. These two methods can be applied to the TransD model and its variant models without increasing any time cost and space cost. We conduct experiments on benchmarks such as WN18 and FB15k. The experimental results show that the models significantly surpasses the classical translation models in both tasks of triplet classification and link prediction. In particular, for Hits@1 of link prediction of WN18, we get 71.7% after applying our method to TransD, which is much better than 24.1% of TransD.
Keywords
Introduction
Incorporating human knowledge is one of the important research directions of artificial intelligence (AI). Knowledge representation and reasoning, inspired by human problem solving, are to represent knowledge for intelligent systems to gain the ability to solve complex tasks [13]. However, how to effectively represent knowledge has always been a difficult point in knowledge engineering. In 1988, Stokman and Vries proposed a method to represent structured knowledge by using graph data. Subsequently, many real-world knowledge graphs (such as WordNet, Freebase, Yago, etc.) were constructed, published and played an important role in AI applications such as relation extraction [30], question answering systems [8], recommendation systems [13], and disease recognition [4].
A knowledge graph usually contains huge amounts of structured data as the form of triplets (head entity, relation, tail entity) (denoted as (h,r,t)). Since knowledge graphs are usually incomplete, a basic problem of knowledge graphs is to predict the missing edges. However, knowledge graphs are usually very large and mainly consist of symbolic representation, which makes the knowledge graphs difficult to calculate. In order to solve these issues, many Knowledge Graph Embedding (KGE) models were introduced to embed entities and relations into a low-dimensional continuous space to facilitate downstream tasks.
TransE is the earliest and simplest translation-based KGE model. Groundbreakingly, the model proposes that the relation is a translational vector from the head entity to the tail entity, i.e., h + r ≈ t. For example, for the triples (Steven Spielberg, directors, Jurassic Park), TransE believes that the vector of Steven Spielberg entity can approach the vector of Jurassic Park entity after being translated by the vector of the directors relation, which means that Steven Spielberg directed the movie Jurassic Park.
However, TransE encountered obstacles in dealing with the complex relations of 1-N, N-1 and N-N (shown in Fig. 1) in the knowledge graph, as mentioned in [12, 28]. For example, (Steven Spielberg, directors, Jurassic Park) and (Steven Spielberg, directors, E.T.) exist in a knowledge graph at the same time. The directors relation can be simply viewed as a 1-N relation. However, there are some flaws in the above models when dealing with such relations. Take TransE’s modeling process of these two facts as an example. Model need to satisfy both Steven Spielberg + directors = Jurassic Park and Steven Spielberg + directors = E.T. It will cause Jurassic Park and E.T. to be equal which is unexpected.

Proportion of complex relations in FB15k.
In this paper, we find that the lack of flexibility and expressiveness of entity embeddings can lead model to a dilemma when modeling complex relations. By exploring the efforts and thinking of the classical translation models, we find that there is still space for improvement in this regard. In order to improve the representation ability of the model to the complex relations, we add an adjustment vector for each entity embedding, so that the entities have different representations in the context of different triples. For example, Steven Spielberg will have different embeddings when directing different movies, which can distinguish the embeddings of all the movies he directed. The adjustment vectors integrate the information of the triples where the entities are located, it can more flexibly adapt to the restrictions on the entity embeddings by complex relationships.
We propose sdf and mdf methods to increase the flexibility and expressiveness of entity embeddings, and apply these two methods to the TransD model and its variant models. The experiments are carried out on two benchmark data sets WN18 [2] and FB15k [3]. We report the accuracy of triplets classification task and the MRR, MR, Hits@1, Hits@3 and Hits@10 of the link prediction task. Finally, we compare the results with TransE [3], TransH [28], TransR [17], TransD [11], lppTransD [33], DistMult [32] and ComplEx [27].
The contributions of this paper are as follows: Through the exploration and analysis of the classical translation models, we find that the flexibility and expressiveness of entity embedding are factors that affect the modeling of complex relations. In order to improve the modeling ability of the model for complex relations, we propose two methods named single-direction-flexible (sdf) model and multi-direction-flexible (mdf) model and implement these two methods separately on TransD and lppTransD. We conduct experiments on two benchmark data sets WN18 and FB15k, and the results show our method significantly outperforms classic translation models in link prediction and triplet classification tasks on multiple data sets.
The rest of this paper is organized as follows. Section 1 briefly introduces translational distance models and other state-of-the-art methods. Section 1 describes our proposed framework and its application, and discusses the time and space complexity. Experimental studies are presented in Section 1. We conclude the paper in Section 1.
KGE aims to embed the entities and relations in the knowledge graph into a low-dimensional continuous vector space while maintaining semantic information as much as possible. In a knowledge graph, triplets are represented as (head entity, relation, tail entity) (abbreviated as (h,r,t)). For each model, there is a scoring function, also named energy function, to measure the plausibility of the triplet, denoted as S = f r (h, t). The goal of each model is to enforce the score of the golden triplet (h, r, t) as much higher or lower as possible than the corrupted triplets ((h, r, t′) or (h′, r, t)). According to the different forms of the score function, we can roughly divide the existing KGE models into translation-based models and semantic matching models.
Translation-based models
The translation-based models regard a relation as a translational vector operation. After the translation, models obtain a score of the triplet based on the distance. Under the condition of using less resources, this method not only obtains better results, but also gives certain interpretability. Before proceeding, we define some notations. A column vector
The unified score function form of the translation model is as Formula (1):
To alleviate this problem, Ji, et al. [12] proposed
It would be nice to simplify the model so that it can be applied to real problems. There are also some other models that improve the performance by slightly increasing the complexity of the models.
Recently,
Unlike the translation-based models, the semantic matching models use similarity-based score functions instead of distance-based score functions.
The factorization models can be regarded as another form of semantic matching models, which aims to capture semantic information through tensor decomposition. Typically,
Other models
With the popularity of deep learning, many researchers try to apply deep neural network to KGE.
A lot of work [14, 16] improves the performance of embedding through additional information such as paths. There are also many translation-based models that embed entities and relations into other spaces, such as complex space [25], Gaussian space [10, 31], hyperbolic space [1, 5], time domain space [35] and so on. But this paper only focuses on the discussion of Euclidean space. There is also a lot of work to re-evaluate previous work [22, 23].
Our method
First, we analyze in detail the process of entity embedding of TransE, TransD [11] and lppTransD [33]. Through our analysis, we find that the TransE is in a dilemma when dealing with complex relationships. TransD and its variant lppTransD make up for this defect by increasing the flexibility and expressiveness of entity embeddins, but there still exists space for improvement. Subsequently, we propose two methods to improve the ability of the model to deal with complex relations by increasing the flexibility and expressiveness of entity embeddings. Finally, we elaborate on the training and optimization process of the model in detail.
Analysis of TransD and lppTransD
Before analyzing, we give a definition:
TransD prepares two vectors for each entity e and relation r, which are called
Note that the score functions can be seen as consisting of two parts. One part is the same as TransE:
Figure 2 shows the perfect embedding of different models in two dimensional Euclidean space. Suppose there are two triplets (h, r, t1) and (h, r, t2) in a knowledge graph (It can be thought that h is the director Steven Spielberg in section 1, t1 is the movie Jurassic Park and t2 is E.T.). In Fig. 2(a), we can see that TransE, which aims to get perfectly embedding for both triplets, embeds t1 and t2 as a same point (

Illustration of translation-based models.
For TransD (Fig. 2(b)), it can be seen that the head entity vector
From the above analysis, we find there exist unnecessary limitations on entity embeddings in TransD and lppTransD. These limitations affect the ability of the model to model complex relations. Therefore, how to remove these restrictions has become our focus.
Our motivation is to enhance the model’s ability to handle complex relationships by increasing the flexibility of entity embedding. From the analysis of subsection 1, we find that the adjustment vectors of TransD or lppTransD for entities are mainly composed of two parts: sizes and directions. The sizes of adjustment vectors are determined by corresponding entities (
On the one hand, we achieve our goal by increasing the flexibility of the adjustment vector sizes. The sizes are determined jointly by the head entities h and tail entities t instead of themselves. Figure 2(d) shows the embedding of triplets by TransD-sdf, where
On the other hand, we hope to break through the limitation of projection direction. Therefore, we use the projection vectors of entity
Training details
The general optimization goal of the translation models [3, 29] is usually as follows:
We are inspired by RotatE [25]: every negative samples should contribute its corresponding weight to the optimization goal. Therefore, we modify the margin-loss to the following form:
Training process.
The whole optimization process uses Stochastic Gradient Descent (SGD) to optimize the objective function. In order to speed up the convergence and prevent overfitting, we explicitly restrict the sizes of the entity vectors and relation vectors, i.e., make ∥
Table 1 Comparisonof model time complexity and space complexity shows the time complexity and space complexity of our models and all comparison models. Given a certain knowledge graph, we use N e to represent the number of entities, N r to represent the number of relations, and N t to represent the number of triplets. Meanwhile, the dimension of entity and relation are denoted by m and n, respectively. Note that KGE models are expected to embed entities and relations into low-dimensional spaces. So m and n are usually in the tens to hundreds of magnitude.
Comparison of model time complexity and space complexity
Comparison of model time complexity and space complexity
Note that our improvements to TransD and lppTransD do not increase any time complexity and space complexity. Therefore, both the sdf methods and mdf methods perfectly inherit the advantages of TransD and its variant that can be deployed on a large-scale knowledge graph.
In this section, we first introduce the commonly used public data. Then we evaluate our method on two tasks: triplet classification and link prediction. Finally, we analyse of the experimental results in detail. OpenKE [9] is an open source tool for knowledge graph embedding, which integrates the classic KGE model. Our experiment was conducted on OpenKE. Experimental environment: CPU: Intel Xeon Gold 6248, RAM: 376GB, GPU: NVIDIA GV100GL [Tesla V100 PCIe 32GB].
Data sets
The statistics of the data set used in the experiment are shown in Table 2 Datasetsused in the experiments. In the table, Avg. #Train/#Ent represents the average number of linked triplets regarding with each entity in the training set. With a small average linked triplets, it is more challenging to learn the semantic of the entities [36]. For this reason, the link prediction task will only be performed on WN18 and FB15k, while the triplet classification task will be performed on all data sets in Table 2 Datasetsused in the experiments. For most knowledge graphs, the number of entities N e is often much larger than the number of relations N r .
Datasets used in the experiments
Datasets used in the experiments
Tasks
This paper focuses on two tasks: triplet classification and link prediction. Next we will introduce two tasks and their details.
It is worth to pay a special attention that corrupted triplets may also be fact triplets. In order to make a more appropriate comparison, these fact triplets should be filtered out during the evaluation. We will report more reasonable results which have filtered other fact triplets.
unif : For a fact triplet (h, r, t), the corrupt triplet ((h, r, t′) or (h′, r, t)) is obtained by randomly replacing the head entity or the tail entity from all entities. This method is simple and effective, but it will generate a large number of false negative labels during training. bern : Sampling after analyzing the Bernoulli distribution of the sample. Before the model starts training, count the average number of tail entities per head entity under a specific relation, and record it as t
ph
. Symmetrically, record the average number of head entities per tail entity as h
pt
. Then replace the tail entity with probability
Results analysis and disicussion
In this section, we report and analyze the experimental results of triplet classification task and link prediction. Since our method is based on translation model, we mainly compare with classical translation models: TransE, TransH, TransR and TransD. In addition, we compare two improved models based on TransD. In the link prediction task, we also report the results of DistMult and ComplEx for comparison. All the best experimental results are marked in bold. The optimal results of our method are underlined for comparison with the optimal results.
Due to differences in experimental equipment, parameter settings and other details, the experimental results may be different. To be fair, excluding lppTransD, we use the results of translation-based models reported by [36] in recent related work that are better than original papers. And the results of DistMult and ComplEx are derived from the report of [7].
Result of triplet classification
We compare our methods with the classic translational distance models on all datasets. The experimental results are shown in the Table 3 Resultof Triplet Classification. Note that most of the comparison models only perform this task on WN11, FB13 and FB15k. Their original texts do not contain the results of WN18, so we quote the results of Zhang et al. [36].
Result of Triplet classification
Result of Triplet classification
From Table 3 Resultof Triplet Classification we can see that our models have the optimal results on both WN18 and FB15k, and the improvement is significant. It can be seen that the accuracy on WN18 is improved to 98.3% from 96.8%. Significantly, the accuracy of the model on FB15k is increased to 96.7% from 90.3%, which far exceeds the result of the baseline model. However, our models have only baseline performance on the WN11 and FB13 data sets. We think it might be that there exist few logical rules or complex relations in FB13 and WN11. The original intention of our models is to make up for the flaws in the process of modeling these complex data with traditional translation-based models. Therefore, our methods are better at dealing with complex relations.
For the link prediction task, we give the experimental results of WN18 and FB15k data sets. The results of MRR, MR, Hits@10, Hits@3, and Hits@1 are given respectively. Since there are multiple evaluation indicators, we select the best results and parameters based on the best MRR. Table 4 LinkPrediction results on FB15k and WN18 shows the experimental results of link prediction for FB15k and WN18, respectively.
Link prediction results on FB15k and WN18
Link prediction results on FB15k and WN18
Average results of sdf and mdf on FB15k
In addition, in order to show the superiority of our method more intuitively, we average all the results of sdf and mdf methods, and compare them with the results of recent model TransD-pa. The results of the comparison are shown in Table 5.
For a relation r, traverse the two-tuple (h, r,) composed of the head entity and the relation (or the two-tuple (, r, t) composed of the tail entity and the relation). Count the average number of correct entities under (h, r,) denoted as t
r
. Similarly, Let h
r
correspond to (, r, t). If this number is less than 1.5, we mark the position corresponding to this number as 1 otherwise N. For example, if h
r
> 1.5 and t
r
< 1.5, the type of the relation is N-1.
Back to Table 6, it can be seen that the sdf models handle the relations between 1-1, 1-N and N-1 very well. The mdf models have more advantages when dealing with N-N relations. Recall Fig. 1 complexrelations, N-N relations account for about three-quarters of all FB15k relations. It explains why the mdf method can achieve better results on Hits@10. Overall, both sdf method or mdf method have a much better improvement in dealing with complex relations compared with original models.
Experimental results on FB15k by mapping properties of relations (%)
Table 7 Ablationexperiment shows the experimental results of TransD-sdf and TransD-mdf under different loss function optimizations. The “original” represents the models using the original margin-loss for optimization, and the “optimized” represents the models using the optimized margin-loss with weight. For comparison, we only report the “Filter” results under the “bern” negative sampling method. Experimental results show that our sdf method and mdf method can significantly improve the performance of TransD even under the original margin loss function optimization. Although the results of Hits10 are improved, the improvement is limited. TransD-sdf and TransD-mdf mainly improve the MRR results (MRR indicators are highly dependent in Hits@1). This is consistent with our previous result analysis, sdf and mdf methods carry out a more refined sorting for the results of original TransD, and upgrade the original entity from the top ten ranking to the first ranking. Using the optimized loss function further strengthen the refining process.
Results of models under different loss function optimizations
Results of supplementary experiment of WN18RR
The results of TransE are given by [6]. The results of DistMult and ComplEx are given by [7]. Hits@10 of TransD-sdf and TransD-mdf are closed to DistMult and ComplEx. However, MRR results are not so good. Although result of TransE looks better, [23] points out that experiment in [6] exists a test set leak problem. In addition, [22] points out that adding new training tricks can make the early KGE models perform well on FB15k-237 and WN18RR. We have shown in theory that our models are superior to TransE and close to Complex and DistMult. In the future, we will try to add some training tricks to enhance our results.
In this paper, we found that the lack of flexibility and expressiveness of entity embeddings limits the abilities of the classical translation models in modeling complex relations. Through in-depth exploration of the classic translation models, we discovered the unnecessary limitations of these models in process of entity embedding. Specifically, we proposed single-direction-flexible models and multi-directional-flexible models to increase the flexibility and expressiveness of entity embeddings. Such improvement enables the models to improve the modeling ability of complex relations between 1-N, N-1 and N-N. However, our method only pays attention to the flexibility of entity embedding and does not make corresponding improvements to relation embedding. This will be our future research direction. Meanwhile, the model can also embed more triplets perfectly. Experiments proved that our models greatly improve the prediction accuracy of triplet classification and the results of link prediction without increasing any time cost and space cost. Our methods can be applied not only to TransD, but also to its variants. Our models inherit the advantages of TransD that can be deployed to large-scale knowledge graphs, and at the same time improve the ability to model complex relations. In addition, we did not pay enough attention to training tricks, which may limit our results, so we will explore the impact of different training tricks and experimental settings on the model in future work.
Footnotes
Acknowledgments
This work is supported by the National Key Research and Development Program of China (Grant No. 2019YFA0706200), the National Natural Science Foundation of China (Grant Nos. 61632014, 61627808 and 61802158), the Natural Science Foundation of Gansu Province (Grant No. 20JR10RA605), and the Fundamental Research Funds for the Central Universities (lzujbky-2021-66). Thanks for the computing resources provided by the Supercomputing Center of Lanzhou University.
