TransR*: A representation learning model based on flexible translation and relation matrix projection

Abstract

TransR addresses the problem that TransE and TransH, which embed entities and relations in a single common space, are not sufficient for modeling, and it is regarded as a promising knowledge representation model. However, TransR still follows the translation principle of TransE, whose constraint is too strict, so the model distinguishes poorly between very similar entities. We therefore propose TransR*, a representation learning model based on flexible translation and relation matrix projection. First, we separate entities and relations into different vector spaces; second, we apply our flexible translation strategy to relax the translation constraint. During training, we improve the quality of the generated negative triples by replacing semantically similar entities, and we use the prior probability of a relation to distinguish relations with similar encodings. Finally, we conduct link prediction experiments on the public data sets FB15K and WN18 and triple classification experiments on WN11, FB13 and FB15K to verify the effectiveness of the proposed model. The evaluation results show that our method improves on TransR in Mean Rank, Hits@10 and ACC.

Keywords

Knowledge representation; flexible translation; relation matrix projection; link prediction; triple classification

1 Introduction

A knowledge graph is a structured semantic knowledge base that describes entities in the physical world and their relationships in symbolic form. Its basic components are “entity-relation-entity” triples and attribute-value pairs of entities. Entities are connected to each other through relationships, forming a networked knowledge structure [20, 24]. The main research goals in the field of knowledge graphs are to obtain structured knowledge from unstructured Internet information, to integrate and construct knowledge bases automatically, and to support knowledge inference and other related applications. Knowledge representation is the foundation of knowledge acquisition and application, and a key issue throughout knowledge base construction and use. The most direct way to represent a knowledge graph is with a graph database, but applying this representation to large-scale knowledge graphs suffers from high computational complexity, low reasoning efficiency and data sparsity [21, 25]. Under this kind of representation the knowledge graph is symbolic and logical, so numerical machine learning methods and techniques cannot be applied to it directly.

In recent years, with the continued growth of big data research and applications, representation learning has emerged in artificial intelligence; it aims to represent the semantic information of research objects as dense, low-dimensional real-valued vectors [22]. Applied to knowledge graphs as a new way to support computation and reasoning, it maps the knowledge graph into a continuous low-dimensional vector space while retaining specific properties of the original graph, so that a large number of efficient numerical computation and reasoning methods become applicable. Examples include TransE [5], TransH [6], TransR [13], DistMult [10] and the DT model [8]. Among these, TransR is considered a model with great potential. However, TransR still adopts the translation principle of TransE, whose constraint is too strict, so the model distinguishes poorly between different but very similar entities and its expressive ability is limited.

For this reason, we propose TransR*, a representation learning model based on flexible translation and relation matrix projection. First, we introduce relation matrix projection; second, on that basis, we apply the flexible translation principle that we propose, which makes translation more flexible and reduces interference from irrelevant information. During training we also improve the sampling strategy for negative triples, using the 1-to-N and N-to-1 mapping properties of relations to select replacement entities so that as many entities as possible are trained, and we use the prior probability of a relation to address relations with similar encodings. Finally, we perform link prediction and triple classification on subsets of WordNet and Freebase and evaluate the model with Mean Rank, Hits@10 and ACC.

The contributions of this paper are as follows: (1) we propose a new knowledge representation learning model, TransR*; (2) we propose, for the first time, a flexible translation principle that relaxes the translation constraint; (3) during training, we improve the quality of the generated negative triples by replacing semantically similar entities, and we use the prior probability of a relation to separate relations whose encodings are similar and hard to distinguish; (4) in experiments, our method outperforms the classic TransR model on link prediction and triple classification.

2 Related work

In this section, we introduce typical knowledge representation learning models and analyze their algorithmic ideas, scoring functions and time complexity. In Table 1, N_c denotes the time complexity term, i the dimensionality of the entity embedding, j the dimensionality of the relation embedding, q the number of nodes of the neural network, s the number of tensors, x the number of hyperparameter adjustments and v the rank of the matrix. W_r denotes a three-dimensional tensor, M a matrix, u a hyperparameter and g a function.

Table 1
Scoring functions and time complexity of knowledge representation learning models

Model	Scoring function	Time complexity
Unstructured [1]	$-\| h - t \|_{2}^{2}$	O(N_c)
RESCAL [9]	$e_{s}^{T} W_{r} e_{o}$	O(v²)
SE [2]	$-\| M_{rh} h - M_{rt} t \|_{1}$	O(2i²N_c)
SME (linear) [3]	$g_{left}^{T} g_{right}$	O(4iqN_c)
SME (bilinear) [3]	$g_{left}^{T} g_{right}$	O(4iqsN_c)
LFM [4]	$h^{T} M_{r} t$	O((i² + i)N_c)
NTN [7]	$u_{r}^{T} f(h^{T} W_{r} t + M_{r,1} h + M_{r,2} t + b_{r})$	O(((i² + i)s + 2iq + q)N_c)
DistMult [10]	$\langle W_{r}, e_{s}, e_{o} \rangle$	O(v)
HolE [11]	$W_{r}^{T} (F^{-1}[\overline{F[e_{s}]} \odot F[e_{o}]])$	O(v log v)
ComplEx [12]	$\mathrm{Re}(\langle W_{r}, e_{s}, \bar{e}_{o} \rangle)$	O(v)
TransE [5]	$\| h + r - t \|_{l_{1}/l_{2}}$	O(N_c)
TransH [6]	$\| (h - W_{r}^{T} h W_{r}) + d_{r} - (t - W_{r}^{T} t W_{r}) \|_{l_{1}/l_{2}}$	O(2iN_c)
TransR [13]	$\| h_{r} + r - t_{r} \|_{l_{1}/l_{2}}$	O(2ijN_c)
TransR* (this paper)	$\| h_{r} + r - t_{r} + \mu \|_{l_{1}/l_{2}}$	O(2ijN_c + x)

2.1 TransE, TransH and TransR

TransE. The TransE model [5] maps entities and relations into a vector space, so that entities and relations are represented as vectors, and it regards the relation r in each triple (h, r, t) as a translation from the head entity h to the tail entity t.

TransH. The TransH model [6] gives an entity different representations under different relations, so that the same entity can play different roles under different relations. The model first determines a hyperplane for each relation from the relation vector and the hyperplane's normal vector, then projects the head and tail entity vectors onto this hyperplane along the direction of the normal vector, and finally computes the loss function.

TransR. In the TransR model [13], each relation is described not only by a vector r but also by a mapping matrix M_r that maps entities into the space in which that relation lies. Entities and relations are therefore embedded in separate spaces: an entity vector space and a relation vector space.
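The three translation-based scores can be illustrated with a minimal plain-Python sketch. This is not the reference implementation of any of these models: vectors are Python lists, the function names are hypothetical, and for TransH we assume the normal vector W_r is normalized to unit length before projecting. In all three, a lower score means a more plausible triple.

```python
import math

def norm(v, p=1):
    """l1 norm (p=1) or l2 norm (p=2) of a vector stored as a list."""
    if p == 1:
        return sum(abs(x) for x in v)
    return math.sqrt(sum(x * x for x in v))

def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transe_score(h, r, t, p=1):
    """TransE: treat r as a translation from h to t, score = ||h + r - t||."""
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)], p)

def transh_score(h, d_r, w_r, t, p=1):
    """TransH: project h and t onto the hyperplane with normal w_r, then translate by d_r."""
    n = norm(w_r, 2)
    w = [x / n for x in w_r]  # assumed: unit normal vector of the relation hyperplane

    def project(e):
        dot = sum(a * b for a, b in zip(w, e))
        return [a - dot * b for a, b in zip(e, w)]  # e - (w^T e) w

    return transe_score(project(h), d_r, project(t), p)

def transr_score(h, r, t, M_r, p=1):
    """TransR: map h and t into the relation space with M_r, then translate by r."""
    return transe_score(matvec(M_r, h), r, matvec(M_r, t), p)
```

For example, with a 2 × 3 matrix M_r = [[1, 0, 0], [0, 1, 0]] mapping a 3-dimensional entity space into a 2-dimensional relation space, h = [1, 0, 0], t = [0, 1, 0] and r = [-1, 1], the projected vectors satisfy h_r + r = t_r exactly, so `transr_score` returns 0.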

2.2 Other models

The NTransGH model [14] combines a translation mechanism, which models a relation as a translation operation on a generalized hyperplane, with a neural network that captures more complex interactions between entities and relations. The SProje model [15] introduces an adaptive measurement method to reduce the influence of noisy information, and further optimizes the loss function to increase the loss weight of triples with complex relations. The TransGraph model [16] builds on TransE to learn the features of triples and the network structure of the knowledge graph simultaneously, which effectively improves the representation of the knowledge graph; to integrate network structure information deeply with triple information, it uses a cross-training mechanism with shared vectors. The STransH model [17] models entities and relations in an entity space and a relation space, and uses the nonlinear operation of a single-layer neural network to strengthen the semantic connection between them. Inspired by TransH, it also projects entities onto relation-specific hyperplanes so that entities play different roles under different relations. The SME model [3] uses more complex operations to capture the internal connections between entities and relations. The SE model [2] introduces a method for automatically learning structured embeddings of a knowledge base. The NTN model [7] replaces the linear transformation layer of traditional neural networks with a bilinear tensor layer. The Unstructured model [1] is a framework for performing semantic analysis on free text. The RESCAL model [9] is representative of matrix factorization models, which use matrix factorization for knowledge representation learning.
The DistMult model [10] learns embeddings without explicit logical constraints and uses them to mine logical rules directly from the knowledge base. The HolE model [11] uses holographic embeddings to learn compositional vector space representations of entire knowledge graphs. The ComplEx model [12] applies complex-valued embeddings within low-rank matrix factorization.

3 Our model

In this section, M_r denotes the relation matrix, f_r(h, t) the scoring function, h_r and t_r the projection vectors of the head and tail entities, and μ the hyperparameter.

3.1 Our motivation

The TransR model projects the head and tail entities of each triple (h, r, t) in the knowledge base into the relation space through a mapping matrix, so that h_r + r ≈ t_r (Fig. 1(a) shows the entity space and Fig. 1(b) the relation space); it constructs a vector space for each relation and separates entities and relations into different vector spaces [13]. TransR thus solves the problem that TransE and TransH, which embed everything in a single common space, may not be sufficient for modeling, and it is a promising knowledge representation learning model. However, TransR still uses the TransE-style translation principle h_r + r ≈ t_r, and this constraint is still too strict. When different entities are very similar, TransR's ability to distinguish between them is low, which makes it less effective on complex relation types.

Fig. 1

Algorithmic idea of the TransR model.

Circles represent specific relational projections, triangles represent entities that do not satisfy h_r + r ≈ t_r, and dots represent entities that satisfy h_r + r ≈ t_r.

3.2 TransR*

To address the low ability of TransR to distinguish different but very similar entities, we propose TransR*, a relation matrix projection model based on the principle of flexible translation, which we put forward for the first time. The principle is as follows. For each triple (h, r, t), given r and t, we allow h to keep the same direction but vary in magnitude, with the adjustment controlled by the hyperparameter ϕ (see Fig. 2(a)); similarly, given h and t, we allow r to keep the same direction but vary in magnitude, adjusted by the hyperparameter β (see Fig. 2(b)); and given h and r, we allow t to keep the same direction but vary in magnitude, adjusted by the hyperparameter λ (see Fig. 2(c)).

Fig. 2

Principles of flexible translation.

Because of random errors in the experiments, h + ϕ + r may still not equal t, h + r + β may still not equal t, and h + r may still not equal t + λ. We therefore introduce three random numbers j₁, j₂ and j₃ as error adjustments. The situation after adding the random numbers is shown in Fig. 3(a-c).

Fig. 3

The principle of flexible translation after setting a specific range of random numbers.

In this translation, the embeddings of entities and relations lie in the same space, and the translation principle is defined as:

$(h + ϕ + j_{1}) + (r + β + j_{2}) \approx (t + λ + j_{3})$ (1)

where $ϕ, β, λ, j_{1}, j_{2}, j_{3} \in \mathbb{R}^{m \times n}$. Moving all terms of Eq. (1) to one side gives the corresponding scoring function:

$f_{r}(h, t) = \| h + r - t + μ \|_{l_{1}/l_{2}}$ (2)

$μ = ϕ + j_{1} + β + j_{2} - (λ + j_{3})$ (3)
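To make the step from Eq. (1) to Eqs. (2) and (3) concrete, the following sketch checks numerically that collecting the offsets into μ leaves the residual unchanged. It assumes ϕ, β, λ and the j's are vectors with the same dimension as the embeddings and draws the j's uniformly from (0.0001, 0.001), the range used in the experiments; the helper names are hypothetical.

```python
import random

def sample_j(dim, low=0.0001, high=0.001, rng=random):
    """Draw an error-adjustment vector with entries uniform in (low, high)."""
    return [rng.uniform(low, high) for _ in range(dim)]

def flexible_residual(h, r, t, phi, beta, lam, j1, j2, j3):
    """Left side minus right side of Eq. (1), element-wise."""
    return [(hi + p + a) + (ri + b + c) - (ti + l + d)
            for hi, ri, ti, p, b, l, a, c, d
            in zip(h, r, t, phi, beta, lam, j1, j2, j3)]

def mu_offset(phi, beta, lam, j1, j2, j3):
    """Eq. (3): mu = phi + j1 + beta + j2 - (lam + j3), element-wise."""
    return [p + a + b + c - (l + d)
            for p, b, l, a, c, d in zip(phi, beta, lam, j1, j2, j3)]

def eq2_residual(h, r, t, mu):
    """Vector inside the norm of Eq. (2): h + r - t + mu."""
    return [hi + ri - ti + m for hi, ri, ti, m in zip(h, r, t, mu)]

# Illustrative values only (not from the paper).
rng = random.Random(0)
h, r, t = [1.0, 2.0], [0.5, 0.0], [1.0, 1.0]
phi, beta, lam = [0.1, 0.0], [0.0, 0.1], [0.05, 0.05]
j1, j2, j3 = (sample_j(2, rng=rng) for _ in range(3))
mu = mu_offset(phi, beta, lam, j1, j2, j3)
```

Up to floating-point rounding, `flexible_residual(...)` and `eq2_residual(h, r, t, mu)` return the same vector, which is why a single offset μ suffices in the scoring function.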

On top of the relation matrix projection of TransR, we add the proposed flexible translation principle. In the relation space of r, given r and t_r, the head entity is represented by h_r + ϕ + j₁ (see Fig. 4(a)); given h_r and t_r, the relation is represented by r + β + j₂ (see Fig. 4(b)); and given h_r and r, the tail entity is represented by t_r + λ + j₃ (see Fig. 4(c)). The scoring function of TransR* is:

Fig. 4

Algorithmic idea of the TransR* model.

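$f_{r}(h, t) = \| h_{r} + r - t_{r} + μ \|_{l_{1}/l_{2}}$ (4)

where h_r = M_r h and t_r = M_r t are the projections of the head and tail entities into the relation space.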
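A minimal sketch of the TransR* scoring function in Eq. (4), written in plain Python rather than the PyTorch code used in the experiments. Because the experiments tune μ as a single value (e.g. μ = 0.1), the function also accepts a scalar μ and broadcasts it to every dimension; this broadcasting is our assumption, not something the text specifies.

```python
import math

def matvec(M, v):
    """Project v with matrix M (a list of rows)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v, p=1):
    """l1 norm (p=1) or l2 norm (p=2)."""
    if p == 1:
        return sum(abs(x) for x in v)
    return math.sqrt(sum(x * x for x in v))

def transr_star_score(h, r, t, M_r, mu, p=1):
    """Eq. (4): ||M_r h + r - M_r t + mu||; lower means more plausible."""
    if isinstance(mu, (int, float)):
        mu = [float(mu)] * len(r)  # assumption: broadcast a scalar mu
    h_r, t_r = matvec(M_r, h), matvec(M_r, t)  # entity space -> relation space
    return norm([a + b - c + m for a, b, c, m in zip(h_r, r, t_r, mu)], p)
```

With μ = 0 the function reduces to the TransR score, so the offset μ is exactly what relaxes the strict condition h_r + r = t_r.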

3.3 Model training

Constructing negative triples is an important part of model training. We use a probabilistic method that replaces the head or the tail entity with different probabilities depending on the relation type. Specifically, for an N-to-1 relation we replace the tail entity with higher probability, and for a 1-to-N relation we replace the head entity with higher probability. Since an entity has multiple attributes, replacing the tail entity of an N-to-1 relation lets the multiple attributes of the tail entity be fully trained, and replacing the head entity of a 1-to-N relation likewise lets the multiple attributes of the head entity be fully trained [17].

(1) Replacing head and tail entities with the probabilistic method

During training we compute two statistics for each relation: the average number of tail entities per head entity, tqh, and the average number of head entities per tail entity, hqt. We then sample from a Bernoulli distribution with parameter q = tqh / (tqh + hqt): when constructing a negative triple from a positive one, we replace the head entity with probability q and the tail entity with probability 1 − q. We choose the Bernoulli distribution for two benefits: first, it reduces the probability that a corrupted triple is in fact a positive triple (a false negative); second, it reduces the computational complexity.
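The statistics and the Bernoulli corruption step can be sketched as follows. This is an illustrative stdlib-only version, not the authors' code: triples are (head, relation, tail) tuples, statistics are computed per relation, the replacement entity is drawn uniformly from all other entities, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def relation_stats(triples):
    """Return {relation: (tqh, hqt)} from (head, relation, tail) triples."""
    tails_of = defaultdict(set)  # (relation, head) -> distinct tails
    heads_of = defaultdict(set)  # (relation, tail) -> distinct heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    tail_counts = defaultdict(list)
    head_counts = defaultdict(list)
    for (r, _), tails in tails_of.items():
        tail_counts[r].append(len(tails))
    for (r, _), heads in heads_of.items():
        head_counts[r].append(len(heads))
    return {r: (sum(tail_counts[r]) / len(tail_counts[r]),
                sum(head_counts[r]) / len(head_counts[r]))
            for r in tail_counts}

def head_replace_prob(tqh, hqt):
    """Bernoulli parameter q = tqh / (tqh + hqt)."""
    return tqh / (tqh + hqt)

def corrupt(triple, q, entities, rng=random):
    """Replace the head with probability q, otherwise the tail."""
    h, r, t = triple
    if rng.random() < q:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))
```

For a 1-to-N relation such as ("car", "has_part", x) with three distinct parts, tqh = 3 and hqt = 1, so q = 0.75 and the head is replaced three times out of four on average. A randomly chosen new head is unlikely to have exactly that part, which is why this choice produces fewer false negatives than replacing the tail.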

We stipulate that a relation r is 1-to-1 when tqh < 1.5 and hqt < 1.5; N-to-N when tqh ≥ 1.5 and hqt ≥ 1.5; 1-to-N when tqh ≥ 1.5 and hqt < 1.5; and N-to-1 when tqh < 1.5 and hqt ≥ 1.5 [26].
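The four relation categories can be assigned from the two averages with a small helper like the one below (a sketch with a hypothetical name; it treats a value of exactly 1.5 as "many", so every pair (tqh, hqt) falls into exactly one category).

```python
def relation_category(tqh, hqt, threshold=1.5):
    """Classify a relation by its average tails-per-head and heads-per-tail."""
    many_tails = tqh >= threshold  # one head links to many tails
    many_heads = hqt >= threshold  # one tail links to many heads
    if many_tails and many_heads:
        return "N-to-N"
    if many_tails:
        return "1-to-N"
    if many_heads:
        return "N-to-1"
    return "1-to-1"
```

The same averages also drive the Bernoulli sampler: a "1-to-N" result goes with q > 0.5 (replace the head more often), and an "N-to-1" result goes with q < 0.5.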

(2) The prior probability of the relationship

We use the prior probability of a relation to address relations with similar encodings: the more often a relation appears, the more likely it is that an entity pair (h, t) holds that relation. Given a candidate triple (h, r, t), we determine the prior probability of r by comparing it with the relation r′ that is most similar to r:

$p_{r}(r) = \frac{N_{r}}{N_{r} + N_{r'}}$ (5)

where N_r is the number of times relation r appears in the training set and N_r′ is the number of times the most similar relation r′ appears. During training, to separate positive triples from negative triples, we use the following margin-based loss as the optimization objective:

$L = \sum_{(h, r, t) \in S} \sum_{(h', r, t') \in S'} \max(f_{r}(h, t) + γ - f_{r}(h', t'), 0)$ (6)

Here S is the set of positive triples, S′ is the set of negative triples, max(x, y) returns the larger of x and y, and γ is the margin between the scores of positive and negative triples. The objective therefore separates positive triples from negative triples as far as possible.
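Equations (5) and (6) can be sketched directly in Python. This is a minimal illustration with hypothetical names: how the most similar relation r′ is found is not specified in the text, so the caller supplies it, and the loss follows the literal double sum over S × S′ (in practice each positive triple is usually paired only with its own corrupted triples).

```python
from collections import Counter

def relation_prior(train_triples, r, r_similar):
    """Eq. (5): p_r(r) = N_r / (N_r + N_r'), counted over the training set."""
    counts = Counter(rel for _, rel, _ in train_triples)
    n_r, n_sim = counts[r], counts[r_similar]
    return n_r / (n_r + n_sim)  # assumes at least one of the two relations occurs

def margin_loss(pos_scores, neg_scores, gamma):
    """Eq. (6): sum of max(f(pos) + gamma - f(neg), 0) over all pos/neg pairs."""
    return sum(max(p + gamma - n, 0.0) for p in pos_scores for n in neg_scores)
```

A pair contributes zero loss once the negative triple scores at least γ worse (higher) than the positive one, so training only pushes on pairs that are not yet separated by the margin.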

4 Experiment

The experiments were run in PyCharm on Linux, implemented in Python with PyTorch 1.5, on an i7-6700HQ CPU. To speed up training, we used a GTX 1080 GPU with 6 GB of memory.

4.1 Data set

We selected data sets that allow comparison with classic models such as TransE, TransH, TransR, Unstructured, SE, SME, LFM and NTN, namely the data sets commonly used for these models: two subsets of WordNet, WN18 and WN11 [23, 27], and two subsets of Freebase, FB15K and FB13 [18]. FB15K is regarded as a large data set because of its relatively large number of relations. The numbers of entities, relations and triples are shown in Table 2.

Table 2
Data set statistics

Data Set	#Entities	#Relationships	#Train	#Valid	#Test
WN18	40943	18	141442	5000	5000
FB15K	14951	1345	483142	50000	59071
WN11	38696	11	112581	2609	10544
FB13	75043	13	316232	5908	23733

4.2 Link prediction

Given one entity and the relation of a triple, the link prediction task is to predict the missing entity [19, 28]. For example, given the head entity “United States” and the relation “capital”, the model should predict the tail entity “Washington”. Link prediction can find knowledge missing from a knowledge graph and is an important means of knowledge graph completion.

Evaluation metrics. In link prediction, model quality is evaluated by the mean rank of the correct entities (Mean Rank) and the proportion of correct entities ranked in the top 10 (Hits@10) [18]. For each test triple, we replace its correct head or tail entity with every other entity in the knowledge graph to construct negative triples. The original triple and all negative triples are then scored and sorted; a higher position means the triple is more likely to be correct, and a lower position means it is less likely. A lower Mean Rank and a higher Hits@10 indicate better prediction.

When negative triples are constructed by replacing an entity of the correct triple with any other entity in the knowledge graph, some of the resulting triples may already exist in the knowledge graph. Such triples may rank above the original correct triple and distort the results, so they should be removed before ranking. The setting with this filtering is called “Filter”, and the setting without it is called “Raw”.
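The ranking procedure and both metrics can be sketched as follows, for any scoring function where a lower score means a more plausible triple (as with Eqs. (2) and (4)). The function names are hypothetical, and ties are resolved in favour of the true triple, which is one of several possible conventions.

```python
def rank_of_target(score_fn, h, r, t, entities, side="tail", known=None):
    """1-based rank of the true triple among all corruptions (lower score ranks higher).

    If `known` (a set of true triples) is given, corruptions that are themselves
    true triples are skipped ("Filter"); otherwise all are kept ("Raw").
    """
    true_score = score_fn(h, r, t)
    rank = 1
    for e in entities:
        cand = (h, r, e) if side == "tail" else (e, r, t)
        if cand == (h, r, t):
            continue  # the test triple itself
        if known is not None and cand in known:
            continue  # filtered: another correct triple
        if score_fn(*cand) < true_score:
            rank += 1  # ties count in favour of the true triple
    return rank

def mean_rank_and_hits(ranks, k=10):
    """Mean Rank and Hits@k (as a percentage) over a list of ranks."""
    mean_rank = sum(ranks) / len(ranks)
    hits = 100.0 * sum(1 for rk in ranks if rk <= k) / len(ranks)
    return mean_rank, hits
```

In a toy case where the true tail is outscored only by one wrong entity and by one other correct tail, the Raw rank is 3 but the Filter rank is 2, because the other correct tail no longer counts against the test triple.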

Experimental setup. We compare against representative models such as Unstructured, RESCAL, SE, SME, LFM, NTN, TransE, TransH and TransR. When reproducing these experiments, differences in parameter settings, random parameter initialization and experimental environment prevented us from obtaining the best results reported in the references; since the same data sets and metrics are used, we directly cite the best results reported there. To reduce the influence of chance, we repeated each task 10 times and report the average of the 10 runs. For TransR*, we selected the learning rate α from {0.0001, 0.001, 0.005, 0.01}, the margin γ from {1, 1.5, 2, 3, 3.5, 4}, the embedding dimension k from {50, 100, 150, 200}, the hyperparameter μ from {0, 0.1, 0.5, 1, 2}, the random numbers j₁, j₂, j₃ from (0.0001, 0.001), and the batch size B from {20, 50, 75, 120, 4800, 9600}. The best parameters were determined on the validation set.
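The hyperparameter search above amounts to a grid search. The sketch below enumerates the listed candidate values and keeps the configuration with the best validation score. The `validate` callback is a hypothetical stand-in for training TransR* and measuring a validation metric, and the random numbers j₁, j₂, j₃ are left out because they are sampled from a range rather than taken from a list.

```python
import itertools

# Candidate values listed in the text for link prediction with TransR*.
GRID = {
    "alpha": [0.0001, 0.001, 0.005, 0.01],  # learning rate
    "gamma": [1, 1.5, 2, 3, 3.5, 4],         # margin
    "k": [50, 100, 150, 200],                # embedding dimension
    "mu": [0, 0.1, 0.5, 1, 2],               # flexible-translation offset
    "B": [20, 50, 75, 120, 4800, 9600],      # batch size
}

def grid_configs(grid):
    """Yield every combination of the candidate values as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

def select_best(grid, validate):
    """Return the configuration with the highest validation score.

    `validate` is a hypothetical callback that trains a model with the given
    configuration and returns a validation metric (higher is better).
    """
    return max(grid_configs(grid), key=validate)
```

The full grid has 4 × 6 × 4 × 5 × 6 = 2880 configurations, which shows why the search is expensive when every configuration requires a full training run.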

Under the “unif” setting (head or tail replaced with equal probability), the best configuration is μ = 0.1, α = 0.0001, γ = 3.5, k = 50, B = 75 on WN18 and μ = 0.1, α = 0.001, γ = 3, k = 50, B = 50 on FB15K. Under the “bern” setting (the Bernoulli strategy of Section 3.3), the best configuration is the same: μ = 0.1, α = 0.0001, γ = 3.5, k = 50, B = 75 on WN18 and μ = 0.1, α = 0.001, γ = 3, k = 50, B = 50 on FB15K. On both data sets, training runs for 500 epochs over all training triples.

Experimental results. Table 3 shows that on WN18, both TransR* (unif) and TransR* (bern) outperform the other methods in Mean Rank and Hits@10; in filtered Hits@10, TransR* (bern) is 2.6 percentage points higher than TransR (94.6% vs. 92.0%). This may be because WN18 has relatively few relations, so it is reasonable to ignore the different relation types. On FB15K, TransR* (bern) is 2.9 percentage points higher than TransR in raw Hits@10 (51.1% vs. 48.2%).

Table 3
Link prediction experiment results

Data set	WN18	FB15K
Metric	Mean Rank	Hits@10/%	Mean Rank	Hits@10/%
	Raw	Filter	Raw	Filter	Raw	Filter	Raw	Filter
Unstructured [1]	315	304	35.3	38.2	1074	979	4.5	6.3
SE [2]	1011	985	68.5	80.5	273	162	28.8	39.8
SME(Linear) [3]	545	533	65.1	74.1	274	154	30.7	40.8
SME(Bilinear) [3]	526	509	54.7	61.3	284	158	31.3	41.3
LFM [4]	469	456	71.4	81.6	283	164	26.0	33.1
TransE [5]	263	251	75.4	89.2	243	125	34.9	47.1
STransH [17]	347	330	77.1	90.6	196	68	46.6	69.5
TransH [6]	401	388	73.0	82.3	212	87	45.7	64.4
TransR [13]	238	225	79.8	92.0	198	77	48.2	68.7
TransR*(unif)	224.0	212.3	81.1	94.4	190.5	62.6	47.5	67.6
TransR*(bern)	206.7	195.2	81.6	94.6	203.4	107.5	51.1	69.3

To show that TransR* has better expressive ability and handles complex relations better, we analyzed FB15K in depth and found that it contains 323 1-to-1 relations, 309 1-to-N relations, 390 N-to-1 relations and 323 N-to-N relations, which makes it a suitable large data set for this analysis. Using the optimal TransR* configuration on FB15K, we measured Hits@10 under each of the 1-to-1, 1-to-N, N-to-1 and N-to-N relation types. Table 4 shows that, for both Predicting Left and Predicting Right, one of the two TransR* variants achieves the highest Hits@10 in every relation category. In particular, TransR* (bern) reaches 92.8% on 1-to-N relations when predicting the left (head) entity and 93.2% on N-to-1 relations when predicting the right (tail) entity.

Table 4

Hits@10 (%) by relation category on FB15K

Metric	Predicting Left(Hits@10)				Predicting Right(Hits@10)
	1-to-1	1-to-N	N-to-1	N-to-N	1-to-1	1-to-N	N-to-1	N-to-N
Unstructured [1]	34.5	2.5	6.1	6.6	34.3	4.2	1.9	6.6
SE [2]	35.6	62.6	17.2	37.5	34.9	14.6	68.3	41.3
SME(linear) [3]	35.1	53.7	19.0	40.3	32.7	14.9	61.6	43.3
SME(Bilinear) [3]	30.9	69.6	19.9	38.6	28.2	13.1	76.0	41.8
TransE [5]	43.7	65.7	18.2	47.2	43.7	19.7	66.7	50.0
STransH [17]	76.7	88.2	35.8	68.1	73.6	42.4	85.2	70.6
TransH(unif) [6]	66.7	81.7	30.2	57.4	63.7	30.1	83.2	67.2
TransH(bern) [6]	66.8	87.6	28.7	64.5	65.5	39.8	83.3	67.2
TransR [13]	78.8	89.2	34.1	69.2	79.2	37.4	90.4	72.1
TransR*(unif)	76.4	85.9	48.0	69.3	75.2	48.4	85.1	72.8
TransR*(bern)	84.2	92.8	30.0	68.7	82.3	35.3	93.2	72.6

4.3 Triple classification

Triple classification determines whether a given triple (h, r, t) is correct, that is, it classifies the triple as “correct” or “wrong” [29]. First, for each relation r, we choose the classification threshold σ_r that maximizes classification accuracy on the validation set. Then, when testing a triple (h, r, t), we classify it as positive if its score f_r(h, t) is below σ_r (lower scores indicate more plausible triples, as in Eq. (6)) and as negative otherwise. During training, the validation set is used to monitor the model, and the triple classification accuracy is reported on the test set.
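The per-relation threshold search can be sketched as below. It assumes, consistent with the margin loss, that a lower score means a more plausible triple, and it tries every observed validation score as a candidate σ_r; the function names and the candidate-selection strategy are our illustration rather than the authors' code.

```python
def best_threshold(pos_scores, neg_scores):
    """Pick sigma maximizing validation accuracy, with score <= sigma -> positive."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_sigma, best_acc = None, -1.0
    total = len(pos_scores) + len(neg_scores)
    for sigma in candidates:
        correct = (sum(1 for s in pos_scores if s <= sigma)
                   + sum(1 for s in neg_scores if s > sigma))
        acc = correct / total
        if acc > best_acc:  # keep the smallest sigma among equally good ones
            best_sigma, best_acc = sigma, acc
    return best_sigma, best_acc

def relation_thresholds(valid_scores):
    """valid_scores maps relation -> (positive scores, negative scores)."""
    return {r: best_threshold(pos, neg)[0] for r, (pos, neg) in valid_scores.items()}

def classify(score, sigma):
    """True (positive) if the triple's score does not exceed its relation's threshold."""
    return score <= sigma
```

Fitting one threshold per relation lets relations whose scores naturally sit on different scales still be classified with a single scoring function.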

Evaluation metric. Triple classification uses accuracy (ACC) as the evaluation metric [30]; the higher the ACC, the better the model performs on this task. It is computed as:

$ACC = \frac{T_{p} + T_{n}}{N_{pos} + N_{neg}}$ (7)

where T_p is the number of positive triples predicted correctly, T_n is the number of negative triples predicted correctly, and N_pos and N_neg are the numbers of positive and negative triples in the evaluated (test) set.
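Eq. (7) is ordinary accuracy over positive and negative triples; a minimal sketch (hypothetical function name) that takes predicted and gold labels as booleans, with True meaning “correct triple”:

```python
def accuracy(pred, gold):
    """Eq. (7): ACC = (T_p + T_n) / (N_pos + N_neg)."""
    t_p = sum(1 for p, g in zip(pred, gold) if p and g)          # positives predicted correctly
    t_n = sum(1 for p, g in zip(pred, gold) if not p and not g)  # negatives predicted correctly
    n_pos = sum(1 for g in gold if g)
    n_neg = len(gold) - n_pos
    return (t_p + t_n) / (n_pos + n_neg)
```

For example, predictions [True, True, False, False] against gold labels [True, False, False, True] give T_p = 1 and T_n = 1 out of four triples, so ACC = 0.5.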

Experimental setup. For SGD training, we selected μ from {0, 0.1, 0.5, 1, 2}, the random numbers j₁, j₂, j₃ from (0.0001, 0.001), the learning rate α from {0.0001, 0.001, 0.01, 0.1}, the margin γ from {1, 2, 3, 3.5, 4, 4.5, 5, 10}, the dimension k of the entity and relation vectors from {18, 20, 50, 100, 200}, and the batch size B from {20, 50, 75, 120, 150, 480, 960, 4800}. The best configuration was determined by accuracy on the validation set. The best configurations are μ = 0.1, α = 0.0001, γ = 4.5, k = 18, B = 4800 on WN11; μ = 0.1, α = 0.0001, γ = 3.5, k = 50, B = 75 on FB13; and μ = 0.1, α = 0.0001, γ = 3, k = 50, B = 50 on FB15K. All three use l₁ as the distance measure.

Experimental results. Table 5 lists the triple classification results. On WN11, TransR* outperforms TransR, improving ACC by 0.3 percentage points (86.2% vs. 85.9%). On FB13, NTN performs best and TransR* scores lower than TransR (81.7% vs. 82.5%); this is because FB13 has only 13 relations, where NTN's tensor-based modeling has certain advantages. On FB15K, TransR* performs best, improving ACC by 13.2 percentage points over TransR (97.1% vs. 83.9%). FB15K has a large number of relations and is relatively sparse, and there NTN performs markedly worse than TransR*, which indicates that TransR* is better suited to large-scale knowledge graphs.

Table 5

Classification accuracy of triples of different models (%)

Metric	WN11	FB13	FB15K
SE [2]	53.0	75.2	—
SME(bilinear) [3]	73.8	84.3	—
NTN [7]	70.4	87.1	66.5(≈40 h)
TransE [5]	75.87	81.5	79.7(≈5 m)
STransH [17]	79.6	85.2	89.6(≈30 m)
TransH(unif) [6]	77.7	76.5	74.2(≈30 m)
TransH(bern) [6]	78.8	83.8	87.7(≈30 m)
TransR [13]	85.9	82.5	83.9(≈30 m)
TransR*(unif)	85.0	66.8	97.0(≈30 m)
TransR*(bern)	86.2	81.7	97.1(≈30 m)

5 Conclusion and future work

We propose TransR*, a representation learning model based on flexible translation and relation matrix projection, which mainly addresses the low ability of TransR to distinguish different but very similar entities. First, we separate entities and relations into different vector spaces; second, we put forward the principle of flexible translation to make the translation strategy more flexible. During training, we improve the quality of the generated negative examples by replacing semantically similar entities, and we use the prior probability of a relation to distinguish relations with similar encodings. Experimental results show that, compared with the classic TransR model, TransR* improves Mean Rank, Hits@10 and ACC on most of the evaluated data sets.

Because our model does not perform well on data sets with sparse relations, we plan to improve TransR* in future research. The literature suggests that on such data sets, similarity-based negative sampling improves the quality of negative triples, so we can optimize the generation of negative triples on the basis of the original model. Beyond link prediction and triple classification, we will also apply knowledge representation methods to extracting relational facts from text and to entity alignment.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (61966035), the Xinjiang Autonomous Region Department of Science and Technology International Cooperation Project (2020E01023), the Xinjiang Uygur Autonomous Region Postgraduate Innovation Project (XJ2019G072), and the Network Resource Management and Trust Evaluation Key Laboratory of Hunan, China (2016TP003). We thank all anonymous reviewers for their constructive comments.

References

1. Bordes, Glorot, Weston and Bengio, Joint learning of words and meaning representations for open-text semantic parsing, In Proceedings of AISTATS (2012), 127–135.

2. Bordes, Weston, Collobert and Bengio, Learning structured embeddings of knowledge bases, In Proceedings of the 25th AAAI Conference on Artificial Intelligence (2011).

3. Bordes, Glorot, Weston and Bengio, A semantic matching energy function for learning with multi-relational data, Machine Learning (2014), 1–27.

4. Jenatton, Le Roux, Bordes and Obozinski, A latent factor model for highly multi-relational data, In Proceedings of NIPS (2012), 3167–3175.

5. Bordes, Usunier, Garcia-Duran, Weston and Yakhnenko, Translating embeddings for modeling multi-relational data, In Advances in Neural Information Processing Systems, Curran Associates, Inc. (2013), 2787–2795.

6. Wang, Zhang, Feng and Chen, Knowledge graph embedding by translating on hyperplanes, In Proceedings of AAAI (2014), 1112–1119.

7. Socher, Chen, Manning and Ng, Reasoning With Neural Tensor Networks for Knowledge Base Completion, In Proceedings of NIPS (2013), 926–934.

8. Chang, Zhu, Gu, et al., Knowledge Graph Embedding by Dynamic Translation, IEEE Access (2017), 20898–20907.

9. Nickel, Tresp and Kriegel, A Three-Way Model for Collective Learning on Multi-Relational Data, In Proceedings of the International Conference on Machine Learning, Omnipress (2011), 809–816.

10. Yang, Yih, He, et al., Embedding Entities and Relations for Learning and Inference in Knowledge Bases, Computer Science (2014).

11. Nickel, Rosasco and Poggio, Holographic Embeddings of Knowledge Graphs, AAAI Conference on Artificial Intelligence (2015), 1955–1961.

12. Trouillon, Welbl, Riedel, Gaussier and Bouchard, Complex Embeddings for Simple Link Prediction, ICML (2016).

13. Lin, Zhang, Liu, Sun, Liu and Zhu, Learning Entity and Relation Embeddings for Knowledge Graph Completion, In Proceedings of AAAI (2015).

14. Zhu, Zhou, Zhang, et al., A neural translating general hyperplane for knowledge graph embedding, Journal of Computational Science (2019), 108–117.

15. , Li and Zhu, A knowledge map representation method based on improved vector projection distance, Computer Science 47(4) (2020), 189–193.

16. Chen, Wen, Zhang, et al., An improved TransE-based knowledge graph representation method, Computer Engineering 46(5) (2020), 63–69.

17. Xiaojun and Yang, STransH: An improved knowledge representation model based on translation model, Computer Science 46(9) (2019), 184–189.

18. Ji, He, Xu, et al., Knowledge Graph Embedding via Dynamic Mapping Matrix, In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing (2015).

19. Fan, Zhou, Chang, et al., Transition-based knowledge graph embedding with relational mapping properties, In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing (2014), 328–337.

20. Kargin and Petrenko, Knowledge Representation in Smart Rules Engine, 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT), Lviv: IEEE Press (2019), 231–236.

21. Lijuan, Jun and Wei, Multi-source Knowledge Embedding Research of Knowledge Graph, 2019 IEEE 3rd International Conference on Circuits, Systems and Devices (ICCSD) (2019), 163–166.

22. Bart, Joost and Marc, Safe inductions and their applications in knowledge representation, Artificial Intelligence (2018), 167–185.

23. Liu, Luan, Li and Wu, Linguistic Petri Nets Based on Cloud Model Theory for Knowledge Representation and Reasoning, IEEE Transactions on Knowledge and Data Engineering 30(4) (2018), 717–728.

24. Santra, Basu, Mandal and Goswami, Rough set based lattice structure for knowledge representation in medical expert systems: Low back pain management case study, Expert Systems with Applications (2020).

25. Xiao, Chunhong, Chenchen, Yang and Zheng, Distributed representation of knowledge graphs with subgraph-aware proximity, Theoretical Computer Science (2020), 48–56.

26. Kumarasinghe, Kasabov and Taylor, Deep learning and deep knowledge representation in Spiking Neural Networks for Brain-Computer Interfaces, Neural Networks (2020), 169–185.

27. Guo, Li, Hui, Meng, Ma, Liu, Wang, Zhai and Zhang, Knowledge Graph Embedding Preserving Soft Logical Regularity, In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Association for Computing Machinery (2020), 425–434.

28. Xu, Ruan, Korpeoglu, Kumar and Achan, Product Knowledge Graph Embedding for E-commerce, In Proceedings of the 13th International Conference on Web Search and Data Mining, Association for Computing Machinery (2020), 672–680.

29. Rosso, Yang and Cudré-Mauroux, Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction, In Proceedings of The Web Conference, Association for Computing Machinery (2020), 1885–1896.

30. Gao, Sun, Shan, Lin and Wang, Rotate3D: Representing Relations as Rotations in Three-Dimensional Space for Knowledge Graph Embedding, In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Association for Computing Machinery (2020), 385–394.


Table 2. Data set statistics

Data set   #Entities   #Relationships   #Train    #Valid   #Test
WN18       40943       18               141442    5000     5000
FB15K      14951       1345             483142    50000    59071
WN11       38696       11               112581    2609     10544
FB13       75043       13               316232    5908     23733
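One rough way to compare how sparse the relations are in these data sets is to divide the number of training triples by the number of relations. The short Python sketch below does this arithmetic with the figures from Table 2; treating "training triples per relation" as a sparsity measure is our assumption, and it ignores how unevenly triples are spread across individual relations. The names `TABLE_2` and `triples_per_relation` are hypothetical.

```python
# Figures copied from Table 2: (#entities, #relationships, #train)
TABLE_2 = {
    "WN18": (40943, 18, 141442),
    "FB15K": (14951, 1345, 483142),
    "WN11": (38696, 11, 112581),
    "FB13": (75043, 13, 316232),
}


def triples_per_relation(stats):
    """Average number of training triples available for each relation."""
    return {name: train / rels for name, (_, rels, train) in stats.items()}


if __name__ == "__main__":
    # Print data sets from sparsest to densest relations.
    for name, avg in sorted(triples_per_relation(TABLE_2).items(), key=lambda p: p[1]):
        print(f"{name:6s} {avg:10.1f}")
```

By this measure FB15K averages about 359 training triples per relation, versus roughly 7,900 to 24,300 for the other three data sets.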