Entity alignment with fusing relation representation

Abstract

Entity alignment is the task of identifying entities from different knowledge graphs (KGs) that point to the same item and is important for KG fusion. In the real world, due to the heterogeneity between different KGs, equivalent entities often have different relations around them, so it is difficult for a Graph Convolutional Network (GCN) to accurately learn the relation information in the KGs. To address the inadequate utilisation of relation information in entity alignment, a novel GCN-based model, joint Unsupervised Relation Alignment for Entity Alignment (URAEA), is proposed. The model first employs a novel method for calculating relation embeddings from entity embeddings, then constructs unsupervised seed relation alignments from these relation embeddings, and finally performs entity alignment together with relation alignment. In addition, the seed entity alignments are expanded based on the generated seed relation alignments. Experiments conducted on three real-world datasets show that this approach outperforms state-of-the-art methods.

Keywords

Entity alignment; graph convolutional network; relation alignment; knowledge graphs

1. Introduction

Knowledge graphs (KGs) structurally represent real-world facts in the form of triples $(h, r, t)$ , where h, r and t denote the head entity, relation, and tail entity, respectively, and the triples in KGs can intuitively and efficiently describe facts. In recent years, KGs have played an important role in artificial intelligence tasks, such as intelligent question answering [12], information extraction [6], and recommendation systems [28]. As the scale of applications supported by KGs grows, it is often difficult for a single KG to meet the needs of these applications. An effective solution is entity alignment. Entity alignment aims to discover equivalent entities in different KGs and then fuse different KGs with these entities. Since different KGs often come from various data sources and have different linguistic forms, such as Freebase [1] and YAGO [16], these KGs have different structural forms (i.e., heterogeneity), which presents challenges for the entity alignment task.

In recent years, many researchers have used embedding-based methods for entity alignment. These methods embed KGs into a low-dimensional, continuous vector space and discover possible aligned entities by measuring the similarity of entity embeddings. Translation Embedding (TransE) [2] is one of the earliest KG embedding methods; it treats a relation as the translation from the head entity to the tail entity. Several TransE-based entity alignment methods, such as Multilingual TransE (MTransE) [4] and Joint Attribute-Preserving Embedding (JAPE) [17], have been developed. More recently, Graph Neural Networks (GNNs) have received much attention due to their excellent performance in processing graph data [26]. A GNN learns node representations by aggregating the neighbourhood information of each node and has been utilised for entity alignment; examples include GCN-based Entity Alignment (GCN-Align) [22], the Relation-aware Dual-Graph Convolutional Network (RDGCN) [23] and the KG Alignment Network (AliNet) [20].

Recent entity alignment studies have mainly used the Graph Convolutional Network (GCN) [8]. GCNs are convenient for modelling the node features of graphs but have difficulty capturing the relation information contained in the graphs [5]. According to Highway-GCN (HGCN) [24], entities and relations in knowledge graphs are usually closely related, and rich relation information helps to improve the performance of entity alignment. For example, Fig. 1 shows two subgraphs from the Chinese and English versions of DBpedia [10]. If only the surrounding entities are considered, it is difficult to align $v_{s}$ and $v_{t}$ because of their nonequivalent neighbourhoods. However, if the relations around them, such as $r_{1}$ and $r_{6}$ , are also considered, then, since relations are tied to their head and tail entities, $v_{1}$ and $v_{6}$ can be aligned with the aid of relation alignment.

Fig. 1.

Illustration of the importance of relations for entity alignment. The entity pairs $⟨ v_{1}, v_{6} ⟩$ and $⟨ v_{2}, v_{7} ⟩$ have been aligned, and $⟨ v_{s}, v_{t} ⟩$ is the entity pair that should be aligned.

RDGCN [23] constructs a dual relation graph and captures relation information via the interaction between the primal graph and the dual relation graph. However, this method may introduce noise from the dual graph. HGCN [24] uses entity embeddings to approximate relation embeddings and then incorporates these relation embeddings into entities to learn better embeddings for both. However, due to the heterogeneity of KGs, the equivalent entities in two KGs usually have different neighbourhoods. This may make entity alignment challenging since the equivalent entities aggregate different relations. As shown in Fig. 1, the relations around $v_{s}$ and $v_{t}$ are mostly different, so the two entities are difficult to align because they incorporate distinct relation information. Relation-aware Neighborhood Matching (RNM) [31] uses the semantic information and mapping properties of relations and has achieved good entity alignment results. However, RNM ignores relation information when learning entity representations.

This paper uses entity embeddings to represent relations and then performs relation alignment. Essentially, this aligns entity embeddings by utilising the relation information. As a result, the entity embeddings contain only the relation information in the seed relation pairs, which effectively mitigates the effects of knowledge graph heterogeneity. Consequently, the more reliable the seed relation pairs are, the better the model performs.

In this paper, a joint Unsupervised Relation Alignment for Entity Alignment model, named URAEA, is proposed, which aims to effectively utilise relation information for entity alignment. The entities connected to a relation are weighted to obtain more accurate relation embeddings. Then, entity alignment and relation alignment are performed together so that relation information can be learned, and a strategy is adopted to iteratively expand the seed entity alignments. Previously developed methods such as Bootstrapping Entity Alignment (BootEA) [18] and Degree-Aware Alignment for Entities in Tail (DAT) [27] also expand seed entity alignments, but they ignore the relations between entities. In this paper, more accurate entity pairs are identified by using the seed relation alignments obtained in relation alignment. Experiments are conducted on three cross-lingual datasets, and URAEA achieves the best results among the compared methods.

The main contributions of this paper are as follows:

When using entity embeddings to calculate relation embeddings, this paper proposes a weighting equation that considers information about the weights of entities.

To mitigate the effects of knowledge graph heterogeneity, this paper unites relation alignment with entity alignment.

The relations around the entities are considered when expanding the seed entity pairs to obtain the entity pairs that are more likely to be aligned.

2. Related work

Recent entity alignment methods are mainly divided into translation-based methods and GNN-based methods. These methods are reviewed in this section.

2.1. Translation-based methods

Many translation-based entity alignment methods adopt TransE [2], which treats relations as translations from head entities to tail entities. MTransE [4] was the first entity alignment method to utilise the idea of TransE. It uses TransE to first embed each KG, then employs three techniques that can map two KGs into the same vector space, and finally aligns the entities in different KGs. Iterative PTransE (IPTransE) [30] utilises Path-based TransE (PTransE) [11] to embed each KG and then transforms the embeddings of two KGs into a unified vector space based on seed entity alignments. Furthermore, IPTransE employs a strategy to iteratively generate newly aligned entities. BootEA [18] proposes a bootstrapping approach that labels likely entity pairs as training data by optimising a global objective under a one-to-one mapping constraint, and BootEA also uses a new sampling method called truncated uniform negative sampling. Semi-supervised Entity Alignment (SEA) [14] proposes a semi-supervised entity alignment strategy that uses labelled entities and a large number of unlabelled entities for alignment. Edge-centric Translational Embedding (TransEdge) [19] represents relations according to their context (head and tail entities) and then uses a parameter sharing strategy to unify two KGs. Some methods use additional KG information. JAPE [17] embeds two KGs into a unified vector space and then optimises the KG representations by using the attribute information. Multi-view KG Embedding (MultiKE) [29] learns entity embeddings for entity alignment through three types of information: entity names, relations, and attributes.

Following IPTransE and BootEA, the proposed model expands the seed entity alignments to achieve enhanced entity alignment performance. In contrast to these methods, in this paper, the relations between entities are considered when matching them.

2.2. GNN-based methods

The more similar the neighbourhoods of two entities are, the more likely the two entities are to align. GNN can model the neighbourhood information of entities, and therefore, an increasing number of researchers are utilising GNN for entity alignment.

GCN-Align [22] was the first approach to use GCN [8] for entity alignment. It uses GCN to embed entities from two KGs into a unified vector space. RDGCN [23] constructs a dual relation graph to learn relation information. HGCN [24] uses the highway gate strategy, which can control noise propagation to jointly learn both entity and relation representations. Multi-channel GNN (MuGNN) [3] learns KG embeddings through multiple channels, and each channel encodes KGs via different relation weighting schemes. AliNet [20] showed that equivalent entities tend to have different neighbourhoods, so AliNet uses Graph Attention Network (GAT) [21] with a gating mechanism to aggregate multihop neighbours to extend the overlaps among the neighbourhood structures between equivalent entities. Structure and Semantics Preserving (SSP) [13] learns entity representations in a coarse-to-fine manner by exploiting the global structures and local semantics of KGs. Neighborhood Matching Network (NMN) [25] evaluates the similarity between entities by using a novel graph sampling method and a cross-graph neighbourhood matching module to overcome the challenge regarding the structural heterogeneity between KGs. RNM [31] proposes a relation-aware neighbourhood matching model, which considers the neighbouring entities and the connected relations between entities when matching subgraphs.

The HGCN and RDGCN take relation information into consideration when modelling entity embeddings, but some noise from neighbours may be introduced. In this paper, a novel entity alignment model to reduce noise propagation while learning relations is proposed. In addition, the iterative strategy of the RNM is used to achieve better performance.

3. Problem definition

A KG is formalized as $G = (E, R, T)$ , where E denotes the set of entities, R denotes the set of relations, and $T \subseteq E \times R \times E$ denotes the set of triples. A KG consists of relation triples $(h, r, t)$ , where $h, t \in E$ and $r \in R$ .

Given two KGs $G_{1} = (E_{1}, R_{1}, T_{1})$ and $G_{2} = (E_{2}, R_{2}, T_{2})$ , the pre-aligned entity pairs are denoted as $L_{e} = \{(e_{1}, e_{2}) \mid e_{1} \in E_{1}, e_{2} \in E_{2}, e_{1} \equiv e_{2}\}$ . Here, ≡ represents equivalence, that is, entity $e_{1}$ and entity $e_{2}$ point to the same thing.

Entity alignment aims to discover newly aligned entity pairs between two KGs based on these pre-aligned entity pairs. In this paper, bold letters are used to denote vectors.

4. Our model

In this section, the URAEA is described in detail. Figure 2 shows the overall structure of URAEA. The URAEA is mainly divided into four stages: preliminary entity alignment, relation alignment, relation-aware entity pair expansion, and joint entity alignment and relation alignment.

Fig. 2.

Overall architecture of the URAEA. The blue dashed lines denote preliminary entity alignment, and the blue solid lines denote joint entity alignment and relation alignment.

In the first stage, the GCN is used to learn the entity embeddings in two KGs with the objective of entity alignment. In the second stage, the relation embeddings are calculated by using our proposed method; then, unsupervised seed relation alignments are generated via relation embeddings, and these seed relation alignments are used as the training data for relation alignment. In the third stage, the seed entity pairs are expanded based on the generated seed relation alignments. In the fourth stage, the two KGs are trained with the goals of entity alignment and relation alignment.

4.1. Preliminary entity alignment

4.1.1. Entity embedding

In this paper, a 2-layer GCN is used to model the structural information of KGs. Two given KGs $G_{1} = (E_{1}, R_{1}, T_{1})$ and $G_{2} = (E_{2}, R_{2}, T_{2})$ are put into one KG $G_{a} = (E_{a}, R_{a}, T_{a})$ , which is the input of the GCN. $E_{a} = E_{1} \cup E_{2}$ , $R_{a} = R_{1} \cup R_{2}$ , and $T_{a} = T_{1} \cup T_{2}$ . For each triple $(h, r, t) \in T_{a}$ , let $A [h] [t] = 1$ , where A is the adjacency matrix of $G_{a}$ .

The GCN takes the embeddings of all entities $X^{(l)} = \{x_{1}^{(l)}, x_{2}^{(l)}, x_{3}^{(l)}, \dots, x_{n}^{(l)} \mid x_{i}^{(l)} \in \mathbb{R}^{d^{(l)}}\}$ as the input of layer l, where n denotes the number of entities in $E_{a}$ and $d^{(l)}$ denotes the dimension of the entity embeddings in layer l. The entity embeddings in layer $l + 1$ are obtained as: $\begin{matrix} (1) & X^{(l + 1)} = \mathrm{ReLU} ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} X^{(l)} W^{(l)}), \end{matrix}$ where $\tilde{A} = A + I$ , A is the adjacency matrix of $G_{a}$ , I denotes an identity matrix, $\tilde{D}$ is the diagonal degree matrix of the nodes in $G_{a}$ , and $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l + 1)}}$ is the trainable weight matrix.

Furthermore, the highway gates strategy [15] can control the propagation of noise in the GCN. This strategy is adopted in each layer of the GCN: $\begin{aligned} (2) & T (X^{(l)}) = σ (X^{(l)} W_{T}^{(l)} + b_{T}^{(l)}), \\ (3) & X^{(l + 1)} = T (X^{(l)}) \cdot X^{(l + 1)} + (1 - T (X^{(l)})) \cdot X^{(l)}, \end{aligned}$ where σ is the sigmoid function, $W_{T}^{(l)}$ and $b_{T}^{(l)}$ are the weight matrix and bias vector of gate $T (X^{(l)})$ respectively, and · denotes the element-wise multiplication.
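Eqs. (1)-(3) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the degrees in $\tilde{D}$ are assumed to be the row sums of $\tilde{A}$, and the input and output dimensions are assumed equal so that the highway mix in Eq. (3) is well defined.

```python
import numpy as np

def normalized_adjacency(triples, n):
    """Return D^{-1/2} (A + I) D^{-1/2} for index triples (h, r, t)."""
    A = np.zeros((n, n))
    for h, _, t in triples:
        A[h, t] = 1.0                      # A[h][t] = 1, as in Section 4.1.1
    A_tilde = A + np.eye(n)
    deg = A_tilde.sum(axis=1)              # assumption: row-sum degrees (>= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_gcn_layer(A_hat, X, W, W_T, b_T):
    """One GCN layer followed by a highway gate."""
    H = np.maximum(A_hat @ X @ W, 0.0)     # Eq. (1): ReLU(A_hat X W)
    T = sigmoid(X @ W_T + b_T)             # Eq. (2): transform gate
    return T * H + (1.0 - T) * X           # Eq. (3): element-wise mix
```

When the gate is close to zero the layer passes $X^{(l)}$ through almost unchanged, which is how the highway gate limits the propagation of noisy neighbourhood information.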

4.1.2. Entity alignment

After the representations of all entities are obtained, the URAEA determines whether the entities are aligned by measuring the distance between them. For two entities $e_{1} \in E_{1}$ and $e_{2} \in E_{2}$ , the distance between them is calculated as follows: $\begin{matrix} (4) & d (e_{1}, e_{2}) = ‖ x_{e_{1}} - x_{e_{2}} ‖_{L 1}, \end{matrix}$ where $x_{e_{1}}$ and $x_{e_{2}}$ denote the embeddings of entities $e_{1}$ and $e_{2}$ , respectively. A smaller distance denotes a higher probability of alignment between $e_{1}$ and $e_{2}$ .

4.1.3. Training

In this paper, the margin-based loss function is used as the training objective to make the distance of aligned entity pairs as small as possible and the distance of negative samples as large as possible: $\begin{matrix} (5) & L_{e} = \sum_{(p, q) \in L_{e}} \sum_{(p^{'}, q^{'}) \in L_{e}^{'}} \max \{0, d (p, q) - d (p^{'}, q^{'}) + γ_{e}\}, \end{matrix}$ where, under the summation, $L_{e}$ denotes the set of pre-aligned entity pairs, $L_{e}^{'}$ denotes the set of negative samples obtained from $L_{e}$ , and $γ_{e} > 0$ is the margin.

The model uses nearest neighbour sampling [9] to obtain negative samples. Given an entity pair $⟨ e_{1}, e_{2} ⟩$ , the k nearest entities of $e_{1}$ (or $e_{2}$ ) are sampled from the set of entities $E_{a}$ according to Eq. (4). Then, $e_{2}$ (or $e_{1}$ ) is replaced to obtain negative samples $L_{e}^{'}$ .
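The loss in Eq. (5) with nearest neighbour negatives can be sketched as follows. This is an illustrative reading rather than the authors' code: for a pair $⟨ e_{1}, e_{2} ⟩$ it is assumed that the k entities nearest to $e_{1}$ replace $e_{2}$ (and vice versa), and `margin_loss` is a hypothetical name.

```python
import numpy as np

def margin_loss(X, pairs, k, gamma):
    """Eq. (5) with hard negatives chosen by L1 distance, Eq. (4)."""
    all_ids = np.arange(X.shape[0])
    total = 0.0
    for p, q in pairs:
        pos = np.abs(X[p] - X[q]).sum()            # d(p, q)
        for anchor, other in ((p, q), (q, p)):
            # candidates: every entity except the pair itself
            cands = all_ids[(all_ids != anchor) & (all_ids != other)]
            d = np.abs(X[cands] - X[anchor]).sum(axis=1)
            neg_d = np.sort(d)[:k]                 # k nearest = hardest negatives
            total += np.maximum(0.0, pos - neg_d + gamma).sum()
    return total
```

Negatives that are already farther than the positive pair by at least the margin contribute nothing, so the gradient concentrates on entities that are easily confused with the true counterpart.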

4.2. Relation alignment

4.2.1. Relation embedding

The semantics of a relation are related to its connected entities. Previously developed approaches, such as HGCN and RNM, use the embeddings of head entities and tail entities to represent relations: $r = f (h^{r}, t^{r})$ , where $h^{r}$ and $t^{r}$ denote the averaged embeddings of head entities and tail entities linked to the relation r, respectively. However, this method ignores the weight of each entity when representing relations. Generally, if an entity appears in more triples with the relation, then it should have more weight. Figure 3 shows the calculation of relation embedding.

Fig. 3.

Calculation of relation embedding.

Given a relation $r \in R_{a}$ , its connected head entities are denoted as $H^{r} = \{h_{1}^{r}, h_{2}^{r}, \dots, h_{u}^{r} \mid h_{i}^{r} \in E_{a}\}$ , and its connected tail entities are denoted as $T^{r} = \{t_{1}^{r}, t_{2}^{r}, \dots, t_{v}^{r} \mid t_{i}^{r} \in E_{a}\}$ , where u is the number of distinct head entities, v is the number of distinct tail entities, and $h_{i}^{r}$ and $t_{i}^{r}$ denote the $i$-th head entity and tail entity of relation r, respectively. The URAEA first calculates the weighted embeddings of the head entities and tail entities linked to relation r. The weighted embedding of the head entities is calculated as follows: $\begin{matrix} (6) & h^{r} = (\sum_{i = 1}^{u} (f_{i}^{h} h_{i}^{r})) / \sum_{i = 1}^{u} f_{i}^{h}, \end{matrix}$ and the weighted embedding of the tail entities is calculated as follows: $\begin{matrix} (7) & t^{r} = (\sum_{i = 1}^{v} (f_{i}^{t} t_{i}^{r})) / \sum_{i = 1}^{v} f_{i}^{t}, \end{matrix}$ where $f_{i}^{h}$ denotes the number of triples consisting of the head entity $h_{i}^{r}$ and the relation r, $f_{i}^{t}$ denotes the number of triples consisting of the tail entity $t_{i}^{r}$ and the relation r, and, inside the sums, $h_{i}^{r}$ and $t_{i}^{r}$ stand for the embeddings of head entity $h_{i}^{r}$ and tail entity $t_{i}^{r}$ , respectively.

After obtaining $h^{r}$ and $t^{r}$ , the embedding of relation r is calculated with the following equation: $\begin{matrix} (8) & r = \mathrm{concat} [h^{r}, t^{r}], \end{matrix}$ where $r \in \mathbb{R}^{2 \times d^{(l)}}$ , and concat denotes the concatenation operation.

Furthermore, because of the small number of relations in KGs, the reverse relations are added. For example, for a triple (China, $Capital$ , Beijing), there exists another triple (Beijing, ${Capital}^{- 1}$ , China), where ${Capital}^{- 1}$ is the reverse relation of $Capital$ . For the first KG $G_{1}$ , the reverse relations are denoted as $R_{1}^{- 1} = \{r_{i}^{- 1} \mid r_{i} \in R_{1}\}$ , and the reverse relations in the second KG $G_{2}$ are denoted as $R_{2}^{- 1} = \{r_{i}^{- 1} \mid r_{i} \in R_{2}\}$ , where $r_{i}^{- 1}$ denotes the reverse relation of $r_{i}$ . The reverse relation $r^{- 1}$ is represented as follows: $\begin{matrix} (9) & r^{- 1} = \mathrm{concat} [t^{r}, h^{r}] . \end{matrix}$

Finally, the representations of all relations $Y^{(l)} = \{y_{1}^{(l)}, y_{2}^{(l)}, y_{3}^{(l)}, \dots, y_{m}^{(l)} \mid y_{i}^{(l)} \in \mathbb{R}^{2 \times d^{(l)}}\}$ are obtained, where $m = 2 \times | R_{a} |$ and $d^{(l)}$ denotes the dimension of the entity embeddings in layer l.
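The frequency-weighted relation embedding of Eqs. (6)-(9) can be sketched in NumPy as below. The function name is hypothetical, triples are assumed to be integer index tuples (h, r, t), and the relation is assumed to occur in at least one triple.

```python
import numpy as np
from collections import Counter

def relation_embedding(triples, X, r):
    """Return (embedding of r, embedding of its reverse relation)."""
    # f_i^h and f_i^t: how often each entity occurs as head / tail of r
    head_counts = Counter(h for h, rel, t in triples if rel == r)
    tail_counts = Counter(t for h, rel, t in triples if rel == r)

    def weighted_mean(counts):
        ids = np.array(list(counts.keys()))
        w = np.array(list(counts.values()), dtype=float)
        return (w[:, None] * X[ids]).sum(axis=0) / w.sum()

    h_r = weighted_mean(head_counts)               # Eq. (6)
    t_r = weighted_mean(tail_counts)               # Eq. (7)
    forward = np.concatenate([h_r, t_r])           # Eq. (8)
    reverse = np.concatenate([t_r, h_r])           # Eq. (9)
    return forward, reverse
```

Compared with a plain average, an entity that appears in many triples of the relation pulls the relation embedding more strongly towards itself.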

4.2.2. Generating seed relation alignments

The proposed model first calculates the distance between the relations in $R_{1} \cup R_{1}^{- 1}$ and the relations in $R_{2} \cup R_{2}^{- 1}$ to obtain the matrix $S^{r} \in \mathbb{R}^{(2 | R_{1} |) \times (2 | R_{2} |)}$ , where the element $\mathrm{sim}_{ij}^{r}$ in matrix $S^{r}$ represents the distance between $r_{i} \in R_{1} \cup R_{1}^{- 1}$ and $r_{j} \in R_{2} \cup R_{2}^{- 1}$ . Following HGCN [24], for two relations from different KGs, the more aligned entities there are among the entities connected to the two relations, the more likely the two relations are to have the same meaning.

Given a relation $r_{i} \in R_{a}$ , the head and tail entities connected to $r_{i}$ are denoted as $HT_{r_{i}} = \{h, t \mid (h, r_{i}, t) \in T_{a}\}$ . For the reverse relation $r_{i}^{- 1}$ , the head and tail entities connected to $r_{i}^{- 1}$ are defined as the same set, $HT_{r_{i}^{- 1}} = \{h, t \mid (h, r_{i}, t) \in T_{a}\}$ . The aligned entities in $HT_{r_{i}}$ and $HT_{r_{j}}$ are denoted as $P_{r_{i} r_{j}}^{r} = \{(e_{i}, e_{j}) \mid e_{i} \in HT_{r_{i}}, e_{j} \in HT_{r_{j}}, (e_{i}, e_{j}) \in L_{e}\}$ . The distance between $r_{i}$ and $r_{j}$ is calculated as follows: $\begin{matrix} \mathrm{sim}_{ij}^{r} = ‖ y_{r_{i}} - y_{r_{j}} ‖_{L 1} - α \frac{| P_{r_{i} r_{j}}^{r} |}{| HT_{r_{i}} | + | HT_{r_{j}} |}, \end{matrix}$ where $y_{r_{i}}$ and $y_{r_{j}}$ denote the embeddings of $r_{i}$ and $r_{j}$ , α is a hyperparameter, and a smaller $\mathrm{sim}_{ij}^{r}$ means that $r_{i}$ and $r_{j}$ are closer.
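As a small sketch of this relation distance, the following assumes that the seed entity alignments are given as a set of (source, target) index pairs; the helper names are hypothetical.

```python
import numpy as np

def connected_entities(triples, r):
    """HT_r: all head and tail entities that occur with relation r."""
    return {e for h, rel, t in triples if rel == r for e in (h, t)}

def relation_distance(y_i, y_j, ht_i, ht_j, seed_entity_pairs, alpha):
    """L1 distance of the embeddings minus a bonus for aligned neighbours."""
    # |P|: seed pairs with one side in HT_{r_i} and the other in HT_{r_j}
    aligned = sum(1 for e_i, e_j in seed_entity_pairs
                  if e_i in ht_i and e_j in ht_j)
    overlap = aligned / (len(ht_i) + len(ht_j))
    return float(np.abs(y_i - y_j).sum() - alpha * overlap)
```

The second term lowers the distance of relation pairs whose connected entities are already known to be aligned, so structural evidence can compensate for embeddings that are still imprecise early in training.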

Afterwards, relation pairs are selected as unsupervised seed relation alignments based on the matrix $S^{r}$ . In this paper, a method similar to DAT [27] is adopted for the selection process. Specifically, for a relation $r_{i} \in R_{1} \cup R_{1}^{- 1}$ , the model looks for the smallest value $\mathrm{sim}_{ij}^{r}$ (corresponding to the relation $r_{j} \in R_{2} \cup R_{2}^{- 1}$ ) and the second smallest value $\mathrm{sim}_{ij'}^{r}$ in the $i$-th row of $S^{r}$ ; the difference between them is denoted as $Δ_{1} = | \mathrm{sim}_{ij}^{r} - \mathrm{sim}_{ij'}^{r} |$ . Suppose that the smallest value in the $j$-th column of $S^{r}$ is exactly $\mathrm{sim}_{ij}^{r}$ and that the second smallest value is $\mathrm{sim}_{i'j}^{r}$ ; the difference between them is denoted as $Δ_{2} = | \mathrm{sim}_{ij}^{r} - \mathrm{sim}_{i'j}^{r} |$ . When $Δ_{1} + Δ_{2} > θ$ , where θ is a given threshold, $r_{i}$ and $r_{j}$ are considered aligned and are added to the seed relation alignments. If the current relation pair conflicts with the already selected relation pairs, the current relation pair is dropped. Finally, the seed relation alignments $L_{r} = \{(r_{i}, r_{j}) \mid r_{i} \in R_{1} \cup R_{1}^{- 1}, r_{j} \in R_{2} \cup R_{2}^{- 1}, r_{i} \equiv r_{j}\}$ are obtained.
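The selection rule can be sketched as follows, assuming a distance matrix with at least two rows and two columns. The name `select_seed_pairs` is hypothetical, and treating "conflict" as reuse of an already selected row or column is an assumption of this sketch.

```python
import numpy as np

def select_seed_pairs(S, theta):
    """Pick (i, j) where S[i, j] is both row and column minimum
    and the two margins to the runners-up sum to more than theta."""
    S = np.asarray(S, dtype=float)
    selected, used_rows, used_cols = [], set(), set()
    for i in range(S.shape[0]):
        j = int(np.argmin(S[i]))
        if int(np.argmin(S[:, j])) != i:
            continue                                # not the column minimum
        row_sorted = np.sort(S[i])
        col_sorted = np.sort(S[:, j])
        delta1 = row_sorted[1] - row_sorted[0]      # margin within row i
        delta2 = col_sorted[1] - col_sorted[0]      # margin within column j
        if delta1 + delta2 > theta and i not in used_rows and j not in used_cols:
            selected.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return selected
```

A larger θ keeps only pairs that are clearly separated from their runners-up, which yields fewer but more reliable seeds.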

4.2.3. Relation alignment

The model determines whether two relations are aligned by measuring the distance between the embeddings of the relations. For relations $r_{1} \in R_{1} \cup R_{1}^{- 1}$ and $r_{2} \in R_{2} \cup R_{2}^{- 1}$ , the distance between them is calculated as follows: $\begin{matrix} (10) & d (r_{1}, r_{2}) = ‖ y_{r_{1}} - y_{r_{2}} ‖_{L 1} . \end{matrix}$ A smaller $d (r_{1}, r_{2})$ means that the two relations are closer.

4.2.4. Training

Similar to entity alignment, the negative samples $L_{r}^{'}$ of the seed relation alignments are obtained by using nearest neighbour sampling, and the margin-based loss function is then used as follows: $\begin{matrix} (11) & L_{r} = \sum_{(p, q) \in L_{r}} \sum_{(p^{'}, q^{'}) \in L_{r}^{'}} \max \{0, d (p, q) - d (p^{'}, q^{'}) + γ_{r}\}, \end{matrix}$ where, under the summation, $L_{r}$ denotes the seed relation alignments, $L_{r}^{'}$ denotes the set of negative samples of $L_{r}$ , and $γ_{r} > 0$ is the margin.

4.3. Relation-aware entity pair expansion

Based on the seed relation alignments $L_{r}$ , the seed entity alignments are expanded iteratively. Specifically, the distance from $E_{1}$ to $E_{2}$ is first calculated to obtain the matrix $S^{e}$ , where the element $\mathrm{sim}_{ij}^{e}$ in $S^{e}$ represents the distance between $e_{i} \in E_{1}$ and $e_{j} \in E_{2}$ . In this paper, the model assumes that the more relations and entities around two entities are aligned, the more similar these two entities are. Therefore, in addition to using entity embeddings to measure the distance between two entities, the relations and entities around them are also considered. Based on the seed entity alignments and seed relation alignments, the distance between $e_{i}$ and $e_{j}$ is calculated as follows: $\begin{matrix} \mathrm{sim}_{ij}^{e} = ‖ x_{e_{i}} - x_{e_{j}} ‖_{L 1} - β \frac{| P_{e_{i} e_{j}}^{e} |}{| RE_{e_{i}} | + | RE_{e_{j}} |}, \end{matrix}$ where β is a hyperparameter, $RE_{e_{i}} = \{(r, t), (r^{- 1}, h) \mid (e_{i}, r, t) \in T_{a}, (h, r, e_{i}) \in T_{a}\}$ denotes the relations and entities around $e_{i}$ , and $P_{e_{i} e_{j}}^{e} = \{(r_{i}, r_{j}, t_{i}, t_{j}) \mid (r_{i}, t_{i}) \in RE_{e_{i}}, (r_{j}, t_{j}) \in RE_{e_{j}}, (r_{i}, r_{j}) \in L_{r}, (t_{i}, t_{j}) \in L_{e}\}$ denotes the aligned relation pairs and entity pairs around $e_{i}$ and $e_{j}$ .
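A minimal sketch of this relation-aware entity distance is given below. The names are hypothetical, a reverse relation $r^{- 1}$ is encoded as the tuple `("inv", r)` purely for illustration, and the seed alignments are assumed to be sets of pairs.

```python
import numpy as np

def neighbourhood(triples, e):
    """RE_e: (relation, neighbour) pairs around entity e."""
    out = set()
    for h, r, t in triples:
        if h == e:
            out.add((r, t))                 # outgoing edge (e, r, t)
        if t == e:
            out.add((("inv", r), h))        # incoming edge, reverse relation
    return out

def entity_distance(x_i, x_j, re_i, re_j, seed_rel, seed_ent, beta):
    """L1 embedding distance minus a bonus for matched (relation, neighbour) pairs."""
    matched = sum(1 for r_i, t_i in re_i for r_j, t_j in re_j
                  if (r_i, r_j) in seed_rel and (t_i, t_j) in seed_ent)
    denom = len(re_i) + len(re_j)
    bonus = matched / denom if denom else 0.0
    return float(np.abs(x_i - x_j).sum() - beta * bonus)
```

A neighbour only counts when both its relation and its entity are aligned, which is stricter than counting aligned neighbouring entities alone.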

Afterwards, the method described in Section 4.2.2 is used to select the newly aligned entity pairs and those entity pairs are added to the seed entity alignments $L_{e}$ .

4.4. Joint entity alignment and relation alignment

In this paper, the URAEA performs relation alignment together with entity alignment by using the following objective function: $\begin{matrix} (12) & L_{a l l} = L_{e} + ε L_{r}, \end{matrix}$ where ε is the hyperparameter used to balance the entity alignment loss and the relation alignment loss. Adaptive Moment Estimation (Adam) [7] is used to minimize the objective function $L_{a l l}$ .

In this paper, the model first uses Eq. (5) for preliminary entity alignment to obtain entity embeddings. Next, the acquired entity embeddings are used to calculate the relation embeddings, and then the seed relation alignments are obtained. Meanwhile, the seed entity alignments are expanded. In this stage, the model is trained using Eq. (12). After the training process is completed, the iterative strategy of RNM [31] is adopted to obtain more accurate alignment results.

5. Experiments

5.1. Experimental setup

5.1.1. Datasets

In this paper, DBP15K [17], a subset of the large-scale KG DBpedia [10], is used to evaluate the proposed model URAEA. DBP15K contains three cross-lingual datasets: $\text{DBP15K}_{\text{ZH-EN}}$ (Chinese to English), $\text{DBP15K}_{\text{JA-EN}}$ (Japanese to English) and $\text{DBP15K}_{\text{FR-EN}}$ (French to English). Each dataset consists of two KGs in different languages and provides 15,000 pre-aligned entity pairs. In addition, each dataset contains some pre-aligned relations, and they are used as test sets for relation alignment. Detailed information about DBP15K is shown in Table 1. Following previous works, 30% of the pre-aligned entity pairs are used as the training set, and 70% are used as the test set.

Table 1
Summary of the DBP15K datasets

Dataset	Language	#Ent	#Rel	#Triples	Aligned #Ent	Aligned #Rel
$\text{DBP15K}_{\text{ZH-EN}}$	ZH	66,469	2,830	153,929	15,000	890
	EN	98,125	2,317	237,674
$\text{DBP15K}_{\text{JA-EN}}$	JA	65,744	2,043	164,373	15,000	529
	EN	95,680	2,096	233,319
$\text{DBP15K}_{\text{FR-EN}}$	FR	66,858	1,379	192,191	15,000	212
	EN	105,889	2,209	278,590

5.1.2. Baselines

To evaluate the model URAEA, some competing entity alignment methods are selected for comparison in this paper. These methods are mainly divided into translation-based methods and GNN-based methods.

The translation-based methods include MTransE [4], SEA [14], BootEA [18] and TransEdge [19].

The GNN-based methods include GCN-Align [22], RDGCN [23], HGCN [24], AliNET [20], NMN [25] and RNM [31].

Table 2
Performance on entity alignment

Models	ZH-EN Hits@1	ZH-EN Hits@10	ZH-EN MRR	JA-EN Hits@1	JA-EN Hits@10	JA-EN MRR	FR-EN Hits@1	FR-EN Hits@10	FR-EN MRR
MTransE	30.8	61.4	0.364	27.9	57.5	0.349	24.4	55.6	0.335
SEA	42.4	79.6	0.548	38.5	78.3	0.518	40.0	79.9	0.533
BootEA	62.9	84.8	0.703	62.2	85.4	0.701	65.3	87.4	0.731
TransEdge	73.5	91.9	0.801	71.9	93.2	0.795	71.0	94.1	0.796
GCN-Align	41.3	74.4	0.549	39.9	74.5	0.546	37.3	74.5	0.532
RDGCN	70.8	84.6	0.746	76.7	89.5	0.812	88.6	95.7	0.911
HGCN	72.0	85.7	0.768	76.6	89.7	0.813	89.2	96.1	0.917
AliNET	53.9	82.6	0.628	54.9	83.1	0.645	55.2	85.2	0.657
NMN	73.3	86.9	0.781	78.5	91.2	0.827	90.2	96.7	0.924
RNM	84.0	91.9	0.870	87.2	94.4	0.899	93.8	98.1	0.954
URAEA	87.4	94.3	0.900	89.7	96.2	0.921	96.1	98.5	0.970

5.1.3. Implementation details

In this paper, the dimension of the entity embeddings in the GCN is set to 300. The margins are set to $γ_{e} = 1$ and $γ_{r} = 1$ , the threshold to $θ = 0.005$ , and the remaining hyperparameters to $α = 2$ , $β = 10$ , and $ε = 0.05$ . The hyperparameters α, β, ε and θ were tuned over multiple trial runs. The other parameters follow [15]. The learning rate is set to 0.001, and $k = 125$ negative samples for each aligned entity pair are sampled every 10 batches. Following RDGCN, to utilise the entity name information, the entity names in Chinese, French, and Japanese are translated into English using Google Translate, after which the initial entity embeddings are constructed by using the pre-trained English GloVe word vectors.

During the training process, the model is trained in 10 batches through Eq. (5) in the first step and is trained in 50 batches through Eq. (12) in the second step. In the second step, the model regenerates the seed relation alignments and re-expands the seed entity alignments every 10 batches.

5.1.4. Evaluation metrics

To evaluate the model, Hits@k and mean reciprocal rank (MRR) are used as metrics. Hits@k is the percentage of correctly aligned entities among the top k results, and MRR is the average of the reciprocal ranks of the prediction results. The larger the values of Hits@k and MRR are, the better the model. Note that the bolded numbers indicate the best results and the underlined numbers indicate the second best results.
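Both metrics can be computed from a test distance matrix, as in the sketch below. It assumes that row i holds the distances from the i-th test source entity to all candidates and that the correct counterpart of row i is candidate i; the function name is hypothetical.

```python
import numpy as np

def hits_and_mrr(dist, ks=(1, 10)):
    """Return ({k: Hits@k in percent}, MRR) for a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    true_d = np.diag(dist)                               # distance to the gold match
    # rank = 1 + number of candidates strictly closer than the gold match
    ranks = (dist < true_d[:, None]).sum(axis=1) + 1
    hits = {k: 100.0 * float(np.mean(ranks <= k)) for k in ks}
    mrr = float(np.mean(1.0 / ranks))
    return hits, mrr
```

Hits@k only records whether the gold match falls within the top k, whereas MRR also rewards moving it from, say, rank 8 to rank 2.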

5.2. Experimental results

5.2.1. Entity alignment

The experimental results of all methods on entity alignment are given in Table 2. The experimental results show that the model URAEA achieves the best results.

Among the translation-based models, TransEdge focuses on modelling the complex relations in KGs, such as 1-N, N-1, and N-N relations, and achieves the best results. Among the GNN-based models, RDGCN, HGCN, NMN, and RNM initialize entity embeddings by using entity names and obtain advanced results. This illustrates the importance of entity name information. Among all the baselines, RNM achieves the best results by considering relations during neighbourhood matching.

The URAEA achieves the best results on the three datasets. Compared with RNM, Hits@1 improves by 3.4, 2.5 and 2.3 percentage points on ZH-EN, JA-EN and FR-EN, which corresponds to relative gains of 4.0%, 2.9% and 2.5%, respectively. This is because URAEA is able to learn the relation information in the KG, and the entity embeddings contain information about neighbouring entities and neighbouring relations. Additionally, the expansion of seed alignments brings a performance improvement. This validates the effectiveness of the proposed model.

Table 3
Performance on relation alignment

Models	ZH-EN Hits@1	ZH-EN Hits@10	JA-EN Hits@1	JA-EN Hits@10	FR-EN Hits@1	FR-EN Hits@10
MTransE-PR	32.8	57.6	31.0	56.1	18.9	44.3
BootEA-PR	45.3	85.4	41.4	79.8	30.2	60.4
HGCN-PR	69.3	84.5	63.1	81.3	41.5	54.3
HGCN-JR	70.3	85.4	65.0	83.6	42.5	56.6
RNM	80.6	87.1	74.5	84.6	49.5	62.5
URAEA	83.9	89.0	78.5	87.1	51.7	63.4

5.2.2. Ablation study

To evaluate the effectiveness of the various components of the URAEA, some ablation studies are conducted. The following variants of URAEA are provided: URAEA-RA denotes the URAEA without relation alignment; URAEA-RC denotes the model in which the relations are not considered when expanding the seed entity alignments. The experimental results are shown in Table 4. It should be noted that the two variants outperform all baselines in Table 2, which is attributed to the expansion of seed entity alignments.

Comparing URAEA-RA with URAEA, URAEA achieves Hits@1 values that are 2.3%, 1.4% and 0.7% higher in relative terms than those of URAEA-RA on the three datasets. The reason for this finding may be that URAEA obtains entity representations that contain relation information by aligning the relations between entities, thereby achieving improved entity alignment performance. This demonstrates the effectiveness of relation alignment.
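The reported gains over URAEA-RA are consistent with relative improvements computed from the Hits@1 columns of Table 4. The short check below reproduces them; the helper name `relative_gain` is hypothetical, and the values are copied from Table 4.

```python
# Hits@1 values copied from Table 4.
HITS1 = {
    "URAEA-RA": {"ZH-EN": 85.4, "JA-EN": 88.5, "FR-EN": 95.4},
    "URAEA": {"ZH-EN": 87.4, "JA-EN": 89.7, "FR-EN": 96.1},
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


gains = {
    dataset: round(relative_gain(HITS1["URAEA"][dataset], HITS1["URAEA-RA"][dataset]), 1)
    for dataset in ("ZH-EN", "JA-EN", "FR-EN")
}
```

The corresponding absolute differences are 2.0, 1.2 and 0.7 percentage points.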

Comparing URAEA-RC with URAEA, URAEA-RC is lower on every metric, which supports the effectiveness of considering relation information when expanding the training set.

Table 4
Results of ablation study

Models	ZH-EN Hits@1	ZH-EN Hits@10	JA-EN Hits@1	JA-EN Hits@10	FR-EN Hits@1	FR-EN Hits@10
URAEA-RA	85.4	91.7	88.5	94.8	95.4	98.1
URAEA-RC	86.3	92.9	89.3	95.5	95.4	98.1
URAEA	87.4	94.3	89.7	96.2	96.1	98.5

5.2.3. Relation alignment

URAEA can also be used for relation alignment, and the experimental results of several methods on relation alignment are given in Table 3. The results of the benchmark models, except for RNM, are taken from HGCN [24], where -PR denotes that the relation embeddings are approximated by using entity embeddings as in HGCN, and -JR denotes that the representations of entities and relations are jointly learned. Among the benchmark models, BootEA iteratively enlarges the training set and obtains better results than MTransE; HGCN uses a GCN to aggregate neighbour information and jointly learns entity and relation representations, achieving advanced results. RNM considers the head and tail entities when aligning relations, and its performance is the best among all the benchmark models.

URAEA outperforms RNM on the three datasets, with the largest gains on Hits@1: relative improvements of 4.1%, 5.4% and 4.4%, respectively. This is because relation embeddings are composed of entity embeddings. Good relation alignment results therefore indicate that the learned entity embeddings are more reasonable, which supports the effectiveness of the model.

5.3. Analysis

5.3.1. Impact of available seed entity alignments

To investigate the impact of the size of seed entity alignments on the model URAEA, different proportions of seed entity alignments are used for entity alignment. RNM is selected as the comparative model, and the experimental results are shown in Fig. 4.

Fig. 4. Results of entity alignment with different proportions of seed entity alignments.

The results show that URAEA outperforms RNM on all three datasets. Furthermore, URAEA with only 30% seed entity alignments performs better than RNM with 50% seed entity alignments on the three datasets, which validates the effectiveness of URAEA. In addition, on both the $\mathrm{DBP15K}_{\mathrm{ZH-EN}}$ and $\mathrm{DBP15K}_{\mathrm{JA-EN}}$ datasets, URAEA outperforms RNM by an average of 4.9% when utilising 10% seed entity alignments and by an average of 2.3% when utilising 50% seed entity alignments. This indicates that the proposed model URAEA performs well even when few seed entity alignments are available.

Fig. 5. An example from $\mathrm{DBP15K}_{\mathrm{ZH-EN}}$. The blue dashed line connects the pre-aligned entities, and the red dashed line connects entities for alignment.

5.3.2. Case study

Figure 5 shows an example from the dataset $\mathrm{DBP15K}_{\mathrm{ZH-EN}}$, where the entity pair $\langle v_{1}, v_{4} \rangle$ has been aligned, and $\langle v_{s}, v_{t} \rangle$ is the entity pair that should be aligned. It is difficult to align $v_{s}$ and $v_{t}$ accurately with GCN because these two entities have different neighbourhoods. In the relation alignment stage, URAEA correctly predicts the aligned relation pair $\langle r_{1}, r_{4} \rangle$. Additionally, the representations of $r_{1}$ and $r_{4}$ are associated with their head and tail entities: $r_{1} = \mathrm{concat}[v_{s}, v_{1}]$ and $r_{4} = \mathrm{concat}[v_{t}, v_{4}]$. By aligning $r_{1}$ and $r_{4}$, the representations of $v_{s}$ and $v_{t}$ become closer; thus, $v_{s}$ and $v_{t}$ can be easily aligned. This indicates that the proposed model URAEA can improve the performance of entity alignment through joint relation alignment.
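The argument that aligning $r_{1}$ and $r_{4}$ pulls $v_{s}$ and $v_{t}$ together can be made concrete with a small numerical sketch. It assumes an L1 distance and uses made-up two-dimensional embeddings; the function names are hypothetical, and only the concatenation form of the relation embedding comes from the case study. Under concatenation, the distance between the two relation embeddings decomposes into the distance between the head entities plus the distance between the tail entities, so reducing it while $v_{1}$ and $v_{4}$ are already close must reduce the distance between $v_{s}$ and $v_{t}$.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def relation_embedding(head, tail):
    """Concatenate head and tail entity embeddings, as in the case study."""
    return head + tail  # list concatenation


# Made-up embeddings for illustration only.
v_1 = [0.50, 0.20]  # pre-aligned entity in the first KG
v_4 = [0.50, 0.25]  # its pre-aligned counterpart, already close to v_1
v_s = [0.10, 0.90]  # entity to be aligned in the first KG
v_t = [0.40, 0.60]  # its true counterpart in the second KG

r_1 = relation_embedding(v_s, v_1)
r_4 = relation_embedding(v_t, v_4)

relation_gap = l1_distance(r_1, r_4)
head_gap = l1_distance(v_s, v_t)
tail_gap = l1_distance(v_1, v_4)
```

Here `relation_gap` equals `head_gap + tail_gap`, so `head_gap` can never exceed `relation_gap`.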

6. Conclusions

In this paper, the model URAEA is proposed to learn the rich relation information contained in KGs. URAEA uses a weighting equation to obtain a more accurate relation representation and combines entity alignment with relation alignment. Since relations are represented by entities, the relation alignment in this paper is a variant of entity alignment. The model then learns entity embeddings that fuse only the relation information in the seed relation pairs, which reduces the impact of KG heterogeneity. Compared with other methods, URAEA does not require pre-aligned relations as training data. Furthermore, based on the generated seed relation alignments, URAEA iteratively expands the seed entity alignments. The ablation studies demonstrate that considering the relation pairs around entities can yield more accurate entity pairs. Finally, the proposed model is evaluated on three cross-lingual datasets and compared with other models; the experimental results show that URAEA achieves the best results. In future work, the attribute information of KGs will be exploited through attribute alignment to obtain better entity alignment results.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62062029).

References

1. K.D. Bollacker, Evans, P.K. Paritosh, Sturge and Taylor, Freebase: A collaboratively created graph database for structuring human knowledge, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, 2008, J.T. Wang, ed., ACM, 2008, pp. 1247–1250. doi:10.1145/1376616.1376746.

2. Bordes, Usunier, García-Durán, Weston and Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, United States, December 5–8, 2013, C.J.C. Burges, Bottou, Ghahramani and K.Q. Weinberger, eds, Proceedings of a Meeting Held, 2013, pp. 2787–2795, https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html.

3. Cao, Liu, Li, Liu, Li and Chua, Multi-channel graph neural network for entity alignment, in: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Korhonen, D.R. Traum and Màrquez, eds, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 1452–1461. doi:10.18653/v1/p19-1140.

4. Chen, Tian, Yang and Zaniolo, Multilingual knowledge graph embeddings for cross-lingual knowledge alignment, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017, Sierra, ed., ijcai.org, 2017, pp. 1511–1517. doi:10.24963/ijcai.2017/209.

5. Dang, Liu, Liu and Chen, Channel attention and multi-scale graph neural networks for skeleton-based action recognition, AI Communications (2022), 1–19. doi:10.3233/AIC-210085.

6. Du, Kumar, Johnson and Ciaramita, Using entity information from a knowledge base to improve relation extraction, in: Proceedings of the Australasian Language Technology Association Workshop, ALTA 2015, Parramatta, Australia, December 8–9, 2015, Hachey and Webster, eds, ACL, 2015, pp. 31–38, https://aclanthology.org/U15-1004/.

7. D.P. Kingma and Ba, Adam: A method for stochastic optimization, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Bengio and LeCun, eds, Conference Track Proceedings, 2015, http://arxiv.org/abs/1412.6980.

8. T.N. Kipf and Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings, OpenReview.net, 2017, https://openreview.net/forum?id=SJU4ayYgl.

9. Kotnis and Nastase, Analysis of the impact of negative sampling on link prediction in knowledge graphs, 2017, CoRR, http://arxiv.org/abs/1708.06816.

10. Lehmann, Isele, Jakob, Jentzsch, Kontokostas, P.N. Mendes, Hellmann, Morsey, van Kleef, Auer and Bizer, DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6(2) (2015), 167–195. doi:10.3233/SW-140134.

11. Lin, Liu, Luan, Sun, Rao and Liu, Modeling relation paths for representation learning of knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015, Màrquez, Callison-Burch, Su, Pighin and Marton, eds, The Association for Computational Linguistics, 2015, pp. 705–714. doi:10.18653/v1/D15-1082.

12. Liu, Lin, Liu and Sun, XQA: A cross-lingual open-domain question answering dataset, in: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Korhonen, D.R. Traum and Màrquez, eds, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 2358–2368. doi:10.18653/v1/p19-1227.

13. Nie, Han, Sun, C.M. Wong, Chen, Wu and Zhang, Global structure and local semantics-preserved embeddings for entity alignment, in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, Bessiere, ed., ijcai.org, 2020, pp. 3658–3664. doi:10.24963/ijcai.2020/506.

14. Pei, Yu, Hoehndorf and Zhang, Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference, in: The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13–17, 2019, Liu, R.W. White, Mantrach, Silvestri, J.J. McAuley, Baeza-Yates and Zia, eds, ACM, 2019, pp. 3130–3136. doi:10.1145/3308558.3313646.

15. R.K. Srivastava, Greff and Schmidhuber, Highway networks, 2015, CoRR, http://arxiv.org/abs/1505.00387.

16. F.M. Suchanek, Kasneci and Weikum, YAGO: a large ontology from Wikipedia and WordNet, J. Web Semant. 6(3) (2008), 203–217. doi:10.1016/j.websem.2008.06.001.

17. Sun, Hu and Li, Cross-lingual entity alignment via joint attribute-preserving embedding, in: The Semantic Web – ISWC 2017 – 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, d’Amato, Fernández, V.A.M. Tamma, Lécué, Cudré-Mauroux, J.F. Sequeda, Lange and Heflin, eds, Lecture Notes in Computer Science, Vol. 10587, Springer, 2017, pp. 628–644. doi:10.1007/978-3-319-68288-4_37.

18. Sun, Hu, Zhang and Qu, Bootstrapping entity alignment with knowledge graph embedding, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, July 13–19, 2018, Lang, ed., ijcai.org, 2018, pp. 4396–4402. doi:10.24963/ijcai.2018/611.

19. Sun, Huang, Hu, Chen, Guo and Qu, TransEdge: Translating relation-contextualized embeddings for knowledge graphs, in: The Semantic Web – ISWC 2019 – 18th International Semantic Web Conference, Proceedings, Part I, Auckland, New Zealand, October 26–30, 2019, Ghidini, Hartig, Maleshkova, Svátek, I.F. Cruz, Hogan, Song, Lefrançois and Gandon, eds, Lecture Notes in Computer Science, Vol. 11778, Springer, 2019, pp. 612–629. doi:10.1007/978-3-030-30793-6_35.

20. Sun, Wang, Hu, Chen, Dai, Zhang and Qu, Knowledge graph alignment network with gated multi-hop neighborhood aggregation, in: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, the Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020, 2020, pp. 222–229, https://ojs.aaai.org/index.php/AAAI/article/view/5354.

21. Velickovic, Cucurull, Casanova, Romero, Liò and Bengio, Graph attention networks, in: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, Conference Track Proceedings, OpenReview.net, 2018, https://openreview.net/forum?id=rJXMpikCZ.

22. Wang, Lv, Lan and Zhang, Cross-lingual knowledge graph alignment via graph convolutional networks, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31–November 4, 2018, Riloff, Chiang, Hockenmaier and Tsujii, eds, Association for Computational Linguistics, 2018, pp. 349–357. doi:10.18653/v1/D18-1032.

23. Wu, Liu, Feng, Wang, Yan and Zhao, Relation-aware entity alignment for heterogeneous knowledge graphs, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10–16, 2019, Kraus, ed., ijcai.org, 2019, pp. 5278–5284. doi:10.24963/ijcai.2019/733.

24. Wu, Liu, Feng, Wang and Zhao, Jointly learning entity and relation representations for entity alignment, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019, Inui, Jiang, Ng and Wan, eds, Association for Computational Linguistics, 2019, pp. 240–249. doi:10.18653/v1/D19-1023.

25. Wu, Liu, Feng, Wang and Zhao, Neighborhood matching network for entity alignment, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020, Jurafsky, Chai, Schluter and J.R. Tetreault, eds, Association for Computational Linguistics, 2020, pp. 6477–6487. doi:10.18653/v1/2020.acl-main.578.

26. Wu, Pan, Chen, Long, Zhang and P.S. Yu, A comprehensive survey on graph neural networks, IEEE Trans. Neural Networks Learn. Syst. 32(1) (2021), 4–24. doi:10.1109/TNNLS.2020.2978386.

27. Zeng, Zhao, Wang, Tang and Tan, Degree-aware alignment for entities in tail, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25–30, 2020, Huang, Chang, Cheng, Kamps, Murdock, Wen and Liu, eds, ACM, 2020, pp. 811–820. doi:10.1145/3397271.3401161.

28. Zhang, N.J. Yuan, Lian, Xie and Ma, Collaborative knowledge base embedding for recommender systems, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, Krishnapuram, Shah, A.J. Smola, C.C. Aggarwal, Shen and Rastogi, eds, ACM, 2016, pp. 353–362. doi:10.1145/2939672.2939673.

29. Zhang, Sun, Hu, Chen, Guo and Qu, Multi-view knowledge graph embedding for entity alignment, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10–16, 2019, Kraus, ed., ijcai.org, 2019, pp. 5429–5435. doi:10.24963/ijcai.2019/754.

30. Zhu, Xie, Liu and Sun, Iterative entity alignment via joint knowledge embeddings, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017, Sierra, ed., ijcai.org, 2017, pp. 4258–4264. doi:10.24963/ijcai.2017/595.

31. Zhu, Liu, Wu and Du, Relation-aware neighborhood matching model for entity alignment, in: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, the Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, AAAI Press, 2021, pp. 4749–4756, https://ojs.aaai.org/index.php/AAAI/article/view/16606.
