Abstract
Entity alignment is the task of identifying entities from different knowledge graphs (KGs) that point to the same item and is important for KG fusion. In the real world, due to the heterogeneity between different KGs, equivalent entities often have different relations around them, so it is difficult for a Graph Convolutional Network (GCN) to accurately learn the relation information in the KGs. To address the inadequate utilisation of relation information in entity alignment, a novel GCN-based model, joint Unsupervised Relation Alignment for Entity Alignment (URAEA), is proposed. The model first employs a novel method for calculating relation embeddings by using entity embeddings, then constructs unsupervised seed relation alignments through these relation embeddings, and finally performs entity alignment together with relation alignment. In addition, the seed entity alignments are expanded based on the generated seed relation alignments. Experiments conducted on three real-world datasets show that this approach outperforms state-of-the-art methods.
Introduction
Knowledge graphs (KGs) structurally represent real-world facts in the form of triples
In recent years, many researchers have used embedding-based methods for entity alignment. These methods embed KGs into a low-dimensional, continuous vector space and discover possible aligned entities by measuring the similarity of entity embeddings. Translation Embedding (TransE) [2] is the earliest KG embedding method, which treats a relation as the translation from the head entity to the tail entity. Some TransE-based entity alignment methods, such as Multilingual TransE (MTransE) [4] and Joint Attribute-Preserving Embedding (JAPE) [17], have been developed. More recently, Graph Neural Networks (GNNs) have received much attention due to their excellent performance in processing graph data [26]. A GNN learns node representations by aggregating the neighbourhood information of each node and has been utilised for entity alignment; examples include the GCN-based Entity Alignment (GCN-Align) [22], Relation-aware Dual-Graph Convolutional Network (RDGCN) [23] and KG Alignment Network (AliNet) [20].
Recent entity alignment studies have mainly used Graph Convolutional Network (GCN) [8]. GCN is convenient for modelling the node features of graphs but has difficulty in capturing the relation information contained in the graphs [5]. According to Highway-GCN (HGCN) [24], entities and relations in knowledge graphs are usually closely related, and rich relation information helps to improve the performance of entity alignment. For example, Fig. 1 shows two subgraphs from the Chinese and English versions of DBpedia [10]. If only the surrounding entities are considered, it is difficult to align

Illustration of the importance of relations for entity alignment. The entity pairs
RDGCN [23] constructs a dual relation graph, and captures relation information via the interaction between the primal graph and the dual relation graph. However, this method may introduce noise from the dual graph. HGCN [24] uses entity embeddings to approximate relation embeddings and then incorporates these relation embeddings into entities to learn better embeddings for both. However, due to the heterogeneity of KGs, the equivalent entities in two KGs usually have different neighbourhoods. This may make entity alignment challenging since the equivalent entities aggregate different relations. As shown in Fig. 1, the relations around
In this paper, a joint Unsupervised Relation Alignment for Entity Alignment model, named URAEA, is proposed, which aims to effectively utilise relation information for entity alignment. The entities connected to a relation are weighted to obtain more accurate relation embeddings. Then, both entity alignment and relation alignment are performed to learn relation information, and a strategy is adopted to iteratively expand the seed entity alignments. Previously developed methods such as Bootstrapping Entity Alignment (BootEA) [18] and Degree-Aware Alignment for Entities in Tail (DAT) [27] also expand seed entity alignments, but the relations between entities are ignored. In this paper, more accurate entity pairs can be identified by using the seed relation alignments obtained in relation alignment. Experiments are conducted on three cross-lingual datasets, and the URAEA achieves the best results.
The main contributions of this paper are as follows:
When using entity embeddings to calculate relation embeddings, this paper proposes a weighting equation that considers information about the weights of entities.
To mitigate the effects of knowledge graph heterogeneity, this paper unites relation alignment with entity alignment.
The relations around the entities are considered when expanding the seed entity pairs to obtain the entity pairs that are more likely to be aligned.
Recent entity alignment methods are mainly divided into translation-based methods and GNN-based methods. These methods are reviewed in this section.
Translation-based methods
Many translation-based entity alignment methods adopt TransE [2], which treats relations as translations from head entities to tail entities. MTransE [4] was the first entity alignment method to utilise the idea of TransE. It uses TransE to first embed each KG, then employs three techniques that can map two KGs into the same vector space, and finally aligns the entities in different KGs. Iterative PTransE (IPTransE) [30] utilises Path-based TransE (PTransE) [11] to embed each KG and then transforms the embeddings of two KGs into a unified vector space based on seed entity alignments. Furthermore, IPTransE employs a strategy to iteratively generate newly aligned entities. BootEA [18] proposes a bootstrapping approach that labels likely entity pairs as training data by optimising a global objective under a one-to-one mapping constraint, and BootEA also uses a new sampling method called truncated uniform negative sampling. Semi-supervised Entity Alignment (SEA) [14] proposes a semi-supervised entity alignment strategy that uses labelled entities and a large number of unlabelled entities for alignment. Edge-centric Translational Embedding (TransEdge) [19] represents relations according to their context (head and tail entities) and then uses a parameter sharing strategy to unify two KGs. Some methods use additional KG information. JAPE [17] embeds two KGs into a unified vector space and then optimises the KG representations by using the attribute information. Multi-view KG Embedding (MultiKE) [29] learns entity embeddings for entity alignment through three types of information: entity names, relations, and attributes.
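The translation idea behind TransE can be illustrated with a small sketch. It assumes an L2 distance and toy two-dimensional embeddings; the entity and relation names and their values are made up for illustration and do not come from any dataset used in this paper.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance between (head + relation) and tail; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy embeddings (hypothetical values, for illustration only).
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
germany = [3.0, 1.0]

plausible = transe_distance(paris, capital_of, france)     # head + relation lands on tail
implausible = transe_distance(paris, capital_of, germany)  # head + relation misses tail
print(plausible, implausible)
```

A triple whose tail lies close to head + relation receives a small distance, which is the property that TransE-based alignment methods rely on.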
Following IPTransE and BootEA, the proposed model expands the seed entity alignments to achieve enhanced entity alignment performance. In contrast to these methods, in this paper, the relations between entities are considered when matching them.
GNN-based methods
The more similar the neighbourhoods of two entities are, the more likely the two entities are to align. GNN can model the neighbourhood information of entities, and therefore, an increasing number of researchers are utilising GNN for entity alignment.
GCN-Align [22] was the first approach to use GCN [8] for entity alignment. It uses GCN to embed entities from two KGs into a unified vector space. RDGCN [23] constructs a dual relation graph to learn relation information. HGCN [24] uses the highway gate strategy, which can control noise propagation to jointly learn both entity and relation representations. Multi-channel GNN (MuGNN) [3] learns KG embeddings through multiple channels, and each channel encodes KGs via different relation weighting schemes. AliNet [20] showed that equivalent entities tend to have different neighbourhoods, so AliNet uses Graph Attention Network (GAT) [21] with a gating mechanism to aggregate multihop neighbours to extend the overlaps among the neighbourhood structures between equivalent entities. Structure and Semantics Preserving (SSP) [13] learns entity representations in a coarse-to-fine manner by exploiting the global structures and local semantics of KGs. Neighborhood Matching Network (NMN) [25] evaluates the similarity between entities by using a novel graph sampling method and a cross-graph neighbourhood matching module to overcome the challenge regarding the structural heterogeneity between KGs. RNM [31] proposes a relation-aware neighbourhood matching model, which considers the neighbouring entities and the connected relations between entities when matching subgraphs.
The HGCN and RDGCN take relation information into consideration when modelling entity embeddings, but some noise from neighbours may be introduced. In this paper, a novel entity alignment model to reduce noise propagation while learning relations is proposed. In addition, the iterative strategy of the RNM is used to achieve better performance.
Problem definition
A KG is formalized as
Given two KGs
Entity alignment aims to discover newly aligned entity pairs between two KGs based on these pre-aligned entity pairs. In this paper, bold letters are used to denote vectors.
Our model
In this section, the URAEA is described in detail. Figure 2 shows the overall structure of URAEA. The URAEA is mainly divided into four stages: preliminary entity alignment, relation alignment, relation-aware entity pair expansion, and joint entity alignment and relation alignment.

Overall architecture of the URAEA. The blue dashed lines denote preliminary entity alignment, and the blue solid lines denote joint entity alignment and relation alignment.
In the first stage, the GCN is used to learn the entity embeddings in two KGs with the objective of entity alignment. In the second stage, the relation embeddings are calculated by using our proposed method; then, unsupervised seed relation alignments are generated via relation embeddings, and these seed relation alignments are used as the training data for relation alignment. In the third stage, the seed entity pairs are expanded based on the generated seed relation alignments. In the fourth stage, the two KGs are trained with the goals of entity alignment and relation alignment.
Entity embedding
In this paper, a 2-layer GCN is used to model the structural information of KGs. Two given KGs
The GCN takes the embeddings of all entities
Furthermore, the highway gates strategy [15] can control the propagation of noise in the GCN. This strategy is adopted in each layer of the GCN:
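As an illustration of the general highway gate idea, the sketch below mixes a layer's output with its input through a learned sigmoid gate. It assumes the common formulation gate = sigmoid(x W + b) and result = gate * output + (1 - gate) * input; the function and parameter names are hypothetical, and the exact equation used by URAEA may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def highway_gate(layer_input, layer_output, gate_weights, gate_bias):
    """Element-wise mix of a layer's output and its input.

    gate   = sigmoid(layer_input @ gate_weights + gate_bias)
    result = gate * layer_output + (1 - gate) * layer_input
    """
    dim = len(layer_input)
    gate = [
        sigmoid(sum(layer_input[i] * gate_weights[i][j] for i in range(dim)) + gate_bias[j])
        for j in range(dim)
    ]
    return [g * out + (1.0 - g) * inp
            for g, out, inp in zip(gate, layer_output, layer_input)]

# With zero weights and zero bias the gate is 0.5, so the result is the average.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
mixed = highway_gate([1.0, 2.0], [3.0, 0.0], zero_w, [0.0, 0.0])
print(mixed)
```

When the gate is close to zero, the layer passes its input through almost unchanged, which is how the gate can limit the propagation of noisy neighbourhood information.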
Entity alignment
After the representations of all entities are obtained, the URAEA determines whether the entities are aligned by measuring the distance between them. For two entities
Training
In this paper, the margin-based loss function is used as the training objective to make the distance of aligned entity pairs as small as possible and the distance of negative samples as large as possible:
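A generic margin-based ranking loss can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact objective: it assumes an L1 distance, one negative pair per positive pair, and hypothetical function names.

```python
def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def margin_loss(positive_pairs, negative_pairs, margin):
    """Sum of max(0, d(positive) - d(negative) + margin) over matched pairs."""
    total = 0.0
    for (p, q), (p_neg, q_neg) in zip(positive_pairs, negative_pairs):
        total += max(0.0, l1_distance(p, q) - l1_distance(p_neg, q_neg) + margin)
    return total

positive = [([0.0, 0.0], [0.0, 1.0])]       # aligned pair, distance 1
far_negative = [([0.0, 0.0], [3.0, 0.0])]   # negative pair, distance 3
near_negative = [([0.0, 0.0], [1.0, 0.0])]  # negative pair, distance 1

loss_far = margin_loss(positive, far_negative, margin=1.0)
loss_near = margin_loss(positive, near_negative, margin=1.0)
print(loss_far, loss_near)
```

The loss is zero once the negative pair is farther apart than the aligned pair by at least the margin, and positive otherwise, so harder negatives contribute more to training.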
The model uses nearest neighbour sampling [9] to obtain negative samples. Given an entity pair
Relation alignment
Relation embedding
The semantics of a relation are related to its connected entities. Previously developed approaches, such as HGCN and RNM, use the embeddings of head entities and tail entities to represent relations:
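The head/tail idea can be sketched as follows. The sketch assumes the unweighted variant in which a relation is represented by concatenating the mean embedding of its head entities with the mean embedding of its tail entities; it is not the weighting equation proposed in this paper, and the function names and toy data are hypothetical.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def relation_embeddings(triples, entity_emb):
    """Map each relation to concat(mean of head embeddings, mean of tail embeddings)."""
    heads, tails = {}, {}
    for h, r, t in triples:
        heads.setdefault(r, set()).add(h)
        tails.setdefault(r, set()).add(t)
    result = {}
    for r in heads:
        head_mean = mean_vector([entity_emb[e] for e in sorted(heads[r])])
        tail_mean = mean_vector([entity_emb[e] for e in sorted(tails[r])])
        result[r] = head_mean + tail_mean  # list concatenation
    return result

# Toy data (hypothetical, for illustration only).
entity_emb = {"a": [1.0, 0.0], "b": [3.0, 0.0], "c": [0.0, 2.0]}
triples = [("a", "r", "c"), ("b", "r", "c")]
rel_emb = relation_embeddings(triples, entity_emb)
print(rel_emb["r"])
```

Because every connected entity contributes equally here, frequent or noisy entities can dominate the average, which is the kind of limitation a weighting scheme is meant to address.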

Calculation of relation embedding.
Given a relation
After obtaining
Furthermore, because of the small number of relations in KGs, the reverse relations are added. For example, for a triple (China,
Finally, the representations of all relations
The proposed model first calculates the distance between the relations in
Given a relation
Afterwards, the relation pairs are selected as unsupervised seed relation alignments based on the matrix
Relation alignment
The model determines whether two relations are aligned by measuring the distance between the embeddings of the relations. For relations
Training
Similar to entity alignment, the negative samples
Relation-aware entity pair expanding
Based on the seed relation alignments
Afterwards, the method described in Section 4.2.2 is used to select the newly aligned entity pairs and those entity pairs are added to the seed entity alignments
Joint entity alignment and relation alignment
In this paper, the URAEA performs relation alignment together with entity alignment by using the following objective function:
In this paper, the model first uses Eq. (5) for preliminary entity alignment to obtain entity embeddings. Next, the acquired entity embeddings are used to calculate the relation embeddings, and then the seed relation alignments are obtained. Meanwhile, the seed entity alignments are expanded. In this stage, the model is trained using Eq. (12). After the training process is completed, the iterative strategy of RNM [31] is adopted to obtain more accurate alignment results.
Experiments
Experimental setup
Datasets
In this paper, DBP15K [17], a subset of the large-scale KG DBpedia [10], is used to evaluate the proposed model URAEA. DBP15K contains three cross-lingual datasets:
Summary of the DBP15K datasets
To evaluate the model URAEA, some competing entity alignment methods are selected for comparison in this paper. These methods are mainly divided into translation-based methods and GNN-based methods.
The translation-based methods include MTransE [4], SEA [14], BootEA [18] and TransEdge [19].
The GNN-based methods include GCN-Align [22], RDGCN [23], HGCN [24], AliNet [20], NMN [25] and RNM [31].
Performance on entity alignment
In this paper, the dimension of the entity embeddings in GCN is set as 300. The model set the margin
During the training process, the model is trained in 10 batches through Eq. (5) in the first step and is trained in 50 batches through Eq. (12) in the second step. In the second step, the model regenerates the seed relation alignments and re-expands the seed entity alignments every 10 batches.
Evaluation metrics
To evaluate the model, Hits@k and mean reciprocal rank (MRR) are used as metrics. Hits@k is the percentage of test entities whose correct counterpart is ranked among the top k candidates, and MRR is the average of the reciprocal ranks of the correct counterparts. The larger the values of Hits@k and MRR are, the better the model. Note that the bolded numbers indicate the best results and the underlined numbers indicate the second best results.
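Both metrics can be computed from the rank that each test entity's correct counterpart receives. The following minimal sketch assumes 1-based ranks; the function names and the example ranks are hypothetical.

```python
def hits_at_k(ranks, k):
    """Percentage of queries whose correct answer is ranked within the top k."""
    return 100.0 * sum(1 for rank in ranks if rank <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)

# Ranks of the correct counterpart for four test entities (made-up values).
ranks = [1, 2, 4, 1]
hits1 = hits_at_k(ranks, 1)
hits2 = hits_at_k(ranks, 2)
mrr = mean_reciprocal_rank(ranks)
print(hits1, hits2, mrr)
```

Hits@1 only rewards exact top-ranked matches, whereas MRR also gives partial credit when the correct counterpart appears slightly lower in the ranking.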
Experimental results
Entity alignment
The experimental results of all methods on entity alignment are given in Table 2. The experimental results show that the model URAEA achieves the best results.
Among the translation-based models, TransEdge focuses on modelling the complex relations in KGs, such as 1-N, N-1, and N-N relations, and achieves the best results. Among the GNN-based models, RDGCN, HGCN, NMN, and RNM initialize entity embeddings by using entity names and obtain advanced results. This illustrates the importance of entity name information. Among all the baselines, RNM achieves the best results by considering relations during neighbourhood matching.
The URAEA achieves the best results on the three datasets. Compared with RNM, Hits@1 on the three datasets improves by 4.0%, 2.9% and 2.5%, respectively. This is because URAEA is able to learn the relation information in the KG, and the entity embeddings contain information about neighbouring entities and neighbouring relations. Additionally, the expansion of seed alignments brings a performance improvement. This validates the effectiveness of the proposed model.
Performance on relation alignment
To evaluate the effectiveness of the various components of the URAEA, some ablation studies are conducted. The following variants of URAEA are provided: URAEA-RA denotes the URAEA without relation alignment; URAEA-RC denotes the model in which the relations are not considered when expanding the seed entity alignments. The experimental results are shown in Table 4. It should be noted that the two variants outperform all baselines in Table 2, which is attributed to the expansion of seed entity alignments.
According to the results of URAEA-RA and URAEA, URAEA achieves Hits@1 values that are 2.3%, 1.4% and 0.7% higher than those of URAEA-RA on the three datasets. The reason for this finding may be that URAEA is able to obtain entity representations that contain relation information by aligning the relations between entities, thereby achieving improved entity alignment performance. This demonstrates the effectiveness of relation alignment.
Compared with URAEA, URAEA-RC scores lower on all metrics, which supports the effectiveness of considering relation information when expanding the training set.
Results of ablation study
The URAEA can also be used for relation alignment, and the experimental results of some methods on relation alignment are given in Table 3. The experimental results of the benchmark models, except for RNM, are from HGCN [24], where -PR denotes that the relation embeddings are approximated by using entity embeddings as in HGCN; -JR denotes that the representations of entities and relations are jointly learned. Among the benchmark models, BootEA uses the strategy of iteratively increasing the training set, and the results are better than MTransE; HGCN uses GCN to aggregate the neighbour information and jointly learns entity and relation representations, achieving advanced results. RNM considers the head and tail entities when aligning relations, and its performance is best among all the benchmark models.
The results of the URAEA outperform those of RNM on the three datasets, where Hits@1 improves the most, by 4.1%, 5.4%, and 4.4%, respectively. This is because the relation embeddings are computed from the entity embeddings. Good relation alignment results therefore indicate that the learned entity embeddings are more reasonable, which supports the effectiveness of the model.
Analysis
Impact of available seed entity alignments
To investigate the impact of the size of seed entity alignments on the model URAEA, different proportions of seed entity alignments are used for entity alignment. RNM is selected as the comparative model, and the experimental results are shown in Fig. 4.

Results of entity alignment with different proportions of seed entity alignments.
The results show that URAEA outperforms RNM on all three datasets. Furthermore, URAEA with only 30% of the seed entity alignments performs better than RNM with 50% of the seed entity alignments on the three datasets, which validates the effectiveness of URAEA. In addition, on both the

An example from
Figure 5 shows an example from the dataset
Conclusions
In this paper, to learn the rich relation information contained in KGs, the model URAEA is proposed. URAEA proposes a weighting equation to obtain a more accurate relation representation. Furthermore, the model combines entity alignment with relation alignment. Since relations are represented by entities, the relation alignment in this paper is a variant of entity alignment. Then, the entity embeddings that fuse only the relation information in the seed relation pairs are learned, and the impact of KG heterogeneity is reduced. Compared to other methods, URAEA does not require pre-aligned relations as training data. Furthermore, based on the generated seed relation alignments, URAEA iteratively expands the seed entity alignments. The ablation studies demonstrate that considering the relation pairs around entities can yield more accurate entity pairs. Finally, the proposed model is evaluated on three cross-lingual datasets and compared with other models. The experimental results show that the URAEA achieves the best results. In future work, utilising the attribute information of KGs with attribute alignment will be considered to obtain better entity alignment results.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. 62062029).
