Abstract
Image semantic learning techniques are crucial for image understanding and classification. In social networks, image data are widely disseminated because they are easy to acquire and express content intuitively. However, because users publish information freely, images are strongly context-dependent and semantically ambiguous, which makes image representation learning difficult. Fortunately, social attributes such as hashtags carry rich semantic relations that can help in understanding the meaning of images. This paper therefore proposes a new method, Social Heterogeneous Graph Networks (SHGN), for image semantic learning in social networks. First, a heterogeneous graph is built to expand image semantic relations through social attributes. Then, a consistent semantic space is reconstructed through cross-media feature alignment. Finally, an image semantic extended learning network is designed to capture and integrate social semantics and visual features, yielding rich semantic representations of images from their social context. Experiments demonstrate that SHGN achieves efficient image representation and performs favorably against many baseline algorithms.
Introduction
As a form of expression in social network messages [1], images attract users’ attention faster because of their intuitive visual nature. Image information is widely disseminated and shared on fast-paced entertainment and interactive platforms, so it captures a wealth of social phenomena. Adequate semantic understanding of images can help to detect and mine public opinion events promptly. Thus, the feature representation of social network image data is crucial for tasks such as detecting and searching public events [2, 3]. However, because users are everywhere, pictures are taken and uploaded in a random manner. This leads to irregularity and semantic sparsity of images in social networks. Therefore, semantic representation learning for social network images faces several challenges [4, 5].
Traditional image feature representation methods can be roughly divided into two categories: shallow and deep feature [6] extraction and learning methods. Deep methods design deep learning frameworks to extract and mine the global characteristics of images, which gives them significant advantages. However, these methods do not fully consider the internal and external latent correlations of image data, nor do they use additional knowledge to discover relations between image contexts. As a result, they cannot effectively distinguish and generalize social network image information, which leads to poor image representations. In this paper, we mine the social relationships of social network images to semantically extend them and study methods for high-quality image semantic representation.
Because social networks are uniquely shareable and interactive, their multi-attribute features play a crucial role in message understanding. For example, topic hashtags marked with “#” are summary information that can quickly categorize events. Other attributes include the mention operation marked with the “@” symbol, which points to or notifies a relevant user; hyperlink information (“URL”) beginning with “http://”, which points to the same external content; and the spatio-temporal trajectory of message publication and dissemination. These multi-attribute features capture the interactions between social network messages, which makes it possible to mine the latent semantics between seemingly unrelated messages. According to relevant statistics, more than 10% of published microblog messages contain social attributes such as a topic hashtag, links, or “@”. This indicates that such additional information is widely used as external metadata in social networks. Because these multi-attribute features reflect additional external semantic information, mining their latent associations and meanings can fully discover and represent the consistent semantics of image data. This enables effective semantic expansion and fusion of information, which alleviates the incompleteness and semantic sparsity of social network image descriptions.
Research on graph convolutional neural networks [7, 8] has advanced with the proliferation of graph data such as social networks and knowledge graphs [9, 10]. [11] proposes a method that learns node embeddings through a random-walk GCN and constructs semantic associations between neighbor nodes for image recommendation. During the generation and dissemination of events, social attributes describe events with strong semantic consistency. Therefore, benefiting from the various forms of attribute features in social networks, numerous studies exploit unique social context information [12, 13] for tasks such as efficient event detection and feature representation. In addition, existing GCN models for heterogeneous graphs [14, 15] perform fusion learning on heterogeneous feature spaces by reconstructing adjacency matrices [16]. Based on the implicit social relations between social attributes, a social network can be modeled as a semantically rich graph. We mine and analyze the latent semantics contained in social multi-attributes to perform association discovery and semantic expansion of image features.
To obtain higher-quality image semantic information, this paper proposes an image semantic representation learning algorithm based on a social heterogeneous graph network. Social multi-attribute features are used to associate image context information and to construct a heterogeneous graph of semantically related topic hashtags and images. The latent semantics of social attributes are incorporated into image features to establish relevance between images, and node aggregation and convolution yield semantically richer image features. The proposed method associates text hashtags with visual objects and effectively complements the semantic features of event images. Extensive experiments demonstrate the effectiveness of SHGN: compared with benchmark image feature representation methods, it obtains higher-quality semantic representations of social network images. The main contributions of this paper are summarized as follows:
Towards fully understanding the meaning of social network images in public events, this paper proposes a novel social heterogeneous graph network to learn the rich semantics of images. We use social attributes to mine the relationships between otherwise independent images, which guides semantic expansion in the associated context. A social heterogeneous graph model is established to reveal the abundant social semantics among images, and image representations are learned by SHGN through semantic aggregation. Extensive experimental results show that SHGN performs favorably against many baseline image representation methods in classification tests.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 gives the formal problem statement and the detailed design of SHGN. Section 4 reports extensive experimental results. Finally, Section 5 concludes the paper.
Related work
Image representation
Traditional methods usually treat learning more semantically discriminative features as the critical issue. Many researchers currently work on image understanding [17, 18], image sentiment analysis [6], and related tasks. Typical image semantic feature representation methods are divided into shallow visual feature acquisition methods [19, 20] and deep learning-based image representation methods [21, 22]. [23] integrates social context and visual features according to user search intent, so the learned image features contain both rich semantic information and the user's search tendency. [24] proposes an image semantic acquisition method based on contextual information and a shallow spatial encoder-decoder network, which improves the quality of image semantics and performs well in image segmentation.
In recent years, deep learning methods have shown significant advantages in feature learning [25]. For example, deep neural network structures such as the deep belief network (DBN) [26], AlexNet, and VGG [22] can deeply mine and learn semantic representations of image features, and they have been widely used in tasks such as image classification, annotation, and search. [27] proposes an image semantic information recognition method that uses low-level image features to help a CNN extract high-level features and performs image recognition with a stacked sparse autoencoder network.
Many studies on the semantic understanding of social images have been proposed [28–30]. [28] and [30] design deep weakly-supervised models to discover image representations and improve the quality of tags. Li et al. [29] achieve image annotation by considering both visual similarity and semantic relevance, addressing tags that are noisy, irrelevant, or incomplete. This paper focuses on images on social platforms such as Sina Weibo and Twitter, whose semantics are ambiguous and which rely heavily on accompanying text. We propose a novel social heterogeneous graph network that improves image semantic representation by introducing social knowledge to mine relations between image contexts.
Graph learning
Graph learning methods are applied in many tasks because of the rich correlations among data [31–33]. For example, relationship graphs have been constructed to recognize group activities and to retag social images. [31] and [32] propose a coherence-constrained graph LSTM and a graph LSTM-in-LSTM, respectively, to recognize group activities effectively. Tang et al. [33] design a social anchor-unit graph regularized tensor completion method to refine the tags of social images. The Graph Convolutional Network (GCN), which aggregates and learns node representations using both the structural information of the graph and the features of the nodes, has attracted wide attention. In a typical approach, [34] learns node embeddings by fusing node features with the link information of the graph structure. The GCN-based multi-label image recognition model ML-GCN [35] builds a label dependency graph and improves image recognition by learning target classifiers. The Text GCN method effectively addresses the problem of sparse feature representation.
Research on heterogeneous graph neural networks has achieved outstanding results [36–38]; these methods build heterogeneous graph models that aggregate information from all sources to provide social recommendations. In contrast, the heterogeneous graph model in this paper learns image features from different modalities in social networks. We integrate summary text into the image feature representation as semantic guidance, which partially alleviates the semantic limitations of images and their dependence on text information. Visual and text attributes are connected through the co-occurrence of social characteristics, and the edges of the graph carry rich social relationships, so associated nodes can be fused with different strengths during graph convolution. [39] constructs a heterogeneous graph containing multiple node types and achieves accurate text classification by learning node representations with GCN; interrelated node information is aggregated during the GCN convolution, which expands the node features. [40] finds hidden semantic spaces by mining context and content links, mapping multimedia data into consistent features. Therefore, to represent image features in social networks, this paper makes full use of the multi-attribute features of social networks. Topic hashtags and images are treated as nodes of a heterogeneous social network graph model, on which GCN performs image semantic learning and augments image representations.
Methodology
This paper proposes an image semantic representation algorithm based on a social heterogeneous graph network. It treats topic hashtags and images as two types of nodes in the graph structure. Based on the latent social semantic associations between social multi-attribute information, a semantically rich social heterogeneous graph model is constructed. The graph model fuses the contextual correlations among image data with the propagation relationships of summarizing hashtags. By reconstructing the heterogeneous feature space, we obtain a consistent node feature matrix across modalities. A graph convolutional neural network (GCN) is adopted to learn representations of the graph data. Treating hashtags as important guiding information about events, SHGN aggregates semantic features into the image representations to achieve high-quality image semantics in social networks. The overall framework is shown in Fig. 1.
Fig. 1 The framework of SHGN.
The main steps of SHGN are: 1) Construction of the social heterogeneous graph model. The graph includes two types of nodes, images and hashtags, which are associated by introducing multi-attribute features. 2) Heterogeneous feature reconstruction. The widely used image feature representation method VGGNet-19 generates the initial features of the image nodes, and the visual features are aligned with the text features (topic hashtags) to unify the heterogeneous features. 3) Image extended semantic feature learning network. Joint learning is performed in the graph convolutional neural network, where the neighbor aggregation of the convolution operation fuses image features with associated nodes that carry relevant social semantics, thereby semantically expanding them. SHGN thus enables deep representation and mining of semantics and realizes effective semantic learning of social network images.
Given a social network image dataset M, each image element m ∈ M has the attribute domain {id, image content, topic hashtags, external links, class label}. The id is the message number, and the hashtags and external links are the social attributes. According to the social relations of ids, hashtags, and external links, a social heterogeneous graph G is established (see Definition 1 below). Hashtags are treated as important semantic nodes that connect to image nodes. We adopt modality-specific feature learning techniques to map the text (topic hashtag) and visual (image) feature spaces, denoted as $X_H$ and $X_V$, respectively. The text and visual features are fused and reconstructed into length-aligned vector representations X. Combined with the graph structure matrix A formed by the association weights between nodes, node embedding learning is performed with GCN. Node classification training outputs the predicted class labels L and yields the semantic feature representation H of the image nodes.
Construction of social heterogeneous graph model
To mine semantic associations between discrete image elements, hashtags and link information (“URL”) are extracted from social network messages. A social heterogeneous graph model is built to describe the contextual semantics of image data. The graph model contains two entity types: image nodes and hashtag nodes. The social multi-attribute features guide the creation of associations between semantically consistent images. The formal definition of the social heterogeneous graph model is as follows.
We construct the above social heterogeneous graph model with representative, abstract, and scalable social network multi-attribute features, i.e. hashtags and external link information. Through the co-occurrence relationship between these social multi-attribute features, the semantic relatedness of two types of nodes can be modeled. The following three connection rules are defined to establish connection relationships between nodes in a social heterogeneous graph:
Rule 1 (image-hashtag). When the hashtag node and the image node contain the same id value or the same “URL” information, a connection edge is established between the image node and the hashtag node.
Rule 2 (hashtag-hashtag). If two hashtag nodes contain the same id, or the microblog messages containing the two hashtags contain the same “URL”, a connection is established between the two hashtag nodes.
Rule 3 (image-image). If the messages containing two image nodes contain the same hashtag or “URL”, a connection edge is formed between the two image nodes.
Fig. 2 shows an example of a social heterogeneous graph containing image nodes a, b, c, d and hashtag nodes e, f, g. Each node corresponds to a certain characteristic attribute. Since hashtag nodes f and g both appear in the message of image node d, Rule 1 gives f and d, and g and d, a “co-mid” connection. By Rule 2, nodes f and g also have a “co-mid” connection. In addition, hashtag node g appears in the messages of both image nodes c and d, so by Rule 3 image nodes c and d are connected by “co-h”.
Fig. 2 An example of a social heterogeneous graph.
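The three connection rules can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: messages are hypothetical dicts with `mid`, `image`, `hashtags` and `urls` keys, a hashtag node collects the ids and URLs of every message it appears in, and each shared id, hashtag or URL counts as one satisfied condition (so the returned count can act as an edge weight).

```python
from itertools import combinations


def build_weighted_edges(messages):
    """Apply Rules 1-3 and count the connection conditions each node pair satisfies.

    Each message is a hypothetical dict with keys "mid" (message id),
    "image" (image node name or None), "hashtags" (list) and "urls" (list).
    Returns {(u, v): count} for every pair satisfying at least one condition.
    """
    images = {}    # image node -> id, URLs and hashtags of its own message
    hashtags = {}  # hashtag node -> ids and URLs of every message containing it
    for msg in messages:
        urls = set(msg["urls"])
        if msg["image"] is not None:
            images[msg["image"]] = {"mids": {msg["mid"]}, "urls": urls,
                                    "tags": set(msg["hashtags"])}
        for tag in msg["hashtags"]:
            node = hashtags.setdefault(tag, {"mids": set(), "urls": set()})
            node["mids"].add(msg["mid"])
            node["urls"] |= urls

    edges = {}

    def connect(u, v, count):
        # Only pairs that satisfy at least one condition become edges.
        if count > 0:
            edges[(u, v)] = count

    # Rule 1 (image-hashtag): same message id or same URL.
    for img, a in images.items():
        for tag, b in hashtags.items():
            connect(img, tag, len(a["mids"] & b["mids"]) + len(a["urls"] & b["urls"]))
    # Rule 2 (hashtag-hashtag): co-occurrence in a message or a shared URL.
    for t1, t2 in combinations(sorted(hashtags), 2):
        a, b = hashtags[t1], hashtags[t2]
        connect(t1, t2, len(a["mids"] & b["mids"]) + len(a["urls"] & b["urls"]))
    # Rule 3 (image-image): the two messages share a hashtag or a URL.
    for i1, i2 in combinations(sorted(images), 2):
        a, b = images[i1], images[i2]
        connect(i1, i2, len(a["tags"] & b["tags"]) + len(a["urls"] & b["urls"]))
    return edges


if __name__ == "__main__":
    # Fragment of the Fig. 2 example: d's message has hashtags f and g; c's has g.
    fig2 = [
        {"mid": 1, "image": "d", "hashtags": ["f", "g"], "urls": []},
        {"mid": 2, "image": "c", "hashtags": ["g"], "urls": []},
    ]
    print(build_weighted_edges(fig2))
    # {('d', 'f'): 1, ('d', 'g'): 1, ('c', 'g'): 1, ('f', 'g'): 1, ('c', 'd'): 1}
```

On this fragment the sketch reproduces the connections described for Fig. 2: f and g each link to d, f links to g, and c links to d through the shared hashtag g.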
The connection rules of the graph model reveal explicit and implicit relationships between image nodes and hashtag nodes. Explicit associations are direct relations, such as the co-occurrence of two hashtags, or of a hashtag and an image, in the same message; they indicate that the nodes are strongly consistent in semantics. The other relations are implicit associations, namely Rule 3 and the URL-based relationships in Rules 1 and 2; they indicate that the nodes share indirect event dependencies. The social heterogeneous graph therefore combines rich semantic relationships with structured information, laying the foundation for the semantic representation of social network images.
Nodes in the social heterogeneous graph have different attributes, so the heterogeneous graph contains different feature spaces. We define a consistent feature matrix $X_U$ to represent the semantic feature space of all nodes in the heterogeneous graph. The attribute domain of a hashtag node is a sparse text feature. All hashtags in the dataset are tokenized. Based on research on text word vectors, the top-2000 most frequent words in the corpus can cover the hashtag vocabulary, so we select them to form a text feature dictionary. The feature of each hashtag node is converted into a $d_h$-dimensional vector, in which each dimension is the number of occurrences of the corresponding dictionary word in the short text. The feature of the hashtag node is normalized as:
where $h_{ij}$ is the j-th dimension of the vector representation of hashtag node $h_i$. The mapping function $f_h$ generates the hashtag feature matrix $X_H \in \mathbb{R}^{(N-n)\times d_h}$.
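The hashtag features can be sketched in Python with NumPy as below. Two parts are assumptions because the text does not fix them: the CamelCase tokenizer (`TOKEN_RE` is a hypothetical choice) and the L1 row normalization standing in for the normalization step. The vocabulary keeps the most frequent tokens and each entry is an occurrence count, as described in the text.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical tokenizer: splits CamelCase words, all-caps runs and digit runs.
TOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def tokenize(hashtag):
    return [tok.lower() for tok in TOKEN_RE.findall(hashtag)]


def hashtag_features(hashtags, vocab_size=2000):
    """Return (X_H, vocab): token counts over the top-`vocab_size` tokens, row-normalized."""
    tokens = [tokenize(h) for h in hashtags]
    counts = Counter(tok for toks in tokens for tok in toks)
    # most_common keeps first-seen order among equal counts.
    vocab = {tok: j for j, (tok, _) in enumerate(counts.most_common(vocab_size))}
    x_h = np.zeros((len(hashtags), len(vocab)))
    for i, toks in enumerate(tokens):
        for tok in toks:
            if tok in vocab:
                x_h[i, vocab[tok]] += 1.0  # occurrences of dictionary word j in hashtag i
    # Assumed normalization: divide each row by its total count; all-zero rows stay zero.
    sums = x_h.sum(axis=1, keepdims=True)
    x_h = np.divide(x_h, sums, out=np.zeros_like(x_h), where=sums > 0)
    return x_h, vocab
```

For example, `hashtag_features(["#WestTexasExplosion", "#PrayForWest", "#texas"], vocab_size=4)` builds the vocabulary `west, texas, explosion, pray` and drops the out-of-vocabulary token `for`.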
To represent the image nodes in the social heterogeneous graph model, we design an original image feature representation sub-network. The well-known image feature learning approach VGGNet-19 [22] is adopted to capture the global features of social images. Because text and visual features lie in heterogeneous spaces, the feature dimensions of the nodes in the graph model must be aligned, so we set the dimension of the final VGGNet-19 output to 1000. The pre-trained VGGNet-19 performs feature extraction and dimensionality reduction on social images, and a fully connected layer appended to its end forms the original image feature representation sub-network. The representation of the last layer is:
where $V_{vgg}$ is the feature representation learned by the pre-trained VGGNet-19, $W_v$ and $b_v$ are the weight matrix and bias vector of the fully connected layer, and the image feature matrix of the social heterogeneous graph is denoted as $X_V \in \mathbb{R}^{n\times d_v}$.
Given the heterogeneous text and visual features, this section constructs a unified feature matrix for the graph nodes. The feature space is reconstructed by fusing the image feature $X_V$ with the hashtag feature $X_H$. The graph node feature matrix is expressed as:
Here, $d_h$ and $d_v$ are both 1000, so $X_U \in \mathbb{R}^{N\times 2000}$.
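One layout that is consistent with $d_h = d_v = 1000$ giving $X_U \in \mathbb{R}^{N\times 2000}$ is a block matrix with zero padding: image rows use the first $d_v$ columns and hashtag rows the last $d_h$. The sketch below assumes that layout, a linear fully connected layer (no activation) on top of the VGGNet-19 output, and that image nodes are ordered before hashtag nodes; the function names are hypothetical.

```python
import numpy as np


def project_image_features(v_vgg, w_v, b_v):
    """Assumed linear FC layer on VGGNet-19 outputs: X_V = V_vgg W_v + b_v."""
    return v_vgg @ w_v + b_v          # (n, 1000) @ (1000, d_v) + (d_v,) -> (n, d_v)


def build_node_features(x_v, x_h):
    """Assumed block layout for X_U with zero padding between the two modalities."""
    n, d_v = x_v.shape
    m, d_h = x_h.shape                # m = N - n hashtag nodes
    x_u = np.zeros((n + m, d_v + d_h), dtype=np.result_type(x_v, x_h))
    x_u[:n, :d_v] = x_v               # image nodes occupy the first rows and columns
    x_u[n:, d_v:] = x_h               # hashtag nodes occupy the remaining block
    return x_u
```

With $d_v = d_h = 1000$ the result has 2000 columns, and the zero blocks keep the two modalities from mixing before graph convolution propagates information between them.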
Interconnected nodes have different degrees of semantic similarity. To avoid mismatched similarity calculations across different feature spaces, the number of rule conditions satisfied by a node pair is used to denote its degree of association in the social heterogeneous graph, and these counts are set as the edge weights. By the definition of the rules, two nodes may satisfy several conditions at once; for example, two hashtags may appear together in multiple messages, or their messages may share some “URL”. This happens because the same social information can be included in multiple social network messages, so two nodes may satisfy the connection condition many times. The more connection conditions two nodes satisfy, the stronger their semantic correlation. The weighted adjacency matrix A of the social heterogeneous graph is obtained by counting the connection-rule conditions satisfied between nodes, and it serves as the structural information of the heterogeneous graph. If there is an edge $e_{ij} \in E$ between nodes $v_i$ and $v_j$, then $A_{ij} = W_{ij}$, the number of conditions satisfied by the pair; otherwise $A_{ij} = 0$.
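Given per-pair condition counts, the weighted adjacency matrix can be assembled as below. The dict-of-pairs input format and the node ordering (images first, then hashtags, matching the rows of $X_U$) are assumptions made for illustration.

```python
import numpy as np


def weighted_adjacency(nodes, edge_counts):
    """Build the symmetric weighted adjacency matrix A.

    nodes: ordered list of node names (hypothetical: images first, then hashtags).
    edge_counts: {(u, v): number of connection conditions the pair satisfies}.
    """
    index = {node: i for i, node in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for (u, v), w in edge_counts.items():
        i, j = index[u], index[v]
        a[i, j] = a[j, i] = w         # A_ij = W_ij for connected pairs, 0 otherwise
    return a
```

The graph is undirected, so each count is written to both $A_{ij}$ and $A_{ji}$.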
This section designs a GCN-based network for learning node feature representations of the social heterogeneous graph. The reconstructed feature representation of the heterogeneous graph is input into the GCN, which performs extended learning of semantic features on the image attributes. Because of how graph convolution operates, both the global structural information of the graph model and the node feature matrix are preserved in the convolution. We add a classification layer to expand the image semantics and improve node classification performance.
To ensure that the receptive field reaches indirectly connected nodes during neighbor aggregation, we design a K-layer GCN for heterogeneous graph node embedding learning. When node embeddings are generated from the neighborhood attributes of nodes, the input node feature matrix is $X_U$. To highlight the importance of interactions between nodes, the weight of each edge in the social heterogeneous graph is considered in embedding learning; that is, the weighted adjacency matrix A represents the topology of the social heterogeneous graph. The node feature matrix and the adjacency matrix of the social heterogeneous graph are input into the GCN. Assuming the layer-wise propagation rule of GCN [34], the generated node embedding feature matrix can be written as:
$$H^{(k+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(k)}W^{(k)}\right), \quad k = 0, 1, \dots, K-1,$$
where $\tilde{A} = A + I_N$ adds self-loops, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $H^{(0)} = X_U$, $W^{(k)}$ is the trainable weight matrix of layer k, and $\sigma$ is a nonlinear activation such as ReLU.
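A minimal NumPy sketch of a K-layer GCN forward pass over the weighted adjacency matrix A. It assumes the symmetric normalization with self-loops used by GCN [34] and a ReLU after every layer; `weights` is a hypothetical list holding one weight matrix per layer.

```python
import numpy as np


def normalize_adjacency(a):
    """Add self-loops and apply symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a_tilde = a + np.eye(a.shape[0])
    # Weights are non-negative counts, so every degree is at least 1 after self-loops.
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(a, x, weights):
    """Run K GCN layers; `weights` is a hypothetical list [W^(0), ..., W^(K-1)]."""
    a_hat = normalize_adjacency(a)
    h = x                                              # H^(0) = X_U
    for w in weights:
        h = np.maximum(a_hat @ h @ w, 0.0)             # aggregate neighbors, transform, ReLU
    return h                                           # H^(K)
```

Because A holds condition counts rather than 0/1 entries, strongly associated neighbors contribute more to each aggregated row, and stacking K layers lets a node receive information from neighbors up to K hops away.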
The generated semantic feature matrix $H^{(K)}$ is classified row by row in the classification layer with the softmax function:
$$P = \mathrm{softmax}\left(H^{(K)} W_f\right)$$
where $W_f \in \mathbb{R}^{d\times C}$ is the trainable parameter matrix, d is the feature dimension of $H^{(K)}$, and C is the number of event categories contained in the dataset.
The cross-entropy loss is used as the loss function, and the model is trained by gradient descent. The loss is computed as:
$$\mathcal{L} = -\sum_{i} \ln P_{i c_i}$$
where the sum runs over the labeled image nodes in the training set, $P_{ij}$ is the j-th entry of the softmax output probability vector of image node i, i.e., the probability that node i belongs to class j, and $c_i$ is the true class label of node i. SHGN minimizes this loss on the training set and computes the gradients through backpropagation.
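The classification layer and its loss can be sketched as follows, assuming a plain softmax over $H^{(K)} W_f$ and a cross-entropy averaged over the labeled training image nodes (averaging instead of summing only rescales the gradient). `train_idx` and `labels` are hypothetical names for the training node indices and their true classes $c_i$.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classification_loss(h_k, w_f, train_idx, labels):
    """Return (mean cross-entropy over labeled image nodes, probability matrix P).

    h_k: (N, d) output of the last GCN layer; w_f: (d, C) classifier weights;
    train_idx: indices of labeled training image nodes; labels: their true classes c_i.
    """
    p = softmax(h_k @ w_f)                   # P[i, j]: probability that node i is in class j
    p_true = p[train_idx, labels]            # P[i, c_i] for every training node
    loss = -np.mean(np.log(p_true + 1e-12))  # small epsilon guards against log(0)
    return loss, p
```

Hashtag nodes still take part in the forward pass, but only image nodes with known labels contribute to the loss.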
SHGN adopts joint training and fine-tunes the original image feature representation sub-network in advance. During joint training with the GCN classification sub-network, the pre-trained VGGNet-19 parameters are kept fixed to avoid over-fitting, and only the parameters $W^{(i)}$ of the GCN classification sub-network are updated, so that the whole network can be trained effectively. The implementation steps of SHGN are given in Algorithm 1.
Algorithm 1 SHGN
Input: Social network image dataset M; dimension $d_v$ of the image features generated by the original image feature representation sub-network; number of GCN classification sub-network layers K; hidden feature dimension d of the GCN classification sub-network; number of training iterations $N_r$.
Output: Image semantic feature representation; predicted image labels.
1: Extract images and multi-attribute information from the dataset M;
2: Construct the social heterogeneous graph G = (V, E, f);
3: Generate the feature matrix of hashtag nodes according to Eq. (1);
4: Initialize the parameters $W_v$, $b_v$ of the fully connected layer appended to VGGNet-19;
5: Pre-train the original image feature representation sub-network;
6: Reconstruct the node feature matrix $X_U$ according to Eq. (3);
7: Initialize the GCN classification sub-network parameter matrices $W^{(i)}$;
8: Jointly train the original image feature representation sub-network and the GCN classification sub-network;
9:
10: Calculate the cross-entropy loss;
11:
12: Calculate the descent gradient using the Adam algorithm;
13: Update the GCN classification sub-network parameters $W^{(i)}$;
14: Output L and the image semantic feature representation.
SHGN is efficient. Before model training, we construct the social heterogeneous graph from social relations with complexity O(n(N - n) + N^2), where n is the number of image nodes, N - n is the number of hashtag nodes, and N is the total number of nodes in the social heterogeneous graph. Generating the hashtag node feature vectors takes O(N - n) time. Image node features are obtained from VGGNet-19, and the time complexity is
Experimental results and analysis
In this section, the image semantic features generated by SHGN are used for image classification, and the quality of SHGN's image semantic representation is analyzed in terms of classification performance. First, SHGN as a whole is compared with the comparison algorithms. Then, an ablation study analyzes the rationality and effectiveness of each part of SHGN. Finally, we analyze the sensitivity of SHGN to the number of GCN layers, the number of neurons, and the dataset. These experiments verify the effect of SHGN on semantic extension learning of social network image features.
Setup
Dataset
We choose the public disaster event datasets CrisisLexT6 [41] and CrisisLexT26 [42], collected from Twitter. The event “West Texas Explosion” and unrelated event classes are extracted from CrisisLexT26 to form a single-event dataset. Each dataset is annotated with event-type categories and a “NONE” category. The image of each message is crawled through the Twitter API based on the Tweet ID and the image link “url”. Because some messages lack images or have mismatched images, the crawled image data are filtered and preprocessed to form three image datasets. The social network multi-attribute features, including topic hashtags and hyperlink information “URL”, are extracted for each dataset. The final dataset statistics are shown in Table 1. Each dataset is randomly divided into training and test sets at a ratio of 8:2.
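A random 8:2 split can be sketched with NumPy as below; the fixed seed and the rounding of the cut point are illustrative choices, not settings reported in the paper.

```python
import numpy as np


def split_indices(num_items, train_ratio=0.8, seed=0):
    """Randomly split item indices into training and test sets (8:2 by default)."""
    rng = np.random.default_rng(seed)          # fixed seed: illustrative choice
    perm = rng.permutation(num_items)          # shuffled indices 0..num_items-1
    cut = int(round(train_ratio * num_items))
    return perm[:cut], perm[cut:]
```

In a transductive GCN setting, all nodes stay in the graph and the split only decides which image labels enter the loss and which are held out for evaluation.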
Table 1 Statistics of the datasets
To verify the effectiveness of SHGN in expressing image semantics, we select related image semantic representation methods for experimental comparison: Corr-LDA [43], an algorithm that mines image semantics through topic models; and VGGNet-19 [22], MaxNet [6], and GCN [34], image feature representation methods based on deep neural networks. In addition, because SHGN integrates the text of topic hashtags into the visual features of images as extended semantics, the typical multi-modal fusion algorithms GCH [44] and att-RNN [45] are included to further verify the effectiveness of the feature fusion. These two methods use topic hashtags as their text data.
We implement SHGN in TensorFlow and use the Adam optimizer with a batch size of 64. The initial learning rate is 0.2, and we decrease the learning rate by 0.01 every 20 steps. Following [34], the maximum number of training iterations $N_r$ is set to 200. The number of GCN classification sub-network layers K is tuned in [1, 4], and the hidden feature dimension d of the GCN classification sub-network is tuned over {16, 32, 64, 128, 256}. Their optimal values are determined by the experiments below. For the comparison algorithms, we follow the settings in their original papers so that each reaches its best performance.
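The training settings leave two details open, so the following sketch encodes one reading of each: the learning rate is reduced by subtracting 0.01 every 20 steps (not multiplying by 0.01), and “[1, 4]” means K ∈ {1, 2, 3, 4}. Both readings, and the `floor` parameter, are assumptions.

```python
from itertools import product


def learning_rate(step, initial=0.2, decrement=0.01, every=20, floor=0.0):
    """Step schedule: subtract `decrement` every `every` steps (subtractive reading assumed)."""
    return max(initial - decrement * (step // every), floor)


# Assumed search grid: K in {1, 2, 3, 4} and d in {16, ..., 256}, i.e. 20 configurations.
GRID = list(product(range(1, 5), [16, 32, 64, 128, 256]))

if __name__ == "__main__":
    print([round(learning_rate(s), 2) for s in (0, 19, 20, 199)])  # [0.2, 0.2, 0.19, 0.11]
    print(len(GRID))  # 20
```

Under this reading, the rate after the maximum of 200 iterations is still 0.11, so the floor never triggers in practice.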
The details of the comparison algorithms are as follows.
Corr-LDA [43]: a shallow image semantic representation method that models the relationship between text and image modal data with a topic model. It uses the topic semantic distribution of the image modality to represent image features, which are fed into an SVM classifier to evaluate and compare image semantic quality.
VGGNet-19 [22]: a deep learning-based image feature representation method with excellent performance that is widely used in tasks such as image classification and retrieval. In this paper, the generated image features are input into the fully connected layer designed for SHGN to perform image classification.
MaxNet [6]: a deep convolutional neural network-based model for image retrieval. It extracts features from the various pipelines of the inception module and aggregates their maximum values. Its softmax classification results are used.
GCN [34]: a deep learning method widely used on graph data to learn node embeddings. A graph containing only image nodes is built according to Rule 3, and GCN learns and classifies the node features of this graph. Image node features are generated in the same way as in SHGN.
GCH [44]: a GCN-based graph convolutional hashing structure that learns a consistent feature space for cross-media data in association graphs. A sigmoid activation outputs the predicted labels.
att-RNN [45]: a multi-modal fusion method based on an attention mechanism and a recurrent neural network. It integrates text, images, topic hashtags, and links for event classification.
Evaluation measures
Image classification results are judged against the class label of each image. Four common classification evaluation metrics are selected: accuracy, recall, precision, and F1 score.
Precision (P) is the proportion of samples predicted as positive by the classifier that are actually positive:
$$P = \frac{tp}{tp + fp}$$
Recall (R) is the proportion of actual positive samples that the classifier predicts as positive:
$$R = \frac{tp}{tp + fn}$$
Accuracy is the ratio of correctly classified samples to the total number of classified samples:
$$\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$
F score (F1) is a comprehensive index, the harmonic mean of precision and recall:
$$F1 = \frac{2 \times P \times R}{P + R}$$
where tp and tn are the numbers of positive and negative samples that are correctly classified, and fp and fn are the numbers of samples wrongly classified as positive and as negative, respectively.
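The four metrics can be computed directly from tp, tn, fp and fn. This sketch treats one class as positive (one-vs-rest); how the scores are averaged over event classes is not specified in the text, so any averaging on top of it is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and F1 for one positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # correct negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly predicted positive
    fn = sum(t == positive and p != positive for t, p in pairs)  # wrongly predicted negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"P": precision, "R": recall, "Accuracy": accuracy, "F1": f1}
```

For example, with true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, tp = 2, tn = 1, fp = 1 and fn = 1, so P = R = F1 = 2/3 and Accuracy = 0.6.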
To evaluate more comprehensively how well SHGN learns image semantics, we compare its image classification performance with that of typical image feature representation and semantic learning algorithms. Experiments are carried out on each of the three datasets, and we report the four evaluation indicators. The performance comparison is shown in Fig. 3.

Performance comparison of SHGN and baselines on the classification task.
From the comparison results in Fig. 3, it can be seen that SHGN achieves clear improvements on all four evaluation indicators: P, R, F1 and Accuracy. VGGNet-19, MaxNet and GCN use deep neural networks to extract global features and learn deeper representations, so they significantly outperform the shallow topic model Corr-LDA. SHGN in turn significantly outperforms the deep feature representation methods VGGNet-19 and MaxNet in classification. Since SHGN builds on VGGNet-19, this result indicates that the social heterogeneous graph convolution model effectively supplements and mines image semantic features. MaxNet, in contrast, does not consider the social network context of images, which SHGN exploits to improve image semantic learning. SHGN also classifies better than GCN, because it introduces the multi-attribute features of the social network when constructing the social heterogeneous graph, producing a compact graph with rich semantics in which node embeddings are learned more adequately and completely. Taken together, the four indicators show that SHGN has a strong ability to represent image semantics.
Topic hashtags and hyperlink information play a decisive role in constructing the social heterogeneous graph. To verify how these two types of social network multi-attribute features influence SHGN, the corresponding connection conditions are deleted from the graph construction rules and a new graph model is built for SHGN. The variant that deletes the “co-h” connection relationship during graph construction is denoted SHGN-h, and the variant that removes the connection rules related to “co-URL” is denoted SHGN-u. To make the comparison more convincing, experiments are carried out on three datasets. Table 2 reports the results on two evaluation indicators, F1 and Accuracy.
Influence of multi-attribute features of social networks on SHGN
In Table 2, we observe that SHGN outperforms the comparison algorithms on both indexes. Notably, deleting either of the two social network attribute features significantly reduces the performance of SHGN. The performance drop of SHGN-h is larger than that of SHGN-u, which shows that hashtags play a more important role than URLs in constructing the social heterogeneous graph: hashtags guide the formation of explicit relationships in the graph model, while external URLs identify implicit relationships between nodes with weaker semantic strength. The results demonstrate the importance of both social multi-attribute features in SHGN. These features determine how close the relationships in the social heterogeneous graph are, which in turn affects the topology between nodes and the process of association aggregation. Removing connection rules makes the generated social heterogeneous graph looser and weakens its semantic associations. As a result, image semantic learning lacks richer semantics, which degrades the final image classification. Moreover, SHGN also has certain advantages over the multimodal fusion algorithms GCH and att-RNN, because it not only fuses text semantics but also mines image context through the heterogeneous graph, which fully enriches the image representation. These results verify the effectiveness of the proposed algorithm in data fusion.
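The ablation argument, that removing a connection rule makes the graph looser, can be illustrated on toy data. The sketch below is a simplification and an assumption: it builds a homogeneous image-image graph in which two images are linked if they share a hashtag (standing in for “co-h”) or a URL (standing in for “co-URL”), whereas the actual SHGN construction rules are defined elsewhere in the paper and use a heterogeneous graph. The names `build_edges`, `density` and the toy posts are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical post record: image id -> (hashtags, URLs) attached to the post.
Posts = Dict[str, Tuple[Set[str], Set[str]]]


def build_edges(
    posts: Posts, use_hashtags: bool = True, use_urls: bool = True
) -> Set[Tuple[str, str]]:
    """Link two images if they share a hashtag (co-h) or a URL (co-URL)."""
    edges: Set[Tuple[str, str]] = set()
    for (a, (tags_a, urls_a)), (b, (tags_b, urls_b)) in combinations(
        sorted(posts.items()), 2
    ):
        co_h = use_hashtags and bool(tags_a & tags_b)
        co_url = use_urls and bool(urls_a & urls_b)
        if co_h or co_url:
            edges.add((a, b))
    return edges


def density(posts: Posts, edges: Set[Tuple[str, str]]) -> float:
    """Fraction of possible undirected image pairs that are connected."""
    n = len(posts)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


if __name__ == "__main__":
    posts: Posts = {
        "img1": ({"#flood"}, {"u1"}),
        "img2": ({"#flood"}, set()),
        "img3": (set(), {"u1"}),
        "img4": ({"#fire"}, set()),
    }
    variants = [("full", {}), ("no co-h", {"use_hashtags": False}),
                ("no co-URL", {"use_urls": False})]
    for name, kwargs in variants:
        e = build_edges(posts, **kwargs)
        print(name, len(e), round(density(posts, e), 3))
```

On this toy input, dropping either rule halves the number of edges, which mirrors the qualitative point that SHGN-h and SHGN-u operate on sparser graphs.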
Influence of GCN layers on SHGN
The number of GCN layers in SHGN affects the classification performance of the whole algorithm, because it determines the scope of the receptive field in neighbor aggregation. With different numbers of layers K, SHGN aggregates the features of connected nodes within neighborhoods of different sizes and thus achieves different degrees of semantic expansion of image features. We evaluate the classification performance of SHGN for different values of K to analyze how the number of GCN layers influences its image semantic representation, with K set to {1, 2, 3, 4}. Using classification precision as the metric, image classification experiments are carried out on the three image datasets. The experimental results are shown in Table 3.
Influence of different GCN layers on classification performance of SHGN
From the results in Table 3, the following conclusions can be drawn. Precision is best when the number of GCN layers is K = 2, and for K > 2, P tends to decrease as K increases. When K = 1, the convolution operation of GCN aggregates only the neighbors directly connected to a node. When K = 2, the semantic relationships between indirectly connected nodes can also be mined, so node features are effectively expanded and supplemented. When K > 2, the deeper GCN may incorporate unnecessary information from weakly related nodes, so P can decrease. We therefore set the number of GCN layers in SHGN to 2 in all experiments to obtain the best node embedding performance.
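The receptive-field argument can be checked on a toy graph. The sketch below assumes the standard GCN propagation rule of Kipf and Welling, $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$ with $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$; the exact SHGN layer is defined elsewhere in the paper and may differ. Row i of $\hat{A}^K$ is nonzero only for nodes within K hops of node i, so K = 1 reaches direct neighbors and K = 2 also reaches indirectly connected nodes. The path graph, the feature size of 8, and the use of 64 hidden units (echoing M = 64) are hypothetical illustration choices.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_forward(adj: np.ndarray, features: np.ndarray, weights: list) -> np.ndarray:
    """Stack len(weights) GCN layers; ReLU between layers, none after the last."""
    a_norm = normalized_adjacency(adj)
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def receptive_field(adj: np.ndarray, node: int, k: int) -> set:
    """Nodes whose input features can influence `node` after k propagation steps."""
    reach = np.linalg.matrix_power(normalized_adjacency(adj), k)[node]
    return set(np.flatnonzero(reach > 1e-12).tolist())


if __name__ == "__main__":
    # Hypothetical path graph 0-1-2-3 (not one of the paper's datasets).
    path = np.zeros((4, 4))
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        path[u, v] = path[v, u] = 1.0
    for k in (1, 2, 3):
        print(k, sorted(receptive_field(path, 0, k)))
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    out = gcn_forward(path, x, [rng.normal(size=(8, 64)), rng.normal(size=(64, 2))])
    print(out.shape)
```

The ReLU between layers does not enlarge the receptive field; only the number of multiplications by the normalized adjacency does, which is why K alone controls how far semantics propagate.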
In SHGN, the number of hidden layer units M of the GCN in the social heterogeneous graph convolutional network determines the dimensionality to which node features are mapped, and hence how much information is retained. It therefore affects image semantic feature learning. In this section, the number of units that gives SHGN its best performance is obtained by cross-validation. We select M from {16, 32, 64, 128, 256} and evaluate the resulting change in the classification accuracy of SHGN. Table 4 records the classification results on the three datasets.
Sensitivity analysis of SHGN to the number of hidden layer units
In Table 4, we observe that SHGN achieves the best classification accuracy on all three datasets when M = 64. When the dimensionality is low, the network cannot preserve the full information of the nodes; when the number of hidden layer units is large, the computational complexity increases significantly while accuracy does not improve further. Therefore, 64 hidden layer units give the best overall results, and M is set to 64 in the following experiments to keep SHGN in a good learning state.
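Selecting M by cross-validation amounts to a grid search: for each candidate, train on k-1 folds, score on the held-out fold, and keep the candidate with the best mean score. A minimal sketch follows; `train_and_score` is a hypothetical placeholder for training SHGN with M hidden units and returning validation accuracy, the fold count is an assumption, and the synthetic scorer in the usage code is invented for illustration and does not reproduce Table 4.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def k_fold_splits(n_samples: int, k: int) -> List[Split]:
    """Partition indices 0..n_samples-1 into k contiguous folds (no shuffling)."""
    indices = list(range(n_samples))
    splits: List[Split] = []
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, val))
        start += size
    return splits


def select_hidden_units(
    candidates: Sequence[int],
    n_samples: int,
    k: int,
    train_and_score: Callable[[int, List[int], List[int]], float],
) -> Tuple[int, Dict[int, float]]:
    """Return the candidate M with the highest mean validation score over k folds."""
    splits = k_fold_splits(n_samples, k)
    scores = {
        m: mean(train_and_score(m, tr, va) for tr, va in splits) for m in candidates
    }
    best = max(candidates, key=lambda m: scores[m])
    return best, scores


if __name__ == "__main__":
    # Synthetic scorer peaking at M = 64; stands in for real SHGN training.
    def toy(m: int, train: List[int], val: List[int]) -> float:
        return 1.0 - abs(math.log2(m) - 6) / 10

    best_m, all_scores = select_hidden_units([16, 32, 64, 128, 256], 100, 5, toy)
    print(best_m, all_scores)
```

In practice the folds would usually be shuffled or stratified by event category; contiguous folds are used here only to keep the sketch deterministic.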
To analyze the sensitivity of SHGN to different datasets, this section examines experimentally how the number of event categories in a dataset and the proportion of social multi-attributes affect SHGN. The binary and multi-class classification performance of SHGN is tested on the three datasets.
Fig. 4 shows the training process of SHGN on the different datasets, recording its recall over 200 iterations. The recall on each dataset shows a rising trend with small fluctuations. Combined with the dataset statistics in Table 1, it can be seen that “CrisisLexT26” contains the largest number of event categories, yet its recall is the highest: the multi-class performance on this multi-event dataset is better than the binary classification performance on the single-event dataset. This shows that SHGN is stable in both binary and multi-class classification, and that the number of categories in a dataset has little influence on SHGN.

Training process of SHGN on different datasets.
In addition, the influence of the amount of social network multi-attribute features on SHGN is analyzed by combining the results in Fig. 4 with the information in Table 1. The higher the proportion of topic hashtags and URLs in a dataset, the higher the classification recall SHGN obtains, so recall is positively related to the proportion of social multi-attribute features in the dataset. This is because multi-attribute content in social networks semantically associates independent and scattered image data, which determines how densely edges connect the social heterogeneous graph and hence how compact its semantic relationships are. On social heterogeneous graphs of different semantic closeness, SHGN produces node feature representations with different degrees of aggregation through GCN embedding learning. The number of social attributes thus governs the degree of semantic expansion of image nodes and affects the classification performance of SHGN.
This paper proposes an image semantic feature representation algorithm based on a social heterogeneous graph network. The semantic correlations of image context are discovered, and a social heterogeneous graph model is constructed from social attributes. Through the node aggregation operation of the graph convolutional neural network, key text semantics (i.e., topic hashtags) are effectively correlated with and supplement the visual information, realizing an effective extended representation of image semantics in social networks. In classification experiments on the Twitter image dataset, SHGN achieves a significant performance improvement over the compared algorithms.
