Abstract
Owing to the continuous enrichment of mobile application resources, mobile applications carry almost all user behaviors and preferences. The analysis of user behavior regarding mobile terminals has become an important research direction. The frequency with which users click on mobile applications reflects their preferences to a certain extent. In this study, we propose a mobile application click-frequency prediction model based on heterogeneous information network representation. This model first constructs a heterogeneous information network between users’ mobile devices and mobile applications. To generate a meaningful sequence of network-embedded nodes, we perform a random walk on a specified meta-path. Finally, the prediction of users’ mobile application click frequency is completed using representation fusion and matrix factorization. Experiments show that our method outperforms other baseline methods in terms of the mean absolute error and root mean square error. Therefore, the application of a heterogeneous information network representation method to the prediction model is effective. This study contributes to research on the behavior of mobile terminal users.
Introduction
With the popularization of smartphones, many mobile application markets have emerged, and an increasing number of mobile applications are being developed. In 2020, the number of application downloads among global mobile users exceeded 218 billion, which showed an annual increase of 7%; additionally, the cumulative usage time reached 3.5 trillion hours [1]. While numerous applications offer convenience and an optimal user experience to mobile owners, they also cause information overload. Therefore, analyzing users’ mobile application usage data, identifying the trends in their behaviors, and recommending mobile applications to meet individual needs has become a key direction of current academic research in this field.
Traditional social information networks are divided into homogeneous information networks and heterogeneous information networks (HINs) according to the number of node and link types in the network. The majority of real information networks are heterogeneous [2], and their network structures contain different types of nodes and relationships. Compared with homogeneous information networks, HINs have more complex structures and more comprehensive node relationships. Recently, an increasing number of studies have modeled such interconnected data as HINs composed of different types of nodes and edges and have used the comprehensive structural information and rich semantic information in the network for more accurate knowledge discovery [3]. Previously, research on recommender systems and predictive models focused on bipartite graphs. The NMF [4] model constrains the predicted user matrix U and item matrix I to be non-negative, on the grounds that negative values have no meaning in real life. Biased [5] takes into account the user’s preferences and the item’s characteristics in real life; therefore, the average number of clicks received by the item and the average number of clicks made by the user are added to the final prediction result. GCMC [6] is a graph neural network model; it regards the recommendation problem as a link prediction problem on the user-item bipartite graph and learns the node embeddings by differentiable message passing on that graph. NGCF [7] uses neural graph collaborative filtering to encode the collaborative information between users and items into their embeddings, thereby overcoming the lack of a collaborative signal in traditional embedding models.
Although the above algorithms can perform well in recommendation and prediction scenarios, they are limited to learning on the bipartite graph and ignore other auxiliary information about users and items, such as users’ age and gender and other item attributes. PathSim [8] is the simplest heterogeneous information network prediction model. It considers the semantics of meta-paths composed of different types of objects, and it uses a meta-path-based similarity measure to evaluate the similarity of objects of the same type. Meng et al. proposed the AvgSim [9] evaluation model, which evaluates the similarity score using two random walk processes along a given meta-path and its reverse meta-path. It is difficult for these two models to combine information from multiple meta-paths for prediction, whereas the SemRec [10] model proposes the concept of a weighted HIN and weighted meta-paths for the link attribute values that exist in the information network. The link attribute values are distinguished to reveal the object relationships in more detail. Prediction algorithms mostly use user similarity as a similarity regularization term, which constrains similar users within the low-rank matrix factorization framework; however, this is effective only for close user or item relationships. When the relationship is relatively distant, the prediction ability is significantly reduced. The dual similarity regularization (DSR) [11] algorithm, which was subsequently proposed, presents a new dual similarity regularization method that can simultaneously impose constraints on users and items with high and low similarity, thereby forcing the latent factors of two similar objects to converge.
Although the performance of the above-mentioned HIN algorithms exhibits improvement over that of traditional algorithms, most of them produce predictions based on the similarity of meta-paths. This presents two major challenges: first, when the semantics contained in some meta-paths are sparse or noisy, the prediction accuracy decreases because it is limited by the semantics of those meta-paths; second, meta-path similarity only expresses the semantic relationship between nodes, and the meaning it represents may be unrelated to the prediction result. In this study, we propose a method to predict the click frequency of mobile applications based on HIN representation. This method first constructs a heterogeneous information network for user click behaviors, and it then performs a random walk on a given meta-path to learn the low-dimensional vector representations of mobile devices and application nodes. Finally, it uses representation fusion and matrix factorization to predict the application click frequency in a user’s mobile phone; the model is adjusted by changing the number of input meta-path types to address the above two problems. The key contributions of this study are summarized as follows. A heterogeneous information network is constructed by utilizing the node relationship between the user’s device and the application. A variety of meta-paths with different semantics are selected for further analysis according to the distribution of different nodes in the network. The semantic and structural information contained in the heterogeneous information network of mobile phone devices and applications is revealed using a heterogeneous information network representation method based on meta-paths [12].
We also adopt a node representation fusion method, which integrates the node representations based on different meta-paths into one representation, and we use the heterogeneous information network representation vector and matrix factorization algorithm for a weighted summation as the final prediction result [13]. We explore the impact of different numbers of the same type of node in the heterogeneous information network on the results. We select two types of nodes, change the number of each type of node, and divide them into detailed nodes and simple nodes. We then use the controlled variable method to analyze the influence of different numbers of the same type of node on the experimental results and select the best result as the final model of this study. Compared with other methods, the experimental results show that richer semantics and node relationships can be extracted through a heterogeneous information network.
This study primarily uses a heterogeneous information network to predict and classify mobile application clicks and demonstrates its effectiveness. In Section 2, we introduce the related theory and algorithms. Section 3 provides a specific description of the mobile application click-frequency prediction method based on heterogeneous information network representation. Section 4 presents the experimental results and analyses. Finally, the conclusions and prospects are presented in Section 5.
Related works
This section summarizes existing literature related to the following two aspects: heterogeneous information networks and node embedding vector representation of network structure.
Heterogeneous information networks
An information network [14] can be represented by a directed graph G = (V, E), where V and E represent the object and link sets, respectively. It also contains an object-to-object-type mapping function θ : V → T and a mapping function ψ : E → R related to the link type.
Unlike social network analysis, information networks clearly distinguish the types of objects and relationships in the network. When the number of object types |T| and relationship types |R| both equal 1, the information network is considered a homogeneous information network. When the sum of the number of object and relationship types |T| + |R| > 2, the information network is considered to be heterogeneous [15]. Figure 1 shows schematics of homogeneous and heterogeneous information networks.

Schematic of (a) a homogeneous information network and (b) a heterogeneous information network.
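The distinction between the two kinds of network can be illustrated with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `InfoNetwork`, its methods, and the toy nodes are hypothetical, and the only rule taken from the text is the test |T| + |R| > 2.

```python
from dataclasses import dataclass, field


@dataclass
class InfoNetwork:
    """Directed information network G = (V, E) with type mappings theta and psi."""
    node_type: dict = field(default_factory=dict)  # theta: node -> object type
    edge_type: dict = field(default_factory=dict)  # psi: (src, dst) -> link type

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, dst, etype):
        self.edge_type[(src, dst)] = etype

    def is_heterogeneous(self):
        # |T| + |R| > 2: more than one object type or more than one link type
        n_object_types = len(set(self.node_type.values()))
        n_link_types = len(set(self.edge_type.values()))
        return n_object_types + n_link_types > 2


# Hypothetical toy network: one device that has one application installed
g = InfoNetwork()
g.add_node("Device1", "Device")
g.add_node("Application1", "Application")
g.add_edge("Device1", "Application1", "installs")
hetero = g.is_heterogeneous()  # two object types + one link type = 3 > 2
```

A network whose nodes and links all share a single type gives |T| + |R| = 2 and is therefore reported as homogeneous.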
The network scheme [14] is a meta-template that describes the HIN as G = (V, E). Its directed graph can be expressed using G = (T, R). Node type mapping can be expressed as θ : V → T, and the link-type mapping is ψ:E → R. The nodes in the network pattern structure are typically biased; that is, the number of nodes of one type is greater than that of other types. Through in-depth mathematical analysis, heterogeneous relations can be divided into affiliation relations (ARs) and interaction relations (IRs) with different structural characteristics [16]. From the different relationships that exist between the two types of nodes, different network schemes can be constructed, as shown in Fig. 2.

Common network schemes, including the (a) multi-relation, (b) dichotomous, (c) multi-center, and (d) star schemes.
For example, Fig. 3 shows the heterogeneous information network formed by venues, papers, terms, and authors. Figure 3 (a) illustrates the network scheme, which describes the types of objects in the heterogeneous network of literature and the types of relationships between them. Figure 3 (b) shows a schematic instance of Fig. 3 (a), in which four types of objects are included: authors, papers, venues, and terms. Links connect different types of objects, and the type of a link is defined by the relationship between the two types of objects; that is, the attributes of different types of nodes and edges have different meanings [17]. For instance, a link between a paper and a venue indicates that the paper is published at that venue, whereas a link between a paper and a term indicates that the paper contains the term.

Heterogeneous information network of conference papers.
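Sun et al. [8] proposed the concept of a meta-path in 2011, which associates multiple node and edge types in heterogeneous information networks, thereby expressing the complex relationships of heterogeneous information networks simply and effectively. For a given heterogeneous information network G = (V, E) with network scheme (T, R), a meta-path is a path of the form $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, where each $A_i \in T$ is an object type and each $R_i \in R$ is a relationship type; it defines the composite relation $R_1 \circ R_2 \circ \cdots \circ R_l$ between the object types $A_1$ and $A_{l+1}$.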

Examples of meta-paths for conference papers.
As observed in Fig. 4, Author1 → Paper1 → Author2 indicates authors who co-published the same paper, and Author1 → Paper1 → Venue1 → Paper2 → Author2 represents authors who have published different papers in the same venue. The above two examples show that the meta-path is an essential substructure of a heterogeneous information network and that it can effectively reflect the semantic information contained in a path.
In natural language processing and text mining, word vectors are typically used as representations of the inner meanings of words. From traditional vector representations to more recent word embedding representations, word vectors have been widely utilized as a common feature of text [18, 19]. Similarly, the purpose of network embedding representation is to embed nodes in a low-dimensional space while maintaining the structure and properties of the network and then apply them as basic features to clustering, classification, prediction, or recommendation algorithms [20]. Network representation learning models are generally split into three types: (1) models based on matrix factorization, (2) models based on random walks, and (3) models based on deep neural networks [21].
Perozzi et al. [12] proposed the Deepwalk model, which is based on random walks. Drawing on Word2vec [18], the model transforms the network structure into multiple node sequences and shows that the nodes in random-walk sequences and the words in text have similar distributional characteristics. After transforming the network structure into a sequence of nodes, the vector representation of each node can be obtained via the skip-gram model in Word2vec. The node2vec model [22], proposed by Grover, changes the way nodes are walked in Deepwalk. It introduces two strategies for walking between network nodes, breadth-first sampling and depth-first sampling, and combines them to obtain the probability of moving from the current node to each candidate node. The node sequences obtained by walking are then used to train node vectors with Word2vec. The structural deep network embedding (SDNE) model, proposed by Wang [23], is an improved form of the auto-encoder method [24], and it is used to realize the embedding of network nodes.
The above-mentioned network embedding methods are all based on homogeneous information networks, and they cannot be directly applied to heterogeneous information networks. With the introduction of heterogeneous information networks, there has been an increasing number of studies based on heterogeneous information network embedding representations. Chang et al. [25] designed a deep embedding model to capture the complex interactions between heterogeneous data in a network. The model maps different types of vertices in a heterogeneous information network to a space of the same dimension to obtain the vector representation of the same dimension. Xu et al. [26] proposed an embedding of embedding (EOE) method to encode the intra- and inter-network edges of coupled heterogeneous networks. A harmonic embedding matrix is used to express the links and edges between different networks to achieve the vectorization of the network nodes. However, it is only used in two simple networks, and the loss functions of the two networks are simply added; thus, the scope of this method’s application is restricted. The metapath2vec model [27], proposed by Dong et al., first defines node neighbors through meta-paths and then learns heterogeneous embeddings through negatively sampled skip-grams. HeGAN [28] is inspired by generative adversarial networks (GANs), which train both a discriminator and a generator in a minimax game. Finally, node classification proves the effectiveness of the method. Although these methods can learn network embeddings in various heterogeneous networks, their representations of nodes and relationships cannot be effectively applied to predictive models.
In this study, we use the network embedding method to extract effective information from a heterogeneous information network, and we use this information to predict the click frequency of applications. The node relationship in the heterogeneous information network is converted into a homogeneous information network according to the preset meta-path to generate different node sequences. Then, the network embedding method is used in the homogeneous information networks to obtain the representation vector of each node, which is used as the input of the matrix factorization model for prediction.
Heterogeneous information network representation for prediction
In this section, we introduce the prediction model based on a heterogeneous information network representation; its implementation framework is shown in Fig. 5. We describe the construction of the heterogeneous information network from the dataset, the selection and regulation of the meta-paths, the generation of the corresponding node vectors using Deepwalk, the fusion of the different semantic vectors generated for the same node under different meta-paths, the use of the fused vectors as the input to the probabilistic matrix factorization model, and finally, the output of the corresponding prediction results.

Overall framework of proposed models.
For the construction of a heterogeneous information network scheme, we represent each type of node as a set and connect it to the other sets with which it has a clear relationship, until all node sets are connected directly or indirectly. The heterogeneous relationship between two connected sets is used to determine the network scheme to which it belongs, which is convenient for the next step of the analysis. As an example, Fig. 6 (a) and Fig. 6 (b) show the HIN schemes for Douban movies and Douban books, respectively, both from the social networking site Douban.

Different HIN schemes for (a) Douban movies and (b) Douban books.
Based on the above content and a user’s mobile application data, we construct the network scheme and a schematic instance of the heterogeneous information network, as shown in Fig. 7. It can be observed that the network scheme is a multi-center scheme, in which the node set Clickfrequency is the predicted result, and a variety of meta-paths can be constructed using the remaining nodes.

HIN based on mobile phone devices and other information. (a) shows the network model constructed in this paper, and (b) shows the network instance in the network scheme.
The network embedding methods only focus on a homogeneous network that is composed of a single type of node and link, and they cannot directly deal with a heterogeneous network that is composed of multiple node and link types.
According to the constructed HIN, different types of meta-paths can be constructed through the relationships between nodes to obtain the semantic relationships between them. Therefore, our primary task is to transform the heterogeneous information network into multiple homogeneous information networks through meta-paths, so as to facilitate the generation of node sequences in the next step. The main task of this section is to construct the required meta-paths.
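As an illustration of how a meta-path induces a homogeneous network, the sketch below connects two devices whenever they share an application, which corresponds to the meta-path Device → Application → Device. The function name `project_by_metapath` and the toy data are hypothetical and are not taken from the study's code.

```python
from collections import defaultdict


def project_by_metapath(device_apps):
    """Link two devices that share at least one application (Device - Application - Device)."""
    app_devices = defaultdict(set)
    for device, apps in device_apps.items():
        for app in apps:
            app_devices[app].add(device)
    neighbors = {device: set() for device in device_apps}
    for devices in app_devices.values():
        for device in devices:
            neighbors[device] |= devices - {device}  # every other device using this app
    return neighbors


# Hypothetical toy data: which applications are installed on which device
device_apps = {
    "Device1": {"Application1", "Application2"},
    "Device2": {"Application1"},
    "Device3": {"Application2"},
}
homogeneous = project_by_metapath(device_apps)
```

In this toy example Device1 becomes a neighbor of both Device2 and Device3, while Device2 and Device3 are not connected to each other because they share no application.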
For the device-application heterogeneous information network scheme shown in Fig. 7 (a), the HIN is a multi-center scheme; therefore, when selecting the meta-paths, only those starting with the node sets Device and Application are selected. The partial network instance constructed from Device and Application in Fig. 7 (b) displays the required meta-paths more intuitively. As shown in Fig. 8, the meta-paths are Device → Application → Device and Device → Application → Applicationtype → Application → Device.

Partial meta-path instances.
For the convenience of subsequent description, we stipulate that U represents the node set Device, Ap the node set Application, Se the node set Gender, Ag the node set Age, Ty the node set Types of device, and Tp the node set Types of application; the constructed types of meta-paths are shown in Table 1. To explore the influence of the number of nodes in a node set on the final prediction results when building meta-path types, we divide Types of application into Application main classification (Tp) and Application detailed classification (Tp2), and Types of device into device brand (Ty) and device model (Ty2). The specific results are presented in Section 4.
The meaning represented by the meta-path
Because Deepwalk is more effective for the processing of sparse matrices and large networks, the specific method presented in this section adopts the improved Deepwalk method.
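To make the generated node sequence meaningful, we use the random walk method based on the meta-path to generate the node sequences so that they can capture the complex semantics contained in the heterogeneous information network. The specific definition is as follows: given a heterogeneous information network G = (V, E) and a meta-path $\rho: A_1 \rightarrow \cdots \rightarrow A_t \rightarrow A_{t+1} \rightarrow \cdots \rightarrow A_l$, the walk is generated according to the following distribution:
$$P(n_{t+1} = x \mid n_t = v, \rho) = \begin{cases} \dfrac{1}{|N^{A_{t+1}}(v)|}, & (v, x) \in E \text{ and } \theta(x) = A_{t+1} \\ 0, & \text{otherwise} \end{cases}$$
In the distribution, $n_t$ represents the t-th node that is walked to, $A_t$ is the type of v, and $N^{A_{t+1}}(v)$ is the set of first-order neighbors of v whose type is $A_{t+1}$. The walk repeatedly follows the pattern of the meta-path until it reaches the set length.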
For example, in the HIN instance of a device application defined in Fig. 7 (b), given a meta-path U - Ap - U, two walking sequences can be generated from the node Device1: (1) Device1 → Application1 → Device2 and (2) Device1 → Application2 → Device3. Similarly, for a given meta-path U - Ap - Tp - Ap - U, the following walking sequence can be obtained: (1) Device1 → Application1 → Types of application1 → Application2 → Device3. It can be observed that different types of meta-paths can generate node sequences with different semantic relationships.
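The walking procedure can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name `metapath_walk`, the adjacency-list format, and the toy graph are hypothetical, and the sketch assumes a symmetric meta-path (one that starts and ends with the same type, such as U - Ap - U) so that the pattern can be repeated.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Walk from `start`, stepping only to neighbors of the next type in the meta-path."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first type of the meta-path")
    cycle = metapath[:-1]  # drop the repeated end type so the pattern can loop
    walk = [start]
    while len(walk) < walk_length:
        next_type = cycle[len(walk) % len(cycle)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == next_type]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))  # uniform choice among typed neighbors
    return walk


# Hypothetical toy instance mirroring the Device/Application example
neighbors = {
    "Device1": ["Application1", "Application2"],
    "Device2": ["Application1"],
    "Device3": ["Application2"],
    "Application1": ["Device1", "Device2"],
    "Application2": ["Device1", "Device3"],
}
node_type = {n: ("U" if n.startswith("Device") else "Ap") for n in neighbors}
walk = metapath_walk(neighbors, node_type, "Device1", ["U", "Ap", "U"], 5, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the walks reproducible; the node types along the returned walk alternate between U and Ap as the meta-path requires.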
After transforming the network structure into a sequence of nodes, the vector representation of each node is obtained through skip-gram model training in Word2vec. Skip-gram is a neural network composed of an input layer, a hidden (mapping) layer, and an output layer. In the input layer, each node in a node sequence is represented by one-hot encoding, that is, as an N-dimensional vector, where N is the number of nodes. The hidden layer is equivalent to a weight matrix with N rows and M columns, where M is the dimension of the node vectors, and each row of the weight matrix is a node vector. Therefore, the ultimate aim of skip-gram is to learn the weight matrix of the hidden layer. In the output layer, each neuron applies a softmax classifier to the node vector output by the hidden layer to obtain a probability between 0 and 1.
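To make the input of skip-gram concrete, the sketch below turns one node sequence into (center, context) training pairs. The function name `skipgram_pairs` and the window size are illustrative assumptions; the study does not state the window it used.

```python
def skipgram_pairs(walk, window):
    """Return (center, context) pairs for every node and its neighbors within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# One walking sequence from the example above, with an assumed window of 1
pairs = skipgram_pairs(["Device1", "Application1", "Device2"], window=1)
```

Each pair asks the model to predict the context node from the center node; training on these pairs is what shapes the rows of the hidden-layer weight matrix into node vectors.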
After obtaining the embedding representations under different meta-paths, we fuse the representations of the same node and take the fused result as the output of the HIN embedding.
From the three fusion methods proposed by Shi et al. [13], we adopt personalized nonlinear fusion, which can better reflect the preference weight of the user’s device for different applications compared with simple linear fusion. It assigns different meta-path weights to different user devices, which are defined as follows:
Similarly,
The obtained embedding fusion result is combined by weighted summation with the traditional probabilistic matrix factorization (PMF) [29] algorithm, and the result is used as the prediction. The final prediction result is defined as follows:
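A minimal sketch of the fusion and prediction steps is given here, assuming the form used by Shi et al. [13]: the fused vector is σ(Σ_l w_l · σ(M_l e_l + b_l)) over the meta-paths l, and the prediction adds the device-side and application-side embedding terms, weighted by α and β, to the matrix factorization term. All function and parameter names (`fuse`, `predict`, `x_u`, `y_i`, `gamma_u`, `gamma_i`) are hypothetical, and the exact equations of this study may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(embeddings, weights, transforms, biases):
    """Assumed personalized nonlinear fusion: sigma(sum_l w_l * sigma(M_l e_l + b_l))."""
    dim = len(biases[0])
    fused = [0.0] * dim
    for e, w, M, b in zip(embeddings, weights, transforms, biases):
        for d in range(dim):
            z = sum(M[d][k] * e[k] for k in range(len(e))) + b[d]
            fused[d] += w * sigmoid(z)
    return [sigmoid(v) for v in fused]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def predict(x_u, y_i, e_u, gamma_i, gamma_u, e_i, alpha, beta):
    """Assumed prediction: matrix factorization term plus two weighted embedding terms."""
    return dot(x_u, y_i) + alpha * dot(e_u, gamma_i) + beta * dot(gamma_u, e_i)


# Toy values chosen only to make the arithmetic easy to follow
zeros = [[0.0, 0.0], [0.0, 0.0]]
fused = fuse([[0.1, 0.2], [0.3, 0.4]], [0.5, 0.5], [zeros, zeros], [[0.0, 0.0], [0.0, 0.0]])
score = predict([1.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0], 0.5, 0.5)
```

In this sketch the per-device weights `weights` are what make the fusion personalized: each device can emphasize a different meta-path, while α and β control how much the device and application embeddings contribute relative to the factorization term.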
In this section, the data source and preprocessing used in the experiment are first introduced, and the evaluation indicators of the model are then determined. The following experiments are then carried out to determine the final model: the adjustment of hyperparameters and an investigation of the impact of meta-path types on the prediction results. Finally, we compare our model with other baseline methods and adjust the training set ratio (from 90% to 60%) to observe the trend of the results.
Data analysis and preprocessing
The data in this study are partially selected from proposition 2: “prediction of gender and age” in the second Analysis Algorithm Competition (http://ds.analysys.cn/xbnl.html). The initial data structure is shown in Table 2.
Initial data structure diagram
According to the initial data structure and the data preprocessing method in RHINE [16], we first analyze the node relationships in the data and obtain the statistics shown in Table 3.
Node degree and relationship type table (where IR: Interaction Relation and AR: Affiliation Relation)
It can be observed that each user is represented by a device on which many applications are installed. To obtain the click frequency of an application, we first find the earliest opening time and the latest closing time of all applications on the same device, and the time difference is regarded as the number of days for which the user utilizes the mobile phone. We then count the number of times that each application is opened and closed, and we obtain the user’s average daily clicks for each application.
The distribution of the average daily click volume of the applications is not smooth, as shown in Fig. 9.

Distribution of daily average clicks for an application.
Applications with an average daily click volume of more than 10 account for approximately 5% of the total; the highest number of clicks is more than 200. Therefore, we regard more than 10 clicks per day as a high-frequency click rate. To better observe the data and reduce the error of the loss function, click counts above 10 are capped at 10.
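The preprocessing described above can be sketched as follows. The record layout (application, opening time, closing time), the function name `average_daily_clicks`, the sample timestamps, and the choice to treat usage spans shorter than one day as one day are all assumptions made for illustration.

```python
from datetime import datetime


def average_daily_clicks(sessions, cap=10.0):
    """sessions: list of (app, open_time, close_time) records for one device."""
    first = min(opened for _, opened, _ in sessions)
    last = max(closed for _, _, closed in sessions)
    days = max((last - first).days, 1)  # assumption: spans under a day count as one day
    counts = {}
    for app, _, _ in sessions:
        counts[app] = counts.get(app, 0) + 1  # one open/close record = one click
    return {app: min(n / days, cap) for app, n in counts.items()}


# Hypothetical records spanning two days on a single device
t0 = datetime(2020, 1, 1, 8, 0)
t1 = datetime(2020, 1, 3, 8, 0)
sessions = [("AppA", t0, t0)] * 3 + [("AppA", t1, t1)] + [("AppB", t0, t0)] * 30
clicks = average_daily_clicks(sessions)
```

Here AppA averages 2 clicks per day, while AppB averages 15 and is therefore capped at 10, matching the treatment of high-frequency clicks.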
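In this study, we use the mean absolute error (MAE) and root mean square error (RMSE) as evaluation indicators for the test set, which are defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |r_i - \hat{r}_i|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_i - \hat{r}_i)^2}$$
where n represents the number of samples in the test set, $r_i$ is the actual click frequency of the i-th sample, and $\hat{r}_i$ is the corresponding predicted value.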
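Both indicators can be computed directly from paired actual and predicted values. The sketch below uses the standard definitions of the MAE and RMSE; the function names and sample numbers are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error over paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error over paired values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative values only
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

Because the RMSE squares each error before averaging, it penalizes a few large errors more heavily than the MAE does, which is why the two indicators can rank models differently.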
First, we adjusted the preference weights of the devices and applications. With the ratio of the training set to the test set fixed at 9:1, the preference weight α of the device is varied, and the result shown in Fig. 10 is obtained; it can be observed that as the weight of the device increases, the MAE and RMSE exhibit opposite trends.

Influence of the adjustment of the device preference weight α on the MAE and RMSE.
The results shown in Fig. 11 are obtained over a change in the application preference weight β.

Influence of the adjustment of the application preference weight β on the MAE and RMSE.
To facilitate the observation of the best weighting influence, we directly sum the MAE and RMSE, and we observe the lowest point, the sum of which is shown in Fig. 12.

Sum of the MAE and RMSE.
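Selecting a weight by the summed error amounts to a one-line minimization. In the sketch below, the function name `best_weight` is hypothetical and the numbers are made-up placeholders, not results from the experiments.

```python
def best_weight(results):
    """results: {weight: (mae, rmse)}; return the weight with the smallest MAE + RMSE."""
    return min(results, key=lambda w: sum(results[w]))


# Placeholder values for illustration only (not experimental results)
results = {0.1: (1.0, 2.0), 0.3: (0.8, 1.9), 0.5: (0.9, 2.1)}
chosen = best_weight(results)
```

Summing the two indicators weights them equally; since the RMSE is never smaller than the MAE, it tends to dominate the sum when a few predictions are far off.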
From the sum, we can observe that with an increase in α, the prediction error decreases significantly and then begins to increase slowly once α approaches 0.3. For the application weight, the prediction error starts to stabilize with an increase in β; however, when β increases to approximately 0.5, the error begins to increase significantly. After many tests, it is observed that when the preference weight is too small, increasing the types of meta-paths increases the prediction error. The reason for this is that there is too little effective content for the model to learn the node relationships. When the preference weight is too large, the prediction error in Fig. 12 also increases. This may be because the learned node relationships account for too much of the prediction, which introduces noisy information. Therefore, in the subsequent experiments, the case of α = β = 0.5 is adopted.
An experiment is conducted to quickly determine the relative importance of the node sets, and the results are shown in Fig. 13. First, the two basic meta-paths are tested; that is, only the node sets Device and Application are used for testing. Second, different node sets are merged in sequence, and finally, the influence of different types of meta-paths on the prediction results is compared and analyzed. For ease of representation, the abscissa is labeled with the node sets involved, which indicate the meta-paths that pass through those nodes. For example, the abscissa U/Ap/Tp2 represents the meta-paths U - Ap - U, Ap - U - Ap, U - Ap - Tp2 - Ap - U, and Ap - Tp2 - Ap.

(a) MAE and (b) RMSE of the prediction results of two and four meta-paths.
From Fig. 13, it is evident that, apart from the node sets Device and Application, the most noticeable improvement in the prediction results comes from the node sets Age and Gender, followed by the node set Types of application. These results are in good agreement with the actual situation: users of different age groups and genders exhibit significant differences in their preferences for the same application.
It can be observed that the MAE results for the main category Tp and the detailed category Tp2 of Types of application are almost equal, whereas the RMSE for the main category Tp is smaller than that for Tp2. This indicates that the detailed classification of applications is too fine-grained, which results in inferior model prediction results. The main reason may be that a large amount of noisy semantics is generated when the nodes are walked. In contrast, the main classification Tp increases the generalization ability of the model.
The node set that exhibits the least improvement is Types of device. Similarly, for the device brand classification Ty and the detailed device model classification Ty2, the MAE results are approximately the same, but the RMSE of the detailed classification Ty2 is better. This shows that the classification method of Ty causes an uneven distribution of nodes, and the semantic information in the detailed classification Ty2 is easier to mine.
In the subsequent experiments, we selected the following node sets according to the above analysis: U, Ap, Ag, Se, Tp, and Ty2.
Next, the types of meta-paths are gradually increased according to the degree of importance to analyze the loss function of the prediction result, which is shown in Fig. 14.

Loss function of prediction results according to added meta-paths.
It can be observed that as the number of meta-paths increases, the prediction effect gradually improves. When the last node set Ty2 is added, the MAE and RMSE of the model both increase, which may occur because the node set Ty2 itself does not provide significant effective information and contains many noisy semantics. Therefore, we remove its nodes in the follow-up test and select the meta-paths generated by node sets U, Ap, Ag, Se, and Tp for fusion, which is regarded as the optimal result.
We use the following methods for comparison, and we list the results of model HEPre2 using the initial meta-paths and model HEPre5 using the optimal meta-paths in this paper.
In addition to the comparison of different methods, we also adjusted the ratio of the training set to the test set, by gradually reducing the proportion of the training set. We analyzed the advantages and disadvantages of the method used in this study when faced with sparse data compared with other methods. The final results are shown in Table 4, in which the best results are marked in bold, the second-best results are underlined, and the percentage improvement of each method relative to the PMF method is marked below the results.
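The evaluation protocol can be sketched as follows. The seeded shuffle in `split_by_ratio` and the relative-improvement formula in `improvement_over_baseline` (the error reduction as a percentage of the baseline error) are assumptions; the study does not spell out either, and the sample numbers are placeholders.

```python
import random


def split_by_ratio(records, train_ratio, seed=0):
    """Shuffle the records reproducibly and split them into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def improvement_over_baseline(baseline_error, method_error):
    """Assumed definition: error reduction relative to the baseline, in percent."""
    return (baseline_error - method_error) / baseline_error * 100.0


# Training ratios from 90% down to 60%, applied to ten placeholder records
records = list(range(10))
train_sizes = [len(split_by_ratio(records, r)[0]) for r in (0.9, 0.8, 0.7, 0.6)]
gain = improvement_over_baseline(2.0, 1.5)  # placeholder errors, not reported results
```

Fixing the seed keeps the splits comparable across methods, so that differences in error reflect the models rather than the partition.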
Effect of training ratio and comparison results of other methods
It can be observed from the table that under the same training ratio, the prediction results based on the heterogeneous information network used in this study are significantly better than other prediction methods, and as the types of the meta-paths increase, the model learns more effective information.
As shown in Fig. 15, as the training ratio decreases, the data become sparser and the prediction error increases. The error curves of the BiasSVD and NMF methods rise more steeply, whereas that of the HEPre method remains stable; this demonstrates the effectiveness of the method on sparse data.

Comparison of methods under different training ratios.
In this study, we exploit the rich semantics of HINs, use their principles for the prediction of the click frequency of mobile applications, and compare methods to verify that the semantics obtained through the heterogeneous information network are more detailed. We also adjust the number of nodes under the same node set to explore the impact on network representation. The above method realizes the prediction of the click frequency of mobile phone applications. Through a statistical analysis of the results, the users’ preferences for different types of applications can be predicted, which enables us to proceed to the next step of application recommendation or friend recommendation.
The current limitation primarily lies in whether the rich semantics of heterogeneous information networks can be further mined when the dataset remains unchanged. As the next step, we can consider the following three points to improve our model. First, we can use deep learning algorithms to better fuse the vector representations of multiple meta-paths. Second, in the process of generating the node sequence, we did not consider the weight relationship between the different node sets. Finally, in the network representation, the effective adjustment of the number of nodes of the same type should also be considered.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant Nos. 62072288, 61303167, 61702306), the Shandong Provincial Natural Science Foundation, China (ZR2017BF015), the SDUST Research Fund (2015TDJH102), the SDUST Excellent Teaching Team Construction Plan (JXTD20180505), and the Taishan Scholar Program of Shandong Province (Grant No. ts20190936).
