Abstract
Because user-generated content suffers from severe data sparseness, many topic models designed for social networks have been proposed. Without a unified metric, however, the methods used to evaluate their topics are mixed. Since topic models such as Latent Dirichlet Allocation (LDA) can summarize the main information in news articles, the topics of short messages on social networking websites should reveal key features of the messages’ authors. Personality is a natural characteristic of humans. Past work on personality identification has shown that the words people use on social networks can reveal their personality, but none of it compares the effects of different topic modelling strategies on personality identification. We run LDA and one of its variants (Twitter-LDA) on real social network data (Facebook status messages) and then use the topic distributions as features to identify pre-labelled Big-Five personality traits. The results demonstrate that the likelihood of personality, used as a metric, can reveal more properties of topic models than the model designers reported. Furthermore, our research adds value to personality identification.
Introduction
Topic models are a family of algorithms for uncovering the salient information that lies behind document collections. Among these algorithms, the unsupervised algorithm Latent Dirichlet Allocation (LDA), proposed by David Blei in 2003 [2], made topic models even more widely known. As LDA is easy to modify and extend, many variants of LDA have been created for different purposes.
In recent years, social networks (such as Facebook and Twitter) have become a giant source of text. User profiles, tweets, replies and status updates: most texts on social networks are short messages. Topics derived from these messages can help us develop new ways to summarize large text archives and to predict specific features of users or of the messages themselves. Although LDA is known to obtain good topics from common text collections such as news articles [11], scientific papers [22] and blogs [8], the results of studies applying topic models to social network short messages (abbreviated TMSN in the rest of this article) are mixed.
Topics of short messages on social networking websites should reveal key features of social network users; we take this key feature to be personality. A joint study of TMSN and personality prediction therefore seems natural and promising. The idea of personality as a metric for topic models is also an extension of personality prediction based on social network digital records.
We choose two TMSNs (LDA and Twitter-LDA), apply them to real-world data (labelled with personality scores), and test their predictive power for personality. Through several sets of experiments, we make the following contributions:
Our work is the first to consider personality as a metric for topic models.
This novel attempt reveals some important properties of Twitter-LDA that its authors did not discuss.
Whereas past work focuses on proving the practicability of personality prediction, our work is also an important step towards generality in personality prediction.
For simplicity and efficiency, we did not carry out a fully comprehensive study, but with only two algorithms we show that a joint study of personality prediction and TMSN is meaningful.
Related work
Topic models for social network short texts
The fundamental reason we cannot directly apply the original LDA to short texts is the sparsity problem: word occurrences in a short document are less discriminative than those in a lengthy document. Phan [14] learned hidden topics from large external resources to enrich the representation of short texts. Yan et al., the authors of the biterm topic model [24], claimed to have proposed the first topic model that does not exploit external knowledge, which is not entirely accurate. The most straightforward way to alleviate the sparsity problem is to aggregate short texts into lengthy pseudo-documents before training a standard topic model. Hong [7] found that topics trained on aggregated tweets yield better results than topics trained on individual tweets. Zhao [27] compared Twitter and traditional media using topic models, and also proposed the Twitter-LDA model based on standard LDA. Xu [23] noticed that social network users often publish noisy posts about their lives or hold conversations with others, and that these posts do not relate to their topics of interest. A modified author–topic model named Twitter–user was designed to solve this problem. Zhang [25] proposed MB-LDA, which takes the contactor relevance relation in microblogs into account.
Personality studies of social networking
Personality summarizes the characteristics of an individual [5]. It describes how one person differs from another. Personality correlates strongly not only with people’s behavior in the real world, but also with their behavior in the virtual world of social networks. One of the most reliable mainstream personality models is the Five Factor Model (FFM). Its five dimensions (the Big-Five personality traits) are shown in Table 1.
Big-Five personality traits
Early studies found that people with different personality types use social networks differently [3,6,16], which means that it is possible to predict social network users’ personality from their digital records. Mitja et al. [1] then designed an experiment to compare three ways of obtaining personality scores: self-reported personality, the participant’s idealized personality, and an observer’s judgement. The results support the conclusion that social network profiles reflect actual personality, not self-idealization.
Building on this early work, many researchers have concentrated on predicting users’ personality from social network digital records. A common approach is to count words from a psychological dictionary such as Linguistic Inquiry and Word Count (LIWC) [13]. Representative works using LIWC are [4,21]. Moving away from closed-vocabulary techniques, Schwartz et al. [17,18] examined n-grams and LDA topics as a function of Big-Five personality traits. They used the default parameters of an LDA implementation, except that they adjusted one parameter (α) to favor fewer topics per document. To date, we are not aware of any other study that explores different topic modelling strategies and algorithms for the purpose of gaining personality insights.
Evaluation methods for topic models
The evaluation methods discussed here are those for topic models. Topic modelling can be seen as an unsupervised clustering procedure, and unsupervised clustering algorithms are inherently hard to evaluate. Many topic models have been proposed, but their evaluation methods are mixed; in other words, there is no unified mechanism for assessing topic quality. Interpretability and predictive power are the two goals that most topic models try to meet.
The most interpretable topics are those that resemble man-made categories. One can compare the results generated by topic models with hand-assigned categories, but even if the topics do map onto the man-made categories (which is unlikely), a metric based on subjective human choices is still unreliable. Many other evaluation methods based on human judgement have the same problem. A compromise is to use external or domain-specific knowledge such as hash-tags. The evaluation method used in [24] is based on the assumption that clusters of documents sharing the same hash-tag should have low intra-cluster distances and high inter-cluster distances. The obvious drawback is that the quality of the hash-tags, which cannot be controlled, plays the most important role.
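The hash-tag based check described above can be illustrated with a minimal sketch. It assumes that each document is represented by its topic distribution and that cosine distance is used; the distance measure, the function names and the toy data are our own assumptions, not details taken from [24].

```python
import math
from itertools import combinations


def cosine_distance(p, q):
    """Cosine distance between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def intra_inter_distance(clusters):
    """clusters maps a hash-tag to the topic distributions of its documents.

    Returns (average intra-cluster distance, average inter-cluster distance).
    Each hash-tag needs at least two documents, and at least two hash-tags
    are required; otherwise one of the averages is undefined.
    """
    intra, inter = [], []
    tags = list(clusters)
    for tag in tags:
        intra += [cosine_distance(p, q) for p, q in combinations(clusters[tag], 2)]
    for tag_a, tag_b in combinations(tags, 2):
        inter += [cosine_distance(p, q)
                  for p in clusters[tag_a] for q in clusters[tag_b]]
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two hash-tags, three topics.
clusters = {
    "#music": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "#sports": [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
}
avg_intra, avg_inter = intra_inter_distance(clusters)
print(avg_intra < avg_inter)
```

Under this reading, a topic model is judged better when the intra-cluster average is small relative to the inter-cluster average.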
Some other metrics do not rely on external help, but most of them measure only one aspect of topic models. The most popular of these is point-wise mutual information (PMI) [12], which is a good metric for coherence. Similarly, Yan et al. [24] used an automated metric, the coherence score, proposed by [10].
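To make the PMI idea concrete, the following sketch scores a topic by the average PMI over pairs of its top words, with probabilities estimated from document-level co-occurrence. The smoothing constant `eps`, the use of whole documents as the co-occurrence window and the toy corpus are assumptions for illustration; the exact setup in [12] may differ.

```python
import math
from itertools import combinations


def topic_pmi(top_words, documents, eps=1e-12):
    """Average PMI over all pairs of a topic's top words.

    documents is a list of token lists; probabilities are document
    frequencies. eps is an assumed smoothing constant that avoids log(0).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)


# Hypothetical toy corpus.
docs = [
    ["game", "team", "win"],
    ["team", "win", "score"],
    ["music", "song", "band"],
    ["song", "band", "game"],
]
coherent = topic_pmi(["team", "win"], docs)
incoherent = topic_pmi(["team", "song"], docs)
print(coherent > incoherent)
```

Words that tend to appear in the same documents receive a positive score, whereas words that never co-occur are strongly penalized.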
Another widely used metric is perplexity, or the likelihood of held-out data. Perplexity is based on the predictive power of topics, but without suitable held-out data such statistics cannot directly measure the quality of topics.
Drawing on insights from personality psychology, a good topic model in a social network environment should be tied to the user’s personality. The likelihood of personality is therefore a natural metric for TMSN.
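For reference, perplexity is commonly computed as the exponential of the negative average per-token log-likelihood of the held-out documents. The sketch below assumes that the per-document log-likelihoods have already been produced by some topic model; the function and variable names are hypothetical.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum of log-likelihoods / total number of held-out tokens)."""
    total_log_likelihood = sum(doc_log_likelihoods)
    total_tokens = sum(doc_lengths)
    return math.exp(-total_log_likelihood / total_tokens)


# Sanity check: a model that assigns probability 0.25 to every token
# (uniform over a four-word vocabulary) has perplexity 4.
lengths = [10, 6]
log_likelihoods = [n * math.log(0.25) for n in lengths]
uniform_perplexity = perplexity(log_likelihoods, lengths)
print(round(uniform_perplexity, 6))
```

Lower values indicate that the model predicts the held-out text better.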
Experiments
Experiment setup
The primary objective of this study is to explore the feasibility of evaluating topic models with personality. As described above, no ideal or generally accepted evaluation method for topic models exists so far, so proving the feasibility of personality as a metric is a challenging task.
Firstly, personality as a metric can be seen as a gold standard in itself. There is great demand for extracting different aspects of social network users’ features using TMSNs. If the topic distribution of a social network user fits his or her Big-Five personality, the topic model used can be judged to be of high quality.
Although the value of personality as a metric is self-evident, we still set up an experiment to demonstrate it. We applied two different topic model algorithms to the messages of 250 Facebook users who also have Big-Five personality scores. We then used their topic distributions as features to fit their personality scores. Finally, we analyzed the performance of the two topic model algorithms.
Data
We use a gold-standard labelled dataset shared at the Workshop on Computational Personality Recognition 2013 (WCPR13). The dataset was collected from Facebook and contains Facebook status messages as the corpus, basic information about the users, and gold-standard Big-Five personality scores. The data cover 250 different Facebook users, but some of them have very little text. We plot the distribution of the number of messages per user in Fig. 1 and give the distributions of the personality traits in Table 2.

Distributions of the message numbers per user.
Distributions of each personality trait
Because of this non-uniform distribution, we find that users with limited text have a great influence on prediction performance. After several attempts, we removed the users with less than 200 bytes of text or fewer than 5 messages. After this pre-processing, 198 users remain. We concatenate all the short messages posted by the same author into a “pseudo-document” and feed these pseudo-documents to the topic models. To be fair, the two topic models share the same stop words and other parameters. The resulting topic distributions are passed as features to a supervised learning algorithm to predict personality traits.
In this paper, we do not introduce new topic models to address the issues of short message modelling in social network environments. The two existing TMSNs we test are classic LDA and Twitter-LDA. For LDA, we use the implementation provided by the Mallet package [9]. As we aggregate all the messages of a user into a single document, it can be regarded as an application of the author–topic model [19]. Twitter-LDA is a variant of LDA which is optimized for short messages [27]. The graphical model of Twitter-LDA is shown in Fig. 2. The implementation of Twitter-LDA is provided by the Text Mining Group at Singapore Management University.
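The filtering and aggregation step can be sketched as follows. The thresholds are those stated above; measuring text size as UTF-8 bytes of the space-joined messages, and all names in the code, are assumptions made for illustration rather than details of our actual pipeline.

```python
from collections import defaultdict

MIN_BYTES = 200      # minimum amount of text per user, from the text above
MIN_MESSAGES = 5     # minimum number of messages per user, from the text above


def build_pseudo_documents(messages, min_bytes=MIN_BYTES, min_messages=MIN_MESSAGES):
    """Group (user_id, text) pairs by user and drop users with too little text.

    Returns a dict mapping each retained user to one pseudo-document.
    Size is measured in UTF-8 bytes of the joined text (an assumption).
    """
    by_user = defaultdict(list)
    for user_id, text in messages:
        by_user[user_id].append(text)

    pseudo_docs = {}
    for user_id, texts in by_user.items():
        document = " ".join(texts)
        too_few = len(texts) < min_messages
        too_short = len(document.encode("utf-8")) < min_bytes
        if too_few or too_short:
            continue
        pseudo_docs[user_id] = document
    return pseudo_docs


# Hypothetical toy input: only user "a" passes both thresholds.
messages = (
    [("a", "x" * 40)] * 6      # 6 messages, 245 bytes once joined
    + [("b", "y" * 150)] * 2   # enough bytes, too few messages
    + [("c", "hi")] * 5        # enough messages, too few bytes
)
pseudo_docs = build_pseudo_documents(messages)
print(sorted(pseudo_docs))
```

Each retained pseudo-document is then treated as a single document by both topic models.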

Plate notation of Twitter-LDA.
Our goal is to compare the relative predictive performance of different TMSNs, so we did not tune anything for higher accuracy. Since the FFM has five dimensions, for each dimension we train a binary classifier that separates the users displaying the trait from those who do not. The machine learning algorithm we choose is the Support Vector Machine (SVM). All results are computed using 10-fold cross-validation. We set the number of topics to 100 and run the two topic models. Table 3 shows the classification precision on the five dimensions.
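The evaluation protocol for a single trait can be sketched as below. To keep the sketch free of dependencies, a nearest-centroid classifier stands in for the SVM, and precision is pooled over all held-out predictions; both choices, as well as the names and the toy data, are assumptions and do not reproduce our actual setup.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def precision(y_true, y_pred):
    """Fraction of predicted positives (label 1) that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(hits) / len(hits) if hits else 0.0


def cross_validated_precision(features, labels, fit, predict, k=10, seed=0):
    """Pool held-out predictions over k folds and return their precision."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(features), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(features)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        y_true += [labels[i] for i in fold]
        y_pred += [predict(model, features[i]) for i in fold]
    return precision(y_true, y_pred)


# Stand-in classifier (NOT an SVM): nearest class centroid.
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=squared_distance)


# Hypothetical toy data: 20 users, 2 topic features, one binary trait.
features = ([[0.9 - 0.01 * i, 0.1 + 0.01 * i] for i in range(10)]
            + [[0.1 + 0.01 * i, 0.9 - 0.01 * i] for i in range(10)])
labels = [1] * 10 + [0] * 10
trait_precision = cross_validated_precision(
    features, labels, fit_centroids, predict_centroid, k=10)
print(trait_precision)
```

The same procedure would be repeated once per Big-Five dimension, with the per-user topic distributions as features.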
Precisions of predicting Big-Five personality traits using 100 topics. Best results for each trait are shown in bold
As shown in Table 3, the result is contrary to expectation. The optimized algorithm (Twitter-LDA) shows its superiority only on Agreeableness and fails to exceed LDA on the other traits. We suspect the reason is the non-uniform distribution of each user’s text length, as shown in Fig. 1. Unlike LDA, Twitter-LDA assigns a single topic to each message. With an uneven distribution of messages per user, Twitter-LDA is more likely to generate a sparser topic matrix.
We then train 50 topics with both algorithms. The results are shown in Table 4.
Precisions of predicting Big-Five personality traits using 50 topics. Best results for each trait are shown in bold
When the number of topics is set to 50, Twitter-LDA begins to perform well. On three personality traits, Twitter-LDA shows better or equivalent capability for personality prediction. An exception is Agreeableness: when the number of topics is reduced to 50, the performance of LDA rises and that of Twitter-LDA falls.
As can be deduced from Tang et al.’s work [20], the number of documents plays perhaps the most important role. Even with only 198 documents, we can still draw some meaningful conclusions from the results:
The parameters of topic models are vital to personality recognition. For instance, Twitter-LDA favors fewer topics than LDA.
The distribution of the number of messages per user also influences identification performance.
Different personality dimensions need different parameters (previous works always use the same parameters for all dimensions). In our study, Agreeableness prefers more topics than the other dimensions.
For our current results, we hypothesize that topic models perform better with 50 topics than with 100 topics, which is why Twitter-LDA showed better capability for personality prediction than LDA in the 50-topic setting. This hypothesis is partially supported by an earlier study on suicide ideation prediction [26], in which the author extracted topic features with LDA for use in a supervised learning model. Although that study is not about personality prediction, suicide ideation as a psychological index plays a role similar to that of personality in our study. The topic models we used in this study are all trained models rather than inferred models. As shown in Fig. 3, when the number of topics is below 80 the trained topics perform similarly on the task, whereas at 100 topics the dotted line reaches a local maximum of RMSE, that is, a peak of poor performance. This suggests that with 100 topics all topic models may perform at their worst, so that better topic models cannot show their advantages. Only when the number of topics is set below 80 can the better topic models outperform the ordinary ones. The study by Resnik et al. [15], which used 50 as the empirical number of topics for both neuroticism and depression prediction, also supports our hypothesis.

RMSE for suicide prediction across number of topics. Dotted line is for trained model.
In this paper, we demonstrated that personality as a metric can provide insights into social network modelling. Even though personality scores are hard to obtain, the idea can offer guidance to topic model researchers. From a practical point of view, pre-labelled datasets can be used to filter and examine new TMSNs.
We aim to find the topic models that best represent authors’ personality. Our attempt can also be seen as a bold step for personality recognition, while most existing works are still proving its feasibility. The main difficulty for future study is the implementation of topic model algorithms. Only a small fraction of authors provide the source code of their algorithms with the corresponding papers, so it is a great burden to pick suitable topic models and implement them for a specific dataset.
A major limitation of our study is that our experiments did not provide enough evidence of why personality is a valuable metric for TMSNs. The most straightforward value of a metric is its value for real applications, but it is hard to apply this metric in real industrial environments. We expect the value of our metric to be demonstrated in industrial environments in the future.
