Personality as a metric for topic models on social networks

Abstract

Because user-generated content suffers from severe data sparseness, many topic models designed for social networks have been proposed. Without a unified metric, however, the methods used to assess topic quality are mixed. Since topic models such as Latent Dirichlet Allocation (LDA) can summarize the main information of news articles, topics of short messages on social networking websites should reveal key features of the authors of those messages. Personality is a natural characteristic of human beings. Past work on personality identification has shown that the words people use on social networks can reveal their personality, but none of it compares the effects of different topic modelling strategies on personality identification. We run LDA and one of its variants (Twitter-LDA) on real social network data (Facebook status messages) and then use the topic distributions as features to identify pre-labelled Big-Five personality traits. The results demonstrate that the likelihood of personality, used as a metric, can reveal more properties of topic models than their designers reported. Furthermore, our research adds value to personality identification.

Keywords

Social networks, topic models, personality

1. Introduction

Topic models are a family of algorithms for uncovering the salient information that lies behind document collections. Among these algorithms, the unsupervised Latent Dirichlet Allocation (LDA), proposed by Blei et al. in 2003 [2], made topic models even more widely known. As LDA is easy to modify and extend, many variants of LDA have been created for different purposes.

In recent years, social networks (such as Facebook and Twitter) have become a giant source of text. Most of the text on social networks, including user profiles, tweets, replies and status updates, consists of short messages. Topics derived from these messages can help us develop new ways to summarize large text archives and to predict specific features of users or of the messages themselves. Although LDA is known to obtain good topics from common text collections such as news articles [11], scientific papers [22] and blogs [8], the results of studies applying Topic Models to Social Network short messages (TMSN for short in the rest of this article) are mixed.

Topics of short messages on social networking websites should reveal key features of social network users; we argue that one such feature is personality. The joint study of TMSN and personality prediction therefore seems natural and promising. The idea of personality as a metric for topic models is also an extension of personality prediction based on social network digital records.

We choose two TMSNs (LDA and Twitter-LDA), apply them to real-world data labelled with personality scores, and then test their predictive power for personality. Through several sets of experiments, we make the following contributions:

Our work is the first to consider personality as a metric for topic models.

This novel attempt reveals some important properties of Twitter-LDA that its authors did not report.

Whereas past work focuses on proving the feasibility of personality prediction, our work is also an important step towards personality prediction that seeks generality.

For reasons of simplicity and efficiency, we did not conduct a comprehensive study, but even with only two algorithms we show that the joint study of personality prediction and TMSN is meaningful.

2. Related work

2.1. Topic models for social network short texts

The fundamental reason why the original LDA cannot be applied directly to short texts is the sparsity problem: word occurrences in a short document are less discriminative than in a lengthy document. Phan et al. [14] learned hidden topics from large external resources to enrich the representation of short texts. Yan et al., the authors of the biterm topic model [24], believed that they had proposed the first topic model that does not exploit external knowledge, which is not exact. The most straightforward way to alleviate the sparsity problem is to aggregate short texts into lengthy pseudo-documents before training a standard topic model. Hong and Davison [7] found that topics trained on aggregated tweets are of better quality than those trained on individual tweets. Zhao et al. [27] compared Twitter and traditional media using topic models, and also proposed the Twitter-LDA model based on standard LDA. Xu et al. noticed that social network users often publish noisy posts about their lives or hold conversations with others, which are unrelated to their topics of interest [23]; they designed a modified author–topic model named Twitter–user to solve this problem. Zhang and Sun [25] proposed MB-LDA, which takes the contactor relevance relation in microblogs into account.

2.2. Personality studies of social networking

Personality is an overview of an individual’s characteristics [5]. It describes how one person differs from another. Personality correlates strongly not only with people’s real-world behavior but also with their virtual-world behavior on social networks. One of the most reliable mainstream personality models is the Five Factor Model (FFM). Its five dimensions (the Big-Five personality traits) are shown in Table 1.

Table 1
Big-Five personality traits

Personality trait	High scores	Low scores
Openness	Inventive/curious	Consistent/cautious
Conscientiousness	Efficient/organized	Easy-going/careless
Extraversion	Outgoing/energetic	Solitary/reserved
Agreeableness	Friendly/compassionate	Analytical/detached
Neuroticism	Sensitive/nervous	Secure/confident

Early studies found that people with different personality types use social networks differently [3,6,16], which suggests that it is possible to predict social network users’ personality from their digital records. Back et al. [1] then designed an experiment to compare and contrast three ways of obtaining personality scores: self-reported personality, the participant’s idealized personality and a bystander’s judgement. The result supports the conclusion that social network profiles reflect actual personality, not self-idealization.

2.3. Personality computing

Building on this early work, many researchers have concentrated on predicting users’ personality from social network digital records. A common approach is to count words from a psychological dictionary such as Linguistic Inquiry and Word Count (LIWC) [13]; representative works using LIWC are [4,21]. Moving beyond closed-vocabulary techniques, Schwartz et al. [17,18] examined n-grams and LDA topics as a function of Big-Five personality traits. They used the default parameters of an LDA implementation, except that they adjusted one parameter (α) to favor fewer topics per document. To date, we are not aware of any other study that explores different topic model strategies and algorithms for the purpose of gaining personality insights.

2.4. Evaluation methods for topic models

The evaluation methods discussed here are those for topic models. Topic modelling can be seen as an unsupervised clustering procedure, and unsupervised clustering algorithms are inherently hard to evaluate. Many versions of topic models have been proposed, but their evaluation methods are mixed. In other words, there is no unified mechanism for assessing topic quality. Interpretability and predictive power are the two goals that most topic models try to meet.

The most interpretable topics are those that look man-made. One can compare the results generated by topic models with hand-assigned categories, but even if they happen to map onto the man-made categories (which is unlikely), a metric based on subjective human choices is still unreliable. Many other evaluation methods based on human judgement have the same problem. A compromise is to use external or domain-specific knowledge such as hash-tags. The evaluation method used in [24] is based on the idea that clusters of documents sharing the same hash-tag should have low intra-cluster distances and high inter-cluster distances. The drawback is equally obvious: the quality of the hash-tags, which cannot be controlled, plays the most important role.
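The hash-tag criterion can be illustrated with a small sketch. It assumes that every document is represented by its topic-proportion vector and that documents are grouped by hash-tag; the Euclidean distance, the function names and the toy vectors are illustrative choices, and the exact score used in [24] may be defined differently.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def intra_inter_distances(clusters):
    """clusters: dict mapping hash-tag -> list of topic-proportion vectors.

    Returns (average intra-cluster distance, average inter-cluster distance).
    """
    # Distances between documents that share a hash-tag.
    intra = [dist(a, b)
             for vectors in clusters.values()
             for a, b in combinations(vectors, 2)]
    # Distances between documents that carry different hash-tags.
    inter = [dist(a, b)
             for (_, va), (_, vb) in combinations(clusters.items(), 2)
             for a in va for b in vb]
    return mean(intra), mean(inter)


# Toy data (hypothetical): two hash-tags, two documents each, two topics.
toy = {
    "#football": [[0.9, 0.1], [0.8, 0.2]],
    "#election": [[0.1, 0.9], [0.2, 0.8]],
}
intra, inter = intra_inter_distances(toy)
```

Under this criterion a topic model is judged better when the first value is small relative to the second.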

Some other metrics do not rely on external help, but most of them can measure only one aspect of a topic model. The most popular of these is point-wise mutual information (PMI) [12], which is an excellent metric for coherence. Similarly, Yan et al. [24] used an automated metric, namely the coherence score proposed in [10].
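As an illustration of a PMI-style coherence score, the following sketch averages the pairwise PMI of a topic's top words. It is only a sketch under stated assumptions: probabilities are estimated from document-level co-occurrence in a reference corpus, and the smoothing constant `eps` and the function name are hypothetical choices rather than the exact procedure of [12].

```python
from itertools import combinations
from math import log


def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    documents: iterable of token lists used as the reference corpus.
    Probabilities are document-level (co-)occurrence frequencies.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def p(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = [log((p(a, b) + eps) / (p(a) * p(b) + eps))
              for a, b in combinations(top_words, 2)]
    return sum(scores) / len(scores)


# Toy reference corpus (hypothetical).
docs = [
    ["game", "team", "win"],
    ["team", "win", "score"],
    ["vote", "party", "election"],
    ["party", "vote", "game"],
]
coherent = pmi_coherence(["team", "win"], docs)   # words that co-occur
mixed = pmi_coherence(["team", "vote"], docs)     # words that never co-occur
```

Words that tend to appear in the same documents receive a higher score than words that never co-occur.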

Another widely used metric is perplexity, or the likelihood of held-out data. Perplexity is based on the predictive power of topics, but without suitable held-out data such statistics cannot directly measure the quality of topics.
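For reference, perplexity on a held-out set is commonly written as below, following the formulation used with LDA in [2]. The symbols are notation introduced here for illustration: M is the number of held-out documents, w_d the words of document d, and N_d its length. Lower values indicate better predictive power.

```latex
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```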

Inspired by insights from personality psychology, we argue that a good topic model in a social network environment should be tied to the user’s personality. The likelihood of personality is therefore a natural metric for TMSN.

3. Experiments

3.1. Experiment setup

The primary objective of this study is to explore the feasibility of evaluating topic models with personality. As described above, there is still no ideal or generally accepted evaluation method for topic models, so proving the feasibility of personality as a metric is a challenging task.

Firstly, personality as a metric can be seen as a gold standard in itself. There is great demand for extracting different aspects of social network users’ features using TMSNs. If the topic distribution of a social network user fits his or her Big-Five personality, the topic model used can be considered high quality.

Although the value of personality as a metric is self-evident, we still set up an experiment to demonstrate its explanatory power. We applied two different topic model algorithms to the messages of 250 Facebook users who also have Big-Five personality scores. We then used their topic distributions as features to fit their personality scores. Finally, we analyzed the performance of the two topic model algorithms.

3.2. Data

We use a gold-standard labelled dataset shared at the Workshop on Computational Personality Recognition 2013 (WCPR13). The dataset was collected from Facebook and contains Facebook status messages as the corpus, basic information about the users and gold-standard Big-Five personality scores. The data cover 250 different Facebook users, but some of them have very little text. We plot the distribution of the number of messages per user in Fig. 1 and list the distribution of the personality traits in Table 2.

Fig. 1.

Distribution of the number of messages per user.

Table 2

Distribution of each personality trait

Personality trait	y	n
Extraversion	78	120
Neuroticism	75	123
Agreeableness	109	89
Conscientiousness	103	95
Openness	151	57

Because of this non-uniform distribution, we find that users with very little text have a great influence on prediction performance. After several attempts, we remove the users with less than 200 bytes of text or fewer than 5 messages. After this pre-processing, 198 users are left. We concatenate all the short messages posted by the same author into a “pseudo-document” and feed these documents to the topic models. To be fair, the two topic models share the same stop words and other parameters. The resulting topic distributions are then passed to a supervised learning algorithm as features to predict personality traits.
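The filtering and aggregation step can be sketched as follows. This is a minimal illustration and not the code used in the experiments: the function name, the input format of (user, text) pairs, the use of UTF-8 byte length measured on the concatenated text, and the reading of the rule as "drop a user if either threshold is not met" are all assumptions.

```python
def build_pseudo_documents(messages, min_bytes=200, min_messages=5):
    """messages: iterable of (user_id, text) pairs.

    Returns {user_id: pseudo_document} for users who pass both thresholds.
    """
    # Group the short messages by author.
    by_user = {}
    for user_id, text in messages:
        by_user.setdefault(user_id, []).append(text)

    pseudo_docs = {}
    for user_id, texts in by_user.items():
        document = " ".join(texts)
        too_short = len(document.encode("utf-8")) < min_bytes
        too_few = len(texts) < min_messages
        if not (too_short or too_few):
            pseudo_docs[user_id] = document
    return pseudo_docs


# Toy input (hypothetical users): only "u1" has enough text and messages.
toy_messages = (
    [("u1", "x" * 50)] * 5      # 5 messages, more than 200 bytes in total
    + [("u2", "short")] * 5     # 5 messages, but far fewer than 200 bytes
    + [("u3", "y" * 300)]       # enough bytes, but only 1 message
)
pseudo_docs = build_pseudo_documents(toy_messages)
```

Each surviving pseudo-document is then treated as a single document by the topic model.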

3.3. Topic models

In this paper, we do not introduce new topic models to address the issues of short message modelling in social network environments. The two existing TMSNs we test are classic LDA and Twitter-LDA. For LDA, we use the implementation provided by the Mallet package [9]. As we aggregate all the messages of a user into a single document, it can be regarded as an application of the author–topic model [19]. Twitter-LDA is a variant of LDA that is optimized for short messages [27]. The graphical model of Twitter-LDA is shown in Fig. 2. The implementation of Twitter-LDA is provided by the Text Mining Group at Singapore Management University.

Fig. 2.

Plate notation of Twitter-LDA.
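The generative story behind Twitter-LDA, as we understand it from [27], assigns one topic to each message and lets every word be either a topic word or a topic-independent background word. The sketch below samples one message under that story. It assumes that the distributions are already known and omits the Dirichlet priors; the function and parameter names are illustrative and are not taken from the Singapore Management University implementation.

```python
import random


def generate_tweet(theta_u, topic_word, background_word, pi, length, rng):
    """Sample one message under a Twitter-LDA-style generative story.

    theta_u:         the user's topic proportions (list of floats)
    topic_word:      one {word: probability} dict per topic
    background_word: {word: probability} dict for topic-independent words
    pi:              probability that a word is a topic word
    """
    # One topic for the whole message (standard LDA draws one per word).
    z = rng.choices(range(len(theta_u)), weights=theta_u)[0]
    words = []
    for _ in range(length):
        # Each word comes from the message topic or from the background.
        source = topic_word[z] if rng.random() < pi else background_word
        vocab, probs = zip(*source.items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return z, words


# Toy parameters (hypothetical): two topics and one background word.
toy_topics = [{"goal": 0.5, "match": 0.5}, {"vote": 0.5, "poll": 0.5}]
toy_background = {"the": 1.0}
z, words = generate_tweet([0.7, 0.3], toy_topics, toy_background,
                          pi=0.8, length=8, rng=random.Random(0))
```

Because the topic is drawn once per message, a user with few messages contributes only a few topic draws.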

4. Results

Our goal is to compare the relative predictive performance of different TMSNs, so we did not tune the models for higher accuracy. Since the FFM has five dimensions, for each dimension we train a binary classifier that separates the users displaying the trait from those who do not. The machine learning algorithm we choose is the Support Vector Machine (SVM). All results are computed using 10-fold cross-validation. We set the number of topics to 100 and run the two topic models. Table 3 shows the precision of classification on the five dimensions.

Table 3
Precisions of predicting Big-Five personality traits using 100 topics. Best results for each trait are shown in bold

Personality trait	LDA	Twitter-LDA
Extraversion	60.61%	58.59%
Neuroticism	57.58%	56.57%
Agreeableness	50.51%	61.62%
Conscientiousness	55.56%	44.44%
Openness	70.20%	64.14%
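The evaluation protocol described at the beginning of this section (one binary classifier per trait, scored with 10-fold cross-validation) can be sketched as follows. This is a dependency-free illustration and not the experimental code: a nearest-centroid classifier stands in for the SVM, the folds are a plain shuffled split, and the score is the fraction of users whose held-out prediction matches the label, which is an assumption about how the percentages in the tables are obtained. All function names and the toy data are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT the SVM used in the paper): class means."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_accuracy(X, y, k=10, seed=0):
    """Fraction of users whose held-out prediction matches the label."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in fold)
    return correct / len(X)


# Toy data (hypothetical): 20 users, 2 topics, one trait labelled y/n.
toy_X = ([[0.9 - 0.01 * i, 0.1 + 0.01 * i] for i in range(10)]
         + [[0.1 + 0.01 * i, 0.9 - 0.01 * i] for i in range(10)])
toy_y = ["y"] * 10 + ["n"] * 10
toy_accuracy = cross_validated_accuracy(toy_X, toy_y, k=10)
```

In the actual pipeline, each row would be one user's topic distribution, and the procedure would be repeated once per personality trait and per topic model.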

As shown in Table 3, the result is contrary to expectation. The optimized algorithm (Twitter-LDA) shows its superiority only on Agreeableness and fails to exceed LDA on the other four traits. We conjecture that the reason is the non-uniform distribution of each user’s text length, as shown in Fig. 1. Unlike LDA, Twitter-LDA assigns a single topic to each message. With an uneven distribution of messages per user, Twitter-LDA is therefore more likely to generate a sparse topic matrix.

We then train 50 topics for both algorithms. The results are shown in Table 4.

Table 4

Precisions of predicting Big-Five personality traits using 50 topics. Best results for each trait are shown in bold

Personality trait	LDA	Twitter-LDA
Extraversion	58.08%	56.57%
Neuroticism	64.14%	64.14%
Agreeableness	59.09%	57.58%
Conscientiousness	55.56%	59.09%
Openness	70.20%	71.72%

When the number of topics is set to 50, Twitter-LDA becomes competitive. On three personality traits (Neuroticism, Conscientiousness and Openness), Twitter-LDA shows better or equivalent capability for personality prediction. An exception is Agreeableness: when the number of topics is reduced to 50, the performance of LDA rises and that of Twitter-LDA falls.

5. Discussion

As can be inferred from the work of Tang et al. [20], the number of documents plays perhaps the most important role in topic modelling. Even with only 198 documents, we can still draw some meaningful conclusions from the results:

The parameters of topic models are vital to personality recognition. For instance, Twitter-LDA favors fewer topics than LDA.

The distribution of the number of messages per user also influences identification performance.

Different personality dimensions need different parameters (previous works always use the same parameters for all dimensions). In our study, Agreeableness prefers more topics than the other dimensions.

For our current results, we hypothesize that topic models perform better with 50 topics than with 100 topics, which is why Twitter-LDA showed better capability for personality prediction than LDA in the 50-topic setting. This hypothesis is partially supported by a previous study on suicide ideation prediction [26], in which the authors extracted topic features using LDA for a supervised suicide ideation prediction model. Although that study is not about personality prediction, suicide ideation is a psychological index that plays a role similar to that of personality in our study. The topic models we used in this study are all trained models rather than inferred models. As shown in Fig. 3, when the number of topics is below 80, the trained topics perform similarly on the task, whereas at 100 topics the error peaks: the dotted line reaches a local maximum. This suggests that with 100 topics all topic models may perform at their poorest, so that better topic models cannot show their advantages. Only when the number of topics is set below 80 can the better topic models outperform the ordinary ones. The study of Resnik et al. [15], who used 50 as an empirical number of topics for both neuroticism and depression prediction, is also consistent with our hypothesis.

Fig. 3.

RMSE for suicide prediction across number of topics. Dotted line is for trained model.

6. Conclusions

In this paper, we demonstrated that personality as a metric can provide insights into social network modelling. Even though personality scores are hard to obtain, the idea can offer guidance to topic model researchers. From a practical point of view, pre-labelled datasets can be used to filter and examine new TMSNs.

We are committed to finding the topic models that best represent authors’ personality. Our attempt can also be seen as a bold step for personality recognition, while most existing works are still proving its feasibility. The greatest difficulty for future studies is the implementation of topic model algorithms. Only a small proportion of authors provide the source code of their algorithms with the corresponding papers, so it is a great burden to pick suitable topic models and implement them for a specific dataset.

A major limitation of our study is that our experiments did not provide enough evidence of why personality is a valuable metric for TMSNs. The most straightforward value of a metric is its value for real applications, but it is hard to apply this metric in industrial environments at present. We expect that the value of our metric can be proved in industrial environments in the future.

References

[1] M.D. Back, J.M. Stopfer, Vazire, Gaddis, S.C. Schmukle, Egloff and S.D. Gosling, Facebook profiles reflect actual personality, not self-idealization, Psychological Science 21(3) (2010), 372–374.

[2] D.M. Blei, A.Y. Ng and M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003), 993–1022.

[3] Correa, A.W. Hinsley and H.G. De Zuniga, Who interacts on the web: The intersection of users personality and social media use, Computers in Human Behavior 26(2) (2010), 247–253.

[4] Golbeck, Robles and Turner, Predicting personality with social media, in: CHI’11 Extended Abstracts on Human Factors in Computing Systems, ACM, 2011, pp. 253–262.

[5] L.R. Goldberg, An alternative “description of personality”: The Big-Five factor structure, Journal of Personality and Social Psychology 59(6) (1990), 1216–1229.

[6] Y.A. Hamburger and Ben-Artzi, The relationship between extraversion and neuroticism and the different uses of the Internet, Computers in Human Behavior 16(4) (2000), 441–449.

[7] Hong and B.D. Davison, Empirical study of topic modeling in Twitter, in: Proceedings of the First Workshop on Social Media Analytics, SOMA’10, ACM, New York, NY, USA, 2010, pp. 80–88.

[8] Liu, Niculescu-Mizil and Gryc, Topic-link LDA: Joint models of topic and author community, in: Proceedings of the 26th Annual International Conference on Machine Learning, ICML’09, ACM, New York, NY, USA, 2009, pp. 665–672.

[9] A.K. McCallum, Mallet: A machine learning for language toolkit, 2002, available at: http://mallet.cs.umass.edu.

[10] Mimno, H.M. Wallach, Talley, Leenders and McCallum, Optimizing semantic coherence in topic models, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 262–272.

[11] Newman, Chemudugunta, Smyth and Steyvers, Analyzing entities and topics in news articles using statistical topic models, in: Intelligence and Security Informatics, Mehrotra, Zeng, Chen, Thuraisingham and F.-Y. Wang, eds, Lecture Notes in Computer Science, Vol. 3975, Springer, Berlin, Heidelberg, 2006, pp. 93–104.

[12] Newman, Han Lau, Grieser and Baldwin, Automatic evaluation of topic coherence, in: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2010, pp. 100–108.

[13] J.W. Pennebaker, C.K. Chung, Ireland, Gonzales and R.J. Booth, The development and psychometric properties of LIWC2007, LIWC.net, Austin, TX, 2007.

[14] X.-H. Phan, L.-M. Nguyen and Horiguchi, Learning to classify short and sparse text & web with hidden topics from large-scale data collections, in: Proceedings of the 17th International Conference on World Wide Web, WWW’08, ACM, New York, NY, USA, 2008, pp. 91–100.

[15] Resnik, Garron and Resnik, Using topic modeling to improve prediction of neuroticism and depression, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2013, pp. 1348–1353.

[16] Ryan and Xenos, Who uses Facebook? An investigation into the relationship between the Big Five, shyness, narcissism, loneliness, and Facebook usage, Computers in Human Behavior 27(5) (2011), 1658–1664.

[17] H.A. Schwartz, J.C. Eichstaedt, Dziurzynski, M.L. Kern, M.E.P. Seligman, L.H. Ungar, Blanco, Kosinski and Stillwell, Toward personality insights from language exploration in social media, in: 2013 AAAI Spring Symposium Series, 2013.

[18] H.A. Schwartz, J.C. Eichstaedt, M.L. Kern, Dziurzynski, S.M. Ramones, Agrawal, Shah, Kosinski, Stillwell, M.E.P. Seligman et al., Personality, gender, and age in the language of social media: The open-vocabulary approach, PLoS ONE 8(9) (2013), e73791.

[19] Steyvers, Smyth, Rosen-Zvi and Griffiths, Probabilistic author–topic models for information discovery, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 306–315.

[20] Tang, Meng, Nguyen, Mei and Zhang, Understanding the limiting factors of topic modeling via posterior contraction analysis, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 190–198.

[21] Wald, Khoshgoftaar and Sumner, Machine prediction of personality from Facebook profiles, in: 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), IEEE, 2012, pp. 109–115.

[22] Wang and D.M. Blei, Collaborative topic modeling for recommending scientific articles, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 448–456.

[23] Xu, Lu, Xiang and Yang, Discovering user interest on Twitter with a modified author–topic model, in: 2011 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Vol. 1, IEEE, 2011, pp. 422–429.

[24] Yan, Guo, Lan and Cheng, A biterm topic model for short texts, in: Proceedings of the 22nd International Conference on World Wide Web, WWW’13, Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, 2013, pp. 1445–1456.

[25] Zhang and Sun, Large scale microblog mining using distributed MB-LDA, in: Proceedings of the 21st International Conference Companion on World Wide Web, WWW’12 Companion, ACM, New York, NY, USA, 2012, pp. 1035–1042.

[26] Zhang, Huang, Liu, Li, Chen and Zhu, Using linguistic features to estimate suicide probability of Chinese microblog users, in: Human Centered Computing, Springer, 2015, pp. 549–559.

[27] Zhao, Jiang, Weng, He, E.-P. Lim, Yan and Li, Comparing Twitter and traditional media using topic models, in: Advances in Information Retrieval, Clough, Foley, Gurrin, G.J.F. Jones, Kraaij, Lee and Murdock, eds, Lecture Notes in Computer Science, Vol. 6611, Springer, Berlin, Heidelberg, 2011, pp. 338–349.