Abstract
Social networking sites (SNS) are a rich source of latent information about individual characteristics. Crawling and analyzing this content provides a new approach for enterprises to personalize services and put forward product recommendations. In the past few years, commercial brands have gradually appeared on social media platforms for advertising, customer support and public relations purposes, and by now this presence has become a necessity across all sectors. This online identity can be represented as a brand personality that reflects how a brand is perceived by its customers. We exploited recent research in text analysis and personality detection to build an automatic brand personality prediction model on top of the Five-Factor Model (FFM) and Linguistic Inquiry and Word Count (LIWC) features extracted from publicly available benchmarks. Predictive evaluation on brands’ accounts reveals that Facebook provides a slight advantage over Twitter in offering users more self-disclosure to express their emotions, especially their demographic and psychological traits. Results also confirm the wider perspective that the same social media account carries quite similar and comparable personality scores across different social media platforms. To evaluate our prediction results on actual brands’ accounts, we crawled the Facebook API and Twitter API, respectively, for 100k posts from the pages of the most valuable brands in the USA; we visualize exemplars of the comparison results and present suggestions for future directions.
Introduction
Social networking has become a big part of our everyday life, and users are increasingly free to choose where they interact. In 2017, more than half of the global population used the internet and there were more than 2.7 billion active social media accounts worldwide [27]. Therefore, it is no surprise that social media plays a big role in individuals’ social interaction. Every social media user leaves a digital footprint by writing posts, liking pages, providing content or just browsing social media sites.
Previous research in psychology has suggested that an individual’s behaviour can be explained by psychological constructs called personality traits [21]. Several personality models build on this concept. The best-known personality model is the FFM (Five-Factor Model) introduced by [8], also referred to as the Big Five personality traits. This model is based on the association between words and human personality and defines five global factors: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism.
Knowledge of an individual’s emotions and personality allows predictions of users’ interests and preferences across different contexts and environments [17,23]. This can be used to align advertisements [4], distinguish sales managers’ skills [18], identify malicious behaviours [5], optimize product and page recommendations [19], and study human diseases such as Parkinson’s and Alzheimer’s disease [3,9].
The traditional approach to measuring individual personality traits requires users to fill out long questionnaires. An example is the revised NEO Personality Inventory, consisting of 20 to 360 personality-related questions [7]. It is very unlikely that web users will fill out such time-consuming and impractical questionnaires to personalize services like search results [28] or product recommendations. However, a user’s personality can be inferred automatically from the individual’s digital footprint on various social media platforms. This can be even more accurate than assessments made by friends and family, as pointed out in [16]. Different features from social media can serve as the digital footprint of a user. Marketers can utilize a user’s liked pages, social network attributes, demographic information or media content that was liked or posted by the user to better target purchasing preferences without his/her consent. This study focuses on the textual language used in social media posts, because the way people use words is reliable over time and internally consistent with traditional measures of personality [6].
A huge part of the 2.7 billion social media accounts are used by separate individuals, brands, communities and public figures. Therefore, brands also leave online footprints through their activities, as normal users do. Each brand represents a digital personality with its footprint, referred to as brand personality [1]. Knowledge of its own personality could be used to improve a brand’s marketing and public relations in general. In our study, we focus on the hypothesis that brand personalities can be predicted from features engineered on user personalities, that is, by models trained on user personality data. We also test the hypothesis that users’ language on Facebook offers more self-disclosure than on Twitter, and visualize how this affects the final personality prediction score.
This article is organized as follows. Section 2 introduces the previous literature in the domain of individual personality and brand personality. Section 3 discusses issues concerning data acquisition, feature selection and implementation criteria. In Section 4, we present the results of different experiments with various machine learning models, while in Section 5 we visualize insights from the evaluation method. Finally, in Section 6 and Section 7 we summarize the final results and give directions and suggestions for future work on automating personality detection.
Related work
In contrast to traditional methods of determining users’ personality, leveraging social media footprints for predicting personality promises forthright and direct insights. Farnadi et al. [10] compared a variety of univariate and multivariate regression methods on datasets from Facebook, Twitter, and YouTube. The multivariate models often outperformed univariate ones, but the differences were not significant. They found that no common features could be identified that perform well on all social media datasets. Even expanding a model with training samples from another social network did not improve their regressors. Farnadi et al. concluded that the context of the data plays a major role in learning. Their dataset from YouTube was labeled by impressions, whereas their Facebook and Twitter labels were self-reported through psychometric questionnaires.
Hall et al. [12] examined the effects of Facebook users’ self-representation on studies of social phenomena within social media networks. Users of social media platforms consciously or subconsciously represent themselves in a way that is appropriate for their audience. The lack of appropriate methods to identify and control these effects restricts research findings. Thus, they conducted a case study involving 509 paid Amazon Mechanical Turk workers, who provided psychometric survey results and Facebook footprints to the researchers. This data was used to predict users’ personality according to the (FFM) using (LIWC)-only features. The study pointed out that self-representation is an existing phenomenon in social media and that personality is still detectable even when self-representation is present.
Both research efforts ([10] and [12]) used supervised machine learning approaches to predict users’ personality according to the (FFM). [20] proposed a new approach using linear semi-supervised regression to improve prediction results. Their study is based on data from 1792 users collected from Sina Microblog, the most popular social platform in mainland China. They stated that their experimental results support their thesis that unlabeled data can improve prediction results.
Also, [2] proposed an approach using multi-task regression and incremental regression to predict Big Five personality from online behaviors on the Sina microblogging platform. Their study is based on survey data of 444 users and indicates that the correlations between different personality dimensions are significant. They stated that their training dataset is reliable enough and that multi-task regression performs better than other modeling algorithms.
The research effort of [14] focused on comparing self-disclosure on Facebook versus Twitter. They collected and processed social media data for the same users on both platforms, which enabled them to perform a comparative analysis under a proper scientific setup. The results indicate that users prefer to self-disclose more on Facebook than on Twitter, as platform affordances play a big role in determining users’ self-disclosure behavior.
With regard to brand personality research, two papers from the business domain are relevant. [1] developed a theoretical framework of the brand personality construct by determining the dimensions of brand personality (Sincerity, Excitement, Competence, Sophistication, and Ruggedness). Geuens et al. [11] also developed a new brand personality measure whose dimensions map to (FFM) personality items (Responsibility = Conscientiousness, Activity = Extraversion, Aggressiveness = Agreeableness, Simplicity = Openness, Emotionality = Neuroticism), in contrast to other models [1].
Implementation
This section is divided into several sub-sections. First, we retrieve the training and evaluation datasets from the MyPersonality project and the Facebook API (Section 3.1). The next sub-sections cover feature creation with the (LIWC) tool and selection of the most significant features (Section 3.2), which are used to train different regression models (Section 3.3) on the (FFM) traits. Section 3.4 introduces our final evaluation metric.
Datasets
Because labeling data with valid personality scores is a very time-consuming and difficult task, only a few gold-standard datasets from social media platforms are available for personality prediction tasks. One of the best-known is MyPersonality’s Facebook dataset, which we used as the only source of labeled training data for Facebook users to train our prediction models.
Also, there is no ground-truth dataset from social media platforms for the task of brand personality prediction. Therefore, we crawled our own dataset of 100k brand posts from the Facebook API and Twitter API, respectively. This dataset is not labeled and therefore cannot be used to train supervised machine learning algorithms. The rest of this section describes the datasets we used in this research in more detail.
MyPersonality Facebook dataset
MyPersonality was a popular Facebook application introduced by [24] in 2007. It allowed Facebook users to participate in various psychometric tests, including a (FFM) questionnaire comparable to the revised NEO Personality Inventory from [7]. Roughly 30 percent of participating users decided to let the application collect data from their Facebook profile and donate it to research [15]. The MyPersonality database consists of more than six million psychometric test results and more than four million distinct Facebook profiles.
We used the following datasets from the MyPersonality project for our research:
Demographic details: contains demographic details for over 4 million Facebook users, consisting of a unique user identifier, gender, birthday, age, relationship status, Interested In information, language, number of friends and timezone of the user.
Personality scores: contains the Five-Factor Model personality scores for more than three million users. Scores are represented in the range of 1 to 5.
Facebook posts: 25 million status post texts from 22 million unique Facebook users.
LIWC annotations: contains (LIWC) annotations for 153617 Facebook users. They were calculated by running the Linguistic Inquiry and Word Count application [22] on each user’s aggregated status posts. The (LIWC) analysis reflects the different emotions, thinking styles, social concerns and parts of speech in free text. Each annotation, also called a word category, is represented as the percentage of words in all of a user’s status posts that fall into that category. However, the sum over all word categories can exceed 100%, because some words fall into multiple categories. As our goal is to infer the (FFM) personality scores for an English-speaking user, we joined both filtered tables, using the (FFM) scores as labels and the (LIWC) scores as status post features. Table 1 provides details about the final training dataset. A user is now represented by the extracted linguistic features of his or her status posts and annotated with the (FFM) score for each personality trait.
Characteristics of training dataset
Correlation of trait labels in the training dataset
In Table 2, we correlate the trait labels to check whether we can consider each trait individually. We use the Pearson product-moment correlation for this. According to the Pearson results, all correlations are significant (
All three datasets from MyPersonality contain an anonymized user ID that can only be used to match users between tables. We used the demographic details table to filter out users who had not set their locale or who used a language other than English. Figure 1 shows the distribution of Openness trait scores in the training dataset. Figure 18 in the Appendix shows the score distributions for the remaining personality labels.
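The join and language filter described above can be sketched as follows. This is a minimal illustration on toy data, not the authors’ actual pipeline: the column names (`userid`, `locale`, `ope`, `negemo`, `netspeak`) and the rule "locale starts with `en`" are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature versions of the MyPersonality tables.
demographics = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u4"],
    "locale": ["en_US", "de_DE", None, "en_GB"],
})
ffm_scores = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u4"],
    "ope": [4.2, 3.1, 3.8, 2.9],  # Openness score in the range 1 to 5
})
liwc = pd.DataFrame({
    "userid": ["u1", "u2", "u4"],
    "negemo": [1.2, 0.4, 2.0],    # LIWC word categories, in percent
    "netspeak": [0.0, 3.5, 1.1],
})


def build_training_table(demographics, ffm_scores, liwc):
    """Join LIWC features with FFM labels and keep English-locale users."""
    # Users without a locale (None) are treated as non-English.
    is_english = demographics["locale"].str.startswith("en", na=False)
    english_ids = demographics.loc[is_english, "userid"]
    # The inner join keeps only users present in both tables.
    merged = liwc.merge(ffm_scores, on="userid", how="inner")
    return merged[merged["userid"].isin(english_ids)].reset_index(drop=True)


training = build_training_table(demographics, ffm_scores, liwc)
print(training)
```

In this toy example only `u1` and `u4` remain: `u2` has a non-English locale and `u3` has no locale and no LIWC annotations.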

Distribution of openness personality scores in the training dataset.
As the MyPersonality dataset has no data for brands’ Facebook pages, we decided to crawl our own dataset. Using the Facebook Graph API, we crawled status updates from 46 popular brands appearing in the Top 50 of Forbes’ The World’s Most Valuable Brands list. Four brands are missing from our dataset because they had no representation on Facebook. Altogether we collected 85347 status updates covering the whole lifetime of the brands’ Facebook pages until January 2018. Some brands have very few status updates, so we decided to only consider brands with at least 1000 posts, which gives (LIWC) more text per brand for feature extraction. Table 3 gives a more detailed view of the crawled dataset.
Statistics for the crawled public posts for top brands from Facebook social platform
We did not look further into comments on brand pages’ posts. The data found in the comments was noisy, with many links to other Facebook profiles. This could be tied to the many posts made in the context of sweepstakes. Manual inspection of the remaining posts revealed a lot of spam.
Using the Twitter API, we crawled tweets for the same list of brands. Altogether we collected 103053 tweets. Table 4 provides statistics about the crawled Twitter pages.
Statistics for the crawled public tweets for top brands from Twitter micro blogging platform
As mentioned in Section 3.1, we use the combined datasets from MyPersonality [15] to train our models. (LIWC) features are word categories, and some categories are contained in other categories, which means they depend on each other. Appendix B contains statistics about the LIWC features in the training dataset.
Hence, it is reasonable to select a subset of relevant features for training. In general, we will build one predictor per personality trait (see Section 3.3). We will consider three different feature sets per model:
Own: features that are important to a specific personality trait. This means we will have five different feature sets, one for each trait.
Common: a feature set that contains features important to all personality traits. This set is the same for all five traits.
Union: the union of the Own and Common feature sets.
We investigated two different approaches to select features for the mentioned feature sets: An approach based on Pearson’s correlation coefficient and an approach based on boosted decision trees’ feature significance. They are described in the following two sections.
We refer to the feature sets as X-S-M. X is the affected trait (e.g. O for Openness and C for Conscientiousness); this field is omitted if the set is independent of the trait. S is one of the introduced feature sets. M is the selection method: P for Pearson and B for gradient boosting. For example, O-own-P denotes the own feature set selected with the Pearson approach for the trait Openness.
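The naming scheme can be made concrete with a small helper. This is only an illustrative sketch; the function `feature_set_name` and the lookup tables are hypothetical names introduced here, not part of the original implementation.

```python
# Trait letters, feature sets and method letters as used in the text.
TRAITS = {
    "O": "Openness",
    "C": "Conscientiousness",
    "E": "Extraversion",
    "A": "Agreeableness",
    "N": "Neuroticism",
}
FEATURE_SETS = ("own", "common", "union")
METHODS = {"P": "Pearson", "B": "gradient boosting"}


def feature_set_name(feature_set, method, trait=None):
    """Build an X-S-M identifier; the trait part is left out if not given."""
    if feature_set not in FEATURE_SETS:
        raise ValueError(f"unknown feature set: {feature_set!r}")
    if method not in METHODS:
        raise ValueError(f"unknown selection method: {method!r}")
    if trait is None:
        return f"{feature_set}-{method}"
    if trait not in TRAITS:
        raise ValueError(f"unknown trait: {trait!r}")
    return f"{trait}-{feature_set}-{method}"


print(feature_set_name("own", "P", trait="O"))  # O-own-P
print(feature_set_name("common", "B"))          # common-B
```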
Pearson correlation

Pearson correlation coefficient heatmap between LIWC features and personality traits. Red = high coefficient.
We performed pairwise correlation analysis between all 93 features and the (FFM) personality scores using Pearson product-moment correlation. This leads to
Correlation results and significance levels between the features and all five traits are shown in the heatmaps in Fig. 2 and Fig. 3. Unsurprisingly, not all features are correlated with the personality scores. Examples are the word categories Dash, QMark and Period. These punctuation marks are rarely used in status updates and are hence poor discriminators. Features with overall high relative correlation coefficients are, e.g., tone, negemo, netspeak and Apostro. Correlations with the trait Neuroticism are harder to find than with the other traits. Table 5 lists the correlations between the Openness personality trait and the extracted LIWC features. Appendix C contains the pairwise correlations of all extracted LIWC features with all personality dimensions.
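A Pearson-based selection of trait-specific features can be sketched as follows. The example runs on synthetic data; the significance threshold `alpha=0.05` is an assumption for illustration (the threshold actually used is not recoverable from the text), and the feature names are borrowed from the LIWC categories mentioned above.

```python
import numpy as np
from scipy.stats import pearsonr


def select_by_pearson(features, scores, names, alpha=0.05):
    """Return (name, r) for features significantly correlated with scores.

    alpha is a hypothetical significance threshold chosen for this sketch.
    """
    selected = []
    for j, name in enumerate(names):
        r, p_value = pearsonr(features[:, j], scores)
        if p_value < alpha:
            selected.append((name, float(r)))
    return selected


# Synthetic data: "tone" drives the trait score, "Period" is pure noise.
rng = np.random.default_rng(0)
n_users = 500
tone = rng.normal(size=n_users)
period = rng.normal(size=n_users)
scores = 3.0 + 0.5 * tone + rng.normal(scale=0.5, size=n_users)

features = np.column_stack([tone, period])
selected = select_by_pearson(features, scores, ["tone", "Period"])
print(selected)
```

Because Pearson’s coefficient only captures linear relationships, a feature with a strong but non-linear link to a trait can be missed by this kind of filter.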

Pearson correlation significance heatmap between LIWC features and personality traits. Red = low significance.
LIWC features correlations with openness personality trait
To get important features to a trait (X-own-P feature sets), we only consider those features with a significant correlation (
As the Pearson coefficient can only measure linear correlations, we use an additional approach for selecting features: the feature importance of boosted decision trees.

Relative feature importances for the Openness trait: the diagram contains all features with a relative importance higher than 0.011.
Another approach to obtaining a well-suited subset of features is to use the relative importance of features within a gradient boosted regression tree [13]. The idea behind this approach is to train a model on all available features. The resulting predictor will not be good at predicting values, but the model implicitly contains the importance of each feature for making a decision. We train one model per personality trait, each with all features as input, to get the own feature sets. There are two approaches to obtaining the common set. The first option is an intersection of all own sets. The second option is to boost one multivariate regression tree and extract the significant features, as for the own sets. We used the former approach. Figure 4 shows the resulting importance graph for the Openness trait. The selected features are also listed in Table 11 in the Appendix. They are called O-own-B, C-own-B, E-own-B, A-own-B, N-own-B and common-B.
All significantly correlated feature sets can be found in Appendix C, and all final selected features in Appendix D. Both approaches select similar feature sets, e.g. the feature sets for the trait Openness share 12 out of 17 possible features. The Pearson approach is more selective than the boosted approach. This can be explained by the fact that the Pearson approach only considers linear relationships between features and scores. Generally the number of features was reduced to about
Extraversion and Neuroticism are hard to predict, because their feature sets (E-own-P and N-own-P) contain only a few significantly correlated features. The gradient boosting approach, however, finds many important features for these traits. This leads to the assumption that there could be non-linear relationships between the features and the personality traits Extraversion and Neuroticism.
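The importance-based selection can be sketched as follows. This is a sketch under stated assumptions rather than the original code: it uses scikit-learn’s `GradientBoostingRegressor` as a stand-in for the boosted regression trees, synthetic data with three made-up features, and reuses the plotting threshold 0.011 from Fig. 4 as a hypothetical selection cut-off.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def own_features(X, y, names, min_importance=0.011):
    """Fit one boosted model on all features; keep the important ones.

    min_importance is an assumed cut-off, not a documented setting.
    """
    model = GradientBoostingRegressor(
        n_estimators=50, max_depth=3, random_state=0
    )
    model.fit(X, y)
    return {
        name
        for name, importance in zip(names, model.feature_importances_)
        if importance > min_importance
    }


def common_features(own_sets):
    """Common set as the intersection of all trait-specific own sets."""
    return set.intersection(*own_sets.values())


# Synthetic data: "posemo" matters for both traits, the others for one each.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
names = ["posemo", "negemo", "netspeak"]
targets = {
    "O": 3.0 + X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=400),
    "C": 3.0 + X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=400),
}

own_sets = {trait: own_features(X, y, names) for trait, y in targets.items()}
common = common_features(own_sets)
print(own_sets, common)
```

Taking the intersection keeps the common set small, which matches the observation later in the text that common-B contains only a handful of features.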
As Five-Factor model’s personality scores are continuous values ranging from 1 to 5, predicting a user’s personality score is a regression task. Regression models approximate a mapping function from the feature vector to a continuous output variable. Based on our training data described in Section 3.1, we trained three different machine learning algorithms: support vector regression, boosted regression trees and Neural Nets. They are described in the following sections.
The (FFM) depicts an individual’s personality via five personality scores; therefore we decided to train five models, one for each personality trait. Each algorithm is trained on three different feature sets, selected with two approaches for all five traits (see Section 3.2). This means we will train
All three selected algorithms require hyperparameters to be set before training. For each model, we perform a grid search over selected hyperparameters combined with 3-fold cross-validation to find the best performing parameter combination for the feature set used. The root mean squared error (RMSE) is used to compare the separate grid search folds. This measure is described in Section 3.4. The dataset used to build the models has 108,547 samples. We used a random split to extract
Support vector regression
Support Vector Machines can also be used as a regression method (support vector regression, SVR), which retains the main characteristics of Support Vector Machines.
SVRs, like other Support Vector Machines, allow using kernels to transform data into a higher-dimensional feature space for non-linear data. As experiments with a linear kernel showed poor results for our data, we decided to use a Gaussian Radial Basis Function (rbf) kernel. It projects the input vectors into an infinite-dimensional vector space and is defined as [26]:
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$
It measures the similarity of two feature vectors $x$ and $x'$, with the free parameter $\gamma > 0$ controlling the width of the kernel.
Rbf kernels have a high computational complexity and do not scale well with the number of training samples. Training a single model on our training set of roughly 100000 samples already took multiple hours. We suggest using an approximation of the rbf kernel for future training runs with even more data.
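The training procedure for one SVR model (rbf kernel, grid search, 3-fold cross-validation, RMSE as selection criterion) can be sketched with scikit-learn as follows. The data is synthetic, and both the parameter grid and the 80/20 split are assumptions chosen for illustration, not the values used in the experiments.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for LIWC features and one trait score.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 + np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Hypothetical random split into training and held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical hyperparameter grid for the rbf-kernel SVR.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "gamma": [0.01, 0.1, 1.0],
    "epsilon": [0.05, 0.1],
}

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=3,                                   # 3-fold cross-validation
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores
)
search.fit(X_train, y_train)

best_cv_rmse = -search.best_score_
test_rmse = float(np.sqrt(np.mean((search.predict(X_test) - y_test) ** 2)))
print(search.best_params_, best_cv_rmse, test_rmse)
```

scikit-learn reports the RMSE negated so that larger is better, hence the sign flip when reading `best_score_`.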
Gradient boosting
The same approach we used in Section 3.2.2 for feature selection can be used to learn a model based on regression trees. We use the selected feature sets as input to train the model. This approach results in a stair-like function: each leaf outputs a scalar, and because the trees have hard decision boundaries, the prediction can only take a finite number of values.
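The stair-like shape of a boosted tree ensemble can be checked on a one-dimensional toy problem. The sketch below uses scikit-learn’s `GradientBoostingRegressor` as a stand-in for the boosted trees in this study; the ensemble size and depth are deliberately tiny and hypothetical so that the steps are easy to count.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# One synthetic feature mapped linearly to a score between 1 and 5.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = 1.0 + 4.0 * x[:, 0]

# 5 trees of depth 2: each tree has at most 3 split thresholds.
model = GradientBoostingRegressor(n_estimators=5, max_depth=2, random_state=0)
model.fit(x, y)

# Evaluate on a dense grid and collect the distinct predicted values.
grid = np.linspace(0.0, 1.0, 1000).reshape(-1, 1)
steps = np.unique(model.predict(grid))
print(len(steps), "distinct prediction levels on 1000 grid points")
```

With at most 15 thresholds on a single feature, the input range is cut into at most 16 intervals, so the 1000 grid points can map to no more than 16 distinct outputs even though the target is a smooth line.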
Neural nets
We applied a feed-forward Neural Network. This means the output of every perceptron is connected as input to every perceptron of the next layer. We deploy four hidden layers with 1024, 512, 256 and 128 perceptrons as presented in Fig. 9.
Right after the input layer there is a layer that normalizes the input variables; this normalization is also learned during the training of the Neural Net. There are two more layers, one at the beginning and one right before the output. These are dropout layers, which cut a specified fraction of the connections between perceptrons during training. This can prevent overfitting and supports generalization of the model. The rate is a hyperparameter and is optimized together with the other parameters.
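The layer layout described above can be illustrated with a forward pass in plain NumPy. This is a simplified sketch, not the trained network: the ReLU activation, the weight initialization and the dropout rate of 0.2 are assumptions, the normalization always uses batch statistics, and dropout is implemented in the usual way by zeroing activations.

```python
import numpy as np

HIDDEN_SIZES = (1024, 512, 256, 128)  # hidden layers as described in the text


def init_params(n_features, rng):
    """Create weights for input -> 4 hidden layers -> 1 output score."""
    sizes = (n_features, *HIDDEN_SIZES, 1)
    weights = [
        rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    return {
        "scale": np.ones(n_features),   # learnable normalization scale
        "shift": np.zeros(n_features),  # learnable normalization shift
        "weights": weights,
        "biases": biases,
    }


def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate`, rescale the rest."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)


def forward(x, params, rate=0.2, rng=None, training=False):
    """Normalize, apply dropout after the input and before the output."""
    h = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
    h = h * params["scale"] + params["shift"]
    if training:
        h = dropout(h, rate, rng)
    last = len(params["weights"]) - 1
    for i, (w, b) in enumerate(zip(params["weights"], params["biases"])):
        if training and i == last:
            h = dropout(h, rate, rng)
        h = h @ w + b
        if i < last:
            h = np.maximum(h, 0.0)  # ReLU, an assumed activation
    return h[:, 0]


rng = np.random.default_rng(0)
params = init_params(n_features=10, rng=rng)
batch = rng.normal(size=(8, 10))
predictions = forward(batch, params)
print(predictions.shape)
```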
Quality measures
There are many metrics to estimate the skill of a regression model as an error in its predictions. We evaluate our regression models with the popular (RMSE) measure, which aggregates the differences between the values predicted by the model and the observed ones. It is defined by the following formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}$$
With a sample size of n, we identify instances by their number $i$, where $\hat{y}_i$ is the predicted and $y_i$ the observed score of instance $i$.
Results
In this section, we report the performance of the models of the three applied algorithms on the various feature sets. We use the mean trait scores as the baseline predictor. At the end of this section, we compare the best performing models of each algorithm type with each other.
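The error measure and the mean baseline can be sketched in a few lines of plain Python. The scores below are made-up numbers for illustration only.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline(train_scores, n_predictions):
    """Predict the training mean for every test instance."""
    mean = sum(train_scores) / len(train_scores)
    return [mean] * n_predictions


# Toy trait scores; the baseline always predicts the training mean 4.0.
train_scores = [3.0, 4.0, 5.0]
test_scores = [2.0, 4.0, 6.0]
baseline_rmse = rmse(mean_baseline(train_scores, len(test_scores)), test_scores)
print(round(baseline_rmse, 4))  # 1.633
```

A trained model is only useful for a trait if its RMSE on the test data falls below this baseline value.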
Support vector regression
Figure 5 shows the results of the SVR models trained on the feature sets selected by Pearson correlation. The blue bars indicate the mean baseline performance for all traits. The baseline error for the traits Neuroticism and Extraversion is considerably higher than for the other traits. This indicates that predicting these personality scores correctly is harder, as was already observed during feature selection in Section 3.2.

Errors of different Pearson feature sets with SVR.
The SVR models trained on trait-specific features (own) and on common features (common) clearly outperform the baseline predictor for all personality traits. The combined feature set (union), although it uses the same features as own and common together, does not achieve good results. Models trained with the union feature set could not beat the mean baseline for the traits Neuroticism, Openness, Extraversion and Conscientiousness.
For the feature sets selected by Pearson correlation, the best performing models use the common-P feature set. It consists of more features than the trait-specific sets and is therefore likely a better representation of the original feature space.

Errors of different boosted feature sets with SVR.
Figure 6 displays the (RMSE) of the models trained on the feature sets selected by gradient boosting. Similar to the previous models, the models trained on X-own-B and common-B clearly outperform the mean baseline. The models trained on the union features have a considerably higher (RMSE) than the other models, except for the trait Agreeableness.
In contrast to the Pearson feature sets, the own features perform best across all traits on the test dataset. This is likely due to the small number of features in the common-B feature set: it consists of only 9 features, which are too few to successfully represent the original feature space.
Comparison of (RMSE)s of best feature sets on Pearson and boosted with SVR. Bold values indicate lowest error for this trait
In Table 6, we compare the best feature sets of both selection approaches to the baseline. The best performing models of the Pearson correlation approach used the common-P feature set, whereas the best performing models of the gradient boosting approach used the trait-specific (X-own-B) feature sets.
Both approaches produce similar results regarding (RMSE), and both outperform the baseline for all traits. For the traits Neuroticism, Openness and Extraversion, the models trained on X-own-B perform slightly better than the models trained on common-P. Conscientiousness and Agreeableness are better predicted by the models trained on common-P. Neither approach performs significantly better than the other.
Figure 7 shows the results of the (XGB) models trained on the feature sets selected by Pearson correlation. All our models outperform the mean baseline and have very similar results. The models trained on X-own-P have smaller errors than all other models for the traits Openness, Agreeableness and Conscientiousness. Only for the traits Neuroticism and Extraversion could they not beat the models trained on common-P and X-union-P. On average, the models trained on X-own-P have the lowest errors.

Errors of different Pearson feature sets with (XGB).
For the feature sets selected by gradient boosting, there is more variance in the resulting (RMSE). Figure 8 shows the results of the different models. The feature set common-B performs the worst for all traits compared to the other boosted feature sets. Models trained on X-union-B clearly outperform all other models for their trait.

Errors of different boosted feature sets with (XGB).
The best feature sets of the two feature selection approaches are compared in Table 7. The models trained on the X-union-B feature sets are slightly better than the models trained on X-own-P for all traits. In contrast to the SVR models, the difference between the two selection approaches is more pronounced.
Comparison of (RMSE)s of best feature sets on Pearson and boosted with (XGB). Bold values indicate lowest error for this trait
As the results of the SVR approach had shown that the union feature set of both selection methods is not an optimal feature set, and the results from (XGB) were not yet available, we reduced the number of models for Neural Nets. Figure 9 shows the feed-forward Neural Network architecture used.
For instance, we successfully trained an Agreeableness model on the own sets of the Pearson and boosting selections. The error of the boosting-based model is 0.7069 and therefore slightly worse than the baseline prediction. The model based on the Pearson set achieves 0.6927, which is better than the baseline. However, these results do not allow any generalization.

The utilized neural net architecture without normalization and dropout layers.

Loss function of the Neural Net based on A-own-B starting with epoch 100.
More interesting in this case is the loss function of the trained models, as it can give some insight into the suitability of the designed models for the given task. Figure 10 shows the loss function of the Agreeableness model based on the A-own-B features selected by boosted trees, while Fig. 11 shows the loss function for the same personality trait but for the model with features selected by Pearson correlation. First of all, we can see that the training loss has very little noise. This means the chosen batch size of 50000 samples during training is sufficiently large. Comparing the test loss to the training loss, we can see that the model generalizes well on the data and does not overfit. The loss functions of the other trained models are quite similar to the ones shown above.

Loss function of the Neural Net based on A-own-P starting with initial epoch.
As Section 4.2 points out, the gradient boosting algorithms had the best results when trained on the combined feature set selected by the boosting approach (X-union-B). The SVR models performed well on both the common feature set selected by Pearson correlation (common-P) and the trait-specific feature set selected by the boosting approach (X-own-B), but poorly on the combined feature set (union). For SVR we chose X-own-B for comparison.
Table 8 compares the (RMSE) of each algorithm’s best performing models. Only the Neural Net model trained for the trait Agreeableness cannot beat the mean baseline. All other models outperform the baseline. The SVR and (XGB) models perform very similarly, although they use different feature sets.
Comparison of (RMSE) values of the SVR and NN models trained on X-own-B features where (XGB) models trained on X-union-B features. Bold values indicate lowest error for the associated trait
Overall the (XGB) models trained on X-union-B perform the best. The greatest improvement compared to the baseline is achieved for the trait Openness with about
We used the Facebook brand pages and Twitter brand pages described earlier (in Section 3.1) to predict the (FFM) traits with the proposed SVR model. To check whether our general personality prediction is accurate on brand data, we used the API of ApplyMagicSauce [25] to predict the traits on the same data. ApplyMagicSauce is a research project from the University of Cambridge that uses not only datasets like the MyPersonality project, but also questionnaires, tweets, browsing data and open text to identify different psychological parameters. The (FFM) scores are given in percentiles in relation to the average of each trait in the whole dataset.
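Since one system reports percentiles and the other raw scores between 1 and 5, the two outputs need a common scale before they can be plotted together. The text does not state how this was done; the following is one possible sketch, assuming a percentile rank computed against a reference sample of scores. The function name and the reference values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are less than or equal to score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical reference distribution of one trait in the training data.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_rank(3.5, reference))  # 60.0
```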

Personality scores prediction for brand called (CVS Health) at ApplyMagicSauce API versus the proposed SVR prediction model using their public Facebook posts.
As seen in the radar diagrams in Fig. 12, our model is capable of detecting the five personality traits of brand pages on Facebook, and it shows significant improvements in detecting some personality traits over others by extracting and engineering several textual features from publicly available social footprints. Figures 13 and 14 show the predicted personality scores for the same brand page, obtained by analyzing its public posts and public tweets with our proposed models, and illustrate how the feature selection approach used in the training phase (Pearson versus boosted trees) can affect the final predicted results.

Personality prediction for the brand (CVS Health) using the proposed SVR model with features extracted by Pearson correlation. Predictions are based on Facebook public posts (blue) vs Twitter public tweets (orange).

Personality prediction for the brand (CVS Health) using the proposed SVR model with features defined by a Gradient Boosted Regression Tree. Predictions are based on Facebook public posts (blue) vs Twitter public tweets (orange).
We evaluate how well supervised language models trained on Facebook users can detect the personality traits of Twitter users. The results show that Facebook users tend to use more words from psycho-linguistic emotion categories than Twitter users, which leads to better personality prediction on the Facebook platform. The results are comparable to the state-of-the-art language models provided by [14], who conclude that Facebook users prefer to post content about their personal relationships and personal concerns, whereas Twitter users tend to use the microblogging platform to post about their psychological needs and drives.
To this extent, the lack of restrictions on post length on Facebook can be considered a major factor in Facebook's superiority over Twitter in predicting personality dimensions. Figures 15, 16, and 17 show how brand personalities vary when well-established language models are used to predict brand personality traits from the brands' published Facebook posts and Twitter tweets.
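The cross-platform evaluation described above (train on one platform, test on the other) can be sketched as follows. This is only an illustration of the protocol: a one-feature least-squares line stands in for the SVR model used in the paper, and the `facebook` and `twitter` samples are invented toy values, not data from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long score lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def transfer_rmse(source, target):
    """Fit on the source platform, then measure the error on the target platform."""
    slope, intercept = fit_line(source["x"], source["y"])
    predictions = [slope * x + intercept for x in target["x"]]
    return rmse(target["y"], predictions)

# Toy samples (assumption): x = one linguistic feature, y = one trait score
facebook = {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 3.0, 4.0, 5.0]}
twitter = {"x": [1.0, 2.0, 3.0], "y": [2.5, 3.0, 4.5]}

fb_to_tw = transfer_rmse(facebook, twitter)   # trained on Facebook, tested on Twitter
tw_to_fb = transfer_rmse(twitter, facebook)   # trained on Twitter, tested on Facebook
```

Comparing the two directions on real data is what indicates whether a model transfers from one platform to the other; the toy numbers here carry no such meaning.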

Personality prediction for the brand (SAP) using the proposed SVR model with features defined by a Gradient Boosted Regression Tree and by Pearson correlation. Predictions are based on Facebook public posts vs Twitter public tweets.

Personality prediction for the brand (General Electric) using the proposed SVR model with features defined by a Gradient Boosted Regression Tree and by Pearson correlation. Predictions are based on Facebook public posts vs Twitter public tweets.

Personality prediction for the brand (Cisco) using the proposed SVR model with features defined by a Gradient Boosted Regression Tree and by Pearson correlation. Predictions are based on Facebook public posts vs Twitter public tweets.
Evaluating brand personality in the online space is not an easy task. Creating a gold standard for brand personalities by conducting interviews and questionnaires with employees as well as marketing and enterprise managers would definitely advance the research. A point worth investigating is whether the followers' personalities match the brand personality. A brand could take advantage of this and apply reverse psychology in marketing campaigns to attract similar or even completely contrary personality types. This knowledge would also greatly help public relations teams identify a brand's target audience across various social media networks. The same analysis is conceivable for employee personalities in comparison to the brand, and could support a company's human resources department or help new applicants find an appropriate job position.
Conclusion
This paper aims to predict brand personality from online social fingerprints using machine learning algorithms trained on labeled data from users' self-reported personality tests on both the Facebook and Twitter platforms. It uses two different approaches to select feature sets and evaluates three different types of machine learning algorithms. The final model is able to properly distinguish between the personality dimensions of Facebook and Twitter pages by investigating a wide set of combinations of the extracted features with state-of-the-art machine learning classifiers. In terms of the implications for the machine learning domain, our experiments suggest that the source of the language samples can greatly affect the ability to capture users' personality. In general, language models trained on Facebook data to predict personality dimensions can be readily transferred to the Twitter platform, but not vice versa.
Footnotes
Distribution of the Big Five personality label scores in the training dataset
Exploratory feature analysis on the training dataset
Statistics for LIWC features extracted from the MyPersonality dataset
| LIWC features | Count | Mean | std | Min | 25% | 50% | 75% | Max | IQR |
| | 115822 | 59.421852 | 19.351168 | 1 | 45.9 | 59.64 | 73.52 | 99 | 27.62 |
| | 115822 | 55.601547 | 17.641694 | 1 | 44.18 | 54.71 | 67.12 | 99 | 22.94 |
| | 115822 | 50.215093 | 26.728161 | 1 | 31.29 | 54.62 | 71.13 | 99 | 39.84 |
| | 115822 | 67.301046 | 28.058909 | 1 | 46.3925 | 73.76 | 92.98 | 99 | 46.5875 |
| | 115822 | 14.907625 | 23.822601 | 0.5 | 9 | 11.5 | 15.55 | 3834 | 6.55 |
| | 115822 | 13.995215 | 4.521729 | 0 | 11.65 | 13.59 | 15.74 | 100 | 4.09 |
| | 115822 | 76.968739 | 17.780135 | 0 | 75.93 | 83.12 | 86.94 | 100 | 11.01 |
| | 115822 | 42.557772 | 11.886974 | 0 | 39.89 | 46.11 | 49.9 | 100 | 10.01 |
| | 115822 | 13.131699 | 4.8014 | 0 | 10.7 | 13.65 | 16.15 | 100 | 5.45 |
| | 115822 | 8.991791 | 3.654331 | 0 | 6.93 | 9.16 | 11.22 | 100 | 4.29 |
| | 115822 | 5.113042 | 2.74385 | 0 | 3.3 | 5.08 | 6.79 | 100 | 3.49 |
| | 115822 | 0.559721 | 0.689054 | 0 | 0.17 | 0.44 | 0.75 | 50 | 0.58 |
| | 115822 | 2.294037 | 1.749893 | 0 | 1.16 | 2.02 | 3.11 | 50 | 1.95 |
| | 115822 | 0.648692 | 0.780837 | 0 | 0.16 | 0.49 | 0.9 | 60 | 0.74 |
| | 115822 | 0.378993 | 0.442825 | 0 | 0.09 | 0.31 | 0.53 | 18.18 | 0.44 |
| | 115822 | 4.13145 | 2.048509 | 0 | 3.21 | 4.3 | 5.18 | 100 | 1.97 |
| | 115822 | 4.796804 | 1.969031 | 0 | 3.82 | 4.95 | 5.8975 | 37.5 | 2.0775 |
| | 115822 | 9.976873 | 3.477478 | 0 | 8.71 | 10.68 | 12.04 | 66.67 | 3.33 |
| | 115822 | 7.502806 | 2.830535 | 0 | 6.41 | 7.98 | 9.14 | 50 | 2.73 |
| | 115822 | 4.581186 | 2.109951 | 0 | 3.67 | 4.8 | 5.71 | 50 | 2.04 |
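The columns of the statistics table above (count, mean, standard deviation, minimum, quartiles, maximum, and interquartile range) can be computed for one feature with a short sketch like the following. The helper name `describe` and the sample values are hypothetical, and the sketch assumes linearly interpolated quartiles and the sample standard deviation; the paper does not state which conventions were used.

```python
import statistics

def describe(values):
    """Summary statistics in the same column order as the table above."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
        "IQR": q3 - q1,                    # interquartile range
    }

# Hypothetical values of one LIWC feature for five users (assumption)
feature_values = [9.0, 11.5, 12.0, 15.5, 40.0]
summary = describe(feature_values)
```

The IQR is simply the difference between the 75% and 25% columns, which can be checked row by row in the table (for example, 73.52 − 45.9 = 27.62 in the first row). It is far less sensitive to extreme values than the standard deviation, which matters for rows such as the one with a maximum of 3834 and a third quartile of 15.55.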
Pairwise correlation between features and personality traits
Features that significantly correlate with all trait labels (
| Features | Openness | Conscientiousness | Extraversion | Agreeableness | Neuroticism |
| | 0.068494 | 0.009237 | 0.074461 | 0.008496 | 0.052487 |
| | −0.049436 | 0.103304 | 0.049392 | 0.073409 | −0.049931 |
| | −0.069253 | 0.172775 | 0.135431 | 0.18298 | −0.092732 |
| | 0.071569 | 0.014526 | −0.061646 | −0.015693 | −0.013553 |
| | 0.014514 | −0.013042 | 0.046791 | 0.023391 | 0.041792 |
| | −0.023257 | 0.015831 | 0.04325 | 0.024644 | 0.009767 |
| | 0.080803 | 0.060756 | −0.017511 | 0.034252 | −0.042799 |
| | 0.046716 | 0.023749 | −0.007983 | 0.044148 | 0.022498 |
| | 0.042369 | 0.04841 | 0.019684 | 0.056052 | 0.01743 |
| | 0.013816 | −0.023289 | −0.016496 | −0.026628 | 0.041096 |
| | −0.074599 | 0.014773 | 0.050155 | 0.046865 | 0.018472 |
| | −0.085454 | 0.079127 | 0.074458 | 0.101012 | −0.028261 |
| | 0.014434 | −0.107697 | −0.017511 | −0.113296 | 0.047073 |
| | −0.027461 | 0.019418 | 0.033898 | 0.028197 | −0.008801 |
| | −0.027568 | 0.027472 | 0.036498 | 0.014373 | 0.032973 |
| | −0.025863 | 0.019266 | 0.021924 | 0.013261 | −0.02197 |
| | 0.049599 | 0.020891 | −0.027528 | 0.034595 | 0.032331 |
| | 0.05433 | −0.033162 | −0.008349 | 0.020174 | 0.017156 |
| | 0.013998 | −0.031997 | 0.032891 | −0.017096 | 0.029259 |
| | 0.02723 | −0.061689 | 0.009341 | −0.038108 | 0.030724 |
| | 0.01805 | −0.083953 | 0.014281 | −0.075323 | 0.017293 |
| | −0.052251 | 0.091857 | 0.049295 | 0.070159 | −0.0463 |
| | −0.052079 | 0.061489 | 0.055885 | 0.062503 | −0.02759 |
| | −0.025544 | 0.062448 | 0.009735 | 0.033299 | −0.039728 |
| | −0.056552 | 0.058267 | 0.043904 | 0.048076 | −0.036966 |
| | 0.010417 | 0.025491 | 0.012839 | 0.036017 | 0.023151 |
| | −0.023329 | 0.091913 | 0.026936 | 0.071985 | −0.034793 |
| | −0.019126 | 0.048417 | 0.028863 | 0.045586 | −0.024184 |
| | 0.034205 | 0.050738 | 0.017932 | 0.034359 | −0.031808 |
| | −0.058103 | 0.092945 | 0.019274 | 0.073315 | −0.024624 |
| | −0.009032 | 0.056185 | −0.035491 | 0.025133 | −0.028369 |
| | −0.014062 | 0.014116 | 0.026138 | 0.028533 | −0.046717 |
| | 0.057287 | −0.051633 | −0.044626 | −0.049209 | 0.020564 |
| | −0.086264 | −0.079306 | 0.073246 | −0.031772 | 0.008127 |
| | 0.008354 | −0.10041 | 0.018342 | −0.106396 | 0.021728 |
| | 0.088365 | −0.038703 | −0.054195 | −0.013173 | 0.048340 |
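Each cell in the correlation table above is a Pearson coefficient between one feature and one trait. As a rough sketch of how such coefficients could drive feature selection, the code below computes the coefficient and keeps features whose absolute correlation reaches a cutoff. The cutoff `min_abs_r`, the helper names, and the toy data are assumptions; the paper selects on statistical significance, and its exact criterion is not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_features(features, trait_scores, min_abs_r):
    """Keep features whose |r| with the trait is at least min_abs_r."""
    selected = {}
    for name, values in features.items():
        r = pearson_r(values, trait_scores)
        if abs(r) >= min_abs_r:
            selected[name] = r
    return selected

# Toy feature values for four users (assumption, not data from the paper)
features = {
    "posemo": [1.0, 2.0, 3.0, 4.0],
    "WC": [10.0, 30.0, 20.0, 25.0],
    "negemo": [4.0, 3.0, 2.0, 1.0],
}
extraversion = [2.0, 3.0, 3.5, 5.0]

chosen = select_features(features, extraversion, min_abs_r=0.9)
```

With a large sample such as the one used here, even small coefficients like those in the table can be statistically significant, so a significance test rather than a fixed cutoff would be the closer match to the selection described in the caption.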
Final feature sets extracted by two approaches: Pearson and gradient boosting
Feature sets selected by the Pearson correlation coefficient and by gradient boosting feature importances. The features printed in bold for the boosting approach are the common features (feature set common-B); there are 9 features in this set.
| Set | Count | Features |
| Pearson | ||
| O-own-P | 17 | Apostro, Sixltr, Tone, WC, affect, affiliation, article, death, drives, family, informal, insight, netspeak, percept, posemo, reward, time |
| C-own-P | 30 | Clout, Dic, Tone, achieve, affiliation, anger, article, body, death, drives, family, focusfuture, function., i, informal, negemo, netspeak, posemo, prep, quant, relativ, relig, reward, sexual, social, space, swear, time, we, work |
| E-own-P | 9 | Apostro, Sixltr, Tone, WC, affect, affiliation, informal, netspeak, posemo |
| A-own-P | 19 | Authentic, Clout, Dic, Tone, affiliation, anger, conj, drives, focusfuture, function., negemo, posemo, prep, relativ, sexual, social, swear, time, we |
| N-own-P | 5 | Analytic, Tone, WC, i, negemo |
| common-P | 37 | Analytic, Apostro, Clout, Sixltr, Tone, WC, achieve, affect, affiliation, anger, article, auxverb, bio, body, cogproc, conj, death, drives, female, focuspresent, friend, informal, leisure, male, motion, negate, percept, posemo, ppron, relativ, reward, sexual, space, swear, time, work, you |
| Boosting | ||
| O-own-B | 39 | AllPunc, Apostro, Clout, Comma, Dic, |
| C-own-B | 38 | AllPunc, Colon, Comma, Dic, |
| E-own-B | 41 | AllPunc, Apostro, Clout, Colon, Comma, |
| A-own-B | 35 | Analytic, Apostro, Colon, Dic, |
| N-own-B | 38 | Apostro, Clout, |
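How the per-trait feature sets overlap can be explored with plain set operations. The sketch below uses the five Pearson own sets exactly as listed in the table above; the variable names are hypothetical. It only illustrates intersection and union on those lists and does not reproduce the paper's common-P, common-B, or X-union-B sets, whose construction is not spelled out in this section.

```python
# Pearson "own" feature sets, copied from the table above
own_p = {
    "O": {"Apostro", "Sixltr", "Tone", "WC", "affect", "affiliation", "article",
          "death", "drives", "family", "informal", "insight", "netspeak",
          "percept", "posemo", "reward", "time"},
    "C": {"Clout", "Dic", "Tone", "achieve", "affiliation", "anger", "article",
          "body", "death", "drives", "family", "focusfuture", "function.", "i",
          "informal", "negemo", "netspeak", "posemo", "prep", "quant", "relativ",
          "relig", "reward", "sexual", "social", "space", "swear", "time", "we",
          "work"},
    "E": {"Apostro", "Sixltr", "Tone", "WC", "affect", "affiliation", "informal",
          "netspeak", "posemo"},
    "A": {"Authentic", "Clout", "Dic", "Tone", "affiliation", "anger", "conj",
          "drives", "focusfuture", "function.", "negemo", "posemo", "prep",
          "relativ", "sexual", "social", "swear", "time", "we"},
    "N": {"Analytic", "Tone", "WC", "i", "negemo"},
}

# Features that appear in every trait's own set
shared_by_all = set.intersection(*own_p.values())

# Features that appear in at least one trait's own set
used_by_any = set.union(*own_p.values())

# Overlap between two individual traits, e.g. Extraversion and Openness
e_and_o = own_p["E"] & own_p["O"]
```

On these lists, Tone is the only feature shared by all five traits, and every Extraversion feature also appears in the Openness set.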
