Dashboard for exploring personalities based on mobile user log data

Abstract

This paper proposes the Big5 dashboard, a useful and easy-to-use tool for exploring the hierarchical structure of personality traits. First, the Big5 system architecture and its components are described. Afterwards, we present how to calculate Big5 indicators from available big mobile data sets. Big5 traits can then be predicted based on those indicators. As a proof of concept, implementation results are presented in the context of the Big5 dashboard, which has been designed and developed to predict Big5 personalities in a representative and interactive manner.

Keywords

Big5 traits, personality indicators, data warehouse, mobile logs, Naive Bayes classification, dashboard

1 Introduction

The hierarchical structure of personality traits, i.e. extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience, is known as the five-factor model (Big5) [12, 25]. According to [25], these Big5 personality traits have been used to identify connections between personality and different behaviors. Those factors are very important for business people who want to understand their clients. In the field of mobile data in particular, the number of users is expected to pass five billion in 2019 [27], and mobile phone logs have been made available by service providers to researchers [3] as well as to commercial partners [21]. As a result, personality indicators derived from big mobile logs open exciting avenues for the next generation of social sciences [6].

Furthermore, [7] showed that data can be collected from onboard sensors and other phone logs. As noted in [3], determining the personality of a mobile phone user from standard mobile logs has become a topic of great interest. Mobile log data could therefore provide a valuable, unobtrusive and cost-effective alternative to survey-based measures of personality [3].

According to [11], dashboards are basic components for supporting interactive queries in data warehousing systems, as they give analysts a view of the basic business measurements that reflect business performance between and within data sources.

In this context, the Big5 dashboard has been proposed and developed based on three data sets of mobile phone logs collected from Orange Senegal [4]. First, those big mobile data sets are extracted, transformed and loaded into the fact tables of the Big5 data warehouse, then aggregated into multidimensional data cubes, the concepts of which were introduced in our previous work [7, 15]. Afterwards, a set of Big5 indicators is retrieved and calculated from those multidimensional data cubes. Furthermore, we also reuse an available set of indicators calculated and provided by the Bandicoot tool [3]. Machine learning algorithms, in particular Naive Bayes classification [23, 25], are then studied and applied to this set of indicators to predict whether phone users are low, average, or high in each Big5 factor [21]. As a proof of concept, Big5 dashboard use cases have been designed in UML (Unified Modeling Language) and developed to enable users to explore the hierarchical structure of personality traits in an interactive manner.

The rest of this paper is organized as follows. Section 2 introduces typical approaches and projects related to our work. Section 3 introduces the Big5 system architecture and its concepts, i.e. Big5 indicators and Big5 prediction based on Naive Bayes classification. Section 4 presents our application results in the context of the Big5 dashboard. Lastly, Section 5 summarizes our achievements and outlines future work.

2 Related work

Our approach belongs to the research field of predicting personality traits based on machine learning algorithms. Predicting the personality of mobile phone users, besides being important purely from a psychological perspective, can also provide an important framework for mobile computing [28]. In particular, business people can retrieve information and knowledge from mobile phone data sets, which have become very big and contain many features [2].

Personality is an essential factor in determining individual variation in thoughts, emotions and behavior patterns [23]. In [9], personality is predicted by exploiting different features extracted from Facebook data, e.g. demographic features such as age and gender; statistics of user activities such as the number of likes; and linguistic features such as word count. Furthermore, [10] used n-grams as features and the Naive Bayes algorithm as the classifier to determine an author’s personality from weblog texts. Moreover, [22] predicted conscientiousness by estimating the specificity and objectivity of verbs taken from WordNet and Senti-WordNet. In [1], personality is determined by different classification approaches, for example Support Vector Machine and Bayesian Logistic Regression. On the other hand, the ability to draw relationships between personality and behavioral aspects retrieved from information integrated from mobile phones could lead to modeling and applying machine learning techniques to predict users’ personality traits [2, 5].

In [2, 3], we have been focusing on the following key features: developing a data management system [9, 16–19] to handle mobile logs, which amount to a large volume of data; calculating new Big5 indicators, which can be retrieved from the data management system; and then deploying a tool to predict Big5 personalities. This paper can be considered an extended version of our previous work in [16].

3 Big5 system architecture

In this section, the Big5 system architecture and its components are presented, as illustrated in Fig. 1.

Fig.1

Big5 system architecture.

First, three big mobile phone data sets from Orange Senegal [3] are preprocessed and integrated into the Big5 data warehouse. Afterwards, a set of indicators is retrieved from the Big5 data cubes and then used to predict Big5 traits, as presented in sub-section 3.4. Furthermore, the Big5 dashboard also provides an interactive framework which enables users to customize Big5 indicators in terms of dashboard slides.

3.1 Orange mobile data

In this section, the mobile data sets gathered from the 1666 towers distributed across Senegal [3] are preprocessed and used to specify Big5 indicators. The data sets include log information of calls and SMSs exchanged among 9 million mobile users during the year 2013. For this big data challenge, the mobile log information was appropriately anonymized before being handed over to researchers [4]. An example of SMS antenna-to-antenna mobile logs is presented in Table 1.

3.2 Big5 data warehouse definition

According to [16], the Big5 data warehouse has been defined:

$Big5DW = \langle Big5Dims, Big5Facts, Big5FTs, Big5Gbys \rangle$

Where:

Big5Dims is a set of main dimensions, i.e. Site and Time.

Big5Facts is a set of decision variables, i.e. number_of_calls, total_call_duration, number_of_sms.

Big5FTs = {FTCalls, FTSMSs} is the set of fact tables. Table 2 shows a subset of FTCalls.

Big5Gbys contains grouped data cubes specified by the hierarchical levels of Time and Site dimensions, e.g. CallbyNightTime, SMSbyMonth, SMSbyYear, etc.

Table 1
An example of SMS antenna to antenna logs

sms_timestamp	outgoing_site_id	incoming_site_id	number_of_sms
2013-01-02 13	24	393	2
2013-01-02 13	24	394	1
2013-01-02 13	24	396	1
2013-01-02 13	24	408	1
2013-01-02 13	24	415	1
2013-01-02 13	24	420	3

Table 2

A subset of FTCalls

call_timestamp	outgoing_site_id	incoming_site_id	number_of_calls
2013-01-01 00	1	1	1
2013-01-01 00	1	2	1
2013-01-01 00	1	24	1
2013-01-01 00	1	186	1
2013-01-01 00	2	2	22
2013-01-01 00	2	3	2
2013-01-01 00	2	4	4
2013-01-01 00	2	5	8
2013-01-01 01	2	6	7
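
The aggregation of fact table rows into grouped data cubes can be sketched as follows. This is a minimal illustration in Python using the rows of Table 2; the helper `group_by` and the cube names `calls_by_hour` and `outgoing_calls_by_site` are hypothetical and are not part of the Big5DW definition in [16]. It assumes that the trailing two digits of call_timestamp denote the hour of the day.

```python
from collections import defaultdict

# Rows of Table 2:
# (call_timestamp, outgoing_site_id, incoming_site_id, number_of_calls)
ft_calls = [
    ("2013-01-01 00", 1, 1, 1),
    ("2013-01-01 00", 1, 2, 1),
    ("2013-01-01 00", 1, 24, 1),
    ("2013-01-01 00", 1, 186, 1),
    ("2013-01-01 00", 2, 2, 22),
    ("2013-01-01 00", 2, 3, 2),
    ("2013-01-01 00", 2, 4, 4),
    ("2013-01-01 00", 2, 5, 8),
    ("2013-01-01 01", 2, 6, 7),
]


def group_by(rows, key):
    """Sum number_of_calls over all rows sharing the same key (hypothetical helper)."""
    cube = defaultdict(int)
    for row in rows:
        cube[key(row)] += row[3]
    return dict(cube)


# Group by a level of the Time dimension (assumed here: hour).
calls_by_hour = group_by(ft_calls, lambda row: row[0])
# Group by a level of the Site dimension (here: outgoing tower).
outgoing_calls_by_site = group_by(ft_calls, lambda row: row[1])

print(calls_by_hour)
print(outgoing_calls_by_site)
```

Coarser cubes such as a by-month or by-year grouping would follow the same pattern with a key that truncates the timestamp further.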

3.3 Calculating Big5 indicators

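In [16], the set of Big5 classes is defined as follows, where A, C, E, N and O denote agreeableness, conscientiousness, extraversion, neuroticism and openness to experience: $CL = \{A, C, E, N, O\}$ (1)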

Table 3 shows an example of specified Big5 indicators for towers from Orange data sets.

Table 3

Example of specified Big5 indicators for towers from Orange data sets

tower_id	arr_id	longitude	latitude	outgoing_calls	incoming_calls	outgoing_sms	incoming_sms	duration_incalls
25	2	–17.487569	14.707238	77398	73956	146019	146506	5584846
26	2	–17.485364	14.726616	215384	212481	428666	426613	13603883
27	2	–17.484902	14.718472	80592	74537	121729	120997	5277636
28	2	–17.480833	14.724458	732658	715594	1306120	1302504	47532785
29	2	–17.481918	14.725431	209396	213886	587164	568230	15685039
30	2	–17.480514	14.713803	256366	235634	392716	397011	14389587
31	2	–17.47905	14.762229	495607	462741	749446	750446	31286638

Afterwards, the links between Big5 indicators and traits can be identified, as illustrated in Table 4.

Table 4

The linking between calculated indicators and the Big5 traits

Indicators	Set	O	C	E	A	N
# of outgoing calls	1		X	X
# of outgoing SMS	1		X	X		X
ratio of incoming SMS to calls	1	X
entropy of outgoing calls	1	X	X	X	X	X
...
# of unique user ids interactions	2
# of outgoing calls (20:00 - 6:00)	3				X
# of outgoing SMS (20:00 - 6:00)	3				X
... .
AR		X	X	X	X	X

3.4 Predict Big5

Based on a class variable cl ∈ CL and a Big5 indicator vector id = (id₁, …, id_n) [16], Bayes’ theorem states the following relationship: $P (cl | {id}_{1}, \dots, {id}_{n}) = \frac{P (cl) P ({id}_{1}, \dots, {id}_{n} | cl)}{P ({id}_{1}, \dots, {id}_{n})}$ (2)

Applying the naive independence assumption gives: $P (cl | {id}_{1}, \dots, {id}_{n}) = \frac{P (cl) \prod_{i = 1}^{n} P ({id}_{i} | cl)}{P ({id}_{1}, \dots, {id}_{n})}$ (3)

P(cl) is assumed to follow a uniform distribution [28], hence each of the five traits has the same prior probability of 1/5: $P (cl | {id}_{1}, \dots, {id}_{n}) = \frac{\frac{1}{5} \prod_{i = 1}^{n} P ({id}_{i} | cl)}{P ({id}_{1}, \dots, {id}_{n})}$ (4)

In our context, formula 4 is used to predict degree levels of personality dimensions. First, we dichotomize indicators into low (l) and high (h) degrees. In more detail, a continuous indicator id is converted to a discrete variable that takes the value low or high by comparing it with its median. Then, the Multinomial Naive Bayes method [24] is used to calculate P(cl | id₁, …, id_n) from the distribution of low and high values in each personality dimension.
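
The dichotomization step can be sketched as follows. This is a minimal illustration, not the implementation used in the Big5 tool: the small population of indicator vectors is hypothetical (only the first row is taken from the example in this section), the function name `dichotomize` is an assumption, and values equal to the median are assumed to be labelled low because the text does not specify how ties are handled.

```python
from statistics import median


def dichotomize(values, medians):
    """Label each indicator 'h' if it exceeds its median, otherwise 'l' (ties assumed low)."""
    return ["h" if v > m else "l" for v, m in zip(values, medians)]


# Hypothetical population of indicator vectors; each row is one user.
population = [
    [12.32, 0.43, 67, 176.51, 2.82],
    [8.10, 0.51, 40, 120.00, 3.40],
    [10.05, 0.62, 55, 150.25, 3.10],
]

# One median per indicator, computed column by column.
medians = [median(column) for column in zip(*population)]

user_levels = dichotomize(population[0], medians)
print(medians)
print(user_levels)
```

With these hypothetical medians, the first user is mapped to the same low and high pattern that is used in the worked example of this section.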

For example, suppose a user has the following vector of indicators: $id = \{12.32, 0.43, 67, 176.51, 2.82\}$

After dichotomization, id = {h, l, h, h, l}; we then apply formula 4 to calculate the degree of each personality dimension from the user’s indicators. For openness (O), for example: $P (O | (h, l, h, h, l)) = \frac{\frac{1}{5} P (h | O)^{3} P (l | O)^{2}}{P (h)^{3} P (l)^{2}}$ (5)
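
Formula 5 can be checked numerically with a short sketch. The conditional probabilities P(h | O) and P(l | O) and the marginals P(h) and P(l) used below are hypothetical placeholders, since the estimated distributions are not listed here; the function name `naive_bayes_score` is likewise an illustrative assumption.

```python
from math import prod


def naive_bayes_score(levels, p_level_given_trait, p_level, prior=1 / 5):
    """Evaluate formula 4 for one trait, given dichotomized indicator levels."""
    numerator = prior * prod(p_level_given_trait[x] for x in levels)
    denominator = prod(p_level[x] for x in levels)
    return numerator / denominator


levels = ["h", "l", "h", "h", "l"]

# Hypothetical distributions of high and low values (assumptions for illustration).
p_given_openness = {"h": 0.6, "l": 0.4}
p_marginal = {"h": 0.5, "l": 0.5}

score_openness = naive_bayes_score(levels, p_given_openness, p_marginal)
print(round(score_openness, 6))
```

Because the vector contains three high and two low values, the product in the numerator reduces to P(h | O)³ P(l | O)², which matches the right-hand side of formula 5.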

Using the result, we determine degree levels of personality dimensions according to a set of degree values. A low, average or high degree of a trait is specified based on the mean and standard deviation as follows: $\begin{matrix} a = \text{mean} - \text{standard deviation} \\ b = \text{mean} + \text{standard deviation} \end{matrix}$ (6)

A trait has a low degree if its value is less than a, an average degree if the value is between a and b, and a high degree if the value is greater than b.
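
The thresholding rule of formula 6 can be sketched as follows. The list of trait scores is hypothetical, the function name `degree_level` is an assumption, and the population standard deviation is used because the text does not state whether the sample or population form is intended.

```python
from statistics import mean, pstdev


def degree_level(value, reference_values):
    """Classify a trait value as low, average or high using mean +/- standard deviation."""
    m = mean(reference_values)
    s = pstdev(reference_values)  # assumption: population standard deviation
    a, b = m - s, m + s
    if value < a:
        return "low"
    if value > b:
        return "high"
    return "average"


# Hypothetical scores of one trait over a small group of users.
scores = [0.10, 0.20, 0.22, 0.25, 0.40]

levels = [degree_level(score, scores) for score in scores]
print(levels)
```

Values that fall exactly on a or b are treated as average in this sketch.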

Finally, we use the sets of medians, the distributions of the two values (low and high), and the sets of a and b to predict the degree level of each personality dimension.

4 Big5 dashboard

From the user’s point of view, the Big5 dashboard has been proposed to empower users to browse the hierarchical structure of personality traits predicted from Big5 indicators. The dashboard therefore consists of a set-up phase and an interactive phase. In the set-up phase, a set of available calculated indicators for predicting Big5 traits is used to form an interactive framework by using the Big5 dashboard widget. Afterwards, personalities can be tracked by means of the Big5 interactive dashboard.

4.1 Big5 data visualization scenarios

The following paragraphs describe a typical scenario developed to analyse the sample mobile data provided by Orange [3]. As illustrated in Fig. 2, the derived Big5 map is a data visualization scenario for identifying how Big5 traits are distributed across Senegal, including the areas that are highly susceptible to business analysis and other shocks.

Emphasis was placed on key indicators such as number of outgoing calls, duration of calls and entropy of calls, where entropy is a measure of the network variability of different towers contacted from a given tower. As a result, the Big5 map of the dashboard captures the definition of metrics and related context indicators and Big5 traits as presented in Table 4.
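
The entropy indicator mentioned above can be illustrated with a short sketch. The exact formula is not given in the text, so the example assumes Shannon entropy with the natural logarithm over the distribution of calls from one tower to each contacted tower; the function name `call_entropy` is hypothetical. The sample counts are the calls from tower 2 listed in Table 2.

```python
from math import log


def call_entropy(counts):
    """Shannon entropy (natural log) of the call distribution over contacted towers."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


# Calls from tower 2 to towers 2, 3, 4, 5 and 6 (rows of Table 2).
tower_2_calls = [22, 2, 4, 8, 7]
entropy_tower_2 = call_entropy(tower_2_calls)

# Evenly spread calls give the maximum entropy log(k) for k contacted towers,
# while calls to a single tower give zero entropy.
uniform_entropy = call_entropy([5, 5, 5, 5])
single_entropy = call_entropy([10])

print(entropy_tower_2, uniform_entropy, single_entropy)
```

Under this assumption, a tower whose calls are concentrated on a few destinations has a lower entropy than one whose calls are spread evenly over many towers.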

4.2 Big5 dashboard Set-up phase

In the dashboard set-up phase, the Big5 dashboard definition artifacts used in our approach are based on the specification of Big5 indicators, which are the input parameters of the predict functions used in the dashboard data visualization scenarios.

Fig.2

Big5 prediction map scenario.

In a typical Big5 data visualization scenario, the dashboard user starts by defining some scenario objects, i.e. Big5 Charts and Big5 Map, and then associates these components with Big5 indicator widgets to build the interactive dashboard.

By assigning a value to a slide, we can bind a subset of its related indicators to set a data visualization scenario value based on the predict function defined in sub-section 3.4.

4.3 Big5 interactive dashboard

The Big5 dashboard design has to manage various types of interactions, including users adding a new value to the set of chosen values by changing slides. Along these lines, the interaction semantics provide a predictable view of the Big5 indicators and enable the user to move between different scenarios in a correct and complete way.

4.4 Big5 dashboard Use cases

According to [16], Mobile User, Mobile Phone Provider, and Marketing Agent are the three main actors, which can be abstracted as the Big5 Dashboard User.

As shown in Fig. 3, three basic use cases are UC1: predict Big5 Traits, UC2: view Big5 Charts, and UC3: view Big5 Map.

Fig.3

Big5 dashboard use case diagram.

Fig.4

Big5 dashboard having UC2: view Big5 Chart, UC3: view Big5 Map and UC4: changing Big5 Slides or UC5: changing Big5.

In this context, a use case can be deployed by invoking other related ones. For example, the main use case is UC1: predict Big5 Traits, which predicts Big5 traits from the indicators specified by the calculate Big5 Indicator use case. Moreover, the calculate Big5 Indicator use case has been implemented based on data cubes obtained from the Big5DW data warehouse, which is defined by the specify Big5DW use case. In turn, the big Orange mobile log data has been extracted, transformed and loaded into the Big5DW data warehouse by means of the etl Big5Data use case.

As presented in sub-section 4.1, the typical data visualisation scenario, namely the Big5 prediction map, has been modelled and developed by using UC3: view Big5 Map, which also calls UC1: predict Big5 Traits. In this context, the number of outgoing calls, duration of calls and entropy of calls indicators are calculated by user_ids and arrondissement_ids.

Furthermore, Big5 Dashboard users have the opportunity to target the implementation rate of interevent by using UC4: changing Big5 Slides or UC5: changing Big5 Metaslide.

5 Conclusion and future works

In this paper, we have presented the Big5 dashboard, which is proposed to explore the hierarchical structure of personality traits in an interactive manner by means of measure slides and their predict functions linked to a package of user-defined data visualization scenarios. In this context, a set of indicators and their associations to Big5 traits have been retrieved and calculated based on the Orange Senegal mobile phone logs [3].

Future work could extend our approach to support users in building interactive dashboards in a cost-efficient and elastic manner that spans all aspects of the smart dashboard widget building lifecycle, i.e. data visualization scenarios, indicator computation and their related measure slides.

Acknowledgments

Thanks to Orange Sonatel Senegal and the D4D team for providing the mobile phone data. Support from the Duy Tan University, Vietnam is acknowledged.

References

1. Alam, E.A. Stepanov and Riccardi, Personality traits recognition on social network-facebook, In Proc of Workshop on Computational Personality Recognition, AAAI Press, Menlo Park, CA, 2013, pp. 6–9.

2. CNN, Your phone company is selling your personal data, http://money.cnn.com/2011/11/01/technology/verizon_att_sprint_tmobile_privacy/index.htm, 2011.

3. Y.-A. de Montjoye, Quoidbach, Robic and Pentland, Predicting Personality Using Novel Mobile Phone-Based Metrics, 2013, p. 13.

4. Y.-A. de Montjoye, Smoreda, Trinquart, Ziemlicki and Blondel, D4D-Senegal: The Second Mobile Phone Data for Development Challenge, 2014.

5. de Oliveira, et al., Towards a psychographic user model from mobile phone usage, In: Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems, ACM, 2011.

6. Chittaranjan, Blom and Gatica-Perez, Who’s Who with Big-Five: Analyzing and Classifying Personality Traits with Smartphones, In Proceedings of the 15th Annual International Symposium on Wearable Computers (ISWC ’11), IEEE Computer Society, 2011, pp. 29–36.

7. G.M. Harari, N.D. Lane, Wang, B.S. Crosier, A.T. Campbell and S.D. Gosling, Using smartphones to collect behavioral data in psychological science: Opportunities, practical considerations, and challenges, Perspectives on Psychological Science: A Journal of the Association for Psychological Science (2016).

8. A.D.T. Hoang, S.N. Ngo and T.B. Nguyen, Collective cubing platform towards definition and analysis of warehouse cubes, ICCCI (2) (2012), 11–20.

9. A.D.T. Hoang and T.B. Nguyen, An integrated use of CWM and ontological modeling approaches towards ETL processes, ICEBE 2008 (2008), 715–720.

10. A.D.T. Hoang and T.B. Nguyen, State of the Art and Emerging Rule-driven Perspectives towards Service-based Business Process Interoperability, RIVF 2009 (2009), 1–4.

11. A.D.T. Hoang, T.B. Nguyen and A.M. Tjoa, Dashboard by-example: A hypergraph-based approach to on-demand data warehousing systems, 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Seoul, 2012, pp. 1853–1858. doi: 10.1109/ICSMC.2012.6378008

12. Oberlander and Nowson, Whose thumb is it anyway? Classifying author personality from weblog text, In Proceedings of the COLING/ACL on Main Conference Poster Sessions (COLING-ACL ’06), Association for Computational Linguistics, Stroudsburg, PA, USA, 2006, pp. 627–634.

13. R.R. McCrae and O.P. John, An introduction to the five-factor model and its applications, J Pers 60(2) (1992), 175–215.

14. Mount, Ilies and Johnson, Relationship of personality traits and counterproductive work behaviors: The mediating effects of job satisfaction, Personnel Psychology 59 (2006).

15. T.B. Nguyen and N.S. Ngo, Semantic Cubing Platform enabling Interoperability Analysis among Cloud-based Linked Data Cubes, In Proceedings of the 8th International Conference on Research and Practical Issues of Enterprise Information Systems, CONFENIS 2014, ACM International Conference Proceedings Series, 2014.

16. B.T. Nguyen and Ngoc Dung, Big5 Tool for Tracking Personality Traits, In: Nguyen, Gaol, T.P. Hong and Trawiński (eds), Intelligent Information and Database Systems, ACIIDS 2019, Lecture Notes in Computer Science, vol 11431, Springer, Cham, 2019.

17. T.B. Nguyen and Wagner, Collective intelligent toolbox based on linked model framework, Journal of Intelligent and Fuzzy Systems 27(2) (2014), 601–609.

18. T.B. Nguyen, Wagner and Schoepp, Federated data warehousing application framework and platform-as-a-services to model virtual data marts in the clouds, International Journal of Intelligent Information and Database Systems 8(3) (2014), 280.

19. T.B. Nguyen, Wagner and Schoepp, EC4MACS - An Integrated Assessment Toolbox of Well-Established Modeling Tools to Explore the Synergies and Interactions between Climate Change, Air Quality and Other Policy Objectives, ICT-GLOW 2012 (2012), 94–108.

20. T.B. Nguyen, Wagner and Schoepp, GAINS-BI: Business Intelligent Approach for Greenhouse Gas and Air Pollution Interactions and Synergies Information System, In Proc of the International Organization for Information Integration and Web-based Application and Services IIWAS 2008, Linz, 2008.

21. Howlader, K.K. Pal, Cuzzocrea and S.D. Madhu Kumar, Predicting facebook-users’ personality based on status and linguistic features via flexible regression analysis techniques, In Proceedings of the 33rd Annual ACM Symposium on Applied Computing (SAC ’18), ACM, New York, NY, USA, 2018, pp. 339–345.

22. K.-H. Peng, L.-H. Liou, C.-S. Chang and D.-S. Lee, Predicting personality traits of Chinese users based on Facebook wall posts, 2015, pp. 9–14.

23. M.T. Tomlinson, Hinote and D.B. Bracewell, Predicting conscientiousness through semantic analysis of facebook posts, In Proc of Workshop on Computational Personality Recognition, AAAI Press, Menlo Park, CA, 2013.

24. Zhang and Gao, An improvement to naive bayes for text classification, Procedia Engineering 15 (2011), 2160–2164.

25. https://en.wikipedia.org/wiki/Bayes%27_theorem

26. https://en.wikipedia.org/wiki/Big_Five_personality_traits

27. https://en.wikipedia.org/wiki/Naive_Bayes_classifier

28. https://vaciniti.com/mobile-phone-users-worldwide/