Abstract
Engaged customers are a very important part of current social media marketing, and public figures and brands have to be very careful about what they post online. Accurate strategies for anticipating the impact of a post written for an online audience are therefore critical to any public brand. In this paper, we propose a method to predict the impact of a given post by accounting for its content, style, and behavioral attributes as well as metadata information. To validate our method, we collected Facebook posts from 10 public pages and performed experiments with almost 14,000 posts. We found that the content and the behavioral attributes of posts provide relevant information to our prediction model.
Keywords
Introduction
Nowadays, people worldwide are largely engaged with and attached to different types of Internet technologies and social media platforms. Combined, these technologies have provided new ways of exchanging feedback on products and services. As stated in [9], these circumstances have boosted customer empowerment. Accordingly, customers have the potential to become influential through their opinions, recommendations, or complaints.
This situation requires the constant incorporation of novel strategies for effectively managing brands’ aims and marketing plans, especially aspects related to customers’ involvement, relationship, and communication [1]. Thus, measuring the impact of produced advertising is an important issue that brands need to include in their social media management strategies [10]. According to previous research, the impact of a published post is measured through several available metrics, mainly related to consumers’ views, reactions, comments, and interactions. Hence, increasing the impact of published posts will lead to stronger relationships between brands and consumers, allowing customers to create valuable content through social media [14].
Recently, the electronic commerce and business research community has started to pay attention to how brands can effectively exploit the mechanisms available for interacting with their customers. Researchers have focused on studying phenomena such as the role of social media in advertising, electronic word of mouth, customer relationship management, and brand performance, among others [1, 13]. Although many works have proposed techniques for finding the relationships between online posts on social media and the impact of such publications as measured by user interactions, the vast majority of this research does so as an a posteriori analysis [1–4, 14]. That is, these works focus on finding the characteristics that made a post appealing to customers, obtaining valuable insights that enable the design of powerful marketing strategies. However, in spite of all the knowledge that these methodologies can provide to specific firms, it is not enough for predicting the impact a post will have prior to its publication. Therefore, a system able to anticipate the impact of individual posts can provide an enormous advantage when deciding what to communicate to customers through social media platforms.
In this paper, we propose a novel framework for predicting the impact of publishing posts on a social media network, namely Facebook. In contrast to traditional approaches in the field, our method represents posts with features that capture content, style, and behavioral characteristics. The proposed approach is based on a supervised machine learning strategy, which allows anticipating a post’s impact, i.e., either high or low impact. To validate the proposed method, we took on the task of collecting a dataset from ten renowned brands on Facebook Mexico. Our experiments, over more than 13,000 posts and six different classification problems, indicate that combining the proposed features with some metadata-based attributes allows an automatic system to obtain acceptable performance.
We foresee this work will represent an important contribution to the development of novel methodologies in the field of electronic commerce and business research, as well as motivate further research from the intelligent systems and text mining research communities.
The main contributions of this paper are as follows: We collected and labeled more than 13,000 posts from ten renowned brands on Facebook Mexico. This dataset represents a valuable resource for future research in the field of electronic commerce and business, as well as for the intelligent systems community. We provide evidence of the importance of content-based, stylistic, and behavioral features in combination with metadata-based attributes for solving the task of impact prediction on Facebook posts. We propose a novel framework, based on a supervised machine learning approach, for solving the problem of anticipating the impact of publishing a post on Facebook.
The rest of the paper is organized as follows. The next section provides a review of related work on the problem of social media and customer relationship management. Section 3 describes the methodology followed for collecting the dataset, how it was labeled, and provides some statistics regarding its composition. Section 4 explains the proposed framework, based on a supervised approach, for predicting the impact of publishing posts on Facebook. Section 5 describes the experimental setup and the results obtained in all the performed experiments. Finally, in Section 6 we draw some conclusions and outline future work directions.
Related work
Consumer engagement is measured by the number of activities performed by users within the social media platform. Normally, these activities vary from platform to platform 1; however, on Facebook, a typical set of metrics that help to evaluate the level of engagement is: generated reactions (positive, negative, and neutral), number of comments, and the number of times a post is shared [14]. Thus, posts with high or low values under these metrics are considered examples of high- or low-impact posts, respectively, reflecting a healthy or unhealthy customer engagement relationship. An additional metric is the ROI (return-on-investment) indicator, which is defined as the profit of an investment divided by the cost of the investment [7]. The ROI indicator is one of the most important engagement metrics employed by many companies [8]; however, the core of our research is not related to the field of ROI analytics, since we are not interested in the direct sales reported by companies. Instead, we aim at developing automatic models that can anticipate the impact of a publication in terms of popularity, i.e., how the customers reached will interact with a published post.
Accordingly, the literature establishes that the more capable organizations are of building and sustaining emotional and social ties between their customers and their brands, i.e., a healthy level of customer engagement, the greater the benefits that can be obtained. Therefore, many research groups have tackled the problem of how to contribute to both customer experience and customer relationships using social media platforms [1, 2].
On the one hand, the vast majority of previous work has approached the problem as a knowledge extraction task for designing powerful marketing strategies. In other words, this type of research analyzes the relationships between several variables and the level of customer engagement. Thus, it is possible to find the main characteristics that provoke customer manifestations (reactions, comments, and sharing). However, a major drawback of these approaches is that they do not consider using this knowledge as part of an automatic method for anticipating the impact of a post. Recent examples of this type of methodology can be found in [1–4, 14].
On the other hand, a few research works have proposed and evaluated distinct methodologies for implementing predictive systems [10, 18]. In [10], the authors proposed using seven features for representing the information contained in a post, namely: category of the post (action, product, or inspirational), the total likes of the brand’s page, the type of content (photo, video, or link), time of the publication, month, weekday and hour of the post, and a feature that indicates whether the post was paid advertising. These features were employed for predicting 12 distinct Facebook metrics. For their experiments, the authors employed an SVM regressor and evaluated their method on 790 posts from a cosmetics company’s page. A similar approach is described in [15], but for estimating the success of eBay smartphone sellers. To represent the data, the authors proposed nearly 20 metadata-based features extracted from the eBay platform, such as reachability and engagement (followers), customer feedback (number of positive and negative reviews), and seller information (name, country, etc.). In the work of [12], 164 posts from five distinct tourism brands in Spain were analyzed (the dataset is in Spanish). The authors trained a regression model for predicting the number of likes and the number of comments a post will generate. As features, they proposed the post richness (defined as the number of videos, pictures, and links included in the post), time frame (weekday and time of the publication), plus a couple of features associated with the size of the post (in characters) and the number of followers of the brand’s page. Similar to the research described above, a few studies analyze the importance of so-called contextual features (URLs, mentions, hashtags) for inferring the number of replies a tweet may provoke [6, 16]. Finally, in the work described in [18], the authors model the relationship between the text of a political blog post and the number of comments that the post will receive. They approached the problem both as a regression problem and as a classification task. An interesting aspect of this work is that, as features, the authors employed a topic-based representation (LDA) instead of metadata-based features. Given the nature of their data, they hypothesize that the topic of the post will influence the number of generated comments.
A common characteristic of previous research is the exclusion of text-based features (except for [18]). In contrast, our proposed framework incorporates three feature categories: stylistic, content-based, and behavioral. Our main hypothesis is that the content of a post (what it says) and the style in which it is written (how it says it), in combination with how the post is designed for interacting with the community (behavioral aspects), are important elements for accurately predicting the impact of a post. We validate our proposal on a dataset of nearly 14,000 posts from ten different brands on Facebook Mexico, and compare our results against traditional metadata-based features.
Dataset
Given the lack of a standard corpus for evaluating impact prediction systems, we took on the task of collecting and standardizing a large dataset 2 of Facebook posts from different brands that have an important presence in Mexico 3 . The collected corpus represents a valuable resource, in a non-English language, that can be used for training and evaluating automatic systems that aim at predicting several customer engagement metrics, specifically Facebook reactions (i.e., Like, Love, Haha, Wow, Sad, and Angry), the number of shares, and the number of comments generated by a post. Table 1 summarizes the composition of the dataset.
This table shows the absolute number of reactions (|R|), comments (|C|), and shares (|S|) in the dataset. Additionally, the average and standard deviation (σ) values of these characteristics are shown.
Observe in Table 1 that the brand with the highest number of reactions, comments, and shares is Cinépolis, a very well-known firm in Mexico devoted to the movie theater business. The second place in the number of reactions and shares is held by Muy Interesante México, a firm mainly dedicated to science and technology dissemination. It is interesting to notice that even though Cinépolis provokes a high number of manifestations from users, it is not the brand that publishes the largest number of posts; that brand is Muy Interesante México.
In Table 2 we show some basic statistics regarding the size of the corpus. The first three columns indicate the size of the collected data for each brand in terms of the number of tokens, the size of the vocabulary, and the lexical richness of the posts. The next two columns show the average number of tokens and characters contained in each post of each brand. For the latter two, the standard deviation is shown in parentheses.
This table shows the total number of tokens, vocabulary size, and lexical richness of each brand’s posts. Additionally, we show the average number of tokens and characters per post; the standard deviation is indicated in parentheses.
From Table 2 we can remark that the brands with the largest number of tokens are National Geographic and Discovery Channel, both dedicated to promoting a great variety of programs related to ecology, wildlife, and science, among others. Having a large number of tokens indicates that, in general, published posts from these brands are longer in terms of words per post. This phenomenon can be observed in the fifth column of Table 2, which reports the average number of tokens in the published posts. Lexical richness (LR) is a value that indicates how the terms from the vocabulary are used within a text. It is defined as the ratio between the vocabulary size and the number of tokens of a text (LR = |V|/|T|). Thus, a value close to 1 indicates a higher LR, which means most vocabulary terms are used only once, while values near 0 indicate that terms are reused more frequently (i.e., the text is more repetitive). In our dataset, observe that the brands with the lowest LR values are National Geographic and Cinépolis, which means their posts reuse a similar vocabulary. We hypothesize that this could be a marketing strategy since, in the case of Cinépolis, it allows them to reach a high number of consumer manifestations in their posts in spite of being repetitive.
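As a small illustration of the LR ratio defined above, the following sketch computes |V|/|T| for two toy token lists. Whitespace tokenization and the example sentences are assumptions made for illustration only; the paper does not specify the tokenizer used.

```python
def lexical_richness(tokens):
    """Return |V| / |T| for a list of tokens (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy posts, split on whitespace (an assumption, not the paper's tokenizer).
repetitive = "gana boletos gana premios gana boletos".split()
varied = "nueva función de estreno este viernes".split()

lr_repetitive = lexical_richness(repetitive)  # 3 distinct terms over 6 tokens
lr_varied = lexical_richness(varied)          # 6 distinct terms over 6 tokens
```

The repetitive post yields a value closer to 0, while the post in which every term appears once yields 1.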
As mentioned, our goal was to collect a dataset for evaluating the performance of automatic methods for determining the impact of publishing a post on Facebook, in other words, for anticipating consumer engagement. For this purpose, traditional engagement metrics were considered [5]: reactions, comments, and sharing.
Therefore, inspired by the work of [17, 18], we define the task of predicting consumer engagement as the process of classifying whether a post will have a higher (or lower) impact volume than the average seen in the training data. Even though more fine-grained predictions are possible as well (e.g., predicting the absolute number of distinct reactions, the number of provoked comments, and the number of times a post is shared), our goal in this paper was not to propose a methodology based on regression algorithms. Consequently, we define six binary classification problems, namely: i) comments (|C|), ii) sharing (|S|), iii) total reactions (|R|), iv) positive reactions (|R^+|), v) negative reactions (|R^-|), and vi) neutral reactions (|R^⊙|). Each classification problem has the categories high-impact and low-impact.
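The task definition above (a post is compared against the average of a metric) can be sketched as follows. This is a minimal illustration under stated assumptions: the field names are hypothetical, only three of the six metrics are shown, the average is taken over whatever posts are passed in, and a strict "greater than" comparison is assumed since the text does not say how ties are handled.

```python
from statistics import mean

# Hypothetical field names; the reaction subtypes would be handled the same way.
METRICS = ("comments", "shares", "reactions")


def label_posts(posts, metric):
    """Label each post high-impact if its metric value exceeds the average."""
    threshold = mean(post[metric] for post in posts)
    return [
        "high-impact" if post[metric] > threshold else "low-impact"
        for post in posts
    ]


posts = [
    {"comments": 2, "shares": 0, "reactions": 10},
    {"comments": 40, "shares": 5, "reactions": 300},
    {"comments": 3, "shares": 1, "reactions": 20},
    {"comments": 5, "shares": 30, "reactions": 70},
]

# One list of labels per classification problem.
labels = {metric: label_posts(posts, metric) for metric in METRICS}
```

Because a few very popular posts pull the average up, a rule of this kind tends to produce more low-impact than high-impact instances.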
The followed methodology for assigning each post’s category, i.e., either high- or low- impact, consists in the following steps: for each classification problem (i.e., the considered metrics), we compute the average value of metric k among all the posts from the ten brands, this is referred as
Number of high- and low- impact instances for each problem
Our general framework relies on the traditional pipeline of an automatic classification system. Classification is a learning problem in which a function F(x) = y needs to be learned from a series of pairs <x, y>, where x is an instance and y is the class of that instance. Here, y ∈ Y and |Y| is the total number of predefined classes for a given classification problem.
Particularly, our goal is to learn six functions, one for each metric that is relevant for assessing the overall impact of a Facebook post. Thus, our six classification problems are: impact of the total number of comments, impact of the number of shares, impact of total reactions, impact of positive reactions, impact of negative reactions, and impact of neutral reactions. The predefined classes for each of these problems are high-impact and low-impact.
The methodology used in all classification problems is shown in Figure 1. Each process in the figure is explained below:

General framework of our proposed method.
First, each post (p) is preprocessed. The general idea of this step is to standardize the content of the posts to avoid having textual attributes with irrelevant semantic information. For instance, we do not care about the specific URLs included in the posts; we only need to know that a post has a URL.
To this end, we replace every URL, hashtag, emoji, and user mention with a unique tag: <url>, <hashtag>, <emoji>, and <mentions>, respectively.
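A minimal sketch of this replacement step, using regular expressions, is shown below. The patterns are assumptions for illustration: the paper does not give its exact rules, mentions are assumed to start with "@", and the emoji character ranges are rough rather than exhaustive.

```python
import re

# Illustrative patterns (assumptions, not the paper's actual rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# Rough emoji ranges; not an exhaustive definition of emoji.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def normalize_post(text):
    """Replace URLs, hashtags, mentions, and emojis with generic tags."""
    # URLs go first so that a '#' fragment inside a URL is not read as a hashtag.
    text = URL_RE.sub("<url>", text)
    text = HASHTAG_RE.sub("<hashtag>", text)
    text = MENTION_RE.sub("<mentions>", text)
    text = EMOJI_RE.sub("<emoji>", text)
    return text


example = normalize_post(
    "Nuevo curso 📷 con @fotoclub #foto https://example.com/curso"
)
```

After normalization, two posts that link to different pages or use different hashtags share the same tags, so the classifier sees the presence of these elements rather than their specific values.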
Feature selection
The next process in Figure 1 is feature selection. As established, we want to include information about what is being posted as well as how the posts are written. Consequently, in this process of our methodology we extracted the following types of features: content-based features, which capture the what; style-based and behavioral features, which capture the how; and two additional metadata-based feature types (as these are usually included in previous work): Interaction and Time.
For the
Additionally, we include two types of metadata attributes: type of links included in the posts (we called this type
Representation
Once the corresponding feature type has been selected, we represent each post as a multidimensional vector, where the number of dimensions corresponds to the total number of features of a given representation.
The vectors are normalized to values between 0 and 1 to reduce the impact of the different ranges of the different types of features.
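One common way to bring features into the [0, 1] range is per-feature min-max scaling, sketched below. The paper does not state which normalization was used, so min-max scaling and the toy feature values are assumptions.

```python
def min_max_normalize(vectors):
    """Scale each feature (column) to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in vectors
    ]


# Toy vectors: [number of hashtags, post length in characters].
raw = [[0, 120], [2, 480], [4, 300]]
scaled = min_max_normalize(raw)
```

Without this step, a feature such as post length in characters would dominate distance-based learners like k-NN simply because of its larger numeric range.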
Classification model
The fourth phase of the general framework (see Figure 1) is to apply a learning algorithm to each classification problem. For this stage, we apply four algorithms widely used for text classification, selecting one algorithm from each of four different families: probabilistic (Naïve Bayes), decision trees (DT), kernel-based (SVM), and instance-based (k-NN).
As mentioned, to provide an overview of the general impact of a post on consumers, we build six different prediction models. In the end, the content manager of a given brand can determine the overall impact from the predicted impact on comments, shares, total reactions, positive reactions, negative reactions, and neutral reactions.
In the next section we describe the experiments performed as well as the obtained results.
Experiments and results
To test our proposed method, we used the filtered dataset (FL in Table 2), with a total of 13,651 Facebook posts. To evaluate classification performance we used the F-score metric, and for all experiments we employed stratified 10-fold cross-validation. Note that we do not make any distinction among the posts of particular brands; we aim at building a general classifier instead of a specific model for each brand.
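The evaluation protocol described above can be sketched in plain Python as follows. This is only an illustration under assumptions: folds are built round-robin without shuffling, and the F-score is computed for a single positive class, since the text does not state whether a per-class or averaged F-score was reported.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign instance indices to k folds, class by class (round-robin)."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def f_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for the given positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, unbalanced label set: every fold keeps the 20/80 class proportion.
labels = ["high-impact"] * 20 + ["low-impact"] * 80
folds = stratified_folds(labels, k=10)

gold = ["high-impact", "high-impact", "low-impact", "low-impact"]
pred = ["high-impact", "low-impact", "low-impact", "high-impact"]
score = f_score(gold, pred, positive="high-impact")
```

Stratification matters here because the classes are unbalanced: a plain random split could leave some folds with very few high-impact posts.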
One of our research questions asks whether combining the what and the how when representing a post predicts the impact on our six metrics better than using only features that capture the how. With this in mind, we performed two sets of experiments. First, we used only single types of attributes as features for representing a post’s information. This configuration aims at validating the pertinence of these types of features, as proposed in much of the previous work. Second, we incorporated the content features to determine the effect of considering textual information on the posed task.
Figure 2 shows the results obtained for the first set of experiments. It is important to mention that the size of the representation vector for each of these experiments is very small (between 4 and 5 features). One detail to notice in Figure 2 is that the best classification algorithm for all problems is Decision Trees, which makes sense given the small number of attributes used to represent each post. We can also observe that the style attribute alone is the second-best predictor for each problem. However, the best performance is obtained when a combination of the four types of attributes is used (b+s+i+t). Among the least useful sets of features are the metadata-based ones, interactions and time, with which all instances were classified as the majority class (i.e., low-impact); therefore, these two types of features are not used hereafter in the second set of experiments.

Performance of our method for predicting the impact of brands’ posts using our proposed feature types independently. b+s+i+t stands for the combination of behavioral, style, interaction, and time features in the vector representation.
Figure 2 shows very consistent results for all problems; nevertheless, some minor aspects are worth mentioning. For instance, the most difficult classification problem is predicting the impact of negative reactions, whereas one of the best performances is obtained when predicting the impact of positive reactions. These results may be due to the fact that the positive reactions problem is trained with a slightly less unbalanced dataset than the negative reactions problem (see Table 3).
Figure 3 shows the results of the second set of experiments. To include the text, we used a traditional bag-of-words approach to represent each post, keeping only the 10,000 most frequent tokens in each problem. The black solid line in each graph indicates the best performance of the previous set of experiments (i.e., from Figure 2). In general, we notice that for all problems, using the content feature (alone or in combination with other types of features) outperformed the best results obtained using only single types of attributes. We compared the best performance from the first set of experiments (black solid line) against the best result obtained in the second set of experiments (c+b+s+i+t in Figure 3) and found that in five out of six problems the differences are statistically significant with p < 0.0001; for the sixth problem, Positive Reactions, the difference is also statistically significant, but with p = 0.01 (for this test we used a two-tailed t-test). Another aspect to note is that the best learning algorithm for four out of six problems is the probabilistic one (Naïve Bayes), whereas Support Vector Machines is the best algorithm for predicting the impact of negative and neutral reactions.
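The bag-of-words representation with a frequency-limited vocabulary can be sketched as follows. This is an illustration under assumptions: raw term counts are used as weights (the text does not say whether counts, binary values, or another weighting was used), and the toy vocabulary is capped at 3 tokens instead of 10,000.

```python
from collections import Counter


def build_vocabulary(tokenized_posts, max_size=10000):
    """Map the max_size most frequent tokens to vector positions."""
    counts = Counter(token for post in tokenized_posts for token in post)
    return {
        token: position
        for position, (token, _) in enumerate(counts.most_common(max_size))
    }


def bag_of_words(tokens, vocabulary):
    """Count the occurrences of each vocabulary token in one post."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        position = vocabulary.get(token)
        if position is not None:
            vector[position] += 1
    return vector


# Toy posts after the tag-replacement step, split on whitespace.
posts = [
    "nuevo curso de foto <url>".split(),
    "curso de foto gratis <hashtag>".split(),
    "foto del día <emoji>".split(),
]
vocab = build_vocabulary(posts, max_size=3)
vectors = [bag_of_words(post, vocab) for post in posts]
```

Tokens outside the vocabulary are simply ignored, which is what keeps the vector size bounded regardless of how many distinct words appear in the corpus.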

Performance of our method for predicting the impact of brands’ posts using our proposed feature types in combination with the text (content feature). c, b, s, i, and t stand for content, behavioral, style, interaction, and time features, respectively.
As shown in Figure 2, the poorest performance was obtained in predicting the impact of negative and neutral reactions; that is, those two problems are very difficult to solve. In contrast, the best overall performances were obtained for predicting the impact of total reactions and positive reactions, followed by predicting the impact of shares and comments.
In all the evaluated classification problems, the best performance was obtained using the combination of all our proposed features: the what and the how plus the metadata information. On the one hand, the small difference between the performance obtained using only the content feature (the what) and the best results, particularly for predicting the impact of Comments and Shares, gives us some indication of the importance of content in predicting our six variables. On the other hand, for predicting reactions (total or positive reactions, particularly) there is a clear improvement in performance when combining content with behavioral features. This means that the number of occurrences of hashtags, emojis, user mentions, and links is important for predicting the impact of a post. These features were included to provide some information about the social media lingo used when communicating. According to our results, including this type of attribute helps to reach consumers and induces them to express their feelings towards the brand.
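As an illustration of the behavioral information discussed above, the sketch below counts the occurrences of each generic tag in a post that has already gone through the tag-replacement step. The function name, the tag order, and the example post are assumptions; the paper's exact behavioral feature set may differ.

```python
# Tags produced by the preprocessing step; the order is an arbitrary choice.
BEHAVIORAL_TAGS = ("<hashtag>", "<emoji>", "<mentions>", "<url>")


def behavioral_features(normalized_text):
    """Return how many times each tag occurs in a normalized post."""
    return [normalized_text.count(tag) for tag in BEHAVIORAL_TAGS]


features = behavioral_features(
    "Gana boletos <emoji> <emoji> con <mentions> <hashtag> <hashtag> <url>"
)
```

A vector of this kind is very small (four values), which is consistent with the compact representations used in the first set of experiments.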
Aside from the prediction tasks described above, the proposed approach itself can be informative for people in charge of designing marketing strategies. As stated so far, our proposed framework is able to determine the impact of publishing a post on Facebook. Given that part of our goal was to design a generic impact prediction method, i.e., one that is not brand dependent, our approach allows us to identify characteristics of high- and low-impact posts.
In order to exemplify the type of information that can be obtained with our proposed method, we retrieved eight examples (four high-impact and four low-impact posts) and analyzed their characteristics. In Figure 4 we show two high-impact posts (a and b) and two low-impact posts (e and f) from Cannon Mexicana. Similarly, we retrieved two high-impact posts (c and d) and two low-impact posts (g and h) from Nikon’s Facebook page.

Examples of high-impact (a, b, c, and d) and low-impact (e, f, g, and h) published posts extracted from Cannon Mexicana and Nikon.
Given the nature of these two brands, we found it interesting to analyze their publications. These two firms compete in the field of photography; both promote photography courses, professional photography equipment, etc. In Table 1, notice that Nikon publishes slightly more posts than Cannon Mexicana (1,357 vs. 1,157). However, Cannon Mexicana has a significantly greater number of reactions, comments, and shares than Nikon; for example, Cannon has more than 2 million reactions while Nikon has only 829,560. After examining their most representative posts (Figure 4), we notice the following: i) high-impact Cannon posts interact with customers in a more youthful way, employing emojis, drawings, and less formal language; ii) in contrast, Nikon uses a more formal style of writing, and its posts refer mainly to photography courses, while Cannon’s posts refer to photographers’ activities or situations.
With respect to the low-impact posts of both firms, it is interesting that Cannon’s less popular posts talk about technicalities of the cameras, such as focus points and lens characteristics. A similar phenomenon occurs in the case of Nikon, whose less popular posts talk about the results of a workshop. Thus, as a preliminary result of this analysis, we could conclude that Nikon needs to produce less formal posts in order to reach a higher level of consumer engagement, in other words, to change the way it uses emojis, hashtags, mentions, or links in its published posts.
We performed a similar analysis of the high- and low-impact posts from Discovery Channel and National Geographic. Due to lack of space, we do not show the most relevant posts obtained. However, for this particular case, we found that for both brands the less popular publications always refer to the TV programming of their respective channels. Regarding their most popular posts, we found that every time these brands publish something related to science and technology dissemination, customers engage positively.
This paper focused on proposing a novel framework for anticipating the impact of publishing a post on a company’s Facebook page. Our main hypothesis is that if an automatic classification algorithm is able to accurately model what a post says and how it is written, then it will be possible to predict its impact, i.e., its consumer engagement level. Thus, our proposed approach incorporates features that capture content, style, and behavioral characteristics of posts.
In order to validate our hypothesis, and given the lack of a standard corpus for evaluating this type of approach, we took on the task of collecting and standardizing a large dataset of Facebook posts from different brands in Mexico. The collected corpus represents a major contribution of this work, and aims at providing resources for future research in non-English languages. Accordingly, we evaluated our proposed approach in predicting traditional engagement metrics, such as reactions (total, positive, negative, and neutral), comments, and sharing. We performed experiments on our collected dataset, which contains more than 13,000 posts from ten different brands on Facebook Mexico, and compared our results against traditional metadata-based features. The results obtained indicate that what companies write and how they write it, in combination with some traditional metadata-based features, yields the best performance. A qualitative analysis allowed us to observe which aspects our proposed model is learning. For instance, we noticed that for some particular brands competing in the same market, their behavior (i.e., the use of emojis, hashtags, or mentions), in combination with the topics of the post, is very important for improving customer engagement.
Some relevant advantages of the proposed method are that it is a language-independent approach, it is not biased towards a specific brand or product type, and it yields relevant insights that could be beneficial for community managers.
Several ideas for future work arise from this initial research. First, the proposed model could be enriched with other stylistic and content features. For example, character n-grams are known to provide valuable stylistic information. Regarding content, we plan to incorporate some topic-based features, such as LDA or second-order representations. Finally, there is some evidence of the relevance of a post’s sentiment as a feature; we plan to evaluate how beneficial it would be to incorporate this type of feature in our framework.
Footnotes
Acknowledgments
The first author was partially supported by the CONACyT Thematic Networks program (RedTTL Language Technologies Network) with project numbers 281795 and 295022, and by UAM Cuajimalpa. The third author was partially supported by the ADOBE project, from Idiap Research Institute, Switzerland, and by the Information Technologies Department of UAM Cuajimalpa, Mexico. The authors also thank the Information Technologies Department of UAM Cuajimalpa, Mexico, for the facilities provided to develop this research. Finally, we thank Orlando Hernández Hernández, who was responsible for developing part of the tools that helped in the collection of the compiled corpus.
Compilation of the data was done from November 2018 to January 2019.
