From words to connections: Word use similarity as an honest signal conducive to employees’ digital communication

Abstract

Bringing together considerations from three research trends (honest signals of collaboration, socio-semantic networks and homophily theory), we hypothesise that word use similarity and having similar social network positions are linked with the level of employees’ digital interaction. To verify our hypothesis, we analyse the communication of close to 1600 employees, interacting on the intranet communication forum of a large company. We study their social dynamics and the ‘honest signals’ that, in past research, proved to be conducive to employees’ engagement and collaboration. We find that word use similarity is the main driver of interaction, much more than other language characteristics or similarity in network position. Our results suggest carefully choosing the language according to the target audience and have practical implications for both company managers and online community administrators. Understanding how to better use language could, for example, support the development of knowledge sharing practices or internal communication campaigns.

Keywords

Digital communication; homophily; honest signals; language similarity; organisational behaviour; social network analysis; text mining

1. Introduction

The engagement of employees in an organisation contributes to their well-being [1,2] and to the organisation’s performance [3]. Employees’ engagement is realised through communication [4], which encompasses both the language people use to communicate and the interactions and relationships they have [5–7]. Nowadays, in most organisations, communication also happens through digital platforms [8,9] (email, intranet, online forums), where employees can interact and exchange opinions and ideas [10]. Use of digital communication has been shown to predict employees’ engagement level [11–13] and to impact future business performance [11,14]. But what might foster employees’ use of digital communication?

In recent years, several studies [11,15–20] have used a network perspective to answer this question by studying patterns of communication as manifested in language use and interactions between employees. They seek to capture ‘honest signals’ conducive to collaboration and communication among people [21], particularly signals appearing in digital conversations.

Adopting another stance, studies of socio-semantic networks have shown that interactions and relationships between people are linked to the similarity of the words and expressions they use, both face to face and online [22–25], while similarity in their attitudes, points of view and roles has been linked to their occupying similar positions in the organisation’s social network [26–29].

This study proposes to consider similarity of word use and of social network position in order to contribute to a better measurement of ‘honest signals’ conducive to digital communication between employees. Being able to capture these signals could enable the organisation to foster effective digital internal communications and monitor their positive impact on the company performance.

2. Honest signals

The concept of honest signal is borrowed from evolutionary biology [30]. A signal is an attribute or a behaviour that changes the behaviour of receivers to the benefit of the sender. An honest signal is one that reduces the uncertainty of a situation [31]: for example, the cry of a bird signalling the presence of a predator. Pentland [21] argues that the concept can be applied to human behaviour to understand communication among employees within organisations. When we communicate with other people, there are subtle behavioural patterns that reveal our attitude towards them; these are unconscious social signals that complement our conscious language [21]. Indeed, our social behaviour and use of language are the result of both conscious (intentional) and unconscious mental processes [32,33], jointly creating ‘honest signals’.

In their quest for honest signals favourable to collaboration and communication, Gloor and colleagues have considered three dimensions – language, social connectivity and interactivity – that can act as signals for employees’ participation in the digital conversation. In order to capture those signals, they used social and semantic metrics of digital communication and showed their signalling power in several business contexts [11,15,16].

Three aspects of language are included in the framework: complexity, sentiment and emotionality. Language complexity refers mostly to shared language: the more one uses dissimilar words (not used by others), the more one’s language is considered complex. Sentiment expresses the positivity or negativity of the words used, whereas emotionality expresses the deviation from neutral sentiment or, in other words, the intensity of positive or negative feelings.

Social connectivity looks at an individual’s position within a communication network: the more central the individual is, the higher his or her connectivity. Interactivity tries to assess how active individuals are in the network by analysing the variation in their network position over time or by looking at how quickly (or slowly) they respond to posts, comments or emails.

In studies using this framework, authors have shown that low language complexity plays an important role in email communication with customers, ultimately improving their satisfaction [11]. In the same vein, the use of a shared language proved to support online community growth [20]. Hence, it appears that greater similarity in word use (less complexity) favours better and more active digital communication. Finally, language complexity and emotionality in the email communication of company managers were also shown to be linked to their level of engagement with their job [34].

Concerning interactivity, it has been shown that people’s oscillations between central and peripheral positions in the network (called ‘rotating leadership’ by the authors) play a very important role in online communities, favouring their growth and participation [20], and that shorter response times to emails support better communication between clients and service providers [11]. There was also a significant association between rotating leadership and team creativity [35,36]. Similarly, a higher level of rotating leadership proved to support the innovation capability of start-ups [37] and the probability that they would survive external shocks [38].

The results highlighted above show that the metrics for measuring the dimensions of language, social connectivity and interactivity uncover ‘signals’ that can give managers invaluable information, leading to efficient action to support employees’ engagement. As maintained by Gloor [15] – who used these metrics to analyse digital communication in more than 200 organisations – considering language use, social structure and interactivity together is necessary to obtain a complete view and comprehensively evaluate social dynamics in business contexts.

2.1. A socio-semantic perspective

With regard to language and interaction, studies have shown that interactions and relationships between people are linked to the similarity of the words they use [22–25,39–42]. These studies use social network analysis to show how word use and relationships are intertwined.

More specifically, these studies are intended to explain and model the emergence of similarities between people’s word use and the emergence, development and severing of social bonds. For example, Carley [43–45] argues that interactions can lead to a shared vocabulary and that word use is affected by changes over time in people’s conceptual and social environments; Roth and Cointet [23] have studied how social and socio-semantic networks (two-mode networks linking people and the words they use) co-evolve in networks of scientific collaborators and bloggers; Saint-Charles and Mongeau [25] have shown that centrality in an influence network is linked to word use similarity in workgroups and that this relation is transformed during the life of a group.

According to the theory of homophily [46,47], similarity leads to the development of relationships based on attraction to others who are deemed similar to us [48]. Although most studies have explored similarities based on socio-demographic variables, several authors have extended their analysis to a wide range of variables – including attitudes, psychological traits and values – that are seen as latent homophily factors [49–52], and have shown that the ‘homophily phenomenon’ is complex and is not based solely on socio-demographic factors. Relational aspects, assortative mechanisms based on individual attributes and proximity factors can all influence the way people communicate and the frequency of their communication [37,53–57]. As for network position, it appears that people occupying similar structural positions in the network [26–29,58] tend to share similar opinions and behaviours. Maciel De Oliveira [59] shows that similarities between students occupying equivalent structural positions – and specifically a central position – enhance their tendency to identify with one another and to choose one another as workgroup partners. In a study of a large online network, Roy et al. [60] have shown that similarity in centrality is linked with role similarity of actors in the network. Finally, in an online learning community, Cho et al. [61] showed that people move from peripheral participation towards full participation in the community of practice. In that process, the central actor’s position was significantly correlated with the amount of information shared.

Given these findings, we assume that word use similarities between people, and particularly between people occupying similar centrality positions in a network, can act as an honest signal conducive to the development of social interactions in a digital communication network. Therefore, we hypothesise that:

H1. Employees’ word use similarity is positively correlated with dyadic digital interactions.

H2. The similarity of employees’ centrality in the digital communication network is correlated with dyadic interactions.

In other words, the more dyads are similar in their word use and their centrality position in the network, the more they will interact (or vice versa). Aside from their theoretical relevance in the search for an honest signal of collaboration and communication, these similarity measures should prove useful in the specific context of studying internal digital communication systems as they offer a way to identify clusters and inequalities in the distribution of interactions.

3. Research design

In order to verify our hypotheses, we analysed the digital communications of some 1600 employees working for a large company. This company has an intranet social network, structured as an online forum in which only employees can interact, exchanging opinions and ideas as they share news and comments. In the style of well-known online platforms such as Reddit or the Tripadvisor travel forum, employees can either open new threads or comment on issues or news items that have already been posted. User posts generally discuss topics related to company performance or its internal and external initiatives – for example, employees comment on changes in human resource (HR) policies, news of the company’s performance in terms of earnings or stock prices and news of the company’s technological investments or its latest patent applications. The forum is also used to discuss work-related topics, with employees helping one another or sharing knowledge, for example, in order to find solutions to technical problems. In this context, we were able to extract and analyse more than 23,000 posts (news and comments) written in Italian over a period of a year and a half. Corpus statistics are provided in Table 1.

Table 1.

Corpus statistics.

Number of posts	23,031
Total number of words (tokens)	2,440,467
Number of unique words (types)	72,973
Type–token ratio	2.99%
Words that occur only once (hapax)	31,978
Hapax–type ratio	43.82%

Users were mostly males (66% in our sample, also reflecting the percentage of male employees in the company) and a small proportion of them (7%) also acted as forum content managers. Content managers worked in the company’s internal communication department and were responsible for shepherding online conversations, opening new topics and answering users’ questions. However, their assignment was not formal, so they were informally leading online discussion as content managers, without using an institutional writing style. Under agreed privacy arrangements, we are prohibited from revealing the company name or other details that could help in its identification. Data were processed in such a way as to protect employee anonymity: names were changed into numerical codes and message contents were not read (even though all messages were public on the company intranet). This is why we could not carry out a more in-depth content analysis, for example, through topic modelling [62]. Reading messages was not necessary for the calculation of the semantic variables included in this study.

The first step in our analysis was to build a social network representing forum interactions. This network is made of N nodes, one for each forum user, and M edges. There is an edge between two nodes if the corresponding employees had at least one interaction – for example, they exchanged knowledge or opinions through comments, or one answered the other’s question. We then proceeded to calculate similarity measures for both discourse and network centrality position. Figure 1 shows our network, excluding isolates and with node size scaled by betweenness centrality. The average degree is 18.78 and the average distance among reachable pairs is 2.26.

Figure 1.

Interaction network.

3.1. Language similarity measures

Using the framework developed by Gloor and colleagues [11,15] and the network-based studies on language and relationships cited above, we looked at five aspects of language similarity: word use (shared words between dyads), sentiment, emotionality, complexity and length of each post.

Text similarity has been calculated using the Python programming language and the NLTK package. Original posts were preprocessed to put them in lowercase and to remove punctuation, special characters and stop words – that is, those words such as ‘the’ or ‘and’ that usually contribute little to the meaning of a sentence [63]. Subsequently, we also extracted stems by removing word affixes through the Snowball stemmer [64,65] – this is an important step in dealing with the Italian language because many different affixes are used to distinguish between plural, singular, masculine and feminine forms, but the root word remains the same. Lemmatisation would be an alternative approach, which can work well with languages like Italian that are morphologically richer than English, when an adequate vocabulary is available. After preprocessing, we transformed the posts written by each employee into term frequency–inverse document frequency (tf–idf) vectors (applying L2 normalisation to take into account the different length of documents) and then calculated cosine similarity scores for each pair of users [66]. Tf–idf vectors are commonly used in text analysis in order to attribute greater importance to words that can better characterise a text document. Specifically, words that frequently appear in a document, but are not frequently used in all other documents, obtain a higher weight [67]. Following this procedure, very common terms are almost filtered out in the process of identifying similarities in employees’ word use.
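The last steps of this procedure (tf–idf weighting, L2 normalisation and pairwise cosine similarity) can be illustrated with a minimal sketch in plain Python. It is not the code used in the study: the tokens are hypothetical stems standing in for already preprocessed posts, each user's posts are assumed to be concatenated into one document, and the weighting tf × log(N/df) is only one common tf–idf variant, chosen here as an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict mapping a user id to the list of preprocessed tokens
    (stop words removed, stemmed) from all of that user's posts."""
    n_docs = len(docs)
    # Document frequency: number of users whose text contains each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for user, tokens in docs.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        # L2 normalisation; an all-zero vector is left unchanged.
        vectors[user] = {t: v / norm for t, v in vec.items()} if norm > 0 else vec
    return vectors


def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors (equals their cosine)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical stemmed tokens for three users.
docs = {
    "u1": ["budget", "brevett", "invest", "brevett"],
    "u2": ["brevett", "invest", "tecnolog"],
    "u3": ["ferie", "turn", "mens"],
}
vecs = tfidf_vectors(docs)
sim_12 = cosine(vecs["u1"], vecs["u2"])  # users sharing some stems
sim_13 = cosine(vecs["u1"], vecs["u3"])  # users sharing no stems
```

In this toy example the pair with shared stems obtains a positive score and the pair with no shared stems obtains zero, which is the property the dyadic text similarity matrix relies on.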

We also investigated employees’ use of language by looking at the sentiment, emotionality, complexity and length of their forum posts. Length is simply calculated as the average number of characters used in forum posts by an employee once stop words and punctuation have been removed. Sentiment (positivity or negativity of forum posts) is calculated with the machine-learning algorithm included in the Condor social network and semantic analysis software [15]. For the English language, this algorithm was trained on the basis of a large set of pre-classified text documents extracted from Twitter, as detailed in the work of Brönnimann [68]. For the Italian language, the developers of Condor followed a similar procedure and extracted a first training set from Twitter. Subsequently, we collaborated with them to refine the first version of their classifier and improve its accuracy in our research setting. For this purpose, we extracted 1000 random intranet posts (of a length consistent with the average length of all posts) and asked two independent annotators to classify them as positive, neutral or negative. Initial ratings were consistent in more than 91% of cases. The two annotators subsequently met to review and agree on the discordant ratings. The classified documents were used to extend the original training set and improve the classifier’s accuracy. Sentiment varies between 0 and 1, where 0 represents a totally negative post and 1 a totally positive one.

Emotionality expresses the deviation from neutral sentiment and is computed by Condor using the formula introduced by Brönnimann [69]

\mathrm{Emotionality} = 2 \cdot \sqrt{\sum_{i = 1}^{n} \frac{(0.5 - \mathrm{Sentiment}(w_{i}))^{2}}{n}}

where Sentiment(w_i) is the sentiment value calculated for word i, and n is the number of words that appear in a single post. Posts containing words that are either strongly positive or negative are considered highly emotional. Emotionality has been frequently used in past research analysing employees’ communication; it has been shown to be significantly associated with employees’ performance and engagement levels [16,34].
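The emotionality formula can be written as a short function. The sketch below is an illustration only: the word-level sentiment scores are made-up inputs (in the study they come from Condor), and returning zero for an empty post is our assumption.

```python
import math


def emotionality(word_sentiments):
    """word_sentiments: sentiment scores in [0, 1], one per word of a post."""
    n = len(word_sentiments)
    if n == 0:
        return 0.0  # assumption: an empty post is treated as neutral
    # Mean squared deviation from the neutral sentiment value of 0.5.
    mean_sq_dev = sum((0.5 - s) ** 2 for s in word_sentiments) / n
    return 2 * math.sqrt(mean_sq_dev)


neutral = emotionality([0.5, 0.5, 0.5])       # all words neutral
extreme = emotionality([0.0, 1.0, 1.0, 0.0])  # all words at the extremes
mixed = emotionality([0.5, 1.0])              # one neutral, one fully positive
```

The factor 2 rescales the measure so that it ranges from 0 (every word neutral) to 1 (every word totally positive or totally negative).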

Finally, complexity represents the deviation from common language and is calculated based on the probability that each word will appear in the text based on the Tf–idf information retrieval metric [69]

\mathrm{Complexity} = \frac{1}{n} \sum_{w \in V} f(w) \log \frac{N}{n_{w}}

where n is the total number of words in a post, V is the vocabulary of words that appear in all intranet posts, f(w) is the frequency of word w in the post, N is the total number of posts, and n_w is the number of posts that contain the word w. When rare terms appear more often in a forum post, its complexity is higher. This last measure was also calculated through Condor. Among the many software options, we chose Condor for semantic and network metrics as this software proved to be useful and reliable in past research [11,15,34]. Moreover, its development has been ongoing for many years. We averaged scores of sentiment, emotionality and complexity to obtain metrics at the individual level (we considered all the posts written by each user).
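A minimal sketch of the complexity formula follows. It is not Condor's implementation: the tokenised posts are hypothetical, the natural logarithm is assumed because the formula does not state a base, and the post being scored is assumed to belong to the corpus (so that every n_w is at least 1).

```python
import math
from collections import Counter


def complexity(post_tokens, all_posts):
    """Average idf-style weight of the words in one post.

    post_tokens: list of words in the post being scored.
    all_posts:   list of token lists, one per post in the corpus
                 (assumed to include the post being scored).
    """
    n = len(post_tokens)
    if n == 0:
        return 0.0  # assumption: an empty post has zero complexity
    total_posts = len(all_posts)  # N in the formula
    # n_w: number of posts containing each word.
    posts_with_word = Counter()
    for tokens in all_posts:
        posts_with_word.update(set(tokens))
    freq = Counter(post_tokens)  # f(w) in the formula
    weighted = sum(f * math.log(total_posts / posts_with_word[w])
                   for w, f in freq.items())
    return weighted / n


# Hypothetical stemmed posts.
posts = [
    ["riunion", "budget"],
    ["riunion", "brevett"],
    ["riunion", "riunion"],
]
common_only = complexity(posts[2], posts)  # only a word used in every post
with_rare = complexity(posts[1], posts)    # contains a word used in one post
```

A post made only of words that occur in every post scores zero, while the presence of a rare word raises the score, consistent with the interpretation given above.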

3.2. Network position similarity measures

To study the similarity of the connectivity and interactivity aspects of the framework proposed by Gloor [11], we used the indexes presented below. To measure connectivity, we used network centrality – a construct commonly used in social network analysis to rank the position of social actors [70]. To measure centrality, we referred to the two well-known metrics of degree and betweenness centrality. Degree centrality measures the number of direct links of a node, that is, the number of people an employee directly interacted with in the online forum [70,71]. Betweenness centrality considers the indirect links of a node and counts how many times a social actor lies in between the paths that interconnect his or her peers. Betweenness centrality is calculated by considering the shortest network paths that interconnect every possible pair of nodes and counting how many times these paths include a specific employee (i.e. the node for which the betweenness centrality is calculated) [71].
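For readers unfamiliar with the two metrics, the following sketch computes them on a toy graph. It is an illustration, not the software used in the study: the adjacency dictionary is hypothetical, the graph is assumed to be undirected and unweighted, and betweenness is left unnormalised and computed with Brandes' algorithm.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness for an unweighted, undirected graph
    (Brandes' algorithm). adj maps each node to a set of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical chain of four employees: a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In the chain, the two inner employees each lie on the shortest paths of two other pairs, whereas the two end points broker nothing.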

Interactivity takes into account the evolution of the social dynamics over time. We operationalised interactivity by calculating rotating leadership. This variable counts the weekly oscillations in a social actor’s betweenness centrality, that is, the number of times his or her betweenness centrality changed significantly, with an absolute variation of at least 30% of the original value. Rotating leadership refers to informal communication leaders who move from central positions with high brokerage power to more peripheral roles (and vice versa) – thus permitting other employees to become central and mediate online interactions. The term ‘leader’ is therefore meant to identify people who are central in the communication network, but it has nothing to do with formally appointed leaders [20]. The metric was calculated according to the procedure presented by Kidane and Gloor [35] and based on past research that considered weekly intervals appropriate for business contexts [11,34]. We also tried changing the 30% variation threshold in betweenness centrality, without obtaining better or significantly different results. If an employee maintains a static position, that person’s rotating leadership is zero. By contrast, we have rotating leaders when people oscillate between central and peripheral positions, activating or taking the lead in some conversations and then leaving space to other people in the network. The presence of rotating leaders has been shown to support the growth of online communities [20]. Rotating leadership dynamics also proved to favour collaborative innovation [72] and the performance of start-ups [37].
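One possible reading of this definition is sketched below. It should not be taken as the procedure of Kidane and Gloor [35], which is not reproduced here: we simply count week-to-week changes in betweenness of at least 30% relative to the previous week, and we assume that any move away from a betweenness of zero counts as a change. The weekly series are invented for illustration.

```python
def rotating_leadership(weekly_betweenness, threshold=0.30):
    """Count significant week-to-week changes in betweenness centrality.

    weekly_betweenness: non-negative values, one per week, for one employee.
    threshold: minimum relative change (0.30 = 30%) to count as significant.
    """
    count = 0
    for prev, curr in zip(weekly_betweenness, weekly_betweenness[1:]):
        if prev == 0:
            # Assumption: relative change is undefined at zero, so any
            # move away from zero is counted as a significant change.
            changed = curr != 0
        else:
            changed = abs(curr - prev) / prev >= threshold
        if changed:
            count += 1
    return count


static_employee = rotating_leadership([10, 10, 10, 10])
rotating_employee = rotating_leadership([10, 2, 9, 9.5, 3])
```

With these inputs the static series yields zero, while the oscillating series yields three significant changes (10 to 2, 2 to 9 and 9.5 to 3); the move from 9 to 9.5 falls below the threshold.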

An additional metric often used to represent interactivity is the average response time taken by users to answer comments and questions directed to them. However, we could not compute this metric, as the timestamps associated with forum posts were not accurate enough. The timestamps were reliable with respect to the day/week, but not to the hour/minute, which is necessary to distinguish immediate answers from those which arrive several hours later. Measuring average response time in fractions of hours is indeed a common practice when analysing digital communication in business settings; most answers usually appear within a few hours [11,16,34]. Accordingly, using timestamps with day/week resolution to calculate average response time would be inappropriate and would bias the results.

3.3. Control variables

The control variables we could access were employees’ gender and forum role (content manager or not). Even if gender homophily is not always supported by social network studies, it is very often used as a control variable, as it has been shown that gender can influence online social communication and behaviour [73,74]. Similarly, we control for content manager role, as we expect different behaviour from employees responsible for informally moderating the intranet social network.

3.4. Similarity matrices

Scores obtained for the above-mentioned variables were transformed into similarity matrices. Like a network adjacency matrix, a similarity matrix is made of N rows and columns, where each row and column represents a specific employee. For categorical attributes (gender and being a content manager or not), we have a value of 1 in a cell of the matrix if the two corresponding employees share the same attribute (e.g. they are both females), and 0 otherwise. For continuous variables, we populated the matrices with the absolute value of the differences in individual actor scores.
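The two constructions can be sketched as follows, using nested lists as matrices. The attribute values are hypothetical. Note that, when built from absolute differences as described, a smaller cell value means that two employees are more alike on that variable; diagonal cells are not meaningful for dyadic analysis.

```python
def categorical_similarity(values):
    """1 where two employees share the same category, 0 otherwise."""
    return [[1 if a == b else 0 for b in values] for a in values]


def absolute_difference(values):
    """Absolute difference between the scores of each pair of employees."""
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical attributes for three employees.
gender = ["F", "M", "F"]
sentiment = [0.70, 0.55, 0.90]

gender_matrix = categorical_similarity(gender)
sentiment_matrix = absolute_difference(sentiment)
```

Both matrices are symmetric with N rows and N columns, so they can be compared cell by cell with the adjacency matrix of the interaction network.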

4. Results

To verify our hypothesis that the more dyads are similar in their word use and their network centrality position, the more they will interact (or vice versa), we first compared descriptive statistics and correlation coefficients for the social network and semantic variables. We found that being a content manager was associated with more central and dynamic network positions: content managers had higher average scores of degree and betweenness centrality and they rotated more. In other words, they had interactions with more people, often acted as brokers of information, and in general did not keep a static dominant position after having fostered a conversation. Indeed, content managers often had the assignment of opening new forum topics for which they received comments that translated into incoming messages. It is also interesting to notice that posts written by central people had greater average lengths but used simpler language (with fewer rare words). These posts might have become more popular – and received many comments which pushed their authors to more central positions – because they were long enough to be informative, but still relatively easy to read and understand (less complex). Tables 2 and 3 show the correlations and descriptive statistics of our social network and semantic variables at the actor level.

Table 2.

Descriptive statistics.

		M	SD
1	Gender	65.71% male
2	Role (content manager)	6.80% content managers
3	Sentiment	0.657	0.168
4	Emotionality	0.327	0.059
5	Complexity	7.614	0.391
6	Length	357.528	594.911
7	Degree centrality	17.765	64.018
8	Betweenness centrality	4037.020	53,311.098
9	Rotating leadership	11.590	19.494

SD: standard deviation.

Table 3.

Actor-level correlations.

Variable		1	2	3	4	5	6	7	8	9
1	Gender	1
2	Role (content manager)	−0.175**	1
3	Sentiment	0.009	0.017	1
4	Emotionality	0.013	−0.012	0.461**	1
5	Complexity	−0.017	−0.079**	−0.032	−0.091**	1
6	Length	0.055*	0.056*	0.043	0.071**	−0.213**	1
7	Degree centrality	−0.008	0.285**	0.045	0.113**	−0.280**	0.390**	1
8	Betweenness centrality	0.012	0.170**	0.050*	0.157**	−0.393**	0.518**	0.887**	1
9	Rotating leadership	0.030	0.252**	0.031	0.060*	−0.173**	0.214**	0.715**	0.538**	1

*p < 0.05; **p < 0.01.

Table 4 shows the Pearson’s correlation coefficients of digital communications (network ties) with similarity metrics. To address the non-independence of network ties, we assessed the significance of coefficients through permutation tests based on the quadratic assignment procedure (QAP) [75]. As described in the previous section, we measured similarity with respect to several employee characteristics: their gender, role as content manager, word use, connectivity and interactivity. Dyadic text similarity shows the strongest association with digital communication (ρ = 0.48). Employees who more frequently used the same vocabulary communicated more with each other. Apart from gender and sentiment, homophily effects seem to be significant for all the other variables included in our study. Employees who were more similar with respect to their use of words, their interactivity and their degree centrality tended to interact more with each other. With regard to the role of content managers, we see a negative effect coherent with their assignment. In fact, content managers had an obligation to support online conversations, sharing knowledge about the company and answering employees’ questions. Therefore, we might expect that they would interact more with other employees than with other content managers who had the same responsibility.
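The logic of a QAP correlation test can be sketched in a few lines. This is an illustrative implementation under our own assumptions (two-sided test, 500 permutations, fixed random seed, toy matrices), not the software used for Table 4: the observed correlation between the off-diagonal cells of two matrices is compared with the correlations obtained after relabelling the nodes of one matrix, with rows and columns permuted together.

```python
import math
import random


def offdiag(m):
    """Off-diagonal cells of a square matrix, in row-major order."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def qap_correlation(a, b, n_perm=500, seed=0):
    """Observed correlation and permutation p-value (two-sided)."""
    rng = random.Random(seed)
    n = len(a)
    y = offdiag(b)
    observed = pearson(offdiag(a), y)
    extreme = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Relabel nodes: the same permutation is applied to rows and columns.
        permuted = [a[p[i]][p[j]] for i in range(n) for j in range(n) if i != j]
        if abs(pearson(permuted, y)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data: interaction ties and dyadic text similarity for five employees.
ties = [[0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
text_sim = [[0.0, 0.8, 0.6, 0.1, 0.0],
            [0.8, 0.0, 0.7, 0.2, 0.1],
            [0.6, 0.7, 0.0, 0.5, 0.2],
            [0.1, 0.2, 0.5, 0.0, 0.9],
            [0.0, 0.1, 0.2, 0.9, 0.0]]
r, p_value = qap_correlation(ties, text_sim)
```

Because the permutation keeps each node's row and column together, the dependence structure within each matrix is preserved and only the correspondence between the two matrices is broken, which is what makes the test suitable for non-independent dyadic data.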

Table 4.

QAP correlation.

	Similarity metric	1	2	3	4	5	6	7	8	9	10
1	Network interaction
2	Text	0.475***
3	Gender	0.004	0.019***
4	Role	−0.097***	−0.173***	0.079***
5	Sentiment	0.004	−0.003	0.013*	0.013
6	Emotionality	0.051***	0.044***	0.015	−0.001	0.204***
7	Complexity	0.140***	0.097***	−0.001	−0.060*	0.017	0.101***
8	Length	0.142***	0.105***	0.024***	−0.051*	0.009	0.121***	0.348***
9	Degree centrality	0.348***	0.332***	−0.004	−0.261***	0.012	0.148***	0.428***	0.434***
10	Betweenness centrality	0.302***	0.242***	0.005	−0.156***	0.026	0.197*	0.546***	0.565***	0.892***
11	Rotating leadership	0.267***	0.366***	0.012	−0.226***	0.008	0.099***	0.246***	0.259***	0.717***	0.564***

*p < 0.05; **p < 0.01; ***p < 0.001.

QAP: quadratic assignment procedure.

We replicated the calculation of correlation coefficients of digital communication with similarity metrics, while filtering the network to compare content managers with other employees. This served to see whether the associations found in Table 4 were consistent within different groups. As Table 5 shows, correlations of digital communication with text similarity, degree centrality, betweenness centrality and rotating leadership remain fairly high and significant across both groups. These findings highlight the importance of such metrics, and suggest their potential role in digital communication. By contrast, homophily of role exhibits low coefficients, and similarity in gender and language sentiment remains negligible. Similarities in text length and emotionality are significant only for content managers, not for other employees. The effect of complexity is also sharply reduced for those who are not content managers. This suggests that, among employees acting as content managers, the length, complexity and emotional content of language played a bigger role in shaping digital interactions. Content managers more often commented on the posts of colleagues who were aligned in terms of length, emotionality and complexity of the language they used. By contrast, these aspects were negligible when looking at the interactions of regular employees.

Table 5.

Network interaction QAP correlations by group.

Similarity metric	Full network	Content managers	Non-content managers
Text	0.475***	0.418***	0.436***
Gender	0.004	−0.033*	0.009***
Role	−0.097***	NA	NA
Sentiment	0.004	0.060*	−0.007*
Emotionality	0.051***	0.216***	0.003
Complexity	0.140***	0.266***	0.012*
Length	0.142***	0.320***	−0.003
Degree	0.348***	0.351***	0.212***
Betweenness	0.302***	0.337***	0.211***
Rotating leadership	0.267***	0.297***	0.142***
N	1611	110	1501

*p < 0.05; **p < 0.01; ***p < 0.001.

QAP: quadratic assignment procedure; NA: not applicable.

We complemented the information provided by correlations with the multiple regression models presented in Table 6, in order to check whether digital communication could be at least partially explained by homophily effects. Accordingly, similarity matrices were used as input of QAP multiple regression models and we applied the double semi-partialing permutation method to evaluate the significance of predictors [76]. Indeed, Dekker et al. [76] proved that this approach is one of the most robust against conditions of autocorrelation and collinearity. We also performed a preliminary calculation of the variance inflation factor (VIF), which revealed significant collinearity only between the metrics of degree and betweenness centrality (max VIF = 8.41). In order to avoid collinearity problems in the QAP regression models, we performed a principal component factoring by combining these two variables and retaining a single factor accounting for 94% of the variance, with the same factor loading of 0.97 for both degree and betweenness centrality. This new variable, named betweenness–degree factor, is higher when employees are more central in the online communication network. A similarity matrix was calculated consistently with the procedure described in the previous section to express similarity of network positions between employees. QAP regression models were implemented using the R programming language and the asnipe package [77].
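The reported variance share and loadings of the combined factor can be checked with a short derivation. We assume here that the factoring was applied to the two standardised actor-level scores, so that the relevant correlation is the value r = 0.887 between degree and betweenness centrality in Table 3; under that assumption the correlation matrix R of the two variables has eigenvalues 1 + r and 1 − r.

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\lambda_{1} = 1 + r, \quad \lambda_{2} = 1 - r

\frac{\lambda_{1}}{\lambda_{1} + \lambda_{2}} = \frac{1 + r}{2} = \frac{1.887}{2} \approx 0.94, \qquad
\text{loading} = \sqrt{\frac{1 + r}{2}} = \sqrt{0.9435} \approx 0.97
```

Both values agree with the 94% of variance and the common loading of 0.97 reported above, and the symmetry of R explains why the two variables load equally on the retained factor.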

Table 6.

QAP multiple regression.

Similarity metric	Model 1	Model 2	Model 3	Model 4	Model 5
Text	1.811660***	1.804603***	1.653310***	1.652887***	1.613879***
Length	0.000013***	0.000013***	−0.000003***	−0.000002***
Gender		−0.001295***	−0.001012***	−0.001048***
Role		−0.003498***	0.006507***	0.006775***
Betweenness–degree factor			0.019318***	0.019907***	0.016154***
Rotating leadership			−0.000199***	−0.000202***
Sentiment				0.001224
Emotionality				−0.009983***
Complexity				−0.003497***
Constant	−0.034849***	−0.030922***	−0.032333***	−0.031098***	−0.029243***
Adjusted R²	0.2341	0.2342	0.2693	0.2700	0.2675

***p < 0.001.

Regression results confirm the preliminary findings of QAP correlations, highlighting possible homophily effects. We present our models following a hierarchical regression approach, that is, adding blocks of variables to the initial model to assess their impact on the adjusted R² coefficient. The first model includes word use similarity and text length similarity metrics (with word use being our main independent variable); the second adds characteristics of individuals (gender and the role of content manager); in the third, we add social network metrics; in the fourth, we complete the analysis with semantic variables (sentiment, emotionality and complexity). Although all of our predictors are significant, except for sentiment, the effect size of many of them is negligible when compared with word use similarity and similarity in the betweenness–degree factor. This is also evident when comparing Models 1–4 with the more parsimonious Model 5. This last model has almost the same adjusted R² as the full model (0.2675 vs. 0.2700), but includes only two predictors: text similarity and the betweenness–degree factor.

To summarise, our results indicate strong homophily effects with respect to word use similarity. Employees tend to interact more with peers who share their vocabulary and a similar level of network centrality – that is, central people interact more with central people and peripheral people with peripheral people. Our main findings are illustrated in Figure 2.

Figure 2.

Main research findings (weaker effects are marked with a dotted line).

We also notice that more central people use simpler language and write longer, perhaps more informative, posts. Finally, content managers play an important role in building the online swarm [20], rotating leadership more often and being more central in social patterns.

5. Discussion

Overall, our results support and nuance our hypotheses stating that similarity in word use and in network centrality position between individuals is correlated with their interactions (H1 and H2). In our data, similarity of word use and similarity of centrality in the interaction network contribute to explaining digital communication interactions, taken as the dependent variable. Those employees who interact most with each other tend to use similar words; they also tend to have a similar number of links with others in the intranet forum and exhibit a similar level of betweenness–degree centrality.

However, those factors do not contribute equally to explaining digital interaction. Indeed, although the full model explains 27% of the variance, a model with only text similarity explains 22.6% of the variance, a small difference which leads to the conclusion that text similarity is the main driver of interaction.

Our findings differ from what Cho et al. [61] found in an online learning community in which the more an individual became central, the more he or she shared information with others (and therefore interacted with them). In our case, employees having similar centrality interact more with one another, which may lead to the creation of subgroups of people more or less central in the organisation network, with various degrees of access to information and, potentially, various understandings of the organisation’s objectives. This may impede the development of organisational cohesion and a shared organisational vision.

Looking at results from the point of view of homophily theory, we may suppose that individuals are attracted to those whose word use resembles their own. To our knowledge, this is the first study combining social network and semantic analysis to show a strong link between the words used by employees and their digital interactions.

This research also extends the work of Gloor and colleagues, who framed the social and semantic metrics of digital communication and proved their signalling power in several business contexts [11,15,34]. These studies did not consider the homophily effects brought about at the dyadic level by employees’ similarities. Actually, our results indicate that homophily, and especially language homophily, might moderate or mediate some of the results obtained in past research. For example, Gloor et al. [11] showed that faster responses to clients’ emails, the presence of steady leaders and a simpler language can all positively impact customer satisfaction. However, they did not explore the effects produced by language similarity between clients and the account team (which would be interesting to investigate in future research).

Our findings have practical implications for both company managers and administrators of online communities. For example, if a company wants to attract employees’ attention to a strategic topic, our results suggest that it is vital to use words close to those used by the target group. Employees’ participation in conversations can be fostered by online messages aligned with their use of words and by choosing social ambassadors whose degree centrality is similar to that of the targets. From this perspective, the choice of the most appropriate ambassadors might be crucial, for example, for the success of internal communication campaigns carried out on intranet social networks.

6. Conclusion

Previous observations indicate that employees’ use of digital communication can predict their engagement level [11], and that communication encompasses both the language people use to communicate and their interactions and relationships [5–7]. In view of studies showing that interactions between people are linked to the similarity of the words they use [22–25], we hypothesised that similarity in language and in network position between individuals would be correlated with their interactions. Our results support and nuance our overall hypothesis, showing that the main ‘homophilic’ driver of employees’ interactions is language similarity.

It would be useful to replicate our research to see whether our findings are confirmed in different business contexts. Future studies could include more control variables, particularly those which are supposed to produce homophily effects, such as employees’ age [78] or their geographical location (not available in our data set). We know that all the employees involved in the study were working in Italy, even if they were grouped in different cities or buildings. The intranet social network was created by the company to support communication and knowledge sharing among geographically dispersed individuals. Indeed, past research has shown that geographical proximity can favour communication, especially face-to-face communication, but not necessarily the sharing of knowledge or of business-related information [37]. An analysis of the possible drivers of word use similarity is beyond the scope of this study. For example, it could be that shared spaces impact language use and both digital and face-to-face communication [79]. This could be considered as a control variable in future research. We also advocate longitudinal analyses, which could tell us which actor similarity effects can be considered significant antecedents of digital communication.

In addition, one could argue that having text similarity as one of the major drivers of digital communication is no surprise, as employees who comment on the same forum post will probably discuss a common topic, thus using similar words. This argument would apply to almost all of the studies about the overlap of social and semantic networks. However, in our study, this effect is at least partially mitigated by the fact that text similarity of employees is calculated with respect to their overall word use, that is, considering all their posts, without looking at similarity within topics. Moreover, in an initial qualitative screening of the forum posts, we noticed that many employees maintained their own language, even when commenting on the same forum post. Therefore, word use similarity was only partially influenced by the fact of commenting on the same posts, and should be understood more as an overall metric of content similarity across different forum posts.

It is important to clarify that using the same words does not necessarily mean sharing the same opinion. Consider the extreme case where Person A says ‘The CEO is responsible for our drop in stock price and should be fired’ and Person B says ‘The CFO is responsible for our drop in stock price and should be fired’. Word use similarity is high, even if the two persons are saying different things. We do not see this as a limitation, as Persons A and B are expressing different views in similar ways, that is, using a common language. In this scenario, we classify A as much closer to B than to a Person C who says: ‘The CEO is totally incompetent: he is an idiot who should be kicked out of the company’. In short, it is important to notice that we measured word use similarity and not agreement on business-related topics. Content analysis aimed at measuring agreement could certainly be interesting for future research (although we could not do it with the data used here, due to privacy agreements). This example highlights two important considerations. On the one hand, our results support the link between language similarity and interactions shown by other studies in different contexts, none of which considered agreement; this indicates that similarity in the way opinions are expressed may be more important than the actual content of the opinions. For company managers and administrators of online communities, our results draw attention to the need to use similar words when one wishes to influence the opinion of others. On the other hand, the example serves as a reminder that not all interactions (or relationships), even strong ones, are necessarily ‘positive’, and it calls for further exploration of the links between ‘positive/negative’ interactions and language similarity.
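The example of Persons A, B and C can be checked numerically with a small Python sketch. It uses plain cosine similarity over raw word counts as a stand-in for the word use similarity metric: there is no stemming, stop-word removal or term weighting, so the numbers are illustrative only and are not the values the study's own metric would produce. The function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def word_counts(text):
    """Lower-case the text and count alphabetic tokens (no stemming, no stop words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


person_a = "The CEO is responsible for our drop in stock price and should be fired"
person_b = "The CFO is responsible for our drop in stock price and should be fired"
person_c = ("The CEO is totally incompetent: he is an idiot "
            "who should be kicked out of the company")

sim_ab = cosine_similarity(word_counts(person_a), word_counts(person_b))
sim_ac = cosine_similarity(word_counts(person_a), word_counts(person_c))
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

Under these simplifying assumptions, A and B share 13 of their 14 distinct words and score about 0.93, whereas A and C score about 0.41, even though A and C both criticise the CEO and A and B blame different people.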

Lastly, further research could explore the impact of the ‘word use homophily’ effect on the overall network: does this lead to the creation of subgroups having different views of the organisation? If so, how does it impact the organisation’s success? Our work could also be a starting point for future research that would see if the major drivers of digital communication on intranet social networks have the same influence in shaping relationships that take place on other media – email, phone calls and Skype calls – or face-to-face communication.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Andrea Fronzetti Colladon

References

1. De Witte, Vander Elst, De Cuyper. Job insecurity, health and well-being. In: Vuori, Blonk, Price (eds) Sustainable working lives. Dordrecht: Springer, 2015, pp. 109–128.

2. Abildgaard, Hasson, von Thiele Schwarz, et al. Forms of participation: the development and application of a conceptual model of participation in work environment interventions. Econ Ind Democr. Epub ahead of print 14 March 2018. DOI: 10.1177/0143831X17743576.

3. Lin H-F. Effects of extrinsic and intrinsic motivation on employee knowledge sharing intentions. J Inf Sci 2007; 33: 135–149.

4. Karanges, Johnston, Beatson, et al. The influence of internal communication on employee engagement: a pilot study. Public Relat Rev 2015; 41: 129–131.

5. White. Identité et contrôle: Une théorie de l’émergence des formations sociales. Paris: Éditions de l’École des hautes études en sciences sociales, 2011.

6. Cooren. Organizational discourse: communication and constitution. Cambridge: Polity Press, 2015.

7. Tietze, Cohen, Musson. Understanding organizations through language. London: SAGE, 2003.

8. Saastamoinen, Järvelin. Relationships between work task types, complexity and dwell time of information resources. J Inf Sci 2018; 44: 265–284.

9. Misuraca, Scepi, Spano. A network-based concept extraction for managing customer requests in a social media care context. Int J Inf Manag 2020; 51: 101956.

10. Jarvenpaa, Leidner. Communication and trust in global virtual teams. J Comput Mediat Commun. Epub ahead of print 1 December 1999. DOI: 10.1287/orsc.10.6.791.

11. Gloor, Fronzetti Colladon, Giacomelli, et al. The impact of virtual mirroring on customer satisfaction. J Bus Res 2017; 75: 67–76.

12. Brodie, Hollebeek, Jurić, et al. Customer engagement: conceptual domain, fundamental propositions, and implications for research. J Serv Res 2011; 14: 252–271.

13. Schneider, Yost, Kropp, et al. Workforce engagement: what it is, what drives it, and why it matters for organizational performance. J Organ Behav 2018; 39: 462–480.

14. Zhang, Liu. Understanding new ventures’ business model design in the digital era: an empirical study in China. Comp Human Behav 2019; 95: 238–251.

15. Gloor. Sociometrics and human relationships: analyzing social networks to manage brands, predict trends, and improve organizational performance. London: Emerald Publishing Limited, 2017.

16. Wen, Gloor, Fronzetti Colladon, et al. Finding top performers through email patterns analysis. J Inf Sci. Epub ahead of print 20 May 2019. DOI: 10.1177/0165551519849519.

17. Gloor (ed.). The signal layer: six honest signals of collaboration. In: Swarm leadership and the collective mind. Bingley: Emerald Publishing Limited, 2017, pp. 91–104.

18. Fronzetti Colladon, Vagaggini. Robustness and stability of enterprise intranet social networks: the impact of moderators. Inf Process Manag 2017; 53: 1287–1298.

19. Fronzetti Colladon, Gloor. Measuring the impact of spammers on e-mail and Twitter networks. Int J Inf Manag 2019; 48: 254–262.

20. Antonacci, Fronzetti Colladon, Stefanini, et al. It is rotating leaders who build the swarm: social network determinants of growth for healthcare virtual communities of practice. J Knowl Manag 2017; 21: 1218–1239.

21. Pentland. Honest signals: how they shape our world. Cambridge, MA: MIT Press, 2008.

22. Basov, Brennecke. Duality beyond dyads: multiplex patterning of social ties and cultural meanings. Res Soc Org. Epub ahead of print 26 September 2017. DOI: 10.1108/S0733-558X20170000053005.

23. Roth, Cointet. Social and semantic coevolution in knowledge networks. Soc Netw 2010; 32: 16–29.

24. Nerghes, Lee J-S, Groenewegen, et al. Mapping discursive dynamics of the financial crisis: a structural perspective of concept roles in semantic networks. Comput Soc Netw 2015; 2: 1–29.

25. Saint-Charles, Mongeau. Social influence and discourse similarity networks in workgroups. Soc Netw 2018; 52: 228–237.

26. Borgatti, Foster. The network paradigm in organizational research: a review and typology. J Manage 2003; 29: 991–1013.

27. Borgatti, Grosser. Structural equivalence: meaning and measures. In: International encyclopedia of the social & behavioral sciences. Amsterdam: Elsevier, 2015, pp. 621–625.

28. Burt. Social contagion and innovation: cohesion versus structural equivalence. Am J Soc 1987; 92: 1287–1335.

29. Brass, Krackhardt. Power, politics, and social networks in organizations. In: Ferris, Treadway (eds) Politics in organizations: theory and research considerations. New York: Routledge, 2012, pp. 355–375.

30. Bradbury, Vehrencamp. Principles of animal communication. Sunderland, MA: Sinauer Associates, 1998.

31. Johnstone, Grafen. Dishonesty and the handicap principle. Anim Behav 1993; 46: 759–764.

32. Greco, Polli. Emotional text mining: customer profiling in brand management. Int J Inf Manag 2020; 51: 101934.

33. Cordella, Greco, Carlini, et al. Infertility and assisted reproduction: legislative and cultural evolution in Italy. Rasseg Psicol 2018; XXXV: 45–56.

34. Gloor, Fronzetti Colladon, Grippa, et al. Forecasting managerial turnover through e-mail based social network analysis. Comp Human Behav 2017; 71: 343–352.

35. Kidane, Gloor. Correlating temporal communication patterns of the Eclipse open source community with performance and creativity. Comput Math Organ Theory 2007; 13: 17–27.

36. Chen, Garg. Dancing with the stars: benefits of a star employee’s temporary absence for organizational performance. Strateg Manage J 2018; 39: 1239–1267.

37. Allen, Gloor, Fronzetti Colladon, et al. The power of reciprocal knowledge sharing relationships for startup success. J Small Bus Enter Develop 2016; 23: 636–651.

38. Raz, Gloor. Size really matters – new insights for start-ups’ survival. Manag Sci 2007; 53: 169–177.

39. Danowski. Change over time in ICA division networks based on semantic similarity of papers presented versus co-memberships. Paper presented at the annual meeting of the International Communication Association, 2013, London, UK, 17–21 June 2013.

40. Doerfel. What constitutes semantic network analysis? A comparison of research and methodologies. Connections 1998; 21: 27–36.

41. Gloor, Diesner. Semantic social networks. In: Alhajj, Rokne (eds) Encyclopedia of social network analysis and mining. New York: Springer, 2014, pp. 1654–1659.

42. Ahmad, Widén. Language clustering and knowledge sharing in multilingual organizations: a social perspective on language. J Inf Sci 2015; 41: 430–443.

43. Carley. Group stability: a socio-cognitive approach. Adv Group Process 1990; 7: 1–44.

44. Carley. Knowledge acquisition as a social phenomenon. Instruct Sci 1986; 14: 381–438.

45. Carley. An approach for relating social structure to cognitive structure. J Math Soc 1986; 12: 137–189.

46. McPherson, Smith-Lovin, Cook. Birds of a feather: homophily in social networks. Ann Rev Soc 2001; 27: 415–444.

47. Lazarsfeld, Merton. Friendship as a social process: a substantive and methodological analysis. Free Contr Mod Soc 1954; 18: 18–66.

48. Montoya, Horton. A meta-analytic investigation of the processes underlying the similarity-attraction effect. J Soc Pers Relation 2013; 30: 64–94.

49. Cuganesan. Identity paradoxes: how senior managers and employees negotiate similarity and distinctiveness tensions over time. Org Stud 2017; 38: 489–511.

50. Lawrence, Shah. Homophily: meaning and measures. Paper presented at the International Network for Social Network Analysis (INSNA), 2007. Corfu Island, Greece, 1–6 May 2007.

51. Shalizi, Thomas. Homophily and contagion are generically confounded in observational social network studies. Soc Meth Res 2011; 40: 211–239.

52. Yuan, Gay. Homophily of network ties and bonding and bridging social capital in computer-mediated distributed teams. J Comput Mediat Commun 2006; 11: 1062–1084.

53. Rivera, Soderstrom, Uzzi. Dynamics of dyads in social networks: assortative, relational, and proximity mechanisms. Ann Rev Soc 2010; 36: 91–115.

54. Allen, Henn. The organization and architecture of innovation: managing the flow of technology. Burlington, MA: Butterworth-Heinemann, 2007.

55. Wilson, O’Leary, Metiu, et al. Perceived proximity in virtual work: explaining the paradox of far-but-close. Org Stud 2008; 29: 979–1002.

56. Zhao, et al. Similarity-based link prediction in social networks: a path and node combined approach. J Inf Sci 2017; 43: 683–695.

57. Seol, Kim J-D, Baik D-K. Common neighbour similarity-based approach to support intimacy measurement in social networks. J Inf Sci 2016; 42: 128–137.

58. Burt. Structural holes: the social structure of competition. Cambridge, MA: Harvard University Press, 1995.

59. Maciel De Oliveira. Social network analysis and dyadic identification in the classroom. RAM: Rev Adm Mackenzie. Epub ahead of print 5 April 2018. DOI: 10.1590/1678-6971/eramg180051.

60. Roy, Schmid, Tredan. Modeling and measuring graph similarity: the case for centrality distance. In: Proceedings of the 10th ACM international workshop on foundations of mobile computing (FOMC 2014). New York: ACM, 2014, pp. 47–52.

61. Cho, Gay, Davidson, et al. Social networks, communication styles, and learning performance in a CSCL community. Comput Educ 2007; 49: 309–329.

62. Blei. Probabilistic topic models. Commun ACM 2012; 55: 77–84.

63. Perkins. Python 3 Text Processing With NLTK 3 Cookbook. Birmingham: Packt Publishing, 2014.

64. Willett. The Porter stemming algorithm: then and now. Program 2006; 40: 219–223.

65. Porter. Stemming algorithms for various European languages, 2006, http://snowball.tartarus.org/texts/stemmersoverview.html

66. Tata, Patel. Estimating the selectivity of TF-IDF based cosine similarity predicates. ACM SIGMOD Record 2007; 36: 7–12.

67. Ramos. Using TF-IDF to determine word relevance in document queries. Paper presented at the first instructional conference on machine learning, 2003.

68. Brönnimann. Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians, 2013, https://pdfs.semanticscholar.org/3128/67cfa9d4b03cf9dbfb290daa3521c1d1aa1d.pdf.

69. Brönnimann. Analyse der Verbreitung von Innovationen in sozialen Netzwerken. MSc thesis, University of Applied Sciences Northwestern Switzerland, Windisch, 2014.

70. Freeman. Centrality in social networks conceptual clarification. Soc Netw 1979; 1: 215–239.

71. Wasserman, Faust. Social network analysis: methods and applications. Cambridge: Cambridge University Press, 1994.

72. Davis, Eisenhardt. Rotating leadership and collaborative innovation: recombination processes in symbiotic relationships. Admin Sci Quarter 2011; 56: 159–201.

73. Thelwall. Homophily in MySpace. J Am Soc Inf Sci Tech 2009; 60: 219–231.

74. Thelwall. Social networks, gender, and friending: an analysis of MySpace member profiles. J Am Soc Inf Sci Tech 2008; 59: 1321–1330.

75. Krackhardt. Predicting with networks: nonparametric multiple regression analysis of dyadic data. Soc Netw 1988; 10: 359–381.

76. Dekker, Krackhardt, Snijders TAB. Sensitivity of MRQAP tests to collinearity and autocorrelation conditions. Psychometrika 2007; 72: 563–581.

77. Farine. Animal social network inference and permutations for ecologists in R using Asnipe. Meth Ecol Evol 2013; 4: 1187–1194.

78. Kossinets, Watts DJJ. Origins of homophily in an evolving social network. Am J Soc 2009; 115: 405–450.

79. Allen. Managing the flow of technology: technology transfer and the dissemination of technological information within the R&D organization. Cambridge, MA: MIT Press, 1984.