Abstract
People tagging allows a person to tag themselves or others; it is reciprocal and therefore has social implications. The main uses of corporate people tagging systems are for building internal social networks, solving problems, and seeking expertise. We explored the statistical and terminological relation between self-presentation and perception by others as reflected by the use of tags in a people tagging system within a large enterprise.
Because the data follow a power law distribution, two different samples were analyzed. Using content analysis, we found that when there are few self or social tags, users prefer to use tags from the Environment and Technology categories, providing tags that tend to be objective or factual. When tagging approaches saturation, it becomes more subjective and social, using tags from the Individual category. Self-tags tend to be more factual, describing technology expertise, while social tags augment the individual tags by adding a personal dimension. The more people tag and get tagged, the more terminological overlap develops. We conclude by providing practical advice on how to create a sustainable system by balancing originality and duplication using interactivity and feedback.
1. Introduction
Social activity has become inseparable from content creation in many cases on the Web and in the workplace. While the use of multimedia is prevalent, text is commonly used to portray or interpret the characteristics of application users. The focus of this research is on impression management in the workplace as revealed by the analysis of textual tags used to manage or describe an impression about one’s self or colleagues. We compare self-presentation with the perception by others and we also compare two user samples: highly-active and average users.
Impressions online can form based on explicit (expressions one gives) or implicit (expressions one gives off) cues as people present themselves on the Web. A particularly explicit form of describing people is a people tagging system implemented in large organizations. In such systems people describe themselves and their peers by the use of tags. This offers a unique opportunity to study explicit self-presentation and perception by others using real-life data.
The uniqueness of people tagging, unlike tagging other online content, is that people tagging is reciprocal [1]: A person can tag or be tagged. Tagging people can be used for building social networks, solving problems, and seeking expertise [2, 3]. Using tags in order to describe people or colleagues can expose information about skills, roles, projects, and more, which can be helpful for all other users in the enterprise.
The paper begins with background about impression formation, people tagging, and participation patterns in online systems. We then report the results of a study about the difference between self-presentation and perception by others based on data from a people tagging application in a large enterprise. Using content analysis, we compare average users to highly-active ones. Finally, we provide ideas for improving the sustainability and usability of the people tagging system.
1.1. Impression formation
How people present themselves and how they perceive others are questions that have challenged social psychologists ever since the pioneering work of Solomon Asch [4]. Impressions are quick to form and fairly stable over time [5]. In order to be socially accepted, people tend to try to control the information they present about themselves in one-to-one encounters as well as in group meetings or before large audiences.
People generate impressions by two kinds of self-expression: expressions one gives, and expressions one gives off [6]. The first involves relatively easily controlled, presumably intentional expressions, conveyed through traditional verbal communication [7, 8]. The other kind is considered to be more theatrical and contextual, non-verbal, and presumably unintentional.
In social spaces individuals may try to control the impressions others receive and will calculate behavior so that the audience will believe what they see [6]. In order to create an impression, individuals will use explicit expressions such as physical appearance and verbal cues as well as implicit expressions such as body language and non-verbal cues.
Jones [9] described two major strategies of impression management used to attain power: ingratiation and self-promotion. Ingratiation is a strategy used by a person who wishes to be liked by others and does not typically involve conscious awareness or deliberate planning. Ingratiation is reactive, done in response to other people’s communication, and is commonly characterized by opinion agreement, compliments, favors, warmth, understanding, or compassion. Self-promotion is a proactive process for generating favorable opinions regarding a person’s competence. Typical examples include self-promotion when seeking admission to an academic program or a position at a new workplace. When done in excess, self-promotion may be perceived as intimidating, even off-putting, thus creating undesired outcomes. The people tagging system researched here is a special opportunity to study self-promotion and compare it to the perceptions generated by others.
1.1.1. Impression formation online
Any online activity is an opportunity for self-presentation. Common examples include the construction of a personal home page, the introduction one is expected to make when entering online groups, the descriptions in various social media sites or the profile one accumulates for oneself willingly or not on a variety of online systems [8, 10].
By using built-in system profiling features or by creative content management, participants choose how to introduce themselves and how to manage their portrayed image. As a result, some of the descriptions may be inaccurate or incomplete [11]. The importance of the accuracy of the description varies depending on the tool and on how others use this information.
In social network sites (SNS) it is more complicated to fake information or to have different personalities because in SNS the norm is to have friends. These friends are exposed to the information presented in the profile and if the information is inaccurate they might express doubts on the validity of this information or the system as a whole. Therefore, existence of friends may confirm the self-presentation of the profile owner [12]. This is certainly expected in organizational social networks intended for professional purposes.
Internet users visit SNS and create strategic profiles to influence how others perceive them. While doing so, the presenter creates an image for the receiver that may have varying degrees of truthfulness. Given that the presenter chooses what to present and even who can view the profile page, and given that s/he has goals to achieve, users should be conscious of their friends’ activities and the ways in which they choose to respond. The illusion created by an online impression can fade away when a face-to-face meeting eventually occurs [11].
Perception of others online is constructed according to the explicit and implicit cues the other person manifests. Each conversation or posting online contains both carefully controlled, explicit cues and unintentional, implicit cues. In order to make a positive impression, a person will carefully select the explicit cues to manifest. The online perceiver is aware of this and is likely to notice the implicit cues as well. This research examines both sides of impression management: self-presentation and perception by others.
1.2. Tagging
Users of tagging systems use tags to contribute to a free text taxonomy characterizing web pages, photos, products, and other types of online content [2]. Tags enrich the information about an item and can be used for later retrieval. Since tagging is usually done by end-users, the result is the creation of a web of human topical interest.
Tags can be chosen from a controlled vocabulary or they can be free text assigned by the user. When free text is used, the resulting metadata can include homonyms, synonyms, spelling mistakes, and errors that can lead to inappropriate connections between items and poor results in information searches. On the other hand, with free text the user can use terms he thinks are appropriate to describe or help him recall information without the burden of selecting a category from a known taxonomy. Free text also allows for a dynamic update of the vocabulary, maintaining its relevance.
Collaborative tagging systems present new challenges to system designers because social and psychological factors may affect users’ level of activity and tag choices. One factor that is particularly applicable is the principle of social proof (also known as social influence), which indicates that people model their behavior based on others. This principle suggests that: ‘We view a behaviour as correct in a given situation to the degree that we see others performing it’ [13].
1.3. People tagging in a workplace
The special case where the resources to tag are people is referred to as people tagging. A unique quality of people tagging in contrast to other tagging applications is that tagging is reciprocal: a person can tag or be tagged. From the individual’s perspective, such systems can be described as a social process, enabling use for acting and reacting, posting, or replying [14].
Tagging people offers a new dimension for SNS, and specifically social networking applications within enterprises. Implementing a people tagging system within an enterprise weaves a social network where employees can categorize and characterize their colleagues, aiding better recall, solving problems, and seeking expertise [3].
Ehrlich and Cash [15] claimed that people in an enterprise can provide valuable expertise to solve problems. Knowledge work often involves finding opportunities to contribute to collaborative work. Possibly the simplest way of finding those opportunities is to promote one’s skills to other members of the organization through the use of tags [16].
People tagging systems are commonly used for contact management. As with social bookmarking, people tagging enables users to organize their contacts into groups, annotate them with terms supporting future recall, and search for people by topic area [1]. It helps people to find, learn about, and keep track of each other in order to improve the effectiveness and reduce the cost of forming and maintaining professional relationships.
Using tags to describe people or colleagues can convey employees’ skills, roles, projects, and other information that can be helpful for users in the enterprise. Furthermore, people tagging can provide a picture of one’s social-professional intra-organizational network [17]. The combination of the description and the social network could yield even more compelling results and enhance the reputation of some users. Beyond contact management, people tagging enables the formation of an organizational community that collectively maintains each other’s interests and expertise [18].
The earliest known collaborative tagging system was ‘WebTagger’, created in 1997 by a team at NASA’s Ames Research Center [19]. WebTagger included a relevance feedback mechanism to rank the tags applied to bookmarks based on previous relevance judgments made by the users. Unlike other tagging systems, WebTagger provided an explicit vocabulary for tagging rather than an open one [20].
IBM’s Dogear illustrates how such a system can be adapted to the enterprise [21]. Dogear supports bookmarks of Internet and intranet information sources, and provides user authentication via corporate directories [22].
In the past few years some enterprises have discovered the advantages such applications can offer and integrated the people tagging feature into enterprise applications. As a solution to the problem of private tags, Razavi and Iverson [22] explored OpnTag, where participants can enhance privacy in social tagging systems. The access control feature added to the people tagging application enables users to categorize their friends or colleagues into groups and to decide what information to reveal to each group. Another enterprise people tagging system was Fringe Contacts, introduced by Farrell and Lau [2]. Fringe allowed employees’ profiles to be updated by the employees themselves or by others tagging them with any kind of term, so that the tags provide extensive information on their interests, expertise, group affiliation, personal characteristics, and so on. Farrell and Lau showed that the purpose of participating in social bookmarking of web pages is similar to the purpose of participating in people tagging systems. Users tag people for personal benefit (self-presentation or contact management), but when the system becomes popular, a folksonomy of employees is created that can be highly useful in mapping employees’ interests and expertise. Using Fringe Contacts removed the dependence on users to update their formal profiles, and the advantage for the enterprise is the augmentation of profiles in a corporate directory.
Research on Fringe Contacts examined the accuracy of the information and the purpose of contribution. Farrell et al. [1] used interviews to determine whether the tags were accurate and whether users used the system appropriately, in order to find out whether any changes to the system’s method or features were required. That study showed that the tags describe the employees’ interests and expertise accurately. In addition, it showed that the employees use the system appropriately and no offensive or inappropriate use of tags occurred. This finding may be attributed to use of the system within an enterprise environment where the users are colleagues and they understand the advantages of correct information for them and for the enterprise. Another study by Farrell et al. [3] examined the purpose of use and participation in the system and showed that an active minority of Fringe Contacts users is using people tagging for building communities.
Two studies compared self-tagging and tagging of others within an enterprise people tagging system based on the Fringe Contacts’ database. Results showed that people put more effort into tagging themselves than into tagging others, meaning that users who tag themselves contribute the most to populating the system and making it relevant [14]. Furthermore, while self-tags tend to be more factual, describing technology expertise, social tags augment the individual tags by adding a personal dimension [23].
1.4. Participation patterns
Participating in online groups often follows a power law distribution where a minority of participants actively contributes and the majority remains receivers [24]. According to Nonnecke and Preece [25], lurkers make up over 90% of online groups.
In social systems, the users can act or react, post, or reply. The participation ranges from complete inactivity to full activity. Based on [14] we define the ‘participation continuum’, which represents four levels of contribution (as shown in Figure 1):
Active/initiate – a participant who initiates discussion or interaction, acts, and reacts in the interaction.
Active/respond – a participant who responds/reacts to others and does not initiate interaction.
Passive/lurk – a participant who only reads others’ messages. This kind of participant neither acts nor reacts.
Passive/inactive – a participant who is registered in a system but does not actually use it and does not even read others’ messages.

Figure 1. The continuum of participation in social technologies.
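To make the four levels concrete, the following Python sketch maps a participant’s activity counts to a level of the continuum. The function `classify` and its three counters (interactions initiated, responses to others, and views of others’ content) are hypothetical; the text does not say how, or whether, the system logged such counts.

```python
from enum import Enum


class Participation(Enum):
    """The four levels of the participation continuum."""
    ACTIVE_INITIATE = "active/initiate"
    ACTIVE_RESPOND = "active/respond"
    PASSIVE_LURK = "passive/lurk"
    PASSIVE_INACTIVE = "passive/inactive"


def classify(initiated: int, responded: int, views: int) -> Participation:
    """Place a participant on the continuum from hypothetical activity counts.

    initiated: interactions the participant started
    responded: reactions to interactions started by others
    views:     times the participant read others' content
    """
    if initiated > 0:  # acts, and may also react
        return Participation.ACTIVE_INITIATE
    if responded > 0:  # only reacts to others
        return Participation.ACTIVE_RESPOND
    if views > 0:  # only reads
        return Participation.PASSIVE_LURK
    return Participation.PASSIVE_INACTIVE  # registered but unused


print(classify(initiated=0, responded=3, views=12).value)  # active/respond
```

The checks run from the most to the least active level, so a participant who both initiates and responds is counted as an initiator, matching the definition above.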
Each participant in social applications belongs to one of the four levels defined above. The dilemma for individuals is to either contribute to the common good or to shirk and free ride on the work of others [26]. System designers prefer the users to be more on the active side and to continuously contribute to the system and the organization. This contribution may be either by duplicating existing tags (social proof) or by using new tags that will bring new information to the system. In the corporate tagging system, as in most SNS, overall participation is expected to follow the power law distribution with mostly passive users.
This research will focus on two groups of active users: highly-active and average. From a methodological perspective, the average user group is a balanced representation of the whole active population. From a system development perspective, because of the power law participation distribution, it is interesting to study the group of enthusiasts who are more dedicated to the system in order to explore the highest level of activity and to think of ways to induce regular users to become more engaged. Using the database of a people tagging application implemented in a large enterprise, and applying two different and complementary sampling methods, we explore the following question: What terms do people use to tag themselves (presentation of self) compared to terms used by others (perception by others)?
2. Method
Content analysis was applied to two large samples taken from data harvested from a corporate people tagging application. In the following we describe the system and data, two sampling methods and the content analysis procedure.
2.1. System and data source
The data were harvested from Fringe, an organizational application within IBM [1]. Fringe enables employees to tag others or themselves with free text tags through a single text box. Multiple tags can be provided either comma separated or through multiple insertions into the text box. There is no categorization or auto-completion of tags. As described in [1], to keep the application lightweight, its design principles did not require the tagged employee’s permission for tagging.
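A minimal Python sketch of how one submission of the single text box might be split into tags is shown below. Splitting on commas follows the description above; trimming whitespace, dropping empty entries, and removing duplicates within a submission are assumptions, and `parse_tag_input` is a hypothetical name rather than part of Fringe.

```python
def parse_tag_input(text: str) -> list[str]:
    """Split one submission of the free-text box into individual tags.

    Assumed rules: split on commas, trim surrounding whitespace,
    drop empty pieces, and keep only the first copy of a repeated tag.
    """
    tags: list[str] = []
    for piece in text.split(","):
        tag = piece.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


# Each insertion into the text box would be parsed separately.
print(parse_tag_input("java, db2 ,, java,  hiking"))  # ['java', 'db2', 'hiking']
```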
Each employee has a profile page which shows three tag clouds, as shown in Figure 2: the tags the employee applied to themselves (self-tags), the tags others applied to the employee (social tags), and the tags the employee applied to other employees. The size of each tag in these clouds indicates the relative number of times the tag was used. Hovering over a tag reveals the people who used the tag on that employee. Thus, although there is no system-initiated feedback on the use of repeated tags, a user can inspect the tags previously used on or by an employee and be influenced by them. This open display of all tagging interactions created a system of ‘social translucence’ [1], which causes people to carefully consider their tagging behavior and reputation.

Figure 2. Fringe sample screenshot showing self and social tag clouds.
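The tag-cloud rule described above (tag size indicates relative usage) can be sketched in Python as follows. The linear scaling between a minimum and a maximum font size, and the sizes 10 and 32, are illustrative assumptions; the text does not give Fringe’s actual formula.

```python
from collections import Counter


def tag_cloud(tag_events: list[str], min_size: int = 10, max_size: int = 32) -> dict[str, int]:
    """Return a font size per tag, scaled linearly by how often the tag was applied.

    tag_events: one entry per tagging action, so a popular tag appears many times.
    """
    counts = Counter(tag_events)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag has the same count, all tags get the minimum size.
        frac = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_size + frac * (max_size - min_size))
    return sizes


print(tag_cloud(["java", "java", "java", "linux", "db2", "db2"]))
# {'java': 32, 'linux': 10, 'db2': 21}
```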
Following its internal usage as a research project, many of Fringe’s features were implemented in IBM’s social platform, IBM Connections. Its main features were the ability to tag people with free text tags, to view the tags a person was tagged with, and to view the tags a person applied to others. One could also search for all people tagged with a specific tag. The system was deployed at IBM for many years.
Our data represent a period of 3 years during which the application was deployed on the organization’s intranet. The application had 62,332 distinct participants who tagged themselves and others with one or more tags. Each employee had a profile page, which displayed, in addition to personal details, the self-tags the employee had assigned to themselves and a tag cloud of the tags assigned to the employee by others (social tags).
Every tagging action was registered in the system with the necessary meta-information, including the employee id, employee name, and date. For our analysis the tags related to a person were extracted by querying the database through SQL. The data were subsequently anonymized.
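The extraction step can be illustrated with Python’s built-in `sqlite3` module. The schema below (one row per tagging action with `tagger_id`, `tagged_id`, `tag`, and `tag_date`) is hypothetical, since the actual database layout is not described; it shows only how self-tags (tagger equals the tagged person) and social tags (tagger differs) separate naturally in SQL. The salted SHA-256 pseudonym is likewise one possible anonymization, not necessarily the one used in the study.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per tagging action.
SCHEMA = """
CREATE TABLE tagging (
    tagger_id TEXT NOT NULL,  -- employee who applied the tag
    tagged_id TEXT NOT NULL,  -- employee who received the tag
    tag       TEXT NOT NULL,
    tag_date  TEXT NOT NULL
)
"""


def tags_for(conn: sqlite3.Connection, employee_id: str) -> tuple[list[str], list[str]]:
    """Return (self_tags, social_tags) for one employee, one entry per tagging action."""
    self_tags = [row[0] for row in conn.execute(
        "SELECT tag FROM tagging WHERE tagged_id = ? AND tagger_id = ? ORDER BY tag",
        (employee_id, employee_id))]
    social_tags = [row[0] for row in conn.execute(
        "SELECT tag FROM tagging WHERE tagged_id = ? AND tagger_id <> ? ORDER BY tag",
        (employee_id, employee_id))]
    return self_tags, social_tags


def pseudonym(employee_id: str, salt: str) -> str:
    """Replace an employee id with a short salted SHA-256 digest."""
    return hashlib.sha256((salt + employee_id).encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO tagging VALUES (?, ?, ?, ?)", [
        ("e1", "e1", "java", "2009-01-05"),    # toy self-tag
        ("e2", "e1", "java", "2009-02-11"),    # toy social tag
        ("e2", "e1", "mentor", "2009-02-11"),  # toy social tag
    ])
    print(tags_for(conn, "e1"))  # (['java'], ['java', 'mentor'])
```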
2.1.1. Variables
2.2. Sampling
The database of the application was downloaded and arranged in a relational table containing the self and social tags associated with individual users.
The data follow a power law distribution which is typical of many network-based activities such as web site links and social networks [27], electronic markets [28], discussion groups [29], and more. The people tagging data are characterized by a small number of highly active participants having many self and social tags, and a long tail of occasional participants. Taking a random sample would result in a large proportion of dormant users. In order to examine impression management for highly-engaged users and for average users we analyzed two complementary data samples based on the definitions in section 2.1.1:
2.3. Codebook development
The codebook is based on preliminary work of three independent coders in which each coder categorized tags from a random sample of 25 users (not included in the above samples). Each coder described the tags according to his/her understanding. After examination and discussion common tag classes were formed. These classes fit into three categories: technology, environment, and individual. Each category contains six classes as detailed in Table 1. Technology refers to information systems developed or used by the employees. Environment refers to the employees’ social and geographic circumstances. The Individual category covers personal details.
Table 1. Codebook categories, classes, and examples.
With an agreed codebook in place, coding commenced, with each of the three coders coding a sample of tags. Training, coding, and discussions took place until agreement (calculated as inter-coder reliability) was achieved. With a reliable codebook and coding process, we coded all the tags associated with the two samples described earlier.
2.3.1. Inter-coder reliability
The inter-coder reliability test was conducted for the three coders by using a new random sample of 30 system users. Self-tags and social tags were coded. The three coders worked jointly for the first nine participants to reach an agreed understanding of the use of the codebook and then each coder coded the remaining 21 participants. A total of 175 self-tags and 434 social tags were coded.
Inter-coder reliability was calculated using Krippendorff’s alpha, since it accommodates more than two coders. Krippendorff’s alpha for the three coders was 0.75 for self-tags and 0.73 for social tags. Both values exceed the 0.7 threshold needed to deem the coding process reliable.
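For readers who wish to reproduce the reliability check, the following Python sketch computes Krippendorff’s alpha for nominal data from a coincidence matrix, which accommodates any number of coders and missing codes. Treating tag classes as unordered (nominal) categories is our assumption; the text does not state which metric or software was used, and the example units are invented.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per coded tag, holding the class each coder assigned
           (None where a coder gave no code). Units with fewer than two
           codes cannot be paired and are skipped.
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for values in units:
        codes = [v for v in values if v is not None]
        m = len(codes)
        if m < 2:
            continue
        # Each ordered pair of codes from two different coders adds 1 / (m - 1).
        for c, k in permutations(codes, 2):
            o[(c, k)] += 1 / (m - 1)

    n_c = Counter()  # marginal total of each class
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())

    disagree_observed = sum(w for (c, k), w in o.items() if c != k)
    disagree_expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if disagree_expected == 0:
        raise ValueError("alpha is undefined when only one class is ever used")
    return 1 - (n - 1) * disagree_observed / disagree_expected


# Three coders, four tags; the last tag was coded by one coder only.
units = [
    ["Technology", "Technology", "Technology"],
    ["Environment", "Environment", "Individual"],
    ["Individual", "Individual", "Individual"],
    ["Technology", None, None],
]
print(round(krippendorff_alpha_nominal(units), 3))  # 0.692
```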
3. Results
3.1. Tag volumes and ratio
Sample 1 (highly active users) contained 4569 self-tags and 4937 social tags; on average, each user had 16.26 self-tags and 17.57 social tags. Sample 2 (average users) contained 3513 self-tags and 3034 social tags; on average, each user had 7.42 self-tags and 6.43 social tags.
Table 2 shows the mean number of self-tags and social tags in each category and the ratio of means between self-tags and social tags.
Table 2. Mean tags per user in the three categories for both samples.
Technology had the highest number of self and social tags and was the only category where the ratio of means exceeded unity; that is, people assigned themselves more Technology tags than others assigned them, whereas in the Environment and Individual categories they received more tags from others than they assigned themselves.
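The ratio of means can be sketched in Python as below. The per-user dictionaries and the two toy users are invented for illustration and are not the study’s data; only the computation (mean self-tags per user divided by mean social tags per user, per category) follows the text.

```python
from statistics import mean

CATEGORIES = ("Technology", "Environment", "Individual")

# Invented toy users: category -> number of coded tags (missing means 0).
TOY_USERS = [
    {"self": {"Technology": 8, "Environment": 2, "Individual": 1},
     "social": {"Technology": 4, "Environment": 3, "Individual": 2}},
    {"self": {"Technology": 6, "Environment": 1},
     "social": {"Technology": 4, "Environment": 3, "Individual": 4}},
]


def ratio_of_means(users: list[dict]) -> dict[str, tuple[float, float, float]]:
    """Per category: (mean self-tags, mean social tags, self-to-social ratio)."""
    result = {}
    for cat in CATEGORIES:
        self_mean = mean(u["self"].get(cat, 0) for u in users)
        social_mean = mean(u["social"].get(cat, 0) for u in users)
        ratio = self_mean / social_mean if social_mean else float("inf")
        result[cat] = (self_mean, social_mean, ratio)
    return result


for cat, (s, soc, ratio) in ratio_of_means(TOY_USERS).items():
    print(f"{cat}: self {s:.2f}, social {soc:.2f}, ratio {ratio:.2f}")
# Technology: self 7.00, social 4.00, ratio 1.75
# Environment: self 1.50, social 3.00, ratio 0.50
# Individual: self 0.50, social 3.00, ratio 0.17
```

A ratio above 1 marks a category that users favor for self-presentation; a ratio below 1 marks one that others favor when tagging them.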
Appendix A provides the full results of the mean number of self-tags and social tags in each class and the ratio of means between self-tags and social tags (observations with high values are in bold).
3.2. Overlap analysis
Overlap analysis measures the agreement between self-tags and social tags per user. This is useful for assessing similarities and diversity in tagging. Similarity informs the tag content validity while diversity is desirable for a rich description of the tagged person. Overlap is analyzed across the categories and classes. Overlap in self-tags is defined as the percentage of overlapping tags out of all self-tags. Overlap in social tags is defined as the percentage of overlapping tags out of all social tags. For example, if a user has 50 self-tags, 100 social tags, and of those tags, 20 are found in both types of tag, then the overlap in self is 20/50=0.4, and the overlap in social is 20/100=0.2. We examined the overlap across the users.
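The per-user overlap measure can be written as a short Python function; the usage lines reproduce the worked example above (50 self-tags, 100 social tags, 20 shared). Treating each user’s tags as sets of exact strings, and scoring a user with no tags of one type as 0, are assumptions, as the text does not say whether tags were normalized (e.g. for case or spelling) before matching.

```python
def overlap(self_tags: set[str], social_tags: set[str]) -> tuple[float, float]:
    """Return (overlap in self, overlap in social) for one user."""
    shared = len(self_tags & social_tags)  # tags found in both types
    # A user with no tags of one type gets 0 for that side (an assumption).
    in_self = shared / len(self_tags) if self_tags else 0.0
    in_social = shared / len(social_tags) if social_tags else 0.0
    return in_self, in_social


# Worked example from the text: 50 self-tags, 100 social tags, 20 in common.
self_tags = {f"tag{i}" for i in range(50)}         # tag0 .. tag49
social_tags = {f"tag{i}" for i in range(30, 130)}  # tag30 .. tag129
print(overlap(self_tags, social_tags))  # (0.4, 0.2)
```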
Figure 3 shows the overlap measure in both samples across the three categories: Technology, Environment, and Individual. Figure 3 reveals an inconsistency in overlap between the two samples. While the Technology category behaves similarly in both, the Environment and Individual categories vary. The explanation lies in the type and order of tagging. Self-tags are often Technology tags. In addition, self-tags are usually assigned before social tags, since social tags tend to be reactive [14]. The Technology tagging process is therefore likely to reach saturation and stability before the Environment and Individual categories, and this is reflected in the variability seen in Figure 3. The overall overlap found in this study is roughly 18–36%, whereas a recent study of book tags found overlap of approximately 12–20% [30].

Figure 3. Overlap between self and social tags across the three categories in both samples.
3.3. Tag popularity analysis
This part focuses on the most popular tags in each category and class and examines the level of agreement between self-presentation and perception by others. Table 3 can be used for comparing self and social tags.
Table 3. Top 10 most popular self and social tags in each of the categories (Samples 1 and 2).
Table 3 also allows a direct comparison of Sample 1 and Sample 2. The comparison shows that the smaller sample, Sample 2, contains 60–70% of the results of Sample 1 within the same tag type and category, with the exception of self-tags in the Environment category, where prediction reaches only 40%. Sample 2 can thus be seen as a preview or predictor of Sample 1. Looking at all the top 10 tags, self and social, prediction (similarity) stabilizes at about 70%. If one is interested only in the general direction of the system activity and user profiles, the parsimonious approach would be to code the smaller sample and assume 70% accuracy.
Examining the correlation between the mean number of self and social tags and class overlap we observe that in Sample 1 classes with a higher mean number of tags (self and social) have higher overlap (see Figure 4), while in Sample 2 the correlation between mean tag number and self and social tag overlap is weaker (see Figure 5). In addition, classes with narrow available vocabulary, for example ‘Organizational Group’ or ‘IBM Product’, have high overlap. Finally, the results in Figures 4 and 5 are compatible with the social proof theory, meaning that the more tags the more overlap and convergence.

Figure 4. Correlation between mean self and social tags and overlap in popular tags (Sample 1) (r = 0.67, p < 0.01).

Figure 5. Correlation between mean self and social tags and overlap in popular tags (Sample 2) (r = 0.41, p < 0.01).
Comparing self and social tags (presentation vs. perception) can be achieved using the tables in Appendices A and B. Overlap at k (ovl@k) is defined as the fraction of the k most popular self-tags that are also among the k most popular social tags. For example, an ovl@5 of 0.6 indicates three overlapping tags within the five most popular. ovl@1 directly indicates whether the most popular tag is identical for self and social tags.
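A minimal Python sketch of ovl@k follows. Inputs are lists with one entry per tag occurrence in a category or class, so that popularity is a simple frequency count; breaking ties alphabetically and dividing by k even when fewer than k distinct tags exist are assumptions not specified in the text. The usage lines reproduce the ovl@5 = 0.6 example with invented tags.

```python
from collections import Counter


def top_k(tags: list[str], k: int) -> list[str]:
    """The k most frequent tags; ties are broken alphabetically (an assumption)."""
    counts = Counter(tags)
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return ranked[:k]


def ovl_at_k(self_tags: list[str], social_tags: list[str], k: int) -> float:
    """Fraction of the k most popular self-tags also among the k most popular social tags."""
    shared = set(top_k(self_tags, k)) & set(top_k(social_tags, k))
    return len(shared) / k


SELF = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]    # top 5: a b c d e
SOCIAL = ["a"] * 5 + ["c"] * 4 + ["e"] * 3 + ["x"] * 2 + ["y"]  # top 5: a c e x y
print(ovl_at_k(SELF, SOCIAL, 5))  # 0.6 (a, c and e are shared)
print(ovl_at_k(SELF, SOCIAL, 1))  # 1.0 (same most popular tag)
```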
Appendix B shows the 10 most popular tags and the overlap in each of the categories, with overlap @10 in the range of 0.7–0.9. Overlap @1 is 1 for Sample 1 (highly active) but not for Sample 2 (average), indicating a convergence process that takes place as tagging increases. Appendix B shows the most popular self and social tags and the overlap in each class. Name and Training were excluded in this sample since they did not have enough tags for analysis. Overall, the ‘Individual’ category had the most variance in overlap indicating that this category and its classes may be the point of interest for studying the reflection of individual differences in impressions, whereas ‘Technology’ and ‘Environment’ may be more useful for studying convergence and agreement regarding impressions.
4. Discussion
This study set out to explore the terms people use to tag themselves (self-presentation) compared to the terms used by others (perception by others) in a corporate setting, based on data from a people tagging system. Because the data follow a power law distribution, it was not clear how to sample them. We opted to analyze two samples. Sample 1 consisted of the most active users, and Sample 2 represented the entire population using a proportional sample from a cross-tabs table of self vs. social tags. Sample 1 represents a full portrait of the users it contains, since the number of tags per user in this sample exceeds the tag saturation point of 12–14 tags [14]. Beyond the saturation point new information is scarce, and incoming information in the form of new tags is most likely to duplicate known information. Sample 2 is smaller, and therefore more economical; however, it reflects the situation prior to saturation, and in this regard its results can be regarded as indicative or transient.
The following discussion seeks to interpret the results by referring to three angles: impression management, sampling power law distributed data, and implications for system design.
4.1. Self-presentation vs. perception by others
The most active tagging category is Technology. Most users prefer to tag themselves with Technology tags, then with Environment tags, and finally with Individual tags. For social tags this tendency remains; however, the Individual category is more populated in Sample 1, suggesting that Individual-type social tags are practiced by the more enthusiastic system users. This is also an indication that the social process takes time to build. Self-presentation starts with the more factual Technology tags and later proceeds to social and personal descriptions. Both are active forms of participation (Figure 1); however, self-promotion involves initiation, while social tagging is likely to be responsive.
The self-to-social ratio shows that people are more comfortable tagging themselves with Technology tags (self-to-social ratio >1). Peers, on the other hand, are more likely to use tags from the Environment and Individual categories, adding social and personal dimensions to the tag cloud. Self-to-social tagging ratios were high in tag classes that involved private knowledge; for example, an employee would know more about the external products she uses than her peers would. Classes that attracted considerable activity (marked in bold in Appendix B) were attractive to self and social taggers alike. This is an example of social proof: people imitating others’ behaviors by conscious copying, since auto-complete was not available. Based on [14], we may speculate that self-tagging induced social tagging; however, the opposite direction is also possible. The direction of imitation is of secondary importance to the observation that social proof exists in the people tagging system, i.e. that existing tags are duplicated.
The extensive overlap between the top self and social tags across categories and classes provides further support for the social proof concept. In a hypothetical system one might expect zero overlap, since overlap reflects duplication of information, a redundancy. Why do people tag others with tags that have already appeared? To some extent this may represent oversight (people may not read the full profile of others before tagging them), or taggers might employ certain social tags as aids in contact management. However, the large extent of overlap present in our data indicates that duplicate tagging is systematic and is therefore likely to be an outcome of social proof. Social proof may offer the advantage of validating the initial tags, thereby reducing the negative connotation of redundancy. Yet another question emerges: what is the right balance between duplication and originality, and how can system designers achieve it? We address this question in section 4.3 about implications for tagging systems. Next, we compare the outcomes from the two samples in this study.
4.2. Comparing Sample 1 and Sample 2
Generally, users behave similarly in both samples with respect to the three parameters: tag volume and ratio, overlap analysis, and tag popularity. Sample 1 (281 users) represents the ‘head’ of the power law curve, a level of activity that is possible but currently practiced by only a small fraction of users. The present findings indicate that Sample 2, the average users, yields similar results, providing a good indication of what the results would be had the system attracted higher activity levels. From a research economy perspective, or an Occam’s razor approach, this is encouraging: despite the extreme shape of the power law curve, a parsimonious representative sample suffices to learn about users’ activity and content production. Note that ‘representative’ in the current work means a constant proportion drawn from each cell of a cross-tabs table of the full population. We recommend this kind of representative sample for data following a power law curve, as opposed to a completely random sample, which would be highly skewed toward users with low activity.
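A minimal sketch of the constant-proportion sampling described above follows. Users are cross-tabulated by binned self-tag and social-tag counts, and the same fraction is drawn from every cell. The bin edges, the sampling fraction, the rounding rule, and the function names are illustrative assumptions. The paper does not specify them here.

```python
import bisect
import random
from collections import defaultdict

# Hypothetical lower bounds of the activity bins: 0, 1, 2-5, 6-10, 11+ tags.
BIN_EDGES = (1, 2, 6, 11)


def activity_bin(n_tags, edges=BIN_EDGES):
    """Index of the activity bin that a tag count falls into."""
    return bisect.bisect_right(edges, n_tags)


def proportional_stratified_sample(users, fraction, seed=0):
    """Draw the same fraction of users from every cell of the cross-tab.

    `users` maps a user id to (self_tag_count, social_tag_count).
    Cell sizes are rounded to the nearest integer, so very small cells
    may contribute no users.
    """
    cells = defaultdict(list)
    for user_id, (n_self, n_social) in users.items():
        cells[(activity_bin(n_self), activity_bin(n_social))].append(user_id)
    rng = random.Random(seed)
    sample = []
    for cell in sorted(cells):
        members = sorted(cells[cell])  # sorted so the draw is reproducible
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample


# Toy population: 100 low-activity users and 20 highly active users.
population = {f"a{i}": (0, 1) for i in range(100)}
population.update({f"b{i}": (12, 15) for i in range(20)})
picked = proportional_stratified_sample(population, 0.1, seed=1)
print(len(picked))  # 12: 10 low-activity users and 2 highly active users
```

Because every cell is sampled at the same rate, the sample keeps the population's proportions across activity levels. Each cell is still drawn at random, so the sample covers both low- and high-activity users in known proportions.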
Sample 1 is 1.45 times larger than Sample 2 in terms of the number of tags analyzed, and its users had more than double the average number of self and social tags of Sample 2 users. Nevertheless, Sample 1 contains more social tags, so if the social aspect is central to an analysis, Sample 1 should be the focus.
Technology tags were the most useful for self-presentation as well as for social tagging in the corporate people tagging system, followed by Environment and then Individual tags. The prominence of Technology tags may be related to the company’s line of business. In other companies, Technology tags might be replaced by tags describing that company’s particular area of activity. Interestingly, the use of Individual tags as social tags was more pronounced in Sample 1, implying that system designers interested in personal descriptions need to devise ways to encourage social tagging.
Most of the differences (Table 3 and Figure 3) are explained by the nature of the two samples: when the volume of self and social tags is low, users prefer tags from the Environment and Technology categories over those from the Individual category (see Table 4). In Sample 2 there are 116 users (25% of the sample) with one self-tag and 105 users (22%) with one social tag. Analyzing these users indicates that for self-tags Environment is the most active category, whereas for social tags it is Technology. In addition, Environment is the only category in which the volume of self-tags exceeds the volume of social tags.
Table 4. Mean tags per user in the three categories for users with one self-tag or one social tag (Sample 2, N = 221).
To compare the samples from another angle, we analyzed the data of the most active users in Sample 2. This sub-sample resembles Sample 1 in that it contains users with 11 or more self-tags or 11 or more social tags. The data reveal that, for both self and social tags, the use of tags from the Individual category increases when the volume of tags is high. The effect is even stronger for social tags, where the mean number of Individual tags exceeds that of Environment tags. Technology remains the most active category.
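The comparisons behind Tables 4 and 5 amount to selecting users by activity level and averaging their per-category tag counts. A minimal sketch follows. The profile layout, function names, and toy profiles are hypothetical. The two predicates encode the selection rules stated in the text: one self-tag or one social tag, and 11 or more self-tags or 11 or more social tags.

```python
from statistics import mean

CATEGORIES = ("Environment", "Technology", "Individual")


def category_means(profiles, kind, keep):
    """Mean number of `kind` tags per selected user in each category.

    `profiles` maps a user id to {"self": {category: count},
    "social": {category: count}}. `kind` is "self" or "social", and `keep`
    is a predicate on (n_self, n_social) that selects the sub-sample.
    """
    selected = []
    for profile in profiles.values():
        n_self = sum(profile["self"].values())
        n_social = sum(profile["social"].values())
        if keep(n_self, n_social):
            selected.append(profile[kind])
    if not selected:
        return {}
    return {c: mean(p.get(c, 0) for p in selected) for c in CATEGORIES}


def one_tag(n_self, n_social):
    """Users with exactly one self-tag or exactly one social tag."""
    return n_self == 1 or n_social == 1


def highly_active(n_self, n_social, threshold=11):
    """Users with 11 or more self-tags or 11 or more social tags."""
    return n_self >= threshold or n_social >= threshold


# Hypothetical toy profiles, not data from the study.
profiles = {
    "u1": {"self": {"Environment": 1}, "social": {}},
    "u2": {"self": {"Technology": 8, "Individual": 4},
           "social": {"Technology": 6, "Individual": 5, "Environment": 1}},
    "u3": {"self": {"Technology": 2}, "social": {"Individual": 1}},
}
print(category_means(profiles, "self", one_tag))
# {'Environment': 0.5, 'Technology': 1, 'Individual': 0}
print(category_means(profiles, "social", highly_active))
# {'Environment': 1, 'Technology': 6, 'Individual': 5}
```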
Comparing infrequent and frequent system users within Sample 2 (Tables 4 and 5) indicates that tag choice appears random for infrequent users, whereas frequent users resemble Sample 1 more closely than Sample 2 as a whole does (Table 2). Overall, it seems that many people try the system by assigning a random tag. Some of them continue using it, and their subsequent tagging choices indicate awareness of existing tagging practices. To economize on research resources, we advise following our average-users sampling technique (Sample 2) and then focusing the analysis on the most active users in this sample, as in Table 5.
Table 5. Mean tags per user in the three categories for users with 11 or more self-tags or 11 or more social tags (Sample 2, N = 150).
4.3. Implications for tagging system design
Terminological overlap in Sample 1 was 29% on average. Assuming that this sample reached tag saturation, we infer that a ratio of approximately 70:30 original to duplicate tags is a reasonable outcome for a social tagging system that seeks to balance originality with validity and to produce a wealth of information about the object of tagging. This overall recommendation may be too coarse, however. For Individual tags, a different ratio (such as 85:15) might be desirable in order to elicit a higher rate of unique tags, and further fine-tuning for specific classes may also be considered.
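The 70:30 figure follows from the measured overlap: duplicates at about 29% leave roughly 71% original tags. The sketch below computes a profile's duplicate share and checks it against a target, including the stricter 85:15 split mentioned above for Individual tags. Counting any case-insensitive repeat on the same profile as a duplicate is our simplifying assumption. The names and the toy profile are hypothetical.

```python
# Hypothetical targets: maximum share of duplicate tags on a profile, i.e.
# 70:30 original to duplicate overall and 85:15 for Individual tags.
DEFAULT_MAX_DUPLICATE_SHARE = 0.30
MAX_DUPLICATE_SHARE = {"Individual": 0.15}


def duplicate_share(tag_sequence):
    """Fraction of tag assignments that repeat a tag already on the profile.

    `tag_sequence` lists the tags assigned to one person in the order they
    were given (self-tags and social tags together); comparison ignores case.
    """
    seen = set()
    duplicates = 0
    for tag in tag_sequence:
        key = tag.lower()
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates / len(tag_sequence) if tag_sequence else 0.0


def within_target(tag_sequence, category=None):
    """True if the duplicate share does not exceed the target for `category`."""
    limit = MAX_DUPLICATE_SHARE.get(category, DEFAULT_MAX_DUPLICATE_SHARE)
    return duplicate_share(tag_sequence) <= limit


profile = ["java", "linux", "Java", "soa", "web20",
           "linux", "ajax", "soa", "portal", "mentor"]
print(duplicate_share(profile))              # 0.3 (7 original : 3 duplicate)
print(within_target(profile))                # True under the 70:30 target
print(within_target(profile, "Individual"))  # False under the 85:15 target
```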
System designers may want to steer users toward the recommended ratio by providing a relevant statement within the system or by developing a feedback mechanism that indicates whether a given tag is a duplicate or original, and which tag would most improve the profile next. Adding such a feedback mechanism may have a further benefit: inducing additional activity. Feedback would make the system interactive, thereby encouraging some passive users to become active. Specific feedback, such as inviting tags of a particular class, could be implemented to enrich the tag cloud. Another mechanism that may induce activity is a controlled vocabulary offered together with an indication that free text is equally welcome. A controlled vocabulary would aid validation by preventing typos and might increase variety by suggesting additional terms. The system could also be enhanced by adding a timeline to the tagging process, allowing a personal history to form. This could inform the company and system users about career paths.
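A feedback mechanism of the kind described above could be as simple as two checks. The first tests a proposed tag against the profile before it is saved. The second, separately, invites tags from the least-represented class. The sketch below is hypothetical: the function names, the 30% ceiling, and the "fewest tags" rule for choosing a class are illustrative assumptions, not features of the system studied.

```python
def tag_feedback(profile_tags, proposed_tag, max_duplicate_share=0.30):
    """Describe how a proposed tag would affect a profile.

    Returns a dict with:
      duplicate: True if the tag (ignoring case) is already on the profile
      share_after: duplicate share of the profile if the tag is added
      over_target: True if share_after exceeds max_duplicate_share
    """
    distinct = {t.lower() for t in profile_tags}
    n_duplicates = len(profile_tags) - len(distinct)
    is_duplicate = proposed_tag.lower() in distinct
    share_after = (n_duplicates + int(is_duplicate)) / (len(profile_tags) + 1)
    return {
        "duplicate": is_duplicate,
        "share_after": share_after,
        "over_target": share_after > max_duplicate_share,
    }


def class_to_invite(class_counts, classes):
    """Pick the class with the fewest tags on the profile (ties: first listed)."""
    return min(classes, key=lambda c: class_counts.get(c, 0))


existing = ["java", "linux", "Java"]
print(tag_feedback(existing, "LINUX"))   # duplicate, share_after 0.5, over target
print(tag_feedback(existing, "mentor"))  # original, share_after 0.25
print(class_to_invite({"Technology": 5, "Environment": 2},
                      ["Technology", "Environment", "Individual"]))  # Individual
```

Returning structured feedback rather than a message leaves the wording of the prompt to the interface, which could, for instance, suggest controlled-vocabulary terms from the invited class.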
The corporate people tagging system design described in section 2.1 reflects a philosophy of providing an open and flexible platform. It is possible that the complete freedom given to users contributes to low user activity. Our interpretation of the results leads us to recommend that crowd-based systems include a moderate level of prompting and feedback, or even a careful selection of gamification elements, in order to strengthen both participation and the community aspects of such systems. The frequency of prompting could be a function of activity, so as to target only people for whom a nudge might be effective.
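Making the prompting frequency a function of activity could look like the following sketch. The policy is entirely hypothetical. It leaves users with 11 or more tags unprompted, reusing the threshold the paper applies to highly active users. It also spaces prompts further apart as users contribute more, on the assumption that lightly active users are the ones a nudge is most likely to help. All thresholds and intervals are placeholders to be tuned.

```python
def prompt_interval_days(n_tags, active_threshold=11, base_days=7):
    """Days between prompts for a user with `n_tags` tags, or None for no prompts.

    Hypothetical policy: users at or above `active_threshold` tags are left
    alone; everyone else is prompted every `base_days` days, with the interval
    growing by `base_days` for every 5 tags already contributed.
    """
    if n_tags >= active_threshold:
        return None
    return base_days * (1 + n_tags // 5)


def due_for_prompt(n_tags, days_since_last_prompt):
    """True if a user should receive a prompt now under the policy above."""
    interval = prompt_interval_days(n_tags)
    return interval is not None and days_since_last_prompt >= interval


for n in (0, 3, 7, 12):
    print(n, prompt_interval_days(n))
# 0 7
# 3 7
# 7 14
# 12 None
```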
4.4. Limitations and future research
A strength of the current study is its use of an unobtrusive research method to learn about users’ activity in their natural work environment. However, some questions remain unanswered by this approach and would benefit from follow-up research using questionnaires and interviews. For example, asking users why they select specific tags for themselves and for others could yield deep insights into impression management strategies. The users in the study came from different units within the organization, such as Sales, R&D, Services, and Corporate. Comparing tagging behavior across organizational units could offer additional insights into tagging in a large enterprise. Interviewing sporadic users about why they use the system infrequently could inform the development of a more engaging interface. The people tagging system is a good candidate for future user studies and usability assessments. Our interest here, however, was in the social aspects of use rather than in design aspects.
4.5. Conclusion
While Sample 2 is more economical to analyze, Sample 1 provides a richer view of self-presentation and perception by colleagues at work. Tagging systems are often referred to as ‘social media’, yet the present results show that social processes in the system emerge only after considerable individual activity. Most users remain passive beyond sporadic participation. To economize on research resources, we recommend sampling average users (as in Sample 2) and then focusing the analysis on the most active users within that sample.
Self-presentation tends to be factual, focusing on Technology tags, whereas perception by others, expressed through social tagging, contributes the social and personal tags. As tagging proceeds, social tagging increases and the information about each user deepens. If system designers and organizations are interested in more than mere contact management or self-promotion, they should develop interactive functions in the people tagging system to encourage sustained participation. Higher engagement should lead to richer descriptions and greater system value.
Footnotes
Appendix
Top 10 most popular tags and the overlap in each of the sub-categories.
Sample 1

| Sub-category | Top 10 self-tags | Top 10 social tags | ovl@1 | ovl@5 | ovl@10 |
|---|---|---|---|---|---|
| Information Tech | Linux | Linux | 1 | 0.6 | 0.6 |
| | Java | Java | | | |
| | Css | Visualization | | | |
| | linux-desktop | linux-desktop | | | |
| | Rfid | Green | | | |
| | Architecture | Architecture | | | |
| | information-architecture | Twe | | | |
| | Uml | Actionscript | | | |
| | Actionscript | complexity-method | | | |
| | Green | Css | | | |
| Internet Tech | web20 | web20 | 1 | 0.8 | 0.7 |
| | Secondlife | Secondlife | | | |
| | Socialnetworking | Socialnetworking | | | |
| | Soa | Soa | | | |
| | Ajax | second-life | | | |
| | Web | Web | | | |
| | Javascript | web-20 | | | |
| | open-source | Ajax | | | |
| | j2ee | Metaverse | | | |
| | Mashup | open-source | | | |
| Theme | Collaboration | Collaboration | 1 | 0.8 | 0.8 |
| | Innovation | Innovation | | | |
| | social-networking | Work | | | |
| | social-computing | social-networking | | | |
| | social-software | social-computing | | | |
| | user-experience | user-experience | | | |
| | virtual-worlds | Km | | | |
| | Mobile | Mobile | | | |
| | Usability | social-software | | | |
| | Km | Socialsoftware | | | |
| IBM product | websphere-portal | lotus-connections | 0 | 0.8 | 0.9 |
| | lotus-connections | Sametime | | | |
| | lotus-notes | websphere-portal | | | |
| | Domino | lotus-notes | | | |
| | Quickr | Domino | | | |
| | Sametime | Quickr | | | |
| | Learning | Wplc | | | |
| | Odw | Learning | | | |
| | Thinkplace | Odw | | | |
| | Tap | Tap | | | |
| External product | | | 1 | 0.6 | 0.4 |
| | Eclipse | Mac | | | |
| | Mac | Moleskine | | | |
| | Ubuntu | Ubuntu | | | |
| | Macintosh | Websheets | | | |
| | Iphone | Eclipse | | | |
| | Vmware | foxray-xbound | | | |
| | Exchange | mac-at-ibm | | | |
| | Photoshop | Abb | | | |
| | Suse | Audacity | | | |
| Project | Dogear | Beehive | 0 | 0.4 | 0.5 |
| | Bluepedia | Bluepoints | | | |
| | Beehive | Ets | | | |
| | Big-green | best-of-blue | | | |
| | collaborationcentral | Dogear | | | |
| | Eagle | collaborationcentral | | | |
| | Ets | Ebo | | | |
| | best-of-blue | luana-related | | | |
| | blue20 | project-wookie | | | |
| | Bluewiki | Tommy | | | |
| Group affiliation | web20forbiz | web20forbiz | 1 | 0.8 | 0.8 |
| | Vuccore | Blueiq | | | |
| | Blueiq | Vuccore | | | |
| | blueiq-ambassador | blueiq-ambassador | | | |
| | taggingsummit2006 | Cio | | | |
| | Hackday | Csi | | | |
| | enterprise20 | Hackday | | | |
| | Tec | taggingsummit2006 | | | |
| | web20summit08 | web20summit08 | | | |
| | Cio | Communities | | | |
| Org group | Lotus | Lotus | 1 | 0.8 | 0.9 |
| | Websphere | Gbs | | | |
| | Swg | Sales | | | |
| | Gbs | Swg | | | |
| | Design | Websphere | | | |
| | Sales | Strategy | | | |
| | Communications | Design | | | |
| | Software | Research | | | |
| | Strategy | Software | | | |
| | Marketing | Communications | | | |
| Country | Germany | Germany | 1 | 1 | 0.8 |
| | Australia | Uk | | | |
| | Austria | Austria | | | |
| | Canada | Canada | | | |
| | Uk | Australia | | | |
| | Europe | Brazil | | | |
| | Cemaas | Cemaas | | | |
| | Asean | India | | | |
| | Brazil | Malta | | | |
| | Norway | Norway | | | |
| City | Hamburg | Hamburg | 1 | 0.6 | 0.4 |
| | Hursley | Hursley | | | |
| | Stuttgart | Berlin | | | |
| | Berlin | Vancouver | | | |
| | Dresden | Nyw | | | |
| | Hannover | Toronto | | | |
| | Boston | Vienna | | | |
| | Cambridge | Boeblingen | | | |
| | London | Boston | | | |
| | Melbourne | Nyc | | | |
| Customer | Ebic | Ebic | 1 | 0.4 | 0.3 |
| | Government | deutsche-bank | | | |
| | Itil | Siemens | | | |
| | Siemens | ascent-capture | | | |
| | Aegon | Ericsson | | | |
| | ascent-capture | Foursight | | | |
| | Basf | Ibv | | | |
| | Bayer | Ing-diba | | | |
| | Chevron | p-dakim | | | |
| | Eads | ppl | | | |
| Organization | Cisco | cisco | 1 | 0.4 | 0.7 |
| | Sap | apple | | | |
| | Apple | | | | |
| | Cynefin | academy | | | |
| | alcatel-lucent | cynefin | | | |
| | Cvi | microsoft | | | |
| | Daimler | Cvi | | | |
| | Drupal | Drupal | | | |
| | Ducati | Autoid | | | |
| | Microsoft | Daimler | | | |
| Job description | Architect | architect | 1 | 0.6 | 0.5 |
| | tech-sales | tech-sales | | | |
| | Manager | manager | | | |
| | Consultant | Tap-innovator | | | |
| | Executive | presales | | | |
| | Hcm | developer | | | |
| | w3-editor | distinguished-engineer | | | |
| | Webdesign | Designer | | | |
| | Editor | Executive | | | |
| | Designer | As-delivery | | | |
| Special skill | Blogger | blogger | 1 | 0.4 | 0.6 |
| | Mentor | mentor | | | |
| | Communitybuilder | 5live-speaker | | | |
| | Inventor | Communitybuilder | | | |
| | thinkplacecatalyst | Blogcentral | | | |
| | Blogcentral | Inventor | | | |
| | portal-evangelist | quickr-expert | | | |
| | Webmaster | Author | | | |
| | Podcasting | Hacker | | | |
| | Speaker | portal-evangelist | | | |
| Hobby | Photography | Biker | 0 | 0.4 | 0.3 |
| | Writer | Photography | | | |
| | Soccer | Writer | | | |
| | Video | Fuwa | | | |
| | Climbing | Wii | | | |
| | Gaming | battle-of-bands | | | |
| | Golf | Fresh | | | |
| | Jazz | Art | | | |
| | Sailing | Diver | | | |
| | Snowboard | Golf | | | |
| Personal adj | Evangelist | Innovator | 0 | 0.4 | 0.7 |
| | early-adopter | Evangelist | | | |
| | Competitive | early-adopter | | | |
| | Leadership | Guru | | | |
| | portal-expert | Catalyst | | | |
| | Catalyst | portal-expert | | | |
| | Innovator | Competitive | | | |
| | Education | Creative | | | |
| | Inputaccel | Leadership | | | |
| | Support | Friend | | | |
Sample 2

| Sub-category | Popular self-tags | Popular social tags | ovl@1 | ovl@5 | ovl@10 |
|---|---|---|---|---|---|
| Information Tech | Java | Linux | 0 | 0.4 | 0.5 |
| | Linux | Java | | | |
| | Architecture | Rfid | | | |
| | Gic | Twe | | | |
| | Uml | Aot | | | |
| | Rfid | energy-utilities-w3-toolbar-links(1) | | | |
| | Sonar | Autoid | | | |
| | domino-infrastructure | Standards | | | |
| | Server | Architecture | | | |
| | Client | Sonar | | | |
| Internet Tech | web20 | web20 | 1 | 0.6 | 0.7 |
| | Secondlife | secondlife | | | |
| | Soa | Soa | | | |
| | j2e | open-source | | | |
| | Ajax | Web | | | |
| | Web | metaverse | | | |
| | open-source | j2e | | | |
| | Javascript | Ajax | | | |
| | virtual-worlds | blogcentral | | | |
| | Php | iphone-blog | | | |
| Theme | Socialnetworking | Collaboration | 0 | 0.8 | 0.7 |
| | Collaboration | Work | | | |
| | Innovation | social-networking | | | |
| | social-computing | social-computing | | | |
| | social-software | social-software | | | |
| | Security | user experience | | | |
| | user experience | Innovation | | | |
| | Knowledgemanagement | Tagging | | | |
| | Virtualization | Mobile | | | |
| | Work | Communication | | | |
| IBM product | Portal | Portal | 1 | 0.8 | 0.6 |
| | lotus-connection | lotus-connection | | | |
| | Notes | Quickr | | | |
| | Quickr | Sametime | | | |
| | Sametime | Domino | | | |
| | Domino | Learning | | | |
| | websphere-application-server | Odw | | | |
| | Learning | Ibmcom | | | |
| | Agile | System-z | | | |
| | Ecm | Tap | | | |
| External product | Eclipse | Mac | 1 | 0.6 | 0.5 |
| | Eclipse | | | | |
| | Iphone | Iphone | | | |
| | Dogear | Moleskine | | | |
| | Dogear | | | | |
| | Mysql | Obuntu | | | |
| | Photoshop | Mozila | | | |
| | Firefox | Ais | | | |
| | Blackberry | 2010outlook(1) | | | |
| | Mac | | | | |
| Project | Fringe | Fringe | 1 | 0.6 | 0.6 |
| | Beehive | Bluepedia | | | |
| | project zero | Ets | | | |
| | Bluepedia | p-vista | | | |
| | Ets | p-profi | | | |
| | Eagle | Bluemail | | | |
| | p-vista | Relmap | | | |
| | Koala | Eagle | | | |
| | Dmtf | Dmtf | | | |
| | Rexx | Project-puls | | | |
| Group affiliation | Vuc | web20forbiz | 0 | 0.6 | 0.6 |
| | web20forbiz | research-software-strategy-meeting-2006(1) | | | |
| | Blueiq | Vuc | | | |
| | global-innovation-community | taggingsummit2006 | | | |
| | domino-administration | Blueiq | | | |
| | Kcblue | pms-contacts | | | |
| | it-architects | web20summit08(1) | | | |
| | web20summit08(1) | Swat | | | |
| | taggingsummit2006 | web20tic(1) | | | |
| | Swat | z-community | | | |
| Org group | Lotus | Lotus | 1 | 0.6 | 0.8 |
| | Websphere | Swg | | | |
| | Rational | Gbs | | | |
| | Tivoli | websphere | | | |
| | Sales | Rational | | | |
| | Marketing | Issl | | | |
| | Swg | Tivoli | | | |
| | Strategy | Strategy | | | |
| | Gbs | Sales | | | |
| | Services | Research | | | |
| Country | Canada | germany | 0 | 0.4 | 0.7 |
| | Italy | Austria | | | |
| | Sweden | UK | | | |
| | UK | Italy | | | |
| | Europe | Brazil | | | |
| | Switzerland | USA | | | |
| | India | India | | | |
| | Australia | Canada | | | |
| | Germany | Sweden | | | |
| | Hungary | Europe | | | |
| City | new-york | Hamburg | 0 | 0.6 | 0.6 |
| | Hamburg | Hursley | | | |
| | Wimbledon | New York | | | |
| | Bremen | Phoenix | | | |
| | Phoenix | Toronto | | | |
| | New Jersey | Rome | | | |
| | Toronto | Boeblingen | | | |
| | Pittsburg | Wimbledon | | | |
| | Vancouver | Mainz | | | |
| | Maintz | Atlanta | | | |
| Customer | Itil | Bowstreet | 0 | 0.4 | 0.5 |
| | Optim | Marist | | | |
| | Vodafone | eon-is | | | |
| | Marist | Datev | | | |
| | Government | Vodafone | | | |
| | Ebic | Government | | | |
| | Rails | Rails | | | |
| | Gap | Citybank | | | |
| | Eon | Citygroup | | | |
| | Avaya | Philips | | | |
| Organization | SAP | Sap | 1 | 0.4 | 0.5 |
| | Ibm | Ibm | | | |
| | Oracle | Cosi | | | |
| | Microsoft | Apple | | | |
| | Unix | Usaid | | | |
| | Adobe | Pubdis | | | |
| | Academy | Cisco | | | |
| | Issw | Isst | | | |
| | Audi | Issw | | | |
| | Isst | Microsoft | | | |
| Job description | Architect | Tech sales | 0 | 0.6 | 0.8 |
| | Tech sales | tap-innovator | | | |
| | Manager | Architect | | | |
| | Communitybuilder | Manager | | | |
| | Project manager | Developer | | | |
| | Swita | Project manager | | | |
| | Consultant | Communitybuilder | | | |
| | Developer | Director | | | |
| | Designer | Swita | | | |
| | Programmer | Designer | | | |
| Special skill | Blogger | Blogger | 1 | 0.8 | 0.6 |
| | Mentor | Mentor | | | |
| | Inventor | Inventor | | | |
| | Coaching | Hacker | | | |
| | Speaker | Speaker | | | |
| | Photographer | Coaching | | | |
| | Hacker | Blogmaster | | | |
| | Catalyst | Techgen | | | |
| | Presenter | Mindmapper | | | |
| | java-performance | Dogmaster | | | |
| Hobby | Jazz | Coffe | 0 | 0.2 | 0.2 |
| | Photography | Sea | | | |
| | Sailing | Wii | | | |
| | Golf | Scuba | | | |
| | Sea | Guitar | | | |
| | Soccer | Sailing | | | |
| | Squash | Cycling | | | |
| | Skiing | Cook | | | |
| | Guitar | Climbing | | | |
| | Symphony | Snowboard | | | |
| Personal adj | Compliance | Innovator | 0 | 0.4 | 0.5 |
| | portal-expert | portal expert | | | |
| | domino-expert | Friends | | | |
| | Creative | Leader | | | |
| | Pervasive | Creative | | | |
| | Competitive | Evangelist | | | |
| | Evangelist | Partner | | | |
| | community-leader | Participant | | | |
| | architectural-thinking | Early adopter | | | |
| | Early adopter | Domino-expert | | | |
Acknowledgements
Ido Guy participated in this research while working at IBM Research.
Funding
Supported by an IBM Open Collaborative Research grant.
