Abstract
Drawing on theories from the sociology of work and the sociology of culture, this article argues that members of nascent technical occupations construct their professional identity and claim status through an omnivorous approach to skills acquisition. Based on a discursive analysis of 56 semi-structured in-depth interviews with data scientists, data science professors and managers in Israel, it was found that data scientists mobilise the following five resources to construct their identity: (1) ability to bridge the gap between scientist’s and engineer’s identities; (2) multiplicity of theories; (3) intensive self-learning; (4) bridging technical and social skills; and (5) acquiring domain knowledge easily. These resources diverge from former generalist-specialist identity tensions described in the literature as they attribute a higher status to the generalist-omnivore and a lower one to the specialist-snob.
Introduction
Professional-expert identity in traditional professions is usually based on specialisation (Scott, 2008). Expertise and work-related status are associated with a long process of skill acquisition and a focus on specific problems. For example, doctors are trained to focus on sub-specialities of health problems, attorneys on sub-fields of legal matters and accountants on specialised types of financial reporting. Consequently, generalist practitioners, especially in high-status professions such as medicine, endure a status penalty (Kyratsis et al., 2017). This status penalty is so severe that the medical literature has identified it as a cause of the shortage of generalist physicians relative to specialists (Steiner and Stoken, 1995). Accordingly, Abbott (2001) considers interstitial-generalist professions, which apply their expertise widely and comprise an assemblage of skills and knowledge domains, especially vulnerable to encroachment.
However, studies in various occupational settings have observed that in different occupational socio-cultural contexts, the meaning and content of ‘expert’ and ‘professional’ are constructed differently (Barley et al., 2016; Carollo and Solari, 2019; Evetts, 2013; Fournier, 1999; Larkin, 2003). In Bourdieu’s words, an occupational socio-cultural context is ‘a world apart’ (1990a), and its construction affects both the specific occupation and cluster of occupations surrounding it (Scott, 2008). Furthermore, studies of nascent technical occupations point to a change in the way expert identities are constructed in the new technical elites, specifically in terms of skills (Damarin, 2006; Darr, 2015; Glover, 2013; Liebeskind et al., 1996). Contributing to this discussion, the current study argues that, unlike traditional professions and even 20th-century engineering, current technical elite occupations base the ‘expert’ identity on generalism rather than specialisation.
Proceeding from the assumption that leisure and work practices cannot be analytically separated and the logic of practice in both work and leisure is influenced by similar processes (Bourdieu, 1990b), this study imports the term ‘omnivore’, with its elite status associations and related practices, from the sociology of culture to the sociology of work. It is argued that the same logic that grants omnivores prestige and status in the realm of culture and leisure consumption grants status to individuals and occupations of the new technical elites. Omnivorousness includes: (a) a wide range of inclusive practices; (b) frequent movement between practices; and (c) selective tolerance, meaning accepting some practices that are considered ‘low’ while preserving a basic hierarchy and distinction between practices (Bryson, 1996; Peterson and Kern, 1996; Sullivan and Katz-Gerro, 2006).
The findings of this study demonstrate that data scientists, as a case study for elite technical occupations, employ an omnivorous strategy to construct their identity in regard to skills. In their identity work, they emphasise: (a) the breadth of skills, including computer science, maths and statistics, business skills, social skills and expertise in the knowledge domains supplying the data for their work, such as medicine, law or finance; (b) the intensity of learning, including frequent and independent training; and (c) the mixture between what they consider low and high levels of skills – for example, social versus mathematical/technical skills, or domain knowledge versus algorithmic knowledge. The data scientists’ identity work regarding skills is based on generalism, utilises frequent crossing of disciplinary boundaries and spans multiple knowledge domains beyond the technical and mathematical, as a status marker.
Generalist-specialist tensions in the sociology of work
The impact of specialisation on workers’ status is an important question in the sociology of work (Freidson, 1976). According to Durkheim, the differentiation of skills is the main process in the modernisation of society. Inspired by Durkheim, the professions literature allocates a high status to esoteric and specialised skills (Scott, 2008). Conversely, the neo-Marxist literature links the detailed division of labour with the process of deskilling workers, which decreases their status and control over their labour (Braverman, 1974). According to the labour process theory, skill development and training opportunities are shaped by capital and management through job design and are becoming increasingly detailed. Consequently, from the perspective of deskilling, flexible specialisation is perceived as a marker of low-status production-line occupations, which are subjected to managerial control (Piore and Sabel, 1986; Smith, 1997).
However, as the neo-Weberian tradition and work practice approach have repeatedly argued (Freidson, 1971; Van Maanen and Barley, 1984), occupational communities, especially in technical occupations, constitute the main context in which work content, training and skills are shaped (O’Mahony and Bechky, 2008; Orr, 1996). According to this perspective, work-related identities and corresponding skills are largely constructed within occupational socio-cultural contexts, rather than in capitalist and bureaucratic ones. These tensions in the sociological inquiry between specialised and non-specialised, low-level and high-level skill sets and between the occupational and managerial shaping of skills have been framed as ‘the deskilling controversy’ (Attewell, 1987, 1990).
Accordingly, when sociologists of work discuss the post-industrial society (Bell, 1976), the question of skills in the context of the new technical elites is a matter of debate. Whereas some scholars have outlined the polarisation of skills in terms of knowledge-work versus service-work in western countries (Autor et al., 2008), other scholars have maintained that skills in the post-industrial society and, specifically, in technical occupations, are becoming networked, hybrid, modular and flexible. The large body of literature on knowledge workers documents that new technical elites value autonomy, learning, self-expression and working in peer networks to benefit from cooperation and collaboration (Darr and Warhurst, 2008; Liebeskind et al., 1996). Glover (2013) and Darr (2015) argue that some technical knowledge workers are characterised by a hybrid division of labour, combining technical and social skills. Damarin (2006) observes that workers in the field of website production have a modular skill set and switch between different skills. They define their work as internet production rather than limiting it to a specific area of expertise. Recently, STEM (science, technology, engineering and mathematics) professionals have been described as having ‘T-shaped’ skills, indicating both depth and breadth of skills (Conley et al., 2017). Specifically, data scientists have been described as having ‘M-shaped’ skill sets, indicating a multiplicity of skills (Fiore-Gartland and Tanweer, 2016).
All these studies about networks, hybridity, modularity and multiplicity of skills in technical work point to a logic of generalism in the new technical elites. However, none of these concepts convey the utilisation of breadth and intensity in skills acquisition as a professional status marker. Taking data scientists’ identity work as its case, the current study contributes to these discussions by importing the concept of omnivorousness from the sociology of culture to the sociology of work as a unifying metaphor that conveys status.
Omnivorousness in the sociology of culture and sociology of work
In the sociology of culture, Bourdieu (1984) argues that groups define themselves through the social mechanism of distinction. Referring to taste, distinction means that the elite shuns the cultural components identified with the lower classes and only practises what it considers as its unique practices. Yet, Bourdieu’s research is not limited to leisure and considers a variety of occupational fields (Bourdieu, 1986; Bourdieu and Whiteside, 1996). There, Bourdieu describes the occupational habitus as adopting certain forms of thinking and behaviour while rejecting others. In neo-Weberian terms (Van Maanen and Barley, 1984), the division of labour and skills into different occupational communities results in the identity demarcation of symbolic boundaries, a ‘consciousness of kind’ and a ‘consciousness of difference’ regarding skills. This means that through occupational socialisation, the individual slowly identifies with an occupational specialisation and its interpretation of tasks and problems while rejecting other interpretations, mainly of other nearby occupations. For example, technicians reject engineers’ views on building machines (Bechky, 2003) and surgeons reject gastroenterologists’ views on treating kidney diseases (Zetka, 2001). In the past, snobbism and rejection-motivated distinctions have been the main features of symbolic boundaries.
However, since the 1990s, the omnivorous thesis has opposed Bourdieu’s paradigm of symbolic boundaries (Peterson and Kern, 1996). According to this thesis, rejection and exclusiveness are not the basis of distinction. The elites and those aspiring to be considered elite exhibit omnivorousness in their cultural tastes, in opposition to the snobbism of the elites before the mid-20th century. Omnivorousness refers to three qualities of taste: (1) a wide range of inclusive cultural preferences, such as enjoying classical music and pop music (Peterson and Kern, 1996); (2) frequent consumption of cultural products and practices, also referred to as ‘voracious cultural consumption’ (García-Álvarez et al., 2007); and (3) selective tolerance, namely the acceptance of some cultural products and practices that are considered ‘low’, while preserving a basic hierarchy and distinction in taste (Bryson, 1996). This shift from snobbery to omnivorousness can be observed across all areas of cultural consumption; for example, musical genres, gastronomical preference and patterns of consuming art (Bryson, 1996; Johnston and Baumann, 2007). Unlike snobs, omnivores develop a wide array of tastes, subscribe to a wide range of cultural practices, mix elite and popular culture and seldom reject practices of the available cultural repertoire.
Interestingly, a narrow and exclusive taste has become a marker of the lower classes (Bryson, 1997). The shift in the differential mechanism of culture and leisure consumption has been ascribed to the democratic and liberal values of the elites in developed countries. For sociologists of culture, this change in taste has affected the construction and performance of elitism. However, in the sociology of work and specifically in the literature on skills, omnivorousness has scarcely been utilised.
In an initial attempt to link omnivorousness to the labour market, Koppman (2016) finds that workers with omnivorous tastes seek creative jobs and are more easily accepted for such positions. Koppman studied the advertising industry and identified a link between omnivorous taste cultivated in an individual since childhood and the desire and chances of obtaining a creative job in advertising. Another link between omnivorousness and the labour market is suggested by Sullivan and Katz-Gerro (2006), who argue that voraciousness in cultural consumption echoes work feelings of ‘being busy’, ‘in a rush’ and working long hours. However, the effect of omnivorousness on work-related status markers, such as the expert identity, has not yet been explored. Studying nascent occupations and professions allows us to investigate how work-related identities, specifically high-status ‘expert’ or ‘professional’ identities, are constructed.
Data science as an emerging elite technical occupation
Data science was formally established as an occupational title in 2008 when two team leaders, DJ Patil and Jeff Hammerbacher, from two major internet companies, LinkedIn and Facebook, met as colleagues and declared its foundation (Patil, 2011). The emergence of data science was preceded by years of academic debate between statisticians and computer scientists regarding the impact of computerisation and computer science on the types and methods of data analysis (Breiman, 2001; Evans et al., 2019). Since then, the status of the occupation of data science has been elevated to ‘the sexiest job of the 21st century’ (Davenport and Patil, 2012). In a rough description of their practices, data scientists build algorithmic systems for computerised quantitative data analysis. With algorithms, digital platforms and algorithmic management, data scientists ‘disrupt’ the organisation of work in varied settings (Sutherland et al., 2020; Veen et al., 2020; Wood et al., 2019). Furthermore, data scientists interfere through algorithmic systems with the work of specialised high-status professions; for example, algo-trading systems for finance, medical systems and devices and automatic sentencing in the legal system.
Consequently, sociologists have observed the potential threat or competition that data scientists pose to elite professions, such as finance, medicine and law (Barley et al., 2017; Susskind and Susskind, 2015). Indeed, the integration of algorithms into different professional fields poses a threat to professionalism and the institution of ‘the expert’ (Friedman, 2019). In most cases, data scientists, who permeate specialised fields, aspire to positions of authority and elite status vis-a-vis the corresponding experts. However, the identity tensions and the required skills of data scientists have so far been investigated only by data scientists themselves. Data scientists have produced abundant quantitative studies and online discussions of their emerging occupation, especially regarding their skill set (Kaggle Survey, 2019; RJ Metrics Survey, 2016). This type of research is mainly based on online surveys and contains lists of statistical methods and technological tools used by data scientists. A thick and rich discursive study of their identity tensions regarding skill is currently missing from the literature, a gap that this study aims to fill.
Inspired by Goffman (1959) and the micro-sociological tradition, studies of identity work in professional settings focus on workers’ self-perception and sense of self-worth and how individuals create and negotiate their professional role by invoking images of their profession’s desired persona (Fournier, 1999). Members of an occupational group learn to perform these images in their negotiations of their professional selves, thus constructing their professional identity, signalling status and structuring their occupational sphere (Giddens, 1984; Sela-Sheffy, 2014). Whereas identity work research sometimes focuses on contested identities (Lee and Lin, 2011; Snow and Anderson, 1987), this study investigates data science as a nascent elite group, attempting to affirm the mechanisms that define its skills and status.
Methods
The research took place in Israel between 2015 and 2018. Fifty-six semi-structured in-depth interviews were conducted with data scientists, their employers and professors who train data scientists. Israel has a developed high-tech sector with over 6600 companies and 300 research and development branches of international corporations in fields such as health care, financial services and cybersecurity (Korbet, 2019). Most of these companies are located in the greater Tel Aviv area – the centre of the country – forming what is known as the ‘Silicon Wadi’, the dense Israeli tech ecosystem. Accordingly, Israel has one of the highest density rates of data scientists (RJ Metrics Survey, 2016).
The 56 interviews were conducted with three categories of participants. The largest group comprised 46 individuals who self-defined as data scientists on the professional social network LinkedIn and were located using the search words ‘data scientist Israel’. There were fewer than 1000 results for this search at the time of data collection (compared with 6500 in 2020). The top 125 profiles were invited to participate. Both men and women were approached; however, there were fewer women in this field, similar to other STEM professions. The sampling focused on individuals who worked as data scientists in small and large local and international organisations; no more than three were selected from a single organisation to avoid cliques and ensure intra-occupational variety. Of the 125 individuals approached, 46 agreed to participate – a relatively high response rate of 37%. The second group consisted of five data science department managers. These were snowball sampled from the interviewees’ recommendations. The third group consisted of five professors from four top universities and technological institutes in Israel who trained data scientists. Table 1 summarises the demographic characteristics of the 46 data scientists.
Table 1. Demographics of the 46 data scientist participants.
Note: ᵃ Out of 132 degrees earned by the 46 data scientist interviewees.
The interviews lasted one to three hours and were recorded and closely transcribed. Each interview began with the following open-ended question: ‘Can you tell me about what you do?’. Based on the answer, the interviewees were asked to elaborate on their skills and work, and what they thought was required to qualify as a data scientist. All the interviewees signed an informed consent form in line with the ethical guidelines for social science research, and all the names of people and companies were anonymised.
The analysis of the transcribed interviews focused on the interviewees’ perspectives regarding their skill and examined the discursive production of ‘the worthy’ worker (Snow and Anderson, 1987) in line with grounded theory principles (Strauss and Corbin, 1997). The analysis traced what the interviewees signalled as important to their skill set and their skill-based distinctions from other professions – especially adjacent professions – such as software engineers, statisticians, analysts and algorithmists. Each interview was categorised according to the skills demarcated by the data scientists as unique to them and then compared with other interviews, interpolating between empirical data and theoretical concepts. Categorisation focused on role-images and skills that grant interviewees, from their emic perspective, status and dignity in their social world (Lamont, 2009). Then, these categories were assembled into themes, the identity resources in the taxonomy below. Finally, at this stage, the concept of omnivorousness was introduced into the analysis as a unifying metaphor.
Findings
Ability to bridge the scientists’ and engineers’ identities
In their identity work, data scientists endeavoured to bridge the gap between skills that were considered different and differentiated in their social world: theoretical knowledge in maths and statistics was considered essentially different from the ability to program and master technology. Whereas the former was attributed to the scientist’s identity, the latter was attributed to the engineer’s identity. Roi, who worked as a data scientist in a large start-up company, explained this as follows:
With my background in software development, on the one hand, and data, on the other hand, I can speak to both types of people because I know I identify with the problems of both these types of people, engineers and scientists.
Roi’s skills as a software developer and his PhD in brain sciences enabled his dual identification as both an engineer and a scientist. Similarly, Peter was a data scientist with a large technology company. He said that he could be both a scientist and an engineer:
I can wear three hats. I can be a scientist, I can be a programmer, and I can be an electrical engineer, I mean an algorithmist.
Peter believed that he embodied three identities: the scientist, programmer and electrical engineer. He had two degrees: one in chemistry and one in electrical engineering. Additionally, he had programming experience and had published academic articles while working in a university research lab, like the other participants with advanced degrees. Thus, his identity encompassed all three fields, and he considered this an advantage.
The data scientists’ omnivorousness allowed them to assimilate the scientist and engineer identities, resolving the age-old tension between the two groups (Layton, 1976a). From an etic point of view, the distinction between ‘scientists’ and ‘engineers’ may not seem significant, but from an emic perspective, this gap is fundamental in STEM cultures. Tensions between the ‘abstract, theoretical and impractical’ scientist and the ‘practical, realistic and problem-solving’ engineer are crucial in technical fields. This has been well documented by several historians and sociologists of engineering (Ensmenger, 2001, 2010; Layton, 1976b; Whalley, 1986). Therefore, overcoming this tension could be considered as a step forward towards creating a new omnivorous identity. The data scientists mobilised these two identities and created an omnivore self-image, granting them status and symbolically differentiating them from workers who are only scientists, such as statisticians or algorithmists, or only engineers, such as software engineers. Additionally, this dual identification enabled data scientists to gain from both groups’ status.
Multiplicity of theories
The abstract mathematical knowledge of the data scientist included knowledge from both computer science and statistics. In the following excerpt, Ethan, a data scientist who worked for a large technology company, described the theoretical knowledge required for data science as two branches of knowledge separated by a conceptual and methodological divide:
It is two [scientific] branches and what separates them is that data mining comes from the field of statistics, and machine learning comes from the field of computer science, and there was, in my opinion, not exactly a war, but this type of, each one defined its own terminology.
Ethan explained that, historically, two disciplines were applied in different fields: machine learning, applied mainly in the subfields of computer science, such as robotics and image and voice recognition, and statistics/data mining, applied in different branches of science, such as medicine, economics and psychology. Subsequently, each discipline developed its own culture, jargon and methods. For example, machine learning used algorithms, whereas statistics used population sampling and measurement. However, the data scientist combined expertise from both machine learning and statistics/data mining. Ethan explained that he knew both subjects and that his work demanded familiarity with theoretical approaches from both. Additionally, machine learning algorithms emerged from a variety of disciplines, including computer science, statistics, physics, biology, electrical engineering, linguistics, psychology, brain science and operations research. Practising data scientists presented their ability to synthesise these theoretical traditions and algorithmic architecture as a status marker.
Peleg, who was trained in computer science and humanities, thought that data science was unique in the technological environment because her skills – which were unrelated to programming and maths – were valued in this emerging field for the first time:
When I was interviewed for [a developer role at] Thor [a large high-tech company], it would be an understatement if I said they were not interested in the diversity of my background. Not only were they not interested, they considered it a disadvantage. For the first time, here [at a data company], I came and someone, again, my CV is not a classic [software] development CV – I have done all kinds of things in my life – for the first time, here, I think someone valued it.
Peleg drew a symbolic boundary between data science and software engineering. Her account indicates that multidisciplinarity was considered advantageous in the field of data science, whereas it was considered as a hindrance for a developer role at a big high-tech organisation. Signalling non-technical skills as a status marker, later in the interview, Peleg described how developers failed data science job interviews because they lacked non-technical knowledge. She considered her diverse disciplinary background as an omnivorous advantage.
Their omnivorous approach to skills was also evident in the way data scientists drew symbolic boundaries between data science and specialised types of expertise. Specialisation, even medical specialisation, which demands significant knowledge in one narrow field, was considered inferior. Noam, a data scientist in a start-up, explained:
In my opinion, take doctors, the whole concept is problematic. Look what happens with specialisations. It has reached the point where a doctor can only fix one joint. He is really good at it, but he can only fix that joint. The rest, he cannot fix.
Noam drew a symbolic boundary between data science and medicine. He saw specialisation as a limitation and played down the medical profession and its system of specialities, which limited physicians’ knowledge of the human body. Similarly, Ron, a senior data scientist with a start-up, reported his selection criteria for data science roles:
I had quite a few PhDs here that knew nothing. They focused on a very narrow subject and are probably really good at it, I would not know. However, a little bit to the side, they do not have the required level. On the other hand, I can interview someone with no degree at all, or only an undergraduate, and his basic level, the general level, is good enough.
Refuting symbolic markers of specialised expertise, such as narrow proficiency and advanced academic degrees, Ron attributed more competency to the non-specialised candidate.
Intensive self-learning
The data scientists reported a rapid pace of innovation in machine learning, both in the theory of algorithmic design and the technological tools and infrastructure that support the processing of large amounts of digital data. Therefore, in their identity work, they cultivated an image of themselves as professionals who can quickly adapt to changes in technologies and theories, constantly updating their knowledge and keeping abreast of current innovations without external assistance. For example, Dan, a young data scientist who worked for a start-up company, said that one has to teach oneself to be a data scientist:
Independence is very important. Many people, you will see, they write [in their CVs] things like ‘self-taught’; if you do not know something, you will not start asking people [at your workplace] all the time. Search [online by yourself]. They [colleagues and employers] want a worker who is independent and can conduct searches and research, whatever is needed.
Dan stressed independence and self-learning as key values for data scientists. Here, Dan drew a symbolic boundary between those who learn by themselves and those who require assistance, such as on-the-job training. Therefore, intensive self-learning functioned as a status marker and differentiated competence from incompetence.
Accordingly, a considerable portion of data science learning takes place on open internet platforms, mainly websites providing Massive Open Online Courses (MOOCs), knowledge sharing forums such as Stack Overflow, Stack Exchange and Cross Validated, and the competition website Kaggle. The ability to use these learning platforms constituted a symbolic boundary for data scientists (and for other tech workers). It differentiated them from other professionals, such as statisticians and analysts, who acquire knowledge exclusively through universities or training institutions and do not openly share their knowledge and tools online in the same way that tech communities do.
On the discourse periphery, however, the data scientists noted the problems linked to voracious self-learning. They pointed out that the constant need to absorb mammoth amounts of knowledge, quickly and in such depth, takes a heavy emotional toll, as an older interviewee explained:
Older guys experience terrible pressure. It is hard for us, but we have to continue learning. There are always new technologies and a need for new things. Everyone talks about it. We are left behind. I still see myself as being at the forefront of technology. However, it seems I am not that either. I am being left behind because the field is developing faster than I can absorb it. It is a problem. So, in five years, you may find me in a psychiatric hospital. Alternatively, you may find me out of this field completely and unemployed because they will say: ‘you are old!’. This is a serious problem. Some guys quit because of this.
This interviewee raised the issue of the emotional stress caused by continuously attempting to keep pace with technological and theoretical developments. He was aware that even if he considered himself at the technological forefront, he had gaps regarding innovations and developments owing to the limits of human absorption. He discussed the possible effect of this pressure on a person’s sanity and employment and said that some people leave the field because of this pressure.
Bridging technical and social skills
In the social world of data scientists, like in that of programmers and engineers, skills were divided into ‘hard skills’ and ‘soft skills’ (Grugulis and Vincent, 2009; Guerrier et al., 2009). For them, hard skills included high mathematical and technical skills, whereas soft skills included any skill that is not technical or quantitative. Many interviewees deemed social or business skills as soft skills. However, unlike other engineering professions, soft skills acquired a central position in the data scientist’s professional identity. Data scientists devoted a significant part of their identity work and self-presentation efforts to appropriating social skills. This set them apart from the algorithmists and introverted ‘geeks’ or ‘nerds’ in research labs. When appropriating social skills, data scientists defied the ‘nerd identity’ typical of the computer professions (Kendall, 2011) and the gendered engineering division into ‘hard’ and ‘soft’ skills.
Kfir, a freelance data scientist with a Bachelor’s degree in information systems and a Master’s in brain sciences, described the importance of social skills and the ability to integrate into a business organisation:
In contrast to the researchers and algorithmists of the 2000s, data scientists are more practical and have theoretical knowledge and background and work experience, and they can fit into the organisation and work with managers and interested parties. They are not professors or PhDs who studied all their lives.
Kfir drew a symbolic boundary between data scientists and algorithmists, PhD graduates and professors. The social skills he described concerned awareness of organisational limitations, goals and deadlines. According to him, data scientists did not have a theoretical mindset but rather a commitment to adapting their research to the organisation’s budgets, costs, goals and, above all, time constraints. Distinguishing themselves from the ‘theoretical and impractical scientists’ and the quirky ‘IT guys’ functioned as a status marker for interviewees.
In his interview, Professor Segall, from a leading Israeli university, explained that when he and his colleagues were planning the data science curriculum, they considered the profile of students that they wished would emerge from their training:
First, let me say there were also disagreements between us regarding the profile of the data scientist. On the one hand, there is what I call ‘the statistician in the basement’, the socially removed person whose only interest is the study of models. What interests them is data. Going for a beer just does not interest them. On the other hand, others argue that what we are looking for are people with the skills to explain data to other people. It seems that it is somewhere on this continuum. I mean, it is someone who was always a statistician, but it is also someone who is a sociable industrial engineer, who can discuss and explain and mediate with all kinds of people – explain everything. What they [the students] all share and what we all agree on is that a deep understanding of mathematics is fundamental.
Here, the professor drew a symbolic boundary between data science students, statisticians and industrial engineers. The discussion about the curriculum included debates around the two sets of skills considered relevant to qualify as a data scientist – deep mathematical skills and social skills, such as being able to explain the model and data to others, have a beer and a conversation, socialise. The professor associated mathematical skills with the ‘statistician in the basement’ identity – the socially removed mathematician – whereas he linked social skills to the industrial engineer, who traditionally receives less mathematical training in the institute where the professor teaches. The omnivorous decision reached by the department was that their training programme should merge both types of skills in their curriculum.
The imperative to combine both ‘hard’ mathematical skills and ‘soft’ social skills is what distinguished data scientists from both the ‘old-school’ technical snobs and non-technical occupations. In their identity work, data scientists maintained the omnivorous symbolic boundary between themselves, who have a mathematical-social skill set, and single-skill occupations such as statisticians and algorithmists, the ‘geeks’ with only mathematical skills, or marketing and sales personnel, who have only social skills.
Easy acquisition of domain knowledge
Finally, owing to the data scientists’ broad and inclusive identity, they perceived themselves as capable of not only absorbing new information in their field but also acquiring what they viewed as the knowledge required to work in other fields, such as medicine, psychology, law, social science and finance.
Ben, a data scientist employed by a start-up company, described the importance of omnivorously learning domain knowledge and explained how he acquired it on-the-job without formal training:
For example, when you work with auctioning models or real-time bidding like in this job, even though I came from my last company with some knowledge, I had a crazy learning curve, crazy domain knowledge [acquisition].
[Q: And if you work in the medical field? Would you need to study medical topics?]
Yes, we need to know these things. I do not see how someone can do their work [as a data scientist] without understanding what DNA sequencing is.
Regardless of the field, whether internet sales and advertising or understanding a sequence of DNA, the data scientists viewed themselves as able to learn quickly and efficiently in a ‘crazy learning curve’. This phrase captures the data scientists’ relationship with knowledge: their appetite for learning about any sphere, familiarising themselves with areas of expertise, no matter how complicated they are, and applying that knowledge to the production of algorithms. Similarly, Lior, a freelance data scientist, described the importance of the social sciences for data scientists:
The third strand, which people tend to ignore, let us call it social sciences, involves understanding what the data are telling us. Data are not numbers. No one cares about numbers. Data are people’s behaviours, decisions and things people produce. If we want to turn them into business insights, running algorithms and statistics, etc., is insufficient. It is vital to understand what the data mean.
Lior stressed the importance of understanding human behaviour in data science work, customarily the terrain of the social sciences and not engineering. Therefore, contrary to the technical specialist-snobs of the past, data scientists omnivorously embraced non-technical knowledge and skills while preserving a hierarchy between domains, in which the mathematical and technical domains were considered superior.
Discussion and conclusions
Data scientists construct their professional identity and claim an elite status based on a non-specialised, omnivorous approach to skills acquisition. Data scientists do not limit their knowledge to one discipline, such as maths or statistics, or their skill set to a specified reified entity, such as programming skills. Instead, in their identity work, they present themselves as professionals that engage in constant learning (of their own and various domains) and possess different types of skills. As with omnivorousness in leisure practices, this plethora of skills is hierarchically organised, wherein maths, computer skills and statistics are awarded the highest position. However, data scientists embrace non-technical skills (e.g. domain knowledge or social skills), hitherto shunned by technical elites.
In the sociology of culture, the frequent movement between cultural practices and skills has long been considered as an elitist tendency (Peterson and Simkus, 1992; Sullivan and Katz-Gerro, 2006). However, in the sociology of work, the current schema attributes great value to the amount of time devoted to acquiring skills and cultivating a single skill set in a single field of expertise. The omnivorousness of data scientists indicates that while elitism is maintained, its nature changes from symbolic boundaries of specialisation and shunning ‘inappropriate’ skills to the expansion of boundaries and appropriation of skills. According to this structure, the status of the profession increases as the skill set becomes broader and less specialised.
Importing the term ‘omnivorousness’ from the sociology of culture into the sociology of work sheds light on symbolic processes occurring in the construction of the ‘expert’ and ‘professional’ identity in current technical occupations. Various concepts, such as networks, hybridity, modularity, flexibility and T-shaped and M-shaped skills, have been used to describe the new technical elites and their approach to skills. However, none of these concepts conveys the meaning of status gained through generalism.
First, the metaphor of networks, although widely used outside the sociology of work to describe a networked identity (Rainie and Wellman, 2012), does not yet denote inclusiveness in skills in the sociology of work. Individual knowledge workers working in networks of peers are still regarded as ‘specialised’, and their skills are operational only through teamwork (Darr and Warhurst, 2008). The same applies to the T-shaped skills metaphor, which denotes narrow specialisation and the ability to collaborate. Omnivorousness, however, allows us to consider the expansion of identity in networks through the appropriation of many practices and skills.
Second, hybridity is a mixture of two things in a binary form and has not been used to denote high status. On the contrary, Glover (2013) has found that for hybrid IT professionals to advance in the organisation, they have to return to non-hybrid categorisation. The present study indicates that data scientists combine multiple skills, not just binaries, and that they use this omnivorousness to gain status.
Third, unlike the internet workers’ skills described by Damarin (2006), the production of algorithms by data scientists does not require modularity or switching between given skill sets. As algorithms allegedly can be applied in any domain and the technologies and theories supporting their application are supposedly constantly updated, data scientists cultivate a continuous supplement of their skills and knowledge, with an intensity resembling voracious culture consumption (Sullivan and Katz-Gerro, 2006).
Fourth, flexible specialisation has emerged from the descriptions of production-line flexibility (Piore and Sabel, 1986), a management decision that minimises workers’ autonomy and discretion (Smith, 1997). In contrast, the flexibility of data scientists is not initiated by management but by the workers. The term ‘omnivore’ conveys this meaning; that is, an independent and autonomous individual who engages in many practices and activities to gain status. The M-shaped skills metaphor (Fiore-Gartland and Tanweer, 2016) is the closest to omnivorousness. However, it does not convey the signalling of status.
Critics may argue that once data science becomes an established profession, it will introduce a well-defined skill set backed by institutionalised training. However, like those in their ‘mother’ profession, software engineering (Ensmenger, 2001), data scientists exhibit no inclination towards professionalisation and standardisation. Consistent with Noordegraaf’s (2007) findings in other occupational socio-cultural contexts, data scientists negate formal markers of professionalisation, such as associations, certifications, clearly defined standards and especially state regulation. With their global occupational community, and as (occasional) workers of global tech organisations, the locality and closure aspects of professionalisation contradict their basic ethos of global, open and networked work life. Thus, data scientists’ omnivorous approach to skills is part of a symbolic order, wherein specialisation and professionalism are ranked low, not high. In the data science occupational socio-cultural context, the ‘expert’ or ‘professional’ identity preserves its status when many aspire to be a ‘data scientist’. However, its content has changed, from an identity based on specialisation to an identity based on inclusivity, multiplicity, intensity and integration of skills. Thus, the basic mechanism of distinction shifts from specialist-snobs to generalist-omnivores.
The findings of this study contribute to the theoretical efforts to particularise the notion of professionalism (Barley et al., 2016; Carollo and Solari, 2019; Fournier, 1999; Noordegraaf, 2007). Additionally, these results may contribute to the data science occupational community, while it struggles to define its skill set, as well as to all those interacting with data scientists, when they interfere with algorithmic systems in the organisation of work of varied occupations and professions in different settings. Future studies should investigate other technical occupations in other contexts. This will further develop our understanding of identity tensions in terms of skills and how new technical elites generate status.
Footnotes
Acknowledgements
I would like to thank Asaf Darr, Amalya Oliver, Dafna Hirsch, Dan M Kotliar and Shira Rivnai Bahir for their helpful comments on earlier versions of this article. I am also grateful to Rakefet Sela-Sheffy for the enriching dialogue over the years. Finally, I would like to thank the editor and three anonymous reviewers for their constructive comments, which have greatly strengthened this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
