Abstract
This article investigates occurrences of data journalism between 2011 and 2013 in the Canadian province of Quebec in order to identify and examine its actors, data access conditions, practices and the required computer and statistical skills. We analysed six Quebec media outlets’ quantification and statistical data visualization projects. We noticed a growth in the production of data journalism during the period, but this production is in general of little depth and does not lead to journalists learning new skills. One considerable barrier to the production of quality data journalism in Quebec is the lack of quality data offered by the provincial and federal governments.
Introduction
‘Data journalism’ has gained followers in media outlets around the world. Journalists in prestigious newsrooms like the New York Times and the Guardian, as well as new online media outlets like ProPublica and, more recently, FiveThirtyEight, are experimenting with the form. These new practices raise epistemological questions for traditional journalism (Parasie and Dagiral, 2013a), implying a new conception of data sources, information processing and presentation techniques. This study seeks to identify ‘data journalism’ actors and practices in Quebec, Canada. We are particularly interested in these actors’ profiles and practices as well as the computer and statistical techniques and skills they need to develop.
We studied 178 data journalism projects of six Quebec media outlets: four national dailies, one public radio and television broadcaster and one magazine. Our intention was to evaluate their journalistic practices and identify characteristics that reveal the degree of knowledge of and adherence to the fields from which these practices are borrowed: statistics, statistical visualization, computer sciences, and hacker and open data culture. Inspired particularly by Lisa Gitelman’s (2006) critical approach to data, which focuses on the interpretation of the concept of ‘data’ in different disciplines, and guided by the work of statisticians and historians of statistics such as Alain Desrosières, Michael Friendly and Claudine Schwartz, we asked what might constitute and explain the use of quantification and statistical visualization for news purposes. We found that Quebec’s data journalists focus on finding good quality data but engage very little with statistical analysis, interaction or reader participation. The dependence on public data also raises concerns regarding the independence and development of the practices studied.
What constitutes ‘data’ in data journalism?
Conducting a review of data journalism projects in Quebec implies that we, as researchers, perceive it as a coherent and significant phenomenon. To measure something, one necessarily assumes that it exists in a measurable form. An important part of our work therefore required making choices to delineate boundaries that nonetheless remain porous. Currently, the core definition of data journalism focuses on skills and techniques, as in the following:
Its creation relies on a variety of computer skills needed to collect, process, combine, and visualize data – whether it be numbers, texts, photographs, or audiovisual content available in digital formats. The term most commonly used today to refer to this collection of heterogeneous practices is ‘data journalism’. (Parasie and Dagiral, 2013b: 53, original translation)
Nevertheless, a number of media historians and digital humanities researchers working to identify the culture of media practices warn against essentialist or naturalist approaches. If we do not consider additional points of reference, we risk understanding the emergence of new practices as a ‘form of [technological] evidence’ (Gitelman, 2006: 2). For this study, we chose to conceptualize experimentation with and development of these journalistic practices as a collage of cultures and disciplines. However, this kind of patchwork approach may transform the concepts on which these practices are based and occasionally give rise to contradictions. This is why, in addition to identifying ‘disciplinary elements’ borrowed from statistics and computer science, we attempt to understand the epistemological implications of these borrowings. Éric Dagiral and Sylvain Parasie have examined epistemological challenges that ‘programmer-journalists’ face in their participation in open source and open government communities. One of their studies demonstrated that ‘propositions’ put forward by data journalists ‘challenge long-standing journalistic epistemological principles’ (Parasie and Dagiral, 2013a: 860).
Following media historian Lisa Gitelman’s approach, our investigation of the epistemological challenges of data journalism focuses not on actors’ propositions or beliefs, but rather on interpretations of the meaning of ‘data’ by way of structures and practices. According to Gitelman, disciplines and ‘bodies of knowledge made and maintained by professions’ are reflected primarily through infrastructures. Infrastructure is ‘sunk into, inside of, other structures, social arrangements, and technologies’ (Bowker and Star, 1999: 35). It is possible to examine the relationship to data not only based on the people involved but also in relation to the ‘artifacts and institutions that generate, share, and maintain specific knowledge’ (Gitelman, 2013: 10). In this article, we explore the relationship between data journalism and its sources, its newsroom organization and its tools and practices.
Data journalism as quantified journalism
According to former Guardian journalist Simon Rogers, the history of data journalism dates back to 1821, when an article in the Manchester Guardian listed data on Manchester schools. The article presented a simple table indicating the number of students and tuition fees by educational institution (Rogers, 2013). At the time, the use of quantitative information was growing in many disciplines, but because it was still rare and difficult to obtain, data were usually manipulated by specialists and certain journalistic precautions were expected. Today, thanks in part to the open data movement, large sets of data are available to the public. Although this new openness is greatly appreciated, the evolution of journalistic practices in the manipulation of numerical data has not been examined in any depth. Therefore, this article explores data journalism’s relationship with the act of quantification. Alain Desrosières (2008), a historian of statistics, reminds us that, broadly, the act of quantifying involves ‘bringing into existence and expressing in numeric form something formerly expressed through words and not through numbers’ (p. 10, original translation). Data journalism is based on the use of quantitative data, but what interests us here, beyond the simple manipulation of quantitative objects, is the transformation of the journalistic product and its production process. This interest stems from the fact that, according to Desrosières (2008), ‘quantification offers a specific language that provides remarkable properties of transferability, possibilities of standard manipulations through calculation, and routinized interpretation systems’ (p. 12, original translation). We will now briefly summarize the progression of quantification in contemporary thinking, focusing on two crucial aspects relating to data journalism: numerical proof and data visualization.
These two focal points will help root our analysis of the use of ‘new media’ skills and tools in conjunction with traditional scientific and statistical practices.
Numerical proof and its conditions of use
Since its appearance in the 18th century, the idea of numerical proof has persisted, with varying intensity, until the present day. More recently, according to Claudine Schwartz (2012), the development of the two concepts of participatory democracy and ‘new public management’ has made numerical proof an increasingly frequent feature of public policy evaluation. The convention for using numerical proof, established during the 20th century, is that the proof must progress through multiple steps: the conceptualization and formulation of hypotheses, data collection, testing and establishment of the proof itself through critical consideration (i.e. the placing of test results within the theoretical or societal framework within which the hypotheses were developed) (Schwartz, 2012).
These protocols are of particular interest to us, as they offer clues as to how these data are perceived by journalistic actors. While this process is strictly codified in many disciplines, this is not necessarily the case in journalism. One of the risks of manipulating numerical proof without having taken into consideration the collection and structuring processes is the misjudging of the value of actual data. In rhetorical terms, data are ‘that which is given prior to argument’ (Gitelman, 2013: 7). Data are abstractions that make it possible to articulate a problem. Thus, by separating statistical proof from its conventions, an amateur data analyst risks transforming assumptions contextualized by rhetorical propositions into assumptions made out of context. Partially or completely bypassing quantification procedures results in presenting an abstraction as a manifestation of reality that is both neutral and independent of the theoretical and societal framework within which the question is formulated. Our investigation aims to find whether this is, or might be, the case in data journalism.
Statistical data visualization techniques
According to Friendly and Denis (2000), the development of statistical data visualization reflects a dependency on theoretical and social frameworks. The increased use of quantitative data has been accompanied by not only the development of statistical thinking but also the emergence of data visualization. Michael Friendly (2005) begins his examination of the history of statistical data visualization with the rise of statistical thinking and the increasing amount of data collected for planning and commerce in the 19th century. By focusing on different periods, Friendly describes a variety of advances facilitating the spread of these practices. Greater precision in physical measurement and the production of the first demographic statistics appeared in the 17th century. During the 18th century, the introduction of new graphic forms, such as isolines and thematic mapping, allowed for the inclusion of quantitative data. The first steps towards the systematic collection of data had been taken. In the modern era, graphic forms became more abstract and complex, passing from simple maps to comprehensive atlases, with the addition of new symbolic forms (pie charts, histograms, etc.). The golden age of statistical visualization (also called the age of enthusiasm) runs from 1850 to 1900. During a relatively bleak period from 1900 to 1950, however, visualization becomes very formal and focused on quantification. According to one interpretation, this diminished scientific interest in graphic approaches to analysing data was caused by the rise of mathematical statistics (typified by Fisher) and classical inference (Friendly and Denis, 2000). ‘Pictures were – well, just pictures: pretty or evocative, perhaps, but incapable of stating a “fact” to three or more decimals. Or so it seemed to statisticians’, states Friendly (2005: 6). 
A belief in the potential of pure mathematics to provide answers was at its acme, relegating visualization, a potential analytical tool, to an illustrative role. But it is important to note that this dormancy is also interpreted by Friendly (2005) as a necessary period of latency leading to the popularization of new forms, wherein visualization went ‘mainstream’ and then temporarily plummeted. According to Friendly and Denis (2000), the renewal of interest in statistical visualization that began in the 1960s has its origins in the development of computer technology and, in a deeper sense, in the return to a more experimental and graphic approach within the statistical sciences (p. 55).
These different stages in the history of attitudes towards visualization reveal the dynamics between scientific postures, societal concerns and statistical practices throughout the history of statistics. While they cannot be used for direct comparison in the study of data journalism practices, they do offer some helpful reference points in the general culture and for this study. They gave us, for example, reference points against which to compare our study sample and an angle from which to approach the simplicity or complexity of its forms. This complexity or simplicity in data journalism, we expect, might have more to do with the popularization of statistical visualization among the broader public than with scientific innovation, which drove the developments described in the brief history of statistics above.
Computerized newsrooms and computational journalism
Hamilton and Turner (2009) offer the following definition of computational journalism:
In some ways computational journalism builds on two familiar approaches, computer-assisted reporting (CAR) and the use of social science tools in journalism. Like these models, computational journalism aims to enable reporters to explore increasingly large amounts of structured and unstructured information as they search for stories. (p. 2)
Flew et al. (2012), however, assert that a more complete definition of computational journalism can be obtained by combining the one above with the quantitative methods identified by Philip Meyer. Among others, Meyer (2002) cites statistical analysis, polling, surveys and observation, and collection and interpretation of public data (p. 3). The idea of computer-assisted reporting and research was adopted, and in some ways invented, in English-speaking newsrooms (Thiran, 1996). Enthusiasm for the use of quantitative analysis and codes developed in the social sciences is seen in the concept of ‘precision journalism’ proposed by Meyer in the 1970s (Meyer, 2002). As for the computerization of these forms, the progressive and unequal integration of computers in various print newsrooms since the 1980s must be taken into consideration. The computerization of the press triggers a change in professional identities. However, instead of journalists developing new skills, employees from other disciplines arrive in American newsrooms, most often from computer programming and graphic design (Singer et al., 1999). In the Netherlands, a number of researchers discuss the emergence of a new profession (Deuze and Dimoudi, 2002). In this study, we aim to investigate whether this is also the case in Quebec’s data journalism.
Identifying those who produce data journalism and their backgrounds allows for a contextualization for this analysis, but further benchmarks are needed in order to properly categorize productions. This study will adopt benchmarks proposed by digital humanities scholar Lev Manovich (2001). Manovich does not see algorithms and databases as a dichotomy – that is, it is not simply that a program reads the data, executes an algorithm and creates new data (i.e. calculation as primary function). For him, programming skills can be used to structure and to support the narrative of the journalistic story as well. In more technical terms, it can result in adding thematic sequences or building navigation templates. Manovich links navigation effort and the creation of narrative sequences with the vision of new media in relation to video games. We can see clear examples of this in news games or web documentaries, which use programming skills to immerse readers – for example, to confront them with the life conditions of prisoners and refugees or even to simulate the context for moral dilemmas. This view parallels the categorization of statistical data visualization by Friendly and Denis (2000), who see two basic functions of data display: one is designed as a presentation that stimulates readers’ eyes as well as persuades and informs them; the other is designed to help the reader analyse the data and to encourage perception, detection and comparison (p. 58). We used these design benchmarks to categorize our data journalism sample (see below).
Access to data
A certain amount of public data, information collected by public organizations, is released to the public. Pierre Alonso (2011) defines public data as data published ‘on dedicated dataset sites […] in formats suitable for being repurposed by the public (civil society, businesses) for their own use at no charge’ (original translation). Public data are not completely indispensable to the practice of data journalism. As data recording becomes more and more frequent in our society, journalists can turn to scraping websites or generating data themselves (Gray et al., 2011). An example of this is the Los Angeles Times’ (2014) original compilation of homicides. However, research has shown that public data play an instrumental role in the development of data journalism. According to Flew et al. (2012), the abundance of noteworthy public databases is one of three factors encouraging the emergence and expansion of data journalism. 1 Parasie and Dagiral (2013a), in studying data journalism in Chicago, observe that most of these projects rely ‘on data recorded, stored, and distributed by public authorities’ (original translation) (p. 53). They also emphasize the importance of quality granular (or raw) data for journalistic work. Aitamurto et al. (2011) conclude from a broader study of data journalism that the greatest challenge is obtaining data and that, most of the time, data obtained are provided by government organizations. They consider access to information requests a significant means to acquire these data.
In Canada, data and documents can be obtained by submitting an access to information request. However, this procedure has constraints that greatly limit its application. At the federal level, the Access to Information Act that regulates this right stipulates that fees may be charged for document search, processing and reproduction. In 2011–2012, the average cost for a request was over $CAD1350. That same year, most of the documents requested (54%) were only partially obtained. In only 21 per cent of cases were all the requested documents obtained. Overall, the failure rate of requests (no documents obtained) was approximately 23 per cent (Info Source, 2012). The access to information process is, therefore, onerous and long, 2 and success is far from certain.
In Quebec, fees can only be charged for reproduction, and this can be avoided by consulting documents on site, free of charge. This is made possible by the provincial Act on Access to Documents Held by Public Bodies and the Protection of Personal Information, which also specifies that public bodies have 20 days to respond to requests and are only permitted extensions of up to 10 days (Assemblée nationale, 1982: [A-2.1]). However, this law makes it possible for bodies to shirk their responsibilities by citing a variety of exceptions. Monique Dumont (2013: n.p.), a retired journalist, states that there are ‘strategies to prevent documents being found and disclosed, avoid subjecting organizations to the law, apply restrictive interpretations of the law’s restrictions, and alter the role of those responsible for access to information’ (original translation).
Whether it is at the federal, provincial or municipal level, access to public documents is largely dependent on political authorities’ good intention, a situation regularly criticized by journalists. The Fédération professionnelle des journalistes du Québec (FPJQ) denounces the inefficiency of access to information requests. In March and April 2014, the FPJQ (2014) expressed a number of complaints in a series of articles and releases calling for an ‘overhaul of the so very poorly named Access to Information Act’. 3 Alonso (2011) evaluated ‘government openness’ 4 and discovered that Canada does not rank particularly highly, as the majority of evaluated documents were available but subject to legal constraints. A Library of Parliament (2010) publication nonetheless presents a very positive vision of Canadians’ access to information and describes a move towards a more transparent government and the opening of government databases. Its authors consider Quebec, with laws requiring the mandatory release of a certain number of documents, an example to be followed in matters of access to information. Journalists and groups seeking greater access to data are not, however, in agreement. In a brief on the subject, Jean-Hugues Roy (2013), a media professor at Université du Québec à Montréal, sums up that opinion in this way: ‘We mustn’t delude ourselves. Open data portals […] contain nothing but “friendly” data’ (p. 5).
Methodology
To compile our study sample, we chose to begin with a bottom-up approach. We started with an open definition of data journalism by examining which productions (i.e. projects), aside from traditional journalistic articles, are classified as data journalism by relevant actors. We identified outlets that produce data journalism projects that fall within the broad purview of our study, which is interactive statistical data visualization projects on digital platforms. The chosen outlets include the websites of five major print publications (dailies and magazines) and the country’s French-language public radio and television broadcaster, ICI Radio-Canada (RC). Individual projects, such as those posted on blogs, were not included in this study. Projects were selected if they were classified under relevant content categories on the websites – such as the ‘Interactive’ section in The Gazette, or clickable keywords at the end of articles – or if they displayed combined characteristics that distinguish them from traditional articles (i.e. they contain statistical data visualization and are created with technology only available in newsrooms since the turn of the millennium, such as interactive maps). Where productions were not grouped into relevant categories, we conducted additional research by contacting the actors involved.
A total of 178 projects published by these six organizations between 2011 and 2013 were included in this study sample. See Figure 1 for a quick overview of how the number of projects increased over this period.

Figure 1. Number of projects per semester, 2011–2013.
We found a high number of infographics with data in the media outlets we studied, but the sample includes only 15 static infographics. Actors do not usually interpret simple static data visualization as data journalism. However, it should be noted that the effort expended to build databases and perform statistical analysis for these static projects sometimes does rival the work required to produce what is considered to be data journalism. As observed earlier, some frontiers seem porous. We also noted that some projects were labelled ‘data journalism’ by their producers despite the fact that their interactive characteristics were limited to a few elements that, for example, merely call up explanatory text when clicked on. This type of production constitutes 11 per cent of our sample. This discrimination among initiatives and the efforts, sometimes superficial, to integrate minimum formal interactivity are interesting to note, as they convey what actors value and tend to put forward when they speak of data journalism. The contradictions are a reminder that interactive characteristics constitute a slippery concept that masks more profound epistemological concerns (Galloway, 2012; Manovich, 1996).
With Lisa Gitelman’s understanding of disciplines and structures in mind, we developed an analytical framework to identify the recourse to different disciplines and practices. Underpinning the systematic analysis of characteristics that reveal borrowings from particular disciplines is the goal of identifying the dynamics of data journalism’s emerging infrastructure within newsrooms. Infrastructure, once stabilized and efficient, tends to become invisible (Bowker and Star, 1999: 35). Its fundamental purposes (raisons d’être) are easier to question while their justifications are not yet socially and culturally adopted.
Our first task in reviewing data journalism production in Quebec between 2011 and 2013 was to identify work group composition and demonstrated skills or, in other words, to determine who is producing data journalism. We established how many people worked on each project, whether or not teams were stable and which skills were represented. Regarding data visualization, we considered the characteristics described by Robinson and Friendly in their account of the historical stages of statistical data visualization development. For example, when a map is included, we explore whether it is linked to a database by geolocation and whether the geolocated points can support quantitative information. At times, several themes are superimposed on one or more maps. These characteristics provide information about the complexity of the visualization techniques used and can be linked to Friendly’s observations regarding production during the last two centuries. We also identified the software used to create maps and other visuals and noted whether or not it was freeware. In this way, we were able to determine the design effort expended and whether access was available to tools that automatically create statistical data visualization designs.
During coding, identifying the programming category required the most work and caused the most difficulty. Because we lack training in information technology (IT), it was necessary to interview data journalists and consult an IT specialist in order to make relevant connections between observed practices and the theoretical knowledge on which our understanding of the computerization of ‘new media’ is based and, more specifically, to better understand how data are structured and algorithms are used in journalistic production. While coding computer programming and software use, we were rarely able to rely on the explanations of methods and tools provided online. That said, some information was obtained by studying project pages’ source code. It was also sometimes necessary to consult with consenting authors, particularly concerning the use of scripts to gather information from an image file or institutional site.
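Part of the source-code inspection described above can be automated. The following is a minimal sketch, not the procedure used in this study: it lists the external scripts that a project page loads and matches them against a small lookup table of library name fragments. The `KNOWN_TOOLS` table, the function and class names, and the sample page are all illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical lookup table: fragment of a script URL -> tool name.
# The entries are illustrative; a real coding grid would be longer,
# and short fragments such as "d3" can produce false positives.
KNOWN_TOOLS = {
    "d3": "D3.js",
    "leaflet": "Leaflet",
    "maps.google": "Google Maps API",
    "highcharts": "Highcharts",
}


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def detect_tools(html):
    """Return the sorted names of known tools referenced by the page."""
    collector = ScriptCollector()
    collector.feed(html)
    found = set()
    for src in collector.sources:
        for fragment, name in KNOWN_TOOLS.items():
            if fragment in src.lower():
                found.add(name)
    return sorted(found)


# Invented page source, for illustration only.
SAMPLE_PAGE = """
<html><head>
<script src="https://example.org/js/d3.v3.min.js"></script>
<script src="https://example.org/js/leaflet.js"></script>
<script>var inline = true;</script>
</head><body></body></html>
"""

print(detect_tools(SAMPLE_PAGE))
```

A scan of this kind only reveals tools loaded in the browser. Scripts run in the newsroom to gather or clean data leave no trace in the page source, which is why interviews with authors remained necessary.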
In our initial observation of our sample, we noticed a heavy reliance on institutional sources (see below). In order to adequately present an analysis of the data used in these data journalism projects, we examined the state of access to information and the nature of institutional data available in Quebec. We analysed the content of dedicated portals maintained by two cities (Montréal, 2014; Québec, 2014a), Quebec’s provincial government (Québec, 2014b) and the Canadian government (Canada, 2014). Our open data website analysis is guided by criteria used by Alexandre Schellong and Ekaterina Stepanets (2011: 11–12) in their evaluation of European countries’ public databases. They considered whether the data were complete, raw (or granular), quickly available, accessible, in formats appropriate for computer analysis, non-discriminatory (consultation available to all), non-proprietary and copyright free. Granular data are data that record the observation of variables without statistical manipulation, while aggregate data result from statistical analysis (Université de Sherbrooke, 2014). Datasets in our study sample were all available for no charge, relatively accessible and mainly in non-proprietary format. Our analysis aims to determine whether the data released were raw or aggregate, quickly available and in formats appropriate for computer analysis.
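The distinction between granular and aggregate data can be made concrete with a short sketch. The records, field names and values below are invented for illustration and do not come from any of the portals analysed.

```python
from collections import Counter

# Invented granular records: one row per observation, with no
# statistical manipulation. Field names and values are hypothetical.
granular = [
    {"establishment": "A", "year": 2012, "result": "pass"},
    {"establishment": "B", "year": 2012, "result": "fail"},
    {"establishment": "A", "year": 2013, "result": "pass"},
    {"establishment": "C", "year": 2013, "result": "pass"},
    {"establishment": "B", "year": 2013, "result": "fail"},
]


def aggregate_by_year(records):
    """Collapse granular rows into yearly counts per result."""
    counts = Counter((r["year"], r["result"]) for r in records)
    return dict(counts)


aggregate = aggregate_by_year(granular)
print(aggregate)
```

The aggregate table can always be derived from the granular rows, but not the reverse: once only yearly counts are published, it is no longer possible to ask which establishment produced which result.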
Analysis of findings
The dependency on pre-processed public data
Our study sample reveals a high level of dependence on available institutional data in Quebec between 2011 and 2013. As seen in Figure 2, only 2 per cent of the 178 data journalism projects are based on self-built databases. A total of 63 per cent used public data and only 35 per cent used non-institutional sources. Moreover, almost half of the projects rely exclusively on institutional sources (85 out of 178) and 16 per cent mixed institutional sources with other types of sources (28 out of 178). Only six projects resulted from an access to information request.

Figure 2. Sources of the data used for production in the studied sample (rounded percentages).
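The counts and rounded percentages reported for institutional sources can be checked against each other with a few lines of arithmetic. This sketch assumes that the 63 per cent figure for public data corresponds to the sum of the projects relying exclusively on institutional sources and those mixing them with other sources; the text does not state this explicitly.

```python
TOTAL_PROJECTS = 178

# Counts reported in the text.
exclusively_institutional = 85
mixed_sources = 28


def share(count, total=TOTAL_PROJECTS):
    """Return a count as a rounded percentage of the sample."""
    return round(100 * count / total)


# Assumption: 'public data' covers exclusive plus mixed use.
any_institutional = exclusively_institutional + mixed_sources

print(share(exclusively_institutional))  # exclusively institutional
print(share(mixed_sources))              # mixed with other sources
print(share(any_institutional))          # any institutional data
```

Under that assumption, the two rounded shares add up to one point more than the rounded share of their sum, an ordinary effect of rounding each figure separately.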
To further explore the implications of such dependency, we will now present our analysis of the types of open data available to journalists and the public in Quebec.
Municipal open data
On 6 April 2014, we consulted 58 datasets for Quebec City and 95 for Montreal. These were then categorized according to content (raw data, aggregate data, maps, municipal information/services, photos) and most recent update (same day, during the last month, more than a month ago). For both cities, approximately 50 per cent of the available documents were not, strictly speaking, databases but administrative documents (hours of operation, meeting schedules, meeting minutes, etc.) or pictures. On Quebec City’s open data site, 45 per cent of the datasets provided were granular, compared to 40 per cent of the items available for Montreal. No aggregate data were present, and for both cities, maps and cartographic data accounted for 10 per cent of datasets. The primary difference between the two cities was the frequency with which their databases were updated. Quebec City did so more frequently, with 33 per cent of its data updated daily against 5 per cent for Montreal. While only 21 per cent of Quebec City’s data had not been updated in over a month at the time of analysis, 61 per cent of Montreal’s data were over a month old.
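The update categories used above lend themselves to a simple rule. The following is a minimal sketch of such a rule, assuming a 31-day cut-off for ‘during the last month’; the exact boundary is not specified in the text, and the sample dates are invented.

```python
from datetime import date

CONSULTED_ON = date(2014, 4, 6)  # consultation date given in the text


def update_category(last_update, consulted_on=CONSULTED_ON):
    """Classify a dataset by the age of its most recent update.

    The 31-day cut-off for "during the last month" is an assumption;
    the text does not define the boundary precisely.
    """
    age_in_days = (consulted_on - last_update).days
    if age_in_days < 0:
        raise ValueError("last update is after the consultation date")
    if age_in_days == 0:
        return "same day"
    if age_in_days <= 31:
        return "during the last month"
    return "more than a month ago"


# Invented update dates, for illustration only.
examples = [date(2014, 4, 6), date(2014, 3, 20), date(2013, 11, 2)]
for d in examples:
    print(d.isoformat(), "->", update_category(d))
```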
Government open data
Government databases were consulted on 7 and 8 April 2014, and a randomly selected representative sample was analysed. For the Quebec Province portal, after items were sorted by subject and alphabetical order, the last item of each page was evaluated, for a total of 39 of the 351 documents available on the site. The federal government’s site contains 193,017 datasets, 185,404 of which are categorized as ‘geo’. Considering their number, cartographic data (e.g. river courses, locations of roads or provincial borders) were eliminated to avoid skewing results. Of the 7770 items classified as ‘data’, 64 were randomly selected (by subject and proportionally to the number of items in each subject). For each item, we noted the update frequency (as indicated on the site and confirmed in the item’s history) and the type of data provided (raw, aggregate, geographic, archives/services or reports). Despite the potential interest of documents in the last two categories, they are not databases.
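The two sampling procedures described above can be sketched as follows. This is an illustration under stated assumptions, not the script used for the study: the page size of nine items is inferred from 351 / 39 rather than given in the text, the subject names and their sizes are invented, and the largest-remainder rounding rule is one possible way of making proportional quotas sum to 64.

```python
import random


def last_item_of_each_page(items, page_size):
    """Systematic sample: keep the last item displayed on every page."""
    pages = [items[i:i + page_size] for i in range(0, len(items), page_size)]
    return [page[-1] for page in pages]


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their size.

    Uses the largest-remainder method so that the quotas sum exactly
    to sample_size. This rounding rule is an assumption.
    """
    total = sum(stratum_sizes.values())
    exact = {k: sample_size * n / total for k, n in stratum_sizes.items()}
    quotas = {k: int(v) for k, v in exact.items()}
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas


def stratified_sample(strata, sample_size, seed=0):
    """Draw a random sample from each stratum according to its quota."""
    rng = random.Random(seed)
    sizes = {k: len(v) for k, v in strata.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {k: rng.sample(strata[k], quotas[k]) for k in strata}


# Provincial portal: 351 items; nine items per page is an assumption
# inferred from 351 / 39, not a figure given in the text.
provincial_items = [f"item-{i:03d}" for i in range(351)]
provincial_sample = last_item_of_each_page(provincial_items, page_size=9)
print(len(provincial_sample))

# Federal portal: 7770 'data' items; the subject split is invented.
federal_strata = {
    "health": [f"health-{i}" for i in range(3000)],
    "economy": [f"economy-{i}" for i in range(2770)],
    "environment": [f"environment-{i}" for i in range(2000)],
}
federal_sample = stratified_sample(federal_strata, sample_size=64)
print({k: len(v) for k, v in federal_sample.items()})
```

Taking the last item of each page is a systematic rather than a strictly random selection; it approximates a random sample only if the sort order is unrelated to the characteristics being measured.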
While most municipal databases are revised daily or monthly, government databases are much less frequently updated. Most of Quebec Province government’s databases (67%) are updated yearly and the majority of federal databases (72%) are never brought up to date. Approximately 8 per cent of Quebec’s data are raw and 76 per cent are aggregate (Figure 3). Federal data are 19 per cent raw and 71 per cent aggregate (Figure 3). Aggregate data are generally published in table format. See Figure 3 for a breakdown of types of data provided by different institutions.

Figure 3. Types of open data produced by institutional sources.
This analysis confirms journalists’ observation that raw data useful for their work are very rarely provided by Quebec and Canadian governments. Cities are more generous, but, to recall Jean-Hugues Roy (2013), these data are ‘friendly’ (p. 5). About half of the data (50%) consist of the geolocation of fire hydrants, trees, skating rinks or police stations. For example, the only information about food safety available in Quebec is annual aggregate data (Direction générale de la santé animale et de l’inspection des aliments (DGSIA), 2012) and a list of establishments that have been fined (Ministère de l’Agriculture, des Pêcheries et de l’Alimentation du Québec (MAPAQ), 2014). Elsewhere in Canada, high-quality databases on the subject are available and are updated daily with granular data, such as those of the Medical Health Officer (MHO) (2014) of Vancouver Island and Toronto Public Health (2014). These kinds of data are far more useful to journalists than the few charts made with aggregate data available in Quebec.
Actors’ background and work dynamics
Returning to our case study, we noted that 83 per cent of the sampled data journalism projects came from only three of the six news outlets – Journal de Montréal, RC and La Presse. It was clear that they differed from the other three in that they had developed specific protocols and created dedicated teams for data journalism.
We were interested in the professional background and work dynamics of data journalism actors (Table 1). The findings suggest the overrepresentation of a handful of actors with atypical profiles. We isolated five actors who participated in 10 or more projects between 2011 and 2013. Among these are two graphic designers, a journalist, a programmer and a statistician. Three of them have developed enough expertise to supervise the teamwork often required to produce data journalism projects. The five of them worked on approximately 55 per cent of the data journalism projects we identified.
Primary expertise areas and provenance of data journalism authors in Quebec.
The cell percentage represents the occurrence frequency of an expertise among the credited authors. Some authors appear in two areas as they had significant experience and/or training in more than one field.
37e Avenue, an external agency producing data journalism for L’Actualité in this sample, had a statistician on its team, but he did not appear in the credits of the analysed projects.
A majority of journalists who contribute to the projects work as researchers and do not perform tasks requiring advanced statistical or computer skills. At the Journal de Montréal, the participation of these journalists is not always mentioned online, but we found that at least 80 per cent of the credited authors of the newspaper’s data journalism projects have journalism as their primary expertise. The Journal de Montréal’s two main actors (a graphic designer and a statistician) also confirmed that journalists had worked on each project, usually as researchers. On occasion, journalists contributed project ideas and acted as specialists for a particular beat. Although some journalists populated databases by filling in Excel spreadsheets, the two key actors supervised database creation, manipulation and visualization.
At RC, a team was created in 2013 comprising three web programmers with graphic design skills, a data designer, two or three journalists and an editorial secretary. This team adapts public affairs radio and television reports and continuous news feeds for its platform. It can also support a journalist who wants to produce a data journalism project. Before 2013, regional newsrooms and individual journalists took on several pilot projects and initiatives without the support of the main newsroom. La Presse, after a period of experimentation, progressively implemented a regular procedure. Journalists come up with a project and a programmer assists with gathering data, or ‘scraping’. Web programmers and data analysts are often involved as well. La Presse sometimes calls on external collaborators.
In the other three news outlets, which contributed 17 per cent of our sampled projects, things are a little different. The Gazette created a position in interactive media and data editing for a journalist who had specialized in this type of project. According to traditional indicators (bylines), this journalist participated in only 4 of the 10 projects published by The Gazette during the period studied, although he confirms having supervised and/or consulted on the majority. He also explained that there are no protocols in place and that each project is custom-designed. Meanwhile, L’Actualité does not employ programmers, web programmers or employees dedicated to data journalism development, nor has it established a team to manage this type of production. The graphic design department is sometimes called upon to assist. A large part of its production comes from an external agency specialized in the use and visualization of data (37e Avenue). Finally, Le Devoir published only one data journalism project as defined in this study, and it is the product of a collaboration with an external actor.
Numerical proof conventions and statistical data visualization
Data journalism’s dependency on accessible institutional data also suggests that its conformity to numerical proof protocols was weak. As noted, only 2 per cent of the projects are based on self-built databases and could therefore have involved the entire protocol required for numerical proof. The 16 per cent of projects that mixed institutional data and other sources might have involved a partial fulfilment of the protocol, as they would require both working on pre-structured data and restructuring the mixed data.
In terms of statistical visualization, since the studied productions target the general public, we found that more sophisticated forms are absent from the sample. As represented in Figure 4, a majority of projects contain a map (58%). About half of these are simple maps pointing to locations, while the other half add geo-coded quantitative information to the points on the map. In all, 27 per cent of the projects contain graphics and 30 per cent contain infographics. Infographics differ from traditional graphics and charts in that they tend to mix information with basic graphics to simplify extremely complex subject matter. By way of example, headlines of infographics include ‘The federal budget at a glance’ or ‘What you should know about global warming in two minutes’.

Types of visualization.
It should be noted that the ‘Map’, ‘Graphic’ and ‘Infographic’ categories are not mutually exclusive, and projects with multiple forms are represented as mixed forms in Figure 5. As discussed earlier, statistical data visualization is not, in itself, a statistical analysis activity. However, the forms it assumes provide information about the complexity of both the database-building and the manipulation skills required to produce an appropriate visualization. Following the benchmark identified by Friendly and Denis, we treated the use of multiple forms and of detailed information to cover a story as a sign of sophistication. As can be seen from Figure 5, although the map is the most used form, most of the maps in this sample (71%) stand alone with no additional visual elements: only 29 per cent qualify as mixed forms. The most common source of complexity in maps is the use of geolocalized quantities (55% of the maps). Meanwhile, 67 per cent of the graphics and 63 per cent of the infographics are accompanied by another form of visualization.

Complexity of data journalism projects and their causes.
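Because the ‘Map’, ‘Graphic’ and ‘Infographic’ categories overlap, their shares can add up to more than 100 per cent (58, 27 and 30 per cent in this sample). The tallying logic can be sketched as follows in Python. The coding sheet below is a hypothetical toy example, not the study’s data, and the function names `form_shares` and `mixed_share` are introduced here for illustration only.

```python
from collections import Counter

# Hypothetical coding sheet: each project lists every visualization form it contains.
projects = [
    {"map"},
    {"map", "graphic"},
    {"infographic"},
    {"graphic", "infographic"},
    {"map"},
]


def form_shares(projects):
    """Percentage of projects containing each form (forms may overlap)."""
    counts = Counter(form for forms in projects for form in forms)
    return {form: 100 * n / len(projects) for form, n in counts.items()}


def mixed_share(projects, form):
    """Among projects containing `form`, percentage that also use another form."""
    with_form = [forms for forms in projects if form in forms]
    mixed = [forms for forms in with_form if len(forms) > 1]
    return 100 * len(mixed) / len(with_form)


shares = form_shares(projects)
map_mixed = mixed_share(projects, "map")
```

In the toy sheet the three shares sum to well over 100 per cent, which is the expected behaviour for non-exclusive categories rather than a coding error.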
Navigation and narration elements
Data journalism actors program algorithms and, more specifically, scripts to collect data from a government site or image file. This use of algorithms is difficult to identify because the operation is rarely mentioned in a project’s description, but we were able to detect that at least 23 per cent of the projects involved the use of a script to collect data. Following the benchmarks in Lev Manovich’s views on experimental videogames and web documentaries, we investigated the efforts to improve navigation and narration in the sampled projects. We noted that programming efforts were concentrated on the creation and exploitation of databases rather than on navigation techniques and the construction of narrative elements. Around 19 per cent of the study sample presents navigation efforts and 20 per cent contains elements of narration. Only 3 per cent presents both.
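To give a sense of what a data-collection script of this kind involves, the sketch below extracts the rows of an HTML table using only Python’s standard-library `html.parser` module. It is a minimal illustration under stated assumptions, not a reconstruction of any script used by the newsrooms studied: the page fragment, the establishment names and the `TableScraper` class are hypothetical, and a real script would first download the page from the institution’s site.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a government site.
PAGE = """
<table>
  <tr><th>Establishment</th><th>Fine</th></tr>
  <tr><td>Example Bistro</td><td>500</td></tr>
  <tr><td>Sample Diner</td><td>1200</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(PAGE)
header, *records = scraper.rows
# One dictionary per table row, keyed by the column headers.
dataset = [dict(zip(header, record)) for record in records]
```

The result is a list of records that can be loaded into a spreadsheet or database, which is the step at which a table published for reading becomes data a journalist can analyse.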
Conclusion
This study has validated a few simple assertions. Since 2011, data journalism has grown in popularity in Quebec newsrooms. A total of 98 individuals from all backgrounds, 64 of whom were journalists, participated in creating projects between 2011 and 2013. Most of the journalists involved, however, needed only minimal computer and statistics skills. Indeed, a significant number participated in only one project over the 3 years studied. In contrast, there were five key actors whose cross-disciplinary training and skills (a) allow them to produce complex projects requiring extensive computer skills and (b) equip them to supervise or manage new newsroom production processes that blend journalism, statistical, computer and graphics skills.
We therefore did not observe a development of computer and statistics skills among journalists participating in data journalism. We were able to confirm the hypothesis that a new kind of professional had arrived in the province’s newsrooms, with a handful of actors with atypical professional profiles being responsible for more than half of the data journalism projects. We also found that data journalism in Quebec seems to have benefited from the development of new open source tools, with 72 per cent of the studied projects involving the use of such tools. But, by and large, journalists do not seem to engage deeply with this development.
Furthermore, the quantification of information and visualization of statistical data by journalists in Quebec only rarely respects numerical proof protocols. Most projects present relatively unsophisticated statistical data visualizations based on public data sources. In general, the works in our Quebec sample are heavily weighted towards the programming required to build and visualize databases, but seldom involve other elements common in international examples of data journalism, such as navigation facilities and information exchange with readers on dedicated platforms/interfaces. The majority of the studied projects rely heavily on already accessible public datasets and simply illustrate already-assembled datasets with automated visualization programs, without further analysis or restructuring. Following Friendly and Denis’ data display categorization, these can be described as presentations aiming to stimulate, persuade and inform. They do not stand up as original independent analyses, either during the production process or in their final form as presented to the public. This is linked in large part to the issue of data accessibility in the province of Quebec. As seen above, it can be difficult for Quebec journalists to obtain numerical data concerning specific topics. Data made public by governments and cities in Quebec are often aggregated or of little journalistic interest. Journalists rarely have access to raw data and must, for now, content themselves with datasets pre-processed by public institutions. This situation is not conducive to journalists conducting original analyses of statistical data.
To develop more meaningful and deep-digging data journalism in the future, journalists must control data collection (and, more generally, numerical proof protocols). If data journalism is understood to include original statistical analysis, building a database is a fundamental part of the process. Adopting the naïve positivist position (Desrosières, 2008) that ‘numbers speak for themselves’ (p. 10) means deferring to analysis performed by others and, in this case, relaying information created by the institutions in power. Considering that data journalism is still in its infancy and that journalists can only cut their teeth on readily available data, Quebec journalists still have two avenues for developing more engaging, critical and analytical practices. They can claim and gain more access to public data in a form conducive to statistical analysis, or they can continue to rely on the progress of available technologies to retrieve raw data and, ideally, take a more active part in the creation of these tools.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We would like to acknowledge the financial support of the research center CRISIS of the Université du Québec à Montréal (UQAM), which contributed greatly to the completion of this article.
