Abstract
The increasing availability of statistical data raises opportunities for ‘big’ data and learning analytics. Here, we review the academic literature and research relating to the use of big data analytics in the public sector, and its contribution to public organizations’ performance and efficiency. We outline the advantages as well as the limitations of using big data in public sector organizations and identify research gaps in recent studies and interesting areas for future research.
Introduction 1
The persistent global financial crisis and the attendant strategies for fiscal consolidation have accentuated the problem of maximizing public administration (PA) efficiency. In many areas of public activity – especially in the social welfare domain (e.g. health care, education, elderly care) – securing increased efficiency and productivity ensures that resources are focused on improving the quality of activities and outputs, guaranteeing the satisfaction of citizens who receive the services, and assuring effectiveness and equity in the public sector as a whole.
The literature addressing the empirical measurement of public sector efficiency employs a wide range of techniques and focuses on various units of analysis: local governments (Asatryan and De Witte, 2015; De Borger and Kerstens, 1996; De Witte and Geys, 2011; De Witte and Moesen, 2010; Revelli and Tovmo, 2007), public hospitals (Hollingsworth, 2008), public agencies (Neshkova and Guo, 2012), public transportation services (Pina and Torres, 2001; Sampaio et al., 2008), and education (Cherchye et al., 2010; De Witte and Lopez-Torres, 2017).2,3
The application of empirical models for assessing the efficiency of public entities can also open the door to the study of its determinants, and consequently have interesting implications for policy, administration and management of the public services. In this context, for instance, it can be tested whether particular managerial tools, different roles for the regulations, or stimulating policies and interventions (for instance, favoring competition, or facilitating strategic management processes) have a positive or negative impact on the efficiency of public spending (e.g. see De Witte and Geys, 2011, for a discussion on the preferences of voters in local government efficiency).
The new opportunities offered by big data can help the efficiency analysis of public entities take a further step. 4 More specifically, administrative datasets are nowadays ‘big’ in the sense that individual organizations, in many sectors, periodically produce very detailed questionnaires and databases that include structural or ‘hard’ information as well as soft data about managerial practices, quality of outputs and inputs, etc. In addition, a huge amount of information is released by individual public organizations and can be collected as open data. Finally, the diffusion of e-government practices implies the production of huge amounts of data through, for instance, social networks, open data platforms, and public agencies’ websites.
Yet the way in which governments, policy makers, and public organizations use big data and learning analytics is still an under-investigated topic, as is the possibility of using big data for enriching efficiency analyses and performance measurements. This review and the symposium of papers in this journal issue examine the use of big data both for assessing the efficiency and effectiveness of public and welfare services and for measuring performance in a broader sense.
We are guided by four research foci:
(1) From a theoretical perspective, what are the advantages of using big data in the understanding of public sector organizations? Can new available datasets help the organizations in designing new services, better evaluating their activities, meeting new and more articulated needs, improving the efficiency of operations?
(2) How do public administrations use big data for their internal performance management procedures? Is big data employed for comparing outputs, practices (processes) and resources invested, with an explicit aim of benchmarking with similar organizations? How can using big data improve such benchmarking?
(3) Can big data help the development of new indicators for outputs and inputs, thus allowing innovative efficiency analyses, which can be used to challenge the existing evidence about the efficiency of public administrations? How do these new studies change the implications that derive from existing literature in the field?
(4) Are public policy-makers using big data for designing policies and/or adjusting them, for example following the judgments of citizens that can be processed via adequate analytics?
The main objective of this review is to give an overview of the academic literature and research related to the theme of big data analytics for public organizations’ performance and efficiency measurement, with specific attention to our four themes. To conduct this research, we focus on the most recent studies in leading journals that publish on the relationships between information technology, (public) policy making, PA and government (predominantly the journals Public Policy and Administration and Government Information Quarterly). We also look at recent research reports by important political and consultancy institutions (European Commission, McKinsey Global Institute) and recent books on the topic. For the big data applications, we take a broader look at the literature. It is important to note that the overview is not intended to be comprehensive.
This article is the first within a symposium in ‘Public Policy and Administration’ on ‘Big data analytics and its use in the measurement of public organizations’ performance and efficiency’. The next two papers of the symposium deal with innovative applications. The paper by Johnes and Ruggiero (2017) focuses on revenue efficiency, in particular ascertaining the extent to which, given output prices, producers choose the revenue maximizing vector of outputs. They evaluate efficiencies for English institutions of higher education for the academic year 2012–13 and find considerable variation across institutions in revenue efficiency. The relaxation of the price-taking assumption leads to relatively small changes, in either direction, to the estimated revenue efficiency scores. A number of issues surrounding the modeling process are raised and discussed, including the determination of the demand function for each type of output and the selection of inputs and outputs to be used in the model.
The third paper of the symposium is by Agostino and Arnaboldi (2017). They show how social media data represent a potentially powerful tool in the hands of public authorities to support the evaluation of public service performance. Relying on an action research project in the higher education field, this study explores how social media data, and Twitter data specifically, can contribute to measuring service effectiveness. The aim of the paper is to develop a set of measures, derived from Twitter data, to quantify the effectiveness of higher education services. This investigation supports a broader discussion about the extent to which social media data can contribute to performance measurement in the public sector.
The article at hand has five main goals. First, it provides readers with a general introduction to the topic area: in particular, it aims to give a clear understanding of the most pertinent insights, opinions, results, and big data applications for the public sector that have been described in the literature. Second, a special focus in the review will be on the advantages as well as the limitations of using big data in public sector organizations. Third, the review briefly describes what past studies have written about the use of big data by governments and PAs for internal performance management. In this regard, a particularly interesting research question is whether managers and heads of department have used big data to develop new or better versions of performance indicators (inputs, outputs, and/or outcomes), and/or have used information generated through big data for improving policy making and managerial practices. Fourth, the review considers the potential benefits and applications of big data for commerce and industry (in a context of providing services to the government or not) – this insight is helpful in detecting factors that are increasingly important for the public sector as well. Finally, the review identifies research gaps in recent studies and fruitful areas for future research, with the aim of setting a tentative agenda for interested scholars.
Describing big data
In general, big data refers to huge volumes of (digital) data that are collected from a large variety of sources and that are too large, raw, or unstructured for analysis through conventional database techniques (Kim et al., 2014: 78). A common framework that is used to describe big data is the ‘3-V’ framework with the three dimensions ‘Volume’, ‘Variety’, and ‘Velocity’ (Brynjolfsson and McAfee, 2012; Chen et al., 2012; Gandomi and Haider, 2015; Kwon et al., 2014). In this framework, ‘Volume’ corresponds to the size of big data (typically multiple terabytes or petabytes). ‘Variety’ refers to the composition of the data set and, more specifically, to the structural heterogeneity in the data (i.e. whether the data are structured, semi-structured, or unstructured). Practice shows that only a minority of big data is structured. The ‘Velocity’ dimension refers to the dynamic nature of big data – the speed of collecting, storing and analyzing big data. Regarding this dimension, there is an increasing trend toward generating, collecting, storing, and analyzing data at high frequency (in some sectors and applications even in real time or near real time). While the volume or size dimension is most discussed in the context of big data, Gandomi and Haider (2015) stress that the other dimensions are equally important. In fact, they emphasize that one should avoid focusing exclusively on one particular dimension as there may be interactions between the dimensions. For instance, the interpretation of the ‘Volume’ dimension (i.e. when a dataset can be considered big data) may very well depend on whether the data are structured or not. Unstructured data usually require more storage and analysis capacity and better technologies than structured data. Therefore, the threshold size for unstructured data will be smaller than for structured data.
In addition to the three dimensions of the basic 3-V framework, other dimensions are sometimes used to characterize big data. Gandomi and Haider (2015) and Gani et al. (2015) describe four of these dimensions: ‘Veracity’ (unreliability and impreciseness of some data sources), ‘Variability’ (similar or dissimilar data flow rates), ‘Complexity’ (few or numerous data sources), and ‘Value’ (relative value density).
All of the aforementioned characteristics impose critical challenges to the collection, storage, migration, and analysis of big data (Gandomi and Haider, 2015; Gani et al., 2015). Traditional techniques of data analysis, technologies and tools are poorly equipped to deal with these challenges and work with big data. Big data requires effective and efficient techniques and technology (as well as data organization and management) so that its potential value can be unlocked to guide decision making. Such innovative technologies need to be able to cope with the highly demanding characteristics of big data, in particular the organization, storage and analysis of high volumes of fast-moving data, often from heterogeneous sources and of different data types, and its transformation into meaningful information. Although some new storage and computation technologies have been developed recently (for example, text mining and text analytic techniques, information extraction techniques, text summarization techniques, sentiment analysis (opinion mining) techniques, social media analytic techniques, B-tree-oriented indexing techniques, and audio and video analytic techniques), many more technological advances and analytical techniques will very likely emerge in the near future (for a state-of-the-art taxonomy of the techniques see Gani et al., 2015). A positive evolution in this respect is that new viewpoints in social science (for example, computational organizational science) are now following the developments in big data (for a good discussion of the paradigm shift for computational social sciences and big data, we refer the interested reader to Chang et al., 2014). The idea is that this will enable actors in both the public and private sector to use big data in an efficient and effective, and hence economically feasible, manner in more applications and on a larger scale.
Big data and public sector: Opportunities
In terms of the potential value of big data, there is a growing consensus among governmental stakeholders (i.e. multimedia experts, scholars, policy makers, non-governmental agencies, captains of industry) that big data applications and functionalities provide a broad range of opportunities for governments and governmental institutions worldwide (Brynjolfsson and McAfee, 2012; Chen and Zhang, 2014; Jin et al., 2015; Shaw, 2014). Resulting from this growing awareness, governments worldwide (predominantly in the US, Europe (most notably, the UK and France), Australia, Japan, Singapore, and South Korea) have announced plans and roadmaps to support the development of big data in both the public and private sector (for an overview, see, among others, European Commission, 2010; Kim et al., 2014). Reviews of the literature (Chen and Zhang, 2014; Gandomi and Haider, 2015; Ginsberg et al., 2009; Jin et al., 2015; Morabito, 2015a) showed several interesting new and innovative applications of big data for the public sector that are already in place or that are likely to be implemented in the near future. Policy areas that have been described in the literature as having experienced considerable improvements in outcomes and services thanks to the use of big data are: the organization of traffic (Janssen et al., 2012; Lv et al., 2015), safeguarding of public security, policing (Meijer and Thaens, 2013; Meijer and Torenvlied, 2016), combatting crime and fraud (Chen and Zhang, 2014), health and well-being (Ginsberg et al., 2009), environment and sustainability (Faghmous and Kumar, 2014), transportation (Kim et al., 2014), energy (Diamantoulakis et al., 2015), smart cities (Hashem et al., 2016; Morabito, 2015a), and education (Williamson, 2016). An example of the effective use of big data in the public health sector was discussed by Ginsberg and colleagues in Nature (Ginsberg et al., 2009). 
In their article, they describe how the use of Google search queries helped in monitoring and tracking influenza-like illnesses of citizens in each region of the US so that earlier detection of influenza epidemics was possible. Positive outcomes were a more accurate prediction of the required facilities (for example, hospital beds) and vaccines, and prompt treatment of the patients. Another interesting application of big data analytics in the public sector is tax collection, an area where the call for more justice is increasingly loud. Chen and Zhang (2014) discuss how the use of big data in that area can help tax services in detecting and combatting fiscal fraud more successfully – for example, by creating profiles of people, triangulating information about people, and developing predictive models of ‘evasion taxpayers profiles’.
More generally, big data offers several advantages for public sector organizations. First, big data can help governments in making the shift from paper-filling to e-government services, for instance, through increased integration and data flow across different PAs. While ICT is inherently driving organizations ‘paperless’, it is the combination of numerous data sources, unstructured data and data with dissimilar flow rates that makes this shift specific to big data. This evolution is coherent with a continuing diffusion of ICT as a tool for recording (administrative) information that can be used in a second stage. While in the past (and indeed present!) it was not uncommon for citizens to fill out multiple forms with largely the same personal information for different public service administrations, PAs can now make use of big data technologies to collect the data themselves by sharing the data sources of the other administrations or consulting online data sources (such as Facebook® and LinkedIn®).
Second, big data can play a pivotal role in developing partnerships between governments and their citizens (Bertot et al., 2010). Whereas traditional technology provides limited possibilities to consult and inform the public about new policy instruments or services, big data technologies and infrastructure offer considerable opportunities for governments to foster civic participation in developing, implementing and assessing policy programs. Big data applications are an important support in initiating and implementing direct online democracy, active citizen engagement, and open government initiatives (Bertot et al., 2010; Hong, 2013; O’Reilly, 2010). Margetts and Dunleavy (2013) speak of ‘digital governance’ which puts the interactions between humans and computers at the center of the (national and local) government business model. There is a growing interest by local governments, cities, and municipalities in innovative online tools to collect feedback from citizens and tailor public services to the citizen needs (Andrews, 2011). Mergel (2012) discussed how social media applications such as Facebook and Twitter have become widely accepted and used by the national and local governments worldwide as part of Open Government initiatives and Smart City Governance (Hoon Lee et al., 2013; Meijer, 2016). Mossberger et al. (2013) found that the use of social networks and other interactive tools in the 75 largest U.S. cities skyrocketed in recent years (with the percentage of cities adopting Facebook and Twitter increasing from respectively 13% and 25% in 2009 to 87% in 2011). Morabito (2015a) describes the example of Citysourced.com, a civic engagement software platform used by local governments and cities that offers several facilities for citizens to report and provide information to local authorities about all sorts of local problems (for example, illegal dumping, air or noise pollution, neighborhood violence, malfunctioning of street or traffic lights). 
In a recent opinion piece in Public Administration Review (O’Malley, 2014), the former mayor of Baltimore describes how geographic information systems (GIS) were used to collect citizen requests about city actions and services, and argues that this has changed the way Baltimore is governed, resulting in better administrative choices and better results. Asatryan and De Witte (2015) show for German municipalities that this form of direct democracy fosters local public government efficiency.
Third, and somewhat related to the previous advantage, big data can help PAs compile detailed and accurate profiles of citizens and use them to tailor public services to the needs and demands of the citizens (Bonsón et al., 2015; Heikkila and Isett, 2007). For instance, big data regarding citizen sentiment toward public services (most obviously, obtained by screening web search queries or using social media) can provide useful feedback and highlight opportunities to customize service delivery by helping employees better understand the needs of each citizen. Ho and Coates (2002) found that citizens are able to identify important aspects of government services (for example, the quality and consumer-friendliness of the provided services) that governments often ignore in the evaluation of their own performance. Incorporating these sentiments in the performance evaluation of government policies and services, as well as in the implementation of changes in government policies and services, also enhances the legitimacy of performance measurement as well as the transparency and the accountability toward the citizens (Bertot et al., 2010; European Commission, 2010; Lee and Kwak, 2012). The idea is that all these initiatives should also benefit citizen satisfaction with public services and governments (see Van de Walle, 2017). In addition, as discussed by Mossberger et al. (2013: 352), the customization of information through Web 2.0 features such as RSS feeds or social networks like Facebook or Twitter may lower information costs and hence benefit the cost effectiveness of national and local government institutions and cities.
Fourth, big data can play an important role at the international level. Take, for instance, the growing interest in, and importance of, cooperation and information exchange between agencies and governments of different countries in their war on terror, the battle against tax evasion, and the international coordination of global migration. An unfortunate example of the importance of countries sharing data and information in the war on terror was the bomb attack at the Boston Marathon in 2013 which, according to several research reports, could have been prevented if Russian secret services had shared more information with their American colleagues (Kim et al., 2014: 80). In the complex area of international tax evasion and fraud, a large number of national PA databases could be integrated and shared among countries (by bi- or multi-lateral agreements) to improve fraud detection and tax evasion control (Morabito, 2015b). In the migration policy area, the importance of sharing and communicating migration data more effectively in an international context was recently demonstrated with the opening of a new Global Migration Data Analysis Centre (GMDAC) by the International Organization for Migration in Berlin (IOM, 2015).
Finally, big data can become a new source of information for public organizations for pursuing efficiency and effectiveness in their operations. Determining the efficiency of public organizations is usually a hard challenge (McConnell, 2015). Probably one of the most pervasive problems is the lack of information to determine the quality and quantity of government outputs in objective measures or figures. 5 Several studies (Bertot and Jaeger, 2008; Hofmann et al., 2013; Manyika et al., 2011; Mergel, 2012; Williamson, 2014) advocated that big data can provide public organizations with more detailed information about the quality and/or quantity of the government’s outputs such that more adequate measures of outputs and outcomes for the public sector can be generated. For developing and implementing e-government services, for instance, Bertot and Jaeger (2008) advocated big data as a potentially valuable source of information that can help government in obtaining a clear understanding of which technologies and instruments are most efficient and effective. Interesting information could consist of measures of the awareness and engagement created by government communications – for example, the numbers of likes and comments that people have given to government posts on social media and the prevailing attitude (negative, neutral, positive) of those comments (Mergel, 2012, 2013).
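As a minimal sketch of the kind of engagement measure described above, the following Python function aggregates likes, comment counts, and the share of negative, neutral, and positive comments over a set of government posts. The record layout, the field names, and the assumption that each comment already carries an attitude label are hypothetical and serve illustration only; they are not taken from Mergel (2012, 2013), and in practice the labels would have to come from a separate sentiment-analysis step.

```python
from collections import Counter

# Attitude categories named in the text; any other label is ignored.
ATTITUDES = ("negative", "neutral", "positive")


def engagement_summary(posts):
    """Summarise engagement over a list of posts.

    Each post is a dict with two hypothetical keys:
      'likes'    -> int, number of likes the post received
      'comments' -> list of attitude labels, one per comment
    """
    total_likes = sum(post["likes"] for post in posts)

    # Count how often each attitude label occurs across all comments.
    counts = Counter(label for post in posts for label in post["comments"])
    total_comments = sum(counts[a] for a in ATTITUDES)

    # Share of each attitude among the labelled comments (0.0 if none).
    shares = {
        a: (counts[a] / total_comments if total_comments else 0.0)
        for a in ATTITUDES
    }
    return {
        "likes": total_likes,
        "comments": total_comments,
        "attitude_shares": shares,
    }
```

The attitude with the largest share can then be read as the ‘prevailing attitude’, although such a summary inherits any errors made in the labelling step.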
Big data and performance management of public organizations
Big data can also transform performance management procedures in the public sector. Most importantly, effective use of big data can boost efficiency by reducing the amount of inputs necessary for providing the current service level and/or producing the actual output level (input efficiency) or by increasing the service and/or output level for the current input usage (output efficiency). A global survey, organized by Bloomberg Businessweek Research Services, among top managers of government agencies around the world in 2013, revealed that roughly four out of five leaders are convinced that transformations will take place in the public sector due to the use of big data (Mullich, 2013). A belief held by many managers is that, for some policy areas, big data could result in the use of entirely new management models. Take personnel performance as an example. Here, big data could be used to organize promotions, rewards or salary differentials. Big data can help Human Resource Management (HRM) departments in government institutions to identify and attract resources and talent. Performance dashboards with information on personnel performances can also be constructed and used by managers and HRM to monitor and guide the performances of personnel. In fact, HRM departments of tomorrow will use a variety of data (for example, data on working conditions, employee satisfaction and productivity) to assign tasks more optimally among divisions and employees, improve work conditions and introduce incentives that aim at improving both employee satisfaction and productivity (Brown et al., 2011). Using survey data from US local government managers, Oliveira and Welch (2013) found that social media tools are used for dissemination, feedback on service quality, participation, and internal work collaboration.
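The distinction between input efficiency and output efficiency drawn at the start of this section can be made concrete with a small sketch. The Python function below uses a free disposal hull (FDH) comparison for units with a single input and a single output; the choice of FDH, the single-input/single-output setting, and all names are assumptions made for illustration and are not a method proposed in the studies reviewed here. Both scores are expressed so that 1.0 means no observed peer does better.

```python
def fdh_efficiency(units, name):
    """Return (input_efficiency, output_efficiency) for one unit.

    `units` maps a unit name to an (input, output) pair of positive numbers.
    The evaluated unit is compared only with observed units (FDH).
    """
    x0, y0 = units[name]

    # Input efficiency: the smallest input used by any unit that produces
    # at least y0, relative to the input the evaluated unit actually uses.
    min_input = min(x for x, y in units.values() if y >= y0)

    # Output efficiency: the evaluated unit's output relative to the largest
    # output achieved by any unit that uses no more than x0.
    max_output = max(y for x, y in units.values() if x <= x0)

    return min_input / x0, y0 / max_output
```

Because the evaluated unit is always among its own comparators, both scores lie between 0 and 1. A score of 0.5 on the input side, for example, would mean that an observed peer delivers at least the same output with half the input.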
Turning to the ‘institutional assessments’ of public organizations, Andrews et al. (2010) discussed the importance of (and the differences between) internal and external measures for assessing organizational performance. Several papers and opinion texts (e.g. O’Malley, 2014) criticize the old way of thinking about politics and governing as being largely focused on inputs and, in particular, on the question of how resources should be allocated among the different tasks and problems. In O’Malley’s view, big data, and in particular the fast collection and sharing of a variety of data, will cause a shift from an input-centric approach to an approach that focuses on outputs and outcomes. Morgeson (2014) shares this viewpoint and notes that several national and local governments have already begun shifting the focus from internal performance measures to citizen-centric measures through, among other things, the use of big data. Applications and functionalities of big data are also expected to increasingly change management models for organizing and providing public services. Government managers and heads of department could make use of performance dashboards with a large amount of operational and financial data to evaluate and compare the (cost) efficiency of departments across government agencies or different departments within governmental agencies that are performing broadly similar functions, in the spirit of benchmarking exercises.
Big data: Limitations and risks
Big data does not only offer potential advantages to countries and industry, it also brings several real limitations, challenges and risks (Bertot et al., 2012; Boyd and Crawford, 2012; Picazo-Vela et al., 2012). Desouza and Jacob (2014) somewhat roughly classified these limitations, challenges, and risks in two broad categories: (1) privacy-related problems and (2) technical difficulties. Regarding the privacy issue of big data, one particularly important question is whether the increase in use of big data may cause privacy intrusions (see Boyd and Crawford, 2012). Indeed, the activity of recording detailed individual-level information may be perceived as dangerous for citizens’ intimacy and privacy. National and international legislations have the specific aim of protecting this individual right, thus acting de facto as a regulatory obstacle to the development of repositories for detailed information on individuals. Overall, the balance between individual rights and public interest, when concerning the sphere of personal privacy, is still an argument subjected to fierce debate (Tene and Polonetsky, 2012). As indicated by Kim et al. (2014: 81) and Yiu (2012), the line between collecting and using big data in a proper manner and sufficiently ensuring people’s privacy is fine and more research should be done in order to find a good answer to this intricate question. Another issue, that somewhat relates to the previous issue is the data ownership (who owns the big data?) (Washington, 2014). Interesting cases here are recurring issues concerning data ownership with multinational social media players such as Facebook, MySpace, and Twitter. A particular problem with these global social media players is that their own rules supersede governmental regulation.
On technical hurdles, while the direction toward the use of complex, unstructured, and ‘big’ datasets to inform decision-making is conceptually clear, the development of systems in public organizations to handle big data effectively and efficiently is an issue that is typically far to be solved yet. Most of the challenges mainly center upon dealing with the digitization of big data, diversity of the data types, timely responding to requirements, and handling uncertainties in the data. Challenges may range from the design of storage systems that enable storing vast amounts of data, the design and implementation of collecting and processing systems which enable collecting and combining data from different sources, to the development and use of analysis techniques that enable dealing with the inherent complexity of big data. A critical point is the creation of interoperable datasets, by structurally merging systems developed by different actors, for different purposes. While this technical problem exists for both private and public organizations, it is exacerbated in the public sector, where the software used and the ability of developing ICT innovative solutions are sometimes not effective and transforming government services using ICT innovation is often complex and costly (Manyika et al., 2011; Morabito, 2015a). The existence of these technical problems should raise questions about the development of core competences within public organizations for managing big data, and make them ready for analytics. In other words, the organizations are called to assure the technical ability of working with big data, and they should not focus their attention solely on policy use of data. A white paper by Software and Information Industry Association (SIIA, 2013: 19–26) offers explicit recommendations and guidelines for policy makers, decision makers and governments to capture the potential of big data and data-driven innovation to the maximum. 
One such recommendation is that policymakers should avoid establishing policies that restrain data collection and analysis. Another guideline is that policy makers should opt for flexible, open-ended rules to capture, comingle, store, and analyze big data.
There are also some threats and risks to the use of (at least some types of) big data in policy making that stem from the inherent nature of big data. For instance, one particular threat to the use of big data provided and collected through social media is that some parts of the public may not participate in the information society, or may do so only to a very limited extent, because they lack the knowledge, time, or facilities. Among others, Heikkila and Isett (2007) warned that even though citizens may actively participate and voluntarily provide information and feedback about delivered services, governments should keep in mind that this may provide only a partial or incomplete picture of the experiences, criticism, and needs of the broader communities. The issue of different personality types reacting differently to the presence of social media and to social influences was nicely illustrated by Margetts et al. (2015) at the Oxford Internet Institute (OII) in several experiments (the personality features examined include extravert, pro-self, pro-social, and conscientious). An important outcome of these experiments was that whereas some types of people are typically eager to participate in social media, other types are less willing to do so. Obviously, this affects the quality as well as the representativeness of the big data collected through social media. Junqué de Fortuny et al. (2013) also discuss some of the issues involved in the use of big data (missing data, miscoded data, measurement error, duplicated data, inconsistency), arguing that users of big data should be aware of the presence of such issues as well as their potential consequences – most obviously the lower quality of the big data set. An illustration of a limitation of big data use in policy decision making was given by Lazer et al. (2014). In particular, regarding the success story of using Google search queries for monitoring and tracking influenza-like illnesses among US citizens, as presented by Ginsberg et al. (2009), they remarked that even though the use of Google search queries facilitated the monitoring, it led to a persistent overestimation of flu prevalence.
A final point that is important to discuss concerns the ‘politics’ of big data – i.e. examining why policy-makers should use big data in their decision-making processes. This is important for several reasons. The first reason concerns the possibility of innovating the way services are developed and delivered to citizens. If, indeed, big data allows a clearer and more precise picture of individuals (as claimed by Pirog, 2014), then policy-makers can better understand the behaviors and preferences of citizens and tailor specific services to them. For instance, if a national health service can obtain timely information about people’s activities and health status, it can define an individualized set of services that is ready to use when they arrive at the hospital(s) – also by tracing an electronic set of information. The same reasoning applies to information about other spheres of public services, such as education (where Learning Analytics is indeed diffusing, see Siemens and Long, 2011), elderly care, etc. In this vein, big data opens the door to a new citizen-centricity in service design, oriented toward a clever use of quantitative information without relying only on citizens’ active involvement.
The second reason is the more precise and robust evaluation of interventions. The promise of big data in this specific area can be seen from the perspective that ‘[L]arge-scale, internet derived data sets can be combined with existing traditional data from administrative procedures’ (Mergel et al., 2016: 4), so that the empirical approaches used by social scientists can benefit from a more complete set of indicators about policy outputs. Relatedly, these more integrated datasets make it possible to explore the heterogeneity of policy effects and to focus attention on the ‘tails of the distribution’, i.e. where newly collected data can help characterize the subpopulations of citizens affected by single policies and interventions.
Third, a more extensive use of data analysis will necessarily be fostered by the continuing de-materialization of service delivery. Indeed, to the extent that governments increasingly become e-governments, PAs and organizations can collect data on users’ behaviors. While this trend will lead to some straightforward benefits (such as the reduction of bureaucracy in the intermediation of relationships between citizens and PAs), it also has indirect effects on the amount of digital information created through citizens’ active interaction with administrations’ portals and digital infrastructures. Policy-makers will then become increasingly aware of the potential informative power of the data generated through digitally-delivered services, and they will increasingly adopt tools and instruments that are useful for tracing citizens’ activities and requests (for instance, systems of unified identification such as digital identity cards).
A tentative research agenda
To conclude, we identify some research gaps and interesting areas for future research. One promising research area with the potential to have a strong impact on big data use and research is the study of inputs, outputs and outcomes of big data systems. Several studies have noted the need to develop efficient and effective tools to collect, store, analyze, and visualize big data (Chen and Zhang, 2014; Gandomi and Haider, 2015). The distinction between efficiency and effectiveness is important. Efficiency evaluations of big data systems focus on the input–output link, asking how much output can be produced for a given amount of inputs (for example, how much valuable information can be retrieved from big data). Evaluations of effectiveness, on the other hand, focus on the link between outputs and outcomes (how accurate the retrieved information is). Only a few studies have discussed the inputs, outputs and outcomes of big data systems. Among those few are Gandomi and Haider (2015) and Gani et al. (2015). Both of these studies discuss possible metrics for inputs, outputs, and outcomes of big data systems and techniques (e.g. metrics for the volume dimension of big data). Yet, more research is needed.
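The distinction between efficiency (input–output link) and effectiveness (output–outcome link) can be made concrete with a minimal sketch. The record type, field names, and the two ratio definitions below are illustrative assumptions, not metrics proposed in the studies cited above; they only show how the two evaluations draw on different pairs of quantities.

```python
from dataclasses import dataclass


@dataclass
class BigDataSystemRun:
    """Hypothetical record of one evaluation period (all fields are assumptions)."""
    input_cost: float       # inputs: resources consumed, e.g. staff hours
    items_retrieved: int    # outputs: pieces of information retrieved
    items_accurate: int     # outcomes: retrieved pieces that proved accurate


def efficiency(run: BigDataSystemRun) -> float:
    """Input-output link: information retrieved per unit of input."""
    if run.input_cost <= 0:
        raise ValueError("input_cost must be positive")
    return run.items_retrieved / run.input_cost


def effectiveness(run: BigDataSystemRun) -> float:
    """Output-outcome link: share of retrieved information that is accurate."""
    if run.items_retrieved == 0:
        return 0.0
    return run.items_accurate / run.items_retrieved


# Illustrative numbers only: 200 units of input, 1000 items retrieved, 750 accurate.
example = BigDataSystemRun(input_cost=200.0, items_retrieved=1000, items_accurate=750)
print(efficiency(example), effectiveness(example))
```

Under these assumed definitions, a system can score well on one measure and poorly on the other: retrieving more items per unit of input raises efficiency, but does nothing for effectiveness if the additional items are inaccurate.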
Another area for further research is the development of a theory of how government organizations (should) adopt big data for decision-making and for organizing their actions effectively. Such theories may provide insights that help managers of public organizations implement innovations successfully. Some studies have made interesting attempts at studying and modeling the adoption of innovations (such as big data) in government sector organizations. Mergel and Bretschneider (2013), for instance, described a three-stage process for adopting and integrating social media in government and building communication networks for interacting with citizens and stakeholders. Broadly speaking, these three stages involve an experiment phase (informally working with social media), a regulation phase (drafting norms and regulations), and a formalization phase (the formalization of the types of interactions and new modes of communication in social media strategies and policies). Other models and critical success factors for IT-innovation adoption in government sector organizations (for instance, the Open Government Maturity Model) have been proposed and discussed by, among others, Kamal (2006) and Lee and Kwak (2012). Other studies have explored new models of government practices in the era of big data and digitalization (Williamson, 2014). Nevertheless, as noted by several of these authors (e.g. Kamal, 2006), more research across different government departments and their operational settings is needed to test and further refine the models. Therefore, there is still room for both theoretical and empirical contributions in the field. The research questions should deal with two themes: (i) to what extent can big data provide better and wider sets of information to be used by policy makers and administrators? and (ii) is a more extensive use of big data able to generate a greater propensity toward innovation in public services – and if so, does this in turn lead to better results?
Another theme that warrants further research is how innovations such as big data affect government stakeholders (citizens, suppliers and contractors, and politicians). A useful starting point for such research is the study by Pollitt (2011), who develops a framework for the analysis of technological change. The framework includes the effects on citizens, users of data, service providers, and other stakeholders, as well as on wider cultural norms and beliefs. The influence of innovations on government stakeholders reveals two main trends. On the one hand, monitoring citizens’ perspectives can favor a higher level of involvement in public decisions. This trend is positive not only because it engages citizens per se, but also because it can counter the growing loss of trust in governments (OECD, 2013). On the other hand, big data analytics can combine greater transparency with a better understanding of the underlying phenomena measured by quantitative indicators. In this sense, to the extent that the data are open and publicly available (in the open data spirit, coherent with the big data discourse), several actors can take advantage of monitoring public organizations’ activities and results. For instance, when the (big) dataset of procurement activities is made public, all the companies that supply services to the PAs can be aware of price competition, and citizens can check the efficiency of the related expenditures. In both these cases, future research should be devoted to shedding more light on these processes of change management in public services, as well as on the public value generated through these changes.
Somewhat related to the previous theme, another interesting research question is what shifts in power big data is bringing about. Are there risks involved for governments (especially local governments and the governments of less developed countries)? Is there a risk that big data giants such as Facebook, Google, Twitter, and others may start controlling our lives? A report of the McKinsey Global Institute (Manyika et al., 2011), ‘Big data: The next frontier for innovation, competition, and productivity’, describes how the ownership and use of big data will become a key element of competition between enterprises and countries. Margetts et al. (2015) speak of an unruly new force in the political (but also the economic) world. Manyika et al. (2011: 6) expect that the use of big data will become a key way for leading governments and companies to outperform their peers. In particular, the belief is that leading users of big data (in both the private and public sectors) who succeed in effectively capturing the potential of big data will see their value and power increase at the expense of competitors who lag behind in using big data.
Conclusion
There has been growing interest in big data technologies and related fundamental and statistical research (Chang et al., 2014; Chen et al., 2012). Illustratively, many universities have established research centers on big data (for example, University of California at Berkeley, Columbia University, and Eindhoven University of Technology – Jin et al., 2015). The attention of scholars is warranted given the need to establish a theory of big data. A fundamental analysis of the theory of big data would help to understand the characteristics of big data as well as to develop technologies and management models to work with big data.
In addition, a better understanding would result in clear advantages for public policy and administration. We see at least seven avenues. First, by combining structured and unstructured information and data, public policy and administration can use big data to provide better services for citizens. Second, the de-materialization of procedures and bureaucracy will result in lower costs for administrations, fewer personnel and lower tax rates. Citizens will also benefit from fewer administrative exchanges. Third, we see big data as a solution for security issues, since unstructured data (e.g. phone calls) can be traced and patterns can be predicted. Fourth, in a similar vein, it might offer a solution for environmental issues, as it becomes quicker and easier to keep track of environmental problems and to provide data-driven solutions. Fifth, the response to the current migration crisis can certainly benefit from big data, as administrations can more easily follow people (also in an unstructured way). Sixth, it allows policy makers to increase the citizen-centricity of services, as there are more data for customizing and targeting interventions. Finally, it becomes possible to evaluate interventions more precisely by exploring the heterogeneity of effects via more integrated datasets.
We hope that this symposium can contribute further to the debate and advance knowledge on the theme. The next two papers (Agostino and Arnaboldi, 2017; Johnes and Ruggiero, 2017) provide some innovative ways of tackling the challenges ahead.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
