Abstract
In this article we confront the existing literature on barriers to big data implementation for policy making in municipal governments with empirical findings. We have conducted four case studies in a Dutch municipality in which big data is implemented for policy solutions. This has led us to develop a new, comprehensive model that explains which barriers exist when implementing big data. The study covers the technological, the legal, the informational, the organizational, the ethical and the government-citizen relation dimensions. We argue that ‘hard’ barriers, such as technological and legal ones, prove to be of far lesser influence than the literature suggests. The ‘soft’ barriers, on the other hand, such as alignment issues and ethical considerations that go further than legal aspects, prove to be far more decisive for the policy making process when implementing big data solutions in municipal governments.
Introduction
In the literature the promise of big data is often emphasized: how it will lead to more rational policy making, better prediction of consequences, and more cooperation between public-sector organizations and departments as well as with the private sector and citizens (Snellen, 1994; Bekkers & Moody, 2015). In practice we find that public organizations emphasize big data, but the degree to which they actually process and use big data is quite limited. Because this topic is rather large, we focus on municipal governments to see which obstacles they encounter when implementing big data for policy solutions.
When we turn to the existing literature for the reasons why big data is not being used, or not being used to its full extent, we find a fragmented picture. In some disciplines the ICT architecture is seen as the cause (Vis, 2013; Trelles et al., 2011; Edwards, 2010; Kruizinga et al., 1996; Merz, 2005), in others IT alignment issues are seen as the problem (Henderson & Venkatraman, 1999; Romero, 2011), and in yet other literature the legal aspect (mostly privacy issues) is seen as the biggest barrier (Ohm, 2010; McNeely & Hahn, 2014; Stough & McBride, 2014). What is lacking at this point is an interdisciplinary overview of causes in which the ICT side (architecture), the data side (the data itself), the organizational side (alignment), the legal side (privacy) and the individual cognitive side are combined to give an overview of barriers to big data implementation for policy solutions in municipal governments (Pencheva et al., 2018). Some research has aimed to combine these factors, but it was based on a literature review; an empirical account in which these factors are researched together in one or more cases is lacking (Pencheva et al., 2018; Mergel et al., 2016). We will group and elaborate on these causes when we explain the systematic literature review in the theoretical framework section.
In our paper we want to identify antecedents which help us understand why municipal governments do not use big data for policy solutions to its full extent, and we aim to look at the interplay between these antecedents identified above. Primarily we want to look at the way these individual antecedents relate to one another. We aim for a more interdisciplinary approach in which we describe the antecedents that lead to implementation of big data solutions, and their relationship with each other, in order to fill the gap between disciplines. The main question of the research we aim to answer is: Which antecedents limit or help implementing big data for policy solutions in municipal governments and specifically in the municipality of Rotterdam?
In order to do so we will look at four cases within the municipality of Rotterdam where a big data solution was proposed for implementation. We will focus on the process of implementing the big data solution and not on the future use of the implemented solution for policy making. In one of these cases the results are positive (although some barriers had to be overcome), one case failed, and two cases are still work in progress. The successful case deals with the use of unstructured data in order to uphold the law more successfully in preventing benefit fraud. The failed case deals with the use of unstructured data to create profiles of vulnerable citizens who need governmental aid. The two cases which are still in progress are a case using operational data to measure the effects of policy aimed at preventing and solving youth unemployment, and a case mapping cultural and health data.
In the next section we will situate our research question within existing literature and we will elaborate on existing theory on the topic on the basis of which we will build our conceptual framework. In section three we will present our conceptual framework, our operationalization and our methodology. We will present our findings per case in section four, and in section five we will combine the findings per case and relate them to our conceptual framework. Finally, in the conclusion we will answer our research question and relate findings back to existing literature.
Theoretical framework
Big data is often placed in context by means of the DIKW (data, information, knowledge, wisdom) hierarchy (Rowley, 2007). Data are objective observations of things, events, activities or transactions and carry no meaning without context. Data becomes information when it is combined and organized. Information can be reorganized into knowledge, which is the combination of information with one’s experience and skills, and can be transferred to others. Finally, there is wisdom, which is more abstract (Jifa & Lingling, 2014; Rowley, 2007; Weggeman, 1997; Boersma, 2006; Van der Spek & Spijkervet, 2005; Nonaka & Takeuchi, 1995; Garnder, 1995). Whether something is just data or big data remains a grey area; big data is mostly characterized by what is called ‘the four V’s’ (Mohanty et al., 2015; Minelli et al., 2013). Firstly, there is volume, which relates to the quantity of data: not only the absolute amount, but also the increase compared to the amount before. Secondly, velocity refers to the speed at which data accumulates, for example through frequent renewals or additions. Thirdly, variety refers to the different types of data and data sources, as well as the different formats and structures in which the data is stored. Finally, there is veracity, which entails that the data can be doubted: it is subjective, often misleading and sometimes unreliable. In other words, the data is up for discussion (Lukoianova & Rubin, 2014; Mohanty et al., 2015). In the literature there is consensus on these four V’s, but a formal definition is lacking. For our research we will stay with the generally agreed upon definition by Stough and McBride (2014): big data is a set of databases that are too large to be adequately handled by current technologies and that are viewed as having limited use for analytical purposes because of their irregular and heterogeneous properties.
Often the assumption is made that big data will lead to more transparency, and that because of this transparency policy makers will be able to make more rational (effective and efficient) policy decisions. This assumption rests on the idea that big data will make solutions, problems and alternatives more visible than before (Snellen, 1994; Bekkers & Moody, 2015).
If this assumption were true, the question becomes why not all organizations are using big data. Several barriers to the use of big data can be found in the literature. In order to find these barriers, we have conducted a systematic literature review. We have searched for a combination of ‘big data’, ‘open data’ and ‘linked data’ on the one hand, and ‘policy’, ‘government’, ‘governance’ and ‘public sector’ on the other hand. The choice of search terms is based on the aim of looking at big data (or reasonable synonyms) in the public sector and not in the private sector. We found 484 articles. After screening for relevance to our research topic, type of paper (we dismissed essays and book reviews) and duplicates, we ended up with 74 articles which proved to be useful for our research. On the basis of these articles we have identified barriers to big data use for policy solutions. These barriers were then grouped into six clusters of variables.
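The search strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the boolean query syntax and the helper name `build_query` are our own and are not taken from the study, which does not report the database or the exact query string. Only the search terms and the two article counts come from the text.

```python
from itertools import product

# Search terms as reported in the text.
DATA_TERMS = ["big data", "open data", "linked data"]
SECTOR_TERMS = ["policy", "government", "governance", "public sector"]


def build_query(left, right):
    """Join two term groups into one boolean search string.

    The OR/AND syntax is an assumption; real databases differ in syntax.
    """
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

    return group(left) + " AND " + group(right)


query = build_query(DATA_TERMS, SECTOR_TERMS)

# Every pairwise combination of a data term and a sector term that the query covers.
pairs = list(product(DATA_TERMS, SECTOR_TERMS))

# Counts reported in the text.
FOUND, RETAINED = 484, 74
# Articles dismissed on relevance, paper type (essays, book reviews) or duplication.
excluded = FOUND - RETAINED
```

Three data terms combined with four sector terms give twelve term pairs, and the screening step removes the difference between the articles found and the articles retained.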
The first cluster deals with barriers relating to the data itself, the informational explanation. Firstly, it is possible that the data one is looking for simply does not exist: nobody ever collected it because nobody thought it to be relevant. The data can also be wrong or incomplete or the data is available but cannot be interpreted. Uninterpretable means that there is a data file, however it is unclear what the data mean, it is not operationalized or it is simply too much to oversee (Kaisler et al., 2013; Ribes & Jackson, 2013; Chen, 2006).
The second cluster focusses on the technical explanations. In order to process data an ICT infrastructure is needed. If this infrastructure is not available or not functioning sufficiently, the data cannot be processed or linked. This technical explanation refers both to the available technological resources and to the knowledge of those operating those resources (Vis, 2013; Trelles et al., 2011; Edwards, 2010; Kruizinga et al., 1996; Merz, 2005).
The third cluster focusses on legal explanations. This mostly revolves around the privacy discussion. Firstly, it may not be allowed to collect, store or use the data at all. For example, it is often illegal for non-medical professionals to collect data on personal medical issues. Secondly, it might be allowed to collect and store the data, but only to use it for one specific purpose and not to link it to other data, which limits its possibilities for transparency. For example, some governments limit the possibility of linking crime data to race (Ohm, 2010; McNeely & Hahn, 2014; Stough & McBride, 2014; Mergel et al., 2016; Pencheva et al., 2018; Clarke & Margetts, 2014).
A fourth cluster deals with organizational explanations. The data might not be transferrable between different organizations or organizational departments. This can be a technical problem. The data can be stored in incompatible formats, making it impossible to link the data to one another. This is a standardization problem. It can also be a human problem; people might be unwilling to share their data because of fear of losing autonomy over their own field (Turner & Higgs, 2003; Moody, 2010; Bellamy & Taylor, 1998; Pollard, 1998; Lips et al., 2000; Pencheva et al., 2018).
The fifth cluster deals with IT alignment explanations. Here we find that there is a mismatch between the IT strategy and the strategy of the rest of the organization (Henderson & Venkatraman, 1999). Infrastructures, strategies, processes and skills of IT and the business need to match each other. Quite often this is not the case, and the business has an ‘us and them’ perspective towards the IT department (Henderson & Venkatraman, 1999; Romero, 2011). Communication is of prime importance here. Even if the data exists, it might be the case that only the IT department is aware of this. Employees in that case might not be aware of the existence of the data, and will therefore not use it. Lack of communication between the IT department and the department for which the data might be useful causes this situation to persist (Moody, 2017). The perception of complexity and the conception of ICT is another factor within this cluster. If the conception of ICT held by the IT department differs significantly from the conception held by other departments, cooperation becomes impossible, reinforcing the ‘us and them’ line of reasoning. Employees might believe that what they want (for example combining data) is complex, so they will not ask their IT department to do it for them. Additionally, the IT department might be unaware of the needs of the business and will therefore not offer its services. This deals with the perception of IT itself (Littlejohn & Foss, 2007; Reich & Benbasat, 2000; Luftman, 2003).
The final cluster, which only occurs in the public sector, consists of explanations found in government-citizen relations. Governments are afraid that the data and the conclusions derived from the data might lead to outcomes that are considered unethical; see, for example, the discussion on racial profiling. Additionally, they fear that outcomes will be used or interpreted in a way they consider either wrong or unethical and that this would lead to a loss of legitimacy (Moody, 2010; Bekkers & Moody, 2015; Moody & Gerrits, 2015; Mergel et al., 2016; Pencheva et al., 2018). Another issue deals with accountability. Once data becomes transparent the public might hold the government accountable, which would also lead to a loss of legitimacy (Tene & Polonetsky, 2012; Ohm, 2010; Bekkers & Moody, 2014; Moody & Gerrits, 2015; Clarke & Margetts, 2014).
These clusters of antecedents are intertwined with one another and when combined will lead to the following conceptual framework.
Conceptual framework.
What we find here is that barriers to the use of big data stem from four sources:
Firstly, the data may simply not be available. This can be because a) nobody collected it, b) it is not legally allowed to collect or store the data, or c) there is an alignment issue: the data does exist, but the potential user is not aware of its existence.
Secondly, the data may not be interpretable. This can also have three causes: a) the alignment issue where the person using the data does not know how to interpret it and this is not communicated by the IT department, b) the alignment issue dealing with the frame of ICT, where the person wanting to use the data assumes that he or she will probably not be able to understand the data or the infrastructure, and c) the data cannot be linked to other data and therefore becomes useless in the given context. This last cause can in turn stem from three other factors: 1) it is not legally allowed to link the data, 2) linking is technically impossible because different departments work with different formats, and 3) departments or organizations refuse to share data for fear of losing autonomy.
Thirdly, the data might be wrong or incomplete. Again this can be explained by three factors: a) alignment communication issues, as described above, b) the frame of ICT, the alignment complexity issue as described above, and c) the data cannot be linked, also as described above.
Finally, there is the point of government-citizen relationships. Governments fear the loss of legitimacy and therefore refrain from using big data because a) they fear unwanted outcomes or wrongful use and b) they fear being held accountable when certain information becomes public.
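For readers who prefer a structured view, the four sources and their causes can be written down as a small nested mapping. The sketch below is purely illustrative: the labels are our own shorthand for the causes listed above, and the names `FRAMEWORK`, `LINKING_CAUSES` and `root_causes` are hypothetical and not part of the study.

```python
# Shorthand labels are assumptions made for illustration only.
LINKING_CAUSES = [
    "legal: linking not allowed",
    "technical: incompatible formats between departments",
    "organizational: fear of losing autonomy",
]

FRAMEWORK = {
    "data not available": [
        "informational: nobody collected the data",
        "legal: collection or storage not allowed",
        "alignment: potential user unaware the data exists",
    ],
    "data not interpretable": [
        "alignment: interpretation not communicated by IT",
        "alignment: frame of ICT (perceived complexity)",
        "data cannot be linked",
    ],
    "data wrong or incomplete": [
        "alignment: communication",
        "alignment: frame of ICT (perceived complexity)",
        "data cannot be linked",
    ],
    "government-citizen relations": [
        "fear of unwanted outcomes or wrongful use",
        "fear of being held accountable",
    ],
}


def root_causes(source):
    """List the causes of one source, expanding 'data cannot be linked'
    into its three underlying factors."""
    causes = []
    for cause in FRAMEWORK[source]:
        if cause == "data cannot be linked":
            causes.extend(LINKING_CAUSES)
        else:
            causes.append(cause)
    return causes
```

Calling `root_causes("data not interpretable")` returns the two alignment causes followed by the three linking factors, which makes visible that the linking problem feeds into more than one source.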
In order to research the relations presented in the conceptual framework, and to answer the question of which antecedents found in the literature empirically help or hinder the implementation of big data solutions in municipal governments, we have conducted a qualitative comparative case study in which we have analyzed four cases within the city of Rotterdam. We aim to study how the antecedents found in the literature are interrelated, and to find out why they are interrelated and/or affect each other. Because a large part of the barriers found in the literature are based on subjective and often tacit considerations (such as perception of ICT, communication and autonomy), we chose a qualitative approach to ensure that these tacit considerations become visible through in-depth interviews and observations. Furthermore, a qualitative approach enabled us to also find, in a more inductive manner, antecedents that were not present in the literature (Silverman, 1993). The four cases are all pilot projects within a larger pilot called DARE, which enables the municipality of Rotterdam to focus on information-steered actions. The justification for choosing the city of Rotterdam stems from our wish to look at different degrees of success, and therefore barriers, of big data solutions. The city of Rotterdam provided us with the opportunity to look at four cases with different levels of success within the same municipality. On the one hand this makes our research less externally valid, since all cases are within the same municipality. On the other hand, this allows us to explore how different barriers affect different types of policies and solutions, without being influenced by differences in institutional or organizational setting. We therefore do not claim strong external validity in terms of the results of our research being transferable to other municipalities. However, we do claim that our results hold some external validity across different types of policy.
Additionally, the city of Rotterdam is one of the largest cities in the Netherlands, and we aimed for a municipality which held enough data to ensure that the policy is indeed big data related.
Another point that needs to be mentioned is that all four cases are part of a pilot, thus incurring the risk of a pilot paradox, i.e. those participating in a pilot are in general more enthusiastic about a project and this could bias our results (Van Buuren et al., 2016).
The first case deals with the data map project. This project aims to provide civil servants who operate within neighborhoods with better information on these neighborhoods, presented in the form of a map. The future goal is to be able to make more specific policy per neighborhood on the basis of this information. This project is currently running and the end product (the data map) is not finished yet.
The second case deals with youth unemployment. The project aims to obtain insight into how young adults ‘move’ through all different types of services provided by the municipality of Rotterdam on the basis of system data. This insight can be used to develop policies that enable a more efficient way to deal with the problem of youth unemployment. This project is currently running and the end product (insight in how young, unemployed adults move through the municipal system) is not finished yet.
The third case deals with socially vulnerable citizens (such as homeless citizens) and aims to create more robust information within the social domain both qualitatively and quantitatively on which policy could be based. This project does not have a clear definition because it has not started.
The final case deals with enforcing rules relating to welfare benefit fraud. The aim is to tag citizens with a risk score on whether they potentially will commit fraud so that enforcement can become more efficient, monitoring those who are most likely to commit fraud and creating new policy for fraud prevention. This project is finished and the data tool is in use.
Within the cases we have triangulated our methods so that the weaknesses of one method can be compensated by the strengths of another (Yin, 2013). Firstly, a content analysis has been done on relevant policy documents. Secondly, we have observed the interactions between employees of the city of Rotterdam over a three-month period (the duration of the pilot). Thirdly, we have conducted semi-structured interviews. An interview guide was used to ensure that all barriers we found in the literature were accounted for, while at the same time giving respondents enough room to add their own perception of how the case evolved (Babbie, 2001). A total of 18 individual respondents were interviewed (some more than once). For each case eight or nine interviews were conducted. Respondents who were involved in more than one case were interviewed separately for each case, to ensure their focus was on the case at hand. We have selected respondents on the basis of their function within the organization and their involvement with the case. In order to be able to analyze the big data solution from different perspectives, we have ensured that in each case there were respondents involved in the ICT (technical) side, respondents on the content (bureaucracy-level) side, and respondents on the management level. This enables us to look at the content, at the data and infrastructure component, and at the alignment between the two.
Table 1 summarizes the respondents’ involvement.
Respondents per case and function type
All interviews were taped and transcribed, and together with the documents and the observations the data was analyzed in several steps. First, the data from the different sources was analyzed by closed coding: each barrier in our conceptual framework was assigned a code, and interviews, observations and documents were coded along these lines. In a second step we conducted open coding in order to find variables our conceptual framework might be lacking. Finally, we did axial coding in order to relate concepts to one another and to come to a comprehensive set of findings. The coding process was iterative and the different data sources were constantly compared (Berg & Lune, 2012; Silverman, 1993). In Table 2 we present the coding taxonomy and distinguish between open and closed coding. In the last three columns we present whether we found the codes in our data. The table shows that observations and interviews show the same results, whereas documents show fewer codes. This can be explained by the nature of the documents: they do not represent personal opinions on, for example, communication or ethics, but only show more formal information. We have only indicated whether a code was assigned and not how often, since counts would give a skewed view of reality: the codes prove to hold a causal relation to one another.
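The choice to record only whether a code occurs in a data source, and not how often, can be illustrated with a short Python sketch. The coded segments below are invented purely for illustration and do not reproduce the study's data or its Table 2; the function name `presence_table` is hypothetical.

```python
from collections import defaultdict

SOURCES = ("documents", "observations", "interviews")


def presence_table(coded_segments):
    """Map each code to a per-source True/False flag, ignoring frequency."""
    seen = defaultdict(set)
    for code, source in coded_segments:
        seen[code].add(source)
    return {
        code: {source: source in found for source in SOURCES}
        for code, found in seen.items()
    }


# Hypothetical (code, source) pairs, invented for this example only.
segments = [
    ("legal: privacy", "documents"),
    ("legal: privacy", "interviews"),
    ("alignment: communication", "interviews"),
    ("alignment: communication", "interviews"),
    ("alignment: communication", "observations"),
]

table = presence_table(segments)
```

Because the table stores sets rather than counts, a code assigned twice in the interviews is recorded in the same way as a code assigned once.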
It must be noted that a condition for us to do the research was that all respondents were granted anonymity and they would not be quoted because quotes would be directly relatable to individual respondents given their involvement in the cases.
Below we will summarize the findings for every case per cluster of barriers. In Section 5 we will combine the findings so that we can compare the cases and obtain an overview of the results of our research.
Data map
As of now the data map is not finished. The aim is to provide civil servants working in neighborhoods with more, and combined, information on these neighborhoods in the form of a map in which all geographically based data is combined. This includes traditional geographical data (such as sewer pipelines) as well as more social data (cultural organizations) and general neighborhood information. The future goal is to develop specific policies for each neighborhood.
Coding taxonomy
When we look at informational barriers, we find that a lot of data is not (yet) available or is incomplete. Information on health in particular is limited, while the basic information is present. Respondents also claim that it is difficult to interpret the data correctly, but this is caused mostly by the data being incomplete or outdated. While respondents do mention this as a barrier, they feel that the big data solution is useful even with this barrier.
Technical barriers are also present. Respondents claim it is difficult to find the data they need, and the data is not to be found in the most likely places. According to respondents, the department dealing with the infrastructure is slow to fix these issues, which causes frustration.
In terms of legal barriers, we find that the data is mostly collected by the municipality of Rotterdam and therefore only moved within the organization which is (for these data) legally unproblematic. Efforts are being made to combine these data with data from other (semi) government agencies, none of the respondents found a privacy or legal concern there, especially since the data itself is aggregated data and does not refer to an individual. Possible privacy concerns are discussed with an internal privacy officer, but have not proven to be a problem or barrier.
Organizational barriers seem not to exist, different departments are sharing data and in the future the aim is to also share with other organizations. Respondents feel that sharing information would be easy and other departments or organizations would not object.
There seem to be some issues in the alignment between the IT department and the business department. Both groups of respondents claim to be in ‘two different worlds’. However, respondents do not feel this will be a problem because they believe there are key figures in the project who can link the two worlds. Respondents are confident that ‘all will work out fine eventually’, but they also signal that the business side finds it difficult to properly formulate its needs to the IT department. Among the civil servants working in the neighborhoods the outlook is less positive. They feel that they should not be involved in anything dealing with ICT and that the IT department does not address their questions appropriately.
For government citizens relationships we found that respondents expect that citizens will be pleased with increased transparency and that neighborhoods will obtain more of their needs as a result of the availability of the data map. Not only is it stated that citizens would probably have no objections towards the data map, respondents believe that it would lead to more positive government-citizen relationships. It is explicitly added that this is the case because the data map is not used for surveillance. There is some fear of wrongful use, but this is linked to the outdated or incomplete information and is not seen as a loss of legitimacy, but more as a problem in terms of the work of civil servants who might be working with outdated information.
Youth unemployment
The aim of this project is to combine system data on all young, unemployed adults in order to visualize how these people move through government systems. By visualizing this information, the goal is to help them find work more quickly by predicting (on an individual level) which approach would work best, and to develop policy to make this more efficient.
Looking at the data itself we find that there is a lot of data, but the reliability of the data is questioned by respondents. This stems from the differences in opinion between those running the project and those (caseworkers) collecting the data. Since they do not agree with one another on definitions, the data itself is not seen as reliable. While one caseworker might consider a youngster fit for a certain job, another caseworker might not agree. Also, the data is regarded as incomplete. Respondents feel that they can still work with the data, but worry about its state in terms of completeness and subjectivity.
The technical infrastructure is seen as poor. Respondents feel that the IT department has limited knowledge of the data and is not of much help in solving problems. Additionally, the existing soft- and hardware does not support what the project aims to do and therefore greatly hinders the project.
Some legal barriers have existed in the past in the field of privacy and the linking of personal data. These barriers have been lifted by adopting binding agreements which specify what information can be used for what purposes and by anonymizing the data. The data is collected by the municipality itself and does not leave the organization. It is only presented in aggregated form, and therefore there are no legal boundaries.
Organizational barriers only exist in the legal domain. Some organizational departments are not allowed to share information with other organizational departments, but by anonymizing the data this barrier has been lifted as well.
In terms of alignment, several efforts have been made to integrate the business and IT. Respondents of both departments are positive about these efforts and feel that there is a mutual understanding of needs. However, knowledge and fitting soft- and hardware are still lacking.
In terms of the government-citizen relationship, the unreliability of the data is a concern. There is a fear that information based on these data will be made public and that the public might interpret the data incorrectly. Wrongful conclusions on ethnicity or gender are of particular concern to respondents. Respondents fear that this could decrease the legitimacy of the municipality.
Vulnerable citizens
The aim of the project was to combine data on vulnerable citizens in order to find which government interventions would have the most positive impact. The aim was to create profiles of vulnerable citizens so that groups of citizens could be targeted with the best solution. The project has never started and will not start in the near future.
Barriers in the informational domain are large. Data are missing, and the data set is therefore incomplete. The data are also considered unreliable, since caseworkers dealing with vulnerable citizens have different perceptions from those dealing with the project. More specifically, those dealing with the project aim to categorize vulnerable citizens into profiles, while caseworkers aim for tailored, individual solutions. Additionally, the task concerned has only been a municipal task since 2015, so there is limited longitudinal data.
Technical barriers could not be distinguished because the project never started.
Legal barriers are present in this case because mental and physical health is a core factor within the goals of the project, so sensitive personal data is used. Respondents claim that this concern is large, but also link it to a culture of being careful with personal data causing those involved to be critical of the project. Respondents do have confidence that if the project had started, they would have been able to overcome these barriers by anonymizing the data or by making binding agreements on conditions for which the data could be used.
There were also some barriers in the organizational domain. Several departments work with different systems for entering the data and these systems are incompatible with each other, therefore the data cannot be combined making it impossible to achieve the aim of the project. Additionally, those in the business worried that the project would be largely IT based: aggregating individual data. They fear that they would lose autonomy in their tasks because they would be forced to adopt a more collective way of dealing with citizens instead of being able to come up with tailored solutions for individual citizens.
Alignment issues in this case are large. There is distrust between those working in the business and those within IT. This stems from the nature of the project; where those in the business have a culture of looking at the individual, those in IT want to aggregate data. Those in the business fear that the individual perspective of vulnerable citizens will be lost and that a personal approach will be hindered by the project. Additionally, respondents claim that the IT side does not understand what the business side wants. In general respondents claim that the initiative for the project did not come from the side of the business and therefore there was little enthusiasm and willingness to invest time and resources in the project.
No barriers in government citizens relations could be found. Whether there are none or whether this is because the project never started is unclear.
Fraud
In the fourth case the goal is to analyze data in such a way that risk profiles can be made predicting which citizens on welfare are more likely to commit fraud and to target these groups with new policy. The goal is to detect citizens who commit fraud more speedily and not to bother law abiding citizens, who, before the project, were being monitored. The project at this point is at an end stage and is considered successful.
Hard informational barriers were not found. There is some concern about whether the data is longitudinal enough. In the past the data was not always entered into the system. However, this does not pose a real problem for the project today. There is some minor discussion on the reliability of the data, mostly related to whether it was collected long enough, but this does not pose any problems at this point.
One of the technical barriers found in this project is that, at this point, only already structured data is used. Unstructured data would have to be structured with text mining, which at this point turns out to be impossible due to lack of resources, although according to respondents it would be of added value. Furthermore, we found that there are many agreements in place to make sure that sensitive data is secured properly. At the beginning of the project it turned out to be difficult to interpret longitudinal data since it had to be retrieved from different places (due to organizational choices in the past); however, this barrier has been solved. In contrast, when looking not at the IT department dealing with this project directly but at the IT service as a whole, it becomes clear that respondents are dissatisfied: the infrastructure is considered inflexible, slow and unresponsive. According to respondents: “in the end it did work out, but the process itself was very time consuming and frustrating”.
On the legal side there seem to be few barriers. Respondents claim that they do not have to worry about privacy because all the data comes from and stays within one organizational department. Moreover, respondents state that there is an implicit agreement that when one receives welfare one is aware of the fact that fraud will be monitored. However, we also find that measures are taken to prevent privacy issues from becoming a problem: all data is pseudonymized, and therefore there was no legal barrier.
In the organizational domain we find no barriers: there is no sharing of data or cooperation between departments in the project and other departments or organizations. Some respondents claim that this is the case because difficulties were expected in obtaining this data; others claim that it simply did not fit the scope of the project. Whether this in fact was a barrier is unclear on the basis of respondents’ opinions.
Alignment in this project seems to move smoothly. Even though those working within the business are not able to explicitly explain how the infrastructure works, they are aware of what is happening and can explain why things are done in a certain way. On the IT side we find that considerable effort is made to understand the tasks, goals and interests of the business side, leading to a good understanding between the two. Alignment itself is a goal within the project and conscious efforts are being made to improve existing alignment.
In terms of legitimacy we find that in this project the variable of ethnicity is taken out: all data leading back to the ethnicity of a person on welfare is excluded from the profiles and the data. The reason given for this is not the fear that this information would become public, but the fear that department employees would consider it problematic. Additionally, respondents feel that the project will increase legitimacy: law abiding citizens will be monitored and controlled less often, and the feeling is that the majority of people, being law abiding, would have no problem with improved detection of fraud.
Towards a new conceptual framework
Our findings per case are summarized in Table 3. Combining these findings and comparing cases and reasoning them back to our conceptual framework we find a set of results, which are described below.
Firstly, we find that in all cases there are some concerns about the data. They are considered unreliable, incomplete or outdated. This is to be expected since questions about veracity are linked to big data. This is not necessarily a direct barrier to starting a big data project, but it is a barrier to the scope of the project: projects cannot be broadened if the data is missing or incomplete. There is a link between this barrier and the barrier of legitimacy. In several cases respondents express the fear that incomplete or unreliable data might be interpreted incorrectly, which could cause a legitimacy problem for the municipality, either because decisions are based on incorrect interpretations, i.e. the decision itself is wrong, or because incorrect information is made public.
Technical problems deal with the rigidity of the infrastructure. In all cases except for the one which has not started, respondents are dissatisfied with the infrastructure; they claim it hinders the scope, innovation and ambition of projects because what they want turns out to be impossible or extremely time consuming. This shows that the expected infrastructural barrier is supported by the empirical findings.
This infrastructural point needs to be linked with the alignment point. Where one would expect that these technical, or infrastructural, issues reflect alignment issues, this turns out not to be the case. The reason is to be found in the organizational domain. Within the researched projects we find that the IT infrastructure department of the municipality is not the same as the IT department dealing with these projects. Alignment between the IT department and the business can therefore move smoothly while, at the same time, the infrastructure hinders developments. This supports the expectation that alignment barriers and infrastructural barriers need to be separated.
Legal barriers, as we found them, turn out to be more frame and culture based than law based. While in all cases privacy laws are enforced, we find that the concern for these privacy laws depends on the culture and the frame held by employees. Where those in the cases dealing with health are cautious about the privacy issue, those in fraud are less cautious. According to respondents it comes down to a privacy tradition within the respective fields. In medical sectors it is historically more important to deal with privacy than in sectors dealing with policy enforcement, employment or geographical data. This leads us to conclude that there are legal barriers in big data projects, but also that they are fairly easy to overcome. Ethical matters, on the other hand, are more difficult. In the vulnerable citizens case we find that even though solutions were legally solid, employees were worried about issues concerning privacy beyond the legal scope. In the fraud case we find that the inclusion of ethnicity, while legally allowed, raised ethical problems within the department and therefore was taken out. This prompts an adjustment in which ethical issues outside the scope of legal issues are also considered a barrier.
Summary of findings
Concerning organizational barriers, one of them is explicitly mentioned: in the field of vulnerable citizens, there is a fear of losing legitimacy. This fear can be explicitly linked back to the aims of the department itself. Dealing with vulnerable citizens demands an individual approach, whereas big data solutions always lead to more collective solutions. In the other projects the goal itself was more collective and therefore this problem did not exist. However, implicitly we find more organizational barriers, and those are mostly linked with the legal barriers. The expected fear of loss of autonomy or lack of standardization is only found in the vulnerable citizens case. When speaking about legal barriers and privacy, respondents automatically move towards sharing between organizations and departments. This leads us to conclude that organizational barriers (sharing between organizations) in general are caused by legal barriers.
In our researched cases alignment seems to be the variable determining the success of the project: when alignment is high, success is more likely. The case with a high degree of alignment succeeded; in the cases with some alignment issues, these were overcome and the projects are ongoing; and in the case where alignment was low, the project never started. Given our findings it turns out to be crucial that the business side is convinced of the necessity and added value of the project and is able to formulate its questions in a way the IT side understands. In turn the IT side needs to have a thorough understanding of the aims, wishes and problems of the business side, not only in terms of process but also in terms of content. Without mutual understanding and a continuous effort to improve alignment, big data projects might fail.
Finally, concerning the barriers dealing with loss of legitimacy, we find that in general in our cases projects and results are expected to increase legitimacy. The only fear is that data will be interpreted or used incorrectly and that this will cause a decrease in legitimacy. In one case this is solved by taking out variables that are considered unethical.
When confronting these findings with our original framework we find that we need to adapt it, see Fig. 2.
Adaptation conceptual model.
First of all, we have removed the variables of no data, alignment perception of complexity, organizational standardization, accountability and legal data cannot be stored/used, simply because in our cases we found no evidence for these relationships.
On the basis of the literature, we have assumed that informational barriers have a direct effect on big data implementation. This does not seem to be the case; they do not hinder the start or continuance of projects. Actors within projects will simply accept lower interpretability or incompleteness. However, these variables do have an effect on legitimacy. Respondents fear that poor interpretability or incompleteness will lead to unwanted use (for example ethnic profiling) or to unwanted decisions (i.e. decisions based on false data and information), and this leads to a barrier. Where in the original conceptual framework both informational barriers and legitimacy barriers had a direct effect on implementation, now we find that they are interdependent. Informational barriers cause legitimacy barriers, which affect use.
The literature suggests that the technical, or infrastructural, barrier has an effect on the organizational barrier of standardization, but this does not prove to be the case. The organizational barrier of standardization does not exist in our cases, and the infrastructural barrier has a direct effect on implementation: if the infrastructure is inflexible, it hinders implementation severely.
Following existing literature, we find that legal barriers interact with organizational barriers, but also cause them: if data cannot be shared or linked, it hinders the use of big data greatly.
Alignment barriers turn out to be the most crucial barrier. The frame of complexity is not found in our cases, but communication between IT and business determines implementation and success significantly. Furthermore, the perception of added value is important, which in turn could, but not necessarily would, be created by communication. This is closely related to the goal of the department: if actors do not believe the project has added value, alignment is impossible or fails, which hinders use drastically.
A new variable which we have not found in literature is the ethical variable, which deals with issues that are not covered by legal provisions. Here we find that ethical considerations have both a direct and an indirect influence. Directly, because ethical concerns might cause a project to not even start; whether and to what degree these ethical considerations exist depends on the culture of the department involved. Indirectly, because these ethical considerations also link to legitimacy: there is a fear that ethical matters will cause wrongful decisions or unwanted use, such as in the case of ethnic profiling.
When reaching a conclusion, we must first state that the external generalizability of our findings is low, not only because only four cases are covered within one municipality but also because all cases are pilot cases. This often accounts for a pilot paradox, in which projects are more successful because those willing to join a pilot are often more enthusiastic (Van Buuren et al., 2016). Furthermore, we must mention that decision making is a complex phenomenon and we have not researched the actual decisions in the implementation process in all their aspects. We have only aimed to look at the big data related barriers, which means that our research does not aim to be an all-encompassing study of big data and the implementation of big data for policy. Our conclusions therefore cannot be generalized without reservation, but they are a start for further research on barriers in implementing big data solutions for policy making. Our goal of integrating different barriers from different disciplines into a single model provides us with several conclusions.
Firstly, we find that in literature a lot of emphasis is placed on the legal barriers (Ohm, 2010; McNeely & Hahn, 2014; Stough & McBride, 2014). In practice these turn out to be not that much of a barrier. Organizations seem to be used to dealing with privacy issues and all necessary measures are in place. Several solutions that uphold privacy laws while still making good use of the data are known to actors, and the legal barrier one would expect to find, given the omnipresent discussion of it, does not exist.
We also expected informational barriers, as described in literature (Kaisler et al., 2013; Ribes & Jackson, 2013; Chen, 2006). These are indeed present in situations where the data is uninterpretable or incomplete, but contrary to what the literature states they do not serve as a barrier in themselves. They create fear of unwanted use or wrongful decisions leading to a loss of legitimacy, which in turn serves as a barrier of its own (Mergel et al., 2016; Pencheva et al., 2018). This shows us that those dealing with the data are able to interpret the data; their fear is that others, when presented with the same data, will be less able to do so. Actors are aware of the risk to legitimacy when presenting raw data to others with less knowledge of the content. This does not refer to an accountability discussion: actors do not fear that governments will be held accountable; they instead fear the use of data for political gain, or wrongful decisions which might hurt clients (i.e. citizens). In literature this point is understated. Mergel et al. (2016) as well as Clarke and Margetts (2014) argue that civil servants must leverage data science for the public good, but the consequences in terms of implementation of big data solutions remain under-researched.
Linking to both points made above, an ethical barrier exists. In literature this barrier is mostly combined with or considered similar to the legal barrier (Ohm, 2010; McNeely & Hahn, 2014). In practice this is not the case. Following Mergel et al. (2016) we find that the ethical barrier goes further than the legal barrier and deals with organizational culture. Those in cultures which value privacy highly (such as the medical sector) go much further than the law and much further than those in sectors with a long tradition in big data use (such as geographical information-based departments). It is also feared, just like with the informational barriers, that certain data will be wrongfully used (see the ethnicity discussion) and will therefore lead to loss of legitimacy. It would be interesting in further research to separate the legal and ethical barriers and to see where one ends and the other starts, since we have not found any literature doing so.
Two other barriers seem to be crucial as well. The first is the infrastructural barrier. Here we find that without proper infrastructure big data use becomes impossible. Where in literature this is often considered a problem of the past, our research shows that, at least in municipalities, this problem is present: infrastructures are inflexible and rigid and do not support innovative solutions (Vis, 2013; Trelles et al., 2011; Edwards, 2010; Kruizinga et al., 1996; Merz, 2005). A second barrier deals with alignment and the goal of an organizational department as well as communication. Proper communication between IT departments and the business department is a great enhancer of the use of big data. Alignment only becomes a barrier when the goals of the business department and the goals of the IT department do not match: when the business department does not view the project as having added value, or as serving its goal, the use of big data will not be helped (Henderson & Vankatraman, 1999; Romero, 2011). Our findings support the existing literature on both barriers.
In our cases we therefore find that the ‘hard’ barriers found in literature (infrastructure, information and legal) are not as present as the ‘soft’ barriers (loss of legitimacy, ethical other than legal, alignment). This leads us to conclude that these soft barriers are under-researched. In literature, barriers to big data use are quite often seen as a more technical or legal issue, while in practice ethics, culture and alignment, together with a strong emphasis on the legitimacy of the public sector, are crucial in order to explain the barriers to big data implementation in municipal governments.
We claimed in the introduction that research on big data and the antecedents for implementing big data solutions is fragmented over different academic fields. Now we find that a lot of different causes of or barriers to implementation are linked. Especially the sets of antecedents relating to the data itself and those relating to legitimacy are closely intertwined. This demonstrates that big data in the public domain cannot be viewed solely from a data science viewpoint or solely from a public administration viewpoint. Disciplines need to be combined in order to come to a comprehensive set of theories on the implementation of big data solutions within the public domain. We follow Mergel et al. (2016) and Pencheva et al. (2018) in recommending closer cooperation between the different fields: instead of reinventing each other’s conclusions, new, combined and overarching theories and models should be developed.
