Abstract
Here, I introduce a novel approach towards data collection for comparative research and present a new data infrastructure on parties, elections and governments, the Parliament and Government Composition Database (ParlGov). This data infrastructure combines a database, data presentation in webpages and software scripts in order to generate more dynamic datasets and to facilitate cooperation. So far, it includes information about more than 1000 parties, around 600 elections (national and European Parliament) and almost 1000 governments with their party composition. These observations are linked to a wide set of information about party positions and make it possible to derive various datasets for studies in political science. To provide a first glance into the potential of this new data infrastructure, I map the political space of the European Union (EU) by drawing on this source.
Introduction
An ‘enterprise of madness’ was the label that colleagues put on Peter Flora’s (1983: 5) efforts to collect data for comparative research in a systematic fashion. Three decades later, students of political institutions and the European Union (EU) are still spending an enormous amount of time creating datasets that are ‘collectively inefficient for the research community’ (Häge, 2011: 456). Here, I am concerned with data and parameters on election outcomes, government compositions and party positions, that is, information about national parliaments, the European Parliament (EP), the Council and the Commission. This information is available in printed and machine-readable form, but it is heterogeneous, of differing quality and hard to combine. We still need more up-to-date information about political institutions in Europe that suits our empirical research, a modern equivalent of data handbooks and yearly reports of political events. Unfortunately, there is as yet no systematic approach to overcoming contemporary limitations in providing adequate and integrated data sources on parties, legislatures and executives. Here, I introduce such a data infrastructure for comparative research, address open questions of collective data accumulation and present applications for students of the EU and comparative politics.
There are well-established ways to collect data on elections and cabinets in printed form such as Mackie and Rose (1991), Woldendorp et al. (1998), Nohlen and Stöver (2010), and the yearly political data section of the European Journal of Political Research (EJPR). These well-respected data sources are the basis of many different digital datasets on political institutions, each derived independently. Coordinated efforts to provide better and more readily available data on parties, elections and governments have largely failed; gathering data for comparative research is a collective action problem. As a consequence, empirical studies of political institutions are difficult to replicate with respect to the data sources they use. This is surprising, as there are now high standards for replicating the statistical analyses of quantitative work (King, 1995). Currently, it is cumbersome to answer even some simple questions of comparative and EU politics because the necessary data sources are of limited availability.
In this article, I introduce a new approach towards data collection in political science and a new data infrastructure on parties, national and EP election results as well as the composition of governments. This information can be used to determine the party political composition of EU institutions (cf. Warntjen et al., 2008). The new infrastructure is named ParlGov (Parliament and Government composition database). A third version of this resource was released in July 2011 (Döring and Manow, 2011). The infrastructure makes use of recent technological innovations and has four components: a database to file the information non-redundantly, computer scripts to calculate institutional parameters, a web interface to present observations in a more accessible manner and a feedback system that allows other researchers to contribute their country expertise in an open and transparent way. The latest version of ParlGov includes more than 1000 parties, about 600 elections and almost 1000 cabinets. These observations are linked to existing data sources such as information about party positions.
To present the new approach towards data collection and its potential, I proceed in three steps. A first section discusses previous approaches to the collection of empirical data on political institutions. A second part presents my ideas towards data collection in political science and introduces ParlGov, a new data infrastructure on political institutions. I conclude by providing an example of using ParlGov for studies about the EU.
Large-scale data collection for comparative research
Approaches in the discipline
Several systematic attempts to collect empirical data for political research have been developed in sub-disciplines of political science over the last decades. But why are there well-established practices for the collection of data for work in political behaviour and political economy but not for research on political institutions? For studies of political behaviour, the cost of large surveys has forced researchers to develop institutionalized ways of collecting and archiving opinion data. As a result, there are national election studies that regularly run large-scale opinion polls and archive their results. For these studies, there are well-established rules about how to conduct, document and file the collected information. As a consequence, students of political behaviour have a large set of archived studies that they can base their analysis on. There are difficulties in combining national election studies across countries and time, but researchers have a wide range of datasets available in digital form upon which to base their empirical work and there are collective efforts to link these existing sources across countries. In the field of political economy, national statistical offices, international organizations such as the OECD (Organisation for Economic Co-operation and Development) and the World Bank, or research institutes provide most datasets. Again, there are institutionalized ways of collecting economic data, updating and archiving them. Empirical studies of political behaviour and political economy start by deriving datasets from institutionally provided sources that are available in a format that makes it possible to apply the information without independent data collection.
The situation is very different for information about political institutions such as election results, government compositions and observations about political parties – the types of data fundamental for comparative politics and studies of the EU. There are established ways to collect, combine and archive this information, but the data are often not suitable for (quantitative) empirical work without major revisions. Currently, students of political institutions spend an enormous amount of time on data preparations. In my view, there is significant room for improving contemporary approaches toward data collection in political science.
What are the empirical sources of information on parties, elections and governments that are available today? What are contemporary approaches towards data collection? Why do I think there is significant room for improvement? Mackie and Rose (1991) and Nohlen and Stöver (2010) are probably the most authoritative sources of data on election results in advanced democracies. Other sources provide information about the party composition of governments in western and central/eastern Europe (Müller and Strøm, 2000; Müller-Rommel et al., 2004; Woldendorp et al., 1998) or information about EP elections (Corbett et al., 2007: 358–365). These sources provide carefully collected information about election results and government compositions in printed form and the library system guarantees that these data are available to all scholars, but it is difficult or time-consuming to draw on them because they are not available in digital form. As a consequence, scholars use different datasets derived from these sources and there is no shared data source that forms the basis of empirical work.
A significant improvement in terms of providing access to empirical information has been to accompany data handbooks with CD-ROMs. The two volumes of the Comparative Manifesto Project (CMP) are the shining examples of this trend (Budge et al., 2001; Klingemann et al., 2006), and Caramani (2000) also makes a significant effort to provide better empirical data about parties and elections in Europe at the sub-national level. Over the last decade, the internet has offered new opportunities for researchers to present and distribute their data and there is now an almost unlimited amount of information on the web. Müller and Strøm (2000) is a good example of work on political institutions that was first published in a format similar to data handbooks but is now accompanied by an online source, the Comparative Parliamentary Democracy Data Archive (CCPD). Using a similar approach, Tausendpfund and Braun (2008) provide an online appendix for the information about elections to the EP and Armingeon et al. (2009) combine political and institutional data with demographic, social and economic variables. These online sources follow a traditional format: they combine a dataset with a codebook that documents the data, their variables and sources.
Finally, there are some more recent approaches towards generating data for political science research that draw on new computer techniques. Høyland et al. (2009), for example, suggest creating automated databases for political research based on official online presentations. They give an example for the Members of the European Parliament (MEPs). Information on MEPs, such as biographical data or committee assignments, is available on the webpage of the European Parliament. However, this information is not presented in a way that makes it suitable for comparative research without modifications and has to be transformed into a data matrix. Høyland et al. suggest applying computer techniques in order to automatically convert these official sources into a data matrix that can be used for empirical work in political science. By running these computer conversions at regular intervals, they provide researchers with data that are up to date and include the most recent official information. With this approach, students of comparative politics do not have to collect and update data themselves but make use of computer tools to convert existing sources into data for political analysis (cf. Häge, 2011). However, these approaches are limited to information that is prepared and made available by other agencies and do not include the type of data that I am concerned with in this article.
Contemporary shortcomings
A broad set of empirical information and data about political institutions (especially parties, elections and governments) is now available. However, several problems hamper scientific progress. First, data sources are often very difficult to combine. This is because different identifiers (IDs) are used across datasets, a problem that may never be solved completely. For parties, it may be difficult to find one unique identifier to link all information about parties across various datasets. Parties split, change their names, or form alliances and we may disagree about how to code these changes over a party’s life cycle. However, we should be able to find overlapping information for most parties and we need sources that link existing observations on political parties. Difficulties in connecting observations also apply to elections and governments, but these data are easier to combine by technical means. Hence, we face the challenge of finding ways to better link existing data sources and to document the problems of merging these sources.
The second critique concerns the enormous number of variables that are often combined into one data matrix at the coding stage. I am not concerned about the amount of information but the lack of distinction between different types of data and the difficulties in comprehending the vast amount of content. Take for example an election result: there is some information that is unique to every election. Other observations have to be coded at the party level, such as the number of seats a party won. There may be data about party positions in a different source and we might want to calculate some institutional parameters from these observations such as the effective number of parties. All this information is often entered manually or semi-manually into one rectangular data matrix, thereby duplicating a lot of observations. Technically, this information should be kept separated in different data tables and be combined by merge scripts or a database design. I will propose three different types of data later in this paper by introducing a novel approach towards data collection in political science. By distinguishing these data types, information can be coded more coherently and consistently.
Third, there is often no systematic way of improving the information that different datasets provide. Sometimes, the exact coding of an election result or a government termination may be controversial, but it is easy to agree on most of the observations. Today, researchers often correct errors they find in their personal copy of a dataset as it was downloaded or generated from a data handbook. They may inform the original collector about a data bug but only rarely is this information included in an updated version of the original data or communicated separately in a list of known issues. Once data are published in a handbook, on a CD or online, these data are fixed forever. Providing stable versions of a dataset is necessary for the replication of analysis. Nevertheless, there could still be updated information in succeeding versions of a dataset or a list of known errors and later releases should inform us about changes and include received feedback. As of today, there are hardly any institutionalized approaches to creating regularly updated digital datasets on legislatures and executives. 1
Finally, data on political institutions should be presented in a more accessible format. Data on political institutions differ from mass-level survey data by providing information at different levels of observation. A dataset may contain variables at the country, election or party level. For most of these observations, we know the ‘true’ values and coding errors should be corrected. However, presenting all observations in a large combined data matrix and a codebook reduces the likelihood of identifying potential coding errors. Traditional data handbooks have presented empirical observations in a more accessible manner by combining data observations, notes and comments. Hence, we should try to find a modern equivalent to present our empirical observations in a more accessible format. Presenting information in different forms may make data errors more easily identifiable and facilitate collaborative data revision.
To sum up, most of the contemporary approaches to collecting data about political institutions no longer match the demands of empirical analysis. Data handbooks, yearly political reports and static datasets offer the information needed for data analysis, but do not present it in a format that can serve as a consistent basis for empirical analysis. These existing sources have yet to be transformed and extended in a way that makes it possible to address a particular research question. As a consequence, most of the current data collections for political analysis are heterogeneous, not up to date and are difficult to combine. Hence, questions of reliability are a major concern for empirical work on political institutions due to differences in the underlying data collections. How can these challenges be overcome and what may a new infrastructure for data on parties, elections and governments look like?
A new data infrastructure
ParlGov is a new data infrastructure to foster empirical work on parties, legislatures and executives. The infrastructure makes use of recent innovations in information technologies and provides an example for new types of collaborative data collection in political science. The new approach towards data collection has four components:
a database to store empirical observations and coded information;
a presentation of data content in webpages;
feedback mechanisms for collaborative data enhancement;
programmed scripts to calculate institutional parameters and to link external datasets to the database.
The data can be accessed via an online interface, but can also be downloaded and used on personal computers. All observations are visualized in webpages and can be accessed as data tables. Users can provide feedback and observations are updated regularly. Yearly releases of static versions of the data guarantee a stable set of information for replication purposes. The following paragraphs describe each of the components of the integrated data infrastructure in more detail.
Empirical information collected
Summary of observations in ParlGov data infrastructure (Version 11/07)
The latest ParlGov version includes observations for 1338 parties and classifies them into party families. It records a party’s name in the original language (native and Latin characters), its English name and the official abbreviation. In addition, all name changes over a party’s history are coded. Observations on parties are also linked to those parties that were formed by merging or by splitting up. This coding scheme makes it possible to track the evolution of a whole party system over time.
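The links between parties that merged or split can be read as a directed graph. The following Python sketch illustrates this reading under stated assumptions: the party labels, the `changes` list and the `ancestors` helper are invented for illustration and do not reproduce ParlGov's actual tables or coding scheme.

```python
from collections import defaultdict

# Hypothetical party changes: (predecessor, successor, kind of change).
changes = [
    ("A", "C", "merger"),
    ("B", "C", "merger"),
    ("C", "D", "split"),
    ("C", "E", "split"),
]

# Index every successor party by the parties it emerged from.
predecessors = defaultdict(set)
for old, new, _kind in changes:
    predecessors[new].add(old)


def ancestors(party):
    """Return all parties that a given party descends from, directly or indirectly."""
    seen = set()
    stack = [party]
    while stack:
        for parent in predecessors.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage_of_d = ancestors("D")
```

Following such links backwards from any party yields its full lineage, which is one way to trace how a party system evolved.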
Parties in ParlGov are linked to a set of well-known datasets with information about party positions at a particular point in time. The major party expert surveys from Castles and Mair (1984), Huber and Inglehart (1995), Ray (1999), Benoit and Laver (2006), and the Chapel Hill Expert Survey Series (Steenbergen and Marks, 2007; Hooghe et al., 2010) are connected to ParlGov. Party observations are also connected to the CMP data (Budge et al., 2001; Klingemann et al., 2006), the EU Profiler (Trechsel and Mair, 2009) and parties from the EES (2009); more external datasets are planned for future releases. This allows users to add data about the political positions of parties from various external sources to all observations in ParlGov. The party table also makes it possible to combine external datasets in order to cross-validate party positions or to derive positional parameters from this information.
The data infrastructure includes all democratic elections for the post-war period and information about the party make-up of governments. The latest version of ParlGov includes 637 elections with about 5485 electoral results at the party level. Among these observations are the results of 127 elections to the European Parliament. Most of the information is based on official electoral results and all parties with seats in national parliaments are coded. For some of the countries, the number of votes is included for all parties that won more than 0.5 percent of the national vote. The coding scheme distinguishes parties that form electoral alliances and run on a joint list from the parliamentary groups these parties join in the legislature. For the former, the percentage of votes is recorded, whereas the number of seats is coded for the latter. This approach makes it feasible to compare party systems in the electoral arena and in the legislature with the help of the data.
To record the party composition of governments, data about cabinets and the parties represented in them have been collected. Cabinets are coded in line with a definition of a change in government proposed by Budge and Keman (1993: 10): any change in the set of parties holding cabinet membership, any change in the identity of the prime minister, any official resignation of a government and any general election. The latest released version includes 892 cabinets with 2133 governing parties. Again, these data on cabinets can be linked to previously presented information about parties and legislatures. By including unique identifiers in all observations, we can combine information in ParlGov with the help of a database, to which I turn now.
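The four criteria for a change in government can be written as a simple decision rule. The Python sketch below is only an illustration of that rule: the `CabinetState` class, the function name and its arguments are hypothetical and are not taken from the ParlGov scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CabinetState:
    """Hypothetical snapshot of a government at one point in time."""
    parties: frozenset
    prime_minister: str


def starts_new_cabinet(previous, current, *, resignation=False, general_election=False):
    """Apply the four criteria quoted above; any single one starts a new cabinet."""
    return (
        general_election
        or resignation
        or previous.parties != current.parties
        or previous.prime_minister != current.prime_minister
    )


before = CabinetState(frozenset({"A", "B"}), "Smith")
unchanged = CabinetState(frozenset({"B", "A"}), "Smith")
party_left = CabinetState(frozenset({"A"}), "Smith")
new_pm = CabinetState(frozenset({"A", "B"}), "Jones")
```

In this sketch, an unchanged coalition under the same prime minister counts as a new cabinet only if an election or a formal resignation intervenes.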
Database
Making use of a database allows us to separate data about political institutions more carefully. Databases come in various forms and relational databases are optimized to store data non-redundantly according to a defined table schema. Take for example an election result: in a relational database, we would create at least two tables for electoral results. One table includes data about each election, such as date and turnout. A second table gathers observations about each party that took part in the election, for example the number of votes and seats won. For each election, the first type of information is observed at the election level, the second at the party level. In a database, we store these data separately and combine them at a later stage. We could also add more information about the parties in a third table such as left/right positions or link to external datasets with this type of observation. For an empirical analysis, we can create a dataset based on these three tables by combining the data sources. In ParlGov, the original tables that record observations are called primary information in order to highlight the fact that these data are collected or coded information and cannot be derived from other observations nor be calculated.
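The two-table design described above can be sketched with Python's built-in `sqlite3` module. The table names, column names and figures are assumptions chosen for illustration; they are not ParlGov's actual schema or data.

```python
import sqlite3

# Hypothetical schema and made-up figures, not ParlGov's actual table layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE election (
    election_id   INTEGER PRIMARY KEY,
    country       TEXT NOT NULL,
    election_date TEXT NOT NULL,
    turnout       REAL
);
CREATE TABLE election_result (
    election_id INTEGER NOT NULL REFERENCES election(election_id),
    party_id    INTEGER NOT NULL,
    votes       INTEGER,
    seats       INTEGER,
    PRIMARY KEY (election_id, party_id)
);
""")
con.execute("INSERT INTO election VALUES (1, 'Examplia', '2009-09-27', 70.8)")
con.executemany(
    "INSERT INTO election_result VALUES (?, ?, ?, ?)",
    [(1, 10, 4000, 40), (1, 20, 3500, 35), (1, 30, 2500, 25)],
)

# Election-level facts are stored once and joined to party-level rows on demand.
rows = con.execute("""
    SELECT e.election_date, e.turnout, r.party_id, r.seats
    FROM election AS e
    JOIN election_result AS r ON r.election_id = e.election_id
    ORDER BY r.party_id
""").fetchall()
```

Date and turnout are stored only once per election, yet every party-level row in the joined result carries them, which is the non-redundant storage the paragraph describes.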
Making use of a database also makes it easier to integrate other datasets. For parties, many different datasets about party positions can be combined by creating a primary table with the party IDs of the different datasets. Having all IDs in one data matrix allows us to combine the different datasets. Previously, I have listed the set of party position data such as expert surveys and manifesto-based sources that are connected in ParlGov. Other types of external observations may include turnout data for every election from a different source or economic data for a country. Keeping this information in separate tables allows potential users to link the external information to primary data as needed. Hence, scholars can decide if they want to link election results with one of the expert surveys or CMP data. Using this approach makes it possible to distribute our database without including the external datasets. The database includes example scripts that demonstrate how to link ParlGov data and external observations.
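A table of party IDs that links external datasets might look as follows. This is a minimal sketch in Python with `sqlite3`; the tables `party`, `ext_survey` and `ext_manifesto`, their columns and all values are invented and only stand in for the kind of sources named above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Primary table: one row per party with the IDs used by each external source.
CREATE TABLE party (
    party_id     INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    survey_id    TEXT,      -- ID in a hypothetical expert survey
    manifesto_id INTEGER    -- ID in a hypothetical manifesto dataset
);
-- External tables keep their own identifiers and remain unmodified.
CREATE TABLE ext_survey (survey_id TEXT PRIMARY KEY, left_right REAL);
CREATE TABLE ext_manifesto (manifesto_id INTEGER PRIMARY KEY, left_right REAL);
""")
con.executemany(
    "INSERT INTO party VALUES (?, ?, ?, ?)",
    [(10, "Party A", "S-A", 501), (20, "Party B", "S-B", None)],
)
con.executemany("INSERT INTO ext_survey VALUES (?, ?)", [("S-A", 3.1), ("S-B", 6.8)])
con.execute("INSERT INTO ext_manifesto VALUES (501, -12.5)")

# Users choose which external source to attach; missing links yield NULL.
linked = con.execute("""
    SELECT p.name, s.left_right AS survey_lr, m.left_right AS manifesto_lr
    FROM party AS p
    LEFT JOIN ext_survey AS s ON s.survey_id = p.survey_id
    LEFT JOIN ext_manifesto AS m ON m.manifesto_id = p.manifesto_id
    ORDER BY p.party_id
""").fetchall()
```

Because the external tables are only referenced by ID, the primary table can be distributed on its own and the external sources attached by whoever holds them.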
Another type of table is generated dynamically, based on primary and external observations. These are virtual tables, views in technical terminology, generated through database operations or software scripts by combining primary and external tables via defined operations. The ParlGov database creates a view for election results that provides information for each party but adds party positions from a different table. Another table on government formation links cabinet parties and election results, gives information about all parties in parliament at every instance of government formation and indicates if a party becomes a government member or not. If any of the primary data are changed, information in the virtual table is updated instantly. Virtual tables can also be created via merge operations in a statistical software package. These tables are most likely to form the basis for empirical work based on ParlGov data. The latest release includes three major views: one on parties, one on election results and one on government formation.
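The behaviour of such a virtual table can be demonstrated with a small `sqlite3` example in Python. The view name `view_election`, the tables and the numbers are assumptions for illustration, not the views shipped with ParlGov.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE election_result (election_id INTEGER, party_id INTEGER, seats INTEGER);
CREATE TABLE party_position (party_id INTEGER PRIMARY KEY, left_right REAL);

-- The view stores no data of its own; it is recomputed on every query.
CREATE VIEW view_election AS
    SELECT r.election_id, r.party_id, r.seats, p.left_right
    FROM election_result AS r
    LEFT JOIN party_position AS p ON p.party_id = r.party_id;
""")
con.executemany("INSERT INTO election_result VALUES (?, ?, ?)", [(1, 10, 40), (1, 20, 60)])
con.executemany("INSERT INTO party_position VALUES (?, ?)", [(10, 3.2), (20, 6.5)])


def seats_in_view(party_id):
    """Read a party's seat count through the view rather than the primary table."""
    row = con.execute(
        "SELECT seats FROM view_election WHERE party_id = ?", (party_id,)
    ).fetchone()
    return row[0]


before = seats_in_view(10)
# Correct the primary observation; the view reflects the change immediately.
con.execute("UPDATE election_result SET seats = 41 WHERE party_id = 10")
after = seats_in_view(10)
```

A correction to a primary table therefore never has to be repeated in the derived tables.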
Some variables that are of interest to political scientists are also logically based on primary and external information but are difficult to calculate with merge or database operations only. These may be complex institutional parameters that have to be calculated by programmed functions. Determining the position of the median party in parliament is one example; various power indexes are another. These observations are calculated by software routines from statistical software packages and are based on primary and external data. Because this information is still virtual, that is, based on other coded observations, these tables are called calculated views. In the latest version of the database, there is, for example, one table that calculates parameters of electoral and party systems (disproportionality, advantage ratio, effective number of parties, etc.) based on the vote and seat share of parties in parliament.
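Parameters of this kind are short functions of vote and seat shares. The Python sketch below uses common textbook definitions as an assumption: the Laakso-Taagepera effective number of parties, the Gallagher least-squares index for disproportionality and the ratio of seat share to vote share as advantage ratio. The text does not state which formulas ParlGov implements, and the input figures are invented.

```python
from math import sqrt


def to_shares(counts):
    """Convert raw votes or seats into proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def effective_number_of_parties(shares):
    """Laakso-Taagepera index: one divided by the sum of squared shares."""
    return 1.0 / sum(s * s for s in shares)


def gallagher_index(vote_shares, seat_shares):
    """Least-squares disproportionality, computed here on proportions (0 to 1)."""
    return sqrt(0.5 * sum((v - s) ** 2 for v, s in zip(vote_shares, seat_shares)))


def advantage_ratios(vote_shares, seat_shares):
    """Seat share divided by vote share for every party."""
    return [s / v for v, s in zip(vote_shares, seat_shares)]


# Made-up election with three parties.
vote_shares = to_shares([4000, 3500, 2500])
seat_shares = to_shares([45, 35, 20])

enp_votes = effective_number_of_parties(vote_shares)
enp_seats = effective_number_of_parties(seat_shares)
disproportionality = gallagher_index(vote_shares, seat_shares)
ratios = advantage_ratios(vote_shares, seat_shares)
```

Multiplying the Gallagher value by 100 gives the percentage-point scale on which this index is usually reported.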
Figure 1 provides an example of how the different data types are interrelated. The figure shows how to determine the median party based on election results from primary data and party positions from an external source. These two types of information are merged – joined in database terminology – through a table that contains information about parties and links the IDs from different datasets. As a result, there is a new table (view) with the electoral results of parties and their policy positions for every observation. Based on this information, the median party for each election can be calculated with a computer script.
Figure 1. Combining different data types.
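The steps shown in Figure 1 can be sketched in a few lines of Python. The function below is an illustration under stated assumptions, not the script used in ParlGov: the three dictionaries stand in for the primary, linking and external tables, all IDs and values are invented, and parties without a linked position are simply left out of the calculation.

```python
def median_party(results, link, positions):
    """Return the ID of the party holding the median seat on a left/right scale.

    results:   {party_id: seats}         primary data
    link:      {party_id: external_id}   table linking IDs across datasets
    positions: {external_id: left_right} external data
    """
    merged = []
    for party_id, seats in results.items():
        external_id = link.get(party_id)
        if external_id is None or external_id not in positions:
            continue  # assumption: parties without a position are ignored
        merged.append((positions[external_id], party_id, seats))
    if not merged:
        return None

    merged.sort()  # order parties from left to right
    total = sum(seats for _, _, seats in merged)
    cumulative = 0
    for _position, party_id, seats in merged:
        cumulative += seats
        # The median party is the first one to push the running total past half.
        if cumulative > total / 2:
            return party_id
    return None


results = {10: 40, 20: 35, 30: 25}
link = {10: "A", 20: "B", 30: "C"}
positions = {"A": 2.5, "B": 7.0, "C": 5.0}

median = median_party(results, link, positions)
```

In this invented example the centrist party with the fewest seats holds the median position, because neither the left nor the right party controls more than half of the seats.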
The database in ParlGov has multiple data tables that are combined to produce datasets for empirical research. This approach makes it possible to combine a wide set of existing information with observations on parties, elections and governments. However, combining such a wide set of sources leads to a data structure that may be difficult to understand at first. This is not a problem of the approach per se, but simply the result of integrating an enormous amount of already existing data. Hence, we have to think about alternative ways to present that content in order to make it more accessible. How can we store highly structured observations in a database and present them in an accessible format?
Data presentation in webpages
Data handbooks have the advantage of presenting empirical information in a very comprehensible way. A description of observations, introductory chapters and footnotes provide very detailed summaries about all aspects of the empirical information in these sources. However, preparing information in such a format makes it difficult to use this information in machine-readable form or to include it in a dataset. For our contemporary work, we need information in a data matrix, which is often difficult for human beings to read. In ParlGov, empirical information from data tables is presented in webpages to overcome these limitations. These webpages are available online as well as offline in a local version and are a modern equivalent of data handbooks.
Webpages are a powerful way of presenting information from databases and they offer an alternative form of data visualization. In ParlGov, all information about parties, elections and governments is presented through these pages. For example, there is one page for each party in the dataset and this page lists all information about the party that is included in the ParlGov data infrastructure, as well as a list of all the names of parties in external datasets that are linked to the observation. If available, the page lists the national and EP elections a party took part in, its government participation and any renamings of the party. Each entry in the list of elections that a party took part in links to a page showing all parties that contested that particular election with their respective electoral results. This election page also gives information about the governments that formed after the election and links to separate pages listing the cabinet members and further information about each cabinet. Again, these pages are based on the same database that is used to generate the data tables for empirical research and the pages are available online as well as offline.
By providing such an alternative format for presenting empirical information, the quality of our coding becomes more transparent and open to close scrutiny by country experts. Later in this paper, I will describe the release strategy and demonstrate how updated data are offered at regular intervals. Here, I want to note only that the webpages are available online for the most recent version and are also included in the dataset released as a static version. The online presentation of the data also allows users to offer feedback on empirical information, and I will now describe the feedback system more generally.
A feedback system to improve cooperation
Most of the observations on parties, elections and governments have defined values for all variables. There is an official election result, an official party name and a date a government is sworn into office. Explicit coding rules may further narrow down coding ambiguities. However, collecting all this information in detail is time-consuming and error-prone, and may leave mistakes undiscovered. Reasonable effort is sufficient to collect data on the number of seats for parliamentary parties and government participation. Nevertheless, having more detailed information and integrating official sources requires the support of country experts with detailed knowledge about the institutional structure of a country. Often it is time-consuming for non-natives to find out details about a particular electoral alliance or about the causes of a specific government breakdown. Hence, giving users and country experts an easy way to access the data and to update them can significantly improve the quality of empirical information in the long term. Similar to scientific publishing, we are in need of platforms to improve our data over time and to debate the coding of ambiguous cases.
New computer techniques can help to integrate the feedback of users and experts. Modern software development techniques have significantly enhanced the potential to collect error reports (referred to as bugs among programmers), feature requests, user comments and documentation. Modern software development offers many valuable ideas for new approaches towards data collection in comparative research and some of these tools are integrated into the ParlGov infrastructure.
Some of these practices are rather straightforward: encourage and provide a mechanism for feedback, document known errors first and fix them later. In its simplest form, this can be done by encouraging suggestions in the documentation and by listing known problems on an internet page. There are some more advanced techniques that foster cooperation and feedback mechanisms. The ParlGov project makes use of online project management software that includes a wiki. 2 Users can file error reports as well as suggestions in such a tool, assess development progress and add data or software scripts. This openly available information allows users to closely follow and to participate in the evolution of the data project.
Versions and archiving
The previous section has highlighted the fact that errors in datasets about political institutions should and can be corrected through feedback mechanisms. These tools encourage experts to provide their knowledge to facilitate a continuous evolution of data collection. As a consequence, the content of the data structures proposed here changes regularly. In addition, including new data and recent political events such as elections or government formations also alters the observations in the database. Hence, the exact same mechanisms that improve the data in the long term undermine standards of replication. Approaches designed to overcome these shortcomings are well established. Nowadays, researchers are encouraged to file their datasets in data archives such as the Interuniversity Consortium for Political and Social Research (ICPSR) or the Economic and Social Data Service (ESDS). These agencies guarantee the archiving and long-term distribution of social science data. In this way, they ensure that empirical information is available and accessible for future researchers.
For ParlGov, there are two types of released datasets. First, there is a stable version that is well documented. In this version, the quality of all observations has been double-checked and all details of the data are documented. This version should provide the basis for empirical work because it gives a fixed and replicable set of information and is also archived. Second, there is a development version that includes all recent changes, user feedback and corrections from error reports. It may also contain some variables and observations that are not yet fully documented. Nevertheless, it contains the most recent events (elections and new governments) with data errors corrected, and some scholars may want to rely on this more up-to-date information. The development version provides the basis for the next stable release, and there will be at least one stable release every year. This release strategy follows the format of yearly data reports in political science journals that document recent political events and make the information available in the long run.
An application: The political space of the EU
How can we map the party make-up of EU institutions over time? Warntjen et al. (2008) present a dataset with some aggregate information about the party political composition of the EU. Similar types of information are used in Franchino (2007) and generated in Veen (2011). These authors rely on independently collected information about parties, elections and cabinets in EU member states to determine the median/mean position of national parliaments, Council, EP and Commission. None of the raw information in these datasets can be easily extended or linked to other sources. In the following sections, I discuss how this type of data may profit from a more generic data infrastructure on parties, elections and cabinets such as ParlGov. In the Web Appendix, I present a set of scripts that derive this information from the infrastructure and generate a dataset about the party political make-up of EU institutions.
To determine the democratic chain of delegation in the EU, we need information about the results of national and EP elections, data on the composition of governments and the College of Commissioners in addition to information about party positions. This information is available today in various sources: Nohlen and Stöver (2010) provide national election results, Tausendpfund and Braun (2008) EP election data, Müller and Strøm (2000) and Müller-Rommel et al. (2004) give information about national governments and the EJPR yearly political data section offers regular updates to this information. In addition, Wonka (2007) offers information about the party make-up of the College of Commissioners. Nevertheless, none of this information is presented in a format that can be easily combined. It is either not available in digital form or lacks a unique set of identifiers that can be applied across datasets. ParlGov includes all this information and can be used to combine the different datasets. 3 Instead of manually assembling a dataset on the party composition of EU institutions, customized software scripts can be used to generate the data on the fly. In the online appendix, I provide a set of scripts to generate the respective datasets on the party make-up of the EU. With the help of these scripts, the party composition of EU institutions can be determined for any point in time and various datasets can be derived. Information about party positions may then be added to locate parties in a political space.
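The idea of determining a party composition for any point in time can be illustrated with a minimal sketch. It assumes a simplified list of cabinet records with start dates and party identifiers; the field names, function names and all values are hypothetical and do not reproduce the ParlGov schema or the scripts in the online appendix.

```python
from datetime import date

# Toy records with invented values; field names are assumptions,
# not the actual ParlGov table layout.
CABINETS = [
    {"cabinet_id": 1, "country": "AAA", "start": date(1983, 3, 1), "parties": [10, 11]},
    {"cabinet_id": 2, "country": "AAA", "start": date(1987, 1, 15), "parties": [10]},
    {"cabinet_id": 3, "country": "BBB", "start": date(1985, 6, 1), "parties": [20, 21]},
]


def cabinet_in_office(cabinets, country, on):
    """Return the cabinet of `country` in office on date `on`, or None.

    A cabinet is assumed to stay in office until the next one starts.
    """
    candidates = [
        c for c in cabinets if c["country"] == country and c["start"] <= on
    ]
    # The most recently started cabinet on or before `on` is the one in office.
    return max(candidates, key=lambda c: c["start"], default=None)


def council_parties(cabinets, countries, on):
    """Map each country to the party identifiers of its government on `on`."""
    composition = {}
    for country in countries:
        cabinet = cabinet_in_office(cabinets, country, on)
        if cabinet is not None:
            composition[country] = cabinet["parties"]
    return composition


if __name__ == "__main__":
    print(council_parties(CABINETS, ["AAA", "BBB"], date(1986, 2, 17)))
```

Because every record carries a shared party identifier, the resulting composition could then be joined to election results or party positions without manual matching.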
Figure 2 is derived from data in ParlGov and gives two-dimensional political spaces for national parliaments, the Council, the EP and the College of Commissioners at the ratification of the Single European Act (SEA) in 1986. The graph provides information on all parties represented in these institutions, their seat strength and their political positions. It also visualizes two institutional parameters, the median and the mean (centre of gravity) position, of each institution in the left/right and pro/contra EU integration dimensions. Party positions for the figure are derived from aggregated positional information that comes with ParlGov and is based on several existing party expert surveys (Benoit and Laver, 2006; Castles and Mair, 1984; Hooghe et al., 2010; Huber and Inglehart, 1995; Ray, 1999; Steenbergen and Marks, 2007). In subsequent work, however, students may draw on some of these sources individually and link them to ParlGov data on election outcomes and cabinet compositions via a merge table. Again, information similar to that in Figure 2 can be calculated for every point in time, and the online appendix provides programmed functions and further examples. In these examples, ParlGov is used as a generic data infrastructure to generate a new dataset with information about parties, elections and cabinets.
Figure 2. European political space on 1986-02-17.
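The two institutional parameters shown in the figure, the mean (centre of gravity) and the median position, can be sketched as seat-weighted statistics. The example below is one common way to compute them, not necessarily the procedure used in the online appendix; the function names and the toy positions and seat counts are invented for illustration.

```python
def weighted_mean(parties):
    """Seat-weighted mean position (centre of gravity).

    `parties` is a list of (position, seats) pairs.
    """
    total_seats = sum(seats for _, seats in parties)
    return sum(position * seats for position, seats in parties) / total_seats


def weighted_median(parties):
    """Position of the party that holds the median seat.

    Parties are ordered by position and seats are accumulated until half of
    all seats are reached. On an exact tie, the lower of the two candidate
    positions is returned (an assumption of this sketch).
    """
    total_seats = sum(seats for _, seats in parties)
    running = 0
    for position, seats in sorted(parties):
        running += seats
        if running * 2 >= total_seats:
            return position
    raise ValueError("parties must contain at least one seat")


if __name__ == "__main__":
    # Hypothetical chamber: (left/right position, seats), invented values.
    chamber = [(2.0, 30), (5.0, 45), (8.0, 25)]
    print(weighted_mean(chamber), weighted_median(chamber))
```

The same two functions can be applied to each dimension separately, for example once with left/right positions and once with positions on EU integration.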
Other applications of ParlGov for studies on the EU may include cross-validating party positions. Here, ParlGov provides a set of information about parties that links party identifiers from several sources such as party expert surveys, manifesto data and a voting advice application for EP elections (EUProfiler). These sources give information about party positions towards European integration and in the left/right dimension at different points in time and can be connected via ParlGov. Moreover, studying second-order elections at the aggregate level with results from national and EP elections may also be easier with the data infrastructure. Observations in ParlGov are combined through unique election and party identifiers. Hence, a dataset with election results of parties in EP elections compared to outcomes in preceding and succeeding national elections can be derived by combining the respective information through a software script. Other information such as party positions, turnout data or information about electoral alliances may also be included. These are only some examples of information on parties, elections and cabinets in ParlGov that may be used for studies of the EU. By having such an infrastructure, students of the EU and of comparative politics more generally have a more easily available source with the raw information needed for much empirical work.
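A minimal sketch of the second-order election example follows. It assumes two simplified tables keyed by election and party identifiers; all field names, function names and values are hypothetical and only illustrate how shared identifiers make the combination mechanical.

```python
from datetime import date

# Toy tables with invented values; the layout is an assumption of this sketch.
ELECTIONS = [
    {"election_id": 1, "country": "AAA", "type": "national", "date": date(1983, 3, 6)},
    {"election_id": 2, "country": "AAA", "type": "ep", "date": date(1984, 6, 17)},
    {"election_id": 3, "country": "AAA", "type": "national", "date": date(1987, 1, 25)},
]
# (election_id, party_id) -> vote share in percent
RESULTS = {
    (1, 10): 40.0, (1, 11): 35.0,
    (2, 10): 34.0, (2, 11): 38.0,
    (3, 10): 42.0, (3, 11): 33.0,
}


def neighbouring_national(elections, ep):
    """Return the national elections directly before and after an EP election."""
    national = [
        e for e in elections
        if e["type"] == "national" and e["country"] == ep["country"]
    ]
    before = max((e for e in national if e["date"] < ep["date"]),
                 key=lambda e: e["date"], default=None)
    after = min((e for e in national if e["date"] > ep["date"]),
                key=lambda e: e["date"], default=None)
    return before, after


def second_order_table(elections, results):
    """One row per party and EP election, with neighbouring national shares."""
    rows = []
    for ep in (e for e in elections if e["type"] == "ep"):
        before, after = neighbouring_national(elections, ep)
        for (election_id, party_id), share in sorted(results.items()):
            if election_id != ep["election_id"]:
                continue
            rows.append({
                "party_id": party_id,
                "ep_share": share,
                # None marks a missing neighbouring election or missing result.
                "prev_national": results.get((before["election_id"], party_id)) if before else None,
                "next_national": results.get((after["election_id"], party_id)) if after else None,
            })
    return rows


if __name__ == "__main__":
    for row in second_order_table(ELECTIONS, RESULTS):
        print(row)
```

Further columns such as party positions or turnout could be attached to each row in the same way, through the party or election identifier.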
Conclusion
The purpose of this paper was threefold. First, I wanted to give an overview of the evolution of data collection in political science, especially data about parties, elections and governments. I discussed the evolution from data handbooks to digitally collected datasets. Second, I provided a summary of the shortcomings of contemporary approaches towards data collection in political science and EU politics. I emphasized the fact that most of the data we need for empirical work on political institutions, that is on parties, elections and governments, are available, but that it is cumbersome to combine existing data sources. Finally, I proposed a number of techniques to improve collective data collection in political science.
In this paper, I have also introduced a new data infrastructure on parties, elections and governments, the ParlGov data infrastructure, and presented an application to studies of the EU. ParlGov offers an infrastructure for empirical research that overcomes many of the shortcomings of contemporary data collection approaches. With the help of a database design, it can combine information on electoral outcomes and cabinet compositions with a wide range of external data sources (especially party positions), and it offers ways to calculate institutional parameters from these observations. Providing collected empirical information in webpages offers a more accessible way of presenting data and links between data sources. Presenting empirical information in such a format should facilitate the integration of detailed country expertise for future revisions, updates and extensions of the data. More information, such as data on presidential and second chamber elections, may be integrated and other external sources will be linked to ParlGov in the future.
The data infrastructure described offers a modern and innovative approach towards data collection for comparative research. It may mark the next step in the evolution of collecting empirical information. Modern datasets for comparative research should encourage collective data gathering and reduce barriers to cooperation. In the paper, I have discussed some recent technologies that significantly lower the cost of collective data gathering. The ParlGov infrastructure provides an example of how to make use of these techniques. Gone should be the days of manually typing information from codebooks and data handbooks into spreadsheets to link existing sources on parties, elections and governments.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Acknowledgements
The ParlGov data infrastructure that I introduce in this paper is based on joint work with Philip Manow, whom I wish to thank for his support. Our work on this infrastructure started at the Max Planck Institute for the Study of Societies (MPIfG) from 2005 to 2007 and continued at the University of Konstanz from 2007 to 2009. At the European University Institute (2009 to 2010), my work on the project was significantly enhanced through support from Mark Franklin, Peter Mair and Alexander Trechsel. In addition, I would like to thank Laurie Anderson, Jan Biesenbender, Fabio Franchino, Thomas Jensen, Alexia Katsanidou, Alyson Price, Julia Sievers and Luca Verzichelli. A previous version of the paper was presented at the workshop ‘Quantifying Europe’, University of Mannheim.
Notes
References
