Abstract
Background and objectives:
The evaluation of public health law requires reliable accounts of underlying statutes and regulations. States often enact public health-related statutes with nonuniform provisions, and variation in the structure of state legal codes can foster inaccuracy in evaluating the impact of specific categories of law. The optimal format for empirical analysis is a machine-readable 50-state coded data set. This study provides a comprehensive assessment of these resources and related materials with a focus on statutory data sets.
Research design:
An exhaustive literature search was followed by a “pearling” or “snowball” approach to assure the most complete inventory of this very diverse and diffuse information. We also interviewed three leading investigators to identify barriers to wider use and availability of coded legal data sets.
Results:
We identified relatively few accessible coded statutory data sets, along with others that are not available for use outside the group or individual that compiled them. The Robert Wood Johnson Foundation-funded Public Health Law Research Program has made funding available for the development and dissemination of additional data sets, as well as extensive guidance regarding their use in the evaluation of public health law. Investigators reported serious obstacles to these activities in the past.
Conclusions:
Compilation of coded statutory data sets requires a focused investment of resources that has only recently become available. Funders should require grantees to make their work accessible to other investigators so as to assure development of public health law research and evaluation.
Introduction
The 2011 Institute of Medicine report For the Public’s Health: Revitalizing Law and Policy to Meet New Challenges includes as its final recommendation that the Department of Health and Human Services “convene relevant experts to enhance practical methodologies for assessing the strength of evidence regarding the health effects of public policies. . . ” (IOM 2011). One such methodology receiving significant attention is the empirical analysis of law using coded statutory data sets.
The evaluation of public health law requires access to relevant, timely, reliable, and valid data. The lack of such data is frequently noted as an obstacle to public health law research (Mello and Zeiler 2008). Coded legal data sets are translations of a given body of law into a numerical form that can be used for quantitative analysis (see http://publichealthlawresearch.org/datasets). Such data sets contribute to the evidence base for public health policy by creating quantitative metrics with clearly articulated and replicable classification rubrics that align statutory language, context, and interpretation. They help overcome common obstacles to the comparison of laws across jurisdictions, such as nonuniform statutory language, variation in the placement of similar statutes across diverse statutory code structures, and fragmentation of enacted legislation among state code sections.
Evaluations of public health laws identify variations within categories of law in order to test their relationships with the outcomes that the laws are intended to address. One common example is the exploration of motor vehicle safety laws in relation to fatal crashes (e.g., Dee, Grabowski, and Morrisey 2005; Grabowski, Campbell, and Morrisey 2004). Another under close scrutiny at present is the relationship between various drug and alcohol regulations and associated morbidity or mortality (e.g., McBride et al. 2008; Wagenaar and Maldonado-Molina 2007). Yet another of current interest is the link between types of firearm law and suicide or homicide (e.g., Webster et al. 2004; Frattaroli and Vernick 2006). Policy makers are also interested in fine-tuning soft drink taxation so that its impact on body mass index (BMI) is as effective as increased cigarette taxes in relation to smoking (e.g., Sturm et al. 2010).
Related Approaches
Comprehensive coded data sets such as those that encompass the statutes of all or most states are cited as the gold standard for empirical research in public health law (Tremper, Thomas, and Wagenaar 2010; Mello and Zeiler 2008). However, there are similar but distinct investigations focusing on local ordinances (McCarty et al. 2009), particularly with regard to tobacco control (see, e.g., the work of Americans for Nonsmokers’ Rights at www.no-smoke.org), as well as changes in the law of individual states over time (e.g., National Conference of State Legislatures [NCSL] 2010).
Other methods for organizing legal text can serve as precursors to coded data sets and are useful supports for other public health law research methods. Unlike fully coded data sets, these resources are not coded for all provisions such as those addressing enforcement and sanctions. First, uncoded databases, as distinguished from coded data sets, are common and accessible (see Table 1); their number appears to be growing at present. Second, many organizations provide “authoritative-looking 50-state lists and similar compendia of ‘the law’ on various topics” (Tremper, Thomas, and Wagenaar 2010, 259). These resources must be reviewed carefully with regard to timeliness, coding protocols, and possible bias; the latter concern arises because such tables are often the products of trade or advocacy organizations.
Statutory Databases in Public Health Areas
A third related approach is reflected in the repositories for social science data sets. Harvard’s Institute for Quantitative Social Science Dataverse repository includes some examples (see www.iq.harvard.edu). The University of Michigan’s Inter-University Consortium for Political and Social Research (ICPSR) includes potentially useful data sets and also provides guidance for curating digital information (see http://www.icpsr.umich.edu/icpsrweb/ICPSR/). A search of these and similar sites found no contemporary data set that included laws of all 50 states relevant to public health. There are several addressing the laws of one or more states and municipalities, comparing U.S. law with that of other nations, or examining the impact of specific laws.
The extent and stature of the social science data set sites make them useful models for public health law data set management, even though their content is not precisely on target. The ICPSR repository, which celebrates its 50th anniversary in 2012 and includes half a million research databases, has the express purpose of supporting replication of earlier investigations by making the underlying data sets available. However, the databases are all time-limited rather than continuously updated resources. Those that cover ongoing analyses appear primarily as freestanding, single-year entries.
These alternative approaches to empirical evaluation are far more numerous and accessible than true coded state statutory data sets, and provide useful input for scholars who aspire to create new statutory data sets for public health law evaluation research. Importantly, they alert users to the extraordinary breadth and depth of topics in public health law and the nuance of statutory variations. Fully coded data sets may underlie some of the published material and could be acquired under appropriate conditions. For example, the Americans for Nonsmokers’ Rights website (www.no-smoke.org) invites inquiries for data underlying its data tables and makes data sets available for $550–850, depending on the level of detail requested.
Data Set Construction
Construction of coded data sets suitable for public health law research requires investment of time and expertise, optimally including resources to support periodic updates. An extreme example is the Alcohol Policy Information System (APIS, online at www.alcoholpolicy.niaaa.nih.gov), a vast multidisciplinary project currently funded by the National Institute on Alcohol Abuse and Alcoholism under a 5-year contract averaging over $1 million per year. In contrast, the Robert Wood Johnson Foundation-funded coding projects managed by the Public Health Law Research National Program Office (PHLRNPO) are supported in the range of $50,000 annually. Regardless of cost, sustained investment in coded data set construction yields a product that is more versatile and transmissible than traditional, doctrinal legal research and, given its range of potential uses, may be more cost-effective.
Detailed guidance for the construction of coded statutory data sets is available on the PHLRNPO website at http://publichealthlawresearch.org/methods-guides. Three critical elements, in addition to the background research and technical process of coding, are the development of an underlying database, protocol, and coding manual. Ibrahim (2010) provides a step-by-step strategy; detailed examples are provided in the review of existing data sets that follows.
Creating an electronic database is an essential step in the generation of a coded data set because without it, the data set cannot be revisited or updated. Investigators often begin with a spreadsheet format such as Microsoft Excel, but its querying capacity is limited. The most common software packages used are Google Documents and Microsoft Access; however, the Harvard School of Public Health Obesity Prevention Law Research group has built its own custom database. As Ibrahim states, the goal is to “create a document that will allow others to replicate the search process and achieve the same results” (Ibrahim 2010).
The next step is the development of a data set creation protocol, including construction of an Excel spreadsheet with variables in columns and state laws in rows, two independent initial coding runs with audit by a third coder, and reconciliation of differences by the group. Tremper, Thomas, and Wagenaar (2010) also provide extensive guidance for the development of coding measures, including such considerations as the objective of the evaluation, the legal framework, and significant environmental factors such as cross-border purchases and economic conditions.
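The double-coding step described above can be illustrated with a minimal Python sketch. It assumes each coder's spreadsheet has been exported to a nested mapping with one row per jurisdiction and one column per variable; the jurisdiction labels, variable names, coded values, and function names are all hypothetical and are not drawn from any of the data sets reviewed here. The sketch only lists the cells on which two coders differ, so that a third coder or the group can reconcile them, and reports simple percent agreement.

```python
from typing import Dict, List, Tuple

# One row per jurisdiction, one column per variable (all values invented).
Coding = Dict[str, Dict[str, int]]

coder_a: Coding = {
    "State A": {"handheld_ban": 0, "texting_ban": 1},
    "State B": {"handheld_ban": 1, "texting_ban": 1},
}
coder_b: Coding = {
    "State A": {"handheld_ban": 0, "texting_ban": 0},
    "State B": {"handheld_ban": 1, "texting_ban": 1},
}


def find_disagreements(a: Coding, b: Coding) -> List[Tuple[str, str, int, int]]:
    """Return (jurisdiction, variable, value_a, value_b) for each cell coded differently."""
    diffs = []
    for state in sorted(a):
        for variable in sorted(a[state]):
            value_a, value_b = a[state][variable], b[state][variable]
            if value_a != value_b:
                diffs.append((state, variable, value_a, value_b))
    return diffs


def percent_agreement(a: Coding, b: Coding) -> float:
    """Share of coded cells on which the two coders agree."""
    total = sum(len(row) for row in a.values())
    return (total - len(find_disagreements(a, b))) / total


# Cells to send to the third coder or the group for reconciliation.
disagreements = find_disagreements(coder_a, coder_b)
agreement = percent_agreement(coder_a, coder_b)
```

Simple percent agreement is used here only for brevity; a project protocol might specify a chance-corrected reliability statistic instead.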
Coding manuals are also critical to the transparency and utility of legal data sets. Less formal data presentations are annotated with headnotes or footnotes describing coding systems, while the more complex and robust data sets have freestanding coding manuals. Coding systems must be machine-readable to avoid yet another intermediate step: translating ad hoc codes (colors, emoticons, check marks, etc.) into formats for use with software such as SAS or Stata.
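As an illustration of the intermediate translation step, the following Python sketch recodes symbol-based table cells into numeric values that statistical software can read. The symbol legend, variable names, and function name are assumptions made for the example; an actual table defines its own symbols in its notes or codebook, and the legend would need to be transcribed from there.

```python
from typing import Dict, Optional

# Hypothetical legend: real tables define their own symbols in their notes.
SYMBOL_CODES: Dict[str, Optional[int]] = {
    "●": 1,    # provision present
    "○": 0,    # provision absent
    "": None,  # blank cell: not coded / missing
}


def recode_row(row: Dict[str, str],
               legend: Dict[str, Optional[int]] = SYMBOL_CODES) -> Dict[str, Optional[int]]:
    """Translate one row of symbol-coded cells into numeric codes.

    Raises ValueError on any symbol missing from the legend, so that
    undocumented codes are surfaced rather than silently dropped.
    """
    recoded = {}
    for variable, symbol in row.items():
        key = symbol.strip()
        if key not in legend:
            raise ValueError(f"No numeric code defined for {symbol!r} in {variable!r}")
        recoded[variable] = legend[key]
    return recoded


# Illustrative row; the variable names are invented for this example.
symbol_row = {"provision_x": "●", "provision_y": "○", "provision_z": " "}
numeric_row = recode_row(symbol_row)
```

Failing loudly on an unknown symbol is a deliberate choice in this sketch: it forces the legend, and therefore the coding manual, to be complete before analysis begins.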
Updating statutory data sets, particularly when the original coding was done by a different team, poses its own challenges. If the underlying database is not available or cannot be queried, updating the data set may require creation of a database reflecting the state of the law at the time of the data set. Obviously, greater transparency is a tremendous asset to empirical public health law researchers because it allows them to avoid this step.
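When the underlying database does record effective dates, the state of the law at the time of an earlier data set can be reconstructed by query rather than by repeating the legal research. The Python sketch below shows one way this might look, under the assumption that each record stores a jurisdiction, a variable, a coded value, and an effective date; the record layout, names, and dates are invented for illustration.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Enactment(NamedTuple):
    """One coded change in the law (hypothetical record layout)."""
    state: str
    variable: str
    value: int
    effective: date


# Invented history for two placeholder jurisdictions.
history: List[Enactment] = [
    Enactment("State A", "booster_seat_required", 0, date(1985, 1, 1)),
    Enactment("State A", "booster_seat_required", 1, date(2008, 7, 1)),
    Enactment("State B", "booster_seat_required", 1, date(2009, 9, 1)),
]


def law_as_of(records: List[Enactment], as_of: date) -> Dict[Tuple[str, str], int]:
    """Return the coded value in force on `as_of` for each (state, variable).

    Records are applied in order of effective date, so a later enactment
    overrides an earlier one; records not yet in effect are ignored.
    """
    snapshot: Dict[Tuple[str, str], int] = {}
    for record in sorted(records, key=lambda r: r.effective):
        if record.effective <= as_of:
            snapshot[(record.state, record.variable)] = record.value
    return snapshot


snapshot_2005 = law_as_of(history, date(2005, 1, 1))
snapshot_2010 = law_as_of(history, date(2010, 1, 1))
```

A jurisdiction with no record in effect by the query date is simply absent from the snapshot, which distinguishes "no law yet coded" from a coded value of zero.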
Methods
This review of existing coded legal data sets aims to address the full range of available resources. Thus, it combines traditional literature searches with an adaptation of the “snowball” or “pearling” approach and includes separate searches of PubMed, Lexis-Nexis, EBSCO Academic Premier, Google Scholar, the Cochrane Library, the Social Science Research Network, the Greylit collection of the New York Academy of Medicine, and the websites of prominent nongovernmental organizations (e.g., National Conference of State Legislatures, the Kaiser Family Foundation, the American Lung Association, Americans for Nonsmokers’ Rights) and government agencies in relevant fields (e.g., National Highway Traffic Safety Administration, Consumer Product Safety Commission, Centers for Disease Control and Prevention). Search terms included public health, legal, law, legislative, statutes, statutory, policy, fifty-state, and terms denoting subareas such as tobacco control, alcohol, firearms, immunization, HIV/AIDS, indoor air, competitive foods, child passenger safety, graduated driver licensing, blood alcohol, per se laws, elder abuse, carry-concealed, shall issue, screening, and reporting, along with wildcard symbols specific to the search engines in question.
The snowball or pearling aspect of this search strategy used bibliographies and reference lists from key publications as supplementary resources. Unless the related research had an explicitly historical focus, the search was limited to data sets updated from 1991 forward. The initial phase included data sets that were not fully coded in their published form in order to test the hypothesis that coded versions might exist elsewhere. For example, data sets that are intended for a broad audience might not appear in the same forum as those coded for machine-read analysis.
Results
We found few true coded legal data sets despite the breadth of the search protocol. One example of our search findings is the use of the search term “50 state” within the PubMed database. For the 20 years 1991–2011, the search yielded a total of 61 citations, of which 36 had some relationship to health care law or public health. Nine articles included data tables, but only one included a link to a coded data set, and it comes from the PHLRNPO group itself (Ibrahim et al. 2011). Another is the subject of an upcoming data set in the same repository (Mello, Pomerantz, and Moran 2008).
While not in the public health legal domain, Avraham’s database on medical malpractice law is structurally similar to the public health law coded data sets. The database has been updated twice, most recently in late 2010, and is available at no charge online at http://www.utexas.edu/law/faculty/ravraham/dstlr.html or on the Social Science Research Network at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=902711. Coding notes appear in the spreadsheets themselves, and a separate document notes issues identified in the coding process.
Data Set Inventory
A general characterization of state statutory data sets for public health law research is given in Table 2. There are several rich and accessible data sources available through the National Cancer Institute, the National Institute on Alcohol Abuse and Alcoholism, Americans for Nonsmokers’ Rights, the National Highway Traffic Safety Administration, the Centers for Disease Control and Prevention, and research organizations such as Bridging the Gap and its affiliate, ImpacTeen. Other data sets are available from the compiler, such as Mark Hall’s data set of state managed care patient protection laws (2004). Still others clearly must exist but are not readily available to other investigators. For example, Grabowski, Campbell, and Morrisey (2004) address the relationship between motor vehicle licensure restrictions for older adults and motor vehicle fatalities using statutory variation as an independent variable. Likewise, Vernick and Hepburn (2002), Webster et al. (2004), and Vernick et al. (2006) are clearly working from coded data sets in their evaluations of state gun-related laws, but their underlying data are not available.
Public Health Law-Related State Statutory Data Sets
APIS
APIS provides data in uncoded natural language, so its classification protocol is particularly important. The Alcohol Policy Classification System that underlies APIS organizes bills and regulations according to policy areas in nine broad categories: alcoholic beverage control; taxation and pricing; advertising, marketing, and mass media; transportation, crime, and public safety; health care facilities and financing; education; public services, functions, and programs; employment and workplace; and other policy issues. Additional cross-cutting dimensions are included as needed, and each category has several subsets. For example, the category “Employment and Workplace” includes subcategories addressing alcohol in the workplace, employee assistance programs, profession and occupation-specific alcohol issues, and other employment and workplace issues. Cross-cutting dimensions include demographic groups; beverage types; penalties, liabilities, and incentives; special jurisdictions; and other dimensions.
The search page covers 35 topics and is searchable across both time and jurisdictions. Each topic is addressed with regard to general description, data on a specific date, changes over time, and timeline of changes; maps and charts are also provided. While the Policy Topics data are updated as research is completed, the tabulation of enacted laws and regulations is not being updated on a regular basis (see http://www.alcoholpolicy.niaaa.nih.gov/Frequently_Asked_Questions.html#How+often+will+the+information+available+on+the+APIS+Web+site+be+updated%3f). However, the Change Log tab provides information about updates as they are made. Additional data sets from the Statewide Availability Data System II, spanning the period 1933–2004, are available at no charge upon submission of a request form (see http://www.alcoholpolicy.niaaa.nih.gov/Data_System_Details.html).
The APIS coding system uses symbols rather than alphanumeric characters and may thus require recoding for empirical analysis. Each table is clearly annotated with regard to definitions, state-specific variants, and links to relevant statutes. Because APIS receives continuous funding, it is updated more frequently than other data sets. Examples of APIS-related scholarship include Wagenaar, O’Malley, and LaFond (2001); Wagenaar and Toomey (2002); Wagenaar et al. (2005); and Wagenaar and Maldonado-Molina (2007).
ImpacTeen
Part of the Bridging the Gap research group's work at the University of Illinois at Chicago, ImpacTeen provides the following four groups of data sets in Excel spreadsheet form, each with its own codebook:
Illicit Drug Legislative Database. Addresses controlled substances scheduling (1999–2001), penalties for sale or possession of selected substances (1999–2001), and medical marijuana policy (1999–2001). The codebook is very thorough, but the material is of questionable utility because it is over a decade old.
Tobacco Control Policy and Prevalence Data, 1991–2008. This data set synthesizes the data tables in the CDC’s State Tobacco Activities Tracking and Evaluation (STATE) system, the National Cancer Institute’s State Cancer Legislative Database (SCLD) program, and the American Lung Association’s State Legislated Actions on Tobacco Issues (SLATI) system, along with historical information on tobacco taxation. The detailed and useful codebook includes a discussion of areas in which this analysis diverges from those of the CDC STATE system and Americans for Nonsmokers’ Rights. Analyses based on these data include Chriqui et al. (2003); Chriqui, Ribisl, et al. (2008); and McMullen, Brownson et al. (2005).
SmokeLess States Data. The statutory data set component of this project includes state tobacco-related legislation from 2002 and 2003, covering tax, smoke-free air, and Medicaid smoking cessation coverage legislation. Coding does not include the laws’ effective dates but otherwise appears thorough. These data sets are the basis of the 1999–2006 Strength of Tobacco Control scores by state (Stillman 2006).
State Snack and Soda Regulation. The most current of ImpacTeen’s data sets, this section covers three topics: state snack and soda sales tax data, definitions of food in state law, and taxes on soda other than sales taxes. Coding is indicated at the top of each column rather than set out in a separate codebook. Analysis appears in Chriqui, Eidson, et al. (2008), Mâsse, L.D. et al. (2007) and Mâsse, L.C. et al. (2007).
A related group also compiled a coded data set of methamphetamine precursor laws that was used in a report to the Department of Justice (McBride et al. 2008).
Public Health Law Research National Program Office
The website for the Public Health Law Research National Program Office (PHLRNPO) at Temple University includes links to data sets that address distracted driving, obesity prevention, and child passenger safety (www.publichealthlawresearch.org). Each entry includes a coding manual and detailed description of the process used for development.
Distracted driving. The distracted driving data set covers 1992–2010 laws from all 50 states and the District of Columbia restricting mobile communication device use. There are 22 variables in the distracted driving data set addressing the populations targeted by the laws (e.g., young drivers), the types of communication devices and uses addressed, and exemptions such as use by public safety officers.
Obesity prevention. An obesity prevention data set, under development as of this writing, will cover over 100 coded variables for laws enacted in all 50 states between 2002 and 2007, including such topics as menu labeling, BMI reporting, and school food regulation. The project codebook and research protocol are available on the PHLRNPO website at www.publichealthlawresearch.org. The authors are on the faculty of the Harvard School of Public Health.
Child safety restraints. The forthcoming child safety restraints data set, another posting at the PHLRNPO website, will include over 100 coded variables addressing child passenger safety legislation from the initial car seat enactments in 1981 through 2010. The authors are faculty members at the New York University School of Public Health.
The Public Health Law Research website also includes links to data sources discussed above such as the Americans for Nonsmokers’ Rights and Alcohol Policy Information System websites, as well as nonlegal data sources including the Behavioral Risk Factor Surveillance System and the county health rankings compiled by Mobilizing Action Toward Community Health (www.countyhealthrankings.org).
Use of Coded Data Sets
A survey of peer-reviewed publications that use coded statutory data sets for empirical evaluation demonstrates the value of meticulous and transparent coding in several ways. Most of the authors created and maintain the data sets themselves or are closely associated with the data set staff. Typically, the same individual or team acquires the statutes, identifies and verifies their characteristics, develops and deploys coding systems, and then performs the empirical research using their work product. Some data sets are included in publications or otherwise readily available, but many are not. Examples are described in Table 2.
Evaluations of these public health-related laws generally link variations in statutory provisions with outcomes of interest: motor vehicle safety laws with fatal crashes (e.g., Dee, Grabowski, and Morrisey 2005; Grabowski, Campbell, and Morrisey 2004), drug and alcohol regulations with related mortality (e.g., McBride et al. 2008; Wagenaar and Maldonado-Molina 2007), firearm law with suicide or homicide (e.g., Webster et al. 2004; Frattaroli and Vernick 2006), and soft drink taxation with BMI (e.g., Sturm et al. 2010), to name a few.
Statistical analyses are used to report associations between types of law and outcomes in the language characteristic of similar analyses linking public health causes and effects. As Mello and Zeiler (2008) note, this empirical, quantitative approach to legal analysis is unfamiliar to most legal scholars, so it is not surprising that most publications are in journals outside the typical range of legal scholarship. A rare counterexample is the analysis by Burris and colleagues (2007) of the relationship between HIV law and related behavior in two states, published in the Arizona State Law Journal. More commonly, empirical public health law research is published in medical, public health, or economics journals.
Conclusions and Recommendations
At present, only a handful of true coded statutory data sets cover all or most states’ law on topics relevant to public health. The PHLRNPO is expanding both access to and use of rigorously compiled and coded data sets. Another important contribution of the PHLRNPO group is development and dissemination of methods to support expanded creation of the data sets.
Data tables and databases are widely and readily available, but less transparent and useful for empirical research because their contents are not accessible in a ready-to-use coded format. Some data tables and databases may be associated with coded data sets that are available at relatively modest cost. Access to these data sets would be particularly helpful because they are supported by organizational resources that allow for regular updating. A more formal approach to the groups that maintain statutory databases, such as an invitational conference, may facilitate data set identification and access. Given the cost and expertise required to create coded statutory data sets, investment of additional resources to make optimal use of existing resources is likely to be cost-effective.
Funding has been made available for coding projects through the Public Health Law Research Program’s Rapid Response initiative. A requirement that coding protocols and codebooks be accessible will make data sets more understandable to new users by giving them the tools to replicate the original investigators’ work independently. As the community of coded data set creators and users grows, updates are more likely to arise and to be coded to remain consistent with existing protocols.
Agencies and foundations funding public health law research should contribute to building the field by requiring their grantees to make work products publicly accessible with adequate supporting documentation, including coding manuals and protocols. This recommendation reflects the growing public discussion regarding access to source data for additional investigations and assessment of existing findings. The NIH Open Access policy (http://grants.nih.gov/grants/policy/data_sharing/) requires recipients of substantial research awards to make not only their final publications but their final data sets available for use by other investigators. Likewise, the American Economic Review requires that “[a]uthors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication” (http://www.aeaweb.org/aer/data.php). A data sharing requirement would likely be viewed more positively by the research community if it were accompanied by funding of the underlying coding process commensurate with the time and expertise involved, and a reasonable period of exclusive use by the data sets’ creators.
Finally, as experience with coding grows, investigators would benefit from developing a consensus regarding coding protocols and contents. Reasonably uniform coding would make data sets more transparent and more readily amenable to machine-read analysis. The addition of elements such as ZIP codes or FIPS codes would facilitate linkage of coded legal data sets with locally collected health outcomes data. The range of topics and types of law addressed by empirical public health law research will require the development of taxonomies that are adequately flexible to support the expanding scope of new public health law coding initiatives.
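The linkage that FIPS codes would enable can be sketched in Python as follows. The example assumes a coded legal data set keyed by two-digit state FIPS code and an outcomes file carrying the same identifier; the variable names, coded values, outcome figures, and function names are invented for illustration. FIPS codes are treated as zero-padded text rather than numbers, because spreadsheet software commonly strips the leading zero (turning "01" into 1), which would otherwise cause silent mismatches.

```python
from typing import Dict, List, Tuple

# Coded legal data keyed by state FIPS code (coded values invented).
legal_data: Dict[str, Dict[str, int]] = {
    "01": {"texting_ban": 1},
    "02": {"texting_ban": 0},
}

# Outcome records as they might arrive from another source (figures invented);
# note the first record has lost its leading zero.
outcome_rows: List[Dict[str, object]] = [
    {"fips": 1, "year": 2010, "outcome_rate": 12.5},
    {"fips": "02", "year": 2010, "outcome_rate": 9.0},
    {"fips": "72", "year": 2010, "outcome_rate": 7.5},
]


def normalize_fips(code: object) -> str:
    """Render a state FIPS code as two-character, zero-padded text."""
    return str(code).strip().zfill(2)


def link_by_fips(
    legal: Dict[str, Dict[str, int]], outcomes: List[Dict[str, object]]
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Attach coded legal variables to each outcome row sharing a FIPS code.

    Returns the linked rows plus the FIPS codes that had no legal coding,
    so that gaps in coverage are reported rather than dropped.
    """
    linked, unmatched = [], []
    for row in outcomes:
        fips = normalize_fips(row["fips"])
        if fips in legal:
            linked.append({**row, "fips": fips, **legal[fips]})
        else:
            unmatched.append(fips)
    return linked, unmatched


linked_rows, unmatched_fips = link_by_fips(legal_data, outcome_rows)
```

The same pattern would extend to five-digit county FIPS codes by changing the padding width, again as an assumption about how a given project chooses to key its records.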
As resource constraints and public scrutiny force public health leaders to justify their allocation decisions, assessment of law’s impact on population health will demand the same level of methodological rigor as other aspects of contemporary policy analysis. Advancing the field of empirical public health law research requires increases in both the quantity and the accessibility of coded legal data sets.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The author received financial support for this research from the Public Health Law Research National Program Office, a program of the Robert Wood Johnson Foundation.
