Abstract
During the last twenty years, more than forty-five publications have sought to measure and evaluate the quality of plans using content analysis methods. We examine reasons for this growth in the literature and its contributions and limitations. We also examine whether the research methods described in these publications conform to recommended practices in the methodological literature on content analysis to determine whether plan quality researchers are likely to be generating reliable and reproducible plan quality data. We provide seven recommendations plan quality researchers can follow to address these weaknesses and improve the reliability and reproducibility of their data.
The importance of plans is a contested territory among planning scholars. On the one hand, plans are valued because they can encapsulate visions for the future, guide and regulate urban development, and serve as communicative signals about values and intentions that can influence a wide array of community conditions (Kaiser and Godschalk 1995; Hopkins 2001; Berke and Godschalk 2009). On the other hand, plans have been attacked on multiple fronts for failing to live up to their promise of being rational, comprehensive, and implementable (Altshuler 1967; Friedmann 1993; O’Toole 2007). Neuman has argued that in spite of the increasing and merited attention to planning processes, plans themselves remain central to the field and are worthy of analysis in their own right (Neuman 1998). We follow Neuman on these points and use them as a jumping off point for critically examining the evaluation of plans.
Since the early 1990s planning researchers have evaluated plans in order to develop and refine theories and expand practical knowledge related to plan quality (Baer 1997; Berke, Godschalk, and Kaiser 2006; Berke and Godschalk 2009; Stevens, Lyles, and Berke 2014). The literature generated by these scholars now includes more than forty-five peer-reviewed, empirical publications garnering more than twenty-five hundred citations (Stevens, Lyles, and Berke 2014, Google Scholar search, January 15, 2014). The growth in studies appears to be accelerating as nine of the studies were published in the 1990s, twenty-four were published in the 2000s, and fifteen had already been published between 2010 and December 31, 2012 (Stevens, Lyles, and Berke 2014). Considering that two decades have passed since publication of the first empirical plan quality studies and the apparent trend is towards even more studies, we feel that it is time to step back and consider why this area of inquiry has taken hold and continues to grow.
What has come to be referred to as the “plan quality” literature (cf. Berke and Godschalk 2009; Stevens, Lyles, and Berke 2014) consists of researchers using a systematic methodology to conduct comparative research and professional evaluation of plans after they have been developed (Baer 1997). It is only one of multiple approaches to plan evaluation, alongside weighing of planning alternatives during the plan-making process, critiquing plans as one would review a book or movie, post hoc evaluation of plan outcomes, and visually interpreting plan meanings (Baer 1997; Hopkins 2001; Ryan 2011). Plan quality studies have assessed plans addressing a wide variety of planning domains, including comprehensive planning, affordable housing, natural hazard mitigation, and transportation (cf. Edwards and Haines 2007; Hoch 2007; Berke et al. 1996; Evenson et al. 2012). The studies have evaluated local, state, and regional plans from Australia, Canada, New Zealand, the Netherlands, the United Kingdom, and the United States (cf. Berke, Dixon, and Ericksen 1997; Termorshuizen, Opdam, and Brink 2007; Hamin 2011; Preston, Westaway, and Yuen 2011).
Methodologically, plan quality studies employ content analysis procedures to determine whether certain plan characteristics or criteria set out in advance by the researcher(s) are present in the plans (Berke and Godschalk 2009). The post hoc analysis of plans can be linked to the broader literature on evaluating plan implementation (cf. Talen 1997; Laurian et al. 2004; Oliveria and Pinho 2009). Generally speaking, the theoretical and methodological approaches of plan quality evaluation naturally align with the concept of plan conformance, wherein researchers investigate whether the policies included in a plan are carried out as stated in the plan (Alexander and Faludi 1989; Talen 1997). Plan quality studies are less clearly linked to process-oriented implementation conceptions of plan performance evaluation, wherein researchers investigate whether the plan is used and influential in decision-making situations (Mastop and Faludi 1997).
A crucial issue for plan quality researchers is being clear about what is meant by the term “plan quality.” In a meta-analysis of sixteen plan quality studies published before 2008, Berke and Godschalk (2009) traced the evolution of criteria for plan assessment, which they observed depends on the purpose of the plan, following arguments made by Baer (1997) and Hopkins (2001). They further distinguished between internal plan quality tied to “the content and format of key components of the plan” and external plan quality tied to “relevance of the scope and coverage to reflect stakeholder values and the local situation to maximize use and influence of the plan” (Berke and Godschalk 2009, 229). Others have presented alternative conceptions of plan quality that differentiate between a plan’s analytical quality and its consistency (Norton 2008), focus on the communicative and persuasive characteristics (Bunnell and Jepson 2011), or distinguish between a plan’s direction-setting characteristics and its action-oriented characteristics (Berke et al. 2013). Our ambition is not to provide a comprehensive or new definition of plan quality. Like other plan evaluation researchers, we anticipate that what constitutes the quality of a plan will continue to be debated and contested, just as the purposes and impacts of plans are contested (Baer 1997; Berke, Godschalk, and Kaiser 2006; Norton 2008).
Nonetheless, it is important to distinguish between plan content analysis and plan quality evaluation. We define plan content analysis as a systematic process of measuring the characteristics of a plan using content analysis techniques. A successful plan content analysis depends on following methodological standards for content analysis to generate reliable and replicable data about the contents of plans. Plan quality evaluation, then, is the process by which plan content analysis data are linked to normative criteria of what constitutes a better plan. These normative criteria may vary based on the purposes for which a plan is created, the specific planning domain (e.g. transportation, land use, etc.), or the geographic scale or location of the planning entity. A successful plan quality evaluation will consist of a well-executed plan content analysis and provide strong theoretical arguments for the measures of plan characteristics used. Moreover, plan quality evaluation can be linked to plan outcome evaluation by validating that certain plan characteristics are linked to desired outcomes.
Building on this distinction between the normative aspect of defining quality and the methodological aspect of generating replicable and reliable data, we aim to critically review the plan quality literature by addressing three questions in this article. First, why has there been so much growth in plan quality evaluation studies over the last twenty years? Second, are plan quality evaluation studies adequately relevant to practice and theoretically informative to merit such growth? To answer these first two questions, we will identify what we see as the two main arguments explaining the growth of the plan quality literature and two main arguments for why the growth in studies is beneficial to planning. We will also examine the four arguments to identify critical limitations and blind spots in the plan quality literature that need to be addressed moving forward. The third question we seek to answer is whether the methods of plan content analysis are being applied such that the data used to measure plan quality are replicable and reliable. We will present data from a content analysis of forty-seven plan quality publications to assess the methodological characteristics of the studies. We conclude by offering recommendations for methodological improvements and future research directions that we believe will help plan quality researchers make an even stronger case that planning does in fact need plan quality evaluation.
Four Explanations for the Growth and Benefits of Plan Quality Research
Our analysis of the growth and benefits of the plan quality evaluation literature builds directly upon Berke and Godschalk’s 2009 meta-analysis of sixteen plan quality studies. A central argument for their work is that
if plans are to achieve their full potential, they should reflect the highest quality of thought and practice. Only systematic evaluation enables us to identify their specific strengths and weaknesses, to judge whether their overall quality is good, and to provide a basis for ensuring that they reach a desirable standard. (Berke and Godschalk 2009, 228)
Moreover, plan quality evaluation should be part of a learning process that engages “contemporary standards of good practice” (Berke and Godschalk 2009, 288). While Berke and Godschalk provide these reasons for evaluating plan quality, the timing and scope of their meta-analysis—sixteen of the twenty-one studies published between 1995 and 2007—limited their analysis. In this section, we extend their analysis by providing two explanations for why there has been so much growth in the number of plan quality studies and two explanations for why this growth in the literature is a worthwhile investment of time and resources by scholars. Additionally, we expand the scope of analysis to the forty-seven plan quality studies published by 2013. Table 1 summarizes our four explanations and the associated gaps in the literature that are discussed in the following section.
Table 1. Explanations for the Growth and Benefits of the Plan Quality Literature and Associated Limitations and Gaps.
Explanation for Growth 1: Content Analysis is Accessible
The content analysis methodology employed in the plan quality literature is straightforward to understand and use, making it accessible to a wide array of researchers. A common trait of the plan quality publications is the use of content analysis (Berke and Godschalk 2009), which is a broadly applied social science methodology aimed at generating quantitative data on the content of communications including newspaper articles, speeches, and video (Kassarjian 1977; Krippendorff 2004; Singleton and Straits 2005). Content analysis methods have been employed and refined for more than fifty years in fields such as communications, political science, and consumer behavior (Kassarjian 1977; Krippendorff 2004). In the plan quality literature, the plan document is usually treated as the unit of analysis. A common trait of the studies is that they focus on the words, charts, tables, maps, and other content in the paper or digital plan document (Baer 1997; Berke and Godschalk 2009). Planning scholars use coding items to assess the presence or absence of the specific content, akin to survey researchers using questions or prompts to elicit responses.
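The presence/absence coding described above can be illustrated with a minimal sketch. The item identifiers, the category grouping, and the example codes below are all hypothetical and are not drawn from any published protocol; the sketch only shows how a human coder's 0/1 judgments might be stored and summarized.

```python
from collections import defaultdict

# Hypothetical protocol: item id -> category the item is meant to measure.
# Neither the ids nor the grouping come from a published coding protocol.
PROTOCOL = {
    "G1": "goals",
    "G2": "goals",
    "F1": "fact_base",
    "P1": "policies",
    "P2": "policies",
}


def score_by_category(codes, protocol=PROTOCOL):
    """Return the share of items coded present (1) within each category."""
    missing = set(protocol) - set(codes)
    if missing:
        raise ValueError(f"uncoded items: {sorted(missing)}")
    present = defaultdict(int)
    total = defaultdict(int)
    for item, category in protocol.items():
        value = codes[item]
        if value not in (0, 1):
            raise ValueError(f"item {item} must be coded 0 or 1, got {value!r}")
        present[category] += value
        total[category] += 1
    return {category: present[category] / total[category] for category in total}


# One coder's judgments for a single (made-up) plan document.
example_codes = {"G1": 1, "G2": 0, "F1": 1, "P1": 1, "P2": 1}
scores = score_by_category(example_codes)
```

Keeping the scores separate by category, rather than collapsing them into one number, is a design choice made only for this illustration.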
Plan evaluation using content analysis methods can be attractive from a researcher’s standpoint for at least six major reasons: (1) a plan document can be treated as a static entity, in contrast to more fluid aspects of planning such as planning processes, communication, power, and interpersonal relationships; (2) plans are typically publicly available and are increasingly easy to find and download online; (3) the main skills required for extracting data from plans are basic familiarity with plans, reading comprehension, thoroughness, and attention to detail, none of which requires highly specialized coursework or professional expertise; (4) the protocol items used to extract data only need be understood by the individuals doing the coding, in contrast to the use of survey or interview instruments, which must be carefully vetted to ensure that all respondents will interpret and answer questions posed by the surveyor or interviewer in a consistent manner; (5) there are no obvious ethical or administrative concerns related to involvement of human subjects that limit the range of items that can be included in a coding protocol or necessitate Institutional Review Board authorization; and (6) no travel, specialized tools, or expensive software are required, which means nonlabor costs are minimal. In short, it is likely that part of the reason for the large number of plan quality studies is that the content analysis methods they rely upon are comparatively accessible.
Explanation for Growth 2: Consensus on Core Principles
Scholars have been able to build conceptual consensus around central principles of plan quality, thereby facilitating the accumulation of knowledge across time, geographies, and planning domains. As the plan quality literature has grown, scholars have increasingly agreed on the core first-order characteristics, or principles, of plan quality (Baer 1997; Berke and Godschalk 2009; Berke, Smith, and Lyles 2012). These principles include goals, fact bases, policies, public participation in plan creation, and plan provisions for implementation and monitoring. The principles are considered to be widely applicable across substantive planning domains (e.g. comprehensive planning, transportation planning, hazard mitigation planning, etc.) and across different scales (e.g. municipal, county, regional, and state) (Berke and Godschalk 2009). Some studies have combined these principles to provide an assessment of overall plan quality (e.g. Berke et al. 1996; Brody 2003a), although persuasive arguments have been made that doing so is problematic because it combines conceptually distinct dimensions of plans (Norton 2008). While pursuit of a unitary overall measure of plan quality may be quixotic, arguments have been made that plan quality principles are interdependent and an effective plan will need to be strong on each principle (Berke, Smith, and Lyles 2012). This convergence and refinement of ideas on what constitutes plan quality coupled with the repeated use of the principles as organizing dimensions in plan quality studies contributes an empirically tested theoretical basis on which to build and extend theories of what drives plan quality and what plan quality influences.
Explanation for Benefits of Growth 1: Plans Are Prominent in Practice
Plan quality evaluation is useful in baseline assessments of the content of plans, which fill a central position in the profession of planning as practiced. In spite of scholarly debates about the different forms plan documents can take and the relative merits or lack thereof of creating plans (cf. Altshuler 1967; Baer 1997; Neuman 1998; Hopkins 2001; Berke, Godschalk, and Kaiser 2006), in practice plans continue to be developed, adopted, and revised in large numbers in the United States. A few examples drive this point home in hard numbers. In response to strong incentives provided by the federal Disaster Mitigation Act of 2000, all fifty states and more than twenty-six thousand local jurisdictions (accounting for roughly 70 percent of the nation’s population) have adopted local hazard mitigation plans (Department of Homeland Security Office of the Inspector General 2012). A recent survey of state land use planning laws indicated that twenty-five of fifty states require some or all local jurisdictions to adopt local comprehensive plans (Institute for Business and Home Safety 2010). In Wisconsin, one of the twenty-five states mandating local comprehensive planning, more than 80 percent of local governments (>1,500 of 1,922 counties, cities, towns, and villages) had adopted or were developing a comprehensive plan twelve years after passage of the state’s Comprehensive Planning Law in 1999 (Wisconsin Department of Administration 2011). Meanwhile, in the face of inaction at the international, national, and state levels on reducing greenhouse gas emissions and preparing for the impacts of anthropogenic climate change, an increasing number of local jurisdictions are voluntarily opting to develop and adopt local climate change plans (Wheeler 2008; Boswell, Greve, and Seale 2013).
As a result of these and other plan creation efforts, uncalculated millions of dollars are being invested by federal, state, and local governments, not to mention untold hours of time contributed by both governmental and nongovernmental participants. Developing a baseline understanding of the contents of these plans and their relative strengths and weaknesses to better inform practice is often cited as a reason for undertaking a scholarly plan quality evaluation (cf. Berke and Manta-Conroy 2000; Berke et al. 2002; Wheeler 2008; Berke and Godschalk 2009; Kang, Peacock, and Husein 2010). This baseline understanding can be seen as a necessary condition for the learning process of understanding if and how plans are implemented and if and how they contribute to outcomes (Berke and Godschalk 2009).
Explanation for Benefits of Growth 2: Contributions to Theories of Planning
In addition to documenting and describing the state of practice, plan quality data have proven useful in building knowledge about the state of practice and refining theories related to planning policy, processes, and impacts. Plan content analysis data have been used to examine whether higher quality plans are associated with desirable planning outcomes (Dalton and Burby 1994; Burby and May 1997; Nelson and French 2002) and make claims about the effectiveness of federal and state mandates requiring planning by local governments (Berke and French 1994; Berke et al. 1996; Burby and May 1997; Godschalk et al. 1999; Hoch 2007; Kang, Peacock, and Husein 2010; Berke, Smith, and Lyles 2012). They have been used to examine the adoption of sprawl-reducing policies (Brody, Carrasco, and Highfield 2006), whether comprehensive plans support the principles of sustainable development (Berke and Manta-Conroy 2000; Manta-Conroy and Berke 2004), and emerging trends in planning practice such as climate change planning (Wheeler 2008; Preston, Westaway, and Yuen 2011; Baker et al. 2012; Stone, Vargo, and Habeeb 2012; Baynham and Stevens 2014). Additionally, they have been used to refine our understanding of plans themselves and the process of creating plans, such as the communicative attributes of plans (cf. Norton 2008; Bunnell and Jepson 2011) and the process-oriented factors that contribute to higher quality plans (cf. Berke et al. 1996; Brody, Carrasco, and Highfield 2006; Horney et al. 2012). This variety of uses of plan content analysis data illustrates how plan quality concepts are being used as dependent and independent variables in analysis aimed at refining and extending theories of how plans are and should be developed and how they are and should be implemented.
Revisiting the Four Explanations to Identify Gaps in the Plan Quality Literature
We revisit the four explanations in reverse order from the previous section, in order to work from the theoretically and practice-relevant gaps in the literature to the methodologically oriented gaps.
Contributions to Theory: Revisited
As noted, some plan quality studies have moved beyond descriptive analysis to engage in explanatory research. Most of these studies have focused on factors that influence the quality of plans, such as mandates to develop plans, state agency enforcement approaches, community characteristics, and some planning process characteristics (cf. Berke and French 1994; Deyle and Smith 1998; Brody 2003; Burby 2003a). The pool of studies linking plan quality to planning outcomes is quite limited (Burby and Dalton 1994; Dalton and Burby 1994; Nelson and French 2002; Brody and Highfield 2005; Laurian et al. 2010).
The lack of studies in this area represents a major gap in our knowledge regarding the empirical value of plans. While the small number of studies that have been conducted on the importance of plan quality for planning outcomes have generally found that better plans tend to be associated with better outcomes, this line of research needs considerably more attention and needs to be more closely linked to the plan implementation literature. In particular, there is a need for analyses of conformance of postadoption actions to plan prescriptions, analyses of plan performance in terms of plan use in decision-making situations, and analyses of plan influences on outcomes. These types of analyses would help planning scholars to better address questions from the literature regarding whether plans matter (Burby and Dalton 1994; Neuman 2003), whether planning works (Brody and Highfield 2005), and whether planning needs plans in order to achieve desired outcomes (Neuman 1998).
Evaluating the importance of plans also speaks directly to the value of plan quality research, in the sense that there is arguably no reason to evaluate the quality of plans if the plans themselves do not make any contribution to decision making and/or do not have any influence on the outcomes they were designed to address. While there is empirical evidence to suggest that plans do have at least some influence on outcomes, the value of plan quality research could be enhanced if future research were to identify the particular features of plans that make the strongest contribution to decision making, behavior modification, and governance. If this type of information were to be identified, it could be included in plan evaluation protocols that researchers could use to determine whether plans contain the features that are most critical for ensuring that they are implemented and that their goals are achieved.
The Prominence of Plans: Revisited
Even though plans are prominent in practice and plan quality data are assumed to be useful to practitioners, it is unknown whether published plan quality findings are influencing the practice of plan making by local and state governments or the review of plans mandated by higher levels of government. Leaving aside the issue of whether or not practitioners read peer-reviewed articles, there is the question of whether plan quality findings are presented in other formats that planning professionals have access to and can readily digest. One recent example of such an effort is “Beyond the Basics: Best Practices in Local Mitigation Planning,” a website developed by plan quality researchers at the Institute for the Environment at the University of North Carolina at Chapel Hill, in cooperation with officials from the Federal Emergency Management Agency (FEMA) who along with other mitigation experts served on an advisory board for developing the website (University of North Carolina at Chapel Hill, n.d.). Although we assume that most plan quality researchers aspire to positively influence practice and have sought to share their findings broadly, if and how such translation and dissemination occurs and whether it is effective in influencing practice (positively or negatively) are open questions.
A related concern is the lack of systematic examination of if and how plan evaluation is done in practice by federal, state, and local agencies that mandate planning or participate in planning. A federal-level example is that under the Disaster Mitigation Act, FEMA has developed and regularly updated planning guidance commonly known as the “Blue Book,” which includes a “Crosswalk” that consists of a checklist of items a plan must address to be approved and a Local Mitigation Plan Review Guide for use by federal and state officials (FEMA 2008, 2011). Local hazard mitigation plans are initially reviewed by state agencies before local governments submit them to FEMA regions for final review and approval (FEMA 2011; Smith, Lyles, and Berke 2013). Who conducts these reviews, whether they have planning training or experience, and whether the items are assessed consistently across individual reviewers, states, or FEMA regions is unstudied. A state-level example is that the Wisconsin Department of Administration indicates that it “does not certify received comprehensive plans as complying with the Comprehensive Planning Law” (Wisconsin Department of Administration 2011). Thus, Wisconsin does not evaluate the comprehensive plans it mandates (Ohm 2013), which raises concerns about whether adopted plans even meet state requirements, much less exhibit good quality. These examples point to a range of research questions about the basic procedures and effectiveness of current plan review practices employed by higher levels of government to evaluate plans.
Our third concern here is the relatively narrow focus in the literature on plans as the unit of analysis. While Berke, Godschalk and Kaiser (2006), Norton (2008), and Bunnell and Jepson (2011) make persuasive arguments about the role of plans as communicative tools, a blind spot in the planning evaluation literature has been the lack of attention to other planning documents. Arguably, zoning ordinances, subdivision regulations, staff reports, and permits fill equally—or perhaps more—important roles in planning processes. While the growing plan implementation literature has generated data related to some of the documents, only a few studies apply content analysis methods to systematically evaluate these other types of planning documents (cf. Talen and Knaap 2003; Laurian et al. 2004; Berke et al. 2006; Stevens and Hanschka 2014). Such data can be linked to plan content analysis data and plan outcome data (e.g., changes in land use patterns, carbon emissions, and natural hazard event damages) to generate powerful studies of plan implementation.
Consensus on Core Principles: Revisited
The emerging consensus on the core principles of plan quality does not address how those principles are operationalized in plan content analysis studies. When deciding on the set of items used to measure each principle, researchers may draw on their own subject matter expertise, the expertise of other scholars and practitioners, peer-reviewed research, previously used coding protocols (if relevant ones exist), and in some cases relevant federal or state requirements. Thus, there is no clear procedure for developing the specific plan quality coding items. Moreover, in spite of the growth in plan quality publications, many studies have applied plan content analysis methods to a planning domain not previously assessed in the plan quality literature or to geographic scales or regions not previously assessed. As a result of these two factors, there have been few opportunities to replicate items (or entire protocols), with most plan quality studies customizing protocols to new planning domains or geographies. The lack of replication, which may also be influenced by preferences among journal editors and reviewers for new research rather than replication studies, limits the ability of researchers to compare findings across studies and more effectively develop and refine theories related to plans and their impacts, a concern also raised by Berke and Godschalk (2009).
Another critique related to the convergence of plan quality principles is that the existing principles are largely focused on the rational comprehensive view of planning and prioritize criteria that conceive of a plan as a blueprint or agenda to be executed (Bunnell and Jepson 2011). Bunnell and Jepson make a persuasive argument for more evaluation of the persuasive and communicative characteristics of plans, more in line with conceiving of plans as visions (2011). They also argue that local compliance with prescriptive higher-level government mandates may lead to plans that score well on the blueprint-oriented criteria but largely fail as communicative tools. These arguments return us to the importance of the typologies of plans set out by Baer (1997) and Hopkins (2001). Berke and Godschalk (2009), too, acknowledge the need for plan quality evaluation to be cognizant of the purposes of a plan. A challenge before plan quality researchers is to refine plan quality principles to facilitate application across plan types. Norton (2008) offers a very valuable starting point with his distinction between the policy focus of a plan and its analytical quality, as well as his linkage of plan quality principles to the four criteria of communicative action (i.e., accuracy, comprehensibility, legitimacy, and sincerity).
Content Analysis Is Accessible: Revisited
While content analysis is a comparatively straightforward method, that characteristic is not a guarantee that it is always appropriately applied. Social science disciplines that have long histories of using content analysis have empirically evaluated whether researchers have employed its methods in a systematic and consistent manner. Kolbe and Burnett’s evaluation of the use of content analysis in the consumer behavior literature found numerous gaps in the application of content analysis, or at least in the authors’ reporting of the methods used in publications, and argued for widespread improvements in the application of content analysis (1991). In the communication field, Lombard and colleagues found that publications devote too little space to reporting measures of reliability of content analysis data, in part due to a lack of detailed and practical guidelines (Lombard, Snyder-Duch, and Bracken 2002).
Comparable evaluations of the use of content analysis methods to assess plans are limited to Berke and Godschalk (2009) and Stevens, Lyles, and Berke (2014), which evaluated issues related to assessment of data reliability. Both found that a large proportion of plan quality publications did not report assessing data reliability and among those that did, most employed a potentially problematic statistic. Post hoc assessment of reliability is but one of many methodological issues important for generating valid and replicable data, including developing instruments used in coding, sampling plans, and conducting the content analysis. Berke and Godschalk’s (2009) and Stevens, Lyles, and Berke’s (2014) findings raise concerns about whether plan quality researchers are meeting basic standards for generating reliable and replicable data with content analysis methods. Arguably, this is the most pressing issue facing plan quality researchers because without strong confidence that plan content analysis data are replicable and reliable, there is little ground for developing and refining theories or making recommendations for practice.
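To make the reliability concern concrete, the following minimal sketch contrasts two ways of summarizing agreement between two coders who scored the same items. The text above does not name the statistic it calls potentially problematic, so the pairing of simple percent agreement with a chance-corrected measure (Cohen's kappa) is our illustrative assumption, and the coder data are invented.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which the two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must score the same non-empty set of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal share, per code.
    expected = sum(
        counts_a[code] * counts_b[code] for code in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both coders use a single code
    return (observed - expected) / (1 - expected)


# Invented presence/absence codes for ten items from one plan.
coder_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
coder_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
```

In this invented case the coders agree on eight of ten items, yet the chance-corrected value is much lower because most items are coded present by both coders. Other chance-corrected statistics exist and may suit a given study better.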
Evaluating the Plan Content Analysis Methods
In line with questions just raised about whether plan quality researchers have appropriately applied content analysis methods, we have crafted a framework for evaluating the methods described in peer-reviewed plan quality studies. The framework consists of seven categories of methodological issues that the literatures on content analysis specifically and social science research methods generally indicate plan quality scholars should consider in their research and should explicitly address in their published manuscripts. Content analysis is a systematic and rule-based methodology that can be used to produce a dependable, replicable quantitative data set of attributes of recorded information (Putt and Springer 1989; Krippendorff 2004). The systematic and rule-based features of content analysis distinguish it from less structured techniques (Putt and Springer 1989). Three critical issues for generating content analysis data are objectivity, sampling, and reliability (Berelson 1971; Kassarjian 1977; Kolbe and Burnett 1991). Objectivity necessitates that data be generated in a scientific manner that is replicable and requires “that the categories of analysis be defined so precisely that different analysts may apply them to the same body of content and secure the same results” (Kassarjian 1977, 9). Our framework consists of five categories to assess objectivity related to (1) protocol design, usage, and availability, (2) scoring, (3) description of coders, (4) coding procedures, and (5) pretesting. It also includes categories to address (6) sampling and (7) assessing reliability. The rationales for why each of the seven categories is important for plan quality researchers to address are summarized in Table 2.
Plan Content Analysis Methods Categories.
Computer applications exist that can be programmed to automatically analyze text, including identifying the presence and frequency of selected words or phrases, as well as more involved analyses. However, we are unaware of any usage of this technology in the plan quality literature to date.
Research Design and Methods
We employed a cross-sectional design in order to provide a descriptive assessment of the methods of the plan quality literature to date. The criteria for inclusion of plan quality studies in our sample were that (1) the study was published in a peer-reviewed journal article, book chapter, or book; (2) plans were the unit of analysis evaluated, as opposed to programs, ordinances, or site designs; (3) the publication presented quantitative plan quality data; and (4) the publication date was before January 2013. We attempted to obtain and evaluate all plan quality studies that met these four criteria. To develop our sample, we used keyword searches of online journal databases and reviewed the citations of plan quality studies in the sample to identify additional studies. We identified multiple instances in which one plan quality data set was presented in multiple peer-reviewed publications and/or books. In those instances, we randomly selected one of the related publications to include in our sample and used the other studies in our pretesting procedures.
This sampling procedure identified forty-seven publications, forty-three of which ended up in our main sample and four of which were used in pretesting (Table 3). The major sources for the plan quality publications were the Journal of the American Planning Association (n = 9), the Journal of Planning Education and Research (n = 6), the Journal of Environmental Planning and Management (n = 4), and Landscape and Urban Planning (n = 4). All but three of the publications included at least one author whose affiliation was with a university planning department or research unit. Publication dates range from 1994 to 2012. A strong majority (88 percent) of the publications situated their research efforts in the plan quality evaluation literature by using one of the phrases “plan coding,” “plan quality,” or “plan evaluation.” Every publication used the plan quality data in descriptive analysis, roughly one-quarter (27.9 percent) used the data in multivariate regression models, and roughly half (51.2 percent) used the data in correlation analysis, t-tests, or other nonregression statistical analysis.
Plan Quality Studies.
AU = Australia, CA = Canada, EN = England, HO = Holland, NZ = New Zealand, UK = United Kingdom, US = United States.
D denotes descriptive analysis, C denotes correlation or other nonregression statistical analysis, and R denotes regression analysis.
We developed our protocol in line with the evaluation framework described above. The only prior evaluation of plan quality studies similar to the research presented in this paper was the Berke and Godschalk (2009) meta-analysis. Berke and Godschalk assessed just a few methodological characteristics of plan quality studies (e.g., intercoder agreement and research design). To extend the scope of methodological characteristics beyond the work of Berke and Godschalk, we drew on two additional sources. First, we used Krippendorff’s Content Analysis: An Introduction to Its Methodology to identify other critical areas of content analysis methodology that all researchers using content analysis methods should consider. Second, in order to ensure that we had considered potentially unique issues for applying content analysis methods to plans, we drew on our own sense of what were typical practices and leading-edge practices in the plan quality literature, based on our familiarity with many of the studies prior to beginning this research project. After developing the individual items, scoring definitions, and scoring examples in the protocol, we asked three scholars who had published key articles in the plan evaluation literature to review the protocol. They consented, and we sought to account for their helpful suggestions for improving the protocol.
The protocol consisted of forty-five discrete items that were grouped into our seven conceptual categories (see appendix). Binary coding was used for all items. Item descriptions were written so that coders recorded a 1 for the item if the answer to the prompt was yes and recorded a 0 if the answer to the prompt was no. For example, to measure whether the publication identifies the number of plans evaluated, the item language was “Indicates the number of plans under study.” Each item included in the protocol was accompanied by a scoring definition, as well as item-specific instructions, such as examples of what distinguishes a yes versus a no for the item.
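The binary scheme described above lends itself to simple tallies. The following Python is a minimal sketch for illustration only, not the procedure or software used in this study; the function name `item_frequency` and the list of scores are hypothetical, with the scores arranged so that they match the frequency reported in the appendix for the "Protocol source" item (24 of 43 publications).

```python
def item_frequency(scores):
    """Return (count, percent) of publications coded 1 for one binary item."""
    count = sum(scores)
    return count, round(100 * count / len(scores), 1)


# Hypothetical reconciled scores for one item across 43 publications:
# 1 = the answer to the item prompt is yes, 0 = the answer is no.
protocol_source = [1] * 24 + [0] * 19

print(item_frequency(protocol_source))  # (24, 55.8)
```

Because every item is scored 0 or 1, the sum of an item's scores is the number of publications meeting that item.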
The authors pretested the protocol and the coding procedures on the sample of four publications noted above. Two iterations of pretesting were conducted in which the authors (1) independently double-coded two publications, (2) identified items on which we disagreed, (3) calculated reliability scores using percentage agreement and Krippendorff’s alpha, and (4) reconciled scores for the items on which we disagreed. After both iterations, minor revisions were made to the protocol. Our reliability statistics for the pretests were 95 percent average agreement and an average Krippendorff alpha score of 0.854, which meet generally accepted standards (Miles and Huberman 1984, 1994; Krippendorff 2004).
For the main sample, the two authors independently coded all forty-three publications, identified the items on which we disagreed, and reconciled scores for those items. Coding reliability was assessed by calculating Krippendorff’s alpha and percentage agreement scores for each individual item. Krippendorff’s alpha scores range from a high of 1.00 to a low of 0.00 and average 0.608; percentage agreement scores range from a high of 100.0 percent to a low of 53.5 percent and average 91.5 percent (see appendix for individual item reliability scores). The content analysis data created through this process were used to generate descriptive statistics for each item, which are presented in the next section of the paper. Additionally, where applicable, we provide qualitative examples from the sample of publications that provide additional insights into the quantitative findings.
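To make the two reliability statistics concrete, the sketch below computes percentage agreement and Krippendorff's alpha for a single binary item scored by two coders. It is an illustration under stated assumptions, not the code used in this study: it handles only nominal data, exactly two coders, and no missing values, and the function names are hypothetical. The example scores are also hypothetical; they show one pattern (one coder scoring every publication 1, the other disagreeing once) that would be consistent with the appendix row for "Number of plans," where percentage agreement is 97.7 percent but alpha is 0.000.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders assigned the same score."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values.

    Returns None when expected disagreement is zero (alpha is undefined).
    """
    # Coincidence matrix: each unit contributes both ordered pairs of its values.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    # Marginal totals: how often each value appears across both coders.
    totals = Counter()
    for (c, _k), count in coincidences.items():
        totals[c] += count
    n = sum(totals.values())  # total number of scores (two per unit)

    observed = sum(count for (c, k), count in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        return None  # the scores show no variation, so alpha cannot be computed
    return 1 - (n - 1) * observed / expected


# Hypothetical scores for one item across 43 publications (an assumption).
coder_a = [1] * 43
coder_b = [1] * 42 + [0]

print(round(percent_agreement(coder_a, coder_b), 3))  # 0.977
print(krippendorff_alpha_nominal(coder_a, coder_b))   # 0.0
```

The contrast between the two printed values illustrates why percentage agreement alone can overstate reliability when nearly all units fall into one category: alpha corrects for the agreement expected by chance given the distribution of scores.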
Findings
The findings are organized by the seven categories of items that we assessed. Percentages for each individual item on the coding protocol are reported. For some categories, such as sampling, the percentages are consistently high. For other categories, such as pretesting, coders, coding procedures, and assessing reliability, the percentages are consistently low. The percentages for the items in the remaining categories vary widely.
In the first category, the attention paid to protocol design, usage, and availability was mixed in the sample. Just over half of the publications (56 percent) indicated the source of the individual items included in the protocol. Few publications (12 percent) indicated that the protocol included instructions or had a companion guide to aid coders in making coding decisions. More than three-quarters of the publications (77 percent) made their protocols available in the publication itself, online, or upon request from the authors. Interestingly, four publications noted the use of external experts, such as practitioners, to review draft versions of the coding protocol. In the second category, most publications in the sample addressed basic issues of scoring individual items and creating plan quality indexes from the individual item scores, but considerably fewer addressed more detailed and technical issues. For example, a large majority (74 percent) included specific language for what distinguishes between the scores assigned for items (e.g., code 1 if present, 0 if not present), but few (21 percent) provided specific examples of how the scoring scheme was applied. Likewise, nearly three-quarters (74 percent) described the process of how individual items were aggregated into index scores or a total score. Roughly half (51 percent) described a process of standardizing index scores so that different index scores can be compared. Meanwhile, just one in six publications (16 percent) addressed the issue of whether and how items were weighted in the process of aggregating individual items into plan quality index scores or total plan quality scores.
For the third category, descriptions of coders, the publications provided little information. While roughly half the publications (51 percent) indicated the total number of individual coders on the coding team, just eight (19 percent) indicated that the coders were trained in applying the protocol to plans and only one described the training process. The fourth category, coding procedures, also exhibited low scores. Just over a quarter of the sample (28 percent) indicated that all coders worked independently of each other in the initial coding of each plan. Further, nearly half the publications (49 percent) were unclear about how many coders read the plans, while one-third (33 percent) employed double coding for every plan in the sample. The remainder of the sample employed double coding for a subset of the plans or used a single coder: 5 percent used a single coder coding all plans once and 2 percent used a single coder double-coding all plans. Eight of the studies (19 percent) explicitly indicated that after independently coding the plans, the coders discussed their disagreements and developed a reconciled data set.
For the fifth category, few publications addressed pretesting the protocol and coding procedures. Less than one-third of the sample (30 percent) indicated that the protocol was pretested and just three publications (7 percent) indicated that pretesting was done on a plan (or plans) outside the sample under study. No publications indicated that intercoder reliability was assessed during the pretest process and less than one-fifth (19 percent) indicated that modifications were made to the protocol as a result of pretesting. Of the seven categories, the sixth, description of the sampling decisions, was addressed by the most publications in our sample. Almost every publication (95 percent) identified the study region and most (72 percent) justified why the study region is appropriate. Very high percentages (86 percent and 84 percent) described the sampling frame of plans and the process used to select plans from the sampling frame. Every study identified the number of plans assessed in the publication. The only point of relative weakness in this category is that just over half the publications (51 percent) explained how copies of the plans were identified as available.
For the seventh category, assessing reliability, the publications also provided limited information. Only fourteen publications (33 percent) in the sample assessed reliability. Of those, thirteen (30 percent) presented an average percentage agreement score for all items on the protocol and one publication presented a range of reliability scores using a measure of reliability other than percentage agreement. Nearly one-third of the sample (30 percent) identified a published standard for judging the acceptable reliability of items, typically Miles and Huberman (1984), which develops standards for percentage agreement scores. None of the publications indicated that the results of the reliability assessment led the researchers to exclude any of the items in their data set from analysis.
Discussion
If plan quality researchers aim to have their findings positively influence planning theory and practice in the ways we discussed earlier, they first need to be able to make strong claims that content analysis methods are rigorously applied and that plan quality data are reliable and replicable. Our analysis of plan quality evaluation studies exhibits some clear patterns of methodological strengths and weaknesses directly relevant to the generation and reporting of plan quality data. The most consistently strong category is descriptions of sampling decisions. Consistently weak categories include descriptions of coders, coding procedures, pretesting, and assessment of reliability. Meanwhile, for the other categories (protocol design, usage, and availability and scoring), the average scores for the items varied widely within the categories. A potential explanation for these patterns of scores is that, on the one hand, sampling is a basic research design consideration common to all analytical methods and the high scores may indicate baseline awareness among plan quality researchers of the need to describe the sample of plans. On the other hand, the pretesting, coder, coding procedures, and reliability assessment items we assessed are more specific to content analysis methods and plan quality researchers may not be aware of the methodological literature on content analysis. The remainder of this section interprets the findings for each of the categories in more detail.
For the protocol design, usage, and availability category of items, the average scores were higher for the availability items than the design and usage items. On the one hand, the fact that roughly three-quarters of researchers make their protocol available increases transparency about the data being collected. Also of importance is that higher levels of protocol availability should facilitate replication, which in turn can aid in the collective ability to refine concepts of plan quality and to test the influence of factors leading to higher plan quality and the influence of plan quality on other variables of interest. On the other hand, the moderate average score for identifying the source of protocol items may indicate fairly widespread failure by plan quality researchers to replicate items from existing plan quality protocols. If so, opportunities for theory building and hypothesis testing are being unnecessarily constrained. An alternative explanation may be that for a large portion of plan quality researchers, existing protocols are not relevant because the researchers are applying plan quality evaluation methods to a type of plan (e.g., sustainable development in the case of Berke and Manta-Conroy 2000 or climate change plans in the case of Wheeler 2008) or for a topic (e.g., the communicative quality of plans as in the case of Bunnell and Jepson 2011) that had not been previously evaluated. The studies that have consulted external experts to review coding protocols represent an important innovation in plan quality evaluation methodology. While plan quality researchers are often leading experts in their substantive areas, other researchers and practitioners who develop plans almost certainly can provide insights useful for selecting items for the protocol, and more importantly, for ensuring the subsequent findings are relevant to practice.
An example of how external review can be done is provided by Evenson and colleagues (2012), who note that they circulated their pedestrian plan coding protocol among members of the North Carolina Physical Activity Policy Research Center Advisory Group and professional planners to obtain comments on how to improve it.
The average scores for the scoring category items indicated widespread attention to standard and straightforward issues and limited attention to a more complicated, but critical issue. Providing score definitions and descriptions of how individual items are aggregated into indexes or total scores involve straightforward explanation of scoring procedures and are standard practices in the studies. Yet, very few studies address the issue of weighting when aggregating items into indexes. The most common process of creating a plan quality index in the literature consists of adding up the individual item scores, dividing by the total number of items to put index scores on a common scale, and sometimes multiplying the index scores by a constant factor (usually 10.0 or 100.0). Brody (2003a) provides one of the most thorough descriptions of this equal weighting process. Mathematically weighting each item identically, however, contains an implicit assumption that all of the items in an index are of equal importance. Understandably, researchers are hesitant to make value judgments by assigning varying weights to items in the absence of strong theoretical or empirical justification because of concerns about biasing results. As the plan quality literature continues to evolve, we anticipate seeing more arguments that items that are added together to create an index are of different importance and that approaches such as “multiattribute evaluation” (Edwards and Newman 1982) should be used to determine an item’s influence, or weight, relative to other items in the index. While we urge considerable caution in assigning numerical weights to items, the overwhelming lack of acknowledgement in the literature of the current implicit equal weighting constitutes an important oversight among authors.
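The aggregation process described above, and the equal-weighting assumption it carries, can be sketched in a few lines. The Python below is an illustration only and does not reproduce any particular study's procedure; it assumes items scored 0 or 1, and the function name `plan_quality_index`, the `scale` constant, and the example weights are hypothetical.

```python
def plan_quality_index(item_scores, scale=10.0, weights=None):
    """Aggregate binary item scores into a standardized index.

    With weights=None every item counts equally, which is the implicit
    assumption behind summing items and dividing by the number of items.
    """
    if weights is None:
        weights = [1.0] * len(item_scores)
    if len(weights) != len(item_scores):
        raise ValueError("one weight per item is required")
    weighted_sum = sum(w * s for w, s in zip(weights, item_scores))
    # Dividing by the total weight puts the index on a common 0-to-scale range.
    return scale * weighted_sum / sum(weights)


scores = [1, 0, 1, 1]  # hypothetical scores for four items in one plan

print(plan_quality_index(scores))                        # 7.5
print(plan_quality_index(scores, weights=[2, 1, 1, 1]))  # 8.0
```

The same plan receives a different index score once the first item is treated as twice as important, which is why the weighting assumption, even when it is equal weighting, merits explicit acknowledgment.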
The description of the coders category is consistently weak. There was widespread failure to describe whether and how the coders were trained. Under ideal circumstances, the instructions provided with a coding protocol would be sufficient for reliable coding; however, training of coders is often necessary to familiarize coders with the coding protocol and the types of plans they will be coding (Putt and Springer 1989; Krippendorff 2004). The only study in the entire sample to describe the process of training coders was Tang et al. (2011, 114), which not only provides details about the training steps but also notes that the process was based on principles of behavioral role modeling. Likewise, descriptions of coding procedures in the publications leave considerable room for improvement. Independent repeated coding is a fundamental principle of content analysis (Krippendorff 2004). Yet, fewer than a third of the studies explicitly indicated that coders worked independently and that all plans were double coded. Similarly, only a small portion of the publications indicated that disagreements between coders were reconciled to produce a final data set. Failure to use independent double coding raises serious questions about whether plan coding data represent the subjective judgment of a single individual, while failure to indicate whether the scores were reconciled when two or more coders were used means that it is unclear how the final scores used in analysis were determined.
Also consistently weak are descriptions of pretesting. While pretesting should occur iteratively in protocol design and in the design of procedures for its administration, as is the case in survey research (Dillman, Smyth, and Christian 2008), fewer than one-third of the studies even noted that pretesting took place. It is concerning that very low percentages of studies indicated that the pretesting took place on a sample of plans external to those used in the main study and that any changes were made to the protocol or procedures as a result of pretesting. Worse still, no studies reported assessing data reliability in the pretesting process. Thus, the studies provide little assurance that the protocols are being trialed in a way that demonstrates that they actually measure what the researchers intend to measure and that they do so reliably over repeated application to different samples of plans. Bassett and Shandas (2010) provide a leading example for describing their pretesting process, including identifying the specific plans outside the sample that were used, why those plans were a good test set, and explaining the major modifications to their protocol that resulted from the pretesting process.
Of all the categories assessed here, we found the most consistently strong average scores for sampling. Plan quality researchers have done a commendable job of clearly identifying the regions from which a sample is drawn, describing the sampling frame and procedures, and, for the most part, justifying why the regions are appropriate. Every study has identified the number of plans in the sample, but only around half explained how the plans were identified as available. As Internet publication of public documents, including plans, becomes more common, this issue may become irrelevant. Nonetheless, in some cases it may be helpful to other scholars interested in replicating research to understand how the sample of plans was actually obtained.
The assessment of reliability is consistently weak. The limited number of studies that assessed reliability and the overwhelming reliance on percentage agreement as a measure of reliability raise concerns about the reliability of the data being generated in plan quality studies. Stevens, Lyles, and Berke (2014) have addressed this issue in depth recently, identifying a number of specific recommendations for how to improve the reliability of plan quality data and the assessment and reporting of that reliability. Just two studies in our sample, Edwards and Haines (2007) and Horney et al. (2012), explicitly noted that plans were independently coded, that disagreements among coders were reconciled, and that reliability scores were compared to a published standard; both used average percentage agreement and cited Miles and Huberman (1984, 1994).
Our overarching recommendation for improving the application of content analysis methods to plans is for plan quality researchers to follow the standard social science and content analysis research practices we have identified. To restate, we recommend that plan quality researchers
seek to replicate existing items when relevant protocols and items have already been developed and tested and clearly identify the sources of all items, whether from previous studies, the authors’ own expertise, or some other source;
specifically describe the scoring scheme and its application and when items are aggregated explicitly address assumptions about item weighting;
clarify who coded the plans and what type of training they received;
employ independent double coding and describe the reconciliation process used to generate the final data set used in analysis;
employ pretesting of coding protocols and procedures and report the reliability of the data generated in the pretest process;
continue to identify the study region and sampling procedures, but make clearer how plans were obtained so that other researchers can more easily create a replication sample; and
assess the reliability of all coding items using Krippendorff’s alpha in lieu of, or in addition to, percentage agreement, and report the individual item reliability statistics, or at least a range for the items included in the publication.
If researchers follow these procedures, more confidence can be placed in the contributions of the plan quality literature to theory and practice.
Conclusions
Over the last twenty years, more than forty-five plan quality evaluation publications have employed content analysis methods to measure the quality of a wide variety of plans dealing with a range of critical planning issues including sustainable development, affordable housing, climate change, and natural hazards. We have argued that the growth of this area of the planning literature is largely due to content analysis methods being widely accessible to researchers and to the ability of plan quality researchers to build consensus around core principles of what constitutes higher plan quality. We have also argued that because of the prevalence of plans in practice and the utility of plan quality data in developing and refining a wide range of planning theories, researchers’ endeavors to evaluate plan quality have not been misplaced. Instead, we believe that in many ways the potential of plan quality evaluation to inform theory and practice has only begun to be tapped. In addition, we have identified a number of limitations and gaps in the plan quality literature that should be addressed, as shown in Table 1. In particular, our systematic examination of the application of content analysis methods described in the plan quality studies published before 2013 indicates that there is considerable room for improvement in the use (and reporting) of the procedures for generating reliable and replicable plan quality data. Specific areas of weakness in the plan quality publications are the descriptions of the coders involved in content analysis, the coding procedures, the use of pretesting procedures, and the assessment of the reliability of the data generated. Our analysis cannot distinguish whether the weaknesses we have identified are due to a failure to apply the content analysis methods rigorously in fact or if they are due to a failure of authors to adequately describe their rigorous application of content analysis methods. 
We expect that both types of failures are present in the plan quality literature. Either way, all those interested in plan quality research will benefit if authors report on their methods more thoroughly and consistently moving forward.
The core implications of our findings are that the development and refinement of theories related to the quality of plans are likely being constrained because of the limitations and gaps we have identified and because authors of plan quality studies are not sufficiently thorough and transparent in their methods. The lack of repeated testing and refinement of plan quality coding protocols may also inhibit the transformation of research coding protocols into a set of widely accepted and practice-friendly plan evaluation tools useful for local planners. Local planners evaluating their own plans and regional, state, and federal officials responsible for evaluating plans mandated for lower levels of government may have trouble selecting among the many coding instruments and the large number of items contained in the instruments. These implications relate back to the conclusions of our critical review of the plan quality literature. Namely, in spite of the growth in the number of studies, linkages between the peer-reviewed plan quality literature and the practice of plan making and plan evaluation are underdeveloped. Likewise, while there is general consensus on the principles of plan quality and some studies have used plan quality data in explanatory research, the principles are more applicable to some types of plans than others (e.g., mandated plans that function as blueprints rather than aspirational vision plans) and the relationships between plan quality and planning outcomes are still underinvestigated. In conclusion, we have identified a number of areas of future research for plan quality scholars, starting with better understanding how plans are evaluated in practice, better linking conceptions of plan purposes with plan quality principles, and examining the relationships between plan quality and desired planning outcomes.
Appendix
Coding Protocol.
| Item | Description | Frequency in Data Set | Reliability Scores: Krippendorff’s Alpha (Percentage Agreement) |
|---|---|---|---|
| Protocol design, usage, and availability | |||
| Protocol source | Indicates the source(s) of the items included in the protocol | 24 (55.8%) | 0.534 (76.7%) |
| Protocol instructions | Indicates that the protocol had instructions or a companion guide to aid coders | 5 (11.6%) | 0.539 (93.0%) |
| Protocol included | Includes the protocol or a list of all protocol items in the article (or appendix) | 30 (69.8%) | 0.498 (79.1%) |
| Protocol available | Indicates that the protocol is available upon request or online | 33 (76.7%) | 0.879 (97.7%) |
| Scoring | |||
| Scoring definitions | Provides specific language for what distinguishes between scores for items | 32 (74.4%) | 0.657 (86.0%) |
| Scoring examples | Provides specific examples of how scoring scheme will be applied | 9 (20.9%) | 0.666 (88.4%) |
| Aggregation | Describes the process by which items were aggregated into index scores or a total score | 32 (74.4%) | 0.833 (93.0%) |
| Standardization | Describes the process for standardizing the indexes generated from items so that indexes can be compared (e.g., standardizing index scores for each principle to a 0–10 score so that index scores can be compared across principles) | 22 (51.2%) | 0.768 (88.4%) |
| Item weighting | Addresses the issue of item weighting in aggregation process | 7 (16.3%) | 0.552 (88.4%) |
| Equal weighting | Indicates that all items were weighted equally in aggregation process | 6 (14.0%) | 0.617 (90.7%) |
| Nonequal weighting | Indicates that some items were more (or less) weighted in aggregation process | 1 (2.3%) | 0.000 (97.7%) |
| Descriptions of coders | |||
| Author coder | Notes that at least one of the paper authors was involved in coding the plans | 9 (20.9%) | 0.800 (93.0%) |
| Number of coders | Indicates the total number of coders on the team that coded the plans | 22 (51.2%) | 0.724 (86.0%) |
| Coder training | Indicates that coders were trained in applying the protocol | 8 (18.6%) | 0.848 (95.3%) |
| Coder training process | Describes the process by which coders were trained | 1 (2.3%) | 0.659 (97.7%) |
| Describing coding procedures | |||
| Independent coding | Indicates that all coders worked independently of each other in the initial coding of each plan | 12 (27.9%) | 0.772 (90.7%) |
| Reconciliation | Indicates that coders discussed disagreements in scores from independent coding and developed a reconciled data set | 8 (18.6%) | 0.517 (81.4%) |
| One coder, single code all plans | Indicates that each plan was single coded by the same coder | 2 (4.7%) | 0.482 (95.3%) |
| One coder, double code all plans | Indicates that each plan was double coded by the same coder | 1 (2.3%) | Undefined a (97.7%) |
| One coder, double code some plans | Indicates that some plans were double coded by the same coder | 0 (0.0%) | Undefined a (100%) |
| Two coders, double code all plans | Indicates that each plan was double coded by two different coders | 14 (32.6%) | 0.850 (93.0%) |
| Two coders, double code some plans | Indicates that some plans were double coded by two different coders | 5 (11.6%) | 0.455 (90.7%) |
| Coders unclear | It is unclear or unmentioned whether plans were single or double coded | 21 (48.8%) | 0.016 (53.5%) |
| Pretesting | |||
| Pretesting conducted | Indicates that a draft of the protocol was pretested | 13 (30.2%) | 0.886 (95.3%) |
| Pretest on plans outside the sample | Indicates that the pretest process was conducted on at least one plan outside the sample under study | 3 (7.0%) | 1.000 (100%) |
| Pretesting reliability assessed | Indicates that intercoder reliability was assessed during the pretest process | 0 (0.0%) | 0.000 (97.7%) |
| Protocol revised as part of pretesting process | Indicates that the protocol was revised as part of or as a result of pretest process | 8 (18.6%) | 0.848 (95.3%) |
| Descriptions of sampling decisions | |||
| Region identified | Identifies the study region | 41 (95.3%) | 0.482 (95.3%) |
| Region justified | Justifies why the study region is appropriate | 31 (72.1%) | 0.721 (88.4%) |
| Sampling frame | Describes the sampling frame or population of jurisdictions/plans from which the sample is selected | 37 (86.0%) | 0.441 (83.7%) |
| Sampling process | Describes the process used to select the sample from the sampling frame | 36 (83.7%) | 0.241 (76.7%) |
| Number of plans | Indicates the number of plans under study | 43 (100.0%) | 0.000 (97.7%) |
| Plans identified | Indicates how copies of the plans were identified as available | 22 (51.2%) | 0.484 (74.4%) |
| Assessing reliability | |||
| Average percentage agreement | An average percentage agreement score across all items is presented as a measure of intercoder reliability | 13 (30.2%) | 0.944 (97.7%) |
| Range of scores (percentage agreement) | The range of percentage agreement scores for all items is presented as a measure of intercoder reliability | 0 (0.0%) | Undefined^a (100%) |
| Item scores (percentage agreement) | The percentage agreement scores for all items are presented individually as measures of intercoder reliability | 0 (0.0%) | Undefined^a (100%) |
| Average reliability scores (not percentage agreement) | An average reliability score, other than percentage agreement, is presented as a measure of intercoder reliability | 0 (0.0%) | Undefined^a (100%) |
| Range of scores (not percentage agreement) | The range of reliability scores, other than percentage agreement, for all items is presented as a measure of intercoder reliability | 1 (2.3%) | 1.000 (100%) |
| Item scores (not percentage agreement) | The reliability scores, other than percentage agreement, for all items are presented individually as measures of intercoder reliability | 0 (0.0%) | Undefined^a (100%) |
| Uses published reliability standard | Identifies a published standard for judging the acceptable reliability for items | 13 (30.2%) | 0.944 (97.7%) |
| Item excluded | Indicates that at least one item was dropped from analysis because of an unacceptable reliability score or that items would have been dropped had they not met the reliability standard | 0 (0.0%) | Undefined^a (100%) |
a. The value of alpha is undefined when the following two conditions are met: (1) percentage agreement is equal to 100 percent (i.e., the coders agree on the item score for every publication in the sample), and (2) the item was found by each coder to be present in every publication, or the item was found by each coder to be absent from every publication. For more details, see Stevens, Lyles, and Berke (2014).
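The undefined case described in the note can be illustrated with a minimal sketch. It assumes that alpha refers to Krippendorff's alpha for nominal data, with two coders and no missing scores. The function name `nominal_alpha` and the sample scores are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Optional, Sequence


def nominal_alpha(coder_a: Sequence[int], coder_b: Sequence[int]) -> Optional[float]:
    """Krippendorff's alpha for two coders, nominal scores, no missing values.

    Returns None when alpha is undefined, i.e. when expected disagreement
    is zero because every score from both coders is identical.
    (Assumed formulation; the source only refers to "alpha".)
    """
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must score the same non-empty set of units")

    n_units = len(coder_a)
    n_values = 2 * n_units  # total number of scores across both coders

    # Observed disagreement: share of units on which the two coders differ.
    observed = sum(a != b for a, b in zip(coder_a, coder_b)) / n_units

    # Expected disagreement: chance that two scores drawn without replacement
    # from the pooled scores of both coders differ from each other.
    counts = Counter(coder_a) + Counter(coder_b)
    same_pairs = sum(c * c for c in counts.values())
    expected = (n_values**2 - same_pairs) / (n_values * (n_values - 1))

    if expected == 0:
        return None  # no variation in the scores, so alpha is undefined
    return 1 - observed / expected


# Item coded as present (1) in every publication by both coders.
all_present = [1] * 43
print(nominal_alpha(all_present, all_present))  # None (undefined)

# Perfect agreement, but the item is present in some publications only.
mixed = [1, 0] * 20 + [1, 0, 0]
print(nominal_alpha(mixed, mixed))  # 1.0
```

Under these assumptions, 100 percent agreement yields an alpha of 1.000 only when the item varies across publications. When both coders record the same score for every publication, expected disagreement is zero and the ratio cannot be computed. The same sketch also suggests how an alpha near 0.000 can accompany high percentage agreement: a single disagreement on an item that is otherwise absent from all 43 publications gives an alpha of approximately zero.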
Acknowledgements
The authors are grateful to three anonymous reviewers for their astute comments on earlier drafts of this article. We are also grateful to William Baer, Philip Berke, and David Godschalk for their comments and suggestions at the outset of this project.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
