Abstract
Peer review guides the intensive reworking of research reports, a key mechanism in the construction of social scientific knowledge and one that gives substantial creative agency to journal editors and reviewers. We conceptualize this process in terms of two types of challenges: evidentiary challenges that question a study’s methodology and interpretive challenges that question a study’s theoretical framing. A survey of authors recently published in Administrative Science Quarterly finds that their peer review experience was dominated by interpretive challenges: extensive criticisms, suggestions, and subsequent revision concerning conceptual and theoretical issues but limited attention to methodological and empirical aspects of the work. Salient differences between original submissions and published papers include intensive reworking of theory and discussion sections as well as growth and turnover in citations and hypotheses. We consider implications of the dominance of interpretive challenges in successful revision and possible sources of variation across scholarly fields.
From the standpoint of the thinker, the socialization of his thought is coincidental with its revision. The social and intellectual habits of the audience . . . condition the statement of the thinker and the fixation of beliefs evolving from that interplay.
Peer review is the core evaluative practice of the academic world. The judgments of fellow scholars determine which projects are funded, which books win prizes, which professors gain tenure, and—the subject of this article—which words appear on the printed page. Celebrated by Merton (1973:339) as “institutionalized vigilance” and by Chubin and Hackett (1990:5) as the stabilizing “flywheel of science,” peer review is a mainstay of modern science as an autonomous professional project. It is also a rapidly evolving behavioral complex that is under considerable pressure today, given exponential growth in scholarly production amid rising expectations that evaluations be unbiased, detailed, and constructive (Abbott 1999).
Peer review is of considerable theoretical interest as well as empirical significance. For the sociology of work and occupations, peer review is central to the academic community’s capacity to defend its turf and enforce internally generated standards of quality (Abbott 1988; Larson 1977). For the sociology of science, scholarly evaluation provides a site for documentation of particularism (Peters and Ceci 1982; Zuckerman and Merton 1971) and the struggle between authors and dissenters (Myers 1985). Rational choice analyses of self-interested competition can find rich material in a world where competitors judge each other’s work (Squazzoni 2010). And for cultural and economic sociology, peer review offers a remarkable opportunity to examine the social construction of value (Lamont 2012; Zuckerman 2012).
On the downside, research in this area faces daunting challenges. One source of difficulty is logistical; only journal editors have full information, and they are properly committed to maintaining the confidentiality of the communications they receive. Inquiries into the inner workings of peer review are also liable to generate controversy within the academic community. Important studies utilizing deception, like Mahoney (1977) and Peters and Ceci (1982), were difficult to publish, and their authors courted professional censure. And in a broader sense, research on scholarly evaluation—which itself will be assessed in the usual way—may be too close for comfort, readily dismissed as navel gazing and caught between the Scylla of accepting institutional mythology at face value and the Charybdis of serving up a feast of sour grapes. 1
This article analyzes the way journal submissions are reshaped in the course of their evaluation. This is sometimes termed the “developmental” function of peer review, a term that we place in quotes due to its strong but empirically unsubstantiated connotation of improvement. While attention generally centers on whether manuscripts are accepted or rejected (Bakanic, McPhail, and Simon 1987; Beyer, Chanove, and Fox 1995; Hargens 1988), peer review is more than a sorting process that renders final verdicts. It is also a mechanism for the intensive reworking of research reports. In an academic production system marked by intense competition for journal space, multiple revisions, and a culture of referee coaching, the review process can substantially restructure a paper’s appearance and message. Bedeian (2004) characterizes journal publication as a social negotiation between author, editor, and reviewers, while Frey (2003:213) suggests that “sometimes the papers published reflect more the referees’ than the authors’ ideas.”
We study a sample of papers appearing between 2005 and 2009 in Administrative Science Quarterly (ASQ), a flagship journal in organization studies. Drawing on an original survey, we summarize author self-reports concerning the comments they received and the revisions they made in the peer review process. We then examine changes in the research reports themselves, an analysis made possible because authors furnished us with copies of the manuscripts they originally submitted to ASQ, which we compare with the versions that the journal published. To our knowledge, this paper presents the first quantitative study of the way social scientific articles are modified in the course of peer review (for medical science, see Goodman, Berlin, and Fletcher [1994] and Hopewell et al. [2013]).
It is important to underline the limitations imposed by our research design. While a fully comprehensive analysis of peer review would examine a random sample of all submissions, this paper’s sample is restricted to ones that ultimately appeared in print. Author self-reports and textual comparisons provide rich evidence concerning the role of editors and reviewers in prompting manuscript change. Because we analyze a sample of recent publications, however, we are unable to describe the criteria that referees applied to submissions that were rejected (with or without revision) or withdrawn. The distinction is important because the evaluative issues that dominate revision presumably differ from the ones that dominate rejection—the former center on changes that referees regard as feasible, while the latter focus on defects they view as irremediable. 2 We return to this issue in the Discussion section, but it should be recalled throughout that the data presented here generalize to successfully negotiated peer review rather than peer review as a whole.
This paper makes two contributions. First, it investigates a key mechanism in the production of scientific knowledge (Camic, Gross, and Lamont 2011; see Gross [2009] for a congenial conceptualization of social mechanisms as learned, habitual practices). Revision for academic journals consumes a substantial amount of the time and energy of the scholars involved—some surveyed authors described a multiyear process, and all of the papers studied here underwent at least one round of revision. Changes made in the course of journal evaluation shape the scientific record, which is constituted by the claims and evidence that appear on the printed page. And although individual scholars have a keen sense of their own “publication journeys,” the confidentiality of peer review inhibits a larger perspective. Indeed, while much that surveyed authors told us resonated with our own professional experience, we were frankly surprised by the strength of the regularities that emerged in the aggregate.
In a broader theoretical sense, analysis of developmental review enlarges our understanding of social valuation. Work in this area generally examines arm’s-length relationships that treat the evaluated object as a static quantity—as when ratings agencies rank university departments or awards committees bestow recognition for accomplishment or funding panels choose proposals to support. Interaction between authors, editors, and reviewers, by contrast, involves negotiation over a dynamic entity whose features are determined at the conclusion rather than the outset. Our central interest is with the substance of this transformative process. In what ways is scholarly work routinely modified in the course of its evaluation? What features of research reports go unchanged?
Ellison (2002) offers a formal model of peer review that motivates the issues considered here. He begins with two striking trends:
I encourage readers of this paper to put it down for two minutes and thumb through a 30- or 40-year-old issue of an economics journal. This should convey better than I can in words how dramatically economics papers have changed over the last few decades. Papers today are much longer. They have longer introductions. They have more sections discussing extensions of the main results. They have more references. The publication process has also changed dramatically. Around 1960 most papers were accepted or rejected on the initial submission. In the early 1970s, successful authors would submit a paper, receive reports, make revisions, and get a final acceptance within about nine months. Today extensive revisions are the norm, and getting an acceptance takes 20–30 months at most top economics journals. (Pp. 994–95)
Ellison conceptualizes papers as varying on two dimensions: q, aspects of quality that are fixed and unchangeable, and r, aspects of quality that are modifiable. He considers the endogenous dynamics that arise as authors learn about peer review from the evaluation of their own papers and then apply the inferred standards to others. Ellison argues that reviewers are likely to place increasing stress on r through extensive manuscript revision, because they perceive their work as being held to a higher standard than it really is.
While he offers penetrating insights, Ellison’s analysis is entirely speculative. The historical correlation between evaluative stringency and article complexity could be driven by other factors, such as rising professionalism or tightening competition for journal space. And even if the review process is a key mechanism, as we suspect it is, we lack an empirically grounded understanding of the features of research reports that are typically modified through the negotiation between authors, editors, and reviewers. It is unclear a priori which elements of scholarship are fixed (q) and which are malleable (r).
This paper cuts the Gordian knot by inspecting the changes that occur in the course of peer review. We draw on author self-reports as well as the manuscripts themselves to consider shifts in theoretical argumentation, methods, findings, and other elements of research reports as well as key signposts, like hypotheses and citations. Our starting assumption is that all components of scholarly work are malleable if the parties involved treat them as such. Given the power/dependence differential between authors and referees—the former have much to lose and the latter little to gain—the key to this interaction is the way editors and reviewers approach their task and the intellectual interests and resources they bring to the table.
We follow the lead of Lamont and colleagues (Guetzkow, Lamont, and Mallard 2004; Lamont 2009; Mallard, Lamont, and Guetzkow 2009) in focusing on the normatively legitimate criteria that frame peer review. Even the most empowered readers (and journal editors and reviewers are surely empowered) cannot simply signal thumbs-up or thumbs-down. They are obliged to offer assessments that respect scholarly norms and can be viewed by others as compelling. We conceptualize the evaluative stances available to referees in terms of “evidentiary” and “interpretive” challenges, both of which are consistent with broad readings of the scientific project.
Peer review is often approached in terms of the evidentiary basis of knowledge claims, which underlies its popular, scholarly, and legal status as a “gold standard” that protects the public from findings that lack a solid foundation (Cicchetti 1991; Lock 1985). This perspective does not assert that all errors are detected—they manifestly are not—but it does imply close scrutiny of a project’s methodology and the plausibility of its results. Research that exhibits severe or unredeemable failings should be rejected, while flawed but promising studies may be revised toward acceptability. In a publishing regime where standards are high and few submissions accepted outright, an evidentiary perspective suggests that many manuscripts will undergo changes in measurement, analytic technique, and empirical scope. More precise measures should be substituted for cruder ones, more powerful or more appropriate techniques employed, and the author’s strongest claims shored up by additional experiments or analyses while speculative assertions are pruned from the manuscript.
We find, by contrast, that the revision processes studied here center on interpretive issues related to the article’s positioning within the academic conversation. Authors reported that key recommendations from editors and referees involve theoretical concepts and argumentation with much attention to the paper’s audience and that the key changes they made to manuscripts revolved around these matters as well. Inspection of differences between original submissions and published papers tells a compatible story marked by substantial expansion and turnover in theoretical discussion, citations, and hypotheses. In short, manuscript revision was primarily an exercise in reframing.
Evidentiary and Interpretive Challenges in Peer Review
Normative accounts of peer review take a strict view of the considerations that referees should apply in evaluating manuscripts. This is well stated by Mahoney (1977), who contends that the logic of scientific inquiry enjoins critical scrutiny of a research question’s importance and the strength of the project’s methodology but little else. He reasons that empirical findings that support theoretical expectations and those that violate them are both informative—indeed, the Popperian doctrine of falsification instructs us that disconfirmation is more telling than confirmation. Mahoney further argues that the author’s interpretation of results is of secondary significance since fellow researchers are able to draw their own conclusions from research findings fairly stated.
Similar perspectives appear in other discussions of peer review. Cicchetti (1991:120) identifies six criteria: (1) relevance and completeness of the literature review, (2) author’s originality or imaginativeness, (3) adequacy of research methodology, (4) data-analytic strategy, (5) importance of actual/expected findings, and (6) clarity and organization of the information the author presents. Categories (1) and (5) correspond to Mahoney’s notion of importance and (3) and (4) to methodological strength; the other two categories point to abstract aspects of quality. More detailed expositions expand on the same themes: Ramos-Alvarez et al. (2008) develop a seven-page checklist whose major headings include Research Motivation, Theoretical Considerations, Methods, Results, and Interpretation.
These normative treatments suggest two main paths that peer review might take. The first questions the paper’s empirical knowledge claim and works backward to scrutinize its methodological basis. We term this the “evidentiary challenge.” The second accepts the paper’s empirical knowledge claim and moves forward to examine its theoretical foundations and implications. This is the “interpretive challenge.” In the first case, peer review centers on methods; in the second, it centers on meaning. Manuscripts may of course receive challenges of both types, and the issues can be intertwined—for example, a methodological critique might form the basis for a rival theoretical interpretation. The distinction between evidentiary and interpretive challenges is nevertheless a robust one that can guide empirical investigation. 3
Evidentiary challenges have been theorized in a classic stream of constructivist sociology of science; we review several key statements to identify core ideas. Latour (1987; Latour and Woolgar 1979) argues that controversies lead “upstream” from a knowledge claim to the conditions of its production. For example, Latour (1987:22) contrasts the statements “New Soviet missiles are accurate within 100 metres” and “Advocates of the MX in the Pentagon cleverly leak information contending that new Soviet missiles are accurate within 100 metres.” He terms the clause “Advocates . . . contending that” a negative modality that qualifies and casts doubt on an otherwise factlike assertion by specifying its self-interested source.
In science, upstream battles are methodological in the sense that they center on the instruments, routines, and techniques that underlie knowledge claims. Authors point to the verdicts delivered by inscription devices, while dissenters probe for weaknesses. If the author wins the battle, the claim becomes black-boxed as part of the accepted body of scientific knowledge; if the dissenter wins, negative modalities are attached that undermine the claim’s factlike character.
Pinch (1985) develops a related analysis of scientific controversies as involving claims that stand at different levels of externality, contrasting statements like “solar neutrinos were observed” with lower-level assertions, such as “splodges on a graph were observed.” Claims about splodges are virtually incontrovertible but trivial, while statements about neutrinos are scientifically significant but assailable. Pinch argues that dissenters seek to undermine claims by recasting them at lower levels of externality; the broader the critique, the more components of the author’s research are brought into question and the more acrimonious the debate.
Finally, Myers (1985) studies peer review as an institutionalized negotiation where journal editors and reviewers play the role of dissenters. In a close analysis of two biology articles, he argues that authors make the broadest claim possible, while readers seek to narrow the claim and embed it within the literature. Myers stresses the multiple fronts on which the negotiation takes place—the degree of personal verve permitted in the exposition, the breadth of theoretical claims, the balance of attention to the author’s contribution versus that of others. But once again, methodological critiques appear fundamental. Claims are undermined when they cannot be decisively grounded and stray too far ahead of the field’s consensus.
An evidentiary challenge should be visible in the modifications that articles undergo in peer review. Authors can address methodological criticism (including constructive criticism) by documenting their techniques, developing new indicators, adjusting for threats to inference, conducting supplementary analyses, and so on. 4 Or authors can moderate their claims, inserting negative modalities that cast the study at a lower level of externality. Myers (1985) observed both sorts of changes in the articles he studied. Referees viewed the knowledge claims of both authors as excessively speculative. The biologists revised their work extensively in response, expanding the methods and results sections to provide fuller support for their claims while withdrawing their most theoretically provocative assertions.
An interpretive challenge is the converse of an evidentiary one. Rather than question an empirical result by inspecting its source, readers accept the paper’s findings and debate its meaning. These sorts of challenges may address the interpretations that precede the paper’s empirics—the motivating problem or “puzzle,” theoretical framework, and causal argument—as well as the interpretations that follow (implications for future research, proposals for conceptual integration). Such a challenge may take the form of a dismissive critique that questions the significance of the paper’s problem, argument, or findings. It may also take the form of a substantive critique that offers an alternative framing or argument; such a critique may be hostile or friendly.
Sociologists of science have paid rather little attention to interpretive challenges. For Latour (1987), movement from a knowledge claim to its implications adds positive modalities that strengthen the claim and end the battle. Pinch (1985) describes challenges at higher levels of externality as coherent but less damaging than lower-level challenges since they do not question the investigator’s competence. Myers (1985) does address interpretive issues in describing struggles over the manuscript’s positioning within the literature, but these are subordinated to battles over fact and method.
We expect interpretive challenges to play a key role in peer review and particularly in manuscript revision. First, referees are greatly interested (in the multiple senses of the word) in the meanings that attach to research findings. Science is grounded in interpretive logics that motivate empirical investigation, generate causal arguments, and identify implications from findings. Indeed, the most cited papers are generally conceptual essays that present new theoretical perspectives. The complexity of scientific theorizing means that a given finding is generally consistent with multiple readings, which fosters an active debate. 5
Second, editors and reviewers are better positioned to intervene authoritatively with respect to interpretive issues than with evidentiary ones. Theoretical arguments and implications are public matters fully accessible to peers, while technical internals are less available and in the limit fully known only to those conducting the study. As in sausage making, the cook knows the ingredients while diners hope for the best. Even referees with much relevant experience generally lack familiarity with the author’s particular research site and may have an imprecise conception of specific measurement and analytic operations, not to mention the many features of empirical research that are not explicitly described in the paper. Nor is this a result of obfuscation; it is not possible to detail the multitude of decisions that every scholar makes in the course of investigation. With respect to interpretive issues, by contrast, knowledgeable peers are well equipped to develop detailed critiques and to identify plausible paths towards revision. The conceptual insights of peer evaluators qua experienced scholars and representative audience members may appear superior to those of the author, while with regard to methodology and empirics, readers are working in the dark.
Like their evidentiary cousins, interpretive challenges should leave a visible trace on the text. Their signature effect is on conceptual reformulation through the framing that defines and motivates the topic, the causal mechanisms that provide an explanatory account, and the theoretical implications linking the reported study to the literature and field more generally. Qualification of theoretical claims is not in itself a strong signal since this may also follow from an effective methodological challenge. Where the logic of the argument or its implications is elaborated or modified, however, we have a clear indication of attention to interpretive matters. This inference is reinforced if there is little or no change to methods or results, which strengthens the presumption that conceptual reformulation is not driven by quarrels over evidence.
Methods
Administrative Science Quarterly (ASQ) is the senior journal in the interdisciplinary field of organization studies, established in 1956. ASQ is also a respected outlet with a high impact factor and widespread recognition as a leading journal. Its editorial practices are the product of a lengthy and successful history, and the articles that it accepts—and reshapes—have played an influential role in defining its field.
An outlet like ASQ offers a strategic research site because it is in social science that high rejection rates and multiple rounds of revision (Hargens 1988, 1990) promote extensive manuscript change. 6 Given the lack of prior work in this area, it is useful to go where the action is. Our analysis is cast in terms of general features of scientific inquiry rather than the particular culture of reviewing in organizational studies generally or ASQ in particular (the journal conducts double-blind review in ways broadly typical of fields we are familiar with—organization studies, sociology, and political science). We do not expect peer review to have a fixed structure across journals and time, however, and speculate briefly on possible sources of variation in the discussion section.
Like most social scientific journals, ASQ self-consciously advocates the partnership of theory and data. Its “Notice to Contributors” begins:
The ASQ logo reads, “Dedicated to advancing the understanding of administration through empirical investigation and theoretical analysis.” The editors interpret that statement to contain three components that affect editorial decisions. About any manuscript they ask: does this work to (1) advance understanding, (2) address administration, (3) have mutual relevance for empirical investigation and theoretical analysis? Theory is how we move to further research and improved practice. If manuscripts contain no theory, their value is suspect. Ungrounded theory, however, is no more helpful than atheoretical data.
We sent an online survey to the authors of articles appearing in ASQ between 2005 and 2009. All 78 individuals who had been first authors of articles published during that period were contacted, though scholars who had been first authors of more than one piece were queried only about their last publication. Fifty-two completed surveys were returned, a response rate of 67 percent. Thirty-eight authors additionally sent us a copy of their original submission to ASQ (73 percent of those who responded to the survey and 49 percent of all surveyed individuals). 7
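The response rates follow directly from the counts reported above. As a minimal sketch of the arithmetic (the variable and function names are ours, introduced only for illustration):

```python
# Counts reported in the text.
surveyed = 78      # first authors contacted
completed = 52     # completed surveys returned
manuscripts = 38   # respondents who also sent their original submission


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


survey_response_rate = pct(completed, surveyed)            # 52 of 78
manuscript_rate_respondents = pct(manuscripts, completed)  # 38 of 52
manuscript_rate_surveyed = pct(manuscripts, surveyed)      # 38 of 78
nonrespondents = surveyed - completed                      # authors who did not reply
```

Rounding to the nearest whole percent is our assumption about how the reported figures were derived.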
Comparison of background characteristics of survey respondents and nonrespondents showed no significant differences. Fifty-four percent of the respondents were assistant professors, 17 percent associate professors, and 29 percent full professors (versus 47 percent, 21 percent, and 32 percent of surveyed authors overall). Thirty-two percent of respondents were women (versus 29 percent overall), and 92 percent had appointments in business schools (versus 91 percent overall). On average, responding first authors had published 12 articles in other peer reviewed publications prior to the article examined here, while nonrespondents had published an average of 15. The 52 responding authors had published a total of 51 articles in ASQ, while the 26 nonrespondents had published 22.
A first battery of questions concerned the issues that arose in peer review. Authors were asked how extensive were criticisms and suggestions for revision in 12 topic areas, such as theoretical concepts, data collection methods, and implications for practice/practitioners (see Table 1 for the full list). Authors responded on a five-point scale, from none (1) to major proposed changes (5). A second battery of questions asked about the level of critical attention to the paper’s theory, methods, results, and discussion sections as well as the paper as a whole, again scored on a five-point scale.
Table 1. Topic-based Changes Proposed and Made in the Course of Peer Review at Administrative Science Quarterly, 52 Published Articles, 2005 to 2009.
Source: Survey of published authors.
Authors were also asked to assess the modifications they had made in revising their original submission, via the prompt “How extensive were the changes you and your co-authors made to the following aspects of the paper?” They responded in terms of the same paper components noted above: 12 topic issues, four sections, and the manuscript as a whole. The survey further included the open-ended question, “From your perspective, what was the most significant change in the paper that occurred through the review process?”
Content analysis of original submissions and published papers focused on the number of words in major sections, bibliographic references, the content of hypotheses, and the level of empirical support for hypotheses. Coding rules were developed in a two-stage process where both authors examined a sample of articles based on an initial scheme and resolved disagreements by respecification of coding rules. Once assessment of intercoder agreement (ranging between 82 percent and 98 percent) indicated that the scheme could be reliably utilized, the full set of articles was coded by one or both of us.
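The text reports intercoder agreement as percentages but does not name the statistic used. The sketch below assumes simple percent agreement, which is our reading rather than a confirmed detail of the coding procedure. The codes assigned by the two coders are invented for illustration.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must rate the same items")
    if not codes_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


# Hypothetical codes for ten hypotheses (not data from the study).
coder_1 = ["retained", "refined", "added", "dropped", "retained",
           "retained", "refined", "added", "added", "dropped"]
coder_2 = ["retained", "refined", "added", "dropped", "refined",
           "retained", "refined", "added", "added", "dropped"]

# The coders disagree on one of the ten items.
agreement = percent_agreement(coder_1, coder_2)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would generally yield lower values for the same codes.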
Each manuscript subsection was allocated to the theory, methods, results, or discussion section based on its primary content. Theory sections were those parts of the paper that centered on conceptualization, review of the literature, and argument; methods on case selection, measurement, and analytic techniques; results on empirical findings; and discussion on the paper’s contribution, limitations, and conclusions. Subsections were generally but not always contiguous. Once sections were identified, we counted the number of words using the built-in function in Microsoft Word. Footnotes were included in these counts but text appearing in tables and figures was not.
Changes in references and hypotheses were measured by identifying retained, dropped, and added items. This was straightforward for references (since each source item is unique and well defined), while change in the content of hypotheses involved shifts in wording that represent a range of meanings. Hypotheses appearing in the original submission and published paper were coded as retained when they were identical, differed in grammatical or purely cosmetic ways, or used terms judged to be synonymous. Hypotheses were coded as refined when they involved the same causal factors and outcomes but differed in scope conditions or the functional form of the posited relationship or through specification or generalization of the key terms. Hypotheses that appeared in the original submission were coded as dropped if they lacked the above connections to those in the published paper; analogous hypotheses appearing only in the published paper were coded as added. 8
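For references, where each item is unique and well defined, the retained/dropped/added classification reduces to set comparison. The following is a minimal sketch with invented reference lists; the function name and data are hypothetical. It does not extend to hypotheses, where the retained and refined categories rest on the coders' judgment about meaning.

```python
def compare_items(original, published):
    """Classify items as retained, dropped, or added between two versions."""
    original, published = set(original), set(published)
    return {
        "retained": original & published,  # in both versions
        "dropped": original - published,   # only in the original submission
        "added": published - original,     # only in the published paper
    }


# Hypothetical reference lists (not data from the study).
submitted_refs = {"Abbott 1988", "Merton 1973", "Myers 1985"}
published_refs = {"Abbott 1988", "Myers 1985", "Lamont 2009", "Latour 1987"}

changes = compare_items(submitted_refs, published_refs)
counts = {category: len(items) for category, items in changes.items()}
```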
Support for hypotheses was based on an examination of text and tables. All authors employed the same calculus for assessing the level of empirical support, with judgments based on statistical significance—invariably at the .05 level (see Leahey [2005] on the institutionalization of this practice). These assessments varied in complexity, since some hypotheses were linked to a single analysis of a single indicator while others involved multiple indicators and/or multiple analyses. Hypotheses were coded as supported when authors reported that all relevant indicators had statistically significant effects in the expected direction, partially supported when some but not all indicators had statistically significant effects in the expected direction, and not supported when no coefficients were statistically significant in the expected direction.
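The three-way coding rule just described can be stated compactly. In this sketch each indicator is represented as a pair (effect in expected direction, p-value). That representation, the function name, and the strict inequality at the .05 threshold are our assumptions for illustration.

```python
ALPHA = 0.05  # significance level that authors used throughout


def code_support(indicators):
    """Code a hypothesis from its indicators.

    Each indicator is a pair (in_expected_direction, p_value).
    """
    if not indicators:
        raise ValueError("a hypothesis needs at least one indicator")
    hits = [direction_ok and p < ALPHA for direction_ok, p in indicators]
    if all(hits):
        return "supported"
    if any(hits):
        return "partially supported"
    return "not supported"


# Hypothetical examples (not data from the study).
h1 = code_support([(True, 0.01), (True, 0.03)])    # every indicator qualifies
h2 = code_support([(True, 0.01), (True, 0.20)])    # one of two qualifies
h3 = code_support([(False, 0.01), (True, 0.30)])   # none qualifies
```

Note that a significant effect in the unexpected direction counts against support, as in the first indicator of the third example.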
Results
Author Self-reports
Table 1 summarizes attention patterns in peer review of papers submitted to and accepted by ASQ. Of the 12 topic areas that we asked about, authors said theoretical concepts received the most scrutiny from the journal’s reviewers and editors. 9 Sixteen percent of respondents described criticisms/suggestions in this area as “major” (the highest category) and 38 percent as “significant” (the second highest); none of the surveyed authors reported a total absence of criticisms and suggestions concerning theoretical concepts. The study’s argument and hypotheses received almost as much attention from reviewers and editors, on average, followed by the study’s motivation/significance, implications for theory, and alternative explanations. At least a third of the authors scored each of these items as falling into one of the two highest levels of scrutiny (major or significant proposed changes), while only a handful indicated an absence of proposed changes.
The study’s measurement, analytic methods, scope of empirical analysis, and interpretation of findings received moderate levels of attention in peer review. Authors who reported high levels of critical scrutiny in these areas were balanced by those who indicated that few such changes were demanded. In the case of measurement, for example, “major changes” were reported by 4 authors, “significant changes” by 8, “some changes” by 14, “minor changes” by 11, and “no proposed changes” by 12.
Scant attention was reported in three areas: choice of case, data collection methods, and implications for practice/practitioners. In each of these domains, at least half of the authors indicated that reviewers and editors failed to offer any criticisms and suggestions. Only one author indicated that major changes were proposed concerning data collection, and only one (not the same respondent) reported that the implications of the study for practice and practitioners had received the highest level of attention.
The revisions that authors described bear a close connection to the criticisms and suggestions they reported receiving, with a virtually identical rank ordering across papers and respondent-level correlations that range from .59 to .87. 10 Authors said that they extensively reworked the paper’s theoretical concepts, argument and hypotheses, motivation/significance, and implications for theory. A majority of authors scored each of the above as undergoing major or significant modifications, and none indicated that they failed to make changes in these areas. Topics at the intersection of theory and empirics (interpretation of results, alternative explanations) received the next highest level of modification in paper revisions, followed by methodological issues (measurement, analytic methods, scope of analysis). Vanishingly few authors reported that they had made changes where the paper’s data collection methods, choice of case, or implications for practice/practitioners were concerned.
Inattention to implications for practice/practitioners is of particular interest since this was the one interpretive element that was peripheral in peer review. It underscores the professional vantage point from which referees approached their task. Contributions to the evolving academic conversation and the project of developing a scientific analysis of organizations and organizational behavior were highly valued. Questions about the interests of organizations, managers, workers, and customers were viewed as secondary. Interpretive matters devoid of abstract theoretical significance—and around which reviewers lacked an organized perspective—were neglected or invoked ritualistically.
Table 2 provides a correlative view of peer review in terms of manuscript sections. Authors indicated that the theory section was the most criticized and the most modified component of the paper, followed by the discussion, then methods, and finally, results. Differences were once again substantial. Sixty percent of authors indicated that they had carried out major or significant changes to the theory section; none said they had left this part of the paper untouched. There was close attention to the discussion section as well, with some 40 percent reporting high levels of both criticism and modification. The methods section was less heavily revised (22 percent had made major or significant changes, 12 percent had made no changes). And most strikingly, while 22 percent had made major or significant changes to the results, 35 percent of authors reported that this section went unmodified in the course of peer review.
Table 2. Section-based Changes Proposed and Made in the Course of Peer Review at Administrative Science Quarterly, 52 Published Articles, 2005 to 2009.
Source: Survey of published authors.
The query “From your perspective, what was the most significant change in the paper that occurred through the review process?” gave authors an opportunity to describe the revisions they had made in their own words. Thirty authors (68 percent of all respondents to the question) talked about conceptual or theoretical issues, often stressing the paper’s relationship to the literature. Some examples follow:
The most significant change was to the theoretical frame of the paper. The reviews asked that I modify and broaden the organizational literatures that I was engaging, which led to major changes to both the front end and the discussion sections.
Our articulation of the theoretical contribution and the audience we were targeting was significantly changed.
We had to make major changes to the paper, to respond to reviewers’ and the editors’ request to better link it to the extant literature in organization theory.
Reframing and repositioning the paper in the literature and refocusing the theoretical model.
Some descriptions of the authors’ interpretive labors were wholly abstract; for example, one respondent noted that “the front and back were reworked to make the theoretical contribution of our paper clearer.” In other cases, the paper’s audience served as the focal element:
I offered some clarification to show why my findings applied more broadly to theories of political and organizational change. The broadened scope helped make the paper more relevant to a bigger audience.
We re-framed the paper for more of an organization theory audience rather than a strictly governance audience.
Finally, some authors modified the content of their argument. Two examples follow:
In the first round, we proposed a theoretical model where we brought together rational reputation theory, bounded rationality, and some institutional theory. The reviewers did not like this at all, and we had to reformulate the theory to be much more within the boundaries of mainstream organizational theory (i.e. more heavy on institutional theory).
The framing of the paper became much bolder. The editor <> encouraged us to make very explicit claims that we believed about resource creation and challenges to open system models, but which we had treated in a very subtle and nuanced way. . . . I want to be very clear that the review process produced what I consider to be massive improvements in the paper . . .
As these opposing experiences indicate, attention to interpretive issues could be a force for conservatism or creativity. Some authors found themselves confined within a Procrustean bed of hegemonic theoretical doctrine, while others were urged to challenge conventional lines of argument. 11 Overall, the most robust generalization is that authors were spurred to reconsider and rework their paper’s theoretical claims.
The language most often used to describe manuscript revision was frame or framing, with 11 authors employing these terms. This concept was introduced into the sociological lexicon by Erving Goffman (1974:21), whose “frame analysis” was concerned with “schemata of interpretation” that allow users to “locate, perceive, identify, and label a seemingly infinite number of concrete occurrences” and is developed extensively in social movement research to highlight the purposeful alignment of causes and potential adherents (Benford and Snow [2000]; see Frickel and Gross [2005] on intellectual movements). Framing is both a universal characteristic of meaning making and a strategic act.
Like movement activists, authors sought to link their work to the concerns of wider constituencies. Authors spoke of framing when they modified not the underlying logic of their analysis or the structure of their empirical investigation but the meanings that surrounded it: the accounts that situated research within the traditions and anticipated trajectory of organization theory, abstracted away from specific results to identify the deeper theoretical issues at stake, and elaborated the study’s implications for theory and research.
The focus on framing stood out most prominently when the paper’s technical core was untouched while motivation and argumentation were strategically recalibrated. One author made this explicit:
“Theory” should be interpreted largely as “Framing.” The remarkable thing about the paper is that the results and analyses essentially did not change at all from the first submission to the last. The editors and reviewers were concerned about “theoretical novelty” but the paper was a test of existing theoretical claims. It took some time to figure out the “right” framing to make them happy.
Another pithily described the key change as “complete restructuring of the theoretical argument leading up to the prediction, which remained unchanged.”
Authors were divided on the value of the interpretive rework they performed. Some described their paper as enriched by the insights of their evaluators:
We received fantastic suggestions to make our paper more highly read. One of the reviewers saw two large theoretical contributions that we overlooked that we subsequently incorporated.
Others found peer review a disempowering experience that diminished their product.
We also had to remove a framing that we thought would be the major contribution of the paper, which we did in order to satisfy the reviewers. Also in response to their insistence, we changed the title of the paper, and we think for the worse. Our paper has received far less attention than we think it deserves, we think, because the title no longer signals to potential readers the issues that we thought would capture their interest.
While methodological or empirical issues were not the focus of attention across the respondents as a whole, they were highlighted in a minority of cases. One author noted, “We were asked to provide an entirely separate empirical analysis that entailed additional data collection, analysis and discussion,” while others reported that they had validated key indicators, collected additional measures, or utilized new statistical techniques. But while 30 authors referred to issues of interpretation and theoretical framing in describing the most significant changes to their paper, only 5 pointed to data collection and measurement, 6 to new or modified data analyses, and 2 to alternative explanations. (Three authors noted stylistic modifications; totals sum to more than the number of respondents since some authors pointed to several major changes they had made while others did not answer the question.)
Content Analyses of Original Submissions versus Published Papers
Section Lengths
Figure 1 displays the number of words in the theory, methods, results, and discussion sections of the 38 papers initially submitted to ASQ. Manuscript text (which excludes the abstract, acknowledgements, appendices, bibliography, tables, and figures) was composed of 10,808 words on average, with the shortest submission about 6,500 words and the longest nearly 15,000. On average, theory sections comprised 39 percent of the text. Methods (21 percent) and results (23 percent) sections were roughly half as long, while discussion sections made up 15 percent.

Figure 1. Section sizes, original submissions of 38 papers published in Administrative Science Quarterly, 2005 to 2009.
While these averages well represent the section size profiles of the great majority of manuscripts, a few papers were differently organized. Qualitative papers (identified here as those lacking statistical data summaries) were composed of short theory, methods, and discussion sections and extended presentations of results. These are the handful of long papers (more than 10,000 words) at the extreme right of the graph. Unlike their quantitative brethren, qualitative submissions opened with concise theoretical overviews that described the topic’s conceptual significance, offered abbreviated discussions of method, developed an extended presentation of empirical findings based on observation of events or participant perceptions alongside concept formation, and closed with discussion sections that performed some of the functions of theory sections in quantitative studies. The structure of the two sets of papers stands out in sharp relief: the section size profile of qualitative papers was 16 percent theory, 8 percent methods, 64 percent results, and 11 percent discussion; for quantitative papers, the corresponding percentages were 43, 24, 15, and 16.
Figure 2 shows how section sizes changed in the course of peer review. Most manuscripts expanded, with published papers consisting of 12 percent more words on average than original submissions. Growth was broadly but not proportionately distributed across the four sections. The discussion expanded the most in both absolute and relative terms, from an average of 1,648 words in the original submissions to 2,258 in the published paper—an increase of 37 percent. Methods sections expanded almost as much in absolute but not in relative terms (565 words, 24 percent), while theory sections showed a modest increase (200 words, 4 percent). Only the results section shrank on average, and here the net change was minor (29 words, a decline of 1 percent in absolute terms).
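The growth figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the averages reported in the text; it assumes that the four sections account for all manuscript text, so that the section-level changes sum to the overall change, and the constant and function names are illustrative:

```python
# Averages reported in the text (words, across the 38 papers).
ORIGINAL_TOTAL = 10_808
DISCUSSION_ORIGINAL = 1_648
DISCUSSION_PUBLISHED = 2_258

# Average change per section; the discussion change is derived from the
# two discussion word counts above, the others are reported directly.
SECTION_CHANGE = {
    "theory": 200,
    "methods": 565,
    "results": -29,
    "discussion": DISCUSSION_PUBLISHED - DISCUSSION_ORIGINAL,
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100 * (after - before) / before


discussion_growth = percent_change(DISCUSSION_ORIGINAL, DISCUSSION_PUBLISHED)
net_change = sum(SECTION_CHANGE.values())
overall_growth = 100 * net_change / ORIGINAL_TOTAL

print(f"discussion growth: {discussion_growth:.1f} percent")
print(f"net change: {net_change} words")
print(f"overall growth: {overall_growth:.1f} percent")
```

Under that assumption, the section-level changes are consistent with the reported overall expansion of roughly 12 percent.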

Figure 2. Average change in section size in peer review, 38 papers published at Administrative Science Quarterly, 2005 to 2009.
These shifts altered the relative sizes of the four sections but did not substantially restructure the 38 manuscripts. As Figure 1 shows, most original submissions consisted of lengthy theory sections, methods and results sections that were each about half as long, and somewhat shorter discussions; the same can be said for the papers that ASQ published. The theory section was the longest single section in 28 of the 38 original submissions and in 27 of the 38 published papers. The discussion grew the most dramatically (from 15 to 19 percent of the manuscript) but remained in most papers the shortest of the four sections. Because it stood virtually still while all the other sections expanded, the share of text presenting empirical results diminished 12 percent from original submission to published paper.
Average changes in word counts were in some respects outweighed by variation across manuscripts. In the theory section, considerable volatility among individual papers is disguised by an overall balance between “growers” and “shrinkers.” The length of theory sections fell in almost half of the submissions (16 of 38), declining by more than 1,000 words in three cases and growing by more than that amount in nine. The results section was similarly divided: six papers cut more than 1,000 words, while three added more than 1,000. By contrast, methods and discussion sections expanded in the great majority (86 percent) of the papers under peer review.
Elaboration of the discussion section—the largest grower in both absolute and proportional terms—was in keeping with what authors told us about the centrality of framing. The central function of the discussion is to position the article’s analysis and findings within the scholarly literature in terms of connections to prior research and implications for future research. Our reading of these sections showed that these connections tended to be more extensively developed in published papers than in original submissions. A second source of growth was expanded consideration of alternative explanations and threats to inference—in some cases, this involved subsidiary data analysis, and in others, a purely verbal argument.
The pattern of change in results sections was similarly consistent with author self-reports. In many cases, these sections appeared virtually unchanged, and in none of the submissions were the results extensively modified. The main source of large-scale change was the addition or deletion of sets of analyses. One author eliminated one of the two studies that had initially been reported, while another doubled the number of tables and reported relationships by modeling a second dependent variable while leaving covariates untouched. Findings that previously had been reported as preliminary studies in the introduction were sometimes moved to the results (or vice versa).
In the case of theory sections, our reading suggested much change in content—indeed, we would describe this section as more heavily reworked, on average, than any other. Many authors reconceptualized the motivation underlying their study or grounded their argument in previously undisclosed theoretical perspectives. In some cases, the published paper was organized around a fundamentally different theoretical issue than the one that animated the original submission. These conceptual shifts often led to modified hypotheses, a topic we examine below.
Finally, a close reading of methods sections suggested yet another species of revision. The overall structure of these sections changed little, and we found few instances where text appearing in original submissions failed to appear in the published paper. They grew not through extensive revision but by the addition of discrete blocks of text that explained particular methodological choices. For example, one published paper provided a detailed illustration of the way a network indicator was constructed, while another incorporated text explaining the statistical logic that had led the investigators to employ their estimation strategy. While we lack access to referee reports, it seems evident that these textual additions had been prompted by specific queries from reviewers.
An important lesson here is that word counts provide a useful summary of some aspects of manuscript revision but should not be taken as a sure indicator of the amount or significance of textual change. The justificatory or illustrative material that was added to methods sections translated directly into an increase in the number of words, while the conceptual reframing that took place in theory sections did not—reworked sentences and paragraphs were not necessarily longer than the ones they replaced. Localized methodological insertions also had few implications for other paper components, while changes to framing or theoretical logic might induce a cascade of changes. Word counts thus provide a useful starting point but one that is best combined with a careful reading and/or more sophisticated textual analysis.
We should also note a characteristic shift in authorial tone that took place as manuscripts were revised (see Myers [1985] for a similar observation). Many original submissions read as personal narratives, in some cases by describing the prior research experiences that had led the authors to pose their research question and in other cases by overtly reflecting the author’s sensibilities and values. Published papers were less personal and more static. Prior findings were eliminated or placed with other empirical observations in the results section, providing a crescendo of corroborative evidence but losing contact with the historical sequence of events. Substantive arguments that had been intertwined with the author’s personal commitments were stated more abstractly as matters of fact or purely theoretical interest. This movement away from a personal narrative made for sharper distinctions between major sections; we found published papers easier to code than original submissions.
Citations
Figure 3 displays bibliographic change from original submission to published paper by indicating the average number of citations that were added, dropped, and retained. There is once again a clear trend toward expansion, and indeed, bibliographies grew more than word counts. On average, original submissions cited 75 items while published papers cited 96, an increase of 26 percent. Growth was particularly noticeable on the high end. For example, the number of manuscripts with more than 100 references grew from 6 among the original submissions to 18 among the published papers. Only two papers moved in the opposite direction, with fewer references upon publication than when they were first sent to ASQ.

Figure 3. References in original submissions and published versions, 38 Administrative Science Quarterly papers, 2005 to 2009.
Bibliographies were the site of considerable turnover as well as growth. Thirty percent of the citations appearing in original submissions failed to appear in the published paper, with the most extensively re-referenced paper eliminating a full 82 percent of its original citations. At the other extreme, one author retained every item in the paper’s initial bibliography while also adding new references. In nine cases, more than half of the work cited in the published paper had not appeared in the original submission.
The amount of bibliographic change occurring in manuscript revision was linked to the intensity and type of critical feedback that authors received from reviewers and editors. Papers that were more heavily criticized around interpretive issues underwent more bibliographic revision. A simple summary of change in citations (references dropped from the original submission plus references added to the published paper) was correlated with criticism of the paper’s theoretical concepts (r = .36), argument/hypotheses (r = .41), implications for theory (r = .31), and theory section (r = .59), with connections to author-reported changes showing a stronger version of the same pattern (correlations of .50, .57, .42, and .51). Relationships between bibliographic change and criticism/modification in other areas, like data collection, analytic methods, and research scope, were weaker and sometimes negative.
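The summary measure and its correlation with reported criticism can be sketched as follows. The data below are invented for illustration, and coding the five survey response categories from 0 (“no proposed changes”) to 4 (“major changes”) is an assumption; the paper does not describe its numeric coding here:

```python
from math import sqrt


def change_score(dropped, added):
    """References dropped from the submission plus references added."""
    return dropped + added


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Hypothetical papers: (references dropped, references added, criticism of
# theoretical concepts on an assumed 0-4 scale).
papers = [(5, 12, 1), (30, 41, 4), (10, 25, 2), (22, 18, 3), (2, 9, 0), (15, 20, 3)]

scores = [change_score(dropped, added) for dropped, added, _ in papers]
criticism = [level for _, _, level in papers]
print(f"r = {pearson_r(scores, criticism):.2f}")
```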
These correlations make good sense given the theoretical and metatheoretical work that references perform in social science. Bazerman (1988) notes that citations are primarily used to establish the significance of a research issue and to locate the author’s investigation within a theoretical perspective, an approach that leads foundational papers to be heavily referenced (Hargens 2000). Citation context analysis shows that social scientific articles are referenced in connection to broad interpretations of their conceptual approach and research problem rather than specific empirical findings (Cozzens 1985), one reason that most citations appear in the theory and discussion sections (Strang and Siler 2013). ASQ’s authors worked resourcefully within this tradition. When reviewers criticized a paper’s concepts and hypotheses, they returned to the literature to address different audiences, reposition their theoretical approach, and establish new lines of argument.
Hypotheses
Figure 4 summarizes the changes to hypotheses that occurred in the course of peer review. Twenty-seven of the original submissions offered formal hypotheses that stated the central knowledge claims examined in the paper’s empirical analysis. 12 Of the 135 hypotheses appearing in original submissions, 55 (40 percent) were retained without substantive amendment. Thirty (22 percent) were refined via shifts in scope conditions, the functional form of the relationship, or specification/generalization of independent or dependent variables. Fifty hypotheses (37 percent) were dropped from consideration, while 70 new claims were added in the course of peer review. Altogether, authors modified their formal hypotheses via a combination of subtle amendments to core ideas, deletion of some predictions, and the development of novel assertions.

Figure 4. Hypotheses in original submissions and published versions, 29 Administrative Science Quarterly papers, 2005 to 2009.
The degree of reformulation varied across individual papers. Twelve manuscripts stated the same hypotheses (net of cosmetic shifts in wording) in the published paper that had appeared in the original submission. In five cases, formal hypotheses were introduced in the course of peer review, while two moved in the other direction by replacing the claims offered in the original submission with more informal arguments voiced in the text. Seven authors provided incremental refinements by adding qualifiers or scope conditions to their existing hypotheses; three of these also introduced novel claims. At the other end of the spectrum, eight authors extensively restructured the claims that their paper investigated by both excising some original hypotheses and adding new ones.
Like citations, changes to hypotheses were related to the type of criticism the manuscript received in peer review. The number of retained and refined hypotheses was negatively correlated with criticism of the paper’s theoretical concepts, argument, and motivation; authors did little to alter what (referees thought) was not broken. By contrast, hypotheses were more substantially modified when papers faced a broad interpretive challenge. The number of hypotheses dropped from the original submission was correlated with criticism of theoretical concepts (r = .32), argumentation (r = .34), and results (r = .35); the number of added hypotheses was correlated with criticism of argumentation (r = .34), choice of case (r = .34), analytic scope (r = .52), methods (r = .36), and results (r = .35).
Support for Hypotheses
The great majority of hypotheses were empirically supported in both sets of manuscripts. Of the 135 hypotheses appearing in original submissions, 100 were supported, 14 partially supported, and 21 not supported (74 percent, 10 percent, and 16 percent, respectively). In published papers, 119 hypotheses were supported, 18 partially supported, and 18 not supported (76 percent, 12 percent, and 12 percent).
Hypotheses were more likely to be dropped from the original submission if they did not receive support. Fifty-seven percent of such claims (13 of 21 unsupported hypotheses, 7 of 14 partially supported hypotheses) were excised in the course of peer review, while only 30 percent of supported hypotheses suffered the same fate. The tendency to weed out hypotheses that did not jibe with the paper’s empirical results was counterbalanced by the somewhat weaker track record of newly minted hypotheses, 69 percent of which were supported (versus 83 percent of those carried over from the original submission).
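The drop rates quoted above follow from the reported counts. In this sketch, the number of supported hypotheses that were dropped is not stated directly in the text; it is derived by assuming that the 50 dropped hypotheses reported in the Hypotheses section are the same 50 that are split here by level of support:

```python
# Hypotheses in original submissions, by level of support (from the text).
ORIGINAL = {"supported": 100, "partially supported": 14, "not supported": 21}

# Dropped hypotheses reported in the text for the two weaker categories.
DROPPED = {"partially supported": 7, "not supported": 13}

# Assumption: all 50 dropped hypotheses fall into these three categories,
# so the remainder must have been supported in the original submission.
TOTAL_DROPPED = 50
DROPPED["supported"] = TOTAL_DROPPED - sum(DROPPED.values())

weak_dropped = DROPPED["partially supported"] + DROPPED["not supported"]
weak_total = ORIGINAL["partially supported"] + ORIGINAL["not supported"]

weak_drop_rate = 100 * weak_dropped / weak_total
supported_drop_rate = 100 * DROPPED["supported"] / ORIGINAL["supported"]

print(f"not fully supported: {weak_dropped} of {weak_total} dropped "
      f"({weak_drop_rate:.0f} percent)")
print(f"supported: {DROPPED['supported']} of {ORIGINAL['supported']} dropped "
      f"({supported_drop_rate:.0f} percent)")
```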
High levels of support for malleable claims reflect a metatheoretical strategy where hypotheses summarize and carry the paper’s argument. Since hypotheses coevolved with the broader interpretive framework of the study, they possessed great fluidity as well as the tendency to be proven correct. Elaboration of hypotheses consistent with the case led to a more detailed theoretical account, while arguments that did not jibe with empirical results appeared to play no useful function. (A few papers did offer competing hypotheses drawn from rival theoretical perspectives, though even this research strategy did not ensure the stability of claims from original submission to published paper.)
The theoretical reformulation that occurred in peer review runs counter to received principles of scientific method, which presume that hypotheses inform the initial design of empirical research and are in any case not subject to a regime of continuous improvement. In the papers we studied, it seems clear that hypotheses serve to signal the emergent story inspired by both theory and data; it is difficult to believe that causal claims devised prior to data collection could be correct three fourths of the time. While we do not observe the process that generated the hypotheses and analyses originally submitted to ASQ, we imagine it to parallel the one that is visible in the course of peer review. What changes after journal submission, we suspect, is not the goal of a seamless connection between theoretical reasoning and empirical results but the addition of reviewers and editors to the search party. 13
Where hypotheses appearing in the original submission were retained or refined, we can investigate whether peer review prompted substantial shifts in reported findings. It did not. Of 85 such hypotheses, only 5 received a qualitatively different level of support in the published paper versus the original submission. Three hypotheses that were partially supported in original submissions were supported in published papers, while two moved in the opposite direction (from support to partial support). All other hypotheses lay on the main diagonal, with 68 receiving support in both manuscripts, 4 receiving partial support, and 8 receiving no support. Inspection of tables showed that when hypotheses were retained without significant modification, empirical analyses changed little from original submission to published paper.
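The counts in this paragraph form a small transition table between levels of support in the two versions. The sketch below tabulates them and, as a consistency check, combines them with totals reported elsewhere in the article (119 supported hypotheses in published papers, 70 added hypotheses). The variable names are illustrative, and the code only restates reported figures:

```python
# (support in original submission, support in published paper) -> count,
# for the 85 hypotheses that were retained or refined.
TRANSITIONS = {
    ("supported", "supported"): 68,
    ("partial", "partial"): 4,
    ("none", "none"): 8,
    ("partial", "supported"): 3,
    ("supported", "partial"): 2,
}

carried_over = sum(TRANSITIONS.values())
changed = sum(n for (before, after), n in TRANSITIONS.items() if before != after)
carried_supported = sum(
    n for (_, after), n in TRANSITIONS.items() if after == "supported"
)

# Totals reported elsewhere in the article.
PUBLISHED_SUPPORTED = 119
ADDED_HYPOTHESES = 70
new_supported = PUBLISHED_SUPPORTED - carried_supported

print(f"carried over: {carried_over}, changed level: {changed}")
print(f"carried over and supported: {carried_supported} "
      f"({100 * carried_supported / carried_over:.1f} percent)")
print(f"new and supported: {new_supported} of {ADDED_HYPOTHESES} "
      f"({100 * new_supported / ADDED_HYPOTHESES:.1f} percent)")
```

These tallies reproduce the 85 carried-over hypotheses and the 5 that changed level, and they yield support rates close to those reported for carried-over and newly added hypotheses.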
Empirical results did change in papers that both added and dropped hypotheses because new questions were answered. This sort of revision generally involved the development of novel dependent and/or independent variables as well as fundamental shifts in theoretical argumentation and, in some cases, a shift in levels of analysis. We cannot here compare the support that hypotheses received, since dropped hypotheses lacked a “child” and added hypotheses a “parent.” But inspection of text and tables indicated that papers that offered markedly different hypotheses often conducted novel analyses as well. Seven papers exhibited this sort of combined theoretical and empirical reformulation, which roughly parallels the number of authors who identified methodological or empirical matters in describing the most significant changes they had made in revising their work.
Discussion
Summary of Findings
Author self-reports paint a clear picture of the impact of peer review on articles reviewed and ultimately published by ASQ. The overwhelming focus was on the interpretive elements of the text that linked the study and its findings to the larger scholarly conversation. Authors reported that the main criticisms and suggestions they received from referees involved the study’s concepts, argument, and motivation and that the changes they made were focused on the same topics. The activity that best described the revision process, for many authors, was “(re)framing.”
Comparisons between original submissions and published papers demonstrate the centrality of interpretive challenges as well. There was turnover and growth in the discussion section and in the paper’s bibliography as authors rethought their audience and forged new conceptual linkages. Theory sections were intensively reworked though they did not expand appreciably in size. Perhaps most strikingly, hypotheses were substantially altered as part of a larger process of theoretical reframing. Many authors not only rethought the arguments that motivated their hypotheses but proposed new formal claims as well. In some papers, substantial attention was directed to the evidentiary basis of the author’s project, prompting modification of measures and data analysis, but in most cases, these concerns were secondary and in many virtually absent.
Accepted versus Rejected Submissions
The sampling frame of this study was shaped by our interest in the concrete ways that scholarly work is altered by its evaluation. It is after papers are published that the cycle of formulation, response, and reformulation ceases and we can speak unambiguously of referee interventions that reshape the scientific record. 14 From the perspective of a broader concern with the practice of peer review, however, the papers that are ultimately accepted tell just part of the story. 15
The finding that revision of ultimately published papers is dominated by interpretive concerns does not mean that evidentiary challenges seldom arise in peer review. Editors and reviewers may uphold a demanding “technical bar” that leads many papers to be rejected without opportunity for revision because their methodology and empirics are viewed as unsalvageable. In addition, authors may be afforded the opportunity to address evidentiary challenges but rarely do so successfully, either because they withdraw their paper from consideration or because their efforts at revision fail to pass muster with referees. Neither a technical bar nor a systematic pattern of “revise and reject” would be detected in a study like the one reported here.
The literature offers some support for these ideas. Kerr et al.’s (1977) survey found that reviewers looked askance at small-N studies, statistically nonsignificant findings, experiments that lacked controls, and inappropriate parametric tests. Beyer et al. (1995) showed that helpful reviewer comments about theory increased probabilities of acceptance, while helpful reviewer comments about findings had no impact, which suggests that authors fare better when the review process centers on interpretive issues. In medical science, Goodman et al. (1994) found postreview improvement in three interpretive features (discussion of limitations, generalization, and conclusions) but only one methodological component (reporting of standard errors).
The role of interpretive criteria in acceptance/rejection decisions should not be discounted, however. Journals like ASQ insist on a theoretical bar as well as a technical one. Beyer et al. (1995) found that submissions that claimed theoretical novelty were more likely to be accepted and that “the most important predictor of reviewers’ recommendations was how they rated manuscripts’ significance to the field, which was partially defined by originality” (p. 1253). Mone and McKinley (1993) point to the great value placed on a unique theoretical argument in organizational studies, and Locke and Golden-Biddle (1997) detail the way theoretical contributions are persuasively framed. Many studies demonstrate confirmation bias, where reviewers favor studies that are consistent with their causal priors. 16
Data on the full range of outcomes (especially rejections without revision, which make up almost 90 percent of all cases) is thus needed for a comprehensive assessment of peer review. It is possible that evidentiary issues dominate up-or-down verdicts while interpretive concerns are central to manuscript revision. But it is also possible that both acceptance/rejection and revision are primarily grounded in interpretive criteria. The difference is significant. The first scenario indicates that referees are deeply concerned with evidentiary matters but do not see methodological faults as remediable; the second suggests an evaluative regime that focuses on a paper’s theoretical contribution to the virtual exclusion of empirical findings.
Variation in Peer Review Practices
A second limitation of the study reported here is its restriction to a single journal and historical period. Questions immediately arise about manuscript revision in other outlets, other fields, and other periods. Better: we hope they do!
We do not imagine that peer review has a universal form that is fixed across time and place. Any system where both judges and the judged are drawn from the same population facilitates the endogenous development of community-specific norms. Scholars learn about peer review by observing how their own work is evaluated and by serving as referees; all participants have repeated exposure to the issues that arise and the way they are resolved. Norms about appropriate practice thus emerge and are reinforced through repeated interaction. These include understandings of the issues that referees can legitimately raise as well as the level of responsiveness expected of authors.
Endogenously generated cultures of practice imply pressure toward homogeneity among journals that draw from the same author-reviewer pools. We anticipate that the review process at ASQ will be similar to that of other journals in the field of organization/management studies and will exhibit substantial continuity with overlapping disciplines, like sociology. While journals sometimes seek to intervene (often to render the process more collegial or constructive), scholars do not readily discard habits and expectations built up over the course of their careers. Heterogeneity can flourish, by contrast, in disjoint fields whose journals share few authors and reviewers.
Scholarly evaluation is a product not only of endogenous dynamics but of the academic community’s intellectual predispositions and commitments. Hambrick (2007) describes management research as marked by a “theory fetish” grounded in status anxiety, which leads to the devaluation of empirical findings not accompanied by a recognizable theoretical contribution. If so, ASQ and related journals may feature a distinctively high level of concern over interpretive issues—though we would be surprised if traces of the processes observed here did not appear in other fields. Ranging further afield, R. Collins (1981) contends that the defining characteristic of rapid discovery sciences, like physics, is the possession of research technologies that can be readily manipulated to generate novel results. Such technologies might lead to a greater weight placed on evidentiary challenges, since reviewers would be better positioned to critique the author’s methods or propose alternatives. More generally, H. Collins and Pinch (1998) identify struggles over empirical findings—not framing—as central to controversies in physics and related fields. Comparative analysis of peer review offers a strategic site to interrogate how different scholarly communities “do” science.
Implications for Theory, Practice, and Practitioners
As Espeland and Sauder (2007) argue, evaluative practices reshape the individuals and behaviors being measured, a process they describe under the rubric of “reactivity.” Social actors reflexively monitor and adjust to the conditions they face, most often by aligning their conduct with the measurement scheme (though they may also contest its dictates or withdraw). It is thus conventional wisdom in business that what gets measured gets done. Teachers teach to the test. Research scholars can be assumed to be similarly responsive.
The journal publication process is of interest because it involves two evaluative mechanisms and associated bases of reactivity. Authors learn about disciplinary imperatives from the acceptance or rejection of their manuscripts, and they are strongly motivated to utilize these insights in recalibrating their future efforts. But peer review does not stop at the point of issuing up-or-down verdicts. Reviewers and editors chart a process of revision that reshapes research communications by guiding the author toward a reformulation of his or her work.
Like other reflexive actors, scholars become adept at performing aspects of their work that are clearly and explicitly evaluated. We suspect the interpretive challenges that dominate manuscript revision have a powerful effect in leading authors to hone their argumentative and framing skills. Rejection letters may be hard to parse, with multiple reviewers offering various and sometimes contradictory criticisms. In successful “revise and resubmits,” by contrast, authors have a clear understanding of the challenge they faced because they are intimately familiar with the means by which they surmounted it. And since peer review is an endogenous system where the authors of one paper are reviewers for others, a scholarly culture that gives great weight to interpretation can emerge and reproduce itself as researchers gain expertise in managing the issues that dominate the dialogue between peers and insist that others meet the standards to which they have been held.
Close attention to interpretive issues is thus likely to lead their treatment to become more developed and more elaborate, and indeed, this pattern is evident in the papers studied here. Manuscripts expanded from original submission to published article in overall length (12 percent increase in word count, on average), discussion section (37 percent increase), methods section (24 percent increase), citations (26 percent), and hypotheses (13 percent). The only section of the paper that failed to grow was the results, which shrank slightly in absolute size and by 12 percent in relative terms. There is a robust connection between the concerns of peer evaluators and the restructuring of research reports, though the relationship is not one-to-one; theory sections were intensively rewritten though their length changed little on average, while methods sections show the opposite pattern.
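The absolute and relative figures for the results section can be reconciled with a little arithmetic. The sketch below is illustrative only: the function names are ours, and it assumes that the reported averages compose as simple ratios, which need not hold exactly when percentages are averaged across papers. It takes the two figures given in the text (12 percent overall growth and a 12 percent decline in the results section's share) and solves for the absolute change they jointly imply.

```python
def relative_change(section_growth: float, total_growth: float) -> float:
    """Change in a section's share of the paper, given absolute growth rates."""
    return (1.0 + section_growth) / (1.0 + total_growth) - 1.0


def implied_section_growth(share_change: float, total_growth: float) -> float:
    """Absolute growth of a section implied by a change in its share."""
    return (1.0 + share_change) * (1.0 + total_growth) - 1.0


# Figures reported in the text: papers grew 12 percent overall, and the
# results section lost 12 percent of its share of the paper.
TOTAL_GROWTH = 0.12
RESULTS_SHARE_CHANGE = -0.12

# Assumption: averages compose as ratios (see the caveat in the lead-in).
implied = implied_section_growth(RESULTS_SHARE_CHANGE, TOTAL_GROWTH)
print(f"Implied absolute change in results length: {implied:.1%}")
```

Under these assumptions the implied absolute decline is roughly 1.4 percent, which squares with the description of a results section that shrank slightly in absolute size while losing 12 percent of its share of the paper.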
The impact of peer review on manuscripts is particularly noteworthy because it is consistent with the historical trend. In a study of articles appearing in ASQ between 1956 and 2008, Strang and Siler (2013) found that the length of theory sections increased by 669 percent, methods sections by 1,573 percent, and discussion sections by 253 percent, while results sections contracted by 34 percent. During the same period, numbers of hypotheses quintupled while numbers of citations grew eightfold. The original submissions studied here already fit the conventions of contemporary social science and were undoubtedly sculpted in anticipation of reviewer reactions. Yet the interplay between authors, editors, and reviewers propelled the published articles further along the trajectory that social scientific writing has followed over the last half century.
Peer review does not provide a one-factor explanation of the larger trend in article structure, which is influenced by shifts in research design (from qualitative case studies to large-N regression-like analyses of archival data) as well as broader theoretical and methodological developments. But it makes good sense that peer review is a central mechanism in the production and dissemination of disciplinary norms, since it is here that authors are most influenced by the expectations of their colleagues. Our findings thus provide concrete support for Ellison’s (2002) hypothesis that peer review is a driver of the growing length and complexity of social scientific articles. They go further by distinguishing the elements in research reports that are empirically treated as malleable (overwhelmingly interpretive in character) from the methodological and empirical elements that are largely treated as fixed.
This paper is descriptive rather than evaluative, and indeed, we lack any strong evidence on the costs and benefits of contemporary peer review practices. Questions naturally arise, however, about both direct and indirect effects of referee-driven revision. Is the inevitable loss of authorial voice that occurs in journal revision compensated by an increase in the insightfulness and accessibility of theoretical argument? Or does it have negative/negligible effects on average, in effect substituting the scholarly tastes of reviewers and editors for those of authors? And in the longer run, is scholarly progress aided by a revision process that centers on theory rather than data or method? While we cannot answer these questions, we regard the regularities identified here as cause for concern, most notably in the absence of a constructive methodological dialogue and in the likely reactivity of scholars to an evaluative regime focused on interpretive issues. Sensitized by this research, for example, we have been struck by the frequency with which researchers describe their data as solid but their framing as needing work.
The study of peer review contributes to burgeoning social scientific analysis of evaluation and its transformative impact. The credo that knowledge is socially constructed embraces both the notion that creative work always involves some sort of collaboration and the assertion that authors are deeply influenced by the concerns and interests of their audience. Both mechanisms are given concrete form in the contemporary practice of peer review. Editors and reviewers do much more than certify their colleagues’ knowledge claims. They shape the scientific record, most significantly by adjusting the meanings that attach to empirical inquiry.
Footnotes
Acknowledgements
We thank Matt Brashears, Michèle Lamont, and Steve Morgan for their helpful comments.
