Abstract

At some stage in a young researcher’s career, there is a critical juncture at which analytical acumen begins to feed a desire to find a statistically significant outcome or a crystal-clear emergent theme. In the quest for statistical significance, quantitative researchers might turn a data set upside down or inside out, running every conceivable chi-square test, Pearson correlation, or analysis of variance, all the while disregarding underlying assumptions or ignoring inflated error rates. In qualitative research, comparable risks arise when analyses converge too quickly on anticipated outcomes without due consideration of independent coder, stakeholder, or negative case checks. Emergent designs and exploratory data analyses invite this “believing is seeing” approach to the research process. But the lure of publication and a desire to fast-track a research career can push the boundaries of researcher ethics and judgment, thereby compromising the inherent trust placed in researchers and their findings. The case of food science researcher Brian Wansink serves as an object lesson.
Wansink rose to fame as the head of Cornell University’s Food and Brand Lab, a research unit focused on environmental factors that influence the dietary habits and behavioral economics of consumers. In one particular study, Wansink and his colleagues explored how discounting the cost of an all-you-can-eat pizza buffet affected diner behaviors, including how many trips they made through the buffet line, how many slices of pizza they ate, whether they ordered a drink or ate dessert, and so on. When the initial analysis revealed no significant effect for discounting on the amount of food consumed, Wansink allegedly directed a research assistant to engage in a number of data manipulation techniques, including trimming outliers and breaking the study participants into numerous subgroups (e.g., males vs. females; lunchtime diners vs. dinnertime diners; people eating alone, in small groups, or in large groups), which allowed for an extensive number of relationships among variables to be explored until something significant turned up. Any noteworthy finding could be reverse-engineered to take the form of an ex post facto hypothesis or research question (known as “HARKing”—hypothesizing after results are known) and an entire article then built around an important discovery about pizza eating.
One day, in a blog post, Wansink marveled over the research assistant’s ability to doggedly persist until she found “hidden gems” buried within the data. As these details emerged and independent researchers began to connect the dots, a bevy of questions and critiques were posed (Bartlett, 2017). While Cornell University cleared Wansink of research misconduct, the ongoing investigation has resulted in multiple papers being retracted or corrected, in some instances due to excessive self-plagiarizing or piecemeal publication but most often because of extensive slicing and dicing of the data set. The practice of submitting a data set to dozens or even hundreds of analysis iterations is sometimes called “data dredging” or “p hacking,” because of the effort expended to find any statistically significant outcome (with a p value below .05 or some other critical value), even if the result is due to random chance or a false positive error rather than a valid experimental outcome.
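To make the inflated error rate concrete, the following is a minimal sketch, not a reconstruction of any analysis in the Wansink case. It assumes that every null hypothesis is true, that the tests are independent, and that each is judged at a critical value of .05; the function names `familywise_error_rate` and `simulate_dredging` are hypothetical and chosen only for illustration.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** num_tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_dredging(num_tests: int, alpha: float = 0.05,
                      num_studies: int = 20_000, seed: int = 1) -> float:
    """Simulate studies in which no real effect exists.

    Under a true null hypothesis, a p value is uniformly distributed on
    [0, 1], so each test is modeled as one uniform random draw
    (an assumption of this sketch). Returns the share of simulated
    studies in which at least one test falls below alpha.
    """
    rng = random.Random(seed)
    studies_with_hit = 0
    for _ in range(num_studies):
        if any(rng.random() < alpha for _ in range(num_tests)):
            studies_with_hit += 1
    return studies_with_hit / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(k, round(familywise_error_rate(k), 3),
              round(simulate_dredging(k), 3))
```

Under these assumptions, a single test carries the nominal 5% risk of a false positive, but 20 tests on the same null data set yield at least one "significant" result roughly 64% of the time. Real subgroup analyses are rarely independent, so the exact figure will differ, but the direction of the problem is the same.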
Keep in mind that Wansink has published more than 250 papers since 1990, a level of productivity that resulted in his being promoted, receiving major grant funding, and achieving great notoriety and influence with appearances on the Today show and coverage in The New York Times. Leaders and decision-makers within the food industry likely made changes to their business models or marketing practices, all based on the assumption that Wansink, as a researcher of repute, could be trusted to tell the truth.
With the ethical questions that have come to light about Wansink’s work and that of other researchers, there is renewed concern about factors that may contribute to researcher carelessness, if not outright research misconduct. Some scholars point to publication pressures and a lack of emphasis on ethics in researcher education. Others blame publication bias: the tendency for researchers to submit, and journal editorial committees to accept, manuscripts that reveal supportive or significant findings more often than those with negative, null, or nonsignificant results.
The research community typically depends on standard measures, such as peer review, replication, and the reporting of effect sizes alongside tests of statistical significance, to ferret out spurious researcher claims. Some journal editorial boards, however, have begun asking researchers to provide direct access to data sets so that reviewers can check for mathematical anomalies. One technique for such checks is called “grimming,” derived from the acronym GRIM, which stands for granularity-related inconsistency of means (Brown & Heathers, 2017). By considering whether particular summary statistics, such as means, could actually arise from the data given parameters such as the sample size, the number of items, or the response range, it is possible to identify instances where researchers may have committed data transcription or keypunching errors, misreported certain findings, or deliberately manipulated results.
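The logic of a GRIM-style check can be sketched in a few lines. This is a simplified illustration rather than the published procedure in full. It assumes that the underlying responses are whole numbers (for example, Likert-type ratings) and that the mean was reported to a known number of decimal places; the function name `grim_consistent` and its parameters are hypothetical.

```python
import math


def grim_consistent(reported_mean: float, n: int,
                    decimals: int = 2, items: int = 1) -> bool:
    """Return True if some whole-number total could yield the reported mean.

    reported_mean: the mean as printed in the manuscript
    n:             sample size
    decimals:      decimal places to which the mean was reported
    items:         integer-valued items averaged per participant
    """
    grains = n * items                    # integer responses behind the mean
    approx_total = reported_mean * grains
    # Allow for rounding in either direction, plus floating-point slack.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-9
    # Only the two whole-number totals nearest to approx_total can match.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / grains - reported_mean) <= tolerance:
            return True
    return False


if __name__ == "__main__":
    print(grim_consistent(5.18, 28))  # 145 / 28 rounds to 5.18
    print(grim_consistent(5.19, 28))  # no whole-number total gives 5.19
```

With 28 participants answering a single whole-number item, a total of 145 gives a mean of 5.18 and a total of 146 gives 5.21, so a reported mean of 5.19 cannot be correct as stated. A flag of this kind signals only that something needs rechecking; it cannot distinguish a typing error from deliberate manipulation. The check also loses its power as the sample size grows: with means reported to two decimal places, it becomes uninformative once the number of responses reaches about 100.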
As an editor, I always begin with the assumption that a researcher can be trusted to provide accurate information about the research method, the data sources, and the analysis that produced a particular finding. On occasion, a perceptive reviewer may spot a peculiarity or inconsistency and ask the researcher to recheck the data file or rerun an analysis. Yet another bond of trust is established with the expectation that what has been requested will be dutifully addressed and honestly reported. I hate to imagine a future in which armies of “grimmers” are deployed to fact check the claims of researchers before manuscripts are accepted for publication, or before published articles are included in meta-analyses and used to inform educational policy and practice. But it all comes down to ethics, honesty, and trust. If researchers cannot trust each other, how can the public continue to trust the researcher community and their recommendations for change and investments in schools?
In a recent Pew Research Center report on U.S. adults’ opinions about science and scientists, Funk (2017) presents evidence that the public has greater confidence in scientists than in most other institutional groups (K–12 principals and superintendents, religious leaders, business leaders, the news media, elected officials) and that confidence in scientists and their influence on society has remained relatively stable over the past four decades. But this trust is conditional, depending on the domain of research as well as on changing sensibilities surrounding facts and truth. While most adults trust scientists’ recommendations related to medical care, such as the use of childhood vaccines to prevent disease, they are much less accepting of scientists’ views on the causes of climate change or the health effects of genetically modified foods. As Funk (2017) observes, “Public trust in scientists encompasses expectations about scientists’ actions, trust in scientists to be honest brokers of information, trust in scientific expertise and understanding, and trust in the motivations and influences operating on science research.” Trust may be the researcher’s most precious commodity.
