Abstract

In 2013, we reported in Psychological Science on a longitudinal field experiment in which we and our coauthors randomly assigned participants to receive or not to receive positive-emotions training over 6 weeks, in order to test theory-driven hypotheses regarding the pathways by which positive emotions might build physical health (Kok et al., 2013). Our results revealed that, amidst the unpredictability of field settings, positive-emotions training produced statistically significant improvements in a marker of physical health, and that these improvements were mediated by psychological processes. On the basis of her reanalyses of our data, Nickerson (2018) claims that we overstated our findings. As we detail here, we find the empirical basis for Nickerson’s claim to be questionable.
The Influence of Statistical Outliers
In discussing “unusual” participants, Nickerson argues that extreme and implausible values for high-frequency heart rate variability (HF-HRV) for seven participants led us to overstate the effects of the positive-emotions training we deployed, loving-kindness meditation (LKM). However, Nickerson’s method of identifying influential observations does not fit the analysis we conducted. Specifically, although Nickerson characterizes seven data points as “unusual,” she spotlights the five participants with the highest scores for final vagal tone (see her Fig. 1). Given that the outcome variable in the analysis we reported in our 2013 article was change in vagal tone from baseline to the end of the study (see Kok et al., 2013, Fig. 2), Nickerson’s focus on final vagal tone is inappropriate.

Fig. 1. Histograms for change in vagal tone, the outcome variable in Kok et al. (2013), in the loving-kindness meditation (LKM) condition and the control condition. Mean change for each condition is represented by the corresponding dashed vertical line. Change in vagal tone was computed by subtracting baseline high-frequency heart rate variability (HF-HRV) from end-of-study HF-HRV, using the same square-root-transformed values as in Kok et al.
Closer inspection of Nickerson’s Figure 1 reveals that, of the five data points she characterizes as extreme, the three rightmost show end-of-study values that are consistent with their corresponding baseline values. Although these three values are high relative to the other HF-HRV scores, the change in vagal tone they represent is not itself disproportionately high. Thus, in our statistical model, which rested on change in vagal tone, these three data points were not statistical outliers.
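The point about change scores can be illustrated with a small sketch. The numbers below are invented for illustration and are not the Kok et al. data. Tukey’s 1.5 × IQR fences serve here as an assumed outlier rule chosen only for this example; they are not necessarily the criterion used by either Kok et al. or Nickerson. Two hypothetical participants with very high end-of-study values also had very high baselines, so they are flagged when final values are screened but not when change scores are screened.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles use linear interpolation between order statistics
    (the "inclusive" method of Python's statistics.quantiles).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical square-root-transformed HF-HRV values for ten participants.
# The last two participants are high at BOTH time points.
baseline = [5.0, 5.2, 5.5, 5.8, 6.0, 6.1, 6.3, 6.5, 14.0, 15.0]
final = [5.3, 5.6, 5.6, 6.3, 6.2, 6.6, 6.5, 7.0, 14.4, 15.3]

# Change in vagal tone: end-of-study minus baseline, one value per person.
change = [f - b for b, f in zip(baseline, final)]

print("Outliers on final values:", tukey_outliers(final))    # -> [8, 9]
print("Outliers on change scores:", tukey_outliers(change))  # -> []
```

Screening the variable that actually enters the model (here, the change score) is what determines whether a point is an outlier for that model.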
Nickerson also wonders how our team could disagree with the authors of a previously published Commentary on our 2013 article (Heathers, Brown, Coyne, & Friedman, 2015) regarding the biological validity of the seven vagal-tone values she refers to as unusual. The reason is that, as stated in our published response to Heathers et al. (Kok & Fredrickson, 2015), our determination of biological validity was based on having an outside expert (blind to experimental condition) inspect the raw electrocardiogram (ECG) data for each participant in question, a step that Heathers et al. did not take. The expert we consulted was James Long, of the James Long Company, who has provided software and hardware solutions for psychophysiological research since 1979. Long determined that the values for all participants in our sample, except the two identified and excluded in our response to Heathers et al. (participants 557004 and 557027), were the result of biologically plausible ECG recordings and not of error.¹ Therefore, it is inappropriate to exclude additional participants from analysis.
Nickerson’s approach to the seven participants she designated as unusual was to exclude them from her reanalysis of our data. A truism in statistics is that statistically significant findings can be rendered nonsignificant by sufficient reductions in sample size and, correspondingly, in statistical power. In this light, it is noteworthy that Nickerson’s approach reduced the available sample by more than 13%.
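The dependence of power on sample size can be made concrete with a normal-approximation power calculation for a two-group comparison. This is a sketch under stated assumptions: equal group sizes, a two-sided test at α = .05, and a hypothetical standardized effect size of d = 0.6 chosen only for illustration. It does not model the structural equation analysis in Kok et al. (2013), and the sample sizes are illustrative rather than taken from the study.

```python
import math
from statistics import NormalDist

def two_group_power(d, n_total, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of a mean difference.

    Uses the normal approximation with n_total split equally between groups,
    so the noncentrality is d * sqrt(n1 * n2 / (n1 + n2)) = d * sqrt(n_total) / 2.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n_total) / 2
    # Probability of rejecting in either tail when the true effect is d.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

d = 0.6  # hypothetical standardized effect size
for n in (60, 52):  # illustrative sample sizes; 52 is about 13% smaller than 60
    print(f"N = {n}: approximate power = {two_group_power(d, n):.2f}")
```

With the effect size held fixed, the smaller sample has a lower chance of detecting the same true effect.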
Figure 1 presents a histogram for change in vagal tone (calculated by subtracting baseline HF-HRV from end-of-study HF-HRV) for each experimental condition (participants 557004 and 557027 excluded).² We agree with Nickerson that the rightmost value in the LKM histogram may plausibly function as an influential statistical outlier. We do not agree, however, with her strategy of addressing outliers by expunging data. An alternative method for reducing the disproportionate influence of outliers, one that does not reduce statistical power, is to Winsorize any extreme values by replacing them with the value at the 95th percentile (Rivest, 1994). This approach reduces the distortion that an outlying value can cause while retaining most of the information the data point provides (in this case, that the individual had a high value for change in vagal tone).
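Percentile-based Winsorizing can be sketched in a few lines. This is a minimal illustration, not the code used for the reanalysis. The percentile rule (linear interpolation, Python’s “inclusive” quantile method) and the option of also capping the lower tail at the 5th percentile are assumptions of the sketch. Unlike deletion, every observation is kept, so the sample size is unchanged.

```python
import statistics

def winsorize(values, upper_pct=95, lower_pct=5):
    """Cap values at percentile cut points instead of deleting them.

    upper_pct and lower_pct are integer percentiles between 1 and 99.
    Pass lower_pct=None to cap only the upper tail.
    """
    # 99 cut points: cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    high = cuts[upper_pct - 1]
    low = cuts[lower_pct - 1] if lower_pct is not None else min(values)
    # Values inside [low, high] pass through unchanged; the rest are capped.
    return [min(max(v, low), high) for v in values]

# Hypothetical change scores with one extreme value at each end.
scores = list(range(1, 21)) + [100]
print(winsorize(scores))                  # both tails capped
print(winsorize(scores, lower_pct=None))  # upper tail only
```

The extreme value of 100 is pulled in to the 95th-percentile value but still ranks as the highest score, which is the information Winsorizing is meant to preserve.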
After Winsorizing the data (and again excluding the two cases identified as biologically implausible in Kok & Fredrickson, 2015), we found that empirical support for our three original hypotheses held steady. Specifically, we reran our original model (see Fig. 2 in Kok et al., 2013) using the 95th-percentile method of Winsorizing extreme data points for change in vagal tone. Six outlying values,³ including the rightmost value of concern to Nickerson (see Fig. 1), were adjusted. The model fit continued to be acceptable (root-mean-square error of approximation = 0.084, 90% confidence interval = [0.063, 0.103]; comparative fit index = 0.945; Tucker–Lewis index = 0.943), and the results continued to support our three hypotheses: As predicted by Hypothesis 1, the practice of LKM increased positive emotions (b = 0.055, z = 3.770, p < .001), and baseline vagal tone moderated this effect such that participants who began the study with higher values of HF-HRV showed greater increases in positive emotions over 9 weeks’ time (b = 0.047, z = 3.003, p = .003). As predicted by Hypothesis 2, increased positive emotions predicted increased perceived social connections (b = 1.065, z = 4.184, p < .001). And as predicted by Hypothesis 3, greater increases in perceived social connections predicted larger changes in vagal tone over the course of the study (b = 3.795, z = 2.008, p = .045).
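As a consistency check on the reported coefficients, the two-tailed p values implied by the z statistics can be recomputed. The sketch assumes each z is a Wald statistic referred to the standard normal distribution, as is conventional for structural equation model coefficients; it recomputes p values only and does not refit the model.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# z statistics as reported for the Winsorized model.
reported = {
    "H1: LKM -> positive emotions": 3.770,
    "H1: baseline vagal tone x LKM": 3.003,
    "H2: positive emotions -> social connections": 4.184,
    "H3: social connections -> vagal tone": 2.008,
}
for label, z in reported.items():
    print(f"{label}: z = {z:.3f}, p = {two_tailed_p(z):.4f}")
```

Under this assumption the recomputed values round to the reported .003 and .045 and fall below .001 for the other two coefficients.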
Within-Person Analyses?
Nickerson argues that evidence for upward spirals needs to rest on within-person analyses rather than between-person analyses. We agree with Nickerson that within-person tests are of considerable interest. We dispute, however, her claim that she has provided a within-person test, and we show that her between-person test rests on data stripped of much of its information value.
A key strength of our original study is its experimental design, in which participants were randomly assigned to either a control condition or an intervention condition (LKM). Both our original analysis and the reanalysis Nickerson offers in the latter section of her Commentary are between-person tests that compare these two groups. Even though both her approach and ours include derived variables to represent within-person changes (i.e., computed indices of change in her reanalysis; latent scores in our analysis), they both ultimately compare these within-person variables using between-person tests. Statistical experts argue that any effects assessed using a single derived value per person to represent within-person change on a repeated measure—whether as an independent or a dependent variable—are unambiguously between-person effects (Curran, Howard, Bainter, Lane, & McGinley, 2014). So, although Nickerson claims that her reanalysis provides a test of within-person effects, it does not.
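The logic of this argument can be sketched in code: once each participant’s repeated measurements are collapsed into a single change score, any comparison of conditions operates on one number per person and is therefore a between-person test. The sketch uses a Welch two-sample t statistic as a simple stand-in; it is neither the latent-variable model of Kok et al. (2013) nor Nickerson’s tally-based reanalysis, and the function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def change_scores(baseline, final):
    """Collapse each person's two measurements into one within-person change score."""
    return [f - b for b, f in zip(baseline, final)]

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite df for two independent groups.

    Each element is one person's score, so this compares people,
    not occasions within a person.
    """
    se2_a = variance(group_a) / len(group_a)  # squared standard error, group A
    se2_b = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical data: two measurements per person in each condition.
lkm = change_scores([5.0, 6.1, 5.4, 6.8], [5.9, 6.5, 6.2, 7.1])
control = change_scores([5.2, 6.0, 5.7, 6.4], [5.1, 6.2, 5.6, 6.5])
print(welch_t(lkm, control))
```

However the change score is derived (observed difference, latent score, or sign), the test that follows still treats persons as the units of comparison.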
The primary difference between our original analysis and Nickerson’s reanalysis is that she chose to drastically reduce the information value of our data before conducting her reanalysis. She did this by dichotomizing the data. (We note that only 1 of Nickerson’s 195 computed indices of change, across three variables for each of 65 participants, resulted in a value of “no change.” Nickerson’s trichotomization strategy therefore reduces, in effect, to dichotomization.) Statistical experts strongly discourage dichotomizing continuous data because it has been decisively linked to a wide variety of adverse statistical consequences, including (a) loss of information about individual differences, (b) loss of effect size and power, and (c) loss of measurement reliability (MacCallum, Zhang, Preacher, & Rucker, 2002). Thus, Nickerson’s tallies (in the rightmost column of her Table 1) drastically diminish the information value of the original data. Given such extreme data reduction, it is foreseeable that statistical significance would be lost in almost any analysis of these tallies.
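The cost of dichotomization can be illustrated with a short simulation. It assumes bivariate-normal data with a hypothetical population correlation of .50 and splits variables at zero (their mean), which mimics coding change scores only as increases or decreases. For a normal variable split at its mean, the point-biserial correlation with the other variable is about sqrt(2/π) ≈ 0.80 of the original correlation; when both variables are split, the phi coefficient equals (2/π) arcsin(ρ), which is 1/3 here. The simulation is illustrative and uses no data from either study.

```python
import math
import random
from statistics import fmean

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def split_at_zero(values):
    """Keep only the direction of each value: 1 for positive, 0 otherwise."""
    return [1.0 if v > 0 else 0.0 for v in values]

rho = 0.5                  # hypothetical population correlation
rng = random.Random(2018)  # fixed seed for reproducibility
n = 100_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

r_continuous = pearson(x, y)
r_one_split = pearson(split_at_zero(x), y)
r_both_split = pearson(split_at_zero(x), split_at_zero(y))
print(f"continuous:        {r_continuous:.3f}")
print(f"x dichotomized:    {r_one_split:.3f} (theory {rho * math.sqrt(2 / math.pi):.3f})")
print(f"both dichotomized: {r_both_split:.3f} (theory {2 / math.pi * math.asin(rho):.3f})")
```

Each additional split shrinks the observable association, and a smaller effect requires a larger sample to reach the same power.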
In sum, Nickerson’s approach of conducting a between-person test on dichotomized data is far less sensitive than our original analysis, which used the full information within our continuous data.
Future Directions
Although the three hypotheses we tested in our study continue to be supported by the initial data we collected in 2007, we acknowledge that the scientific study of upward-spiral processes remains in its infancy (for a review, see Fredrickson & Joiner, in press). Only a handful of studies, for instance, have incorporated biological variables into tests of the prospective and reciprocal relations that form upward-spiral dynamics (i.e., Burns et al., 2008; Kok & Fredrickson, 2010⁴). Whereas multiple laboratories have reported cross-sectional positive associations between cardiac vagal tone and well-being (both physical and mental; e.g., Bhattacharyya, Whitehead, Rakhit, & Steptoe, 2008; Geisler, Vennewald, Kubiak, & Weber, 2010; Marsland et al., 2007; Oveis et al., 2009; Wang, Lu, & Qin, 2013; for nonlinear associations, see Kogan, Gruber, Shallcross, Ford, & Mauss, 2013; for null results, see Silvia, Jackson, & Sopko, 2014, and Sloan et al., 2017), to date, no direct or conceptual replications of the longitudinal field experiment we reported in 2013 have been published. Relevant to further longitudinal investigations of upward-spiral dynamics are new statistical tools that allow rigorous, simultaneous tests of between-person and within-person effects over time (e.g., group iterative multiple-model estimation: see Beltz, Wright, Sprague, & Molenaar, 2016; latent curve models with structured residuals: see Curran et al., 2014). These advanced statistical tools, however, require larger samples (e.g., N = 250) and more frequent repeated assessments (e.g., five time points) than are available in the data set on which our study was based. Rigorous and well-powered tests are thus still needed to examine whether and how self-generated positive emotions improve vagal tone and other objective markers of physical health, and whether they do so in an upward-spiral dynamic.
Footnotes
Acknowledgements
The authors wish to thank Aaron Boulton, Sara Algoe, Ann Firestine, and James Long for their assistance with preparing this response.
Action Editor
D. Stephen Lindsay served as action editor for this article.
Author Contributions
B. L. Fredrickson and B. E. Kok each drafted sections of the manuscript and provided critical revisions. Both authors approved the final version of the manuscript for submission.
Declaration of Conflicting Interests
B. L. Fredrickson served as a paid consultant to
on the subject of evidence-based well-being interventions. B. L. Fredrickson also serves as an uncompensated advisory board member for several nonprofit organizations, including the Positive Coaching Alliance, The Psych Report, and the VIA Institute on Character. The authors declared that they had no conflicts of interest with respect to the authorship or the publication of this article.
Funding
The data reanalyzed in this article were collected with support from a research grant (MH59615) awarded to Principal Investigator B. L. Fredrickson by the National Institute of Mental Health of the U.S. National Institutes of Health (NIH). Support for B. L. Fredrickson’s time came from a Research and Study Assignment from the College of Arts and Sciences at the University of North Carolina at Chapel Hill, a Cattell Sabbatical Award from the Association for Psychological Science, and three research grants awarded by the NIH: a National Institute of Nursing Research Grant (NR012899), an award supported by the NIH Common Fund, which is managed by the NIH Office of the Director/Office of Strategic Coordination; a National Cancer Institute Research Grant (CA170128); and a National Center for Complementary and Integrative Health Research Grant (AT007884). These funding agencies played no role in the decision to publish this Commentary or in the preparation of the manuscript.
Open Practices
All data, along with James Long’s assessments of the disputed cases, have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/jazfy/. The complete Open Practices Disclosure for this article can be found at https://journals.sagepub.com/doi/suppl/10.1177/0956797617707319. This article has received the badge for Open Data. More information about the Open Practices badges can be found at
.
Notes
References
Supplementary Material
