Abstract

Richard Berk, Emil Pitkin, Lawrence Brown, Andreas Buja, Edward George, and Linda Zhao have written a valuable Evaluation Review article (Berk, Pitkin, et al., 2014) on regression adjustment in randomized experiments. I’ve long been a fan of Berk’s critical writings on regression and meta-analysis (Berk, 2004, 2007; Berk & Freedman, 2003), and I recently recommended Berk, Brown, et al.’s (2014) helpful Sociological Methods and Research article to a colleague teaching a course on regression. Also, I am grateful to Berk for sending me kind and constructively critical comments on Lin (2013) after it went to press. In my reply to his e-mail, I shared an informal essay (Lin, 2012a, 2012b) discussing regression adjustment in practice. We are in agreement on many points.
My subsequent comments are only to clarify a few of Berk, Pitkin, et al.’s (2014) statements about Lin (2013):
“He replicates Freedman’s overall results and then turns to a conceptual framework that differs substantially from Freedman’s.” (Berk, Pitkin, et al., 2014, p. 172)
“For Lin, the population from which the subjects are drawn is real and finite. The researcher is assumed to know the population mean for the covariate, which can be used as the value of θ. In most social science applications, that mean will not be known.” (Berk, Pitkin, et al., 2014, p. 179)
“Lin implicitly loosens the ties to the Neyman approach by making use of a real, finite population from which the data can be treated as a random sample. His conclusions are less pessimistic than Freedman’s. However, his proposed estimator will usually not be operational in practice…” (Berk, Pitkin, et al., 2014, p. 187)
In fact, I stuck rigidly to Neyman’s and Freedman’s framework: “The n subjects are the population of interest; they are not assumed to be randomly drawn from a superpopulation” (Lin, 2013, p. 297). Since the “population” is just the n subjects in the experiment, the population mean for the covariate is known, and the ordinary least squares (OLS) estimator with interactions that I studied is identical to Berk, Pitkin, et al.’s (2014, p. 182, equation 11) estimator. Their article helpfully provides a different standard error estimator (Berk, Pitkin, et al., 2014, p. 183, equation 12) because they are generalizing to an infinite population. (I am agnostic about finite- vs. infinite-population inference. In my article, I recommended Reichardt and Gollob’s [1999, pp. 125–127] discussion and wrote, “My purpose is not to advocate finite-population inference, but to show just how little needs to be changed to address Freedman’s major concerns.”)
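As a rough numerical illustration of why the covariate mean is available here, the following sketch uses simulated data; every variable name and parameter value is hypothetical and not taken from either article. It assumes the comparison estimator has the familiar form "fit a separate regression in each arm, then difference the predictions at the covariate mean of all n subjects"; equation 11 is not reproduced in this comment, so that form is an assumption. Under that assumption, the treatment coefficient from OLS with a treatment-by-centered-covariate interaction matches it exactly.

```python
import numpy as np

# Simulated experiment (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                      # one pre-treatment covariate
t = np.zeros(n)
t[rng.choice(n, size=n // 2, replace=False)] = 1.0  # complete randomization
y = 1.0 + 2.0 * t + 0.5 * x + 1.5 * t * x + rng.normal(size=n)

# The "population" is the n subjects, so the covariate mean is simply x.mean().
x_bar = x.mean()
xc = x - x_bar

# OLS with interactions: regress y on 1, t, centered x, and t * centered x.
design = np.column_stack([np.ones(n), t, xc, t * xc])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
ate_interacted = coef[1]                    # coefficient on the treatment dummy


def arm_prediction_at_mean(mask):
    """Fit y ~ 1 + x within one arm and predict at the overall covariate mean."""
    arm_design = np.column_stack([np.ones(mask.sum()), x[mask]])
    b, *_ = np.linalg.lstsq(arm_design, y[mask], rcond=None)
    return b[0] + b[1] * x_bar


# Assumed form of the comparison estimator: difference of arm-wise predictions.
ate_separate = arm_prediction_at_mean(t == 1) - arm_prediction_at_mean(t == 0)

print(ate_interacted, ate_separate)
```

The two printed numbers agree up to floating-point error, because a fully interacted regression reproduces the two arm-specific fits. The sketch says nothing about standard errors, which is where finite-population and infinite-population inference part ways.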
Perhaps the confusion stems from the “imaginary infinite sequence of finite populations” (Lin, 2013, p. 301). This is the same setup that Freedman assumed for his asymptotic results (and my regularity conditions are merely Freedman’s [2008b] regularity conditions, generalized to allow multiple covariates). He writes (Freedman, 2008a, p. 184), “In principle, our inference problem is embedded in an infinite sequence of such problems, with the number of subjects n increasing to infinity.” However, the goal here is still to make inferences about the average treatment effect on the actual subjects in the experiment. As Lehmann (1999, p. 255) writes, “We must go back to the purpose of embedding a given situation in a fictitious sequence: to obtain a simple and accurate approximation. The embedding sequence is thus an artifice and has only this purpose…”
These are but small points in the larger scheme of things, and I look forward to much useful research from Berk, Pitkin, Brown, Buja, George, and Zhao.
