Abstract

There is a real need for hands-on, applied textbooks on using propensity scores and other modern causal estimation strategies. The existing literature is full of wonderful insights and methods, but, partly because the field is developing so rapidly, it is hard to get oriented. Many of the main papers are quite technical and can be overwhelming as a point of first entry. The primary textbooks, while generally excellent, usually follow only a specific branch of the literature and frequently give little practical advice on how to actually fit a propensity score model, trim one’s data, select one’s covariates, or assess how well those steps went in the dirty world of actual application. Many books give no help with software, which is incredibly important for applied researchers. When my students ask me what to read to get started, or when I am consulting with people who want to use these methods, I am generally at a loss as to where to point them first.
This is why I was initially very excited about Using Propensity Scores in Quasi-Experimental Designs. Unfortunately, however, I believe this book does not give this sort of guidance. It is very high-level. While too much technical detail can be overwhelming, this book basically has none. This makes it very difficult to discuss many practical concerns such as what the estimands are or how these methods can go awry.
Many sections of the book are incredibly dense, with many details about different methods and their properties. Unfortunately, the book does not often step back and help the reader synthesize this information within a general conceptual framework of what propensity score adjustment is for or how it works. The first chapter attempts to do this to some extent, but many of the core concepts, such as strong ignorability, are left too vague. Some definitions, such as that of the Stable Unit Treatment Value Assumption on Page 2 and its use on Page 5, are not standard in the literature. This could create confusion when readers of this book engage with the rest of the field (see also the definition of a probit estimate on Page 80). These nonstandard uses are compounded by statistical errors, such as the suggestion on Page 4 that an expected value is not the same as a mean.
Chapter 1 of the text is the obligatory chapter comparing the gold standard of randomized experiments to the real world of observational studies. Chapter 2, focusing on the high-level idea of causal inference, has a nice overall structure. The author presents the goal of controlling confounding and then lists the different approaches one might take to achieve it. It is refreshing to see a single textbook specifically address the multiple avenues that all have the same aim. Unfortunately, too little detail on the methods is given, making these comparisons difficult. Matching, for example, is discussed without ever explicitly presenting the idea of forming pairs or groups of similar units.
The middle of the book covers different aspects of how one might use propensity scores. It also delves a bit deeper into matching methods that work with the original covariates themselves, such as Full Matching (Hansen & Klopfer, 2012) or GenMatch (Diamond & Sekhon, 2013). The final chapters go into more complex scenarios such as missing data. There are appendices with references to statistical code for a variety of software (Stata, R, SPSS, and SAS), but these pointers seem too terse to be of much use (e.g., the R section does not list the packages associated with the example calls).
I would have liked greater transparency in the claims made in the text. Some of the statistical claims of the book are unfamiliar to me and the associated citations are sporadic; it would be better if specific pointers to the literature were given for those who want to delve deeper. For example, a discussion on Page 79 of unequal group sizes causing bias in the standard errors was counterintuitive to me, but there was no way to follow up. In a discussion of matching from Page 103 onward (Chapter 5), there is an illustration of measuring bias that gives no gold-standard ground truth. I was therefore confused as to how the bias was being measured and compared. Furthermore, the specific process of matching was left quite vague. I could not find specific statements, for example, of how units in the treatment group would be tied to units in the control group.
Overall, I admire that this book tries to pull together and compare many different approaches to causal analysis. Unfortunately, however, the conceptual errors and vague descriptions make real comparison impossible. Furthermore, the book is lacking in the specific detail that could help applied researchers implement any of the approaches discussed. My main concern, however, is that the fundamental concepts of causal inference are not explicitly presented. For example, the root idea of matching, of comparing like with like, could have been foregrounded more. The idea that two units with the same propensity score could be considered as-if randomized, in my mind a critical conceptual underpinning of the power of propensity scores, was basically absent. How different assumptions make things possible could have been made clearer. What a causal effect even is could have been more explicitly addressed.
Writing a comprehensive book in this field is a daunting task, and the author should be commended for taking it on. Flipping through the book does provide the reader with a survey of the terms used in the literature. The structure of the book also gives a sense of the versatility of propensity scores and of the many methods one might use to incorporate them into an analysis. That being said, the book would have been better served by being more concrete. The reader’s hand could have been held more effectively by making both the discussions and the demonstrations of the methods more transparent.
