Abstract
While most sociology majors must take a statistics course, the content of this course varies widely across departments. Starting from the assumption that sociology students should be able to engage effectively with the sociological literature, this article examines the statistical techniques used in 2,804 journal articles—from four generalist sociology journals from 1990 to 2019 and 11 additional sociology journals from 2019—in order to assess which techniques have risen or fallen in prevalence. Although stalwarts such as ordinary least squares regression, chi-square tests, and t tests maintain strong presences, the rise of logistic regression, interaction effects, and multilevel models has been dramatic. After assessing the proportion of articles students hypothetically could understand given various levels of statistical training, the article ends with suggestions for how to revamp the statistics course to help our students become more numerate citizens, both in their sociology courses and in the world at large.
I have taught undergraduate statistics to sociology students for nearly 30 years, and I have written a social statistics textbook (Linneman 2018). As part of both of these endeavors, I regularly comb through sociology journals in order to find examples to share with my students and readers. In doing so, I have grown increasingly concerned about a possible disjuncture between the statistical techniques we teach our students and the techniques that appear in contemporary sociology journals. I sensed a growing gap, but I had no way of knowing for sure if it existed, nor could I find any such accounting in the literature. Therefore, for my own edification and for my fellow social statistics instructors, I built a data set of journal articles to assess whether this concern is warranted. Do we teach our students techniques that do not appear in the literature? Do we fail to introduce our students to the techniques that are most common in the literature? Before I describe the construction of the data set and present the findings, I first assess the literature on teaching statistics within sociology and describe the gap in the literature this project seeks to fill.
Literature Review
In nationwide surveys of sociologists and representatives from sociology departments and in reports commissioned by the American Sociological Association (ASA), the need to cultivate our students’ statistical skills and their ability to engage with the sociological literature is pronounced. Wagenaar (2004), through his nationwide survey of 301 sociologists, found that on a scale from 1 (skill requires no coverage) to 7 (skill is fundamental and deserves extensive coverage), every measure relating to statistics scored highly: quantitative methods (M = 5.9), social statistics (M = 5.6), and “use and assess research” (M = 6.1). A 2004 ASA report (McKinney et al. 2004) recommended that sociology majors take a statistics course (preferably taught within a sociology department) and that they be able to “read professional articles that use different research methods and critically comment on them” (McKinney et al. 2004:9). Delia Deckard (2017) found that two-thirds of the 90 sociology programs she surveyed required one statistics course, with an additional 7 percent requiring more than one. Conversely, over a quarter (27 percent) of the programs had no statistics requirement. Sweet, McElrath, and Kain (2014) used a different methodology but reached roughly the same conclusion: In their content analysis of a national sample of program requirements in 2014, they found that 78 percent of programs required a statistics course, up from 53 percent in a similar study in 2000. Taking a more international approach, Parker (2011) found in his study of eight countries that sociology programs required on average one statistics course, making them less quantitative than economics and business programs but more quantitative than political science programs.
Putting all of these findings together, it is clear that most sociology programs view as important the need to produce majors who have training in statistics.
This need runs up against students’ very mixed feelings about this required social statistics course. Researchers have found that many students do see the importance of learning statistics, but this recognition is accompanied by a good deal of anxiety (Murtonen and Lehtinen 2003; Ramos and Carvalho 2011; Williams et al. 2008). In open-ended responses, students surveyed by Murtonen and Lehtinen (2003) attributed their negative feelings toward their statistics course to “superficial teaching” and to the inability to create “an integrated picture of the different parts of scientific research and really understanding it” (Murtonen and Lehtinen 2003:182). In a separate study, Murtonen et al. (2008) found that some American students who, at the beginning of a statistics course, claimed the material would be useful to them in the future had pessimistically changed their minds by the end of the semester.
One proposed way to help students see why quantitative literacy matters is to integrate data analysis throughout the curriculum so that it is not relegated to a statistical silo. Perhaps the best example of this is the ASA’s early-21st-century program that worked with a wide variety of sociology departments to help them integrate quantitative modules into many of their substantive courses (Howery and Rodriguez 2006; Wilder 2009). Bridges and coauthors (1998) conducted a careful study within a large introductory course into which they integrated a tiered series of quantitative modules, and they were able to demonstrate student improvement over the semester. Given the nature of such initiatives, however, the level of statistical training they can provide, though foundational, is limited.
Given that much of sociology majors’ statistical training occurs within a single dedicated statistics course, it is surprisingly hard to discern from the literature what content instructors actually teach in such courses. A few projects, using a variety of sources, offer some insight. While Delia Deckard’s survey of 90 sociology programs does not tell us what is covered in the statistics course, she did find that a fair number of programs offer specific courses beyond introductory statistics: 32 percent of the programs offer a course in “advanced linear regression”; 19 percent, a course on probit and logit models; 16 percent, a course on big data; and 9 percent, a course on structural equation models (Delia Deckard 2017). Schacht took a different approach, studying 12 statistics textbooks. He was struck that “[a]lthough some texts treat various topics more extensively than others, there does appear to be fairly widespread agreement on the topics that should be covered in introductory statistics textbooks” (Schacht 1990:392). Unfortunately, we cannot glean from his findings the extent to which particular topics are covered. For example, he notes that all 12 books covered regression, but he does not report which aspects of regression they cover or whether they go beyond simple regression. In addition, these findings are now three decades old.
Around this same time, ASA president Hubert Blalock (1989) expressed concern about the state of statistical training. Howery and Rodriguez (2006:24) nicely capture his concern:
He noted that, owing to the computer revolution, large-scale data sets were now widely available and that the quality of quantitative work had improved. However, he argued that the training of students, both undergraduates and graduates, had failed to keep pace with these opportunities and technological innovations. . . . In short, Blalock argued that there is a quantitative literacy gap in the field of sociology, including in the preparation of undergraduate majors.
Utts (2015:105), speaking more generally about the state of statistics education across all fields, expresses a similar concern:
As statistics courses have grown in popularity many of those assigned to teach them have not had adequate training in statistics. And as the skills required of professional statisticians have expanded and changed, degree-granting institutions have had to grapple with what to include and what to omit in the statistics curriculum.
Therefore, while we do not know exactly what happens in social statistics courses, there is a sense that we are failing to adapt to the state of the field.
With regard to statistical techniques used in sociology, then, what exactly is the state of the field? Here, too, the literature proves lacking. Researchers, mostly outside sociology, have used several methods to gauge which statistical techniques appear in published work. The most common is a by-hand content analysis of journals. For example, with an eye toward suggesting the level of training needed by graduate students, Vijverberg (1997) assessed the techniques used in several social science journals (including three sociology journals: American Sociological Review, American Journal of Sociology, and Social Forces) from a single year (1995). He found that between two-thirds and three-quarters of the articles with quantitative content used techniques beyond basic multivariate regression. Because his sample is from a single year, however, he cannot give insight into temporal trends. In contrast, Trevisani and Tuzzi (2015) examined a single statistics journal (Journal of the American Statistical Association) from 1888 to 2012. Even though they focus only on the words used in article titles, they are able to identify clear trends in the field of statistics. Porta and colleagues (2013) also analyzed a lengthy time span (1957–2010) but did so by studying how often biomedical articles cited particular statistics texts, tracking the increase in multivariate methods over time. Other researchers have examined public health journals (Smith et al. 2006) and communication journals (Trumbo 2004), but I could find no longitudinal study focused strictly on sociology journals. It is my hope that this study, with its emphasis on sociology, its longitudinal nature, and its level of detail with regard to statistical techniques, will help fill this void.
Method
I analyzed American sociology journals in both a longitudinal and a snapshot manner. The longitudinal component involved four general interest journals, two connected to national associations (American Sociological Review and Social Problems) and two connected to regional associations (Sociological Perspectives and Sociological Forum). For each of these four journals, I examined all of the journal issues from 1990 through 2019, yielding 30 years of data. I included any article whose goal was to make an original empirical contribution to the literature. I therefore excluded certain types of pieces: presidential addresses, polemical essays, comments and replies, introductions to special issues, and strictly theoretical pieces. I coded each remaining empirical article into one of five categories:
Strictly quantitative: The article used an exclusively quantitative methodology.
Mostly quantitative: The article primarily employed a quantitative methodology but also had a small qualitative component.
Equally both: The article equally employed quantitative and qualitative methodologies.
Mostly qualitative: The article primarily employed a qualitative methodology but also had a small quantitative component.
Strictly qualitative: The article used an exclusively qualitative methodology (I did not count a summary table of statistics regarding interview respondents as a quantitative component).
For each article that used quantitative techniques, I noted which techniques the article used. Some articles used a single technique, but many used multiple techniques (some as many as seven), and I recorded all of these in an Excel worksheet. Locating the techniques an article used involved scouring its Method and Results sections. Many articles were quite straightforward, carefully listing and justifying their statistical choices. Others were more elusive. For example, while most tables of descriptive statistics reported in a footnote which significance test was used (usually chi-square tests, t tests, or both), occasionally the footnote indicated only what the stars signified (such as *p < .05). I coded these few articles as using chi-square tests if the table presented categorical variables and t tests if the table presented ratio-level variables (or both if the table included both types of variables). Because the presence of ordinary least squares (OLS) regression and multilevel modeling proved particularly important (see Findings), I was careful in coding these techniques, especially when they appeared in the same article. Some articles presented OLS results and then also presented results from separate multilevel analyses; I coded these as using both techniques. A very small number of articles had tables whose titles said “OLS” but whose footnotes said that fixed effects were included (implying a multilevel model); I also coded these articles as using both techniques. Articles that explicitly called their models “multilevel” I coded only as multilevel. Although I initially recorded the presence of descriptive statistics (such as means, standard deviations, correlations, and confidence intervals), these proved so ubiquitous that I decided not to report them as part of the findings.
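The coding rules in this paragraph are mechanical enough to state as code. The following Python sketch is only an illustration under stated assumptions: the coding itself was done by hand in an Excel worksheet, and the `ModelTable` record and all function names are hypothetical. Each regression table is assumed to have been read and summarized as three yes/no judgments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelTable:
    """Hypothetical summary of one regression table as read from an article."""
    title_says_ols: bool = False
    called_multilevel: bool = False
    footnote_fixed_effects: bool = False


def infer_significance_tests(has_categorical: bool, has_ratio_level: bool) -> set[str]:
    """Fallback for descriptive tables whose footnote gives only the star legend:
    chi-square for categorical variables, t test for ratio-level variables."""
    tests: set[str] = set()
    if has_categorical:
        tests.add("chi-square test")
    if has_ratio_level:
        tests.add("t test")
    return tests


def code_model_table(table: ModelTable) -> set[str]:
    """Codes for a single table under the OLS/multilevel rules."""
    if table.called_multilevel:
        return {"multilevel"}             # explicitly multilevel: multilevel only
    if table.title_says_ols and table.footnote_fixed_effects:
        return {"OLS", "multilevel"}      # "OLS" title plus fixed effects: both
    if table.title_says_ols:
        return {"OLS"}
    return set()


def code_article(tables: list[ModelTable]) -> set[str]:
    """An article's codes are the union of its tables' codes."""
    codes: set[str] = set()
    for table in tables:
        codes |= code_model_table(table)
    return codes


# An article with an OLS table and a separate multilevel table.
print(sorted(code_article([ModelTable(title_says_ols=True),
                           ModelTable(called_multilevel=True)])))
```

Coding at the table level and then taking the union across an article's tables reproduces the rule that an article reporting OLS results alongside separate multilevel analyses is coded as using both techniques.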
I then created a list of the 40 techniques that appeared most often. Techniques that appeared only a few times in the four journals over 30 years I coded into a category called “unique techniques.” Once I had completed the Excel worksheet, I moved the data into SPSS for analysis.
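Collapsing rare techniques into a residual category can be sketched as follows. This is a hypothetical Python illustration, not the SPSS procedure used here. It assumes that each article is stored as a set of technique names and that "frequency" means the number of articles using a technique. Because the paragraph states no numeric cutoff, the sketch keeps a fixed number of top techniques. `Counter.most_common` orders tied counts by first appearance, so any tie at the cutoff would still need a human decision.

```python
from __future__ import annotations

from collections import Counter


def recode_rare_techniques(
    articles: list[set[str]], keep: int = 40
) -> tuple[list[str], list[set[str]]]:
    """Keep the `keep` most frequently used techniques and fold every other
    technique into a single 'unique techniques' category."""
    # Each article is a set, so a technique counts at most once per article.
    counts = Counter(tech for article in articles for tech in article)
    kept = {tech for tech, _ in counts.most_common(keep)}
    recoded = [
        {tech if tech in kept else "unique techniques" for tech in article}
        for article in articles
    ]
    return sorted(kept), recoded


# Toy data for illustration only (not the study's data).
toy = [
    {"OLS regression", "Prais-Winsten regression"},
    {"OLS regression", "logistic regression"},
    {"logistic regression", "seemingly unrelated regression"},
    {"OLS regression", "logistic regression"},
]
kept, recoded = recode_rare_techniques(toy, keep=2)
print(kept)
```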
For the snapshot analysis, I took the 2019 data from these four journals and added the 2019 data from 11 additional sociology journals: three more national-level generalist journals (American Journal of Sociology, Social Forces, and Socius), two more regional journals (Social Currents and Sociological Quarterly), and six top specialty journals (Journal of Health and Social Behavior, Social Psychology Quarterly, Sociology of Education, Gender & Society, Journal of Marriage and the Family, and Criminology). I coded these 11 additional journals in the same way as the original four. Although not exhaustive, this set of 15 journals represents outlets that sociology students commonly examine for their courses and research papers.
Findings
Changes in Techniques over Time
From 1990 to 2019, the four journals published 3,588 empirical articles: 1,279 in American Sociological Review, 805 in Social Problems, 770 in Sociological Perspectives, and 734 in Sociological Forum. Most of these were either strictly quantitative (2,205, or 61 percent) or strictly qualitative (1,113, or 31 percent). This leaves only 270 articles (7.5 percent) that involved some combination of quantitative and qualitative methods, a share that varied little over time. Therefore, in analyzing the four journals over time, I dichotomized article type into articles that were strictly qualitative versus articles that had at least some quantitative component. Using this dichotomy, the proportion of articles with a quantitative component decreased slightly over time, from a high of 71 percent in the early 1990s to 63 percent in the late 2010s. Each journal, though, followed its own trajectory, as Table 1 shows.
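As a check on the arithmetic, a short Python sketch recomputes the shares from the counts reported in this paragraph. It also confirms that the 2,475 articles with quantitative components analyzed below are the 2,205 strictly quantitative articles plus the 270 articles that combine methods.

```python
# Article counts for the four journals, 1990 to 2019, as reported above.
counts = {
    "strictly quantitative": 2205,
    "strictly qualitative": 1113,
    "some combination": 270,
}
total = sum(counts.values())  # all empirical articles

for label, n in counts.items():
    print(f"{label}: {n} of {total} ({100 * n / total:.2f}%)")

# The dichotomy used for the over-time analysis: any quantitative component
# (strictly quantitative plus mixed) versus strictly qualitative.
any_quant = total - counts["strictly qualitative"]
print(f"any quantitative component: {any_quant} ({100 * any_quant / total:.2f}%)")
```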
Table 1. Quantitative Techniques in Four Sociology Journals, 1990 to 2019.
American Sociological Review has remained predominantly quantitative throughout these 30 years, hovering between 80 percent and 90 percent of the articles. Social Problems, in contrast, has become more quantitative in this time period: going from 43 percent quantitative in the early 1990s to 65 percent quantitative in the early 2010s (with a subsequent slight dip to 59 percent in the late 2010s). Sociological Perspectives has remained approximately two-thirds quantitative through these 30 years. Sociological Forum has become steadily less quantitative, falling from 73 percent in the early 1990s to 49 percent in the late 2010s.
Within the 2,475 articles that had quantitative components, several statistical techniques clearly rose above the rest in terms of frequency: OLS regression, logistic regression, interaction effects, and multilevel models. Table 1 illustrates the prevalence of these four techniques in these four journals from 1990 to 2019. There has been a decline in the use of OLS regression in the past 30 years: from appearing in 41 percent of the articles in the early 1990s to 25 percent in the late 2010s. The fact that a quarter of all very recent articles still use OLS regression, however, is a sign of the permanence of this technique in sociology’s statistical arsenal.
Use of logistic regression, interaction effects, and multilevel models, as illustrated by Table 1, has grown swiftly. In recent years, each of these techniques appears in nearly a third of all articles with quantitative components. While logistic regression and interaction effects have grown steadily over the past 30 years, the rise of multilevel models is particularly striking: from 3 percent in the 1990s to 16 percent in the 2000s to 30 percent in the 2010s. This is likely the result of several factors: the growing ease of running such models in various statistical software packages, the growing prominence of data sources collected at multiple levels (e.g., the National Longitudinal Study of Adolescent to Adult Health, the Panel Study of Income Dynamics), and the simple sociological fact that individuals are nested within a variety of institutions and that this nesting affects their behavior (Snijders and Bosker 2011). The rise of multilevel models partially explains the simultaneous rise in interaction effects, since many of these models feature cross-level interactions (for example, an interaction between a student’s characteristics and the characteristics of their school). Indeed, the correlation between the use of these two techniques is significant (r = +.18, p < .001).
Note that multilevel models diffused over time across these four journals: They first appeared in American Sociological Review, then began appearing in Social Problems, and then appeared in the two regional journals. OLS and logistic regression have appeared fairly equally across the four journals. Use of OLS ranges from 30 percent in Sociological Forum articles to 34 percent in Sociological Perspectives articles. Use of logistic regression ranges from 25 percent in American Sociological Review to 30 percent in Social Problems. There is more variation across these journals over these 30 years in the use of the other two techniques. Use of interaction effects ranges from 18 percent in Sociological Forum to 28 percent in American Sociological Review, and use of multilevel models ranges from 11 percent in Sociological Forum to 22 percent in American Sociological Review.
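Because each article is coded as either using or not using a technique, the correlation between multilevel models and interaction effects is a correlation between two 0/1 indicators, that is, a phi coefficient. The Python sketch below shows one standard way to compute phi from a 2 × 2 table and to test r against zero with the usual t statistic. The 2 × 2 cell counts for these two techniques are not reported here, so only the significance check uses the reported r = .18. Taking n to be the 2,475 articles with quantitative components is an assumption.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Pearson r between two 0/1 indicators (phi), from a 2 x 2 table:
    a = uses both techniques, b = first only, c = second only, d = neither."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Assumption: r = .18 was computed over the 2,475 articles with quantitative components.
t = t_for_correlation(0.18, 2475)
# With about 2,473 df, t is effectively standard normal; a two-tailed
# p < .001 requires |t| > 3.29.
print(f"t = {t:.2f}, significant at p < .001: {abs(t) > 3.29}")
```

With a sample this large, even a modest correlation such as .18 lies far beyond the .001 threshold, which is consistent with the reported significance.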
To describe the occurrence of all 40 techniques, I created Table 2. It has two axes: prevalence and trajectory. Prevalence is determined by the number of times the technique has been used in the 30 years. Those in the high category are the eight most used techniques, ranging from 216 times (technique: squared independent variable) to 779 times (technique: OLS regression). Those in the medium category are the next 16 most used techniques, ranging from 31 times (technique: the Gini coefficient) to 158 times (technique: event-history analysis). Those in the low category are the 16 remaining techniques, ranging from 7 times (technique: Prais-Winsten regression) to 29 times (technique: log-linear analysis).
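Because the prevalence categories are defined by rank (top 8, next 16, remaining 16) rather than by fixed counts, the tiering takes only a few lines. The Python sketch below assumes a dictionary that maps each of the 40 techniques to its total appearances over the 30 years. The example uses made-up counts, and the function name is hypothetical. Python's `sorted` is stable, so techniques with tied counts keep their input order, and a tie that straddles a cut point would need a judgment call.

```python
from __future__ import annotations


def prevalence_tiers(counts: dict[str, int]) -> dict[str, str]:
    """Rank techniques by total appearances and cut the ranking into
    8 high, 16 medium, and 16 low (the split used for Table 2)."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    tiers = {}
    for rank, technique in enumerate(ranked):
        if rank < 8:
            tiers[technique] = "high"
        elif rank < 24:
            tiers[technique] = "medium"
        else:
            tiers[technique] = "low"
    return tiers


# Made-up counts for 40 hypothetical techniques (not the study's data).
toy_counts = {f"technique {i}": 800 - 20 * i for i in range(40)}
tiers = prevalence_tiers(toy_counts)
print({tier: list(tiers.values()).count(tier) for tier in ("high", "medium", "low")})
```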
Table 2. Prevalence and Trajectory of 40 Statistical Techniques.
Note: ANOVA = analysis of variance; DV = dependent variable; IV = independent variable; SEM = structural equation modeling.
The other axis concerns a technique’s overall trajectory. I determined each technique’s trajectory by examining time-based cross-tabulations, characterizing each technique as having a rising trajectory, a steady trajectory, or a declining trajectory. Among techniques with rising trajectories, there is much variation in prevalence. For example, both multilevel modeling and seemingly unrelated regression have rising trajectories, but their overall frequencies are quite different. Multilevel modeling rose from a frequency of 25 in the 1990s (appearing in just 3 percent of all articles with quantitative techniques) to 120 in the 2000s (16 percent) to 279 (30 percent) in the 2010s. Seemingly unrelated regression also rose but from only once in the 1990s (appearing in just 0.1 percent of all quantitative articles in this time period) to 6 times in the 2000s (0.8 percent) to 11 times (1.2 percent) in the 2010s. Similarly, while both OLS regression and ANOVA have declining trajectories, they also vary in prevalence. OLS declined from a frequency of 313 times in the 1990s (appearing in 40 percent of all quantitative articles in this time period) to 219 times in the 2000s (29 percent), then slightly rebounded to 246 times in the 2010s (though appearing in a smaller proportion of quantitative articles in this time period, 26 percent). ANOVA declined from a frequency of 33 times in the 1990s (4.2 percent of articles) to 26 times in the 2000s (3.4 percent) to 24 times in the 2010s (2.6 percent).
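Trajectories were classified here by inspecting cross-tabulations, not by a formula. Purely as an illustration, the Python sketch below applies a hypothetical rule to the decade figures reported in this paragraph. The 20 percent band and the function name are arbitrary assumptions. The sketch also makes one point from the OLS figures explicit: a trajectory is judged on shares of quantitative articles rather than on raw counts, because the OLS count rebounded in the 2010s while its share kept falling.

```python
from __future__ import annotations

# Decade figures reported above for four techniques:
# (count, percent of quantitative articles) for the 1990s, 2000s, and 2010s.
reported = {
    "multilevel modeling": [(25, 3.0), (120, 16.0), (279, 30.0)],
    "seemingly unrelated regression": [(1, 0.1), (6, 0.8), (11, 1.2)],
    "OLS regression": [(313, 40.0), (219, 29.0), (246, 26.0)],
    "ANOVA": [(33, 4.2), (26, 3.4), (24, 2.6)],
}


def classify_trajectory(shares: list[float], band: float = 0.2) -> str:
    """Hypothetical rule: compare the last decade's share with the first's.
    A relative change beyond +/- band is rising or declining; otherwise steady."""
    first, last = shares[0], shares[-1]
    if first == 0:
        return "rising" if last > 0 else "steady"
    change = (last - first) / first
    if change > band:
        return "rising"
    if change < -band:
        return "declining"
    return "steady"


for technique, decades in reported.items():
    counts = [count for count, _ in decades]
    shares = [share for _, share in decades]
    print(f"{technique}: counts {counts}, shares {shares} -> {classify_trajectory(shares)}")
```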
Table 2 holds several notable findings. In the medium-prevalence/rising-trajectory box, one finds several techniques that are, like logistic regression, variations on the generalized linear model (Hoffman 2004): negative binomial regression, ordered logistic regression, and multinomial logistic regression. Two other related methods are just a box below (probit and Poisson). The rise of such techniques also contributes to the moderate decline in OLS, since these articles often explain that they use these techniques in place of OLS given the nature of the dependent variable. Also located in the medium-prevalence/rising-trajectory box are techniques that analyze networks or spatial phenomena, which have risen from 6 appearances in the 1990s (appearing in only 0.8 percent of articles) to 10 appearances in the 2000s (1.3 percent) to 27 appearances in the 2010s (2.9 percent).
Moving to the right, to the high-prevalence/rising-trajectory box, it might come as a surprise that in addition to the more advanced techniques described earlier, the classic t test is not only quite frequent but on the rise. It has gone from appearing 80 times (10 percent of all quantitative articles) in the 1990s to 107 times (14 percent) in the 2000s to 154 times (17 percent) in the 2010s. This does not necessarily mean that t tests are becoming more integral, though, since many of these appearances are simply part of the reporting of descriptive statistics. The chi-square test maintains a high and steady presence for similar reasons.
In the middle box (medium prevalence/steady trajectory), several techniques that became popular in the mid-to-late twentieth century have maintained a continual presence: factor analysis (often used as a preliminary technique in articles), event-history analysis, and path analysis/structural equation modeling. I coded path analysis and structural equation modeling into a single category, given their similarities, but the more advanced technique of structural equation modeling has nearly supplanted the simpler technique of path analysis (the former appearing 39 times in the 2010s; the latter, only 6 times).
Located in the bottom middle box (medium prevalence/declining trajectory) are both ANOVA and various measures of association. The decline of ANOVA could be attributed to the relative scarcity of experimental methods in these four journals, although ANOVA can also be used with survey data. There are numerous measures of association (such as Somers’ d, phi, and lambda), but each appeared so infrequently that I coded them into a single category.
Of course, as I mentioned already, many articles used multiple techniques. The bottom of Table 1 shows that the mean number of techniques per article across all four journals has increased over time. In the early 1990s, the mean number of techniques was 1.60, and only 5 percent of articles used four or more techniques. These figures rose steadily over the 30 years: By the late 2010s, the mean number of techniques was 2.28, with 17 percent of the articles using four or more techniques. Limiting the analysis to strictly quantitative articles (that is, articles without any qualitative component), the mean rose from 1.68 techniques in the early 1990s to 2.44 techniques in the late 2010s.
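Both summary measures in this paragraph follow directly from the per-article technique lists. In the Python sketch below, each article is a set of coded techniques, and the data are toy values. The function name `technique_summary` is hypothetical. Restricting the input to strictly quantitative articles before calling it would give the second pair of means.

```python
from __future__ import annotations

from statistics import mean


def technique_summary(articles: list[set[str]]) -> tuple[float, float]:
    """Return (mean techniques per article, percent of articles using 4+ techniques)."""
    sizes = [len(techniques) for techniques in articles]
    share_four_plus = 100 * sum(size >= 4 for size in sizes) / len(sizes)
    return mean(sizes), share_four_plus


# Toy data for illustration only; each set lists one article's coded techniques.
toy = [
    {"OLS regression"},
    {"OLS regression", "t test"},
    {"logistic regression", "interaction effect", "multilevel model", "t test"},
    {"chi-square test"},
]
print(technique_summary(toy))
```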
A Snapshot of 2019
I then combined the 85 articles from these four journals’ 2019 issues with the 329 articles with quantitative components from the 2019 issues from the additional 11 journals in order to create a broader current snapshot of the statistical techniques used in sociology. Table 3 provides an overall view of the prominence of quantitative research in these 15 journals in 2019. All five national-level journals were majority quantitative in 2019: ranging from 64 percent (American Sociological Review and Social Problems) to 86 percent (Social Forces). Next, the regional journals varied substantially, from Sociological Forum’s 26 percent to Social Currents’s 70 percent. The specialty journals varied as well, from Gender & Society’s 19 percent to Journal of Marriage and the Family’s 92 percent. Overall, 414 (71 percent) of the 580 articles in these 15 journals in 2019 had some quantitative component, and 379 (65 percent) were exclusively quantitative. Five articles were primarily quantitative with a small qualitative component, 20 articles were equally quantitative and qualitative, and 10 articles were primarily qualitative with a minor quantitative component. I should note that the three journals with the highest proportion of quantitative articles—Socius, Social Forces, and Journal of Marriage and the Family—also happened to have more articles than the other journals, and these three journals alone contribute nearly half (46 percent) of the overall number of articles with quantitative content.
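The counts in this paragraph fit together, and a short Python check makes the bookkeeping explicit. The four categories with any quantitative content sum to 414, which matches the 85 articles from the original four journals plus the 329 from the additional 11, and the resulting shares round to the reported 71 and 65 percent.

```python
# 2019 snapshot counts reported above (15 journals).
total_articles = 580
quantitative_breakdown = {
    "exclusively quantitative": 379,
    "mostly quantitative": 5,
    "equally both": 20,
    "mostly qualitative": 10,
}
any_quantitative = sum(quantitative_breakdown.values())

# The same total should arise from the two journal groups: 85 + 329.
assert any_quantitative == 85 + 329

rows = [
    ("any quantitative component", any_quantitative),
    ("exclusively quantitative", quantitative_breakdown["exclusively quantitative"]),
]
for label, n in rows:
    print(f"{label}: {n} of {total_articles} ({100 * n / total_articles:.1f}%)")
```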
Table 3. Quantitative Techniques in 15 Sociology Journals, 2019.
Note: OLS = ordinary least squares.
These 414 articles used a total of 973 techniques, and Figure 1 shows that the same four techniques predominated: multilevel models, interaction effects, logistic regression, and OLS regression. These four techniques alone account for nearly 50 percent of all techniques used in 2019. To assess whether the original four journals resemble the other 11, I compared their late-2010s data with the 2019 data from the other 11. They were strikingly similar, with 35 of the 40 techniques within 2 percentage points of one another. For example, the four journals in the late 2010s featured multilevel models 32.1 percent of the time, compared with 32.5 percent for the other 11 journals in 2019. The four journals featured OLS 25 percent of the time in the late 2010s, compared with 24.6 percent for the other 11 journals in 2019. The five techniques that differed were a logged independent variable (overrepresented in the four journals, 12.4 percent compared with 7.9 percent in the other 11 journals), latent class analysis (underrepresented in the four journals, 0.6 percent compared with 2.7 percent in the other 11 journals), path analysis/structural equation modeling (underrepresented, 5.7 percent vs. 7.9 percent), ANOVA (underrepresented, 2.0 percent vs. 4.3 percent), and multinomial logistic regression (underrepresented, 5.1 percent vs. 7.9 percent).
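The 2-percentage-point comparison can be expressed as a simple filter. The Python sketch below uses only the seven techniques whose percentages are reported in this paragraph; the remaining 33 are not listed here, so the sketch cannot reproduce the full 35-of-40 count. The function name is hypothetical, and "within 2 points" is read as an absolute difference of at most 2.

```python
from __future__ import annotations

# Percent of quantitative articles using each technique, as reported above:
# (four journals, late 2010s; other 11 journals, 2019).
late_2010s_vs_2019 = {
    "multilevel models": (32.1, 32.5),
    "OLS regression": (25.0, 24.6),
    "logged independent variable": (12.4, 7.9),
    "latent class analysis": (0.6, 2.7),
    "path analysis/SEM": (5.7, 7.9),
    "ANOVA": (2.0, 4.3),
    "multinomial logistic regression": (5.1, 7.9),
}


def flag_differences(
    pairs: dict[str, tuple[float, float]], threshold_pp: float = 2.0
) -> dict[str, float]:
    """Techniques whose two percentages differ by more than threshold_pp points.
    Positive differences mean the technique is more common in the four journals."""
    return {
        technique: round(a - b, 1)
        for technique, (a, b) in pairs.items()
        if abs(a - b) > threshold_pp
    }


for technique, diff in flag_differences(late_2010s_vs_2019).items():
    print(f"{technique}: {diff:+.1f} percentage points")
```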

Figure 1. Frequency of techniques in 15 sociology journals in 2019.
Table 3 also shows the prevalence of the top four techniques in the 15 journals in 2019. In six of the journals, multilevel models are the most frequently used technique, while in another six journals, logistic regression predominates. OLS regression maintains a presence, however, even in the top national journals, appearing on average in 26 percent of the articles across all 15 journals.
Preparing Our Students: A Thought Exercise
If we want to give our sociology students the ability to consume quantitative sociological research critically and effectively, a common approach is to have them grapple with research articles. Indeed, as discussed in the literature review, this is a stated goal of an official ASA report (McKinney et al. 2004). We do not want them to flounder, however, when they encounter statistical techniques we have not trained them to interpret. With this in mind, I approached these thousands of journal articles (the 4 journals over time and the additional 11 journals for 2019) in a different way. I first developed nine levels of difficulty based on the preceding analyses as well as my understanding of the mathematical training it would take to understand the techniques fully:
Level 1: Articles that pretty much anyone could read because they involve simple descriptive statistics, frequencies, and percentages
Level 2: Articles that one could read if they had instruction in basic statistics, such as chi-square, t tests, measures of association, index of dissimilarity, and the Gini coefficient
Level 3: Articles with OLS regression tables in which, with adequate training, one could interpret the slopes, R-squared values, and sets of nested regression models
Level 4: Articles with OLS regression that also include various variable transformations, such as interactions, squared or logged independent variables, or logged dependent variables
Level 5: Articles that use logistic regression, which requires training beyond OLS regression to understand how logistic regression extends linear regression
Level 6: Articles that extend beyond logistic regression into other maximum-likelihood types of regression: ordered logistic, multinomial logistic, log-linear analysis, Poisson regression, tobit and probit regression
Level 7: Articles that use techniques that in some ways resemble regression but extend significantly beyond it and require additional training to understand fully: event-history analysis, path analysis, structural equation modeling, factor analysis, two-stage least squares, and generalized least squares
Level 8: Articles that involve multilevel modeling
Level 9: Articles that include highly advanced techniques, such as seemingly unrelated regression, latent class analysis, growth curve models, network analysis, and computer simulations
I realize that this ordering is somewhat subjective and that some instructors might consider particular techniques to be at either a slightly higher or slightly lower level.
Using these levels, I coded each article with this question in mind: What is the highest level of difficulty the article reaches, such that a student reading the article would need this level of knowledge to understand its statistical analyses fully? For example, I coded an article that uses t tests and OLS regression as Level 3. Had the same article also used multinomial logistic regression, I would have coded it as Level 5. This coding shows what proportion of journal articles a student with a particular level of training could be expected to comprehend. Figure 2 summarizes this analysis. There are four lines: one for each of the three decades of the four journals studied over time and one for the 15 journals from 2019. The leftmost point of each line represents the proportion of articles that a student could be expected to understand fully with only Level 1 abilities (descriptive statistics, frequencies, and percentages). Each subsequent point to the right adds, cumulatively, the proportion of articles that would become accessible if the student acquired the next level of statistical skills. The rightmost point of each line is at 100 percent: A student with training in all techniques hypothetically could understand all quantitative journal articles. The figure thus shows, over time, how much of the literature a given level of statistical fluency opens up. Following the thickest line, in the 1990s someone with only the most basic skills still would have been able to make sense of 10 percent of the articles. With training through OLS regression (Level 3), this proportion rose to 30 percent; with training through logistic regression (Level 5), it rose to 60 percent. By the 2000s, these percentages had fallen to 5, 18, and 46 percent, respectively; by the 2010s, they were 4, 14, and 34 percent; and in 2019, they were 4, 12, and 31 percent. Thus, over time, the same level of training enabled students to read an ever smaller share of quantitative articles. To be able to read just over half of the articles in 2019, a student would need training through Level 7. In the 1990s, such training would have enabled a student to read 88 percent of the articles.
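The cumulative calculation behind these lines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual coding procedure: the technique labels, the four toy articles, and the Level 2 assignment for t tests are hypothetical; only the Level 1 (descriptive statistics), Level 3 (OLS regression), and Level 5 (logistic and multinomial logistic regression) assignments follow the examples given above.

```python
from collections import Counter

# Difficulty levels for a few techniques. Levels 1, 3, and 5 follow the
# examples in the text; the t-test level is a hypothetical placeholder.
LEVELS = {
    "descriptives": 1,
    "t test": 2,  # hypothetical; the text only implies it is at most Level 3
    "ols": 3,
    "logistic": 5,
    "multinomial logistic": 5,
}


def article_level(techniques):
    """Code an article by the highest level any of its techniques requires."""
    return max(LEVELS[t] for t in techniques)


def cumulative_shares(articles, max_level):
    """Share of articles fully readable with training through levels 1..max_level."""
    counts = Counter(article_level(a) for a in articles)
    n = len(articles)
    shares, running = [], 0
    for level in range(1, max_level + 1):
        running += counts[level]  # Counter returns 0 for levels no article reached
        shares.append(running / n)
    return shares


# Toy sample of four articles (invented for illustration, not the study's data).
toy = [
    ["descriptives"],
    ["t test", "ols"],
    ["ols", "multinomial logistic"],
    ["logistic"],
]
print(cumulative_shares(toy, 5))  # prints [0.25, 0.25, 0.5, 0.5, 1.0]
```

Because each article is counted once, at its highest level, the last value reaches 1.0 whenever max_level is at least the highest level any article uses, which mirrors the rightmost point of each line sitting at 100 percent.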

Figure 2. Level of difficulty of statistical techniques in four time periods.
Discussion
According to a report from the ASA, preparing our students to engage with the sociological literature is an important goal (McKinney et al. 2004). The preceding review of the statistically oriented literature, however, shows this to be a daunting task. In 15 sociology journals in 2019, nearly two-thirds of the articles were strictly quantitative in their methodology. A sociology student with statistical training through basic multiple regression would be able to comprehend fully just 12 percent of these quantitative articles. If one takes a longer view and assumes that a student is engaging with the past 20 years of statistically oriented literature (at least in the four journals I analyzed over time), this percentage rises, but only to 23.3 percent, still less than a quarter of the articles. Such findings offer an additional explanation for why some of our students question the usefulness of their statistics course: It may not give them sufficient training to engage with the current literature in their chosen field.
This final section offers several suggestions for instructors of social statistics to consider. Given the wide range of mathematical backgrounds among sociology majors, some instructors might find these recommendations dauntingly unrealistic, while other instructors may have implemented such suggestions long ago in their courses. I hope that these suggestions will prove useful in helping close our field’s quantitative literacy gap (Howery and Rodriguez 2006).
As noted in the literature review earlier, most sociology programs require only a single statistics course. Therefore, it is critical that every concept taught in these courses can be justified. Instructors should ask themselves, Why am I teaching this particular technique? Granted, some of the older techniques are foundational. As described in the findings earlier, t tests and chi-square tests remain prominent stalwarts, if only in the preliminary stages of an article’s findings. In his review of statistics textbooks, Schacht (1990) found that most books cover one-sample, independent-sample, and paired-sample t tests. Perhaps teaching one type in depth and then merely mentioning the other two would suffice, freeing up a few days of lecture for other topics. My findings show a marked decrease over time in the use of some techniques, such as ANOVA and measures of association, that were more prominent in earlier decades. In the 15 journals in 2019, only a single article used a measure of association (an article in American Journal of Sociology used Cramér’s V). An instructor who currently spends several days on various measures of association might instead teach the one they find most useful, tell students that other measures exist for other situations, and use the reclaimed lectures elsewhere.
With some days freed up, how should instructors decide what additional topics to add? Given the findings discussed earlier, I recommend that instructors, if they have not done so already, add interaction effects, logistic regression, and nonlinear effects. My longitudinal analysis found that use of these techniques has risen markedly in the past 30 years. Of the 973 techniques used in the 15 journals in 2019, these three techniques made up over a quarter (28.3 percent) of them (interactions, 120 times; logistic regression, 117 times; nonlinear effects, 38 times). Because all three are relatively straightforward extensions of ordinary regression, students might find them accessible. If the course textbook does not cover these topics, short pieces are available in print or online. For example, back in 1987, Walsh wrote a short article for Teaching Sociology called “Teaching Understanding and Interpretation of Logit Regression,” in which he states, “A grasp of logistic regression will not only assist students in their own research efforts, but it will also enable them to intelligently read and evaluate current research in their field” (Walsh 1987:178), and then provides some clear examples of the technique.
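To see why these three techniques sit so close to ordinary regression, it may help to write them side by side. The sketch below uses one common notation, with an outcome y, predictors x and z, and coefficients b; the symbols are illustrative and are not drawn from any article in the data set.

```latex
\begin{align*}
\text{OLS:}\quad & \hat{y} = b_0 + b_1 x + b_2 z \\
\text{Interaction:}\quad & \hat{y} = b_0 + b_1 x + b_2 z + b_3 (x \times z) \\
\text{Nonlinear (quadratic):}\quad & \hat{y} = b_0 + b_1 x + b_2 x^2 \\
\text{Logistic:}\quad & \ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = b_0 + b_1 x + b_2 z
\end{align*}
```

Each model keeps the familiar linear right-hand side. In the interaction model, the effect of x is b_1 + b_3 z, so it depends on the value of z; in the quadratic model, the effect of x is b_1 + 2 b_2 x, so it changes across values of x; and in the logistic model, the outcome is the log odds of p, and exp(b_1) is the odds ratio for a one-unit increase in x.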
The more advanced techniques that are prominent in the current sociological literature present a more challenging situation, for most of them require significant mathematical training to understand fully. Perhaps full understanding is not a reasonable goal, particularly for undergraduate sociology majors. One could still introduce these topics and explain how to consume such research, at least at a basic level. Several of these techniques bear at least some resemblance to OLS regression, so if a student has solid training in how to read a regression table in a journal article, they could at least partially understand such advanced results. I showed earlier how multilevel models have grown dramatically in prominence in recent sociological literature. While an instructor may not want to go into great detail about the difference between fixed effects and random effects, or how to build such models using software, they might at least want to explain to their students how to interpret a cross-level interaction effect should they encounter one. If a student has had training in logistic regression, it is not much additional work to introduce them to its ordinal and multinomial forms. These two techniques accounted for 5 percent of the techniques used in the 15 journals in 2019 (multinomial logistic, 31 times; ordinal logistic, 16 times).
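For instructors who want to show what a cross-level interaction looks like, a standard two-level random-slope model can be written as follows. This is a sketch in one common notation, assuming individuals i nested in groups j, an individual-level predictor x, and a group-level predictor w; all symbols are illustrative.

```latex
\begin{align*}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j} \\
\text{Combined:}\quad & y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} w_j
                        + \gamma_{11} \, w_j x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}
\end{align*}
```

Substituting the two group-level equations into the individual-level equation yields the combined model, in which gamma_11 is the cross-level interaction: the expected change in the effect of x for each one-unit increase in the group-level variable w. In a journal table, it can be read much like the interaction term in an ordinary regression, which is why solid training in reading regression tables carries over.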
A final recommendation is that, as a field, we should gain a better understanding of the state of our statistics courses and what goes on in them. While recent surveys covered in the literature review get us part of the way there, some vital information is lacking. I considered collecting a sample of syllabi but realized that those available on ASA’s TRAILS resources are in no way representative of any larger population of courses. I also considered analyzing contemporary social statistics textbooks, but given that instructors may not use all of the content of a given book, or may supplement the book with additional material, I decided against this, as well. While it is useful to know that most sociology programs require a statistics course, it would be extremely beneficial to know the specific content these courses cover.
Having completed this tour through the quantitative literature, I believe that I have gained a better sense of what to impart to my statistics students the next time I teach this course. I also believe it gives me justification that I can share with my students regarding why I am covering this content: By teaching them particular techniques, I am helping them become numerate citizens who can engage with the literature in their chosen field. I hope my fellow social statistics instructors find this research useful, as well.
Acknowledgements
The author would like to thank Grace Shamlian for her research assistance with the literature review, Farhang Rouhani for his advice on graphics, and the students in his statistics courses.
Editor’s Note
Reviewers for the manuscript were, in alphabetical order, Joel Best, Natalie Delia Deckard, and Laura Sanchez.
