Abstract
This study presents results from a randomized experiment in the 2015–2017 National Survey of Family Growth, where a large national sample of U.S. individuals aged 15–49 was randomly assigned to one of two different versions of a survey question about sexual identity (one with three response options, including heterosexual, gay/lesbian, and bisexual, and one adding the option “something else”). Analyses of changes in the associations of sexual identity with alcohol, tobacco, and other drug use across these treatments revealed evidence of significant differences in the associations that remained robust after adjusting for socio-demographics. The results suggest that when individuals choose their sexual identity from a more limited number of response options, the heterogeneity of the sexual identity subgroups increases, weakening estimated associations of sexual identity with these behaviors. Open-ended questions may therefore be necessary to measure sexual identity and estimate its associations with substance use behaviors accurately in surveys.
Introduction
Many studies indicate that individuals self-identifying as sexual minorities (e.g., lesbian, gay, and bisexual) are at increased risk of substance misuse, and that this increase tends to be larger for women and bisexuals (Boyd et al. 2019; Fish et al. 2018; McCabe et al. 2009; Medley et al. 2016). The literature presents a variety of theoretical explanations for these differences, including minority stress, individual- or structural-level discrimination, stigma, and victimization faced by sexual minorities (Hatzenbuehler et al. 2010; Hughes et al. 2010; Meyer 2003). The public health implications of these recent studies are critical: sexual minorities may need additional resources and personalized treatment options to prevent substance misuse and progression to substance use disorders (Boyd et al. 2019).
The majority of prior studies focusing on sexual minorities and substance use have performed secondary analyses of national survey data sets, rather than clinical assessments. The response options provided for survey questions asking about sexual identity are therefore critically important for understanding these subgroup differences in substance use behaviors. A substantial body of research has examined the limitations of survey research methods in measuring sexual identity and orientation (Baldwin et al. 2017; Bauer and Jairam 2008; Diamond 2008; Eliason et al. 2016; Eliason and Streed 2017; McCabe et al. 2012; Ridolfo et al. 2012; Savin-Williams and Ream 2007). For example, Ridolfo et al. (2012) used both qualitative and quantitative approaches to assess the validity and measurement properties of survey questions about sexual identity, such as “What sexual orientation do you consider yourself to be? (1) heterosexual, (2) gay or lesbian, (3) bisexual, (4) other, (5) don’t know.” These authors found that certain subgroups (e.g., individuals with lower socioeconomic status or Hispanics) may be more likely to have their sexual identity misclassified when using standard survey measures of sexual identity. Furthermore, survey respondents who select an “other” response option to describe their sexual identity are a diverse subgroup defined by different sexual and gender identities (including transgender identities). This is partly due to the “other” response option behaving like a “nonresponse” category, capturing individuals confused by the response options or resisting misidentification (FIWG 2016; Truman et al. 2019). These issues make meaningful comparative analyses difficult, although this option is certainly identifying unique individuals (Ridolfo et al. 2012).
Including “something else” as a response option for sexual identity questions may also lead to the misclassification of sexual identity. For example, analyzing a convenience sample of women in health clinics, Eliason and Streed (2017) found that responses of something else to a version of the sexual identity question asked in the National Health Interview Survey (NHIS) defined a heterogeneous subgroup, including 36% of individuals who indicated they were not straight but use another label, and 57% of individuals continuing to respond with something else when asked for clarification. Eliason et al. (2016) found that individuals responding with something else had unique socio-demographic, physical, and mental health characteristics. Related research on open-ended write-in identities further clarifying a response of something else has suggested that a large majority of these respondents are not in fact sexual minorities, but rather individuals refusing to answer (e.g., “none of your business”), sexual majorities recording “protest” responses (e.g., “Christian male”), or individuals conflating sexual orientation with gender identity (Bates et al. 2019; Mishel 2019; Ridolfo et al. 2012; Truman et al. 2019). Despite the small percentages of respondents who tend to select the something else category, these studies suggest that respondents self-identifying in this category are heterogeneous and unique, with identities not clearly captured by the other response options.
The research summarized above suggests that survey questions in a national study asking about sexual identity and omitting a something else response option may force individuals who do not identify as heterosexual, gay/lesbian, or bisexual to choose between the three broader options, potentially resulting in incorrect categorization (or even missing data; Jans et al. 2015). This increased heterogeneity of the subgroups defined by self-reported sexual identity may adversely affect population estimates of the associations between sexual identity and substance use behaviors. In statistical analyses, if a categorical predictor variable of interest (e.g., sexual identity) is prone to misclassification errors and defines subgroups that are more heterogeneous than they should be with correct measurement, estimates of the association of that predictor with a dependent variable of interest (e.g., substance use) will be biased and generally attenuated toward a null relationship (Savoca 2000).
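The attenuation argument above can be illustrated with a small calculation. The following Python sketch is not part of the original analysis. It assumes a simplified two-group setting (a reference group and a single minority group) with nondifferential misclassification of the group label, and every prevalence, population share, and classification rate in it is a hypothetical value chosen only for illustration.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def observed_odds_ratio(p_ref, p_min, share_min, sens, spec):
    """Odds ratio computed from misclassified group labels.

    p_ref, p_min : true outcome prevalence in the reference and minority groups
    share_min    : true population share of the minority group
    sens         : P(classified as minority | truly minority)
    spec         : P(classified as reference | truly reference)
    All arguments are hypothetical inputs for this illustration.
    """
    # Population mass landing in each observed group, split by true group.
    min_as_min = share_min * sens
    ref_as_min = (1.0 - share_min) * (1.0 - spec)
    min_as_ref = share_min * (1.0 - sens)
    ref_as_ref = (1.0 - share_min) * spec

    # Outcome prevalence in each observed group is a mixture of true groups.
    p_obs_min = (min_as_min * p_min + ref_as_min * p_ref) / (min_as_min + ref_as_min)
    p_obs_ref = (min_as_ref * p_min + ref_as_ref * p_ref) / (min_as_ref + ref_as_ref)
    return odds(p_obs_min) / odds(p_obs_ref)


# Hypothetical values: 30% vs. 45% true prevalence, 5% minority share.
true_or = odds(0.45) / odds(0.30)
obs_or = observed_odds_ratio(p_ref=0.30, p_min=0.45, share_min=0.05,
                             sens=0.90, spec=0.97)
print(f"true OR = {true_or:.2f}, observed OR = {obs_or:.2f}")
```

With these illustrative inputs, the odds ratio computed from the misclassified groups falls between 1 and the true odds ratio, which is the attenuation toward the null described above. The multi-category situation studied here is more complex, so this sketch conveys only the basic mechanism.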
For purposes of the present study, we focus on the implications of including something else as a response option for estimating the associations of sexual identity with self-reported alcohol, tobacco, and other drug use behaviors in surveys. We seek answers to an important methodological question: Are national survey estimates of the associations between sexual identity and substance use behaviors sensitive to the choices provided in sexual identity response options? To answer this question, we analyze the results of a randomized experiment embedded in a national survey of U.S. persons aged 15–49 and examine the effects of including a something else response option for sexual identity on national estimates of the associations between sexual identity and substance use.
Method
Design/Participants
The National Survey of Family Growth (NSFG) collects fertility and family formation data from a national probability sample of individuals aged 15–49 (Lepkowski et al. 2013). In this study, we analyze public-use NSFG data collected from 10,048 U.S. males and females between 2015 and 2017. In the NSFG, sex is only measured in a binary fashion (male/female) and recorded during a household screening interview to identify individuals eligible for the study. We therefore cannot analyze nonbinary or transgender individuals. Our results will also not apply to early adolescents (under the age of 15) or older adults (above the age of 49) given the NSFG age range.
Intervention
During the 2015–2017 data collection, the NSFG implemented a split-sample experiment in which they randomized each respondent to receive one of two versions of a question about sexual identity. Half of the sample received one version during audio computer-assisted self-interviewing (ACASI; see https://www.cdc.gov/nchs/data/nsfg/NSFG_2015-2017_UG_App5_QuexChanges.pdf), and half received the other. The first version of the question was unchanged from previous NSFG years, and reads as: “Do you think of yourself as…Heterosexual or straight (1), Homosexual or gay/lesbian (2), or Bisexual (3)?” The second version of the question was drawn from the NHIS, and reads as: “Which of the following best represents how you think of yourself? Lesbian/gay (1), Straight, that is, not lesbian/gay (2), Bisexual (3), or Something Else (4)?” We refer to these half-samples of respondents as treatment groups 1 and 2 (TG1 and TG2) moving forward. Write-in options were not available for TG2, so we cannot qualitatively examine the types of respondents selecting the something else response option.
While the wording of these two questions was quite similar, the NHIS version did not use the terms “heterosexual” or “homosexual” in the response options because some respondents have found these terms to be confusing (Miller and Ryan 2011; Ridolfo et al. 2012). We also note that the ordering of the responses varied in the two different versions of the questions. Response-order effects are generally much more prominent in telephone surveys, where respondents cannot see the survey questions and response options (Holbrook et al. 2003). Self-administration of these types of sensitive questions does not introduce substantial effects of response ordering (Bishop et al. 1988; Sykes and Collins 1988). The primacy effects that have been previously reported for “speeders” and those with lower education in web surveys with no interviewer present (Galesic et al. 2008; Malhotra 2008) would likely be mitigated by the interviewer presence during ACASI. We therefore had no theoretical reason to expect that the different ordering of the response options would affect our analyses. Finally, the NHIS version also contains the qualifier “that is, not lesbian/gay” in the “Straight” category. Ridolfo et al. (2012) noted that this is important, as it allows respondents to identify with “not-me identities,” constructed through a process of disidentification with an often-stigmatized group. This subtle difference may have resulted in slightly different populations identifying with the Straight category across TG1 and TG2.
Measures
The NSFG measures sexual identity using ACASI, where an NSFG interviewer provides the respondent with a laptop and headphones to ensure privacy (Copen et al. 2016). The NSFG uses this approach given the sensitivity of the ACASI questions, and to ensure comprehension of these questions for those with literacy concerns. The NSFG collects several validated measures of substance use and misuse via ACASI. Measures of misuse in the past month include binge drinking (five or more drinks within two hours for males and four or more drinks within two hours for females), frequent binge drinking (binge drinking on more than four occasions), and high-intensity drinking (10 or more drinks in one sitting for men and eight or more drinks in one sitting for women). Measures of misuse and use in the past year include binge drinking, cigarette smoking, cigarette smoking at the rate of a pack per day, marijuana use, and other illicit drug use (including cocaine, crack, and crystal meth). Measures of substance use disorders were not available in the NSFG.
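The sex-specific thresholds above can be written out as a small coding rule. The Python sketch below is illustrative only: the function name, the input layout (a list of per-occasion drink counts and a single maximum for one sitting), and the dictionary names are hypothetical and do not correspond to actual NSFG variables or questionnaire items.

```python
# Thresholds taken from the definitions in the text.
BINGE_THRESHOLD = {"male": 5, "female": 4}            # drinks within two hours
HIGH_INTENSITY_THRESHOLD = {"male": 10, "female": 8}  # drinks in one sitting


def past_month_flags(sex, drinks_two_hours_by_occasion, max_drinks_one_sitting):
    """Derive the three past-month misuse indicators for one respondent.

    sex                          : "male" or "female"
    drinks_two_hours_by_occasion : hypothetical list of drink counts, one per
                                   past-month drinking occasion
    max_drinks_one_sitting       : hypothetical largest count in one sitting
    """
    binge_occasions = sum(
        1 for drinks in drinks_two_hours_by_occasion
        if drinks >= BINGE_THRESHOLD[sex]
    )
    return {
        "binge": binge_occasions >= 1,
        # "More than four occasions" is a strict inequality.
        "frequent_binge": binge_occasions > 4,
        "high_intensity": max_drinks_one_sitting >= HIGH_INTENSITY_THRESHOLD[sex],
    }


example = past_month_flags("female", [4, 2, 5], max_drinks_one_sitting=5)
print(example)
```

The same drink count can therefore be classified differently for men and women, which is one reason the analyses are stratified by sex.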
The NSFG also measures sex and four covariates that have been found in prior studies to correlate with substance use outcomes (McCabe et al. 2009, 2018; SAMHSA 2019): age (18–24, 25–34, 35–49), race (White, Black, other), education (less than high school, high school, greater than high school), and total family income ($0–$19,999, $20,000–$34,999, $35,000–$69,999, $70,000+). Because Spanish speakers tend to struggle with sexual identity questions, as the concept of “straight” does not resonate culturally (Ridolfo et al. 2012), we also include an indicator of Hispanic ethnicity (yes, no) in our analyses.
Analytic Approach
All analyses performed in this study were design-based, using survey weights to compute estimates of population parameters and accounting for the complex sampling features of the NSFG when estimating standard errors. All bivariate tests of associations between categorical measures employed design-adjusted Rao-Scott tests, logistic regression models were fitted using the pseudo maximum likelihood estimation method for complex samples, and design-adjusted subpopulation analyses and goodness of fit tests were employed when appropriate (Heeringa et al. 2017). We stratified all analyses by sex, given consistent evidence of larger increases in risk for sexual minority women and to allow for comparisons to other national studies. In the multivariable analyses, very small fractions of cases were dropped due to missing data on the sexual identity and substance use items, so no adjustments for item-missing data were performed. All analyses were performed in Stata (Version 16.1; StataCorp LLC, College Station, Texas).
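To make the idea of a design-based estimate concrete, the following Python sketch computes a weighted prevalence and a Taylor-linearized standard error that treats primary sampling units (PSUs) as sampled with replacement within strata. This is one common approach and is an assumption here; it is not necessarily the exact estimator applied in Stata for this study. The record layout and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_prevalence(records):
    """Weighted proportion with a linearized, design-based standard error.

    records: iterable of (stratum, psu, weight, y) tuples with y in {0, 1}.
    Returns (estimate, standard_error).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p_hat = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized scores w * (y - p_hat) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - p_hat) / total_w

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_h = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in psus.values())
    return p_hat, sqrt(variance)


# Hypothetical toy data: two strata, two PSUs each.
toy = [
    (1, 1, 100, 1), (1, 1, 100, 0), (1, 2, 50, 1), (1, 2, 50, 1),
    (2, 1, 80, 0), (2, 1, 80, 0), (2, 2, 120, 1), (2, 2, 120, 0),
]
estimate, std_error = weighted_prevalence(toy)
print(f"prevalence = {estimate:.3f}, SE = {std_error:.3f}")
```

The point of the sketch is that the standard error depends on variation among PSU-level totals within strata rather than on variation among individual respondents, which is why ignoring the sample design can misstate precision.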
We first assessed the comparability of the two treatment groups by comparing the estimated population distributions for the covariates and the substance use measures based on these groups. We then performed simple descriptive comparisons of the weighted response distributions for the sexual identity items in each treatment group. Next, we performed a series of bivariate analyses, examining distributions of the covariates and the substance use outcomes for each of the sexual identity subgroups in each treatment group. Finally, we fitted logistic regression models to each treatment group’s data to predict the various binary substance use outcomes with the two different measures of sexual identity and the five covariates. We formally compared the subgroup differences between the two treatment groups by fitting supplemental models to the full sample and testing an interaction between sexual identity and the treatment group indicator. In these supplemental analyses of interactions, we dropped respondents indicating “something else” in TG2, enabling a comparison across the treatment groups of estimated differences in substance use behaviors between the more commonly endorsed sexual identities (straight, gay/lesbian, and bisexual).
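The logic of the interaction test can be sketched in a stripped-down form. In a logistic model containing only a subgroup indicator, a treatment group indicator, and their product, the interaction coefficient equals the log of the ratio of the two group-specific odds ratios. The Python sketch below computes that ratio and a Wald test from cell counts. It is unweighted, unadjusted for covariates, and ignores the complex sample design, so it is not the analysis reported here, and all counts are hypothetical.

```python
from math import erfc, exp, log, sqrt


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its large-sample variance from a 2x2 table.

    a, b: outcome yes / no counts in the comparison subgroup
    c, d: outcome yes / no counts in the reference subgroup
    """
    estimate = log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance


def interaction_test(table_tg1, table_tg2):
    """Ratio of odds ratios (TG2 over TG1) with a two-sided Wald test."""
    est1, var1 = log_odds_ratio(*table_tg1)
    est2, var2 = log_odds_ratio(*table_tg2)
    diff = est2 - est1                 # interaction on the log-odds scale
    z = diff / sqrt(var1 + var2)       # the two samples are independent
    p_value = erfc(abs(z) / sqrt(2))   # two-sided normal p value
    return exp(diff), z, p_value


# Hypothetical counts: (yes, no) for the comparison subgroup, then the reference.
ratio, z_stat, p_val = interaction_test((40, 60, 300, 700), (50, 50, 300, 700))
print(f"ratio of odds ratios = {ratio:.2f}, z = {z_stat:.2f}, P = {p_val:.3f}")
```

A ratio of odds ratios above 1 would indicate a larger subgroup difference under the four-option question than under the three-option question.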
Results
Comparability of the Treatment Groups
Supplemental Table A presents comparisons of the estimated population distributions for the covariates and the substance use measures across the two treatment groups. This table shows that the populations represented by the two samples were nearly identical for men and women and provides no evidence of confounding factors that would affect comparisons of the associations of sexual identity with substance use across the treatment groups.
Differences in Response Distributions for Sexual Identity
Table 1 presents descriptive results for the two treatment groups, representing estimated sexual identity distributions for the NSFG target population (persons between the ages of 15 and 49). The question with four response options (including something else) slightly reduced the estimated prevalence of the “heterosexual, straight” category, consistent with the literature. Among women, there were more respondents in the something else category than in the “lesbian” category in TG2, and the probability of identifying as bisexual was twice as high as the probability of identifying as something else. Among men, the counts of respondents indicating “bisexual” or something else in TG2 were much more comparable. Overall, Table 1 suggests that a non-negligible number of individuals in this population would identify as something else if given the option.
Table 1. Estimated Prevalence of Sexual Identity Subgroups by Treatment Group (NSFG, 2015–2017).
Bivariate Associations for Women and Men
Supplemental Tables B and C show clear differences in age, race/ethnicity, education, and income distributions across the sexual identity subgroups. For example, higher percentages of women identifying as bisexual are in the youngest age category, but there seems to be an even age distribution among those women identifying as something else. In addition, relatively large fractions of women and men identifying as something else report Hispanic ethnicity. Among men, there is strong evidence of associations involving race/ethnicity and income. These associations introduce a need to adjust for these characteristics in multivariable analyses comparing the identity subgroups.
Table 2 presents consistent evidence of attenuated associations between sexual identity and substance use for women in TG1. For example, when considering any past-month binge drinking, an analysis comparing heterosexual, lesbian, and bisexual women based on TG1 concluded that there were no differences in the prevalence of this behavior among the three subgroups based on a Rao-Scott test (P = 0.13). However, an analysis based on TG2 suggested a strong association between sexual identity and past-month binge drinking (P < 0.01), with an estimated 47% of bisexual women engaging in this behavior compared to only 23% of lesbian women and 29% of heterosexual women. For all eight outcomes, the subgroup differences were attenuated for TG1, and for four outcomes, the directions of the estimated bisexual versus lesbian differences flipped across the treatment groups.
Table 2. Estimated Distributions of Substance Use Outcomes as a Function of Sexual Identity, by Treatment Group (Women; NSFG, 2015–2017).
Abbreviations: PY, past year; PM, past month. P values indicate results of design-adjusted Rao-Scott tests of associations.
Table 3 shows that men identifying as something else were found to be unique in terms of their substance use (in this case having more prevalent substance use), and differences among the subgroups tended to be larger when analyzing data from TG2. For example, there was no evidence of an association between sexual identity and past-year binge drinking when analyzing data from TG1. However, when analyzing TG2, there was a significant association (P = 0.01). Nearly 73% of respondents identifying as something else indicated past-year binge drinking, as opposed to only 31% of bisexual males. In TG1, an estimated 57% of bisexual males indicated past-year binge drinking, and this estimate was higher than that of gay men, whereas analyses suggested that bisexual men have a lower rate of this behavior than gay men when four response options are provided. For five of the eight substance use outcomes, the effect size associated with the subgroup differences was reduced in TG1, and for three of the eight outcomes, the direction of the estimated difference between gay and bisexual males flipped.
Table 3. Estimated Distributions of Substance Use Outcomes as a Function of Sexual Identity, by Treatment Group (Men; NSFG, 2015–2017).
Abbreviations: PY, past year; PM, past month. P values indicate results of design-adjusted Rao-Scott tests of associations.
Multivariable Modeling Results
Table 4 shows that bisexual women had significantly higher adjusted odds of nearly all substance use outcomes relative to heterosexual women when analyzing data from TG2. Consistent with the bivariate associations, these adjusted odds ratios were consistently larger than the adjusted odds ratios seen for TG1, where some of the same odds ratios were no longer significant, or only significant at the 0.05 level. For multiple outcomes, the estimated odds ratios based on TG1 suggested higher odds of substance use for lesbian women than for bisexual women. The analyses for TG2 suggested the opposite: Bisexual women were estimated to have higher odds of substance use. Indeed, when removing the something else respondents and testing interactions between treatment group and the sexual identity subgroups, we found evidence of significant (P < 0.05) changes in these associations for past-year cigarette use and past-year use of other drugs. These results provide additional evidence of the attenuation of these subgroup differences when forcing respondents to indicate their sexual identity with only three options.
Table 4. Estimates of Adjusted Odds Ratios in Logistic Regression Models for Selected Substance Use Outcomes, by Treatment Group (Women; NSFG, 2015–2017).
Abbreviations: TG1, three-category version of sexual identity measure; TG2, four-category version of sexual identity measure; GOF test, Design-adjusted goodness of fit test; Ref, Reference category. Estimated odds ratios for Age, Race, Hispanic Ethnicity, Education, and Family Income not shown. * P < 0.05, ** P < 0.01, *** P < 0.001.
Finally, Table 5 presents results from our multivariable models for males, for those outcomes where there was a significant association between sexual identity subgroup and the substance use outcome. The something else subgroup consistently had the highest adjusted odds of substance use, with P < 0.05 for past-year binge drinking and P < 0.001 for past-year marijuana use. Moreover, the adjusted odds ratios for bisexual individuals consistently changed from greater than 1 for TG1 to less than 1 for TG2. These results were also consistent with the bivariate associations. When testing the interactions based on the full sample, we found evidence of a significant (P < 0.05) change in the association for past-year marijuana use.
Table 5. Estimates of Adjusted Odds Ratios in Logistic Regression Models for Selected Substance Use Outcomes, by Treatment Group (Men; NSFG, 2015–2017).
Abbreviations: TG1, three-category version of sexual identity measure; TG2, four-category version of sexual identity measure; GOF test, Design-adjusted goodness of fit test; Ref, Reference category. Estimated odds ratios for Age, Race, Hispanic Ethnicity, Education, and Family Income not shown. * P < 0.05, ** P < 0.01, *** P < 0.001.
Discussion
We have presented empirical evidence that the response options provided in a survey question about sexual identity can significantly affect estimated relationships between sexual identity and substance use. Our results provide striking evidence of weakened associations between sexual identity and substance use when only three response options are provided for sexual identity. In this situation, respondents who would have selected something else had it been offered, and who may have unique behaviors in terms of self-reported substance use, are forced to choose among the three options. This may make the three subgroups more heterogeneous and ultimately attenuate differences between the subgroups.
Researchers may avoid providing additional response options or including a something else category because including them can make a dataset more challenging to manage and/or analyze. However, they do so to the detriment of understanding more clearly the actual relationships between these identity groups (of individuals who actually identify with these labels) and their outcomes of interest. Inaccurate estimation of differences in outcomes between these identity subgroups could have wide-ranging implications for public health, including the accurate identification of high- and low-risk subgroups, the development of interventions, and the shaping of policies and laws.
NSFG respondents were not asked both types of questions, so we cannot infer how the same respondents may have answered when given different response options and what effects those different choices may have had on the associations studied here. Our conclusions in this study are limited to comparisons of the associations of sexual identity with substance use and misuse behaviors in the balanced and mutually exclusive treatment groups defined by the randomized experiment.
The absence of a fourth response option may have changed the compositions of the subgroups, potentially complicating the interpretation and comparisons of odds ratios with heterosexual as a reference category. We therefore also examined marginal predicted probabilities of alcohol, tobacco, and other drug use behaviors across the subgroups based on the fitted models in Figures 1 and 2. We explicitly did not include the something else category in these figures, as our objective was to illustrate predicted differences in the rates of these behaviors for the most common sexual identity subgroups, depending on whether something else was provided as a response option. These figures further underscore the changes in inference regarding differences across these subgroups that may arise depending on the response options provided.
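As a rough illustration of what a marginal predicted probability is, the Python sketch below averages model-based predictions over a covariate distribution while setting every case to the same sexual identity subgroup. This is one common definition and is assumed here rather than taken from the study's software output. The coefficients, weights, and covariate values are hypothetical and are not estimates from the fitted models.

```python
from math import exp


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-x))


def marginal_probability(intercept, identity_coef, covariate_coefs, rows):
    """Weighted average prediction with every case assigned to one subgroup.

    identity_coef   : log-odds coefficient for the subgroup (0.0 for reference)
    covariate_coefs : log-odds coefficients for the covariates
    rows            : list of (weight, covariate_values) pairs
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for weight, values in rows:
        linear = intercept + identity_coef
        linear += sum(b * v for b, v in zip(covariate_coefs, values))
        weighted_sum += weight * inv_logit(linear)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical coefficients and a tiny hypothetical covariate distribution.
rows = [(1.0, [1.0, 0.0]), (2.0, [0.0, 1.0]), (1.5, [0.0, 0.0])]
coefs = [0.4, -0.3]
p_reference = marginal_probability(-1.0, 0.0, coefs, rows)
p_subgroup = marginal_probability(-1.0, 0.7, coefs, rows)
print(f"reference = {p_reference:.3f}, subgroup = {p_subgroup:.3f}")
```

Because the covariate distribution is held fixed, differences between such probabilities reflect only the subgroup coefficient, which makes them easier to compare across treatment groups than odds ratios that share a shifting reference category.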

Figure 1. Marginal predicted probabilities of selected substance use outcomes for sexual identity subgroups, by treatment group (three-cat = TG1, four-cat = TG2, Women; NSFG, 2015–2017).

Figure 2. Marginal predicted probabilities of selected substance use outcomes for sexual identity subgroups, by treatment group (three-cat = TG1, four-cat = TG2, Men; NSFG, 2015–2017).
Survey questions about sexual identity can be asked in many ways, with some versions being easier to understand for certain sociodemographic populations (FIWG 2016). A simple one-item quantitative measure of sexual identity, regardless of the number of response options, will always be a limited means of capturing respondents’ actual sexual identity. Our results suggest that either allowing for more response options (McCabe et al. 2012; Vrangalova and Savin-Williams 2012) or asking about sexual identity in an open-ended fashion and then coding the qualitative data may create a more accurate understanding of the differences between these subgroups in terms of substance use and misuse. We also note that using survey measures in national cross-sectional studies to assess sexual identity assumes that sexual identity is a stable concept, despite evidence for considerable changes in sexual identity over time or sexual fluidity (Diamond 2008; Savin-Williams and Ream 2007), especially in young adults (Katz-Wise 2015) and women (Rosario et al. 2006; Savin-Williams and Diamond 2000).
Our findings offer compelling evidence that researchers should exercise caution when comparing health studies that use different measures to assess sexual identity, as these studies could produce substantially different conclusions about health disparities. Several studies offer support for a more open-ended measurement of sexual identity, and one that may lend itself to text analysis procedures in attempting to form relevant and homogeneous subgroups of individuals for eventual comparative analyses (Korchmaros et al. 2013; Talley and Stevens 2017). While we do not specifically endorse the use of something else as a response option for sexual identity, this study has demonstrated that when this option is not available, estimated profiles of the sexual identity subgroups can change substantially.
Supplemental Material
Supplemental Material, sj-docx-1-fmx-10.1177_1525822x21998516 for Choices Matter: How Response Options for Survey Questions about Sexual Identity Affect Population Estimates of Its Association with Alcohol, Tobacco, and Other Drug Use by Brady T. West and Sean Esteban McCabe in Field Methods
Supplemental Material, sj-docx-2-fmx-10.1177_1525822x21998516 for Choices Matter: How Response Options for Survey Questions about Sexual Identity Affect Population Estimates of Its Association with Alcohol, Tobacco, and Other Drug Use by Brady T. West and Sean Esteban McCabe in Field Methods
Supplemental Material, sj-docx-3-fmx-10.1177_1525822x21998516 for Choices Matter: How Response Options for Survey Questions about Sexual Identity Affect Population Estimates of Its Association with Alcohol, Tobacco, and Other Drug Use by Brady T. West and Sean Esteban McCabe in Field Methods
Footnotes
Authors’ Note
The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.
Acknowledgment
The authors thank Kate Leary for her help proofreading and editing earlier versions of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this research was provided by research grants R01AA025684, R01CA203809, R01CA212517, 1R01HD095920-01, R01DA031160, R01DA036541, and R01DA043696 from the National Institute on Alcohol Abuse and Alcoholism, National Cancer Institute, National Institute of Child Health and Human Development, and National Institute on Drug Abuse.
Supplemental Material
The supplemental material for this article is available online.
References
