Abstract
Previous research on the latent structure of sexual orientation has returned conflicting results, with some studies finding a dimensional structure (i.e., ranging quantitatively along a spectrum) and others a taxonic structure (i.e., categories of individuals with distinct orientations). The current study used a sample (N = 33,525) from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). A series of taxometric analyses were conducted using three indicators of sexual orientation: identity, behavior, and attraction. These analyses, performed separately for women and men, revealed low-base-rate same-sex-oriented taxa for men (base rate = 3.0%) and women (base rate = 2.7%). Generally, taxon membership conferred an increased risk for psychiatric and substance-use disorders. Although taxa were present for men and women, women demonstrated greater sexual fluidity, such that any level of same-sex sexuality conferred taxon membership for men but not for women.
Sexual orientation, like most constructs within psychology, is not directly observable, which has contributed to enduring controversy about whether homosexuality describes a discrete class of individuals. Although sexual orientation has long been equated with self-identification categories (heterosexual, bisexual, homosexual), many researchers have instead argued that sexual orientation lies on a continuum (Kinsey, Pomeroy, & Martin, 1948; Savin-Williams, 2014).
Whether sexual orientation is presumed to exist on a continuum or as discrete categories (i.e., to be taxonic) has implications for theories of and research on psychosexual development. With respect to etiology, a taxonic structure would be consistent with models that identify either a single causal factor (e.g., an X-linked gene; Sanders et al., 2015) or a tipping-point model in which multiple factors combine or a single factor (e.g., prenatal androgen exposure) surpasses a threshold to yield a qualitative transformation (Hines, 2011). A dimensional latent structure would be most consistent with models that posit the contribution of multiple causal factors, each adding incrementally to the outcome. There is further debate about whether the structure of sexual orientation differs for men and women (e.g., Bailey, 2009), which is a possibility consistent with research suggesting that female sexuality is more fluid than male sexuality (Baumeister, 2000; Diamond, 2007).
Most of the debate about how to conceptualize sexual orientation has occurred at the manifest-variable level—typically based on self-reports to single questions. However, even if individual indicators exist on a continuum, this does not mean the construct of sexual orientation is likewise dimensional, nor do responses to dichotomous questions prove that sexual orientation is a categorical construct. Taxometric methods are a set of nonredundant statistical procedures developed specifically to determine the underlying structure of data, thus addressing the question of whether constructs are dimensional or taxonic (Waller & Meehl, 1998). An additional benefit of conducting taxometric analyses is that these procedures provide base-rate estimates of the taxon if the construct proves to be taxonic. Prior attempts to estimate the prevalence of homosexuality have been controversial because they have typically relied on responses to a single self-identification item, and the rates have varied depending on how the item was worded (e.g., Gates, 2011). Therefore, if sexual orientation is taxonic, the current study may provide a more accurate measure of the base rate of homosexuality.
Two studies have used taxometric procedures to examine the structure of constructs associated with sexual orientation: Whereas Haslam (1997) utilized a male-only sample and concluded that sexual orientation was dimensional, Gangestad, Bailey, and Martin (2000) used a mixed-gender sample and concluded that the construct was categorical. However, these findings may be somewhat ambiguous regarding sexual orientation because of the primacy of gender measures used in their analysis. For example, Haslam utilized only the Masculinity-Femininity Scale of the Minnesota Multiphasic Personality Inventory–2 to determine the structure of sexual orientation. The Masculinity-Femininity Scale reflects gender typicality and does not directly assess sexual orientation (e.g., Wong, 1984), and Rees-Turyn, Doyle, Holland, and Root (2008) criticized Haslam’s study for relying on this measure as an indirect indicator of sexuality.
Gangestad and colleagues (2000) analyzed a sample of nearly 5,000 individuals in the Australian Twin Registry using the 7-point Kinsey Scale and measures of gender nonconformity and identity. Although Gangestad et al. utilized a measure of sexual orientation, they found that roughly half of the identified taxon members were exclusively heterosexual, and the taxon was more defined by the gender measures. Thus, there have not been any taxometric studies that directly assessed the latent structure of sexual orientation using indicators focused exclusively on sexual orientation.
If a construct is found to be taxonic, it is also necessary to begin to establish the construct validity of this putative taxon by identifying its associations with external correlates (Waldman & Lilienfeld, 2001). Individuals who report nonheterosexual identities, behavior, and attractions are more likely than heterosexual individuals to meet criteria for a psychiatric disorder (e.g., Bostwick, Boyd, Hughes, & McCabe, 2010). Therefore, we would expect taxon members to have poorer mental health. In the current study, we built on the previous studies by conducting a taxometric analysis using data from a much larger (N = 33,525), nationally representative sample of adults (National Institute on Alcohol Abuse and Alcoholism, or NIAAA, 2010); advanced statistical and methodological approaches to taxometric analyses (Ruscio, Walters, Marcus, & Kaczetow, 2010); and widely accepted indicators of sexual orientation (sexual behavior, attraction, and identity; Saewyc, 2011; Savin-Williams, 2006).
Method
The present study utilized the Wave 2 (2004–2005) National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data set of 35,653 adults in the United States (NIAAA, 2010). The survey was conducted through personal interviews with one randomly selected adult in each household. Non-Caucasian households were oversampled to ensure adequate representation of African American and Hispanic individuals, who comprised 19.1% and 19.3% of the final sample, respectively. Because taxometric methods require complete data, participants with missing data on the sexual-orientation variables (identity, attraction, behavior) or without sexual experience (n = 583) were excluded from further analysis, which yielded a sample size of 33,525 participants, 58% of whom were female, and 42% of whom were male.
Measures
The NESARC data included three indicators of sexual orientation: attraction (“Which category best describes your feelings on sexual attraction to others?”), identity (“Which of the categories best describes you?”), and behavior (“In your lifetime, have you had sex with. . .”). On the basis of the gendered nature of the attraction and behavior variables (e.g., “only attracted to males”), we recoded these separately for men and women to reflect sexual orientation. Responses on all three indicators were ordered from “exclusively other sex” to “exclusively same sex,” with levels of both-sex attraction, identity, and behavior ordered in the same manner for consistency (e.g., attraction: exclusively other sex, mostly other sex, both sexes equally, mostly same sex, exclusively same sex). Thus, higher scores indicated greater same-sex sexuality, and lower scores indicated heterosexuality (sum range: 3–11).
The survey also included measures of mental health. Psychiatric disorders were assessed using criteria from the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM–IV; American Psychiatric Association, 1994) and coded as dichotomous diagnostic variables for lifetime mood disorders, anxiety disorders, and personality disorders (antisocial, borderline, narcissistic). Problematic substance use was indicated by past-year and lifetime abuse or dependence criteria. Lifetime criteria were used in the present analyses.
Procedure
Three taxometric procedures—means above minus below a sliding cut (MAMBAC; Meehl & Yonce, 1994), maximum covariance (MAXCOV; Meehl & Yonce, 1996), and latent-mode analysis (L-Mode; Waller & Meehl, 1998)—were conducted. Separate analyses were conducted for women (n = 19,407) and men (n = 14,118). Using a minimum of two indicators, MAMBAC plots the input indicator along the x-axis. At specified cuts along the x-axis, the means of the scores on the output indicator are calculated for those cases above and below the cut, from which a mean difference score is plotted along the y-axis. Taxonic constructs yield graphs that appear as an upside-down U. We used 1,000 cut points and examined each indicator combination independently, which resulted in six curves.
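The MAMBAC computation described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the program used in the study: the function names, the 50 cuts, the 25 cases reserved at each end, and the simulated base rate and separation are all assumptions chosen for the example.

```python
import random
from statistics import mean


def simulate_taxonic(n=2000, base_rate=0.10, separation=3.0, seed=0):
    """Simulate two indicators for a two-class (taxonic) population.

    Hypothetical data: taxon members are shifted by `separation` SDs on both
    indicators, and the indicators are independent within each class.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        shift = separation if rng.random() < base_rate else 0.0
        xs.append(rng.gauss(shift, 1.0))
        ys.append(rng.gauss(shift, 1.0))
    return xs, ys


def mambac_curve(input_scores, output_scores, n_cuts=50, min_end=25):
    """Mean of the output indicator above each cut minus the mean below it."""
    # Order the output scores by the input indicator.
    pairs = sorted(zip(input_scores, output_scores), key=lambda p: p[0])
    ordered = [out for _, out in pairs]
    n = len(ordered)
    # Evenly spaced cuts, keeping at least `min_end` cases on each side.
    cuts = [min_end + round(i * (n - 2 * min_end) / (n_cuts - 1))
            for i in range(n_cuts)]
    return [mean(ordered[c:]) - mean(ordered[:c]) for c in cuts]


x, y = simulate_taxonic()
curve = mambac_curve(x, y)
peak = curve.index(max(curve))  # position of the peak along the sliding cut
print(f"peak at cut {peak + 1} of {len(curve)}")
```

With a low simulated base rate, the largest mean difference falls toward the right-hand end of the curve, which is the right-side cusp that the text associates with a low-base-rate taxon.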
MAXCOV calculates the covariance of two output indicators using overlapping intervals, or windows, of the input indicator, thus requiring three indicators. If a construct is taxonic, the maximum covariance should occur in the window that contains roughly half taxon and half complement members. Consequently, taxonic MAXCOV graphs should appear humped or peaked rather than flat. We used 1,000 windows with .90 overlap (140 and 192 cases were captured by each window in the male and female samples, respectively). With three indicators, the present MAXCOV analyses yielded three curves. L-Mode calculates the first principal factor of the supplied indicators and then plots participants’ scores on this factor, which results in one density plot. A categorical structure results in a bimodal plot corresponding to two distinct categories, whereas dimensional data result in a unimodal plot.
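The window sizes reported for MAXCOV follow from simple arithmetic. The sketch below assumes that each window shares a fixed proportion of its cases with the next window and that the windows together span the whole sample; the function name is hypothetical, and the formula is an assumption that happens to reproduce the reported figures.

```python
def cases_per_window(n_cases, n_windows, overlap):
    """Cases per window when consecutive windows share `overlap` of their cases.

    If each window holds w cases and advances by w * (1 - overlap) cases, then
    n_windows windows span w * (1 + (n_windows - 1) * (1 - overlap)) cases.
    """
    return n_cases / (1 + (n_windows - 1) * (1 - overlap))


men_window = round(cases_per_window(14118, 1000, 0.90))
women_window = round(cases_per_window(19407, 1000, 0.90))
print(men_window, women_window)  # 140 192
```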
The properties of a given data set (e.g., skew) can influence the appearance of taxometric curves. Same-sex attraction, behavior, and identity are highly positively skewed variables, and low-base-rate taxa can be difficult to detect using visual inspection of taxometric results (Ruscio & Marcus, 2007). Using Ruscio’s (2014) R package for taxometric analyses, we created 100 comparison dimensional and taxonic data sets from the existing data set. In this procedure, simulated data sets that reproduce essential features of the actual data (e.g., variance, skew) but vary whether the data reflect a taxonic or dimensional construct are used to produce taxometric graphs. These graphs are compared with the graphs yielded by the actual data. Comparison curve fit indices (CCFIs) were calculated by examining the extent to which the actual taxometric graphs were similar to the dimensional and taxonic comparison data. CCFI values range from 0 (supporting dimensional structure) to 1.0 (supporting taxonic structure), with values near .50 being ambiguous. Monte Carlo studies (Ruscio et al., 2010) have found that when the three statistically independent taxometric procedures used in the current study yield an average CCFI less than .40 or greater than .60, or when all three CCFIs are less than .45 or greater than .55, the latent structure is correctly identified 99.9% of the time.
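The text does not give the CCFI formula. As the index is commonly defined in Ruscio and colleagues' work, it is the root-mean-square residual (RMSR) between the empirical curve and the averaged dimensional comparison curve, divided by the sum of that residual and the corresponding residual for the taxonic comparison curve. The sketch below assumes that definition; the function names and the three-point curves are hypothetical.

```python
from math import sqrt


def rmsr(observed, comparison):
    """Root-mean-square residual between two curves of equal length."""
    squared = [(o - c) ** 2 for o, c in zip(observed, comparison)]
    return sqrt(sum(squared) / len(squared))


def ccfi(observed, dimensional_curve, taxonic_curve):
    """Comparison curve fit index: 0 = dimensional fit, 1 = taxonic fit."""
    fit_dim = rmsr(observed, dimensional_curve)
    fit_tax = rmsr(observed, taxonic_curve)
    # A large misfit to the dimensional curve relative to the taxonic curve
    # pushes the index toward 1.
    return fit_dim / (fit_dim + fit_tax)


# Hypothetical curves: the observed curve is equally far from both.
observed = [1.0, 2.0, 3.0]
dimensional = [0.0, 1.0, 2.0]
taxonic = [2.0, 3.0, 4.0]
print(ccfi(observed, dimensional, taxonic))  # 0.5
```

An observed curve that coincides with the taxonic comparison curve gives a CCFI of 1, and one that coincides with the dimensional comparison curve gives 0.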
Base rates were estimated directly from the data because existing research has suggested a widely discrepant prevalence of nonheterosexuality based on the indicators used. For example, strict self-identification results in the lowest prevalence (1.7%–5.6%) and attraction the highest prevalence (11%; Gates, 2011). Although no taxometric analyses have been conducted with these indicators, latent class analyses suggest that somewhere between 1.5% and 6.7% of men and 1.6% and 11.3% of women show some level of persistent nonheterosexuality in identity, behavior, and attraction (Fergusson, Horwood, Ridder, & Beautrais, 2005; Talley, Sher, Steinley, Wood, & Littlefield, 2012). Monte Carlo research indicates that the values generated using the CCFI method along with L-Mode can yield especially precise estimates of the taxon base rate (Ruscio & Walters, 2009). MAMBAC and MAXCOV procedures generate estimated base rates directly from the data, whereas L-Mode can be utilized to estimate the base rate through a method of successive approximations in which L-Mode is performed at various supplied base rates until the largest CCFI value is achieved.
Base rates were estimated separately for women and men from the MAMBAC, MAXCOV, and L-Mode analyses and averaged. For women, the average base rate used in subsequent analyses was .027 (MAMBAC = .026, MAXCOV = .024, L-Mode = .030). The average base rate for men was .030 (MAMBAC = .035, MAXCOV = .028, L-Mode = .028). These estimated base rates were then used to generate class assignment using Ruscio’s program. The number of individuals assigned to the taxon using the base-rate classification technique was slightly less than the derived estimates because tied cases on the composite of the three indicators were assigned either to the taxon or to the complement depending on which most closely approximated the specified taxon base rate (Ruscio, 2014).
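The averaging of the three base-rate estimates, and the way tied composite scores can pull the assigned taxon below the target base rate, can be illustrated as follows. The averaging uses the values reported above. The `assign_taxon` function and the composite scores are hypothetical and only approximate the tie handling described in the text; they are not Ruscio's implementation.

```python
from statistics import mean

estimates = {
    "women": {"MAMBAC": 0.026, "MAXCOV": 0.024, "L-Mode": 0.030},
    "men": {"MAMBAC": 0.035, "MAXCOV": 0.028, "L-Mode": 0.028},
}
average = {group: round(mean(vals.values()), 3)
           for group, vals in estimates.items()}
print(average)  # {'women': 0.027, 'men': 0.03}


def assign_taxon(composites, base_rate):
    """Label the highest-scoring cases as taxon members, keeping ties together.

    Every distinct composite score is tried as a cutoff; the cutoff whose
    taxon size is closest to base_rate * n is kept.
    """
    target = base_rate * len(composites)
    cutoffs = sorted(set(composites), reverse=True)
    best = min(cutoffs,
               key=lambda cut: abs(sum(s >= cut for s in composites) - target))
    return [s >= best for s in composites]


# Hypothetical composites: 100 cases, target taxon size 5.
scores = [3] * 91 + [9] * 5 + [11] * 4
labels = assign_taxon(scores, base_rate=0.05)
print(sum(labels))  # 4: taking the tied block at 9 would overshoot to 9 cases
```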
Results
Accurate taxometric conclusions depend on the use of valid, distinct indicators that can distinguish between cases that belong to the taxon and complement classes, with at least 1.25 standard-deviation units difference between the two classes (Meehl, 1995). Our indicators far exceeded this requirement, with d values ranging from 6.1 to 10.6 in the male sample and 6.3 to 8.2 in the female sample. Further, the indicators demonstrated high correlations in the overall sample but little nuisance covariance (Waller & Meehl, 1998); they were not highly correlated within the taxon and complement classes (Table 1).
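Indicator validity in this sense is a standardized mean difference between the two classes. The sketch below assumes the difference is scaled by the pooled within-class standard deviation, which the text does not state; the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def indicator_validity(taxon_scores, complement_scores):
    """Standardized taxon-complement mean difference (pooled within-class SD)."""
    n_t, n_c = len(taxon_scores), len(complement_scores)
    pooled_var = ((n_t - 1) * variance(taxon_scores)
                  + (n_c - 1) * variance(complement_scores)) / (n_t + n_c - 2)
    return (mean(taxon_scores) - mean(complement_scores)) / sqrt(pooled_var)


# Hypothetical scores on a 5-point attraction indicator.
d = indicator_validity([4, 5, 5, 4], [1, 1, 2, 1, 1, 2])
print(d > 1.25)  # True: exceeds the 1.25 SD guideline cited above
```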
Table 1. Results of the Taxometric Analyses Examining the Latent Structure of Sexual Orientation
Note: Base rate = base rate of homosexuality (taxon class); skew = average skew of the indicators; d = validity of the indicator in standard-deviation units (the range for this variable is given in parentheses); r = average correlation among the indicators in the full sample; rc = average correlation among the indicators (nuisance) within the putative complement class; rt = average correlation among the indicators (nuisance) within the putative taxon class. MAMBAC = means above minus below a sliding cut; MAXCOV = maximum covariance; L-Mode = latent-mode factor analysis.
Taxometric analyses
The average taxometric graphs along with the simulated graphs for each procedure are presented in Figures 1 through 3. For MAMBAC, curves with a cusp on the right side of the graph, such as those we obtained, are typically indicative of a low-base-rate taxon. Humped or peaked MAXCOV graphs are typically consistent with taxonic data. For L-Mode, taxonic constructs yield bimodal curves, whereas dimensional constructs are unimodal, and the present analyses revealed bimodal curves. Most important, the actual data were more similar to the simulated taxonic data than to the simulated dimensional data.

Fig. 1. Composite curves from the analysis of means above minus below a sliding cut (MAMBAC) for male (top) and female (bottom) samples, separately for categorical and dimensional simulation data. The thick line in each graph presents results for the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data, and the thin lines show the minimum and maximum values from the simulations. The shaded gray bands visible at the upper right in the graphs show the middle 50% of the values for all simulated data sets.

Fig. 2. Composite curves for the maximum covariance (MAXCOV) analyses for men (top) and women (bottom), separately for categorical and dimensional simulation data. The data were ordered along the x-axis by the standardized scores on the input indicator and then grouped into 1,000 subsamples using overlapping windows (.90 overlap). The y-axis shows the covariance between the two output indicators for each subsample (i.e., window). The line with dots (data points) in each graph presents results for the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data, and the smooth lines show the minimum and maximum values from the simulations. The shaded gray band shows the middle 50% of the values for all simulated data sets.

Fig. 3. Results of the latent-mode analysis (L-Mode) for men (top) and women (bottom), separately for categorical and dimensional simulation data. The x-axis shows the scores on the first factor of a factor analysis of the indicators, and the y-axis shows the relative frequency of each score. The thick line in each graph presents results for the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data, and the thin lines show the minimum and maximum values from the simulations. The shaded gray band shows the middle 50% of the values for all simulated data sets.
This greater similarity in the graphs was quantified by the CCFI values. In the male sample, MAMBAC (.708), MAXCOV (.727), and L-Mode (.601) were consistently above the .55 threshold indicative of a taxonic structure (M = .679). For the female sample, MAMBAC (.656) and MAXCOV (.718) were clearly taxonic, and L-Mode was on the threshold (.543). The average of the three procedures was above the threshold (M = .639).
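The decision thresholds from the Monte Carlo work cited in the Procedure section can be applied directly to the reported CCFI values. The function below is a hypothetical encoding of those thresholds, offered as a sketch rather than as part of any taxometric program.

```python
from statistics import mean


def classify_structure(ccfis):
    """Apply the dual-threshold rule to a set of CCFI values."""
    avg = mean(ccfis)
    if avg > 0.60 or all(c > 0.55 for c in ccfis):
        return "taxonic"
    if avg < 0.40 or all(c < 0.45 for c in ccfis):
        return "dimensional"
    return "ambiguous"


# Reported values, in the order MAMBAC, MAXCOV, L-Mode.
men = [0.708, 0.727, 0.601]
women = [0.656, 0.718, 0.543]

print(round(mean(men), 3), classify_structure(men))      # 0.679 taxonic
print(round(mean(women), 3), classify_structure(women))  # 0.639 taxonic
```

For women, the L-Mode value (.543) falls short of .55, so the taxonic conclusion rests on the average exceeding .60 rather than on all three procedures agreeing individually.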
Using the data-generated base-rate estimates (3.00% for men and 2.67% for women) and the respondents’ scores on the three indicators of sexual orientation, we assigned individual cases to the taxon (ns = 388 men, 489 women) or complement classes (ns = 13,730 men, 18,918 women). Taxon members displayed a pattern of greater same-sex sexuality than complement members across all indicators of sexual orientation (Table 2). No complement members reported mostly or exclusively homosexual attraction, whereas the majority of male and female taxon members reported exclusively homosexual attraction. There was, however, an apparent sex difference in the taxon groups regarding sexual behavior: Men in the taxon were significantly more likely than women in the taxon to report exclusively homosexual behavior and less likely to report exclusively heterosexual experiences, χ2(2, N = 877) = 33.4, p < .001.
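As a quick check on the class sizes, the realized taxon proportions can be computed from the reported counts. They fall slightly below the estimated base rates, as the Procedure section anticipates for tied composite scores. The variable names are illustrative.

```python
counts = {
    "men": {"taxon": 388, "complement": 13730},
    "women": {"taxon": 489, "complement": 18918},
}

# Each group's total should match the analyzed sample size.
totals = {g: c["taxon"] + c["complement"] for g, c in counts.items()}
realized = {g: round(100 * c["taxon"] / totals[g], 2)
            for g, c in counts.items()}

print(totals)    # {'men': 14118, 'women': 19407}
print(realized)  # {'men': 2.75, 'women': 2.52} (percent)
```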
Table 2. Percentage of Participants Who Endorsed Each Option on the Indicators of Sexual Orientation
Note: Values in parentheses indicate the number of individuals.
Because reports of sexual behavior can be influenced by social desirability effects (e.g., Acree, Ekstrand, Coates, & Stall, 1999), and sexual prejudice is associated with political and religious affiliation (Herek, 2000), more conservative or religiously fundamentalist respondents may have underreported their levels of same-sex attraction or behavior, which would have resulted in pseudotaxonic findings. To address this possibility, we reran the taxometric analyses excluding those respondents who indicated that they attended religious services at least once each week (the NESARC survey did not assess political or religious affiliation). Even in these less religiously active subsamples (ns = 12,222 and 10,276 for women and men, respectively), the results were clearly taxonic and virtually unchanged (mean CCFI = .695 and .664 for women and men, respectively), which suggests that the taxonic results are unlikely to be an artifact of reporting bias.
Construct validity: external correlates
For women and men, taxon membership predicted a greater risk of suicidal ideation and attempts as well as a range of lifetime psychiatric and substance-use disorders (Tables 3 and 4), compared with complement membership. This pattern was consistent with our hypothesis based on existing research that nonheterosexual individuals have poorer mental health than heterosexuals (e.g., Bostwick et al., 2010).
Table 3. Comparison of Lifetime Mental-Health Diagnoses Among the Classes
Note: Percentages indicate members of the given class who met diagnostic criteria for that variable. The next-highest group consisted of the members of the complement group who scored highest on a composite of the three indicators. For the chi-square tests, the degrees of freedom is 1. PTSD = posttraumatic stress disorder; GAD = generalized anxiety disorder; BPD = borderline personality disorder; NPD = narcissistic personality disorder; ASPD = antisocial personality disorder.
*p < .05. **p ≤ .01. ***p ≤ .001.
Table 4. Comparison of Lifetime Substance-Use Diagnoses and Suicidality Among the Classes
Note: Percentages indicate members of the given class who met diagnostic criteria for that variable. The next-highest group consisted of the members of the complement group who scored highest on a composite of the three indicators. Because of low cell counts, these analyses could not reliably be calculated for many of the disorders, and only those chi-square analyses with adequate expected cell counts were included in this table.
For suicide attempts, Fisher’s exact values were calculated for both comparisons for men and for the comparison between women in the taxon and women in the complement; thus, only the significance level is provided. For all other variables, these columns present results for chi-square tests.
*p < .05. **p < .01. ***p < .001.
Tables 3 and 4 also include comparisons of the taxon groups to the members of the complement group who scored highest on a composite of the three indicators (next-highest group). Even when the taxon was compared with the next-highest group, taxon members continued to be more likely to meet criteria for psychiatric diagnoses, although this pattern held only for men. However, both men and women in the taxon groups reported consistently greater suicidal ideation.
Discussion
These results indicate that sexual orientation is not a matter of degree but rather of distinct and meaningful categories. The analyses yielded a 3.00% base rate for the male sample and a 2.67% base rate for the female sample. Although previous studies using individual indicators resulted in discrepant prevalence estimates (e.g., Gates, 2011), these base-rate estimates are consistent with existing literature on the prevalence of same-sex sexuality using latent methods: Latent class analyses using these same three dimensions found that 2.8% to 3.2% of individuals are consistently nonheterosexual in their attractions, behavior, and identities (Fergusson et al., 2005; Talley et al., 2012).
Previous research suggests that same-sex sexuality is more prevalent among women than among men (Chandra, Mosher, Copen, & Sionean, 2011), but the female taxon in this study represented a smaller percentage of women than the male taxon did for men (2.67% vs. 3.00%). Although this result seems to contradict previous findings, this pattern may be explained by the greater homogeneity of the male complement group than the female complement group. Whereas men who were not exclusively heterosexual in attraction and behavior were almost all assigned to the taxon group, it was not uncommon for female complement members to report some degree of same-sex attraction. Whereas only 2% of the men in the complement group (n = 275) reported any level of same-sex attraction, 4.8% of the women in the complement group (n = 912) reported some level of same-sex attraction, which is more women than the 489 women assigned to the taxon. Thus, almost all men who reported some level of bisexuality were classed in the taxon, whereas the majority of women who reported some level of bisexual attraction were classified in the complement. This pattern may be consistent with hypotheses that female sexuality is more fluid than male sexuality (Baumeister, 2000) and perhaps supports Diamond’s (2012) argument that women should not be labeled as bisexual simply because of nonheterosexual attractions. Alternatively, these results may also suggest that whereas the case for a male homosexual taxon is very strong, additional research will be necessary to further validate a female homosexual taxon.
Similarly, the distinction between the taxon and the complement was better defined for men than for women, as evidenced by the associations with external correlates. For example, although male and female taxon membership predicted uniformly increased risk of poor mental health across a wide range of diagnoses, female taxon members were often indistinguishable from their nearby counterparts in the complement. When taxon members were compared with a group of the most same-sex-oriented complement members, the men in the taxon group reported consistently higher rates of mental-health problems than even the men at the high end of the complement, whereas women did not. The exception was alcohol use, for which male taxon members were at decreased risk of alcohol abuse and no greater risk for alcohol dependence compared with men in the complement. However, this pattern is consistent with existing research (McCabe, Hughes, Bostwick, & Boyd, 2005). These findings further support a clear boundary between male taxon and complement members. Therefore, despite strong evidence that sexual orientation is taxonic in both men and women, the nature of the taxa differed by gender.
Despite methodological differences between this study and Gangestad et al.’s (2000) taxometric analysis of gender typicality and sexual orientation, it is notable that both approaches found taxonic structures for men and women. Gangestad et al. found that 12% to 15% of men and 5% to 10% of women, many of whom were heterosexual, belonged to the latent taxa. To explain this commingling of sexual orientations in the taxon, Gangestad et al. argued that a larger proportion of individuals demonstrate sex atypicality in childhood than go on to develop as nonheterosexual. Our findings were consistent with this hypothesis, as a smaller proportion of individuals were classified into our taxa.
These findings are also consistent with biological theories of sexual orientation, including existing research on the heritability of sexual orientation. For example, in a series of studies on siblings and twins, Bailey and colleagues found that although heterosexual women appear to differ from men in their sexual-arousal patterns (Bailey, 2009), sexual orientation appears to have familial roots for both men and women and perhaps a shared mechanism via childhood gender nonconformity (Bailey, Dunne, & Martin, 2000). Therefore, the consistent taxonicity found in Gangestad et al.’s (2000) research and in the present study might support a biological theory of sexual orientation (e.g., Hines, 2011), suggesting that there is a shared pathway indicative of a taxonic process.
Meehl (2001) discussed how a taxon can be biologically based (“natural kinds”) or arise from the type of human-imposed differentiation involved in political ideology, religiosity, or social institutions (“environmental mold”). Although he argued that the strength of taxometrics lay in this inference-free zone, this indeterminacy means that individuals may interpret the taxonicity of sexual orientation differently. Some individuals might view sexual orientation’s taxonicity and the difference in female and male sexual fluidity as being socially constructed, with sexual orientation bound through socialization processes that are particularly punitive toward male nonconforming youths (e.g., Sandnabba & Ahlberg, 1999). Thus, perhaps female sexual fluidity reflects greater social acceptability for such behavior in women. However, some findings complicate this hypothesis. For example, the complement contained similar percentages of men and women who reported some degree of same-sex behavior (1.7% of men and 1.1% of women), so it appears that there is not a uniform bias against heterosexual men reporting same-sex inclinations.
To test these alternative hypotheses, researchers could supplement traditional self-report measures of sexual orientation with more objective biological indicators. If proposed biological indicators also support sexual-orientation taxonicity, researchers could conclude with greater certainty that results were not due to reporting bias. Thus, to clarify these findings, future research should include a more balanced proportion of indicators of sexual orientation and proposed biological correlates, such as spatial ability (e.g., Peters, Manning, & Reimers, 2007) and gender typicality (e.g., Hines, 2011), as well as digit ratio, handedness, and birth order (e.g., Mustanski, Chivers, & Bailey, 2002). Another way to delineate these possibilities is through cross-cultural research. Finding that sexual orientation appears taxonic across cultures, regardless of the country’s attitudes toward homosexuality and sexuality, would support the biological theory. Existing research suggests that the prevalence of homosexuality and its association with gender-atypicality are similar across cultures (e.g., Whitam & Mathy, 1991); however, future work could examine taxonicity as well.
Because of the use of an archival data set, there were some constraints on the analyses, primarily in the limited number of indicators. The NESARC data assessed sexual identity, behavior, and attraction with single-item indicators, which were used as stand-alone indicators in these taxometric analyses. As Beauchaine (2003) recommended, using aggregate factor scores enhances measurement precision in taxometric analyses by reducing the potential error in lone indicators of a construct. For example, the identification question allowed only for “heterosexual,” “homosexual,” and “bisexual” responses, but “mostly heterosexual” identification has arisen as a distinct category of orientation associated with harm (Savin-Williams & Vrangalova, 2013). Notably, a greater proportion of women than men identify as “mostly heterosexual” (e.g., Fergusson et al., 2005), and not having this identification option might have affected the female analyses in particular. Thus, including multiple questions to ascertain the nature of participants’ sexual attractions, fantasies, behaviors, and identification would strengthen future research. However, this limitation was offset by the large size of the NESARC data set, which allowed for the detection of a low-base-rate taxon.
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
A. L. Norris’s work was supported by the Anthony Marchionne Foundation for the Scientific Study of Human Relations and Psychological Processes Endowed Graduate Fellowship for Research at Washington State University.
