Abstract
People of Hispanic origin, including monolingual Spanish speakers, have experienced difficulty identifying with a race category on U.S. demographic surveys. As part of a larger research effort by the U.S. Census Bureau to improve race and Hispanic origin questions for the 2020 Census, we tested experimental versions of race and Hispanic origin questions in Spanish. About half of the versions asked about race and Hispanic origin in separate questions, while the rest asked about these constructs in a combined question format. Cognitive interviews with 33 monolingual Spanish-speaking participants indicated that (a) most participants affirmatively claimed Hispanic origin on the separate and combined formats, but had difficulty selecting or declined to select a race category on the separate question formats and (b) most participants perceived few differences between Hispanic origin and race. In comparison with the separate race and Hispanic origin questions, the combined question facilitated more satisfactory self-identifications.
Introduction
The measurement of race and ethnicity in the United States is a practice closely influenced by sociocultural, historical, and political forces. For that reason, although the Census Bureau has measured race since the first U.S. census in 1790, the way race is measured has varied considerably since then (Humes & Hogan, 2009; Lee & Tafoya, 2006). Every part of these questions has changed substantially over time, including question wording, response options, and examples provided (for reviews see Anderson & Fienberg, 2000; Hirschman, Alba, & Farley, 2000; Humes & Hogan, 2009; Lee, 1993; Snipp, 2003). These changes are largely due to a shift in scholarly conceptualizations of race in the United States from biologically determined to socially constructed categories (Hirschman et al., 2000; Omi & Winant, 1994; Williams, 1997), as well as the political climate and the relative lobbying power of various interest groups (Humes & Hogan, 2009).
With changes in the public and institutional perceptions of race came the practice of measuring Hispanic origin as well as race, though no clear difference exists between these two constructs (Campbell & Rogalin, 2006). Indeed, this division has not been clearly articulated across censuses. For example, in the 1930 Census, Mexican was added as a race category (not an ethnicity category) in order to enumerate the rapidly growing Hispanic population. However, Mexican American advocates lobbied successfully for its removal, presumably because the resultant exclusion of Mexicans from the White race category restricted their access to certain rights in the United States, such as naturalization and citizenship (Hochschild & Powell, 2008). Decades later, however, civil rights activists claimed that demographic data on Hispanic individuals were insufficient (Allen, Lachance, Rios-Ellis, & Kaphingst, 2011). Therefore, in 1970, a separate question on Hispanic origin was added to the census in an attempt to more accurately enumerate not only U.S. residents of Mexican descent but also Hispanic individuals from other countries in Central and South America. Currently, the Census Bureau follows the U.S. Office of Management and Budget’s (OMB) policy for measuring ethnicity and race across all agencies of the federal government, which requires federal government questionnaires to include separate questions on ethnicity (i.e., Hispanic or non-Hispanic) and race (e.g., White, Black; OMB, 1997). In addition to the delineated OMB race categories, the Census Bureau also includes a Some Other Race response category for the race question. Most recently, the 2010 Census form asked respondents to report Hispanic origin (Is Person 1 of Hispanic, Latino, or Spanish origin?) and to report one or more races (What is Person 1’s race? Mark [X] one or more boxes.).
Hispanic Population Race Reporting
The current questions and response options for race and Hispanic origin are still challenging for many individuals, and particularly for those who identify as Hispanic (Humes & Hogan, 2009). For many Hispanic individuals who identify with this categorical label, this ethnic identity also constitutes a racial identity (e.g., Treviño, 1986). Therefore, to report a race category for a separate race question may be redundant for many Hispanic respondents. The concept of race may have different meanings for individuals of Hispanic origin who have not adopted the racial categories outlined by the U.S. government. For example, many individuals of Hispanic origin see race as a social and changeable construct related to one’s culture, rather than a classification that depends only on biological inheritance (Rodriguez, 1992).
Perhaps as a result, 35% of respondents in the 1980 Census who identified as Hispanic on the questionnaire chose to select the Some Other Race category, more than any other group in the United States (Martin, DeMaio, & Campanelli, 1990). In 1990, 43.9% of Hispanics selected Some Other Race (Gibson & Jung, 2002), followed by 42.6% in 2000 (Grieco & Cassidy, 2001), and then 36.7% in 2010 (Ennis, Ríos-Vargas, & Albert, 2011). Per OMB policy, federal statistical agencies must later reclassify these Some Other Race responses into one of the delineated race categories (Humes & Hogan, 2009; OMB, 1978), and the practice of reclassification can result in significant data quality and analysis concerns.
Racial reclassification can affect research in many fields, including health and health disparities. For example, a study of maternal health indicators by Buescher, Gizlice, and Jones-Vessey (2005) described how mothers who identified as Hispanic overwhelmingly reported their race as Other, and thus did not conform to the delineated federal race categories. When the race of these mothers was reclassified from their self-reported choice of Other to White, health statistics for the White and Hispanic groups were affected. For example, mothers who self-identify as Hispanic (non-White) typically have low rates of smoking while pregnant. When these women were reclassified as White, it appeared as if Hispanic women had higher rates of maternal smoking, while non-Hispanic White mothers had lower rates of maternal smoking than was actually the case. Thus, if the goals of analyzing these data were to determine risk factors for health outcomes, to link sociocultural behaviors with health via race, or to provide improved and targeted health care services to certain groups (see Mays, Ponce, Washington, & Cochran, 2003), then the reclassified data are at risk of not reflecting the actual experience of affected communities.
The high rates of Some Other Race reporting suggest that many Hispanic individuals do not identify with delineated OMB race categories. Thus, to ensure that its data gathering instruments meet the goal of collecting meaningful and valid data that reflect the experiences of communities and individuals, the Census Bureau continually investigates the meanings that race and ethnicity hold for the U.S. population, and the best ways to measure these complex constructs.
Research on Race and Ethnicity Survey Measurement
Empirical research on race and Hispanic origin measurement has shown that these questions present a variety of challenges for respondents. A useful tool for improving survey questions is cognitive testing, a qualitative method that probes participants about how they process and react to survey questions and response options (Willis, 2005). A number of studies have used this method to document the serious difficulty that monolingual Spanish-speaking Hispanic respondents have had in finding a race category that accurately reflects their self-identity (e.g., Childs, Terry, & Jurgenson, 2011; Goerman, Caspar, Sha, McAvinchey, & Quiroz, 2007, 2008; Goerman, Quiroz, McAvinchey, Reed, & Rodriguez, 2009). Cognitive testing has also found evidence that respondents think of race and ethnicity as the same concept (Gerber & Crowley, 2005).
In part to investigate the extent to which race and ethnicity are regarded as the same concept, the Census Bureau has conducted large-scale quantitative field tests that experimented with combining race and ethnicity into one question (e.g., U.S. Census Bureau, 1997). The most recent field test was the 2010 Census Alternative Questionnaire Experiment (AQE; Compton, Bentley, Ennis, & Rastogi, 2013). The 2010 Census AQE was a multicomponent research effort that investigated alternate phrasing and content of the race and Hispanic origin questions on the 2010 Census mail form, with the goal of enhancing data quality in the 2020 Census. In comparison with the separate question format, the U.S. Census Bureau (1997) found that the combined format had lower item nonresponse for all race and ethnic groups, and Compton et al. (2013) found significantly lower item nonresponse for the combined question among English-speaking Hispanic respondents. AQE focus group research also found that English-speaking participants favored the combined format over the separate format (Carroll et al., 2011).
Current Study
The present study seeks to expand on the quantitative results of support for the combined question format among English-speaking Hispanics by qualitatively investigating monolingual Spanish speakers’ reactions to combined and separate race and Hispanic origin questions. Because there is evidence from earlier studies that many individuals of Hispanic origin, including those who are monolingual Spanish speakers, identify their race as Some Other Race, it is critical to understand how alternative race and ethnicity questions compare with one another. To study these issues, we cognitively tested how a group of monolingual Spanish speakers living in the United States interpreted Spanish-language versions of experimental 2010 Census questions on race and Hispanic origin. We addressed two research questions: (a) how monolingual Spanish speakers report race and Hispanic origin when presented with either a separate or a combined question format on a census questionnaire and (b) how monolingual Spanish speakers conceptualize race and origin in the context of a census questionnaire.
Method
Participants
Participants were 33 monolingual Spanish-speaking individuals (17 women, 16 men; mean age = 41.36 years, age range = 18-65 years). All participants were born outside of the United States and lived in the Washington, D.C., metropolitan area at the time of data collection. About half of the participants had at least a high school diploma, and 10 of these participants also had some college education or more. The recruitment goal was to include monolingual Spanish-speaking participants of various ages, genders, and educational levels who represented Spanish-speaking countries across North, Central, and South America. To achieve this goal, we recruited participants by contacting local community centers and by placing advertisements in Spanish-language newspapers in the Washington, D.C., metro area.
Questionnaires Tested
The 11 AQE questionnaires that were tested reflect major and minor differences in wording, layout, and format for the Hispanic origin and race questions, in comparison with the official questionnaire used for the 2010 U.S. Census. The most significant difference among the experimental questionnaires tested was whether the race and Hispanic origin categories were presented in separate questions or in a single combined question. The combined formats were hypothesized to result in reduced question nonresponse and Some Other Race category reporting (Compton et al., 2013).
Figures 1 and 2 show one of the five separate race and Hispanic origin question formats tested, and one of the six combined question formats tested, respectively. The combined format in Figure 2 collected race and Hispanic origin data in a single item, and contained a check box and write-in boxes for each of the major group categories: White, Black, Hispanic, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, and Some Other Race or Origin. See Terry and Fond (2011) for a discussion of the full cognitive test of these and other features.

Figure 1. Separate race and Hispanic origin questions example.

Figure 2. Combined race and Hispanic origin question example.
Procedures
Two interviewers (including the second author) conducted 33 total cognitive interviews (18 with combined and 15 with separate race and Hispanic origin question formats). All interviews were conducted in person, with participants completing paper versions of the questionnaires. At the start of the interviews, the interviewers told participants that the purpose of the study was to test new survey questions, and the information they provided would be confidential and anonymous. The interviewers asked participants to read and sign a consent form, and to grant permission for the interview to be audio recorded. At the end of the interview, participants received US$40.00 as compensation for the time and travel required for participation. Each interview lasted approximately 1 to 1.5 hours.
The primary interviewing strategy used was the retrospective think-aloud method (Willis, 2005), in which participants completed portions of the questionnaire uninterrupted and then were asked by interviewers to describe their overall impressions, their interpretations of the questions and terminology used, and their reasons for selecting their responses. Interviewers conducted the majority of the probing when the participants finished answering the question(s) about Hispanic origin and race. Each interviewer later reviewed the audio recording and notes from the interviews he or she personally conducted and produced a detailed summary that documented participants’ reactions to the interview protocol’s questions and probes. We then used these summaries to create matrices that listed participants’ answers by each interview question and probe. The data in these matrices were then aggregated for analysis within and across the interviews to address how participants reacted to either a separate or combined race and Hispanic origin question format, as well as how participants conceptualized race and Hispanic origin.
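The matrix step described above can be sketched in code. This is a minimal illustration only, not the procedure the authors actually used: the participant identifiers, probe names, and notes below are hypothetical placeholders, and the real matrices were built from the interviewers' written summaries.

```python
from collections import defaultdict

# Hypothetical interviewer summaries: (participant_id, format, probe, note).
# None of these rows come from the study; they only show the data shape.
summaries = [
    ("P01", "separate", "race_question", "Checked White; said it did not fit"),
    ("P01", "separate", "meaning_of_origin", "Country of birth"),
    ("P02", "combined", "race_question", "Checked Hispanic only"),
    ("P02", "combined", "meaning_of_origin", "Country of birth"),
]


def build_matrix(rows):
    """Return {probe: {participant_id: note}} so each probe reads across participants."""
    matrix = defaultdict(dict)
    for participant_id, _fmt, probe, note in rows:
        matrix[probe][participant_id] = note
    return dict(matrix)


def by_format(rows, probe):
    """Collect the notes for one probe, grouped by question format."""
    grouped = defaultdict(list)
    for _participant_id, fmt, row_probe, note in rows:
        if row_probe == probe:
            grouped[fmt].append(note)
    return dict(grouped)


matrix = build_matrix(summaries)
race_by_format = by_format(summaries, "race_question")
```

Reading `matrix` by probe supports analysis across interviews, while `by_format` supports the comparison between separate and combined formats.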
Results
Separate Versus Combined Formats
The first research question addressed how the monolingual Spanish speakers in this study reported race and Hispanic origin when presented with either a separate or a combined question format on a census questionnaire. In formats that separately presented the Hispanic origin and race questions, the race question engendered more comprehension and response difficulty than the Hispanic origin question. When they attempted to report a race, participants often hesitated, requested help or confirmation from the interviewer, or expressed frustration before marking an answer. Of the 15 participants who completed the separate question formats, seven selected a race category from the list of options when responding to the race question. All the participants who selected a race category stated in the interviews that these categories did not match their self-identification, as they considered themselves Hispanic. For example, one female participant who confidently reported a Hispanic origin on the previous question finally checked White after minutes of verbalized deliberation, and then said to the interviewer, “Pero no soy blanca” (But I’m not White) and laughed. In addition, six participants who had not marked one of the approved race categories chose to write in responses, such as Mestiza, Hispano, or a national origin. Finally, two participants refused to provide a response to the race question. See Table 1 for how participants responded to questionnaires with separate race and Hispanic origin questions.
Table 1. Participants’ Hispanic Origin and Race on Separate Question Formats (n = 15).
Note. Reported origin is the participant’s country of birth or nationality as informally told to the interviewer before or during the interview.
While completing the form or during the cognitive interview debriefing, participants clearly expressed a desire to identify as Hispanic, regardless of whether it was presented separately or along with the race categories. For example, when answering the race question in a separate question format, one participant read the race category response options to himself, and stated that he did not fit into the response categories because he identified as Hispanic and Latino:
No parece alguna respuesta que yo le puedo dar aquí . . . yo soy salvadoreño . . . no sé en cuál clasifico. ¿De la raza negra, africana? No soy. Soy hispano, latino. India Americano, nativo- tampoco lo soy.
There doesn’t seem to be any response that I can give you here . . . I’m Salvadoran . . . I don’t know which I belong in. Black, African? I’m not that. I’m Hispanic, Latino. American Indian, native- nor am I that.
After a few moments of deliberation, the participant finally stated, “Tal vez, sería blanca” (Maybe it would be White), checked the White box, and wrote Hispanic on the American Indian or Alaska Native write-in line to emphasize his identity as Hispanic. Another participant who wanted Hispanic presented as a race category expressed his opinion as follows:
Yo le pregunté a cuatro hispanos antes de yo envio [the 2010 Census form] . . . Qué van a poner ellos. Y ellos tampoco me podían decir. . . . El hispano va a tener que poner latino, qué sé yo, como una raza. O el español, y no la tiene. ¿Por qué no la tiene, por qué tiene el hispano no como una raza?
I asked four Hispanics before sending [the 2010 Census form] back . . . What are they going to put. And they couldn’t tell me either. . . . Hispanics will have to put Latino, what do I know, as a race. Or “Spanish,” but it doesn’t have it. Why doesn’t it have it, why isn’t Hispanic a race?
The tendency of participants to think of Hispanic as a race was supported by the results of the combined question format testing. The combined formats asked for race and Hispanic origin in a single item and were designed in part to address the concerns of respondents who define race and Hispanic origin as the same concept. Of the 18 participants who completed a combined question format, 15 reported only a Hispanic origin. Once they found a Hispanic category, they marked it and appeared satisfied that they had fully answered the question by moving on to the next question without pausing or looking to the interviewer for assistance. The remaining three participants selected a Hispanic origin and the White race category, in accordance with the questionnaires’ instructions to mark one or more categories if desired. Furthermore, these three participants used race category write-in lines to report specific Hispanic origins. See Table 2 for a summary of how participants reported race or origin on the combined race and Hispanic origin question formats.
Table 2. Participants’ Hispanic Origin and Race on Combined Question Formats (n = 18).
Note. Reported origin is the participant’s country of birth or nationality as informally told to the interviewer before or during the interview.
The two-part combined question format differed from the other combined question formats in that it asked respondents to report race or Hispanic origin for its Question 8, and then asked respondents to write a specific race, origin, or enrolled or principal tribe for its Question 9.
Overall, the 18 participants who completed one of the combined formats completed the question without difficulty. While thinking aloud and marking their answers during the interview, participants expressed approval of combining the concepts of Hispanic origin and race into a single question because they were able to find their identity without having to select an OMB race category. For example, one respondent from El Salvador described how he looked over the combined list of race and Hispanic origin response categories and selected the Other Hispanic Origin category because it was the best fit of all the categories, including the OMB race categories:
Blanco, negro, Africa-Americana, India Americana, mexicano, u otro origen hispano, latino, o español. Y creo que éste está bien.
White, Black, African American, American Indian, Mexican, or Other Hispanic Origin. And I think this one is fine.
Another participant who completed a combined format question commented that she recently heard a news report in which Hispanics expressed confusion about the race question. In contrast, she was not confused because in the combined format that she completed, all of the race or origin self-identification categories were listed together. She expressed that this format was satisfactory to her because she also defined her race as Hispanic.
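The response counts reported in this section can be tallied side by side. The sketch below is only a descriptive summary of the counts given in the text; the dictionary keys and the `shares` helper are our own labels, and with a qualitative sample of 33 the resulting proportions should not be read as statistical estimates.

```python
from fractions import Fraction

# Counts as reported in the text for each question format.
separate = {
    "selected_race_category": 7,  # chose one of the listed race categories
    "wrote_in_response": 6,       # e.g., Mestiza, Hispano, or a national origin
    "no_response": 2,             # refused to answer the race question
}
combined = {
    "hispanic_only": 15,          # marked only a Hispanic origin
    "hispanic_and_white": 3,      # marked a Hispanic origin and White
}


def shares(counts):
    """Return each count as an exact fraction of the format's total."""
    total = sum(counts.values())
    return {label: Fraction(count, total) for label, count in counts.items()}


n_separate = sum(separate.values())
n_combined = sum(combined.values())
separate_shares = shares(separate)
combined_shares = shares(combined)

for name, result in (("separate", separate_shares), ("combined", combined_shares)):
    for label, share in result.items():
        print(f"{name:9s} {label:24s} {float(share):.1%}")
```

Exact fractions keep the small-sample arithmetic transparent, and the totals can be checked against the 15 separate-format and 18 combined-format interviews described in the Procedures section.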
Conceptualizations of Race and Hispanic Origin
The second research question addressed how monolingual Spanish speakers conceptualized race and origin in the context of a census questionnaire. We found that participants in this study generally had a single definition of origin, while their definitions of race were diverse. Participants defined origin as a person’s country of birth, but did not include ancestors from the distant past in their definition. Overall, participants gave mixed definitions of the term race. Their definitions of race generally included geographic origin, but added other characteristics, such as the cultural and physical heritages of a common people. The most frequent definitions were (a) the culture and physical features of a common people and their descendants (e.g., the children of Spanish colonists and indigenous people were described as Mestizos) and (b) country of birth combined with skin color. Although the role of skin color in how participants conceptualized race was not a standard probe in the cognitive interview protocol, approximately half of the participants introduced the subject of skin color unsolicited. Many participants introduced terms such as Moreno (in addition to Black and White, etc.) as a way to more accurately describe the range of skin colors they perceived as relevant to conceptualizations of race.
Due to their definition of origin as country of birth or nationality, all participants with children born in the United States expressed difficulty when reporting their children’s Hispanic origin. As evidenced from the interview debriefings, their difficulty was due to linking the term origin with Hispanic. Because these terms are presented together as a single phrase, there is no clear option for participants to report different places of birth while maintaining a record of Hispanic familial roots. They wanted to accurately report their children as being born in the United States, because this represents their children’s origin as defined by country of birth. At the same time, they thought they should report the same “Hispanic” origin as they did for themselves because of the parent-child relationship between them.
Participants responded to this issue by reporting Hispanic origin for their children in no consistent pattern. For the separate race and Hispanic origin question format, one participant was uncertain of how to answer the separate race question and did not give a response, and another participant copied the answers she reported for herself (Mexican for the Hispanic origin question and White for the race question).
For participants reporting for their children using the combined question format, one participant simply wrote American in the Other Hispanic, Latino, or Spanish origin write-in line, and another participant wrote Latino American on the Hispanic, Latino, or Spanish origin write-in line. After much deliberation, another participant who marked the Mexican box for herself on a combined question format decided to mark the Other Hispanic origin box and write in Mexican American for her son. She was initially confused because she compared his situation to her own and realized that her and her son’s reports would not match. She wanted to have their responses match because of their mother-son relationship but decided against it, citing her desire to represent her son’s American birth and citizenship. When completing a two-part combined format, another participant checked Hispanic for the general race or origin question, and wrote Hispanic for the specific race or origin question, to indicate that the child’s family was from a Hispanic country. One participant who checked Hispanic and wrote in Salvadoreño (Salvadoran) for himself in the two-part combined question format checked Hispanic and wrote in Hispanic for his two children, because they were born in America, and not El Salvador.
Discussion
Findings from this study showed that when monolingual Spanish-speaking participants conceptualized race and origin on a census form, their conceptualizations of these terms did not match how these terms were presented on the separate race and Hispanic origin format. Participants defined origin as nationality or place of birth and defined race as their self- and family-identity, which included a cultural affinity that cut across national boundaries and across generations. When voicing this expansive definition of race, participants repeatedly used terms such as Hispanic or Latino. Currently, these two categories are only listed as response options to the Hispanic origin question in a separate question format. This conceptual mismatch made it difficult for participants to interpret and accurately answer the separately presented race and Hispanic origin question formats in a way that reflected their cultural heritage. These findings support previous research where such difficulties were also found (e.g., Childs et al., 2011; Goerman et al., 2007; Goerman et al., 2008; Goerman et al., 2009).
The conceptual mismatch was evident at two points in the questionnaire completion process and debriefing session during the cognitive interview. The first instance occurred when participants, after affirming a Hispanic origin and a country of origin, were then asked to report a race for the separate race and Hispanic origin question formats. In answering the Hispanic origin question alone, participants thought that they essentially reported all the information requested from both questions, because they listed a race (Hispanic) and origin (country of origin). This finding supports previous research that found that Hispanic respondents consider race and ethnicity (i.e., Hispanic origin) to be similar concepts (Gerber & Crowley, 2005). The second instance occurred when some participants reported race and Hispanic origin for their American-born children. Having previously constructed ideas of race and origin in the local context of reporting for themselves, their conceptualizations of origin as one’s place of birth or nationality did not permit them to list their children as Mexican, Honduran, and so on. However, not reporting the same origin for their children as they reported for themselves seemed to require denying a strongly felt need to report their children as Hispanic, even if their children’s national origins were different from their own.
Regarding the experimental question formats, the type of format (i.e., separate or combined) influenced the way in which participants reported their race and Hispanic origin. The majority of participants reported only a Hispanic origin for the combined question, and the majority reported a Hispanic origin and a race on the separate question format. However, when these participants selected one of the form’s race categories, they resisted this categorization by writing in Hispanic origins under the race categories or by expressing in the interview that their selections were not accurate or meaningful to them. Hispanic origin was the most salient cultural identifier they used, which resulted in positive reactions to the combined question format. This qualitative finding complements quantitative research finding that the combined question format is a promising option for collecting these data (e.g., Compton et al., 2013; U.S. Census Bureau, 1997).
Limitations
For this study, we note the following limitations. First, the relatively small group of monolingual Spanish-speaking participants in this study was a convenience sample local to the Washington, D.C., metro area. Thus, our results may not be representative of the diverse communities across the United States that would use a Spanish-language census form. Second, the cognitive interview context in this study consisted of a participant completing a questionnaire in the presence of an interviewer, and did not simulate how participants would likely complete a mailed-out census questionnaire (i.e., at home by themselves or with friends and family). The cognitive interview process may have influenced their responses, explanations of responses, or reactions to the response categories. Also related to the cognitive interview process, participants’ responses to cognitive interview questions about the meaning of race and origin may be vulnerable to race-of-interviewer effects, which have been found in studies of race and ethnicity attitude items (for a review, see Davis, Couper, Janz, Caldwell, & Resnicow, 2010).
Implications
The results of this study have implications for the measurement of race and ethnicity, as well as for demographic research on the U.S. Hispanic population. The findings add supporting evidence to previous cognitive interview studies in which Spanish-speaking respondents had difficulty with the separate race question. The findings also contribute to the discussion of whether to collect race and Hispanic origin data in separate or combined questions in future surveys. In particular, the finding of support for the combined question format by monolingual Spanish-speaking respondents expands on results from the AQE quantitative field test and focus group research, which found support for the combined race and Hispanic question format among English-speaking Hispanic respondents. The present study adds depth to the quantitative response analysis by contributing an analysis of participants’ detailed discussion of their impressions of the race and Hispanic origin questions, and the thought processes that led to their responses.
Furthermore, the findings suggest that datasets with race and Hispanic origin collected in a combined question format may better identify the Hispanic population than would a separate question format, and thus may result in improved Hispanic population data quality. Such improved data may be especially useful in studies where it is important to identify the Hispanic population, such as health statistics specific to Hispanics (Allen et al., 2011; Zambrana & Carter-Pokras, 2001). In any case, any format used to collect race and ethnicity data will continue to be challenged with meeting the multiple goals and requirements set by the U.S. government, the researchers who use these data, and the respondents themselves. For example, the OMB requires the collection and review of race and ethnicity data to help enforce federal civil rights laws (OMB, 1997), which depend on questions and response options that have personal and group heritage salience. Achieving this goal begins with minimizing item nonresponse and use of residual categories such as Some Other Race, all while producing data that are as comparable as possible to data collected in previous years.
Acknowledgements
We thank the reviewers whose comments improved earlier drafts of this paper. All errors that remain are our own.
Authors’ Note
This article is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. A previous version of this paper was presented at the 66th Annual Conference of the American Association for Public Opinion Research (AAPOR) in Phoenix, Arizona, May 12-15, 2011.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
