Abstract
The Stranger Face Trust (SFT) questionnaire and the Imaginary Stranger Trust (IST) questionnaire are two new self-report measures of generalized trust that assess trust in real (SFT) and imaginary (IST) strangers across four trust domains. Both were designed to be objective, empirically valid, and easy to administer and score. To assess measurement validity and reliability, SFT and IST along with other common measures of social trust, sociodemographic characteristics, biographical characteristics, and a survey experiment were administered to a large representative sample of Qualtrics web-panel members (N = 2,041). Confirmatory factor analysis and structural equation models established the internal consistency, convergent validity, discriminant validity, and criterion validity of SFT and IST. Further tests revealed that SFT and IST correlate with well-established predictors of generalized trust, while other correlates like the age–trust relation were called into question. Taken together, this article shows that SFT and IST are valid and reliable instruments for the measurement of generalized trust and that common measures of generalized trust appear to be less valid and less reliable. This article ends with a discussion of the implications and directions for future research.
Over the past half-century, generalized trust has garnered great attention from social scientists (Almond and Verba 1963; Fukuyama 1995; Putnam 1993, 2000; Uslaner 2002). It is frequently featured as a treatment or as an outcome in top generalist and disciplinary outlets (Abascal and Baldassarri 2015; Glaeser et al. 2000; Jachimowicz et al. 2017; Kuwabara 2015; Muller and Mitchell 1994; Nunn and Wantchekon 2011; Thomson et al. 2018; Twenge, Campbell, and Carter 2014). In fact, generalized trust has been touted as a source of economic growth (Algan and Cahuc 2010, 2013; Tabellini 2010), welfare state development (Bergh and Bjørnskov 2011, 2014; Daniele and Geys 2015; Rothstein and Uslaner 2005), and political participation (Almond and Verba 1963; Inglehart and Norris 2003; Putnam 2000) and is itself a function of civil society and informal social ties (Brehm and Rahn 1997; Glanville, Andersson, and Paxton 2013; Paxton 2007), religion (Seymour et al. 2014; Welch et al. 2004), ethnic diversity (Dinesen and Sønderskov 2015; Hooghe et al. 2009), personality traits (Dinesen, Nørgaard, and Klemmensen 2014; Freitag and Bauer 2016), political–institutional environments (Bjørnskov 2007; Dinesen 2013; Delhey and Newton 2005; Freitag and Bühlmann 2009; Herreros 2004; Paxton 2002; Robbins 2012a, 2012b), and various sociodemographic characteristics such as age, gender, and race (Alesina and La Ferrara 2002; Freitag and Traunmüller 2009; Hooghe et al. 2009; Kumlin and Rothstein 2005; Leigh 2006; Mewes 2014; Simpson 2006). It is also frequently investigated as an end-in-itself or treated as an indicator of other closely related concepts such as social capital (Messner, Rosenfeld, and Baumer 2004; Paxton 1999; Putnam 2000; Rosenfeld, Messner, and Baumer 2001) and collective efficacy (Sampson, Raudenbush, and Earls 1997).
Despite the cross-disciplinary breadth of generalized trust, a feature commonly shared by all of this work is a reliance on a small set of survey items (Nannestad 2008). The classic measure of generalized trust is the most-people trust question first developed by Rosenberg (1956) as part of a faith-in-people Guttman scale: “Some people say that most people can be trusted. Others say you can’t be too careful in your dealings with people. How do you feel about it?” Since its original formulation, the most-people trust question has been introduced to—in one form or another—the U.S. General Social Survey (GSS), the World Values Survey (WVS), the European Social Survey (ESS), and the various barometers. The most-people trust question has also undergone numerous changes to its wording and structure, scale length, item numbers, and dimensionality (see Bauer and Freitag 2018; Lundmark, Gilljam, and Dahlberg 2015; Uslaner 2011, 2015, for reviews and discussions of these changes).
Despite methodological advances, researchers still “…use modified versions of questions that were introduced in the 1940s and 1950s…” (Bauer and Freitag 2018:20) and, because of that, issues related to operationalization remain. While debates exist regarding the malleability or durability of generalized trust (that is, whether generalized trust is a function of early life experiences or evolves with lifelong experiences; Bauer 2015; Glanville and Paxton 2007; Paxton and Glanville 2015), most agree that it captures optimism and unconditional faith in strangers and unknown others (Uslaner 2002). That is, it captures the notion that person A trusts, period, regardless of the person, the matters at hand, or the conditions in which trust is placed. This conceptualization of generalized trust poses a number of interesting empirical issues related to operationalization.
The concept of unconditional faith implies that high generalized trusters, on average, trust unknown others for any number of matters to a greater extent than low generalized trusters. This would include watching loved ones while away, returning a favor or repaying a loan, keeping a secret, and even performing surgery regardless of who the specific stranger (or strangers) might be. Although extreme, the last example—performing surgery—strikes at the heart of the generalized trust concept: Generalized trust is not embedded within a general theoretical framework that specifies the abstract scope conditions under which it should or should not hold (Cohen 1980). Given that the abstract scope conditions are frequently unspecified, high generalized trusters should find strangers more trustworthy than low generalized trusters regarding any matter at hand. A measure of generalized trust that taps into various matters for which one might trust specific strangers eludes the literature. 1
Such omissions are critical for (mis)understanding generalized trust. The GSS, for instance, provides important information about changes in generalized trust over time (Clark and Eisenstein 2013; Fairbrother and Martin 2013; Paxton 1999; Robinson and Jackson 2001; Schwadel and Stout 2012; Twenge et al. 2014; Wilkes 2011). Yet it is difficult to interpret the current meaning of answers to decades-old questions like the most-people trust question, whose wording ignores the varied circumstances in which people trust or do not trust strangers. In other words, are responses to the classic and modern generalized trust questions equivalent between respondents and across cohorts and time periods? Do respondents think of strangers and people who are unknown to them, and do they consider similar situations and matters at hand? Current evidence is mixed and suggests that (a) the “radius of trust” (or the width of the circle of people among whom a certain trust level exists) varies across respondents and between cultures (Delhey, Newton, and Welzel 2011; van Hoorn 2014), (b) respondents interpret generalized trust questions differently (Miller and Mitamura 2003), and (c) respondents do not always think of strangers or people who are unknown to them when responding to generalized trust questions (Freitag and Bauer 2013; Reeskens and Hooghe 2008; Sturgis and Smith 2010; Torpe and Lolle 2011; Uslaner 2002).
Taken together, critical theoretical features of the generalized trust concept are absent from most measures of generalized trust. The most-people trust question and its various offshoots probe generalized trust neither faithfully nor effectively, which ultimately calls into question the validity of common generalized trust measures and scales. To help bridge the gap between concept and measurement, I develop and test two new measures of generalized trust. The first measure, which I refer to as Stranger Face Trust (SFT), is a long-form questionnaire that requires respondents to assess the faces of six strangers along four matters for which trust is placed. The faces were drawn from the Chicago Face Database (Ma, Correll, and Wittenbrink 2015), which is a database that consists of 597 high-resolution photographs of male and female human faces of varying age and ethnicity. Each face in the database is represented with a neutral expression photograph that has been normed by an independent rater sample. The six faces were selected based on four criteria: age (median U.S. ages), race (Caucasian, African American, and Latino), gender (male and female), and perceived trustworthiness (neutral evaluations). The four matters were constructed to cover general and common pecuniary and nonpecuniary dealings, which include keeping a secret, watching a loved one, repaying a loan, and providing sound financial advice. The second measure, which I refer to as Imaginary Stranger Trust (IST), is a short-form questionnaire that requires respondents to imagine meeting a total stranger for the first time and to identify the extent to which they would trust the imaginary stranger for the four matters outlined above (i.e., keeping a secret, watching a loved one, repaying a loan, and providing sound financial advice).
In what follows, I assess the validity and reliability of the two new measures of generalized trust. I first tackle the content and face validity of both measures. I argue that specifying particular strangers and matters at hand captures core features of the generalized trust construct, or what Saylor (2013) refers to as the dimensional expanse. By specifying particular strangers and specific matters, I quell issues of measurement inequivalence that plague survey items common to the generalized trust literature. In this regard, I argue that generalized trust is a latent construct and that observed indicators of generalized trust manifest as assessments of trustworthiness for specific matters regarding specific strangers. Second, using classical test theory, I perform various empirical measurement validation tests, including tests of convergent and discriminant validity, criterion validity, and association with known predictors of generalized trust. All of these tests show that SFT and IST are valid and reliable instruments for the measurement of generalized trust and that common measures of generalized trust appear to be less valid and less reliable.
Materials and Methods
Sample
An online survey was coded and administered by Qualtrics to a convenience sample of market research panel members using quota sampling based on age, gender, and education; 2,041 panelists completed the study between the dates of April 17, 2018, and April 30, 2018. While slight differences exist, benchmark comparisons to the Current Population Survey reveal that the sample characteristics are representative of the U.S. population (see Table 1).
Table 1. Comparison of Sample Characteristics to Population Benchmark.
Note: Benchmark is the Current Population Survey.
The Qualtrics survey consisted of 14 blocks. Block 1 (consent form), Block 2 (sociodemographic characteristics), and Block 3 (screener question) were fixed and shown at the beginning of the survey. The remaining 11 blocks were shown in random order from respondent to respondent. Each block contained a thematic set of questions, with topics ranging from the classic generalized trust questions and trusting family members for specific matters to membership in associations, participation in political action, and trusting strangers for specific matters.
Panelists who completed the study received an incentive based on the length of the survey, their specific panelist profile, and respondent acquisition difficulty. The specific type of compensation varied and may have included cash, airline miles, gift cards, redeemable points, sweepstakes entrance, and vouchers.
The data collection stopping rule consisted of two weeks of recruitment with a reminder e-mail or SMS sent every two to three days. The target sample size was 2,000. At the end of recruitment, Qualtrics finished with an extra 41 completed surveys. Eligibility was restricted to U.S. adults age 18 and older who (a) voluntarily consented to participate, (b) met quota requirements, and (c) passed a screener question at the beginning of the survey. Those who failed (a), (b), or (c) were prohibited from participating in the survey. The median survey length was approximately 27 min, 2 and the overall response rate was 27 percent. 3
Stranger Trust Measures
Generalized trust captures the notion that person A trusts, period. “‘A trusts’ describes the idea that individuals possess some generalized situation-independent expectation” (Bauer and Freitag 2018:16) about the cooperativeness and helpfulness of strangers and unknown others in everyday interactions (Putnam 2000; Uslaner 2002; Yamagishi and Yamagishi 1994). 4 Generalized trust implies that regardless of the traits of the trustee, the circumstances in which trust is placed, or the matters for which one trusts another, some people are more or less trusting of strangers than others. Since people’s expectations about the cooperativeness and helpfulness of unknown others can and do vary, generalized trust is best thought of as a baseline starting point at which people (dis)trust strangers for any number of matters (Colquitt, Scott, and LePine 2007; Erikson 1950; Rotter 1967).
A long-standing issue in the measurement of generalized trust is that common operationalizations fail to measure specific strangers and particular matters for which trust is placed. Omitting specific strangers and particular matters from the operationalization of generalized trust undermines measurement equivalence (Davidov 2009; King et al. 2004; Vandenberg and Lance 2000), or the property of measurement that indicates that a construct (generalized trust in this instance) is being measured in the same way across respondents, time periods, and cohorts. I argue that measurement equivalence is not something that can be achieved solely with classical test theory. Instead, the research community must decide what constitutes a reasonable measurement of unknown others and particular matters (Saylor 2013). Some trust scholars have recently made similar arguments: “As an alternative to current standard measures (e.g., the most-people trust question), one could try to measure generalized trust as an average across many situations that entails a variety of trustees, expected behaviors, and contexts. To avoid confusion, we suggest using the term ‘cross-situational trust’” (Bauer and Freitag 2018:29). The goal of the present article is to take a first step in this direction.
Given issues of measurement inequivalence and Bauer and Freitag’s (2018) recent call for cross-situational operationalizations of trust, I introduce two new measures of generalized trust. I refer to these new measures as SFT and IST. SFT is a long-form questionnaire that includes actual faces and matters for which trust is placed, while IST is a short-form questionnaire that requires the respondent to imagine meeting a total stranger for the first time and assessing matters of trust similar to those used for SFT. The appeal of common generalized trust measures for public opinion polls is their short length of measurement. IST was thus developed to act as a substitute for the most-people trust question and its various offshoots. This substitution is dependent on the results of classical test theory showing whether and how IST converges with SFT (an instrument with greater face and content validity than IST), which is something I assess in the Results section.
The development of SFT and IST is based on three assumptions:
1. Generalized trust is best assessed in a person-specific manner. Person A trusting person B for any number of matters (e.g., keeping a secret and repaying a loan) does not necessarily imply that person A will trust person C to the same extent regarding similar matters (Bauer 2013; Bauer and Freitag 2018; Cook, Hardin, and Levi 2005; Hardin 2002; Robbins 2016c). However, if person A is a high generalized truster, he or she should trust persons C, …, Z for any number of matters to a greater extent than person B, who is a low generalized truster. SFT, but not IST, was therefore designed to provide an indicator of generalized trust that assesses multiple specific persons.
2. Generalized trust is best assessed in a domain-specific manner. Person A trusting person C for one particular matter (e.g., keeping a secret) does not necessarily imply that person A will trust person C for all other matters (Bauer 2013; Bauer and Freitag 2018; Cook et al. 2005; Hardin 2002; Robbins 2016c). However, if person A is a high generalized truster, he or she should trust person C for any number of matters to a greater extent than person B, who is a low generalized truster. SFT and IST were therefore designed to identify specific domains of trust (i.e., matters at hand) as well as to provide an indicator of total generalized trust across multiple domains.
3. Recognition by trust experts is the most valid and practical criterion for the judgment of generalized trust measurement (Saylor 2013). SFT and IST are therefore based upon public assessment by experts in the field. Earlier versions of the SFT and IST instruments were presented to social trust scholars at a 2018 workshop in Uppsala, Sweden.
In conjunction with assumptions common to the measurement validation literature (Pedhazur and Schmelkin 1991), assumptions 1 through 3 facilitate the treatment of generalized trust as a latent construct in which observed indicators of generalized trust manifest as assessments of trustworthiness for particular matters (in the case of SFT and IST) regarding specific strangers (in the case of SFT). I and others (Bauer and Freitag 2018) argue that the average of the assessments of trustworthiness across particular matters and specific strangers is the ideal measurement of generalized trust, a feature of measurement that SFT fully captures and IST partially captures. In short, SFT, and IST to a lesser extent, mitigates measurement inequivalence and allows researchers to observe valid individual baseline levels of trust. Below, I detail the operationalization of specific strangers and particular matters.
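To make the averaging idea concrete, the following is a minimal sketch of an observed SFT composite for a single respondent: the mean of all available ratings across the six faces and four matters. It is an illustration only and is not the latent-variable modeling used in this article. The 0 to 3 coding of the 4-point scale, the short domain labels, the function name `sft_score`, and the use of `None` for Don’t know responses are all assumptions made for the sketch.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# Face identifiers are the Chicago Face Database codes used in the study.
FACES = ["WF-228", "WM-225", "BF-214", "BM-021", "LF-240", "LM-227"]
# Hypothetical short labels for the four trust domains.
DOMAINS = ["secret", "loan", "care", "advice"]


def sft_score(ratings: Dict[Tuple[str, str], Optional[int]]) -> Optional[float]:
    """Mean of all non-missing face-by-domain ratings (assumed coded 0-3).

    Don't know responses are assumed to be stored as None and are skipped.
    Returns None when the respondent has no usable ratings.
    """
    observed = [
        ratings.get((face, domain))
        for face in FACES
        for domain in DOMAINS
    ]
    observed = [r for r in observed if r is not None]
    return mean(observed) if observed else None
```

An analogous IST composite would average over the four domains only, since IST involves a single imagined stranger.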
Specific strangers
Six separate strangers were selected for inclusion in the SFT measure. The strangers were drawn from the Chicago Face Database (Ma et al. 2015), which is a database that consists of 597 high-resolution photographs of male and female human faces of varying age and ethnicity (https://chicagofaces.org/). Each face in the database is represented with a neutral expression photograph that has been normed by an independent rater sample. 5 The six faces were selected based on four criteria: age (median U.S. ages), race (Caucasian, African American, and Latino/a), gender (male and female), and perceived trustworthiness (neutral evaluations).
Ma et al. (2015) provided each face with an anonymous identification code (e.g., BF-211 would be black female [BF] image #211). The faces used in the present study were WF-228 (age = 36, number of raters = 26, perceived white = 96.15%, perceived trustworthiness = 3.88 on 7-point scale), WM-225 (age = 36, number of raters = 28, perceived white = 100%, perceived trustworthiness = 3.70 on 7-point scale), BF-214 (age = 32, number of raters = 26, perceived black = 85%, perceived trustworthiness = 3.76 on 7-point scale), BM-021 (age = 36, number of raters = 95, perceived black = 97%, perceived trustworthiness = 3.87 on 7-point scale), LF-240 (age = 43, number of raters = 24, perceived Latina = 65%, perceived trustworthiness = 3.75 on 7-point scale), and LM-227 (age = 32, number of raters = 95, perceived Latino = 65%, perceived trustworthiness = 3.67 on 7-point scale). See the Supplemental Material Online for headshots of all six human faces.
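The identification codes above follow a simple pattern (a race letter, a gender letter, a hyphen, and a three-digit image number), which can be decoded as in the sketch below. The helper `parse_face_code` and the `FaceCode` type are hypothetical names, and only the three race prefixes used in this study are handled.

```python
import re
from typing import NamedTuple


class FaceCode(NamedTuple):
    race: str
    gender: str
    image_number: int


# Only the prefixes that appear in this study are mapped here.
RACE = {"W": "white", "B": "black", "L": "Latino/a"}
GENDER = {"F": "female", "M": "male"}


def parse_face_code(code: str) -> FaceCode:
    """Split a code such as 'BF-214' into race, gender, and image number."""
    match = re.fullmatch(r"([WBL])([FM])-(\d{3})", code)
    if match is None:
        raise ValueError(f"Unrecognized face code: {code!r}")
    race, gender, number = match.groups()
    return FaceCode(RACE[race], GENDER[gender], int(number))
```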
For each human face, respondents were asked to “imagine meeting the following stranger for the first time. Please identify how much you would trust this stranger for each of the following.” For IST, respondents were not shown human faces. Instead, respondents were asked to “imagine meeting a total stranger for the first time. Please identify how much you would trust this stranger for each of the following.”
Particular matters
Four separate domains (or matters) of trust were selected for inclusion in the SFT and IST measures. 6 Two domains captured nonpecuniary matters, while two other domains captured pecuniary matters. Three criteria were used to select the four domains: generality, commonality, and cost. First, the four domains were intended to be general; that is, to span specific instances of a particular matter. For example, the act of keeping a secret can cover, in theory, a great number of secrets (e.g., drug addiction, sexual preferences and habits, prior abortion). The goal is not to cover each and every type of secret but to provide respondents with a concrete dealing or action (like keeping a secret regardless of its content) with which to assess strangers. Second, the four domains were intended to be common matters for which people have trusted, frequently do trust, or can easily imagine trusting others. For instance, trusting others to keep secrets is a common feature of social life (Cowan 2014; Small 2017). Third, and finally, the four domains were constructed to be costly for respondents if trust was broken or betrayed. This third criterion was used to capture issues or matters that are salient to respondents in real life. Given these three criteria, the four domains selected for SFT and IST were (a) to keep a secret that is damaging to the respondent’s reputation; (b) to repay a loan of 1,000 dollars; (c) to look after a child, family member, or loved one while the respondent is away; 7 and (d) to provide advice about how best to manage money. 8
Scaling and random ordering
Drawing on measures of particularized trust and generalized trust from the World Values Survey, I chose a 4-point trust scale for each domain. The scale is anchored at Do not trust at all and Trust completely, with Do not trust very much and Trust somewhat between the anchors and a Don’t know option at the end of the scale. For SFT, I randomized the order of the six human faces between respondents and the order of the four domains between human faces. For IST, only the order of the four domains was randomized between respondents. Moreover, SFT and IST constituted unique blocks where the order of each block was randomized between respondents. 9 Figure 1 shows an example of how SFT was presented to respondents, including an example human face, the four trust domains, and the rating scales.

Figure 1. Example of Stranger Face Trust instrument in Qualtrics. The figure includes a human face (BM-021), the four trust domains, and the rating scales.
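The SFT randomization scheme described in this section can be sketched as follows. In the study itself, randomization was handled within the Qualtrics survey; the code below is only an illustration of the logic, and the function name `sft_presentation_order` and the short domain labels are hypothetical.

```python
import random
from typing import List, Tuple

FACES = ["WF-228", "WM-225", "BF-214", "BM-021", "LF-240", "LM-227"]
# Hypothetical short labels for the four trust domains.
DOMAINS = ["secret", "loan", "care", "advice"]


def sft_presentation_order(rng: random.Random) -> List[Tuple[str, List[str]]]:
    """Return one respondent's presentation order for the SFT block.

    Faces are shuffled once per respondent, and the four domains are
    reshuffled independently for every face.
    """
    faces = rng.sample(FACES, k=len(FACES))
    return [(face, rng.sample(DOMAINS, k=len(DOMAINS))) for face in faces]


# Example: one respondent's order, seeded so the draw can be reproduced.
order = sft_presentation_order(random.Random(2018))
```

Passing an explicit `random.Random` instance keeps each respondent's draw reproducible without touching the global random state.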
Measures of Convergent Validity
Convergent validity consists of within and between construct measurement validation. Regarding the former, convergent validity is the degree of confidence we have that a latent construct is well measured by its observed indicators (Campbell and Fiske 1959). Regarding the latter, convergent validity is the degree to which an operationalization is similar to (or converges on) other operationalizations that it theoretically should be similar to (Pedhazur and Schmelkin 1991). Four instruments common to the GSS and WVS originally designed to measure generalized trust, particularized trust, and political trust were used to assess convergent validity between closely related latent constructs: a 3-item Misanthropy Scale (Brehm and Rahn 1997; Paxton 1999; Zmerli and Newton 2008) and a 3-item Generalized Social Trust Scale (Newton and Zmerli 2011) were used to measure generalized trust, a 3-item Particularized Social Trust Scale (Freitag and Traunmüller 2009) was used to measure particularized trust, and a 4-item political trust scale (Newton and Zmerli 2011; Zmerli and Newton 2008) was used to measure political trust.
Misanthropy scale
The misanthropy scale (MST) consists of 3 items scored on a 2-point binary scale. Item 1 (TRUST) asks “Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?” with most people can be trusted and need to be very careful in dealing with people as anchors with a don’t know option. Item 2 (FAIR) asks “Do you think that most people would try to take advantage of you if they got the chance or would they try to be fair?” with most people would try to take advantage of me and most people try to be fair as anchors with a don’t know option. Item 3 (HELP) asks “Would you say that most of the time people try to be helpful or that they are mostly looking out for themselves?” with people mostly try to be helpful and people mostly look out for themselves as anchors with a don’t know option. Don’t know responses were treated as system missing and all 3 items were recoded to parallel polarity (i.e., 1 = most people can be trusted, most people try to be fair, and people mostly try to be helpful).
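The recoding of the three MST items can be sketched as below. The text specifies only that 1 marks the trusting response and that Don’t know is treated as missing; coding the non-trusting response as 0, the response strings, and the function name `recode_mst` are assumptions made for illustration.

```python
from typing import Optional

# The response that is coded 1 for each item (parallel polarity).
TRUSTING_ANSWER = {
    "TRUST": "most people can be trusted",
    "FAIR": "most people try to be fair",
    "HELP": "people mostly try to be helpful",
}
DONT_KNOW = "don't know"


def recode_mst(item: str, response: str) -> Optional[int]:
    """Recode an MST response: 1 = trusting answer, 0 = other, None = missing."""
    if response == DONT_KNOW:
        return None  # treated as system missing
    return 1 if response == TRUSTING_ANSWER[item] else 0
```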
Generalized social trust scale
The generalized social trust scale (GST) consists of 3 items scored on a 4-point scale anchoring at Do not trust at all and Trust completely with Do not trust very much and Trust somewhat in-between the anchors and a Don’t know option at the end of the scale. The question asked “Could you tell me for each whether you trust people from this group completely, somewhat, not very much, or not at all?” with grid items being “people you meet for the first time” (FIRST), “people of another religion” (RELI), and “people of another nationality” (NATION). Don’t know responses were treated as system missing.
Particularized social trust scale
The particularized social trust scale (PST) consists of 3 items scored on a 4-point scale anchoring at Do not trust at all and Trust completely with Do not trust very much and Trust somewhat in-between the anchors and a Don’t know option at the end of the scale. The question asked “Could you tell me for each whether you trust people from this group completely, somewhat, not very much, or not at all?” with grid items being “your family” (FAMILY), “your neighborhood” (NEIGH), and “people you know personally” (KNOW). GST and PST were embedded within the same survey grid. Don’t know responses were treated as system missing.
Political trust scale
The political trust scale (POT) consists of 4 items scored on a 4-point scale anchoring at none at all and a great deal with Not very much and Quite a lot in-between the anchors and a Don’t know option at the end of the scale. The question asked “We are going to list a number of organizations. For each one, could you tell us how much confidence you have in them: Is it a great deal of confidence, quite a lot of confidence, not very much confidence, or none at all?” with grid items being “the police” (POLICE), “the courts” (COURTS), “the government” (GOVERN), and “political parties” (PARTIES). Don’t know responses were treated as system missing.
As a methodological note, while debate exists regarding the validity of MST and GST (see Freitag and Bauer 2013; Uslaner 2015), similar findings were observed for convergent validity when using the standard most-people trust question instead of MST and the “people you meet for the first time” item instead of GST (see the Supplemental Material Online for robustness checks). Hence, I use MST and GST in the present article. Descriptive and summary statistics of the observed indicators of the SFT, IST, MST, GST, PST, and POT latent constructs can be found in Table 2.
Table 2. Sample Descriptive Statistics of Indicators of Latent Factors for SFT, IST, MST, GST, PST, and POT.
Note: SFT = Stranger Face Trust; MST = Misanthropy Scale; IST = Imaginary Stranger Trust; GST = Generalized Social Trust Scale; PST = Particularized Social Trust Scale; POT = Political Trust Scale; WF = white female; WM = white male; BF = black female; BM = black male; LF = Latin female; LM = Latin male.
Measures of Discriminant Validity
Discriminant validity assesses the degree to which an operationalization is not similar to (or diverges from) other operationalizations that it theoretically should not be similar to (Pedhazur and Schmelkin 1991). Discriminant validity, in other words, is the degree to which measures of different traits are unrelated. Four instruments capturing social preferences, risk seeking, betrayal aversion, and social desirability were used to assess discriminant validity: a 9-item social value orientations scale (van Lange 1999; van Lange et al. 1997), a single indicator risk-seeking scale (Fehr 2009), a 2-item betrayal aversion scale (Perugini et al. 2003), and a 10-item social desirability scale (Strahan and Gerbasi 1972).
Social value orientations
The social value orientation instrument consists of nine hypothetical decision scenarios, where participants decide for each scenario how to divide resources between themselves and a hypothetical stranger or “other.” Each scenario includes three options corresponding to one of the three social values or preferences: a cooperative choice, which maximizes joint gain; an individualist choice, which maximizes personal gain without regard to the other’s outcome; and a competitive choice, which maximizes the difference between gains to self and other. Participants were classified as cooperative, individualist, or competitive if they made six or more choices corresponding to one of the social value orientations (missing values were treated as zero). From this, I constructed four dummy variables (N = 2,041): cooperative orientation (58.4 percent), individualist orientation (14.8 percent), competitive orientation (7.3 percent), and a not classified orientation that consists of those not classified as cooperative, individualist, or competitive (19.5 percent). The expectation is that generalized trust (expectations of generalized others) and social preferences (interests in the outcomes of self and other) should be uncorrelated, given that both are theoretically distinct constructs (belief vs. preference).
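The classification rule can be sketched as follows, assuming each of the nine choices has already been labeled with the orientation it corresponds to and that missing choices are stored as `None`. The function name `classify_svo` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

ORIENTATIONS = ("cooperative", "individualist", "competitive")


def classify_svo(choices: Sequence[Optional[str]], threshold: int = 6) -> str:
    """Classify a respondent from nine decision-scenario choices.

    Missing choices (None) count toward no orientation, which mirrors
    treating missing values as zero. With nine scenarios and a threshold
    of six, at most one orientation can qualify.
    """
    counts = Counter(c for c in choices if c in ORIENTATIONS)
    for orientation in ORIENTATIONS:
        if counts[orientation] >= threshold:
            return orientation
    return "not classified"
```

For example, a respondent with five cooperative and four individualist choices falls into the not classified category, because neither count reaches six.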
Risk-seeking
The risk-seeking measure consists of a single item scored on a 7-point scale: “Are you, generally speaking, a person who is fully prepared to take risks or do you try to avoid taking risks?” Responses anchor at avoid taking risks and always taking risks with a neutral option at the center point and a don’t know option at the end of the scale (N = 2,017, M = 2.44, SD = 1.57, min = 0, max = 6). Don’t know responses were treated as system missing. While scholars frequently conflate trust with perceptions of risk (Gambetta 1988), preferences for risk are theoretically different from perceptions of risk. As a result, generalized trust and preferences for risk, like generalized trust and social preferences, should be uncorrelated.
Betrayal aversion
The betrayal aversion instrument consists of 2 items scored on a 7-point bipolar scale anchoring at strongly disagree and strongly agree with a neither agree nor disagree option at the center point and a don’t know option at the end of the scale. The question asked “To what degree do you agree with the following statements?” with grid items being “If I suffer a wrong, I will take revenge as soon as possible, no matter the cost” and “If someone offends me, I will also offend him or her.” Don’t know responses were treated as system missing. The 2 items were summed and divided by two while ignoring missing values (i.e., row means) to form the betrayal aversion scale (N = 2,022, M = 1.98, SD = 1.53, min = 0, max = 6, α = .82). Betrayal aversion implies that individuals dislike unrequited trust. Fehr (2009) argues that people who dislike nonreciprocated trust are prone to punishing nonreciprocating individuals. Instruments that measure negative reciprocity and propensities for revenge should be reasonable proxies for betrayal aversion. And since betrayal aversion is an attitude, it should be uncorrelated with generalized trust.
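The scale construction can be sketched as below: a row mean that ignores missing items, plus the standard Cronbach's alpha formula, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $s_i^2$ are the item variances and $s_T^2$ is the variance of the summed score. The article does not state how missing data were handled when computing α, so restricting the alpha function to complete cases is an assumption, as are the function names.

```python
from statistics import mean, variance
from typing import List, Optional, Sequence


def row_mean(items: Sequence[Optional[float]]) -> Optional[float]:
    """Average the non-missing items for one respondent (None = missing)."""
    observed = [x for x in items if x is not None]
    return mean(observed) if observed else None


def cronbach_alpha(rows: List[Sequence[float]]) -> float:
    """Cronbach's alpha from complete cases (one sequence per respondent).

    Uses sample variances; assumes at least two items and two respondents.
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

A respondent who answered only one of the two items still receives a scale score under `row_mean`, which is consistent with the larger N for this scale than a listwise approach would yield.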
Social desirability
The social desirability instrument is a short-form version of the Marlowe–Crowne Social Desirability Scale (Strahan and Gerbasi 1972). The instrument consists of 10 items scored on a true or false scale. The question asked “Listed below are a number of statements concerning personal attitudes and traits. Read each item and decide whether the statement is true or false as it pertains to your personality. It is best to go with your first and immediate judgment.” The grid items included statements such as “I’m always willing to admit it when I make a mistake,” “I always try to practice what I preach,” “I like to gossip at times,” and “I sometimes try to get even rather than forgive and forget.” Responses to the grid items were summed (missing values were treated as zero), yielding a scale that ranges from 0 to 10 where larger values indicate greater social desirability, that is, the tendency to respond in a way that is deemed more socially acceptable than what a respondent’s “true” answer would dictate (N = 2,041, M = 5.26, SD = 2.13, min = 0, max = 10, α = .61). The scale, in short, measures individuals’ general tendency to respond to survey items in socially desirable ways. Ideally, propensities for social desirability would be uncorrelated with generalized trust.
Measures of Criterion Validity
Criterion validity determines the extent to which an operationalization is related to an outcome that it should theoretically be related to. “Broadly speaking, a criterion is any variable…that one wishes to explain and/or predict by resorting to information from another variable” (Pedhazur and Schmelkin 1991:32). Criterion validity generally consists of two types: concurrent validity and predictive validity. Concurrent validity identifies the magnitude of a relationship between the construct of interest and criterion measurements at the time of the construct’s administration (or measurement). Predictive validity refers to the ability of a construct to predict an outcome (i.e., criterion measurement) into the future. For the present article, I focus on concurrent validity but identify future avenues of research in the discussion and conclusion that address the predictive validity of SFT and IST.
Three instruments measuring political action (Putnam 2000), trusting behavior (Glaeser et al. 2000; Naef and Schupp 2009), and trust in a survey experiment (Robbins 2016a, 2016b, 2016d, 2017) were used to assess criterion validity.
Political action
The political action instrument consists of 5 items scored on a 4-point nominal scale with response options being have done, might do, would never do, and prefer not to say. The question asked “We are going to show you some forms of political action that people can take, and we would like you to tell us, for each one, whether you have done any of these things, whether you might do it, or would never under any circumstances do it?” with grid items for “signing a petition,” “joining in boycotts,” “attending peaceful demonstrations,” “joining strikes,” and “any other act of protest.” Prefer not to say responses were treated as system missing. The “have done” response was treated as 1 while all other responses were treated as 0. Responses to the five grid items were then summed (missing values treated as zero), yielding a scale that ranges from 0 to 5 where larger values indicate greater amounts of political action (N = 2,041, M = 1.20, SD = 1.24, min = 0, max = 5, α = .65).
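As a rough illustration of this count scale, the sketch below recodes each response to 1 for "have done" and 0 otherwise, then sums across the five items. The response strings and the use of `None` for system-missing values are assumptions made for the example, not the coding used in the original data file.

```python
from typing import Optional, Sequence


def political_action_score(responses: Sequence[Optional[str]]) -> int:
    """Count the acts a respondent reports having done.

    "might do", "would never do", and missing responses (None, which
    here also stands in for "prefer not to say") all contribute 0.
    """
    return sum(1 for r in responses if r == "have done")


# Hypothetical respondent: petition and boycott done, one item missing.
answers = ["have done", "have done", "might do", None, "would never do"]
score = political_action_score(answers)
```

Treating missing items as zero keeps every respondent in the scale (hence N = 2,041), at the cost of possibly understating activity among those who skipped items.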
There is a sizable literature connecting generalized trust to political participation (Almond and Verba 1963; Bäck and Christensen 2016; Benson and Rochon 2004; Inglehart and Norris 2003; Kaase 1999; Putnam 2000). Social scientists typically decompose political participation into three components: voting, institutionalized participation (e.g., contacting politicians, working for political campaigns), and noninstitutionalized participation (e.g., signing a petition, joining a boycott). Although far from conclusive, researchers generally find that generalized trust is positively related to noninstitutionalized participation (e.g., Kaase 1999) and either negatively or statistically unrelated to institutionalized participation (e.g., Bäck and Christensen 2016). Given that my measure of political action can be characterized as noninstitutionalized participation, I expect to observe a positive relationship between SFT/IST and the political action scale.
Trusting behavior
The trusting behavior instrument consists of 3 items scored on a 5-point scale ranging from never to infrequently to sometimes to often to very often with a don’t know option at the end of the scale. The question asked “We would like to ask you some questions about prior behaviors. Could you tell us whether you do the following very often, often, sometimes, infrequently, or never?” with grid items for “How often do you lend personal possessions to your friends (tools, books, your car or bicycle, etc.)?” “How often do you lend money to your friends?” and “How often do you leave your door unlocked?” Don’t know responses were treated as system missing. The 3 items were summed and divided by three while ignoring missing values (i.e., row means) to form the trusting behavior scale (N = 2,019, M = 1.50, SD = 0.94, min = 0, max = 4, α = .68). Given that measures of trusting behavior are positively correlated with measures of social trust and behavioral measures of trust observed in the lab (Glaeser et al. 2000; Naef and Schupp 2009), I expect similar relations between the trusting behavior scale and SFT/IST.
Trust mechanic
The trust mechanic instrument is a measure of trust administered in a hypothetical car repair situation. The experimental situation is a modified version of the car repair survey experiment originally developed by Vincent Buskens and colleagues (Buskens 2002; Buskens and Weesie 2000), which was further refined and implemented by Robbins (2016a, 2016b, 2016d, 2017). The survey experiment involves a simulated auto maintenance scenario where an unknown outcome is inherent to the situation. The scenario consists of a truster (the respondent), a trustee (an auto mechanic), a particular matter (auto repairs), and six manipulated dimensions (e.g., age of the trustee).
In the present study, respondents assess a single vignette and report the extent to which they trust the hypothetical auto mechanic to provide justifiable and quality auto repairs. Dimensions (levels) manipulated in the vignette include age (20, 30, 40, 50, or 60 years of age), race (white, black, Hispanic, or Asian), gender (male or female), perceived internal motivations (no prior interaction, prior interaction, encapsulated interests, or goodwill), contract (BLANK, nonbinding contract, or binding contract), and regulation (no regulations, non-monetary regulations, or monetary regulations). For more detail regarding the car repair survey experiment and the manipulated dimensions, see Robbins (2016a, 2016b, 2016d, 2017).
The trust mechanic instrument consists of a single item scored on a 9-point bipolar scale anchoring at completely distrust and completely trust with a neither trust nor distrust option at the center point and a don’t know option at the end of the scale (N = 2,001, M = 6.05, SD = 2.05, min = 0, max = 8). The question asked of the vignette, “Given the conditions above, to what extent do you trust the auto mechanic to provide justifiable and quality auto repairs?” Don’t know responses were treated as system missing. Robbins (2016a) recently found that the most-people trust question and a version of the GST correlate with responses to the trust mechanic instrument. Similar relations should be observed between SFT/IST and the trust mechanic scale.
Measures of Sociodemographic Characteristics and Predictors of Generalized Trust
In addition to the construct and criterion validation measures listed above, I included a number of other measures to explore their association with the SFT and IST scales, since previous studies have shown that some biographical and sociodemographic characteristics predict levels of generalized trust (Alesina and La Ferrara 2002; Freitag and Traunmüller 2009; Hooghe and Oser 2017; Hooghe et al. 2009; Kumlin and Rothstein 2005; Leigh 2006; Simpson 2006). This exercise was done not only to identify similarities and differences in predictors of SFT, IST, MST, and GST but also to verify whether differences exist between predictors of MST and GST observed in the present study and those found in prior research.
With respect to sociodemographic characteristics, I included measures of age (in years), gender (male, female, and other), marital status (married, divorced, separated, never married, living with partner, and none of the above), religious denomination (protestant, catholic, Jewish, eastern religion, Muslim, orthodox, other religion, and no religion), race–ethnicity (non-Hispanic white, non-Hispanic black, non-Hispanic Asian, non-Hispanic other, non-Hispanic 2+ races, and Hispanic), education (no HSD, HSD, some college or AA, and BA+), region (northwest, Midwest, northeast, and south), income (logged per capita household income), and employment status (working and not working).
Regarding biographical characteristics, I included measures of religious attendance (6-point scale of attend religious services never, once a year or less, a few times a year, once or twice a month, once a week, and more than once a week), religiosity (4-point scale of not at all religious, not very religious, somewhat religious, and very religious), party affiliation (7-point unfolding scale of strong Democrat, moderate Democrat, lean Democrat, don’t lean/Independent/None, lean Republican, moderate Republican, and strong Republican), associational memberships (count of active memberships in a church or religious organization, a sport or recreation organization, an art, music or educational organization, a labor union, a political party, an environmental organization, a professional organization, a charitable or humanitarian organization, a consumer organization, and any other organization), and prior trust rewards (7-point scale of trust in the past generally exploited or rewarded). All survey items used in the present article can be found in the Supplemental Material Online.
Analytic strategy
For all measurement validation models, confirmatory factor analysis (CFA) and structural equation models (SEMs) were implemented and estimated in Mplus 8.1 (Muthén and Muthén 2017). Since estimated CFAs and SEMs consisted of categorical (binary and ordinal) observed indicators, two estimation techniques are preferred: robust maximum likelihood (MLR) with numerical integration and mean-and-variance adjusted weighted least squares (WLSMV). Although MLR performs better than WLSMV under most conditions (e.g., small samples, skewness, and kurtosis; Flora and Curran 2004; Lei 2009), the results presented use WLSMV. This was done for two reasons. First, Lei (2009) found that when sample sizes are at least 250, MLR and WLSMV perform similarly regardless of skewness, a condition met in the current study. Second, χ² and related fit statistics are unavailable for MLR with numerical integration but are available for WLSMV. Because model fit is central to measurement validation, WLSMV was used.
MLR missing data procedures, however, are superior to those of WLSMV. Note that missing data for MLR are handled with maximum likelihood, which produces asymptotically unbiased estimates when data are missing completely at random (MCAR) or missing at random (MAR) and is even preferable when data are missing not at random (MNAR; Allison 2001). Missing data for WLSMV, on the other hand, are handled with pairwise deletion, which produces unbiased estimates when data are MCAR and biased estimates when data are MAR and MNAR (Little 1992). The χ² test for MCAR was rejected across all estimated models, showing that the data are not MCAR. As a consequence, the Supplemental Material Online details the results of all models estimated in the present article using MLR with numerical integration in which missingness was modeled in the X and Y variables (i.e., variances for all X variables were freely estimated). The results were largely consistent across WLSMV and MLR with numerical integration.
Results
Characteristics of the Sample
The final sample consisted of 2,041 completed surveys. The mean age was 47.19 (SD = 16.62, min = 18, max = 99). The majority of participants were white (81.3 percent), female (50.4 percent), and married (54.2 percent). Almost half of the participants had a bachelor’s degree or greater (45.2 percent), more than half were working (55.48 percent), and the most frequently observed denomination was protestant (45.57 percent). The sample was mostly from the Southern (38.65 percent) and Midwestern states (24.44 percent) with a median household income of US$67,500. On average, participants attended religious services a few times a year (M = 1.97, SD = 1.73, min = 0, max = 5), were not very to somewhat religious (M = 1.52, SD = 1.03, min = 0, max = 3), leaned Democrat (M = 3.88, SD = 2.10, min = 1, max = 7), were not active in many associations (M = 1.40, SD = 1.68, min = 0, max = 10), and generally had their trust rewarded in the past (M = 3.14, SD = 1.55, min = 0, max = 6).
Convergent Validity and Discriminant Validity of SFT and IST
I applied CFA to analyze the latent structure of the 6-item SFT and 4-item IST scales (see Table 2 for summary statistics). For SFT, respondents assessed four domains for which to trust each stranger’s face (SECRET, LOAN, CHILD, and ADVICE). The four domains of each face were summed and divided by four while ignoring missing values (i.e., row means) to form six separate scales for white female (WF; N = 1,813, M = 0.78, SD = 0.82, min = 0, max = 3, α = .92), white male (WM; N = 1,819, M = 0.57, SD = 0.73, min = 0, max = 3, α = .92), black female (BF; N = 1,799, M = 0.77, SD = 0.83, min = 0, max = 3, α = .82), black male (BM; N = 1,810, M = 0.80, SD = 0.85, min = 0, max = 3, α = .93), Latin female (LF; N = 1,808, M = 0.75, SD = 0.82, min = 0, max = 3, α = .92), and Latin male (LM; N = 1,805, M = 0.75, SD = 0.82, min = 0, max = 3, α = .92). Each of the scales for WF, WM, BF, BM, LF, and LM was treated as observed continuous indicators of an SFT latent construct.
To investigate internal consistency, convergent validity, and discriminant validity, a number of measurement validation statistics common to classical test theory were used. Cronbach’s α, which is a lower-bound estimate of the reliability of measurement indicators, was employed to assess internal consistency (Cronbach 1951). Cronbach’s α values range between 0 and 1, with values less than .60 suggesting poor reliability and values greater than .70 indicating acceptable reliability (Nunnally 1978). For convergent validity of observed indicators, standardized factor loadings less than .70 suggest questionable fit since more than 50 percent (i.e., 1 − .70² = .51) of the variance in an observed indicator is explained by factors other than the latent construct to which the observed indicator is theoretically related (Kline 2005). Standardized factor loadings greater than .70 suggest convergent measurement validation (Campbell and Fiske 1959). Drawing on the criterion of Fornell and Larcker (1981), the convergent validity of a measurement model should also be assessed by the average variance extracted (AVE), which measures the level of variance captured by a construct versus the level due to measurement error. The AVE is defined as the sum of variances extracted (i.e., R²) from each indicator of a latent construct over the total number of indicators of a latent construct. AVE values greater than .70 are considered very good, while values of .50 are acceptable.
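The AVE definition above can be made concrete with a short sketch. It assumes that each indicator loads on a single latent construct, so that an indicator's R² equals its squared standardized loading; the function name and the example loadings are hypothetical and are not taken from the article's tables.

```python
from typing import List, Sequence


def average_variance_extracted(std_loadings: Sequence[float]) -> float:
    """Mean of the indicators' R-squared values for one latent construct.

    Assumes simple structure, so R-squared = (standardized loading) ** 2.
    """
    if not std_loadings:
        raise ValueError("at least one standardized loading is required")
    r_squared: List[float] = [loading ** 2 for loading in std_loadings]
    return sum(r_squared) / len(r_squared)


# Hypothetical standardized loadings for a four-indicator construct.
example_loadings = [0.90, 0.85, 0.95, 0.88]
ave = average_variance_extracted(example_loadings)

# A construct whose loadings all sit exactly at the .70 cutoff
# extracts just under half of the indicator variance.
ave_at_cutoff = average_variance_extracted([0.70] * 4)
```

The second call shows why loadings of .70 and an AVE of about .50 are treated as matching thresholds: loadings at the cutoff imply that the construct accounts for roughly half of the variance in its indicators.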
To establish discriminant validity, correlations between latent constructs less than .85 indicate discriminant validity; correlations greater than .85 tell us that the constructs overlap and are likely measuring the same latent concept (Campbell and Fiske 1959). According to Fornell and Larcker (1981), discriminant validity should also be assessed by comparing the amount of the variance captured by a latent construct (i.e., AVE) to the shared variance with other constructs. Thus, discriminant validity is established when the levels of the AVE for each latent construct are greater than the squared correlation between latent constructs.
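Both discriminant validity checks reduce to simple comparisons, sketched below. The function names are hypothetical; the input values are the AVEs and the latent correlation that the article reports for SFT and IST in its two-factor CFA results.

```python
def below_correlation_cutoff(corr: float, cutoff: float = 0.85) -> bool:
    """Campbell and Fiske style check: latent correlation under the cutoff."""
    return abs(corr) < cutoff


def fornell_larcker(ave_a: float, ave_b: float, corr_ab: float) -> bool:
    """Fornell and Larcker check: each AVE exceeds the shared variance."""
    shared_variance = corr_ab ** 2
    return ave_a > shared_variance and ave_b > shared_variance


# Values reported in the article for the SFT and IST latent constructs.
ave_sft, ave_ist, corr_sft_ist = 0.809, 0.870, 0.718

passes_cutoff = below_correlation_cutoff(corr_sft_ist)
passes_fornell_larcker = fornell_larcker(ave_sft, ave_ist, corr_sft_ist)
```

With these inputs both checks pass: the correlation is below .85, and the squared correlation (about .52) is smaller than either AVE.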
Overall goodness of fit of the models was assessed with the root mean squared error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR) using the WLSMV estimator. A hypothesized model that perfectly captures the observed data should produce CFI values equal to 1.0 and RMSEA and SRMR values equal to 0. CFI values greater than .90, RMSEA values less than .08, and SRMR values less than .08 suggest adequate fit (Hu and Bentler 1999; Muthén and Muthén 2017).
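The fit cutoffs above can be applied mechanically, as in the following sketch. The function name is hypothetical, and the two sets of fit statistics are those reported in the article for the two-factor CFA of SFT and IST and for the six-factor CFA of all trust constructs.

```python
from typing import Dict


def adequate_fit(rmsea: float, cfi: float, srmr: float) -> Dict[str, bool]:
    """Compare each index with the cutoffs stated in the text."""
    return {
        "RMSEA": rmsea < 0.08,  # smaller is better
        "CFI": cfi > 0.90,      # larger is better
        "SRMR": srmr < 0.08,    # smaller is better
    }


# Fit statistics reported in the article.
two_factor = adequate_fit(rmsea=0.037, cfi=0.995, srmr=0.013)
six_factor = adequate_fit(rmsea=0.085, cfi=0.897, srmr=0.069)
```

The two-factor model meets all three cutoffs, whereas the six-factor model meets only the SRMR cutoff and narrowly misses the RMSEA and CFI cutoffs, which matches the article's description of its fit as questionable but marginally reasonable.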
Table 3 shows the results of a two-factor CFA in which observed indicators of SFT were treated as continuous and observed indicators of IST were treated as categorical. As Table 3 shows, all standardized factor loadings were well above the standard cutoff value of .70 with most greater than .90. The AVEs for SFT (.809) and IST (.870) were also well above the standard cutoff of .70. Moreover, the reliabilities (Cronbach’s α) for the SFT and IST latent constructs were .96 and .93, respectively, suggesting excellent internal consistency (see Table 2). By all accounts, SFT and IST revealed strong convergent validity. Regarding discriminant validity, the correlation between SFT and IST was strong (r = .718) and statistically significant (p < .0001). In addition, the AVEs for SFT (.809) and IST (.870) were greater than the squared correlation between SFT and IST (.515). These findings suggest that although the SFT and IST latent constructs were highly related and substantially overlapped, they constituted distinct constructs. Finally, the RMSEA (.037) and SRMR (.013) were well below, and the CFI (.995) well above, common cutoff values, suggesting adequate model fit.
Table 3. Confirmatory Factor Analysis of SFT and IST.
Note: N = 1,985. Unstandardized factor loadings (robust standard errors) [
*p < .05. **p < .01. ***p < .001.
Additional robustness checks and alternative modeling specifications showed that (a) a higher-order CFA yielded similar results to those presented in Table 3, (b) a single-factor CFA fit worse than the two-factor CFA presented in Table 3, (c) SFT was not solely driven by latent expectations regarding specific trust domains but, instead, individuals assessed the trustworthiness of each stranger’s face given the matters at hand and the persons under evaluation, and (d) SFT was not driven by social desirability bias. These additional robustness checks and alternative modeling specifications can be found in the Supplemental Material Online.
Discussion
The analysis of internal consistency indicated excellent reliability of SFT and IST as a whole. In addition, convergent validity of the observed indicators was established with factor loadings and AVEs well above standard .70 cutoff values. The analysis of discriminant validity indicated substantial overlap between SFT and IST, with Corr(SFT, IST) > .70, but not enough overlap to indicate a common underlying construct that accounts for both SFT and IST (as shown by the AVEs and squared correlation between SFT and IST).
Convergent Validity and Discriminant Validity of Latent Trust Constructs
To assess convergent and discriminant validity between latent trust constructs, I applied CFA to analyze the degree to which the 6-item SFT and 4-item IST scales correlated with the 3-item MST, the 3-item GST, the 3-item PST, and the 4-item POT scales.
Table 4 shows the results of six-factor CFAs in which observed indicators of SFT were treated as continuous, while observed indicators of IST, MST, GST, PST, and POT were treated as categorical. For SFT and IST, the standardized factor loadings and AVEs were again well above the standard cutoff of .70. While MST, GST, PST, and POT yielded some standardized factor loadings below .70, most were well above .70. Likewise, all of the AVEs for MST, GST, PST, and POT were below .70 but above acceptable values of .50 (with the exception of PST). Moreover, the reliabilities (Cronbach’s α) for the MST (α = .69), GST (α = .74), PST (α = .64), and POT (α = .77) latent constructs suggested reasonable internal consistency. Unlike the two-factor CFA for SFT and IST, the current six-factor CFA yielded questionable (although marginally reasonable) overall model fit, χ²(215) = 3,398.213, p < .001, RMSEA = .085, CFI = .897, SRMR = .069. Note that the goal of the current CFA was not to maximize overall model fit but to investigate how well SFT and IST correlate with common instruments of generalized, particularized, and political trust.
Table 4. Confirmatory Factor Analysis Assessing Convergent and Discriminant Validity.
Note: N = 2,031. Unstandardized factor loadings (robust standard errors) [
*p < .05. **p < .01. ***p < .001.
Regarding convergent validity, results indicated that SFT and IST moderately correlated with MST and GST (the two most common generalized trust scales found in the literature). The correlations between SFT/IST and MST/GST were less than .46, which was substantially less than Corr(SFT, IST) and Corr(MST, GST). This indicated that SFT and IST had more conceptual overlap with each other than with MST or GST (which, in turn, had more conceptual overlap with each other than with SFT or IST). Moreover, research shows that MST and GST consistently correlate with measures of particularized trust (PST) and political trust (POT), a finding that was replicated here for MST and GST and that also extended to SFT and IST.
Beyond convergent validity, the correlations between latent constructs revealed additional insights. Corr(SFT, IST) was greater than .70 but less than .85, again suggesting substantial overlap but construct discrimination between SFT and IST (a finding that was supported by comparing the AVEs of SFT and IST with the squared correlation between SFT and IST). Interestingly, the intercorrelations between MST, GST, PST, and POT ranged between .43 and .77, suggesting that MST, GST, PST, and POT were capturing different latent constructs but with substantial overlap (a finding that was supported by comparing the AVEs of MST, GST, PST, and POT to their respective squared correlations). This is in contrast to both SFT and IST, which yielded intercorrelations with MST, GST, PST, and POT between values of .22 and .46. In fact, the correlations between latent constructs revealed that SFT/IST were consistently less correlated with PST and POT than were MST/GST.
Additional robustness checks and alternative modeling specifications can be found in the Supplemental Material Online. The various robustness checks supported the overall findings presented here.
Discussion
The analysis of convergent validity indicated that SFT and IST were associated with common generalized trust instruments (MST and GST) as well as instruments for closely related but theoretically divergent concepts (PST and POT). The analysis further revealed that the discriminant validity of SFT/IST was superior to MST/GST in relation to particularized trust (PST) and political trust (POT). The results also showed that SFT and IST capture different underlying constructs than MST and GST.
Discriminant Validity
To assess discriminant validity, I used SEM to analyze the degree to which the SFT, IST, MST, and GST scales were a function of social value orientations, risk-seeking, betrayal aversion, and social desirability. I used a multivariate SEM in which SFT, IST, MST, and GST were treated as endogenous latent factors that were regressed on exogenous observed variables for social value orientations, risk-seeking, betrayal aversion, and social desirability. Residual variances for SFT, IST, MST, and GST were allowed to freely covary (not shown).
Table 5 shows the results of the multivariate SEM in which observed indicators of SFT were treated as continuous, while observed indicators of IST, MST, and GST were treated as categorical. Results indicated that SFT was not driven by social value orientations (p > .05), while those with individualist (MST and GST) and competitive (IST and GST) orientations yielded lower generalized trust than those with cooperative orientations (see Table 5). Although the directions of effect were all positive, risk-seeking was statistically nonsignificant for SFT (β = .031, p > .10) but statistically significant for IST (β = .065, p < .01), MST (β = .069, p < .05), and GST (β = .084, p < .001). Betrayal aversion, on the other hand, was positive and statistically significant for SFT (β = .202, p < .001) and IST (β = .308, p < .001). The effect of betrayal aversion, however, was reversed for MST (β = −.123, p < .05) and was statistically nonsignificant for GST (β = .001, p > .10). Finally, social desirability was positive and statistically significant for SFT, IST, MST, and GST (see Table 5).
Table 5. Multivariate Structural Equation Model Assessing Discriminant Validity of SFT and IST with MST and GST as Benchmarks.
Note: N = 2,008. Unstandardized coefficients (robust standard errors) [
*p < .05. **p < .01. ***p < .001.
Discussion
SFT was weakly associated with betrayal aversion and social desirability. The results suggest that individuals averse to nonreciprocated trust or individuals who have a tendency to respond to surveys in socially acceptable ways score higher on the SFT scale. IST, MST, and GST, on the other hand, were weakly associated with social value orientations, risk-seeking, betrayal aversion (IST and MST only), and social desirability. The results suggest that SFT was better able to discriminate from social value orientations and risk-seeking than IST, MST, and GST. All four scales, however, were weakly correlated with social desirability, with SFT yielding the smallest standardized effect of social desirability and IST yielding the largest.
Criterion Validity
To assess criterion validity, I used SEM to analyze the degree to which the political action, trusting behavior, and trust mechanic instruments were a function of each respective latent construct. More specifically, I used a multivariate SEM in which political action, trusting behavior, and trust mechanic were treated as endogenous observed variables that were regressed on exogenous latent factors for SFT (models 1 and 2), IST (models 3 and 4), MST (models 5 and 6), and GST (models 7 and 8) with and without control variables. Residual variances for political action, trusting behavior, and trust mechanic were allowed to freely covary (not shown). A single multivariate SEM with all three endogenous observed variables and all four exogenous latent factors was not shown due to issues of multicollinearity between the constructs (results available upon request).
Table 6 shows the results of the multivariate SEM. Results indicated that SFT and IST were good indicators of noninstitutionalized political participation, behaviors that involve trust and unknown outcomes, and trusting a hypothetical auto mechanic to repair a hypothetical car. Similar results were observed for MST and GST. The results were largely consistent with and without control variables. Regarding effect sizes, the standardized βs of SFT, IST, and MST when predicting political action were roughly similar, while GST appeared to overpredict political action in relation to SFT and IST. The effects of SFT, IST, and GST on trusting behavior were roughly similar as indicated by the standardized βs, while MST tended to underpredict trusting behavior in relation to SFT and IST. Finally, the effects of SFT and IST on trust mechanic yielded similar standardized βs, while MST and GST tended to overpredict trust mechanic in relation to SFT and IST.
Table 6. Multivariate Structural Equation Models Assessing Criterion Validity of SFT and IST with MST and GST as Benchmarks.
Note: Unstandardized coefficients (robust standard errors) [
*p < .05. **p < .01. ***p < .001.
Discussion
The results suggest that SFT and IST perform as well as MST and GST in predicting outcomes that they should, theoretically, be able to predict. Depending on the outcome, MST and/or GST yielded upwardly or downwardly biased estimates in relation to SFT and IST.
Associations with Sociodemographic Characteristics and Classic Predictors of Generalized Trust
To explore common associations and classic predictors of generalized trust, I used SEM to analyze the degree to which the SFT, IST, MST, and GST scales were related to various sociodemographic and biographical characteristics. More specifically, I used SEM in which SFT, IST, MST, and GST were treated as endogenous latent factors that were regressed on exogenous observed sociodemographic characteristics. Each latent factor was estimated in a separate SEM, given that a multivariate SEM in which each latent factor served as an endogenous variable failed to converge regardless of the estimation procedure (be it MLR or WLSMV). As before, observed indicators of SFT were treated as continuous, while observed indicators of IST, MST, and GST were treated as categorical.
Table 7 shows the results of the four SEMs outlined above. The results indicated that SFT and IST have common predictors: age, gender, religious attendance, party affiliation, and associational memberships consistently predicted SFT and IST. That is, older adults, women, individuals who do not attend church services, Republicans, and individuals without associational memberships have less generalized trust (as indicated by SFT and IST) than younger adults, men, individuals who frequently attend church, Democrats, and individuals who are members of many associations, respectively. Between SFT and IST, the only inconsistent predictors were religious affiliation, religiosity, race–ethnicity, and region. All other variables, such as income and education, yielded statistically nonsignificant effects for both SFT and IST.
Table 7. Structural Equation Models Assessing Statistical Associations of SFT and IST with MST and GST as Benchmarks.
Note: Unstandardized coefficients (robust standard errors) [
*p < .05. **p < .01. ***p < .001.
In comparison to MST and GST, I found that party affiliation was the one variable that consistently related to all four scales. The finding indicated that as one moves from strong Democrat to strong Republican, generalized trust decreases for all four scales. While I found that age was significantly related to SFT, IST, and MST, the direction of effect was negative for SFT (β = −.177, p < .001) and IST (β = −.248, p < .001) but positive for MST (β = .184, p < .001) and GST (β = .061, p > .10). Other less consistent findings were that women tend to generate less generalized trust than men (SFT, IST, and MST), that associational memberships increase generalized trust (SFT, IST, and GST), that a greater frequency of religious attendance increases generalized trust (SFT, IST, and GST), and that trust being rewarded in the past increases generalized trust (MST and GST). Finally, it appears that marital status, religious denomination, race, education, region, and household income failed to generate consistent effects across all four scales, while employment status was not significantly related to any of the generalized trust scales.
Discussion
The results suggest that SFT and IST were driven by predictors that have been shown to correlate with common measures of generalized trust in prior research. Such variables include age, gender, religious attendance, party affiliation, and associational memberships. Similar effects were observed and replicated for MST and GST, but with noticeable differences: Age was negatively related to SFT/IST but positively related to MST/GST. Most other sociodemographic and biographical characteristics failed to generate consistent effects. Taken together, prior findings regarding gender, religious attendance, party affiliation, and associational memberships were robust to valid and reliable measures of generalized trust (Glanville et al. 2013; Hooghe and Oser 2017; Mewes 2014; Paxton 2007; Welch et al. 2004). Other correlates of generalized trust have been called into question, particularly age (Clark and Eisenstein 2013; Robinson and Jackson 2001; Schwadel and Stout 2012; Wilkes 2011) and education (Oskarsson et al. 2017).
Discussion
Over the last half-century, the concept of generalized trust has been elevated to an important line of inquiry, both as a cause and as a consequence of various social processes and as an indicator of social capital and collective efficacy. As a result, generalized trust is well worth assessing accurately and investigating in detail. Two new valid and reliable measures of generalized trust were proposed, SFT and IST, that can provide a criterion by which to efficiently measure and study the dimensional expanse of generalized trust. SFT and IST differ from common measures of generalized trust in two important respects. First, SFT and IST specify general and common matters (or domains) in which to trust others. Second, SFT provides respondents with faces of specific strangers to trust, while IST requires respondents to imagine meeting a total stranger for the first time. By measuring these dimensional elements—particular strangers and specific matters—I address issues of measurement inequivalence that plague survey items common to the generalized trust literature. SFT was created as a long-form questionnaire of generalized trust, while IST was created as a short-form questionnaire that can be easily implemented by the research community.
The goal of the present study was to assess the validity and reliability of SFT and IST alone and in relation to each other, with common generalized trust instruments—MST and GST—serving as benchmarks. SFT and IST achieved high overall internal consistency and convergent validity of observed indicators. Although SFT and IST exhibited considerable conceptual overlap, both scales demonstrated discriminant validity when tested against each other. While SFT and IST displayed reasonable convergent validity when tested against MST and GST—and were more divergent from other closely related trust concepts (particularized trust and political trust) than MST and GST—the four scales formed unique latent clusters. SFT and IST were strongly correlated; MST and GST were strongly correlated. But the intercorrelations between SFT/IST and MST/GST were stronger within clusters than between clusters. SFT and IST thus captured different underlying constructs than MST and GST.
SFT demonstrated good discriminant validity as it was unrelated to measures of risk-seeking and social value orientations and weakly related to betrayal aversion and social desirability. SFT was thus separable from preference aversions and enhancements to personal image and was not inflated due to social preferences. IST, MST, and GST exhibited worse discriminant validity than SFT but were weakly related to measures of preference aversions, social preferences, and social desirability. SFT and IST displayed solid criterion validity and served as accurate and consistent predictors of political action, trusting behavior, and assessments of trust in an experimental context. MST and GST also predicted all three outcomes. But in relation to SFT and IST, MST and GST demonstrated upwardly biased or downwardly biased estimates depending on the outcome. Finally, SFT and IST were associated with common correlates of MST and GST, indicating that some prior findings—such as the relation between generalized trust and associational memberships (Glanville et al. 2013; Paxton 2007)—were robust to valid and reliable measures of generalized trust. Other correlates like the age–trust relation (Clark and Eisenstein 2013; Robinson and Jackson 2001; Schwadel and Stout 2012; Wilkes 2011) were called into question.
Implications
Overall, the results indicated that SFT and IST were valid and reliable instruments and that MST and GST are likely driven by different underlying constructs. These findings have two broad implications.
First, a long-standing debate in the literature concerns whether expectations about the cooperativeness and helpfulness of strangers (i.e., generalized trust) are grounded in personal experience and the perceived trustworthiness of others (Cook et al. 2005; Paxton and Glanville 2015) or a function of the cultural transmission of morals and values that reflects how an individual should perceive and ought to behave toward others (Uslaner 2002). Some scholars, in other words, contend that generalized trust is “moralistic”: a concept that depends on the norms and values of one’s culture (Almond and Verba 1963; Fukuyama 1995).
The present results demonstrated that prosocial values—a key dimension of morality (Simpson and Willer 2015)—were moderately associated with common generalized trust scales (as well as IST) but were statistically unrelated to SFT. This finding coupled with the tests of convergent and discriminant validity suggests that all four measures capture general expectations about the cooperativeness of strangers, but that once a measure considers the dimensional expanse of generalized trust—specific strangers and particular matters—then the moralistic foundations of generalized trust disappear. By not addressing issues of measurement equivalence, generalized trust scales such as IST and MST may necessarily confound prosocial values and moral motivations with generalized trust. Expectations about the cooperativeness of strangers, on the other hand, appear to constitute a unique latent core of SFT. This is an important distinction to make as operationalizations of a concept should disentangle the causes of said concept from the words and meanings used to define said concept (Cohen 1980). Overall, I observe weak support for the idea that generalized trust—when properly and accurately measured—is a function of prosocial values and moral motivations.
Second, the results suggested that issues of invalidity and unreliability were a problem for common generalized trust scales. Not only were MST and GST less reliable than SFT and IST, but MST and GST exhibited poorer convergent validity and discriminant validity than SFT and IST. Insofar as SFT and IST are valid and reliable measures of generalized trust, the results call into question prior findings regarding the causes and consequences of generalized trust. Although all four scales predicted the three outcomes under study (political action, trust behavior, and trust mechanic), the effect sizes of MST and GST did not parallel the effect sizes of SFT and IST. Estimates of MST and GST were either upwardly or downwardly biased. While speculative, the findings suggest that the effects observed in prior research linking generalized trust to behavior in trust games might be overstated (Fehr et al. 2002; Gächter, Herrmann, and Thöni 2004; Sapienza, Toldra-Simats, and Zingales 2013) or that the relation between generalized trust and political participation is biased (Almond and Verba 1963; Inglehart and Norris 2003; Putnam 2000). Obviously more work is required to determine whether this is the case.
Moreover, the findings revealed that common predictors of generalized trust were inconsistently related to SFT/IST and MST/GST. Most notably, generalized trust as measured by SFT and IST was negatively related to age, while MST and GST were positively related to age. The implications of this finding are significant. Putnam’s (2000) now classic work suggests that social capital generally, and generalized trust more specifically, is on the decline in the United States, with age–period–cohort analyses largely supporting Putnam’s claim (Clark and Eisenstein 2013; Robinson and Jackson 2001; Schwadel and Stout 2012; Wilkes 2011). This collection of work, which uses the most-people trust question, consistently shows that older generations are more trusting than younger generations (a finding replicated in the current study with MST and GST). SFT and IST, in contrast, suggest the opposite: Older generations are less trusting than younger generations, a finding that contradicts most work in this area. While the current study is unable to fully address how SFT/IST varies by age, period, and cohort in the same model (this would require repeated observations of SFT and IST through time), it (a) calls into question how generalized trust and age are related and (b) suggests that efforts should be made to implement more valid and reliable indicators of generalized trust in common public opinion surveys such as the GSS, ESS, and WVS. Finally, given the constellation of other outcomes and predictors that were not included in the present study, future research should probe the similarities and differences between SFT/IST and MST/GST with other instruments known to correlate with generalized trust.
What, then, are the practical implications of applying SFT and IST to one’s own survey research or to common public opinion surveys? First and foremost, SFT is costly in terms of time. Embedding SFT in an already long survey may be impractical and infeasible, especially for a concept like generalized trust where survey space may be better devoted to questions assessing contemporary social problems such as attitudes toward migrants or experiences of discrimination. If survey duration is not a concern, then SFT should be used given that it is slightly more valid and reliable than IST. If survey duration is an issue, then IST is a suitable alternative with caveats: IST exhibits poorer discriminant validity than SFT, and IST suffers from greater social desirability bias than SFT. Yet IST is preferable to MST and GST given that SFT and IST exhibit comparable internal consistency, convergent validity, and criterion validity (with slightly better assessments of measurement validity and reliability for SFT). In either case, the use of SFT or IST instead of MST and GST would improve how generalized trust is operationalized and, more importantly, reduce measurement error.
Directions for Future Research
A number of research directions remain for the measurement validation of SFT and IST. First, test–retest reliability was not established. Future work should investigate this important element of measurement validation by administering SFT and IST to a sample of respondents at time 1 and then four to six weeks later at time 2 (assuming that respondents’ true level of generalized trust did not change from time 1 to time 2). Second, while criterion validity was established, predictive validity—or the ability of a construct to predict an outcome into the future—was not. As a result, future research should investigate the predictive validity of SFT and IST, possibly with self-reports of political action six months after the administration of the instruments or with behavioral measures of trust observed in the classic trust game. Both endeavors would bolster the criterion validity of SFT and IST. Third, although representative, the data used in the present study were based on nonprobability sampling techniques. Future research that collects probability-based samples, such as the National Opinion Research Center’s AmeriSpeak webpanel, would be a welcome addition to the literature. Fourth, SFT—and IST to a lesser extent—might promote satisficing due to its length. Respondents may not make sufficient effort in answering SFT optimally and provide satisfactory answers requiring least effort instead, which if present could systematically bias responses (Krosnick 1991). Given that satisficing is commonly observed among respondents with low cognitive ability, low educational background, and little motivation (Holbrook, Green, and Krosnick 2003; Narayan and Krosnick 1996), future research should investigate the cognitive demands required of SFT and IST.
Fifth, the cross-cultural equivalence of SFT and IST should be established, especially since the content of the faces and trust domains is largely applicable to Western nations (mainly the United States). It is imperative for future research to refine IST and SFT for cross-cultural administration and comparison. Making adjustments to IST should be fairly uncomplicated (e.g., changing the currency of the LOAN indicator). Adjustments to the SFT scale, on the other hand, may prove more difficult. The selection of faces would ideally be standardized across cultures and may require computer-generated faces with varying ages, genders, and shades of skin tone. Ideally, SFT and IST would be validated against these cross-cultural instruments. Alternatively, given that IST is highly correlated with SFT, researchers could establish the “radius-of-trust” and measurement equivalence of IST across cultures. Regardless of the chosen direction, future research looks bright for the analysis and validation of SFT and IST.
Sixth, measures of generalized trust are commonly associated with contextual variables that reside at the neighborhood, regional, and national levels. Contextual effects include moral communities (Traunmüller 2011), violent crime (Messner et al. 2004), ethnic diversity (Dinesen and Sønderskov 2015), and income inequality (Fairbrother and Martin 2013). It would be worthwhile for future studies to explore the relation between common contextual correlates of generalized trust and SFT and IST. Given that there is considerable overlap, but also noticeable differences, in which variables are associated with SFT/IST and MST/GST, my expectation is that contextual variables will be no different. But this is a question requiring empirical scrutiny that I reserve for future research.
Finally, on a theoretical note, generalized trust has yet to be embedded within a general theoretical framework. It is difficult to identify and delineate the domains and circumstances—or scope conditions—under which generalized trust matters for the formation of relational trust and, more importantly, observed behavior. For instance, would high generalized trusters trust anyone to remove a brain tumor to a greater extent than low generalized trusters? My guess would be no, and that more work is required to specify the scope conditions in which generalized trust applies. Abstractly, some conditions might include the level of uncertainty or the perceived cost of an outcome. Because of this theoretical hole, we need more empirical work detailing the matters and domains to which generalized trust applies. Only then can we outline the theoretical utility of generalized trust.
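As a hedged illustration of the test–retest check proposed among the research directions above, the following Python sketch correlates scale scores from two administrations. It is not part of the original study. The data are simulated, and the function names (`pearson_r`, `simulate_two_waves`), the sample size, and the noise level are assumptions chosen only for illustration. In practice, the two lists would hold each respondent's SFT or IST scale score at time 1 and at time 2.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def simulate_two_waves(n, noise_sd, seed=0):
    """Hypothetical data: a stable true score plus independent error per wave.

    This encodes the assumption that respondents' true level of generalized
    trust does not change between time 1 and time 2.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    wave1 = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    wave2 = [t + rng.gauss(0.0, noise_sd) for t in true_scores]
    return wave1, wave2


# Simulated scores only; replace with observed time 1 and time 2 scale scores.
wave1, wave2 = simulate_two_waves(n=500, noise_sd=0.5)
retest_r = pearson_r(wave1, wave2)
print(f"test-retest r = {retest_r:.2f}")
```

Under these simulation assumptions, the correlation between waves estimates the share of score variance that is stable across administrations. With real data, a low value could reflect either measurement error or genuine change in trust, which is why the stability assumption matters.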
Conclusion
The framework of SFT and IST allows researchers to examine the dimensional expanse of generalized trust within and across domains of trust. SFT facilitates the comparison of trust in different strangers, resolves the “radius-of-trust” problem, and bolsters measurement equivalence. SFT and IST permit within- and between-individual comparisons of different trust domains (e.g., to keep a secret and to repay a loan). That is, SFT and IST allow researchers to compare individuals who display high amounts of trust across several domains with those who trust in only a couple of domains. While SFT is slightly more difficult to administer than IST, both are easy to score and measure. My hope is that both will prove useful in a variety of settings.
Supplemental Material
Supplemental material for Measuring Generalized Trust: Two New Approaches by Blaine G. Robbins in Sociological Methods & Research.
Footnotes
Acknowledgments
I am indebted to Paul Bauer, Jocelyn Bélanger, Hannah Bruckner, Jennifer Glanville, Maria Grigoryeva, PJ Henry, Edgar Kiser, Ross Matsueda, Jaime Napier, Alexandra Suppes, and Jacob Young for comments, conversations, and/or suggestions. I benefited from the opportunity to present this work to members of the 3Labs Social Psychology Working Group in the Department of Psychology at New York University Abu Dhabi and to participants of the workshop on The Determinants of Social Trust in Uppsala, Sweden. I would also like to thank Cian Farrelly and Kerstin Brüning from Qualtrics for their research assistance.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
References
