Abstract
Behavioral neuroscientists have shown that the neuropeptide oxytocin (OT) plays a key role in social attachment and affiliation in nonhuman mammals. Inspired by this initial research, many social scientists proceeded to examine the associations of OT with trust in humans over the past decade. To conduct this work, they have (a) examined the effects of exogenous OT increase caused by intranasal administration on trusting behavior, (b) correlated individual difference measures of OT plasma levels with measures of trust, and (c) searched for genetic polymorphisms of the OT receptor gene that might be associated with trust. We discuss the different methods used by OT behavioral researchers and review evidence that links OT to trust in humans. Unfortunately, the simplest promising finding associating intranasal OT with higher trust has not replicated well. Moreover, the plasma OT evidence is flawed by how OT is measured in peripheral bodily fluids. Finally, in recent large-sample studies, researchers failed to find consistent associations of specific OT-related genetic polymorphisms and trust. We conclude that the cumulative evidence does not provide robust convergent evidence that human trust is reliably associated with OT (or caused by it). We end with constructive ideas for improving the robustness and rigor of OT research.
Some domains of psychological science, like the natural sciences, are reductionistic enterprises in which researchers seek to explain complex phenomena in terms of the interactions of simpler phenomena. The desire to reduce psychology to its physical building blocks is responsible for psychologists’ fascination with the hormonal bases of human cognition and behavior. For researchers interested in human social cognition and social behavior (trust, emotion recognition, friendship, generosity, and cooperation, for example), one hormone that has attracted a great deal of attention over the past decade is oxytocin (OT).
OT is a neuropeptide hormone that is synthesized in the hypothalamus and is released in both the brain and the periphery. It has a well-established physiological role in female reproductive function (e.g., facilitating parturition and milk ejection during lactation). Over the past few decades, animal studies have demonstrated that OT also plays a role in the regulation of several mammalian social behaviors (Choleris, Pfaff, & Kavaliers, 2013). Injecting OT into the brain of female rats, for example, appears to facilitate full maternal behavior toward orphaned pups (Pedersen & Prange, 1979), and administering OT into the brain of female voles also appears to induce the formation of partner preferences, pointing to OT’s role in pair bonding (Carter, 1998; Lim & Young, 2006; Williams, Insel, Harbaugh, & Carter, 1994). In other work, OT has also been shown to be involved in social recognition (e.g., Bielsky & Young, 2004) and other social behaviors, such as maternal aggression (Campbell, 2008).
Inspired by the progress in animal research, investigators began conducting studies in which OT was linked with human social behaviors about a decade ago. This line of research has involved three main methods:
studying the effects of OT on behavioral measures experimentally via placebo-controlled administration of intranasal OT (Bakermans-Kranenburg & Van IJzendoorn, 2013; Bartz, Zaki, Bolger, & Ochsner, 2011; Bos, Panksepp, Bluthé, & van Honk, 2012; Churchland & Winkielman, 2012; Veening & Olivier, 2013),
correlating plasma OT levels with behavioral measures among individuals (Feldman, 2012), and
correlating OT-related genetic polymorphisms with behavioral measures among individuals (Donaldson & Young, 2008; Ebstein, Knafo, Mankuta, Chew, & Lai, 2012).
Our main focus here concerns several studies on OT and trust among humans. The first publications in which a causal link was reported between exogenous administration of OT and human trust (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005) and a correlation between trust and OT plasma levels (Zak, Kurzban, & Matzner, 2005) came in 2005. Many other researchers have used intranasal OT since then.
The intriguing evidence of potential causal effects of OT on aspects of sociality inspired numerous researchers in many disciplines, from psychiatry to public policy (Bartz & Hollander, 2008; Fehr, 2009; Harris, 2011; Insel, 2010; Liu, McErlean, & Dadds, 2012; Meyer-Lindenberg, Domes, Kirsch, & Heinrichs, 2011; Neumann, 2008; Neumann & Landgraf, 2012; Riedl & Javor, 2012; Zak, 2011). Enthusiasm about the potential of OT intervention has led to proposals to use OT in couples therapy (Ditzen et al., 2009; Savulescu & Sandberg, 2008; Wudarczyk, Earp, Guastella, & Savulescu, 2013) and to promote prosocial behaviors in patients with autism (Anagnostou et al., 2014; Guastella et al., 2014; L. J. Young & Barrett, 2015). Press accounts and broad-audience writing gave OT a popular reputation as the “love hormone,” “liquid trust,” or the “moral molecule” (e.g., Zak, 2011). Indeed, OT spray, which allegedly enhances trust, can be purchased through online retailers, such as Amazon.
Despite the most extreme nonscientific hype, there is little doubt that oxytocinergic brain systems are involved in some social behaviors in nonhuman animals (Meyer-Lindenberg et al., 2011). However, there is a complicated scientific chain from the animal evidence to broad claims about OT and aspects of human sociality, including behavior as complex as trust. After all, the repertoire of social behaviors with which OT has been associated in nonhuman animals is quite wide. Those behaviors include maternal aggression and pathogen avoidance (Campbell, 2008; Kavaliers & Choleris, 2011), which might undermine some kinds of human trust in others. In addition, in recent studies with humans, researchers have correlated OT with behaviors and cognitions that appear to work against trust, including envy, gloating, and ethnocentrism (Bethlehem, Baron-Cohen, van Honk, Auyeung, & Bos, 2014; De Dreu, Greer, Van Kleef, Shalvi, & Handgraaf, 2011; Radke & De Bruijn, 2012; Shamay-Tsoory et al., 2009; Tops, 2010).
Besides the inconclusive evidence for OT–sociality influence, there are good methodological reasons to review previous OT research with a careful eye. In their review of a decade of exogenous OT research, Bartz et al. (2011) noted that a main effect of OT on target behavior was found in only half of the published studies; the large number of variables that could potentially moderate OT’s behavioral effects might inflate the rate of false discoveries because of multiple-hypothesis testing (Shaffer, 1995).
Another reason to reexamine the methods that have been put to use in this area is that some behavioral researchers have relied on assumptions about the bioanalytic validity of OT measurement and administration methods that have not been fully confirmed (Churchland & Winkielman, 2012; Evans, Dal Monte, Noble, & Averbeck, 2014; Guastella et al., 2013; Guastella & MacLeod, 2012; Leng & Ludwig, 2015; McCullough, Churchland, & Mendez, 2013; Veening & Olivier, 2013). A third reason is that scientific excitement about OT is high. Such excitement could lead to some degree of overpublication of surprising results, which later turn out to be weaker than first shown. This general concern about scientific reproducibility is hardly unique to OT research, of course, and has been noted in many fields (Cesario, 2014; Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014; Simmons, Nelson, & Simonsohn, 2011; Simons, 2014).
In this article, we reexamine the specific OT–trust hypothesis by looking carefully at the original studies and those that followed them. The scope of this article is narrow in terms of behavioral paradigms: We focus on trust and examine its link with various OT-related variables (i.e., intranasal OT administration, peripheral OT levels, and OT-related genetic factors) in typical and healthy humans (for broader overviews of OT research, see Bakermans-Kranenburg & Van IJzendoorn, 2013; Bartz et al., 2011; Carson, Guastella, Taylor, & McGregor, 2013; Churchland & Winkielman, 2012; Ebstein et al., 2012; Feldman, 2012; McCall & Singer, 2012; Meyer-Lindenberg et al., 2011; Stoop, 2012).
We concentrate on OT and trust for two reasons. First, OT researchers in general have used a large variety of behavioral tasks. Direct replications in which identical tasks are used are rare, even though such replications are obviously scientifically valuable (e.g., Simons, 2014). Studies of trust are a useful exception, because many researchers have used a highly comparable trust game originally created for use in experimental economics. The relatively standardized nature of the trust game allows for examining direct and near replications and for rigorously estimating the robustness of the OT–trust link. Second, the seminal articles in which OT and trust are linked (e.g., Kosfeld et al., 2005; Zak et al., 2005) have influenced much of the later behavioral OT research and OT-related theory development across many disciplines.
The remainder of this article is organized as follows. First, we review the behavioral paradigms used to measure trust and biological OT-related variables and discuss the methodological challenges accompanying this line of research. Next, we survey the behavioral OT–trust literature across methodologies (intranasal OT administration, peripheral measures, and genetics). We focus mostly on the trust game but mention other paradigms too. Finally, we summarize the current state of the OT–trust literature and offer our impressions of its apparent evidentiary value.
Methods and Methodological Challenges
Behavioral measures of trust
Trust is typically defined as a psychological state or behavior in which one person is willing to make him- or herself vulnerable to what another person will do, presumably because he or she is sufficiently confident that the other person will not exploit him or her (i.e., be trustworthy; Barber, 1983; Hardin, 2002; Mayer, Davis, & Schoorman, 1995; Ostrom & Walker, 2003; Rousseau, Sitkin, Burt, & Camerer, 1998). Trust is a lubricant of social systems at many levels, from friendships and marriages to business partnerships (Arrow, 1974). Aggregated indicators of trust are also associated with important macroeconomic factors at the country level, including economic growth, low inflation rates, and high volumes of trade (Butler, Giuliano, & Guiso, 2009; Jen, Sund, Johnston, & Jones, 2010; Knack & Keefer, 1997; La Porta, Lopez-De-Silane, Shleifer, & Vishny, 1996; Zak & Knack, 2001).
Trust has been measured both with self-reports in surveys (Johnson-George & Swap, 1982; Wrightsman, 1991) and, in economics experiments, with monetarily incentivized tasks that are specifically designed for this purpose. The most famous of these tasks is the trust game.
The trust game
In economic theory, trust is the act of engaging in a sequential informal trade that is mutually beneficial but without external legal enforcement in case one side does not keep up his or her end of the bargain (Coleman, 1990; Fehr, 2009). The essence of trust as an economic behavior is embodied by the trust game. The trust game was first implemented experimentally by Camerer and Weigelt (1988) and was simplified several years later by Berg, Dickhaut, and McCabe (1995) (also see Camerer, 2003; Fehr, Kirchsteiger, & Riedl, 1993; King-Casas et al., 2005).
In the trust game, two players—called an investor and trustee—interact nonverbally by sending money to each other in a two-step sequence (see Fig. 1). Each player is typically endowed with an initial amount of money. The investor first decides how much of an endowment (defined as an investment) to send to the trustee. This investment is multiplied by a productivity factor (typically 3), which represents the collective gain from working together. The multiplied investment is added to the trustee’s balance. Then, in a second stage, the trustee decides how much of the balance—often called a back-transfer—to return to the investor.

Fig. 1. The trust game illustration. At the beginning of the game, each player (investor and trustee) is endowed with 12 monetary units (MUs). At the first stage of the trust game, the investor chooses how much (if any) of his or her endowment to send to the trustee, and he or she can send any amount between 0 and 12 MUs (the red dotted line indicates the range of possible transfers, and the values of 0, 4, 8, and 12 are for illustration). The investment is then tripled and added to the trustee’s account. At the second stage, the trustee is informed about the investor’s transfer, and he or she has the option of sending a back-transfer to the investor (the blue dotted lines indicate the range of possible back-transfers). For example, if the investor has sent 8 MUs, the trustee will have 36 MUs after the first stage (12 MUs of endowment + 24 MUs for the tripled transfer) and can therefore send back any amount between 0 and 36 MUs. The trustee gets to keep the rest of the money to him- or herself.
From a selfish profit-maximizing perspective, the trustee has no incentive to send any money back to the investor in the trust game. A selfish, rational investor should therefore expect a back-transfer of zero and, therefore, invest nothing at the outset. However, many people will send back money because of guilt, moral obligation, the personal desire to share money evenly, or other reasons (the underlying explanations are still being debated). Indeed, in most experimental populations, many investors do send money in the trust game. Trustees typically reciprocate, to some degree, by sending substantial back-transfers (Camerer, 2003; Johnson & Mislin, 2011). The amount of back-transfer is a measure of trustworthiness, and the amount of investment is a measure of trust.
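The payoff structure described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the default endowment (12 MUs) and multiplier (3) are taken from the Fig. 1 example rather than from any particular study's software.

```python
def trust_game_payoffs(investment, back_transfer, endowment=12, multiplier=3):
    """Return (investor_payoff, trustee_payoff) in monetary units (MUs).

    Defaults follow the illustrative Fig. 1 example; they are assumptions,
    not the parameters of every study reviewed here.
    """
    if not 0 <= investment <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    # The trustee holds his or her own endowment plus the multiplied investment.
    trustee_balance = endowment + multiplier * investment
    if not 0 <= back_transfer <= trustee_balance:
        raise ValueError("back-transfer must lie between 0 and the trustee's balance")
    investor_payoff = endowment - investment + back_transfer
    trustee_payoff = trustee_balance - back_transfer
    return investor_payoff, trustee_payoff


def equal_split_back_transfer(investment, multiplier=3):
    """Back-transfer that leaves both players with equal final payoffs.

    Solves E - x + b = E + m*x - b for b, which gives b = (m + 1) * x / 2.
    """
    return (multiplier + 1) * investment / 2


# Selfish trustee: the investor sends 8 MUs and gets nothing back.
print(trust_game_payoffs(8, 0))    # (4, 36)
# Equal-split trustee: a back-transfer of 16 MUs equalizes payoffs.
print(trust_game_payoffs(8, equal_split_back_transfer(8)))    # (20.0, 20.0)
# No trust: both players simply keep their endowments.
print(trust_game_payoffs(0, 0))    # (12, 12)
```

The first call shows why a purely selfish analysis predicts zero investment: if the trustee returns nothing, any positive investment leaves the investor worse off than keeping the full endowment.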
Although the trust game seemingly lacks the social and affective richness that characterizes trust in many social settings, it nevertheless captures its fundamental behavioral features. Moreover, investment in the trust game has three advantages as an experimental measure of trust. First, it provides an active behavioral measure. Subjects’ decisions are typically used to determine actual monetary consequences, which presumably increase their engagement in the task. Second, it is easy to study interesting variants of the trust game—such as changing the investment multiplier, allowing communication, or repeating the game—so that reputations can form (Camerer & Weigelt, 1988; King-Casas et al., 2005). Third, as noted earlier, the game has been used in several different studies—sometimes with only small methodological variations—providing a rare opportunity to observe identical or near replication.
Note that investment in the trust game could be due to different causes. The readiest interpretation is that trust reflects an expectation of trustworthy repayment. It could also be that investors who invest generously do so just to create the largest collective gain (even if they will get none back), but this explanation has been ruled out (Brülhart & Usunier, 2012).
A more likely contributing cause is that trust is affected by the investor’s willingness to take financial risks (because investing leads to uncertain payoffs). Indeed, some studies show that trust game investment correlates with one’s general attitude toward risk (Houser, Schunk, & Winter, 2010; Karlan, 2005; Nickel & Vaesen, 2012; Schechter, 2007). However, when financial risk taking is compared with equivalent social risk taking in the trust game, it is clear that the trust game generates a special type of betrayal aversion (beyond aversion toward variation in outcomes), which is a distaste for being let down by an untrustworthy trustee partner (Aimone, Houser, & Weber, 2014; Bohnet & Zeckhauser, 2004).
Nonetheless, the possibility that a shift in general risk attitude because of OT treatment might partly cause behavioral effects in the trust game is a plausible hypothesis, as OT has been reported to have anxiolytic effects (Choleris et al., 2013; MacDonald & Feifel, 2014). OT might also interact with other hormones such as testosterone and cortisol that have already been shown to covary with risk attitudes (Apicella et al., 2008; Coates & Herbert, 2008; Kandasamy et al., 2014).
Therefore, to ensure that OT affects trust rather than risk, researchers often control for the subject’s risk attitudes by using the risk game—which is not played with another human partner. In the risk game, the investor’s decision is identical to the first stage of the trust game except that risk is generated by a chance process rather than by a social action of a trustee. An investor in the risk game is explicitly instructed on the possible outcomes of his or her investment and the probabilities of their occurrence (e.g., the investment will either be doubled or lost completely, both with .50 probability).
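To make the example lottery concrete, the sketch below computes the investor's possible payoffs in a risk game of the kind just described (investment doubled or lost, each with probability .50). The function name is hypothetical, and the 12-MU endowment is borrowed from the Fig. 1 illustration as an assumption; individual studies may have used other stakes and probabilities.

```python
def risk_game_outcomes(investment, endowment=12, win_multiplier=2, p_win=0.5):
    """Return (payoff_if_win, payoff_if_loss, expected_payoff) for the investor.

    Parameters mirror the example in the text (doubled or lost, p = .50);
    the endowment of 12 MUs is an illustrative assumption.
    """
    if not 0 <= investment <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    kept = endowment - investment
    payoff_if_win = kept + win_multiplier * investment
    payoff_if_loss = kept
    expected = p_win * payoff_if_win + (1 - p_win) * payoff_if_loss
    return payoff_if_win, payoff_if_loss, expected


print(risk_game_outcomes(8))    # (20, 4, 12.0)
print(risk_game_outcomes(0))    # (12, 12, 12.0)
```

Under these example parameters the expected payoff equals the endowment regardless of the amount invested, so the size of the investment reflects only the subject's willingness to accept variance in outcomes, which is what a control for risk attitudes is meant to isolate.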
Biological methods and measures used in OT research
Intranasal OT administration
The principal method for studying OT’s behavioral effects has involved directly manipulating OT levels by administering OT via an intranasal OT spray (Guastella & MacLeod, 2012; Veening & Olivier, 2013). The procedure is simple and causes minimal discomfort for subjects. Most important, because the intranasal OT treatment is randomly assigned and always precedes behavior, this method permits valid causal inference (provided that internal validity is established).
However, unlike in pharmacological studies of other hormones (e.g., testosterone; Eisenegger, von Eckardstein, Fehr, & von Eckardstein, 2013; Nave, Nadler, & Camerer, 2015), the pharmacokinetics of intranasal OT in humans is not well understood (Churchland & Winkielman, 2012; Guastella et al., 2013; Guastella & MacLeod, 2012; Leng & Ludwig, 2015; Veening & Olivier, 2013). Researchers also lack a simple way to conduct a manipulation check to verify whether the substance indeed reached the human brain following administration. Moreover, in contrast to intravenous OT administration, intranasal OT is not commonly used in clinical practice and is not commercially available in most countries.
In recent studies, researchers have reported that intranasal OT enhances resting-state amygdala–prefrontal cortex connectivity (Sripada et al., 2013) and increases cerebral blood flow in several brain regions containing OT receptors (Paloyelis et al., 2014). However, direct evidence that intranasal OT crosses the blood–brain barrier and reaches its target brain tissues following administration in humans is very limited. In most published OT studies, researchers have justified this assumption on the basis of just one experiment in which Born et al. (2002) reported elevated cerebrospinal fluid levels of arginine vasopressin, a neuropeptide that is similar in structure yet not identical to OT, following intranasal vasopressin administration in humans. As far as we know, this vasopressin-based study has not been independently replicated, and the assumption that nasal drug delivery of relatively large molecules (such as OT) bypasses the blood–brain barrier in humans remains controversial (Evans et al., 2014; Leng & Ludwig, 2015; Merkus & van den Berg, 2007; Veening & Olivier, 2013).
It is encouraging that rodent studies appear to show increased cerebral OT levels following intranasal OT delivery (Neumann, Maloumby, Beiderbeck, Lukas, & Landgraf, 2013). However, interspecies differences in nasal cavity structure limit confidence that these rodent findings will necessarily extrapolate to humans. 1 Better evidence comes from parallel studies with nonhuman primates, whose nasal cavity tissues are more similar to those of humans than are rodents’ nasal cavities. There are three recent studies on small samples of macaques. The evidence is encouraging but hardly conclusive. In two studies, elevated cerebrospinal fluid and plasma OT levels were reported following intranasal OT administration in macaques (Chang & Platt, 2014; Dal Monte, Noble, Turchi, Cummins, & Averbeck, 2014). In another study, an effect of intranasal OT (or intravenous injection) was not found on OT levels in macaque cerebrospinal fluid (Modi, Connor-Stroud, Landgraf, Young, & Parr, 2014). However, the latter study did show that delivering OT through an aerosolized solution in a nebulizer (e.g., as used by asthmatics) increased OT in macaque cerebrospinal fluid (Modi et al., 2014).
We know of only a single experiment on humans that yielded elevated cerebrospinal fluid and blood OT levels following intranasal OT administration (Striepens et al., 2013). In this experiment, plasma OT increased as early as 15 min following treatment, but cerebrospinal fluid levels increased only after 75 min. No cerebrospinal fluid changes were observed after 45 min—the common time window in which most researchers have come to assume that OT exerts its most reliable behavioral effects. Perhaps most troublingly, the reported statistical effects were derived from analyses of variance that were inappropriate for the small sample size (N = 3 subjects treated with OT), which included a control group of a single subject (Hartung, Argaç, & Makambi, 2002; Toothaker, Banz, Noble, Camp, & Davis, 1983).
Taken together, the evidence on whether intranasal OT reaches the central nervous system, even in cerebrospinal fluid, is not as strong as one might hope. Only one human study has been conducted, and the studies with macaques indicate that cerebrospinal fluid concentration of OT depends on the time of measurement and the method of delivery.
Peripheral OT measures
A different approach to studying OT involves correlating the levels of OT in body fluids (e.g., blood plasma, saliva, or urine) with behavioral measures among subjects. Such measures are called peripheral. Such correlations cannot establish causality, but correlating OT-related differences with behavior is suggestive.
It is still an open question whether peripheral OT measures are reliable indicators of OT-related brain processes, because, as discussed earlier, it remains doubtful that the OT molecule crosses the human blood–brain barrier. Some studies have suggested that certain conditions stimulate simultaneous release of both peripheral and central OT, but others have suggested changes in central OT only (for a review, see Neumann & Landgraf, 2012). Four important studies on humans measured OT levels in both plasma and cerebrospinal fluid. In one study, a significant positive correlation was found between the two (Carson et al., 2014), whereas in the other three studies, there was no significant positive correlation (Jokinen et al., 2012; Kagerbauer et al., 2013; Martin et al., 2014).
Another crucial methodological concern is how OT levels in body fluids are measured. Immunoassays use OT antibodies to quantify the levels of the peptide within a sample. The two most commonly used methods are radioimmunoassay (RIA) and enzyme-linked immunosorbent assay (ELISA). However, many OT antibodies also bind to substances that are not OT (such as proteins, other peptides, or their degradation products) that exist in human blood plasma (Christensen, Shiyanov, Estepp, & Schlager, 2014; McCullough et al., 2013; Szeto et al., 2011). Accurate immunoassay therefore requires a step called extraction in which the fluid is cleaned of all these nuisance substances.
Extraction is time consuming, but its importance can hardly be overstated. A failure to extract leads to a large overestimate (by 1–3 orders of magnitude) of how much OT is actually in the sample, and it can lead to estimates that are uncorrelated with the true OT levels measured after extraction (McCullough et al., 2013; Szeto et al., 2011). 2 Moreover, in a recent study, Christensen et al. (2014) explored how crucial extraction is by exogenously “spiking” plasma samples with known quantities of OT. 3 Christensen et al. measured OT with and without extraction using both RIA and ELISA kits. When extraction was not performed, both kits failed to detect significant OT changes, even though OT increase was clearly present; the change was detected by both methods when extraction was used (Christensen et al., 2014).
Remarkably, many OT researchers have skipped the extraction step over the past 10 years. Whether to conduct extraction is not a matter of taste; it is necessary to reduce measurement error in OT, as even commercial assay instruction manuals have emphasized. 4 To complicate matters, there may be an even more fundamental problem with at least some of the commercially available OT immunoassay kits: Some analysts have found that some samples of human blood plasma appear to contain non-OT compounds that are nevertheless OT-immunoreactive (and which were in fact responsible for most of the OT-immunoreactivity in the samples examined). The assays tagged these compounds as OT even though they did not appear to be genuine OT and even though they were not removed by extraction (McCullough et al., 2013; Szeto et al., 2011). Furthermore, even when extraction was performed, the correlation between RIA and ELISA OT estimates was only .80, and there was a considerable scalar difference between the two measures (Christensen et al., 2014).
Finally, although researchers have reported that OT measures from other body fluids, such as saliva and urine, correlate with various social behaviors (e.g., Abraham et al., 2014; Fries, Ziegler, Kurian, Jacoris, & Pollak, 2005; Fujiwara, Kubzansky, Matsumoto, & Kawachi, 2012) and with OT measured from unextracted plasma (Feldman, Gordon, Schneiderman, Weisman, & Zagoory-Sharon, 2010), several researchers have raised serious methodological concerns regarding the bio-analytical validity of these measures (Gröschl, 2009; Horvat-Gordon, Granger, Schwartz, Nelson, & Kivlighan, 2005; S. N. Young & Anderson, 2010).
OT-related genetic variables
Twin studies and other methods have established that most human phenotypic traits have some degree of genetic heritability (Beauchamp et al., 2011; Benjamin et al., 2012), including self-report measures of trust and trust game behaviors (Cesarini et al., 2008; Dohmen, Falk, Huffman, & Sunde, 2012; Sturgis et al., 2010). Moreover, conducting genetic OT manipulations (e.g., knocking out genes) induces dramatic changes in rodents’ social behaviors (Donaldson & Young, 2008). From the starting point of these basic insights, researchers in recent years have begun to explore the possible associations of genetic differences with variability in human trust.
Experimental genetic work cannot be conducted in humans for obvious ethical reasons. Therefore, human genomics research involving OT has focused on correlating social behaviors with single nucleotide polymorphisms (SNPs), which are variations that occur in a single nucleotide along the DNA sequence of the OT receptor gene (OXTR; Donaldson & Young, 2008; Ebstein et al., 2012; Kumsta & Heinrichs, 2013; Kumsta, Hummel, Chen, & Heinrichs, 2013). 5 Typically, SNP frequencies are compared across phenotypes (e.g., trust game behavior) to explore whether between-subjects genetic variation in these chromosomal regions is correlated with phenotypic variance. Given the large hypothesis space, the probability of finding a false-positive association is high. 6 Thus, replication, meta-analysis, and statistical correction for multiple hypotheses are necessary steps in validating SNP investigations (Ebstein, Israel, Chew, Zhong, & Knafo, 2010; Ebstein et al., 2012).
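The false-positive concern can be illustrated with a short Python sketch. It shows how quickly the chance of at least one spurious association grows with the number of tests, and how two standard corrections (Bonferroni and Holm) tighten the significance threshold. The function names and the p-values are hypothetical and chosen only for illustration; the family-wise error formula assumes independent tests, which linked SNPs generally are not, and the sketch is not the procedure of any study cited here.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: test sorted p-values against alpha / (m - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for five SNP-phenotype tests (illustration only).
p_values = [0.004, 0.03, 0.2, 0.012, 0.6]

print(round(familywise_error_rate(20), 3))   # 0.642
print(bonferroni_reject(p_values))           # [True, False, False, False, False]
print(holm_reject(p_values))                 # [True, False, False, True, False]
```

With 20 independent tests at the conventional .05 level, the chance of at least one false positive is roughly 64%, which is why uncorrected single-SNP findings call for replication.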
A Review of the OT–Trust Literature
In this section, we review the experimental literature linking OT and trust across methodologies: intranasal administration, OT plasma measurements, and genetics.
Intranasal administration causal studies
Behavioral literature in which the trust game is used
The first study in which intranasal OT and trust in humans were linked (Kosfeld et al., 2005) consisted of two experiments. In the first experiment, subjects (N = 58) 7 received either intranasal OT or a placebo and then played four rounds of the trust game with a stranger. In the second experiment, a different sample (N = 61) played the risk game as a control. There was an effect: The average trust game investment of the OT group was larger than that of the placebo group (p = .029, one-tailed); however, there was no OT effect on the trustees’ back-transfer levels. Furthermore, OT had no effect on the risk game investment.
This finding has been widely cited, 8 but of course, it is useful to see replications before regarding any single study as conclusive. In fact, we know of six attempts to replicate these initial findings, at least conceptually (see Table 1). In all of them, researchers used intranasal OT administration and the trust game. The studies typically include some element of identical replication as well as additional design features such as magnetic resonance imaging scanning or social-context manipulation.
Table 1. Summary of Trust Game Transfers (Pre- and Postbetrayal Feedback)
Note: All subjects are male unless stated otherwise.
a Means and standard deviations are calculated from the data (shared by authors); the statistics are presented for reliable partner games only. b Statistics from authors. c The choices of second-movers were collected before scanning, and they were not present during the experiment. d Statistics presented for typical (nonborderline) subjects.
The first replication attempt (Baumgartner, Heinrichs, Vonlanthen, Fischbacher, & Fehr, 2008) was a functional magnetic resonance imaging version of the experiment (N = 49) with two stages. In the first stage (“prefeedback”), subjects played the investor role in both the risk game and the trust game with anonymous partners (six rounds each). Then, subjects received deceptive feedback that 50% of their partners had betrayed their trust (i.e., did not send a back-transfer). In the second stage (“postfeedback”), subjects played six additional risk and trust games with new anonymous partners.
Like Kosfeld et al. (2005), Baumgartner et al. (2008) found no causal OT effect in the risk game. Unlike Kosfeld et al.’s study, however, there was also no OT effect in the prefeedback trust game investment, which was a reasonably close replication of Kosfeld et al.’s study: The difference between the mean investments of the OT and placebo groups was less than 0.10 monetary units out of 12 monetary units that could have been allocated. In the postfeedback trust game, the placebo subjects reduced their investments, but the OT group did not (Drug × Stage interaction, p = .05).
In a behavioral replication of Baumgartner et al.’s (2008) study (N = 40; with no functional magnetic resonance imaging scanning), Klackl, Pfundmair, Agroskin, and Jonas (2013) found no reliable main effect of intranasal OT on subjects’ trust game investments and no interaction with feedback.
In a different study, N = 60 subjects received either intranasal OT or a placebo (Mikolajczak, Gross, et al., 2010). Then they played 10 trust games with (fictional) online partners and 10 trust games with a computer trustee who would respond randomly (we think of the computerized trustee as a control for risk attitudes). Before each of the trust games, the subjects read brief descriptions of each of the trustees: Half of them were portrayed as “reliable” (e.g., people who practiced first aid), and the other half were portrayed as “unreliable” (e.g., people who played violent sports).
Mikolajczak, Gross, et al. (2010) found that OT increased the trust game investments but only when playing with the five reliable partners (there was no OT effect when playing unreliable partners). Surprisingly, the largest OT effect was found in the risky computer game. Our reanalysis of these data, in which we used the mean investment in the computer condition as a control for subjects’ risk attitudes, shows that risk attitudes are a strong correlate of trust game investment (p < .01) and that the effects of OT on trust become far from statistically significant, even for the reliable partners (see Table S1 in the Supplemental Material available online). These results sit uneasily with the conclusion of Kosfeld et al. (2005), namely that OT affects trust while keeping one’s general attitude toward risk intact, although Mikolajczak, Gross, et al. did not use an identical computer trust game condition. 9
In three other studies, the effect of intranasal OT was tested on the investment in the trust game. Barraza (2010) found no reliable effect in a series of four consecutive trust games played with a single anonymous partner, with feedback on the back-transfer after each game. Ebert et al. (2013) found no main effect of OT on trust in typical subjects using a counterbalanced within-subjects design (i.e., each subject was tested under both treatments) and also found that OT reduced trust in subjects with borderline personality disorder.
Finally, Yao et al. (2014) had subjects (N = 104; both men and women) play the trust game with four anonymous partners. One of those partners behaved in a “fair” manner (i.e., returned enough money such that both investor and trustee would end up with equal amounts of money), whereas the other three behaved in an “unfair” manner (i.e., returned no money at all). Then subjects received either intranasal OT or a placebo; 45 min later, subjects received instant messages from two of the four partners. The fair player and one unfair player sent no message, the second unfair player sent an apology, and the third unfair player offered to make a compensatory monetary transfer that would make the payoffs equal. Then, subjects took part in a “surprise” round of the trust game with the same four partners. The researchers found that OT did not increase trust game investment in any of these conditions and that women (but not men) actually behaved in a less trusting manner following intranasal OT treatment.
To summarize the results discussed earlier quantitatively, we conducted a meta-analysis of all intranasal OT trust game experiments. Our selection criteria were intentionally liberal in favor of the OT–trust hypothesis: We excluded two conditions in which researchers had suggested (on the basis of their data) that OT might not increase trust, namely, games played with an unreliable partner (Mikolajczak, Gross, et al., 2010) and games played by subjects with borderline personality disorder (Ebert et al., 2013). Moreover, we did not include any tests or corrections for publication bias. Our analysis of seven studies (see Table 1 and Figure 2), comprising data from 481 subjects, revealed that the combined effect size of intranasal OT on trust was small and not reliably different from zero (Cohen’s d = 0.077, 95% CI [−0.124, 0.278], z = 0.75, p = .45). Thus, the combined sample rules out effect sizes greater than 0.278 with 97.5% confidence. A subsequent power analysis showed that detecting even the upper limit effect size of 0.278 with a standard 0.80 power would require a sample size of more than 400 subjects, almost as many as the subjects in all OT trust game experiments conducted thus far combined. We further tested for intranasal OT effects on trust separately in the pre- and postbetrayal feedback trust game experiments; again, we found no reliable effects (prefeedback: Cohen’s d = 0.19, 95% CI [−0.041, 0.422], z = 1.61, p = .11; postbetrayal feedback: Cohen’s d = 0.010, 95% CI [−0.283, 0.303], z = 0.07, p = .94).
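The arithmetic behind these figures can be checked with a short sketch. It assumes a symmetric normal-theory confidence interval and a two-sided, two-sample comparison with equal group sizes at α = .05; the function names are illustrative, and the normal approximation is an assumption made here, not necessarily the procedure used for the reported power analysis.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def z_and_p_from_ci(d, ci_low, ci_high, level=0.95):
    """Recover the standard error, z statistic, and two-sided p value
    implied by a point estimate and its symmetric confidence interval."""
    z_crit = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    se = (ci_high - ci_low) / (2 * z_crit)
    z = d / se
    p = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return se, z, p


def total_n_for_power(d, alpha=0.05, power=0.80):
    """Total sample size (two equal groups) needed to detect a standardized
    mean difference d, using the normal approximation (an assumption)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    n_per_group = ceil(2 * ((z_alpha + z_beta) / d) ** 2)
    return 2 * n_per_group


# Combined effect reported in the text: d = 0.077, 95% CI [-0.124, 0.278]
se, z, p = z_and_p_from_ci(0.077, -0.124, 0.278)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")

# Sample size needed to detect the upper confidence limit with 0.80 power
print("Total N for d = 0.278:", total_n_for_power(0.278))
```

Under these assumptions the interval implies z ≈ 0.75 and p ≈ .45, matching the reported values, and the required total sample is a little over 400 subjects.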

Figure 2. Forest plot (95% confidence intervals [CIs]) of intranasal oxytocin trust game experiments. Std diff = standard difference.
Experimental literature in which other measures of trust were used
In one study, Theodoridou, Rowe, Penton-Voak, and Rogers (2009) reported that subjects (N = 96) who received intranasal OT judged photos of unfamiliar faces (both men and women) as more trustworthy and attractive compared with subjects who had received an intranasal placebo. The study is often cited as evidence for a link between OT and trust (e.g., Meyer-Lindenberg et al., 2011), even though the effect was reported for trustworthiness and attractiveness jointly—that is, for a linear composite of the two ratings. A close inspection of the error bars in Figure 1 of Theodoridou et al. (2009, p. 130), which shows the means and standard errors of both rating scales for subjects who received intranasal OT and those who did not, reveals that OT’s effect on the trustworthiness ratings was not reliably different from zero.
In a different study, N = 60 subjects received either intranasal OT or a placebo and then completed a survey that included intimate questions about their sexual habits and fantasies (Mikolajczak, Pinon, Lane, de Timary, & Luminet, 2010). Subjects were instructed that their answers were protected by confidentiality rules but nevertheless submitted them in an envelope that could be sealed and secured with tape. Only 7% of the OT group’s envelopes were sealed and taped compared with 80% of the placebo group’s envelopes; therefore, the researchers concluded that OT increased the subjects’ trust in the experimenter. As acknowledged by the authors, the result could be driven by an experimenter demand effect (Doyen, Klein, Pichon, & Cleeremans, 2012; Klatzky & Creswell, 2014), as the administration procedure in the study was single blind (i.e., the experimenter knew which treatment was assigned to which subject). Indeed, when the same researchers conducted two replication attempts of the study using a double-blind protocol, they found no OT effects (Lane et al., 2015).
Three intranasal OT studies (Gaffey & Wirth, 2014) included the World Benevolence scale (Janoff-Bulman, 1989; Poulin, Holman, & Buffone, 2012) and the Faith in People scale (Rosenberg, 1957), both of which are commonly used as self-report measures of trust (Johnson-George & Swap, 1982; Wrightsman, 1991). The researchers found no OT effects for either men or women in any of their studies, or when all three data sets were pooled together (N = 140 [men and women]). The results are summarized in Table S2 in the Supplemental Material.
Finally, in one study, Andari et al. (2010) examined the social behavior of N = 20 adults with autism after receiving either intranasal OT or placebo in a crossover within-subjects design. In the experiment, high-functioning subjects with autism played a ball-tossing game with three fictitious social partners. As a behavioral measure of trust, subjects had to decide whether to pass a computer-simulated ball to their partners to get potentially rewarding reciprocation; the three simulated partners varied with respect to their trustworthiness levels (i.e., the probability of reciprocation). Andari et al. found that intranasal OT significantly increased the rate of ball tosses toward the most reliable partner compared with the unreliable partner, suggesting that intranasal OT might have increased their social learning capabilities. This effect was statistically significant, though with a rather high p value (p = .047, two-tailed).
Peripheral OT studies
Three studies have been designed to evaluate the relation between plasma OT and trust (Christensen et al., 2014; Zak et al., 2005; Zhong et al., 2012). In two of those studies, researchers did not use the plasma extraction step when measuring OT.
Zak et al. (2005; N = 96) reported no association between unextracted plasma OT and trust game investment—that is, “trust”—and reported a weak positive correlation between log(OT) and trustee reciprocation levels—that is, “trustworthiness” (p = .021, one-tailed).
Using a much larger sample (N = 1,158), Zhong et al. (2012) found a U-shaped relation between unextracted plasma OT and both investor and trustee behaviors: Subjects with very high plasma OT estimates, as well as people with very low plasma OT estimates, evinced more trust than did people with moderate levels of OT.
Both studies relied on the use of a commercial ELISA assay for measuring OT that was performed on unextracted samples of plasma, an analytic choice that is no longer trusted by some researchers (Christensen et al., 2014; McCullough et al., 2013; Szeto et al., 2011), or even by the assay’s manufacturer, to yield bioanalytically valid measures of OT. Not surprisingly, the methods that Zak et al. (2005) and Zhong et al. (2012) used to measure OT yielded values that were from ten-fold to one-hundred-fold higher than what should be expected in plasma, further calling into question the validity of their measurements.
In a single study, Christensen et al. (2014) tested for an association between extracted plasma OT levels (measured with both ELISA and RIA kits) and trust in humans. In this study (N = 82), Christensen et al. measured trust using a modified prisoner’s dilemma game, played with partners of two types: a stranger and a family member. Christensen et al. found no significant associations between the trusting behaviors of subjects, or their partners, and OT measures (neither baseline plasma levels nor task-related OT changes). There was also no association between OT and the identity of one’s partner (i.e., family member vs. a stranger).
Genetic OT studies
In a single study, Krueger et al. (2012) reported a significant association between a common variation (rs53576) in the OXTR gene and trust game investment (N = 108). In a study with a larger sample (N = 684) of Swedish twins, Apicella et al. (2010) failed to replicate Krueger et al.’s finding and reported no associations among subjects’ trust game behaviors or subjects’ transfers in a dictator game and nine other OT SNPs. OXTR SNPs, in particular rs53576 and rs2254298, have also been found to correlate with a range of social behaviors in small-sample studies (see Kumsta & Heinrichs, 2013; Kumsta et al., 2013), for example, prosocial behavior (Israel et al., 2009), empathy (Rodrigues, Saslow, Garcia, John, & Keltner, 2009), and autism (Lerer et al., 2007). However, in a recent meta-analysis of these SNPs (N = 13,547), Bakermans-Kranenburg and van IJzendoorn (2014) found that none of the combined effect sizes were significantly different from zero.
The failure to find robust associations between OXTR SNPs and social behaviors is not surprising in light of recent findings from a large-sample (N = 9,836), genome-wide association study (Benjamin et al., 2012). In this study, Benjamin et al. (2012) corroborated the notion that genetic factors do explain a substantial proportion of the diversity in economic behaviors (including trust) but that no single SNP across the entire human genome could explain more than 1.25% of any trait variation. The unequivocal message of Benjamin et al. is that most published candidate gene studies, including the studies mentioned in this review, are dramatically underpowered and are almost surely false positives, a view that has been recently adopted by several leading journals as an editorial policy (Hewitt, 2012).
Discussion
Robustness of the OT–trust literature
We evaluated the OT–trust relationship in light of the methodological challenges of inducing and measuring OT, understanding its bioanalytic effects, and what is known empirically from near replications. A cautious conclusion is that the basic relationship between OT and trust is not particularly robust. The initial finding in which intranasal OT was linked with trust in humans in a causal fashion (Kosfeld et al., 2005) has failed to replicate in several attempts (see Table 1 and Figure 2). Furthermore, the combined effect size of all studies in which intranasal OT is associated with investment in the trust game is not reliably different from zero. It is also impossible to know whether there exist unpublished data that produced null findings and got stuck in the proverbial “file drawer” (because of publication bias; e.g., Lane, Luminet, Nave, & Mikolajczak, 2015; Rosenthal, 1979).
Moving away from the causal method of intranasal OT administration, studies that have led to conclusions that investments or back-transfers in the trust game are correlated with levels of OT in blood plasma are problematic because of limitations with the bioanalytic validity of the assay methods. The only study in which a bioanalytically valid OT measure was used did not show a relationship between OT and a behavioral measure of trust (Christensen et al., 2014). Finally, genetic results are eye-catching, but trying to discover effects of a small number of genetic SNPs is now thought to be almost impossible (Benjamin et al., 2012). The path ahead requires very large samples (on the order of 20,000 subjects) to identify possible, yet tiny, polygenic effects. Predictably (given this profound methodological concern), the only study in which an SNP on the OT receptor gene was linked with investment in the trust game (Krueger et al., 2012) has not fared well in efforts to independently replicate it (Apicella et al., 2010; Bakermans-Kranenburg & van IJzendoorn, 2014).
Reasons to be concerned about the published behavioral OT literature
Low replicability and other methodological concerns
Although other human social behaviors that are purportedly affected by intranasal OT administration are outside the scope of this review, the literature in which OT and trust are linked is surely not exceptional in terms of its challengingly low robustness. A striking absence of direct replications, the lack of main treatment effects (Bartz et al., 2011), and contradictory results are also found in the literature in which intranasal OT is linked with other complex social behaviors, such as prosociality (Israel, Weisel, Ebstein, & Bornstein, 2012; Kosfeld et al., 2005; Radke & De Bruijn, 2012; Zak, Stanton, & Ahmadi, 2007), affiliation (Feldman, 2012; Smith et al., 2013), cooperation (Bartz et al., 2010; Declerck, Boone, & Kiyonari, 2010; De Dreu et al., 2010; Israel et al., 2012; Rilling et al., 2012), and ethnocentrism (De Dreu et al., 2011; Shamay-Tsoory et al., 2013). We also reiterate the inconclusive evidence that experimentally administered intranasal OT indeed reaches brain targets that most plausibly control social cognition or social behavior in humans.
Inappropriate use of unextracted plasma OT measures also continues to be a problem in many research areas that rely on the measurement of naturally occurring levels of OT. Results in which such measures are used are also commonly reported in studies in which OT is linked with social behaviors (such as parenting; Feldman et al., 2010) and with romantic love (Gordon et al., 2008) as well as in studies in which researchers investigate how OT levels are affected by social and environmental factors (e.g., Gouin et al., 2010). Finally, the absence of correlations between SNPs and social behavior that can be established reliably is not unique to trust or to OXTR SNPs. The same weakness is found for all of the SNPs across the entire human genome, even for traits that are known to be highly heritable, such as IQ (Chabris et al., 2012).
High researcher degrees of freedom
The scope of the behavioral OT literature is wide. Researchers have had a lot of freedom to search for variables that can plausibly modulate effects of OT. An example is feedback about betrayal: Baumgartner et al. (2008) concluded from their data that OT blunts adaptation to betrayal. However, Klackl et al. (2013) and Yao et al. (2014) failed to replicate this behavioral effect, which calls into question the interpretation as betrayal adaptation. In our view, it is difficult enough to establish a solid main effect, such as a causal main effect of OT on trust.
It is hard to disagree with Bartz et al. (2011), whose “interactionist” view states that OT’s most interesting effects are likely to be moderated by individual-difference or contextual factors such as sex, age, health status, personality, genetics, or environmental conditions. However, exploration of many types of modulation or context-dependence typically risks creating an inflated rate of Type I errors, unless appropriate methodological standards are adopted.
Consider a hypothetical study in which researchers explore whether the effect of intranasal OT on target behavior is modulated by the Big Five personality traits without a priori specifying which of the five is most likely to modulate the effect of OT. Unless correction for multiple hypotheses is performed, the probability of finding at least one false OT × Personality interaction, even if none are genuine, is greater than 22% ($1 - .95^{5}$). Adding an environmental factor (e.g., partner’s reliability) will inflate the rate of Type I errors to more than 40% ($1 - .95^{10}$). Thus, the interactionist approach is likely to produce robust true scientific discoveries only if there is a professional capacity to winnow out intriguing false positives rapidly. Otherwise, establishing robust interactions will be more difficult than establishing robust main effects, not easier.
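The inflation described in this hypothetical study can be reproduced with a minimal sketch. It assumes that the tests are independent and each is run at α = .05; the function names are illustrative, and the Bonferroni adjustment is shown only as one common correction, not as a procedure prescribed in this review.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold under a Bonferroni correction (one common
    adjustment; an illustrative choice, not taken from the article)."""
    return alpha / n_tests


# 5 tests: one OT x Personality interaction per Big Five trait.
# 10 tests: the same five interactions crossed with one two-level
# environmental factor (e.g., partner's reliability).
for n_tests in (5, 10):
    uncorrected = familywise_error_rate(n_tests)
    corrected = familywise_error_rate(n_tests, bonferroni_alpha(n_tests))
    print(f"{n_tests} tests: uncorrected = {uncorrected:.3f}, "
          f"Bonferroni-corrected = {corrected:.3f}")
```

With 5 and 10 uncorrected tests the family-wise rate is about .226 and .401, respectively, which is where the 22% and 40% figures come from; with the adjusted per-test threshold it stays just under .05.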
Lack of a falsifiable behavioral theory
Ten years after the publication of the initial studies that suggested intriguing links between OT and complex human social behaviors, researchers in the field are moving forward to form parsimonious theories that acknowledge the mixed, inconclusive, and often-contradictory findings that have been accumulating not only for trust but for many other variables as well. It has been proposed, for example, that OT might affect complex behaviors by regulating low-level processes such as attention to social cues (Averbeck, 2010; Bartz et al., 2011; Churchland & Winkielman, 2012; Heinrichs, Baumgartner, Kirschbaum, & Ehlert, 2003; Heinrichs & Domes, 2008; Weisman & Feldman, 2013), reducing anxiety (Churchland & Winkielman, 2012; Heinrichs & Domes, 2008), facilitating general approach behavior (Kemp & Guastella, 2010, 2011), promoting “calm and connect” affiliative behaviors (Uvnäs-Moberg, Arn, & Magnusson, 2005), modulating reward sensitivity (Bethlehem et al., 2014; Dölen, Darvishzadeh, Huang, & Malenka, 2013), and prompting an adaptive “tend and befriend” response to environmental stress (Taylor, 2006; but see Smith et al., 2013).
Although these theories are potentially compatible with the contradictory findings that already exist in the literature, their capability to generate falsifiable predictions of how OT will affect complex behaviors might be limited. It is crucial that researchers test generalized theories directly using paradigms that can corroborate or falsify their predictions. Hypothesis testing must occur after hypothesis selection rather than before. Furthermore, theories should be developed to account for robust replicable findings rather than push their boundaries to accommodate unreplicated results.
Conclusion and Recommendations
Despite our gloomy conclusions, we do believe that continued investigation of the association between OT and human social behaviors is worthwhile. Our review does not necessarily imply that true discoveries are not to be found in OT research, and given the animal literature, it seems likely that OT plays a role in the regulation of some social behaviors that are homologous across humans and other species. However, there are many reasons to consider the current literature with a high degree of caution: One must keep in mind that the animal literature is based on vigorous, well-established experimental manipulations (e.g., knocking out genes and directly injecting OT into specific brain regions) that cannot be conducted in humans and that could have different effects even if they could be conducted.
We recommend that editors support publication of less glamorous, yet scientifically important, null results and direct replication efforts (Nosek et al., 2015), especially when conducted according to high methodological standards and in large samples (e.g., Apicella et al., 2010; Christensen et al., 2014; Smith et al., 2013). Researchers should also consider preregistering their ex-ante hypotheses (especially when these are complex, nontrivial interactions), and reviewers should insist on a full disclosure of all collected variables together with corrections for multiple hypotheses testing when appropriate.
We hope readers absorb our criticism as part of a movement throughout psychology reassessing how well current practices enhance the robustness of our empirical database (Cesario, 2014; Pashler & Wagenmakers, 2012; Simons, 2014). Perhaps now is a good time to stop and consider what it is that we think we know about the links of OT to complex human social behaviors, how it is that we came to think we know it, and whether we want to continue believing that what we think we know is true.
Footnotes
Acknowledgements
Colin Camerer acknowledges support from the National Science Foundation and the Behavioral and Neuroeconomics Discovery Fund (California Institute of Technology). Michael McCullough acknowledges support from the Air Force Office of Scientific Research and the John Templeton Foundation. We thank Jorge Barraza, Thomas Baumgartner, Allison E. Gaffey, Keith M. Kendrick, Anthony Lane, Moira Mikolajczak, Michelle M. Wirth, and Kevin (Shuxia) Yao for generously and rapidly sharing their data. We also thank Elizabeth Beaver and Robert Glaser for research assistance. Finally, we thank Tom Cunningham, Anna Dreber Almenberg, Ernst Fehr, Philipp Koellinger, and David Sbarra for useful comments on earlier versions of this article.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
