Abstract
Substantial evidence comparing men’s perceptions of women’s sexual intentions with women’s own reports of their sexual intentions has shown a systematic pattern of results that has been interpreted as support for the idea that men overestimate women’s true sexual intentions. However, because women’s true sexual intentions cannot be directly measured, an alternative interpretation of the existing data is that women understate their sexual intentions and that men’s assessments of women’s intentions are generally accurate. In three studies, we (a) replicated the typical sex difference in sexual-intent ratings, (b) showed that men maintain their ratings of women’s sexual intentions even when incentivized to tell the truth, and (c) showed that women believe that other women are understating their sexual intentions in self-report measures. Taken together, these results imply that men might be accurate in perceiving and reporting women’s sexual intentions and that men might be managing errors through biased behavior rather than biased beliefs.
Thirty years of evidence show that men interpret women’s indications of potential sexual interest to be stronger than women report intending. In a landmark study, Abbey (1982) assigned pairs of male and female observers to watch a short conversation between another mixed-sex pair of previously unacquainted participants, after which all four individuals estimated the sexual intentions of the people in the conversation. Both male participants (observer and converser) estimated the sexual interest of the woman in the conversation to be greater than did the female observer and, more important, than did the woman in the conversation herself. This difference between men’s estimates and women’s self-reports has been replicated many times across methods (for a recent review, see Farris, Treat, Viken, & McFall, 2008).
Error-management theory (Haselton, Buss, & DeKay, 1998) provided a functional explanation for this phenomenon by focusing on the asymmetrical costs of the two possible errors in estimating a target’s potential sexual interest: failing to notice sexual interest when it is present (i.e., miss) or mistaking friendliness for sexual interest (i.e., false alarm; Haselton & Buss, 2000). Haselton and Buss proposed that missing potential sexual opportunities would have been more costly to men’s reproductive success than to women’s, which would have led to selection for systems that cause men to pursue women even when the chances of success are small. They therefore proposed that there was selection for “intention-reading adaptations designed to minimize the cost of missed sexual opportunities by overinferring women’s sexual intent” (p. 82), and they located selection for the mechanism in the intention-reading system. This hypothesis entails that the system is making an inaccurate inference: an overestimate of a woman’s sexual interest.
All else being equal, if the best decision-making system under uncertainty is simply the one that maximizes expected value (e.g., von Neumann & Morgenstern, 1944), evolution should give rise to expected-value-maximizing systems (McKay & Dennett, 2009; McKay & Efferson, 2010; Perilloux, 2014), in which the expected value represents reproductive success (Tooby & Cosmides, 1990; Wiley, 1994). In this context, men’s overestimation of women’s sexual intentions represents something of a puzzle. The most straightforward way to decide whether to pursue a potential mate would have been to compute the probability of eventual success as accurately as possible and behave in a way that would have increased reproductive success, given that likelihood. Why should the system produce biased beliefs—which carry a potential cost to the extent that biased representations serve as input to other decision-making systems—instead of biasing behavior: pursuing even low-probability/high-payoff opportunities? (See also Kurzban, 2010; McKay & Dennett, 2009; McKay & Efferson, 2010.)
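The contrast between biased beliefs and biased behavior can be made concrete with a minimal sketch. The payoff numbers below are purely hypothetical illustrations (they are not estimates from this article or from error-management theory), and the function names are our own. The point is only that an expected-value maximizer holding an accurate, unbiased probability estimate will still pursue low-probability opportunities whenever a miss is much more costly than a false alarm.

```python
def expected_value(p_success: float, benefit: float, cost: float) -> float:
    """Expected payoff of pursuing: gain `benefit` with probability p_success,
    pay `cost` (the price of a false alarm) otherwise."""
    return p_success * benefit - (1.0 - p_success) * cost


def pursuit_threshold(benefit: float, cost: float) -> float:
    """Probability at which pursuing has exactly zero expected value.
    Any accurate estimate above this threshold favors pursuit."""
    return cost / (benefit + cost)


def should_pursue(p_success: float, benefit: float, cost: float) -> bool:
    """Decision rule that biases behavior, not belief: p_success is taken at face value."""
    return expected_value(p_success, benefit, cost) > 0.0


# Hypothetical payoffs: a success is worth 10 units, a false alarm costs 1 unit.
BENEFIT, COST = 10.0, 1.0
print(pursuit_threshold(BENEFIT, COST))    # threshold is 1/11
print(should_pursue(0.15, BENEFIT, COST))  # pursue despite a low, accurate estimate
print(should_pursue(0.05, BENEFIT, COST))  # below threshold: do not pursue
```

Under these assumed payoffs, the asymmetry in costs is carried entirely by the decision threshold, so the probability estimate itself never needs to be inflated.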
Our claim is emphatically not that the mind must be designed to create perfectly accurate representations of other people’s sexual interest. Rather, our (weaker) claim is that the mind might be designed to read sexual intentions as accurately as possible (i.e., without systematic bias). The present work investigated the possibility that the inaccuracy of men’s perceptions has been exaggerated. A key but underdiscussed empirical issue in this line of research is that the truth of the matter—a woman’s actual likelihood of intending to have sex after engaging in certain behaviors—is unknown. Perhaps women are underreporting because they themselves are unaware of their true intentions (e.g., Wilson, 2002) or because they are using self-reports to control the way they are perceived by other people (Haselton & Buss, 2000; Kurzban, 2010; Trivers, 2010, 2011).
Study 1
Method
Participants
Participants, recruited from Amazon’s crowdsourcing Web site, Mechanical Turk (MTurk; see Buhrmester, Kwang, & Gosling, 2011), completed a short survey in exchange for 15¢. Our goal was to collect 500 participants; we ended up with 502. Removing individuals who reported being mostly or exclusively homosexual (n = 18) left 271 men and 213 women. Their ages ranged from 18 to 75 years (M = 31.80, SD = 11.71). Their self-reported ethnicities were as follows: 75% Caucasian, 9% Asian, 7% Black, 6% Hispanic, and 1% each Native American, East Indian, and other. See the Supplemental Material available online for additional demographics.
Materials and procedure
Participants completed one form of the dating-behaviors scale (DBS; Haselton & Buss, 2000). Women reported their sexual intentions (i.e., how likely they would be to have sex with a man) if they had engaged in each of 15 behaviors. Men reported their estimates of the sexual intentions of women who engaged in each of those same 15 behaviors. Both men and women made their ratings on a 7-point scale from −3 (extremely unlikely) to +3 (extremely likely). Instructions and the list of behaviors are provided in the Supplemental Material. The reliability of the scale was excellent (Cronbach’s α = .93).
A link from the MTurk Web site took participants directly to the survey. After completing some demographic questions and the DBS, participants were directed back to the MTurk site, through which they were compensated.
Results
Following Haselton and Buss (2000), we first calculated each participant’s average rating for all 15 behaviors and then analyzed the sex difference between these composites. Men (M = 1.44, SD = 0.76) rated the behaviors as implying more sexual interest than did women (M = 0.77, SD = 1.21), t(482) = 7.49, p < .001, d = 0.68. Our results closely resemble the pattern found by Haselton and Buss (see Figs. 1 and 2). Table 1 provides descriptive statistics and the results of a multivariate analysis of variance (MANOVA) comparing men’s and women’s ratings for each behavior, and Figures 3 and 4 graph the ratings for each behavior separately for men and for women, respectively. Comparing these ratings reveals that the sex difference was not driven by outliers: Men rated women’s intentions after engaging in 12 of the 15 behaviors significantly higher than women did (α = .003, Bonferroni corrected). Study 1 therefore reproduced, in a generally nonstudent population, the pattern of results found by Haselton and Buss (2000): Men’s estimates of women’s intentions, conditional on the behaviors in question, were higher than women’s estimates.
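As a rough check on the composite comparison, the test statistic and effect size can be recomputed from the reported summary statistics alone. The sketch below assumes a pooled-variance (Student’s) independent-samples t test, which is consistent with the reported 482 degrees of freedom but is not stated explicitly in the text; the function name is ours. Because the inputs are rounded means and standard deviations, the result should only approximate the reported values.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    t = (m1 - m2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    d = (m1 - m2) / pooled_sd
    return t, d, df


# Study 1 composites as reported: men (n = 271) vs. women (n = 213).
t, d, df = pooled_t_and_d(1.44, 0.76, 271, 0.77, 1.21, 213)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")

# Bonferroni-corrected per-behavior threshold for 15 comparisons.
alpha_corrected = 0.05 / 15
print(f"corrected alpha = {alpha_corrected:.4f}")
```

The recomputed t falls slightly below the reported 7.49, which is the size of discrepancy one would expect from working with two-decimal summary values.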

Fig. 1. Men’s and women’s mean sexual-intent composite scores from Study 2 of Haselton and Buss (2000) and Study 1 of the current research. Error bars represent 95% confidence intervals.

Fig. 2. Mean sexual-intent composite score as a function of sex, study, and condition. Error bars represent 95% confidence intervals.

Table 1. Results From a Multivariate Analysis of Variance on Ratings of Women’s Sexual Intention in Study 1
Note: Standard deviations are given in parentheses.
p < .003 (Bonferroni corrected).

Fig. 3. Men’s mean sexual-intent ratings in each condition of Studies 2 and 3, compared with men’s and women’s means from Study 1. Results are shown separately for each behavior. Error bars represent 95% confidence intervals.

Fig. 4. Women’s mean sexual-intent ratings in each condition of Studies 2 and 3, compared with men’s and women’s means from Study 1. Results are shown separately for each behavior. Error bars represent 95% confidence intervals.
Study 2
As economists frequently point out, absent incentives, participants might not be motivated to provide carefully considered self-reports (Hertwig & Ortmann, 2001). For example, perhaps men are trying to signal that in their particular experience, women who engage in these behaviors are especially sexually interested in them. Conversely, perhaps women dampen their responses to appear coy. If either of these motives is at work, then paying participants to be accurate could alter their responses: If participants are incentivized to honestly reveal their beliefs, the sex difference in perceptions of sexual interest might shrink.
Method
Participants
Participants for Study 2 were also recruited from MTurk to complete a short survey. In one round of data collection, we paid participants a flat rate of 25¢ to complete the same survey as in Study 1 (nonincentivized condition); in a second round of data collection, we paid participants the same flat rate and added the opportunity to earn extra compensation on the basis of accuracy (incentivized condition). We again planned to collect 500 participants; we ended up with 499. Removing nonheterosexual individuals (n = 20) left 283 men (160 incentivized, 123 not incentivized) and 196 women (90 incentivized, 106 not incentivized). Their ages ranged from 18 to 82 years (M = 31.09, SD = 10.92). Their self-reported ethnicities were as follows: 80% Caucasian, 7% Black, 6% Asian, 5% Hispanic, 1% Native American, and 1% East Indian. See the Supplemental Material for additional demographics.
Materials and procedure
The same scales were used as in Study 1, but this time the instructions were to estimate women’s previously recorded responses (from Study 1), and participants recorded their ratings on a continuous scale, which allowed the use of decimals. Incentivized participants earned 1¢ for each answer that was within 0.10 correct in either direction, and 0.5¢ for each answer within 0.20 correct in either direction; these incentives were in line with current market pricing on MTurk. The reliability of the scale was good for incentivized (Cronbach’s α = .88) and nonincentivized (Cronbach’s α = .89) participants. The instructions for the incentivized and nonincentivized participants are provided in the Supplemental Material.
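The accuracy bonus can be sketched as a simple scoring rule. This is an illustration only: it assumes the two tiers are mutually exclusive (an answer within 0.10 earns 1¢, and an answer that misses that band but is within 0.20 earns 0.5¢), which the description above does not settle, and the function names and example targets are hypothetical.

```python
def bonus_cents(guess: float, target: float) -> float:
    """Bonus (in cents) for one answer, given the target mean from Study 1.
    Assumes exclusive tiers: 1 cent within 0.10, else 0.5 cent within 0.20."""
    # Round the error so that floating-point noise does not flip a boundary case.
    error = round(abs(guess - target), 6)
    if error <= 0.10:
        return 1.0
    if error <= 0.20:
        return 0.5
    return 0.0


def total_bonus_cents(guesses, targets) -> float:
    """Sum the per-item bonuses across all rated behaviors."""
    if len(guesses) != len(targets):
        raise ValueError("guesses and targets must have the same length")
    return sum(bonus_cents(g, t) for g, t in zip(guesses, targets))


# Hypothetical targets and guesses for three of the 15 behaviors.
targets = [0.77, 0.77, 0.77]
guesses = [0.80, 0.92, 1.50]
print(total_bonus_cents(guesses, targets))
```

Under this reading, a participant who landed within 0.10 on all 15 behaviors would earn a 15¢ bonus on top of the flat rate.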
Results
First, we conducted a 2 (sex: male, female) × 2 (incentives: present, absent) ANOVA on the composite mean ratings. The interaction was not significant, F(1, 474) = 0.40, p = .53, nor was the main effect of incentives, F(1, 474) = 1.26, p = .26, R² < .01. The main effect of sex, however, was significant, F(1, 474) = 16.79, p < .001, R² = .04. Men’s composite mean (M = 1.47, SD = 0.76) was significantly higher than women’s (M = 1.14, SD = 0.89).
We then compared the composite means in the current study (collapsed across incentive conditions) with the means from Study 1 using one-sample t tests. Men’s composite mean in Study 2 did not significantly differ from men’s composite mean in Study 1, t(282) = 0.56, p = .57, d = 0.03, but was significantly higher than women’s composite mean in Study 1, t(282) = 15.33, p < .001, d = 0.91. Women’s composite mean in Study 2 was significantly higher than women’s composite mean in Study 1, t(195) = 5.82, p < .001, d = 0.41, but significantly lower than men’s composite mean in Study 1, t(195) = −4.69, p < .001, d = 0.33. Figure 2 depicts the composite means; the data for Study 1 and Study 2 show that while men’s ratings did not differ significantly between Study 1 and Study 2, women’s ratings significantly increased, even though they were still below men’s ratings.
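The one-sample comparisons treat a Study 1 composite mean as a fixed reference value. A minimal sketch of that computation from summary statistics follows; the function name is ours, and the effect size is taken as the absolute mean difference divided by the Study 2 sample standard deviation, which is an assumption about how d was computed.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, reference: float):
    """One-sample t statistic, Cohen's d, and degrees of freedom from summary statistics."""
    standard_error = sd / math.sqrt(n)
    t = (mean - reference) / standard_error
    d = abs(mean - reference) / sd
    return t, d, n - 1


# Women's Study 2 composite (M = 1.14, SD = 0.89, n = 196) against the Study 1 means.
for label, reference in [("women, Study 1", 0.77), ("men, Study 1", 1.44)]:
    t, d, df = one_sample_t(1.14, 0.89, 196, reference)
    print(f"vs. {label}: t({df}) = {t:.2f}, d = {d:.2f}")
```

With the rounded inputs, the first comparison matches the reported t(195) = 5.82 and the second comes out close to the reported −4.69.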
Figures 3 and 4 further depict the pattern of changes across the behaviors for men and women, respectively. Consider the data for Studies 1 and 2. In both figures, the solid line indicates men’s mean ratings from Study 1, and the dotted line indicates women’s mean ratings from Study 1. Given the confidence intervals depicted, it is clear that men’s Study 2 ratings clustered around men’s Study 1 means, whereas women’s Study 2 ratings were generally higher than women’s Study 1 means.
Discussion
When asked to guess women’s responses, men reported similar estimates in Study 2 as in Study 1, whereas women’s responses changed in the direction of men’s guesses. Why would women think that other women would report levels of sexual intent similar to the levels that men attribute to women? One natural interpretation of these data is that if men’s beliefs about women’s interest (Studies 1 and 2) were accurate, then this would explain why men’s responses were the same in Study 1 and Study 2, independent of incentives. Further, if women knew other women’s true levels of intent and also knew that other women would underreport these intentions—but did not know by how much these intentions would be underreported—then women would have provided higher estimates in Study 2 than in Study 1. The data from Study 1, as well as the Haselton and Buss (2000) data, could be explained by positing that women underreported their own sexual intentions, whereas men accurately estimated women’s intentions.
There are, of course, alternative explanations. First, women’s self-reports from Study 1 might be lower than their estimates of other women’s sexual intent in Study 2 because women sincerely believe that their own intentions are, in fact, lower than the average woman’s (Alicke & Govorun, 2005; Taylor, 1989). Second, women’s self-ratings in Study 1 might be accurate, but women in Study 2 may have overstated what they believe other women intend, even when incentivized, as a form of rival derogation (Haselton & Buss, 2000). Another possibility for men is that they know they are wrong, but the incentives are insufficient to motivate them to decrease their estimates. We cannot definitively rule out this possibility, but prior work indicates that incentives as small as a nickel influence behavior among MTurk participants (DeScioli, Christner, & Kurzban, 2011).
Broadly, the results of Study 2 suggested that women know that other women underreport their genuine sexual intentions but do not know by how much. If women have an accurate representation of other women’s genuine intentions, then when female participants are asked what other women will report (as in Study 2), they should provide answers that reflect their estimates of the underlying behavioral intentions, to the extent that they believe other women report honestly. But if women tend to understate their intentions and participants do not realize this, participants will appear to overestimate women’s intent compared with women’s self-ratings.
Study 3
Given the data from Studies 1 and 2, it is unclear whether participants distinguish between what other women report and what their intentions actually are. In Study 3, we attempted to induce participants to make this distinction by having participants estimate both what women actually intend by the behaviors in question and what women would report that they intend. If men have been accurate all along and women know other women’s intentions, then asking women what other women actually intend should yield a pattern of results similar to men’s original ratings.
Method
Participants
Participants for Study 3 were also recruited from MTurk to complete a short survey in exchange for 25¢. For this study, we attempted to collect a sample of 250 participants and ended up with 256. Removing nonheterosexual individuals (n = 11) left 119 men and 126 women. Their ages ranged from 18 to 75 years (M = 32.41, SD = 11.67). Their self-reported ethnicities were as follows: 75% Caucasian, 9% Black, 7% Asian, 6% Hispanic, and 1% each Native American, East Indian, and other. See the Supplemental Material for additional demographics.
Materials and procedure
The DBS was used again, but this time participants rated each behavior in terms of what they believed other women would say and what they believed women actually intended. The reliability scores for both scales (what women say, what women want) were good (Cronbach’s αs = .86). The instructions for rating what women say were the same instructions as in the nonincentivized condition of Study 2. The instructions for rating what women want (provided in the Supplemental Material) asked participants to estimate how much women would actually want to have sex with a man if they engaged in a given behavior.
Results
We conducted a 2 (sex: male, female) × 2 (rating type: what women say, what women want) mixed-design ANOVA on the composite mean ratings. The main effect of sex was not significant, F(1, 243) = 0.08, p = .78, ηp² < .01. The main effect of rating type was significant, F(1, 243) = 143.29, p < .001, ηp² = .37; participants rated what women actually want (M = 1.87, SD = 0.65) significantly higher than what women say they want (M = 1.48, SD = 0.71). This main effect was qualified by a significant interaction, however, F(1, 243) = 9.08, p = .003, ηp² = .04. The interaction can be seen in Figure 2, which shows a larger difference for men than for women between mean ratings of what women say and what women want.
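The partial eta-squared values can be recovered from each F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the three reported effects; the function name is ours.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    numerator = f_ratio * df_effect
    return numerator / (numerator + df_error)


# The three effects from the Study 3 mixed-design ANOVA, each with df = (1, 243).
effects = {"sex": 0.08, "rating type": 143.29, "sex x rating type": 9.08}
for name, f_ratio in effects.items():
    print(f"{name}: partial eta squared = {partial_eta_squared(f_ratio, 1, 243):.4f}")
```

The rating-type and interaction values round to .37 and .04, respectively, and the value for the main effect of sex is well below .01.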
The comparison of the sexes’ mean composite ratings to those of Study 1 via one-sample t tests revealed an interesting pattern. Men’s composite mean for what women say (M = 1.42, SD = 0.65) did not significantly differ from men’s composite mean in Study 1, t(118) = −0.31, p = .75, d = 0.03. Men’s composite mean for what women want (M = 1.91, SD = 0.58) was significantly higher than men’s composite mean in Study 1, t(118) = 8.90, p < .001, d = 0.82. Women’s composite mean for what women say (M = 1.54, SD = 0.76) was significantly higher than women’s composite mean in Study 1, t(125) = 11.35, p < .001, d = 1.01. Women’s composite mean for what women want (M = 1.84, SD = 0.71) was also significantly higher than women’s composite mean in Study 1, t(125) = 16.96, p < .001, d = 1.51. We analyzed cross-sex comparisons from Study 3 to Study 1 via one-sample t tests, as we did in Study 2. Compared with women’s composite mean in Study 1, men’s composite means were significantly higher for what women say, t(118) = 10.91, p < .001, d = 1.00, and what women want, t(118) = 21.49, p < .001, d = 1.97. Compared with men’s composite mean in Study 1, women’s composite means were significantly higher for what women want, t(125) = 6.31, p < .001, d = 0.56, but not significantly different for what women say, t(125) = 1.51, p = .13, d = 0.13.
As depicted in Figures 3 and 4, these patterns recur across behaviors in the scale. Men and women provided higher ratings for what women want than for what women say, but the discrepancy was larger for male raters than for female raters. Figure 4 shows that women’s ratings were quite close to, or higher than, men’s original composite means from Study 1 and therefore substantially higher than women’s original composite means from Study 1.
Discussion
Study 3 suggests that participants in fact distinguish between what women report and what women actually intend. Participants indicated that they believed women’s true sexual intentions conditional on the behaviors in question were significantly higher than what women reported intending. Men’s ratings were even more affected by this distinction than women’s ratings. These analyses support our suggestion that men have been relatively accurate all along and that women underreport their own intentions.
Could asking participants to estimate both reported and actual intentions exaggerate differences in their ratings? Perhaps, but this was the point: to determine whether the distinction exists. If focusing participants’ attention on the distinction between what women say and what women actually want influenced ratings as a demand effect, then ratings of what women say ought to have been lower in Study 3 than in Study 2; by the same token, ratings of what women want ought to have been higher. Although we did find that men’s and women’s ratings of what women want were significantly higher in Study 3 than in Study 2, men’s ratings of what women say did not significantly differ from men’s Study 2 ratings, and women’s ratings of what women say were actually significantly higher than women’s Study 2 ratings (see Fig. 2 for means and confidence intervals). These results imply that ratings in Study 3 were not artificially altered by the demand characteristics associated with asking participants to make the distinction between actual and reported intent.
General Discussion
After replicating the findings of Haselton and Buss (2000) in a largely nonstudent sample (Study 1), we found that both men’s and women’s beliefs (Study 2) about women’s sexual intentions, conditional on the behaviors investigated, matched men’s estimates in Haselton and Buss’s research better than they matched women’s self-reported intentions. We subsequently found that men and, more important, women, believe that the level of likelihood that women intend to have sex is even greater than their reported intentions (Study 3).
Our interpretation of these data is that men appear to overestimate women’s sexual intentions because women understate them. Therefore, some previously documented patterns—for example, the difference between the estimates of the man and the woman in the conversation in Abbey’s (1982) experiment—might be better explained by women’s underreporting than by men’s overperception. Reconciling our data with other observations is more difficult. Consider the observers in Abbey’s experiment. If men are correct about women’s intentions and women know that other women’s sexual intentions are in line with men’s beliefs, then why do male and female observers disagree about the target woman’s sexual intentions? One possibility is that women evaluate individual women’s sexual intentions differently than hypothetical women’s sexual intentions, on average. Perhaps in studies such as Abbey’s, women more closely identified with the woman they were observing and answered questions about her sexual intentions in a way that reflected what they themselves would report.
The data from Study 3 move us away from the proposal that women overstate what other women actually intend as a form of derogation. The finding that men and women estimate similar levels of actual intentions (Study 3: what-women-want condition) leads us to believe that this similarity results from the same underlying cause: the truth of the matter. This explanation seems more parsimonious than the alternative: that men overestimate women’s intentions for reasons suggested by error-management theory, and women overestimate other women’s intentions for some other reason. Instead, our results suggest that women underreport other women’s sexual intentions unless special measures, such as those in Studies 2 and 3, are taken to elicit women’s sincere beliefs about other women’s intentions.
Moving forward, a useful piece of evidence would be the actual probability that a woman would have sex with a man conditional on her engaging in the behaviors in question. Retrospective accounts or diary studies of whether participants subsequently had sex with individuals with whom they engaged in the behaviors could address this. Furthermore, studies incorporating actual judgments of sexual interest (e.g., Perilloux, Easton, & Buss, 2012) could be enhanced via longitudinal tracking of romantic outcomes.
Our argument is again that true beliefs are useful in supporting any decisions that refer to those beliefs. False beliefs infect decision-making systems, which leads to suboptimality (except in cases in which false beliefs improve persuasive abilities; e.g., Haselton & Buss, 2009; Kurzban, 2010; Trivers, 2010). One should not expect cognition to be perfect. Still, given evidence that human judgments can be quite accurate in other domains (e.g., Gigerenzer & Hoffrage, 1995), systems designed to guide sexual-intent estimation might be expected to be particularly resistant to bias because of the central role that mating plays in reproductive success. Although all evolved systems are subject to constraints, error, and engineering trade-offs (e.g., Tooby & Cosmides, 1995), optimality is the best place to begin model construction. The default means by which evolution solves the problem of managing errors should be via systems that generate priors and cost-benefit estimates that are as accurate as possible, which would maximize expected value in decisions dependent on these estimates. Using systematically biased beliefs to motivate appropriate behavior will generally be an inferior solution compared with simply behaving appropriately on the basis of the most accurate beliefs available.
Acknowledgements
We thank Jaime Cloud, Aaron Lukaszewski, and Zach Simmons for feedback on an earlier draft of this article. We also thank Jae Ahn for helping with data collection.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.