Abstract
Memory for the content of our conversations reflects two partially conflicting demands. First, to be an effective participant in a conversation, we use our memory to follow its trajectory, to keep track of unresolved details, and to model the intentions and knowledge states of our partners. Second, to effectively remember a conversation, we need to recall the gist of what was said, by whom, and in what context. These two sets of demands are often different in their content and character. In this article, we review what is known about distant memory for conversations, focusing on issues that have particular relevance for legal contexts. We highlight evidence likely to matter in such settings, including estimates of how much information can be recalled, the quantity and types of errors that are likely to be made, and the situational factors that shape memory for conversation. The biases we see in distant memory for a conversation reflect in part the interplay of the conflicting demands that conversation places upon us.
Tweet
How should our memories for a conversation be treated by the legal system?
Key Points
The scientific study of memory for conversation has implications for the use of memory for conversation in legal settings.
After a short delay, conversational participants may recall from memory fewer than 20% of the specific ideas that were originally expressed.
If someone does freely recall a detail from conversation, that detail is reasonably likely to be accurate.
You are more likely to recall what you yourself said in conversation compared with what someone said to you.
Contextually inappropriate, salient, and explicit content is remembered better than mundane details.
Recommended action: Scientific findings on memory for conversation offer useful guidance for how such memory is used in official and legal settings.
Introduction
Former director of the Federal Bureau of Investigation (FBI), James B. Comey, testified before the Senate Select Committee on Intelligence regarding a series of conversations he had with Donald Trump, first as President-Elect and later as President. Among Comey’s recollections of their conversations was the following: He then said, “I hope you can see your way clear to letting this go, to letting Flynn go. He is a good guy. I hope you can let this go.” I replied only that “he is a good guy.” (J. Comey, June 8, 2017; U.S. Hearings before the Senate Select Committee on Intelligence, 2017)
How much faith should we have in Comey’s recollections? How literally should we take his reports of these events? This is not abstract, ivory-tower stuff. Questions about whether Trump will be indicted on a charge of obstructing justice may well hinge on assessments of the accuracy of reported conversations like this one.
In this article, we argue that if one takes seriously the findings from scientific research on memory for conversation, a series of recommendations follows about the utility of memory for conversation as an arbiter of truth.
Of course, everything old is new again, both in psychology and in politics. More than 40 years ago, memory for conversation was central to another political imbroglio. Former White House Counsel John Dean testified before the Senate Watergate Committee about President Nixon’s actions: I can very vividly recall that the way he sort of rolled his chair back from his desk and leaned over to Mr. Haldeman and said, “a million dollars is no problem. . . ” (J. Dean, June 27, 1973; U.S. Government Printing Office, 1973)
John Dean’s memory became a focus of interest for memory researchers because his testimony during the Watergate investigation regarding his conversations with President Nixon was so remarkably rich in detail. Audio recordings of these conversations later surfaced, creating an opportunity to objectively assess the veracity of Dean’s recollections. In a famous paper, Neisser (1981) compared Dean’s testimony regarding two conversations against the tape recordings. Neisser concluded that even though Dean’s memory was distorted, self-serving, and wrong in many details, the fundamental aspects of his testimony were nevertheless correct. Dean did get many details wrong. For example, the infamous conversation about the million-dollar blackmail demand occurred on March 21, 1973, whereas Dean testified that it occurred on March 13. Nixon also never said specifically that “a million dollars is no problem,” though he did say something similar: “[Y]ou could get a million dollars. And you could get it in cash. I, I know where it could be gotten” (R.M. Nixon, March 21, 1973; U.S. v. John N. Mitchell, et al., 1973).
When recollections of conversation are offered as testimony, what does the science say about the probative value of memory for conversation? How much of a conversation can people (children, adults, trained FBI investigators) remember, and which aspects are they likely to remember? And do actions such as taking contemporaneous notes affect memory for conversation?
The Science Behind Memory for Conversation
Memory for the content of conversation is central to having an effective conversation. When we lose the thread, forget a point recently made, or contradict ourselves, these errors are often attributable to failures of memory. Consequently, memory for conversation is shaped in part by the real-time demands of participating in it. However, these demands may be at odds with the demands the legal system places on memory for conversation, which often require a person to recall a conversation long after it has concluded. In this article, we review what is known about distant memory for conversation, focusing on issues that have particular relevance for legal contexts.
In conversation, what is said is the basis of what will be remembered. Yet language is situated in, and understood with respect to, the context of use (Clark, 1992). This context shapes the perception of words (Connine, 1987) and grammar (Ferreira, Bailey, & Ferraro, 2002), leading to a system in which multiple probabilistic constraints guide the real-time processing of language (Trueswell & Tanenhaus, 1994). Context is thus a potent force with legal relevance: in cases where context-driven expectations contradict what was said, errors of comprehension and memory are likely to follow. For court transcriptions (see Gorgos, 2009; Nicolas, 2010) and investigator interview notes (Lamb, Orbach, Sternberg, Hershkowitz, & Horowitz, 2000), such errors may be consequential.
An additional layer of complexity is that in legal settings we rarely know the ground truth of the matters being testified to, that is, what actually happened in the original conversation. Laboratory studies avoid this problem. They typically have participants engage in a conversation, followed by a delay of minutes, days, or weeks, and then ask participants to recall all of what was said. Like the Nixon-era Oval Office tapes, recordings of the original conversation provide an official record that can be compared with what was recalled. Two different assessments of accuracy are important. The first is whether the details included in the report are accurate. In the worlds of business and the law, this is often called “accuracy.” We adopt related terminology from memory research and call this output-bound accuracy (Koriat & Goldsmith, 1996): the accuracy of information conditional on its having been output. This can be contrasted with input-bound accuracy: the proportion of the original event that is reported (sometimes called “completeness”). Note the inherent tension between these measures: techniques that elicit more information typically increase input-bound accuracy but decrease output-bound accuracy.
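To make the two measures concrete, here is a minimal Python sketch. It assumes that a conversation and a recall protocol have already been coded into discrete idea units, represented here by hypothetical labels; the function name and the toy data are illustrative and are not drawn from any study cited in this article. Input-bound accuracy divides the correctly reported units by all units in the original conversation, whereas output-bound accuracy divides them by all units reported.

```python
def accuracy_scores(original, reported):
    """Return (input_bound, output_bound) accuracy for one recall protocol.

    original: non-empty set of idea units actually expressed in the conversation
    reported: set of idea units the rememberer reported
    Output-bound accuracy is undefined (None) when nothing is reported.
    """
    correct = original & reported  # reported units that really occurred
    input_bound = len(correct) / len(original)
    output_bound = len(correct) / len(reported) if reported else None
    return input_bound, output_bound


# Hypothetical coding: ten idea units expressed in the conversation.
original = {f"u{i}" for i in range(1, 11)}

# A sparse free-recall report: two units, both correct.
free_recall = {"u1", "u2"}

# A report elicited with more prompting: more units, one never actually said.
prompted = {"u1", "u2", "u3", "u4", "x1"}

for label, report in [("free recall", free_recall), ("prompted", prompted)]:
    ib, ob = accuracy_scores(original, report)
    print(f"{label}: input-bound = {ib:.2f}, output-bound = {ob:.2f}")
```

In this toy case, the prompted report doubles input-bound accuracy (0.20 to 0.40) while lowering output-bound accuracy (1.00 to 0.80), which is the tension described above.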
The way memories are elicited plays a large role in determining the balance of these two types of accuracy. Consider productive memory tasks, which require the rememberer to produce the sought-after information. This information could include the contents of conversation, the time it occurred, names of the persons involved, or many other details. Free recall testing involves the least intrusive cuing: one asks a person to report all of what was said, without any additional prompting. Free recall is commonly evaluated based on the successful recall of “idea units,” defined as the smallest unit of meaning that has “informational or affective value” (Stafford & Daly, 1984). Other units of measurement include recall of specific utterances, either verbatim or in gist (Bruck, Ceci, & Francoeur, 1999).
Alternatives to free-recall testing include cued recall, in which a person is prompted, as in, “What did Bob say when you asked him what classes he was taking?” Such cues prompt the rememberer to produce specific information from memory. These cues guide rememberers more pointedly toward aspects of the conversation that are relevant for the legal system and can protect against recall of irrelevant material that may interfere with memory for probative details.
Memory judgment tasks allow the researcher to probe memory by asking the rememberer to evaluate statements or queries. For example, recognition memory procedures may provide the rememberer with statements about a prior conversation (“Bob said he was taking Chemistry”) and ask them to evaluate the statements’ truth in light of their own memory for the conversation. Related memory judgments might involve remembering when a series of conversations took place, or whether a particular individual was present during a conversation.
In the following sections, we describe three bodies of scientific findings regarding memory for conversation with relevance in legal settings.
Input-Bound Accuracy: What and How Much Can Be Recalled?
Estimates of how much of a conversation can be accurately recalled in detail after delays of several minutes to several weeks are quite low, ranging from 0% to 20% of the total idea units that occurred in the original conversation (Miller, deWinstanley, & Carey, 1996; Pezdek & Prull, 1993; Ross & Sicoly, 1979; Samp & Humphreys, 2007; Stafford, Burggraf, & Sharkey, 1987; Stafford & Daly, 1984). For example, Samp and Humphreys (2007) tested memory after a 5-min delay and reported gist recall of 14% of the idea units from a 5-min problem-solving conversation. Stafford and Daly (1984) also tested memory after a 5-min delay and reported 10% gist recall of idea units from a 7-min unstructured conversation; the best of their 128 participants recalled only 40% of the idea units, and the worst recalled none. Ross and Sicoly (1979) tested memory after a 3-to-4-day delay and reported 6% gist recall of participants’ own statements, and only 3% recall of what the other person said. Overall, then, much of what is said in a conversation is forgotten, even after short delays. We also see a hint that memory might differ meaningfully for one’s own contributions to a conversation versus others’ contributions, a point we return to below.
Which elements of a conversation are remembered? One common conclusion is that memory for the gist or central point of a conversation is better than memory for the details (Bruck et al., 1999). Surface information is presumably abandoned and replaced by a sparser code that characterizes important “take-home” messages (Bartlett, 1932; Bransford & Franks, 1971).
Yet recognition-memory testing raises questions about whether this always holds for memory for conversation. Surprisingly, participants can discriminate between exact transcriptions of conversational content and paraphrases (Hjelmquist, 1984). This finding reflects the fact that specific details about individual words, not to mention their context and prosody, are critically relevant to speaker meaning (Brown-Schmidt & Tanenhaus, 2008; Wagner & Watson, 2010). Impressively, these details are stored in memory, although they are difficult to access except when very precisely cued.
Another wrinkle is that not all content is equally memorable. The juicy tidbits of conversation, such as jokes and out-of-place remarks, are better remembered than mundane content (Keenan, MacWhinney, & Mayhew, 1977; Kintsch & Bates, 1977; MacWhinney, Keenan, & Reinke, 1982). However, what is distinctive, and thus memorable, depends on the context. Memory for sexually explicit remarks, a domain of particular importance in sexual harassment cases, illustrates this contextual dependence well.
Consider the testimony of Anita Hill, former assistant to Clarence Thomas, now Associate Justice of the Supreme Court of the United States: One of the oddest episodes I remember was an occasion in which Thomas was drinking a Coke in his office, he got up from the table, at which we were working, went over to his desk to get the Coke, looked at the can, and asked, “Who has put pubic hair on my Coke?” (A. Hill, Thomas nomination hearings; U.S. Government Printing Office, 1993, Part 4, p. 38).
The public debate over the hearings reflected, in part, uncertainty over the validity of Hill’s recall. The comments are certainly odd and inappropriate. Pezdek and Prull (1993) reported an interesting result that bears on how such testimony should be interpreted. They found that recognition memory for sexually explicit content is better than memory for nonexplicit content, and that this difference is more pronounced when the utterance is contextually incongruous. This effect was evident even after a relatively long delay of 5 weeks.
Taken together, these findings drive home several important points. First, the ability to accurately recall details is low and drops quickly with time. Second, when appropriately cued, many details of past conversations can be remembered. Finally, the content and context of utterances determine much about whether they will be remembered.
Output-Bound Accuracy: How Much of the Remembered Information Is Accurate?
People reporting freely from their own memories tend to include only information they believe to be correct. Consequently, although free recall omits much information, it rarely elicits high rates of truly incorrect information (errors of commission). In one recent study of memory for objects (Stanley & Benjamin, 2016), approximately 90% of objects that were recalled had actually been viewed. When inaccurate information is produced, it is more likely to be hedged (“I guess… ”) or hesitant (Smith & Clark, 1993). Yet conversation has unique characteristics that might lessen this protective effect. Errors of commission for conversation can be highly consequential, so understanding their frequency and origin is important.
Error rate
Does the general reluctance to produce errors of commission extend to conversation? Miller et al. (1996) reported an error rate of 7%, where errors included both new information and source misattributions. Ross and Sicoly (1979) reported that only 56% of statements attributed to the conversational partner were accurate; the most common errors were recollections of information read by a participant prior to a conversation, and inferences they had drawn but that were never explicitly stated. Such errors illustrate two common failures of memory: mistaking plausible inferences for actually spoken statements, and source misattribution.
Errors of source
Mistaking information that was read for information that was heard is one example of a broad class of errors in which the rememberer gets the content right but the context wrong. These errors have the interesting property that they contain correct “idea units.” They include errors in attribution of the source of information (Johnson, Hashtroudi, & Lindsay, 1993), and errors in which a person mistakes something they thought about for something that was witnessed (Johnson & Raye, 1981).
The context or source of information is often forgotten more quickly than the statement itself (Marsh & Bower, 1993). What happens when we recall having heard that cellular phones cause cancer, but not whether we heard it from a guy at a bar or from the National Cancer Institute? This circumstance can lead to a sleeper effect, a potentially dangerous situation in which a message from a low-credibility source becomes more persuasive over time because the individual can no longer remember where it came from (Kumkale & Albarracín, 2004).
When source memory errors are made, they reflect an amalgamation of partial information and biases about who is likely to have said what. For example, statements about the law are more likely to be attributed to a lawyer than to a doctor (Bayen, Nakamura, Dupuis, & Yang, 2000), and utterances are likely to be misattributed to members of the same social group as the original source (Klauer & Wegener, 1998). These effects reveal once again the constructive nature of memory. A related type of source error is cryptomnesia, a phenomenon wherein a person inadvertently reports that they themselves said something when the true source was someone else (Brown, Jones, & Davis, 1995). Such errors are important in disputes over credit for ideas and intellectual work.
Finally, what if an investigator remembers what a witness said but forgets that the witness’ statement was prompted (Lamb et al., 2000; Bruck et al., 1999)? In one study of mothers and their children, Bruck et al. (1999) found that mothers’ memory both for who said what (source judgments) and for how information was elicited from a child was poor. Failing to recall that a response was prompted is a significant error, as prompting increases the probability that a target memory will be successfully reported but can also decrease the output-bound accuracy of that report (Loftus, 2005).
Errors of inference
Inference-induced recall of unstudied words (Matzen & Benjamin, 2009; Roediger & McDermott, 1995) is a classic example of a memory error. However, drawing inferences is a normal part of language processing (Christianson, Hollingworth, Halliwell, & Ferreira, 2001). In fact, speakers regularly intend pragmatic inferences that go beyond the literal string of words. The question “Shoplifting’s fun?” insinuates a good deal about the addressee (Gunlogson, 2004). In conversation, when inferences are mistaken for actual content from the conversation, the rememberer may nonetheless be providing an accurate report of what the speaker intended. If what was intended is in dispute, however, memory for what was said may be of limited utility.
Inconsistency of recall
Witnesses in legal settings are often interrogated multiple times. Of course, each retrieval reflects in part the influence of prior interrogations, which is one reason recall tends to become more homogeneous over time (Bartlett, 1932). Nevertheless, inconsistencies arise across multiple recall attempts, and those inconsistencies are often used to attack a witness’ credibility (Alavi & Ahmad, 2002). Stanley and Benjamin (2016) showed that information that appears inconsistently in a witness’ recall is, in fact, more likely to be incorrect than information that appears consistently. Of additional relevance to the legal domain is their finding that witnesses who are more inconsistent in their recall are less accurate overall, even in the details they recall consistently.
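The basic comparison behind this kind of finding can be sketched in a few lines of Python. This is a simplified toy illustration, not the analysis Stanley and Benjamin (2016) actually performed: it assumes that two recall attempts have been coded into sets of details, that ground truth (for example, a recording) is available, and that the detail labels and values are hypothetical.

```python
def partition_by_consistency(recall_1, recall_2):
    """Split details from two recall attempts into consistent and inconsistent sets."""
    consistent = recall_1 & recall_2    # reported in both attempts
    inconsistent = recall_1 ^ recall_2  # reported in only one attempt
    return consistent, inconsistent


def proportion_correct(details, truth):
    """Output-bound accuracy of a set of details; None if the set is empty."""
    if not details:
        return None
    return len(details & truth) / len(details)


# Hypothetical ground truth and two recall attempts by the same witness.
truth = {"a", "b", "c", "d", "e"}
recall_1 = {"a", "b", "c", "x"}
recall_2 = {"a", "b", "d", "y"}

consistent, inconsistent = partition_by_consistency(recall_1, recall_2)
print("consistent:", sorted(consistent), proportion_correct(consistent, truth))
print("inconsistent:", sorted(inconsistent), proportion_correct(inconsistent, truth))
```

In these toy data, every consistently reported detail is correct and only half of the inconsistently reported details are, which matches the direction of the reported finding; real protocols would be far noisier.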
What Situational Factors Influence Memory?
Interrogation
There is an inherent trade-off involved in querying memory. On one hand, rememberers should guide as much of the process on their own as possible. This prevents an interrogator from intentionally or unintentionally biasing memory and consequently tends to yield higher output-bound accuracy. On the other hand, having a strong and accurate cue to a past event, like a reminder of a distinctive aspect of the conversation, increases the probability that the sought-after information will be successfully retrieved, thereby increasing input-bound accuracy. Depending on the demands of the situation, different techniques for interrogating memory may be appropriate.
The amount of specific cuing to a particular event, piece of information, or moment in time can vary. More information may be accessed through techniques that guide rememberers through a series of general, but increasingly precise, cues. The cognitive interview is an example of a tool designed to be minimally interfering (Campos & Alonso-Quecuty, 2008; Fisher & Geiselman, 1992). When the goal is to maximize output-bound accuracy, minimal cuing is desirable. When the goal is to access or assess a specific piece of information, cuing is more likely to successfully lead to that memory, and thus maximize input-bound accuracy.
Asking rememberers to make judgments can lead them to a piece of information that they have difficulty producing on their own, but it can also lead them to form false memories (Loftus, 1975), to confuse a plausible inference with a genuinely remembered statement (Bransford & Franks, 1971), or to confuse expectations with memory for what happened (Brewer & Treyens, 1981). After decades of research, and many dramatic cases of misremembering owing to the influence of postevent details or leading interrogation (for reviews, see Loftus, 2005; Zaragoza, Belli, & Payment, 2007), we now know that extreme care must be exercised when interrogating memory. Each probe leaves its own mark (Loftus, 1975) and has the potential to alter future remembering or disrupt the confidence–accuracy relationship for remembered materials (Wixted, Mickes, & Fisher, 2018).
Retrieval-induced forgetting, or blocking, can affect the ability to recall information in the future. Conversation about elements of a socially shared event, such as 9/11, can cause the nondiscussed elements of an event to become less accessible in memory (Coman, Manier, & Hirst, 2009). Similarly, natural sharing among peers following an event can shape and distort individual memory for that event (Principe & Ceci, 2002; Principe & Schindewolf, 2012). Findings like these raise the question of whether the structure of discourse shapes what will and will not be recalled.
Delay
In general, input-bound accuracy decreases as the interval between the event and the interrogation grows, reflecting the decline of memory for the details of conversation over time (Campos & Alonso-Quecuty, 2006; Pezdek & Prull, 1993; Stafford et al., 1987). Yet “coarse-grained” information in a conversation is forgotten more slowly (e.g., Christiaansen, 1980; Conway, Cohen, & Stanhope, 1991). Forgetting is typically most rapid initially and slows over time (Wixted, 2004), so details that are retained for a considerable period after an event are much less likely to be forgotten subsequently.
Counterintuitively, delay does not appear to affect output-bound accuracy. Because people can control what they report, they “correct” for forgetting by not reporting details to which they have lost access or in which they no longer have confidence (Ebbesen & Rienick, 1998; Poole & White, 1993). So although there are strong reasons to expect rememberers to lose access to information over time, there are also reasons to trust their reports just as much after a long interval as after a short one. The same principle applies to eyewitness identification (Wixted, Read, & Lindsay, 2016).
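The logic of report control can be sketched with a simple threshold rule in the spirit of the monitoring-and-control account of Koriat and Goldsmith (1996). This minimal Python sketch assumes that each candidate detail carries a confidence value, that the rememberer reports only details whose confidence meets a fixed threshold, and that confidence tracks correctness. The function names, the threshold, and all values are hypothetical.

```python
def controlled_report(candidates, threshold=0.75):
    """Keep only candidate details whose confidence meets the report threshold.

    candidates: list of (detail, confidence, is_correct) tuples
    Returns a list of (detail, is_correct) pairs that would be reported.
    """
    return [(d, correct) for d, conf, correct in candidates if conf >= threshold]


def accuracy(reported, n_original):
    """Input-bound and output-bound accuracy of a report (None if nothing reported)."""
    n_correct = sum(correct for _, correct in reported)
    output_bound = n_correct / len(reported) if reported else None
    return n_correct / n_original, output_bound


N_ORIGINAL = 8  # hypothetical number of details in the original conversation

# Hypothetical confidence values: with delay, confidence in true details fades,
# while the one false candidate ("x1") stays below threshold at both delays.
short_delay = [("d1", 0.95, True), ("d2", 0.90, True), ("d3", 0.85, True),
               ("d4", 0.80, True), ("d5", 0.60, True), ("x1", 0.50, False)]
long_delay = [("d1", 0.90, True), ("d2", 0.80, True), ("d3", 0.50, True),
              ("d4", 0.40, True), ("d5", 0.30, True), ("x1", 0.50, False)]

for label, cands in [("short delay", short_delay), ("long delay", long_delay)]:
    ib, ob = accuracy(controlled_report(cands), N_ORIGINAL)
    print(f"{label}: input-bound = {ib:.2f}, output-bound = {ob:.2f}")
```

Under these assumptions, the longer delay halves input-bound accuracy (0.50 to 0.25) while output-bound accuracy stays at 1.00. If confidence were poorly calibrated, so that false details carried high confidence, the same rule would let errors through.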
Conversational Role
Memories for conversation are affected by one’s role in the conversation. Active conversational participants better understand what is said (Schober & Clark, 1989), and remember more content (Benoit & Benoit, 1994; MacWhinney et al., 1982). For nonparticipants, increased richness of the experience seems to increase memory. For example, watching a video produces better memory than listening to an audio recording (Campos & Alonso-Quecuty, 2006).
Memory for the content of conversation is also generally superior for what a person has said themselves, compared to what they heard (Fischer, Schult, & Steffens, 2015; McKinley, Brown-Schmidt, & Benjamin, 2017; Miller et al., 1996; Ross & Sicoly, 1979; Yoon, Benjamin, & Brown-Schmidt, 2016). This finding is consistent with a large body of literature indicating that generation or production of information enhances memory for that information (MacLeod, Gopie, Hourihan, Neary, & Ozubko, 2010; Slamecka & Graf, 1978).
By contrast, effects of conversational role on memory for who said what (source memory) are equivocal (Brown et al., 1995; Fischer et al., 2015; Gopie & MacLeod, 2009; Jurica & Shimamura, 1999; cf. McKinley et al., 2017). As an example of how failures of reality monitoring can be pernicious, memory for who said what may be most impaired when one easily anticipates what one’s partner is about to say next (Foley, Foley, Durley, & Maitner, 2006). Such confusions may underlie some of the many public battles over who deserves credit for the scientific advances leading to Nobel Prizes (e.g., Fletcher, 1982).
Memory for Context
Beyond memory for the source and destination of utterances in conversation, one can also measure memory for its context. Yoon et al. (2016) tested memory for pictures that were referenced in a conversational game. Although speakers outperformed listeners, participants regardless of role distinguished pictures that had been talked about from similar ones that never appeared in the experiment. Pictures that were seen but not discussed were remembered poorly, except when the speaker described one picture by contrasting it with another (e.g., “the leather boots” in a context with leather and plastic boots).
Contrast can also be evoked with spoken prosody. Fraundorf, Benjamin, and Watson (2010) examined memory for spoken narratives that contained prosodic emphasis: “ . . .the British and the French biologists had been searching. . . Finally the BRITISH found. . . .” Fraundorf et al. found that prosodic emphasis improved both memory for the correct outcome (the BRITISH) and correct rejection of the alternative (the French). These findings indicate that the way we relay information about events can shape our memory for those events; they also reveal another way in which written transcripts miss critical information about how language is understood in context.
Contemporaneous Notes
In legal settings, witnesses are generally allowed to testify only about their own experiences, though there are exceptions to this rule. Saks and Spellman (2016) discussed an exception wherein an adult testifies on behalf of a child, for example, when an investigator has interviewed the child about possible abuse. A potential cause for concern is the degree to which investigator notes accurately convey the context of the interviewee’s statements. A comparison of audio recordings with written “verbatim” notes made by experienced child sex abuse investigators found that 25% of forensically relevant details recounted by children during the forensic interview were missing from the notes; errors of commission in the notes were rare, occurring at a rate of 0.004% (Lamb et al., 2000). However, investigators failed to record 57% of their own utterances, thus underreporting the rate at which interviewee statements were responses to specific prompts (also see Bruck et al., 1999). The degree to which a retrieved detail was elicited by cuing is highly relevant to determining its output-bound accuracy. Still, a key function of contemporaneous notes is to create an external record of a conversation that does not decline with time. Thus, although notes may be imperfect, they are likely to have higher input-bound accuracy than free recall of the conversation at a delay.
Policy Implications of Memory for Conversation
Although the empirical literature on conversational memory is modest compared with the volumes written on eyewitness memory for events (e.g., Loftus & Palmer, 1974), memory for conversation is highly relevant to the legal system (for discussion, see Davis & Friedman, 2007).
As we have illustrated, the overall quantity of information that can be recalled at a delay is low, but what is recalled is broadly accurate. Genuine attempts to recall in full detail the contents of a past conversation are likely to result in far more errors of omission than errors of commission. The amount of information recalled is likely to be higher for people who were actively involved in the conversation and for those queried after shorter delays. The accuracy of what they report, however, is not likely to be much affected by these factors.
Salient information is more likely to be recalled than mundane information. This may be a type of primary distinctiveness effect. The implication is that a witness’ ability to recall inappropriate, explicit, or illegal content is likely to be greater than his or her ability to recall mundane details of the conversation. Such findings may help explain the memorability of comments such as “it depends on what the meaning of the word ‘is’ is,” and “basket of deplorables.”
At the same time, the literature is clear that for the bulk of what is said in conversation, the ability to recall the precise words is generally poor. Interlocutors are much more likely to remember the gist of what was said than the exact words. These conclusions apply to explicit, illicit, and mundane content.
If the legal implications of memory for conversation hinge on memory for the precise wording of what was said, an investigator may be on less solid ground than if the gist and intentions are in question. At the same time, meaning in language is shaped by a combination of words, gestures, prosody, and actions situated in a rich context among persons who are aware of conventional or pragmatic meaning. Philosophers of language have long pointed out that expressions such as “Can you pass the salt?” are pragmatically interpreted as requests. If a person were to recall that a speaker asked them to “Please pass the salt,” when the original utterance was in fact “Can you pass the salt?”, the literal meaning of the recall would be closer to what was intended than the literal meaning of the original request.
Contemporaneous notes (“memcons/telcons”) were reportedly taken by both James Comey (former director of the FBI) and Andrew McCabe (former FBI deputy director) following conversations with President Trump. Although we know of no research that addresses the consequences of such note-taking for memory for conversation, relevant findings support strong hypotheses. First, research with students in classroom settings shows that the act of writing down information enhances memory for that information (Aiken, Thomas, & Shennum, 1975; Einstein, Morris, & Smith, 1985). Second, when one composes notes shortly after an event, as Comey claims to have done, input-bound accuracy is likely to be at its highest. One would thus expect notes taken shortly after an event to include more details than would be recoverable later, during an interrogation.
We know of no experimental evidence that speaks directly to the quantity and quality of notes taken in this fashion. We can, however, extrapolate from experiments that test conversational recall in the minutes following a conversation (Samp & Humphreys, 2007; Stafford & Daly, 1984), from the accuracy of notes taken during the conversation itself (Lamb et al., 2000), and from evidence that highly salient and contextually surprising information is more likely to be recalled (Keenan et al., 1977; Pezdek & Prull, 1993). An additional consideration is that taking the notes might block memory for elements of the conversation that were not recorded in them. Taken together, these findings suggest that the Comey notes are likely to be incomplete but largely free of errors. Whether Comey will, in hindsight, be judged “fundamentally right” about what happened (Neisser, 1981) remains to be seen.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
