Abstract
The majority of research on investigative interviewing has focused on police attempting to solve a crime by obtaining a confession or gathering information, and comparatively few studies have examined interviewing at points “downstream” in the process, such as in the courts or correctional system. Furthermore, the focus of the research has been to measure the variable techniques or questioning strategies that produce confessions or information at the expense of analyzing factors related to the interview itself. Thus, we analyzed a sample of 50 corrections-based interviews for “dynamic” interviewing methods and interviewee responses that were measured at three points throughout the interview, and we measured 10 “static” interview factors. In the final multilevel model, we found that productive questioning methods increased a component score that combined interviewee cooperation, engagement, and forthcomingness, several measures of accusatorial interrogation methods decreased the outcome measure, and the case-level variable of interviewee-initiated interviews increased it.
Research on interrogation and investigative interviewing has matured rapidly in recent years in response to miscarriages of justice, human rights abuses, and a more general turn toward adopting science-based methods in the field (Meissner et al., 2017; Vrij et al., 2017). Researchers have largely examined the variable methods that may be employed by interviewers, such as question type (Winerdal et al., 2019), interpersonal behaviors (Alison et al., 2013), evidence disclosure (Hartwig et al., 2014), or a slate of specific techniques (Kelly et al., 2013). In addition, these methods have been found to vary within interviews (Griffiths & Milne, 2006), or in a “dynamic” fashion (Kelly et al., 2016) over interrogations where the rate of different methods used changed over three time periods. Comparatively less research has examined attributes of the interviews themselves that may influence outcomes, what Cleary (2014) termed “objective interrogation circumstances” in her study of youthful suspects, or what Kelly et al. (2016) referred to as “static” or unchanging factors of the interview.
The majority of research in this area, moreover, has been aimed at understanding the role of interviewing in the context of criminal investigations where, especially in the United States, obtaining a confession is the goal of the interview (Kelly & Meissner, 2016) and, regardless of national context, solving a specific crime is the primary goal of the investigation. With a few exceptions, there is a dearth of research on investigative interviewing at other points “downstream” in the criminal justice process such that new settings and populations to study would make meaningful contributions to the literature. To this end, the present study examined a sample of interviews of men incarcerated in a large county jail in Las Vegas, Nevada, and it is the first in the literature to use corrections-based interviews to understand the phenomenon of interviewing and interrogation. We will review existing research on common models of interrogation that include both the dynamic (e.g., rapport-building, question types) and the static (e.g., interviewee age, gang affiliation). Furthermore, we employed multilevel statistical models to simultaneously analyze dynamic interview methods and static interview-related factors, finding that variables at both levels of analysis influence interviewee responses.
Before reviewing the literature, a brief comment on the title of this article is warranted. It is attributable to a corrections officer who participated in this study who remarked that his approach to an interview was to treat it as “just a normal conversation.” From the perspective of a professional investigator, this mental model for handling interviews appears simply to be a way to manage a large volume of work, but one practitioner’s observation creates opportunity for scientific exploration. Thus, our goal in this article was to build upon existing research, continue to unpack the complexities of investigative interviewing, and understand what could make an interview “just a normal conversation.”
Effective Interviewing: Evidence From the Field
Field research on interviewing and interrogation has largely focused on criminal investigations conducted by police attempting to solve a crime by obtaining incriminating information and confessions. In an effort to organize the large number of known interrogation methods, Kelly and colleagues (2013) examined law enforcement training manuals, government reports, and academic publications, arguing that a taxonomic structure existed among the methods. At the macro-level were dichotomous categories such as information-gathering versus accusatorial methods, and the smallest units were the micro-level techniques such as finding common ground, appealing to the suspect’s conscience, or bluffing about evidence (totaling approximately 70 discrete techniques). Between these, they argued, existed six meso-level domains which were more descriptive than the broad categories, yet more parsimonious than the specific techniques.
First, the Rapport and Relationship-Building domain seeks to establish a working relationship between investigator and subject, with example techniques in this domain including meeting the subject’s basic needs (e.g., offers of food, water), finding common ground, and expressing empathy. Next, Emotion Provocation is akin to minimization, and it includes techniques such as appealing to the subject’s self-interest, appealing to his or her conscience, offering rationalizations, and flattery. Minimization, it should be noted, has been described as a “soft-sell” constellation of techniques (Kelly, Russano, et al., 2019) aimed at manipulating a suspect’s perception of the crime, the interview, possible punishments, and the suspect him/herself. Context Manipulation is altering the physical or temporal space in which the interview occurs (Kelly, Dawson, & Hartwig, 2021), and the Confrontation/Competition domain includes accusatorial techniques such as expressing frustration and explicitly or implicitly making threats of punishment for noncooperation. Next, appealing to the subject’s sense of cooperation and exchanges between interviewer and interviewee are included in the Collaboration domain. Finally, showing or alluding to information from sources such as criminal records, witness or victim statements, surveillance footage, or forensic tests represents the Presentation of Evidence domain.
The domain structure has been applied in subsequent research describing the rate at which interrogators and interviewers use the methods (Redlich et al., 2014; Russano et al., 2014), and for comparing interviewing practices across countries (J. C. Miller et al., 2018). In these self-report studies, Rapport and Relationship-Building was used by the participants more than other domains and across different scenarios and intended outcomes. When applied to actual interviews, Kelly et al. (2015) found that Rapport and Relationship-Building was most used and that the Confrontation/Competition and Presentation of Evidence domains were associated with a lower likelihood of confession. In a subsequent analysis, the latter two domains reduced suspect cooperation, whereas Rapport and Relationship-Building significantly increased it (Kelly et al., 2016). Moreover, Kelly et al. (2016) reported significant variation on the average use of the domains and suspect cooperation over three time periods, indicating that interrogation and interviewing methods and interviewee responses have dynamic, or changeable, properties that could be captured using appropriate coding structures.
In the several studies on police interrogation that preceded the taxonomy and its domains, Leo (1996) found that among the 25 techniques he observed in both live and recorded interrogations, appeals to the suspect’s self-interest and conscience and the use of moral justifications—each found within the Emotion Provocation domain—were most commonly used and associated with providing incriminating evidence. Feld (2013), however, found minimization techniques were less frequently applied to juvenile suspects; instead, Feld reported more accusatorial techniques related to the Confrontation/Competition domain used in his sample. In the third major study to examine accusatorial interrogation exemplified by the Reid method (Inbau et al., 2013), King and Snook (2009) reported high rates of techniques related to these same two domains and direct accusations of guilt.
The research described above was undertaken in the United States and Canada, where the Reid technique and similar accusatory interrogation styles predominate (Kelly & Meissner, 2016; Snook et al., 2015), and it should be noted that this type of interrogation is problematic in that it increases the risk of false confessions (Meissner et al., 2014). A contrasting framework has been developed and evaluated in the United Kingdom and elsewhere that is based on rapport-building with the goal of eliciting information. The PEACE model (Milne & Bull, 1999; Williamson, 1993) exemplifies the information-gathering counterpoint to accusatorial interrogation, wherein rapport-building is central to the model, coupled with good questioning strategies and a search for the truth that supplants confession-seeking motivations of the interviewers. Research consistently demonstrates that where the PEACE model is adhered to, suspects disclose more information and the risk of false confessions is significantly reduced (Bull & Soukara, 2010; Meissner et al., 2014; Soukara et al., 2009; Walsh & Bull, 2010).
Question Types
As opposed to the various tactics, techniques, and procedures that researchers have focused on in investigative interviewing, other studies have sought to understand the influence of questioning strategies. In the broadest sense possible, the research on question types has examined open- versus closed-ended questions. Open questions are generally considered those in which an interviewee may give a free-narrative account of the events under investigation, and closed questions considerably constrain her ability to share information as they elicit brief and undetailed responses. Research has found that skilled interviewers should ask more open, appropriate, and/or productive questions (Clarke & Milne, 2001; O’Mahoney et al., 2012; Read et al., 2009) which are more effective on a range of outcomes than closed, inappropriate, or unproductive ones (Oxburgh et al., 2012; Phillips et al., 2012; Walsh & Milne, 2008).
As part of the Motivational Interviewing (MI) framework in the therapeutic literature, W. R. Miller and Rollnick (2012) developed a typology of productive interviewing known as OARS: open-ended questions (i.e., Tell, Explain, Describe; 5W and H); affirmations (e.g., “Thank you for speaking with us,” “We appreciate the difficulty of talking about this issue”); reflective listening (e.g., minimal encouragers, echoing words); and summaries (i.e., periodic summarizing of what the interviewee had previously said). Support has been found for MI in general health studies (Lundahl et al., 2010) and among correctional populations (McMurran, 2009), though specific evaluations of OARS itself appear to be relatively rare but supportive of the concepts (Bell & Cole, 2008).
The Static Nature of Interrogation
In contrast to the research on interview techniques and questioning strategies, relatively few studies have examined attributes of the interviews or cases themselves that may influence outcomes, what Kelly et al. (2016) referred to as “static” factors that do not change over the course of an interrogation. In an early examination of police interviewing, Moston et al. (1992) were among the first to consider “pre-interview variables” such as contextual factors (e.g., evidence strength, number of times the suspect was interviewed) and characteristics of the case and suspect (e.g., offense type and severity, suspect’s criminal history, age, and gender). Moston et al. reported that stronger evidence, offense severity, and presence of a legal advisor were related to interview outcomes.
These static factors have received comparatively little attention since Moston et al.’s (1992) work, as interrogation researchers have largely turned their attention to the effectiveness of the techniques, approaches, or questioning strategies interviewers could employ to achieve their goals (see meta-analysis of Meissner et al., 2014). In the past decade, however, several studies emanating from Canada revived this interest in case characteristics’ influence on interview outcomes. Beauregard and colleagues (Beauregard et al., 2010; Beauregard & Mieczkowski, 2010, 2012) analyzed a large sample of sex offenders who were interviewed about their interrogations to determine the factors “outside the interrogation room” that influenced their decision to confess. These studies generally found support that case and offender factors, including age of the suspect and victim, affected the confession outcome.
In her study of American juvenile interrogations, Cleary (2014) also measured a wide range of case and contextual factors, what she referred to as “objective interrogation circumstances,” that could influence the interaction between police and suspects. Cleary measured suspect demographics, crime category, and whether the suspect was in custody or under arrest and the physical environment of the interrogation room. Similarly, in a recent study of the static factor of the physical context in which interviews occurred, Kelly, Dawson, and Hartwig (2021) randomly assigned witnesses to violent crime to be interviewed by police detectives in one of the two contextual conditions—a modified room intended to enhance comfort and warmth, or the control condition of a standard interrogation room.
In an extensive study of interviews of suspected terrorists, Alison et al. (2013) presented what could otherwise have been classified as examining offender or case characteristics. The researchers presented three tables that examined their primary variable groupings as “function[s] of suspect-background” (Alison et al., 2013, pp. 423–424, Tables 5–7). They analyzed differences across international terrorists, right-wing extremists, and paramilitary members, finding variation on the primary variables between groups. Alison et al., however, did not include the static variable of suspect background in their final model. In a related follow-up study, Surmon-Böhr and colleagues (2020) employed a multilevel structural model to examine information yield of suspected terrorists (also including measures related to OARS of MI), but they entered no static interview-level variables.
Finally, another important static factor germane to this study is the matter of gang affiliation of the interviewee. Although the literature on gangs in America is quite rich in both depth and breadth, no prior study has examined investigative interviewing among this population. Regardless, we know from the research that gangs are nearly omnipresent inside correctional facilities (Skarbek, 2014), that they adhere to and mirror street gang culture, including an unwillingness to cooperate with authorities, and influence behavior—both criminal and otherwise—inside and outside the facilities in which they operate (Pyrooz & Decker, 2019).
Gaps and Research Questions
Moston et al. (1992) put forth an interaction process model of interviewing that included both case and offender characteristic variables and interviewer’s choice of methods and questioning strategies. These factors, when considered together, would result in a successful resolution to the interview. The model was only partially tested by Moston et al., however, and the authors went on to state “that there were few obvious links between case characteristics and interviewing styles” (p. 38). The laboratory and field studies in the years since belie this notion that interviewing styles or techniques make no difference, but no research of which we are aware since then has examined both interview methods and case factors in the same study. Thus, using a novel setting and sample for investigative interviewing research, we asked the following research questions:
With regard to the first question, the study established a baseline of interviewing practices and their relationships to interviewee responses that can be used for comparative purposes with conventional police interviewing research described above. As noted, this is the first such examination of corrections-based interviewing practices, and it is heretofore unknown how these investigators may differ from police investigators in their respective uses of various techniques and question types. The nature of the investigations and the outcomes of them differ between police and correctional interviews—the former may lead to arrest, prosecution, and punishment, and the latter (in this study specifically) are nonprosecutable intelligence interviews or institutional rule violation investigations. Furthermore, not all suspects are in custody while being interviewed, whereas, by definition, incarcerated persons are, and this may influence interview methods employed by investigators. Although these differences may influence the interview methods used in correctional settings, we argue that it is premature to offer hypotheses on how corrections-based interviews differ from police interviews.
Question 2 provided unique descriptive data on what correctional interviews look like and examined the bivariate relationships between interview characteristics and the outcome measure. To our knowledge, the final analyses of the third question represent a first-of-its-kind model of investigative interviewing where both interview methods and characteristics were simultaneously modeled. In sum, the contribution of these questions and the article more generally is that the virtues of correctional interviews are themselves unknown in the literature and they may have applicability across different investigative contexts. Specifically, the interviews analyzed in this study may more closely resemble those in other contexts such as national security and intelligence interviews than suspect interrogations where a confession is the main goal and the interviewee is not necessarily in custody.
Materials and Methods
This study’s procedures were approved by the first author’s university institutional review board and that of the funding agency who sponsored this research, and permission to use the interviews was authorized by a memorandum of understanding (MOU) between the research team and the Las Vegas Metropolitan Police Department (LVMPD).
Setting and Sample
The focus of this study was the Gang Special Investigations Unit (GSIU) of the Clark County Detention Center (CCDC), a division of the LVMPD. Because the LVMPD oversees both street-level law enforcement operations and the county corrections system, the GSIU is positioned to facilitate information sharing across the agency (see also Maxson, 2012; Pyrooz & Decker, 2019). At the time of data collection, there were five investigators in the unit whose responsibility is to ensure the safety of the incarcerated persons and staff and the security of the facility by investigating gang membership and activities, institutional rule violations, and when necessary, criminal activities that may be referred to the district attorney for prosecution. The GSIU investigators routinely conduct investigative interviews as part of their duties, but beyond what they would receive in-house, their exposure to any formal interrogation and interviewing training tended to be of the accusatorial variety like the Reid method (see Meehan et al., 2019, for more on the GSIU).
One hundred and thirty-eight video-recorded interviews were provided by the GSIU as part of an unrelated study on multicamera technology in an interviewing booth and were later made available by LVMPD for this examination of interviewing methods. The majority (86.2%) took place between March 2010 and June 2011, with the remainder being recorded between January 2014 and February 2016. Prior to randomly selecting the interviews for analysis, we restricted the sampling frame for several reasons. We removed the 15 recordings where either the interviewee or the investigator was a woman. An additional 16 recordings were removed from consideration due to the length of the interview, as interviews shorter than 15 min (n = 9) contain little more than demographic information and those more than 90 min are atypical in this population (n = 7). Finally, interviews were excluded because the interviewees appeared to suffer from severe mental health issues (n = 3), because they were conducted with translators (n = 2), and because there were two interviewees at once (n = 2).
Each of these static factors (gender, interview length, mental illness, translators, and multiple interviewees) is a potentially interesting focal point for future research, but among the recordings in our possession, there were too few of each for proper analysis as static variables, and they were therefore removed to limit potential confounding influences in the study. The sampling frame, therefore, consisted of 100 recordings from which we randomly drew our final sample of 50 interviews, a number that is sufficient for multilevel analysis and feasible given the resources available for intensive coding. The mean length of interviews in the final sample was 35 min and 3 s (SD = 0:16:10), and the total amount of interview time across the sample was 29 hr, 12 min, and 7 s. All interviewees in the final sample appeared in only one interview.
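The construction of the sampling frame can be retraced with a short sketch. The counts are those reported above; the recording IDs, the random seed, and the assumption that the exclusion categories do not overlap are illustrative only and are not taken from the study's materials.

```python
import random

# Counts reported in the text.
total_recordings = 138
exclusions = {
    "woman interviewee or investigator": 15,
    "shorter than 15 min": 9,
    "longer than 90 min": 7,
    "severe mental health issues": 3,
    "conducted with translators": 2,
    "two interviewees at once": 2,
}

# Assumes each excluded recording falls into exactly one category.
frame_size = total_recordings - sum(exclusions.values())

# Hypothetical recording IDs 1..frame_size; the seed is arbitrary.
rng = random.Random(2021)
sample_ids = sorted(rng.sample(range(1, frame_size + 1), k=50))

print(frame_size)       # 100
print(len(sample_ids))  # 50
```

Sampling without replacement, as `random.sample` does, matches the statement that every interviewee in the final sample appeared in only one interview, provided each recording in the frame involves a distinct interviewee.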
Procedure
Coding
The most common approach to measuring and analyzing investigative interviews has been to code whether the techniques or methods were used at any point in the session (e.g., Feld, 2013; Leo, 1996; Soukara et al., 2009). Other content analyses of police interviews, however, coded recordings in 5-min intervals to examine how the methods employed by investigators changed over the course of the interview (Bull & Soukara, 2010; Pearse & Gudjonsson, 1999). Kelly et al. (2016) did this as well in their dynamic analysis of interrogation methods and suspect responses, but they also aggregated the 5-min intervals representing beginning, middle, and end segments for additional analyses (pp. 301–303). In this study, given the resources available, we adopted a modified version of these procedures, dividing each interview into thirds and coding interview methods and interviewee responses for each segment. In those instances where the interview was not equally divisible by three to the half-minute, the middle segment absorbed the extra time. Thus, we coded and analyzed 150 segments that had an average length of 11 min and 57 s (SD = 0:05:30). Four research personnel served as coders, with two coders responsible for the interview methods and two for the interviewee responses. The coders were paired off in this manner to eliminate potential bias in coding for both methods and responses.
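One plausible reading of the segmentation rule is sketched below. The function name, the 30-s step, and the choice to round the first and last segments down to the nearest half-minute are assumptions made for illustration; the text specifies only that the middle segment absorbed any extra time.

```python
from typing import Tuple


def split_into_thirds(total_seconds: int, step: int = 30) -> Tuple[int, int, int]:
    """Split an interview into beginning, middle, and end segments.

    The outer segments are one third of the interview rounded down to a
    multiple of `step` seconds; the middle segment takes the remainder.
    """
    if total_seconds < 3 * step:
        raise ValueError("interview too short to split at this step size")
    outer = (total_seconds // 3) // step * step
    middle = total_seconds - 2 * outer
    return outer, middle, outer


def fmt(seconds: int) -> str:
    """Format a duration in seconds as mm:ss."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


# Illustration using the mean interview length reported above (35 min 3 s).
segments = split_into_thirds(35 * 60 + 3)
print([fmt(s) for s in segments])  # ['11:30', '12:03', '11:30']
```

Under this reading the three segments always sum to the full interview length, so no interview time is left uncoded.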
Dynamic Interview Methods
The coding scheme for the interview methods consisted of two parts. First, we coded for those questions and verbalizations in the OARS method detailed in the previous section. In addition, we wished to capture the use of question types and verbalizations that have been found to be less productive in an interviewing context (Griffiths & Milne, 2006). These included closed-ended questions, leading questions, forced-choice questions, and interruptions. At the end of each segment, both the OARS and “anti-OARS” variables were coded on the same 3-point scale (0 = none, 1 = moderate, 2 = major).
We explored the underlying structure of the question types and utterances in a categorical principal components analysis (CATPCA) using Promax rotation that reduced the eight variables to three components with eigenvalues greater than 1, explaining 52.9% of the variance between them: Unproductive Questioning, Appropriate Utterances, and Productive Questioning. The first, Unproductive Questioning, consisted of closed-ended questions (component loading = .73), forced-choice questions (.70), and leading questions (.66); the variables that loaded onto the Appropriate Utterances component were the reverse coding of interruptions (−.76) and reflective listening (.74); and the third component, Productive Questioning, consisted of affirmations (.69), open-ended questions (.62), and summaries (.59). These component scores are normally distributed (M = 0, SD = 1).
The second part of the interview methods coding scheme was based upon the interrogation taxonomy (Kelly et al., 2013) and a previous content analysis of police interviews (Kelly et al., 2016), coding five of the six meso-level domains (Context Manipulation was excluded because all interviews took place in the same room with an unchanging configuration of furniture). The domains are described above, but it is notable that, unlike criminal interrogations in which such techniques are explicitly prohibited, GSIU investigators are able to offer rewards as an exchange for cooperation (Meehan et al., 2019), such as phone calls out of the facility, extra commissary, or more favorable housing, which is in the Collaboration domain (see Kelly et al., 2015). The five domains were also coded on a 3-point scale of emphasis (0 = none, 1 = somewhat, 2 = major) at the end of each one-third segment.
Dynamic Interviewee Responses
In an attempt to capture a wide variety of possible interviewee responses during the interview, we coded for multiple measures at the end of each segment. Specifically, cooperation was operationalized as agreeing with the investigator and his accusations or offering affirmative responses to questioning; resistance, however, included disagreement and denials. These variables were coded at the end of each segment on similar 5-point scales with higher scores representing a greater degree of the concept. Next, engagement was a 5-point measure and conceptualized as (a) silence, (b) expressions of not wanting to be interviewed, (c) minimal and irrelevant responses, (d) minimal but relevant responses, and (e) extensive answers to questions. Finally, to account for information offered by interviewees, we developed a 3-point measure of forthcomingness that captured, to a limited degree, the level of detail elicited by investigators (0 = none, 1 = somewhat, 2 = very).
As expected, the four interviewee responses of cooperation, resistance, engagement, and forthcomingness were significantly correlated with one another and loaded onto a single component with an eigenvalue greater than 1 in a CATPCA with Promax rotation. Thus, a single standardized component explaining 71.7% of the variance between the variables was created and named CREF (i.e., cooperation, resistance [reverse score], engagement, and forthcomingness), and used as the primary dependent variable in this study. The loadings ranged from .94 to .63, and as a standardized score, it is normally distributed (M = 0, SD = 1).
Static Interview Factors
In the existing literature on interviewing and interrogation, only some of these static factors have been examined, but not comprehensively, and not in consideration with the dynamic-level variables. From the video recordings themselves, we were able to code 10 such variables that are properties of the interview: the number of investigators (continuous); which of the GSIU investigators was the primary interviewer; the nature of the interview (intelligence-gathering vs. incident investigation); whether the interviewee was in a gang (dichotomous); who initiated the interview (interviewee or investigator); interviewee race or ethnicity (African American, Caucasian, Latino, other); interviewee age; where the interviewee was from (California vs. elsewhere); whether the interviewee was shackled throughout the interview (dichotomous); and whether any of the investigators shook the interviewee’s hand (dichotomous). These last two variables were measured as a result of focus groups conducted with current and former GSIU investigators who expressed their perception that rapport-building can be affected by these factors (Meehan et al., 2019). To be clear, these variables are not properties of the segment, and it would be inappropriate to analyze them as such. These variables were coded by the same pair of coders who were responsible for the interviewee response coding.
Reliability
The coding schemes were pilot tested using four randomly selected interviews from the sampling frame that were not included in the final sample, and the research team modified the coding schemes as necessary in preparation for reliability testing. Following conventional standards for interrater reliability testing, from the final sample of 50 recordings, we randomly selected 20% for the analyses (n = 10). As stated above, the interview methods and interviewee response coding was performed by separate teams of two coders, but the same recordings and procedures were used for reliability testing for both interview methods and interviewee responses. For both the interview methods and interviewee response coding, we achieved high levels of reliability throughout the process: 88.4% agreement and Krippendorff’s alpha = .744 for interview methods and 88.2% agreement and Krippendorff’s alpha = .983 for interviewee responses. The few discrepancies that arose were resolved through discussion.
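For readers unfamiliar with the simpler of the two reliability statistics, the sketch below shows how percent agreement between two coders is computed. The codes are invented for illustration and are not the study's data. Krippendorff's alpha, which additionally corrects for chance agreement, is not computed here.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of coding decisions on which two coders gave the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes on the 0-2 emphasis scale for ten coding decisions.
coder_a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
coder_b = [0, 1, 2, 0, 0, 2, 1, 1, 0, 2]

print(percent_agreement(coder_a, coder_b))  # 90.0
```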
Data Structure and Analysis Plan
We coded the interview methods and interviewee responses at three points of every interview in the sample, namely at the conclusion of the beginning, middle, and end segments. Put differently, our primary unit of observation in this study is the one-third segment of the recordings, and with 50 interviews, we have 150 segments as the primary unit of analysis. By design, there is a hierarchical structure of the data whereby the segments (Level 1) are nested within the 50 interview recordings (Level 2). When organized in this fashion, we are able to model the dynamic factors at Level 1 while examining and controlling for the static interview-level factors at Level 2 (Raudenbush & Bryk, 2002). In the final model, all predictor variables were uncentered, and because there is neither a theoretical rationale nor previous research to justify otherwise, the slopes were fixed. Thus, this study is the first multilevel analysis that includes measures at more than one level of aggregation with a sufficient number of Level 2 units for adequate statistical power (Maas & Hox, 2005).
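The general form of a two-level random-intercept model with fixed slopes, as described above, can be written as follows. The notation is a conventional one chosen for illustration: the symbols, and the use of P dynamic and Q static predictors, are assumptions rather than the exact specification of the final model.

```latex
\begin{align}
% Level 1: segment t nested within interview i
\mathrm{CREF}_{ti} &= \beta_{0i} + \sum_{p=1}^{P} \beta_{p}\, X_{pti} + e_{ti},
  & e_{ti} &\sim N(0, \sigma^{2}) \\
% Level 2: the interview-level intercept depends on static factors
\beta_{0i} &= \gamma_{00} + \sum_{q=1}^{Q} \gamma_{0q}\, W_{qi} + u_{0i},
  & u_{0i} &\sim N(0, \tau_{00})
\end{align}
```

Here X denotes the dynamic (segment-level) interview methods and W the static (interview-level) factors. Because the slopes are fixed, each Level 1 coefficient is constant across interviews, and only the intercept varies between interviews.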
Results
Descriptive Statistics and Bivariate Associations
As shown in Table 1, the interviewees in the sample were, on average, highly engaged, cooperative, and forthcoming, and the measured level of resistance was very low. Table 2 shows the rates of the various interview methods that GSIU investigators employed in this sample, including the specific measures of question types and utterances that loaded on the three components of Productive Questioning, Unproductive Questioning, and Appropriate Utterances. The most common question types and utterances employed were closed-ended questions, followed by leading questions, open-ended questions, and reflective listening. Also in Table 2 are the Pearson correlation coefficients between the dynamic methods and interviewee CREF. Few of the methods were associated with increases in CREF; only Productive Questioning, reflective listening techniques, and Appropriate Utterances reached conventional significance levels. We note here that although higher rates of interruptions were significantly related to decreased CREF, this variable is reverse-coded in Appropriate Utterances. Conversely, increasing emphasis on three domains (Emotion Provocation, Confrontation/Competition, and Presentation of Evidence) was significantly associated with decreased CREF.
Descriptive Statistics and Intercorrelations (ρ) of Interviewee Response Measures
Descriptive Statistics of Dynamic Interview Methods and Bivariate Relationships (r) With Cooperation, Resistance, Engagement, and Forthcomingness (CREF)
It is also important to examine the descriptive statistics of a number of static variables among our sample of 50 GSIU interview recordings. The descriptive statistics of the 10 variables we coded, including their association with the group means of CREF, are included in Table 3. The majority of interviews included two or more investigators in the room at once, and the variable labeled with the pseudonym “Officer Armstrong” is a dichotomized version of which GSIU investigator was the primary interviewer. We recoded the variable in this way because his interviews were disproportionately represented in the sample (48%). Independent sample t tests indicate that higher levels of CREF were associated with a handshake (M = 0.27, SD = 0.71) versus no handshake (M = −0.32, SD = 1.03); t = 2.38, df = 48, p = .022, d = 0.68, 95% confidence interval (CI) = [0.11, 1.24], and in incident investigation interviews (M = 0.31, SD = 0.65) as opposed to intelligence-gathering ones (M = −0.31, SD = 1.04); t = 2.51, df = 48, p = .016, d = 0.72, 95% CI = [0.14, 1.29]. Lower levels of CREF occurred when the investigator (M = −0.15, SD = 0.98) as opposed to the interviewee (M = 0.56, SD = 0.08) initiated the interview; t = 2.42, df = 48, p = .019, d = −0.81, 95% CI = [−1.50, −0.13].
Descriptive Statistics of Static Interview Factors and Bivariate Relationships With Cooperation, Resistance, Engagement, and Forthcomingness (CREF)
Note. ANOVA = analysis of variance.
Independent t tests used for all dichotomous interview-level measures, correlation statistics (Pearson’s r) used for the number of investigators and inmate age, and one-way ANOVA used for inmate race/ethnicity. The appropriate coefficient and its p value are reported in the CREF column.
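The independent-samples t tests and effect sizes reported above can be approximated from the summary statistics alone. The sketch below uses the standard pooled-variance formulas. The group sizes are not reported in this section, so the equal split of 25 and 25 is purely an assumption for illustration, and the output only approximates the reported t = 2.38 and d = 0.68.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def t_statistic(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance (Student) t statistic from summary statistics."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error


# Handshake vs. no handshake, using the means and SDs reported in the text.
# ASSUMPTION: n = 25 per group is hypothetical; only df = 48 (n1 + n2 = 50) is reported.
n_handshake, n_no_handshake = 25, 25
d = cohens_d(0.27, 0.71, n_handshake, -0.32, 1.03, n_no_handshake)
t = t_statistic(0.27, 0.71, n_handshake, -0.32, 1.03, n_no_handshake)
df = n_handshake + n_no_handshake - 2

print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because the pooled standard deviation weights each group by its size, a different split of the 50 interviews would shift both statistics slightly.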
Predicting Interviewee Cooperation, Resistance, Engagement, and Forthcomingness
A precondition of conducting multilevel analyses is that significant variation on the CREF outcome must exist across the 50 interviews at Level 2 (Woltman et al., 2012). As such, a one-way analysis of variance model that included no predictors at either the segment or the interview levels was performed to determine whether, in fact, any of the variance on interviewee CREF can be attributed to between-interview factors. The “null” or “empty” model indicates that significant variation on CREF existed across the interviews, χ²(49) = 450.04, p < .001, therefore justifying further exploration of interviewing as a nested phenomenon. In addition, the variance components of this model can be used to calculate the intraclass correlation (ICC), an indicator of how much of the variance on the outcome (again, measured at Level 1) is attributable to the interviews (at Level 2). In this data set, 73.2% of the variance in CREF was between interviews, and 26.8% was within interviews.
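The ICC described above is the between-interview variance divided by the total variance from the null model. The sketch below shows that arithmetic. The variance components themselves are not reported in this section, so the values of `tau00` and `sigma2` are hypothetical, chosen only so that the result matches the reported 73.2% and 26.8% split.

```python
def intraclass_correlation(tau00, sigma2):
    """Share of outcome variance that lies between Level 2 units.

    tau00  -- between-interview (Level 2) variance of the intercept
    sigma2 -- within-interview (Level 1) residual variance
    """
    if tau00 < 0 or sigma2 < 0 or tau00 + sigma2 == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return tau00 / (tau00 + sigma2)


# ASSUMPTION: hypothetical variance components, not values taken from the study.
tau00 = 0.732
sigma2 = 0.268

icc = intraclass_correlation(tau00, sigma2)
print(f"{icc:.1%} of the variance is between interviews, {1 - icc:.1%} is within interviews")
```

The same function applies to the conditional model: substituting its residual variance components gives the share of unexplained variance that still lies between interviews.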
As shown in Table 4, the Confrontation/Competition and Presentation of Evidence domains were strong predictors of reduced CREF, and the Productive Questioning OARS component was the only significant predictor of increased interviewee CREF. At Level 2, we entered those static interview factors that were related to CREF in the bivariate analyses—whether the investigator and interviewee shook hands, the nature of the investigation (i.e., intelligence vs. incident investigation), and who initiated the interview. In addition, we controlled for any influence that “Officer Armstrong” had in the study. As seen in Table 3, Officer Armstrong was the primary investigator on nearly half of the interviews in our sample, and although there was no bivariate relationship between the primary investigator and CREF, to control for possible variation in investigator styles across interviews, we included this at Level 2 also. The only static interview factor related to CREF, controlling for all other variables in the model, was who initiated the interview. In this case, we found higher rates of CREF when the interviewee himself requested the interview. Although significant variation on CREF existed after entering all variables in the model, χ²(49) = 203.18, p < .001, the variance that remained between interviews was reduced to 55.6%. Finally, we note that multicollinearity was not a problem in this model, as all variance inflation factors (VIFs) were <2.0 and tolerance levels ranged from .73 to .95.
Full Multilevel Model With Level 2 Variables Predicting CREF
CREF = cooperation, resistance, engagement, and forthcomingness; CI = confidence interval.
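The multicollinearity diagnostics mentioned above are two views of the same quantity: tolerance is 1 − R² from regressing one predictor on all the others, and the VIF is its reciprocal. The sketch below converts the reported tolerance range into VIFs. The function names are illustrative, and the per-predictor R² values are not reported in the text.

```python
def tolerance_from_r_squared(r_squared):
    """Tolerance of a predictor, given R^2 from regressing it on the other predictors."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 - r_squared


def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1.0 / tolerance


# Endpoints of the tolerance range reported in the text (.73 to .95).
reported_tolerance_range = (0.73, 0.95)
vifs = [vif_from_tolerance(tol) for tol in reported_tolerance_range]

for tol, vif in zip(reported_tolerance_range, vifs):
    print(f"tolerance = {tol:.2f} -> VIF = {vif:.2f}")
```

Both endpoints convert to VIFs well under the 2.0 threshold cited in the text.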
Discussion
This study brought together two previously disparate spheres of the literature in an attempt to advance the field in meaningful ways. In doing so, it also introduced corrections-based interviews to the literature, expanding the field to include a new setting for investigative interviewing researchers to explore. Previous field research has tended to focus on a limited number of factors thought to be related to the outcome of an investigative interview. (By their nature, laboratory experiments isolate the effects of just one or two conditions.) Be they techniques (Kelly et al., 2016; King & Snook, 2009; Leo, 1996), question type (Griffiths & Milne, 2006), or some combination of the two (Walsh & Bull, 2015), interviewer personality styles (Alison et al., 2013) or disposition (Holmberg & Christianson, 2002), or factors related to the suspect, crime, or interview itself (e.g., Cleary, 2014), there has been little crossover between these threads of the literature (see O’Mahoney et al., 2012). This study, simultaneously examining dynamic (or changeable) interviewing methods and static (or fixed) interview factors, represents a step in the direction of ever more explanatory models of investigative interviewing.
Several stable findings emerged that represent replications of prior work (Kelly et al., 2015, 2016). Notably, the Confrontation/Competition domain was used at relatively low rates but nevertheless had a negative effect on interviewee responses; that the effect emerged despite this low base rate of use is telling. Similarly, the Presentation of Evidence domain reduced interviewee CREF. Techniques from this domain included showing or referring to information from sources such as criminal records and institutional histories, other interviewees’ or corrections officers’ statements, or surveillance footage, and it was significantly correlated with the Confrontation/Competition domain (results not shown).
At the interview level, several factors were significantly related to CREF in bivariate analyses, though only one remained so in the multilevel model. Although it was a commonsense finding that CREF would increase when the interviewee initiated the interview, it is nevertheless instructive to analyze these interview characteristics individually and to control for them in multivariate models. We are emphatically not arguing that the static interview factors do not make a difference in the interaction, and future research should consider these and other variables in their models (discussed in the next section).
Implications for Research and Practice
There is an interrelated implication of this study for both the practice of investigative interviewing and the research on it. Namely, field research on national security intelligence-gathering is difficult to conduct because of its secret nature, but it is critical to understand and improve practices in the name of security (Brandon et al., 2019). Existing research conducted with police investigators has attempted to draw lessons applicable to classified interrogations, but those lessons are opaque at best and largely unknowable. Studying corrections-based gang intelligence interviews opens a window to understanding how to gather information from individuals who may be subject to ongoing detention, who may be part of a larger criminal organization, and who may adhere to principles of group allegiance that would otherwise make it difficult for investigators to gather information from them. The results of this study, then, have applications beyond the prison walls, from criminal intelligence-gathering on the street to national security interrogations.
Despite some notable differences between criminal interrogations and those interviews conducted for military or human intelligence (HUMINT) purposes (Redlich, 2007), the results of this study support existing models of science-based investigative interviewing that are largely tested in laboratories and promoted among criminal investigators (e.g., Meissner et al., 2015). These models eschew accusatorial methods, put rapport-building at the forefront to produce interviewee cooperation, and encourage questioning strategies that favor open-ended questions to elicit information. As shown in the results, the Rapport and Relationship-Building domain was employed more than the other four domains, and the Productive Questioning component resulted in significantly improved interviewee responses. Thus, although this sample of corrections-based investigative interviews is unique in the literature, and the nature and stakes of the interviews are distinct from criminal interrogations, this study is evidence in support of science-based interviewing across different settings.
Furthermore, that the Rapport and Relationship-Building domain was not significantly related to the outcome measure is likely a function of high rates of interviewee cooperation, engagement, and forthcomingness, indicating a possible ceiling effect in this study. This is also a limitation of this study, but it is nevertheless a worthwhile exercise to examine whether and how the results of the model could change when controlling for greater between-interview variation and several case-level factors. Future research, then, ought to strive for greater diversity in interviewee responses.
For practitioners themselves, this study can help them better understand the vast array of factors that influence an interview, including distinguishing between those they have control over and those they do not. Whereas the dynamic interview domains and questions investigators ask are arguably under their control—though perhaps these sometimes operate at a subconscious level—the static interview factors tend to be beyond the investigators’ ability to change. (Possible exceptions to this may be if the interviewee is shackled during the interview and whether an investigator shakes the interviewee’s hand, though each of these could be regulated by organizational policy, including in immediate post-COVID-19 restrictions, thus impairing the ability to choose.) The implications are twofold: First, investigators can be trained to employ more effective dynamic interview methods, an obvious and ongoing practice happening in increasingly many places around the world (see Walsh et al., 2016).
The second and more important but less obvious implication is that investigators ought to be aware of possible influences on the interview that are beyond their control. For instance, based on our multilevel model, where interviewees initiated the interview, investigators can expect the interviewee to be more cooperative, engaged, and forthcoming. Although rapport-building should always be a concern of investigators, during such interviews, investigators may not have to spend as much time developing a relationship and should be prepared to ask good, productive, open-ended questions to elicit as much information as possible from an interviewee who is probably ready to talk. For both practice and research, then, the next advancement in the field may be understanding how dynamic interview methods vary under static conditions.
Limitations
There are a number of limitations that need to be acknowledged so that readers may have a proper context in which to understand the findings, and future research may benefit from the lessons we encountered while conducting this study. First, we cannot make any claim to representativeness. Although correctional investigations units exist at all levels of government across the country (Garzarelli, 2004), there may be something exceptional about the one studied here. Relatedly, as this is the first-of-its-kind study of investigative interviews with incarcerated individuals and correctional investigators, it is difficult to state with great confidence that the interviewees and investigators are representative of similarly situated actors in other facilities. Moreover, how the interviews were originally selected is unknowable, and as indicated above, the interviewees scored relatively high on the CREF outcome measure, indicating a possibly unrepresentative sample even within GSIU operations. As such, the generalizability of setting and sample is limited, even if the interview methods, factors, and interviewee responses are applicable to other jurisdictions, comparable to other evaluations, and useful to future studies.
The next limitation pertains to the dynamic and static factors analyzed in this study. Despite the variety of variables being among the broadest examined in a single field study in the literature, we do not claim that these measures are exhaustive. For instance, a number of earlier studies found positive effects for interviewer disposition, such as empathy (Oxburgh & Ost, 2011) or humaneness (Holmberg & Christianson, 2002). This line of inquiry was most thoroughly developed by Alison et al. (2013) in their ORBIT model that examined interpersonal characteristics of both interviewer and suspect. Future studies may wish to consider all three clusters simultaneously (Kelly & Valencia, 2020).
At the interview level, we note the absence of any examination of gender in this study, or any characteristics of the investigators at all, and this reflects an important gap in the literature more generally. Gender was identified by the investigators as an important variable they consider during interviews, and they believed that different outcomes would be achieved based on the gender of both the interviewer and the interviewee (Meehan et al., 2019). Because there were so few women in the sample of interviews in our possession, however, we opted to exclude them from selection into the final sample and call for future research on this potentially critical interview factor. Indeed, preliminary analyses of these 15 interviews suggest that women may use different interviewing approaches (Biscoe et al., 2017). Additional case factors or characteristics that were not measured here but could be in future studies include investigator perceptions of evidence strength and truthfulness of the interviewee, and how much, if any, preparation was performed prior to the interview. Moreover, future studies should seek to investigate interactions between demographic variables of both interviewers and interviewees not just in correctional settings but in all examinations of investigative interviewing (for analogous studies on race, age, and gender in survey and interviewing research methodologies, see Adida et al., 2016; Brunton-Smith et al., 2016; Haunberger, 2010).
Finally, as in all field studies of interviewing and interrogation, it is nearly impossible to ascertain the ground truth regarding what the interviewees reportedly said or the sincerity of their engagement or cooperation with investigators. Future research should attempt to triangulate the ground truth via prospective sampling methods and contemporaneous accounts from the investigators (and, ideally, the interviewees) regarding their perceptions of the interview. Although we were unable to do this in this study, the lack of ground truth is partially accounted for by expected findings regarding which methods increased or decreased CREF.
Conclusion
As we did in the introduction above, we shall conclude with a final reflection on the title of the article. Normal conversations do not typically involve one party attempting to manipulate the other; normal conversations do not typically exploit power differentials even where they exist; and normal conversations do not typically include threats, accusations, or deceit. However, normal conversations do typically engender mutual respect between the parties; normal conversations do typically aim to satisfy both parties’ needs; and normal conversations do typically conclude amicably. In other words, the most successful interviews are normal conversations, and it takes a very skilled investigator to make something as complex as an interview appear normal or routine. This study brought to bear a wide range of dynamic interviewing methods and static interview factors to more fully explain the complexities of successful investigative interviewing.
Footnotes
Authors’ Note:
This work was funded by the High-Value Detainee Interrogation Group (HIG) contract awarded to Nathan Meehan and the Naval Research Laboratory. Statements of fact, opinion, and analysis in the paper are those of the authors and do not reflect the official policy or position of the Federal Bureau of Investigation or the U.S. Government. Earlier versions of this research were presented at the 2018 American Psychology-Law Society conference in Memphis, TN, and the 2019 International Investigative Interviewing Research Group conference in Stavern, Norway. The authors wish to thank the HIG and the FBI for funding this research, as well as the following individuals: Drs. Susan Brandon and Jamie Fader; Sergeant Jere Ebneter, Captain Richard Forbus, Deputy Chief Richard Suey, and the officers of the Gang Special Investigations Unit, LVMPD; and research assistants Alicia Biscoe, Ashley Blair, Erin Myers, Ashley Varghese, and Skye Woestehoff.
