Abstract
The present research builds on previous models of jury diversity’s benefits by exploring how diversity affects the deliberation process. In Study 1, community members (N = 433) participated in a jury decision-making study manipulating the strength of evidence (ambiguous vs. weak) and the diversity of the jury. When the evidence in the case was ambiguous, both white and black jurors made higher-quality contributions to discussion in diverse juries than in nondiverse juries. In Study 2, undergraduate students (N = 369) were randomly assigned to wealth and power conditions and then deliberated in diverse and nondiverse groups. Diverse juries were less likely to convict the defendant, and jurors on diverse juries made higher-quality contributions to discussion. Although previous work has documented effects of diversity on high-status jurors’ contributions to deliberations, this work suggests that diversity may relate to more complex evidence evaluation for members of low-status groups as well.
In the United States, the Supreme Court has expressed concern about the lack of diversity on juries (e.g., Peters v. Kiff, 1972), highlighting the role that diverse juries play in protecting defendants’ Sixth Amendment right to a fair trial. Beyond the Court’s concern about ensuring representation on juries, juries that exclude members of underrepresented groups can undermine the public’s faith in the jury system in general (Ellis & Diamond, 2003; MacCoun & Tyler, 1988). Although the Court in Peters v. Kiff (1972) recognized the many ways that diversity might influence the jury decision-making process, jury diversity studies have focused on the effects of racial and ethnic diversity and its tendency to reduce racial bias (Bowers et al., 2001; Lynch & Haney, 2011). Increasing the racial and ethnic diversity of the jury does reduce racial bias in jury decisions (Bowers et al., 2001), but racially diverse juries also process evidence more thoroughly and systematically than do nondiverse juries (Sommers, 2006), and they are associated with more cognitive depletion and higher self-monitoring (Peter-Hagene, 2019; Stevenson et al., 2017).
Much of the research examining how lack of diversity influences jury outcomes stems from the concern that modern juries are not representative of their communities. In the United States, juries are generally composed of approximately 12 jurors, and attorneys are permitted to remove jurors who they believe are not sympathetic to their argument through a limited number of peremptory challenges. Attorneys need not provide any justification for exercising a peremptory challenge, which may allow racial bias to influence their decisions. To address the systematic exclusion of people of color from jury service, the Court in Batson v. Kentucky (1986) ruled that attorneys may not use peremptory challenges to remove jurors based on race. Under Batson, if defense attorneys believe the opposing side is removing a potential juror because of racial or ethnic group membership, opposing counsel must provide a race-neutral justification for the challenge, and the judge then decides whether to accept that justification. Empirical research demonstrates that people can easily generate race-neutral justifications when asked to explain their decisions in a mock jury selection task (Sommers & Norton, 2007). As a result, despite the protections afforded by Batson, archival studies indicate that racial bias remains prevalent in the exercise of peremptory challenges (Baldus et al., 2001; Clark et al., 2007; Equal Justice Initiative, 2010; Rose, 1999; Semmel, 2020), and Batson challenges are rarely successful (Melilli, 1996). When the ineffectiveness of Batson is combined with methods of juror recruitment that tend to underrepresent people of color (Hannaford-Agor & Waters, 2011), the result is that juries tend to underrepresent racial and ethnic minority groups.
The present research expands upon previous research by exploring whether deliberating in diverse juries affects the decision-making of all jurors (i.e., both those who belong to groups with privilege and those who do not) and whether these effects extend to forms of diversity other than racial diversity. Some of the mechanisms through which diversity benefits discussion should benefit all jurors (e.g., a broader range of perspectives and ideas; Nemeth, 1995), but concerns about appearing prejudiced should not improve evidence processing for members of racial and ethnic minority groups. Jurors who are members of underrepresented or low-power groups, however, may be more concerned about being the target of prejudice during a diverse jury deliberation (Doerr et al., 2011) or about confirming stereotypes about their group (Steele & Aronson, 1995). In Study 1, we examined the effects of jury diversity on both white and black mock jurors to test whether evidence evaluation improves in both groups; in Study 2, we manipulated wealth and power and varied jury diversity along that dimension. In addition, we used an established scale of juror reasoning as our main measure of deliberation quality (Kuhn et al., 1994).
Study 1
Jury Diversity
Diverse juries differ substantially from homogeneous juries. Mock juries with a lower proportion of white men are less conviction prone and less likely to impose the death penalty than juries with a higher proportion of white men (Lynch & Haney, 2011). Racially diverse juries are also less likely to exhibit racial bias in their sentencing decisions (Bowers et al., 2001). Beyond trial outcomes, researchers have examined how racial and ethnic diversity influences the decision-making process in mock jury studies (Peter-Hagene, 2019; Sommers, 2006). In Sommers (2006), diverse mock juries deliberated longer, discussed more case facts, offered more corrections, and engaged in more discussion about the potential for racism to affect case outcomes than did nondiverse mock juries. Sommers (2006) suggested that jurors deliberating in diverse groups might benefit from exposure to a wider range of perspectives and points of view, a mechanism of improved group performance known as information exchange (Jehn et al., 1999; Nemeth, 1986). More specifically, Sommers (2006) concluded that the more complex information processing in diverse juries was driven by white jurors’ motivation to avoid appearing prejudiced in the presence of black jurors (Fleming et al., 2005).
Building on the framework established by Sommers (2006), researchers have explored the mechanisms through which diversity affects jurors’ deliberation process. Both Stevenson et al. (2017) and Peter-Hagene (2019) have examined how diversity affects mock jurors’ information processing and reasoning. Stevenson et al. (2017) coded each mock juror’s contributions to deliberations and found that, in interracial juries compared with nondiverse juries, white jurors used fewer words overall, more first-person singular pronouns, and fewer social words, and took more time crafting their responses, signaling higher levels of self-monitoring. Peter-Hagene (2019) approached the question from a related perspective, predicting that deliberating in diverse (rather than nondiverse) juries would produce greater cognitive depletion among white mock jurors, as measured with a Stroop test. As predicted, white mock jurors experienced more cognitive depletion when deliberating in a diverse jury than in a nondiverse jury. Diversity also improved information processing and reduced bias: whereas nondiverse juries considered more case facts in cases with a white defendant than in cases with a black defendant, this discrepancy was eliminated in diverse juries (Peter-Hagene, 2019).
Present Research
We explored the courts’ assumption that diverse juries evaluate evidence better than nondiverse juries do. In this study, we examined the differential benefits of diversity for black and white mock jurors by comparing diverse (i.e., racially mixed) juries with juries composed only of white jurors and juries composed only of black jurors. Prior research has used a range of deliberation quality measures to explore how diversity influences the quality of jury deliberations (e.g., length of deliberations, number of case facts discussed, time spent crafting responses). In this study, we used a juror reasoning scale (Kuhn et al., 1994) that captures not only the quantity of juror contributions but also their higher-order complexity, evaluative direction, and role in the argument, to provide a more nuanced picture of how jurors process information. We varied the strength of the evidence in the case to determine whether diverse juries would be more sensitive to case strength than nondiverse juries. We also included a measure of trust in the system to examine whether juror race would influence how jurors deliberated about the evidence. Finally, we included anticipated interaction measures to assess how jurors felt about the upcoming interaction, as well as ratings of the various trial participants.
We predicted that diversity would improve the information processing of all jurors in our study. Black jurors are likely not concerned about appearing prejudiced but may be concerned about confirming stereotypes about their cognitive abilities (Steele & Aronson, 1995) and about being the target of prejudice (Doerr et al., 2011). Therefore, we predicted that, just as the anxiety and stress surrounding interracial interactions appear to benefit white jurors’ evidence processing, black jurors would also deliberate more thoroughly in the presence of white jurors. Specifically, we expected black mock jurors to display higher levels of reasoning when deliberating in diverse juries than when deliberating in nondiverse juries. We also predicted that jurors in diverse juries would be better able than those in nondiverse juries to differentiate between ambiguous and weak cases.
Method
Trial Stimulus and Pilot Data
Mock jurors viewed a mock trial video lasting 28 to 31 minutes (28 minutes in the ambiguous case and 31 in the weak case) in which a black defendant was on trial for first-degree murder. The trial was based on Commonwealth of Massachusetts v. Jones (1975), a case used in prior research to examine jury dynamics (Hastie et al., 1983). In this case, a bar fight between two friends and neighbors ended with one of them dead. The state pursued a first-degree murder charge based largely on the facts that the two men had a verbal altercation and that the man who committed the crime had a weapon when he returned to the bar; it was not clear whether he had the weapon during their initial altercation. We developed a script based on a condensed version of the case facts, reducing the number of witnesses to three for each side. The witnesses for the prosecution were a detective, a medical examiner, and an eyewitness (the bartender on the night the crime took place). The witnesses for the defense were the defendant, the defendant’s friend, and a witness (a waitress at the bar where the fight took place).
The mock trial was filmed in a mock courtroom with volunteer actors; both attorneys were played by law students, and the judge was played by a faculty member. All versions of the trial video contained pattern jury instructions (including reasonable doubt instructions), opening and closing statements from the prosecution and defense attorneys, and direct and cross-examination of the detective, medical examiner, bartender, waitress, defendant’s friend, and defendant. Evidence strength was manipulated by adding, in the ambiguous case, a motive for the murder, a more confident judgment from the medical examiner, and a clearer view for the prosecution’s eyewitness. All other case facts were held constant across the two evidence strength conditions.
A pilot study confirmed that the evidence strength manipulation was successful. A total of 38 black and white community members from the New York City area, recruited through craigslist.org, completed the pilot study. All participants completed a demographic screening questionnaire before coming in for the study. We extensively pilot tested the trial videos to ensure sufficient variation in the percentage of guilty verdicts between the ambiguous and weak cases. The final versions of the trial produced different conviction rates: 57% in the ambiguous case and 10% in the weak case. A chi-square analysis revealed a significant difference in verdicts between the ambiguous and weak conditions, χ2(1, N = 38) = 10.97, p < .01.
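For readers who want to see how a test of this kind works, the sketch below computes Pearson’s chi-square for a 2 × 2 table of verdict (guilty vs. not guilty) by condition, without a continuity correction, and converts it to an upper-tail p value. Because a chi-square variable with one degree of freedom is the square of a standard normal variable, that p value equals the complementary error function of the square root of half the statistic. This is a generic illustration, not the authors’ analysis code; the pilot study reports only percentages, so the cell counts in the usage line are hypothetical.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (df = 1), no continuity correction, for the table
               guilty  not guilty
    cond. 1:     a         b
    cond. 2:     c         d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be nonzero")
    return numerator / denominator


def chi_square_p_df1(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts (not the pilot data): 10 of 20 guilty vs. 5 of 20 guilty.
stat = pearson_chi_square_2x2(10, 10, 5, 15)
print(round(stat, 2))  # 2.67
print(round(chi_square_p_df1(stat), 3))
```

Applying `chi_square_p_df1` to the reported statistic of 10.97 gives a p value of roughly .001, well below the reported .01 threshold.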
Design
The study used a 3 (jury diversity: nondiverse white vs. nondiverse black vs. diverse) × 2 (evidence strength: ambiguous vs. weak) between-subjects factorial design. Participants were randomly assigned to conditions for which they were eligible but could not be randomly assigned to all six conditions because of their race (e.g., a white participant could not be assigned to a nondiverse black jury condition).
Participants
Four hundred thirty-three jury-eligible black and white community members responded to ads placed on craigslist.org. Participants were paid $25, and those who arrived on time for the experimental session were entered into a raffle to win an additional $25. Participants completed a screening questionnaire in the form of a Qualtrics survey to ensure that they were jury-eligible (i.e., 18 years of age or older and a U.S. citizen). Collecting participants’ demographic information in advance also allowed us to assign them to the appropriate conditions based on their racial group membership. Overall, participants were about evenly split by gender (51% men) and race (51% black, 49% white). Their mean age was 39.59 years (SD = 14.83).
Procedure
Potential participants responded to an advertisement on Craigslist that provided a link to a Qualtrics survey. The survey featured demographic questions that allowed researchers to identify eligible participants. Interested jury-eligible black and white community members were emailed possible times to complete the study. Sessions were scheduled with 9 to 15 participants so that each session could contain one or two juries even if some participants failed to show up. Juries contained 4 to 7 jurors; when necessary, jurors were dismissed so that no jury was larger than seven or smaller than four. The majority (73%) of juries contained six or more mock jurors. Dismissed jurors completed a brief “postdeliberation” questionnaire and were paid and debriefed, while the remaining participants began deliberations. Jurors were dismissed at random, within the constraint of keeping at least two jurors of each race in diverse juries.
Participants then entered the mock courtroom in groups of 4 to 15, with diversity condition assigned by session to avoid arousing participants’ suspicion. Participants completed consent forms, which included a request for permission to videotape them during the deliberation phase of the study. Participants were informed verbally that their videos would not be made public. Participants then completed demographic questionnaires, watched a trial video, and completed their posttrial verdict questionnaires. Afterward, they were instructed not to discuss the case with other participants and offered a short break. When 10 or more participants were present, the researchers then divided them into two juries. For each jury, the experimenter reminded participants that deliberations would be videotaped and instructed them to choose a foreperson, who would fetch the researcher once the jury reached a unanimous verdict. Researchers then started the video recording and left the room, waiting outside for the duration of the deliberations. Participants were not informed of any time limit, but researchers stopped deliberations that continued for longer than 45 minutes and asked jurors to vote; these juries were considered hung juries. After deliberations, the mock jurors completed the postdeliberation questionnaires. Participants were then paid, debriefed, and thanked for their participation. The Institutional Review Board of the City University of New York approved the research protocol for this study.
Materials
Voir Dire Questionnaire
Prior to viewing the trial video, participants completed a short demographic questionnaire. Participants provided their gender, age, citizenship, marital status, level of education, occupation, voter registration, political views, ethnicity, and jury duty history.
Manipulations
Jury diversity
Diverse juries contained at least two black and two white mock jurors and, on average, were 50% white and 50% black. Nondiverse juries consisted of only black mock jurors (in the black nondiverse condition) or only white mock jurors (in the white nondiverse condition). Participants were assigned to diversity conditions by session; however, they were not assigned to their jury group until the deliberation phase of the study.
Strength of evidence
The evidence strength manipulation is described above in the trial stimulus section; the trial contained either ambiguous or weak evidence for conviction.
Dependent Variables
Juror verdicts
Participants provided individual dichotomous verdicts (guilty or not guilty) after watching the trial videos but before the deliberations.
Jury verdicts
Each jury was instructed to come to a unanimous verdict and complete a form with a dichotomous (guilty/not guilty) option. If a jury did not reach a unanimous verdict within 45 minutes, the experimenter halted deliberations and asked jurors to vote; such juries were considered hung juries. The researchers recorded the outcome of the majority vote to determine which way each hung jury was leaning.
Postdeliberation verdicts and measures
After deliberating with their juries, participants again provided individual postdeliberation verdicts and continuous verdict measures. These items were identical to the predeliberation dichotomous verdicts and continuous guilt measures.
Additional dependent measures
The predeliberation and postdeliberation questionnaires contained additional dependent measures which are fully described in Appendix A.
Study 1 Results
All data are available on our Open Science site: https://osf.io/k849y/.
Descriptive Statistics
Of the original 433 participants, 20 were removed before deliberations, leaving 413 participants. We excluded four juries from analysis because they did not meet the requirements for the diverse condition, leaving 388 participants who deliberated in 64 juries.
Juror Verdicts
Descriptive statistics for individual verdicts are presented in Table 1. Before deliberations, 26% of participants reached guilty verdicts; after deliberations, this figure dropped to 9%. A single-level binary logistic regression revealed a significant main effect of case strength on predeliberation verdicts, Wald χ2(1, N = 388) = 34.27, p < .01, OR = .21, 95% CI = [.12, .35]. After deliberating in a jury, participants’ individual verdicts were no longer independent, so we used the glmer function in the R package lme4 to perform a multilevel logistic regression on jurors’ postdeliberation verdicts. First, we fit a null model with jury as a random intercept to determine how much variance was due to group membership; the intraclass correlation (ICC) was .52, indicating that 52% of the variance in verdicts could be explained by jury membership. The mean reliability within jury groups was 87%.
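The two variance summaries in this paragraph can be illustrated with standard formulas, although the text does not state which formulas were used, so the following is an assumption. For a random-intercept logistic model, one common latent-variable ICC divides the jury-level intercept variance τ² by τ² + π²/3, where π²/3 (about 3.29) is the variance of the standard logistic distribution. The Spearman-Brown formula then gives the reliability of a group mean from the ICC and group size k: k·ICC / (1 + (k − 1)·ICC). With the reported ICC of .52 and the average jury size in the analytic sample (388/64 ≈ 6.06 jurors), this formula gives approximately .87, in line with the reported 87%. The sketch below is a hypothetical illustration, not the authors’ code.

```python
import math

# Variance of the standard logistic distribution (about 3.29), used as the
# level-1 variance on the latent scale of a logistic model.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(tau_squared: float) -> float:
    """Latent-scale ICC for a random-intercept logistic model."""
    return tau_squared / (tau_squared + LOGISTIC_RESIDUAL_VARIANCE)


def tau_squared_for_icc(icc: float) -> float:
    """Intercept variance that would produce a given latent ICC (inverse of latent_icc)."""
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1 - icc)


def group_mean_reliability(icc: float, group_size: float) -> float:
    """Spearman-Brown reliability of a group mean: k*ICC / (1 + (k - 1)*ICC)."""
    return group_size * icc / (1 + (group_size - 1) * icc)


mean_jury_size = 388 / 64  # jurors per jury in the Study 1 analytic sample
print(round(tau_squared_for_icc(0.52), 2))                     # 3.56
print(round(group_mean_reliability(0.52, mean_jury_size), 2))  # 0.87
```

Under this assumed formula, an ICC of .52 corresponds to a jury-level intercept variance of about 3.56 on the log-odds scale.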
Table 1. Study 1 Guilty Verdict Descriptive Statistics (Predeliberation and Postdeliberation).
Note. Descriptive statistics for guilty verdicts only.
We conducted a generalized linear mixed model (binary logistic) on postdeliberation verdicts, presented in full in Table 2, with participants nested within juries. In Model 1, the dependent measure was postdeliberation verdicts, and we entered the fixed effects of diversity, participant race, and case strength; none of the fixed effects was significant. In Model 2, we added the diversity × case strength interaction, which was not significant.
Table 2. Study 1 Generalized Linear Mixed Model on Individual Dichotomous Verdict Outcomes (N = 388).
Note. SOC = strength of case.
Jury Verdicts
Of the 64 juries, 56 (88%) voted to acquit the defendant, whereas only five (8%) voted to convict. Three juries (5%) remained hung after the 45 minutes allotted for deliberation. When each hung jury was classified by its majority vote, five juries (8%) favored conviction and 59 (92%) favored acquittal. Descriptive statistics are presented in full in Table 3.
Table 3. Study 1 Jury-Level Guilty Verdict Descriptive Statistics.
Note. Descriptive statistics for guilty verdicts only.
We performed a binary logistic regression on jury verdicts, classifying hung juries by their majority vote. In the first step, we entered the case strength predictor and the two dummy-coded diversity conditions. In the second step, we added the interactions between case strength and each of the dummy-coded diversity conditions. None of the main effects or interactions was significant at the jury level.
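The two-step jury-level model can be made concrete by writing out one jury’s predictor row. The text does not report the reference category or the coding of case strength, so the sketch below assumes, purely for illustration, that the diverse jury is the reference category and that case strength is coded 0 = ambiguous, 1 = weak; the function and column names are hypothetical.

```python
from typing import Dict

DIVERSITY_LEVELS = ("diverse", "nondiverse_white", "nondiverse_black")
CASE_STRENGTHS = ("ambiguous", "weak")


def jury_predictors(diversity: str, case_strength: str, step: int = 2) -> Dict[str, int]:
    """Build one jury's predictor row for the step-1 or step-2 logistic model.

    Hypothetical coding: 'diverse' is the reference category (both dummies 0),
    and case strength is coded 0 = ambiguous, 1 = weak.
    """
    if diversity not in DIVERSITY_LEVELS:
        raise ValueError(f"unknown diversity condition: {diversity}")
    if case_strength not in CASE_STRENGTHS:
        raise ValueError(f"unknown case strength: {case_strength}")
    if step not in (1, 2):
        raise ValueError("step must be 1 or 2")
    row = {
        "weak_case": int(case_strength == "weak"),
        # Two dummy codes represent the three diversity conditions.
        "nondiverse_white": int(diversity == "nondiverse_white"),
        "nondiverse_black": int(diversity == "nondiverse_black"),
    }
    if step == 2:
        # Step 2 adds the case strength x diversity interaction terms.
        row["weak_x_nondiverse_white"] = row["weak_case"] * row["nondiverse_white"]
        row["weak_x_nondiverse_black"] = row["weak_case"] * row["nondiverse_black"]
    return row


print(jury_predictors("nondiverse_black", "weak"))
print(jury_predictors("diverse", "ambiguous", step=1))
```

With three diversity conditions, two dummy codes and two interaction terms are enough; adding a third dummy would make the design matrix collinear with the intercept.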
Quality of Deliberations
We videotaped all deliberations, and research assistants transcribed every deliberation. To measure deliberation quality, we primarily used a coding scheme developed in previous research to measure juror reasoning (Kuhn et al., 1994). Coders divided the transcribed deliberations into units, each defined as one unique argument: an assertion and any accompanying justification (Kuhn et al., 1994). Coders were blind to the condition of both the juries and the jurors (jurors were identified only by their participant numbers).
Coding yielded 6,320 units across the 64 jury deliberations, with a mean of 16.33 units per juror (SD = 17.41). Each unit was coded for purpose and quality. Coders agreed on 90% of codes overall (Cohen’s kappa = .83), and disagreements were resolved through discussion. Descriptive statistics for the coding criteria for individual jurors are presented in Table 4. Only analyses with significant diversity effects are presented fully in text.
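The two agreement figures are linked by the kappa formula κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder’s marginal category frequencies. The sketch below is a generic implementation, assuming a single kappa pooled over one set of categories; it is not the authors’ code, and the category labels are hypothetical. Solving the formula for p_e with the rounded reported values (p_o = .90, κ = .83) implies chance agreement of roughly .41.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one category per unit."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("coders must rate the same, nonempty set of units")
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over categories (categories absent for coder_b contribute 0).
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def implied_chance_agreement(observed: float, kappa: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (observed - kappa) / (1 - kappa)


# Hypothetical quality codes for four units (not the study's data).
a = ["sufficient", "sufficient", "insufficient", "insufficient"]
b = ["sufficient", "insufficient", "insufficient", "insufficient"]
print(cohen_kappa(a, b))                               # 0.5
print(round(implied_chance_agreement(0.90, 0.83), 2))  # 0.41
```

In the toy example, the coders agree on three of four units (p_o = .75), but their marginals imply p_e = .50, so kappa is only .50.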
Table 4. Study 1 Coding Criteria Descriptive Statistics.
Note. Within dependent measures, means bolded and sharing subscripts differ at p < .05.
First, we examined the number of units per juror, conducting a 2 (race) × 2 (diversity) × 2 (case strength) nested analysis of variance (ANOVA) with jury group as the random clustering effect. The nested ANOVA revealed a significant main effect of case strength, F(1, 82.33) = 5.87, p = .02,
The first coding category was quality. To be coded as sufficient, a unit had to be drawn from the evidence or constitute a reasonable inference from that evidence, and it had to indicate that it would be used to render a verdict. We used the number of sufficient units per juror as our outcome measure and conducted a 2 × 2 × 2 nested ANOVA to account for error due to jury groups. There were no main effects of diversity or race, but there was a main effect of case strength, F(1, 87.64) = 6.48, p = .02,
Units were then classified by the type of argument they represented. The four categories were supporting the chosen verdict, discounting the alternative, discounting the chosen verdict, and supporting the alternative. We report only categories that revealed significant diversity effects in a nested ANOVA. The number of discounting-the-alternative arguments displayed both a significant main effect of case strength, F(1, 92.59) = 5.90, p = .02,
Each unit was then classified by reasoning type (Kuhn et al., 1994). Supporting arguments were classified as factual (a simple statement of fact that served to justify a verdict), narrative (statements used to construct a story supporting a verdict), importing (using real-world knowledge to justify a verdict), or credibility (evaluating the credibility of a source to justify a verdict). Discounting arguments were also placed into four categories: factual (stating facts inconsistent with a verdict), discounting-inconsistent (using real-world knowledge to argue that evidence is inconsistent with a verdict), discounting-judgmental (using real-world knowledge that makes evidence less reliable and therefore discounts an argument), and credibility (discounting evidence because its source is unreliable). We performed nested ANOVAs on all eight of these outcome variables. Although there were no main effects of diversity, in the ambiguous case we found significant differences between diverse and nondiverse conditions in several categories, presented in Table 4: supporting statements using case facts, discounting facts, discounting using importing, and discounting credibility. In each of these categories, diverse juries produced more statements than did nondiverse juries.
Additional Dependent Variables
Analyses for additional dependent variables are included in Appendix B.
Study 1 Discussion
We predicted that jurors deliberating in diverse juries would make higher-quality contributions to discussion than would jurors deliberating in nondiverse juries. This hypothesis was partially supported: although diversity did not improve the quality of deliberations overall, it did improve deliberation quality in the ambiguous case. When evaluating the ambiguous case, jurors in diverse juries provided more sufficient units, more discounting arguments, more arguments discounting the reliability of evidence, and more units using real-world knowledge to discount facts than did jurors in nondiverse juries. This finding is consistent with previous research demonstrating that diverse juries have higher-quality discussion content than nondiverse juries (Sommers, 2006). The effects of diversity on deliberation quality were likely restricted to the ambiguous case because that evidence created uncertainty about the defendant’s guilt, prompting more nuanced discussion that was unnecessary when the evidence against the defendant was very weak. Furthermore, the finding that diversity is beneficial in ambiguous cases fits with modern theories of prejudice (e.g., aversive racism theory; Dovidio & Gaertner, 2004), which maintain that ambiguous situations are more likely to evoke bias and racial prejudice. Consistent with these theories, jury diversity would be expected to have little influence on deliberations in a clear-cut case, in which racial bias is less likely to affect decisions, but to influence deliberations in an ambiguous case, in which the ambiguity of the evidence allows jurors’ racial biases to operate and gives jury diversity the opportunity to counteract them.
We manipulated the strength of the evidence to test whether jurors who deliberated in diverse juries were more sensitive to variations in evidence strength than those who deliberated in nondiverse juries. This hypothesis was not supported: mock jurors were not more sensitive to case strength when deliberating in diverse juries. All jurors were sensitive to case strength before deliberations, but this effect disappeared after deliberations. Given the very low postdeliberation conviction rate and the fairly low predeliberation conviction rate, floor effects may have limited our ability to detect differences in sensitivity to evidence strength as a function of jury diversity.
Participant race did not affect the quality of deliberation contributions, suggesting that diversity affected participants of both races similarly. This finding extends previous work (Sommers, 2006; Stevenson et al., 2017) and suggests that diversity may improve contributions to deliberations for both white jurors and members of underrepresented groups. The finding that black jurors also processed information more thoroughly in diverse juries does not rule out the possibility that white jurors were motivated to avoid prejudice (Fleming et al., 2005; Petty et al., 1999) or engaged in higher levels of self-monitoring (Stevenson et al., 2017), but it does suggest that diversity might operate through other mechanisms as well. The results of this study also depart somewhat from theories that explain diversity’s benefits in terms of increased heterogeneity of experiences and perspectives (Nemeth, 1986). Our results demonstrate that diversity improved deliberations at the jury level, so jurors from underrepresented groups were not simply adding different perspectives but were also contributing at a higher level than they did in nondiverse juries. In all cases, the defendant was black, so it is possible that black jurors were alerted to the possibility that an ingroup member might be the target of prejudice. Black jurors may have been concerned that the defendant would be treated unfairly by other members of a diverse jury, a concern that would be less pressing in a nondiverse jury. Because the race of the defendant was not manipulated, however, the design of this study does not allow a full test of that hypothesis.
Another possibility is a mechanism that would explain the reasoning benefits of diversity for both dominant-group members and members of underrepresented groups, such as heightened concern that the defendant receive a fair trial or the anticipation of having to defend one’s own position more thoroughly in a diverse group.
Study 2
Overview
Previous work (including Study 1 in this article) has primarily defined jury diversity in terms of race and ethnicity, with diverse juries including members of racial and ethnic minority groups and nondiverse juries composed of white jurors. Investigating racial and ethnic diversity and its role in jury decision-making is undoubtedly important, but this study explores how wealth and power can affect the deliberation process. Prior work on organizational and group dynamics has documented that unequal distribution of power and resources can affect group performance, causing members of lower-power groups to experience identity threat (Alderfer & Smith, 1982; Foldy et al., 2009). Such power differentials between members of different racial and ethnic groups have the potential to enter the jury room and interfere with jury performance. Moreover, separating wealth and power from the specific stereotypes linking racial groups to crime can help us understand how differences in wealth and power affect jury deliberations, even when both are irrelevant to the task at hand.
Power differentials play a significant role in jury deliberations. Traditionally, members of higher-social-status groups (i.e., those with higher incomes and education levels) tend to participate more in jury deliberations than jurors from lower-social-status groups (Hastie et al., 1983; York & Cornwell, 2006). Similarly, women tend to contribute less to jury deliberations than do men (Hastie et al., 1983). Finally, high-socioeconomic-status (SES) individuals reported participating more and being perceived as more influential during deliberations than did low-SES individuals (Cornwell & Hans, 2011; York & Cornwell, 2006).
High-power individuals tend to act differently in groups than do low-power individuals (for a review, see Brauer & Bourhis, 2006). One study found that members of high-power groups expressed their opinions more often and were less susceptible to the pressures of an intergroup situation (Galinsky et al., 2008). Social dominance orientation (SDO) is also relevant to understanding the impact of power dynamics in jury deliberations (Ho et al., 2015; Sidanius & Pratto, 1999). SDO refers to the tendency of high-status individuals to believe that they deserve their position at the top of the power hierarchy and to seek to maintain it. People high in SDO are more likely to engage in prejudice and discriminatory behavior toward outgroups (Amiot & Bourhis, 2005). Jurors high in SDO tend to punish outgroup members more than do jurors low in SDO, who are far more likely to be egalitarian and are therefore less likely to discriminate (Kemmelmeier, 2005).
The minimal groups paradigm (MGP) was initially created to demonstrate that people can develop strong feelings of association with ingroup members and hostility toward outgroup members, even when the groups are created using trivial criteria (Tajfel et al., 1971). Previous research has established methods of manipulating wealth and power within an MGP, using Monopoly money to signify wealth and a greater stake in allocation decisions to simulate power (Harvey & Bourhis, 2012). In this study, we manipulated wealth by distributing different amounts of Monopoly money to participants depending on their wealth condition, and we manipulated power by giving some groups more control than others over how resources were allocated. As a result, the wealthy and powerful group had both more money and more power to distribute that money. Previous MGP studies manipulating wealth and power have demonstrated that, independent of the wealth and power manipulations, people tend to show favoritism and discrimination based on their group membership (Harvey & Bourhis, 2012).
In this work, we experimentally manipulated wealth and power to study the effects of wealth and power diversity on the quality of jury deliberations. We used a previously established paradigm that has been shown to manipulate feelings of wealth and power in an MGP setting (Harvey & Bourhis, 2012). We manipulated diversity by combining members of both wealth/power groups in the diverse conditions. We predicted that diverse juries (i.e., juries combining both wealth/power groups) would exhibit more sensitivity to evidence strength and higher-quality deliberations than nondiverse juries.
Method
Participants
Three hundred sixty-nine jury-eligible undergraduate psychology students at an urban public university in the Northeast completed the study for course credit. Participants were prescreened for jury eligibility (over 18 years old and a U.S. citizen). Participants were 75.6% female and averaged 20.75 years old (SD = 4.29). The sample was ethnically diverse (42.8% Hispanic, 16.8% black, 16.0% white, and 9.8% Other).
Design
Participants were randomly assigned to a 3 (jury diversity: diverse vs. nondiverse high wealth/power vs. nondiverse low wealth/power) × 2 (evidence strength: ambiguous vs. weak) between-subjects factorial design.
Procedure
Participants were recruited through the Sona online recruitment system and participated in groups of 4 to 16. The procedure for the first part of the study was based on an MGP manipulation developed by Harvey and Bourhis (2012). When participants arrived, they were assigned to either the high-wealth/power group or the low-wealth/power group. Participants were placed in either “Group A” or “Group B” according to whether their participant number was odd or even, to emphasize the randomness of the group assignments. We randomly determined whether Group A or Group B was the high-power group. We manipulated wealth by giving the two groups different amounts of Monopoly money (see Manipulations). After distributing the Monopoly money, the experimenter informed participants that the high-power group’s decisions about how to distribute the money would count for 70% of the final distribution, whereas the low-power group’s decisions would count for only 30%.
Once the groups were assigned, the experimenter explained the distribution task and handed out questionnaires for all participants to complete before the task. Participants first completed three Tajfel matrices originally developed for use in MGPs (Tajfel et al., 1971). The matrices were labeled “Ingroup” for ingroup members on the top row and “Outgroup” for outgroup members on the bottom row. For each matrix, one end of the scale represented rewarding ingroup members and punishing outgroup members, whereas the other end represented rewarding outgroup members and punishing ingroup members. The middle of the scale allowed for a more equal distribution of funds between ingroup and outgroup members. Participants received three envelopes, labeled “Group A Member,” “Group B Member,” and “Future Studies.” Consistent with the procedure of Harvey and Bourhis (2012), we asked participants to report how much of the money from the whole experiment they would like to give back to the researchers to fund future studies. The purpose of this donation question was to emphasize that resources were finite, because participants in the low-wealth/power groups did not have much money to donate. Participants placed the appropriate amount of money in each envelope (as dictated by their decisions on the matrices and their donation to future studies).
Following this first task, participants filled out additional questionnaires with questions about their group membership and their feelings of wealth and power.
After completing the resource allocation tasks, the participants completed the study in the same manner as in Study 1. They viewed the trial stimulus after the wealth/power manipulation and then deliberated in diverse or nondiverse groups, depending on condition. The trial stimulus was identical to Study 1, except that it included a Hispanic rather than a black defendant.
Manipulations
Wealth/power
The manipulations of wealth and power were combined in this study so that we had a high-wealth/power condition and a low-wealth/power condition. In the high-wealth/power condition, participants received $1,000 in Monopoly money, and their decisions accounted for 70% of the overall allocation of the money at the end of the study. In the low-wealth/power condition, participants received $300 in Monopoly money, and their decisions accounted for 30% of the overall allocation of the money.
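The 70/30 weighting of the two groups’ decisions can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the article does not give the exact formula, so the code assumes each group’s decision is expressed as proportions of the pool that sum to 1 and that the final allocation is a simple weighted average of the two groups’ decisions. The function name `weighted_allocation` and the example decisions are hypothetical.

```python
# Hypothetical sketch: combine the two groups' allocation decisions using the
# 70/30 weights described in the text. Assumes each group's decision is given
# as proportions of the pool that sum to 1 (not stated in the article).

def weighted_allocation(high_alloc, low_alloc, high_weight=0.70):
    """Weighted average of the high- and low-wealth/power groups' allocations.

    high_alloc, low_alloc: dict mapping recipient label -> proportion of the
    pool that the group chose to give that recipient.
    Returns a dict mapping each recipient to its final (weighted) share.
    """
    if not 0.0 <= high_weight <= 1.0:
        raise ValueError("high_weight must be between 0 and 1")
    low_weight = 1.0 - high_weight
    recipients = sorted(set(high_alloc) | set(low_alloc))
    return {
        r: high_weight * high_alloc.get(r, 0.0) + low_weight * low_alloc.get(r, 0.0)
        for r in recipients
    }


# Made-up decisions: the high group (Group A here) favors itself,
# while the low group splits the pool evenly.
final = weighted_allocation(
    {"Group A": 0.8, "Group B": 0.2},
    {"Group A": 0.5, "Group B": 0.5},
)
print({k: round(v, 2) for k, v in final.items()})  # {'Group A': 0.71, 'Group B': 0.29}
```

Under this weighting, even an even split proposed by the low-wealth/power group moves the final outcome only modestly away from the high group’s preferred allocation.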
Jury diversity
Diverse juries contained at least two members from each of the high-wealth/power and low-wealth/power groups. Nondiverse juries consisted entirely of low-wealth/power group members or entirely of high-wealth/power group members.
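The composition rule for juries can be written as a small classification function. This is a hypothetical sketch: the labels `"high"` and `"low"` and the function name are illustrative, and the article does not say how mixed juries with only one member from either group were handled, so the sketch simply flags them as unclassified.

```python
from collections import Counter


def classify_jury(members):
    """Classify one jury from its members' wealth/power labels ('high' or 'low')."""
    counts = Counter(members)
    high, low = counts["high"], counts["low"]
    if high > 0 and low == 0:
        return "nondiverse high wealth/power"
    if low > 0 and high == 0:
        return "nondiverse low wealth/power"
    if high >= 2 and low >= 2:
        return "diverse"
    # Mixed juries with only one member from either group fit neither rule as stated.
    return "unclassified"


print(classify_jury(["high", "high", "low", "low", "low"]))  # diverse
print(classify_jury(["high", "low", "low", "low"]))          # unclassified
```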
Dependent Variables
This study used all dependent variables from Study 1, with additional dependent variables for the MGP portion of the study.
Reports of wealth/power
Following the allocation task, participants rated, on a scale of 1 to 7, how wealthy they felt and how powerful they felt. These questions acted as manipulation checks for the power and wealth manipulations.
Results
Descriptive Statistics
Overall, 369 participants took part in the study. Of these, 13 were removed prior to deliberations, leaving 356 participants, who deliberated in 60 juries of 4 to 7 individuals.
Manipulation Checks
Participants’ responses to the manipulation check questions indicated that our wealth/power manipulation had the intended effect. We conducted one-way ANOVAs on participants’ ratings of how wealthy and how powerful they felt. Participants in the high-wealth/power condition reported feeling wealthier (M = 4.60) than did those in the low-wealth/power condition (M = 3.21), F(1, 364) = 64.34, p < .001. Participants in the high-wealth/power condition also reported feeling more powerful (M = 4.52) than did those in the low-wealth/power condition (M = 3.34), F(1, 364) = 51.02, p < .001. Following the procedure of Harvey and Bourhis (2012), we averaged all items on the satisfaction with group membership scale (α = .78). Participants in the high-wealth/power condition expressed more satisfaction with their group membership (M = 4.89) than did participants in the low-wealth/power condition (M = 4.04), F(1, 364) = 45.20, p < .01.
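To give a sense of the size of these manipulation-check effects, partial eta squared can be recovered from each reported F statistic and its degrees of freedom, using η²p = (F · df_effect) / (F · df_effect + df_error). These effect sizes are not reported in the article; the sketch below only derives them from the published F values, and for a one-way ANOVA they equal ordinary eta squared.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for the manipulation checks, each with df = (1, 364).
REPORTED_F = {"wealth": 64.34, "power": 51.02, "satisfaction": 45.20}

for measure, f_value in REPORTED_F.items():
    print(f"{measure}: {partial_eta_squared(f_value, 1, 364):.3f}")
# wealth: 0.150
# power: 0.123
# satisfaction: 0.110
```

By these derived values, the wealth/power condition accounted for roughly 11% to 15% of the variance in each manipulation-check rating.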
Juror Verdicts
Full descriptive statistics are presented in Tables 5 and 6. Before deliberations, 47% of jurors voted to convict the defendant. We used a standard binary logistic regression to determine whether case strength or wealth/power group membership affected jurors’ predeliberation verdicts. We did not examine predeliberation effects of diversity because diversity conditions were not assigned until the deliberation stage. In the first step, we entered the two main effects (wealth/power and case strength); in the second step, we entered the interaction between the two variables. There was a significant main effect of case strength on predeliberation verdicts, Wald χ²(1, N = 357) = 68.81, p < .01, OR = 7.64, 95% CI = [4.72, 12.34]. In the ambiguous case condition, 68% of participants voted guilty, compared with only 22% in the weak case. No other main effects or interactions significantly affected predeliberation verdicts, p values > .05.
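The reported odds ratio, confidence interval, and Wald statistic are linked by standard formulas: b = ln(OR), SE(b) = [ln(upper) − ln(lower)] / (2 × 1.96), and Wald χ² = (b / SE)². The sketch below, which assumes a symmetric Wald interval on the log-odds scale, recovers the Wald statistic from the published OR and CI and computes the unadjusted odds ratio implied by the reported percentages (68% vs. 22%). It is a consistency check on reported numbers, not a reanalysis of the data.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover b, SE(b), and the 1-df Wald chi-square from an OR and its 95% CI."""
    b = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return b, se, (b / se) ** 2


def odds_ratio_from_proportions(p1, p2):
    """Unadjusted odds ratio comparing two proportions."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


b, se, wald = wald_from_or_ci(7.64, 4.72, 12.34)
print(f"b = {b:.3f}, SE = {se:.3f}, Wald = {wald:.1f}")   # b = 2.033, SE = 0.245, Wald = 68.8
print(f"unadjusted OR = {odds_ratio_from_proportions(0.68, 0.22):.2f}")  # unadjusted OR = 7.53
```

The recovered Wald value (about 68.8) agrees with the reported 68.81 up to rounding of the CI bounds, and the unadjusted odds ratio from the raw percentages is close to the model-based OR of 7.64.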
Study 2 Predeliberation Guilty Verdict Descriptive Statistics.
Note. Descriptive statistics for guilty verdicts only.
Study 2 Postdeliberation Guilty Verdicts Descriptive Statistics.
Note. Descriptive statistics for guilty verdicts only.
After deliberations, 36% of participants voted to convict the defendant. To examine the effects of the independent variables and deliberations on individual verdicts, we conducted a mixed-model regression with individual juror verdicts as the outcome measure. First, we examined a null model containing only a random intercept for jury to determine how much variance was due to jury membership. The ICC was .75, indicating that 75% of the variance in verdicts was attributable to jury membership. The mean within-group reliability estimate was .95, indicating strong agreement among members of the same jury on their postdeliberation verdicts.
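The two statistics in this paragraph can be related with standard multilevel formulas, though the article does not say which definitions it used. One common ICC definition for a random-intercept logistic model fixes the juror-level variance at π²/3, the variance of the standard logistic distribution, so an ICC of .75 would imply a jury-intercept variance of π² on the logit scale. The reliability of a jury mean (often labeled ICC(2)) follows from the Spearman-Brown step-up, k·ICC / [1 + (k − 1)·ICC]. The sketch assumes both definitions and takes the average jury size k as 354/60, from the N reported for the mixed model.

```python
import math

# Assumption: the ICC is the latent-variable ICC commonly used for
# random-intercept logistic models, where the juror-level variance is fixed
# at pi^2 / 3 (the variance of the standard logistic distribution).
def latent_icc(tau2):
    """Share of latent verdict variance due to juries, given jury-intercept variance tau2."""
    return tau2 / (tau2 + math.pi ** 2 / 3)


def group_mean_reliability(icc1, k):
    """Spearman-Brown step-up: reliability of a mean over k jurors (ICC(2))."""
    return k * icc1 / (1 + (k - 1) * icc1)


# An ICC of .75 corresponds to a jury-intercept variance of pi^2 on the logit scale.
print(round(latent_icc(math.pi ** 2), 2))          # 0.75

# Average jury size assumed from the N reported for the mixed model (354 jurors, 60 juries).
k = 354 / 60
print(round(group_mean_reliability(0.75, k), 2))   # 0.95
```

Under these assumptions, an ICC of .75 with about six jurors per jury yields a group-mean reliability of about .95, matching the reported value.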
We then fit a mixed-effects logistic regression, with jurors nested in juries, using the glmer function (binomial family) in the R package lme4. The results of this analysis, presented in Table 7, revealed a main effect of case strength, z = −5.53, p < .01. Participants in the ambiguous case were more likely to vote guilty (54%) than were participants in the weak case (14%). No other main effects or interactions reached significance in the model, p values > .05.
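One plausible specification of the random-intercept logistic model is shown below as a rough guide. The text does not list the model's fixed effects (they appear in Table 7), so the predictor notation is an assumption: SOC_j denotes strength of case and d_j the vector of dummy-coded diversity conditions for jury j, and y_ij = 1 if juror i in jury j voted guilty. In lme4's formula syntax, this would correspond to something like `verdict ~ soc * diversity + (1 | jury)` with `family = binomial`.

$$\operatorname{logit} \Pr(y_{ij} = 1) = \beta_0 + \beta_1 \mathrm{SOC}_j + \boldsymbol{\beta}_2^{\top}\mathbf{d}_j + \boldsymbol{\beta}_3^{\top}\left(\mathrm{SOC}_j\,\mathbf{d}_j\right) + u_j, \qquad u_j \sim \mathcal{N}(0, \tau^2)$$

The jury-level random intercept u_j captures the shared tendency of jurors in the same jury to reach the same verdict, which the ICC of .75 indicates is substantial.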
Study 2 Linear Mixed Model on Individual Dichotomous Verdict Outcomes (N = 354).
Note. SOC = strength of case.
Jury Verdicts
Participants deliberated in 60 juries. Of those 60 juries, 17 (28%) voted to convict the defendant, 34 (57%) voted to acquit the defendant, and 9 (15%) remained hung after 45 minutes. We asked each hung jury to take a public vote to determine its majority position; after counting these majority votes, 22 (37%) juries voted to convict the defendant and 38 (63%) juries voted to acquit. We performed jury-level analyses on the verdicts using these majority counts. Descriptive statistics are presented in Table 8.
Study 2 Jury-Level Guilty Verdict Descriptive Statistics.
Note. Descriptive statistics for guilty verdicts only.
First, we performed a binary logistic regression examining the effects of the conditions on jury verdicts. In the first step, we entered the two dummy-coded diversity variables along with the strength-of-case predictor; in the second step, we added the interactions between strength of case and each dummy-coded diversity variable. In the first step, there was a main effect of case strength, Wald χ²(1, N = 60) = 15.20, p < .01, OR = 20.51, 95% CI = [4.49, 93.64]. In the ambiguous case, 19 (63%) juries voted guilty, compared with only 3 (10%) juries in the weak case. The first step also revealed a significant difference between the nondiverse low-wealth/power and diverse conditions, Wald χ²(1, N = 60) = 3.90, p < .05, OR = .18, 95% CI = [.03, .99]. In the diverse condition, 4 (20%) juries voted guilty, compared with 9 (47%) in the nondiverse low-wealth/power condition. Following a similar pattern, the difference between the nondiverse high-wealth/power condition and the diverse condition approached significance, Wald χ²(1, N = 60) = 3.60, p = .06, OR = .19, 95% CI = [.04, 1.06]: 9 (43%) of the nondiverse high-wealth/power juries voted to convict the defendant, compared with 4 (20%) of the diverse juries.
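The three-level diversity factor enters the regression as two 0/1 dummy variables. The sketch below shows that coding and computes unadjusted odds ratios from the jury counts in the text. Two assumptions apply: the article does not name the reference category (the sketch defaults to the diverse condition), and the condition sizes of 20, 19, and 21 juries are inferred from the reported counts and percentages. The unadjusted ratios (about .28 and .33) differ from the reported model-based ORs (.18 and .19), presumably because the model also adjusts for strength of case.

```python
# Guilty majority verdicts by diversity condition: (guilty juries, total juries).
# Guilty counts come from the text; the totals (20, 19, 21) are inferred from the
# reported percentages and are an assumption.
COUNTS = {
    "diverse": (4, 20),
    "nondiverse_low": (9, 19),
    "nondiverse_high": (9, 21),
}
LEVELS = ["diverse", "nondiverse_low", "nondiverse_high"]


def dummy_code(condition, reference="diverse"):
    """Return 0/1 indicators for every level except the reference level."""
    if condition not in LEVELS:
        raise ValueError(f"unknown condition: {condition}")
    return [int(condition == level) for level in LEVELS if level != reference]


def unadjusted_odds_ratio(target, comparison):
    """Odds of a guilty verdict in `target` divided by the odds in `comparison`."""
    g_t, n_t = COUNTS[target]
    g_c, n_c = COUNTS[comparison]
    return (g_t / (n_t - g_t)) / (g_c / (n_c - g_c))


for level in LEVELS:
    print(level, dummy_code(level))
# diverse [0, 0]
# nondiverse_low [1, 0]
# nondiverse_high [0, 1]

print(round(unadjusted_odds_ratio("diverse", "nondiverse_low"), 2))   # 0.28
print(round(unadjusted_odds_ratio("diverse", "nondiverse_high"), 2))  # 0.33
```

An odds ratio below 1 here means diverse juries had lower odds of a guilty majority verdict than the comparison condition.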
Quality of Deliberations Measures
Although jurors deliberated in 60 juries, technical problems caused us to lose two of the jury videos, leaving 58 juries. A transcription company transcribed each of the 58 jury videos. Research assistants then divided the transcribed jury discussions into units, each consisting of an argument and any accompanying justification. Overall, there were 3,305 units across the 58 juries. The mean number of units per juror was 9.50 (SD = 9.53). Those units were then coded for purpose and quality. Cohen’s kappa for agreement between coders was .83, indicating a high level of agreement. For each quality-of-deliberation coding measure, we conducted a 2 (Diversity) × 2 (Strength of case) nested ANOVA to test for main effects of, and interactions between, diversity and strength of case. The full set of descriptive results is presented in Table 9. We report analyses only for categories with significant results.
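Cohen’s kappa corrects the raw proportion of coder agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_e sums, across codes, the product of the two coders’ marginal proportions. The sketch below computes it for two coders who labeled the same deliberation units. The category labels in the example are hypothetical and are not the article’s coding scheme.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(coder_a)
    # Observed agreement: share of units given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if p_expected == 1:
        return 1.0  # both coders used one identical code for every unit
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for four units.
a = ["alt_verdict", "alt_verdict", "outside_info", "outside_info"]
b = ["alt_verdict", "outside_info", "outside_info", "outside_info"]
print(cohens_kappa(a, b))  # 0.5
```

In this example the coders agree on 75% of units, but because half of that agreement would be expected by chance, kappa is only .50. By the same logic, the reported .83 reflects agreement well above chance.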
Study 2 Coding Criteria Descriptive Statistics.
For supporting an alternative verdict, there was a significant main effect for diversity, F(1, 56.83) = 4.42, p = .04,
Additional Dependent Variables
Analyses for additional dependent variables are included in Appendix C.
Study 2 Discussion
Similar to the results in Study 1, jurors deliberating in diverse juries made more contributions on several of the quality-of-deliberation reasoning measures than did jurors in nondiverse juries. Unlike in Study 1, this effect was not limited to the ambiguous case. Participants deliberating in diverse juries were more likely to offer arguments that supported an alternative verdict and were more likely to bring in outside information to support their verdict choices than were jurors deliberating in nondiverse juries. These contributions reflect high-level reasoning because they show that jurors were carefully considering alternatives to their chosen verdict and bringing in outside information to evaluate their choice (Kuhn et al., 1994). These contributions also reflect jurors changing their minds in light of new information when warranted, which lends further support to the conclusion that jurors considered others’ points of view. Consistent with Study 1 and with the Sommers (2006) data, jurors deliberating in diverse juries outperformed jurors deliberating in nondiverse juries, as evidenced by their more frequent high-quality contributions to jury deliberations.
Similar to the results in Study 1, we did not find evidence to support our primary hypothesis that diverse juries (and jurors deliberating in diverse juries) would be more sensitive to evidence strength than nondiverse juries (and jurors deliberating in nondiverse juries). There are several possible explanations for diversity’s lack of impact on sensitivity to evidence strength. Although the conviction rate in Study 2 was not as low as in Study 1, it was still low overall, which may have reduced our ability to detect differences. No previous research has documented that diversity improves sensitivity to evidence strength, so it is possible that diversity improves neither sensitivity to evidence strength nor the quality of jurors’ decisions. Previous research has documented that diversity improves the quality of deliberation discussions (Sommers, 2006), but this improvement may not translate into more accurate verdicts or different trial outcomes. Despite diversity’s failure to alter trial outcomes in both past and present research, it has consistently improved the quality of the process, which is important for engendering and maintaining citizens’ support for the legal system.
We also found jury-level effects of diversity on verdicts. Juries in the nondiverse low-wealth/power condition were more likely to vote to convict the defendant than were juries in the diverse condition; that is, low-wealth/power juries were more punitive than diverse juries. Nondiverse high-wealth/power juries showed the same trend (i.e., they were more punitive than diverse juries), although this difference was only marginally significant. Because both kinds of homogeneous juries convicted more often than diverse juries, diversity itself, rather than wealth/power status, likely contributed to this leniency. Although deliberations have been shown to increase leniency in juror verdicts (Kerr & MacCoun, 2012), it has not previously been demonstrated that diversity in jury composition increases leniency overall. However, one study did find that diverse juries (i.e., juries containing more minorities and women) were more lenient toward minority defendants (Lynch & Haney, 2011). Because the defendant in all conditions of this study was Hispanic, it could be that diverse juries are more lenient toward minority defendants specifically.
General Discussion
Limitations
The most substantial limitations of both studies are (a) the lack of clear evidence that diversity improves the quality of deliberations and (b) the lack of consistency in findings between the two studies. Our failure to detect an interaction between diversity and the strength-of-evidence manipulation limits the studies’ impact. This failure may partially be due to low conviction rates, particularly in the postdeliberation individual verdicts. These low rates may represent floor effects, which likely made it difficult to detect effects on the dichotomous postdeliberation verdicts and the continuous guilt measures. The leniency effect of deliberations (e.g., Kerr & MacCoun, 2012) could also have masked the effects of diversity on evidence evaluation. Although this was problematic in Study 1, Study 2 did yield main effects of diversity on jury-level verdicts that were not limited to the ambiguous case. In addition, diversity affected the quality-of-deliberation measures differently in the two studies. Although the studies do not provide clear guidance on how best to measure the quality of deliberation discussions, they do support the notion that diversity improves aspects of discussion quality.
We stopped deliberations after 45 minutes, and this procedural limit on deliberation time may have contributed to the lack of an effect of diversity on the length of deliberations. Another possible explanation for differences between our findings and those of previous work (Sommers, 2006) is that we manipulated diversity differently, manipulating both wealth and power in Study 2 and using an equal number of white and black jurors in Study 1. We also asked jurors to anticipate how the interaction would unfold using the anticipated interaction scale. Focusing participants who are about to engage in an interracial interaction on the social interaction rather than on the task to be completed may impair the quality of interracial interactions (Babbitt & Sommers, 2011). It is possible that asking participants to consider how smoothly the interaction would proceed influenced the quality of these discussions. Also, although the present work made efforts to achieve high ecological and external validity, the jurors knew they were not truly deciding the fate of the defendant. The simulated situation created by the laboratory experiment allowed us to detect causal effects of diversity but, of course, does not replicate the emotional components and consequences of a real jury trial.
Future Directions
The current research builds on the literature finding evidence of diversity’s ability to improve jury deliberations (Peter-Hagene, 2019; Sommers, 2006; Stevenson et al., 2017). With the inclusion of Study 1, we now have evidence that both people of color and white individuals make higher-quality contributions to deliberations when they participate in diverse juries than when they participate in homogeneous juries. In addition, Study 2 highlights that wealth and power may play a role in jury deliberations: juries whose members differed in wealth and power made higher-quality contributions to discussion and were less likely to convict the defendant than homogeneous high- or low-wealth/power juries. Previous research documents the ability of the watchdog effect and self-monitoring to promote more thoughtful information processing in high-status individuals deliberating in diverse groups, sometimes even prior to deliberations (Sommers, 2006; Sommers et al., 2008; Stevenson et al., 2017). The current research suggests that members of low-status groups may also make better contributions to deliberation when juries are diverse. Future research should explore the mechanism by which members of racial/ethnic minority groups and individuals of low wealth/power status make more high-quality contributions to jury deliberations and determine whether diversity’s benefit operates through information processing. It is possible that the mechanism is the same for both high- and low-status individuals, such as anticipating more disagreement when deliberating in diverse groups. These results also have implications for other types of group decision-making tasks, as they suggest that all group members may benefit from completing tasks in diverse groups.
Conclusions
The courts prescribe several protections against the prejudices of legal decision-makers, and the ability of diversity to guard against biased legal decisions is uncontroversial. Increasing the diversity of juries also has the potential to improve perceptions of the justice system and increase the perceived fairness of criminal justice outcomes (Ellis & Diamond, 2003). Many court decisions assume that diversity among jury members benefits the quality of deliberations by correcting for biases and prejudices (Peters v. Kiff, 1972). Most research supports this assumption (Bowers et al., 2001; Lynch & Haney, 2011), and the present research lends it further support. Our findings also demonstrate that diversity’s benefits are not restricted to jurors from majority or dominant groups: jurors from racial minority groups and from groups low in status and power also exhibit higher levels of reasoning in diverse juries. Overall, our research supports the courts’ assumption that diverse deliberations improve jury decision-making. Preventing members of underrepresented groups from serving on juries not only violates individual rights but also does a disservice to the quality of jury deliberations.
Research Data and Supplemental Material
Supplemental material for Diversity’s Impact on the Quality of Deliberations by Amanda Nicholson Bergold and Margaret Bull Kovera in Personality and Social Psychology Bulletin:
Research data: sj-csv-1-psp-10.1177_01461672211040960, sj-csv-2-psp-10.1177_01461672211040960, sj-sav-1-psp-10.1177_01461672211040960, sj-sav-2-psp-10.1177_01461672211040960
Supplemental material: sj-docx-1-psp-10.1177_01461672211040960, sj-docx-2-psp-10.1177_01461672211040960, sj-docx-3-psp-10.1177_01461672211040960
Footnotes
Acknowledgements
The authors would like to thank the research assistants who provided invaluable assistance on this project: Antonella Bariani, Amanda Beltrani, Hayley Carrier, Sara Hartigan, Brittany Lahey, Alexis Merlo, Valerie Negron, Jennifer Teitcher, Gabrielle Trupp, Sydney Wood, and Yvette Yun. The authors would also like to thank Dr. Jennifer Hunt, Dr. Steven Penrod, Dr. Samuel Sommers, and Dr. Daryl Wout for their comments on earlier versions of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded through a National Science Foundation Dissertation Improvement Grant (grant no. 40D43-0001) awarded to the authors. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Supplemental Material
Supplemental material is available online with this article.
Notes
References
