Abstract
Allocating limited mental health resources is a challenge for juvenile justice facilities. We evaluated the clinical utility of the Massachusetts Youth Screening Instrument, Version 2 (MAYSI-2), an instrument designed to aid in this process, in three subsamples of justice-involved youth (ages 14-17): detained girls (n = 69), detained boys (n = 130), and incarcerated boys (n = 373). For perspective, we compared its performance (in the incarcerated subsample) to that of the Youth Self-Report (YSR), a more widely used screen. The MAYSI-2 subscales were moderately useful for detecting relevant diagnoses, and differences were observed across samples. However, as a general mental health screen, the MAYSI-2 performed well (and comparably to the YSR), correctly classifying 66% to 75% of youth. When used to differentiate youth with any disorder from those without, both instruments were effective. Given the MAYSI-2’s practical advantages over the YSR (lower cost, easier administration), it may be a better option for juvenile facilities.
For youth involved in the juvenile justice system, behavioral problems are often just the tip of the iceberg. Delinquent youth are more likely than their non-delinquent peers to face multiple disadvantages, including disproportionately high rates of mental disorder (Abram, Teplin, McClelland, & Dulcan, 2003; Atkins et al., 1999; Colins et al., 2010; Fazel, Doll, & Langstrom, 2008; Garland et al., 2001; McReynolds et al., 2008; Otto, Greenstein, Johnson, & Friedman, 1992; Shufelt & Cocozza, 2006; Teplin, Abram, McClelland, Dulcan, & Mericle, 2002). The juvenile justice system has an ethical responsibility to provide mental health treatment to those in its custody who require it. However, identifying youth in need of such services poses a major challenge to juvenile facilities, which have limited resources. To address this problem, many facilities screen youth for mental health problems using brief, self-report scales rather than provide each individual with a comprehensive clinical evaluation. However, due to limited research on the relative effectiveness of such mental health screening instruments, reliable guidance for juvenile justice facilities in this area is lacking.
The present study evaluates the utility of the Massachusetts Youth Screening Instrument, Version 2 (MAYSI-2; Grisso & Barnum, 2000; Grisso, Barnum, Fletcher, Cauffman, & Peuschold, 2001), a mental health screening device, which contains subscales designed to assess different clusters of mental health symptoms, and which was developed specifically for use with juvenile justice populations. We examine the effectiveness of this instrument in three populations of juvenile justice–involved youth: detained girls, detained boys, and incarcerated boys. Detained youth are those held in custody temporarily while awaiting trial or transfer to a different placement. Incarcerated youth are those who have been found guilty of an offense (or “delinquent” in the terminology of the juvenile court) and have been placed in a secure facility as part of their sentence (or “disposition”). Evaluating the effectiveness of screening instruments in these different populations is necessary because each of these groups likely has a distinct set of mental health needs, and it is possible that screening instruments vary across these groups in their ability to discriminate between youth who do and do not need additional assessment.
It is well established that MAYSI-2 subscale scores vary by sex (Cauffman, Lexcen, Goldweber, Shulman, & Grisso, 2007), race/ethnicity (Cauffman & MacIntosh, 2006), and stage of juvenile justice processing (Vincent, Grisso, Terry, & Banks, 2008). For example, previous research suggests that female offenders exhibit higher rates of symptoms of mental disorder (Cauffman et al., 1998; Stewart & Trupin, 2003; Vincent et al., 2008). Prior studies also find gender differences in the endorsement and suitability of particular subscales of the MAYSI-2 among incarcerated youth. For example, the Thought Disturbance subscale is considered uninterpretable for female participants, and there is evidence of gender differences in responses on the Alcohol/Drug Use, Angry-Irritable, and Somatic Complaints subscales (Cauffman & MacIntosh, 2006). Differences due to the stage of juvenile processing are less well explored. Vincent et al. (2008) observed substantial variation in responses to the MAYSI-2 across different juvenile justice contexts. Findings such as these underscore the need to test how effectively the MAYSI-2 identifies those in need of further psychological assessment across different populations of justice-involved youth (e.g., males and females, detained and adjudicated).
Establishing whether or not youth are accurately flagged as needing additional assessment requires that scores be compared with mental health diagnoses. However, only a handful of studies (Archer, Simonds-Bisbee, Spiegel, Handel, & Elkins, 2010; Hayes, McReynolds, & Wasserman, 2005; Kerig, Moeddel, & Becker, 2011; Kuo, Stoep, & Stewart, 2005; Wasserman et al., 2004) have evaluated the degree to which the MAYSI-2 accurately detects mental disorders in juvenile justice populations. Critically, all of these studies included either detained or incarcerated samples (but not both), so there were no comparisons across settings for differential responding or use of a consistent set of rules for determining the need for further assessment. The most comprehensive of these studies (Wasserman et al., 2004) compared scores on the MAYSI-2 subscales in a sample of 325 detained youth (20% female) to diagnoses obtained using a self-administered mental health questionnaire. Most of the MAYSI-2 subscales exhibited moderate levels of utility for differentiating youth with and without theoretically related—“homotypic”—diagnoses. For example, the Depressed-Anxious subscale of the MAYSI-2 correctly identified 73% of detained youth who met criteria for any affective diagnosis and correctly identified 60% of detained youth who did not meet diagnostic criteria for any affective disorder. Unfortunately, this suggests that the screen misclassified 40% of youth who did not meet diagnostic criteria for any affective disorder. Using different self-administered diagnostic interviews, other researchers have also found certain MAYSI-2 subscales to have moderate diagnostic utility (Kerig et al., 2011; Kuo et al., 2005).
Our analysis proceeds in two steps. First, we evaluate the association between MAYSI-2 subscale scores and diagnoses obtained in a diagnostic interview. In addition, in one of the populations (the incarcerated males), we are able to compare the MAYSI-2’s performance relative to diagnosis to that of the Youth Self-Report (YSR; Achenbach, 1991). The YSR, being the more established and widely-used mental health screen, might be expected to exhibit greater construct validity with respect to mental health diagnosis. However, the MAYSI-2 was developed specifically for juvenile justice populations, which could give it an edge over the YSR in juvenile justice settings. Also, the MAYSI-2 has several practical advantages over the YSR, including lower cost ($279.95 vs. $499.99 for the manual and software at the time of this writing) and greater ease of administration (i.e., the MAYSI-2 does not require clinical training to administer). In prior studies (Grisso et al., 2001; Grisso & Quinlan, 2005), correlations between MAYSI-2 scales and parallel YSR scales were found to be strong, but far from perfect. Therefore, it is reasonable to consider which screening tool is more useful within a juvenile justice setting.
In the second part of our analysis, we test how well the MAYSI-2 (and in the incarcerated male subpopulation, the YSR) identifies youth with and without any mental health diagnosis (according to a diagnostic interview) using two different decision rules likely to be implemented in juvenile justice settings. The first rule would be to refer youths scoring above the clinical cutoff for any scale for additional screening—a reasonable choice considering that scores above the cutoff signal an increased risk of mental disorder. However, given the high prevalence of mental health issues in this population and limited resources to address them, juvenile justice facilities might instead opt to set the threshold for referral for further mental health assessment higher, requiring two scores above the clinical cutoff. This is the second rule we test. Our analyses compare the results of applying these two rules in our samples of justice-involved youth. This analysis has direct practical relevance since, as discussed below, the MAYSI-2 and YSR are meant to be used to identify youth in need of further mental health assessment, not to diagnose specific disorders.
Use of the MAYSI-2 and YSR in Juvenile Justice Populations
Given the high rate of mental disorder among delinquent youth, it would be ideal if each youth entering a juvenile facility received a comprehensive psychological evaluation (Wasserman et al., 2003). However, few facilities have sufficient funding and qualified staff to provide this level of care to every new arrival (Cocozza & Skowyra, 2000; Grisso et al., 2001; Thomas, Gourley, & Mele, 2004). In the absence of funding for universal assessments, facilities turn to mental health screening devices to help them decide how to allocate clinical resources. It is important to note that mental health screening devices, including the MAYSI-2 and YSR, are not designed to diagnose specific mental disorders. Rather, they are meant to identify youth experiencing symptoms severe enough that further assessment is warranted. In 2003, a panel of experts recommended that all youth held at a juvenile justice facility be screened for symptoms of mental disorder using “evidence-based, scientifically sound” instruments, such as the YSR and the MAYSI-2 (Wasserman et al., 2003, p. 754). As a result, the use of such screening instruments in juvenile justice settings has increased.
It is important to consider that, though both the MAYSI-2 and the YSR are composed of subscales that are theoretically related to diagnostic categories (e.g., the MAYSI-2 Alcohol/Drug Use subscale is related to substance use disorders and the YSR Thought Problems subscale is related to psychotic disorders), neither was designed to serve a diagnostic purpose. Rather than probing diagnostic criteria, the measures’ subscales correspond to clusters of related symptoms (or “syndromes”) empirically observed in youthful (YSR) and juvenile justice–involved (MAYSI-2) populations (Achenbach & Rescorla, 2001; Grisso & Barnum, 2000). Accordingly, the subscales reflect the high rates of comorbidity of mental disorders in youth. This is why, for example, symptoms of anxiety and depression (disorders that commonly co-occur) are represented in the same subscale (e.g., Youngstrom, Findling, & Calabrese, 2003). Consequently, subscale scores are not expected to map perfectly onto specific diagnoses. Indeed, the ultimate goal in using mental health screening devices is to distinguish those youth with any mental disorder from those free of mental disorder. For this reason, assessments of these instruments’ effectiveness should include an analysis of their ability to accurately detect the presence or absence of any mental disorder; analysis of this sort has been lacking in prior investigations. Moreover, while the MAYSI-2 manual does not specify the number of subscales for which an individual would need to score above the cutoff to warrant additional screening, it is useful to explore whether using a threshold of one versus two supra-cutoff scores might lead to better use of resources at different levels of detention and across gender.
Criterion Measure—The Kiddie-Schedule for Affective Disorders and Schizophrenia (K-SADS)
One obstacle to assessing the utility of mental health screening devices is the absence of a “gold standard” method for establishing the presence or absence of a mental disorder. The most valid diagnostic procedure involves an in-depth interview with the patient conducted by a credentialed clinician, ideally with additional information supplied by collateral informants (e.g., parents, teachers, records). An advantage of the present study is its use of a diagnostic procedure that more closely resembles this sort of clinical assessment than those used in prior research. Diagnosis was established in our study using the K-SADS, a semi-structured clinical interview that permits diagnostic decisions based on open-ended participant responses. Collateral sources were not consulted and the administrators were not credentialed clinicians but rigorously trained graduate interviewers; therefore, the procedure fell short of the clinical standard for diagnosis. Still, because it involves clinical judgment, the K-SADS provides a better approximation of a clinical diagnosis than do measures that rely exclusively on self-report [e.g., the Diagnostic Interview Schedule for Children (DISC-IV), used in Wasserman et al., 2004], especially in juvenile justice populations, where lack of insight and perspective may impede accurate self-report (Wasserman et al., 2004).
Study Aims
The first goal of the present study was to evaluate how well the MAYSI-2 subscales correspond to mental health diagnoses (“criterion-related validity”), using the K-SADS to establish diagnosis, and to assess the practical utility of the measure as a whole in three populations of juvenile justice–involved youth. A secondary goal was to compare the performance of the MAYSI-2 to that of the YSR in the incarcerated sample. Specifically, we compared the MAYSI-2 and YSR subscales in terms of how well they mapped onto related diagnoses using receiver operating characteristic (ROC) curve analyses and contingency tables. Our analyses extend prior research in several ways. First, they do so by sampling youth in the “deeper end” of the justice system: those who have been found guilty or adjudicated delinquent and securely confined (incarcerated). Such youth represent a different population than detained youth, who may be charged with minor offenses and whose guilt may not yet be determined. The study also builds on prior research by treating male and female subsamples of detained youth as representative of distinct populations, a logical step in light of the evidence that detained girls are, in terms of mental health symptoms, more dissimilar to their community counterparts than are detained boys (Cauffman et al., 2007). The third unique contribution of this study was our use of a diagnostic instrument that incorporated clinical judgment, rather than relying exclusively on self-report. Finally, and most important, we conducted a clinically relevant assessment of the MAYSI-2’s utility by examining how well it detected any of the most common mental health diagnoses under two decision rules likely to be applied in juvenile justice settings: scoring above the pre-established subscale cutoffs on either one or two subscales. (These cutoff scores are described in the Methods section.) For comparison, we conducted a parallel analysis with the YSR in the incarcerated sample.
Though the MAYSI-2 manual does not instruct its users to use any particular algorithm for referring youth for further mental health assessment, juvenile justice facilities nevertheless must adopt some decision rule. The one- and two-cutoff rules that we tested represent possible decision rules that are likely to be adopted by juvenile justice facilities, given that the MAYSI-2 cutoff scores are meant to identify youth exhibiting clinically significant levels of distress.
Methods
Participants and Procedure
Three samples of juvenile justice–involved youth were analyzed separately. The first and second samples were male (“PA boys”) and female (“PA girls”) youth recruited upon their arrival at a juvenile detention facility in Pennsylvania. The third, “CA boys,” was a male sample recruited post-adjudication upon entry at a secure incarceration facility in California. The sample descriptions and procedures for each subsample follow.
PA boys and girls
Boys (n = 130) and girls (n = 69) held in a Pennsylvania juvenile detention center were recruited for participation. Both male and female youth ranged in age from 14 to 17 (Mboys = 15.89, SDboys = 1.08; Mgirls = 15.61, SDgirls = 1.20) and were racially/ethnically representative of youths incarcerated in this region (boys: 53.8% African American, 3.1% Hispanic, 36.9% White; girls: 58.0% African American, 1.4% Hispanic, 37.7% White). Upon arrival at the facility, all youths were administered the MAYSI-2 Voice, which reads each question aloud via computer administration, as part of the facility’s standard intake procedure. Within 1 week of completing the MAYSI-2, a subsample was selected to receive a diagnostic interview. To ensure variability of MAYSI-2 scores in this subsample, participants were selected such that those with low, average, and high scores on the MAYSI-2 were equally represented. Among those in each category, selection into the subsample was random. Interviewers were blind to participants’ MAYSI-2 scores. Prior to conducting the one-on-one diagnostic interview, written parental consent and youth assent were obtained. YSR data were not collected in the PA samples.
CA boys
Male serious offenders (N = 373) were recruited from a high security juvenile facility in California. Most youth in this sample (69%) had committed a violent offense, such as robbery (23%), aggravated assault (17%), battery (6%), or attempted murder (6%). This sample was 29% African American, 53% Hispanic, and 6% White, reflecting the demographics of the juvenile justice population in the region. Youth ranged in age from 14 to 17 years (M = 16.42, SD = .80). The facility provided parental contact information for all youth entering the facility to the researchers. Parental consent was obtained via telephone within 24 to 48 hr of the youths’ arrival at the facility. Parents verbally confirmed that they were the parent of the child in question and their consents were tape-recorded (with the parents’ consent) for record-keeping. If the parent was not comfortable being tape-recorded or preferred a different consent process, written consent was obtained. Youth assent was obtained at the start of the interview session. Of the parents contacted, 97% consented to their child’s participation. Within 48 hr of entry, each youth completed a one-on-one diagnostic mental health interview with a trained graduate student. Following this interview, participants were verbally administered the YSR and then the MAYSI-2 (within about 30 min of each other) as part of a longer interview.
Measures
MAYSI-2
The MAYSI-2 (Grisso & Barnum, 2000) is a mental health screening instrument designed to identify youth within juvenile justice settings who are most in need of clinical intervention for emotional and behavioral problems. It was administered to all three subsamples. Participants answer 52 “yes” or “no” questions about recent symptoms and experiences. Administration takes 10 to 12 min. The measure contains seven subscales: Alcohol/Drug Use, Angry-Irritable, Depressed-Anxious, Somatic Complaints, Suicide Ideation, Thought Disturbance, and Traumatic Experiences. Per the instructions in the manual, the Thought Disturbance scale is used for boys only and the Traumatic Experiences scale is sex specific (varying by one item to reflect common sex differences in traumatic experiences; for example, the girls’ score includes an item about rape that is excluded from the boys’ score). Scores are calculated as the number of “yes” responses for each scale. See Table 1 for sample items, scale reliability statistics, and the number of items on each scale; see Table 2 for scale means. Each MAYSI-2 scale, except Traumatic Experiences, has a “caution cutoff” at or above which the youth is considered to have “clinically significant” levels of symptomatology (Grisso et al., 2001). In our contingency table analyses and our tests of referral decision rules, we assess the degree to which these cutoffs accurately differentiate between youth with and without diagnoses. (The MAYSI-2 manual also describes “warning” cutoffs for each scale. Unlike the caution cutoffs, which were established based on associations with clinical cutoffs on other mental health assessments, the warning cutoffs were intended to flag the highest 10% of scores. We opted to use the caution cutoffs in our analyses as the warning cutoffs are too stringent for identifying youth who might have a diagnosis and the caution cutoffs are a closer parallel to the YSR clinical cutoffs.)
Table 1. Scale Reliability for MAYSI-2 and YSR Subscales by Sample.
Note. MAYSI-2 = Massachusetts Youth Screening Instrument, Version 2; YSR = Youth Self-Report.
The MAYSI-2 cutoffs shown are the caution cutoff scores used in the present analysis; these match the published MAYSI-2 caution cutoffs except in the case of Traumatic Experiences, for which there is no published caution cutoff. For the YSR, the borderline cutoff score is used for all scales. Sample items from each subscale are provided in italics. No alpha is provided for the Traumatic Experiences scale in the pooled sample, as the scales for boys and girls are sex specific.
Table 2. Diagnosis Rates and MAYSI-2/YSR Subscale Scores by Sample.
Note. MAYSI-2 = Massachusetts Youth Screening Instrument, Version 2; YSR = Youth Self-Report; CI = confidence interval; SUD = substance use disorders; DBD = disruptive behavioral disorders; AFF = affective disorders; ANX = anxiety disorders; SUI = suicide risk; PSY = psychotic disorders; PTSD = posttraumatic stress disorder. Suicidality is not a diagnosed disorder; youth who reported recent suicide attempts or serious suicidal ideation were classified as suicidal. All diagnoses are “current,” meaning diagnostic criteria were met within the past 6 months, except for psychosis and PTSD, which are based on lifetime symptoms. Thought Disturbance (R) is the revised scale that omits the “dream” item.
Research employing the MAYSI-2 (e.g., Ford, Chapman, Pearson, Borum, & Wolpaw, 2008) indicates that one item in the Thought Disturbance scale, “Have you had a bad feeling that things don’t seem real, like you’re in a dream?”—hereafter referred to as the “dream item”—should be eliminated because it is so commonly endorsed that it is not helpful in identifying psychotic symptoms. We observed this same phenomenon in our data. Accordingly, we present results for both the original and a revised Thought Disturbance scale, which omits this item.
YSR
The Child Behavior Checklist–YSR (Achenbach, 1991) is one of the most widely used assessments of emotional and behavioral maladjustment in children and adolescents. It is cited in over 6,000 studies. This measure was administered to the incarcerated male subsample (CA boys) only, during the same session as the MAYSI-2 (in all instances, it preceded the administration of the MAYSI-2). On the YSR, youth rate on a 3-point scale (“not true,” “somewhat/sometimes true,” or “very true/often true”) the degree to which various behaviors, thoughts, and feelings were true for them in the past 6 months. Its 112 items comprise nine scales. To maximize comparability with the MAYSI-2 subscales, we used only seven scales in the present analyses: Withdrawn-Depressed, Anxious-Depressed, Somatic Complaints, Social Problems, Thought Problems, Rule Breaking, and Aggressive Behavior (subscales that were not included were Attention Problems, Internalizing Problems, and Externalizing Problems). See Table 1 for sample items, scale reliability statistics, and number of items on each scale; see Table 2 for scale means. Scores on YSR scales have corresponding T-scores, which are used to classify youth into three categories: normal (below a score of 60), borderline (between 60 and 63), and clinically significant symptomatology (above 63; Achenbach & Rescorla, 2001). Very few youth in the incarcerated sample scored above the clinical cutoff for any of the subscales (on average, 6.1% of youth fell above the clinical cutoff for any of the scales [range: 1.1% to 28.4%]; the average drops to 2.3% of youth meeting the clinical threshold of any of the YSR subscales if we exclude the Rule Breaking scale). When we report YSR cutoff scores in the present analyses, we use a T-score of 60 or above (i.e., “borderline or clinical;” hereafter referred to as YSR cutoff scores), which is comparable to the threshold used for the MAYSI-2 cutoff (i.e., caution/borderline range).
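The T-score ranges described above can be written out as a short Python sketch. The function names are hypothetical and introduced only for illustration; the thresholds (below 60, 60 to 63, above 63, and the study's cutoff of 60 or above) are the ones stated in the text.

```python
def classify_ysr_t_score(t_score: float) -> str:
    """Map a YSR scale T-score to the three ranges described in the text."""
    if t_score < 60:
        return "normal"
    if t_score <= 63:
        return "borderline"
    return "clinical"


def meets_ysr_cutoff(t_score: float) -> bool:
    """Cutoff used in the present analyses: borderline or clinical (T >= 60)."""
    return t_score >= 60


# Illustrative T-scores only; these are not values from the study data.
example_t_scores = [55, 60, 63, 64]
example_labels = [classify_ysr_t_score(t) for t in example_t_scores]
```

Using the lower (borderline) bound as the cutoff means that every youth labeled "borderline" or "clinical" counts as scoring above the YSR cutoff in the analyses that follow.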
Diagnosis
The K-SADS was used to assess current and lifetime psychiatric history. For the present analysis, we categorized youth as having a diagnosis if they met the criteria for “current” diagnosis, meaning the symptoms were present in the past 6 months. The only exceptions to this were with regard to posttraumatic stress disorder (PTSD) and psychosis, which were based on lifetime symptoms. For PTSD, the lifetime diagnosis was used because the MAYSI-2 asks about lifetime traumatic experiences, rather than querying only the past few months. For psychosis, it made sense to use lifetime diagnosis because psychosis is rare, recurring, and particularly problematic in a juvenile justice setting. The K-SADS is a semi-structured clinical interview designed to detect any of 32 Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association, 1994) Axis I child psychiatric diagnoses. The K-SADS, which takes 30 to 150 min to administer, generates reliable and valid child psychiatric diagnoses with high inter-rater agreement (range: 93% to 100%) and test-retest reliability (κ = .63 to 1.00; Kaufman et al., 1997). Graduate research assistants received extensive training in the administration of this interview, including coding of a video-taped interview. Inter-rater reliability for the coding of this interview was high (κ = .80).
For the present study, diagnoses were grouped into five categories: affective disorders (depression, dysthymia, mania, and bipolar disorder), anxiety disorders (panic disorder, separation anxiety disorder, social phobia, generalized anxiety, and overanxious disorder), psychotic disorders, substance use disorders, and PTSD. (It should be noted that the low rates of psychotic disorder in the present samples mean that we must exercise extreme caution in drawing inferences about the utility of the MAYSI-2 and YSR for detecting it.) As the prevention of suicidal behavior is a top priority for juvenile justice facilities, we also created a sixth “diagnosis” category called suicide risk, which was operationalized as a recent suicide attempt or serious suicidal ideation. Rates of diagnosis are shown in Table 2. Because juvenile justice–involved youth are generally assumed to have behavioral problems (though not all will meet the diagnostic criteria for diagnosis), we present our results with and without including disruptive behavior disorders as a diagnostic category of interest.
Analytic Approach
To investigate the relation of MAYSI-2 and YSR subscale scores to diagnosis, we employed ROC analysis, which gauges the sensitivity and specificity of a scale score relative to a criterion—diagnostic status, in this case. For any given score (e.g., a score of 2 on the MAYSI-2 Depressed-Anxious subscale), sensitivity refers to the proportion of youth with a particular diagnosis (e.g., an affective disorder) who score at or above that threshold (true positives), whereas specificity refers to the proportion of youth without a diagnosis who score below that threshold (true negatives). In general, lower thresholds yield greater sensitivity (more individuals with diagnoses are correctly classified) but poorer specificity (more individuals without diagnosis are incorrectly classified). Conversely, higher thresholds yield better specificity, but at the cost of sensitivity. ROC analysis produces a curve that plots the proportion of true positives (on the Y-axis) against the proportion of false positives (on the X-axis) for every possible score on the screening instrument (in this case, the MAYSI-2 or YSR subscale). The proportion of the plot area that lies underneath the curve (“area under the curve” or AUC) is a measure of the screening tool’s validity. An AUC of .50, which is the AUC of the diagonal “chance performance” curve (and the equivalent of a Cohen’s d of 0.00; Rice & Harris, 2005), would indicate that a subscale was useless—no better than chance—at classifying youth with or without the criterion diagnosis. Researchers who have compared different measures of effect sizes have indicated that an AUC of .556 corresponds to a Cohen’s d of .200 (small effect), an AUC of .639 corresponds to a Cohen’s d of .500 (medium effect), and an AUC of .714 corresponds to a Cohen’s d of .800 (large effect; Rice & Harris, 2005). We refer to AUCs ranging from .70 to .79 as moderately useful, .80 to .89 as highly useful, and .90 to 1.0 as excellent (Swets, 1988). 
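The correspondence between AUC and Cohen's d quoted above can be approximated with a short Python sketch. It assumes the common equal-variance normal model, under which AUC = Φ(d/√2), where Φ is the standard normal cumulative distribution function. This model is an assumption made for illustration and may not be the exact procedure behind the published values; it matches the quoted figures to within about .001. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def auc_from_cohens_d(d: float) -> float:
    """AUC implied by Cohen's d under an assumed equal-variance normal model."""
    return _STANDARD_NORMAL.cdf(d / sqrt(2))


def cohens_d_from_auc(auc: float) -> float:
    """Inverse of the conversion above; auc must lie strictly between 0 and 1."""
    return sqrt(2) * _STANDARD_NORMAL.inv_cdf(auc)


# Conventional small, medium, and large effect sizes.
implied_aucs = {d: auc_from_cohens_d(d) for d in (0.0, 0.2, 0.5, 0.8)}
```

Under this model a d of 0 gives an AUC of exactly .50, which is the chance-performance diagonal described above.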
Importantly, ROC analyses are most reliable when the prevalence rate for the criterion (i.e., the diagnosis) is close to 50%.
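The ROC quantities described above can be illustrated with a minimal Python sketch. The scores and diagnostic labels below are invented toy data, not study data, and the function names are hypothetical. The AUC is computed here as the proportion of (diagnosed, undiagnosed) pairs in which the diagnosed youth has the higher score, with ties counted as one half; this equals the area under the empirical ROC curve, though it is not necessarily how the AUCs in this study were computed.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[int], diagnosed: List[bool], threshold: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when 'at or above threshold' counts as positive."""
    pairs = list(zip(scores, diagnosed))
    tp = sum(1 for s, d in pairs if d and s >= threshold)
    fn = sum(1 for s, d in pairs if d and s < threshold)
    tn = sum(1 for s, d in pairs if not d and s < threshold)
    fp = sum(1 for s, d in pairs if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: List[int], diagnosed: List[bool]) -> float:
    """Area under the empirical ROC curve via pairwise comparison (ties = 0.5)."""
    pos = [s for s, d in zip(scores, diagnosed) if d]
    neg = [s for s, d in zip(scores, diagnosed) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy subscale scores and diagnostic status (hypothetical values).
toy_scores = [0, 1, 1, 2, 3, 3, 4, 5]
toy_diagnosed = [False, False, False, True, False, True, True, True]

toy_auc = auc(toy_scores, toy_diagnosed)
at_threshold_2 = sensitivity_specificity(toy_scores, toy_diagnosed, 2)
at_threshold_4 = sensitivity_specificity(toy_scores, toy_diagnosed, 4)
```

In the toy data, moving the threshold from 2 to 4 lowers sensitivity and raises specificity, which is the trade-off described in the text.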
In addition to the ROC analyses, we examine, using contingency tables, how well the pre-established MAYSI-2 and YSR subscale cutoffs perform at classifying youth with and without diagnoses. Finally, we test how well two decision rules—minimum one score versus minimum two scores above the cutoff on any subscale of the MAYSI-2—perform at correctly identifying youth with and without any diagnosis. For incarcerated boys, we also test these decision rules for the YSR. In these analyses, we report several key statistics defined in Box 1.
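The two decision rules and the contingency table statistics can be sketched in Python as follows. This is an illustration under stated assumptions: only three scales are included, the cutoffs for Alcohol/Drug Use and Depressed-Anxious are placeholder values rather than the published caution cutoffs, the youth records are invented, and all function and variable names are hypothetical. Only the Traumatic Experiences cutoff of three is taken from the text.

```python
from typing import Dict, List

# Only the Traumatic Experiences cutoff (3) comes from the text; the other
# two values are placeholders, not the published MAYSI-2 caution cutoffs.
CUTOFFS: Dict[str, int] = {
    "Alcohol/Drug Use": 4,
    "Depressed-Anxious": 3,
    "Traumatic Experiences": 3,
}


def is_referred(scores: Dict[str, int], min_scales: int) -> bool:
    """Refer a youth who scores at or above the cutoff on at least min_scales scales."""
    n_above = sum(1 for scale, cutoff in CUTOFFS.items() if scores[scale] >= cutoff)
    return n_above >= min_scales


def classification_stats(referred: List[bool], diagnosed: List[bool]) -> Dict[str, float]:
    """Contingency table statistics; assumes no cell total used as a divisor is zero."""
    pairs = list(zip(referred, diagnosed))
    tp = sum(1 for r, d in pairs if r and d)
    fp = sum(1 for r, d in pairs if r and not d)
    tn = sum(1 for r, d in pairs if not r and not d)
    fn = sum(1 for r, d in pairs if not r and d)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "correct": (tp + tn) / len(pairs),
    }


# Invented records: (alcohol/drug, depressed-anxious, trauma) scores and diagnosis.
raw = [(5, 4, 1, True), (1, 3, 0, True), (0, 0, 3, False),
       (0, 1, 1, False), (4, 0, 4, False), (2, 2, 2, True)]
youth_scores = [dict(zip(CUTOFFS, r[:3])) for r in raw]
any_diagnosis = [r[3] for r in raw]

rule_one = classification_stats([is_referred(s, 1) for s in youth_scores], any_diagnosis)
rule_two = classification_stats([is_referred(s, 2) for s in youth_scores], any_diagnosis)
```

In this toy example the stricter two-scale rule refers fewer youth, trading sensitivity for specificity; whether that trade is worthwhile in practice depends on the actual data and on the resources available for follow-up assessment.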
Box 1. Statistics Reported to Assess Construct Validity.
Note. K-SADS = Kiddie-Schedule for Affective Disorders and Schizophrenia.
Results
Correspondence Between MAYSI-2 Subscales and Diagnoses
The results of the ROC analyses are reported in Table 3, with shading used to signify instances where the mental health screening subscale and diagnosis are homotypic (theoretically related). The results of analyzing the convergence of the subscales’ cutoffs relative to diagnosis are reported in Table 4. Overall, the MAYSI-2’s Alcohol/Drug Use, Depressed-Anxious, Traumatic Experiences, and Suicide Ideation scales exhibited moderate levels of convergence with homotypic diagnoses, though there was variation across subsamples. The MAYSI-2 Thought Disturbance scale also performed well, though these results should be interpreted with caution due to the low base-rate of psychosis.
Table 3. AUC Estimates and 95% Confidence Intervals.
Note. Shading signifies that the scale is theoretically related to the diagnosis cluster. Bold font indicates at least moderately useful AUCs (>.70). AUC = area under the curve; K-SADS = Kiddie-Schedule for Affective Disorders and Schizophrenia; SUD = Substance Use Disorders; DBD = Disruptive Behavioral Disorders; AFF = Affective Disorders; ANX = Anxiety Disorders; PTSD = posttraumatic stress disorder; SUI = Suicide Risk; PSY = Psychotic Disorders; ANY = Any Disorder.
Table 4. MAYSI-2 and YSR Cutoffs: Positive and Negative Predictive Values, Sensitivity, and Specificity.
Note. Shading signifies that the scale is theoretically related to the diagnosis cluster. “Caution range” refers to the percentage of youth scoring at or above the cutoff used for each scale. K-SADS = Kiddie-Schedule for Affective Disorders and Schizophrenia; SN = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value; MAYSI-2 = Massachusetts Youth Screening Instrument, Version 2. See Box 1 for definitions of abbreviations.
Substance use disorders
In detecting substance use disorders, the MAYSI-2 Alcohol/Drug Use scale performed moderately well among PA boys and girls (AUC = .75 and .77, respectively) but not as well in the incarcerated sample (CA boys; AUC = .67). Looking at the cutoff scores, Alcohol/Drug Use performed reasonably well across all three subsamples. Examining the positive predictive values (PPV) and negative predictive values (NPV) reported in Table 4, one observes that the Alcohol/Drug Use scale was the only subscale of the MAYSI-2 for which half or more of those scoring above the cutoff obtained the homotypic diagnosis (substance use disorder) and more than half scoring below the cutoff did not meet criteria. Consistent with the AUC results, the PPV of this subscale was substantially lower in the CA boys sample than in the PA samples. The relatively low sensitivity of this scale for PA girls (SN = .53) and PA boys (SN = .43), despite the moderately high AUCs, indicates that the caution cutoff on this scale may have been too high, at least for the detained samples.
Disruptive behavior disorders
None of the MAYSI-2 subscales was consistently useful in detecting disruptive behavioral disorders. In particular, the Angry-Irritable scale was not found to be a useful indicator of disruptive behavior disorders in any of the samples.
Affective disorders
For affective disorders, the MAYSI-2 Depressed-Anxious scale was moderately useful among CA boys and PA girls (AUC = .72 and .78 respectively), but less useful for PA boys (AUC = .63). Examination of the cutoff scores reveals relatively low PPV and high NPV for MAYSI-2 subscales gauging symptoms of affective disorder, suggesting that the cutoff thresholds might be too low. Even among PA girls (where PPV was highest), only about a quarter of those scoring above the cutoff on the homotypic scales had an affective diagnosis. For PA boys, it was less than a fifth, and for CA boys, less than a tenth. The Angry-Irritable scale was not a useful indicator of affective disorders in any of the samples, though among the PA boys (AUC = .67) it did perform better than the Depressed-Anxious scale.
Anxiety disorders
Anxiety disorders proved difficult to detect with specific subscales of the screening instruments. The MAYSI-2 Somatic Complaints scale was not useful in this regard for any subsample. Indeed, the only case in which a MAYSI-2 scale proved to be moderately useful for detecting anxiety diagnosis was the Depressed-Anxious scale and only for CA boys (AUC = .71). The cutoff scores also performed rather poorly. Fewer than 15% of those scoring above the cutoffs in any subsample (and fewer than 6% in the PA boys subsample) had an anxiety disorder. Again, this suggests that the cutoffs may have been too low. However, the low AUC scores suggest that raising the cutoffs would probably not make the subscales more accurate—any gain in specificity would be offset by a substantial loss in sensitivity.
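The trade-off described above can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen youth with the diagnosis outscores a randomly chosen youth without it (ties counted as one half), and then compares sensitivity and specificity at two cutoffs. The scores, cutoffs, and function names are hypothetical and chosen only for illustration; they are not MAYSI-2 data or manual cutoffs.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(diagnosed score > non-diagnosed score), ties = 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sn_sp_at_cutoff(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when scores at or above `cutoff` are flagged."""
    sn = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    sp = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sn, sp


# Hypothetical subscale scores (not study data).
with_dx = [1, 2, 3, 4]            # youth with the diagnosis
without_dx = [0, 1, 1, 2, 3, 5]   # youth without the diagnosis

print("AUC:", auc(with_dx, without_dx))
for cutoff in (2, 3):
    sn, sp = sn_sp_at_cutoff(with_dx, without_dx, cutoff)
    print(f"cutoff {cutoff}: SN = {sn:.2f}, SP = {sp:.2f}")
```

With these made-up scores the AUC is low, and raising the cutoff from 2 to 3 buys a modest gain in specificity at the price of a larger drop in sensitivity. This is the pattern the low AUCs suggest would occur for the anxiety-related subscales.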
PTSD
The MAYSI-2 demonstrated better convergence with diagnosis of PTSD, at least in the detained subsamples. For PA girls and PA boys respectively, the MAYSI-2 Traumatic Experiences scale evidenced moderate (AUC = .75) and good (AUC = .80) levels of criterion validity. Validity was lower for CA boys (AUC = .68). Interestingly, several other scales proved similarly useful at detecting PTSD among PA boys, including Alcohol/Drug Use (AUC = .80) and Depressed-Anxious (AUC = .78), though wider confidence intervals suggest that these other scales were less reliably related to PTSD than the Traumatic Experiences scale. Though the MAYSI-2 manual does not specify a cutoff for Traumatic Experiences, our ROC analysis indicated that using a cutoff of three for this scale results in a high degree of sensitivity, especially among boys, while providing a reasonable degree of specificity (comparable to the other MAYSI-2 subscales). Therefore, for the present analysis, we used three items as the cutoff for the Traumatic Experiences subscale. For PA girls, PPV was relatively high: Nearly a third of those scoring above the cutoff met criteria for PTSD. However, the PPVs were lower for the male subsamples. Specificity was reasonable (in the .6 range) for the detained samples, but was a bit lower for the CA boys (.43), suggesting that traumatic experiences were less indicative of PTSD in this sample relative to the detained samples. The low PPVs (.31, .11, and .15 for PA girls, PA boys, and CA boys, respectively) indicate that only a small percentage of youth (especially of the boys) who experienced at least three traumatic events met the diagnostic criteria for PTSD. Taken together with the very high sensitivity of this scale for boys, the data suggest that, for boys, a cutoff of three may be too low for this scale.
Not surprisingly, the NPV values were very high across all three samples for the Traumatic Experiences scale: At present, one cannot obtain a diagnosis of PTSD if one has not experienced trauma.
Suicidality
The Suicide Ideation scale proved moderately useful in detecting suicidality for CA boys (AUC = .78) and PA boys (AUC = .77) and highly useful for PA girls (AUC = .89). However, use of the cutoff on this scale resulted in correct classification of only 64% of PA girls, 75% of PA boys, and 31% of CA boys describing serious suicidal ideation in the K-SADS interview. The Angry-Irritable and Depressed-Anxious scales performed more poorly than the Suicide Ideation scale for the PA girls and PA boys, but these scales correctly identified 38% and 54% of CA boys, respectively, when compared with K-SADS suicidal ideation reports.
Psychosis
The MAYSI-2 scale that evidenced the greatest criterion validity was Thought Disturbance, which yielded very high AUCs (.95 for CA boys, .98 for PA boys), in spite of the very low prevalence of the homotypic diagnosis of psychotic disorder. Closer examination indicated that specificity would be improved in these samples—with no loss in sensitivity—by eliminating the dream item, which was endorsed by 31% of CA boys and 35% of PA boys. Furthermore, the dream item was not particularly sensitive; only two of the four youth with a psychotic disorder in the sample endorsed it. The cutoff for the original and revised Thought Disturbance subscales correctly classified every youth with a diagnosis of psychosis. As expected, the revised version had greater specificity than the original (.84 vs. .59 for PA boys and .83 vs. .61 for CA boys).
Correspondence Between YSR Scales and Diagnoses in Incarcerated Boys
On average, the YSR scale AUCs (for CA boys) were similar to those of the MAYSI-2. None of the YSR scales were meant to assess symptoms of trauma and, accordingly, none of its scales were sensitive to PTSD. In contrast, the YSR’s Anxious-Depressed Scale proved just as useful as the MAYSI-2’s Suicide Ideation scale in identifying youth with high suicide risk. Also, the YSR’s Anxious-Depressed and Withdrawn-Depressed scales were slightly more convergent with anxiety diagnosis than the MAYSI-2’s Depressed-Anxious scale. Interestingly, for both the MAYSI-2 and the YSR (but more so for the YSR), the Somatic Complaints subscale showed greater convergence with affective than with anxiety diagnoses.
Testing Two Referral Decision Rules
Our final set of analyses investigated the results of applying two potential decision rules for referral. We wanted these analyses to provide information about the practical impact of using one score in the caution range (a “one-cutoff” rule) versus two scores in the caution range (a “two-cutoff” rule) as the threshold for referring a youth for further mental health assessment. If a youth meeting or exceeding the threshold for a decision rule had a diagnosis of any kind, we considered the decision rule to have worked correctly. If a youth without any diagnosis did not meet the threshold for a decision rule, we considered that to be a correct outcome as well. Conversely, if a youth with any disorder failed to meet the threshold or if a youth without a disorder scored above the threshold for the rule, we considered it to be a failure of the decision rule. We evaluated the decision rules with and without including disruptive behavior disorders in our “any diagnosis” category. Our results (displayed in Table 5) report the hit rate (the percentage of youth correctly classified as either having or not having any diagnosis) along with the PPV, NPV, SN, and SP (see Box 1 for definitions).
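The logic of the two decision rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the scale names, cutoffs, scores, and function names are all hypothetical, and a youth is treated as referred when the number of scales at or above their cutoffs reaches the rule's threshold (1 or 2).

```python
def n_caution_scores(scores, cutoffs):
    """Count the scales on which a youth scores at or above the cutoff."""
    return sum(scores[name] >= cut for name, cut in cutoffs.items())


def evaluate_rule(youth, cutoffs, required):
    """Cross-classify referral (>= `required` caution scores) against any diagnosis."""
    tp = fp = fn = tn = 0
    for scores, has_any_dx in youth:
        referred = n_caution_scores(scores, cutoffs) >= required
        if referred and has_any_dx:
            tp += 1   # correctly referred
        elif referred:
            fp += 1   # referred without a diagnosis
        elif has_any_dx:
            fn += 1   # diagnosis missed by the rule
        else:
            tn += 1   # correctly not referred
    return {
        "hit_rate": (tp + tn) / len(youth),
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
    }


# Hypothetical cutoffs and youth records (not MAYSI-2 values or study data).
cutoffs = {"scale_a": 3, "scale_b": 2, "scale_c": 4}
youth = [
    ({"scale_a": 4, "scale_b": 2, "scale_c": 1}, True),
    ({"scale_a": 3, "scale_b": 0, "scale_c": 0}, True),
    ({"scale_a": 0, "scale_b": 1, "scale_c": 2}, True),
    ({"scale_a": 5, "scale_b": 0, "scale_c": 1}, False),
    ({"scale_a": 1, "scale_b": 1, "scale_c": 1}, False),
    ({"scale_a": 0, "scale_b": 0, "scale_c": 3}, False),
]

one_cutoff = evaluate_rule(youth, cutoffs, required=1)
two_cutoff = evaluate_rule(youth, cutoffs, required=2)
print("one-cutoff:", one_cutoff)
print("two-cutoff:", two_cutoff)
```

In this toy data set the two rules yield the same hit rate, but the two-cutoff rule trades sensitivity for specificity. Rerunning the evaluation with disruptive behavior disorders removed from the "any diagnosis" flag would correspond to the second set of comparisons described above.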
Comparing the One-Cutoff and Two-Cutoff Decision Rules for the MAYSI-2 and YSR.
Note. MAYSI-2 = Massachusetts Youth Screening Instrument, Version 2; YSR = Youth Self-Report; DBD = disruptive behavior disorder; K-SADS = Kiddie-Schedule for Affective Disorders and Schizophrenia; SN = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value. See Box 1 for the definitions of the statistics reported in this table. “Thought dist. Version” refers to whether the standard version of the MAYSI-2 Thought Disturbance scale was used or, alternatively, whether the revised version that omits the dream item was used.
Overall, application of the two-cutoff decision rule (compared to the one-cutoff rule) resulted in lower sensitivity but increased specificity of the MAYSI-2 for the male samples. The gains in specificity were smaller in magnitude than the losses in sensitivity, but not dramatically so. For the detained girls, the two-cutoff rule (compared to the one-cutoff rule) resulted in a 4–5% loss of sensitivity with no corresponding increase in specificity. Therefore, the one-cutoff rule appears to be the better choice for facilities screening detained females.
Excluding disruptive behavioral disorders as an “eligible” diagnosis decreased the PPVs for all subsamples (from the 80% range to the 60% range, roughly), but also improved the NPVs, dramatically so in the case of detained boys. To illustrate what this means in concrete terms, there were 60 incarcerated boys who exceeded the cutoff threshold on one or more MAYSI-2 scales whose only diagnosis was a disruptive behavior disorder. When disruptive behavior disorder was excluded as an eligible diagnosis, these youth moved from a correctly classified category (true positives) into a misclassified category (false positives) under the one-cutoff decision rule, lowering the PPV. With respect to the NPVs, there were 11 incarcerated boys who did not score above the cutoff on any MAYSI-2 scale whose only diagnosis was a disruptive behavioral disorder. When disruptive behavior disorder was excluded as an eligible diagnosis, these youth moved from a misclassified category (misses) into a correctly classified category (true negatives) under the one-cutoff decision rule, thereby raising the NPV. In our sample, a significant minority of those who scored above the threshold under either decision rule were youth whose only diagnosis was a disruptive behavior disorder.
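The reclassification described above can be traced with a short worked sketch. Only the two shifted counts (60 and 11) come from the text; the baseline cell counts are hypothetical round numbers chosen for illustration and are not the study's 2 x 2 table, and the function names are likewise assumptions.

```python
def ppv(tp, fp):
    """Positive predictive value: share of flagged youth with an eligible diagnosis."""
    return tp / (tp + fp)


def npv(tn, fn):
    """Negative predictive value: share of unflagged youth without an eligible diagnosis."""
    return tn / (tn + fn)


def exclude_only_dbd(tp, fp, fn, tn, flagged_only_dbd, unflagged_only_dbd):
    """Move youth whose only diagnosis is a DBD into the 'no eligible diagnosis' cells."""
    return (
        tp - flagged_only_dbd,    # true positives become ...
        fp + flagged_only_dbd,    # ... false positives
        fn - unflagged_only_dbd,  # misses become ...
        tn + unflagged_only_dbd,  # ... true negatives
    )


# Hypothetical baseline cells with DBD counted as an eligible diagnosis.
tp, fp, fn, tn = 200, 50, 40, 60

# Counts reported in the text for incarcerated boys under the one-cutoff rule.
tp2, fp2, fn2, tn2 = exclude_only_dbd(tp, fp, fn, tn,
                                      flagged_only_dbd=60,
                                      unflagged_only_dbd=11)

print(f"PPV: {ppv(tp, fp):.2f} -> {ppv(tp2, fp2):.2f}")
print(f"NPV: {npv(tn, fn):.2f} -> {npv(tn2, fn2):.2f}")
```

Because the number of youth above and below the threshold does not change, only the numerators move: the PPV falls and the NPV rises, in the same direction as the results reported in Table 5.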
Finally, for the CA boys, it is clear that the MAYSI-2 was at least as successful as the YSR at distinguishing between youth with and without any diagnosis. Excluding disruptive behavior diagnoses, the MAYSI-2 correctly classified 58% and 56% of youth under the one- and two-cutoff rules respectively, whereas the YSR correctly classified 60% and 51%, respectively. Compared with the YSR, the MAYSI-2 had higher sensitivity, but poorer specificity under both decision rules.
Discussion
As part of its mandate to rehabilitate youth, the juvenile justice system must be able to identify youth in need of mental health intervention. Mental health screening devices can assist in this pursuit insofar as they correctly identify youth in need of services and minimize unnecessary referrals for those without serious mental health concerns. Results from the present analysis, consistent with those of prior research (e.g., Wasserman et al., 2003), indicate that the MAYSI-2 performs this function moderately well. Moreover, among incarcerated adolescent boys, the MAYSI-2 appears to perform at least as well as the more widely used and more expensive YSR. When the two screening devices were evaluated with respect to decision rules likely to be used in juvenile justice settings to make mental health assessment referrals, the MAYSI-2 actually performed slightly better than the YSR, correctly classifying 71% as opposed to 68% of youth when a single caution score was the threshold. Under this decision rule, the YSR evidenced a reasonable balance between sensitivity and specificity, whereas the MAYSI-2 was highly sensitive, but not very specific. Under the rule whereby two caution-range scores triggered a referral, the YSR’s sensitivity dropped to an unacceptably low level, whereas the MAYSI-2 exhibited a reasonable balance between sensitivity and specificity.
There can be little doubt that the YSR is a useful mental health screening instrument. However, with regard to its use in juvenile justice settings, it has several drawbacks, including its greater length and substantial cost. The MAYSI-2 was designed to accommodate the restricted resources of juvenile justice facilities. It can, with minimal training, be administered by any staff member and is automatically scored, whereas the YSR must be administered and scored by a staff member with advanced training (e.g., a master’s degree in social work). The MAYSI-2 is inexpensive. Facilities pay an initial fee for software and training, but, unlike the YSR, the cost does not increase as a function of the number of times it is administered. Given the equivalent effectiveness of these two devices, the MAYSI-2’s operational advantages appear to make it a better choice for resource-limited juvenile justice facilities.
The present study was the first to examine the MAYSI-2’s performance at different stages of juvenile justice processing. The patterns across these levels were not consistent. For most diagnostic clusters, the MAYSI-2 scales were not substantially more sensitive or specific to related disorders for detained than for incarcerated boys. Two exceptions were for the detection of PTSD and suicidality. For these clusters, the relevant MAYSI-2 scales were more sensitive in the detained than in the incarcerated samples. For PTSD, the relevant MAYSI-2 scale was also more specific in the detained than in the incarcerated sample. The low sensitivity of the MAYSI-2 Suicide Ideation scale to suicidality in the incarcerated sample, in the context of comparable AUCs for incarcerated (.78) and detained (.77) boys, may indicate that the cutoff for Suicide Ideation (2) was too high for the sample of incarcerated boys.
The analyses examining decision rules for referral also revealed some differences across samples. For detained boys (but not for other groups), sensitivity was slightly improved and specificity substantially improved (though still not high) under the one-cutoff decision rule when disruptive behavior disorders were not considered. This outcome suggests that—consistent with the ROC findings—the MAYSI-2 caution cutoffs are not useful for discriminating between detained boys with and without disruptive behavior disorders. In the present sample, over a fifth of the detained boys (22%) had diagnoses only of disruptive behavior disorders. As mentioned in the introduction, the juvenile justice system operates on the premise that youth who come into contact with it are likely to have serious behavioral problems. So the MAYSI-2’s lack of criterion-related validity with respect to this class of diagnoses is not particularly problematic.
A limitation of the present study is that, because of the multiple differences between youth in the detained and incarcerated male samples, we cannot be sure whether the differences in MAYSI-2 performance were attributable to the difference in sex, stage of processing, or other factors. One salient difference between incarcerated and detained youth is that the former have typically been in custody longer. This could explain why the MAYSI-2 Alcohol/Drug Use scale had poorer correspondence with substance use disorders in the incarcerated than in the detained samples. To score high on this scale, one needs to have access to alcohol and/or drugs, which are not as available when youth are in custody. Differences in base-rates of the disorders and in the sampling methodologies could also explain at least some of the differences across the detained and incarcerated samples. The fact that the detained subsamples were stratified based on MAYSI-2 scores whereas the incarcerated sample was not could also have contributed to apparent differences in the sensitivity and specificity of the instrument across these groups. (Indeed, the rates of disorder in our detained subsample differed fairly substantially from the rates observed in Teplin et al. (2002), who screened a random sample of detained youth in Washington state using the DISC. Specifically, our detained subsamples exhibited lower rates of affective and anxiety disorders, higher rates of disruptive behavioral disorders, and comparable rates of substance use disorders compared with the Washington state samples. But, here it is hard to know whether the differential rates of disorder reflect true differences in the rates of disorder, differences due to sampling methodology, or differences due to the use of a different diagnostic instrument.) 
In the present study, all diagnostic interviews were completed by trained graduate students instead of clinicians; however, oversight for these trainings was completed by the same person across all sites, which should minimize concerns about inconsistencies. We therefore conclude tentatively that our results provide initial evidence that a version of the MAYSI-2 tailored for use with populations that have been in custody longer could be useful for juvenile justice facilities.
Our analysis also identified some sex differences in the MAYSI-2’s performance. For affective disorders and suicide risk, the relevant MAYSI-2 scales tended to be more sensitive among the detained girls than among the male samples. For substance use disorders, the relevant MAYSI-2 scale had poor sensitivity (.53 for PA girls, .43 for PA boys, and .61 for CA boys). Taken together with the observation that the AUCs for the scale were reasonably high, at least for detained youth (.75 for PA girls, .77 for PA boys, and .67 for CA boys), this suggests that the caution cutoff may be too high. With respect to the MAYSI-2’s overall ability to correctly classify youth with and without mental health diagnoses, the instrument performed slightly better for detained girls than for detained boys under both the one-cutoff and two-cutoff decision rules, especially when disruptive behavior disorder diagnoses were considered. Under these conditions, the advantage for girls was evident for both specificity and sensitivity.
The MAYSI-2 is clearly a useful tool; however, further modification may improve its clinical utility. One possible improvement would be to remove the “dream” item from the Thought Disturbance subscale. Our analysis suggests that doing so would improve the MAYSI-2’s specificity. Fewer youth without thought disturbance disorders would score above the cutoff on the Thought Disturbance scale, thereby preserving limited mental health care resources. Also, as noted earlier, there appears to be reason to consider lowering the caution cutoff for the MAYSI-2 Alcohol/Drug Use scale. The cutoff score on the MAYSI-2 Suicide Ideation scale also warrants attention. In the detection of suicide risk, sensitivity is paramount. Our findings may indicate that the scale’s cutoff was too high—particularly for the incarcerated boys—leading to poor sensitivity. Alternatively, the Suicide Ideation scale items may just not work as well in an incarceration context. Therefore, juvenile justice facilities should (continue to) employ instruments designed specifically to detect suicide risk as part of their screening protocols. Of course, ours is just one study—replication of our findings would be prudent before making changes to the MAYSI-2.
In their current state, the MAYSI-2 subscales (and the YSR subscales) are only moderately sensitive and specific to their theoretically related diagnostic categories. This finding should serve both as guidance for those researchers hoping to improve the MAYSI-2 and as a reminder to practitioners (and researchers) not to treat high or low scores on particular subscales as diagnostic of particular disorders. Importantly, when used as intended, as a general triage tool, the MAYSI-2 performs better. The YSR also functions well in this capacity, but only when a single above-the-cutoff score is used to trigger referral. Still, it is important for juvenile justice practitioners to bear in mind that, even when used correctly, the MAYSI-2 is not perfect. Many youth who fail to meet the threshold for referral under the one- and two-cutoff rules may have serious mental health problems. Further, this may be truer for detained boys than for detained girls. For this reason, we recommend that facilities monitor youth who score high on one or more MAYSI-2 subscales, even if they do not reach the threshold for referral. Facilities should also, of course, be open to referring youth who exhibit behaviors consistent with mental illness for further assessment, even if they do not self-report problems on mental health screening instruments.
In conclusion, the MAYSI-2 and YSR provide helpful guidance for juvenile justice facilities coping with high rates of incoming youth and limited capacity for mental health assessment. Given the MAYSI-2’s comparability to the YSR and superior suitability for use in juvenile justice settings, the MAYSI-2 should continue to be widely implemented, even as behavioral scientists work to boost its effectiveness.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
