Abstract
The stability of psychopathic personality disturbance (PPD) has important theoretical implications for developmental criminology and for assertions from the population heterogeneity perspective that psychopathy is a key measure of criminal propensity. Data from the Pathways to Desistance Study (n = 1,354) were used to examine short-, moderate-, and long-term reliable change in symptoms of PPD measured via the Youth Psychopathic Traits Inventory (YPI). Youth scoring highest on the YPI at the baseline assessment were most likely to experience reliable decreases in test scores. Binomial regression analyses showed that a reliable decrease in YPI test score was associated with decreased odds of endorsing additional offenses. Findings contrasted with the adolescent “fledgling” psychopathy perspective and indicated that individuals scoring high on the YPI are the group most likely to experience reliable decreases in test scores, especially over a longer follow-up period.
Introduction
There is a long tradition of clinical research on what is now referred to as psychopathy (e.g., Cleckley, 1941; Karpman, 1948; Pinel, 1801). Through an evolution in the approach to measuring this construct, associated instruments have become central to psychiatric and correctional settings and to the assessment of risk for reoffending (e.g., Hare, 1996; Hart, 1998; Salekin, Rogers, & Sewell, 1996). Two fundamental assumptions guide scholars’ interest in this construct. The first assumption suggests a strong connection between psychopathy and criminal behavior, which explains its relevance for risk prediction purposes and the assessment of dangerousness in psychiatric and correctional settings. This connection is such that common measurement tools like the Psychopathy Checklist–Revised (PCL-R; Hare, 2003) directly refer to repetitive and diversified criminal behavior as symptoms of psychopathy. Although the importance of criminal behavior in the measurement of psychopathy is debated (e.g., Skeem & Cooke, 2010), its relevance to the prediction of offending is not (Hart, 1998), with the construct showing incremental predictive validity above and beyond other key theoretical constructs such as low self-control (DeLisi et al., 2014; Flexon & Meldrum, 2013).
The second assumption posits that psychopathy is defined by stable traits that manifest in a distinctive pattern of interpersonal, affective, and behavioral deficits. This assumption is closely aligned with descriptions of criminal propensity that emerged from population heterogeneity perspectives (see Nagin & Paternoster, 2000). Although personality traits in general are expected to remain relatively stable over time, personality disorders such as psychopathy differ in that this combination of personality traits is rigid, inflexible, and pervasive, with pathological themes repeating over the life course (e.g., Millon & Davis, 2000). Indeed, although a high test score on a measure of psychopathy during one period of adolescence may be evidence that the disorder is present, the adolescent is only truly characterized by the disorder if symptoms remain stable over the life course.
The trait features of psychopathy suggest that its symptoms do not arise de novo in adulthood, and therefore the second assumption is that adults presenting with the disorder have presented this way across the life course (for a discussion, see Seagrave & Grisso, 2002). However, this assumption is still subject to substantial debate with respect to symptom stability among adolescents showing signs of psychopathic personality disturbance (PPD; Edens & Vincent, 2008). For adolescents, the term PPD is used in place of psychopathy because personality is not fully formed at this developmental stage. Measures of PPD have been developed, tested, and validated for use during childhood and adolescent developmental stages (e.g., Andershed, Kerr, Stattin, & Levander, 2002; Forth, Kosson, & Hare, 2003), and can be used to assess the assumption of symptom stability. The few studies that have used repeated measures of PPD across different age stages have led to questions about its development and the static aspect of PPD over the life course (e.g., Cauffman, Skeem, Dmitrieva, & Cavanagh, 2016; Hawes, Mulvey, Schubert, & Pardini, 2014). The current study expanded on earlier work by utilizing a person-oriented analytic strategy to evaluate individual-level change in symptoms of PPD over multiple measurement periods, including whether this change occurred across different levels of the disorder and whether this change impacted offending outcomes.
A Developmental Criminology Perspective on the Stability of Psychopathy
Descriptions of the nature of psychopathy as a stable construct defined by maladaptive personality traits coincide with developmental criminology principles (see Farrington, 2005), especially population heterogeneity perspectives on criminal propensity. As evidence that psychopathy is associated with a propensity for criminal behavior, a small number of longitudinal studies indicated that high scores on measures of PPD in adolescence predicted association with an offending trajectory characterized by a high rate of general, serious, and violent offending through emerging adulthood (e.g., Dyck, Campbell, Schmidt, & Wershler, 2013; McCuish, Corrado, Hart, & DeLisi, 2015). However, repeated measures of PPD were not utilized in these studies, which is inconsistent with principles of developmental criminology (e.g., Loeber & Le Blanc, 1990) and does not allow for verification of its status as a time-invariant construct, which is a defining feature of criminal propensity (e.g., Nagin & Paternoster, 2000). Thus, although these studies helped illustrate a link between PPD and long-term offending, they did not clarify whether this relationship was due to the stability of the construct or due to state-dependence effects in which symptoms of PPD in adolescence may dissipate, yet have a lasting negative impact on the individual’s access to sources of informal social control, which in turn influence continued offending.
Studies on the stability of PPD and analogous traits between childhood and adolescence and between adolescence and adulthood indicate that the construct is only moderately stable over this period (e.g., Baskin-Sommers, Waller, Fish, & Hyde, 2015; Hawes et al., 2015; Neumann, Wampler, Taylor, Blonigen, & Iacono, 2011; Obradović, Pardini, Long, & Loeber, 2007). When looking at the highest levels of the construct, the evidence for stability is weaker (e.g., Cauffman et al., 2016; Hawes et al., 2014). However, there are two important methodological concerns to note regarding this existing research. First, for those studies examining general stability, important ecological fallacy concerns remained unaddressed regarding whether degree of stability or change was equal across subgroups defined by, for example, low, medium, and high levels of the trait. Second, those studies that addressed ecological fallacy concerns (e.g., Cauffman et al., 2016; Hawes et al., 2014) relied on classification schemes in which change could be defined by very minor differences in PPD test scores. For example, using data from the Pathways to Desistance Study, Hawes et al. (2014) observed that only approximately 15% of participants scoring 50 or greater on the Youth Psychopathic Traits Inventory–Short Version (YPI-S; van Baardewijk et al., 2010) at baseline assessment were the same youth who scored 50 or greater at the final assessment approximately 6.5 years later. However, because most youth scoring greater than 50 will cluster at the lower bound of this threshold, small changes in scoring (e.g., measurement error, regression to the mean) could result in a participant falling below the cutoff point at the final assessment. In other words, stability may have been underestimated, because although a young person may miss the cutoff for a high score, they may still present with a substantial inclination toward PPD. 
As such, addressing this second methodological concern requires an analytic strategy that evaluates whether the change observed can be considered reliable as opposed to the result of measurement error or regression to the mean.
One analytic strategy that can address the abovementioned methodological need is the Reliable Change Index (RCI; Jacobson & Truax, 1991). Whereas correlations, t tests, and rank-order comparisons evaluate stability at the group level, the RCI is more in line with developmental criminology and person-oriented analyses of within-individual change (Bergman & Magnusson, 1997) as it evaluates whether an individual’s score on an instrument at one time-period was reliably different from their score at a later period. This involves calculating the difference between an individual’s score at two measurement periods and dividing by the standard error of measurement, the latter of which is calculated by considering both the standard deviation and the internal consistency of test scores. In effect, change is defined as reliable because the RCI accounts for measurement error in evaluating whether an individual illustrated a reliable increase, reliable decrease, or no reliable change across two measurement periods. Thus far, studies using the RCI in the context of research on PPD have shown that the construct is relatively stable (e.g., Forsman, Lichtenstein, Andershed, & Larsson, 2008; Hemphälä, Kosson, Westerman, & Hodgins, 2015). However, because average PPD test scores in these studies were relatively low, a conclusion that rivals assertions that PPD is stable over time is that the observed stability in these studies was primarily due to the presence of low test scores that remained low over the study period. Thus, there is a need for research that examines reliable change across different levels of test scores. Just as there is less interest in whether symptoms of the flu remain stable among individuals who are not ill, when examining the stability of symptoms of PPD it makes less sense to aggregate into the same group both low and high scoring youth. 
If a youth with a high test score on a measure of PPD experienced a reliable decrease in test score that corresponded with lower levels of offending, important implications can be drawn for both desistance theory and the utility of treatment for this group.
In sum, research is needed that simultaneously examines (a) within-individual change in symptoms of PPD, (b) whether change occurs at different levels of the latent trait, (c) whether this change is reliable, and (d) whether reliable change or stability has implications for participation in criminal behavior. These research needs were addressed in the current study to evaluate conceptual concerns about the potential instability of symptoms of PPD among adolescents (e.g., Edens & Vincent, 2008), what this means for explanations of offending, and how this may affect theoretical descriptions of psychopathy as a measure of criminal propensity (e.g., DeLisi, 2016).
Method
Participants
Data were derived from the multisite Pathways to Desistance Study (see Schubert et al., 2010, for additional details), which involved interviews with adolescent males (n = 1,170) and females (n = 184), ranging in age from 14 to 17 at the time of recruitment (112 participants were ages 18-19 at the time of data collection). There was a similar proportion of Black (41.4%), Hispanic (33.5%), and White (20.2%) participants (the remaining 4.8% were identified as “Other”). Youth were included only if they were convicted of a crime, typically a felony crime. Approximately half of the sample had a history of felony violent crimes. Youth in conflict with the law due to drug-related crimes were restricted to a total of 15% of the male subsample. Participants completed a baseline interview and then completed follow-up interviews at 6-month intervals for a 3-year period and then at 1-year intervals for the next 4 years. Youth were an average of 16.04 years old (SD = 1.14) at baseline and an average of 23.03 years old (SD = 1.15) by the time of their final interview (n = 1,134). Archived data from the study are available at the Inter-university Consortium for Political and Social Research at the University of Michigan (Mulvey, 2013).
Procedure
Of the eligible youth (n = 2,008), 67% agreed to participate (n = 1,354). Participants completed a baseline interview within 75 days if they were adjudicated under the youth justice system or within 90 days if they were designated to the adult justice system. Computer-assisted interviews were performed in the homes of participants as well as in public and correctional facilities. Participants were informed that information provided in the interview would be held confidential except in instances of suspected child abuse or imminent danger to others. Participants were remunerated for each interview. Approximately two thirds of the sample participated in all 10 follow-up interviews and only 17 participants were not interviewed after the first assessment. The principal focus within the current study was on the stability of PPD, as measured by the Youth Psychopathic Traits Inventory (YPI; Andershed et al., 2002). The YPI is the full, 50-item version of the YPI-S used by Hawes et al. (2014). This YPI was administered at the 6-month follow-up interview (hereinafter referred to as T1) and again at each subsequent interview (T2 = 12-month follow-up, T3 = 18-month follow-up, T4 = 24-month follow-up, T5 = 30-month follow-up, T6 = 36-month follow-up, T7 = 48-month follow-up, T8 = 60-month follow-up, T9 = 72-month follow-up, and T10 = 84-month follow-up). Excluding those who never completed the YPI (n = 19), the average participant completed this assessment on 8.97 (SD = 1.70) out of 10 possible occasions.
Measures
Andershed et al. (2002) developed the YPI as a 50-item, 4-point (1-4) Likert-type scale self-report measure of symptoms of PPD in adolescence. The YPI was intended to reflect Cooke and Michie’s (2001) three-factor model by excluding items that referenced involvement in criminal behavior and by including items reflecting interpersonal, affective, and lifestyle deficits. These features are captured by 10 subscales that are mapped onto three dimensions: Grandiose-Manipulative, Callous-Unemotional, and Impulsive-Irresponsible. These subscales and dimensions were not examined in this study but are discussed elsewhere (e.g., Andershed et al., 2002). The YPI developmental sample consisted of a group of adolescents from the community, which differs from the more delinquent sample examined in the current study. Nevertheless, other studies of delinquent samples have shown that the YPI suitably emphasizes interpersonal and affective symptoms, is reliable, and informative of offending outcomes (e.g., Skeem & Cauffman, 2003). The YPI had a high degree of reliability in the current study, with Cronbach’s alpha values ranging from .93 to .94 across all 10 assessments. Walters (2015) illustrated the predictive validity of the YPI using the same sample. A description of the distribution of YPI scores across each assessment period is included in Table 1 and shows that average test scores decreased over time. Average test scores were slightly lower in this sample (approximately 100 across the different waves) compared with Skeem and Cauffman’s (2003) sample of incarcerated male youth (approximately 120) but were slightly higher compared with average test scores (approximately 90) within a high-risk sample of boys from the Pittsburgh Youth Study (Neumann & Pardini, 2014).
Table 1. Description of the Distribution of Key Measures Across Each Interview Period
Note. YPI = Youth Psychopathic Traits Inventory; SRO = Self-Reported Offending.
Low scores = >1 SD below the mean; low–medium scores = between ≤1 SD below the mean and the mean; medium–high scores = greater than the mean and ≤1 SD above the mean; high scores = >1 SD above the mean.
Two categorical representations of the YPI were created to examine whether the prevalence of change or stability varied according to different levels of YPI test scores. First, YPI total scores were transformed into z scores to create four test score categories: low (>1 SD below the mean), low–medium (between ≤1 SD below the mean and the mean), medium–high (> the mean and ≤1 SD above the mean), and high (>1 SD above the mean). Second, YPI scores were dichotomized at each assessment period according to whether a participant’s test score fell into the 90th percentile or above. Across the 10 assessments, the minimum YPI score necessary to reach the 90th percentile ranged from 125 to 139.
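As an illustration only, the two categorical schemes described above could be sketched in Python as follows. The scores are hypothetical, and two details are assumptions not stated in the text: the use of the sample standard deviation within each wave and a nearest-rank rule for locating the 90th percentile cutoff.

```python
from statistics import mean, stdev

def ypi_category(z):
    """Map a z score onto the four test score categories."""
    if z < -1:
        return "low"            # more than 1 SD below the mean
    if z <= 0:
        return "low-medium"     # within 1 SD below the mean, up to the mean
    if z <= 1:
        return "medium-high"    # above the mean, up to 1 SD above it
    return "high"               # more than 1 SD above the mean

def categorize_wave(scores, percentile=90):
    """Return (categories, cutoff, top_flags) for one assessment period."""
    wave_mean, wave_sd = mean(scores), stdev(scores)
    categories = [ypi_category((s - wave_mean) / wave_sd) for s in scores]
    ranked = sorted(scores)
    # Nearest-rank percentile (assumed rule): integer ceiling of
    # percentile * n / 100, computed without floating-point rounding.
    rank = (percentile * len(ranked) + 99) // 100
    cutoff = ranked[rank - 1]
    top_flags = [s >= cutoff for s in scores]
    return categories, cutoff, top_flags

# Hypothetical YPI total scores for ten participants at one wave.
scores = [80, 90, 100, 110, 120, 95, 105, 100, 130, 70]
categories, cutoff, top_flags = categorize_wave(scores)
for score, category, is_top in zip(scores, categories, top_flags):
    print(score, category, "90th percentile or above" if is_top else "")
```

Because both schemes are computed within each assessment period, the same raw score can fall into different categories at different waves.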
Offending was measured using the Self-Reported Offending (SRO; Huizinga, Esbensen, & Weiher, 1991), which was administered at each assessment. Two items (“ever went joyriding” and “ever broke into a car to steal”) were not consistently administered throughout the study and so a modified version of the SRO was used to create an offending versatility scale based on a count of the number of self-reported offending behaviors (k = 22) engaged in between assessments. Examining offending versatility helped capture an offending pattern that was not overweighted by an individual’s frequent involvement in minor crimes (see Sweeten, Pyrooz, & Piquero, 2013). The average level of offending versatility across each follow-up period is shown in Table 1.
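The distinction between versatility and frequency can be illustrated with a brief Python sketch. The responses are hypothetical; the only detail taken from the text is that the scale counts how many of the 22 retained behaviors were endorsed between assessments, regardless of how often each was committed.

```python
N_ITEMS = 22  # behaviors retained in the modified SRO

def offending_versatility(endorsed):
    """Count the distinct offense types endorsed between two assessments."""
    if len(endorsed) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(endorsed)}")
    return sum(bool(item) for item in endorsed)

def versatility_from_frequencies(frequencies):
    """Same count, starting from per-item frequencies instead of yes/no."""
    return offending_versatility([f > 0 for f in frequencies])

# Hypothetical respondent A: three different offense types endorsed.
respondent_a = [True, False, True] + [False] * 18 + [True]

# Hypothetical respondent B: one minor offense type committed 40 times.
respondent_b = [40] + [0] * 21

print(offending_versatility(respondent_a))
print(versatility_from_frequencies(respondent_b))
```

Respondent B has the higher raw frequency but the lower versatility score, which is the property the text attributes to the measure.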
Analytic Strategy
Stability or change in YPI test scores was measured using the RCI. The RCI accounts for measurement error by determining whether an individual’s level of change between two assessments was more than expected based on chance alone. Thus, the magnitude of the decrease or increase in score that is required to demonstrate reliable change would likely be greater than changes consistent with measurement error or regression to the mean. As opposed to measuring aggregate stability or change within a sample, the RCI is a person-oriented measure and useful for evaluating within-individual change. Jacobson and Truax’s (1991) formula for calculating reliable change was used. This formula proceeds in three steps:
$$SE = s_x \sqrt{1 - r_{xx}}$$

where $s_x$ was defined by the standard deviation of YPI total scores at the earlier assessment and $r_{xx}$ was defined by the internal consistency (Cronbach’s alpha values) of the YPI at the earlier of the two assessments.

$$S_{diff} = \sqrt{2(SE)^2}$$

where $S_{diff}$ is the standard error of the difference between the two scores.

$$RCI = \frac{X_2 - X_1}{S_{diff}}$$

where $X_1$ is the YPI total score at the earlier measurement period, and $X_2$ is the YPI total score at the later measurement period.
The RCI formula produces a raw score for each participant. Scores of −1.96 or further from zero indicate a reliable decrease in YPI test scores between the two waves examined. Scores of +1.96 or greater indicate a reliable increase in YPI test scores. Scores between ±1.96 indicate that a reliable change has not occurred. The RCI favors being conservative in pinpointing what constitutes a reliable change. Thus, the magnitude of the decrease or increase in score that is required to demonstrate reliable change would likely be greater than changes in scores consistent with regression to the mean.
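The three steps and the ±1.96 decision rule can be sketched in Python as below. This is a minimal illustration of Jacobson and Truax’s (1991) calculation rather than the code used in the study. The standard deviation of 24 is a hypothetical value; the alpha of .93 falls within the range reported for the YPI in this sample.

```python
from math import sqrt

def standard_error_of_difference(sd_earlier, alpha_earlier):
    """Steps 1 and 2: SE of measurement, then SE of the difference."""
    se = sd_earlier * sqrt(1 - alpha_earlier)
    return sqrt(2 * se ** 2)

def reliable_change_index(x1, x2, sd_earlier, alpha_earlier):
    """Step 3: later score minus earlier score, divided by S_diff."""
    return (x2 - x1) / standard_error_of_difference(sd_earlier, alpha_earlier)

def classify_change(rci, threshold=1.96):
    """Apply the decision rule to a raw RCI value."""
    if rci <= -threshold:
        return "reliable decrease"
    if rci >= threshold:
        return "reliable increase"
    return "no reliable change"

def minimum_reliable_change(sd_earlier, alpha_earlier, threshold=1.96):
    """Smallest raw score difference that reaches the threshold."""
    return threshold * standard_error_of_difference(sd_earlier, alpha_earlier)

SD_EARLIER = 24.0     # hypothetical SD of YPI total scores
ALPHA_EARLIER = 0.93  # within the reported range of .93 to .94

for later_score in (100, 110, 140):
    rci = reliable_change_index(120, later_score, SD_EARLIER, ALPHA_EARLIER)
    print(later_score, round(rci, 2), classify_change(rci))
print(round(minimum_reliable_change(SD_EARLIER, ALPHA_EARLIER), 1))
```

Under these assumed inputs, a participant scoring 120 at the earlier wave would need to move by roughly 17 to 18 points for the change to count as reliable, so a 10-point drop is treated as no reliable change.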
Using the RCI, change or stability in YPI test scores was evaluated in two ways. First, RCI values were calculated between each contiguous measurement period (e.g., T1-T2, T2-T3, T3-T4, etc.), hereinafter referred to as wave-to-wave comparisons. Contiguous assessment periods were examined to provide a more stringent test of the degree of change in PPD symptoms than examining, for example, the degree of change between T1 and T10 (i.e., change across shorter intervals provided stronger evidence of the malleability of PPD). Multiple short-term changes were examined to assess whether change was more or less likely at later assessment periods where participants were older in age (e.g., was change as likely between T9 and T10 as it was between T1 and T2?). 1 Second, to evaluate the prevalence of reliable change across short-, moderate-, and long-term follow-up periods, the T1 baseline assessment was compared with each follow-up period (e.g., T1-T2, T1-T3, T1-T4, etc.), hereinafter referred to as baseline-to-follow-up comparisons. This analysis of change was meant to capture the perspective of practitioners at intake assessment and to provide an indication of the degree to which a client may change, or not change, over varying lengths of time. To investigate ecological fallacy concerns regarding whether change or stability was confined to specific levels of the latent trait, comparisons were made between RCI categories (decrease, increase, or no change in YPI total scores) and each category’s mean YPI total score, categorical score (low, low–medium, medium–high, and high), and whether the score was in the 90th percentile or greater.
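To make the two comparison schemes concrete, the following Python sketch enumerates the pairs of assessments each one implies. The function names are hypothetical and used only for illustration.

```python
def wave_to_wave_pairs(n_waves=10):
    """Contiguous comparisons: T1-T2, T2-T3, ..., T9-T10."""
    return [(t, t + 1) for t in range(1, n_waves)]

def baseline_to_follow_up_pairs(n_waves=10):
    """Comparisons anchored at T1: T1-T2, T1-T3, ..., T1-T10."""
    return [(1, t) for t in range(2, n_waves + 1)]

def label(pair):
    """Format a pair of wave numbers in the T1-T2 style used in the text."""
    return f"T{pair[0]}-T{pair[1]}"

print([label(p) for p in wave_to_wave_pairs()])
print([label(p) for p in baseline_to_follow_up_pairs()])
```

Each scheme yields nine comparisons, and T1-T2 is the only comparison common to both.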
The last phase of the analytic strategy involved a series of binomial regression analyses to examine the effect of reliable change on the extent of an individual’s offending versatility. Binomial regression analyses were used because the versatility measure was bounded (22 different offending outcomes), making negative binomial regression inappropriate (see Britt, Rocque, & Zimmerman, 2017). The Generalized Linear Model (GLM) command in Stata 14.2 was used with the binomial family extension, and a logit link function was specified. This analysis allows for the specification of the number of “trials” to be performed (i.e., the number of possible indications that a particular behavior was endorsed). Like logistic regression, binomial regression coefficients can be exponentiated to give an odds ratio, which in this case represents the likelihood of a one-unit increase in the number of different crime types endorsed (Britt et al., 2017). The proportion of time spent in the community during the period in which offending was evaluated was used as an exposure variable.
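The Python sketch below illustrates how a logit-link binomial model with 22 trials relates a coefficient to an odds ratio and to an expected versatility score. It does not reproduce the Stata models: the coefficients are hypothetical, the exposure variable and all covariates are omitted, and reading the odds ratio as per-item odds of endorsement is an assumption made for illustration.

```python
from math import exp

TRIALS = 22  # offense types in the modified SRO, i.e., the binomial "trials"

def inv_logit(eta):
    """Convert a linear predictor to a per-item endorsement probability."""
    return 1.0 / (1.0 + exp(-eta))

def odds(p):
    """Odds corresponding to a probability."""
    return p / (1.0 - p)

def expected_versatility(eta, trials=TRIALS):
    """Expected number of offense types endorsed out of `trials`."""
    return trials * inv_logit(eta)

# Hypothetical coefficients on the logit scale (not estimates from the study).
INTERCEPT = -2.0
B_RELIABLE_DECREASE = -0.4

eta_no_change = INTERCEPT
eta_decrease = INTERCEPT + B_RELIABLE_DECREASE

# Exponentiating the coefficient gives the odds ratio for a reliable decrease.
odds_ratio = exp(B_RELIABLE_DECREASE)

print(round(odds_ratio, 3))
print(round(expected_versatility(eta_no_change), 2))
print(round(expected_versatility(eta_decrease), 2))
```

Because the expected count is bounded at 22 by construction, the model respects the ceiling of the versatility scale, which is the stated reason for preferring it over negative binomial regression.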
For both wave-to-wave comparisons and baseline-to-follow-up comparisons, eight binomial regression analyses were performed to examine the relationship between reliable change or stability and offending outcomes measured at each of T3-T10. All models controlled for age at the time the outcome variable was measured, ethnicity, gender, prior offending versatility, and prior YPI test score. For wave-to-wave comparisons, offending versatility was controlled for using the SRO measure assessed at the wave immediately prior to the outcome measure. YPI test score was controlled for using the score measured at the earliest of the two assessments used to examine reliable change (e.g., when reliable change was measured at T2-T3 to predict offending versatility at T4, the analysis also controlled for YPI total score at T2). For baseline-to-follow-up comparisons, offending versatility and YPI test score at T1 were controlled for in all analyses. For both the wave-to-wave models and the baseline-to-follow-up models, of specific interest was whether a reliable decrease in test score decreased the odds of higher levels of offending versatility. A second question of interest was whether the effect of a reliable decrease in YPI test score on offending versatility varied according to different levels of the latent trait. For example, would reliable decreases in YPI test scores fail to result in lower levels of offending versatility for those with the highest scores? This research question was addressed using a series of moderation analyses. By plotting the effect of mean-centered YPI total scores on offending versatility across the three RCI categories (reliable decrease, reliable increase, or no change), we examined whether those who experienced a reliable decrease averaged lower levels of offending versatility even at the highest level of YPI total scores during the initial assessment. For all models, multicollinearity was not an issue (all r < .500).
Multiple imputation was not performed because such procedures are not available for the RCI (Barnes et al., 2017). Thus, analyses are of complete cases only. Fortunately, over 75% of the sample was assessed on the YPI on at least nine occasions and 55% of the sample was assessed on the YPI at all 10 measurement periods.
Results
With respect to questions regarding the prevalence of change, as indicated in Table 2, wave-to-wave changes in YPI total scores were common for sample members. Across the nine comparison periods, the prevalence of reliable change ranged from 32.4% to 37.0%. Reliable decreases in test scores were more common than reliable increases. Counting all comparisons, participants averaged 1.24 (SD = 0.99) reliable increases in test score (range = 0-5) and 1.46 (SD = 1.05) reliable decreases in test score (range = 0-5). Across all comparisons, the minimum change in test score required to produce a reliable increase or decrease ranged from 15 to 18 points. Assuming an individual’s likelihood of change was independent of their total YPI score (an assumption tested below in Table 3), given that the average YPI test score between T1 and T10 was 102.85, the 15- to 18-point increase or decrease needed to show reliable change amounted to an approximate change of 14.5% to 17.5% from their earlier score. Although average test scores tended to decrease with age (see Table 1), it did not appear that these average decreases were particularly meaningful given that the prevalence of change between T1-T2 and T9-T10 remained relatively similar. An important question was whether participants experiencing a reliable decrease in symptoms could be meaningfully differentiated from those who were experiencing nominal decreases as part of normative personality changes that take place during the transition from adolescence to emerging adulthood (e.g., Blonigen, 2010). In other words, was an individual’s reliable decrease in test score substantially different from the mean-level decreases in test scores for the sample as a whole? To address this question, YPI test scores were rank-ordered at each assessment period. The difference in rank across the wave-to-wave comparison was then determined for the different categories of change. 
As shown in Table 2, at any given assessment of reliable change, a participant experiencing a reliable decrease had a YPI test score that was ranked approximately 400 places lower compared with their score at the earlier time-period. Participants experiencing a reliable change experienced both within-individual and between-person change. 2
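The rank-order comparison can be sketched in Python as follows. The scores are hypothetical, and the handling of tied scores (assigning the average rank) is an assumption, as the text does not describe how ties were treated. Rank 1 is the lowest score, so a negative change means the participant moved down the ordering.

```python
def average_ranks(scores):
    """Rank scores from lowest (1) to highest; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted score is tied with this one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def rank_change(earlier, later):
    """Later rank minus earlier rank for each participant."""
    return [r2 - r1 for r1, r2 in zip(average_ranks(earlier), average_ranks(later))]

# Hypothetical YPI total scores for four participants at two waves.
earlier_wave = [130, 100, 90, 110]
later_wave = [95, 100, 92, 112]
print(rank_change(earlier_wave, later_wave))
```

In this toy example the first participant drops 35 points and falls two places, while the others shift by at most one place, mirroring the combination of within-individual and between-person change described above.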
Table 2. Examining YPI Scores and RCI Categories Based on Wave-to-Wave and Baseline-to-Follow-Up Comparisons
Note. Age was measured at the later of the two measurement periods. Age at T10 was unavailable as participants’ ages were masked for confidentiality reasons. Given the age at T9, the average age at T10 should be approximately 23. Values for degree of rank-order change represent the number of participants that an average individual scored higher (positive value) or lower (negative value) than between the two waves. YPI = Youth Psychopathic Traits Inventory; RCI = Reliable Change Index; Sx = symptom.
(a) Denotes significantly different from “No Change” (p < .001). (b) Denotes significantly different from “Reliable Sx Increase” (p < .001). (c) Denotes significantly different from “Reliable Sx Decrease” (p < .001).
Table 3. Examining Wave-to-Wave Changes in YPI Total Score as a Function of the YPI Score at the First Assessment
Note. YPI score is the participant’s score at the earlier of the two measurement periods. For YPI total score, values in parentheses reflect the standard deviation of the score. ANOVAs were used to compare average total scores across the three possible change categories (decrease, no change, or increase). All comparisons are significant at p < .001. YPI = Youth Psychopathic Traits Inventory; Sx = symptom.
(a) Significantly different from “No Change.” (b) Significantly different from “Symptom Increase.” (c) Significantly different from “Symptom Decrease.”
The prevalence of baseline-to-follow-up reliable decreases and increases is shown in Table 2. These analyses indicated that as the length of time between baseline and follow-up increased, so did the prevalence of reliable decreases in test score. Across the nine comparison periods, the prevalence of reliable change ranged from 35.1% to 49.9%, with reliable decreases in test score being more common than reliable increases in test score. Like the wave-to-wave comparisons, the comparisons from baseline to follow-up showed that participants experiencing a reliable change (a) showed a marked difference in test score (±18 points) and (b) swapped rankings with a substantial number of other participants from the sample. The prevalence of reliable decreases was higher for baseline-to-follow-up comparisons relative to wave-to-wave comparisons, which was expected given that the former captured longer follow-up periods, meaning more opportunity for change.
The ecological fallacy concern regarding the prevalence of change across different levels of YPI test scores was evaluated in Table 3. Wave-to-wave comparisons were examined in relation to an individual’s score on the YPI at the earlier of the two time-periods (e.g., the prevalence of reliable change in YPI score between T3 and T4 was examined in relation to a participant’s YPI test score at T3), whereas all baseline-to-follow-up comparisons were examined in relation to an individual’s YPI test score at T1. YPI scores were operationalized in three ways: total scores, categorical scores based on standard deviations from the mean, and scores in the 90th percentile or higher. Beginning with wave-to-wave comparisons, for total YPI scores, individuals experiencing a reliable decrease in scores on the YPI between T1 and T2 averaged a significantly (p < .001) higher score on the YPI at T1 compared with individuals who experienced a reliable increase and individuals who experienced no change. This pattern was also observed for the other eight wave-to-wave comparisons (see Table 3). This trend was also observed when examining categorical representations of the YPI. Categorical scores based on deviations from the mean showed that a reliable decrease in YPI scores was most common for individuals who scored high on the YPI (i.e., greater than 1 SD from the mean). Between T1 and T10, 31.9% to 50.3% of all participants with a high YPI test score also experienced a reliable decrease. Similarly, individuals scoring in the top 90th percentile were disproportionately more likely to experience reliable decreases in YPI test scores, and this was true across all wave-to-wave comparisons.
For baseline-to-follow-up comparisons (see Table 4), similar results were observed to those described in Table 3. For all nine comparisons, individuals experiencing a reliable decrease in YPI test scores averaged a significantly (p < .001) higher score on the YPI at T1 compared with individuals who experienced a reliable increase and individuals whose test score remained stable. Conversely, individuals experiencing a reliable increase in YPI test scores averaged a significantly (p < .001) lower score on the YPI at T1 compared with individuals who experienced a reliable decrease and individuals whose test score remained stable. For categorical scores, individuals with a high YPI test score at T1 were significantly more likely to experience a reliable decrease over the comparison period, and this was true for all nine comparisons. The same was true for individuals scoring in the top 90th percentile at T1. The likelihood of experiencing a reliable decrease in test score increased as the follow-up period grew longer. For example, 47.2% of participants with a high test score at T1 and 53.8% of participants scoring in the top 90th percentile at T1 experienced a reliable decrease between T1 and T2. In contrast, 73.5% of participants with a high test score at T1 and 81.8% of those scoring in the top 90th percentile at T1 experienced a reliable decrease in YPI test score between T1 and T10. There are three key findings when considering these baseline-to-follow-up comparisons. First, test score decreases were more common than increases as the follow-up period became longer. Second, decreases were more common when examining longer time-periods (e.g., T1-T10 compared with T1-T2). Third, increases were less common when examining longer time-periods (e.g., T1-T10 compared with T1-T2). Similar findings were observed when examining reliable change between T1 and T9, T1 and T8, and so on.
Examining Baseline-to-Follow-Up Comparisons in YPI Total Score as a Function of the YPI Score at the First Assessment
Note. YPI score is the participant’s score at the earlier of the two measurement periods. ANOVAs were used to compare average total scores across the three possible change categories (decrease, no change, or increase). All comparisons significant at p < .001. YPI = Youth Psychopathic Traits Inventory; Sx = symptom.
aSignificantly different from “No Change.” bSignificantly different from “Symptom Increase.” cSignificantly different from “Symptom Decrease.”
Last, we examined whether an individual’s reliable change status between T1 and T2 was maintained for the remainder of the study period. This helped address, for example, whether individuals maintained their reliable decrease in test score (i.e., remained stable after T1-T2), whether they continued to decrease (i.e., showed additional reliable decreases after T1-T2), or whether their reliable decrease was followed by a reliable increase that brought them closer to their score at T1. When looking at each test score from T3 to T10, individuals who experienced a reliable decrease between T1 and T2 averaged only 0.23 (SD = 0.57) reliable increases over the remaining eight comparison periods. Put differently, of the 151 participants experiencing a reliable decrease between T1 and T2, only 17.2% of this group experienced a reliable increase from their T1 score when examining their test score at each assessment period from T3 to T10. Moreover, individuals who experienced a reliable decrease between T1 and T2 averaged 4.76 (SD = 2.59) additional reliable decreases over the remaining eight comparison periods. Comparing the shortest (T1-T2) with longest (T1-T10) comparison periods, among individuals who experienced a reliable decrease between T1 and T2, 63.0% had a YPI test score at T10 that was also reliably lower than their YPI test score at T1. In contrast, only 1.1% had a YPI test score at T10 that was reliably higher than their YPI test score at T1. When looking at participants scoring greater than 1 SD above the mean and those scoring in the 90th percentile or higher for their YPI assessment at T1, individuals within these subgroups who experienced a reliable decrease between T1 and T2 had a YPI test score at T10 that was also reliably lower than their T1 score on 86.2% and 88.2% of occasions, respectively.
In other words, once an individual experienced a reliable decrease in test score, it was unlikely that he or she would later experience an increase in test score that would nullify the potential benefit of a decrease in symptoms of psychopathy, and this was especially true for those scoring highest on the YPI. Whether such decreases in test score had a beneficial effect via lower levels of offending versatility is examined below.
Table 5 presents a series of binomial regression analyses examining the effect of reliable change, as determined by wave-to-wave comparisons, on offending versatility. Two different models were examined for each offending outcome measured between T3 and T10. One model examined the main effects of all variables in the analysis and the other examined main effects plus the interaction between reliable change category and the participant’s mean-centered YPI score at the earlier of the two waves used to evaluate change. Specific interest was in whether the effect of a reliable decrease in YPI test score on level of offending versatility would vary depending on the participant’s initial YPI test score. Beginning with the models examining main effects only, except for the offending versatility outcome measured at T8, across each offending outcome between T3 and T10, a reliable decrease in YPI test scores significantly (p < .05) decreased the odds of an additional crime type compared with participants showing no change and/or compared with participants showing a reliable increase in test scores. These findings were observed when controlling for YPI score at the earlier of the two measurement periods used to evaluate reliable change, offending versatility in the wave immediately prior to the outcome measure, and demographic characteristics.
Binomial Regression Analysis With a Logit Link Function Examining the Relationship Between Wave-to-Wave Comparisons of Reliable Change and Later General Versatility
Note. All models control for YPI total score at the earlier of the two assessments used to measure reliable change (e.g., for reliable change between T1 and T2, the score at T1 was used). All models also control for SRO offending versatility at the assessment immediately prior to the outcome (e.g., the T3 model controlled for offending versatility at T2). For the interaction analyses, YPI test scores were mean-centered. SRO = Self-Reported Offending; OR = odds ratio; CI = confidence interval; YPI = Youth Psychopathic Traits Inventory; Sx = symptom.
aReference category is “White.” bReference category is “Reliable Decrease.” cReference category is “YPI × Reliable Decrease.”
*p < .05. **p < .01. ***p < .001.
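To make the structure of these models more concrete, the following minimal sketch shows how a logit-link coefficient translates into an odds ratio and how an interaction with a mean-centered YPI score shifts the expected number of offense types endorsed. All coefficient values, the number of offense types, and the function names are hypothetical placeholders chosen for illustration; none are estimates from Table 5.

```python
import math


def inv_logit(x):
    """Map a log-odds value onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio(coef):
    """Convert a logit-scale coefficient into an odds ratio."""
    return math.exp(coef)


def expected_versatility(intercept, b_ypi, b_no_change, b_interaction,
                         ypi_centered, no_change, n_offense_types):
    """Expected count of offense types endorsed under a binomial model.

    no_change is 1 for the "no change" group and 0 for the reference
    group ("reliable decrease"), mirroring the reference category
    described in the table note. ypi_centered is the YPI score minus
    the sample mean.
    """
    log_odds = (intercept
                + b_ypi * ypi_centered
                + b_no_change * no_change
                + b_interaction * ypi_centered * no_change)
    return n_offense_types * inv_logit(log_odds)


if __name__ == "__main__":
    # Hypothetical values only; not results from the study.
    params = dict(intercept=-2.0, b_ypi=0.00, b_no_change=0.40,
                  b_interaction=0.02, n_offense_types=20)
    print("OR for no change vs. reliable decrease:", round(odds_ratio(0.40), 2))
    for ypi in (-20, 0, 20):
        decrease = expected_versatility(ypi_centered=ypi, no_change=0, **params)
        stable = expected_versatility(ypi_centered=ypi, no_change=1, **params)
        print(ypi, round(decrease, 2), round(stable, 2))
```

With a slope near zero for the reference group, expected versatility in that group stays flat across YPI scores, which is the kind of pattern an interaction plot would display as a relatively flat line.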
Significant interaction terms (p < .05) were observed for five of the eight interaction effect models (T4-T6, T8, and T9; see Table 5). Interaction effects are depicted in Figure 1. For all nine models, higher scores on the YPI were associated with lower levels of offending versatility for those who experienced reliable decreases compared with those who remained stable over the wave-to-wave comparison. For offending outcomes measured at each of T4 to T6, as shown in Figure 1, individuals who experienced a reliable decrease in YPI test scores had lower levels of offending versatility regardless of their YPI score at the initial measurement period. In other words, level of offending versatility was virtually identical for all individuals showing a reliable decrease in YPI scores, regardless of whether a reliable decrease was observed for individuals scoring lower or higher on the YPI at the earlier of the two assessments used to measure reliable change. This effect is illustrated by a relatively flat line for individuals who experienced a reliable decrease in YPI test scores. Indeed, the general trend across T4 to T6 showed that, at a given level of YPI test score, participants who experienced a reliable decrease were associated with a lower level of offending versatility. For the T8 offending outcome period, the relationship described above was again observed, but only when comparing those who experienced a reliable decrease with those who remained stable in their YPI test score from T6 to T7. For T9, the findings were somewhat different in that, compared with those experiencing no change or a reliable decrease, individuals who experienced a reliable increase in symptoms between T7 and T8 did not show as extreme an increase in offending versatility when scores on the YPI at T7 increased.

Interaction Effects for Wave-to-Wave Comparisons of Reliable Change and YPI Test Score
The analyses in Table 5 were replicated in Table 6, this time with baseline-to-follow-up comparisons used as the measure of change/stability. These analyses helped assess whether short-, moderate-, and long-term change had different effects on offending versatility. Each model controlled for demographic characteristics, YPI total score at T1, and offending versatility at T1. For all nine models examining main effects only, the odds of endorsing additional offenses were significantly (p < .001) greater for individuals who experienced a reliable increase in YPI test scores and for individuals whose scores remained stable over the baseline-to-follow-up period compared with individuals experiencing a reliable decrease in YPI test scores. In effect, whether change was measured over short-, moderate-, or long-term periods had no bearing on whether individuals experiencing a reliable decrease in YPI test scores had lower odds of endorsing additional offenses.
Binomial Regression Analysis With a Logit Link Function Examining the Relationship Between Baseline-to-Follow-up Comparisons of Reliable Change and Later General Versatility
Note. For the interaction analyses, YPI test scores were mean-centered. SRO = Self-Reported Offending; OR = odds ratio; CI = confidence interval; YPI = Youth Psychopathic Traits Inventory; Sx = symptom.
aReference category is “White.” bReference category is “Reliable Decrease.” cReference category is “YPI × Reliable Decrease.”
*p < .05. **p < .01. ***p < .001.
For the models examining the interaction effect between baseline-to-follow-up reliable change category and YPI total score at T1, significant interaction terms were observed for three of the eight models (T3, T5, and T7; see Table 6). Interaction effects for each model are depicted in Figure 2. For all but one of the nine models, higher scores on the YPI were associated with lower levels of offending versatility for those who experienced reliable decreases compared with individuals who remained stable and compared with individuals who experienced a reliable increase. For the significant interaction term in which offending versatility at T3 was the outcome of interest, the impact of a higher YPI test score on offending versatility was stronger for those experiencing a reliable decrease compared with those experiencing no change. For the significant interaction terms in which offending versatility was measured at T5 and T7, individuals who experienced a reliable decrease in YPI test scores had lower levels of offending versatility regardless of their YPI score at T1. In effect, in the short term, individuals experiencing a reliable decrease who had a higher test score at T1 showed higher levels of offending versatility compared with individuals experiencing a reliable decrease who had a lower test score at T1. However, after longer follow-up periods (i.e., T5 and T7 outcome models), regardless of individuals’ test score at T1, if they experienced a reliable decrease in test score, their level of offending versatility was low.

Interaction Effects for Baseline-to-Follow-Up Comparisons of Reliable Change and YPI Test Score
Discussion
Population heterogeneity perspectives consider psychopathy to be an important indicator of criminal propensity (e.g., DeLisi, 2016; Nagin & Paternoster, 2000), and empirical research supports its robustness as a predictor of offending (e.g., DeLisi et al., 2014; Flexon & Meldrum, 2013). However, the status of psychopathy as a time-invariant construct has received less empirical attention, especially with respect to questions regarding (a) whether within-individual differences in test scores are a product of true change or measurement error, including the ecological fallacy concern of whether reliable change is equally likely across different levels of test scores, and (b) whether stability or change in symptoms of the disorder impacts offending involvement. The current study addressed these questions using repeated measures of the YPI among a sample of adjudicated male and female youth from the Pathways to Desistance Study. To address this first question, stability or change in YPI test scores was assessed in two different ways. One approach, referred to as wave-to-wave comparisons, used the RCI to examine reliable short-term change across nine follow-up periods where assessments were repeated every 6 or 12 months. This allowed for an examination of whether short-term change continued to occur as participants in the sample aged and whether short-term change continued to influence offending outcomes as participants in the sample aged. A second approach, referred to as baseline-to-follow-up comparisons, used the RCI to examine the prevalence of reliable change across short-, moderate-, and long-term follow-up periods, with the last follow-up period occurring 6.5 years from the baseline assessment.
To address whether change impacts offending involvement, a series of binomial regression analyses were performed to examine the impact of stability or change on offending versatility, with specific interest in whether the effect of reliable change in YPI test score on offending versatility was moderated by the level of an individual’s YPI test score at initial assessment.
The aims of the current study were addressed in part by using Jacobson and Truax’s (1991) RCI to examine changes in YPI test scores. The RCI accounts for the reliability of the instrument under examination (e.g., Cronbach’s alpha values) as well as the standard deviation associated with the assessment. By defining true change only by relatively large differences in test scores, the RCI helps provide confidence that those defined by a reliable increase or decrease in test scores were not simply individuals regressing to the mean. In the current study, reliable decreases in test scores were most common among the individuals scoring highest on this measure. This finding somewhat contradicts the portrayal of some adolescents as fledgling psychopaths (e.g., Lynam, 1998). DeLisi (2016) argued that “the occasional academic concern about the downward extension of psychopathy to children and adolescents was overblown and frankly overwhelmed by studies of psychopathic features among youth” (p. 123). Given that practitioners perceive high ratings to be evidence of poor response to treatment and justification for lengthier sentences (Viljoen, MacDougall, Gagnon, & Douglas, 2010), combined with evidence that symptoms do reliably decrease over time, we feel that ensuring that the construct is applied accurately is more than an academic concern.
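The logic of the RCI can be sketched briefly. Following Jacobson and Truax (1991), the difference between two scores is divided by the standard error of the difference, which is derived from the standard deviation and reliability of the instrument; absolute values greater than 1.96 are conventionally treated as reliable change. The function names and the input values in the example are hypothetical and are not the YPI parameters used in the current study.

```python
import math


def reliable_change_index(score_t1, score_t2, sd, reliability):
    """Compute the RCI for a pair of scores.

    SE_measurement = SD * sqrt(1 - reliability)
    S_diff         = sqrt(2 * SE_measurement ** 2)
    RCI            = (score_t2 - score_t1) / S_diff
    """
    se_measurement = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (score_t2 - score_t1) / s_diff


def classify_change(score_t1, score_t2, sd, reliability, critical=1.96):
    """Classify a score difference as a reliable decrease, increase, or no change."""
    rci = reliable_change_index(score_t1, score_t2, sd, reliability)
    if rci < -critical:
        return "reliable decrease"
    if rci > critical:
        return "reliable increase"
    return "no change"


if __name__ == "__main__":
    # Hypothetical SD and reliability, chosen only to illustrate the arithmetic.
    sd, alpha = 15.0, 0.90
    for t1, t2 in ((120, 100), (100, 105), (100, 120)):
        print(t1, t2, classify_change(t1, t2, sd, alpha))
```

Because the denominator grows as reliability falls, a less reliable instrument requires a larger raw difference before a change is classified as reliable, which is why small fluctuations in scores are left in the "no change" category.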
It is also important to consider that concepts (e.g., psychopathy) and measures (e.g., the YPI) are distinct. If a youth scores high on the YPI during one period of adolescence, but shows a reliable decrease in test score at a later period, then is this really PPD, or is it simply PPD-like traits that are captured by the YPI that instead reflect other adjustment issues? Future research should examine whether self-identity and associated lifestyle factors (e.g., gang involvement, drug trafficking, large criminogenic social network) are associated with mimicking manifestations of PPD and whether exit from such roles is associated with decreases in test scores. As these adolescents enter adulthood, they may undergo different maturation processes (see Rocque, 2017) in which they transition out of a lifestyle where mimicking attitudes, behaviors, and traits of PPD is valued by peers and commensurate with an antisocial identity. Consequently, such a transition will lead to lower scores on measures of PPD.
The transition from adolescence to emerging adulthood includes pronounced changes in personality and identity (Arnett, 2000) that have been described as important for facilitating desistance (Rocque, 2017). Although adolescents scoring high on measures of PPD are disproportionately less likely to show a pattern of desistance during emerging adulthood (McCuish et al., 2015), as shown in the current study, this group is also not restricted from experiencing changes that result in lower levels of offending versatility relative to others. Most relevant for risk assessors and practitioners, among individuals scoring greater than 1 SD above the mean on the YPI, at each wave-to-wave comparison, approximately 30% of such individuals experienced a reliable decrease in symptoms of PPD. Anywhere from 21.5% to 30.5% of adolescents scoring in the 90th percentile and above on the YPI experienced a reliable decrease in symptoms during the next wave. This reliable decrease meant that across the two assessment periods, test scores declined by a minimum of 16 points. Given this, conclusions about youth as fledgling “psychopaths” (e.g., DeLisi, 2016; Lynam, 1998) could result in a high number of false positives. High scores may instead reflect a lack of psychosocial maturity, a construct that reflects a constellation of personality traits that may be more malleable during adolescence (e.g., Monahan, Steinberg, Cauffman, & Mulvey, 2009). It may therefore be important to distinguish between test scores that capture low levels of psychosocial maturity and test scores that reflect stable symptoms of PPD. For false positive cases, the “psychopath” label may remain with them through adulthood despite their adolescent symptoms dissipating. 
Alternatively, consistent with the principle of heterotypic continuity, it is also possible that symptoms of PPD manifest in different ways across the transition between adolescence and emerging adulthood, and these manifestations are not captured by items included in the YPI. This age-graded perspective implies a dynamic process of change in PPD symptoms, something that has been neglected thus far given the lack of a developmental life course view on this phenomenon.
The reliable decreases in symptoms had important implications for future offending. Specifically, per a series of binomial regression analyses examining offending outcomes at eight different waves of data collection, wave-to-wave reliable decreases in symptoms of PPD predicted a significantly lower level of offending versatility in the measurement period following the reliable change. These findings were replicated when looking at baseline-to-follow-up reliable decreases in symptoms of PPD that captured short-, moderate-, and long-term change. A series of interaction analyses showed that, regardless of whether an individual presented with a lower, higher, or average YPI test score, a reliable decrease in this test score was associated with lower odds of endorsing additional offenses compared with individuals who showed a reliable increase in test scores or did not show a reliable change.
Given the link between PPD and chronic offending (e.g., McCuish et al., 2015) and between chronic offending and high costs to the criminal justice system (e.g., Cohen, Piquero, & Jennings, 2010), when PPD is present, a longer term, punitive sanction is warranted to reduce criminal justice system costs and protect potential victims. However, as shown in the current study, an adolescent’s higher test score at one time-point is not necessarily evidence of a life course pattern of PPD as measured by the YPI. Failure to acknowledge this may result in false positives that unjustifiably reduce freedoms for some youth and increase criminal justice system costs. Results in the current study reiterate the perspective that instruments measuring PPD among adolescents show an individual’s test score at just one time-point. Using this score to make long-term predictions about an individual’s personality will likely result in premature labels about the adolescent’s maladaptive personality traits.
Limitations and Future Research
The current study has several limitations that should be addressed in future research. First, although individuals who experienced a reliable decrease in YPI test scores were associated with lower levels of offending versatility compared with those who did not experience such change, what was not examined was whether within-individual change in symptoms was associated with within-individual change in offending. This question has key implications for the desistance literature in terms of whether reductions in symptoms precede a slowing down in an individual’s level of offending over time. Second, due to space restrictions, the binomial regression analyses did not also compare reliable increases to no change. Future research should examine such comparisons to better understand how increases in YPI test scores may increase risk of offending. Third, high YPI test scores may reflect extremely low levels of psychosocial maturity as opposed to symptoms of PPD. Future research should attempt to disentangle these two constructs by, for example, examining the role and influence of peer delinquency on test scores. Those whose test scores are influenced by peer delinquency may be better characterized by a lack of psychosocial maturity and may be especially likely to show high scores on a measure of PPD at one period but not at a later period. Relatedly, given the tendency of youth to mimic peers, future research should also examine the extent to which the presence of someone with PPD in the youth’s social network increases this youth’s own test score on measures of the construct. Fourth, participants were adolescents involved in the justice system. Whether similar levels of change or stability are observed and whether such change similarly influences offending outcomes should be examined with other populations. Nevertheless, the sample was ideal in that these are the types of youth most likely to present with symptoms of PPD.
Fifth, a single source of information (i.e., the participant) was relied upon to measure both PPD and self-reported offending. Replication of these findings using other sources of information and other measures of PPD is needed before drawing firm conclusions. Sixth, the small number of females makes generalizability of the current findings difficult. Females were included in the current study, despite their limited number, as there currently exists little research on stability and change in symptoms of PPD among females. Seventh, future research should examine the role of treatment effects on decreases in PPD test scores.
Conclusion
Understanding the developmental course of PPD between adolescence and adulthood is especially necessary given that this construct is one of the most important individual-level risk factors for offending (Hart, 1998). To this point, population heterogeneity perspectives considered the construct to be an important measure of criminal propensity, especially because of its assumed stability between adolescence and adulthood (e.g., DeLisi, 2016; cf. Hawes et al., 2014). By disaggregating adolescents according to YPI test scores, the current study showed that stability at higher levels of the disorder is not as typical as initially believed. Using a conservative definition of what constituted within-individual change, approximately 35% to 50% of youth with a YPI test score in the 90th percentile or above experienced a reliable decrease in the short term. Over the long term, more than 75% of such youth experienced a reliable decrease in test scores. Using the RCI helped provide confidence that this change was meaningful rather than a marginal change likely consistent with measurement error or regression to the mean. Indeed, the average individual experiencing a reliable decrease in YPI test score dropped approximately 400 spots lower in the sample’s ranking of highest test scores. Also important was the finding that reliable decreases were associated with lower levels of offending versatility at later follow-up periods, and this appeared true regardless of the degree of an individual’s test score (e.g., low, medium, high) at initial assessment. These findings have implications for the assessment of adolescent PPD, the interpretation of what a higher test score means for adolescents, and what such a test score means for theories on offending persistence and desistance. 
There may be a dynamic process of change in the symptoms of PPD across the adolescence–adulthood transition, which casts doubt on assumptions about the rigid and inflexible nature of PPD over the life course. Some high scores on the YPI may be reflective of low levels of psychosocial maturity rather than PPD. It is also possible that the YPI is better at measuring symptoms of PPD in mid-adolescence than in late-adolescence or early adulthood. The YPI is mentioned here because it is the measurement instrument used in the current study. It is possible that similar conclusions could be drawn using other measures of PPD as well.
Acknowledgements
The authors wish to thank the researchers of the Pathways to Desistance Study for making these data publicly available. They are especially thankful to Carol Schubert for her responses to their queries about the Pathways to Desistance data. They would also like to acknowledge the helpful comments they received from the three reviewers.
