Abstract
Covering both individual and neuropsychological factors, the Forensic Symptom Inventory - Youth Version - Revised (FSI-YV-R) is the first broad-spectrum questionnaire for adolescents in forensic care. It measures several deficits, such as executive dysfunctions, anger, and inadequate coping, in order to inform treatment goals and evaluate interventions. In this study, the factor structure as well as the measurement and structural invariance of the FSI-YV-R were investigated. The sample consisted of 159 forensic juvenile outpatients (79.9% males and 20.1% females) aged 12 to 18 with a mean age of 16.07 (SD = 1.57). Multi-Group Confirmatory Factor Analyses identified a second-order model (measuring executive functioning) and showed measurement and structural invariance across age groups (younger adolescents, 12–15 years, and older adolescents, 16–18 years). Contrary to expectations, none of the FSI-YV-R subscales differentiated between younger adolescents (N = 74) and older adolescents (N = 85). These results and their implications for both theory and practice are discussed.
Keywords
Introduction
Individual risk factors are found to be strong predictors of criminal behavior (Bijleveld et al., 2016; Leverso et al., 2015; Van der Geest, 2011; Van der Put et al., 2012). Although individual risk factors are sensitive to environmental influences (Friedman et al., 2016; Kenny et al., 2014; Stadler et al., 2010; Steketee et al., 2013), the so-called “Big Four”—predominantly individual—risk factors (history of antisocial behavior, antisocial personality pattern, antisocial cognition, and antisocial associates) are causal risk factors that strongly predict the onset of and relapse into criminal behavior (Andrews & Bonta, 2010; Bonta & Andrews, 2017). One of the key elements of this antisocial proneness is a lack of self-control (Gottfredson & Hirschi, 1990), implying a robust and almost predestined route to antisocial behavior. Along this route, the failure to flexibly activate, monitor and inhibit one’s behavior, attention, emotions, and cognitive strategies plays a significant role (Moilanen, 2007). This ability to activate, monitor and inhibit behavior is known as executive functioning (EF): a higher order cognitive construct (Giancola et al., 2006) used for planning (Cauffman et al., 2005), initiation, and self-regulation of behavior (Ogilvie et al., 2011). Three (higher order) executive functions are involved in cognitive control: (1) updating, (2) shifting, and (3) inhibition. Updating is the ability to store and update relevant information in the working memory; shifting is the ability to switch between mental sets or rules; inhibition is the ability to suppress a dominant response in favor of another or no response (Miyake et al., 2000; Van der Ven et al., 2013).
From a forensic point of view, EF is of great interest because deficits in EF could interfere with one’s ability to interpret interpersonal cues and hence could lead one to act impulsively, aggressively, or in any other antisocial way (Valliant et al., 2003). Indeed, research suggests a causal relationship between deficits in EF and misconduct (Ogilvie et al., 2011; Vedelago et al., 2019). Conversely, individuals with an ‘intact’ EF are more likely to self-regulate and inhibit disturbing emotions, hence guarding against offending (Adjorlolo, 2017). EF impairments in a group of children at high risk of future criminal behavior explained 8% of the variation in externalizing behavior (Van Zonneveld, 2019). In incarcerated offenders, the process of updating seems to be relatively impaired (Herrero et al., 2010), and disinhibition is likely to contribute to the development of disruptive behavior disorders in adolescents (Long et al., 2015). Additionally, both updating and impulsivity predicted self-reported reactive (i.e., impulsive) aggression (Tonnaer et al., 2016). Evidence suggests that EF deficits play an important role in risky decision-making behavior (Schiebener et al., 2015) and that they influence a person’s judgment, impulse control, self-monitoring, and planning skills (Pharo et al., 2011). Miura and Fuchigami (2017) found higher recidivism rates when conduct-disordered boys (aged 14–16 years) had impaired EF and were younger at first offence. Moreover, a lower IQ, inadequate planning skills and working memory impairments in young offenders were found to be related to more risky decision-making than in controls (i.e., non-offenders) (Syngelaki et al., 2009; Taylor et al., 2013). Conversely, when adolescents’ decision-making autonomy increases, their susceptibility to peer pressure or other environmental influences decreases (Huizinga & Smidts, 2011; Miller, 2010; Spruit et al., 2017; Van der Put et al., 2012).
In other words, (developmental) age seems to matter in the onset and perseverance of criminal behavior when adolescents are confronted with both internal (neurobiological) and environmental risk factors. Slot and Van Aken (2019) refer to adolescence as a phase between childhood and adulthood during which developmental tasks play a central role. These tasks are faced during three stages (early, middle and late adolescence), each with different challenges and skills at different ages. During early and middle adolescence, the brain is more focused on thrill and reward seeking, fear and pleasure, while the executive functions and self-control are still developing (Crone, 2019), paving the way for an increased risk of criminal behavior. Since different stages (and ages) bring different challenges, the nature as well as the risk of criminal behavior is likely to differ during development.
EF deficits are considered to explain a variety of known forensic symptoms, but they also co-exist with them. Combined with deficiencies in impulse control, anger could play a mediating role in the onset and perseverance of delinquency (Cottrell, 2018; Sigfusdottir et al., 2010). In the general strain theory (Agnew, 2001), the mediating role of anger is explained by arguing that stress induces negative emotions, such as anger, which in turn lead to criminal behavior as a means to reduce these negative emotions.
Stated differently, coping with impulses and angry feelings involves both EF and the active use of coping strategies. For example, Cho and Lee (2020) stress the importance of regulation of distress (self-control) in desisting from crime, and Walters (2019) argues that desistance is best obtained by identity maturation, which comprises decision-making, problem solving, and future orientation. Thus, while (cognitive) coping skills play an important role in desisting from crime, impaired problem-solving skills might increase delinquent behavior (Adjorlolo, 2017).
Another established risk factor for the development of delinquency is substance abuse. Problematic substance use frequently co-exists with delinquent behavior (e.g. Boisvert et al., 2019), suggesting several underlying environmental and genetic factors. In a large study, Rocca et al. (2019) found recent substance abuse and more problematic forms of substance abuse (e.g. binge-drinking) to be linked to delinquent behavior, whereas committing crimes while under the influence of substances turned out to be quite limited in frequency.
Walters (2020) stresses the importance of prosocial peers in reducing the risk of both delinquent behavior and substance abuse among adolescents. The ability to benefit from social support and to resist negative peer influence (i.e. low susceptibility) could contribute greatly to desistance from crime, thereby reducing the known individual risk factors as mentioned by Andrews and Bonta (2010).
In sum, EF problems and other known risk factors contribute in several ways to the onset and perseverance of criminal behavior. Therefore, to effectively reduce re-offending in juvenile delinquents, adequate measurements of these potential risk factors are needed. To the authors’ knowledge, a questionnaire that covers most of the known (individual) risk factors for recidivism and that is valid, reliable, and easy to administer is (still) lacking for adolescents. Recently, the Forensic Symptoms Inventory (FSI-R Adults) was developed for use with adults in forensic outpatient care (van Horn, 2018). This was done for similar reasons: to measure the forensic symptoms as experienced by outpatients, to monitor changes in these symptoms throughout forensic treatment, and to continuously adapt the intervention to the patient’s criminogenic needs. Several existing questionnaires cover various domains of the aforementioned risk factors (i.e. EF problems, anger, coping, substance abuse), but so far, an integrated, broad-spectrum questionnaire is lacking. Such an instrument would greatly improve contemporary outpatient forensic care for adolescents for several reasons.
First of all, adolescence is the transitional phase between childhood and adulthood (Sawyer et al., 2018).
Consequently, this reflects differences in self-efficacy (e.g. Sagone et al., 2020) and self-competence (e.g. Ohannessian et al., 2019), as well as differences in brain development (Crone, 2009), a changing attitude towards parents and peers (Huizinga & Smidts, 2011) and several cognitive and neurobiological differences as mentioned before (risky decision making; Miura & Fuchigami, 2017; Schiebener et al., 2015).
A second reason for developing a new questionnaire is the relative absence of a reliable measure. Self-report measures of forensic symptoms are rarely administered, and valid instruments are largely lacking.
Thirdly, existing instruments in the adult field are neither validated for nor, for the aforementioned reasons, suitable for use with adolescents.
Results from a meta-analysis by Walters (2006) demonstrated that self-report measures specifically designed to assess the attitudes, beliefs, and personality of offenders had equivalent predictive validity for recidivism as observer rated risk assessment instruments. In addition, self-report measures accounted for unique variance in recidivism outcome, indicating that the internal experiences of offenders provide unique information about future risk status (Tew et al., 2015).
Aim of the study
Central in this study is the development and validation of a self-report questionnaire measuring executive functioning and other known intra-personal forensic symptoms, such as problems in the affective domain—especially anger—(Cornell et al., 1999; Velotti et al., 2017), cognitive factors such as lacking future orientation (Leverso et al., 2015; Mulder et al., 2011), and behavioral factors such as inadequate coping (Aebi et al., 2014) and substance abuse (Dowden & Brown, 2002; Walters, 2015).
This newly developed broad-spectrum instrument aims to identify forensic symptoms amongst adolescents in an outpatient forensic setting. Additionally, measurement and structural invariance will be investigated across age subgroups. Establishing measurement invariance would imply that all respondents view the individual items, as well as the underlying latent factor, in the same way, which would support widespread use among adolescents who show antisocial or criminal behavior. For practical use, however, differentiation is needed to understand the nature and development of forensic complaints. Therefore, it is deemed adequate to compare different age groups, i.e. early and late adolescents, reflecting different phases of cognitive and social development, as well as development of the self, to investigate behavior, affect and cognitions, as well as EF. Finally, individualized information about EF in adolescents in forensic care will generate important input for risk assessment as well as treatment. Structural and measurement invariance will enhance the practical use of this newly constructed broad-spectrum questionnaire, the Forensic Symptom Inventory - Youth Version - Revised (FSI-YV-R).
The following research questions are addressed: 1) Does the FSI-YV-R contain a stable factor structure in which the separate constructs are sufficiently distinctive?, 2) Is the FSI-YV-R structurally and measurement invariant across two age groups (12–15 years and 16–18 years)?, and 3) When invariance has been established, do the age subgroups differ in test scores (i.e. forensic complaints)?
It is expected that the FSI-YV-R concepts will prove to be measurement and structurally invariant across the age subgroups. It is furthermore expected that the younger age group in particular will show more deficits in executive functioning (Taylor et al., 2013).
Method
Participants
The study was conducted at de Waag, a Dutch outpatient forensic treatment facility. De Waag mainly offers cognitive-behavioral interventions to juvenile and adult offenders who, due to their offence behavior, have (or are prone to have) contact with the police or judicial authorities. Clients enter treatment on a voluntary or mandatory basis. Voluntary treatment means that the client enters treatment on their own initiative or on referral by a general practitioner or another mental health care institution. Mandatory treatment means that treatment is imposed by a judge. In these cases, a (juvenile) probation officer or family guardian fulfils the supervisory role.
Development of the forensic symptoms inventory—Youth version
Understanding the development of an instrument is important in assessing its validity, and therefore a brief overview is presented. In constructing this broad-spectrum questionnaire, multiple sources were used, covering a broad range of known ‘forensic’ domains, such as environmental factors, substance misuse, executive functioning and affect regulation. FSI-YV items were partly modeled after the adult version of the FSI (van Horn, 2018) and furthermore derived from a literature review and relevant instruments such as the Behavior Rating Inventory of Executive Function (BRIEF, Fournet et al., 2015; Gioia et al., 2000), the Barratt Impulsiveness Scale (BIS, Spinella, 2007), the Dutch Competentie Belevingsschaal voor Adolescenten (CBSA, Treffers et al., 2002), which measures adolescents’ experienced self-competence, and the Adolescent Self-Regulatory Inventory (ASRI, Moilanen, 2007). This resulted in an initial draft of the FSI-YV comprising 84 items divided into 11 subscales: working memory, shifting, inhibition (as part of a broader concept of executive functioning), anger, future orientation, problematic substance use, self-regulation, inadequate coping, susceptibility to peer influence, social support and sexual problems.
To maximize face validity (i.e., the extent to which items are subjectively viewed as covering the concept they seek to measure), the 84 statements were tested during a pilot phase by two panels of clinical experts and two panels of end-users (i.e., forensic outpatients < 18 years) and, after adaptation, analyzed using Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). The first clinical expert panel consisted of three psychotherapists with experience in the forensic field ranging from 8 to 15 years, two of whom have a PhD degree in (forensic) psychology. The second panel consisted of a professor in forensic psychiatry and two psychotherapists, each with approximately 8 years of experience in forensic psychiatry, particularly in the treatment of adolescents. The panels were invited to assess whether the concepts were adequately covered by the items and were furthermore asked to check the items’ phrasing, readability, clarity, and specificity (e.g., double-barreled items). The two panels of end-users consisted of five outpatients (first panel) and another four outpatients (second panel), both in the age range of 12 to 18 years. The end-users were asked to examine the comprehensibility, relevance and clarity of the items. Each panel received an updated version of the questionnaire.
After this pilot phase, the adapted version in which items were removed, reworded or renewed, was implemented in a Routine Outcome Monitoring (ROM) system and digitally presented to 683 juvenile outpatients who were in treatment from May to December 2017, resulting in a total number of 144 (21% response rate) completed FSI-YVs. Respondents were also requested to comment on the questionnaire using 3 open-ended questions as an opportunity for clients as end-users to provide input concerning the reading-level and relevance of the items.
These data were used to execute a series of EFAs to design a more reliable, valid, comprehensive, and easy to administer questionnaire. Due to the small number of sexual offenders (n = 18), the scale Sexuality was excluded from further analyses. The items, however, remained part of the final version. The EFA outcomes resulted in the removal of items with cross loadings or items with low factor loadings (in general <.70) and inter-item correlations (<.20 or >.70), as well as the creation of new items and rewording of existing ones based on the open-ended questions and the theoretical background outlined in the introduction.
Next, a series of CFAs was performed as part of the development of a stable and well-fitting model. The subscales Working Memory, Shifting and Inhibition demonstrated substantial correlations, which might point to the existence of higher-order factors (Fabrigar et al., 1999). A second-order CFA was run to test whether Executive Functioning represents a higher-order factor comprising these three related though distinct subscales. This resulted in a good fit for Working Memory, Shifting, and Inhibition as first-order constructs, and Executive Functioning as second-order construct, χ2 = 50.64; df = 24; Bollen-Stine bootstrapped p = .048; χ2/df = 2.103; RMSEA = .084, 90% CI = .051–.116; SRMR = .0521; CFI = .947; NNFI = .920; PNFI = .603. The CFA indices and their thresholds are described in more detail in the Analysis section.
Secondly, the other factors (i.e., Future Orientation, Susceptibility, Substance Misuse, Social Support and Coping) were tested in a separate CFA, yielding a reasonable fit, although higher values would have been more satisfactory: χ2 = 267.929; df = 179; Bollen-Stine bootstrapped p = .086; χ2/df = 1.497; RMSEA = .059, 90% CI = .044–.073; SRMR = 0.778; CFI = .910; NNFI = .777; PNFI = .662.
Eventually, a more complex model was tested, adding the concept of Self-Regulation to the new second-order concept Executive Functioning, and testing a new second-order relationship between the highly correlated concepts of Aggression and Anger. However, this resulted in a poor fit, χ2 = 455.423; df = 245; Bollen-Stine bootstrapped p = .022; χ2/df = 1.859; RMSEA = .077, 90% CI = .066–.088; SRMR = .0768; CFI = .867; NNFI = .756; PNFI = .671. This implied that some concepts did not meet the standards.
After testing the revised version of the FSI-YV, another four subscales were removed (a total of 14 items). The subscale Aggression was excluded because it reflects the misconduct itself (i.e., the main reason for forensic treatment) rather than the symptoms underlying the aggressive behavior. In contrast, the items that cover the Sexual Problems scale mainly refer to the uncontrollability of thoughts and feelings, and not to the actual transgressive behavior itself. The Self-Regulation subscale was also removed from the model because of high covariances with other executive functioning concepts (r = .89) and because of discriminant and convergent validity issues, as it violated the thresholds for these indices. The subscale Susceptibility (α = .247) failed to meet the reliability criterion of .70, and, finally, the subscale Social Support was excluded because of convergent validity problems (AVE <.5). The analytic strategy and the thresholds for discriminant and convergent validity are explained in more detail in the Analysis section. A complete overview of the EFA and CFA results and the decisions leading to the final version of the instrument can be obtained from the first author upon request.
The subsequent refinement process resulted in the final revised version of the Forensic Symptom Inventory - Youth Version - Revised (FSI-YV-R) consisting of 29 items divided over seven constructs, each containing at least three items (MacCallum et al., 1999; Raubenheimer, 2004). In the Appendix, the translated FSI-YV-R items and their corresponding subscales are presented.
To ensure translation validity, the procedure of forward- and back-translation as described by Beaton et al. (2000) was carried out by two bilingual translators. The forward- and back-translation method is a procedure for investigating the conceptual equivalence (i.e. symmetry) of the original and translated versions, which is necessary for valid cross-cultural comparisons.
Study protocol
As part of routine outcome monitoring (ROM), all outpatients referred to de Waag are routinely assessed with several internet-based instruments at baseline during intake and, if treatment is initiated, repeated every 4 months during treatment. The FSI-YV-R was included in the ROM-procedure in April 2018. All clients with a registered e-mail address received a secure link to the FSI-YV-R. Participation in the study was voluntary.
At intake, patients were informed by the therapist about the data collection during their treatment, as well as how these data would be used for scientific purposes and to match treatment to their individual problems. Patients also received a flyer detailing the data collection procedure and were asked to sign an informed consent letter if they agreed to their data being used for scientific purposes. Parents were asked to sign the same informed consent letter for patients who were 15 years or younger. The procedure falls within the Dutch Data Protection Act (Dutch DPA) and other specific Dutch healthcare laws, which provide legal provisions on how to handle the privacy of personal information within the context of, amongst others, mental health services. The Dutch DPA also stipulates that all patients have the right to withdraw their previous consent at any time during and after treatment.
Sample
The initial sample comprised 160 forensic adolescent outpatients who completed the FSI-YV in the period from April 2018 till February 2019. After excluding 1 case (aged 19 years), the remaining sample consisted of 159 forensic adolescent outpatients (79.9% male, 20.1% female) who were in treatment during the inclusion period (May 2018 to February 2019). The 159 patients under study had an average age of 16.07 (SD = 1.57) at the start of treatment. Almost all of them (95.6%) were born in the Netherlands. Being in forensic treatment due to their problematic behavior, most of the adolescents were diagnosed with conduct disorder or other disruptive behavior disorders (in both age groups 80% or more); less than 10% had an (additional) developmental disorder; other clinical diagnoses or personality disorders were rarely diagnosed, mostly because of the respondents’ age. Table 1 outlines several demographic and treatment-related characteristics of the final sample, differentiated by age group (46.5% 12–15 years vs. 53.5% 16–18 years).
Table 1. Demographic and treatment-related characteristics of the sample (N = 159).
1 Other types of disorder, including paraphilia, mood and anxiety disorders, and personality disorders.
2 Including domestic violence, arson, and skipping class.
Assessment of clinical and personality disorders
At intake, a psychologist or psychiatrist classified a patient according to the Diagnostic and Statistical Manual of Mental Disorders (5th edition) (DSM-5; American Psychiatric Association, 2013). The intake session lasted approximately 60 minutes and consisted of a screening of issues relevant to outpatient forensic care, amongst which the criminal history and index offence, family (situation), education and social network. The clinically assessed diagnosis during the intake session was discussed and determined by a multidisciplinary team of a psychiatrist, psychotherapists, and psychologists. The primary diagnosis underlies the patient’s offence behavior. Additionally, an estimation of the general intelligence was made, either based on existing IQ scores, on history of school functioning and/or clinical impression.
Forensic Symptom Inventory—Youth Version—Revised
The FSI-YV-R is a broad-spectrum questionnaire consisting of 29 items. It measures the following seven domains: Working Memory (WORK, 3 items), Shifting (SHIFT, 3 items), Inhibition (INH, 3 items), Anger (ANG, 3 items), Future Orientation (ORT, 3 items), Problematic Substance Use (SUB, 4 items), and Coping (COP, 5 items). It also contains the Sexual Problems items (SEX, 5 items), which were retained in the questionnaire but excluded from the factor analyses. The subscales WORK, SHIFT, and INH are part of the second-order latent construct Executive Functioning (EXE).
Working Memory refers to one’s ability to plan and organize one’s work. The subscale Shifting/Cognitive Flexibility relates to one’s skills to shift from one situation to the other, which in turn depends on flexibility, and shifting and dividing attention. The Inhibition subscale refers to a person’s ability to inhibit one’s emotional or behavioral response to an impulse or to stop (intended) behavior. The Anger subscale, on the other hand, refers mostly to feelings of anger and rage, but also covers irritation, frustration and annoyance. The Future Orientation subscale refers to the presence (or absence) of thoughts, ideas and intentions concerning the future. The Susceptibility subscale refers to the youngsters’ tendency to value someone else’s opinion over their own, and their willingness to succumb to peer pressure. The Problematic Substance Use subscale contains items referring to the urge to use alcohol and/or drugs and the problems associated with the patient’s problematic substance use. The Coping subscale refers to one’s ability to adequately deal with tension and distress. The Sexual Problems subscale, finally, refers to uncontrollable sexual thoughts, feelings, urges, and/or behaviors such as watching pornographic material. In the Appendix, the FSI-YV-R items are listed, as well as the subscales they represent.
The respondents rated, on a 5-point Likert type scale, the degree to which they thought, felt or acted on the statement over the last two weeks. Answering categories were: 1. “(almost) never”, 2. “sometimes”, 3. “occasionally”, 4. “often”, 5. “(almost) always”. High scores on the subscales indicate increased deficits in cognitive, behavioral, and/or emotional functioning.
Analysis
SPSS and AMOS version 25 were used to analyze the data. Normality tests at subscale level showed that all the subscales were below (<±.1) the threshold of 3, above which the normal distribution has been severely violated (Newsom, 2015). Subscale level normality tests were performed since forensic complaints tend to be skewed with scores peaking in the lower score range.
Test-retest reliability
To examine the test-retest reliability, Intraclass Correlation Coefficients (ICCs) were calculated for 46 patients who completed the FSI-YV-R twice. The average period between the two measurements was 19 days (SD = 8.5).
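As an illustration of how a test-retest ICC can be obtained from two administrations, the sketch below computes a single-measure, absolute-agreement ICC from a two-way ANOVA decomposition. The ICC form is an assumption, since the text does not state which variant was used, and the function name and example scores are hypothetical.

```python
def icc_absolute_agreement(scores):
    """Single-measure, absolute-agreement ICC (assumed form, often written ICC(A,1)).

    scores: list of rows, one row per patient, one column per measurement occasion.
    """
    n = len(scores)          # number of patients
    k = len(scores[0])       # number of occasions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients (rows), occasions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical subscale scores for five patients at test and retest.
example = [[6, 7], [10, 9], [4, 4], [12, 13], [8, 8]]
print(round(icc_absolute_agreement(example), 3))
```

A consistency-type ICC would drop the occasion term from the denominator; which form is appropriate depends on whether systematic shifts between the two measurements should count as disagreement.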
Convergent and discriminant validity
When performing a CFA, it is recommended to check for convergent and discriminant validity issues in the model. Following Gaskin (2016), several measures were investigated (thresholds in parentheses): Composite Reliability (CR > .70) to examine reliability; Average Variance Extracted (AVE > .5) to test for convergent validity; and Maximum Shared Variance (MSV < AVE) and the square root of AVE (which should be greater than the inter-construct correlations) to test for discriminant validity.
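The checks listed above can be sketched in a few lines. The example assumes the commonly used formulas based on standardized factor loadings (CR as the squared sum of loadings over itself plus the summed error variances, AVE as the mean squared loading, MSV as the largest squared inter-construct correlation); the function names and input values are hypothetical and are not taken from the study.

```python
from math import sqrt


def composite_reliability(loadings):
    """CR from standardized loadings; error variance of each item is 1 - loading^2."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def maximum_shared_variance(correlations):
    """MSV: largest squared correlation with any other construct."""
    return max(r ** 2 for r in correlations)


def validity_checks(loadings, correlations):
    """Apply the thresholds named in the text to one construct."""
    cr = composite_reliability(loadings)
    ave = average_variance_extracted(loadings)
    msv = maximum_shared_variance(correlations)
    return {
        "reliable (CR > .70)": cr > 0.70,
        "convergent (AVE > .50)": ave > 0.50,
        "discriminant (MSV < AVE)": msv < ave,
        "discriminant (sqrt(AVE) > |r|)": sqrt(ave) > max(abs(r) for r in correlations),
    }


# Hypothetical construct: three items, correlations with two other constructs.
print(validity_checks([0.81, 0.76, 0.70], [0.52, 0.35]))
```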
Fit indices and thresholds
Previous investigators have recommended the use of multiple indices when evaluating the fit of a structural model (Hu & Bentler, 1995; Marsh et al., 2004). Based on these authors and the review by Hooper et al. (2008), several absolute and incremental fit indices were included to examine measurement invariance of the FSI-YV-R across age groups. The χ2/df ratio (CMIN/DF) should preferably lie around 2.0 (Tabachnick & Fidell, 2007). Optimal chi-square values are non-significant (p > .05). For the additional absolute fit indices, a value ≤ .06 is needed for the RMSEA index, and for SRMR a value close to .08 is considered best (Hu & Bentler, 1999). RMSEA values in the range of .08 to .10 indicate mediocre fit, and values above .10 indicate poor fit (Browne & Cudeck, 1993). Kline (2010) suggests that for the incremental fit indices CFI and NNFI, values above .90 are adequate, although values above .95 are more desirable (Hu & Bentler, 1999).
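To make these cut-offs concrete, the following sketch applies them to one set of fit statistics. It is only an illustration: the function names are hypothetical, the label for RMSEA values between .06 and .08 is an assumption (the text does not name that band), and "close to .08" for SRMR is operationalized here as ≤ .08. The example call uses the values reported above for the second-order executive functioning model.

```python
def classify_rmsea(rmsea):
    """Label RMSEA using the cut-offs cited in the text."""
    if rmsea <= 0.06:
        return "good"
    if rmsea < 0.08:
        return "acceptable"  # assumption: this band is not labeled in the text
    if rmsea <= 0.10:
        return "mediocre"
    return "poor"


def classify_incremental(value):
    """Label CFI or NNFI: above .95 desirable, above .90 adequate."""
    if value > 0.95:
        return "desirable"
    if value > 0.90:
        return "adequate"
    return "inadequate"


def summarize_fit(chi2, df, p, rmsea, srmr, cfi, nnfi):
    """Collect the indices and their verbal labels in one dictionary."""
    return {
        "chi2/df": round(chi2 / df, 3),
        "chi2 non-significant": p > 0.05,
        "RMSEA": classify_rmsea(rmsea),
        "SRMR <= .08": srmr <= 0.08,  # assumption: 'close to .08' read as an upper bound
        "CFI": classify_incremental(cfi),
        "NNFI": classify_incremental(nnfi),
    }


# Values reported for the second-order executive functioning model.
print(summarize_fit(chi2=50.64, df=24, p=0.048, rmsea=0.084,
                    srmr=0.0521, cfi=0.947, nnfi=0.920))
```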
Cheung and Rensvold (2002) proposed relying on the difference in the comparative fit index (Bentler, 1990), ΔCFI, to judge the adequacy of invariance assumptions. Based on their simulation study, which analyzed the behavior of several fit indices, they found that ΔCFI was the only fit index that was not correlated with the overall fit value of the preceding model. They proposed rejecting the invariance hypothesis when there is a decrease of .01 or larger in CFI (ΔCFI ≤ −.01). The ΔCFI was calculated by subtracting the CFI of the less constrained model from that of the more constrained model. At a more detailed level, the critical ratio (z-statistic) of differences test provided by Gaskin (2016) was used to test whether the regression weights of the two age groups differ. When the absolute critical ratio exceeds 1.96 for a regression weight, the difference in that path between the groups is significant at the 95% confidence level (p ≤ .05).
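The path-level comparison can be illustrated with a standard z-test for the difference between two independently estimated regression weights. This is a sketch under that assumption; the spreadsheet tool cited in the text may compute its critical ratio differently, and all names and numbers below are hypothetical.

```python
from math import erfc, sqrt


def critical_ratio_difference(b1, se1, b2, se2):
    """z-statistic for the difference between two independent estimates."""
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-statistic."""
    return erfc(abs(z) / sqrt(2))


def weights_differ(b1, se1, b2, se2, critical=1.96):
    """True when the absolute critical ratio exceeds the 95% cut-off."""
    return abs(critical_ratio_difference(b1, se1, b2, se2)) > critical


# Hypothetical unstandardized weights (and standard errors) for one path
# in the younger and the older age group.
z = critical_ratio_difference(0.82, 0.11, 0.74, 0.09)
print(round(z, 2), round(two_sided_p(z), 3), weights_differ(0.82, 0.11, 0.74, 0.09))
```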
Single-group and multi-group confirmatory factor analyses
Preliminary separate single-group CFAs were conducted, followed by MGCFA to test for measurement and structural invariance across age groups, using the Bollen-Stine bootstrap technique (95% CI). The number of bootstrap samples was set at 1,000, as recommended by Cheung and Lau (2008), to obtain stable probability estimates. The hypothesized model is presented in Figure 1.

Figure 1. Seven-factor FSI-YV-R model with a second-order factor for executive functioning.
Following Brown (2006), measurement invariance was examined for the following six MGCFA models:
Configural invariance (Model 1)
In this baseline model, it is tested whether the proposed factor structure is equal across the two groups. In other words, the items should exhibit the same configuration of salient and non-salient factor loadings across groups. Failure to demonstrate configural invariance (also called pattern invariance) indicates that different constructs were measured across groups.
Metric (weak) invariance (Model 2)
Measurement weights (factor loadings) are constrained to be equal across groups in order to test whether respondents across groups attribute the same meaning to the latent construct under study. Metric invariance (also called factorial invariance) allows a meaningful comparison of relationships (unstandardized regression coefficients, covariances) between the latent construct and other concepts across groups (Baumgartner & Steenkamp, 1998).
Scalar (strong) invariance (Model 3)
Both configural and metric invariance are tested by using information on the covariances between the items. They are not sufficient if the goal of the analysis is to compare means across groups. To justify comparing means, scalar invariance is necessary. In the scalar invariance model, the item intercepts are constrained to equality to test whether the meaning of the levels of the underlying items (intercepts) are equal in both groups.
Error variance invariance (Model 4)
In this so-called strict factorial model, equality constraints across groups are specified for the measurement residuals. Testing for the equality of between-group residual variance determines whether the scale items measure the latent constructs with the same degree of measurement error. Vandenberg and Lance (2000) recommended that the evaluation of error variance invariance be left to the researcher’s discretion because, if scalar invariance holds, a group difference in the residual variances is indicative only of a difference in the reliabilities of the observed scores. Hence, the group difference is compensated for if the comparison is made at the latent variable level. Following this rationale, significant improvement in fit is interpreted as a difference in measurement reliability (i.e., random noise) rather than evidence of bias. The error invariance test should only proceed if (at least partial) metric and scalar invariance have been established first.
Next, structural invariance of the latent variables was tested (i.e., variances, covariances, and latent means). It should be noted that these structural parameters describe characteristics of the population from which the sample was drawn and thus nonequivalence in structural parameters does not represent critiques of the measures themselves, but rather reflects differences in the distribution of the underlying construct between the two groups (i.e., “true” substantive differences; Adamsons & Buehler, 2007). The following sequence of structural invariance models was applied (Vandenberg & Lance, 2000).
Factor variance invariance (Model 5)
In this model, the equality of the factor variances is measured. Invariance of factor variance indicates that the range of scores on a latent factor does not vary across groups.
Factor covariance invariance (Model 6)
In this model, the equality of the covariances between latent constructs is measured. If the covariances are invariant, the correlations between latent constructs are invariant across groups. Assuming measurement and structural invariance, latent mean differences across age groups were estimated, fixing the latent mean values to zero in the younger age group (reference group).
The process of MGCFA model fitting from steps 1–6 yielded a nested hierarchy of models in which each model contained all the constraints of the prior model, and thus, each was nested within the previous models.
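The nesting of Models 1–6 can be made explicit with a small sketch. It is illustrative only: the constraint labels paraphrase the descriptions above, the names `STEPS` and `cumulative_constraints` are hypothetical, and no model is estimated.

```python
# Illustrative sketch: cumulative equality constraints of Models 1-6.
# Labels paraphrase the text; nothing is estimated here.
STEPS = [
    ("Model 1: configural", None),  # same factor pattern, no equality constraints
    ("Model 2: metric", "factor loadings"),
    ("Model 3: scalar", "item intercepts"),
    ("Model 4: error variance", "measurement residuals"),
    ("Model 5: factor variance", "factor variances"),
    ("Model 6: factor covariance", "factor covariances"),
]


def cumulative_constraints(steps):
    """Return (model name, constraints in force) pairs, each step adding one set."""
    in_force = []
    result = []
    for name, added in steps:
        if added is not None:
            in_force = in_force + [added]
        result.append((name, tuple(in_force)))
    return result


MODELS = cumulative_constraints(STEPS)

for name, constraints in MODELS:
    print(f"{name}: {', '.join(constraints) or 'same factor pattern only'}")
```

Because every model carries all constraints of its predecessor, each adjacent pair can be compared with a difference test on the fit statistics.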
Results
Subscale reliability and (intraclass) correlation coefficients
In Table 2 the Intraclass Correlation Coefficients (ICCs) are presented for the total group.
FSI-YV-R: Intraclass correlation coefficients (ICCs).
Note: EXE = executive functioning; WORK = working memory; SHIFT = shifting/cognitive flexibility; INH = inhibition; ANG = anger; ORT = future orientation; SUB = substance use problems; COP = coping. Latent constructs 1a through 1c are the first-order latent factors of the second-order latent factor EXE.
In general, the ICCs were in the range of acceptable to good test-retest reliability coefficients.
Convergent and discriminant validity
As can be seen from Table 3, no reliability, convergent validity, or discriminant validity issues emerged during testing. All values were well within the specified ranges.
Reliability, convergent and discriminant validity of the FSI-YV-R.
NB. EXE=executive functioning; ANG=anger; ORT=future orientation; SUB=substance use problems; COP=coping. CR=Composite Reliability, AVE=Average Variance Extracted, MSV=Maximum Shared Variance. Inter-construct correlations with the square root of AVE in the diagonal.
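For readers unfamiliar with the indices in Table 3, the following sketch shows how CR, AVE, and MSV are commonly computed from standardized factor loadings and inter-construct correlations. The loadings and correlations below are hypothetical and are not taken from the FSI-YV-R data. The function names are illustrative, and the formulas assume standardized loadings with uncorrelated errors.

```python
import math


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - loading ** 2 for loading in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


def maximum_shared_variance(correlations):
    """MSV = largest squared correlation with any other construct."""
    return max(r ** 2 for r in correlations)


# Hypothetical values for one construct (NOT from the study).
loadings = [0.70, 0.80, 0.60]
correlations_with_others = [0.40, 0.55]

cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
msv = maximum_shared_variance(correlations_with_others)

# Discriminant validity check: the square root of AVE (the diagonal of the
# table) should exceed every correlation with the other constructs.
discriminant_ok = all(abs(r) < math.sqrt(ave) for r in correlations_with_others)

print(f"CR = {cr:.3f}, AVE = {ave:.3f}, MSV = {msv:.3f}, discriminant ok: {discriminant_ok}")
```

Comparing the square root of AVE with the inter-construct correlations is equivalent to checking that MSV is smaller than AVE.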
Single and multigroup CFA
The configural FSI-YV-R model was tested for 12–15-year olds and 16–18-year olds. Fit indices were in the ranges of adequate fit for 12–15 year olds, χ2 = 311.090; df = 239; Bollen-Stine bootstrapped p = .471; χ2/df = 1.302; RMSEA = .064, 90%CI = .042–.084; SRMR = .0878; CFI = .914; NNFI = .901; PNFI = .625, and good fit for 16–18 year olds, χ2 = 273.231; df = 239; Bollen-Stine bootstrapped p = .729; χ2/df = 1.143; RMSEA = .041, 90%CI = .000–.063; SRMR = .0793; CFI = .964; NNFI = .959; PNFI = .674.
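As a plausibility check, two of the reported indices can be recomputed from the χ2 values, degrees of freedom, and group sizes (N = 74 and N = 85). The sketch below assumes the common single-group RMSEA point estimate with N − 1 in the denominator. The software used for the analyses may apply a slightly different formula, and the function names are illustrative.

```python
import math


def chi_square_ratio(chi2, df):
    """Relative chi-square: chi2 divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """Single-group RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Chi-square, df and group size as reported for each age group.
younger = {"chi2": 311.090, "df": 239, "n": 74}
older = {"chi2": 273.231, "df": 239, "n": 85}

for label, group in (("12-15", younger), ("16-18", older)):
    ratio = chi_square_ratio(group["chi2"], group["df"])
    print(label, round(ratio, 3), round(rmsea(**group), 3))
```

Under this assumption the sketch reproduces the reported χ2/df (1.302 and 1.143) and RMSEA (.064 and .041) to three decimals.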
Measurement and structural invariance across age groups
A series of hierarchical measurement models were used to test for age group measurement and structural invariance. Table 4 presents the fit indices for the nested measurement and structural invariance models for 12–15-year olds and 16–18-year olds.
FSI-R youth version: Model fit indices for measurement invariance across age groups (N12–15 yrs. = 74, N16–18 yrs. = 85).
NB. First and second order invariance were tested.
*Bollen-Stine Bootstrap.
The configural model (Model 1) attained adequate fit to the data with values well within the thresholds, indicating that the age groups share the same underlying pattern as measured with the FSI-YV-R, and that corresponding subsets loaded on the same factors. The full metric invariance model (Model 2) resulted in a decrease of fit according to the significant chi-square difference, Δχ2(19) = 85.614, p ≤ .001, and the ΔCFI > −.01. Based on the critical ratio (z-statistic) of differences, the factor loading of item 30 from the Substance use scale, ‘Others felt that I was using too much alcohol and/or drugs’, had the largest deviation (8.59) from the standard normal z-distribution (−1.96 < z < 1.96). The constraint on the factor loading of this item was released, resulting in a better fit of the model (Model 2a). Scalar invariance was established in Model 3 (which was nested in Model 2a) with intercepts constrained to be equal between the two age groups. Despite the significant differences between the two age groups in residuals (Model 4), measurement invariance of the FSI-YV-R was confirmed with partial metric and full scalar invariance being established.
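The two decision rules used in this step, the chi-square difference test and the critical ratio for differences, can be sketched as follows. This is not the procedure used in the study. To stay within the Python standard library, the p-value relies on the Wilson-Hilferty normal approximation to the chi-square distribution; an exact test would use a chi-square distribution function from a statistics package. The function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def chi_square_p_approx(chi2, df):
    """Approximate upper-tail p-value (Wilson-Hilferty cube-root transformation)."""
    k = 2.0 / (9.0 * df)
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - k)) / k ** 0.5
    return STANDARD_NORMAL.cdf(-z)


def exceeds_critical_ratio(z, alpha=0.05):
    """True if |z| lies outside the two-sided critical bounds (about 1.96 for alpha = .05)."""
    critical = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return abs(z) > critical


# Values reported in the text for Model 2 versus Model 1.
p_metric = chi_square_p_approx(85.614, 19)
item_30_flagged = exceeds_critical_ratio(8.59)

print(f"approximate p = {p_metric:.2e}; item 30 flagged: {item_30_flagged}")
```

With the reported values, the approximate p-value falls far below .001 and the critical ratio of 8.59 lies well outside the ±1.96 bounds, consistent with releasing the loading constraint for item 30.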
The next Models 5 and 6 tested the structural invariance describing the characteristics of the population from which the sample was drawn. From Table 4 it can be seen that both models fitted the data well. All fit indices were within the required ranges of adequate to good fit.
Latent means for age groups (12–15 years, 16–18 years)
Since measurement and structural invariance across age groups were established, the latent means between the groups were calculated. These are presented in Table 5.
FSI-YV age groups: Bias-corrected means for 16–18-year olds (N = 85) with 12–15 year olds (N = 74) as the reference group.
The youngest age group served as the reference group against which the latent means from the oldest age group were compared. Positive values indicate that the mean of the older age group was higher. The factor mean invariance model showed adequate fit to the data: Bollen-Stine bootstrapped χ2 = 706.358, df = 553, p = .648; χ2/df = 1.277; RMSEA = .042, 90%CI = .032–.051; SRMR = .0927; CFI = .915; NNFI = .915; PNFI = .701. As can be seen from Table 5, no differences were found between the two age groups in problematic executive functioning, anger, future orientation, substance use, and coping.
Discussion
The present study described the development of a self-report questionnaire (FSI-YV-R) measuring a wide range of known forensic intra-personal problems, such as executive function (EF), anger, future orientation, substance abuse and deficits in coping strategies. The following research questions were posited: 1) Does the FSI-YV-R contain a stable factor structure in which the separate constructs are sufficiently distinctive?, 2) Is the FSI-YV-R structurally and measurement invariant across two age groups (12–15 years and 16–18 years)?, and 3) When invariance has been established, do the age subgroups differ in test scores (i.e. forensic complaints)?
It was expected that the FSI-YV-R concepts would prove to be measurement and structural invariant across the age subgroups. It was furthermore expected that the younger age group in particular would show more deficits in executive function, since the development through early to middle and late adolescence is associated with changes in brain structure (Crone, 2009), changes in decision making (Schiebener et al., 2015) and changes in impulse control (Petrich & Sullivan, 2020).
As anticipated and in answer to research question 1, the validation process resulted in a model that showed good fit for the data with working memory, shifting, and inhibition as part of a second-order construct of executive functioning. In addition to that, in answer to research question 2, the final model proved to be measurement and structural invariant across age-groups. However, at metric level the model proved partially invariant indicating that some factor loadings differed between the two age groups. More specifically, one item in the Substance use subscale (‘Others felt that I was using too much alcohol and/or drugs’) exceeded the critical ratio of differences. In 12–15-year olds this item was less relevant than in older juveniles as indicated by the low factor loading in this group. After releasing the factor loading constraint, partial metric invariance was met. With regard to research question 3, the younger age-group showed no differences in executive functioning as compared to the older age-group, contrary to our expectations. Similarly, differences in anger, future orientation, substance misuse, and coping were not significant. These results do not corroborate the findings in previous studies (e.g., Iselin & DeCoster, 2012; Spruit et al., 2017).
Several hypotheses can explain this difference. Firstly, in the international literature adolescence is sometimes divided into early, middle and late adolescence (e.g., Gardner et al., 2008), and sometimes into early and late adolescence (e.g., Boelema et al., 2014). Regarding the concept of executive functioning, a significant difference seems to exist between the ability to understand the risks associated with behavior (typically by about the age of 14) and the ability to inhibit the risky behavior (typically at age 20; Giedd, 2004; Sowell et al., 1999). Since the mean age (16 years) of the adolescents under study was well below the age of 20, this might explain the absence of differences in EF between the two age groups. Indeed, Pharo et al. (2011) found higher levels of overall risk-taking in adolescents (13–17 years old) compared to emerging adults (18–22 years old).
Secondly, the absence of EF differences could lie in the heterogeneity of the respondents. Our sample comprised respondents with several disorders, more or less evenly divided among the subgroups. Van Zonneveld (2019) concluded that different disorders bear different EF deficits, in different developmental stages, providing evidence that the forensic group is a multi-faceted, multi-faced group. For example, EF are known to be impaired in ADHD (Toplak et al., 2009), which shows a stable impairment throughout adolescence into (young) adulthood (Biederman et al., 2007), but also a different developmental path, as witnessed by mainly inhibitory problems during childhood but more pronounced working memory problems during adolescence (Tillman et al., 2015). Additionally, Matthys et al. (2013) point to the existence of EF problems in Conduct Disorder (CD), mainly impaired cognitive control. The inclusion of ADHD and CD in both groups (12–15 and 16–18) could have had a neutralizing effect on the potential (developmental) difference between the age groups. Therefore, future research should focus on a balanced and more differentiated group of respondents. In addition, levels of IQ (Moffitt, 1993; Séguin et al., 1999) and level of schooling should be taken into account when comparing two (age) groups, since inhibitory skills are deficient in groups with a lower level of schooling (Borrani et al., 2015). In the current study the older age group, in general, shows a lower degree of education. This might have affected the potential difference in inhibitory skills, amongst others.
Thirdly, Toplak et al. (2013) raise an important question concerning the measurement of EF. They pose that performance‐based measures (i.e. neuropsychological tasks) provide an indication of processing efficiency (the algorithmic mind) whereas rating measures (questionnaires) provide an indication of how an adolescent would (like to) act (the reflective mind). Most of the research concerning forensic patients (e.g., Herrero et al., 2010; Long et al., 2015; Schiebener et al., 2015; Tonnaer et al., 2016) made use of performance-based tasks instead of questionnaires. Nordvall et al. (2017) suggest that, although working memory and inhibitory skills are less developed during early adolescence, the reflective ability is evenly developed in early and late adolescents. Hence, the nature of problems as experienced by both age groups is equal in severity but might differ when measured with performance-based tasks.
Fourth, in this research respondents were recruited from different branches of the same outpatient center (de Waag). To reach the minimum number of respondents, all juveniles in treatment were asked to participate. This implies that ‘experienced’ outpatients, at the end of their treatment, fall in the same group as starting outpatients. For measuring the validity of our questionnaire this is, methodologically, a permitted method. Nonetheless, the outcome might give a blurred effect, perhaps leveling the different scores. Future validation research, therefore, should take this issue into account, not only comparing forensic juveniles with a control group (i.e. ‘non-forensic’), but also measuring the potential treatment effect (i.e. the difference between periodic measurements).
Limitations and suggestions for future research
While having successfully developed a self-report measurement of a wide range of forensic complaints among youth treated in outpatient forensic care, the FSI-YV-R mainly focused on intra-personal factors and not so much on the influence of these factors on environmental processes such as the susceptibility to peer pressure and social support. It has been found in previous studies that when adolescents grow older, the impact of risk factors both in the environmental domain and in the individual domain decreases (Van der Put et al., 2012). Nevertheless, during adolescence the impact of parenting style shifts towards peer influence and school, indicating the importance of adequately fitting interventions to the identified and topical risk factors. Similarly, Spruit et al. (2017) indicate the importance of adjusting offender treatment to the age-specific criminogenic needs. These do not only encompass (intra)individual factors, but comprise amongst others substance misuse, emotional problems and schooling (Assink et al., 2015). Friedman et al. (2016) compared EF in late adolescence (17-year olds) to EF in early adulthood (23-year olds). In their twin study they found that EFs are quite stable by late adolescence, yet are still sensitive to environmental influences. Although future research will need to replicate this for forensic youth and young adults, researching the relation between individual factors and environmental influences is equally important. This relation between the intra-personal FSI-YV-R subscales and environmental factors could be investigated in future research using, for instance, the CBSA (Treffers et al., 2002), as well as the Youth Self Report (YSR, Verhulst et al., 1997), both self-report instruments measuring emotional and behavioral problems. Additionally, the FSI-YV-R could be used to assess the parent’s or caregiver’s opinion of risk factors or global functioning concerning the juvenile in question.
Despite reliability of self-report questionnaires in forensic care, at least for predicting recidivism (Kroner & Loza, 2001; Mills et al., 2003), additive information by important others may well contribute to a well-balanced treatment design.
Measurement and structural invariance were established across age groups; however, it is also important to investigate the invariance of the FSI-YV-R between male and female participants. In the present study these analyses could not be accomplished due to the small subsample of female participants (n = 32). It is known from previous studies that sex differences exist. Gender tends to moderate the maturation of the neurocognitive skills (Jacobson et al., 2002), the desistance from antisocial or criminal behavior (Spielberg et al., 2015) and the susceptibility to peer pressure and environmental influences (Steketee et al., 2013). Huizinga and Smidts (2011) found that boys show significantly more executive problems than girls, and this effect was consistent with advancing age of the participants. Notably, age and gender also tend to have an interaction effect, in particular in middle and late adolescence: maturation of the working memory is less influenced by age among 17 to 19-year-old girls than among their male counterparts (Malagoli & Usai, 2018). Hence, a larger sample is needed to investigate the invariance across gender.
Since all forensic treatment is targeted at reducing offence-related cognitions, behaviors, and feelings, changes must be assessed with the FSI-YV-R (as part of the external validity) and related to recidivism information. From a risk management point of view, Moffitt’s (1993) division between adolescence-limited and life-course-persistent offenders would be useful. It would be worth investigating both the prevalence and pattern of forensic symptoms as well as the change during treatment.
Clinical implications
Low self-control contributes significantly to crime and recidivism, even after correcting for IQ and socio-economic status (Moffitt et al., 2011). Ratchford and Beaver (2009) convincingly reason that self-control in fact is part of EF. Although mainly influenced by environmental factors (school, parenting, peers and siblings), self-control can change (Roberts et al., 2006). In clinical practice exactly this lies at the heart of forensic treatment. The FSI-YV-R is expected to yield important information in guiding therapists in both risk management as well as enhancing self-efficacy and self-control. Unfortunately, at this moment little can be said about the additive value of distinction in age groups and their consecutive executive functioning and other forensic symptoms. More research is needed to determine the usefulness and feasibility of changing—reducing—forensic symptoms as measured with the FSI-YV-R. For now, no differences in age groups could be determined. However, considering the aforementioned limitations in this research and inconsistencies in earlier work, more research is needed to replicate or reject our findings.
Notwithstanding, to our knowledge the FSI-YV-R is the first broad spectrum questionnaire for forensic use that is viable, valid and both measurement and structural invariant across age groups. For more practical use, it is important that future research establishes a norm group and measures the expected difference in symptoms between forensic and non-forensic adolescents and emerging adults.
Appendix
Forensic symptom inventory—youth version revised (FSI YV-R)
Answering categories: 1 = (almost) never, 2 = sometimes, 3 = occasionally, 4 = often, 5 = (almost) always.
When an item is not applicable, score 1.
Note: EXE = executive functioning; WORK = working memory; SHIFT = shifting/cognitive flexibility; INH = inhibition; ANG = anger; ORT = future orientation; SUB = substance use problems; COP = coping; SEX = sexual problems.
*Mirrored items.
