Abstract
Detection of intellectual disability (ID) in the penitentiary system is important for the following reasons: (a) to provide assistance to people with ID in understanding their legal rights and court proceedings; (b) to facilitate rehabilitation programs tailored to ID patients, which enhances their quality of life and reduces their risk of reoffending; and (c) to provide a reliable estimate of the risk of offence recidivism. This requires a short assessment instrument that provides a reliable estimation of a person’s intellectual functioning at the earliest possible stage of this process. The aim of this systematic review is (a) to provide an overview of recent short assessment instruments that provide a full-scale IQ score in adult prison populations and (b) to achieve a quality measurement of the validation studies regarding these instruments to determine which tests are most feasible in this target population. The Preferred Reporting Items for Systematic reviews and Meta-Analyses Statement is used to ensure reliability. The Satz-Mögel, an item-reduction short form of the Wechsler Adult Intelligence Scale, shows the highest correlation with the gold standard and is described as the most reliable. Nevertheless, when it comes to applicability in prison populations, the shorter and less verbal Quick Test can be preferred over others. Without affecting these conclusions, major limitations emerge from the present systematic review, which give rise to several important recommendations for further research.
Introduction
Detecting intellectual disability (ID) in prison populations is important (a) in assisting detainees with ID in understanding their legal rights and court proceedings (Ericson & Perlman, 2001), (b) in facilitating ID-tailored rehabilitation programs (Jones, 2007; Lindsay et al., 2002) aimed at increasing quality of life (Riches, Parmenter, Wiese, & Stancliffe, 2006) and preventing reoffending (Willis, Prescott, & Yates, 2013), and (c) because ID should be considered as a factor in assessing reoffending risks (Lofthouse et al., 2013). ID prevalence studies within the prison system reveal ID rates among detainees varying from 0.5% to 45.0% (Crocker, Cote, Toupin, & St-Onge, 2007; T. Holland, Clare, & Mukhopadhyay, 2002; S. Holland & Persson, 2011; McBrien, 2003). This remarkable divergence could be caused by the plurality of methods used in the assessment of ID and the associated assessment studies, which may vary in quality (Uzieblo, Winter, Vanderfaeillie, Rossi, & Magez, 2012). Comparing and evaluating these studies is problematic, as many of them include no (Lyall, Holland, Collins, & Styles, 1995; McNulty, Kissi-Deborah, & Newsom-Davies, 1995) or insufficient (Uzieblo et al., 2012) information concerning the ID assessment method. Studies that do show such information to a satisfactory extent use different instruments, with a wide variation in psychometric properties, such as reliability and validity (Uzieblo et al., 2012).
For diagnosing ID, objectified limitations in both intellectual functioning and adaptive behaviour, with the onset before the age of 18, need to be determined (American Association on Intellectual and Developmental Disabilities, 2010). However, due to the high comorbidity rates of mental disorders in prisoners (Diamond, Wang, Holzer, Thomas, & des Anges, 2001; Dias, Ware, Kinner, & Lennox, 2013) and the restrictions in time and resources which the prison environment entails, widely deployed ID assessment is not feasible in such populations (Kaal, 2010). Psychological appointments with detainees often have to fit within 1 hr, including the transport to and from the ward. Furthermore, the conventional assessment time for neuropsychological screening is around 45 min. Regarding the determination of intellectual limitations, the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 2008) is regarded as the gold standard (Hartman, 2009). An alternative to this intensive and time-consuming instrument is the use of a short assessment instrument, which provides a reliable estimation of a person’s full-scale intelligence quotient (FIQ). On the subject of adaptive limitations, a lack of clarity about the classification of this broad concept results in a wide variety of assessment instruments measuring different components of adaptive behaviour (i.e., conceptual, practical, and social skills; Ditterline & Oakland, 2009; Kramer, Coster, Kao, Snow, & Orsmond, 2012; Schalock, 2004). Because of this lack of clarity, the scope of the present review is narrowed to the assessment of FIQ.
In the past decade, a variety of short FIQ assessment instruments has been investigated, of which several were reported as reliable (Hayes, 2002; Mason & Murphy, 2002; McKenzie et al., 2012). However, given that validation studies seem to be more susceptible to bias (i.e., design-related, verification, and review bias) than other forms of research (Korevaar, van Enst, Spijker, Bossuyt, & Hooft, 2014), it is important to assess such studies for quality (McBrien, 2003; Simpson & Hogg, 2001; Uzieblo et al., 2012). So far, no systematic quality assessment regarding validation studies on intelligence assessment instruments in prison populations has been performed.
Given the urgency for the detection of ID in prison populations, the aims of the present systematic review are (a) to provide an overview of recent intelligence tests, with a maximum administration time of 45 min, validated in adult prison populations, and (b) to achieve a quality measurement of the validation studies regarding these instruments, to determine which are most suitable in this target population.
Method
To minimise the risk of poorly reported key information, which often happens in systematic reviews (Dixon, Hameed, Sutherland, Cook, & Doig, 2005; Hemels, Vicente, Sadri, Masson, & Einarson, 2004; Moher, Simera, Schulz, Hoey, & Altman, 2008; Wen et al., 2008), we used the 27-item checklist of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement (Moher, Liberati, Tetzlaff, & Altman, 2010) for processes described in this section.
Phase 1: Search Methods for Assessment Instruments
Literature searches were systematically performed in four different bibliographic databases (PubMed [MEDLINE], Embase, PsycINFO, and Cochrane Library). Studies in any language, up to January 2015, could be included. Variations and synonyms on the following keywords were generated and used as search terms: intellectual disability, prison, and assessment instruments. The complete list of the used keywords is available on request. Reference Manager (Thomson Reuters) was used to manage the process after retrieving the citations from the literature search. After manual deduplication among the different databases, there were 17,177 studies remaining. To systematically identify relevant literature, two independent reviewers (A.Y.M.E. and A.D.D.) screened the titles, abstracts, and full articles. In the first screening, all studies with a topic in the title that was potentially related to intelligence assessment were included for further review. In cases of doubt or disagreement, these studies were included for further analysis. In the second screening, the abstracts of the remaining 968 studies were reviewed using the following exclusion criteria: (a) studies that exclude prison populations, (b) studies that exclude adults, (c) intervention studies, and (d) studies which criticise risk-assessment instruments. In both screenings, the studies which met the above-mentioned criteria according to both reviewers were included. In cases of doubt, the study was also included for further research; in cases of disagreement, a decision was made based on the full text. 
In determining eligibility, the full text of the remaining 291 studies was reviewed in more detail using the following exclusion criteria: (a) instruments which do not provide an FIQ, (b) instruments intended for youth (age ≤18), (c) instruments with an administration time longer than 45 min, (d) instruments which were not described in the literature in the past 10 years, and (e) instruments which were only used as a reference test. A total of 41 instruments were described in the 161 remaining studies. For this search, the four-phase flow diagram of the PRISMA statement was used (see Figure 1), based on the Quality of Reporting of Meta-analyses (QUOROM) statement. PRISMA is an evidence-based checklist, which can be used as a basis for reporting research. It aims to help authors improve the reporting of systematic reviews and meta-analyses (Moher et al., 1999).
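The screening funnel reported above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the study counts stated in the text; the stage labels and the function name `exclusions_per_stage` are illustrative and not taken from the PRISMA statement.

```python
# Study counts as reported in the text for the Phase 1 search.
# The stage labels are illustrative, not official PRISMA wording.
STAGES = [
    ("records after deduplication", 17_177),
    ("abstracts screened", 968),
    ("full texts assessed", 291),
    ("studies retained", 161),
]


def exclusions_per_stage(stages):
    """Return (transition, number excluded) for each consecutive pair of stages."""
    result = []
    for (name_a, n_a), (name_b, n_b) in zip(stages, stages[1:]):
        if n_b > n_a:
            raise ValueError(f"count increases from '{name_a}' to '{name_b}'")
        result.append((f"{name_a} -> {name_b}", n_a - n_b))
    return result


if __name__ == "__main__":
    for transition, excluded in exclusions_per_stage(STAGES):
        print(f"{transition}: {excluded} excluded")
```

Under these counts, 16,209 records were set aside at title screening, 677 at abstract screening, and 130 at full-text assessment.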

Figure 1. Database search for screening instruments.
Phase 2: Search Methods for Validation Studies
Another literature search was systematically performed in the four different bibliographic databases. All remaining 41 instruments (see Figure 2) were used as independent search terms. After screening titles and abstracts, full texts were retrieved for all potentially appropriate studies and, again, independently assessed by two reviewers (A.Y.M.E. and A.D.D.). Included in this search were (a) validation studies conducted in a forensic population addressing one of the selected instruments, from here defined as the index tests (see Figure 1); (b) cohort studies or cross-sectional studies conducted in a forensic population addressing the validation of one of the index tests, using one or more tests as a reference; (c) studies of test accuracy in a forensic population, where participants are screened by different index tests verified by the same reference (see Table 1); and (d) case-control studies in a forensic population, where participants had been selected on the outcome side, that is, a sample of patients with ID and a sample of non-ID adults.

Figure 2. Database search for validation studies.
Table 1. Study Characteristics.
Note. FIQ = full-scale intelligence quotient; QT = Quick Test; WAIS = Wechsler Adult Intelligence Scale; WAIS-R = Wechsler Adult Intelligence Scale–Revised; WAIS-III = Wechsler Adult Intelligence Scale–Third Edition; SILS = Shipley Institute of Living Scale; sGIT = short Groninger Intelligentie Test; RSPM = Raven’s Standard Progressive Matrices; Beta = Revised Beta; K-BIT = Kaufman Brief Intelligence Test; Satz-Mögel = Satz-Mögel short form of the WAIS.
During the assessment of eligibility of the remaining studies, the following exclusion criteria were used: (a) studies of nonforensic populations or forensic populations who, at the moment of assessment, were not residential in a detention setting (e.g., residents of a psychiatric hospital or prisoners on probation), (b) studies which combine the above-mentioned populations with a prison population in which results for this latter group were not analysed separately, (c) studies of juveniles, and (d) studies combining juveniles with adults, in which results for this latter group were not analysed separately.
Finally, a total of 15 studies were included in qualitative synthesis of this review (see Table 1).
Quality Measurement
The remaining 15 validation studies were judged with the revised version of the Quality Assessment Tool for Diagnostic Accuracy Studies–Second Edition (QUADAS-2; Whiting et al., 2011). The QUADAS-2 is a systematically developed and evaluated tool to assess the quality of primary studies of validation, and it is recommended for use in systematic reviews of validation by the Agency for Healthcare Research and Quality, Cochrane Collaboration (Reitsma et al., 2009), and the U.K. National Institute for Health and Clinical Excellence (Whiting et al., 2004; Whiting, Rutjes, Reitsma, Bossuyt, & Kleijnen, 2003; Whiting et al., 2011). By using four key domains (participant selection, index test, reference standard, and flow & timing) in two aspects (Risk of Bias and Applicability Concerns), the QUADAS-2 allows a transparent rating of an instrument’s qualities (Whiting et al., 2011). “Risk of Bias” (divided into the four above-mentioned domains) represents the methodological quality of the study, whereas “Applicability Concerns” (divided into three of the four domains, see Table 2) represent the applicability of the index test to clinical practice and within the population under study. All domains were rated as “high,” “low,” or “unclear,” using 11 signalling questions (see the appendix for the questionnaire). These questions are described in more detail in the “Results” section. A positive response to all questions implies a “low” Risk of Bias. If any question was answered “negative,” the Risk of Bias could be judged as “high.” The unclear category was used when insufficient data were reported to enable a judgment to be made. Questions regarding applicability concerns were judged using specific guidelines based on the review question. These guidelines are specified in the “Results” section. Following the suggestions of Whiting et al. (2011), all cases were verified by a second reviewer.
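The rating rule described above can be expressed compactly. The sketch below is an illustration under stated assumptions, not the QUADAS-2 tool itself: each signalling question is coded as `True` (positive), `False` (negative), or `None` (insufficient data reported); a negative answer is treated as always yielding "high", although the text says only that it could; and a negative answer is assumed to take precedence over missing information. The function name and the example answers are hypothetical.

```python
from typing import Dict, List, Optional

Answer = Optional[bool]  # True = positive, False = negative, None = not reported


def risk_of_bias(answers: List[Answer]) -> str:
    """Rate one domain from the answers to its signalling questions."""
    if not answers:
        raise ValueError("a domain needs at least one signalling question")
    if any(a is False for a in answers):
        return "high"      # assumption: any negative answer is rated as high risk
    if any(a is None for a in answers):
        return "unclear"   # insufficient data to make a judgment
    return "low"           # all questions answered positively


def rate_study(domains: Dict[str, List[Answer]]) -> Dict[str, str]:
    """Apply the rule to every domain of a single study."""
    return {name: risk_of_bias(answers) for name, answers in domains.items()}


if __name__ == "__main__":
    # Hypothetical answers for one study; not data from the review.
    example = {
        "participant selection": [True, True, True],
        "index test": [True, None],
        "reference standard": [True, False],
        "flow & timing": [True, True, True],
    }
    for domain, rating in rate_study(example).items():
        print(f"{domain}: {rating}")
```

Applicability concerns are not covered by this sketch, because the text states that they were judged with guidelines specific to the review question rather than with a fixed rule.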
Table 2. Overview of Study Qualities With the Assessed Correlations.
Note. QUADAS-2 = Quality Assessment Tool for Diagnostic Accuracy Studies–Second Edition; K-BIT = Kaufman Brief Intelligence Test; RSPM = Raven’s Standard Progressive Matrices; Satz-Mögel = Satz-Mögel short form of the WAIS; SILS = Shipley Institute of Living Scale; sGIT = short Groninger Intelligentie Test; WASI = Wechsler Abbreviated Scale of Intelligence; + = low risk; – = high risk; ? = risk is unclear.
r values >.20 and ≤.40 indicated poor, >.40 and ≤.60 moderate, >.60 and ≤.80 substantial, and >.80 excellent agreement between constructs. r values ≤.20 indicated no correlation between constructs (Altman, 1991).
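The cut-offs in this note translate directly into a small lookup. The following sketch applies them as written; the function name `agreement_label` is illustrative, and treating negative coefficients as "none" is an assumption based on a literal reading of the "≤.20" rule, since the note does not address negative values.

```python
def agreement_label(r: float) -> str:
    """Label a correlation coefficient with the cut-offs quoted in the table note."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r <= 0.20:
        return "none"         # includes negative r (assumption, see lead-in)
    if r <= 0.40:
        return "poor"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "substantial"
    return "excellent"


if __name__ == "__main__":
    # Example coefficients chosen to fall on either side of the cut-offs.
    for r in (0.15, 0.29, 0.54, 0.79, 0.97):
        print(f"r = {r:.2f}: {agreement_label(r)}")
```

Because the upper bounds are inclusive, a coefficient of exactly .80 is labelled substantial and only values above .80 count as excellent.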
Results
Search for Assessment Instruments and Validation Studies
In the first phase (search for assessment instruments), 161 studies met the inclusion criteria as described in the “Method” section, together pointing out a total of 59 instruments. Exclusion criteria restricted 51 of these instruments, leaving eight index tests for review. These are the Kaufman Brief Intelligence Test (K-BIT), the Quick Test (QT), the Raven’s Standard Progressive Matrices (RSPM), the Revised Beta (Beta), the Satz-Mögel short form of the WAIS (Satz-Mögel), the Shipley Institute of Living Scale (SILS), the short Groninger Intelligentie Test (sGIT), and the Wechsler Abbreviated Scale of Intelligence (WASI). An overview of the full process of inclusion of instruments, as well as the excluded instruments, is available on request. The eight remaining instruments were individually administered, with a maximum administration time of 45 min. The eight tests were described in a total number of 15 studies (see Table 1 for the study characteristics). All 15 studies were subjected to the QUADAS-2, the results of which are presented in Table 2.
Quality Measurement
Risk of Bias
Five of the 15 studies clearly describe the inclusion and exclusion process which was used to select participants. To limit selection bias for this domain, the participant selection should have taken place consecutively or at random. Six studies provided sufficient evidence to conclude that the conduct of interpretation of the test results of the index test did not induce bias. In three studies, this conduct of interpretation of the reference standard did not lead to bias. The question of whether the reference standard was likely to correctly classify the target condition could be answered positively if the reference test used included evidence of standardisation, reliability, validity, and criterion-related validity (Glascoe, 2005). Regarding the domain “flow & timing” (concerning the “Method” section of the study), five studies described a clear procedure, giving sufficient evidence to conclude that there was no selection bias for this domain. Only one study showed no risk of bias across all domains (Nelson, Edinger, & Wallace, 1978).
Applicability Concerns
Concerns about applicability of the participant selection may exist if participants included in the study differ from those targeted by the review question, in terms of severity of the target condition, demographic features, presence of a differential diagnosis or comorbid conditions, setting of the study, and previous testing protocols (see the appendix). In four of the 15 studies, such a concern can be raised. A positive response regarding the applicability of the index test was based on the guidelines of Glascoe (2005): A test was considered to be good when the process and the outcome of standardisation, reliability, validity, and criterion-related validity were described accurately. Three index tests (the K-BIT, the QT, and the RSPM) fulfilled the criteria of this domain. Looking at the applicability of the reference standard, the reference used could be free of bias, but the extent to which the target condition is a distinction of the general prison population might create bias (e.g., by the presence of psychiatric disorders). In 12 studies, there was no concern regarding the applicability of the reference standard. Five studies proved to lack concern regarding applicability in all three domains (Ciula & Cody, 1978; DeCato & Husband, 1984; Gendreau, Wormith, Kennedy, & Wass, 1975; Habets, Jeandarme, Uzieblo, Oei, & Bogaerts, 2014; Hogsett, 1972).
Intelligence Assessment Instruments
Results of the instruments are expressed in terms of concurrent validity. An overview of the correlation rates between the index test and the IQ as assessed by the reference standard is presented in Table 2. The classification of Altman (1991) was used to rate the r values as provided in the studies. The study qualities based on QUADAS-2 outcome values are also presented in Table 2. Following the suggestions of Whiting et al. (2011), we did not exclude any of the instruments based on poor QUADAS-2 results. Subsequently, all eight instruments are described in the following section.
K-BIT
The K-BIT (Kaufman & Kaufman, 1990) consists of two parts, one verbal part (i.e., vocabulary) and one nonverbal part (i.e., matrices). In total, it takes 15 to 30 min to administer. The K-BIT was normed in a sample of 2,022 participants aged 4 to 90 years, where it showed excellent reliability coefficients. In addition, a substantial correlation was found with the WAIS-R. In the literature, however, there is a gap in studies examining the K-BIT in adult samples. Only Klinge and Dorsey (1993) analysed data from the K-BIT to learn how useful this test is for evaluating the forensic psychiatric population they studied. They did not use a reference test, but compared K-BIT test scores with educational level and found a moderate correlation (r = .511; p < .001) between the two. While discussing the use of assessment tools in general, the authors concluded that the K-BIT is less than adequate for evaluating abilities in forensic psychiatric patients. In the discussion, they recommended focusing on the behavioural components of adaptive behaviour in the assessment of ID.
QT
The QT (Ammons & Ammons, 1962) is the test most frequently mentioned in the literature. According to Zagar et al. (2013), it is the most often used norm-referenced receptive vocabulary test in prison. The QT measures verbal intelligence in 5 to 15 min and gives an estimation of mental age, IQ score, and a percentile. It measures verbal intelligence quotient (VIQ) with words matched to three 4-choice sets of pictures. All six studies found significant positive correlations with the WAIS (Wechsler, 1955) or WAIS-R, ranging from moderate or substantial (r values from .54 to .77; p < .001) to excellent (r values from .86 to .93; no p value given). Three of these studies concluded the correlations to be excellent (Ciula & Cody, 1978; DeCato & Husband, 1984; Hogsett, 1972). Quality assessment demonstrated high quality scores for all three studies (see Table 2).
RSPM
The RSPM (Raven, 1938) is a nonverbal intelligence test and is considered to be a good measure for general intelligence (Schroth, 1983; Tulkin & Newbrough, 1968). Because it is nonverbal, it is suitable for all nonnative speakers. Templer (1992) established prison norms for the RSPM based on the data of 1,126 inmates. However, no psychometric properties were calculated in this study, because of the lack of a reference standard. After Templer (1992), Habets et al. (2014) were the next to publish on the use of the RSPM in a forensic setting. In their study, they compared repeated measures of IQ scores, including the RSPM, in forensic psychiatric patients. A moderate correlation (r = .54; p < .001) was found between the RSPM IQ scores and the FIQ scores of the WAIS-III, suggesting that IQ scores on the one test significantly predicted those on the other. However, a significantly lower mean score was found on the RSPM compared with the FIQ scores of the WAIS-III (mean difference = 10.58, SD = 15.15; t = 6.13; no p value given).
Beta
The Beta (Kellogg, Morton, Lindner, & Gurvitz, 1946) is a nonverbal intelligence test with an administration time of 15 min. It consists of six subtests which measure perceptual–motor skills in the same way as the performance intelligence quotient (PIQ) index of the WAIS. The validity of the Beta has often been assessed, but so far no unequivocal conclusions can be drawn from the literature. Correlations between the FIQ score of the Beta and the FIQ score of the WAIS range from .37 to .83 (Matarazzo, 1972). The three included studies found correlations between these two measures ranging from .29 to .63. However, quality assessment demonstrated low quality scores for all three studies (see Table 2).
Satz-Mögel
The Satz-Mögel (Satz & Mögel, 1962) is an item-reduction short form of the WAIS. It has an administration time of 30 to 45 min and was evaluated in a prison population by Nelson et al. (1978). An excellent correlation (r = .97; p < .001) with the WAIS was found and the use of the Satz-Mögel was recommended. However, because the study sample only included men under the age of 26, Nelson et al. (1978) questioned the applicability of the study results in general prison populations.
SILS
The SILS (Shipley, 1940) is a 15-min self-administered questionnaire consisting of 60 items, and is designed to assess mental impairment through the comparison of vocabulary and abstract thinking scores. The SILS is a commonly used instrument; however, it has been described in a prison population in only one study (Fowles et al., 1986). This study found a positive correlation with the WAIS-R (r = .78; p < .001). However, it was only significant on group level, not when administered individually. Fowles et al. (1986) argued that this might be due to their use of the WAIS-R, while SILS standards are based on the first edition of the WAIS. The need for new SILS norms before further use of this instrument was underlined. Quality assessment demonstrated a low quality score for the study of Fowles et al. (see Table 2).
sGIT
Short forms of full-scale IQ tests can be classified into selected subtest short forms (including the sGIT and WASI; see below) and item-reduction short forms (including the Satz-Mögel; see above). The short version of the GIT (Luteijn & Barelds, 2004) consists of six of the original 10 subtests and takes 25 min to administer. Habets et al. (2014) were the first to describe this instrument in forensic patients; all participants were staying in one of the selected medium security forensic wards. Habets et al. found a substantial correlation (r = .79; p < .001) between the sGIT IQ scores and the FIQ scores of the third edition of the WAIS (WAIS-III; Wechsler, 1997), with a significantly higher mean score on the sGIT IQ (mean difference = 13.49, SD = 12.03; t = −6.44; no p value given). They concluded that the outcome on one test significantly predicted that on the other.
WASI
The WASI (Wechsler, 1999) is a selected subtest short form of the WAIS, which exists in a two-subtest version (15 min) and a four-subtest version (30 min). Both versions of the WASI were validated by Axelrod (2002) using the WAIS-III as reference. Neither variant demonstrated the desired accuracy; the WASI scores were consistently lower than the WAIS-III scores. In the study of Thompson, Roberts, and Whiddon (1979) in which the four-subtest version is investigated in a prison population, significant differences (t = 3.61; p < .01) were found between the total scores on the WASI and the WAIS.
Discussion
The relevance of the current review is described in the “Introduction” section. Bringing together the clinical practice and research about ID will improve forensic health care and contribute to the reduction in risk of reoffending, on one hand by better predicting this risk (Lofthouse et al., 2013) and on the other hand by preventive interventions as described by Willis et al. (2013) in the Good Lives Model. Despite the importance of accurate detection of ID in the penitentiary system, studies show a broad variety of ID assessment methods and a low quality of associated assessment studies. Therefore, the first aim of this systematic review was to provide an overview of intelligence tests used in an adult prison population in the past 10 years, with a maximum administration time of 45 min. This overview is presented in Table 2. The second aim was to achieve a quality measurement of the validation studies regarding these instruments, of which results are also shown in Table 2. Based on these two factors, we conclude which instruments are most suitable in the target population according to the available literature about this subject.
First, we will address the applicability of the instruments and the limitations of the reviewed studies. Subsequently, we will make recommendations for future research.
Applicability of the Instruments
According to our results, recommendations for use can be made regarding two tests, that is, the QT and the Satz-Mögel. Both instruments provide an FIQ score shown to have excellent correlations with the WAIS or WAIS-R, based on studies of relatively high quality (see Table 2). Although the Satz-Mögel shows a slightly higher correlation with the WAIS, the QT is more often and more recently used in prison populations. This might be due to the high percentage of foreigners in prison populations, for whom the QT is better suited. The Satz-Mögel has a strong language component and is, therefore, particularly suitable for native speakers. The QT, by contrast, does not require the participant to read or write, but still requires knowledge of words. Furthermore, in two of the studies (Hogsett, 1972; Simon, 1995), the QT was examined in a psychiatrically disturbed prison population and is, therefore, already validated in such a population. These studies still show at least substantial correlations between the QT and the WAIS, which confirms the reliability of the test. The Satz-Mögel has not yet been assessed in such a population. Overall, the Satz-Mögel can be assumed to be the most reliable measure of intelligence level based on the correlation with the WAIS. This makes sense, because it includes exactly the same subtests as the WAIS. However, when it comes to complexity of the target population, including comorbidity, language barrier, and common time pressure, the shorter and less verbal QT can be preferred over the Satz-Mögel.
Further research is required for the K-BIT and the SILS. Results are promising regarding the K-BIT, indicating that this test is a reliable measure of intelligence level. However, due to the lack of a reference test in relation to the K-BIT, standardisation in a prison population still needs to be done for this assessment instrument. No conclusions can be drawn about the SILS because, thus far, no reliable study has been conducted for this instrument.
Based on the aforementioned study results, the sGIT, the Beta, the RSPM, and the WASI cannot be recommended for providing an estimation of intellectual functioning in a prison population. Significant differences were found between the test results of both the sGIT and the RSPM and the WAIS-III, and between the WASI and the WAIS. Concerning the Beta, multiple studies were performed in prison populations, but test results remain ambiguous.
Limitations
When interpreting the reviewed studies, two major limitations emerge. First, many short FIQ instruments were excluded from the present systematic review because their use has not yet been investigated in prison populations. However, tests which are not established to be unsuitable could still be applicable. In addition, for several instruments which were included in this review, a comparison of study results was not possible because they were only assessed in one study. Only three of the eight instruments were described in more than one study (the Beta, the RSPM, and the QT).
The second limitation affects the quality of the included studies. The study characteristics (see Table 1) show that small sample sizes make significance and representativeness of some results questionable (e.g., DeCato & Husband, 1984; Hogsett, 1972; Simon, 1995). Important information about the studied populations, the instruments, the psychometric properties, or statistical support is often missing. This results in the fact that instruments cannot be recommended without the support and evidence of reliable literature. In addition, the study samples were not comparable in all cases. In some studies, the intelligence level of psychiatrically disturbed prisoners was assessed, while other studies focused on regular prisoners. This may have contributed to inconsistency between study results. The QUADAS-2 results (see Table 2) show that all included studies contain qualitative methodological limitations and are at risk of bias in at least one domain. Three studies show a similar risk in all four domains. Given the high probability that results of these latter studies are unreliable, no conclusions can be drawn about the SILS, which was only assessed in one study with aforementioned low quality.
Recommendations for Future Research
Based on the results of the present systematic review, five recommendations for future research can be made.
There is an opportunity to investigate other, and perhaps more recent, intelligence assessment instruments which could be applicable but were, due to the absence of validation studies in adult prison populations, excluded from the present systematic review.
Studies must be performed using the most recent available editions of the instruments. This also applies to the reference standards; for example, the WAIS, WAIS-R, and WAIS-III are no longer representative standards for intelligence (Flynn, 2009); future research requires validation using the WAIS-IV.
As one’s mental condition can influence one’s level of cognitive functioning (Millan et al., 2012), it is essential that instruments are validated for psychiatrically disturbed prisoners, alongside studies conducted in regular prisons. Regarding the sGIT, the K-BIT, and the QT, studies in such populations have already been performed. For the remaining instruments, this still needs to be done. This will also contribute to the comparability of the study results.
As addressed in the “Introduction” section, the scope of the present systematic review was limited to intellectual functioning, without affecting the restrictions that assessment of adaptive functioning entails. However, agreement on the validation and methodological quality of instruments assessing adaptive behaviour should also be a goal of future research, certainly with a view to the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013), which emphasises adaptive functioning rather than intellectual functioning in diagnosing ID.
An additional remark needs to be made regarding the use of intelligence assessment instruments in large groups of psychiatrically disturbed prisoners. When it comes to the complexity of this population and the feasibility in everyday practice, the use and efficiency of full-scale intelligence tests need to be considered in general. In the present systematic review, intelligence screening instruments were excluded, although they are considered to be clinically useful (Cook, Cleland, & Huijbregts, 2007). For some purposes, the use of a short estimation of a person’s intelligence level can be recommended over the use of a widely deployed full ID assessment. The potential and methodological quality of this group of instruments should be a further subject for future research.
Summary and Concluding Discussion
As mentioned in the “Introduction” section, the availability of a reliable and feasible assessment instrument for estimating intelligence level is necessary to provide a helpful ID diagnosis for court, treatment purposes, and reoffending risk restriction. Based on the available data on current short assessment instruments used in adult prison populations, the most reliable estimation of intelligence level appears to be possible using the Satz-Mögel. However, when it comes to the applicability in prison populations, the shorter and less verbal QT can be preferred over the use of the Satz-Mögel. Furthermore, there are more instruments for the assessment of intelligence and for the assessment of adaptive behaviour, for which high quality studies into their validation in prison populations still need to be performed. The same applies for screening instruments; the potential and methodological quality of this group of instruments should be a subject for future research.
Appendix
The Revised Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS-2).
Study:
Instrument 1:
Instrument 2:
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
