Abstract
The Psychological Inventory of Criminal Thinking Styles (PICTS) is a self-report measure which is given to individuals who have been involved in criminal activity or are known to the Criminal Justice System. Although the PICTS is extensively used and its psychometric properties supported within the research, no critique has yet specifically assessed its utility with forensic populations. Therefore, the aim of the critique was to analyse the scientific and psychometric properties of the PICTS. Adaptations have been made to the PICTS from the first to the fourth revision due to issues with the reliability and validity of the measure. Although the PICTS does have satisfactory internal and retest reliability, the reliability of the validity scales within the measure has continued to be poor. Furthermore, no independent research on the measure has been undertaken. As such, gaps in research and issues that need to be addressed have been highlighted. Practical implications, limitations, and future research are also discussed.
Introduction
Walters (1990) hypothesised that the criminal lifestyle is underpinned by a number of dynamic and interactive factors derived from the lifestyle theory. This theory defines criminal lifestyle in terms of interpersonal intrusiveness, irresponsibility, self-indulgence, and social rule breaking and hypothesises that these behaviours arise from three interlinked variables: conditions, choices, and cognition. Conditions refer to internal factors (intelligence and temperament), external influences (peers and family), and synergistic factors (Person × Situation) which impact on an individual’s actions. Although these conditions do not necessarily cause criminal behaviour, they can make an individual more vulnerable to future criminal involvement which, in turn, contributes to a choice being made. Choice is defined as the range of options available to an individual in life. Cognition refers to the rationalisation of choice decisions in order to reduce or eliminate any feelings of guilt that might arise from these. Therefore, once a choice has been made, an individual enters the transitional phase and a complex series of thoughts and ideas evolve into a lifestyle supporting belief system. This occurs through the development of a schematic subnetwork of goals, values, thinking styles, and attitudes, and signals the beginning of the commitment an individual makes to a particular lifestyle.
This schematic subnetwork then becomes self-perpetuating, making commitment stronger and increasing the likelihood of the lifestyle being maintained. A process of change may occur for an individual depending on the extent to which their lifestyle is effective in helping them to achieve their goals (Shover, 1996). Walters (2000) suggested that for change to occur and a new lifestyle to be adopted, the individual must consider taking responsibility for their choices, increase their self-confidence to change, give this meaning, and consider the impact their current actions are having on the community.
Applying this theory to a criminal lifestyle, Walters (2006) suggested that individuals, during the initial phase, will be driven by a curiosity and excitement to engage in crime as well as by factors such as socialisation, stress, and availability of resources. Other factors such as existential fear and peer influence may also serve as a means of considering criminality (Fisher & Bauman, 1988). The incentive for a continued criminal lifestyle develops when the individual feels their self-efficacy increasing and constructs thinking styles supportive of crime. Involvement also continues due to a fear of losing the benefits that criminal activity affords. For individuals who enter the maintenance phase of this criminal lifestyle, Walters (2006) suggests that they will have now developed a congruent belief system which serves to help them maintain the behaviour. Change is reported to occur only when the individual either reaches burnout or maturity. Here an offender begins to experience either decreased pleasure in maintaining a criminal lifestyle or has an increased desire for goals which are incompatible with their lifestyle (e.g., family commitments). Having drawn on this theory and building on Yochelson and Samenow’s (1976) work, Walters (1990) developed eight thinking styles (Mollification, Cut-Off, Entitlement, Power Orientation, Sentimentality, Superoptimism, Cognitive Indolence, and Discontinuity), which he believed to be linked to a criminal lifestyle. This resulted in the development of the Psychological Inventory of Criminal Thinking Styles (PICTS).
Overview of the Tool
The PICTS is a self-report measure which is given to respondents who have been involved in criminality or are known to the Criminal Justice System. The first version was published in 1990 and comprised 32 items (four items per thinking style). Walters (1990) reported that the eight thinking styles attained stable coefficients, that is, the items correlated well with past criminality. However, he acknowledged that it was not possible to determine whether respondents were being honest, which reduced confidence in the measure. The PICTS was subsequently revised to include two validity scales: a Confusion scale designed to identify a “fake bad,” malingering, or “yes-saying” response set, and a Defensive scale to assess respondents trying to create a favourable impression of their psychological stability. A third edition was published in 1992, in which the number of items for each thinking style scale was doubled from four to eight. However, Walters (2002a) cites an unpublished study that he undertook in 1994 which revealed that, despite the changes, the validity scales reduced the accuracy of the PICTS and weakened its utility. To improve the psychometric properties of the PICTS, eight items were removed and the scoring of the Defensive scale was reversed. Walters (2001a) also questioned the importance of including a scale which measured fear of change as conceptualised by the lifestyle theory. Additional scales from the existing PICTS measure were also created as a result of two factor analyses (Walters, 1995; Walters, Elliott, & Miscoll, 1998) and content analysis (Walters, 2002b). This led to the publication of Version 4, which now comprises 80 items organised into the following scales:
Two validity scales: Confusion and Defensiveness.
Eight thinking styles.
Fear of Change Scale.
Two general content scales: Current Thinking and Historical Thinking.
Four factor scales: Problem Avoidance, Interpersonal Hostility, Self-Assertion, and Denial of Harm.
To complete the PICTS, individuals are advised to have at least an average reading age of between 11 and 12 years old. There is no time limit although under normal conditions, individuals should be able to complete the PICTS in 15 to 30 min. Respondents are required to rate each item on a 4-point Likert-type scale where they consider if they Strongly Agree, Agree, are Uncertain, or Disagree with each statement.
The items are totalled to produce raw scores, which are converted into t scores (a transformation of raw scores to standard scores), with the Defensive scale reverse scored. The interpretative guidelines within the manual suggest that the first step is to determine whether the Confusion or Defensive t score indicates a compromised response set, read in light of the total score on the Lifestyle Criminality Screening Form–Revised (LCSF-R; Walters, White & Denney, 1991). A Confusion t score above 70 indicates a fake bad response set; where the LCSF-R total score is 10 or higher, a cut-off of 80 is applied instead. A Defensive t score above 65 suggests a fake good response set; where the LCSF-R total score is 4 or less, a cut-off of 70 is used instead. In either case, the results are compromised and should be interpreted with caution, if at all. This is considered rare and usually pertains to less than 5% of the sample. The second step is to explore the content scales. A t score above 55 on the Current scale suggests that a criminal belief system is still active within the respondent, whereas a t score above 55 on the Historical scale suggests that a criminal belief system has been active in the past. However, if there is no elevation on either scale, this implies that the belief system is absent, weak, or hidden and the assessor is advised not to interpret the measure any further. The third step is to examine the eight thinking styles to identify the top three styles in terms of their t score elevation. The guidelines suggest that these are averaged and compared to the contrasting average t scores (above 50) of the remaining thinking styles. This reveals whether the profile is differentiated (i.e., the score differs by more than five t scale points) or undifferentiated (i.e., does not differ).
While both can still be interpreted, a differentiated profile allows the assessor to advise the respondent on how best to challenge their thoughts and beliefs associated with the thinking style. Although the scores on the factor scale and fear of change scale can be incorporated into the interpretation, Walters advises that they are there to supplement the thinking styles, and as such, they are not necessary for the assessor to review.
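The three interpretive steps can be summarised as a short decision procedure. The following Python sketch is illustrative only: the function name, argument names, and returned labels are hypothetical and are not taken from the PICTS manual. The cut-offs are those quoted in the text, and the "(above 50)" qualifier on the remaining styles is not modelled because the text does not specify how it is applied.

```python
from statistics import mean


def interpret_picts(confusion_t, defensive_t, lcsf_r_total,
                    current_t, historical_t, style_t_scores):
    """Hypothetical sketch of the three interpretive steps described in the text.

    style_t_scores maps each of the eight thinking-style names to its t score.
    """
    # Step 1: validity scales, with cut-offs adjusted by the LCSF-R total score.
    confusion_cutoff = 80 if lcsf_r_total >= 10 else 70
    defensive_cutoff = 70 if lcsf_r_total <= 4 else 65
    if confusion_t > confusion_cutoff:
        return "possible fake bad response set: interpret with caution, if at all"
    if defensive_t > defensive_cutoff:
        return "possible fake good response set: interpret with caution, if at all"

    # Step 2: general content scales (Current and Historical).
    if current_t <= 55 and historical_t <= 55:
        return "no elevation: belief system absent, weak, or hidden; stop here"

    # Step 3: average the top three thinking styles and compare with the rest.
    ranked = sorted(style_t_scores.values(), reverse=True)
    gap = mean(ranked[:3]) - mean(ranked[3:])
    return "differentiated profile" if gap > 5 else "undifferentiated profile"


# Invented t scores for one respondent (style abbreviations as in the table notes).
styles = {"Moll": 70, "Co": 68, "En": 66, "Po": 52,
          "Sn": 51, "So": 50, "Ci": 53, "DS": 54}
print(interpret_picts(60, 50, 5, 60, 50, styles))
```

Returning at the first failed step mirrors the guidance that a compromised or unelevated protocol should not be interpreted any further.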
Characteristics of a Good Psychometric Measure
Kline (1986) and Field (2009) argue that the essential components to assess the quality of a psychometric measure are the reliability, validity, discriminatory power, and the appropriateness of the normative data. These concepts are discussed to determine whether the PICTS is an accurate, replicable, valid, standardised measure, free from predictive bias (Schultz & Whitney, 2005).
Reliability
Reliability is fundamental to psychometric measurement and refers to the degree to which a tool measures a construct and produces consistent results over time and under different circumstances (Howitt & Cramer, 2005). If a test is reliable then the difference observed in a respondent’s scores can be attributed to the changes in them rather than being considered as a result of the tool.
Internal Reliability
Kline (1993) defines internal reliability as the extent to which each item within the psychometric tool consistently measures the same construct. If internal reliability is achieved, it can be assumed that different items in the test contribute equally to the overall score. The most commonly employed measure to assess internal reliability is Cronbach’s alpha and, in line with George and Mallery’s (2003) description, the alpha ranges from 0 to 1. When considering the cut-off to determine internal reliability, Nunnally (1978) argued that a minimum coefficient of .70 is necessary; yet Kline (2000) suggests that acceptable internal reliability ranges from .60 to .70. However, Cattell (1978) argued that a measure which comprises many items and has very high internal reliability can be contaminated by a bloated specific and is thus antithetical to being valid. In the development of the PICTS scale, Walters (1995) reported that there was an acceptable level of internal reliability for the eight thinking styles, which ranged from .59 to .78. However, the internal reliability of the two validity scales was considered to be poor (.42 and .36). In Version 4, the alpha coefficients of internal consistency for each of the 17 scales ranged between .55 and .88 for the male offender cohort tested and between .54 and .88 in the female cohort. Walters concluded that the PICTS possesses moderate to moderately high internal consistency across all scales. These figures appear to be slightly lower than the guidelines for internal reliability, and if the .70 cut-off were to be applied, several scales below .70 would be considered to have questionable internal reliability. Another consideration is that the number of scales has increased over the different versions, and as such, the reliability of the measure may be inflated.
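To make the alpha coefficient concrete, the following Python sketch computes Cronbach’s alpha from item-level responses using the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name and the response data are invented for illustration and are not PICTS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent, all for the same scale."""
    k = len(responses[0])                     # number of items in the scale
    items = list(zip(*responses))             # regroup scores item by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented ratings from five respondents on a four-item scale.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}; meets the .70 guideline: {alpha >= 0.70}")
```

Because alpha rises with the number of items as well as with their intercorrelation, a long scale can reach a high value even when individual items are only modestly related.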
Internal reliability has also been reported for the PICTS scales when tested on different cultural samples, and results similar to those of Walters were found (Table 1). However, the studies noted used different versions of the PICTS, with Healy and O’Donnell (2006) not specifying which version they used. It is important to note that the Defensive scale continues to have questionable internal reliability, particularly when using an English sample (Palmer & Hollin, 2003, 2004a). Therefore, it is important for further studies to be conducted, particularly on forensic samples, to determine the applicability of the PICTS.
Internal Reliability for the PICTS Version 4.
Note. PICTS = Psychological Inventory of Criminal Thinking Styles; Cfr = Confusion; Dfr = Defensiveness; Cur = Current Thinking; His = Historical Thinking; Moll = Mollification; Co = Cut-Off; En = Entitlement; Po = Power Orientation; Sn = Sentimentality; So = Superoptimism; Ci = Cognitive Indolence; DS = Discontinuity.
Test–Retest Reliability
For a psychometric measure to have test–retest reliability the tool must yield, in the absence of intervention, the same outcome at different assessment intervals (Kline, 1993). Kline suggests that a correlation analysis is the most effective way to determine test–retest reliability. A minimum level of .70 needs to be achieved because, below this, the standard error increases and renders the interpretation of the data uncertain (Guilford, 1956). However, other factors, such as a respondent being under the influence of medication, can affect scores without necessarily meaning that the measure is unreliable. In addition, it is important to consider the time period used in studies which have measured retest reliability, as a short interval could result in fatigue or in the respondent remembering the questions posed. Kline (2000) believed that at least a 3-month period was sufficient, although he recognised that this was flexible depending on the type of respondent.
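The correlation analysis referred to here is typically a Pearson correlation between scores at the two administrations. The Python sketch below is a minimal illustration under that assumption; the function name and the scores are invented and are not PICTS data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between scores at two administrations."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    return cov / sqrt(var_x * var_y)


# Invented t scores for six respondents tested twice on one scale.
time_1 = [52, 61, 47, 70, 58, 66]
time_2 = [55, 59, 50, 68, 54, 69]
r = pearson_r(time_1, time_2)
print(f"retest r = {r:.2f}; meets the .70 guideline: {r >= 0.70}")
```

A coefficient of this kind reflects the stability of respondents' rank order, so it can remain high even if every respondent's score shifts by a similar amount between administrations.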
Walters (1995) examined the retest reliability of Version 3 of the PICTS, which was administered to 450 inmates over a period of several months and from which norms were obtained for each of the thinking styles. To evaluate the test–retest reliability, one group of 25 participants was randomly sampled to complete the PICTS again after a 2-week period, a second group of 25 participants undertook the PICTS again after 12 weeks, and a third group of 25 participants was retested after both time periods. With the exception of the Defensive scale (.47) at the 2-week retest point, retest reliability ranged from moderately high to high (.72-.85). A weaker correlation was found at the 12-week point, which ranged from .57 to .72 across the validity and eight thinking style scales. Similar findings were achieved when using a female offender sample (Walters et al., 1998), suggesting that the PICTS had good temporal stability.
To address the concerns with the reliability of the validity scales, Walters (2001b) amended the items and found that the correlation coefficients exhibited a level of stability equivalent to that of the thinking styles for Version 4 of the PICTS. However, some of the data were missing for the male sample, with data being available for the 10-week interval only. In contrast, better levels of retest reliability were found for females (.87 and .67 at the 12-week interval). Problems with missing data, together with studies that were either not published or used small samples, were acknowledged by Walters (2002a). This led to him undertaking a meta-analysis as a means of strengthening the evidence for the reliability of the PICTS. The test–retest analysis found that most of the scores for the eight thinking scales remained in the same range between testing. An unexpected result was found when comparing the validity scales between gender groups. The male cohort at the 10-week interval was below the acceptable correlation standard (.64 and .47), whereas the female cohort was within acceptable standards (.87 and .67). This suggests that further research needs to be conducted to understand why these differences occurred.
Overall, the research reviewed implies that there is satisfactory reliability with the eight thinking scales and the content and historical scales. However, as detailed, the figures appear to be slightly lower when applying the guidelines for internal reliability, and it could be argued that the retest time frame was too short. Furthermore, independent research is also required and should be conducted on a wider range of cultural forensic populations for Version 4 of the PICTS. This is key given that the majority of studies have been conducted by the developer of the PICTS whereby an Allegiance effect (Hollon, 2006) may have occurred. However, while Walters makes the PICTS available to qualified researchers and licensed psychologists, these recommendations may prove difficult for those outside of the institutions where the PICTS is readily available, such as the Administrative Office of the U.S. Courts that oversees U.S. Probation and Pretrial Services.
Validity
Validity is the second major characteristic of a psychometric measure, with Kaplan and Saccuzzo (2009) and Kline (1998) framing this concept as a question: Does the psychometric measure measure what it intends to measure?
Face Validity
This refers to the appearance of the items relating to the purpose of the test, that is, are the items of the PICTS considered relevant to the construct of criminal thinking? However, Kaplan and Saccuzzo (2009) advise against considering face validity as assessments will be subjective. Furthermore, if the respondent is able to understand and recognise the purpose of the measure, this may alter their response, particularly among offender populations who attempt to deny or conceal their behaviour on a regular basis (James et al., 2005). In an attempt to resolve these issues, the PICTS includes two validity measures (Confusion and Defensive) to manage response bias. However, issues have been reported with the internal and retest reliability of these scales, raising the question of whether they need to be revised or whether further validity scales should be included. More recently, another response style scale (Infrequency: INF) has been identified for the PICTS, which Walters (2011) hypothesised may supplement or improve upon the Confusion score and as such address the validity concerns noted with this scale on the PICTS. He used a sample of inmates with psychiatric problems in order to identify whether malingering could be predicted by both scales. While the INF scale displayed the characteristics of a good screening measure and was able to identify “fake bad” responses, the study confirmed that the Confusion scale continues to have validity issues. To be confident of the inclusion of the INF scale, further research is needed to understand the underlying structure of malingering in individuals in forensic populations, and the scale needs to be tested on a much larger sample. In turn, this would ensure the assessor was confident that the PICTS presented an accurate view of the respondent.
Content Validity
Content validity is based upon logical evaluation rather than statistical analysis (Kaplan & Saccuzzo, 2009) and focuses on whether the psychometric measure covers all crucial and relevant aspects of the measured concept (Tavernier, Totten, & Beck, 2011). Evaluating the content of the psychometric measure requires careful consideration of the appropriateness of each item, ensuring that test items do not fail to capture elements of the measure. Therefore, if the construct has a clear and consistent definition, the level of content validity should be high (Haynes, Richard, & Kubany, 1995).
In the case of the PICTS, content validity would be the extent to which the measure samples all aspects of criminal thinking. In the development of the original PICTS, Walters (1995) researched the academic literature and also held discussions with offenders who were undertaking offender behaviour interventions. Constructing the psychometric measure in this manner ensured information was firsthand from offenders and so maximised the congruence of the measure with the criminal lifestyle. Walters repeated this procedure for his third version and modified the items accordingly. Walters concluded that as the PICTS is designed to measure the eight thinking styles, it would seem to possess content validity as each scale is devoted to the thinking styles. However, it could be argued that this measure is biased by what Walters deemed as relevant to criminal thinking. For example, Cognitive Indolence continues to remain unique to the PICTS when comparing other measures that focus on cognition.
Construct-Related Validity
While content validity focuses on the inclusion of whether all relevant items relate to what is being measured, construct validity focuses on the theoretical integrity of the measure. This means the degree to which PICTS items relate to theoretical and conceptual understanding of criminal thinking. A typical way in which to establish construct validity is through factor analysis. This helps to determine whether each scale contributes to the scale outcome or whether there are other factors that contribute to the outcome (Kline, 2000). The initial factor analysis undertaken by Walters (1995) resulted in four factors being established from the scales created, which were later incorporated into Version 3 of the PICTS: problem avoidance, interpersonal hostility, self-assertion, and denial of harm. This was then cross-validated in the female offender group study (Walters et al., 1998) and using a goodness of fit index of 0.92 and a root mean squared residual of 0.5, the results revealed a good fit between the two data sets.
Using the original data from Walters’ (1995) study, Egan, McMurran, Richardson, and Blair (2000) carried out a principal components factor analysis on the eight thinking scales. They found one factor on which all of the eight thinking scales loaded, which accounted for 58.8% of the variance in scores. When a two-factor solution was applied, there was some overlap between the two factors extracted, leading to the conclusion that the measure was assessing a unitary construct rather than eight distinct thinking styles. This was supported by Palmer and Hollin (2003), who performed a similar analysis on an English male prisoner sample.
More recently, Walters (2012, 2014) has undertaken a series of factor analyses on the scales of the PICTS and has conceptualised a hierarchical framework to illustrate criminal thinking. In this framework, a higher order construct of General Criminal Thinking (GCT) is overarching; beneath it sit Proactive Criminal Thinking (PCT), which links to the Mollification, Entitlement, Power Orientation, and Superoptimism thinking styles, and Reactive Criminal Thinking (RCT), which encompasses the Cut-Off, Cognitive Indolence, and Discontinuity thinking styles. By analysing the PICTS in this way, a high internal consistency for GCT (.84-.86), PCT (.78-.88), and RCT (.70-.73) has been shown. These findings are promising and have resulted in both clinicians and researchers focusing on this framework. For example, Walters (2016) explored the impact of criminal thinking in adolescence to help understand how the criminal lifestyle develops, specifically, the role of peer influence. The findings supported the view that PCT can mediate the effect of peer influence, whereas RCT mediates the peer selection effect. However, the findings reviewed do not make clear whether the factorial structure of criminal thinking styles holds across different cultures. For example, Bulten, Nijman, and Van Der Staak (2009) found a two-factor structure as did Palmer and Hollin (2003), and factor analysis could not be completed by Megreya, Bindemann, and Brown (2015) due to the small sample size. Therefore, there is a clear need for further research to be undertaken on cross-cultural samples to determine if similar results are achieved.
Another consideration when examining Walters’ findings is that construct validity appears to have been established by correlation with general criminality and does not discriminate between different offender typologies. Lacy (2000) compared the PICTS scores of drug and nondrug offenders. While no significant differences were found for Mollification and Sentimentality, differences on the remaining six thinking scales were significant, with the drug offender population scoring higher. This led to the conclusion that, while the two groups share many of the same thinking styles, the extent to which these thinking styles are used does differ. Low and Day (2017) investigated whether subtypes of violent offenders can be meaningfully identified when using the PICTS. Mean PICTS t scores were calculated and compared with offenders generally. The violent offender group held a moderate level of beliefs supportive of a criminal lifestyle (indicated by the GCT scale score). The group also demonstrated some signs of PCT and RCT, although the difference between these scores was insufficient to infer a trend. Cluster analysis was then used to identify three different violent typologies and compare the PICTS data pre- and post-treatment. Their findings suggest that different types of violent offenders gained differential benefit from the completion of the multimodal violence interventions. Therefore, it was deemed that criminal thinking may not always be a treatment need for all violent offenders, which supports the rationale for the development of more sophisticated assessments to measure cognition.
Construct validity can further be divided into subcategories of convergent and discriminant validity (Haynes et al., 1995). Using a process of pattern matching, Trochim (1989) revealed that the PICTS does measure criminal thinking and meets the minimum standards for construct validity. Furthermore, the PICTS validity scales displayed clear signs of convergent and discriminant validity when correlated with the Personality Assessment Inventory (PAI) impression management scales. This is an interesting finding and adds weight to the utility of these scales given that reliability had been considered questionable. Furthermore, this reinforces other researchers (Matthews & Deary, 1998) who argue that individuals’ criminal thinking should be compared with their basic personality traits as this will contribute significantly to encompassing psychological phenomena.
Criterion-Related Validity
Criterion-related validity can be either concurrent or predictive in nature. Concurrent validity is assessed by considering the extent to which a measure correlates with other validated measures assessing the same construct at the same time (Kline, 1998). Walters (2001b) identified that to assess this type of validity, the PICTS needs to be correlated with measures of prior criminality (i.e., prior arrests, prior incarceration, age at first arrest, and age at first prison sentence). Using the data from his male sample, Walters (1995) found that all the PICTS scales correlated moderately, with the Historical scale providing the best measure of an offender’s past criminal involvement. The PICTS has also been correlated with the LCSF-R (Walters, 1998) and the Psychopathy Checklist–Revised (PCL-R; Hare, 1991), with Walters and DiFazio (2000) concluding that both measures do correlate with the PICTS. Other measures which assess criminal thinking include the Criminal Sentiments Scale (CSS; Gendreau, Grant, Leipciger, & Collins, 1979) and Measures of Criminal Attitudes and Associates (MCAA; Mills, Kroner, & Forth, 2002). However, both focus on the content of what an offender thinks, whereas the PICTS assesses specific criminal thinking styles or criminal thought process (Walters, 1990). Drawing any inferences, therefore, proves difficult as they are not assessing the same construct, despite both focusing on cognition.
Predictive validity is typically assessed using receiver operating characteristic (ROC) analysis and area under the curve (AUC) estimates. Early research focusing on the predictive validity of the PICTS has yielded mixed results in terms of the prediction of future behaviour for both female and male offenders (Palmer & Hollin, 2004b; Walters, 1997; Walters & Elliott, 1999). Palmer and Hollin (2003) nevertheless found some evidence of the PICTS’s utility as a measure of change over the duration of a prison sentence, although further work is needed in this area to draw more concrete conclusions.
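As an illustration of what an AUC estimate represents, the Python sketch below computes it directly as the probability that a randomly chosen recidivist has a higher score than a randomly chosen non-recidivist, with ties counted as one half. The function name, scores, and outcomes are invented for illustration and do not come from any of the studies cited.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores: test scores; outcomes: 1 = reoffended, 0 = did not reoffend.
    """
    positives = [s for s, o in zip(scores, outcomes) if o == 1]
    negatives = [s for s, o in zip(scores, outcomes) if o == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # recidivist scored higher
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented total scores and follow-up outcomes for eight offenders.
scores = [62, 48, 71, 55, 50, 66, 45, 58]
outcomes = [1, 0, 1, 1, 0, 0, 0, 1]
print(f"AUC = {auc(scores, outcomes):.4f}")
```

An AUC of .50 corresponds to chance-level prediction and 1.00 to perfect separation of the two groups, which is why it is a convenient common metric when comparing findings across studies.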
Gonsalves, Scalora, and Huss (2009) used the PCL-R and the PICTS to determine if both measures could predict recidivism. Findings revealed that recidivists scored significantly higher on Factor 2 of the PCL-R and total score of the PICTS. Furthermore, only the Superoptimism factor significantly contributed to the prediction model (Walters, 2005). However, it is important to consider sample type as this study used forensic inpatients rather than prisoners, thus limiting the generalisability of the conclusions made. Nevertheless, they also found that the PICTS improved the predictive utility of the PCL-R suggesting that it may be worthwhile to include this self-report measure in more dynamic assessments.
Walters and Lowenkamp (2016) found that the PICTS was capable of predicting recidivism in a large group of male and female offenders serving community sentences. These findings were comparable to those obtained in studies on prison inmates (Walters, 2012) despite the fact that the mean overall criminal thinking score was 18% lower in their sample than it was in samples of released prisoners. A further study (Walters, Deming, & Casbon, 2015) reviewing male sex offenders released from prison-based sex offender treatment found that the PICTS scores for GCT, PCT, and RCT predicted general and “failed to register” recidivism. However, only GCT and PCT attained incremental validity relative to the actuarial assessment Static-99 but this was only for predicting “failed to register” recidivism. Therefore, while these different research studies have shown that criminal attitudes can predict future outcomes and may be useful in treatment evaluation, there are a number of inconsistent findings which warrant further exploration.
PICTS Normative Data
To determine the utility of a psychometric measure, normative information is an essential requirement as it provides a basis on which test scores can be compared. Normative data for the PICTS were originally collected from an American male sample (Walters, 1995). Information was separated for males in minimum, medium, and maximum security federal prisons (N = 450; 150 from each security level). The norms have not been further separated into age groups within the manual. Walters et al. (1998) widened the sample by exploring the measure on a female American sample (N = 227; 127 state, 100 federal). Similarly, there is an absence of age ranges in the data. This raises questions regarding the generalisability of the measure to the individuals being assessed, particularly if the offender is in early adulthood, that is, 18 years old.
Since the development of PICTS, further analysis has been undertaken on offender populations (Walters & McCoy, 2007) including forensic and civil psychiatric samples (Carr, Rosenfeld, Magyar, & Rotter, 2009; Magyar, Carr, Rosenfeld, & Rotter, 2010). However, both studies used the “PICTS Layperson Edition” to take into account participants who did not have a history of being involved with the Criminal Justice System. While Walters, Felix, and Reinoehl (2009) outlined this version of the PICTS to have similar test–retest reliability and preliminary validity to the standard PICTS, the ability to generalise these findings is reduced. The PICTS has also been translated into Spanish although there appears to be no published research where normative data are available. Emerging data have begun to be published on the PICTS for other cultural populations. This includes incarcerated offenders within a Dutch (Bulten et al., 2009), Egyptian (Megreya et al., 2015), and English sample (Palmer & Hollin, 2003) as well as a community offender–based sample in Ireland (Healy & O’Donnell, 2006). Outcomes for these samples are summarised in Table 2, from which some disparities can be seen. Normative data from the Egyptian sample obtained by Megreya et al. (2015) found that offenders scored higher on five thinking styles (Mollification, Entitlement, Power Orientation, Sentimentality, and Discontinuity) in comparison to English and Dutch offenders. Moreover, English offenders in Palmer and Hollin’s study scored higher on Cognitive Indolence than the Dutch sample. Interestingly, the samples outside of the American studies were male offenders, and as such, utility on a female offender population within these countries remains unanswered. Therefore, it is clear that a more comprehensive review is needed on forensic populations in custody, psychiatric settings, and in the community to understand the reliability and validity of the PICTS further. 
This is key given that the PICTS has been used, alongside other psychometric measures, to evaluate the effectiveness of offending behaviour programmes (Gobbett & Sellen, 2014; Palmer & Humphries, 2017).
Table 2. Normative Data on Differing Cultural Populations From the PICTS.
Note. Cfr = Confusion; Dfr = Defensiveness; Moll = Mollification; CO = Cut-Off; En = Entitlement; PO = Power Orientation; Sn = Sentimentality; SO = Superoptimism; Ci = Cognitive Indolence; DS = Discontinuity.
Gobbett and Sellen (2014) sampled Welsh offenders who attended the Thinking Skills Programme (TSP). Here, paired-samples t tests showed that the pre- and post-programme differences achieved statistical significance on six of the thinking styles (Mollification, Cut-Off, Entitlement, Power Orientation, Superoptimism, and Cognitive Indolence) and the overall total score. For the two remaining thinking styles, while statistical significance was not achieved, post-programme data revealed a large difference in the expected direction for Discontinuity and a small-medium effect in Sentimentality. Therefore, the authors concluded that attending the TSP had a positive effect on offenders’ thinking styles and attitudes. In contrast, Palmer and Humphries (2017) found no significant differences on thinking styles between completers of an unnamed cognitive behavioural programme and non-completers. However, the authors note that, owing to their sample sizes, it is not possible to draw conclusions as to why this was found. Therefore, as outlined, the cultural utility of the PICTS warrants further investigation.
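To illustrate why a pre- and post-programme difference can be large in effect size yet fall short of statistical significance, the following is a minimal Python sketch of the paired-samples t statistic alongside a standardised effect size for paired data (d_z). It is not the analysis used in any of the cited studies; the function name and the scores are hypothetical and serve only to show the calculation.

```python
import math
import statistics


def paired_t_and_effect_size(pre, post):
    """Return (t, d_z) for paired scores.

    Positive values indicate that scores were lower after the programme.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must be equal-length with at least two pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))  # grows with sample size
    d_z = mean_diff / sd_diff                 # does not depend on sample size
    return t, d_z


# Hypothetical scale scores for five respondents (not taken from any study).
pre_scores = [20, 18, 22, 19, 21]
post_scores = [17, 18, 19, 18, 19]
t_stat, d_z = paired_t_and_effect_size(pre_scores, post_scores)
```

Because t equals d_z multiplied by the square root of the number of pairs, the same standardised difference yields a smaller t in a smaller sample, which is one reason effect sizes are worth reporting alongside significance tests.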
The usefulness of comparing the normative data produced is also limited by the different versions of the PICTS used in studies. Comparisons between data sets have been made by those investigating cultural differences using Versions 3 and 4. Furthermore, one study did not state the version used (Healy & O’Donnell, 2006). Given that additional items were added and the scoring on the Likert-type scale changed from Version 3 onward, the utility of the normative data is limited.
Additional Considerations
Another area worthy of discussion is the PICTS’s level of measurement. Kline (1998) outlines how the ideal form of measurement should incorporate a ratio scale as it is based on a true zero point. This provides a meaningful difference between each individual rating on the scale and allows for parametric analysis to be used. Most psychometric measures do not use a ratio scale, and Blaikie (2003) highlighted the continued debate as to whether the data obtained from psychometric scales should be classified as ordinal or interval. Questions are also raised as to whether a midpoint should be used in scales (Garland, 1991) and whether the categories on a scale influence the responses given (Kieruj & Moors, 2010).
Kline (1986) believed that an ordinal scale allows parametric analysis to be used. As the PICTS has 17 scales derived from 80 questions, this is classified as an ordinal scale. In later research, Walters, Hagman, and Cohn (2011) examined the factor structure and underlying latent trait structure of PICTS using item response theory. Results confirmed that the PICTS is capable of measuring criminal thinking at moderate to high levels of the trait dimension, and as such, it has a good level of measurement. However, the Sentimentality scale was considered poor at assessing criminal thinking, which, in turn, lowers the overall internal reliability of the GCT scale. This led to the authors reflecting on the possibility that Sentimentality may actually be assessing an individual’s response style. As such, future research is required to determine whether items other than those on the Sentimentality scale should be removed from the GCT to determine the continued viability of the PICTS.
While ordinal scales are advantageous in terms of ease of data collection and the categorisation of responses, they can introduce bias. The PICTS has two high-response options (agree, strongly agree) in comparison to one lower response option (disagree) as well as a neutral response (uncertain). Therefore, it could be inferred that this increases the likelihood that respondents will endorse higher levels of criminal thinking than is actually the case. Furthermore, the response style effects of satisficing (Krosnick, 1999) and acquiescence (Moors, Kieruj, & Vermunt, 2014) can also affect Likert-type scales. Krosnick (1999) outlined how satisficing occurs when respondents are not able to understand the question or are not motivated to give an opinion and so are more likely to select a neutral response. Acquiescence can occur when the individual finds it harder to disagree than agree with a statement, and factors such as a lack of motivation or tiredness can influence the individual to agree irrespective of the content (Schalast, Redies, Collins, Stacey, & Howells, 2008). Applying this to the PICTS, the third scaling item is labelled as “uncertain” and gives a choice for an individual to remain neutral, and owing to the number of questions, acquiescence may occur. As such, it would be important for the assessor to monitor an offender’s responses during the completion of the PICTS.
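As a rough illustration of how an assessor might screen a completed protocol for these response styles, the sketch below tallies the proportion of each response option. It is a hypothetical aid and not part of the PICTS scoring procedure: the numeric coding of the options, the function names, and the example responses are all assumptions, and no cutoff for "too many" neutral or agreeing responses is implied.

```python
from collections import Counter

# Response labels as described in the text; the numeric coding is an assumption.
LABELS = {1: "disagree", 2: "uncertain", 3: "agree", 4: "strongly agree"}


def response_style_summary(responses):
    """Return the proportion of each response option across one protocol."""
    if not responses:
        raise ValueError("no responses supplied")
    unknown = set(responses) - set(LABELS)
    if unknown:
        raise ValueError(f"unrecognised response codes: {sorted(unknown)}")
    counts = Counter(responses)
    n = len(responses)
    return {label: counts.get(code, 0) / n for code, label in LABELS.items()}


def agreement_proportion(summary):
    """Share of items answered 'agree' or 'strongly agree' (possible acquiescence)."""
    return summary["agree"] + summary["strongly agree"]


# Hypothetical responses to ten items (not real PICTS data).
example = [3, 3, 4, 2, 3, 1, 3, 4, 2, 3]
summary = response_style_summary(example)
neutral_share = summary["uncertain"]          # possible satisficing
agree_share = agreement_proportion(summary)   # possible acquiescence
```

A high share of "uncertain" or of agreeing responses does not by itself demonstrate satisficing or acquiescence; it would only flag a protocol for closer review alongside the measure's own validity scales.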
In their review of psychometric formatting, Ogden and Lo (2012) noted that respondents consistently base their judgements in accordance with where they believe they should be in their lives or where they have been in the past. The PICTS manual acknowledges that some of the items are clearly historical in nature, while other items ask specifically for current thoughts and attitudes. There is, however, ambiguity in a number of items which could be answered in either time frame. Guidance highlights that the assessor should only help the respondent if they ask specifically about the time frame. At this point, they can be instructed to answer in the present. Should a respondent remain quiet, the items may not have been answered within the appropriate time frame. However, despite these limitations, given that criminal activity is linked to an individual’s belief system, it could be argued that self-report measures such as the PICTS are important in being able to capture these attitudes and cognitions. Moreover, cited research continues to support the use of the PICTS as a measure of criminal thinking.
Conclusion
The purpose of this review was to critically analyse the psychometric properties of the PICTS, in line with the standards outlined by Kline (1986) and Field (2009). The PICTS as a self-report measure benefits from being easy to administer and is considered helpful as a measure to capture and gain insight into how an offender thinks. On this basis, research has continued to utilise this measure when evaluating offender behaviour programmes (Gobbett & Sellen, 2014; Palmer & Humphries, 2017). The PICTS has also been through a number of test revisions which could be viewed by some as advantageous as a means of continuing to strive for a more robust measure as more research evolves. This would also be in line with other assessment measures which have gone through similar revision processes such as those measuring intelligence and memory (Wechsler, 2008, 2009). Conversely, the utility could be questioned given that the PICTS has gone through four revisions in short succession.
Of the research reviewed, there are various studies supporting the PICTS which suggest that this measure has satisfactory internal and retest reliability. Yet this differs from the criteria of Kline (2000) and Nunnally (1978), who suggest that a higher cutoff should be applied before concluding that a psychometric measure has good reliability. In turn, this highlights the need for further research to form more concrete conclusions regarding the reliability of the measure. Support has also been found for the validity of the PICTS; Walters combined established theory with the opinions of the offenders in his development of the measure, meaning that the measure was more likely to capture accurate facets of the criminal lifestyle. Through factor analysis, Walters has also been able to conceptualise a hierarchical framework from the PICTS to explain the concept of criminal thinking which has received support for having good reliability and validity. However, despite these findings, it has to be acknowledged that the majority of research has been undertaken either solely by the author or by him in collaboration with other authors. As such, questions are raised as to whether an allegiance effect (Hollon, 2006) may have occurred. Furthermore, despite revised editions of the PICTS, the validity scales within the measure have continued to show poor reliability. Therefore, it is suggested that they either need to be revised or that consideration be given to the inclusion of further validity scales to improve the PICTS’s utility. This would be imperative given that differences were also found between male and female offender samples.
Although research continues to provide emerging data cross-culturally on the PICTS, this research is still in its infancy. Therefore, further independent research is needed across different cultural samples, with both females and males and from different age ranges, to provide wider normative data and allow more accurate inferences to be made regarding the PICTS’s utility in measuring criminal thinking styles. It is also recommended that research be undertaken with different offender samples to explore the factorial structure of criminal thinking styles. For example, a distinction has not been made with the norms when applying this to Moffitt’s (1993) life course persistent (LCP) and adolescence limited (AL) theory. Based on Moffitt’s explanation, it would be anticipated that offenders who were classified as LCP would have a number of entrenched criminal thinking styles when considering the PICTS. Furthermore, there is no research at present which has focused on the PICTS’s utility with different offender typologies such as sex offenders and gang members, and as such, no norms for these populations exist. This has implications not only in terms of the PICTS’s utility with these cohorts but also as a psychometric in rehabilitative evaluations. However, to attend to these recommendations, it would be important for future research to be undertaken by independent qualified researchers and on different cultural populations, to avoid the allegiance effect. Despite this, the PICTS has continued to be used by both professionals and academics for over three decades. Therefore, given the measure’s continued use within the forensic field, it is concluded that where the PICTS is used, caution should be exercised and that preferably it be used in conjunction with other measures.
Acknowledgements
This work was supported by the University of Birmingham.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
