Abstract
The Social Skills Improvement System (SSIS; Gresham & Elliott, 2008) is a multiple-stage, broadband system, originally normed in the USA, for assessing and intervening with children from preschool through 12th grade. Two of the system’s assessment components were analysed: (a) the Performance Screening Guides (PSGs); and (b) the Rating Scales (RSs). Australian teachers in Ipswich (N = 15) and South Brisbane (N = 30) rated their elementary school students with the SSIS. This study’s objective was to compare the psychometric properties of the SSIS in an Australian sample of students with those of the US normative sample to determine the transferability of the measure among English-speaking populations. Internal consistency reliability was good for both samples across both measures. Correlations between PSG and RS domains were similar across the two countries. Conditional probability analyses indicated that the PSGs work as the first stage of a multiple gating procedure. Overall, the psychometric results for the Australian sample were similar to those for the large US normative sample, suggesting that the reliability of scores and the validity of ensuing inferences for the SSIS measures are generalizable for child assessment purposes.
Children’s social skills promote relationships with others, support future emotional adjustment, and enable positive schooling outcomes. Social skills correlate positively with academic achievement (Malecki & Elliott, 2002; Welsh, Parke, Widaman, & O’Neil, 2001), and some investigators argue that they even enable academic development (Caprara, Barbaranelli, Pastorelli, Bandura, & Zimbardo, 2000). Whereas social skill strengths have been linked to positive outcomes, social difficulties in school have been linked with later maladjustment (Walthall, Konold, & Pianta, 2005). Assessment aimed at identifying students with social skills difficulties is therefore particularly important in schools. Australian schools have recently incorporated social-emotional learning criteria into their curriculum standards (Australian Curriculum, Assessment, and Reporting Authority [ACARA], 2011). Students are expected to attain specified personal and social capabilities at various ages. These standards guide students to recognize and regulate emotions, develop empathy for and understanding of others, establish positive relationships, make responsible decisions, work effectively in teams, and handle challenging situations constructively (ACARA, 2011).
Methods of assessment are needed to identify students who are struggling with social skills. Currently, there is no Australian-normed measure of social skills that is well aligned with ACARA’s proposed Personal and Social Capability standards. In their comprehensive overview of systems for improving social skills, Elliott, Frey, and Davies (2014) reported reviews and meta-analyses of social skills interventions that also identified measures of children’s social-emotional behaviour. Across two major reviews of such measures (Crowe, Beauchamp, Catroppa, & Anderson, 2011; Humphrey et al., 2011), the Social Skills Improvement System (SSIS; Gresham & Elliott, 2008; formerly the Social Skills Rating System [SSRS]; Gresham & Elliott, 1990) was one of three measures used in ten or more research articles (Humphrey et al., 2011) and was the measure with the most citations (Crowe et al., 2011). Davies, Cooper, Kettler, and Elliott (in press) also considered these reviews and determined that the SSIS provided scores most closely aligned with the social skills identified by the Australian curriculum.
Multiple-stage screening and the SSIS
Behavior rating scales can be used as part of a multiple-stage screening methodology and are often used to identify students in the early stages of a behavioral, social, or emotional difficulty (Merrell, 2001). Multiple staging (also called multiple gating) is a sequential assessment procedure that involves universal screening followed by more specific assessment measures. Through a series of decision steps, a large population of interest is narrowed to a smaller, targeted population of those most at risk. Universal screening is typically the first step; individuals whose scores meet a predetermined cutoff are referred for more involved assessment, intervention, and/or final classification.
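The sequential logic of multiple gating can be sketched in a few lines of Python. This is a minimal illustration, not part of the SSIS materials: the function name `multiple_gate`, the screening scores, and the cutoff are hypothetical, and a lower screening score is assumed to indicate greater risk.

```python
from typing import Callable, Dict, List


def multiple_gate(
    screen_scores: Dict[str, int],
    cutoff: int,
    stage_two: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Two-stage (multiple-gating) screening sketch.

    Stage 1: every student has a brief universal screening score.
    Students at or below `cutoff` (lower = more at risk, by assumption)
    pass the first gate. Stage 2: only those students receive the more
    involved assessment, represented by the hypothetical `stage_two`
    callback, which returns True when a difficulty is confirmed.
    """
    referred = [s for s, score in screen_scores.items() if score <= cutoff]
    confirmed = [s for s in referred if stage_two(s)]
    return {"referred": referred, "confirmed": confirmed}


# Hypothetical class of four students; only student A is confirmed at stage 2.
scores = {"A": 1, "B": 4, "C": 2, "D": 5}
result = multiple_gate(scores, cutoff=2, stage_two=lambda s: s == "A")
```

Only the referred students (A and C) ever reach the costlier second stage, which is the efficiency argument for gating.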
The SSIS includes a universal screening tool for social skills called the Performance Screening Guides (PSGs). The PSGs (Elliott & Gresham, 2007) are nomination rubrics designed for class-wide screening of students’ prosocial behaviors, along with their reading, mathematics, and motivation to learn. The PSGs are designed for students aged 3 to 17 years. Teachers rate students on a five-level performance scale in each of the four skill areas. A performance rating of 1 indicates an area in need of instructional action. Performance ratings of 2 or 3 indicate ‘caution’ and signal that the teacher should monitor the student’s functioning. Students earning a performance rating of 4 or 5 in a skill area are identified as not at risk. Completing the PSGs takes about 25 minutes per classroom. The test authors note that reliability was adequate given the brevity of the PSGs: test–retest reliabilities ranged from 0.68 to 0.74, and inter-observer reliabilities ranged from 0.55 to 0.68 across skill areas for the Elementary and Secondary levels.
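The PSG decision rule described above maps each 1 to 5 performance rating onto one of three categories. A minimal Python sketch follows; the function name, category labels, and student ratings are illustrative and not taken from the PSG manual.

```python
def psg_category(rating: int) -> str:
    """Map a PSG performance rating (1 to 5) to the screening category described in the text."""
    if rating not in range(1, 6):
        raise ValueError(f"PSG ratings run from 1 to 5, got {rating}")
    if rating == 1:
        return "instructional action"  # area in need of instructional action
    if rating in (2, 3):
        return "caution"  # teacher monitors the student's functioning
    return "not at risk"  # ratings of 4 or 5


# Hypothetical ratings for one skill area in a small class
ratings = {"student_1": 1, "student_2": 3, "student_3": 5}
categories = {s: psg_category(r) for s, r in ratings.items()}
```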
Comprehensive assessment via rating scales
The SSIS Rating Scales (RSs; Gresham & Elliott, 2008) are used for targeted assessment of students’ social skills, problem behaviors, and academic competence. Teachers report on behaviors across contexts for children aged 3 to 18 years. Teachers rate the frequency of each Social Skills and Problem Behaviors item on a four-point scale and the importance of each item on a three-point scale. The Social Skills domain includes seven subdomains (i.e. Communication, Cooperation, Assertion, Responsibility, Engagement, Empathy, and Self-Control), and the Problem Behaviors domain includes five subdomains (i.e. Internalizing, Externalizing, Hyperactive, Bullying, Autism Spectrum). For the Academic Competence scale, teachers rate each student’s level relative to the entire classroom on a five-point scale. The RSs feature coefficient alphas in the upper 0.90s and test–retest correlations in the 0.80s.
Objective
The SSIS assessments are currently in use in many English-speaking countries (e.g. Australia, Canada, England, Ireland, New Zealand, Scotland) and have been translated for use by researchers in another dozen countries (e.g. Brazil, Chile, Germany, Iran, Japan, Korea, and Malaysia). Yet no published study has compared the reliability and validity properties of the SSIS in representative samples of children from these countries with the properties based on the US normative sample. This study’s objective was to compare the psychometric properties of the SSIS based on the US standardization sample with those from a much smaller but somewhat representative Australian sample. Evidence of comparable properties would support the transferability of the measure among English-speaking populations.
Method
Participants
Three samples of data were used for the analyses in this study. The first sample consisted of 30 teachers in the Brisbane South District in Queensland, Australia, who used the SSIS (Elliott & Gresham, 2008) in their second, third, and fifth grade classrooms. These teachers had been recruited as a convenience sample for a previous study (Kettler, Elliott, Davies, & Griffin, 2011). They came from nine schools, drawn from the 36 primary schools in the Brisbane South District, which spans a range of socioeconomic status (SES) areas, including inner-city, urban, and semi-rural. Whether this sample is representative of the Queensland or Australian populations is difficult to determine because, as Kettler et al. (2011) noted, ‘Ethnicity and income status of these schools, districts, and participants are not routinely documented in Australia’ (p. 97). A total of 536 students were rated using the PSGs. A random sample (N = 179), stratified on gender and PSG levels, was selected to be rated by their teachers on the RSs. The mean age in months was 96.9 (SD = 14.9) for the PSG sample and 97.7 (SD = 15.4) for the subset also rated on the RSs.
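The text does not say how the 179 RS slots were allocated across the gender-by-PSG-level strata. The sketch below assumes proportional allocation with largest-remainder rounding, so that stratum sizes sum exactly to the target; the function name, the seed, and the synthetic roster are hypothetical and are not the study data.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List


def stratified_sample(
    students: List[dict],
    key: Callable[[dict], tuple],
    n_total: int,
    seed: int = 0,
) -> List[dict]:
    """Draw n_total students at random within strata (proportional allocation assumed)."""
    rng = random.Random(seed)
    strata: Dict[tuple, List[dict]] = defaultdict(list)
    for student in students:
        strata[key(student)].append(student)

    # Exact proportional quota for each stratum, rounded down first...
    quotas = {k: n_total * len(v) / len(students) for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    # ...then leftover slots go to the strata with the largest fractional remainders.
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    # Simple random sampling without replacement inside each stratum.
    sample: List[dict] = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, alloc[k]))
    return sample


# Synthetic roster: 536 students, each with a gender code and a PSG level (1 to 5).
gen = random.Random(42)
roster = [{"id": i, "gender": gen.choice("FM"), "psg": gen.randint(1, 5)} for i in range(536)]
subset = stratified_sample(roster, key=lambda s: (s["gender"], s["psg"]), n_total=179)
```

Each stratum receives either the floor or the ceiling of its proportional quota, so the subset mirrors the roster’s gender-by-level mix to within one student per stratum.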
The second convenience sample was collected at Bundamba State School in the first year of a four-year longitudinal study (Davies et al., in press). This sample included 321 students in Prep (the first year of schooling, equivalent to Kindergarten) through Year 3. In this sample, 15 teachers used the PSGs to evaluate the students in their classes. Students rated at level 1 or 2 (at risk) in the Prosocial Behavior area were further assessed using the RSs Teacher Form (N = 101). The mean age in months was 77.2 (SD = 14.2) for the PSG sample and 73.9 (SD = 8.1) for the subset also rated on the RSs. Demographic information on the Australian samples is included in Table 1 (see Supplemental Material).
In addition, data from the US standardization samples for the RSs Teacher Form (N = 950) and the PSGs (N = 85) were used for comparison. The mean age in months was 93.5 (SD = 40.4) for the PSG sample and 112.2 (SD = 49.0) for the RS sample. Additional demographic information on this sample is included in Table 2 (see Supplemental Material).
Data analysis
Reliability was estimated using coefficient alpha (Cronbach, 1951) and interpreted using Murphy and Davidshofer’s (2005) categorical descriptions. To evaluate validity evidence based on content, mean importance ratings were compared between the US and Australian samples at the subdomain level using independent samples t-tests. Validity evidence based on internal structure was evaluated using Pearson product-moment correlations and conditional probability analyses.
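Coefficient alpha compares the sum of the item variances with the variance of the total score. A minimal Python sketch of the standard formula follows; the ratings are hypothetical, and the 0 to 3 coding of the four-point frequency scale is an assumption for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's rating on item i. Uses
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    with sample variances throughout (the n - 1 denominators cancel in the ratio).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("alpha needs at least two items and two respondents")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 students x 3 items on an assumed 0-3 frequency coding
toy = [[3, 2, 3], [1, 1, 2], [2, 2, 2], [0, 1, 0]]
alpha = cronbach_alpha(toy)  # 96/107, about 0.897
```

When items covary strongly, the total-score variance grows faster than the summed item variances and alpha approaches 1.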
Results and discussion
Reliability and validity evidence was comparable between the Australian convenience samples and the US standardization sample. This finding generally supports the transferability of the SSIS to Australia and, potentially, to other English-speaking populations. The means and standard deviations of the measures in the current study are presented in Table 3 (see Supplemental Material).
Reliability estimates
Results indicated that reliability was moderately high to high for both samples: all alpha coefficients exceeded 0.80 in both populations. In many instances, the Australian sample yielded higher alpha coefficients than the US sample did. This finding is consistent with Kettler et al. (2011), who found similar coefficients among Australian third and fifth grade students. The coefficient alphas for each sample are presented in Table 4 (see Supplemental Material).
Validity evidence based on content
Teachers’ importance ratings provided evidence of the appropriateness of the item content for the Social Skills domain. A rating of 0 indicated that the teacher did not consider a skill important in the classroom, a rating of 1 that the teacher considered the skill important, and a rating of 2 that the teacher considered the skill critical. Results were compared across samples to determine whether teachers’ evaluations of importance differed. Mean ratings for all items across the two samples ranged from 1.01 to 1.63. Australian teachers rated items in the Social Skills domain as more important in their classrooms (M = 1.22, SD = 0.24) than US teachers rated the same items (M = 1.19, SD = 0.27), t(1218) = 2.07, p = 0.039. Australian teachers also rated the items of three subdomains (i.e. Cooperation, Assertion, Responsibility) as more important than US teachers did, whereas US teachers rated only the Empathy items as more important than Australian teachers did. Effect sizes for these comparisons were in the small range or lower. The mean importance ratings are presented in Table 5 (see Supplemental Material).
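The ‘small range or lower’ characterization can be checked against the reported Social Skills means and SDs. The group sizes behind t(1218) are not itemized here, so this sketch does not recompute t. It computes Cohen’s d with an equal-weight pooled SD (an assumption), then notes that any sample-size-weighted pooled SD must lie between the two group SDs, which bounds d whatever the group sizes were. The study itself may have reported a different effect size metric.

```python
from math import sqrt


def cohens_d(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Standardized mean difference using an equal-weight pooled SD (assumes similar group sizes)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


# Social Skills importance ratings reported above (Australia vs US)
d = cohens_d(1.22, 0.24, 1.19, 0.27)  # about 0.12

# A weighted pooled variance is an average of 0.24^2 and 0.27^2, so the pooled SD
# lies between 0.24 and 0.27 and d is bounded by these two values.
d_low = (1.22 - 1.19) / 0.27   # about 0.11
d_high = (1.22 - 1.19) / 0.24  # about 0.125
```

Even the upper bound falls below Cohen’s conventional 0.20 benchmark for a small effect, which is consistent with a statistically significant but practically minor difference in a large combined sample.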
Validity evidence based on internal structure
To assess the internal structure of the SSIS, correlational and conditional probability analyses were conducted. Results were similar across samples in both magnitude and direction. Within the PSGs, two of the six correlations in the Australian sample were in the large range and the other four were in the very large range; in the US sample, two were in the medium range, two in the large range, and two in the very large range. Within the RSs, all correlations in both samples were positive except those involving Problem Behaviors, which were all negative. In the Australian sample, two of the three RS correlations were in the large range and the other was in the very large range; in the US sample, one was in the medium range and the other two were in the very large range. Correlations are presented in Table 6 (see Supplemental Material).
This pattern of correlations indicates that the constructs are being measured appropriately within a multitrait-multimethod framework (Campbell & Fiske, 1959). In both samples, scores measuring the same construct on different measures correlated more strongly (e.g. Prosocial Behavior PSG and Social Skills domain, r = 0.77) than scores measuring different constructs on the same measure (e.g. Social Skills domain and Academic Competence domain, r = 0.60).
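The Pearson product-moment correlation and the Campbell and Fiske comparison can be sketched as follows. The toy scores are hypothetical; only the two r values used in the comparison come from the text.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_toy = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # 0.8 for these hypothetical scores

# Campbell and Fiske's expectation: a same-trait, different-method (convergent)
# correlation should exceed a different-trait, same-method (discriminant) one.
convergent = 0.77    # Prosocial Behavior PSG vs Social Skills domain
discriminant = 0.60  # Social Skills domain vs Academic Competence domain
supports_mtmm = convergent > discriminant
```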
For the conditional probability analyses, the PSGs were used as predictors of the RSs. Sensitivity indices were in the high range for most combinations tested; thus, if a student has academic, social, or behavior difficulties according to the RS domains, there is a high probability that the PSGs also identify these difficulties. Specificity indices ranged from low to moderate for nearly all combinations tested in both samples, so the PSGs were less accurate at correctly screening out students without a difficulty. Positive predictive value (PPV) indices were in the low range for all combinations in the South Brisbane sample, and the US data followed a similar pattern; the likelihood that a student identified on the PSGs was also identified on the RSs was therefore relatively low. Negative predictive value (NPV) indices were in the high range for all combinations in both samples, indicating a high likelihood that a student not identified on the PSGs was also without a difficulty as measured on the RSs. Conditional probability indices are presented in Table 7 (see Supplemental Material).
These results indicate that the PSGs work as intended as the first step in the multiple-stage rating system for both samples. Although PPV and specificity values fell below the high range, this may be acceptable for this type of screening system, in which a false positive is not very costly (Kettler & Feeney-Kettler, 2011). If a student were inaccurately identified as at risk on the PSGs, more targeted assessment would be conducted to determine the nature of the difficulty; if the more extensive RSs then showed no difficulty, no intervention would be implemented.
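The four conditional probability indices all come from a 2 x 2 cross-classification of the screener (PSG) against the criterion (RS). A minimal Python sketch of the standard definitions follows; the class name and the cell counts are hypothetical, chosen only to mimic the reported pattern, and are not study data.

```python
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """2 x 2 cross-classification of a screener (PSG) against a criterion (RS)."""

    tp: int  # flagged on PSG, difficulty on RS
    fp: int  # flagged on PSG, no difficulty on RS
    fn: int  # not flagged on PSG, difficulty on RS
    tn: int  # not flagged on PSG, no difficulty on RS

    def sensitivity(self) -> float:
        # Share of students with an RS difficulty whom the PSG flagged
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of students without an RS difficulty whom the PSG did not flag
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of PSG-flagged students who have an RS difficulty
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of unflagged students who have no RS difficulty
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for 100 students, 20 of whom have an RS-confirmed difficulty
table = ScreeningTable(tp=18, fp=42, fn=2, tn=38)
indices = (table.sensitivity(), table.specificity(), table.ppv(), table.npv())
```

With these counts, sensitivity is 0.90, specificity 0.475, PPV 0.30, and NPV 0.95. When difficulties are relatively uncommon, a screener tuned to miss few true cases will flag many students who turn out to be fine, which is why a low PPV can coexist with high sensitivity and NPV in a first-stage screen.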
Limitations
The primary limitation of the current study is that the Australian dataset combined previously collected data from two convenience samples. Consequently, the study could not directly assess the appropriateness of the tool for the Australian population as a whole. To thoroughly evaluate the reliability and validity of the SSIS and similar measures in new populations, more representative samples from those populations are required. The current study indicates that such future research may be warranted.
Conclusions
The results demonstrate that the reliability and validity evidence for the SSIS gathered in this study is comparable between the US and Australian samples. The RSs demonstrated moderately high to high reliability. Evidence of validity based on both content and internal structure was also comparable and promising. These findings indicate that the SSIS may be transferable to some Australian populations, and perhaps to other English-speaking countries, without compromising the soundness of the resulting test scores. Given the widespread international use of the SSIS, other investigators are encouraged to report comparative psychometrics when using the SSIS, thereby contributing collectively to a sound infrastructure for measuring social behavior across countries.
Footnotes
Note
To avoid any real or perceived conflict of interest, Stephen N. Elliott, a co-author of the Performance Screening Guides and Rating Scales, neither handled the data nor conducted any of the analyses for this project. He was involved in obtaining the data and writing this manuscript.
Author Biographies
References
Supplementary Material
