SSIS Performance Screening Guide as an Indicator of Behavior and Academics: A Meta-Analysis

Abstract

This article documents the results of a meta-analysis of available correlational validity evidence for the Social Skills Improvement System Performance Screening Guide (SSIS-PSG), a brief teacher-completed rating scale designed to be used as part of universal screening procedures. Articles were included if they (a) were published in English in a peer-reviewed journal, (b) administered the PSG, and (c) provided validity evidence representative of the relationship between PSG scores and scores on related variables. Ten studies yielding 147 correlation coefficients met criteria for inclusion. Data were extracted following established procedures in validity generalization and meta-analytic research. Extracted coefficients were of the expected direction and magnitude with theoretically aligned constructs, thereby providing evidence of convergent validity (e.g., PSG Math and Reading items were most strongly correlated with academic performance and academic behavior variables, with effect sizes ranging from .708 to .740; PSG Prosocial Behavior and Motivation to Learn items were most strongly correlated with broadband externalizing/internalizing problems, with effect sizes ranging from −.706 to −.717), although Prosocial Behavior and Motivation to Learn were not as effective at discriminating among divergent constructs. These results generally support the utility of the PSG as a correlate of academic and social/behavioral outcomes in the schools.

Keywords

validity generalization, screening, performance screening guide, social skills

With its focus on early identification, prevention, and problem-solving, the Multitiered Systems of Support (MTSS) model of service delivery in the schools holds promise for identifying students in need of academic intervention and mental health services before problems become severe or the student experiences significant failure, such as earning failing grades and being subject to disciplinary action (Carta, 2019; Kilgus et al., 2015). Universal screening, in which all students within a school are screened to identify those most at risk for significant problems, is an important component of the MTSS approach. Large-scale screening in the schools may serve to: identify students in need of follow-up services, who may otherwise go undetected; identify school-wide risk factors to inform large-scale approaches to prevention and intervention (Dowdy et al., 2012); and place increased emphasis on social-emotional health, prevention, and early intervention services (Moore et al., 2015). As MTSS and universal screening have gained in popularity, school staff are in need of brief, yet psychometrically sound, universal screening measures that lead to informed decision-making on behalf of students (Jenkins et al., 2014; Reinke & Herman, 2016). To be useful, screening measures must be able to accurately identify students who need additional assessment or services while being practical and easy to score to allow for school-wide use (Glover & Albers, 2007).

Results of recent surveys have highlighted screeners that are most likely to be used in schools (e.g., Benson et al., 2019). Notably, the three most popular screeners identified in the Benson et al. (2019) survey are fairly time-consuming and may be expensive to administer, especially if teachers are expected to complete screeners for all students. They may also fail to provide indicators of functioning in a variety of areas. For example, the teacher version of the Behavioral and Emotional Screening System (BESS; Kamphaus & Reynolds, 2015)—ranked as the most frequently used screening instrument—includes 20 items. Although it assesses internalizing risk, externalizing risk, and adaptive skills risk, it can take over 1.5 hr to complete for a classroom of 20 students. The Social, Academic, and Emotional Behavior Risk Screener (SAEBRS; Kilgus et al., 2016) assesses functioning in academic, social, and emotional domains, but at 19 items, it can take up to 1 hr to complete ratings for a classroom of 20 students. Additional effort is required to score and interpret these screeners, often requiring scoring software. Given the length of these screeners, it also seems burdensome for teachers to complete them multiple times to track student progress.

An alternative approach is to use screeners that rely on a single item for assessing unique domains (e.g., one item is used to assess prosocial behavior, and one item is used to assess disruptive behavior). Results of multiple studies provide evidence that single-item screeners have significant correlations with multiple-item screeners and student outcomes. For example, Stormont and colleagues (Stormont et al., 2015, 2017) found that single-item assessments of school readiness (based on the Kindergarten Academic and Behavior Readiness Screener) completed by kindergarten teachers were able to identify students at risk for academic and behavioral problems, and also correlated with academic and behavioral outcomes at the end of the school year. Ratings were found to be more accurate than those of a multiple-item social rating scale. Results of other studies, based on the Direct Behavior Rating tool, have found that single-item ratings of academic engagement, disruptive behavior, and respectful behavior were significantly correlated with other, longer rating scales (Chafouleas et al., 2013) and were moderately to highly correlated with behavioral risk (Kilgus et al., 2012). Overall, emerging research supports the use of single-item screeners.

The Social Skills Improvement System Performance Screening Guide (PSG; Elliott & Gresham, 2008) is another standardized, criterion-referenced instrument—based on single items—that appears to meet the need for efficient screening instruments. The PSG is a teacher report measure that is intended to quickly assess observable behaviors in academic and positive behavior domains. There are three versions of the PSG (preschool, elementary [Grades K–6], and secondary [Grades 7–12]), each with four items. The PSG can be completed in less than 30 min for all students in a typical classroom. To complete the PSG, teachers respond to the four single-item scales, using a 5-point Likert-type scale, to rate students’ abilities. Lower scores represent very limited skills and indicate a need for intervention, and higher scores represent excellent skills with little need for intervention. In addition to quickly identifying current levels of functioning, the PSG can be utilized to evaluate student progress in skill development and outcomes of related programs.

The PSG assesses skills in the areas of Reading, Mathematics, Prosocial Behavior, and Motivation to Learn. The Reading Skills item is intended to determine the degree to which a student demonstrates engagement in reading activities, as well as performance and competence on grade-level reading and comprehension skills. The Mathematics Skills item is intended to determine the degree to which a student demonstrates engagement in math activities, as well as performance and competence on grade-level math computation and concepts skills. The Prosocial Behavior item is intended to measure the degree to which students effectively communicate and collaborate with others, as well as their level of self-control and concern for others. The Motivation to Learn item is intended to measure the degree to which a student is engaged in instructional activities, stays on task, and exhibits appropriate levels of effort and attentiveness in learning situations. The four items included on the PSG may be considered measures of “academic enablers” since they are designed to assess behavioral variables associated with academic success and learning, such as engagement, communication skills, social skills, and motivation (Kettler et al., 2012). From a theoretical perspective, these academic enablers should be related to a range of academic and social-emotional variables, such as academic performance, positive behaviors, problem behaviors, emotional distress, and measures of adjustment and attitudes (Gresham & Elliott, 2017). Indeed, there is empirical support for significant positive relationships between children’s social skills and academic achievement, negative relationships between problem behaviors and academic achievement, and positive relationships between academic enablers and math and reading achievement (DiPerna et al., 2002, 2005; Gresham & Elliott, 2017; Kettler et al., 2012). 
In particular, strong social skills have been found to predict positive behavioral and academic outcomes, as well as psychological adjustment (Gresham & Elliott, 2008).

Because each of the four scales only has one item and because there is no total score, internal consistency data for the PSG are not available. However, the authors report test–retest reliability coefficients for elementary students ranging from .68 to .74 (Elliott & Gresham, 2008). Results of a separate study indicate test–retest reliability coefficients ranging from .59 to .67 (Lane, Oakes, Ennis, & Royer, 2015). Inter-rater reliability, based on a comparison of an elementary student’s teacher and another person familiar with the student, yielded results in the moderate score reliability range (.55–.68). However, less information is available in the test manual regarding the validity of PSG scores; this lack of documented support has been identified as a significant weakness (Jenkins et al., 2014; Krach et al., 2017). The manual does not describe how items were developed and does not provide evidence of content validity specific to the PSG. The authors provide some evidence of concurrent validity by comparing PSG scores with results on a more comprehensive measure of social skills, but the comparison does not seem to cover all areas assessed by the PSG. Since publication of the PSG, several studies have provided evidence about the validity of the PSG, but there has been no quantitative synthesis of PSG studies from which to draw robust conclusions.

Thus, it is important and timely to conduct a quantitative synthesis of the accumulated PSG validity evidence, which would allow researchers and practitioners to draw more robust conclusions about the use of the single-item PSG scales. One way of completing a relevant quantitative review is by conducting a meta-analysis of the available correlational validity evidence. In such a review, validity coefficients are collected across studies to estimate their central tendency and variability, including obtaining an average validity coefficient and correcting for sampling error (Shultz & Whitney, 2005). Meta-analyses provide stronger, more realistic estimates of the average observed validity coefficients across studies and are important to the identification of trends that support the psychometric defensibility of the analyzed screeners. Similar meta-analyses have been conducted for other emotional and behavioral screeners (Allen et al., 2019; Kilgus et al., 2017).

In sum, the purpose of this study was to conduct a meta-analysis of the available correlational evidence for the validity of PSG scores as academic and behavioral indicators, using reported correlation coefficients from previous research studies. In light of the discussion of academic enablers above, we would expect all four of the PSG items to be at least moderately correlated with all academic and behavioral criterion variables included in previous studies (described below). We hypothesized the following relationships between the PSG items and the criterion variables: (a) given their apparent relevance to academic engagement and achievement, the Reading Skills, Mathematics Skills, and Motivation to Learn items will show stronger correlations with academic criterion variables than with measures of externalizing or internalizing behavior problems; and (b) the Prosocial Behavior item will show stronger correlations with measures of externalizing and internalizing behavior problems than with academic variables.

Method

Search and Inclusion Procedures

We searched multiple electronic databases (PsycINFO, Psychology Database, PsycArticles, ERIC, SocINDEX, Psychology and Behavioral Sciences Collection, and PubMed) in March 2018. Search terms used for all databases included “social skills improvement system” or “performance screening guide.” The search terms were used in full text searches within each database. An article was included in the study if it met the following criteria: (a) it was published in English in a peer-reviewed journal, (b) the PSG was administered, and (c) it reported validity evidence representative of the relationship between PSG scores and scores on related variables. There was no specified time period in which studies needed to be conducted or published for inclusion in the analysis. To maintain consistency in our analysis, we excluded studies using the revised PSG (Gresham & Elliott, 2017).

We retrieved 327 articles in the initial search. Two trained graduate students completed a full-text review of each article retrieved in the initial search. Of these, 55 were excluded because they were duplicates (i.e., identified in multiple databases) and 255 were excluded because they did not include administration of the PSG. Notably, inter-rater reliability (based on percent agreement) for determining whether an article included administration of the PSG was 98.8%. Of the remaining 17 articles, seven were excluded because they did not report validity information; inter-rater reliability for this part of the search process was 100%. Ten articles met inclusion criteria (see Table S1 in supplemental material). Notably, each of the 10 studies utilized the elementary (K–6) form of the PSG. One study (Miller et al., 2015) included students in both the elementary (Grades 1–2 and 4–5) and secondary (Grades 7–8) levels, with 30% of participants in the secondary level. They did not differentiate their results based on educational level and they did not specify if they used different forms of the PSG in their study. As their results are based on a sample of majority elementary students, we decided to include their study in our analysis and not attempt to differentiate between different PSG forms.
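The percent agreement figures reported for the screening decisions can be illustrated with a minimal sketch. It assumes two raters who each record one include/exclude decision per article; the function name and the example decisions are hypothetical and are not the study's data.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of items on which two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    # Count the positions where both raters made the same decision.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical include/exclude decisions for five articles (illustration only).
rater_1 = [True, False, False, True, False]
rater_2 = [True, False, True, True, False]
agreement = percent_agreement(rater_1, rater_2)
```

Percent agreement does not correct for chance agreement, so it is best read as a descriptive check on the screening process rather than as a chance-adjusted reliability coefficient.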

Coding Procedures

Three researchers trained in the coding process independently coded the 10 articles that met inclusion criteria with regard to three main categories: (a) PSG items used in validity analyses; (b) sample characteristics (sample size, grade level, ethnicity [percentage of the sample identified as White], gender [percentage of the sample identified as male], and special education status [percentage of the sample identified as receiving special education services]); and (c) information regarding the criterion to which the PSG was compared within validity analyses, including correlation coefficients between the criterion and PSG scores. Initial percent agreement was 95.9% for the PSG item category; 95.9% for the PSG and criterion correlation coefficients; 91.8% for the criterion categories; 95.2% for the sample size; 100% for the grade category; 78.2% for the special education category (the lower agreement was due to differences in coding for a single study that provided a large number of correlation coefficients); 100% for the sample ethnicity; and 100% for the sample gender. For all disagreements, the researchers conferred to determine which codes were appropriate and to correct any coding errors.

Coding Categories

Validity coefficients extracted during the coding process related to a range of social, behavioral, and academic measures. We analyzed these coefficients using an aggregated approach that included analyses of all coefficients associated with academic performance, academic behavior, positive behavior skills, externalizing problems, internalizing problems, and broadband externalizing/internalizing problems. We defined academic performance as indicators of student academic skill and ability. Some examples of criterion variables falling in the academic performance category include student grade point average, grades on report cards, and teacher reported academic competency. Although indicators of competency are different than grades, they nonetheless reflect a student’s academic skill development and ability. We defined academic behavior as student behaviors with relevance to academic performance and engagement. Some examples of criterion variables falling in the academic behavior category include teacher reported motivation to learn and school attendance. We defined positive behavior skills as indicators of a student’s capacity to exhibit prosocial skills. Some of the criterion variables falling in this category include scales of social skills and teacher reported prosocial behavior in the classroom. We defined externalizing problems as indicators of student misbehavior typically directed toward others and the environment. Some of the criterion variables falling in this category include scores on narrowband rating scales of externalizing problems, teacher ratings of student disruptive behaviors, and office discipline referrals. We defined internalizing problems as student problems indicative of mood problems or emotional distress. Criterion variables falling in this category include scores on narrowband rating scales of internalizing problems. 
Finally, we defined broadband externalizing/internalizing problems as indicators of overall student behavioral and emotional problems, indicative of general risk status. This category differs from the other two problem categories in that the criterion variables were primarily rating scales that included a composite score of externalizing, internalizing, and social skills problems.

These categories were based, in part, on those specified by Kilgus et al. (2017). We adapted them based on the specific variables in the articles included in our analysis. Although some of the categories (e.g., internalizing problems) do not seem to be strongly associated with PSG items, we decided to include all categories represented in the articles in our analyses. In addition to providing a complete analysis of all of the available correlation results, we also believed that it would be appropriate to include seemingly less-related categories to demonstrate the validity, including divergent validity, of the PSG for evaluating multiple domains. See Table S1 in supplemental material for a list of criterion variables coded for each study.

Statistical Analysis

Our meta-analyses are based on an aggregation of correlation coefficients between the PSG and other variables, which were employed as indicators of effect size. As meta-analytic methods assume normality (Beretvas & Pastor, 2003), we transformed the correlation coefficients to z scores using the Fisher r-to-z transformation and used the converted estimates for the meta-analysis. Sampling variance for each coefficient was also computed in the transformation. For more intuitive interpretation, we transformed the resulting z-score estimates back into the original correlation coefficient metric and reported them as such in the results.
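The transformation step can be sketched as follows. This is a minimal illustration rather than the authors' R syntax: the function names are hypothetical, and the sampling variance of 1/(n − 3) is the conventional large-sample formula for a Fisher z value, which the text does not state explicitly.

```python
import math


def fisher_r_to_z(r):
    """Fisher r-to-z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def fisher_z_to_r(z):
    """Back-transform a Fisher z value to the correlation metric."""
    return math.tanh(z)


def fisher_z_variance(n):
    """Conventional sampling variance of Fisher z for a sample of size n (assumed formula)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1.0 / (n - 3)


# Hypothetical coefficient and sample size (illustration only).
z = fisher_r_to_z(0.725)
variance = fisher_z_variance(604)
r_back = fisher_z_to_r(z)
```

Pooling is carried out on the z values and their variances; only the final estimates are passed through the back-transformation so that they can be read as correlations.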

We used random-effects models to perform the meta-analysis using the statistical environment R (R Core Team, 2015), the metafor R package (Viechtbauer, 2015), and syntax as described by Assink and Wibbelink (2016) and Quintana (2015). Specifically, we used the restricted maximum likelihood estimation method (REML) and inverse variance weights to estimate parameters in our model. Many of the articles in our analyses provided several correlation coefficients (i.e., the PSG items were correlated with several other variables within the same study). Thus, we used three-level meta-analysis via the metafor rma.mv function (Viechtbauer, 2015) to examine whether effect sizes between samples within individual studies were identical (see Beretvas & Pastor, 2003). The sampling variance associated with each reported coefficient is addressed at Level 1; the variance among coefficients included within the same study is addressed at Level 2; and the variance among coefficients between studies is addressed at Level 3.
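The idea of inverse variance weighting under a random-effects model can be illustrated with a deliberately simplified sketch. It is not the three-level REML model fitted with rma.mv: it uses the two-level DerSimonian-Laird moment estimator, treats every coefficient as independent, and uses hypothetical function and variable names throughout.

```python
import math


def pool_random_effects(z_values, variances):
    """Pool Fisher z values with a two-level DerSimonian-Laird random-effects model.

    Simplified stand-in for illustration; the analysis described in the text
    used a three-level model estimated by REML.
    Returns (pooled_z, standard_error, tau_squared).
    """
    k = len(z_values)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effect sizes with matching variances")
    # Fixed-effect (inverse variance) weights and weighted mean.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * z for w, z in zip(weights, z_values)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(w * (z - fixed_mean) ** 2 for w, z in zip(weights, z_values))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-effect variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * z for w, z in zip(re_weights, z_values)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical Fisher z values and sampling variances (illustration only).
pooled_z, se, tau2 = pool_random_effects([0.2, 0.8], [0.01, 0.01])
pooled_r = math.tanh(pooled_z)  # back-transform to the correlation metric
```

In the three-level model the single heterogeneity term above is split into a within-study component and a between-study component, which is what allows coefficients from the same study to be treated as dependent.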

At Level 1, we evaluated the results of tests for heterogeneity for significant variations between all effect sizes. We used criteria mentioned in Cohen (1988) in interpreting the effect sizes of the PSG and outcomes (i.e., small [r = .2], moderate [r = .5], and large [r = .8]). We evaluated the significance of within-study variance (Level 2) and between-study variance (Level 3) by performing two separate log-likelihood ratio tests. In each, the null hypothesis states that one of the variance components equals zero, whereas the alternative hypothesis states that the variance component is greater than zero (Assink & Wibbelink, 2016). We fit the original model, in which the variance at Levels 2 and 3 are freely estimated, and compared it with the fit of a model in which only the variance at Level 3 (for within-study variance) or Level 2 (for between-study variance) is freely estimated and in which the other variance is fixed to zero. If systematic differences between studies are significantly different from zero, additional moderator analyses would typically be indicated to determine the potential moderating effects of variables based on participant characteristics (i.e., sample size, grade level, ethnicity, gender, and special education status) and the type of criterion variable. Given the specific characteristics of our study, however, we decided it would not be appropriate to conduct moderator analyses. Most notably, there was inconsistency in statistical power due to different cell sizes and different numbers of coefficients across the PSG items, with some of these analyses lacking sufficient power to detect statistically significant moderator effects. This is likely reflective of the relatively few articles that met criteria for inclusion in this study, with some of the criterion variables not receiving as much attention in the PSG literature.
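The log-likelihood ratio comparison of the full and reduced models can be sketched as below. The function name and the log-likelihood values are hypothetical; in practice the two log-likelihoods would come from the fitted models. Halving the p value is a common convention when the null value of a variance lies on the boundary of the parameter space, and the text does not state whether this was done, so it is exposed here as an option.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced, one_sided=True):
    """Compare nested models that differ by one variance component.

    Returns (statistic, p_value). The statistic is referred to a chi-square
    distribution with 1 degree of freedom; for 1 df the upper-tail probability
    equals erfc(sqrt(x / 2)).
    """
    # The full model cannot fit worse than the reduced one; guard against rounding.
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_chi2 = math.erfc(math.sqrt(statistic / 2.0))
    # Optional halving for a variance tested at the boundary (assumed convention).
    p_value = p_chi2 / 2.0 if one_sided else p_chi2
    return statistic, p_value


# Hypothetical log-likelihoods (illustration only).
stat, p = likelihood_ratio_test(-10.0, -11.920729410347062, one_sided=False)
```

A small p value indicates that fixing the variance component to zero significantly worsens model fit, that is, that effect sizes vary more at that level than sampling error alone would suggest.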

In addition, we evaluated publication bias, which can lead to inaccurate statistical estimates in meta-analysis. To address this issue, we implemented the Orwin (1983) Fail-Safe N method via the metafor R package. The Orwin method estimates the number of studies with statistically nonsignificant results that would have to be added to the given set of observed outcomes to reduce the average, unweighted effect size to a target average, unweighted effect size that would be considered nonsignificant. We set the target value at .2 based on criteria mentioned in Cohen (1988) for small effects. A larger Fail-Safe N value indicates minimal bias, as it implies that many nonsignificant studies would have to be added. A smaller Fail-Safe N value indicates greater potential for publication bias. Although studies have been conducted to demonstrate the usefulness of Fail-Safe N method by Orwin in the interpretation of the stability of meta-analytic findings (e.g., Carson et al., 1990), there do not appear to be widely used or validated guidelines for what should be considered a small, large, acceptable, or unacceptable Fail-Safe N value, beyond the notion that the larger the Fail-Safe N, the more confidence we have that the result is stable, while lower values are associated with less confidence that the result is stable (Brown, 1992).
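The logic of the Orwin Fail-Safe N can be shown with a short sketch of the standard formula, N = k × (mean effect − target effect) / (target effect − effect of added studies). This is an illustration rather than the metafor implementation: the function name is hypothetical, the added studies are assumed to have an average effect of zero, effect sizes are compared by magnitude so that negative correlations can be handled, and any rounding of the result is left to the reader.

```python
def orwin_fail_safe_n(k, mean_effect, target_effect, added_effect=0.0):
    """Number of added studies needed to pull the unweighted mean effect down to the target.

    k             -- number of observed effect sizes
    mean_effect   -- unweighted average observed effect size
    target_effect -- effect size regarded as trivially small (e.g., .2)
    added_effect  -- assumed average effect of the added studies (0 by default)
    """
    observed = abs(mean_effect)
    target = abs(target_effect)
    added = abs(added_effect)
    if target <= added:
        raise ValueError("target effect must exceed the effect of the added studies")
    if observed <= target:
        return 0.0  # the observed mean is already at or below the target
    return k * (observed - target) / (target - added)


# Hypothetical inputs (illustration only).
n_needed = orwin_fail_safe_n(k=6, mean_effect=0.5, target_effect=0.2)
```

Because the result scales with the number of observed coefficients, categories with few coefficients and modest average correlations will yield small Fail-Safe N values almost by construction.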

Results

Descriptive Statistics

The 10 studies yielded 147 correlation coefficients. The number of correlation coefficients used in each specific analysis is reported in Tables 1 through 4. The average number of participants in the studies was 604 (SD = 664). Nine of the studies had participants in elementary school only and one, as previously described (Miller et al., 2015), included participants in both elementary and secondary grades. For studies reporting participant characteristics, 50.6% (weighted) of participants were male and 64.7% (weighted) of participants were White. Five of the studies reported the special education status of participants. Publication dates ranged from 2012 to 2018, which is a relatively narrow range reflecting the recency of the PSG. Criterion variables examined in the 10 studies ranged from narrowband (most frequently the Student Risk Screening Scale [SRSS]) to broadband (most frequently the BESS) behavior rating scales, and less frequently included other behavioral and academic variables such as office discipline referrals, attendance, and grade point average. See Table S1 in supplemental material for specific characteristics reported for each study.

Table 1.

Results of the Reading PSG Validity Generalization Meta-Analysis.

Parameter Estimates	Reading Skills
Parameter Estimates	Academic Performance	Academic Behavior	Positive Behavior	Externalizing Problems	Internalizing Problems	Broad Int./Ext.
Random effects parameters
Intercept	.725*	.709*	.477*	−.506*	−.320*	−.523*
(SE)	(.203)	(.068)	(.075)	(.048)	(.068)	(.077)
SD within studies	.010*	.123*	.004	.005	.006	.008
SD between studies	.116*	.000	.017	.005	.006	.008
Fail-Safe N	15	11	14	8	2	5
Number of coefficients	6	4	9	5	3	3
Number of samples	3	3	4	5	3	3

Note. PSG = performance screening guide.

*p < .05.

Table 2.

Results of the Mathematics PSG Validity Generalization Meta-Analysis.

Parameter Estimates	Mathematics Skills
Parameter Estimates	Academic Performance	Academic Behavior	Positive Behavior	Externalizing Problems	Internalizing Problems	Broad Int./Ext.
Random effects parameters
Intercept	.740*	.708*	.446*	−.505*	−.339*	−.512*
(SE)	(.137)	(.032)	(.107)	(.039)	(.046)	(.053)
SD within studies	.010*	.000	.001	.003	.002	.003
SD between studies	.067	.000	.041	.003	.002	.003
Fail-Safe N	18	8	13	8	3	5
Number of coefficients	7	3	9	5	3	3
Number of samples	4	3	4	5	3	3

Note. PSG = performance screening guide.

*p < .05.

Table 3.

Results of the Prosocial Behavior Skills PSG Validity Generalization Meta-Analysis.

Parameter Estimates	Prosocial Behavior Skills
Parameter Estimates	Academic Performance	Academic Behavior	Positive Behavior	Externalizing Problems	Internalizing Problems	Broad Int./Ext.
Random effects parameters
Intercept	.551*	.675*	.578*	−.542*	−.401*	−.717*
(SE)	(.025)	(.083)	(.113)	(.107)	(.045)	(.040)
SD within studies	.000	.051*	.009*	.044	.002	.008
SD between studies	.001	.000	.053	.044	.002	.000
Fail-Safe N	18	19	18	13	4	19
Number of coefficients	10	8	9	8	3	7
Number of samples	5	6	5	8	3	6

Note. PSG = performance screening guide.

*p < .05.

Table 4.

Results of the Motivation to Learn PSG Validity Generalization Meta-Analysis.

Parameter Estimates	Motivation to Learn
Parameter Estimates	Academic Performance	Academic Behavior	Positive Behavior	Externalizing Problems	Internalizing Problems	Broad Int./Ext.
Random effects parameters
Intercept	.679*	.688*	.680*	−.537*	−.360*	−.706*
(SE)	(.022)	(.190)	(.040)	(.099)	(.053)	(.068)
SD within studies	.000	.070	.006*	.038	.003	.028*
SD between studies	.000	.070	.004	.038	.003	.000
Fail-Safe N	17	9	31	13	3	18
Number of coefficients	7	4	13	8	3	7
Number of samples	4	4	6	8	3	6

Note. PSG = performance screening guide.

*p < .05.

Meta-Analysis Results

Reading

Results of the validity generalization meta-analysis for the PSG Reading item are presented in Table 1. The coefficient of the intercept is considered to be the overall effect size of the relationship between the Reading item score and outcome variables. This means that, for example, the overall effect size for the Reading item and academic performance was .725. The reported intercepts for each of the aggregated outcomes were all significantly different from zero. The largest correlations were with the academic outcomes (i.e., academic performance and academic behavior). Results indicated that within-study variance was significant for academic performance and academic behavior, and between-study variance was significant for academic performance. These results imply that there is more variability in effect sizes within and between studies than may be expected based on sampling variances alone, which would suggest the use of moderator analyses to examine variables, based on participant and criterion variable characteristics, that could explain the variance for those significant findings. As explained in the “Method” section, however, these analyses were not conducted.

Mathematics

Results of the validity generalization meta-analysis for the PSG Math item are presented in Table 2. The reported intercepts for all of the aggregated outcomes were significantly different from zero. Similar to results for the PSG Reading item, the PSG Math item was most strongly correlated with academic performance and academic behavior. Results indicated that within-study variance was significant for academic performance.

Prosocial Behavior

Results of the validity generalization meta-analysis for the PSG Prosocial Behavior item are presented in Table 3. The reported intercepts for all of the aggregated outcomes were significantly different from zero. The PSG Prosocial Behavior item was consistently related to all of the outcomes, with mostly moderate correlation coefficients; the largest correlation was with broadband externalizing/internalizing problems in the negative direction. Results indicated that within-study variance was significant for the academic behavior and positive behavior outcomes.

Motivation to Learn

Results of the validity generalization meta-analysis for the PSG Motivation to Learn item are presented in Table 4. The reported intercepts for all of the aggregated outcomes were significantly different from zero. The PSG Motivation to Learn item was most strongly correlated with broadband externalizing/internalizing problems (in the negative direction), followed very closely by academic behavior, positive behavior, and academic performance. Results indicated that within-study variance was significant for positive behavior and broadband externalizing/internalizing problems.

Publication Bias

We report the results of the Fail-Safe N procedure in Tables 1 through 4. Results suggested that between 2 and 31 additional studies with nonsignificant results would be needed to reduce the average effect size to the target value of .2. Although firm guidelines for the interpretation of Fail-Safe N values are not available, one clear finding is that the internalizing problems variable consistently had the lowest Fail-Safe N value across all four PSG items. This suggests that very few additional studies may be necessary to produce small effect sizes for the internalizing problems category. In other words, the significant correlation coefficients between the four PSG items and internalizing variables are considered the least stable when compared with coefficients for the other criterion variables.

Discussion

The purpose of this study was to use multilevel meta-analysis to assess the extent to which single-item PSG scores correlate with alternative measures and outcomes such as student academic performance and social/behavioral functioning. Results have important implications for using the elementary version of the PSG as a screener to identify students in need of academic and mental health support. Furthermore, this study represents a unique contribution to the literature that currently includes meta-analyses of multiple-item screeners, but none based on single-item screeners. The 10 studies included in this meta-analysis reported correlation coefficients between PSG-elementary version scores and a wide range of criterion variables, including scores on rating scales of emotional/behavioral functioning and behavioral and academic outcomes such as office discipline referrals, suspensions, attendance, and grade point average. All of the resulting coefficients were aligned such that higher PSG scores on all four items were positively associated with academic performance, academic behavior, and positive behaviors, and negatively associated with externalizing and internalizing behavior problems.

Recall that our first hypothesis was that the PSG Reading Skills, Mathematics Skills, and Motivation to Learn items would be most strongly correlated with the academic variables. This hypothesis was clearly supported for Reading Skills and Mathematics Skills, as both items were most strongly correlated (all coefficients in the .70s) with the academic performance and academic behavior variables, suggesting that these two items can be used to estimate students’ performance on relevant academic indicators (e.g., grades, attendance, and measures of academic engagement). At the same time, both the Reading and Mathematics items were also significantly (yet less strongly) correlated with the other four criterion variables (i.e., positive behavior, externalizing problems, internalizing problems, and broadband internalizing/externalizing problems). These relationships are consistent with research indicating that academic problems often co-occur with behavioral and social-emotional problems (Sullivan & Conoley, 2004), and suggest that these two PSG items may be predictive of overall risk status rather than limited to identifying academic risk. These relationships also are consistent with the literature on academic enablers, which supports the importance of academic behaviors such as engagement, motivation, and attitudes to predicting academic, behavioral, and psychological outcomes (Gresham & Elliott, 2017). Overall, results for Reading and Mathematics support both the convergent and divergent validity of these two PSG items.

With regard to the Motivation to Learn item, our first hypothesis was not supported, as scores on this item were relatively equally correlated with academic performance, academic behavior, positive behavior, and broadband internalizing/externalizing problems (all magnitudes between .68 and .71). The high correlation coefficients with positive behavior are consistent with previous research on the relationships among student motivation, goal setting, self-regulation, and academic achievement (e.g., Bembenutty, 2016; Mega et al., 2014). However, the strong correlation with broadband internalizing/externalizing problems suggests that scores on Motivation to Learn were unable to discriminate between academic variables and more social-emotional variables. Although we did not hypothesize these results, they are not unexpected when one considers that, as defined by the PSG, the Motivation to Learn item seems to measure behavioral academic engagement, defined as the degree of involvement, or actions and practices, that students direct toward learning (Wang & Eccles, 2013). Previous research indicates that lower academic engagement is associated with higher levels of behavior problems (e.g., Diaz et al., 2017; Kilgus et al., 2012; Thorne & Kamps, 2008).

Our second hypothesis was that the fourth PSG item, Prosocial Behavior, would show stronger correlations with behavioral variables, such as externalizing and internalizing problems, than with academic variables. This hypothesis was not supported by the data; although the strongest correlation was with broadband internalizing/externalizing problems, the next highest coefficient was with academic behavior, and this item was also significantly correlated with academic performance. Furthermore, Prosocial Behavior was more strongly correlated with academic behavior (.68) than with positive behavior (.58), even though Prosocial Behavior and positive behavior would appear to be more theoretically aligned. Previous research points to the importance of social skills in predicting positive behavioral and academic outcomes (Elliott & Gresham, 2008, 2017), but the relative uniformity in correlation coefficients for this item suggests that it may be less discriminating than the more academically oriented PSG items. Perhaps Prosocial Behavior, particularly during the elementary years, reflects overall positive classroom behavior, persistence, attention, and self-control, all of which are factors that are expected to impact achievement and academic behavior (Gresham & Elliott, 2017).

Across all four PSG items, the lowest correlation coefficients were consistently observed with the internalizing problems variable, suggesting that whereas teacher ratings on the elementary version of the PSG are highly correlated with overt student behaviors, schools may need to rely on other assessment methods (e.g., student self-report) to reliably screen for less conspicuous behaviors. This pattern is consistent with the results of the Fail-Safe N analyses, in which the internalizing problems category repeatedly had the lowest Fail-Safe N value across all four PSG items. This pattern also suggests that the internalizing problems correlation coefficients reported in this study were most susceptible to publication bias, likely due to fewer studies including internalizing problems as a dependent variable compared with the other categories of criterion variables examined in this study. At the same time, these lower observed coefficients between the four PSG items and internalizing problems also support the divergent validity of PSG scores, as internalizing problems seem to have the least similarity with the constructs purportedly assessed by the PSG and have shown comparable correlation coefficients with these constructs in previous research (Elliott & Gresham, 2008, 2017).

Limitations and Future Directions

These results must be interpreted within the context of several limitations. First, the current analyses were founded upon a relatively small number of studies (although these studies represent the current literature on the PSG). Similarly, generalizability is limited by the study characteristics presented in Table S1 in supplemental material (e.g., the lack of samples representing preschool and secondary grades is especially notable). In addition, seven out of 10 studies included in this meta-analysis used either the SRSS or BESS as criterion measures. This is not necessarily a limitation in and of itself, as the SRSS and BESS are established measures of important criterion variables. However, additional research is needed to understand the relationships between PSG items and a more diverse sample of behavior rating scales. Finally, a revised version of the PSG, called the SSIS Social-Emotional Learning Edition Screening and Monitoring Scales (Gresham & Elliott, 2017), was recently published. The revised version maintains a similar format to the PSG and most of the PSG items (with the exception of Prosocial Behavior); however, it includes additional items to screen for other areas of functioning. Additional research will need to examine whether similar correlational results are found with this revised version of the scale, and more research is needed to support the use of the PSG with students in preschool and secondary school settings.

Conclusion

The results of this study are especially promising when we consider that each domain on the PSG is assessed by a single item, thereby highlighting the efficiency and simplicity of the PSG within the elementary school context. Given their consistent and significant correlations with criterion variables, these four items seem to provide estimates of broader functioning on relevant academic and social/emotional variables and therefore appear to be ideal for large-scale screening purposes in the schools. An additional advantage of the PSG is the specificity of the items, and how responses may translate more easily to interventions as compared with screeners that provide overall composite scores based on items assessing diverse constructs (Villarreal et al., 2019). Overall, the correlations between PSG scores and criterion variables support the convergent validity of PSG scores (i.e., the items showed strong correlations with variables with which they were hypothesized to be correlated). However, divergent validity was less substantiated based on these analyses (i.e., some of the items were also significantly correlated with less similar variables, such as Motivation to Learn being most strongly correlated with broadband internalizing/externalizing problems). Of course, prospective studies (e.g., Kettler et al., 2012; Kilgus et al., 2012; Krach et al., 2017) are needed to evaluate the ability of PSG items to facilitate accurate decision-making and to identify students in need of further services. For now, the results of this meta-analysis suggest that based on current research, scores on the four PSG items are significantly related to important academic and behavioral variables among students in elementary school settings, and therefore hold potential for contributing to universal screening approaches.

Supplemental Material

Supplemental material, Table_S1 for SSIS Performance Screening Guide as an Indicator of Behavior and Academics: A Meta-Analysis by Jeremy R. Sullivan, Victor Villarreal, Evette Flores, Alyssa Gomez and Blaire Warren in Assessment for Effective Intervention

Footnotes

Authors’ Notes

A previous version of this paper was presented at the 2019 convention of the National Association of School Psychologists, Atlanta, GA.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Victor Villarreal

Supplemental Material

Supplemental material for this article is available online.

References

Allen, A. N., Kilgus, S. P., Burns, M. W., & Hodgson (2019). Surveillance of internalizing behaviors: A reliability and validity generalization study of universal screening evidence. School Mental Health, 11, 194–209. https://doi.org/10.1007/s12310-018-9290-3

Assink, & Wibbelink, C. J. M. (2016). Fitting three-level meta-analytic models in R: A step-by-step tutorial. The Quantitative Methods for Psychology, 12, 154–174. https://doi.org/10.20982/tqmp.12.3.p154

Bembenutty (2016). Motivation and self-regulated learning among preservice and in-service teachers enrolled in educational psychology courses. Scholarship of Teaching and Learning in Psychology, 2, 231–244. https://doi.org/10.1037/stl0000068

Benson, N. F., Floyd, R. G., Kranzler, J. H., Eckert, T. L., Fefer, S. A., & Morgan, G. B. (2019). Test use and assessment practices of school psychologists in the United States: Findings from the 2017 National Survey. Journal of School Psychology, 72, 29–48. https://doi.org/10.1016/j.jsp.2018.12.004

Beretvas, S. N., & Pastor, D. A. (2003). Using mixed-effects models in reliability generalization studies. Educational & Psychological Measurement, 63, 75–95.

Brown, J. R. (1992). Detecting potential hucksterism in meta-analysis using a follow-up fail-safe test. Psychology in the Schools, 29, 179–184. https://doi.org/10.1002/1520-6807(199204)29:2<179::AID-PITS2310290213>3.0.CO;2-1

Carson, K. P., Schriesheim, C. A., & Kinicki, A. J. (1990). The usefulness of the “fail-safe” statistic in meta-analysis. Educational and Psychological Measurement, 50, 233–243. https://doi.org/10.1177/0013164490502001

Carta, J. J. (2019). Introduction to multi-tiered systems of support in early education. In J. J. Carta & R. M. Young (Eds.), Multi-tiered systems of support for young children: Driving change in early education (pp. 1–14). Brookes.

Chafouleas, S. M., Kilgus, S. P., Jaffery, Riley-Tillman, T. C., Welsh, & Christ, T. J. (2013). Direct behavior rating as a school-based behavior screener for elementary and middle grades. Journal of School Psychology, 51, 367–385. https://doi.org/10.1016/j.jsp.2013.04.002

Cohen (1988). Statistical power analysis for the behavioral sciences. Routledge.

Diaz, Eisenberg, Valiente, VanSchyndel, Spinard, T. L., Berger, . . . Southworth (2017). Relations of positive and negative expressivity and effortful control to kindergarteners’ student-teacher relationship, academic engagement, and externalizing problems at school. Journal of Research in Personality, 67, 3–14. https://doi.org/10.1016/j.jrp.2015.11.002

DiPerna, J. C., Volpe, R. J., & Elliott, S. N. (2002). A model of academic enablers and elementary reading/language arts achievement. School Psychology Review, 31, 298–312.

DiPerna, J. C., Volpe, R. J., & Elliott, S. N. (2005). A model of academic enablers and mathematics achievement in the elementary grades. Journal of School Psychology, 43, 379–392. https://doi.org/10.1016/j.jsp.2005.09.002

Dowdy, Kamphaus, R. W., Abdou, A. S., & Twyford, J. M. (2012). Detection of symptoms of prevalent mental health disorders of childhood with the parent form of the Behavioral and Emotional Screening System. Assessment for Effective Intervention, 38, 192–198. https://doi.org/10.1177/1534508412447009

*Elliott, S. N., Davies, M. D., Frey, J. R., Gresham, & Cooper (2018). Development and initial validation of a social emotional learning assessment for universal screening. Journal of Applied Developmental Psychology, 55, 39–51. https://doi.org/10.1016/j.appdev.2017.06.002

Elliott, S. N., & Gresham, F. K. (2008). SSIS performance screening guide. Pearson Assessments.

Glover, T. A., & Albers, C. A. (2007). Considerations for evaluating universal screening assessments. Journal of School Psychology, 45, 117–135. https://doi.org/10.1016/j.jsp.2006.05.005

Gresham, F. M., & Elliott, S. N. (2008). Social skills improvement system: Rating scales manual. NCS Pearson.

Gresham, F. M., & Elliott, S. N. (2017). Social skills improvement system social-emotional learning edition. NCS Pearson.

*Hartman, Gresham, F. M., & Byrd (2017). Student internalizing and externalizing behavior screeners: Evidence for reliability, validity, and usability in elementary schools. Behavioral Disorders, 42, 108–118. https://doi.org/10.1177/0198742916688656

Jenkins, L. N., Demaray, M. K., Wren, N. S., Secord, S. M., Lyell, K. M., Magers, A. M., . . . Tennant (2014). A critical review of five commonly used social-emotional and behavioral screeners for elementary or secondary school. Contemporary School Psychology, 18, 241–254. https://doi.org/10.1007/s40688-014-0026-6

Kamphaus, R. W., & Reynolds, C. R. (2015). Behavior Assessment System for Children—Third Edition (BASC-3): Behavioral and Emotional Screening System (BESS). Pearson.

*Kettler, R. J., Elliott, S. N., Davies, & Griffin (2012). Testing a multi-stage screening system: Predicting performance on Australia’s national achievement test using teachers’ ratings of academic and social behaviors. School Psychology International, 33, 93–111. https://doi.org/10.1177/0143034311403036

*Kilgus, S. P., Chafouleas, S. M., Riley-Tillman, T. C., & Welsh, M. E. (2012). Direct behavior rating scales as screeners: A preliminary investigation of diagnostic accuracy in elementary school. School Psychology Quarterly, 27, 41–50. https://doi.org/10.1037/a0027150

Kilgus, S. P., Eklund, Maggin, D. M., Taylor, C. N., & Allen, A. N. (2017). The Student Risk Screening Scale: A reliability and validity generalization meta-analysis. Journal of Emotional and Behavioral Disorders, 26, 143–155. https://doi.org/10.1177/1063426617710207

Kilgus, S. P., Eklund, von der Embse, N. P., Taylor, C. N., & Sims, W. A. (2016). Psychometric defensibility of the Social, Academic, and Emotional Behavior Risk Screener (SAEBRS) Teacher Rating Scale and multiple gating procedure within elementary and middle school samples. Journal of School Psychology, 58, 21–39. https://doi.org/10.1016/j.jsp.2016.07.001

Kilgus, S. P., Reinke, W. M., & Jimerson, S. R. (2015). Understanding mental health intervention and assessment within a multi-tiered framework: Contemporary science, practice, and policy. School Psychology Quarterly, 30, 159–165. https://doi.org/10.1037/spq0000118

*Krach, S. K., McCreery, M. P., Wang, Mohammadiamin, & Cirks, C. K. (2017). Diagnostic utility of the social skills improvement system performance screening guide. Journal of Psychoeducational Assessment, 35, 391–409. https://doi.org/10.1177/0734282916636500

*Lane, K. L., Oakes, W. P., Common, E. A., Zorigian, Brunsting, N. C., & Schatschneider (2015). A comparison between SRSS-IE and SSIS-PSG scores: Examining convergent validity. Assessment for Effective Intervention, 40, 114–126. https://doi.org/10.1177/1534508414560346

*Lane, K. L., Oakes, W. P., Ennis, R. P., & Royer, D. J. (2015). Additional evidence of convergent validity between SRSS-IS and SSiS-PSG scores. Behavioral Disorders, 40, 213–229.

*Lane, K. L., Richards-Tutor, Oakes, W. P., & Connor (2014). Initial evidence for the reliability and validity of the student risk screening scale with elementary age English learners. Assessment for Effective Intervention, 39, 219–232. https://doi.org/10.1177/1534508413496836

Mega, Ronconi, & De Beni (2014). What makes a good student? How emotions, self-regulated learning, and motivation contribute to academic achievement. Journal of Educational Psychology, 106, 121–131. https://doi.org/10.1037/a0033546

*Miller, F. G., Cohen, Chafouleas, S. M., Riley-Tillman, T. C., Welsh, M. E., & Fabiano, G. A. (2015). A comparison of measures to screen for social, emotional, and behavioral risk. School Psychology Quarterly, 30, 184–196. https://doi.org/10.1037/spq0000085

Moore, S. A., Widales-Benitez, Carnazzo, K. W., Kim, E. K., Moffa, & Dowdy (2015). Conducting universal complete mental health screening via student self-report. Contemporary School Psychology, 19, 253–267. https://doi.org/10.1007/s40688-015-0062-x

*Oakes, W. P., Lane, K. L., & Ennis, R. P. (2016). Systematic screening at the elementary level: Considerations for exploring and installing universal behavior screening. Journal of Applied School Psychology, 32, 214–233. https://doi.org/10.1080/15377903.2016.1165325

Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8, 157–159. https://doi.org/10.2307/1164923

Quintana, D. S. (2015). From pre-registration to publication: A non-technical primer for conducting a meta-analysis to synthesize correlational data. Frontiers in Psychology, 6, 1–9. https://doi.org/10.3389/fpsyg.2015.01549

R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/

Reinke, W. M., & Herman, K. C. (2016). Using brief assessments of important indicators to inform school-based interventions and practice. Assessment for Effective Intervention, 42, 3–5.

Shultz, K. S., & Whitney, D. J. (2005). Measurement theory in action: Case studies and exercises. Sage.

Stormont, M. A., Herman, K. C., Reinke, W. M., King, K. R., & Owens (2015). The Kindergarten academic and behavior readiness screener: The utility of single-item teacher ratings of kindergarten readiness. School Psychology Quarterly, 30, 212–228.

Stormont, M. A., Thompson, A. M., Herman, K. C., & Reinke, W. M. (2017). The social and emotional dimensions of a single item overall school readiness screener and its relation to academic outcomes. Assessment for Effective Intervention, 42, 67–76.

Sullivan, J. R., & Conoley, J. C. (2004). Academic and instructional interventions with aggressive students. In J. C. Conoley & A. P. Goldstein (Eds.), School violence intervention: A practical handbook (2nd ed., pp. 235–255). Guilford Press.

Thorne, & Kamps (2008). The effects of group contingency intervention on academic engagement and problem behavior of at-risk students. Behavior Analysis in Practice, 1, 12–18.

Viechtbauer (2015). Meta-analysis package for R. https://cran.r-project.org/web/packages/metafor/metafor.pdf

Villarreal, Sullivan, & Leeth (2019). BASC-2 Behavioral and Emotional Screening System: A validity generalization meta-analysis. Manuscript submitted for publication.

Wang, & Eccles, J. S. (2013). School context, achievement motivation, and academic engagement: A longitudinal study of school engagement using a multidimensional perspective. Learning and Instruction, 23, 12–23. https://doi.org/10.1016/j.learninstruc.2013.04.002
