Abstract
This study investigated the psychometric properties of the Preschool Language, Literacy, and Behavior Screener (PLLB-S). We examined and tested the factor structure of the PLLB-S using exploratory and confirmatory factor analyses. We further conducted internal consistency, concurrent validity, and predictive validity analyses and evaluated teacher satisfaction with the PLLB-S. Our factor analyses resulted in 22 items distributed among three subscales with high internal consistency: oral language, emergent literacy, and behavior skills. The PLLB-S and its subscales correlated moderately to strongly with standardized measures. The emergent literacy subscale of the PLLB-S was the only subscale that significantly predicted children’s later vocabulary knowledge. Preschool teachers reported high satisfaction with the content and purpose of the questionnaire. We concluded that this tool, which has sound psychometric properties, can potentially help increase the feasibility and efficiency of implementing standardized assessments in MTSS frameworks in preschool classrooms.
When applying a Multi-Tiered System of Supports (MTSS) in preschool classrooms, educators assess child performance on various developmental domains and monitor children’s progress as they continue to benefit from the tiered intensity of instruction. While interest in implementing MTSS in state-wide early childhood education programs continues to grow (Wackerle-Hollman et al., 2021), implementation barriers, including lack of funding, administrative support, evidence-based practice curricula, and psychometrically sound measures, persist (Greenwood et al., 2011). Thus, there is an ongoing effort to provide the resources needed to implement MTSS effectively in preschool classrooms (Carta & Miller-Young, 2019; Wackerle-Hollman et al., 2021). Over the last decade, researchers have developed a number of measures, data management systems, data-interpretation and decision-making tools, and decision-making processes (e.g., Individual Growth and Development Indicators, Wackerle-Hollman et al., 2015; The Child Observation Record, Wakabayashi et al., 2019).
Still, preschool educators report challenges with assessing and interpreting child performance across developmental domains for all children in their classrooms (Goldstein et al., 2019; Zucker et al., 2021). Challenges such as excessive time, training, and costs, as well as the need to know how to address multiple domains, impede teachers’ efforts to implement MTSS fully. Additionally, the younger children are, the more difficult it is to assess their developmental abilities across domains using one-point-in-time, direct standardized measures (Epstein et al., 2004). One plausible solution might be streamlining teacher efforts by reducing the administration of standardized assessments for each child and prioritizing the areas of greatest need. A brief, easy-to-administer screener might help teachers make judgments based on their impressions of young children’s language, literacy, and academic-social engagement performances before teachers implement direct performance measures.
This article summarizes the initial evaluation of the psychometric properties of a Preschool Language, Literacy, and Behavior Screener (PLLB-S). PLLB-S was designed to help streamline efforts to prioritize the implementation of performance measures of children who may need further examination in oral language, early literacy, and academic-social engagement domains. If teachers have no concerns about a particular child in a specific area, then follow-up testing in that domain may not be needed or may be delayed. In contrast, areas of concern should be prioritized for immediate follow-up testing.
Measuring Preschoolers’ Performance and Progress Across Developmental Domains
A foundational practice within MTSS is to use psychometrically sound measures to assess children’s skills in all developmental domains, which is followed by decisions regarding a suitable tiered intensity of instruction for each child. Based on the results of performance measures, children who do not yet meet benchmarks receive Tier 2 (e.g., intensified instruction delivered in small groups) or Tier 3 instruction (e.g., individualized instruction) while all children continue to receive the whole-group, core curriculum comprised of evidence-based instructional practices (Tier 1).
In theory, researchers agree that identifying the individualized instructional needs of each preschool child and delivering appropriate interventions improves children’s school readiness (Carta & Miller-Young, 2019). However, translating theoretical approaches into daily classroom practices is rather difficult to do well, resulting in unsystematic implementation of MTSS in early childhood (Wackerle-Hollman et al., 2021).
Preschool teachers are often considered the ideal individuals to collect information about children’s performance and progress (Wakabayashi et al., 2019). Although many preschool teachers understand the value of standardized measures, they report a lack of time and other organizational challenges that impede their routine use (Goldstein et al., 2019). For example, educators find it difficult to administer tests with children individually while monitoring the behavior of other children. Teachers who vary in their experience may need to attend training, which may require staying after school hours. Additional testing requirements might be challenging to implement in a professional context in which many teachers change their positions frequently due to health risks, low pay, and limited health insurance benefits (McLean et al., 2021).
Practical barriers also reduce the use of proper assessment practices in early childhood education. For example, a lack of quality instructional and behavior management practices related to assessment implementation may affect preschoolers’ responses to the assessment questions or tasks. Factors such as children’s rapidly changing needs and emotional states might affect their performance (Meisels, 2007). Young children might be unfamiliar with the assessment context or the assessor and may simply prefer not to respond. Assessors may need to predict each child’s needs, use behavior management practices, or arrange the assessment environment to encourage young children to respond consistently, which is essential for ensuring the validity of test results (Epstein et al., 2004). To reduce the burden of practical barriers related to assessment implementation, teachers or other assessors may use an initial screener across developmental domains and reduce the level of assessment needed for all children in their classrooms.
Teacher perceptions and observations have been widely used in rating scales to help measure child skills. Evaluations of rating scales, most of which have been applied to older children, have yielded equivocal results. Some studies demonstrate that teacher evaluations of child comprehension and school readiness are likely to be accurate in relation to standardized assessments (Paleczek et al., 2017). However, other studies indicate that teachers tend to show cultural, ethnic, or racial bias, particularly in their evaluations of child social-emotional skills (Ura & d’Abreu, 2022). These studies also note that teacher bias might be affected by the teacher’s education level, child age, or teachers’ experience in administering standardized assessments (Begeny & Buchanan, 2010; Mashburn & Henry, 2004). Despite the bias teachers may demonstrate when using rating scales, preschool classroom teachers seem well positioned to evaluate child skills because they observe or interact with children regularly for prolonged durations.
The Standards for Educational and Psychological Testing (AERA et al., 2014) and the Institute of Education Sciences’ Practice Guide (Hamilton et al., 2009) recommend a two-stage screening to increase confidence in data-based decision-making. A two-stage screening involves applying a less expensive and more convenient screener to a large sample first, followed by a more robust and systematic test to inform the final decision. A screening measure that relies on teachers’ observations and professional judgments when evaluating children’s language, literacy, and academic-social engagement skills may be a feasible first step to assess children’s learning needs. Such a screener has the potential to minimize the time, intrusiveness, and expense of implementing multiple assessments for all children to identify children who may be at risk for developmental delays. As educators seek to identify areas of need across developmental domains, the use of authentic observations may streamline such efforts and reduce burdensome testing for teachers and children.
The current study sought to provide preliminary data on the psychometric properties of a screener that could be a useful first-stage assessment and inform the use of second-stage standardized measures. To our knowledge, few comparable first-stage screening tools are available to measure preschoolers’ oral language, emergent literacy, and academic and social engagement skills. PLLB-S is a short screener that helps early childhood educators evaluate child skills in these three domains and use their professional judgment to prioritize subsequent assessments. Therefore, we investigated the reliability, validity, and acceptability of using PLLB-S specifically for MTSS. This study addressed the following research questions: 1. What are the factor structure and internal consistency reliability of PLLB-S? 2. To what extent does PLLB-S correlate with other measures of oral language, emergent literacy, and social behavior? 3. How well do PLLB-S and its subscales predict children’s vocabulary learning in an early vocabulary intervention program? 4. To what extent do preschool teachers perceive PLLB-S as an acceptable screener of preschoolers’ oral language, emergent literacy, and academic-social engagement skills?
Method
Participants and Setting
Participants were enrolled in two randomized controlled trials that investigated the effects of a vocabulary learning intervention program, Story Friends™. Participants included 137 children from 19 preschool classrooms for the first study (Kelley et al., 2020) and 176 children from 17 preschool classrooms for the second study (Madsen et al., 2022). All classrooms provided a full-day early childhood program that was funded in part through Florida’s Voluntary Prekindergarten (VPK) program. Fifteen programs also received additional subsidy from Florida’s Division of Early Learning School Readiness Program and/or Title I funding. In total, 36 preschool teachers completed PLLB-S for 313 preschool children: 165 girls and 148 boys with an average age of 54.3 months (Range = 38–64 months). Child demographic characteristics are presented in Table 1 as a Supplemental Material.
Families were asked to complete a brief demographic survey; 67 of 137 families from Study 1 and 136 of 176 families from Study 2 returned the surveys. The survey included information about the education and income levels of the children’s caregivers, the primary language spoken at home, and children’s gender, age, and ethnicity. A summary of family demographic characteristics appears in Supplemental Table 1. A total of 31 of 36 teachers completed demographic questionnaires about their teaching experience, educational level, gender, age, and ethnicity. Teachers’ demographic characteristics are presented in Table 2 as a Supplemental Material.
Measures
Preschool language, literacy, and behavior screener (PLLB-S)
Bradfield et al. (2013) completed the initial item development of PLLB-S based on early language and literacy development literature and experts’ content knowledge as part of the Center for Response to Intervention in Early Childhood (CRtIEC) project (https://innovation.umn.edu/igdi/project-archive/crtiec/). The expert content knowledge was gathered during the cross-site meetings of the CRtIEC project team, which consisted of 13 researchers from five partner sites. The minutes from these meetings informed decisions for the initial item pool development. The pool included items addressing child oral language (n = 30), emergent literacy (n = 50), and behavior (n = 20) skills. Bradfield et al. (2013) conducted a pilot study and preliminary item analysis (i.e., item-to-total correlations and coefficient alpha) to reduce the questionnaire to 49 items. Using these 49 items, the team conducted an initial PLLB-S validation study with 116 preschool children to investigate the efficiency of implementing IGDIs and PLLB-S to identify children who could benefit from additional instruction. Except for the initial item development and analyses conducted by Bradfield et al. (2013), limited information on the psychometric properties of this questionnaire exists.
We sought to evaluate whether the questionnaire conforms to reliability and validity standards. We initially reduced the number of items to make it acceptable as a whole-class screening tool by using an iterative process and incorporating feedback from experts and teachers. Based on this input, we reduced the questionnaire to 24 items representing the following key kindergarten readiness indicators: oral language (10 items), emergent literacy (eight items), and academic-social engagement (six items). Each item has four response choices for teachers to complete for each child in the classroom (Never [0–10%] = 1, Rarely [11–49%] = 2, Often [50–89%] = 3, and Almost Always [90–100%] = 4). Teachers were given standard instructions: “For each student, think about the frequency with which this child displays each skill or behavior for each sentence listed under ‘Language, Literacy, and Behavioral Skills.’ Then write the number from 1 (Never) to 4 (Almost Always) for each sentence in the left column.” To calculate PLLB-S scores, we summed the teacher responses for each subscale (oral language skills subscale min = 10, max = 40; emergent literacy subscale min = 8, max = 32; academic-social engagement subscale min = 6, max = 24) and for the total measure (min = 24, max = 96).
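The scoring rule described above can be illustrated with a short sketch. This is not the authors' scoring procedure as implemented; the assignment of item numbers to subscales (items 1–10, 11–18, and 19–24) and the function name `score_pllbs` are assumptions made only for illustration, because the text reports the number of items per subscale but not which item numbers belong to each.

```python
# Hypothetical item layout for the 24-item form. The source states only how
# many items each subscale has, not which item numbers belong to it.
SUBSCALES = {
    "oral_language": range(1, 11),                # 10 items, possible range 10-40
    "emergent_literacy": range(11, 19),           # 8 items, possible range 8-32
    "academic_social_engagement": range(19, 25),  # 6 items, possible range 6-24
}


def score_pllbs(ratings):
    """Sum 1-4 teacher ratings into subscale scores and a total score.

    ratings: dict mapping item number (1-24) to a rating of 1, 2, 3, or 4.
    """
    if set(ratings) != set(range(1, 25)):
        raise ValueError("expected ratings for items 1 through 24")
    if any(r not in (1, 2, 3, 4) for r in ratings.values()):
        raise ValueError("each rating must be 1, 2, 3, or 4")
    scores = {name: sum(ratings[i] for i in items)
              for name, items in SUBSCALES.items()}
    # The total is the sum of the three subscale scores (possible range 24-96).
    scores["total"] = sum(scores.values())
    return scores


# A child rated "Never" on every item and a child rated "Almost Always".
lowest = score_pllbs({i: 1 for i in range(1, 25)})
highest = score_pllbs({i: 4 for i in range(1, 25)})
```

Under these assumptions, the two extreme rating patterns reproduce the minimum and maximum values reported for each subscale and for the total measure.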
Peabody Picture Vocabulary Test (PPVT) Fourth Edition
PPVT (Dunn & Dunn, 2007) is a norm-referenced receptive vocabulary test for Standard American English. We selected this measure because it is widely used across experimental studies and review papers and is appropriate for use with populations that are diverse in race, ethnicity, and gender (Goldstein & Hersen, 2000; Washington & Craig, 1999). PPVT is normed for individuals from 2 years 6 months through 90 years of age and requires individuals to point to one of four pictures that best illustrates the meaning of the word presented by the test administrator. The test items include verbs, nouns, and adjectives. The split-half reliability coefficients based on the normative sample range from .94 to .95, and test-retest coefficients range between .92 and .96.
Clinical Evaluation of Language Fundamentals Preschool-2 (CELF-P2)
CELF-P2 (Wiig et al., 2004) measures the expressive and receptive language skills of 3- to 6-year-old preschool children to identify, diagnose, and evaluate language deficits. We selected this measure because it provides a comprehensive analysis of child language skills.
Florida Voluntary Prekindergarten (VPK) Assessment
The VPK Assessment (Lonigan, 2011) is a standardized measure used across Florida, with subscales that screen print knowledge (PK), phonological awareness (PA), mathematics, and oral language. We selected the VPK assessment because Pre-K teachers in Florida are encouraged to administer this measure in the fall, winter, and spring, and teachers’ periodic data collection allowed our research team to obtain VPK fall PK and PA subscale scores. In 2009–2010, state-wide implementation of the measure with samples of over 2000 preschoolers revealed internal consistency reliabilities of .85 for PK and .83 for PA. Correlations between the VPK PK and PA subscales and the Test of Preschool Early Literacy (TOPEL; Lonigan et al., 2007) were reported to be significant (N = 288; PK: r = .86, p < .001; PA: r = .68, p < .001).
Social Skills Improvement System (SSIS) Social–Emotional Learning (SEL) and Academic Functioning (AF) Screeners
The SSIS Screening/Progress Monitoring Scales (Elliott et al., 2018) are designed to evaluate preschool to 12th-grade children’s social-emotional learning (SEL) in five areas (Self-Awareness, Relationship Skills, Decision-Making, Social Awareness, and Self-Management) and Academic Functioning (AF) in three areas (Motivation to Learn, Reading Skills, and Mathematics Skills). For each behavior or academic skill area, classroom teachers evaluate child proficiency on a scale of 1–5 (1 = lowest and 5 = highest performance). For the current study, we collected children’s SEL and AF composite scores because these scores provide a general picture of child social behaviors and academic competence based on nationally (USA) standardized behavior rating scales. The total scores for the SEL composite range from 5 to 25. The range of AF scores is 3–15. Based on the teacher ratings of children between 4 and 8 years old, Elliott et al. (2018) reported the reliability estimate of SEL as .93 (N = 268) and the test–retest reliability estimate as .89 (N = 266). The reliability estimate of AF was .91 (N = 268), and the test-retest reliability estimate was .91 (N = 266).
Vocabulary Learning
Our dataset included vocabulary learning scores of children who participated in two randomized trials of the Story Friends™ vocabulary intervention study (57 children in the experiment condition and 60 children in the control condition) (Kelley et al., 2020; Madsen et al., 2022). These replication studies evaluated the effects of the Story Friends™ vocabulary program on child academic vocabulary learning in preschool classroom settings. Small groups of children listened to Story Friends™ storybooks with pre-recorded and embedded vocabulary lessons that encourage children’s active participation. The research team measured child learning of vocabulary words taught during the Story Friends™ vocabulary program using a researcher-developed open-ended definitional task (e.g., Tell me. What does
Social Validity
We measured teacher satisfaction and ratings of usability of PLLB-S using a brief social validity questionnaire. This survey included five questions rated on a 5-point Likert scale (1 = Disagree, 3 = Neutral, and 5 = Completely Agree). Survey items addressed the acceptability of PLLB-S, whether the time required to complete this measure was reasonable, whether the measure addressed critical skills for later academic success, whether the directions were clear, and whether the teachers would use this measure in the future.
Procedures and Scoring Reliability
All participating classroom teachers received PLLB-S and the SSIS screening measures in file folders in the fall, within the first 2 months of the academic year. Before they completed PLLB-S, we asked teachers to read the instructions and read through the items. Teachers completed PLLB-S and SSIS measures in the order they chose. Once teachers completed these measures, we asked them to complete the social validity questionnaire. The scores of all other measures (Teacher and Parent Demographic Measures, PPVT-4, CELF-P2, and the VPK assessment) were obtained from the databases of the randomized group studies. VPK assessment results were obtained from the preschool teachers who administered the fall assessments.
Trained research assistants administered and scored the PPVT-4 and CELF-P2 measures at the beginning of the year. A primary scorer used the CELF-P2 and PPVT-4 protocols to score, and then a second scorer checked whether the user manuals were followed and the item scores were summed correctly. A trained research team member scored three Unit Vocabulary Tests about a month apart (December, February, March). A second team member independently scored one-third of the tests. Item-by-item interobserver agreement was calculated. For Study 1, agreement scores averaged 97.8% (range 96–100%) for the pretest and 92.9% (range 87.5–100%) for the post-test. For Study 2, the agreement was 100% for the pretest and 99.5% (range 99–100%) for the post-test.
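As an illustration of item-by-item agreement, the sketch below uses the common percent-agreement definition (items scored identically divided by the total number of items, multiplied by 100). The text does not state the exact formula the research team used, so this definition, the function name `item_agreement`, and the example scores are assumptions.

```python
def item_agreement(primary, secondary):
    """Percent of items on which two scorers assigned the same score."""
    if len(primary) != len(secondary) or not primary:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(primary, secondary))
    return 100.0 * matches / len(primary)


# Hypothetical item scores assigned by two scorers to one child's test.
scorer_1 = [2, 1, 0, 2, 2, 1, 0, 2]
scorer_2 = [2, 1, 0, 2, 1, 1, 0, 2]

# The scorers differ on one of eight items.
agreement = item_agreement(scorer_1, scorer_2)
```

Averaging such per-test percentages across the double-scored tests would yield a summary of the kind reported above, along with its range.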
Data Analysis
We descriptively investigated the PLLB-S datasets initially and presented the means and standard deviations of teachers’ responses to each item in Table 3 as Supplemental Material. Later we evaluated the factor structure with an exploratory factor analysis (EFA) on the sample from the first study (n = 137) and a confirmatory factor analysis (CFA) on the sample from the second study (n = 176) using Mplus 8.4 (Muthén & Muthén, 1998-2017).
Responses to the items on PLLB-S were treated as ordered categories. We used weighted least squares mean- and variance-adjusted (WLSMV) estimation to analyze the polychoric correlation matrix of the 24 items (see Supplemental Table 3). Geomin rotation was used to interpret the results of the EFA. We addressed missing data using pairwise deletion within Mplus 8.4. Results from the EFA in the first sample informed a model structure that was subjected to CFA with the second sample. We examined model fit using the chi-square likelihood ratio statistic, Bentler’s (1990) Comparative Fit Index (CFI), the Standardized Root Mean Square Residual (SRMR), and the Root Mean Square Error of Approximation (RMSEA; Rigdon, 1996). According to Hu and Bentler (1999), CFI values greater than or equal to .95 and SRMR and RMSEA values less than or equal to .08 indicate good levels of fit. Cronbach’s alphas were calculated for the items that were determined to represent a factor based on the EFA results. We evaluated the concurrent validity of the questionnaire by calculating the correlation coefficients of each extracted factor (oral language, emergent literacy, and academic-social engagement) with the results of the standardized measures of the specific domains (PPVT-4 and CELF-P2 Core Language, VPK-PA and PK, SSIS-SEL and AF composite scores). The predictive validity of the questionnaire was evaluated with a multiple regression model in which the PLLB-S subscale scores served as predictors and children’s vocabulary learning during the Story Friends™ intervention served as the criterion variable. For the social validity assessment of teachers’ perceptions of acceptability and satisfaction with the questionnaire, we reported the ratings and the frequencies of teacher responses for each item.
Results
Exploratory Factor Analysis (EFA)
The data used to run EFA demonstrated that teacher mean responses to the 24 items on PLLB-S ranged from a high of 3.33 (SD = 0.78) on a four-point scale for item 4 (“This child identifies and labels most common objects in the classroom”) to 2.28 (SD = 1.03) for item 13 (“This child is able to identify the sounds associated with letters”). Item frequencies from the EFA are presented in Table 4 as a Supplemental Material. There were no variables with 5% or more missing data.
All EFA analyses treated the item data as ordered categorical responses. We used the TYPE = complex option to account for the nested structure of the data (i.e., children nested within 19 teacher classrooms). The degree of non-independence among 19 classrooms was minimal as measured by the intraclass correlations for the 22 items (M = .025, range = .060–.056). Geomin rotation (oblique) was used because we aimed to model the correlations among factors.
Ultimately, we conducted eight EFAs with a 24-item version and a 22-item version of PLLB-S, extracting one to four factors. The 24-item version of the questionnaire included two additional emergent literacy items (e.g., “This child knows how to hold and follow along with a book [e.g., position; turn pages; left to right, top to bottom],” and “The child pretends to write what looks like letters to form words.”). The factor loadings for each item of the 24-item questionnaires showed that these two items did not load strongly on any of the three factors. Because these items targeted skills that loaded somewhat evenly across multiple factors, we decided to remove them from the questionnaire for statistical and conceptual reasons.
The three-factor solution with 22 items provided the best fit according to model fit indices: χ² (168, N = 137) = 298.18, p < .01, RMSEA = .073, CI [.059–.087], CFI = .978, TLI = .969, and SRMR = .052. Although the chi-square test indicated a significant lack of fit, the alternative indices indicated a good fit according to guidelines provided by Hu and Bentler (1999). We followed the suggestions of Tabachnick et al. (2007) to determine whether an item loaded on a factor; according to this guideline, factor loadings over .55 are considered good. The pattern structure matrix loadings indicated that all items loaded on the respective factors that they were intended to measure. Secondary factor loadings were evaluated based on a minimum difference of .20 between loadings. Item 7 secondarily loaded on the third factor, and items 14, 15, 16, and 19 secondarily loaded on the first factor. Correlations between factors indicated that the oral language subscale significantly correlated with the emergent literacy and academic-social engagement subscales. There was no significant correlation between the emergent literacy and academic-social engagement subscales.
Internal Consistency
We computed Cronbach’s alpha scores for total scale and each of the subscale scores of PLLB-S using SPSS 26. Internal consistency of the overall 22-item PLLB-S showed high reliability (α = .96, 95% CI [.95, .97]). Oral language (α = .96, 95% CI [.95, .97]), emergent literacy (α = .93, 95% CI [.90, .95]) and academic-social engagement (α = .93, 95% CI [.90, .95]) subscales also indicated high internal consistency reliability.
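For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha from its standard definition, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It is an illustration only: the study used SPSS 26, and the ratings shown here are hypothetical rather than study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-child lists of item ratings."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across children.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each child's summed score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-4 ratings for four children on a three-item subscale.
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(ratings)
```

Alpha approaches 1 as items covary more strongly relative to their individual variances, which is the pattern the high subscale values above reflect.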
Confirmatory Factor Analysis (CFA)
The data used to run the CFA showed that teachers’ mean responses to the 22 items ranged from a high of 3.60 (SD = 0.70) on a four-point scale for item 4 (“This child identifies and labels most common objects in the classroom”) to a low of 2.60 (SD = 1.00) for item 16 (“This child is able to blend parts of words to make whole words”). Item frequencies from the CFA are presented in Table 4 as a Supplemental Material. Six items (items 4, 7, 14, 15, 16, and 19) had 0.6% missing data, item 11 had 4.5% missing data, and item 12 had 5.1% missing data.
We examined the model fit for the 22-item, three-factor model. The model was identified by setting the first factor loadings to 1.0 for each of the three factors. Item responses were treated as ordered categorical. We used the clustered data option (TYPE = complex) to adjust chi-square statistics for the nested data structure from 17 classrooms represented by 17 preschool teachers. The degree of non-independence among the 17 classrooms was minimal as measured by the intraclass correlations for the 22 items (M = .039, range = .024–.050). The CFA model derived from EFA results was consistent with the original conceptualization of PLLB-S.
The chi-square value, χ² (321, N = 176) = 487.249, p < .01, indicated a significant lack of fit; however, the other model fit indices indicated acceptable fit (RMSEA = .054, CI [.044–.064], SRMR = .067, TLI = .974, and CFI = .978). Standardized factor pattern coefficients for the questionnaire items were significantly different from zero (p < .01). Hair, Black, Babin, and Anderson (2014, p. 618) presented guidelines for CFA and suggested that “standardized loading estimates should be .5 or higher, and ideally .7 or higher.” Considering this criterion, our loadings were above the ideal guideline. One item, item 20, had a factor loading greater than 1 (1.023, SE = .026). As explained by Jöreskog (1999), item loadings greater than 1 imply high correlations among the items but do not imply that something is wrong. The correlations among the factors were statistically significant (p < .001). The largest correlation between factors was between the oral language and academic-social engagement subscales (r = .81) (see Supplemental Table 3).
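The comparison of the reported indices with the cutoffs cited in the Data Analysis section (CFI of at least .95; SRMR and RMSEA of at most .08) can be made explicit with a small check. The function name `meets_cutoffs` is hypothetical, and the sketch simply restates the reported values; it does not re-estimate any model.

```python
def meets_cutoffs(cfi, srmr, rmsea):
    """Compare fit indices with the cutoffs stated in the Data Analysis section."""
    return {
        "CFI": cfi >= 0.95,
        "SRMR": srmr <= 0.08,
        "RMSEA": rmsea <= 0.08,
    }


# Values as reported for the 22-item, three-factor solutions.
efa = meets_cutoffs(cfi=0.978, srmr=0.052, rmsea=0.073)
cfa = meets_cutoffs(cfi=0.978, srmr=0.067, rmsea=0.054)
```

Both sets of indices satisfy all three cutoffs, which is the basis for describing the fit as acceptable despite the significant chi-square tests.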
Internal Consistency
We used SPSS 26 to calculate internal consistency reliability coefficients for the total scale and each of the subscale scores of PLLB-S in Study 2. The 22-item PLLB-S scores showed high reliability, α = .97, CI [.96–.97]. We also examined the item-to-total correlations for the items within each of the three factors. As shown in Supplemental Table 3, the three subscales of PLLB-S demonstrated high alpha values as well. All 22 items demonstrated adequate item-to-total correlations (range: .63–.86) relative to the minimum acceptable values of 0.3–0.4 suggested by Nunnally and Bernstein (1994).
Concurrent and Predictive Validity
We used Mplus 8.4 to examine the relations between PLLB-S and the standardized assessments (PPVT-4, CELF-P2, VPK-PA and PK, SSIS-SEL and AF). As seen in Table 5, all correlations were positive and significant. PPVT and CELF scores positively correlated with the oral language subscale. Despite the rather small sample size (n = 100), the VPK PK and VPK PA assessments and the emergent literacy subscale correlated positively. The SSIS-SEL and AF subscales and the academic-social engagement subscale were positively correlated.
Predictions of Vocabulary Learning
We conducted multiple regression analyses using SPSS 24 to examine the value of the three PLLB-S subscales in predicting the vocabulary learning of preschool children. We evaluated each subscale separately and added the condition (i.e., treatment or control) and interaction terms at the second step in the multiple regression model. Each model included one of the three subscales, the condition, and the interaction between the grand-mean-centered value of the subscale (e.g., oral language) and the condition. We used the interaction effect to evaluate the differential predictive validity of the subscale for predicting vocabulary learning for children in the control condition versus those in the treatment condition.
The model consisting of condition, interaction, and the emergent literacy subscale explained 61.4% of the variance in children’s vocabulary learning. The regression analyses revealed that, after accounting for the significant contribution of experimental condition (b = 28.87, t = 12.16, p < .01), the emergent literacy subscale was the only subscale that had a significant interaction with the treatment condition (b = 7.94, p < .05). Hence, the emergent literacy subscale was a stronger predictor for the treatment group than for the control group. A one-unit increase in the emergent literacy subscale score was associated with a 2.47-point increase in vocabulary learning for children in the control condition (t = 1.26, p = .21), whereas a one-unit increase in the emergent literacy subscale score was associated with a 10.41-point increase in vocabulary learning for children in the treatment condition. The oral language and academic-social engagement models showed that these subscales and their interactions with the conditions did not significantly predict later vocabulary learning.
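The two slopes reported above follow from the interaction model: the slope for the control group is the subscale coefficient, and the slope for the treatment group adds the interaction coefficient. The sketch below works through that arithmetic. It assumes dummy coding of condition (0 = control, 1 = treatment), which is consistent with the reported values but is not stated explicitly in the text; the constant and function names are hypothetical.

```python
# Coefficients as reported for the emergent literacy model.
B_SUBSCALE = 2.47     # slope of emergent literacy in the control condition
B_INTERACTION = 7.94  # emergent literacy x condition interaction


def simple_slope(condition):
    """Slope of emergent literacy on vocabulary learning for one condition.

    condition: 0 = control, 1 = treatment (dummy coding is an assumption).
    """
    return B_SUBSCALE + B_INTERACTION * condition


control_slope = simple_slope(0)    # subscale coefficient only
treatment_slope = simple_slope(1)  # subscale coefficient plus interaction
```

Rounded to two decimals, the treatment slope equals 2.47 + 7.94 = 10.41, matching the value reported for children in the treatment condition.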
Social Validity
Twenty-nine of 32 teachers responded to the final version of PLLB-S at the beginning of the following school year. Twenty-two teachers completely agreed and 7 teachers somewhat agreed that the directions for using PLLB-S were clear. Of the 29 teachers, 16 completely agreed and 10 somewhat agreed that PLLB-S was a good way to measure their students’ language and literacy development; one teacher somewhat disagreed with this statement, and two teachers provided neutral responses. Sixteen teachers completely agreed and 11 teachers somewhat agreed that the questionnaire measures critical skills for later academic success; one teacher provided a neutral response, and one teacher somewhat disagreed with this statement. The results also showed that 22 teachers completely agreed and 5 teachers somewhat agreed that the amount of time required to use PLLB-S was reasonable; two teachers provided neutral responses. Lastly, 26 teachers reported that they would use PLLB-S in the future, and 3 teachers provided neutral responses to this statement. Overall, most teachers reported satisfaction with the content and feasibility of PLLB-S and indicated willingness to use this questionnaire again to evaluate children’s language, literacy, and academic-social engagement skills.
Discussion
The purpose of this study was to examine the psychometric properties and internal structure of PLLB-S and teacher satisfaction with it. PLLB-S is a screener designed to help allocate effort by prioritizing the order and content of follow-up testing for children whose teachers report concerns about their oral language, early literacy, or academic-social engagement skills. This approach is congruent with the literature indicating that teachers’ informal observations and professional judgments are viable sources of information for evaluating child performance across developmental domains (Bagnato et al., 2014). Further, our approach is consistent with the two-stage testing recommendations of the Standards for Educational and Psychological Testing (AERA et al., 2014) and the Institute of Education Sciences’ Practice Guide (Hamilton et al., 2009). Specifically, PLLB-S was designed to be used in the first stage of a two-stage screening process. The first stage, an inexpensive method applied to a large sample, should be followed by a second stage that uses more robust and accurate testing to inform decisions about children’s performance.
Factor Structure of PLLB-S
Our EFA and CFA results indicated that PLLB-S contains the three hypothesized subscales: oral language, early literacy, and academic-social engagement. Based on the EFA results, we removed two items that were created to address preschoolers’ print awareness skills. These two items had weak factor loadings and, conceptually, did not distinctly address print awareness. We recognize that previous research indicates that print awareness, as an emergent literacy skill, predicts later reading achievement (Justice & Ezell, 2001). The two items may have failed to contribute to the model because their scores were uniformly high and thus not sensitive to differences among preschoolers. Future psychometric examinations of PLLB-S should address how these two items could be reworded or altered to address higher-level emergent literacy skills that require mastery of print awareness.
Evidence for Internal Consistency and Validity
In line with the findings of Bradfield et al. (2013), PLLB-S and its subscales showed high internal consistency. As evidence for validity, we examined the relationships between PLLB-S and the standardized assessments (PPVT-4, CELF-P2, VPK-PA, VPK-PK, SSIS). All correlations were positive and significant; however, a closer examination showed that their strength varied. For example, although the SSIS-SEL and SSIS-AF scores correlated strongly with the academic-social engagement subscale, SSIS-AF also had high correlations with the emergent literacy and oral language subscales. These correlations between PLLB-S subscales and social-emotional and academic functioning measures are in line with research indicating that children with lower language skills in the early grades tend to show later behavior problems (Chow et al., 2018). The PPVT-4 and CELF-P2 had lower yet significant correlations with the oral language subscale. Unlike the strong predictive value of the PLLB-S oral language subscale for the CELF-P2 core score reported in the initial validation study (Bradfield & McConnell, 2013), our results indicated a moderate correlation. The VPK-PK had a higher correlation than the VPK-PA with the emergent literacy subscale. This mix of findings warrants further investigation. Future research is needed to evaluate concurrent validity with a larger and perhaps more diverse sample.
We examined the predictive validity of the PLLB-S scores by evaluating their relation to children’s vocabulary learning during a small-group book reading intervention. Our results indicated that the emergent literacy subscale was the only subscale whose interaction with the treatment condition predicted children’s vocabulary learning. This result is consistent with previous findings showing the predictive value of emergent literacy skills for learning novel vocabulary (Lonigan et al., 2008). However, it was surprising that the PLLB-S oral language subscale did not significantly predict vocabulary learning, in contrast to past evidence of the predictive value of oral language abilities (Spira et al., 2005). The oral language subscale includes three items directly addressing children’s vocabulary knowledge. Although these items address expressive and receptive language skills, no item was designed to address children’s use of challenging vocabulary. Moreover, teachers frequently selected “almost always” for the oral language items, resulting in a restricted range that would attenuate correlations. Our interpretation is that the predictive validity of the PLLB-S screener might be affected by the sample characteristics, the length of time between prediction and outcome, or the outcome criterion itself (Meisels, 2007). Our sample mainly included preschoolers from households with lower income and education levels, and teacher ratings varied more widely for emergent literacy skills than for oral language skills. It is plausible that this wider variation in emergent literacy skills was linked to children’s vocabulary learning outcomes above and beyond oral language skills, which showed a more limited range of variability.
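The attenuating effect of a restricted range can be illustrated with the standard formula for direct range restriction on a predictor (the inverse of Thorndike's Case 2 correction). The sketch below is illustrative only: it assumes a linear, homoscedastic relation, and the input values are hypothetical rather than estimates from our data.

```python
import math


def restricted_correlation(r_full: float, sd_ratio: float) -> float:
    """Correlation expected after direct range restriction on the predictor.

    r_full:   correlation in the unrestricted population
    sd_ratio: restricted SD / unrestricted SD of the predictor (0 < ratio <= 1)
    """
    numerator = r_full * sd_ratio
    denominator = math.sqrt(1 - r_full**2 + (r_full**2) * (sd_ratio**2))
    return numerator / denominator


# Hypothetical values: a true correlation of .60 and ratings whose
# spread is half of what it would be without the ceiling effect.
attenuated = restricted_correlation(r_full=0.60, sd_ratio=0.5)
print(f"Expected correlation under restriction: {attenuated:.2f}")
```

With no restriction (a ratio of 1), the function returns the original correlation; as the ratio shrinks, the expected correlation falls, which is the pattern we suspect for the oral language items.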
Teacher Satisfaction with PLLB-S
Our social validity results provided initial evidence for teacher satisfaction and the feasibility of screener administration. Twenty-seven teachers reported that the time devoted to completing the screener was reasonable, and 26 teachers indicated their willingness to continue using the screener. This result suggests that using PLLB-S could reduce the burden of administering multiple standardized tests, which require additional resources (e.g., time, personnel, and a quiet location). Additionally, because teachers can draw on their daily informal observations and professional judgment as sources for evaluation, preschoolers would be spared the undue burden of excessive testing. Initial findings from our social validity measure suggest that PLLB-S may function as a feasible tool for informing the first stage of decision-making when allocating and prioritizing efforts and resources to measure child performance.
Limitations and Future Research
This study provides initial evidence supporting the psychometric quality of PLLB-S. However, our analysis is limited by our sample characteristics. Only 49% of the first study population and 77% of the second study population reported their demographic characteristics. Although we know that participants were recruited from low- to middle-income neighborhoods, we cannot fully describe participant characteristics or how these characteristics may have influenced the study results. Future investigations could use a larger sample to replicate the current study’s results.
We identified positive and significant correlations between PLLB-S subscales and other oral language, emergent literacy, and academic and social engagement measures. However, these relations do not by themselves support conclusions about the classification accuracy of PLLB-S (Jenkins et al., 2007). Hence, future research is still needed to investigate the classification accuracy of PLLB-S as a screening tool. It would also be desirable to expand the scope of PLLB-S to include items on preschoolers’ early mathematics skills, as mathematics ability has been shown to be a strong contributor to later academic proficiency (Clements et al., 2016). Lastly, our predictive validity analysis is limited to the vocabulary learning outcomes of a subgroup of study participants. Story Friends™, a supplemental vocabulary intervention, was delivered in small groups across the classrooms and only included preschoolers who might benefit from the intensified instruction. In the future, other outcome variables such as later emergent literacy, academic, and social skills could be investigated.
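To make the distinction between correlation and classification accuracy concrete, the sketch below computes the indices commonly reported for screeners (sensitivity, specificity, and predictive values) from a 2 x 2 table that crosses the screening decision with a criterion measure. The counts are hypothetical and chosen only for illustration; no such analysis was conducted in the present study.

```python
def classification_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Screening accuracy indices from a 2 x 2 decision table.

    tp: flagged by the screener and at risk on the criterion
    fp: flagged by the screener but not at risk on the criterion
    fn: not flagged but at risk on the criterion
    tn: not flagged and not at risk on the criterion
    """
    return {
        "sensitivity": tp / (tp + fn),  # at-risk children correctly flagged
        "specificity": tn / (tn + fp),  # not-at-risk children correctly passed
        "ppv": tp / (tp + fp),          # flagged children who are truly at risk
        "npv": tn / (tn + fn),          # passed children who are truly not at risk
    }


# Hypothetical counts for 100 children (not data from this study).
indices = classification_accuracy(tp=18, fp=12, fn=2, tn=68)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

A screener can correlate well with a criterion and still misclassify many children at a given cut score, which is why these indices require a separate investigation.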
Additional studies need to investigate three sources of validity evidence described in the Standards for Educational and Psychological Testing (AERA et al., 2014) that were not investigated in the present study: test content, response processes, and consequences of testing. Future research can also investigate inter-rater reliability if child evaluations are collected simultaneously from multiple teachers. Iteratively building validity evidence across multiple studies could facilitate scaling up the implementation of MTSS in early childhood.
In conclusion, we propose a screener that could help increase the feasibility and efficiency of implementing standardized assessments. Our examination of the initial psychometric properties of the PLLB-S supported a three-factor subscale structure, high internal consistency, and high teacher satisfaction with the scale. A next step for this research is to examine how well the PLLB-S helps teachers accurately identify young children who may need follow-up diagnostic testing in the oral language, early literacy, or academic-social engagement domains.
Supplemental Material
Supplemental Material - Psychometric Properties of a Preschool Language, Literacy, and Behavior Screener
Supplemental Material for Psychometric Properties of a Preschool Language, Literacy, and Behavior Screener by Yagmur Seven, Robert F. Dedrick, Keri M. Madsen, Trina D. Spencer, Elizabeth Kelley, and Howard Goldstein in Journal of Psychoeducational Assessment
Footnotes
Acknowledgments
We would like to thank the teachers who participated in the study, as well as Dr Wendy L. Olsen, Dr Lindsey Peters-Sanders, Katharine Hull, and other USF Child Language Development Laboratory research assistants for their help with data collection. Last but not least, our special thanks to Dr Alisha Wackerle-Hollman and Dr Scott McConnell for providing detailed information about the PLLB-S item development and initial testing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R324A170073 to the University of South Florida. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Supplemental Material
Supplemental material for this article is available online.
References
