Abstract
This study investigated the classification accuracy of the Dynamic Indicators of Vocabulary Skills (DIVS) as a preschool vocabulary screening measure. With a sample of 240 preschoolers, fall and winter DIVS scores were used to predict year-end vocabulary risk using the 25th percentile on the Peabody Picture Vocabulary Test–Third Edition (PPVT-III) to denote risk status. Results indicated that DIVS Picture Naming Fluency (PNF) and Reverse Definition Fluency (RDF) demonstrated very good accuracy in classifying students according to year-end vocabulary risk status. The DIVS measures also demonstrated stronger accuracy than demographic characteristics known to be indicators of vocabulary difficulties (socioeconomic status, English learner [EL] status, and sex). Combining PNF and RDF did not result in sufficient improvement in accuracy to justify administering both measures as opposed to just one. Further examination of predictive probability values revealed the potential for DIVS measures to improve the precision of vocabulary risk identification over considering EL status alone. Overall, results supported the use of the DIVS as a brief and inexpensive tool for preschool vocabulary screening.
Vocabulary knowledge is fundamentally associated with educational success (Dickinson & Tabors, 2001; Duncan et al., 2007). Knowledge of word meanings is central to language comprehension and is associated with the successful acquisition of literacy skills (Sénéchal, Ouellette, & Rodney, 2006). As a mediator of receptive, expressive, and written communication, vocabulary is critical for enabling children to learn from instruction, interact with teachers and peers, and acquire knowledge from curricula.
Vocabulary knowledge plays one of its most important roles in facilitating reading acquisition and proficiency. Oral language skills are consistently found to be one of the strongest predictors of later reading skills (Lonigan, Schatschneider, & Westberg, 2008; Snow, Burns, & Griffin, 1998; Whitehurst & Lonigan, 1998). For young children, vocabulary size is associated with the acquisition of phonological awareness skills (Sénéchal et al., 2006), and vocabulary knowledge has been shown to facilitate decoding and word identification skills (Nation & Snowling, 1998; Ouellette, 2006). Subsequently, once words are effectively decoded, they are only understood when the reader can access their meaning. Thus, vocabulary knowledge is crucial for reading comprehension (Joshi, 2005; Nation, 2009). Studies indicate that significant comprehension impairment can occur when as few as 2% to 5% of word meanings in a passage are unknown (Carver, 1994; Hsueh-Chao & Nation, 2000; Schmitt, Jiang, & Grabe, 2011). In short, vocabulary knowledge is an essential aspect of language proficiency, reading acquisition, and general academic success.
Unfortunately, not all children enter school with similar levels of vocabulary knowledge, and several factors have been associated with vocabulary differences among preschool-aged children. Studies have repeatedly observed profound differences in vocabulary knowledge associated with relative socioeconomic status (SES), with children from lower SES households demonstrating significantly lower vocabulary knowledge (Farkas & Beron, 2004; Fish & Pinkerman, 2003; Hart & Risley, 1995; White, Graves, & Slater, 1990). English learners (ELs) often lag significantly behind monolingual children in their vocabulary knowledge (Hanson et al., 2011; Jean & Geva, 2009; Proctor, Carlo, August, & Snow, 2005). Tabors, Paez, and López (2003) found that the vocabulary knowledge of bilingual preschool-aged children (age 4) was 2 standard deviations below the mean of monolingual populations. Although early differences in language proficiency among males and females tend to disappear over time (Bornstein, Hahn, & Haynes, 2004), statistically significant differences have been observed that favor females’ verbal expression skills and vocabulary size in early childhood and preschool (Bornstein, Hahn, & Haynes, 2004; Bornstein, Leach, & Haynes, 2004; Bouchard, Trudeau, Sutton, Boudreault, & Deneault, 2009; Fenson et al., 1994).
Early Identification of Vocabulary Difficulties
Early intervention in preschool or kindergarten has demonstrated success in improving vocabulary knowledge and reducing early vocabulary gaps (e.g., Beck & McKeown, 2007; Coyne, Simmons, Kame’enui, & Stoolmiller, 2004; Gonzalez et al., 2010; Justice, Meier, & Walpole, 2005; Kelley, Goldstein, Spencer, & Sherman, 2015; Pollard-Durodola et al., 2011; Zipoli, Coyne, & McCoach, 2011). The identification of children with vocabulary deficits is a necessary first step to facilitate early intervention. Therefore, accurate assessments of vocabulary knowledge are needed to identify preschool children who may benefit from additional support.
Several standardized assessments exist for preschool populations that specifically measure vocabulary (e.g., Peabody Picture Vocabulary Test–Fourth Edition [PPVT-IV]; Dunn & Dunn, 2007) or include subtests of receptive and expressive vocabulary within larger batteries (e.g., Test of Language Development–Primary, Fourth Edition; Newcomer & Hammill, 2008). However, these measures are costly and their lengthy individual administration times (e.g., approximately 20 min per child) make them impractical for universal screening. Some studies have evaluated the use of parent surveys for evaluating children’s receptive and expressive vocabulary (see Nelson, Nygren, Walker, & Panoscha, 2006); however, the reliance on parent participation and report may be difficult in some preschool contexts.
Ideal preschool vocabulary screening assessments should be inexpensive, brief, easy for early childhood educators to administer and score, and should accurately distinguish which students are at risk for vocabulary difficulties. In addition to demonstrating the ability to predict future vocabulary achievement, screeners should also be evaluated in terms of their classification accuracy, which refers to a measure’s ability to accurately identify students who are at risk for later failure. Typically, analyses of classification accuracy are used to determine a screening measure’s accuracy in correctly discriminating students with the problem of interest (i.e., “positives”) from students without the problem (i.e., “negatives”). Inaccuracies result in false positive errors (students incorrectly identified by the screen as having the problem) and false negative errors (students truly with the problem but missed by the screen). Studies of classification accuracy have been described as the “sine qua non” of screening research (Jenkins, Hudson, & Johnson, 2007).
A few measures that are designed for early literacy screening include vocabulary components or subscales. The Individual Growth and Development Indicators (IGDIs; McConnell, 2004; Missall & McConnell, 2010) includes a picture naming task, which has demonstrated correlations with the PPVT-IV (r = .66; McConnell, Bradfield, Wackerle-Hollman, & Rodriguez, 2012). Get Ready to Read (GRTR; Whitehurst & Lonigan, 2001) is a 20-question assessment that includes items that measure vocabulary. Although studies support the relations of GRTR scores with assessments of emergent literacy skills, word reading, and reading comprehension with coefficients ranging from .46 to .72 (Phillips, Lonigan, & Wyatt, 2009; Wilson & Lonigan, 2009), relations of the vocabulary items to vocabulary outcomes have not been evaluated. The mCLASS:CIRCLE (Landry, Assel, Gunnewig, & Swank, 2007), a computer-based screening and progress monitoring tool, includes a vocabulary subtest (in a Picture Naming Fluency [PNF] format) that has been found to correlate moderately (r = .42) with the picture naming task from the IGDIs with preschool children (Gischlar & Shapiro, 2014). Technical report data indicate concurrent validity with the PPVT-IV ranging from r = .61 to .88 for kindergarten students (Center on Response to Intervention, 2014), but relations to scores on other vocabulary assessments with preschoolers have not been published. In summary, although assessments of vocabulary exist within some early literacy screening batteries, information on their validity specific to their vocabulary subscales is limited. Studies have also not evaluated the accuracy of these screening tools in classifying students specifically in relation to vocabulary outcomes.
The Dynamic Indicators of Vocabulary Skills (DIVS)
The DIVS (Parker, 2000) were designed as an inexpensive and efficient set of measures to screen and formatively measure vocabulary with young children. The DIVS were selected for this study because they elicited a vocabulary knowledge estimate distinct from other language and literacy assessments. The DIVS consist of two subtests. PNF is used to assess children’s expressive vocabulary knowledge by measuring the rate in naming a series of pictures representing common nouns. Reverse Definition Fluency (RDF) is used to assess children’s receptive and expressive language skills by measuring children’s ability to provide words that match brief definitions provided by the examiner.
Earlier research supports the reliability and technical adequacy of the DIVS for identifying preschool and kindergarten children with low vocabulary skills and for evaluating the effectiveness of vocabulary interventions (Marcotte, Parker, Furey, & Hands, 2013; Parker, 2006; Parker & Ditkowsky, 2006). Marcotte et al. (2013) found strong concurrent validity (r = .70) between the DIVS subtests and the Peabody Picture Vocabulary Test–Third Edition (PPVT-III). In addition, fall performance on DIVS PNF was strongly predictive of spring PPVT-III performance (r = .74) and moderately predictive of spring phonological awareness skills (r = .45–.55).
The concurrent and predictive validity evidence for the DIVS is encouraging and suggests that they measure skills that contribute to language and literacy achievement. However, several questions remain regarding the utility of the DIVS for preschool vocabulary screening. To date, no published studies have investigated the classification accuracy of the DIVS measures. In addition, it is not clear whether using the DIVS improves the identification of children at risk for vocabulary difficulties beyond the knowledge of demographic characteristics known to be associated with vocabulary difficulties, such as sex, SES, and EL status. It is difficult to justify the resources necessary for universal screening if the screener does not result in improved accuracy beyond knowledge of known risk factors. Therefore, to establish its utility in preschool screening contexts, the DIVS should demonstrate better accuracy in identifying vocabulary risk over demographic risk factors.
Study Purpose
The purpose of this study was to investigate the classification accuracy of the DIVS when assessing preschool children’s risk for later vocabulary deficits, and examine the degree to which the DIVS improved the identification of students with vocabulary difficulties beyond demographic variables known to be associated with vocabulary deficits. In addition, this study investigated how the use of DIVS scores in conjunction with EL status might improve accuracy in identifying students at risk for vocabulary difficulties.
Method
Participants and Setting
The participants in this study included 4-year-old children enrolled in a publicly funded preschool in Massachusetts. From an original sample of 279 children who were present for at least one assessment period during the preschool year, 240 (86%) of the students had complete data on the measures and assessment points for use in the present analyses. 1 Chi-square analyses were conducted to assess whether the sex, SES, or language status of the students with missing data varied from the students with complete data. No differences were found for these independent variables. The final sample consisted of 49.2% females (n = 118) and 35.8% (n = 86) EL students. The majority of the students, 69.6% (n = 167), qualified for free or reduced-price lunch. Children who were receiving special education services were not included in the sample.
Preschool setting and instruction
The preschool was located in a large urban neighborhood. The percentage of individuals living below poverty level in the area was 23%, more than double the percentage across Massachusetts (11%). The preschool provided full-day, full-year services funded through Early Reading First grants. The Open Court Reading Pre-Kindergarten curriculum (McGraw-Hill, 2003) was used to address oral language, phonological awareness, print awareness, and alphabet knowledge. The preschool made use of early language and literacy screening data to evaluate student progress and their program outcomes. These data were used in the present study.
Measures
DIVS
The DIVS were designed to assess levels of receptive and productive vocabulary acquisition in preschoolers and kindergarteners. Consistent with the curriculum-based measurement (CBM) framework (Deno, 1985), the DIVS were designed to be brief, inexpensive measures and thus suitable for screening large populations of children and useful for frequent and repeated administration (Deno, 2003). The DIVS comprise two subtests, PNF and RDF.
PNF
Each PNF probe consists of 44 colored pictures randomly ordered on a page, with four pictures per line. Pictures are colored line drawings that represent common nouns that were found on at least five occasions across a sample of children’s books or kindergarten or first-grade basal readers (see Parker, 2006). Students are instructed to begin at the top of the page and name pictures from left to right. If the student hesitates for 5 s, the examiner provides the name of the picture and marks the item as an error. In preschool, PNF has demonstrated test–retest reliability ranging from .80 to .87 (Marcotte et al., 2013), and concurrent validity ranging from .64 to .77 with DIVS RDF, the PPVT-III, and the Auditory Comprehension and Expressive Communication subscales of the Preschool Language Scale (Parker, 2006; Parker & Ditkowsky, 2006; Zimmerman, Steiner, & Pond, 2002).
RDF
RDF contains 30 definitions of words commonly found in children’s literature using similar procedures to those used in the development of PNF. Students are orally presented with a brief definition and asked to name the word that was described. Following two practice examples, the examiner provides the first word definition and begins a stopwatch. When the student provides a response the stopwatch is paused, timing only the latency of response. The timer is started again after the next definition is provided. The examiner stops presenting items after 1 min has elapsed on the stopwatch, and the measure is scored in terms of the number of words named correctly. RDF has demonstrated alternate form reliability of .86 for preschoolers and .76 for kindergarteners (Parker, 2006), and concurrent validity with the PPVT-III, the Preschool Language Scale–Auditory Comprehension subtest, and the Preschool Language Scale–Expressive Communication subtest with coefficients ranging from .70 to .83 (Parker, 2000, 2006).
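The RDF timing rule described above (only response latency counts toward the 1-min limit) can be sketched in a few lines. This is a minimal illustration, not the published scoring protocol: the function name `score_rdf`, the input format, and the handling of a response that arrives after the limit is reached are all assumptions made for the example.

```python
from typing import Iterable, Tuple


def score_rdf(responses: Iterable[Tuple[float, bool]], limit_s: float = 60.0) -> int:
    """Count correct responses given before cumulative latency passes limit_s.

    Each response is (latency in seconds, whether the named word was correct).
    Only latency is accumulated, mirroring a stopwatch that is paused while
    the examiner reads the next definition.
    """
    elapsed = 0.0
    correct = 0
    for latency, is_correct in responses:
        if elapsed >= limit_s:
            # The stopwatch has reached the limit: no further items are presented.
            break
        elapsed += latency
        if elapsed > limit_s:
            # Assumption: a response that arrives after the limit is not scored.
            break
        if is_correct:
            correct += 1
    return correct


# Hypothetical session: latencies of 10, 20, 25, and 10 s.
demo_score = score_rdf([(10, True), (20, False), (25, True), (10, True)])
print(demo_score)
```

In the hypothetical session, the fourth response would bring cumulative latency to 65 s, so under the stated assumption it is not counted.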
PPVT-III
The PPVT-III is a measure of receptive vocabulary in which students are shown four pictures on a page and asked to select the picture corresponding to a word presented by the examiner. The present analyses utilized data from the spring administration of the PPVT-III, which was administered using Form B. Age-based standard scores were used in all analyses. With preschool children, the PPVT-III demonstrates a coefficient alpha of .95, split-half reliability of .92 to .94, and alternate form reliability of .94 to .95 (Dunn & Dunn, 1997). According to its technical manual, the PPVT-III was moderately correlated with the Oral and Written Language Scales (r = .67–.77) and strongly correlated with the verbal scale of the Wechsler Intelligence Scale for Children–Third Edition (r = .82–.92).
Procedures
Data were drawn from an existing data set gathered as part of the standard school-wide tri-annual assessment procedures. Data from two cohorts were aggregated over 2 consecutive academic years. With each cohort, fall testing occurred from mid-September through mid-October, winter testing occurred from mid-January through mid-February, and spring testing occurred in May. Data for the present analyses included the PNF administered in the fall and winter, RDF administered in the winter, and the PPVT-III administered in the spring. Total time for the DIVS test sessions was at most 5 min per student. The PPVT-III testing sessions lasted approximately 20 min for each student. Each test was administered to one student by one test administrator. Each test session took place in a quiet area of the school library. All data were collected and managed by a private data management firm contracted by the school district that specialized in the administration, scoring, and interpretation of academic assessment data. All test administrators were highly trained in administering each test. Each had received a direct instruction training session, including practice time, where they learned how to administer the DIVS and the PPVT. Each administrator had accrued an average of 350 to 500 hr of time gathering early literacy data with children across a variety of school districts. Procedures for data entry included a verification of 20% of the data that were entered into the database. The resultant data set represented the official screening data gathered by the school district for the 2 academic years.
Free and reduced-price lunch status was used to estimate SES. EL status was determined at preschool registration according to the public school registration policies. These data were provided by the school district and were dichotomously coded as 0 or 1 (males, students qualifying for free or reduced-price lunch, and ELs were coded as 0).
Data Analyses
Spring scores on the PPVT-III served as the criterion measure of vocabulary knowledge. We used the 25th percentile on the year-end PPVT-III to indicate vocabulary risk status (classification accuracy analyses require a dichotomously coded criterion variable). The 25th percentile has been used to indicate low achievement status across studies (e.g., Catts, Compton, Tomblin, & Bridges, 2012; Fletcher et al., 1994; Francis et al., 2005). Classification accuracy of the DIVS and demographic variables was investigated using receiver operating characteristic (ROC) curves. ROC curves provide an area under the curve (AUC) statistic, which is an index of the overall accuracy of a screening variable. AUC can range from .50 (representing chance accuracy) to 1.00 (perfect accuracy). According to Swets (1988), AUC below .70 reflects little utility for a screening tool, measures with AUC between .70 and .90 are useful for some purposes, and above .90 is ideal. ROC curves were also used to derive cut scores on the DIVS variables, which permitted examining the resulting rates of correct and incorrect classifications of students and classification indices that included sensitivity (i.e., rate of true positives; accuracy in correctly classifying students who will later fail the criterion assessment), specificity (i.e., rate of true negatives; accuracy in correctly classifying students who will pass the criterion assessment), positive predictive power (i.e., the percentage of students deemed at risk on the screen who fail the criterion assessment), and negative predictive power (i.e., the percentage of students deemed not at risk by the screen who indeed pass the criterion assessment). Cut scores were not relevant for the demographic variables because they were already dichotomous variables; therefore, we simply examined the classification accuracy that resulted when considering males, economically disadvantaged, or EL status as the “at-risk” groups, respectively.
ROC curves were generated with SPSS (Version 22) using the nonparametric distribution assumption.
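The classification indices and the AUC defined above can be illustrated with a short sketch. It is not the SPSS procedure used in the study: the AUC here uses the rank-based (Mann-Whitney) formulation, which equals the trapezoidal area under an empirical ROC curve, and it assumes that lower screener scores signal risk. The function names, the confusion counts, and the toy scores are hypothetical; only the totals of 64 at-risk students among 240 are taken from the sample description.

```python
from typing import Dict, Sequence


def classification_indices(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return sensitivity, specificity, PPP, and NPP from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # at-risk students correctly flagged
        "specificity": tn / (tn + fp),  # not-at-risk students correctly passed
        "ppp": tp / (tp + fp),          # flagged students who later fail the criterion
        "npp": tn / (tn + fn),          # unflagged students who later pass the criterion
    }


def auc_nonparametric(scores: Sequence[float], at_risk: Sequence[bool]) -> float:
    """Share of (at-risk, not-at-risk) pairs ranked correctly; ties count half.

    Assumes that lower screener scores indicate greater risk.
    """
    risk = [s for s, r in zip(scores, at_risk) if r]
    safe = [s for s, r in zip(scores, at_risk) if not r]
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in risk for b in safe)
    return wins / (len(risk) * len(safe))


# Hypothetical split of 240 students (64 at risk); not the counts in Table 3.
idx = classification_indices(tp=55, fp=40, tn=136, fn=9)

# Hypothetical screener scores for eight children.
toy_scores = [3, 5, 8, 12, 15, 18, 22, 25]
toy_risk = [True, True, False, True, False, False, False, False]
toy_auc = auc_nonparametric(toy_scores, toy_risk)

print(idx)
print(toy_auc)
```

Because positive and negative predictive power depend on the base rate as well as on the screener, the same sensitivity and specificity would yield different PPP and NPP values in a sample with a different proportion of at-risk students.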
For continuous variables such as the DIVS, sensitivity and specificity will vary based on the cut score used. To permit fair comparisons of classification accuracy across measures, cut scores should be selected that hold either sensitivity or specificity constant. In our analyses of each of the DIVS variables, we reported classification accuracy results using two different cut scores that held sensitivity constant at different levels. The first cut score was selected because it was associated with sensitivity of approximately .86, which was closest to the most sensitive of the demographic variables (SES), and was also associated with cut scores on each of the DIVS variables. This enabled us to fairly compare classification accuracy of the DIVS measures with the demographic variables. Because false negatives represent arguably the most egregious type of screening error in educational settings (i.e., failing to identify a student who is truly at risk), Jenkins et al. (2007) recommended sensitivity of at least .90 to keep false negative screening errors below 10%. We therefore selected a second set of cut scores for each of the DIVS variables that placed sensitivity at approximately .97, which might reflect a situation in which a school wished to prioritize sensitivity (i.e., to minimize false negative errors). Holding sensitivity constant permitted comparison of the resulting rates of specificity. Higher levels of specificity are better (as they represent fewer false positive errors); however, there is less agreement on recommended levels of specificity which have ranged from .50 or greater (Catts, Petscher, Schatschneider, Bridges, & Mendoza, 2009) to above .80 (Compton, Fuchs, Fuchs, & Bryant, 2006).
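One way to picture how a cut score is chosen to hold sensitivity at a target level is the sketch below. It scans candidate cut scores from low to high and returns the first one whose sensitivity meets the target, together with the specificity that results. The function name and toy data are hypothetical, and the rule that students scoring below the cut are flagged follows the note to Table 3; the study's own cut scores came from the SPSS ROC output rather than from this routine.

```python
from typing import Sequence, Tuple


def cut_score_for_sensitivity(
    scores: Sequence[int], at_risk: Sequence[bool], target: float
) -> Tuple[int, float, float]:
    """Return (cut, sensitivity, specificity) for the lowest cut meeting target.

    A student is flagged as at risk when the score is strictly below the cut.
    """
    n_risk = sum(at_risk)
    n_safe = len(at_risk) - n_risk
    for cut in range(min(scores), max(scores) + 2):
        tp = sum(1 for s, r in zip(scores, at_risk) if r and s < cut)
        fp = sum(1 for s, r in zip(scores, at_risk) if not r and s < cut)
        sensitivity = tp / n_risk
        if sensitivity >= target:
            return cut, sensitivity, 1 - fp / n_safe
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical fall scores for ten children, four of whom are truly at risk.
toy_scores = [2, 4, 5, 7, 9, 10, 12, 14, 16, 20]
toy_risk = [True, True, True, False, True, False, False, False, False, False]

moderate = cut_score_for_sensitivity(toy_scores, toy_risk, target=0.75)
strict = cut_score_for_sensitivity(toy_scores, toy_risk, target=1.0)
print(moderate)
print(strict)
```

In the toy data, raising the sensitivity target from .75 to 1.0 moves the cut from 6 to 10 and lowers specificity, which is the trade-off that holding sensitivity constant is meant to expose when measures are compared.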
We used logistic regression analyses to investigate whether EL status and the DIVS variables were unique and statistically significant predictors of year-end PPVT-III status, and if multiple variables would improve accuracy in predicting subsequent vocabulary status over single variables. Logistic regression uses a categorical dependent variable and supports either continuous or categorical predictors. As in the classification accuracy analyses, dichotomously coded scores above or below the 25th percentile on the PPVT-III were used as the dependent variable. We limited the predictors to those that demonstrated AUC of at least .70 in the previous analyses, which is considered a minimum level of accuracy acceptable for decision-making purposes (Swets, 1988); therefore, sex and SES were not included in the logistic regression analyses based on their poor overall accuracy. Separate models evaluated the DIVS predictors available in the fall and winter, and predictors were entered simultaneously in each model. 2
Based on results that PNF and RDF were unique predictors of year-end vocabulary risk when accounting for the effects of each, we investigated whether the combination of PNF and RDF resulted in superior classification accuracy over using either measure alone. Consistent with prior studies that examined improved accuracy of multivariate screening models (e.g., Clemens, Shapiro, & Thoemmes, 2011; Johnson, Jenkins, Petscher, & Catts, 2009; Speece et al., 2011), we saved the predicted probability values from a separate logistic regression model that used PNF and RDF as predictors. Predicted probability values were then used in a ROC curve analysis to examine whether the multivariate combination demonstrated improvements in classification accuracy over using single DIVS measures. Finally, we used predicted probabilities to determine the degree to which using the DIVS variables in addition to EL status improved the ability to predict year-end vocabulary risk status.
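The two-step procedure described above (save predicted probabilities from a logistic model, then treat them as a single screening score in a ROC analysis) can be sketched as follows. The coefficients are hypothetical placeholders rather than the fitted values from the study, the six records are invented, and the helper names are assumptions made for the example. The outcome is coded here as passing the criterion, so higher values should accompany passing.

```python
import math
from typing import Sequence

# Hypothetical coefficients; the study's fitted values are not reproduced here.
INTERCEPT, B_PNF, B_RDF = -4.0, 0.15, 0.20


def predicted_probability(pnf: float, rdf: float) -> float:
    """Probability of passing the criterion under the assumed logistic model."""
    logit = INTERCEPT + B_PNF * pnf + B_RDF * rdf
    return 1.0 / (1.0 + math.exp(-logit))


def auc(values: Sequence[float], passed: Sequence[bool]) -> float:
    """Share of (pass, fail) pairs in which the passer has the higher value."""
    high = [v for v, p in zip(values, passed) if p]
    low = [v for v, p in zip(values, passed) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in high for b in low)
    return wins / (len(high) * len(low))


# Hypothetical (PNF, RDF, passed criterion) records for six children.
children = [
    (5, 3, False), (15, 2, False), (8, 4, False),
    (14, 9, True), (20, 10, True), (25, 15, True),
]
passed = [p for _, _, p in children]
combined = [predicted_probability(pnf, rdf) for pnf, rdf, _ in children]

auc_combined = auc(combined, passed)
auc_pnf_only = auc([pnf for pnf, _, _ in children], passed)
print(auc_combined, auc_pnf_only)
```

With only six invented records the numbers carry no evidential weight; the point is simply that the combined model is reduced to one score per child, which makes its AUC directly comparable with the AUC of each single measure.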
Results
Intercorrelations among the variables are reported in Table 1. Sex was not associated with vocabulary knowledge measured by either the DIVS measures or the PPVT-III. SES and EL status demonstrated statistically significant correlations with the vocabulary variables, indicating that eligibility for free or reduced-price lunch and EL status were associated with lower vocabulary scores. Strong positive correlations were observed among all of the vocabulary assessments; intercorrelations among DIVS measures ranged from .77 to .87, and fall and winter DIVS scores were positively correlated with spring PPVT with coefficients ranging from .71 to .75.
Table 1. Intercorrelations Among Study Variables.
Note. Demographic variables were dummy coded such that males, low SES, and ELs were coded as 0. SES = socioeconomic status; EL = English learner; PNF = Picture Naming Fluency; RDF = Reverse Definition Fluency; PPVT = Peabody Picture Vocabulary Test.
*p < .05. **p < .01.
Descriptive data for the DIVS variables for the full sample and disaggregated by subgroups are reported in Table 2. Skewness and kurtosis were within acceptable limits of normality (±3 for skewness and ±8 for kurtosis) according to Kline (2011); specifically, skewness did not exceed 1.7, and kurtosis did not exceed 4.81 for the full sample or the subgroups. The data in the present sample resembled the distribution of the normative sample of preschoolers of the PPVT-III. The base rate for students in this sample who scored below the 25th percentile on spring PPVT-III was approximately 27% (n = 64). The students in this sample also performed similarly on fall PNF to preschoolers in the DIVS normative sample, with a median score of 17 which falls at the 45th percentile compared with the DIVS norms. However, the students represented in this study were slightly less proficient than the normative sample of the DIVS for their winter scores, with the PNF and RDF median scores at approximately the 40th percentile of the normative sample.
Table 2. Descriptive Statistics for Vocabulary Assessments for the Full Sample and Student Subgroups.
Note. Low SES based on family eligibility for free or reduced-price lunch. Raw scores reported for PNF and RDF data, standard scores reported for PPVT. PNF = Picture Naming Fluency; RDF = Reverse Definition Fluency; PPVT = Peabody Picture Vocabulary Test; Skew = Skewness; Kurt = Kurtosis; SES = socioeconomic status; EL = English learner.
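The distribution checks and the base rate reported above involve simple arithmetic that can be sketched directly. The example uses plain moment-based estimators of skewness and excess kurtosis, which may differ slightly from the bias-adjusted statistics reported by statistical packages, and it assumes the kurtosis values in the text are excess kurtosis. The function names and the five-value toy sample are hypothetical; the limits (±3 and ±8), the maxima (1.7 and 4.81), and the counts (64 of 240) are taken from the text.

```python
from statistics import fmean
from typing import Sequence


def skewness(xs: Sequence[float]) -> float:
    """Moment-based skewness (third central moment over variance^1.5)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


def within_limits(skew: float, kurt: float) -> bool:
    """Apply the limits cited in the text: |skew| <= 3 and |kurtosis| <= 8."""
    return abs(skew) <= 3 and abs(kurt) <= 8


toy = [1, 2, 3, 4, 5]  # hypothetical symmetric sample
toy_skew = skewness(toy)
toy_kurt = excess_kurtosis(toy)

largest_reported_ok = within_limits(1.7, 4.81)  # maxima reported in the text
base_rate = 64 / 240                            # students below the 25th percentile
print(toy_skew, toy_kurt, largest_reported_ok, round(base_rate, 3))
```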
We examined the classification accuracy of each demographic variable and DIVS assessment in predicting vocabulary risk status at the end of the school year. Results are reported in Table 3. Sex was ineffective for predicting year-end vocabulary risk, with sensitivity and specificity equivalent to chance. Although SES demonstrated stronger sensitivity compared with the other demographic indices, specificity was very low (i.e., very high number of false positives) and overall accuracy fell below the minimum acceptable standard of .70 (Swets, 1988). EL status demonstrated better overall accuracy (as indicated by the higher AUC) and a better balance of sensitivity and specificity than SES status.
Table 3. Classification Accuracy of the Demographic Variables and Fall and Winter DIVS Variables in Predicting Year-End PPVT Risk Status (Below 25th Percentile).
Note. Number of students below the 25th percentile on spring PPVT-III = 64 (base rate of approximately 27%). At-risk groups on the demographic variables were represented by males, students eligible for free/reduced-price lunch, and English learners. Cut score = students below this score are deemed “at risk.” DIVS = Dynamic Indicators of Vocabulary Skills; AUC = area under the curve; CI = confidence interval; PPP = positive predictive power; NPP = negative predictive power; TP = true positive; FP = false positive; TN = true negative; FN = false negative; SES = socioeconomic status; EL = English learner; PNF = Picture Naming Fluency; RDF = Reverse Definition Fluency; PPVT-III = Peabody Picture Vocabulary Test–Third Edition.
Fall PNF demonstrated stronger accuracy than the demographic variables and, in fact, demonstrated overall accuracy that would be considered very good for preliminary decision making (Swets, 1988). AUC confidence intervals (CIs) for fall PNF did not overlap with those for the demographic variables, indicating greater overall accuracy.
Results of the winter administrations of PNF and RDF predicting year-end PPVT status are also displayed in Table 3. Winter PNF and RDF demonstrated similarly strong accuracy, although RDF demonstrated slightly stronger specificity (and thus fewer false positives). Interestingly, although winter PNF and RDF demonstrated strong classification accuracy, slightly stronger accuracy was observed for fall PNF. Fall PNF resulted in fewer false positives compared with winter PNF and RDF.
Next, we used logistic regression to determine whether EL status and DIVS variables were unique predictors of spring PPVT status when accounting for each other. Results of the fall and winter logistic regression models are reported in Table 4. In both models, DIVS variables were significant predictors of vocabulary outcomes, and EL status was not. Both PNF and RDF were significant predictors of spring PPVT status in the winter model, indicating that both DIVS variables accounted for unique variance in the prediction of subsequent vocabulary outcomes.
Table 4. Logistic Regression Analyses Predicting Year-End PPVT Risk Status (Below 25th Percentile).
Note. The Wald statistic is an index of the significance of each predictor in the model. PPVT = Peabody Picture Vocabulary Test; EL = English learner; PNF = Picture Naming Fluency; RDF = Reverse Definition Fluency.
Because winter PNF and RDF were both significant and unique predictors of year-end vocabulary risk, we investigated the degree to which PNF and RDF used simultaneously resulted in improved accuracy when compared with predictors used alone. To begin, a logistic regression model was analyzed that included only winter PNF and RDF, and the predicted probabilities for each student passing the spring PPVT outcome criterion were saved from this model. The predicted probabilities were then used in a ROC curve to predict spring PPVT-III status. As reported in Table 3, the combination of PNF and RDF demonstrated slightly stronger overall accuracy (AUC = .927, 95% CI = [.891, .963]) than winter PNF or RDF, but resulted in only minor improvements in reducing false positives compared with PNF or RDF alone. Also of note is the high degree of overlap between the AUC CIs for PNF and RDF when used individually or in tandem, indicating a high level of similarity in overall accuracy regardless of whether the DIVS measures are used alone or in combination. Thus, results suggest little practical benefit for using both PNF and RDF at the winter assessment, as acceptable accuracy was achieved with single measures. In addition, none of the winter variables, either alone or in combination, demonstrated superior accuracy over fall PNF. The ROC curves summarizing the overall classification accuracy of the DIVS variables are displayed in Figure 1.

Figure 1. Receiver operating characteristic curves for DIVS measures assessed in the fall and winter predicting year-end PPVT status.
Finally, we were interested in how the use of DIVS scores in conjunction with EL status might improve the precision of vocabulary risk identification beyond the knowledge of EL status. For this analysis, we first used predicted probability values from logistic regression analyses to determine the probability of scoring above the 25th percentile on the spring PPVT based on EL status. As reported in Table 5, EL status was associated with .44 probability of scoring above the 25th percentile on the spring PPVT-III, whereas non-EL status was associated with .90 probability of scoring above the 25th percentile. Next, for ELs and non-ELs, we examined the probability of successful vocabulary outcomes based on scores on fall PNF or winter RDF (we used winter RDF in this case given its stronger accuracy than winter PNF). Rather than use a single cut score, we examined predicted probabilities associated with a range of cut scores on the DIVS variables.
Table 5. Predicted Probability of Scoring Above the 25th Percentile on Spring PPVT Given EL Status and DIVS Score With EL Status.
Note. PPVT = Peabody Picture Vocabulary Test; EL = English learner; DIVS = Dynamic Indicators of Vocabulary Skills; PNF = Picture Naming Fluency; RDF = Reverse Definition Fluency.
As reported in the two right-hand columns in Table 5, considering DIVS scores in conjunction with EL status enhanced the precision in predicting subsequent vocabulary knowledge over considering EL status alone. For example, whereas EL status alone was associated with a .44 probability of scoring above the 25th percentile at year-end, EL status and a fall PNF score of 4 were associated with a much lower probability of <.10. In contrast, EL status combined with a score above 26 on fall PNF was associated with a probability >.97 of scoring above the 25th percentile on the spring PPVT-III. The effects of considering PNF scores for non-ELs are also apparent; non-EL status was associated with a high likelihood of scoring above the 25th percentile on the spring PPVT-III overall when only EL status was considered. However, successful vocabulary outcomes were much less likely for non-EL students with lower PNF scores. These results further underscore the potential benefit of using the DIVS as a screening assessment of vocabulary knowledge in identifying preschool children at risk for vocabulary difficulties.
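The two probabilities reported for EL status alone can be re-expressed on the log-odds scale used by logistic regression. The sketch below is a worked illustration only: it back-calculates the intercept and EL coefficient that a one-predictor model would need in order to reproduce the rounded values of .90 and .44. These implied coefficients are not estimates reported in the study, and the variable names are hypothetical.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Rounded probabilities of scoring above the 25th percentile, as reported in the text.
p_non_el = 0.90
p_el = 0.44

b0 = logit(p_non_el)        # implied intercept: log-odds for non-ELs (EL = 0)
b_el = logit(p_el) - b0     # implied change in log-odds for ELs (EL = 1)
odds_ratio = math.exp(b_el) # odds of passing for ELs relative to non-ELs

print(round(b0, 3), round(b_el, 3), round(odds_ratio, 3))
```

Adding a DIVS score to such a model shifts the log-odds up or down from these group baselines, which is why the same EL status can correspond to very different predicted probabilities once the score is known.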
Discussion
Vocabulary knowledge plays a critical role in the developmental processes involved in listening comprehension (Sénéchal et al., 2006), learning to read (Nation & Snowling, 1998; Ouellette, 2006; Sénéchal et al., 2006; Vadasy & Sanders, 2010), and in later reading and writing proficiency (Joshi, 2005; Shanahan, 2006). Early deficits in vocabulary can account for significant achievement gaps observed several years in the future (Catts et al., 2012; Sénéchal et al., 2006). Therefore, much attention has been focused on early language intervention (Hancock, Kaiser, & Delaney, 2002; Pullen, Tuckwiller, Konold, Maynard, & Coyne, 2010).
Preschool vocabulary screening offers the potential to identify students who are at risk for later vocabulary difficulties and who may benefit from early intervention. However, there is a lack of research on brief and inexpensive tools to do so. This study investigated the accuracy of the DIVS subtests in classifying students according to year-end vocabulary risk status, and the degree to which the DIVS improved classification accuracy over demographic variables known to be associated with vocabulary difficulties. Using the 25th percentile on the PPVT-III to indicate year-end vocabulary risk status, the overall classification accuracy (as indicated by AUC statistics) for sex and SES fell below minimum acceptable standards (Swets, 1988). EL status was a more accurate overall predictor than sex and SES; however, sensitivity fell below the minimum recommended level of .90 (Jenkins et al., 2007), indicating that using EL status would likely result in failing to identify a number of children at risk for poor vocabulary outcomes.
In comparison, the 1-min DIVS PNF screener was very accurate in identifying children with later vocabulary deficits. The CI for AUC for fall PNF did not overlap with the areas for the demographic variables, and at the more lenient cut score of 14 pictures named correctly, PNF demonstrated greater sensitivity, positive predictive power, and negative predictive power than the predictions made using any demographic variable.
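For readers less familiar with these indices, the following sketch shows how sensitivity, specificity, positive predictive power, and negative predictive power are obtained from a single cut score. The scores and risk labels are hypothetical toy values, and the decision rule (flagging scores strictly below the cut) is an assumption made for illustration rather than the rule used in the study.

```python
def classification_metrics(scores, at_risk, cut):
    """Cross-tabulate screening decisions against later risk status.

    A student is flagged when the score falls below `cut` (assumed rule).
    `at_risk` holds 1 for students later found at risk and 0 otherwise.
    """
    tp = fp = tn = fn = 0
    for score, risk in zip(scores, at_risk):
        flagged = score < cut
        if flagged and risk:
            tp += 1
        elif flagged and not risk:
            fp += 1
        elif not flagged and risk:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # at-risk students who were flagged
        "specificity": ratio(tn, tn + fp),  # not-at-risk students who were not flagged
        "ppv": ratio(tp, tp + fp),          # flagged students who were truly at risk
        "npv": ratio(tn, tn + fn),          # unflagged students who were truly not at risk
    }


# Hypothetical fall scores and year-end risk status for ten students.
toy_scores = [3, 8, 11, 13, 15, 18, 20, 24, 27, 30]
toy_at_risk = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(toy_scores, toy_at_risk, cut=14)
print(metrics)
```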
The DIVS measures administered in the winter also exhibited strong classification accuracy (although, interestingly, slightly stronger accuracy was observed for fall PNF). The winter administration of RDF, which relies on more complicated language compared with PNF, demonstrated slightly stronger specificity than winter PNF when cut scores held sensitivity constant between the two measures. Although decisions made using each measure were similarly accurate regarding true positives and false negatives, the slightly stronger specificity of RDF resulted in fewer false positives.
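The comparison of specificity with sensitivity held constant can be sketched as follows. For each measure, the code searches for the lowest cut score that reaches a target sensitivity and then reports the specificity at that cut. The two score lists are hypothetical toy data, the target of .90 follows the minimum level cited earlier, and the rule of flagging scores strictly below the cut is an assumption.

```python
def sens_spec(scores, at_risk, cut):
    """Sensitivity and specificity when scores below `cut` are flagged."""
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if r and s < cut)
    fn = sum(1 for s, r in pairs if r and s >= cut)
    tn = sum(1 for s, r in pairs if not r and s >= cut)
    fp = sum(1 for s, r in pairs if not r and s < cut)
    return tp / (tp + fn), tn / (tn + fp)


def lowest_cut_for_sensitivity(scores, at_risk, target):
    """Return (cut, sensitivity, specificity) for the lowest cut meeting `target`."""
    # The final candidate flags every student, so sensitivity reaches 1.0 there.
    candidates = sorted(set(scores)) + [max(scores) + 1]
    for cut in candidates:
        sens, spec = sens_spec(scores, at_risk, cut)
        if sens >= target:
            return cut, sens, spec
    raise ValueError("target sensitivity must not exceed 1.0")


# Hypothetical winter scores on two measures for the same ten students.
toy_at_risk = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pnf = [2, 4, 5, 9, 6, 7, 10, 12, 14, 15]
toy_rdf = [1, 2, 3, 5, 4, 8, 9, 10, 11, 12]

pnf_result = lowest_cut_for_sensitivity(toy_pnf, toy_at_risk, target=0.90)
rdf_result = lowest_cut_for_sensitivity(toy_rdf, toy_at_risk, target=0.90)
print(pnf_result, rdf_result)
```

In this toy example both measures flag every at-risk student at the chosen cut, so they differ only in how many students who are not at risk are flagged, which is the sense in which higher specificity translates into fewer false positives.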
Logistic regression analyses indicated that the DIVS variables accounted for significant variance in predicting year-end vocabulary status, whereas EL status was not a significant predictor in these models. Although winter PNF and RDF scores were uniquely predictive of vocabulary outcomes, combining measures resulted in only minor practical improvements in classification accuracy and offered little justification of the additional time needed for administering a second measure.
Collectively, these results suggest that PNF is an effective screener of vocabulary upon preschool entry in the fall and RDF an effective screening tool in the winter of preschool. The ability to recognize and name common nouns as tested by PNF at preschool entry may be an efficient index of general language proficiency. By the winter of preschool, as receptive and expressive language has developed further, the ability to identify words by their definitions may reach a sufficient level to be assessed by a task like RDF. The results of this study suggest that this more complex task may be a sufficient measure for vocabulary screening at this developmental time point.
The results also indicated that the DIVS measures could provide data that enhance early childhood educators’ ability to efficiently identify vocabulary risk status. Our analyses demonstrated that although EL status alone was associated with a predicted probability of .44 for scoring above the 25th percentile on the spring PPVT-III, the combination of EL status plus a low score on fall PNF or winter RDF was associated with a very low likelihood of exceeding the 25th percentile on the spring PPVT-III. In contrast, ELs with higher scores on fall PNF or winter RDF were highly likely to score above the 25th percentile on the year-end PPVT-III. Thus, scores on the DIVS measures were shown to potentially add to the precision of determining vocabulary risk beyond simply considering EL status. These results suggest that the presence of multiple risk factors should be considered when interpreting screening data and allocating intervention resources. In addition, these data suggest that although EL status is certainly a risk factor for vocabulary difficulties, DIVS data may better indicate the students for whom supplementary vocabulary intervention should be considered, or when additional individual assessment data may be needed.
Implications for Practice
This study provides evidence supporting the use of the DIVS for screening the early vocabulary knowledge of preschool children. Previous research reported a strong concurrent and predictive relationship between performance on the DIVS and the PPVT-III (Marcotte et al., 2013). This study provides evidence that the use of the DIVS as a screener in the fall and winter of preschool can provide data about students’ risk for vocabulary difficulties beyond that indicated by demographic factors known to be associated with vocabulary deficits. These results suggest that preschool entry is an important developmental moment for assessing vocabulary acquisition, and the brief, efficient nature of the DIVS measures makes them a potentially useful tool for early childhood educators.
It is well documented that delayed language impedes the process of reading acquisition. Fortunately, instruction that is designed to target vocabulary acquisition has shown promise for increasing vocabulary knowledge in preschool children (e.g., Gonzalez et al., 2010; Kelley et al., 2015; Pollard-Durodola et al., 2011). Screening procedures are a first step in guiding preventive intervention practices to mitigate risks of long-term language and literacy difficulties. For example, researchers have identified a subgroup of students with “late emerging” reading disabilities who, despite seemingly adequate basic reading skills in early grades, experience reading difficulties by middle elementary school particularly in reading comprehension (Catts et al., 2012; Compton, Fuchs, Fuchs, Elleman, & Gilbert, 2008). Catts et al. (2012) attributed the comprehension problems of late emerging poor readers to a history of difficulties in vocabulary or other linguistic skills that were not detected earlier, but negatively affected reading comprehension as text became more complex and demanding in later grades. Thus, the increased use of vocabulary screening in early childhood may assist in the early identification of children who may be at risk for later reading comprehension difficulties.
Despite their potential as a preschool vocabulary screening tool, it must be acknowledged that the DIVS were designed to provide an indication of general vocabulary knowledge. Because they are fluency-based measures, the breadth of general word knowledge that is assessed is dependent upon students’ response rate. A benefit of the testing format of the DIVS is that students must express the correct answers, rather than simply select a picture or correct answer that is presented to them. For example, in the PNF task, students point to an array of pictures and name as many pictures as they can. For the RDF task, students are presented with a description for which they must provide the correct word, such as “What is a part of your body used to see?” Thus, the RDF task demands use of both receptive language skills to comprehend the question and expressive language skills to provide an appropriate response. However, the DIVS do not test the complex nature of word knowledge (Nagy & Scott, 2000) or specific words that were directly taught. Like other screening measures, the DIVS were designed to provide an overall indication of achievement and risk status, as opposed to specific diagnostic information. Therefore, decisions regarding instructional programming and interventions for individual students should be based on more detailed assessment data.
Another important finding of this study was that inaccurate decisions were possible when common cut scores were used across student subgroups. Results of the predicted probability analyses revealed that cut scores that indicated risk (or lack thereof) varied considerably across EL and non-EL students. For example, as reported in Table 5, a winter RDF score between 6 and 8 for an EL student was associated with a probability of .58 for exceeding the 25th percentile on the spring PPVT-III, whereas an RDF score between 6 and 8 for a non-EL student was associated with a probability of .86 for this same outcome. This suggests that alternate cut scores may be needed for making educational decisions for students with additional risk factors for vocabulary difficulties, such as EL status. The results of this study also suggest that cut scores on other screeners should be reevaluated when making decisions for students with other known risk variables.
The results of the predicted probability analyses illustrated the importance of considering a continuum of risk, as opposed to a single cut score. It is common practice for educators to compare student performance on tri-annual screeners to cut scores to allocate supplemental educational supports for struggling students. Although classification accuracy analyses attempt to determine a cut score that best discriminates students according to risk status, cut scores can imply a forced dichotomy in which a student is either “at risk” or “not at risk.” Results from this study showed that the probability of vocabulary risk status in the spring for EL students varied considerably based on the score they attained on the DIVS PNF or RDF measures. Rather than rely on single cut scores and an “either/or” decision, researchers and practitioners are encouraged to consider probability indices associated with the range of scores across the distribution on a screening measure, which may allow for better allocation of intervention resources.
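One simple way to look at risk as a continuum is to tabulate the proportion of students who met the outcome criterion within each band of screener scores. The sketch below uses empirical proportions as a stand-in for the model-based predicted probabilities used in this study; the scores, outcomes, and band boundaries are all hypothetical.

```python
def pass_rate_by_band(scores, passed, bands):
    """Proportion of students passing within each inclusive (low, high) score band.

    Returns None for a band that contains no students.
    """
    rates = {}
    for low, high in bands:
        outcomes = [p for s, p in zip(scores, passed) if low <= s <= high]
        rates[(low, high)] = sum(outcomes) / len(outcomes) if outcomes else None
    return rates


# Hypothetical screener scores and year-end outcomes (1 = above the criterion).
toy_scores = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 20, 22]
toy_passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
toy_bands = [(0, 5), (6, 10), (11, 15), (16, 25)]

rates = pass_rate_by_band(toy_scores, toy_passed, toy_bands)
for band, rate in rates.items():
    print(band, rate)
```

A table of this kind lets a team see that students in a middle band face an intermediate level of risk, information that is lost when every score is collapsed into “at risk” or “not at risk.”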
Limitations and Directions for Future Research
Several limitations of this study should be addressed. First, eligibility for free or reduced-price lunch status is a crude index of socioeconomic disadvantage, and although commonly used and the only index that was available to us, it likely masked important differences in students’ vocabulary knowledge associated with SES. Similarly, EL students can demonstrate a broad range of levels of proficiency with English, and our binary categorization of language status may have obscured important differences in vocabulary knowledge as a function of language background. More detailed assessment of these variables might yield better information on the degree to which the DIVS is sensitive to vocabulary differences across demographic characteristics.
Second, the data for this study were drawn from an urban center where nearly 50% of the children were eligible for free or reduced-price lunch, and a high percentage of students were ELs. The characteristics of the students in this sample do not represent the broader student population in the United States. However, districts with more risk factors are highly relevant settings for screening given the importance of correctly allocating intervention services to students most in need. This study provides evidence for decision-making accuracy using the DIVS with a high-risk student population. Future research should examine whether the DIVS are equally accurate when used with a sample with fewer risk factors and how the cut scores may vary when used with different populations of young children.
Finally, the data used in this study were derived from an extant data set gathered by one urban school district. Although each test administrator was highly trained, data describing the fidelity of the testing sessions and interrater reliability were not available. The absence of these data is a limitation for the present study; however, the results represent decisions that could be made by real school professionals with real DIVS screening data. Future research on the DIVS could address score variance across test sessions and administrators.
The availability of reliable, valid, and brief vocabulary assessments for preschoolers creates opportunities for more research examining the relationships among vocabulary knowledge, later reading development, and general school achievement. The present study provides evidence that the DIVS are efficient and effective screeners for predicting later receptive vocabulary knowledge in young children. Future studies may investigate how well the DIVS predict broader oral language skills by using more comprehensive test batteries, including measures of expressive vocabulary and fundamental language skills. In addition, subsequent studies using the DIVS might investigate the multivariate and reciprocal relationships between vocabulary and emergent literacy skills of preschoolers and kindergarteners (e.g., print awareness, alphabetic knowledge, phonological awareness) for predicting both language and reading development in elementary school. Finally, measures such as the DIVS can be used to study the longitudinal relationship between early vocabulary knowledge and later successful academic behaviors, social and emotional skills, and general student achievement.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
