Abstract
Students who exhibit substantial behavior and emotional problems in school often have shown less severe problems earlier. Screening for such problems can suggest which students need extra support and help educators to direct support to students who are more likely to benefit. The present study explored predictive validity of a very brief behavior problem screening procedure as applied to 2,253 students ages 5 to 17 years. About half were special education students identified with emotional disturbance; the rest were students with no identified disabilities. Teachers rated them on the 10 items of the Emotional and Behavioral Screener. Any student whose sum of ratings exceeded a norm-based cutoff score was designated as at-risk; otherwise the student was not at-risk. Binary classification analyses of four age-level by gender subgroups of students showed that the instrument validly identifies at-risk students. Study method limitations and directions for research to clarify some remaining questions about this screening procedure are presented.
A large percentage of children in the United States experience significant emotional or behavioral problems, including problems that meet clinical criteria for a mental disorder. The Surgeon General’s Report (U.S. Department of Health and Human Services, 1999) found that approximately 20% of children and adolescents have, or have had, a significant mental health problem (see also Jaffee, Harrington, Cohen, & Moffitt, 2005; National Research Council and Institute of Medicine [NRC and IoM], 2009). Such problems obviously affect children in school.
Students with emotional disturbance (ED) are students identified for special education due to behavioral and emotional problems in school. Students in this disability category tend to have very poor school and life outcomes (Bradley, Doolittle, & Bartolotta, 2008). They tend to earn poor grades, fail many courses, and have high levels of disciplinary referrals, absenteeism, suspensions, and expulsions. Eventually, these youth drop out of school at much higher rates than their peers (Wagner, Kutash, Duchnowski, & Epstein, 2005). They ultimately show elevated rates of unemployment, involvement with the criminal justice system, and substance dependency and abuse (Kauffman & Landrum, 2009; Wagner, Newman, Cameto, Garza, & Levine, 2005).
Many students with ED have a history of earlier school behavior indicators of behavior and emotional problems (Gilliam & Shahar, 2006). Prevention scientists generally endorse the idea that intervention at an early point can prevent or limit serious problems later and usually is less difficult and costly than intervention later, after a more serious problem emerges (NRC & IoM, 2009). Therefore, educators have keen interest in recognizing early indicators of ED or other substantial emotional and behavioral problems, yet they may be concerned that recognizing early indicators of emotional and behavioral disorders may incorrectly label some students, thus perhaps creating biases and self-fulfilling prophecies (Cullinan, 2007).
Indeed, early identification processes can yield false positive and false negative identifications as well as accurate identifications. In efforts to reduce inaccurate outcomes of early identification efforts, many school professionals have come to view identification of students at risk for future ED or other emotional and behavioral problems as a multicomponent process in which the first component is screening (Kerr & Nelson, 2010).
Assessment procedures are available for screening students who may be at-risk for ED, including several screening instruments (e.g., Drummond, 1994; Elliot & Gresham, 2008; Goodman, 2001; Kamphaus & Reynolds, 2007; Walker & Severson, 1992). Professionals have highlighted three basic components of effective screening instruments: (a) appropriateness of the intended use, (b) technical adequacy, and (c) usefulness (Glover & Albers, 2007). Each screening instrument has strengths, but there are limitations as well. Some are effortful and time-consuming, requiring information from multiple informants and/or several steps instead of just one. Some have not been normed on a nationally representative sample, which may make it more difficult to place measurement results into context. Another limitation is that previously, no screening instrument directly addressed the federal definition of ED as found in the Individuals with Disabilities Education Improvement Act (Federal Register, 2006) even though this is the definition of ED that a student must meet to be identified in the ED category of education disability.
In response to such limitations of existing instruments, the Emotional and Behavioral Screener (EBS; Cullinan & Epstein, 2013) was developed to screen students at-risk of behaviors related to ED. The EBS was designed to align with the federal definition of ED, be clear and brief for teachers to use, and meet acceptable psychometric standards (e.g., Joint Committee on Standards for Educational and Psychological Testing, 1999). To these ends, 10 highly discriminating items were chosen from the Scales for Assessing Emotional Disturbance, 2nd ed., Rating Scale (SAED-2 RS; Epstein & Cullinan, 2010; see Cullinan & Epstein, in press, for an in-depth description of the item selection process). The SAED-2 RS was used because it was normed on two large national samples of students (with ED, without ED), is psychometrically sound (see Epstein & Cullinan, 2010, for information on the internal consistency, interrater reliability, and construct and concurrent validity), and is based on the federal definition of ED. The federal definition (Individuals with Disabilities Education Improvement Act, 2004) recognizes that emotional disturbance is a
condition exhibiting one or more of the following characteristics over a long period of time and to a marked degree that adversely affects a child’s educational performance: (a) an inability to learn that cannot be explained by intellectual, sensory, or health factors; (b) an inability to build or maintain satisfactory interpersonal relationships with peers and teachers; (c) inappropriate types of behavior or feelings under normal circumstances; (d) a general pervasive mood of unhappiness or depression; (e) a tendency to develop physical symptoms or fears associated with personal or school problems. (Cullinan & Epstein, 2013, p. 20)
A teacher completes the EBS on a student by rating its 10 items on a scale of 0 to 3, with a higher rating meaning the problem is shown to a greater extent. The item ratings are summed to yield a Total EBS Score, and if the Total EBS Score exceeds a stated cutoff score, that student is considered to be at-risk for ED. The cutoff score for each of four student age level and gender groups (younger and older, females and males) is the Total EBS Score that exceeds the 80th percentile score of that group (see Cullinan & Epstein, 2013). The 80th percentile was selected as the cutoff based on evidence from the Surgeon General’s Report (U.S. Department of Health and Human Services, 1999) and recent epidemiological studies (Merikangas et al., 2010; see Mrazek & Mrazek, 2005; NRC and IoM, 2009).
Three studies have examined reliability of the scores obtained from the EBS. The EBS items have demonstrated adequate internal consistency across age groups, gender groups, and race-ethnic groups (Cullinan & Epstein, in press), with alphas well above the .80 level considered adequate (Nunnally & Bernstein, 1994), except for students age 5 years (.73). In a second study (Nordness, Epstein, & Cullinan, 2012), the test-retest reliability of the EBS was assessed with 42 students rated by their teachers over a 2-week period. The correlation for Total EBS Score was .90 (p < .001), which is very large in magnitude (Hopkins, 2002). A third reliability study examined interrater reliability: 41 middle school students were rated by two teachers or a teacher and paraprofessional (Nordness, Epstein, Cullinan, & Pierce, submitted). This study found the correlation between independent raters using the EBS to be .63, a large magnitude coefficient (Hopkins, 2002).
The validity of EBS scores has been evaluated in two studies. The EBS scores demonstrated adequate construct validity by differentiating various groups (male and female students, younger versus older students, students with ED versus those with no disability, and students with ED versus those with LD) in terms of likelihood of identification as ED (Cullinan & Epstein, in press). In a study of convergent validity, middle school teachers rated 77 students on both the EBS and the Behavioral and Emotional Screening System (BESS; Kamphaus & Reynolds, 2007), a psychometrically sound and widely used screening test. The correlation between the EBS and BESS was .87, which is very large (Hopkins, 2002).
The purpose of the present study was to further examine the validity of the EBS scores by considering the diagnostic quality across four subgroups of students: 5- to 11-year-old females, 5- to 11-year-old males, 12- to 17-year-old females, and 12- to 17-year-old males. We examined these groups because there is evidence that the indicators of disabilities, in general, and ED, specifically, vary according to student age level and gender (Achenbach & Rescorla, 2001, 2003), and because that is how the EBS norms are organized.
Method
Participants
Participants were 2,253 U.S. students ages 5 through 17 years. There were 1,101 students not identified with any education disability (no disability, or ND students) and 1,152 students with ED (ED students). All ED students had been identified by their school districts as having ED, had an Individualized Education Program, and were receiving special education when data were collected. These ND and ED students were drawn from the field-testing study of the teacher-completed component (Rating Scale) of the Scales for Assessing Emotional Disturbance–2nd ed. (SAED-2; Epstein & Cullinan, 2010).
In that field-testing study we used the following procedure to obtain ratings on students. The second and third authors recruited teachers by mail or telephone. Teachers who agreed to participate were requested to complete rating forms on either all the students on their rolls/rosters/caseloads, or an unbiased subset of their students. Teachers were instructed to rate students that they had in their class for at least 4 weeks. To obtain an unbiased subset, teachers were instructed to use the following procedure. (a) Decide how many students you wish to rate. (b) Begin either at the top of your roll/roster/caseload and proceed downward, or at the bottom and proceed upward. (c) Select and rate every other student; do not skip any student so selected unless you have known that student for less than 2 months. (d) Stop rating students when you have reached the number of students you decided to rate. We subdivided the 2,253 students into age-level (5 through 11 years, 12 through 17 years) and gender subgroups for the present analysis.
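The every-other-student selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Student` record, the `select_subset` function, and the choice to begin with the first student on the roster are hypothetical and are not taken from the field-testing materials.

```python
from typing import NamedTuple, Sequence


class Student(NamedTuple):
    name: str
    months_known: float  # how long the teacher has known the student


def select_subset(
    roster: Sequence[Student],
    n_to_rate: int,
    from_top: bool = True,
    min_months: float = 2.0,
) -> list[Student]:
    """Pick every other student from one end of the roster.

    ASSUMPTION: selection starts with the first student in the chosen
    direction; the instructions do not say which student comes first.
    """
    ordered = list(roster) if from_top else list(reversed(roster))
    selected: list[Student] = []
    for student in ordered[::2]:  # every other student
        if len(selected) == n_to_rate:
            break  # step (d): stop at the chosen number
        if student.months_known < min_months:
            continue  # the only permitted reason to skip a selected student
        selected.append(student)
    return selected


# Hypothetical roster: the third student has been known for only 1 month.
roster = [Student(f"s{i}", 1.0 if i == 2 else 5.0) for i in range(8)]
chosen = select_subset(roster, n_to_rate=3)
print([s.name for s in chosen])
```

Because the rule is mechanical, the teacher's impressions of individual students do not influence who is rated, which is the sense in which the subset is unbiased.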
No disability sample
Students in the no disability (ND) sample, who were not identified with any disability, ranged in age from 5 through 17 years, and 51% of the sample was male. Age in years of the four ND age-level by gender subgroups was as follows: 5 to 11/Female, M = 8.2, SD = 2.0; 5 to 11/Male, M = 8.1, SD = 1.9; 12 to 17/Female, M = 13.8, SD = 1.7; 12 to 17/Male, M = 13.9, SD = 1.7. Other important characteristics of the sample (see Epstein & Cullinan, 2010) indicated that it was nationally representative to a substantial degree in terms of race-ethnic status (Black/African American = 16%; Hispanic = 9%; White = 66%; other race-ethnic status = 9%). Students in the ND sample attended schools in 34 states representing all U.S. geographic regions. The families of an estimated 33% of the ND sample earned less than US$25,000 annually.
ED sample
Students in the ED sample ranged in age from 5 through 17 years, and 80% of the sample was male. Age in years of the four age-level by gender subgroups of students with ED was as follows: 5 to 11/Female, M = 8.7, SD = 1.8; 5 to 11/Male, M = 9.1, SD = 1.7; 12 to 17/Female, M = 15.0, SD = 1.5; 12 to 17/Male, M = 14.4, SD = 1.5. The race-ethnic status of this sample was similar to data on the race-ethnic status of students with ED nationally (see Epstein & Cullinan, 2010): Black/African American = 27%; Hispanic = 4%; White = 62%; other race-ethnic status = 7%. Students in the ED sample also attended school in 34 states representing all U.S. geographic regions. The families of an estimated 32% of the ED sample earned less than US$25,000 annually.
Materials and Procedure
The EBS is a 10-item teacher rating scale designed mainly for screening groups of students to determine which, if any, are at increased risk for school identification as ED or for showing serious behavior or emotional problems otherwise. From among all the SAED-2 RS items, we selected these 10 to constitute the EBS because of their high value in identifying students with ED while still representing each of the five characteristics of ED in the federal definition.
A teacher completes the EBS Form by rating each item (e.g., destroys and ruins things; disrespectful, defiant of authority; makes threats towards others) on a Likert-type scale (0 = not a problem; 1 = mild problem; 2 = considerable problem; 3 = severe problem). The 10 item ratings are summed to form a Total EBS Score (score range = 0-30), which is compared to a cutoff score stated on the EBS Form. A student whose Total EBS Score exceeds the cutoff score is identified as at-risk.
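The scoring rule just described (sum the 10 ratings, then compare the total with the cutoff) can be sketched as follows. The function names and the cutoff value of 6 are hypothetical and serve only as an illustration; actual cutoffs differ by age-level and gender group and are printed on the EBS Form.

```python
from typing import Sequence

N_ITEMS = 10                    # the EBS has 10 items
MIN_RATING, MAX_RATING = 0, 3   # 0 = not a problem ... 3 = severe problem


def total_ebs_score(ratings: Sequence[int]) -> int:
    """Sum the 10 item ratings into a Total EBS Score (range 0-30)."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r not in range(MIN_RATING, MAX_RATING + 1) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(ratings)


def is_at_risk(ratings: Sequence[int], cutoff: int) -> bool:
    """At-risk only if the total strictly exceeds the cutoff score."""
    return total_ebs_score(ratings) > cutoff


# HYPOTHETICAL cutoff, for illustration only.
example_cutoff = 6
example_ratings = [1, 0, 2, 0, 1, 0, 3, 0, 1, 0]
print(total_ebs_score(example_ratings), is_at_risk(example_ratings, example_cutoff))
```

Note the strict inequality: a total equal to the cutoff does not place the student in the at-risk group.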
Analysis
To examine the predictive validity of the EBS we used binary classification analysis, which can detect the relationship between two dichotomous variables, a predicted status (in this case, at-risk vs. not at-risk) and a known status (ND vs. ED). For each student the EBS Total Score was used to determine at-risk vs. not at-risk predicted status, while our knowledge of whether each student was ND or ED constituted the known classification variable.
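As a rough sketch of the cross-classification that underlies this analysis, the code below tallies students into the four cells formed by known status (ED vs. ND) and predicted status (at-risk vs. not at-risk). The `Confusion` type, the `cross_classify` function, and the 20 example records are hypothetical; the study itself ran its analyses in SPSS.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Confusion(NamedTuple):
    tp: int  # ED and at-risk (true positives)
    fn: int  # ED and not at-risk (false negatives)
    fp: int  # ND and at-risk (false positives)
    tn: int  # ND and not at-risk (true negatives)


def cross_classify(records: Iterable[tuple[bool, bool]]) -> Confusion:
    """Tally (is_ed, at_risk) pairs into a 2 x 2 classification table."""
    counts = Counter(records)
    return Confusion(
        tp=counts[(True, True)],
        fn=counts[(True, False)],
        fp=counts[(False, True)],
        tn=counts[(False, False)],
    )


# Made-up data: 10 ED students and 10 ND students.
records = (
    [(True, True)] * 9 + [(True, False)] * 1
    + [(False, True)] * 2 + [(False, False)] * 8
)
table = cross_classify(records)
print(table)
```

Every diagnostic index reported in this study can be derived from these four counts.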
We analyzed these dichotomous variables with SPSS v19 to calculate several indications of diagnostic merit, separately for the four subgroups of students: (a) sensitivity and specificity, (b) area under the receiver operating characteristic curve, (c) positive and negative likelihood ratios, and (d) positive and negative predictive values. Multiple diagnostic indicators were reported because each indicator provides a different perspective of the diagnostic quality of an instrument such as the EBS. The four subgroups allowed us to investigate the diagnostic quality of the EBS across empirically and qualitatively different groups of students.
(a) Sensitivity refers to “the proportion of cases in which a disorder is detected when it is in fact present” (American Educational Research Association, American Psychological Association, & National Council of Measurement in Education [AERA, APA & NCME], 1999, p. 182), that is, the proportion of cases with the disorder that are true positives:

$$\text{Sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$
In the present study, sensitivity is the proportion of students identified with ED who are classified as at-risk by the EBS. Sensitivity can also be conceptualized as the probability of a positive test result on the EBS given that the student is identified with ED. Specificity is “the proportion of cases for which a diagnosis of disorder is rejected when rejection is warranted” (AERA, APA & NCME, 1999, p. 182), that is, the proportion of cases without the disorder that are true negatives:

$$\text{Specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}$$
In the present study, specificity is the proportion of ND students who are classified as not at-risk by the EBS. If all students were to be classified correctly, the instrument’s sensitivity and specificity both would be 1.0. Various authorities have designated as acceptable sensitivity and specificity values ≥ .70 (Wood, Flowers, Meyer, & Hill, 2002), values ≥ .90 (Johnson, Jenkins, Petscher, & Catts, 2009), and often, values around .80.
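A minimal sketch of both indices, computed from the four cells of a 2 x 2 classification table, may help make the definitions concrete. The function names and the counts (100 ED and 100 ND students) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of ED students classified as at-risk."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of ND students classified as not at-risk."""
    return tn / (tn + fp)


# HYPOTHETICAL counts: 96 of 100 ED students flagged, 79 of 100 ND students not flagged.
tp, fn = 96, 4
fp, tn = 21, 79

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Each index uses only one row of the table: sensitivity looks only at ED students, specificity only at ND students. That is why neither depends on how many ED students there are relative to ND students.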
(b) The area under the receiver operating characteristic curve (AUROC) summarizes the relationship between the sensitivity and specificity. More specifically, the receiver operating characteristic curve is a plot of sensitivity by [1-specificity] (i.e., the true positive rate by the false positive rate) across all possible cutoff scores. The AUROC has multiple interpretations. The first interpretation is that AUROC represents the probability that a randomly selected student identified as ED would have a higher score on the EBS than a randomly selected ND student. The second interpretation is that AUROC represents “the average value of sensitivity for all possible values of specificity” (Park, Goo, & Jo, 2004, p. 13). AUROC values greater than .90 are considered excellent, .80 to .89 are good, .70 to .79 are fair, and lesser values are poor (Compton, Fuchs, Fuchs, & Bryant, 2006).
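The first interpretation of AUROC can be illustrated directly: compare every ED student's score with every ND student's score and count how often the ED score is higher. The sketch below assumes the common convention of counting ties as one half, which the source does not specify; the function name and the scores are made up for illustration.

```python
from itertools import product
from typing import Sequence


def auroc(ed_scores: Sequence[float], nd_scores: Sequence[float]) -> float:
    """Probability that a random ED score exceeds a random ND score.

    ASSUMPTION: tied pairs count as 0.5 (the usual Mann-Whitney convention).
    """
    wins = 0.0
    for ed, nd in product(ed_scores, nd_scores):
        if ed > nd:
            wins += 1.0
        elif ed == nd:
            wins += 0.5
    return wins / (len(ed_scores) * len(nd_scores))


# Made-up Total EBS Scores for four ED and four ND students.
ed_scores = [12, 9, 15, 7]
nd_scores = [3, 7, 5, 10]
print(auroc(ed_scores, nd_scores))
```

This pairwise version is quadratic in the sample size, so it suits small illustrations rather than large data sets. It does, however, show that AUROC does not depend on any single cutoff score.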
(c) The positive likelihood ratio (LR+) is the ratio of the true positive rate to the false positive rate. It is calculated using the following formula:

$$\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$
In the present analysis, LR+ compares ED students to ND students with respect to the likelihood that each is designated at-risk on EBS Total Score. Higher LR+ values indicate a more discriminating and predictive test (Cook, 2007). Here, for example, an LR+ value of 1.3 would indicate that ED students are 1.3 times as likely as ND students to have an EBS Total Score beyond the at-risk cutoff.
The negative likelihood ratio (LR–) is the ratio of the false negative rate to the true negative rate. It is calculated using the following formula:

$$\mathrm{LR}^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}$$
In the present analysis, LR– compares ED students to ND students with respect to the likelihood of being designated as not at-risk on EBS Total Score. LR– values much lower than 1.0 are desirable because they indicate that ED students are much less likely than ND students to be designated as not at-risk on EBS Total Score. For example, an LR– of .50 would indicate that students with ED are only 50% as likely as students without ED to be classified as not at-risk according to the EBS.
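Both likelihood ratios follow directly from sensitivity and specificity, as the short sketch below shows. The function names are hypothetical, and the input values (.96 and .79) are illustrative only; with them, LR+ is about 4.57 and LR– about 0.05.

```python
def positive_lr(sens: float, spec: float) -> float:
    """True positive rate divided by false positive rate."""
    return sens / (1.0 - spec)


def negative_lr(sens: float, spec: float) -> float:
    """False negative rate divided by true negative rate."""
    return (1.0 - sens) / spec


# ILLUSTRATIVE inputs, not a reanalysis of the study data.
sens, spec = 0.96, 0.79
print(f"LR+ = {positive_lr(sens, spec):.2f}")
print(f"LR- = {negative_lr(sens, spec):.2f}")
```

A test that classified students at random would have both ratios equal to 1.0, so the distance from 1.0 in either direction is what signals discriminating power.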
Sensitivity, specificity, AUROC and likelihood ratios are characteristics of a measurement procedure, and are not influenced by the prevalence of the phenomenon measured. Two other binary classification indices, positive predictive value (PPV) and negative predictive value (NPV), are sensitive to the prevalence of the phenomenon measured (here, ED and at-risk). Thus PPV and NPV provide a more contextualized perspective of the diagnostic quality of the EBS.
(d) PPV reveals, in the present analysis, the proportion of students at-risk on the EBS who are also school-identified ED students. It should be noted that the calculations for PPV include only students identified as at-risk by the EBS; this differs from sensitivity, which includes only students with school-identified ED regardless of at-risk status on the EBS. There are two equivalent formulas for obtaining the PPV, where the prevalence of ED is defined as the number of true positives plus false negatives divided by the total sample size (n):

$$\text{PPV} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$
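NPV reveals the proportion of students not at-risk on the EBS who are also ND students (not identified with any disability). Just as the calculations for PPV include only students at-risk on the EBS, calculations for NPV include only students not at-risk on the EBS regardless of their school-identified ED status. The NPV can also be obtained using either equivalent formula given below:

$$\text{NPV} = \frac{\text{true negatives}}{\text{true negatives} + \text{false negatives}} = \frac{\text{specificity} \times (1 - \text{prevalence})}{\text{specificity} \times (1 - \text{prevalence}) + (1 - \text{sensitivity}) \times \text{prevalence}}$$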
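The equivalence of the count-based and rate-based routes to PPV and NPV can be checked numerically. The sketch below uses hypothetical counts (250 students, 50 of them ED) and hypothetical function names; it is meant only to show that both routes give the same values and that the rate-based route makes the role of prevalence explicit.

```python
def ppv_from_counts(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def npv_from_counts(tn: int, fn: int) -> float:
    return tn / (tn + fn)


def ppv_from_rates(sens: float, spec: float, prev: float) -> float:
    return sens * prev / (sens * prev + (1.0 - spec) * (1.0 - prev))


def npv_from_rates(sens: float, spec: float, prev: float) -> float:
    return spec * (1.0 - prev) / (spec * (1.0 - prev) + (1.0 - sens) * prev)


# HYPOTHETICAL counts for one subgroup.
tp, fn, fp, tn = 48, 2, 42, 158
n = tp + fn + fp + tn
prev = (tp + fn) / n   # prevalence of ED in this made-up sample
sens = tp / (tp + fn)
spec = tn / (tn + fp)

print(ppv_from_counts(tp, fp), ppv_from_rates(sens, spec, prev))
print(npv_from_counts(tn, fn), npv_from_rates(sens, spec, prev))
```

Holding sensitivity and specificity fixed while changing `prev` in the rate-based functions shows how strongly the predictive values track prevalence.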
Results
Table 1 lists indicators of the diagnostic quality of the EBS, including sensitivity, specificity, AUROC, LR+, LR–, PPV, and NPV. The indicators are reported separately for each subgroup that was evaluated. In addition, the sample size of each subgroup is reported alongside the EBS cutoff score (and the corresponding percentile rank) used for each subgroup analysis.
Table 1. Diagnostic Quality Indicators for the Emotional and Behavioral Screener for Four Age-Level by Gender Subgroups.
Sensitivity and Specificity
Given the general consensus regarding test sensitivity and specificity (Committee on Children with Disabilities, 2001), the sensitivity of the EBS would be considered acceptable for all four subgroups (.81 to .96). That is, 81% to 96% of students with ED were correctly classified by the EBS as at-risk of having ED. The specificity of the test is also acceptable, although not ideal, for all four subgroups (.75 to .79): 75% to 79% of ND students were correctly classified by the EBS as not at risk of having ED. Overall, the classification accuracy (i.e., the correct classifications divided by the total number of classifications) ranged from .80 to .85 across subgroups, with accuracy highest for the younger students.
AUROC
The AUROC ranged from .86 to .95 for the four subgroups. These values indicate that the EBS is a good to excellent test at identifying the risk of ED depending on the gender and age of the students being screened. The AUROC values indicate that there was an 86% to 95% probability that a student identified as ED would have a higher score on the EBS than a ND student. The test is most discriminating for young females (AUROC = .95), then young males (.91), older females (.89), and older males (.86).
Likelihood Ratios
Students identified with ED were 3.38 to 4.57 times as likely as ND students to be classified as at-risk for ED by the EBS. These high LR+ values are an indicator of a highly predictive test with respect to identifying positive risk status. Similar to the AUROC findings, the test seems to work best for young females (4.57), then young males (4.09), older females (3.56), and older males (3.38).
ED students were 95% (LR– = 0.05) to 75% (LR– = 0.25) less likely than ND students to be classified as not at risk of having ED. The low LR– values indicate that ED students are quite unlikely to be classified as not at-risk compared to the likelihood that ND students would be classified as not at-risk. The negative likelihood ratio is also best for young females and worst for older males.
Positive and Negative Predictive Values
The PPVs ranged from .50 to .92, which indicates that between 50% and 92% of the students classified as at-risk of ED were also identified with ED by school districts. Young females had the lowest PPV, indicating that the EBS overidentifies young females as at risk of ED, that school districts underidentify young females with ED, or a combination of both. Older males had the highest PPV and also have the highest prevalence of ED in the population, suggesting that the risk status identified by the EBS is largely in agreement with school districts’ identification of ED status. Ninety-two percent of older males classified as at risk of having ED were also identified by their school district as having ED.
The NPVs ranged from .53 to .99 indicating that 53% to 99% of students classified as not at risk of ED were also identified as ND by school districts. The NPV was highest for young females (.99) and lowest for older males (.53). Remember that the PPVs as well as the NPVs are sensitive to the prevalence of ED in the population. For example, the prevalence of ED (or at least the identification of ED) for young females is low (18% of the young female sample is identified as ED) and therefore the opportunity to correctly classify young females as at-risk of ED is lower compared to other subgroups and thus young females have the lowest PPV.
Discussion
Professionals have called for screening instruments that meet three basic criteria: (a) the appropriateness for the intended use, (b) the technical adequacy, and (c) the usability (Glover & Albers, 2007). The developmental process of the EBS as well as the findings of prior research (Cullinan & Epstein, in press; Nordness, Epstein, Cullinan, & Pierce, submitted) help to establish the appropriateness and usefulness of the EBS in identifying youth at risk of ED. The findings of this study help to establish the technical adequacy of the EBS.
Overall, a majority of the diagnostic quality indicators (i.e., AUROC, sensitivity, specificity, LR+, LR−) suggest that the EBS is an accurate, highly discriminating test with the potential to help schools identify students at-risk of having ED. In addition, these findings hold across the four subgroups evaluated in this study, indicating that the EBS is well suited to be used across a large range of students. By and large, the test appears to be more accurate and discriminating for younger students compared to older students. All of the test-specific indicators (i.e., all indicators except for PPV and NPV) were markedly better for younger students compared to older students.
While most diagnostic indicators suggest that the EBS has high quality properties, the specificity of the EBS was slightly lower than the desired .80 level. However, this was expected given the approach used to select cut-scores. The EBS was developed to identify around 20% of ND students as at-risk of having ED, based on the recommendations of the Surgeon General’s Report (U.S. Department of Health and Human Services, 1999) and numerous epidemiological reports (Jaffee et al., 2005; NRC and IoM, 2009). As a result, we would expect that the specificity would be at or around .80.
The PPVs and NPVs for some subgroups seemed to be artificially high or low because of sampling procedures under which young females were considerably more likely to be included in the ND sample and older male students were much more likely to be included in the ED sample; nearly 78% of the older male subgroup consisted of school-identified ED students. Since predictive values are sensitive to the prevalence of ED in the population, the values reported for this study may be biased due to sampling or, perhaps, biases in school identification of ED. For example, the error in identification of ED by districts may be unequal across subgroups, with young females less likely to be correctly identified with ED and older males more likely to be incorrectly identified with ED. However, further research is needed to replicate these findings and clarify the results.
Limitations and Future Research
Two major limitations should be noted: (a) the sampling procedure and (b) the use of school-identified ED status as the state variable in the analyses. The procedure used to select the two samples included stratification of students by age and gender, so that the mean age of the ND sample was lower than the mean age of the ED sample and so that the ED sample consisted primarily of males (80%). Although this sampling technique leads to greater external validity because the samples match the specific subpopulation characteristics, it also introduces a degree of bias in diagnostic indicators that are sensitive to the prevalence of the true state in the population, since the prevalence of ED in this sample is not reflective of the national population (e.g., we observed a surprisingly low NPV for older males and a low PPV for younger females). Future research on the EBS might include an equal probability sampling procedure or the use of a weighting technique where ED students are representative of the true population prevalence. This approach would provide less biased estimates for the PPVs and NPVs.
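One simple form such a weighting technique could take is sketched below: each ED and ND student receives a weight so that the weighted share of ED students equals a chosen target prevalence, and PPV and NPV are then recomputed from weighted counts. The function names, the counts, and the 5% target prevalence are all hypothetical; this is an illustration of the general idea, not the procedure any study has used.

```python
def prevalence_weights(n_ed: int, n_nd: int, target_prev: float) -> tuple[float, float]:
    """Weights that rescale the sample to a target ED prevalence."""
    sample_prev = n_ed / (n_ed + n_nd)
    w_ed = target_prev / sample_prev
    w_nd = (1.0 - target_prev) / (1.0 - sample_prev)
    return w_ed, w_nd


def weighted_ppv(tp: int, fp: int, w_ed: float, w_nd: float) -> float:
    # True positives are ED students; false positives are ND students.
    return tp * w_ed / (tp * w_ed + fp * w_nd)


def weighted_npv(tn: int, fn: int, w_ed: float, w_nd: float) -> float:
    # True negatives are ND students; false negatives are ED students.
    return tn * w_nd / (tn * w_nd + fn * w_ed)


# HYPOTHETICAL subgroup in which ED students are heavily oversampled (80%).
tp, fn, fp, tn = 648, 152, 48, 152
w_ed, w_nd = prevalence_weights(tp + fn, fp + tn, target_prev=0.05)  # 5% is an assumed value

print(f"unweighted PPV = {tp / (tp + fp):.2f}, weighted PPV = {weighted_ppv(tp, fp, w_ed, w_nd):.2f}")
print(f"unweighted NPV = {tn / (tn + fn):.2f}, weighted NPV = {weighted_npv(tn, fn, w_ed, w_nd):.2f}")
```

Sensitivity and specificity are unchanged by this reweighting, because each is computed within a single known-status group; only the predictive values move.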
The other major limitation was that we used school-identified ED status as the state variable, that is, as the indicator of the true ED status of the student. This variable may not be the most appropriate state variable, but in the absence of a “true” diagnostic test for ED, we must rely on some indicator of ED status. As discussed above, differences in the procedures used by independent school districts to identify students as ED may obfuscate the functioning of the EBS. However, since the samples were large and the ND sample nationally representative, it seems safe to assume that the uncertainty of “true” ED status is rather minimal. Future research should consider using widely accepted rating scales such as the Social Skills Improvement System, Performance Screening Guide (Elliot & Gresham, 2008), Behavioral Assessment System for Children (Kamphaus & Reynolds, 2007) or the Child Behavior Checklist (Achenbach & Rescorla, 2003) to obtain a measure of ED status to use as the state variable against which to test the EBS.
Additional directions for future research include an in-depth evaluation of the psychometric properties of the EBS. To this point, it has been assumed that the factor structure of the EBS and item-level properties were acceptable because the items were drawn from the psychometrically sound SAED-2. While this is likely a safe assumption, researchers should examine the properties of the EBS using Rasch and factor analytic techniques. Future research might also examine the predictive validity of the EBS with respect to student behavioral and emotional strengths, behavioral outcomes (e.g., out of school suspensions), and academic outcomes. Researchers may also look within the at-risk group of students for profiles that might differentiate long-term ED students from students who are at-risk at one age, but not at a later age.
Implications
The findings from this study along with the results from previous studies of the EBS demonstrate the acceptable psychometric status of the EBS as a valid and reliable screening measure to identify students at risk of ED. As such, the instrument has much to offer school personnel. First, the EBS can be used as part of a school universal screening effort where all students in a designated group (e.g., all first graders, all sixth-grade students) are assessed. The objective of universal screening is to discriminate students who are not at-risk from those who may be at-risk. The brevity and ease of use of the EBS make it especially attractive for this purpose. Second, educational personnel sometimes encounter circumstances that indicate a need to screen an individual student for emotional and behavior problems. For example, such a screening may be appropriate for some or all students who are transferring into a school district. In this case, one or more of the student’s former teachers could complete the EBS. Third, many educators have adopted a “tier” model of behavior problem prevention and intervention in schools (Kerr & Nelson, 2010; Lane, Kalberg, & Menzies, 2009) as a feature of Positive Behavior Intervention and Support (PBIS) or Response to Intervention (RTI) approaches to behavior problem reduction (Simonsen & Sugai, 2009). A premise of three-tier models is that in the typical school setting, students fit one of three levels of risk for emotional and behavioral problems, with each level calling for a different degree or kind of behavior intervention. To change a student’s status from Tier 1 to Tier 2 or from Tier 2 to Tier 3 is undoubtedly an important decision, as it is the start of treating the student differently than most other students. There can be serious consequences for erroneously doing so, and just as important, for erroneously failing to do so. The EBS can assist with this decision by providing a psychometrically sound approach based on teacher ratings.
Footnotes
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Douglas Cullinan and Michael H. Epstein are authors of the Emotional and Behavioral Screener discussed in the present article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported herein was supported, in part, by the Institute of Education Sciences, U.S. Department of Education, through Grant R324B110001 to the University of Nebraska–Lincoln. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
