Abstract
The identification of gifted and talented students and the accompanying fact that most identification systems result in the underrepresentation of students from African American, Hispanic, Native American, English language learning, and low-income families are two of the most discussed and hotly debated topics in the field. This article provides an overview of past efforts to mitigate inequity in both K-12 and higher education program identification, highlights successes and limitations, and presents a particular perspective in order to help facilitate broader thinking about the purpose of identification, the development of talent, and how academic excellence can be fostered while simultaneously increasing equity in gifted education.
The field of gifted education has faced a serious challenge for decades. Students from African American, Hispanic, Native American, and low-income families are underrepresented, sometimes dramatically, in identified gifted and talented populations (Yoon & Gentry, 2009). Students with disabilities and students who are English language learners (ELLs) are also underrepresented. The current level of economic, racial, and ethnic inequality is a problem not only for political and advocacy reasons but also because students from these subgroups represent the fastest growing segments of the K-12 population, and many of their talents are being overlooked and left underdeveloped (Lakin & Lohman, 2011; Wyner, Bridgeland, & DiIulio, 2009). Plucker, Hardesty, and Burroughs (2013) have argued that underrepresentation has contributed to large and growing excellence gaps. The Federal definition of giftedness states, “Outstanding talents are present in children and youth from all cultural groups, across all economic strata, and in all areas of human endeavor” (U.S. Department of Education [USDOE], 1993, p. 3), and yet such an inclusive outcome of talent development is rarely reflected in identified gifted populations.
Underrepresentation is neither a new problem nor unique to advanced educational programs at the K-12 level (Gagné, 2011). Institutions of higher education (IHEs) and K-12 schools have tried to balance equity and excellence for nearly 100 years (Brown, 2008). In one sense, IHEs and gifted education programs exist to develop and foster excellence within specific domains. At the same time, society, as represented by educational institutions, seems to value diversity and greater equity, so much so that it has been described as a compelling government interest (Grutter v. Bollinger, 2003). Balancing these two priorities has caused much tension within American educational institutions. How these institutions have balanced both priorities and dealt with this tension are major foci of this article. These institutions have had to deal with the same underlying question that guided this article: How can programs focused on the development of excellence increase their student diversity while at the same time maintaining the needs-based nature of their services? This article outlines past efforts to address underrepresentation, relates where and why they have faltered, and then presents an argument for a scalable method that could help balance the competing goals of equity and excellence.
Background
A major barrier to greater equity in the identification of students for gifted and talented programs is that, on average, students from Native American, African American, Hispanic, and low-income families receive lower observed scores on tests of academic achievement and ability than do their Caucasian, Asian, and higher income peers (Plucker et al., 2013; Valencia & Suzuki, 2001). ELLs also have lower observed scores, though here differences in ability are confounded to some extent by the specific language demands of the test. To outline the scope of the underrepresentation problem, Tables 1 to 4 present the average observed score differences on multiple achievement and ability tests received by a range of student subgroups (language, income, race, and ethnicity). To aid in comparison, differences are reported as standard deviation units (Cohen’s d), otherwise known as the number of standard deviations that a given group scored from the reference group. For each of the four tables, the reference group is always the final row including the mean score that the group received. For example, Table 1 shows that African American students scored more than a full standard deviation lower (d = −1.04) compared against their Caucasian eighth-grade peers on the Wisconsin Knowledge and Concepts Examination (WKCE)–Math. Similarly, Table 2 shows that 13-year-old students who are eligible for free or reduced-price meals received a score on the National Assessment for Educational Progress (NAEP)–Reading that was 2/3 of a standard deviation lower (d = −0.68) compared to the mean score of 275 obtained by their higher income peers.
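As a minimal sketch of how a standardized difference of this kind can be computed, the Python snippet below divides the gap between a group mean and the reference-group mean by a pooled standard deviation. The use of a pooled standard deviation is an assumption (the text does not say which standard deviation was used for the tables), the function name `cohens_d` is hypothetical, and the scores are invented for illustration; they are not data from any of the assessments discussed here.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group, reference):
    """Standardized mean difference (group minus reference), pooled SD.

    A negative value means the group scored below the reference group.
    """
    n1, n2 = len(group), len(reference)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group) + (n2 - 1) * variance(reference)) / (n1 + n2 - 2)
    return (mean(group) - mean(reference)) / sqrt(pooled_var)


# Hypothetical scale scores, invented for illustration only.
reference_scores = [270, 280, 290, 300, 310]
group_scores = [250, 260, 270, 280, 290]

d = cohens_d(group_scores, reference_scores)
print(round(d, 2))  # -1.26
```

Read this way, a value such as d = −0.68 says that the group mean sits about two thirds of a standard deviation below the reference-group mean, whatever the original score scale was.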
Table 1. Minority Subgroup Score Differences on Various Assessments: Scaled to Standard Deviation Units.
Note. NAEP = National Assessment for Educational Progress; WKCE = Wisconsin Knowledge and Concepts Examination–WI State Achievement Test; CogAT = Cognitive Abilities Test–Form 7; AA = African American; R = reading subscale; M = math subscale; V = verbal subscale; Q = quantitative reasoning subscale; NV = nonverbal subscale.
(a) Taken from the NAEP Data Explorer: http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx. (b) Data for 17-year-olds from 2008. (c) Data were taken from the 2011 WKCE technical manual: http://oea.dpi.wi.gov/files/oea/pdf/td-2011-techman.pdf. (d) Data were taken from the CogAT Research and Development Guide.
Table 2. Free or Reduced-Price Meal Eligibility Score Differences on Various Assessments: Scaled to Standard Deviation Units.
Note. NAEP = National Assessment for Educational Progress; WKCE = Wisconsin Knowledge and Concepts Examination–WI State Achievement Test; CogAT = Cognitive Abilities Test–Form 7; FRPL = students eligible for free or reduced-price lunch; R = reading subscale; M = math subscale; V = verbal subscale; Q = quantitative reasoning subscale; NV = nonverbal subscale.
(a) Taken from the NAEP Data Explorer: http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx. (b) Data for 17-year-olds from 2008. (c) Data were taken from the 2011 WKCE technical manual: http://oea.dpi.wi.gov/files/oea/pdf/td-2011-techman.pdf. (d) Data were taken from the CogAT Research and Development Guide.
Table 3. Minority Subgroup Score Differences From Giessman, Gambrell, and Stebbins (2013), Scaled to Standard Deviation Units.
Note. CogAT = Cognitive Abilities Test–Form 7; VQN = composite score; NNAT = Naglieri Nonverbal Ability Test–2nd edition composite score; NV = nonverbal subscale; AA = African American.
Table 4. ELL Subgroup Score Differences From Giessman, Gambrell, and Stebbins (2013), Scaled to Standard Deviation Units.
Note. CogAT = Cognitive Abilities Test–Form 7; VQN = composite score; NNAT = Naglieri Nonverbal Ability Test–2nd edition composite score; NV = nonverbal subscale; ELL = English language learners.
On every one of the assessments presented, students from African American, Hispanic, Native American, low-income, and ELL families received lower observed scores than did their comparison group. These findings also hold true for nonverbal measures of ability (Cognitive Abilities Test–Nonverbal subscale [CogAT-NV] and the Naglieri Nonverbal Ability Test [NNAT]), which for African American students showed some of the largest score differences, at more than a full standard deviation unit. It is worth noting that these data came from a single study (Giessman, Gambrell, & Stebbins, 2013) and that the differences were much larger than those observed in normative data. Data from the CogAT Research and Development Guide showed observed score differences closer to half of a standard deviation (Lohman, 2012). With these types of assessments being among the most commonly used in gifted student identification (Callahan, Moon, & Oh, 2013), it should be no surprise that substantial underrepresentation in identified gifted populations is the result. The question that follows from these data is whether these differences are due to systematic error or bias or represent actual group differences.
There are several potential explanations for why these observed average score differences are so pervasive, such as the idea that they exist because of teacher bias in referrals or that the tests are all flawed in some systematic way. The one we will focus on is the idea that they exist because of systematic inequality of educational opportunity (Worrell, 2015). Opportunity to learn (OTL) as a concept was introduced to ensure valid comparison among nations on international achievement tests in the late 1970s and early 1980s (McDonnell, 1995). Because different topics were taught at different times and to varying degrees in the tested nations, a metric was needed to determine if students performed poorly because they were actually behind or because they had not yet been exposed to the content. This question led to the development of OTL as a composite variable, which has been measured by variables such as teacher self-report data regarding the curriculum taught, various measures of teacher quality, and school facility data. However, these variables all dealt with differential experiences once students started formalized education. Preschool learning opportunities also contribute to differential OTL.
The reason that OTL and its composite factors are so important is that most tests of ability or intelligence assume some level of similarity in background experience for a given normative group. For example, intelligence tests have very narrow age-level norms to enable inferences that are as valid as possible regarding a person’s ability. By only comparing an individual to those who have had very similar OTL (based on age), assessments are able to produce a more valid measure of underlying ability or aptitude. Unfortunately, grade level, which is often the comparison group for academic achievement tests, serves as a poor proxy for OTL. Simply comparing all third graders to each other because they are all in third grade—setting them up for the same expectation regardless of their preschool experience—is problematic at best if the goal is to identify talent. A given grade level includes individuals that vary approximately 12 months in age. Such wide classroom variability can cause some students to appear less intelligent or gifted than others within the same grade simply because they are a little younger. There is also wide variability in the type and quality of instruction students in “third grade” have received, depending on where they went to school. Just because students have all been in school for 4 years (K-3) does not mean they have had similar OTL—and yet when national grade-based percentiles are used for gifted identification (e.g., 98th percentile using national norms), schools are making the assumption that all of the students in that grade have had relatively similar chances to develop talent. Even within the same school district, the quality of “grade-level” educational opportunities can vary dramatically.
The variability in the quality of education students receive once in K-12 school is further exacerbated by the wide variation in their preschool, informal education experiences. One way to measure OTL is to look at the components that contribute to it such as access to high-quality early-childhood education and exposure to language and rich vocabulary. The seminal study by Hart and Risley (2003) showed that children of professional parents were exposed to about three times as many words in the home as were children of parents who were on welfare. By age three these differences in word exposure resulted in a cumulative vocabulary for the low-income children that was less than half the size of those from high-income families. Hart and Risley (2003) further noted that these performance gaps continued into at least third grade, where their study ended. Similarly, Kornrich and Furstenberg (2013) looked at another factor related to OTL—parental spending on their children. In their 2013 study they found that parents in the lowest two income deciles spent $750 and $900 (respectively) on their children in 2006-2007 compared to $3,701 and $6,573 spent by the highest two deciles. A greater level of spending by the higher income groups provided increased access to high-quality child care and education (the two largest spending categories), which allowed these children to have a wider and deeper range of early educational experiences. Thus, before children even enter school, massive differences in OTL already exist. These early inequalities contribute to the observed score differences shown in Tables 1 to 4 and in the differential rates of identification as gifted reported by Yoon and Gentry (2009). Put differently, the underrepresentation observed in gifted identification rates is, in part, a symptom of larger societal inequality.
Parental spending on their children and early vocabulary exposure are two proxies for the larger concept of OTL that show some students have had what amounts to twice as much “education” when they begin formal schooling. These differences in the quality and quantity of educational opportunities limit the degree to which normative assumptions regarding similar student OTL can be justified. For this reason, the identification of talent needs to correct in some way for students’ prior educational experience or opportunity—otherwise some portion of what ends up being measured in identification is the degree to which students have had opportunities to interact with the content being tested—opportunities that in turn are directly influenced by family income. The direct relationship between access to resources and OTL can result in students who have had the most OTL being perceived as the most gifted. An article in The New York Times (Anderson, 2013) described the struggle New York City Public Schools has had with kindergarten test preparation for gifted program entry. Higher income families were paying thousands of dollars for test preparation and tutoring for their 4-year-olds to get higher scores on the tests used for admission into the city’s gifted program. In this case, increased income provided greater access to educational opportunities in the form of test preparation courses. Because of this increased access, these students will be perceived as “more gifted” than students who did not have similar access. Because income and OTL are not equally distributed across all demographic groups, the outcome is substantial underrepresentation among certain racial and ethnic groups. The key question then becomes, how can the field of gifted education compensate for differential OTL to more accurately identify talent and increase the equity of identified populations, while still maintaining the needs-based nature of gifted programming?
In the past, the field of gifted and talented education has approached these large observed score differences, and the underrepresentation that is their result, using two perceived solutions. The first follows the belief that the existing assessments themselves are the barrier—that they are somehow flawed and these flaws are what cause the mean score differences—and that new instruments are needed. We refer to this perspective as the “different test” perspective because it often includes the recommendation that other, less-biased tests should be found and used (e.g., Briggs, Reis, & Sullivan, 2008; Maker, Nielson, & Rogers, 1994; Naglieri & Ford, 2003, 2005). Followers of this perspective imply or state directly that most achievement and traditional ability tests used in K-12 education are racially biased or are too culturally loaded to yield valid data for identification decisions. The second perspective follows a belief that the manner in which traditional measures of academic ability and achievement are used is the barrier and that such measures should continue to be used but in different ways. We have referred to this as the “use tests differently” perspective (Peters, Matthews, McBee, & McCoach, 2014). We present an overview of both methods below and discuss the strengths and weaknesses of the specific applications that fall under each approach.
Use Different Tests
Under the “use different tests” perspective, “culture-neutral” or “bias-free” tests and assessment methods are recommended for identification, with authors such as Naglieri and Ford (2003, 2005, 2015) contending that this approach would help eliminate underrepresentation. Such tests often include stand-alone nonverbal ability tests (e.g., NNAT) or broader academic ability tests that include nonverbal subscales (e.g., CogAT). Despite these claims, recent research by Carman and Taylor (2010) and by Giessman et al. (2013) has demonstrated that proportional representation was far from assured by this approach. Tables 1 to 4, based on normative data, show that observed group differences were smaller for the CogAT than they were on NAEP or WKCE (achievement measures). These smaller score differences could suggest the CogAT might be more suitable to help address underrepresentation. However, when these assessments were applied in the Giessman et al. study, both CogAT and NNAT score differences were almost universally larger for African American students than they were on the NAEP or WKCE scores (based, again, on normative data). In the Giessman et al. study, simply using a different test did not guarantee proportional representation or even something close to it. Naglieri and Ford (2015) contended that the reason for the lack of successful proportional representation is that the data were from a single district that did not represent the entire United States’ population demographics. But if underrepresentation actually was caused by racial, ethnic, or cultural bias in the identification tool itself, and a particular test was truly “culture-neutral,” this should make no difference unless the mean score differences represent real differences on achievement caused by unequal opportunity.
Even if culture-neutral tests exist, they are not likely to align well with the most common types of gifted programming—a problem of content validity. Traditional measures of academic ability may be the best choice if the goal is to identify those students who are most in need of and who benefit from the academically oriented programming found in schools (Lohman, 2005, 2009). Such a position does not mean that other measures—such as nonverbal ability tests—could not be used to identify students for nonacademic programming or non–verbal-specific interventions (see Peters et al., 2014). What matters is that the measure or measures used for identification are closely aligned to the intervention for which students are identified. This criterion is the single most important characteristic of a strong identification system (Lohman, 2009; McBee, Peters, & Miller, in press).
In a similar effort to develop a “different kind of instrument that taps into [underrepresented learners’] abilities in a more faithful and valid way” (Sarouphim, 1999, p. 246), the DISCOVER Assessment was designed based on Gardner’s Multiple Intelligences as a performance assessment that would not suffer from the same observed score differences as traditional measures (Maker, 2005). DISCOVER, like the NNAT, was also presented as “bias-free” (Sarouphim, 1999, p. 246) and involved students completing problem-solving tasks aligned to spatial, linguistic, or logical-mathematical intelligences. Although Maker (2005) stated that across two other studies no significant differences in racial/ethnic representation rates were found in the populations identified as gifted and that the resulting proportions were similar to their representation in the overall student population, this otherwise promising finding suffers from the same crucial content validity limitation as most other “different” assessments. Worrell (2015) stated succinctly that no studies have examined the success of students identified using DISCOVER following their placement in gifted programming. If proportional representation (equity) was achieved but these students were placed into traditional, academically focused programming (as opposed to a problem-solving focused DISCOVER curriculum), they may not have succeeded there. Simply measuring a different construct to achieve equity of identification misses the point. What is needed is a process that not only balances equity with excellence but also identifies students who are representative of the larger student population without drastically changing the nature of existing gifted program services.
A second classification within the “use different tests” paradigm that is also similar to DISCOVER involves standardized performance assessments and observational protocols. Examples using such systems include the Young Scholars Program (Horn, 2015) and the USTARS (Using Science, Talents, and Abilities to Recognize Students) system with its corresponding Teacher’s Observation of Potential in Students protocol (Harradine, Coleman, & Winn, 2014). In these systems, teachers follow both a structured environment (model lessons or activities) and a structured observational system to identify student strengths. These structures help avoid a problem common to most achievement tests and teacher rating scales, which is that any given student might not demonstrate the skills the teacher is looking for, simply because he or she has not yet had a chance to develop that particular skill (due to individual differences in OTL).
In the case of the USTARS and Young Scholars programs, the content students interact with is delivered within the current class, which helps mitigate the issue of conflating prior educational opportunity with actual ability. Research on the Young Scholars Program has shown a significant increase (565%) in the number of African American and Hispanic students identified for gifted services (Horn, 2015). Data from the same source also suggested students are successful in the programs for which they have been identified, likely because of the clear connection between the content of the model lessons and the academic nature of the programs and services provided. The greatest potential limitations are in scale and practicality. Involving every teacher not only in the completion of an observation protocol but also in the delivery of a standardized set of lessons would be a substantial undertaking for most districts. Still, based on available data, Young Scholars shows the greatest potential of the “use different tests” methods when it comes to both identifying greater numbers of underrepresented learners and assuring that those identified are successful in the resulting program.
Use Tests Differently
We refer to the second class of potential solutions to underrepresentation as the “use tests differently” method because it often emphasizes using similar instruments to those that are already in common use in K-12 education (e.g., academic achievement tests, ability tests, rating scales, observations, and portfolios), but in a different way—specifically by using different norm groups. Using a local norming group, as opposed to a national one, is probably the most widely accepted version of the “use tests differently” method (Lohman, 2009). In using local norms, a school focuses on the question of which students are the most advanced, and therefore most likely underchallenged and in need of additional services, compared to other students in the same school or district. Local norms, especially at the building level, make the most sense for gifted identification because they identify the students who could most likely benefit from additional advanced intervention—following the logic that those students who are farthest from “typical” in a given school are the most likely to be underchallenged. Furthermore, the comparison of students in one school to those in the rest of the nation, or even in the rest of a given district, may not be useful if the purpose of gifted education is to better match student needs with appropriate interventions (see USDOE, 1993). Such decisions should be made locally because it is locally that a child is or is not appropriately challenged, and it will be locally within a particular school that any intervention will be provided.
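To make the contrast between a local norm and a wider norm concrete, the sketch below ranks each student against others in the same building and, for comparison, against everyone pooled together. It is an illustration under stated assumptions, not a recommended procedure: the function names `percentile_rank` and `local_norm_ranks`, the two schools, and all scores are hypothetical, and percentile rank is defined here simply as the percentage of the norm group scoring strictly below a given score.

```python
from collections import defaultdict


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def local_norm_ranks(records):
    """Rank each student against others in the same school.

    `records` is a list of (student_id, school, score) tuples.
    Returns {student_id: percentile rank within the student's own school}.
    """
    by_school = defaultdict(list)
    for _, school, score in records:
        by_school[school].append(score)
    return {sid: percentile_rank(score, by_school[school])
            for sid, school, score in records}


# Hypothetical students in two hypothetical schools, A and B.
records = [
    ("s1", "A", 95), ("s2", "A", 90), ("s3", "A", 85), ("s4", "A", 80),
    ("s5", "B", 75), ("s6", "B", 70), ("s7", "B", 65), ("s8", "B", 60),
]

local = local_norm_ranks(records)
pooled_scores = [score for _, _, score in records]

print(local["s5"])                         # 75.0 within school B
print(percentile_rank(75, pooled_scores))  # 37.5 against the pooled group
```

In this toy example the top scorer in school B looks unremarkable against the pooled group but is the most advanced student in her own building, which is the student the local-norm argument says is most likely to be underchallenged there.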
Using a local classroom, school, or district for comparison purposes is not the only application of a different norm that has been recommended to address underrepresentation. When used with regard to race or ethnicity, using a different process or identification system for a particular student subgroup parallels some applications of affirmative action efforts in K-12 and higher education. Although often far more complex in implementation, affirmative action involves comparing individuals of particular subgroups only to other members of the same subgroup for the purpose of identifying the most “talented” candidates from within each group to place into a program. In doing so, the students from the subgroup of interest are given a kind of special preference to reach the institution’s goal of greater equity. In such situations, both increased equity and the development of excellence are institutional goals. As any reader is likely aware, these policies are controversial, so controversial that some states have passed laws or constitutional amendments against the use of racial factors in university admission procedures (e.g., California Proposition 209; Schuette v. Coalition to Defend Affirmative Action, 2013).
As we will present in the following section, individuals on both sides of the affirmative action argument continue to disagree on the effectiveness of these policies (Sander & Taylor, 2012). However, methods under the “use the tests differently” category—specifically group-specific norms—have never been fully embraced with regard to gifted and talented program identification. Would such methods work to increase the equity within gifted education populations? If so, could such methods be implemented while also maintaining the focus on the development of excellence within gifted interventions? In the following section we present a rationale for group-specific norms when applied to a specific demographic variable (family income), followed by several case studies of what has happened when similar methods have been applied in K-12 and higher education institutions.
The Purpose of Gifted Education
Although subject to much debate within the field, the overall purpose of gifted education seems to be to support the development of excellence “at the upper end of the distribution in a talent domain” (Subotnik, Olszewski-Kubilius, & Worrell, 2011, p. 7). At the same time, gifted education programs are often drastically unequal when it comes to the racial, ethnic, and economic representation among participating students. These two facts get at the heart of the excellence versus equity debate. In one sense equity or equality is irrelevant: all that matters is the development of excellence. From this excellence perspective, students who demonstrate the potential to achieve excellence should have their abilities developed regardless of their race, ethnicity, or family income. However, the current state of inequity has been a burden to the advancement of excellence and the broader acceptance of gifted education as a field. Balancing these two priorities, excellence and equity, is an important task for the field and for K-12 schools as they develop identification policies and procedures.
The field of K-12 gifted education, through student identification, deals with a kind of program admission or student selection for needs-based interventions (McBee, Peters, & Waterman, 2014; Peters et al., 2014). In this way, identification is similar to the admissions process that occurs in colleges and universities. In theory, both function as a formalized process through which individuals are identified on the basis of whether or not they need or would benefit from the program for which they are being considered. The goals of individual programs vary, as do the number of spaces available. This is why any discussion of identification or program admission must start with the overall goals of the program. The current Federal definition for giftedness can be seen as one such goal:
Children and youth with outstanding talent perform or show the potential for performing at remarkably high levels of accomplishment when compared with others of their age, experience, or environment [italics added]. These children and youth exhibit high performance capability in intellectual, creative, and/or artistic areas, possess an unusual leadership capacity, or excel in specific academic fields. They require services or activities not ordinarily provided by the schools. Outstanding talents are present in children and youth from all cultural groups, across all economic strata, and in all areas of human endeavor [italics added]. (USDOE, 1993, p. 3)
The purpose of gifted education programming that follows some version of the Federal definition is to support the development of excellence in students from all cultural and income groups. This definition is where the focus and attention on both excellence and equity comes from in K-12 gifted education. The policy explicitly states that when students are being identified they should be compared to others of similar “age, experience, or environment,” presumably to obtain the most accurate information regarding which students need additional educational services that are not already provided as part of the “ordinary” curriculum. The traditional implementation of this particular part of the definition is to compare student scores on standardized ability tests to those obtained by their age-level peers. Age then serves as a proxy for “age, experience, or environment,” factors that above and elsewhere have been referred to as OTL (Lohman, 2009; McDonnell, 1995; Pullin & Haertel, 2008). Unfortunately, relying solely on age-based norms for identification purposes has resulted in persistent underrepresentation over a span of decades, and there is no reason to think this will change in the future.
Much of the discussion so far has proceeded as if differential OTLs exist and that there is nothing that can be done to address these gaps before they occur. But this is not necessarily true. These differentials do not exist to the same extent in other industrialized nations outside the United States nor have they always existed in the United States. Instead of trying to develop alternative assessment procedures that compensate for these differential OTLs, other options would be to eliminate OTL differences or prevent them from forming in the first place. Other nations do not show such wide differences in educational performance due to income, and additionally the United States has had some success in addressing gaps in opportunity through the Early Head Start Program (EHS). Reducing or eliminating economic barriers to educational opportunity could also help mitigate the underrepresentation that exists in gifted education. However, putting such an idea into policy is a value-laden proposition that is also very likely cost prohibitive.
An analysis of scores from the Program for International Student Assessment (PISA) by Carnoy and Rothstein (2013) showed much larger achievement score differences based on economic class in the United States than in similar industrialized nations. Part of the explanation for these differences was the fact that such a large percentage of American students live in poverty. In the United States, 38% of K-12 students live in the bottom two of six income categories, compared to students in “top-scoring” nations: 21% in Canada, 17% in Finland, and 14% in Korea. Comparisons of PISA data show the United States scoring approximately 33 and 50 points lower than these “top-scoring nations” in reading and math. However, when international comparisons were reweighted based on the percentage of participating students in each social class, the United States showed smaller income-related score differences. Instead of attempting to compensate or account for differential OTL, a more effective or ethical approach would be to prevent such gaps from occurring in the first place. Although such a plan might seem like a lofty goal, it is exactly what was intended with EHS. The Federal evaluation of EHS in 2002 found that the program had a significant, although modest, effect on both children’s cognitive development and parenting practices (Administration for Children and Families, 2002). Although these effects were relatively small (effect sizes in the 0.10-0.20 standard deviation range), certain groups, such as African Americans, showed effects in the 0.20 to 0.50 range. Some salient effects of the program included a higher frequency of parents who read to their children every day or at bedtime, higher cognitive development scores for the participating children, higher vocabulary scores, and fewer behavior problems. 
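The reweighting idea mentioned above can be illustrated with a small sketch. One common way to reweight a national average is direct standardization: keep each social-class group's own mean score but combine the groups using a reference distribution of class shares rather than the country's own. Whether this matches the exact procedure in the cited analysis is an assumption, the function name `weighted_mean` is hypothetical, and every number below is invented for illustration; none of it is PISA data.

```python
def weighted_mean(class_means, class_shares):
    """Mean score across social-class groups, weighted by each group's share."""
    total_share = sum(class_shares.values())
    return sum(class_means[c] * class_shares[c] for c in class_means) / total_share


# Hypothetical mean scores by social-class group for one country.
class_means = {"low": 450, "middle": 500, "high": 550}

# Hypothetical shares of students in each group: the country's own
# distribution, and a reference distribution with fewer low-income students.
own_shares = {"low": 0.4, "middle": 0.4, "high": 0.2}
reference_shares = {"low": 0.2, "middle": 0.4, "high": 0.4}

observed = weighted_mean(class_means, own_shares)
reweighted = weighted_mean(class_means, reference_shares)

print(round(observed), round(reweighted))  # 490 510
```

In this toy case the group means never change; only the weights do. The overall average rises simply because the reference distribution has a smaller share of students in the lowest group, which is the sense in which part of a national gap can reflect how many students live in poverty rather than how any one group performs.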
Although the long-term outcomes of EHS have been less conclusive, what this evaluation shows is that it is possible to close some early readiness and OTL gaps through explicit programming and services. Many of the “top-performing” nations presented above have a higher level of universal access to high-quality early-childhood education, which raises the floor on OTL for all students. We point this out simply because it could be seen as preferable for these OTL gaps never to come into being in the first place. That said, we now proceed as if eliminating these gaps is unlikely in the near future and the alternative is to try to control for them when identifying student talent.
Group-Specific Norms
Lohman (2006, 2009), Lohman and Renzulli (2007), and Peters and Gentry (2012) have been proponents of group-specific norms in gifted education as a way to better control for OTL when making identification decisions. The influence of OTL is likely one of the reasons that higher income students are represented at higher rates in gifted and talented programs than are their lower income peers. Stated simply, greater access to educational opportunity has influenced the child’s observed level of ability (“observed” here is used in reference to the classical test theory formula, in which an observed score is the sum of a true score and error). The child is not necessarily of higher ability or more gifted than his or her low-income peers; he or she has simply had more time and opportunity to develop his or her skills in the areas assessed, which in turn leads to disproportional representation rates. To be clear, the fact that some students perform higher than others or that increased schooling results in increased achievement is not a bad thing. However, the wide variation in students’ OTL does complicate the pursuit of equity in gifted education.
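For readers less familiar with it, the classical test theory relation mentioned in the preceding paragraph can be written out as follows. The symbols X, T, and E are conventional notation assumed here for illustration; they do not appear in the sources cited above.

```latex
% Classical test theory: an observed score is a true score plus error.
%   X = observed score, T = true score, E = measurement error
X = T + E
```

Under this relation, two students with different observed scores need not differ in true score to the same degree, which is why the paragraph above distinguishes a child’s observed level of ability from ability itself.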
The fact is that no test can measure ability or aptitude alone, without being influenced by students’ prior OTL—the interaction between environment and ability is too strong. Ability and environment interact over the life span to complicate the process of gifted identification or any measure of relative talent. Over time, absent appropriate nurturance, any innate ability will begin to degrade. In the United States, educational opportunity is not equally distributed, meaning any initially equal distribution of ability across demographic groups will not linger for long in the absence of appropriate development. Because of local school funding formulas, neighborhoods and cities with wealthier populations are also more likely to offer a wider range of educational opportunities, including better quality schools and teachers (National Center for Educational Evaluation and Regional Assistance, 2011). These differences in educational opportunity manifest themselves in different mean scores on standardized assessments. These observed score differences occur across racial, ethnic, linguistic, and economic groups, as presented in Tables 1 to 4. Tests themselves are not racist or biased just because they yield differences in observed scores across these groups. Instead, any observed differences in test scores are due in part to differences in OTL, which itself is also strongly related to family income. When children do not have their educational needs met, before or during the school years, any innate ability, potential, or talent is likely to fade. By the time schools attempt to identify high potential in their students, which may be as late as the third grade in many settings, any hope of equity has fallen by the wayside.
We hope by now it is clear that inferences regarding academic talent are best made by comparing students to others of similar OTL. Part of the lingering challenge is that OTL is a complex construct that would be difficult for schools to measure or to use as an instructional grouping method. Instead, a related variable such as family income could be used as one, albeit imperfect, metric of opportunity. To be sure, family income or free or reduced-price lunch (FRPL) eligibility are still only approximations of OTL, but FRPL data are readily at hand in most schools, and FRPL status is strongly related to OTL. Ladd (2012) in her presidential address to the Association for Public Policy noted that roughly 40% to 46% of the variance in student reading and math scores can be predicted by family income. Of course, many students from low-income families do receive enriched home and formal education, meaning lower opportunity does not apply to all low-income students. However, if group-specific norms are used—for example, all low-income, second-grade students are only compared to other low-income, second-grade students—then the influence of OTL is decreased because individuals from similar income groups are more likely to have had similar OTL than individuals from different income groups. Such an approach is far from perfect and would constitute only a small correction to the much larger problem of educational inequality, but it would still offer an improvement over the status quo. Considering income or FRPL status will eliminate some construct-irrelevant variance that would otherwise be included in a student’s test score and will therefore yield a less biased observed score (Lohman, 2009). This practice, in turn, would help further the goal of equity in gifted education. The lingering question is this: If this method were used, would it still identify those students who are in need of additional challenge?
Would such a modified identification system still assure the necessary and appropriate challenge remained present in gifted education services, or would such a system simply admit some students who do not really need the program just for the sake of increasing diversity? These questions represent the essence of the tension that exists between excellence and equity in gifted education. Both goals must be satisfied: greater representation of traditionally underrepresented students and the internal consistency of gifted education as a needs-based intervention designed to foster excellence in academic outcomes.
In previous research, Peters and Gentry (2012) demonstrated that the application of group-specific norms does in fact increase the representation of low-income students in a gifted program regardless of how giftedness was operationalized (whether those in need of the intervention included the top 25%, 10%, or 5%). This was an application of Lohman’s (2009) recommendation:
For those whose experiences differ markedly from the norm, aptitudes need to be judged relative to a different cohort. Always, the preferred comparison group would be those who have had roughly similar opportunities to acquire the abilities sampled by the test. (p. 975)
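The difference between a single general norm and a group-specific norm can be illustrated with a minimal sketch. It is not the procedure used by Peters and Gentry (2012). The function name, the field names, the scores, and the 20% cutoff are all hypothetical and chosen only to show the comparison logic, with FRPL eligibility standing in for income group.

```python
from collections import defaultdict

# Hypothetical data for illustration only: ten students, five per income group.
students = [
    {"id": "A1", "frpl": False, "score": 130},
    {"id": "A2", "frpl": False, "score": 125},
    {"id": "A3", "frpl": False, "score": 118},
    {"id": "A4", "frpl": False, "score": 110},
    {"id": "A5", "frpl": False, "score": 105},
    {"id": "B1", "frpl": True, "score": 120},
    {"id": "B2", "frpl": True, "score": 112},
    {"id": "B3", "frpl": True, "score": 108},
    {"id": "B4", "frpl": True, "score": 100},
    {"id": "B5", "frpl": True, "score": 95},
]


def identify(students, top_fraction, group_key=None):
    """Return ids of students in the top `top_fraction` of their comparison group.

    With group_key=None every student is compared to all others (general norm).
    With a group_key, each student is compared only to peers who share that
    value (group-specific norm). Ties are not handled in this sketch.
    """
    groups = defaultdict(list)
    for s in students:
        key = s[group_key] if group_key else "all"
        groups[key].append(s)

    selected = set()
    for members in groups.values():
        ranked = sorted(members, key=lambda s: s["score"], reverse=True)
        n_select = max(1, round(len(ranked) * top_fraction))
        selected.update(s["id"] for s in ranked[:n_select])
    return selected


general = identify(students, 0.2)                     # one norm for everyone
group_specific = identify(students, 0.2, "frpl")      # separate norm per income group
print(sorted(general))         # ['A1', 'A2']
print(sorted(group_specific))  # ['A1', 'B1']
```

In this toy data set the same test and the same cutoff are used in both runs; only the comparison group changes. Under the general norm no FRPL-eligible student is selected, whereas under the group-specific norm the highest-scoring student in each income group is selected.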
Peters and Gentry’s (2012) findings suggest that applying group-specific norms to any underrepresented population that has had markedly different educational opportunities (e.g., the low-income students in their study) will serve to increase the representation rates of that population, thereby advancing the goal of equity. However, group-specific norms might appear to be in conflict with the entire premise that advanced educational services exist to develop excellence. If a school sets program admission criteria based on who has a need for and will benefit from the program and some students do not meet those criteria, then they should not be admitted regardless of how that lack of need might be explained (e.g., by lower access to OTL). This might appear to be the end of the discussion—the development of excellence is all that matters. However, in the past, several institutions including some outside the K-12 setting have used variations on group-specific norms to combat underrepresentation. What we present next are case studies in these efforts, with specific attention to how the institutions maintained the internal consistency of their programs while also increasing program diversity.
Case Studies in the Proactive Identification of Underrepresented Learners
So far, we have discussed three things: (a) students from certain populations are underrepresented in identified gifted and talented populations; (b) one contributing factor to underrepresentation is the differential OTL that exists across those groups; and (c) using income-group specific norms would allow for closer OTL comparisons, thereby increasing the overall equity of the identified gifted population. The last point on income was made with the understanding that previous research showed that increased diversity was achieved through income group–specific norms because of the strong correlation between income and test scores. In the following text we present some examples of institutions at the higher education and K-12 levels that have attempted to increase the diversity of their enrolled populations via proactive, group-specific methods and the lessons those efforts can provide for the field of K-12 gifted education.
For decades, American K-12 and higher education institutions have grappled with how to further the goal of equity within their student populations. Arguably the most contentious method employed has been race-conscious preferences or affirmative action programs. In the 1990s, the University of Michigan found that racial and ethnic minority students were underrepresented on its campus. To increase the number of racial and ethnic minority students, the University implemented a policy in which race was included as one factor in program admission. The result of this policy was that by 1996 the University of Michigan Law School was able to boast an entering class made up of 35% minority students. Although the substantial increase in minority representation was seen as a success by the university, nonminority students filed suit arguing that they had been denied entrance to both the undergraduate (Gratz v. Bollinger, 2003) and law programs (Grutter v. Bollinger, 2003) on the basis of race. Their argument was one of internal-consistency conflict: The university purported to provide an educational program to those who demonstrated potential for excellence but then rejected some students who appeared to meet the established criteria, while simultaneously admitting some students who did not appear to meet the criteria. Petitioners challenged that the university did not have a compelling reason for its focus on equity.
In a related case at the K-12 level (Parents Involved in Community Schools v. Seattle School District No. 1, 2007), the Seattle, Washington, School District implemented K-12 student assignment plans that relied on race as a factor. For Seattle, the purpose of the plan was to bring their high schools into greater racial balance—to address the problem of equity within particular buildings. As with Grutter and Gratz, individuals filed suit against the district claiming that such decision making was inconsistent with the purposes of the educational systems—that considering race was inappropriate and arbitrary when it came to deciding whom to place in a given school or program. Petitioners argued that the preference given to racial and ethnic minority applicants was inappropriate to the point of being an equal protection violation.
In perhaps the most relevant case for this article (McFadden v. Board of Educ. for Illinois School Dist., 2013), a K-12 district in Elgin, Illinois, established a special gifted education program to further the goal of equity—one targeted to ELLs (primarily Hispanic)—that was taught in Spanish and English by bilingual educators. This bilingual gifted program coincided with a similar program for proficient English speakers taught completely in English. In both cases the curriculum was the same. The district’s rationale for having parallel programs that differed only in language was that the ELLs would not benefit from an intervention delivered in English because of their limited language skills (a fact with which the judge in the case agreed—McFadden, opinion of Gettleman, p. 11). Because of the perceived language barrier, and based on the district’s explicit focus on equity while also developing excellence, they implemented a proactive, specialized program and corresponding identification policy to develop the talents of their ELLs. It is important to note that because the norm group was supposed to be a language proficiency classification (not a protected class), nothing up to this point was seen as inappropriate in the resulting legal case.
An even more recent case involving The University of Texas (UT) at Austin (Fisher v. University of Texas at Austin, 2013), still ongoing as of this writing, highlights some of the challenges as well as successes related to using income as a variable for identification. The University saw a troublingly low rate of enrollment among students from minority and low-income families. Because of this observation, the University set out to increase the diversity of its student population. The State of Texas had allowed that any student who graduated in the top 10% of his or her high school class would automatically receive admission to UT. Thus, the State implemented a group-specific norm using the individual schools as the point of comparison—a solid proxy for OTL because students most often attend schools within geographic areas that are economically similar. In fact, the law was crafted specifically to identify the students who were the most talented relative to their school peers, while at the same time also increasing student diversity.
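The school-level comparison described above can be sketched in a few lines. This is an illustration only, not the state’s actual admission procedure: the school names, student ids, and the `class_rank_score` values are hypothetical, and the sketch simply contrasts ranking students within each school against ranking all students together.

```python
# Hypothetical records: (student_id, school, class_rank_score).
# Two schools of ten students each; one school has uniformly higher scores.
records = [(f"N{i}", "North", 90 + i) for i in range(10)] + [
    (f"S{i}", "South", 70 + i) for i in range(10)
]


def top_decile_by_school(records):
    """Return ids of students in the top 10% of their own school."""
    by_school = {}
    for student_id, school, score in records:
        by_school.setdefault(school, []).append((score, student_id))

    admitted = set()
    for members in by_school.values():
        members.sort(reverse=True)               # highest score first
        n_admit = max(1, len(members) // 10)     # top 10%, at least one student
        admitted.update(student_id for _, student_id in members[:n_admit])
    return admitted


# School-specific norm: the top student from each school is selected.
school_norm = top_decile_by_school(records)

# Single pooled norm with the same number of places: both come from one school.
pooled = {r[0] for r in sorted(records, key=lambda r: r[2], reverse=True)[:2]}

print(sorted(school_norm))  # ['N9', 'S9']
print(sorted(pooled))       # ['N8', 'N9']
```

The point of the contrast is that the school serves as the norm group, so students are compared with peers who are likely to have had similar opportunities, rather than with every applicant at once.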
Where every one of these efforts (K-12: Parents Involved; undergraduate: Gratz and Fisher; graduate law school: Grutter; and K-12 gifted education: McFadden) ran into legal challenges was when they used race or ethnicity as a variable in their application of group-specific norms. Because the use of race or ethnicity was not narrowly tailored and was not implemented after all other alternatives had been tried, the practices were ruled unconstitutional. In the case of McFadden, nothing the district did was seen as crossing a legal threshold up until they used student ethnicity as the indicator of whether or not a student was an ELL. The use of ethnicity as a proxy for OTL for the purposes of group-specific comparisons was inappropriate and meant that even those Hispanic students who were English language–proficient were rarely placed in the standard gifted education program. This fact was unacceptable to the judge in the case. The message from the McFadden case is that proactive gifted education programs designed to increase representation rates among some populations are acceptable as long as they avoid using protected class variables such as race or ethnicity, and instead stick to variables such as family income or language proficiency. This is consistent with research by McBee, Shaunessy, and Matthews (2012) who found greater rates of gifted program participation by Black students in districts that had implemented alternative identification criteria targeting only students identified as ELLs or from low-socioeconomic status households in Florida school districts.
In the one case in which using race was deemed acceptable (Grutter), it was because its use was consistent with a major goal of the university program—that of equity in their student population—and was narrowly tailored. Considering student diversity while also assuring the internal consistency of a program can be applied to gifted education as well, with the conceptual definition or goal of the program being the priority, and income-proxy measures of OTL being the most appropriate grouping variable. Thus, the use of admissions preferences designed to increase diversity can be rationalized if appropriate goals or program priorities are in place.
The UT “Top 10%” rule did lead to large increases in racial and ethnic diversity at the state’s flagship university. Concurrently, in the roughly 10 years since the Top 10% rule was implemented, freshmen retention has increased from 87.7% to 91.9%, and 4-year graduation rates have also increased from 30.2% to 51%. Although certainly not causal evidence, these data do suggest that the inclusion of a more diverse (racially, economically, and regionally) population at UT Austin has not resulted in lower academic performance of the student body as a whole. Top 10% students also have higher college grade point averages than do their non–Top 10% peers (UT Austin, 2008). Clearly, the proactive inclusion of a more diverse set of students did not harm the programs or services offered, nor did the newly included students fail in the intervention they received (undergraduate education). The University was still able to develop and foster excellence, even while it increased diversity among its student population.
In one sense, UT Austin did exactly what it was supposed to do based on case law: It used school norms instead of race or ethnicity norms to increase the diversity of its population. In using group-specific norms, the same assessment tool or standardized test is used for all students—diverse or otherwise. The only difference is how an individual’s score is evaluated—in a general normative fashion or in a group-specific comparison. In the past, schools have compared all individuals of a given grade level to each other in an attempt to control for some degree of prior life experience. However, as Yoon and Gentry (2009) noted, a serious lack of diversity persists. The legal problem that arose in the UT Austin case (Fisher v. University of Texas at Austin, 2013) stemmed from the fact that the University wanted to go even farther to increase diversity. Consequently, the University added race as an additional factor in the admission process. While not striking down the University’s consideration of race in admissions, the Supreme Court required that the lower courts give the University’s policy exacting strict scrutiny so that “[t]he reviewing court must ultimately be satisfied that no workable race-neutral alternatives would produce the educational benefits of diversity” (Fisher, opinion of Kennedy, p. 2420). The Supreme Court reviewed this case for a second time in late 2015 with a decision pending as of this writing. The geographic norm used in the Top 10% law has never been challenged. It was only when the University added specific preference for race or ethnicity that it entered a legal grey area.
Discussion
What these cases make clear is that race and ethnicity as student grouping variables are considered in a completely different fashion with regard to legal permissibility. Even in a case in which the use of race was found to be permissible (Grutter), it was only with a number of caveats and conditions. In most cases the use of race or ethnicity was found to be inappropriate (Gratz, Parents Involved, McFadden) and was further restricted. What these findings suggest is a need for a method that can increase the equity of a program without relying on contentious factors such as student race and that can be implemented while maintaining the needs-based nature of the program in line with the overall goal of the development of excellence. Based on the discussion of assessment and normative scores, the relationship that income has with OTL, and the challenges inherent in using any kind of racial or ethnic factors in educational policies, family income appears well suited to be used in group-specific norms when making identification or placement decisions if there is an explicit priority in place for greater equity in the local gifted education program. Absent the authority granted from an institutional goal or definition such as presented in the Federal definition (USDOE, 1993), such proactive attention and preference cannot be supported since it would degrade the internal consistency of the gifted program without satisfying an alternative goal. The Texas Top 10% rule also made it clear that such a method can be implemented without sacrificing the overall purpose of the program, that is, the development of excellence and advanced achievement.
What to do With Those Identified?
Any proactive effort toward greater equity in gifted education programs will involve some sacrifice. Because of the level of economic and educational inequality that exists in the United States, it seems likely that no gifted education program can have perfect equity or even greater equity without sacrificing some of the focus on excellence. A critical consideration in applying group-specific norms is that the resulting students identified, from both high- and low-income groups, will have more varied educational needs than would a population identified under a single norm. A good rule of thumb is that with any differentiated identification system—one in which the identification procedures have been in any way modified to further the goal of equity—comes a need for differentiated services. The development of excellence will look different for the students identified under a group-specific norm as opposed to a single, general norm. Just as in any identified population, students will have a range of needs, but this range will become broader as the identification criteria are broadened. Some students may just have met the requisite entrance score, whereas others will score at the ceiling of the test. With group-specific norms, these needs become even more diverse and necessitate multiple levels of services or support to assure all students are successful. In implementing such a system, students cannot simply be identified and then placed in a generic “gifted” program that does not match their current level of need. Varied identification systems need to be matched with varied services and supports. This is the cost of greater equity.
What these levels of services might look like depends on the actual learning needs of the identified populations, and these will vary by district and school. However, there are two general classifications of such supports. The first is that differing levels of gifted education interventions can be provided, and then students, once identified through both general and group-specific norms, are placed in the one that is most individually appropriate. The challenge to such tiered levels of services is that while it might not be illegal (similar to McFadden but without the reliance on student ethnicity as a variable), it could create the perception of intentional segregation if (as is likely) the majority of the low-income students are in the more supportive but also possibly less rigorous gifted intervention, while higher income peers are in the more rigorous program. This might maintain the excellence focus in the more rigorous option, but it would also leave that program without much in the way of greater equity. Alternatively, the less rigorous option would have greater equity but less of a focus on excellence. Instead, we suggest that a better option is to have a single domain-specific intervention for which students with certain needs are identified but that those identified through income group-specific norms should be provided additional academic and social support to increase their likelihood of success (e.g., Project EXCITE: Olszewski-Kubilius, Lee, Ngoi, & Ngoi, 2004). Such support parallels what happens in higher education, where assistance in transitioning to college, tutoring, and mentoring are provided to those students who were admitted through proactive identification systems (e.g., group-specific norms) and to help first-generation students succeed in the standard undergraduate courses (Lohfink & Paulsen, 2005; Pascarella, Pierson, Wolniak, & Terenzini, 2004). 
This example of a single program coupled with additional supports offers the best possible balance of excellence and equity. Greater diversity is achieved through group-specific norms, those students receive additional support to develop excellence, and the purpose and domain-specific content of the intervention stays the same.
What can be learned from these past court cases and the efforts of institutions of higher education seems relatively simple. First, what is meant by “group” when referring to group-specific norms is very important. Family income is much more closely tied to OTL (as is ELL status or language proficiency) than are factors such as race or ethnicity, and these descriptors also avoid the legal challenges of using racial or ethnic variables. It is also clear that considerations of seemingly relevant factors such as language proficiency in the identification of students for gifted programming must be implemented with caution, lest a school district inadvertently confound ethnicity with language proficiency, which would then need to be evaluated under the requirement of strict scrutiny. Probably the more relevant implication is that income groups have not been considered suspect classes from a legal perspective. Moreover, the Federal Government has specifically stated that using race-neutral variables such as family income is a preferred way of addressing diversity and underrepresentation in K-12 school programs (U.S. Department of Justice & USDOE, 2011). These complex considerations point to the use of income-group specific norms as a more appropriate and more legally and politically palatable practice than the use of racial- or ethnic group–specific norms for gifted and talented student identification practices designed to increase the diversity of the resulting populations. There must be a balance between excellence and equity, and developing group-specific norms on measures relevant to the content of the program to be provided seems well suited to achieve such a balance.
Our article does not attempt to address the communication or perception factors associated with group-specific norms, one of which is that parents, teachers, or administrators might be upset by the practice of identifying students as “gifted” even though they met different criteria—as could be the case with group-specific norms. However, such a challenge exists primarily because of a misconception of what “gifted” means in the context of K-12 public schools and its value-laden nature. “Gifted” is a need for a service not already provided in a school program (USDOE, 1993). Giftedness is not a permanent state of being, a kind of reward for good behavior, or the smile of good fortune (Lohman, 2006) of being born into a particular family. One of the challenges with the term is that it carries a value judgment that is bestowed by the educational system. If gifted education is marketed and described as simply being in place to provide appropriate interventions for students with identified needs that cannot be met in the general education program, it will likely be more politically palatable in general. It would also make gifted education identification a more straightforward process of need finding or need identification as opposed to student trait identification.
In the end, we believe group-specific norms, especially when applied in tandem with local norms and universal screening, have the best potential for increasing the equity of gifted education programs, while also maintaining the nature of gifted education as services that develop domain-specific talents. This does not mean it would be easy because there are both communication and programmatic considerations that would need to be addressed. However, if the field of advanced education believes that greater equity is an important goal, then it needs to take proactive efforts such as those described in this article.
Footnotes
Acknowledgements
Special thanks to the following people for their feedback on prior versions of this article: Linda Greene, the Evjue-Bascom Professor of Law at the University of Wisconsin–Madison; David Lohman, Professor Emeritus of Psychological and Quantitative Foundations at the University of Iowa; Marcia Gentry, Professor of Educational Studies at Purdue University; Jonathan Plucker, the Julian C. Stanley Professor of Talent Development at Johns Hopkins University; Michael S. Matthews, Associate Professor of Special Education and Child Development at the University of North Carolina–Charlotte; and several anonymous reviewers.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
