Abstract
In the Cultivating Diverse Talent in STEM project, funded by the National Science Foundation in the United States, new assessments were developed, field tested, used to identify students with exceptional talent in science, technology, engineering, and mathematics (STEM), and compared with existing methods (grade point average [GPA], letters of recommendation, self-statements). Students identified by both methods participated in an internship program in laboratories of scientists on the campus of an R1 university in the Southwest. Existing methods limited the diversity of students identified. Significant differences were found between students identified by the new methods (M2) and existing methods (M1) in GPA, ethnicity, and parent level of education. Ethnicity differences may be due to the ethnic makeup of the partner schools, but differences in GPA and parent level of education cannot be attributed to the location of schools. Although GPAs of M1 students were significantly higher (3.71) than those of M2 students (3.07) and M1 students came from higher income groups and schools in higher income areas, the M2 students scored higher on all the performance assessments of creative problem-solving and at similar levels on concept maps and mathematical problem-solving. Studies of the usefulness and psychometric properties of the new assessments are needed with different groups and in different contexts.
In the past two decades, scientists, policy makers, and the general public have been concerned with the state of science, technology, engineering, and mathematics (STEM) education in the United States. They have recommended reforms to enhance the ability to produce a workforce that is literate in scientific, numeric, and technological areas, and they have called for a concerted effort to stop importing STEM talent from other countries and to develop a generation of STEM innovators (National Science Board [NSB], 2010). STEM innovators are defined as “individuals who have developed the expertise to become leading STEM professionals and perhaps the creators of significant breakthroughs or advances in scientific and technological understanding” (p. vii). The NSB members recommended policy changes in three main areas: (a) “provide opportunities for excellence,” (b) “cast a wide net to identify all types of talents and to nurture potential in all demographics of students,” and (c) “foster a supportive ecosystem” (NSB, 2010, pp. 2, 3). Thus, assessments designed to identify future STEM innovators must include measures of the varied abilities needed to be innovators, not just conventional, easy-to-administer tests, and they must be appropriate for use with students from all cultures, language groups, and socioeconomic levels. Identification of potential talent must be accompanied by opportunities and programs that are challenging and matched to the levels of ability and interests of the students.
The Importance of Developing New Assessments
When designing assessments to identify future STEM innovators, test developers must create methods to find not only those who can learn and develop expertise in STEM areas but also those who have the potential to make significant breakthroughs or advances in understanding. In an extensive review of research on talent identification and development, Subotnik and colleagues (2011) concluded that the most important variables associated with outstanding achievements in both productivity and performance are “general and domain-specific ability, creativity, motivation and mindset, task commitment, passion, interest, opportunity, and chance” (p. 4). Although many scholars would agree this is an important list of variables, less agreement exists about how to measure them. In the sections that follow, each of these variables and the methods for measuring them are reviewed.
General and Domain-Specific Ability
General abilities usually are measured by intelligence tests and overall grade point average (GPA). These measures often are criticized because students of color (Miller, 2004; Plucker et al., 2010) and students from low socioeconomic status (SES) levels (NSB, 2010; Rojas, 2015; Smith et al., 2014) have lower average scores than students from middle and high SES levels. Domain-specific ability most often is measured by achievement tests in specific domains and by grades in specific subjects (e.g., math, English, life science, physics). Measurement of domain-specific abilities as a method of selection also has been criticized because of the achievement gap between students of color and White students and the achievement gap between students from low-income groups and those from high- and middle-income groups. Miller (2004), for instance, found that African Americans, Hispanics, and American Indians were “severely underrepresented” among the top 1%, 5%, and 10% on almost every measure of achievement: grades, GPA, class rank, and standardized test scores. Plucker and colleagues (2010) found similar results using National Assessment of Educational Progress (NAEP) and state achievement test results.
In recent years, an important paradigm shift has been occurring that has important implications for identification and nurturing of talent (Dai & Chen, 2013; Subotnik et al., 2009). For many years, intelligence was viewed as being fixed and stable; but much research has shown that not only does IQ change in individuals (Dai & Chen, 2013; Sternberg, 2005), but it also changes in large populations, as evidenced by the “Flynn Effect”: massive IQ gains worldwide during the 20th century (Flynn, 1987, 2007; Neisser, 1998). These gains are not consistent across domains of ability, and in some later studies in different cultures, they have reached a plateau or reversed (Teasdale & Owen, 2008). However, these studies taken together demonstrate that intelligence is not fixed or stable. An alternative view of intelligence, an emerging paradigm, is that of intelligence as developing expertise (Ericsson et al., 2005; Sternberg, 1999; Subotnik et al., 2009). From this perspective, any test or assessment simply reflects the level of development of a particular form of expertise.
Another problem with tests of achievement and similar measures of domain-specific abilities is the ceiling effect for students at the highest levels, which has led to the use of out-of-level testing such as that employed in the Study of Mathematically Precocious Youth (SMPY) to select students for special opportunities through their talent search programs (Lubinski & Benbow, 2006). A different type of solution (Sternberg, 1999), consistent with the definition of STEM innovators by the NSB (2010) and with the emerging paradigm, is to consider intelligence (general ability) and domain-specific ability as developing expertise within domains. Intelligence and achievement tests measure a limited aspect of developing expertise, defined as “. . . the ongoing process of the acquisition and consolidation of a set of skills needed for a high level of mastery in one or more domains of life performance” (Sternberg, 1999, p. 359).
In studies comparing novices and experts across different domains, researchers (Bransford et al., 2000; Chi et al., 1981; Dogusoy-Taylan & Cagiltay, 2014; Glaser & Chi, 1988) have found that expertise in a domain increased sensitivity to patterns of meaningful information, and experts’ knowledge was organized around core concepts, which helped them establish meaningful relationships between concepts. From this perspective, rather than simply testing the amount of knowledge possessed by students, one must ask them to show the relationships they perceive among concepts. A method often used in science education is concept mapping, developed by Novak (Novak & Gowin, 1984) and further extended by Ruiz-Primo and colleague (Ruiz-Primo, 2001, 2007; Ruiz-Primo & Shavelson, 1996).
When concept mapping is used, students are given a list of concepts related to a topic and are asked to “map” them, showing which ones are connected, which ones are at higher levels than others, and which ones are connected to others in different sections of a map. Unlike “mind-mapping,” students must write words on the lines connecting different concepts and they must make a hierarchical map. The words on the lines explain the relationship between two concepts, and the hierarchies identified by students show how they have organized the concepts. This approach to assessment of domain-specific knowledge has several advantages: (a) assessing the students’ levels of development of expertise in a domain, (b) having no ceiling effect, (c) assessing higher levels of thinking rather than memory and comprehension (Austin & Shore, 1993; Erdimez et al., 2017; İngeç, 2009; Vanides et al., 2005), and (d) offering an alternative to the usual multiple-choice standardized tests or grades for assessing domain-specific ability and understanding (Maker & Zimmerman, 2020; Tan et al., 2017; Zimmerman et al., 2011).
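The structure of a concept map described above (concepts, labeled connecting lines, and a hierarchy) can be sketched as a small data structure. The following is a minimal illustration only: the example map, the function names, and the two summary counts are hypothetical and are not the scoring rubric used in the project.

```python
from collections import defaultdict

# Hypothetical student map. Each link is
# (higher-level concept, linking words written on the line, lower-level concept).
STUDENT_MAP = [
    ("energy", "can be", "kinetic energy"),
    ("energy", "can be", "potential energy"),
    ("kinetic energy", "depends on", "mass"),
    ("kinetic energy", "depends on", "velocity"),
    ("potential energy", "", "height"),  # line drawn, but no linking words
]


def count_propositions(links):
    """Count links that carry linking words (concept, relationship, concept)."""
    return sum(1 for _, label, _ in links if label.strip())


def hierarchy_depth(links, root):
    """Return the number of levels below the root concept."""
    children = defaultdict(list)
    for parent, _, child in links:
        children[parent].append(child)

    def depth(node, seen):
        if node in seen:  # guard against circular links
            return 0
        below = [depth(child, seen | {node}) for child in children[node]]
        return 1 + max(below) if below else 0

    return depth(root, frozenset())


if __name__ == "__main__":
    print("Labeled propositions:", count_propositions(STUDENT_MAP))
    print("Levels below root:", hierarchy_depth(STUDENT_MAP, "energy"))
```

In this sketch, the unlabeled line is not counted as a proposition because, unlike in mind-mapping, the words on a line are what explain the relationship between two concepts.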
Creativity
Assessment of creativity, another variable associated with outstanding achievements (Subotnik et al., 2011), is even more controversial than assessment of domain-general and domain-specific abilities, with some researchers favoring the use of self-report inventories (cf. An & Runco, 2016), some focusing on divergent thinking tests (cf. Torrance, 2008), some using integrative approaches (cf. Jellen & Urban, 1989; Urban & Jellen, 1996), and some favoring inclusion of both divergent–exploratory and convergent–integrative creative abilities (cf. Barbot et al., 2016; Lubart et al., 2013). Criticisms of creativity assessment are related mainly to the scoring procedures: (a) a strong relationship has been found between uniqueness and fluency (Silvia, 2015; Torrance, 2008), (b) uniqueness scores are strongly sample dependent (Silvia, 2015), and (c) divergent thinking tasks resemble traditional verbal fluency tasks too closely (Nusbaum & Silvia, 2011). A different approach, the consensual assessment technique (CAT), was developed by Amabile (1982). Judges provide subjective ratings of the creativity of each response, and if multiple judges are employed, the method has been shown to be reliable and valid for assessing uniqueness without being confounded with fluency scores (Silvia et al., 2008, 2009).
In addition to these different approaches to creativity assessment and the criticisms of the scoring procedures, creativity assessment research includes two long-standing debates: the relationship between creativity and intelligence (cf. Getzels & Jackson, 1958; Runco, 2007; Runco & Albert, 1986; Torrance, 1977; Wallach, 1971) and the domain specificity versus domain generality of creativity (cf. An & Runco, 2016; Baer, 1998, 2015; Barbot et al., 2016; Hong & Milgram, 2010; Plucker, 1999). An extensive discussion of these debates is beyond the scope of this article, but the main aspects of the debates are reviewed.
Creativity and Intelligence
Out of the debate about whether creativity was the same as or different from intelligence and the relationships between the two constructs (cf. Kaufman et al., 2008; Runco, 2007; Sawyer, 2006; Weisberg, 2006) came the belief that a threshold exists: Intelligence and creativity are related only up to a certain IQ level, usually believed to be 120 (Getzels & Jackson, 1962; Guilford & Christensen, 1973; Simonton, 1994). However, studies of the threshold theory have not yielded consistent results, nor are they viewed as being conclusive (cf. Runco & Albert, 1986), which may result from the use of different measures of both constructs and from differences in gender, age, and SES among the samples. These inconsistencies prompted Kim (2005) to conduct a meta-analysis of the results of several studies. She found that, in 21 studies including 45,880 participants, the average correlation across different measures of both creativity and intelligence was small (r = .174), but differed significantly according to age and different creativity tests. In elementary students, the correlation was .086; for the middle school group, .210; for the high school group, .261; and for adults, .205.
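One way to read the size of these correlations is to square them, which gives the proportion of variance shared by the two measures. The arithmetic below is an illustrative addition based on the correlations reported above, not a calculation reported by Kim (2005); the subscript labels are used only for this illustration.

```latex
\begin{align*}
r^2_{\text{overall}} &= (0.174)^2 \approx 0.030 \\
r^2_{\text{elementary}} &= (0.086)^2 \approx 0.007 \\
r^2_{\text{middle school}} &= (0.210)^2 \approx 0.044 \\
r^2_{\text{high school}} &= (0.261)^2 \approx 0.068 \\
r^2_{\text{adults}} &= (0.205)^2 \approx 0.042
\end{align*}
```

Under this reading, even the largest of the age-group correlations corresponds to about 7% shared variance between measures of creativity and intelligence.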
Going beyond the debate about the distinction between creativity and intelligence and the threshold theory, and building on the research of Kim (2005), Karwowski and colleagues (2016) reported results of eight studies involving 12,225 participants, from which they concluded that intelligence is a necessary but not sufficient condition for creative behaviors and accomplishments: Different aspects of creativity were related to intelligence differently, and very few participants in the studies with high creativity and achievement had low intelligence.
In Silvia’s (2015) review of the history of thought on how intelligence and creativity are related, he suggests that statistical advances (e.g., Hong & Milgram, 2010; Silvia, 2008); advances in assessment of creativity, especially uniqueness scoring; emphasis on executive processes (Gilhooly et al., 2007; Nusbaum & Silvia, 2011); and the emerging neuroscience of creativity (Beaty, 2015; Beaty et al., 2014) have provided support for the view that intelligence and creativity are strongly linked. Silvia and others (Nusbaum & Silvia, 2011; Weisberg, 2015) do not suggest that these constructs are the same, but that they are more closely related than many theorists and researchers previously have thought. By studying the deeper connections between intelligence and creativity and viewing intelligence and creativity as “. . . families of processes and functions that the mind can do . . .” (Silvia, 2015, p. 604), researchers and practitioners can see a larger conceptual picture: “The mind can do a lot of things, and the processes that it uses for solving closed-ended problems are similar to those used for making, judging, and playing with ideas” (Silvia, 2015, p. 604).
Consistent with this perspective, Sak and Maker (2006), in a study of mathematical problem-solving using the Discovering Intellectual Strengths and Capabilities while Observing Varied Ethnic Responses (DISCOVER) assessment, concluded that in elementary-age students, two standard deviations above the mean of domain-specific knowledge was a threshold for a one standard deviation above the mean of creativity in math. The DISCOVER math assessment (Bahar & Maker, 2011, 2020; Maker, 1993, 2005), created in 1992 from a perspective similar to that of Silvia (2015), includes solving known problems with known operations and correct answers, accuracy in the solving of other types of problems (domain-specific knowledge and skills), and solving problems with multiple correct answers by applying mathematical concepts and operations in many and novel ways (domain-specific and domain-general creative processes; e.g., write as many problems as you can with the answer of 10). The DISCOVER assessment framework is described in more depth in a later section of this article and the math assessment in more depth in another publication (Bahar & Maker, 2020).
Domain Specificity Versus Domain Generality of Creativity
Although the literature reflects many years of debate, the recent consensus (cf. Amabile, 1996, 2013; An & Runco, 2016; Barbot et al., 2016; Barbot & Tinio, 2015; Hong & Milgram, 2010; Plucker & Zabelina, 2009) seems to be that (a) creativity has both domain-general and domain-specific components, (b) these two components of creativity are related, and (c) task characteristics and motivation play a part in the relationship between the two components. Amabile (1996) initially proposed that creativity consists of domain-specific knowledge and skills, creativity-relevant processes, and task motivation, but later acknowledged the importance of both positive and negative effects of environment on the development and expression of creativity (Amabile, 2013). Teasing out these relationships has become the focus of many researchers rather than debating whether creativity is specific to domains or general across domains (Barbot et al., 2016; Lubart et al., 2013). For example, Lubart and colleagues pointed out the importance of task demands, describing each individual as having a unique profile of resources and the degree to which one is creative as depending on the fit between her or his profile of cognitive (i.e., divergent thinking, analytic thinking, mental flexibility, associative thinking, selective combination) and conative (i.e., tolerance of ambiguity, risk taking, openness, intuitive thinking, motivation to create) resources and the demands of the task. Creative potential is the confluence of these distinct, but interrelated, resources and involves more than a simple sum of each person’s level, mainly due to their interactive effects, with the environment serving as a catalyst for person-centered resources to be activated. Evaluation of creative potential can be production based (focused on the product) or resource based (focused on the cognitive and conative resources of the individual; Lubart et al., 2013).
This definition is consistent with the varied views of other researchers and comprehensive, in that it pulls together significant research in the field.
Motivation and Mindset
Renzulli (1978) was perhaps the first scholar in the field of education for the gifted to highlight the importance of a combination of traits as necessary for exceptional talent to be evident. His three-ring conception included above-average ability, creativity, and task commitment, three of the variables identified by Subotnik and colleagues (2011). He defined task commitment as “. . . a refined or focused form of motivation . . .” (Renzulli, 1986, p. 69). This form of motivation is important in achievement both in elementary school (Curby et al., 2008) and in math and science (Benbow & Arjmand, 1990; Collins, 2018). Other support comes from the research on “deliberate practice” (Daley, 1999; Ericsson, 1996, 2015; Ericsson et al., 1993, 2005). Without a high level of task commitment, people will not invest the time (10,000 hrs as proposed by Ericsson) and energy needed to become experts or to master the skills needed for high-level productivity or performance. They must believe they control the development of their talents and abilities through effort on their part (Brown & Weiner, 1984; Ericsson et al., 2007; Maker, 1978; Siegle, 2013; Siegle et al., 2017; Siegle & McCoach, 2005; Weiner, 1986; Whitmore & Maker, 1985) because talents and abilities are dynamic and malleable (Dweck, 2012). Those interested in becoming experts or mastering certain skills also must believe they are capable of overcoming obstacles, learning from mistakes, and recovering from setbacks; that is, they must have a “growth mindset” as described by Dweck (2006).
From another perspective, Feist (2006a) found that members of the National Academy of Sciences knew they wanted to be scientists at an early age: 25% by age 14, 50% by age 18, and 75% by age 20. In his study of Westinghouse Talent Search finalists, he found “. . . the only precocity variable that reached the .05 level of significance with lifetime productivity was age that one first conducted formal research” (p. 70). Results of research on motivation and practice, when combined, demonstrate their importance in high-level achievement. One could also argue that the motivation needed for long-term task commitment comes from an individual’s passion for certain activities, another trait Subotnik and colleagues (2011) found to be important to outstanding productivity and performance. For gifted students, interest is highly correlated with performance in a variety of talent domains (Siegle et al., 2010).
Opportunities
One cannot separate opportunities from motivation. Motivation, or the desire and willingness to do something, cannot develop or be expressed without opportunities in the areas of interest. Those opportunities must be appropriately challenging (Gallagher et al., 1997; Maker & Schiever, 2010; Rimm et al., 2018) and in students’ “zone of proximal development” (Vygotsky, 1978), in other words, matched to the abilities and interests of the students (Pease et al., 2020; Wang & Eccles, 2013). For students to spend the 10,000 hrs of practice proposed as necessary for the development of expertise, teachers, mentors, programs, and other opportunities for this practice must be provided, accompanied by specific feedback to improve performance (Hattie, 2009). In a recent study related to this discussion, Wai and colleagues (2010) used the concept of “educational dose,” consisting of both acceleration and enrichment opportunities in STEM, to study its effects on accomplishments in STEM. In one study, they followed a large sample of students who were identified in the SMPY talent search to determine their levels of achievement and productivity 25 years later. In a second study, they included participants in graduate programs in STEM. The combined results of these studies led to the following conclusion: “The high-dose groups consistently earned a greater proportion of the STEM outcomes [STEM PhD, STEM publications, STEM tenure, STEM patent, STEM occupation] than the low-dose groups, with the exception of STEM patents for the total sample and STEM patents and publications for the graduate student women” (Wai et al., 2010, p. 869).
Integrating Intelligence, Creativity, Motivation, and Opportunities Through Problem-Solving
In 1993, Maker concluded that educators and psychologists have made artificial distinctions between the constructs of intelligence and creativity because of the tasks included in the tests used to measure them. This perspective is consistent with the recent conclusions of researchers: (a) intelligence and creativity are not as different as once assumed (Karwowski et al., 2016; Nusbaum & Silvia, 2011; Silvia, 2015), (b) motivation and mindset are important individual differences (Dweck, 2012; Ericsson et al., 2007; Renzulli, 1978), and (c) individuals have a unique profile of resources, and the degree to which one is creative depends on the fit between her or his profile of resources and the demands of the task (Lubart et al., 2013).
Using the framework proposed by Getzels and Csikszentmihalyi (1967, 1976), who described problem situations (tasks) as being on a continuum of open-endedness depending on how much information was given to the problem solver, how much was known (specified) by the presenter of the problem, and how much was unknown (unspecified) to either or both (Table 1), Maker analyzed tests of intelligence, achievement, and creativity. She found that tests of intelligence and achievement included mostly “closed” problems, which were well defined with known methods and right answers (Types I and II in Getzels and Csikszentmihalyi’s continuum). At the other end of the continuum, tests of creativity included open-ended problems, problems that were either well defined or not defined with multiple appropriate methods and many possible solutions. No intelligence, achievement, or creativity test was identified that included the most open-ended type (Type III in Getzels and Csikszentmihalyi’s continuum and Type VI in Maker and Schiever’s (2010) modified continuum), the type Getzels and Csikszentmihalyi (1967, 1976) found to distinguish creative artists and scientists from those who were not as creative.
Table 1. Maker and Schiever Problem Continuum Modified From Getzels and Csikszentmihalyi’s Research.
Note. Presented in “Problem Continuum” by Maker and Schiever (2010). Types I, II, and VI were Types I, II, and III in Getzels and Csikszentmihalyi’s continuum.
Sternberg (1999) also emphasized the solving of varied types of problems as essential in his theory of intelligence as developing expertise. He suggested that expertise includes certain metacomponents of thinking: “recognition of problems, definition of problems, formulation of strategies to solve problems, representation of information, allocation of resources, and monitoring and evaluation of problem solutions” (p. 361). In his view, problem-solving needs to be defined more broadly than it is usually defined in intelligence and achievement tests; it needs to include the solving of open-ended problems as well as monitoring the solutions as they are applied in practical situations.
In science education, as a result of extensive experience and research with the Westinghouse Science Talent Search (now called the Intel Talent Search), Brandwein (1992, 1995) distinguished general giftedness from science proneness and science proneness from science talent. Giftedness (measured by tests of general ability) was recognized as high achievement in the general scholastic program. Science-prone students displayed a keen and driving interest, as well as high achievement, in science generally, and in addition, demonstrated skill in the laid-out laboratory, then in guided discovery aided by clues (problem-doing), and then in solving the more sophisticated puzzles that require deferred solution. (Brandwein, 1992, p. 122)
Using the problem continuum (Table 1), these would be the closed and semi-open problems. Science talent was defined as “. . . ability to plan and complete an investigation involving a problem without a known solution (attested by a scientist in the particular field of study)” (Brandwein, 1992, p. 123). In the problem continuum, these would be the open-ended problems. More recently, Feist (2006a) found that the age at which one first conducted original research was the only precocity variable that predicted lifetime productivity. STEM innovators would be those with science talent, as evidenced by their ability and interest in “problem finding,” not just science proneness, as evidenced by their interest in solving well-defined problems.
Using a problem-solving focus and a framework initially proposed by Getzels and Csikszentmihalyi (1967, 1976) and modified by Maker and Schiever (Maker, 1993; Maker et al., 2015; Maker & Schiever, 2010) enables integration of the variables involved in the expression of talent (Subotnik et al., 2011) by focusing on each student’s ability to solve various types of problems in STEM domains rather than on labeling a student as intelligent, creative, or even “exceptionally talented.” The assessments designed through this project can help to identify which types of problems students are already capable of solving, which types they are developing the expertise to solve, which ones are challenging for them, and how these abilities and challenges are different in different domains (Pease et al., 2020). Figure 1 shows how these components are related. Through these assessments, one can observe (a) domain-specific knowledge and abilities and domain-general abilities through students’ performance in solving closed problems (Types I and II), (b) domain-specific and domain-general creativity through students’ performance when solving open-ended problems (Types V and VI), (c) the interplay between domain-specific and domain-general knowledge and abilities and domain-general and domain-specific creativity through the solving of semi-open problems, and (d) task motivation through observing performance in all types of problems across domains. To determine opportunities, an examination of family, school, community, and economic factors is essential.

Figure 1. Problem-solving as a unifying concept for assessment of general abilities (intelligence), domain-specific knowledge and abilities, and creativity-relevant processes and abilities, with personal and environmental factors as influences on all abilities, skills, and knowledge.
Cultural, Ethnic, Linguistic, and Economic Diversity
Another policy change recommended by the NSB (2010) is to identify all types of talent and nurture potential in all demographics of students. Conventional methods for identifying exceptional talent severely limit the diversity of students recognized as exceptionally talented and served in special programs for gifted students. As reflected in Office for Civil Rights data and State Department of Education statistics for the state in which the project was located, in the year 2006, White students made up 47.04% of the overall student population and 64.65% of the students served in programs for exceptionally talented students. African American students made up 5.13% of the student population and 2.88% of students in programs for exceptionally talented students. Hispanic students made up 39.67% of the student population and 23.21% of students in programs for exceptionally talented students. American Indian students made up 5.64% of the student population and 3.23% of students in programs for exceptionally talented students (Arizona Department of Education, 2006; Snyder & Dillow, 2012). Nationally, the picture is better, but certain groups remain overrepresented, whereas others remain underrepresented: White, 57.7% of the school-aged population and 67.7% of students in programs for talented students; Black, 14.8% of the school population and 9.1% of students in programs for talented students; Hispanic, 19.7% and 12.8%; American Indian/Alaskan Native, 0.9% and 0.9% (Snyder & Dillow, 2012).
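The degree of over- or underrepresentation in these figures can be made explicit by dividing each group’s share of program enrollment by its share of the student population. The sketch below is an illustrative addition: the ratio (labeled here a “representation index”) and the names `representation_index` and `summarize` are assumptions introduced for this example, not statistics reported in the sources cited. Only the percentages come from the text.

```python
# Percentages as reported in the text: (share of student population,
# share of students in programs for exceptionally talented students).
STATE_2006 = {
    "White": (47.04, 64.65),
    "African American": (5.13, 2.88),
    "Hispanic": (39.67, 23.21),
    "American Indian": (5.64, 3.23),
}

NATIONAL = {
    "White": (57.7, 67.7),
    "Black": (14.8, 9.1),
    "Hispanic": (19.7, 12.8),
    "American Indian/Alaskan Native": (0.9, 0.9),
}


def representation_index(population_share, program_share):
    """Program share divided by population share.

    A value of 1.0 means proportional representation; values below 1.0
    indicate underrepresentation and values above 1.0 overrepresentation.
    """
    return program_share / population_share


def summarize(shares):
    """Return the index for each group, rounded to two decimals."""
    return {
        group: round(representation_index(population, program), 2)
        for group, (population, program) in shares.items()
    }


if __name__ == "__main__":
    for label, shares in (("State, 2006", STATE_2006), ("National", NATIONAL)):
        print(label)
        for group, index in summarize(shares).items():
            print(f"  {group}: {index:.2f}")
```

Read this way, a ratio near 1.0 for American Indian/Alaskan Native students nationally contrasts with a ratio well below 1.0 for American Indian students in the state.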
Examination of these statistics shows that American Indian students are represented proportionally across the country, but not in the state in which this project was located. Perhaps the reason is that the percentages in the state are higher in general, and students are mostly on reservations where fewer programs exist and the income level is lower than the state and national averages. One cannot ignore the fact that economic differences are important determiners of academic opportunities (Miller & Kimmel, 2012; NSB, 2010), and may be more important than cultural differences for the American Indian students across the country, especially those on reservations. Of all ethnic groups, American Indians have the highest rate of poverty, 24% (Sherman, 2019; Smith et al., 2014). Students from families in high and middle SES groups are more likely to have opportunities to discover their talents, develop their interests, and participate in special programs, thus developing their skills and confidence in STEM areas (Archer et al., 2012).
Some of the reasons for the lack of diversity in special programs for exceptionally talented students are (a) heavy reliance on standardized quantitative measures of achievement and aptitude (Joseph & Ford, 2006; Miller, 2004; Plucker et al., 2010; Smith et al., 2014); (b) using only one method of assessment (Ivcevic & Kaufman, 2013; Joseph & Ford, 2006; Kaufman et al., 2004; Ortiz, 2002; Silvia, 2015; Sternberg, 2010; Stemler et al., 2006); (c) using conventional assessment methods rather than authentic, performance-based assessments (Ford, 1998, 2006; Joseph & Ford, 2006; Luria et al., 2016; Van Tassel-Baska, 2002); (d) using limiting definitions of giftedness (Dai & Chen, 2013; Maker, 2005; Sternberg, 1999; Subotnik et al., 2009); and (e) using identification procedures heavily influenced by teacher perceptions such as grades and teacher recommendations (Clasen et al., 1994; Maker, 1996; McCoach & Siegle, 2003a).
Often, in attempts to increase diversity, selection committees or individuals use the Scholastic Assessment Test (SAT), grades, and teacher recommendations, but apply weighting criteria or use other methods to select students from underrepresented groups who have lower scores and grades than White male students (Smith & Garrison, 2005). However, this practice has been the subject of a number of lawsuits in which White students have claimed that students of color with lower scores were given discriminatory preference (Margulies, 2002; Olivas, 1999). Indeed, Maker (1996) argued that not only is this practice discriminatory to the White students, but it also sends a message to students of color that they are somehow inferior.
Fortunately, promising practices exist and have been used successfully to achieve a more equitable balance of students from diverse ethnic and economic groups in special programs without lowering cutoff scores or applying weighting formulas to equalize scores (Glover, 1976; Ivcevic & Kaufman, 2013; Kaufman et al., 2004; Kaufman, 2006; Luria et al., 2016; Sternberg, 2006; Stemler et al., 2006; Torrance, 1971, 1973; Troiano & Bracken, 1983). A number of researchers have found that, unlike measures of ability and achievement, scores of students from diverse cultural groups and low-income groups on various measures of creativity are not significantly different from those of high-income and mainstream cultural groups (Glover, 1976; Ivcevic & Kaufman, 2013; Kaufman, 2006; Kaufman et al., 2004; Torrance, 1971). In fact, researchers have found that cognitive flexibility, open-mindedness, and imagination, traits important in creativity, are higher in African American students than in White students (Jenkins, 2005); that bilingual students tend to demonstrate higher creativity than monolingual students (Ghonsooly & Showqui, 2012; Kharkhurin, 2012); and that American Indian students are more likely to solve problems creatively as a group and to generate creative stories (DeVries & Shires-Golon, 2011).
In addition to the fact that measures of creativity do not seem to have the same biases as measures of intelligence, inclusion of creativity and/or creative problem-solving in definitions and methods used to select students for special programs has the additional advantage of being more aligned with the research on characteristics needed for talent development and innovation (Luria et al., 2016; Maker, 1993, 1996, 2005; Sternberg, 2010; Sternberg & Coffin, 2010). In fact, Sternberg and his colleagues (Hedlund et al., 2006; Sternberg, 2010) found that measures based on his theories (including analytical, creative, and practical abilities; intelligence as developing expertise) predict college success more accurately than conventional admissions tests while significantly reducing differences by ethnicity. For example, in the Kaleidoscope Project, applications to university programs from African American students and Hispanic American students increased significantly, admission of African American students increased by 30%, and admission of Hispanic American students increased by 15% (Sternberg, 2010).
This approach to identification has characterized the DISCOVER Projects since their beginnings in 1992, with components to assess both domain-specific knowledge and abilities in spatial, linguistic, and mathematical domains and creative problem-solving within and across domains (Maker, 1992, 1993, 1996, 2001, 2005; Sarouphim & Maker, 2010). Across different grade levels, different forms of the assessment, different ethnic and economic groups, and at different times, no significant differences have been found in the percentages of students from different economic, cultural, and gender groups identified using the DISCOVER assessments (Maker, 2005; Nielson, 1994; Sarouphim, 2001, 2002; Sarouphim & Maker, 2010). In one school district, for instance, the DISCOVER assessments were used in designated schools in low-income, high-minority areas because of the problem of underrepresentation of students of color. Prior to the use of DISCOVER, when conventional methods (teacher nomination and IQ tests) were used, Hispanic students made up 58% of the district’s overall population but only 26% of the students served in programs for gifted students; White students made up only 37% of students in the schools but 72% of those served in programs for the gifted; and African American students made up 3% of the school population and 1.3% of those served in programs for the gifted. After DISCOVER was incorporated into the identification process, the balance in the targeted schools changed to more equitable percentages: Hispanic, 73.5% in the eight schools and 71.5% served in special programs in the schools; White, 21% in the schools and 26.5% in the special programs; African American, 4.2% in the schools and 2.0% in special programs.
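One way to make the district figures above easier to compare is to divide each group’s share of program enrollment by its share of the school population. The sketch below does this arithmetic with the percentages reported in the text. The term “representation index” and all names in the code are illustrative assumptions, not part of the DISCOVER research; a value near 1.0 would indicate proportional representation. Note that the “before” figures describe the whole district, whereas the “after” figures describe the eight targeted schools, so the two sets are not strictly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupShare:
    """Percentages for one group, as reported in the text."""
    group: str
    pct_in_schools: float   # share of the school population
    pct_in_program: float   # share of students served in gifted programs


def representation_index(pct_in_program: float, pct_in_schools: float) -> float:
    """Program share divided by population share (hypothetical helper)."""
    return pct_in_program / pct_in_schools


# Conventional methods (teacher nomination and IQ tests), district-wide.
BEFORE = [
    GroupShare("Hispanic", 58.0, 26.0),
    GroupShare("White", 37.0, 72.0),
    GroupShare("African American", 3.0, 1.3),
]

# After DISCOVER was added to identification, eight targeted schools.
AFTER = [
    GroupShare("Hispanic", 73.5, 71.5),
    GroupShare("White", 21.0, 26.5),
    GroupShare("African American", 4.2, 2.0),
]

for label, shares in (("before", BEFORE), ("after", AFTER)):
    for s in shares:
        index = representation_index(s.pct_in_program, s.pct_in_schools)
        print(f"{label:7s} {s.group:17s} {index:.2f}")
```

On this simple ratio, the Hispanic and White indices move much closer to 1.0 in the targeted schools, which is the pattern the text describes as a more equitable balance.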
In a school district with a high percentage of African American students, the balance was changed to reflect the ethnic balance in the school district when an assessment modified from DISCOVER was used as an identification tool (Reid et al., 1999; Romanoff et al., 2009). The conceptual framework for these assessments, and for the new assessments created using the same framework, is presented in the following sections.
Conceptual Framework: Integrating Intelligence, Creativity, Problem-Solving, and Motivation
The first and most important aspect of the conceptual framework for creating the assessments was a definition of ability that could serve as the unifying theme for all assessments. Gardner (1983) provided an excellent theme as he defined intelligence as “. . . a set of skills of problem solving enabling the individual to resolve genuine problems or difficulties . . . and . . . the potential for finding or creating problems—thereby laying the ground work for the acquisition of new knowledge” (p. 60).
This definition is consistent with the definition of STEM innovators (NSB, 2010), as it includes a clear relationship to the kinds of problems STEM innovators face, the need for a knowledge base, problem-solving as well as problem finding, and the potential for creative breakthroughs.
Defining Exceptional Talent and Creating Instruments
Using a conceptual framework based on Gardner’s (1983) and Sternberg’s (1999) theories of intelligence and analysis of tests of creativity and intelligence, Maker (1993) proposed a different definition of giftedness or exceptional talent: “. . . the key element in giftedness or intellectual competence is the ability to solve the most complex problems in the most efficient, effective, or economical ways” and, in addition, gifted individuals “. . . are capable of solving simple problems in the most efficient, effective, or economical ways” (p. 71). Later, teams of observers added “elegant” (i.e., pleasingly ingenious and simple) to the list of adjectives describing solutions to problems as an alternative to more common words such as novel, creative, or unique. Using this framework, Maker and her colleagues (Maker, 1993, 1996, 2005), with funding from various federal sources, developed a performance-based assessment, DISCOVER. Over a period of more than 25 years, Maker and her colleagues found that the assessments based on this framework not only resulted in recognition of students with creative problem-solving ability but also changed the balance of underrepresented students in programs for exceptionally talented students in ways that reflected the ethnicity distribution of students in their communities (Maker, 2005; Nielson, 1994; Sarouphim, 2001, 2002; Sarouphim & Maker, 2010).
Not only were students from diverse and traditionally underrepresented groups identified through DISCOVER shown to be exceptionally talented, but they were successful in programs for gifted students or in regular education programs (Reid et al., 1999; Romanoff et al., 2009; Sak & Maker, 2003; Erdimez & Maker, 2015). For example, Erdimez and Maker (2015) found that the combined scores on the components of the DISCOVER assessment administered at the beginning of Grade 3 explained 43.9% of the variance in American Indian students’ overall achievement at the end of Grade 4, whereas an often-used nonverbal test of intelligence explained only 19.5% of the variance in achievement in the same group of students (Tan & Maker, 2015). Sak and Maker (2003), in two predictive validity studies, found that the DISCOVER assessment administered in kindergarten was a significant predictor of student grades and achievement in Grade 3 and Grade 6 in schools in low-income areas with high percentages of Hispanic students. The areas of student strength identified in kindergarten corresponded to the areas in which they excelled in later years. Sarouphim (2002) found that the high school form of the DISCOVER assessment showed no gender or ethnic biases in her study involving Hispanic, American Indian, and White students. Results such as these lead to the conclusion that an instrument based on the same principles could have the potential not only to identify exceptionally talented students in STEM but also to be appropriate for identifying the talents of students from underrepresented groups rather than being biased against them.
For this research, a definition of exceptional talent in STEM was developed based on the conceptual framework presented here and on research and practice in psychology, sociology, general education, and science education: Exceptional talent in STEM has two essential components: (a) a highly integrated and interconnected knowledge structure and (b) the ability and willingness to solve a variety of types of problems, from well-structured and known to ill-structured and novel, in science, technology, engineering, and math in the most effective, efficient, elegant, or economical ways.
Expertise: An Integrated and Interconnected Knowledge Structure
To assess the first component of exceptional talent, Sternberg’s (1999) theory of intelligence as developing expertise, which included both recognition of the knowledge structure of experts and an emphasis on problem-solving, was the framework for developing the assessments. Sternberg defined intelligence as “developing expertise” and described it as the ongoing process of acquiring and consolidating a set of skills needed for mastery in one or more domains. Studies have shown that as expertise in a domain grows, knowledge becomes increasingly interconnected (Chi et al., 1981; Glaser & Bassok, 1989; Shavelson, 1972; Shavelson et al., 1990). The most knowledgeable individuals in a domain have a highly integrated conceptual structure organized around central concepts (Ruiz-Primo et al., 2001). Thus, to solve problems in any domain, and particularly in science, knowledge and conceptual understanding in specific domains (content), problem-solving skills (process), and the ability to apply knowledge and understanding to novel situations (application) are essential (Shavelson et al., 1990). These three areas are roughly similar to the three types of intelligence (i.e., analytical, creative, and practical) combined with the elements of expertise (i.e., metacognitive skills, learning skills, thinking skills, knowledge, and motivation) outlined by Sternberg (1997, 2005) in both his theory of successful intelligence and his view of intelligence as developing expertise (Sternberg, 1999). Concept mapping, a way to assess the conceptual structure of students’ knowledge base, was incorporated into the new DISCOVER assessments in STEM (Maker & Zimmerman, 2020) as a way to assess the first component of the definition of exceptional talent.
The Importance of Creativity and Problem-Solving
Innovation in any area is the process of creating new products, new ideas, new methods, and new ways of thinking. It involves transformation, change, upheaval, and breakthrough (NSB, 2010; Sawyer, 2006; Weisberg, 2006). For these reasons, to assess abilities in the second component of the definition of exceptional talent, performance-based assessments of creative problem-solving were developed. Creative problem-solving is essential to innovation in STEM, and its assessment needs to include the solving of a variety of types of problems: those involving general and domain-relevant skills and abilities (closed and semi-open) and those involving general and specific creativity-relevant processes (semi-open and open-ended), along with task motivation (observed during the assessment process; Amabile, 1996, 2013; Renzulli, 1978; Sternberg, 2000, 2005; Figure 1). Most conventional measures used to identify exceptional talent in STEM and other areas do not include creative thinking and problem-solving, perhaps because these traits are seen as being difficult to measure. Combining the views of Gardner (1983, 1999), Maker and Anuruthwong (2003), and Sternberg (1999), problem-solving can be conceptualized as a domain-general ability, whereas the knowledge and skills needed for each domain are domain-specific skills. In Sternberg’s (2000, 2005) theory, one of the metacomponents is problem-solving; core abilities such as memory, metacognition, and reasoning cut across different domains of intelligence, whereas others are specific to domains.
In a similar vein, social psychologist Amabile (1996, 2013) developed and has found support for a componential model of creativity, which includes (a) domain-relevant skills that lead to exceptional performance in a specific domain (such as math or science), (b) creativity-relevant processes that cut across domains of creative performance, and (c) task motivation, the interest in or attitudes toward a specific task. In the field of education for gifted and talented students, Renzulli’s (1978) definition of giftedness consisted of three interacting clusters corresponding to those identified by Amabile, Gardner, and Sternberg: above average ability, creativity, and task commitment.
Motivation
In any discussion of problem-solving, development of expertise, and abilities, motivation must not be ignored. An individual’s passion for solving certain types of problems as well as her or his willingness to spend hours developing the expertise needed to function at high levels is a major determinant of success (Amabile, 1996, 2013; Feist, 2006a, 2006b; Gallagher & Gallagher, 2013; Renzulli, 1978; Sternberg, 1999; Subotnik et al., 2011). Maker and Anuruthwong (2003), as a result of years of research with and observations of children and youth, have described this motivation by noting that the question, situation, challenge, or discrepancy in a situation of interest to the problem solver provides the spark that initiates the action of abilities. They use the metaphor of a prism, describing the “challenge” as the white light coming into one side of the prism, and the activated or observed human abilities as the varicolored lights coming out of another side of the prism. Similarly, in Sternberg’s (1999) developing expertise model, the driving force is motivation. Without it, nothing happens. If motivation is present, the other elements, metacognitive skills, learning skills, thinking skills, knowledge, and context, interact as the novice “. . . works toward expertise through deliberate practice” (p. 364).
Although the long-term motivation and task commitment described by Sternberg (1999) and Renzulli (1978) cannot be seen during a performance-based assessment, task motivation (Amabile, 1996, 2013; Barbot et al., 2016) can be observed. When the DISCOVER assessments were being developed, observers were asked to share their perceptions of the performance of the students in their groups, which often included statements about motivation. If, for example, an observer said “She was highly motivated,” the observer was asked to list the behaviors he or she saw that led to the conclusion that a particular student was motivated. Over the years, the behaviors were compiled and are now included on the forms observers use to record their observations during an assessment: shows involvement in task (e.g., focuses on own work rather than that of others, not easily distracted), works continuously, persists on tasks or activities, increases in motivation or enjoyment as problems increase in open-endedness, follows through to completion, does not want to quit even when others have finished, shows nonverbal enjoyment of task or activity (e.g., smiling, laughing, playing), and verbalizes enjoyment of task.
Defining and Assessing Problem-Solving
With problem-solving as a key element of the assessment, an operational definition was necessary as a guide to developing tasks. Drawing upon earlier work, the research team used the continuum of problems from the research of Getzels and Csikszentmihalyi (1967, 1976). In Csikszentmihalyi’s early research, the ability (and willingness) to structure an open-ended, ill-structured problem, or “problem-finding,” as it was later labeled, was the single trait that most accurately predicted the later creative achievements of artists and scientists. His results were similar to those of later researchers, Brandwein (1992, 1995) and Feist (2006a), in their studies of science talent search winners and accomplished scientists.
Csikszentmihalyi’s research on problem finding had a significant effect on the field of education for talented students, leading to the development of teaching approaches in which problem finding was valued over the solving of already-defined problems or problems with known solutions (Gallagher et al., 1992; Gallagher & Gallagher, 2013; Maker & Nielson, 1995; Maker & Schiever, 2010). However, this same emphasis was not incorporated into the identification process, which continued to rely on tests consisting only of well-defined problems that had known methods for solving them and one right answer (IQ and achievement tests). Assessments to identify exceptional talent must be aligned with the programs and curricula designed to cultivate it. More specifically, these new assessments must include methods to recognize students who can identify problems, design their own methods, and create new solutions rather than simply remembering facts and methods or implementing solutions designed by others, the types of items included in standardized tests of abilities and achievement.
Using Getzels and Csikszentmihalyi’s (1967, 1976) framework of problem types, Maker and Schiever (Maker & Schiever, 2010; Schiever & Maker, 1997) developed a modified continuum consisting of five, and later six, problem types that could be used to design both assessments of problem-solving and curricula to develop it. In their continuum (Table 1), the first two types are closed, the middle two are semi-open, and the last two are open-ended. The closed types are characterized by clearly defined problems, specified methods, and right answers; the semi-open types are characterized by clearly defined problems, a range of possible methods, and either a right answer or a range of possible answers. The open-ended types have either a clearly defined problem or an undefined problem, an unlimited number of appropriate methods, and an unlimited number of possible solutions. This continuum has been a guide to developing tasks for the DISCOVER assessments since 1992. Each assessment in the new STEM battery included at least one closed problem (e.g., physical science [mechanical–technical]) or an aspect of scoring that was based on accuracy (e.g., life science), one semi-open problem, and one open-ended problem (Table 2). For example, in the assessment of mechanical–technical abilities (closely related to physics and engineering), the closed problem was to make a gear box given a set of materials and a diagram; the semi-open problem was to make one of two vehicles based on a picture of the final product and a picture of the materials needed to make it, but with no instructions telling how to put the pieces together to make the vehicle; and the open-ended problem was to make a machine that moved with the remote and motors, that was different from the vehicle, and that was of the student’s own creation (Alfaiz et al., 2020).
Table 2. Core Competencies, Problem Types, and Examples of Tasks in M2 STEM Assessments.
Note. STEM = science, technology, engineering, and mathematics.
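The continuum of problem types described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class, dictionary, and function names are hypothetical, the characteristics are paraphrased from the text rather than taken from Table 1, and the numbering of types from 1 to 6 follows the statement that the first two types are closed, the next two semi-open, and the last two open-ended.

```python
from enum import Enum


class Openness(Enum):
    """The three broad categories on the six-type continuum."""
    CLOSED = "closed"
    SEMI_OPEN = "semi-open"
    OPEN_ENDED = "open-ended"


# Characteristics of each category, paraphrased from the text.
CHARACTERISTICS = {
    Openness.CLOSED: {
        "problem": "clearly defined",
        "method": "specified",
        "solution": "right answer",
    },
    Openness.SEMI_OPEN: {
        "problem": "clearly defined",
        "method": "range of possible methods",
        "solution": "a right answer or a range of possible answers",
    },
    Openness.OPEN_ENDED: {
        "problem": "clearly defined or undefined",
        "method": "unlimited number of appropriate methods",
        "solution": "unlimited number of possible solutions",
    },
}


def openness_of(problem_type: int) -> Openness:
    """Map a problem type (1 to 6) to its category (hypothetical helper)."""
    if problem_type not in range(1, 7):
        raise ValueError("problem_type must be between 1 and 6")
    # Types 1-2 are closed, 3-4 semi-open, 5-6 open-ended.
    return list(Openness)[(problem_type - 1) // 2]


# Example tasks from the mechanical-technical assessment, as described in the text.
MECHANICAL_TECHNICAL_TASKS = {
    Openness.CLOSED: "make a gear box from given materials and a diagram",
    Openness.SEMI_OPEN: "make one of two vehicles from pictures, without instructions",
    Openness.OPEN_ENDED: "make an original machine that moves with the remote and motors",
}

for category, task in MECHANICAL_TECHNICAL_TASKS.items():
    print(f"{category.value}: {task}")
```

Representing the categories this way makes explicit that each assessment in the battery samples at least one task from each region of the continuum.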
Tasks That Resemble Real-Life Problems
Another aspect of assessment noted by Milgram and Hong (1993) as important in the assessment of creativity within domains is the relationship between the tasks presented to students and the problems people solve in real-life situations. The problems usually presented to be solved in domain-general creative thinking measures are very different from the kinds of problems people face in real life. For instance, many tests include “unusual uses” of common objects or making different drawings using figural stimuli. In real life, people must solve problems in a life or career context. For these reasons, Hong and Milgram developed creativity assessments in which respondents were given opportunities to use domain-specific creative thinking ability in a wide variety of real-life situations. This may be an important reason why they found that the scores on their tests and indicators of creativity were good predictors of creative adult accomplishments across several domains (Milgram & Hong, 1993, 1999).
In the example from the mechanical–technical assessment in the physical science domain (Table 2), the tasks presented are similar to the kinds of problems engineers will face as they learn new techniques and apply their ideas to develop new designs. Another example is that in the life science performance assessment, students are presented with two separate tasks. In the first task, students are invited to choose either flowers or insects, and then to make as many groups as they can based on the similarities they observe. In this task, they are using the skills of life scientists, who must observe carefully, make inferences about relationships, and give names and descriptions to their categorizations of phenomena. In the second task, students are directed to make an ecosystem of their own. They are to indicate interdependencies, interactions, and connections among the natural phenomena (Zimmerman et al., 2020). Studying and recognizing these interdependencies and interactions are lifelong tasks of scientists.
First- and Second-Order Knowledge
Another aspect of the conceptual framework was a consideration of the kind of knowledge being assessed. When designing the performance-based components of the assessment, the focus was practical intelligence (Sternberg, 1997, 2005), that is, skills and knowledge gained from experience. Gardner (1992) called this “first-order” knowledge because it is developed through experiences rather than during formal schooling. “Second-order” knowledge (Gardner, 1992) is the knowledge developed in school. These two types of knowledge are also roughly equivalent to the fluid (first-order) and crystallized (second-order) abilities identified by Cattell (1957). Fluid ability is involved in solving new problems, using logic in new situations, and identifying patterns. Crystallized ability is using learned or acquired knowledge. Both types of knowledge and ability are essential.
Depending on their environments, the cultural values of the community, and the schools they attend, students are exposed to different types of knowledge and experiences. In fact, a number of studies have shown that the level of a family’s economic resources may have a greater impact on students’ academic achievement (second-order knowledge; Miller & Kimmel, 2012; NSB, 2010) than cultural and ethnic factors (Archer et al., 2012; Xie et al., 2015). Often, the discrepancies result from both the quality of education provided in schools and the families’ lack of resources available to give their children opportunities for enrichment experiences in STEM and other areas. Thus, items designed to measure second-order knowledge and crystallized abilities often are measures of exposure to information rather than measures of the ability to learn the information (Maker, 2005). All domains in the original DISCOVER assessments and in the new STEM battery include a measure of mainly second-order knowledge (e.g., concept maps in life science and physics and mathematical problem-solving) and a measure of mainly first-order knowledge (e.g., performance assessment of life science, physical science [mechanical-technical], and spatial analytical abilities).
Purpose and Research Questions
The purpose of this study was to evaluate the effectiveness of existing instruments and others created specifically for the Cultivating Diverse Talent in STEM (CDTIS) project based on the new definition of exceptional talent in STEM over a 4-year period. Several studies were conducted as a part of the CDTIS project to identify and cultivate exceptional talent in STEM. The project was a collaboration between (a) faculty members from three colleges (pharmacy, science, and education) and a biomedical research institute at an R1 university in the Southwestern United States and (b) educators from three public schools, one community-controlled school funded by the Bureau of Indian Affairs, and one charter school in the same state. For this study, two research questions were the focus of the inquiry:
Method
During the CDTIS research project, faculty members at the R1 university partnered with educators from five schools with high percentages of students of color and students from low socioeconomic groups, who are usually underrepresented in special programs for talented students, to develop, field test, and implement four new instruments to identify high school students for an internship program on campus. In addition, one of the original DISCOVER assessments (spatial analytical) was added to the battery of measures used to assess the students and one instrument (mathematics problem-solving) was modified to include a greater variety of skills. The new instruments were developed by a team of educators (a researcher with extensive experience developing assessments, a postdoctoral scholar with a degree in education of gifted students and experience teaching science, graduate students in education of the gifted, and science teachers) and scientists based on the new definition of exceptional talent in STEM. Identified students were placed in laboratories of scientists on campus and were provided with support and special opportunities to present their research to the campus community at the conclusion of the internship.
Students from varied ethnic and socioeconomic backgrounds who applied for admission to the internship program, Keep Engaging Youth in Science (KEYS), also were involved in the research. Both the new instruments (M2) and existing procedures (M1) for selecting students (overall GPA, teacher recommendations, and self-statements) were used to choose the most promising students to participate in the internship program.
Developing the New Assessments
The process of development followed by the research team to create all the instruments was the assessment square proposed by Ruiz-Primo and colleagues (Ruiz-Primo, 2003, 2007; Shavelson et al., 2002). The four corners are the construct, observation, assessment, and interpretation. Guiding development of the assessments were certain underlying principles. Each task was designed to (a) assess the ability to solve closed, semi-open, or open-ended problems (Tables 1 and 2) based on the continuum modified from the research of Getzels and Csikszentmihalyi (1967, 1976) by Maker and Schiever (Maker & Schiever, 2010; Schiever & Maker, 1997); (b) assess core competencies outlined in the theoretical frameworks (Table 2); (c) be engaging and developmentally appropriate; (d) be as closely related as possible to the types of problems people would be expected to solve in real life in each domain; (e) assess either experiential or school-based knowledge (described as first- and second-order knowledge by Gardner [1992]); and (f) elicit observable behaviors in students that are indicators of different levels of ability and motivation. Using the underlying theories and principles as a basis, constructs were identified for each assessment and each task within the assessments. Constructs included (a) core capacities and abilities considered essential in the domains being assessed and (b) elements of analytical, creative, and practical intelligence.
From these principles and constructs, trial instruments and instructions for administering the assessments were designed. Next, research team members solved all problems themselves to identify any discrepancies and to understand the processes involved; they tested the items on friends, their children, and students in their classes (observation). At this time, research team members began to note what problem-solving processes were needed to solve the problems, whether the tasks were engaging and interesting to themselves and others, and the kinds of behaviors that could be observed to distinguish different levels of performance and motivation (observation). Next, these items were tested on larger populations, especially those in the target groups of students to be assessed when the instruments were completed (assessment). The processes of observation continued: noting problem-solving behaviors, student engagement and interest, and behaviors that could be observed to distinguish performance levels. Finally, the research team developed scoring systems that could be used to reliably distinguish levels of performance on all items and in all sections of the assessments (interpretation). This process was cyclical, in that after all phases, revisions were made and the new items were tested in the same way as the initial versions of the items.
When the team was satisfied that the instruments were ready for trial on a larger scale, they field tested them in the partner schools with students in Grade 12 (assessment). During the field testing, lists of student behaviors to observe, rubrics for assigning ratings to products, and scoring systems were developed (interpretation). The process was iterative as the research team returned to earlier steps to refine the assessments. Based on field tests, instruments and observer instructions were revised again, and after field testing and additional revisions as needed, instruments were administered to the students in Grade 11 at the same partner schools. Development of each of these instruments and an in-depth description of their current form can be found in other publications (Alfaiz et al., 2020; Bahar & Maker, 2020; Maker, 2020; Maker & Zimmerman, 2020; Zimmerman et al., 2020).
Participants
Participants included high school juniors (11th grade) in two groups. One group (M2) was selected from the partner schools. The first year, three schools on an American Indian reservation participated, and in the second year, two schools participated from the American Indian reservation. One partner school was located in an urban, low-income area. Students in the M2 group were selected using the six measures: three performance-based measures of creative problem-solving, two measures of sophisticated conceptual understanding using concept maps, and one assessment of application of knowledge and creative application of concepts in math. The other group of students (M1) consisted of students from a variety of types of schools in the same state who applied for and were accepted to the summer internship program at the university. They came from different schools, urban and rural, and from varied ethnic and economic backgrounds, and were selected using the existing procedures: overall GPA, teacher recommendations, and student self-statements. From the pool of students selected for the KEYS internship, the research team chose those who were juniors, and from the juniors, attempted to choose participants who came from areas of the state or school districts with demographics (ethnicity and SES levels) similar to those of the partner schools. However, this was not possible because of the ethnic and economic backgrounds of the students who applied and were chosen for the internship. They also had to agree to participate in the M2 assessments before, during, and after the internship program. Because no American Indian students had participated in the KEYS program prior to the CDTIS project and teachers were not aware of the program, very few American Indian students were available for selection through the existing procedures. 
In Table 3, the number of students selected using the new assessments (M2) and their ethnic backgrounds are presented along with the number of students and their ethnicities who were selected using the existing methods (M1) and who agreed to participate. In Table 4, the number of students selected using the existing methods and the economic level information about the schools they attended are presented.
Table 3. Ethnicity of Students Selected by Existing Methods (M1) and New Assessments (M2).
Table 4. Percentages of Students Receiving Free and Reduced-Price Lunches at Schools Attended by M1 and M2 Students.
Instruments
In each domain to be assessed, two types of measures were developed and tested, corresponding to the two aspects of the definition of exceptional talent. One was a performance-based measure of problem-solving with a particular emphasis on the solving of semi-open and open-ended problems (e.g., life science, mechanical–technical, and spatial analytical), and the other was a measure of domain-specific skills and creative application of knowledge (e.g., concept maps in life science and physics and mathematical problem-solving). The definition of each domain was derived from the conceptual framework presented in an earlier section of this article and drew upon the research and theories of Maker (1993), Maker and Anuruthwong (2003), Gardner (1983, 1999), Sternberg (1985, 1997, 1999), and Amabile (1996, 2013).
For the domain-specific knowledge assessment in life science and physical science, concept maps were developed. Concept mapping is a way to assess students’ understanding of the complexity of concepts and their interrelationships, which is essentially a way to determine the extent to which their knowledge base approximates the “. . . highly integrated knowledge structure . . .” (Ruiz-Primo, Shavelson, & Schultz, 1997, p. 2) of experts in a domain. In math, instead of concept maps, a paper-and-pencil assessment of knowledge, application of knowledge and heuristics, and conceptual understanding of mathematical processes was modified from an existing instrument considered to be too narrowly focused on computational mathematics. The core competencies, problem types, and tasks for each assessment are summarized in Table 2. Each assessment is described in depth in other publications (Alfaiz et al., 2020; Bahar & Maker, 2020; Maker, 2020; Maker & Zimmerman, 2020; Zimmerman et al., 2020).
Data Collection and Scoring
In most cases, all six assessments were administered over a 2-day period at each school: the life science performance-based assessment (Zimmerman et al., 2020), the physical science (mechanical-technical) performance-based assessment (Alfaiz et al., 2020), the spatial analytical performance-based assessment (Maker, 2020), the concept maps in life science and physics (Maker & Zimmerman, 2020), and the math assessment (Bahar & Maker, 2020). For the performance-based assessments, students gathered at tables in the library or another large room, with one observer for every three to five students. For the paper-and-pencil assessments, they gathered in a large room with one research team member administering the assessment and two to four others assisting as monitors.
Performance-based assessments
The performance-based problem-solving assessments (life science, physical science, and spatial analytical) were administered to students in small groups of not more than five students by trained observers who were part of the research team or graduate students in special education for talented students. Observers gave instructions, distributed materials, interviewed students, and recorded their responses, and as a group (along with other observers) scored responses and assigned ratings. Scoring systems and rubrics were designed for each assessment during the field testing and revised as needed before being used for decision-making in this study.
After the assessments were completed, observers met to review the students’ performance. First, they listened to the audio records of student interviews and transcribed their responses, completed their notes about each student in the group, and assigned tentative scores and ratings. When all observers had completed this process, they met to discuss the students’ performance. They reviewed each student’s performance and reached group consensus about the scores to assign.
Concept maps and math assessments
The three paper-and-pencil assessments were administered in large groups by one or two members of the research team. Depending on the size of the group, two to four monitors walked around the room, checking students’ maps and their responses to math problems to make certain they were following instructions, and to answer students’ questions. They did not suggest answers or comment on students’ responses.
Ratings and scoring procedures
After all scoring of the paper-and-pencil assessments of school-based knowledge was completed and rubrics were applied to the results of the performance-based assessments, every student was assigned a rating on each of the six assessments. The first (lowest) level was unknown, signifying that the student had not demonstrated exceptional abilities in that domain, so the level of ability was not known. The second level was maybe, signifying that the student had shown some evidence of competence, but her or his performance was generally below the average of the students assessed. The third level was probably, signifying that the student’s performance, although being above the average of those in the school, was not outstanding, but the student probably had strengths in this domain. The fourth level was definitely, signifying that the student definitely performed at an exceptional level in this domain. The fifth level, wow, was reserved for those students whose performance was exceptionally high; they performed beyond even the highest expectations of the observers.
During the development of the DISCOVER assessments, a decision was made to use words rather than numbers to denote levels of ability, and to recognize, by using unknown for the lowest level, that if a student does not demonstrate abilities, the reasons are not clear. Perhaps the student was having difficulties that day and was unable to perform. The tasks also may not be engaging to some students. In some of the areas, ratings were based on a rubric specific to the assessment. In other areas, such as the concept maps and math assessments, after scoring was complete, the research team members assigned overall ratings of unknown, maybe, probably, definitely, and wow using the Jenks Natural Breaks system (Jenks, 1967), a method in which all scores are listed from highest to lowest, and the same rating is assigned to scores that are grouped together and are separated from other groups by at least 3 to 5 points. Specific scoring procedures, including rubrics, points assigned, and criteria for decision-making, are included in articles about each assessment (Alfaiz et al., 2020; Bahar & Maker, 2020; Maker, 2020; Maker & Zimmerman, 2020; Zimmerman et al., 2020).
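The grouping step described above can be illustrated with a minimal sketch. It implements only the simplified gap-based description given in this section (sort the scores, then start a new group wherever adjacent scores are separated by a minimum gap). It is not the full Jenks optimization and not the research team's actual procedure. The function name `group_by_gaps`, the 3-point minimum gap, and the sample scores are hypothetical.

```python
def group_by_gaps(scores, min_gap=3):
    """Sort scores from highest to lowest and start a new group
    whenever the drop from the previous score is at least min_gap."""
    ordered = sorted(scores, reverse=True)
    groups = []
    for score in ordered:
        if groups and groups[-1][-1] - score < min_gap:
            groups[-1].append(score)   # close to the previous score: same group
        else:
            groups.append([score])     # large gap (or first score): new group
    return groups


# Hypothetical scores, for illustration only
sample_scores = [48, 47, 41, 40, 39, 33, 25, 24, 12]
groups = group_by_gaps(sample_scores)

# Rating labels from the article, highest to lowest. Pairing them with the
# groups only makes sense here because the sample happens to form five groups.
labels = ["wow", "definitely", "probably", "maybe", "unknown"]
ratings = {label: group for label, group in zip(labels, groups)}
print(ratings)
```

In practice the number of natural groups will not always equal the number of rating levels, so assigning labels to groups would still require the kind of team judgment described in the text.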
Selection of M1 Students
Students in the M1 group were selected by an admission committee from the College of Pharmacy, the Bio5 Institute, and the College of Science. The committee considered the combination of each student’s GPA, letter(s) of recommendation, and self-statement. The cutoff for GPA on the program website was 3.2, so if a student with a lower GPA was considered, the decision was based on the strength of the teacher recommendation(s) and the student’s self-statement. After initial acceptance, students were interviewed to determine their specific interests to make certain placements were available that matched their interest areas. Further selection of students to participate in the study was based on the similarity of the demographics of their schools or communities (location in the state, ethnicity, SES) to the demographics of the partner schools and the students’ willingness to participate in the M2 assessments before, during, and after the internship program. However, as indicated previously, very few students selected for the internship program through the M1 methods came from schools or communities similar to those of the M2 students, so selection of students with similar demographics was not possible (Table 4).
Selection of M2 Students
After a day of testing, the team gathered to analyze the results of the performance-based assessments. All members of the team reviewed all information collected about the students and made decisions about the ratings to give for each student’s performance. This is an important procedure to note because observer bias is less likely to be a factor in decision-making when multiple observers discuss and reach consensus about ratings to assign (Griffiths, 1996; Kassymov, 2000; Maker, 2005). Students at the large high school in the urban area, predominantly Hispanic, were compared with other students at that school, and students at the smaller high schools with predominantly American Indian students were compared with each other. Initially, students at each of the two schools with American Indian students were compared with only the students at their school, but later, these results were combined because of the small numbers at one school. Comparison of students at similar schools is an important factor to note as it lessens the effect of differences in opportunities to learn information taught in school.
The research team selected students for the internship program in the following order: first were the students who had ratings of definitely or wow in all areas assessed, next were those who had ratings of definitely or wow in four or five areas, and next were those who had definitely or wow in three areas. If more placements were available, the team considered the patterns of ratings on performance assessments and assessments of second-order knowledge. Similar to the M1 student selection, after initial acceptance, students were interviewed to determine their specific interests to make certain placements were available that matched their interest areas.
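The priority order described above can be sketched as a simple tiering rule. This is an illustration under stated assumptions, not the team's actual procedure: the function name `selection_tier`, the domain keys, and the student records are hypothetical, and the final step (considering patterns of ratings when placements remain) is a judgment that the sketch does not attempt to model.

```python
HIGH_RATINGS = {"definitely", "wow"}


def selection_tier(ratings):
    """Return 1, 2, or 3 for the three priority tiers named in the text,
    or None when fewer than three areas are rated definitely or wow."""
    high = sum(1 for rating in ratings.values() if rating in HIGH_RATINGS)
    if high == len(ratings):
        return 1          # definitely or wow in all areas assessed
    if high >= 4:
        return 2          # definitely or wow in four or five areas
    if high == 3:
        return 3          # definitely or wow in three areas
    return None           # left to the team's review of rating patterns


# Hypothetical students rated on the six assessments
students = {
    "S01": dict(life_pba="wow", phys_pba="definitely", spatial_pba="wow",
                life_map="definitely", phys_map="definitely", math="wow"),
    "S02": dict(life_pba="definitely", phys_pba="probably", spatial_pba="wow",
                life_map="definitely", phys_map="maybe", math="definitely"),
    "S03": dict(life_pba="maybe", phys_pba="probably", spatial_pba="wow",
                life_map="unknown", phys_map="maybe", math="definitely"),
}

tiers = {name: selection_tier(r) for name, r in students.items()}
print(tiers)
```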
Schools
The new assessments to select M2 students were conducted in the project’s partner schools: four high schools located in the Southwestern United States. The number of students in these schools ranged from 94 to 2,245, and more than 71% received free or reduced-price lunches. In two schools, all students were American Indian; in one school, 97% were American Indian and others were White; and in the school in the urban area, the makeup of the school was the following: Hispanic, 83.4%; American Indian, 5.0%; Asian, 0.7%; Black, 3.7%; White, 7.0%; Pacific Islander, 0.3%. All partner schools were located in poverty areas with an unemployment rate that ranged from 7.7% to 45.8%. A limited number of academic programs, such as Honors and Advanced Placement (AP) classes and programs for gifted students, were offered in the two larger schools, one on the American Indian reservation and one in the urban area. However, the smaller schools on the American Indian reservation did not have these programs.
The students selected by conventional methods (M1) came from a variety of schools and types of schools: Most were from public schools in urban areas, but students also came from charter schools, a science academy, a special public high school for high-ability/high-achieving students, and one school in a high-income area of the community with a significant percentage of Hispanic students. Two students came from a public school on the Mexico border, which was predominantly Hispanic, two students came from the partner school in the urban area the second year, and two students came from the partner schools in rural areas that were predominantly American Indian, one each year of the project. Table 3 shows ethnicity of students in the two groups and Table 4 shows economic data from the schools attended by students in the two groups.
Data Analysis
To answer the two research questions, the research team conducted chi-square analyses of the differences between the students identified by both methods in the following areas: gender, ethnicity, primary language identified, and highest educational level of at least one parent. Then, they conducted a t test of the differences in overall GPA.
Results
Research Question 1
Gender
In Table 5, the results of the chi-square analysis for gender are shown. The chi-square statistic was 0.322 and the p value (two-sided) was .756. The result was not statistically significant: the groups identified by the two methods did not differ in gender.
Results of Chi-Square Analysis of Gender by Group.
Note. 0 cells (0.0%) have an expected count less than 5. The minimum expected count is 7.91.
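The expected counts reported in the table notes follow from the row and column totals alone. The minimal sketch below shows the calculation for the gender table. The group sizes (20 M1 and 23 M2 students) and the gender totals (26 girls and 17 boys) are taken or inferred from counts reported elsewhere in this article, and the function name `expected_counts` is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence:
    (row total * column total) / grand total."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: M1 (20) and M2 (23); columns: girls (26) and boys (17).
# The group sizes are inferred from counts given elsewhere in the article.
expected = expected_counts([20, 23], [26, 17])
min_expected = min(min(row) for row in expected)
print(round(min_expected, 2))
```

Under these totals, the smallest expected count belongs to the cell for boys in the M1 group, which is why no cell falls below the conventional threshold of 5 in this table.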
Ethnicity
In Table 6, the results of the chi-square analysis of ethnicity by group are shown. The chi-square statistic was 14.371 and p value (two-sided) was .006. The ethnic balance of the two groups was significantly different.
Results of Chi-Square Analysis of Ethnicity by Group.
Note. Five cells (50.0%) have an expected count less than 5. The minimum expected count is 1.40.
Language
In Table 7, the results of the chi-square analysis of language by group are shown. The chi-square statistic was 2.163 and the p value (two-sided) was .539. The groups were not significantly different in primary language.
Results of Chi-Square Analysis of Primary Language by Group.
Note. Six cells (75.0%) have an expected count less than 5. The minimum expected count is 0.47.
Highest level of parent education
In Table 8, the results of the chi-square analysis of highest level of education of at least one parent, using four levels (i.e., middle school, high school, bachelor’s degree, and graduate degree [MA or PhD]), are shown. The chi-square statistic was 5.326 and the p value was .149. The differences were not statistically significant. A second chi-square analysis was conducted because of the observed differences between the groups at two levels (high school and graduate degree), which would have the greatest impact on expectations for achievement and the number and type of opportunities provided for students. For this new analysis, the chi-square statistic was 4.073 and the p value was .044, which is statistically significant. Thus, when only the highest (MA and PhD) and lowest (high school) levels were considered, the groups were significantly different, whereas when all four levels were considered, the groups were not significantly different in the highest level of parent education, mainly because the groups had the same number with parents who had bachelor’s degrees.
Results of Chi-Square Analysis of Highest Level of Parent Education by Group.
Note. Four cells (50.0%) have an expected count less than 5. The minimum expected count is 0.47.
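As a check on the second analysis of parent education reported above, the p value can be recovered from the chi-square statistic. The minimal sketch below assumes 1 degree of freedom (two groups by two education levels), which is an inference from the description and is not stated in the text. The function name `chi_square_p_value_df1` is hypothetical.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.
    With 1 df the statistic is a squared standard normal variable, so
    P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported for the high school versus graduate degree comparison
p_value = chi_square_p_value_df1(4.073)
print(round(p_value, 3))
```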
Overall GPA
A t test was applied to find differences in overall achievement as measured by unweighted GPA (UW GPA) between students identified by conventional methods (M1 students) and those identified by the new methods (M2 students). The mean of M1 students was 3.93 and the standard deviation was 0.20. The mean of M2 students was 3.07 and the standard deviation was 0.59. The t-test statistic was 6.43 and the p value was less than .001. The groups were significantly different in overall GPA.
Research Question 2
Students identified by the conventional methods (M1 students) and the students identified by the new methods (M2 students) were not significantly different in gender or primary language (Tables 5 and 7). When the differences between four levels of educational attainment of parents were considered (Table 8), the groups were not significantly different; however, when only the lowest and highest levels of parent educational attainment were considered, the groups were significantly different. The M1 and M2 groups were significantly different in ethnicity (Table 6) and overall achievement (GPA). Thus, the traditional method limited the diversity of students identified in three areas: highest level of educational attainment of parents, ethnicity, and overall GPA.
Discussion
Limitations
Before discussing the results of the analyses, some general limitations need to be noted because they have a potential impact on the results. Perhaps the most significant limitation was the difficulty securing parent permission for students to participate in the assessments. Although the team was prepared to assess all 11th-grade students at each partner school, this was not possible. For parents unfamiliar with legal requirements for research, the length and complexity of the forms were barriers. The forms were provided in English and the predominant languages of parents; however, many adults do not read fluently in their native languages, and some do not read at all in their native languages. Many American Indian languages, for instance, have not been written or have been written only recently, so only those who attended bilingual schools with an emphasis on writing in Native languages were able to read the materials in their home languages. At School D, for example, to get even a small percentage of parents to sign consent forms, a community liaison drove to the homes of students in an attempt to get parent or guardian signatures on the forms.
The pool of students assessed was limited by the lack of parent consent. At School A, out of approximately 300 11th graders, only 75 participated in Year 1 and 100 in Year 2. At School B, out of 60, 21 participated in Year 1 and 25 in Year 2. At School C, all except one or two students participated. Many were old enough to sign the student assent forms without parent permission. At School D, out of 150 students, 74 participated in Year 1 and the school declined to participate in Year 2 because obtaining parent permission was so expensive and time-consuming. The fact that the groups did not contain the entire available population may limit the generalizability of the results.
Another limitation was the restriction on collection of data from students in the partner schools who were assessed but not offered placement in the special internship program. Because these specific students’ demographic characteristics could not be described, the research team was unable to determine the degree to which the students assessed were representative of the population of each school. Considering the demographics of the schools, however, one can conclude that the students who were selected by the new methods were definitely from low-income, culturally diverse groups usually underrepresented in programs for exceptionally talented students in STEM. The generalizability of the findings is affected by this limitation.
In general, the small sample size was a limitation and was due to both the level of funding and the research approval process. Only a few schools could be included in the research project; they were purposefully selected because of their high percentages of the two cultural groups most often underrepresented in programs for exceptionally talented students in the state. Administrators of the schools and school districts also were those who were willing to participate in the project. Thus, the generalizability of the findings is limited, and others are encouraged to use the assessments with larger populations to determine their validity as tools for identifying exceptionally talented students in STEM from underrepresented groups.
Differences Between Groups (Questions 1 and 2)
The characteristics of students identified as exceptionally talented with traditional methods such as teacher recommendations, GPA, and student self-statements (M1) were compared with students identified using the new assessments of problem-solving and abilities in STEM developed during the research project (M2): performance-based assessments, concept maps, and a written assessment of problem-solving in math. No statistically significant differences were found in gender or primary language. However, the groups were significantly different in highest level of parent education, ethnicity, and overall GPA.
Overall GPA
The most important difference between the two groups was in their GPAs. The differences were statistically significant at p < .001. In the M1 group, GPAs ranged from 3.71 to 4.00 with an average of 3.93, whereas the M2 students’ GPAs ranged from 2.11 to 4.00 with an average of 3.07. In many programs for exceptionally talented students, participants are screened out based on their overall GPA. The special internship program is no exception. In program materials, the stated requirement was a GPA of 3.2 on a 4.0 grading scale. Of the students in the M2 group, 8 out of 23 would not have met this requirement and most likely would not have applied to the program. Perhaps more importantly, examination of the GPAs of students identified using the conventional methods shows that the lowest GPA of students accepted was 3.71. Using this real decision-making cutoff, 18 out of 23 M2 students would not have been selected.
Another important fact to consider in this discussion is that many exceptionally talented students have uneven profiles. They can demonstrate high levels of competence in one area with low levels or even lack of competence in another area (Newman & Sternberg, 2004; Winner, 2000a, 2000b). Similarly, research on underachievement has shown that underachievement is more common than expected in students with exceptional talent when achievement is defined by conventional measures such as overall GPA (McCoach & Siegle, 2003a).
Another factor to consider when thinking of these results is that the students who do well on assessments such as those designed during this project, which include open-ended problem-solving requiring creative thinking, may not get high teacher recommendations or have high GPAs. Research on teacher perceptions of creative students has shown that teachers have a negative view of students who are creative, and that this view often affects their assignment of grades (Westby & Dawson, 1995) even though measures of creative thinking and creative activities such as involvement in active research may be more valid predictors of productivity and performance in STEM areas than intelligence and school grades (Feist, 2006a, 2006b; Hong & Milgram, 2010; Milgram & Hong, 1993). Students who receive high grades and high recommendations may be better at “problem-doing” in planned laboratory experiences than in “problem-solving” in which they must identify a researchable problem, design appropriate scientific methods, and persist until they have reached an acceptable solution (Brandwein, 1992, 1995; Maker, 1993, 1996, 2005). Together, these results suggest that, as McCoach and Siegle (2003a) conclude from their study of achievers and underachievers, some highly talented high school students may not value the goals teachers have set for them, and are not motivated to achieve those goals.
M1 and M2 student performance on the assessments created and used in this project provides further support for the use of methods other than GPA for selecting high school students for special STEM projects. Researchers (Miller, 2004; Plucker et al., 2010) have found that students of color are “severely underrepresented” among the top 1%, 5%, and 10% on almost every measure of achievement: grades, GPA, class rank, and standardized test scores such as state achievement and NAEP results. This was not the case with the new assessments created for the CDTIS project. Students of color and students from low-income groups were well represented at all levels (Figures 2 and 3). Although one might expect, based on the research of Miller and Plucker and colleagues, that scores of the M2 students on the measures of domain-specific and domain-general knowledge and skills (concept maps and mathematical problem-solving) consisting of mainly knowledge gained in school (second-order knowledge) would be significantly lower than scores of M1 students, this was not the case. The scores were very similar (Figure 2) on physics concept maps (Maker & Zimmerman, 2020) and math (Bahar & Maker, 2020), and the average scores of M2 students were higher than scores of the M1 students on the life science concept maps (Maker & Zimmerman, 2020). Figure 2 shows a comparison of the UW GPAs of the two groups and their scores on concept maps and mathematical problem-solving. A clear and substantial difference was found between the two groups on GPAs but not on the new assessments.

Comparison of ratings of M1 and M2 students on measures of domain-specific and domain-general knowledge and skills (GPA, concept maps, and mathematical problem-solving).
On the performance-based measures of creative problem-solving in life science (Zimmerman et al., 2020), physical science (mechanical-technical) (Alfaiz et al., 2020), and spatial analytical (Maker, 2020) domains (Figure 3), the ratings of students in the M2 group were higher than the ratings of students in the M1 group in all areas. The average of M1 students on the life science performance-based assessment was 3.35 and the average of M2 students was 3.50, whereas the average of M1 students on the physical science (mechanical-technical) performance-based assessment was 3.00 and the average of M2 students was 3.27. The greatest difference was on the spatial analytical performance-based assessment: 3.25 for M1 students and 3.64 for M2 students. Thus, one can conclude that the new measures were culturally responsive measures of the knowledge, skills, and abilities needed for future STEM innovators. On these measures, students of color from low SES levels were present in the top 1%, 5%, and 10% on all assessments even though they were not in these top levels in GPA.

Comparison of ratings of M1 and M2 students on PBAs of creative problem-solving.
Gender
Although no significant gender differences were found between M1 and M2 groups, an interesting fact is that across the two groups, more girls were selected (26) than boys (17). Reasons for these differences could include factors such as these: (a) because the internships were offered during the summer before students’ last year of high school, boys may have been more likely to be expected to work or more likely to want to work, so they did not apply or did not choose to accept if selected, and (b) more girls than boys applied to the program through the usual method or agreed to participate in the assessment, so the pool of female applicants was greater than the pool of male applicants.
Primary language
Of the 43 who were selected for the special program for exceptionally talented students, only eight identified a primary language other than English. In the partner schools, information about primary language is not available to researchers unless it is collected directly from students. The institutional review board did not allow the research team to collect information about the primary languages of all students who were assessed. This information was collected on only those who were selected and chose to participate in the internship program. Although a few more M2 students than M1 students identified a primary language other than English, the differences were not significant. The reasons for the lack of significant differences are unclear, but one possibility is that the pool of students whose parents signed the consent forms was predominantly native English speakers. Perhaps the English-speaking parents were more likely to sign the forms and also were more familiar with legal requirements for research projects. In the school with a large percentage of Spanish-speaking students, forms were provided in Spanish, but not in other languages spoken by the students.
From another point of view, because of the significant differences in ethnicity of M1 and M2 students in the two major language groups of the state, the following information is important: Of the American Indian students in the M2 group, one out of 13 (7.7%) identified Navajo as her or his native language, higher than the state average of 3%; of the five Hispanic students in the M2 group, three (60%) identified Spanish as their native language, much higher than the state average of 12%. In the state where the project was located, the dropout rate of English language learners (ELLs) is significant: Only 18% of ELLs in the state graduate, whereas across the country, the graduation rate is 63% (Sanchez, 2017). A related factor, perhaps a reason for the low graduation rate, is that policies in the state limit the participation of ELLs in regular classes. They are placed in instructional blocks that keep them from earning credits in other classes, including those required for graduation (Jung, 2017). Thus, many of these students may not have participated in the new assessments, both because their teachers did not understand that a high level of reading or oral language proficiency was not required, especially for the performance-based assessments, and because the students were not encouraged by their parents to participate in the assessments. Comparison of numbers of students from the two main cultural groups in the state and their identified native languages shows that a higher percentage of M2 students than M1 students identified a native language other than English; however, the differences were not statistically significant and the reasons for the differences or lack of statistical significance in the differences are not clear.
Highest level of parent education
The research team initially identified four levels of parent education and collected information about all four levels: middle school, high school, bachelor’s degree, and master’s or doctoral degree. The number of parents with a bachelor’s degree was the same in both groups (five), but the differences between the two groups were greater when the lowest level with more than one student, high school (eight M1 and 15 M2), and the highest level, graduate degree (six M1 and three M2), were considered (Table 8). For these reasons, the chi-square analysis was run a second time, including only high school and graduate degree. These differences were statistically significant.
To help understand these differences, state averages for all adults are compared with the levels of attainment of parents of M1 and M2 students. In the state where the project was located, approximately 25% of all individuals aged 25 and older have a high school degree or equivalent. Of the students in the study, 40% in the M1 group and 65.2% in the M2 group had parents with only a high school degree. One could conclude that the new assessments, compared with the usual methods, enabled selection of a higher percentage of students whose parents have the lowest level of educational attainment. Furthermore, in the state, 10% of adults aged 25 and older have a graduate or professional degree. Of the students in the study, 30% of M1 and 13% of M2 students had parents with a graduate or professional degree. The pattern of attainment in the M1 group was much higher than the state average for all ethnic groups, and the pattern of attainment in the M2 group was aligned with the demographics of the state. Based on these statistics, one could conclude that the new assessments enabled students whose parents were not highly educated to be recognized as exceptionally talented in STEM.
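The percentages in this comparison can be reproduced from the counts reported in the article. The minimal sketch below does so. The counts come from the text, the group sizes (20 M1 and 23 M2 students) are inferred from totals reported elsewhere in the article, and the variable and function names are hypothetical.

```python
# Counts of students whose most-educated parent reached each level (from the text)
high_school = {"M1": 8, "M2": 15}
graduate = {"M1": 6, "M2": 3}

# Group sizes inferred from totals reported elsewhere in the article
group_sizes = {"M1": 20, "M2": 23}

# Approximate state percentages for adults aged 25 and older, as cited in the text
state_average = {"high_school": 25.0, "graduate": 10.0}


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {
    group: {
        "high_school": percent(high_school[group], n),
        "graduate": percent(graduate[group], n),
    }
    for group, n in group_sizes.items()
}

for group, values in shares.items():
    for level, value in values.items():
        print(group, level, value, "state:", state_average[level])
```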
Ethnicity
The students identified through the usual methods (M1) and the students identified using the new methods (M2) were significantly different in ethnicity (Table 6). These results need to be interpreted from several points of view. First, identifying a higher percentage of American Indian students in the M2 group can be expected because of the ethnic makeup of the partner schools involved. Three of the schools had 97% to 100% American Indian students, whereas the fourth school had 83.4% Hispanic students. Thus, one would expect that in the M2 group, selected from partner schools, high percentages of Hispanic and American Indian students would be identified.
Another way of looking at the results is to consider the selection process as consisting of recruitment, assessment, and selection. For M1 students, the process of recruitment consisted of visiting schools, describing the program, distributing application forms, and encouraging students to apply. In many cases, students who had attended the program the previous year assisted by encouraging others to apply. Prior to the project, no partner schools with high percentages of American Indian students were included in the recruitment process. Thus, students at these schools did not know about the program and did not apply. During the project, however, these schools were included in the recruitment process. Discussions with the teachers and administrators during the project revealed that many students did not apply through the M1 process because they did not think they would be accepted. In addition, many of the students who were recommended for acceptance to the program through the M2 selection process were reluctant to complete the required application forms until they were shown their profiles and told how their performance compared with the performance of others. In short, many of the students were uncertain of their abilities and lacked the academic self-confidence needed to seek out advanced experiences and apply for admission to a special program in STEM. This result is consistent with research showing that African American, Hispanic, and American Indian students’ lower academic self-perceptions and lower self-expectations affect their achievement and their willingness to engage in special programs and opportunities (McCoach & Siegle, 2003b).
A helpful conclusion to make from these results and the potential reasons for them is that the use of performance-based assessments and other similar measures is important for students from low-income and culturally diverse groups, and both the students and their parents need to be given the results in a format they can understand (Pease et al., 2020). The students need encouragement and assistance in identifying potential opportunities (Olszewski-Kubilius et al., 2015; Subotnik et al., 2011) to develop their abilities.
Comparison with other research
These results are similar to the findings of researchers who compared students selected with the DISCOVER performance-based assessments of creative problem-solving with students selected by conventional methods such as IQ and achievement. The percentages of students selected using DISCOVER and a performance-based assessment modified from DISCOVER paralleled the percentages of individuals with certain demographic characteristics in the populations of low-income, highly diverse students in the schools in several areas, including the following: race/ethnicity (Nielson, 1994; Reid et al., 1999; Romanoff et al., 2009), ancestral origin, preferred language, second language, religious preference, birthplace of parents, mobility, family income, family home, parents’ occupation, parents’ educational attainment, degrees earned, teaching experience, identified as gifted, and work or volunteer at a school (Nielson, 1994).
Also important is that other researchers have found benefits of including measures of creativity and creative problem-solving in talent identification: (a) both domain-general and domain-specific abilities and creativity are important variables associated with outstanding achievements in productivity and performance (Feist, 2006a, 2006b; Subotnik et al., 2011), so inclusion of both increases the predictive ability of the talent identification (Hedlund et al., 2006; Sternberg, 2010); and (b) inclusion of measures of creativity increases the potential to identify students from groups traditionally underrepresented in programs for exceptionally talented students because they score at similar levels (Glover, 1976; Ivcevic & Kaufman, 2013; Kaufman, 2006; Kaufman et al., 2004; Torrance, 1971) or sometimes higher (DeVries & Shires-Golon, 2011; Ghonsooly & Showqui, 2012; Jenkins, 2005; Kharkhurin, 2012) than students from mainstream groups.
Recommendations
Assess “What They Know”
Based on the results of the CDTIS project and the results of research on the DISCOVER assessments from 1992 to 2005, the most important recommendation for identification of students with exceptional talent in STEM is to include measures in which the emphasis is on “What do the students know and what can they do?” rather than “Do the students know ____?” and “Can they reach the conclusion that _____?” By using this open-ended approach, the different experiences and different levels and types of exposure to information do not determine the levels of ability identified. Here are some examples from the different parts of the assessment:
In the life science performance assessment, the first task is to notice the characteristics of either flowers or insects and to make groups based on their similarities. In this example, students are not asked to remember classification schemes they have been taught, nor are they asked to identify certain characteristics from a list of those considered to be accurate. They are asked to use their ability to observe and compare characteristics, and to group the flowers or insects based on similarities. These are thinking and problem-solving skills, not memory skills.
In the concept mapping assessments, opportunities are provided for students not only to demonstrate that they know the appropriate connections among concepts but also to make connections that might not be conventional. They also are invited to add examples from their own experience.
Another related recommendation is to include interviews. When an observer sits with a student individually and asks him or her to explain a drawing or a construction or to provide additional information about a grouping of flowers or insects, the student does not have to depend on her or his ability to write an answer. It also gives the observer an opportunity to ask a student to provide additional details or explain an answer that is not clear to the observer. Across the different assessments, interviews seemed most important in the life science performance assessment because students often told more in their interviews than they could demonstrate when grouping flowers or insects or designing their ecosystems. An interview also was included in the physical science (mechanical-technical) assessment. Students were asked to explain how their machines operated. Often students demonstrated the working of the machine with little explanation, but the explanations often helped observers understand the students’ levels of knowledge and experience with mechanical-technical concepts. For students with strengths in visual and spatial abilities, making the product first and explaining it later is an important component. For those with strengths in verbal abilities, being able to explain how something is supposed to work provides important information about students’ abilities.
Use Concept Maps as Assessments of Domain-Specific Ability and Developing Expertise
Conceptions of ability as “developing expertise” rather than a static trait are supported by research more frequently than previous views of ability as stable and unchangeable (Flynn, 1987, 2007; Sternberg, 1999; Subotnik et al., 2011); thus, assessment of the current level of expertise in students is an important way to identify those with exceptional talent. The concept mapping assessment developed by Novak and Gowin (1984) and refined during a number of projects by Ruiz-Primo and colleagues (Ruiz-Primo, 2007; Ruiz-Primo et al., 2001; Ruiz-Primo & Shavelson, 1996) is a promising practice that needs to be extended beyond assessments in STEM to other areas in education of exceptionally talented students. Using concept maps is an excellent way to assess students’ higher levels of thinking and their understanding of concepts and their interrelationships, demonstrating the degree to which their thinking is similar to the types of thinking demonstrated by experts in a domain: organized around key concepts, containing patterns of meaningful information, and showing remote connections among different hierarchies of concepts.
Another important strength of the concept mapping assessment is that it has no ceiling. Even if students have completed maps of the same concepts in the past, as their knowledge base becomes increasingly interconnected and diverse, their scores can increase, demonstrating the ways they have integrated new knowledge and experience into their understanding of the domain. Thus, concept maps can be used as pretest and posttest measures of gains from special programs and can be used to track the development of domain-specific expertise over time as recommended by Sternberg (1999). Students’ maps also can be compared with the maps of experts to determine their progress in development of the types of knowledge structures of experts in each domain (Ruiz-Primo et al., 1997).
Include Measures of Both First-Order and Second-Order Knowledge in Each Domain
In the CDTIS project, six assessments in three domains were included in the identification of students with exceptional talent in STEM: math (math problem-solving and spatial analytical performance assessment), life science (life science concept map and life science performance assessment), and physical science (physics concept map and mechanical-technical performance assessment). The measures of first-order knowledge gave information about students’ problem-solving and creative thinking, creativity-relevant skills and abilities in Amabile’s (1996, 2013) theory, whereas the measures of second-order knowledge gave information about their development of domain-specific expertise (Amabile, 1996, 2013; Sternberg, 1999). While observers are watching students during the performance assessments, they also can evaluate students’ task motivation (Amabile, 1996, 2013; Renzulli, 1978).
If students are being selected for specific domains rather than STEM in general, identification procedures can be based on the two assessments in each domain rather than on all six assessments. All students in a school or at a certain grade level should be assessed, however, rather than relying on teacher recommendations to decide who participates in further assessments. Teachers often overlook those who are creative thinkers and problem solvers (McCoach & Siegle, 2003a), especially if they are not high achievers on conventional measures.
Use Performance Assessments That Include Real-Life Tasks
As recommended by Maker (2005) and Hong and Milgram (2013), the tasks in performance assessments need to be as closely related as possible to the types of problems professionals solve in real-life situations. Based on his research on the prediction of adult accomplishments in STEM, Wallach (1976) recommended “…we should rely not on tests (such as IQ and achievement) but on samples of professional competencies themselves” (p. 57). In the context of identifying exceptional talent in STEM, these tasks need to be related to the problems solved by mathematicians, scientists, and engineers. In the new assessments, for instance, one of the life science tasks is to develop groups of either flowers or insects based on the similarities in their characteristics. Noticing similarities and differences in natural phenomena is an important skill of a life scientist, and one of the major tasks of life scientists is developing and validating classification schemes. In the physical science (mechanical-technical) assessment, the second task is to make a vehicle that functions the way it should function and the third task is to create a new machine. Both of these tasks represent important problem-solving challenges faced by engineers: designing vehicles that function well and developing new and innovative designs that function well or perhaps better than those of the engineers in competitive companies.
Use a Low Cutoff Score (e.g., 2.0 GPA) and Administer Culturally Responsive Assessments
If the use of a GPA cutoff score is important, set it at a low level such as 2.0 and then administer concept maps, math problem-solving, and the performance-based assessments in the domains of interest or in all domains of STEM. If, for instance, life science is the domain of interest, use the life science concept mapping assessment and the life science performance assessment. If physical science is the domain of interest, use the physics concept mapping assessment and the mechanical-technical performance-based assessment. For math, both the math problem-solving and the spatial analytical performance assessments can be used together. As noted above, the spatial analytical assessment can be used alone as a measure of domain-general abilities across STEM areas.
Setting a low cutoff score without additional assessments is not recommended because many students would be selected who would not demonstrate the knowledge and skills needed for exceptional performance and productivity (Subotnik et al., 2011). However, setting a cutoff score that is too high would result in screening out many students who do have those capabilities, as demonstrated in this research.
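The two-stage screening described in this section can be sketched in code. The following is a minimal illustration, not the procedure used in the project: the pairing of assessments with domains follows the text, whereas the function name, the data layout, and the default cutoff of 2.0 are assumptions made for the example. The cutoff step applies only when a GPA cutoff is considered necessary.

```python
# Hypothetical sketch of the two-stage screening described above.
# The domain-to-assessment pairing follows the text; the names and the
# default cutoff value of 2.0 are illustrative assumptions.

ASSESSMENTS_BY_DOMAIN = {
    "math": [
        "math problem-solving",
        "spatial analytical performance assessment",
    ],
    "life science": [
        "life science concept map",
        "life science performance assessment",
    ],
    "physical science": [
        "physics concept map",
        "mechanical-technical performance assessment",
    ],
}


def assessments_to_administer(gpa, domains=None, gpa_cutoff=2.0):
    """Return the assessments a student should take.

    An empty list means the student's GPA is below the (low) cutoff.
    domains=None means STEM in general, so all six assessments are returned.
    """
    if gpa < gpa_cutoff:
        return []
    if domains is None:
        domains = list(ASSESSMENTS_BY_DOMAIN)
    selected = []
    for domain in domains:
        selected.extend(ASSESSMENTS_BY_DOMAIN[domain])
    return selected
```

For instance, `assessments_to_administer(2.49, ["life science"])` returns the life science concept map and the life science performance assessment, whereas a call with no domain specified returns all six assessments.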
Use the Spatial Analytical Assessment as a Domain-General Assessment for STEM
If only one performance assessment can be chosen as a way to identify exceptionally talented students in STEM, the best choice is the spatial analytical assessment. Spatial ability is (a) a key component of the abilities of adolescents who achieve advanced educational and occupational credentials in STEM careers and (b) an important component in educational and occupational settings for all individuals including those who are talented (Wai et al., 2009). In addition, many intellectually talented students are excluded when criteria are restricted to mathematical and verbal ability measures (Wai et al., 2009). Wai and colleagues (2009) found that 70% of the top 1% in spatial ability were not in the top 1% in either math or verbal abilities, and would be missed in many talent searches even though their spatial abilities enabled them to be highly successful in biological sciences, physical sciences, engineering, and mathematics.
A similar pattern can be found in the comparison of M1 and M2 students in the current study. Of the eight students who would not have been accepted into the special internship program because their GPAs were lower than the published cutoff criterion, six had scores of “definitely” or “wow” (4 or 5 on a 5-point scale) on the spatial analytical assessment, the scores used to place students in special programs. One student scored 3 and another scored 2. The average score of these eight students on the spatial analytical assessment was 3.75 (on a 5-point scale) and their average GPA was 2.49 (on a 4-point scale). As noted earlier, as many as 18 out of 23 probably would not have been chosen if one compares their GPAs with those of the students selected through the M1 process. Also relevant is the fact that in other studies of the numbers and percentages of students identified as exceptionally talented using DISCOVER, no ethnic, cultural, or economic biases have been found when using the spatial analytical assessment (Maker, 2020; Sarouphim, 2001, 2002, 2004).
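The reported average of 3.75 can be checked against the score counts given above. The short sketch below is an arithmetic check only: it uses the figures stated in the paragraph (eight students, six scoring 4 or 5, one scoring 3, one scoring 2) and infers which split of 4s and 5s is consistent with the mean. The individual split is not reported in the study, and the variable and function names are illustrative.

```python
# Arithmetic check only; the split between scores of 4 and 5 is inferred
# here from the reported mean and is not a figure reported in the study.
N_STUDENTS = 8
REPORTED_MEAN = 3.75
KNOWN_SCORES = [3, 2]  # the two students who scored below 4
N_HIGH = 6             # students who scored 4 or 5


def consistent_splits():
    """Return (n_fours, n_fives) pairs whose mean matches the reported mean."""
    matches = []
    for n_fives in range(N_HIGH + 1):
        n_fours = N_HIGH - n_fives
        total = sum(KNOWN_SCORES) + 4 * n_fours + 5 * n_fives
        if abs(total / N_STUDENTS - REPORTED_MEAN) < 1e-9:
            matches.append((n_fours, n_fives))
    return matches


print(consistent_splits())  # prints [(5, 1)]
```

Under these figures, the only consistent split is five scores of 4 and one score of 5, which gives a total of 30 points across eight students.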
Conduct More Research on the Validity and Reliability of the Instruments
Because of limited participation at three out of the four partner schools, the pool of participants may not have been representative of the total school population, with a greater percentage of students from homes of parents with higher educational levels, better English skills, higher incomes, and positive attitudes toward educational opportunities. Thus, the differences between those identified using the conventional methods and those identified using the new methods may have been greater than the differences found in the study. Readers are encouraged to continue this line of research, and to evaluate both the psychometric properties and the practicality of the new assessments. At this time, we have not conducted predictive validity or concurrent validity studies on the new instruments, but these will be essential to the future use of the new assessments.
Conclusion
Many students from cultural groups and low-income families usually underrepresented in special programs were identified as exceptionally talented using the new measures developed during the CDTIS project. These often-overlooked students were given an opportunity to participate in a special internship program at an R1 university in the Southwest. Reports from teachers, graduate students, and observers confirmed that although some students from both the M1 and M2 groups needed additional academic and personal support, both groups were successful in the internship program, demonstrating that students from all cultures can be successful in high-level educational programs. Students participated in original research, created posters about their work, presented to a community of researchers and parents, and also presented to school boards and other interested groups in their local communities. Some also returned in subsequent years as interns to assist other students in the program. Many have applied and been accepted to the R1 university where the internship was held, and many have applied and been accepted to other universities in the state and region. Students identified by the new assessments (the M2 group) believed that the program had a significant impact on their goal setting and planning for the future (Wu et al., 2019).
Another important conclusion is that the use of performance-based assessments and other similar measures is important for students from low-income and culturally diverse groups, and that both the students and their parents need to be given the results in a format they can understand (Pease et al., 2020). Students and their parents, especially parents without college degrees, need encouragement and assistance in identifying potential opportunities to develop their abilities (Olszewski-Kubilius et al., 2015; Subotnik et al., 2011). Special programs that capture students’ interest, enable them to engage in original research (Feist, 2006a, 2006b), and act as catalysts for their motivation are essential to their continued participation in STEM experiences and careers (Subotnik et al., 2011). Finally, in the words of Joseph and Ford (2006), “. . . every effort must be made to ensure that belief systems, tests, policies, and procedures do not serve as gatekeepers that close the doors of opportunity for diverse students” (p. 50).
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Science Foundation (Grant 1321190, Cultivating Diverse Talent in STEM—principal investigator [PI], Uwe Hilgert; co-PIs, C. June Maker, Frans Tax, and Martha Lindsey), The University of Arizona, Harold Begay, and Tuba City Public Schools.
