Abstract
The current study examined the effects of higher order thinking skills (HOTS) interventions with gifted students in Taiwan. A total of 25 studies published between 1997 and 2017 were included, from which 29 effect sizes were extracted. The small number of existing studies indicates a lack of scholarly attention to HOTS in gifted education in Taiwan in the past two decades. On the other hand, the effect sizes, which ranged from 0.26 to 2.01 (mean = 0.78, standard deviation = 0.39), were moderately large, which can be interpreted as evidence for the general effectiveness of these interventions. Subgroup analyses indicated that intervention effects did not vary significantly by grade level, type of program, intervention dosage, or type of dissemination. However, a statistically significant difference was found between the effect sizes for different types of instructional design (i.e. stand-alone HOTS unit vs. integrated HOTS unit). Implications are discussed.
In today’s fast-changing world, students are required to not only acquire knowledge but also learn skills that help them synthesize and generate knowledge (Ananiadou and Claro, 2009). Thus, higher order thinking skills (HOTS) are highly valued. According to the World Economic Forum (2016), large global firms forecasted that by 2020, creativity will be one of the top three most desired job skills, along with critical thinking and complex problem-solving skills. In recent years, many education systems, including that of Taiwan, have been responding to these emergent demands (Ananiadou and Claro, 2009). While many studies in North America have shown that HOTS can be enhanced via instructional interventions (e.g. Abrami et al., 2008; Yasin and Yunus, 2014), there is no study that demonstrates the effectiveness of HOTS interventions in Taiwan, where HOTS may be viewed differently (Chan, 2013). The main purpose of this study is to explore the nature and the effectiveness of HOTS instructional interventions conducted in Taiwanese K-12 settings.
An overview of gifted education in Taiwan
Gifted education in the Taiwanese public K-12 system started in 1973 when the Taiwanese Ministry of Education launched a pilot program for intellectually gifted elementary students (Wu, 2013). Today, students identified as gifted by local school districts in Taiwan are served through one of the following two schemes. First, there are school-based pullout resource classrooms, in which gifted students remain mostly in regular classrooms while receiving 6–10 hours of enriched curriculum weekly in a resource classroom. Second, there are district-based gifted programs, in which students are enrolled in part-time programs coordinated by several schools via resource sharing. These part-time programs are typically held during weekends, winter vacation, or summer vacation and may last for several weeks. The content of the programs is flexible and may involve independent studies of natural sciences, social sciences and humanities, leadership, and creativity. Generally speaking, enrichment is the primary form of gifted education in Taiwan. Only a few students each year qualify for grade skipping or other forms of acceleration.
Embracing evidence as a foundation for practice in Taiwan
Gifted education policies in Taiwan have drawn inspiration from the North American system since their onset (Wu, 2013). Recently, evidence-based practices (Collinson et al., 2009) have emerged as critical for decision-making in Taiwan.
In 2002, the United States launched the No Child Left Behind Act (NCLB), which altered the discourse of evidence-based practices in education (Edyburn, 2010). NCLB stressed efforts to encourage empirically supported pedagogical techniques that could meet the growing demands for increased accountability and improved outcomes through rigorous scientific research (Robinson et al., 2007). Evidence-based practices have since influenced many professions, including gifted education (Adelson and Matthews, 2019; Callahan and Moon, 2007; Cook and Odom, 2013; Groccia and Buskist, 2011; McBee et al., 2018; Reis et al., 2007). Although evidence-based practices are considered optimal for teachers and policy makers in Taiwan (Hung et al., 2015; Tseng et al., 2011), there is a lack of evidence-based inquiries in gifted education.
The importance of meta-analysis on gifted education
Steenbergen-Hu and Olszewski-Kubilius (2016) argued that meta-analyses can make unique contributions to the field of gifted education. First, the findings are reliable since the results derive from systematic and replicable steps (see also Fidler, 2010). Second, by summarizing the current status of evidence, meta-analyses provide opportunities for researchers to situate their insights within the big picture. Third, meta-analyses allow researchers to examine the effects of a large number of independent variables and potential moderating influences. Similarly, Vaughn et al. (1991) have argued that meta-analyses are a more comprehensive method for conducting program evaluations in gifted education given that one could examine a wide range of independent and moderating variables simultaneously.
In the past few decades, a large number of studies on gifted education have been conducted in Taiwan (Wu, 2000; Wu and Cho, 1993; Yu et al., 2017), most of which have shown statistically significant results. However, as Warner (2008) indicated, statistical significance alone is not a warrant of practical usefulness, unlike effect size, which allows researchers and practitioners to look at the magnitude of the obtained difference between the sample mean and the hypothesized population mean. In addition, existing meta-analyses on the effects of intervention in the Taiwanese context are scarce. A review of the literature only identified one meta-analysis published in 2002, which investigated the overall effect sizes of selected instructional implementation on creative thinking performance (Peng, 2002).
Current study
While the current study focuses on the effects of empirical studies conducted in Taiwan, it nevertheless sheds light on programmatic features that can apply to the competency-focused, process-based 21st-century pedagogical approaches more generally. Through a meta-analysis, this study seeks to answer the following questions: What is the state of HOTS studies in the field of gifted education in Taiwan over the past two decades (1997–2017)? How effective were the HOTS interventions for gifted students in Taiwan? What study characteristics produced better outcomes?
Conceptual and operational definitions
Gifted students
The identification of giftedness in Taiwan is based on multiple criteria and follows a systematic procedure (Kao, 2012). According to The Implementation Regulations of Disability and Gifted Students Identification Process in Taiwan (2013), students identified as gifted must meet the following criteria: (a) a score at least 2 standard deviations above the mean, or above the 97th percentile, on an approved standardized individual intelligence test and (b) evidence of the applicant’s advanced learning needs (e.g. teachers’ responses on observational scales and recommendation letters). In the present study, gifted students refers to those who had been identified during their compulsory education (grades 1–12).
Higher order thinking skills
The term “higher order thinking” has been referred to variously as reflective thinking, higher level reasoning, synthesis, evaluation, creative thinking, critical thinking, and problem-solving (Bloom et al., 1956; Glaser, 1984; Lewis and Smith, 1993; Newmann, 1988; Norris and Ennis, 1989). Despite critiques of the overlapping terminologies and unclear definitions of higher order thinking, most researchers have agreed that it involves complicated cognitive activities such as reasoning and problem-solving, integrating and synthesizing information, and thinking creatively and yielding multiple solutions (Barak and Dori, 2009; Grossen, 1992; Lewis and Smith, 1993; Philp, 1985; Sternberg and Lubart, 1996; Zohar, 2006). In the current study, HOTS is summarized and partitioned into four abilities: creative thinking, critical thinking, problem-solving ability, and reflective thinking.
Method
Shokraneh (2019) recommended documenting the strategies and steps adopted in a meta-analysis to facilitate reproducibility and later updates. In this study, the analytical strategies and steps adopted were as follows.
Search for eligible studies
To ensure a full coverage of empirical studies on the topic, we searched peer-reviewed journal articles, master’s theses, and doctoral dissertations through the following electronic databases: Airiti Library Collection, Index to Taiwan Periodical Literature System, ERIC Database, National Digital Library of Theses and Dissertations in Taiwan, and Google Scholar. The time span of the study search was set from 1997 to 2017, a period of 20 years.
The literature search was primarily conducted using the following keywords: gifted students, higher order thinking, thinking strategies, thinking skills, gifted programs, creative thinking, critical thinking, problem-solving skills, reflective thinking, and all possible combinations and permutations of these terms. Studies containing these keywords in their titles or abstracts were initially selected and individually reviewed to find additional references. Moreover, we conducted more specific searches focusing on research published in the following Taiwanese gifted education and special education journals: Bulletin of Special Education (ISSN: 1026-4485), Journal of Gifted Education (ISSN: 1561-3801), Journal of Special Education (ISSN: 1561-3798), and Special Education Forum (ISSN: 1994-1935). Lastly, we examined the references listed in the included studies for additional studies that could also be included. The first author held a professorship in gifted education, whereas the second author was a doctoral candidate in gifted education. Both the first and second authors were well trained to conduct a systematic literature search.
Study inclusion and exclusion criteria
Once the initial search of studies was completed, these studies were further screened in the following three stages.
Screening I
An initial screening was conducted by examining the following criteria. (a) Subject of the study: The subject of a selected study needed to speak to HOTS interventions or programming for gifted students in a K-12 setting. (b) Outcomes of the study: The outcomes of a selected study had to present changes in HOTS performance among the participants. Students’ performance was used as the dependent variable in this meta-analysis (Henfield et al., 2017). (c) Completeness and nonredundancy: To avoid a single study exerting a disproportionate influence on the overall results, the content of a selected study was reviewed to ensure that there was no overlap with other studies that derived from the same source of research findings. If the same study was reported in more than one article, only the most complete version was included in this analysis.
Screening II
After the initial screening, we further selected studies that consisted of appropriate research methods and the establishment of comparison groups. First, an experimental design or a quasi-experimental design was required. Studies selected for this meta-analysis had to include pre- and posttest measurements. Second, studies selected had to include appropriate comparison groups. The treatment groups and control groups had to be matched in terms of subjects’ major aptitudes. Appropriate comparisons included, for example, specific teaching methods for gifted students versus general teaching methods for gifted students (Kim, 2016; Steenbergen-Hu and Moon, 2011).
Screening III
Finally, measurement instruments and statistical results were examined before the final inclusion of the study. To be included, the study must (a) use standardized tests to measure HOTS and (b) contain sufficient statistical information for effect size extraction (e.g. means and standard deviations, p values, analysis of variance [ANOVA] tables) or other essential statistics (e.g. t values, F values).
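As an illustration of why t values or F values can count as sufficient information for effect size extraction, the following minimal sketch converts them to a standardized mean difference using the standard two-independent-groups conversions. The function names and the example numbers are hypothetical, and this is not necessarily the exact routine applied to the included studies.

```python
import math

def d_from_t(t, n_treatment, n_control):
    """Standardized mean difference from an independent-samples t value."""
    return t * math.sqrt((n_treatment + n_control) / (n_treatment * n_control))

def d_from_f(f, n_treatment, n_control, treatment_higher=True):
    """Standardized mean difference from a two-group one-way ANOVA F value.

    With exactly two groups, F equals t squared, so the sign is lost and
    has to be supplied from the reported group means.
    """
    if f < 0:
        raise ValueError("F cannot be negative")
    t = math.sqrt(f)
    return d_from_t(t if treatment_higher else -t, n_treatment, n_control)

# Hypothetical study: t = 2.0 with 16 students per group.
d_example = d_from_t(2.0, 16, 16)
```

The F conversion only applies when the ANOVA compares two groups; with more groups, the F value alone does not identify a single pairwise difference.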
The initial search resulted in 245 studies. Twenty-five studies met all of our inclusion criteria (see Figure 1), including 17 (68%) studies on creative thinking, 6 (24%) studies on critical thinking, and 6 (24%) studies on problem-solving. Among the included studies, three studies provided results for both creative thinking and problem-solving and one study provided results for both creative thinking and critical thinking.

Figure 1. Flow diagram of study selection process.
Study coding
Pilot coding was first performed with three studies that led to some revisions to the initial coding sheet. The final code set contained the following two dimensions: (a) general features of the study and (b) format of the data in the study.
The first dimension included the following information: (a) contributor(s) of the article, (b) publication year, (c) type of article (i.e. journal article, thesis, or dissertation), (d) research design (i.e. experimental, quasi-experimental), (e) sample size, (f) demographics of the research participants, (g) program type (e.g. pullout gifted resource class, district-based gifted program), (h) treatment design and implementation, and (i) outcome measures (e.g. type of instrument, content areas to be measured).
The second dimension included the following information: (a) mean, standard deviation, and sample size of each study, (b) test statistics and associated degrees of freedom, and (c) effect size. Since most studies contained multiple outcome variables, only outcomes that directly assessed HOTS were coded. When difficulties in coding arose, further discussion was employed to resolve these difficulties. Interrater reliabilities were good to excellent (Cohen’s κ = 0.80), and differences were settled through discussion.
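For readers unfamiliar with the agreement index reported above, the following is a minimal sketch of Cohen’s κ for two coders who each assign one category per study. The coder lists are invented for illustration and are not the actual coding data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical program-type codes for ten studies (PRC or DBGP).
coder_1 = ["PRC", "PRC", "DBGP", "PRC", "PRC", "DBGP", "PRC", "PRC", "PRC", "DBGP"]
coder_2 = ["PRC", "PRC", "DBGP", "PRC", "DBGP", "DBGP", "PRC", "PRC", "PRC", "DBGP"]
kappa = cohens_kappa(coder_1, coder_2)
```

Because κ discounts agreement expected by chance, it is lower than the raw percentage of identical codes whenever one category dominates.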
Effect size calculation
Effect sizes were used for this meta-analysis to summarize the findings of the selected studies. An effect size speaks to the strength of the relationship between two variables and reflects the magnitude of a treatment effect. Lipsey and Wilson (2001) argued that effect sizes are biased if they are based on small sample sizes, particularly for samples of fewer than 20 participants. Since the samples in many of the studies of gifted students in this meta-analysis are quite small, Hedges’ g was chosen as the effect size index to correct for this small-sample bias (Borenstein, 2009). Generally, effect sizes of the research outcomes included in the studies were calculated based on the raw means, standard deviations, and sample sizes reported in the studies. We subtracted the mean of one group from that of the other (M1 − M2) and divided the difference by the pooled standard deviation. If the comparison involved a single sample measured before and after the intervention, the d-index was calculated first and then converted to Hedges’ g. Cohen’s (1988) criteria were used to interpret standardized mean differences and summarized effect sizes as small (≤0.20), medium (0.5), and large (≥0.80). Positive effect sizes were interpreted as treatment groups having stronger results than control groups.
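The calculation described above can be sketched as follows for two independent groups. This is a minimal illustration that assumes the commonly used approximate correction factor J = 1 − 3/(4df − 1); the function name and the example scores are hypothetical.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for two independent groups."""
    df = n_t + n_c - 2
    # Pooled standard deviation across treatment and control groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Approximate small-sample correction factor (assumed form).
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d

# Hypothetical posttest scores: treatment 55 (SD 10), control 48 (SD 10), 15 per group.
g, var_g = hedges_g(55.0, 10.0, 15, 48.0, 10.0, 15)
```

With 15 students per group the correction shrinks d by just under 3%, which shows why the adjustment matters mainly for very small samples.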
Borenstein et al. (2009) indicated that since interventions and outcome measures across different studies are not exactly the same, a random-effects model is preferred over a fixed-effects model in a meta-analysis. Therefore, we decided to use the random-effects model to estimate the effect size. The Comprehensive Meta-Analysis 3.0 software package (CMA) was used for effect size synthesis and moderator analyses. Moreover, since most of the studies contained more than one effect size, the average of these results was calculated to represent each outcome measure, as recommended by Cooper (2010).
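To make the random-effects estimate concrete, the sketch below pools a set of effect sizes with the DerSimonian–Laird method-of-moments estimator. The choice of estimator is an assumption made for illustration (the text does not state which estimator the software applied), and the input values are invented.

```python
import math

def random_effects_pool(effects, variances, z_crit=1.96):
    """Pool effect sizes with DerSimonian-Laird random-effects weights."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    # Fixed-effect (inverse-variance) weights and mean, used to obtain Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "mean": mean,
        "se": se,
        "ci": (mean - z_crit * se, mean + z_crit * se),
        "z": mean / se,
        "tau2": tau2,
        "q": q,
    }

# Hypothetical Hedges' g values and their variances for three studies.
pooled = random_effects_pool([0.3, 0.8, 1.2], [0.04, 0.05, 0.06])
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same mean.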
Heterogeneity analysis
A random-effects model allows true effect sizes to vary from study to study. In this case, the observed effects varied from one study to another for two reasons. One was the real heterogeneity in effect size (i.e. treatment characteristics), and the other was the within-study error (i.e. sampling error). Heterogeneity analysis is used to examine whether sampling error alone might be responsible for the variance among the effect sizes (Borenstein and Higgins, 2013; Cooper, 2010; Higgins and Green, 2008; Steenbergen-Hu and Olszewski-Kubilius, 2016). A set of heterogeneity statistics was computed with the CMA software.
We examined both Q_T and I² statistics to decide whether there was significant variability across studies. The Q_T statistic is a heterogeneity statistic commonly used in assessing the collection of effect sizes (Borenstein et al., 2009; Steenbergen-Hu and Olszewski-Kubilius, 2016). A significant Q_T value indicates large variability across the effect sizes that is greater than what is likely to have resulted from subject-level sampling error alone. Statistically significant heterogeneity statistics indicate that moderator variables may account for variability in effect size (Lipsey and Wilson, 2001). However, Q_T is known for being sensitive to sample sizes (Abrami et al., 2008). Higgins and Thompson (2002) recommend the use of I² as a complement to determine how much of the variance between studies is due to true variance rather than sampling error. I² is the ratio of between-study variance and total variance, which represents the true heterogeneity of total variance across the observed effect estimates. It ranges from 0% to 100%. Based on Higgins and Green’s (2008) guide for I² interpretation, an I² of 0–40% indicates “likely not important,” 30–60% is associated with “possible moderate heterogeneity,” 50–90% is associated with “possible substantial heterogeneity,” and 75–100% is associated with “considerable heterogeneity.” Heterogeneity was considered significant for p values of a Q_T statistic less than 0.05 and I² greater than 50%.
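The decision rule stated above can be sketched in a few lines. The code assumes the usual definition I² = (Q − df)/Q, truncated at zero, with df equal to the number of effect sizes minus one, and it evaluates the chi-square tail probability with a closed-form recurrence for integer degrees of freedom. All function names and the example values are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)             # regularized Q(1, y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))  # regularized Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return tail

def i_squared(q, df):
    """I^2 as a percentage, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

def heterogeneity_is_significant(q, df, alpha=0.05, i2_cutoff=50.0):
    """Apply the two-part rule: p(Q) < alpha and I^2 > cutoff."""
    return chi2_sf(q, df) < alpha and i_squared(q, df) > i2_cutoff

# Hypothetical example: Q = 60 across 21 effect sizes (df = 20).
flag = heterogeneity_is_significant(60.0, 20)
```

Requiring both conditions guards against treating a significant Q as meaningful when only a small share of the observed variance reflects true differences between studies.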
Subgroup analysis
To identify variables that may affect the interventions, we categorized the effect sizes into several groups and conducted subgroup analyses with random-effects ANOVA-like procedures for meta-analysis (Henfield et al., 2017). As Lipsey and Wilson (2001) suggested, there are three types of independent variables commonly used for meta-analysis: substantive variables, method variables, and extrinsic variables. Based on our interests and research purpose, these variables were categorized within the subgroup analysis. The substantive variables included the grade level, intervention dosage, implementation approach, and type of program. The extrinsic variable was the type of dissemination (i.e. published/unpublished) (Hedges and Olkin, 1985; Henfield et al., 2017). A subgroup analysis partitions the total variance (Q_T) into a between-group component (Q_B) and a within-group component (Q_W). A statistically significant difference between the group means would indicate that the effects of HOTS interventions differed across the defined subgroups (Ellis, 2010; Pertti et al., 2008).
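The partition of the total heterogeneity into between-group and within-group parts can be sketched as below. For simplicity the sketch uses inverse-variance weights taken directly from the variances passed in; to approximate a random-effects analysis, a caller would first add an estimate of the between-study variance to each variance. That step, the group labels, and the numbers are all assumptions for illustration.

```python
def _weighted_mean(pairs):
    """Inverse-variance weighted mean and total weight of (effect, variance) pairs."""
    weights = [1.0 / v for _, v in pairs]
    mean = sum(w * g for w, (g, _) in zip(weights, pairs)) / sum(weights)
    return mean, sum(weights)

def _q(pairs):
    """Weighted sum of squared deviations from the weighted mean."""
    mean, _ = _weighted_mean(pairs)
    return sum((g - mean) ** 2 / v for g, v in pairs)

def partition_q(groups):
    """Split Q_T into Q_B and Q_W for a dict of group -> [(effect, variance)]."""
    pooled = [pair for pairs in groups.values() for pair in pairs]
    grand_mean, _ = _weighted_mean(pooled)
    q_between = 0.0
    for pairs in groups.values():
        group_mean, group_weight = _weighted_mean(pairs)
        q_between += group_weight * (group_mean - grand_mean) ** 2
    return {
        "Q_T": _q(pooled),
        "Q_B": q_between,
        "Q_W": sum(_q(pairs) for pairs in groups.values()),
        "df_B": len(groups) - 1,
    }

# 0.05 critical values of the chi-square distribution for 1 and 2 df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991}

# Hypothetical effect sizes for stand-alone (SU) and integrated (IU) units.
result = partition_q({
    "SU": [(0.6, 0.05), (0.8, 0.04)],
    "IU": [(1.1, 0.06), (1.2, 0.05)],
})
groups_differ = result["Q_B"] > CHI2_CRIT_05[result["df_B"]]
```

Q_B is compared with a chi-square distribution whose degrees of freedom equal the number of subgroups minus one, so a two-group comparison uses 1 df and a three-group comparison uses 2 df.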
Assessing publication bias
Publication bias refers to the fact that significant results are more likely to be published than nonsignificant ones, which may lead to an overestimation of the effects found in meta-analyses (Rothstein et al., 2005). To identify possible publication bias, visual inspection of a funnel plot and the trim-and-fill procedure were used in the present study (Borenstein et al., 2009). A funnel plot displays the effect size of each study against a measure of its precision, such as the standard error; in the absence of bias, the points form a roughly symmetric, funnel-shaped distribution. Symmetrical plots can therefore be interpreted as indicating an absence of publication bias. Asymmetrical data can be adjusted by using the trim-and-fill procedure on a precision plot (Duval and Tweedie, 2000). In the trim-and-fill procedure, extreme effect sizes from the skewed side of the funnel plot are removed and projected missing effect sizes are imputed. The procedure is an iterative process that adjusts the overall effect size by identifying the number of missing studies that would balance the plot to provide an unbiased estimate of effect size. The degree of difference between the adjusted and the observed mean effect size indicates the magnitude of the impact of the publication bias on the mean effect size (Steenbergen-Hu and Olszewski-Kubilius, 2016).
Results
Among the 25 studies, 14 studies were unpublished master’s theses and 11 studies were journal articles. The majority (22 studies) were conducted in elementary school settings (grades 3–6) and the rest (3 studies) were conducted in junior high school settings (grades 7 and 8). Among the 22 studies conducted in elementary school settings, 16 included creative thinking skills, 6 included problem-solving skills, and 4 included critical thinking skills. Three studies examined both creative thinking and problem-solving skills. Among the three studies conducted in junior high school settings, two focused on critical thinking skills, whereas the remaining one focused on creative thinking skills.
In terms of the research designs employed by the studies, 22 used a quasi-experimental, nonequivalent pre–post-test design, while the remaining 3 used a single group pre–post-test design. The participants in both experimental and control groups in these studies were all enrolled in gifted programs. The range of the sample sizes was between 11 and 76, while the pooled sample size was 995 (experimental group = 511, control group = 484). Table 1 provides an overview of the descriptive information of the included studies. The following further describes the characteristics of the intervention programs and the characteristics of the studies.
Table 1. Summary of included studies.
Note: Type of programs: PRC: pull-out resource classroom; DBGP: district-based gifted program. Type of studies: P: published; UP: unpublished. Outcome measure: CRT: creative thinking; CT: critical thinking; PS: problem-solving. Measurement: TTCT: Torrance Tests of Creative Thinking; CAP: Creativity Assessment Packet; NCTT: New Creative Thinking Test; CTT1: Critical Thinking Test–Level 1; QDTCT: The Questionnaire of Dispositions Toward Critical Thinking; CCTTX: Cornell Critical Thinking Test–Level X; TPS: Test of Problem Solving; CS-CPS: Children Scientific Creative Problem-Solving. Content: SU: stand-alone unit; IU: integrated unit.
Characteristics of intervention
In terms of the intervention programs, diverse gifted education programs were used to improve different thinking skills. For creative thinking skills, the most common teaching method in the primary studies was the Creative Problem Solving (CPS) program (three studies; Chiang, 2005; Chou, 2004; Xie and Chang, 2016). Similarly, the CPS program was the most commonly used method for improving problem-solving abilities (two studies; Chiang, 2005; Xie and Chang, 2016). For critical thinking skills, the most widely used teaching methods were critical thinking skills programs (two studies; Chou, 2010; Lin, 2001) that involved five stages: identifying problems, deduction, induction, exploration, and evaluation.
Concerning the intervention dosage, the intervention period varied from 10 sessions to 38 sessions. Studies with interventions lasting from 10 to 20 sessions were coded as low (12 studies), from 21 to 30 sessions as moderate (9 studies), and 31 or more sessions as high (4 studies). Overall, intervention sessions were implemented one to two times per week for 40–80 min.
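The dosage coding rule can be written as a small function. The sketch assumes that a study with exactly 31 sessions falls in the high category, which is one reading of the cut points above; the function name is hypothetical.

```python
def code_dosage(sessions):
    """Map a number of intervention sessions to a dosage category."""
    if sessions < 1:
        raise ValueError("sessions must be a positive count")
    if sessions <= 20:
        return "low"
    if sessions <= 30:
        return "moderate"
    return "high"  # assumption: 31 or more sessions

# The shortest and longest interventions among the included studies.
shortest, longest = code_dosage(10), code_dosage(38)
```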
With regard to outcome variables, diverse standardized instruments were used to measure the HOTS. For creative thinking performance, the most common outcome variables in the primary studies were standardized aptitude test results, such as the Torrance Tests of Creative Thinking, the Creativity Assessment Packet, and the New Creative Thinking Test. For critical thinking performance, the most widely studied variables were produced from the Critical Thinking Test–Level 1, the Questionnaire of Dispositions Toward Critical Thinking, and the Cornell Critical Thinking Test–Level X. Finally, the Test of Problem Solving and the Children Scientific Creative Problem Solving were used for estimating problem-solving ability performance.
Random-effects model results
In total, 29 effect sizes were extracted from the 25 studies. Table 2 presents the effect sizes in individual studies analyzed and relevant statistical information, including Hedges’ g, standard error, variance, the lower limit and upper limit of the 95% confidence interval (CI) for the effect size, Z value, and the p value. The random-effects model demonstrated that the effect sizes ranged from 0.26 to 2.01. A medium effect was found on the mean effect size (g = 0.76, 95% CI [0.62, 0.90]), which implied a general effectiveness of the HOTS interventions. The grand effect size between the treatment and control groups was significantly different from zero (Z = 10.48, p < 0.001). These results revealed that students receiving teaching interventions performed better than those in the control group on HOTS measures.
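As a rough consistency check on the reported values, the standard error can be recovered from the width of the 95% CI and used to reproduce the Z value. The arithmetic below assumes a normal-theory interval (g ± 1.96 SE) and uses only the rounded figures given above.

```latex
\[
SE \approx \frac{0.90 - 0.62}{2 \times 1.96} \approx 0.071,
\qquad
Z = \frac{g}{SE} \approx \frac{0.76}{0.071} \approx 10.6
\]
```

The result is close to the reported Z = 10.48; the small gap is consistent with g and the CI limits having been rounded to two decimal places.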
Table 2. Summary of combined effect sizes for HOTS.
Note: HOTS = higher order thinking skills; SE = standard error. Heterogeneity: Q = 43.18, p = 0.03; I² = 35%.
The test for heterogeneity yielded a Q statistic of 43.18 (p < 0.05), indicating some variability among the effect sizes. The I² statistic, however, was 35%, which falls within the 0–40% range that Higgins and Green’s guideline describes as likely not important. Hence, the heterogeneity did not meet our predetermined criteria (a significant Q statistic together with an I² greater than 50%).
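The I² value can be checked directly from the reported Q statistic. The calculation below assumes the usual definition of I² and takes the degrees of freedom as the number of effect sizes minus one (29 − 1 = 28).

```latex
\[
I^2 = \frac{Q - df}{Q} \times 100\%
    = \frac{43.18 - 28}{43.18} \times 100\%
    \approx 35.2\%
\]
```

This agrees with the 35% given in the note to the summary table of combined effect sizes and lies below the 50% threshold set in the Method section.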
Outcome analysis
To examine the effect of selected instructional implementation on different HOTS, effect sizes were grouped by different dimensions of HOTS (see Table 3). Three types of HOTS were treated here as dependent variables, including creative thinking skills (k = 17), critical thinking skills (k = 6), and problem-solving abilities (k = 6). Under a random-effects model, the effects of HOTS interventions were significantly different from zero (p < 0.01) for creative thinking, critical thinking, and problem-solving. In addition, large effect sizes were found on creative thinking skills (g = 0.90, 95% CI [0.71, 1.10]), medium effects were found on critical thinking skills (g = 0.71, 95% CI [0.43, 0.99]), and small effects were found on problem-solving abilities (g = 0.47, 95% CI [0.21, 0.73]). The effect sizes were significantly higher for creative thinking skills than for critical thinking skills and problem-solving abilities (Q_B = 6.98, p < 0.05). The results supported the conclusion that the instructional interventions examined in this meta-analysis had a significantly positive effect on different types of HOTS.
Table 3. Mean effect sizes on the three categories of HOTS.
Note: HOTS = higher order thinking skills; SE = standard error; k = number of effect sizes included in the analysis. Significance markers in the table: b, p < 0.05; c, p < 0.01.
Subgroup analyses
Several coding characteristics (i.e. grade level, type of program, intervention dosage, instructional design, and type of dissemination) were selected as independent variables for a subgroup analysis. These variables are subsequently discussed. For a summary of all subgroup analysis results, see Table 4.
Table 4. Results of subgroup analysis.
Note: SE = standard error; k = number of effect sizes included in the analysis. Significance marker in the table: b, p < 0.01.
Grade level
For grade level, effect sizes were grouped by studies including grades 3–4 (k = 8), grades 5–6 (k = 16), and grades 7–8 (k = 3). With the random-effects model, the effect sizes for students who are gifted in grades 3–4 and 5–6 in elementary school and grades 7–8 in junior high school were significantly greater than zero. Among the studies, the largest effect size was observed for gifted students in junior high school (g = 1.21), followed by the effect size for grades 3–4 (g = 0.84) and grades 5–6 (g = 0.70). However, the mean effect sizes of studies with students at different grade levels did not differ from each other (Q_B = 3.40, p > 0.05). The results suggested HOTS improvement across grade levels in this meta-analysis.
Type of program
For the type of program, effect sizes were grouped by studies including pullout gifted resource classes (k = 26) versus district-based gifted programs (k = 3). In examining the type of program as a moderator, the effect sizes of both pullout resource class and district-based gifted program were significantly different from zero (p < 0.01), with average effect sizes of 0.82 and 0.54, respectively. There was no significant difference between the effect sizes in different types of programs (Q_B = 2.94, p > 0.05). The results suggested that the type of program did not influence the effect sizes of the intervention.
Intervention dosage
For intervention dosage, effect sizes were grouped into low (k = 12), moderate (k = 10), and high (k = 7) dosage of intervention. With the random-effects model, all the effect sizes of low, moderate, and high dosage of intervention were significantly greater than zero (M = 0.76, 0.61, and 1.04, respectively; for all, p < 0.05). There was no statistically significant group difference in intervention dosage (Q_B = 3.41, p > 0.05). The results suggested that the effects of selected instructional implementation did not vary significantly by the length of instruction.
Instructional design
Regarding instructional design, we compared the effect sizes of two models: stand-alone HOTS units developed by researchers versus integrated HOTS units derived from a regular curricular subject (e.g. social sciences). Units derived from a regular curricular subject produced a significantly greater intervention effect (g = 1.12), compared with independent units developed by researchers (g = 0.70).
Type of dissemination
With regard to types of dissemination, effect sizes were grouped by peer-reviewed journal articles (k = 15) and unpublished master’s theses (k = 14). Under the random-effects model, the effect sizes of both published and unpublished studies were significantly greater than zero (M = 0.69, 0.90, respectively; for both subgroups, p < 0.01). There were no statistically significant group differences between types of dissemination (Q_B = 2.67, p > 0.05). The results suggested that types of dissemination in these studies did not correlate with the effectiveness found in the interventions.
Publication bias
The assessment of publication bias for the meta-analysis of HOTS was performed on the 29 effect sizes extracted from the selected studies, of which 15 (51.7%) were extracted from published journal articles and 14 (48.3%) were extracted from unpublished master’s theses. The funnel plot in Figure 2 showed the effect sizes of the observed studies and four imputed studies. Through visual inspection, asymmetry was observed in the funnel plot, which indicates the probable presence of publication bias. The trim-and-fill method was conducted to impute the studies and adjust the effect size. After replacing the trimmed studies, the mean effect size for the combined studies moved from 0.76 (95% CI [0.62, 0.90]) to 0.84 (95% CI [0.69, 0.97]). The adjustment was small, so the effect of publication bias on the results of this meta-analysis appears limited. Because the adjusted estimate is higher than the observed one, the original effect size (g = 0.76) may, if anything, have been biased slightly downward.

Figure 2. Funnel plot of standard error by Hedges’ g.
Discussion and implications
The present meta-analysis aimed to answer three research questions. The first question was concerned with the overall effect of HOTS instructional interventions pertaining to gifted students in the existing literature in Taiwan. According to Cohen’s criteria, the overall effect size of 0.76 is considered between medium and large. The results suggest that HOTS could be enhanced through effective teaching methods (e.g. The Cognitive Research Trust [CoRT] program (de Bono, 1987), Discovering Intellectual Strengths and Capabilities while Observing Varied Ethnic Responses [DISCOVERY] (Maker, 2001)).
The second research question focused on the effect sizes of specific types of HOTS. To answer this question, we categorized the HOTS into three types of thinking skills by measured outcomes and treated these thinking skills as dependent variables. Our analysis revealed an overall positive effect consistent with findings from previous systematic reviews conducted outside of Taiwan (e.g. Abrami et al., 2008; Huber and Kuncel, 2016; Lee et al., 2016; Niu et al., 2013; Peltier and Vannest, 2017; Yasin and Yunus, 2014). In addition, the interventions on creative thinking skills demonstrated the largest effects among the three types of HOTS. This may be because creative thinking skills were the most commonly taught to gifted students in Taiwan, so students had more opportunities to practice them.
The third research question focused on which study characteristics produced the largest intervention effect. To answer this question, we evaluated group differences among the following variables: (a) grade level, (b) type of program, (c) intervention dosage, (d) instructional design, and (e) type of dissemination. For grade level, the intervention effects were statistically significant at all grade levels. However, the differences in the effect sizes for the grade-level comparisons were not statistically significant at the 0.05 level. The results indicate that HOTS interventions are appropriate for all of the grade levels included in this meta-analysis. This result should be interpreted with caution, however, since the number of studies conducted in junior high school settings may be too small to provide reliable insights.
Similarly, the intervention effect was statistically significant for both types of programs (pullout gifted resource classrooms or district-based gifted programs), with no significant difference in the effect sizes found between the two types. This finding suggests that instruction carried out through either scheme could improve gifted students’ HOTS. However, the number of studies for the district-based gifted program might be too small to detect any differences in effects, and the finding should be interpreted with caution.
The intervention dosage did not significantly affect the HOTS effect sizes in this study, suggesting that the effects of HOTS instruction did not vary with the length of the intervention. More importantly, the average effect size for HOTS instruction was significantly greater than zero for each category of intervention dosage. The finding suggests that there is no reason to expect markedly larger effects from longer instruction. When thinking about adding HOTS to their regular teaching repertoire, teachers can be confident that it is a matter of focus (i.e. reminding students of the importance of HOTS) rather than a matter of length.
In terms of the comparison between the two instructional design methods, the effect was statistically significant for both methods. Unlike the other moderators, however, the difference between the two effect sizes was also statistically significant. This suggests that an integrated unit derived from a regular curriculum subject was more effective than a stand-alone unit designed specifically to enhance HOTS. This finding is promising for educators since additional instructional hours for students to acquire HOTS may not be necessary. It also suggests that teachers should be equipped with some theoretical foundation regarding HOTS and have the capacity to embed HOTS within their daily instruction.
Conclusion and study limitations
This study has a few limitations. First, the number of usable empirical studies for this meta-analysis was much smaller than expected, given the voluminous body of literature related to the effects of HOTS instruction. Many studies were excluded because they were not clear about the definitions of HOTS or their practices. Hence, the results from this meta-analysis should be interpreted with some caution. Second, this meta-analysis did not include qualitative studies since they do not contain the quantitative information needed for a meta-analysis. In the future, research syntheses that include both qualitative and quantitative information would be valuable in providing a more comprehensive understanding of the effects of HOTS instruction. Third, the findings from this meta-analysis may not generalize to settings other than those in which the targeted instruction methods were implemented.
Despite the limitations of this meta-analysis, this study makes contributions to the field of gifted education in several ways. First, the findings from this meta-analysis support the positive effects of HOTS instruction on gifted students in terms of creative thinking, critical thinking, and problem-solving. Second, this meta-analysis has updated previous synthesized research on the effects of HOTS instruction for gifted students. There have been no recent meta-analytic studies focusing on HOTS instruction for gifted students in Taiwan, so this study provides research-based information on HOTS instruction for practitioners at different grade levels. Third, this study found that HOTS instruction can promote the development of students’ thinking. However, the effect size of the integrated HOTS units is greater than that of the stand-alone units designed by researchers. This means that it would be beneficial to integrate HOTS into the instruction of regular school subjects and make them a part of students’ daily learning experience. In summary, although previous literature has explored the effects of HOTS instruction, this study provides new information and supports previous findings in ways that can benefit researchers and practitioners.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
