Abstract
The aims of this reliability generalization study were to provide the overall alpha values of the California Critical Thinking Disposition Inventory (CCTDI) total score and subscale scores and to investigate the characteristics of the studies that may be associated with the variability in the reliability values of the CCTDI total score and subscale scores. This study was carried out with 98 alpha values from 87 unique studies for total CCTDI scores. In the random effects model, total CCTDI was found to be reliable across samples with an alpha value of 0.83. Also, the general alpha coefficients were 0.65, 0.56, 0.64, 0.66, 0.74, 0.72, and 0.61 for TS-scale, OM-scale, A-scale, S-scale, SC-scale, I-scale, and M-scale, respectively. Examination of study characteristics indicated that sample type was a significant predictor of the alpha value of total CCTDI and all subscales. Samples with university students reported larger Cronbach’s alpha estimates for total CCTDI and all subscales. Also, while language was found to moderate the general alpha coefficient of total CCTDI, OM-scale, A-scale, S-scale, and I-scale, it was not a significant moderator of the general alpha value of TS-scale, SC-scale, and M-scale. Total CCTDI and all subscales showed higher Cronbach’s alpha values for the English-language administrations. Besides, country of the study was a significant moderator of the general alpha coefficient of total CCTDI, S-scale, and I-scale. However, discipline subgroup was not a significant moderator of the general alpha coefficient of total CCTDI and its subscales. The mean of the test scores significantly explained 5% of the variance of alpha values of the total CCTDI. The SD of the test scores significantly explained 10%, 55%, and 54% of the variance of alpha values of the total CCTDI, A-scale, and S-scale, respectively. It was found that gender and ethnicity significantly moderated the alpha values for M-scale.
Keywords
Introduction
Critical thinking (CT)
Ennis (1991) defines CT as a functional, reflective, and reasonable way of thinking that is employed by individuals while deciding what to do or what to believe. According to Paul (1990), individuals evaluate the source of knowledge, test the validity of the acquired information, analyze its reliability, and draw appropriate inferences for specific situations through CT, which can be seen as a logical and rational way of dealing with ideas (Ruggerio, 1990). CT works as a defense mechanism for individuals in today’s world, in which information is easily accessible (Epstein & Kernberger, 2012), because individuals acquire true, useful, and accurate information through CT. After they question, examine, and evaluate such information, they can decide whether to believe it. That evaluation includes assessing the sensibility, truth, and accuracy of the given information, claims, evidence, and judgments (Lewis & Smith, 1993). After a Delphi project sponsored by the American Philosophical Association (APA) in 1990, CT was defined as “purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based” (Facione, 1990, p. 2). There is consensus in the literature that CT is a complex thinking process that includes higher-order reasoning processes and different cognitive skills to achieve the desired outcome (Halpern, 2003; Sternberg, 1986). Some of these cognitive skills are identifying assumptions, evaluating evidence, deducing conclusions (Pascarella & Terenzini, 1991), analyzing and evaluating arguments, claims, or evidence, and making inferences using inductive or deductive reasoning (Facione et al., 2000; Lai, 2011).
CT is not only about skills or abilities but also includes dispositions toward using those skills (Ennis, 1987; Paul, 1990). CT involves both abilities and dispositions such as seeking the truth, being systematic, and being open-minded (Ennis, 1987). CT dispositions are as important as CT skills because possessing CT skills alone is not sufficient to use those skills in everyday life (Paul & Elder, 2001), and both skills and dispositions make an individual a good critical thinker (Profetto-McGrath, 2003). Therefore, a good critical thinker should have both the skills and the dispositions toward using these skills. Without a strong disposition to use them, CT skills may never be used; conversely, if individuals do not have high CT skills, strong dispositions might be useless (Profetto-McGrath et al., 2003).
California Critical Thinking Disposition Inventory (CCTDI)
The CCTDI was developed by Facione and Facione (1992) after a comprehensive Delphi study by the APA. The CCTDI, which derives its conceptualization from the APA Delphi Report, is the first instrument devoted to the dispositional aspect of CT and is designed to measure individuals’ CT dispositions. In other words, it is not an inventory to measure how well individuals use CT skills, but how much individuals value the inclination to use CT skills. The CCTDI, which uses a 6-point Likert scale (from strongly agree to strongly disagree), has 75 items and 7 subscales: truth-seeking (TS-scale), open-mindedness (OM-scale), analyticity (A-scale), systematicity (S-scale), CT self-confidence (SC-scale), inquisitiveness (I-scale), and maturity (M-scale). The items are interspersed throughout the CCTDI and it takes approximately 20 minutes to complete. On any of these subscales, a score of 30 or below indicates weakness and a score above 50 indicates strength in relation to the given attribute or characteristic. The inventory yields an overall score (maximum 420 and minimum 70): scores above 350 indicate a strong disposition, scores between 280 and 350 reflect a positive inclination, and scores under 280 are described as deficient in CT disposition. The CCTDI is discipline neutral and can be used within any discipline, such as liberal arts and sciences (Facione et al., 1994).
The TS-scale measures the disposition of being willing to find the truest knowledge in a given context, courageous about asking further questions, and objective about continuing inquiry even if the findings are not beneficial. The OM-scale targets the disposition of showing tolerance to divergent views and being sensitive to and aware of the possibility of one’s own bias. The A-scale addresses valuing the use of reasoning and evidence to solve problems. Individuals with a positive inclination toward the A-scale anticipate potential conceptual or practical difficulties and are always alert to the need to intervene. While the S-scale measures being organized, orderly, and persevering in inquiry, the SC-scale measures the trust one places in one’s own reasoning processes. Individuals with a positive inclination toward the SC-scale trust their own reasoned judgments and can lead other people in rationally solving problems. The I-scale measures individuals’ intellectual curiosity and their enthusiasm for learning even if the knowledge is not readily apparent. The M-scale is about being judicious in one’s decision-making process. CT-mature individuals are aware that some problems may be ill-structured and that some situations can have more than one reasonable option. Also, they make judgments according to standards, contexts, and evidence.
In 1992, a pilot study was conducted with 156 undergraduates, high school students, and post-baccalaureates, and the alpha coefficient for the overall instrument was found to be 0.91 (Facione & Facione, 1992). The alpha coefficients of the subscales were: TS-scale (12 items) 0.71, S-scale (11 items) 0.74, I-scale (10 items) 0.80, OM-scale (12 items) 0.73, SC-scale (9 items) 0.78, M-scale (10 items) 0.75, and A-scale (11 items) 0.72. In a subsequent study by Facione et al. (1994) with a sample of 1,019 freshman students, alpha coefficients remained relatively consistent with the first pilot study (0.60–0.78 for the subscales and 0.90 for the total inventory). A factor analysis conducted by the authors (Facione et al., 1994) indicated seven factors for the CCTDI.
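The cut-off scores described above can be expressed as a small lookup. The following is only an illustrative sketch: the function names are hypothetical, and the label returned for subscale scores between 31 and 50 is a placeholder, since the text does not name that band.

```python
def interpret_total(score: float) -> str:
    """Classify an overall CCTDI score using the cut-offs quoted in the text."""
    if not 70 <= score <= 420:
        raise ValueError("overall CCTDI scores range from 70 to 420")
    if score > 350:
        return "strong disposition"
    if score >= 280:
        return "positive inclination"
    return "deficient"


def interpret_subscale(score: float) -> str:
    """Classify a subscale score; the text labels only the two extremes."""
    if score <= 30:
        return "weakness"
    if score > 50:
        return "strength"
    # Assumption: the text gives no label for scores above 30 and up to 50.
    return "not labelled in the text"


if __name__ == "__main__":
    print(interpret_total(365))
    print(interpret_total(300))
    print(interpret_subscale(28))
```

Note that a total score of exactly 350 falls in the "positive inclination" band, because only scores above 350 are described as strong.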
Walsh and Hardy (1997) reexamined the inventory with a sample of 499 undergraduates across different disciplines such as history, nursing, and education. Alpha coefficients for the subscales ranged from 0.57 to 0.78, with an overall alpha of 0.91. After the factor analytic procedure, the inventory was reduced from the original 75 items to 52 items under four factors. These four factors, which together accounted for 28% of the total variance, were named by the authors perspicacity/confidence (19 items, 15% of variance), receptivity/open-mindedness (17 items, 5.6% of variance), systematicity (8 items, 4.47% of variance), and objectivity/maturity (8 items, 2.97% of variance). Bondy et al. (2001) investigated the psychometric properties of the CCTDI using two samples of undergraduate students from different academic majors. While the overall alpha values for the CCTDI for Sample 1 (320 students) and Sample 2 (156 students) were 0.89 and 0.90, respectively, alpha values for the subscales ranged from 0.39 to 0.77 for Sample 1 and from 0.49 to 0.79 for Sample 2. The authors conducted a confirmatory factor analysis with the combined sample (N = 476) and found that the factor loadings that emerged were not consistent with those previously reported by Facione et al. (1994). According to Bondy et al. (2001), while 33% of the items had loadings of less than 0.30, 12% of them had loadings of 0.35 or less. Also, the total explained variance was reported as 27.7%.
The CCTDI was reexamined by Kakai (2003) with a sample of 536 university students. Alpha values for the subscales ranged from 0.59 to 0.77, with an overall alpha of 0.89. After a principal components factor analysis conducted by the author to examine a streamlined 4-factor model, the inventory was reduced to 52 items loading on seven factors, four of which were closely similar to the factors that emerged in Walsh and Hardy’s (1997) study. Kakai (2003) also performed another factor analysis on the 40 items from the four factors to obtain a more efficient short version. These four factors, which explained 33% of the total variance, were named intellectual diligence, open-mindedness, nonrelativism, and analyticity. Walsh et al. (2007) reexamined the stability of the factor structure of the CCTDI using 800 undergraduate students. Alpha coefficients for the subscales ranged from 0.53 to 0.84, with an overall alpha of 0.91. After principal components factor analysis, the inventory was reduced to 25 items under four factors. These four factors, which together accounted for 44.95% of the total variance, were named by the authors intellectual prowess (13 items, 24.25% of variance), objectivity (6 items, 9.7% of variance), systematicity (3 items, 5.6% of variance), and receptivity (3 items, 5.4% of variance).
Reliability and Reliability Generalization (RG)
While evaluating the test scores in many fields such as psychology and education, reliability can be seen as one of the most important key concepts. Reliability refers to the stability of test or any measurement scores over repeated administrations (Traub & Rowley, 1991). Contrary to well-known belief, it is a dynamic property of test scores and about the data rather than a fixed entity of the measurement tool itself (Thompson & Vacha-Haase, 2000). Therefore, reliability should be calculated after each measurement administration and should be reported in each study because it is influenced by sample characteristics such as gender, age, or language (Thompson, 2002) and varies across different administrations. Internal consistency, one of the most used types of reliability, captures the extent to which items in a measure assess the same unidimensional construct (Semma et al., 2019). There are many methods to assess internal consistency reliability such as Cronbach’s alpha, split-half reliability, and Kuder–Richardson tests. Among these methods, Cronbach’s alpha is the most common estimate of reliability (McNeish, 2018; Dimitrov, 2002) and can be calculated as
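$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} V_i}{V_{\text{total}}}\right)$$
In this formula, $n$ is the number of items in the measure, $V_i$ is the variance of the scores on item $i$, and $V_{\text{total}}$ is the variance of the total (summed) scores (Cronbach, 1951).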
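As an illustration of how Cronbach’s alpha is obtained from the quantities just defined (number of items, item variances, and variance of the total scores), a minimal sketch is given below. The function name and the toy response matrix are hypothetical and are not taken from any study in this meta-analysis; sample variances (dividing by N − 1) are assumed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = n / (n - 1) * (1 - sum of item variances / variance of total scores)
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical responses: four respondents, two items.
    toy = [[1, 2], [2, 1], [3, 4], [4, 3]]
    print(round(cronbach_alpha(toy), 2))  # 0.75
```

Because the item variances and the total-score variance are both computed from the sample at hand, the same instrument can yield different alpha values in different samples, which is the premise of the RG approach.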
Reliability estimates can vary across sample and study characteristics (Dawis, 1987). Low reliability estimates mean high levels of measurement error and therefore weaken the power of statistical tests (Yetkiner & Thompson, 2010), so reliability estimates are important when interpreting research findings. Considering the significance of the reliability of test scores, Vacha-Haase (1998) proposed RG, a meta-analytic technique that assesses the overall reliability of a measure. With RG, different reliability coefficients from different measurement administrations can be combined to reach an overall reliability estimate for a measure. RG also shows the heterogeneity of the reliability estimates for a given measure across studies and the possible sources of that heterogeneity (Vacha-Haase, 1998). Identifying these sources shows what influences the reliability estimates of a given measure. In short, RG provides a general evaluation of a measure’s internal consistency and identifies the factors that contribute to the variability of its reliability.
The Present Study
According to Henson and Thompson (2002), RG should be used only when the measure has been administered widely and there is a reasonable number of studies that estimated and reported the reliability of the scores. This study aimed to calculate the overall alpha value of the CCTDI, which has been widely used to measure CT dispositions since its development. The inventory has been translated into many languages, such as Turkish, Chinese, and Italian, and is in widespread use in many countries and universities in a variety of academic disciplines (Walsh et al., 2007). Although there are studies investigating the factor structure of the CCTDI (Walsh & Hardy, 1997; Kakai 2003; Bondy et al., 2001; Walsh et al., 2007), there are no meta-analytic studies that focus on the reliability of the CCTDI. Therefore, the first aim of the present study was to examine the overall alpha values of the CCTDI total score and subscale scores. The second aim was to investigate the variability of the overall Cronbach’s alpha estimates of the CCTDI total score and subscale scores, and then to explore study characteristics that might explain this variability. While sample type, language, discipline, and country of the study were considered as potential categorical moderators of the alpha estimates of the CCTDI, mean age of the sample, gender (% female), ethnicity (% Caucasian), and the mean and standard deviation (SD) of the test scores were treated as potential continuous moderator variables. It was expected that alpha values for university students, the English CCTDI, and administrations in the USA would be higher than those for younger students, non-English versions, and administrations in other countries, because the CCTDI was developed in the USA and was based on an English-speaking sample consisting mostly of university students. Therefore, I thought that the item wording could be more appropriate for English-speaking samples and university students.
Also, it was expected that the CCTDI would have similar alpha values across different disciplines because the CCTDI is discipline neutral and its items include no technical vocabulary or CT jargon (Facione et al., 1994). Besides, as the CCTDI was developed in the USA, which has an ethnically diverse population, and with a sample including male and female students, it was hypothesized that mean age, gender (% female), and ethnicity (% Caucasian) would not significantly moderate the alpha values of total CCTDI and its seven subscales. The mean and SD of the test scores were also included as continuous moderator variables because they are often examined as potential moderators of alpha values in RG studies (e.g., Rubio-Aparicio, Badenes-Ribera, Sánchez-Meca, Fabris & Longobardi, 2020; Deng et al., 2019). It was expected that a higher mean and SD of the test scores would significantly predict larger alpha values of total CCTDI and the seven subscales.
Method
In this RG study, REGEMA guidelines proposed by Sánchez-Meca et al. (2021) and Henson and Thompson’s (2002) recommendations for RG studies were followed.
Collection of Studies and Inclusion and Exclusion Criteria
Several online databases, including Google Scholar, Web of Science, ERIC, and Scopus, were searched systematically to locate studies that administered the CCTDI. During the literature review, the search pattern “California Critical Thinking Disposition Inventory” OR “California Critical Thinking Disposition” OR “CCTDI” was used in English. The literature review resulted in 3512 studies in total by the last search on the 23rd of April 2021. No limits for the date of publication were established. I also checked the citations of the CCTDI test manuals on Google Scholar to locate further possible studies. There were 1069 studies for “The California Critical Thinking Disposition Inventory and the CCTDI test manual” by Facione and Facione (1992) and 833 studies for “Critical thinking disposition as a measure of competent clinical judgment: The development of the California Critical Thinking Disposition Inventory” by Facione et al. (1994). In short, the literature review yielded 5414 studies. Studies had to meet three criteria to be eligible for inclusion in the meta-analysis. First, the original 75-item CCTDI must have been administered. Second, the study must report the sample size and a precise Cronbach’s alpha coefficient for total CCTDI or at least one of its subscales. Third, the study must be in English or Turkish. Studies that did not administer the original 75-item CCTDI were excluded. I decided to use only Cronbach’s alpha as the index of internal consistency because it is the most commonly reported measure of reliability in the studies. The studies were first examined through their titles and method sections, and 4923 studies were excluded for several reasons (e.g., duplicates, the version of the CCTDI).
Then, 491 studies that administered the CCTDI or at least one of its subscales were screened and reviewed by two researchers against the inclusion criteria mentioned above, and 397 studies were excluded. Of these 491 studies, 115 (23.42%) did not report any alpha values, 248 (50.51%) induced reliabilities from previous research, and 34 (6.92%) reported reliability as a range of coefficients. Finally, 94 unique studies (19.14%) were included in the analysis. Since some of the studies reported more than one Cronbach’s alpha value for multiple samples, those samples were coded separately and considered as independent measures of the CCTDI. However, if two or more alpha values were provided for the same sample (e.g., pre- and post-test or longitudinal studies), only the pre-test or first measurement values were included. Therefore, 98 alpha values (from 87 unique studies) for total CCTDI were included in the RG meta-analysis. This number is 46 for TS-scale (from 37 unique studies), 50 for OM-scale (from 41 unique studies), 47 for A-scale and S-scale (from 38 unique studies), 51 for SC-scale (from 42 unique studies), and 48 for I-scale and M-scale (from 39 unique studies). All studies included in the meta-analysis and their descriptive results can be seen in the online supplemental material. The flow diagram for the literature review and evaluation of studies is shown in Figure 1.
Figure 1. Flow diagram for literature review and evaluation of studies.
Also, the reference lists of the collected studies were examined to uncover any other studies that might fulfill the selection criteria. However, no additional study was found for inclusion in the meta-analysis. Since some of the studies were not fully accessible, an e-mail was sent to the authors of these studies. However, the authors did not respond. As a result, the total sample size of the studies included in the meta-analysis is 22201 for total CCTDI, 12984 for TS-scale, 13738 for OM-scale, 13183 for A-scale, 13173 for S-scale, 13777 for SC-scale, 13377 for I-scale, and 13256 for M-scale.
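The screening counts and percentages reported above can be re-derived with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the variable names are hypothetical.

```python
# Records identified, as reported in the text.
identified = {
    "database search": 3512,
    "citations of Facione and Facione (1992)": 1069,
    "citations of Facione et al. (1994)": 833,
}
total_found = sum(identified.values())

# First pass: exclusions based on titles and method sections.
excluded_first_pass = 4923
screened = total_found - excluded_first_pass

# Second pass: exclusions among the screened studies, by reason.
excluded_second_pass = {
    "no alpha reported": 115,
    "reliability induced from previous research": 248,
    "reliability reported as a range": 34,
}
included = screened - sum(excluded_second_pass.values())


def share_of_screened(count):
    """Percentage of the screened studies, rounded to two decimals."""
    return round(100 * count / screened, 2)


if __name__ == "__main__":
    print(total_found, screened, included)  # 5414 491 94
    for reason, count in excluded_second_pass.items():
        print(reason, share_of_screened(count))
    print("included", share_of_screened(included))  # included 19.14
```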
Coding of Studies
According to the RG coding recommendations of Henson and Thompson (2002), clear and detailed rules should be determined for coding the studies. Therefore, a detailed coding manual was first developed by the author, and the studies were coded by both the author of this study and a second researcher who works in the field of Educational Sciences and has meta-analysis experience. Inter-coder reliability was found to be high, with a mean kappa coefficient of 0.95 (SD = 0.056), ranging from 0.88 to 1, for categorical moderator variables and a mean intraclass correlation of 0.97 (SD = 0.025), ranging from 0.95 to 1, for continuous moderator variables. All discrepancies between the two coders were resolved before the statistical analyses. Name of the study, year published, author(s), country of the study, sample characteristics (mean age, gender, ethnicity, etc.), mean and SD of test scores, language of the CCTDI, sample size, and alpha values were extracted from the studies.
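For readers unfamiliar with the kappa coefficient used for the categorical moderators, the sketch below shows how Cohen’s kappa can be computed for two coders. It assumes the unweighted two-rater form of kappa; the text does not state which variant was used, and the codes in the example are hypothetical rather than the actual coding data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two coders rating the same studies."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of studies")
    n = len(coder_a)
    # Proportion of studies on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical language codes for ten studies (E = English, N = non-English).
    a = ["E", "E", "E", "E", "E", "N", "N", "N", "N", "N"]
    b = ["E", "E", "E", "E", "N", "N", "N", "N", "N", "N"]
    print(round(cohens_kappa(a, b), 2))  # 0.8
```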
Data Analysis and Interpretation
The Comprehensive Meta-Analysis (CMA) software package was used for data analysis. Since using raw Cronbach’s alpha estimates in meta-analysis violates the normality assumption (Rodriguez & Maeda, 2006), they should be transformed. In the RG literature, there are three transformation methods: the Hakstian–Whalen transformation (Hakstian & Whalen, 1976), Fisher’s r-to-z transformation, and Bonett’s transformation (Bonett, 2002). In this study, Bonett’s transformation was used to normalize the sampling distributions and to stabilize the variances. In Bonett’s transformation, reliability estimates are normalized via
$$L_i = \ln\left(1 - |\hat{\alpha}_i|\right)$$
where $\hat{\alpha}_i$ is the Cronbach’s alpha estimate reported for sample $i$.
In this study, 95% confidence intervals were used in all calculations. Reporting bias was checked with the funnel plot, Duval and Tweedie’s trim-and-fill method, and Egger’s regression intercept. In meta-analysis studies, heterogeneity can be investigated with widely used statistics such as the Q statistic and the I² value (Hedges & Olkin, 1985; Petticrew & Roberts, 2006). A Q value greater than the critical value in the χ² table at k − 1 degrees of freedom (k is the number of effect sizes) indicates heterogeneity among studies. With a low number of studies, the Q statistic is likely to have low power to detect heterogeneity (Huedo-Medina, Sanchez-Meca, Marin-Martinez & Botella, 2006). Therefore, the I² value, which is not influenced by the number of studies, should also be examined along with the Q statistic (Petticrew & Roberts, 2006). I² can take values from 0% to 100%, and values of 25%, 50%, and 75% are considered to indicate low, medium, and high heterogeneity, respectively (Cooper, 2017). In this study, the Q statistic and the I² value were used together to check heterogeneity.
Moderator analyses for categorical variables were conducted according to subgroups of sample type (nurse, university student, high school student, etc.), language of the CCTDI (English or non-English), discipline (medical and non-medical disciplines), and country of the study (the USA and other countries). These analyses were performed with Analog ANOVA under a mixed-effects model. With Analog ANOVA, different Q statistics can be calculated: between-group (QB), within-group (QW), and total (QTOTAL). Whether a categorical variable is a moderator can be decided with the QB statistic, which tests between-group homogeneity (Lipsey & Wilson, 2001). A statistically significant QB value, that is, one greater than the critical value in the χ² table, shows that the overall alpha value varies between the categories of the moderator variable. Meta-regression analyses assuming a mixed-effects model were conducted to investigate the possible effect of the continuous moderators (mean age, gender, ethnicity, and the mean and SD of the test scores) on the alpha values of total CCTDI and its subscales.
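To make the transformation step concrete, the following sketch applies Bonett’s transformation in its commonly cited form, L = ln(1 − |α|), with the approximate sampling variance 2n / ((n − 1)(N − 2)) for n items and N respondents. These formulas are stated here as assumptions based on Bonett (2002) rather than as a description of CMA’s internal computations, and the function names are hypothetical.

```python
import math


def bonett_transform(alpha):
    """Bonett-transformed reliability: ln(1 - |alpha|)."""
    return math.log(1 - abs(alpha))


def bonett_variance(n_items, sample_size):
    """Approximate sampling variance of the transformed coefficient."""
    return 2 * n_items / ((n_items - 1) * (sample_size - 2))


def back_transform(value):
    """Convert a (pooled) transformed value back to the alpha metric."""
    return 1 - math.exp(value)


if __name__ == "__main__":
    # Example with the 75-item pilot administration (alpha = 0.91, N = 156).
    transformed = bonett_transform(0.91)
    print(transformed, bonett_variance(75, 156))
    print(round(back_transform(transformed), 2))  # 0.91
```

Under this form, larger alpha values map to smaller (more negative) transformed values, so the sign of any slope estimated on the transformed scale is reversed relative to the alpha scale.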
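The heterogeneity statistics described above can be sketched as follows. The example assumes inverse-variance weights, the Higgins form I² = (Q − df) / Q, and the DerSimonian–Laird estimator for the between-study variance; the text does not specify which estimator CMA applied, and the input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I2 in percent, DerSimonian-Laird tau^2)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) weighted mean.
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, df, i2, tau2


if __name__ == "__main__":
    # Hypothetical Bonett-transformed alphas with equal sampling variances.
    q, df, i2, tau2 = heterogeneity([-1.6, -1.8, -2.4], [0.01, 0.01, 0.01])
    print(round(q, 2), df, round(i2, 1), round(tau2, 3))
```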
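The between-group statistic QB used in the Analog ANOVA can be illustrated with a short sketch. It assumes that each subgroup mean is pooled with weights 1 / (v + τ²) using a single, externally supplied τ² (set to zero in the example); CMA offers more than one way of handling τ² across subgroups, so this is an illustration of the idea rather than a reproduction of the analysis. All names and data are hypothetical.

```python
def pooled(effects, variances, tau2=0.0):
    """Weighted mean and its variance for one subgroup."""
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    return mean, 1.0 / sum_w


def q_between(groups, tau2=0.0):
    """Return (QB, df) for a dict mapping group name -> (effects, variances)."""
    summaries = [pooled(e, v, tau2) for e, v in groups.values()]
    weights = [1.0 / var for _, var in summaries]
    # Grand mean of the subgroup means, weighted by their precision.
    grand = sum(w * m for w, (m, _) in zip(weights, summaries)) / sum(weights)
    qb = sum(w * (m - grand) ** 2 for w, (m, _) in zip(weights, summaries))
    return qb, len(summaries) - 1


if __name__ == "__main__":
    # Hypothetical transformed alphas for two language subgroups.
    groups = {
        "English": ([-2.0, -2.0], [0.02, 0.02]),
        "Non-English": ([-1.0, -1.0], [0.02, 0.02]),
    }
    qb, df = q_between(groups)
    print(round(qb, 2), df)  # 50.0 1
```

The resulting QB is compared with the χ² critical value at (number of subgroups − 1) degrees of freedom.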
Results
Results on Reporting Bias
The funnel plots for total CCTDI and the seven subscales shown in Figure 2 were examined using the trim-and-fill method by Duval and Tweedie (2000).
Figure 2. Funnel plots for total CCTDI and its subscales.
For there to be no indication of reporting bias, the studies need to be distributed symmetrically around the general effect size in the funnel plot. As shown in Figure 2, all funnel plots appear to be symmetric. Besides, the trim-and-fill results showed that no imputed studies needed to be added for total CCTDI or for the seven subscales to eliminate reporting bias. Considering that the review of a funnel plot can be subjective, Egger’s test was also examined. Egger’s intercept is 0.967 (95% CI = −0.960–2.895), p > 0.05 for total CCTDI, 0.537 (95% CI = −2.547–3.621), p > 0.05 for TS-scale, −0.111 (95% CI = −3.115–3.339), p > 0.05 for OM-scale, −0.327 (95% CI = −4.319–3.664), p > 0.05 for A-scale, 0.388 (95% CI = −2.670–3.446), p > 0.05 for S-scale, 2.420 (95% CI = 0.006–4.834), p > 0.05 for SC-scale, 1.479 (95% CI = −1.108–4.067), p > 0.05 for I-scale, and 1.606 (95% CI = −1.263–4.476), p > 0.05 for M-scale. Overall, there is no evidence of reporting bias in the results for total CCTDI and the seven subscales in this RG study.
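For reference, Egger’s intercept can be obtained from an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error). The sketch below implements that textbook form with hypothetical inputs; it does not reproduce the confidence intervals or p values reported above.

```python
def egger_intercept(effects, std_errors):
    """Intercept of the regression of effect/SE on 1/SE (Egger's test)."""
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three effects with matching standard errors")
    precision = [1.0 / se for se in std_errors]
    standardized = [y / se for y, se in zip(effects, std_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
    slope = sxy / sxx
    return mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical case: identical effects regardless of precision, so the
    # intercept is (numerically) zero, i.e. no small-study asymmetry.
    ses = [0.1, 0.2, 0.25, 0.5]
    print(egger_intercept([-1.5] * 4, ses))
```

An intercept close to zero is consistent with a symmetric funnel plot, whereas an intercept far from zero suggests that smaller studies report systematically different values.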
Results on the General Alpha Coefficient of the CCTDI and Subscales
Table 1. Mean alpha coefficients and heterogeneity test results in random effects model for total CCTDI and subscales.
Note. k = number of reliability coefficients; SE = standard error; 95% CI = upper and lower bounds of the 95% confidence interval around the overall reliability estimate; 95% PI = prediction interval; df = degrees of freedom; Q = heterogeneity statistic representing total variance; I² = heterogeneity index; *p < 0.05.
As seen in Table 1, the Q(df = 97) value was found to be 1732.59 (p < 0.05) for total CCTDI. According to the χ² table, the calculated Q value is greater than the critical value with 97 degrees of freedom at the 0.05 significance level (df = 97, χ²(0.05) = 120.990). It is therefore possible to say that there is heterogeneity among the studies. Also, the calculated I² value (94.40%) indicates a high level of heterogeneity. Therefore, the random effects model was used to calculate the general alpha coefficient for total CCTDI in this study. Indeed, the random effects model should be preferred for meta-analyses conducted in the social sciences because it is very difficult to ensure homogeneity among studies in these fields (Schmidt & Hunter, 2015; Borenstein et al., 2009). The general alpha coefficient of total CCTDI was calculated as 0.83 (95% CI 0.82–0.85) according to the random effects model. The prediction interval indicates that the alpha values to be expected in primary studies using total CCTDI would likely fall between 0.65 and 0.92.
Also, the heterogeneity tests for the subscales showed that there is heterogeneity among the studies for TS-scale (Q(df=45) = 800.68), OM-scale (Q(df=49) = 1007.15), A-scale (Q(df=46) = 1378.78), S-scale (Q(df=46) = 811.90), SC-scale (Q(df=50) = 664.97), I-scale (Q(df=47) = 613.92), and M-scale (Q(df=47) = 776.79). According to the random effects model, the general alpha coefficients are 0.65 for TS-scale, 0.56 for OM-scale, 0.64 for A-scale, 0.66 for S-scale, 0.74 for SC-scale, 0.72 for I-scale, and 0.61 for M-scale. Forest plots for total CCTDI and its subscales can be seen in the online supplemental material.
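The link between the reported Q values and the heterogeneity index can be checked directly. The sketch below assumes the Higgins form I² = (Q − df) / Q × 100 and uses the Q and df values quoted above; for total CCTDI it reproduces the reported 94.40%. The subscale percentages it prints are derived here from that formula and are not quoted from the source table, which may round differently.

```python
# Q statistics and degrees of freedom as quoted in the text.
reported_q = {
    "Total CCTDI": (1732.59, 97),
    "TS-scale": (800.68, 45),
    "OM-scale": (1007.15, 49),
    "A-scale": (1378.78, 46),
    "S-scale": (811.90, 46),
    "SC-scale": (664.97, 50),
    "I-scale": (613.92, 47),
    "M-scale": (776.79, 47),
}


def i_squared(q, df):
    """Heterogeneity index in percent, truncated at zero."""
    return max(0.0, (q - df) / q) * 100


if __name__ == "__main__":
    for scale, (q, df) in reported_q.items():
        print(f"{scale}: I2 = {i_squared(q, df):.2f}%")
```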
Results on the Moderator Analyses
As a high level of heterogeneity was found among studies, moderator analyses for categorical variables were conducted according to the subgroups of sample type, language, discipline, and country of the study.
Table 2. Comparison of alpha coefficients of total CCTDI and subscales across categorical moderators.
Note. k = number of reliability coefficients; 95% CI = upper and lower bounds of the 95% confidence interval around the overall reliability estimate; df = degrees of freedom; QB = between-category heterogeneity statistic; p = p value for the QB statistic; the medical subgroup includes disciplines such as nursing and midwifery; the non-medical subgroup includes disciplines such as education, liberal arts, and language.
Table 3. Results of the simple meta-regression analysis by the continuous moderator variables for total CCTDI and subscales.
Note. k = number of studies that contained information for each of the variables; 95% CI = upper and lower bounds of the 95% confidence interval; τ²res = estimated residual τ²; I²res = residual I²; R²Meta = R² of the mixed-effects model. Because the meta-regression analysis was conducted with Bonett-transformed values instead of alpha values, the true direction of the relationship between each continuous moderator and the alpha values is the inverse of what is shown by the sign of the slope in the table.
As shown in Table 3, the mean of the test scores significantly explained 5% of the variance of alpha values of total CCTDI (p = 0.035). Also, the SD of the test scores significantly explained 10% (p = 0.027), 55% (p = 0.003), and 54% (p = 0.010) of the variance of alpha values of total CCTDI, A-scale, and S-scale, respectively. There was a statistically significant positive relationship between the SD of the test scores and the alpha values for total CCTDI, A-scale, and S-scale. Besides, gender (accounting for 8% of the variance) and ethnicity (accounting for 26% of the variance) significantly moderated the alpha values for M-scale. There was a statistically significant negative relationship between gender (% female) and ethnicity (% Caucasian) and the alpha values for M-scale. There was no significant association between the other continuous moderator variables and the alpha values for the total CCTDI and subscale scores.
Conclusion and Discussion
The aims of this RG study were to provide overall alpha values of the CCTDI total score and subscales scores and to investigate the characteristics of the studies that may be associated with the variability in the reliability values of the CCTDI total score and subscales scores. Of the 491 studies that administered the CCTDI or at least one of its subscales, only 94 (%19.14) reported usable reliability value. 115 of them (%23.42) did not report any alpha values and 248 of them (%50.51) induced reliabilities from previous research. Because reliability is a dynamic property of instruments (Thompson, 2002) and it may vary across test occasions, settings, and samples (Dawis, 1987), it must be calculated in each application (Crocker & Algina, 1986; Streiner et al., 2015). Therefore, underreporting of reliability in these studies can be seen as a concerning issue because most of the studies (%73.93) failed to report any form of reliability for their data. Literature on the reporting of reliability states that many researchers do not report reliability values for their samples (Henson & Thompson, 2002; Vacha-Haase, 1998). Besides, a low percentage of studies administered the CCTDI with usable reliability value (%19.14) is similar to previous RG studies investigating score reliability of instruments from different behavioral science fields (e.g., Bruna et al., 2019; Alcocer‐Bruno et al., 2020). There may be two reasons for this: First, researchers do not report reliability values because they know the instrument is not reliable for their sample. Second, they are not aware that it should be reported for each administration. In either case, the underreporting of reliability values can be harmful to the interpretation of findings in the studies. Therefore, score reliability reporting practices should be improved although this underreporting of reliability problem may not be generalized across all fields of scientific inquiry.
Total CCTDI was found to be reliable across samples, with an alpha value of 0.83. The subscale with the greatest alpha value is SC-scale (α = 0.74), followed by I-scale (α = 0.72), S-scale (α = 0.66), TS-scale (α = 0.65), A-scale (α = 0.64), M-scale (α = 0.61), and OM-scale (α = 0.56). This RG study indicated lower alpha values for the CCTDI and its subscales than those reported in the test manuals (Facione & Facione, 1992; Facione et al., 1994). Nunnally and Bernstein (1994) state that reliability coefficients above 0.70 are adequate for exploratory research. The alpha values for SC-scale and I-scale are therefore satisfactory, and the alpha values of S-scale, TS-scale, A-scale, and M-scale can be seen as acceptable for research purposes. OM-scale, however, has a notably low alpha value. Scores on the subscales had lower alpha values than scores on total CCTDI. This finding may be attributable to the subscales having fewer items, because alpha values are less stable with a smaller number of items (Thompson, 2002). The studies investigating the psychometric properties of the CCTDI (Walsh & Hardy, 1997; Kakai, 2003; Bondy et al., 2001; Walsh et al., 2007) indicate that there is no agreement at a conceptual level about the number of factors and the factor structure of the items. The number of factors and the factor structure they report are not consistent with the original factor structure that emerged in the initial pilot studies by Facione and Facione (1992) and Facione et al. (1994). This may be another reason why the subscales had lower alpha values: an unstable factor structure can increase measurement error and decrease the overall reliability of the subscales.
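The link between the number of items and alpha can be illustrated with the standardized alpha formula, which expresses alpha as a function of the number of items and the mean inter-item correlation. The sketch below is an illustration only and is not an analysis performed in this study: the mean inter-item correlation and the two shorter scale lengths are hypothetical values, and the function name is chosen here for convenience.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha for n_items items sharing a mean inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Hypothetical mean inter-item correlation, chosen only for illustration;
# it is not estimated from CCTDI data.
r_bar = 0.15

# 75 is the length of the original CCTDI; the two shorter lengths are
# hypothetical subscale sizes.
for k in (9, 12, 75):
    print(k, round(standardized_alpha(k, r_bar), 2))
```

Holding the mean inter-item correlation fixed, alpha rises as items are added, so a short subscale would be expected to yield a lower alpha than the full inventory even if its items are no worse.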
Examination of study characteristics indicated that sample type was a significant predictor of the alpha value of total CCTDI and all subscales. Samples of university students yielded larger Cronbach’s alpha estimates for total CCTDI and all subscales. Country of the study was a significant moderator of the general alpha coefficient of total CCTDI, S-scale, and I-scale, but not of TS-scale, OM-scale, A-scale, SC-scale, and M-scale. Nevertheless, alpha values for total CCTDI and all seven subscales were greater for administrations in the USA. Because the CCTDI was developed in the USA and tested using a sample consisting mostly of university students (Facione & Facione, 1992; Facione et al., 1994), these two moderator results are not surprising. It is conceivable that items of the CCTDI hold different meanings for samples other than university students and for samples in countries other than the USA. When the CCTDI is used in the USA with samples consisting mostly of university students, items are more likely to be interpreted similarly, and this leads to higher alpha values.
Language was found to moderate the general alpha coefficient of total CCTDI, OM-scale, A-scale, S-scale, and I-scale, but it was not a significant moderator of the general alpha value of TS-scale, SC-scale, and M-scale. Total CCTDI and all subscales showed higher Cronbach’s alpha values for the English-language administrations. This result suggests that the cultural and language validations and adaptations of the CCTDI need to be improved, as administering non-English versions instead of the original version seems to decrease reliability. The adaptation of a psychological instrument to other languages and cultures is a process that involves not only the translation of the items but also validation studies for the target population (Messick, 1995). This finding may also be attributable to the quality of the translations from English to other languages. The translation processes of the CCTDI therefore need to be examined further.
Moderator analyses also showed that subgroup of discipline was not a significant moderator of the general alpha coefficient of total CCTDI and its subscales. Because the CCTDI is a discipline-neutral inventory (Facione et al., 1994), this result is not surprising. It indicates that the CCTDI can yield reliable results when it is used within different disciplines.
In addition, simple meta-regression analyses with the continuous moderator variables indicated that the SD of the test scores significantly moderated the alpha values for total CCTDI, A-scale, and S-scale. There was also a statistically significant positive correlation between the mean of the test scores and the alpha values of total CCTDI. These two findings are in line with the findings of previous RG studies (Rubio-Aparicio et al., 2020; Deng et al., 2019). According to Crocker and Algina (1986), if the observed score variance increases, internal consistency will also increase. These findings are therefore consistent with classical test theory as well. Simple meta-regression analyses also showed that mean age, gender, and ethnicity were not significant moderators of the general alpha coefficient of total CCTDI and its subscales (except for gender and ethnicity on M-scale). This finding may be attributable to the fact that the CCTDI was developed in the USA, which has an ethnically diverse population, with a sample that included both male and female students.
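The classical test theory argument can be made explicit with two standard expressions. The notation below is introduced here for illustration and is not taken from the original analyses: \sigma_X^2 is the observed score variance, \sigma_T^2 the true score variance, \sigma_E^2 the error variance, k the number of items, and \sigma_i^2 the variance of item i.

```latex
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
```

If the error variance stays roughly constant while the observed score variance grows, the ratio \sigma_E^2 / \sigma_X^2 shrinks and reliability rises. In the same way, for a fixed sum of item variances, a larger total score variance yields a larger alpha, which is consistent with the positive association between the SD of the test scores and the alpha values.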
In the behavioral sciences, the soundness of the inferences made by researchers depends on the assumption that the scores produced by the instrument are reliable and valid. There can be little question that the CCTDI has been a widely used instrument for measuring CT dispositions since its development in 1992. Since then, the validity of CCTDI scores has been investigated by several researchers (e.g., Walsh & Hardy, 1997; Kakai, 2003; Bondy et al., 2001; Walsh et al., 2007). This RG study was designed to complement the existing body of literature on the CCTDI by providing reliability evidence for the total CCTDI and subscale scores. Total CCTDI was found to be reliable across samples, with an alpha value of 0.83. However, alpha values for the subscales were relatively low, especially for OM-scale.
Limitations and Recommendations for Future Research
Although this study is the first to shed light on the overall reliability of the CCTDI and its subscales and to examine moderators, it has several limitations. First, although there are shortened versions of the CCTDI, this RG meta-analysis was conducted only with studies that administered the original 75-item CCTDI. Second, there is a limitation related to the selection of primary studies. Although the CCTDI is one of the most popular instruments used to measure CT dispositions, few studies report a reliability value for their own samples, which reduces the number of studies eligible for this RG study. Third, there may be other moderators that were not considered in this study, such as other country or sample characteristics. Fourth, the CCTDI is mostly used within medical disciplines (e.g., nursing or midwifery), and the number of studies in other disciplines is limited. Therefore, I had to divide the disciplines into two groups: medical and non-medical.
The current results have important implications for other studies and future applications of the CCTDI. Further studies can investigate other possible moderators of the reliability of the CCTDI, and can compare the internal consistency of the original CCTDI with that of its shortened versions. Further investigation into the factor structure of the CCTDI is also needed to provide evidence on validity. In addition, researchers should be aware that the CCTDI subscales, especially S-scale, TS-scale, A-scale, M-scale, and OM-scale, have relatively low alpha values and that the CCTDI has higher reliability when it is administered in English and with university students. I also recommend that future researchers using the CCTDI report their sample’s reliability estimates for the subscales. As the number of reported reliability values for the subscales increases, other researchers will be better able to synthesize subscale reliability estimates.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
