Abstract
This article outlines the development and validation of the Computer-Delivered Test (CDT) Acceptance Questionnaire (CTAQ). The CTAQ was designed as a practical measure of the CDT acceptance of Singapore secondary and high school students (Grades 7-12) toward taking tests within an e-assessment system. Evidence from test (questionnaire item) content, response processes, and internal structure, three sources of validity evidence under Messick’s unified concept of validity, suggested that the CTAQ has sound psychometric properties. Exploratory factor analysis (EFA; n = 485) and confirmatory factor analysis (CFA; n = 484) yielded a three-factor model of CDT acceptance (ease of use, involvement, and experience) and reduced the item pool from 21 to 13 items. Practical applications and limitations of the CTAQ are discussed.
Since the launch of the Singapore Ministry of Education (MOE) Masterplan One for Information and Communication Technology (ICT) in Education in 1997, which aimed to drive quality learning empowered by technology, focus has been placed on the end-to-end integration of ICT into curriculum, pedagogy, and assessment. In tandem with the wider use of ICT in education, and to better align assessment with teaching and learning, assessments, including high-stakes CDTs, have been introduced progressively since 2014. Although advances in technology offer enormous prospects for innovation in learning, testing, and assessment (Kang & Pang, 2014), and although students in the mainstream Singapore education system are obliged to sit assessments with consequences, little is known about how students take to and accept CDTs.
Existing measures of CDT acceptance in the literature include the commonly cited Computer-Based Assessment Acceptance Model (CBAAM; Terzis & Economides, 2011). However, current measures relate either to assessment functions within learning and teaching management systems or to user attitudes toward the use of computers in learning and teaching. There thus remains a dearth of instruments designed specifically to measure CDT acceptance in systems built solely for assessment, without learning and teaching features, that is, electronic assessment/electronic examination/electronic test (e-assessment/e-examination/e-test) systems. This gap, together with the important implications of student attitudes and opinions of CDT for an assessment’s face validity (Anastasi, 1982, cited in Dermo, 2009), supports the development and validation of a new measure, the Computer-Delivered Test Acceptance Questionnaire (CTAQ). The CTAQ is intended to provide information about students’ acceptance of CDT so that their CDT experience can be managed more effectively, particularly in the context of e-assessment in Singapore.
CDT Acceptance and Associated Measures
Although the literature contains many terms that resemble CDT acceptance (for example, experience, familiarity, and readiness), acceptance, defined as the action of consenting to receive or undertake something offered (Acceptance, n.d.), was theorized as the underlying construct for the CTAQ because it carries a relatively neutral connotation and is less likely to be confounded by other variables such as classroom teaching and learning.
Readiness has a semantic definition of the state of being fully prepared for something or the willingness to do something (Readiness, n.d.). Regardless of whether a test taker is fully prepared, testing, especially testing with consequences, is necessary for educational progression in Singapore; hence, the notion of willingness to take a test is premature in the context of this study. An alternative definition of readiness is the point at which a learner is receptive to what is to be learned, which differs across individuals because it depends on factors such as maturity and motivation (Wallace, 2015). Readiness therefore carries a less neutral connotation than acceptance, as it is more susceptible to confounding variables such as test preparedness, teaching, and learning, particularly in the context of assessment with consequences.
Within the literature, computer familiarity, defined as the knowledge of something (Familiarity, n.d.), is frequently associated with experience, which is defined as an emotion (Experience, n.d.). As might be expected, measures of computer experience and familiarity typically include the duration of a respondent’s exposure to computer use. For example, Bozionelos (2001) defined computer experience as either a self-report of the length of time of computer use or a self-report of the degree of experience with various computer applications. Eignor, Taylor, Kirsch, and Jamieson (1998) extended familiarity beyond the use of computers to include the extent of access to and experience with computers. Notably, and contrary to popular belief, a series of comprehensive studies on the Test of English as a Foreign Language (TOEFL) found that computer familiarity, computer anxiety, and test anxiety had minimal relationships with test takers’ attitudes toward TOEFL CDTs, and that computer familiarity had no relationship with performance (Stricker, Wilder, & Rock, 2004; Taylor, Kirsch, Eignor, & Jamieson, 1999). The increased prevalence of computers and related technology today is itself a reason to look beyond familiarity with computer use. Whereas computer familiarity seemingly draws on an individual’s cognition, that is, knowing how and what to do, acceptance is postulated to encompass both cognitive and affective components (attitudes), and hence the experience of a test taker.
The review of the semantic definitions of computer acceptance, experience, familiarity, and readiness thus supports acceptance as the underlying construct of the CTAQ, given its relative neutrality and lower susceptibility to confounding variables. The following section discusses commonly cited frameworks and models relevant to CDT acceptance, although they lie within the context of assessment in learning and teaching management systems rather than e-assessment systems.
Technology Acceptance Model (TAM)
The TAM (Davis, 1989) was originally conceptualized to explain variance in users’ intention to use a system. Davis (1989) hypothesized two distinct determinants of computer usage, namely, perceived usefulness and perceived ease of use. The model was subsequently adapted by various researchers to predict the acceptance and use of information technology in other contexts, including the acceptance of websites on the World Wide Web (Moon & Kim, 2001). More recently, and more relevant to this study, the TAM was adapted and extended as the CBAAM.
CBAAM
Although the name of the widely cited CBAAM suggests a possible measure of CDT acceptance, it was developed for computer-delivered assessment within learning management systems. In the CBAAM, nine constructs were theorized based on the TAM, the Theory of Planned Behavior, and the Unified Theory of Acceptance and Use of Technology (Terzis & Economides, 2011), and the hypothesized causal links between them were tested by structural equation modeling. The constructs were perceived usefulness, perceived ease of use, social influence, perceived playfulness, behavioral intention to use computer-based assessment (CBA), computer self-efficacy, facilitating conditions, goal expectancy, and content. Notably, the CBAAM was developed to explain the intention to use CBA and continued earlier work on the acceptance of assessment within learning management systems. Hence, some CBAAM items would not be applicable to CDTs in e-assessment systems, for example, “Using CBA gives me enjoyment for my learning” or “I intend to use CBA in the future.”
The literature review did not yield existing measures of CDT acceptance in the context of e-assessment systems; hence the motivation for this study. Although the CBAAM offers a point of reference, its nine constructs were determined from a priori theoretical frameworks rather than post hoc exploratory factor analyses, and Terzis and Economides (2011) also suggested that other variables could be added to the CBAAM, as it was a first effort to develop the model. Hence, not all the constructs in the CBAAM can be expected to be appropriate for students’ acceptance of CDT within e-assessment systems.
Method
The development of the CTAQ was guided by the unified concept of validity proposed by Messick (1993). Under Messick’s concept, validity is supported by five forms of evidence: (a) test (questionnaire item) content, (b) response processes, (c) internal structure, (d) relations to other variables, and (e) consequences of testing. This study sought to establish evidence based on test content, response processes, and internal structure. The main effect of participants’ grade level on CTAQ scores was also investigated.
Test Content
Test content appropriateness, that is, the appropriateness of questionnaire item content, was critical because the CTAQ was designed for students aged 13 years (Grade 7) to 18 years (Grade 12). Items had to be comprehensible to students as young as 13 years yet sufficiently comprehensive to measure CDT acceptance.
Drawing from the literature review, the CTAQ included primarily original Likert-type scale items, and a few referenced from the 9-factor-30-item CBAAM (Terzis & Economides, 2011) and the revised 3-factor-23-item Computer Attitude Scale (Bandalos & Benson, 1990).
The CTAQ items were developed by the author and researchers with primary and secondary school teaching and management experience. Following Cutts (2013), care was taken to minimize polysyllabic words and sentences of more than 20 words. The researchers reached consensus on the content appropriateness of the items before the items were critiqued by an expert panel of three lead/senior assessment specialists experienced in educational and psychological measurement, school leadership, and management. The panel’s feedback was incorporated, and the items were judged to be appropriate and free of bias associated with gender and access to computers. The panel also agreed that construct-irrelevant sources of variance that might advantage or disadvantage students with stronger or weaker English language proficiency had been minimized.
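The two readability constraints above (avoiding polysyllabic words and sentences of more than 20 words) can be screened mechanically before items go to a panel. The following is a minimal Python sketch, not the procedure used in this study; the vowel-group syllable counter is a rough heuristic assumed for illustration, and `count_syllables` and `screen_item` are hypothetical helper names.

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: groups of consecutive vowels (y counted as a vowel),
    minus one for a silent final 'e' (but not '-le' endings). Not dictionary-accurate."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def screen_item(text: str, max_words: int = 20, poly_min: int = 3) -> dict:
    """Flag sentences longer than max_words and list words with poly_min+ syllables."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    long_sentences = [s for s in sentences if len(re.findall(r"[A-Za-z']+", s)) > max_words]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllabic = [w for w in words if count_syllables(w) >= poly_min]
    return {"long_sentences": long_sentences, "polysyllabic": polysyllabic}

# Usage: a short, plain item passes both checks.
print(screen_item("I found the e-exam easy to use."))
```

A screen like this only flags candidates; judgments about comprehensibility for 13-year-olds still rest with the item writers and the expert panel.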
Rating scale and scoring
A search of the literature yielded no findings on Singapore students’ familiarity with Likert-type rating scales. However, students in Singapore schools under the purview of the Ministry of Education have experience responding to Likert-type ratings through school questionnaires, for example, the Quality of School Experience Questionnaire. Hence, a 4-point Likert-type scale (strongly agree, agree, disagree, and strongly disagree) was adopted for the CTAQ items. Given the context and their experience in schools, the researchers and expert panel agreed that an odd-numbered scale would likely attract neutral responses such as neither agree nor disagree; an even-numbered scale was therefore used. Responses were coded and scored as 4 = strongly agree, 3 = agree, 2 = disagree, and 1 = strongly disagree.
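The scoring rule above amounts to a fixed lookup table. The following minimal Python sketch uses hypothetical names (`CTAQ_CODES`, `code_response`, `code_respondent`); treating a neutral label as an input error is an assumption about data handling that follows from the even-numbered scale, not a documented step of this study.

```python
# Numeric codes for the 4-point CTAQ scale, as described in the text.
CTAQ_CODES = {
    "strongly agree": 4,
    "agree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def code_response(label: str) -> int:
    """Map one response label to its score, ignoring case and surrounding spaces."""
    key = label.strip().lower()
    if key not in CTAQ_CODES:
        # The scale has no midpoint, so labels such as "neither agree nor disagree"
        # cannot be scored and are rejected here (an assumed handling rule).
        raise ValueError(f"not a CTAQ response option: {label!r}")
    return CTAQ_CODES[key]

def code_respondent(labels: list[str]) -> list[int]:
    """Score every item response for one respondent."""
    return [code_response(x) for x in labels]
```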
Participants and Procedure
A total of 969 complete participant responses were used to validate the CTAQ (see Table 1). Responses with missing data due to nonresponse made up less than 1% of all responses recorded and were excluded from the validation process.
Table 1. Profile of Participants and Preceding Test Format.
Note. CDT = Computer-Delivered Test.
This is an online platform that can be used to deliver assessments.
This platform was developed by the Singapore Examinations and Assessment Board to develop and deliver assessments electronically.
To minimize intrusion into schools, the original 21-item CTAQ was administered as part of various preceding tests that lasted between 70 and 120 min; these tests were delivered via two similar platforms, FastTest Web and the SEAB eExam system. Because the CTAQ was bundled with a content-related test, students were not asked to provide their gender and age, consistent with the practice for other content-related tests. After completing the preceding test, students were asked to complete the CTAQ to indicate how they perceived CDT. Students took about 10 min to complete the 21-item CTAQ.
Response Processes
In addition to field observations, focus group discussions (FGDs) were conducted with four groups of Grade 11 participants (see Table 2).
Table 2. FGD Participants.
Note. FGD = focus group discussion.
Coded school name.
Participants were observed to respond to the preceding test and the CTAQ items without observable signs of test or computer anxiety, for example, extended pauses or signs of frustration such as slouching, frowning, or repeatedly making audible sounds of frustration.
FGDs followed the test and CTAQ administration. Participants were asked questions such as “What were you thinking when you responded to this item?” and “Did any term or sentence cause any confusion?” to yield information about the construct. There was no evidence that students were unable to identify with the purpose of the CTAQ items, and there were no marked differences in how participants thought about and responded to the items. Items that participants found vague were marked for further investigation (see Table 3). These items coincided with those that were noisy or caused misfit in subsequent analyses and were therefore discarded. On this basis, the evidence was judged sufficient to suggest that the underlying construct of CDT acceptance was generally not influenced by confounds such as assessment difficulty and ancillary capabilities.
Table 3. Items Marked for Investigation.
Internal Structure
Evidence for the internal structure of the CTAQ was gathered using exploratory factor analysis (EFA) followed by confirmatory factor analysis (CFA); an exploratory step was needed because existing a priori theoretical frameworks address CDT acceptance within learning and teaching management systems rather than e-assessment systems. SAS (version 9.4) was used for the analyses. The sample was split into two halves: S1 (n = 485) for the EFA and S2 (n = 484) for the CFA (see Table 4).
Table 4. Data Split for EFA and CFA.
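The article does not state how the sample was divided. As a sketch only, assuming a simple random split with an arbitrary fixed seed, the following Python reproduces the S1/S2 sizes of 485 and 484 from 969 responses; `split_half` is a hypothetical helper.

```python
import random

def split_half(ids, seed=2024):
    """Randomly split respondent IDs into an EFA half (S1) and a CFA half (S2).
    With an odd total, S1 receives the extra respondent. The seed is arbitrary
    and only makes the split reproducible."""
    ids = list(ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = (len(ids) + 1) // 2
    return ids[:cut], ids[cut:]

s1, s2 = split_half(range(969))
print(len(s1), len(s2))  # 485 484
```

A split stratified by grade level would be an equally reasonable design; the sketch makes no claim about which was used.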
Note. EFA = exploratory factor analysis; CFA = confirmatory factor analysis.
EFA seeks to uncover the underlying factor structure of a set of items and to reduce them to fewer but more reliable latent measures. CFA, a multivariate technique commonly used in educational research to test hypothesized models, was then performed to assess the dimensionality and construct validity of the CTAQ.
Assessment of normality
Tests of univariate normality (Shapiro–Wilk, Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling) indicated that all variables in S1 showed a modest departure from normality. Mardia’s tests of multivariate normality also indicated that S1 was nonnormal, and quantile–quantile plots suggested a negative skew, as more students gave higher ratings on the CTAQ. Nonetheless, the skewness and kurtosis of the variables were within the thresholds (skewness < 2 and kurtosis < 7) suggested by West, Finch, and Curran (1995) for structural equation modeling. Because performing EFA and CFA on Likert-type ratings with Pearson correlation matrices has known limitations (Jöreskog, 1994; Justicia, Pichardo, Cano, Berbén, & De la Fuente, 2008; Olsson, 1979; Holgado-Tello, Chacón-Moscoso, Barbero-García, & Vila-Abad, 2010), polychoric correlation matrices were computed instead.
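The per-item skewness and kurtosis screen can be sketched in a few lines of NumPy. This sketch uses moment estimators without small-sample bias correction, and reading the kurtosis threshold of 7 as excess kurtosis (0 for a normal distribution) is an assumption; SAS and other packages may report bias-adjusted values that differ slightly.

```python
import numpy as np

def skew_kurtosis(x):
    """Moment-based sample skewness and excess kurtosis (no bias correction)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def within_west_thresholds(data, max_skew=2.0, max_kurt=7.0):
    """For each item (column), return (skew, excess kurtosis, passes) where
    passes means |skew| < max_skew and |kurtosis| < max_kurt."""
    data = np.asarray(data, dtype=float)
    results = []
    for j in range(data.shape[1]):
        s, k = skew_kurtosis(data[:, j])
        results.append((s, k, abs(s) < max_skew and abs(k) < max_kurt))
    return results
```

A negative skewness value corresponds to the pattern described above, with responses piling up at the agree end of the scale.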
Reliability
Item–rest correlations, Cronbach’s alpha, and Tucker and Lewis’s reliability coefficient (TLRC) indicated excellent internal consistency for S1 (α = .90, TLRC = .92; Kline, 2000; Tucker & Lewis, 1973). Item–rest correlations in S1 ranged from .27 to .73, with the exception of the item “I found it hard to stop once I started doing the e-exam” (.15). This item was marked for further investigation because it fell below the recommended item–rest correlation threshold of .20 (Everitt, 2002).
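Cronbach’s alpha and item–rest correlations follow standard formulas. The sketch below computes both from an n-respondents × k-items matrix of coded responses and flags items below the .20 threshold; TLRC is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def item_rest_correlations(X):
    """Correlation of each item with the sum of the remaining items."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(X.shape[1])])

def flag_items(X, threshold=0.20):
    """Indices of items whose item-rest correlation falls below the threshold."""
    r = item_rest_correlations(X)
    return [j for j, v in enumerate(r) if v < threshold]
```

Using item–rest rather than item–total correlations avoids inflating each item’s correlation by including the item in its own total.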
EFA
Because of the modest nonnormality of the data, unweighted least squares (ULS), which makes no distributional assumptions, was used in addition to maximum likelihood (ML) for factor extraction. Oblique (promax) rotation was used to allow the subconstructs to correlate, as is typical of measurement in the social sciences (Costello & Osborne, 2005). ULS extraction did not yield results different from ML extraction.
For the ML factor extraction, the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy for S1 was .92, indicating that the sample was adequate and that the items shared sufficient common variance for EFA (Hutcheson & Sofroniou, 1999; Pett, Lackey, & Sullivan, 2003). Bartlett’s test of sphericity was significant for S1, χ2(210) = 5,575.04, p < .001, indicating that the correlation matrix differed significantly from an identity matrix and hence that the data were suitable for factor analysis.
Cattell’s scree test, Kaiser’s eigenvalue-greater-than-one criterion, and the proportion of variance explained were used to determine the number of factors. The analysis surfaced three main factors, which explained 87.43%, 11.47%, and 6.47% of the common variance, respectively; because these proportions are computed from the eigenvalues of the reduced correlation matrix, some of which can be negative, they can sum to more than 100%. Items with loadings below .32 or with cross-loadings were discarded (Costello & Osborne, 2005). Final factor loadings are presented in Table 5. Based on the EFA, the 21-item CTAQ was reduced to 13 items reflecting three factors: (a) ease of e-assessment system use, (b) experience with the e-assessment system, and (c) involvement while completing the CDT (see Table 6 for the interfactor correlations, means, and standard deviations).
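Both checks can be computed from a p × p correlation matrix R. The sketch below uses the standard formulas: KMO compares squared correlations with squared anti-image (partial) correlations, and Bartlett’s statistic is −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, which gives 210 for the 21 original CTAQ items. It assumes R is already available; whether the reported values were computed from the Pearson or the polychoric matrix is not stated.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that the correlation matrix R (p x p, from n cases) is an identity."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df, chi2.sf(stat, df)

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)             # anti-image partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal elements only
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)
```

KMO approaches 1 when partial correlations are small relative to zero-order correlations, that is, when the items share common factors.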
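Two of the decision rules above are straightforward to express in code: Kaiser’s criterion and the .32 loading rule. In this sketch, a cross-loading is assumed to mean an absolute loading of at least .32 on two or more factors; the scree test and the variance-explained judgment are not automated, and the function names are hypothetical.

```python
import numpy as np

def kaiser_count(R):
    """Number of eigenvalues of the correlation matrix greater than 1."""
    eigvals = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return int(np.sum(eigvals > 1.0))

def retain_items(loadings, cutoff=0.32):
    """Indices of items with a salient (|loading| >= cutoff) loading on exactly one
    factor. Items with no salient loading, or salient loadings on two or more
    factors (cross-loadings), are dropped."""
    L = np.abs(np.asarray(loadings, dtype=float))
    salient = (L >= cutoff).sum(axis=1)
    return [i for i, s in enumerate(salient) if s == 1]
```

For example, with rotated loadings of (.70, .10, .05), (.25, .20, .10), (.45, .40, .00), and (.05, .02, .66), only the first and fourth items are retained.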
Table 5. Final Factor Loadings for S1.
Note. The term exam may be replaced with test or assessment.
Table 6. S1 Interfactor Correlations, Ms, SDs, and Cronbach’s α.
Note. CTAQ = Computer-Delivered Test Acceptance Questionnaire.
CFA
CFA was performed on the covariance matrix of data set S2 to test the three-factor model identified by the EFA of the CTAQ.
As with S1, S2 showed a modest violation of multivariate normality; hence, maximum likelihood estimation with the Satorra–Bentler scaled chi-square statistic (MLSB) was used. MLSB adjusts the chi-square statistic and standard errors for nonnormality and hence yields more accurate goodness-of-fit statistics (Motl, Dishman, Birnbaum, & Lytle, 2005).
Brown (2014) recommended reporting at least one goodness-of-fit index from each of three categories: (a) absolute fit indices, for example, the standardized root mean square residual (SRMR); (b) parsimony correction indices, for example, the root mean square error of approximation (RMSEA); and (c) comparative fit indices, for example, the comparative fit index (CFI). Based on the cut-off criteria for goodness-of-fit indicators of Hu and Bentler (1999), Hair, Black, Babin, and Anderson (2007), and Brown (2014), the three-factor model showed adequate fit to S2. Table 7 presents the goodness-of-fit indicators of the CTAQ three-factor model.
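For reference, the Satorra–Bentler scaled statistic divides the normal-theory ML chi-square by a scaling correction estimated from the data. In the notation below, Γ̂ is the estimated asymptotic covariance matrix of the sample variances and covariances, Ŵ the normal-theory weight matrix, Δ̂ the Jacobian of the model-implied moments with respect to the free parameters, and df the model degrees of freedom. The correction ĉ is close to 1 under multivariate normality and typically exceeds 1 when the data are leptokurtic.

```latex
T_{\mathrm{SB}} = \frac{T_{\mathrm{ML}}}{\hat{c}}, \qquad
\hat{c} = \frac{\operatorname{tr}\!\left(\hat{U}\hat{\Gamma}\right)}{df}, \qquad
\hat{U} = \hat{W} - \hat{W}\hat{\Delta}\left(\hat{\Delta}^{\top}\hat{W}\hat{\Delta}\right)^{-1}\hat{\Delta}^{\top}\hat{W}
```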
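The following sketch computes RMSEA and CFI from chi-square values and checks the joint cutoffs commonly attributed to Hu and Bentler (1999): CFI ≥ .95, RMSEA ≤ .06, and SRMR ≤ .08. The exact cutoffs applied in this study are not stated, Hair et al. (2007) and Brown (2014) give somewhat different guidance, and with MLSB the scaled chi-square would be the input. No values from Table 7 are used; the function names are hypothetical.

```python
import math

def rmsea(chi2_m, df_m, n):
    """RMSEA from the model chi-square; uses N - 1 (some programs use N)."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def meets_hu_bentler(cfi_v, rmsea_v, srmr_v):
    """Joint cutoffs: CFI >= .95, RMSEA <= .06, SRMR <= .08."""
    return cfi_v >= 0.95 and rmsea_v <= 0.06 and srmr_v <= 0.08
```

For instance, a hypothetical model with χ2 = 100 on 50 df and a baseline χ2 = 1,050 on 50 df gives CFI = 1 − 50/1,000 = .95.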
Table 7. CFA Goodness-of-Fit Indicators.
Note. CFA = confirmatory factor analysis; χ2 = chi-square statistic; df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
p < .001.
All standardized loading estimates were significant (p < .001) and above .5, ranging from .63 to .92, indicating that the items of the three subscales shared an adequate proportion of common variance. The average variance extracted (AVE) and construct reliability (CR) coefficients were above .5 and .7, respectively, with the exception of Ease of Use, whose AVE was close to but below .5. Although an AVE below .5 suggests that, on average, the items may contain more error variance than variance explained by the latent factor, the CR coefficient for Ease of Use well exceeded the recommended .7 (see Table 8). Together, these results suggest that the three-factor model for S2 had adequate convergent validity and construct reliability (Hair et al., 2007).
Table 8. AVE and CR Coefficients at Subscale Levels (S2).
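AVE and CR are both computed from standardized loadings. This minimal sketch assumes uncorrelated error terms with variance 1 − λ² for each item; the function names are hypothetical.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def construct_reliability(loadings):
    """Construct (composite) reliability: (sum of loadings)^2 divided by that
    quantity plus the summed error variances (1 - loading^2)."""
    s = sum(loadings)
    err = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + err)
```

A hypothetical subscale with four loadings of .70 shows how the Ease of Use pattern can arise: AVE = .49, just below .5, while CR = 7.84/(7.84 + 2.04) ≈ .79, comfortably above .7.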
Note. AVE = average variance extracted; CR = construct reliability; CTAQ = Computer-Delivered Test Acceptance Questionnaire.
To ascertain the distinctiveness of each subconstruct, the interfactor correlations were compared with the square roots of the AVE values (Fornell & Larcker, 1981; Hair et al., 2007); a subconstruct is considered distinct when the square root of its AVE exceeds its correlations with the other subconstructs. As Table 9 shows, all the subconstructs appear adequately distinct.
Table 9. Distinctiveness of Subconstructs (S2).
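The Fornell–Larcker comparison can be sketched as follows, taking each factor’s AVE and the factor correlation matrix as inputs. The function name and the example values are hypothetical, not those in Table 9.

```python
import numpy as np

def fornell_larcker(ave_values, phi):
    """For each factor, True if sqrt(AVE) exceeds the absolute value of every
    correlation between that factor and the other factors."""
    root = np.sqrt(np.asarray(ave_values, dtype=float))
    phi = np.abs(np.asarray(phi, dtype=float))
    k = len(root)
    return [bool(all(root[i] > phi[i, j] for j in range(k) if j != i)) for i in range(k)]
```

For example, AVEs of .49, .64, and .81 give square roots of .70, .80, and .90; all three factors pass if no interfactor correlation exceeds those values, whereas a correlation of .72 between the first two factors would fail the criterion for the first.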
Refers to the square root of average variance extracted. Bolded digits are values of square root of AVE.
Discussion
This study sought to develop an instrument to measure CDT acceptance for tests delivered by e-assessment/e-exam/e-test systems. Results of the EFA and CFA suggest adequate fit for a three-factor model of CDT acceptance. Based on the final 13 items, a one-way between-subjects analysis of variance (ANOVA) was performed on S2. Levene’s test indicated that the assumption of homogeneity of variance was not violated, F(2, 481) = .50, p = .604. The ANOVA showed a main effect of grade level on CTAQ scores, F(2, 481) = 8.02, p < .05, η2 = .03. This is a small effect: approximately 3% of the variance in CTAQ scores was explained by participants’ grade level.
Post hoc analyses using the Ryan–Einot–Gabriel–Welsch test suggested that the group means of Grades 10 and 11 were comparable, whereas the Grade 7 mean was statistically significantly different. Given the small effect size of grade level, this difference, a slightly higher mean for Grade 7, may be attributable to a difference in item types: the Grade 7 participants completed only selected-response (multiple-choice) items, whereas the Grade 10 and 11 participants completed tests that also included constructed-response items.
Furthermore, following the recommendations of Vandenberg and Lance (2000), Cheung and Rensvold (2002), Chen (2007), and van de Schoot, Lugtig, and Hox (2012), there was little evidence of measurement noninvariance between Grade 7 and Grades 10/11, which would otherwise preclude meaningful comparisons between these groups (see Table 10).
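The reported effect size can be checked against the F statistic. For a one-way ANOVA, η2 = F·df_between / (F·df_between + df_within), so 8.02 × 2 / (8.02 × 2 + 481) = 16.04/497.04 ≈ .032, consistent with the reported η2 = .03. The sketch below also computes η2 directly from group data; the synthetic groups are hypothetical, not study data, and `center="mean"` selects the original Levene test rather than SciPy’s default median-centered (Brown–Forsythe) variant. SAS’s Levene option may use squared rather than absolute deviations, so values need not match exactly.

```python
import numpy as np
from scipy import stats

def eta_squared_from_f(f, df_between, df_within):
    """Eta squared implied by a one-way ANOVA F statistic and its degrees of freedom."""
    return f * df_between / (f * df_between + df_within)

def eta_squared(groups):
    """Eta squared computed directly as SS_between / SS_total."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_x = np.concatenate(arrays)
    grand = all_x.mean()
    ss_between = sum(len(a) * (a.mean() - grand) ** 2 for a in arrays)
    ss_total = np.sum((all_x - grand) ** 2)
    return ss_between / ss_total

# Reported values: F(2, 481) = 8.02.
print(round(eta_squared_from_f(8.02, 2, 481), 3))  # 0.032

# Hypothetical synthetic CTAQ mean scores for three grade levels (not study data).
rng = np.random.default_rng(0)
groups = [rng.normal(m, 0.45, 160) for m in (3.2, 3.05, 3.0)]
f_stat, p_val = stats.f_oneway(*groups)
lev_stat, lev_p = stats.levene(*groups, center="mean")  # classic Levene test
df_within = sum(len(g) for g in groups) - len(groups)
print(eta_squared(groups), eta_squared_from_f(f_stat, len(groups) - 1, df_within))
```

The two printed η2 values on the last line agree, because η2 = SS_between/(SS_between + SS_within) is an algebraic rearrangement of the F ratio.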
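The change-in-fit comparisons cited above are often summarized with the cutoffs ΔCFI ≤ .01 (Cheung & Rensvold, 2002) and ΔRMSEA ≤ .015 (Chen, 2007) between successively constrained models. The sketch below applies those two rules to an ordered list of models; the model names and fit values in the usage note are hypothetical, not those in Table 10, and the exact decision rules used in this study are not stated.

```python
def invariance_steps(fits, max_dcfi=0.01, max_drmsea=0.015):
    """Compare each nested model with the previous, less constrained one.

    fits: list of (name, cfi, rmsea) ordered from least to most constrained,
    e.g., configural -> metric -> scalar. Returns (name, dCFI, dRMSEA, supported)
    where supported means CFI dropped by no more than max_dcfi and RMSEA rose
    by no more than max_drmsea."""
    out = []
    for (_, prev_cfi, prev_rmsea), (name, cfi_v, rmsea_v) in zip(fits, fits[1:]):
        d_cfi = prev_cfi - cfi_v
        d_rmsea = rmsea_v - prev_rmsea
        out.append((name, d_cfi, d_rmsea, d_cfi <= max_dcfi and d_rmsea <= max_drmsea))
    return out
```

For example, hypothetical fits of configural (CFI .960, RMSEA .050), metric (.955, .052), and scalar (.940, .060) models would support metric invariance but not scalar invariance.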
Table 10. Test of Measurement Invariance by Comparing Grade 7 With Grades 10/11.
Note. χ2 = chi-square statistic; df = degrees of freedom; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
p < .001.
Limitations and Directions for Future Research
Although the CTAQ has practical applications, this study is not without limitations. The sample used to develop and validate the CTAQ restricts its use in several ways. First, the sample comprised Secondary 1 (Grade 7), Secondary 4 (Grade 10), and Junior College/Integrated program Year 1 (Grade 11) students. Students were not asked to provide their gender and age because the CTAQ was bundled with content-related tests, but students at these levels are normally aged between 13 and 17 years. Hence, the three-factor model of the CTAQ is most applicable to Singapore students within this age range, and applying the CTAQ at the primary level is a concern. Given that national-level e-assessment has recently been extended to the primary level, the CTAQ should be validated with a primary-level sample.
Second, the sample comprised students in the mainstream Singapore education system, where access to a computer at home and household access to the Internet have been increasing since 2007; in 2017, the most recent year reported, these figures were 87% and 91%, respectively (Infocomm Media Development Authority, 2018). Singapore is also a country (a) where home ICT equipment and time spent using the Internet are above the Organisation for Economic Co-operation and Development (OECD) average, (b) where the ratio of students to school computers is below the OECD average, and (c) where students’ use of ICT in school is below the OECD average (OECD, 2015). Findings should therefore be interpreted with caution if the CTAQ is applied in contexts significantly different from that of Singapore.
Third, although efforts have been made to ensure that all the items are system-agnostic, results should be interpreted with caution when the CTAQ is applied to e-assessment systems whose user interfaces differ significantly from those used in its development.
Fourth, further research could examine how item types affect students’ CDT acceptance. Although the effect size of grade level in this study was small, the Grade 7 students completed a test consisting entirely of selected-response (multiple-choice) items, whereas the other students completed a mix of selected- and constructed-response items.
Conclusion and Practical Implications
This study to develop and validate an instrument measuring CDT acceptance within the context of e-assessment systems yielded three key dimensions: ease of use, experience, and involvement. Based on the results, the CTAQ holds promise as an instrument for providing information about students’ acceptance of CDT, with which students’ CDT experience can be enhanced in the context of e-assessment in Singapore. The CTAQ may also be applied in contexts beyond Singapore that do not differ significantly from it. Although scores for each of the three CTAQ constructs (ease of use, involvement, and experience) could be interpreted separately, for practical purposes such as general sensing, an index, the Computer-Delivered Test Acceptance Index (CTAI), could be considered as part of the CTAQ scale development.
In establishing an appropriate and robust method to aggregate the CTAQ items into the CTAI, two methods and their results were examined, noting that scales are “measurement instruments that are collections of items combined into a composite score and intended to reveal levels of theoretical variables not readily observable by direct means” (DeVellis, 2017, p. 15). The first, unit-weighted method aggregated the CTAQ items at the factor level with equal weights within each factor and then averaged the three factor scores to obtain the CTAI. The second, regression-weighted or weighted sum score (WSS; DiStefano, Zhu, & Mindrila, 2009) method treated the three-factor model as empirical evidence of the underlying structure of the latent construct of CDT acceptance; the weights were therefore based on the CFA loadings, given their confirmatory nature. Although WSS is more technically defensible and allows items with higher factor loadings to have a larger effect on the factor score, factor loadings may vary with the factor extraction or rotation method. The entire sample (N = 969) was used to establish a CTAI threshold. Table 11 compares the factor and CTAI composite scores from the two methods.
Table 11. Comparison of Factors and CTAI Composite Scores From Two Methods.
Note. CTAI = Computer-Delivered Test Acceptance Index.
As the CTAI was an initial effort to understand students’ acceptance of CDT, equal weighting for all items (i.e., Method 1) is suggested. This method accords equal status to all items, given that there are currently no statistical or empirical grounds for choosing a different weighting scheme. Moreover, differences between the CTAI composite scores computed with the two methods were negligible.
CTAQ respondents who accept CDT from an e-assessment system are expected to score higher on the 4-point Likert-type scale (coded as 4 = strongly agree, 3 = agree, 2 = disagree, 1 = strongly disagree). Analyses of the score distribution suggest 3.0 as an appropriate initial threshold; because a score of 3 corresponds to agree, a CTAI at this threshold indicates, on average, agreement with the CTAQ items and thus a higher level of acceptance of CDT than of nonacceptance.
Based on evidence from test content, response processes, and internal structure, and from a practical application and adoption perspective, the CTAQ is a convenient yet valid instrument that can be completed within 10 min. An initial CTAI threshold of 3.0 should also be easy for users of the CTAQ to interpret.
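The two aggregation methods and the 3.0 threshold can be sketched as follows. The item-to-factor mapping and the loadings are hypothetical placeholders (the actual assignment is in Table 5 and the CFA loadings are not reproduced here); normalizing the weighted sum by the sum of loadings, to keep scores on the 1-4 metric, is one variant of WSS assumed for illustration; and treating a CTAI exactly equal to 3.0 as meeting the threshold is also an assumption.

```python
import numpy as np

# Hypothetical item-to-factor mapping for the 13 retained items (column indices).
FACTORS = {
    "ease_of_use": [0, 1, 2, 3, 4],
    "experience": [5, 6, 7, 8],
    "involvement": [9, 10, 11, 12],
}

def ctai_unit_weighted(responses, factors=FACTORS):
    """Method 1: equal-weight item mean within each factor, then mean of factor scores."""
    X = np.asarray(responses, dtype=float)
    scores = np.column_stack([X[:, idx].mean(axis=1) for idx in factors.values()])
    return scores.mean(axis=1)

def ctai_loading_weighted(responses, loadings, factors=FACTORS):
    """Method 2 (one variant): loading-weighted item mean within each factor,
    divided by the sum of loadings to stay on the 1-4 metric, then averaged."""
    X = np.asarray(responses, dtype=float)
    cols = []
    for name, idx in factors.items():
        w = np.asarray(loadings[name], dtype=float)
        cols.append(X[:, idx] @ w / w.sum())
    return np.column_stack(cols).mean(axis=1)

def meets_threshold(ctai, threshold=3.0):
    """True where the CTAI reaches the initial threshold (>= is an assumption)."""
    return np.asarray(ctai) >= threshold
```

When loadings within a factor are similar, the two methods give nearly identical composites, which is consistent with the negligible differences reported above and with the preference for the simpler unit-weighted method.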
Acknowledgements
The author sincerely thanks Dr. Ng Siow Chin, Dr. Tay Poh Hua, and Esther Yee for their invaluable feedback during the phase of item development and analyses; Wee Tian Lu for her support in scanning the literature, checking the exploratory factor analysis (EFA) and computing the Computer-Delivered Test Acceptance Index (CTAI); and Hazel Tan for her support during the phase of item generation and in carrying out the study.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
