Abstract
Student evaluation of teaching (SET) assesses students’ experiences in a class to evaluate teachers’ performance in class. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and, therefore, are unable to provide a valid assessment. In addition, simultaneously estimating teachers’ teaching proficiency and students’ harshness remains an unaddressed issue in the context of online SET. In the current study, we develop and compare three novel methods—marginal, iterative once, and hybrid approaches—to improve the precision of parameter estimations. A simulation study is conducted to demonstrate that the hybrid method is a promising technique that can substantially outperform traditional methods.
Student evaluation of teaching (SET) is a teaching assessment format that is used for measuring teachers’ teaching proficiency in educational environments. Students participate in a SET assessment to evaluate their teacher’s teaching performance; SET is usually conducted toward the end of a semester. Estimating teacher performance is important because teacher performance evaluation is critical for schools to understand the quality of teaching. Good teaching can make students’ learning more productive (Kember & Wong, 2000). Therefore, it is important to monitor and estimate the quality of teaching. Teachers can be either promoted or penalized based on their teaching performance, especially in private schools in Asia. Thus, obtaining reliable estimates is critical for teachers.
Conventionally, SET is administered in a paper-and-pencil format. Currently, a growing number of SETs have moved to online formats. For example, online SET has been used at the University of Oslo, where students are asked to provide their feedback on and evaluations of teachers regarding the teaching effectiveness based on the student’s learning experience in the classroom. Using online SET has numerous advantages, such as the ease of recording item responses, the reduction of recording errors, and the avoidance of logistical problems. Additionally, students do not have to fill out the SET questionnaire in front of their teacher, which avoids situations where the students might not truthfully fill out the SET. Moreover, online SET is easy to administer, and the items can be customized for each teacher. In addition, Gamliel and Davidovitz (2005) have further shown that online SET has an internal reliability of .89, which is higher than that of the paper-and-pencil format. Despite these striking advantages, online SET is prone to low response rates (Dommeyer et al., 2004), possibly due to students’ low levels of motivation and interest. For example, in 4-year medical student curriculum evaluations at Kansas Medical Center, the response rate was only 24% because students had to answer 62 items (Anderson et al., 2005; Paolo et al., 2000).
To address the low response rate concern of online SET, computerized adaptive testing (CAT) could be a viable alternative. CAT not only reduces the questionnaire length but also precisely estimates the student’s latent traits. The advantages of CAT have been well documented in the literature (for more information, see Wainer et al., 2000). However, it remains unclear whether CAT can be directly applied to online SET. Because a teacher’s teaching proficiency is evaluated by students, the students’ rating harshness could affect the scores. A critical student may give lower scores than a more lenient student when rating the same teacher, even though that teacher would receive the same score from both if all students were equally harsh. Three facets are therefore conceptualized: items, students’ harshness toward the teacher, and teachers’ teaching proficiency. Here, there is an association between SET and conventional multifaceted measurements. For the conventional multifaceted measurements, raters are asked to evaluate ratees’ performances on, for example, essay writing in paper-and-pencil format. Similarly, online SET requires students (raters) to evaluate teachers’ teaching performance (ratees). Thus, the measurement model for the conventional multifaceted measurement, namely, the multifaceted rating scale model (MFRSM; Linacre, 1993), can be the fundamental psychometric model to develop CAT algorithms for the online SET scenario.
Before using online SET (or SET-CAT), the item parameters must be estimated from a SET dataset. These item estimates constitute an item bank for subsequent use in CAT. Usually, the item estimates are assumed to be fixed and known in CAT. However, in the SET scenario, the student’s trait (γ: student harshness) and the teacher’s trait (θ: teaching proficiency) are two unknown parameters that must be estimated during CAT. In contrast, a conventional CAT only addresses one unknown parameter (i.e., the student’s trait). As a result, estimating γ and θ simultaneously is a challenging task for online SET, and this difficulty seriously impedes the empirical application of SET-CAT. If γ is ignored while estimating θ, the accuracy of the θ estimates is expected to suffer. In the current study, we develop new algorithms for estimating γ and θ simultaneously in SET-CAT. To show the detriment of ignoring γ and the benefit of estimating γ and θ together, a Monte Carlo simulation study is conducted.
The rest of the current paper is organized as follows. First, we introduce the MFRSM (Linacre, 1993) for the SET assessment. Second, three online estimation methods for γ and θ in the SET-CAT context are proposed. Third, a simulation study is conducted to compare the performance of the three new methods with a naïve method ignoring γ. Finally, concluding remarks are provided to summarize our findings and outline further directions for SET-CAT.
Multifaceted Rating Scale Model
The MFRSM is often used to deal with the rater effect when raters score examinees’ responses. Three facets are involved in the MFRSM—item, examinee, and rater. In the context of SET, the teacher’s teaching proficiency is evaluated by the student. Hence, the three facets are item, teacher, and student. Thus, the student is considered the “rater,” whereas the teacher is regarded as the “ratee.”
Ignoring the rater effect has been found to be detrimental to parameter estimates, leading to unreliable scores (Hoyt, 2000; Wolfe, 2004); some raters have confined the range of ratings (Holzbach, 1978), and the halo effect of the rater can inflate the correlation between latent traits (Hoyt, 2000). The rater effect could be attributed to harshness versus leniency, centrality versus extremity, and accuracy versus inaccuracy (Acuña, 2017; Wolfe, 2004). Although the types of rater effects can be numerous, a more crucial concern is how to eliminate rater effects and attain reliable scores. In the current study, a rater effect of harshness versus leniency that would bias the parameter estimates (e.g., Boone et al., 2016; Wesolowski et al., 2016; Wind & Jones, 2019) is considered. Accounting for the various rater effects, such as halo effects, centrality/extremity, or inaccuracy, is possible; however, such accounting introduces model identification problems to distinguish them in the SET-CAT with sparse item responses. Therefore, we focus on eliminating the harshness effect, which can be achieved by using the MFRSM.
The probability function of the MFRSM for a positive response is formulated as follows
The marginal maximum likelihood with the expectation-maximization method (Dempster et al., 1977; Muraki, 1992) is one method often used to calibrate parameters in item response theory models. γ is considered a random-effect variable and is marginalized in the joint likelihood function of all the parameters, given observed responses. θ is regarded as a fixed effect, and no prior distribution is included. The prior distribution of γ is usually assumed to be a normal distribution with a zero mean (fixed for identification) and an estimable variance parameter, σ². Additionally, a constraint shall be imposed on θ or δ, for example, ∑_r θ_r = 0. In the item bank building stage, the SET is administered to a group of students to rate the teachers’ teaching proficiency.
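To make the three facets concrete, the following minimal sketch computes category probabilities under a rating-scale form of the MFRSM. It assumes the commonly used adjacent-category formulation, log(P_k / P_(k−1)) = θ_r − γ_i − δ_j − τ_k, which may differ in notation or sign convention from the article's own equation. The function name `mfrsm_probs` and the example trait values are hypothetical; the thresholds are those reported in the simulation design.

```python
import math

def mfrsm_probs(theta, gamma, delta, taus):
    """Category probabilities for one rating under a rating-scale MFRSM.

    Assumed adjacent-category form (an assumption, not quoted from the article):
        log(P_k / P_{k-1}) = theta - gamma - delta - tau_k,  k = 1..K
    theta: teacher proficiency, gamma: student harshness,
    delta: item difficulty, taus: threshold parameters.
    """
    # Cumulative sums of the step logits; category 0 has logit 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - gamma - delta - tau))
    # Normalize with max-subtraction for numerical stability.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

TAUS = [-2.0, -1.07, 3.07]  # thresholds reported in the simulation design

# Same teacher and item, rated by a neutral and by a harsh student.
probs = mfrsm_probs(theta=0.5, gamma=0.0, delta=0.0, taus=TAUS)
harsh = mfrsm_probs(theta=0.5, gamma=1.0, delta=0.0, taus=TAUS)
print([round(p, 3) for p in probs])
print([round(p, 3) for p in harsh])
```

Under this assumed form, a larger γ shifts probability mass toward the lower rating categories for the same teacher and item, which is the harshness effect the model is meant to absorb.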
Latent Trait Estimation in SET-CAT
The difference between a conventional CAT and SET-CAT is that the former only has a single trait (e.g., θ) to estimate, while the latter has γ (student) and θ (teacher). The target of SET-CAT is to estimate the teacher’s θ precisely with the student’s γ. The two unknown parameters must be estimated during SET-CAT. However, an identification issue arises because γ – θ = (γ + c) – (θ + c) = γ* – θ*, where c is constant. Note that in some respects, SET-CAT is similar to conventional CAT with two-dimensional IRT models. The specific difference is that SET-CAT is concerned with rater data (teachers evaluated by students) instead of rating data (students responding to items). Additionally, the model used in SET-CAT is MFRSM rather than other regular models, such as multidimensional generalized partial credit models (Reckase, 2009). In the following, we propose a marginal method (MM), iterative once method (IOM), and hybrid method for latent trait estimation.
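The identification issue can be written out in one line. The sketch below assumes the adjacent-category linear predictor η = θ_r − γ_i − δ_j − τ_k (a common MFRSM convention, not necessarily the article's exact notation) and shows that adding the same constant c to both traits leaves the predictor, and hence the likelihood, unchanged.

```latex
\begin{align}
\eta_{irjk} &= \theta_r - \gamma_i - \delta_j - \tau_k \\
            &= (\theta_r + c) - (\gamma_i + c) - \delta_j - \tau_k
             = \theta_r^{*} - \gamma_i^{*} - \delta_j - \tau_k .
\end{align}
```

Some additional restriction is therefore needed to fix c, for example a zero-mean prior on γ or a sum-to-zero constraint on θ, as described for the calibration stage.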
Marginal Method
Consider the log-likelihood function with respect to γ_i and θ_r for student i and teacher r
Remarkably, the log-likelihood function is a function of the parameter θ_r (teacher),
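As an illustration of marginalizing student harshness, the sketch below integrates γ out of one student's likelihood using a normal prior N(0, σ²) and a simple equally spaced grid, then maximizes the result over θ by grid search. It is a sketch under stated assumptions rather than the article's Equation (3): the adjacent-category model form, the grid quadrature, the grid search, and all function names and example responses are hypothetical choices made for illustration.

```python
import math

def category_probs(theta, gamma, delta, taus):
    # Assumed form: log(P_k / P_{k-1}) = theta - gamma - delta - tau_k
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - gamma - delta - tau)
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [x / s for x in w]

def marginal_loglik(theta, responses, deltas, taus, prior_var=1.0, n_nodes=61):
    """Log of the likelihood of theta with gamma integrated out.

    The integral over gamma ~ N(0, prior_var) is approximated by a
    Riemann sum on an equally spaced grid spanning +/- 5 prior SDs.
    """
    sd = math.sqrt(prior_var)
    step = 10.0 * sd / (n_nodes - 1)
    total = 0.0
    for q in range(n_nodes):
        gamma = -5.0 * sd + q * step
        prior = math.exp(-0.5 * (gamma / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        lik = 1.0
        for x, delta in zip(responses, deltas):
            lik *= category_probs(theta, gamma, delta, taus)[x]
        total += lik * prior * step
    return math.log(total)

def estimate_theta(responses, deltas, taus, prior_var=1.0):
    # Grid search over theta in [-4, 4] with step 0.05 (illustrative only).
    grid = [g / 20.0 for g in range(-80, 81)]
    return max(grid, key=lambda t: marginal_loglik(t, responses, deltas, taus, prior_var))

TAUS = [-2.0, -1.07, 3.07]
deltas = [-0.5, 0.0, 0.5, 1.0, -1.0]          # hypothetical item difficulties
high = estimate_theta([3, 3, 2, 2, 3], deltas, TAUS)
low = estimate_theta([0, 1, 0, 0, 1], deltas, TAUS)
print(high, low)
```

Because γ never has to be estimated for the individual student, this kind of estimator only needs the prior variance, which is the quantity varied across the MM conditions in the simulation.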
Iterative Once Method (IOM)
The iterative once method (IOM) estimates γ and θ. Specifically, for student i giving rating w to teacher r, the first step fixes the teaching proficiency θ_r to the provisional estimated value
The first step continues until at least one student has rated teacher r, and the next step estimates teaching proficiency θ_r given the estimates of γ_1, …, γ_l, where l denotes the number of students thus far. With the known values of item parameters
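The two-step logic can be sketched as follows. This is a loose illustration, not the article's algorithm: it assumes the adjacent-category model form, uses grid search instead of Newton-type updates, and, as an added assumption, places a N(0, 1) prior on γ in step 1 so that the harshness estimate does not absorb all of the information in a student's ratings. All names and example data are hypothetical.

```python
import math

TAUS = [-2.0, -1.07, 3.07]
GRID = [g / 20.0 for g in range(-80, 81)]  # candidate values from -4 to 4

def log_prob(x, theta, gamma, delta, taus=TAUS):
    # Assumed form: log(P_k / P_{k-1}) = theta - gamma - delta - tau_k
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - gamma - delta - tau)
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[x] - log_norm

def update_gamma(responses, deltas, theta_fixed, prior_var=1.0):
    """Step 1: estimate one student's gamma with theta held fixed.

    The N(0, prior_var) penalty is an assumption added for this sketch.
    """
    def objective(g):
        loglik = sum(log_prob(x, theta_fixed, g, d) for x, d in zip(responses, deltas))
        return loglik - g * g / (2.0 * prior_var)
    return max(GRID, key=objective)

def update_theta(ratings_by_student, gammas):
    """Step 2: estimate theta from all students so far, gammas held fixed."""
    def loglik(t):
        return sum(log_prob(x, t, g, d)
                   for (responses, deltas), g in zip(ratings_by_student, gammas)
                   for x, d in zip(responses, deltas))
    return max(GRID, key=loglik)

# Two hypothetical students rate the same teacher on five items each.
stream = [([2, 3, 2, 2, 3], [0.0, -0.5, 0.5, 0.2, -0.2]),
          ([2, 2, 3, 2, 2], [0.1, -0.3, 0.4, 0.0, -0.1])]
theta_hat = 0.0                      # provisional starting value
ratings, gammas = [], []
for responses, deltas in stream:
    gammas.append(update_gamma(responses, deltas, theta_hat))
    ratings.append((responses, deltas))
    theta_hat = update_theta(ratings, gammas)
print(theta_hat, gammas)
```

Without some restriction on γ, alternating pure maximum likelihood steps for a single teacher would leave θ at its starting value, which is another way of seeing the identification issue described earlier.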
Hybrid Method
A hybrid method is proposed that takes advantage of the MM and the IOM. The MM does not require estimating
The schema of the hybrid method is summarized as follows:
If the standard error of the current
-MM. Execute the MM (Equation (3)) to obtain
-IOM. Execute the IOM to obtain
If student i is required to rate more teachers, return to Step 1 for the teachers.
The above steps are carried out for each student. The hybrid method could improve the precision of
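The switching idea behind the hybrid method can be sketched as a simple rule on the standard error of the provisional teacher estimate. The sketch assumes, based on the description in the concluding section, that the MM is used while the standard error is still large and the IOM once it falls below a threshold such as 0.3. The function names, the direction of the inequality at the boundary, and the information values are hypothetical.

```python
import math

def standard_error(total_information):
    # Asymptotic standard error from accumulated Fisher information.
    return 1.0 / math.sqrt(total_information)

def choose_estimator(se_theta, threshold=0.3):
    """Return the estimator the hybrid scheme would run next (assumed rule)."""
    return "IOM" if se_theta < threshold else "MM"

# Made-up information contributed by four successive students.
trace = []
info = 0.0
for student_information in [4.0, 4.0, 4.0, 4.0]:
    info += student_information
    trace.append(choose_estimator(standard_error(info)))
print(trace)
```

With these illustrative numbers the rule relies on the prior-based MM for the first two students and switches to the IOM once enough information about the teacher has accumulated.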
Gamma Ignored Method (GI)
The GI method ignores every student’s harshness and only estimates the teacher’s proficiency θ. Therefore, the log-likelihood function for θ_r is as follows
Item Selection
In the current study, the Fisher information,
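Maximum-information item selection can be illustrated with a short sketch. For a rating scale model with unit discrimination, the Fisher information of an item about θ equals the variance of the category score at the current linear predictor; the code below uses that standard result together with the assumed adjacent-category form θ − γ − δ − τ_k. The function names, item difficulties, and provisional estimates are hypothetical.

```python
import math

def category_probs(eta, taus):
    # Assumed form: log(P_k / P_{k-1}) = eta - tau_k, eta = theta - gamma - delta
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + eta - tau)
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [x / s for x in w]

def item_information(theta, gamma, delta, taus):
    """Fisher information about theta: variance of the category score."""
    probs = category_probs(theta - gamma - delta, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum(k * k * p for k, p in enumerate(probs)) - mean ** 2

def select_item(theta, gamma, deltas, taus, administered):
    """Pick the unused item with maximum information at the provisional estimates."""
    candidates = [j for j in range(len(deltas)) if j not in administered]
    return max(candidates, key=lambda j: item_information(theta, gamma, deltas[j], taus))

TAUS = [-2.0, -1.07, 3.07]
deltas = [-1.5, -0.5, 0.0, 0.8, 1.6]   # hypothetical item bank
used = {2}
pick_ignoring_gamma = select_item(0.5, 0.0, deltas, TAUS, used)
pick_with_gamma = select_item(0.5, 1.5, deltas, TAUS, used)
print(pick_ignoring_gamma, pick_with_gamma)
```

Calling `select_item` with γ fixed at zero mimics a selection rule that ignores individual harshness, whereas passing a provisional γ estimate lets the selected item depend on the student.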
Simulation Study
The simulation study aims to investigate the accuracy and precision of the θ and γ estimations for the MM, IOM, and hybrid methods when compared with the GI method. In the following simulation study, we examine the impact of the four methods and the test length on the parameter estimates. The computer software MATLAB was used to conduct the simulation studies. The execution details of the three proposed methods are summarized in the Appendix.
Design
A simulated SET scenario resembling the real conditions of the 2008 National Dong Hwa University evaluations in Taiwan was carried out. National Dong Hwa University had a total of 173 teachers rated by 6111 students. The course load for each teacher ranged from one to eight classes, and each teacher was rated by 12–146 students. For a reasonable simulation execution time, we set 1000 students rating 50 teachers, where every teacher taught four classes and each class had 20 students on average. Every teacher was rated by 80 students in total from the four different classes, and every student rated four teachers in the four different classes. For generalizability of the study, we also simulated conditions with small class sizes. The small class size conditions set the class size to 3 or 5: teachers were evaluated by students from four classes of 3 or 5 students each, so each teacher was rated by 12 or 20 students, respectively. These conditions reflected the empirical situation of Master’s or PhD program classes in universities where fewer than 10 students were enrolled in a class, and teachers were rated by approximately 10 or 20 students in total. A more extreme condition was considered where the class size equaled one. Every teacher taught four classes, which means each ratee was rated by four raters. In summary, four levels of class sizes were simulated with 20, 5, 3, or 1 student in a class, which represented 80, 20, 12, and 4 students evaluating each teacher, respectively.
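The rater counts in the design follow from simple arithmetic, which the short check below reproduces. The variable names are illustrative; the numbers are those stated in the design.

```python
CLASSES_PER_TEACHER = 4

# Raters per teacher for each simulated class size.
raters_per_teacher = {size: size * CLASSES_PER_TEACHER for size in (20, 5, 3, 1)}
print(raters_per_teacher)

# Consistency check for the main condition: 1000 students each rate
# 4 teachers, and 50 teachers are each rated by 80 students.
n_students, teachers_per_student, n_teachers = 1000, 4, 50
total_ratings = n_students * teachers_per_student
print(total_ratings, n_teachers * raters_per_teacher[20])
```

Both sides of the consistency check give the same number of student-teacher pairs, so the main condition is balanced as described.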
The item pool contains 100 items, with the difficulty parameters generated from a normal distribution with zero mean and unit variance. The 100 items were sufficient to yield a high level of precision for the parameter estimation in CAT (Rudner, 2009). The three threshold parameters were set to [−2.0, −1.07, 3.07] from an empirical analysis of SET (Setari et al., 2016). The item responses were simulated by the MFRSM.
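A minimal sketch of the data-generating step is given below. It draws 100 item difficulties from N(0, 1), uses the reported thresholds, and samples ratings from the assumed adjacent-category MFRSM form. The distributions used here for θ and γ (both standard normal), the seed, and the function names are assumptions for illustration only; the original study was run in MATLAB.

```python
import math
import random

rng = random.Random(2024)              # arbitrary seed for reproducibility
TAUS = [-2.0, -1.07, 3.07]             # thresholds reported in the design

deltas = [rng.gauss(0.0, 1.0) for _ in range(100)]    # item pool
thetas = [rng.gauss(0.0, 1.0) for _ in range(50)]     # teachers (assumed N(0, 1))
gammas = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # students (assumed N(0, 1))

def category_probs(theta, gamma, delta, taus=TAUS):
    # Assumed form: log(P_k / P_{k-1}) = theta - gamma - delta - tau_k
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - gamma - delta - tau)
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [x / s for x in w]

def simulate_rating(theta, gamma, delta):
    """Draw one rating (0..3) from the category probabilities."""
    probs = category_probs(theta, gamma, delta)
    return rng.choices(range(len(probs)), weights=probs)[0]

# Three students rate the first teacher on the first five items.
ratings = [simulate_rating(thetas[0], gammas[i], deltas[j])
           for i in range(3) for j in range(5)]
print(ratings)
```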
Test lengths of 5, 10, and 20 items were chosen. The 20-item test length resembles the SET used at National Dong Hwa University in Taiwan. The 10-item condition was included to examine the effect of fewer items (Stocking, 1994; Wang & Kolen, 2001). The 5-item design aimed to show the impact of a short test length.
The three manipulated variables are test length, parameter estimation method, and class size. For the estimation method, the six conditions are the MM with a prior on γ with a variance of one, the MM with a prior on γ with a variance of two, the MM with a prior on γ with a variance of three, the IOM, the hybrid method, and the GI. For class size, the four conditions were 20, 5, 3, or 1 student in a class.
A normal distribution of the teacher population was considered. A total of 50 teaching proficiencies
Six parameter estimation methods, based on the MM, IOM, hybrid, or GI, were applied in each condition. We considered the MM in three conditions: MM[σ² = 1], MM[σ² = 2], and MM[σ² = 3], where σ² denotes the variance of the prior distribution of γ. The condition of σ² = 1 represents a more informative prior distribution, while σ² = 3 represents a less informative prior distribution. For illustration purposes, the hybrid method used the prior distribution of γ with the variance equal to 1. The maximum Fisher information criterion was used for item selection. The initial items were randomly chosen to vary the selected items in the very early stage of CAT.
The provisional estimates
The conditions of three test lengths (5, 10, and 20) and the four class sizes (20, 5, 3, and 1) yield a total of 12 conditions in the simulations. The six approaches (MM[σ² = 3], MM[σ² = 2], MM[σ² = 1], IOM, hybrid, and GI) were compared for each condition. Every condition was implemented for 25 replications, which is sufficient to demonstrate the detriment of ignoring the γ parameter (see also Harwell et al., 1996). In addition, we added one more simulation condition where the item bank for SET-CAT contained only 20 items. The purpose was to inspect whether such a small item pool is sufficient in practice. This reflected the empirical situation that SET tests commonly contain few items. The 20 items were set for illustration purposes.
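The size of the fully crossed design can be enumerated directly. The sketch below uses illustrative labels for the six approaches and excludes the additional 20-item bank condition, which is handled separately in the text.

```python
from itertools import product

TEST_LENGTHS = (5, 10, 20)
CLASS_SIZES = (20, 5, 3, 1)
METHODS = ("MM[var=3]", "MM[var=2]", "MM[var=1]", "IOM", "hybrid", "GI")
REPLICATIONS = 25

conditions = list(product(TEST_LENGTHS, CLASS_SIZES))   # test length x class size
cells = list(product(conditions, METHODS))              # each condition x approach
runs = len(cells) * REPLICATIONS
print(len(conditions), len(cells), runs)
```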
For
Then, we explored the effects of methods, test lengths, and class sizes on the RMSE, bias, and reliability of teaching proficiency. We applied a three-way analysis of variance (ANOVA) where the RMSE, bias, and reliability of θ estimation were the outcome variables, and the methods, test lengths, and class sizes were the independent variables. The three-way interactions between methods, test lengths, and class sizes were examined.
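For reference, bias and RMSE are conventionally computed from true and estimated values as in the sketch below. These are the standard definitions and are assumed, not confirmed, to match the article's formulas; in a simulation they would typically also be averaged over replications. The function names and toy values are hypothetical.

```python
import math

def bias(true_values, estimates):
    """Mean signed error of the estimates."""
    n = len(true_values)
    return sum(e - t for t, e in zip(true_values, estimates)) / n

def rmse(true_values, estimates):
    """Root mean square error of the estimates."""
    n = len(true_values)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_values, estimates)) / n)

true_theta = [0.0, 1.0]
theta_hat = [1.0, 1.0]
print(bias(true_theta, theta_hat), rmse(true_theta, theta_hat))
```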
Several results were expected: (a) for all methods, the average RMSE of θ and γ would decrease as more items were used, and the average bias of θ and γ would be close to zero; (b) for the MM and the GI, the RMSE of γ would be larger than for the IOM and hybrid approaches because neither the MM nor the GI considers the individual’s γ in the provisional estimation and item selection; (c) an improvement in the precision (RMSE) of the γ estimate could improve the precision of the θ estimate; (d) the hybrid method would yield a lower RMSE for the θ estimate than the other methods.
Results
Class Size = 20
Table 1. Average Bias, RMSE, and Reliability of Latent Trait Estimation in the Ignoring γ, Marginal, Iterative, and Hybrid Methods for Class Size = 20.
Note. RMSE is the root mean square error, GI is the gamma ignored method, MM is the marginal method, IOM is the iterative once method, and σ² indicates the prior variance of γ in the marginal method.
The longer test lengths (i.e., 20 items) had a lower RMSE and higher reliability for all the methods. For θ and γ, the RMSE of
Among the four methods in the 5-item condition, the RMSE of
For the RMSE of
In the 10-item condition, for the hybrid method the RMSE of
In the 20-item condition, the IOM had the lowest RMSE, which is due to the fact that
The bottom of Table 1 shows that the MM, IOM, and hybrid methods had higher reliability than GI, especially in 5-item and 10-item situations. In the 20-item case, all methods had high reliability for the teaching proficiency estimates.
Figure 1 shows boxplots of the overlap rates among the four methods for the 5-, 10-, and 20-item test lengths. The MM with a prior N(0, 1) is presented in Figure 1 for illustration purposes. The overlap rates for the MM with priors N(0, 2) and N(0, 3) were similar to those for the MM with the prior N(0, 1) and are thus omitted from Figure 1. In the 5-item condition (top plot in Figure 1), the median overlap rate (the circle in the boxplot) between the IOM and the hybrid method was equal to zero. This means that for at least 50% of the test-takers, the 5 items administered under the IOM were entirely different from the items administered under the hybrid method. The highest overlap rate between the IOM and the hybrid method, 0.4, indicates that at most 2 of the 5 items administered to a test-taker under the IOM were also administered under the hybrid method. The overlap rates for the pairs of the MM and the hybrid, the MM and the IOM, the GI and the hybrid, and the GI and the IOM were equal to zero for all samples. This means that, for every person, the MM and the GI selected 5 items that were different from those selected by the IOM and the hybrid method. However, the overlap rate between the GI and the MM had a third quartile of 0.8 and a maximum value of 1.0. This means that approximately 25% of the students shared at least 4 of their 5 items between the GI and MM methods, and some of them shared all 5 items. The overlap rates in the 5-item condition imply that the items selected by the GI and the MM were highly similar to each other, whereas the items selected by the IOM and the hybrid method were moderately similar to each other. The items selected by the GI and the MM were different from those selected by the IOM and the hybrid method. The same conclusion about the overlap rate applies to the 10- and 20-item conditions (the middle and bottom plots in Figure 1).
For example, in the 20-item condition, the median overlap rate between the IOM and hybrid groups was approximately 0.3, whereas the median overlap rate between the GI and MM groups was high at 0.9. The medians of the overlap rate between the MM and the hybrid, the MM and the IOM, the GI and the hybrid, and the GI and the IOM groups were close to or equal to zero. The observation from Figure 1 can be used to explain the results in Table 1. The selected items for the MM were close to GI, so they both had worse precision for
Figure 1. Overlap rates among pairs of MM[σ² = 1], hybrid method, GI, and IOM with each other for the 5-, 10-, and 20-item test length conditions. The circles indicate the median across the students in the boxplot.
The overlap rate in Figure 1 partially explained the results of the average RMSE and reliability in Table 1. The moderate level of overlap rates between the hybrid and the IOM in Figure 1 matched the result in Table 1, where the hybrid method and the IOM had RMSE and reliability values of
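The overlap rate discussed above can be computed as the share of items that two methods administered to the same test-taker. The sketch below assumes this definition (shared items divided by test length), which is consistent with the reported value of 0.4 for 2 of 5 shared items; the function name and item indices are hypothetical.

```python
from statistics import median

def overlap_rate(items_a, items_b):
    """Proportion of administered items shared by two methods for one test-taker."""
    a, b = set(items_a), set(items_b)
    if len(a) != len(b):
        raise ValueError("both methods must administer the same number of items")
    return len(a & b) / len(a)

# Hypothetical 5-item tests for three test-takers under two methods.
method_a = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
method_b = [[4, 5, 6, 7, 8], [30, 31, 32, 33, 34], [40, 41, 42, 43, 44]]
rates = [overlap_rate(a, b) for a, b in zip(method_a, method_b)]
print(rates, median(rates))
```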
Class Size = 5, 3 or 1
Table 2. Average RMSE of Latent Trait Estimation in the Ignoring γ, Marginal, Iterative, and Hybrid Methods for Small and Extremely Small Class Size Conditions.
Under the condition of class size = 5 (the top six rows in Table 2), the IOM and hybrid methods performed slightly better (lower RMSE) than the MM in terms of RMSE for
In the class size = 1 condition (the bottom six rows in Table 2), the IOM had a higher RMSE of the
ANOVA Result
ANOVA Summary Table for the Outcome Variable RMSE of θ and the Independent Variables Method, Test Length, and Class Size.
Note. Df = Degrees of freedom.
ANOVA Summary Table for the Outcome Variable RMSE of γ and the Independent Variables Method, Test Length, and Class Size.
Note. Df = Degrees of freedom.
Results Conditional on θ Levels
For the conditional θ levels of [−3, −2, −1, 0, 1, 2, 3], Figures 2, 3, and 4 show the results of the 5-item, 10-item, and 20-item conditions with MM[σ² = 1].
Figure 2. Bias and RMSE of θ and γ conditional on the different θ levels for the test length of 5 items. GI is the gamma ignored method, MM is the marginal method, σ² indicates the prior variance of γ in the marginal method, and IOM is the iterative once method.
Figure 3. Bias and RMSE of θ and γ conditional on the different θ levels for the test length of 10 items. GI is the gamma ignored method, MM is the marginal method, σ² indicates the prior variance of γ in the marginal method, and IOM is the iterative once method.
Figure 4. Bias and RMSE of θ and γ conditional on the different θ levels for the test length of 20 items. GI is the gamma ignored method, MM is the marginal method, σ² indicates the prior variance of γ in the marginal method, and IOM is the iterative once method.


The upper two plots show the bias of teaching proficiency (on the left, Figure 2(a)) and student harshness (on the right, Figure 2(b)). The lower two plots show the RMSE of teaching proficiency (on the left, Figure 2(c)) and student harshness (on the right, Figure 2(d)). As shown in Figure 2(a), the biases of teaching proficiency in the four methods were larger at the extreme proficiency levels than at the middle proficiency levels. High proficiency had a positive bias, and low proficiency had a negative bias. Biased estimates at the extreme levels were anticipated for maximum likelihood estimation (Lord, 1986). Lord (1986) demonstrated that the bias of the maximum likelihood estimate is a function of the inverse of the information function (see equation (6) of Lord, 1986), so the extreme levels are more biased than the middle levels. The hybrid method had less biased estimates for θ compared with the other methods. Generally, all methods had biases close to zero in the middle θ levels. In Figure 2(b), the bias of student harshness remained at zero across the levels of teaching proficiency. The average bias of student harshness was close to zero because the mean of student harshness was constrained to zero for model identification.
For the RMSE of teacher proficiency (see Figure 2(c)), the hybrid method performed similarly to the MM and the IOM from the −1 to 1 level of teaching proficiency but had a lower RMSE than the MM and the IOM at the −3 and 3 levels. That is, the hybrid method performed better than the MM and the IOM at the extreme θ levels. GI had the highest RMSE of θ across all levels of teaching proficiency. This suggests that the three proposed methods improved the RMSE for all teacher proficiency levels. For the RMSE of student harshness γ (see Figure 2(d)), the GI and MM had similar RMSEs across all levels of teaching proficiency. This is because neither the GI nor the MM updated γ or selected items by considering γ estimates. Although the MM considered prior information for γ, the prior was constant across all students. The selected items in the MM provided the maximum information for updated
Figure 3 shows the bias and RMSE conditional on the θ levels for the 10-item situation. Figure 3(a) shows that the hybrid method had a smaller average bias on θ estimation at the −3 and +3 levels than the other three methods. The average bias for the γ estimates was zero across methods (Figure 3(b)). Figure 3(c) shows that the RMSEs ranged from 0.087 to 0.203 for the IOM, from 0.087 to 0.200 for the MM, and from 0.088 to 0.201 for the GI. The hybrid method performed better than the MM and the IOM for the RMSE of
Figure 4 shows the bias and RMSE conditional on the θ levels for the 20-item condition. Figure 4(a) shows that the GI method had a larger average bias for the θ levels at −3 and +3 than the other methods. The average bias for the γ estimates was zero across methods (Figure 4(b)). Figure 4(c) shows that the MM, IOM, and the hybrid method had a lower RMSE of
Only 20 Items in the Item Bank
Root Mean Square Error (RMSE) of
Conclusion and Discussion
The current study proposes three estimation methods for SET-CAT to take teacher proficiency and student harshness into account. The GI was the baseline method that fixes all student harshness values at zero and updates the teacher’s provisional proficiency estimates. The MM marginalizes student harshness using prior distributions in the likelihood function, whereas the IOM method updates student harshness based on the response given for each item and updates teacher proficiency for each student’s completed evaluation. The hybrid method advances the MM and the IOM in that the prior information of the rater’s predisposition is used in the early stages of CAT. Moreover, it updates the student harshness and teacher proficiency iteratively when item information is sufficient for teacher proficiency (e.g., standard error smaller than 0.3) in the later stages.
The simulation results show that the hybrid method, MM, and IOM can reduce the RMSE and increase reliability for both teacher proficiency and student harshness, mostly for 5-item and 10-item tests in the class size = 20 condition. This is especially the case when a rated teacher has teaching proficiency at extremely low or high levels; here, the hybrid method and IOM improved the precision of teacher proficiency as well as the precision of student harshness. Among the three methods, the hybrid method yielded higher precision for student harshness and teaching proficiency than the MM and the IOM for extreme levels (+3 and −3) of teaching proficiency, where the maximum difference was approximately 0.07. For more moderate levels of teaching proficiency, the difference in precision between the hybrid and the IOM is not evident. In the 20-item condition, the IOM performed better in estimation precision than the hybrid method at the extreme levels of teaching proficiency, by approximately 0.04. Thus, the hybrid and IOM methods could be promising methods for online SET.
In the five-student class condition, the MM, IOM, and hybrid methods had similar precision of the θ estimates when the test length equaled 10 or 20 items, but when only 5 items were administered, the MM had a slightly higher RMSE. This finding suggests that the prior influenced the short tests considerably. In the one-student class condition, the IOM performed worse than the MM and the hybrid when 5 or 10 items were administered. With so few students, the prior helped with the estimation precision for the MM and the hybrid. The GI performed worst compared to the other methods and thus is not suggested for use in practice.
In the situation of the item bank with 20 items, the result showed that the IOM and the hybrid had lower RMSEs than that of the MM, whereas the MM had lower RMSEs when only one student was in a class. This suggests that using the IOM and hybrid methods is recommended when the class size is equal to or larger than 5. The MM is recommended when only one student evaluates the teacher in a class.
The contribution of the current study is the consideration of the uncertainty of rater harshness and ratee ability, showing the detriment of ignoring students’ rating harshness on parameter estimates. The methods also apply to rating settings beyond SET. For example, in physical therapy, clinicians evaluate the stroke patient’s balance function by using instruments such as the Berg balance scale, the balance evaluation systems test, or the dynamic gait index. The item pool (41 items) for assessing the patient’s balance function has been well established (Hsueh et al., 2010). If the methods in the current study are used, the clinician’s harshness and the patient’s balance can be iteratively updated, so we should expect an improvement in the precision of both measures (clinician harshness and patient ability) concurrently, especially when the patient’s balance ability is very low or high. Our simulation condition of class size = 1 gave insights for the clinical evaluation situation. The MM would be recommended in such a situation, especially when the size of the item bank is limited.
When student identities cannot be collected in the SET assessment (i.e., anonymous students), the individual student’s γ cannot be identified. In this case, the MM is recommended because it employs the prior distribution for students’ predispositions, which was shown to perform better than the GI in terms of the RMSE for θ.
Several future improvements to the hybrid method can be made. Content balance strategies such as the modified multinomial method (Chen et al., 1999), a constrained CAT (Kingsbury & Zara, 1989), the modification of a constrained CAT (Leung et al., 2000), the maximum priority index method (Cheng & Chang, 2009), and the shadow tests approach (van der Linden, 2005) are valuable applications for SET-CAT in the future because they can make the content areas meet the required number of administered items while maintaining the content validity of CAT. In addition, a variable-length CAT for SET using the GI, MM, IOM, and hybrid methods can be further explored in future studies. Stopping rules for terminating CAT under the MM, IOM, and hybrid methods will be needed for a variable-length CAT. In summary, the three proposed methods for SET-CAT successfully improved the measurement precision of teacher proficiency and provided ways of administering CAT with multifaceted models.
Supplemental Material
Supplemental Material for Online Parameter Estimation for Student Evaluation of Teaching by Chia-Wen Chen, and Chen-Wei Liu in Applied Psychological Measurement
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Appendix
References
