Abstract
Perceptual (mis)matches between teachers and learners are said to affect learning success or failure. Self-assessment, as a formative assessment tool, may, inter alia, be considered a means to minimize such mismatches. Therefore, the present study investigated the extent to which learners’ assessment of their own speaking performance, before and after their being provided with a list of agreed-upon scoring criteria followed by a practice session, matches that of their teachers. In so doing, 29 EFL learners and six EFL teachers served as participants; the learners were asked to assess their audio-recorded speaking performance before and after their being provided with the scoring criteria and practice session. The teachers were also asked to assess the learners’ performance according to the same criteria. Finally, the learners were required to evaluate the effectiveness of doing self-assessment in the form of reflection papers. The results revealed a significant difference between the learners’ assessment of their own speaking ability on the two occasions. The findings also suggested that providing the learners with the scoring criteria and the follow-up practice session minimized the existing mismatches between learner assessment and teacher assessment. Moreover, the inductive analysis of the reflection papers yielded a number of themes suggesting that, despite some limitations, the learners’ overall evaluation of the effectiveness of speaking self-assessment was positive.
Coherence is said to be a prerequisite for the success of any curriculum. According to Johnson (1989), a “coherent curriculum” is one in which all stakeholders, namely policy makers, syllabus designers, materials developers, teacher trainers, teachers, and learners, develop a similar understanding of the goals and objectives for which the curriculum is run in the first place. Any instance of mismatch is therefore believed to lead, in one way or another, to incoherence and, accordingly, to failure of the curriculum. More specifically, Kumaravadivelu (1994) calls attention to possible perceptual mismatches between teachers and learners, that is, between what he refers to as “teacher intention” and “learner interpretation”, and to their effect on the success or failure of the learning and teaching processes.
Self-assessment, as a formative assessment tool and an alternative to traditional modes of assessment, may, inter alia, be considered a means to minimize such mismatches. The self-assessment procedure has introduced changes to the students’ learning experience. Relevant studies recommend using both self- and teacher assessments in the classroom (Boud, 2013; Falchikov, 2013). It is pointed out that getting learners involved in assessing their own ability would help them improve learning. Following the literature on alternatives in assessment, it was found that previous studies compared self- and teacher assessments and examined the reliability of these multiple ratings. Regarding the importance of assessment criteria and evaluation of learners’ and teachers’ perceptions of common criteria for rating (Orsmond, Merry, & Reiling, 1997, 2000), it seems that the mismatches between learners’ and teachers’ assessments concerning the assessment criteria for rating L2 speaking have remained rather unaddressed.
Despite the prominent status of speaking self-assessment, few studies have explored the learners’ rating criteria for L2 speaking and their rating accuracy as measured against those of their teachers. Gaining a better understanding of the mismatches between multiple ratings (i.e., self- and teacher assessments) requires empirical research to examine the differences in the learners’ and teachers’ perceptions and use of speaking scoring criteria and to discover the impact of employing assessment criteria on the accuracy of learners’ ratings. The questions that remain unanswered in the literature on speaking self-assessment are what criteria learners use to assess their own speaking performance and what can be done to increase the accuracy of learners’ self-awarded scores. Since speaking self-assessment involves learners in assessing their own ability and speaking has received scant attention in this regard, as Luoma and Tarnanen (2003) pointed out, the main focus of the present study was whether there is a significant difference between learners’ assessments before and after their being provided with the rating criteria and the follow-up practice session. In addition, this study sought to investigate the extent to which the administration of the scoring criteria affected the congruence between learners’ and teachers’ assessments.
Literature review
Perceptual mismatch
Given the nature of L2 speaking, perceptual gaps between learners and teachers are unavoidable. The concept of “learner perception”, according to Barkhuizen (1998), involves learners in a decision-making process. Speaking assessment, with specific reference to teaching and learning in the language classroom, emanates not just from learners but from teachers as well. Accordingly, the success of any assessment procedure also depends on the decisions teachers need to make concerning learners’ performance and the criteria used for scoring. Differences in assessing speaking ability, however, do exist regarding how learners perceive the speaking construct and what teachers actually do in speaking assessments. For example, Kuo’s (2011) study on learner–learner interaction within a real classroom context revealed learners’ different perceptions of the factors contributing to classroom interaction, including communicative intelligibility, grammatical accuracy, and corrective feedback.
Kumaravadivelu (1991) suggested that perceptual gaps exist as a result of at least 10 potential sources of mismatches between “teacher intention” and “learner interpretation”, namely, cognitive (knowledge of the world), communicative (message-oriented communication skills), linguistic (syntactic, semantic, and pragmatic knowledge of the still developing target language), pedagogic (teachers’ and learners’ perceptions of stated or unstated objectives of teaching/learning), strategic (understanding of learning strategies and styles), cultural (prior knowledge of the target cultural norms), evaluative (articulated and unarticulated modes of self-evaluation measures used by learners to monitor their progress), procedural (paths chosen by learners to solve problems), instructional (directions given by teachers to help learners to solve problems), and attitudinal (participants’ disposition towards the nature of the classroom culture, and teacher and learner role relationships). The gaps in perception need to be identified and properly handled to promote successful learning outcomes in the classroom. Previous research on speaking self-assessment has suggested that learners as reflective practitioners are encouraged to develop more responsibility when assessing their own performance. Therefore, there is a need to shed light on the possible evaluative gaps in learners’ and teachers’ perceptions of speaking ability.
Speaking assessment
The assessment of speaking, an extremely difficult skill to test, involves a number of procedures to capture all the defining characteristics for objective testing. An understanding of the nature of speaking not only helps define the construct in question, but ultimately makes it possible to identify factors involved in speaking assessment (Kim, 2010). According to Butler, Eignor, Jones, McNamara, and Suomi (2000, p. 10), for example, “such features are likely to include accomplishment of task, sufficiency of response, comprehensibility, adequacy of grammatical resources, range and precision of vocabulary, fluency, and cohesion.” Performance on each aspect may vary from individual to individual and from task to task.
L2 speaking assessment nowadays calls for more formative assessment, which is key to assessment for learning rather than assessment of learning (Nicol & Macfarlane-Dick, 2006; Stobart, 2008). Regarding the centrality of assessment practices, self-appraisal activities have involved a collaborative effort within the democratic atmosphere of the classroom, in which a shared responsibility for teaching, learning, and evaluating is overarching (Butler & Lee, 2006, 2010; Grez & Berings, 2010; Leger, 2009; Orsmond, Merry, & Callaghan, 2013). Speaking self-assessment, as a formative assessment tool, promotes learning, establishes a goal-oriented activity, alleviates the assessment burden on teachers, and finally continues as a long-lasting experience (Kirby & Downs, 2007; Mok, Lung, Cheng, Cheung, & Ng, 2006; Ross, 2006). Speaking self-assessment also relies on the social dimension of learning, as Orsmond and Merry (2013) noted, in which self-directed progressive learning is a key aspect in the appropriation of the learning objectives and the involvement of teachers and learners in collaborative learning. To shed light on the notion of learning in self-assessment, Orsmond (2011) proposed a learner-centered GOALS process. In the GOALS process, learners’ concerns are to grasp the learning objectives, orient themselves towards self-regulation, take actions to achieve the desired goals, evaluate their learning, and develop the necessary strategies to make satisfactory progress. Likewise, an ipsative approach to self-assessment, as Hughes (2011) suggested, informs the learner of the perceived progress in relation to his or her previous performance. Under the influence of the principles of self-regulation and learner autonomy, Nicol and Macfarlane-Dick (2006) argued that teacher and learner dialogue around the assessment process and the criteria they apply to evaluate performance can be very helpful.
As a result, the defining characteristic of effective self-assessment is the involvement of learners in understanding the criteria for rating speaking and empowering them to close the gaps in their performance. Further discussion regarding assessment criteria may help avoid possible mismatches between teachers’ and learners’ perceptions of the criteria.
In line with previous studies on speaking self-assessment (e.g., Orsmond, Merry, & Reiling, 1997, 2000), it follows that learners’ criteria when assessing their own speaking ability, as examined against teachers’ rating criteria for L2 speaking, have been rather underexplored. In fact, gaining a better understanding of speaking self-assessment requires empirical research to discover the impact of speaking scoring criteria and the follow-up practice session on the accuracy of learners’ self-assessment when compared to teacher assessment. The aim of the present study, therefore, was to investigate the (mis)matches between learners’ assessment of their own speaking ability and teachers’ assessment and to examine the effectiveness of speaking self-assessment before and after the learners’ being provided with speaking scoring criteria and the follow-up practice session. To this end, the following questions constituted the focus of the study:
1. What criteria do learners use to rate their own speaking performance before their being provided with the speaking scoring criteria and the follow-up practice session?
2. Is there any significant difference between the learners’ assessment of their own speaking ability before and after being provided with the speaking scoring criteria and the follow-up practice session?
3. To what extent are learners’ assessments in agreement with teachers’ assessments prior to and following the learners’ being provided with the speaking scoring criteria and the follow-up practice session?
4. How do the learners evaluate the effectiveness of their speaking self-assessment?
Method
Participants
Twenty-nine EFL learners and six EFL teachers took part in this study. Ranging in age from 18 to 23 years, the learners were 22 females and seven males. They were first-year undergraduate students majoring in English language and literature at Kharazmi University, Tehran, Iran and were enrolled in a conversation course regularly offered to students of English language and literature at this level. It should be mentioned that none of the researchers were responsible for delivering the course. Furthermore, in the Iranian context, nationwide examinations are run yearly to allow entry to university, and, therefore, it can be assumed that the participants of the study enjoyed, more or less, the same level of language proficiency, corresponding roughly to the intermediate level. It should also be noted that none of the learners had the experience of taking part in self-assessment practice sessions prior to the study.
The teachers, on the other hand, were five females and one male ranging in age from 24 to 33 years with three to nine years of experience in teaching English as a foreign language. They were all experienced IELTS instructors and had no previous familiarity with the learner group. Brief profiles of the six teachers are given in Table 1; they are given pseudonyms to protect their anonymity.
Table 1. Teachers’ profile summary.
Instruments and data collection procedures
First of all, the participant-teachers were asked to team up and decide on three topics to elicit the learners’ description, narration, and argumentation speaking abilities. The learners then gathered in a language laboratory equipped with computerized recording systems and were given the three topics, on each of which they were required to talk for three minutes. Following a format similar to the second task of the speaking section of the IELTS examination, each topic was followed by a couple of prompts specifying what the learners were supposed to cover in their talk. The learners were also given one minute to prepare for each topic, during which they could take notes (see Appendix 1).
Having completed all the three tasks, each learner was provided with an audio-recorded copy of his or her own performance. The learners were, then, given a speaking assessment sheet requiring them to write down their own personal criteria for rating speaking ability. They were also asked to listen to their own voice two times and give themselves a score out of 50 with regard to their performance on each task accordingly (see Appendix 2).
Subsequently, the researchers asked the participant-teachers, all of whom had experience of preparing learners for the speaking section of the IELTS examination, to develop an agreed-upon list of criteria against which the performance of the learners could be evaluated. It should be emphasized that, in so doing, they had no access to copies of the learners’ speaking performance. Using their experience as IELTS instructors and consulting the relevant sources on assessing speaking (e.g., Luoma, 2004), the participant-teachers first attempted to develop their own lists of criteria individually. They were then invited to meet, share their proposals, and come up with a final list of criteria. What follows, therefore, are the 10 criteria agreed upon by all six teachers.
Fluency (without pauses, hesitation, and false starts)
Grammar (accuracy and variety of structures)
Vocabulary (appropriateness and variety of expressions)
Pronunciation (stress, rhythm, and intonation)
Communicative effectiveness (clarity of ideas and comprehensible (i.e., understandable) speech)
Topic management (topic relevance, topic coverage, and adequacy of details and examples)
Confidence (anxiety-free speech)
Organization (initiation, development, termination and interconnectedness of ideas)
Strategy use (avoiding unfamiliar language and compensating by using familiar language)
Time management (timing your talk)
Building upon the agreed-upon criteria, the researchers developed a checklist comprising 10 items on a five-point Likert scale, yielding a total of 50 (see Appendix 3). An optimal interval between the two rating occasions seemed to be one long enough to keep memory effects from clouding the participants’ judgments. Moreover, since the participants were not supposed to have any other systematic exposure to self-assessment practice sessions, there was only a slim chance that other learning experiences would contaminate the results. Therefore, after an interval of 40 days, the learners were invited back to the language laboratory. First, having provided the learners with the checklist, the researchers introduced and explained the criteria against which the learners were supposed to evaluate their own speaking ability. Second, in order to provide appropriate modeling and enhance the learners’ understanding, three samples of speaking performance were each played twice and evaluated according to the checklist; throughout the first two evaluations, which lasted for about 40 minutes, the researchers evaluated the samples using the think-aloud technique. Meanwhile, the participants remained mostly silent, although they were given the chance to provide comments and pose questions whenever they desired. For the third evaluation, which lasted for about 30 minutes, however, the participants were asked to adopt a more active role and cooperate with the researchers. Finally, the learners were asked to listen to the copies of their own performance two times, evaluate their own speaking ability according to the 10 criteria included in the checklist, and give themselves a score out of 50.
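As a minimal sketch of how a checklist total of this kind could be computed, the following Python snippet sums ten item ratings into a score out of 50. It assumes that each item is rated with an integer from 1 to 5 (the actual scale anchors are in Appendix 3 and are not reproduced here); the function name and the example ratings are hypothetical and are not study data.

```python
# Criterion names follow the teachers' agreed-upon list.
# The 1-5 integer range per item is an assumption made for illustration.
CRITERIA = [
    "fluency", "grammar", "vocabulary", "pronunciation",
    "communicative effectiveness", "topic management", "confidence",
    "organization", "strategy use", "time management",
]


def total_score(ratings: dict[str, int]) -> int:
    """Sum the ten item ratings into a score out of 50."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for criterion in CRITERIA:
        if not 1 <= ratings[criterion] <= 5:
            raise ValueError(f"{criterion}: rating must be between 1 and 5")
    return sum(ratings[c] for c in CRITERIA)


# Hypothetical self-ratings for one learner (illustration only).
example = {c: 3 for c in CRITERIA}
example["fluency"] = 4
print(total_score(example))  # 31
```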
Meanwhile, the six participant-teachers were asked to meet up in the same language laboratory; they were provided with copies of the learners’ speaking performance and asked to rate their speaking ability according to the same checklist, without having access to the learners’ scores.
Finally, about one month later, the learners were asked to write reflection papers in English and elaborate on their attitudes towards speaking self-assessment in general and what they experienced in the course of the present study in particular. As mentioned before, all the learners were enrolled in a conversation course of which self-assessment was not a regular component. They were only briefly introduced to the idea and practice of self-assessment for the purpose of the present study. As such, the learners were assured that they need not feel any pressure to respond positively, since the attitudes they shared would not affect their final course grades at all.
Data analysis
To answer the first two questions, the data were gathered from the two administrations of the speaking assessment sheet, once before and once after the learners’ being provided with the speaking scoring checklist and the follow-up practice session. As to the first question, the criteria the learners mentioned in their comments for rating speaking on the first occasion were analyzed inductively and grouped into categories. Regarding the second question, the two sets of quantitative data collected on the two occasions were entered into SPSS and analyzed with a paired-samples t-test to discover whether significant changes occurred in the learners’ assessment of their own speaking performance owing to the provision of the speaking scoring checklist.
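The paired-samples t-test was run in SPSS; purely as an illustrative sketch of what the test computes, the snippet below derives the t statistic from the per-learner differences between two rating occasions using only the Python standard library. The function name and the five pairs of scores are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return the paired-samples t statistic and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical scores out of 50 for five learners (illustration only).
first_rating = [38, 35, 40, 33, 36]
second_rating = [35, 34, 36, 32, 33]
t_value, df = paired_t(first_rating, second_rating)
print(f"t({df}) = {t_value:.2f}")  # t(4) = 4.00
```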
As to the third question, the Pearson product–moment correlation was run to investigate any significant relationship between the sets of scores assigned to the learners’ speaking ability by the learners themselves and the teachers on the two occasions. The two r values were then converted into Fisher’s z-scores to investigate whether the difference between the correlations for the ratings prior to and following the provision of the criteria was statistically significant.
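For reference, the standard textbook forms of the quantities named above can be written as follows. The notation is introduced here for illustration only: $x_i$ is a learner's self-awarded score, $y_i$ is the corresponding teacher score, and $n_1$, $n_2$ are the sample sizes behind the two coefficients. The comparison statistic shown is the common independent-samples form, which the text does not explicitly confirm was the one used.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
z_r = \tfrac{1}{2}\ln\frac{1 + r}{1 - r},
\qquad
z_{\mathrm{obs}} = \frac{z_{r_1} - z_{r_2}}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}
```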
Finally, to probe the fourth question, the researchers adopted the Constant Comparative Method, an inductive and data-driven method used in grounded theory (Glaser & Strauss, 1967), to analyze the qualitative data emerging from the learners’ reflection papers. Widely employed in qualitative studies, the Constant Comparative Method was considered appropriate for extracting recurring themes and categories from the data (Creswell, 2012). First of all, the indicators were identified and coded throughout the data set. Next, the coded data were grouped into units called themes. The themes, in turn, were grouped together to form categories. Checking the data against the extracted codes, themes, and categories continued until full “saturation” was achieved (see Creswell, 2012). It should be noted that the researchers went through the data analysis procedure individually. They then met to discuss potential instances of disagreement and reach a final consensus. Accordingly, the categories and themes presented in the Results section are the ones on which the researchers agreed.
Results
Learners’ self-mentioned criteria for rating speaking
The first research question intended to probe the learners’ criteria to rate their own speaking performance before their being provided with the teachers’ agreed-upon criteria and the follow-up practice session. Through inductive analysis, the learners’ comments were compiled and categorized to arrive at the learners’ assessment criteria for L2 speaking. The emerging criteria were subsequently put under seven categories.
What follows presents the extracted criteria along with follow-up, relevant excerpts taken from the learners’ comments on their areas of strength and weakness.
1. Topic management:
Recurring in all comments (f = 41), this criterion referred to the adequacy of the answer (i.e., topic coverage) while being focused on the topic. L16, for instance, stated that
I’m not completely focused on the topic. I couldn’t manage to answer all the questions.
2. Confidence:
Similarly present in almost all comments (f = 38), the second criterion covered anxiety-free speech and confidence in speech delivery. In this regard, L8, for example, noted that
I was not prepared for a sudden exam. I was so stressful and I lost my concentration
3. Fluency:
The next major criterion (f = 37) pointed to smooth speech without hesitation and pauses. L6, for instance, wrote that
There was a gap between my words and I used a sound like Hmm a lot. And the speed of my speaking was not good.
4. Time management:
Present in most comments (f = 33), this category encompassed the theme of “keeping the time”. Echoing the same criterion, L26, for example, commented that
I was supposed to talk 3 minutes. But I couldn’t speak for 3 minutes for each topic.
5. Grammar:
The production of grammatically correct utterances, with the corresponding features of appropriateness and accuracy, was another criterion frequently mentioned (f = 30) by the EFL learners when assessing their speaking ability in the current research. In this regard, one of the learners (L1), for example, noted that
I made some grammar mistakes and I couldn’t control the complex grammatical structures in my speech.
6. Vocabulary:
Several learners (f = 28) also proposed range and variety of lexical choices. The learners were primarily concerned with the effective use of a wide range of vocabulary in their speaking performance. L12, for example, stated that
The words that I used were very simple. I also repeated some words. Although I know a lot of expressions and collocations, I did not use them.
7. Pronunciation:
The final prevailing criterion (f = 25) referred to prosodic features of speaking, including accent and voice quality. Referring to this criterion for rating speaking, one of the learners (L8) commented that
The most serious problem was my accent. I spoke in a Farsi accent and this was completely obvious in my speech.
Learners’ assessment of their own speaking ability
The second research question explored any significant difference between the learners’ assessment of their own speaking ability before and after their being provided with the speaking scoring criteria and the follow-up practice session. To address this question, a paired-samples t-test was employed to investigate the effect of the speaking scoring criteria on the learners’ ratings. As displayed in Table 2, the learners’ mean scores before and after their being provided with the criteria are 36.08 and 33.82, respectively. A comparison of the two mean scores reveals that the learners gave themselves higher scores on the first occasion than on the second.
Table 2. Paired-samples statistics for the learners’ ratings.
Table 3 presents the results of the paired-samples t-test applied to the mean scores of the learners on the two rating occasions. The results show a statistically significant difference in the learners’ ratings with the t-observed value of 3.08 (p < .01). The mean difference in the speaking ratings was 2.25 with a 95% confidence interval ranging from .75 to 3.76. Given the eta-squared value of .25, it can be concluded that there was a large effect size, with a substantial difference in the learners’ ratings when they were given the criteria.
Table 3. Paired-samples t-test between learners’ assessment of their speaking ability before and after being provided with the speaking criteria and the follow-up practice session.
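The reported effect size can be checked from the reported t value alone. The sketch below assumes the formula commonly given for a paired-samples t-test, eta squared = t² / (t² + N − 1); the text does not state which formula was used, and the function name is hypothetical. With t = 3.08 and N = 29, the result rounds to the reported .25, which exceeds the .14 value commonly cited as the threshold for a large effect.

```python
def eta_squared(t: float, n: int) -> float:
    """Eta squared for a paired-samples t-test: t^2 / (t^2 + (n - 1))."""
    return t**2 / (t**2 + (n - 1))


# Values reported in the text: t = 3.08 with N = 29 learners.
effect = eta_squared(3.08, 29)
print(round(effect, 2))  # 0.25
```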
Learners’ and teachers’ assessments of learners’ speaking ability
The third research question investigated the extent of agreement between the learners’ and teachers’ assessment prior to and following the learners’ being provided with the criteria and the follow-up practice session. As shown in Table 4, to analyze differences in ratings between the teachers and learners, the mean scores, maximum and minimum scores, and standard deviations were calculated for the two groups. The mean scores for the teachers and learners do not show a large difference following the provision of the scoring criteria. The difference between the two mean scores in the second phase is just .55 (L2−T), indicating an increase in agreement between the learners’ and teachers’ ratings.
Table 4. Descriptive statistics of total assessments.
The next step in the analysis was to calculate the correlations between teacher assessment and self-assessment on the two occasions. The scores reported in the current study for the teachers were the average scores given by all the six teachers with the inter-rater reliability of .81. The correlation between the scores awarded by the learners and teachers was estimated using the Pearson product–moment correlation coefficient. Preliminary analyses were performed to ensure no violation of assumptions, including normality, linearity, and homoscedasticity. Table 5 gives the correlation indices. The resulting coefficient for the learners, before their being provided with the criteria, and the teachers was found to be .73 which is significant at p < .01 level.
Table 5. Pearson product–moment correlations between learners’ and teachers’ ratings.
Correlation is significant at the .01 level (2-tailed).
Furthermore, as displayed in Table 5, the correlation coefficient between the teachers’ ratings and the learners’ self-awarded scores following the scoring criteria points to a strong, positive correlation (r = .90), which is also significant at the p < .01 level.
The correlation coefficients between the teachers’ and the learners’ evaluation of the learners’ speaking ability on the two occasions are statistically significant. Moreover, according to the effect size criteria developed by Cohen (1988), under which a Pearson correlation coefficient larger than .50 is considered large, the correlation between the teachers’ and the learners’ evaluation was high even when the learners had no access to the criteria. However, when squared, the first correlation coefficient (r1 = .73) yields .53, indicating 53% shared variance. The coefficient of determination for the second coefficient (r2 = .90), on the other hand, is .81, indicating 81% shared variance, which is noticeably larger than 53%. Furthermore, Fisher’s z-transformation was employed to convert the two r values into z-scores and, accordingly, to test the difference between the two correlation coefficients. The result of the comparison (zr1 = .93, zr2 = 1.53, n = 29) was found to be −2.17, beyond the ±1.96 cutoff at the 95% confidence level, indicating that the coefficients were significantly different.
This implies a significant increase in agreement between the learners’ and teachers’ scores following the provision of the scoring criteria and the follow-up practice session.
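The arithmetic behind this comparison can be sketched as follows, assuming the common independent-samples form of the test, in which the difference between the two Fisher z-values is divided by the square root of 1/(n1 − 3) + 1/(n2 − 3). The function names are hypothetical. Because the rounded r = .90 transforms to roughly 1.47 rather than the reported 1.53 (presumably a consequence of rounding the coefficient), the sketch takes the reported z-values as inputs instead of recomputing them; it yields about −2.16, close to the reported −2.17.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation (inverse hyperbolic tangent of r)."""
    return math.atanh(r)


def compare_z(z1: float, z2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two Fisher-transformed correlations."""
    return (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


# Shared variance (coefficient of determination) for the two reported r values.
print(round(0.73**2, 2), round(0.90**2, 2))  # 0.53 0.81

# The first reported z-value can be reproduced from r1 = .73.
print(round(fisher_z(0.73), 2))  # 0.93

# Reported z-values used as inputs; n = 29 on both occasions.
z_obs = compare_z(0.93, 1.53, 29, 29)
print(abs(z_obs) > 1.96)  # True: beyond the two-tailed .05 cutoff
```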
The effectiveness of the speaking self-assessment
The fourth question aimed at probing the learners’ attitudes regarding the effectiveness of their speaking self-assessment. The answer to this question emerged from the analysis of the reflection papers written by the learners. The major themes deriving from the data were as follows: (1) increased self-awareness and detection of weak points followed by improved learning; (2) the positive influence of the use of speaking scoring criteria on the accuracy of self-assessment; (3) the long-lasting effect of self-assessment, in comparison to teacher assessment, on learning; (4) the unreliable nature of self-assessment; (5) the time-consuming nature of self-assessment; (6) encouraging the ongoing practice of self-assessment; and (7) providing perfect models.
In the following paragraphs, these themes, presented under three main categories (i.e., the benefits of speaking self-assessment, the limitations of speaking self-assessment, and suggestions for the betterment of self-assessment) are elaborated on. It should be pointed out in advance that no changes were made to the excerpts taken from the learners’ reflection papers with regard to grammar, punctuation, and so on, except where a change seemed absolutely necessary to rule out any possibility of misunderstanding. It should be also noted that in order to guarantee the learners’ anonymity, they were given pseudonyms.
Benefits of speaking self-assessment
Increased self-awareness and detection of weak points followed by improved learning
In one way or another, all of the learners (f = 29) commented that the speaking self-assessment was helpful since it drew their attention to their weaknesses in speaking English and offered them the chance to work on and minimize those weaknesses. In this regard, one of the learners (L18), for example, stated that
I think it helps us to evaluate our own speaking ability and be able to fix our problems. I found out what errors do I make the most while speaking and what parts I should focus on to speak better.
Similarly, L15 and L24, respectively, commented that
Listening to your speaking lets you know your errors … It is helpful because you can understand your mistakes and you can try to avoid them by practicing.
I think it is very useful and we can improve our speaking skill by understanding our weaknesses. I could understand my speaking weaknesses by listening to my recorded voice and it was very helpful for me to think about them and write them on a paper.
Moreover, referring to their own current, actual experience with self-assessment, L20, highlighting the importance of the checklist they were provided with, expressed his ideas in the following words.
As soon as we heard the recordings and listened to our own voice, we understood we have a lot of problems in our speaking. So we started doing our best in order to eliminate and solve these speaking problems of ours.
Some of them went even further and claimed that weakness detection and, accordingly, improvement in speaking proficiency may not be possible unless one listens to his or her own voice. L9, for instance, pointing to her own detection of her areas of difficulty through self-assessment, wrote that
Until we can’t hear our own voice it’s very obvious that we can’t get our mistakes and mistakes can make progress. Now I got that I don’t have a good pronunciation and I will surely work on it.
Echoing the same ideas, L8 and L7, respectively, stated that
It helped me to know my weaknesses. For example, when I listened to my audio-recorded speech, I found out that I have to improve my accent.
It let me hear my voice to decide how to speak better. Sometimes I need to raise the intonation or fall it. Or I just figured out that there is vibration in my voice which actually needs to be overcome.
The positive influence of the use of speaking scoring criteria on the accuracy of self-assessment
As to the benefits of speaking self-assessment, the second major theme (f = 20) that emerged from the data was concerned with the learners’ positive evaluation of their being provided with the speaking scoring criteria in the second phase. L26, for example, comparing the first and the second phase, commented that
[In the case of the first phase] I didn’t know if my scoring was fair enough because my speaking criteria may be different from my teacher’s. Besides what matters the most for me, may matters the least for [the teacher]. [However, the second phase] gave me a clearer outlook of the required elements of how to give a good speech. It was very helpful to learn about new criteria of speaking fluently because I can work on new, crucial factors that I now know of.
Explaining how the speaking scoring criteria made her aware of all the factors leading to standard, effective speech, the same learner went on and added
I realized that giving a good speech is not only speaking with [standard] accent! You need a lot of other factors. You have to be able to organize your sentences; you need a good vocabulary and good information about the topic of discussion.
Similarly, L11, emphasizing that the speaking scoring criteria encouraged them to have a more realistic assessment of their own speaking proficiency, stated that
Analyses by listeners were more detailed, so we would have a more careful assessment and better scoring system. Here the required factors were mentioned so the assessments were nearer to real results.
Along the same line, with respect to the practicality of speaking self-assessment when they had access to the speaking scoring criteria, L16 and L12, respectively, wrote that
In comparison with phase one, the scoring system was more efficient I guess and the criteria which were more clearly mentioned. They were more detailed so the scoring was easier.
I think, realistically, I can pay more attention to and focus on the criteria which I wasn’t concerned before. I think it is more practical than phase one because there may be some other criteria that should be regarded, but we didn’t know anything about them in phase one.
The long-lasting effect of self-assessment, in comparison to teacher assessment, on learning
Present in most of the reflection papers (f = 16), the third major theme under this category referred to the positive long-term effect of doing self-assessment on learning. Given that, in the context of the present study, final course grades are reported in the absence of any formative feedback, the learners believed that they would not otherwise be given the chance to become aware of and work on their typical areas of difficulty. In their view, teacher assessment, in contrast with self-assessment, may not have played a positive, active role in the process of their learning. L11, for example, stated that
When you review your task, you can find your error. While finding your error, you can resist making it again, but if you take your exam result, you won’t be able to find out what is your fault … When the exam is gone, we care about the result, not our mistakes.
Some of the learners went even further and claimed that, even if they were provided with feedback by their teachers, they would not be very likely to take maximum advantage of such feedback in the long run. L12, for instance, stated that
Definitely, it’ll inform me of something that I could have done but by the self-assessment, this will stick in my mind more efficient than when it’s remarked by someone else.
Furthermore, L4 commented that
You know. I don’t remember how many times I received feedback to my speaking from my professors. Even if I did, I don’t remember. But this experience was wonderful. I never cared about the organization of my speaking. You know. But now I am going to pay attention to it whenever I am going to talk.
Limitations of speaking self-assessment
Although all of the learners had something positive to say about speaking self-assessment in general and their experience with doing self-assessment of their own speaking proficiency in particular, some of them pointed out some limitations of self-assessment that, to them, may have compromised its benefits. Two major themes under this category are presented below.
The unreliable nature of self-assessment
Some learners (f = 11) believed that as a result of learners’ lack of expertise in language assessment, self-assessment is unreliable in nature. They felt the desperate need for teachers’ assessment of their speaking performance. L16, for example, commented that
[The disadvantage of self-assessment is the lack of] being assessed and commented by a specialist who can see all aspects better and can criticize based on standards.
Likewise, casting doubt on their capability to self-assess, even after being aware of the relevant criteria, L25 and L15, respectively, commented that
Because we are not aware of the exact criteria of scoring, it would be more helpful, if someone else gave us a score.
I think if there was a teacher assessment too, it could be better. I didn’t know if my scoring was fair enough because what matters the most for me may matter the least for my teacher.
In the same vein, some learners pointed out that there is a general tendency among language learners to overestimate their own abilities when they do self-assessment, a problem adding to the unreliability of self-assessment. Regarding this, L13 and L20, respectively, wrote that
Sometimes there are students who have a high self-confidence and they give themselves a high score without paying attention their problems.
Perhaps one of the most common disadvantages in such procedures is that some students score themselves more than what they deserve. [So,] instructors should not rely only on students’ self-assessment for final scoring.
The time-consuming nature of self-assessment
Under this category, the second theme emerging from the data (f = 5) referred to the time-consuming nature of self-assessment. For example, L11 and L18, respectively, commented that
In my idea, it was time consuming for students, but generally it didn’t have much disadvantages.
There were no disadvantages although it was a little time consuming.
Suggestions for the betterment of self-assessment
The last category was concerned with the suggestions put forward by the learners for the betterment of self-assessment. In this regard, two major themes extracted from the data were as follows.
Encouraging ongoing conduction of self-assessment
The first theme (f = 14) showed that the learners believed that, for self-assessment to be beneficial, it has to be an ongoing process and done on a regular basis. L13 and L9, respectively, were learners whose reflection papers provided evidence for this theme.
It should be repeated regularly with the better standard and it can help the students to progress more and more.
I think just one time is very low for this practice. It should be continued so we can see if we had improved or not.
Providing perfect models
The other theme under this category (f = 6) revealed the learners’ desire to be provided with perfect models of language performance under the assumption that exposure to such models helps them become familiar with all aspects of effective speech and paves the way for them to become like those models in the long run. L20, for instance, wrote that
… to present a practical standard example. Maybe a recording of an expert that knows the criteria and performs according to them can be helpful.
Discussion
The results indicated that both the learners’ and the teachers’ stated criteria for rating speaking comprised a number of linguistic and non-linguistic criteria. The analysis of the comments the learners wrote when assessing their own ability before being provided with the criteria showed that the learners were more concerned with topic management, confidence, fluency, time management, grammar, vocabulary, and pronunciation. However, they failed to point to macro-level components, like organization, strategy use, and communicative effectiveness, included in the list of the teachers’ agreed-upon criteria with which they were provided on the second occasion. Based on the findings of the present study, it appeared that the teachers’ criteria were compatible with those reported in previous studies (e.g., Iwashita, Brown, McNamara, & O’Hagan, 2008; Plough, Briggs, & Van Bonn, 2010; Zhang & Elder, 2011). The learners’ self-mentioned criteria, on the other hand, suggested that their skills-and-components-based perspective made them lose sight of higher-order speaking assessment criteria in their self-awarded ratings. This, therefore, in line with previous research (e.g., Orsmond, Merry, & Reiling, 1997, 2000), reveals that the learners were not able to make sound judgments about their own ability prior to the application of the assessment criteria.
Furthermore, the analysis of the learners’ assessments showed significant differences between their self-ratings prior to and following their being provided with the assessment criteria and the follow-up practice session. This finding lends support to the claim that the application of different rating criteria may lead to lack of precision in speaking assessment (Chalhoub-Deville & Wigglesworth, 2005). In addition, it can be argued that providing the learners with the teachers’ agreed-upon criteria, along with the follow-up practice session, seems to have led to a better understanding of the importance of the factors required for more consistent rating, a more comprehensive view of speaking ability, and, in turn, a narrowed gap between “teacher intention” and “learner interpretation” (Kumaravadivelu, 1991).
Meanwhile, it was found that the learners rated their ability relatively higher on the first occasion than they did on the second occasion. This overestimation of their own ability could be ascribed to the application of the learners’ personal criteria on the first occasion. In line with the existing literature (e.g., Jafarpur, 1991; Saito, 2000), it is, then, proposed that the correlations of learners’ and teachers’ ratings in speaking assessment are relatively low when learners apply their own personal criteria for self-rating.
With regard to the third question, it was found that the correlation between learners’ and teachers’ assessments significantly increased after the learners were provided with the scoring criteria and the follow-up practice session. It can, then, be suggested, in line with previous literature (e.g., Chen, 2008; Rust, Price, & O’Donovan, 2003), that the learners’ enhanced understanding of speaking scoring criteria may bring about improved learning and higher self-assessment accuracy.
Recent explorations in learners’ perceptions suggest that there exists a gap between what teachers actually do and how what they do is perceived by their learners (Higgins, Hartley, & Skelton, 2002; Orsmond & Merry, 2011; Wong, 2009). The present study revealed that the provision of speaking scoring criteria contributes to the convergence between learners’ and teachers’ assessments. The findings of this study indicated that providing learners with the assessment criteria may be a step forward, in response to the call made by Kumaravadivelu (1994), in minimizing the potential evaluative mismatches between teachers and learners. The narrower the gaps between teacher intention and learner interpretation, the greater are the chances of achieving speaking rating accuracy.
The answer to the fourth question came from the analysis of the qualitative data deriving from the learners’ reflection papers. Seven major themes were identified under three main categories, namely benefits of self-assessment, limitations of self-assessment, and suggestions for the betterment of self-assessment. With regard to the benefits of self-assessment, all of the learners evaluated their experience with assessment of their own speaking proficiency positively and commented that it gave them the opportunity to gain a better understanding of their areas of difficulty and, accordingly, embark on making improvements. This theme, consistent with the existing related literature (e.g., Leger, 2009; Orsmond & Merry, 2011; Patri, 2002), provides further support for the claim that self-assessment, as an alternative to traditional modes of assessment, is an efficacious method of raising learners’ self-awareness and facilitating their learning. This theme also resonates with the basic premises of the GOALS process (Orsmond, 2011), highlighting the point that the developmental aspects of learners’ self-directed learning rely on social rather than cognitive dimensions. According to this model, learners’ self-assessment, as a social practice, leads to the betterment of the learning experience.
Continuing with the benefit of self-assessment, the second theme referred to the role the speaking scoring criteria played in raising the learners’ understanding of the building blocks of standard, effective speech and leading them towards having more realistic assessments of their abilities. This theme, in line with previous research (e.g., Cheng & Warren, 2005; Shimura, 2006), indicates that learners’ having access to the relevant criteria results in increased accuracy of self-assessment. Moreover, this theme yields further evidence that providing learners with the standard criteria may contribute to minimizing the possible evaluative mismatches (to use Kumaravadivelu’s words, 1994) that exist between learners and teachers.
The final theme emphasizing the benefits of self-assessment referred to the learners’ positive evaluation of the long-lasting effect of doing self-assessment on their learning. Echoing the same ideas, several studies in the literature (e.g., Boud & Falchikov, 2006; Cheng & Warren, 2005; Orsmond, Merry, & Reiling, 2000) have come to the conclusion that self-assessment, as a tool for formative assessment, is far more effective than teacher assessment when it comes to long-term learning. Boud (2000), for example, suggests that sustainable self-assessment, as a self-learning activity, results in improved short- and long-term learning.
Despite their overall positive evaluation of self-assessment, a few learners highlighted some limitations that might compromise its benefits. The first theme under this category shed light upon the learners’ skepticism as to the reliability of self-assessment. Feeling an overwhelming need for teachers to assess and pass judgment on their speaking proficiency, the learners suggested that learners’ assessment of their own performance is, more often than not, unreliable and, in most cases, inflated. This theme, in harmony with previous findings in the literature (e.g., Leger, 2009; Matsuno, 2009), suggests that self-assessment, as a formative assessment tool, is not likely to be an appropriate candidate to be employed for summative purposes.
Moreover, the second theme in this regard reflected the learners’ relative discontent with the time-consuming nature of doing self-assessment. In line with previous research (e.g., Hanrahan & Isaacs, 2001), this theme lends support to the hypothesis that learners, despite their overall positive evaluation of the benefits of self-assessment, may be discouraged from doing self-assessment more frequently owing to the time it takes. However, the GOALS process (Orsmond, 2011) emphasizes that the time-consuming nature of self-assessment, as a situated learning practice, is unavoidable considering the fact that learning is integrated into social practice in which externally taught learning is a prerequisite for internal learning.
Under the third category, which was concerned with the suggestions made by the learners to maximize the benefits of doing self-assessment, it was proposed that, for learners to enjoy the full benefits of self-assessment, they need to do it on a very regular basis over a long period of time. This suggestion seems to be supported by previous findings in the literature (e.g., Leger, 2009). According to Hughes (2011), for instance, the central role of self-assessment in long-term progress has been linked to ipsative assessment, which is likely to account for how self-regulated learners have advanced since their previous assessments. Similarly, Boud (2000) discussed the importance of self-assessment in the context of a “learning society”, suggesting that self-directed learning needs to be active to encourage progressive learning. In the context of learning-oriented assessment, Boud and Falchikov (2006) also argued for the need to align assessment with the goal of fostering long-term learning.
The other suggestion put forward by the learners was that learners should be exposed to perfect models of speaking performance under the assumption that exposure to such models deepens their understanding of the factors contributing to standard, effective speech. This suggestion appears to be in harmony with previous findings in the literature, indicating that introducing learners to exemplars contributes to their improved learning (see Handley & Williams, 2011; Hendry, Bromberger, & Armstrong, 2011; Orsmond, Merry, & Reiling, 2002).
Conclusion
Developing a shared understanding of the goals and objectives of a given curriculum is believed to contribute greatly to the coherence and, as such, success of that curriculum (Johnson, 1989; Kumaravadivelu, 1994). The findings of the present study revealed that giving learners the chance to do self-assessment according to an agreed-upon set of scoring criteria is an effective way to increase learners’ and teachers’ agreement on how to rate learners’ speaking proficiency, to minimize perceptual mismatches between them, or more specifically “evaluative mismatches”, to use Kumaravadivelu’s (1994) exact words, and, hence, to contribute to the coherence and success of the curriculum.
The findings of this study bear both theoretical and practical implications. At the theoretical level, with specific focus on the under-researched area of speaking self-assessment, this study may be considered a pioneering step in providing empirical evidence on the effectiveness of doing self-assessment in minimizing “evaluative” mismatches between teachers and learners. At the practical level, an implication for teachers is to provide learners with the opportunity to take part in an ongoing procedure of self-assessment to monitor and enhance their learning over time (Leger, 2009) and to accept ownership for their successes and failures in learning (Tremblay & Gardner, 1995). The pedagogic benefits of involving students in self-assessment have been confirmed by both teachers and learners in the related literature (Butler & Lee, 2006, 2010; Chen, 2006; Cheng & Warren, 2005). A further implication for major ELT stakeholders, namely policy makers, syllabus designers, materials developers, teacher trainers, and teachers, is to contribute to the success of the teaching and learning process by developing and sharing comprehensive and comprehensible lists of scoring criteria.
However, this study was not without limitations, which may have compromised the generalizability of the findings. First, the data were mainly gathered from a convenience sample of 29 Iranian English language learners and six Iranian English language teachers. To lend support to the findings of this study, future research, thus, may be done with larger and more representative groups of participants in diverse contexts. Moreover, proficiency level was not taken into account as a variable in this study. As evidenced in the literature, instructors are, more often than not, reluctant to rely on learners’ self-ratings owing to the unreliable nature of these ratings. The unreliability of self-assessment might be a result of learners’ lack of language proficiency (Shimura, 2006); therefore, it seems that the effect of learners’ proficiency level on the accuracy with which they do self-assessment needs further investigation. Furthermore, in this study, the learners’ assessment of their own speaking proficiency was limited to a particular task type, that is, monologue. Future research could, then, explore the same research questions, employing varied speaking task types. Further research on other areas of language proficiency is also needed to validate the claim that requiring learners to do self-assessment is an effective way to minimize perceptual mismatches between teachers and learners in particular and contributes to the success of the curriculum in general.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
