Abstract
Artificial intelligence writing evaluation systems are widely used in college English writing instruction in China. They provide teachers and English learners with online automated composition evaluation, so that teachers' workload is reduced, teachers can learn directly about their students' English writing level, and students' English writing improves. Juku automated writing evaluation (AWE) is one of the most widely used systems among colleges and universities in China. This empirical study examined the use of Juku AWE in college English teaching. Through an experiment with 114 students from 2 classes in Xi'an University, and questionnaires and interviews with 30 teachers and 200 students using Juku AWE, the author finds that: (1) using AWE effectively helps students with their English writing; (2) both teachers and students have a positive attitude toward the use of AWE in terms of immediate and clear feedback, time saving, and arousing interest in English writing; and (3) AWE still needs to be improved, as it cannot properly evaluate text structure, content logic, and coherence. Both teachers and students should therefore treat AWE scores objectively.
Introduction
In college English teaching in China, English writing, an important vehicle for academic and social interaction, remains the most difficult obstacle to overcome. In the writing process, feedback plays an important role because it can help the writer assess how effectively the writing mediates the intended messages. 1 The most common and effective type of feedback is teacher feedback. Unfortunately, this puts an enormous load on college English teachers in China, as colleges expand their enrollments and the number of students increases sharply. As a result, giving feedback on students' English writing has become a heavy burden for college English teachers. On the one hand, teacher feedback is insufficient in both quality and quantity. On the other hand, feedback delayed is feedback denied: delayed feedback misses the best time for students to apply it to improving their writing and generally becomes useless. 2 When feedback is delayed too long, students may have little memory of what they wrote. Thus, they cannot revise and perfect their essays according to the feedback given by their teachers.
In this situation, artificial intelligence writing evaluation systems, which are designed to produce immediate computer-based scores for submitted essays along with diagnostic feedback, have been widely adopted by English teachers as an alternative assessment and feedback tool in many middle schools and colleges in China. The most widely used automated writing system is the Juku correcting net system, known as Pigai.org. The present study takes Juku as an example to investigate whether it is really effective in helping English teachers and students in English teaching and writing, how they use automated writing evaluation (AWE), and what their attitudes toward AWE are.
Literature Review
AWE, also called computer-assisted writing assessment, computerized essay scoring, computer essay grading, or machine scoring of essays, is defined as “the ability of computer technology to evaluate and score written essays.” 3
AWE systems have been under development since the mid-1960s, when a national network of U.S. universities, known as the College Board, supported the development of Project Essay Grade to help score thousands of high school student essays. 4 Page, who was involved with the development, reported that the original enthusiasm for the results was tempered by practical considerations. In the 1960s computer technology was neither advanced enough nor accessible enough to be applied on a larger scale.
In the 1980s, with the introduction of microcomputers and renewed interest in Project Essay Grade, a second product, the Writer's Workbench, was brought to the market. 5 The Writer's Workbench not only scored essays but also gave writers feedback on the quality of their writing. Although the technology at that time operated on a narrow definition of quality, as it could only recognize misspelled or misused words and identify long or short sentences, the Writer's Workbench pointed the field in an important direction: a focus on feedback.
After 10 years of development, in 1998 Vantage Learning finally released the scoring engine IntelliMetric. It was known as the first holistic essay-scoring tool based on artificial intelligence (AI).6,7 Using a blend of AI, natural language processing, and statistical technologies, IntelliMetric is a type of learning engine that internalizes the “pooled wisdom” of expert human raters. 6 One of the best attributes of IntelliMetric™ is its capability of evaluating essay responses in multiple languages, including English, Spanish, Hebrew, Bahasa, Dutch, French, Portuguese, German, Italian, Arabic, and Japanese. 6
Since then, with significant advances in computer technology, the potential of AWE has been widely explored in research informed by perspectives from pedagogy, educational measurement, and cognitive science. In other words, what is considered most beneficial for students, models that reflect students' thought processes, psychometric evaluations of reliability and validity, and considerations about operational systems and their functionality have all contributed to the development and implementation of AWE systems. WritetoLearn™, Criterion, and MY Access! are the three most popular AWE systems on the market, built respectively on the Intelligent Essay Assessor, E-rater, and IntelliMetric automated scoring engines. Many colleges and universities, high schools, and language testing organizations use AWE systems to grade student essays.4,6,8–11
In China, a variety of AWE systems have also emerged in recent years, such as the EFL Essay Evaluator (EEE) invented by Liang Maocheng,12,13 Writing Roadmap developed by CTB/McGraw-Hill, the Juku AWE system developed by Beijing University of Posts and Telecommunications, which is one of the English Writing Intelligent Tutoring Systems (EWITS) based on corpus and cloud computing, the Bingo English AWE system, E-rater™ (Electronic Essay Rater), and the ETS (Educational Testing Service) Criterion Online Writing Evaluation Service. Juku correcting net, Writing Roadmap, and the Bingo English AWE system are the main ones used in China, with Juku being the most popular among college English teachers and students.
Studies on AWE system
Studies on AWE systems mainly address the effectiveness of AWE, the differences between using AWE and the traditional approach, comparisons of two AWE systems, and teachers' and students' attitudes toward AWE.
Elliot and Mikulas 14 conducted a study of 709 11th-grade students using one AWE system, MY Access! They claimed that students who used MY Access! improved more than students who did not. Warschauer and Ware 15 pointed out that this study was methodologically weak because it lacked control groups in both the pilot and the follow-up study, as well as random assignment. Warschauer and Grimes 16 studied MY Access! in four secondary schools and found that both teachers and students had positive attitudes toward it in terms of increasing students' motivation, promoting autonomous student activity, and saving teachers' time. Grimes and Warschauer investigated the attitudes of teachers and students toward the use of MY Access! in middle school settings and found that, despite the negative views of many teachers and students concerning the reliability of the system, MY Access! provided benefits in terms of classroom management and student motivation. 17
Shermis et al. conducted a study using random assignment and found no significant differences between the experimental group and the control group. 18 Attali 19 gave a detailed account of how Criterion was used by 6th–12th graders throughout the United States during the 2002–2003 school year and, in particular, what kinds of changes occurred through revision of essays. Attali's study does not differentiate among students by grade level, school, or language background. However, the software readily allows these types of analysis to be carried out.
Chodorow et al. 20 reported two experiments that evaluated the two systems: Criterion and ESL Assistant, for identifying and correcting writing errors, including articles and prepositions. Studies on AWE attach more importance to whether AWE feedback can improve students' writing proficiency or how AWE feedback affects students' revisions and how the students react to the feedback received.
In China, with the growth of large-scale tests of different types and the increasing number of students, teachers bear a heavy burden in grading students' test papers, and the reliability of students' scores is doubtful when teachers score subjectively. Automated grading systems are therefore greatly needed. An early study on AWE was conducted by Liang Maocheng, 21 who studied the application of AWE to English compositions by Chinese students, while Li Yanan did research on Chinese language tests. 22 Cao Yiwei and Yang Chen conducted research on scoring Chinese compositions by means of latent semantic analysis. 23
AWE systems have been widely used and discussed in language teaching and learning abroad, but in China research started late: only a few studies have explored students' perceptions of AWE, and more studies are still needed to compare AWE feedback with other sources of feedback.
Studies on feedback on students' English writing
Feedback plays an important part in encouraging and strengthening students' English writing. It was introduced to the language acquisition field as comments or information learners receive either from teachers or from other learners on the success of a learning task. 24 It has been a concern of various researchers for centuries.25–30
Many studies explore different aspects of feedback in English writing, including its sources,31–33 modes,34–36 and focus.37,38 Some try to demonstrate the effectiveness of feedback given by teachers,1,39–45 some explore peer feedback,45–47 and some compare the effects of feedback on the writing process and on performance.48–55 The subjects of these studies range from native learners to ESL and EFL learners.56–61 The languages studied are L1 (native language), L2 (second language), and FL (foreign language).
In recent years in China, with the application of AWE systems such as Juku Pigai.org and the Bingo intelligence review system, some scholars have concentrated on the reliability and validity of these systems. Some studies address students' attitudes toward AWE systems. Studies have been conducted among high school students, 62 English majors,63–65 and non-English majors.66–71 Few contrastive studies compare the effectiveness of AWE feedback and teacher feedback, so the effects of the two need further study.
Research Design
Research questions
Is AWE an effective platform for helping students with their English writing?
What attitudes do teachers and students hold toward AWE?
Is there a great difference between traditional feedback and the feedback given by the computer?
Subjects
One hundred fourteen engineering students from two classes in Xi'an University in Shaanxi Province were chosen as participants for this study. They were all sophomores, with 63 from class 1 and 51 from class 2. One class was randomly assigned as the control class, taught in the traditional way, and the other as the experimental class, receiving Juku AWE feedback. It should be made clear that the students were told nothing about the experiment, in order to protect the validity of the study.
Means and instruments
The study combines quantitative and qualitative methods. The quantitative part is the analysis of the students' scores in the tests before and after the experiment. The qualitative part comprises the questionnaires and interviews for both teachers and students. SPSS is used to analyze the data from the pretest and post-test, and a t test is used to determine whether there is a significant difference between the two classes.
Steps in the research
Step 1. Pretest
At the beginning of the semester, both the experimental class and the control class (CC) were asked to write a short essay of about 150 words on a given topic within 30 minutes under the supervision of two teachers. The teachers scored all the papers and entered all the scores into the computer for further analysis.
Step 2. Registration
First, students of the experimental class (EC) were asked to register on the Juku AWE platform. Then the teacher distributed the writing task and asked them to submit their compositions online according to the requirements. Students could revise and resubmit their compositions until they were satisfied with them and with the scores given by the computer. After that, the teacher's feedback was added to the online feedback. Finally, the teacher summarized the common mistakes that students made in their compositions and kept the summary online for students to discuss. After the students had submitted more than 10 compositions, the teacher could set up “My Website,” from which new writing tasks could be assigned, well-written compositions could be recommended to students, and files related to writing could be uploaded. Students had the right to read and browse these materials.
Step 3. Experiment
Both the experimental class and the CC had the same writing instruction content and the same writing tasks.
The experiment lasted for one semester. AWE-based teaching was applied in the EC. The teacher assigned the students writing tasks on AWE, and the students submitted their compositions online within a limit of 1 or 2 weeks. Students could revise their compositions without restriction until they were satisfied. After that, the teacher read the students' compositions online and added teacher feedback at an appropriate time. Finally, the teacher gave general comments on all the compositions, summarized the students' common mistakes, and discussed them in class.
The traditional way of teaching was applied in the control class. The teacher gave writing instructions, and the students wrote their compositions independently after class. Then the teacher graded the compositions and gave feedback and suggestions to students. In practice, because of limits on the teacher's time and energy, the control class was assigned two fewer compositions.
Step 4. Post-test
At the end of the semester, after the experiment, both the experimental class and the control class were again asked to write a short essay of about 150 words on a given topic under the supervision of two teachers. The teachers scored all the papers and recorded the data on the computer for further analysis.
Step 5. Questionnaires and interview
The author designed questionnaires for both teachers and students to learn about the effect of using AWE in writing, teaching, and learning, and about their attitudes toward the use of AWE. Thirty teachers and 200 students in the same university who had used Juku AWE answered the questionnaire. At the same time, the author tried to identify problems with the system itself and practical problems in teaching and learning.
Two hundred questionnaires were distributed to students; 198 were collected, and 2 were invalid. Thirty questionnaires were distributed to teachers, and all 30 were collected.
To learn more about the opinions of students and teachers, the author interviewed some teachers and students to gain an overview of their thinking.
Data Analysis and Discussion
The analysis of pretest scores of the students
The pretest was conducted before the experiment to establish the proficiency of the two classes. Table 1 shows that the mean score of EC is 9.7936, which is very close to that of CC (9.7751); it is slightly higher (9.7936 > 9.7751), with a disparity of 0.0185, which is not significant. Moreover, the standard deviation in EC is 0.98734, almost the same as in CC (0.99786). On the basis of these two statistics, the average writing proficiency of the two classes is at almost the same level, and the independent samples t test (Table 2) confirms this.
Table 1. Statistics of pretest in experimental class and control class
CC, control class; EC, experimental class; SD, standard deviation; SE, standard error.
Table 2. Independent samples T-test (pretest)
df, degree of freedom; 95% CI, 95% confidence interval of the difference.
According to Table 2, the significance value of Levene's test for equality of variances is 0.755 (>0.05), which indicates that the variances of the pretest scores of the two classes do not differ significantly. Furthermore, the mean difference is merely 0.04455, and the significance (two-tailed) is 0.785 (>0.05), which indicates that the mean scores of EC and CC do not differ significantly. In addition, the 95% confidence interval of the difference runs from −0.32111 to 0.41654 and therefore includes 0, which again indicates no significant difference between the two classes. In conclusion, the students in EC and CC had nearly the same writing level before the study, which supports the validity and reliability of the experiment at its outset.
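The pretest comparison can be approximated from the summary statistics alone. The following is a minimal sketch, not the SPSS procedure the author ran: it computes a pooled-variance (equal variances assumed) t statistic from the pretest means and standard deviations quoted for Table 1. The function name `pooled_t` is hypothetical, and assigning 63 students to EC and 51 to CC is an assumption, since the text gives the two class sizes but does not say which class was which.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df, standard_error) for a pooled-variance two-sample t test."""
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df, se


# Pretest means and SDs as quoted in the text for Table 1.
# ASSUMPTION: n = 63 for EC and n = 51 for CC (the text does not say which is which).
t_stat, df, se = pooled_t(9.7936, 0.98734, 63, 9.7751, 0.99786, 51)
print(f"t = {t_stat:.3f}, df = {df}, SE = {se:.4f}")
```

With 112 degrees of freedom the two-tailed 5% critical value is about 1.98, so a t statistic this close to zero is consistent with the conclusion that the two classes started at a similar level.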
The analysis of the post-test scores of the students
The post-test was conducted at the end of the experiment. The scores were entered into SPSS 19.0, and the results are presented in Tables 3 and 4, on which the following analysis is based.
Table 3. Statistics of post-test in experimental class and control class
Table 4. Independent samples T-test (post-test)
Table 3 shows that the mean score in EC is 13.5556, about 2.5752 points higher than in CC (10.9804), a substantial difference between the two classes. Compared with the pretest, the mean score of EC rose from 9.796 to 13.5556, a gain of 3.7596, while the mean score of CC rose only from 9.7751 to 10.9804. This indicates that the disparity is significant. Moreover, the standard deviation is 1.10591 in EC, which is higher than in CC (1.02937). Taken together, these figures indicate that students' writing proficiency made significant progress in EC: students in the experimental class, who received both Juku feedback and teacher feedback, did better in English writing than those in CC, who received only teacher feedback under the traditional way of teaching.
From Table 4 we can see that the significance (two-tailed), reported as 0.000 (that is, p < 0.001), is below the 0.05 significance level, which means that the mean scores of EC and CC differ significantly. Moreover, the 95% confidence interval of the difference runs from 1.18478 to 1.96554 and does not contain 0, which further indicates that EC and CC differed significantly in writing competence after the experiment.
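The confidence-interval reasoning used for both tests follows one rule: a 95% interval for the mean difference that contains 0 is compatible with no difference, while one that excludes 0 indicates a significant difference at the 5% level. A minimal sketch of that check, using the interval bounds reported in the text (the helper name `excludes_zero` is hypothetical):

```python
def excludes_zero(lower, upper):
    """Return True when the interval [lower, upper] lies entirely on one side of 0."""
    return lower > 0 or upper < 0


# 95% confidence intervals of the mean difference as reported in the text.
pretest_ci = (-0.32111, 0.41654)
posttest_ci = (1.18478, 1.96554)

for label, ci in (("pretest", pretest_ci), ("post-test", posttest_ci)):
    verdict = "significant" if excludes_zero(*ci) else "not significant"
    print(f"{label}: CI = {ci} -> difference {verdict} at the 5% level")
```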
The analysis of the data from the questionnaires and the interview
The questionnaire and interview for students cover four main aspects: their interest in using AWE, the helpfulness of AWE, their frequency of use of AWE, and their preference for feedback given by the computer, by the teacher, or by both. The students answered the questionnaire, and the author later interviewed dozens of students individually to learn what they really thought.
Table 5 shows that most of the students (81.5%) use AWE three to five times a month, with only 20% and 8.5% using it once or twice and more than five times, respectively. This indicates that students are interested in using AWE and may use it about once a week.
Table 5. Students' frequency of using automated writing evaluation in a month
As for the helpfulness of AWE, 65% of the students (Table 6) think the AWE system is very helpful for their English writing. Every time they submit a composition, they immediately receive feedback with suggested corrections. Gradually they become aware of their mistakes and try to avoid making them again.
Table 6. Students' perception of the helpfulness of using automated writing evaluation
Table 7 shows that students rate the effectiveness of Juku AWE highly in terms of improvement in grammar, spelling and collocation, organization, revision, and content. In particular, half of the students agree that their grammar has improved (20% strongly agree and 30% agree); only 10% report no improvement in grammar. The author also learned from the investigation that students thought the Juku AWE system helped only a little with the organization and content of their essays. Ninety percent of the students interviewed told the author that when teachers used AWE, their learning efficiency improved. This suggests that without the teacher's supervision, and when assessment is delayed, most students have little awareness of autonomous learning.
Table 7. Students' attitude toward the effectiveness of Juku automated writing evaluation
Among the three choices of feedback, 60% of the students (Table 8) prefer to have both AWE feedback and teacher feedback. In the interviews they explained that AWE feedback was not accurate enough, so they still wanted feedback from their teachers in order to know how well they had done in their English writing and how much progress they had made. All the students interviewed said that although they could get feedback from AWE, they still needed the teacher's feedback, because the teacher's feedback had an emotional dimension. This exceeded the author's expectation.
Table 8. Students' preference of the feedback
AWE, automated writing evaluation.
Table 9 shows that the students are satisfied with AWE, with an average of 44% choosing “strongly agree” and an average of 7.4% choosing “agree,” not counting the choice of “neutral.” In particular, for sentence evaluation and synthetic analysis, 91.5% and 80% of the students, respectively, chose “strongly agree” or “agree.” This is further supported by the interviews. Almost all the students interviewed told the author that they paid close attention to the evaluation of sentences because sentence-level errors were the most frequent mistakes in their writing. The author also found in the investigation that students have little knowledge of the functions of AWE, and some students have difficulty using AWE because they lack access to a computer.
Table 9. Students' satisfaction with evaluation of automated writing evaluation
The teachers' questionnaire and interview cover their interest in using AWE, the helpfulness of AWE, their frequency of use of AWE, the effect of using AWE, and their preference in giving students feedback.
In this study, Table 10 clearly shows that teachers have great interest in using AWE, as no one disapproves of using it. Teachers interviewed also said that using AWE reduced their load of evaluating students' essays and saved them time to some extent. They could use AWE on their own initiative, found that it met the needs of their work, and found it easy to use, as it required little technical skill. Their workload did not increase. The digital evaluation evidently improves their working efficiency. However, teachers have different attitudes toward the use of AWE: 53.3% of the teachers approve of AWE, while 46.7% are doubtful. In addition, 23.3% of the teachers are neutral about the helpfulness of AWE.
Table 10. Teachers' perception of automated writing evaluation
Like the students, most of the teachers (76.7%) use AWE three to five times a month, and 10% of the teachers use it more than five times (Table 11). They integrate this digital grading system into their daily work: 73.3% use it for scoring, 83.3% for feedback on test papers, 56.7% for quality analysis, 33.3% as a test source, 23.3% for quality testing, and 83.3% for interaction between teachers and students (Table 12). The figures show that information technology is well integrated into English teachers' classroom teaching.
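Because only 30 teachers returned the questionnaire, each percentage above corresponds to a small whole number of respondents. The following sketch converts the reported percentages back to head counts; it assumes that all 30 teachers answered every item, which the text does not state, and the dictionary keys are shorthand labels rather than the questionnaire's own wording.

```python
# ASSUMPTION: all 30 returned teacher questionnaires answered every item.
N_TEACHERS = 30

# Percentages of teachers reporting each use of AWE, as given in the text.
reported_pct = {
    "scoring": 73.3,
    "feedback on test papers": 83.3,
    "quality analysis": 56.7,
    "test source": 33.3,
    "quality test": 23.3,
    "teacher-student interaction": 83.3,
}

# Nearest whole number of teachers behind each percentage.
counts = {use: round(pct / 100 * N_TEACHERS) for use, pct in reported_pct.items()}

for use, n in counts.items():
    print(f"{use}: {n} of {N_TEACHERS} teachers ({n / N_TEACHERS:.1%})")
```

Reading the figures as counts makes it easier to see how few respondents separate one category from another, which matters when interpreting a sample of this size.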
Table 11. Frequency of teachers' use of automated writing evaluation in a month
Table 12. Teachers' use of automated writing evaluation
Interestingly, Table 13 shows that 30% of the teachers prefer AWE feedback and 30% prefer teacher feedback, while 40% prefer to combine feedback from the system and from teachers. The interviews shed further light on this: the teachers' answers confirm the figures above in that some teachers are skeptical of the system. Although half of the teachers think that the use of AWE helps teaching quality, quite a few hold that it is of little help to students, which points to the lack of comparative studies on using AWE.
Table 13. Teachers' preference of giving students' feedback
Conclusion
The freshmen are young and curious about everything in college. When they were introduced to the Pigai.org system, they were very excited and eager to try submitting their compositions online. Many students enjoyed seeing their progress each time they submitted a composition, and many of them resubmitted again and again. Pigai.org inspired students' enthusiasm for writing and increased the time they spent on writing practice. With so much writing practice, together with the system's assessment, students' English writing will improve greatly. Thus, the correcting network can effectively help students improve their English writing. Compared with traditional teacher marking and feedback, it is immediate, clear, and time-saving.
Teachers believe that the AWE system is a convenient way to deter students from plagiarizing each other or copying samples found elsewhere on the net. Moreover, teachers can add their own feedback to the feedback in the system, and they can use their personal websites to upload learning materials and assign tasks to students, strengthening their interaction with students after class.
Of course, many aspects of the intelligent correction system for English composition still need to be improved. It is only a tool to assist teaching, and teachers' guidance should be combined with online learning. In addition, the AWE system can only comment on grammar errors and basic word collocations. It cannot meet the requirements for evaluating a composition's text structure, content logic, and coherence, so the writing scores should be treated objectively. The intelligent correction system not only improves teachers' working efficiency and gives them more time for classroom teaching and research, but also, more importantly, creates opportunities for students to write as much as possible and improve their English writing ability. Finally, as the conditions for using the Internet and computers change, and as the ways teachers score compositions and students write them change over time, college English teachers should look for good ways of marking essays and giving effective feedback, so that teachers can be freed from the heavy work of marking students' essays while students receive effective feedback and make rapid progress in English composition.
Acknowledgments
The thesis is a product of the study of English Writing open course online supported by Shaanxi Provincial Department of Education (program no.: JSMK1723) and of the Educational Project of Shaanxi Provincial “13th Five-Year Plan”: Research and Practice on Cultivating Innovative Foreign Language Talents in Shaanxi Universities Under the Context of “One Belt and One Road” Strategy (Project No: SGH17H231).
Author Disclosure Statement
No competing financial interests exist.
