Abstract
Reading requires appropriate strategies to spark initial interest and sustain engagement. One promising strategy is the pedagogical approach of learning-by-teaching, which transforms learners into active participants. Integrating this approach into digitalized and individualized reading contexts has the potential to foster the development of young readers. Currently, AI techniques are primarily used in chatbots as tutors, with limited focus on tutee chatbots that employ the learning-by-teaching pedagogy. Therefore, this study adopted a teachable Q&A agent and examined the effect of chatbot training, which employed AI techniques and student-generated questions and answers, with the aim of enhancing students’ reading interest and engagement. Ninety-five fifth graders participated in a 9-week reading program. A quasi-experimental design was adopted. The results showed that incorporating a learning-by-teaching approach into the chatbot training activity significantly enhanced students’ reading interest and engagement. However, the quantity of certain question types was negatively correlated with interest and engagement. This implies that asking diverse questions poses a certain level of challenge to young readers, which requires deliberate training and incubation. Additionally, the identification of four distinct student clusters revealed the affordances and limitations of tutee chatbots for reading.
Reading has been widely discussed as an effective way of fostering the enjoyment of learning and implicit learning mechanisms (e.g., Day & Bamford, 1998). To fully embrace the value of reading, it is crucial to emphasize consistent long-term reading, as developing intrinsically motivated readers requires time and does not occur over one single experience (Guthrie et al., 2005). The literature suggests that the motivation to read is context-dependent, and intrinsic motivation is likely to be enhanced when the learning environment supports it (Guthrie & Humenick, 2004). Therefore, how to arouse and sustain students’ interest and engagement in reading has become a critical issue in education (Cho & Krashen, 2019; Ng et al., 2019).
Considering constraints of class resources, artificial intelligence (AI) enables chatbots to have promising potential for capitalizing on individuals’ reading experience. The benefits of chatbots for educational purposes include enhancing cognitive gains, motivation, interest, engagement, and self-efficacy, and reducing anxiety associated with learning (Atkinson, 2002; Kim, 2013; Kim et al., 2017; Plant et al., 2009; Xu et al., 2021; Yin et al., 2021). In previous studies, educational chatbots have been demonstrated to foster positive human-chatbot interactions (Arguedas & Daradoumis, 2021; Chen et al., 2020; Kim, 2013; Michaelis & Mutlu, 2018), leading chatbots to have great potential for assisting with interactive learning across diverse domains in education. The same applies to reading. The results of Xu et al.’s (2021) study suggested that a chatbot as a reading partner is adequate for enabling learners to comprehend stories to an equal level as a human companion. In another study by Liu et al. (2022), an interactive reading chatbot was employed as a tutor to enhance students’ engagement and interest in reading. The results indicated that the chatbot effectively maintained students’ interest in reading through a sense of social connection with the chatbot. However, reading requires spontaneity for students to freely explore open-ended domains rather than following the chatbot’s instructions in answering questions. Chatbot-driven interactions were dominant in previous research, which may be detrimental to students’ initiative in reading. With this in mind, our research aimed to foster students’ interest in reading by adopting a learning-by-teaching approach to have students train chatbots by asking questions and evaluating chatbots’ answers. The fairytale question framework (Paris & Paris, 2003; Xu et al., 2022), which emphasizes essential narrative elements, was implemented to guide students in generating questions. 
Given that the participants were young readers in the process of improving their English reading skills, these story elements, such as character and setting, were selected as a foundation for facilitating question generation practice. Furthermore, posing questions during reading serves as a form of cognitive engagement (Lee et al., 2021; Unrau & Quirk, 2014). This practice of question generation helps in visualizing students’ cognitive involvement in the material.
The literature has confirmed that learning by teaching generates higher intrinsic motivation, enhanced conceptual learning, and active engagement than learning for examinations (e.g., Benware & Deci, 1984). In a similar vein, Pesce (2000) explored the concept of a digital pet, illustrating how children assigned the task of caring for a virtual creature through feeding and other nurturing activities experience an increase in motivation due to the responsibility placed upon them. Additionally, teaching through asking questions can be a promising contribution to the reading experience (Humphries & Ness, 2015). The practice of student question generation has been found to positively impact motivational enhancement, cognitive gains, and metacognitive strategies in reading comprehension (Choi et al., 2005; Yu & Wu, 2020). However, how AI-enabled chatbots can be designed to facilitate learning by teaching between chatbots and students is not clear in the literature.
Therefore, the objective of this research was to promote active engagement in reading through training chatbots. Students were required to coach a self-selected robot. The process involved learners generating questions to test the robot’s reading comprehension while also being required to teach the robot whenever it responded incorrectly. This training process is analogous to self-explaining, which triggers an increase in in-depth engagement in the learning content (Gerjets et al., 2006; Hsu & Tsai, 2013). The following research questions were proposed to gain a better understanding of how the chatbot training activity may affect students’ reading experience:
RQ 1. How did the students participate in the chatbot training activity?
RQ 2. What is the impact of the chatbot training activity on students’ reading interest and engagement?
RQ 3. What are the relationships, if any, between the students’ chatbot training activity and their interest and engagement in reading?
RQ 4. How can the students’ interest and engagement be characterized based on their participation in the chatbot training activity and their English proficiency?
Literature Review
The importance of individualized learning experiences in reading is increasingly recognized (Connor et al., 2013; Stover et al., 2017). Extensive reading, praised for its ability to cultivate positive attitudes, allows learners to explore a wide range of content. Yet, implementing extensive reading in classroom settings often faces the challenge of providing individualized learning experiences. Chatbots, as an innovative educational technology, offer a solution by enabling customized learning. Typically used as tutors or peers, chatbots have positively influenced students’ reading, but often in a passive context where students primarily respond to questions. This research proposes a shift towards proactive reading, with chatbots acting as tutees, encouraging students to actively engage in reading by asking questions. This shift not only empowers learners to control their learning journey, but also aligns with the “learning-by-teaching” concept, whereby students teach their digital counterparts. The following literature review discusses reading interest, chatbots in learning, and teachable agents to address the aim of this research to explore the effects of utilizing chatbots as tutees in reading, focusing on how this innovative role can deepen student engagement and enhance interest in extensive reading.
Reading Interest
Reading interest has been widely discussed in the literature. Among diverse forms of reading that may impact reading interest, extensive reading has the potential to promote positive reading attitudes due to its flexibility, allowing learners to explore unfamiliar territories (Yamashita, 2013). Unlike the school curriculum, which is often predefined to attain a set of skills and knowledge that may restrain learners from interest discovery (Fryer et al., 2019), one major purpose of extensive reading is to increase learners’ interest and thus lead to the development of reading skills (Day & Bamford, 1998). Positive affective states (e.g., interest, motivation, and flow) were activated via extensive reading (e.g., Al-Homoud & Schmitt, 2009; Cremin & Swann, 2016; Ng et al., 2019). A study conducted by Fujita and Noro (2009) revealed that regular 10-min extensive reading sessions could help enhance intrinsic and extrinsic motivation. Kirchhoff’s (2013) study also demonstrated that readers are more likely to perceive a flow-like experience when exposed to an extensive reading program.
While numerous studies have substantiated the positive influence of extensive reading, its adoption and implementation in education still has much room for growth (Cho & Krashen, 2019; Ng et al., 2019). Structured extensive reading programs might not yield the desired outcomes either (Milliner, 2017) due to the manner in which they are implemented; for example, a mandatory reading policy might be counterproductive, since reading turns into an enforced task rather than a pleasure (Nakamura, 2018). An effective strategy to engage students in reading involves providing an individualized reading experience, possibly guided or accompanied by others. Supporting this, Bloom’s 2-sigma problem (Bloom, 1984) suggested that the one-to-one tutorial approach, in which one teacher guides a small number of students to learn, is promising in terms of having positive effects on learning. However, given the diverse paces and reading preferences of students, and the limited reading time available in schools (Kim, 2013; Tärning et al., 2019), it becomes challenging for teachers to offer such individualized attention. To overcome this, the integration of chatbots in learning environments presents a viable solution by providing personalized learning experiences. These intelligent systems enable student-centered learning at scale, emulating the personalized attention and adaptive feedback of one-to-one tutoring. This research aimed to understand how chatbots can be used to facilitate and maintain students’ reading interest as part of a prolonged reading program.
Chatbots in Learning
Chatbots in the context of reading are receiving growing interest, assuming various roles such as tutor, companion, mentor, facilitator, and tutee (e.g., Kim, 2013; Tegos & Demetriadis, 2017; Xu et al., 2021; Zhang et al., 2022). These chatbot designs address the need to improve reading comprehension, engagement, and affective states. One widely adopted role for chatbots is that of an instructor or tutor, particularly for young learners. In Xu et al.’s (2021) study, a conversational agent acted as a tutor, guiding students through story recitations and posing questions to aid their language development. The findings revealed the distinctive communication patterns that emerge when children interact with conversational agents as opposed to adults. However, both groups displayed a comparable level of accuracy in their responses. Similarly, in the study conducted by Liu et al. (2022), the chatbot encouraged students to narrate key aspects of the stories. The results indicated that students perceived a sense of social connection with the chatbot, effectively maintaining their interest in reading.
Chatbots were also developed as digital peers that read with students. In the study by Kim (2013), a digital peer was employed to enhance students’ reading comprehension by demonstrating effective questioning techniques and showing affective peer-like support. The results revealed the positive effects of the digital peer in enhancing comprehension and engagement. Similarly, in Zhang et al.’s (2022) study, StoryBuddy played the roles of a companion and mentor to help students stay engaged, and to provide question recommendations to parents to guide their children to read. Chatbots as facilitators to enhance collaborative learning (Dyke et al., 2013; Tegos & Demetriadis, 2017) are also emerging, but are found mostly in contexts such as scientific reasoning rather than pure reading for pleasure.
The aforementioned literature suggests that chatbots in the roles of tutor and peer yield positive effects on students’ reading experience. However, the interaction sequence and question generation were primarily driven by chatbots. As a result, children played a passive role by answering questions only, rather than being encouraged to take the initiative to ask questions. Therefore, chatbots that allow child-led question-answering in reading may be an area worth more attention (Zhang et al., 2022). Prior literature also revealed a divergent development of learners’ reading interest with chatbots (Fryer et al., 2017, 2019), where students’ limited ability to orchestrate the conversation flow was not fully considered. Therefore, this research intended to examine the effects of chatbots behaving as tutees in a reading context to enable learners to take greater ownership of their learning.
Enhancing Learning through Learning by Teaching
Teachable agents or digital tutees engage learners in a trainer-trainee relationship, where the teachable agent assumes the role of tutee (Biswas et al., 2005; Chase et al., 2009; Tärning et al., 2019). In this sense, teachable agents are pedagogical agents that permit students to teach computers rather than vice versa, embracing the learning-by-teaching concept and its potential benefits (Okita & Schwartz, 2013). Learning by teaching is considered an effective approach to in-depth learning as it involves students’ active engagement in externalizing their understanding and knowledge-building of the target learning content (Roscoe & Chi, 2007). By undertaking the task of explaining concepts to their peers, students can engage in a deeper cognitive process, thereby enhancing their own learning experiences (Torshizi & Bahraman, 2019). Further, the act of teaching others can be both rewarding and inspiring, catalyzing sustained interest and motivation, thereby facilitating ongoing learning (Benware & Deci, 1984; Roscoe & Chi, 2007).
Recognizing the advantages associated with learning by teaching, researchers examined the effect of teachable chatbots in diverse learning domains. In the study by Chase et al. (2009), the teachable chatbot allowed students to simulate the teaching of cause-and-effect relationships between science-related variables for 3 days via constructing a concept map. The study indicated that students devoted more effort to learning for their chatbots than for themselves. In the social and affective aspects, the students attributed more mental engagement and responsibilities to their chatbots. Similarly, teachable chatbots were applied to enhance algebra learning (Matsuda et al., 2020). Students were given opportunities to monitor the problem-solving process and to provide feedback on the correctness of the chatbots. The results showed students’ proficiency in solving equations increased after using teachable chatbots for 4 days. The above literature suggests that there remains room for further exploration of the potential of teachable chatbots, given the short experimental periods and fairly closed-domain knowledge involved (e.g., one fever mechanism passage or one/two-step algebra equations). How teachable chatbots may be applied to enhance reading interest during a prolonged reading program is not clear.
Extensive research has demonstrated that generating questions facilitates knowledge exploration and reading comprehension, and helps identify gaps in understanding (e.g., Humphries & Ness, 2015; Mishra & Iyer, 2015). Furthermore, learning through exploration around self-generated questions fosters curiosity and a sense of ownership among students (Alaimi et al., 2020), and therefore may be effective in sustaining long-term interest in reading. The practice of students generating questions or feedback has shown positive impacts on motivation, cognitive gains, and metacognitive strategies (Choi et al., 2005; Yeh & Lai, 2012; Yu & Wu, 2020). However, the skill of formulating questions is not commonly taught in schools (Kopparla et al., 2019), with students typically generating fewer than 0.2 questions per class hour across different cultures, and the questions themselves tend to lack sophistication (Graesser & Person, 1994). Hence, this research aimed to explore the effect of training a chatbot through proactive question generation, diverging from the conventional tutor chatbot focus in which chatbots assume the role of guiding and answering. Our question generation approach to reading aimed to encourage students to read and construct their own questions to train their chatbots, rather than receiving teaching from the chatbots. How students teach an AI-enabled chatbot and how the chatbot training activity may impact students’ interest and engagement in reading were analyzed in an effort to nurture learners’ prolonged interest and engagement in extensive reading. The findings may enrich our understanding of chatbot integration in reading, making a valuable contribution to the use of AI techniques in education.
Method
Participants
The participants in this research were 95 fifth graders from four classes in a public elementary school in northern Taiwan. Two classes consisting of 47 students were randomly assigned as the experimental group, which incorporated a chatbot during the reading period. The 48 students in the remaining two classes were the control group, who engaged in a regular reading activity. The reading sessions took place once a week, recurring consistently over 9 weeks. Each session lasted for 50 min. All participants were English as a Foreign Language (EFL) learners, with prior experience of participating in an English reading program during the previous semester. In order to prioritize the goal of fostering an increased interest in reading, no written assessments were administered to evaluate comprehension levels. Furthermore, both the participants and their parents provided informed consent for the students’ involvement in the study and the collection of relevant data.
Procedure
Participants from both the experimental and control groups completed a situational interest survey before the 9-week reading program. The survey served as the basis for understanding how students’ reading interest was influenced throughout the study. Both groups engaged in the same reading for the first 2 weeks, where the students engaged in silent reading while sitting alongside their peers. Throughout the study, a collection of English-graded readers focusing on life education was made available to all students in their respective classrooms. Students could select any of the books to read during the weekly 50-min reading sessions. At the end of each reading session, students from both groups were instructed to fill in a 5-min flow survey. The purpose of the flow survey was to keep track of the attentiveness level exhibited by the students during the reading activities. Therefore, two sets of flow survey results were obtained that represented students’ engagement in the traditional silent reading activities for the first two weeks.
During the following 7 weeks, the two groups engaged in different reading activities: the control group continued with the same reading activity, while the experimental group incorporated chatbot training and battling into their reading sessions. Each student from the experimental group was provided with a tablet, which they used to operate and interact with chatbots for training and battling sessions at any time during the 7-week reading period, except for the last week, which was designated as the battle finale. At the end of each reading session, students from both groups also filled in the flow survey, resulting in seven sets of flow survey results for each group. After the 9-week reading program, participants from both groups completed the situational interest survey again in order to observe how their reading interest evolved throughout the study, with and without the involvement of the chatbot training. Finally, students from both groups were invited to take part in individual interviews. Twenty-six students from the control and experimental groups participated in the interviews. See Figure 1 for the experiment procedure.
Figure 1. Experiment procedure.
The Chatbot Training System
This study provided a chatbot training system through which students in the experimental group could select a robot image to cater to each individual’s preference. While reading the books, students could test the chatbot with questions about the book they were reading and provide answers if the chatbot did not answer correctly. The chatbot training system encompasses AI techniques and the feature of storing student questions and answers to facilitate the chatbot’s knowledge-building. A battling mechanism was also implemented to enhance students’ engagement. Students could challenge the chatbots trained by their peers through question-and-answer interactions (see Figure 2 for the training and battling interfaces). The knowledge-building process of the chatbots relies on two AI techniques (see Figure 3 for the design of the training system):
1. The SQuAD question-answering technique: The chatbot provided to students was equipped with the Stanford Question Answering Dataset (SQuAD) question-answering technique (Rajpurkar et al., 2016). The technique uses a large transformer-based language model fine-tuned on the SQuAD dataset (Rajpurkar et al., 2016), which consists of question-answer pairs sourced from Wikipedia articles. In this study, it was therefore used as a question-and-answer model to answer questions about a given story. With the model, the chatbot provided to students was expected to answer simple questions derived from the original story texts. As the trainers, students provided feedback to the chatbot to confirm whether the answers provided by the SQuAD question-answering technique were correct. The questions, SQuAD answers, and students’ feedback were recorded in the Q&A bank for further queries. The design affords a sense of intelligence for the chatbots, making students perceive the chatbot as being sufficiently intelligent to be trained. However, the SQuAD model’s capacity to respond is limited; it can only answer questions posed by students if the questions are strictly derived from direct sentences in the original text of the story.
2. Question search technique: When the SQuAD question-answering technique could not answer students’ questions correctly, the students were asked to provide answers and explanations. The chatbot collected the trainers’ questions and answers in the Q&A bank, which represents the knowledge bank of the chatbot. The Elasticsearch technique (Gormley & Tong, 2015) was then utilized to detect similar, previously asked questions in the Q&A bank in order to answer the new question. The technique functions as an advanced search engine empowered with relevance processing capabilities, rather than solely matching keywords in questions. For instance, when students asked the question “What happened to Dandelion on the way?”, the technique would detect that there were several similar questions asked before, such as “What happened to the main character?” and “Who inspires little Dandelion in the middle of the story?”. The technique retrieves similar questions in the Q&A bank that have been used to train the chatbot. With this technique, the chatbots can then generate answers based on what has been taught.
Figure 2. The training (left) and battling (right) interfaces of the chatbot training system (English equivalent translation).
Figure 3. The design of the training system.
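To make the idea of retrieving similar, previously asked questions more concrete, the following is a minimal sketch in Python. It is not the system's implementation: it swaps in a simple bag-of-words cosine similarity as a stand-in for Elasticsearch relevance scoring, and the function names, the similarity threshold, and the placeholder answers are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a question and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(question, qa_bank, threshold=0.3):
    """Return (score, stored_question, stored_answer) for the best match,
    or None when nothing in the bank reaches the (assumed) threshold."""
    query = Counter(tokenize(question))
    best = None
    for stored_q, stored_a in qa_bank:
        score = cosine(query, Counter(tokenize(stored_q)))
        if score >= threshold and (best is None or score > best[0]):
            best = (score, stored_q, stored_a)
    return best


# Hypothetical Q&A bank; the answers are placeholders, not data from the study.
qa_bank = [
    ("What happened to the main character?", "<answer taught by a student>"),
    ("Who inspires little Dandelion in the middle of the story?",
     "<another answer taught by a student>"),
]
match = find_similar("What happened to Dandelion on the way?", qa_bank)
```

With this toy scoring, both stored questions overlap with the new one, and the first is ranked higher. A production search engine would typically use more refined relevance scoring than raw term overlap.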


The training process involved the students in generating questions for their chatbots. Given that students may not possess advanced question-asking skills (Byun et al., 2014), they may benefit from scaffolding to effectively participate in the training activity. To help students ask questions, a list of seven story elements was displayed (see top-left of Figure 2), serving as a hint for generating questions. The seven elements were based on the fairytale question framework proposed and validated previously (Paris & Paris, 2003; Xu et al., 2022), addressing the fundamental narrative elements. The elements include character, setting, action, feeling, outcome resolution, causal relationship, and prediction. Sample questions were provided as examples to assist students in formulating their questions after selecting a question type they wished to inquire about.
While training the chatbot, students could test its knowledge by asking questions about the stories. The chatbot would first answer the questions based on the Q&A bank, and if it could not retrieve existing questions in the bank, it would then use the SQuAD question-answer technique to answer the question. Following each questioning and answering, students were prompted to assess the accuracy of the chatbot’s response. In cases where students marked the chatbot’s answers as incorrect or when the chatbot was unable to respond, they were prompted to provide additional answers and explanations. They justified their answers by identifying where the original story texts can answer the question to teach the chatbots. Furthermore, students could track their progress regarding the number of training questions on the system. A ranking of training questions was available to provide a comprehensive overview of everyone’s progress, which served as a mechanism for self-monitoring (see bottom-left of Figure 2). Five selected books were designated for the chatbot training activity. They were all part of the collection of English-graded readers focusing on life education from the previous two weeks. The students were free to choose any of these books during the activity. These books, designed as picture books, pair texts with corresponding images to aid comprehension. The average length of these books is 23 pages with an average word count of 491 words per book. Students had the flexibility to choose any book from this selection for each session, progressing at their own pace. The emphasis was not on the number of books completed but rather on consistent participation in the activity. To maintain engagement, especially among less proficient readers, the approach was relaxed and inclusive. These students were encouraged to stay involved by asking questions, either to train their own chatbots or to interact with others, without the pressure of meeting specific goals. 
The key was continuous interaction with the books.
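The answering and teaching sequence described above can be sketched as a short control flow. This is only an illustration under stated assumptions: `search_bank`, `extractive_qa`, `student_accepts`, and `student_teaches` are hypothetical callables standing in for the question search, the SQuAD-based model, and the student's on-screen feedback, and storing confirmed model answers follows the description of the Q&A bank rather than any published code.

```python
def answer_question(question, qa_bank, story_text, search_bank, extractive_qa):
    """Try the Q&A bank first, then fall back to the extractive QA model."""
    taught = search_bank(question, qa_bank)
    if taught is not None:
        return taught, "qa_bank"
    predicted = extractive_qa(question, story_text)
    if predicted:
        return predicted, "qa_model"
    return None, "no_answer"


def training_turn(question, qa_bank, story_text, search_bank, extractive_qa,
                  student_accepts, student_teaches):
    """One training turn: the chatbot answers, the student judges the answer,
    and the student teaches a new answer when the chatbot fails."""
    answer, source = answer_question(
        question, qa_bank, story_text, search_bank, extractive_qa
    )
    if answer is not None and student_accepts(question, answer):
        if source == "qa_model":
            # Keep confirmed model answers for future queries.
            qa_bank.append((question, answer))
        return answer
    corrected = student_teaches(question)
    qa_bank.append((question, corrected))
    return corrected


# Hypothetical stand-ins used only to exercise the flow.
def exact_search(question, bank):
    return next((a for q, a in bank if q == question), None)


def silent_model(question, story_text):
    return None  # pretend the model cannot answer


bank = []
first = training_turn("Q1?", bank, "story text", exact_search, silent_model,
                      student_accepts=lambda q, a: True,
                      student_teaches=lambda q: "A1")
second = answer_question("Q1?", bank, "story text", exact_search, silent_model)
```

In this run, the first turn ends with the student teaching the answer, and the second query is served directly from the Q&A bank.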
Due to the limitations of the SQuAD-based question-answering model applied in the study, the system could not effectively detect incorrect answers provided by the trainers. To promote student engagement and enhance the quality of questions and answers, the chatbot training activity incorporated challenger and rebuttal mechanisms. The challenger mechanism enabled peers to assess the chatbot trained by fellow students, motivating trainers to improve the quality of their questions and answers (see top-right of Figure 2). Students had the option to choose any chatbot to challenge with four questions each time. The Q&A pairs provided by trainers were utilized to respond to challenge questions, without utilizing the SQuAD technique. Furthermore, the rebuttal mechanism allowed trainers to raise objections to teachers when they believed that challengers had wrongly evaluated the accuracy of their chatbots’ answers. These two mechanisms collectively created a peer verification environment. The chatbot training activity lasted for 7 weeks, with the last week focusing on battling others’ chatbots. Information regarding the trainers and the corresponding accuracy rates of each chatbot was provided as a point of reference. A battling ranking was also made available for students to monitor how each chatbot performed (see bottom-right of Figure 2).
Data Collection
Flow survey. The mental engagement of the students during the reading was measured by a flow perception survey adopted from previous research by Liu et al. (2017). The flow perception survey assesses the psychological state of how involved and engaged students are during learning activities, and thus was applied in this research to probe students’ mental engagement during the 9-week reading program. The survey is a 5-point Likert scale composed of four questions with dimensions of attention, sense of control, curiosity, and intrinsic interest, ranging from 1 (strongly disagree) to 5 (strongly agree). Survey questions include “I stay entirely focused in the reading activity,” “It is fairly easy for me to engage in the reading activity,” “I am full of curiosity about the reading activity,” and “I think the reading activity is interesting.” The Cronbach’s reliability (alpha) values for the survey administered over the period of 9 weeks exhibited consistent results, with values ranging from .76 to .83 across the nine survey sets, showing the adequate reliability of the survey.
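For readers unfamiliar with the reliability coefficient reported above, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of total scores). The following is a minimal sketch in Python; the function name and the toy responses are assumptions for illustration and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes at least two items, at least two respondents, and non-constant
    total scores (otherwise the total variance is zero).
    """
    k = len(responses[0])
    item_vars = [variance(col) for col in zip(*responses)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents answering four items identically within a row,
# so the items are perfectly consistent with one another.
toy_responses = [
    [1, 1, 1, 1],
    [3, 3, 3, 3],
    [5, 5, 5, 5],
]
alpha = cronbach_alpha(toy_responses)
```

In the toy case the items move together perfectly, so alpha reaches its upper bound of 1; real survey data fall below this.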
Situational interest. The situational interest (SI) survey adopted in this study was adapted from the measure developed by Linnenbrink-Garcia et al. (2010) and later revised by Liu et al. (2022) to probe students’ interest in reading. Two sets of situational interest surveys were collected. The survey consisted of 12 questions, divided into three dimensions: triggered situational interest (triggered-SI), feeling dimension of maintained situational interest (maintained-SI-feeling), and value dimension of maintained situational interest (maintained-SI-value). The triggered-SI dimension measured the extent to which the reading activity captured students’ attention. The maintained-SI-feeling dimension assessed the level of enjoyment associated with reading books. The maintained-SI-value dimension explored whether students perceived reading books as valuable. Each dimension had four questions, and all questions were rated on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). Example survey statements include “The English reading class of this semester is exciting” (triggered-SI), “English story books are fascinating to me” (maintained-SI-feeling), and “Reading English story books is important to me” (maintained-SI-value). The survey exhibited adequate reliability, with Cronbach’s alpha coefficients between .78 and .90 for the three dimensions of situational interest.
Data Analysis
In order to understand overall student behavioral engagement in the reading incorporated with the chatbot training activity in the experimental group (RQ1), questions generated by the students in the training and battling fields for 7 weeks were tallied and categorized based on the fairytale question framework, including seven narrative elements of character, setting, action, feeling, outcome resolution, causal relationship, and prediction (Paris & Paris, 2003; Xu et al., 2022). Two coders were involved in categorizing the questions produced by the students to display the distribution of the question types. The students asked 3370 questions in total, of which 635 were discussed by the coders, and all disagreements were resolved through discussion. Interrater reliability was assessed to measure the agreement between the two independent coders. Cohen’s kappa statistic was used as the measure of interrater reliability, and the kappa value was calculated to be κ = 0.785. According to Landis and Koch’s (1977) criteria, this kappa value falls into the substantial agreement category, indicating an acceptable level of agreement between the coders.
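Cohen's kappa corrects the observed agreement between two coders for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch in Python is given below; the function name and the four toy labels are assumptions for illustration and do not reproduce the study's coding data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' category labels over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Toy example with two of the seven question types.
coder_a = ["character", "character", "setting", "setting"]
coder_b = ["character", "character", "setting", "character"]
kappa = cohen_kappa(coder_a, coder_b)
```

Here the coders agree on three of four items (p_o = 0.75) while chance agreement is p_e = 0.5, giving κ = 0.5.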
To answer RQ2, surveys of situational interest were analyzed through ANOVA with repeated measures to observe the progress within respective groups and the variation between the experimental and control groups before and after the 9-week reading program. Furthermore, students’ flow perception over the 9 weeks was divided into three phases: the initial phase (Phase I: weeks 1 and 2), the middle phase (Phase II: weeks 3–5), and the last phase (Phase III: weeks 6–9). Students’ flow perceptions in each phase were averaged to represent the overall flow perception during each phase. ANOVA with repeated measures was also applied to determine whether there was a significant difference in the periodical flow perception over the three reading phases.
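The phase aggregation described above can be sketched as follows. The week-to-phase mapping comes from the text; the function name, the handling of missing weeks, and the example scores are assumptions made for illustration only.

```python
from statistics import mean

# Week-to-phase mapping as stated in the text.
PHASES = {
    "Phase I": (1, 2),
    "Phase II": (3, 4, 5),
    "Phase III": (6, 7, 8, 9),
}


def phase_means(weekly_flow):
    """Average one student's weekly flow scores within each phase.

    weekly_flow maps week number (1-9) to a flow score. Weeks without a
    score are skipped, and a phase with no scores yields None; this
    missing-data rule is an assumption, not something the study reports.
    """
    result = {}
    for phase, weeks in PHASES.items():
        scores = [weekly_flow[w] for w in weeks if w in weekly_flow]
        result[phase] = mean(scores) if scores else None
    return result


# Hypothetical student (NOT study data).
student = {1: 4.0, 2: 3.5, 3: 4.0, 4: 4.5, 5: 3.5,
           6: 4.0, 7: 4.0, 8: 4.5, 9: 3.5}
print(phase_means(student))
```

The three resulting phase means per student are the repeated measurements that a repeated-measures ANOVA would then compare.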
Additionally, to respond to RQ3, the associations between the generated questions and situational interest and flow perception were investigated using Pearson correlation analysis. Finally, since linear relations may not capture the complexity of students’ engagement in the reading activity, a cluster analysis using K-means clustering was conducted to answer RQ4, exploring student clusters based on traits of learning behavior, survey results, and prior English proficiency levels.
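The two analyses named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature columns, the z-score standardization step, the value of k, the random initialization, and all numbers are hypothetical, and the study does not report which software or preprocessing was used.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(rows):
    """Z-score each column so differently scaled features weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's K-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical data (NOT study data).
question_counts = [10, 20, 30, 40]
interest_scores = [4.2, 4.0, 3.8, 3.6]
print(round(pearson_r(question_counts, interest_scores), 3))

# Columns: training questions, battling questions, interest, proficiency.
students = [[20, 23, 3.5, 80], [22, 25, 3.6, 82],
            [28, 80, 4.0, 93], [30, 84, 4.1, 94]]
labels, _ = kmeans(standardize(students), k=2)
print(labels)
```

Because K-means starts from randomly chosen centroids, real analyses usually repeat the run with several seeds and keep the solution with the lowest within-cluster distance.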
Results
Student Participation in the Chatbot Training
The Total Numbers of Training and Battling Questions Students Asked the Chatbots.
The Distribution of Questions Marked by Coders. (N = 47).
During the interviews with students, discussions centered on their reading interests and motivations. The control group students shared that their drive to read was fueled by the goals of expanding their vocabulary and sentence structures, engaging in interactions with foreigners, and succeeding in examinations. For instance, students in the control group said, “I want to know more English words and sentences” and “because there’s a possibility of studying abroad or needing to communicate with foreigners, it’s very important.” When asked what reasons would make them want to read, they answered, “for exams, I guess.” Students from the experimental group reported that their motivation to train their robots stemmed from a desire to maintain their robots’ competitiveness, secure higher rankings, use their robots as companions during reading sessions, and enjoy the fun they associated with these activities. For example, the students in the experimental group said, “It’s just to beat other people’s robots, to have a higher ranking. Because playing with robots is not like before when I was reading alone, feeling bored; having a robot feels like having a friend by my side to accompany me.” They also stated, “when someone tests my robot with harder questions, I will go back and train on those questions,” and “I thought training a robot shouldn’t be too difficult, so I wanted to give it a try, and then I found it was quite fun. So, I went to the battling field.” These accounts suggest that chatbot training played an important role in stimulating their motivation to read.
Table 2 presents the distribution of questions marked by the two coders. A total of 262 questions in the training field were categorized as “non-related,” indicating that 21.18% of the training questions were irrelevant to the stories. However, the percentage of non-related questions was considerably lower (7.74%) in the battling field, suggesting that students took a more serious attitude toward battling others’ chatbots. Notably, the two most frequently asked question types in both the training and battling fields were character and action. There were 246 character-related questions (19.89%) and 198 action-related questions (16.01%) generated in the training field, and 532 character-related questions (24.94%) and 374 action-related questions (17.53%) asked in the battling field. Feeling-related questions were generated much less often (7.28% and 6.28% for the training and battling fields, respectively), implying that students paid less attention to the characters’ emotions than to their actions. Among the more advanced question types that require comprehending the whole story, namely “outcome resolution,” “causal relationship,” and “prediction,” students generated the most questions about “outcome resolution” (13.58% and 17.49% for the training and battling fields, respectively). Students also asked a fair number of “causal relationship” questions, representing 10.67% and 12.33% of the questions in the training and battling fields, respectively. Conversely, “prediction” questions were the least asked, with only nine questions (0.73%) in the training field and 48 questions (2.25%) in the battling field.
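The percentages above can be cross-checked with simple arithmetic. The counts are quoted from the text, but the two field totals are not stated there: they are inferred here from "262 questions = 21.18%" and the overall total of 3370 questions, and should be treated as an assumption.

```python
# Field totals are INFERRED, not quoted: 262 / 0.2118 is about 1237, and the
# battling total follows from the overall count of 3370 questions.
OVERALL_TOTAL = 3370
TRAINING_TOTAL = 1237
BATTLING_TOTAL = OVERALL_TOTAL - TRAINING_TOTAL


def pct(count, total):
    """Share of `count` in `total`, as a percentage rounded to 2 decimals."""
    return round(100 * count / total, 2)


# Counts quoted in the text.
training = {"non-related": 262, "character": 246, "action": 198, "prediction": 9}
battling = {"character": 532, "action": 374, "prediction": 48}

for name, count in training.items():
    print("training", name, pct(count, TRAINING_TOTAL))
for name, count in battling.items():
    print("battling", name, pct(count, BATTLING_TOTAL))
```

Under these inferred totals, every recomputed percentage agrees with the value reported in the text to two decimal places.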
The Results of the Situational Interest and Flow Perceptions Surveys
The Repeated Measures Analysis of the Situational Interest Survey.
**p < .01 ***p < .001.
As indicated in Table 3, the repeated-measures ANOVA shows that the changes differed between the experimental and control groups at a statistically significant level in all dimensions, namely triggered interest (F = 19.330, p < .001), maintained interest in feeling (F = 13.176, p < .001), maintained interest in value (F = 7.363, p < .01), and overall interest (F = 17.308, p < .001). While the control group showed a decreasing tendency in their situational interest in reading, the experimental group demonstrated an increasing tendency. In other words, the chatbot training activity had a significant positive influence on both the triggered and maintained dimensions of students’ reading interest.
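For readers who want to see how a between-group difference in change can be tested, the sketch below relies on a standard identity: in a design with two groups and two time points, the group × time interaction F equals the squared independent-samples t statistic computed on gain scores (post minus pre). This is an illustration under that assumption, not the authors' procedure; the function name and all scores are hypothetical.

```python
from statistics import mean, variance


def interaction_f_from_gains(pre_a, post_a, pre_b, post_b):
    """Group x time interaction F for a two-group pre/post design.

    Computed as the squared pooled-variance t statistic on gain scores.
    Returns (F, (df_effect, df_error)).
    """
    gains_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gains_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gains_a), len(gains_b)
    df_error = n_a + n_b - 2
    # Pooled sample variance of the gain scores across both groups.
    pooled_var = ((n_a - 1) * variance(gains_a)
                  + (n_b - 1) * variance(gains_b)) / df_error
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (mean(gains_a) - mean(gains_b)) / standard_error
    return t * t, (1, df_error)


# Hypothetical 5-point survey means (NOT study data).
exp_pre, exp_post = [3, 4, 3, 4], [4, 5, 5, 5]
ctl_pre, ctl_post = [4, 4, 3, 4], [4, 3, 3, 3]

f_value, df = interaction_f_from_gains(exp_pre, exp_post, ctl_pre, ctl_post)
print(round(f_value, 3), df)
```

A large F with a rising experimental group and a falling control group corresponds to the pattern reported in Table 3.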
The Overall Repeated Measures Results of the Students’ Flow Perceptions.
***p < .001; H-F: Huynh-Feldt.
The Correlation Between SI, Flow Perception and Questions
The Correlation Between the Students’ Situational Interest/Flow Perception and the Number of Questions Generated for Training the Chatbot (N = 47).
The Correlation Between Students’ Situational Interest/Flow Perception and the Question Types Generated in the Training Field (N = 47).
*p < .05 **p < .01.
Student Clusters Based on Traits of Learning Behavior and Surveys
Student Clusters Based on Traits of Learning Behavior, Surveys, and English Proficiency (N = 45).

Figure 4. Cluster 1: Low social connection readers.
Figure 5. Cluster 2: High interest active challengers.
Figure 6. Cluster 3: Low proficiency moderate trainers.
Figure 7. Cluster 4: Low interest active trainers.
The 18 students in the first cluster were low social connection readers (Figure 4). They demonstrated low social connection with the chatbots and their peers, asking the fewest training questions (M = 20), resulting in lower chatbot accuracy. In particular, these students did not actively participate in challenging peers’ chatbots since the number of battling questions was the lowest (M = 23). Thus, the cluster demonstrated low social connection with peers since the battling activity involved identifying peers’ chatbots and asking these chatbots questions. However, despite their low social engagement, these students still reported above-average situational interest (M = 3.88) and flow perception (M = 4.07). Hence, this group of students might exhibit self-motivation and interest in reading, but they had limited engagement with chatbots.
The second cluster was high interest active challengers (Figure 5); it comprised eight students who demonstrated the highest levels of English proficiency (M = 93.43), interest (M = 4.02), flow perception (M = 4.13), and active participation in battling questions (M = 84). This group of students was highly capable, engaged, and interested in the reading program with the chatbot training activities. They displayed particular enthusiasm for engaging in battles with other students’ chatbots, indicating a preference for competitive interactions rather than focusing solely on training their own chatbots.
Students in the third cluster were low proficiency moderate trainers (Figure 6). This cluster consisted of seven students who exhibited the lowest level of English proficiency (M = 64.74). Given their limited English proficiency, they likely encountered challenges in the reading program, and they reported lower levels of interest (M = 3.56) and flow (M = 3.93). Owing to their low English proficiency, they read only three books, one fewer than the other clusters. However, it is noteworthy that they maintained a certain level of engagement in training their chatbots and achieved a moderate accuracy rate (M = 0.26), equivalent to that of the high interest active challengers group.
The fourth cluster was low interest active trainers (Figure 7), comprising 11 students who demonstrated a commendable proficiency in English (M = 92.93) but low levels of interest (M = 3.58) and flow perception (M = 3.95). This group of students devoted their efforts to training their own chatbots and also attracted peers who battled their chatbots. The accuracy rate of their chatbots, therefore, appeared to be the highest among all clusters (M = 0.42).
Discussion
Incorporating a learning-by-teaching approach in the chatbot training activity proved effective during the 9-week reading program, as it increased students’ interest and flow perception in reading. The study yielded several significant findings. First, the presence of a tutee chatbot facilitated increased interest and flow among the students. Second, the quantity of certain types of questions was negatively correlated with students’ interest and flow. Thus, the quantity of questions asked should not be over-emphasized, as doing so might be counterproductive, considering that question generation is challenging for young readers. Lastly, four distinct clusters emerged (low social connection readers, high interest active challengers, low proficiency moderate trainers, and low interest active trainers), revealing the affordances and limitations of chatbot training in reading.
Engagement in the Chatbot Training Activity
During the 9-week reading program, students asked an average of 25 questions to train their own chatbots and 26 questions to battle other chatbots. The two modes of student-led question-answering activities received nearly equal participation, implying that asking questions of one’s own tutee and of competitors’ tutees both play a role in facilitating reading interest and behavioral engagement.
For the training portion, the results are partially comparable to the notion of a digital pet that children were expected to take care of by performing acts that would promote its growth (Pesce, 2000). The responsibility placed on the owners was found to be highly motivating for children. The same effect was also found in the chatbot training. The results support the idea that enabling students to train their own chatbots, rather than relying solely on commercial AI chatbots trained by experts, is a potential approach to enhancing learning. For the battling mechanism, students would challenge other chatbots to see how others performed. This could also be explained as a social networking behavior observed in the virtual world, which helps students establish relationships with their peers in real life (Yu et al., 2010). As the students could identify those who challenged their chatbots, they might then engage in reciprocal challenges, thus creating conducive peer learning (e.g., Parr & Townsend, 2002; Topping, 2005) and heightening learning motivation through social networking.
The accuracy rate of the chatbots in the battle finale was 32.30%, with a maximum of 68.20%. The chatbots relied heavily on the questions provided by their trainers to expand their knowledge base. The number of questions, students’ language skills, and the feedback students provided could all be contributing factors in determining the chatbots’ intelligence regarding the story content. Also, question generation requires extensive practice and proper support (Byun et al., 2014). Given that students provided only about 25 questions to train their chatbots and that students’ answers were not further verified, the limited training data could not yield a high accuracy rate when the chatbots answered peers’ open-ended questions. Therefore, it is suggested that the accuracy rate should not be used as an evaluation criterion for students’ learning performance.
Reading Interest and Flow Perception after the Chatbot Training Activity
Prior research indicates that chatbot-driven QA, in which chatbots ask questions and students answer them, may produce negative effects when students deviate from the scripted process (Kuhail et al., 2023). Conversely, the student-led QA in this study was observed to trigger positive effects. The situational interest in reading of the experimental group increased, while that of the control group decreased over time. Thus, the chatbot training activity positively impacted students’ reading interest in terms of both triggering and maintaining interest. These results are in line with the study by Zhao et al. (2012), in which a teachable agent was also adopted to boost learners’ interest in science lessons. The chatbot training activity appears to build a sustained relationship between the tutor and tutee, contributing to long-term learning interest. This mechanism yields a more extended positive affective state among students and therefore increases their interest in reading.
Increased flow perception was observed for the experimental group, in contrast to the control group. While flow perception with reading is often difficult to observe, the chatbot training activity effectively boosted reading flow through asking questions. This finding is consistent with a study conducted by Matsuda et al. (2020), which also noted that students demonstrated active engagement through various high-order behaviors, such as requests for help in selecting questions and tutoring. The chatbot training actions, including asking questions, selecting questions, testing chatbots, and providing answers, may contribute to improving flow perceptions of the reading activity. This finding also resonates with the study by Chase et al. (2009), indicating that the psychological projection of the protégé effect may occur when teaching a teachable agent. Students were more involved and engaged in the process of teaching the agent than in doing the same thing for themselves. When students are held accountable for their tutees’ learning, they tend to invest more effort in the process, thereby bolstering their reading engagement and flow. Additionally, our study found that the chatbot’s inaccurate responses may present an additional opportunity for students to elaborate on story content, thereby enhancing their engagement, as evidenced by the flow survey results of the experimental group. This approach draws on the self-explanation strategy, which may improve text comprehension (Bisra et al., 2018): prompting students for explanations activates their self-explanation. The positive impact aligns with extensive literature supporting the effectiveness of this strategy in enhancing knowledge understanding (Ainsworth & Th Loizou, 2003; Aleven & Koedinger, 2002; Chi et al., 1989).
Overall, the observed increase in student interest and flow perception with a tutee agent can be attributed to the influence of social agency theory, as explored by Schroeder et al. (2013). The meta-analysis indicated that engagement and learning are enhanced when learners perceive the agent as a social peer, leading to the alignment of their engagement and behaviors in a social interaction manner. This supports the notion that the presence of a tutee agent in the chatbot training activity likely facilitated increased interest and flow among the students in our study.
Relationships Between Training Engagement and Situational Interest and Flow Perception
The analysis revealed that certain question types demonstrated negative correlations with situational interest and/or flow perception. One possible explanation for this finding is that asking questions is not a skill that occurs naturally, as evidenced by research findings (Humphries & Ness, 2015), even among native speakers of English in grades 4 and 5. The majority of questions asked by students at these grade levels tend to be memory-based (Humphries & Ness, 2015). This observation may become even more significant when considering the context of our study, where students are EFL learners. Additionally, the suggested use of the seven elements of a story as question prompts may have limited the scope of question generation to the text itself, resembling a comprehension test. By focusing solely on the story’s content, students may have missed opportunities to ask more authentic questions that draw on real-life experiences or engage in argumentation, which could have led to more heuristic and deeper conversations (Humphries & Ness, 2015). Expanding the range of question types and encouraging students to explore personal connections could potentially foster a more enriching and intellectually stimulating reading experience.
Student Clusters Based on Traits of Behaviors and Perceptions
Four clusters of students were classified according to their interest, flow perceptions, learning behaviors, and English proficiency. It is notable that the low social connection readers asked the fewest questions of others’ chatbots. This suggests that socialization through battling other chatbots was not a priority for this group of students. However, their situational interest and flow perceptions remained above average. Considering the marginally lower English proficiency observed in this cluster, an educational implication is to initially focus on bolstering their reading skills to bridge gaps and prevent a widening disparity resulting from Matthew effects (Pfost et al., 2012; Stanovich, 1986). Matthew effects in reading refer to the tendency for initial differences in students’ reading abilities to increase over time without support. Therefore, it is imperative to first implement specialized guidance and instruction aimed at enhancing their reading abilities. Subsequently, the emphasis can shift towards encouraging a higher frequency of reading.
The high interest active challengers exhibited the highest proficiency in English and showed a particular enthusiasm for competitive battles with other students’ chatbots. Notably, the number of questions they asked in battles was 50% higher compared to the second-highest group. A pedagogical implication for this cluster of students is to sustain their interest by providing incremental challenges (Chase et al., 2009) on the battlefield, allowing their tutees to advance to higher levels.
The low proficiency moderate trainers faced greater challenges in the reading program compared to other clusters. Despite having the lowest levels of interest and flow perception, they did not disengage from the reading activities. Instead, this group of students actively participated in both training and battling. This demonstrates the potential of chatbots to assist low achievers in maintaining their engagement. This finding aligns with a study by Chase et al. (2009), which suggested that low achievers put more effort into improving their tutee agents’ performance. Additionally, according to role theory (Thomas & Biddle, 1966), the behaviors of individuals are partially framed by the roles they assume. In the case of low achievers, assuming the role of a tutor benefits them even more than studying alone (Allen & Feldman, 1973; Robinson et al., 2005). Although students with lower proficiency in our study may not have outperformed other students in terms of interest and flow perception after playing the role of tutors, the experience kept them motivated to stay engaged in reading.
The low interest active trainers with a commendable English proficiency trained their own chatbots the most. Their chatbots had the highest accuracy rate among all the chatbots. However, they perceived a lower level of flow and interest in reading. This student cluster indicates that excessively focusing on maintaining chatbot training and answering accuracy may also induce an extraneous cognitive load (Sweller, 2010) that distracts students from reading. Educators may consider either increasing text richness for this specific student cluster or redirecting their focus to higher interest levels once they achieve proficiency in the chatbot training activity. For instance, it is suggested that project-based learning and collaborative learning can contribute to maintaining situational interest (Hidi & Renninger, 2006).
Conclusion
Given the growing interest in incorporating chatbots such as ChatGPT into education to enhance the learning experience, there is still room for investigation into how to empower students to take on a more proactive role in their learning. This research confirmed that teachable Q&A chatbot training with AI techniques, which diverges from the conventional tutor chatbot approach, promoted students’ reading interest and flow in extensive reading activities. This research found that both the high interest active challengers and the low proficiency moderate trainers benefited the most from the chatbot training activity. However, the other two clusters, the low social connection readers and the low interest active trainers, revealed limitations of tutee chatbots in reading. Addressing these limitations may require providing additional reading skill support and improving the adaptability of the story text to cater to various reading preferences and abilities. These findings enrich our understanding of chatbot integration in reading, making a valuable contribution to educational technology and pedagogy.
This study did not include a verification mechanism to detect wrong answers. Such a mechanism may help increase the accuracy rate and hence improve students’ interest in reading, and future investigations could add one to enhance overall performance. The question types presented in our study served as prompts that helped EFL learners form basic questions and encouraged them to revisit the story books, facilitating initial engagement. Therefore, future research may explore how the question-type prompts can be extended from basic comprehension to questions that require authentic discussions or debates to foster inquisitive mindsets and heighten learners’ reading interest. The quality of the questions and answers could also represent a valuable area for further investigation. Additionally, collaborative training of chatbots could be another interesting approach that leads to better chatbot performance through collective efforts. Further investigation may explore the potential of advanced machine learning approaches, such as using the question-and-answer pairs generated by students as a fine-tuning data set for ChatGPT, to enhance chatbot performance. Gathering data on these issues is needed to unleash the potential of incorporating chatbot training into learning.
Footnotes
Authors’ Contributions
Chen-Chung Liu: Conceptualization; Funding acquisition; Investigation; Supervision; Project administration; Writing - review & editing. Wan-Jun Chen: Data curation; Formal analysis; Investigation; Methodology; Software. Fang-ying Lo: Conceptualization; Investigation; Methodology; Writing - original draft; Writing - review & editing. Chia-Hui Chang: Software; Resources; Validation. Hung-Ming Lin: Methodology; Validation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science and Technology Council, Taiwan (110-2511-H-008-006-MY3).
Ethical Statement
Author Biographies
), the goal of which is to develop socio-technical environments that enhance individual and collaborative learning.
). Her research interests are enabling technology for Web information reuse and integration, text mining, and story chatbots.
