Teachable Q&A Agent: The Effect of Chatbot Training by Students on Reading Interest and Engagement

Abstract

Reading requires appropriate strategies to spark initial interest and sustain engagement. One promising strategy is the pedagogical approach of learning-by-teaching, which transforms learners into active participants. Integrating this approach into digitalized and individualized reading contexts has the potential to foster the development of young readers. Currently, AI techniques are primarily used in chatbots as tutors, with limited focus on tutee chatbots that employ the learning-by-teaching pedagogy. Therefore, this study adopted a teachable Q&A agent and examined the effect of chatbot training, which employed AI techniques and student-generated questions and answers, with the aim of enhancing students’ reading interest and engagement. Ninety-five fifth graders participated in a 9-week reading program. A quasi-experimental design was adopted. The results showed that incorporating a learning-by-teaching approach into the chatbot training activity significantly enhanced students’ reading interest and engagement. However, the quantity of certain question types was negatively correlated with interest and engagement. This implies that asking diverse questions poses a certain level of challenge to young readers, which requires deliberate training and incubation. Additionally, the identification of four distinct student clusters revealed the affordances and limitations of tutee chatbots for reading.

Keywords

chatbot, learning by teaching, reading interest, flow, engagement

Reading has been widely discussed as an effective way of fostering the enjoyment of learning and implicit learning mechanisms (e.g., Day & Bamford, 1998). To fully embrace the value of reading, it is crucial to emphasize consistent long-term reading, as developing intrinsically motivated readers requires time and does not occur over one single experience (Guthrie et al., 2005). The literature suggests that the motivation to read is context-dependent, and intrinsic motivation is likely to increase when the learning environment supports it (Guthrie & Humenick, 2004). Therefore, how to arouse and sustain students’ interest and engagement in reading has become a critical issue in education (Cho & Krashen, 2019; Ng et al., 2019).

Considering constraints of class resources, artificial intelligence (AI) enables chatbots to have promising potential for capitalizing on individuals’ reading experience. The benefits of chatbots for educational purposes include enhancing cognitive gains, motivation, interest, engagement, and self-efficacy, and reducing anxiety associated with learning (Atkinson, 2002; Kim, 2013; Kim et al., 2017; Plant et al., 2009; Xu et al., 2021; Yin et al., 2021). In previous studies, educational chatbots have been demonstrated to foster positive human-chatbot interactions (Arguedas & Daradoumis, 2021; Chen et al., 2020; Kim, 2013; Michaelis & Mutlu, 2018), leading chatbots to have great potential for assisting with interactive learning across diverse domains in education. The same applies to reading. The results of Xu et al.’s (2021) study suggested that a chatbot as a reading partner is adequate for enabling learners to comprehend stories to an equal level as a human companion. In another study by Liu et al. (2022), an interactive reading chatbot was employed as a tutor to enhance students’ engagement and interest in reading. The results indicated that the chatbot effectively maintained students’ interest in reading through a sense of social connection with the chatbot. However, reading requires spontaneity for students to freely explore open-ended domains rather than following the chatbot’s instructions in answering questions. Chatbot-driven interactions were dominant in previous research, which may be detrimental to students’ initiative in reading. With this in mind, our research aimed to foster students’ interest in reading by adopting a learning-by-teaching approach to have students train chatbots by asking questions and evaluating chatbots’ answers. The fairytale question framework (Paris & Paris, 2003; Xu et al., 2022), which emphasizes essential narrative elements, was implemented to guide students in generating questions. 
Given that the participants were young readers in the process of improving their English reading skills, these story elements, such as character and setting, were selected as a foundation for facilitating question generation practice. Furthermore, posing questions during reading serves as a form of cognitive engagement (Lee et al., 2021; Unrau & Quirk, 2014). This practice of question generation helps in visualizing students’ cognitive involvement in the material.

The literature has confirmed that learning by teaching generates higher intrinsic motivation, enhanced conceptual learning, and active engagement than learning for examinations (e.g., Benware & Deci, 1984). In a similar vein, Pesce (2000) explored the concept of a digital pet, illustrating how children assigned the task of caring for a virtual creature through feeding and other nurturing activities experience an increase in motivation due to the responsibility placed upon them. Additionally, teaching through asking questions can be a promising contribution to the reading experience (Humphries & Ness, 2015). The practice of student question generation has been found to positively impact motivational enhancement, cognitive gains, and metacognitive strategies in reading comprehension (Choi et al., 2005; Yu & Wu, 2020). However, how AI-enabled chatbots can be designed to facilitate learning by teaching between chatbots and students is not clear in the literature.

Therefore, the objective of this research was to promote active engagement in reading through training chatbots. Students were required to coach a self-selected robot. The process involved learners generating questions to test the robot’s reading comprehension while also being required to teach the robot whenever it responded incorrectly. This training process is analogous to self-explaining, which triggers an increase in in-depth engagement in the learning content (Gerjets et al., 2006; Hsu & Tsai, 2013). The following research questions were proposed to gain a better understanding of how the chatbot training activity may affect students’ reading experience:

RQ 1. How did the students participate in chatbot training activity?

RQ 2. What is the impact of chatbot training activity on students’ reading interest and engagement?

RQ 3. What are the relationships, if any, between the students’ chatbot training activity and their interest and engagement in reading?

RQ 4. How can the students’ interest and engagement be characterized based on their participation in the chatbot training activity and their English proficiency?

Literature Review

The importance of individualized learning experiences in reading is increasingly recognized (Connor et al., 2013; Stover et al., 2017). Extensive reading, praised for its ability to cultivate positive attitudes, allows learners to explore a wide range of content. Yet, implementing extensive reading in classroom settings often faces the challenge of providing individualized learning experiences. Chatbots, as an innovative educational technology, offer a solution by enabling customized learning. Typically used as tutors or peers, chatbots have positively influenced students’ reading, but often in a passive context where students primarily respond to questions. This research proposes a shift towards proactive reading, with chatbots acting as tutees, encouraging students to actively engage in reading by asking questions. This shift not only empowers learners to control their learning journey, but also aligns with the “learning-by-teaching” concept, whereby students teach their digital counterparts. The following literature review discusses reading interest, chatbots in learning, and teachable agents to address the aim of this research to explore the effects of utilizing chatbots as tutees in reading, focusing on how this innovative role can deepen student engagement and enhance interest in extensive reading.

Reading Interest

Reading interest has been widely discussed in the literature. Among diverse forms of reading that may impact reading interest, extensive reading has the potential to promote positive reading attitudes due to its flexibility, allowing learners to explore unfamiliar territories (Yamashita, 2013). Unlike the school curriculum, which is often predefined to attain a set of skills and knowledge that may restrain learners from interest discovery (Fryer et al., 2019), one major purpose of extensive reading is to increase learners’ interest and thus lead to the development of reading skills (Day & Bamford, 1998). Positive affective states (e.g., interest, motivation, and flow) were activated via extensive reading (e.g., Al-Homoud & Schmitt, 2009; Cremin & Swann, 2016; Ng et al., 2019). A study conducted by Fujita and Noro (2009) revealed that regular 10-min extensive reading sessions could help enhance intrinsic and extrinsic motivation. Kirchhoff’s (2013) study also demonstrated that readers are more likely to perceive a flow-like experience when exposed to an extensive reading program.

While numerous studies have substantiated the positive influence of extensive reading, its adoption and implementation in education still have much room for growth (Cho & Krashen, 2019; Ng et al., 2019). Structured extensive reading programs might not yield the desired outcomes either (Milliner, 2017) because of the manner in which they are implemented; for example, a mandatory reading policy might be counterproductive, since reading turns into an obligation rather than a pleasure (Nakamura, 2018). An effective strategy to engage students in reading involves providing an individualized reading experience, possibly guided or accompanied by others. Supporting this, Bloom’s 2-sigma problem (Bloom, 1984) suggested that the one-to-one tutorial approach, in which one teacher guides a small number of students to learn, is promising in terms of having positive effects on learning. However, given the diverse paces and reading preferences of students, and the limited reading time available in schools (Kim, 2013; Tärning et al., 2019), it becomes challenging for teachers to offer such individualized attention. To overcome this, the integration of chatbots in learning environments presents a viable solution by providing personalized learning experiences. These intelligent systems enable student-centered learning at scale, emulating the personalized attention and adaptive feedback of one-to-one tutoring. This research aimed to understand how chatbots can be used to facilitate and maintain students’ reading interest as part of a prolonged reading program.

Chatbots in Learning

Chatbots in the context of reading are receiving growing interest, assuming various roles such as tutor, companion, mentor, facilitator, and tutee (e.g., Kim, 2013; Tegos & Demetriadis, 2017; Xu et al., 2021; Zhang et al., 2022). These chatbot designs address the need to improve reading comprehension, engagement, and affective states. One widely adopted role for chatbots is that of an instructor or tutor, particularly for young learners. In Xu et al.’s (2021) study, a conversational agent acted as a tutor, guiding students through story recitations and posing questions to aid their language development. The findings revealed the distinctive communication patterns that emerge when children interact with conversational agents as opposed to adults. However, both groups displayed a comparable level of accuracy in their responses. Similarly, in the study conducted by Liu et al. (2022), the chatbot encouraged students to narrate key aspects of the stories. The results indicated that students perceived a sense of social connection with the chatbot, effectively maintaining their interest in reading.

Chatbots were also developed as digital peers that read with students. In the study by Kim (2013), a digital peer was employed to enhance students’ reading comprehension by demonstrating effective questioning techniques and showing affective peer-like support. The results revealed the positive effects of the digital peer in enhancing comprehension and engagement. Similarly, in Zhang et al.’s (2022) study, StoryBuddy played the roles of a companion and mentor to help students stay engaged, and to provide question recommendations to parents to guide their children to read. Chatbots as facilitators to enhance collaborative learning (Dyke et al., 2013; Tegos & Demetriadis, 2017) are also emerging, but are found mostly in contexts such as scientific reasoning rather than pure reading for pleasure.

The aforementioned literature indicates that chatbots in the roles of tutor and peer yield positive effects on students’ reading experience. However, the interaction sequence and question generation were primarily driven by chatbots. As a result, children played a passive role by answering questions only, rather than being encouraged to take the initiative to ask questions. Therefore, chatbots that allow child-led question-answering in reading may be an area worth more attention (Zhang et al., 2022). Prior literature also revealed a divergent development of learners’ reading interest with chatbots (Fryer et al., 2017, 2019), where the limitation of students in orchestrating conversation flow was not fully considered. Therefore, this research intended to examine the effects of chatbots behaving as tutees in a reading context to enable learners to take greater ownership of their learning.

Enhancing Learning through Learning by Teaching

Teachable agents or digital tutees engage learners in a trainer-trainee relationship, where the teachable agent assumes the role of tutee (Biswas et al., 2005; Chase et al., 2009; Tärning et al., 2019). In this sense, teachable agents are pedagogical agents that permit students to teach computers rather than vice versa, embracing the learning-by-teaching concept and its potential benefits (Okita & Schwartz, 2013). Learning by teaching is considered an effective approach to in-depth learning as it involves students’ active engagement in externalizing their understanding and knowledge-building of the target learning content (Roscoe & Chi, 2007). By undertaking the task of explaining concepts to their peers, students can engage in a deeper cognitive process, thereby enhancing their own learning experiences (Torshizi & Bahraman, 2019). Further, the act of teaching others can be both rewarding and inspiring, catalyzing sustained interest and motivation, thereby facilitating ongoing learning (Benware & Deci, 1984; Roscoe & Chi, 2007).

Recognizing the advantages associated with learning by teaching, researchers examined the effect of teachable chatbots in diverse learning domains. In the study by Chase et al. (2009), the teachable chatbot allowed students to simulate the teaching of cause-and-effect relationships between science-related variables for 3 days via constructing a concept map. The study indicated that students devoted more effort to learning for their chatbots than for themselves. In the social and affective aspects, the students attributed more mental engagement and responsibilities to their chatbots. Similarly, teachable chatbots were applied to enhance algebra learning (Matsuda et al., 2020). Students were given opportunities to monitor the problem-solving process and to provide feedback on the correctness of the chatbots. The results showed students’ proficiency in solving equations increased after using teachable chatbots for 4 days. The above literature suggests that there remains room for further exploration of the teachable chatbot’s potential, because the experimental periods were short and the knowledge domains fairly closed (e.g., one fever mechanism passage or one/two-step algebra equations). How teachable chatbots may be applied to enhance reading interest during a prolonged reading program is not clear.

Extensive research has demonstrated that generating questions facilitates knowledge exploration and reading comprehension, and helps identify gaps in understanding (e.g., Humphries & Ness, 2015; Mishra & Iyer, 2015). Furthermore, learning through exploration around self-generated questions fosters curiosity and a sense of ownership among students (Alaimi et al., 2020), and therefore may be effective in sustaining long-term interest in reading. The practice of students generating questions or feedback has shown positive impacts on motivation, cognitive gains, and metacognitive strategies (Choi et al., 2005; Yeh & Lai, 2012; Yu & Wu, 2020). However, the skill of formulating questions is not commonly taught in schools (Kopparla et al., 2019), with students typically generating fewer than 0.2 questions per class hour across different cultures, and the questions themselves tend to lack sophistication (Graesser & Person, 1994). Hence, this research aimed to explore the effect of training a chatbot through proactive question generation, diverging from the conventional tutor chatbot focus in which chatbots assume the role of guiding and answering. Our question generation approach to reading aimed to encourage students to read and construct their own questions to train their chatbots, rather than receiving teaching from the chatbots. How students teach an AI-enabled chatbot and how the chatbot training activity may impact students’ interest and engagement in reading were analyzed in an effort to nurture learners’ prolonged interest and engagement in extensive reading. The findings may enrich our understanding of chatbot integration in reading, making a valuable contribution to the use of AI techniques in education.

Method

Participants

The participants in this research were 95 fifth graders from four classes in a public elementary school in northern Taiwan. Two classes consisting of 47 students were randomly assigned as the experimental group, which incorporated a chatbot during the reading period. The 48 students in the remaining two classes were the control group, who engaged in a regular reading activity. The reading sessions took place once a week, recurring consistently over 9 weeks. Each session lasted for 50 min. All participants were English as a Foreign Language (EFL) learners, with prior experience of participating in an English reading program during the previous semester. In order to prioritize the goal of fostering an increased interest in reading, no written assessments were administered to evaluate comprehension levels. Furthermore, both the participants and their parents provided informed consent for the students’ involvement in the study and the collection of relevant data.

Procedure

Participants from both the experimental and control groups completed a situational interest survey before the 9-week reading program. The survey served as the basis for understanding how students’ reading interest was influenced throughout the study. Both groups engaged in the same reading for the first 2 weeks, where the students engaged in silent reading while sitting alongside their peers. Throughout the study, a collection of English-graded readers focusing on life education was made available to all students in their respective classrooms. Students could select any of the books to read during the weekly 50-min reading sessions. At the end of each reading session, students from both groups were instructed to fill in a 5-min flow survey. The purpose of the flow survey was to keep track of the attentiveness level exhibited by the students during the reading activities. Therefore, two sets of flow survey results were obtained that represented students’ engagement in the traditional silent reading activities for the first two weeks.

During the following 7 weeks, the two groups engaged in different reading activities: the control group continued with the same reading activity, while the experimental group incorporated chatbot training and battling into their reading sessions. Each student from the experimental group was provided with a tablet, which they used to operate and interact with chatbots for training and battling sessions at any time during the 7-week reading period, except for the last week, which was designated as the battle finale. At the end of each reading session, students from both groups also filled in the flow survey, resulting in seven sets of flow survey results for each group. After the 9-week reading program, participants from both groups completed the situational interest survey again in order to observe how their reading interest evolved throughout the study, with and without the involvement of the chatbot training. Finally, students from both groups were invited to take part in individual interviews. Twenty-six students from the control and experimental groups participated in the interviews. See Figure 1 for the experiment procedure.

Figure 1.

Experiment procedure.

The Chatbot Training System

This study provided a chatbot training system through which students in the experimental group could select a robot image to cater to each individual’s preference. While reading the books, students could test the chatbot with questions about the book they were reading and provide answers if the chatbot did not answer correctly. The chatbot training system encompasses AI techniques and the feature of storing student questions and answers to facilitate the chatbot’s knowledge-building. A battling mechanism was also implemented to enhance students’ engagement. Students could challenge the chatbots trained by their peers through question-and-answer interactions (see Figure 2 for the training and battling interfaces). The knowledge-building process of the chatbots relies on two AI techniques (see Figure 3 for the design of the training system):

1. The SQuAD question-answer technique: The chatbot provided to students was equipped with the Stanford Question Answering Dataset (SQuAD) question-answer technique (Rajpurkar et al., 2016). The technique uses a large transformer-based language model fine-tuned on the SQuAD dataset (Rajpurkar et al., 2016), which consists of question-answer pairs sourced from Wikipedia articles. In this study, it served as a question-and-answer model for answering questions about a given story. With the model, the chatbot provided to students was expected to answer simple questions derived from the original story texts. As the trainers, students provided feedback to the chatbot to confirm whether the answers provided by the SQuAD question-answer technique were correct. The questions, SQuAD answers, and students’ feedback were recorded in the Q&A bank for further queries. The design affords a sense of intelligence for the chatbots, making students perceive the chatbot as being sufficiently intelligent to be trained. However, the SQuAD model’s capacity to respond is limited; it can only answer questions posed by students if the questions are strictly derived from direct sentences in the original text of the story.

2. Question search technique: When the SQuAD question answer technique could not answer students’ questions correctly, the students were asked to provide answers and explanations. The chatbot collected the trainers’ questions and answers as the Q&A bank representing the knowledge bank of the chatbot. The Elasticsearch technique (Gormley & Tong, 2015) was then utilized to detect similar questions asked before from the Q&A bank to answer the new question. The technique functions as an advanced search engine empowered with relevance processing capabilities, rather than solely matching keywords in questions. For instance, when students asked the question “What happened to Dandelion on the way?”, the technique would detect that there were several similar questions asked before, such as “What happened to the main character?” and “Who inspires little Dandelion in the middle of the story?”. The technique retrieves similar questions in the Q&A bank that have been used to train the chatbot. With this technique, the chatbots can then generate answers based on what has been taught.
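The retrieval of similar questions described in the second technique can be illustrated with a minimal sketch. This is not the Elasticsearch configuration used in the study: it substitutes a simple token-overlap (Jaccard) score for Elasticsearch relevance scoring, and the function names, the 0.1 threshold, the third bank question, and the placeholder answers are all assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercase a question and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity between two questions (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def find_similar(question, qa_bank, threshold=0.1):
    """Return (stored_question, answer, score) entries at or above the threshold, best first."""
    scored = [(q, a, jaccard(question, q)) for q, a in qa_bank]
    hits = [item for item in scored if item[2] >= threshold]
    return sorted(hits, key=lambda item: item[2], reverse=True)

# Hypothetical Q&A bank: the first two questions come from the example in the text,
# the third question and all answers are placeholders, not data from the study.
qa_bank = [
    ("What happened to the main character?", "placeholder answer A"),
    ("Who inspires little Dandelion in the middle of the story?", "placeholder answer B"),
    ("Where does the story take place?", "placeholder answer C"),
]

matches = find_similar("What happened to Dandelion on the way?", qa_bank)
for stored_question, answer, score in matches:
    print(f"{score:.2f}  {stored_question}")
```

A production search engine weighs rare terms more heavily and handles word variants, which plain token overlap does not; the sketch only shows the idea of ranking stored questions by similarity to a new one.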

Figure 2.

The training (left) and battling (right) interfaces of the chatbot training system (English equivalent translation).

Figure 3.

The design of the training system.

The training process involved the students in generating questions for their chatbots. Given that students may not possess advanced question-asking skills (Byun et al., 2014), they may benefit from scaffolding to effectively participate in the training activity. To help students ask questions, a list of seven story elements was displayed (see top-left of Figure 2), serving as a hint for generating questions. The seven elements were based on the fairytale question framework proposed and validated previously (Paris & Paris, 2003; Xu et al., 2022), addressing the fundamental narrative elements. The elements include character, setting, action, feeling, outcome resolution, causal relationship, and prediction. Sample questions were provided as examples to assist students in formulating their questions after selecting a question type they wished to inquire about.

While training the chatbot, students could test its knowledge by asking questions about the stories. The chatbot would first answer the questions based on the Q&A bank, and if it could not retrieve existing questions in the bank, it would then use the SQuAD question-answer technique to answer the question. Following each questioning and answering, students were prompted to assess the accuracy of the chatbot’s response. In cases where students marked the chatbot’s answers as incorrect or when the chatbot was unable to respond, they were prompted to provide additional answers and explanations. They justified their answers by identifying where the original story texts can answer the question to teach the chatbots. Furthermore, students could track their progress regarding the number of training questions on the system. A ranking of training questions was available to provide a comprehensive overview of everyone’s progress, which served as a mechanism for self-monitoring (see bottom-left of Figure 2). Five selected books were designated for the chatbot training activity. They were all part of the collection of English-graded readers focusing on life education from the previous two weeks. The students were free to choose any of these books during the activity. These books, designed as picture books, pair texts with corresponding images to aid comprehension. The average length of these books is 23 pages with an average word count of 491 words per book. Students had the flexibility to choose any book from this selection for each session, progressing at their own pace. The emphasis was not on the number of books completed but rather on consistent participation in the activity. To maintain engagement, especially among less proficient readers, the approach was relaxed and inclusive. These students were encouraged to stay involved by asking questions, either to train their own chatbots or to interact with others, without the pressure of meeting specific goals. 
The key was continuous interaction with the books.
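The order of operations in a training turn (Q&A bank first, the SQuAD-based model as fallback, then student feedback) can be sketched as follows. This is a simplified illustration under stated assumptions: the bank lookup uses exact matching on a normalized question rather than similarity search, `stub_reader` is a hypothetical stand-in for the extractive question-answering model, and all names and strings are invented for the example.

```python
def normalize(question):
    """Normalize a question so that trivial differences do not block a bank lookup."""
    return question.strip().lower()

def answer_question(question, qa_bank, reader):
    """Answer from the Q&A bank when possible; otherwise fall back to the reader model."""
    key = normalize(question)
    if key in qa_bank:
        return qa_bank[key], "bank"
    return reader(question), "reader"

def record_feedback(question, bot_answer, is_correct, qa_bank, student_answer=None):
    """Store the confirmed answer, or the trainer's correction when the chatbot was wrong."""
    key = normalize(question)
    if is_correct:
        qa_bank[key] = bot_answer
    elif student_answer is not None:
        qa_bank[key] = student_answer

def stub_reader(question):
    """Hypothetical stand-in for the QA model; always returns the same guess."""
    return "a guess extracted from the story text"

bank = {}
question = "Who is the main character?"

# First attempt: nothing is stored yet, so the reader model answers.
first_answer, first_source = answer_question(question, bank, stub_reader)

# The trainer marks the answer as incorrect and teaches the chatbot.
record_feedback(question, first_answer, False, bank,
                student_answer="answer taught by the student")

# Second attempt: the taught answer is now retrieved from the bank.
second_answer, second_source = answer_question(question, bank, stub_reader)
```

Keeping the bank lookup ahead of the model means that anything a student has taught takes priority over the model's own guess, which matches the trainer-tutee framing of the activity.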

Because of the limitations of the SQuAD-based language model applied in this study, the model could not effectively detect incorrect answers provided by the trainers. To promote student engagement and enhance the quality of questions and answers, the chatbot training activity incorporated challenger and rebuttal mechanisms. The challenger mechanism enabled peers to assess the chatbot trained by fellow students, motivating trainers to improve the quality of their questions and answers (see top-right of Figure 2). Students had the option to choose any chatbot to challenge with four questions each time. The Q&A pairs provided by trainers were utilized to respond to challenge questions, without utilizing the SQuAD technique. Furthermore, the rebuttal mechanism allowed trainers to raise objections to teachers when they believed that challengers had wrongly evaluated the accuracy of their chatbots’ answers. These two mechanisms collectively created a peer verification environment. The chatbot training activity lasted for 7 weeks, with the last week focusing on battling others’ chatbots. Information regarding the trainers and the corresponding accuracy rate of each chatbot was provided as a point of reference. A battling ranking was also made available for students to monitor how each chatbot performed (see bottom-right of Figure 2).

Data Collection

Flow survey. The mental engagement of the students during the reading was measured by a flow perception survey adopted from previous research by Liu et al. (2017). The flow perception survey assesses the psychological state of how involved and engaged students are during learning activities, and thus was applied in this research to probe students’ mental engagement during the 9-week reading program. The survey is a 5-point Likert scale composed of four questions with dimensions of attention, sense of control, curiosity, and intrinsic interest, ranging from 1 (strongly disagree) to 5 (strongly agree). Survey questions include “I stay entirely focused in the reading activity,” “It is fairly easy for me to engage in the reading activity,” “I am full of curiosity about the reading activity,” and “I think the reading activity is interesting.” The Cronbach’s reliability (alpha) values for the survey administered over the period of 9 weeks exhibited consistent results, with values ranging from .76 to .83 across the nine survey sets, showing the adequate reliability of the survey.
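For readers unfamiliar with the reliability statistic, Cronbach’s alpha for a $k$-item survey is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $s_i^2$ is the variance of item $i$ and $s_T^2$ is the variance of the summed scores. The sketch below computes it for a four-item survey; the response matrix is hypothetical and is not data from this study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of per-student lists, one score per item."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical answers from five students to four 5-point Likert items
# (attention, sense of control, curiosity, intrinsic interest).
responses = [
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```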

Situational interest. The situational interest (SI) survey adopted in this study was adapted from the measure developed by Linnenbrink-Garcia et al. (2010) and later revised by Liu et al. (2022) to probe students’ interest in reading. Two sets of situational interest surveys were collected. The survey consisted of 12 questions, divided into three dimensions: triggered situational interest (triggered-SI), feeling dimension of maintained situational interest (maintained-SI-feeling), and value dimension of maintained situational interest (maintained-SI-value). The triggered-SI dimension measured the extent to which the reading activity captured students’ attention. The maintained-SI-feeling dimension assessed the level of enjoyment associated with reading books. The maintained-SI-value dimension explored whether students perceived reading books as valuable. Each dimension had four questions, and all questions were rated on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). Example survey statements include “The English reading class of this semester is exciting,” representing triggered-SI, while “English story books are fascinating to me,” reflecting maintained-SI-feeling, and “Reading English story books is important to me,” representing maintained-SI-value. The survey exhibited adequate reliability, with a Cronbach’s alpha coefficient between .78 and .90 for the three dimensions of situational interest.

Data Analysis

To understand students’ overall behavioral engagement in reading combined with the chatbot training activity in the experimental group (RQ1), the questions generated by the students in the training and battling fields over 7 weeks were tallied and categorized based on the fairytale question framework, which includes seven narrative elements: character, setting, action, feeling, outcome resolution, causal relationship, and prediction (Paris & Paris, 2003; Xu et al., 2022). Two coders categorized the questions produced by the students to display the distribution of the question types. The students asked 3370 questions in total, of which 635 were discussed by the coders, with all disagreements resolved through discussion. The interrater reliability was assessed to measure the agreement between the two independent coders. Cohen’s kappa statistic was used as a measure of interrater reliability, and the kappa value was calculated to be κ = 0.785. According to Landis and Koch’s (1977) criteria, this kappa value falls into the substantial agreement category, indicating an acceptable level of agreement between the coders.
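As an illustration of the agreement statistic, Cohen’s kappa compares the observed agreement $p_o$ with the agreement expected by chance $p_e$: $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below computes it for two coders and maps the value onto the Landis and Koch (1977) bands; the eight coded questions are hypothetical and serve only to show the calculation.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two coders' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch_label(kappa):
    """Verbal label for a kappa value following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical codes assigned by two coders to the same eight questions.
coder_1 = ["character", "setting", "action", "feeling",
           "character", "action", "prediction", "character"]
coder_2 = ["character", "setting", "action", "action",
           "character", "feeling", "prediction", "character"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```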

To answer RQ2, surveys of situational interest were analyzed through ANOVA with repeated measures to observe the progress within respective groups and the variation between the experimental and control groups before and after the 9-week reading program. Furthermore, students’ flow perception over the 9 weeks was divided into three phases: the initial phase (Phase I: weeks 1 and 2), the middle phase (Phase II: weeks 3–5), and the last phase (Phase III: weeks 6–9). Students’ flow perceptions in each phase were averaged to represent the overall flow perception during each phase. ANOVA with repeated measures was also applied to determine whether there was a significant difference in the periodical flow perception over the three reading phases.
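The phase averaging described above can be expressed as a short Python sketch. The weekly scores are hypothetical, and the names PHASES and phase_means are introduced for illustration; only the week-to-phase mapping is taken from the text.

```python
from statistics import mean

# Week-to-phase mapping as described in the text.
PHASES = {"I": (1, 2), "II": (3, 4, 5), "III": (6, 7, 8, 9)}

def phase_means(weekly_flow):
    """weekly_flow maps week number (1-9) to one student's flow score."""
    return {phase: mean(weekly_flow[w] for w in weeks)
            for phase, weeks in PHASES.items()}

# Hypothetical weekly flow scores for a single student.
student = {1: 3.5, 2: 4.0, 3: 4.0, 4: 4.5, 5: 3.5,
           6: 4.0, 7: 4.5, 8: 4.0, 9: 4.5}
print(phase_means(student))
```

Averaging within phases yields three repeated measurements per student, which is the form required by a repeated-measures ANOVA over the three reading phases.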

Additionally, the associations between the generated questions and situational interest and flow perception were further investigated to respond to RQ3 using a Pearson correlation analysis. Finally, since linear relations may not be able to depict the complex feature of students’ engagement in the reading activity, to answer RQ4, a cluster analysis was conducted to explore the student clusters based on traits of learning behavior, survey results, and prior English proficiency levels. This research utilized K-means clustering to conduct cluster analysis.

Results

Student Participation in the Chatbot Training

Key metrics were collected to provide an overarching perspective of reading in conjunction with the chatbot training activity, including the number of questions taught by each student, the number of questions each student asked to challenge peers’ chatbots, and the accuracy rate of the chatbots’ responses. Questions asked by the 47 students on the training and battling fields for 7 weeks were tallied (see Table 1) and categorized (see Table 2). The students asked 3370 questions in total: 1237 questions in the training field (weeks 3–8), 1198 questions in the battling field (weeks 3–8), and 935 questions in the battle finale (week 9). From week 3 to week 8, students were engaged in training their own chatbots (M = 26.32; SD = 15.91) as well as challenging others’ chatbots (M = 25.49; SD = 17.08). On average, each student asked four to five training questions and four to five battling questions per week. During training, the chatbots demonstrated an average accuracy rate of 30.9%, with a maximum of 81.8%. In answering peers’ challenge questions during weeks 3–8, the students’ chatbots demonstrated an average accuracy rate of 27.4%, with a maximum of 54.8%. Following 6 weeks of training, the chatbots demonstrated an average accuracy rate of 32.3% in the battle finale (week 9), with a maximum of 68.2%.

Table 1.

The Total Numbers of Training and Battling Questions Students Asked the Chatbots.

	N	Average/per person	SD
Number of training questions (Weeks 3–8)	47	26.32	15.91
Number of battling questions (Weeks 3–8)	47	25.49	17.08
Battle finale (Week 9)	47	19.89	13.48
Accuracy rate of the chatbot’s answers in training (Weeks 3–8)	47	30.9%	0.16
Accuracy rate of the chatbot’s answers in battling (Weeks 3–8)	47	27.4%	0.13
Accuracy rate of the chatbot’s answers in the battle finale (Week 9)	45	32.3%	0.18

Table 2.

The Distribution of Questions Marked by Coders (N = 47).

Question category	Character	Setting	Action	Feeling	Outcome resolution	Causal relationship	Prediction	Non-related	Total
Training questions	246	132	198	90	168	132	9	262	1237
%	19.89%	10.67%	16.01%	7.28%	13.58%	10.67%	0.73%	21.18%	100%
Battling questions	532	244	374	134	373	263	48	165	2133
%	24.94%	11.44%	17.53%	6.28%	17.49%	12.33%	2.25%	7.74%	100%

During the interviews with students, discussions centered on their reading interests and motivations. The control group students shared that their drive to read was fueled by the goals of expanding their vocabulary and sentence structures, engaging in interactions with foreigners, and succeeding in examinations. For instance, students in the control group indicated that “I want to know more English words and sentences” and “because there’s a possibility of studying abroad or needing to communicate with foreigners, it’s very important.” When asked what reasons would make them want to read, they answered, “for exams, I guess.” Students from the experimental group reported that their motivation to train their robots stemmed from a desire to maintain their robots’ competitiveness, secure higher rankings, use their robots as companions during reading sessions, and enjoy the fun they associated with these activities. For example, the students in the experimental group indicated, “It’s just to beat other people’s robots, to have a higher ranking. Because playing with robots is not like before when I was reading alone, feeling bored; having a robot feels like having a friend by my side to accompany me.” They also stated, “when someone tests my robot with harder questions, I will go back and train on those questions,” and “I thought training a robot shouldn’t be too difficult, so I wanted to give it a try, and then I found it was quite fun. So, I went to the battling field.” Therefore, chatbot training played an important role in inciting their motivation to read.

Table 2 presents the distribution of questions marked by the two coders. A total of 262 questions in the training field were categorized as “non-related,” indicating that 21.18% of the training questions were irrelevant to the stories. However, the percentage of non-related questions was considerably lower (7.74%) in the battling field, suggesting that students took a more serious attitude toward battling others’ chatbots. Notably, the two most frequently asked question types in both the training and battling fields were character and action. There were 246 character-related questions (19.89%) and 198 action-related questions (16.01%) generated in the training field, and 532 character-related questions (24.94%) and 374 action-related questions (17.53%) asked in the battling field. Feeling-related questions were generated much less often (7.28% and 6.28% for the training and battling fields, respectively), implying that students paid less attention to the characters’ emotions than to their actions. Among the more advanced question types that require comprehending the whole story, namely “outcome resolution,” “causal relationship,” and “prediction,” students most often generated questions about “outcome resolution” (13.58% and 17.49% for the training and battling fields, respectively). Similarly, students displayed proficiency in asking “causal relationship” questions, which represented 10.67% and 12.33% of the questions in the training and battling fields, respectively. Conversely, the “prediction” type of question was the least asked, with only nine questions (0.73%) in the training field and 48 questions (2.25%) in the battling field.
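The percentages in Table 2 follow directly from the coded counts. As a quick check, the short Python sketch below recomputes each category’s share from the counts reported in the table; the variable and function names are illustrative only.

```python
CATEGORIES = ["Character", "Setting", "Action", "Feeling",
              "Outcome resolution", "Causal relationship",
              "Prediction", "Non-related"]

# Counts copied from Table 2.
training = [246, 132, 198, 90, 168, 132, 9, 262]
battling = [532, 244, 374, 134, 373, 263, 48, 165]

def shares(counts):
    """Percentage of the row total for each category, to two decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]

for name, counts in (("Training", training), ("Battling", battling)):
    print(name, sum(counts), dict(zip(CATEGORIES, shares(counts))))
```

The battling row combines the 1198 battling-field questions from weeks 3–8 with the 935 battle-finale questions, which gives the row total of 2133.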

The Results of the Situational Interest and Flow Perceptions Surveys

The purpose of the situational interest survey was to observe how the students’ interest in reading evolved after the 9-week reading program for both groups. A 2 × 2 between-within mixed design factorial ANOVA was conducted to compare the interest changes of the experimental group students with those in the control group. As shown in Table 3, the skewness ranged from −.534 to .411, and the kurtosis ranged from −1.104 to .263; because these values fall between −2.0 and +2.0, the data can be considered normally distributed (Gravetter & Wallnau, 2014). The sphericity assumption is also met in this design because the repeated-measures variable has only two levels (Field, 2013).

Table 3.

The Repeated Measures Analysis of the Situational Interest Survey.

Dimension	Pre- & posttest	Group	N	M	SD	Skewness	Kurtosis	F	p
Triggered interest	Pretest	Control	48	3.57	0.87	.083	−.950	19.330***	.000
	Pretest	Experimental	47	3.59	0.81	−.139	−.312
	Posttest	Control	48	2.98	0.92	−.146	−.713
	Posttest	Experimental	47	3.92	0.83	−.431	−.538
Maintained-feeling	Pretest	Control	48	3.64	0.84	.173	−.827	13.176***	.000
	Pretest	Experimental	47	3.54	0.80	.411	−.601
	Posttest	Control	48	3.09	0.84	−.098	−.471
	Posttest	Experimental	47	3.66	0.81	.256	−1.104
Maintained-value	Pretest	Control	48	3.81	0.76	−.033	−.691	7.363**	.008
	Pretest	Experimental	47	3.78	0.89	−.534	−.179
	Posttest	Control	48	3.40	0.85	−.128	.263
	Posttest	Experimental	47	3.86	0.87	−.357	−.639
Overall SI	Pretest	Control	48	3.67	0.76	.094	−.849	17.308***	.000
	Pretest	Experimental	47	3.63	0.75	−.051	−.698
	Posttest	Control	48	3.15	0.78	−.192	−.116
	Posttest	Experimental	47	3.81	0.75	−.118	−.693

**p < .01 ***p < .001.

As indicated in Table 3, the repeated-measures ANOVA showed that the changes differed significantly between the experimental and control groups in all dimensions, namely triggered interest (F = 19.330, p < .001), maintained interest in feeling (F = 13.176, p < .001), maintained interest in value (F = 7.363, p = .008), and overall interest (F = 17.308, p < .001). While the control group showed a decreasing tendency in their situational interest in reading, the experimental group demonstrated an increasing tendency in their situational interest. In other words, the chatbot training activity had a significant positive influence on the students’ reading interest, in both triggered and maintained interest.
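In a 2 × 2 between-within design, the group × time interaction F is algebraically equal to the square of the independent-samples t statistic (pooled variance) computed on each student’s posttest minus pretest gain. The Python sketch below illustrates this equivalence. The scores are hypothetical, the function name interaction_f is introduced here for illustration, and the sketch reproduces neither the statistical software used in the study nor the values in Table 3.

```python
from math import sqrt
from statistics import mean, variance

def interaction_f(pre_a, post_a, pre_b, post_b):
    """Group x time interaction F for a 2 x 2 mixed design via gain scores."""
    gain_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gain_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gain_a), len(gain_b)
    # Pooled variance of the gain scores across the two groups.
    pooled = ((n_a - 1) * variance(gain_a)
              + (n_b - 1) * variance(gain_b)) / (n_a + n_b - 2)
    t = (mean(gain_a) - mean(gain_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    # F has 1 and (n_a + n_b - 2) degrees of freedom.
    return t * t, (1, n_a + n_b - 2)

# Hypothetical pretest and posttest interest scores for four students per group.
control_pre, control_post = [4, 3, 4, 3], [3, 3, 3, 2]
exp_pre, exp_post = [3, 4, 3, 4], [4, 4, 4, 5]
f_value, df = interaction_f(control_pre, control_post, exp_pre, exp_post)
print(f"F{df} = {f_value:.3f}")
```

A large interaction F indicates that the two groups changed differently from pretest to posttest, which is the pattern described above: one group’s interest rose while the other’s fell.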

The overall weekly flow perceptions of the two groups were analyzed using repeated-measures ANOVA (see Table 4). Since Mauchly’s test of sphericity was significant (p < .05), the Huynh-Feldt correction was applied. The results showed that the trend of the flow perception of the students in the experimental group was significantly different from that of the control group (F = 21.485, p < .001). To examine the trend within each group, a separate repeated-measures ANOVA was conducted for each group. For the experimental group, the results showed that the flow perceptions in phases II and III were significantly higher than in phase I (F = 14.882, p < .001), indicating an increasing trend in flow over time. In comparison, the control group demonstrated a decreasing trend, with phases II and III significantly lower than phase I (F = 8.860, p < .001). The results suggest that the chatbot training activity helped maintain and promote students’ flow perception in the reading activities.

Table 4.

The Overall Repeated Measures Results of the Students’ Flow Perceptions.

Timing	Group	N	M	SD	Skewness	Kurtosis	F	p
I: Weeks 1-2 (average)	Control	48	3.46	0.69	−0.047	−0.703	21.485*** (H-F)	.000
I: Weeks 1-2 (average)	Experimental	47	3.77	0.60	0.357	−1.051
II: Weeks 3-5 (average)	Control	48	3.20	0.76	−0.330	−0.091
II: Weeks 3-5 (average)	Experimental	47	4.01	0.60	−0.059	−1.010
III: Weeks 6-9 (average)	Control	48	3.28	0.79	−0.387	0.626
III: Weeks 6-9 (average)	Experimental	47	4.11	0.56	−0.196	−1.049

***p < .001; H-F: Huynh-Feldt.

The Correlation Between SI, Flow Perception and Questions

Pearson correlation analysis was employed to investigate the potential relationship between the students’ situational interest, flow perceptions of the reading activity, and the number of questions asked. As depicted in Table 5, the results indicated no significant correlations between the number of questions and either SI or flow perception.

Table 5.

The Correlation Between the Students’ Situational Interest/Flow Perception and the Number of Questions Generated for Training the Chatbot (N = 47).

	No. of training questions	No. of battling questions	The accuracy rate of the battle finale
Overall situational interest	−.091	.050	.157
Overall flow	.017	.156	−.136

However, when looking into the types of questions generated for training the chatbots, significant correlations were detected between the students’ situational interest/flow perception and the number of questions (see Table 6). The findings revealed negative correlations between students’ situational interest and flow perceptions and the number of certain question types, such as outcome resolution and causal relationship.

Table 6.

The Correlation Between Students’ Situational Interest/Flow Perception and the Question Types Generated in the Training Field (N = 47).

	Character	Setting	Action	Feeling	Outcome resolution	Causal relationship	Prediction
Overall SI	−.227	−.021	.009	−.084	−.373**	−.318*	−.053
Overall flow	−.038	.020	.102	−.136	−.317*	−.180	.103

*p < .05 **p < .01.
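As a worked illustration of the correlation analysis, the Python sketch below computes Pearson’s r on hypothetical data and converts a correlation into the t statistic on which its significance test is based, t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The function names and the sample data are assumptions for illustration; only r = −.373 and N = 47 are taken from Table 6.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t value (df = n - 2) for testing whether a correlation differs from 0."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical data: outcome-resolution questions asked and overall SI.
outcome_questions = [2, 5, 1, 8, 3, 6]
overall_si = [4.2, 3.6, 4.5, 3.1, 4.0, 3.4]
r = pearson_r(outcome_questions, overall_si)
print(f"r = {r:.3f}")

# Reported correlation between outcome resolution and overall SI (N = 47).
print(f"t(45) = {t_statistic(-0.373, 47):.3f}")
```

The resulting t value can be compared against a t table with 45 degrees of freedom to recover the significance level flagged in Table 6.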

Student Clusters Based on Traits of Learning Behavior and Surveys

Student clusters were identified in the experimental group based on the traits of learning behavior, survey results, and prior English proficiency. The data of 45 students were included in the cluster analysis after excluding students with missing data. The elbow method, implemented in Python, indicated that a five-cluster solution substantially reduced the within-cluster sum of squared errors, representing high cluster adequacy characterized by minimal within-cluster variance and clear differentiation between clusters. There were five distinct student clusters, comprising 18, 8, 7, 11, and 1 student(s), respectively. To ensure meaningful analysis, the subsequent results focus on the four main clusters (see Table 7 and Figures 4–7), as one of the clusters consisted of only a single student.
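The text states that K-means and the elbow method were run in Python but does not name a library or its settings. The following is therefore only a minimal, dependency-free sketch of the general procedure: features are standardized, K-means is run for several values of k, and the within-cluster sum of squared errors (SSE) is inspected for the point where further clusters stop paying off. The two features, the nine data points, and the deterministic initialisation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each feature (column) to z-scores so no feature dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=100):
    # Deterministic farthest-point initialisation (an assumption of this
    # sketch; library implementations typically use randomised k-means++).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append([mean(col) for col in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    sse = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse

# Hypothetical students: [battling questions asked, posttest SI].
students = [[20, 3.9], [23, 3.8], [25, 4.0],
            [80, 4.1], [84, 4.0], [88, 4.2],
            [50, 3.4], [55, 3.6], [48, 3.5]]
z = standardize(students)
sse_by_k = {k: kmeans(z, k)[1] for k in range(1, 6)}
labels3, _ = kmeans(z, 3)
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
print(labels3)
```

In an elbow plot, SSE is drawn against k and the chosen k is the point after which the curve flattens. The standardized mean values reported for each cluster are z-scores of the kind produced by standardize.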

Table 7.

Student Clusters Based on Traits of Learning Behavior, Surveys, and English Proficiency (N = 45).

	Cluster	1 Low social connection readers	2 High interest active challengers	3 Low proficiency moderate trainers	4 Low interest active trainers
English proficiency	Average	84.25	93.43	64.74	92.93
English proficiency	Standardized mean value	−0.05	0.64	−1.51	0.60
Situational interest (Posttest)	Average	3.88	4.02	3.56	3.58
Situational interest (Posttest)	Standardized mean value	0.49	0.65	0.10	0.12
Flow perception (Average)	Average	4.07	4.13	3.93	3.95
Flow perception (Average)	Standardized mean value	0.00	0.12	−0.25	−0.22
Total training questions	Average	20	21	22	38
Total training questions	Standardized mean value	−0.38	−0.32	−0.25	0.76
Total battling questions	Average	23	84	48	55
Total battling questions	Standardized mean value	−0.95	1.41	0.05	0.32
Questions received (battle finale)	Average	13	31	15	39
Questions received (battle finale)	Standardized mean value	−0.58	0.61	−0.39	1.07
Accuracy rate (battle finale)	Average	0.16	0.26	0.26	0.42
Accuracy rate (battle finale)	Standardized mean value	−0.60	0.02	0.01	0.98
No. of books trained	Average	4	4	3	4
No. of books trained	Standardized mean value	0.10	0.33	−0.50	0.11
No. of students		18	8	7	11

Figure 4.

Cluster 1: Low social connection readers.

Figure 5.

Cluster 2: High interest active challengers.

Figure 6.

Cluster 3: Low proficiency moderate trainers.

Figure 7.

Cluster 4: Low interest active trainers.

The 18 students in the first cluster were low social connection readers (Figure 4). They demonstrated low social connection with the chatbots and their peers, asking the fewest training questions (M = 20), which resulted in lower chatbot accuracy. In particular, these students did not actively participate in challenging peers’ chatbots, as their number of battling questions was the lowest (M = 23). Because the battling activity involved identifying peers’ chatbots and asking these chatbots questions, this cluster showed low social connection with peers. However, despite their low social engagement, these students still reported above-average situational interest (M = 3.88) and average flow perception (M = 4.07). Hence, this group of students might exhibit self-motivation and interest in reading, but they had limited engagement with chatbots.

The second cluster was high interest active challengers (Figure 5); it comprised eight students who demonstrated the highest levels of English proficiency (M = 93.43), interest (M = 4.02), flow perception (M = 4.13), and active participation in battling questions (M = 84). This group of students was highly capable, engaged, and interested in the reading program with the chatbot training activities. They displayed particular enthusiasm for engaging in battles with other students’ chatbots, indicating a preference for competitive interactions rather than focusing solely on training their own chatbots.

Students in the third cluster were low proficiency moderate trainers (Figure 6). This cluster consisted of seven students who exhibited the lowest levels of English proficiency (M = 64.74). Given their limited English proficiency, it is anticipated that they may have encountered challenges in the reading program and perceived lower levels of interest (M = 3.56) and flow (M = 3.93). Due to their low English proficiency, they focused on reading three books, one book fewer than other clusters. However, it is noteworthy that they maintained a certain level of engagement in training their chatbots and achieved a moderate accuracy rate (M = 0.26) equivalent to that of the high interest active challengers group.

The fourth cluster was low interest active trainers (Figure 7); it comprised 11 students who demonstrated commendable proficiency in English (M = 92.93) but low levels of interest (M = 3.58) and flow perception (M = 3.95). This group of students devoted their efforts to training their own chatbots and also received attention from peers who battled their chatbots. The accuracy rate of their chatbots, therefore, was the highest among all clusters (M = 0.42).

Discussion

Incorporating a learning-by-teaching approach in the chatbot training activity proved effective during the 9-week reading program, as it successfully increased students’ interest and flow perception in reading. The study yielded several significant findings. First, the presence of a tutee chatbot facilitated increased interest and flow among the students. Second, the quantity of certain types of questions demonstrated a negative correlation with students' interest and flow. Thus, the quantity of questions asked should not be over-emphasized, as it might result in a counter-effect, considering question generation is challenging for young readers. Lastly, four distinct clusters, low social connection readers, high interest active challengers, low proficiency moderate trainers, and low interest active trainers, emerged, presenting the affordances and limitations of chatbot training in reading.

Engagement in the Chatbot Training Activity

From week 3 to week 8, students asked an average of 26 questions to train their own chatbots and 25 questions to battle other chatbots (see Table 1). The two modes of student-led question-answering activities received nearly equal participation, implying that asking questions of their own tutees and of competitors’ tutees both play a role in facilitating reading interest and behavioral engagement.

For the training portion, the results are partially comparable to the notion of a digital pet that children were expected to take care of by performing acts that would promote its growth (Pesce, 2000). The responsibility placed on the owners was found to be highly motivating for children. The same effect was also found in the chatbot training. The results support the idea that enabling students to train their own chatbots, rather than relying solely on commercial AI chatbots trained by experts, is a potential approach to enhancing learning. For the battling mechanism, students would challenge other chatbots to see how others performed. This could also be explained as a social networking behavior observed in the virtual world, which helps students establish relationships with their peers in real life (Yu et al., 2010). As the students could identify those who challenged their chatbots, they might then engage in reciprocal challenges, thus creating conducive peer learning (e.g., Parr & Townsend, 2002; Topping, 2005) and heightening learning motivation through social networking.

The accuracy rate of the chatbots in the battle finale was 32.30%, with a maximum of 68.20%. The chatbots relied heavily on the questions provided by their trainers to expand their knowledge base. The number of questions, students’ language skills, and the feedback students provided could all be contributing factors in determining the chatbots’ intelligence regarding the story content. Also, question generation requires extensive practice and proper support (Byun et al., 2014). Given that students provided only about 26 questions to train their chatbots and that students’ answers were not further verified, the limited data set could not support a high accuracy rate when answering peers’ open-ended questions. Therefore, it is suggested that the accuracy rate should not be used as an evaluation criterion for students’ learning performance.

Reading Interest and Flow Perception after the Chatbot Training Activity

Prior work indicates that chatbot-driven QA, in which the chatbots ask questions and students answer them, may produce negative effects when students deviate from the scripted process (Kuhail et al., 2023). Conversely, student-led QA in this study was observed to trigger positive effects. The situational interest in reading of the experimental group increased, while that of the control group decreased over time. Thus, the chatbot training activity positively impacted students’ reading interest in both triggering and maintaining interest. The results of our study are in line with the study by Zhao et al. (2012), where a teachable agent was also adopted to boost learners’ interest in science lessons. The chatbot training activity builds a sustained relationship between the tutor and tutee, contributing to long-term learning interest. The mechanism of chatbot training yields a more extended positive affective state in the students and therefore increases their interest in reading.

Increased flow perception was observed for the experimental group, in contrast to the control group. While flow perception with reading is often difficult to observe, the chatbot training activity effectively boosted reading flow through asking questions. This finding is consistent with a study conducted by Matsuda et al. (2020), which also noted that students demonstrated active engagement through various high-order behaviors, such as requests for help in selecting questions and tutoring. The chatbot training actions, including asking questions, selecting questions, testing chatbots, and providing answers, may contribute to improving flow perceptions of the reading activity. This finding also resonates with the study by Chase et al. (2009), indicating that the psychological projection of the protégé effect may occur when teaching a teachable agent. Students were more involved and engaged in the process of teaching the agent as compared to doing the same thing for themselves. When students are held accountable for their tutees’ learning, they tend to spend more effort in the process, thereby bolstering their reading engagement and flow. Additionally, our study found that the chatbot’s inaccurate responses may present an additional opportunity for students to elaborate on story content, thereby enhancing their engagement, as evidenced by the flow survey results of the experimental group. This prompting for student explanations activates the self-explanation strategy, which may potentially improve text comprehension (Bisra et al., 2018). The positive impact aligns with extensive literature supporting the effectiveness of this strategy in enhancing knowledge understanding (Ainsworth & Loizou, 2003; Aleven & Koedinger, 2002; Chi et al., 1989).

Overall, the observed increase in student interest and flow perception with a tutee agent can be attributed to the influence of social agency theory, as explored by Schroeder et al. (2013). The meta-analysis indicated that engagement and learning are enhanced when learners perceive the agent as a social peer, leading to the alignment of their engagement and behaviors in a social interaction manner. This supports the notion that the presence of a tutee agent in the chatbot training activity likely facilitated increased interest and flow among the students in our study.

Relationships Between Training Engagement and Situational Interest and Flow Perception

The analysis revealed that certain question types demonstrated negative correlations with situational interest and/or flow perception. One possible explanation for this finding is that asking questions is not a skill that occurs naturally, as evidenced by research findings (Humphries & Ness, 2015), even among native speakers of English in grades 4 and 5. The majority of questions asked by students at these grade levels tend to be memory-based (Humphries & Ness, 2015). This observation may become even more significant when considering the context of our study, where students are EFL learners. Additionally, the suggested use of the seven elements of a story as question prompts may have limited the scope of question generation to the text itself, resembling a comprehension test. By focusing solely on the story’s content, students may have missed opportunities to ask more authentic questions that draw on real-life experiences or engage in argumentation, which could have led to more heuristic and deeper conversations (Humphries & Ness, 2015). Expanding the range of question types and encouraging students to explore personal connections could potentially foster a more enriching and intellectually stimulating reading experience.

Student Clusters Based on Traits of Behaviors and Perceptions

Four clusters of students were classified according to their interest, flow perceptions, learning behaviors, and English proficiency. It is notable that the low social connection readers asked the fewest questions of others’ chatbots. This suggests that socialization through battling other chatbots was not a priority for this group of students. However, their situational interest remained above average and their flow perception was at the average level. Considering the marginally lower English proficiency observed in this cluster, an educational implication is to initially focus on bolstering their reading skills to bridge gaps and prevent a widening disparity resulting from the Matthew effects (Pfost et al., 2012; Stanovich, 1986). The Matthew effects in reading indicate that initial differences in students’ reading abilities tend to increase over time without support. Therefore, it is imperative to first implement specialized guidance and instruction aimed at enhancing their reading abilities. Subsequently, the emphasis can shift towards encouraging a higher frequency of reading.

The high interest active challengers exhibited the highest proficiency in English and showed a particular enthusiasm for competitive battles with other students’ chatbots. Notably, the number of questions they asked in battles was about 50% higher than that of the second-highest group (M = 84 vs. M = 55). A pedagogical implication for this cluster of students is to sustain their interest by providing incremental challenges (Chase et al., 2009) on the battlefield, allowing their tutees to advance to higher levels.

The low proficiency moderate trainers faced greater challenges in the reading program compared to other clusters. Despite having the lowest levels of interest and flow perception, they did not show disengagement from the reading activities. Instead, this group of students actively participated in both training and battling. This demonstrates the potential of chatbots to assist low achievers in maintaining their engagement. This finding aligns with a study by Chase et al. (2009), which suggested that low-achievers put more effort into improving their tutee agents’ performance. Additionally, according to the role theory (Thomas & Biddle, 1966), the behaviors of individuals are partially framed by the roles they assume. In the case of low achievers, assuming the role of a tutor benefits them even more compared to studying alone (Allen & Feldman, 1973; Robinson et al., 2005). Although students with lower proficiency in our study may not have outperformed other students in terms of interest and flow perception after playing the role of tutors, the experience kept them motivated to stay engaged in reading.

The low interest active trainers with a commendable English proficiency trained their own chatbots the most. Their chatbots had the highest accuracy rate among all the chatbots. However, they perceived a lower level of flow and interest in reading. This student cluster indicates that excessively focusing on maintaining chatbot training and answering accuracy may also induce an extraneous cognitive load (Sweller, 2010) that distracts students from reading. Educators may consider either increasing text richness for this specific student cluster or redirecting their focus to higher interest levels once they achieve proficiency in the chatbot training activity. For instance, it is suggested that project-based learning and collaborative learning can contribute to maintaining situational interest (Hidi & Renninger, 2006).

Conclusion

Given the growing interest in incorporating chatbots such as ChatGPT into education to enhance the learning experience, there is still room for investigation into how to empower students to take on a more proactive role in their learning. This research confirmed that teachable Q&A chatbot training with AI techniques, which departs from the conventional tutor chatbot approach, promoted students’ reading interest and flow in extensive reading activities. This research found that both the high interest active challengers and the low proficiency moderate trainers benefited the most from the chatbot training activity. However, the other two clusters, the low social connection readers and the low interest active trainers, revealed limitations of tutee chatbots in reading. Addressing these limitations may require providing additional reading skill support and improving the adaptability of the story text to cater to various reading preferences and abilities. These findings enrich our understanding of chatbot integration in reading, making a valuable contribution to educational technology and pedagogy.

This study did not include a verification mechanism to detect wrong answers. Such a mechanism may help increase the accuracy rate and hence improve students’ interest in reading, and future investigation could add one to enhance overall performance. The question types presented in our study served as question prompters for EFL learners to form basic questions and subsequently enhance the chance for students to revisit the story books to facilitate initial engagement. Therefore, future research may explore how the prompt of question types can be extended from basic comprehension to questions that require authentic discussions or debates to foster inquisitive mindsets and heighten learners’ reading interest. The quality of the questions and answers could also represent a valuable area for further investigation. Additionally, collaborative training on chatbots could be another interesting approach that leads to better chatbot performance through collective efforts. Further investigation may explore the potential of advanced machine learning approaches, such as using the question-and-answer pairs generated by students as the fine-tuning data set of ChatGPT, to enhance chatbot performance. Gathering data on these issues is needed to unleash the potential of incorporating chatbot training into learning.

Footnotes

Authors’ Contributions

Chen-Chung Liu: Conceptualization; Funding acquisition; Investigation; Supervision; Project administration; Writing - review & editing. Wan-Jun Chen: Data curation; Formal analysis; Investigation; Methodology; Software. Fang-ying Lo: Conceptualization; Investigation; Methodology; Writing - original draft; Writing - review & editing. Chia-Hui Chang: Software; Resources; Validation. Hung-Ming Lin: Methodology; Validation.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science and Technology Council, Taiwan (110-2511-H-008-006-MY3).

Author Biographies

Chen-Chung Liu is currently a Professor at the Department of Computer Science and Information Engineering, National Central University (NCU). He leads the iLearn lab, the goal of which is to develop socio-technical environments that enhance individual and collaborative learning.

Wan-Jun Chen was a graduate student with expertise in programming languages (C, C++, Python, and Java) at the Department of Computer Science and Information Engineering, National Central University (NCU).

Fang-ying Lo is currently an Assistant Professor at the Center for General Education at Asia University, Taiwan. Her research interests lie in the areas of technology-enhanced language learning, content and language integrated learning, aesthetic experience, and EFL collaborative learning.

Chia-Hui Chang is currently a Professor at the Department of Computer Science and Information Engineering, National Central University (NCU). She leads the Web Intelligence and Data Mining Laboratory. Her research interests are enabling technology for Web information reuse and integration, text mining, and story chatbots.

Hung-Ming Lin received his Ph.D. in management, specializing in consumer behavior, from National Central University, Taiwan. His areas of research include consumer psychology for sensory stimulus, cognitive psychology, and multivariate analysis (i.e., Structural Equation Modeling, Hierarchical Linear Modeling).
