Abstract
Educational robots represent a unique form of teacher presence. Exploring how the communication features of robot instructors affect students’ learning experience could contribute to the advancement of educational robots. This study examined the impact of speech rate, voice type, and emotional tone of robots on students’ cognitive load, attitudes toward robot-assisted teaching, and learning performance. We recruited 477 Chinese primary school students, each assigned to the speech rate, voice type, or emotional tone experiment. The results indicate that speech rate significantly influenced students’ cognitive load, with the medium speed condition resulting in higher germane load than both the fast and slow speed conditions. Moreover, students had a lower preference for adult male voices than for adult female, boy, or girl voices. However, voice type did not significantly impact attitudes toward robot-assisted teaching or learning outcomes. Emotional tone did not affect students’ cognitive load, attitudes, or learning performance. These findings provide valuable insights for instructors and designers when configuring the communication features of educational robots in classroom environments. Additionally, students generally prioritized the intelligence of the robot over its communication features, and they did not perceive the teaching content as difficult in any of the experiments. This study has methodological and practical significance.
The integration of educational robots represents a significant advancement in the field of Artificial Intelligence in Education (AIED) (Alimisis, 2013; Anwar et al., 2019; Benitti, 2012; Han et al., 2005). Educational robots serve as valuable tools that assist teachers in delivering instruction (Tzagkaraki et al., 2021). For instance, these robots can provide real-time feedback, offer explanations, and facilitate interactive activities, which significantly augment the teacher’s role in the classroom (Kuhail et al., 2023; Reich-Stiebert & Eyssel, 2015). Moreover, through hands-on interaction with robots, students can gain a deeper understanding of complex subjects as they actively explore concepts, think critically about the answers of robots, and respond accordingly. Additionally, the presence of robots in the learning environment sparks curiosity and motivation among students. The interactive and dynamic nature of robots captivates their attention and sustains their interest in the learning process (Han et al., 2005). Nevertheless, several issues require further investigation, such as the effectiveness of educational robots in providing guidance for learning across disciplines (Serholt, 2018) and their potential to enhance learning for individuals with diverse backgrounds and characteristics. Moreover, educational robots could suffer from some drawbacks, including cost, technical glitches, limited adaptability, and lack of emotional intelligence. Current research in educational robotics has primarily focused on enhancing technical aspects such as comprehension accuracy and feedback mechanisms (Alimisis, 2013; Anwar et al., 2019). There remains a critical gap in understanding the design features of educational robots, particularly the impact of communication features on learners (Causo et al., 2016).
Language and direct communication are considered the most effective means of instruction (Laland, 2017). In regular classrooms, factors such as the speed of speech (Simonds et al., 2006), vocal timbre, and emotional tone (D’Mello et al., 2022) can have an impact on student learning. Robots represent a unique form of teacher presence, and exploring how the communication features of robot instructors affect student learning could contribute to the advancement of educational robots (Kim et al., 2022). As such, this study aims to examine the effects of the communication features of educational robotics, including speech rate, voice type, and emotional tone, on students’ cognitive load, attitudes toward robot-assisted teaching, and learning outcomes. It is crucial to examine cognitive load in order to assess whether the communication features of educational robots are optimized to facilitate effective learning. If the communication features of educational robots are not appropriately designed, they may confuse or overwhelm students, resulting in increased cognitive load and consequently negative outcomes. Moreover, understanding students’ attitudes toward robot-assisted teaching is essential to assess their acceptance and willingness to engage with educational robots as learning tools. We are also interested in students’ learning outcomes since they provide direct evidence for the effectiveness of educational robots. This research will contribute to a deep and comprehensive understanding of how communication features can be utilized to facilitate teaching and learning experiences in educational robot settings.
Literature Review
Educational robots are “seen by many as offering major new benefits in education at all levels” (Benitti, 2012, p. 978; Johnson, 2003). Educators have been actively generating ideas and developing activities to integrate robots into the teaching of various subjects, including math, science, engineering, and social science (Benitti, 2012; Cheng et al., 2018). Compared to teacher-led instruction, robots can enhance students’ attention and learning interest (Han et al., 2005). Regarding the design of educational robots, researchers generally agree that enhancing the human-like attributes of robots can significantly increase learners’ emotional engagement and attentiveness during interactions, which in turn leads to better instructional outcomes (Alimisis, 2013; Kim et al., 2022; Li et al., 2023). For instance, Li et al. (2023) found that a robot with a highly human-like voice elicited more pleasure and perceived likability, fewer negative attitudes, and higher levels of arousal. A human-like voice comprises several key components, including speech rate, voice type, and emotional tone. In the following sections, we review the research on the speech rate, voice type, and emotional tone of robots to understand how these specific communication features affect user experiences and outcomes.
Speech Rate and Educational Robots
Speech rate is typically defined as the speed at which an individual speaks, measured by the number of words per unit of time (Sturm & Seery, 2007). Existing research has demonstrated that as speech rate increases, overall speech intelligibility tends to decrease (Du et al., 2014). Studies have consistently found that in daily communication, individuals with moderate to fast speech rates are perceived as more intelligent, trustworthy, and socially appealing (Robinson et al., 1999). Simonds et al. (2006) found that teachers’ slow speech rates reduce their credibility and hinder students’ effective learning. To enable effective and efficient communication between humans and machines, Shimada and Kanda (2012) investigated the appropriate speech rate for robots when providing location information to users. Unlike human-to-human interaction, they found that participants preferred robots with normal or moderately slow speech rates, perceiving them as capable. Additionally, different environments have varying demands on robot speech rate, with slower rates being preferred in high cognitive load environments. However, there is limited research on the speech rate of robots in K-12 settings (Williams et al., 2007). As educational robots are being integrated into classrooms and offline teaching contexts, it is crucial to investigate the optimal speech rate that promotes efficient interaction between robots and students. This question requires careful attention and further research.
Voice Type and Educational Robots
Voice type refers to the categorization of a person’s voice based on physical and vocal characteristics, such as the range, pitch, and quality of a person’s vocal production. Voice type is a perceptual attribute that is multidimensional and involves various sensory modalities. That is, voice type is not quantifiable in terms of physical measurements or units but rather relies on subjective judgments and descriptions. In the context of educational robots, voice type could refer to the classification of the artificial voice used by the robot, such as male, female, child, or synthesized voice. Nevertheless, according to McGinn and Torre (2019), “relatively little consideration is given to how the voice of the robot should sound, which may have adverse effects on acceptance and clarity of communication” (p. 211). Research on the voice type of robots in the K-12 curriculum is even scarcer. As an example, Dou et al. (2021) investigated users’ affective evaluation of four types of robot voices, including male, female, child, and synthesized voices, in different application domains, i.e., shopping reception, home companionship, and education. They concluded that the most acceptable voice types for educational robots were adult female and male voices. In the study of Sandygulova and O’Hare (2015), four child voices were synthesized, including two female voices named Rosie (English UK) and Ella (English US), as well as two male voices called Harry (English UK) and Josh (English US). They found that children in the Republic of Ireland exhibited a significant preference for the voices of Harry and Rosie with an English UK accent compared to Josh and Ella with an English US accent. However, Kühne et al. (2020) contended that participants demonstrated a preference for voices that closely resembled human voices as opposed to synthesized voices.
In general, the current literature provides insights into the design of robot voices, but it does not specifically address the appropriate voice types for K-12 students, especially in the Chinese context.
Emotional Tone and Educational Robots
Teaching is essentially a form of emotional labor (Isenbarger & Zembylas, 2006). Learners tend to learn better from teachers who display positive emotions compared to those displaying negative emotions (Horovitz & Mayer, 2021; Lawson et al., 2021; Lawson & Mayer, 2022). For instance, Lawson et al. (2021) designed virtual animated teachers with positive and negative emotions. Their study revealed that students were able to recognize the emotional tone conveyed by the teacher, and having a positive teacher had a beneficial effect on students’ engagement with the course. However, there were no statistically significant differences in student performance. Beege and Schneider (2023) also found that animated teaching agents (video robots) with positive emotions were more effective in stimulating students’ enthusiasm for learning, although students’ scores were not affected. As another example, Kory Westlund et al. (2017) designed two types of robots, namely expressive and flat robots, for the purpose of dialogic reading with preschoolers. They investigated the impact on children’s vocabulary comprehension when the robot read stories containing target words using either expressive or flat voices. They found that children benefited more from the expressive robot than from the flat robot. Despite the existence of these studies, to the best of our knowledge, no research has yet investigated the influence of emotional tone on students’ learning within our specific research context, namely Chinese literacy learning, which will be further elucidated in the subsequent sections.
The Current Study
In sum, the current literature lacks empirical studies that specifically investigate the communication features of educational robots. Furthermore, the acceptance of educational robots among students varies across different domains. For instance, Reich-Stiebert and Eyssel (2015) found that the respondents expressed willingness to utilize education robots in areas associated with science, technology, engineering, and mathematics (STEM), while they were reluctant to adopt them in arts and social sciences. Additionally, Reich-Stiebert and Eyssel (2015) found that German respondents held neutral attitudes towards educational robots. Students from other cultural backgrounds may hold different attitudes. Thus, this study aims to examine the effects of the communication features of educational robotics, including speech rate, voice type, and emotional tone, on students’ cognitive load, attitudes toward robot-assisted teaching, and learning performance in the Chinese context. More specifically, this study addresses the following research questions: (1) How does the varying speech rate of educational robots impact students’ cognitive load, attitudes towards robot-assisted teaching, and learning performance? (2) What are the effects of different voice types of educational robots on students’ cognitive load, attitudes toward robot-assisted teaching, and learning performance? (3) Does educational robots’ emotional tone influence students’ cognitive load, attitudes toward robot-assisted teaching, and learning performance?
The selection of robot communication features in this study is based on previous research and the categorization of everyday speech features. Specifically, the speech rate of robot teaching is defined as slow (194 words per minute), medium (234 words per minute), and fast (250 words per minute), following the Chinese educational standard of 200–250 words per minute (Cai et al., 2004) and the division of speech rates by Song and Feng (1999). The medium and fast rates fall within the standard range, the slow rate lies just below it, and the three rates remain distinguishable from one another. Regarding voice type, previous research has not determined the preferred robot voice type for a specific subject (Kühne et al., 2020; Sandygulova & O’Hare, 2015). The voice types used in this study are sourced from iFlytek Open Platform, an AI platform that provides human-like, natural, and high-fidelity voices (Du et al., 2016). In particular, this study explores four synthesized voice types of educational robots: adult male, adult female, boy, and girl voices. Lastly, the emotional tone of the robot in this study was classified into three categories: positive, negative, and neutral, as these categories represent a broad spectrum of emotional expressions commonly encountered in human-robot interactions.
For the first research question, we hypothesize that students who interact with educational robots that have a medium speech rate will experience lower cognitive load compared to those interacting with robots using slow speech rates. Conversely, students exposed to educational robots with fast speech rates are expected to exhibit higher cognitive load. We also expect that students will exhibit the most positive attitudes towards robot-assisted teaching when interacting with educational robots featuring a medium speech rate. Furthermore, we anticipate that their learning performance will be significantly better compared to students interacting with robots using slow or fast speech rates.
Regarding our second research question, we hypothesize that there will be no significant differences in cognitive load and learning performance across the four voice-type conditions of educational robots. However, we expect that students exposed to boy or girl voices will exhibit more positive attitudes towards robot-assisted teaching.
In terms of the third research question, we anticipate that students who interact with educational robots conveying a positive emotional tone will have lower cognitive load compared to those exposed to a neutral or negative emotional tone. Furthermore, we expect that students exposed to educational robots with a negative emotional tone will show less favorable attitudes toward robot-assisted teaching compared to those exposed to a positive or neutral emotional tone. In line with the literature, we hypothesize no significant differences in learning performance between students interacting with educational robots in a positive tone and those in a negative tone.
Methods
Participants
The participants were 477 primary school students in the third to fifth grades in Guangzhou, China. We obtained approval from the school board before recruiting the participants. Before taking part, all participants received comprehensive information about the research objectives, procedures, potential risks, and benefits. Moreover, they were encouraged to ask questions and clarify any doubts before giving their consent to participate. We also ensured that participants were informed of their right to withdraw from the study at any stage without any adverse consequences.
Distribution of the Classes in Experiments.
The Educational Robot
We designed and developed an educational robot equipped with intelligent question-answering abilities, the capability to respond to ancient poetry, and the functionality to engage in word games. It consists of a 3D-printed shell and a dialogue system (Figure 1).
Figure 1. The Educational Robot.
In the dialogue system, speech recognition and speech synthesis are implemented by calling the API of iFlytek, a well-known intelligent speech and AI company in the Asia-Pacific region. Speech recognition, also referred to as speech-to-text, encompasses several stages, including audio input, pre-processing, feature processing, the recognition process, and post-processing. Specifically, pre-processing optimizes the raw audio information. For instance, Voice Activity Detection (VAD) identifies the audio segments that contain sound and removes the silent parts. Following pre-processing, acoustic feature parameters are extracted from the audio to obtain useful information for recognition. The extracted feature parameters are matched with acoustic models to determine specific phonemes (consonants and vowels in Chinese). Finally, through language model comparison, Chinese characters are generated as textual materials for natural language processing.
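The silence-removal step of pre-processing can be illustrated with a minimal sketch. The code below is not the iFlytek implementation, whose internals are not described here. It assumes a simple energy-based VAD, and the function names, frame length, and threshold are hypothetical values chosen only for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy.
    Trailing samples that do not fill a whole frame are ignored."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def remove_silence(samples, frame_len=160, threshold=0.02):
    """Keep only the frames whose RMS energy reaches the threshold.
    Both frame_len and threshold are assumed values, not taken from the study."""
    kept = []
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept

# Toy signal: one silent frame, one frame of a 440 Hz tone at 16 kHz, one silent frame.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
voiced = remove_silence(silence + tone + silence)
```

In this toy case only the tone frame survives. Production VAD systems typically rely on more robust features than raw energy.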
It is noteworthy that we adopted Rasa, an open-source machine learning framework, for building conversational robots (Bocklisch et al., 2017). Rasa covers almost all functionalities of a dialogue system and is currently a mainstream framework for conversational robots. The framework includes Rasa NLU (Natural Language Understanding) for extracting user intents and key contextual information, Rasa Core for selecting optimal replies and actions based on the dialogue history, and Rasa channels and actions for connecting the robot with users and backend services such as instant messaging.
In the dialogue system, natural language processing involves building a database of ancient poetry and adding intents and dialogue rules within the open-source dialogue framework of Rasa. It mainly consists of two major parts: natural language understanding and natural language generation. Since student utterances in the experiments were not complex, we chose a rule-based intent classification method to analyze student utterances and identify intents and entities. This method requires manually creating rule templates and category information, mapping different keywords to different categories, and then performing semantic parsing and inference based on the matched rule templates to determine the user’s intent. For example, if a student asks, “Which dynasty did Su Shi belong to?” the NLU can recognize the intent as “searching for a dynasty” and the entity as “Su Shi.” With this information, we can quickly perform queries in the knowledge base. The knowledge base stores all knowledge of ancient poetry from compulsory education textbooks in China, as well as rule templates for teacher-student interactions in class. Once the corresponding answer is retrieved from the knowledge base, it is transformed into a corresponding reply text using Natural Language Generation (NLG). Finally, the iFlytek TTS (Text to Speech) engine is used to convert the text into speech output.
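The pipeline described above (keyword rules, entity matching, knowledge base lookup, and template-based generation) can be sketched in a few lines. This is a plain-Python illustration and does not use the Rasa API. The rule table, entity list, knowledge base entry, and reply template are hypothetical stand-ins for the study's actual resources.

```python
# Hypothetical rule templates: each intent is triggered by one or more keywords.
INTENT_RULES = {
    "search_dynasty": ["dynasty"],
}

# Hypothetical entity list and miniature knowledge base keyed by (intent, entity).
KNOWN_ENTITIES = ["Su Shi"]
KNOWLEDGE_BASE = {
    ("search_dynasty", "Su Shi"): "Song dynasty",
}

# Hypothetical natural language generation templates, one per intent.
REPLY_TEMPLATES = {
    "search_dynasty": "{entity} belonged to the {answer}.",
}

def parse(utterance):
    """Return (intent, entity) by keyword matching; either may be None."""
    text = utterance.lower()
    intent = next(
        (name for name, keywords in INTENT_RULES.items()
         if any(keyword in text for keyword in keywords)),
        None,
    )
    entity = next((e for e in KNOWN_ENTITIES if e.lower() in text), None)
    return intent, entity

def reply(utterance):
    """Look up the answer for the parsed intent and entity and fill the template."""
    intent, entity = parse(utterance)
    answer = KNOWLEDGE_BASE.get((intent, entity))
    if answer is None:
        return "Sorry, I do not know that yet."
    return REPLY_TEMPLATES[intent].format(entity=entity, answer=answer)

result = reply("Which dynasty did Su Shi belong to?")
```

In the robot itself, the reply text would then be passed to the TTS engine. The sketch stops at the text stage.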
As mentioned earlier, the speech output of the educational robot can be adjusted to different speech rates. For this study, we used three speech rates: slow, medium, and fast, corresponding to 194, 234, and 250 words per minute, respectively (Cai et al., 2004; Song & Feng, 1999). Additionally, the robot’s speech was synthesized using four distinct voice types: adult male, adult female, boy, and girl voices. These voice types were obtained from the iFlytek Open Platform (Du et al., 2016). Moreover, we tailored the emotional tone of the robots through word choice. To convey positive emotions, we integrated uplifting, encouraging, and supportive phrases, such as “Excellent work” and “Well done”. For neutral emotions, we used words that maintained a balanced delivery of information, such as “Good”. In the negative emotion condition, the robot refrained from providing positive feedback even when students answered questions correctly.
Instruments
The instruments utilized in this study comprised a classroom quiz, voice feature questionnaire, and cognitive load scale.
Classroom Quiz for Assessing Student Learning Performance
For the experiments exploring speech rate and voice type, we chose the ancient poem “Pale-dark Plum” as the instructional material. The selected poem used in our study is sourced from the standard Chinese language textbook for students in the targeted age group. The researchers, in collaboration with the instructor, carefully assessed the complexity and content of the poem to ensure its appropriateness for our study participants. In the learning process, participants were tasked with recognizing unfamiliar Chinese characters in the poem, understanding their meanings, grasping the overall message of the poem, comprehending the poet’s intended emotions, and familiarizing themselves with the poet’s background and the historical context of the poem’s creation. This multifaceted learning task ensured that students encountered different levels of cognitive load. Moreover, during the learning process, students could engage in conversations with the educational robot, discussing any content related to the poem and the poet. This approach contributed to a rich and diverse cognitive learning experience for the participants.
Furthermore, the researchers and the instructor designed 16 post-test questions, including six phonics questions, six multiple-choice questions, and four fill-in-the-blank questions, to assess students’ performance in language literacy and comprehension with educational robots of varying speech rates. Given that the voice type experiment involved more advanced students who were in the fourth grade compared to those in the speech rate experiment, we designed a total of 15 questions to reflect their levels of understanding of the learning content. These questions encompassed six phonics questions, five multiple-choice questions, one fill-in-the-blank question, and three open-ended questions. For the emotional tone experiment, the teaching content aims to provide an understanding of the life of the poet Su Shi. The teaching content was intentionally selected to align seamlessly with the curriculum for fifth-grade students. Similarly, the researchers and instructors co-designed ten questions to assess students’ learning performance in literacy. The questions consisted of five multiple-choice questions, two fill-in-the-blank questions, two complete-the-verse questions, and one open-ended question. Lastly, we created standardized grading rubrics for those tests to ensure fairness and consistency in the assessment process. It is worth mentioning that we took several steps to ensure the content validity of our tests. First, the tests had a particular focus on evaluating students’ comprehension of the poem. Each test question was carefully crafted with a defined goal, ensuring alignment with the knowledge being tested. Moreover, we collaborated closely with the content expert, in this case, the instructor, to guarantee the relevance and representativeness of the test items. The combination of clear objectives, meticulous question design, and collaboration with a content expert strengthened the content validity of the reading comprehension tests.
The Voice Feature Questionnaire
To explore students’ feelings and preferences toward the robot’s speech features during the teaching process, a questionnaire was designed to collect data. The questionnaire consists of both scale-based questions and open-ended questions. The scale-based questions include three dimensions: (1) Students’ perceived difficulty of teaching content, (2) Students’ preference for speech features, and (3) Students’ engagement in robot-assisted teaching. The open-ended questions are used to determine students’ sensitivity to speech features, their ability to distinguish between different speech features used by the robot, and which attribute of the robot they believe has the greatest impact during the teaching process. Since the three experiments investigate different speech features, the corresponding questionnaires are not entirely identical.
The Cognitive Load Questionnaire
In this study, we designed a cognitive load questionnaire based on the NASA Task Load Index (NASA-TLX) scale. The questionnaire included three categories: intrinsic cognitive load, extraneous cognitive load, and germane cognitive load. Participants were asked to self-assess their cognitive load using a five-point Likert scale. Sample items for intrinsic, extraneous, and germane cognitive load were “I was able to comprehend all of the material in robot-assisted teaching”, “The tone of the robot teacher negatively affects my learning”, and “The robot teacher makes me actively engaged in the lesson”, respectively.
Experiments
Settings in the Speech Rate Experiment.
The experimental procedures for the three experiments were identical. Upon entering the designated classroom, the participants engaged in a tutorial session lasting approximately 15 minutes, during which we explained the robot’s functions and reminded the participants of the classroom rules. During the experiment, the educational robot provided instructions on the teaching content and prompted the students to take notes on key points. Additionally, the robot engaged with the students by posing questions and adjusting its responses based on the accuracy of their answers (see Figure 2). After the teaching session, students watched a short video introducing the voice features to help them understand the complex concepts of “intonation”, “speech rate”, and “emotional tone”. Immediately afterward, students took a classroom quiz and filled out the voice feature questionnaire and the cognitive load scale. Throughout the process, the researchers were readily available to address any questions or concerns raised by the students regarding the questionnaires.
Figure 2. An Illustration of the Experimental Setting.
Data Cleaning and Preprocessing
Following the exclusion of classroom quizzes and questionnaires with evident logical errors or uniform responses across all questions, the speech rate experiment produced 126 valid classroom quizzes, 122 valid cognitive load scales, and 126 valid voice feature questionnaires. Similarly, the voice type experiment resulted in 156 valid classroom quizzes, 157 valid cognitive load scales, and 157 valid voice feature questionnaires. Lastly, the emotional tone experiment obtained 113 valid classroom quizzes, 115 valid cognitive load scales, and 111 valid voice feature questionnaires.
Preprocessing of the voice feature questionnaire was conducted through item analysis and exploratory factor analysis. In the item analysis, the scores of each participant for the scale-based questions were summed, and the total scores were then ranked in descending order. Participants in the top 27% of total scores formed the high-score group (coded as 1), and those in the bottom 27% formed the low-score group (coded as 2) (Kelley, 1939). Independent sample t-tests were performed to determine the differences between the high- and low-score groups for each question. Questions that showed no significant differences between the high- and low-score groups, except for the teaching difficulty items, were removed. The results of the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test showed that all sample data were suitable for factor analysis (see Appendix A). Based on the factor loading matrix and theoretical assumptions, three factors were evident and named Perceived Difficulty, Preference, and Engagement and Continuation. Sample items for the three constructs were “How challenging do you find the content covered in this lesson?”, “Do you like the voice type of the educational robot?”, and “Would you prefer the robot teacher to continue instructing you?”, respectively. Regarding reliability, the Cronbach’s alpha coefficients for the questionnaire and their respective dimensions were all above .7, indicating good internal consistency (see Appendix B).
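The extreme-groups item analysis can be sketched as follows. The response matrix is invented for illustration and is not study data. The sketch assumes a pooled-variance (Student) t statistic, since the text does not state which variant was used, and it stops at the t statistic. A p-value would additionally require the t distribution, for example from a statistics package.

```python
import math
import statistics

def split_extreme_groups(totals, fraction=0.27):
    """Return participant indices of the top and bottom `fraction` by total score.
    Ties at the cut-off are broken by original order, an assumption of this sketch."""
    n_group = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:n_group], order[-n_group:]

def pooled_t(high, low):
    """Student's t statistic for two independent samples (assumes non-zero pooled variance)."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * statistics.variance(high)
                  + (n2 - 1) * statistics.variance(low)) / (n1 + n2 - 2)
    return (statistics.mean(high) - statistics.mean(low)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))

def item_t(responses, item, high_idx, low_idx):
    """t statistic comparing high- and low-score groups on a single item."""
    return pooled_t([responses[i][item] for i in high_idx],
                    [responses[i][item] for i in low_idx])

# Hypothetical responses: 10 participants x 3 Likert items.
responses = [
    [5, 4, 3], [5, 5, 3], [4, 5, 2], [4, 4, 3], [3, 3, 3],
    [3, 3, 2], [2, 3, 3], [2, 2, 2], [1, 2, 3], [1, 1, 2],
]
totals = [sum(row) for row in responses]
high, low = split_extreme_groups(totals)
t_values = [item_t(responses, j, high, low) for j in range(3)]
```

In this toy data the first item separates the two groups clearly, while the third barely does. The third item would therefore be the candidate for removal.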
To ensure the quality of the cognitive load scale, we conducted item analysis and exploratory factor analysis. Items that showed no significant differences between high and low scoring groups in the item analysis were removed, as well as items that did not meet the factor loading requirements in the exploratory factor analysis. The Cronbach’s alpha coefficients for the scale used in the three experimental conditions were all above .7, indicating high internal consistency, and the Cronbach’s alpha coefficients for each dimension were above .65, demonstrating good reliability (see Appendix C).
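For reference, Cronbach’s alpha for a scale with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below computes it with the standard library only. The response matrix is hypothetical and unrelated to the values reported in Appendix C.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a participants x items matrix of scores,
    using sample variances (n - 1 denominator)."""
    k = len(responses[0])
    item_variances = [statistics.variance([row[j] for row in responses])
                      for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 participants x 3 items on a five-point scale.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(responses)
```

For this toy matrix the summed item variances are 3.6 and the total-score variance is 9.2, giving an alpha of about .91.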
Results
Speech Rate Experiment
Descriptive Analysis of Relevant Variables in the Speech Rate Experiment.
Dunn’s Test for Comparing Differences in Germane Cognitive Load.
Note. *p < .05, **p < .01. “Adjusted p” refers to the significance value adjusted by the Bonferroni correction.
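The Bonferroni adjustment mentioned in the note multiplies each raw pairwise p value by the number of comparisons and caps the result at 1. A minimal sketch follows. The raw p values below are invented purely for illustration and are not the results of this study.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

# Hypothetical raw p values for the three pairwise comparisons (not study data).
raw_p = {
    "slow vs. medium": 0.004,
    "medium vs. fast": 0.012,
    "slow vs. fast": 0.60,
}
adjusted_p = bonferroni_adjust(raw_p)
significant = [pair for pair, p in adjusted_p.items() if p < 0.05]
```

With three comparisons, a raw p value must fall below roughly .0167 for its adjusted value to stay under .05.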
It is noteworthy that we asked the participants the question, “How would you characterize the speech rate of the robot teacher?”. The responses indicate that when the speech speed was set to “slow”, a clear majority of students (over 69%) perceived it as medium speed. Even when the speed was set to “fast”, 63.41% of students still classified it as medium speed.
Moreover, students were tasked with ranking a set of factors, including “robot’s speech speed,” “robot’s advanced features,” “robot’s appearance,” “robot’s responsiveness,” and “robot’s movement.” The results indicated that the majority of students attributed the highest level of influence on their learning experience to the factor of “robot’s advanced features.” Conversely, the factor of “robot’s speech speed” was perceived as the least influential factor.
Voice Type Experiment
Descriptive Analysis of Relevant Variables in the Voice Type Experiment.
Students’ Preferences Towards Different Voice Types.
Note. *p < .05, **p < .01. “Adjusted p” refers to the significance value adjusted by the Bonferroni correction.
In this experiment, we also asked the participants to rank the relative importance of the following factors: the robot’s voice, advanced features, appearance, response speed, and ability to move. The results revealed that the speed of the educational robot’s response was deemed the most influential attribute in robot teaching, whereas the voice type of the robot was regarded as the least influential factor.
Emotional Tone Experiment
Descriptive Analysis of Relevant Variables in the Emotional Tone Experiment.
Discussion
This study examined the impact of speech rate, voice type, and emotional tone of educational robots on students’ cognitive load, attitudes towards robot-assisted teaching, and learning performance. The results indicate that speech rate significantly influenced students’ cognitive load, with the medium speed condition resulting in higher germane load than both the fast and slow speed conditions. Moreover, students had a lower preference for adult male voices than for adult female, boy, or girl voices. However, voice type did not significantly impact attitudes towards robot-assisted teaching or learning outcomes. Emotional tone did not affect students’ cognitive load, attitudes, or learning performance. We discuss these findings in detail in the following sections.
Speech Rate of Educational Robots
In this study, we found that the germane cognitive load score for the medium speed condition was significantly higher than that in both the fast and slow speed conditions. The moderate pace could have allowed students to comprehend and process the instructional content effectively without feeling overwhelmed or bored. This could have led to a higher germane cognitive load as students actively engaged with the information and made meaningful connections with their existing knowledge. Moreover, the impact of speed on cognitive load might be intertwined with the complexity of the text. A moderate pace may have optimized students’ ability to navigate through moderately complex content. When the speed is too fast or too slow, it might disrupt the students’ cognitive flow and hinder their ability to integrate and retain the instructional content.
Interestingly, the impact of speech rate did not align with students’ subjective perceptions. Generally, when categorizing speech speeds as “fast, medium, and slow,” students showed a similarity in their perceptions. The majority of students perceived these diverse speech speeds as falling within the medium range. This phenomenon could potentially explain the lack of significant differences in students’ learning outcomes and attitudes toward robot-assisted teaching across the three experimental conditions. Furthermore, when considering the influential factors, such as the robot’s advanced features, speech speed, appearance, responsiveness, and movement, students ranked the robot’s speech speed as the least influential factor. This observation provides a potential explanation for the aforementioned findings.
Voice Type of Educational Robots
We found that students had a significantly lower preference for the adult male voice than for the adult female, boy, and girl voices. Several explanations could account for this finding. Firstly, the students may have been influenced by prevalent gender stereotypes. Specifically, in Chinese culture, male voices are usually associated with authority or seriousness, whereas female voices are often perceived as more comforting. Moreover, the students had more exposure to and familiarity with female voices in regular classes, since there were more female teachers than male teachers (Huo et al., 2021). Consequently, this familiarity may have influenced their preferences for voices similar to those they encounter in their daily lives. Additionally, students’ preference for voice types might have been influenced by the perceived relevance or relatability of the voices to their age group. They may have considered the voices of their peers (boy and girl) more relatable and engaging than adult voices, leading to a lower preference for adult male voices. It is noteworthy that the results of this study contradict the findings of Dou et al. (2022), who reported that a male voice outperformed female or child voices in education because male voices conveyed a greater sense of competence to users.
We did not find significant differences in students’ learning outcomes, cognitive load, or attitudes towards robot-assisted teaching (i.e., perceived difficulty, and engagement and continuation) across the four experimental conditions. We argue that voice type may have had limited influence on students’ perceptions and outcomes because the content itself was the primary focus of the learning experience. In this experiment, students were aware of the post-test assessing their learning performance, so they may have paid limited attention to the voice type of the educational robot. Consistent with this interpretation, the participants ranked the voice type of the robot as the least influential of the factors they were asked to rank.
Emotional Tone of Educational Robots
This study found that the emotional tone of educational robots did not have a significant effect on students’ learning performance, cognitive load, or attitudes toward robot-assisted teaching. One possible reason is that students paid limited attention to the emotional tone of the robots. In the ranking question, “the robot’s response speed” was deemed the most influential factor in relation to educational robots. Another plausible explanation is that the participants may have displayed limited sensitivity to the emotional tone of the robots. We asked the participants the question, “How would you characterize the emotional tone of the robot teacher’s speech?” Interestingly, students in the positive tone condition achieved a high accuracy rate of 76.47%, whereas students in the neutral tone condition obtained a moderate accuracy rate of 51.43%. Students in the negative tone condition yielded a much lower accuracy rate of only 2.63%. These results suggest that regardless of the emotional tone conveyed by the robots, the majority of students perceived it as either neutral or positive in nature. Additionally, the null effect could be due to the robot’s lack of body language, especially its limited facial expressions, which weakens the effectiveness of speech-emotion expression (DiSalvo et al., 2002). Lastly, some students might have had concerns about negatively impacting the robot. This was evident during the experiment when two students sought clarification from the experimenters regarding the consequences of their choices for the robot. This observation aligns with the research conducted by Kanda et al. (2004), who reported that students engaged with the robot not primarily for learning and enjoyment but rather out of a sense of sympathy towards the robot due to its lack of companionship with others. In a related vein, Kory Westlund et al. (2017) found that irrespective of whether the robot employed neutral or expressive voices during storytelling, the children exhibited attentive listening and gained new vocabulary.
General Discussion
Across the three experiments on speech rate, voice type, and emotional tone, students generally prioritized the intelligence of the robot over its voice features. As noted above, students were asked to rank various features of the educational robot. The factors that ranked highest were “robot’s response speed” and “robot’s advanced features”. It is likely that students, influenced by high-tech films and their exposure to AI, perceive robots as highly intelligent and advanced. Prior to interacting with the robot, many students were curious about its functions. In comparison, they gave limited attention to the robot’s voice features. This might explain the non-significant differences in students’ learning performance and in intrinsic and extraneous cognitive load across experimental conditions. However, it cannot be confirmed whether long-term use of these different voice features would directly or indirectly affect learning outcomes. For instance, in the study of Kory Westlund et al. (2017), children who listened to the robot telling stories with expressive voices were more likely to imitate the robot’s intonation in storytelling.
Students did not perceive the teaching content as difficult in any of the experiments. This finding has significant implications for educators. It suggests that the robots were successful in communicating the instructional materials and engaging in discussion with students. This highlights the effectiveness of educational robots as tools for assisting teaching in classroom settings. Nevertheless, based on the findings from the speech rate and voice type experiments, we suggest equipping educational robots with a medium speech rate and avoiding adult male voices for primary school students.
Findings from this study offer practical implications for educators, designers, and policymakers. Educators should emphasize the intelligence aspects of educational robots in teaching materials, utilizing advanced features to enhance student engagement. Additionally, they should address students’ curiosity about robot functions and consider integrating introductory sessions to align expectations. Designers need to prioritize the development of advanced features and consistently innovate to enhance response speed. Regarding speech rate and voice type, designers should create robots with a default setting of medium speech rate and female or child voices for primary school students. Policymakers can advocate for technological advancements, transparent communication guidelines, and policies that align with students’ expectations. Moreover, they can support initiatives recognizing the effectiveness of educational robots in facilitating the communication of instructional materials.
Despite the significance of our practical implications, two factors should be kept in mind during the design and incorporation of educational robots. First, the absence of teachers in the experimental setting may have multifaceted effects. On one hand, the involvement of students’ teachers could potentially suppress students’ subjectivity. On the other hand, the absence of teachers may create an informal class atmosphere, potentially diverting students’ focus from the content and, consequently, impacting their cognitive load, attitude, and learning performance. Second, students expressed surprise at the robot’s instruction on ancient Chinese poetry, given that the educational robot was largely used in the contexts of programming and AI education. This surprise or curiosity led to speculation among students about the potential disciplinary role of the robot, akin to that of a human teacher. For instance, students wondered whether the robot would ask them to sit still or to refrain from chitchat.
In sum, the exploration of communication features in educational robots, as highlighted in our study, aligns with the emerging trend within Human-Computer Interaction (HCI) related to the unique role of educational robots as a form of teacher presence (Lawson et al., 2021). Results from this study shed light on the nuanced ways in which communication features influence the overall learning experience and student preferences, providing insights for the design and implementation of educational robots in the K-12 context. For instance, we found that speech rate significantly influenced germane cognitive load, and students exhibited a preference for certain voice types. These findings resonate with current debates in HCI, particularly those involving human-robot interaction (Lawson et al., 2021; Li et al., 2023). Moreover, we conducted our study in a Chinese primary school setting, echoing the call for culturally sensitive studies in HCI. As Salgado et al. (2015) argued, HCI design should consider the specific application environment, including cultural, social, economic, and political factors. Kyriakoullis and Zaphiris (2016) also emphasized the importance of addressing both visible and invisible cultural characteristics to enhance user acceptance.
This study lays the foundation for further research in three significant areas within HCI that closely align with our research focus: Explainable AI in educational robots, ethical considerations in robot-human interaction, and novel interface designs of educational robots. For instance, the educational robot developed for this study was equipped with intelligent question-answering capabilities and interactive functionalities. It can integrate Explainable AI (XAI) to enhance transparency. One direction is to make the intricate processes of speech recognition, natural language processing, and knowledge retrieval understandable for teachers and students. By providing insights into the system’s decision-making, the educational robot could foster user trust and facilitate comprehension of technology. Furthermore, our study examined educational robots with diverse emotional tones. It might be unethical to convey messages in a negative tone or deliberately overlook students’ progress in the learning process. Future research should involve establishing emotional tone guidelines for educational robots that safeguard users’ psychological well-being. Additionally, it would be interesting to explore novel interface designs for educational robots, such as the integration of facial expressions and hand gestures.
Conclusion
This study investigated the effects of speech rate, voice type, and emotional tone of educational robots on students’ cognitive load, attitudes towards robot-assisted teaching, and learning performance. The results indicate that speech rate significantly influenced students’ cognitive load, with the medium speed condition resulting in higher germane load than both the fast and slow speed conditions. Regarding voice type, students had a lower preference for adult male voices than for adult female, boy, or girl voices. However, voice type did not significantly impact attitudes towards robot-assisted teaching or learning performance. Emotional tone did not affect students’ cognitive load, attitudes, or learning performance. These findings provide valuable insights for instructors and designers when configuring the communication features of educational robots in classroom environments. Moreover, students generally prioritized the intelligence of the robot over its communication features, and they did not perceive the teaching content as difficult in any of the experiments, indicating the effectiveness of educational robots in communicating instructional materials.
While this study has methodological and practical significance, we acknowledge that it has several limitations. Firstly, the short duration of the experiment may have hindered students from fully developing an understanding of voice features and experiencing different attributes for a more comprehensive comparison. Additionally, because each student participated in only one condition, students had little opportunity to develop sensitivity to voice features. A within-participant design might reveal more interesting findings across different conditions. For instance, students experiencing various voice features might pay more attention to and be more affected by those features. Moreover, it is noteworthy that the current capabilities of educational robots are limited in terms of interaction. The robot’s role is primarily focused on providing instructions and responses within the scope of the primary school-level poetry curriculum. It lacks advanced interactive capabilities that would enable it to respond to more complex and innovative student questions. Lastly, while tailoring the emotional tone of the robots by employing a diverse range of words is a feasible approach, one limitation is the potential oversimplification of emotional expression. Emotions are complex and multi-faceted, and conveying them solely through chosen words may not fully capture the richness and nuances of human emotional experience.
As a closing remark, several directions merit future exploration in this field. The influence of communication features in teaching is not isolated to each individual feature, but rather involves the complex interaction and collective impact of different voice features on the actual teaching process. Therefore, one direction for future research is to investigate the integration of various voice features and their combined effects. Additionally, it is crucial to explore whether adapting voice features to align with instructional content can yield improved outcomes. For instance, appropriately adjusting the volume during explanations of key points may capture students’ attention and positively influence teaching effectiveness. Furthermore, it would be interesting to examine changes in students’ long-term interest during robot-assisted teaching and develop corresponding strategies to sustain engagement. By analyzing interaction logs generated during human-robot interactions and collecting individualized data, it becomes possible to customize the communication features of educational robots based on learners’ unique characteristics, thus providing personalized services. It is also noteworthy that the most recent studies on robots emphasize the display of nuanced emotions and expressive body movements during interactions (Tran et al., 2023). Future research should focus on this aspect, with the ultimate aim of developing robots that are empathetic and provide students with emotional support. These avenues of research have the potential to enhance the adaptability and effectiveness of educational robots in supporting teaching and learning experiences.
Footnotes
Acknowledgements
We would like to thank Guangzhou Cloudbutterfly Technology Co., Ltd. for providing experimental support.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Humanities and Social Sciences project of Ministry of Education of China (grant number 23YJA880028).
Ethical Statement
Data Availability Statement
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
