Abstract
This paper provides a research synthesis of intelligent personal assistants (IPAs) – that is, cloud-based virtual assistants such as Alexa, Google Assistant, and Siri – for second language (L2) learning. The article also offers a theoretical justification for the use of IPAs in language learning and outlines the affordances and constraints of the technology. Finally, the article proposes directions for future research on IPAs based on this review of the L2 literature. While research indicates that IPAs increase opportunities for meaningful communication in the target language, enhance speaking ability, and provide indirect pronunciation feedback, they also present some challenges for L2 learners – namely, IPAs struggle to reliably understand L2 speech, which may limit the usability of virtual assistants among learners with heavily accented speech.
Introduction
Alongside the increasing proliferation of IPAs, there has been a rise in research interest in their use in informal and formal educational settings (Daley and Pennington, 2020). Interest in utilizing them for second language (L2) learning also appears to be growing, as numerous studies in the past several years have examined their use in this context (e.g. Dizon, 2017, 2020; Chen et al., 2020; Moussalli and Cardoso, 2016, 2020; Tai and Chen, 2020). These studies have primarily focused on evaluating the feasibility of IPAs for L2 learning, understanding learner views, experiences, and behavior, and assessing their impact on different aspects related to language learning. Accordingly, this article reviews relevant L2 literature as it pertains to IPAs while outlining the affordances and constraints of the technology for language learning. The paper also recommends avenues for future research on their use in L2 settings.
What are Intelligent Personal Assistants (IPAs)?
IPAs are “software agents that can automate and ease many of the daily tasks of their users” (Santos et al., 2016: 194). This can consist of basic tasks – that is, transactional interactions involving searching for information, setting reminders, and playing music via a smartphone or smart speaker. However, in a survey focused on user behavior with smart speakers, Nielsen (2018) found that 68% of users chat with an IPA for fun, which indicates that they are being used for more than just transactional interactions – that is, social interactions with virtual assistants are becoming more commonplace. Relatedly, the chatbots developed through the Alexa Prize, a competition to promote the development of conversational artificial intelligence, and Google’s Meena are examples of social bots that were developed with the aim of enabling users to have engaging, open-ended conversations with a virtual assistant. Unlike previous social bots, these applications have the ability to give specific responses depending on the conversational context. While Meena is not currently available for use among the general public, users can interact with the Alexa Prize social bots by saying “Alexa, let’s chat” to an Alexa-enabled device. In addition to social bots, there are various applications that could be used for L2 learning purposes. For instance, applications such as Vocabulary Builder by Magoosh could be used to enhance English vocabulary and the interactive audio story application Choose Your Own Adventure could offer learners opportunities for English listening and speaking practice. In short, advancements in the fields of natural language processing and automatic speech recognition (ASR) have made it possible for IPAs to be used in a variety of ways, which may have implications for L2 learning, as discussed in more detail below.
Theoretical Case for the Use of IPAs
The use of IPAs for L2 learning is supported by interactionist theory, which posits that interaction with another speaker is a catalyst for language development (Long, 1996). According to interactionist theory, communication with another L2 speaker affords three benefits: negotiation of meaning, modified input, and attention to linguistic form (Chapelle, 2005). IPAs are not currently able to modify output in order to improve L2 comprehension (i.e. modified input), but they can promote negotiation of meaning and attention to linguistic form through the implicit pronunciation feedback learners receive (Dizon, 2017; Tai and Chen, 2020). Although interaction is typically associated with human communication, Bibauw et al. (2019) note that human–machine interactions can also enhance L2 ability, stating that “meaningful practice of a target language, as it occurs in conversation, leads to improve the learner’s proficiency in that language, and that, even if a native speaker remains the ideal interlocutor, a computer can provide opportunities for such practice” (p. 829). Self-regulated learning (SRL) perspectives provide further support for the use of virtual assistants for L2 development. As noted by Wang and Chen (2019), SRL is closely associated with the concept of learner autonomy and stresses the ability of learners to take responsibility for their own learning. The framework has been utilized in many computer-assisted language learning (CALL) studies involving different technologies and the concept of learner autonomy (e.g. Wang and Chen, 2019). While no study has utilized the SRL perspective as a means to study IPAs, preliminary evidence suggests that virtual assistants can be beneficial for autonomous language learning (Dizon and Tang, 2020). Considering this, IPAs could help support L2 autonomy by providing learners with opportunities for language practice in a low-stress environment, much like other ASR-based technologies (Bashori et al., 2021).
Existing L2 Research on IPAs
In this section, previous L2 research involving IPAs is reviewed and the findings from these studies are synthesized in relation to the affordances and constraints of the technology for language learning (Reinders and Hubbard, 2013).
Learner Views and Experiences
Much of the research on IPAs in the context of L2 learning has explored learners’ perceptions of and experiences with the technology for language learning. In a feasibility study conducted in Canada, Moussalli and Cardoso (2016) investigated L2 English students’ views of Alexa for L2 learning. The participants in their study indicated that the IPA was easy to use, fun, and beneficial for L2 development. However, the participants mentioned one notable challenge when interacting with Alexa in English – namely, that the virtual assistant had difficulty understanding their commands. In a similar study involving Alexa, Dizon (2017) also found that L2 English students in Japan had favorable views of the IPA for L2 learning. Specifically, the participants felt that Alexa provided opportunities for English-speaking practice as well as implicit feedback on pronunciation. Dizon (2020) conducted a follow-up study involving Japanese university students on the use of Alexa but focused on the in-class use of the virtual assistant for English speaking and listening development. The participants had moderately positive views of the IPA and perceived it to be a useful and enjoyable tool for L2 learning.
In another study involving university L2 English learners in Taiwan, Chen et al. (2020) found that the participants had largely favorable attitudes regardless of proficiency level. In particular, they felt that Google Assistant enhanced motivation, reduced learning anxiety, and was useful in their English studies. Nonetheless, sensitivity to pronunciation errors was noted by many of the learners, which highlights the same issue regarding comprehensibility reported by those in Moussalli and Cardoso (2016). Some of the students also noted that the lack of direct feedback made it difficult for them to pinpoint how to overcome these issues related to learner pronunciation and IPA comprehensibility.
Tai and Chen (2020) conducted a study at the junior high school level and found that Taiwanese L2 English students had very favorable attitudes towards Google Assistant for language learning. Three key themes identified from the interviews were that the IPA lowered L2 anxiety, made language learning more enjoyable, and afforded the learners an authentic environment for L2 communication. The researchers also concluded that the implicit feedback provided by Google Assistant directed the learners to gaps in their L2 production, which, in turn, encouraged them to modify their L2 output.
Finally, Dizon and Tang (2020) explored the use of Alexa for autonomous English language learning among Japanese university students. The learners in their study reported that the IPA was fun to use and supported language development. Again, however, issues related to L2 speech comprehensibility were mentioned by some of the participants. In short, studies show that L2 learners generally have positive attitudes towards using virtual assistants for language learning purposes, although they often report that IPAs struggle to understand their L2 speech.
Learner Behavior
Several studies have been conducted to investigate the ways in which L2 learners interact with IPAs. Moussalli and Cardoso (2020) examined the strategies used by English as a second language university students when they encounter breakdowns in communication with an IPA – that is, instances when a learner is not fully understood. They found that the participants most frequently repeated a request (43%), followed by rephrasing (32%) and abandonment (23%). Chen et al. (2020) had similar results with the Taiwanese students in their study – namely, repeating and rephrasing were the most common responses to communication breakdowns.
Dizon and Tang (2020) also looked at learner behavior when faced with communication difficulties, but had different findings. The Japanese university students in their study most often resorted to giving up or abandonment (63%), while rephrasing (20%) and repeating (17%) strategies were used much less frequently. This contrast in findings is most likely due to the nature of the studies and how data were collected – that is, in Moussalli and Cardoso (2020) and Chen et al. (2020), data were collected from one or two sessions, and the participants’ interactions were video-recorded by the researchers, whereas in Dizon and Tang (2020) the participants used the IPA in their homes over a period of two months. Another interesting finding from Dizon and Tang (2020) is that L2 learners may not actively use IPAs despite having positive views regarding their use for language learning. This indicates that more guidance may be needed in order for sustained technology-mediated L2 learning to occur (Botero et al., 2019).
Reliability in Understanding Speech
As noted previously, L2 speakers have reported difficulties related to being understood by an IPA (Dizon, 2017; Chen et al., 2020; Moussalli and Cardoso, 2016). Quantitatively, the data seem to support the notion that virtual assistants cannot consistently comprehend L2 speech. For example, Alexa was only able to fully understand 50% of the learner-generated commands given by the Japanese L2 English learners in Dizon’s (2017) study. Chen et al. (2020) reported similar results in that Google Assistant had difficulty fully comprehending the L2 English utterances of their participants. The researchers did find that proficiency played a role in comprehensibility, with low- and intermediate-level learners being understood far less frequently than more advanced L2 speakers.
Research by Daniels and Iwago (2017) resulted in mixed findings in terms of IPA capacity to understand L2 speech. Their study compared the accuracy rates of Siri and Google Speech Recognition, which is the technology behind Google Assistant, in transcribing Japanese L2 English learners’ utterances. Although Google Speech Recognition successfully transcribed an average of 82% of the target words/sentences, Siri had a much lower mean accuracy rate (66%). Findings by Moussalli and Cardoso (2020) seem to contradict those by Dizon (2017) and Chen et al. (2020). Specifically, the L2 English learners in their study were understood 83% of the time by Alexa. However, this still lags behind what the human raters were able to comprehend in their study (95%), which suggests that virtual assistants cannot reliably understand L2 speech at this point in their development.
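To make the idea of a transcription accuracy rate concrete, the following is a minimal sketch of one common way such a figure can be computed: word-level edit distance between a target utterance and an ASR transcript, converted to an accuracy score. This is an illustrative assumption rather than the scoring procedure used in any of the studies reviewed above; the function name and the example utterances are hypothetical.

```python
def word_accuracy(target: str, transcript: str) -> float:
    """Return 1 minus the word error rate, floored at 0.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation is not stripped.
    """
    ref = target.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("target must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    word_error_rate = prev[-1] / len(ref)
    return max(0.0, 1.0 - word_error_rate)


# Hypothetical learner command and two possible ASR transcripts
print(word_accuracy("play some jazz music", "play some jazz music"))
print(word_accuracy("play some jazz music", "play sun jazz music"))
```

A score of this kind captures only how closely a transcript matches the intended words. Whether an IPA then responds appropriately to a command, which is what several of the studies above measured, is a separate question.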
Listening and Speaking
Compared to other aspects related to L2 learning, far less research has been conducted on how IPAs can impact listening and speaking in an L2. One exception is Tai and Chen’s (2020) study involving the use of Google Assistant among Taiwanese junior high school students. In their two-week study, the participants interacted with the virtual assistant in a variety of activities. Pre- and post-surveys were administered to assess the impact that these IPA-mediated activities had on the participants’ willingness to communicate (WTC). Results of their analysis revealed that Google Assistant did in fact have a significant effect on WTC – that is, the learners reported lower levels of anxiety and higher levels of WTC and communicative confidence in the target language when compared with English conversation in the traditional classroom.
In another study, Dizon (2020) investigated the effects of Alexa on Japanese L2 English students’ listening and speaking development. The learners in his study were split into two groups: a control group, which was taught using conventional approaches; and an experimental group, which underwent a 10-week intervention of weekly IPA–student interactions. Pre- and post-tests were administered to determine if the treatment had any influence on the participants’ English listening and speaking skills. Although Alexa did not promote L2 listening, it was found that the virtual assistant significantly enhanced L2 speaking. These studies by Tai and Chen (2020) and Dizon (2020) indicate that IPAs have the potential to increase not only speaking confidence and WTC in an L2 but also speaking ability.
Affordances and Limitations
Reinders and Hubbard (2013) describe ways in which CALL can promote and constrain language learning. Using these criteria, this section will detail the affordances and constraints of using IPAs for L2 learning based on the current literature. First, IPAs can provide access to L2 resources that would otherwise be difficult for L2 students to obtain. Specifically, learners feel that virtual assistants afford them more conversational opportunities in the target language (Dizon, 2017). This is significant in the context of foreign language learning as L2 learners may not have sufficient opportunities to interact with native speakers outside of the classroom. In addition, virtual assistants have been found to provide authenticity – that is, IPAs afford an authentic environment for communication in a target language (Dizon, 2017; Chen et al., 2020; Tai and Chen, 2020). Relatedly, IPAs can have a positive influence on interaction, the primary tenet of interactionist theory. For instance, results from Tai and Chen (2020) showed that speaking with an IPA resulted in higher levels of WTC and lower levels of anxiety, thus leading to enhanced L2 interaction, while Moussalli and Cardoso (2020) concluded that IPAs afford learners increased opportunities for both L2 input and output. Feedback is the final affordance identified in the literature. Although virtual assistants do not provide explicit feedback, the indirect pronunciation feedback that they provide is perceived to be useful among L2 learners (Dizon, 2017; Moussalli and Cardoso, 2020; Tai and Chen, 2020).
In terms of constraints, one notable limitation of IPAs relates to interaction. While interaction is a critical part of L2 development, technology-mediated interactions may be beyond the capabilities of many language learners (Reinders and Hubbard, 2013). In the context of IPAs, virtual assistants appear to have difficulty fully understanding L2 speech, particularly among learners who have heavily accented speech. This points to a larger issue regarding ASR-based technologies such as IPAs – that is, research indicates that they have difficulties understanding non-standard English varieties. For instance, Tatman (2017) compared ASR accuracy across several English dialects and found that YouTube’s automatic captioning system performed significantly worse with Scottish English than American English. Notably, gender also affected accuracy in the study. These findings highlight the fact that the developers of virtual assistants have much work to do in order to make their systems more equitable for both first-language and L2 speakers. Considering this, IPAs may be more appropriate for use among more experienced L2 learners who do not have marked accents in the target language as their interactions with virtual assistants are less likely to be prone to communication breakdowns. An additional constraint is connected with one of the previously mentioned affordances – feedback. As noted by Reinders and Hubbard (2013), the feedback that CALL systems provide is often quite limited, and this is also true when it comes to virtual assistants (Chen et al., 2020). Lastly, the non-linearity of IPAs can pose difficulties for L2 learners. Even though non-linearity can enhance learning flexibility, the increased number of choices learners have to make can lead to inefficient learning (Reinders and Hubbard, 2013), which, in turn, may result in abandonment of the technology (Dizon and Tang, 2020). As shown in other research on self-directed language learning through technology (e.g. Botero et al., 2019), learners may struggle to use IPAs for L2 learning without guidance and training as it is often difficult for them to plan and monitor their own technology-mediated learning processes (Viberg and Kukulska-Hulme, 2021). As a result, incorporation of IPAs inside and outside the language classroom should include sustained feedback from the teacher or researcher.
Directions for Future Research
While many studies have examined IPAs in the context of L2 learning, there are areas which are still under-addressed in the literature. First, most studies have collected data from a single session or a small number of sessions. Therefore, more longitudinal studies ought to be conducted to evaluate the long-term effects of virtual assistants on L2 learning-related aspects. Also, more research needs to be done involving naturalistic settings – that is, informal, out-of-class use of IPAs – in order to better understand the ways in which learners leverage these tools for L2 learning. Although it would be difficult, if not impossible, to record learner–IPA interactions in these types of studies, data could be collected in the form of the usage logs that Alexa, Google Assistant, and Siri store in the cloud. Another avenue of research that could be explored is the use of IPAs with learners of less commonly studied L2s such as Chinese, Korean, or German. Virtual assistants can now respond in a variety of languages; therefore, there is a need to examine the views and behaviors of other L2 learners when it comes to IPAs.
Conclusion
The goal of this paper was to review L2 literature concerning IPAs and detail their affordances and constraints for language learning. Using criteria outlined by Reinders and Hubbard (2013), four affordances and three constraints were identified in the research concerning IPA-mediated L2 learning. Some key implications are that IPAs can be used to promote authentic interaction in the target language, which is significant in foreign language contexts, and that virtual assistants may not be suitable for learners with heavy L2 accents. Also, to strengthen existing literature on the topic and promote more targeted use of virtual assistants for L2 learning/teaching, several directions for future research were recommended – namely, research on IPAs should be directed towards investigating how they can be used to enhance specific L2 skills and examining their use among learners of less commonly studied foreign languages.
