Abstract
Rapid advances in video technology in recent years have made creating and editing subtitles for a video increasingly feasible. For a long time, however, the pedagogical potential of subtitling in language teaching and learning has been neglected. To fill this gap, this technology review introduces VisualSubSync, a free and flexible subtitle editing tool, as a practical instrument for teaching listening and viewing skills. The review first provides an overview of the main features of the software and then discusses the benefits it may offer to listening and viewing pedagogies in the English-as-a-foreign-language classroom. Finally, the challenges that teachers and learners may face with subtitling are considered.
Keywords
Introduction
Listening and viewing comprehension are active skills that can be improved through conscious effort and practice. One of the most feasible approaches to developing these skills is to increase learners’ exposure to authentic spoken language input (Caruana, 2021). Audiovisual media such as video are generally considered a useful source of input for language learning (Hardison and Pennington, 2021). It has been established that watching video clips in the target language, particularly those with subtitles, is effective in promoting learners’ comprehension and in building their vocabulary in that language (Baranowska, 2020; Lertola, 2019; Montero Perez et al., 2013; Negi and Mitra, 2022; Yeldham, 2018).
Subtitling is the process of adding on-screen text to a video to represent the message that is being spoken (Costa-Montenegro et al., 2016). Because subtitles establish links between the on-screen text and the verbal message in a video (Williams and Thorne, 2000; Winke et al., 2010), subtitling may have enormous pedagogical potential for developing language skills. In subtitling tasks, language learners are transformed from passive consumers of subtitles (Ávila-Cabrera and Corral Esteban, 2021) into producers of subtitles. However, although most language learners (and teachers) are experienced users of subtitles, only a few of them have had previous contact with subtitling (Talaván et al., 2017), mainly because of the time-consuming training process. For this reason, the value of subtitling as a didactic tool has long been neglected.
Fueled by technological advances, a number of subtitling programs have been developed in the past two decades. Over the same period, the pedagogical use of subtitling as an activity of audiovisual translation (AVT) has attracted increasing attention in the field of second language acquisition (SLA). Previous studies have shown that subtitling tasks can promote listening comprehension, vocabulary learning, pragmatic awareness and writing skills (Ávila-Cabrera and Corral Esteban, 2021; Talaván et al., 2017; Williams and Thorne, 2000). More importantly, in line with the social constructivist approach to language learning (Vygotsky, 1978), there is also a collaborative element in subtitling. Specifically, learners are encouraged to undertake a shared activity to achieve the same learning objective through interaction (Talaván et al., 2017).
A number of subtitle editing programs are now freely available on the Internet. Commonly used ones include Aegisub, Jubler, Subtitle Edit, Subtitle Workshop and VisualSubSync (see Appendix 1 for details). Among these tools, VisualSubSync is unique in that it provides audio waveform representation to facilitate accurate subtitle editing and offers online access to the subtitles via a webpage. For the purpose of demonstration, we focus on the use of VisualSubSync as a tool for subtitling. VisualSubSync was chosen because of its simplicity and user-friendliness, which make it an ideal choice for beginners. In a nutshell, this review first provides an overview of the software and then explores its use for listening pedagogies in the EFL classroom.
Overview
Creating subtitles in VisualSubSync usually follows a two-step process. The first step is to extract the audio stream from a local source video file and create a new subtitle project (see Figure 1). Note that the program accepts either audio or video as the source file, so subtitles can be made for either an audio or a video file. Accordingly, VisualSubSync can be used to teach both listening and viewing skills in the EFL classroom.

Figure 1. Extracting the audio stream and creating a new project.
The second step is to create a timeline and add subtitle texts. The main user interface of VisualSubSync can be divided into four sections (see Figure 2). The WAV display provides a visualization of soundwaves and timelines. The timeline comprises a series of user-defined time spans (i.e. pairs of start and end times) that represent the utterances in the audio stream. The time spans (or the subtitle timestamp breakdowns) are shown in the subtitle list. To visualize each span, a synchronized preview is provided on the right. When a span is selected, the transcript for that span can be entered in the subtitle editing area. After all of the spans have been transcribed, the subtitles can be exported and shared as a standalone subtitle file.

Figure 2. The main window of VisualSubSync.
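To make the idea of time spans and a standalone subtitle file more concrete, the following minimal Python sketch models a timeline as a list of spans and serializes it as SubRip (.srt) text. It is an illustration only and not VisualSubSync's own code: the names `Span`, `format_timestamp` and `to_srt` are hypothetical, and the choice of the SubRip format is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One user-defined time span and its transcript (hypothetical structure)."""
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds
    text: str      # transcript entered for this span


def format_timestamp(ms: int) -> str:
    """Render milliseconds as a SubRip timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(spans: list[Span]) -> str:
    """Serialize spans, ordered by start time, as SubRip text."""
    blocks = []
    ordered = sorted(spans, key=lambda s: s.start_ms)
    for index, span in enumerate(ordered, start=1):
        # A span must end after it starts; reject malformed timelines.
        if span.end_ms <= span.start_ms:
            raise ValueError(f"span {index} does not end after it starts")
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(span.start_ms)} --> {format_timestamp(span.end_ms)}\n"
            f"{span.text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)
```

A teacher could, for instance, build a list of `Span` objects from learners' transcripts and write the result of `to_srt` to a text file with the `.srt` extension; the spans are sorted by start time, so the order in which learners submit them does not matter.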
VisualSubSync also supports online collaboration (see Figure 3). When the network mode is activated on one computer (i.e. the host), peers on the same network can access the subtitles and the audio streams in their web browsers and submit their subtitles to the host. It is then up to the host to decide whether to accept the submitted subtitles.

Figure 3. Collaboration in the network mode.
Potentials in Teaching Active Listening and Viewing
There are several pedagogical benefits of using VisualSubSync in the English-as-a-foreign-language (EFL) classroom, which include promoting active listening and viewing, fostering collaboration and developing listening and viewing strategies.
First, learners are motivated to create subtitles for their favorite videos. To achieve that goal, learners have to concentrate on the audiovisual messages and attempt to understand exactly what a speaker is saying, and then convert the messages into transcripts in VisualSubSync. This word-for-word listening and viewing process involves word segmentation and recognition from continuous speech, which is an initial and crucial component of language acquisition (Ordin and Nespor, 2016). Since subtitling tasks are cognitively demanding (Ávila-Cabrera and Corral Esteban, 2021), learners need to draw on their entire linguistic repertoire in English to process the aural and visual messages. More importantly, learners are able to replay and showcase the outcome on the spot, which is a rewarding and engaging learning experience for most beginners.
Second, the VisualSubSync network mode allows learners to share subtitling tasks and interact online. Learners in an EFL listening and viewing class can be assigned to several groups. In each group, a group leader creates the timeline and acts as the host, and group members work together via their browsers to transcribe the message in each time span. To finish the subtitling task, learners are encouraged to cooperate with their peers. Specifically, learners can contribute their own transcripts and, where necessary, offer suggestions or feedback on others’ contributions. In this way, collaborative listening and viewing can be achieved.
Last, VisualSubSync can be employed to facilitate, monitor and evaluate EFL learners’ listening and viewing strategies. EFL learners are encouraged to utilize every possible resource at their disposal to complete a subtitling task. Under such conditions, learners may develop different listening and viewing strategies such as listening for gist, listening for details, inferencing, predicting and note-taking to facilitate their transcribing process. For example, they can learn to derive word meaning from the context and the audiovisual clues provided by the video when they encounter a new word.
Challenges
The most challenging task teachers may face is selecting source videos that are appropriate for subtitling tasks in the EFL classroom. In general, language input (audiovisual input in this case) should be comprehensible to learners (Krashen, 2008). A number of factors may affect speech comprehensibility. These include, but are not limited to, the length, content, genre and speech rate of a video. Therefore, due attention should be paid to evaluating the suitability of a video for a particular class.
Based on our teaching practice, we suggest that the optimal video length for a 45-minute listening class is less than 2 minutes for intermediate-level undergraduate students who are asked to work independently. Most learners in our class feel comfortable working with news videos featuring interesting events that are rarely seen in daily life. In cases where learners are struggling to understand, supporting materials should be provided.
Conclusion
Subtitling is probably one of the most motivating listening and viewing pedagogies a language teacher can develop, because it not only encourages active participation but also helps learners develop both confidence and strategies in learning a foreign language. VisualSubSync offers an ideal subtitling solution for the teaching of listening and viewing skills in the EFL classroom. Further pedagogical potential of VisualSubSync in language teaching remains unexplored, and future research may take this into consideration.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the Social Science Fund of Sichuan Province, Sichuan Research Center of Foreign Languages & Literature and Shanghai Foreign Language Education Press (grant numbers SC21WY002, SCWYH18-06).
