Abstract
In implementing audio description (AD) training, it is important to assess the quality of trainees’ AD output, and trainees should be made aware of the assessment method before their training begins (see, e.g., Fryer, 2019; Yan & Luo, 2022, 2023). It is therefore important to determine how AD quality will be assessed when planning a training course. Furthermore, trainers should address the alignment of learning methods, learning outcomes, and assessment methods during the course design phase (Chmiel et al., 2019). Explicit assessment criteria should be made available before learning begins so that both trainers and trainees know what is expected of the students (Marzà Ibañez, 2010). However, as Mazur and Vercauteren (2019) noted, quality assessment is particularly complex for AD compared with other forms of translation: “Because the choices are related to content selection, the language used to describe it and the amount of text are often subjective and can be related to genre or even to an individual product, and therefore any potential errors are more difficult to pinpoint and quantify” (p. 13).
Although limited in number, some studies have developed AD quality assessment frameworks for various settings. Some were designed for audiovisual translation courses (e.g., Kajzer-Wietrzny & Tymczyńska, 2015), and some language classes have used AD as a pedagogical tool to enhance students’ language skills (e.g., Talaván, 2020). Some frameworks were designed for teaching purposes (e.g., Marzà Ibañez, 2010), whereas others have been used to evaluate professional AD products (e.g., Fryer, 2019). Despite their differences, the criteria proposed in these studies were generally derived from literature reviews, AD guidelines, or the authors’ own professional experience. That is, AD quality assessment methods tend to center on the preferences of AD creators rather than those of the people who rely on AD to access visual information. Empirical evidence from the users’ perspective to support the legitimacy of quality assessment criteria is still lacking.
AD Reception Studies
Mazur (2020) described reception studies as one of the three major strands of AD studies, stating that they aim to “verify whether the proposed description methods are effective, acceptable[,] and enjoyable for the target audience” (p. 237). Reception studies are user-centered: they help to verify the effectiveness of AD products from the users’ perspective and can therefore contribute directly to the understanding of AD quality.
Most reception studies have examined users’ reactions under controlled conditions in which variables were manipulated to address the research question. They have explored users’ reactions to the inclusion of film language in AD (e.g., Fryer & Freeman, 2013), to subjective expressions in AD (e.g., Walczak & Fryer, 2017), and to the use of text-to-speech technology in AD (e.g., Tor-Carroggio, 2020).
Mazur (2020) noted that the methods used in reception studies are usually borrowed from sociology and include focus groups, in-depth interviews, and questionnaires. Tuominen (2018) pointed out that the questionnaire has been a predominant research method in reception studies. However, one prominent drawback of questionnaires is that “they do not allow the research to involve elements of social interaction” (Tuominen, 2018, p. 80). Tuominen (2018) argued that focus groups can provide more detail on participants’ views and can sometimes yield information beyond researchers’ expectations. Given that this study aims to gain a holistic view of how users perceive and evaluate AD quality, focus groups are well suited to this research.
Present Study
This study is based on an AD module integrated into a university translation program in Hong Kong. The AD training materials were partly based on materials by Snyder (2014) and ADLAB Pro (2019). (Note: ADLAB stands for Audio Description: Lifelong Access for the Blind, and ADLAB Pro refers to Audio Description: A Laboratory for the Development of a New Professional Profile.) The curriculum design and teaching procedures have been reported in a separate article (Yan & Luo, 2022). After the 29 students in the study completed a 2-week AD training course and submitted their assignments to their instructor (this article’s first author), the researchers invited three AD evaluators who are visually impaired to grade these assignments. The assignment was based on a 3-min film clip from the movie Lust, Caution (Lee, 2007, 2:18:23–2:21:18). The students were required to write AD scripts for the clip, record their performances together with the film soundtrack, and submit both the written scripts and the recordings to the instructor within 10 days.
The current study was guided by the following research questions.
1. What are the criteria implemented by users in evaluating AD quality?
2. How do trainees perceive the criteria for AD evaluation?
3. What are the differences between trainees’ perceived criteria and those implemented by AD users?
Method
Procedure
This research was approved by the ethics committee of City University of Hong Kong, and informed consent was obtained from all participants beforehand. Before the three evaluators who are visually impaired began reviewing the students’ AD work, the researchers briefed them on the storyline of Lust, Caution, providing a synopsis of the film taken from a book on the movie (Cheng, 2007). This short briefing served as an audio introduction (Fryer & Romero-Fresco, 2014), giving the evaluators the necessary contextual information. Next, the researchers played the film clip without AD to familiarize the evaluators with the original film sounds. The evaluators then began the formal grading. The students’ names were concealed and replaced with numbers (Nos. 1–29). After listening to each student’s performance, the evaluators graded it by whispering a score from 0 to 100 to a research assistant sitting nearby. (The scores followed the grading scheme of the academic department sponsoring the program: A+, 90 points or above; A, 85–89; A-, 80–84; B+, 75–79; B, 70–74; B-, 65–69; C+, 60–64; C, 55–59; C-, 50–54; D, 45–49; F, below 45.) The evaluators then discussed the performance they had just graded. In this way, the three evaluators formed a focus group, and they were encouraged to give both positive and negative comments in their discussion. The researchers video-recorded the process to capture the comments and discussion for analysis.
After completing the AD module and submitting their work to the instructor, the students were also invited to participate in focus group interviews. Eleven of the 29 students agreed and were divided into three focus groups. Each focus group interview lasted about 1 hour. At the beginning of each interview, the researchers played a sample AD version of the assigned scene, having first told the students that it was not intended as a model answer. After the students gave their opinions on the sample AD, the researchers invited them to reflect on the AD they had created. The focus groups were video-recorded. All participants (the evaluators and the trainees) used their first language (Cantonese or Mandarin) in the discussions. The recordings were later transcribed and imported into NVivo for qualitative analysis.
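The departmental grading scheme is a simple threshold lookup from a 0–100 score to a letter grade. The Python sketch below illustrates that mapping; the helper name `letter_grade` is hypothetical, and placing a fractional score (e.g., 84.5) in the lower band is an assumption, since the scheme lists only integer ranges.

```python
# Minimum score for each letter grade, highest band first, following the
# departmental scheme described above (scores below 45 are an F).
GRADE_BANDS = [
    (90, "A+"),
    (85, "A"),
    (80, "A-"),
    (75, "B+"),
    (70, "B"),
    (65, "B-"),
    (60, "C+"),
    (55, "C"),
    (50, "C-"),
    (45, "D"),
]


def letter_grade(score):
    """Map a 0-100 score to a letter grade (hypothetical helper).

    Assumption: a fractional score such as 84.5 falls into the lower
    band (A-), because the scheme lists only integer ranges.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # The first band whose lower bound the score reaches determines the grade.
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    return "F"


print(letter_grade(87))  # A
print(letter_grade(44))  # F
```

Ordering the bands from highest to lowest means each band needs only its lower bound; the first threshold the score reaches decides the grade.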
Data Analysis
This study used methods derived from grounded theory to code the oral comments provided by the AD users and the students. Grounded theory has been defined as “a strategy of inquiry in which the researcher derives a general, abstract theory of a process, action, or interaction grounded in the views of participants” (Creswell, 2009, p. 13). Following the grounded theory coding procedure described by Yan and Horwitz (2008), the qualitative data underwent a three-level coding process. In level-1 coding, the researchers broke the interview transcripts down into meaningful segments and assigned preliminary codes to all segments related to the research questions. In level-2 coding, the researchers examined all of the preliminary codes and grouped similar codes into tentative categories, each labeled to represent its core idea. In level-3 coding, the level-2 categories were examined and reorganized, and they were finalized by identifying clear themes through cross-category comparison. During level-2 coding, the researchers used the quality assessment construct proposed by Fryer (2019) as a reference. The construct consists of four macro criteria for assessing AD quality: accuracy, language, delivery, and synchrony. Each macro criterion is described in Table 1. The four macro criteria provide a conceptual framework for studies on AD quality and facilitate comparison between the user data and the trainee data.
Table 1. Audio Description (AD) Quality Assessment: Macro Criteria and Descriptions.
To reduce the subjectivity inherent in qualitative research, the researchers adopted the interactive qualitative analysis method introduced by Northcutt and McCoy (2004), which relies on a group rather than an individual to make analytic decisions. In this study, the labels, themes, and categories for all three levels of coding were finalized jointly by three coders (two of the authors and one research assistant). Any disagreements were discussed until the coders reached consensus, and coding continued only after that.
Results and Discussion
AD Quality Assessment Criteria Perceived by the Users
A framework for evaluating AD quality emerged from the three-level coding analysis of the users’ comments. Table 2 presents the results of the analysis. Examples of the comments provided by the users are given to illustrate each criterion. All of the direct quotations in Cantonese or Mandarin provided by the users and trainees were translated into English.
Table 2. User-Centered Audio Description (AD) Quality Assessment Framework.
Note: Explanations provided by the researchers appear in parentheses.
Qualitative Differences Between the AD Quality Assessment Criteria of the Users and Trainees
After analyzing the users’ and students’ comments, the researchers compared the AD quality assessment criteria of the two groups. The comparison is presented in Table 3, with divergences marked with an asterisk.
Table 3. Comparison of User-Centered and Trainee-Perceived Audio Description (AD) Quality Assessment Frameworks.
* Divergences between the two groups (users and trainees).
Some criteria were identified only by the users or only by the trainees. The trainees’ comments quoted below illustrate the criteria identified by the trainees but not by the users.
Under the macro criterion “accuracy,” the trainees identified “correctness of information” as a quality assessment criterion: “The information in the AD must be correct. For example, the characters’ names in the film must be accurate.” (Student PTC)
Under “language,” “preciseness of language” was identified only by the trainees: “The language must be very concrete and specific and cannot be too general. For example, if you want to use a verb in your description, try to use the right one and avoid using an adverb to modify it.” (Student ZHY)
Under “synchrony,” “synchrony between AD and film images” and “not giving away plot” were the two micro criteria identified by the students but not by the users. As two students stated:
“I tried hard to synchronize my voice with the film image. This is difficult. I probably didn’t do it well.” (Student YJ)
“It’s about whether the describer uses a ‘god’s view’ in the description, that is, whether to describe the scene using knowledge of what will happen later. I didn’t do it well, as I would sometimes talk about what would happen to the two characters later in my version.” (Student LLH)
One explanation for why these criteria were identified by the trainees and not the users is that all of the trainees were sighted and could therefore compare the AD with the audiovisual source text more easily. This ease of access may also explain why the trainees did not mention “plausibility of information” or “consistency of information”: sighted trainees can check the “correctness of information” directly by comparing the images with the AD, whereas users who cannot see the images must judge correctness indirectly, through whether the information is plausible and consistent.
Another noteworthy discrepancy is that the trainees did not list “no redundant information” as an assessment criterion. On closer inspection of the users’ comments on this micro criterion, we further categorized redundant information into the following types:
easily identifiable film sounds (e.g., “a huge blast”); cast information (e.g., “Mr. Yee, played by Tony Leung”); information about silence (e.g., “She said nothing.”); speaker information (e.g., “Mr. Yee said”); the translation of basic foreign language used by the film characters (e.g., the Chinese translation of “congratulations”); and camerawork (e.g., “The camera turns to the other side.”).
Under the macro criterion of “language,” “clear and logical language,” “easy-to-understand language,” and “use of the present tense” were identified by the users but not by the trainees. About half of the comments on “clear and logical language” concerned the poor use of pronouns, which led to confusion (see the example provided in Table 1). The other half were positive comments on students’ descriptions of a series of complex actions in chronological order. Regarding “easy-to-understand language,” the users pointed to cases in which describers used too many four-character Chinese idioms, making the AD difficult to follow, as well as descriptions that were too abstract for visually impaired people to understand. For example, rather than one student’s description of a diamond ring as “a six-carat diamond ring,” one user preferred another version in which the student described the diamond as “the size of a pigeon egg,” a concept more familiar to her. All of the comments about “use of the present tense” concerned the use of inappropriate connectives in AD (e.g., “then,” “afterwards,” “subsequently”), which echoes the prescriptive guidelines on AD scriptwriting in Fryer (2016).
The users listed three more criteria under “delivery” than the trainees did, namely “good pronunciation,” “appropriate pauses in delivery,” and “accompanying noises minimized.” Many of the comments on “good pronunciation” were negative comments on the describers’ use of “lazy sounds” (懒音) in their oral delivery. In Cantonese, “lazy sounds” are usually attributed to the speaker’s being “simply unwilling to put forth sufficient effort to articulate the standard pronunciation” (Law et al., 2001, p. 180). The users also commented on the lack of pauses in the oral delivery, which the trainees did not mention. Moreover, the users were sensitive to the sound quality of the AD recordings, commenting on background noises and strange sounds. That the trainees addressed fewer aspects of delivery quality suggests that their understanding of good oral delivery in AD may be less comprehensive than that of users.
Quantitative Differences in AD Quality Assessment Criteria Perceived by Users and Trainees
One advantage of using NVivo in qualitative data analysis is that the software automatically records the number of comments coded under each category. Overall, the three evaluators contributed 240 comments, and the eleven students produced 144 comments. Following the method used by Su (2019) in investigating the patterns in peer evaluation of interpreting quality, this study compared the number of comments across subcategories of AD quality. The distribution of these comments is shown in Table 4.
Table 4. User and Trainee Comments: Comparison of Distributions.
To show which aspects of AD quality the users and trainees commented on most under each macro criterion, Tables 5–8 present the distribution of comments across the micro criteria.
Table 5. “Accuracy”-Related Comments Provided by Users and Trainees: Comparison of Distributions.
Table 6. “Language”-Related Comments Provided by Users and Trainees: Comparison of Distributions.
Table 7. “Delivery”-Related Comments Provided by Users and Trainees: Comparison of Distributions.
Table 8. “Synchrony”-Related Comments Provided by Users and Trainees: Comparison of Distributions.
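The percentages in Tables 5–8 appear to be within-macro-criterion proportions: for example, “good voice acting” (35.9%) is a share of the trainees’ delivery comments, not of all their comments. The Python sketch below shows that arithmetic under stated assumptions: each NVivo-coded comment for one group is taken to be exported as a (macro criterion, micro criterion) pair, the function name `comment_distribution` is hypothetical, and the example counts are invented for illustration, not the study’s data. It also reports each macro criterion’s share of the group’s total comments.

```python
from collections import Counter, defaultdict


def comment_distribution(coded_comments):
    """Return percentage distributions for one group's coded comments.

    coded_comments: iterable of (macro_criterion, micro_criterion) pairs,
    one pair per coded comment (an assumed, hypothetical export format).
    Returns (macro_shares, micro_shares):
      macro_shares[macro]        = % of all comments under that macro criterion
      micro_shares[macro][micro] = % of that macro criterion's comments
    Percentages are rounded to one decimal place.
    """
    pairs = list(coded_comments)
    total = len(pairs)
    macro_counts = Counter(macro for macro, _ in pairs)
    micro_counts = defaultdict(Counter)
    for macro, micro in pairs:
        micro_counts[macro][micro] += 1

    macro_shares = {
        macro: round(100 * n / total, 1) for macro, n in macro_counts.items()
    }
    micro_shares = {
        macro: {
            micro: round(100 * n / macro_counts[macro], 1)
            for micro, n in counts.items()
        }
        for macro, counts in micro_counts.items()
    }
    return macro_shares, micro_shares


# Invented example: 8 comments from one hypothetical group.
example = (
    [("delivery", "good pronunciation")] * 3
    + [("delivery", "proper volume")] * 1
    + [("synchrony", "no overlap with film sounds")] * 4
)
macro, micro = comment_distribution(example)
print(macro)              # {'delivery': 50.0, 'synchrony': 50.0}
print(micro["delivery"])  # {'good pronunciation': 75.0, 'proper volume': 25.0}
```

Because each share is rounded separately, the micro percentages within a macro criterion may not sum to exactly 100.0.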
The results in Table 5 suggest that, compared with trainees, users are more sensitive to the subjectivity of information, as reflected in the gap between the percentages of related comments provided by the users and trainees (38.0% vs. 12.5%, respectively). In contrast, in evaluating AD quality, the trainees paid more attention to “appropriate content selection and prioritization” (users: 2.0%; trainees: 17.2%) and “completeness of information” (users: 17.0%; trainees: 39.1%). These discrepancies suggest that when evaluating AD products, trainees tend to pay more attention to what is described, whereas users tend to focus on how the information is delivered. This may be because all of the trainees in this study were sighted and were thus better able to compare the source text with the target text and more confident in judging whether the information delivered in the AD was adequate.
Under the macro criterion of “language,” the users’ comments were more evenly distributed across the micro criteria. The trainees seldom commented on either “linguistic correctness” or “economic use of language,” providing only one comment on each. These discrepancies suggest that the trainees focused on the form of the AD script, as they commented more frequently on the vividness and appropriateness of the expressions used. In contrast, the users focused on the functions of language in AD, commenting more on the correctness, conciseness, clarity, and comprehensibility of the language used.
Under the macro criterion of “delivery,” the criteria the users commented on most were “good pronunciation” (16.1%), “proper volume” (14.3%), “good voice acting” (14.3%), and “intonation neither too flat nor too excited” (14.3%). Among the trainees, comments on “good voice acting” (35.9%) and “fluent delivery” (33.3%) accounted for most of the comments on delivery, whereas the users commented less on these two criteria (14.3% and 7.1%, respectively). This suggests that trainees may attach greater importance to fluency and variation in delivery style when assessing AD delivery, whereas users take a more comprehensive view of delivery quality.
A noticeable difference between the users’ and trainees’ comments on “synchrony” was that most of the users’ comments concerned “synchrony between AD and film sounds” (68.4%), whereas the trainees commented more on “no overlap with film sounds” (61.1%). Since overlap between AD lines and film sounds should be easy for any viewer to notice, this discrepancy suggests that AD users may be more sensitive than trainees to the consistency of information presented in the acoustic channel.
Conclusion
The findings of this study have important pedagogical implications for AD trainers. First, incorporating classroom-based research (e.g., interview and survey studies) into AD training programs is recommended to ensure that best-practice criteria are applied consistently. Second, the assessment criteria derived from this study can serve as a resource for establishing assessment criteria in AD training and can guide trainers in designing teaching activities and developing training materials. For example, when addressing accuracy, trainers should inform their students of elements that users perceive as unnecessary or patronizing. In studying the language features of AD, trainees should be mindful of their use of pronouns and literary expressions and attend closely to the conciseness and readability of their language. Because the users identified more quality assessment criteria under “delivery” than the trainees did, trainers should give trainees a more comprehensive understanding of good delivery, for example, by raising their awareness of microphone technique and of the need to avoid “lazy sounds.” With regard to synchrony, trainers should guide students to attend to the consistency between film sounds and AD lines, as users are sensitive to asynchrony between information delivered through different channels.
Limitations
This study is not without limitations. First, although the focus group interviews were effective for data collection, follow-up individual interviews, had they been feasible, would have gathered more detailed information from each participant. Second, each trainee commented on only two AD clips (one sample provided by the researchers and one they had produced themselves). The results might have differed had the trainees evaluated more AD clips from different genres. Third, the study was based on a 2-week AD module integrated into a university interpreting program in Hong Kong; the results might have differed had the students received longer AD training. Nevertheless, we believe that the results support integrating the user perspective and reception studies into AD training, and the study suggests a possible path for enabling users to take a more proactive role in training future audio describers.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by City University of Hong Kong (Project #7020037) and the Guangdong Philosophy and Social Science Foundation (Project #GD24YWY03).
