Abstract
The purpose of this study was to ascertain whether seeing the lyrics while learning a difficult song aurally induces less cognitive load in learners than not seeing the lyrics, leading to better recall accuracy of the learned song. Cognitive load was assessed through a reaction time measure based on a dual-task paradigm. Recall accuracy of the learned song was measured for lyrics, pitches, and rhythm. Thirty-six non-music majors individually learned two songs through prerecorded aural instruction; for one song they saw the lyrics and for the other song they did not. The presentation orders of instructional condition and song were counterbalanced. Results showed that instructional condition affected cognitive load but not recall accuracy. A path analysis revealed a mediating effect of cognitive load for lyrics and rhythm, suggesting that seeing the lyrics indirectly increases recall accuracy of lyrics and rhythm by reducing cognitive load. Given limited instructional time, several strategies should be considered to prevent learners from experiencing cognitive overload while learning a difficult song aurally. Showing the lyrics of the difficult song could be one such strategy, at least for young adults with low levels of musical expertise.
Singing a song is a unique means of human expression and communication. Recognizing the importance of singing in one’s personal life and society, many music educators and researchers have long been interested in how to help students develop their singing ability and in strategies for providing effective song instruction. While a variety of factors (e.g., vocal register, pitch-matching ability, vocal modeling, use of accompaniment) might affect singing accuracy and song acquisition (Goetze, Cooper, & Brown, 1990; Hedden, 2012; Nichols, 2016; Pfordresher et al., 2015), several researchers have focused on the role of the lyrics: whether singing with or without lyrics affects singing accuracy and song acquisition (e.g., Berkowska & Dalla Bella, 2009; Gault, 2002; Goetze et al., 1990; Jacobi-Karna, 1996; Levinowitz, 1987, 1989; Smale, 1987). The underlying assumption of these studies is that lyrics may divert learners’ attention from melodic information, hindering song learning (Andress, 1986; Berkowska & Dalla Bella, 2009; Goetze et al., 1990; Levinowitz, 1987); thus, singing without lyrics (e.g., singing on “la”) may help persons sing more accurately than singing with lyrics. However, results have been inconsistent: Some researchers (Berkowska & Dalla Bella, 2009; Goetze, 1985; Levinowitz, 1989) found that singing on a syllable was more accurate than singing with lyrics, whereas others (Sims, Moore, & Kuhn, 1982; Smale, 1987) found no difference between singing with and without lyrics. In other studies, the results regarding the text condition depended on the song taught (Gault, 2002) and the ages of the participants (Jacobi-Karna, 1996). It is noteworthy that the studies mentioned above included only aural instruction; visual materials (e.g., iconic or staff notation) were not employed.
However, music teachers often use multimodal aids to promote students’ musical skills and understanding (Tarnowski, 1986; Zikmund & Nierman, 1992). Visual and kinesthetic representations (e.g., iconic notation, motions) may help learners understand the transient and intangible characteristics of musical elements. Researchers examining the efficacy of multimodal approaches for singing achievement (Apfelstadt, 1984; Hughes, 1991; Pautz, 1988; Persellin, 1994; Tarnowski, 1986) have reported inconsistent results. Some researchers found that young children who received multimodal instruction performed pitch-pattern echoing better than children who received only aural instruction (Apfelstadt, 1984) or who received visual or kinesthetic instruction (Persellin, 1994). Yet others did not find any difference among instructional conditions (Hughes, 1991; Pautz, 1988; Tarnowski, 1986).
In those studies, the musical components aided by multimodal instruction were mostly melodic contour and melodic rhythm. Although some researchers (Persellin, 1994; Tarnowski, 1986) provided visual and kinesthetic reinforcement of lyrics along with other musical components, the lyrics were presented visually in the form of pictures, not written text. The use of visually presented lyrics as written text in song learning has received little attention (Han, 2016). In practice, teachers sometimes show the lyrics (without the staff notation) while teaching or singing a song. In vocal music concerts, program notes sometimes provide lyrics for the audience. Karaoke screens always show the lyrics, and many places of worship display lyrics on a screen for congregational singing. If we often see the lyrics when learning or singing a song, discerning the effect of seeing the lyrics on song learning seems an important topic to be further explored. This study examined how seeing the lyrics affects song learning from a cognitive load perspective.
Cognitive load refers to the total amount of information imposed on working memory (Sweller, 2011; Sweller, Ayres, & Kalyuga, 2011). Cognitive overload occurs when incoming information exceeds the learner’s working memory capacity to process information (Mayer & Moreno, 2003). When learning a song aurally, the verbal (i.e., lyrics) and musical information (e.g., pitches, rhythm) of the song are processed through the aural channel (Baddeley, 1992; Mayer & Moreno, 2003). If the song has less information, in the sense of being a simple pattern or familiar, processing it in the aural channel likely will not be a problem. However, if a song conveys more information or substantial new information, only using the aural channel might cause cognitive overload to the learners, thus hindering one’s ability to learn the song.
A considerable number of studies in instructional psychology have focused on how to optimize human cognitive capacity when presenting instructional materials (Mayer, 2008). Those studies have been guided by three basic assumptions: dual channel, limited capacity, and active processing (Mayer, 2014; Mayer & Moreno, 1998, 2003). Dual channel means that humans possess separate systems to process visual and auditory information (Baddeley, 1992). Limited capacity means that human beings have limited capacity to process information in each channel at any given time (Baddeley, 1992; Cowan, 2000; Miller, 1956; Sweller, 2011). Active processing indicates that humans learn by actively attending to the incoming information, organizing attended information into a coherent mental representation, and integrating mental representations with previous knowledge (Mayer & Moreno, 2003).
Based on these assumptions, numerous studies have shown that presenting material in ways that reduce cognitive load leads to better comprehension and memory (see Mayer, 2014; Sweller, 2011; Sweller et al., 2011). One of these instructional effects suggests that presenting instructional materials using two modalities (visual and aural) is more beneficial for learners than using only one modality (e.g., Low & Sweller, 2014; Mayer & Moreno, 1998; Mousavi, Low, & Sweller, 1995; Tabbers, Martens, & van Merriënboer, 2004; Tindall-Ford, Chandler, & Sweller, 1997). Following these assumptions, I (Han, 2016) hypothesized that if learners saw the lyrics when aurally learning a difficult song (a song that has much information likely leading to cognitive overload), they would better recall the learned song compared with learners who did not see the lyrics. Dual processing of verbal information might reduce the cognitive load in the aural channel, leading to more capacity to process musical information. With 26 members from auditioned choirs and controlling for phonological working memory, I found that the efficacy of these instructional conditions depended on the participant’s level of musical expertise. Overall, non-music majors benefited from seeing the lyrics, whereas music majors did not.
Music majors may process verbal and musical information in a more integrated way than non-music majors (Ginsborg & Sloboda, 2007). In Ginsborg and Sloboda’s (2007) study, participants were randomly assigned to three conditions in which they were asked to deliberately learn a song and memorize it by themselves from notated music and audio recordings. Conditions 1 and 2 required participants to learn the words and tune separately, with words first and then tune (Condition 1) and vice versa (Condition 2), and finally words and tune together. In Condition 3, participants learned the words and melody together throughout the learning session. Results showed singers with higher levels of musical expertise recalled the learned song more fluently and accurately than singers with lower levels of musical expertise, but only when they memorized the lyrics and tune together (Condition 3).
In my prior study (Han, 2016), participants learned a song from pre-recorded audio instructions but not from staff notation under two study conditions: with seeing the lyrics and without seeing the lyrics. Music majors outperformed non-music majors in recalling pitches and rhythm only when not seeing the lyrics. For music majors, seeing only the lyrics but not the staff notation might be disadvantageous due to difficulties integrating the verbal and musical information while hearing the verbal and musical information concurrently.
Also, the dual presentation of verbal information might be redundant for music majors because music majors might have more capacity in the aural channel, which is congruent with the expertise reversal effect (Kalyuga & Sweller, 2014; Sweller, Ayres, Kalyuga, & Chandler, 2003): Presenting verbal information dually (aurally and visually) might be redundant for highly knowledgeable learners but beneficial for less knowledgeable learners. Several studies suggest musical training might be related to auditory temporal processing (Besson, Schön, Moreno, Santos, & Magne, 2007; Jakobson, Cuddy, & Kilgour, 2003) and verbal memory (Chan, Ho, & Cheung, 1998; Jellison & Miller, 1982; Kilgour, Jakobson, & Cuddy, 2000). Also, the musician’s superior understanding of musical structure and characteristics might facilitate encoding and retrieval of musical information (Racette & Peretz, 2007). Music majors’ enhanced encoding strategy and auditory temporal processing may give them more aural channel capacity to process the verbal and musical information, a possibility supported by the result that music majors recalled the lyrics and pitches better than non-music majors across instructional conditions (Han, 2016). On the other hand, if non-music majors have less aural channel capacity than music majors, they could benefit from processing verbal information both aurally and visually, thus reducing their cognitive load in the aural channel. The reduction of aural cognitive load might provide more capacity in the aural channel to process musical information. It is notable that although non-music majors in the prior study were considered to have low levels of musical expertise compared to music majors, they were still members of auditioned choirs. Those with less musical expertise may find seeing the lyrics even more helpful.
In the previous study (Han, 2016), the efficacy of seeing the lyrics was demonstrated by the learning outcome, song recall accuracy; the results were explained from a cognitive load perspective. However, the actual cognitive load induced by each instructional condition was not directly assessed. Therefore, the purpose of this study was to ascertain whether showing the lyrics to non-music majors reduces their cognitive load in the aural channel and whether the reduction of cognitive load leads to better recall accuracy of the learned song.
To measure aural cognitive load, reaction time (RT) in a simple auditory monitoring task was used in a dual-task paradigm (Brünken, Plass, & Leutner, 2004). The dual-task paradigm assumes that performance of a secondary task depends on the amount of cognitive resources consumed by a primary task, if both tasks are processed at the same time and require the same cognitive resources (Brünken, Steinbacher, Plass, & Leutner, 2002): If a primary task induces less load, then participants will perform better in a secondary task and vice versa (Britton, Glynn, Meyer, & Penland, 1982; Brünken, Plass, & Leutner, 2003). In this study, a faster reaction indicates the learner has more free cognitive capacity. For non-music majors, it was assumed that if learners see the lyrics while learning a song, the dual processing of the verbal information would induce less cognitive load in the aural channel, leading to faster RTs than when they do not see the lyrics.
Hence, this study addressed the following research questions:
1. Do non-music majors recall songs more accurately if they learn difficult songs with or without visually presented lyrics?
2. Do non-music majors experience less aural cognitive load if they learn difficult songs with or without visually presented lyrics?
3. Does cognitive load mediate the relationship between instructional condition (learning difficult songs with or without visually presented lyrics) and song recall accuracy in non-music majors?
Method
Design
This study employed a within-participants design with repeated measures to reduce the influence of confounding factors such as individual differences in singing ability, working memory, and RT. The independent variable was the instructional condition, which had two levels: learning a song (1) with visually presented lyrics (with VPL) and (2) without visually presented lyrics (without VPL). While learning two difficult songs aurally, participants saw the lyrics of one song and did not see the lyrics of the other song. To reduce any order effect, the presentation orders of instructional condition and song were counterbalanced. The dependent variables were song recall accuracy and cognitive load, operationalized as an RT measure. However, to answer the third research question, cognitive load was used as a mediator in the final analysis.
Participants
Undergraduate students who were not majoring in music but were enrolled in a music course at a university in the South Atlantic region of the United States were invited to participate in this study. Thirty-six undergraduate students (8 males and 28 females; mean age = 19.28 years, SD = 1.09) participated. All participants were native English speakers, and none were familiar with the stimulus songs. Twenty-two participants had previous experience singing in a choir, for 4.26 years on average (SD = 2.87). Twelve participants had played in a band or orchestra, for 5.17 years on average (SD = 2.73). Nine participants reported having taken voice lessons, ranging from a single lesson to 2 years of lessons. Although 20 participants indicated that they were engaging in some musical activities such as playing the ukulele, singing, and writing music, none were taking voice lessons or participating in a choral or instrumental ensemble at the time of data collection.
Data collection tools and procedure
Songs
This study required songs perceived to be difficult enough to produce high cognitive load in the aural channel. I chose two songs composed by Mary Ellen Pinzino with lyrics by Sara Teasdale. “April” was the song used in my prior study (Han, 2016). The other song was “February Twilight,” which was considered similar to “April” in that it is in triple meter, consists of 16 measures, and is indicated as a difficult song in the original source (http://comechildrensing.com/). These two songs have no repetition of any phrase; the melody varies throughout each song. Whereas “April” is Aeolian, “February Twilight” is Mixolydian. One might speculate that “February Twilight” in Mixolydian is more difficult than “April” in Aeolian because the Aeolian mode is a common minor mode. However, “April” is rhythmically more challenging with its 16th notes. Thus, the difficulty levels of the two songs were considered similar.
The original singing range of the two songs seemed too high for the participants, so I transposed “April” to F Aeolian and “February Twilight” to D Mixolydian, matching the tessitura of the two songs. I also removed one macro beat (a half-bar unit in these songs) from the 14th measure of “February Twilight,” so that the song had 32 macro beats like “April.” The revised scores of “April” and “February Twilight” are presented in Figure 1. The songs were used by permission of the copyright owner, Mary Ellen Pinzino.

Figure 1. Music Scores of “April” in F Aeolian and “February Twilight” in D Mixolydian. Retrieved 2016, from http://comechildrensing.com/. Copyright 1998 for “April” and 1988 for “February Twilight” by Mary Ellen Pinzino. Reprinted with Permission.
Song instruction
All song instructions were audio-recorded by trained singers who were native English speakers. Two versions (male and female) were prepared to match the participants’ sex. The pre-recorded aural instruction was played on a computer and delivered through headphones. As stated earlier, the orders of songs and instructional conditions were counterbalanced: Half of the participants learned the song “April” first; the other half learned “February Twilight” first. Half of the participants saw the lyrics while learning the first song; the other half saw the lyrics while learning the second song.
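The counterbalancing described above can be sketched in a few lines of Python. This is a minimal illustration, not the assignment procedure actually used: the source states only that each factor was split in half, so the full crossing of song order with condition order (nine participants per cell) and the function name `build_schedule` are assumptions.

```python
from collections import Counter
from itertools import product

SONGS = ("April", "February Twilight")
CONDITIONS = ("with VPL", "without VPL")

def build_schedule(n_participants):
    """Cycle participants through the four song-order x condition-order cells.

    Assumption: the two counterbalanced factors are fully crossed.
    """
    # Each cell is (index of the first song, index of the first condition).
    cells = list(product((0, 1), (0, 1)))
    schedule = []
    for pid in range(n_participants):
        first_song, first_cond = cells[pid % len(cells)]
        schedule.append({
            "participant": pid + 1,
            "songs": (SONGS[first_song], SONGS[1 - first_song]),
            "conditions": (CONDITIONS[first_cond], CONDITIONS[1 - first_cond]),
        })
    return schedule

schedule = build_schedule(36)
first_song_counts = Counter(row["songs"][0] for row in schedule)
first_cond_counts = Counter(row["conditions"][0] for row in schedule)
```

With 36 participants, each song and each condition appears first for exactly half of the sample, which is the balance the design requires.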
The song instruction procedures used the phrase-by-phrase approach, basically following those from the previous study (Han, 2016). In the instruction for each song, the instructor sang the song (tune and lyrics) in its entirety. Next, the instructor sang the song by short phrases (two measures of the song at a time) and the participant echoed. This short phrase instruction procedure was repeated two times in total. For the first short phrase instruction, a distinct click sound was added several times over the aural instruction to be used as an RT measure, which is explained in detail below. The second time, the aural instruction was presented without the click sound. The instructor then sang long phrases of the entire song (four measures) and the participant echoed. After the instructor sang the whole song again, she or he sang a “ready-sing,” a short singing cue, so that participants could sing the whole song by themselves. In this study, the “ready-sing” used the beginning phrase of the song.
Song recall test
All instructions for the song recall test were audio-recorded by the same persons who recorded the song instruction. In the recall test, the lyrics were not shown. In the test, the participant heard the entire song again before being asked to sing the song from memory. The participant’s singing was prompted with a singing cue (“ready-sing”) and the performance was audio-recorded through a Shure SM58 microphone and Cubase 9, an audio sequencing program.
RT measure
A simple, continuous monitoring task of a single tone (Brünken et al., 2004) was used as a secondary task. RT was considered an indicator of the amount of cognitive resources available: A faster RT indicates that less cognitive load is induced. A clave timbre (pitch: B6, duration: 125 ms) in Cubase 9 was used for the distinct click sound. The basic task was to tap a key on the computer keyboard (marked by a green sticker) as soon as the click sound was heard. For each song, eight click sounds were randomly placed over the aural instruction during the first learning phase of short phrases: Each short phrase of the song contained one click sound, and no click sound coincided with the onset of any word or pitch. Participants’ RTs were recorded in milliseconds through Cubase 9, and no auditory or visual feedback on tapping was provided.
Procedure
Each participant was tested individually in a quiet room. After giving an overview of the research procedure, I informed the participants that they would not see the lyrics for the song recall test even if they saw the lyrics during instruction for one song. Participants began by practicing the click sound detection task (RT measure). They then learned the first song following the pre-recorded aural instruction, which took approximately 11 min. After the learning phase, they completed the demographic questionnaire on a computer and then took the recall test for the learned song. Participants learned the second song in the same manner and in about the same amount of time as the first song, except that if they saw the lyrics for the first song, they did not see the lyrics for the second song and vice versa. After taking a brief (about 2 min) break, they took the recall test for the second song. During the break, I asked questions about participants’ musical experiences to prevent them from rehearsing the song. I was present during the entire data collection, which took about 35 min in total for each participant.
Ratings
Two experienced music teachers blindly rated participants’ recall accuracy of the two songs based on the written instructions I provided. Recall accuracy was rated by half-bar unit (Ginsborg & Sloboda, 2007) for the components of lyrics, pitches, and rhythm. For “April,” 32 points were possible for each component. For “February Twilight,” only 31 points were possible for each component because the second macro beat of the second measure of “February Twilight” does not contain new information (see Figure 1); this unit was therefore excluded from the rating. Since the total possible scores differed between the songs, I transformed the raw scores of recall accuracy to percentage scores. The percentage scores were used for all analyses.
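The percentage transformation is simple arithmetic, sketched here for concreteness. The maximum scores (32 and 31) come from the text; the function name `percent_score` and the example raw scores are hypothetical.

```python
# Maximum points per component for each song, as stated in the text.
MAX_POINTS = {"April": 32, "February Twilight": 31}

def percent_score(raw_points, song):
    """Convert a raw half-bar-unit score to a percentage of the song's maximum."""
    return 100.0 * raw_points / MAX_POINTS[song]

# Hypothetical raw scores, for illustration only.
april_pct = percent_score(24, "April")                  # 24 of 32 units
february_pct = percent_score(24, "February Twilight")   # 24 of 31 units
```

The same raw score maps to a slightly higher percentage for “February Twilight” because its denominator is smaller, which is why raw scores are not directly comparable across the two songs.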
Interrater reliability was calculated through three different measures: percentage agreement, Kappa agreement, and Pearson correlation coefficient (Han, 2016; Liao, Hunt, & Chen, 2010; McHugh, 2012). Percentage agreement was high (ranged from 86.07 to 96.75) considering the total possible scores were only 32 for “April” and 31 for “February Twilight.” The Kappa statistics indicated very good agreement (ranged from .80 to 1) for lyrics and good agreement (ranged from .60 to .80) for pitches and rhythm. The Pearson correlation coefficients also yielded high coefficients (ranged from .926 to .995). Hence, the averages of the two raters’ scores for each component (i.e., lyrics, pitches, and rhythm) were used for data analysis.
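The three agreement measures can be illustrated with a short standard-library sketch. It assumes, for the first two measures, that each half-bar unit is scored as correct (1) or incorrect (0) by each rater; the source does not spell out the unit-level scoring format, and all ratings below are invented for illustration.

```python
import math

def percent_agreement(rater_a, rater_b):
    """Percentage of units on which the two raters gave the same score."""
    matches = sum(x == y for x, y in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Expected agreement if the raters scored independently.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)

def pearson_r(x, y):
    """Pearson correlation between two lists of total scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical unit-level ratings (1 = correct, 0 = incorrect) for one performance.
units_a = [1, 1, 1, 0, 1, 0, 1, 1]
units_b = [1, 1, 0, 0, 1, 0, 1, 1]
agreement = percent_agreement(units_a, units_b)
kappa = cohens_kappa(units_a, units_b)

# Hypothetical total scores given by each rater to four performances.
r = pearson_r([28, 20, 31, 15], [27, 21, 30, 16])
```

Percentage agreement and kappa operate on unit-level judgments, whereas the Pearson coefficient compares the raters' total scores across performances, so the three measures capture different aspects of consistency.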
Results
The descriptive statistics of recall accuracy percentages and RTs for instructional condition are presented in Table 1.
Table 1. Descriptive statistics of recall accuracy (%) and reaction times (ms) for instructional conditions.
SD: standard deviation; VPL: visually presented lyrics.
Recall accuracy
Before running the statistical analysis, test assumptions for repeated measures multivariate analysis of variance (RM-MANOVA) were examined. Box-plots were generated for each instructional condition in terms of lyrics, pitches, and rhythm to detect outliers. Although there was an outlier (i.e., a particular person) for pitches in both instructional conditions, since these were minor outliers all statistical analyses were undertaken without removing them. Also, the dependent variables were considered normally distributed since the values for skewness and kurtosis for dependent variables were within the acceptable range, −2 and +2. With only two levels for each variable, sphericity was not a concern.
A one-way RM-MANOVA revealed no main effect of instructional condition, F(3, 33) = 1.47, p = .24,
The descriptive statistics of recall accuracy by song, presentation order, and instructional condition are presented in Table 2. Although main effects of song and presentation order existed, participants did not experience all possible combinations of instructional condition and song because these variables were counterbalanced, so interactions were not examined through RM-MANOVA. Instead, using MANOVA, I explored the effects of instructional condition, song, and presentation order and any interactions between these variables. However, no statistically significant effect of instructional condition or of any interaction was found in any analysis, perhaps due to the small sample size and high variances.
Table 2. Descriptive statistics of recall accuracy (%) by song, presentation order, and instructional condition.
SD: standard deviation; VPL: visually presented lyrics.
Cognitive load
Cognitive load was measured by RTs; faster RTs indicate that more cognitive capacity is available. Before analyzing the RT data, any responses that occurred before the target stimuli or that were slower than 3,000 ms (only one case) were eliminated. To detect outliers, a box-plot was generated based on each individual participant’s RTs for each song. Only extreme outliers (11 cases in “April” and 3 cases in “February Twilight” out of all 288 cases for each song) were deleted. The remaining RTs of each participant for each song were aggregated, and the average score was used for data analysis. The RTs for each song were normally distributed. The means and SDs of RTs for each song by instructional condition and presentation order are presented in Figure 2. A paired-samples t-test revealed a main effect of instructional condition on RTs, t(35) = 2.18, p = .036, d = .372. In the instructional condition with VPL, participants detected the click sounds faster than in the condition without VPL (MD = 45.31 ms). Additional analyses also revealed effects of song and presentation order. Participants’ RTs were significantly faster for “February Twilight” than for “April” (MD = 45.62 ms), t(35) = 2.2, p = .035, d = .374, and for the second song than for the first song (MD = 66.88 ms), t(35) = 3.51, p = .001, d = .587.

Figure 2. Reaction Times as a Function of Song, Instructional Condition, and Presentation Order.
Although the main effects of song and presentation order were found, the instructional condition significantly influenced the participants’ RTs. This finding indicates that seeing the lyrics while learning a difficult song induced less cognitive load in the aural channel than learning a difficult song without seeing the lyrics.
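The RT screening and the paired comparison reported in this section can be sketched as follows. This is an illustration under stated assumptions, not the original analysis pipeline: “extreme outlier” is taken to mean a value more than 3 interquartile ranges beyond the quartiles (a common box-plot convention that the source does not confirm), the screening is applied to one participant's RTs for one song, and all data values are invented.

```python
import math
import statistics

def clean_rts(rts, ceiling_ms=3000.0, fence=3.0):
    """Drop anticipatory and overly slow responses, then extreme outliers.

    Assumption: 'extreme' means beyond `fence` x IQR from the quartiles.
    """
    # Responses made before the click (non-positive RT) or slower than the ceiling.
    valid = [rt for rt in rts if 0 < rt <= ceiling_ms]
    q1, _, q3 = statistics.quantiles(valid, n=4)
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return [rt for rt in valid if low <= rt <= high]

def paired_t(without_vpl, with_vpl):
    """Paired-samples t statistic, degrees of freedom, and mean difference."""
    diffs = [a - b for a, b in zip(without_vpl, with_vpl)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1, mean_diff

# Hypothetical RTs (ms) for one participant and one song.
raw = [-50, 420, 450, 480, 500, 510, 530, 3500, 1900]
kept = clean_rts(raw)
participant_mean = statistics.fmean(kept)

# Hypothetical per-participant mean RTs (ms) in each condition.
t_stat, df, mean_diff = paired_t([500, 520, 480, 510], [450, 500, 470, 460])
```

In this sketch a positive mean difference means slower responses without the lyrics, which is the direction of the effect reported above.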
Recall accuracy through cognitive load
To investigate whether cognitive load mediates the effect of visually presented lyrics on song recall, a path analysis, MEMORE (MEdiation and MOderation analysis for REpeated measures designs) developed by Montoya and Hayes (2017), was used. This statistical mediation analysis allows researchers to estimate the indirect effect in the two-condition within-participants design even without establishing evidence of a direct effect (Montoya & Hayes, 2017). This model uses bootstrap confidence intervals (CIs) to indicate the indirect effect. If the CI excludes zero, the indirect effect is considered significant.
The MEMORE analysis, based on 5,000 bootstrap samples, revealed indirect effects of instructional condition on recall accuracy through cognitive load for lyrics (95% CI = [0.252, 9.887]) and rhythm (95% CI = [0.263, 9.362]), but not for pitches (95% CI = [−0.856, 6.393]). However, no direct or total effects were found. This mediation analysis suggests that, compared to not seeing the lyrics, seeing the lyrics reduced cognitive load in the aural channel, which in turn increased the recall accuracy of lyrics and rhythm.
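The logic of a two-condition within-participant mediation test can be sketched in plain Python. This is not the MEMORE macro and is not guaranteed to reproduce its output; it assumes the model usually attributed to Montoya and Hayes (2017), in which the difference in the outcome is regressed on the difference in the mediator and on the mean-centered sum of the mediator, and the indirect effect is the product of the mean mediator difference (a) and the regression weight for the mediator difference (b). The data are hypothetical and noise-free so that the point estimate can be checked by hand.

```python
import random
import statistics

def indirect_effect(m1, m2, y1, y2):
    """Return a*b, or None if the resample is (near-)degenerate."""
    m_diff = [b - a for a, b in zip(m1, m2)]
    y_diff = [b - a for a, b in zip(y1, y2)]
    m_sum = [a + b for a, b in zip(m1, m2)]
    a_path = statistics.fmean(m_diff)
    # Centre all variables so the intercept drops out of the normal equations.
    md_bar, ms_bar, yd_bar = map(statistics.fmean, (m_diff, m_sum, y_diff))
    d1 = [v - md_bar for v in m_diff]
    d2 = [v - ms_bar for v in m_sum]
    dy = [v - yd_bar for v in y_diff]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det <= 1e-9 * s11 * s22:
        return None  # predictors are collinear in this sample
    b_path = (s1y * s22 - s2y * s12) / det
    return a_path * b_path

def bootstrap_ci(m1, m2, y1, y2, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect, resampling participants."""
    rng = random.Random(seed)
    n = len(m1)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect(*([col[i] for i in idx] for col in (m1, m2, y1, y2)))
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi

# Hypothetical data: RT (ms) as mediator, recall accuracy (%) as outcome.
# Condition 1 = without VPL, condition 2 = with VPL.
rt_without = [500, 520, 480, 510, 530, 490, 505, 515, 470, 540, 495, 525]
rt_with = [450, 500, 470, 460, 500, 455, 480, 470, 455, 500, 470, 480]
acc_without = [50.0] * 12
# Outcome built so that b is exactly -0.1 (faster RT goes with higher recall).
acc_with = [y + 2.0 - 0.1 * (b - a) + 0.01 * (a + b)
            for y, a, b in zip(acc_without, rt_without, rt_with)]

point = indirect_effect(rt_without, rt_with, acc_without, acc_with)
ci_low, ci_high = bootstrap_ci(rt_without, rt_with, acc_without, acc_with, n_boot=2000)
```

In these invented data the mean RT difference is −32.5 ms and b is −0.1, so the indirect effect is 3.25 percentage points; an interval that excludes zero would be read as evidence of mediation, following the decision rule described above.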
Discussion
This study investigated how seeing the lyrics while learning a difficult song affects non-music majors’ song learning, which was examined through recall accuracy of the learned song and cognitive load induced during learning. Furthermore, this study examined whether seeing the lyrics induces less cognitive load in the aural channel compared to not seeing the lyrics, which in turn leads to better recall accuracy of the learned song.
In this study, seeing the lyrics did not affect the participants’ recall accuracy, which contradicts the finding of my prior study (Han, 2016) in which seeing the lyrics was beneficial for non-music majors in recalling pitches and rhythm. Several differences in the research methods of the two studies might account for the incongruent results. The former study (Han, 2016) employed a between-participants design with members of auditioned choirs, whereas this study utilized a within-participants design with non-choir members. In the previous study, participants took a phonological working memory test between the learning and recall tasks, which led to a delay of about 15 min before recalling the learned song, whereas this study had a gap of only about 2 min between the song learning and recall stages. Participants’ singing ability, phonological working memory, or the length of the delay might have affected the study results. Among various possible factors affecting song acquisition and singing accuracy (Hedden, 2012), this study investigated song learning from a cognitive load perspective. As Loui and her colleagues (2015) stated, “successful singing requires perceptual skills (pitch matching, interval reproduction, and fine-grained pitch-discrimination ability), cognitive abilities (working memory, attention, and learning processes), and motor skills (motor planning, motor selection, and motor execution)” (p. 263). Thus, song learning should be examined not only through cognitive processes but also through perceptual and motor skills. Participant characteristics such as pitch-discrimination ability (Watts, Moore, & McCaghren, 2005) and vocal register (Rutkowski, 2015) might play a crucial role in understanding the phenomenon.
The results from the RT measure suggest that learning a difficult song while seeing the lyrics induces less cognitive load in the aural channel compared to learning it without seeing the lyrics. Furthermore, cognitive load mediates the causal relationship between instructional condition and recall accuracy of lyrics and rhythm. It is interesting that the mediation effect of cognitive load existed for recall of lyrics and rhythm, but not pitches. The correlation in recall accuracy between lyrics and rhythm (r = .798) was higher than that between lyrics and pitches (r = .478) or between pitches and rhythm (r = .575), although all the correlations were significant at the .001 level. It seems that words and rhythm are encoded and retrieved in association with each other (Cason, Astésano, & Schön, 2015; Peretz, Radeau, & Arguin, 2004; Purnell-Webb & Speelman, 2008). Wilson and her colleagues (2011) reported that while singing with lyrics non-expert singers engaged their language network extensively with greater activation of the right hemisphere, whereas expert singers showed less engagement of their language network. Whereas non-music majors showed a higher relationship between lyrics and rhythm in this study, music majors might show a different degree of relationship among lyrics, pitches, and rhythm, which can be another topic for further research.
The underlying assumptions of these findings were that seeing the lyrics during aural instruction allows learners to process the verbal information both visually and aurally; the dual processing of verbal information reduces the cognitive load in the aural channel, leading to more capacity to process musical information. This logic stands on the empirical findings called modality effect in the cognitive load literature (Low & Sweller, 2014; Mayer & Moreno, 1998; Mousavi et al., 1995; Tabbers et al., 2004; Tindall-Ford et al., 1997) and verbal redundancy effect in the working memory literature (Lewandowski & Kobus, 1993; Montali & Lewandowski, 1996; Moreno & Mayer, 2002; Penney, 1989). Given the limited capacity of working memory (Baddeley, 1992; Cowan, 2000; Miller, 1956), attempting to maximize the learner’s capacity by employing two channels—aural and visual—is the basic notion of these effects. Specifically, the modality effect in the cognitive load literature indicates that when learning from pictures and words, if the verbal information is presented aurally as narration instead of visually as text, the visual channel might be offloaded with regard to the verbal information, having more capacity to process the pictures and in turn enhance the learning (Mayer & Moreno, 1998). The verbal redundancy effect in the literature of working memory refers to the finding that presenting identical words in two modalities as written and spoken words helps learning compared with only using one modality (i.e., written or spoken). However, if there is another mode of information in the visual channel such as diagrams or graphs, the verbal redundancy is not effective (Kalyuga, Chandler, & Sweller, 1999; Kalyuga & Sweller, 2014).
In this study, when participants learned the song with visually presented lyrics, the verbal information remained in the aural channel; that is, the aural channel held two modes of information (verbal and musical). This condition does not satisfy the premise that each channel should hold one mode of information for the modality effect and the verbal redundancy effect to be established. However, the findings indicate that showing the lyrics induces less cognitive load in learners than not showing the lyrics. Perhaps the verbal redundancy effect emerged in this study even though the premise was not satisfied because the relationship between lyrics and tunes differs from the relationship between graphs (or diagrams) and words: lyrics and tunes in singing are temporally integrated, whereas graphs or pictures and words are not.
According to the theoretical framework of this study, the amount of cognitive load induced by the instructional condition influences whether seeing the lyrics is beneficial. Hence, in this study, choosing difficult songs that cause cognitive overload was key to demonstrating the modality effect or verbal redundancy effect. The two songs used were considered difficult in that there was no repeated phrase in the 16-bar songs and both were modal (“April” was Aeolian and “February Twilight” was Mixolydian) and in triple meter. In addition, the composer indicated that the songs were difficult. It may be the case that if the song is not cognitively demanding, seeing the lyrics would not matter; hence, future research could compare songs of different levels of difficulty.
A possible concern about showing the lyrics is whether learners pay more attention to the lyrics than to other musical information (Andress, 1986; Goetze et al., 1990; Levinowitz, 1987). In this study, participants remembered the songs similarly regardless of whether they saw the lyrics; learners’ recall accuracy was not reduced by seeing the lyrics. On the contrary, participants experienced less aural cognitive load when seeing the lyrics. Thus, showing the lyrics did not seem to diminish song learning, at least for young adults who were untrained singers. However, this issue should be tested with learners of different ages and different levels of musical expertise. Just as levels of musical expertise matter for which instructional condition is more effective (Ginsborg & Sloboda, 2007; Han, 2016), learners’ stages of language development and levels of reading comprehension might also affect the results.
It is worth noting that in the recall test, participants did not see the lyrics, regardless of whether they had seen them while learning. Although participants were informed of this recall condition at the beginning of the experiment, a change in format between the learning and recall phases may introduce a bias in text recall in favor of the “without VPL” condition. Had participants instead seen the lyrics during both learning and recall for the sake of consistency, another bias, this time toward the “with VPL” condition, might have ensued: the visually presented lyrics would have functioned as an influential retrieval cue for song recall, whereas the counterpart condition would have had no retrieval cue. Alternatively, if the lyrics had been shown on the recall test regardless of the instructional condition, participants might have incurred additional cognitive load to process and integrate the newly presented visual information with aural information when recalling the song they had learned without visually presented lyrics, which is not an ideal situation either. Thus, I asked participants to sing the learned songs from memory without VPL regardless of the instructional condition; the potential bias toward the “without VPL” condition in song recall due to the format change is a limitation of the study methods.
In this study, hypotheses were tested after a single instructional session. In reality, learners more typically learn a song over time. However, findings gained during a short instructional period might be advantageous for understanding the process learners experience at that moment of learning. Recording and analyzing the entire learning session, not only the recall test session, would allow the researcher a more in-depth analysis of the results, which could be facilitated by an acoustic analysis program such as PRAAT (Boersma, 2001).
The basic idea of this study followed the dual-task investigations conducted by Brünken and his colleagues (2002, 2004), which measured cognitive load through RTs. While they did not intend to statistically examine the relationship between cognitive load and learning outcome in their studies, this study utilized a mediation analysis to understand how learning a difficult song with visually presented lyrics affects song recall through cognitive load. The dual-task approach seems a promising method to assess cognitive load induced under various instructional conditions, and with a mediation analysis, researchers could better understand the underlying mechanisms of the learning process under certain learning conditions.
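For readers unfamiliar with mediation analysis, the general logic can be illustrated with a minimal sketch. This is not the path analysis reported in this study: it is a simplified product-of-coefficients model that treats observations as independent (ignoring the within-subject, counterbalanced design), and all function names, variable names, and data values below are hypothetical and synthetic. The instructional condition (0 = without lyrics, 1 = with lyrics) predicts the mediator (RT), and the mediator predicts recall accuracy while controlling for condition; the indirect effect is the product of those two paths.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope of y regressed on a single predictor x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def two_predictor_coefs(x, m, y):
    """OLS coefficients of x and of m in the model y ~ x + m (intercept included)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    dx = [xi - mx for xi in x]
    dm = [mi - mm for mi in m]
    dy = [yi - my for yi in y]
    sxx = sum(d * d for d in dx)
    smm = sum(d * d for d in dm)
    sxm = sum(p * q for p, q in zip(dx, dm))
    sxy = sum(p * q for p, q in zip(dx, dy))
    smy = sum(p * q for p, q in zip(dm, dy))
    det = sxx * smm - sxm ** 2  # determinant of the centered normal equations
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def mediation(x, m, y):
    """Product-of-coefficients mediation: path a (x -> m), path b (m -> y given x)."""
    a = slope(x, m)
    direct, b = two_predictor_coefs(x, m, y)
    return {"a": a, "b": b, "direct": direct,
            "indirect": a * b, "total": slope(x, y)}


def bootstrap_indirect_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a * b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            est = mediation([x[i] for i in idx],
                            [m[i] for i in idx],
                            [y[i] for i in idx])
        except ZeroDivisionError:
            continue  # skip a degenerate resample (e.g., only one condition drawn)
        estimates.append(est["indirect"])
    estimates.sort()
    return (estimates[int(n_boot * alpha / 2)],
            estimates[int(n_boot * (1 - alpha / 2)) - 1])


# Synthetic data (assumed values, NOT the study's data): seeing the lyrics
# shortens RT, and shorter RT goes with higher recall accuracy.
rng = random.Random(1)
condition = [0] * 36 + [1] * 36
rt = [600 - 80 * c + rng.gauss(0, 30) for c in condition]
recall = [100 - 0.05 * r + rng.gauss(0, 1) for r in rt]

result = mediation(condition, rt, recall)
ci = bootstrap_indirect_ci(condition, rt, recall)
print(result)
print("95% bootstrap CI for the indirect effect:", ci)
```

In an ordinary least squares setting such as this one, the total effect equals the direct effect plus the indirect effect, which is why an instructional condition can show no reliable direct effect on recall while still influencing it through the mediator. A repeated-measures design like the one used here would call for a model that accounts for the dependence between each participant's two observations.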
Conclusion
The purpose of this study was to ascertain whether showing the lyrics to non-music majors induces less cognitive load in the aural channel compared to not showing the lyrics, which in turn leads to better recall accuracy of the two songs they learned. Although learning a difficult song with visually presented lyrics did not directly affect recall accuracy of the learned song, seeing the lyrics induced less cognitive load in the aural channel compared to not seeing the lyrics. Also, the mediation analysis suggests that seeing the lyrics indirectly increased recall accuracy of the lyrics and rhythm through its positive effect on RTs, producing less cognitive load compared to not seeing the lyrics.
Instructional design should be based on many considerations such as the instructional time, goals, and characteristics of students (e.g., their levels of expertise in the domain). For students with low levels of musical expertise, learning a difficult song only using the aural channel might cause them cognitive overload, hindering their learning. Given limited instructional time and the need to be more efficient, several strategies should be considered to prevent learners from experiencing cognitive overload while learning a difficult song. Showing the lyrics of the song could be one strategy for that purpose, at least for young adults with lower levels of musical expertise than music majors.
Supplemental Material
Supplemental material, POM-19-1635.R2_supplementary_materials for Mediating effect of cognitive load in song learning with visually presented lyrics by Yo-Jung Han in Psychology of Music
Footnotes
Acknowledgements
My sincere thanks go to Dr Joanne Rutkowski, Dr Andrea Halpern, and the reviewers for their invaluable comments on the drafts of the manuscript.
Author’s note
Yo-Jung Han is currently an independent scholar. This article is based on the second study of her doctoral dissertation at the Pennsylvania State University, conducted while at Appalachian State University and submitted while at the University of Maryland.
Data accessibility statement
The data set of recall accuracy and reaction times can be accessed through Supplemental materials online.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
