Abstract
This paper provides an integrated social semiotic framework for analyzing intertextuality in multimodal advertising discourse. Following the distinction between manifest intertextuality and interdiscursivity, our model entails the three interrelated components of explicating what the intertextual sources are, how they are constructed with multimodal resources, and how they interact with the promotional discourse. Analysis of 30 popular video advertisements shows the fundamental role of character voices and different social semiotic activities in achieving the purpose of promoting products and services. Through intertextual devices, the advertisements construct multiple identities, including authoritative and peer ones, to evoke different reading positions. In particular, the identity of middle-class urbanites sharing their experiences and values with the audience is dominant. The intertextual devices achieve promotional, relational, and entertainment functions, and the promotional function is realized through sharing, recreating, expounding, and reporting activities, while the recommending activities only occupy a very small portion of the screen time of the advertisements. The framework of multimodal intertextuality provides a useful lens for explicating the complex meaning-making resources, their communicative functions, and hidden ideologies in advertising discourse, which can further provide new insight into the social reality.
Introduction
Considered as ‘parasitic discourse’ (Cook, 1992), or ‘hybridized discourse’ (Rahm, 2006), advertisements often make intertextual references to other texts produced by the mass media such as popular films and television drama (Conradie, 2011). Meanwhile, they often appropriate other genres such as news and scientific reports by exploiting their discursive structures and stylistic features for the purpose of promoting products and brand images. Therefore, intertextuality is deemed as a fundamental rhetorical device, or ‘persuasive metatextuality’ (Peterson, 2005: 135) for advertising, which serves the purpose of enhancing the persuasive effect of ads while reducing the appearance of commercial nature (Li, 2019). Numerous studies have been conducted to elaborate the hybridized nature of advertising discourse in various contexts, focusing on the promotional functions of intertextuality and its textual manifestation (e.g. Ali and Aslam, 2016; Cook, 2001; Feng and Wignell, 2011; Koskela, 2013; Nemčoková, 2014; Marina, 2019). For example, Cook (2001) distinguished between intra-generic intertextuality and inter-generic intertextuality. The former refers to the inclusion of voices of other advertisements, while the latter refers to the evocation of knowledge of other genres, such as films or novels. Li (2019: 509) considered intertextuality as a glocalization strategy for representing different social practices to achieve group affiliation in cross-cultural advertisements. Feng and Wignell (2011) was perhaps the most thorough analysis of intertextuality in television commercials. Working with the notion of intertextual voice, they distinguished between character voice and discursive voice, and examined how they engage with the product. However, previous research has not provided a systematic framework for unraveling the complexity of the sources of voices, particularly the discursive voice, and their multimodal construction. 
Addressing this gap, this study aims to develop an integrated framework for analyzing multimodal intertextuality in advertising discourse drawing upon various social semiotic frameworks, including the engagement system (Martin and White, 2005), visual grammar (Kress and van Leeuwen, 2006), and register typology (Matthiessen, 2009). In what follows, we will first provide a theoretical account of intertextuality and interdiscursivity. Then we will introduce our data and analytical method, which is followed by our proposed frameworks and analysis of multimodal manifest intertextuality and interdiscursivity. We conclude by arguing how the framework of multimodal intertextuality can provide a new tool for explicating the complex meaning-making processes in advertising discourse.
Intertextuality and interdiscursivity from a social semiotic perspective
Discourse studies of intertextuality are generally based on Bakhtin’s notion of dialogism, which holds that each utterance is ‘filled with echoes and reverberations of other utterances to which it is related by the communality of the sphere of speech communication’ (Bakhtin, 1986: 91). As Kristeva (1980: 66) put it in her widely quoted definition of intertextuality, ‘any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another’. Fairclough (1992) introduced the notion of intertextuality to critical discourse analysis and distinguished between ‘manifest intertextuality’ and ‘constitutive intertextuality’. He refers to constitutive intertextuality as interdiscursivity in order to highlight the focus on the rules of discourse (i.e. genre). For the textual manifestation of intertextuality, Fairclough (1992) summarizes three types of intertextual relation, namely sequential intertextuality (where different texts or discourse types alternate within a text), mixed intertextuality (where texts or discourse types are merged in a more complex and less separable fashion), and embedded intertextuality (where one text or discourse type is clearly contained within the matrix of another).
Theories of intertextuality (and interdiscursivity) were developed for analyzing linguistic discourse, but recent studies have applied the notions to multimodal discourse. Royce (2013) argues that multimodal texts are inherently a result of other texts and provides an illustrative analysis with a text from The Economist magazine. Working with the Hallidayan notions of Field, Tenor, and Mode, he finds that the multimodal text he analyzed interacts with other texts ‘not only in terms of the text’s subject matter and the issue addressed (i.e. Field), and the attitudes expressed towards this issue (i.e. Tenor), but also in the ways that the magazine has produced them compositionally (i.e. Mode)’ (p. 12). Interdiscursivity has become a key notion in genre analysis (e.g. Bhatia, 2014, 2017). According to Bhatia (2017), interdiscursivity refers to ‘various forms of hybrid and relatively novel constructs by appropriating or exploiting established conventions or resources associated with other genres and professional practices’ (p. 35). As Kress (2010: 25) observes, ‘since the 1970s there has been an ongoing and increasingly far-reaching blurring of the boundaries of genres and of generic types’. Researchers have explored the multimodal construction of interdiscursivity, particularly in digital genres (e.g. Anderson and van Leeuwen, 2017; Feng, 2019; Lam, 2013). Developing this line of research, the present study aims to provide an integrated social semiotic framework for analyzing multimodal intertextuality. Following Fairclough (1992), we distinguish between manifest intertextuality and interdiscursivity and propose two frameworks for our analysis.
Social semiotics is an approach set out in several foundational works, such as Hodge and Kress (1988), Kress (2010), Kress and van Leeuwen (1996, 2001), and van Leeuwen (2005), which aims to examine the social world as it is represented in various (multimodal) semiotic artifacts. The social semiotics of multimodal communication primarily derives from Halliday’s (1978) theory of language as social semiotic. The concepts of meaning and resource are central to the theory, which ‘jointly mark out the domain to be named: ‘meanings to be made’ and ‘means for making meanings’’ (Kress, 2010: 108). Various frameworks have been developed to model the meanings and meaning-making resources in language, visual image, gesture, and so on. Following this approach, our analysis is concerned with two key questions, namely, (1) what are the sources of the intertextual references, and (2) how are they constructed with multimodal resources?
The multimodal analysis of manifest intertextuality draws upon the Engagement system in Appraisal theory (Martin and White, 2005), which proposes that a dialogic perspective should be concerned not only with ‘who/what is the primary source of the proposition’, but also with how the intertextual voices position themselves with respect to the proposition (i.e. the strategies of engagement). Similarly, Bazerman (2004: 94) states that ‘intertextuality is not just a matter of which other texts you refer to, but how you use them, what you use them for, and ultimately how you position yourself as a writer to them to make your own statement’. Therefore, our social semiotic framework for analyzing manifest intertextuality contains three components, namely, tracing the source of the voice, explicating the multimodal construction of the source, and investigating how the source engages with the current discourse through multimodal resources.
The multimodal analysis of interdiscursivity is concerned with the domains of activity and subject matters in advertising discourse as well as how they are multimodally constructed. As Fairclough (1992) suggests, each genre is associated with a specific social practice and advertising discourse represents the practice of promoting products and services. However, through time, the genre of advertising has become a site where various types of discourse interact and contribute to the purpose of promotion. To explicitly model the activities involved and how they are mixed, we draw upon Matthiessen’s (2009, 2015) field-based register typology. The typology entails seven types of semiotic activities that are primarily realized by language and other semiotic resources, which are summarized as follows:
Data and analytical method
The data for this paper are video advertisements from Taobao.com, the largest online retail platform in China. It is a worldwide e-commerce trading platform that allows users to buy beauty products, fashion items, household appliances, snacks, and other goods from all over the world with one click, and the site has a huge user community in China. The main consideration for collecting data from Taobao is that its advertisements cover a wide range of product categories. The reason for analyzing video advertisements is their complexity in using intertextual and multimodal resources, which is essential for the development of valid analytical frameworks. Based on the Interbrand 2021 Global Top 100 Brands list and the complete list of Chinese brand products listed on the Maigoo website, we searched Taobao.com and selected 30 video advertisements based on product sales. Understandably, products with a high volume of sales are mostly small products for daily use, and the 30 advertisements included daily care products (N = 12), clothes (N = 7), maternity and infant products (N = 6), and food (N = 5).
Our analytical method is what Serafini and Reid (2019) call ‘multimodal content analysis’, which draws upon qualitative content analysis (Schreier, 2012) and social semiotic theories as introduced above, for ‘conceptualizing and analyzing a selected corpus of multimodal phenomena’ (p. 2). As Serafini and Reid (2019) argue, this approach allows researchers to ‘move beyond traditional analytical perspectives and procedures of quantitative content analysis to address the complexities inherent in the multimodal nature of contemporary modes of representation and communication’ (p. 2). Drawing upon relevant social semiotic frameworks, we annotated and classified the sources of manifest intertextual voices and the social semiotic activities in the dataset. This is followed by in-depth qualitative analysis of how they are constructed with multimodal resources. We need to acknowledge that our dataset might not be big enough to generalize our findings, and the main value of the paper is its development of an integrated framework for analyzing multimodal intertextuality.
The multimodal construction of manifest intertextuality
As introduced in Section 2, our social semiotic analysis of manifest intertextuality contains three components, namely, identifying the source of intertextual voice, explicating the multimodal construction of the source, and investigating how the source engages with the current discourse through multimodal resources. In this section, we are going to demonstrate our multimodal content analysis of the data guided by the framework and report the salient features.
The first step of analysis is to map out the sources of voices in the advertisements. Analysis in this regard is essentially bottom-up, involving the identification and categorization of sources to reveal their salient features. However, in analyzing advertising discourse, we find van Leeuwen’s (2007) framework of Authorization useful for our categorization of sources as the most important function of the external voices is to legitimize the advertised products. van Leeuwen (2007) distinguishes between Custom, Authority, and Commendation in his Authorization framework. Custom entails the reference to Tradition and Conformity (e.g. most people use this product); Authority includes personal authority (e.g. parents to children) and impersonal authority (e.g. the law or policymaking bodies); Commendation may come from Expert (e.g. doctor) or Role model (e.g. celebrity). The distribution of the sources in the dataset is shown in Table 1, from which it is clear that all the 30 advertisements make use of more than one external voice. The most prevalent type of source is various characters, particularly common people, experts, and role models. In the 30 advertisements we collected, experts appear in 9, role models appear in 12, and unidentified common people such as children and mothers appear in 18 advertisements.
Table 1. Distribution of the sources of manifest intertextual voices.
We distinguish between characters who are specified (i.e. with names) and characters who are not (i.e. generic, such as unnamed children or doctors). In advertising, experts and celebrities are usually specified, while common people are often nameless and generic social types. Generic representations are not concerned with the represented individual per se, but with the social groups and the attributes they embody. The ‘common people’ are usually (idealized) consumers of the advertised products or services who are good-looking, happy, fashionable, healthy, and so on, depending on the target consumers of the products/services. Three features are noticeable from our dataset. First, an overall middle-class identity is highlighted, where the characters are mostly well-off urbanites. Second, the generic attributes of women in some advertisements manifest independence and pursuit of personal happiness, but the majority depicts traditional roles of the mother and housewife. Third, some advertisements use white people (presumably Europeans) as sources of commendation though the product is targeted at Chinese consumers.
The second step of analysis is explicating how the identity of the source is constructed. We will focus on characters as they are the most prevalent in the data. Their identities can be articulated through language or embodied through non-linguistic resources. If an identity is articulated, there are options of labeling through on-screen captions or subtitles, or referencing through characters’ utterances or voiceover. If embodied, the identity is revealed through what a character does, how he/she looks, what he/she wears, who he/she is with, and so on. Drawing upon Kress and van Leeuwen’s (2006) visual grammar, we can distinguish between character actions and analytical features, with the latter referring to appearance, clothing, and accessories. Aside from embodied resources, cinematographic resources like shot distance and camera angles can also play a role in constructing the identity of the characters. Different characters are represented in different ways. For example, the identity of generic characters is usually not articulated as they can be easily recognized through visual depictions. In contrast, specific characters often require explicit labeling or reference. Even though most celebrities can be recognized by the target audience (i.e. embodied), they are also labeled to highlight their status (e.g. ‘巨星’ (super star)). The representation is often multimodal to enhance credibility, involving both verbal and visual resources, especially in representing experts. For example, in the NASENTEL OPHERA cosmetic advertisement (see Figure 1), the screenshot on the left is a generic representation of doctors using the resource of analytical features (esp. clothing). The screenshot on the right features a middle-aged man wearing a lab coat (analytical feature), who is actually the man in the middle in the image on the left, and his name and title (Mr. Wataru Tokue, Senior Research and Development Scientist) are overlaid on his body in blue and bold characters.
The move from generic to specific representation is crucial for enhancing the reliability of his identity and authoritative status. In the screenshot on the left, the man would be seen as an ordinary doctor or even just an actor in a doctor’s uniform. The labeling of his name and designation makes him a real and identifiable individual, which is further corroborated by his action of conducting an experiment in a lab.

Figure 1. Cosmetic advertisement. © Image copyright NASENTEL OPHERA, reproduced with permission from https://www.bilibili.com/video/BV1g54y1k7ZK/?spm_id_from=333.337.search-card.all.click.
The third step of analysis is to investigate how the sources engage with the advertising message. Similar to the construction of identity, engagement can be articulated through language or embodied through non-linguistic resources. For articulation, we draw upon Martin and White’s (2005) system of Proclaim, which includes the three options of concur, pronounce, and endorse. Concur refers to overt declaration of agreement, using expressions such as ‘naturally’ and ‘undoubtedly’. Pronounce refers to explicit authorial emphasis or intervention, such as ‘I contend that. . .’, ‘It is clear that. . .’. Endorse refers to the employment of external sources to support the claim, such as ‘experiments/research shows/proves that. . .’. Engagement can also be constructed through non-linguistic resources, including paralinguistic features (e.g. intonation, loudness, and pitch) and nonverbal behaviors. First, paralinguistic features may be used to indicate or reinforce the attitudinal stance of the characters, for example, emphasizing a statement through stress. Second, nonverbal behaviors such as gestures or facial expressions can work alone or accompany verbal articulations to express the speaker’s stance toward the product. For example, facial expressions may include desire for the product, excitement when seeing the product, enjoyment of the product, or seriousness and affirmativeness when recommending the product. A good case in point is shown in Figure 2. In the TV advertisement of YiliWeikezi banana milk, the character (a celebrity) makes a ‘slurping’ sound (shot ①) and shows an expression of enjoyment (shot ② and shot ③) after drinking the product. Finally, in shot ④, he presents the product to the audience in a close-up shot, directly gazing at the audience with an earnest expression. This is a typical way in which characters engage with advertised products, which can be seen as a visual Proclaim of his stance to constrain alternative voices.

Figure 2. YiliWeikezi banana milk advertisement. © Image copyright Yili Milk, reproduced with permission from https://play.tudou.com/v_show/id_XMTY4MTU3MDk4MA==.html.
The framework for analyzing manifest intertextuality is summarized in Figure 3. Application of the framework to empirical data analysis enables us to disentangle the complexity of meaning-making in multimodal advertising discourse in terms of the sources that are drawn upon, the multimodal construction of the sources, and how they engage with the product/service.

Figure 3. Manifest intertextuality in multimodal advertising discourse.
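The three components of the framework (source, construction of the source's identity, and engagement) can be recorded as a simple annotation schema. The Python sketch below is illustrative only: the class and field names (`VoiceAnnotation`, `Realization`, and so on) are hypothetical labels introduced here, not part of the coding scheme reported in the paper, and only the category values are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class SourceType(Enum):
    # Character categories reported as most prevalent in the dataset.
    EXPERT = "expert"
    ROLE_MODEL = "role model"
    COMMON_PEOPLE = "common people"


class Realization(Enum):
    # Identity and engagement can each be articulated or embodied.
    ARTICULATED = "articulated"  # through language (labels, utterances, voiceover)
    EMBODIED = "embodied"        # through non-linguistic resources


class Proclaim(Enum):
    # Options of Proclaim (Martin and White, 2005) for articulated engagement.
    CONCUR = "concur"
    PRONOUNCE = "pronounce"
    ENDORSE = "endorse"


@dataclass
class VoiceAnnotation:
    """One annotated voice in one advertisement (hypothetical record layout)."""
    source: SourceType
    specified: bool  # True if the character is named, False if generic
    identity: Set[Realization] = field(default_factory=set)
    engagement: Set[Realization] = field(default_factory=set)
    proclaim: Optional[Proclaim] = None

    def is_multimodal_identity(self) -> bool:
        # Identity is both labeled in language and shown visually.
        return {Realization.ARTICULATED, Realization.EMBODIED} <= self.identity


# The scientist in Figure 1: named on screen (articulated) and shown in a
# lab coat doing lab work (embodied). Engagement is left empty because the
# text does not describe it for this character.
scientist = VoiceAnnotation(
    source=SourceType.EXPERT,
    specified=True,
    identity={Realization.ARTICULATED, Realization.EMBODIED},
)
print(scientist.is_multimodal_identity())
```

A record of this kind keeps the three analytical steps separate, so that counts of sources (as in Table 1) and qualitative notes on construction and engagement can be drawn from the same annotations.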
The multimodal construction of interdiscursivity
As introduced in Section 2, we draw upon Matthiessen’s (2009, 2015) field-based register typology to analyze interdiscursivity in advertising discourse. The typology is useful for understanding interdiscursivity because it allows us to examine which activities are involved in the advertisements and how they are mixed. Drawing upon this typology, we first annotated and quantified the social semiotic activities performed in all the advertisements to understand their overall distribution. As different types of activities may shade into each other and generate register hybridity (Halliday and Matthiessen, 2013), we further looked at the mixture of activities in each video. For videos that manifest an interdiscursive mix, we differentiated between the primary activity and the secondary activity and categorized the videos by their primary activities. The identification of the primary activity depended on the duration of the activity on the one hand and the foregrounding/backgrounding of the activity on the other hand. For example, if a substantial portion of a video features recommendations of the product, accompanied by non-diegetic background music, it was coded as recommending with recreating and was categorized as a recommending-oriented video.
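The coding rule described above, which weighs duration against foregrounding, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Segment` record, the boolean `foregrounded` flag, and the ranking order (foregrounded duration first, total duration second) are hypothetical simplifications, since the paper reports a judgment made by human annotators and does not specify a numeric rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """A stretch of video coded for one activity (hypothetical layout)."""
    activity: str       # e.g. "recommending", "recreating", "sharing"
    seconds: float      # duration of the segment
    foregrounded: bool  # False for backgrounded material such as ambient music


def primary_activity(segments: List[Segment]) -> str:
    """Return the activity with the most foregrounded time.

    Ties on foregrounded time are broken by total time. This ordering is an
    assumption made for illustration, not the authors' stated procedure.
    """
    if not segments:
        raise ValueError("at least one coded segment is required")
    foreground: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for seg in segments:
        total[seg.activity] += seg.seconds
        if seg.foregrounded:
            foreground[seg.activity] += seg.seconds
    return max(total, key=lambda a: (foreground[a], total[a]))


# Hypothetical video matching the example in the text: recommendations fill
# most of the running time while background music plays throughout.
video = [
    Segment("recommending", 40.0, foregrounded=True),
    Segment("recreating", 45.0, foregrounded=False),
]
print(primary_activity(video))
```

In this hypothetical case the music runs longer than the recommendations, but because it is backgrounded the video is still categorized as recommending-oriented, with recreating as the secondary activity.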
While social semiotic activities are concerned with authorial activities, we also looked at the specific activities, or subject matters, represented in the advertisements. It follows that the second step of the analysis is to summarize the subject matters of the videos in each category. Such an analysis provides a more nuanced understanding of the advertisements’ strategic foregrounding of certain information for the particular product/service promoted. Finally, at the micro level, the social semiotic activities are realized by linguistic, visual, and audio resources. The analysis at this level focused on the actual rendering of the activities and subject matters. For linguistic resources, we looked at prominent linguistic features in different activities, for example, language styles. For visual resources, the analysis drew upon Kress and van Leeuwen’s (2006) visual grammar to examine what/who is represented (participant), in what way it is represented (process), and under what circumstances it is represented (setting), as well as what camera positioning (e.g. close/medium/long shot; frontal/oblique angle) and visual effects are used. For audio resources, we took sound effects and ambient music into consideration, which are recognized by Machin and van Leeuwen (2016) as a specific semiotic mode with its own meaning potentials. In the advertisement videos, they mainly serve to create a pleasant atmosphere to engage the audience.
The distribution of the social semiotic activities is shown in Table 2. Not surprisingly, recommending is the most frequently occurring activity, appearing in 29 out of the 30 advertisements. However, it is important to note that recommending is never the primary activity; in most cases, it only occupies three or four seconds of the screen time at the end of the advertisement. This is also hardly surprising as the ads are trying to dilute their commercial nature. In most cases, the products are recommended through the utterances of the characters with different paralinguistic features. For example, the voices of male doctors in toothpaste ads are deep and serious, those of children are excited, and those of women in skincare ads are sweet and confident. Visually, recommending is realized through the visual performance of people in nine cases, usually a doctor or a celebrity holding up the products or pointing to them. In the rest of the data, a separate picture of the product with the product name and slogan is shown at the end of the advertisement.
Table 2. Distribution of social semiotic activities.
Recreating is the second most frequent, appearing in 27 advertisements, suggesting the importance of entertaining viewers. There are three ways in which the recreating activity is realized: narrative structure, background music, and performances (esp. dancing). Four advertisements have a narrative structure, usually with a setting in everyday life where a problem arises, then the problem is solved by the product and a happy ending is highlighted. Recreating is realized through visual performances coupled with ambient music in three videos, where the represented participants engage in performances like dancing and singing. Background music is used in all advertisements and the type of music varies depending on the type of product. For example, ads for infant products mainly use cheerful piano and songs accompanied by sweet laughter from babies. For skincare products, the music is mainly graceful violin and singing. In the case of commercials with dramatic scenes, the music varies according to the scene. For example, problem scenes are often accompanied by a tense rhythm, and as the advertised product solves these problems, the music becomes more relaxing and cheerful. In visual performances, the ambient music synergizes with the dance moves to further enhance the dramatic effect.
Sharing appears in 22 advertisements. Though it is less frequent than recommending and recreating, sharing plays a more important role as it is often the primary activity (in 19 advertisements). In the 22 sharing cases in our data, various social practices are represented. Nine of them are family scenes, where the main focus is on stories and conversations between parents and their children, and between lovers. In these cases, the use of a certain product is contextualized as an essential element of good parenting or happy relationships. Half of the cases feature characters sharing experiences of using the products in various situations constituting a cosmopolitan lifestyle, such as in the office, in the gym, in a restaurant, or on a trip, where the product is represented as crucial to the lifestyle. Notably, two advertisements feature women sharing their life attitudes about being independent and positive. The sharing activity is characterized by the use of personalized self-referencing, second-person singular address forms, colloquial expressions, as well as imperatives and questions, which construct the brand as a close friend interacting with the audience as if ‘chatting with individual members of them’ (Fairclough, 1992: 205). Emotion expressions, such as ‘love’ and ‘happiness’, are very frequently used. Visually, the represented participants’ smiles, hugs, kisses on babies, gazes, and interactive gestures, often in close-up or medium shots, further relate to the audience by engaging their emotions and eliciting responses from them.
A salient feature observable from Table 2 is that the number of activities far exceeds that of the videos (i.e. 90 activities in 30 videos), suggesting the prevalence of register hybridization in the advertisements. Indeed, there is no advertisement that only involves a single activity. The pattern of activity mixture is summarized in Table 3, where sequential intertextuality, embedded intertextuality, mixed intertextuality, and overlaid intertextuality are identified. Sequential mixture is the most common way of hybridization, appearing in all the advertisements. In these cases, different activities are juxtaposed in different stages of an advertisement. For example, the activity of recommending typically follows other activities. Embedded intertextuality is where one activity is accommodated in another one, for example, when sharing is a component in the overall narrative structure. For mixed intertextuality, recommending is sometimes rendered in the form of sharing, expounding, or reporting. In addition to the three forms of intertextuality proposed by Fairclough (1992), we also identified overlaid intertextuality, where the soundtrack and the visual track construct different activities. The most common scenario is where ambient music as a recreating activity is overlaid with other activities to engage the audience.
Table 3. Patterns of activity mixture (‘+’ for sequential, ‘=’ for overlaid, ‘^’ for mixed, ‘⊂’ for embedded).
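The notation for activity mixture can be made concrete with a small Python sketch that composes pattern strings from the four symbols. The helper name `mix`, the parenthesized output format, and the reading of ‘⊂’ as "first activity embedded in second" are assumptions made here for illustration; the paper does not specify how compound patterns are written out.

```python
# Symbols as given in the caption for patterns of activity mixture.
SYMBOLS = {
    "sequential": "+",  # activities juxtaposed in different stages
    "overlaid": "=",    # soundtrack and visual track construct different activities
    "mixed": "^",       # one activity rendered in the form of another
    "embedded": "⊂",    # one activity accommodated in another
}


def mix(relation: str, first: str, second: str) -> str:
    """Compose two activities (or sub-patterns) into a pattern string.

    For "embedded", `first` is read as contained in `second`; this reading
    of the symbol is an assumption, not stated in the source.
    """
    if relation not in SYMBOLS:
        raise ValueError(f"unknown relation: {relation}")
    return f"({first} {SYMBOLS[relation]} {second})"


# Sharing as a component of a narrative (recreating), followed by a
# recommendation, as in the two examples given in the text.
embedded = mix("embedded", "sharing", "recreating")
pattern = mix("sequential", embedded, "recommending")
print(pattern)  # ((sharing ⊂ recreating) + recommending)
```

Writing patterns compositionally in this way makes it explicit that a single advertisement can combine several types of mixture at once.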
In what follows, we use an advertisement to illustrate the types of activities, the subject matters, and their multimodal realization. As transcribed in Table 4, this 59-second advertisement is a complex multimodal discourse involving the authorial activity of a multimodal narrative (recreating activity), within which there is an expounding activity and a sharing activity (i.e. embedded intertextuality), a recommending activity that follows the narrative (i.e. sequential intertextuality), and ambient music throughout the advertisement (i.e. overlaid intertextuality). The main plot of the narrative is a little girl practicing dancing, constructed by the audio-visual resources of characters (a mother and a little girl), the process (practicing dancing), and the circumstances (home and stage). The first screenshot sets the scene, where a little girl is practicing dancing in the living room and her mother is reading. Judging from the living conditions, this is a middle-class urban family with the mother as the primary caregiver. The girl keeps falling down and in the second screenshot, the mother comes to teach her. Then in screenshot 3, she is dancing gracefully. Then the shot cuts to the bathroom where the activity of the mother helping the girl brush her teeth is shared. The voiceover in screenshot 4 ‘just like you’ draws an analogy between Colgate and the ‘mother’, and maps the mother’s care for children onto the brand. The fifth screenshot uses a visual metaphor in which a tooth (tenor) is personified as an animated person (vehicle) with strong muscles, providing a vivid illustration of the technical line of ‘injecting high calcium and fluoride to help keep her teeth strong’ (expounding). Screenshot 6 continues the sharing activity in which the mother and the girl are tapping on their teeth to show how strong they are. In screenshot 7, the shot cuts to the dancing plot again, featuring the ending of the girl’s successful performance on stage.
The girl is wearing a triumphant smile, showing her white teeth, and the scene is accompanied by the voiceover ‘with the support of you and Colgate, she will smile with confidence’, which unabashedly takes credit for the girl’s success. In this way, the effect of Colgate is naturalized as its contribution to the girl’s growth and success in the multimodal narrative. Finally, screenshot 8 features a doctor recommending the product to further corroborate the effect of the product. However, the promotion is represented as medical advice that serves the interest of the ‘patient’. Summing up the analysis, we can say that none of the scenes or lines are direct ‘sales pitches’, and the persuasive function is realized through the multiple activities of the success story of the girl, the sharing of teeth brushing, the explanation of the ingredients and functions of the toothpaste, and the doctor’s professional advice.
Table 4. Colgate toothpaste TV advertisement. © Image copyright Colgate China, reproduced with permission from http://www.bilibili.com/video/BV1BE411c7LU/?spm_id_from=333.337.search-card.all.click.
Discussion and conclusion
The two sections above have provided a detailed analysis of the sources of manifest intertextual voices and social semiotic activities as well as how they are multimodally constructed. The multiple sources of manifest intertextual voices and social semiotic activities are designed to achieve multiple communicative functions, including the promotional function, the relational function, and the entertainment function. While the promotional function is realized congruently through the activity of recommending, what is remarkable is its incongruent realization through all other activities, namely, recreating, sharing, expounding, and reporting. Through featuring different characters and different activities, the advertisements construct different subject positions, most notably, an authoritative one (e.g. experts and the activity of expounding) and a peer one (e.g. common people and the activity of sharing), to construct different writer–reader relations and different reading positions (Feng, 2023). Such exploitations of the boundaries between experts and laypersons have recently been discussed extensively in studies on the hybridized identities of vloggers and YouTubers. For example, from a genre perspective, Bhatia (2018) investigated how amateur experts engaged in ‘interdiscursive performance, including colloquial talk to curate an authentic and real self; jargon and formal instructional talk to curate an expert self; and promotional talk to brand their YouTuber self in the construction of identity online’ (p. 108).
Aside from the promotional function, the advertisements also serve relational and entertainment functions to engage the audience. In sharing activities, the representation of common people, as well as celebrities, using intimate expressions and close-up camera shots, serves to construct an illusion that the characters are experientially relatable and interpersonally close to the audience. The recreating activity also serves to entertain the audience with dramatized stories, humorous expressions, graceful dancing, upbeat music, and so on. It seems the advertisements have fully assimilated the logics of the entertainment industry to engage the interest of the audience in the pan-entertainment era. Based on the dominant role of sharing and recreating activities as well as the related character voices in the data, we can argue that personalization and recreationalization are the two key features of the advertising discourse, and our frameworks provide an explicit understanding of how they are realized through multimodal resources.
More specifically regarding who the characters are and what activities they are engaged in, the analysis shows that the products and services are promoted and legitimized by their association with an idealized ‘good life’ to which the target consumers aspire. The characters manifest a middle-class identity characterized by a modern, affluent, enjoyable, and colorful life. Meanwhile, the consumption of the advertised products is represented as an essential component and a crucial indicator of the ‘good life’ (see Hung and Li, 2006). This echoes Shields’ (1992) comment that identity and lifestyle are increasingly constructed through consumption practices. Overall, the advertisements construct an idealized xiaozi [petite bourgeoisie] lifestyle, which marks the increasingly consumerist nature of contemporary urban Chinese society (Peng, 2019). A direct consequence of consumerism on advertising is that brands are forging symbolic and emotional linkages between the advertised products and potential consumers, instead of merely highlighting the pragmatic aspect of the product. As observed by van Leeuwen (2005), with the increasing quantity of competing brands, consumer goods producers have begun to ‘elaborate symbolic systems to transform them [their products] into lifestyle signifiers, to differentiate them in terms of the kinds of expressive meanings that were traditionally associated with individual styles: feelings, attitudes, personality traits’ (p. 145). Seen in this light, the advertisements also perpetuate an idealized xiaozi lifestyle in China.
On a critical note, the advertisements focus on well-off urban families, career women, professionals and celebrities, manifesting an unmistakable middle-class orientation, while less privileged people from the working class and from the countryside (and their activities), who account for an absolute majority of the Chinese population, are excluded (cf. Feng, 2023). Moreover, the multimodally constructed authenticity of the characters and activities is just an illusion to manipulate viewers into buying the products. At the end of the day, all a buyer can get is the product, and the characters’ lives have nothing to do with him or her. Williamson (1978) wrote about the effect of advertising more than four decades ago and suggested that it ‘offers us an image of ourselves that we may aspire to but never achieve’ (p. 64), which is arguably even more applicable to video advertisements today due to their enhanced capacity for creating authenticity.
To conclude, this paper provides an integrated framework for analyzing intertextuality in multimodal advertising discourse. Following the distinction between manifest intertextuality and interdiscursivity, our social semiotic model entails explicating what the intertextual sources are, how they are constructed with multimodal resources, and how they interact with the promotional discourse. Analysis of 30 popular video advertisements shows the fundamental role of character voices and different social semiotic activities in achieving the purpose of promoting products and services. The advertisements construct multiple identities, including authoritative and peer ones, to evoke different reading positions. In particular, they foreground middle-class urbanites sharing their experiences and values with the audience. Altogether, the advertisements achieve promotional, relational, and entertainment functions, and the promotional function is realized interdiscursively through sharing, recreating, expounding, and reporting activities, while the recommending activities occupy a very small portion of the screen time of the advertisements. We close by arguing that the framework of multimodal intertextuality provides a useful lens for explicating the complex meaning-making resources, their communicative functions, and hidden ideologies in advertising discourse, through which we can also gain new insight into the social reality. Further studies may analyze multimodal advertising discourse in various contexts from a diachronic perspective, an intercultural perspective, and so on, and the framework can also be applied to (and modified for) the analysis of a wide range of multimodal discourse types beyond advertising.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this article was supported by the Social Science Planned Project of Shandong Province [Grant Number: 22CYYJ14] and by the National Social Science Fund of China [Grant Number: 22BYY130].
