Abstract
This article presents a multimodal genre analysis of crowdfunding proposals, an emerging web-based genre for raising funds from internet crowds for a project or venture. Based on an analysis of nine of the most-funded Kickstarter crowdfunding proposals, the authors describe the generic move structure using a semiotic approach and examine the role of visual images in constructing meaning within and across moves. The analysis shows that visual images facilitate potential backers’ sense-making in two dimensions: rhetorically, by persuading through the establishment of ethos, logos, and pathos, and compositionally, by helping to achieve cohesion within and between moves and to facilitate move mixing, embedding, and positioning. This study also attests to the value of a case-based approach to examining multiple influences on genre emergence.
The crowdfunding proposal (CFP) is an emerging web-mediated genre for a new business model called crowdfunding, a method of raising funds for a project or venture by appealing to the internet crowd. Unlike traditional agency-centered fund-raising practices, crowdfunding is disintermediated through the use of social media (Frydrych et al., 2016). The business model appeared in the 1990s and quickly rose to prominence with the proliferation of online platforms such as Kiva, IndieGoGo, and Kickstarter. While the business model of crowdfunding has attracted increasing attention in recent years from researchers of business and entrepreneurial finance (e.g., Belleflamme et al., 2014; Brown et al., 2017; Kaminski & Hopp, 2020), the genre of the CFP has so far received inadequate attention from genre analysts.
This study examines the CFP because it is a new member of the fund-raising genre colony (Bhatia, 2017) that represents the rapidly changing business communication in this digital era. To examine the genre, we juxtapose it with other fund-raising genres and promotional genres in general to see more clearly the “overlapping territorial claims” (p. 158), on one hand, and the CFP's distinctive generic features, on the other. The past decades have witnessed a substantial body of literature on fund-raising genres, from fund-raising letters (e.g., Bhatia, 2017; Mann & Thompson, 1992) to philanthropic or research grant proposals (e.g., Connor & Mauranen, 1999; Cotos, 2019; Feng, 2011).
Like advertising and sales letters, fund-raising genres have two primary communicative functions: to inform and to persuade. Given these shared communicative functions, advertising letters have invaded the territorial integrity of fund-raising genres—even philanthropic fund-raising letters display strong similarities with advertising in their rhetorical moves to persuade their readership. But fund-raising genres are distinctive in that they emphasize community participation and seldom focus on competitive campaigns or use pressure tactics, such as setting a deadline for a limited offer (Bhatia, 2017). These distinctive features are probably due to the nature of their appeal, that is, for funds for research or a morally grounded enterprise rather than a business proposition.
The CFP appropriates the generic features of both fund-raising and corporate advertising. It is similar to other fund-raising genres because it also aims to solicit funds, in some cases for nonprofit projects. Yet it differs from traditional fund-raising genres in terms of the target reader (family, friends, and strangers connected via social media rather than institutions or community members) and medium (webpages rather than printed texts). The CFP is similar to advertising because both aim to promote a product or service. Yet unlike corporate advertising, the CFP solicits support for products or ventures that are, at best, only halfway completed (Stanko & Henard, 2017), so potential backers have little way to compare the offer in a CFP with similar products or services (Kaminski & Hopp, 2020), and their investment commitment is driven more by personal interest and affinity than by rational calculations. It is crucial, then, for crowdfunding proposers to establish the legitimacy of the campaign by constructing an entrepreneurial story (Frydrych et al., 2016) that appeals emotionally to potential backers.
In the past decade, researchers have examined factors that may affect the outcomes of crowdfunding. For instance, Mitra and Gilbert (2014) analyzed a corpus of 45,000 CFPs using the Linguistic Inquiry and Word Count tool and found that “the language used in the project ha[d] surprising predictive power—accounting for 58.56% of the variance around successful funding” (p. 49). Among the top predictors are phrases that reflect the principle of reciprocity, emphasize the limited supply, suggest social proof and social identity, and express gratitude. Kaminski and Hopp (2020) employed a neural network and natural-language processing approach to analyze text, speech, and video object–related metadata in 20,188 CFPs. They found that phrases aimed to trigger excitement (“perfect” or “amazing”) or express inclusiveness (“you,” “community”) are positive predictors of campaign success. While these quantitative studies by business researchers relate linguistic features directly to campaign outcome, they do not relate them to the macrolevel rhetorical moves, thus providing fragmental insights for CFP writers.
As a web-based genre, the CFP owes its effectiveness to the integrative use of multiple semiotic modes: texts, images, hyperlinks, and other graphic elements marshaled into a coherent whole by the page layout. Thus, there has been increasing research on how visual communication may impact crowdfunding success. Kaminski and Hopp (2020) reported a hierarchy of different semiotic modes in motivating potential backers to make decisions. For instance, potential backers are more likely to respond to visual cues when they are in a state of low attention, and they will read textual materials more carefully only when the visual cues induce a high attention state. And Lee et al. (2021) found that the animation, color, and explanatory information of a progress bar have an impact on potential backers’ positive affect, trust, and ability to understand information and, in turn, on their decision to support a project.
Yang et al. (2020) were among the few to examine the interaction between different communication modes in the CFP. They found that while text length has a significant effect on fund-raising outcomes, the effect decreases as the number of images increases. They attributed this “overshadowing effect” (p. 6) between modes to information redundancy and cognitive load. While most of the literature emphasizes the primacy of visual modes in crowdfunding communication, Grebelsky-Lichtman and Avnimelech (2018), based on regression analysis, rejected the hypothesis that nonverbal communication has a stronger impact than verbal communication on crowdfunding success.
In response to recent calls for more qualitative investigations of crowdfunding presentations, several studies have examined rhetorical argumentation in crowdfunding (Cudmore & Slattery, 2019; Palmieri et al., 2022; Tirdatov, 2014). Cudmore and Slattery examined the rhetorical appeals used in the spoken narration of crowdfunding videos and found that successful videos were more likely to implicitly claim project creators’ credibility (ethos), use positive descriptors like “awesome” and “passionate” (pathos), and introduce benefits and rewards for backers (logos). Their cross-cultural analysis revealed that while successful U.S. videos were more likely to provide information on why the creator needed donations (logos), successful non-U.S. videos were more likely to discuss the exclusivity of the product (pathos), a strategy similar to “pressure tactics” that emphasizes limited supply or availability. Palmieri et al. (2022) also analyzed crowdfunding video narrations, identifying not only microlevel rhetorical strategies (ethos, pathos, and logos) but also macrolevel argumentation (societal problem–solution; personal problem–solution; desire–project). Comparing successful and unsuccessful campaigns, they argued that a compelling narrative makes a case for a societal problem that is experienced by the creator and relevant to the backers.
These rhetorical studies, however, did not examine the rhetorical appeals of visual images. Although some studies have quantitatively examined the role of visual communication in predicting crowdfunding outcomes, qualitative descriptions of the visual designs of this genre, from the macrolevel layout to the meaning-making of individual images, have unfortunately been absent in prior research.
This study offers an in-depth analysis of the CFP from the genre perspective. To characterize this new genre, we identify regularities in terms of rhetorical structure because they reflect both the conventions and the expectations of the crowdfunding platform. Meanwhile, we tap into variations and innovations because the interplay of verbal and visual resources adds complexity to the genre construction. By integrating multimodal analysis (Kress & Van Leeuwen, 2006) with rhetorical move analysis (Swales, 1990), we address the following two research questions:
What is the generic move structure for the CFP?
What role do visual resources play in the CFP, and how do they help construct meaning within and across moves?
Theoretical Background
Before we describe our methods for this study, we will first provide a theoretical background for rhetorical move analysis and multimodal analysis.
Rhetorical Move Analysis
The term “move” was first used by Swales (1981) in his seminal work describing the model for creating a research space in research article introductions. Move analysis, a long-standing analytical approach in genre analysis, is used to identify the functional units of texts in genres, namely moves and steps (subunits of moves). Bhatia (1993) further developed move analysis by introducing the analytical unit of nondiscriminative “strategies” that afford flexibility in writers’ strategic choices (unlike “steps,” which indicate a prototypical order). He also identified a nonlinear, interactive move structure in legal genres, which has made move analysis a flexible rather than reductive method.
Move analysis has been employed to analyze the rhetorical structures of a wide range of academic and professional genres (or subgenres), such as research article introductions (e.g., Samraj, 2002; Swales, 1981), advertorials (Zhou, 2012), business letters (e.g., Bhatia, 1993; Dos Santos, 2002), and grant proposals (e.g., Connor & Mauranen, 1999; Cotos, 2019; Feng, 2011). Bhatia (2017) proposed a seven-move structure for fund-raising letters: establishing credentials, introducing the offer, offering incentives, enclosing documents, soliciting response, using pressure tactics, and ending politely. And Mehlenbacher (2017) identified 13 moves in her analysis of two CFPs that solicited support for research on Kickstarter: establishing an exigence, establishing a response, occupying the response, outlining means, stating achievement, claiming benefits, claiming competence, claiming importance, updating supporters, illustrating the project, linking to resources, identifying limitations, and appealing for support.
The findings of those two studies have provided a point of departure for our study's move analysis of CFPs; however, they did not address how visual or hypertextual elements may constitute an important part of moves and move structure. In fact, move analysis has mostly been confined to verbal genres or to verbal parts of genres that deploy multiple semiotic resources. Liu (2020) examined the emerging academic genre of video abstracts, attempting to revise move analysis to incorporate the description of multimodal features, or “move units,” such as talking head, voice-over, minimovie, and animations. Yet Liu did not describe how such move units were integrated in each move to help achieve communicative purposes, and Liu even lamented that “move analysis is unattainable in examining the multimodal textual resources and arrangements” (p. 441).
In recent years, researchers have attempted to extend genre theory to account for multimodal web-mediated genres. Askehave and Nielsen (2005), for instance, proposed a two-dimensional model to account for the characteristics of nonlinear, multimodal digital genres. As they argued, users of web documents often shift between acting as a reader and acting as a navigator. Because of these constant modal shifts, we must analyze web genres at both the reading and navigating modes at three levels: the communicative purpose, moves/links, and rhetorical strategies.
Andersen and Van Leeuwen (2017) extended Hasan's concept of “generic structure potential” and her work on the functional elements of shopping episodes to discuss the evolution of genres when they move online. They described an online fashion shopping site as consisting of an array of microgenres, or webpages (e.g., orientation, catalog, product information, purchase), which are genre hybrids (i.e., serving different communicative and pragmatic purposes) that can be navigated in different ways. Although some obligatory stages are still needed, online shopping is now realized by navigating between and through a series of microgenres. Andersen and Van Leeuwen thus considered online genres as “multi-generic structure potential” (p. 203). Lam (2013) used “multimodal move analysis” more specifically to investigate another online genre, internet group buying. In addition to identifying the move structure, she examined the functions and deployments of multimodal resources and hyperlinks within and across the moves, thus practically extending move analysis “towards a more generally semiotic, rather than strictly linguistic, approach” (Garzone, 2002, p. 295).
Multimodal Analysis
Our study also draws on a multimodal analytical framework (e.g., Bateman, 2008, 2014; Cheong, 2004; Kress & Van Leeuwen, 2006; Marissa et al., 2011; Marsh & White, 2003; O’Halloran, 2008; Royce, 2007) to examine how different semiotic resources coconstruct meaning and help realize rhetorical moves. Stemming from systemic functional linguistics, this framework holds that visual images, just like language, can be used to realize ideational (representational), interpersonal, and textual (compositional) metafunctions.
As Kress and Van Leeuwen (2006) theorized, ideationally, visual images can represent the world either narratively (unfolding actions and events, processes of change, transitory spatial arrangements) or conceptually (revealing class, structure, or meaning). The narrative structures are featured via a vector, such as a line with an explicit indicator of directionality.
Interpersonally, visual images constitute and maintain the interaction between two kinds of participants: represented participants (i.e., people, places and things that are the subject of visual images) and interactive participants (the people involved in the act of communication, i.e., the image producers and viewers; Kress & Van Leeuwen, 2006). The interaction can be realized through visual configurations including contact (demand vs. offer dyad in terms of whether the participant's gaze or gesture does or does not directly address the viewer, respectively), social distance (size of frame), and attitude (the degree of involvement and power relationship delivered through the arrangement of angle and perspective). Modality, a linguistic term referring to the truth value, also adds to the interpersonal meaning. A visual may have high modality by representing people, places, and things as if they are real or low modality by representing them as imaginings or fantasies. Modality judgment, however, is socially dependent.
And compositionally, the meaning of an image is realized by the way represented participants of an image are brought together mainly through three interrelated systems: information value, salience, and framing (Kress & Van Leeuwen, 2006). The positioning of the elements in different zones of an image (left and right, top and bottom, center and margin) endows these elements with different information values (e.g., given and new, ideal and real). Elements’ placement in the foreground or background, relative size, and contrasts in tonal value or color create a hierarchy of salience in attracting the viewer's attention. And devices such as dividing lines present a framing of the elements that indicates how they belong to or separate from each other. These three compositional principles, as Kress and Van Leeuwen argued, “apply not just to single pictures” but also to “integrated text” that combines text and images (p. 177).
Thus, the rhetorical moves in this multimodal genre of CFP, whether they come in text, images, or an “integrated text” of both, can be perceived as represented participants on the webpage layout, and their relative positioning and mutual interactions can be examined in terms of informational value, salience, and connectedness. This affords a useful approach to extending move analysis to the examination of visual resources and their deployment in fulfilling communicative purposes.
Text–image relations have been a vital concern in multimodal research. Relevant studies can be traced back to Barthes (1967), who described text–image relations as consisting of mainly two types: relay (the verbal text extends the meaning of the image or vice versa) and elaboration (the verbal text restates the image or vice versa). Most empirical research has confirmed such a complementary relationship. Cheong (2004), for example, in describing the generic structure potential for print advertisements, revealed the bidirectional investment of meaning between the announcement (the linguistic mode) and the lead (the visual mode). And Royce (2007) referred to the correspondence between verbal and visual modes in constructing ideational, interpersonal, and compositional meaning as intersemiotic complementarity.
But research has also identified relations other than complementarity. Reviewing 24 previous studies, Marsh and White (2003) identified 49 text–image relations and grouped them into three categories according to whether text and image have little relation (e.g., to decorate, elicit emotion, or motivate response), a close relation (e.g., to reiterate, organize, relate, or explain), or an extending relation (e.g., to provide alternatives, information, or contrast). And in their analysis of an article in New Scientist magazine and its website, Marissa et al. (2011) revealed how linguistic and image elements in a text worked at cross-purposes that reflected the competing discourses of science and media.
To examine how different types of interrelations between verbal and nonverbal communication may influence crowdfunding outcomes, Grebelsky-Lichtman and Avnimelech (2018) proposed four concepts: supportive congruency, challenging congruency, leakage discrepancy, and adaptive discrepancy. These four types of interrelations differ in terms of whether the two communication modes are congruent or discrepant in delivering positive or negative messages. Without comparative analysis of verbal and visual meaning making at the ideational (representational), interpersonal, and textual (compositional) dimensions, the four concepts they proposed seem to have simplified the complex intermodal relationship by focusing only on consistency in attitudinal expressions.
As this literature review has shown, although a large body of research has examined the complex intersemiotic relations and proposed various taxonomies, studies examining how text and image combine to achieve communicative purposes at the move and genre level are still rather limited, particularly in the context of Kickstarter CFPs. The integration of multimodal and rhetorical move analyses in examining this emerging fund-raising genre could offer further insights for developing analytical methods for new media genres (Askehave & Nielsen, 2005).
Method
We selected Kickstarter because it is one of the most influential crowdfunding platforms. At the time we conducted this study, it had over 21 million backers, 226,444 successfully funded projects, and $6.8 billion pledged to projects (Kickstarter, n.d.).
Data
Using purposive sampling, we collected nine Kickstarter CFPs (see Appendix). Kickstarter presents 15 categories of CFPs, including music, games, film and video, arts, publishing, design, comics, fashion, technology, and others. To select CFPs that represent best-practices strategies, we first focused on the 10 categories that had the most successfully funded projects. Meanwhile, considering that the nature of different categories could influence the way the CFP is designed, we chose three categories from those 10 categories that were quite different in nature: games, arts, and design. Then within each category, we searched for the 10 most-backed projects and the 10 most-funded projects, two different measures for crowdfunding success. We then randomly selected three projects that were on both the most-backed and most-funded lists in each of the three categories.
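The final step of this selection procedure, drawing three projects at random from those that appear on both ranked lists, can be sketched in code. The following is a minimal illustration in Python and not a record of the tooling actually used in the study: the function name `select_projects`, the project identifiers, and the fixed random seed are hypothetical, and the two ranked lists are assumed to have been collected by hand from the platform's category pages.

```python
import random

def select_projects(most_backed, most_funded, k=3, seed=0):
    """Randomly pick k projects that appear on both ranked lists.

    most_backed, most_funded: lists of project identifiers (hypothetical),
    e.g. the 10 most-backed and 10 most-funded projects in one category.
    """
    # Keep only projects that satisfy both measures of success.
    funded = set(most_funded)
    on_both = [p for p in most_backed if p in funded]
    if len(on_both) < k:
        raise ValueError("fewer than k projects appear on both lists")
    # A seeded generator makes the random draw reproducible.
    rng = random.Random(seed)
    return rng.sample(on_both, k)

# Hypothetical identifiers for one category; real lists would hold 10 each.
backed = ["g1", "g2", "g3", "g4", "g5", "g6"]
funded = ["g2", "g3", "g5", "g6", "g7", "g8"]
chosen = select_projects(backed, funded)
```

Running the same function once for each of the three categories would yield the nine-proposal corpus. The seed is included only so that the draw can be repeated; the source does not state how the random selection was carried out.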
Data Analysis
We took Bhatia's (2017) seven-move structure for fund-raising letters as our starting point for formulating the generic move structure for CFPs. In identifying rhetorical moves, we used the communicative purpose as the defining criterion, and because we see move analysis as a semiotic rather than a linguistic approach (Garzone, 2002; Lam, 2013), we analyzed textual and nontextual elements in an integrated way. In addition, we identified move boundaries by examining typographical features (e.g., repetitions or discontinuities of fonts, colors, and graphic templates) and framing devices (e.g., the use or absence of white space, demarcation lines, vectors) that indicate connections or disconnections between information units. Given the nonlinearity of webpages, we formulated the move structure along both the horizontal and the vertical axis “as a kind of map” (Van Leeuwen, 2005, p. 85). Unlike the traditional text-based move analysis that describes move sequence, our analysis used move positioning to describe the two-dimensional deployment of rhetorical moves, drawing on Kress and Van Leeuwen's (2006) compositional trilogy of information value, salience, and framing.
We then identified and calculated variations, including move omissions, recurrences, mixing, and embedding. Move mixing is defined as the copresence of two or more moves (in verbal or visual modes) in one graphic space or clause complex. Move embedding refers to the embedding of one move (particularly its visual semiotics) in another move even though the two moves seem unrelated in communicative purpose or the transition between the two moves seems abrupt. In the initial coding, we identified move mixing and embedding as irregularities; in the second round, we defined and systematically coded these move variations.
Using the taxonomies of Kress and Van Leeuwen (2006; see Figure 1), we first analyzed individual visual images, including photographs, caricatures or sketches, maps, tables, and charts. For mutually exclusive categories that have clear judgmental criteria, two of us coded the nine CFPs independently, checked for complete agreement, and then calculated the number of visuals that belong to each category in total and within each move. But for representational meaning, modality, and compositional meaning, simple labeling was either impossible or inadequate. For representational meaning, one image may present several narrative or conceptual processes; for modality, judgment of whether the visual represents what is considered real or true is socially dependent; and for compositional meaning, general layout had to be considered. Therefore, for each image, we offered a description in the form of a comment, which described not just the image itself but also its relations to the texts and images in close proximity and the move in which it was embedded. These qualitative and quantitative analyses at the precoding stage formed the basis of our initial coding of the visual images’ functions.

Figure 1. Scheme for precoding visual images.
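For the mutually exclusive categories, the precoding workflow of comparing two independent codings and then counting visuals per category and per move can be illustrated with a short sketch. This is a hedged example in Python under stated assumptions, not the procedure's actual implementation: the function name `tally_agreed_codes`, the image identifiers, and the sample codes are hypothetical, and both coders are assumed to have coded the same set of images.

```python
from collections import Counter

def tally_agreed_codes(coder_a, coder_b):
    """Count images per category and per (move, category) where coders agree.

    coder_a, coder_b: dicts mapping an image id to a (move, category) pair.
    Returns (disagreements, totals, by_move).
    """
    # Images on which the two independent codings differ need discussion.
    disagreements = sorted(
        img for img in coder_a if coder_a[img] != coder_b.get(img)
    )
    agreed = {img: code for img, code in coder_a.items()
              if img not in disagreements}
    # Totals across the corpus, then broken down by move.
    totals = Counter(category for _, category in agreed.values())
    by_move = Counter(agreed.values())
    return disagreements, totals, by_move

# Hypothetical codings of four images for the "contact" category.
a = {"img1": ("Move 2", "offer"), "img2": ("Move 2", "offer"),
     "img3": ("Move 3", "demand"), "img4": ("Move 1", "offer")}
b = {"img1": ("Move 2", "offer"), "img2": ("Move 2", "demand"),
     "img3": ("Move 3", "demand"), "img4": ("Move 1", "offer")}
disagreements, totals, by_move = tally_agreed_codes(a, b)
```

In this sketch, any image listed in `disagreements` would be discussed and recoded before the final counts are taken, which mirrors the requirement of complete agreement described above. The categories that resist simple labeling (representational meaning, modality, and compositional meaning) are deliberately left out because they were handled through descriptive comments rather than counts.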
We then generated themes based on the comments we made in the initial coding. These themes served to construct our argument about the communicative functions of visuals in the CFP. After generating these themes, we went back to the data, reassessed the codes, refined the themes, and collected images to support our argument. Two of us coded all nine CFPs as a group, checked coding discrepancies every step of the way, and discussed these discrepancies until agreement was reached.
Results
To address our first research question, we present the rhetorical move structure that we identified in our CFP corpus, including its constitutive moves and move positioning. Then we address our second research question by introducing the rhetorical and compositional functions of visual images within and across moves.
Rhetorical Move Structure of the CFP
Our analysis identified eight moves deployed along both vertical and horizontal dimensions (see Figure 2): gaining the viewer's attention (Move 1), introducing the offer (Move 2), establishing credentials (Move 3), soliciting support (Move 4), expressing gratitude (Move 5), encouraging online communication and community building (Move 6), describing transactional details (Move 7), and offering counterarguments to risks and challenges (Move 8).

Figure 2. Diagram of CFP move structure.
Five moves (Moves 1, 2, 3, 4, and 8) are obligatory; that is, they appear in all the CFPs in the corpus (see Table 1). Among these five obligatory moves, three of them (Moves 2, 3, 4) overlap with the core moves of fund-raising letters (Bhatia, 2017); considering that we included offering incentives and using pressure tactics as subsidiary strategies of the soliciting support move, five out of seven moves of fund-raising letters were actually found in our CFP data set. Further, the overlapping three moves (Moves 2, 3, 4) recurred more frequently than other moves; on average, we found recurrences of Move 2 five times, Move 3 three times, and Move 4 four times per CFP. Thus, we argue, this emerging genre bears “the chromosomal imprint of ancestral genres” of fund-raising (Jamieson, 1973, p. 163).
Table 1. Move Variations in CFPs.
Meanwhile, this emerging genre departs from antecedent fund-raising genres because of the “changed conditions of existence” (Miller, 2017): the new media and technologies that make innovative crowdfunding models possible. For example, Move 1, what potential backers would see first when they open the webpage, incorporates multisemiotic elements typically seen in print advertisements (Cheong, 2004) to pique viewers’ interest. Not seen in traditional fund-raising genres, this “shop window” move is designed in accordance with the practices of web design. And because the web affords an opportunity to extend the original “friendfunding to crowdfunding” (Borst et al., 2018), Moves 5 and 6 function to express thanks to the supporting internet crowd and encourage virtual community building.
The medium properties have shaped the CFP in terms of not only what moves are included but how moves are deployed in both vertical and horizontal dimensions. In contrast to traditional move analyses that examine linear move sequence, our analysis explores how the relative positioning of the moves affords them varying degrees of information value and visual weight (see Figure 2). The move positioning along the vertical axis conforms to the ideal–real pattern, as Kress and Van Leeuwen (2006) discussed. The most eye-catching move is Move 1 because viewers tend to start at the top and then scroll downward (Bateman, 2008). Immediately below Move 1 are the introduction of the offer and the story of why and how the creator developed the project (i.e., Moves 2 and 3). These three moves usually occupy the upper space of the webpage with salient photographs or caricatures that engage viewers and bring them visual pleasure. As Table 2 shows, Move 2 includes the largest number of visual images (on average, 14 visual images per CFP, or 50% of the total visual images); Move 3 has an average of four visual images per CFP, or 14% of the total. By contrast, Move 7, which provides mechanical details of the offer, such as delivery time or locations, usually appears at the lower part of the webpage, waiting to be read until the viewer has an interest in pledging support. Only three CFPs sporadically include visual images in Move 7. Move 8, which addresses the viewers’ potential concerns and offers counterarguments, includes no visual images and is mandated by Kickstarter to appear at the bottom of the page.
Table 2. Number of Visual Images in Moves and Mixed Moves.
Thus, the vertical structure of the moves follows the common practice of marketing-oriented websites by endowing the top section with the information value of the ideal (the promise of the product) and the lower section with the information value of the real (providing practical information). Moreover, the top section gains more salience because of the large number of visual images used in Moves 1 to 3.
Hypertext menu bars, scrollbars, and white space create the frame lines that divide the whole webpage into different areas. Below Move 1, the vertical triptych that is commonly seen on websites is also constructed in the CFP with the help of white space and two scrollbars. At first sight, salience seems to be given to the moves in the middle column because of their central positioning and accompanying visual images that attract the viewer's gaze. But given the design of two independent scrollbars, the move of soliciting support always remains on the right side even when the viewer scrolls down the middle column to different moves. As such, this move of soliciting support is not merely perfunctory. While the central-column space describes the offer and its value, the right-column space motivates the viewer to take action—to click on the hyperlinks to make the purchase. This spatial arrangement fits quite well with Kress and Van Leeuwen's (2006) argument that “the elements placed on the left are presented as Given, and the elements placed on the right as New” (p. 181). Given is what exists, the moves that describe the offer, whereas new is what remains tentative, the move that entices the viewer to say yes to the project.
The two-dimensional move positioning, then, has changed the traditional linear rhetorical structure into a topological one, with rhetorical moves linked in spatial, locative terms, carrying not only their respective social functions but also different information values and degrees of salience. We will now describe each of the eight moves in detail.
Move 1: Gaining the Viewer's Attention. Just as the abstract in research articles fulfills the dual role of summarizing the article and gaining readers’ attention from the start (Swales, 1990), this first move in the CFP fulfills the dual role of showcasing the most attractive aspects of the offer and grabbing viewers’ attention from the beginning. In addition to the headline, it consists of multisemiotic elements, such as the lead, announcement, action button, creator name, and pledging information (see Figure 3). Given its window-shopping function, some elements (e.g., action button) repeatedly occur in other moves.

Figure 3. Schematic drawing of Move 1.
Usually the headline, in the form of a catchphrase, presents the core feature of the offer with positive descriptors, as in “Arsenal, the intelligent camera assistant,” or “Pebble Time - Awesome Smartwatch, No Compromises.” The lead, using Cheong's (2004) term, is the most salient visual element because it is often the first image that the viewer sees. The lead of the Arsenal CFP, for example, constitutes a reactional structure (Kress & Van Leeuwen, 2006) in which Arsenal, the intelligent camera assistant installed on the camera, is the Reactor; the landscape in the long shot is the Phenomenon; and a vector is formed by an eyeline, that is, the direction in which the camera is shooting. Arsenal does the looking through the camera like a human being, “capturing amazing images.” This reactional structure makes the viewer feel as if they were standing behind the camera, taking stock of the landscape ahead. The lead uses soft focus and colors to present the hazy and glamorized landscape as a fantasy or promise that Arsenal can offer. The modality of the background scenery is purposely softened in contrast to that of the foreground image of the product itself, which is in sharp focus. This contrast provides both rational appeal (to the product) and emotional appeal (to amazing images that Arsenal can capture).
The announcement in the upper right is the most salient linguistic item that foregrounds the essential feature of the offer. It is often intersemiotically recontextualized in relation to the lead and vice versa. The announcement of the Arsenal CFP, for example, in the form of a command—“Unlock the potential of your DSLR or mirrorless camera and capture amazing images in any conditions”—not only reinforces the image act represented by the narrative structure of the lead but also clearly indicates the functions and the value of the offer.
The action button below the announcement motivates the viewer to act—to either get the offer or learn more about it. The creator's name and pledging information (i.e., number of backers, funds raised, and real-time campaign updates) are provided in the lower right. These publicly accessible data, as Frydrych et al. (2016) argued, “contribute to a project narrative that emerges organically from the interaction of the participants” (p. 107).
Move 2: Introducing the Offer. Move 2, which also integrates verbal and visual elements, is realized by two strategies: (a) describing the offer in terms of its characteristic features and functions and (b) indicating the offer's value in terms of its novelty, usefulness to individuals, or contributions to the community. Some CFPs, particularly those of technology products, use both strategies. In the case of Arsenal, the verbal and visual elements under headings such as “What Is Arsenal?” “Artificial Intelligence Capabilities,” and “Hardware Specs” describe the offer whereas the elements under headings such as “Get Complete Wireless Control,” “Take Great Shots in Tricky Light,” “Keep Everything Sharp,” and “Capture Loooooong Exposures” indicate the offer's value.
The photographs under each heading intersemiotically complement the accompanying textual explanations. The “What Is Arsenal” photograph, for example, performs the image act of “Offer” (the term used by Kress & Van Leeuwen, 2006, as opposed to “Demand” in describing how images construct interpersonal meaning)—offering the represented participant (the product Arsenal in this case) to the viewer using a top-down angle. This high angle affords the viewer (i.e., the potential backer) a privileged position and presents the product as the object of dispassionate scrutiny. In sharp focus and high resolution against an all-black background, the product is given high modality. The accompanying textual explanation begins with a relational process (“Arsenal is the world's first intelligent assistant for DSLR and mirrorless cameras”) with positive descriptors followed by “you”-oriented sentences (“lets you,” “help you”) that directly address the potential backer's needs. In a similar vein, the textual explanation also performs the speech act of introducing the offer.
Some CFPs, particularly those in the Games category, devote space mainly to the first strategy (i.e., describing the offer), likely because there is no need to discuss the value or benefits of the game. For example, the Frosthaven CFP devotes almost 60% of its space to introducing the game's storyline, components, characters, and rules because these features are what most interest game players (i.e., potential supporters). By contrast, some CFPs, particularly those with a philanthropic or political agenda, more frequently use the second strategy by emphasizing community needs and stating the proposal's mission, such as in the We the People CFP, which states explicitly, “WE BELIEVE ART HAS THE POWER TO WAKE PEOPLE UP.”
Move 3: Establishing Credentials. Move 3 is often realized by one or more of the following strategies: telling stories of how the offer (product or project) matured over time, constructing a positive persona of the creator, and incorporating endorsements from typical and credible beneficiaries, celebrities, or media. These strategies are often integrated, as in the Arsenal CFP:
Hi, my name is Ryan. I’m a software developer, amateur landscape photographer, and the creator of Arsenal. Arsenal started as a side project to automate my own photography techniques. Once I started showing other photographers the photos I was getting, there was so much interest that I decided to create something everyone could use.
Storytelling is an essential tool that entrepreneurs use to assemble capital, particularly when the target audience is a broad, unknown crowd (Frydrych et al., 2016). The Arsenal example establishes social affinity with unknown backers through first-person narration and legitimizes unfamiliar innovations by leveraging endorsement from “other photographers” and by emphasizing potential market needs (“something everyone could use”).
Similarly, the creator of the Mini Museum confers legitimacy on his venture with a compelling story of how he was inspired by his father at a very young age to want to build Mini Museums of various specimens:
Many things inspired me but it really started with my father. He was a research scientist…. Growing up, we had a subscription to every great science magazine—and living near Washington DC we visited the Smithsonian museums and saw dinosaur bones, meteorites and rockets almost every weekend. My father kept an amazing collection of artifacts at his laboratory office and also at home. In 1977, the historic year of Star Wars and the Atari 2600, my father had returned from Malta with some artifacts that he had embedded into epoxy resin. I had never seen this done before and it was beautiful. Then, all at once, I saw it—my first product idea. The mini museum. A grand collection within a manageable space.
In this data set, verbal narrations are always accompanied by photographs or caricatures of the creators. These images, mostly symbolic attributive patterns (Kress & Van Leeuwen, 2006), often use attributes, such as the microscope in the Mini Museum CFP or the mural painting in the We the People CFP, to establish the creator's identity as an expert and front-on, eye-level, close or medium-close shots to establish social affinity with the viewer.
The third strategy—incorporating endorsements—is also frequently used. The CFPs for games often cite fans’ and earlier backers’ words. To endorse the Frosthaven game, for example, the CFP uses the words and logo “Rahdo Runs Through” of a famous gamer, Rahdo.
For CFPs seeking support for technology products, however, the endorsement is often realized in a front-angle classificational display (Kress & Van Leeuwen, 2006) of logos of companies that use the product or of media that have recommended the product (e.g., the picture of logos entitled “Look who's talking about the Coolest” in the Coolest Cooler CFP). Abstract and schematic, the classificational display of logos encodes a seemingly objective attitude, yet it is no less appealing than direct quotes from fans or celebrities because it conveys to viewers that the product gets an abundance of endorsements.
Move 4: Soliciting Support. Move 4 solicits support by motivating the viewer to pledge. This move appears not only on the right side of the tripartite webpage with an independent scrollbar but also in various parts of the CFP, embedded in or mixed with other moves (e.g., in Move 1's action button). It is realized by one or more of the following five strategies: direct appeal, hyperlinks or buttons for pledging, different pledge levels, incentives, and pressure tactics.
Direct appeals often use first- and second-person pronouns to directly address the potential contributor and frequently foreground contributors’ benefits and the importance of community building, as in the Exploding Kittens CFP:
HELP US BUILD THIS GAME
We wanted to put this project on Kickstarter because it's the fastest way for us to get the game into your hands and make improvements as a community. We think this game is great, but we need your help making it even better.
Most CFPs offer several pledge levels, with each pledge tier providing different rewards. This strategy is an important way to motivate financial support because it coconstructs a narrative of success with backers by letting them choose from different levels of commitment and engagement.
Consistent with Cudmore and Slattery's (2019) study, our study found that pressure tactics were used (e.g., indicating exclusivity or limited supply or availability of a product) to make backers eager to pledge, as in the Sketchbook of Loish CFP:
Please note that the number of signed copies are limited and will no longer be available once these rewards run out! This campaign is the only opportunity to get this bundle of books, stretch goals and goodies. It will also be the only way to get copies of the book with the unique numbered certificate.
Although, on average, eight visuals per CFP demonstrate Move 4, the number of such visuals varies widely among individual CFPs—two include no visuals while one (Shenmue) includes 44 images. Most of these visuals are designed to show different reward levels or stretch goals, and three visuals are a mix of two or more moves.
Move 5: Expressing Gratitude. Mitra and Gilbert (2014) found that the two most important factors that increase backers’ liking are similarity and praise. Entrepreneurs use similarities to create a fan base and praise to extend appreciation and garner support. In our study, we have identified two similar moves—expressing gratitude (Move 5) and encouraging further online communication (Move 6). Here is an example of how the Mini Museum CFP used Move 5 to express gratitude:
The successful funding of my Kickstarter campaign has allowed me to pursue my life long dream of bringing the mini-museum to the world. I want to thank all of the backers and the entire Kickstarter community. I also want to thank my family and friends. Without their love and continued support, this project could not become a reality. I am so grateful.
In our data set, only two CFPs include a total of three visuals using mixed moves to express gratitude while introducing the offer, establishing credentials, or soliciting financial support (see Table 2).
Move 6: Encouraging Further Online Communication. While crowdfunding projects are still mainly funded by creators’ family and friends, the purpose of this financing mode is to use social media to attract backers who have not previously been part of the creator's network, and latent-tie backers have been positively associated with project performance (Borst et al., 2018). Therefore, offering CFP readers links to social media, such as Facebook, in order to increase latent-tie funding is an important move. Our analysis identified Move 6 in seven out of the nine CFPs, with varying length, frequency of occurrence, and placement. And this move serves not just to invite further interactions with potential backers but also to cultivate a virtual community with collaborators (e.g., application designers for the smartwatch project).
Move 7: Describing Transactional Details of the Offer. Move 7 informs viewers about the transactional details of the offer, including the timeline for product walk-throughs, budget estimates, delivery information, and warranties. Three CFPs in our study included visuals in this move (see Table 2). For example, the Arsenal CFP draws a topographical timeline, which is a temporal analytical process, showing tasks to be finished in months to come. And with a game character shown in the softly focused and colored background, the Shenmue CFP uses a pie chart to explain how the raised money will be used. The inclusion of visuals, though rare for this move, replaces in a more stimulating way otherwise insipid descriptions of the creators’ plans and bolsters creators’ ethos by constructing their personae as organized professionals.
Move 8: Offering Counterarguments to Risks and Challenges. Because many ventures are only halfway completed (Stanko & Henard, 2017), potential backers in crowdfunding often face high uncertainty about the state of product development and the creator's technical expertise. Thus, Kickstarter requires the CFP creator to address this uncertainty by including a section headed “Risks and Challenges.” It is the only section (other than reward tiers) that has mandatory rules for what content should be included. CFP creators usually address this rhetorical exigence imposed by Kickstarter with a tacitly known strategy: offering de facto counterarguments to risks and challenges.
All of the CFPs in our study used Move 8 (see Table 1). Their counterarguments usually begin with concession—acknowledging possible risks and challenges (e.g., “As an experienced product designer with years of experience, I know that every project comes with risks. It is possible that specimens could be destroyed in some sort of cataclysmic disaster,” Mini Museum CFP). Campaign creators often put themselves in potential backers’ shoes to talk about possible risks (e.g., “For a product like Arsenal, which introduces a new set of capabilities to the market, the immediate questions are, ‘Is this technically possible, and can this team do it?’”) or even mention risks in terms of benefits to potential backers (e.g., “As the COOLEST [Cooler] gets geared up for production, there will likely be some minor adjustments to improve the final product…. These will always be for the benefit of you, the consumer, and we will let you know exactly how and why any are made”).
After showing that they are fully aware of possible risks and challenges, campaign creators provide potential supporters with explanations and assurances to make their capital contribution an easy decision. They often use past or present perfect tense to explain what they have done to address potential risks (e.g., “Over the last 18 months, we overcame all the major technological hurdles on our roadmap. . . . we’ve mitigated the timeline risk,” Arsenal CFP) or introduce what they plan to do with a specific timeline (e.g., “Our plan is to get the files to the printer in October, which should give us plenty of time to print everything and get your games shipped to you by early next year,” Frosthaven CFP). In two CFPs, the creators addressed backers’ concerns by reemphasizing the teams’ experience and competence (e.g., “With regards to development of the game, we have an experienced team, deeply connected with the Shenmue franchise. With modern tools, experienced professionals, and the community of Shenmue by our sides, we have set ourselves up for success”).
In the Exploding Kittens CFP, the creators used a joking tone that accords with the characteristics of this explosion game to deny challenges (“Production of the game is simple: . . . The biggest challenge for us would actually be if you blow us out of the water”) and solicit support (“support our project and in return we’ll send you the playable game”). Thus, in Move 8, institutional regulations interact with creators’ tactful maneuvers; that is, creators use this mandatory section as a venue to address backers’ concerns and enhance their confidence in the project.
The Communicative Functions of Visual Images Within and Across Moves
We will now discuss the communicative functions of visual images within and across moves. As our analysis reveals, all nine CFPs incorporate visual images; each has 28 graphics, on average (see Table 2). These visual images facilitate potential backers’ sense-making in basically two dimensions: pragmatically (establishing ethos, logos, and pathos) and compositionally (achieving cohesion within and across moves and facilitating move mixing, embedding, and positioning).
Establishing Logos, Ethos, and Pathos. Given the lack of financial returns for backers, the CFP needs to influence backers’ investment decisions by appealing to their rationality (logos), establishing the creator's credibility (ethos), and most important, evoking potential backers’ emotion to generate both affinities and affect (pathos; Cudmore & Slattery, 2019; Palmieri et al., 2022; Tirdatov, 2014). Visual images, which are more likely to attract backers’ attention than text is (Kaminski & Hopp, 2020), play a crucial role in establishing logos, ethos, and pathos.
The rational appeal of logos can be seen in the visual representation of the project product. Our analysis shows that visual images in Moves 1 and 2 are primarily analytical or classificational (97%). The lead photograph from Pebble Time, for example, represents a classificational process, placing three smartwatches of different colors and band materials at equal distances from each other. The lead photograph from Exploding Kittens is an analytical representation of the product. Both representations show the offers in an objective, decontextualized way with a plain background. In terms of the interpersonal metafunction, the visuals in our corpus mostly use the Offer image (95%) in presenting the product to the viewer for close observation like a specimen on display. Most of these images are photographed either from a top-down angle, as if they were within reach and at the viewer's command (32%), or at eye level, showing equality and engagement (60%). While meant to be viewed as factual exposition, these photographs, with highly saturated color, deliver the alluring sensory quality of the products, so they serve the persuasive function well.
The CFPs also use factual presentations in establishing credentials (ethos), such as in classificational displays of users’ or media's logos to show endorsement. Even in the seven CFPs that show pictures of the project creator, not all pictures use the creator's gaze to directly address the viewers (i.e., Demand). In the Mini Museum CFP, for example, three of the four pictures that include the creator's images do not directly address viewers (i.e., Offer). In one picture, in which the creator is holding a projector lamp for taking the product picture, the creator is placed in the left corner of the frame, appears in dark shade, and is positioned obliquely. Thus, the creator is presented as being peripheral or old information, and the new information—the product, the Mini Museum—is positioned in the spotlight. Although seemingly just descriptive, the picture contains action and analytical processes, with the lamp's light forming a vector leading the viewer's eye from the creator's hand to the product, revealing to the viewer the work of the creator—how he brings the project into being—and in so doing, establishing ethos.
Most important to the success of a CFP, however, is conveying “warmth, involvement, psychological closeness, availability for communication, and positive affect” (pathos), which all positively relate to backers’ investment decision (Grebelsky-Lichtman & Avnimelech, 2018, p. 4181). The CFP designers have adopted various pathos strategies to make the visual images emotionally appealing. One strategy commonly used in the CFPs was to include the user's hand in the picture's foreground. In the Pebble Time CFP, for example, one picture displays a mosaic of images showing various cozy circumstances in which the product—the smartwatch Pebble Time—is used. The inclusion of the hand wearing the Pebble Time while caressing the cat or the teddy bear and the juxtaposition of the smartwatches with other digital devices such as the iPad or camera enable viewers to emotionally identify with the lifestyle that the picture depicts.
Another pathos strategy we identified in our analysis is the creation of “psychological salience” (Kress & Van Leeuwen, 2006), that is, making some participants stand out through size, contrast to background, layout positioning, color saturation or conspicuousness, sharpness of focus, and so on. For example, the lead of the Arsenal CFP creates a contrast between the foregrounded product and background scenery that is captured through different degrees of focus. In the We the People CFP, the “psychological salience” of the project offer—offset prints with iconic political images—is created through color contrast. In the lead photograph, these prints are of full-color saturation and thus high modality. In contrast, the crowd of people appear in black and white and thus low modality. This kind of contrast and modality configuration, on one hand, attracts viewers’ attention to the offer and, on the other hand, creates a visual impact and thus a strong emotional appeal, which is precisely what a political CFP needs.
The most direct strategy for establishing emotional bonding with potential backers, then, is to include a picture of the project creator. In such pictures, the creators are typically portrayed front on (90%), looking at the viewer with friendly smiles, and at eye level (92%) with their eyeline forming the vector that connects the creator with the viewer. This visual representation of project creators is distinctive to Kickstarter because Kickstarter appeals tend to convey that the creator is an everyday, noncorporate person trying to bring a good idea to life. In two games CFPs (Exploding Kittens and Frosthaven), however, the creators’ images are presented using caricatures rather than photographs. Although naturalistic photographs are usually perceived as high modality because of their correspondence with the naked-eye reality, cartoon figures are no less real. The creators’ images, as Kress and Van Leeuwen (2006) pointed out, are only “a representation, detached from his or her actual body” (p. 116). The innovative use of caricatures serves to vividly represent a “character” that appeals to the viewer—a cartoonist who can design humorous and creative cartoon figures in games.² In this sense, these caricatures are of high modality in that they “both realize and produce social affinity, through aligning the viewer…with certain forms of representation” (p. 171).
Another essential strategy to trigger excitement and motivate financial support is displaying multilevel rewards and stretch goals. Shenmue 3, for example, has as many as 38 reward levels ranging from $1 to $10,000. Each level is described in words accompanied by a graph with a semiotic mix (O’Halloran, 2008) of pictures, words, and mathematical symbols. This set of eye-catching graphs saves the viewer the trouble of reading explanatory words and constitutes a narrative that taps into potential backers’ varying degrees of commitment to future engagement. This strategy thus can be seen as an integration of logos and pathos.
Achieving Cohesion Within and Across Moves. While we just discussed the pragmatic function of the visuals, we will now report their compositional function within and across moves. Visual elements are heterogeneous, consisting of various semiotic means—not just pictures but also words, shapes, lines, hyperlinks, videos, and more globally, the layout (Kress & Van Leeuwen, 2006). They are multilayered; for example, in Move 1, the lead is a visual element, yet blocks of written text such as the announcement and the action button can also be seen as visual elements that together constitute the represented participants of a larger visual image (see Figure 3). Cohesion within a move is achieved when these heterogeneous visual elements are welded into a coherent whole. A case in point is Move 1 of the Exploding Kittens CFP. The lead picture contains an analytical process presenting the card game box, naturalistic and static; however, the bottom line of the box is tilted, forming a vector leading viewers’ eyes to the purchase button on the right. This vector serves as a cohesive device that facilitates a reading path, turning the narrative from conceptual to actional. If we say the local representational meaning in the lead is that of a conceptual and seemingly neutral offer of goods, then the tilting creates a sense of vectoriality and a dynamic force that brings all the elements together to constitute an actional structure of Move 1.
Cohesion across moves is usually achieved by using the same or varying typographical and color schemes, which Kress and Van Leeuwen (2006) termed “visual rhymes” (p. 204). The Shenmue CFP, for example, uses the same background template for headings of different sections. This cohesive device could be influenced by the prominent use of PowerPoint templates in presenting business reports or proposals nowadays. Further, the compartmentalization of a webpage into grids, while seemingly suggesting disconnections, links the rhetorical moves in spatial, locative terms and endows them with different informational values and visual weights. In this sense, the layout also serves as a cohesive device.
Facilitating Move Mixing and Move Embedding. As Table 1 shows, move mixing was found in three CFPs and move embedding in five CFPs. The frequent use of visual images as a meaning-making tool has given rise to these two interesting phenomena.
Two graphics from the Mini Museum CFP illustrate how well visual images facilitate move mixing. One graphic, with words superimposed on the picture of the creator, who is holding the pocket-sized product while gazing at the viewer with a smiling expression, illustrates mixed Moves 2 (introducing the offer), 3 (establishing credentials), and 4 (soliciting support). The representational meaning of this graphic is complex, consisting of five processes: actional (the creator holding the product), reactional (the creator looking at the viewer), mental/verbal (the sentences “I’ve been working on this pocket-sized collection of rare specimens for most of my life. . . . Now all I need to bring it to the world is your support!” superimposed on the picture to convey what the creator has in mind), analytical, and symbolic attributive (the pocket-sized Mini Museum presented as the attribute that establishes the protagonist's identity as the creator). The image–text relationship is that of “intersemiotic complementarity” (Royce, 2007): While the image act combines demand (addressing the viewer directly with a smile to establish connections and solicit support) and offer (presenting the product analytically), the verbal act similarly combines offer (offering information realized by the indicative mood “I’ve been working on…”) and demand (“All I need…is your support!”). Thus, the integration of several representational processes and interpersonal acts of both “demand” and “offer” has realized the mixing of the communicative purposes of Moves 2, 3, and 4 in this graphic.
Another graphic from the Mini Museum CFP that illustrates move mixing is the photograph in which the Mini Museum product is represented analytically in the spotlight as the main participant, thereby introducing the offer (Move 2). But despite being positioned peripherally, the creator, holding the projector lamp for taking the product picture, constitutes an action process and creates a working environment in which the product is brought to life. In this sense, the lamp, the Mini Museum, and the whole work environment are attributes that “establish the credentials” for the creator (Move 3).
As we discussed, frequent alternations between text and image are commonly seen in CFPs. In the Arsenal CFP, for example, photographs and explanatory texts alternately appear in introducing the product and its functions. In such cases, the text–image relations are those of elaboration (Barthes, 1967) or bidirectional investment (Cheong, 2004), with the visual image illustrating what has been represented linguistically or the verbal text elaborating the information illustrated in the visual image. Both the verbal and visual elements serve the same communicative purpose and belong to the same move.
In some other cases, however, text and image have “no identifiable relationship” (Marsh & White, 2003) and serve different communicative purposes. Such frequent alternation between text and image results in what we term “move embedding” (see Table 1). For example, in the Sketchbook CFP, in the nine instances of move embedding, the text introduces the transactional details of the offer (Move 7), such as the delivery time, location, or unforeseen delays, while the images display sensory photographs of the sketchbook or drawings from the sketchbook (Move 2). A rhythm of communication, then, is created through such cycles of alternating between psychologically salient (sensory images) and psychologically nonsalient (transactional texts), that is, between what is ideal (the promise of the product, Move 2) and what is real (mundane shipping details, Move 7; Kress & Van Leeuwen, 2006). This embedding of visual images of a different move provides a resting place for viewers’ eyes and elicits their aesthetic pleasure in and positive affect toward the CFP.
Discussions and Conclusions
Through a multimodal move analysis, this study has offered a generic description of the CFP in terms of its rhetorical move structure and codeployment of text and images in fulfilling communicative purposes within and across moves. We identified moves as realized by the integration of verbal and visual semiotic resources and conceived of the generic move structure as a visual semiotic one, developing along both horizontal and vertical axes, with moves as the represented participants of the webpage canvas, positioned with different information values (ideal–real, given–new) and levels of salience. And we recognized that the abundance of visual images is an essential contributing factor to move variations such as mixing and embedding, which serve their unique rhetorical purposes. This analytical approach is consistent with the Genre and Multimodality (GeM) model proposed by Bateman (2014), in which the traditional “staged linearity” of moves is now “mapped instead to the spatial distinctions provided by page-flow” (p. 254). By disproving Liu's (2020) pessimistic argument that move analysis is “unattainable” in researching multimodal texts, our study provides some implications for extending move analysis toward a semiotic approach.
Moreover, our study responds to Miller's (2017) call for an “empirical, case-based approach” (p. 26) to examining multiple influences on genre emergence. For the CFP, new media and technology are arguably the most prominent influences; they have not only reconfigured public proximities, which make possible the financing mode of crowdfunding, but have also endowed the medium with a significant role in shaping the CFP genre. It is its webpage medium that gives rise to the genre's multimodality and two-dimensional rhetorical structure, as we identified in the study. Although the CFP is geared to a new medium and financing mode, antecedent genres still impose constraints on what rhetorical moves are included. Three obligatory moves—Move 2 (introducing the offer), Move 3 (establishing credentials), and Move 4 (soliciting support)—overlap with core moves of fund-raising letters whereas the obligatory Move 1 incorporates elements of the advertising genre, including the sensory-appealing lead and attention-grabbing announcement.
Apart from the influences of medium properties and ancestral genre conventions, the crowdfunding platform's regulations also shape how the genre looks. Kickstarter offers the Creator Handbook: Telling Your Story (n.d.) to instruct creators on what to include and how. The inclusion and positioning of some moves are mandatory; for instance, the section “Risks and Challenges” is mandated to be placed at the bottom, and Move 4 (soliciting support) is designed to appear in both the center and the right column of the tripartite webpage. So when speaking of how move positioning bestows different informational values and visual weight on moves and thus conveys intended communicative purposes, we need to acknowledge the loci of power. The crowdfunding platform indeed has the power to enforce its expectations of how the webpage should be compartmentalized and how moves should be deployed; meanwhile, these expectations reflect the web-design practice gradually formed over the years based on the reception of users or participants.
While these influential factors coalesce to form the relatively stabilized rhetorical structure, we cannot ignore campaign creators’ agency in bringing innovations and changes to the genre. This agency can be seen in creators’ tacit strategy of turning the mandatory “Risks and Challenges” section into a counterargument to risks and challenges and in their strategic use of visual images and intersemiotic relations to achieve rhetorical appeals. Thus, this study showcases the interaction between structure and agency and between stabilization and innovation in a genre's emergence and evolution.
Based on its detailed description of the multimodal features of the CFP, this study complements earlier quantitative studies (e.g., Grebelsky-Lichtman & Avnimelech, 2018; Lee et al., 2021; Mitra & Gilbert, 2014; Yang et al., 2020) by confirming or contradicting their findings. Regarding intersemiotic relations, for instance, Yang et al. (2020) argued that multimedia promotes comprehension when the information in one mode cannot be understood alone but may cause overshadowing between modes when the information presented in different modes is intelligible and similar, leading to redundancy and cognitive load. They assumed, then, that “most campaign information on technology campaigns, such as product use, rewards, and logistics, is easily understood in isolation and thus can be presented only in the modality of text” (p. 14). But we found that even when the information was not difficult to understand, text and images were used in combination to introduce the product, rewards, and logistics. Instead of identifying overshadowing in this study, we identified diverse intersemiotic relations, including bidirectional investment of meaning between text and image; words superimposed on a picture communicating multiple purposes, thus giving rise to move mixing; and alternations of verbal and visual elements that belong to different moves in order to create a sensory rhythm and make the text more accessible and pleasant to read. As is unsurprising for a quantitative study, Yang et al. (2020) tested the moderating effects of the number of images on the relationship between text length and fund-raising outcomes without examining the complex meaning making of visual resources. They focused on cognitive load and the overshadowing effect while neglecting the rhetorical and compositional functions of visual images. Our study complements theirs by exploring these functions.
This study may also provide practical implications for Kickstarter stakeholders, especially campaign creators. Our diagram of the CFP move structure (see Figure 2) could give creators a bird's-eye view of the deployment of the content at their initial design stage. Creators may also find useful the verbal and visual strategies we identified, such as turning the mandatory “Risks and Challenges” section into a counterargument, including a user's hand in a product-offer picture to establish an emotional connection with the viewer, and using tonal contrast to create “psychological salience” of the project offer, among others.
Future research might further investigate, through eye-tracking or other experimental research designs (cf. Van Mulken et al., 2010), whether different move positioning and text–image combinations may have different effects on the audience and their investment decisions.
Appendix
The Nine Kickstarter CFPs
Acknowledgments
We would like to thank Jo Mackiewicz and two anonymous reviewers for their very constructive comments. We are also grateful to Lori Peterson for her meticulous and helpful copyediting.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Education of the People’s Republic of China (grant number 19YJA740012).
