Abstract
It has been proposed that comics are a particular form of a fundamental human ability to produce visual narratives – a visual language. The expression of this visual language has received little attention outside comics. To address this matter, this work compares comics and scientific diagrams, focusing on representations of morphological transformation. Cohn’s Visual Narrative Grammar model, the role of dynamic knowledge structures and semiotics are considered in this analysis. A comic book and a diagram are investigated. Both reveal two kinds of transformation narratives: those that are depicted in the image sequence, and those that are inferred. In contrast to depicted narratives, inferred narratives do not depend on a narrative structure. Instead, they require context-specific instructions to organize subjects into narratives. Additionally, simultaneous events in visual narratives are proposed to generate concurrent narrative structures within a single image sequence.
Introduction
Throughout history, visual narratives have appeared in different forms and for a variety of functions (McCloud, 1994). Previous work argues that the ability to produce and understand visual narratives is based on fundamental cognitive processes (Cohn, 2013b). This theory has been developed using comics as a model.
The study of comics has led to many ontological discussions. Based on previous definitions (Groensteen, 2007: 18; McCloud, 1994: 9), and in regard to this study, we can define comics as a deliberate series of contiguous images that depend on each other to provide information (usually narrative, but also descriptive or aesthetic). Their distinction from other sequential visual narratives relies on historically grounded parameters, which are in continuous revision. These include use of text, function, context, mode of production and graphic conventions (Groensteen, 2007: 18, 2009; Meskin, 2012; Witek, 2009). For example, while some scientific diagrams are composed of series of contiguous images, they are meant to support certain scientific claims in a rigorous manner, while comics are usually an autonomous artistic practice, independent of external claims.
Nevertheless, there are grounds to compare comics with other kinds of visual narratives. There is significant overlap between the features of comics and other sequential visual media (Caldwell, 2012; Cohn, 2013b; Groensteen, 2007; Johnson, 2011; Nisbet, 2011; Witek, 2009; Yavuz, 2008). These overlaps are extending as artists challenge the conventions of comics, increasing their field of possibilities.
This study is concerned with how readers understand a narrative or story by looking at a sequence of images. ‘Narratives’ refer to a succession of verbal or visual elements that depict a change or transformation, in ‘sliced up’ chronological moments (Todorov, 1990: 28–29). In this context, ‘transformation’ means the passage from an initial state into another, such as in ‘water is heated into vapour’, or ‘a woman swims in a pool when she is attacked by a shark, losing her left foot’. In this study, ‘transformations’ will specifically refer to images that depict changes to body morphology. We will not discuss the storytelling or rhetorical aspects of sequential visual narratives (such as who and what is in the story, or which ideas are developed or implicit).
A recent model developed from studies in comics may be used to analyse narrative in sequences of images outside comics (Cohn, 2013c). The study of scientific diagrams may benefit from this development. Scientific diagrams are commonly used to represent gradual changes in chronologically ordered elements, organisms or populations, thus fitting the definition of narrative provided above. Despite this, they have not been significantly studied from a narratological point of view. Neglecting this aspect means ignoring the form and function of these diagrams. In addition, comparing comics with scientific diagrams tackles two outstanding issues: whether there are fundamental properties common to most or all sequential visual narratives, and whether different conventions change how we process visual narratives.
Breaking Down Sequential Visual Narratives
Academics and artists alike have worked towards a deeper understanding of narrative in comics. Some models suggest that narrative arises from inferring events between two contiguous images (or panels in comics jargon: McCloud, 1994: 60–68; Saraceni, 2003), or from recognizing semantic relations that bind both contiguous and distant panels (Saraceni, 2003). But these models are unable to account for extra-syntagmatic panel relations that are not based on patterns or repetitions. Simply put, a panel does not just inform the meaning of its closest neighbours, but also the overall meaning of the sequence (Cohn, 2013b: 66–67; Groensteen, 2007; Peeters, 2000).
A different approach proposed by Kress and Van Leeuwen (2006) analyses visual narratives by focusing on aspects such as composition or perceived vectors and relationships between elements in an image. This approach provides a diagnostic for the semantics of image composition and it can be useful to the study of sequential visual narratives. However, it does not try to explain the global relationships established between images in a sequence, i.e. how one image relates to and informs other images in a sequence.
Other proposals tackle directly the issue of global panel relationships. One model, devised by Thierry Groensteen (Groensteen, 2007: 4), proposes that comics are a language defined by convention, in which the panel (and not the panel transition) is the fundamental unit of meaning. In this model, narrative arises from the linear reading of juxtaposed panels, and from a networked articulation between the content of distant panels (pp. 108–110, 147). Groensteen presents a compelling description of the semiotic process in comics, but does not offer a mechanism with which to generate hypotheses and test predictions concerning the comprehension of comics’ narratives.
Neil Cohn (2013b) has proposed a model that asserts that comics are a culturally specific expression of a visual language. This is not the same as stating that comics are a language, as Groensteen’s (2007: 2) model proposes. In Cohn’s (2013b: 2) model, comics are not a language, but a medium through which a visual language is expressed, similar to how literature is a medium through which verbal language is expressed. This visual language is present in other instances such as pamphlets, cinema, or the sand drawings of Australian aboriginal communities. In this model, panels function as grammatical components of a conceptual structure – a visual narrative grammar (VNG). VNG coordinates panels according to their narrative role in the overall sequence, regardless of specific semantic content (Cohn, 2013b: 21, 69–71). This coordination follows an underlying hierarchical structure between panels (Cohn, 2013c). In this respect, cognition of visual narratives is analogous to comprehension of verbal language: linguistic theories posit that language is mentally organized by hierarchically-related syntactic constituents (Chomsky, 2002, 2006; Cohn, 2013c; Jackendoff, 2002).
That is not to say that semantics does not play a role in organizing visual narratives. Specific narrative categories have been proposed, which correspond to prototypical semantic functions and identify a panel’s function in a sequence (Cohn, 2013b). Constituents (or panels) follow a canonical order of narrative categories, described by the formula (adapted from Cohn, 2013b: 70, 2014b):
Phase X ➜ (Orienter) – (Establisher) – (Initial [Prolongation]) – Peak – (Release) – (Orienter)
This is the simplest form of a narrative structure, where brackets enclose optional categories. As indicated, the only indispensable narrative category is the Peak. Peaks play the most crucial or significant role, driving the meaning of a sequence. They usually correspond to the culmination or termination of an event or action, but they can also represent the interruption of an event. The Initial is the second most important category, where an event is set in motion or actions are triggered. Alternatively, Initials represent the source of a trajectory along a path. Between the Initial and the Peak, there may be Prolongation panels, which represent an intermediary state, often detailing a trajectory that starts in the Initial and ends in the Peak. The information present in Prolongations can usually be inferred from the Initial and the Peak, and these panels can be discarded without compromising the meaning of a sequence. Additionally, two categories define passive states that precede or succeed the event. Releases follow Peaks and present the outcome of an event or action. Establishers present the conditions prior to the unfolding of events, providing referential information. Orienter panels may be present either at the beginning or end of a phase, to provide contextual information about the event, for example, by showing the outside of the house where a fight is occurring (Cohn, 2013b: 70–77). In some sequences, there are modifier panels called Refiners, which repeat information to emphasize certain details already present in other panels (p. 84). Some of the categories from the VNG model recall the elements of narrative previously proposed by Todorov (1990: 28–30). These include an initial situation of equilibrium (similar to the Establishers), the destabilization of this initial situation (the Initials), and a series of actions (similar to Peaks) that lead to the reestablishment of equilibrium (Releases).
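The canonical order can also be read as a simple pattern over category labels. The Python sketch below is illustrative only and is not part of the VNG model: the single-letter codes and the function name `is_canonical_phase` are hypothetical, and it assumes that Prolongations occur only after an Initial and may repeat. Refiners and embedded phases are left out.

```python
import re

# Hypothetical one-letter shorthand for the narrative categories.
CODES = {
    "Orienter": "O",
    "Establisher": "E",
    "Initial": "I",
    "Prolongation": "L",
    "Peak": "P",
    "Release": "R",
}

# (Orienter) - (Establisher) - (Initial [Prolongation]) - Peak - (Release) - (Orienter)
# Assumption: a Prolongation needs a preceding Initial and may be repeated.
CANONICAL = re.compile(r"O?E?(IL*)?PR?O?")


def is_canonical_phase(categories):
    """Return True if the category names follow the canonical order."""
    encoded = "".join(CODES[name] for name in categories)
    return CANONICAL.fullmatch(encoded) is not None
```

Under these assumptions, a lone Peak is accepted, as is Establisher – Initial – Prolongation – Prolongation – Peak, while any sequence without a Peak is rejected.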
Experimental evidence suggests that the panel categories of VNG belong to two distinct groups of core and peripheral components to a narrative (Cohn, 2014b). Deletion or misplacement of panels assigned Initial or Peak roles imposes a greater challenge to narrative comprehension, suggesting they constitute core components of sequential visual narratives (Cohn, 2014b). Panels interpreted as Establishers, Prolongations or Releases are designated as peripheral components, because they are more flexible in their distribution along panel sequences. In these cases, semantic content is less deterministic to narrative role (Cohn, 2014b). Thus, the core components seem to represent axes around which peripheral components are organized.
Assigning categories to panels is not always a trivial task as some panels do not correspond clearly to a canonical narrative function (Cohn, 2013c; Cohn and Paczynski, 2013). Ambiguous panel sequences can have multiple interpretations and lead us to more than one narrative structure (Cohn, 2013c). Likewise, different sequences of panels can express similar meanings (Cohn, 2013b: 88–89).
In the simplest comics, each category is assigned to a single panel. However, in complex narratives, subsets of panels may serve the function of a single narrative category within the larger narrative Arc. These sequences are known as phases (Cohn, 2013c). Within each phase, panels are also organized by the canonical narrative structure. In this way, a large narrative can be broken down into episodes or smaller narratives with their own internal organization. These episodes will, in turn, constitute phases with specific narrative functions (the Initial, the Peak, or other categories) within the larger narrative. This organization allows for recursive embedding of phases within each other, like the ramifications of a tree. Embedding accommodates larger and more complex narratives. Alternatively, some phases may behave as conjunctions, where each panel represents different aspects of the same environment or entity, provide different steps of a single action, or represent semantically connected events with no clear chronological progression (Cohn, 2014a).
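The recursive embedding of phases can be pictured as a tree in which a node is either a single panel or a phase containing further nodes. The following Python sketch is a hypothetical illustration: the `Node` class, the helper functions and the four-panel example are assumptions made here, not structures taken from Cohn's work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    category: str                  # narrative role, e.g. "Initial" or "Peak"
    panel: Optional[int] = None    # panel number for a leaf, None for a phase
    children: List["Node"] = field(default_factory=list)


def panels(node):
    """List panel numbers in reading order by walking the tree."""
    if node.panel is not None:
        return [node.panel]
    ordered = []
    for child in node.children:
        ordered.extend(panels(child))
    return ordered


def depth(node):
    """Number of phase levels below this node (0 for a single panel)."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


# Hypothetical Arc: an Initial phase with its own internal structure,
# followed by a single Peak panel.
arc = Node("Arc", children=[
    Node("Initial", children=[
        Node("Establisher", panel=1),
        Node("Initial", panel=2),
        Node("Peak", panel=3),
    ]),
    Node("Peak", panel=4),
])
```

Here the first three panels form a small narrative of their own, which as a whole plays the Initial role in the larger Arc.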
We can test hypotheses within the VNG model by manipulating case studies through rearranging, mixing or deleting panels. Through these methods, empirical studies were able to indicate (1) that cognition of sequential visual narratives involves at least a kind of narrative structure based on the panel as attention unit, and (2) that individual panels play discernible roles in narrative comprehension, which are determined by semantic content and context within the sequence (Cohn, 2014b; Cohn and Paczynski, 2013).
Thus, the VNG model provides an explanation for the narrative relationships of distant panels. However, panels can establish relationships outside their role in narrative structure, through purely semantic means. These include repeating motifs that usually carry implicit meanings – specific compositional solutions, particular uses of a colour, elements that are evocative of each other (Groensteen, 2007: 145–147) – but also recurring sequences of events that we recognize and whose outcome we can predict.
For this reason, we will complement VNG’s framework with theories that focus on the role of acquired knowledge in the processing of narratives, such as schemas, scripts, or frames (Herman, 1997; Jahn, 1997; Minsky, 1975; Schank and Abelson, 1977). Schemas can be defined as memory patterns – stereotypical templates of narrative situations – that describe how a sequence of events is expected to unfold (Herman, 1997). A schema provides the reader with a set of expectations, which assist the gathering of new information. Some aspects of the schema are not fixed and may become altered by newly acquired information, to accommodate new situations. When expectations of a certain schema are not met, the reader may simply discard it and apply a more suitable set of instructions (Herman, 1997; Jahn, 1997). Thus, schemas provide an explanation for specific panel relationships that fall outside the purview of narrative structure.
Other narrative models were considered, but were found inadequate for the study of sequential visual narratives. For example, narrative structure models based on verbal discourse, such as Genette’s or Chatman’s theories of narrative (Chatman, 1978; Genette, 1980), do not consider how the reader apprehends whole images as attention units (Naaman, 2000), nor do the grammatical systems they propose appropriately describe the pictorial and serial features of sequential images. Other approaches formulate narrative structures based on story grammars or semantic functions (Black and Wilensky, 1979; Cohn, 2013c; Prince, 1982; Rumelhart, 1975; Ryan, 1979; Thorndyke, 1977). However, these do not offer a general architecture for narrative construction.
Based on the VNG and knowledge structure models, this work proposes a comparative analysis of the grammatical and semantic mechanisms of sequential visual narratives, focusing on comics and scientific diagrams.
Images in Scientific Discourse
This study is necessary because scientific diagrams have not been examined as narrative entities, even though the properties of individual scientific images have been previously discussed (Kosslyn, 2006; Kress and Van Leeuwen, 2006; Lemke, 1998). The narrative devices of scientific diagrams are part of a set of tools and practices that determine not only what kinds of information can be conveyed, but also the meaning of the message. Understanding how scientific diagrams display information is crucial for understanding how to produce and read them. Nevertheless, comparing scientific diagrams with comics poses obvious challenges, which relate to the form and function of scientific discourse.
While comics generally integrate image and text in the same structure, scientific diagrams are found in larger texts in which image and argumentative text are structurally separated, even though both are semantically interdependent (Martinec and Salway, 2005). Scientific communication relies on the complementary aptitudes of verbal and visual modalities. For example, visual components, such as diagrams, graphs, photographs and drawings are used to depict characteristics that speech is less equipped to portray, such as spatial relations, shape, gradation, and continuous change, while text is more apt in describing causal relations between subjects (Kress, 2011; Lemke, 1998). Because scientific discourse emerges as a combination of visual and verbal information, isolating diagrams from their surrounding context is an ‘artificial’ operation that disregards the construction of joint meanings by both modalities (Lemke, 1998). Nonetheless, this study will concentrate on the visual component, while providing information from the accompanying verbal component whenever necessary to study the properties of image sequences.
This does not mean that scientific diagrams are always dependent on the accompanying written content. A diagram may be fully understood in the absence of accompanying text if it specifies the abstract or concrete relations that connect the depicted subjects (Martinec and Salway, 2005). Additionally, expert readers can interpret figures almost without consulting the text of a scientific communication. Frequently, scientific text has a supporting role to the diagram, introducing the topics of discussion and providing interpretations, many of which may be inferred from the images by an expert reader. As texts that are directed towards a specialized audience, scientific communications are generally composed within the epistemological, verbal and visual conventions of a particular community, constituting a genre with its own patterns and meaning-making practices (Bruce, 2008; Lemke, 1998, 1999, 2004; Martin, 1991). Thus, the diagram may be almost fully independent from the text if the reader is well versed in the particular conventions of the research area. In other words, both the accompanying text and the reader’s previous knowledge can function as directions to interpret scientific diagrams. Because similar observations can be made about comics (Witek, 2009), it will be necessary in the course of this study to understand how convention and semantic content cooperate with the structural components of visual narratives.
Morphological Transformation in Sequential Visual Narratives
To facilitate our comparison, this study was circumscribed to representations of morphological transformation as it is a common subject in both comics (Lee, 1974; Round, 2010) and scientific diagrams. All mentions of ‘transformation’ or ‘morphological transformation’ in this communication refer to narrative events, where an entity gradually changes into another. This entity is either a character in a comic, or a model organism in a diagram (and should not be confused with theoretical transformational processes of generative grammars).
The semantics of transformation in comics and scientific diagrams seem to be related. For this reason, we could speculate that both employ similar narrative devices, even though one shows fictional people turning into monsters, and the other demonstrates biological phenomena. Confirming this hypothesis could further support VNG as a model for sequential narratives in general, and not just for comics studies (Cohn, 2013c).
To study this hypothesis, we used as a case study Parasyte (originally, Kiseijū, ‘Parasytic Beasts’), a Japanese comics series by author Iwaaki Hitoshi (2013). The plot concerns a large-scale invasion by alien parasites. Many characters appear throughout the series that are infected by these parasites, which control their hosts by replacing specific body parts with their own (most often, the head). The parasites can change their hosts’ bodies into monstrous creatures. These transformations appear as linear sequences, or through complex narrative relations, making Parasyte a compelling case study.
We compared the transformation sequences in Parasyte with a diagram by 19th-century evolutionist Ernst Haeckel. This diagram argues for the existence of common ancestry between different species, by pointing to morphological similarities in their embryos. It also depicts the embryonic development of these organisms (Richardson and Keuck, 2002).
Even though their subjects and functions are different, Parasyte and Haeckel’s diagram follow similar strategies to convey transformation narratives. Both cases depict transformation events directly through sequences of images. We also find that they convey implicit transformation narratives, which result from inferring connections independently of a narrative structure.
This research also proposes that visual narratives depicting simultaneous events are subject to simultaneous narrative structures, which can be discerned by identifying local Peak (key) panels with semantically salient features.
Transformation Sequences in Comics
Continuous transformation sequences
Figure 1 reproduces a sequence from Parasyte where all panels depict moments of a transformation event. We will call these types of sequences continuous transformation sequences. The figure shows a parasite folding its tentacles back into a human form, which can be described in general terms as a process or trajectory of transformation. Each new state of the transformation builds upon all previous states (or, in general terms, character performs A, in order to perform B, in order to perform C, and so on). The sequence creates a specific pacing effect, giving the feeling of a progressively building action or process. Cohn has previously determined a particular type of narrative structure that applies to these kinds of sequences, which we will call here recursively-branching structures (Cohn, 2013c; we replace Cohn’s designation of ‘left-branching structures’, as it would not be an intuitive term when analysing comics that are read right to left, such as our example). Recursively-branching structures are so named because they are shaped like a branched tree in which each succeeding node is included within a larger phase (Figure 1, top). These structures describe sequences in which there is no single key panel driving their meaning. Instead, there is a succession of Peak (key) panels that build upon each other. Because of this, all panels preceding a Peak function as the Peak’s Initial.

Figure 1. Continuous transformation sequence, taken from Hitoshi, Parasyte (Vol. 2: 212–213). (c) Hitoshi Iwaaki / Kodansha. Sequences are read in the right–left convention of manga. Recursively-branching structure of the full sequence on top, followed by sequences where each single panel is deleted.
A panel-by-panel analysis confirms that this is the case in Figure 1. In the first panel, the tentacled monster looks onto the sea. This panel functions as an Establisher because it defines the conditions before any action takes place. Then, in the second panel, the monster begins to change shape and its tentacles lose sharpness. This panel corresponds to the Initial, because it is where the action begins, disrupting the original state shown in the first panel. This action prepares the Peak in the third panel, where its tentacles fold on themselves. This Peak builds up into the fourth panel, which also functions as a Peak, where the monster’s tentacles change into a clayish face, culminating in the main Peak of the sequence (fifth panel), where we see that the monster has transformed into a human figure.
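The nesting described in this analysis can be written out explicitly. The Python sketch below is a hypothetical illustration of a recursively-branching structure, in which everything preceding a Peak is wrapped as that Peak's Initial. The function name and the nested-list representation are assumptions made for this example, while the category assignments follow the panel-by-panel reading above.

```python
def recursively_branching(opening, later_peaks):
    """Nest a sequence so that all panels before each Peak act as its Initial.

    opening     -- list of (category, panel) pairs up to the first Peak
    later_peaks -- panel numbers of the Peaks that follow
    """
    phase = list(opening)
    for panel in later_peaks:
        # The whole structure built so far becomes the Initial of the next Peak.
        phase = [("Initial", phase), ("Peak", panel)]
    return phase


# Figure 1 as analysed above: Establisher (1), Initial (2), Peaks (3, 4, 5).
opening = [("Establisher", 1), ("Initial", 2), ("Peak", 3)]
figure_1 = recursively_branching(opening, [4, 5])
```

The outermost level of `figure_1` contains only an Initial phase and the main Peak (panel 5). The Initial phase in turn contains a further Initial phase and the Peak of panel 4, and so on down to the opening three panels.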
We can further validate the narrative structure proposed in Figure 1 by performing deletion tests. Previous empirical work reports that deleting core components (Peaks or Initials) has a greater chance of compromising narrative comprehension of a sequence than deleting its peripheral components (Establishers, Prolongations and Releases; Cohn, 2014b).
Deleting panel 1 (Figure 1) does not disrupt comprehension of the whole sequence, supporting its role as a peripheral component of the sequence. However, deleting panels 2 to 4 compromises sequence felicity significantly, such that transitions between contiguous panels become harder to understand, evidencing that there is a gap in the sequence. Deletion of panel 5 interferes with proper culmination of the sequence, as the human silhouette suggested in panel 4 is never achieved. Therefore, panels 2 to 5 seem to represent core components of the sequence, while, in comparison, panel 1 seems to be of secondary importance, supporting our proposal that this sequence can be explained by a succession of Peak panels.
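The logic of the deletion test can be summarized as a lookup from narrative category to expected outcome. The Python sketch below only restates the core/peripheral distinction reported by Cohn (2014b). It does not measure comprehension, and the function name and outcome labels are hypothetical.

```python
# Core and peripheral categories as described in the text (Cohn, 2014b).
CORE = {"Initial", "Peak"}
PERIPHERAL = {"Establisher", "Prolongation", "Release"}


def deletion_predictions(categories):
    """Map each panel number (1-based) to the expected effect of deleting it."""
    predictions = {}
    for number, category in enumerate(categories, start=1):
        if category in CORE:
            predictions[number] = "likely disruptive"
        elif category in PERIPHERAL:
            predictions[number] = "likely tolerated"
        else:
            predictions[number] = "no prediction"
    return predictions


# Category assignments proposed for the five panels of Figure 1.
figure_1_categories = ["Establisher", "Initial", "Peak", "Peak", "Peak"]
figure_1_predictions = deletion_predictions(figure_1_categories)
```

For the assignments proposed for Figure 1, only the deletion of panel 1 is expected to be tolerated, which matches the observations reported above.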
To better understand why this sequence is explained by a succession of Peak panels, rather than a single phase of Establisher – Initial – Prolongation – Prolongation – Peak, consider a hypothetical sequence where an arrow is shot, flies through the air for one panel, and finally reaches its target. Contrary to the case in Figure 1, the intermediary panel – the arrow flying – can be inferred from the panels where we see the arrow being shot (Initial) and arriving at the target (Peak). Deleting the panel where the arrow is in mid-flight would not fundamentally alter the meaning of the sequence, and therefore this panel would correspond to a Prolongation.
We proposed a narrative structure for the transformation sequence in Figure 1 with no regard to its original context, in which it represents an embedded phase within a larger structure. These are the simplest types of transformation sequences found in Parasyte. But in many other cases, a clean ‘extraction’ of the transformation event is impossible.
Intercalated transformation sequences
More complex passages contain intercalated transformation sequences. These sequences show a single transformation event occurring in parallel with other events, such that the panels depicting the transformation are interspersed between panels focusing on other aspects of the episode (such as other characters or actions). Figure 2a reproduces the first pages of Parasyte’s Chapter 11, a self-contained episode that presents two characters for the first time: a driver, and a parasite possessing a woman. We identify two key events that define the meaning of this sequence: the car accident and the resulting murder of the driver by the parasite. In this narrative Arc, the events that lead to the car accident correspond to the Initial phase, while the events that are triggered by the accident and lead to the murder correspond to the Peak phase (Figure 2b).

Figure 2. Intercalated transformation sequence, taken from Hitoshi, Parasyte (Vol. 2: 41–45). (c) Hitoshi Iwaaki / Kodansha. Sequences are read in the right–left and downwards convention of manga: (a) Panels are numbered by reading sequence on the page. Highlighted panels represent the paraphrased sequence corresponding to the transformation event; (b) Narrative structure of the sequence in (a), derived from the accident and murder events. Narrative structure maintains right–left ordering of the panel sequence for easier correspondence between tree and panel sequence; (c) Narrative structure of the same sequence, derived from the transformation event. Grey dashed lines separate constituents by page; numbers below each constituent indicate panel number/page; curved dashed line indicates to which panel the refiner refers; black dashed line indicates a constituent (Peak) that is not shown.
The accident phase is constituted by an Initial phase and a Peak phase. This Initial phase is further decomposed into an Establisher conjunction phase, where the characters and their motivations are presented (all captions are expositive); and an Initial – Prolongation – Peak sequence of panels, where the parasite’s puzzlement with the seatbelt (an object previously unknown to her) distracts the driver, leading him to take his hand off the wheel (in the Peak panel). These panels provide the Initial conditions to the following Peak phase, where the leisurely car ride is interrupted. Here, two conjoined Peak panels show the car swerving off the road from two different perspectives (page 1, panels 6 and 7).
On the second page, the accident has already occurred, and the car has crashed into a tree (the actual crash is inferred between panels). This semantic separation of events ‘punctuates’ the sequence of panels (Cohn, 2013c), indicating the end of the accident phase (Initial), and the beginning of the murder phase (Peak). In this example, punctuation is accompanied by some ambiguity at the interface of the accident and murder phases, as the second page corresponds both to a Release – by showing the aftermath of the accident – and to the Orienter that contextualizes the ensuing events.
Following page 2, another complex Establisher phase determines the physical conditions of the parasite and the driver after the accident; as a consequence of its likely demise, the parasite moves back into the vehicle and fully transforms into its parasite form. At this point (panel 4 of page 4), the murder plan is initiated, leading to the parasite’s separation from its original host and murder of the driver. Panel 3 of page 5 functions as an Initial to a Peak on the following page, showing the driver’s already severed head, which is omitted from this figure.
In this way, the accident and murder events are mapped into a narrative structure, which seems to demonstrate that recursively-branching structures, such as the one arrived at in Figure 1, do not describe all transformation sequences. However, this narrative structure does not represent the transformation event within the sequence. The panels corresponding to the parasite’s transformation also provide causation to the murder in the form of the parasite’s thoughts. It is this role that is represented in the narrative structure of Figure 2b. This could mean that the transformation event in Figure 2 does not reflect prototypical narrative functions of the panel sequence, but merely the semantic particularities of the murder event. This study argues that this is not the case for two reasons: (1) we have already determined that an isolated transformation sequence possesses an underlying narrative structure (Figure 1); and (2) the transformation event in Figure 2a represents a progressively building process as seen for Figure 1, and constitutes a key event that provides causation to the conclusion of the narrative (Omanson, 1982). Thus, the sequence’s narrative structure should reflect the transformation event, in the same way that it reflects the accident and murder events. Figure 2b does not account for two simultaneous events – the physical and mental activities performed by a single agent (Margolin, 2012[2009]).
To address this issue, a second narrative structure was derived, in which the panels corresponding to the transformation event are treated as core components of the overall sequence (Figure 2c). The new structure can be effectively arranged as recursively-branching. The internal structure of the first Initial phase (the ‘accident’ phase) remains unchanged, because it still contextualizes the narrative and provides a starting point for the parasite’s transformation (her human form). The imminence of the accident serves as local peak of this phase, providing an interruption that triggers the ensuing transformation. This Initial phase is then embedded within succeeding Initial phases, where each new step of the transformation event functions as a Peak, providing tension in the developing threat of the parasite. All other panels serve peripheral roles, by enhancing details of the transformation (as Refiners), providing context (as Releases) or by introducing narrative pauses between transformation states (as Prolongations).
Organizing the narrative structure according to the transformation event is not an arbitrary decision, as the panels corresponding to the transformation event effectively function as Peaks. As previously demonstrated by Cohn (2013b), removing peripheral panels from a sequence does not interfere with our comprehension of that sequence. If the transformation panels function as core (Peak) components, then all other panels should be unnecessary to understand the transformation event, and removing them should not interfere with its comprehension. The paraphrased transformation sequence, highlighted in the panels of Figure 2a, is still able to communicate that the woman transforms into an alien creature, even though the transformation remains semantically indissociable from the extant events. As expected, the paraphrased transformation sequence is autonomous, supporting this new version of the narrative structure.
This second narrative structure complements the structure of Figure 2b. The insufficiency of either one in describing all ongoing narrative functions suggests that both structures unfold concurrently. They should function as collaborating branches under a single Arc node, linking these simultaneous events.
Compound transformation sequences
We find a third type of transformation sequence in Parasyte, which we will call the compound transformation sequence. In contrast with continuous and intercalated sequences, compound transformation sequences do not comprise an actual event. They are assembled as a mental collage from multiple transformation events of the same character. This collage allows the reader to understand how a character transforms from point A to point C, even if all that is provided are discontinuous segments of independent events, such as A to B, and B to C:
Sequence (A – B) + Sequence (B – C) ➜ Compound sequence (A – B – C)
Plot coherence relies on the ability to associate independent events in this manner.
This is the case that is paraphrased in Figure 3, where the full transformation sequence of this character is only guessed from smaller and dispersed transformation sequences. Smaller sequences are ordered into a larger transformation sequence according to overlapping features (not shown). For example, if three sequences appear in the following order in the story:
A – B – C; E – F – G; E – D – C
then the compound sequence will be organized according to overlap:
A – B – C – D – E – F – G
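The ordering by overlap described above can be illustrated with a minimal Python sketch. It is not a model of the reader's cognition, only one possible way of formalizing the example: each fragment is treated as a chain of states, states shared between fragments are treated as the same point in the transformation, and the fragments are chained regardless of the direction in which they were shown. The function name `assemble_compound_sequence` and the assumption that the first state of the first fragment is an endpoint of the full transformation are ours.

```python
from collections import defaultdict


def assemble_compound_sequence(fragments):
    """Chain fragments that share states into one linear sequence.

    Assumption (ours): the first state of the first fragment is one end
    of the complete transformation, and the fragments form a single
    unbranched chain once overlapping states are merged.
    """
    # Record which states appear next to each other, ignoring direction,
    # so that a fragment shown in reverse (E - D - C) still links up.
    neighbours = defaultdict(set)
    for fragment in fragments:
        for left, right in zip(fragment, fragment[1:]):
            neighbours[left].add(right)
            neighbours[right].add(left)

    start = fragments[0][0]
    if len(neighbours[start]) != 1:
        raise ValueError("the starting state must be an endpoint of the chain")

    sequence = [start]
    previous, current = None, start
    while True:
        candidates = [s for s in neighbours[current] if s != previous]
        if not candidates:
            break  # reached the other end of the chain
        if len(candidates) > 1:
            raise ValueError("fragments branch; no single linear order exists")
        previous, current = current, candidates[0]
        sequence.append(current)

    if len(sequence) != len(neighbours):
        raise ValueError("some fragments do not overlap with the rest")
    return sequence


fragments = [["A", "B", "C"], ["E", "F", "G"], ["E", "D", "C"]]
print(" - ".join(assemble_compound_sequence(fragments)))
```

With the three fragments of the example, the sketch walks from A through the shared states C and E to G. A reader presumably relies on far richer cues than identical labels, so the sketch only captures the ordering logic.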

Figure 3. Compound transformation sequence (abridged). Taken from Hitoshi, Parasyte (Vol. 7: 202, 208, and Vol. 8: 118). (c) Hitoshi Iwaaki / Kodansha.
Although connected by the larger narrative Arc, the fragments that constitute compound transformation sequences are separate events. Our ability to establish associations between these events cannot be explained by the underlying narrative structure. Furthermore, the sequence chosen for this analysis spans from Chapters 56 to 61 of the series, which would make it impractical to derive a narrative structure. Thus, arriving at the inference mechanisms that create this compound sequence is more useful than formulating its narrative structure. Image 1 of Figure 3 corresponds to the initial step of a single transformation event that culminates in image 2. From image 2 onwards, this character goes through a series of independent transformation events, from which different fragments of the transformation sequence are shown. For example, one event begins with the parasite’s human form and ends with the parasite turning his arms into spikes, while retaining a human-like head; another event begins already with a humanoid figure with spikes and a monstrous head with six eyes.
It is thought that we are able to ‘reform’ memorized images in the ‘mind’s eye’ when elicited by specific sensory cues (Kosslyn, 1980). In our case, each new appearance of the character should recall memorized images from previous appearances and transformation events. These memories are elicited by visual cues such as the character’s hair, basic shape or other repeating features, and even contextual cues such as the character’s role in the story and the events in which he or she participates (similar to Saraceni’s [2003] concept of cohesion). A similar process has been proposed for film (Naaman, 2000: 114). To mentally organize these sequences, the reader applies a known pattern or ‘transformation schema’ – acquired from reading the series or gathered from conventions of the science fiction genre, for example. This schema must include the information that characters in the story transform back and forth into aliens, that they may do this at will, and that their appearance varies according to their divergence from human appearance. In this way, the reader may be able to build a mental sequence of the ‘complete’ transformation event from a puzzle consisting of both unique and repeating, superimposing features. In sum, the reader produces a coherent narrative from fragmented sequences, by creating mental visual constructs via a narrative schema.
Transformation Sequences in a Scientific Diagram
Coincidentally, visual scientific discourse often includes images organized chronologically, based on independent and fragmented data. A sense of continuity is present in many diagrams depicting biological transformation phenomena, such as evolutionary relations or developmental processes. As such, evolutionary or developmental diagrams may use visual language mechanisms similar to those we saw in comics. Alternatively, they may reveal other possibilities for representing transformation phenomena.
A classical diagram by evolutionary biologist Ernst Haeckel was chosen to initiate this discussion because it deflects some of the problems raised by expert-level language and graphic conventions (Figure 4). Mostly, it leaves out technical or visual jargon, resorting instead to common species names (in German) and photorealistic representations of embryos (which might, by themselves, demand some form of specialized knowledge that will be approached according to necessity). Additionally, its composition resembles an 8 x 3 comics panel grid, suggesting that reading procedures could be similar in both cases.

Figure 4. Embryology diagram by Ernst Haeckel. Originally published in Haeckel (1874) Anthropogenie, and taken from Richardson and Keuck (2002): (a) The names in the diagram are in German, and translate as: F. Fish, A. Salamander, T. Turtle, H. Chicken, S. Pig, R. Cow, K. Rabbit, M. Human; (b) Recursively-branching structure from the developmental sequences in each column of the diagram.
The diagram demonstrates similarities between embryos of different species, which Haeckel considered to be proof of common ancestry (Haeckel, 1897: 361). It was meant to support Haeckel’s thesis that an organism’s embryonic development recapitulates its evolutionary past (pp. 6–8; Richardson and Keuck, 2002). In his view, new traits were added terminally to embryonic development, and previous developmental steps were shortened in order to keep the whole process biologically sustainable. In this way, embryonic development was a record of the evolutionary addition of new morphological characters, even though the author admitted that other processes influenced development, such as deletion of certain traits, or chronological changes in their appearance (Gould, 1977: 74, 81–82, 354–355; Haeckel, 1897: 7–14; Mayr, 1994; Richardson and Keuck, 2002). This was the theoretical basis that Haeckel (1897) supplied to interpret the diagram of Figure 4.
Each individual embryo of Figure 4 functions as an attention unit similar to the comics’ panel (Cohn, 2007; Groensteen, 2007: 4), and will be treated as such. The diagram is organized as a grid where each row corresponds to equivalent developmental stages of different organisms, while each column shows a single organism at discrete stages, in chronological order. The grid offers a horizontal and a vertical reading (Richardson and Keuck, 2002): on one hand, the organisms’ names at the bottom signal each column as a significant grouping of pictures (embryos from the same organism); on the other, western conventions encourage the reader to scroll through each image from left to right.
Even though the diagram’s purpose is to compare embryological features, the transformation observed in the vertical axis tells the developmental story of each species (which is also mentioned in the verbal description of the diagram; Haeckel, 1897: 360–361). Each column represents a continuous transformation sequence, depicting a species’ embryonic development in three panels (as in Figure 1). As such, we can analyse its narrative structure (Figure 4, right). Unlike Figure 1, these sequences are composed of only three panels, and have a much simpler narrative structure, with no phase embedding. The first panel provides the referential state (Establisher), while the second panel initiates the transformation process (Initial) that culminates in the third panel (Peak). These narrative roles are context-specific, and we can imagine this sequence extending in any direction, conceding the Establisher role to earlier stages of development, and introducing further Peak panels. From the analysis of Figure 1, it can be extrapolated that any extension of this sequence would also yield a recursively-branching structure. Thus, this vertical reading is immediately identified as narrative.
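The two reading directions and the narrative roles of the vertical reading can be summarized in a small Python sketch. It is an illustrative simplification under our own assumptions: each embryo is reduced to a (species, stage) pair, the stage numbers 1 to 3 stand in for the three rows, and the function names are hypothetical. The role labels follow the assignment proposed above for a three-panel column.

```python
# Species in the column order given in the figure caption (translated).
SPECIES = ["fish", "salamander", "turtle", "chicken",
           "pig", "cow", "rabbit", "human"]

# Roles proposed in the text for the three panels of each column.
ROLES = ["Establisher", "Initial", "Peak"]

# The diagram as a grid: one row per developmental stage,
# one column per species. Each panel is a (species, stage) pair.
grid = [[(species, stage) for species in SPECIES] for stage in (1, 2, 3)]


def vertical_reading(grid, column):
    """Panels of one species across stages (a developmental sequence)."""
    return [row[column] for row in grid]


def horizontal_reading(grid, row):
    """Panels of all species at one equivalent stage (a comparison)."""
    return list(grid[row])


def narrative_roles(column_panels):
    """Pair each panel of a three-panel column with its narrative role."""
    return list(zip(ROLES, column_panels))


for role, panel in narrative_roles(vertical_reading(grid, SPECIES.index("human"))):
    print(role, panel)
```

Only the vertical reading receives narrative roles in this sketch, which mirrors the claim that the columns, and not the rows, are immediately identified as narrative.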
The narrative significance of the horizontal axis is more ambivalent: any suspected relation between the embryos in a row is contradicted by the indication that they belong to different species. Arguably, there may be some localized features suggesting a morphological progression: for example, in the second row, the overall shape of the embryos could be interpreted as showing a gradual transformation from a lean, upright embryo to a large, curved one. Nevertheless, the parallel disposition of each species favours their comparison at different developmental stages, so that the reader may be directed to Haeckel’s evolutionary narrative.
Even though a contemporary reader easily acknowledges the evolutionary relations suggested by these images, they are not explicit. The reader makes that interpretation either by following instructions present in the accompanying text, or by applying previously acquired information. Following these instructions and a combination of non-linear readings, we can generate a mental evolutionary narrative involving the images provided.
As mentioned previously, embryos from equivalent stages bear many resemblances, some being almost permutable (for example, the first fish and salamander embryos, the second rabbit and human embryos). Also, as was noted by Haeckel, the embryos share traits, which in some cases disappear in later stages, such as the tail precursor or the pharyngeal arches in the human embryo (Figure 4; Haeckel, 1897: 360–361). The gradual divergence between species is consistent with Haeckel’s proposal that evolutionary changes occur terminally, leaving earlier features as artefacts of former evolutionary stages (pp. 7–10). Thus, the development of wings, hooved or fingered limbs from limb buds is a clear differentiating feature for each species. Earlier traits such as the appearance of limb buds or the tail precursor provide a set of features with which to organize the different species into lineages.
These traits form the basis of a classification system, from which the reader establishes various ellipses between the images. These ellipses are dependent on linking all images to an absent connector – an element that is not visually present, but that creates meaningful associations between the images. They are mentally constructed by following specific instructions. In this case, the absent connectors are the common ancestors of the species in the images. Common ancestors are mentally constructed from shared traits between the images, and an ‘evolutionary’ schema provides the instructions for this construction. This schema may be provided by the accompanying text (Haeckel, 1897) or may be a previously acquired ‘evolutionary’ genre-specific schema (Herman, 1997).
Similar operations have been identified previously. Classification systems use explicit or implicit tree structures to associate objects into groups, according to shared traits (Kress and Van Leeuwen, 2006: 42, 79–83). The rhetorical strategies that frame the grouping of images in Figure 4 support their subordination to a classification system: all embryos are illustrated equidistantly and in the same orientation towards the horizontal and vertical axes; they are placed in a neutral background; and the organisms are named and ordered according to certain morphological and physiological criteria (for example, respiratory organs, morphology of the mouth and limbs, presence of a tail) (pp. 45–47, 87). Connections based on a classification system are not strictly narrative (pp. 82–83). However, the implicit tree structure that is generated in this case reflects a (branching) transformation narrative.
Moreover, this tree structure results from a conscious and detailed articulation of the information present in the diagram. For example, the presence of gill arches and tails in every organism’s early stage embryos points to a common ancestor connecting all depicted organisms, while the absence of limbs in the fish indicates that all animals but the fish share a common ancestor that acquired limbs (Haeckel, 1897: 360–361). Further on, the presence of hooves instead of fingers suggests another common ancestor between the pig and the cow, which is not shared by the remaining limbed animals. From this example, we can arrive at an inferred narrative with three succeeding ancestors (the gills and tail ancestor, the limbed ancestor and the hooved ancestor) that lead to the hooved animals represented in the diagram. The use of mentally constructed ancestors, or absent connectors, allows us to establish a narrative between all images.
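The two operations involved in this inference, grouping embryos by shared traits and positing an absent connector for each group, can be sketched in Python. This is a deliberately reduced illustration, not a phylogenetic method: the trait table encodes only the three traits discussed in the example above, and the names `TRAITS` and `inferred_lineage` are ours.

```python
SPECIES = ["fish", "salamander", "turtle", "chicken",
           "pig", "cow", "rabbit", "human"]

# Trait -> species showing it, simplified from the example in the text.
TRAITS = {
    "gills and tail": set(SPECIES),      # present in every early embryo
    "limbs": set(SPECIES) - {"fish"},    # absent in the fish
    "hooves": {"pig", "cow"},            # instead of fingers
}


def inferred_lineage(species, traits):
    """Order the absent connectors (ancestors) leading to one species.

    Each trait shared by two or more species is taken to imply one
    common ancestor. Larger groups are assumed to correspond to
    earlier ancestors, which requires the groups to be nested.
    """
    shared = [(trait, members) for trait, members in traits.items()
              if species in members and len(members) > 1]
    shared.sort(key=lambda item: len(item[1]), reverse=True)

    # Check the nesting assumption: each later group must sit inside
    # the previous one, otherwise no single lineage can be read off.
    for (_, larger), (_, smaller) in zip(shared, shared[1:]):
        if not smaller <= larger:
            raise ValueError("trait groups are not nested")

    return [f"{trait} ancestor" for trait, _ in shared]


print(inferred_lineage("cow", TRAITS))
print(inferred_lineage("fish", TRAITS))
```

For the cow, the sketch returns the three succeeding ancestors named in the example; for the fish, only the ancestor shared by all depicted organisms. The ancestors exist only as labels produced by the procedure, which is the sense in which they are absent connectors.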
Thus, Haeckel’s diagram delivers explicit transformation narratives, while serving as the foundation for an inferable narrative provided by the larger context of the diagram.
Discussion
Although we do not find cases of ‘intercalated’ or ‘compound transformation sequences’ in Haeckel’s diagram, we can find other affinities with Parasyte’s transformations. The narrative strategies of both case studies can be grouped in two types. In the first type, we have transformation events directly represented in the sequence of images. All of these sequences are subject to recursively-branching narrative structures. This type includes continuous and intercalated transformation sequences, such as the examples in Figures 1 and 2, and the vertical sequences of Figure 4. The second type comprises transformation narratives that are not directly represented in the images. Instead, they emerge in the absence of a distinctly represented sequence or narrative structure, being derived by inference mechanisms. This type includes both compound transformation sequences and the ‘evolutionary’ narrative of Haeckel’s diagram.
The derived transformation narratives of Parasyte and Haeckel’s diagram require genre-specific instructions or schemas. This work argues that they result from two different mental processes. The compound sequence of Parasyte requires a ‘transformation schema’ to associate unrelated sequences of transformation into a canonical complete transformation. This association is done through concrete connectors (i.e. visual or contextual cues). On the other hand, the evolutionary narrative of Haeckel’s diagram relies on an ‘evolutionary schema’ to derive genealogical relations between concrete visual subjects (the depicted embryos) and abstract connectors (their putative common ancestors). This strategy produces narratives by use of two consecutive mental operations of relative complexity – in our case, (1) grouping embryos by common traits, and (2) creating absent entities that connect each embryo into a tree. This process hints at a richer field of mental strategies involved in visual narrative inference, which should be further investigated outside the subject of transformations.
The visual representation of morphological transformations dates back at least to Antiquity (for an example from the 6th century BCE, see Buxton, 2004: 19). It is likely that expectations of various provenances, including genre conventions from science fiction, horror or biology, play a part in interpreting visual narratives of transformation, both when they are directly depicted in sequences of images and when they are semantically derived. Interpretation of a simple, continuous transformation sequence may vary, depending on whether the reader possesses previously acquired knowledge. For example, an informed reader may know that the sequences of embryonic development in Haeckel’s diagram could be understood as compound sequences of a theoretical event (in this case, assembled directly on the page and not as a ‘mental collage’), because it was not technically possible for Haeckel to draw these sequences from a single animal. Even though they are meant to represent a single embryo at different points in time, they actually illustrate three independent embryos at different stages of development, thus overriding the initial interpretation that these fragments are from a continuous process undergone by a single subject. An even more informed reader might be aware of the polemics surrounding some of Haeckel’s drawings, such as the accusations of intentionally adulterating drawings, speculating on developmental stages, or mixing stages from different species (Richardson and Keuck, 2002), from which one might conclude that there is a fictional aspect to these diagrams.
These additional interpretations also attest to the comics-ness of Haeckel’s diagram, which was deliberately chosen because its layout resembles that of a comics’ page. In spite of this, following the reading conventions of comics – left-to-right and downward, or right-to-left and downward, in the case of Japanese comics – would not lead to a competent interpretation. Instead, the diagram requires a combination of vertical, horizontal and transversal readings. Haeckel’s diagram provides a grid of panels whose meaning, narrative and discursive potential change with reading direction. But taken together, those readings form the basis of a whole other narrative. These processes are enabled only through juxtaposing images.
Overall, this study encourages further comparisons between the narrative devices of comics, scientific diagrams and other sequential media. For example, Chris Ware’s comics are noticeably influenced by diagrammatic conventions (Bartual, 2012; Cates, 2010).
It remains to be seen whether simultaneous events are a meaningful occurrence in visual scientific discourse and how this is represented in scientific diagrams. Unlike text, which is bound to linearity and can only depict simultaneous events in tandem (Margolin, 2012; Saussure, 2006: 142–144), images can express a simultaneity of entities and events (Kress, 2011). As they are composed of images in sequence, comics can take advantage of this property. In this regard, comics resemble other visual devices, such as sign language (Tang et al., 2007), film or theatre, in which the shot or the stage can house two or more simultaneous events (Dick, 1979; Margolin, 2012). Further investigation should address the relevance of concurrent narrative structures when studying other visual media, including diagrams.
Many contextual factors may influence how the viewer apprehends simultaneous events in visual narratives. When elaborating a structure for the sequence of Figure 2, the accident and the murder events seemed to carry more weight in comparison to the transformation. Their apparent predominance over the transformation event suggests that there are negotiation processes that determine how they interact. It has been proposed for film that the cognitive discrepancies between juxtaposed verbal and visual modalities emphasize one type of information over another because acquisition and storing of information occur through different cognitive mechanisms (Kosslyn, 1980: 347–406; Naaman, 2000). Verbal information is processed through higher cognitive functions because it is already highly coded and abstract, leading to immediate categorization of information. On the other hand, images should be apprehended and stored through a combination of higher cognitive functions (that process information into a narrative structure; Cohn, 2013c), and bottom-up mechanisms that store complex visual information in a holistic, analog mode (Kosslyn, 1980; Naaman, 2000: 58–59, 253–254). As in film (Barthes, 1977: 25, 32–51), the verbal component of comics frequently contains information that is not present in the images. In the example of Figure 2, text provides an explicitly articulated plot for readers to focus their attention on. In addition, readers may have to pay more attention to plot points than to transformation events. The latter should be easily anticipated by readers, as they respect previously apprehended semantic or narrative patterns that run throughout the series (Cohn, 2014b; Schwan and Ildirar, 2010).
It must be stressed that these are matters pertaining to the salience (semantics and not structure) of graphic features, and the predominance of certain events should not interfere with a global comprehension of simultaneous narratives. They merely permit the establishment of context-specific hierarchies while narrative structure(s) remain(s) unaffected. It should also be noted that devising concurrent narrative structures is not a means to address cases in which a panel’s role in the sequence is ambiguous.
These observations indicate that interpreting sequential visual narratives depends on their structural features, but also on individual reading practices. A comprehensive description of these practices and their consequences is beyond the scope of this work and the tools applied here. The meaning of an image sequence is shaped by how readers prioritize and negotiate simultaneous events and different modalities (Martinec and Salway, 2005), the readers’ ability to infer meaning based on acquired knowledge, experiences and preferences, or simply the directions their reading follows on the page (Cohn, 2013a). All of these issues demand future investigation.
Conclusion
The present work proposes that the use of acquired knowledge structures can provide narrative in some specific cases, thus bypassing the need for sequences of images to be ordered by a canonical narrative structure. This conclusion is in line with previous empirical research suggesting that reading comics involves the interplay between a VNG and learned canonical narrative patterns (Cohn, 2014b). As such, it may be profitable to consider similar complementary approaches in future studies, while striving for an integrated understanding of the cognitive, semantic and social processes behind the comprehension of visual narratives.
This study’s focus on depictions of morphological transformation was a strategy to circumscribe parameters when comparing sequential visual narratives with very distinct purposes and provenances. At first glance, transformation narratives are a fairly narrow topic within the area of visual communication. But the conclusions derived here, mainly concerning how we address simultaneity and semantic connections, should be applicable elsewhere. In addition, the variety of diagrammatic imagery, such as graphs, drawings and photographs representing biological changes at the organismal, population, and evolutionary levels, suggests a wider range of narrative mechanisms and justifies further incursions into this subject.
Most significantly, we hope this work contributes to the formal study of sequential visual narratives in general. This is the first study that addresses the combined role of narrative structure and acquired knowledge structures or schemas in sequential visual narratives. From our observations, we should expect that complex sequential visual narratives use a variety of narrative mechanisms in parallel. Future work should aim to address their interactions.
Hopefully, this work will encourage further incursions into the study of narrative in scientific diagrams, and new comparative studies between comics, scientific diagrams and other categories of sequential visual narratives.
Acknowledgements
I would like to thank Professor João Paulo Queiroz, Neil Cohn and Ana Matilde Sousa for their comments on previous drafts, which greatly improved the final version. Portions of this study were previously presented at the Third Comics Conferences, in Lisbon, Portugal (CBDPT 2013).
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Biographical Note
HUGO ALMEIDA is a researcher at the Artistic Studies Research Centre (CIEBA) of the Faculty of Fine Arts of Lisbon University. He has a doctoral degree in molecular biology, having previously published in the area of chromosome biology. His current research interests include visual narratives and the intersections between comics and other media. He makes comics under the pseudonym Mao for the Clube do Inferno label.
Address: Artistic Studies Research Centre (CIEBA), Faculty of Fine Arts of Lisbon University CIEBA, Faculdade de Belas-Artes da Universidade de Lisboa, Largo da Academia Nacional de Belas-Artes, Lisbon 1249-058, Portugal.
