Abstract
Presentations with PowerPoint, made possible by digitalization, are another step towards more visualization in the history of science communication. This new genre of communicating scientific knowledge to an audience combines several semiotic sign systems and therefore can be analysed as a form of multimodal discourse that integrates pictures, text, design, etc. on the slides, as well as spoken language, gestures, acts of pointing, etc. by the speaker. This study approaches the problem of multimodal discourse – how meaning is constituted by the different modes – empirically from a recipient’s perspective. To reconstruct the meaning-making process, the authors apply eye tracking and other methods of reception research in real-life scenarios as well as in laboratory settings.
Introduction: PowerPoint Discourse and Its Theoretical Implications
Visualizations in science communication have a long history, dating back to the 19th century, but reaching their climax with the digitalization of communication. Today, hardly any public presentation of science fails to employ computer-based visualization tools or shareware, most often PowerPoint. The integration of visualization devices has enabled the evolution of a new format of scientific presentation, which enriches speech with visual modes of communication such as pictures, audio-visual materials, or written text, and integrates technical devices into spoken discourse (Lobin, 2009; Peters, 2007).
From the perspective of knowledge transfer, this mediatization of science communication has evoked some fundamental criticism, most notably by the information designer Edward Tufte. In his 2003 publication, ‘PowerPoint Is Evil’, he criticizes the ‘cognitive style’ of PowerPoint, which ‘routinely disrupts, dominates, and trivializes content’ and ‘elevates format over content, betraying an attitude of commercialism that turns everything into a sales pitch’. In the eyes of a second group of authors, PowerPoint-based presentations open a new chapter in the debate on the relation of ratio and rhetoric, which is a perennial issue in the history of rhetoric itself. At the centre of this debate lies the suspicion that rhetorical skills tend to obscure the evidence of a statement and therefore per se are always in danger of turning speech into propaganda. These critics therefore allege that PowerPoint turns scientific argumentation into persuasion, thus violating the principles of rational discourse (Peters, 2007; Schnettler and Knoblauch, 2007; Turkle, 2003).
In contrast to this PowerPoint scepticism, a third group, the PowerPoint optimists, emphasizes its potential for more interactive knowledge transmission and learning processes (Gabriel, 2008) and formulates psychological principles for designing audience-oriented presentations (Kosslyn, 2007). These authors see PowerPoint presentations as a new ‘visual rhetoric’ which is ‘likely to become a pervasive feature of public life’ (Stark and Paravel, 2008: 50).
A fourth branch of research analyses scientific presentations as a genre of communication or a type of discourse whose structure is determined by the accompanying visual channel, the interactional context and the integration of technical devices into spoken language (Knoblauch, 2008; Lobin, 2009; Rowley-Jolivet, 2004). A common feature of this approach is an expansion of the perspective, in which the analysis of presentations is not restricted solely to the slides or software, but instead focuses on the performance of speaker and audience while acting and participating in PowerPoint presentations. Behind the notion of performance stands the idea that meaning cannot be deduced from single elements of communication – for example, the slides in a presentation – but is created in the enactment of the participants and their situational use of communicative means like speech, gestures, referring and pointing actions, and visual material. Even the design or type of the slides and the synchronization of what is said and what is shown can have an impact on the meaning-making process. The methodology of the research presented here is based on a performative conception of scientific presentations. These presentations are treated as a type of complex communicative action, which, in view of the different types of signs employed, is best characterized as multimodal (Bucher, 2011; Kress, 2010; Kress and Van Leeuwen, 2001).
Most of the research on PowerPoint has focused on the presentation itself without taking into account the recipients’ perspective. But knowing how recipients attend to PowerPoint presentations and how the intended knowledge transfer works is a fundamental precondition for evaluating the potential of this communication genre. In order to avoid these shortcomings, we have chosen an empirical and experimental approach, treating scientific presentations as a multimodal form of communication and analysing them within a framework of reception research. This approach is motivated by two research questions. The first is a fundamental one, aimed at a theory of multimodality: How do recipients integrate and understand the complex ‘orchestration’ of different modes and sign systems? The second is an applied one: How does PowerPoint influence the quality of scientific knowledge transfer?
Background: Scientific Presentations as Multimodal Discourse
Treating scientific presentations as a type of multimodal communication contrasts fundamentally with the PowerPoint scepticism triggered by Edward Tufte’s article. Instead of narrowing the focus to the slides or the software, we conceptualize scientific presentations as a multimodal genre or discourse type in which all the different symbolic resources like speech, text, pictures, graphics, design, intonation, pointing activities, and so on are used to accomplish an overall meaning. The ‘orchestration’ of the multimodal arrangement of a presentation happens in three different modal domains: the visual domain of the slides, the verbal domain of speech, and the performative domain of gestures and pointing (see Bucher, 2011; Bucher et al., 2010). Although multimodal presentations in some respects are very similar to ‘multimedia instructional messages’ (Mayer, 2005), the difference lies in the relevance of the performed referring actions (pointing, gestures, referring utterances) that link the verbal and the visual. The multimodal character of a scientific presentation forces members of the audience to coordinate their attention in a temporal and spatial dimension simultaneously. Because of the co-presence of different modes, the audience is confronted with a kind of hypertextual structure of different information sources – the speaker, the slide with its different elements, and the pointing actions – which are organized in space and whose reception requires a choice of attention. At the same time, the coherence of a scientific presentation also has a sequential logic, namely the linear order of the speech and the progression of the slides or their animations. This idea is very similar to the distinction between ‘phases’ and ‘transitions’ that Baldry and Thibault propose for a multimodal analysis of films (Baldry and Thibault, 2005: 47–50). 
In PowerPoint presentations, we can define a phase as the segment of communication that is limited by the fade-in and fade-out of a single slide.
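The definition of a phase can be illustrated with a minimal Python sketch. It is not part of the study's tooling: the function names and the input format (a sorted list of slide fade-in times in seconds) are assumptions made for illustration, and the sketch further assumes that each slide fades out at the moment the next one fades in.

```python
from bisect import bisect_right


def phases_from_slide_onsets(onsets, end_time):
    """Return one (start, end) pair per slide, in seconds.

    onsets   -- sorted fade-in times of the slides (hypothetical input)
    end_time -- time at which the last slide fades out
    Assumption: a slide fades out exactly when the next one fades in.
    """
    bounds = list(onsets) + [end_time]
    return [(bounds[i], bounds[i + 1]) for i in range(len(onsets))]


def phase_index(onsets, t):
    """Return the index of the phase that contains time t.

    Returns None if t lies before the first slide appears. Times after
    the final fade-out are not checked here.
    """
    i = bisect_right(onsets, t) - 1
    return i if i >= 0 else None
```

With such a segmentation, any time-stamped event (an utterance, a pointing gesture, a fixation) can be assigned to the phase in which it occurs.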
The complexity of scientific presentations that incorporate visual projection results from this somewhat paradoxical combination of two different patterns of discourse structure: a linear pattern in time – the sequential rhythm of the discourse – and a non-linear pattern in space – the constellations of signs and symbols in three-dimensional space. The problem of cognitive overload, discussed at length in publications on multimedia learning and knowledge acquisition (Holsanova, 2008; Sweller, 2005), could be explained as the difficulty of coping with the paradoxical structure of multimodal communication. However, it remains an empirical question whether and under what circumstances multimodality complicates or facilitates the understanding of discourses or, in our case, whether the enrichment of a scientific presentation with a visual channel improves or impairs knowledge transfer.
Research on multimodal discourse is confronted with two basic theoretical problems. First, the problem of compositionality: What, specifically, does each of the individual modes contribute to the overall meaning of a discourse and how do they interact? Second, the problem of reception, which is more or less the mirror image of the first: How do recipients integrate the different modes and acquire a coherent understanding of the multimodal discourse? As for a research strategy for these two problems, scientific presentations with PowerPoint are to some extent a paradigmatic object of study. The problem of compositionality lies at the centre of multimodality research. It is labelled with the concepts of ‘intersemiosis’ (O’Halloran, 2008: 470), ‘semantic multiplication’ (Lim, 2004), or ‘modal interrelation’ (Kress, 2010: 165). The basic assumption that lies behind all these concepts is the idea that the whole of a multimodal ensemble is more than the sum of its parts.
The theory underlying our research on multimodal scientific presentations is based on these theories of multimodal discourse and is conceived as a theory of multimodal action (Bucher, 2011). Similar to language, signs from all modes can be used to pursue communicative aims. This assumption is based on a theory-of-action approach to language and communication, rooted in speech-act theory, linguistic pragmatics and semiotics. It assumes, analogously to sign-making in the social-semiotic approach, that all signs in the context of communicative actions are used to make the addressee understand something (Bucher, 2007). We employ the concepts of an action theory of communication to analyse the interplay of various modes as a means of communication, such as the distinction between the meaning potential of a sign, which determines the action possibilities, and the communicative meaning, created by a multimodal action that is actually executed. This theory can be linked to modern interactional reception theories focusing on the recipients’ appropriation actions.
To a much lesser extent, research in multimodality has dealt with the question of how this form of communication is perceived and interpreted by its audience. Traditional theories in audience research have not been very helpful in solving this task for many years. But for some time now, audience research has witnessed a paradigm shift from an effect-oriented to an interactional perspective focusing on the process of appropriation of media items by the recipient (Jensen, 2002). This interactional approach allows the integration of elements from other theory traditions that deal with reception, perception, or comprehension, like psycholinguistics, gestalt theory, or current cognitive science approaches in perception and visual communication. The concepts of attention and affordance (Duchowski, 2007; Neumann, 1987), which by definition incorporate an interactional logic, are of great relevance for an interactional theory of reception. An interactional approach also allows for the translation of the cognitive processes of selecting, organizing, and integrating, which Mayer (2005: 38–42) identifies in multimedia learning, into observable activities of the recipients.
The application of this theory can resolve the difference between stimulus-oriented (realistic) and recipient-oriented (idealistic) approaches. From the perspective of the media stimulus, the process of reception could be described as a salience-based bottom-up model (Itti and Koch, 2000), whereas from the perspective of the recipient, the process could be described as a top-down process driven by the recipient’s expectations, knowledge, competence and intentions (Henderson et al., 2007). Within an interactional paradigm of media reception, both perspectives are valid, which means that both effects intermingle. Salient cues of the media stimulus (bottom-up) are evaluated in relation to the current goals and the actual state of knowledge and competence of the recipient (top-down) (Duchowski, 2007). With these approaches as a background, the concepts of selection, attention, relevance and appropriation can be seen to be at the centre of a theory of reception (Bucher and Schumacher, 2006).
Methods: Eye Tracking, Knowledge Tests and Interviews in Different Scenarios
For empirical research on how multimodal communication is perceived and interpreted, the theories of multimodality and reception mentioned above lead to four methodological principles that are realized in a systematic set of different research scenarios (see Table 1).
Table 1. Methodical setting of the reception study
Only about 60 of these presentations were part of the complete reception study in scenario I.
Scenario I
The first principle is that reception research on scientific presentations has to be done in real-life settings (scenario I) to make the ‘naturalized engagement’ (Kress, 2010: 170) of recipients visible. Only under this condition is it possible to track the interaction between the orchestration of a scientific presentation – the sign-making process of the presenter – and the audience’s reception – the meaning-making process of the addressee. In the study presented here, the eye movements of about 60 persons were recorded in conference presentations of various scientific disciplines, mainly in the third and fourth quarters of 2008. More precisely, the gaze motions of an expert in the discipline of each documented presentation were tracked using the video-based SMI mobile eye tracker ‘iView X HED’ (50 Hz) (Figure 1). Additionally, a questionnaire was distributed to the audience focusing on socio-demographics, professional expertise, the degree of experience with PowerPoint, and the individual evaluation of each presentation in different dimensions. Furthermore, all these presentations were documented by video, and all PowerPoint slides were archived. In sum, more than 24 hours of scientific presentations and 2244 slides were analysed; 33 presentations from this corpus were chosen for a comparative product analysis and reception analysis (12 from humanities, 11 from natural sciences, and 10 from economics).

Figure 1. A participant in the real-life setting (scenario I) wearing the mobile eye tracker.
Scenarios II and III
A second principle is that the modal density of presentations should be systematically manipulated. Therefore two different laboratory-based research scenarios were used, which permitted step-by-step reduction of the modal complexity of presentations. In scenario II, the test persons watched the original PowerPoint presentation synchronized with the presenter’s speech. In scenario III, the test persons could click freely through the PowerPoint presentations. The three scenarios represent three types of multimodality, which differ in terms of modal density. Scenario I contains all possible modes of a scientific presentation. Scenario II contains the visual slides and the spoken language, but not the performance of the presenter – for example, his or her pointing actions. Scenario III contains only the visual channel of the slides. Overall, 9 presentations from the data corpus (3 from each scientific culture) were tested in the laboratory settings and 31 participants were involved in this part of the study. In order to gain comprehensive reception data, additional methods were applied in the lab scenarios besides eye tracking (SMI, ‘iView X RED’, 50 Hz): think aloud, retrospective interviews on the evaluation of the presentations, a questionnaire collecting socio-demographic information, prior knowledge, the participants’ degree of experience with PowerPoint, and a knowledge test with questions formulated after prior consultation with the respective presenter.
As the significance of the individual modes cannot be ascertained in isolation, cutting off the modes systematically permits the reconstruction of their semiotic function by determining their influence on the reception process and on knowledge acquisition. Comparing reception data from the three scenarios, therefore, leads to insights into the interplay between different modes.
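The subtraction logic behind this design can be spelled out in a short Python sketch. The mode labels ('slides', 'speech', 'performance') are shorthand introduced here for the visual, verbal and performative domains, and the function name is hypothetical; the sketch only shows which mode a given pairwise comparison of scenarios isolates.

```python
# Modes present in each scenario, following the descriptions above.
# The labels are illustrative shorthand, not terminology from the study.
SCENARIO_MODES = {
    "I": frozenset({"slides", "speech", "performance"}),
    "II": frozenset({"slides", "speech"}),
    "III": frozenset({"slides"}),
}


def isolated_modes(richer, poorer):
    """Return the modes present in `richer` but absent from `poorer`.

    Differences in reception data between the two scenarios can then be
    related to exactly these modes.
    """
    return SCENARIO_MODES[richer] - SCENARIO_MODES[poorer]
```

Comparing scenarios I and II thus bears on the presenter's performance, and comparing II and III bears on the spoken language.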
Scenario IV
Because of technical limitations in the live scenario – only one head-mounted eye-tracking device was available – the data of only one recipient exist for each presentation. To overcome idiosyncratic assumptions about understanding multimodal communication, as a third principle a control scenario was developed (scenario IV) that permits reproducing the same presentation life size as often as is required on a 3-by-2m wall screen (Figure 2). The staging of the videotaped presentation was already guided by hypotheses derived from the former scenarios – for example, on the relevance of pointing gestures by the presenter or hypotheses on reception patterns of slide types. Using this procedure, reception data from 23 persons attending the same presentation could be collected. Similar to scenario I, eye movements of the recipients were tracked (SMI, ‘iView X HED’, 50 Hz) and the participants completed a questionnaire. As in scenarios II and III, there was a subsequent knowledge test and, in 10 cases, retrospective interviews.

Figure 2. A participant in the control scenario (IV) wearing the mobile eye tracker.
A fourth methodological principle, derived from reception theory, led to a research design that permitted tracking the influence of recipient features on the understanding of multimodal presentations. Therefore, in scenarios II, III and IV, two populations of test subjects were recruited: experts and novices. As they differed in terms of prior knowledge and competence regarding the topic of the presentations, the influence of these features could be measured.
These four methodological conclusions, i.e. using a natural setting, manipulating modal density, providing a control scenario, and isolating recipient features, guaranteed rich and well-matched reception data and therefore permitted reconstruction of the different layers of the process of appropriating multimodal presentations.
Eye tracking
In our study, we took eye movements as indicators of the allocation of attention (Duchowski, 2007). Compared to traditional retrospective interview data, eye-tracking data are more reliable as they are obtained directly during the reception process and thus provide immediate insight into the process of interaction between the stimulus and the recipient. Furthermore, they are minimally vulnerable to typical interview effects like the social desirability effect (Bucher and Schumacher, 2006: 354). 1 When analysing the process of reception, eye-tracking data are a helpful indicator in many different dimensions. First of all, they show which parts of a stimulus recipients have observed and which parts have been ignored, i.e. they disclose the participants’ selection strategies. In addition, the dwell-time on a particular part of the stimulus (the speaker, slides, etc.) indicates the level of attention and the recipients’ interest. The documentation of eye-tracking data over time shows reception sequences (i.e. the order in which different parts of a stimulus have been received) and thus, at the same time, the acquisition strategies used by the participants. Finally, these data also permit evaluation of the quality of the reception process, as, for instance, reading or scanning can be differentiated when analysing gaze motions.
Prior to the analysis of eye-tracking videos, the participants’ field of vision had to be digitally sliced into ‘areas of interest’ (AOIs) to measure their allocation of attention. For this reason, a set of categories was developed, differentiating the modes of a presentation (speaker vs slides) on the one hand and the diverse modes within a presentation slide (text, photo, video, etc.) on the other. Software-based analysis then showed which AOIs were perceived, in what order, how often, for how long, and at what point in time. This method of data preparation makes it possible to identify patterns of reception for distinctive parts of a presentation or for specific types of slides. The example from scenario I, illustrated in Figure 3, reveals that the two AOIs, ‘speaker’ and ‘graphic’, are gazed at alternately (zig-zag pattern). This indicates a permanent shift of attention between the person speaking and a graphic shown in the PowerPoint slide.

Figure 3. Still from an eye-tracking video, including the visualization of gaze motion with the software INTERACT.
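The kind of AOI measures described above (which AOIs, in what order, how often, for how long, and at what point in time) can be sketched in a few lines of Python. This is an illustration only and does not reproduce the software used in the study. It assumes a hypothetical input format in which every gaze sample of a 50 Hz recording has already been coded with an AOI label, or with None where the gaze falls outside all AOIs.

```python
from collections import defaultdict
from itertools import groupby

SAMPLE_RATE_HZ = 50  # sampling rate reported for the eye trackers


def aoi_visits(samples, rate_hz=SAMPLE_RATE_HZ):
    """Collapse per-sample AOI labels into visits.

    Returns a list of (aoi_label, onset_s, duration_s) in temporal order.
    Samples labelled None (gaze outside every AOI) end a visit but are
    not reported as visits themselves.
    """
    visits = []
    index = 0
    for label, run in groupby(samples):
        length = len(list(run))
        if label is not None:
            visits.append((label, index / rate_hz, length / rate_hz))
        index += length
    return visits


def summarize(visits):
    """Per AOI: number of visits, total dwell time, time of first entry."""
    summary = defaultdict(
        lambda: {"visits": 0, "dwell_s": 0.0, "first_onset_s": None}
    )
    for label, onset, duration in visits:
        entry = summary[label]
        entry["visits"] += 1
        entry["dwell_s"] += duration
        if entry["first_onset_s"] is None:
            entry["first_onset_s"] = onset
    return dict(summary)
```

The ordered list returned by `aoi_visits` corresponds to a reception sequence, so an alternation such as speaker, graphic, speaker, graphic would show up directly as the zig-zag pattern.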
Qualitative methods
In addition to eye tracking, the think-aloud method or retrospective ad hoc commenting, interviews and knowledge tests provide process data on how the recipients piece together the different modes of scientific presentations (especially think aloud), and they yield information about interpretations and knowledge acquisition (post-hoc methods like interviews and knowledge tests). Moreover, the post-hoc methods allow for the interpretation of direct reception data generated by eye tracking and the think-aloud method.
Empirical Results
The presentation of the empirical results of our reception study addresses the following main questions: What is the influence of the types of slide of a PowerPoint presentation on the reception? What is the impact of modal density? How relevant are referring actions? And, finally, how important are individual characteristics of the recipients, especially their prior knowledge, for the understanding of multimodal presentations?
How types of slide influence reception
Describing the projections of a scientific presentation – in most cases PowerPoint slides – as the visual channel is a somewhat rough description. Following Rowley-Jolivet (2004), from a semiotic point of view, one can distinguish four types of visualizations, namely scriptural, figurative, numerical and graphic visuals. These different visual resources can be used for different functions, namely to express visual information simultaneously with what is said, to create coherence in the temporal progression of the speech as well as the projection, or to help the audience ‘to divide or “chunk” the flow of discourse into more manageable portions’ (p. 168). Functional criteria and a functional typology of slides are analysed elsewhere in more detail (Bucher et al., 2010). It is notable that the public debate on PowerPoint only focuses on scriptural text-based bullet point slides, neglecting the fact that the social impact of the most influential PowerPoint presentations was based on figurative and graphical visualizations (Stark and Paravel, 2008), for example the presentation Colin Powell gave to the UN Security Council to present evidence for the existence of weapons of mass destruction in Iraq. The intention of our research was, therefore, to analyse empirically how different types of slides influence the reception process and the transfer of scientific knowledge.
Types of slide
In order to analyse reception processes under a multimodal perspective, slides may be usefully classified into three types: text-only slides, pictorial-only slides, which can contain figurative or graphical elements, and mixed slides, which can be composed of textual and all types of pictorial elements. This classification of slides permits inferences about the basic types of intersemiotic relations, namely relations between speech and text, speech and pictures, and speech and multimodal slides. The distribution of the three types varies in the different scientific disciplines (see Table 2). In economics, for example, the largest share of the slides is textual (48%), whereas the most ‘pictorial’ presentations come from the natural sciences, where nearly two thirds (65%) of all slides contain pictorial material. As for pictorial slides, the humanities occupy a middle position with 28 per cent figurative and 24 per cent mixed slides. Apart from the differences between the disciplines, the distribution of the types of slide indicates a strong overall tendency towards a visualization of scientific conference presentations in all disciplines.
Table 2. Distribution of the type of slide within different scientific disciplines (N = 2244 slides/94 presentations [economics = 33; natural sciences = 27; humanities = 34]) (%)
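The three-way classification and the resulting percentage distribution can be sketched as follows. The element tags ('text', 'photo', 'graphic', etc.) and the function names are hypothetical and stand in for whatever coding scheme is actually applied to the slides; the sketch shows only the decision rule, not the study's coding procedure.

```python
from collections import Counter

# Hypothetical element tags used to code the content of a slide.
TEXTUAL = {"text"}
PICTORIAL = {"photo", "video", "drawing", "graphic", "diagram"}


def slide_type(elements):
    """Classify a slide by the kinds of elements it contains."""
    has_text = any(e in TEXTUAL for e in elements)
    has_pictorial = any(e in PICTORIAL for e in elements)
    if has_text and has_pictorial:
        return "mixed"
    if has_pictorial:
        return "pictorial-only"
    if has_text:
        return "text-only"
    return "unclassified"


def distribution(slides):
    """Percentage of slides per type for a list of coded slides."""
    counts = Counter(slide_type(elements) for elements in slides)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100 * n / total for kind, n in counts.items()}
```

Applied per discipline, such a tally would yield the kind of percentages reported in Table 2.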
Although the eye-tracking data show fundamental differences concerning the three types of slides, they indicate an overall feature of all meaning-making processes of scientific presentations. The recipients do not add the meaning of the speech and the meaning of the slides mode by mode towards an overall meaning of a presentation, but they combine and integrate the different semiotic resources from the beginning. However, the different types of slides activate different patterns of meaning-making, which are indicated by typical sequences of eye movements. One can interpret these patterns as traces of the recipients’ strategies to construct coherence between what is said and what is shown.
Text slides
Text slides in scientific presentations are not only words on a screen but well-designed spatial arrangements of information units, which we can call visual text (Bucher, 2007). Therefore, understanding a text slide not only means comprehending what is written, but also grasping the spatial constellation of the text units (i.e. the text design of the slide), which can, for example, express hierarchical relations or a relevance structure. Text slides appear in scientific presentations in two different forms: as static text slides, which show the whole textual content at once, and as dynamic slides, which fade in the information units incrementally. The two types of slides elicit two different patterns of eye movement, which indicate two different strategies for integrating the speech and the slide content. As is shown in Figure 4, the static text slide elicits a block-by-block reception where the recipient first reads the whole text on the slide and then turns his or her attention to the speaker. In contrast to this more linear reception, the appropriation of dynamic slides follows a more non-linear pattern. In the rhythm of the fading in of the textual units, recipients’ attention switches between the slide and the speaker. Data from the control scenario (scenario IV) with 15 test persons confirm these patterns and permit some generalizations concerning the integration of speech and visual text. If the visual text information is complex, as in the case of a static text slide, recipients first concentrate on reading and evidently fail to take in the words spoken at the same time. On the other hand, dynamic slides permit an alternation of attention between the speaker and the projection. In terms of multimodal understanding, this is a rather far-reaching observation. The dynamic design of a slide (i.e. the stepwise fading in of textual units) serves as a means for managing the coherence between the verbal and the visual mode via synchronization, thereby indicating the relevance of the information on the slide. In scientific presentations, dynamic slide design therefore has a function similar to that of verbal expressions referring to informational units on a slide. ‘More’ visualization does not necessarily cause the audience to neglect the speaker; rather, it can be a good way of optimizing integration of speaker and projection.

Figure 4. Dynamic and static text slide, including the corresponding gaze motions of two participants.
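The contrast between block-by-block and alternating reception can be approximated by counting how often the gaze switches between AOIs during one slide. The following Python sketch is a rough heuristic under stated assumptions, not the analysis procedure of the study: the input is a hypothetical sequence of AOI labels for one phase, and the threshold of one switch for a block-by-block pattern is an arbitrary illustrative choice.

```python
def count_switches(sequence):
    """Number of changes between consecutive, different AOI labels."""
    collapsed = [
        label
        for i, label in enumerate(sequence)
        if i == 0 or label != sequence[i - 1]
    ]
    return max(len(collapsed) - 1, 0)


def reception_pattern(sequence, max_block_switches=1):
    """Label a gaze sequence as 'block-by-block' or 'alternating'.

    max_block_switches is an assumed threshold: reading the slide once
    and then turning to the speaker involves a single switch.
    """
    if count_switches(sequence) <= max_block_switches:
        return "block-by-block"
    return "alternating"
```

A sequence such as slide, slide, speaker would count as block-by-block, whereas slide, speaker, slide, speaker, slide would count as alternating.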
Pictorial-only slides
Pictorial-only slides in scientific presentations occur in many variants, for instance as drawings covering only parts of the slide, as full-size photographs, or as photo collages. Independent of size and number of the pictorial elements, these slides typically fulfill one or more of the following functions: They demonstrate, visualize, provide backing for, or illustrate the topic of a presentation. Throughout our reception study, pictorial-only slides in different variants elicited the same three-part pattern of reception, indicated by the following sequence of eye movements. The appearance of the slide initially attracts the participants’ entire visual attention for some seconds (A); subsequently, they avert their eyes from the slide, focusing on the speaker or other elements in the room (B); and, finally, they briefly re-fixate the pictorial slide (C) (Figure 5). The eye movements reveal a linear block-by-block reception comparable in a way to the static text slide. As with the text slide, this pattern was confirmed in the control scenario IV with 15 participants. This interactive pattern, and especially the re-fixations, are strong arguments for a ‘generative theory’ of knowledge acquisition, which means that ‘cognitive processes … are applied segment by segment rather than to the entire message as a whole’ (Mayer, 2005: 41).

Figure 5. Typical gaze motion of a participant watching a pictorial-only slide.
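The three-part pattern (A: initial viewing of the slide, B: looking away, C: brief re-fixation) can be checked mechanically on a coded gaze sequence. The Python sketch below is illustrative only. The input format, a list of (AOI label, duration in seconds) pairs for one slide, is assumed, and 'brief' is operationalized here as 'shorter than the initial viewing', which is an assumption of this sketch rather than a criterion taken from the study.

```python
def matches_three_part_pattern(visits, slide_label="slide"):
    """Check for the sequence slide (A), away (B), brief slide (C).

    visits -- list of (aoi_label, duration_s) in temporal order
    Consecutive visits to non-slide AOIs (speaker, room, ...) are merged
    into a single 'away' block before the sequence is tested.
    """
    blocks = []
    for label, duration in visits:
        kind = "slide" if label == slide_label else "away"
        if blocks and blocks[-1][0] == kind:
            blocks[-1] = (kind, blocks[-1][1] + duration)
        else:
            blocks.append((kind, duration))
    if [kind for kind, _ in blocks] != ["slide", "away", "slide"]:
        return False
    # Assumed criterion: the re-fixation (C) is shorter than the
    # initial viewing (A).
    return blocks[2][1] < blocks[0][1]
```

Running such a check over all participants for a given slide would give the share of recipients whose gaze follows the pattern.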
The impact of modal density
When going beyond the types of slides and focusing on the PowerPoint presentation as a whole, the findings of our study reveal the impact of modal density on the reception process. Comparing the results of the knowledge test administered to the participants in scenario II (n = 30) with those in scenario III (n = 30), clear differences are observable. 2 Those who watched a sequence of the PowerPoint slides that was synchronized with the original speech of the presenter (scenario II) scored much higher, reaching 2.46 points on average out of 4 possible points. The participants in scenario III who saw the PowerPoint files without any audio mode only reached 1.76 points on average (Figure 6). These results are less ambivalent than in Wiebe et al. (2007), who used comparable settings in a science learning context and concluded that ‘PowerPoint with voiceover (scenario II in our study) does slightly more in terms of retention of science content than does a PowerPoint with no voiceover, but not enough to show significance’ (pp. 338ff). 3

Figure 6. Results of the knowledge test from scenario II and scenario III.
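To put the two reported means in proportion, the following Python sketch expresses them as shares of the maximum score and computes their difference. Only the figures given above (2.46 and 1.76 out of 4 points) are used; the variable and function names are illustrative. Since no individual scores or variances are reported here, the sketch does not and cannot assess statistical significance.

```python
MAX_POINTS = 4

# Mean knowledge-test scores as reported in the text.
mean_scores = {"scenario II": 2.46, "scenario III": 1.76}


def share_of_maximum(mean, max_points=MAX_POINTS):
    """Mean score expressed as a fraction of the attainable maximum."""
    return mean / max_points


# Absolute difference in points between the two scenarios.
difference = mean_scores["scenario II"] - mean_scores["scenario III"]

# Difference relative to the score of the slides-only condition.
relative_gain = difference / mean_scores["scenario III"]

for name, mean in mean_scores.items():
    print(f"{name}: {share_of_maximum(mean):.1%} of the maximum")
print(f"difference: {difference:.2f} points, relative gain: {relative_gain:.1%}")
```

On these figures, the group with synchronized speech reached about 61.5 per cent of the maximum against 44 per cent for the slides-only group, a difference of 0.70 points or roughly 40 per cent relative to the slides-only mean.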
The comparison of knowledge acquisition in scenarios with different modal density shows that a higher number of modes in scientific presentations does not constrain the transfer of knowledge (e.g. due to cognitive overload), but rather that multiple modes may gainfully support one another (see also Mayer, 2005).
One sees what one shows: the importance of referential actions
Referring actions – verbal and gestural – are another variable in the reception process of PowerPoint presentations as they guide the recipients, and indicate what is relevant (Knoblauch, 2008: 88). Moreover, ‘in combination with speech it [i.e. the gesture] even succeeds quite regularly in creating new meanings that are not represented either in the words spoken or on the slide pointed at.’
Different strategies of reference can be applied by a presenter: verbal references (on the left side, here, etc.), gestural references with the hand, technically-based references with a laser pointer or an arrow on the slide, to name but a few examples (for details, see Lobin, 2009: 67–70, 114ff.; Knoblauch, 2008: 78). In scenario IV of our study, the scenario with the test-presentation, the impact of two of these strategies on the reception process was tested, namely verbal references and references with a laser pointer. Figure 7 shows the stimulus slide presented to the 23 participants. It mainly contains a large still from a 1950s news programme, showing the anchor and an information graphic (a map) next to him in the background. In our test-presentation, the presenter first used a laser pointer to refer to the anchor on the slide; later on, he verbally referred to the map next to him by saying: ‘And in the background the … information graphics.’ The visualization of the participants’ eye-tracking data (Figure 8) reveals that the reference with the laser pointer instantly directs the audience to the anchor on the slide. The verbal reference also has an impact on the allocation of visual attention, but a less distinct one. During and after the reference, measurably more participants focus on the map next to the anchor.

Figure 7. Stimulus slide from the test-presentation tested in scenario IV.

Figure 8. Visualization of the gaze motions of 15 participants in scenario IV during the reception process of the slide shown in Figure 7.
This example stresses the overall importance of referential actions to the process of meaning-making. Nevertheless, the realization of their communicative functions presupposes the existence of several fundamental factors:
timing: reference acts have to be synchronized.
wording: the same wording has to be used in speech and on the slide.
language: the same language should be used in speech and on the slide to express the coherence between speech and projection.
sequence: reference acts on a slide should appear in a successive order.
accuracy: gestural references should point exactly to the object which is meant.
In addition to the theories of multimedia learning, the results indicate that coherence management has an impact on all of the cognitive processes of selecting, organizing and integrating verbal–pictorial input (for details, see Bucher et al., 2010: 394–98).
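The comparison of the laser-pointer and the verbal reference reported in this section rests on counting how many participants look at a given region of the slide before and after a reference act. The following is only a minimal sketch of such a count, not the procedure used in the study: the names `Fixation`, `AOI` and `share_on_aoi`, the pixel coordinates, the two-second windows and the toy fixations are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    participant: str
    t_ms: int   # fixation onset in ms after the slide appears (hypothetical)
    x: float    # horizontal gaze position in slide pixels (hypothetical)
    y: float    # vertical gaze position in slide pixels (hypothetical)


@dataclass(frozen=True)
class AOI:
    """Rectangular area of interest on the slide, e.g. the map."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def share_on_aoi(fixations, aoi, start_ms, end_ms):
    """Fraction of all participants with at least one fixation
    inside `aoi` during the window [start_ms, end_ms)."""
    participants = {f.participant for f in fixations}
    if not participants:
        return 0.0
    hits = {
        f.participant
        for f in fixations
        if start_ms <= f.t_ms < end_ms and aoi.contains(f.x, f.y)
    }
    return len(hits) / len(participants)


# Invented toy data: four participants, NOT data from the study.
MAP_AOI = AOI("map", left=600, top=100, right=900, bottom=400)
REFERENCE_MS = 5000  # assumed onset of the verbal reference to the map

toy_fixations = [
    Fixation("p1", 3000, 300, 250), Fixation("p1", 5600, 700, 200),
    Fixation("p2", 3500, 650, 300), Fixation("p2", 6000, 720, 310),
    Fixation("p3", 4000, 280, 260), Fixation("p3", 5800, 310, 240),
    Fixation("p4", 4200, 290, 255), Fixation("p4", 6500, 800, 150),
]

before = share_on_aoi(toy_fixations, MAP_AOI, REFERENCE_MS - 2000, REFERENCE_MS)
after = share_on_aoi(toy_fixations, MAP_AOI, REFERENCE_MS, REFERENCE_MS + 2000)
print(f"share on map before reference: {before:.2f}, after: {after:.2f}")
```

Comparing the two shares for the same area gives a simple indication of whether a reference act redirects visual attention. A real analysis would additionally have to settle how fixations are detected and how long the comparison windows should be.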
One sees what one knows: the relevance of prior knowledge
From a reception-theoretical perspective, a determining factor in the debate on the reception of PowerPoint presentations is the audience itself, or, more precisely, individual characteristics of the recipients (age, gender, prior knowledge, etc.). Our studies show the recipients’ prior knowledge on the topic of a presentation to be of particular importance. The slide in Figure 9 from our data corpus is a mixed slide from a veterinary medical presentation. It represents a typical situation in the sciences, namely the presentation of empirical data. In this case, the slide shows the results of investigations of mechanisms underlying pharmacoresistance in status epilepticus models. A PET scanner was used to observe the absorption of the substance Verapamil by the brain of a rat after the administration of the drug Tariquidar. The picture on the slide in Figure 9 shows four PET scans of the rat: the two on the left were done before it received the drug Tariquidar, the two on the right two hours after it had received the drug. A comparison of the different PET scans makes it obvious that Tariquidar led to a higher absorption of Verapamil (light colouring of the brain in the two PET scans on the right in Figure 9).

Figure 9. Mixed slide from a veterinary medical presentation.
As for the headline, ‘11C-VERAPAMIL-PET-SCAN NACH [after, HJB/PN] TARIQUITAR’, the large picture with the four PET scans clearly marks the central element on the slide for a veterinarian who is familiar with the PET imaging technique. The other elements, like the text, diagram, etc., contain ancillary information about the experiment presented.
The veterinarian giving the presentation uses different reference techniques to connect his oral speech to the content of the PowerPoint slide: a laser pointer (gestural deixis), local deictic references and object deictic expressions, as highlighted in the following representative excerpt from the transcript:
On the
Experts vs novices: scenario II
To evaluate the influence of the recipients’ prior knowledge on the allocation of attention, we compared an expert and a novice in scenario II, where the performative mode of the speaker (his facial expressions, the gestural deixis, etc.) was cut off. Figure 10 illustrates the gaze motion of the two recipients while the presenter gives the explanations documented in the excerpt above. Obviously, the expert looks at the relevant areas in the picture showing the PET scans (the lung and the brain of the rat on the left). The novice, however, quite often gazes at the wrong rat (the one on the right); his scan path can be described as a search pattern.

Figure 10. Visualization of the gaze motions of an expert and a novice in scenario II during the reception process of the slide shown in Figure 9.
Apparently, the expert’s higher level of prior knowledge allows him to interpret the spoken references of the presenter in the intended way, so there is no need for pointing actions with a laser pointer to direct his attention to the relevant areas on the slide. For the novice, the elimination of the performative mode of the presenter seems to be a major problem, which may be related to a lack of knowledge of the anatomy of a rat, a lack of experience in analysing PET scans, or deficits in using relevant technical terms (e.g. ‘naive rat’).
This interpretation is backed by a statement the expert made during the retrospective interview, commenting: ‘Yes, due to the fact that the brain looks like that … Therefore, I put this in a chronological sequence, because he explained it like that.’ This reveals that the expert is familiar with the anatomical structure of a rat’s brain as well as with the PET imaging technique. Therefore, he is able to allocate his attention adequately on the basis of the presenter’s verbal expressions.
The novice’s problems allow us to draw the conclusion that higher modal density can assist meaning-making. With the additional performative mode of reference actions it would have been much easier for the novice to understand what is relevant on the slide at this specific point of the discourse.
The importance of the recipients’ prior knowledge for the process of reception is also highlighted by the results of the knowledge test conducted in scenario IV. Among the test participants (n = 20), the experts scored considerably higher on average than the novices: 1.92 out of 4 possible points, compared with 1.6 points for the novices.
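To make the size of this difference easier to read, the two reported means can be put on a common scale. The sketch below uses only the figures given above (1.92 and 1.6 out of 4 points); the function name `compare_means` and the choice of summary values are our own assumptions. Because group sizes and score variances are not reported here, the sketch deliberately computes no significance test.

```python
def compare_means(expert_mean: float, novice_mean: float, max_points: float) -> dict:
    """Summarize the gap between two reported group means.

    All values are rounded to two decimals; no inferential
    statistics are computed, since only the means are known.
    """
    diff = expert_mean - novice_mean
    return {
        # absolute gap in test points
        "difference_points": round(diff, 2),
        # each mean as a share of the maximum attainable score
        "expert_share": round(expert_mean / max_points, 2),
        "novice_share": round(novice_mean / max_points, 2),
        # gap relative to the novices' mean
        "relative_advantage": round(diff / novice_mean, 2),
    }


# Means as reported for the knowledge test in scenario IV.
summary = compare_means(expert_mean=1.92, novice_mean=1.6, max_points=4)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On these figures, the experts reach just under half of the attainable points and the novices two fifths. That is an absolute gap of about a third of a point, or roughly a fifth more than the novices’ mean.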
Discussion: Theoretical Relevance
Presentations, including visual projections like PowerPoint, are a new but widespread genre of scientific communication that opens up new possibilities for the public understanding of science. They incorporate a prototype of multimodal discourse, combining speech, text, figurative and graphical visuals, elements of design and a whole family of performative actions. The empirical results on understanding PowerPoint presentations presented here are relevant in two respects. First, they show how limited a whole branch of PowerPoint criticism has been in narrowing the analysis to a few types of (text-)slides, thereby ignoring the multimodal variety and performative dimension of this new genre. Therefore, scepticism towards and condemnation of PowerPoint are more a result of a restricted form of analysis than of analytical evidence. PowerPoint is first and foremost a tool that can be used to enrich a presentation with a visual channel, and this can be done with or without competence. Scientists, usually more experienced with spoken and written language, therefore have to learn how to orchestrate a complex multimodal ensemble of different semiotic systems. At the centre of this multimodal rhetoric lies the challenge of managing the coherence between what is said and what is shown, for example via actions of pointing, verbal references, and synchronization of speech and visual projection. In the end, the quality of communication is not determined by technical devices as such, but by the competence of dealing with them and understanding their limitations and possibilities.
A second consequence of this study concerns the theory of multimodality and multimodal understanding. Eye tracking and other reception data show that making sense of a presentation means integrating different modes of communication systematically. The comparisons between novices and experts show that for understanding scientific presentations it is essential to select the relevant aspects of a multimodal stimulus in an adequate sequence. The criteria for this selection neither derive from the stimulus itself, nor from the cognitive schemes of the recipient alone. They derive from the interactional activities by which the recipient appropriates the multimodal presentation. These results allow for an alternative approach to the problem that in semiotic theories of multimodality is labelled ‘intersemiosis’. Multimodal meaning can be conceived as a result of meaning-making (in an action theory sense), whereby the recipient constructs his or her meaning by selecting, ignoring, interpreting, and connecting activities. Reducing modal density by cutting off the modes step by step in the experimental setting on the one hand elucidates their respective functions within a multimodal arrangement and, on the other hand, shows how recipients draw upon the respective modes by integrating them for an overall understanding.
Limitations
Two limitations of the findings must be mentioned. In contrast to eye tracking, no technical device exists for ear tracking. Therefore, audio attention, which works simultaneously with visual attention, can only be deduced from secondary data such as utterances of the recipients or reception results such as knowledge tests. Because of technical limitations in the live scenario, we could only collect data with one eye-tracking device. We tried to compensate for this shortcoming with the help of a control scenario with a life-size presentation on a wall screen. Of course, it would be necessary to repeat the study in a real-life scenario at different presentations with more than one eye-tracking device. In view of the obvious lack of empirical reception research on scientific presentations and on multimodal communication in general, the limitations of the study presented here should be justifiable.
Acknowledgements
The authors thank Jochen Adam, Julia Harrer, Lisa Keimburg, Martin Krieg, Christian Lehberger and the HumTec team at RWTH Aachen University for their support during data collection and preparation. We would also like to thank Gerd Fritz and David Hudson for useful suggestions and corrections to earlier versions of this article. Last but not least, we would like to thank the ‘Volkswagen Stiftung’ for funding the research project ‘Interactive Science’.
Biographical Notes
HANS-JÜRGEN BUCHER is Professor at the Department of Media Studies, University of Trier, Germany, with a background in discourse analysis and practical journalism. His research interests include multimodal media communication, journalism, Internet research, audience research and science communication.
Address: Department of Media Studies, University of Trier, 54286 Trier, Germany.
PHILIPP NIEMANN is a researcher and lecturer at the Department of Media Studies, University of Trier. His current research interests are audience research, eye tracking and political communication.
Address: Department of Media Studies, University of Trier, 54286 Trier, Germany.
