Abstract
Any analysis of how mass-mediated visuals are perceived and interpreted in multimodal contexts should be informed by a scientific understanding of the biological constraints on visual processing, as well as a solid culturally aware visual communication approach. This article focuses on the interdisciplinary combination of three methods – iconology, a qualitative method of visual analysis targeted at the meanings of visuals and based in the humanities, and eye-tracking and psychophysiological reaction measurement, both based in experimental psychology. The authors propose a Visual Communication Process Model as an integrative means for connecting different facets of the communication processes involved in visual mass communication. The goal of this new model is to widen and sharpen the focus on explaining (a) meaning-attribution processes, (b) visual perception and attention processes, and (c) psychophysiological reactions to mass-mediated visuals, illustrated in this article with examples of press photography.
Introduction
In this article, we propose a new approach to bridging the methodological and theoretical gap in interdisciplinary research of the visual, by combining iconological visual analysis (Müller, 2011) with measurements of cognitive processes and emotional effects (Olk and Kappas, 2011; Müller and Kappas, 2011). Our goal is to create a theoretical framework that will allow for the combination of qualitative, non-standardized methods suitable for analysing the meanings of visual motifs in a specific context, with standardized experimental methods of measuring psychophysiological reactions to viewing press photographs (for a similar approach connecting eye-tracking with semiotics, see Boeriis and Holsanova, this issue; Holsanova et al., 2006). The selection of the journalistic genre of press photography, and particularly news photographs, is based on the assumption that press photographs play a crucial role in shaping human perceptions of socio-political reality and the formation of opinions about reality (Fahmy and Wanta, 2007). Additionally, there are strong indications that, quite frequently, visual motifs in the news depict violent or catastrophic scenes, showing corpses, severed limbs, mutilated bodies and so on, as for example ‘trophy shots’ of killed dictators such as Saddam Hussein or Muammar Khadafi. There is reason to believe that these stark visual motifs elicit strong reactions in audiences which might, under certain circumstances, even lead to clinical conditions such as post-traumatic stress disorder (Ahern et al., 2004; Garry and Gerrie, 2005; Tso et al., 2011). However, despite this indication of the relevance of press photographs, most experimental research on visuals has been conducted in the area of advertisements with respect to relatively narrow target audiences. 
The rationale behind our theoretical model (Figure 2) is to provide interdisciplinary teams of researchers with a common understanding of the complex visual process and the various feedbacks and interactions that need to be taken into account. Thus, we approach the study of press photography by integrating three processes crucial for the functions and effects of visual news communication: attention allocation, meaning-attribution and emotional reactions to press photography, as well as the relationship between these processes. Empirically, they are related to knowledge (context, media expertise), goals (motivation), processing style, self-relevance and emotional reactions. Thus, at the start of our collaboration, in very general terms, the main research questions were: How do people perceive and interpret press photography? How do they react emotionally to this particular type of visual news?
Our theoretical findings are based on our long-term collaboration since 2004 in the context of the transdisciplinary Research Center Visual Communication and Expertise (VisComX), and particularly on our preceding work on visual competence (Kappas and Olk, 2008; Müller, 2008; www.visualcompetence.org). In this context, we developed a theoretical model – the Visual Competence Cycle (see Figure 1) – that is meant to illustrate the context-dependency of meaning-attribution processes and their interrelation with the four dimensions of visual competence – visual production, visual perception, visual interpretation and visual reception competencies. Thus, we stipulate that the attribution of meanings to visuals is strongly influenced on three contextual levels: (1) the individual, personal or dispositional context, both in terms of the individual producing a visual (intended meanings), and individual recipients interpreting the visuals (attributed meanings); (2) the situational context; and (3) the wider systemic context in which the respective visual was produced and is perceived.

Figure 1. The Visual Competence Cycle with four visual competence dimensions (production, perception, interpretation and reception) and three context levels (individual context, situational context and systemic context) (Müller, 2008: 103).
Theoretically building on the Visual Competence Cycle in Figure 1 and based on a broad literature review, we propose a theory-based process model, the Visual Communication Process Model (VCPM), for focusing visual research goals (see Figure 2). We acknowledge that each area – perception, meaning, emotion – and their relationship will be affected by a large array of factors and that there are several ways in which an integrative investigation could be approached. Despite much relevant research (see below), as far as we know, no study to date has investigated these intricate relationships in their totality. The VCPM integrates different sub-processes and in doing so provides both a conceptual tool and a roadmap for large-scale research programs. Our approach should be understood as a first starting point for fostering further research and not a complete analysis of all possible factors.

Figure 2. The Visual Communication Process Model (VCPM).
The essence of the VCPM lies in its feedback loops linking the three major processes involved in visual communication into a cyclical pattern that is based on a co-dependent cascade with a total of nine sub-processes (translating into nine arrows in Figure 2). The perception of visuals depends on visual exploration (1). Perceived images are then interpreted (2) based on knowledge, individual experience and context information (3). Depending on the interpretation, emotions might be elicited (4). There are numerous feedback processes that influence and guide visual exploration as images are being interpreted via top-down processes (5). Emotions shape the interpretation (6). Additionally, it is likely that emotions have the capacity to influence visual exploration via fast sub-cortical pathways (7). While the influence of knowledge and the three context levels (Figure 1) on emotions is via interpretative processes, there might be changes in the thresholds for the elicitation of emotions that are depicted via a dashed arrow (8). Similarly, there is an indirect influence of individual emotions on images, depicted as a dashed arrow (9). This means that the selection of images by journalists, editors and other media producers/publishers is in all likelihood a function of anticipated emotions. This is the moment when the Visual Communication Process comes full circle, and previously experienced emotional reactions to similar visual stimuli influence the production of new visuals, thus creating a continuous cascade of mutual influences from the production of images to their perception, interpretation and emotional reception.
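As an illustration only, the nine arrows described above can be written down as a small directed graph. The sketch below is one possible reading of the text, not a reproduction of Figure 2: the node names are our own labels, and the endpoints chosen for arrows 1 and 2 in particular are assumptions. It checks the claim that the process 'comes full circle', and that it does so only through the dashed arrow 9.

```python
from collections import namedtuple

# One arrow of the model: its number in the text, its source and target,
# and whether the text describes it as a dashed (indirect) arrow.
Arrow = namedtuple("Arrow", ["number", "source", "target", "dashed"])

# Node names are illustrative labels (assumptions), not taken from Figure 2.
ARROWS = [
    Arrow(1, "images", "visual_exploration", False),
    Arrow(2, "visual_exploration", "interpretation", False),
    Arrow(3, "knowledge_and_context", "interpretation", False),
    Arrow(4, "interpretation", "emotions", False),
    Arrow(5, "interpretation", "visual_exploration", False),
    Arrow(6, "emotions", "interpretation", False),
    Arrow(7, "emotions", "visual_exploration", False),
    Arrow(8, "knowledge_and_context", "emotions", True),
    Arrow(9, "emotions", "images", True),
]


def successors(node, include_dashed=True):
    """Return the nodes that `node` influences directly."""
    return [a.target for a in ARROWS
            if a.source == node and (include_dashed or not a.dashed)]


def returns_to(start, include_dashed=True):
    """Check whether influence starting at `start` can flow back to it."""
    seen, stack = set(), successors(start, include_dashed)
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(successors(node, include_dashed))
    return False


if __name__ == "__main__":
    print(returns_to("images"))                        # all nine arrows
    print(returns_to("images", include_dashed=False))  # arrows 1-7 only
```

Under this reading, the loop from images back to images exists only when the dashed arrows are included, which matches the description of arrow 9 as the point where anticipated emotions feed back into image production.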
Background: Visual Communication Research and Press Photography
Past research has suggested that visuals – and photographs in particular – have a strong impact on people’s perception of the world (e.g. Garry and Gerrie, 2005; Fahmy and Wanta, 2007; Monk, 1989). Visuals are assumed to have a strong influence on attitude formation and public opinion, as well as on political motivation, participation and action; as David Perlmutter (1998: xii) stated: ‘it has become almost assumed wisdom that a picture ... can trigger the emotional reaction of world opinion and force the hand of policy makers.’ However, Perlmutter suggests that this emotionally activating potential of press photography has its origin in what he dubbed a ‘first person effect’, where it is the general belief of ‘discourse elites’ that the visuals do have a strong impact, and not necessarily the real and empirically measurable impact that press photography has on the general beholder. He goes even further, suggesting that the same decision-making elites ‘then, often falsely, project this effect on the general viewing public’. The photographic vehicles for those assumed emotion-eliciting, as well as action-eliciting, visuals are what Perlmutter labelled ‘icons of outrage’, each defined as ‘a picture that demands our attention’ (p. xiv). Despite those strong assumptions about attention-grabbing press photographs and their actual or assumed impact on opinion formation and social or political action, no empirical evidence for an ‘inner connection’ between the perception and reception of visuals and a certain ‘public image’ has been provided so far. This gap in empirical research is particularly surprising given the strong indications that exposure to certain types of visuals, and press photographs in particular, might have potentially traumatizing effects on audiences (e.g. with respect to the impact of trauma on the perception of images related to the 9/11 attacks, see Tso et al., 2011).
A deeper understanding of the interaction of perception, meaning attribution and emotional reaction to publicized news photographs is desirable from the dual perspectives of basic research and applied journalism. Despite the increase in importance of mass-mediated dynamic visuals (film/video/online), press photography remains a ‘hot topic’, which is reflected in the continuing publication output (e.g. Griffin and Lee, 1995; Monk, 1989; Parry, 2010; Zillmann et al., 2001).
Reception studies in communication science testing the impact of visuals are rare (e.g. Petersen, 2006) and often focus on testing the reception of full print or online pages (e.g. Bucher and Schumacher, 2006), or their major goal is to test the ‘side effects’ of visual communication on the reading process (e.g. Zillmann et al., 2001). The article by Petersen is particularly interesting for actually measuring the effects of press photographs in a field experiment. He used text and image manipulation for this task, his key interest being to test the actual impact of the visual by comparison to text. His – cautious – result is that text can be a ‘mightier sword’ than visuals, thus relating his research results to Perlmutter’s aforementioned ‘first person effect’, suggesting that the cognitive, emotional, social and political impact of press photography might be a post-hoc construction. One of the shortcomings of studies solely based on self-report is that potentially non-conscious bodily reactions are not assessed, and thus no ‘objective’ measures are available. This is the major reason why we included psychophysiological measures in our research design, which focuses on the visual (pilot study 1), and then on the potential text–caption effect (pilot study 2).
The method we apply to investigate both intended and attributed meanings is an adaptation of iconology, which was originally devised for the systematic analysis of artworks. In the context of this research, iconology is used to identify patterns of production and depiction in their respective production and reception contexts using the example of contemporary press photography (see Müller and Kappas, 2011). The results from the iconological analysis will thus contribute to better understanding the processes that relate to meaning-attribution in the VCPM (Figure 2, arrows 2, 3, 4 and 5).
Attention research and press photography
An elegant way of inferring the allocation of attention is to monitor eye movements. Eye-tracking helps to measure the rapid sequences of saccades and fixations. The obtained data provide objective measurements, such as where a person looked first, for how long a given area was fixated, and how the exploration continued (Olk and Kappas, 2011; see also other articles in this issue).
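To make these measurements concrete, the following minimal sketch derives three of them from a list of fixations: which area was looked at first, how long each area was fixated in total, and the order in which areas were visited. The rectangular areas, coordinates and durations are invented for illustration and do not come from the studies reported here; the function and field names are likewise hypothetical.

```python
from collections import namedtuple

# A fixation: screen position in pixels and duration in milliseconds.
Fixation = namedtuple("Fixation", ["x", "y", "duration_ms"])
# A rectangular area of interest (AOI) in screen pixels.
AOI = namedtuple("AOI", ["name", "left", "top", "right", "bottom"])


def aoi_of(fixation, aois):
    """Return the name of the first AOI containing the fixation, or None."""
    for aoi in aois:
        if (aoi.left <= fixation.x <= aoi.right
                and aoi.top <= fixation.y <= aoi.bottom):
            return aoi.name
    return None


def summarise(fixations, aois):
    """Compute first AOI, total dwell time per AOI, and the AOI scanpath."""
    labels = [aoi_of(f, aois) for f in fixations]
    dwell_ms = {}
    for label, fixation in zip(labels, fixations):
        if label is not None:
            dwell_ms[label] = dwell_ms.get(label, 0) + fixation.duration_ms
    hits = [label for label in labels if label is not None]
    # Collapse consecutive fixations on the same AOI into one visit.
    scanpath = [label for i, label in enumerate(hits)
                if i == 0 or label != hits[i - 1]]
    return {
        "first_aoi": hits[0] if hits else None,
        "dwell_ms": dwell_ms,
        "scanpath": scanpath,
    }


# Invented example data (not from the pilot studies).
AOIS = [AOI("tank", 50, 50, 250, 200), AOI("flag", 350, 20, 500, 150)]
FIXATIONS = [
    Fixation(120, 80, 200),   # tank
    Fixation(130, 90, 300),   # tank
    Fixation(400, 60, 250),   # flag
    Fixation(700, 500, 150),  # outside every AOI
    Fixation(110, 70, 100),   # tank
]

if __name__ == "__main__":
    print(summarise(FIXATIONS, AOIS))
```

Dividing each dwell time by the total viewing time would give the percentage measure typically reported for areas of interest.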
Psychological research shows that, apart from ‘low-level’ characteristics (e.g. colour, luminance, onset), gaze and attention are also guided by ‘high-level factors’ such as knowledge, goals, processing style, self-relevance and emotional reactions (see also Kappas and Olk, 2008). Pictures are viewed in an active manner and observers search for relevant information (Henderson, 2003). For instance, visual exploration strategies of experts and novices differ significantly. There are cultural differences in visual exploration (Chua et al., 2005), and the goals of the observer are very important (Yarbus, 1967). Eye-movement patterns differ depending on the task given to participants and hence the information required. The promise of attention research then is that eye-tracking can provide clues as to how: (1) particular image contents are perceived and evaluated in general; (2) context influences perception; and (3) pre-existing knowledge, attitudes and values (e.g. membership of particular groups, or cultural origin) influence perception.
In the context of communication science, eye-tracking has been mainly applied in three areas: public relations/marketing (e.g. Pieters et al., 1999), newspapers (e.g. Bucher and Schumacher, 2006) and advertising research (e.g. Krugman et al., 1994). Academic research mainly uses eye-tracking to study the perception of advertised products in mass media; for example, the scrutiny of online advertising with respect to the positioning of buttons, headers and flyers is currently receiving heightened attention. But newspaper and net-paper perception are also topics of study (e.g. Holsanova et al., 2006).
Measuring visual attention via eye-tracking in the applied context has two specific goals: (1) to establish whether a viewer/reader spends time on a particular detail – if not, for example, important details might be missed; or (2) to study the time and frequency a viewer spends on a particular detail in order to gain an insight into covert psychological processes, such as attitudes or knowledge. However, given the fact that there are many low-level and high-level processes interacting in directing visual attention, eye-tracking data alone are frequently ambiguous. What is needed is a link between psychological perception and reaction processes and interpretative meaning-attribution. This is where iconology as a method of qualitative inquiry into specific meanings (see Müller, 2011), attributed to particular press photographs, comes into play. Attention research, with eye-tracking as its applied method in our research design, contributes to a better understanding of the processes guiding attention and perception (Figure 2, arrows 1, 2, 3 and 9). No objective markers have so far been identified that could indicate directly from the eye-tracking data whether visual attention is due to liking/disliking or to the purely visual properties of a stimulus, such as, for example, a visual symbol (e.g. the armoured vehicle/tank in the press photograph depicted in Figure 3). It is here that multimodal reaction assessment, described in the section below, becomes relevant.

Figure 3. The photograph at the top was one of the images presented to the participants. Blue circles represent the fixation locations; the size of the circles and blue numbers show fixation duration; arrows indicate saccade direction; and the yellow numbers indicate the sequence of the saccades. The chart below displays time (percent of total viewing time) spent on most prominent areas of interest shown for each viewing.
Emotion research and press photography
Emotions are not reflexes that are unconditionally linked to a specific stimulus; rather, there is much flexibility. In general, affective responses are elicited by: (1) stimuli that are quite universal, such as pain, a particular facial configuration shown by infant mammals, or anything that moves at high speed towards the viewer; (2) stimuli that have acquired meaning through environmental effects and learning (e.g. a particular person, someone’s dog, or cultural symbols); or (3) how an individual interprets the personal meaning of an object or event in a concrete situation and context. The process that elicits the emotion is known as appraisal (e.g. Frijda, 2007). The advantage of the appraisal approach is that it gives an understanding of what it is in an image that ‘works’ for an individual or a group of individuals with certain shared properties or values (e.g. supporters of a particular politician or idea; Müller and Kappas, 2011).
There are two competing theoretical points of view on how to conceptualize the elicitation of ‘emotions’. One is based on a small number of ‘basic or discrete emotions’ (e.g. happiness, fear, sadness, or anger; see Russell et al., 2011), the other is based on more fundamental underlying differences between emotional states in that they are to different degrees positive or negative, or arousing (e.g. Bradley and Lang, 2007). This dimensional approach might not correspond well to folk-theories in everyday life, but has been shown to be more useful in assessing affective states (Mauss and Robinson, 2009). Thus, for the present purpose, we will adopt the latter approach. This will contribute to an improved understanding of the emotion-eliciting processes in our VCPM (Figure 2, arrows 3, 4, 6–9).
Understanding how and why images elicit specific affective reactions promises also to shed light on how those emotional reactions directly, or indirectly, influence visual production and distribution processes (Figure 2, arrow 9). This would also appear to be crucial in developing ethical standards for distributing images that might elicit strong negative reactions in a subset of recipients. At a more general level, the role of images for affecting attitude change and actions is theoretically associated with emotional affordances. Critical statements, such as Perlmutter’s previously cited doubts as to the power of images, will have to be evaluated in the light of empirical evidence of the emotional force of press photography, taking into account aspects of the individual and the reception contexts (see Figure 1).
The measurement of emotions is not a trivial task. Firstly, it matters which (type of) emotion definition the researcher uses (e.g. discrete emotions vs dimensional approaches). Secondly, it depends on what the components of emotions are. Presently, few researchers would use the term ‘emotion’ synonymously with feeling, but instead consider emotion a multimodal response to events and situations that are appraised in a particular way. Specifically, this refers to subjective experience, physiological changes in the brain and the rest of the body, expressive behaviour, and changes in actions and action tendencies/readiness. Every emotional state involves changes in several but not necessarily all of these components. None of these can be considered a ‘gold-standard’ (see Mauss and Robinson, 2009). This is important as physiological measures are sometimes seen as objective measures of emotion; they are objective, but the observed changes do not map uniquely onto emotions. Conversely, subjective reports are informative but are not direct read-outs, as they are easily influenced by volition and thus by social context or pressures. Having argued this, if a viewer focuses on a particular symbol (here, the tank in the press photograph depicting a situation of political protest in Nepal in 2006, see Figure 3), as assessed via eye-tracking – and there is evidence of activation of the autonomic nervous system, for example in terms of changes in the electrodermal activity (skin conductance), and there is a smile and a reduction of brow activity as assessed via facial electromyography (EMG) – then the data would suggest that the observed visual behaviour was related to a positive evaluation of a meaningful stimulus.
This analysis should then be augmented with knowledge regarding the personal meaning of particular objects or symbols, based on the iconological approach (Müller, 2011), which involves a three-step method from (a) most neutral description of the perceived image to (b) expression of inferred meanings for individual participants, and (c) contextualized interpretations, relying both on typical patterns of visual motifs and on textual sources of interpretation. It is this logic we followed in the design of two pilot studies that will be discussed briefly.
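The multimodal inference described above, in which gaze, autonomic activation and facial EMG are read together, can be sketched as a simple rule. This is only an illustration of the logic: the field names are hypothetical, and how each yes/no indicator would be derived from raw signals (thresholds, baselines, time windows) is not specified in the text and is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-trial indicators, each already reduced to yes/no."""
    fixated_symbol: bool       # eye-tracking: gaze dwelt on the symbol
    skin_conductance_up: bool  # electrodermal activity increased
    smile_up: bool             # facial EMG: smiling activity increased
    brow_down: bool            # facial EMG: brow activity decreased


def suggested_reading(obs):
    """Combine the indicators following the example given in the text."""
    if not obs.fixated_symbol:
        return "no attention to symbol"
    if obs.skin_conductance_up and obs.smile_up and obs.brow_down:
        return "positive evaluation suggested"
    # Any other pattern is left open: no single component is a gold standard.
    return "ambiguous"


if __name__ == "__main__":
    trial = Observation(True, True, True, True)
    print(suggested_reading(trial))
```

Even the 'positive' outcome is only a suggestion; as argued above, it still needs to be set against what the symbol means to the individual viewer.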
Method
We report here two pilot studies, applying a mixed-method design, combining experimental approaches with qualitative visual content analysis. Specifically, we assessed visual attention (eye-tracking), meaning attribution (qualitative visual analysis/iconology) and emotional reactions and interpretations. These experiments extend our previous efforts in applying psychophysiological measures (heart rate, skin conductance, facial electromyography) while participants observed a set of visual stimuli which they later had to describe and interpret, using a self-report questionnaire (Kappas et al., 2009). We argue that a best-practice approach would be to combine all of these in a single design. However, our third pilot study testing for psychophysiological reactions to the same visual stimuli cannot be presented thoroughly within the scope of this article.
With these two pilot studies we investigated: (a) whether we could replicate effects of task demands/goals (Yarbus, 1967) with complex press photographs; (b) how participants described the press photographs, and what meanings they attributed to them; and (c) how knowledge in the shape of context information provided by captions affected meaning attribution. We also have data (Kappas et al., 2009) on how the same visual stimuli are processed emotionally by different recipients using facial electromyography (EMG) and autonomic measures (such as heart rate and electrodermal activity).
Pilot study 1: Combining eye-tracking and iconological analysis
The purpose was to test whether the finding by Yarbus (1967) that observers’ task demands/goals have an impact on exploration behaviour, obtained using a black-and-white reproduction of a well-known Russian painting, would transfer to colour press photography. Five online press photographs, which were unknown to the participants, were presented. The selection criteria were that visuals should (a) depict a current event which was not widely noticed, (b) be a mixture of different camera angles, (c) show different emotional situations that express a certain tension and anxiety, and (d) depict motifs that were ambivalent in their interpretation outside their actual news context. A control condition (16 participants, 7 of whom were female; mean age 20.6) checked for the effects of repeated exposure to the stimuli independent of instruction. In the experimental condition, 13 participants (7 of whom were female; mean age 19.3) viewed all stimuli four times, each time with a different instruction. The order of the presentation of the pictures was randomized between participants; the order of instructions was the same for all participants. For the first viewing, participants simply looked at each picture. Second, pictures were shown in the same order as before but participants verbalized their immediate association. Third, they were asked to describe what happened in the pictures. Fourth, they were asked to describe what the people depicted feel. While serial effects could not be excluded, it is apparent that the sequence of conditions cannot be fully randomized, as the free response must come before specific instructions regarding visual attention; otherwise, the ‘spontaneous’ responses are primed by the previous task. Each picture was viewed for 30 seconds; eye movements were recorded at a 500 Hz sample rate with a spatial resolution of about 0.3° visual angle, using an EyeLink II eye tracker (SR Research, Canada).
In addition, we recorded their verbal descriptions and included elements of qualitative visual analysis, including self-report and iconographic image description, the latter being the first step (pre-iconographical description) in the art historian Erwin Panofsky’s acclaimed iconographical three-step method (see Müller, 2011).
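The presentation scheme of the experimental condition can be sketched as follows: one randomized picture order per participant, reused across four instruction blocks that always occur in the same sequence. The picture identifiers, the seeding scheme and the function names are illustrative assumptions; the control condition is not modelled.

```python
import random

# Placeholder identifiers for the five press photographs (assumption).
PICTURES = ["photo_1", "photo_2", "photo_3", "photo_4", "photo_5"]

# Instruction order was fixed for all participants.
INSTRUCTIONS = [
    "free viewing",
    "immediate association",
    "describe what happened",
    "describe what the people depicted feel",
]

VIEWING_SECONDS = 30   # each picture was viewed for 30 seconds
SAMPLE_RATE_HZ = 500   # eye-movement sampling rate
SAMPLES_PER_PICTURE = VIEWING_SECONDS * SAMPLE_RATE_HZ


def build_schedule(participant_id, seed=0):
    """Return (instruction, picture) trials for one participant.

    The picture order is shuffled once per participant and then reused
    in every instruction block. The seeding scheme is an assumption made
    only to keep the example reproducible.
    """
    rng = random.Random(seed * 1000 + participant_id)
    order = PICTURES[:]
    rng.shuffle(order)
    return [(instruction, picture)
            for instruction in INSTRUCTIONS
            for picture in order]


def total_viewing_seconds(schedule):
    """Total picture viewing time implied by a schedule."""
    return len(schedule) * VIEWING_SECONDS


if __name__ == "__main__":
    for trial in build_schedule(participant_id=1):
        print(trial)
```

With five pictures and four instructions, each participant completes 20 trials, that is 600 seconds of viewing, and each 30-second trial yields 15,000 gaze samples at 500 Hz.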
Pilot study 2: Modifying meaning by manipulating captions
To demonstrate how context information can impact on the meaning of images, as indexed by a description of the context, or by ratings of the valence of image content, a pilot study was conducted in which six pre-selected press photographs were paired with captions suggesting either a positive or negative context or presented without a caption (see Köhler and Kappas, 2009). For example, in the case of one image (Figure 4), a man is shown raising his arms and apparently looking at a line of military vehicles passing by. Depending on the caption, this was described either as a man greeting troops arriving (positive) or as a man upset at troops leaving (negative). Thus, the caption suggested a different emotional meaning of the event for the person in the foreground; 45 participants rated images in a between-subjects design. The results suggested that some images were susceptible to caption-induced bias, whereas others were not. This most likely relates to factors such as ambiguity and strength of image context that have been studied in the context of the decoding of emotional expressions (Kappas and Poliakova, 2008).

Figure 4. The photograph at the top shows significant effects of ‘positive’ (A man from Georgia is happy to see the troops from his region return) and ‘negative’ (A man from South Ossetia who has lost his home in the conflict is upset at the withdrawing Russian troops) captions on the question ‘What is the mood in this photograph?’ (adapted from Köhler and Kappas, 2009).
Empirical Examples
Findings of pilot study 1: Combining eye-tracking and iconological analysis
To analyse the eye movement patterns, ‘areas of interest’ were defined, and it was calculated how often and for how long those areas were fixated. This allowed testing whether the fixation patterns changed with the instructions, as would be expected according to Yarbus (1967). Figure 3 shows results for one of the images, which are typical of the instruction-dependent patterns found for all five photographs. This specific photo shows soldiers on a tank near a street blocked by demonstrators. One of the prominent foreground elements is a large red flag waved by one of the demonstrators. On free viewing trials, participants spent most of their time (as a percentage of total viewing time) looking at the tank and the flag as well as at the demonstrators and soldiers. When asked about their immediate associations, the time spent on those areas increased slightly for flag, tank and demonstrators. The effects of instructions on eye movements are most evident when participants described what the people depicted feel. They looked more at the demonstrators and the soldiers on the tank, which are the most informative components in this condition. This effect was clearly evident for all five pictures. We thus successfully replicated the classic results by Yarbus. We also observed inter-individual differences. For example, while some participants looked more at objects and people in the foreground, others spent more time looking at the background. Our pilot study thus points towards the conclusion that processing motivation may play a significant role when looking at press photographs. While in line with the classical demonstrations by Yarbus almost 50 years ago, this result is far from trivial. All too often, dwell times or saccades are taken as indicative of properties of the images (for example, whether they are shocking or novel).
This cannot be the conclusion – instead, these objective indicators of visual attention are the product of an interaction between the image and what the observer tried to do with the image (for interactional reception theory, see also Bucher and Niemann, this issue). For example, if a press photo might be suspect with regard to its authenticity, visual exploration might be focused on very different elements than exploration in the context of a particular text or caption. If, for example, a person is described as ‘the last survivor to reach the safe haven of ...’ then individuating aspects of the person (e.g. face) might be more relevant to the observer than if the context was given as ‘one of many immigrants currently flooding the camp...’, which might trigger more attention on the immediate surroundings. Thus, it is important to have a sense of what participants actually thought or understood from the information that was (or in this case was not) provided.
The iconological analysis of the participants’ descriptions of the images showed, for instance, for the example picture (Figure 3), that 77 per cent (10/13) of the participants interpreted this scene as violent; however, 23 per cent (3/13) of the participants interpreted the scene as a celebratory event. One participant, for example, responded: ‘There is a military tank in the centre with soldiers on top of it. It is apparently moving and there are civilians at the back who are cheering. They are apparently happy that the tank is going wherever it is going.’
Another participant described the depicted scene as: ‘The protesters are blocking the road for the tank and are celebrating.’
This opposing interpretation might be related to high-level factors such as knowledge or previous experience with the depicted situation or objects in the photographs such as the prominent communist red flag. It is likely that participants coming from a post-communist country had a different reaction to the picture as well as a different interpretation of the picture, depending on whether their family supported the respective communist regime or was among the victims persecuted by the regime. The low number of participants in this pilot study precluded finding significant differences in the eye-tracking patterns. However, the spontaneous differences in interpretation suggest that analyses of visual exploration and reactions should take the individual interpretation into account and not assume a priori that the meaning of an image is shared by all viewers.
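How much a sample of 13 can and cannot say is easy to illustrate with the reported split of 10 versus 3 interpretations. The sketch below recomputes the percentages and adds a Wilson score interval for the 10/13 proportion. The choice of the Wilson interval and of a 95 per cent level is ours, made for illustration; it is not an analysis reported for the pilot study.

```python
import math


def percentage(count, total):
    """Proportion expressed as a rounded whole-number percentage."""
    return round(100 * count / total)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95 per cent level (assumption).
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    print(percentage(10, 13), percentage(3, 13))
    low, high = wilson_interval(10, 13)
    print(f"{low:.2f} to {high:.2f}")
```

The interval is wide, spanning roughly from one half to nine tenths, which is consistent with treating the split as a pointer for further research rather than as an estimate of how common each interpretation is.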
Seeing how different the interpretation of such a ‘simple’ image can be is indeed (no pun intended) an eye-opener. We believe that the differences in how images are being perceived can be rather large, but not random. Instead, there are specific facets, ranging from pre-existing knowledge to attitudes to context-dependency on all three levels (Figure 1), that moderate the objective viewing behaviour. The consequence here is to establish a research program that systematically tests how large the differences associated with these different facets are, in order to see whether it might be possible to estimate/diagnose particular cognitive processes (such as evaluations) from active image exploration, if particular information is available regarding the viewer (e.g. previous knowledge, cultural background). Alternatively, viewing behaviour can be brought into focus by providing specific viewing instructions, as was done here; however, this clearly would reduce the ecological validity of the data that are thus collected. Compared with free-ranging exploration (e.g. Holsanova et al., 2006) it would be difficult to estimate how images are perceived ‘in the real world’.
Findings of pilot study 2: Modifying meaning by manipulating captions
A general assumption in folk-theories, but also among some researchers, is that there are certain aspects of images that ‘speak for themselves’ – that people who smile are seen as happy, and that people who cry are perceived as being sad. This is possibly not so – and pilot study 2 was an attempt to demonstrate the effect of meaning-making as a function of the captions provided. In contrast to study 1, where we observed spontaneous differences in interpretation, we wanted to see whether we could bias interpretation of relevant image elements (emotional expressions) using specific image captions. As mentioned earlier, there have been studies trying to assess the relative power of text and image (e.g. Bucher, 2011; Petersen, 2006). Thus, the premise here was that the text can be very powerful – but how powerful can it be? Figure 4 shows an example where the interpretation of the facial and bodily expression of the man on the left is completely reversed as a function of the caption.
Again, this calls for a consideration of the personal interpretation of viewers in experimental designs. Going further, it underscores that visuals should not (only) be studied in isolation. The typical reception situation is arguably multimodal (e.g. Bucher, 2011). However, allowing for the full impact of multimodality can also be limiting because it is then difficult to estimate the interaction between the multimodal context and a specific image (or sequence of images). This is relevant in many applied contexts, such as the selection of images to be placed in a multimodal context. Understanding this interaction calls for a research strategy in which image effects are estimated with and without context (see Bucher and Niemann, this issue).
Furthermore, even within images, plenty of context is provided – consider the role of symbols such as the flags in Figure 3. In fact, even the framing of a portrait might affect how the person is being perceived.
Relevance
The relevance of this contribution is threefold. Firstly, we want to underscore that visual communication is a complex process that ranges from the criteria that might lead an editor to select an image to provoke a particular response, through low-level visual aspects that direct attention to and memory of images and the subjective evaluation of images, which can be a function of, for example, knowledge, attitude, motivation or group membership, to the specific multimodal context in which images are presented. Secondly, we propose a VCPM to aid theory development and as a roadmap for the development of research programs that take the sequential interfaces of visual perception, meaning-attribution and emotional reaction into consideration (see Figure 2). Even in its very simple state, the model allows the prediction of certain effects and cautions against oversimplification in the discourse on the impact of images. The model is an integration of different theories and viewpoints into a coherent framework. Thirdly, using some pilot research from our laboratories, we demonstrate how different pieces of the puzzle might be investigated with the ultimate goal of providing a coherent theory based on the VCPM.
What are the consequences of the model? Considering the impact of emotional content on the allocation of attention, and possibly the reduced ability of observers to control their attention, a relevant question for media practitioners and media scholars arises: if audiences cannot fully control their visual attention, should stricter ethical selection criteria be applied to the publication of, for example, news photographs that depict violence or human suffering? Concerns about strong emotional reactions to photographs are apparent, for instance, in the controversy over the deliberate publication of a press photo depicting the severed head of a female Palestinian suicide bomber in the magazine of the Swiss quality newspaper Neue Zürcher Zeitung (NZZ-Folio, 2005), which culminated in a reprimand from the Swiss Press Council. They are also apparent with regard to the potential post-traumatic stress disorder (PTSD) caused by explicit images in the aftermath of terrorist attacks (Ahern et al., 2004; Tso et al., 2011) or of natural disasters such as the 2004 tsunami in Southeast Asia, Hurricane Katrina in the US in 2005, or the tsunami and ensuing nuclear disaster in Japan in 2011. However, before specific traumatizing effects of visual press coverage can be analysed, basic assumptions about the relationship between visual perception, meaning-attribution and emotional reaction have to be established.
The present approach is relevant for visual communication research because it not only aims at investigating the impact of high-level factors on attention and eye movements but goes much further by linking visual exploration, meaning-attribution and emotional factors. We believe this approach to be very fruitful and innovative for visual research. The link between eye-movement patterns and the understanding and interpretation of the meaning of a picture is typically not assessed, since participants are not asked to describe what they see and understand. In fact, the exact meaning of a picture to a person is rather neglected in psychological studies, partly because the main focus of the research has not been on a given type of picture (e.g. press photography) and its theme(s), and partly because sound methods for assessing and analysing the meaning and interpretation of complex scenes are largely not available in this research tradition. This is where the mixed-method design, including iconological analysis, is most promising. However, many more studies have to be conducted before a reliable body of evidence can corroborate our assumptions about, and understanding of, visual communication’s intricate processing modes and modulations.
The VCPM provides a coherent framework in which not only individual differences in the exploration and impact of images can be explained, but also differences between groups of individuals, particularly with regard to culture. Here much research is needed, as images travel easily, specifically via electronic media, but their meaning does not travel in the same way. Instead, the intended meaning, associated with the motivation of producers and distributors of images, interacts with the personal meaning-making that takes place in different cultural and reception contexts. Clashes between the intended or anticipated meaning and the individual meaning of the receiver can only be understood using a multimodal, multidisciplinary framework.
Limitations
The main limitation of this contribution might be its ambition. Many researchers before us have commented on the complexity of communication processes in general, and of visual communication in particular. Part of the innovation of our model is that we tried to bring together different questions and different approaches with the goal of modelling the complete visual communication process in what we discovered to be a cyclical structure (Figure 2). This is indeed an ambitious enterprise. Critics might ask whether a simple model with a few arrows is not too simplistic to capture the complexity of the process we are interested in. Should it not be more complicated? We see the present version of the VCPM as a working model: an initial sketch that interacts with the empirical research it is intended to structure and generate. It is itself a work in progress. Hence, this is a first-generation version of our model that might be incomplete or unbalanced. However, at the moment, it is the best available blueprint for researching visual communication in an interdisciplinary yet integrated approach.
The examples we have selected come from press photography. There are many other types of mass-mediated visuals, and one could argue that the research and the model might fit press photography but not other media or other production contexts, such as advertising, political or scientific communication, or art (see also Müller and Kappas, 2011). However, we believe that, as sketchy as Figure 2 might appear on paper, this model is flexible enough to be applied to other multimodal media types. The pilot studies were intended to demonstrate how one could go about testing the interaction of specific sub-processes. They do not stand alone; they were conceived as pilot studies to be extended in further work. Specifically, a lot of work remains to be done on the synchronization of all measures: eye-tracking, psychophysiology, subjective experience and the description of images. Some of that requires time, some of it requires funding. There are many directions in which one could develop the research; for example, one could analyse the time course of eye-movement patterns during the viewing period in order to assess attention allocation during an initial orientation phase and during later cognitive processing (Bucher and Schumacher, 2006). This might make sense for emotional measurements as well, but it would go beyond the scope of this article.
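The time-course analysis suggested above could be sketched roughly as follows. This is a minimal illustration, not the procedure used in the pilot studies: the function name, the 2000 ms boundary between an "early" orientation phase and a "late" processing phase, the area-of-interest (AOI) labels and the fixation data are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict


def dwell_by_phase(fixations, split_ms=2000):
    """Sum fixation durations per area of interest (AOI) for two viewing phases.

    fixations: iterable of (onset_ms, duration_ms, aoi) tuples.
    split_ms:  assumed boundary between the early and late phase; a fixation
               is assigned to the phase in which it starts.
    """
    totals = {"early": defaultdict(int), "late": defaultdict(int)}
    for onset_ms, duration_ms, aoi in fixations:
        phase = "early" if onset_ms < split_ms else "late"
        totals[phase][aoi] += duration_ms
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {phase: dict(per_aoi) for phase, per_aoi in totals.items()}


# Hypothetical fixation record for one participant viewing one photograph.
fixations = [
    (0, 300, "face"),
    (350, 250, "flag"),
    (700, 400, "face"),
    (2100, 500, "caption"),
    (2700, 300, "face"),
]

result = dwell_by_phase(fixations)
print(result)
```

Comparing the per-AOI dwell times of the two phases across participants would be one simple way to ask whether attention shifts between image elements over the viewing period; where the boundary should lie is an empirical question that this sketch leaves open.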
Further limitations of our study relate to the ‘artificial’ laboratory conditions in which the two pilot studies were conducted. We used colour press photographs from the online version of a popular German newsmagazine. Thus, our study did not test the actual reception context of individual participants perceiving press photography while browsing the web, nor what differences might result when press photographs are viewed in a black-and-white printed newspaper. Yarbus (1967) demonstrated the task-dependence of eye-tracking patterns using black-and-white reproductions of a 19th-century colour painting that showed only a few strong emotional reactions on the faces of the depicted persons. By contrast, we used very current news images with strong colours and also strong emotions on the faces of most of the depicted people. A follow-up study controlling for colour, in which different participant groups look at the same visual stimuli, one group in colour and the other in black-and-white, might shed light on the role of colour in guiding viewers’ attention and interpretation processes.
Discussion
The present special issue testifies to the rapid development that the inclusion of eye-tracking in the communication scientist’s toolbox has triggered. Despite the importance of visuals in all walks of life, visual communication is still marginalized in the study of communication. However, objective measures of visual attention do contribute to a better understanding of how humans interact with their world, including personal and mass-mediated images. This is important because textual communication has a sequential logic that is evident, easily produced and easily reproduced as intended. Images, by contrast, are more complex and based on non-sequential association. Additionally, the exploration of, and interaction with, an image by the individual is much more idiosyncratic. While visual perception involves a sequential process, the physical source material of an image consists of a large array of information that is being fed through a non-sequential funnel of visual attention. Every time we look at an image, there are likely to be differences in how we look at it, how we interact with it, how we use it to answer questions, or how we are challenged by what we see – the interaction of the mental and material images. It is here that a multidisciplinary research design opens windows on the comprehension of visual communication.
However, neither eye-tracking nor iconology nor psychophysiology is a mind-reading method. Even if they were, if we think of mind as the thoughts that the little voice in our head seems to provide, this is only the tip of the iceberg of how our brain – and with it our biological and socio-cultural history – deals with the interaction of the self and visual representations. Meaning is an utterly complicated construct that involves concepts that have been discussed in the humanities and in the social and behavioural sciences. These approaches hold controversial assumptions as to what exactly influences meaning-attribution, or what specifically ‘triggers’ the meaning: the material image or artifact, the mind of the observer, or even the mind of the producer. It is with this complexity in mind that we are proposing the VCPM. We see it as shorthand for the integration of theories and empirical evidence generated in different disciplines. It is a roadmap for further research and interdisciplinary exchange. It requires multidisciplinary approaches and necessitates good will and tolerance from all disciplines involved when dealing with the complexity of the visual process. It also means coping with the difficulties of bridging disciplinary gaps by creating interfaces and synapses that could improve our understanding of visuals that are increasingly omnipresent, yet more ominous than obvious.
Acknowledgements
The authors would like to thank the reviewers of this paper for their constructive criticism, which helped to improve the manuscript to a considerable degree. The two pilot studies were made possible thanks to start-up funding provided by the School of Humanities and Social Sciences at Jacobs University Bremen. ‘Perceiving Press Photography’ is a transdisciplinary research project conducted in the context of, and with support from, the Research Center Visual Communication and Expertise (VisComX) at Jacobs University.
Biographical Notes
MARION G. MÜLLER is Associate Professor of Mass Communication and Director of the Research Center Visual Communication and Expertise (VisComX) at Jacobs University Bremen, Germany. Previously she chaired both the German Communication Association’s Visual Communication Division and the International Communication Association’s Visual Communication Studies Division and has published six books, among them the first German textbook on Foundations of Visual Communication, as well as special issues and many journal articles on visual communication theory, visual competence, political iconography, press photography and the role of visuals in press coverage with respect to conflict, war and amok.
Address: School of Humanities and Social Science, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany.
ARVID KAPPAS is Professor of Psychology at Jacobs University Bremen. He has been conducting research on emotions for over 25 years. Having obtained his PhD at Dartmouth College, NH, USA, he has lived and worked in Switzerland, Canada, the UK and in Germany. His research addresses how factors, such as the social context, or certain cognitive processes, influence how components of the emotion system interact, such as what people feel, what expressions they show, and how their body reacts. His recent work relates to Internet Communication and Affective Computing.
Address: same as Marion G. Müller.
BETTINA OLK is Associate Professor of Psychology at Jacobs University Bremen. She obtained her PhD at the University of Bristol, UK, and has conducted research in Canada and the USA. Her work aims at the understanding of fundamental mechanisms of visual attention, eye movement control and brain functions. Past and current cognitive and neuropsychological research questions focus on the relationship between reflexive (involuntary) and voluntary overt and covert orienting, i.e. how involuntary and voluntary orienting are integrated on a behavioural as well as a neural level in healthy young and elderly persons, children with autism and patients with brain injuries.
Address: same as Marion G. Müller.
