Abstract
The film editor’s task in refining film edits by frame-by-frame matching is an important undertaking in perceptual precision. This article investigates whether the failure of a few frames jeopardizes the perceived continuity of a film. Thirty-three Swedish students were eye-tracked while watching two versions of the same documentary film sequence; one version was completed to continuity satisfaction by a film editor, while the other had some frames altered toward discontinuity. Gaze hits in Areas-of-Interest appointed by the film editor, saccade frequency, and pupil dilation after edit points were measured. No significant difference was found for hits in Areas-of-Interest, whereas saccade frequency and relative pupil size increased after edits in the altered version of the film sequence. Results indicate that the altered film sequence constrained viewers with possible cognitive effects, implying that frame-by-frame matching of film edits achieved by film editors is crucial to film continuity.
An ongoing investigation among film scholars regards how continuity is achieved in film creation, not merely as Hollywood convention or “style” but rather how narrative films are adapted to human perceptual and cognitive capacities (see Smith & Henderson, 2008, for a literature review, or Shimamura, 2013, for a comprehensive compilation of theory). Continuity, primarily, is a means of making film narratives comprehensible to the audience. In this article, narrative means conveying a story, or several stories, in either fiction or nonfiction form. It is also noted that there are several categories of film that do well without the use of story, such as essay film (Corrigan, 2011), art film (Perlmutter, 1975), music video (Calavita, 2007), or film of attractions (e.g., trick films in the early days of cinema [Gunning, 1990] or YouTube videos of extreme sport stunts [cf. Partington, 2008]); hence, these films do not need continuity. However, film stories that are to be experienced and enjoyed by viewers must involve and engage the viewer emotionally, including embodied events (Corrigan & White, 2012; Plantinga, 2009). Such engagement exists also in relation to nonfiction film (Corrigan & White, 2012; Sobchack, 1999). Discontinuity breaks the audiovisual flow that enables the film viewer to get involved in the cinematic experience and makes him or her highly aware of the film that is being viewed as a technology-mediated and constructed object. Therefore, careful attention to continuity principles in film construction enhances the involvement experience of the viewer (Grodal, 1999, 2009; Persson, 2000).
Continuity and its opposite, discontinuity, need some further introduction. When film, most commonly, consists of numerous joined shots, the apparent consequence is that each shot in a series of shots is different from the others and identifiable as an individual shot. Shots follow each other, with distinct shifts from one to the next; the shots are discontinuous, since there are overt breaks between them (Bordwell, Staiger, & Thompson, 1985; Lindgren, 1948). These breaks relate to how objects, time, and space, together with corresponding ecological conditions and viewpoints are presented in addition to the graphical appearance of the shots (Bordwell & Thompson, 2008; Corrigan & White, 2012; Zettl, 1999). Thus, any two shots are discontinuous from the outset. The challenge for any filmmaker is to overcome these discontinuities, if that is the desired aim, in order to enable the film viewer to experience and follow the film as one continuous audiovisual flow. For that reason, since the early days of film making, filmmakers have developed means of achieving impressions of continuity, regarding what is in front of the camera, how the camera operates, how the shots are joined in editing, and how the sound can support the impressions (Bordwell et al., 1985; Lindgren, 1948). Continuity editing strives for graphical matching of shots, making movements in or out of the shots match, creating a sense of rhythm, making objects, space, and time appear as continuing smoothly from one shot to the next (Bordwell & Thompson, 2008). Thus, giving structure to the flow of shots acts as a kind of mental map for the viewer to relate to while following what is presented on screen (Zettl, 1999). Whether a film is edited by continuity principles or not makes a difference to how film viewers experience the film emotionally as well as intellectually (Corrigan & White, 2012).
This is not to say that continuity and discontinuity are the only valid concepts for understanding film editing style. Pearlman (2009) prefers linkage and collision, concepts she interprets from Pudovkin and Eisenstein as terms for intershot relations. Murch (2001) overrides the value of editing style in favor of emotion, story comprehension, and rhythm. These are also aspects of film editing highly appreciated by Pearlman (2009).
Nonetheless, continuity or discontinuity within and between shots is a matter of shot construction, relating to the characters appearing in the film, mise-en-scène, cinematography or videography, and the associated sound. Further, how shots flow in a sequence, as a continuous or discontinuous image stream, is a matter of how they are joined, concretely, in the editing. This is an aspect of editing with its own impact on continuity or discontinuity and therefore merits an examination of its own.
How a film is edited is already understood as key to accomplishing film continuity by both practitioners and scholars (Fairservice, 2001; Orpen, 2003; Reisz & Millar, 1968; Smith, 2005, 2012), and there is reason to expect that continuity matters in film viewing (Messaris, 2012; Smith, 2012). The film editor takes on the duty of refining every continuity edit to its optimal flow so that the viewer perceives the film as continuous. As in the case of the problem addressed here, failing to refine the edits thoroughly can then jeopardize the perceived continuity of the film. The effects of discontinuity in film editing have thus been studied in order to establish whether continuity editing, as prescribed in film theory, is perceptually important. Results, though, are not uniform (d’Ydewalle, Desmet, & Van Rensbergen, 1998; d’Ydewalle & Vanderbeeken, 1990; Hecht & Kalkofen, 2009; Shimamura, Cohn-Sheehy, Pogue, & Shimamura, 2015; Smith & Henderson, 2008). In this article, two versions of the same film sequence, one version with continuity edits and one version with the same edits slightly altered toward discontinuity (by opening up for stronger visual transients), are tested for the perceptual effects on film viewers by means of eye tracking. The perceptual intentions of the editor who made the continuity film sequence are known and then compared with the eye-tracking data. In brief, then, the effects of the film editor’s continuity editing can be measured via the means of identifying specific gaze behavior of the film viewer.
Film Perception Research
Interest in the relation between film continuity and human perception dates back to early film history (Münsterberg, 1916/2002) and has been scrutinized by perception researchers (e.g., Carroll & Bever, 1976; Hochberg, 1986) with an increased interest in recent decades in eye movements during film viewing (e.g., Goldstein, Woods, & Peli, 2007; Marchant, Raybould, Renshaw, & Stevens, 2009; Tosi, Mecacci, & Pasquali, 1997). More recently, the Eye-Tracking the Moving Image Research group has progressed and widened this research interest, using eye tracking for studying a variety of individuated ways of watching film, not only from the perspective of controlling attention (Brown, 2015; Robinson, Stadler, & Rassell, 2015) but also combining this method with other methods in order to broaden the understanding of film comprehension (Dyer & Pink, 2015). Still, filmmaking includes the delicate matter of mastering attention (cf. Batty, Perkins, & Sita, 2015). Perceived film continuity, as an illusion accomplished by the functioning of the human perceptual system, is principally described by Berliner and Cohen (2011). Continuous film narratives are shown to strongly engage viewers, since they considerably distract the viewer’s attention away from executing secondary tasks while viewing (Cohen, Shavalian, & Rube, 2015). In addition, motion in dynamic scenes is found to uniformly draw the viewer’s gaze (Mital, Smith, Hill, & Henderson, 2011).
There has also been continuous research interest in the perception of film edits, some in gradual transitions between film shots (e.g., Cutting, Brunick, & DeLong, 2011), but mostly toward cuts. Geiger and Reeves (1993) studied TV programming regarding how cuts between scenes of related and unrelated content, respectively, affect human attention. Sequences of unrelated content were found to be harder for viewers to follow and therefore increased their cognitive load (Geiger & Reeves, 1993). In an effort to improve video coding, Tam, Stelmach, Wang, Lauzon, and Gray (1995) studied the visual masking effects of scene cuts between two moving images on viewer perception of coding artifacts. They found that the first frame after a cut is masked from human visual perception (Tam et al., 1995). In his doctoral dissertation, Carmi (2007) investigated the relation between attention, cuts, and natural vision. He found that discontinuous cuts made viewers’ gaze behavior more similar, in focusing on the screen center. Germeys and d’Ydewalle (2007) examined the effects of object movements across cuts and concluded that it is the most salient area of a moving image that draws viewer attention after an edit, regardless of the shape of the edit. The impact of editing continuity, or the lack of it, on viewer perception of filmic events was studied by Magliano and Zacks (2011). Their results suggest that continuity editing supports event perception, not just mere visual continuity perception (Magliano & Zacks, 2011). When reviewing the state of research on film perception, Smith, Levin, and Henderson (2012) noted that there are several kinds of evidence that point to continuity editing as reflective of natural shifts of human attention. For instance, the construction of film sequences as orders of shots follows a pattern similar to how attention moves in a comparable real-world environment: motion and faces draw attention, as do viewing-task relevant features. Conversational turns, gaze shifts, and gestures of characters cue attention. Image brightness limits attention and blinks segment events. All these phenomena are drawn on in continuity editing (Smith et al., 2012). In recent research, eye movement across cuts has attracted interest (e.g., Shimamura, Cohn-Sheehy, & Shimamura, 2014; Smith, 2005, 2012, 2013). In experiments, film viewers were asked to indicate when they discovered a graphic distractor inserted into film sequences (Shimamura et al., 2015). Results showed that viewer attention to postedit distractors was significantly less than to preedit targets but that postedit attention increased when sound was removed from the film sequence. These findings imply that multimodal perception is stronger than visual perception alone and has disruptive effects on film viewers after edit points.
Rules and conventions of continuity editing have previously been examined from a perceptual perspective (d’Ydewalle & Vanderbeeken, 1990; d’Ydewalle et al., 1998; Hecht & Kalkofen, 2009; Schröder, 1990; Shimamura et al., 2015; Smith & Henderson, 2008), and film editors’ sensitivity to attentional cues has been recognized as key to continuity editing in practice, with the purpose of avoiding visual transients (Smith, 2005). D’Ydewalle and Vanderbeeken (1990) used both eye tracking and reaction time tests in order to estimate the effects of maintaining or abandoning classical “film editing rules.” This research showed that violation of editing rules provoked gaze reactions as well as cognitive responses. Later, d’Ydewalle et al. (1998), using eye tracking, found eye movements 200 to 400 ms after cuts that do not adhere to editing rules. When studying viewer detection of edits by means of eye tracking and reaction time tests, Smith and Henderson (2008) found a distinct difference between how often viewers failed to detect continuity and discontinuity edits; 9% of the discontinuity edits were missed, compared with 11% to 32% of the continuity edits (depending on type of continuity). Counter to the privilege of continuity, Hecht and Kalkofen (2009) produced results indicating viewers’ preferences away from continuity at match-action edits. In their study, viewers were asked to actively alter the edit point between two computer graphic-animated camera angles of a moving “blimp,” until the cut seemed to represent a continuous motion, using 200 ms incremental steps (approximately five frames). Viewers preferred a gap in the motion between the camera angles, rather than an overlap, which is a continuity editing prescription for match-action cuts (cf. Anderson, 1996; Dmytryk, 1984). These results are questioned by Shimamura et al. (2014) in terms of neglect of finely grained differences in action matching at edit points (down to the single-frame instance, 42 ms increment order) and disregard of blink durations as a possible explanation. Instead, by employing a similar methodology, and also including single-frame incremental order of frame altering as well as live-action moving images, Shimamura et al. (2014) found that viewers preferred a three-frame overlap for match-action edits.
Research regarding how sensitivity to attentional cues is actually and explicitly employed by film editors has been missing (Smith, 2005, p. 351). Recently, an observational study by one of the present researchers explored this issue; it is presented here as the first stage of the current study. In that study, the editing of a documentary film sequence was followed and screen recorded, as video that captured the editing software, the film material, and editor–researcher conversations (Swenberg, 2016). The analysis of the film editing against established perceptual phenomena (cf. Hochberg & Brooks, 1978; Smith, 2005; Wang, Freeman, Merriam, Hasson, & Heeger, 2012) measured the number of phenomena at stake for each edit, the time consumed, and the reiterations needed for the editor to complete the edit (Swenberg, 2017). In this particular case, the film editor’s handling of perceptual phenomena, such as audiovisual transients and attentional cues, occupied one third of the film-editing process time, with a strong correlation between the number of perceptual phenomena at an edit point and the time to complete the shape of the cut to the editor’s satisfaction. These results are taken as evidence confirming Smith’s (2005) predictions regarding how film editors attend to visual perceptual phenomena.
In the present article, the effects of the editor’s perceptual considerations on viewers’ gazes are considered as the second stage of the study. Eye-tracking data provide evidence regarding the editor’s achieved fluidity in the editing as a manifestation of continuity. The current aim is to test whether an editor’s achieved continuity in the film sequence is dependent on its frame-to-frame matching, as a kind of perceptual precision. The research question addressed is as follows: Can the altering of a few frames (four to six) of continuity edits impact the film viewer’s perception of these edits as they occur in a film sequence?
The results will indicate whether perceptual precision is important or not and thus support or oppose theories on film continuity. They will also indicate whether perceptual precision is worthwhile in film editing and thus give directions for how to expend editing efforts. Together, the results from the two parts of the larger study will link how perceptual precision is achieved in film editing, at what cost, and for what purpose.
Theoretical Background
The core concepts in this article are edit, edit point, continuity, discontinuity, perception, attention, and transients. These, and a few more terms, are presented here.
Perception here refers to the process during which sensory organs respond to, and extract salient information from, the world surrounding a human, and send signals to the brain for further processing, separate from ideas and understanding (Goldstone, de Leeuw, & Landy, 2015; Pylyshyn, 2003). The cognitive function by which “some sensory inputs are processed faster or deeper than others, and thus become more readily available for action, memory, or thought” (Lamme, 2003, p. 12) is called attention. Such processing can take place during sensory input, while the cognitive processing of the input takes place, as well as when a conscious awareness of the input is reached (Smith, 2005, referring to Lamme, 2003). Attention can be steered by will (top-down) or provoked by the senses (bottom-up), and Pinto, Van der Leij, Sligte, Lamme, and Scholte (2013) found that these are separate processes.
Film perception is a term used to encapsulate human capacities to adapt natural perceptual functioning to the situation of experiencing a film (Berliner & Cohen, 2011; Smith, 2012). During film perception, many viewers frequently attend to the same image features simultaneously, which is labeled attentional synchrony (Smith & Mital, 2013). All sound or image properties in a film sequence that stand out, which induce attentional or perceptual reaction in a viewer, are referred to as transients (Smith, 2005). Attentional cues are transients that provoke attention, for example, an unpredicted noise or a radical change of visual information (cf. Smith, 2005; Swenberg, 2017).
Continuity refers to the human experience of the world as stable, that is, continuous in its existence in regard to our sensory input (Hecht & Kalkofen, 2009; Smith, 2012). Any sudden change in the world causes sensory transients that are detected and understood as a discontinuity (Smith, 2012). In film perception, perceptual continuity is the viewer’s experience of a film as presenting a continuous flow of unbroken audiovisuality, usually a scene in a story, whereas film continuity is the audiovisual construction of filmic expressions meant for perceptual continuity. Smith’s (2012) Attentional theory of cinematic continuity focuses on the relationship between the film, as a construction, and the viewer’s perception of it. Discontinuity is similarly twofold: Viewer perception or attention to audiovisual transients that disturbs the experience of a continuous audiovisual flow is defined as perceptual discontinuity (cf. Smith, 2005), while the filmic construction provoking those transients constitutes film discontinuity.
Regarding film construction, a shot is a limited instance of a moving image, with properties such as composition, framing, and a start where the camera started recording and an end where the camera stopped recording, on each specific occasion. Shots are assembled into a sequence during editing, with no gaps in between shots, with a sound track running along. Each visual shift in this sequence, one shot ending, followed by the next shot starting, is an edit as a transition between the two shots. When such a transition occurs immediately from one frame to the next, it is called a cut. There are other graphical forms of shot transitions with specific names not regarded here. The point in time where an edit appears is defined as an edit point. The shot running before the edit point is referred to as the incoming shot, whereas the shot starting at the edit point is called an outgoing shot. The work of a film editor is labeled editing, which includes choosing shots; ordering them into a sequence; removing, adding, and ordering sound; and trimming the transitions between shots for the desired outcome. Transitions desired to appear smooth or invisible are referred to as continuity edits, where the sequence supposedly appears as one continuous shot, without breaks (Swenberg, 2017). Breaks between shots may be left with transitions overtly transient, to be responded to perceptually, attentionally, or consciously by a viewer, and are then called discontinuity edits. Yet, attentional cues may be employed for discontinuity just as for continuity, depending on the choices of the editor (Smith, 2005).
Table 1. Edits, Types of Continuity, and Number of Frames Altered (Incoming, Outgoing) in Condition 2.
Note. Types of continuity are categorized as Scene continuity (Sc), Action continuity (A), Space continuity (Sp), Time continuity (T), Gaze continuity (G), and Event continuity (E), based on Cutting, Brunick, and Candan (2012); Hecht and Kalkofen (2009); Magliano and Zacks (2011); Smith (2005, 2012); Smith and Henderson (2008). For incoming shots, −frames means that the shot ends earlier than in Condition 1, whilst +frames means that it ends later. For outgoing shots, −frames means that the shot begins earlier than in Condition 1, whilst +frames means that it starts later.
Perceptual phenomena are all explicit audiovisual features standing out as objective qualities in the film material, possibly provoking perception, attention, or awareness in a viewer. Actual film cuts are often made complex by containing a manifold of perceptual phenomena that offer several competing audiovisual transients in combination as cues across the edit (Smith, 2012). Film viewers expect to perceive phenomena that are at the focus of their attention, also across cuts, and as long as these minimum expectations are met, it is possible to cue viewer attention for perceived continuity by employing audiovisual transients (Smith, 2012). Thereby, discontinuous features of cuts can be hidden from viewer attention, which makes perceived continuity possible also in film genres that do not consist of shots where film continuity is considered while shooting.
During the trimming of edits, the audiovisual perceptual properties of shots are evaluated by the film editor, either using audiovisual transients for discontinuity or (mostly) avoiding or (occasionally) employing them for continuity (Swenberg, 2017). In regard to what is convenient, the editor’s degree of optimal use of audiovisual transients can be regarded as the perceptual precision achieved. This perceptual precision will be tested experimentally here by capturing film viewers’ eye movements. Fast eye movements (saccades) after edit points are considered as responses to the edits if they occur within a time frame of 120 to 400 ms after the edit (Smith, 2005). This is also the time span within which a missed Area-of-Interest (AoI) intended by the film editor can be caused by a slightly discontinuous edit. Eye behaviors outside the appointed time window are likely to be responses to factors other than the edit.
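The 120 to 400 ms response window can be illustrated with a minimal Python sketch. It is not the analysis procedure used in the study; the function name, the millisecond timestamps, and the choice to treat both window boundaries as inclusive are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

# Window bounds taken from the text (Smith, 2005); inclusive bounds are an assumption.
WINDOW_START_MS = 120
WINDOW_END_MS = 400


def saccades_after_edits(saccade_onsets_ms, edit_points_ms,
                         start_ms=WINDOW_START_MS, end_ms=WINDOW_END_MS):
    """Count, for each edit point, the saccade onsets that fall
    within [edit + start_ms, edit + end_ms]."""
    onsets = sorted(saccade_onsets_ms)
    counts = []
    for edit in edit_points_ms:
        first = bisect_left(onsets, edit + start_ms)   # first onset inside the window
        last = bisect_right(onsets, edit + end_ms)     # one past the last onset inside
        counts.append(last - first)
    return counts


# Hypothetical data: two edit points and six saccade onsets (all in ms).
edits = [1000, 5000]
onsets = [1050, 1130, 1390, 1450, 5400, 5401]
print(saccades_after_edits(onsets, edits))  # → [2, 1]
```

Saccades at 1050 ms and 1450 ms fall outside the window of the first edit and would, under this sketch, be attributed to factors other than the edit.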
An increase in pupil size 200 to 500 ms after edit points will be taken as evidence of increased cognitive load (Beatty, 1982; Chen et al., 2016; Kramer, 1991; Seeber, 2013). Testing of mathematical problem-solving (Beatty, 1982; Chen & Epps, 2014) and of configuration memory (Chen & Epps, 2014; Chen, Epps, Ruiz, & Chen, 2011; Peysakhovich, Causse, Scannella, & Dehais, 2015) has established pupil size increases of approximately 5% for easier tasks, whereas difficult tasks cause pupil sizes to increase by 10% or more. Intermediate task difficulty results in 7% to 8% larger pupils. In particular, pupil dilation is shown to be a stable measurement when engaging young adult participants (Paas, Tuovinen, Tabbers, & Van Gerven, 2003).
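A relative pupil size change of this kind could be computed along the following lines. This is a hedged sketch rather than the computation performed in the study: the function name, the 200 ms pre-edit baseline, and the sample format of (timestamp, diameter) pairs are all assumptions; only the 200 to 500 ms post-edit window comes from the text.

```python
from statistics import mean


def relative_pupil_change(samples, edit_ms,
                          baseline_ms=200, start_ms=200, end_ms=500):
    """Percent change of mean pupil diameter in the post-edit window
    relative to a pre-edit baseline.

    samples: iterable of (timestamp_ms, pupil_diameter) pairs.
    baseline_ms: length of the pre-edit baseline (an assumption, not from the text).
    Returns None when either window contains no samples.
    """
    samples = list(samples)
    baseline = [d for t, d in samples if edit_ms - baseline_ms <= t < edit_ms]
    window = [d for t, d in samples if edit_ms + start_ms <= t <= edit_ms + end_ms]
    if not baseline or not window:
        return None
    base = mean(baseline)
    return 100.0 * (mean(window) - base) / base


# Hypothetical samples around an edit at 1000 ms (diameters in mm).
demo = [(850, 4.0), (950, 4.0), (1100, 5.0), (1250, 4.2), (1450, 4.2), (1600, 5.0)]
print(round(relative_pupil_change(demo, 1000), 2))  # → 5.0
```

In the hypothetical data, the samples at 1100 ms and 1600 ms lie outside both windows and do not affect the result, which corresponds to the roughly 5% increase reported for easier tasks.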
Hypotheses
Our aim is to distinguish whether perceptual precision in film editing is important or not. The primary hypothesis is that altering a few frames (four to six) of continuity edits as they occur in a film sequence will impact the film viewer’s perception of these edits: where the viewer looks, how many compensatory eye movements (saccades) are made after an edit, as well as the pupil size of the viewer after the edits. When the smoothest possible continuity edits created by a film editor are counteracted by removing or adding a few frames, visual transients are supposed to occur at the edits (Smith, 2005), sufficient to provoke perception or attention and thereby draw slightly toward discontinuity (Smith, 2012; Swenberg, 2017). This slight discontinuity effect will manifest itself by the film viewer attending less frequently to the AoIs intended by the film editor, by a higher number of compensatory saccades (Smith, 2005), as well as increased pupil dilation (Seeber, 2013), after edit points.
Method
Eye tracking was used to gather eye movement data from the film viewer. A methodology regarding the full study, including video observations with running screen recordings of a film editor while editing from the first stage of the study, is presented in Swenberg (2016).
Design
The study is designed to falsify the intentions of a film editor regarding perceptual aspects of achieving continuity edits in a documentary film sequence (cf. Swenberg, 2016). Documentary and fictional film are regularly produced in different ways. A striking dissimilarity is the degree of planning shots before recording them. Documentary films are usually made by shooting the currently most interesting event, only occasionally considering how individual shots should be joined in editing. Often, editing principles, like continuity, have to be abandoned (Hampe, 2007; Reisz & Millar, 1968). When producing fictional films, the utmost care is usually taken to plan the shots for good continuity matching in editing. It is common for documentary filmmakers to try to achieve a film flow when editing so that sequences are experienced as continuous, thus employing continuity editing principles, when possible. However, there is a greater challenge to attain continuity from documentary footage (Billinge, 2017; Kriwaczek, 2003; Rabiger, 2014; Reisz & Millar, 1968; Rosenthal, 2002). Some documentary film researchers reject the relation of documentary films to continuity editing (e.g., Hight, 2008; Rabinowitz, 1993), while others recognize it. They note that although documentary films tell nonfictional stories, documentary filmmakers occasionally use filmic devices common to fiction film, when applicable (Corner, 2008; Nichols, 2001, 2010; Ward, 2005). Such devices are time and space continuity (Bruzzi, 2004; Chanan, 2000; McLane, 2012), which are also manifested in continuity editing. The reason for using documentary film material in this study was that it would be a challenge to the film editor in achieving continuity edits; thus, the editing would be more overt to analysis for perceptual phenomena at stake.
The editor’s intentions regarding viewer perception of the edits were captured through observations, screen recordings, and an elicitation that was video recorded (Swenberg, 2016). An alternative version was created where the editor’s intentions were countered. Both versions of the film sequence, the editor’s and the altered sequence, were then screened for participant viewers in randomized order, mixed with other visual stimuli, in order to avoid immediate repetition. During this screening, participants were eye-tracked. The eye-tracking data were analyzed after edit points for hits in AoIs, amounts of saccades, as well as pupil dilations. The measurements from the two versions of the film sequence were then compared.
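The AoI hit measure compared between the two versions can be sketched as follows. This is an illustrative sketch only: the rectangular AoI shape, the class and function names, and the rule that a single gaze sample inside the AoI counts as a hit are assumptions, and the 120 to 400 ms window is the one stated in the Theoretical Background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectAoI:
    """Hypothetical rectangular Area-of-Interest in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def aoi_hit(gaze, aoi, edit_ms, start_ms=120, end_ms=400):
    """True if any gaze sample (t_ms, x, y) within the post-edit window lies in the AoI."""
    return any(
        aoi.contains(x, y)
        for t, x, y in gaze
        if edit_ms + start_ms <= t <= edit_ms + end_ms
    )


def hit_rate(hits):
    """Share of edits with an AoI hit; 0.0 for an empty list."""
    hits = list(hits)
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical gaze samples for one edit at 1000 ms, one trace per condition.
aoi = RectAoI(left=100, top=100, right=300, bottom=200)
condition_1 = [(1050, 150, 150), (1200, 400, 400), (1300, 120, 180)]
condition_2 = [(1050, 150, 150), (1500, 150, 150)]
print(aoi_hit(condition_1, aoi, 1000), aoi_hit(condition_2, aoi, 1000))  # → True False
```

Per-edit hits computed this way could then be aggregated per participant and condition with `hit_rate` before any statistical comparison.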
Participants
The film-viewing participants consisted of 50 students recruited at Dalarna University. Data from 17 of these participants had to be abandoned. For five of the participants, gaze calibration to the required precision was not possible (1° of visual angle, see Procedure section). Five participants admitted to being conscious of having their gazes recorded, which is likely to have affected their viewing behavior (Holmqvist et al., 2011). Four participants acted counter to experiment procedure by skipping the second version of the film sequence when it appeared on the screen or by behaving unusually nervously. One participant disagreed with the use of the recorded data, which were hence deleted, while two participants had a low gaze-time-on-screen (GTS) ratio, considered as bad eye-tracking data quality (cf. Hvelplund, 2014), and their data were therefore removed.
Data from 33 participants were used for analysis. Within this group of participants, 30 had Swedish as their mother tongue, 17 were female, and 32 claimed to be inexperienced in film creation (although 2 of these participants attended moving image production courses and 2 had previously done so). Three participants were graduate students, while the remaining were undergraduates. The average age was 26 years. All participants had normal or corrected-to-normal eyesight and were unaware of the purpose of the experiment. Participation was voluntary and rewarded with a cinema ticket (€10).
The research project was checked by the authors for ethical aspects, according to the local Bill-of-Self-Audit (Dalarna University Research Ethics Committee, 2008), and passed all stipulated criteria. Neither procedure nor stimuli was unethical in regard to Dalarna University research standards.
Apparatus and Stimuli
The eye data were collected by means of a stationary eye tracker, SMI RED250 (Figure 1), operating at 120 Hz sampling rate, run by the SMI iViewX 2.8.26 software. Participants were seated in front of a computer screen, with an eye–screen distance of 70 ± 10 cm, where the film sequences were monitored as experimental stimuli. The sound was calibrated as balanced around an 82 dB threshold for pink noise in the listening position. The stimuli were run with SMI Experiment Center 3.0.128 software, from a control station next to the participant position, separated from it by a screen wall, to avoid having the participant disturbed by the experimenter.
Figure 1. (A) The researcher’s position (chair + table with computer, screen, mouse, keyboard, and sound controls). (B) Participant’s position (chair + table + stimuli screen, mouse [M], keyboard [K], and speakers [S]). Speakers were placed on a separate table to avoid spread of vibrations to the eye-tracking camera. The researcher’s and the participant’s respective positions are separated by a screen wall to shield the participant from visual disturbance from the researcher’s activities (see Swenberg, 2016).
A 3-minute, 10-second (3’10”) long sequence was created by the film editor from a stock of existing documentary film footage. Documentary film material was chosen in order to stress the challenges of achieving continuity at the edit points, since such film material is generally not shot with regard to continuity editing (Rabiger, 2014; Reisz & Millar, 1968), thus making the effort of the editor more overt for scrutinizing during the first stage of the study. The film story was about a man restoring an old family building, which he presented along with the restoration process. The editor worked on the sequence until she was content with its appearance and she then accepted public exposure of the film sequence along with her credentials. This version of the sequence (https://vimeo.com/210944884/be8001e786) is referred to as Condition 1.
Knowing the editor’s intentions with each edit, a new version of the documentary film sequence was created (https://vimeo.com/210953506/9ed3cb3b6e). This is referred to as Condition 2, where every edit was altered by one to three frames each for the incoming and outgoing shots, in order to counteract the sought-for continuity (see Table 1). This altering is here considered as a slight draw toward discontinuity by an increase of visual transients at the edit points.
The video material was generated at 25 frames per second, in the ProRes 422 codec, as .avi files with 1280 × 720 resolution. It was screened on a Dell P2211 computer screen, driven by an NVIDIA GeForce GT440 video card, in 1680 × 1050 px resolution with SMI default codec xmp4. The screen emitted 90 cd/m² light, for a 255-255-255 white screen at 65% brightness and 75% contrast. Yamaha HS50M speakers were used as sound monitors at 82 dB for pink noise at the listening or viewing position (see Figure 2, X2), a sound level which was assigned balance Level 0, around which the sound fluctuated ± 15 dB. For analysis of the eye data, SMI BeGaze 3.5.101 was used.
Distances between participants and screen, speakers, and screen and speakers. Positions X1 and X2 are measure points for sound and light emission. X2 is also the participants’ viewing position (see Swenberg, 2016).
Procedure
The eye tracking of the film-viewing participants was conducted in the laboratory. To counteract possible stress and to level out blood–sugar levels of participants, they were given approximately 15 minutes of rest, with a cup of tea or coffee, or some mineral water, and a biscuit, upon their arrival at the lab. Participants leaving the lab were prevented from meeting the entering participants and were requested not to talk about the experiment until the recordings were over. Preexperiment knowledge about the purpose of an experiment has an effect on participant behavior during the experiment (Holmqvist et al., 2011). Tasks during eye tracking affect the viewing (Yarbus, 1967), and knowledge about experiment purposes may be interpreted into tasks by participants (Holmqvist et al., 2011, pp. 77–79). Thus, participants were not informed about our hypothesis before the experiment but were assured and given information about the aim and purpose of the experiment afterwards (cf. Holmqvist et al., 2011). The pre- and postexperiment information was given to the participants in writing. Participants were afterwards given the opportunity to give their written consent or to disallow the usage of their eye data for further analysis. If the participant disallowed usage, the eye data were immediately deleted from the eye-tracking computer, while the participant looked on.
Each participant, one at a time, was seated before a computer screen, eyes at approximately 70-cm distance from the screen (Figure 2). During the experiment, all instructions and information were given as text on the screen. Participants were informed that they were allowed to interrupt their participation at any time during the experiment. The experimenter was screened from the participant by a portable wall, so as not to disturb the participant. The experiment started with a nine-point calibration of eye data capturing, followed by a validation of it, in which the deviation was not allowed to exceed 1° of visual angle (cf. Holmqvist et al., 2011). For each stimulus, leading questions were presented to be answered after the viewing. These questions were intended as tasks for participants to promote a natural viewing of the film sequences, for example, “What genre would you suggest that this film belongs to?” Optional film genres were given as answer alternatives after viewing, but these data were omitted in the analysis. Other stimuli were run, interwoven between the two conditions of the experiment, in order to lessen repetitiveness for participants. In total, each participant spent between 45 and 60 minutes going through the experiment.
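To give a sense of scale for the 1° validation limit, a visual angle can be converted to an on-screen extent at the approximate 70-cm viewing distance. This is a sketch using the standard geometric relation size = 2 · d · tan(θ/2); the function name is an assumption made for illustration, and the conversion is not described in the study itself.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """On-screen extent (cm) subtended by a visual angle at a viewing distance.

    Assumes the extent is centered on, and perpendicular to, the line of sight.
    """
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# The 1-degree validation limit at the approximate 70 cm viewing distance.
print(round(visual_angle_to_cm(1.0, 70.0), 2), "cm")
```

At 70 cm, 1° of visual angle corresponds to roughly 1.2 cm on the screen.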
Results
There are three categories of results including eye-tracking data, all considering gaze behavior after edits. These consist of hits in AoIs, saccade frequency, and pupil dilation. In accordance with the study design, where two similar conditions differing in only one variable are tested on all members of a group, the statistical method used in all instances is the two-tailed, paired-difference t-test, consistently employing a significance level α of 5%. Only the figures for the full sequences are considered (and not for particular edits), since types of continuity differ between edits.
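The paired design can be illustrated with a short sketch of a two-tailed, paired-difference t-test. It uses only the Python standard library, and the p-value is obtained by numerically integrating the t density (Simpson's rule), which is an implementation choice made here for self-containment rather than the procedure used in the study. All function names and the per-participant counts below are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(cond1, cond2):
    """Return (t, df) for paired samples; differences are cond2 - cond1."""
    if len(cond1) != len(cond2):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(cond1, cond2)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


# Hypothetical per-participant counts, for illustration only.
cond1 = [9, 11, 10, 8, 12, 10]
cond2 = [12, 13, 12, 11, 13, 14]
t, df = paired_t(cond1, cond2)
print(f"t({df}) = {t:.3f}, p = {two_tailed_p(t, df):.4f}")
```

With 33 participants the test has 32 degrees of freedom, and a result is treated as significant when the two-tailed p-value falls below α = .05.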
Viewer Hits in Areas-of-Interest (Count and Share), per edit, for the two versions of the film sequence.
Note. Viewing data regarding Condition 1 (the film editor’s version) of the film sequence is indicated by a light gray background, while Condition 2 (the researcher’s version) is indicated by a dark gray background. Discontinuity edits are indicated by white text on dark background.
aFor the edits numbers 8, 19, and 20, the film editor did not appoint any particular Area-of-Interest after the cut.

Viewer hits in the AoI appointed by the film editor after Edit #14, as graphic overlay on Frame 10 after the edit point; Condition 1 to the left (a) and Condition 2 to the right (b). AoI = Area-of-Interest. The still frames come from video footage by Stefan Ek for the documentary film A Life Worth Living, produced and directed by Ingrid Jonsson Wallin.
Second, viewer saccade frequency within the same time span (120 to 400 ms) after edits was compared for the two conditions. Only saccades larger than 2.0° of visual angle were included, in order to exclude within-fixation eye movements. Condition 1 scored 326 saccades for the 33 participants over the full sequence, while Condition 2 provoked 422 saccades. This was a significant difference of 29.4%, t(32) = 4.013, p = .0003, d = 0.428. This result supported the hypothesis that viewers have to make more compensatory saccades after altered edits (up to six frames; cf. Figure 4).
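The counting rule and the reported percentage can be sketched as follows. The data layout (onset time and amplitude per saccade), the function names, and the inclusive window bounds are assumptions made for illustration; the study's actual analysis was carried out in SMI BeGaze.

```python
def count_post_edit_saccades(saccades, edit_times_ms,
                             min_amp_deg=2.0, window_ms=(120, 400)):
    """Count saccades larger than min_amp_deg that start within window_ms after an edit.

    saccades: iterable of (onset_ms, amplitude_deg) tuples (hypothetical layout).
    edit_times_ms: edit-point timestamps in milliseconds.
    Window bounds are treated as inclusive, which is an assumption.
    """
    lo, hi = window_ms
    count = 0
    for onset, amplitude in saccades:
        if amplitude <= min_amp_deg:
            continue  # treated as a within-fixation movement
        if any(lo <= onset - edit <= hi for edit in edit_times_ms):
            count += 1
    return count


def relative_increase_pct(baseline, altered):
    """Percentage increase of altered over baseline."""
    return (altered - baseline) / baseline * 100


# Reported totals: 326 saccades (Condition 1) and 422 saccades (Condition 2).
print(round(relative_increase_pct(326, 422), 1))  # 29.4
```

The last line reproduces the 29.4% difference from the two reported totals.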
Fixations (rings) and saccades (lines between rings) as graphic overlays on Frame 10 after the edit; Condition 1 to the left (a) and Condition 2 to the right (b). The still frames come from video footage by Stefan Ek for the documentary film A Life Worth Living, produced and directed by Ingrid Jonsson Wallin.
Distribution of Viewer’s Pupil Dilation after edits for the two conditions.
Note. Pupil dilation data regarding Condition 1 (the film editor’s version) of the film sequence is indicated by a light gray background, while Condition 2 (the researcher’s version) is indicated by a dark gray background. Discontinuity edits are indicated by white text on dark background. Viewing cognitively nondemanding screen content is used for reference pupil sizes. Figures represent average increases of pupil sizes while viewing, regarding 33 participants.
Discussion
In this study, we compared participant viewers’ gaze data when viewing the very same film sequence twice, as two versions, where one has had its edits altered by a few frames, as opposed to watching similar but different film sequences that adhered to either continuity or discontinuity editing. What actually happens when a few frames at an edit point are altered is that the visual transients in the shots are allowed to manifest themselves before the viewers’ eyes, causing a perceptual or attentional response. Regarding the film sequence used in this study, the film editor minimized visual transients at edit points in her version of the sequence (Condition 1), whereas the researcher manipulated them in Condition 2. From the results of eye tracking 33 viewers viewing both conditions, the hypotheses regarding slight toward-discontinuity effects can be assessed. A manipulation of four to six frames of the film editor’s accomplished cuts did not affect viewers’ capability to spot the intended AoI in due time (120 to 400 ms after the edit point); the viewers’ eyes made compensatory eye movements (saccades) after the cuts, as predicted by Smith (2005); and pupil dilations increased significantly in response to less smooth cuts.
The answer to the research question is therefore that altering a few frames (four to six) of continuity edits (in the form of cuts), as they occur in a film sequence, impacts film viewers’ perception of these edits. This impact does not extend as far as making the eye attend to unintended visual features of the image but affects eye movements that matter to the cognitive processing of the film.
However, many of the AoIs appointed by the film editor were rather wide, so a large share of hits in those AoIs was expected (but was less than 60% hits for the continuity edits). Thus, measuring hits in such generous AoIs did not reveal any major viewing differences when a few frames of the current edits were altered. Narrower AoIs might have done so, but that would instead deviate from the ideas of the film editor and his or her design of the edits.
The increase of the saccade frequency after the edit points drawing toward discontinuity also confirms previous empirical studies, since it is in line with what d’Ydewalle and Vanderbeeken (1990) and d’Ydewalle et al. (1998) found after edits that disobeyed continuity-editing rules. D’Ydewalle and Vanderbeeken (1990) could also state that there was a cognitive load effect coexisting with the increase in saccade frequency. Therefore, we suggest that the increased saccade frequency after edits indicates some kind of cognitive constraint on the part of the film viewer.
The present increase in pupil dilation is interpreted as an indication of higher cognitive load (cf. Beatty, 1982; Chen et al., 2016; Kramer, 1991; Seeber, 2013) when viewing the film sequence slightly altered toward discontinuity. Since there is a limit to how much the pupil can expand, a 28% relative increase is noteworthy. Such a pupil enlargement is equal to when cognitive tasks increase from intermediate to high difficulty (cf. Beatty, 1982; Chen et al., 2011; Chen & Epps, 2014; Peysakhovich et al., 2015).
Considered together, the rise of the saccade frequency and the larger pupil dilation in this study are signs of more constrained cognition while viewing Condition 2, compared with Condition 1. The findings suggest that film continuity is indeed jeopardized if a film editor cannot achieve good frame-by-frame perceptual precision. Instead, there is a discontinuity effect on edits that are intended to be continuous, an effect that can be expected to make film viewing cognitively more challenging (cf. d’Ydewalle & Vanderbeeken, 1990; Frith & Robson, 1975; Lang, Geiger, Strickwerda, & Sumner, 1993; Magliano & Zacks, 2011).
With reference to gaze behavior and cognitive effects that occur as a result of visual transients that draw toward discontinuity, film editing rules and conventions that strive for film continuity are again supported as providing a basis for perceptual continuity (cf. d’Ydewalle & Vanderbeeken, 1990; d’Ydewalle et al., 1998; Shimamura et al., 2015; Smith & Henderson, 2008). Furthermore, there is evidence that the film continuity of these rules and conventions is meant to support and rely on very finely grained frame-to-frame matching (cf. Shimamura et al., 2014). Therefore, the current results confirm that film editors possess sensitivity to attentional cues, as suggested by Smith (2005). By employing this sensitivity to attentional cues, film editors achieve perceptual precision at edit points, and thereby avoid discontinuity effects, since altering a few frames of accomplished film edits impacts the viewers’ perception of those frames.
We suggest, for further research, that the discontinuity effect caused by altering a few frames at an edit point might interfere with event perception as identified by Magliano and Zacks (2011). It might also impinge on the attentional synchrony viewers exhibit at continuity edits (Smith & Mital, 2013). Moreover, since film consists of both sound and visuals, film editing calls for more research attention that takes multimodal capacities into consideration (cf. Freeland, 2012; Shimamura, Cohn-Sheehy, Pogue, & Shimamura, 2015). Additionally, the effects found in this study should be studied using other kinds of film material, such as film adverts, computer-generated animations, or TV drama, in order to see if the effect found is, perhaps, unique to documentary film material.
In this study, continuity is confirmed to matter to film viewing (cf. Messaris, 2012; Smith, 2012), as well as being important to film editing (cf. Fairservice, 2001; Orpen, 2003; Reisz & Millar, 1968). The results thus support film theories that justify film continuity on cognitive grounds (cf. Shimamura, 2013; Smith, 2012; Smith & Henderson, 2008).
Footnotes
Acknowledgments
Our appreciation goes to Ingrid Jonsson Wallin, who participated in the study as film editor and, as director/producer of the film A Life Worth Living, provided the film material used, as well as to videographer Stefan Ek.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research project Film Editors’ Visual Intentions and Viewer Perceptions was sponsored by Dalarna University, the European Regional Development Fund, and the Municipality of Falun, Sweden.
