Abstract
Filmmakers’ efforts not only advance film as an art form, they also provide insights about basic perception. This research was designed to uncover commonalities between the aesthetic appreciation of viewers of films and the perceptual capacities of observers of environmental events. We assessed whether the temporal structure of events in the environment is reflected in the temporal structure of events in film. Participants in Study 1 segmented neutral environmental events to establish a benchmark temporal structure. Study 2 compared the temporal structure of editing in amateur and professionally made films. Results from these two studies suggest a particular fractal structure common to environmental event perception and the editing structure of professional films. This hypothesis was then tested in an experiment that reedited one film so as to produce four different versions, each with a different fractal structure. These versions were evaluated by audiences in terms of aesthetics (e.g., general likability, comprehension, technical aspects of craft). The results suggest that the fractal structure typical of environmental event perception is preferred, even when it is not the original, artistically intended version. It is argued that narrative films succeed, at least in part, because their temporal structure reflects the temporal structure of environmental event perception.
Hollywood continuity editing style, the process by which cinematic events are presented to the audience, has largely been advanced through trial and error, not through a systematic exploration of the interaction between the viewer and the viewed (Anderson, 1996). While filmmakers have no doubt directed their efforts at advancing film as an art form, they have also, unwittingly, been performing experiments on the human perceptual system (Blau, 2020). Given the unlikelihood that the perceptual process for viewing natural, environmental scenes is distinct from that for viewing filmed scenes, an understanding of what works for film editing should provide insight into what constrains visual and auditory perception under more natural, noncinematic conditions. After all, “motion pictures are not somehow exempt from ecological constraints because they are cultural artifacts rather than natural phenomena” (Anderson, 1996, p. 28).
We are arguing, in effect, that perceptual systems get “tuned up” by encounters with the environment. The information that experts such as filmmakers are sensitive to, and that they (implicitly) choose to present, is anchored in the circumstances of their having been immersed in a sea of information their whole lives (cf. Carello, Thuot, Anderson, & Turvey, 1999). This is not meant to imply that filmmakers are trying to “imitate reality”—much of experienced reality is far too mundane or tedious to be included in what is intended to be compelling cinema—but to provide a plausible sequence as a backdrop for their aesthetic effects. There should be domains, therefore, in which film can be used in order to examine perception more generally. To that end, we have chosen to investigate the relationship between film editing and the perception of events, in particular, as indexed by event segmentation.
Event Segmentation
In the 70 years since James Gibson and Gunnar Johansson first championed the primacy of events for perception (e.g., Gibson, 1950, 1957; Johansson, 1950, 1958), event perception has become a broad area of study. Some researchers record the number of events into which an observer partitions a sequence (e.g., Buck, Baron, Goodman, & Shapiro, 1980; Newtson, 1976; Zacks & Swallow, 2007). Others consider the style of change that defines an event (e.g., Zacks & Tversky, 2001), or attempt to identify what perceivers are highlighting when they mark events (e.g., Zacks, Speer, Swallow, Braver, & Reynolds, 2007; Zacks, Swallow, Vettel, & McAvoy, 2006). The present research does not attempt to encompass the entirety of event perception. Instead, as we have noted elsewhere, understanding the perceived timing of events—especially the structure of that timing—over and above the number or qualia of events can open a new avenue for research (Blau, Petrusz, & Carello, 2013). In that vein, the present research emphasizes the temporal structure of events “in the wild” and events in the cinema.
One tradition in event perception research (for good exemplars, see Baldassano, Chen, Zadbood, Pillow, Hasson, & Norman, 2017; and Zacks et al., 2007) has been to treat events as if they were discrete, that is, as if time were broken down by the perceiver into indivisible chunks of a particular grain size. In such a characterization, events are considered sequential. However, it has been argued that events, and our perception of them, happen at many timescales simultaneously. This suggests a kind of nesting (Blau et al., 2013; Gibson, 1979/2015; Stewart & Blau, 2019; Wagman & Miller, 2003; Warren & Shaw, 1985) as well as the imperative to understand all events, not just those at a particular grain size. As Warren and Shaw (1985) put it, “we must strive to understand the spatio-temporal interval of an event at many different scales of analysis: slower and faster, larger and smaller, so long as we stay within the bounds of ecological relevance” (p. 10, emphasis added). But what are those bounds? Doubtless, there is a scale at which a segment might go unnoticed, but even temporal changes below conscious detection can influence behavior (Repp, 2000). And at larger scales, events contextualize the information. For example, the meaning of “attending a political rally” differs depending on the broader event in which it is nested, say, “my political activism” versus “chance encounter while visiting a new city.” The broader event changes the meaning of the nested event. It is possible that the bounds of ecological relevance are much more extensive than typically suspected.
A guiding intuition for an event from Gibson (1975) is that it involves change. As elaborated by Shaw and Pittenger (1978), “An event can be defined as a minimal change of some specified type … wrought over an object or object complex within some determinate region of the space-time continuum” (p. 189). This definition of events applies at all of the time scales of ecological interest, suggesting a similarity across events happening at the longest and shortest time scales. If that similarity is in the structuring of events across scales, then the characterization of events and their perception can take advantage of the analytic domain of fractals.
Fractal Structure in Cognition
The term fractal was coined by Mandelbrot (1975) to refer to patterns generated by recursion. Many classic examples of fractals (whether mathematical, such as the Koch curve and Sierpinski gasket, or natural such as coastlines and clouds) are spatial. They exhibit self-similarity at all scales and variability that increases disproportionately with resolution. These hallmark characteristics are also exhibited by processes over time (e.g., heartbeats, earthquakes, sunspots). One characteristic that is of particular relevance to the study of nested events is that measuring the dynamics at one scale in a fractal system allows inferences about the dynamics at all scales. Fractal analysis allows for a scale-independent investigation (Blau & Paxton, 2019).
Fractal structure (also known as power-law behavior, 1/f scaling, and pink noise) has been found in an ever-increasing number of natural and sensorimotor phenomena (see Newman, 2005, and Kello & Van Orden, 2009, respectively, for reviews). This ubiquity has also made its way into the cognitive science laboratory, most prominently with respect to response time (e.g., Gilden, 2001; Kello, Beltz, Holden, & Van Orden, 2007; Van Orden, Holden, & Turvey, 2003, 2005). Of particular relevance for the present research, it has been found that under a variety of experimental circumstances, observers seem to reproduce the structure of the events with which they are presented. Whether segmenting films or basketball games (Blau et al., 2013), denoting emotional encounters (Isenhower, Frank, Kay, & Carello, 2012), or simply attempting to synchronize with a chaotic metronome (Stephen, Stepp, Dixon, & Turvey, 2008) participants produce (usually by tapping a button on a recording device) a time series of electronic signals that echoes the structure of the flow of events.
Especially telling is that the temporal structure of the environment seems to be reflected in the temporal structure of cognitive processes (e.g., Anderson & Schooler, 1991; Blau et al., 2013; Rhodes & Turvey, 2007). If the event structure of the environment is fractal, then it should not be surprising that the perception of that event structure—the attunement to the nesting of subordinate and superordinate events—is fractal as well. We are, in essence, endorsing the perspective offered by Van Orden et al. (2003, 2005) in terms of what they take to be the question for cognitive science: “What kind of system do we study?” The broad answer—“a system with 1/f scaling”—is accompanied by a particular prediction— “the same processes govern cognitive performance in very short and very long time frames” (p. 122). However, what satisfies a formal characterization of fractal is, essentially, a value within a fairly broad range of a particular index (to be described below). Within the designation fractal we would like to understand whether a particular value reveals anything else about the event. Might values that are particularly high or low reveal differences in how corresponding events are perceived, experienced, or appreciated? We have found, for example, that anxiogenic films and events are accompanied by fractal indices at the higher end relative to films and events that are not as anxiety-producing (Blau et al., 2013).
The present research exploits some possibilities offered by film and the way cinematic events reflect environmental events. Insofar as editing is a reflection of perception, building from an understanding of that relationship will, ideally, reveal something fundamental about the nature of event perception more generally.
Editing as a Reflection of Perception
We have argued elsewhere that film provides a potentially rich medium for understanding the structure of perception (Blau, 2020). The events depicted in narrative movies, in particular, are generally tractable, in the sense of being easy to identify. But they are also, to a certain extent, malleable; that is, they can be modified through editing. This latter characteristic makes them amenable to systematic investigation. Much of film and editing theory centers on what people can understand, believe, and view without being jarred, confused, or discontented. The Hollywood style of filmmaking and editing, in particular, is said to be an invisible style (Bordwell, 1985; Messaris, 1994); when done well, the audience will not notice the craft and will focus instead on the story. What has not been studied extensively is what, if any, perceptual basis there is for how editing might aid that experience.
Directors and editors make choices in how to present the story over and above mere considerations of continuity and coverage (Pearlman, 2017) and those choices have consequences for the feel of the movie. It has been argued that commercially successful Hollywood movies effectively encourage their audiences to emotionally engage with the depicted environment as if it were real (Anderson, 1996; Blau, 2020). Nonetheless, it cannot be said that films are faithful recordings of events. By necessity and by design, the editor will depart from a real-time structure. Most obviously, a story typically takes longer than the few hours of the film. The requisite condensing is facilitated by the editor’s use of transformations that cannot happen in real events (e.g., sudden translocations in the viewer’s point of observation, instantaneous jumps forward in time). For narrative films, at least, there must be a coherence to the flow of events that allows a viewer to follow along. It could be argued that they do so by virtue of presenting events in a way similar to reality. We argue that one dimension of similarity that affects viewer enjoyment is ensuring that the temporal structure of the editing—its fractality—reflects the temporal structure or fractality of event perception.
Success in movies is, at best, a gray area usually involving how much money a film has made; that said, as any lover of a cult classic will tell you, that is not the only appropriate measure. However defined, it is possible that the success of editing (and, indeed, the success of the movie itself) rests on respecting the structure of the world, or rather, how we perceive that structure. To demonstrate this, we first need to discern what natural perception looks like. To that end, Study 1 is directed at capturing the dynamics of environmental events of a relatively neutral emotional character.
Study 1
As noted, our previous research has found the temporal structure of natural, environmental events to be fractal (Blau et al., 2013). The event in question was a basketball game specifically chosen because it had anxiety-producing emotional content: observers were invested in the outcome. However, that event may not make the best proxy for more emotionally neutral natural perception. In addition to being anxiogenic (that is, anxiety-producing), basketball games have a prescribed structure dictated by the rules of the game. For example, a game has a defined beginning and end with a scheduled break at half-time, along with time-outs that impose a change in the flow of what is being observed. Within the larger event, certain smaller events are guaranteed to happen (e.g., passing and dribbling the ball prior to shots at the basket) with a good deal of turn-taking. A nested structure is inherent to the character of that event. In order to characterize environmental events as nested, therefore, a less question-begging event would be more appropriate.
Everyday noncinematic scenes differ in a variety of ways. Their activity level can be relatively low and calm or relatively high and bustling. The ongoing activity may be human-generated or it may arise from things in nature. And in most circumstances, unlike the basketball game, the flow of events is not constrained by rules and there may not be a definite beginning, ending, or duration. Given that our goal is to characterize the structure of environmental events, identifying which changes matter for particular observers may not be necessary. Instead, we can simply ask people to observe a variety of settings and to segment the events however they choose. Earlier arguments about the nesting of events suggest that the temporal structure of their event segmentation ought to be fractal and independent of the kind of setting.
Method
Participants
Participants were recruited from three different universities: University of Connecticut (n = 12), Purchase College (n = 10), and State University of New York College at Oneonta (n = 30). All individuals participated in partial fulfillment of a course requirement, providing written informed consent in accordance with their respective university’s regulations for studies with human participants.
Materials
Perceived events were segmented with an event recorder, a small box with a lever that, when pushed, emits an inaudible beep that is fed via a wire to an Olympus digital recorder (model number WS-400S) for later analysis (Figure 1).

Figure 1. Event recorder used in Study 1. The black box has a lever that produces a tone (inaudible to the participant) when pushed. Tones are recorded by a digital recorder for further analysis.
Procedure
A variety of locations on the different campuses were chosen to provide a range of activity levels (ranging from low to high) and event types (e.g., ranging from predominantly human-generated to exclusively naturally occurring, and including a mix of these). Locations included an open field, a university plaza, a library entranceway, a Student Union, a catwalk overlooking a cafeteria, and a dormitory entryway with a view of the outdoors.
Participants were brought to their location, given the event recorders, and instructed on their use. They were asked to press the recorder lever whenever they saw “something change.” This very broad definition of an event boundary (cf. Gibson, 1975; Shaw & Pittenger, 1978) allowed for each observer’s personal interpretation of what events mattered. The duration of observation varied from 30 to 120 min with most occurring over approximately 50 min. Data from participants who produced fewer than 450 taps (primarily due to going off-task, for example, to talk to passers-by or look at a cell phone, rather than not encountering enough events to mark) were eliminated from all subsequent analyses. A time series of inter-tap-intervals (from tap onset to tap onset) was generated for each participant.
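The step from recorded taps to a time series can be sketched as follows. This is a minimal illustration, assuming that tap onsets have already been extracted from the audio recording as timestamps in seconds; the function names and the `MIN_TAPS` constant are hypothetical labels for the onset-to-onset differencing and the 450-tap exclusion rule described above.

```python
MIN_TAPS = 450  # exclusion threshold stated in the Procedure


def inter_tap_intervals(onsets_s):
    """Return onset-to-onset intervals (s) from ascending tap onset times."""
    return [later - earlier for earlier, later in zip(onsets_s, onsets_s[1:])]


def keep_participant(onsets_s, min_taps=MIN_TAPS):
    """Retain a participant only if they produced at least `min_taps` taps."""
    return len(onsets_s) >= min_taps


# Hypothetical onsets, for illustration only (not data from the study).
example_onsets = [0.0, 1.5, 4.0, 4.5]
example_intervals = inter_tap_intervals(example_onsets)
```

A series of N taps yields N - 1 intervals, so the shortest retained series would contain 449 intervals under this rule.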
Results
By its very nature, a fractal time series is not described appropriately by traditional summary statistics. For example, the mean and standard deviation change over time and with the level of resolution (see Liebovitch, 1998, for a review). Therefore, methods specifically designed for investigating fractal phenomena were used. In particular, the Hurst exponent, H, was calculated for each time series using detrended fluctuation analysis (DFA; Peng et al., 1994; see Blau et al., 2013 for more details). H is an estimate of the structure of a time series. If 0.5 < H < 1.0, then the time series is considered fractal. At 0.5, the time series exhibits true randomness, also called white noise. At 1.0, the time series is more rigidly structured (if not periodic or predictable). It is important to appreciate that this range is a continuum, not categorical. Low numbers in this range (e.g., 0.59) are fractal but tending towards randomness; high numbers in this range (e.g., 0.92) are fractal but tending towards rigidity.
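For readers unfamiliar with DFA, the procedure can be sketched in a few lines. The version below is an illustrative first-order DFA, assuming linear detrending in non-overlapping windows and power-of-two window sizes; the settings actually used in these studies are not given in this section, and the names `dfa_exponent` and `scales` are hypothetical. The slope it returns is the scaling exponent reported here as H.

```python
import math
import random


def _detrended_ss(window):
    """Sum of squared residuals of `window` around its least-squares line."""
    n = len(window)
    t_mean = (n - 1) / 2.0
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (t - t_mean)) ** 2
               for t, y in enumerate(window))


def dfa_exponent(series, scales=None):
    """Estimate the DFA scaling exponent of a sequence of intervals."""
    n_total = len(series)
    mean = sum(series) / n_total
    # Step 1: integrate the mean-centred series to obtain the profile.
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)
    # Assumed default: window sizes 8, 16, 32, ... up to a quarter of the series.
    if scales is None:
        scales, size = [], 8
        while size <= n_total // 4:
            scales.append(size)
            size *= 2
    if len(scales) < 2:
        raise ValueError("series too short for a log-log fit")
    # Step 2: RMS fluctuation of the detrended profile at each window size.
    log_size, log_fluct = [], []
    for size in scales:
        n_windows = n_total // size
        total_ss = sum(_detrended_ss(profile[i * size:(i + 1) * size])
                       for i in range(n_windows))
        log_size.append(math.log(size))
        log_fluct.append(0.5 * math.log(total_ss / (n_windows * size)))
    # Step 3: the exponent is the slope of log F(n) against log n.
    mx = sum(log_size) / len(log_size)
    my = sum(log_fluct) / len(log_fluct)
    return (sum((x - mx) * (y - my) for x, y in zip(log_size, log_fluct))
            / sum((x - mx) ** 2 for x in log_size))


# Synthetic check: uncorrelated noise should give an exponent near 0.5.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
h_white = dfa_exponent(white_noise)
```

Estimates from short series are noisy and depend on the range of window sizes, which is one reason to treat H as a position on a continuum and not as a sharp category boundary.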
The number of taps ranged from 477 to 5,626 (M = 1,832.6, SD = 1,307.32). DFA was used to calculate Htap for each participant’s time series. Overall Htap averaged 0.65 (ranging from 0.52 to 0.82, SD = 0.09). Although activity level was not manipulated systematically (and a given type of setting could differ in activity level as a function of time of day), number of taps can stand proxy for activity level. The correlation between number of taps and Htap was not significant, r(32) = 0.29, p > .05.
Discussion
Blau et al. (2013) had found that the temporal structure of a live event occurring in real time was fractal. It might be argued that their event setting, a basketball game, was ripe for a fractal characterization due to the inherently nested nature of that sport. In keeping with that finding, however, Study 1 revealed the segmentation of natural, neutral events without prescribed beginnings or endings, to be fractal as well. Moreover, the value of the average Htap in both studies was 0.65. Although there was variability in the responses, it is important to note that event segmentation does not assess the temporal structure of environmental events per se but of how they are seen (Bingham, 2000). The same setting might be seen as calming or frenetic given the (immediate and long-term) history of the observer. Given the documented connection between higher anxiety and higher Hurst exponents (Blau et al., 2013; Gordon, Blau, & Carello, 2011), it is conceivable that individual differences in emotional state may have been responsible for that variety. Having said that, the average remained the same (i.e., 0.65) regardless of location, activity level, or duration of observation. This suggests the kind of ubiquity and indifference to grain size that would be expected for fractal structure. Taken together, these results suggest that Htap = 0.65 is at least a reasonable benchmark for perception of relatively neutral events.
To the extent that a film constitutes the presentation of nested event structure, it is reasonable to expect perception of films to reflect that structure. It has already been shown that the structure of editing is fractal (Blau et al., 2013; Cutting, DeLong, & Nothelfer, 2010). And, indeed, the perception of films assessed through event segmentation is also fractal, with a significant correlation of Htap and Hedit (Blau et al., 2013). However, these previous studies focused exclusively on professional, Hollywood-style movies. A reasonable conjecture is that those movies that are considered successful might differ structurally from movies considered unsuccessful or, at least, less successful. Although Cutting et al. (2010) did not find a correlation between viewer ratings of the films and their fractality, it is possible that this was due to a kind of ceiling effect—presumably, once editors are skilled enough to work on professional movies they have already learned to make their editing fit natural, noncinematic event perception. Amateurs, however, may not have learned this skill.
Study 2
Amateur films are generally considered less watchable than commercial films. This disparity can be attributed to any number of factors, from the artistic (e.g., the quality of the acting and writing) to the prosaic (e.g., the quality of the equipment; the amount of time devoted to preproduction, production, and postproduction). Certainly, commercial films have an advantage in the sheer number of specialized contributors (e.g., writers, cinematographers, editors). It is important to note, therefore, that Study 2 addresses only one of the important specialized talents distinguishing commercial films, namely, the structure of the editing.
To the extent that narrative films aim to capture the information that specifies the natural flow of events, we speculate that they will be characterized by an Hedit of 0.65. We speculate, further, that an Hedit of 0.65 is more likely to characterize professional films than amateur films. Film editors who exemplify the Hollywood style may be more likely to (implicitly) create the perception-relevant Hedit than editors outside the mainstream who have not yet learned how to produce the requisite temporal structure. As Cutting and Pearlman (2019) demonstrated, professional editors might go through a gradual shaping process to achieve just such a structure. What might be termed the “target hypothesis” predicts that, for professional filmmakers, Hedit = 0.65, whereas for amateurs, Hedit ≠ 0.65. Alternatively, the professionals may simply be more adept at hitting that target consistently. What might be termed the “conformity hypothesis” predicts that, regardless of the average Hedit, the standard deviation of Hedit will be higher for amateurs than for professionals. Although the target hypothesis and the conformity hypothesis are not mutually exclusive, they provide a straightforward way to characterize potential differences in the editing structure of professional and amateur films as they relate to the temporal structure of events in the wild.
Method
Materials
Fifteen professional, feature-length movies (95–178 min long) were chosen. Criteria such as viewer recommendations, ticket sales, and ratings by voter boards suggest that these films can be considered very watchable. Many of them either won or were nominated for the Academy Award for Best Picture or Best Editing, or for similar awards. Films were selected so that no director (or film series) was represented more than once. In addition, films were selected from a wide variety of release dates (1939 to 2006) and genres (e.g., comedy, drama, romance, fantasy). Table 1 (top) provides the list of films so chosen.
Table 1. Editing Statistics for Professional (Top) and Amateur (Bottom) Movies.
Fifteen amateur movies were chosen (34–106 min long). As a rule, amateur films tend to be shorter than professionally made films (due to limited funding, constraints of film festivals, technical difficulties, etc.). In addition, given the technological requirements for making and distributing such films (inexpensive digital cameras, nonlinear editing, the Internet), they tend to be much more recently produced. Although release dates are generally not available, it is likely that all were made in the last two decades. The movies were available either for purchase or free download from the filmmaker’s website, or directly from the filmmaker. They have not been shown at any recognized film festivals, they have made little or no profit from sales, and they were unknown to a random sample of 10 movie-goers. Only two of these films are in the Internet movie database, IMDb.com (a popular site that includes viewer reviews in addition to information about cast, crew, and awards), and both of those received fewer than 15 votes (compared to a minimum of 157,000 for the professional films). While these films were not prescreened to be particularly bad or unwatchable, most of them were both (e.g., receiving a majority of “thumbs down” votes on YouTube). Table 1 (bottom) provides the list of chosen films.
Procedure
All 30 movies were imported into Final Cut Pro, where the edit points, defined as abrupt changes in perspective or scene (see online Appendix A for details), were identified. The inter-edit-intervals constituted the time series for the movies. The 30 time series were analyzed using DFA to obtain Hedit for each film.
Results
As expected, the professionally made films were significantly longer (M = 122.7 min, SD = 24.71) than the amateur films (M = 74.3 min, SD = 21.95), t(28) = 5.68, p < .0001; however, edit density was roughly the same for the professionally made films (M = 9.92 edits/min, SD = 4.90) as for the amateur films (M = 10.57 edits/min, SD = 5.12), t(28) < 1.
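The two comparisons above can be checked approximately from the reported summary statistics alone. The sketch below assumes an independent-samples t test with pooled variance (consistent with the 28 degrees of freedom reported); the function name `pooled_t` is hypothetical, and because the inputs are rounded means and standard deviations, the recomputed values can only approximate the reported ones.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t (pooled variance) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Film length in minutes: professional vs. amateur (values from the text).
t_length, df_length = pooled_t(122.7, 24.71, 15, 74.3, 21.95, 15)

# Edit density in edits per minute: professional vs. amateur.
t_density, df_density = pooled_t(9.92, 4.90, 15, 10.57, 5.12, 15)
```

With these inputs the length comparison comes out at about t = 5.67 on 28 degrees of freedom, close to the reported 5.68, and the edit-density comparison has an absolute t below 1.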
Hedit of the professional films ranged from 0.54 to 0.73, with a mean of Hedit = 0.65 (SD = 0.05), which is not significantly different from the proposed target 0.65, t(14) < 1, but is significantly different from 1.0, t(14) = 26.35, p < .0001, and from 0.5, t(14) = 10.80, p < .0001 (see Table 1; Figure 2). The Hedit of the amateur films ranged from 0.48 to 0.85, with a mean of Hedit = 0.65 (SD = 0.10) (see Table 1; Figure 2). As with the professional films, this value is not significantly different from the proposed target 0.65, t(14) < 1, but is significantly different from 1.0, t(14) = 13.04, p < .0001, and from 0.5, t(14) = 5.53, p < .0001.

Figure 2. Amateur films have a significantly higher variance of Hedit than professionally made films.
The target hypothesis, that the two groups of editors produce films with different mean Hedits, is not supported, t(28) < 1. However, the conformity hypothesis—that the editors differ in how well their films conform to the mean—shows promise as Levene’s test for homogeneity of variance revealed that the variance of the amateur group was significantly higher than that of the professional group, F(1, 28) = 6.48, p = .017. In other words, Hedits for the professional films were, as a group, more tightly clustered than Hedits for the amateur films (see Figure 2).
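The statistic behind the conformity comparison can be sketched as follows. This is an illustrative computation of Levene's W using absolute deviations from each group's mean; whether the reported analysis centred on the mean or the median is not stated here, so that choice is an assumption, and the function name `levene_w` is hypothetical. Obtaining a p value would additionally require the F distribution with the returned degrees of freedom.

```python
def levene_w(groups):
    """Levene's W (mean-centred) and its degrees of freedom (k - 1, N - k)."""
    # Replace each score by its absolute deviation from its group mean.
    deviations = []
    for group in groups:
        group_mean = sum(group) / len(group)
        deviations.append([abs(x - group_mean) for x in group])
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    # One-way ANOVA on the absolute deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, dev_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, dev_means) for z in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Toy groups with equal means but different spread (not data from the study).
w_toy, df_toy = levene_w([[1, 2, 3], [0, 2, 4]])
```

With two groups of 15 films each, the degrees of freedom are (1, 28), matching the F(1, 28) reported above.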
Discussion
One way of looking at these results is that both amateur and professional editors aim at the same target (i.e., Hedit = 0.65), but professional editors are better at hitting it. As expected, that target echoes Htap from the neutral environmental settings in Study 1, suggesting that filmmakers are at least incidentally producing the structure of natural events. This is also in keeping with the hypothesis that films succeed at least in part by presenting the viewer with a structure analogous to segmentation of events in commonplace social activities; after all, those films that are more successful (i.e., the professionally made films) are more tightly clustered around the natural event target than those that are less successful (i.e., the amateur films).
The temporal structure of professionally edited films—the way in which the flow of events is presented—may well contribute to their success. As noted, however, professional and amateur films differ in a number of other dimensions of talent and production value. A true test of the impact of editing structure would require that those elements be kept constant. An experiment was designed to focus exclusively on editing structure.
Experiment 1
Given the results in Study 2, it is possible that a prerequisite for creating an aesthetically pleasing film is to first succeed in capturing the natural flow of events. A direct test of this hypothesis would be provided by comparing the aesthetics of distinct versions of the same film that differ only with respect to the temporal structure of their editing. To that end, this experiment employed a single film reedited to produce three additional versions, each with a different temporal structure (i.e., different values of Hedit). Each variant used the same footage; as a consequence, they have the same actors, writing, and production values. Each also has the same overall length, number of edits, and edits per minute. The sequence of shots is retained but with individual shots truncated or lengthened. If the fractality of editing has nothing to do with viewer enjoyment, then all versions of the movie should be enjoyed equally. Alternatively, if alignment with the natural flow of events is important, then the versions that most closely match the fractality of event perception should be better received.
Methods
Participants
Undergraduate students (with an age range of 18–48, M = 19.2) at the State University of New York, College at Oneonta, volunteered to participate in this study for course credit. Out of 310 participants, 76 identified as male, 229 as female, and 5 declined to answer the question. Written informed consent was obtained in accordance with the State University of New York’s institutional review board’s regulations for research involving human subjects.
Materials
Production of different versions of a given film requires that the filmmaker provide the original footage—namely, all film stock from which the final product was produced—so that it can be manipulated appropriately. This is not possible with commercial films. Instead, we selected a film produced at Purchase College, an institution with a well-respected film program and annual competitions for student films. The 2012 winner of Best Sophomore Film, Sophie! (Qutab, 2012), was chosen based on its popularity among students at Purchase College as well as its suitability to the type of editing manipulation to be employed. In particular, while it was a narrative film, it was largely (though not exclusively) in a montage format allowing for lengths of shots to be altered without principally affecting the presentation of information. Its short length (approximately 13 min) also ensured that viewers would not have trouble staying actively engaged throughout their participation.
The film was first imported into Final Cut Pro where all edit points were identified following the same editing heuristics used in Study 2. The sequence of inter-edit-intervals was then analyzed using DFA. The Hedit of the film was found to be 0.84. First, we should note that a high Hedit is not unusual in films intended to be anxiogenic (Blau, 2011; Blau et al., 2013). And while the particular value differs from the H = 0.65 target inspired by naturally occurring environmental events, that value can also be higher in anxiogenic settings (Gordon et al., 2011). Finally, since this experiment is based on comparisons of different versions of the same film, what matters is whether the fractality of the different versions affects its likability.
The time series of the editing was shuffled repeatedly (over 700 times) using a random shuffling method. That is, the inter-edit-intervals were put into an Excel document in the order they appeared in the original movie, and the list was then randomly rearranged so that the length of the first interval might now be the length of the 14th, or the 63rd, and so on. Creating the new time series in this way maintained the movie length (to the frame), edit density, average shot length, and so on. The only difference was the order (and thus the structure) of the inter-edit-intervals. Each resultant time series was analyzed using DFA. From these options, three time series were chosen that most closely replicated the desired Hedits: Hedit = 0.50, 0.65, and 0.79. Hedit = 0.65 was chosen to reflect the natural, neutral event structure found in Study 1 and the average of the Hedit of the professional films from Study 2. Hedit = 0.50 was chosen as a perfectly random structure (i.e., without any of the long-term correlations found in the fractal structures) as this would allow us to test the hypothesis that any fractal structure would be seen as superior to a random structure. Hedit = 0.79 was chosen as it is still fractal, but does not reflect neutral event structure (Study 1) or anxiogenic event structure (Blau et al., 2013).
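The shuffle-and-select logic described above can be sketched in code. This is an illustration only, not the spreadsheet workflow actually used: the function names are hypothetical, the shot lengths are made-up frame counts, and `toy_structure_score` is a placeholder that is NOT a Hurst estimator. In practice a DFA-based estimate of Hedit would be passed as `estimate_h`.

```python
import random


def closest_shuffles(intervals, targets, estimate_h, n_shuffles=700, seed=0):
    """For each target value, keep the shuffled order whose score is nearest."""
    rng = random.Random(seed)
    best = {}
    for _ in range(n_shuffles):
        candidate = list(intervals)
        rng.shuffle(candidate)  # reorders shots; lengths themselves unchanged
        score = estimate_h(candidate)
        for target in targets:
            if (target not in best
                    or abs(score - target) < abs(best[target][0] - target)):
                best[target] = (score, candidate)
    return best


def toy_structure_score(series):
    """Placeholder score (NOT Hurst): 0.5 plus half the lag-1 autocorrelation."""
    mean = sum(series) / len(series)
    variance = sum((x - mean) ** 2 for x in series)
    lag1 = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return 0.5 + 0.5 * lag1 / variance


# Hypothetical shot lengths in frames (not taken from the film).
shot_rng = random.Random(1)
shot_lengths = [shot_rng.randint(12, 240) for _ in range(120)]
picks = closest_shuffles(shot_lengths, (0.50, 0.65, 0.79),
                         toy_structure_score, n_shuffles=200)
```

Because every candidate is a permutation of the same intervals, total running time, number of edits, and mean shot length are identical across versions; only the ordering, and hence the temporal structure, differs.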
Each generated time series of intervals was then utilized (by an experienced editor) as a model to reedit the raw footage of the film into a new version in such a way that the new version and the original can be played side-by-side and have the audio stay in synch throughout the film. It is worth noting that in addition to differing in their edit Hurst, the remakes are also different from the original in that the editing is no longer tied directly to the events in the film. That is to say, the reedited versions break one of the editing “rules” espoused in continuity editing, namely, cut on the moment of most action (i.e., a moment of change). As moments of change also denote events, continuity editing ties edit points to the events. This could not be replicated in the re-shuffling method as the timing of each edit point was dictated by the shuffled time series. However, while a viewer provided with more than one version of the film could discern that the versions differed, that same viewer would be unable to tell which was the original version. This is partly why a film with a more montage like structure was chosen. In nonmontage editing, a shot of a woman pouring coffee would be followed by a shot of the same woman, but from a different perspective. In montage editing, the shot of the woman might be followed by something discontinuous such as a shot of a plane landing.
A Likert survey consisting of 18 statements was designed to measure participants’ aesthetic appraisal of the film. As can be seen in online Appendix B, the survey covered issues such as general likability, comprehension, plot/character development, and technical aspects of the craft. As previous research (Blau et al., 2013) has shown that anxiety has an effect on the fractality of event perception (and vice versa, e.g., Gordon et al., 2011), we included the Spielberger State Anxiety Inventory, or STAI (Spielberger, 2010), to measure participants’ anxiety following their viewing of the film, for use as a possible covariate. Given the diversity of the participant sample, a demographic survey was also included to provide additional possible covariates for the analysis.
Design and Procedure
Participants watched one version of the film, the original (Hedit = 0.84) or one of the reedited variants (i.e., Hedit = 0.50, 0.65, or 0.79), and immediately afterwards were asked to complete the STAI, aesthetic appraisal, and demographic surveys. They were thanked for their participation and asked not to discuss the film with any of their classmates (who might be potential future participants).
Results
The aesthetic appraisal scale was found to have high reliability (Cronbach’s α = .857). For each participant, an “appreciation score” was computed by averaging across all 18 items. As expected, the original version of the film, Hedit = 0.84, was the best liked (M = 3.7, SD = 1.02), followed by the version replicating the structure found in Studies 1 and 2, Hedit = 0.65 (M = 3.6, SD = 0.84). The other two versions, Hedit = 0.50 and 0.79, were not as well liked (M = 3.4, SD = 0.73, and M = 3.31, SD = 0.85, respectively; see Figure 3). A one-way analysis of variance found those differences to be significant, F(3, 306) = 4.12, p = .007. Post hoc tests revealed that 0.65 and 0.84 each differed significantly from both 0.50 and 0.79 (p < .05), but that 0.65 and 0.84 did not differ from each other (p = 1.0), nor did 0.50 and 0.79 differ from each other (p = 1.0).

Figure 3. Average aesthetic appraisal of different Hedits of the same movie.
Including the demographic factors as covariates did not affect this pattern. Indeed, only one factor, the number of movies participants watch per month, was significant (p = .001): the more movies participants watched, the more they enjoyed the film, regardless of version. The rest (i.e., age, gender, anxiety, year in school, major, minor, and whether either their major or minor is artistic) were not significant.
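For readers who wish to reproduce this kind of analysis on their own survey data, the sketch below shows one standard way to compute Cronbach's α, a per-participant appreciation score, and a one-way ANOVA F statistic. The function names and data layout are assumptions, and the sketch treats every item as scored in the same direction. It does not reproduce the post hoc tests or the covariate analysis reported above.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """responses: one list of item ratings per participant (equal lengths)."""
    n_items = len(responses[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of participants' summed scores.
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def appreciation_scores(responses):
    """Average each participant's ratings across all items."""
    return [mean(row) for row in responses]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    group_means = [mean(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (score - m) ** 2 for group, m in zip(groups, group_means) for score in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

In a standard one-way design with four film versions, the reported degrees of freedom, F(3, 306), would correspond to 310 appreciation scores in total, since the within-groups degrees of freedom equal the number of scores minus the number of groups.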
Discussion
The results from this experiment support the hypothesis that some Hedits (namely, 0.65 and 0.84) are better liked than others (specifically, 0.50 and 0.79), even when controlling for all other film production factors. It is tempting to take from these results that editing a film in such a way that it reflects natural event perception in terms of structure (i.e., Hedit = 0.65) makes a film better; after all, professional editors produce that structure, and a film reedited to that structure is better liked. However, there are some caveats to this. First, “better” is an amorphous term. We have here defined it with respect to a self-report of aesthetics as well as in terms of financial success, but, particularly when it comes to art, that is a gray area. Second, even if this structure does make a film better appreciated, that may not be because it reflects environmental event perception; it may instead be a case of familiarity. Given that Study 2 demonstrated that a great many professionally made films are edited with Hedit = 0.65, it is possible that this structure is pleasing because it is familiar (Frederick & Loewenstein, 1999). Nevertheless, this is evidence that the fractal structure of natural event perception is preferred, even when it is not the original, artistically intended version.
It should be noted that the film used here was chosen, in part, because its montage style editing was more amenable to the kind of reediting required by the experimental manipulation. It remains to be seen whether different styles of editing (e.g., montage vs. continuity) would be responded to in the same way.
General Discussion
The participants of Study 1 viewed neutral environmental events in an effort to establish a benchmark temporal structure. Previous experiments (Blau et al., 2013; Isenhower et al., 2012) have demonstrated a correspondence between a viewer’s Htap and the Hevent that they confront. Given that an average Htap = 0.65 was obtained under varied circumstances, this value appears to be a reasonable reference point for the perception of neutral environmental events. Study 2 demonstrated that Hedit averaged 0.65 for both amateur and professionally made films, although the professionals were significantly more successful at hitting that target. These results, taken together with those from Study 1, suggest that films succeed at least in part because they capture the temporal structure of environmental events. If this is the case, then creating a film with that structure should make it more successful (with success here defined as likability). The experiment addressed this question by taking the same film and editing it in a variety of ways. The original (Hedit = 0.84) and the version created using the benchmark found in both Studies 1 and 2 (Hedit = 0.65) were significantly better liked than the two versions that were either not fractal (Hedit = 0.50) or fractal but not at the preferred benchmark (Hedit = 0.79). That the original was made at Hedit = 0.84 could indicate that the film was intended to be anxiogenic (Gordon et al., 2011), or it could indicate a secondary benchmark for likable films. More research is needed.
This is not the first time that fractal structures have been found to be more pleasing and reflective of natural phenomena when it comes to artistic expression. The poured paintings of Jackson Pollock have been described as “Fractal Expressionism” (Taylor, Micolich, & Jonas, 1999a). Using the box-counting method (which measures spatial rather than temporal fractal structure and yields an estimate of dimensionality, D, instead of an H), analysis of Pollock’s paintings yields values of D ∼ 1.7 (Taylor, 2002; Taylor, Micolich, & Jonas, 1999b; Taylor, Spehar, Donkelaar, & Hagerhall, 2011). This same number is also found in natural patterns such as clouds and waves. In addition, it is possible to create Pollock-like paintings of varying D values using a chaotic pendulum. When such patterns were displayed, 94% of participants demonstrated a preference for fractal patterns over nonfractal patterns (Taylor, 1998). In general, visual preference peaks for D values ranging from 1.3 to 1.5 for natural, mathematically generated, and handmade (i.e., Pollock’s paintings) fractals (Spehar, Clifford, Newell, & Taylor, 2003). Voss and Clarke (1975) analyzed hours of classical music as well as human speech (in English, specifically) and found them to have the same type of 1/f power spectra (yet another measure of fractality) in terms of their melody and loudness. Moreover, music generated using 1/f noise was found to be more pleasing than music generated by white (random) noise or brown (structured) noise.
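To make the box-counting method mentioned above concrete, here is a minimal sketch that estimates D for a set of points in the unit square. It is a generic illustration rather than the procedure used in the cited studies. The function name, the point-set representation, and the choice of box sizes are all assumptions.

```python
import math


def box_counting_dimension(points, sizes):
    """Estimate the box-counting dimension D of a 2-D point set.

    points -- iterable of (x, y) pairs with coordinates in [0, 1)
    sizes  -- box side lengths to test (an illustrative choice)
    """
    points = list(points)
    log_inv_size, log_count = [], []
    for size in sizes:
        # Count the grid boxes of this side length that contain any point.
        occupied = {(int(x // size), int(y // size)) for x, y in points}
        log_inv_size.append(math.log(1.0 / size))
        log_count.append(math.log(len(occupied)))

    # D is the least-squares slope of log N(size) against log(1 / size).
    n = len(sizes)
    mean_x = sum(log_inv_size) / n
    mean_y = sum(log_count) / n
    sxx = sum((x - mean_x) ** 2 for x in log_inv_size)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(log_inv_size, log_count)
    )
    return sxy / sxx
```

A densely sampled straight line should give a value near 1 and a densely filled square a value near 2; fractal patterns such as those discussed above fall in between. The estimate is only meaningful when the points are sampled densely relative to the smallest box size.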
In all of these cases, we see a correspondence between the fractality of natural phenomena (clouds, speech, event perception) and preferred artistic expressions (paintings, music, film). Just as Pollock’s drip paintings are not clouds, and music is not human speech, films are not merely recordings of events—editing, lighting, and sound balancing all alter the recording to make it more compelling—but a reasonable conjecture is that these alterations are not meant to deviate from the actual information that specifies reality, but rather to heighten it. “The art of film-editing should be guided by knowledge of how events and the progress of events are naturally perceived” (Gibson, 1979/2015, p. 288). Most, if not all, ad-hoc editing rules can be traced to perceptual underpinnings (Anderson, 1996), so it is not surprising that the pacing of event presentation (i.e., editing) should do so as well.
Clearly, our efforts in this direction are nascent and much remains to be examined both in terms of types of environmental events and types of films. The fact that desired aesthetic effects are achieved in cinematic event segmentation that is not—at least explicitly—intentionally fractal is worthy of note. At the very least, it might open a potential for training editors to attend to the information that is available in the world and manipulable in the studio. As with all of the other rules of editing, however, following them is by no means a guarantee of success.
Supplemental Material
Supplemental material, PEC903166 Supplemental Material for Perceptual Underpinnings for “Good” Editing: A Fractal Analysis by Julia J. C. Blau and Claudia Carello in Perception
Footnotes
Acknowledgements
Many thanks to Michaela K. Frymus for her work on a pilot study of Experiment 1, as well as Alan Wertz for reediting the film, and Tamur Qutab for allowing us to use his film (as well as providing the raw footage). Additional thanks to the undergraduate research team in the Blau lab (particularly Brittany Engert) for their assistance gathering and entering data for Experiment 1.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.