Abstract
Traditionally, theories of complexity used in psychology have been based on geometric, probabilistic, and algorithmic paradigms. While these have been useful in highlighting the importance of complexity for psychology, they have not, in general, addressed the relationship between complexity and processing cost. In this paper, we review some of the classic and current complexity theories in psychology and suggest that psychological complexity and processing cost can be quantified using the notion of hierarchical change. Finally, we discuss the relationship between change, complexity, and the Gestalt principles of perceptual organization.
Complexity is an all-embracing term that underpins most forms of human activity. From mathematics and science (Gell-Mann, 1995; Grassberger, 1986) to psychology (e.g., Donderi, 2006), complexity defines a boundary of what is perceivable, knowable and expressible. The aim of the current paper is to discuss some classical and current approaches to complexity in psychology. We give a partial overview of the main theoretical treatments and point out their strengths and weaknesses and argue that thus far, most psychological theories have, in general, ignored the relationship between complexity and processing cost. In order to address this issue, we propose that psychological complexity is best defined in terms of change. We illustrate our argument using a computational example and conclude by suggesting how our approach could advance the understanding of the long-debated issue of perceptual organization.
Information-theoretic approach to complexity
Intensive study of complexity within psychology began with the introduction of Shannon’s information theory (Shannon, 1948) into psychology in the 1950s. The principal exponents of this approach were Attneave (1954), Hick (1952), Miller (1953), and Pollack (1952), who adapted information-theoretic concepts of entropy/redundancy in order to address different perceptual and cognitive issues. Although initially influential in psychology, information theory has not fulfilled its initial promise (Luce, 2003). Nevertheless, it has stimulated and inspired a number of attempts to quantify the amount of information in different contexts including perception and language. Generally, the uncertainty of a particular outcome is a function of the size of a source, which determines the predictability of emitted information. Redundancy of a message emitted by a known random source represents the difference in the amount of information that the source is capable of emitting and that of the message. In order to calculate the entropy of a pattern or a message, one has to know the prior probabilities of individual symbols, and these depend on the size of the source. To illustrate, the entropy of the message “01010010,” the source of which is a Turing machine capable of outputting only two symbols, is different from the entropy of a message generated by a digital computer capable of producing all ASCII symbols.
Since the human observer rarely understands the source of any kind of novel information, little benefit can be gained from speculating about its origin, as indicated by the problem of ergodicity—the assumption or requirement that the underlying stochastic process does not change during the production of a message. In order to estimate the entropy of a sequence, one has to assume that the same laws of probability operate on the entire sequence. Otherwise, in Attneave’s words, “…the reader can readily appreciate the difficulty which would be created if, for example, the bias on the coin which we have discussed were changed in an erratic manner between throws” (Attneave, 1959, p. 13). It is clear from Attneave’s appeal that the difficulty is completely due to the limitation of the human observer. In other words, one has to believe that a single unchanging process is in operation; otherwise simple stochastic models are of little use. But how is it possible to know whether a process is ergodic if all that can be accessed is its output? The answer is the same as the answer to the question “how can we know whether a particular source is random”—never.
Another problem for the information-theoretic approach is the meaning of information. From the very beginning, two conflicting interpretations of information have co-existed within Shannon’s paradigm. To illustrate, Attneave (1959) began his exposition by defining information as something which “removes or reduces uncertainty” (p. 1). This meaning of information refers to its usefulness as well as its statistical rarity. Yet, later in the text, a concept of redundancy is introduced, which represents a probabilistic analog of regularity and the opposite of entropy. We quote: “At the opposite extreme, at 100 percent redundancy, symbols are generated in an altogether lawful and regular sequence, such that one can predict with complete certainty what the next symbol will be” (Attneave, 1959, p. 14). Under the previous definition, total redundancy should be maximally informative because it reduces uncertainty to zero. Yet, a maximally redundant sequence contains no information at all, since a maximally redundant sequence is 111111111… or 00000000…. This is corroborated by Palmer (1983), who equates predictable structure with “little information” (p. 283). This statement is not completely correct, because the two sequences do contain some information. The length of the sequence tells us something about the process as does the symbol chosen, although these factors are ignored by information theory. What is missing is structure—relations between symbols. This point was alluded to by Luce (2003) in his comments on the impact of information theory on psychology.
The same ambiguous treatment of information can be found in Donderi’s (2006) comprehensive review of visual complexity. Discussing Shannon’s basic idea, he states that an “unlikely symbol conveys more information than a likely one” (p. 75). Yet, on the same page, he says “uncertainty reduced is information gained” (p. 75). This should be interpreted in the sense that highly redundant messages reveal little about the source. As we discussed at the beginning, this relies on the (in our view unsustainable) assumption that the statistical properties of the source are known. Thus, information is unpredictable and complex while at the same time being simple and predictable. It is here that the fundamental incompatibility between information theory and psychology is revealed. Information theory does not consider the amount of information contained in individual patterns. If one thinks of exhaustive sets of finite patterns, it is obvious that simple patterns are statistically rare and highly unlikely to be generated by a random process (Ichikawa, 1985). Consequently, we arrive at an apparent paradox that a source outputting unexpected (improbable) simple patterns is significantly reducing uncertainty while at the same time providing the receiver with highly predictable redundant patterns, which convey little information about the variability of the source.
Notwithstanding the shortcomings of information-theoretic approach in terms of psychology (see Chater, 1996 for a detailed discussion), the notion of redundancy goes some way towards quantifying regularity. The relationship between structure and quantity was implicitly addressed by the original application of information theory. Specifically, if a pattern contains any form of structure (e.g., periodicity), the frequencies of occurrence of different combinations of symbols are uneven and there is a preponderance of only a few n-grams (Attneave, 1959; Falk & Konold, 1997). Similarly, a regular geometric figure will have fewer turns, different angles, or sides of different length. The failure of information theory-based approach to apply this insight was due to the assumption of statistical independence of different levels of structure. In the words of Luce (2003), “. . .the stimuli of psychological experiments are to some degree structured, and so, in a fundamental way, they are not in any sense interchangeable” (p. 185). In other words, different levels of structural organization within a pattern should not be treated as mutually independent statistical events.
The notion of interdependence of different levels of structure represents one of the cornerstones of Gestalt psychology, whose impact on complexity research in psychology cannot be overestimated (e.g., Hochberg & McAllister, 1953). The first serious attempt to quantify pattern goodness was offered by Gestalt psychologists in the 1920s and 1930s. They proposed that human observers organized sensory/perceptual and cognitive information according to a number of simple rules. Fundamentally, an individual perceptual scene is organized in such a way as to minimize the expenditure (or rather conversion) of energy. This is what the Gestaltists named the “law of Prägnanz” or “minimum principle” (e.g., Koffka, 1935). Patterns are considered “good” if they are compact, symmetrical, repetitive, or predictable. By contrast, “poor” patterns are complex, asymmetrical, and defy easy description. The relationships between elements as well as the relationship between objects and the background are crucial for perceptual and cognitive functioning. The original proponents of information theory in psychology did not, perhaps understandably, consider the implication of Gestalt notions of the minimization of energy expenditure and the relationship between the whole and its parts. This prevented Attneave and others from offering general observations on structure beyond indicating a certain degree of patterning at a particular level of structure. As we show later in the text, Gestalt principles are intimately related to change.
Wendell Garner: Reconciling probability and structure
Information theory was primarily focused on the quantitative aspect of entropy. The primary determinant of information is the size of the source or the frequency of occurrence of different outcomes. This aspect of information has been described as “metrical” (MacKay, 1950). The other, “structural” aspect, describes the relationship between units of information. Although the structural aspect of information cannot be isolated from the metric aspect, there is a sense in which pattern structure possesses properties that cannot be captured by enumerating frequencies of different outcomes. Rather, the essence of structure lies in the interrelatedness of pattern elements. Symmetry, orderliness, harmony, balance, and simplicity have a profound influence on all aspects of human life from politics to science, logic, and art and conceptually precede mathematical formalization. The connection between the two aspects of information lies in the fact that by definition, good patterns are simple, that is, contain few elements or require little space or time to be described. The fact that complex patterns are the most numerous and consequently most probable reflects the limited analytical ability of the human observer.
One of the most important contributions to the study of complexity in psychology was made by Wendell Garner, whose work related pattern goodness to the probabilistic, information-theoretic concept of redundancy. Starting with what was known at the time namely that simple regular structures are also statistically rare, Garner (1974) attempted to show that psychological complexity had its objective origin in statistical uncertainty. He began his exposition with a discussion of stimulus structure by citing a dictionary definition, according to which, structure is “. . .a complex system considered from the point of view of the whole rather than of any single part” (p. 2). Although Garner considered information and structure to be identical, he did not offer an explicit definition of information. Garner also distinguished between intrinsic and extrinsic structure, with the former term referring to the inherent structural properties of the stimulus which can be “defined independently of a user-organism” (p. 3). By contrast, extrinsic structure is only associated with the stimulus, often arbitrarily, and the stimulus “signifies something other than itself” (p. 3).
Garner began his exposition with defining “total sets” of patterns. These represent full factorizations of pattern properties along n dimensions each with x levels, producing xn patterns. Such total sets possess zero redundancy and contain maximum information. The definition of a total set is arbitrary and depends solely on the choice of relevant variables/ dimensions. According to Garner, redundancy is created by the selection of a subset from the total set. This is exemplified by the fact that in any subset of a non-redundant total set there is a certain amount of correlation between dimensions. This is what Garner calls “correlational” structure. Any subset of an exhaustive parent set possesses a degree of correlational structure, which can be more or less salient.
In an attempt to reconcile the quantitative basis of redundancy with the notion of pattern goodness, Garner started from two premises. First, he posited that “the properties of a single stimulus cannot be specified except in relation to the properties of sets within which it exists” (1974, p. 9). Here, Garner implicitly linked the information-theoretic focus on populations/sets with the observation that regular patterns are invariant under transformations. At the same time, he rejected the possibility that individual patterns could possess intrinsic structure. Second, he proposed that each stimulus created an “inferred set” in the mind of the observer. Presumably, this contains all possible transformations of the perceived pattern. The meaning of the inferred set is ambiguous because, according to Garner, such a set could include not only structural transformations but also symbolic/semantic considerations. Nevertheless, he suggested that pattern goodness should be inversely proportional to the size of such an inferred set. In other words, since good patterns come from small total sets, they lead to the creation of small inferred sets. To test this hypothesis, Garner and Clement (1963) compared participants’ judgments of goodness of simple dot patterns with the way in which the participants grouped the patterns into similar sets. The selected samples were members of an exhaustive set of 90 patterns constructed by marking five dots in a 3 x 3 grid. The only constraint was that each row and column should contain at least one dot. Set size was determined by the number of perceptually different patterns obtained by reflection and rotation, resulting in three distinct transformational sets. The participants were presented with all 90 possible patterns and asked (a) to rate them for goodness and (b) to divide them into arbitrary sets of similar patterns. There was a significant correlation between these two variables, prompting the authors to conclude that the perceived pattern uncertainty, which was associated with the size of the transformational set, represented the most important factor in determining pattern goodness.
This study has been influential in motivating thinking about complexity (see e.g., Palmer, 1983) but a closer inspection gives some cause for questioning the validity of the original conclusions. The patterns employed in the study possess considerable symbolic value. Inspection of the original stimuli indicates that many of the good patterns resemble letters and familiar symbols. As the patterns became more complex, they had fewer associations (see Corcoran, 1971, p. 50, for a similar criticism). This suggests that with small sampling spaces (e.g., 3 x 3 grids), participants resort to “non-structural” strategies to discriminate structurally similar patterns. A later study by Ichikawa (1985) demonstrated that Equivalence Set Size (the size of a set of patterns formed by reflection and 90-degree rotation) was not a good predictor of 2-D pattern complexity.
Garner’s theorizing on structure, complexity, and goodness has had considerable impact on subsequent research in areas of perception and attention. This is obvious when one considers theories of feature integration which conceptualize visual stimuli in terms of multidimensional collections of features. Even more important is Garner’s contribution to the transformational approach to the study of pattern goodness. At the same time, his argument relating complexity to probability cannot be seen as a genuine advance with regard to the original information-theoretic approach. Rather than the complexity or goodness of a pattern being a consequence of the size of its parent set (or an inferred set), set size could equally be defined in terms of the complexity of the pattern. The more complex a pattern is the more perceptually different patterns it can generate by means of mathematical transformations. We propose that it is an inherent property of simple patterns/forms that they do not change under transformation. This poses a problem for the idea of a set as a causal factor. As stated earlier with regard to information theory, quantity cannot capture structure.
Notwithstanding potential problems, Garner’s work offers a crucial insight into the relationship between structure and quantity. His statement that “good patterns have few alternatives” (1974, p. 21) elucidates the fact that simple, symmetrical and “good” patterns are statistically rare and impervious to change.
Algorithmic Information Theory and descriptive coding languages
The difficulties of information theory coincided with developments in computational complexity theory which were paralleled in the psychological domain. The development of the Algorithmic Information Theory (AIT) represents an attempt to reconcile structural complexity with the probabilistic nature of information theory. Since good overviews are available elsewhere (e.g., Li & Vitanyi, 1997), only a very brief outline is given here. On the computational side, an important advance in the study of complexity was made by the work by Kolmogorov (1965), Solomonoff (1964), and Chaitin (1969). The theory of algorithmic complexity combines information theory with the theory of computation. Algorithmic complexity is defined in terms of the length of the shortest algorithm in any programming language, which computes a particular binary string. A string of length x is incompressible if the shortest program that can produce it is at least x bits long. The fundamental postulate of AIT is that the Kolmogorov complexity (Kx) of a sequence is roughly equivalent to the shortest algorithm that generates it. For simple structures it is clear what such algorithms should look like [e.g., 010101010… => x*(01)].
The measure makes it possible, at least theoretically, to compute the complexity of an individual string. Since structurally simple sequences (e.g., 010101010101010…) require simple descriptions, they also possess low Kolmogorov complexity. Conversely, random strings possess high Kolmogorov complexity—a property, which agrees with our view that randomness equals high complexity. Maximum Kolmogorov complexity is given by log n + C, that is, the description of a string equals its length (n) plus a constant (instruction to print). The positive contribution of Kolmogorov was his focus on the idea of compressibility, namely, the property of simple structures which enables them to be described or reproduced using short, simple instructions. The notion of compressibility is related to structural descriptors such as simplicity, periodicity, and symmetry.
Information-theoretic and AIT approaches attempt to relate the outcome (message or output) to a known agent (source or algorithm). In the case of the former, the entropy of a string cannot be known without the knowledge of the appropriate probability distribution of its source. In AIT approach, patterns are viewed as outputs of algorithms. However, when faced with a novel structure, human observers (scientists) usually have little information about its source—the very definition of novelty. They are compelled to study it from scratch and attempt to relate it to the information in their possession. At the same time, given sufficient time and effort, any structure can be compressed so that its initial form becomes unrecognizable. We propose that pattern rather than its source should represent the starting point for defining psychological complexity. We address this problem by pointing out the difficulty of describing a structure in terms of its generating process. We view the human observer as the ultimate reference point and all other generators (stochastic processes or algorithms) as results of human abstractive compression.
In the case of algorithmic approach, the input/output problem is reflected in the fact that apparently simple algorithms (e.g., expansion of π) can produce vastly complex outputs and perceptually and intuitively simple structures (e.g., fractals) often require complex and apparently interminable calculations. Measures of algorithmic complexity treat algorithms as “input” ignoring the fact that algorithms themselves represent outputs of vastly complex compression processes. The only way in which a correspondence between the information in the input and output can be established is to “unravel” the input to the point where it is completely transparent, that is, to decompress or “unfold” it fully (Bennett, 1988). The problem is that no process can be completely isolated from its environment. The overall cost of compression can be expressed only in terms of the effort of generations of mathematicians and scientists, many unknown, who have devoted their lives to providing elegant and efficient abstract descriptions of different phenomena. Their effort, as well as many other factors, has to be taken into account when estimating the amount of compressed information. Thus, a fundamental problem for all algorithmic approaches can be succinctly formulated in two statements:
Algorithms appear simple because the compressed information is “discarded” (Bennett, 1987, 1988).
Algorithms are simple because apparently simple actions often have complex consequences.
The first statement refers to the problem of treating an algorithm as a starting point for complexity analysis. In the process of arriving at an algorithm (compressive abstraction) a great deal of information contained in the original phenomenon is hidden. This after all is the purpose of algorithms. Yet, this discarded information has not disappeared and can be assumed to equal the cost involved in arriving at the algorithm. The second statement refers to the universal irreversibility of most processes in accordance with the Second Law of Thermodynamics (e.g., Feynman, Leighton, & Sands, 1977). In either case, it is clear that simple (or apparently simple) algorithms can generate complex outputs. This can happen either because algorithms themselves contain large amounts of hidden information or because they tap into processes over which we have little control and cannot easily reverse. Equally, algorithms generating simple outputs can be made arbitrarily complex. Consequently, appealing to algorithms is not particularly helpful for understanding psychological complexity.
The above problems notwithstanding, many pattern-coding languages have been proposed based on algorithmic compression, which are widely used in psychology for describing structure. These models (Simon, 1972) involve different kinds of algorithmic notation aimed at providing compact description of patterns. As pointed out by Simon, different coding languages produce relatively similar results irrespective of notation. This is because a simple pattern is by definition more compressible than a complex one. Specifically, both computational (Kolmogorov, 1965) and psychological (e.g., Falk & Konold, 1997) formulations suggest that complexity represents the inverse of compressibility, either computational or psychological. In a sense, all pattern languages represent translations of the available information contained in a pattern into ultimately similar codes. These codes retain the structural information contained within the pattern but also add idiosyncratic coding information. Simon analyzed different pattern coding schemes (e.g., Restle, 1970; Simon & Kotovsky, 1963; Vitz & Todd, 1969) and provided translations from one into another. Although certain schemes are suited to particular contexts, there appears to be a conceptual “common core” that all of these measures possess, resulting in high correlation with subjective perception of complexity.
Perhaps the most important such language is Leeuwenberg’s Structural Information Theory (SIT; 1969) and its modifications (e.g., van der Helm, 2000). Leeuwenberg’s coding scheme produced impressive correlations with experimental data and has been influential in guiding psychologists’ treatment of complexity. The rationale behind this approach, which is shared by other authors, is that the complexity of (or structural information contained in) a pattern is compressible and expressible in a formal coding language. This approach is very similar to the AIT approach in that the complexity of a string equals the length of the shortest statement that can encode it. Its strengths are the development of a formal language, applicable to different types of structure, insensitivity to alphabet size, sensitivity to the hierarchical nature of structure, and high correlation with experimental data. However, in common with other attempts of this kind, Leeuwenberg’s and related schemes face conceptual and practical problems.
Leeuwenberg’s scheme represents a lexical system—a formal language in which the structure of patterns can be expressed. In this sense, his attempt is representative of a number of similar languages which translate a structure into a set of formal statements. How useful are such languages? On the one hand, they represent ingenious exercises in algorithmic compression, which often help bring to light certain aspects of structure. The importance of this line of thinking has been accentuated by the development of various computer-programming languages. On the other hand, a critic could argue that they represent formal translation of what is perceived and processed in the first place. If a pattern is complex, it is not surprising that its encoding will be complex. As noted by Simon (1972), most of the classic schemes correlate highly with one another, suggesting a same underlying principle, namely compressive or algorithmic translation—a same structural content is expressed in different ways, some of which are more efficient than others (in agreement with AIT; Donderi, 2006, p. 79). What is missing is a clear definition of psychological information/complexity that avoids this kind of circularity.
SIT and similar schemes have provided interesting and ingenious ways of encoding structure. However, they generally face the same problems that face AIT (but see van der Helm, 2004). They begin with a simple code (algorithm) for a given sequence and use the code to produce it. The possible forms of regularity are given in advance and used to specify different sequences. However, going in the opposite direction, that is, inferring the code from the pattern is difficult if not impossible for patterns of reasonably high complexity. As remarked elsewhere, Leeuwenberg “never suggested that complexity was captured by his coding scheme, only simplicity” (Cutting & Garvin, 1987, p. 370). The codes themselves are often as complex if not more so than the original sequences (with multiple levels of nesting and arbitrary chunkings). Despite the caveat that the number of different symbols in the code is what matters (Leeuwenberg & van der Helm, 1991), the “infrastructure” of the code (levels of nesting and operators) must be taken into account because it reflects the cost of compression. In our opinion, different codes represent a trade-off between encoding effort and apparent code complexity. Specifically, codes containing fewer symbols contain more levels of nested structure and vice versa. This means that even with simple strings it is possible to alternate between different codes of approximately equal complexity. If the most efficient code is the one containing the fewest structural units, this could be because one has ignored the cost of arriving at such a code.
A further point concerns the definition of complexity in SIT-based schemes. According to van der Helm & Leeuwenberg (1996), pattern complexity can be defined as “the minimum amount of pattern information necessary to describe the pattern” (p. 444). This definition is circular—it does not point at any specific quantities that could be measured independently leaving the interpretation of information vague. This can be clearly demonstrated by considering a complex pattern of reasonable length. The more effort the investigators invest in finding and encoding possible regularities, the simpler the descriptions they are likely to come up with (see Olivers, Chater, & Watson, 2004, for related criticism). A question one needs to ask at this point is: “Does the brain really search for the most efficient encoding?” If so, the brain’s quest would be a never-ending one. For any structure of nontrivial complexity, it would have to cycle among countless alternatives, without ever coming to a decision (which is what happens in the case of intractable “random” patterns). Alternatively, if such a solution were found, it might rely on non-structural (semantic) factors or a novel, more general, structural rule. The brain would start by reading the string and attempting to compress it. If a structure is simple (e.g., symmetric or periodic), it is assimilated almost instantly and if it is complex, the brain searches for patterns, shortcuts, and heuristics in order to “break down” the complexity or “squeeze out” regularity (van der Helm, 2000). If a structure is highly complex (“random” according to AIT), the brain (or computer) eventually gives up and the pattern is treated as “noise” which is inaccessible to analysis (e.g., SETI signal, “junk” DNA). This insight is highlighted by the study by Alberoni (1962) in which subjects treated patterns as random only after failing to find any regularities in them.
Information and its cost
In order to provide a consistent definition of psychological information, which overcomes the ambiguities of the classical approaches, we suggest a subject-centered distinction between two types of information. On the one hand there is available information, which corresponds to the totality of information contained in the pattern. In terms of information theory, this is provided by the set of frequency distributions of all possible combinations of symbols within a pattern, while in terms of AIT, this is given by its uncompressed form. This should be contrasted to accessible information, which is defined by the limited analytical ability of the observer. This might also be termed useful or meaningful information. The amount of accessible information depends on the properties of the stimulus (complexity) as well as a number of extrinsic factors in sense of Garner (1974; e.g., individual differences and task demands). As shown in Figure 1, the potentially infinite increase in complexity reflects increase in available information. This is true of accessible information only up to a point. As the complexity increases, the distance between the available and accessible information begins to increase too. Eventually, the observer has to abandon attempting to interpret the pattern and labels it random until a new regularity is discovered and the boundary of ignorance (boundary between available and accessible information) is shifted.

Relationship between available and accessible information and complexity. See text for details.
The figure could be interpreted as showing that in any set of patterns there is much more available than accessible information (there are many more complex “intractable” patterns). It also elucidates the point that in any exhaustive set of patterns, only a few are perceived as simple. The corollary is that human observers operate with simple (statistically rare) patterns because they cannot access the wealth of available information hidden in more complex patterns and assimilate it into extant schemas. Available information is made accessible through a process of compression, pattern extraction, and relating the available information to the information in our possession. As we invest effort into studying and describing complex patterns, we eventually assimilate them and treat them as meaningful. One consequence of the increase in disparity between available and accessible information is the widely noted inverse relationship between pattern discriminability and its complexity.
What is often ignored is that these processes involve effort and cost (Falk & Konold, 1997). As proposed by Bennett (1987) in the context of computation, the complexity of a message should be measured based on the amount of work that has gone into creating it. The problem with applying this clearly important concept to psychology is that the cost of abstraction and compression is difficult to quantify. In the case of classical information theory, the cost of locating a cell in a grid or the correct answer to “20 questions” is directly related to the magnitude of the context—the larger the grid, the longer it takes to locate the cell. The cost of transforming the available (all possible positions of a cell) into accessible information (the exact solution) provides an implicit measure of the effort invested into searching through the problem space until the solution is found.
The problem arises when one focuses on the structural aspect of information. Here, one has to measure the informational distance between the available information and a useful contraction/algorithm. As has been shown (Chaitin, 2001), for sequences of a certain level of complexity, it is not possible to know if a better (more efficient) algorithm does not exist. For reasonably complex patterns the distance between available and accessible information is so large that the latter can never equal the former. The process of transforming potentially available into useful information is highly complex and irreversible. Any simple structure can be encoded in a complex way but at the same time, any encoding can hide almost infinitely many meanings. Alternatively, one can attempt to measure the amount of structural information present in an individual string and use this as a predictor of the ease with which such a string could be compressed using a general, psychologically valid principle.
Complexity as change
In the light of the above discussion, there is a need to define complexity in a way which restores the link between the psychological and physico-mathematical domains. Such a definition should be based on a simple, primitive, and general notion that underpins human perception and cognition. Perhaps the most primitive is the notion of change. Change is of fundamental importance for psychology. Study of changes in sensation represents the basis of experimental psychology and psychophysics. Any textbook on the subject clearly demonstrates that enquiry into perception and cognition must begin with change. Similarly, there is overwhelming evidence that the primary role of the brain is to process change. Any account of visual and auditory processing stresses the importance of changes in physical parameters of the stimulus for its sensory encoding. Neurons fire in response to stimulus onset and changes in stimulus contour. Changes in stimulus structure are detected pre-attentionally and are reliably reflected in brain responses (e.g., Mismatch Negativity; Näätänen & Alho, 1995).
Change allows direct quantification of the relationship between pattern elements. As already stated, one of the reasons for the failure of information theory to capture the structural aspect of complexity/entropy was due to its focus on individual symbols (objects) and their frequencies at the expense of a description of their relationship. It is here that the importance of change becomes clear. Structural information (relationship between elements) is contained in the transition from one symbol (or element) to another and not in the symbols themselves. The idea that change might represent an important (if not the most important) source of psychological information was hinted at by Attneave (1954). He described a mental experiment, the focus of which was a simple image consisting of clearly defined areas of three different colors (an inkwell on a desk in a room; Figure 1, p. 183). The image was divided into 4000 cells (pixels). In describing how observers would guess the colors by scanning the image pixel by pixel, Attneave hypothesized that errors would occur at the points of transition between areas of different color. The rest of the image would be highly redundant because once the observers settled on a correct color, they would make correct predictions most of the time. In other words, most of the time, the image was highly predictable. The exceptions were the boundaries between areas of different color. In addition, Attneave described a simple experiment in which participants were asked to reproduce an irregular closed figure by drawing 10 dots which resembled the shape as closely as possible (Figure 2, p. 185). The participants were then asked to indicate the places on the original figure which the dots were supposed to represent. In most cases the dots were placed at points of the greatest change in direction of the contour. These were the points at which most information was concentrated (see also Leyton, 1987).

Hypothetical physical entropy curve elicited by an unchanging pattern (top panel) and a pattern containing change (bottom panel).
Despite this suggestive evidence, change has not been explicitly addressed in the study of complexity. This is perhaps because complexity research in psychology has been inspired by probability and geometry. While the former approach focuses on enumeration and quantification of different outcomes (e.g., symbols and their combinations), the latter stresses the importance of invariance (the inverse of variance/change). Invariance could be viewed as the opposite of change in the same way that complexity is the opposite of simplicity. If viewed in this way, the two perspectives are interchangeable. This is implicitly represented in the entropy/redundancy distinction, which views the former as the opposite of the latter (entropy = 1 – redundancy; Attneave, 1954). Yet, psychological research has indicated that change is more difficult to process than the absence of change and that symmetrical information-theoretic probability distributions that treat change and its absence as equiprobable do not agree with distributions of scores on a number of complexity and randomness-related tasks, which are negatively skewed (Falk & Konold, 1997, p. 306). Briefly, observers consider as most random or complex those patterns which contain a large amount of change as long as the change itself is not regular.
Transformational approaches to pattern goodness/complexity (e.g., Palmer, 1983) start with certain prescribed forms of invariance and assume that these govern the perception of structure. Many influential mathematical theories are founded on the notion of invariance (e.g., Weyl, 1952) and invariance has been given a special place in psychological theorizing on complexity and goodness. The main problem with accepting structural invariance as the starting point is that it is very difficult if not impossible to define regularity because it is multifaceted and could be defined in different ways (e.g., symmetry, regularity, periodicity, monotonicity, invariance under motion or growth, etc.). We choose change rather than invariance because there is a sense in which change conceptually precedes invariance. Cutting (1998) referred to invariance as absence of change and transformations as “ways of invoking a change” (p. 75). According to Weyl (as cited in Feynman, 1999): “a thing is symmetrical if one can subject it to a certain operation and it appears exactly the same after the operation” (p. 1). Thus, invariance can easily be defined as resistance to change whereas the converse is not straightforward. The precedence of change extends to the way in which the brain processes sensory stimuli. While change is clearly important in the early processing stages (e.g., luminance and orientation differences) research has shown that invariance (symmetry) is exclusively encoded in the higher (extra-retinotopic) regions of the visual cortex (Tyler et al., 2005).
Furthermore, change allows direct measurement of complexity/entropy in the sense that any action by an agent, human or nonhuman, involves change, and any conceivable action is accompanied by an irreversible conversion of energy. In other words, change equals increase in entropy, and this in turn equals cost. We propose that any action, however trivial, by an agent of limited life span must incur cost and that this cost is reflected in an increase in (physical or computational) entropy. In other words, and in the context of what follows, we are assuming that all computing (human or nonhuman) is thermodynamically irreversible in the sense of Bennett (1987). Registering change always costs more than registering no change. However trivial an operation might be, it requires some form of irreversible energy conversion. This means that in its interaction with the environment, the agent will have converted a certain amount of available energy, irrespective of its scale or apparent inactivity. This formulation brings together physical, computational, and psychological meanings of complexity and entropy.
In order to elucidate the physical/computational rationale behind our argument we describe a simple physical computing device, which reads a binary string one symbol at a time. Its physical entropy is assumed to increase monotonically (a necessary simplification, which does not affect the argument). Its algorithm is also assumed to be fully “unraveled” in the sense that no compression has been performed. It consists of a set of explicit instructions that could be carried out by the human observer. The algorithm instructs the machine to read the string and register change every time a zero changes into 1 or vice versa. Every time it does register a change, the machine performs extra work relative to a “blind” or inactive machine. This work increases the wear and tear of the physical parts of the machine (e.g., head and counter) and accelerates its dissipation. Even if the physical aspect of computing is ignored, the algorithm which instructs the machine to register and count instances of change is more complex than an algorithm that instructs the machine only to scan the string. Writing a “complex” algorithm costs more than writing a “simple” one. The actions associated with the more complex algorithm result in higher physical entropy.
Figure 2 shows the hypothetical physical entropy curve of such a machine scanning a binary sequence as a function of time. In the top panel, the entropy rate is assumed to be constant because reading each zero costs the same amount of converted energy. We wish to stress that we do not know the shape of the empirical entropy function and are assuming it to be linear for the sake of simplicity. The bottom panel illustrates the increase in entropy rate caused by the presence of change in the string. Whenever the head encounters change, its entropy rate increases due to the additional work the computer has to perform in order to register it. This extra energy conversion accelerates the physical entropy rate of the machine. This example could easily be extrapolated to sensory coding. Above-threshold changes in stimulus intensity cause sensory neurons to fire above their background rate. This in turn increases the overall physical entropy of the neural network. The above discussion allows us to redefine some of the classical terms used in psychological complexity research which have thus far lacked formal definition. Runs (e.g., Gambino & Myers, 1967) explicitly refer to the portions of a string which do not contain change and alternations to the transitions between runs (instances of change). Number of alternations equals the number of changes and the number of runs in a pattern equals the number of changes plus one.
This example only applies to computers which possess a very limited memory capacity. Clearly, for such devices, the most complex string is 01010101… since it contains more instances of change than any other string. The situation is different with regard to human observers who have the capacity to retain and analyze higher-level structural information. There is a great deal of evidence that in both simultaneous (e.g., Alexander & Carey, 1968) and sequential (Galanter & Smith, 1958; Garner & Gottwald, 1967; Royer & Garner, 1966) modes of presentation of binary patterns, human participants consider higher levels of structure in the form of sub-patterns, which do not need to consist of runs of identical symbols (Feldman & Hanna, 1966). This is corroborated by substantial neurological evidence that the brain processes sensory and perceptual information hierarchically. Primate perceptual systems possess an interdependent hierarchical structure, with the primary centers, responsible for the processing of low-level information such as pitch or luminance difference. These pass on their outputs to higher centers which exhibit progressively greater levels of generality (high-level structural information; see, e.g., Frackowiak et al., 2003, p. 177). Consequently, a plausible model of psychological complexity must accommodate the fact that the brain encodes change, that is, the relational aspect of psychological information, at different hierarchical levels. A complexity model based on a simple premise outlined in this paper (i.e., hierarchical change) correlated significantly with 25 out of 28 subjective and objective complexity data sets generated over 50 years of psychological complexity research (Aksentijevic & Gibson, in press).
From Gestalt to complexity and back
The conceptualization of complexity in terms of change allows us to link the psychological, computational, and physical meanings of complexity and entropy. At the same time, it has implications for the Gestalt principle of Prägnanz or Minimum Principle according to which, the perceptual system seeks states which minimize expenditure (or conversion) of energy. The principle of proximity refers to the metric information (MacKay, 1950)—the distance between elements determines the goodness of grouping. The principles of similarity and good continuation implicitly address change. One of the most important Gestalt laws is the principle of good continuation, according to which the simplest perceptual path through a structure is the straight line. Figure 3 demonstrates this principle: an X is commonly perceived as a crossing of two straight lines (the framed example in panel A), rather than two arrowheads or arcs facing each other (the adjacent example). We can hypothesize that the former is less complex in the sense that it minimizes energy conversion—the perception follows the shortest and simplest path (geodesic; Aksentijevic, Elliott, & Barber, 2001). However, the change in the stimulus forces the perceptual system to adopt a new, more complex, solution (panel B).

Good continuation and change. Panel A: The perceptual system chooses the simplest interpretation (linear geodesics). Panel B: A change in the stimulus forces perception to adopt a more complex solution.
A change-based approach to complexity has the potential to provide the “missing link” between the phenomenological aspects of Gestalt principles and the early theorizing on the underlying mechanisms of perceptual grouping (Koffka, 1935). Specifically, change represents a phenomenal/psychological analogue of energy. Perhaps, it might be more accurate to say that energy represents an abstract analogue of perceptual change. The principal task of the perceptual (and cognitive) systems is to interact with the environment and interpret it efficiently using limited resources. Perceptual and cognitive constraints caused by the limited energetic potential of the cognitive system necessitate search for pattern and regularity (absence of change). The need for the minimization of energy expenditure manifests itself as a predilection for grouping—for interpreting the perceptual (and cognitive) stimuli in the simplest possible way (Chater, 1996).
The requirement that change should be considered at all levels is motivated by the Gestalt notion that a whole does not equal the sum of its parts. From the psychological perspective, the complexity of a whole can only be assessed if interrelated contributions of many levels are taken into account. Different accounts of perceptual grouping indicate that the goodness of a pattern is determined by the amount of change. In agreement with the Law of Prägnanz, goodness represents the state of minimum change (or minimum complexity) in a particular context. The laws of proximity and similarity could be said to represent the two aspects of entropy: magnitude and structural. The former law addresses the limited magnitude resolution of the human observer. Objects perceived as being close to one another will be grouped provided they are sufficiently similar. The dynamics of grouping depend on the magnitude of the objects and their mutual distance. It is clear that there is a grouping region—a range of distances and magnitudes that allows grouping to occur. Proximity reduces the amount of effort necessary to compare objects and/or discriminate between them. The law of similarity refers to redundancy, that is, structural simplicity. Objects that are structurally similar are automatically assigned to a group provided they are sufficiently close. Again, less effort is needed to compare and categorize them and redirect attention to the rest of the scene. When a visual (or auditory) scene consists of similar objects, its dynamics are simple, however we decide to describe them. The fundamental quality determining the simplicity of a scene is the amount of change.
In conclusion, we propose that psychological complexity is best explained and quantified in terms of change. Patterns that contain little change possess little information and have low entropy. Simple or regular patterns or objects are impervious to change. They are predictable, resistant to disruption, structurally stable, and easy to process, assimilate, and memorize. Part of the aesthetic appeal of periodicity and symmetry might lie in the fact that they appear to defy entropy (increase in disorder). A complex pattern contains more information than a simple pattern and is less predictable. It is less symmetric and will change if subjected to various operations. The human observer searches for informational shortcuts provided by the notions of symmetry, similarity, and regularity in order to minimize the expenditure (conversion) of energy associated with understanding.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Aleksandar Aksentijevic is a senior lecturer in Psychology at the University of Roehampton, London. He is an experimental psychologist with interest in complexity, perceptual organization, auditory and cross-modal perception, psychology of music and the relationship between psychology, mathematics and science. Address: Department of Psychology, University of Roehampton, Holybourne Avenue, London SW15 4JD, UK. Email:
Keith Gibson sadly passed away prior to publication of this article. He was a lecturer in Computer Science at Birkbeck College, London. His interests included cryptography, especially the use of error correcting codes, and complexity.
