Abstract
Aspects of the ‘zygonic’ model of expectation in music (Ockelford, 2006) were tested experimentally. Forty subjects were played a diatonic melody, starting with the initial note only, then the first two notes, and so on. Each time, subjects were asked to sing what they considered to be the most likely continuation. The results were compared with the outputs of three algorithms derived from the zygonic model, which took into account adjacency (‘Z1’), adjacency and recency (‘Z2’), and adjacency, recency, and between-group projections (‘Z3’). Each algorithm modelled the perceptual responses with statistically distinct degrees of accuracy; Z3 was the most faithful to subjects’ expectations. Given the empirical data, potential refinements to the quantification of the zygonic model were considered. Additionally, it was found that men and women exhibited different patterns of expectation in relation to the stimuli that were presented, paralleling recent neuropsychological data suggesting that the location of music-structural processing in the brain may differ by gender.
Introduction
As David Huron so vividly describes, expectation pervades the human condition (2006, p. 3): ‘A cook expects a broth to taste in a certain way. A pedestrian expects traffic to move when the light turns green. A poker player expects an opponent to bluff.’ Indeed, the capacity to make judgements about the future based on past and present experience is apparently wired deep in our neural architecture (Kveraga, Ghuman, & Bar, 2007). Hence it is unsurprising that the importance of expectation has been reported by researchers working across the cognitive sciences: in visual perception, for instance (Engel, Fries, & Singer, 2001; Summerfield & Egner, 2009), attention (Hogarth, Dickinson, Austin, Brown, & Duka, 2008), memory (Whittlesea, Masson, & Hughes, 2005), language processing (Otten, Nieuwland, & van Berkum, 2007), and behaviour (Olson, Roese, & Zanna, 1996).
Expectation is implicit in much contemporary music theory too (Schmuckler, 1989, p. 111), often retracing the paths pioneered by Leonard Meyer (1956, 1967). Meyer’s ideas, rooted in Gestalt perception and information theory, were extended by Eugene Narmour in his ‘implication–realization’ model (1977, 1990, 1992, 1996). This subsequently found some support in empirical studies (Schellenberg, 1996, 1997; Thompson & Stainton, 1996), which indicated that simplification leads to little diminution of its predictive power. Moreover, von Hippel and Huron’s (2000) analysis of melodies from a variety of cultures showed that Narmour’s key principle of ‘registral return’ could be explained as an artefact of constraints on range. Nonetheless, the broad thrust of his theory retains its relevance in certain areas of psychomusicological endeavour, in which other approaches to expectation also continue to play a prominent role (Bharucha, 1999; Jones, 1981, 1982, 1992; Margulis, 2003, 2005, 2007).
In 2006, Adam Ockelford drew a number of these strands of thinking together into the conceptual framework offered by ‘zygonic’ theory (Ockelford, 2005, 2006, 2008, 2009). This asserts that the cognition of musical structure stems from a sense of derivation, whereby musical elements are heard (typically nonconsciously) as existing in imitation of others. The relationships – hypothesized cognitive constructs – through which such derivation is held to occur are said to be ‘zygonic’ (from the Greek word for ‘yoke’, implying the union of two similar things). ‘Zygons’ constitute a type of ‘interperspective relationship’, through which perceived aspects of musical sounds are compared. Such relationships can be represented graphically as in Figure 1.

Examples of interperspective and zygonic relationships
Ockelford formulated a new model of expectation (2006, pp. 127ff), whereby anticipation in music is said to arise through the projection of zygonic relationships into the future, using what Husserl (1964) called ‘protentions’: the anticipation of what is to come, enauralized in the present. These relationships stem from one of two sources:
‘current’ structures, which form part of the hearing process in train at the time, are encoded in working memory, and operate either
within groups of notes or between them (
‘previous’ structures (which formed part of past hearing processes, and therefore necessarily operate only between groups). These may be encoded ‘schematically’ (

Zygonic model of expectation in music (after Ockelford, 2006, p.127)
Current ‘within-group’ structures can offer only a general indication of what is to come (

The opening bar of Rachmaninov’s 2nd Symphony, 3rd Movement
What pitch would she expect to occur next in the first violins? Projecting zygonic relationships forward suggests that any note framed by the semitonal universe with nodes at concert pitch would offer a logical option for continuation (see Figure 4).

Potential coherent intraopus continuations from the first three notes of the theme and its principal counterpoint in the 3rd movement of Rachmaninov’s 2nd Symphony
Now consider

Indicative interaction of ‘within group’ and ‘schematic’ structures in expectation
The musical validity of this model can be illustrated with the following potential harmonizations; see Figure 6.

Potential coherent continuations following bar 1 of the 3rd Movement of Rachmaninov’s 2nd Symphony, based on within-group and schematic structures
But there is more to
range, whereby ‘mid-range’ pitches are encountered more frequently (and are therefore felt to be more probable) than those at the extremes;
interval size, with a tendency for smaller intervals to be used more than larger ones;
scale degrees, with context-specific differences in the frequencies of utilization; and
scale-degree transitions, which, again, show context-specific patterns of occurrence.
To take a further example: consider the frequencies with which different continuations of Rachmaninov’s opening melodic gesture

Examples of melodic continuations following the opening i-iii-v in the Western classical tradition
The position with regard to
Consider this assertion in relation to ‘current structures’. Take the opening of bar 3 of the Rachmaninov slow movement. In the first violin part, between-group projections potentially kick in: secure predictions enabled through zygonic invariants (series of relationships operating in parallel). A similar teleological drive may characterize the perception of the inner parts and the bass-line too; see Figure 8.

Examples of between-group projections in the 3rd Movement of Rachmaninov’s 2nd Symphony
What is the nature of the interaction between

The ‘chameleon’ effect operating in musical expectation
The mechanism through which the relative certainties offered by
Just how the implications inherent in musical structure and the evolving expectations associated with them affect the listening experience is an issue that preoccupied Meyer. His initial proposition was that an affective response would be aroused when an expectation activated by a musical stimulus – a ‘tendency to respond’ – was inhibited (Meyer, 1956, p. 31). This thesis proved contentious, though. How could one reconcile the uncertainty deemed necessary to stimulate affect with repeated hearings, since people often listen to pieces many times yet continue to enjoy them? Indeed, we typically react most strongly to familiar music (Panksepp, 1995, p. 172). It cannot be the case, though, for a piece one has memorized ‘that the ebb and flow of partially fulfilled expectations control one’s enjoyment of it: every note is exactly what is expected’ (Bever, 1988, p. 166).
Meyer countered arguments like this in various ways (see, for example, Meyer, 1967, pp. 42ff). His final thoughts on the subject (2001) involved the ‘willing suspension of disbelief’, whereby listeners generate an aesthetic illusion, ignoring their knowledge of a piece and hearing it as if for the first time (2001, p. 352). Some 10 years earlier, Ray Jackendoff had questioned thinking along these lines (1991, pp. 224–228), as it seemed to ‘conflate enjoying a piece with not remembering how it goes.’ However, Jackendoff proposed ‘rescuing’ Meyer’s expectation theory (p. 228) by suggesting that violations of what is expected may occur on a subconscious level, involving a closed module for music processing. This effectively always hears a piece as if for the first time, thereby ensuring that affect remains intact (cf. Bharucha, 1994, pp. 215–216; Fodor, 1983; Margulis, 2005; Schmuckler, 1989, p. 114). It may be that Meyer’s original assertion would be better couched in terms of expectation in music working through the nonconscious (rather than the willing) suspension of disbelief. Whatever the precise neurocognitive processes involved, though, the zygonic model of expectation, through its three sources of projection (
To date, despite its intuitive consonance with the aesthetic experience of listening to music, the zygonic theory of expectation has remained just that – a theory. While it would be difficult to test the whole model empirically at one time, given certain reasonable assumptions, features of it can be used to produce testable predictions. Three of these are reported here.
Designing the empirical work: Rationale, constraints, and assumptions
Our research questions are as follows. With reference to the zygonic model shown in Figure 2:
(a) Is there evidence to support the hypothesis that, ceteris paribus, expectations arising from ‘current structures, within groups’ and ‘previous structures, schematically encoded’ interact to produce a general sense of what may follow?
(b) Is there evidence to support the hypothesis that expectations arising from ‘current structures, between groups’ produce a specific sense of what is to follow?
(c) Is there evidence to support the hypothesis that (b) and (a) interact, whereby (b) adds greater specificity to (a), and (a) grounds (b) in a local context?
These questions engage
A number of constraints were required to make the empirical work manageable. First, there were restrictions pertaining to the design and utilization of the stimulus.
1. The domain in which expectations were predicted and elicited should be pitch, since this constitutes the principal structure-bearing dimension of the ‘what’ in music (working in tandem with the ‘when’ afforded by patterns of interonset intervals; Boulez, 1971). This further required tessitura, tempo, timbre, and loudness to be as ‘neutral’ as possible, to avoid potentially confounding effects.
2. The major diatonic scale (see Figure 5 earlier) should be used as a framework for the stimulus, and the material should conform to Western tonal ‘common practice’, with which a broad spectrum of subjects would be familiar. All seven available scale degrees should appear within the span of an octave to facilitate prediction and analysis of the results.
3. A single line should be used, to keep the stimulus as simple as possible, avoiding the complexity of expectations in one part potentially influencing those in another.
4. New material should be created for the stimulus, to avoid the danger that C (previous structures encoded veridically) would figure in subjects’ responses.
5. For similar reasons, subjects should be limited to one hearing.
6. To gauge the impact of between-group structures (an element of A), the melody should contain clusters of notes that are repeated or transposed, as well as having episodes in which no such connections exist.
7. Given the method of data collection that was adopted (where participants sing the note they expect to come next – see below), and given that both adult males and females were involved, the melody should be positioned within two alternative pitch spans that were mid-range for ‘typical’ men’s and women’s voices.
Working within these constraints, the following stimulus was created (Figure 10).

Stimuli used, and their characteristics
In accordance with Constraint 1, the pace is moderate, with little rhythmic variety (the two significantly longer notes serving to underscore the ends of phrases); timbre is unvarying and rich (though intended to be stylistically ‘non-specific’); and most musical information is conveyed in the domain of pitch. Each scale-degree occurs at least once, in the key of D major, over the range of a minor 7th (Constraint 2), situated in the 3rd octave for men and the 4th octave for women (Constraint 7). By the fifth note, the unaccompanied melody (Constraint 3) does not conform to any well-known tunes in the Western classical or popular repertories (Constraint 4).
The melody is in the form a1 a1 b1 a1 a1 b2 (Constraint 6), which can be summarized as A1 A2 (see Figure 11). As we are seeking to determine how the perception of between-group relationships impacts on expectation, it is crucial to gauge the detectability of the structure, albeit nonconsciously, in first-time listening. To this end, a psychomusicological analysis will be undertaken, using zygonic theory. This draws on the intuitions of the analyst – here, the same as the composer – and is therefore susceptible to problems of subjectivity. However, for the empirical work to get off the ground, a predictive model is essential, making certain assumptions inevitable. These could subsequently be modified, if necessary, in the light of the results obtained from a range of listeners, hearing the stimulus with no structural preconceptions. The assumptions are listed below and illustrated in Figure 11.

Stimulus melody and musicological and psychomusicological analyses
The boundary between the first group a1 and its repetition will be detected as the second D is heard, on account of all or any of three signals: the shortening of the preceding A, which leaves a discernible gap in the continuity of sound (see Lerdahl & Jackendoff’s grouping preference rules, 1983, pp. 43ff); the interval of the descending 5th between the A and D, which is comparatively large, providing a relative melodic discontinuity (Bregman, 1990, pp. 461ff); and the fact the D is a repetition of the opening pitch – potentially an indication that the first motif (or a variant of it) is about to restart (see Lerdahl & Jackendoff’s notion of ‘parallelism’, 1983, p. 51). That is, it is hypothesized that schemata pertaining to the way in which formal structures typically unfold (an aspect of B) may play a part in determining group boundaries.
The second A (at the end of bar 2) will be heard as concluding the second group a1, due to the equivalent note in the first appearance of a1 fulfilling that function (Lerdahl & Jackendoff, 1983). It is assumed that structural cognition will be reinforced on hearing the B that opens bar 3, through the short break in sound that occurs before it.
b1 (bars 3 and 4) will be heard as a single group, due to the adjacency of successive pitches, the continuity of sound, and the longer duration of the final note which, it is expected, will strongly signal a phrase boundary, subsequently reinforced upon hearing the brief gap in sound before the next note, D.
This (third) D will indicate a return to the opening segment A1 (or a variant of it), due to the symmetry that is implied (through listeners’ assumed experience of archetypal formal structures) by a repeat of the opening note after the end of the first phrase. It is anticipated that the perception of metre will be established by this point (bolstered by the recognition of previous grouping structures, with their strong metrical correspondence), and that listeners will suppose that the initial 4/4 will continue, running in parallel with any motivic repetition and transformation, thereby reinforcing expectation in the domain of pitch (cf. Temperley, 1995, p. 141; Ockelford, 2009, p. 75).
Listeners will anticipate that the symmetry continues, with a1 + a1 followed by b1 or a variant of it (which, in simple tonal melodies, would typically resolve the dominant that was heard at the halfway point – A – onto the tonic, D).
The E at the beginning of bar 7 (after the second reprise of a1) will register with listeners as an indication that something different is about to happen. However, it is not until the onset of the following D (which, like the opening of bar 3, frames a descending major 2nd) that listeners are expected to anticipate a transposed version of b1, concluding on the tonic.
Further constraints and assumptions, arising partly from the design of the melody, impinge upon the way in which the patterns of expectation that evolve as the stimulus is heard are modelled according to zygonic theory: limitations and suppositions that will enable quanta to be assigned to the anticipated relative strengths of expectation involved.
In relation to A1 (‘current structures, within groups’, see Figure 2 earlier), since little is known about how expectations arising from pitches or intervals interact (from primary and secondary zygonic relationships respectively), it was decided, for this initial investigation, to model only the impact of perfect and imperfect primary zygons that may conceivably operate from a given note (see Figure 12).

Hypothetical range of potential zygonic relationships from a given pitch
Such relationships will be subject to an adjacency effect, since zygonic theory suggests that, ceteris paribus, the strength of expectation will be inversely proportional to the dissimilarity of a predicted note to the stimulus, although the precise nature of the relationship is not determined (cf. Ortmann, 1926, p. 30). Turning to the empirical data pertaining to the melodic intervals occurring between successive notes that are available from Ortmann (1926) and Huron (2001, p. 25), the statistical picture is actually more complex than this: exact repetition tends to be used less frequently than intervals of a second (the desire for similarity apparently outweighing the wish for duplication), and the interval of an octave arises more often than sixths and sevenths (arguably due to the influence of the harmonic series; see Ortmann, 1926, p. 31). Combining the findings of the two studies yields intervallic data relating to over 340 pieces from 10 cultures, which can be expressed in terms of differences between scale steps as follows (see Figure 13).

Intervallic probabilities, after Ortmann (1926) and Huron (2001)
These data will be used to form what Huron would term a ‘serviceable heuristic’, to quantify the expectations pertaining to
Given that the overall probability must total 1, and on the working assumption that the heuristic is bidirectional and operates symmetrically about the pitch that is presented, the model predicts that, following the first note of the stimulus melody, pitches will be anticipated with the following probabilities (see Figure 14).

Predicted probabilities of expectation using the adjacency model following the first note of the melody
The significance of adjacency in prediction may extend to stimuli other than the one heard most recently (Ockelford, 2006, p. 108), and a recency effect is postulated, whereby the closer the stimulus to the point at which expectation occurs, the greater its impact on anticipating what will happen next. This is represented schematically in Figure 15.

Schematic representation of zygonic adjacency + recency model of expectation in music
To enable this model to function predictively – to indicate which pitches are likely to be expected, and with what probabilities – it is necessary to quantify two factors: the number of notes (how far back in the sequence to extend) and the nature of the relationship between events in terms of their relative impact (for example, a more contemporaneous note could be deemed to exert twice the effect of the one preceding, and so on).
Two criteria were used in the process of quantification: the psychological constraints of working memory, and the mathematical consideration that some of the proportions involved in predictions from just one note are already very small, meaning that significant reduction would render them trivial. Given these two restrictions, a model was devised that took into account a maximum of four events, with a linear decrement of impact. The latter is calculated such that the predicted probability of an nth pitch (pn) (occurring after n–1 events) is given by the equation:
To illustrate this principle in action, here are the predicted probabilities of notes 3, 4, and 5 from the stimulus melody (Figure 16).

Predicted probabilities of expectation using the adjacency + recency model following the first 2, 3, and 4 notes of the melody
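The adjacency and recency calculations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval weights in `STEP_WEIGHTS` are hypothetical placeholders that mimic the qualitative shape reported in the text (seconds favoured over repetition, octaves over sixths and sevenths) and are not the values of Figure 13; the reading of ‘linear decrement’ as weights k, k–1, …, 1 from the most recent note backwards is likewise an assumption, and the function names are invented for the example.

```python
from collections import defaultdict

# HYPOTHETICAL weights per absolute scale-step distance (0 = repetition,
# 1 = a second, ... 7 = an octave). Placeholders only, NOT the Figure 13 data.
STEP_WEIGHTS = {0: 3.0, 1: 6.0, 2: 3.0, 3: 2.0, 4: 1.5, 5: 0.5, 6: 0.25, 7: 0.75}


def adjacency_distribution(step, weights=STEP_WEIGHTS):
    """Expectation over scale steps after a single note at `step`.

    The heuristic is applied symmetrically above and below the note,
    then normalised so that the probabilities total 1.
    """
    raw = {}
    for distance, weight in weights.items():
        raw[step + distance] = weight
        raw[step - distance] = weight  # same key as above when distance == 0
    total = sum(raw.values())
    return {target: weight / total for target, weight in raw.items()}


def recency_distribution(previous_steps, max_events=4):
    """Blend the adjacency distributions of up to `max_events` recent notes.

    ASSUMPTION: "linear decrement" is read as weights k, k-1, ..., 1 from the
    most recent note backwards; the paper's own equation may differ.
    """
    recent = previous_steps[-max_events:]
    weight_sum = len(recent) * (len(recent) + 1) / 2
    combined = defaultdict(float)
    for rank, step in enumerate(recent, start=1):  # oldest note has rank 1
        for target, probability in adjacency_distribution(step).items():
            combined[target] += (rank / weight_sum) * probability
    return dict(combined)


# Illustrative scale steps (0 = tonic); not the notes of the stimulus melody.
after_three_notes = recency_distribution([0, 2, 4])
print(round(sum(after_three_notes.values()), 6))  # 1.0
```

With a single preceding note, `recency_distribution` reduces to the plain adjacency distribution, which corresponds to the one-note case described earlier.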
In relation to A3 of the zygonic model (‘current structures, between-groups’), groups will be recognized from the first note onwards in the case of exact repetition (see Assumptions 1 and 4, and Figure 11 above) and from the first interval (that is, the second note) in the case of transposition.
The combined ‘adjacency/recency’ effect (A1 + B2) will interact with the ‘between-groups’ effect (A3), whereby the strength of expectation generated by a group of repeated or transposed notes will increase rapidly as the sequence of pitches and intervals is heard again. That is, as listeners become more certain that what they are hearing is familiar, it is assumed they will make increasingly specific predictions as to what is likely to occur next. No empirical data are available to quantify this conjecture, but it is postulated that the impact ratio between the two factors – (A1 + B2) : A3 – will change exponentially as one proceeds further through the group that is repeated or transposed, such that, after the nth pitch, the ‘between-group’ and ‘adjacency/recency’ influences will be deemed to be

Predicted ratios between ‘within-group’ and ‘between-group’ influence
Figure 18 illustrates these equations operating in relation to notes 5, 6, 7, and 8 of the stimulus.

Predicted probabilities of expectation using the adjacency + recency + intergroup model following notes 5, 6, 7, and 8
Although the principles driving the model are straightforward, its quantification results in a plethora of data, whose manipulation is time-consuming and arguably of little perceptual consequence at the periphery of the action (where values are very small). For example, after only four notes of a repeated group, the relative impact of adjacency/recency is negligible at the extremes of pitch, although the model still sets out the calculations. A certain degree of over-specificity is probably inevitable in the early stages of developing any such protocol that seeks to emulate complex human behaviour, when key parameters are still being determined. However, it is hoped that future empirical work may suggest refinements and ways in which the model may be simplified without losing its flexibility or anticipated predictive power.
Hence we have three models of expectation rooted in zygonic theory that will enable us to tackle the research questions: the one-factor ‘adjacency’ model, ‘
Both the production methods – singing and playing – mean that subjects can produce only one response per note. Hence, in order to obtain a range of perceived probabilities pertaining to a number of potential future pitches, the responses of many subjects have to be amalgamated. Therefore, if this method is to be used, a further assumption is required.
A valid impression of human musical expectation in relation to the differing probabilities that potentially pertain to different melodic continuations can be obtained by combining a number of listeners’ responses.
A problem with the ‘playing’ production method is that subjects need to be able to play by ear, restricting the pool of potential subjects, and conceivably resulting in an atypical sample. The potential challenge of having subjects sing responses is that these may be constrained by the tessitura of the voice, and, in the absence of vocal training, that pitches may be produced inaccurately. The first difficulty can be obviated by avoiding registral extremes (Constraint 7), and the second can be countered by using frequency detection software and making reasonable assumptions as to what was intended. However, a number of issues remain: the task is artificial, since listeners do not usually listen to music in incrementally increasing chunks; they have to sing notes, consciously reflecting on and extrapolating from the listening experience in a way that is unnatural; and the pitches they produce – particularly if they prove to be incorrect – may be distracting. Hence, a final assumption is necessary.
A singing protocol of the type described will not interfere with subjects’ listening experiences to such an extent that the responses they offer fail to present a reasonable picture of the expectations that would otherwise occur.
Method
Research participants
Forty subjects, 24 female and 16 male, aged between 21 and 76, mean 34 years, were recruited through direct contact and posters at Roehampton University, London and other community sites in the area. One subject reported minor hearing loss, but this did not appear to interfere with his ability to take part in the experiment and his contribution was included. Subjects were recruited without regard to their musical or, specifically, vocal training. However, only 11 (28%) reported having had no previous formal music education. Of the remaining 29, five (17%) had had less than 2 years specialist input, six (21%) had had 2 to 4 years, nine (31%) had had 4 to 8 years, and nine (31%) had had more than 8 years. Seventeen (59%) also reported having had voice or singing lessons. All subjects reported listening to music every day (15% less than 30 minutes, 45% between 30 and 60 minutes, and 40% more than 60 minutes) and had had significant exposure to Western mainstream pieces. Three also reported listening regularly to other styles, including Indian ragas and Slovakian folk music.
Materials
The stimulus used was the melody shown in Figure 10, in different ranges for males and females. The timbre was instrumentally and stylistically non-specific (to avoid experience of specific instruments and composers’ use of them affecting subjects’ expectations) yet rich in harmonics and musically ‘realistic’ (to make the task as ecologically valid as possible), and, to this end, three sounds were blended from the Sibelius 5 software: the ‘ocarina’, ‘horn in F’, and ‘flute’.
Since listeners were asked to do something that was outside their experience, another – very short – practice melody was created. This was similar to the main stimulus, yet differed from it, so that when the principal melody was subsequently heard, it would be regarded as a distinct musical entity, and ‘previous structures’, ‘between-groups’, ‘veridical’ memories would not be invoked (
To set the auditory scene in each case, two introductions were produced, using the relevant diatonic major scale ascending and descending. This bi-directionality was again intended to counter any ‘previous structures’, ‘between-groups’ effect – reinforced by using a different timbre (the piano); see Figure 20.

‘Practice’ melody

The introductions played before the practice and stimulus melodies were heard
Environment and apparatus
Data were collected in a soundproofed room, with only the first author and the subject present, to minimize any discomfort that may have been felt in having to sing in front of someone else. The apparatus was set up so that the researcher was outside subjects’ field of vision, and assurances were given that they were not being judged on their singing ability; rather, they were encouraged to relax and follow their musical intuitions.
The materials were saved as MP3 files, and replayed using a Dell Dimension 3100C PC with a Lexicon Alpha soundcard and Harman Kardon speakers, which presented the stimuli to subjects at around 60 dB. Responses were recorded through a Sony ECM-MS907 microphone connected to the same Dell PC. Vocal frequencies were measured in Hertz (Hz) using Praat, version 5.1.03 (Boersma and Weenink). This software was selected for its successful history in the field of psychological research (for example, Sergeant & Welch, 2009; Steinbeis, Koelsch, & Sloboda, 2006).
Procedure
In both the practice and experimental conditions, subjects were presented with the introduction, followed after a short pause by the first note of the melody. They were asked to sing the note that they thought would be most likely to come next. Then the first two notes of the melody were presented, and again, subjects were requested to sing the pitch that they thought would follow. This process continued until the penultimate note of the melody. After each response, subjects signified how confident they felt that their guess was correct on a Likert scale from 1 (‘not at all confident’) to 7 (‘extremely confident’). They were encouraged to use the whole of the scale, but to reserve the endpoints for extreme cases. The experiment took about 20 minutes.
Initial data processing
The raw data comprised recordings of 1000 brief vocalizations (25 responses from each of 40 subjects). Each was measured in Hz, determined by taking the average frequency of the most consistent portion of the response. Occasionally there was significant variation: for example, where the vocal pattern started at a particular frequency, rose up to a higher one, then fell back again. Here, subjects’ efforts were evaluated by an independent judge, who determined perceptually the point at which they seemed to settle on their intended pitch.
The frequencies obtained were assigned to categories from the D major scale, assuming equal temperament, and given that D4 (the D above middle C on the piano) = 294 Hz. Responses from male participants were transposed up an octave in music-notational terms, to facilitate male and female data being considered together.
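The categorization step can be illustrated with a short Python sketch. It assumes equal temperament with D4 = 294 Hz, as stated above, and assigns each measured frequency to the nearest pitch of the D major scale. The function name, the octave convention (counted in whole octaves from D4), and the tie-breaking rule (the lower candidate wins) are assumptions made for the example, not details taken from the study.

```python
import math

D4_HZ = 294.0  # reference frequency given in the text
D_MAJOR_SEMITONES = [0, 2, 4, 5, 7, 9, 11]  # semitones above D
NOTE_NAMES = ["D", "E", "F#", "G", "A", "B", "C#"]


def nearest_d_major_pitch(freq_hz, transpose_octaves=0):
    """Return (note name, octave offset from D4) for a sung frequency.

    `transpose_octaves=1` shifts a response up an octave, as was done for
    the male participants. Ties go to the lower candidate (an assumption).
    """
    semitones = 12 * math.log2(freq_hz / D4_HZ) + 12 * transpose_octaves
    base_octave = math.floor(semitones / 12)
    best = None
    # Check the neighbouring octaves too, so that values near an octave
    # boundary can snap to the D above or the C# below.
    for octave in (base_octave - 1, base_octave, base_octave + 1):
        for degree, offset in enumerate(D_MAJOR_SEMITONES):
            error = abs(semitones - (12 * octave + offset))
            if best is None or error < best[0]:
                best = (error, octave, degree)
    _, octave, degree = best
    return NOTE_NAMES[degree], octave


print(nearest_d_major_pitch(294.0))     # ('D', 0)
print(nearest_d_major_pitch(440.0))     # ('A', 0)
print(nearest_d_major_pitch(147.0, 1))  # ('D', 0)
```

The octave offset here is counted from D to the C# above it, which differs from standard octave numbering (where the number changes at C); it is used only to keep the example compact.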
Results and discussion
Combining the 40 subjects’ responses (see Assumption 11), and scaling them so that the sum pertaining to each note is 1, yields the dataset shown in Figure 21.

Observed, scaled responses
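The scaling applied here amounts to converting the tally of sung pitches at each point in the melody into proportions. A minimal Python sketch follows; the response list is invented purely for illustration and the function name is hypothetical.

```python
from collections import Counter


def scaled_responses(sung_pitches):
    """Convert a list of categorized responses into proportions summing to 1."""
    counts = Counter(sung_pitches)
    total = sum(counts.values())
    return {pitch: count / total for pitch, count in counts.items()}


# Invented responses from eight hypothetical subjects after one note.
example = ["E", "E", "F#", "D", "E", "A", "F#", "E"]
print(scaled_responses(example))
# {'E': 0.5, 'F#': 0.25, 'D': 0.125, 'A': 0.125}
```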
The diversity is striking. The number of different predictions (NP) varies from 3 to 12 (M = 7.36, SD = 1.96), and the range in scale degrees (RP) from 5 to 13 (M = 8.68, SD = 2.58). Combining these factors to give a ‘coefficient of variability’ (VP), such that
reveals that the specificity of expectations increases only by around a quarter, as Figure 22 shows. Despite designing the melody with a structure that was thought to be readily apprehensible (which was intended to make prediction easier as the music progressed), there is a high degree of variability in listeners’ predictions. Even the last note, which is signalled both intraopusly (see Figure 11) and schematically (as the tonic at the end of a tonally and thematically symmetrical melody, whose first section concluded on the dominant), was not anticipated by three listeners.

Decrease in the variability of responses over the course of the melody
The results show high inter-subject variability too (see Figure 23). With a potential maximum of 25, the number of correct predictions ranged from 0–19, M = 11.9, SD = 4.84. There are no significant effects for age, years of musical training, or the time reportedly spent listening to music.

Individual responses
However, gender does appear to be important, with a significant difference between men’s (n = 16, M = 14.3, SD = 2.75) and women’s (n = 24, M = 10.3, SD = 5.28) success in anticipating what came next: t(38) = 2.83, p = .0075. As the groups of male and female subjects did not differ significantly in terms of age, musical training, or time spent listening to music, one has to look for other explanations. Clearly, the relatively small numbers of men and women involved may have resulted in unrepresentative samples – an issue that only larger-scale replication could resolve. Then, there was an experimental difference between the two groups, whereby the octave in which responses were elicited was different for men and women (Constraint 7; see Figure 11). Were tessitura a significant factor in the formation of expectations, there would be a difference in the frequencies with which intervals of different polarities were predicted (up, down, or neutral), according to where the stimulus was pitched, since, ceteris paribus, the lower a given note, the greater the probability that the one following will be higher, and vice versa (Huron, 2006, pp. 80ff). Following this reasoning, women’s predictions (whose stimulus was higher) should contain a greater proportion of descending intervals, while men’s should be more likely to ascend. Analysis by intervallic polarity does indeed show a difference in the patterns of response by gender, but the picture is not straightforward (see Figure 24).

Intervallic polarity of responses
With regard to descending intervals, the proportions predicted by women (M = 11, SD = 3.8) and men (M = 10.6, SD = 1.6) are very similar: 44% and 42% respectively, which suggests no effect of tessitura. The data pertaining to ascending intervals paint a different picture, though: women’s predictions (M = 10.7, SD = 4.88; 43%) and men’s (M = 13.6, SD = 1.71; 55%) differ significantly, t(38) = 2.28, p = .029, supporting the tessitura hypothesis. Moreover, as the proportion of ascending intervals to descending is 3 : 2, this could potentially explain men’s greater success in prediction.
There are two confounding factors, however. First, men’s proportion of successful ascending predictions (150 out of 218, or 69%) was significantly greater than women’s (114 out of 256, or 44%), χ²(1, n = 474) = 28.12, p < .0001. (Men’s proportion of successful descending predictions, 77 out of 169, or 45%, was also greater than women’s, 90 out of 264, or 34%, although here the difference was less marked, χ²(1, n = 453) = 8.76, p = .003.) Second, the different proportion of ascending predictions can largely be accounted for by women’s tendency to expect pitches to be repeated (although none in fact was) (M = 3.1, SD = 3.20; 12%), which was significantly greater than that for men (M = 0.7, SD = 1.08; 3%): t(38) = 2.88, p = .007.
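The comparison of successful ascending predictions can be illustrated with a 2 × 2 contingency table (gender by correct/incorrect). The following is a minimal sketch, assuming a Pearson chi-square test without continuity correction; the text does not say which variant was used, and the helper names are hypothetical. Only the counts for ascending predictions quoted above are used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Ascending predictions: correct versus incorrect, by gender.
men_correct, men_total = 150, 218
women_correct, women_total = 114, 256

chi2 = chi_square_2x2(
    men_correct, men_total - men_correct,
    women_correct, women_total - women_correct,
)
p = p_value_df1(chi2)
print(f"chi2(1, n = {men_total + women_total}) = {chi2:.2f}, p = {p:.1e}")
```

Under these assumptions the statistic comes out at about 28.12 with p well below .0001, in line with the values reported for ascending predictions.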
The confidence with which men and women made their predictions differed significantly too. On a scale of 1 (low) to 7 (high), women’s confidence ratings were M = 3.68, SD = 1.42, while men’s were both higher and more consistent, with M = 4.57, SD = 0.96: t(38) = 2.19, p = .035. Although men were almost invariably more confident than women (after only one event, note 5, were men less so, and then by just 0.07 of a point), and while the confidence of both sexes grew through the tests, men increased in confidence almost twice as much as women: by around two points on the Likert scale as opposed to one (see Figure 25). It could be that, as men’s early predictions proved to be correct more often than women’s, their confidence grew more strongly.

The changes in confidence ratings of predictions for men and women
In summary, while tessitura may explain some of the differences in men’s and women’s expectancy profiles, it is not the only factor – and it could be that the results reflect a difference in the way that males and females process and predict musical structure. This possibility has some neuropsychological support: Koelsch, Maess, Grossmann, and Friederici (2003) found that an electrophysiological correlate of music-syntactic processing is generated bilaterally in females, with right hemispheric predominance in males. There appear to be attitudinal differences too, and the surety with which predictions are made may have impacted on expectational ‘success’. For example, women’s lower levels of predictive confidence may have resulted in the greater frequency with which they anticipated repetition (which never actually occurred) rather than change. It is possible that such differences may have arisen from opposing personality traits, stereotyped as ‘male’ and ‘female’ in Western culture. These are intriguing areas for future research.
The range and variability of data across all subjects (particularly in the case of women, whose patterns of expectation were more varied, as the differences in standard deviation show) raise two phenomenological issues. First, the variation in responses could be held to argue against a common mechanism of expectation. Second, as over half the predictions (52%) were incorrect, could expectation of this kind be part of the ‘typical’ listening experience (Ockelford, 2006, p. 135), or is it a feature of musical metacognition that was induced experimentally?
The zygonic model (versions

Predicted and observed responses
Model
Research Question 2 asks whether there is support for the hypothesis that expectations arising from ‘current structures, between-groups’ produce a specific sense of what is to come. Evidence can be sought by comparing the coefficients of variability (VP) of those pitches that the zygonic analysis shown in Figure 11 indicates could be predicted through between-group relationships with those of pitches that could not. The data set out in Figure 27 indicate that there is indeed a significant difference in the average VP values pertaining to each category, t(23) = 2.59, p = .016. This finding is supported by the differences in confidence with which subjects reported their predictions were made: where between-group structures were available, anticipation was significantly more assured than where they were not, t(23) = 3.21, p = .004.

Variability and confidence of prediction differ significantly with the existence or non-existence of between-group structure
Research Question 3 asked whether there was evidence to support the notion that expectations arising from (a) ‘current structures, within groups’ and ‘previous structures, schematically encoded’ interact with (b) ‘current structures, between-groups’, such that (b) lends greater specificity to (a), and (a) grounds (b) in a local context. Model
Analysis of the data shown in Figure 26 indicates that the average difference between predicted and observed responses was only M = 0.039, SD = 0.019, and that this differed significantly from

Correlations between Z3 outputs and observed responses, event by event
Taking grand averages of these results reveals a high degree of correlation between the model and subjects’ responses (Figure 29).

Correlation between the means of all Z3 outputs and observed responses, note by note
There are differences, however, which indicate limitations of the model and suggest both potential modifications to it and possible improvements in the experimental design. In relation to the first three events,
The second stimulus pitch (a scale step higher than the first) would have supported the expectations of the 49% of subjects who anticipated the ascent, and suggested to all listeners that the melodic opening was based on stepwise upward movement. There were three possible sources of such a projection: the schema of initial melodic ascent, the introductory material, and, most immediately, the opening interval – each reliant on secondary zygonic relationships (see Figure 30).

Anticipation through secondary zygonic structure not factored into
However, in relation to
From Event 5 (see Figure 28), evidence emerges of between-group expectation, reflected in the peak shared by subjects’ responses and
After the second A4 (Event 8), both the subjects and
After Event 10, with no between-group relationships in play, the model predicts a broad range of values – partly matched by the profile of subjects’ responses, although a clear majority (41%) anticipate F#4. It is unclear why this prediction dominates, since the secondary zygonic structural logic in evidence up to this point would have projected G4 (although this was the second most common expectation at 26%).
At Event 11, secondary zygonic structure reasserts itself, and the majority of subjects (68%) anticipate F#4 as a continuation of the descent from B4, A4, and G4 (Figure 31), while

Further anticipation through secondary zygonic structure not factored into
At Event 13,
At Event 14, between-group structures feature again, whose effects on expectation gradually gain in strength among listeners and
At Event 21, both listeners and
In summary,
Conclusion
This article describes preliminary research that tested aspects of the zygonic model of expectation in music. Broad support was found for the theory, particularly the notion that, at a first hearing, general expectations may arise in response to within-group and schematically encoded structures, and specific expectations may be stimulated by the repetition or transposition of groups of notes. The proposition that specific projections are enabled by veridical memories was not tested. Suggestions were made for refining the model, such as incorporating more within-group and schematic information. The study had limitations, not least the fact that only 40 listeners’ responses were tested in relation to one melody, whose design will inevitably have influenced the results. The findings confirm that humans can project what is coming next when listening to music, but provide no evidence that they usually do. Indeed, the fact that most projections made in the absence of inter-group information subsequently proved to be incorrect throws into doubt whether, beyond having a general sense of what is coming next, listeners actively seek to anticipate the future in first-time hearings, since this would presumably have a negative aesthetic impact.
An unexpected finding was that men and women appear to predict (and therefore process) musical structure differently, and future research could aim to establish to what extent commonalities and differences in musical expectation may exist between distinct sub-populations. Such work may take us a step closer to the venerable and tantalizing issue of the extent to which we all experience a piece of music in the same way or idiosyncratically.
Acknowledgements
With thanks to the volunteers who took part in the research, to Dr. Evangelos Himonides, Institute of Education, University of London, for his comments on a draft of the manuscript, and to the two anonymous reviewers for their telling insights into earlier versions of this article.
