Abstract
We present a novel approach to the analysis of jazz solos based on the categorisation and annotation of musical units on a middle level between single notes and larger form parts. A guideline during development was the hypothesis that these midlevel units (MLU) correspond to the improvising musicians’ playing ideas and action plans. A system of categories was devised, comprising nine main categories (line, lick, theme, quote, melody, rhythm, expressive, fragment, void), 18 subcategories, and 41 sub-subcategories as well as syntactical rules to encode motivic relationships between units. A set of 140 monophonic jazz solos from various jazz styles (traditional, swing, bebop, hardbop, cool jazz, postbop, free jazz) was annotated manually, resulting in 4939 units in total. The median number of midlevel units is 32 per solo and 13.75 per chorus. The average duration is 2.25 s (SD = 1.57 s), in good agreement with the duration of the subjective present. Overall, the most common main category is lick (45.7% of all units), followed by line (31.5%), but distributions of the main MLU types differ significantly between styles and performers. About one quarter (M = 25.1%, SD = 15.3%) of the annotated units have motivic relations to preceding units. The mean length of consecutive motivic chains is 2.8 (SD = 1.4). The amount of motivic relations varies considerably between performers, but not between styles. Based on these first results, we discuss implications for jazz research and options for further applications of the proposed method.
Jazz improvisation has been investigated by analysing either transcriptions of recorded solos (e.g. Finkelman, 1997; Kerschbaumer, 1978; Owens, 1974; Schuller, 1958), introspective accounts by musicians (e.g. Hargreaves, Cork, & Setton, 1991; Sudnow, 1978) or both (Berliner, 1994; Norgaard, 2008). The analysis of transcriptions has been complemented and extended by using computational and statistical methods (e.g. Frieler, Abeßer, Zaddach, & Pfleiderer, 2013; Järvinen, 1995; Johnson-Laird, 1991; Norgaard, 2014; Porter, 1985). Recently, neuroimaging techniques have also been utilised to gain insights into the neuronal underpinnings of jazz improvisation (Berkowitz, 2010; Berkowitz & Ansari, 2008; Limb & Braun, 2008). All these approaches generate valuable knowledge about the processes and peculiarities of jazz improvisation. However, they also have certain limitations.
Analyses based on transcriptions are often concerned with the minute details of single solos, frequently focusing on specific harmonic relations between the tonal material and the underlying harmonic framework. From a perspective of creativity research, the extension of single-piece analysis to computational and statistical analysis of corpora seems to be fruitful as far as generalisability is concerned. But these approaches frequently deliver descriptions on a very global level; for example, pitch histograms for an entire solo. Computer-based approaches have also been used to investigate the extent to which patterns (formulas, riffs) are employed by improvisers (Frieler, 2014; Norgaard, 2014; Porter, 1985; Weisberg, 2004).
Studies of jazz improvisation based on interviews with musicians (expert or novice) give valuable and indispensable insights into the process of improvisation. For example, the use of patterns is consistently mentioned by practitioners of jazz, often in connection with learning strategies. As Berliner (1994) has demonstrated, verbal accounts by musicians and analytical results converge to form a convincing picture of “the infinite art of improvisation”. However, studies based on interviews and introspective accounts provided by musicians themselves rely on their reflective capabilities, which are limited by cognitive constraints. Some parts of the improvisational process are not accessible to the performing artists because the monitoring process is either too slow or too vague due to the heavy cognitive demands imposed by the very process of improvisation. Additionally, these accounts are always retrospective, which adds uncertainty due to lapses of memory, or are distorted by the musicians’ own assumptions, which are frequently informed by jazz theory. Finally, they are also constrained by the communicative skills of those surveyed. Therefore, introspective and interviewing methods are somewhat limited in their explanatory power for studying improvisational processes. The most promising approach is a combination of interviews and transcriptions as employed for instance by Norgaard (2008), who let jazz musicians comment on their own improvisations recorded immediately before the interview. But interview-based methods are generally not applicable to historical solos of the jazz masters.
To sum up, transcription-based analysis tends to be unsuitable for capturing the decisive strategies of the improvising musician, whereas introspective approaches are prone to bias because of the inevitable subjectivity in performers’ accounts as well as fundamental constraints on introspection. Moreover, both approaches tend to focus solely on certain remarkable, special or otherwise salient parts of an improvised solo, but very seldom on its structural totality. There is a need for a complementary methodology that takes structural aspects into account and, at the same time, describes and quantifies improvisational strategies in a concise and compact way. This kind of method would be very helpful in laboratory experiments with improvisers under controlled conditions for a quantified assessment of systematic differences with respect to varying tempo, tonality and other external parameters.
Recently, Lothwesen and Frieler (2012) and Schütz (2015) developed such an approach for jazz piano improvisation called midlevel or ideational flow analysis. In their pilot study, Lothwesen and Frieler (2012) investigated a small set of jazz piano improvisations using qualitative methods based on data-driven classification methods inspired by Grounded Theory (Glaser & Strauss, 1967). This resulted in a categorical system for the annotation of non-overlapping segments of the improvised stream of events, representing distinct playing ideas on a middle level between the level of single events (tones) and structural levels such as the underlying chord progression, single choruses or even the typical head–solo–head structure of a jazz tune. This middle level roughly corresponds to musical phrases without always being identical to them. Lothwesen and Frieler used this system of annotation with midlevel units (MLU) to examine the influence of tempo and tonality on jazz piano improvisations.
Schütz (2015) further revised and extended the system of Lothwesen and Frieler, relying on a much larger set of jazz piano improvisations that were produced by expert performers in controlled conditions using different types of harmonic templates (ballad, blues, modal). The piano improvisations were recorded as MIDI files, including several repetitions on several recording dates. Schütz was able to show that the different templates had a distinct influence on the stream of midlevel units of the improvisers and that the players have different preferences for unit types in general. Finally, by introducing his participants to the annotation method, he gathered strong evidence that the approach captures valid aspects of the improvisational process according to the players’ introspective accounts. Therefore, the sequence of midlevel units in a solo can be viewed as a compact structural, though retrospective and therefore hypothetical, description of the improvisation process itself, and the instantaneous playing decisions often correspond to midlevel units as identified by annotators. Based on these results, Schütz devised a novel model of improvisation, which he dubbed “Ideational Flow Model”.
Inspired by these studies, we adapted the midlevel annotation method to a corpus of 140 monophonic jazz solos taken from the Weimar Jazz Database. Since the original categorical system evolved from annotations of polyphonic jazz piano solos, it had to be adjusted for monophonic solos. In the following sections, we will give a detailed account of the annotation procedure and the category system and indicate the various potentials of our approach for an advanced statistical analysis of jazz improvisation on a structural level. The discussion of the theoretical model underlying the midlevel analysis, notably a revision and extension of Schütz’ Ideational Flow Model, will be left for a future paper which is in preparation.
Midlevel analysis
In the first stage, solos were segmented into clearly discernible chunks which were annotated in a so-called “Region Layer” in the Sonic Visualiser project file of the original transcription (Cannam, Landone, & Sandler, 2010; Frieler et al., 2013). Segments are not allowed to overlap and each event must be contained in exactly one segment. Tentative labels were given to each unit (“open coding”, see Glaser & Strauss, 1967). A new category was added if a newly encountered unit seemed sufficiently distinct from all previous categories and if it occurred more than once. After the categorisation had reached saturation, that is, no new categories were required to annotate new solos, the system was restructured into an exhaustive and compact three-tier hierarchy (“axial coding”). The main categories are labelled lick, line, melody, theme, quote, rhythm, expressive, fragment and void, and will be discussed in detail in the next section.
All categories were initially devised with the concept of playing ideas in mind; however, on the lower levels of the hierarchy the categories are increasingly more descriptive of the musical surface itself.
Additionally, a compact syntax was developed to ease coding by assigning short labels (e.g. line_a for an ascending line). Coding options for similarity relationships between units, for example transpositions and variants, have also been included. We did not prescribe how to determine these similarity relationships, leaving this to the coders’ judgement. For such derived MLUs, the coding includes a back reference as the distance between the related MLUs in number of units. In case of more than one possible relationship, the shortest such distance is used.
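The back-reference rule can be illustrated with a minimal Python sketch. The function name `back_reference` and the representation of units by their position in the solo are assumptions made for illustration; they are not part of the codebook syntax.

```python
def back_reference(current_index, related_indices):
    """Distance, in number of units, to the nearest related earlier unit.

    `related_indices` holds the positions of all earlier units that the
    coder judges to be related to the unit at `current_index`
    (hypothetical representation). Returns None for a non-derived unit.
    """
    earlier = [i for i in related_indices if i < current_index]
    if not earlier:
        return None
    # With several candidate relationships, the shortest distance is coded.
    return current_index - max(earlier)


# Unit 5 is judged to be a variant of units 1 and 3: the nearer one is used.
print(back_reference(5, [1, 3]))  # 2
```

Coding only the shortest distance keeps the annotation compact, at the cost of not recording every relationship a coder may perceive.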
While each musical phrase starts a new midlevel unit by definition, phrases can consist of several units “glued” together, which can be marked by a preceding tilde sign “~”. This option should only be used if a clear change in character can be observed over the course of a phrase. In this way, phrase structure can be fully recovered from the midlevel annotation. Finally, extra information can also be coded, for example in case of quotations the source is added, and for MLUs based on the original composition of a piece, the corresponding bar numbers are referenced.
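How phrase structure can be recovered from the tilde marker may be sketched as follows. This is an illustrative Python sketch only; the function name and the plain list-of-labels input are assumptions, and the labels in the example are simplified.

```python
def group_into_phrases(labels):
    """Group a sequence of MLU labels into phrases.

    A label with a preceding "~" is glued to the previous unit and thus
    continues the current phrase; any other label starts a new phrase.
    """
    phrases = []
    for label in labels:
        if label.startswith("~") and phrases:
            phrases[-1].append(label[1:])
        else:
            # A leading tilde on the very first unit has nothing to attach to.
            phrases.append([label.lstrip("~")])
    return phrases


units = ["melody", "fragment", "rhythm", "~lick", "line_a"]
print(group_into_phrases(units))
# [['melody'], ['fragment'], ['rhythm', 'lick'], ['line_a']]
```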
A codebook was written (document S2 in the Supplemental Material Online) which includes precise descriptions of all categories and detailed coding instructions. Subsequently, a small set of solos was annotated by the four designated annotators in a trial round and results were compared and discussed to reach maximal consensus before applying the method on a larger scale.
Midlevel categories
The final MLA system encompasses nine main categories, with 18 subcategories and 41 sub-subcategories. The system is not balanced; that is, only three categories (lines, licks and rhythm) have several subcategories, whereas the other categories have none. It forms a strict mono-hierarchy: segments must be assigned to one and only one class, even though it might be sometimes desirable to assign fringe cases to more than one class label. However, we decided to use a strict system in order to facilitate the annotation process and to ease subsequent statistical analysis. In future versions of the method, this constraint might be dropped.
Class labels were chosen to be expressive, compact and linked to existing jazz discourse. We aimed to label the classes as clearly as possible, but slight deviations from standard meanings cannot be ruled out. To differentiate class labels as such, italicised names will be used.
line: a line is a series of tones mostly proceeding in small, step-sized intervals with high rhythmical uniformity and a salient trajectory in pitch space. Depending on the shape of a line, there are several sub- and sub-subcategories. The main subcategories are simple, tick, interwoven and wavy lines. The main shapes are horizontal, ascending, descending, concave and convex. Tick lines are lines of exclusively convex or concave shape but with asymmetric arms, i.e. a longer or shorter descent or ascent combined with a complementary longer or shorter ascent or descent. Simple lines show a straight direction, i.e. without too many twists and turns, which are characteristic of wavy lines. Wavy lines tend to be rather long and may have an overall direction besides “wiggling around”. Interwoven lines consist of two independent horizontal, ascending or descending lines that are played in tone-wise alternation.
lick: in the context of MLA, a lick is a rather short and concise melodic figure that often includes rhythmical and intervallic salience. Licks have a clear gestalt-like quality, which distinguishes them from fragments (see below). They comprise mostly short notes and sometimes large intervals and chromaticism, which distinguish them from melody. Shortness, rhythmic diversity, or both qualities together, separate licks from lines. There are two proper subtypes: lick bebop and lick blues. All other licks are grouped in a residual subclass lick other. Blues licks are defined by tonal features such as blue notes as well as typical constructions. Historically, the blues played (and still plays) a special role in jazz improvisation (Schuller, 1968/1986), so it seemed worth defining a special subcategory. Bebop licks on the other hand use certain techniques which are typical for bebop improvisations, such as approach notes and chromatic passing tones.
melody: a melodic figure that is not derived from the theme of the song and embodies some kind of song-like, lyrical or cantabile character. A rule of thumb may be: if an MLU sounds more like scatting (if sung), it should be termed a lick or a line; if it sounds more like a Broadway tune, a pop song, a folk tune or an opera aria, it should be labelled melody.
rhythm: this category describes units in which the rhythmical expression is the single most prominent feature. There are four subtypes that differ according to the number of pitches (single or multiple) and basic rhythm quality (regular or irregular). The most important subtypes are single tone repetitions, predominantly regular and isochronous, and oscillations with multiple note repetitions, predominantly regular.
theme: denotes material taken directly from the theme of the tune, possibly with variations. The features of theme MLUs are often similar to those of melody, but because of its relationship to the tune, it is a distinct playing idea.
quote: these are direct quotes from another piece of music (jazz tune, classical tune, etc.), which might resemble a melody or a theme, but again, because of its origin, it is a different category. However, playing a pattern taken from another jazz musician as part of a longer line or a lick does not count as a quote if it is not clearly recognisable as such.
fragment: this is a small set of tones which neither combine to form a clear contour-based succession or motivic/thematic figure, nor are they very expressive. Fragments are most often single tones or very short segments which can sound like trials or mistakes.
expressive: these are figures or single tones with a sound- or gesture-like character in which aspects of expressivity are clearly focused, e.g. scream-like sounds.
void: this category refers to moments of “actively playing nothing”. Generally, jazz soloists add short breaks between phrases, e.g. just for breathing, which do not belong to this category. The length of the break in the flow of a solo should clearly exceed these usual gaps between phrases. It might also be a moment of consciously developing new ideas (see Lehmann & Goldhahn, 2009).
The system could be easily enhanced by more detailed descriptions, for example with respect to the tonal content of MLUs (e.g. diatonic vs. chromatic, “inside” vs. “outside” the harmonic frame) or rhythmic and metrical aspects (offbeat, double-time, half-time, poly-metrical), etc. This might indeed be a fruitful approach in the future. For the time being, however, we choose not to differentiate and possibly further complicate the already rather comprehensive system. Moreover, during the open coding process, these additional features did not suggest themselves easily, so we decided to leave the option of refined levels of description for later stages, possibly by using automated feature extraction algorithms available in the MeloSpySuite toolbox developed by the Jazzomat Research Project (Frieler, Pfleiderer, Abeßer & Zaddach, 2015).
Example: Annotation of Sonny Rollins’ “Blue Seven” (1956)
In Figure 1, the first two choruses of Sonny Rollins’ solo on “Blue Seven” are depicted together with a midlevel annotation. The solo is commonly regarded as a masterpiece, to which Gunther Schuller devoted one of the first structural jazz analyses (Schuller, 1958; for a critique see Givan, 2014). The tune itself is a simple blues with a theme that was improvised by Rollins on the fly. One striking feature of the solo is the impression it gives of being well thought out, which, especially in the beginning, is partly due to the eminent motivic-thematic work – which fascinated Schuller, too.

Figure 1. Sample midlevel annotation of the first two choruses of Sonny Rollins’ (first) solo on “Blue Seven” (1956).
The solo starts with a short melody MLU with a rather unusual intervallic structure of an ascending minor second followed by a major seventh jump up. It is answered by a lick, followed by another melody, which has parallels to the starting melody unit, but is actually new material. The subsequent lick indeed seems to be a variant of the lick two units before, hence it is annotated with the label ##lick. It is followed – quite surprisingly at this point – by a wavy line over two bars, with no clear overall direction. It starts and ends on the same note, and meanders over a pitch range of one-and-a-half octaves. This line features 12 turning points for about 40 tones in total (hence it is clearly of the wavy type) as well as chromatic passing tones, short arpeggios, which are partly asymmetrical with large intervals and chromatic approaches, occasional blue notes (minor thirds and tritones) but also simple step-wise diatonic motion. This line can be regarded as a prototypical example of the category wavy lines. Next, Rollins resorts to a short melody which is clearly derived from the initial motif. Then a fragment consisting of a single tone is followed by an oscillation, that is, a regular rhythm with multiple tones, which is glued to a concluding blues lick. This blues lick could also be labelled as a reference to the very first measure of the theme, which contains a falling tritone motif that can as well be found at the core of this lick. However, due to the different accent structures the perceptual similarity appears weak. Next in line is a reference to the third measure of the theme of the piece, which was actually improvised by Rollins (see Givan, 2014). After that, the solo continues with two melody MLUs which are successively upwardly transposed variations (or extensions) of the initial motif (notably using upper structures of the underlying chords G over F7 and F-7 over Eb7).
The second chorus ends with a wavy line; this time with a clear descending overall motion, ending on the first beat of the second bar of the 12-bar-form (on the flatted fifth of Eb7), thus blurring the chorus boundary.
This short example demonstrates that the midlevel analysis gives a fairly clear and comprehensible overview of what is going on in a solo. However, some typical problems of the annotation process can also be found. For example, the initial motif was dubbed melody; but it could have also been called a lick, or even a blues lick, because it employs the blues third of the F7 chord. But it does not sound very bluesy, probably because of its unusual interval structure. The rhythmic structure and Rollins’ interpretation of the motif finally tipped the scales for the annotation as melody, and it actually is singable. The ensuing lick could also be labelled more specifically as a blues lick – arguments both for and against this label could be found. The following melodic unit seems vaguely related to the initial motif, but there are also some differences. The basic strategy of the midlevel analysis for these kinds of problems is to adopt a conservative attitude, that is, if in doubt, to choose the least specific option.
Inter-coder reliability of midlevel annotations
In order to estimate the amount of variance introduced by the subjective annotation process, we devised an evaluation method based on a set of 10 solos for which annotations by two (5 solos), three (4 solos) or four (1 solo) annotators were available. The solos were chosen from the data set at random.
Based on binary strings representing segment boundaries, we calculated F1-scores and consensus values for each available pair of annotations. For full details of the method employed, please refer to document S3 in the Supplemental Material Online. The results show a high agreement on MLU boundaries with a mean F1-score across all pairs of annotators of .83 (SD = .08, baseline chance value is .085, Fleiss’ κ = .81). Disagreement occurs mostly for glued MLUs and ambiguous phrase beginnings.
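As an illustration of the boundary comparison, the following Python sketch computes a standard F1-score for a pair of binary boundary strings. It is a sketch under stated assumptions: the exact procedure is specified in document S3 and may differ, and the encoding used here (one character per note event, "1" marking an event that starts a new unit) as well as the function name are assumptions.

```python
def boundary_f1(a, b):
    """F1-score between two equal-length binary boundary strings.

    Assumed encoding: "1" marks an event that starts a new unit,
    "0" any other event. The score is symmetric in a and b.
    """
    if len(a) != len(b):
        raise ValueError("boundary strings must have equal length")
    shared = sum(x == "1" and y == "1" for x, y in zip(a, b))
    total = a.count("1") + b.count("1")
    if total == 0:
        return 1.0  # neither annotator marked a boundary (design choice)
    # F1 = 2 * TP / (2 * TP + FP + FN) = 2 * shared / (marks in a + marks in b)
    return 2 * shared / total


coder_1 = "1001000100"
coder_2 = "1001001000"
print(round(boundary_f1(coder_1, coder_2), 3))  # 0.667
```

In this toy example the two coders agree on two of three boundaries each, which gives 2 * 2 / (3 + 3) = 0.667.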
The mean consensus for main category labels is .60 (SD = .15, baseline = .32, Fleiss’ κ = .43). The main sources of disagreement are licks and lines, which account for about 23% of all confusion. The labelling of sub-subcategories is, with respect to the much larger set of labels, still acceptable with .46 (SD = .15, baseline = .24, Fleiss’ κ = .29).
All in all, the inter-coder reliability is sufficiently high. There are some grey areas between categories, particularly between lines and licks, which can hardly be circumvented. However, the confusion of lines and licks is nearly symmetrical across coders, so these differences might balance each other out in statistical analyses. Furthermore, all annotations were examined and cross-checked by the first author, which further increased their reliability.
Connection to models of improvisation
The guiding principle during the development of the MLA methodology was the concept of “ideas” or, more precisely, that of meso-level action plans (see Lothwesen & Frieler, 2012; Schütz, 2015), which resulted in distinctive musical units with distinctive features that could be robustly identified during the coding process. This resembles the concept of event clusters in Pressing’s influential cognitive model of improvisation, which is also based on a non-overlapping segmentation of the improvisational stream (Pressing, 1988). Pressing did not fully specify details of these chunks, such as length, extent, constituents or rules on how to derive them from actual improvisations. Instead, he provides a rather open description system in terms of what he calls objects, features and processes. Studying his examples and an application of the model to a free keyboard improvisation (Pressing, 1987), one could assume that he probably imagined smaller units which can be linked by processes of association and interruption; the first being divisible into similarity and contrast, the second leading to a grouping of event clusters into event cluster classes, for which a certain amount of coherence is assumed. Obviously, these event cluster classes bear some similarities to the midlevel units of the MLA, although they are less tightly defined and governed by more fine-grained cognitive processes. The sample analysis in Pressing (1987) suggests that he more or less equated phrases and event cluster classes. Pressing found a similar duration range for event cluster classes, but with a longer mean value of about 5 seconds.
Dean, Bailes, and Drummond (2014) measured macro-structure contrast in a series of free (non-tonal) improvisations. They use statistical analysis of time series for a number of features to identify change points in free piano improvisations recorded with MIDI. The resulting segments might be related to ideas and action plans even though they seem to be longer than the MLUs. Unfortunately, the authors do not provide full statistics on the segments. Moreover, segments do not always seem to be congruent across feature dimensions.
From his guided interviews with jazz musicians, Norgaard (2008) identified a component of the improvisational process that he calls “sketch planning”, which seems to be similar to the concept of ideas or action plans in the sense of our MLA. However, Norgaard is not very specific about what exactly constitutes these sketches, probably because his informants mentioned these sketches only sporadically while commenting on their own improvisations. Norgaard states that sketches can be concerned with any musical feature of the upcoming passage such as “architectural elements like note density, use of various registers on the instrument, or harmonic structure” (Norgaard, 2008, p. 62). Additionally, Norgaard identifies four basic generative strategies: “harmonic priority”, “melodic priority”, “idea bank” and “internal reference”. Compared with the MLA, the first can be mapped partly onto the MLA category line, but with an emphasis on harmonic ideas instead of movement and contour. The second strategy can be related to the MLA categories lick and melody. The third strategy, “idea bank”, is, despite the use of the term “idea”, not connected to action plans in the sense of the MLA. Instead, Norgaard’s “idea bank” is a collection of pre-rehearsed musical formulas (patterns) stored in long-term memory, which musicians, particularly experienced players, always have at their disposal. These patterns are not part of the MLA system, but are assumed to be used to actually perform lines and licks by recalling, linking and adapting shorter patterns. The fourth strategy, “internal reference” (i.e. the repetition and variation of earlier material), is incorporated into the MLA via syntactical rules (see above). Furthermore, this strategy may also refer to the head of the tune, which is represented as the MLA category theme.
Exploratory data analysis
The sample for the following examination consists of a randomly chosen but roughly style-balanced set of 140 solos by 64 performers from the Weimar Jazz Database (Frieler et al., 2013), which was assigned to four annotators at random. All MLUs were collected and cross-checked for syntactical errors by the first author, and labels were adjusted if it seemed necessary in order to enhance consistency. The solos cover the jazz styles traditional, swing, bebop, hardbop, cool jazz, postbop and free jazz and were recorded between 1925 and 1997 (median recording year: 1956). See Table S1 in the Supplemental Material Online for a full list of solos. For each solo, all annotated MLUs were imported into R (R Development Core Team, 2008) for further analysis.
Results
Descriptive statistics
The solos were annotated with 4939 midlevel units in total. The median number of MLUs per solo is 32 (range: 6–163, SD = 23.0), the median number per chorus is 13.75 (range: 2.5–87, SD = 13.64). The distribution of main MLU types can be seen in Table 1 and Figure 2. The most common type is lick with 45.7%, followed by line with 31.5%, melody with 7.3%, expressive with 5.2% and rhythm with 4.7%. All other types together comprise only 5.6% of all MLUs.
Table 1. Frequency and duration of main MLU categories.
Note. RF = Relative frequency, Cum. RF = Cumulative relative frequency, Dur. = Total duration, Rel. Dur. = Relative duration.

Figure 2. Histogram of MLU main types (left) and subtypes (right).
The mean duration in seconds and the note lengths of the ideas can be found in Table 2. The longest MLUs are theme units with M = 3.7 s (SD = 2.2 s), followed by line with M = 2.9 s (SD = 1.8 s). Licks are rather short with a mean duration of 1.8 s (SD = 1.1 s). With respect to note length, lines have the largest average number of notes (M = 19.4), but vary considerably (SD = 13.8). The same holds true for rhythm units (M = 13.8, SD = 12.9). Licks also have smaller numbers of notes (M = 8.3, SD = 5.0). The average duration of all MLUs together is 2.3 s (SD = 1.6 s), with a mean note length of 11.8 (SD = 10.7).
Table 2. Mean duration and note length of main MLU categories.
We also looked at the distribution of subclasses (using the main class for classes without subclasses). The subcategory most often employed is the unspecified lick other (43.1%), followed by line wavy (18.6%), melody (7.3%) and expressive (5.2%). These four subtypes comprise 74.2% of all MLUs (see Table 3 and Figure 2). Moreover, simple descending lines (3.5%) are more common than simple ascending (2.2%), convex (1.0%) and concave (0.4%) lines. Similarly, rhythms with multiple tones are about twice as frequent (3.1%) as single tone rhythms (1.6%). The least common subtype is quote (0.4%); however, some quotations probably went unnoticed by the annotators. With respect to duration, line wavy has a share of 29.1% of the total MLU duration (not counting gaps in between MLUs), which is nearly as much as the more frequent, but much shorter licks, which occupy about one third (34.4%) of the total MLU duration.
Table 3. Frequency and duration of MLU subcategories.
Note. RF = Relative Frequency, Cum. RF = Cumulative relative frequency, Dur. = Total duration of subtype, Rel. Dur. = Relative duration with respect to total duration of all MLUs.
Looking at the distribution of motivic relationship between MLUs, 25.1% are derived MLUs, that is, a quarter of all units is based on earlier material. The types of derivation are distributed as follows: 70.8% are unspecified variations, 0.3% exact repetitions, 16.7% lower transpositions and 12.2% higher transpositions. Furthermore, rhythms (39.9%) and licks (38.2%) are much more likely to be derived than lines (19.3%) or melodies (23.3%).
The encoded back references allow an investigation of “associative chains” (Jost, 1974), which are strings of MLUs where each unit is related to the immediately preceding one. A histogram of MLU chain lengths can be found in Figure 3. Isolated, long-range relationships and modifications of external sources such as theme references and quotes are not considered here. The mean chain length is 2.8 (SD = 1.4) units. The maximum value is 15, which occurs in John Coltrane’s 8-minute modal solo on “Impressions”, recorded in 1963. MLU chains are generally rather short, lasting about two or three units, and 15.7% of all derived MLUs are part of such a chain; the remaining 11.7% therefore represent long-range relationships; 11.6% of all derived MLUs have a back reference to a unit exactly 2 units before, and only 8.5% have a reference range of more than two units. But again, the annotators probably did not catch every long-range relationship between MLUs, so this number might actually be an underestimation.

Figure 3. Histogram of MLU chain lengths. An MLU chain is defined as a consecutive string of MLUs where each MLU is related to the immediately preceding one.
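The chain definition can be made concrete with a short Python sketch. It assumes, for illustration only, that each unit carries its back-reference distance (or None for non-derived units) and that the length of a chain counts the originating unit together with all units derived from their immediate predecessor; the source does not state the counting convention explicitly.

```python
def chain_lengths(back_refs):
    """Lengths of all chains in a sequence of back-reference distances.

    `back_refs[i]` is the back-reference distance of unit i, or None if
    the unit is not derived (hypothetical representation). A chain is a
    maximal run of units with distance 1, plus the unit that starts it.
    """
    lengths = []
    run = 0  # number of consecutive units referring to their predecessor
    for ref in back_refs:
        if ref == 1:
            run += 1
        else:
            if run:
                lengths.append(run + 1)
            run = 0
    if run:
        lengths.append(run + 1)
    return lengths


# Units 1 and 2 continue unit 0; unit 4 refers two units back (long-range,
# not part of a chain); unit 6 continues unit 5.
print(chain_lengths([None, 1, 1, None, 2, None, 1]))  # [3, 2]
```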
Differences with respect to style
Further insights were provided by examining the distributions of main MLU types with respect to style. We excluded Ornette Coleman, because he is the only performer in our data set categorised as free jazz. The remaining styles are thus traditional, swing, bebop, hardbop, cool and postbop. We also used three combined style groups, pre-bop (= traditional, swing), bop (= bebop, hardbop, cool) and postbop. Additionally, the recording year of the solo serves as a handy proxy for consecutive jazz styles.
A boxplot of relative frequencies of categories can be seen in Figure 4. Upon visual inspection, the distributions for different jazz styles look rather similar, since licks and lines are prevalent in each. The most striking difference is the strong dominance of licks and the lesser use of lines in solos of the traditional styles. Generally, the frequency of licks decreases with later recording dates, while lines become more frequent (bootstrap Spearman correlation between recording year and frequency of the category, based on 50 simulations (see below); for lick: pmed = .001, ρmed = −.325; for line: pmed = .02, ρmed = .292).

Boxplot of MLU main categories with respect to style (traditional, swing, bebop, cool jazz, hardbop, postbop).
Unfortunately, our data does not permit the application of straightforward significance tests because the relative frequencies of main MLU types are far from being normally distributed and for most performers more than one solo is present. Hence, we resorted to a bootstrap sampling procedure as follows. For each MLU main category, 20 simulations were run, in each of which exactly one solo per performer was drawn at random from the full data set. For these subsets, Kruskal-Wallis tests with relative frequency of the main category as dependent variable and style as independent variable were conducted, and p-values were collected. Cohen’s d values were calculated for each style combination (using all available data to avoid pooling issues). Only comparisons with a strong effect (|d| > 0.8) and a median bootstrap p-value below .05 were retained. Results can be found in Table 4.
Differences between main MLU categories with respect to style.
Note. Meta = Meta variable; p = p-value of a Kruskal-Wallis test over all main types; pmed = median p-value for bootstrap Kruskal-Wallis tests with samples of one solo per performer (N = 63) over 20 iterations; Values 1 and 2 = distinct levels of the tested meta variable for which Cohen’s d is shown in the last column. All entries were selected for pmed < .05 and |d12| > .8. Levels of style: TRADITIONAL, SWING, BEBOP, COOL, HARDBOP, POSTBOP (df = 5). Levels of style groups: PRE-BOP (= TRADITIONAL, SWING), BOP (= BEBOP, HARDBOP, COOL), POSTBOP (df = 2).
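The resampling procedure described above (one randomly drawn solo per performer, a Kruskal-Wallis test per draw, the median p-value over all draws, and Cohen's d on the full data) can be sketched as follows. This is a minimal standard-library illustration, not the authors' R implementation: the function names, the tuple layout of the input and the use of the chi-square approximation for the p-value are assumptions.

```python
import math
import random
from collections import defaultdict
from statistics import mean, median, variance


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (incomplete-gamma series)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def kruskal_wallis_p(groups):
    """p-value of a tie-corrected Kruskal-Wallis H test."""
    groups = [g for g in groups if g]
    if len(groups) < 2:
        return 1.0
    pooled = sorted((value, idx) for idx, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # all observations identical
        return 1.0
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = (h - 3 * (n + 1)) / correction
    return chi2_sf(h, len(groups) - 1)


def cohens_d(x, y):
    """Cohen's d with pooled standard deviation (needs >= 2 values per group)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def bootstrap_median_p(solos, n_iter=20, seed=0):
    """solos: iterable of (performer, style, rel_freq) tuples (assumed layout)."""
    rng = random.Random(seed)
    by_performer = defaultdict(list)
    for performer, style, freq in solos:
        by_performer[performer].append((style, freq))
    p_values = []
    for _ in range(n_iter):
        by_style = defaultdict(list)
        for candidates in by_performer.values():
            style, freq = rng.choice(candidates)  # exactly one solo per performer
            by_style[style].append(freq)
        p_values.append(kruskal_wallis_p(list(by_style.values())))
    return median(p_values)
```

Drawing one solo per performer keeps the observations within each draw independent, which is the reason the text gives for not testing the pooled data directly; the median over draws then summarises how stable the test result is across such subsamples.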
Most evident is the less frequent use of lines in the traditional style, which is complemented by a much more frequent use of licks. All effects are very large (mostly |d| > 1). The only further differences are the higher frequency of voids in bebop and cool jazz as compared to swing and the higher frequency of fragments in hardbop as compared to traditional, swing and cool jazz. Though this could hint at a greater expressivity in bop styles, this might also be a spurious result, since voids and fragments occur very rarely in general.
As for style groups, the type expressive is more frequent in postbop than in bop or pre-bop, but Cohen’s d is only −0.65, so it does not appear in the table. Moreover, licks are more frequent in pre-bop than in postbop (and in bop with d = 0.65) and, correspondingly, lines are more frequent in bop and postbop styles. Bop and postbop do not differ with respect to the usage of lines and licks. It seems that with the introduction of the long lines typical of bebop, a fundamental change in improvisation took place. However, lines already started to grow more popular in the swing style, so that it must be seen as a gradual transition, which reached its peak in bebop and has not been reversed since then.
Information entropy is a metric for measuring information content in a probability distribution. The more uniform the distribution, the less predictable it is; that is, the higher the information content. Entropy can be used to measure the variety of MLUs in a given solo. We will restrict ourselves to the entropy of the nine main categories. Very robust differences could be found for single jazz styles and style groups. The rank correlation between main type entropy and recording year is positive and highly significant (bootstrap Spearman correlation, based on 50 simulations, of recording year with entropy of main types: pmed < .001, ρmed = .449). Therefore, the later the year of recording, the higher the variety of employed MLUs, as shown in Figure 5. Jazz solos became increasingly diverse in their musical means. Note that the MLU distributions are still dominated by licks and lines. The median entropy is 1.55 bits, which means that on average one or two yes/no questions are already sufficient to identify the main type of an MLU.

Scatterplot of the entropy of main categories vs. recording year.
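As an illustration of the entropy measure used above, the following minimal sketch computes the Shannon entropy in bits of the main-category labels of one solo. The function name and the toy label sequence are assumptions for illustration. For reference, a uniform distribution over all nine main categories would give log2(9) ≈ 3.17 bits, the maximum possible value.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Toy solo with eight MLUs: relative frequencies 1/2, 1/4, 1/8, 1/8.
toy_solo = ["lick"] * 4 + ["line"] * 2 + ["melody", "rhythm"]
print(entropy_bits(toy_solo))  # 0.5*1 + 0.25*2 + 2*0.125*3 = 1.75 bits
```

A solo consisting of a single category has entropy 0, so higher values indicate a more even use of several MLU types.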
Interestingly, we could not find any robust differences in the proportion of derived ideas with respect to style (bootstrap Kruskal-Wallis test with 20 simulations: median p = .662). According to Schuller’s (1958) claims on motivic improvisation, one would have expected an increase in motivic relations with time. However, this is not the case. Instead, it seems that the amount of internal relationships between MLUs remains fairly constant across all styles and performers, since the mean value is 25.1% with a standard deviation of 15.3%. Half of all solos have a proportion of derived MLUs between 13.8% and 34.6%.
Likewise, no robust differences could be found for the “glue rate” (bootstrap Kruskal-Wallis test with 20 simulations: median p = .422), that is, the number of ideas that do not start at the beginning of a phrase, though a slight tendency towards a lower glue rate in traditional style and higher glue rate in bebop could be observed (omnibus Kruskal-Wallis test χ2(5) = 15.02, p = .010).
Differences between performers
We briefly examined some differences between performers using multi-dimensional scaling (MDS), which is a dimension reduction procedure for distance data. We calculated the Euclidean distances between the pooled vectors of relative frequencies of the main MLU categories for each pair of performers, and applied a metrical MDS. A two-dimensional solution is depicted in Figure 6. Note that some players were removed to achieve a better display: Stan Getz (2 solos), Sonny Stitt (2 solos), Chu Berry (1 solo), Charlie Shavers (1 solo), Warne Marsh (1 solo), Wynton Marsalis (1 solo) and David Murray (1 solo). The goodness-of-fit in both dimensions is very high, at .86. The x-axis is strongly positively correlated with the relative frequency of line (Spearman’s ρ(127) = .75, p < .001), weakly positively with expressive (ρ(127) = .37, p = .001) and void (ρ(127) = .28, p = .001) and strongly negatively with lick (ρ(127) = −.75, p < .001). The y-axis is weakly positively correlated with line (ρ(127) = .32, p < .001) and weakly negatively with melody (ρ(127) = −.34, p < .001), rhythm (ρ(127) = −.33, p < .001), and expressive (ρ(127) = −.18, p = .05).

Two-dimensional metrical MDS solution using Euclidean distances of aggregated relative frequency vectors of main MLU categories according to six jazz styles. Goodness-of-fit for both dimensions is .86. Some musicians (Warne Marsh, Chu Berry, Wynton Marsalis, David Murray, Sonny Stitt and Stan Getz) are left out for display reasons. Performers were mapped onto their main style if more than one style was available.
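To illustrate the kind of computation behind such a map, here is a minimal sketch of classical (Torgerson) MDS on Euclidean distances between relative-frequency vectors, written with the standard library only. Whether this exact variant was used for the figure is an assumption; the function name, the power-iteration eigen-solver and the goodness-of-fit definition (share of the trace of the double-centred matrix captured by the retained eigenvalues) are likewise illustrative choices.

```python
import math


def classical_mds(vectors, dims=2, iters=200):
    """Embed points so that Euclidean distances are preserved as well as possible.

    Returns (coords, gof), where gof is the share of the trace of the
    double-centred matrix captured by the retained eigenvalues.
    """
    n = len(vectors)
    # Squared Euclidean distances between all pairs of frequency vectors.
    d2 = [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in vectors] for u in vectors]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    trace = sum(b[i][i] for i in range(n))
    coords = [[0.0] * dims for _ in range(n)]
    eigenvalues = []
    for k in range(dims):
        vec = [math.sin(i + k + 1.0) for i in range(n)]  # fixed start vector
        lam = 0.0
        for _ in range(iters):  # power iteration for the leading eigenpair
            w = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                lam = 0.0
                break
            vec = [x / norm for x in w]
            lam = norm
        eigenvalues.append(lam)
        for i in range(n):
            coords[i][k] = vec[i] * math.sqrt(lam)
        # Deflation removes the axis just found before searching the next one.
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    gof = sum(eigenvalues) / trace if trace > 0 else 0.0
    return coords, gof


# Toy performers described by (line, lick, other) shares.
profiles = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.4, 0.4, 0.2), (0.3, 0.3, 0.4)]
coords, gof = classical_mds(profiles)
print(coords, gof)
```

The orientation and sign of the axes are arbitrary in any MDS solution, so only the relative positions of the points are meaningful. In the toy example the three shares sum to one, so the points lie in a plane and two dimensions reproduce the distances essentially exactly; with nine categories a two-dimensional solution is only an approximation, which is what a goodness-of-fit below 1 expresses.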
Performers playing in traditional styles are the most distinct from all other performers, particularly bebop and hardbop musicians, but notably also from swing, the succeeding style. Interestingly, the convex hulls of bebop, hardbop and postbop are fully nested within each other, which agrees with the standard jazz history readings that these styles were continuous extensions of each other. Compared to all other styles, bebop and hardbop occupy a very small area. Since postbop takes up the most space and encompasses nearly all other styles except for traditional jazz, postbop players seem to use a rather wide variety of ideas. Swing and cool jazz appear to form a series with the traditional style by a counter-clockwise rotation due to the increasing use of lines. Swing also has a significant overlap with bop styles as well as with cool jazz, which in turn shares the upper left-hand corner with bebop and hardbop.
To inspect these differences a little further, we devised the lick–line–other continuum, which can be visualised compactly using a triangular plot, where the heights on the sides of an equilateral triangle correspond to the relative frequency of lines, licks and others. Other is the summed relative frequency of all remaining main MLU types.
As a more specific example, we compared the solos by Miles Davis and John Coltrane, as depicted in Figure 7. Lines parallel to the side opposite a corner are lines of constant value for the category labelled at that corner. The closer a point is to a corner, the greater the frequency of the class labelled at this corner. For instance, Miles Davis’ solo on “Vierd Blues” is located on the line connecting the lick and other corners, which means that it employs mostly licks (~75%) and to a lesser degree other MLUs (~25%), but no lines at all. On the opposite side, John Coltrane’s solo on “Countdown” can be found, which consists almost exclusively of lines (~90%) and other MLUs (~10%), but no licks.

Ternary plot of relative frequencies of line, lick and other main MLU categories for all solos of Miles Davis (MiDa) and John Coltrane (JoCo). Size and transparency of the circles are proportional to the average entropy of the distribution of main MLU categories.
Solos in the centre of the plot make use of all three possibilities in roughly equal proportions. Moreover, the size and transparency of the circles are directly proportional to the average entropy of the performer with respect to main category. This gives a further indication of the variety of other MLUs used.
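The mapping from a (line, lick, other) triple to a position in such a triangular plot can be sketched as a barycentric-to-Cartesian conversion. The corner layout chosen here (lick at the lower left, line at the lower right, other at the top) and the function name are assumptions for illustration; the published figure may arrange the corners differently.

```python
import math


def ternary_xy(line, lick, other):
    """Map three non-negative shares to a point in a unit-side equilateral triangle.

    Assumed corner layout: lick at (0, 0), line at (1, 0),
    other at (0.5, sqrt(3)/2).
    """
    total = line + lick + other
    if total <= 0:
        raise ValueError("at least one share must be positive")
    line, other = line / total, other / total
    # Weighted average of the corner positions; lick sits at the origin
    # and therefore contributes nothing to either coordinate.
    x = line + 0.5 * other
    y = other * math.sqrt(3) / 2
    return x, y


# Approximate shares read off the text: a solo with ~75% licks, ~25% other
# and no lines lies on the edge joining the lick and other corners.
print(ternary_xy(0.0, 0.75, 0.25))
# A solo with ~90% lines, ~10% other and no licks lies on the line-other edge.
print(ternary_xy(0.9, 0.0, 0.1))
```

A solo using the three classes in equal proportions maps to the centroid of the triangle, which matches the description of the solos found in the centre of the plot.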
The hardbop solos of Davis and Coltrane (until 1959) are quite distinct, but then their respective stylistic developments converged, in particular for the modal pieces, which John Coltrane extensively used in his later phase (1959–1965). Accordingly, the solos of both players in their collaboration on “So What” can be found rather close together in the upper centre of the triangle.
The graph also reveals that considerable variation can exist between different solos of a performer (particularly from different periods). In order to reliably assess the existence of personal styles on the level of MLU usage, a much larger number of solos per performer would be needed.
Discussion
In this paper we presented a novel method of analysis for monophonic jazz solos, called midlevel analysis (MLA). MLA focuses on a meso-level of creative decision-making in jazz improvisation which (hypothetically) corresponds to action plans within the psychological present and results in the generation of distinctive midlevel units (MLUs). The corresponding annotation system for MLUs was developed in a data-driven process of categorisation guided by the methods and approaches of Grounded Theory (Glaser & Strauss, 1967). A codebook was written and inter-coder reliability was estimated to be very high (.83) for segment boundaries and sufficiently good (.60) for main category labels. The ensuing annotation of 140 solos provided novel insights into the improvisation process of players of several jazz styles. Not surprisingly, licks and lines are the most common MLUs. The former are more frequent, while the latter take up more of the total time of a solo, due to their longer duration. This observation basically holds for most of the solos in the study, but with considerable variation among solos, performers and styles. Nonetheless, jazz and personal styles can be partly differentiated using the basic data of MLU frequencies alone.
The fact that MLU durations centre neatly around the well-established time window of the subjective present of 2–3 seconds (Fraisse, 1982; cf. also Lehmann & Goldhahn, in press, for a recent application regarding the psychological present in jazz improvisation) can, in our view, be taken as indirect evidence that the underlying concept of action plans is viable and might capture some “true” elements of the underlying psychological processes.
Notably, motivic improvisation is a rather constant trait of jazz improvisation, and not an invention of later styles, as Schuller (1958) suggested. Furthermore, traditional players mostly used licks, while lines evolved in swing and fully blossomed with bebop and later styles, sometimes clearly dominating all other MLUs (e.g. in improvisations by players of the Tristano School, such as Lee Konitz and Warne Marsh). With the advent of postbop, the arsenal and usage of MLUs became more diverse. Modern players such as Bob Berg, David Liebman or Joe Henderson display considerably larger entropy, that is, greater unpredictability, in their MLU usage. Generally, complexity and expressivity are heightened in postbop styles, as can also be seen on other levels of description (Frieler, Pfleiderer, Abeßer & Zaddach, in press).
Conclusions and outlook
In this first exploration of the midlevel annotations of our dataset, only a few basic analyses were employed. One obvious next step would be a more fine-grained examination with the additional aid of melodic feature extraction. This would allow for an investigation of the actual differences between different MLU types as well as of the grey area between licks, lines and melodies. Moreover, finer details of line construction (especially wavy lines) with respect to patterns, harmonic relationship, scales, contour, micro-timing, accents and other features could be examined. A similar analysis could be applied to licks and other MLU types.
As demonstrated in this study, the concept of MLA suggests new approaches to the historiography of jazz styles and personal styles. In addition, MLA could be applied to the analysis of further aspects of jazz improvisation, for instance of dramaturgical and narrative aspects (“telling a story”), or in case studies of single solos as illustrated in our example (Figure 1). Since midlevel annotations form sequences of symbols, MLA could be expanded by the study of Markov and other sequential properties of MLU chains. For instance, different MLUs have different occurrence probabilities during the course of a solo: more expressive MLUs such as expressive and rhythm tend to occur more often towards the end of a solo, whereas voids and licks can be found more frequently at the beginning (Frieler et al., 2015).
We are considering the development of a fully automated midlevel annotation system, for which the already annotated solos can serve as a ground truth to build upon. However, some of the categories, notably theme and quote, as well as the coding of motivic relationships and the concept of glued ideas, will pose computational challenges, due to the similarity computations involved. But even a partial computational solution might already facilitate and speed up the annotation process in a semi-automated fashion.
Last but not least, midlevel annotations seem to be very suitable for enhancing and communicating an understanding of the structural composition of jazz solos, in particular for non-jazz listeners. This might offer educational potential. Moreover, the concepts of the Ideational Flow Model and the MLA might be fruitful for the training of jazz practitioners.
Supplemental Material
Supplemental material, sj-zip-1-msx-10.1177_1029864916636440 for Midlevel analysis of monophonic jazz solos: A new approach to the study of improvisation by Klaus Frieler, Martin Pfleiderer, Wolf-Georg Zaddach and Jakob Abeßer in Musicae Scientiae
Footnotes
Acknowledgements
The authors would like to thank all jazz and musicology students who transcribed the improvisations, in particular the three annotators Friederike Bartel, Benjamin Burkhardt, and Martin Breternitz. The authors are grateful to the two reviewers for their helpful comments.
Funding
The JAZZOMAT RESEARCH PROJECT is supported by a grant provided by the German Research Foundation (“Melodisch-rhythmische Gestaltung von Jazzimprovisationen. Rechnerbasierte Musikanalyse einstimmiger Jazzsoli”, DFG-PF 669/7-1).
Supplementary material
Supplemental online material is available from http://msx.sagepub.com/content/by/supplementaldata, raw data and R scripts from https://osf.io/wumcd/.
