Abstract
In the first half of the 20th century, Carl Seashore and colleagues undertook extensive analyses of performances on a variety of instruments. Their data were embodied in so-called performance scores, which survive as illustrations of the laborious work undertaken by the early pioneers of music performance research. In their original form, however, these scores allow today’s researchers to examine the experimental data only visually. This article describes the use of image-processing methods to accurately convert the visual data in Seashore’s performance scores into data points. This use of technology lets researchers engage directly with the data collected in Seashore’s laboratory, either to validate the reported results or to perform analyses for which researchers in Seashore’s time lacked the computational facilities or algorithms. This paper also presents a proof-of-concept study of the vocal performance scores in Harold Seashore’s “An Objective Analysis of Artistic Singing” (1936). Specifically, this study analyzes the extracted fundamental frequency data with respect to vibrato rate and extent. Discrepancies between the statistics calculated on the digitized data and those reported in H. Seashore’s publication are discussed, as are other types of analysis that may be performed on the extracted data.
The researchers in Carl Seashore’s laboratory at the University of Iowa in the 1920s and 1930s undertook extensive studies of musical performance. Piano performances were studied both from piano rolls and from films of the movement of the hammers during the performance. The team also undertook numerous studies of singing and violin performance using phonophotographic apparatus, generating analyses of timing, dynamics, intonation, and vibrato (C. Seashore, 1938). They reproduced many of these analyses visually in their publications as “scientific musical pattern scores” or performance scores (Metfessel, 1926; C. Seashore, 1927).
The performance scores for piano performances display timing and intensity information, which was transcribed from films recorded with a camera that captured both hammer and pedal action. Because the tones in singing and violin performance vary over time, the performance scores for these instruments are more complex than those for the piano, displaying both continuous fundamental frequency (F0) and intensity information as they evolve over the course of a performance. The violin and vocal performances were transcribed from the output of phonophotographic devices. Intensity information was captured with a voltmeter (Tiffin, 1932), and timing information was assessed as a function of linear distance on the phonophotographs. F0 estimates were made with oscillograph tools, referred to as a stroboscope (Metfessel, 1929) or a tonoscope (C. Seashore, 1929), and timbral analysis was performed with harmonic analyzers (Miller, 1916). Oscillograph tools perform a time-domain analysis of the input signal and output an F0 trace as a function of time. Harmonic analyzers, in contrast, perform a frequency-domain, or Fourier, analysis and, for musical audio, output a graphical representation of the amplitude and phase of the constituent partials of the complex tones.
The performance scores are rich in expressive performance parameters, and digitizing their data gives contemporary researchers the opportunity to use the experimental data from Carl Seashore’s laboratory both to validate the reported calculations and to perform more in-depth analyses than the researchers of that time could, such as statistically modeling variations between performances or examining the role of musical context in performance practice (as discussed in Repp, 1992). The digitized data would also allow empirical comparisons between contemporary and historical performance practices. Indeed, Seashore himself noted that “there is rich raw material to work upon in the performance scores” (C. Seashore, 1938), a statement that served as the inspiration for this project.
This paper is part of a large-scale project whose goal is to digitize these scores and make the data available to contemporary researchers. It presents the first steps in this project by describing the use of optical plot recognition technology to accurately digitize the performance score data and by presenting an example of the type of study that can be undertaken with the recovered data. Specifically, a vibrato analysis of the performance scores in Harold Seashore’s chapter “An Objective Analysis of Artistic Singing” in Objective Analysis of Musical Performance, edited by Carl Seashore (1936), is reported as a proof of concept. In that chapter, H. Seashore analyzed the performance data for 10 vocal performances, seven of which were reproduced as performance scores. The F0 data in these seven scores have been digitized and analyzed with respect to vibrato rate and extent. The vibrato analysis explores how accurately the summary statistics for vibrato rate and extent reported in the chapter can be reproduced from the performance scores.
Method
Performance scores
H. Seashore reported that, in moving from the phonophotographic output to the published performance scores (see Figure 1), the F0 information was quantized to 20 cents (one fifth of a semitone) and intensity, used as an approximate measure of loudness, was graphed in decibels (0 dB represented the level of the quietest half-second segment of that singer’s recording). F0 and intensity were both plotted on the y-axis as a function of time, which was represented on the x-axis. Each plot reproduced the frequency and intensity analysis for 10 seconds of music, with each second marked by a solid vertical line and each tenth of a second marked by dashes and dots, which alternate in horizontal rows to increase readability. The vertical space between adjacent horizontal rows represents both one semitone and a specified number of decibels. Both frequency and intensity curves are plotted on the graph, with the frequency curve above the intensity curve. Notes and rests from the musical score are marked at approximately the times at which they occur in the recording and, in the case of singing performances, lyrics are annotated in line with the notes from the score. The systematic and consistent design and implementation of these performance scores allow the data to be recaptured using optical plot recognition software.

Example of a performance score from Gounod’s “Ave Maria”, sung by Herald Stark (from H. Seashore, 1936, p. 17).
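The two conversions described above, quantizing F0 to 20-cent steps and expressing intensity in decibels relative to the quietest half-second segment, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the laboratory’s procedure: the 440 Hz cents reference, the use of an RMS amplitude envelope as input, and the division into non-overlapping half-second segments are all hypothetical choices.

```python
import numpy as np


def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert F0 values in Hz to cents relative to ref_hz (the reference is an assumption)."""
    return 1200.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)


def quantize(values, step):
    """Round values to the nearest multiple of step, e.g. step=20.0 for 20-cent bins."""
    return step * np.round(np.asarray(values, dtype=float) / step)


def relative_db(rms, fs, window_s=0.5):
    """Express an RMS amplitude envelope (sampled at fs Hz) in dB relative to the
    quietest non-overlapping window_s-long segment, which therefore sits at 0 dB."""
    rms = np.asarray(rms, dtype=float)
    win = int(round(window_s * fs))
    n_seg = len(rms) // win
    # Mean power of each complete half-second segment (trailing partial segment ignored)
    seg_power = (rms[: n_seg * win] ** 2).reshape(n_seg, win).mean(axis=1)
    return 10.0 * np.log10(rms ** 2 / seg_power.min())
```

For example, 466.16 Hz lies almost exactly one semitone above 440 Hz, so `quantize(hz_to_cents(466.16), 20.0)` lands on the 100-cent step.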
For this study, the published performance scores were digitized using Engauge Digitizer, a piece of software designed to extract data points and curves from printed graphs. Separate analyses were required to capture the frequency and intensity curves and the note onset/offset locations. Once the axis anchor points are set, Engauge Digitizer automatically calculates the positions of points in axis units. It also provides functionality for auto-filling detected curves and allows missed data points to be added manually. It displays its estimates of curve locations (see Figure 2 for an example) at different zoom levels and in different views of the image (original and binarized) to facilitate manual labeling. The combination of automatic and manual data point annotation resulted in uneven sampling of the curves along the x-axis (time). To correct for this, the curve data were later resampled at 10 ms intervals using a cubic interpolation function in MATLAB. Notes were segmented manually, following H. Seashore’s description of how he identified note onsets and offsets.

Zoomed-in digitization of the frequency curve in the performance score example from Figure 1 using the software Engauge Digitizer. The closely spaced plus signs overlying the fundamental frequency
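The axis calibration and the 10 ms resampling step can be sketched as follows. This is a hypothetical Python illustration, not Engauge Digitizer’s or the author’s code. `pixel_to_axis` assumes a linear axis defined by two anchor points (on the scores the frequency axis is linear in semitones). The paper does not name the MATLAB function it used; SciPy’s `PchipInterpolator` is substituted on the assumption that the “cubic” method was the shape-preserving piecewise cubic interpolation that MATLAB’s `interp1` applies for its `'cubic'` option.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def pixel_to_axis(p, p0, p1, v0, v1):
    """Map pixel coordinate(s) p to axis units on a linear axis calibrated by
    two anchor points: pixel p0 -> value v0 and pixel p1 -> value v1."""
    p = np.asarray(p, dtype=float)
    return v0 + (p - p0) * (v1 - v0) / (p1 - p0)


def resample_curve(t_s, y, step_s=0.01):
    """Resample an unevenly sampled digitized curve onto a uniform time grid
    (default 10 ms) with shape-preserving piecewise cubic (PCHIP) interpolation."""
    t_s = np.asarray(t_s, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.unique sorts the times and drops duplicates, which PCHIP cannot handle
    t_u, idx = np.unique(t_s, return_index=True)
    y_u = y[idx]
    # Uniform grid from the first to the last digitized time point
    n = int(np.floor((t_u[-1] - t_u[0]) / step_s + 1e-9)) + 1
    grid = t_u[0] + step_s * np.arange(n)
    return grid, PchipInterpolator(t_u, y_u)(grid)
```

Because manually added points can arrive out of order or repeat an x position, the sketch sorts and de-duplicates before interpolating; PCHIP is chosen over an unconstrained cubic spline because it does not overshoot between points.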
Vibrato analysis
Vibrato rate is defined as the number of vibrato cycles per second, and vibrato extent as the distance from peak to trough, measured in cents. As noted above, F0 values were quantized to the nearest 20 cents. The vibrato rate was likewise quantized to units of 0.5 Hz, as described in H. Seashore’s text, although, according to the text, the mean values were reported to the nearest 0.1 Hz. The digitized data were quantized accordingly to facilitate comparison of the computational results with the summary statistics provided in the publication. The main complication in this comparison is determining which notes and/or vibrato cycles to include in the analysis. H. Seashore states only the total number of cycles included in his analysis and does not provide any exclusion criteria. For this study, only notes with at least two vibrato cycles and an absolute slope of less than one semitone (calculated using the technique described in Devaney, Mandel, & Fujinaga, 2011) were included, in order to remove short notes that are better characterized as glides than as discrete pitches.
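The inclusion criteria and the 0.5 Hz rate quantization can be expressed as a short filter. The sketch below is hypothetical. The paper computes slope with the technique of Devaney, Mandel, & Fujinaga (2011), which is not reproduced here; a least-squares linear trend over the note, expressed as the total change in semitones across its duration, is swapped in as a stand-in. The cycle count is approximated as rate × duration, another assumption.

```python
import numpy as np


def quantize_rate(rate_hz, step_hz=0.5):
    """Round vibrato rates to the nearest step_hz (0.5 Hz, as in H. Seashore's text)."""
    return step_hz * np.round(np.asarray(rate_hz, dtype=float) / step_hz)


def keep_note(f0_cents, fs, rate_hz, min_cycles=2.0, max_change_semitones=1.0):
    """Hypothetical inclusion test for one segmented note.

    cycles: approximated as rate_hz * note duration (an assumption)
    slope:  least-squares linear trend, expressed as the total change in
            semitones across the note (a stand-in for Devaney et al., 2011)
    """
    f0 = np.asarray(f0_cents, dtype=float)
    duration_s = len(f0) / fs
    t = np.arange(len(f0)) / fs
    slope_cents_per_s = np.polyfit(t, f0, 1)[0]
    change_semitones = abs(slope_cents_per_s) * duration_s / 100.0
    n_cycles = rate_hz * duration_s
    return bool(n_cycles >= min_cycles and change_semitones < max_change_semitones)
```

A one-second note with a steady 6 Hz vibrato passes; a three-semitone glide of the same length, or a 0.2 s note (about 1.2 cycles), is rejected.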
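In the computational analysis, the F0 trace of each note is assumed to be a sinusoid, so the vibrato rate and extent can be estimated from the position and magnitude of the maximum of its Fourier transform. Specifically, this is done by applying MATLAB’s implementation of the fast Fourier transform (FFT) to the F0 trace,

$$X(k) = \sum_{m=0}^{M-1} x(m)\, e^{-j 2\pi k m / M}, \qquad k = 0, 1, \ldots, M-1, \tag{1}$$

where $x(m)$ is the F0 trace in cents, normalized so that 0 cents corresponds to its mean, and $M$ is the number of samples in the trace. Because the maximum of the single-sided amplitude spectrum corresponds to half the peak-to-trough extent of the vibrato sinusoid, the extent is taken as twice that maximum,

$$E = 2 \max_{1 \le k \le M/2} \frac{2}{M}\,\lvert X(k) \rvert, \tag{2}$$

and the position of this maximum, $k^{*}$, corresponds to the rate of the vibrato sinusoid,

$$R = \frac{k^{*} f_s}{M}, \qquad k^{*} = \arg\max_{1 \le k \le M/2} \lvert X(k) \rvert. \tag{3}$$

In equation (3), $f_s$ is the sampling rate of the F0 trace (100 Hz after resampling at 10 ms intervals).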
An example of these calculations is shown in Figure 3. The upper plot shows a time-domain representation of a test signal, and the lower plot shows its FFT. In the lower plot, the maximum value (42 cents) corresponds to half the average extent of the vibrato (84 cents), and the position of the maximum corresponds to the rate of the vibrato (6.2 Hz). The extent of 84 cents and the rate of 6.2 Hz can be verified by examining the time-domain representation of the signal in the upper plot.

Demonstration of the FFT-based calculation of vibrato rate and extent. The upper plot shows a fundamental frequency (F0) trace quantized to 20 cents and normalized so that 0 cents corresponds to the mean of the F0 trace; it is plotted in cents against time in milliseconds. The lower plot shows the FFT of the signal, normalized by the signal’s length and sampling rate.
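The FFT-based estimate of rate and extent described above can be sketched with NumPy’s real FFT. This is an illustrative Python version, not the author’s MATLAB code. The function name `vibrato_fft` is hypothetical, the spectrum is scaled by 2/M so that its peak height equals the sinusoid’s amplitude, and no zero-padding is applied. The usage lines reproduce the Figure 3 values with a synthetic, unquantized 6.2 Hz vibrato of 84 cents peak to trough.

```python
import numpy as np


def vibrato_fft(f0_cents, fs):
    """Estimate vibrato rate (Hz) and peak-to-trough extent (cents) from the
    largest peak of the single-sided FFT amplitude spectrum of an F0 trace."""
    x = np.asarray(f0_cents, dtype=float)
    x = x - x.mean()                              # 0 cents = mean of the trace
    m = len(x)
    # Scaling by 2/M makes a sinusoid's peak height equal its amplitude
    # (for bins strictly between DC and Nyquist)
    spectrum = np.abs(np.fft.rfft(x)) * 2.0 / m
    freqs = np.fft.rfftfreq(m, d=1.0 / fs)
    k = 1 + int(np.argmax(spectrum[1:]))          # skip the DC bin
    rate_hz = float(freqs[k])
    extent_cents = float(2.0 * spectrum[k])       # amplitude is half the extent
    return rate_hz, extent_cents


if __name__ == "__main__":
    fs = 100.0                                    # 10 ms sampling
    t = np.arange(500) / fs                       # 5 s, so 6.2 Hz falls on a bin
    rate, extent = vibrato_fft(42.0 * np.sin(2 * np.pi * 6.2 * t), fs)
    print(round(rate, 3), round(extent, 3))       # → 6.2 84.0
```

The bin spacing is fs/M, so on a one-second note sampled every 10 ms the rate estimate can only fall on whole hertz; short notes therefore yield coarse rate values unless the spectrum is interpolated, which this sketch does not do.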
The duration of the notes was taken into account in order to calculate the average vibrato extent and rate over all of the vibrato cycles. To this end, a weighted mean, based on the length of each segmented note, was calculated across the note-wise vibrato extent (2) and rate (3) values for each singer. This can be formally expressed for extent as

$$\bar{E} = \frac{\sum_{n=1}^{N} L_n E_n}{\sum_{n=1}^{N} L_n}, \tag{4}$$

where $N$ is the total number of notes, $n$ indexes a single note, $L_n$ is the length of the note, and $E_n$ is the value calculated in equation (2) for that note. The corresponding equation for rate is

$$\bar{R} = \frac{\sum_{n=1}^{N} L_n R_n}{\sum_{n=1}^{N} L_n}, \tag{5}$$

where $R_n$ is the value calculated in equation (3) for a given note. A weighted standard deviation was calculated in a similar manner. The weighted mean is necessary for comparison with H. Seashore’s data, since only a single average value for rate and extent was reported for each performance.
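The length-weighted mean described above, together with a weighted standard deviation, can be sketched as follows. The paper does not give its weighted standard deviation formula; the population form used here (weighted squared deviations divided by the sum of the weights) is an assumption, and `weighted_mean_std` is a hypothetical name.

```python
import numpy as np


def weighted_mean_std(values, lengths):
    """Length-weighted mean and (population) standard deviation of note-wise values.

    values:  per-note vibrato rate or extent
    lengths: per-note durations used as weights
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(lengths, dtype=float)
    mean = np.sum(w * v) / np.sum(w)
    var = np.sum(w * (v - mean) ** 2) / np.sum(w)
    return mean, np.sqrt(var)
```

For two notes with rates of 6 Hz and 7 Hz lasting 1 s and 3 s, the weighted mean is (6 + 21) / 4 = 6.75 Hz, closer to the longer note than the unweighted 6.5 Hz.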
Results
The results of the calculations, along with the values reported by H. Seashore, are shown in Table 1 for vibrato rate and in Table 2 for vibrato extent. The values reported by H. Seashore were calculated from vibrato rate and extent measurements made directly on the phonophotographic films, so discrepancies in the analysis of the digitized data could reflect issues either in the transfer from the phonophotographic films to the performance scores or in the calculation of vibrato rate and extent from the digitized data. To validate the vibrato rate calculations in equation (3), a manual count of the number of cycles per second in the performance score for one of the singers (Stark, the first singer reported in Table 1) was compared with the results of equation (3). The FFT rate values reported in Table 1 were comparable to the manual counts, so the difference between the FFT calculations and H. Seashore’s values is likely due to differences in the cycles considered. H. Seashore analyzed fewer cycles and likely excluded the more extreme values, which is consistent with the standard deviations being larger, on average, for the FFT-based rate calculations.
Comparison of summary statistics of vibrato rate (Hz).
Note. The means and standard deviations reported by H. Seashore were taken across entire performances. For the computational approach, weighted means and standard deviations, based on the length of each note, were taken across the note-wise calculations for each performance to make them comparable to the Seashore data.
Summary statistics of vibrato extent (in cents).
Note. The means and standard deviations reported by H. Seashore were taken across entire performances. For the computational approaches, weighted means and standard deviations, based on the length of each note, were taken across the note-wise calculations for each performance to make them comparable to the Seashore data.
Accurate manual counting of vibrato rate was possible because the performance scores mark every tenth of a second. For frequency, however, only semitones are indicated on the scores, which makes accurate visual estimation of vibrato extent impossible. To validate the FFT-based vibrato extent calculations, cycle-by-cycle extent calculations were therefore made. The averages of these calculations are shown alongside the FFT calculations in Table 2. Overall, the cycle-by-cycle calculations are closer to the results reported by H. Seashore than those obtained using the FFT approach. This does not, however, invalidate the FFT approach, since the discrepancy between the FFT and cycle-by-cycle calculations is less an issue of accuracy than of what the two approaches measure. The maximum of the FFT magnitude spectrum identifies the sinusoid that best fits the most prominent periodic component of the F0 trace, whereas the cycle-by-cycle approach treats each cycle as equally important and does not provide an overall assessment of the general trend of the vibrato.
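The paper does not specify how the cycle-by-cycle extents were measured. The hypothetical sketch below takes each local maximum of the F0 trace and the local minimum that immediately follows it. It first collapses the flat runs that 20-cent quantization produces, so that plateaus on a rising or falling stretch are not mistaken for turning points. Unlike the FFT estimate, every detected cycle contributes equally to the mean of the returned values.

```python
import numpy as np


def cycle_extents(f0_cents):
    """Hypothetical cycle-by-cycle extents (cents): each local maximum of the
    F0 trace minus the local minimum that immediately follows it."""
    x = np.asarray(f0_cents, dtype=float)
    # Collapse flat runs (common after 20-cent quantization) to single samples
    xs = x[np.r_[True, np.diff(x) != 0]]
    direction = np.sign(np.diff(xs))              # +1 rising, -1 falling, never 0
    # Turning points: samples where the direction of motion changes
    turns = np.where(direction[:-1] != direction[1:])[0] + 1
    values = xs[turns]
    is_peak = direction[turns - 1] > 0            # rising into the point -> peak
    return np.array([values[i] - values[i + 1]
                     for i in range(len(turns) - 1) if is_peak[i]])
```

On one second of a 5 Hz vibrato with a 40-cent amplitude, quantized to 20 cents and sampled every 10 ms, this returns five extents of 80 cents each.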
Discussion
While vibrato rate can be calculated by counting and vibrato extent by cycle-by-cycle measurement, other pitch-related parameters require more complex calculations. One example is perceived pitch. H. Seashore drew on the results of previous studies in the Seashore laboratory (Metfessel, 1926; H. Seashore, 1932; Tiffin, 1931), which suggested that the mean of the steady-state portion of a note corresponds to its perceived pitch. He determined the mean pitch by visually tracing a line through each note and then compared these mean-pitch lines with the “correct pitch” implied by equal-tempered tuning. He considered a note to be at the correct pitch if its vibrato cycles were regularly spaced and undulated equally above and below the correct pitch. He also observed whether the line trended upward or downward. Unlike vibrato rate and extent, which could be succinctly calculated and compared, the mean-pitch data were not quantified in detail: H. Seashore did not measure, for example, the regularity of the vibrato around the correct pitch or the precise rise or fall of the line’s slope. Rather, he presented the minimum and maximum deviations of the mean pitch from the correct pitch, along with general statistics on whether the sung tones sloped up or down. Going forward, one of the next tasks of this project is to perform a detailed analysis of the F0 data with respect to recent psychoacoustic models of perceived pitch, such as Gockel, Moore, and Carlyon’s (2001) model, which uses a weighted mean that downweights, but does not completely ignore, portions of the note where the fundamental frequency changes rapidly. Another is to examine the trends of the F0 traces more systematically, for example by decomposing them into discrete cosine transform coefficients, as described in Devaney et al. (2011). Both of these endeavors will facilitate in-depth comparative analyses that were not possible in Seashore’s time, both within the Seashore data and with findings from contemporary experiments.
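Two of the measures discussed above can be sketched in Python as a starting point. `mean_pitch_deviation` approximates H. Seashore’s procedure by averaging a steady-state portion of the note and measuring its distance from the nearest equal-tempered pitch. Trimming a fixed 20% from each end as the “steady state” is a hypothetical choice, and the cents values are assumed to be relative to an equal-tempered reference such as A4 = 440 Hz, so that equal-tempered pitches fall on multiples of 100 cents. `dct_coefficients` computes the first few orthonormal DCT-II coefficients of an F0 trace, in the spirit of the decomposition in Devaney et al. (2011), whose exact parameters are not reproduced here. Both names are illustrative.

```python
import numpy as np


def mean_pitch_deviation(f0_cents, trim=0.2):
    """Mean of the (assumed) steady-state portion of a note and its deviation
    in cents from the nearest equal-tempered pitch (a multiple of 100 cents)."""
    x = np.asarray(f0_cents, dtype=float)
    cut = int(round(trim * len(x)))
    steady = x[cut: len(x) - cut] if cut > 0 else x   # hypothetical steady state
    mean = steady.mean()
    nearest_et = 100.0 * np.round(mean / 100.0)
    return mean, mean - nearest_et


def dct_coefficients(f0_cents, n_coeffs=3):
    """First n_coeffs orthonormal DCT-II coefficients of an F0 trace
    (same scaling as scipy.fft.dct(x, norm="ortho"))."""
    x = np.asarray(f0_cents, dtype=float)
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    coeffs = basis @ x
    coeffs[0] *= np.sqrt(1.0 / n)      # coefficient 0: scaled mean level
    coeffs[1:] *= np.sqrt(2.0 / n)     # coefficient 1: overall slope, 2: curvature, ...
    return coeffs
```

Under these conventions, a note whose steady state averages 170 cents is 30 cents flat of the 200-cent equal-tempered pitch, and a rising F0 trace yields a negative first-order coefficient because the first cosine basis function decreases across the note.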
Many contemporary researchers in performance studies are familiar with Carl Seashore’s book Psychology of Music (1938), but they do not have access to the wealth of data in the other books and articles published by his laboratory. The goal of this project is to make these data available by digitizing the performance scores printed in the laboratory’s publications. This will facilitate investigations into a number of research questions. The first, which has been partially addressed here, is whether the results reported in the publications can be replicated with the data presented in the performance scores. Some deviations in specific measurements are likely, but the more important question is whether the general trends reported in the publications from Seashore’s laboratory can be verified. This could include analyzing the surviving original recordings with current signal processing techniques and comparing the results with the data reported in the performance scores. The second question is what trends, not calculated by Seashore and colleagues, can be observed in the performance score data. This includes re-analyzing the continuous data for singing and violin performance with more sophisticated models of perceived pitch and intensity for time-varying tones than were available in Seashore’s time. It also includes questions that the researchers in Seashore’s laboratory did not address, such as local variations in performance parameters across musical contexts, as well as larger-scale features of the collection of recordings, such as variations across performances and performers. The third question is whether experiments with contemporary performers using the same musical materials would produce similar results.
Funding
This project is funded in part by a grant from the Division of Arts and Humanities, College of Arts and Sciences, The Ohio State University.
