Abstract
Many pieces from the canon of Western classical music have been recorded by a variety of performers, often representing a wide spectrum of performance styles. Various factors, such as playing technique, stylistics, or instrumentation, influence the perceived musical similarity – or dissimilarity – between various recordings of one composition. The present study investigated the perceived dissimilarities among melody sequences taken from two contrasting recordings of Bach's second Brandenburg Concerto. The collected data were analyzed using weighted multidimensional scaling (INDSCAL), with which the dissimilarities of the pairwise tested melody sequences were represented as distances between spatial coordinates. In addition, acoustic features were extracted from the melody sequences to interpret the resulting INDSCAL dimensions. The results showed that participants shared the same perceptual space in terms of similarity assessment and were implicitly guided in their judgment by the respective instrumentation and musical design. Furthermore, melody sequences were perceived as more similar when they shared the same spectral properties. The present work extends the current state of research regarding the similarity assessment of different musical interpretations, with a specific consideration of Romantic and historically informed performance practices. Furthermore, this study provides the basis for further studies on the perceived similarity of cover versions of popular music, studio vs. live recordings, or electric vs. acoustic recordings and their evaluation. Future research should consider a larger number and variety of stimuli and be performed under more controlled acoustic conditions.
Introduction
Similarity Perception in the Context of Gestalt Psychology
The perception of similarity is one of the essential constructs of psychology, permeating numerous theories of cognition and forming the basis of key grouping laws in Gestalt psychology (Pinna et al., 2022). One such principle describes the spontaneous and unconscious tendency of human cognition to form perceptual units based on object properties, their relations, and salience, which accounts for perceived similarity and degrees of equivalence (Wertheimer, 1923).
In the context of musical perception, Deutsch (1978) stated that the boundaries of perceptual units are often marked by contrasts in sound structure, timbre, pitch, tone duration, or harmony (see also Stevens & Byron, 2016). Expanding on this, Lerdahl and Jackendoff (1983) proposed in their Generative Theory of Tonal Music that listeners perceive boundaries when adjacent units differ, while similarities reinforce grouping within a unit. Based on Lerdahl and Jackendoff's (1983) approach and considering the Gestalt laws described by Wertheimer (1923), Deliège (1987) developed a model describing the principles of similarity and dissimilarity as they account for the grouping of units in a musical context. In this model, Deliège generalizes the Gestalt laws under the terms proximity, similarity, common fate, closure, and good continuation and subordinates them to the principles of similarity and dissimilarity. According to Deliège, perceived similarity between elements supports the formation of a unit, while dissimilarity signals a boundary between units (Deliège, 2007). Previous studies based on Deliège's model suggested that stimulus pairs in short temporal intervals (Dowling & Bartlett, 1981) and within short melodic sequences (Carterette et al., 1986; Cuddy et al., 1981; Edworthy, 1985) are perceived as more dissimilar when there is a change in melodic contour. Other findings considered the relevance of rhythm (Gabrielsson, 1973; Monahan & Carterette, 1985; Palmer & Krumhansl, 1990), musical themes (Pollard-Gott, 1983; Ziv & Eitan, 2007), and motifs (Lamont & Dibben, 1997; Zbikowski, 1999), while Serafine et al. (1989) pointed to the hierarchical structure of a melody as crucial in the context of similarity perception.
To develop her approach further and to distinguish it from the Gestalt laws, Deliège formulated cue abstraction theory (Deliège, 1996), drawing on Peirce's (1974) semiotic concept of the index. An index in the form of a cue refers to a concrete event and is evoked through a dynamic connection with both the event and the existing mental representations of the person receiving the cue (Peirce, 1974). According to cognitive psychology, individuals draw on long-term memory when interacting with their environment (Baddeley, 2020). Familiar stimuli are categorized based on existing mental schemas, whereas novel stimuli trigger comparison processes with known categories of similarity and dissimilarity to enable categorization.
Cue abstraction theory has since been applied to compositions by Bach, Schubert, and Mozart from the classical repertoire (Deliège, 1996, 2001; Deliège et al., 1996), modernist works of Debussy, Webern, and Stockhausen (Addessi & Caterina, 2000; Deliège, 1989, 1993, 1995; Imberty, 1981, 1984; Palmer & Krumhansl, 1990) as well as contemporary repertoire of composers Reich, Berio, and Boulez (Deliège, 1989; Deliège & El Ahmadi, 1990). From the perspective of developmental psychology, various aspects of segmentation and categorization ability have also been addressed within the framework of cue abstraction theory. Mélen and Wachsmann (2001) demonstrated that cue abstraction mechanisms exist as early as in infancy, while Koniari et al. (2001) found that children aged 10 to 11 were able to assess the degree of similarity between musical units. These findings are consistent with Dowling's (1999) assumptions that numerous mechanisms of music perception and processing in adults build on elements already present in early childhood.
Similarity Perception in the Context of Quantifiable Properties
In the early 1970s, Tversky and Kahneman (1973) described, in the context of the availability heuristic, how the frequency of events and features is crucial to category formation. Research on music perception has demonstrated effects of such quantitative information for transposed melodies (Van Egmond et al., 1996), for pitch direction, progression, and information (Freedman, 1999; Quinn, 1999; Schmuckler & Boltz, 1994), and for pitch range and key spacing (Van Egmond & Povel, 1996).
Cross-cultural studies demonstrated the influence of quantitative features on perceived similarity in North Indian (Castellano et al., 1984), Balinese (Kessler et al., 1984), and North Sami music (Krumhansl et al., 2000). Similarly, Oram and Cuddy (1995) demonstrated that manipulated quantitative features influence perceived similarity. These findings suggest that recipients are sensitive to pitch distribution, with differences based on musical expertise. While recipients without prior musical training drew conclusions about basic melodic and tonal structures based on the pitch distribution, experienced recipients developed style-specific expectations about the progression of the melody. This aligns with observations that composers tend to use important tones of tonality more frequently, in longer durations, and in structurally significant places than other tones (Järvinen, 1995; Knopoff & Hutchinson, 1983; Krumhansl, 1990).
Specifically for folk music, Eerola et al. (2001) used multidimensional regression analysis to show that similarity of frequency-based musical features explained approximately 40% of recipients’ similarity judgments. Prediction accuracy improved to 55% when incorporating descriptive variables such as number of tones, rhythmic variability, and melodic predictability. These results indicate that both measures capture some aspects of those properties representing the salient dimensions that recipients look for when categorizing melodies.
Changes in Performance Practice and Musical Arrangements
Johann Sebastian Bach is considered one of the most frequently recorded composers in music history. The broad range of available recordings represents a wide spectrum of performance styles, shaped by various factors that influence the perceived musical similarity between various performances of a specific composition. Of relevance to the present study are performances that align either with the ideal of a “Romantic” sound or represent historically informed performances. Romantic practice typically involves modern instruments (e.g., string instruments with steel strings and modern bows) and favors techniques and styles that closely resemble those used for music from the Romantic era. In contrast, historically informed performance practice aims to reconstruct the sound ideal of the period in which the work was composed. This includes a comprehensive consideration of the transformation of notated music into sound, with the goal of realizing the composers’ intentions as precisely as possible (Brown et al., 2001). Thus, historically informed performances rely on the performer's interpretation of tempo, dynamics, articulation, phrasing, and ornamentation – elements often incompletely captured by musical notation. The following section outlines key characteristics that distinguish historically informed from Romantic performance practice.
Changes in Instruments, Instrumentation, and Playing Techniques
With the increasing spread of historically informed performance practice since the beginning of the 20th century, the knowledge of historical instruments, their playing styles, and structural changes has also grown due to the increasing change in music-making demands (Gülke, 1995). While the sound ideal during the Baroque period was a perfect imitation of the human voice, musical life gradually shifted from court culture to public concert settings in larger halls, requiring symphonic instrumentation with more sonorous instruments (Lawson & Stowell, 2012). One example from the string instrument family is the development of the modern bow. It enabled more virtuosic fingerings, varied bow strokes, and positional playing, which had not previously been possible, and thereby expanded the possibilities available to composers (McLennan, 2008). Even more drastically, the development of the flute into a new type of instrument was completed in 1827 with the presentation of the conical ring-keyed flute by Theobald Boehm. This differed fundamentally from all previous designs through a new arrangement of the holes and a new fingering system (Reuter, 2001).
An essential decision in historically informed performance, which ultimately can only be made by the performer, is the choice of tempo (Forkel, 2008). Since the metronome had not yet been invented and the mensural notation was still in use, only sparse tempo indications in Johann Sebastian Bach's chamber and orchestral works provided clues to the intended tempo (Elste, 1984). Accordingly, considerable variation can be observed across recordings and editions (Mertin, 1973; Siegele, 2014).
Following Riemann's comprehensive investigations, the consideration of dynamics has become the focus of performance research. At the beginning of the Baroque period, dynamics as a musical parameter were not given much attention, and specific indications for the differentiation of musical syntax or the single tone were quite rare (Riemann, 1884). A distinctive feature, however, is the Baroque differentiation of solo and tutti passages, which, in the sense of dynamics, indicate when one voice is to recede or be emphasized (Katunjan, 2012).
By the mid-18th century, the conviction gradually prevailed that music had to be freed from everything arbitrary and that the practice of ornamentation had to be strictly discarded (Weber, 1807). This was followed by prohibitions of ornamentation and improvisation, which ultimately led to the disappearance of the basso continuo in symphonic literature (Gassner, 1988). Even today, there are no precise instructions for the playing of the basso continuo, once again highlighting the general problem of shaping a work in addition to instrumentation and scoring and the topicality of historically informed performance practice. In summary, numerous factors influence the perceived dissimilarity between historically informed and Romantic performances.
The deliberate selection of historically informed and Romantic performance styles allows for a comparison between two contrasting interpretative paradigms. These styles not only differ in instrumentation and articulation but also represent fundamentally different aesthetic ideals. Their juxtaposition thus provides an ideal test case with a rather large disparity between both interpretations for examining the extent to which interpretive traditions shape listeners’ perception of musical similarity.
Aims and Research Questions
Despite the obvious relevance of the previously described components of musical performance practice, no empirical studies to date have examined the perceived similarity of musical arrangements following different performance practices. The aim of the present study is to extend the current state of knowledge regarding similarity judgments specific to Romantic and historically informed performance practice, using Johann Sebastian Bach's Brandenburg Concerto No. 2 as an example. In addition, characteristics related to the specific similarity judgments will be extracted. In this regard, the following research questions will be addressed:
To what extent do participants share the same perceptual space regarding musical similarity of different performance practices? What objective parameters can be used to explain the differences in perception of similarity?
Method
Participants
A total of 172 subjects began participation in the online study; they were recruited via word-of-mouth, social media notices, and the SurveyCircle research platform (Johé, 2021). Due to early termination of the survey or implausible response times, 38 participants had to be excluded from the study, resulting in a final sample of N = 134 subjects (female: 66.42%, male: 33.58%) aged 19 to 79 years (M = 32.24, SD = 13.72). The self-reported musical perceptual ability determined using the Perceptual Abilities subscale from the Gold-MSI (Schaal et al., 2014) was M = 44.92, SD = 9.10, just below the mean of M = 45.84 (SD = 8.62) in the German-speaking sample from Schaal et al. (2014) (N = 641). No compensation was provided for participation in this study. The datasets generated and analyzed during the current study, as well as the audio files and the music scores, are available on request from the second author.
Online Questionnaire
To investigate the research questions, an online questionnaire was created and distributed via an online link (Leiner, 2019). It included sociodemographic items, the Perceptual Abilities subscale of the German Goldsmiths Musical Sophistication Index (Gold-MSI; Schaal et al., 2014), and a listening task with pairwise similarity ratings of melody sequences. Participants could leave optional feedback at the end. The components are described in detail below.
Stimuli
To investigate the similarity perception of musical configurations, two recordings of Johann Sebastian Bach's Brandenburg Concerto No. 2 (BWV 1047) were used. The first, recorded in 1966 by the Saarland Chamber Orchestra under the direction of Karl Ristenpart (Bach, 1721/1966), can be attributed to historically informed performance practice. Ristenpart's musical arrangement is characterized by its straightforward progress in tempo and dynamics as well as an emotionally charged and exciting playing style (Hanford, 1998). In addition to the four solo parts, a total of four violins in the first and second voices, two violas, one cello, and one bass were played on historical instruments. Furthermore, the basso continuo on the harpsichord was suspended. In contrast, the second recording, performed by the Munich Bach Orchestra under the direction of Karl Richter (Bach, 1721/1968), represents Romantic performance practice. Deliberately distancing himself from historically informed performance practice, Richter scored the Brandenburg Concerto for modern instruments. The use of a full symphony orchestra contributed to a pronounced expressivity, as did the rhythmic intensity and tightened tempo. Most relevant to the present study was the choice of two markedly disparate performance styles, while the exact attribution of either recording to a specific performance style, such as historically informed performance practice, is of secondary priority. The disparity between both recordings is further supported by their use in the Compare Bach series (Bach, 1721/1966, 1721/1968).
Using the audio editing program Audacity, four musical stimuli were extracted from each recording. The selected material comprises the second theme of the first movement, in which all four solo instruments are introduced immediately after the beginning. This theme comprises two measures including the upbeat and is marked by a characteristic sequence of 16th notes. A detailed overview of the stimuli is given in Table 1.
Overview of the Stimuli.
Note. HIP = historically informed performance practice; RP = Romantic performance practice. Each stimulus was faded in and out for 0.75 s.
We generated all possible stimulus pairs (28 total), which participants rated for melodic similarity on a 101-point scale (0 = maximally dissimilar to 100 = maximally similar) via a slider tool. To reduce potential bias, the slider remained hidden until a selection was made. To familiarize the participants with the rating scale, two illustrative examples were presented. The first demonstrated maximal dissimilarity by contrasting measures 42 and 43 from the RP interpretation with measures 72 and 73 of the HIP version. These stimuli differed entirely in musical design, instrumentation of the solo instrument, musical text, tempo, and mode. Analogously, maximal similarity was illustrated by playing measures 32 to 34 from the RP interpretation twice, as they represented the highest possible agreement across all comparisons. Participants were also instructed to use these examples to adjust the audio volume to a comfortable level.
The 28 pairwise similarity ratings could be completed without time constraints, and participants were free to replay each melody pair as often as desired.
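The count of 28 pairs follows from the eight stimuli (four solo instruments in each of the two recordings), since n stimuli yield n(n − 1)/2 unordered pairs. The following minimal sketch illustrates this; the stimulus labels are hypothetical, and the conversion of a similarity rating into a dissimilarity value by subtracting it from 100 is an assumption for illustration, not a procedure reported in this study.

```python
from itertools import combinations

# Hypothetical labels: four solo instruments in each of the two recordings
# (HIP = historically informed, RP = Romantic performance practice).
instruments = ["trumpet", "recorder", "oboe", "violin"]
stimuli = [f"{style}_{inst}" for style in ("HIP", "RP") for inst in instruments]

# All unordered pairs of distinct stimuli: n * (n - 1) / 2
pairs = list(combinations(stimuli, 2))


def to_dissimilarity(similarity: float) -> float:
    """Flip a 0-100 similarity rating into a 0-100 dissimilarity value.

    Assumed conversion for illustration only.
    """
    return 100.0 - similarity


print(len(stimuli), len(pairs))  # 8 28
```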
Data Analysis
Weighted Multidimensional Scaling
The statistical analysis was performed using RStudio (version 1.4.1106) and Matlab (version R2021). Weighted multidimensional scaling (INDSCAL, as implemented in the smacof package for R; Mair et al., 2021) was used to represent perceived dissimilarities between melody sequences as distances in a low-dimensional space. The use of INDSCAL was justified by interindividual differences in similarity criteria, as the aggregated data indicated a lack of sample homogeneity. During the INDSCAL analysis, ordinal similarity ratings were transformed into interval-scaled distance data. Equal ranks were treated using the primary approach, allowing identical values to be represented by different distances (Borg & Staufenbiel, 2007). This procedure was chosen because nonmetric INDSCAL preserves ordinal properties even when rank differences are expressed as greater than/equal to (≥) relations in distances. We computed distances with the Euclidean metric, as it reflects straight-line distances between points and thus aligns with natural perceptual space (Borg & Groenen, 2005). The dimensionality of the resulting configuration was determined based on practical considerations, interpretability of the graph, and the number of stimuli. According to Backhaus (2015), a minimum of n = 8 stimuli justifies a two-dimensional solution.
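The idea behind the weighted model can be illustrated with a short sketch. In the classical INDSCAL formulation, all participants share one group configuration, and each participant stretches or shrinks its dimensions with individual weights before Euclidean distances are taken. The code below is only an illustration of that distance model under this assumption; it is not the smacof implementation used in the analysis, whose parametrization may differ, and all coordinates and weights are hypothetical.

```python
import math


def weighted_distance(x_i, x_j, weights):
    """Distance between two stimuli as seen by one participant.

    x_i, x_j : coordinates of the two stimuli in the shared group space
    weights  : that participant's non-negative weight for each dimension
    """
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j))
    )


# Hypothetical coordinates and weights, for illustration only.
stim_a, stim_b = (0.0, 0.0), (3.0, 4.0)
print(weighted_distance(stim_a, stim_b, (1.0, 1.0)))  # 5.0 (plain Euclidean)
print(weighted_distance(stim_a, stim_b, (1.0, 0.0)))  # 3.0 (ignores dimension 2)
```

A participant with equal weights perceives the group space unchanged, whereas a participant with a weight of zero on one dimension disregards it entirely.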
To enable meaningful interpretations of the INDSCAL dimensions, acoustic features of the melody sequences were extracted using the MIRtoolbox (Lartillot & Toiviainen, 2007). In addition to tempo (mirtempo), the analysis included the low-level features zerocrossing (mirzerocross), brightness (mirbrightness), roughness (mirroughness), irregularity (mirregularity), spectral entropy (mirentropy[mirspectrum]), spectral flux (mirflux[mirspectrum]), and key (mirkey). The resulting parameters were finally correlated with the perceptual dimensions.
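To give an impression of what such a low-level feature measures, the following sketch counts sign changes per second in a sampled signal, which is the basic idea of a zero-crossing rate. It is a simplified stand-in written in Python, not the MIRtoolbox routine; the options of mirzerocross (for example, how crossings are counted and normalized) may lead to different values. The test signal is a synthetic 5 Hz sine tone, not one of the study's stimuli.

```python
import math


def zero_crossing_rate(samples, sample_rate):
    """Number of sign changes per second in a sampled signal."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev * cur < 0
    )
    duration = len(samples) / sample_rate  # length of the signal in seconds
    return crossings / duration


# Synthetic one-second 5 Hz sine tone with a small phase offset, so that
# no sample falls exactly on zero.
sample_rate = 1000
tone = [
    math.sin(2 * math.pi * 5 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]
print(zero_crossing_rate(tone, sample_rate))  # 10.0
```

A sine tone crosses zero twice per period, so a 5 Hz tone yields ten crossings per second; noisier or brighter signals produce higher rates.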
Quality Criteria for the INDSCAL Configuration
The appropriateness of the INDSCAL configuration was evaluated using two criteria. First, a Shepard diagram was used to assess the goodness of fit between the configuration distances and the perceived dissimilarities. Distances derived from multidimensional scaling were plotted on the ordinate, and the dissimilarity values on the abscissa. An optimal fit can be assumed if all points lie on a monotonic regression line. Furthermore, the squared correlation between distances and disparities (R²) was calculated to quantify the proportion of variance in disparities explained by the configuration. Second, the model fit was assessed using the standardized residual sum of squares (stress), which reflects the squared deviations of the points from the regression line. Lower stress values indicate a better representation of the data through the spatial configuration.
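Both criteria can be sketched in a few lines. The example below computes Kruskal's stress-1 and the squared Pearson correlation from a list of configuration distances and the corresponding disparities. It is a minimal illustration under the assumption of Kruskal's normalization; the stress value reported by the smacof package may be normalized differently, and the input values are hypothetical.

```python
import math


def stress_1(distances, disparities):
    """Kruskal's stress-1: root of the normalized squared residuals."""
    residual = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    return math.sqrt(residual / sum(d ** 2 for d in distances))


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical values for three stimulus pairs, for illustration only.
distances = [1.0, 2.0, 3.0]
disparities = [1.0, 2.0, 3.0]
print(stress_1(distances, disparities))   # 0.0 (perfect fit)
print(r_squared(distances, disparities))  # 1.0
```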
Results
Quality of the INDSCAL Configuration
The Shepard diagram shown in Figure 1 indicated a good fit between the dissimilarity ratings and the distances of the configuration. The squared correlation of the distances with the disparities differed moderately between the two-dimensional (R² = 0.868) and the three-dimensional configuration (R² = 0.723). The regression line ran weakly monotonically through a cloud of points with only a few outliers. Although the stress value of the two-dimensional configuration (stress = 0.251) was higher than that of the three-dimensional one (stress = 0.164), we chose the two-dimensional solution because of the small number of eight stimuli (Backhaus, 2015) and the better fit of the disparities to the distances. Together, the shape of the Shepard plot and the R² and stress values suggest an MDS solution of acceptable fit and reliability.

Shepard diagram of the two-dimensional INDSCAL configuration.
INDSCAL Configuration
We analyzed the mean dissimilarity judgments of the participants using weighted multidimensional scaling (INDSCAL), which yielded the result shown in Figure 2. Triangles depict the historically informed melody sequences; circles depict the Romantic melody sequences. The numerical expressions of the configuration, which determine the position of the melody sequences in the perceptual space, are documented in Table 2.

Plot of the INDSCAL configuration.
Numerical Positions of the Stimuli in the INDSCAL Configuration.
Note. HIP = historically informed performance practice, RP = Romantic performance practice.
When the axial partitioning patterns were laid over the two-dimensional INDSCAL configuration, two distinct categories emerged. Along Dimension 1, the melody sequences could be distinguished based on the solo instrument. Here, participants perceived the greatest dissimilarity between the trumpet and violin melody sequences. The sequences of the two recorders were perceived as most similar. Therefore, Dimension 1 could be described as the “instrument dimension.”
In contrast, the melody sequences could be separated along the second dimension in terms of musical design and performance style. Here, the Romantic recorder sequence was perceived as most dissimilar to the historically informed oboe sequence. Along this dimension, participants perceived the greatest similarity between the melody sequences of the clarino trumpet and the Baroque violin of the HIP performance. Therefore, Dimension 2 could be described as the “performance practice dimension.” Given the clear separation of the INDSCAL solution along the two dimensions, both the musical design and the instrumentation influenced the similarity perception of the melody sequences. Participants thus shared a common perceptual space for musical similarity, with empirical evidence for a distinction between the diverging performance practices.
Extraction of Acoustic Features
To interpret the two INDSCAL dimensions described previously on an objective basis, we extracted basic acoustic features of the melody sequences using the MIRtoolbox (Lartillot & Toiviainen, 2007) (see Tables A1 and A2). The subsequent correlations of the acoustic features with the perceptual dimensions, presented in Table 3, showed a strong correlation of the first dimension, which depicts the musical instruments, with various spectral features. For Dimension 2, a significant correlation was found only with tempo. However, a biserial correlation of the two performance practices (Romantic vs. historically informed) with the second dimension confirmed a very large and significant association (rbs = –.93, p < .001).
Pearson Correlation Table of the Acoustic Features (MIRtoolbox) and Perceptual Dimensions 1 and 2.
Note. *p < .05; **p < .01.
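The logic of relating a dichotomous variable (performance practice) to a continuous dimension can be sketched as follows. The example computes a point-biserial correlation, which is numerically equal to the Pearson correlation with a 0/1 coded variable; it is a related but not identical coefficient to the biserial correlation reported above, so values need not match. The coordinates are hypothetical and are not taken from Table 2, and the 0/1 coding of the two practices is an assumption.

```python
import math


def point_biserial(values, groups):
    """Point-biserial correlation of a continuous variable with a 0/1 code."""
    n = len(values)
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    mean_all = sum(values) / n
    # Population standard deviation of all values
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_all * math.sqrt(len(ones) * len(zeros) / n ** 2)


# Hypothetical Dimension 2 coordinates for eight stimuli (not from Table 2).
dim2 = [0.6, 0.5, 0.7, 0.4, -0.5, -0.6, -0.4, -0.7]
style = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed coding: 0 = HIP, 1 = RP
print(round(point_biserial(dim2, style), 2))
```

A coefficient close to ±1 indicates that the two groups occupy clearly separated regions of the dimension; the sign depends only on which practice is coded as 1.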
Discussion
The aim of the present work was to expand the existing body of knowledge regarding the perception of similarity in musical performances, with specific consideration of historically informed and Romantic performance practice using Johann Sebastian Bach's Brandenburg Concerto No. 2 as an example. The results allowed for first answers to the research questions: Participants shared the same perceptual space when evaluating similarity and differentiated in their judgment between both instrumentation and musical design. Furthermore, it became clear that melody sequences were perceived as similar if they shared the same spectral properties, which in turn correlated highly with psychoacoustic features from the MIRtoolbox.
In the listening experiment conducted in this study, participants implicitly extracted both instrumentation and musical design, thus expanding the list of previously identified parameters used in the assessment of musical similarity. Taken together, the acoustic features extracted with the MIRtoolbox correlated strongly with the first INDSCAL dimension, underscoring the role of timbral and instrumental properties in similarity perception. These findings align with the theoretical predictions of cue abstraction theory, in which such features serve as perceptual cues for categorization. The same reasoning applies to the significant correlation of tempo with the second dimension, if the accelerated tempo is regarded as a characteristic of Romantic performance practice.
A similar conclusion was reached by Eerola et al. (2001), which, to our knowledge, is the only study to date that has investigated the perception of similarity in music using multidimensional scaling and was able to explain a moderate proportion of similarity ratings of folk melodies using descriptive variables, such as number of tones, rhythmic variability, and melodic predictability. However, in contrast to the present work, the frequency-based features used by Eerola et al. were found to be only slightly related to the similarity ratings. One possible explanation for this discrepancy lies in the nature of the features used: The frequency-based variables of Eerola et al. were symbolic, that is, derived from notated scores. These included the distribution of the 12 tones of an octave, all intervals, or tone durations from the melody sequences (Eerola et al., 2001). In contrast, the present study relied on sound-based acoustic features such as spectral entropy or sensory dissonance (roughness).
Furthermore, it remains unclear to what extent the observed correlations depend on the specific stimulus material. While Eerola et al. (2001) specifically considered folk songs of different nationalities, the present work used a single melody sequence from the Baroque era. A key limitation of the current study lies in the use of only eight musical stimuli, all derived from a single movement of one composition. While this approach ensured maximal control over structural variables, it inevitably restricts the generalizability of the findings. Future studies should include multiple compositions from different stylistic periods to validate the present results across broader musical contexts. This limitation also raises the question whether correlations of perceived similarity between different interpretations of one melodic sequence and specific acoustic features may vary when another musical genre is investigated. Furthermore, the evaluation of similarity between different works is probably based on other evaluation categories than those used in the evaluation of two versions of one work. At present, no empirical studies are known that specifically address the perception of similarity across different interpretations of a single work, making direct comparisons with existing research difficult. Future studies could systematically vary the stimulus material, for example by comparing different genres. In this regard, further studies on the perceived similarity of cover versions of popular music, studio vs. live recordings, or electric vs. acoustic recordings and their evaluation could be conducted and, in the long term, lead to generalizable dimensions in the perception of various styles and designs of music. In addition, controlling acoustic parameters in future studies would allow for more precise conclusions about the cognitive process underlying similarity perception and help to identify key acoustic features. 
These findings might also inform the design of music recommendation systems, particularly those seeking to model stylistic similarity beyond genre labels. By incorporating features such as spectral flux or tempo variance, algorithmic systems could achieve a more nuanced reflection of listener preferences based on interpretative style.
The observation that acoustic features, such as those measured by the MIRtoolbox, represent basic structures and correspond to participants’ similarity ratings should be discussed in light of cue abstraction theory (Deliège, 1996). Along these lines, musical parameters such as the tempi of the recordings could function as cues and account for the categorization of the melody sequences according to their performance practice. Specifically, in this study, the perception of a tightened tempo and the more radiant sound of the modern instrumentation would likely have pointed to a Romantic interpretation. Future research could test which features of the music, such as tempo, instrumentation, or sound, influence the listener's differentiation between diverging interpretational styles.
To the best of our knowledge, this study is the first empirical work to focus on the similarity perception between different interpretations of the same piece of music. Accordingly, several limitations should be noted. First, the version of the Brandenburg Concerto interpreted by Karl Ristenpart (1966) was used as a representative example of historically informed performance. The selection of stimuli followed an exclusion-based procedure: First, recordings with poor sound quality were excluded and then those that did not provide clear indications of performance practice. Subsequently, many recordings were evaluated with respect to tempo, dynamics, and articulation as well as phrasing, ornamentation, suspension of the basso continuo, and instrumentation. While Ristenpart's recording displays several characteristics typically associated with historically informed performance, such as articulation, phrasing, and instrumentation, it does not fully comply with current HIP standards. Rather, it may be regarded as an early attempt at historically informed interpretation that integrates stylistic elements from both traditions. This classification was nonetheless deemed appropriate for the purposes of this study due to its clear contrast with the Romantic interpretation by Karl Richter. Furthermore, the selected recordings were produced in the same decade and corresponded to those selected for the series Compare Bach (Bach, 1721/1966; 1721/1968). Nevertheless, due to the insufficient transmission of the original works and the improvisational freedom of the musicians, which was common in the Baroque era, any modern interpretation can be called historically informed only with reservations.
Another limitation concerns the number of presented stimuli, which some participants perceived as overly demanding. To obtain the similarity measurements in the present study, we chose a complete pairwise rating procedure. A ranking method or the anchor point method would have been faster, as each pair of objects is judged in isolation rather than compared with other pairs of melodies; such methods are recommended when the number of stimuli is large and participants’ endurance is low (Backhaus, 2015). However, to obtain the most precise data in this exploratory study, we preferred the rating task with 28 pairwise comparisons.
In summary, the present study extends existing research on music perception by empirically addressing similarity perception in relation to two distinct interpretative styles of the same musical work. The main results revealed a shared perceptual space regarding musical similarity and the identification of evaluation parameters based on instrumentation and performance practice. These provide support for further studies to specify the process of similarity perception and evaluation in a more differentiated way.
Footnotes
Ethical Approval
The present study was conducted in accordance with ethical principles and standards according to the guidelines of the German Society for Psychology. According to German law, no ethics approval was required. Informed consent was provided by all participants, and they had the option to cease participating in the study at any time without any negative consequences.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and analyzed during the current study, as well as the audio files and the music scores, are available on request from the second author.
Action Editor
Isabel Martinez, Universidad Nacional de La Plata, Facultad de Bellas Artes
Peer Review
Adam Ockelford, University of Roehampton, Applied Music Research Centre
One anonymous reviewer
Appendix A. MIRtoolbox features
Overview of MIRtoolbox features extracted from Karl Richter's 1968 performances.
| Acoustic feature (MIRtoolbox feature) | Violin | Oboe | Recorder | Trumpet |
|---|---|---|---|---|
| Brightness (mirbrightness) | 0.54482 | 0.43972 | 0.23863 | 0.65078 |
| Irregularity (mirregularity) | 0.87828 | 0.56721 | 0.80186 | 0.27601 |
| Key (mirkey) | C major | F major | C major | F major |
| Roughness (mirroughness[mirspectrum]) | 6547667.20 | 44835784.98 | 28120207.70 | 283479223.61 |
| Spectral entropy (mirentropy[mirspectrum]) | 0.84849 | 0.82458 | 0.78684 | 0.81662 |
| Spectral flux (mirflux[mirspectrum]) | 12.317 | 24.8438 | 29.5002 | 76.0249 |
| Tempo in bpm (mirtempo) | 102.3752 | 139.5237 | 102.0489 | 103.9842 |
| Zero crossing (mirzerocross) | 1428.1044 | 1479.7537 | 954.3391 | 1617.065 |
