Abstract
This article analyses the representation and retrieval of musical scores across various digital environments, such as score storage sites created by netizens or institutions, repositories and digital libraries. It considers the characteristics of these interfaces and observes how the scores are thematically represented. The study combined exploratory research, to search for and retrieve scores, with descriptive research, to analyse their thematic representation within these environments. The results reveal that these sites are organized by category, which helps netizens search for musical scores. The analysed sites differ in comprehensiveness and organization but share similar features with regard to indexing. Consequently, the study argues for investment in their structure, organization, languages and systems, and in professional training.
Introduction
With the advent of the Internet and the development of information and communications technologies, netizens now have new ways of interacting and communicating on the Web. Consequently, information science has had to keep pace with technological developments in order to adapt to new practices of information treatment, organization, representation and retrieval in digital environments.
Digital environments are complex and require technological resources compatible with the level of development of the information field (Souza et al., 2016). They are built and enabled by information and communications technologies and include online document storage sites created by netizens or institutions, repositories and digital libraries. Among these environments, digital libraries aim to organize digital information, a goal linked to information search and retrieval, classification and categorization, libraries, and the automation of publications and databases (Borgman, 1999). Digital repositories have been developed as ‘information environments, which enable the processes of access to information, preservation and interoperability across information resources’ (Gonçalez et al., 2018: 5237). They resemble traditional information environments but have characteristics specifically geared to the digital environment, helping to meet netizens’ information needs (Camargo and Vidotti, 2011: 43). Repositories are typically developed to store documents, such as musical scores, while institutional repositories are linked to institutions such as companies, non-governmental organizations and universities. Both types aim to make their documents readily accessible on the Internet.
Given the large amount of information on the Internet, Dougan (2006) points out that search queries can produce an equally large number of results, even with established tools. In addition, netizens often find it difficult to distinguish between different kinds of search, such as a library catalogue, a Google search, the free web or the deep web: if the information was located through a browser, they tend to assume that it is available only on the Web, without considering where it is actually held. Understanding the differences between the various types of storage sites can equip netizens to navigate the options, search and evaluate the available content effectively, and form realistic expectations when looking for a document such as a digital score (Dubnjakovic, 2009).
In this sense, musical scores can be considered both documents and information objects, as they go through representation processes in order to be retrieved and accessed. Regarding their location, Dougan (2006) adds that digital scores, especially historical scores that have not been made available by netizens, are mostly held by universities or cultural institutions. Such an affiliation allows the score and its history to be preserved. Other barriers to location that affect representation and retrieval include a score being poorly written or being held in an online database that search engines cannot reach, as in the deep web. For this reason, Dubnjakovic (2009) argues that the quality of musical score storage sites should be assessed, considering the editions available, the breadth of the collections, visual appearance and functionality. Dubnjakovic explains that establishing such criteria is important because digital score content is generally hosted by individuals, libraries, societies, professional organizations, companies and the media.
According to Dougan (2006), describing musical scores for representation purposes can be a complex and arduous process for two reasons: (1) the metadata associated with these scores is often incomplete, and (2) netizens lack knowledge, in particular the ability to develop strategies for effective searches. Dougan explains that although netizens often search by title or keyword, this approach may not lead them to the desired result. Dougan also identifies interfaces and search bars as additional challenges when searching for musical scores, especially when the scores are freely available. The librarian community identified this concern as early as 2006, and it remains a problem today, since all of these factors affect the representation and retrieval of musical scores and the overall quality of the user’s search experience.
Given this context and the concerns about score representation and retrieval raised by other authors, the following research question was formulated: How do digital environments carry out the process of indexing musical scores? To answer it, the study investigated how descriptions were formulated, what information was chosen and what representative terms were assigned in the score indexing process. The research comprised (1) a theoretical study of the process of indexing musical scores, to understand how scholars discuss the topic, and (2) an exploratory study of how digital environments (libraries, repositories and websites) choose the information and representative terms used to represent scores, in order to observe how they carry out this practice. On this basis, the study aimed to propose improvements to the systems and recommendations to those in charge of indexing musical scores, so as to make search and retrieval more effective.
This research is justified by the studies cited above (Dougan, 2006; Dubnjakovic, 2009) and by the need to give musical scores adequate descriptive and thematic treatment in digital environments. Oliveira et al. (2017: 147) add that the ‘context of digital information production has demanded scientific research on forms of storage and appropriate tools to enable access to and use of information’ by netizens, which underlines the need for research of this kind.
Indexing of musical scores in digital environments
Musical scores originated when symbols began to be placed next to the lyrics of songs sung in religious ceremonies at the beginning of the 9th century (Souza and Souza, 2014). In the 21st century, the best-known and most widely used type of musical score is the Western score. Although its exact origin is uncertain, there is consensus that it was directly influenced by the Benedictine monk Guido d’Arezzo (c.991–c.1033), who is considered the creator of the modern musical score (Bairral, 2010).
Musical scores are defined as ‘graphic representations of sounds and the instructions for playing them, made on paper’ or in digital format (Gandelman, 2003: 1). For McLane (1996), they have their own features and functions, which provide all musicians with a standardized guide for performing a piece.
The information contained in a musical score is presented in the details of its structure, which disclose the process and moment of its creation (such as the title and date), as well as the stave, tonality, lyrics and notes (Torres Mulas, 2000). This set of elements shows what information makes up a score and what functions it serves, which matters both for a musician’s performance and for the knowledge needed to carry out the indexing process.
The indexing process is made up of sub-processes that identify the content of a given document, such as a musical score. These sub-processes are: examining the document and establishing the subject of its content; identifying the concepts present in that subject; and translating the concepts into the terms of an indexing language (International Organization for Standardization, 2011; Lancaster, 2003). The aim is to represent documents thematically in both analogue and digital information environments.
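The three sub-processes can be made concrete with a small sketch. It is a hypothetical illustration, not a procedure described by the cited sources: the vocabulary, the record field names and the naive substring matching used to identify concepts are all assumptions, whereas real indexing depends on the indexer’s judgement and on an established indexing language.

```python
# Hypothetical sketch of the three indexing sub-processes; the vocabulary and
# field names are invented for illustration and are not an established standard.

# Hypothetical controlled vocabulary: free-text concept -> preferred index term.
CONTROLLED_VOCABULARY = {
    "piano": "Piano music",
    "solo piano": "Piano music",
    "impressionism": "Impressionism (Music)",
    "nocturne": "Nocturnes",
}


def examine(document: dict) -> str:
    """Step 1: examine the document and gather the text that states its subject."""
    return " ".join(str(document.get(f, "")) for f in ("title", "instrumentation", "notes"))


def identify_concepts(subject_text: str) -> list[str]:
    """Step 2: identify concepts (naively: vocabulary entries found in the text)."""
    text = subject_text.lower()
    return [concept for concept in CONTROLLED_VOCABULARY if concept in text]


def translate(concepts: list[str]) -> list[str]:
    """Step 3: translate concepts into indexing-language terms, without duplicates."""
    terms: list[str] = []
    for concept in concepts:
        term = CONTROLLED_VOCABULARY[concept]
        if term not in terms:
            terms.append(term)
    return terms


score = {"title": "Clair de lune", "instrumentation": "solo piano", "notes": "Impressionism"}
print(translate(identify_concepts(examine(score))))  # ['Piano music', 'Impressionism (Music)']
```

Mapping two free-text concepts (‘piano’ and ‘solo piano’) to one preferred term is the kind of consistency a controlled vocabulary is meant to provide in the translation step.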
Musical scores require the same procedures as textual documents. However, they differ in that they contain symbols (notes, clefs, stave, tonality, time signature, etc.) and text (title, date, composer, lyricist, etc.), which turns them into both a textual and a visual document, and greater care is therefore required when carrying out indexing procedures. Dubnjakovic (2009) explains that the need to focus on the process of indexing musical scores is due to the demands that have arisen in the development of technology and the fields of music and information science itself. With the explosion of information technologies and resources, such as the Internet, netizens and librarians now require new tools for searching, representing and retrieving information, especially in digital environments.
According to Faria (2009), indexing a score is challenging because most information professionals (indexers, librarians, archivists) lack the knowledge of the music field, such as the symbology that recurs in scores, needed to meet musicians’ information needs. Conversely, musicians lack knowledge of the library and information science techniques and methodologies needed to carry out the indexing process effectively.
During the indexing process, the indexer should understand both the graphic and textual elements of the musical score, as recognizing the function and significance of each symbol is crucial for retrieval. In this sense, Souza and Souza (2014) explain that the descriptive elements (title, composition, date, copyists, notes, symbols, instrumentation) can be found in different parts of the score (cover and header, top or bottom of the page). They stress that this representation requires specialized knowledge, as the information is not always obvious, which poses additional challenges for the indexer. According to Smiraglia (2001) and Dougan (2006), librarians must determine which elements to use for representation. In music, the subject can vary with the structure of the score: a librarian might not know the title or composer but may recognize the first line of the composition, in which case additional tools are required for indexing.
According to Cavalcanti and Carvalho (2011), the process of analysing a score for representation can fail as the textual sources of a score may not be sufficient or clear enough to identify information during the indexing procedure. They list the thematic and descriptive elements, without making a distinction: original title, translation into another language, authorship, date of the work’s composition, biographical features of the author, publisher, printing, edition and fingering, collection/series, period, genre, nature of the characteristics of the work and format.
One way to improve the indexing of musical scores is to adopt metadata as a complement, as it can improve representation and, in turn, score retrieval in digital environments. For Sasser (2009), the use of metadata by information professionals is essential, as thorough metadata enables netizens to locate and interpret what they need. Metadata should therefore adequately represent the specialized and unique features of musical scores, particularly since information professionals do not usually describe scores at the item level but at the level of collections organized by composer, lyricist or other types of creator (Riley and Dalmau, 2007).
According to Riley and Dalmau (2007), representation should be based on a purpose-built metadata model containing the following descriptive and thematic elements: title of the score; names of creators (composer, arranger, lyricist) and of dedicatees or performers; dates (copyright date, date of publication, or an indication that the score is undated); cover art; subjects (lyric language, form, genre and style); instrumentation; and geographical information. Indexing a score requires considering its most significant elements and its structural analysis, as these elements open up other facets of the score’s analysis (social, aesthetic, sociological and historical reflections; Cavalcanti and Carvalho, 2011).
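A minimal sketch of how such a model might be expressed as a record structure, assuming Python dataclasses. The field names are hypothetical and only loosely follow the elements listed above. Recording each creator under an explicit role, rather than a generic ‘author’ label, also makes gaps in a record easy to detect.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ScoreRecord:
    """Hypothetical record layout loosely following the elements listed by Riley and Dalmau (2007)."""
    title: str = ""
    creators: dict[str, str] = field(default_factory=dict)  # role -> name, e.g. {"composer": ...}
    dates: dict[str, str] = field(default_factory=dict)     # e.g. {"publication": "1905"}
    cover_art: str = ""
    subjects: dict[str, str] = field(default_factory=dict)  # lyric language, form, genre, style
    instrumentation: list[str] = field(default_factory=list)
    geographic_info: str = ""

    def missing_elements(self) -> list[str]:
        """Return the names of elements left empty, i.e. gaps an indexer would still need to fill."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


record = ScoreRecord(
    title="Clair de lune",
    creators={"composer": "Claude Debussy"},
    dates={"publication": "1905"},
    instrumentation=["piano"],
)
print(record.missing_elements())  # ['cover_art', 'subjects', 'geographic_info']
```

Listing the empty elements makes incomplete metadata, one of the difficulties Dougan (2006) identifies, visible to whoever maintains the record.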
The theoretical basis discusses important points in the literature on indexing musical scores that have been under discussion since the beginning of the 21st century, including the definition of scores, what the thematic and descriptive elements are and their meanings, access to and availability of scores on the Internet, how the indexer makes choices for thematic representation and the procedure itself.
Methodological procedures
This research combined exploratory and descriptive analyses, with the purpose of observing the process of indexing musical scores in digital environments. To investigate the representation and retrieval of musical scores, the study was divided into three stages: (1) the theoretical foundation, derived from bibliographic research in the published literature; (2) the selection of an analysis matrix, that is, the set of digital environments in which musical scores are stored and made available, and the creation of categories for analysis; and (3) data collection and analysis through the search and retrieval of scores.
In the theoretical foundation stage, a bibliographic search was carried out in several databases: BRAPCI – Database in Information Science, Directory of Open Access Journals, Google Scholar, and Library, Information Science and Technology Abstracts. The search terms were ‘indexing’ AND ‘musical score’; ‘organization’ AND ‘musical score’; ‘representation’ AND ‘musical score’; and ‘musical score’. The time frame was 2000–2023, although some earlier studies that proved important, such as McLane (1996), were also included.
In the second stage, the selection of the analysis matrix and the creation of the categories were carried out together as they are interlinked. Dubnjakovic’s (2009) study developed a list of websites divided into categories, which was incorporated into the study as a way of deepening the research and analysis of the repositories (see Table 1). The categories drawn up by Dubnjakovic (2009) are: (1) individually produced websites; (2) library websites; (3) professional society and organization websites; and (4) company-sponsored websites.
Table 1. List of repositories analysed in the research.
The repositories listed in Table 1 were chosen through observation while exploring the Internet. For each, the following aspects were analysed: its categories and subcategories and the access they give to scores (based on the search and retrieval of scores), and the representative terms assigned. Of the 17 repositories selected, two were unstable, which prevented analysis of their search results; however, the organizational structure of their interfaces was noteworthy enough to be discussed in the results.
The third stage analysed the results of searches carried out in these information environments to verify how the scores were retrieved. The search term used was ‘Clair de lune’, the third movement of Claude Debussy’s piano suite Suite bergamasque, published in 1905.
Results
The repositories were analysed and divided into the four categories developed by Dubnjakovic (2009): individually produced websites and library websites had five repositories each; professional society and organization websites had four; and company-sponsored websites had three. To analyse how score representation and retrieval took place, we examined the categories and subcategories the repositories made available (see Figure 1), the representative terms attributed in the metadata (see Table 2), and the relationship between the general categories presented by the repositories and the categories developed by Dubnjakovic (2009; see Table 3).

Figure 1. Most frequent categories and subcategories.
Table 2. Representative terms identified in the repositories.
Table 3. Relationship between general categories and types of repository.
Note. LCSH: Library of Congress Subject Headings.
In total, 133 categories and subcategories were found in the repositories analysed. Of these, 11 appeared in at least three different repositories. The ‘composer’ category, for example, appeared in nine repositories and can therefore be regarded as an essential element, given how many repositories used it for retrieving scores. ‘Title’, ‘genre’, ‘format’ and ‘author’ appeared in five repositories and can likewise be regarded as essential categories. The ‘author’ category, however, raises an issue: it is presented generically, so it is not possible to know whom ‘author’ refers to, for example the composer, lyricist, editor or arranger. The same applies to the ‘name’ category, which does not appear in Figure 1 because it was used in only two repositories. The two categories could be considered synonymous, but their lack of specificity makes it impossible to determine what role the person responsible for the score played, and makes categorization difficult.
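The frequency criterion used above (a category counts as frequent when at least three repositories use it) can be sketched as follows. The repository names and category lists are invented for illustration and are not the study’s data; the case-insensitive exact matching of labels is also an assumption.

```python
from collections import Counter

# Hypothetical category lists for four repositories (NOT the study's data).
repository_categories = {
    "repo_a": ["Composer", "Title", "Genre"],
    "repo_b": ["composer", "Format", "Author"],
    "repo_c": ["Composer", "Genre", "Name"],
    "repo_d": ["Title", "Genre", "Instrumentation"],
}


def frequent_categories(repos: dict[str, list[str]], min_repos: int = 3) -> dict[str, int]:
    """Count how many repositories use each category (case-insensitive, once per
    repository) and keep only those used by at least `min_repos` repositories."""
    counts: Counter = Counter()
    for categories in repos.values():
        # A set ensures a category is counted at most once per repository.
        counts.update({c.strip().lower() for c in categories})
    return {name: n for name, n in counts.most_common() if n >= min_repos}


print(frequent_categories(repository_categories))
```

Because labels are compared exactly, ‘author’ and ‘name’ are counted as separate categories here; merging them would first require deciding which role each one denotes, which is the ambiguity discussed above.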
A similar situation occurred with the categories ‘genre’ and ‘subject’, which were also used by relatively few repositories. ‘Genre’ can vary according to the context in which it is used, owing to the complexity and scope of the term. In this study, as a category, it refers to musical styles (jazz, samba, rock, classical, etc.). On closer analysis, it proves similar to ‘musical style’, another category that does not appear in Figure 1 because few repositories used it, but which is used in the same way as ‘genre’. The categories ‘subject’ and ‘theme’ can also cause confusion, as they may be used as synonyms for ‘genre’ and ‘musical style’ at the time of representation. This can be observed when searching for Claude Debussy’s ‘Clair de lune’ in the following repositories: Music Editions, Music Scores, Mutopia, São Paulo University Collections and SuperPartituras, where the terms in the ‘subject’ category are almost the same as those assigned to ‘genre’. The studies by Smiraglia (2001) and Dougan (2006) support this observation, explaining that ‘subject’ can vary according to the structure of the musical score and the professional’s level of knowledge, which can cause discrepancies and variations in representation.
In order to compare the categories and attributed elements, the score records were accessed to check which representative terms were attributed to the ‘Clair de lune’ score (see Table 2).
Table 2 shows that only seven repositories, fewer than half of those analysed, had representative terms. São Paulo University Collections had the most terms (3): ‘instrumental’, ‘music’ and ‘five pages’, followed by the Gutenberg Project (2) with ‘American drama’ and ‘text’. The remaining repositories had only one term each: SuperPartituras with ‘classical (genre)’; Mutopia Project with ‘classical and modern (genre and period)’; Sheet Music Catalog with ‘piano music’; ABC Music with ‘popular (music style)’; and Music Scores with ‘weddings (genre)’. The assigned terms show both similarities and differences, some being very specific (‘weddings (genre)’, ‘American drama’, ‘five pages’) and others more generic (‘popular (music style)’, ‘music’). The choice of these terms raises the question of whether controlled vocabularies were used, although their use is an essential stage of the indexing process (International Organization for Standardization, 2011; Lancaster, 2003), and highlights the challenges that the procedure poses for the person responsible for indexing (Faria, 2009). Nevertheless, establishing decision-making guidelines and using metadata can help define which elements to record, given the completeness of metadata and its capacity to provide adequate representations (Sasser, 2009).
In order to verify the relationship between the most frequent categories in the analysis matrix and the website categories developed by Dubnjakovic (2009), the data was cross-referenced (see Table 3).
The italicized categories in Table 3 appeared most frequently, and Figure 1 shows that all types of repository used them. This suggests that the way guidelines and objectives are established can affect the entire development of a repository and its categories.
Company-sponsored websites had the repositories with the fewest categories, but those categories were more musically specific, such as ‘jazz’, ‘blues’ and ‘classical’, which are distinct music styles. Library websites, by contrast, had repositories with many categories, some of them very generic, such as ‘house’ and ‘library’. Nevertheless, they were the most complete in their choices, presenting 10 of the 11 most frequent categories. As these repositories belong to public and private universities, it is unclear whether they use additional vocabulary-control tools. Although not entirely effective, such tools help to improve representation and retrieval (Dougan, 2006; Smiraglia, 2001).
Determining what each category covers can reduce flaws in representation and confusion for netizens trying to retrieve a score. Therefore, to improve this type of digital environment, those in charge of the process need in-depth knowledge of both music and indexing, so that they can combine the two areas (Faria, 2009; Souza and Souza, 2014). To this end, establishing decision-making guidelines, covering everything from a repository’s information architecture to the function of each category and the sub-processes of indexing (International Organization for Standardization, 2011; Lancaster, 2003), can make repositories more efficient at representing and retrieving scores. These choices can be formalized as policies and instructions and made available both to those in charge of the process and to netizens.
An analysis of the literature and the selected categories shows that themes and standardization are essential for score representation and retrieval. The representative elements proposed by the authors in the theoretical background that correspond to the categories analysed in the repositories were ‘title’ (Cavalcanti and Carvalho, 2011; Riley and Dalmau, 2007; Souza and Souza, 2014; Torres Mulas, 2000), ‘composition’ (Souza and Souza, 2014), ‘composer’ (Dubnjakovic, 2009; Riley and Dalmau, 2007), ‘author’ (Cavalcanti and Carvalho, 2011), ‘instrumentation’ (Souza and Souza, 2014), ‘collection’ (Cavalcanti and Carvalho, 2011; Riley and Dalmau, 2007), ‘genre’ (Cavalcanti and Carvalho, 2011), ‘format’ (Cavalcanti and Carvalho, 2011) and ‘subject’ (Riley and Dalmau, 2007). These elements show that there is no complete consensus on how many categories, or which ones, should be considered in the indexing process. Some authors present a larger number (Cavalcanti and Carvalho, 2011; Souza and Souza, 2014), while others present a smaller number (Riley and Dalmau, 2007; Torres Mulas, 2000), focusing on the title and on those responsible for developing the score, respectively.
The analysis of the repositories showed that all of the categories, to different degrees, resemble, differ from or complement the descriptive or thematic elements identified in the literature (Cavalcanti and Carvalho, 2011; Dubnjakovic, 2009; Riley and Dalmau, 2007; Souza and Souza, 2014; Torres Mulas, 2000). Each repository was developed with its own objectives to reach a target audience. Despite these positive aspects, the attribution of representative terms can be considered scarce and ineffective, given the repositories’ low adherence to the procedure, as evidenced by the lack of terms in their records. The representative terms fell at two extremes: some were too generic (‘music’, ‘text’), others too specific (‘American drama’, ‘weddings (genre)’). The few repositories that did assign terms raise the question of whether any vocabulary-control tools are used. This contrasts with the library website repositories, whose categories suggest the use of a controlled vocabulary.
Conclusion
The investigation of musical scores in different digital environments has provided insight into how these environments perform in search and retrieval, the representative terms they assign and their accessibility.
First, the analysed environments provide no explicit information on how they function, how they are accessed or how they carry out indexing. Although browsing them empirically offers some understanding, it can be a significant challenge, and problems recur when searching for musical scores, especially for users without knowledge of indexing or music. The analysis shows that only a small part of the sample assigned representative terms, ranging from the most generic to the most specific, which raises the questions of whether vocabulary-control tools are used and why the other repositories do not use them.
Both the literature and the results point to professionals’ lack of knowledge and to problems of structure and stability in these environments. Developing guidelines and using metadata can be solid solutions for improving the representation and retrieval of musical scores.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
This research was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior and National Council for Scientific and Technological Development.
