Abstract
The Oral History and Folklore Collection at the National Library of Australia is a research archive: a substantial collection of unpublished audio recordings, of varying length and levels of documentation, created by and for researchers. These recordings are often extensive and difficult to navigate; even so, they are a much-appreciated resource for a wide variety of users, from family historians to professional writers. Following a long-running preservation program, the majority of the collection has been digitized to archival standards, and user copies have been made which preserve access to the primary document itself, that is, the voice. To allow users to discover and access the primary source sound recording, and to locate specific content within those recordings, the National Library of Australia has developed a time-linked search and delivery system, together with a schema that enables the content to be searched by the Library’s innovative aggregation platform, Trove (http://trove.nla.gov.au). The Library’s Audio Management and Delivery System currently makes nearly 7,000 hours, or 13 percent, of the Oral History and Folklore collection publicly available, and content continues to be added. This paper describes the development of that system and considers the nature of innovation within a library environment.
Introduction
Innovation, far from a sudden burst of inspiration and implementation, is most often realized in a continuous process, building on capabilities and drawing together existing technologies in novel ways. For all its radical change to work practices, innovation in libraries is rarely disruptive in the way Clayton Christensen described it (Christensen 1997). Innovative approaches in libraries are more commonly evolutionary, the resultant change most often characterized by being the latest step in a line of sustained innovative work. The innovative online delivery of the NLA’s Oral History and Folklore Collection is possible only because it is on the long tail of sustained innovation within our organization. That innovation has produced the acclaimed Trove service, a platform of discovery technology which interrogates much more than just bibliographic records and makes widely available large swathes of other materials, such as newspapers, for which content is the principal discovery avenue. This platform is itself built on a wide range of innovative digitization programs, digital library technologies and digital management systems. In the project planning and thinking to develop an audio delivery system for oral history, it was clear that to make an impact on users, it must take advantage of the granular discovery capabilities of Trove, and deliver the primary source material online. The resulting Audio Management and Delivery system (AMAD) enables discovery within the content of the oral history recording from a global search, allows for searching within a single, often extensive, interview, and returns and plays the identified piece of audio to provide users with the full experience of the primary source. It also provides innovative tools to support research and publication.
The National Library of Australia collection
The National Library of Australia (NLA) holds the world’s largest collection of material relating to Australia and the Australian peoples, ranging from the earliest European speculation about the Great Southern Land to the most recent publications. The NLA’s collection comprises more than 10 million items, ranging from books and journals to paintings, photographs, personal and organizational archival records, maps, music, ephemera, electronic resources and oral history and folklore recordings.
The International Association of Sound and Audiovisual Archives (IASA) describes research archives as those “whose holdings include collections of unique and usually non-commercial recordings, such as field recordings, curated primarily for research purposes.” (International Association of Sound and Audiovisual Archives 2014). The NLA’s Oral History and Folklore collection comprises some 50,000 hours of unique, non-commercial and almost entirely field-created recordings, the largest such collection in Australia. The collection comprises three strands: extended biographical interviews with Australians who have achieved a national reputation in some field of endeavour; social history projects which document particular experiences or aspects of Australian life; and folklore recordings of the vernacular creative outputs of communities, be they Indigenous, recent or earlier migrant groups. Typical of such ‘research archives’, it comes with many constraints and encumbrances, including unpublished unstructured works, raising rights and ethical issues, having potentially libellous content, limited documentation and other concerns that accompany managing and preserving heritage sound and audiovisual collections.
The Library’s early enthusiasm for including oral history recordings in its collection, when the discipline was still in its infancy, was largely due to the then National Librarian, Sir Harold White. He conceived of libraries as democratic institutions, documenting a more socially inclusive history, and this philosophy dovetailed with the burgeoning ideals of oral history. While in the United States in 1950, he met Columbia University’s Allan Nevins, who had established one of the first oral history programs in the previous year (York 2001:186). Back in Australia, White’s interest led to the acquisition of the earliest oral history and folklore recordings from pioneer practitioners. Later the NLA began to commission interviewers and recording projects, establishing more formal relations with interviewers and supplying them with recording technology that met standards for quality and preservation. The NLA departed from the practice at Columbia in deciding that the audio recording was the primary document, not the transcript. Although some 30 percent of the oral history collection has an accompanying transcript, the audio was to be retained and time was invested in its preservation and management. The voices captured in the NLA’s collection include those of people born more than 150 years ago, whose personal reminiscences are of times and events now beyond living memory. For example, W.M. “Billy” Hughes (1862-1952), the seventh Prime Minister of Australia, and to this day the longest-serving member of the Australian Parliament, airs his views at http://nla.gov.au/nla.oh-vn918290, and Australian poet Dame Mary Gilmore (1865-1962) can be heard discussing her memories of 19th century Australia at http://nla.gov.au/nla.oh-vn222644. Those voices are preserved as sound recordings, captured digitally at the standards accepted in international sound archiving community guidelines (Bradley 2009).
The creation of an oral history discipline
Libraries are the primary holders of oral history recordings, and have been since the discipline began to take shape around 1950. The development of the discipline within the higher education and research sector, coupled with a community and outreach focus, made libraries natural partners. Many national libraries hold oral history collections and maintain relationships in the oral history community. A significant feature of library outreach of the past two decades has been the focus on the digital environment, making collections available online in keeping with user needs, community expectations, and opportunities of the current information infrastructure. There have been many notable and now familiar enterprises to develop a set of linked up services that enable users to find the content they need, with the most significant successes in image- and text-based materials. Libraries and the oral history practice community have been grappling with the issues raised by placing audio content online, and considering processes that will support online access to both recently created digital recordings and legacy collections created under the old delivery paradigms. There are fewer library-based examples of successful online audio collections compared with those for print and static images. It is salutary to note the formation of Europeana Sounds, a European Community-funded project to improve the discovery of and access to Europe’s rich digital sound and music collections, which it recognizes as the “missing fifth domain” (Europeana Sounds 2014).
This concern to make audio research collections visible and accessible is not limited to libraries. There is regularly a similar lament amongst the oral history community that collections of oral history recordings are underutilized. Most digital access projects are motivated by a desire for increasing access and deeper engagement (Kaufman 2013:3), while researchers defensively respond that the lack of transcripts makes working with such recordings all but impossible (Bradley 2013). A fundamental constraint that is the common ground between these issues could well be the distinguishing properties of the audio items themselves.
The nature of audio documents
Sound recordings are time-based media; time is an integral part of their creation and consumption and this is one of the fundamental distinctions between audio recordings and text and static image documents. These time-based media provide significant benefits to users; however, they also bring challenges in delivering useful and relevant content and complexity in integrating it with other materials in our information abundant environment. To effectively use oral history recordings requires commitment to that time constraint. This constraint has real implications for how researchers have worked with such materials, and time dependency consequently influences system design to enable efficient use and in measuring the performance of the system.
Oral History and Folklore recordings are sound documents by their nature, rich in information and containing far more valuable insights than a transcript can ever portray. The sound recording is the primary document. The written transcript fails to convey the complete meaning inherent in a recording and may sometimes be misleading. Sarcasm and irony, laughter, silences and vocal explosions all provide meaning. The voice conveys history in its delivery and accent, background noises convey place, and the associated audio quality conveys space.
Author David Foster was commissioned to prepare a book based on interviews with writers in the NLA’s Oral History collection, recorded by Hazel de Berg. Counter to his preferred approach, he was obliged to read the transcripts before being in a position to listen to the recordings, as at that time he had to wait while copies were made. He came to appreciate what seemed to be a contradictory sequence, writing that it made him “more fully appreciate the value of an oral history collection.”
He went on to passionately describe the distinction: “In reading the transcripts I was doing what most of us do when we research anything; in listening to the tapes, I had the illusion I was in the presence of living beings…The fifteen transcripts, though accurate as transcripts go, were much of a muchness physically … On the other hand, when I actually heard the tapes for the first time, I met fifteen human beings. And now I can’t read the transcripts as I read them initially, for I see people now. Somehow listening to a voice facilitates the process of visualization. The words, to be sure, are the same, but my interpretation of those words has changed.”
And elsewhere he restates his Damascene moment and writes powerfully: “I don’t want to labour the point, but a living voice is a landscape of which the transcript, in print, is the merest map” (Foster 1991:2-4).
Foster’s passion for the nuance and detail embedded in the voice in an oral history recording reflects the view of the NLA’s curators. We have always provided access to audio recordings to reading room users; we reasoned that not to do the same online would be a diminution of service, and so resolved to implement a system which would deliver audio online. But while developing a system to deliver the primary source audio, the NLA recognized that, for the audio to be genuinely usable, it would also need to be discoverable within the existing and expanding discovery infrastructure, which is at the core of library innovation. In effect, the delivery system would need to put these unstructured sound recordings on an even footing with more traditional sources of information which can be interrogated by available discovery systems. It would need to build something new on the existing innovative discovery systems.
Access and rights management
Before oral history can be published online, consideration must be given to rights and permissions issues. The manner in which oral history recordings have been accessed in libraries and used for research is grounded in the development of the discipline, whose formative years predated the massive availability of digital content. Early oral history projects assumed naively, or perhaps paternalistically, that the interviewers’ aims were the principal ones, and the interviewees were subservient to those intentions. However, as the discipline began to mature, and interviewers considered the rights of those they interviewed, rights agreements which allowed interviewees to control access to their recorded reminiscences became standard practice. These rights agreements and the constraints they imposed on users caused little or no difficulty in the pre-online world of libraries. The audio recording could only be used in the reading room, or perhaps via inter-library loan. Low demands from collection users meant they could be supervised and the access conditions honoured. Listeners huddled over tape players with headphones making notes, or read those transcripts that existed, seeking to imagine the voice in the typed display, link it to the tape counter or estimate the time passed, all the while watched over by reference librarians.
Rights agreements were built on the assumption that collection material would only be made available in a reading room, that the practicality of copying tapes would limit the circulation of content, that only ‘researchers’ would use the collection, and that all shared a common understanding of the difference between the published and unpublished. In the online environment this model has almost no currency: ‘reading rooms’ – neither rooms nor for reading in this case – are now virtual entities sharing attributes of the physical and online; replication and distribution are incorporated into the act of providing access; and the distinction between published and unpublished is blurred by the act of delivering library materials to users. Bona fide researchers may have once been vetted as a requirement of access; now anyone who wants to use the collection may do so online.
Not all of the National Library’s oral history recordings had rights agreements in place. Where rights agreements existed, they did not foresee the development of online access. There is nothing especially innovative about how the NLA developed a solution to the problem of providing access to collections with uncertain or inappropriate rights conditions; just careful assessment, the need to embrace low and sometimes medium level risk, and the ability to retreat from any decision if necessary. Collections were assessed, opinions sought, and discussions held to establish an understanding of the intention of those who contributed to, or had an interest in, the content of recordings.
In some circumstances the Library might manage risk and place an item online even where the rights were not clear, if it was decided this was the best course to take. It is a fine balance; though there is a desire to make material available, there is an equal aspiration to ensure curators keep faith with those who were interviewed, and with any agreements that were established.
For several years the Library’s rights form has included a question asking the interviewee: “Do you permit the Library to provide access to the material on its website so that people can listen to it, make notes about it or download a copy for personal use only?” For material placed online which predates this agreement, our policy is that if the Library were to be challenged regarding online access we would implement ‘takedown’ action, to immediately restrict access while the claim is investigated. After investigation, a decision would be taken whether to reinstate online access, or leave it closed. Only a handful of objections have been raised in the six years since we made audio available online. Interpretation and management of rights has not been the complex or contentious problem we expected, perhaps because people being interviewed for the National Library’s collection are very often motivated to participate, with a desire to tell their story and make it public. The increased audience which online access brings is often welcomed by those interviewed, and the families of those interviewed many years ago are overwhelmingly pleased that their forebears’ recollections are publicly available.
Providing access to the voice in oral history recordings
The increased accessibility of oral history and the wealth of discoverable detail not infrequently results in novel and unexpected subjects for research. It lends weight to the adage that no single mind can imagine all the questions that will be asked of the collection. Oral history recordings are discursive, covering a range of topics beyond the particular subject being investigated, with much other social, cultural and historical information embedded in their conversational form. Interviewees from unrelated projects, recorded at different times, will express different views of the same event or social understanding. Indeed, one of the strengths of oral history is its multi-vocality.
The first step in developing a discovery and delivery mechanism for the granular content within the oral history collection was to find a language that would allow identification of content within a time-based sound recording. This was achieved by developing an XML (eXtensible Markup Language) schema, using Text Encoding Initiative (TEI) Lite, which allows the identification of keywords and topic summaries linked to time code information within the sound recording. The schema itself is agnostic of the type of text included, and so the addition of full transcripts was easily achieved in 2012. Though the Library’s delivery system does not yet support video, the schema should present no impediment to achieving the same capability with different time-based media.
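The idea of linking topic summaries and keywords to time codes can be illustrated with a minimal sketch. The XML fragment below is invented for illustration only: the element and attribute names (`seg`, `start`, `end`, `term`) and the sample text are assumptions, not the Library’s actual TEI schema or collection content. The sketch shows only how time-coded summary segments might be loaded and searched so that a keyword hit returns a time range in the recording.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Element and attribute names are
# illustrative assumptions and do not reproduce the Library's schema.
SAMPLE = """<text>
  <body>
    <div type="summary">
      <seg start="0" end="95">
        <p>Childhood in Melbourne; early schooling.</p>
        <term>Melbourne</term>
        <term>schooling</term>
      </seg>
      <seg start="95" end="240">
        <p>First job at the public library.</p>
        <term>libraries</term>
      </seg>
    </div>
  </body>
</text>"""


def load_segments(xml_text):
    """Return a list of time-coded summary segments (times in seconds)."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("seg"):
        segments.append({
            "start": float(seg.get("start")),
            "end": float(seg.get("end")),
            "summary": (seg.findtext("p") or "").strip(),
            "keywords": [t.text.strip() for t in seg.findall("term") if t.text],
        })
    return segments


def find_segments(segments, query):
    """Case-insensitive match against summary text and keywords."""
    q = query.lower()
    return [
        s for s in segments
        if q in s["summary"].lower()
        or any(q in k.lower() for k in s["keywords"])
    ]


if __name__ == "__main__":
    for hit in find_segments(load_segments(SAMPLE), "libraries"):
        print(hit["start"], hit["end"], hit["summary"])
```

Because each hit carries its own start and end time, the same structure would serve summaries and full transcripts alike, which is consistent with the schema being agnostic of the type of text it holds.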
Only one third of the National Library’s oral history interviews are transcribed, as transcription is an expensive process. However, we require documentation in the form of interview summaries to accompany all new recordings and are undertaking work to summarize selected older collection material. The creation of the summary information was incorporated into the normal workflow of interviewer creation and submission of recorded interviews. To populate our time-coded summaries we asked interviewers to summarize interviews as part of the documentation process.
Initially interviewers were required to create summaries as tables in Word documents, which were then transcoded into TEI XML. This proved to be very time consuming and subject to structural errors as our ever creative interviewers found new and previously unimaginable ways to place data within a table. Library staff subsequently developed a password-protected online summary tool which interviewers can use remotely to populate the summary fields with relevant topic and keyword information, while listening to the audio. The online tool specifies what form that data must take and includes guidance and instruction on how to enter information. Once the content is completed and submitted, it is checked and accepted, based on compliance with our style guide, after which the TEI encoded XML output is created and exported for ingest into the Library’s Collection Management System.

NLA online Summary Tool (password protected).
The collection management system (CMS), which was developed by NLA staff, manages, among other things, the sound and audiovisual collection preservation processes, and currently 80 percent of the Library’s 50,000 hours of Oral History and Folklore materials has been preserved to IASA TC04 standards (Bradley 2009), having both preservation master copies and low and medium bandwidth access copies. The CMS produces a package of data in METS (Metadata Encoding and Transmission Standard) which contains the audio and any textual information, including both transcript and summary where they exist, and presents it to the Audio Management and Delivery System (AMAD).
The delivery system is a server-side solution, meaning that the functionality of the system is delivered by the Library and streamed to the user’s browser. The first iteration of AMAD used Adobe Flash exclusively to link audio and text content, as at the time, standard HTML4 did not include appropriate features for replay of audio within browsers. In the ensuing clash of software developers, it became difficult to use Flash on the iOS platform. In 2012, when AMAD was redeveloped to incorporate transcripts, we took advantage of the capabilities promised in HTML5 to make the process less dependent on proprietary software and hopefully usable on popular portable devices. During testing it became apparent that not only did users have old versions of browsers, but, with some few exceptions, even the latest browsers did not fully support HTML5, although HTML5 was necessary for iOS devices. To synchronize the summary, transcript and audio on all devices, our solution was for AMAD to interrogate the browser and make decisions about how to deliver the content.
Leveraging investment in transcription
Transcription of NLA interviews is provided by a commercial contractor. Voice recognition software did not deliver the required quality, especially with the variety of accents and speaking styles represented in the collection. However, the contractor had developed the capability of precisely linking an existing transcript and the audio using voice-recognition-like software which automatically created time points for the start and finish of each word. These were very large documents, but being automated, were cost-effective to create.

TEI text from TRC 6449 Martin Green interviewed by Peter Pockley.
User preference is that if a topic or word is searched for, AMAD should deliver the audio in context, with the replay of any searched word commencing slightly before the identified word to give it meaning in relation to the recording. Though it would be relatively simple to identify the time point in a recording a certain number of words before the selected one, using something like the schema above, the bandwidth required to present the extensive time-related metadata meant that the system was significantly slowed down. Instead, in an approach which had a number of innovative outcomes, the Library’s developers reduced the timing information in the transcription to only that which would identify the paragraph’s beginning and end. AMAD takes the duration of the paragraph and the number of words it contains, estimates the position of the selected word, and commences playing at a specified place before it. This reduced the requirement for detailed metadata, increased delivery speed and also reduced the cost of creating transcripts. With a granularity of coding at the level of the paragraph, but with the delivery system enabling delivery at what appears to be the word level, a much simpler level of coding is possible. One of the transcription contractors developed a simple transcribing program which allows a transcriber to manually insert time points at the beginning and end of paragraphs. Though the software is not publicly available, this innovation enables small-scale, more cost-effective transcription services to undertake the XML encoding work.
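The paragraph-level approach can be sketched as a simple interpolation. The function below is an illustration under stated assumptions, not AMAD’s actual implementation: it assumes words are spread evenly across the paragraph’s duration, and the name `playback_start` and the `lead_words` parameter are hypothetical.

```python
def playback_start(para_start, para_end, words, target_index, lead_words=5):
    """Estimate where to begin playback for a searched word.

    Only the paragraph's start and end times (in seconds) are known.
    Assumption: words are evenly spaced across the paragraph, so each
    word is allotted (para_end - para_start) / len(words) seconds.
    Playback begins `lead_words` words before the target, clamped to
    the start of the paragraph.
    """
    if not words:
        raise ValueError("paragraph contains no words")
    if not 0 <= target_index < len(words):
        raise IndexError("target_index is outside the paragraph")
    seconds_per_word = (para_end - para_start) / len(words)
    start_index = max(0, target_index - lead_words)
    return para_start + start_index * seconds_per_word


if __name__ == "__main__":
    paragraph = ["word"] * 30  # a 30-word paragraph spanning 100 s to 160 s
    print(playback_start(100.0, 160.0, paragraph, target_index=20))
```

The estimate is only approximate, since speakers pause and vary their pace, but starting a few words early absorbs much of that error, and the time-related metadata shrinks from two time points per word to two per paragraph.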
As Christensen (1997) points out, to ask a stakeholder what innovation they want usually invites a description of a better version of what they already have. True innovation is to supply what they didn’t know they needed, but which, once received, appears to be precisely what the user wanted all along. Researchers always indicate that they want more transcripts. Putting aside the expense, this was not a preferred outcome because of our unwavering belief that the transcript is merely a guide to the richer information held in the audio recording. At the heart of AMAD’s innovation is to have combined the audio with the interview summary, or a transcript when it exists, and to deliver this linked content to the user. Additional benefits include making more untranscribed audio material searchable through available summaries, using the same schema and delivery mechanism and so opening up a research collection to users, wherever they may be located.
The use of timed TEI summaries to provide wide access to content which would otherwise be undiscoverable has a concrete benefit in times of constrained funding. Oral history interviews are made discoverable even if there is only a minimal catalogue record without crafted MARC summaries. This outcome has the effect of changing the traditional relationship between the catalogue and the item; it is the item that is itself discoverable rather than the catalogue record. The summary has no prescribed terminology and no agreed thesaurus, and instead relies on the capability of a Google-like search infrastructure. This is the most ‘disruptive’ aspect, in Christensen’s terms, of the work done to achieve AMAD’s functionality, albeit disruption which is internal to the Library and invisible to the user. In all other respects our innovative work is an evolutionary development on existing systems.
Rights, Trove and Google
The decision to have a take-down policy as one of the pillars of our risk-managed access approach has had an impact on how material might be made available. The Library's aggregation service, Trove, indexes the collections it aggregates, and through an API (application programming interface), allows third-party search engines to likewise index the data. If unconstrained, this data would include the transcripts and detailed summaries which link to the audio materials. Since a requirement of our risk-managed approach was that we could take down content, it is not desirable to share indexed content beyond Trove. A search within Trove will discover content in the uncatalogued information in the summary and transcript, carry the search term through to the audio delivery system and list the results in the item-specific AMAD search tool. A search in Google will only show the information exposed in the Library's catalogue. However, we can be sure that any material that we are required to ‘take down’ will not remain indexed in search engines which are beyond our control.
Successes
Citing the sound
The ability of our delivery system to identify segments of the audio also allows the user to cite that point in the audio using a web address (URL), to paste it into their electronic publication and so allow the reader to connect to that audio segment in the Library’s delivery system. For example, http://www.nla.gov.au/amad/nla.oh-vn457054/0-849~0-1062 provides a connection to the interview with the Chief Librarian of the Commonwealth National Library from 1928 to 1947, Kenneth Binns. By clicking on that link or entering it into a browser, the audio delivery system plays the segment in which Binns describes, in his 1967 interview with oral history pioneer Hazel de Berg, how he organized, transferred and re-accommodated 60,000-70,000 books from Melbourne to Canberra at the time of the creation of the first Parliament in the nation’s new capital (Binns and De Berg 1967).
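As a rough illustration of how such a citable address could be taken apart, the sketch below splits the example URL into an item identifier and a start and end point. The interpretation of each point as a pair of integers (for instance a session or file index followed by an offset in seconds) is an assumption made for this example; the paper does not document the address format, and `parse_citation` is a hypothetical helper.

```python
from urllib.parse import urlparse


def parse_citation(url):
    """Split an AMAD-style segment address into its visible parts.

    Assumption: the final path component has the form 'A-B~C-D', where
    each side of the tilde is a pair of integers. What the integers
    mean (e.g. session index and seconds) is not confirmed by the source.
    """
    # Some extracted copies of the text carry U+223C instead of '~'.
    path = urlparse(url).path.replace("\u223c", "~")
    parts = [p for p in path.split("/") if p]
    item_id, span = parts[-2], parts[-1]
    start_token, end_token = span.split("~")

    def point(token):
        first, second = token.split("-")
        return int(first), int(second)

    return {"item": item_id, "start": point(start_token), "end": point(end_token)}


if __name__ == "__main__":
    print(parse_citation("http://www.nla.gov.au/amad/nla.oh-vn457054/0-849~0-1062"))
```

If the second number of each pair is indeed an offset in seconds, the cited segment would run for 213 seconds, but that reading remains an assumption.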

AMAD (Audio Management and Delivery System), delivering Interview with Kenneth Binns.
Use and reception of the audio delivery system
Such user feedback as we have received indicates that the system has been well accepted and statistical reports show that use is growing. Use of the not yet online part of the collection on-site in NLA’s reading rooms also remains constant. Visits to the AMAD site and page views remain in the several thousand per month, and statistical reports of time spent on the page indicate good engagement. Time-based media can only be engaged with by taking time to listen, and so a combination of unique page visits and average time on each page tells us about the core user group. This group appears to be significant in number, exceeding on-site collection users, though the statistics gathered are still in their infancy, requiring further interpretation and better understanding. What is clear is that the vast majority of users come to the collection via Google – hundreds more than do through the catalogue and Trove combined. This statistic puts pressure on us to think about the decision regarding the manner in which the summarized and transcribed information is made available to Google, and the impact that this has on our risk-management and takedown policy.
Third party innovation on the platform
Innovation doesn’t stop with the Library’s development of the delivery system, but continues with other users who can leverage their own innovations. A collaborative oral history project, Australian Generations, funded by the Australian Research Council with partners from the National Library of Australia, the Australian Broadcasting Corporation, Monash and La Trobe Universities, has built on the innovative AMAD platform. While the NLA’s Audio Management and Delivery system enables interrogation of the audio interviews and related data, this is only possible with publicly available content. The Australian Generations project is recording 1,500 hours of audio and is creating timed summaries in an instance of the Library’s summary service. Under the research agreement, as with all oral history recordings, interviewees can set the conditions under which their interviews may be used. For many of the interviews in the Generations project, permission has been given for use in research but not to make them publicly available for some years. Project researchers, however, need to work with them now.
To enable the project researchers to work collaboratively and to interrogate the audio content using the timed summary, the NLA developed an export tool which extracted data from the digital collection management system, the timed summary service, and the audio delivery system, and presented it as a METS wrapper containing the audio and summary. This package can be recognized and acted on by an HTML5-compliant Chrome browser on a local computer. The university-based project team members ingest the METS packages into Zotero, an open source research tool, which enables searching, tagging, citing and sharing of the audio content and the adding of rich and detailed commentary and information. When the research project is completed, the enriched metadata will be deposited with the rest of the interviews and related content at the National Library to provide even more detailed information to future collection users.
Conclusion
The Audio Management and Delivery System could only be developed and implemented on the back of a sequence of innovations and developments. The capability it provides to locate topics and content and replay the resulting audio is built on a foundation of innovative audio and digital collection management systems, content discovery systems (Trove) and a structured catalogue. AMAD, for all its inventive approach, does not bring a new and advanced technology to bear on the problem, but rather evolves from the (r)evolutionary work already being undertaken within the NLA and the library community generally, and recombines its components in novel and interesting ways. The innovation stems from a combination of imagining the service and the manner in which the development team implemented AMAD.
The creative destruction that accompanies a disruptive type of innovation is by and large anathema to the library community. The deep investment over many, many decades, if not centuries, in information organization provides a legacy of content whose value continues to grow in the modern information age. That is not to say that such innovations are not disruptive to the working environment of libraries: the distinction between record and content is destabilized by the new systems, which disrupts the traditional hierarchy of the library world. Rather, our innovative work allows us to compete with other sources of content by making the unique unpublished oral history and folklore collection available, searchable and usable in the same way as traditional text-based documents.
