Spatial Sound Design for Cinematic Virtual Reality

Abstract

Virtual reality (VR) has widespread applications across multiple fields, including education, entertainment, medicine and the training of soldiers and pilots through virtual simulation. Entertainment is one of the fastest-growing fields, including games, theme parks, virtual tours, live events and films. In entertainment, cinematic virtual reality (cine-VR) is fast emerging as an effective form of immersive storytelling. Spatial sound design is considered pivotal in enhancing the immersive experience of cine-VR, redefining how audiences engage with narratives. Since 2012, there has been a steady increase in interest in cine-VR; however, the scholarship on sound design for cine-VR still needs to be expanded with focused studies. This bibliometric analysis delves into the scholarly landscape to gain insights into spatial sound design research trends in cine-VR. A systematic search of the Web of Science Core Collection database revealed 910 English documents published between 2012 and 2024 (June). Information retrieved from 269 shortlisted documents was analysed using the Web of Science analysis tools, VOSviewer and RStudio. The bibliometric analysis identifies, quantifies, visualises and analyses publication trends, influential authors, key journals, prevalent keywords, co-occurrences of keywords, collaborative networks and global distribution patterns. The United States of America is the most influential country with the most publications, followed by England and Germany. Eight hundred and fifty-two authors affiliated with 352 institutions from 39 countries contributed to the 269 shortlisted publications. Stefania Serafin is the most influential author in the field and has the most publications. Aalborg University leads with the most publications. The articles from Columbia University received the highest number of citations. The most used keywords are VR, spatial sound and localisation. Cine-VR is an emerging field, and the study’s findings will help academia and industry to better understand current research trends and themes in spatial sound design for cine-VR. The findings will guide further explorations in the field.

Keywords

Immersion, 360° video, localisation, presence, 3D audio

Introduction

Virtual reality (VR) has widespread applications across multiple fields, including education, entertainment, medicine and the training of soldiers and pilots through virtual simulation. Entertainment is one of the fastest-growing of these fields, spanning games, theme parks, virtual tours, live events and films, as VR creates an immersive experience for the viewer. Though the field has only recently gained popularity, filmmakers and scholars have been experimenting with immersive experiences since as early as the 1940s. Klapholz (1991) and Griffin (2015) describe the significance of Fantasound, the multichannel sound system developed by Disney and RCA in the 1940s for the animated film Fantasia, in the pursuit of an immersive experience. In 1952, Fred Waller developed Cinerama, an ultra-wide curved screen with multiple projectors, creating a continuous projection experience around the audience (Lipton & Lipton, 2021; Reeves, 1999; Waller, 1993). However, the viewer remained away from the screen, and no interaction was possible. In 1955, Morton Heilig presented a vision of a multichannel, multisensory immersive theatre, Sensorama, and called it the cinema of the future (Bennett, 1994; Gutierrez, 2023). A decade later, Ivan Sutherland (1965, 1968) presented a concept, followed by a prototype, of a head-mounted three-dimensional display called the ultimate display, more popularly known as the Sword of Damocles because the headset hung from above. Since then, several scholars and inventors have contributed to making a portable VR headset. At the same time, scholars also investigated interactive narratives in computer-generated imagery (CGI) and game environments (Aylett, 1999; Aylett & Louchart, 2003; Laurel, 1991; Mateas, 2001). The turning point came in 2012, when Palmer Luckey presented a portable head-mounted display (HMD), the Oculus Rift, and opened the possibility of taking VR to the masses (Harley, 2020). Still, HMDs and overall VR technology remained complex and expensive for end users.

The year 2014 is considered a year of transformation for VR. Facebook (now Meta) acquired the Oculus VR company (Stuart, 2021). Google launched Cardboard, a USD 10 stereoscopic viewer for smartphones, along with mobile applications that made VR content accessible on smartphones. Subsequently, Sony, HTC and Samsung also accelerated research on VR headsets. These developments also created the possibility of immersive storytelling in VR. Filmmakers across the globe were exploring VR for cinematic experience. Arora and Milk successfully created Clouds Over Sidra (Arora & Milk, 2015), a 360° VR film about the Syrian refugee crisis. The film narrates the story from the point of view of Sidra, a 12-year-old girl living in a refugee camp. The film underlines the refugee crisis and showcases cine-VR’s possibilities. In 2016, YouTube VR, a dedicated version with an interface for VR devices, was launched. However, the filmmaking community took note of the medium when ‘Carne y Arena’, a VR experience by Iñárritu (2017), received a special achievement Oscar. The VR experience highlights the plight of migrants crossing the Mexico border to enter the United States of America (USA). It also drew the attention of scholars, who conducted studies on the quality of immersion (Adelman, 2019), the use of realistic sound, image and other sensorimotor sensations (Raessens, 2019), and the effective use of immersive storytelling tools (Zacarias, 2024). The rapid growth of film festivals, conferences, publications and courses dedicated to immersive filmmaking in VR over the last decade highlights the growing interest in the field.

This study aims to gain critical insights into research trends and provide an overview of spatial sound design for cine-VR. We searched the Web of Science Core Collection (WoSCC) database and shortlisted relevant publications. The bibliographic data (citations of countries, regions, institutions, authors, study categories, keywords and references) were extracted for the identified publications. Using the analysis method discussed in the methods section, this study conducted a descriptive analysis. Data visualisations generated with open-source software display the results graphically, enhancing comprehension. This study aims to answer the following research questions.

RQ1: What is the growth of annual scientific production in research on sound design in cine-VR?

RQ2: Who are the most productive and relevant authors, countries and institutions working in this field?

RQ3: How are authors, countries and institutions collaborating across the globe?

RQ4: What are the frequent keywords and primary themes of research in the area?

RQ5: What are the future directions and possibilities of sound design in cine-VR?

This study is structured as follows to answer the stated research questions. The second section provides the literature review and the third section explains the study’s methodology, followed by the results in the fourth section, the discussion in the fifth section, limitations and future directions in the sixth section and the conclusion of the study in the seventh section.

Literature Review

Cinematic VR is variously referred to by different authors as cine-VR, CVR, VR films or 360° films. There is also a debate among scholars on the definition of cine-VR and what constitutes it. Some definitions include 360° video or CGI experienced through an HMD (Kjær et al., 2017; Nielsen et al., 2016); cine-VR work contains some interactivity even if limited to just choosing a field of vision within the 360° environment (Vosmeer & Schouten, 2017); live-action work filmed using a panoramic or omnidirectional (360°) camera (Reyes, 2018; Ross & Munt, 2018); the ‘Cinematic’ label relates not just to the high-end nature of the format’s image quality but to its narrative leanings (Ross & Munt, 2018). While the debate continues to develop with the medium, Mateer’s (2017) statement, ‘VR with media fidelity approaches found in the feature film’, is appropriate because it is not limited to a form or technology but instead describes the approach. Based on the level of interaction and storytelling, there are two possible applications of VR in entertainment: interactive (games/movies) and linear (experiential, storytelling/cinematic). With some level of interactivity but a fixed outcome of the story, cine-VR sits somewhere between games and screen-based cinema (Breslin et al., 2017). Dooley (2021) argues that 360° videos also allow viewers to choose the vantage point of the story and should be included in cine-VR. This study therefore includes articles related to 360° video in the analysis.

In screen-based cinema, filmmaking language and grammar developed over the years around the two-dimensional visual frame and audiences’ fixed orientation towards the screen. In cine-VR, the viewer has the freedom to look in any direction in 360° video. This freedom presents a challenge to filmmaking and exhibition conventions. Scholars argue that the filmmaking conventions of screen-based cinema must be reimagined for 360° cine-VR (Chan, 2023; Erkut, 2017; Gödde et al., 2018; Guaraná, 2024; Reyes & Zampolli, 2017). Since the viewer is part of the cinematic world and can interact with the environment, her role and level of interactivity need to be defined at the ideation stage to create a sense of immersion and presence (Cho et al., 2016; Dolan & Parets, 2016; Gödde et al., 2018; Nash, 2021).

The need for a new approach to screenwriting for VR, along with its various aspects, is covered in detail by Dooley and Munt (2024). Earlier, scholars explored the narrative framework of screenwriting for cine-VR (Alves et al., 2023), the idea of the spatialised screenplay (Ross & Munt, 2018), a narrative structure for open-world cine-VR (Mazarei, 2023) and writing for space instead of screen (Reyes, 2022). Other aspects of filmmaking have likewise been explored, such as the mode of production (Chan, 2023; Zhang & Weber, 2023), the use of the camera (Heagerty et al., 2024), and editing and transitions (Marañes et al., 2023; Medlar et al., 2024; Zhang et al., 2024).

In cine-VR, viewers can look around in 360° video. However, at any given point in time, a viewer can see only a limited field of view. This limitation leads to a fear of missing out among viewers (De Abreu et al., 2017). Hence, there is a need to guide the viewer’s attention towards the critical events of the narrative (Wang et al., 2021). The omnipresent nature of sound is considered helpful in guiding viewers’ attention as well as in creating a sense of presence and immersion. Scholars acknowledge the importance of spatial sound in VR (Bosman et al., 2024; Poeschl et al., 2013; Serafin & Serafin, 2004), of authenticity in sound design for the VR experience (Tatlow, 2024) and of guiding users’ attention through sound (Begault & Trejo, 2000; Cohen et al., 2015; Walter, 2023). The spatial audio formats, binaural and ambisonic, have been developed over many years but gained popularity with the evolution of VR. Research in binaural recording and playback started in the nineteenth century. Binaural remains one of the most accessible formats (Paul, 2009; Sunder, 2021). Ambisonic, developed in the 1970s, is a full spherical sound format representing the sound field at a point or in space (Baxter, 2022; Boren, 2017). In 2012, Dolby Atmos extended the surround sound format by incorporating height channels and object-based audio techniques to further enhance the immersive experience (Pfanzagl-Cardone, 2023; Sergi, 2013; Visser et al., 2024). A review of ambisonics and object-based audio presents the limitations and challenges of spatial audio formats (Zhang et al., 2017).

Even though spatial audio formats have been available for many years, there are still challenges in sound design for cine-VR. First, recording audio on location with a 360° camera remains a challenge, as everything is in the frame and using a regular shotgun microphone is difficult. Second, the viewer is free to look around and interact with the cinematic world; hence, instead of being pre-rendered, the soundtrack in cine-VR must respond to the viewer’s interaction. There is therefore a need to re-examine sound design ideas and processes so that they incorporate the viewer’s interactivity.

In the last decade, cine-VR has gained popularity, and rapid technological developments have created new opportunities for sound design. Scholars have acknowledged and explored the role of sound in VR. Candusso (2017), in one of the earlier studies, underlined the difference between the sound design workflows of traditional cinema and cine-VR. The importance of directional sound, or spatialisation, in guiding attention towards the region of interest is highlighted by Bala et al. (2019) and Masia et al. (2021). Studies have also highlighted the possible negative impact of 3D audio (Mendonça et al., 2019). Whitford (2021) presented an overview of 360° location sound recording formats and their significance in the immersive experience. The question of balance between creativity and realism is addressed by Butterworth (2022). A framework of sonic interaction for the virtual environment has been presented by Geronazzo and Serafin (2023). While these studies cover different aspects of sound in VR, focused studies on sound design for cine-VR remain sparse (Chaurasia & Majhi, 2021, 2023). Also, with studies focusing on diverse aspects of sound, there is a need to identify the trends and themes of research on sound design for cine-VR in the last decade. The current bibliometric study aims to bridge this research gap, provide a comprehensive outlook of current research trends and identify future directions.

Methodology

This study uses bibliometric analysis of online databases, as it can provide valuable insight in addition to traditional literature review methods such as meta-analysis and systematic analysis (Donthu et al., 2021). It provides a broad perspective on key concepts, their interrelations and the research trends and themes in the given time frame (Aria & Cuccurullo, 2017; Kraus et al., 2022; Mukherjee et al., 2022; Ozturk, 2021). We chose the WoSCC database, as it is one of the largest, most widely used and most reliable databases of scientific publications, including conference publications, journals and books (Liu et al., 2022; Norris & Oppenheim, 2007).

Data Collection

On 20 June 2024, we retrieved bibliometric data from the WoSCC online database for publications in English between 2012 and 2024 (June). Creating a search string using relevant keywords is crucial for obtaining a representative sample (Liu et al., 2022). As discussed in the literature review, many terms are used for cine-VR and sound design. Hence, the search string used multiple combinations of words to include all relevant documents. We searched the topic field, which includes the title, abstract, author keywords and keywords plus in the WoSCC. The search formula was TS = ((‘Sound Design’ or spatial or 3D or surround or multichannel or ‘object-based audio’ or ‘wave-field synthesis’ or binaural or ambisonics or atmos or MPEG*) and (sound* or audio* or auditory or sonic* or voice or music or Foley or ambiance) and (‘virtual reality’ or CVR or Cine-VR or ‘Cinematic Virtual reality’ or VR or 360*)) (Topic) and 2012–2024 (Year Published) and English (Language). The search returned a total of 910 documents, of which 881 (journal articles, conference proceedings, book chapters and early access articles) were considered for further review. Conference proceedings and early access articles present the latest updates and are significant in emerging fields of study, such as sound design for cine-VR. Subsequently, the documents were screened for relevance through a review of titles and abstracts, using the following exclusion criteria: (1) the research topic is unrelated to spatial sound design, and (2) the research direction is unrelated to cine-VR or VR. However, some articles from allied fields were also included to present diverse perspectives on sound design. Finally, 269 documents were shortlisted for final analysis. Figure 1 shows the flowchart of the search and screening process. All bibliographic data were extracted for each publication, including title, publication year, country or region, institution, journal, references, keywords and abstract.
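
The structure of the search formula (three groups of terms, alternatives within a group and all three groups required) can be sketched in a few lines of Python. This is only an illustration: the helper name `build_topic_query` is hypothetical, phrases are wrapped in straight double quotes as an assumption about how the string would be typed, and every term within a group is assumed to be joined by `or`.

```python
# Term groups copied from the search formula in the text.
# The grouping into three lists and the helper below are illustrative assumptions.
FORMAT_TERMS = ['"Sound Design"', "spatial", "3D", "surround", "multichannel",
                '"object-based audio"', '"wave-field synthesis"', "binaural",
                "ambisonics", "atmos", "MPEG*"]
SOUND_TERMS = ["sound*", "audio*", "auditory", "sonic*", "voice", "music",
               "Foley", "ambiance"]
VR_TERMS = ['"virtual reality"', "CVR", "Cine-VR",
            '"Cinematic Virtual reality"', "VR", "360*"]


def build_topic_query(*groups):
    """Join terms with 'or' inside each group and 'and' between groups."""
    clauses = ["(" + " or ".join(group) + ")" for group in groups]
    return "TS = (" + " and ".join(clauses) + ")"


query = build_topic_query(FORMAT_TERMS, SOUND_TERMS, VR_TERMS)
print(query)
```

Keeping the groups as separate lists makes it easy to see which concept each term serves and to add or drop a synonym without disturbing the Boolean structure.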

Figure 1.

Flow Diagram of the Search and Screening Process.

Data Analysis

The study uses VOSviewer, RStudio and the WoSCC analysis tools to analyse and visualise bibliometric data. VOSviewer, developed by Van Eck and Waltman (2010), is a freely available programme for the bibliometric research community (see www.vosviewer.com). RStudio is an open-source programme (see https://posit.co/download/rstudio-desktop/) used here to conduct a descriptive analysis of bibliometric datasets (Grömping, 2015; Kronthaler & Zöllner, 2021). The most popular keywords and terms, influential authors, active organisations and countries, and the relationships among them, are analysed using VOSviewer and RStudio. The WoSCC analysis provided the number of publications by year, document type and research area.

Results

Document Types and Research Area

Of the 269 documents shortlisted for analysis, 173 (64.31%) are conference proceedings, 104 (38.66%) are journal articles and only 1 (0.37%) is an early access document. These counts add up to more than 269, presumably because the database can list a document under more than one type. In an emerging field of inquiry, it is unsurprising that more documents are presented at conferences than published in journals, as conferences are where the initial findings of experiments and studies are reported. However, the presence of more than 100 journal articles also indicates the scientific validation of the subject.

The WoSCC database listed the search results on sound design for cine-VR under 36 research areas. Table 1 presents the top 10 research areas. The top three research areas are computer science with 178 (66%), engineering with 95 (35%) and acoustics with 67 (25%) publications. The top 10 areas, as per the WoSCC categories, are in the science and technology domain. The finding suggests that the majority of studies in the field focus on the science and technology aspects of sound in cine-VR.

Table 1.

Top Ten Research Areas.

Research Area	Record Count	% of 269
Computer science	178	66.171
Engineering	95	35.316
Acoustics	67	24.907
Imaging science photographic technology	27	10.037
Telecommunications	15	5.576
Physics	10	3.717
Psychology	9	3.346
Chemistry	8	2.974
Materials science	7	2.602
Neurosciences neurology	6	2.23
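
The percentage column of Table 1 is each record count divided by the 269 shortlisted documents. The short Python sketch below recomputes it from the counts in the table; the variable and function names are illustrative only. It also shows that the ten counts add up to more than 269, which is consistent with the database listing a document under more than one research area.

```python
TOTAL_DOCUMENTS = 269  # shortlisted documents, as stated in the text

# Record counts copied from Table 1.
research_area_counts = {
    "Computer science": 178,
    "Engineering": 95,
    "Acoustics": 67,
    "Imaging science photographic technology": 27,
    "Telecommunications": 15,
    "Physics": 10,
    "Psychology": 9,
    "Chemistry": 8,
    "Materials science": 7,
    "Neurosciences neurology": 6,
}


def share_of_total(count, total=TOTAL_DOCUMENTS):
    """Percentage share of `count` in `total`, rounded to three decimals."""
    return round(100 * count / total, 3)


shares = {area: share_of_total(n) for area, n in research_area_counts.items()}

# Listings beyond one per document across the top ten areas.
multi_listed_excess = sum(research_area_counts.values()) - TOTAL_DOCUMENTS

for area, pct in shares.items():
    print(f"{area}\t{research_area_counts[area]}\t{pct}")
print("Listings in excess of 269:", multi_listed_excess)
```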

Publications over the Years

Figure 2 presents a year-wise chart of publications from 2012 to 2024. The publication year is on the X-axis, while the number of publications is on the Y-axis. The dotted blue line presents the number of publications for each year. The grey line is the linear trend line fitted to the annual publications in this period; it has a positive slope of 2.61. The increase in publications is steady, and an average yearly growth of just under five publications is encouraging for an emerging field. In seven years, from 2012 to 2019, the number of publications increased from just one to 46, the point at which interest in sound design for cine-VR peaked. In the following year, 2020, the number of publications fell to 27, likely owing to the global COVID-19 pandemic. The publications slowly increased to 35 in 2021 and 41 in 2022. However, in 2023, only 28 documents were published, just one more than during the pandemic. Seven documents had already been published at the time of the study in June 2024. The decline in 2023 needs further investigation. The growth in the last 10 years aligns with technological developments such as portable HMDs, 360° cameras and software support.
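
A linear trend line of this kind is usually an ordinary least-squares fit of yearly counts against the year, and its slope is the average change in publications per year. The Python sketch below shows that calculation under this assumption; the function name is hypothetical and the tool used to draw the line in Figure 2 is not stated. Because the text quotes yearly counts only for some years, the example uses the 2019 to 2023 values alone, so its result is not the 2.61 reported for the full 2012 to 2024 series.

```python
def linear_trend_slope(years, counts):
    """Ordinary least-squares slope of `counts` regressed on `years`."""
    n = len(years)
    if n < 2 or n != len(counts):
        raise ValueError("need two or more points and equal-length inputs")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Slope = covariance(x, y) / variance(x), written as sums of deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Only the yearly counts quoted in the text; the full series is in Figure 2.
reported = {2019: 46, 2020: 27, 2021: 35, 2022: 41, 2023: 28}
partial_slope = linear_trend_slope(list(reported), list(reported.values()))
print(round(partial_slope, 2))  # slope for this five-year subset only
```

The negative slope over this subset reflects the post-2019 dip described above, whereas the positive slope over the whole period is driven by the rise from one publication in 2012.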

Figure 2.

Note: The dotted blue line shows the publications and the red line presents the trend.

Countries

The search results show contributions from 39 nations or territories to spatial sound design for cine-VR and allied subjects. Table 2 lists the top 10 countries along with the citations received by their publications. The USA with 54 (20%), England with 45 (17%) and Germany with 30 (11%) are the top three contributing countries in terms of the number of publications. The USA also received the highest number of citations (589). There are 22 countries with at least three contributions, and only 16 countries have five or more contributions.

Table 2.

Top Ten Most Productive Countries.

Country	Documents	Citations
USA	54	589
England	45	360
Germany	30	177
China	22	31
Ireland	18	83
Japan	18	51
Denmark	16	76
South Korea	16	43
Italy	14	85
Spain	13	100

The network visualisation of publications across countries was generated with VOSviewer and is shown in Figure 3. In the visualisation, the size of each circle is proportionate to the number of publications: the more publications, the bigger the circle. Circles of the same or similar colours indicate a close association between countries. The thickness of the connecting lines represents the cooperative relationship between countries, or link strength (LS) (Van Eck & Waltman, 2017). The purple circle has thicker connecting lines, indicating that the LS of the USA with other countries is the highest. However, with just 13 documents, Spain has a slightly better LS than Germany. The LS is strongest between the USA and England, as indicated by the thickness of the connecting line.
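
The idea of link strength can be illustrated with a small Python sketch. It assumes full counting, in which each co-authored document adds one to the link between every pair of countries involved, and a total LS that sums a country's links; the counting settings used in the study are not stated, so this is an assumption. The records in `toy_documents` are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def link_strengths(documents):
    """Count, for each unordered country pair, the documents they share."""
    links = Counter()
    for countries in documents:
        # Deduplicate so several authors from one country count once.
        for a, b in combinations(sorted(set(countries)), 2):
            links[(a, b)] += 1
    return links


def total_link_strength(links):
    """Sum the strengths of all links attached to each country."""
    totals = Counter()
    for (a, b), weight in links.items():
        totals[a] += weight
        totals[b] += weight
    return totals


# Hypothetical toy records (author countries per document), NOT study data.
toy_documents = [
    ["USA", "England"],
    ["USA", "England", "Germany"],
    ["USA", "Spain"],
    ["Germany"],
]

links = link_strengths(toy_documents)
totals = total_link_strength(links)
print(links.most_common(1))
print(totals.most_common())
```

In this toy example the pair with the thickest line would be England and the USA, and a single-country document such as the last record adds to a country's publication count without adding any link.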

Figure 3.

Note: The size of the circle is proportional to the number of publications.

Organisations

A total of 340 organisations from 39 countries contributed to the scholarship. Of the 340 organisations, 28 contributed at least three documents, 11 at least four and only six contributed five or more. Table 3 lists the top 10 of the 28 organisations with a minimum of three publications, along with the number of citations they received. Aalborg University leads the contribution with 10 publications, followed by the University of York with seven and Trinity College Dublin and the University of Southampton with six each. However, the total number of citations is highest for the publications of Columbia University (147), followed by the University of York (95) and Microsoft Research (94).

Table 3.

Top Ten Productive Organisations.

Organisation	Documents	Citations
Aalborg University	10	35
University of York	7	95
Trinity College Dublin	6	45
University of Southampton	6	5
International Audio Laboratories Erlangen	5	22
Queen Mary University of London	5	7
Nanyang Technological University	4	69
University of Surrey	4	43
University of Michigan	4	35
Technical University of Denmark	4	31

Figure 4 shows the network visualisation map of the collaboration, expressed as LS, among 21 organisations with at least three publications. Seven organisations are not actively connected with the rest and are hence not included in the network map. The size of the circle indicates the LS of the organisation based on active collaboration with other organisations. The University of Surrey (blue) has a total LS of 11 with just five active links. However, even with more links (six), the University of North Carolina has an LS of only nine. Microsoft Research is third in LS, with seven, although it has the most links (seven). Microsoft is connected with several organisations, but the collaboration is not very active. The University of Surrey and the University of Southampton (with overlapping blue circles) have the strongest link (LS 4), indicating an active collaboration.

Figure 4.

Note: The size of the circle is proportional to the link strength.

Sources

A total of 187 sources, including journals and conference proceedings, were identified in the database. According to the results, 19 sources published more than three documents, while only nine published more than five. Figure 5, created using RStudio, presents the most relevant sources with at least three publications. With eight publications each, the AES International Conference on Immersive and Interactive Audio 2019 and the journal IEEE Transactions on Visualization and Computer Graphics are the most active sources, closely followed by Journal of Applied Science with seven publications. The Audio Engineering Society (AES) has contributed significantly to the field of sound design for cine-VR: five of the top 10 sources are AES conference proceedings. IEEE Transactions on Visualization and Computer Graphics also received the highest number of citations (131). Even though Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems is not in the top 10 list in terms of the number of publications, it received the second-highest number of citations (125).

Figure 5.

Top 10 Sources Based on the Number of Documents Published.

Authors

A total of 867 authors contributed to the research in the field. Thirty-one authors published three or more articles, 13 published four or more and just three published five or more. Stefania Serafin contributed the most with eight publications, followed by Gavin Kearney and Niall Murray with five each. Thomas Robotham, Sylvia Rothe and eight other authors contributed four articles each.

Table 4 presents the 13 top authors with at least four publications. Sylvia Rothe, with just four publications, received the highest number of citations (79), underlining the significance and relevance of her contribution to the field.

Table 4.

Top Productive Authors with at Least Four Publications.

Author	Documents	Citations
Stefania Serafin	8	34
Gavin Kearney	5	45
Niall Murray	5	13
Andrew Allen	4	30
Hauke Egermann	4	44
Emanuel A. P. Habets	4	21
Daniel Johnston	4	44
Hansung Kim	4	36
Cagri Ozcinar	4	30
Thomas Robotham	4	21
Sylvia Rothe	4	79
Aljosa Smolic	4	26
Jing Wang	4	10
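
The two readings of Table 4 discussed above, most productive versus most cited, amount to sorting the same rows by different columns. A minimal Python sketch using the values from the table (the variable names are illustrative):

```python
# (documents, citations) per author, copied from Table 4.
authors = {
    "Stefania Serafin": (8, 34),
    "Gavin Kearney": (5, 45),
    "Niall Murray": (5, 13),
    "Andrew Allen": (4, 30),
    "Hauke Egermann": (4, 44),
    "Emanuel A. P. Habets": (4, 21),
    "Daniel Johnston": (4, 44),
    "Hansung Kim": (4, 36),
    "Cagri Ozcinar": (4, 30),
    "Thomas Robotham": (4, 21),
    "Sylvia Rothe": (4, 79),
    "Aljosa Smolic": (4, 26),
    "Jing Wang": (4, 10),
}

# Rank once by number of documents and once by citations received.
by_documents = sorted(authors, key=lambda a: authors[a][0], reverse=True)
by_citations = sorted(authors, key=lambda a: authors[a][1], reverse=True)

print("Most documents:", by_documents[0])
print("Most citations:", by_citations[0])
```

The author at the top of one ranking is not at the top of the other, which is why productivity and citation impact are reported separately.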

The authors’ production over the years was analysed using RStudio and is presented in Figure 6. Stefania Serafin was one of the first authors to publish on the topic, in 2015, and is one of the most consistent, with publications until 2021. The size of the circle is proportional to the number of publications. The majority of the top authors published work in 2020, even though the overall number of publications went down in that year.

Figure 6.

Note: The size of the circle is proportional to the number of articles.

Keywords

The WoSCC database lists author keywords and keywords plus. While author keywords indicate the author’s intent and the study’s essence, keywords plus are generated by an algorithm that provides extended terms derived from the cited references or the record’s bibliography (Zhang et al., 2016). The analysis of all keywords was done using the default settings of VOSviewer to identify the most active keywords. Of the 971 keywords listed, only 41 occurred more than five times; these are presented through density visualisation in Figure 7. A solid yellow block, together with its size and the size of its text, indicates a higher frequency, while faded yellow blocks indicate a low frequency. Words used together are also placed near each other. The top three keywords based on their occurrences are VR (104), spatial audio (50) and ambisonics (33). Cine-VR and sound design do not appear among the top 10 keywords. However, many keywords in the list are variants of the same term, for example, VR and virtual-reality; spatial audio and spatial sound; and localisation and sound localisation.

Figure 7.

Note: The size of the yellow block and the size of the text is proportional to the frequency.

Since VR is the domain of study, we excluded it to identify the other keywords relevant to the study. Figure 8 presents the network visualisation map of the most used keywords. The size of the circle is proportionate to the number of articles identified in the study. Spatial audio is the biggest circle, followed by ambisonics and localisation. Even in this map, spatial audio, spatial sound, 3D sound and 3D audio appear as different circles although they are effectively the same term. With the default settings used here, VOSviewer does not merge these keywords when creating visualisation maps. However, it is possible to create a table manually in which similar words are merged. When this is done, cine-VR also appears in the top 10 keywords.
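
The manual merging of keyword variants described above can be sketched in Python as a lookup table followed by a frequency count. The synonym map below follows the variants named in the text, with "virtual reality" added as an assumed further variant; the function name and the keyword lists in `toy_keyword_lists` are hypothetical and are not data from the study.

```python
from collections import Counter

# Variant -> canonical keyword. Groupings follow the text; the entry
# "virtual reality" is an assumption added for illustration.
CANONICAL = {
    "virtual-reality": "vr",
    "virtual reality": "vr",
    "spatial sound": "spatial audio",
    "3d sound": "spatial audio",
    "3d audio": "spatial audio",
    "sound localisation": "localisation",
}


def merged_keyword_counts(keyword_lists, exclude=()):
    """Count documents per keyword after merging variants."""
    counts = Counter()
    excluded = set(exclude)
    for keywords in keyword_lists:
        normalised = (k.strip().lower() for k in keywords)
        # A set, so one document counts at most once per merged keyword.
        merged = {CANONICAL.get(k, k) for k in normalised}
        counts.update(merged - excluded)
    return counts


# Hypothetical keyword lists for three documents, NOT study data.
toy_keyword_lists = [
    ["VR", "Spatial Audio", "ambisonics"],
    ["virtual-reality", "3D audio", "spatial sound"],
    ["Virtual Reality", "sound localisation"],
]

counts = merged_keyword_counts(toy_keyword_lists, exclude={"vr"})
print(counts.most_common())
```

Excluding the domain keyword after merging, rather than before, ensures that all of its spelling variants are removed together.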

Figure 8.

Note: The size of the circle and text is proportional to the frequency of the keyword.

Relevant Documents

The number of citations received indicates the relevance of a study in the domain. Of the 269 documents, 79 received more than five citations. Only seven of these documents are linked to each other. Nevertheless, we created the citation network visualisation map of all 79 documents, as presented in Figure 9. The size of each circle in the map is proportionate to the number of citations received by the document, which is labelled with the author’s name and year. ‘Audible panorama: automatic spatial audio generation for panorama imagery’ (Huang et al., 2019) received the highest number of citations (116), followed by ‘Quality assessment of acoustic environment reproduction method for cine-VR in soundscape application’ (Hong et al., 2019), with 66 citations, and ‘Viking VR: designing a VR experience for a museum’ (Schofield et al., 2018). The highly cited articles discuss sound as part of the cine-VR experience but do not focus entirely on sound design for cine-VR.

Figure 9.

Note: The size of the circle is proportional to the number of citations received.

Relationships Among Author’s Country, Keywords and Author

RStudio offers a function to create a three-field plot that illustrates the relationship between three bibliometric data elements. In this study, we examined the relationship between the author’s country (AU_CO), keywords (DE) and authors (AU), as illustrated in Figure 10. The grey linkages represent the relationship between the countries on the left, the keywords in the middle and the authors on the right. The thicker the grey line, the stronger the association between the two elements.

Figure 10.

Note: The size of the rectangle is proportional to the number of publications (by country and author) and frequency of keyword.

The plot included up to 20 elements for each of the three fields. The size of the rectangle is proportionate to the number of articles connected with each component. The USA, United Kingdom, Germany and China have published studies on the top topics, with the most publications on VR and spatial audio. The number of publications per author is small; hence, the rectangles in the author field are small and the grey lines between authors and keywords are thinner. The top authors Hansung Kim, Jianjun He, Andrew Hines and Stefania Serafin have also worked on VR and spatial audio.

Discussion

This study uses bibliometric analysis to present the global research trends in spatial sound design for cine-VR. We searched the WoSCC, one of the largest online databases, for documents published in English. Since 2012, the possibilities of immersive storytelling in VR have gained momentum with the presentation of the Oculus Rift, a portable HMD. Hence, this study includes articles from 2012 to June 2024. As the field is emerging, scholars use various terms to discuss the same idea or concept, which made creating a search string challenging. The search string was designed to include as many relevant articles as possible and returned a total of 910 documents. However, these also included many non-relevant articles. Hence, the search results were further shortlisted based on a review of titles and abstracts. Many articles were in the health and medical domain and had to be removed. Along with journal articles, conference proceedings and early access articles were also included to cover the latest studies in the field. In this way, a total of 269 documents were selected for final analysis.

The top 10 research areas fall within the science and technology categories of WoSCC. Even though the database lists a document under multiple research areas, the result suggests an overall trend: the focus of research in the last 12 years has been on technology. In an emerging field, it is understandable that studies concentrate on developing tools, techniques, software and plug-ins for sound design in cine-VR. The number of publications increased from just one in 2012 to 46 in 2019. The dip in publications in 2020 is likely attributable to the global COVID-19 pandemic. The number of publications increased again to 41 in 2022, but in 2023 it went down to 28 again. Nevertheless, the overall growth of scholarship is steady and positive. The results indicate a growing interest of scholars in sound design for cine-VR.

Research interest in the field is global, with 39 countries or territories contributing to the scholarship on spatial sound design for cine-VR. Amongst these 39 countries, 22 contributed three or more studies, 21 at least four and 16 five or more. The top three contributing countries are the USA (54), England (45) and Germany (30). Studies from the USA received the highest number of citations, and the USA’s collaboration with other countries is also the most active. The findings indicate the significant role of the USA in the growth of research on sound design in cine-VR. With the development of the MPEG-H format, Germany is also contributing significantly, after the USA and England. However, even after more than a decade, the overall number of publications per country remains low.

While the USA has the most publications, the top three contributing organisations are Aalborg University, Denmark, the University of York, UK, and Trinity College Dublin, Ireland. The results indicate that research in the USA is spread across institutions and perhaps led by individuals, in contrast to the institution-led research in other countries. However, Columbia University, USA, received the highest number of citations. The University of Surrey, UK, has the strongest link strength (LS) with other organisations. Just six organisations across the globe contributed five or more articles. This raises questions about consistency and interest in the subject. It is also possible that sound design is not the main focus of these studies.

The AES International Conference on Immersive and Interactive Audio 2019 and the journal IEEE Transactions on Visualization and Computer Graphics are the most active sources, with eight contributions. The top 10 contributing sources include six conference proceedings and four journals. This result is consistent with the overall ratio of conference proceedings to journal articles in the shortlisted documents.

The most cited document, Audible Panorama (Huang et al., 2019), is about generating automatic spatial audio for panoramic imagery. Hong et al. (2019) is about the quality assessment of acoustic environment reproduction methods for cine-VR. The results indicate that interest in articles focusing purely on sound design remains low, and that scholars do not find such work to be of broader application. However, in the shortlisted documents, the most common concerns are spatial audio and localisation technologies, and how guiding viewers’ attention through audio cues affects their experience in 360° cine-VR.

The keyword analysis presents virtual reality as the most used keyword. Since it is the domain name, we excluded it while creating network visualisation maps so that the other keywords relevant to the study could be analysed. Many keywords are also used in variations, such as audio and sound, or localisation and sound localisation. This indicates that well-defined terms and definitions are yet to emerge in the field. Hence, any search string needs to include all possible combinations of keywords to capture relevant results in an emerging field like cine-VR. The three most used keywords are spatial audio, ambisonics and localisation. Spatial audio is considered essential to localise sound sources in 360° video. Ambisonics as a format facilitates the recording of spatial audio in the field and the later placement of sound elements in the sphere. In this regard, spatial audio, ambisonics and localisation are complementary. The result indicates the significance of spatial audio and localisation in 360° cine-VR for immersive experience. Thus, an in-depth exploration of spatial audio and localisation in cine-VR is required. Stefania Serafin is the most active author with eight publications. However, only three authors contributed five or more studies in the field. This indicates a lack of consistency or interest among scholars. Though Sylvia Rothe published only four articles, her work received the highest number of citations.
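
The effect of keyword variants on frequency counts can be illustrated with a short sketch. It is written in Python under stated assumptions: the variant map, the helper names and the three toy records are hypothetical and are not taken from the shortlisted documents or from the tools used in this study.

```python
from collections import Counter

# Hypothetical variant map; a real one would be built by reading the keyword list.
VARIANTS = {
    "vr": "virtual reality",
    "spatial sound": "spatial audio",
    "localization": "localisation",
    "sound localisation": "localisation",
    "sound localization": "localisation",
}


def normalise(keyword):
    """Lower-case a keyword and map known variants to one canonical form."""
    key = keyword.strip().lower()
    return VARIANTS.get(key, key)


def count_keywords(records, exclude=("virtual reality",)):
    """Count documents per canonical keyword, skipping the excluded domain term."""
    counts = Counter()
    for keywords in records:
        unique = {normalise(k) for k in keywords}  # once per document
        counts.update(k for k in unique if k not in exclude)
    return counts


# Toy author-keyword lists, made up for illustration only.
records = [
    ["Virtual Reality", "Spatial Audio", "Localisation"],
    ["VR", "spatial sound", "sound localization"],
    ["virtual reality", "ambisonics", "Spatial audio"],
]

counts = count_keywords(records)
print(counts.most_common(3))
```

Without the variant map, "spatial sound" and "sound localization" would be counted as separate keywords, which understates the weight of the underlying concept.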

The three-field plot created using RStudio presents the correlation between three different fields: author’s country, keyword and author. The keywords virtual reality and spatial audio are used by most top contributing countries, such as the USA, United Kingdom and Germany, as well as top contributing authors Hansung Kim, Jainjun He, Andrew Hines and Stefania Serafin.

Figure 11 presents the study summary through science mapping of critical findings. The top part of the diagram underlines the focus of research on spatial sound design for cine-VR through key authors, research areas, frequently used keywords and important themes. The bottom part of the diagram presents the demographics and frequency of publication over the years. The most frequently used keyword is virtual reality, while cine-VR comes in ninth place.

Figure 11. Science Mapping.

This reflects that studies on sound design remain spread across the wider domain of virtual reality, and that focused studies on cine-VR are limited. The other frequent keywords concern spatial audio formats such as ambisonics, binaural and 3D audio. The themes that emerged from the study can be further grouped into two subcategories. The first is spatial audio, which includes studies on localisation, guiding viewers’ attention and diegetic cues in 360° videos. The second is viewer experience, which includes studies on immersive experience, interactivity, noise and quality assessment. These two categories are interconnected and influence each other, as the ultimate objective of spatial audio is to enhance the viewer’s experience. The important themes do not include focused studies of sound designers’ and filmmakers’ approaches to sound design in cine-VR. There is a need to explore this aspect further.

Limitation and Future

There are some limitations to this study. Cine-VR and spatial sound design studies are multidisciplinary, with literature spread across different databases. This study searched only the WoSCC database, so some significant studies that are not listed there may have been missed. Also, spatial sound design for cine-VR is an emerging field, and some articles, interviews and reviews published on the web could be of great relevance but are missing from this analysis. Since many keywords have synonyms and are used in combination with other words, the search string became long, because the idea was to include all possible keywords. As a result, the search results presented many articles that were not related to the subject. Further, although the shortlisting was based on relevance and done as objectively as possible, we cannot rule out the researchers’ inherent bias.

A follow-up study incorporating more than one database might help present more conclusive results. Future studies can focus on a single aspect of sound design, such as sound recording, sound editing or mixing, for more focused results. The authors are also conducting a user study to understand and map the differences and similarities between sound design for screen-based cinema and for cine-VR.

Conclusion

This study presents an overview of research trends in spatial sound design for cine-VR through bibliometric analysis of shortlisted articles from the WoSCC. It uses the open-source bibliometric analysis software VOSviewer and RStudio to present the most relevant and active publications, authors, organisations, countries and sources, and their interrelationships. With just 269 shortlisted articles in 12 years, the scholarship is limited but growing steadily. Because the field is still emerging, the focus of the studies remains on technology. Interest in the subject peaked in 2019. However, 104 (38.66%) journal articles in an emerging and specialised field reflect the broader interest in and acceptance of the subject in the scientific community. Interest in the field is spread across the globe, with 39 countries contributing to the research. Overall, however, the scholarship remains inconsistent, with only three authors and six organisations publishing five or more studies in the last 12 years. The possible reasons for this inconsistency could not be ascertained in this study and require a separate inquiry. The major shift from screen-based cinema to cine-VR is the expansion of the visual frame from two dimensions to 360 degrees. Hence, it is unsurprising that spatial audio, localisation and guiding viewers’ attention emerged as the key themes of the research.

The technology is in flux, and a standard workflow for sound design is yet to emerge. There is a need to explore these themes further in the context of immersive storytelling in cine-VR. It is also time to expand the scope of studies and explore the creative aspects of sound design: the ideation for sound design in 360° and its execution through the sound design process, and the mapping of sound design for screen-based cinema and cine-VR. More in-depth exploration of the creative use of different spatial audio formats, and of viewers’ responses to them, is required. The findings of the current bibliometric analysis should help scholars and filmmakers further explore the challenges and possibilities of sound design in cine-VR.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

Hitesh Chaurasia is a recipient of a Seed Research Grant for Early Career Faculty (SRG-ECF), National Institute of Design, Ahmedabad.

References

Adelman, R. A. (2019). Immersion and Immiseration: Alejandro González Iñárritu’s Carne y Arena. American Quarterly, 71(4), 1093–1109.

Alves, Rubio-Tamayo, J. L., & Durán-Fonseca (2023). Investigating a cinematic virtual reality narrative framework for screenwriting. Journal of Screenwriting, 14(3), 311–333.

Aria, & Cuccurullo (2017). Bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.

Arora, & Milk (Directors). (2015). Cloud over Sidra [Film]. VRSE Works Production.

Aylett (1999, November). Narrative in virtual environments: Towards emergent narrative. In Proceedings of the AAAI fall symposium on narrative intelligence (pp. 83–86).

Aylett, & Louchart (2003). Towards a narrative theory of virtual reality. Virtual Reality, 7, 2–9.

Bala, Masu, Nisi, & Nunes (2019, May 2). ‘When the elephant trumps’: A comparative study on spatial audio for orientation in 360° videos. In Conference on human factors in computing systems – Proceedings.

Baxter (2022). Immersive sound production using ambisonics and advance audio practices. In Immersive sound production (pp. 46–66). Focal Press.

Begault, D. R., & Trejo, L. J. (2000). 3D sound for virtual reality and multimedia (No. NASA/TM-2000-209606).

Bennett (1994). Hollywood’s indeterminacy machine: Virtual reality and total recall [Paper presented at the Media Futures: Policy and Performance, Conference (1994: Griffith University)]. Arena Journal, 3, 23–32.

Boren (2017). History of 3D sound. In Immersive sound (pp. 40–62). Routledge.

Bosman, I. D. V., Buruk, O. O., Jørgensen, & Hamari (2024). The effect of audio on the experience in virtual reality: A scoping review. Behaviour & Information Technology, 43(1), 165–199.

Breslin, Argo, D. J., & Petrova (2017). Novel approaches to production and post-production of immersive VR/360 audio-visual experiences. School of Simulation and Visualisation.

Butterworth (2022). Beyond sonic realism: A cinematic sound approach in documentary 360° film. Studies in Documentary Film.

Candusso (2015, July). Designing spatial sound: Adapting contemporary screen sound design practices for virtual reality. In SMPTE17: Embracing connective media (pp. 1–10). SMPTE.

Chan, F. Y. (2023). Cinematic virtual reality film practice: Expanded profilmic event and mode of production. Nanyang Technological University.

Chaurasia, H. K., & Majhi (2021, December). Sound design for cinematic virtual reality: A state-of-the-art review. In International conference of the Indian society of ergonomics (pp. 357–368). Springer International Publishing.

Chaurasia, H. K., & Majhi (2023, January). Challenges and opportunities of spatial sound design in cinematic virtual reality: A scoping review. In International conference on research into design (pp. 1127–1139). Springer Nature Singapore.

Cho, Lee, T. H., Ogden, Stewart, Tsai, T. Y., Chen, & Vituccio (2016). Imago: Presence and emotion in virtual reality. In ACM SIGGRAPH 2016 VR village (pp. 1–2).

Cohen, Villegas, & Barfield (2015). Special issue on spatial sound in virtual, augmented, and mixed-reality environments. Virtual Reality, 19, 147–148.

De Abreu, Ozcinar, & Smolic (2017, May). Look around you: Saliency maps for omnidirectional images in VR applications. In 2017 ninth international conference on quality of multimedia experience (QoMEX) (pp. 1–6). IEEE.

Dolan, & Parets (2016). Redefining the axiom of story: The VR and 360 video complex. TechCrunch.

Dooley (2021). Cinematic virtual reality: A critical study of 21st-century approaches and practices. Springer Nature.

Dooley, & Munt (2024). An introduction to screenwriting for virtual reality. In Screenwriting for virtual reality: Story, space and experience (pp. 1–26). Springer International Publishing.

Donthu, Kumar, Mukherjee, Pandey, & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296.

Erkut (2017, March). Rhythmic interaction in VR: Interplay between sound design and editing. In 2017 IEEE 3rd VR workshop on sonic interactions for virtual environments (SIVE) (pp. 1–4). IEEE.

Geronazzo, & Serafin (2023). Sonic interactions in virtual environments (p. 428). Springer Nature.

Gödde, Gabler, Siegmund, & Braun (2018). Cinematic narration in VR – rethinking film conventions for 360 degrees. In Virtual, augmented and mixed reality: Applications in health, cultural heritage, and industry: 10th international conference, VAMR 2018, held as part of HCI International 2018, Las Vegas, NV, USA, July 15–20, 2018, Proceedings, Part II 10 (pp. 184–201). Springer International Publishing.

Griffin, K. M. (2015). Fantasound: A retrospective of the groundbreaking sound system of Disney. University of Colorado at Denver.

Grömping (2015). Using R and RStudio for data management, statistical analysis, and graphics. Journal of Statistical Software, 68, 1–7.

Guaraná (2024). Interactive cinema. Film Quarterly.

Gutierrez (2023). The ballad of Morton Heilig: On VR’s mythic past. Journal of Cinema and Media Studies, 62(3), 86–106.

Harley (2020). Palmer Luckey and the rise of contemporary virtual reality. Convergence, 26(5–6), 1144–1158.

Heagerty, Li, Lee, Bhattacharyya, Bista, Brawn, Feng, B. Y., Jabbireddy, JaJa, Kacorri, Li, Yarnell, Zwicker, & Varshney (2024). HoloCamera: Advanced volumetric capture for cinematic-quality VR applications. IEEE Transactions on Visualization and Computer Graphics, 30(5), 2767–2775.

Hong, J. Y., Lam, Ong, Z. T., Ooi, Gan, W. S., Kang, Feng, & Tan, S. T. (2019). Quality assessment of acoustic environment reproduction methods for cinematic virtual reality in soundscape applications. Building and Environment, 149, 1–14.

Huang, Solah, Li, & Yu, L. F. (2019, May). Audible panorama: Automatic spatial audio generation for panorama imagery. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1–11). Association for Computing Machinery.

Iñárritu, A. G. (Director). (2017). Carne y arena [Flesh and Sand] [Film]. Legendary Entertainment, Emerson Collective, Fondazione Prada and PHI Studio.

Kjær, Lillelund, C. B., Moth-Poulsen, Nilsson, N. C., Nordahl, & Serafin (2017, November). Can you cut it? An exploration of the effects of editing in cinematic virtual reality. In Proceedings of the 23rd ACM symposium on virtual reality software and technology (pp. 1–4).

Klapholz (1991). Fantasia: Innovations in sound. Journal of the Audio Engineering Society, 39(1/2), 66–70.

Kraus, Breier, Lim, W. M., Dabić, Kumar, Kanbach, Mukherjee, Corvello, Piñeiro-Chousa, Liguori, Palacios-Marqués, Schiavone, Ferraris, Fernandes, & Ferreira, J. J. (2022). Literature reviews as independent studies: Guidelines for academic practice. Review of Managerial Science, 16(8), 2577–2595.

Kronthaler, & Zöllner (2021). Data analysis with RStudio. Data Analysis with RStudio.

Laurel (2013). Computers as theatre. Addison-Wesley.

Lipton, & Lipton (2021). This is cinerama. In The cinema in flux: The evolution of motion picture technology from the magic lantern to the digital era (pp. 527–539). Springer-Verlag New York Inc.

Liu, Urquía-Grande, Lopez-Sanchez, & Rodríguez-Lopez (2022). Research into microfinance and ICTs: A bibliometric analysis. Evaluation and Program Planning, 97, 102215.

Marañes, Gutierrez, & Serrano (2023). Towards assisting the decision-making process for content creators in cinematic virtual reality through the analysis of movie cuts and their influence on viewers’ behavior. International Transactions in Operational Research, 30(3), 1245–1262.

Masia, Camon, Gutierrez, & Serrano (2021). Influence of directional sound cues on users’ exploration across 360° movie cuts. IEEE Computer Graphics and Applications, 41(4), 64–75.

Mateas (2001). A preliminary poetics for interactive drama and games. Digital Creativity, 12(3), 140–152.

Mateer (2017). Directing for cinematic virtual reality: How the traditional film director’s craft applies to immersive environments and notions of presence. Journal of Media Practice, 18(1), 14–25.

Mazarei (2023, October). Story-without-end: A narrative structure for open-world cinematic VR. In International conference on interactive digital storytelling (pp. 329–343). Springer Nature.

Medlar, Lehtikari, M. T., & Glowacka (2024, May). Behind the scenes: Adapting cinematography and editing concepts to navigation in virtual reality. In Proceedings of the CHI conference on human factors in computing systems (pp. 1–12).

Mendonça, Rummukainen, & Pulkki (2019). 3D sound can have a negative impact on the perception of visual content in audiovisual reproductions. In 12th Asia Pacific workshop on mixed and augmented reality.

Mukherjee, Lim, W. M., Kumar, & Donthu (2022). Guidelines for advancing theory and practice through bibliometric research. Journal of Business Research, 148, 101–115.

Nash (2021). Interactive documentary: Theory and debate. Routledge.

Nielsen, L. T., Møller, M. B., Hartmeyer, S. D., Ljung, T. C., Nilsson, N. C., Nordahl, & Serafin (2016, November). Missing the point: An exploration of how to guide users’ attention during cinematic virtual reality. In Proceedings of the 22nd ACM conference on virtual reality software and technology (pp. 229–232).

Norris, & Oppenheim (2007). Comparing alternatives to the Web of Science for coverage of the social sciences’ literature. Journal of Informetrics, 1(2), 161–169.

Ozturk (2021). Bibliometric review of resource dependence theory literature: An overview. Management Review Quarterly, 71(3), 525–552.

Paul (2009). Binaural recording technology: A historical review and possible future developments. Acta Acustica United with Acustica, 95(5), 767–788.

Pfanzagl-Cardone (2023). The DOLBY® “Atmos™” System. In The art and science of 3D audio recording (pp. 143–188). Springer International Publishing.

Poeschl, Wall, & Doering (2013, March). Integration of spatial sound in immersive virtual environments an experimental study on effects of spatial sound on presence. In 2013 IEEE Virtual Reality (VR) (pp. 129–130). IEEE.

Raessens (2019). Virtually present, physically invisible: Alejandro G. Iñárritu’s mixed reality installation Carne y Arena. Television & New Media, 20(6), 634–648.

Reeves (1999). This is cinerama. Film History, 11(1), 85–97.

Reyes, M. C. (2018, December). Measuring user experience on interactive fiction in cinematic virtual reality. In International Conference on Interactive Digital Storytelling (pp. 295–307). Springer.

Reyes, M. C. (2022). From screenwriting to space-writing. Disegno: A Designkultúra Folyóirata, 6(1), 86–103.

Reyes, M. C., & Zampolli (2017, June). Screenwriting framework for an interactive virtual reality film. In 3rd immersive research network conference/ ILRN.

Ross, & Munt (2018). Cinematic virtual reality: Towards the spatialized screenplay. Journal of Screenwriting, 9(2), 191–209.

Serafin, & Serafin (2004). Sound design to enhance presence in photorealistic virtual reality. Georgia Institute of Technology.

Sergi (2013). Knocking at the door of cinematic artifice: Dolby Atmos, challenges and opportunities. The New Soundtrack, 3(2), 107–121.

Stuart, D. R. E. D. G. (2021). Facebook closes its $2 bn Oculus Rift acquisition. What next. The Guardian.

Sunder (2021). Binaural audio engineering. In 3D audio (pp. 130–159). Routledge.

Sutherland, I. E. (1965, May). The ultimate display. Proceedings of the IFIP Congress, 2, 506–508.

Sutherland, I. E. (1968, December). A head-mounted three-dimensional display. In Proceedings of the December 9–11, 1968, fall joint computer conference, part I (pp. 757–764).

Tatlow (2024). Authenticity in sound design for virtual reality. In History as fantasy in music, sound, image, and media (pp. 161–184). Routledge.

Van Eck, & Waltman (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.

Visser, Pratt, & Bourbon (2024). Exploring Dolby Atmos: Past, present, and future. In Innovation in music: Cultures and contexts (pp. 19–34). Focal Press.

Vosmeer, & Schouten (2017, June). Project Orpheus, a research study into 360 cinematic VR. In Proceedings of the 2017 ACM international conference on interactive experiences for TV and online video (pp. 85–90).

Waller (1993). The archeology of Cinerama. Film History, 5(3), 289–297.

Walter (2023). Where to look? Sustaining presence while directing attention in virtual reality stories. Aniki: Revista Portuguesa da Imagem em Movimento, 10(1), 138–158.

Wang, O’Fearghail, Zerman, Braungart, Smolic, & Knorr (2021, December). Visual attention analysis and user guidance in cinematic VR film. In 2021 international conference on 3D immersion (IC3D) (pp. 1–8). IEEE.

Whitford (2021). The ‘Truth of Sound’: Exploring the effects of an immersive location sound recording methodology within realist filmmaking. The Soundtrack, 13(1), 61–71.

Zacarias (2024). Virtuality and performativity: Nepantla within González Iñárritu’s Carne y Arena. Arizona State University.

Zhang, Lee, L. H., Wang, Jin, Fei, D. L., & Hui (2024, March). Jump cut effects in cinematic virtual reality: Editing with the 30-degree rule and 180-degree rule. In 2024 IEEE conference virtual reality and 3D user interfaces (VR) (pp. 51–60). IEEE.

Zhang, Yu, Zheng, Long, Lu, & Duan (2016). Comparing keywords plus of WOS and author keywords: A case study of patient adherence research. Journal of the Association for Information Science and Technology, 67(4), 967–972.

Zhang, Samarasinghe, P. N., Chen, & Abhayapala, T. D. (2017). Surround by sound: A review of spatial audio recording and reproduction. Applied Sciences, 7(5), 532.

Zhang, & Weber (2023). Adapting, modifying, and applying cinematography and editing concepts and techniques to cinematic virtual reality film production. Media International Australia, 186(1), 115–135.
