Abstract
This study systematically reviews the evolution of Music Emotion Computing (MEC) over the past decade, focusing on its two core branches: Music Emotion Recognition (MER) and Music Sentiment Analysis (MSA). The research aims to uncover emerging trends, interdisciplinary and cross-regional collaboration patterns, and key application areas within this field. Using data collected from the Web of Science Core Collection (WoSCC), we conducted a comprehensive bibliometric analysis to map global academic output, highlighting influential studies, leading authors, and primary collaborative networks in MEC. Results indicate that MEC research has grown significantly over the last ten years, with heightened interest in applications such as multimodal emotion analysis and personalized music recommendation systems. MEC research demonstrates a high degree of interdisciplinary integration, with contributions from computer science, psychology, and neuroscience jointly driving advances in the field. Cross-regional collaboration analysis shows that Asia, Europe, and North America are the primary research hubs, characterized by extensive intercontinental partnerships. Current trends reveal a strong focus on multimodal MEC and deep learning-based methods combining audio, text, video, and biosignals, suggesting future potential for MEC in areas like multimodal interaction, intelligent emotional feedback, and real-world applications, including mental health and music creation. Additionally, the study identifies challenges facing MEC applications, such as technical hurdles in multimodal data fusion, cultural variations in emotional perception, and concerns surrounding data privacy and ethics.
Based on these findings, future research should further explore integration across diverse data sources, enhance the interpretability and generalizability of emotion recognition models, and innovate methods for cross-cultural emotion computing. This study provides a panoramic perspective for scholars in the MEC field and offers strategic recommendations for future research.
Keywords
Introduction
Music as a vehicle for emotional expression has profoundly shaped human culture and life across millennia. Beyond an art form, music serves as a potent tool for evoking, reflecting, and even regulating human emotions (Saarikallio, 2010). Previous research has highlighted that elements like rhythm, melody, and harmony in music can elicit specific emotional responses, influencing an individual’s mood and psychological experience (Thompson & Quinto, 2011). In daily life, people express and manage their emotions through diverse types of music, often choosing upbeat tunes when happy and selecting somber, lyrical music during times of sadness. This unique capability of music to influence emotions has drawn significant scholarly attention. However, accurately identifying the emotions conveyed by music remains challenging. Although humans can intuitively perceive the emotions expressed in music, its inherent subjectivity, abstraction, and individual variability complicate efforts to pinpoint specific emotions precisely (Juslin & Västfjäll, 2008).
Psychologists have developed emotional models that categorize human emotions into structured frameworks, generally organized as discrete, dimensional, or hybrid models. Discrete emotion models classify emotions into distinct categories, such as “joy,” “sadness,” “anger,” and “calm,” as in the Hevner model (Hevner, 1936). While these models facilitate intuitive categorization, they can struggle with complex or ambiguous emotional states. Dimensional emotion models, on the other hand, describe emotions along multiple continuous axes, exemplified by models such as Thayer (1990), Tellegen et al. (1999), Russell (1980), and Wiswede (2000). These dimensions often denote specific attributes like valence, arousal, pleasure, or dominance, allowing for the mapping of interrelationships among emotions. Hybrid models combine the strengths of dimensional and categorical approaches, seeking a balance between capturing emotional complexity and providing clear categorization. They typically use dimensions as a foundation while incorporating emotion labels to facilitate the identification of particular emotional responses, making them particularly suited for recognizing diverse emotional states in music emotion analysis (Gómez-Cañón et al., 2022).
Music Emotion Computing (MEC) leverages computational techniques to analyze music features, establishing a mapping between musical characteristics and the emotional space, thereby enabling the identification of emotions expressed through music (Yang et al., 2017). Once emotional labels are assigned to music data based on an emotional model, MEC techniques can automatically identify music emotions according to the selected model. MEC further expands the scope of musical expression and creation, enabling quantitative analysis and precise manipulation of music, automated music generation, and fostering connections between music and other domains (Gómez-Cañón et al., 2021). MEC encompasses two primary branches: Music Emotion Recognition (MER) and Music Sentiment Analysis (MSA).
MER primarily involves analyzing the acoustic characteristics of music—such as rhythm, pitch, and harmony—to identify the emotions it conveys. The goal of this task is to extract emotion-reflective features from the music signal and classify them to automatically identify or predict musical emotions. Early MEC research mainly relied on manually extracted audio features, such as spectral and temporal features, in combination with traditional machine learning algorithms like Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) for emotion classification and analysis (Caetano et al., 2013; Shi et al., 2006). As audio processing technology and computational power have advanced, researchers have employed more sophisticated feature extraction methods, such as Linear Predictive Cepstral Coefficients (LPCC) (Mini et al., 2021) and Mel-Frequency Cepstral Coefficients (MFCC) (Saxena et al., 2022). These features better capture aspects of pitch, timbre, and rhythm in music, providing more effective support for emotion computation. However, manual feature extraction has its limitations, particularly when dealing with complex musical emotions, as manually extracted features may not fully or accurately capture the emotional content in music. Recently, with the advent of deep learning, MER has achieved substantial progress, notably in emotion prediction accuracy and robustness. The application of models like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) has enabled MER to capture emotional content more effectively, especially in recognizing both short-term signals and long-term emotional trends (Jia, 2022; Sheykhivand et al., 2020).
Music Sentiment Analysis (MSA) focuses on identifying emotions within music-related text, such as lyrics, reviews, and listener feedback, by utilizing natural language processing (NLP) techniques. The core task in MSA is to extract emotional features from textual data, thus allowing researchers to discern both the emotive intent behind the music and listeners’ emotional responses (Shukla et al., 2017). Early studies in MSA relied on techniques like word frequency analysis and sentiment lexicons to identify emotional vocabulary in lyrics or comments (Xia et al., 2008). With advances in NLP, particularly the expansion of sentiment lexicons and the proliferation of deep learning models, MSA has increasingly incorporated enhanced sentiment lexicons and classification models to improve analytical accuracy. In recent years, transformer-based models, such as BERT, have been widely adopted within MSA, enabling more precise capture of complex emotions embedded in lyrics and reviews (Agatha et al., 2021).
In the field of MEC, researchers strive to develop algorithms and systems capable of interpreting, recognizing, and analyzing emotional elements in music, allowing computers to perceive and interpret musical emotions as humans do. MEC exemplifies an interdisciplinary domain, merging insights from psychology, musicology, computer science, and cognitive science to approach emotional features in music from multiple perspectives (Devaney, 2020). The ultimate goal of both MER and MSA is a profound understanding of how musical works express and influence emotions, exploring the relationship between music and human emotions to foster deeper insights into musical affect.
Recent years have witnessed an accelerating trend toward the convergence of MER and MSA within multimodal MEC research. Multimodal MEC aims to enhance emotion recognition accuracy by combining audio and textual features (Zhao et al., 2022). Research shows that audio and text features often complement each other in emotion recognition: audio can capture emotional expressions such as rhythm and melody, while lyrics and reviews offer rich semantic emotional content (Proverbio & Russo, 2021). Consequently, contemporary multimodal emotion recognition approaches have increasingly integrated deep learning techniques, allowing models to process both audio and text features through cross-learning mechanisms, thereby boosting emotion recognition performance.
As an emerging interdisciplinary field, MEC draws scholars from various academic backgrounds, each contributing unique theories and methodologies. Analyzing patterns and outcomes of interdisciplinary collaboration could illuminate the synergistic effects between fields and identify more effective research pathways. Additionally, as globalization advances, cross-regional academic collaboration increasingly drives technological progress. Examining the contributions of different countries and regions in MEC can shed light on the development of global cooperation networks and their impact.
Building upon this, the study employs a bibliometric approach to comprehensively analyze MEC research over the past decade, with a focus on the dual domains of MER and MSA. While MER and MSA stem from distinct technical fields—audio processing and NLP, respectively—MEC as a whole represents a truly interdisciplinary domain. By simultaneously addressing MER and MSA within bibliometric analysis for the first time, this study aims to reveal the cross-disciplinary nature of MEC and to provide a more comprehensive view of its research trajectory. As the field continues to integrate, multimodal approaches have gained prominence; thus, analyzing MER and MSA in tandem offers a cohesive framework for exploring multimodal MEC’s evolution, enabling a deeper understanding of its research trends, leading techniques, and core methodologies. Examining developments in fusion techniques further elucidates the main trends in multimodal MEC research, while a consolidated analysis of specific technologies, such as deep learning, across modalities offers an integrated view of technological advancements.
This study centers on answering the following core questions: Q1: What are the main research trends in MEC over the past ten years? Q2: How is cross-disciplinary collaboration structured in MEC, and what contributions do different fields make? Q3: What are the patterns in international collaboration within MEC, and how do cooperation networks vary by region? Q4: What have been the key application areas of MEC in the past decade, and what barriers might impede further development?
To address these questions, a bibliometric analysis was conducted, leveraging a scholarly database to identify relevant MEC literature and conducting quantitative analysis using bibliometric tools. First, core literature related to MEC was identified through keyword filtering and classification tools. Then, methods including co-word analysis, collaborative network analysis, and citation analysis were employed to reveal key research trends and collaborative patterns. This framework systematically maps MEC’s academic contributions and offers insights into future research directions.
The paper is structured as follows: Section 2 outlines the dataset construction and bibliometric analysis methods employed. Section 3 presents an analysis of MEC’s research trends and prominent issues over the past decade, examining interdisciplinary and cross-regional collaboration and the evolution of global academic networks. Section 4 addresses core research questions and explores potential future trajectories for MEC. Finally, Section 5 concludes with strategic recommendations based on the study’s findings.
Methods
This study draws on the Web of Science Core Collection (WoSCC) to evaluate research outputs in music emotion computing (MEC) over the past decade. By analyzing the development of MEC within WoSCC, this study provides a robust understanding of research trends, interdisciplinary collaborations, and cross-regional interactions in the field. WoSCC, a highly authoritative and selective database, indexes peer-reviewed journals known for their strong academic impact and emphasis on high-quality research. While alternative databases such as Scopus offer broader coverage, their inclusion of non-specialized journals and conference proceedings may obscure clear trends in computationally intensive domains like MEC. The bibliometric analysis began with a systematic process of literature identification and selection, as illustrated in Figure 1, to ensure the construction of a comprehensive and focused dataset.
Figure 1. Flowchart of the Dataset Construction Process
Dataset Construction
In this study, the dataset construction process adhered to rigorous identification, screening, and eligibility assessment standards, selecting relevant literature from the WoSCC database to ensure the inclusion of highly relevant and quality research. Data collection was finalized as of October 2024.
During the initial dataset identification phase, a targeted search string, TS = ((music or song or lyrics) AND (“sentiment analysis” or “emotion analysis” or “emotional analysis” or “emotion recognition” or “sentimental analysis” or “sentiment classification” or “emotion classification” or “sentiment categorization”)), was employed to identify 1206 records. This preliminary search encompassed studies on MEC from the past decade (2015–2024), establishing a robust foundation for constructing a comprehensive dataset for bibliometric analysis.
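The topic search described above can be assembled from its two term groups, which makes the query easier to audit and reuse. The following is a minimal sketch; the function name `build_topic_query` is hypothetical, and the upper-case `OR` and straight quotation marks are a formatting choice made here rather than the exact string submitted by the authors.

```python
# Term groups taken from the search string reported in the text.
subject_terms = ["music", "song", "lyrics"]
emotion_phrases = [
    "sentiment analysis",
    "emotion analysis",
    "emotional analysis",
    "emotion recognition",
    "sentimental analysis",
    "sentiment classification",
    "emotion classification",
    "sentiment categorization",
]


def build_topic_query(subjects, phrases):
    """Join two term groups into a topic (TS) query (hypothetical helper)."""
    subject_part = " OR ".join(subjects)
    # Multi-word phrases are quoted so they are matched as exact phrases.
    phrase_part = " OR ".join(f'"{phrase}"' for phrase in phrases)
    return f"TS=(({subject_part}) AND ({phrase_part}))"


query = build_topic_query(subject_terms, emotion_phrases)
print(query)
```

Keeping the term groups as lists makes it straightforward to document later additions or removals of search terms.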
The screening phase involved a multi-tiered filtering process to enhance the dataset’s coherence and suitability. An initial examination of the 1206 records excluded 263 documents that did not meet the study’s objectives or criteria. Exclusions included documents published outside the 2015–2024 period, review articles, editorial materials, book chapters, book reviews, corrections, and letters, as well as non-English publications to ensure language consistency. Following this primary screening, 943 records were retained for further eligibility assessment.
In the eligibility assessment phase, high-quality studies directly aligned with the research objectives were selected from the retained records. Following a meticulous manual review, 564 documents were excluded because they lacked direct relevance to core MEC areas. Consequently, a total of 379 publications were included in the bibliometric dataset for this study. These selected studies met predefined inclusion criteria and represented high-quality, MEC-related research, forming a reliable foundation for subsequent analyses aimed at uncovering research trends and patterns of interdisciplinary and international collaboration in the field.
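The record counts reported for the identification, screening, and eligibility phases can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures stated in the text; the function and variable names are illustrative.

```python
# Record counts reported in the text for each stage of dataset construction.
identified = 1206            # records returned by the WoSCC topic search
excluded_screening = 263     # out-of-period, excluded document types, non-English
excluded_eligibility = 564   # judged unrelated to core MEC themes on manual review


def screening_flow(n_identified, n_screened_out, n_ineligible):
    """Return the number of records remaining after each stage."""
    after_screening = n_identified - n_screened_out
    included = after_screening - n_ineligible
    return {
        "identified": n_identified,
        "after_screening": after_screening,
        "included": included,
    }


flow = screening_flow(identified, excluded_screening, excluded_eligibility)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

The result reproduces the 943 records retained after screening and the 379 publications in the final dataset.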
This dataset uniquely combines MER and MSA in MEC bibliometric analysis, offering a more comprehensive view of research hotspots, dominant methodologies, and technological pathways in the MEC field. Analyzing developments in fusion technologies within MEC reveals key trends in multimodal approaches and provides insight into the integration of MER and MSA, guiding technology development for practical applications. Additionally, research in MEC requires collaboration across disciplines such as arts, psychology, and computer science. By examining MER and MSA jointly, the study uncovers interdisciplinary and international collaboration networks, identifying high-impact research institutions and teams focused on multimodal fusion. These insights promote collaborative advancement in MEC, particularly in identifying core teams and regions dedicated to multimodal fusion research.
Bibliometric Tools and Methodology
To conduct a comprehensive bibliometric analysis of Music Emotion Computing (MEC), this study employed the Bibliometrix R-package and its interactive visualization tool Biblioshiny (Aria & Cuccurullo, 2017). These tools were chosen for their robust capabilities in bibliometric and scientific mapping, allowing systematic exploration of the citation metrics, research structure, thematic evolution, and collaboration networks (Guofang et al., 2024).
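One of the key citation metrics analyzed is Mean Total Citations per Article (MeanTCperArt), which measures the average citation impact of a collection of documents. It is calculated as:
\[
\mathrm{MeanTCperArt} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{TC}_i
\]
where $N$ is the number of documents in the collection and $\mathrm{TC}_i$ is the total number of citations received by document $i$.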
For the co-citation network, a minimum co-citation threshold of 10 was applied to filter out weaker relationships, ensuring the inclusion of robust and frequently co-cited references. The Louvain modularity algorithm was employed to detect thematic clusters based on the strength of internal co-citation links. This clustering method identifies communities of references that share strong citation relationships, revealing the intellectual structure of the field. The resulting network was further refined and spatially optimized using Pajek, which improves the visual separation of clusters and enhances interpretability (De Nooy et al., 2018).
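The thresholding and clustering steps for the co-citation network can be sketched in Python with NetworkX. This is an illustration under stated assumptions, not the Bibliometrix/Pajek pipeline used in the study: the reference labels are placeholders, the helper `build_cocitation_graph` is hypothetical, and the toy example uses a threshold of 2 instead of 10 because the sample data are tiny.

```python
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities


def build_cocitation_graph(reference_lists, min_cocitations=10):
    """Link two references when they are cited together often enough.

    reference_lists: one list of cited references per citing document.
    """
    pair_counts = Counter()
    for refs in reference_lists:
        # Each unordered pair is counted once per citing document.
        for a, b in combinations(sorted(set(refs)), 2):
            pair_counts[(a, b)] += 1

    graph = nx.Graph()
    for (a, b), count in pair_counts.items():
        if count >= min_cocitations:  # drop weak co-citation links
            graph.add_edge(a, b, weight=count)
    return graph


# Placeholder data: two groups of references plus one weak cross-group link.
toy_reference_lists = (
    [["psych_1", "psych_2", "psych_3"]] * 3
    + [["comp_1", "comp_2", "comp_3"]] * 3
    + [["psych_1", "comp_1"]]
)

toy_graph = build_cocitation_graph(toy_reference_lists, min_cocitations=2)
# Louvain groups references by the strength of their internal links.
clusters = louvain_communities(toy_graph, weight="weight", seed=42)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy case the single cross-group co-citation falls below the threshold, so the two groups of references end up in separate clusters.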
The keyword co-occurrence network was constructed using Author Keywords appearing at least five times across the dataset. This threshold was chosen to balance inclusivity and interpretive clarity by focusing on the most significant terms. In the resulting network, nodes represent individual Author Keywords, and edges indicate the frequency of keyword co-occurrence within the same publications. To evaluate the importance of each keyword, three centrality measures were applied: PageRank, which ranks keywords based on their connections to other influential terms; Betweenness Centrality, which measures a keyword’s role as a bridge connecting different parts of the network; and Closeness Centrality, which reflects how efficiently a keyword connects to all others. Visualization of the network was performed in Python using the NetworkX and Matplotlib libraries, where node size was scaled proportionally to PageRank values, node color represented thematic clusters identified through Louvain modularity detection, and edge thickness reflected the strength of keyword co-occurrence relationships.
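The construction of the keyword network and the three centrality measures can be sketched as follows. This is a minimal example, not the authors' script: the helper names and keyword data are hypothetical, and "appearing at least five times" is interpreted here as appearing in at least five documents. Note that `nx.pagerank` relies on SciPy in recent NetworkX releases.

```python
from collections import Counter
from itertools import combinations

import networkx as nx


def build_keyword_graph(keyword_lists, min_occurrences=5):
    """Build a weighted co-occurrence graph from per-document keyword lists."""
    # Count the number of documents in which each keyword appears.
    doc_freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in doc_freq.items() if n >= min_occurrences}

    graph = nx.Graph()
    for kws in keyword_lists:
        # Only keywords that pass the threshold can form edges.
        for a, b in combinations(sorted(set(kws) & kept), 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


def keyword_centralities(graph):
    """Compute PageRank, betweenness, and closeness for every keyword."""
    return {
        # Co-occurrence counts act as link strength for PageRank.
        "pagerank": nx.pagerank(graph, weight="weight"),
        # Path-based measures are left unweighted: co-occurrence counts
        # express strength, not distance.
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
    }


# Hypothetical author keywords for eleven documents.
toy_keywords = (
    [["music emotion recognition", "deep learning"]] * 5
    + [["music emotion recognition", "sentiment analysis"]] * 5
    + [["music emotion recognition", "rare term"]]
)

keyword_graph = build_keyword_graph(toy_keywords, min_occurrences=5)
scores = keyword_centralities(keyword_graph)
top_keyword = max(scores["pagerank"], key=scores["pagerank"].get)
print(top_keyword)
```

In the toy data, "rare term" is filtered out by the frequency threshold, and the keyword linking the two remaining terms receives the highest score on all three measures.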
The collaboration networks were constructed to examine both author-level and country-level collaborations. For the author collaboration network, a minimum edge threshold of one was applied, with isolated nodes removed to focus on significant co-authorship relationships. For the country collaboration network, a minimum collaboration threshold of two co-authored publications was established to highlight meaningful international partnerships while reducing visual noise caused by less significant collaborations. These networks were initially generated in VOSviewer to create co-authorship matrices (Van Eck & Waltman, 2009), and then visualized using SCImago Graphica, which applies a force-directed layout algorithm to optimize node positioning and minimize overlaps (Hassan-Montero et al., 2022). In these networks, node sizes were scaled according to the total publication outputs of authors or countries, while edge thickness reflected the strength of co-authorship ties.
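The country-level thresholding can be illustrated with a short standard-library sketch. It assumes full counting (each country is counted once per publication); the function name and the placeholder country labels are hypothetical, and the study itself derived these matrices with VOSviewer.

```python
from collections import Counter
from itertools import combinations


def country_collaboration(papers, min_joint_papers=2):
    """Return per-country output and the links that pass the threshold.

    papers: one collection of author countries per publication.
    """
    output = Counter()  # node size: publications per country
    links = Counter()   # edge weight: co-authored publications per pair
    for countries in papers:
        unique = sorted(set(countries))
        output.update(unique)
        links.update(combinations(unique, 2))
    strong_links = {
        pair: n for pair, n in links.items() if n >= min_joint_papers
    }
    return output, strong_links


# Placeholder data: five publications from countries A, B, and C.
toy_papers = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"A"}, {"B"}]

output, strong_links = country_collaboration(toy_papers, min_joint_papers=2)
print(dict(output))
print(strong_links)
```

Here the A-C link is based on a single joint publication and is filtered out, while country C still counts toward publication output.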
Through the systematic application of bibliometric tools and clustering techniques, this study provides a robust and multi-dimensional analysis of MEC research. By combining quantitative network measures with qualitative thematic interpretation, the methodological framework ensures transparency, reproducibility, and a comprehensive understanding of the research trends, collaboration patterns, and intellectual foundations within the MEC domain.
Results
This section undertakes a detailed analysis of the filtered MEC article collection, which incorporates research on both MER and MSA, examining various dimensions such as sources, authorship, countries, and thematic content.
Dataset Overview
Dataset Statistics
The dataset contains 315 Keywords Plus, generated by indexing services, illustrating broad topical coverage, alongside 973 author-provided keywords, which reflect a diverse array of specific research areas within MEC. Over the past decade, 1060 authors have contributed to MEC research, indicating substantial academic engagement and collaboration. Of the 379 documents, only 49 are single-author publications, with an average of 3.53 co-authors per document, highlighting the collaborative and interdisciplinary nature of MEC research. Furthermore, 17.41% of the documents are co-authored internationally, meaning roughly one in six studies involves cross-border cooperation, which underscores the global interest in and collaborative ethos of MEC research.
In terms of document types, the dataset includes 206 journal articles and 162 conference papers, reflecting researchers’ focus on both the sustained knowledge accumulation associated with journal publications and the rapid dissemination of cutting-edge findings in conference settings. This publication model underscores the dynamism and openness of the MEC field within fast-evolving technological and application domains. Overall, the data reflect an expanding MEC field that emphasizes collaborative work and is increasingly impactful within academia.
Figure 2 illustrates the annual publication trends in MEC over the past decade. While exhibiting some fluctuations, the overall trend is upward, rising from 27 publications in 2015 to 44 in 2024. Notably, publication volumes in 2021 and 2022 reached 42 and 55, respectively, signaling heightened interest in the field during these years.
Figure 2. Annual Scientific Publication From 2015 to 2024
Early publications (2015–2017) show relatively high mean total citations per article (MeanTCperArt), with 13.93 in 2015, 12.32 in 2016, and 13.03 in 2017, suggesting these works laid foundational groundwork for the current MEC research landscape. MeanTCperArt then declined to 8.79 in 2018 and 8.09 in 2019, before rebounding to 10.41 in 2020 and 12.1 in 2021, indicating renewed attention in the field. This renewed interest, particularly in 2021, may be attributed to the adoption of new technologies and the advancement of multimodal approaches. The relatively low MeanTCperArt for publications from the last three years is consistent with the tendency for newly published work to require time to accumulate citations.
Additionally, starting in 2020, the mean total citations per year (MeanTCperY) increased markedly, peaking in 2021 at 3.02. This trend suggests that studies from this period had substantial influence, likely driven by the rise of multimodal techniques and novel technologies that propelled the field forward. Following 2022, the MeanTCperY has declined, which is typical for recent works that have yet to amass a substantial citation count.
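The two citation indicators discussed above can be computed as in the sketch below. It assumes that MeanTCperY is MeanTCperArt divided by the number of citable years, counted inclusively from the publication year to a reference year; the exact convention applied in the study is not stated, and the citation counts in the example are hypothetical.

```python
def mean_tc_per_article(citations):
    """Average total citations over a collection of documents."""
    if not citations:
        raise ValueError("at least one document is required")
    return sum(citations) / len(citations)


def mean_tc_per_year(citations, publication_year, reference_year):
    """Average citations per article, normalized by citable years.

    Assumption: citable years are counted inclusively, so a 2021 paper
    evaluated in 2024 has four citable years.
    """
    citable_years = reference_year - publication_year + 1
    return mean_tc_per_article(citations) / citable_years


# Hypothetical citation counts for four documents published in 2021.
toy_citations = [20, 10, 6, 12]

per_article = mean_tc_per_article(toy_citations)
per_year = mean_tc_per_year(toy_citations, 2021, 2024)
print(per_article, per_year)
```

Under the same assumption, the reported 2021 MeanTCperArt of 12.1 divided by four citable years gives 3.025, in line with the reported MeanTCperY peak of 3.02.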
General Analysis
Top 10 Sources
The distribution of journals reflects MEC’s diversity and interdisciplinary nature, spanning affective computing, intelligent systems, psychology, musicology, and biomedicine. The MEC field not only draws on computer science and AI but also relies on theories from psychology and musicology to deepen understanding of emotional interaction, finding application scenarios in areas such as health monitoring and mobile platforms. This interdisciplinary integration is driving MEC’s growth in multimodal data analysis and practical applications.
Production trends over time for the top ten authors by publication volume are displayed in Figure 3, where the horizontal axis represents publication years from 2015 to 2024. Each circle represents a publication, with larger circles indicating years in which an author produced multiple works. The color intensity of each circle reflects the total citations per year, with darker shades indicating higher citation counts. Relatively high citations in specific years suggest that the author’s publications during these periods made notable contributions to the field. The presence of large, dark circles (highly cited papers with frequent publications) for authors like Yang YH and Wang JC reflects a strong influence on the research community. Their work is not only consistent but also draws substantial academic attention. Panda R and Malheiro R also show high citations for fewer publications, indicating that even limited contributions can have substantial impact. Authors such as Grekow J and Lee J have fewer publications spread out over time but still maintain a citation impact.
Figure 3. Authors’ Production Over Time
Top 10 Countries
Overall, the data indicate that MEC research has achieved global expansion, with a solid research base established across numerous countries, laying a strong foundation for international collaboration and cross-cultural applications.
Top 20 Highly Cited Papers
To analyze the 20 most-cited articles identified in this study, a systematic and transparent process was employed to ensure the robustness of the thematic interpretation. The total citation counts were extracted directly from the WoSCC database, providing a reliable foundation for further analysis. To identify the themes of the top 20 most-cited articles, we conducted an in-depth review of the full text of these publications. Based on this analysis, we categorized and summarized their themes by integrating information from their titles, abstracts, and keywords to ensure the reliability and objectivity of our conclusions. Following this, a systematic content analysis was conducted to identify recurring research topics, methodologies, and applications.
For instance, studies such as Aljanaki et al. (2015) and Jiang et al. (2016) delved into the psychological mechanisms of emotion induced by music, exemplifying psychology’s collaboration within the MEC domain. These investigations explore the impact of musical emotion on emotional regulation and stress alleviation, offering theoretical support for applications such as music therapy. Meanwhile, Ayata et al. (2018) integrated biosignal processing and computer science through data from wearable physiological sensors to create an emotion-based music recommendation system, showcasing the intersection of physiology, medical technology, and computing within affective computing. This work advances the system’s ability to dynamically perceive emotional states. Hu and Yang (2016) addressed the cross-cultural dimension of music emotion prediction by investigating how different cultural contexts interpret musical emotion, demonstrating MEC’s growing interest in cultural influences on emotional expression. This research combines anthropological and social psychology perspectives, providing critical insights for cross-cultural MEC methods.
Recent MEC studies show an intensified focus on multimodal information fusion, as in Pandeya and Lee (2020) and Pandeya et al. (2021). These studies applied deep learning models to multimodal emotion classification, merging audio and video inputs to enhance accuracy in emotion recognition. By fusing the emotional features of music, video, and text, these studies refined the understanding of emotion and expanded MEC’s application landscape.
Several high-impact articles, including Chen et al. (2015) and Zhang et al. (2018), focused on developing emotion datasets. These datasets provide diverse multimodal data (e.g., audio, lyrics, and video), supporting interdisciplinary research across multiple fields. Other contributions, such as Ayata et al. (2018) and Moscato et al. (2020), concentrated on developing emotion-based music recommendation systems. These applications integrate knowledge from psychology, computer science, and AI, offering broad potential in emotion recognition and regulation, signaling MEC’s interdisciplinary promise in fields like intelligent recommendation systems and personalized experiences.
Collaboration Network Analysis
The author collaboration network was generated to uncover collaborative relationships among researchers within the Music Emotion Computing (MEC) domain. A minimum edge threshold of one was applied to ensure the inclusion of meaningful co-authorship ties while removing isolated nodes (authors with no substantial collaborations) to maintain focus on significant interactions. To highlight the most active contributors, the analysis was limited to the 50 most collaborative authors, ranked based on their co-authorship frequency.
In the resulting visualization, node size represents the total number of articles published by each author, reflecting their research productivity, while the thickness of edges indicates the strength of co-authorship relationships, measured by the number of jointly published articles. The Louvain modularity algorithm was employed to detect collaborative clusters, visually distinguishing groups of closely connected authors through color-coded communities. This approach reveals key patterns of collaboration and identifies central hubs driving research innovation in the MEC field.
The visualization highlights prominent researchers such as Yang YH and Wang JC, who occupy central positions in the network due to their high research output and extensive collaborative ties. These authors form collaborative hubs, serving as focal points for co-authorship and knowledge exchange. For example, Yang YH’s collaboration cluster includes connections with notable researchers such as Gómez E, Hu X, and Zhang KJ, demonstrating a cohesive and productive research group. Similarly, another significant cluster involving Paiva RP, Panda R, Malheiro R, and Gomes P suggests a dedicated collaboration group focused on specific thematic areas within MEC.
Additional clusters highlight other active research groups, such as the network involving Peng J, Bai JJ, Luo K, Shi JL, and Feng LX, indicating frequent co-authorship among these researchers. Interestingly, certain authors act as bridges between otherwise distinct clusters, facilitating knowledge sharing and interdisciplinary collaboration. For instance, Hu X connects Yang YH’s group with other research networks, fostering exchanges of ideas and methodologies across clusters.
These findings demonstrate the presence of both tightly connected research communities and interlinked clusters, which collectively drive the advancement of MEC. Authors like Yang YH and Wang JC play particularly influential roles, acting as central figures in the field’s collaborative structure and contributing to the development of innovative research through their extensive co-authorship networks (Figure 4).
Figure 4. Collaboration Network - Co-Authorship Between Authors
The country collaboration network was constructed to examine international partnerships in MEC research. A minimum threshold of two co-authored publications between countries was applied to focus on significant international collaborations while filtering out less substantial ties. The co-authorship matrix generated using VOSviewer was subsequently visualized in SCImago Graphica, where a force-directed layout algorithm was applied to optimize node positioning and minimize edge overlaps. In this network, nodes represent countries, and their size is proportional to the total number of publications contributed by each country. The thickness of edges reflects the strength of collaboration between two countries, measured by the number of co-authored articles.
Figure 5 illustrates the international collaboration patterns within MEC research, showcasing a global co-authorship network. Notable cross-continental partnerships emerge, particularly among countries in Europe, North America, and Asia. China stands as a major hub, with robust bilateral collaborations with the UK, Australia, and India, underscoring its central role in advancing MEC research partnerships internationally. The UK, US, and Australia also function as key nodes with extensive cooperative ties. The UK collaborates closely with other European countries like the Netherlands, France, and Italy, demonstrating its active role in fostering pan-European research initiatives. A prominent Asian cluster, featuring China, Japan, South Korea, and India, signals strong regional engagement in MEC research. Meanwhile, a European cluster involving the UK, the Netherlands, Spain, Italy, and Germany suggests substantial collaborative efforts within Europe. Australia’s close links with both China and the US indicate its bridging role between East and West in MEC research. This network underscores the highly internationalized environment within MEC research.
Figure 5. Collaboration Network - Co-Authorship Between Countries
The co-citation network shown in Figure 6 was generated to uncover the intellectual structure of the Music Emotion Computing (MEC) field by identifying relationships among frequently co-cited references. A minimum co-citation threshold of 10 was applied to ensure the inclusion of references with substantial co-citation links, highlighting the most influential works. The Louvain modularity algorithm was used to detect thematic clusters, with each cluster visually represented by a distinct color to facilitate interpretation.
Figure 6. Co-Citation Reference Network
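As a rough illustration of the counting step behind such a network, two references are co-cited whenever they appear together in the reference list of the same citing paper. The sketch below assumes a hypothetical input of one reference list per citing paper and uses placeholder reference identifiers; it covers only the thresholding step and does not reproduce the Louvain clustering.

```python
from collections import Counter
from itertools import combinations


def cocitation_edges(reference_lists, min_cocitations=10):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited-reference identifiers per citing
                     paper (hypothetical input format).
    min_cocitations: pairs co-cited fewer times than this are dropped
                     (the text applies a threshold of 10).
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        counts.update(combinations(sorted(set(refs)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_cocitations}


# Placeholder identifiers; the counts are invented for illustration.
reference_lists = [["ref_A", "ref_B"]] * 12 + [["ref_A", "ref_C"]] * 3
strong_pairs = cocitation_edges(reference_lists, min_cocitations=10)
print(strong_pairs)
```

Only the pair co-cited 12 times survives the threshold here; the pair co-cited 3 times is filtered out, which is the effect the threshold is meant to have on weak links.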
To analyze and interpret these clusters, the most central and prominent references—determined by their high co-citation frequency—were systematically reviewed. Titles, abstracts, and full texts of core references within each cluster were examined to extract key themes, methodologies, and research focuses. The analysis reveals distinct thematic clusters that reflect the interdisciplinary nature of MEC, encompassing foundational psychological theories, computational methodologies, and practical applications.
The yellow cluster centers on emotion psychology, featuring foundational studies on emotion theory that have significantly advanced the categorization and assessment of emotions, such as Hevner (1936) and Thayer (1990). The green cluster focuses on practical methods in MER, highlighted by studies like Yang et al. (2008), which employed machine learning models to analyze audio signals and detect emotional content, alongside Russell (1980), whose circumplex model of affect supplies the dimensional framework that such methods build on. The purple cluster concentrates on the extraction and analysis of audio features and their roles in emotion recognition, with influential studies like Kim et al. (2010) and Tzanetakis and Cook (2002). These works underscored the overlap between computational audio analysis and affective computing, often relying on advanced feature extraction techniques (such as MFCC) to develop models that classify emotions elicited by musical stimuli.
The pink cluster emphasizes multimedia tools in MEC, focusing on the intersection of multimedia systems and human emotion. It explores how computational methods enhance multimedia systems to make them more intuitive and responsive to user emotions, as seen in Panda et al. (2020). The blue cluster delves into the cognitive processing of musical content and emotional cues in musical performance, merging insights from cognitive science to understand how individuals perceive musical elements and how these translate into emotional responses, with studies like Bigand et al. (2005) and Scherer (2004).
Together, these clusters highlight MEC’s interdisciplinary nature, drawing from psychological theories, computational models, and practical applications to deepen understanding and enhance emotional recognition in music. The connections between clusters, such as those between psychological research (yellow) and machine learning/audio analysis (green), emphasize that understanding emotion in music demands both strong psychological theories and computational implementation. This analysis reveals how foundational theory integrates with advanced technology, contributing to the development of systems that effectively interpret and respond to human emotions in multimedia environments.
Thematic Analysis
The word cloud shown in Figure 7 was generated to identify the most frequent and prominent research themes in MEC research over the past decade, using Author Keywords as the input data. This choice was made because Author Keywords, provided directly by the authors, offer a precise representation of core research themes, unlike Keywords Plus, which can include overly generic or irrelevant terms.
Figure 7. Word Cloud
To construct the word cloud, all Author Keywords were extracted from the dataset, and a filtering process was applied to remove high-frequency but non-meaningful terms. A stoplist was used to exclude generic words such as “system” or “analysis.” The final word cloud visualized the top 50 most frequent keywords, ranked based on their occurrence frequency across the dataset. In the resulting visualization, the size of each keyword corresponds to its frequency.
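The filtering and ranking steps above amount to a frequency count with a stoplist. The following sketch assumes a hypothetical input of one keyword list per publication; the stoplist contains only the two examples named in the text, and lowercasing keywords before counting is an added assumption rather than a documented step of the study.

```python
from collections import Counter

# Only the two examples given in the text; the full stoplist is not specified.
STOPLIST = {"system", "analysis"}


def top_keywords(keyword_lists, stoplist=STOPLIST, top_n=50):
    """Return the top_n most frequent keywords after stoplist filtering."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace (an assumption), one count per paper.
        normalized = {kw.strip().lower() for kw in keywords}
        counts.update(kw for kw in normalized if kw and kw not in stoplist)
    return counts.most_common(top_n)


# Toy records, invented for illustration.
records = [
    ["Music Emotion Recognition", "Deep Learning", "system"],
    ["deep learning", "EEG"],
    ["music emotion recognition", "deep learning", "analysis"],
]
print(top_keywords(records, top_n=2))
```

The resulting (keyword, count) pairs are what a word-cloud renderer would consume, with the count driving the font size.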
The word cloud reveals the field’s interdisciplinary and multimodal progression. Keywords like “music emotion recognition,” “affective computing,” and “emotion perception” indicate MEC’s focus on identifying, perceiving, and understanding music-induced emotions. These areas often incorporate insights from psychology and neuroscience to decode the diversity of human emotional responses. High-frequency terms such as “deep learning,” “machine learning,” “convolutional neural network,” and “recurrent neural networks” underscore the significance of computer science and AI in MEC, while terms like “EEG” and “electroencephalography” highlight the role of neurophysiological signals. These interdisciplinary studies leverage biological signals, such as EEG data, to explore the impact of music on brain activity, combining neuroscience with affective computing to provide empirical insight into the physiological mechanisms underlying musical emotions.
Further terms, including “audio features,” “lyrics,” and “music information retrieval,” point to the role of multimodal data in MEC. These studies transcend audio signals alone, integrating textual analysis from lyrics to attain a more nuanced understanding of music’s emotional impact. This multimodal approach enriches emotional analysis by combining auditory features with linguistic sentiment, facilitating a deeper, layered expression of emotions. Terms like “valence” and “arousal” signify the application of the dimensional emotion model (pleasure and activation levels), mapping auditory features to an emotional space and enhancing the integration of varied data types.
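To make the dimensional model concrete, the sketch below maps a (valence, arousal) point to one of four quadrants of the emotional space. The value range of -1 to 1, the function name `va_quadrant`, and the quadrant labels are illustrative assumptions; studies in the field use a variety of scales and label sets.

```python
def va_quadrant(valence, arousal):
    """Map a valence-arousal point to a coarse quadrant label.

    Both inputs are assumed to lie in [-1, 1], with 0 as the neutral
    midpoint. The labels are illustrative, not a standard taxonomy.
    """
    if valence >= 0 and arousal >= 0:
        return "happy/excited"   # pleasant, activated
    if valence < 0 and arousal >= 0:
        return "angry/tense"     # unpleasant, activated
    if valence < 0:
        return "sad/depressed"   # unpleasant, deactivated
    return "calm/relaxed"        # pleasant, deactivated


# Hypothetical model outputs for four music clips.
for point in [(0.7, 0.6), (-0.5, 0.8), (-0.5, -0.4), (0.6, -0.3)]:
    print(point, va_quadrant(*point))
```

A regression model that predicts continuous valence and arousal values from audio or lyric features can be reduced to categorical labels in this way when a discrete output is needed.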
In the context of applications, terms like “music recommendation,” “personalization,” and “recommender systems” illustrate MEC’s deployment within music recommendation systems. By combining multimodal data, these emotion-driven recommendation systems aim to personalize music suggestions based on user sentiment. Terms like “transfer learning” and “multi-task learning” highlight a focus on model generalizability, enabling data sharing and optimization across fields and enhancing adaptability within multimodal emotion computing.
Innovative methods of data collection and labeling are hinted at by “crowdsourcing,” which highlights the expansion of emotion annotation through large-scale crowdsourced data. The incorporation of “mfcc” and “midi” illustrates standard audio processing techniques and data formats that underpin MEC research, promoting standardized multimodal data use.
Figure 8 displays a co-occurrence network of keywords, where terms’ co-occurrence within publications reveals underlying research trends. The network was constructed to map the conceptual structure of MEC research by analyzing relationships between frequently co-occurring Author Keywords. Keywords appearing at least five times in the dataset were included in the analysis to focus on significant terms while reducing noise. In the resulting network, nodes represent individual keywords, with their size scaled according to PageRank values, a measure of relative importance based on connections to other influential nodes. The edges reflect the strength of co-occurrence relationships, with thicker edges indicating stronger associations. Clusters of keywords were detected using the Louvain modularity algorithm, and each cluster was assigned a distinct color to represent a thematic group.
Figure 8. Co-Occurrence - Keyword Network Analysis
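The pipeline described above (frequency filter, weighted co-occurrence edges, PageRank node scores) can be approximated with the standard library alone. This is a sketch under stated assumptions: the input format, function names, damping factor of 0.85, and fixed iteration count are choices made for the example, the toy data uses a lower frequency threshold than the study's five, and the Louvain clustering step is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(keyword_lists, min_freq=5):
    """Weighted undirected graph of keywords that co-occur in a paper."""
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in freq.items() if n >= min_freq}
    weights = Counter()
    for kws in keyword_lists:
        weights.update(combinations(sorted(set(kws) & kept), 2))
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    # Keywords with no surviving co-occurrence edge do not appear as nodes.
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    strength = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            for nbr, w in graph[n].items():
                # Each node passes its rank to neighbors in proportion
                # to edge weight.
                new[nbr] += damping * rank[n] * w / strength[n]
        rank = new
    return rank


# Toy records, invented for illustration; min_freq lowered to 2 to fit them.
records = [
    ["deep learning", "music emotion recognition"],
    ["deep learning", "eeg"],
    ["deep learning", "music emotion recognition", "eeg"],
    ["midi"],
]
graph = cooccurrence_graph(records, min_freq=2)
rank = pagerank(graph)
print(max(rank, key=rank.get))
```

In the toy data, "deep learning" co-occurs most strongly with the other terms and therefore receives the highest score, which is the property used to scale node size in the figure.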
The blue cluster encompasses core concepts in emotion and sentiment analysis, with terms like “music emotion recognition,” “music information retrieval,” “arousal,” and “sentiment analysis.” This group represents foundational ideas and methods for emotional content detection, forming the bedrock of more complex systems for emotion analysis. The orange cluster focuses on computing techniques and models, with terms like “deep learning,” “convolutional neural network,” “natural language processing,” “neural network,” and “machine learning,” reflecting the use of AI and machine learning models to capture intricate patterns critical for nuanced emotion analysis.
The green cluster pertains to feature engineering and data processing, with terms like “audio feature extraction,” “feature selection,” “mfcc,” and “music features.” These processes are essential in converting raw data into meaningful inputs for machine learning models, ultimately determining system quality and performance in emotion recognition. The pink cluster centers on advanced topics and applications, with terms like “affective computing,” “music recommendation,” “personalization,” “recommender systems,” and “transfer learning.” This highlights MEC’s practical applications, including emotion-driven content delivery and intelligent interaction systems, while emphasizing the role of transfer learning in enhancing learning efficiency across fields. The purple cluster includes interdisciplinary areas, with keywords such as “electroencephalography,” “EEG,” “citizen science,” and “social media.” These terms underscore the expansion of emotion analysis beyond traditional data sets, incorporating physiological, social, and community-driven data to enrich insights.
The co-occurrence analysis indicates a growing trend toward using advanced AI techniques for emotion analysis, revealing the field’s diversification into applications like personalized music recommendation and EEG-based emotion recognition. The appearance of keywords like “natural language processing” and “arousal” highlights a holistic approach that integrates linguistic, physiological, and auditory data for a deeper exploration of MEC.
Figure 9 uses a thematic trend chart to depict the evolution of key MEC terms over the last decade. Circles represent term frequency, divided into three levels: frequency ≤10 (small circles), frequency from 11 to 20 (medium-sized green circles), and frequency >20 (larger purple circles). The evolving themes reflect shifting technological and application priorities within the field. From 2016 to 2018, terms like “feature selection,” “feature extraction,” and “support vector machine” emerged frequently, indicating an emphasis on foundational feature extraction methods that would underpin future model complexity. Keywords like “cross-cultural” and “valence” from this period signify attention to cultural variance in emotional perception.
Figure 9. Trend Topics
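The three frequency levels used for circle size can be written as a simple binning rule. The function name `frequency_tier`, the tier labels, and the example counts below are hypothetical; only the cut-offs at 10 and 20 come from the text.

```python
def frequency_tier(freq):
    """Bin a term frequency into the three circle sizes of the trend chart."""
    if freq <= 10:
        return "small"
    if freq <= 20:
        return "medium"
    return "large"


# Hypothetical term frequencies, invented for illustration.
term_counts = {"term_a": 8, "term_b": 14, "term_c": 37}
tiers = {term: frequency_tier(n) for term, n in term_counts.items()}
print(tiers)
```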
By 2018, terms such as “deep learning” and “convolutional neural network” became prevalent as deep learning increasingly replaced traditional machine learning approaches. Between 2018 and 2020, deep learning significantly improved emotion recognition accuracy and scalability, especially in integrating multimodal data sources such as audio and lyrics. The rising frequency of terms like “music recommendation” and “personalization” indicates the development of emotion-based recommendation systems during this period, which became a prominent MEC application.
From 2020 onward, the integration of multimodal data became mainstream within MEC research. The increased appearance of “EEG,” “audio features,” and “midi” shows the broad adoption of combining physiological, auditory, and symbolic data, enabling more complex emotion analysis and enhancing MEC systems’ sensitivity to user emotional states. Keywords such as “transfer learning” and “multi-task learning,” appearing after 2021, reflect attempts to leverage existing knowledge in data-limited situations, particularly in cross-cultural emotion computation. This trend has broadened MEC models’ applicability across diverse cultural contexts and personalized scenarios.
From 2022 to 2024, terms like “artificial intelligence,” “deep learning,” and “recurrent neural networks” continued to grow, showcasing AI’s role in personalization and sophisticated multimodal data analysis. EEG and other physiological data have gained traction during this period, reflecting the interdisciplinary expansion of affective computing. MEC is now increasingly merging psychology, brain science, and computer science to further uncover music’s emotional impact.
Discussions
This section draws on insights from the bibliometric analysis and literature review to address the four research questions introduced at the study’s outset, offering an expanded discussion of MEC developments over the past decade.
What are the Main Research Trends in MEC over the Past Ten Years?
MEC research over the past decade has transitioned from foundational feature extraction to multimodal deep learning and, more recently, the integration of physiological signals. Bibliometric data show that early MEC studies, particularly between 2015 and 2017, focused on emotional feature extraction and traditional machine learning techniques like SVMs to improve classification accuracy. This foundational phase established MEC’s emotional modeling framework, though the reliance on single-modal data constrained classification adaptability and scalability (Panda et al., 2015; Xing et al., 2014).
From 2018 to 2020, MEC saw a pivot toward deep learning and multimodal data integration, with algorithms such as CNNs and LSTMs significantly enhancing emotion recognition accuracy, especially in the fusion of audio and lyrics data sources (Dong et al., 2019; Sarkar et al., 2019). This period marked a shift from single-source data toward multimodal processing and initial applications in music recommendation and personalized affective systems. Researchers began combining audio, lyrics, and social media data to increase precision and ensure more reliable model performance across contexts (Baltazar & Västfjäll, 2020; Shen et al., 2020).
After 2020, MEC research evolved toward comprehensive multimodal fusion, including physiological signals like EEG, expanding emotion recognition from audio-based analysis to physiologically integrated emotion models (Liu et al., 2024; Yin et al., 2022). This shift enhances the perceptual depth of emotional models and extends MEC’s relevance to interdisciplinary areas, such as affective neuroscience, positioning MEC as an emergent field poised to impact affective recommendations, therapeutic emotional monitoring, and cross-cultural emotion analysis (Wang et al., 2022; Zhou et al., 2022).
How is Cross-Disciplinary Collaboration Structured in MEC, and what Contributions Do Different Fields Make?
Cross-disciplinary collaboration in MEC reflects the complexity of music emotion recognition, drawing on psychology, computer science, neuroscience, and musicology. Bibliometric analysis suggests that psychology offers foundational theories on emotional categorization and perception, providing MEC with the theoretical basis to map emotions within models like the valence-arousal framework (Croom, 2014). Such models not only ground emotion classification but also support applications in therapeutic music interventions (Santana et al., 2023).
Computer science contributes through algorithmic advancements for emotion classification and data fusion, particularly with the deployment of deep learning, CNNs, and NLP techniques across MEC applications. These methods elevate recognition accuracy while facilitating multimodal data integration (Pandeya et al., 2021; Sams & Zahra, 2022). The recent adoption of transfer and multi-task learning in MEC underscores its adaptability across cultural contexts and varying datasets (Qiu et al., 2022; Tong, 2022).
Additionally, physiology and neuroscience bring insights from neuroaffective research to MEC, especially through physiological data like EEG, which provides a more intricate understanding of the neurological basis of emotions (Cui et al., 2022). EEG data, for instance, enable MEC research to both detect affective states and elucidate the biological mechanisms underpinning these responses, establishing a robust scientific foundation for future cross-context and cross-cultural emotion models (Liu et al., 2022).
What are the Patterns in International Collaboration within MEC, and How Do Cooperation Networks Vary by Region?
Analysis of MEC’s international collaboration networks reveals significant contributions from Asia, North America, and Europe, with a shared focus on fostering cross-regional academic exchange and innovation. China’s high publication volume indicates substantial resource investment in MEC, notably in AI-driven emotion recognition applications. China maintains active collaborations with the UK, the US, and India, achieving progress in large-scale emotion recognition and neuro-signal processing. This international cooperation fosters MEC’s globalization while enhancing applicability across cultural and linguistic landscapes.
European MEC research is marked by diversity and regional cooperation. The UK, Germany, and Spain offer strong foundations in theoretical frameworks and psychology-based applications, while East Asian countries (China, Japan, South Korea) focus on cultural adaptation in MEC, such as cross-cultural music emotion understanding and the localization of multimodal data applications.
Australia plays a bridging role in MEC’s global collaboration, actively engaging with both Eastern and Western countries. In particular, Australia’s frequent partnerships with China and the US encourage MEC’s diversity, facilitating cross-cultural studies that enrich the field. Overall, MEC’s international networks are broad and multilayered, advancing through cross-disciplinary research to address challenges in emotion recognition across diverse cultural and linguistic contexts.
What Have Been the Key Application Areas of MEC in the Past Decade, and What Barriers Might Impede Further Development?
The principal applications of MEC encompass emotion-driven music recommendation systems, mental health management, trend detection in music popularity, and assistance in music composition. These applications capitalize on advancements in affective computing, significantly improving the accuracy of emotion recognition and real-time emotional response.
Emotion-based recommendation systems, for instance, tailor music content to align with users’ current emotional states, enhancing personalization (Abdul et al., 2018; Gilda et al., 2017). By combining emotion analysis techniques across audio, lyrics, and video data, these systems dynamically adjust recommendations based on users’ real-time emotional fluctuations. This showcases the deep integration of affective computing with intelligent recommendation.
In the mental health sector, MEC’s potential applications are broad. Emotion recognition in music therapy, for example, has been explored to alleviate anxiety and stress, with EEG and other physiological signals gradually incorporated for real-time emotional monitoring (Gallego & García, 2017; Magee & Davidson, 2002). The ability to monitor emotional states through physiological signals supports more precise emotional management, highlighting MEC’s promise in mental health applications (Ray & Mittelman, 2015).
In social media analysis, MEC can analyze public sentiment toward songs and artists, allowing researchers and industry experts to track trends and shifts in popular music. Through NLP techniques, MEC can identify sentiment—positive, negative, or neutral—within user comments, providing insights into the public reception of specific songs or artists (Kumpulainen et al., 2020). Analyzing keywords and emotional shifts in commentary allows MEC to reveal thematic, stylistic, or emotional trends tied to specific genres (Goel et al., 2022).
MEC also aids music creators by offering emotion-informed guidance to help them craft music aligned with particular emotional themes (Wang, 2021). MEC can assign emotional tags—such as joy, sadness, or exhilaration—to musical segments, guiding creators in selecting melodies and harmonies that align with the desired emotional tone (Cai & Cai, 2019). During the creative process, MEC can provide real-time feedback by analyzing the congruence between a creator’s compositions and target emotions. By incorporating user preferences and historical styles, MEC enables personalized composition tools, enhancing the depth and quality of musical creation (Wang et al., 2024).
Despite MEC’s promising applications, several factors may limit its further development. First, current multimodal MEC methods face challenges in processing and integrating diverse data sources—such as audio, video, lyrics, and physiological signals—due to their heterogeneity. Effective alignment and fusion of these data streams, particularly in real-time emotion analysis, remains complex.
Second, there are substantial cultural differences in the interpretation of musical emotions, with existing emotion-labeling systems often rooted in Western cultural contexts, limiting their applicability in other cultural settings. This lack of cross-cultural adaptability hampers the global rollout of MEC systems, particularly in recommendation and emotion regulation applications.
Lastly, MEC technology often involves the collection and analysis of sensitive emotional data and physiological signals, raising privacy and ethical concerns. Data collection and handling in privacy-sensitive contexts must adhere to stringent privacy policies and regulations, as any lapses may deter user acceptance of MEC systems.
Limitations and Future Directions
While this study provides valuable insights into the trends, collaboration networks, and thematic evolution of Music Emotion Computing (MEC), it is important to acknowledge that bibliometric visualizations, including co-occurrence maps, collaboration networks, and co-citation networks, are influenced by methodological choices. These include parameter settings, thresholds, and clustering algorithms, all of which significantly impact the resulting network structures and visual representations.
For instance, co-citation and collaboration networks rely on thresholds to filter meaningful relationships, such as the minimum number of co-citations (set at 10 for Figure 6) or co-authored publications (set at 2 for Figure 5). Adjusting these thresholds could alter the prominence of nodes and edges, potentially highlighting different collaborative or citation relationships. Similarly, in constructing the keyword co-occurrence network (Figure 8), a minimum frequency threshold of five was applied to focus on significant terms and reduce noise. Modifying this frequency could change the keywords included in the analysis and, subsequently, the observed relationships.
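One simple way to gauge this sensitivity is to count how many edges survive at several candidate thresholds. The sketch below assumes edge weights are already available as a mapping from node pairs to counts; the function name, node labels, and weights are invented for illustration.

```python
def edges_surviving(edge_weights, thresholds):
    """Count the edges whose weight meets each candidate threshold."""
    return {
        t: sum(1 for w in edge_weights.values() if w >= t)
        for t in thresholds
    }


# Hypothetical pair weights (e.g., co-citation or co-authorship counts).
edge_weights = {
    ("A", "B"): 12,
    ("A", "C"): 9,
    ("B", "C"): 3,
    ("C", "D"): 2,
}
sweep = edges_surviving(edge_weights, thresholds=(2, 5, 10))
print(sweep)
```

In this toy case, raising the threshold from 2 to 10 reduces the network from four edges to one, which shows how strongly the visible structure can depend on the chosen cut-off.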
Thematic clusters detected in the networks are also sensitive to the chosen clustering algorithm. In this study, the Louvain modularity algorithm was employed due to its robustness in identifying community structures based on internal connectivity. While effective, alternative algorithms might generate slightly different groupings or interpretations of the dataset. These parameter-dependent outcomes underscore the importance of recognizing that the maps presented are indicative interpretations rather than definitive conclusions.
Future studies could address these limitations by exploring the impact of alternative parameters and clustering methods or by incorporating comparative analyses across different bibliometric tools and data sources, such as Scopus or other databases, to validate and extend the findings.
Conclusions
As an emergent interdisciplinary field, MEC has demonstrated robust growth over the past decade. By integrating knowledge from computer science, psychology, physiology, and musicology, MEC has developed a comprehensive research framework from basic emotional feature extraction to complex multimodal emotion recognition. Bibliometric analysis indicates that MEC research has evolved from traditional machine learning methods to deep learning and multimodal fusion, with physiological signals progressively incorporated into emotion models, adding a richer biological foundation to affective computing. This progression in technology and methodology has opened broad opportunities for MEC applications, such as emotion-driven recommendation, intelligent interaction, and mental health management, and has created feasible pathways for personalized and cross-cultural emotion recognition. Global collaboration networks have emerged, with researchers across Asia, North America, and Europe actively collaborating and sharing resources, which has significantly supported MEC’s international growth.
Yet despite notable achievements in both technology and applications, MEC still faces key challenges. Multimodal data fusion requires further advancements to precisely and efficiently process the heterogeneity of diverse data types like audio, video, lyrics, and physiological signals. Additionally, cultural differences in musical emotion perception hinder the universality of emotion models. To support MEC’s global adaptability, there is an urgent need to establish universally applicable emotion labels and cross-cultural emotion models. Finally, MEC systems raise privacy and ethical concerns in handling emotional data and physiological signals, necessitating stringent data management and privacy protections to secure user trust.
In sum, MEC holds great potential as a cutting-edge, multidisciplinary, multimodal, and globally cooperative field. To unlock its broader applications in emotion recognition, intelligent recommendation, and emotion regulation, substantial progress in technology, cultural adaptability, and privacy protection is essential. Addressing these challenges will enhance MEC’s utility and lay a foundation for its expansion in affective computing and human-computer interaction, propelling the field to new heights.
Statements and Declarations
Footnotes
Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Anhui Provincial Quality Engineering Project (2023jyxm1421 and 2022jnds051).
