Abstract
In this paper, a sparse feature extraction method is presented based on sparse decomposition and multiple musical instrument component dictionaries to address the challenges of existing methods in component-recognition and analysis of mixed musical instrument music data. These methods, which are often dependent on data labels, and rely primarily on frequency domain or physical features, can be improved significantly using this technique. Through the in-depth analysis of the sparse coefficient vectors, this method is capable of generating independent sparse music features that are highly interpretable and have been shown to intuitively express the composition of musical instruments, and capture the variations of emotion in the music. Consequently, this approach has great potential for application in the field of mixed musical instrument composition analysis and other time-varying signal analysis.
Introduction
A music signal is a type of time-varying audio data that consists of fixed elements combined in different ways and at different intensities, which makes it a complex and variable source. In music analysis, identifying the instruments being played is very important, especially in the study of chamber music, concertos, and symphonies. However, because these genres involve multi-instrument ensembles and changing intensities, it is difficult to obtain accurate labels for real mixed audio for machine learning, which can lead to unsatisfactory instrument recognition results. One solution to this problem is the use of interpretable feature extraction methods. Traditional audio analysis and recognition methods are usually based on frequency-domain and physical features, for example, FFT (Fast Fourier Transform), CQT (Constant Q Transform), MFCC (Mel Frequency Cepstrum Coefficient), DCT (Discrete Cosine Transform), and WT (Wavelet Transform). These can reflect the frequency changes in music signals, but for mixed-instrument compositions the dominant frequency-domain features do not identify the instruments directly, and changes in the harmonics must be used to recognize them. In mixed music, the accuracy of harmonics and other features can be affected by the overlapping and intensity changes of the instruments, making the analysis complex.
Sparse decomposition, a well-established method for compressing signals and analyzing their components, has been studied extensively by the research community [1] and has advanced significantly. Its applications include image and audio storage [2], super-resolution reconstruction and combination [3, 4], noise elimination [5], signal measurement of events or components [6], and sparse representation classification [7, 8]. Music signals, which are composed of notes with definite shapes but fluctuating amplitudes and combinations, are also well suited to this method. The physical interpretation of sparse decomposition is evident. Consequently, this technique makes quantitative assessment and investigation of music genre and mode feasible, leading to a range of research and application prospects. For instance, a sparse representation classifier has been employed to identify chords in music [9], and sparse representation has been combined with temporal modulation for music genre classification [10]. Sparse features have been applied to instrument recognition [11] and to the evaluation of music performance [12], achieving impressive recognition results. Furthermore, sparse coding has been employed for the transcription of polyphonic music [13], and fast convolutional sparse coding has been used for the transcription of common and context-sensitive piano music [14, 15].
Sparse coding is a powerful technique that has been developed to generate music [16]. This method involves extracting sparse features from a given piece of music. These features are then used to generate new and original music pieces. Sparse coding is an important technique used in the analysis of musical signals for timbre modelling [17]. Musical timbre refers to the unique tonal and textural qualities of a sound that enable us to distinguish one sound from another, even when they have the same pitch and loudness. In sparse coding, a musical signal is decomposed into a set of basis functions that represent different spectral components of the sound. These basis functions are chosen such that very few of them are needed to accurately reconstruct the original signal. A sparsely coded model for symbolic music generation is gaining popularity in the field of artificial intelligence and music [18]. This model uses sparse coding, a technique that reduces a high-dimensional input into a smaller and more representative set of features, to encode music data in a concise manner. Sparsity-driven composition of atonal music is a modern approach that has gained popularity among avant-garde composers [19]. This method involves a technique that is rooted in the concept of sparsity, the idea that a musical composition can be represented through a sparse and compact set of fundamental material. Exploring sparsely coded representations for music structure analysis is an essential research topic in the field of music information retrieval [20]. Music structure is an essential aspect of music analysis that deals with identifying various sub-sections, such as verses, choruses, and bridge sections of a piece of music. These sub-sections form the basis for further analysis and processing, like music recommendation, cover song identification, and music annotation.
The studies above cover chord recognition, genre and instrument classification, and a variety of modulation analyses, all highlighting the effectiveness of sparse decomposition in music signal analysis.
The use of computational intelligence in music composition has been a subject of great interest in recent years. The paper on the EvoComposer algorithm presents a new approach to music composition using evolutionary algorithms [21]. The authors show how they use a set of rules to generate four-voice music compositions that they test against a set of predefined metrics. They report achieving significant results compared to previously existing algorithms in the field. In a related study, a hybrid approach is presented to automatic music composition [22]. The approach combines different computational intelligence techniques, such as genetic algorithms and fuzzy logic, to generate music. A set of predefined musical rules is used to generate compositions that are then evaluated using heuristic metrics. In a fascinating study, a bio-inspired learning approach is presented to music style [23]. A set of musical style patterns are learned from existing music compositions. These patterns are then used to generate new compositions that emulate the learned musical styles. The authors report impressive results with their approach, showing that their model can achieve high accuracy in emulating different musical styles. These three studies highlight the potential that computational intelligence has in music composition. The use of evolutionary algorithms, hybrid approaches, and bio-inspired learning show promise in generating high-quality compositions that can rival those created by human composers. These techniques can be used to create customized music for different applications, such as film scores, video games, and music therapy.
In contrast to conventional techniques, sparse decomposition is an effective method that works on time-domain waveforms and is driven by training data, which opens a path to music analysis and processing. Existing studies suggest that the sparsity of music has a pronounced link to its genre and emotion [10, 24]. This paper presents the Sparse Performance Index (SPI), which is based on the sparsity of a sample's reconstruction coefficient vector, and shows how to use SPI as a feature for instrument recognition and time-domain music analysis in mixed-instrument audio. To address the difficulty of labeling and quantifying unlabeled mixed music audio, multiple instrument component dictionaries are produced. With these dictionaries, SPI sparse features can be extracted for real-time mixed-instrument analysis and visualization of music samples. The viability of this scheme is verified in the experiments.
In this paper, the sparse feature extraction method is founded on multiple instrument dictionaries and sparse decomposition. It addresses the difficulties that current methods face in recognizing individual musical components and analyzing mixed-instrument music data. By avoiding labels for mixed data and depending on neither frequency-domain nor physical characteristics, the method provides interpretable, self-contained music features that reveal changes in emotional expression and instrument composition. This is useful for analyzing mixed-instrument compositions and other signals that vary over time.
Materials and methods
Basic principles of sparse decomposition
The dictionary $D$ is used to sparsely decompose the sample $y_i$. The general expression is shown in Equation (1):

$$y_i = D\alpha_i + e_i \tag{1}$$

where $D = \{d_1, d_2, \ldots, d_M\}$ and the length of each atom $d_j$ is $L$; $y_i = (y_{i1}, y_{i2}, \ldots, y_{iL})^T$ and $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{iM})^T$ represent the $i$th sample and its reconstruction coefficient vector; and $e_i = (e_{i1}, e_{i2}, \ldots, e_{iL})^T$ represents the reconstruction error of the decomposition result. The sparse decomposition problem with the $l_0$ norm has been proved to be NP-hard, so the $l_1$ norm is usually used in place of the $l_0$ norm in practical applications. In this case, the iterative solution model of sparse decomposition can be expressed as Equations (2) and (3):
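The usual form of this relaxation can be sketched as follows. It is the standard textbook formulation written with the symbols defined above, offered as an assumption rather than as the exact Equations (2) and (3); the tolerance $\varepsilon$ and the weight $\lambda$ are hypothetical symbols that do not appear in the text.

```latex
% l0 problem (NP-hard): fewest active atoms within a reconstruction tolerance
\min_{\alpha_i} \|\alpha_i\|_0 \quad \text{s.t.} \quad \|y_i - D\alpha_i\|_2^2 \le \varepsilon

% l1 relaxation, constrained form
\min_{\alpha_i} \|\alpha_i\|_1 \quad \text{s.t.} \quad \|y_i - D\alpha_i\|_2^2 \le \varepsilon

% l1 relaxation, penalized (Lagrangian) form
\min_{\alpha_i} \tfrac{1}{2}\|y_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1
```

The constrained and penalized forms are equivalent for suitably matched values of $\varepsilon$ and $\lambda$; which one is used in practice depends on the solver.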
In practical calculations, the L1-norm is usually regarded as a constraint condition to transform the problem into a convex optimization problem for a solution. For example, the K-SVD (K-means Singular Value Decomposition) dictionary learning method [25, 26] and OMP (Orthogonal Matching Pursuit) regression analysis method [27] and its variants all employ this type of method to solve the calculation problem of sparse decomposition [28, 29]. In the results of the sparse decomposition, the sparse dictionary can be considered as the approximate fitting of the components in the sample set, and the sparse coefficient vector is the representation of the distribution of these samples, which can be used as the feature of some classification tasks. However, the distribution of the coefficient vectors is largely affected by dictionaries, and their semantic properties are not usually very clear. To solve the dependence of common methods on data labels and the problem of feature interpretability, further feature processing and semantic interpretation based on the sparse coefficient vector is needed to obtain a more effective sparse feature system.
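To make the sparse coding step concrete, the following is a minimal NumPy sketch of Orthogonal Matching Pursuit. It is an illustration under stated assumptions, not the implementation used in the experiments: the function name `omp`, the toy spike-plus-Hadamard dictionary, and the chosen coefficients are all hypothetical. The toy dictionary is used only because its low mutual coherence (1/8) guarantees that OMP recovers a 3-sparse code exactly.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy OMP: approximate y with at most n_nonzero columns (atoms) of D."""
    residual = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coeffs
    return alpha

# Toy dictionary (assumption): 64 unit spikes plus 64 normalized Hadamard columns.
H = np.array([[1.0]])
for _ in range(6):
    H = np.block([[H, H], [H, -H]])
D = np.hstack([np.eye(64), H / 8.0])      # 64 x 128, unit-norm atoms

# A sample built from three atoms, then decomposed again.
true_alpha = np.zeros(128)
true_alpha[[5, 40, 99]] = [1.5, -2.0, 0.7]
y = D @ true_alpha
alpha = omp(D, y, n_nonzero=3)
reconstruction_error = np.linalg.norm(y - D @ alpha)
```

In a real setting `D` would be a learned instrument dictionary (for example from K-SVD) and `n_nonzero` would be the sparsity constraint discussed later in this section.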
As mentioned in the introduction, dependence on labeled training data is a major constraint on mixed-data analysis and recognition. Specifically, in mixed-music analysis, the mixed training data must be labeled before Deep Neural Networks (DNNs) and similar methods can be trained. These labels are generally assigned manually, which requires a heavy workload, and they usually cover only the main instrument types, so they cannot measure the intensity changes of secondary instruments. For mixed data composed of several components of similar strength, the accuracy of the labeling cannot be guaranteed. Consider a data sample $y$ and two different sparse dictionaries $D_a$ and $D_b$, where $D_a$ is trained from a data set of the same kind as $y$, and $D_b$ is trained from an unrelated data set. Assuming that $C_i$ is an independent component of $y$ and that $d_{aj}$ and $d_{bk}$ are arbitrary atoms of the dictionaries $D_a$ and $D_b$, respectively, the conclusion shown in formula (4) follows:
Thus, if the coefficient vectors of sample $y$ modeled by $D_a$ and $D_b$ are $V_a$ and $V_b$, respectively, it follows that $E(\max(V_a)) > E(\max(V_b))$. By analogy, every component of $y$ correlates more strongly with its best-matching atom in $D_a$, so the expected value of an individual coefficient is larger. In other words, the sparse dictionary that yields more concentrated coefficient energy and better sparse performance matches the sample more closely. If the energy distribution of the coefficient vector can be measured accurately, the measurement therefore carries clear semantic information: the degree to which the data sample matches a specific component dictionary. It is thus possible to use only labeled single-component data and unlabeled mixed-source data to identify mixed-source data and to analyze it further at the semantic level, without relying on mixed-source data labels.
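The matching argument can be illustrated numerically. The sketch below is a toy example under assumptions: the 440 Hz component, the tone dictionary standing in for $D_a$, and the random-noise dictionary standing in for $D_b$ are all hypothetical and were chosen only to show that a component correlates far more strongly with its best atom in a matched dictionary than in an unrelated one.

```python
import numpy as np

L = 256                      # frame length in samples
fs = 44100                   # sampling rate in Hz
t = np.arange(L) / fs

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)

# Hypothetical component C_i: one frame of a 440 Hz tone.
component = unit(np.sin(2 * np.pi * 440.0 * t))

# Stand-in for D_a: atoms of the same kind as the component
# (tones at nearby pitches and several phases).
freqs = np.linspace(400.0, 480.0, 32)
phases = np.linspace(0.0, np.pi, 4, endpoint=False)
D_a = np.stack(
    [unit(np.sin(2 * np.pi * f * t + p)) for f in freqs for p in phases],
    axis=1,
)

# Stand-in for D_b: unrelated atoms (random noise), same size as D_a.
rng = np.random.default_rng(0)
D_b = rng.standard_normal(D_a.shape)
D_b /= np.linalg.norm(D_b, axis=0)

# Largest absolute correlation between the component and any atom.
max_corr_a = float(np.max(np.abs(D_a.T @ component)))
max_corr_b = float(np.max(np.abs(D_b.T @ component)))
```

With unit-norm atoms the correlation is bounded by 1, so `max_corr_a` close to 1 means that a single atom of the matched dictionary already explains most of the component, which is the situation in which the coefficient energy is most concentrated.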
Because a music signal is composed of a limited number of single tones, and the random noise intensity in the music signal set is relatively low, the dataset can be expected to be sparse and is therefore suitable for analysis with sparse decomposition algorithms. This paper proposes a Sparse Performance Index (SPI) based on sparse decomposition, which can effectively measure the complexity of the components in a sample. When this index is used as a sparse feature, it can be calculated as defined in Equation (5):
where $M$ is the number of dictionary atoms, and $\alpha_{ij}$ and $\alpha_{ik}$ denote the $j$th and $k$th coefficients of the coefficient vector $\alpha_i$, respectively. When $\|\alpha_i\|_0 = 1$, $\mathrm{SPI}(\alpha_i) = 0$ is the minimum value, indicating that the sample sparsity is the best. When $\|\alpha_i\|_0 = M$ and $|\alpha_{ij}| = |\alpha_{ik}|$ for all $j, k$, $\mathrm{SPI}(\alpha_i) = 1$ is the maximum value, indicating that the sample sparsity is the worst (the coefficients are completely homogenized). Using the SPI sparse feature, this paper proposes an approach that extracts a multi-dimensional feature vector with interpretable semantic information from dictionaries of multiple musical instruments. The technique requires only single-instrument audio data, which is easy to obtain, and does not require manual labeling of mixed music data. Consequently, the demand for manual labeling is minimal, and the approach can readily be applied to the analysis and recognition of other mixed-component datasets, which gives it high research and application potential. The feature index estimates the energy distribution of the coefficients within the sparse model by summing the pairwise differences between reconstruction coefficients. This allows it to respond immediately to changes in the music signal while remaining unaffected by variations in dictionary size, making it applicable to a variety of music analyses.
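As an illustration of an index with these properties, the sketch below implements one Gini-style candidate built from the sum of pairwise differences of coefficient magnitudes. This is an assumption: it is only one form that satisfies the two boundary conditions stated above (0 for a single active atom, 1 for completely uniform magnitudes) and should not be read as the exact Equation (5). The function name `spi` is hypothetical.

```python
import numpy as np

def spi(alpha):
    """Gini-style index (assumed form): 0 for one active atom, 1 for uniform magnitudes."""
    mags = np.abs(np.asarray(alpha, dtype=float))
    M = mags.size
    total = mags.sum()
    if M < 2 or total == 0.0:
        raise ValueError("need at least two coefficients and a nonzero vector")
    # Sum of the absolute differences of magnitudes over all ordered pairs (j, k).
    pairwise = np.abs(mags[:, None] - mags[None, :]).sum()
    # Normalize so that a one-hot vector gives 0 and a uniform vector gives 1.
    return 1.0 - pairwise / (2.0 * (M - 1) * total)

best = spi([0.0, 0.0, 3.0, 0.0])       # a single active atom
worst = spi([2.0, -2.0, 2.0, 2.0])     # equal magnitudes everywhere
between = spi([1.0, 1.0, 0.0, 0.0])    # two equally active atoms out of four
```

Because only magnitudes enter the calculation, the sign of a coefficient does not affect the value, and scaling the whole coefficient vector leaves the index unchanged.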
Parameter selection is one of the most important factors affecting the performance of a sparse decomposition algorithm; the parameters include the frame length, the dictionary size, and the sparsity constraint (the number of atoms used in sparse modeling). For the common case of music sampled at 44.1 kHz, middle C (261.63 Hz) can be taken as the reference to ensure that each frame contains at least one complete waveform period. In general, 256 sampling points is a suitable frame length. The dictionary must be large enough to cover the waveforms and phases of most single tones, while balancing overcompleteness against computational efficiency. In general, dictionaries with 1024 or 2048 atoms meet the requirements of most solo or chamber music audio analysis. As for the sparsity constraint, considering the multiple harmonics and noise in typical music signals, the optimal value for solo and chamber music is generally no more than 10, and that for symphonic music is about 35.
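The frame-length reasoning can be checked with a few lines of arithmetic. The sketch below uses only the values given above (44.1 kHz, middle C at 261.63 Hz, 256-sample frames); the variable names are illustrative.

```python
fs = 44100            # sampling rate in Hz
middle_c = 261.63     # reference pitch in Hz
frame_len = 256       # samples per frame

# Number of samples in one period of the reference pitch.
samples_per_period = fs / middle_c

# How many periods of the reference pitch fit into one frame.
periods_per_frame = frame_len / samples_per_period

# Duration of one frame in milliseconds.
frame_ms = 1000 * frame_len / fs

# Lowest pitch whose full period still fits into one frame.
lowest_full_period_hz = fs / frame_len
```

One period of middle C spans roughly 169 samples, so a 256-sample frame (about 5.8 ms) holds about one and a half periods. Pitches below roughly 172 Hz no longer fit a full period into a single frame, which is a limitation to keep in mind for low-register instruments.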
After the dictionary-learning parameters have been determined, a dictionary must be trained for each separate instrument or combination of instruments. All dictionaries must be trained with the same parameters, especially the sparsity constraint, so that the calculated SPI values remain comparable. Depending on the recognition requirements and the available data, the number of component dictionaries, and hence the dimension of the SPI feature, can be chosen freely. For a string quartet, for example, four dictionaries can be trained: one each for the violin, the viola, and the cello, and one for the string quartet as a whole. Each frame sample is modeled with these four dictionaries, and an SPI value is calculated for each. Data from several instrument ensembles can also be introduced to train more dictionaries, which increases the number of SPI time-series features and improves recognition accuracy. Conversely, dictionaries can be trained for only the few instruments for which reliable data is available, which makes the method highly flexible in application.
Because of the uncertainty inherent in sparse modeling and dictionary learning algorithms, the resulting SPI time series needs to be smoothed for observation and analysis. The length of the smoothing window determines the time resolution of the analysis. The longer the window, the more stable the result, but the time resolution is reduced accordingly. When the frame length is 256 sampling points and the overlap between frames is 50%, 400 to 800 frames is an appropriate range. Because the smoothing window covers a certain length of time, the analysis results lag the actual changes in the music signal by a fixed delay. When the window covers 400 frames, the delay is about 1.16 s, which does not significantly affect normal real-time analysis. After the above steps are completed, multiple sparse time-series feature vectors, one for each instrument component dictionary, are obtained; these intuitively display the temporal characteristics of the music itself.
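The delay figure and the smoothing step can be sketched as follows. The hop size follows from the stated 256-sample frames with 50% overlap. The trailing moving average is an assumed choice of smoother, since the text does not specify the window shape, and the synthetic `raw` series merely stands in for a real SPI time series.

```python
import numpy as np

fs = 44100                    # sampling rate in Hz
frame_len = 256               # samples per frame
hop = frame_len // 2          # 50% overlap between frames
window_frames = 400           # smoothing window length in frames

# Time covered by the smoothing window, i.e. the analysis delay.
delay_s = window_frames * hop / fs

def smooth(spi_series, window):
    """Trailing moving average over `window` consecutive frames."""
    kernel = np.ones(window) / window
    return np.convolve(spi_series, kernel, mode="valid")

# Synthetic stand-in for a noisy per-frame SPI series.
rng = np.random.default_rng(0)
raw = 0.3 + 0.05 * rng.standard_normal(2000)
smoothed = smooth(raw, window_frames)
```

With a window of 400 frames the delay is 400 × 128 / 44100 ≈ 1.16 s, and doubling the window to 800 frames doubles it to about 2.32 s, which is the trade-off between stability and time resolution described above.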
After obtaining reliable sparse features of music timing, any general classifier, including SVM, neural network, etc., can be used to recognize musical instrument types or analyze time-domain mixing. The overall flowchart of feature extraction method is shown in Fig. 1.

Flow chart of feature extraction method.
In the experimental part, the proposed SPI sparse feature is first verified by using short segments of real mixed-instrument audio to identify the main instrument components in mixed-instrument music. Then, the SPI-based time-domain analysis spectrum is given for a specific music genre (string quartet) and compared with the segment information to verify that the SPI sparse feature reflects differences between music segments and changes in instrument composition. In the actual analysis, the relatively mature K-SVD dictionary learning method and the OMP regression analysis method are used to compute the sparse decomposition.
Identification and classification of single musical instrument
In this part, the mixed-instrument audio dataset from the IRMAS Music Database [30] (IRMAS, a dataset for instrument recognition in musical audio signals: mtg.upf.edu/download/datasets/irmas) is used to verify the recognition effect of sparse features on a single instrument. The training data of the database is typically dominated by only one type of musical instrument, but it is also mixed with drums, accompaniment, and other musical instruments or sound information, and contains a variety of different musical styles. The training data segment is 3 s long and the sampling rate is 44.1 kHz, which meets the needs of mixed musical instrument audio recognition in the real environment.
The total dimension of the SPI feature vector is determined by balancing the number of label types required by the identification task against the number of reliable single-component sample data sets available. For the analysis task, all available sample data corresponding to the labels should be selected; other data sets can also be included as references.
Because data sets containing only a single instrument were lacking for some classes (e.g. electric guitar and voice), these classes were excluded. It is nonetheless relatively simple to collect single-instrument data sets in practice, so these omissions do not affect the applicability of the method. The instruments finally identified were the cello, clarinet, flute, piano, saxophone, trumpet, and violin. For each instrument, the average SPI value over the segment is calculated to obtain a 7-dimensional feature vector, and all recognition rates are the average results of 50% cross validation.
The experimental results in Table 1 show the accuracy, in percent, with which each instrument is distinguished from the other instruments. The table lists the feature extraction approaches and their accuracy rates for the individual instruments. The SPI features, which use 7 dimensions, achieved an average accuracy of 90.21%. The MFCC features, with 39 dimensions, had a slightly lower average accuracy of 89.95%. The fusion features, with 2023 dimensions, achieved an average accuracy of 93.41%, and the Timbre Toolbox features, with 63 dimensions, achieved 92.10%. Finally, the combination of SPI and MFCC features, with 46 dimensions, achieved the highest average accuracy of 94.41%. Overall, the SPI+MFCC combination distinguishes the instruments most accurately, followed by the fusion features and the Timbre Toolbox features, while the SPI and MFCC features used alone have slightly lower accuracy.
Distinguishing accuracy of each instrument from other instruments %
To demonstrate the performance of the feature extraction method more clearly, a Support Vector Machine (SVM) was employed as the classifier to compare SPI features with the frequently used traditional MFCC features, the large-scale (2023-dimensional) fusion features used in the literature [31], the MATLAB Timbre Toolbox features [32], and the combination of SPI and MFCC features. Table 1 displays the recognition accuracy for every musical instrument and the total weighted average.
The recognition accuracy varies between instruments because of the similarities between instruments and the differences in annotation quality across the instruments in the data set. For instance, the saxophone portion of the set consists mostly of ensemble parts, which lowers its overall recognition accuracy. Meanwhile, the cello segments in the data set are mostly concerto pieces, and the trumpet segments consist mainly of solo works, so the recognition rate of the trumpet is higher than that of the other instruments. For comparison, an improved version of MFCC reaches a recognition accuracy of 96.28% on single tones from single-instrument audio [33]. On the more complicated IRMAS data set of short mixed-instrument segments, the proposed feature extraction method reaches a similar effectiveness. Additionally, a standard neural network classifier was used to classify and recognize the same data set, and the results are presented in Table 2. It is clear that our approach has exceeded the baseline.
Accuracy of ANN seven-classification recognition with different features
In comparison, the accuracy of recognition using the fusion feature in the time-frequency domain with 2023 dimensions stands at approximately 68.3% on this dataset. On the other hand, the classification accuracy for 7 categories with the timbre toolbox feature having 63 dimensions is approximately 67.2%. Despite the high dimensionality of the two aforementioned fusion features, the performance of recognition using the proposed SPI+MFCC fusion feature has evident advantages, reaffirming the reliability of the experimentation results.
The experimental results in Table 2 show the accuracy of an Artificial Neural Network (ANN) in classifying seven different musical instruments using various feature extraction approaches. The SPI features achieved an accuracy of 66.30%, while the MFCC features had a slightly lower accuracy of 66.09%. The fused features, which combine multiple features, achieved 68.28%, and the Timbre Toolbox features achieved 67.18%. The SPI+MFCC combination had the highest accuracy of 69.03%. Overall, the results indicate that the SPI+MFCC combination and the fused features are the most effective for classifying musical instruments with an ANN. The SPI and MFCC features used alone had similar accuracies, slightly below that of the Timbre Toolbox features. It can be concluded that combining features, and in particular combining SPI with MFCC, improves classification accuracy in this experiment.
Based on the two sets of experimental results, SPI features have a slightly better recognition ability than traditional MFCC features across the classifiers tested. Additionally, fusion features that include SPI display noticeably higher recognition potential than traditional MFCC features alone. This can be attributed to the semantic information contained in SPI features, whereas MFCC features primarily measure the physical properties of the data. Combining the two yields improved recognition results. Moreover, given the low feature dimension (only 7 dimensions), it can be concluded that sparse features are a reliable and effective tool for distinguishing instruments. Depending on the complexity of the training data, sparse features can be used alone or in combination with other features to achieve better recognition.
To test the efficacy of the proposed method on unlabeled mixed-instrument music, an experiment was conducted using F.X. Richter’s String Quartet in F major performed by the Odessa Quartet. Because reliable labels are difficult to obtain for mixed music data, a recognition rate was not used for evaluation. Instead, time-domain spectra of the SPI sparse features were displayed to demonstrate the ability of SPI features to reflect changes in the composition of mixed musical instruments. For this experiment, three types of sparse component dictionaries were established, for the violin, the viola, and the cello. Bach’s unaccompanied Sonatas for Violin and Cello and Telemann’s 12 unaccompanied Fantasies for Viola, which have similar compositions and playing styles, were selected as training samples. Nathan Milstein, Petr Přibyl, and Maurice Gendron were the performers for the training samples.
To start the comparison, the SPI time-domain analysis spectra of the three types of solo music were displayed. A dictionary trained on string quartet data was also used as a reference and plotted on the same atlas. The smoothing window length of the SPI curve was 400 frames, which is equivalent to 1.16 s. One sample of music data from each instrument was selected randomly as the test sample. The remaining samples were used as training samples to establish a 256×2048 instrument component dictionary. Each single-frame sample was reconstructed with up to 10 dictionary atoms, producing four SPI time-domain characteristic curves that together form the SPI time-domain characteristic spectrum. Numerous experiments were conducted, and Fig. 2 displays the results of the timing analysis of specific single-instrument audio. Due to space limitations, only a portion of the findings is shown.

Experimental results of single instrument audio SPI atlas.
The experimental results are consistent with the other trend analysis presented. The SPI spectra in all single instrument data show stable relative positions of several SPI curves across all segments. The curves of the corresponding instruments in the violin and viola experiments are generally lower than those of other instruments, while the cello experiment shows that the cello curve is only slightly lower than that of the viola and quartet analysis curves. This is because the cello timbre is thicker, resulting in higher complexity, as verified by the high position of the cello corresponding curve in experiments with other musical instruments. The SPI characteristics of the three types of instruments during solo performances show distinct differences: the violin curve is the lowest during solo performances, while the other curves are relatively scattered; the viola curve is the lowest during Viola solo performances, with all curves concentrated; the cello curve position is low during solo performances, while the violin curve is the highest. Therefore, it is possible to visually identify and analyze the differences in SPI feature distribution of different musical instruments, making the feature system highly interpretable, an advantage not available in other feature systems.
The following section showcases the outcomes of the String Quartet experiment, centering on the F.X. Richter String Quartet op.26:II in F major Presto. Figure 3 presents the analytical findings of selected sections.

Experimental results of SPI atlas of string quartet.
In the quartet experiment, the SPI curves of the different instruments were observed to fluctuate alternately. Because most of the performance consisted of multiple instruments playing together, the differences between the SPI curves of the instruments were not as noticeable as in the solo experiments. However, the periods in which an instrument’s curve reached its lowest point and showed distribution characteristics similar to those of its solo experiment were found to correspond exactly to the periods in which that instrument’s voice dominated the overall performance, such as 5–20 s for the cello, 35–48 s for the violin, and 210–215 s for the viola. Except during some solo passages, the quartet curve is in the lowest position for most of the piece, reflecting the genre of the music.
The experimental findings presented above demonstrate significant features of the SPI characteristic index system. Firstly, it employs single component data and unlabeled data as training data to analyze mixed component music data efficiently and accurately, resulting in accurate recognition through the distribution of the spectrum or comparison of SPI values. Secondly, it effectively characterizes intensity changes in non-dominant and weak components, which proves valuable for the analysis of secondary components. Lastly, SPI maps depict common changes that correlate with the time domain’s emotional and musical content, which opens up another avenue of research for future studies.
Sparse feature extraction is also an important technique in percussion music composition. Percussion instruments are often used to add texture and rhythm to a musical piece, so it is important to select carefully the features to be extracted and used in a composition. Sparse feature extraction involves identifying and selecting only the essential features of a musical piece that contribute to the overall sound and feel of the composition.
One way to achieve sparse feature extraction in percussion music is through the use of minimalism. Minimalism is a movement in music that emphasises restraint and simplicity, where musical ideas are repeated or gradually developed over time. This approach allows the composer to use minimal instrumentation to achieve maximum impact.
Another technique for sparse feature extraction could be to focus on a specific element of the percussion sound, such as the attack or decay, and use it as the basis for the composition. For example, a composer might use a snare drum with a very pronounced attack sound and build the entire piece around that sound.
Overall, sparse feature extraction in percussion music composition requires a careful balance between selecting the right elements and limiting unnecessary distractions. With these techniques, a composer can create a powerful and evocative percussion-based musical piece.
In this paper, a time-domain analysis method for music signals is designed based on sparse decomposition over multi-instrument dictionaries. By establishing component dictionaries for different instruments, measuring time-domain complexity with the SPI index, and extracting sparse features, the method can effectively distinguish the combination of mixed instruments and directly reflect changes in the emotion and content of the music, which gives it clear value for the quantitative analysis of music. The application of this method is not limited to music. Any sound signal for which a clear component dataset can be obtained can be handled in the same way: the component dictionaries are trained on that dataset, and the time-series changes of the components are analyzed through sparse complexity evaluation. The method therefore has research and application prospects in many areas.
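The general idea summarized above, decomposing a frame over concatenated per-instrument dictionaries and then reading off how the sparse coefficients are distributed across instruments, can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the paper: the dictionaries are random unit-norm placeholders rather than dictionaries trained on instrument recordings, orthogonal matching pursuit is used as one possible sparse solver, and the per-instrument energy share computed by the hypothetical helper `block_energy_share` is a simple stand-in, not the SPI index.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: approximate x with at most n_nonzero columns of D."""
    support = []
    sol = np.zeros(0)
    residual = x.copy()
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        if k not in support:
            support.append(k)
        # Refit the coefficients of all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[np.array(support, dtype=int)] = sol
    return coef

def block_energy_share(coef, names, atoms_per_block):
    """Fraction of squared coefficient energy in each instrument's block (not the SPI)."""
    a = atoms_per_block
    energy = np.array([np.sum(coef[i * a:(i + 1) * a] ** 2) for i in range(len(names))])
    total = energy.sum()
    return {n: (float(e / total) if total > 0 else 0.0) for n, e in zip(names, energy)}

# Placeholder dictionaries: random unit-norm atoms, one block per instrument.
rng = np.random.default_rng(0)
frame_len, atoms_per_block = 128, 32
names = ["cello", "violin"]
blocks = []
for _ in names:
    B = rng.standard_normal((frame_len, atoms_per_block))
    blocks.append(B / np.linalg.norm(B, axis=0))
D_all = np.hstack(blocks)                            # concatenated multi-instrument dictionary

# Synthetic frame built from three atoms of the first (cello) block.
x = blocks[0][:, [1, 7, 20]] @ np.array([1.0, -0.8, 0.6])
coef = omp(D_all, x, n_nonzero=3)
share = block_energy_share(coef, names, atoms_per_block)
```

Because the synthetic frame is built only from atoms of the first block, one would expect `share["cello"]` to be close to 1, although a greedy pursuit does not guarantee exact recovery. Repeating the decomposition frame by frame gives a per-instrument time series of the kind analyzed in this paper.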
Music composition truly embodies the concept of sparse representation with its minimalistic approach to sound [34–36]. The use of only a few select instruments and sparse notes creates an atmospheric and introspective piece that forces the listener to focus on each individual sound and its impact on the composition as a whole. The silence that lingers between each note allows the listener to fully absorb the sounds that are being played, and is used to great effect in building tension and emotion throughout the piece. The transitions between different sections of the composition are seamless, with each new sound being introduced slowly and deliberately, adding to the overall feeling of contemplation and introspection. Despite its sparse nature, the composition still manages to carry a strong emotional weight, and leaves a lasting impression on the listener long after it has finished playing.
Sparse feature extraction methods differ from other methods for the music analysis of mixed instruments in several ways, including:
Feature selection: Sparse feature extraction methods select only a small subset of features that are most significant in representing the music signals. Other methods, on the other hand, may use all the available features without selecting the most relevant ones.
Computational efficiency: Sparse feature extraction methods are computationally efficient because they only calculate a small number of selected features. Other methods may require extensive computational resources due to the vast number of features to be calculated.
Interpretability: Sparse feature extraction methods produce features that are easy to interpret and understand. Other methods may produce complex and abstract features that may be difficult to interpret and understand.
Accuracy: Sparse feature extraction methods may achieve high accuracy in identifying the different instruments in a mix while using only a small number of features. Other methods may require more features to achieve the same level of accuracy.
Robustness: Sparse feature extraction methods may be more robust to noise and variations in the input signals. Other methods may be more sensitive to noise and variations in the input signals.
Looking ahead, this method has broad potential. It can be extended to more complex musical forms, such as concertos, symphonies, and vocal music, and used to study the inherent complexity characteristics of different musical instruments. Its applications in other fields can also be explored, such as the capture and recognition of the sounds of wild organisms, the capture and early warning of mechanical fault noise, and the capture of abnormal fluctuations in bioelectrical signals. With further research, the method may find use in a number of industries.
Footnotes
Acknowledgments
This work was supported by 2022 Scientific Research Project of Hunan Provincial Department of Education: Research on the Influence of “Information Cocoon” in the Intelligent Age on the Ideological cognition of Higher Vocational College Students and its Countermeasures (Project No.: 22C1038).
Conflict of interest statement
We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
Ethical approval statement
This article does not contain any studies with human participants performed by any of the authors.
This article does not contain any studies with animals performed by any of the authors.
This article does not contain any studies with human participants or animals performed by any of the authors.
