Abstract
Music recognition is an interdisciplinary field with important applications in music retrieval and automatic music technology. To study improved recognition methods for piano music, this paper compares the characteristics of music signals and speech signals in light of music theory, discusses how to choose the dimension of the feature vector, and uses an RBF neural network to recognize the 88 single tones of the piano. It also studies the characteristics and calculation of the pitch class profile, a feature frequently used in chord recognition for Western music, and gives the specific formulas. The results show that the improved method, which adjusts the intermediate weights assigned to notes, achieves higher accuracy and better fault tolerance than the traditional method.
Introduction
Computer science grew directly out of electronic technology, and computer music likewise grew out of electronic music. Electronic music had appeared by the middle of the nineteenth century. In 1904, a sound-wave generating device capable of producing a variety of tones was built in the United States, and the first electronic music laboratory was established. For a long time, computer musicians had to write programs by hand, enter them into the computer, and work out the computer parameters corresponding to the various parameters of the music, using the computer as a music generator.
In recent years, computer music has developed rapidly. On the one hand, synthesizers are improving constantly. The invention of the Musical Instrument Digital Interface (MIDI) in the early 1980s greatly promoted synthesizer music and supported the production of a wide range of synthesizers, sequencers, audio devices, and effects units, which in turn advanced sound production equipment [1]. On the other hand, a variety of personal computer software and hardware has been developed for making music and for supporting music research. As computers continue to be upgraded from one generation to the next, computer multimedia information processing is flourishing, with computer music leading the way.
Computer music signal processing is an interdisciplinary subject that combines modern signal processing, pattern recognition, artificial intelligence, and musical art. A music signal is a special kind of quasi-periodic signal. Compared with a speech signal, it has richer and more complex frequency content, a broader spectral range, and more pronounced rhythmic features in time, so music signal processing cannot simply follow the established methods and models of speech signal processing [2]. At present, research on computer music has made great progress in both music recognition and music synthesis.
Problems such as computer-based music recognition, classification, and feature extraction have attracted growing attention from researchers. Applying advances in electronics flexibly to the field of music appreciation would reduce the workload of music professionals and assist them in their work; it would also promote intelligent music processing, recognition, and creation. Finding a sound and practical way to combine the two fields is therefore of far-reaching significance for the development of this interdisciplinary area.
State of the art
In music, the relations among note durations become fixed only when the durations are organized according to the rhythm of the music, as in beat, meter, and fixed rhythmic patterns. In the narrow sense, therefore, rhythm is a repeating sequence of duration values, and the main purpose of rhythm recognition is to find this relatively stable relationship. Rhythm recognition requires a set of typical rhythmic models under a fixed meter. The rhythm model and the beat model are interdependent and together reflect the regularity of temporal organization. In Western music this regularity is often multilevel, so the rhythm model should be multilevel as well.
Some scholars have proposed a rhythm recognition algorithm based on spectral analysis, on the grounds that people perceive musical rhythm essentially as a physiological response to fluctuations in musical energy. For a segment of a cappella vocal music, a correct judgment of rhythm depends mainly on the periodic variation of the signal energy, so the signal energy can be analyzed in the frequency domain to find its periodic component, and that period is the rhythm of the music [3]. To extract the rhythm information, the energy fluctuation of the signal is first computed and then decimated to reduce the amount of data to be analyzed; the power spectrum of the decimated sequence is then estimated with an autoregressive (AR) model. The period of the energy fluctuation found in the frequency domain determines the rhythm of the signal. Other scholars introduce a Bayesian rhythm model and use sequential Monte Carlo methods based on Bayesian theory to infer the position of each segment and the rhythm of musical fragments. This method extracts rhythm features effectively across different instruments, tempos, and rhythmic patterns.
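As a rough illustration of the energy-periodicity idea, the following Python sketch computes a frame-wise energy envelope and reads the dominant fluctuation frequency from its spectrum. It uses a plain FFT periodogram in place of the AR power spectrum estimate, and the function name, the frame length, and the synthetic test signal are illustrative assumptions rather than details of the cited method.

```python
import numpy as np

def estimate_tempo_bpm(signal, sample_rate, frame_len):
    """Estimate tempo from the periodicity of the short-time energy (illustrative)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # One energy value per frame; this also reduces the amount of data to analyze.
    energy = np.mean(frames ** 2, axis=1)
    energy = energy - energy.mean()  # remove the constant (DC) component
    # Periodogram of the energy envelope (assumption: stands in for the AR estimate).
    power = np.abs(np.fft.rfft(energy)) ** 2
    frame_rate = sample_rate / frame_len
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    peak = 1 + int(np.argmax(power[1:]))  # skip the zero-frequency bin
    return 60.0 * freqs[peak]

# Synthetic example: a 500 Hz tone whose loudness pulses twice per second.
sample_rate = 8000
t = np.arange(8 * sample_rate) / sample_rate
signal = (1.0 + 0.8 * np.sin(2 * np.pi * 2.0 * t)) * np.sin(2 * np.pi * 500.0 * t)
tempo = estimate_tempo_bpm(signal, sample_rate, frame_len=160)
print(round(tempo), "beats per minute")
```

Real recordings have less regular energy envelopes, so a parametric estimate such as the AR spectrum may give sharper peaks than this periodogram on short excerpts.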
Methodology
Application of neural network to single tone recognition of piano
With the continuing development of computer networks and digital entertainment, computer music has become an important part of intelligent multimedia. Building natural and harmonious human-computer interaction requires a detailed, in-depth study of computer audio-visual information processing. Computer music plays an important role in human-computer interaction: the computer can “understand” music and respond to it naturally, which serves the purpose of interaction [4].
An artificial neural network (ANN) is a general model that simulates information processing in the human brain. It is a parallel, distributed information processing system that differs fundamentally from traditional pattern recognition methods: drawing on the strengths of biological neural networks, it processes real-world information in a way similar to a biological nervous system. The neuron is the basic processing unit of a neural network. It is a nonlinear device with multiple inputs and a single output, and its structure is shown in Fig. 1.

Fig. 1. Neuron structure model.
Generally speaking, a single neuron with multiple inputs cannot meet the requirements of practical applications, which instead use many neurons operating in parallel. In such a network, the outermost layer that receives the data is called the input layer, and the layer that produces the final output is called the output layer. The layers between the two are called intermediate or hidden layers [5]. The expression of the neuron model is as follows:
$$\sigma_i = \sum_{j} w_{ij} x_j + s_i - \theta_i \quad (1)$$
$$y_i = f(\sigma_i) \quad (2)$$
Here $x_j$ is the input signal, $\theta_i$ is the threshold, $w_{ij}$ is the weight, $s_i$ is the external (bias) signal, $y_i$ is the output signal of the neuron node, $\sigma_i$ is the net input of the neuron node, and $f$ is the activation function of the neuron, often called the transfer function.
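The neuron model can be sketched in a few lines of Python. The sketch assumes the common form in which the net input is the weighted sum of the inputs plus the bias signal minus the threshold, and it assumes a sigmoid as the transfer function; the function name and the sample values are illustrative only.

```python
import numpy as np

def neuron_output(x, w, s=0.0, theta=0.0):
    """Single neuron: net input sigma = w . x + s - theta, output f(sigma)."""
    sigma = float(np.dot(w, x)) + s - theta  # assumed form of the net input
    return 1.0 / (1.0 + np.exp(-sigma))      # assumed transfer function: sigmoid

# Two inputs whose weighted contributions cancel give a net input of zero.
y = neuron_output(x=np.array([1.0, 1.0]), w=np.array([0.5, -0.5]))
print(y)
```

Raising the threshold lowers the net input, so the same inputs produce a smaller output.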
Training an RBF network is equivalent to finding the best-fitting surface for the training data in a multidimensional space. The function of each hidden-layer neuron forms one basis function of that surface, which is how the radial basis function network gets its name. The RBF network is a local approximation network: for any given local region of the input space, only a few neurons determine the network output. It is a three-layer feed-forward network with a single hidden layer.
The input layer consists of signal source nodes. The number of hidden units depends on the problem being described, and the output layer responds to the input pattern. The transformation from the input space to the hidden-layer space is nonlinear, whereas the transformation from the hidden-layer space to the output-layer space is linear [6]. The transfer function of a hidden unit is a nonnegative nonlinear function that is radially symmetric about a center and decays with distance from it.
Once the centers of the RBFs are determined, this mapping is determined as well. The mapping from the hidden-layer space to the output space is linear: the network output is a linear weighted sum of the hidden-unit outputs, and the weights are the tunable parameters of the network. The topology of the RBF network is shown in Fig. 2.

Fig. 2. RBF network topology.
As Fig. 2 shows, the RBF network closely resembles the ordinary three-layer BP network: it is also a feed-forward neural network composed of an input layer, a hidden layer, and an output layer. The only difference is the activation function of the hidden-layer neurons, which is a function with local response instead of the sigmoid function used by the BP network. The most commonly used activation function in RBF networks is the Gaussian function in formula (3), where $\lVert \cdot \rVert$ is a vector norm, usually the Euclidean norm in formula (4):
$$\varphi_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\delta_j^2}\right) \quad (3)$$
$$\lVert x - c_j \rVert = \sqrt{\sum_{i} (x_i - c_{ji})^2} \quad (4)$$
Here $c_j$ is usually called the center (or centroid) of the hidden unit and can also be regarded as its weight vector. The normalization parameter $\delta_j$ controls the size of the receptive field and is called the width (or radius) of the hidden-unit function. When $x$ is one-dimensional, the function becomes:
$$\varphi_j(x) = \exp\left(-\frac{(x - c_j)^2}{2\delta_j^2}\right)$$
When $x = c_j$, $\varphi_j(x)$ reaches its maximum value of 1, and as $x$ moves away from $c_j$ it gradually decays to 0. The receptive field is approximately $[c_j - 3\delta_j, c_j + 3\delta_j]$; that is, $\delta_j$ determines its size. In addition, $\varphi_j(x)$ is radially symmetric about the center. Because the Gaussian function resembles the normal distribution function of mathematical statistics, $c_j$ is also called the mean and $\delta_j$ the standard deviation. High-dimensional functions have similar properties.
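The behavior of the Gaussian unit can be checked numerically. The Python sketch below assumes the common form exp(-||x - c||^2 / (2 * width^2)); the function name and the sample center and width are illustrative choices.

```python
import numpy as np

def gaussian_rbf(x, center, width):
    """Gaussian radial basis function, assumed form exp(-||x - c||^2 / (2 * width^2))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    center = np.atleast_1d(np.asarray(center, dtype=float))
    dist_sq = np.sum((x - center) ** 2)  # squared Euclidean distance to the center
    return float(np.exp(-dist_sq / (2.0 * width ** 2)))

at_center = gaussian_rbf(2.0, center=2.0, width=0.5)  # response at the center
at_edge = gaussian_rbf(3.5, center=2.0, width=0.5)    # three widths away from the center
print(at_center, at_edge)
```

At three widths from the center the response has fallen to exp(-4.5), roughly 1% of its peak, which is why the receptive field is taken to span about three widths on either side.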
The local response of the RBF network means that its decisions rest implicitly on a notion of distance: the network responds only when the input falls within or near a receptive field, and the size of the output depends on the distance between the input vector and the center. Input vectors at equal or similar distances from a center can therefore be assigned to the same class, which amounts to partitioning the input space by distance from the centers. According to pattern recognition theory, a problem that is not linearly separable in a low-dimensional space can be mapped into a high-dimensional space in which it becomes linearly separable [6]. The original problem can thus be turned into a linearly separable one, provided that the number of hidden units (the dimension of the high-dimensional space) and their functions are chosen appropriately.
In the RBF network, all weights between the input layer and the hidden layer are fixed at 1. The center and radius of each hidden unit are usually determined in advance, and only the weights between the hidden layer and the output layer are adjusted. The hidden layer performs a nonlinear transformation that maps the input space into a new hidden-layer space, and the output layer forms a linear combination in that new space. Because the output units are linear, adjusting the parameters is very simple and there are no local minima.
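A minimal Python sketch of this training scheme follows: the centers and width are fixed in advance, and only the linear output weights are fitted, here by ordinary least squares. The XOR data, the choice of one center per training point, and the width of 0.5 are assumptions made for illustration; they are not the configuration used in this paper.

```python
import numpy as np

def rbf_design_matrix(inputs, centers, width):
    """Hidden-layer outputs: one Gaussian response per (input, center) pair."""
    diff = inputs[:, None, :] - centers[None, :, :]
    dist_sq = np.sum(diff ** 2, axis=2)
    return np.exp(-dist_sq / (2.0 * width ** 2))

def fit_output_weights(inputs, targets, centers, width):
    """Solve the linear least-squares problem for the hidden-to-output weights."""
    phi = rbf_design_matrix(inputs, centers, width)
    weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return weights

# XOR is not linearly separable in the 2-D input space.
inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 0.0])
centers = inputs.copy()  # assumption: one hidden unit centered on each sample
width = 0.5              # assumption: fixed in advance, not learned

weights = fit_output_weights(inputs, targets, centers, width)
predictions = rbf_design_matrix(inputs, centers, width) @ weights
print(np.round(predictions, 3))
```

Because the fit is a linear least-squares problem, it has a single global solution, which illustrates why training the output layer does not get stuck in local minima.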
Music and speech are both sound signals, and the basic principles of recognizing them are closely related: both involve analysis of the sound signal, noise processing, feature analysis, recognition, and so on. The single-tone signal of a piano is a sound signal and follows the basic laws of acoustics. Speech research is relatively stable and mature, so this paper draws on speech recognition technology and applies it to single-tone recognition.
At the same time, speech and music differ in obvious ways. Speech varies greatly between individuals, and even the same person saying the same words twice produces noticeably different sounds [7]. An instrument such as the piano behaves differently: the same key, pressed at any time by any player, produces sounds that differ very little and have highly similar acoustic properties. The frequency band of a music signal is also richer and much wider than that of a speech signal. In terms of data processing, speech recognition relies on building a corpus and involves a comparatively large amount of data, whereas the recognition range for piano tones is limited to 88 notes, so the piano tone data obtained by recording are comparatively small. The principle of recognition is shown in Fig. 3.

Fig. 3. Schematic diagram of piano single-tone recognition based on the RBF network.
When a single-tone signal recorded through a microphone is sampled and quantized into digital form, quantization noise, aliasing interference, and similar artifacts are introduced. To reduce their effect on signal analysis and feature extraction, the single-tone signal is first filtered. Endpoint detection is then applied, with the same purpose as in speech processing: accurately locating the start and end of each piano tone suppresses noise from silent segments and ensures that the collected data are truly data of the single-tone signal, which reduces the data volume, the amount of computation, and the processing time.
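Endpoint detection can be sketched with a simple short-time energy threshold. The Python example below is one common approach, not necessarily the one used in this paper; the function name, the frame length, and the relative threshold of 10% of the peak frame energy are assumptions.

```python
import numpy as np

def detect_endpoints(signal, frame_len, rel_threshold=0.1):
    """Return (start, end) sample indices of the active segment, or None if silent."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)  # short-time energy per frame
    active = np.flatnonzero(energy > rel_threshold * energy.max())
    if active.size == 0:
        return None
    # Start of the first active frame and end of the last active frame.
    return int(active[0]) * frame_len, int(active[-1] + 1) * frame_len

# Synthetic recording: 0.1 s silence, 0.3 s of a 440 Hz tone, 0.1 s silence.
sample_rate = 8000
tone = np.sin(2 * np.pi * 440.0 * np.arange(2400) / sample_rate)
recording = np.concatenate([np.zeros(800), tone, np.zeros(800)])
endpoints = detect_endpoints(recording, frame_len=80)
print(endpoints)
```

On noisy recordings a fixed relative threshold is fragile; estimating the noise floor from the first few frames is a common refinement.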
Taking the piano as an example, the frequencies of its notes are shown in Fig. 4. Twelve-tone equal temperament provides a reliable basis for characterizing musical notes quantitatively, and most instruments are tuned according to it. Instruments differ in pitch range because they cover different intervals of the scale, and they differ in timbre (bright or deep, sweet or bold) because their sounding materials differ. Common vibrating materials include steel strings, reeds, and diaphragms, and the different proportions of overtones these materials produce give each instrument its characteristic timbre. The pitch of a sound, however, follows twelve-tone equal temperament, which provides a unified measure for the study of many musical instruments.

Fig. 4. The frequency relationship of piano notes.
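Under twelve-tone equal temperament, adjacent keys differ in frequency by a factor of the twelfth root of 2, so the frequency of every piano key follows from a single reference pitch. The Python sketch below assumes the standard numbering of keys from 1 to 88 with A4 as key 49 tuned to 440 Hz; the reference pitch is a tuning convention, not a value taken from this paper.

```python
def piano_key_frequency(key, a4_hz=440.0):
    """Frequency in Hz of piano key 1..88 under twelve-tone equal temperament."""
    if not 1 <= key <= 88:
        raise ValueError("key must be in 1..88")
    # Each semitone step multiplies the frequency by 2 ** (1 / 12).
    return a4_hz * 2.0 ** ((key - 49) / 12.0)

lowest = piano_key_frequency(1)     # A0
middle_c = piano_key_frequency(40)  # C4
highest = piano_key_frequency(88)   # C8
print(lowest, round(middle_c, 2), round(highest, 2))
```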
A musical signal is a composite signal [8] consisting of a fundamental tone and its harmonics. For a particular musical signal, the phase relation between the fundamental and the harmonics is fixed. The superposition of these periodic components is necessarily periodic on the time axis, which is the basis of time-domain recognition algorithms. In the algorithm, the signal is first passed through a high-pass filter to remove the 50 Hz AC mains component, and parallel processing is then performed.
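The time-domain idea can be illustrated with a short Python sketch: remove the low-frequency hum, then take the lag of the strongest autocorrelation peak as the period. The FFT-based brick-wall filter, the 60 Hz cutoff, the pitch search range, and the function names are assumptions chosen for simplicity; the paper does not specify its filter design or its parallel processing stage.

```python
import numpy as np

def highpass_fft(signal, sample_rate, cutoff_hz=60.0):
    """Zero all spectral components below cutoff_hz (simple brick-wall high-pass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def autocorr_pitch(signal, sample_rate, fmin=60.0, fmax=1000.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    x = signal - np.mean(signal)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep lags 0, 1, 2, ...
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic tone: 200 Hz fundamental with two harmonics, plus 50 Hz mains hum.
sample_rate = 8000
t = np.arange(4000) / sample_rate
tone = (np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
        + 0.25 * np.sin(2 * np.pi * 600 * t))
hum = 0.5 * np.sin(2 * np.pi * 50 * t)
pitch = autocorr_pitch(highpass_fft(tone + hum, sample_rate), sample_rate)
print(pitch)
```

Note that a 60 Hz cutoff would also remove the fundamentals of the lowest piano keys, so a practical system would need a narrower notch or a different hum-removal strategy for the bass register.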
Music recognition is a complex subject whose ultimate goal is polyphonic recognition. Achieving this goal requires a unified system framework that accommodates as many recognition algorithms and as much music theory as possible. As for the feature dimension, a high-dimensional feature vector carries more information about the tone signal than a low-dimensional one but requires longer processing time and a more complex network structure. Experiments were therefore performed with dimensions from low to high [9], and after weighing the recognition results against network complexity, a 64-dimensional feature vector was chosen. The layout of the keyboard and the corresponding note names are shown in Fig. 5.

Fig. 5. The layout of the piano keyboard and the corresponding note names.
Experiments show that this method identifies single piano notes very effectively, with an accuracy of 100%. The piano is a fixed-pitch instrument: from the onset of a tone until it dies away, the fundamental and harmonic frequency components are fully determined and do not change, and only the amplitude gradually decreases. Even across different pianos, the fundamental and overtone frequencies are fixed. Because the training data and the recognition data are therefore samples obtained under the same conditions, the recognition rate reaches 100%. The recognition results for the various combinations of single tones are shown in Table 1.
Table 1. Piano single-tone recognition results (%)
Analysis of the recognition results shows that most errors occurred at empty chords, especially in fast-paced passages. Because the analysis window has a fixed length, a fast passage means that the window spans more notes, so one frame of data may contain more than one chord and the system is confused. A beat-synchronous algorithm avoids this problem effectively, because chords change more slowly than the beat and an empty chord [10] seldom occurs within a single beat.
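The benefit of beat synchronization can be seen in a toy Python example that averages frame-level feature vectors between beat boundaries instead of over a fixed window. The two-dimensional features, the beat positions, and the function name are hypothetical; the paper does not describe its beat tracker or its feature layout at this point.

```python
import numpy as np

def beat_synchronize(frame_features, beat_frames):
    """Average frame-level feature vectors between consecutive beat boundaries."""
    segments = []
    for start, end in zip(beat_frames[:-1], beat_frames[1:]):
        segments.append(frame_features[start:end].mean(axis=0))
    return np.array(segments)

# Hypothetical features: frames 0-2 belong to one chord, frames 3-5 to another.
frame_features = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
                           [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
beat_frames = [0, 3, 6]  # assumed beat boundaries, given as frame indices

beat_features = beat_synchronize(frame_features, beat_frames)
fixed_window = frame_features[2:5].mean(axis=0)  # a window straddling the chord change
print(beat_features, fixed_window)
```

Each beat-aligned segment contains a single chord, whereas the fixed window that straddles the change yields a mixture of the two.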
This paper addresses single-tone music recognition and limits the object of study to piano music, yet different countries and ethnic groups have different musical instruments. As research deepens, recognition need not be confined to one specific instrument or to a single instrument at a time. As the range of instruments studied continues to expand, the work may eventually develop into the recognition of musical instruments in general and even of vocal performance.
Against the background of the development of computer music, this paper introduces current research topics in music recognition, including rhythm recognition and chord recognition. Two popular speech recognition methods, the RBF neural network and the hidden Markov model (HMM), are applied to piano single-tone recognition and chord recognition, respectively. Both recognition algorithms are implemented in software, and the experimental results are analyzed. The conclusions of this paper are as follows:
Based on an analysis of methods for extracting the fundamental frequency of a note, the author proposes an improved algorithm that combines short-time autocorrelation, the cepstrum method, and the short-time amplitude difference algorithm to extract the fundamental frequency of the samples in each note. On this basis, the sample data are also checked using a mean method and a sine-based processing method, giving a note recognition method with high accuracy. The paper also discusses the recognition and evaluation of musical notes: notes are recognized in terms of both note name and duration, each is compared with the standard score in frequency and duration to obtain a degree of approximation, and a final evaluation criterion is given.
This research is still at an initial stage, and many problems remain. For example, the algorithm for recognizing played piano music needs further improvement. The improved time-frequency method achieves high recognition accuracy for frequencies between 50 Hz and 4000 Hz, but its accuracy is poor at low frequencies.
