Abstract
At present, with the development of music, the variety of music is gradually enriched, and the piano music of ASEAN is one of the most popular music types. However, traditional music recognition is difficult to identify such a variety of music, and it cannot meet the needs of people, the design of the piano music library and the display platform is urgent and necessary, based on this, the article first established the N-grams index database, and built a music feature index database based on the learning N-grams algorithm which contains only the soprano part, finally we tested the algorithm built in this paper, and the results show that the retrieval speed of the database designed in this paper has reached the requirement, the algorithm designed in this paper has a certain degree of recognition, it can meet the daily needs of the people for the piano music of ASEAN.
Introduction
With the rapid development of network technology and multimedia technology, people have been able to reach more and more music easily, the following question is how to quickly find the music that you need in this vast music [1]. Therefore, the study of music retrieval is becoming more and more important, it has also gradually received a wide range of attention of people [2]. The traditional music retrieval is aimed at Mono music like humming, the search for polyphonic music such as the piano is relatively less [3]. But with the improvement of people’s living standards, there are more and more people especially children, who have a strong interest in learning piano, and the piano is the instrument that has the wildest diapason [4]. Therefore, the study of piano music retrieval based on the content is of great significance for people’s life and entertainment and polyphony retrieval research [5].
State of the art
The main research work of the music retrieval is to study how to use the features of the music to realize the retrieve of music information, the segmentation of the continuous notes is the preparation of the feature extraction, it can be realized by onset point detection, its algorithm is mainly focused on the detection function and peak extraction [6]. The detection function mainly includes the algorithm based on the signal feature and the algorithm based on the signal probability model, the signal characteristics mainly includes time domain characteristics, frequency domain energy characteristics and phase characteristics; the probability model mainly includes the model based on the change point detection and the detection instantaneous transformation algorithm based on the different models [7]. All these algorithms have their own shortcomings, for example, the effect of the time domain characteristic is not ideal when the amplitude of the music signal is very small; the algorithm of frequency domain energy characteristics cannot well detect the onset points of non-percussion music; while the algorithm based on probability model has the disadvantages of high complexity and time-consuming. The mainstream algorithm for peak extraction is to extract the correct peak point by using an adaptive threshold based on median filter, but its effect is very unstable [8]. By the note segmentation, music is divided into a series of single note, next is to extract the key features in each note [9]. A continuous note forms a melody that can reflect the characteristics of musical content, thus, the current music retrieval is mainly using the notes, melodies, rhythms, and other acoustic perceptual layers of music, the absolute pitch sequence and the relative pitch sequence are the two commonly used melodic expressions [10].
Methodology
Setting up a N-grams index database
The feature library is built by reading music files in the MIDI format, and this allows you to get information about the pitch of each note and the beginning and end of the note. The feature extracted from the third chapter is the candidate pitch sequence, it is not related to the duration of information, so the establishment of the characteristic library only uses the pitch information. In order to improve the speed of retrieval, establishing index by doing N-grams division to the pitch of music library, that is to find the target music need to match and the location of the matching pitch by index before fast matching. Before describing the establishment of the feature library, please take a look at some polyphonic piano music and the corresponding pitch value MIDI. As shown in Table 1, Table 1 below lists the MIDI pitch value correspond to this period of music:
Part of the pitch of the piano music “memories of childhood”
Part of the pitch of the piano music “memories of childhood”
For polyphony like the piano, all the pitch that has the same start time is called a musical event. According to the basic knowledge of music, in the piano playing, the right hand usually plays the main theme, and the left hand is only playing with the accompaniment, so the strength of the right hand is generally greater than the strength of the left hand, this is why when it is extracted to the pitch sequence, the pitch position of the right hand (the soprano) is always located in the front of the candidate pitch sequence, and the pitch position of the notes (the bass) played in the left hand is relatively backward. Therefore, when do N-grams index, it is possible to consider only the pitch value of the soprano part as the index object, and do not consider the bass part.
In the N-grams division, the selection of the size of N directly affects the retrieval speed in the N-grams partition, if the N is too big, it will increase the amount of the index data, and causing that index time is much long, general requiringN ≥ 2. The piano has 88 pitches, if the two element method is used to divide, then the index data no more than 88×88 = 7744, if the three elements method is used to divide, then there is a maximum of 88×88×88 = 681472 index data, this is an index value of 88 times more than two element method, so considering the method of dividing by two elements, that is, there are 2 soprano parts of the music event in each sliding window. Taking the soprano values of the first 6 sequences [47, 51, 54] in Table 1 as an example, 5 sequence windows [47, 51, 54] will be obtained by using two elements to divide, for the same window needs to be removed, so the location of the ID and pitch in the file is considered at the same time when establishing index, that is the offset of the first pitch of the music file, establishing the index libr ary as shown in the following formula:
In formula (1), first to find all soprano sequences which contain the [pitch i , pitchi+1] pitch in the music file, then the music files and their corresponding offset are stored in a cell, the cell is divided according to the different ID of the music file, Finally combining all the small cells corresponding to the pitch sequence [pitch i , pitchi+1] to complete the establishment of an index data. Taking the high tone sequence of the first 6 notes is divided by two elements methods for an example, to illustrate the establishment of this indexing mechanism, as Table 2 shown below:
“Childhood memories” part of the sound of high index table
There are only 3 sets of index data in Table 2, they are [47, 51, 54], this is the combination of the same data in the original 5 sets of data, so at the bottom of the sequence number and location of the file, the file positions of the next two lines are [1, 4] and[2, 5], this indicates that the two elements division of the pitch sequence of the position 1 and position 4 of the file 1 is [51, 54]; Similarly, the two elements division of the pitch sequence of the position 2 and position 5 is also [47, 51]. In this way, the index mechanism of the first 6 notes is established. What needs to be explained is that the key words in the index library [pitch i , pitchi+1] has many situation of same pitch i but different pitchi+1. In order to easily search in the index library, index library will be stored in a Hash structure here, as shown in Fig. 1.

Hash storage structure of the pitch index library.
In Fig. 1, pitch_l to pitch_n is the first data of each set of pitch in the index library, after them, then the corresponding pitch is second data in each set of pitch, each layer of pitch data is stored as an ordered array in a small to large order. The rightmost [ID, offset], is the different file corresponding to each set of pitch and its pitch offset. For sequentially stored data, the binary search method is a commonly used search algorithm, for an array of N length, the average search speed using binary search method is only o (log 2 (n + 1)). There are up to 88 pitches for Piano Music, for the pitch in the Hash structure for which the length of first layer is n, we can know that n = 88. The range of the pitch’s length m of the next floor is m ≤ 88, the ranges of pitch length k and R behind are the same as m, and the complexity of the algorithm of the binary search method obtained by this way is o (log 2 (m + 1)). From the point of view of computational complexity, just considering how many offset that a set of pitch group corresponds to, for the music feature library, if there are N notes in the library, and then the largest number of offset is N-1. If it’s the BF algorithm, for an input fragment of a length of 1, the time complexity of the algorithm is o (l × n), this is greater than the retrieval time complexity after using binary search method.
Many unnecessary searches can be excluded through the introduction of the N_graxns index, and it is helpful to improve the speed of retrieval, the specific pitch position of a music file can be directly matched by an index when input the music retrieval feature, thus, there is no need to make comparison between the features of the retrieval and all the music in the music library.
After setting up the N-grams index library, we have to consider how to match the target music quickly, and the research of fast matching algorithm is based on the N-grams index mechanism mentioned earlier in this paper. For the piano period in the WAV format, extracting candidate pitch sequence as a retrieval feature, and the N-grams index library is a soprano note for every musical event, so it is necessary to look for the soprano of every musical event in the candidate pitch sequence. According to the candidate pitch sequences extracted from the previous third chapters, we found that the soprano of every musical event usually appears in the front of the sequence, so we consider selecting the pitch of the first few candidates of the candidate pitch sequence as the keyword in the index library for searching.
Selecting the first few candidates of the candidate pitch sequence as the index key words is crucial to the retrieval results, and selecting 10 music with 200 notes for it (each note may contain more than one pitch), to count the position of the pitch in the standard pitch of the target music in the candidate pitch sequence, the statistical results are shown in Table 3 below:
The position statistics of the standard soprano in the candidate pitch sequence
The position statistics of the standard soprano in the candidate pitch sequence
In Table 3, “No” indicates that the high pitch of the standard pitch is not found in the candidate pitch sequence. From the statistical results, 62.5% of the total number of notes located in the first bit, the second bit have 25, the first two occupy 75% of the total number of notes, the number of the third and the behind is relatively small. So the first two bits of the candidate pitch are used as index key words, comparing with the pitch features in the index library, if the top two bits of the candidate pitch contain the pitch features in the index library, the match is considered successful; otherwise, match failed.
This part of the fast matching algorithm for music can only use the simplest BF algorithm at present, because in addition to the BF algorithm, all other algorithms use jumping mechanism would judge whether there is the same character in the pattern string, and it is not possible to judge whether the candidate pitch sequence of the two notes is equal currently.
For extracted retrieved features, we would compare the similarity degree between it and the feature library by using similarity calculation, when the user enters WAV format music clip retrieval, the candidate pitch sequence is extracted by features, then the similarity calculation algorithm is analyzed in this case. Assuming the candidate pitch sequence of a note is S = {s1, s2, s3, s4, s5, s6, s7, s8, s9, s10}, and its matching standard pitch is T = {t1, t2, …, t
m
}, m < 10, the m is limited here because only one person’s play is considered, of course, it’s still very rare to play 10 sounds at the same time. The 10 elements of the candidate pitch sequence, from S1 to S10, are arranged in a large to small order according to the amplitude of the spectrum. Taking the candidate pitch sequences and its matching standard pitch r with a length of m as an example, looking for it in turn in s: if the value of r appears, If the standard pitch appears, remembering its position in *s, and marking it as I = {i1, i2, …, i
n
} , n ≤ m, So the similarity between them can be defined as:
The m in formula (2) is the number of corresponding standard pitch, W is a scoring sequence, after calculating by similarity, returning the similarity value of the top 10 highest piano to the user as the retrieval result.
Accuracy test of system retrieval
Music retrieval based on content is a new field of research, currently there is no unified evaluation method and evaluation database, the evaluation method of retrieval in the international music information retrieval evaluation competition can be used for reference. The system’s original phonetic library contains 230 MIDI - formatted piano music, the input clip is 35 WAV - Format piano pieces, and the sequence number of the returned music list is sorted from large to small according to the similarity value of each piece of music, the more foregoing the sequence number is, the better the retrieval results are, the experimental data are shown in Table 5 as follows:
Figure 2 shows the retrieval results when retrieving different number of notes. It is found that the more the number of input notes can be detected, the more music can be detected. It is easy to understand that, with the increase of input music fragments, the more features that can be matched, and the accuracy of retrieval will naturally increase. When the input of a music fragment is only 5 notes, we just obtain the result of there is only 9 music in the first bit of music return list, while inputting a fragment of 20 notes, the number of music at the top of the search results has been greatly improved. Meanwhile, the number of detected music when the input note is 17 is same as when input note is 23, therefore, in order to improve the accuracy of retrieval in the actual retrieval, it is best for users to input the amount of notes that are more than 17, instead of having to reach more than 20. Because it is not a single pursuit of accuracy, as a matter of retrieval, the speed of retrieval is also considered, the more the input notes, the faster the retrieval speed will fall, the next section is an analysis of the problem.

Retrieval of accuracy statistics.
It is also found in the experiment, some retrieval results have same similarity, particularly when the number of notes in the music fragment is very small, as in Table 5, when the number of notes is only 5, according to the statistics of the similarity value of the first 10 bits, we found that the similarity value of 2 music was repeated on average, this is very bad for music retrieval. One reason for this is that this article only uses the pitch features of the music, and didn’t consider the rhythm information, subsequent research should take this rhythm information into consideration
For the retrieval system, it is an important index to examine the time consumption. This system mainly consists of two parts: feature extraction and search matching. In the program, time function test system with MATLAB consumes time, it can be accurate to the halo second level. In the time consuming test, close all other applications and unrelated services. Selecting 20 WAV-format music fragments and dividing it into situations include 5, 8, 11, 14, 17, 2 and 23 according to the number of notes, separately count its feature extraction and search matching time, in order to reduce the error, taking the average of 3 tests for each piece of music. The statistics of time consuming test results are shown in Fig. 3 below.

System time consuming test results.
From the graph a) in Fig. 3, we can see that the time of the feature extraction is basically linear with the number of notes, in the graph b), the search time also increases with the increase of the number of notes, because the more the number of notes, the more index times, and the more matching notes are needed. In order to see more directly the relationship between retrieval time and input time of music clips, the average time and retrieval time of 20 music under different number of notes were counted, as shown in Table 4.
Music file length and system retrieval time
From the statistical results in Table 4, we can see that the longer the input music segments are, the longer the retrieval time is. Combining with the statistics of accuracy in the previous Table 5, we can know that if the time is too short, then the accuracy will be very poor. Comprehensively considering the retrieval time and accuracy, users should input more than 6 seconds or more than 17 notes when entering the retrieval fragments, so that it is possible to detect the target music more accurately in a relatively short time. The retrieval accuracy of algorithm and the retrieval time are tested and analyzed above, and the experimental results show that the algorithm in this paper can achieve high accuracy for piano music retrieval in a short time.
With the rapid development of multimedia and network technology, Music retrieval has become an important research direction in the field of audio information retrieval. In the traditional music retrieval, we mainly process humming which is monotonous music, and it cannot meet people’s needs, therefore, it is particularly important to study polyphony retrieval, represented by piano music. First of all, this paper is based on the pitch extraction algorithm of the predecessors, proposing the method of extracting candidate pitch sequence, and each frequency in the sequence is sequentially ordered according to the magnitude of the frequency in the frequency domain; Secondly, based on the analysis of the existing similarity calculation algorithm, this paper gives the corresponding similarity definition of piano music according to the different position of the pitch in the candidate sequence. Through the study of N-grams algorithm, combined with the extracted pitch characteristics of piano music in this paper, and introducing N-grams feature index library which contains only the treble part; finally we test the algorithm structured by this paper, and the test results show that, the introduction of the search results improves the retrieval speed in the situation of the success rate has a small change, the corresponding experiment simulation verified the analysis of the above theory.
