Similarity analysis and repeating pattern detection in fingerprint features

Abstract

In a piece of music, human listeners can easily identify repeating patterns. In theory, the similarities among repeating patterns should differ from those among non-repeating patterns. In this paper, we study the similarities of patterns based on fingerprint features. Based on the analysis results, we also present a method for detecting repeating patterns. Evaluations on a set of familiar songs indicate that our method is promising.

1. Introduction

Many musical pieces show a prominently repetitive structure. These repeating patterns are readily comprehended and are commonly regarded as among the most expressive and representative parts of a piece of music. Consequently, repeating patterns are commonly used in further music analysis, such as the discovery of themes [1] and motifs [2, 3, 4], structure analysis [5, 6, 7], music thumbnailing [7, 8], and so on.

Many algorithms [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] have been proposed to discover repeating patterns. According to the type of feature data, these techniques can be categorized into two types: alphabetic and numeric. Alphabetic techniques first convert the music signal into a symbolic representation through a series of processing steps and then apply string-matching techniques to discover repeating patterns. Numeric techniques process numeric data directly: they employ a similarity measure to detect similar data or segments and then extract repeating patterns from these data or segments through refinement.

However, previous works mainly focus on selecting features, improving accuracy, reducing time and space complexity, and so on. Few studies have examined the similarity distribution of patterns. Since human listeners can distinguish repeating patterns, these patterns should have an explicit structure relative to non-repeating patterns. The main aim of this paper is to study the similarities of patterns. However, the durations of repeating patterns vary tremendously, from a few seconds to a few minutes, so it is very difficult to analyze them directly as a whole. In [15], 256 subsequent sub-fingerprints were used as a basic unit to identify a song. This motivates us to use fixed-length blocks instead of whole patterns for analyzing similarities. Based on the analysis results, we also propose an approach for finding repeating patterns, and we evaluate it experimentally.

The remainder of this paper is organized as follows. Section 2 introduces extraction of features; in Section 3, we analyze similarities; Section 4 detects repeating patterns; experiments and analyses are performed in Section 5; finally, we conclude in Section 6.

Figure 1.

Scheme of Haitsma and Kalker’s audio fingerprint extraction.

2. Extraction of features

Which features are used depends mainly on the application. In this paper, we focus on melodic similarity, because the perception of repetition is generally based on melody, which is related to note similarity. It has been shown [16, 17] that features such as Fourier coefficients, frequency cepstral coefficients, spectral flatness, sharpness, CQT [5], and fingerprints [15] can express perceptual properties well. Here an audio fingerprint can be considered a short summary of a music clip: a fingerprint function generates a feature sequence from the clip.

In addition, Haitsma and Kalker’s fingerprint feature [15] has been shown to represent musical notes accurately and to be robust against signal degradation. It is therefore well suited to our research. We briefly review the extraction of this feature, as shown in Fig. 1.

First, the audio signal is segmented into overlapping frames. An overlap factor of 31/32 is used so that the frame boundaries are at most 5.8 milliseconds off with respect to the real frame boundaries, which ensures that even in the worst case the query sub-fingerprints and the sub-fingerprints of the same clip in the database are still very similar. In addition, each frame is weighted by a Hanning window. Second, the raw time series is converted into a frequency-domain representation by performing a Fourier transform on each frame. Third, the spectrum of each frame is divided into 33 non-overlapping sub-bands from 300 Hz to 2000 Hz with logarithmic spacing. Finally, the sign of the energy differences between subsequent sub-bands is calculated to obtain a 32-bit sub-fingerprint from each frame.
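The first three steps can be illustrated with a short Python sketch based on NumPy. It is a minimal sketch under stated assumptions, not the implementation of [15]: the function name `band_energies`, the frame length of 2048 samples, and the choice to sum the power spectrum over the FFT bins of each band are all assumptions that the description above does not fix.

```python
import numpy as np

def band_energies(signal, sample_rate, frame_len=2048, overlap=31 / 32,
                  n_bands=33, f_lo=300.0, f_hi=2000.0):
    """Return an (n_frames, n_bands) array of sub-band energies.

    frame_len is an assumed value; the overlap and band limits follow the text.
    """
    hop = max(1, int(round(frame_len * (1 - overlap))))  # 31/32 overlap -> frame_len / 32
    window = np.hanning(frame_len)                       # Hanning weighting of each frame
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)        # logarithmically spaced band edges
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    energies = np.zeros((n_frames, n_bands))
    for n in range(n_frames):
        frame = signal[n * hop:n * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum of the frame
        for m in range(n_bands):
            in_band = (freqs >= edges[m]) & (freqs < edges[m + 1])
            energies[n, m] = power[in_band].sum()
    return energies
```

Each row of the returned array holds the 33 band energies of one frame, which is the input needed for the bit derivation that follows.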

Let $E(n,m)$ denote the energy of band $m$ of frame $n$, and let $F_{n}(m)$ denote the $m$-th bit of the sub-fingerprint of frame $n$. Then $F_{n}(m)$ is computed as in Eq. (1).

$F_{n}(m)=\begin{cases}1&\text{if }\textit{ED}(n,m)>0\\ 0&\text{if }\textit{ED}(n,m)\leqslant 0\end{cases}.$ (1)

where the energy difference $\textit{ED}(n,m)$ is

$\displaystyle \textit{ED}(n,m)=E(n,m)-E(n,m+1)-E(n-1,m)+E(n-1,m+1).$ (2)
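The bit derivation of Eqs. (1) and (2) can be sketched in a few lines of NumPy. The function name `sub_fingerprints` and the array layout (one row per frame, one column per band) are illustrative assumptions.

```python
import numpy as np

def sub_fingerprints(energies):
    """Map an (n_frames, 33) energy array to an (n_frames - 1, 32) array of 0/1 bits."""
    energies = np.asarray(energies, dtype=float)
    # Difference between neighbouring bands within a frame: E(n,m) - E(n,m+1).
    band_diff = energies[:, :-1] - energies[:, 1:]
    # Difference of that quantity between frame n and frame n-1, as in Eq. (2).
    frame_diff = band_diff[1:] - band_diff[:-1]
    # Eq. (1): the bit is 1 where the difference is positive, 0 otherwise.
    return (frame_diff > 0).astype(np.uint8)
```

Because each bit depends on the previous frame, the first frame yields no sub-fingerprint in this sketch.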

In applications, however, a single sub-fingerprint is usually too short to contain sufficient information, so a fingerprint block is used as the basic unit. Let $\textit{FP}=f_{1}f_{2}\ldots f_{N}$ represent a sub-fingerprint sequence; then the $n$-th fingerprint block $\textit{FP}_{n}^{L}$ is expressed as Eq. (3).

$\textit{FP}_{n}^{L}=f_{n}\ldots f_{n+L-1},\quad n\in[1,N-L],\quad L\ll N.$ (3)

where $L$ denotes the number of sub-fingerprints in a fingerprint block, namely the block length; $N$ denotes the total number of sub-fingerprints in the fingerprint sequence; and $f_{i}$ denotes the $i$-th sub-fingerprint.

Moreover, the bit error rate (BER) is used to represent similarity (distance); it is the Hamming distance between two blocks normalized by the number of bits. If $\textit{FP}_{i}^{L}$ and $\textit{FP}_{j}^{L}$ are two fingerprint blocks, then the bit error rate $\textit{BER}(\textit{FP}_{i}^{L},\textit{FP}_{j}^{L})$ is described as Eq. (4).

$\textit{BER}(\textit{FP}_{i}^{L},\textit{FP}_{j}^{L})=\frac{\sum\nolimits_{l=0}^{L-1}{\sum\nolimits_{m=0}^{31}{f_{i+l}(m)\oplus f_{j+l}(m)}}}{32\ast L}.$ (4)

where $m$ denotes the $m$-th bit of the sub-fingerprint, and “ $\oplus$ ” is the bitwise XOR (exclusive OR) operator. Note that the smaller the BER, the higher the similarity. The BER ranges from 0 to 1.
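The BER of Eq. (4) amounts to counting differing bits with XOR and dividing by the block size. A minimal sketch follows, assuming the sub-fingerprints are stored as an array of 0/1 values with 32 columns and that block positions are 0-based (the text uses 1-based subscripts); the function name `ber` is illustrative.

```python
import numpy as np

def ber(bits, i, j, L):
    """Bit error rate between the length-L blocks starting at rows i and j (0-based)."""
    a = bits[i:i + L]
    b = bits[j:j + L]
    # XOR marks the differing bits; a.size equals 32 * L for full blocks.
    return np.count_nonzero(a ^ b) / a.size
```

The sketch assumes both blocks lie fully inside the sequence; a block that runs past the end would need separate handling.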

Finally, we introduce the fingerprint block distance, FD for short. For two fingerprint blocks $\textit{FP}_{i}^{L}$ and $\textit{FP}_{j}^{L}$ $(j>i)$, $\textit{FD}(i,j)$ is defined as Eq. (5).

$\textit{FD}(i,j)=j-i.$ (5)

3. Analysis of similarity

In this section, we carry out a similarity analysis based on Haitsma and Kalker’s fingerprint features. Our method is to first cut a sub-fingerprint sequence into fixed-length blocks and then analyze the similarities of the blocks. There are three typical relationships between fingerprint blocks: subsequent, non-similar, and similar, where similar means having similar melodies. We design three separate experiments, one for each relationship. The training corpus consists of 100 songs of different musical genres performed by both male and female singers.

3.1 Similarities of subsequent fingerprint blocks

Because a large overlap is used when extracting features, subsequent frames are highly similar, and this similarity varies as FD increases. It is therefore essential to learn the distribution of similarities over 31 subsequent blocks, namely for FD from 1 to 31. The objective of the first experiment is to observe this distribution. We select all fingerprint blocks of the training corpus, about 2.5 million blocks corresponding to 77.5 million samples, and set $L$ to 1, 64, 128, and 256, respectively. Finally, we calculate the average BER for each FD.

Figure 2.

The distribution of similarities in subsequent blocks.

Figure 3.

The distribution of similarities in non-similar blocks.

The similarity distribution of subsequent fingerprint blocks is illustrated in Fig. 2. First, because the curves for the different block lengths coincide, the similarity of subsequent fingerprint blocks evidently depends mainly on FD and not on block length. Second, Fig. 2 shows an important rule: as FD increases, the similarity first decreases to its global minimum (the BER reaches its global maximum), then increases, and finally the BER fluctuates around 0.5. Further observation shows that when $\textit{FD}>31$, the BER always fluctuates slightly around 0.5.

This experiment mainly indicates that subsequent blocks are highly similar.
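The averaging in this experiment can be sketched as follows. This is an illustrative sketch rather than the code used for Fig. 2: the function name `mean_ber_by_fd` and the 0/1 array layout with 32 columns are assumptions.

```python
import numpy as np

def mean_ber_by_fd(bits, L, max_fd=31):
    """Average BER between each block and the block FD positions later, for FD = 1..max_fd."""
    # Only start positions for which the most distant partner block still fits are used.
    n_blocks = bits.shape[0] - L + 1 - max_fd
    result = {}
    for fd in range(1, max_fd + 1):
        scores = [
            np.count_nonzero(bits[n:n + L] ^ bits[n + fd:n + fd + L]) / (32 * L)
            for n in range(n_blocks)
        ]
        result[fd] = float(np.mean(scores))
    return result
```

On uncorrelated random bits every average is close to 0.5, which is the level that the BER approaches for large FD in Fig. 2.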

3.2 Similarities of non-similar fingerprint blocks

The goal of the second experiment is to study the similarities of non-similar blocks, where non-similar is defined relative to similar. The key to this experiment is how to obtain non-similar blocks. In theory, a pair of fingerprint blocks taken from two different songs is very unlikely to be similar, so the probability of similarity can be ignored.

Based on this analysis, the second experiment is designed as follows: for every two songs, randomly choose 2 K pairs of fingerprint blocks, building 1 M pairs of fingerprint blocks from the training corpus. To further observe the similarities for different block lengths, we again choose four representative lengths: 1, 64, 128, and 256. Finally, we calculate the BER scores for these block pairs and observe the score distributions.

Figure 4.

The distribution of similarities in similar blocks.

The distribution of similarities in non-similar blocks approximately follows a normal distribution, as illustrated in Fig. 3. Moreover, as the block length increases, the distribution becomes more concentrated, which means that longer blocks are more expressive and representative.

Here we mainly focus on $L=256$, for which the BER lies almost entirely in the range from 0.41 to 0.57, with an error of less than $10^{-5}$. One song usually contains about 25 thousand sub-fingerprints, so a pair of blocks with $\textit{BER}\geqslant 0.41$ can safely be regarded as non-similar.

This experiment shows that the similarities of non-similar blocks are highly localized, and that the distribution becomes more concentrated as the block length increases.

3.3 Similarities of similar fingerprint blocks

The third experiment investigates the similarities of similar blocks. We again focus on $L=256$. From Fig. 3, we know that if $\textit{BER}<0.41$ for $L=256$, the two blocks should be similar or at least slightly similar. Based on this analysis, we design the following algorithm to discover as many similar blocks as possible and then observe their similarity distribution.

Step 1:
A fingerprint sequence $\textit{FP}=f_{1}f_{2}\ldots f_{N}$ is divided into fixed-length blocks of 256 subsequent sub-fingerprints, and the $t$-th block is expressed as:

$B_{t}=\textit{FP}_{(t-1)\ast 256+1}^{256},t\in[1,T].$ (6)

where $T=\left\lfloor\frac{N}{256}\right\rfloor$ is the integer part of $\frac{N}{256}$.
Step 2:
Scanning from $B_{1}$ to $B_{T}$, the current block $B_{t}$ is compared with the blocks in $f_{(t-1)\ast 256+32}\ldots f_{N}$ to detect its similar blocks, namely those meeting the condition $\textit{BER}<0.41$. In the experiments, we also find that if $B_{t}^{\prime}$ is a similar block of the current block $B_{t}$, then $B_{t}$ is also similar to the blocks adjacent to $B_{t}^{\prime}$. To remove this redundancy, after finding $B_{t}^{\prime}$ we check the neighborhood of $B_{t}^{\prime}$ for the most similar block $B_{t}^{\prime\prime}$ and use it in place of $B_{t}^{\prime}$; the search then jumps 32 sub-fingerprints forward from the location of $B_{t}^{\prime\prime}$. These operations are repeated until the end of the sequence.

We use this algorithm to generate similar blocks from the training corpus and observe their similarities. The experimental results are shown in Fig. 4.
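Steps 1 and 2 can be sketched in Python as below. This is a simplified illustration under several assumptions that the description leaves open: positions are 0-based, the search starts exactly `skip` sub-fingerprints after the start of the current block, and the neighborhood that is checked for the best match is the `skip` positions following the first hit. The names `block_ber` and `find_similar_blocks` are hypothetical.

```python
import numpy as np

def block_ber(bits, i, j, L):
    """BER between the length-L blocks starting at rows i and j of a 0/1 array."""
    return np.count_nonzero(bits[i:i + L] ^ bits[j:j + L]) / (32 * L)

def find_similar_blocks(bits, L=256, threshold=0.41, skip=32):
    """Return {t: [start, ...]}: for each segmented block t, the starts of its similar blocks."""
    N = bits.shape[0]
    T = N // L                        # Step 1: number of fixed-length blocks
    similar = {}
    for t in range(T):                # Step 2: scan the blocks in order
        start = t * L
        hits = []
        pos = start + skip            # skip the strongly overlapping neighbours (assumed offset)
        while pos + L <= N:
            if block_ber(bits, start, pos, L) < threshold:
                # Check the assumed neighbourhood of the hit for the best match.
                stop = min(pos + skip, N - L + 1)
                best = min(range(pos, stop),
                           key=lambda p: block_ber(bits, start, p, L))
                hits.append(best)
                pos = best + skip     # jump forward from the best match
            else:
                pos += 1
        similar[t] = hits
    return similar
```

The sketch compares every candidate position directly and is therefore slow on full songs; it is meant to make the control flow concrete, not to be efficient.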

Figure 5.
Hash table of storing similar blocks.

Figure 4 indicates that, as the BER increases, the distribution reaches its first peak at $\textit{BER}=0.33$, then quickly decreases to a minimum at $\textit{BER}=0.35$, and finally increases rapidly from $\textit{BER}=0.37$.

This experiment indicates that the BER of similar blocks is smaller than that of non-similar blocks.

4. Detecting repeating patterns

In this section, based on the analyses of Section 3, we design a new algorithm to detect repeating patterns. Section 3 showed that longer blocks are more expressive and representative, so a block-to-block measure is better suited to our method. In addition, repeating patterns come from similar blocks, and a repeating pattern usually consists of many subsequent blocks. We therefore first capture similar blocks, then join them to form longer blocks, and finally obtain repeating patterns by refining the longer blocks. The approach is described in the following subsections.

4.1 Capturing similar blocks

We use the method of Section 3.3 to generate similar blocks, with the following parameter settings: similarity threshold $\textit{BER}=0.41$ and block length $L=256$. The captured similar blocks are inserted into a hash table (HT for short), as shown in Fig. 5. Each item of HT represents a segmented block $B_{t}$ and points to a list. Each node in the list represents a block that is similar to $B_{t}$ and consists of a data field and a pointer field. The data field contains the subscript of the first sub-fingerprint and the length of the similar block. The default length is 256.
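One possible in-memory layout for HT is sketched below. It is an assumption-laden illustration: a Python list of lists stands in for the table of linked lists in Fig. 5 (so no explicit pointer field is needed), and the names `SimilarBlock` and `build_hash_table` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SimilarBlock:
    index: int         # subscript of the first sub-fingerprint of the similar block
    length: int = 256  # block length in sub-fingerprints (default 256)

def build_hash_table(similar, n_blocks):
    """Turn {t: [start, ...]} into a table whose entry t lists the blocks similar to B_t."""
    return [[SimilarBlock(start) for start in similar.get(t, [])]
            for t in range(n_blocks)]
```

For example, `build_hash_table({0: [40]}, 2)` yields a table whose first entry holds one node with index 40 and length 256 and whose second entry is empty.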

4.2 Mergence

Section 4.1 yields many similar blocks. In this subsection, subsequent similar blocks are combined to form longer blocks. The high overlap factor used in feature extraction also causes misalignments between subsequent blocks, so an error of 32 sub-fingerprints is tolerated when combining.

Mergence rule: subsequent similar blocks are combined into a longer block by updating the length of the first block and deleting the nodes of the blocks merged into it.

Mergence description: assume that the subsequent similar blocks $\theta_{1}\ldots\theta_{i-1}$ come from HT[$j$] $\ldots$ HT[$i+j-2$]. If HT[$i+j-1$] has a similar block $\theta_{i}$ meeting Eq. (7):

$|\theta_{i}.\textit{index}-\theta_{1}.\textit{index}-\theta_{1}.\textit{length}|<32,$ (7)

then they belong to the same pattern. Equation (8) is used to update the length of the first block $\theta_{1}$, and the node of $\theta_{i}$ is removed from HT[$i+j-1$].

$\theta_{1}.\textit{length}=\theta_{i}.\textit{index}-\theta_{1}.\textit{index}+\theta_{i}.\textit{length}.$ (8)

After merging, we obtain many longer blocks. However, some similar blocks remain uncombined. Such blocks mainly come from noise or from slightly similar blocks, which are difficult to distinguish. Therefore, in this paper, we restrict repeating patterns to be longer than 512 sub-fingerprints (a duration of approximately 6 s), and shorter ones are removed.
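The mergence rule of Eqs. (7) and (8) and the subsequent length filter can be sketched as follows. The names `Node` and `merge_blocks` are hypothetical, HT is modeled as a list of lists, and the filter keeps nodes whose length is at least `min_len`; how a pattern of exactly 512 sub-fingerprints should be treated is an assumption, since the text only says that shorter patterns are removed.

```python
from dataclasses import dataclass

@dataclass
class Node:
    index: int         # first sub-fingerprint subscript
    length: int = 256  # length in sub-fingerprints

def merge_blocks(ht, tol=32, min_len=512):
    """Merge subsequent similar blocks in place, then drop nodes shorter than min_len."""
    for j, bucket in enumerate(ht):
        for head in bucket:
            # Try to extend head with a node from each following table entry.
            for k in range(j + 1, len(ht)):
                match = next((b for b in ht[k]
                              if abs(b.index - head.index - head.length) < tol), None)
                if match is None:
                    break                                            # chain is interrupted
                head.length = match.index - head.index + match.length  # Eq. (8)
                ht[k].remove(match)                                  # delete the merged node
    return [[b for b in bucket if b.length >= min_len] for bucket in ht]
```

With nodes at indices 1000, 1260 and 1500 in three consecutive entries, the sketch merges them into a single node of index 1000 and length 756, while an isolated node of length 256 is discarded.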

4.3 Boundary refinement

Because our method uses segmented blocks to capture similar blocks, some sub-fingerprints are likely to be missing at the boundaries of the longer blocks, so boundary refinement is needed. This operation is easily done by traversing outward from the two boundaries of each longer block along the fingerprint sequence.

After the boundary refinement, each node in HT represents a detected repeating pattern. The duration of each pattern is easily computed from the sampling rate.
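As a worked example, assume a step of about 11.6 ms between subsequent frames. This value is an assumption here: it is the step used in the system of [15] and is consistent with the 5.8 ms worst-case boundary offset mentioned in Section 2, but it is not stated explicitly above. A pattern of $\ell$ sub-fingerprints then lasts approximately

$\ell\times 11.6\ \text{ms},\qquad\text{for example}\quad 512\times 11.6\ \text{ms}\approx 5.9\ \text{s},$

which agrees with the duration of about 6 s quoted for the minimum pattern length in Section 4.2.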

5. Experiments and analyses

Based on the analyses of Section 3, Section 4 proposed an approach for discovering repeating patterns. This section carries out experiments to evaluate the whole algorithm. It is well known that, regardless of the type of music, repeating patterns are comprehended in the same way. Moreover, repeating patterns are discovered within a single song, so differences in orchestration, rhythm, and voice between pieces do not affect the final evaluation. Thus, songs with a clear structure and relatively strict repetition should be used to improve the quality of the experiments.

Previous works [2, 5] have shown that pop music usually has relatively strict repetition and a clear structure. Consequently, we select a test corpus of 30 familiar pop songs of different musical genres, performed by male and female singers. We annotate the repeating patterns in these 30 songs and use the annotations as ground truth. Only repeating patterns longer than 6 s are annotated.

5.1 Evaluation

In this paper, the proposed method is based on similarity analysis. This subsection will evaluate our method by using three criteria: recall, precision and $F1$ .

Using the sample rate, the time representation of the annotated patterns can easily be converted into a sub-fingerprint representation.

To evaluate our method, we construct two sparse $N\times N$ matrices $X$ and $Y$ for each song. $X$ denotes the set of detected repeating patterns generated by the method of Section 4, and $Y$ denotes the ground truth patterns. In $X$ and $Y$, the locations where the patterns occur are assigned “1” and all other locations “0”. The correctly detected repeating patterns are then represented by $S$:

$S=X\wedge Y.$ (9)

where the operator “ $\wedge$ ” represents the element-wise logical AND of the entries with the same subscripts. Let $|\cdot|$ denote the number of entries equal to 1 in a matrix. The recall $R$ is then expressed as:

$R=\frac{|S|}{|Y|}.$ (10)

The precision $P$ is described as:

$P=\frac{|S|}{|X|}.$ (11)

The $F1$ measure represents the overall performance, which is usually defined as the harmonic mean of the average recall and precision:

$F1=\frac{2\ast R\ast P}{(R+P)}.$ (12)
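The three criteria can be computed from the two binary matrices as in the following sketch, which interprets the ratios in Eqs. (10) and (11) as ratios of the numbers of nonzero entries. The function name `evaluate` and the handling of empty matrices (returning 0) are assumptions.

```python
import numpy as np

def evaluate(X, Y):
    """Return (recall, precision, F1) for binary matrices X (detected) and Y (ground truth)."""
    X = np.asarray(X, dtype=bool)
    Y = np.asarray(Y, dtype=bool)
    S = X & Y                                   # Eq. (9): element-wise logical AND
    n_s = np.count_nonzero(S)
    n_x = np.count_nonzero(X)
    n_y = np.count_nonzero(Y)
    recall = n_s / n_y if n_y else 0.0          # Eq. (10)
    precision = n_s / n_x if n_x else 0.0       # Eq. (11)
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0   # Eq. (12)
    return recall, precision, f1
```

Dense arrays are used here for brevity; for full songs a sparse representation, as mentioned in the text, would be more economical.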

The average recall, precision and $F1$ -measure based on these 30 songs are illustrated in Table 1.

Table 1

Average performance

	Recall	Precision	F1
Our method	81.7%	78.3%	79.9%

Table 1 shows that our approach discovers the true repeating patterns well, which indicates that using segmented blocks instead of whole repeating patterns is feasible. Moreover, the similarity threshold $\textit{BER}=0.41$ is reasonable: under the given conditions, the similarity distributions of repeating and non-repeating patterns are distinguishable in fingerprint features.

5.2 Performance analysis

Section 5.1 has shown that our method, which is based on similarity analysis, is effective. In this subsection, we analyze its performance. Because longer blocks are more expressive and representative, the block-to-block similarity measure should be more efficient. We use a fourth and a fifth experiment to compare our method with two related approaches: self-similarity [5] and AMG [2]. All three methods focus on melodic similarity, and their features are also similar.

The fourth experiment: we compute the element-wise similarities ($L=1$) in the detected repeating patterns and observe their distribution. The results are shown in Fig. 6.

The fifth experiment: we randomly select a pair of similar blocks ($\textit{BER}=0.36$) from the detected repeating patterns, compute the element-wise similarities, and observe how they vary over subsequent elements. The results are shown in Fig. 7.

Figure 6.

Distribution of element-wise BER scores in repeating patterns.

Figure 7.

The distribution of subsequent element-wise similarities in a similar block pair.

As shown in Fig. 6, the element-wise BER in the detected repeating patterns mainly lies in the range from 0.05 to 0.8. Within each repeating pattern, the element-wise similarity varies greatly, and the changes between subsequent elements are irregular, as illustrated in Fig. 7. Because [5] uses a single threshold to capture similar data and [2] uses adaptive thresholds and a suffix tree to capture variable-length similar segments based on the global correlation, the ability of these two approaches to recognize repeating patterns is low, and they rely heavily on refinements to extract repeating patterns. Our method applies the block-to-block similarity measure, and longer blocks are more expressive and representative, as shown in Fig. 3, so its ability to recognize repeating patterns is markedly better. Similar blocks are therefore easily identified, and extracting repeating patterns is more efficient than in [2, 5].

6. Conclusion

This paper studies and analyzes the similarities of patterns in fingerprint features and, to verify the analysis results, presents an algorithm for detecting repeating patterns. The evaluations show that our approach captures the true repeating patterns well; that is, under the given conditions, the similarity distributions of repeating and non-repeating patterns can be approximately distinguished. In addition, block-to-block similarity matching is employed to find repeating patterns, and the analysis indicates that this makes the extraction of repeating patterns more efficient.

This paper has shown that similarity analysis can be employed to study repeating patterns. It also provides a new way to understand and study repeating patterns.

In future work, we will use similarity analysis to explore more applications of fingerprint features and try to extend the analysis to other features.

References

1. Liu C.C., Hsu J.L. and Chen A.L., Efficient theme and non-trivial repeating pattern discovering in music databases, In Proc 15th International Conference on IEEE, Sydney (1999), 14–21.

2. Wang, Chng E.S. and Li, A tree-construction search approach for multivariate time series motifs discovery, Pattern Recognition Letters 31(9) (Jul 2010), 869–875.

3. Lin, Keogh, Lonardi and Patel, Finding motifs in time series, In Proc the Second Workshop on Temporal Data Mining, Edmonton, Alberta, Canada (2002), 53–68.

4. Chiu, Keogh and Lonardi, Probabilistic discovery of time series motifs, In Proc 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington (2003), 493–498.

5. Wang and Zhang H.J., Repeating pattern discovery and structure analysis from acoustic music data, In Proc 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, New York (2003), 275–282.

6. Paulus and Klapuri, Music structure analysis by finding repeated parts, In Proc 1st ACM workshop on Audio and Music Computing Multimedia, California (2006), 59–68.

7. Wei and Vercoe, Structural analysis of musical signals for indexing and thumbnailing, In Proc 2003 Joint Conference on IEEE in Digital Libraries, Texas (2003), 27–34.

8. Aucouturier J.J. and Sandler, Finding repeating patterns in acoustic musical signals: Applications for audio thumbnailing, In Proc AES22 International Conference on Virtual, Synthetic and Entertainment Audio, Espoo (2002), 412–421.

9. Hsu J.L., Liu C.C. and Chen A.L., Efficient repeating pattern finding in music databases, In Proc the Seventh International Conference on Information and Knowledge Management, ACM, Washington (1998), 281–288.

10. Hsu J.L., Chin Liu and Arbee Chen L.P., Discovering nontrivial repeating patterns in music data, In Proc IEEE Transactions on Multimedia, New York (2001), 311–325.

11. Chiu S.C. et al., Mining polyphonic repeating patterns from music data using bit-string based approaches, In Proc ICME 2009, New York (2009), 1170–1173.

12. Wang and Shi, N-gram inverted index structures on music data for theme mining and content-based information retrieval, Pattern Recognition Letters 27(5) (Apr 2006), 492–503.

13. Darrell, Discovery of distinctive patterns in music, Intelligent Data Analysis 14 (Oct 2010), 547–554.

14. Foote, Automatic audio segmentation using a measure of audio novelty, In Proc IEEE International Conference on Multimedia and Expo, New York (2000), 452–455.

15. Haitsma and Kalker, Highly robust audio fingerprinting system, In Proc 3rd International Conference on Music Information Retrieval (ISMIR 2002), Paris (2002), 107–115.

16. Cano, Content-based audio search: From fingerprinting to semantic audio retrieval, Doctoral Thesis, Fabra University, 2006.

17. Mitrović, Zeppelzauer and Breiteneder, Features for content-based audio retrieval, Advances in Computers 78 (Mar 2010), 71–150.