Deep symbolic processing of human-performed musical sequences

Abstract

Artificial music tutors are needed to assist performers during practice time whenever a human tutor is not available. But for these artificial tutors to be intelligent and fulfill the role of a music tutor, they must be able to identify the errors a performer makes while playing a musical sequence. This task is not trivial, since all musical activities are considered open-ended domains. Therefore, not only is there no unique correct way of performing a musical sequence, but the analysis made by the tutor must also consider the development level of the performer, the difficulty level of the performed musical sequence, and many other variables. This paper describes ongoing research that uses cascading connected layers of symbolic processing as the core of a module that identifies and characterizes errors in human performances, and that is able to overcome the complexity of the studied open-ended domain.

Keywords

Artificial intelligence, intelligent music tutors, musical sequence, symbolic processing

1 Introduction

Musical sequences are the standard means for communication with the musical language, just as phrases or sentences play the same role in a natural language. A musical sequence or sequence of musical phenomena has time, tonality and tempo information embedded, as well as a set of dynamics and timbral specifications [13], but above all it carries semantics. The semantics of a musical sequence always maps to emotions, and regardless of the type, complexity, and intensity of the communicated emotions, a sequence of musical phenomena is always needed because individual sounds, chords or even arpeggios alone are incapable of conveying such a complex message. Music is indeed one of the most intriguing activities of human intelligence.

Our current scientific understanding of the human brain is still insufficient to fully explain its behavior when engaging in musical activities (hearing, learning, composing, performing, etc.). Even short- and long-term memory responses seem to operate differently in musicians and non-musicians [7]. But recent research with Event-Related Potential (ERP) graphs has clearly shown some behavioral patterns in response to stimulus sequences [5]. Those patterns closely resemble the brain response when the stimuli are phrases in natural language. The cognitive process of determining the semantics of a stimulus sequence exhibits steady electrical brain activity when faced with a semantically correct natural language phrase, whereas it exhibits a significant negative electrical response to a semantically incorrect phrase [6, 8]. Apparently, the brain’s response to musical sequences follows a very similar behavior [5, 9], even though the semantics of a musical sequence is much harder to specify.

Even in the presence of all the above limitations it seems clear that understanding the structure, attributes and relevance of musical sequences, as well as acknowledging the complexity and expressiveness of music notation enables us to understand the role of a human music teacher during the teaching-learning process, and in doing so, it allows the identification of the essential characteristics that an artificial music tutor must exhibit.

However, musical activities are open-ended domains [3] where, despite the presence of a universally accepted symbolic language, the semantics of such language is subject to interpretations, experience-related bias and emotional states. Moreover, in an open-ended domain there are never correct or incorrect solutions for a task. Consequently, in such a domain there is no easy way to formally define problems, and no exact way to test an agent or assess an answer. This scenario has posed serious obstacles for the development of artificial intelligence tools for the music domain [1], and has led some researchers to think that the pattern extracting and learning capabilities of connectionist models could help to overcome some or all of these obstacles. In this research we adopt a strictly symbolic approach and argue in favor of symbolic processing as an adequate means for dealing with the imprecision and open-endedness of the musical tasks domain. An important characteristic of the symbolic processing proposed herein is that it is structured as a sequence of symbolic processing layers, each of which has its own objective and solves a different part of the whole task. Also, each layer takes the output of the previous layer as input, in some sort of deep symbolic processing architecture.

This paper describes the ongoing construction and testing of a human-performed error detection and characterization module that is proposed as part of the core for future intelligent music tutors capable of assisting a student when learning to play a (monophonic) musical instrument.

2 Related previous works

Much of the state-of-the-art research related to musical sequences (sometimes referred to as melodic sequences) points in two main directions: 1) sequence similarity measurement, and 2) pattern discovery and extraction. On the one hand, the problem of defining and measuring the similarity between two musical sequences is commonly studied as part of a much bigger context related to plagiarism and copyright [10–12], but it can also be a core part of automatically learning the particular composing or performing style of a human musician [16–18]. On the other hand, pattern discovery and extraction within musical sequences is a very general problem with applications in music-generating systems [15–21], content-based retrieval [19, 22], and automatic music transcription [20, 23]. However, research aimed at designing and/or constructing artificial tutors in the music domain is scarcer. Much effort is placed on the teaching of singing and folk music because of the social and intercultural consequences of that knowledge [26, 27]. But when available, this type of research focuses on constructing software tools that operate within a well-defined and heavily restricted domain. Depending on the application area, these tools present the student with previously organized exercises and carefully constrained task environments. Also, they restrict their analysis of the student’s solution to finding discrepancies with a pre-programmed model of the solution [24]. In other words, these applications do not deal with the open-endedness of the activity they intend to tutor [4].

3 Conceptual and theoretical framework

A musical sequence is an ordered list of symbols representing musical phenomena. A musical phenomenon is either a sound corresponding to any of the twelve semitones in the chromatic scale, or a silence. Both sounds and silences can last for any valid duration, and sounds can be placed in any pitch octave within the sound range of a specific instrument (i.e. its tessitura).
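As an illustration only, the definition above could be captured in a small data structure. The sketch below is not the intermediate language used in this research: the names `Phenomenon`, `SEMITONES` and `midi_number` are hypothetical, and the MIDI numbering convention (C4 = 60) is an assumption adopted here for convenience.

```python
from dataclasses import dataclass
from typing import Optional

# The twelve semitones of the chromatic scale (sharps only, for brevity).
SEMITONES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


@dataclass(frozen=True)
class Phenomenon:
    """One musical phenomenon: a pitched sound or a silence (hypothetical type)."""
    pitch_class: Optional[str]  # one of SEMITONES, or None for a silence
    octave: Optional[int]       # None for a silence
    beats: float                # notated duration, in beats

    def __post_init__(self):
        if self.beats <= 0:
            raise ValueError("duration must be positive")
        if (self.pitch_class is None) != (self.octave is None):
            raise ValueError("a silence has neither pitch class nor octave")
        if self.pitch_class is not None and self.pitch_class not in SEMITONES:
            raise ValueError(f"unknown pitch class: {self.pitch_class}")

    @property
    def is_silence(self) -> bool:
        return self.pitch_class is None


def midi_number(p: Phenomenon) -> Optional[int]:
    """Semitone index in the assumed MIDI convention (C4 = 60); None for a silence."""
    if p.is_silence:
        return None
    return 12 * (p.octave + 1) + SEMITONES.index(p.pitch_class)


# A musical sequence is simply an ordered list of phenomena.
sequence = [
    Phenomenon("G", 4, 1.0),
    Phenomenon("A", 4, 1.0),
    Phenomenon(None, None, 0.5),  # a silence lasting half a beat
]
```

Representing a silence with `None` keeps sounds and silences in the same ordered list, which is all the definition requires; an instrument's range could then be expressed as a pair of semitone indices.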

Musical notation is a symbolic language used to represent musical phenomena played with instruments or sung by the human voice. Computationally, this notation can be considered a semi-formal language because, although its syntax is unambiguously defined, its semantics is highly ambiguous [14]. Part of the reason for this ambiguity lies in the fact that, with the exception of tones, all other musical attributes (duration, intensity, pace, etc.) are specified with categorical labels instead of mathematical quantities. Therefore, in a strict sense, it is not adequate to consider musical symbols as well-grounded symbols. Although the habit of describing the tempo of a sequence in beats per minute (bpm) was widely popularized during the last century, many other aspects of musical performance remain subject to the performer’s personal interpretation. One of the main conceptual proposals of this paper is that such a bpm specification can be used as the sole grounding anchor for analyzing and assessing human-performed (monophonic) musical sequences without violating any due flexibility in the semantics of the language. In order to adequately solve this task, we carefully slice the analysis process into delimited layers, each one with a clear goal: the analysis of a concrete aspect of the performed musical sequence. Once the various aspects of the performance are analyzed, the final assessment process (not addressed in this report) can adopt the structure of an expert system that decides which performance errors to report to the performer, as well as which corrective actions to suggest, based not only on the performance analysis, but also on the experience level of the performer, the difficulty level of the sequence, and any other available and relevant piece of knowledge.

In this research we assume that the module proposed herein will fit into the architecture of a (future) intelligent tutor composed of the following modules [2]:

An interactive user interface.

A domain model knowledge base.

An error detection and characterization module.

A recommendation system.

A student model knowledge base.

A user profiling and management module.

Since the level of rigor or severity of the analysis and assessment process must be configurable, it is of the utmost importance that all layers are human-interpretable so that programmers and users can understand their behavior and output. Therefore, all processing layers operate in a strictly symbolic fashion, where each symbol and operation has unambiguous semantics and thus can be explained and traced back. Even the first layer, whose goal is to translate the acoustic information of the performed sequence into a symbolic intermediate language, operates in the same way by computing a chromagram of the acoustic signal. Since the initial problem statement guarantees that the sequence intended for recognition is strictly monophonic, there is no need to include connectionist elements in the first layer; the computation of a chromagram is enough for the intended purpose [25].
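To make the idea of a chromagram concrete, the following sketch computes the twelve pitch-class energies of a single audio frame and picks the strongest one, which is sufficient for a monophonic signal. It is a deliberately simplified stand-in, not the chromagram implementation used in this research: it projects the frame directly onto equal-tempered pitch frequencies (A4 = 440 Hz assumed), and the function names, the octave range and the sample rate in the usage lines are all hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_frame(samples, sample_rate, octaves=range(3, 7)):
    """Return 12 energies, one per pitch class, for one audio frame.

    For every equal-tempered pitch in the given octaves, the frame is
    projected onto a sinusoid at that pitch's frequency, and the resulting
    energy is accumulated into the bin of the pitch class.
    """
    chroma = [0.0] * 12
    for octave in octaves:
        for pc in range(12):
            midi = 12 * (octave + 1) + pc               # assumed convention: C4 = 60
            freq = 440.0 * 2 ** ((midi - 69) / 12)      # assumed tuning: A4 = 440 Hz
            w = 2 * math.pi * freq / sample_rate
            re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
            im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
            chroma[pc] += re * re + im * im
    return chroma


def dominant_pitch_class(samples, sample_rate):
    """Name of the pitch class with the highest energy in the frame."""
    chroma = chroma_frame(samples, sample_rate)
    return PITCH_CLASSES[max(range(12), key=chroma.__getitem__)]


# Usage: a synthetic 440 Hz tone should be labeled as an A.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(2048)]
label = dominant_pitch_class(frame, SAMPLE_RATE)
```

Every step of this computation is an explicit, traceable arithmetic operation, which is the property the paragraph above requires of the first layer. A practical implementation would normally rely on a windowed short-time Fourier transform rather than this direct projection.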

4 Taxonomy of errors

As the proposed module has the goal of identifying errors made during a musical sequence performance, it seems prudent to first make the type and nature of the errors sought clear. One of the guiding principles during this research has been the emphasis on analyzing musical sequences, as opposed to just musical sounds. In musical terms this means we are aiming at detecting solfège errors (i.e. the performer’s ability to correctly read music and perform what is written). But also, since the global task pursued is to tutor the performer in his/her learning process of playing a musical instrument, there is an obvious need to consider mechanization errors which are highly dependent on the specific instrument played.

In order to assess the correctness, usefulness, and generality of the reported module, we have implemented a case study for the transverse flute. This wind instrument complies with the monophonic restriction, and at the same time it presents a set of interesting challenges, such as characterizing the effects on sound that can be induced by a poor insufflation technique. Fig. 1 shows the taxonomy of currently considered errors.

Fig. 1

Taxonomy of considered performance errors.

Solfège errors are those related to the reading and execution of a musical sequence independently of the musical instrument played. During the performance, some notes may be misread or misinterpreted (tone error), and both notes and silences may last longer or shorter than intended (duration error). Some musical phenomena may be out of sequence (sequencing error) if the performer does not adequately read or follow repeat signs (such as da capo or dal segno) or breathing signs (for wind instruments). Lastly, the performer may ignore or misinterpret some dynamic marks such as crescendo or piano-forte (dynamic errors).

Sound quality errors, on the other hand, are related to the mastering level of the musical instrument. In some musical instrument families if a sound is incorrectly emitted it may appear as a non-musical sound (background noise error) or as a series of very short sounds indistinguishably separated (excess vibrato error). Also, it is a common situation that the lack of experience of the performer leads him/her to an abrupt attack or decay of a note (incorrect attack/decay error).

5 Cascading layered symbolic processing

The core of the proposed module is structured as a sequence of five symbolic processing layers, each of which has a distinct goal and solves a different part of the whole processing task. Most importantly, each layer has its own flexibility criteria, and the accumulated effect of those criteria is what gives the proposed module its intelligent aspect, by closely reproducing particular aspects of human behavior. An overview of the proposed symbolic processing architecture is shown in Fig. 2, and a detailed description of each layer follows.

Layer #1: Representation. The goal of this layer is to lay the symbolic framework for the rest of the process. As the first layer, its inputs are the formal specification of the sequence to be performed and the recording of the actual performance along with all pertinent tempo information. Both inputs are translated into the same custom intermediate symbolic language. The specification is not validated in any way, just rewritten in terms of the custom language and represented with a list (the REFERENCE list). The audio is analyzed by computing a chromagram which allows the identification of all its recorded phenomena in a time-frame context and all durations are expressed in frames from the chromagram. Each phenomenon is represented as a custom symbol and the ordered sequence of symbols representing the performed sequence is the output (the PLAYED list).
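The role of the tempo as grounding anchor in this layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the custom intermediate language of the module: symbols are reduced to hypothetical `(label, start_frame, n_frames)` tuples, the specification is assumed to be a list of `(label, beats)` pairs, and the played audio is assumed to be already reduced to one label per chromagram frame.

```python
def beats_to_frames(beats, bpm, frames_per_second):
    """Convert a duration in beats to chromagram frames, using the bpm as anchor."""
    return round(beats * 60.0 / bpm * frames_per_second)


def build_reference(spec, bpm, frames_per_second):
    """Rewrite a specification [(label, beats), ...] as the REFERENCE list.

    Start frames are derived from the accumulated beats so that rounding
    errors do not build up along the sequence.
    """
    reference, elapsed_beats = [], 0.0
    for label, beats in spec:
        start = beats_to_frames(elapsed_beats, bpm, frames_per_second)
        elapsed_beats += beats
        end = beats_to_frames(elapsed_beats, bpm, frames_per_second)
        reference.append((label, start, end - start))
    return reference


def build_played(frame_labels):
    """Collapse per-frame labels into the PLAYED list by run-length encoding."""
    played = []
    for i, label in enumerate(frame_labels):
        if played and played[-1][0] == label:
            prev_label, start, n = played[-1]
            played[-1] = (prev_label, start, n + 1)
        else:
            played.append((label, i, 1))
    return played


# Usage with hypothetical values: 60 bpm and 10 frames per second.
REFERENCE = build_reference([("G4", 1), ("A4", 1), ("REST", 0.5)], bpm=60, frames_per_second=10)
PLAYED = build_played(["G4"] * 8 + ["G#4"] * 2 + ["A4"] * 10 + ["REST"] * 5)
```

Expressing both lists in the same units (frames) is what lets the later layers compare them directly.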

Layer #2: Segmentation. As input, this layer receives the two lists generated by the previous layer: the REFERENCE and the PLAYED lists. Considering each reference symbol as the label of a segment, this layer identifies and associates all the symbols in the PLAYED list that were played during the musical beat of the reference symbol. In the ideal case of a perfect execution, each segment in the sequence will be associated only with one played symbol, but even if errors are made, all the phenomena occurred during that beat will be clustered under the same segment label. A SEGMENTED-PLAYED list is the output of this layer.
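A possible reading of this segmentation step is sketched below. The association criterion is an assumption made for illustration: a played symbol is assigned to the reference symbol whose time span contains its onset, and the last segment absorbs any late onsets. The tuple layout `(label, start_frame, n_frames)` and the function name `segment` are likewise hypothetical.

```python
def segment(reference, played):
    """Cluster played symbols under the reference symbol active at their onset.

    reference, played: lists of (label, start_frame, n_frames) tuples.
    Returns the SEGMENTED-PLAYED list as [(reference_symbol, [played symbols])].
    """
    segments = [(ref, []) for ref in reference]
    if not segments:
        return segments
    idx = 0
    for sym in sorted(played, key=lambda s: s[1]):
        onset = sym[1]
        # Move forward until the onset falls inside the current reference span.
        while idx < len(reference) - 1 and onset >= reference[idx][1] + reference[idx][2]:
            idx += 1
        segments[idx][1].append(sym)
    return segments


# Usage: the short G#4 slip is clustered with the G4 beat it occurred in.
reference = [("G4", 0, 10), ("A4", 10, 10)]
played = [("G4", 0, 8), ("G#4", 8, 2), ("A4", 10, 10)]
segmented = segment(reference, played)
```

In a perfect execution every segment would hold exactly one played symbol; here the first segment holds two, which later layers can examine without losing the beat context.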

Layer #3: Sequence analysis. Once all phenomena (all symbols) are clustered into the beat of the corresponding reference symbol, this layer identifies and measures all discrepancies between the REFERENCE list and the SEGMENTED-PLAYED list, but instead of a symbol-by-symbol analysis, this layer exclusively performs segment analysis. All duration and dynamics discrepancies are measured in time frames, while tone discrepancies are measured in semitones. These discrepancies are not labeled as performance errors yet. The output of this layer is a list of all found discrepancies (the DISCREPANCIES list).
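The segment-level comparison could look roughly like the sketch below. It is only one way to instantiate the description: each segment is summarized by its longest played symbol (an assumption), pitches are given as semitone indices with `None` for a silence, dynamics are left out, and the discrepancy kinds and dictionary keys are hypothetical names.

```python
def analyze(segments):
    """Measure discrepancies per segment, without judging them as errors.

    segments: [(ref, [played symbols])], each symbol being
    (semitone_index_or_None, start_frame, n_frames).
    Tone discrepancies are in semitones, duration discrepancies in frames.
    """
    discrepancies = []
    for i, (ref, played) in enumerate(segments):
        if not played:
            discrepancies.append({"segment": i, "kind": "missing", "frames": -ref[2]})
            continue
        main = max(played, key=lambda s: s[2])  # longest symbol stands for the segment
        if (main[0] is None) != (ref[0] is None):
            discrepancies.append({"segment": i, "kind": "sound/silence"})
        elif main[0] is not None and main[0] != ref[0]:
            discrepancies.append({"segment": i, "kind": "tone", "semitones": main[0] - ref[0]})
        if main[2] != ref[2]:
            discrepancies.append({"segment": i, "kind": "duration", "frames": main[2] - ref[2]})
        if len(played) > 1:
            discrepancies.append({"segment": i, "kind": "fragmentation", "count": len(played) - 1})
    return discrepancies


# Usage: G4 (67) cut short by a slip, then B4 (71) played instead of A4 (69).
segments = [
    ((67, 0, 10), [(67, 0, 8), (68, 8, 2)]),
    ((69, 10, 10), [(71, 10, 10)]),
]
DISCREPANCIES = analyze(segments)
```

Note that the output only records measurements; whether a two-frame shortfall counts as an error is deliberately left to the next layer.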

Layer #4: Error identification. Taking the REFERENCE and DISCREPANCIES lists as input, as well as global information about the development level of the performer, this layer is devoted to the identification of error patterns. These patterns were previously included in a small knowledge base. Each error pattern, expressed as an if-then rule, specifies the presence of particular discrepancies with associated magnitude values, as the distinct signal for a performance error. The output of this layer is a list of identified errors (the ERRORS list).
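A toy version of such a rule base is sketched below to show how the performer's level can change which discrepancies become errors. The rules, the tolerance values and all names (`TOLERANCE_FRAMES`, `RULES`, `identify_errors`) are hypothetical; the actual knowledge base of the module is not reproduced here.

```python
# Hypothetical duration tolerances, in frames, per development level.
TOLERANCE_FRAMES = {"intermediate": 3, "advanced": 1}

# Each rule is (error name, condition); a condition is the "if" part of an
# if-then rule and receives one discrepancy plus the active tolerance.
RULES = [
    ("tone error",
     lambda d, tol: d["kind"] == "tone" and abs(d["semitones"]) >= 1),
    ("duration error",
     lambda d, tol: d["kind"] == "duration" and abs(d["frames"]) > tol),
]


def identify_errors(discrepancies, level):
    """Return the ERRORS list as (segment index, error name) pairs."""
    tol = TOLERANCE_FRAMES[level]
    errors = []
    for d in discrepancies:
        for name, condition in RULES:
            if condition(d, tol):
                errors.append((d["segment"], name))
    return errors


# Usage: the same discrepancies judged at two levels of rigor.
discrepancies = [
    {"segment": 0, "kind": "duration", "frames": -2},
    {"segment": 1, "kind": "tone", "semitones": 2},
]
errors_intermediate = identify_errors(discrepancies, "intermediate")
errors_advanced = identify_errors(discrepancies, "advanced")
```

Because every rule is an explicit condition over named quantities, any reported error can be traced back to the discrepancy and threshold that triggered it.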

Layer #5: Causal determination. Taking the ERRORS and REFERENCE lists as input, as well as an instrument-dependent knowledge base, the goal of this layer is to determine the causes for each error identified. The scope of this knowledge base is of a much more abstract level than the one used by the previous layer.

Fig. 2

The proposed cascading layered architecture.

The main working hypothesis is that the accumulated effect of all previous layers adequately approaches the behavior of a human tutor when identifying errors in performed musical sequences.

6 Experimental assessment

In order to assess the precision and usefulness of the process described above, we designed a simple multi-sequence, multi-user test. The goal of this test was to compare the number of errors detected during the performance of selected musical sequences with diverse challenging conditions.

Although the restrictions associated with the current COVID-19 pandemic are making it extremely difficult to experiment, we managed to bring together a set of four transverse flute performers of intermediate and advanced expertise levels. For the test, we carefully selected or designed four short musical sequences. These musical sequences were specified so as to test a wide variety of experience levels and performance abilities in the test subjects. A brief explanation of each sequence follows:

Major scales. The practice of scales is part of the study routine for performers of all levels. Therefore, this first sequence contains seven measures with simple G-major scales with no silences. As the transition from one note to the next almost always requires moving one finger, its difficulty level is considered EASY. This sequence was selected and extracted from Gariboldi, G. (1998) “Études complète des gammes: opus 127: pour flûte”, as it is considered a classical text for flute practitioners [28]. See Fig. 3.

Intervals. While practicing simple complete scales is considered routine for every music student, separating consecutive tones changes the melodic sensation and presents a wide range of extra challenges for the performer. This sequence has seven measures with scales in thirds in a 3/4 time signature, also with no silences. The difficulty level is INTERMEDIATE. This sequence was selected and extracted from Taffanel & Gaubert (1923) “Méthode complète de flûte”, exercise number 6, paragraph A [29]. See Fig. 4.

Unpatterned sequence. This sequence has very little melodic value, if any. However, it was designed specifically for testing the performer’s ability to alternate the reading/execution of sounds and silences. It has eleven measures and each one contains silences of varying duration. The difficulty level is HARD. Unlike sequences 1 and 2, this sequence is an original proposal from the authors and it was designed to test some specific solfège abilities. See Fig. 5.

Patterned syncopated sequence. The last test sequence is also a solfège challenge, albeit considerably more difficult. In this case the sequence gets the performer involved in an apparently constant pattern, and then it gradually introduces several syncopation phenomena. This sequence contains ten measures and is also labeled with a difficulty level of HARD. This sequence is also an original proposal that has almost no musical value. See Fig. 6.

Fig. 3

Test sequence #1. G-Major scales.

Fig. 4

Test sequence #2. Intervals.

Fig. 5

Test sequence #3. Unpatterned sequence.

Fig. 6

Test sequence #4. Patterned and syncopated sequence.

The four test subjects were initially labeled for their experience level as intermediate or advanced. Tables 1, 2, 3 and 4 show the analysis of errors of each test subject while performing each test sequence. These tables show the number of errors detected in each performance and the ratio of those errors to the total number of musical phenomena in each sequence. Also, errors are grouped by type (solfège and sound quality errors), and a total and an average are indicated for each performer in each group. The total is just the sum of all errors made by the performer, while the average is the arithmetic average of the percentage of error versus the total number of phenomena in each performed sequence.
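The arithmetic behind each pair of table rows can be sketched as follows. The function name `error_summary` is hypothetical, and the figure of 91 phenomena used in the usage lines is an assumption inferred from the ratios reported for sequence 1, not a value stated in the text.

```python
def error_summary(counts, n_phenomena):
    """Summarize one error group for one performance.

    counts: dict mapping error type to number of detected errors.
    Returns (ratios per type, subtotal of errors, average percentage), where
    the average is the arithmetic mean of the per-type ratios, in percent.
    """
    ratios = {kind: n / n_phenomena for kind, n in counts.items()}
    subtotal = sum(counts.values())
    average = 100.0 * sum(ratios.values()) / len(ratios)
    return ratios, subtotal, average


# Usage with the counts of player 1 on sequence 1 (Table 1).
N_PHENOMENA = 91  # assumed: inferred from the reported ratios
solfege = {"note": 0, "duration": 22, "sequencing": 4}
sound = {"background noise": 9, "interrupted note": 8, "messy attack": 8, "transition": 42}

_, solfege_subtotal, solfege_average = error_summary(solfege, N_PHENOMENA)
_, sound_subtotal, sound_average = error_summary(sound, N_PHENOMENA)
total_errors = solfege_subtotal + sound_subtotal
total_average = (solfege_average + sound_average) / 2
```

Under that assumption the sketch yields the subtotals 26 and 67 and averages that round to the 9.52% and 18.41% listed for this player.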

Table 1

Practical test on sequence 1

		solfège errors				sound quality errors
level	player	note error	duration error	sequencing error	subtotal / average	background noise error	interrupted note	messy attack	transition error	subtotal / average	total / average
intermediate	1	0	22	4	26	9	8	8	42	67	93
		0.00	0.24	0.04	9.52%	0.10	0.09	0.09	0.46	18.41%	13.97%
	2	0	21	5	26	15	13	4	14	46	72
		0.00	0.23	0.05	9.52%	0.16	0.14	0.04	0.15	12.64%	11.08%
	3	1	13	3	17	3	5	18	34	60	77
		0.01	0.14	0.03	6.23%	0.03	0.05	0.20	0.37	16.48%	11.36%
	4	0	8	3	11	5	3	4	38	50	61
		0.00	0.09	0.03	4.03%	0.05	0.03	0.04	0.42	13.74%	8.88%
	5	0	19	4	23	5	9	10	43	67	90
		0.00	0.21	0.04	8.42%	0.05	0.10	0.11	0.47	18.41%	13.42%
	6	0	35	14	49	5	27	20	27	79	128
		0.00	0.38	0.15	17.95%	0.05	0.30	0.22	0.30	21.70%	19.83%
	7	0	23	16	39	12	1	16	64	93	132
		0.00	0.25	0.18	14.29%	0.13	0.01	0.18	0.70	25.55%	19.92%
advanced	8	1	11	8	20	2	0	5	23	30	50
		0.01	0.12	0.09	7.33%	0.02	0.00	0.05	0.25	8.24%	7.78%
	9	1	10	10	21	5	3	5	9	22	43
		0.01	0.11	0.11	7.69%	0.05	0.03	0.05	0.10	6.04%	6.87%
	10	0	26	11	37	3	1	5	2	11	48
		0.00	0.29	0.12	13.55%	0.03	0.01	0.05	0.02	3.02%	8.29%
	11	0	14	12	26	6	0	6	1	13	39
		0.00	0.15	0.13	9.52%	0.07	0.00	0.07	0.01	3.57%	6.55%
	12	0	19	15	34	9	3	8	3	23	57
		0.00	0.21	0.16	12.45%	0.10	0.03	0.09	0.03	6.32%	9.39%

Table 2

Practical test on sequence 2

		solfège errors				sound quality errors
level	player	note error	duration error	sequencing error	subtotal / average	background noise error	interrupted note	messy attack	transition error	subtotal / average	total / average
intermediate	1	1	13	4	18	7	3	17	24	51	69
		0.01	0.14	0.04	6.59%	0.08	0.03	0.19	0.26	14.01%	10.30%
	2	1	26	8	35	12	4	13	2	31	66
		0.01	0.29	0.09	12.82%	0.13	0.04	0.14	0.02	8.52%	10.67%
	3	2	16	5	23	6	2	29	13	50	73
		0.02	0.20	0.06	9.47%	0.07	0.02	0.36	0.16	15.43%	12.45%
	4	0	37	20	57	4	24	20	9	57	114
		0.00	0.46	0.25	23.46%	0.05	0.30	0.25	0.11	17.59%	20.52%
	5	0	44	15	59	3	39	19	7	68	127
		0.00	0.54	0.19	24.28%	0.04	0.48	0.23	0.09	20.99%	22.63%
	6	0	27	6	33	3	0	4	34	41	74
		0.00	0.33	0.07	13.58%	0.04	0.00	0.05	0.42	12.65%	13.12%
	7	0	28	17	45	13	1	17	1	32	77
		0.00	0.35	0.21	18.52%	0.16	0.01	0.21	0.01	9.88%	14.20%
advanced	8	0	17	6	23	6	9	12	3	30	53
		0.00	0.19	0.07	8.42%	0.07	0.10	0.13	0.03	8.24%	8.33%
	9	0	21	4	25	6	2	7	7	22	47
		0.00	0.23	0.04	9.16%	0.07	0.02	0.08	0.08	6.04%	7.60%
	10	0	13	6	19	8	3	23	18	52	71
		0.00	0.16	0.07	7.82%	0.10	0.04	0.28	0.22	16.05%	11.93%
	11	0	13	7	20	3	1	3	0	7	27
		0.00	0.16	0.09	8.23%	0.04	0.01	0.04	0.00	2.16%	5.20%
	12	0	15	11	26	7	0	8	1	16	42
		0.00	0.19	0.14	10.70%	0.09	0.00	0.10	0.01	4.94%	7.82%

Table 3

Practical test on sequence 3

		solfège errors				sound quality errors
level	player	note error	duration error	sequencing error	subtotal / average	background noise error	interrupted note	messy attack	transition error	subtotal / average	total / average
intermediate	1	0	35	2	37	3	3	15	5	26	63
		0.00	0.56	0.03	19.58%	0.05	0.05	0.24	0.08	10.32%	14.95%
	2	0	31	5	36	10	8	13	2	33	69
		0.00	0.49	0.08	19.05%	0.16	0.13	0.21	0.03	13.10%	16.07%
	3	0	32	7	39	6	4	12	11	33	72
		0.00	0.51	0.11	20.63%	0.10	0.06	0.19	0.17	13.10%	16.87%
	4	0	36	8	44	6	2	14	10	32	76
		0.00	0.57	0.13	23.28%	0.10	0.03	0.22	0.16	12.70%	17.99%
	5	0	31	4	35	0	3	16	5	24	59
		0.00	0.49	0.06	18.52%	0.00	0.05	0.25	0.08	9.52%	14.02%
	6	0	36	3	39	2	5	23	6	36	75
		0.00	0.57	0.05	20.63%	0.03	0.08	0.37	0.10	14.29%	17.46%
	7	0	36	6	42	4	1	7	17	29	71
		0.00	0.57	0.10	22.22%	0.06	0.02	0.11	0.27	11.51%	16.87%
advanced	8	1	31	4	36	5	6	4	6	21	57
		0.02	0.49	0.06	19.05%	0.08	0.10	0.06	0.10	8.33%	13.69%
	9	2	21	4	27	3	1	2	1	7	34
		0.03	0.33	0.06	14.29%	0.05	0.02	0.03	0.02	2.78%	8.53%
	10	0	39	14	53	9	3	9	3	24	77
		0.00	0.62	0.22	28.04%	0.14	0.05	0.14	0.05	9.52%	18.78%
	11	0	34	13	47	6	2	6	0	14	61
		0.00	0.54	0.21	24.87%	0.10	0.03	0.10	0.00	5.56%	15.21%
	12	0	40	10	50	6	2	9	2	19	69
		0.00	0.63	0.16	26.46%	0.10	0.03	0.14	0.03	7.54%	17.00%

Table 4

Practical test on sequence 4

		solfège errors				sound quality errors
level	player	note error	duration error	sequencing error	subtotal / average	background noise error	interrupted note	messy attack	transition error	subtotal / average	total / average
intermediate	1	0	48	5	53	6	0	34	27	67	120
		0.00	0.38	0.04	13.80%	0.05	0.00	0.27	0.21	13.09%	13.44%
	2	0	48	5	53	17	3	18	6	44	97
		0.00	0.38	0.04	13.80%	0.13	0.02	0.14	0.05	8.59%	11.20%
	3	0	52	6	58	5	0	51	17	73	131
		0.00	0.41	0.05	15.10%	0.04	0.00	0.40	0.13	14.26%	14.68%
	4	0	50	4	54	5	3	38	12	58	112
		0.00	0.39	0.03	14.06%	0.04	0.02	0.30	0.09	11.33%	12.70%
	5	0	41	4	45	1	1	67	36	105	150
		0.00	0.32	0.03	11.72%	0.01	0.01	0.52	0.28	20.51%	16.11%
	6	0	45	5	50	4	4	60	31	99	149
		0.00	0.35	0.04	13.02%	0.03	0.03	0.47	0.24	19.34%	16.18%
	7	0	55	7	62	5	0	10	58	73	135
		0.00	0.43	0.05	16.15%	0.04	0.00	0.08	0.45	14.26%	15.20%
advanced	8	1	48	1	50	1	0	0	1	2	52
		0.01	0.38	0.01	13.02%	0.01	0.00	0.00	0.01	0.39%	6.71%
	9	0	56	2	58	5	1	3	5	14	72
		0.00	0.44	0.02	15.10%	0.04	0.01	0.02	0.04	2.73%	8.92%
	10	0	54	9	63	11	2	10	1	24	87
		0.00	0.42	0.07	16.41%	0.09	0.02	0.08	0.01	4.69%	10.55%
	11	0	59	7	66	9	2	7	1	19	85
		0.00	0.46	0.05	17.19%	0.07	0.02	0.05	0.01	3.71%	10.45%
	12	0	54	9	63	10	1	10	0	21	84
		0.00	0.42	0.07	16.41%	0.08	0.01	0.08	0.00	4.10%	10.25%

7 Discussion and conclusions

Since the errors detected during the test belong to two different groups, solfège errors and sound quality errors, it would be an oversimplification to try to characterize test subjects just by their total number of detected errors. Solfège errors are caused by either incorrectly reading or incorrectly executing what has just been read. Sound quality errors, on the other hand, reflect the performer’s mastery of a specific instrument. The skills needed to avoid both types of errors are achieved only with years of practice, consistency and dedication.

During the test of all four sequences, the most common error, both in intermediate and advanced performers, was always the duration error. That fact is consistent with the initial explanation that music is an open-ended domain, and with the fact that the quantification of the pace in beats per minute was used as the only grounding element for the symbols. From the moment a chromagram is computed (in the first layer of the proposed architecture), all the following time-related analyses on the played sequence are made in terms of time frames, and so even a one-frame error, which is not audible to the human ear, can be detected. Depending on their magnitude, not all duration errors shown in the test tables should be reported to the user, but that is a decision other future modules should make.

The observed behavior of all other errors falls within the normal and expected parameters. There is no test subject, either intermediate or advanced, with the best performance in all four sequences. But the diversity and precision of the errors reported by our architecture support the initial hypothesis that the cascading connected layers architecture is an adequate means for emulating a human tutor’s ability to detect performance errors without transgressing the open-ended conditions of the problem and without the need for any connectionist element.

The immediate next steps in this research will follow two main directions. First, layer #5, the causal determination layer, must be developed, which will require the construction of an instrument-specific knowledge base. Then, the many extension possibilities must be examined, including upgrading to the full musical notation language and eliminating the monophonic restriction on instruments. Also, in the quest for the best possible conditions of practical application, several tests should be made, particularly with beginners. Those tests should yield information on particular needs to be fulfilled by the error identification and characterization module, as well as by the future intelligent artificial tutors.

Acknowledgments

This work was supported in part by the Mexican Government through Instituto Politécnico Nacional (IPN) under SIP Multidisciplinary Project 2083, Projects SIP 20210189, 20211424, EDI, COFAA-SIBE, BEIFI-IPN; and CONACyT.

References

1. Marsden, Music, Intelligence and Artificiality, Chapter 2 in Readings in Music and Artificial Intelligence, Eduardo Reck Miranda (Editor). Routledge, Taylor & Francis Group, 2000.

2. Smith, Artificial Intelligence and Music Education, Chapter 12 in Readings in Music and Artificial Intelligence, Eduardo Reck Miranda (Editor). Routledge, Taylor & Francis Group, 2000.

3. Holland, Artificial Intelligence in Music Education: A Critical Review, Chapter 13 in Readings in Music and Artificial Intelligence, Eduardo Reck Miranda (Editor). Routledge, Taylor & Francis Group, 2000.

4. McLean and Dean R.T., (Eds.), The Oxford handbook of algorithmic music, Oxford University Press, 2018.

5. Hagoort, Interplay between syntax and semantics during sentence comprehension: ERP effects of combining syntactic and semantic violations, Journal of Cognitive Neuroscience 15(6) (2003), 883–899.

6. Hahne and Jescheniak J.D., What’s left if the Jabberwock gets the semantics? An ERP investigation into semantic and syntactic processes during auditory sentence comprehension, Cognitive Brain Research 11(2) (2001), 199–212.

7. Williamson V.J., Baddeley A.D. and Hitch G.J., Musicians’ and nonmusicians’ short-term memory for verbal and musical sequences: Comparing phonological similarity and pitch proximity, Memory & Cognition 38 (2010), 163–175.

8. Canette L.H., Lalitte, Bedoin, Pineau, Bigand and Tillmann, Rhythmic and textural musical sequences differently influence syntax and semantic processing in children, Journal of Experimental Child Psychology 191 (2020), 104711.

9. Francois and Schön, Musical expertise and statistical learning of musical and linguistic structures, Frontiers in Psychology 2 (2011), 167.

10. Foucard, Durrieu J.L., Lagrange and Richard, Multimodal similarity between musical streams for cover version detection, In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (2010), pp. 5514–5517, IEEE.

11. Volk and Van Kranenburg, Melodic similarity among folk songs: An annotation study on similarity-based categorization in music, Musicae Scientiae 16(3) (2012), 317–339.

12. Nakano, Yoshii and Goto, Musical similarity and commonness estimation based on probabilistic generative models of musical elements, International Journal of Semantic Computing 10(01) (2016), 27–52.

13. Tanguiane A.S., Artificial perception and music recognition, In: Siekmann (Ed) LNCS 746 (1993), pp. 95–130, Springer Verlag.

14. Holland, The Language of Music, Blog at WordPress.com (https://thepathofsound.wordpress.com/the-language-ofmusic/) The Path of Sound, John Holland, 2016.

15. Pachet, Papadopoulos and Roy, Sampling Variations of Sequences for Structured Music Generation, In ISMIR (2017), pp. 167–173.

16. Pachet and Roy, Markov constraints: steerable generation of Markov sequences, Constraints 16(2) (2011), 148–172.

17. De Prisco, Malandrino, Zaccagnino and Zizza, A Kind of Bio-inspired Learning of mUsic stylE, In International Conference on Evolutionary and Biologically Inspired Music and Art, (2017), pp. 97–113, Springer, Cham.

18. Fiebrink, Caramiaux, Dean and McLean, The machine learning algorithm as creative musical tool, Oxford University Press, 2016.

19. Grosche, Müller and Serra, Audio content-based music retrieval, In Dagstuhl Follow-Ups, vol. 3, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2012.

20. Benetos, Dixon, Duan and Ewert, Automatic music transcription: An overview, IEEE Signal Processing Magazine 36(1) (2018), 20–30.

21. Carnovalini and Rodà, A multilayered approach to automatic music generation and expressive performance, In 2019 International Workshop on Multilayer Music Representation and Processing (MMRP) (2019), pp. 41–48, IEEE.

22. Murthy Y.S. and Koolagudi S.G., Content-based music information retrieval (cb-mir) and its applications toward the music industry: A review, ACM Computing Surveys (CSUR) 51(3) (2018), pp. 1–46.

23. Goto and Dannenberg R.B., Music interfaces based on automatic music signal analysis: New ways to create and listen to music, IEEE Signal Processing Magazine 36(1) (2018), 74–81.

24. Tareq Hasan Md. and Shakara A.H., A Signal processing approach to Music tutor, IOSR Journal of Computer Engineering (IOSR-JCE) 19(6) (2017), 13–25.

25. Jiang, Grosche, Konz and Müller, Analyzing chroma feature types for automated chord recognition, In Audio Engineering Society Conference: 42nd International Conference: Semantic Audio. Audio Engineering Society.

26. Westerlund, Karlsen and Partti, Visions for Intercultural Music Teacher Education, Springer Nature, (2020), pp. 219.

27. Carson and Westvall, Intercultural approaches and diversified normality in music teacher education: Reflections from two angles, Action, Criticism, and Theory for Music Education 15(3) (2016), pp. 37–52.

28. Gariboldi, Etudes complète des gammes: opus 127: pour flûte, G. Billaudot. 1998.

29. Taffanel and Gaubert, Méthode complète de flûte, Paris: Éditions Musicales Alphonse Leduc. 1923.