Innovative application of artificial intelligence in piano music education and evaluation

Abstract

The primary aim of this research is to develop and implement an AI-based piano evaluation mechanism with an error correction feature. Specifically, the study seeks to overcome the limitations of traditional piano teaching evaluations, which frequently suffer from poor convergence and a tendency to fall into local extremes. We propose a novel AI method, the Selfish Herds Search-integrated Improved Probabilistic Neural Network (SHS-IPNN), for piano music evaluation. The method leverages the Improved Probabilistic Neural Network (IPNN) for its robustness in handling data and for improving convergence in learning. The short-term energy difference (STED) technique is used to precisely determine the timing of every note in the audio of a piano performance, and the Discrete Wavelet Transform (DWT) is applied to assess pitch accuracy. The entire system is integrated within a Musical Instrument Digital Interface (MIDI) framework to facilitate detailed evaluation of piano performances. The evaluation is structured as a classification problem in which piano performances are categorized as “Good,” “Fair,” or “Poor.” Our findings emphasize the efficacy of the SHS-IPNN technique, as demonstrated by its recall (90.5%), accuracy (96%), F1-score (93.5%), and precision (95.5%). The experimental outcomes indicate that the SHS-IPNN model outperforms existing methods in accurately detecting performance errors and evaluating piano performances, as reflected in its more accurate expressive, rhythmic, and overall judgments. The application of the SHS-IPNN method in piano music education represents a significant advancement in the field. This approach not only improves accuracy in performance evaluation but also supports the learning process by providing error correction, which is crucial for developing proficient piano skills.

Keywords

music teaching; piano performance evaluation model; artificial intelligence (AI); error checking; selfish herds search-integrated improved probabilistic neural network (SHS-IPNN)

Introduction

The piano is becoming more and more popular across the world as a vital instrument for performing music.¹ A thorough music education must include piano music instruction and assessment as it promotes both technical proficiency and creative expression. This introduction explores the many facets of teaching piano, the many pedagogical philosophies, and the techniques used to assess student development and competency.

Structure in piano education

Piano learning commonly starts with foundational skills, which include reading music, keyboard layout, and basic hand positioning.² As students progress, they explore more complex elements of piano playing, such as scales, arpeggios, chord progressions, and finally advanced interpretive skills. Instruction may be formal, taking place in academic institutions or private lessons, or informal, which includes self-taught musicians learning via online platforms.³

A vital part of piano education is repertoire development. Students study pieces that not only appeal to their musical tastes but also challenge their technical and interpretive capabilities. Teachers frequently encourage students to engage with a wide variety of musical genres, from classical and jazz to modern popular music, improving their versatility and appreciation for different musical traditions.⁴

Contemporary piano learning often integrates technology, using digital tools to enhance traditional instruction. Software for notation and rhythm training, and even digital pianos with illuminated keys, are common in modern classrooms. Online platforms and video tutorials have also broadened access to piano learning, permitting students from various backgrounds to learn remotely and at their own pace.⁵

Cognitive and Emotional Benefits of Piano Education

Learning piano not only develops musical capabilities but also enhances cognitive capabilities, including memory, attention, and spatial-temporal reasoning. The intricate task of reading musical notation and translating it into motor movements requires complex neural processing. Moreover, playing the piano involves emotional interpretation and expression, fostering emotional intelligence and empathy.⁶

Piano training also offers significant emotional benefits, providing a creative outlet that can reduce stress and enhance mental well-being.⁷ For many, piano playing is a lifelong journey that enriches personal and social life, connecting people across cultures and generations.

Evaluation in Piano Music Education

A key component of teaching piano music is evaluation, which ensures that students gain technical proficiency as well as musicality and self-expression. Evaluation processes in piano teaching can be broadly categorized as formative or summative. Formative assessment gives students continuous feedback during the learning process, which helps identify their areas of strength and growth. Summative assessment, by contrast, frequently takes the form of events, tests, or performances and serves as an indicator of students’ performance and suitability for progressing to more advanced education.⁸

Limitations

The accessibility and cost of piano music education are frequent issues, since high-quality instruments and instruction are expensive for many learners. Conventional methods, which largely focus on classical music, may not capture students’ interest in a wide range of musical styles. Additionally, traditional training may neglect improvisation and originality in favor of an unduly technical emphasis.⁹ Evaluations with subjective biases might not be fair when judging performances. Too much reliance on technology may also compromise the learning of important physical and aural skills. To address these limitations, we present a novel use of the selfish herds search-integrated improved Probabilistic Neural Network (SHS-IPNN) technique for piano music education and evaluation.

Contribution and motivation of the work

The motivation and contribution of the study are discussed in this part. The introduction of the SHS-IPNN model for piano assessment addresses the shortcomings of conventional teaching techniques, which often lack accuracy and are ineffective in guiding students. Through the combination of advanced AI algorithms and reliable data processing, this model improves assessment precision for piano performances. It uses techniques to assess pitch accuracy and temporal dynamics, together with the IPNN for improved learning convergence. The intention is to offer assessments that are more thorough and responsive, enabling quick fixes and personalized feedback. Increasing the precision of performance evaluations and the overall quality of the learning process greatly improves piano education. The contributions of this study can be summarized as follows.

• Data Collection: Ten piano students and five piano teachers make up a total of fifteen participants. Recordings from the 15 participants, covering the 23 songs given to the students, yielded 6325 distinct samples, of which 5060 were used for training and 1265 for testing. These data were used to assess the model’s efficacy in each group and to evaluate the outcome measures accurately.

• Data Splitting: The initial stage of creating a model involves splitting the dataset into training and testing sets; the training set is used to fit the model and the testing set to assess the final model fit. 80% of the data is used for training and 20% for testing.

• Temporal Note Value Analysis in Piano Performance using STED: The primary objective of the STED approach is to accurately distinguish the beginning and end of the audio signal from background noise.

• Assessing Pitch Accuracy using DWT: The radical (fundamental) frequency is the natural tone in any musical style that has the highest intensity and lowest pitch. The pitch of the whole tone is directly determined by the pitch of this fundamental tone.

• Model building with selfish herds search-integrated improved Probabilistic Neural Network (SHS-IPNN): The network, SHS-IPNN, is utilized for piano music evaluation, enhancing data handling robustness and learning convergence through the use of IPNN.

This work is structured as follows: Part 2 reviews related work. The methods and materials are described in Part 3. Part 4 presents the model evaluation, while Part 5 concludes the study.

Related work

The possibility that mobile AR apps may help with piano instruction was investigated in study.¹⁰ With mobile applications being employed in educational activities, the study described the aim of the connected series “Piano for Beginners.” The applications considered were Flowkey, Simply Piano, Skoove: Learn to Play Piano, and the AR apps AR Pianist: VR Piano Concerts and Music All Around. The study’s findings could be useful to working piano instructors who are looking for fresh approaches to update their methods.

Examining novel approaches to teaching music and aesthetics using intelligent technology was the goal of study.¹¹ A total of 343 students (112 from elementary school, 123 from middle school, and 98 from high school) participated in the study, which was conducted in the areas of piano, violin, and percussion in several Beijing music institutions. The students’ competency was assessed in multiple phases. First, each student’s level of proficiency was compared to its pre-experiment level using an average eight-point system. The findings indicated that the percussion class had improved the most, while the violin class had improved the least.

An RNN-based, Spark-based MIDI piano evaluation system was presented in Study¹² using the Deeplearning4J DL framework. Parallelization in feature extraction, model training, and preprocessing music data was realized with the Spark distributed computing engine. The RNN parameters are also evaluated. The outcomes demonstrated that the three-layer RNN structure’s error value was lower than that of its nearest competitors’ methods. The findings demonstrated that, with ensured efficiency, the assessment outcomes of the piano evaluation and performance framework were essentially commensurate with the real skill levels of the performers.

Techniques for automatic piano evaluation were established by study.¹³ Two distinct piano articulation techniques were taken into consideration: legato, which features staccato, and sustain pedals, and vibrating notes, which feature unconnected notes with the addition of sustain pedals. Piano sounds were examined for each kind and categorized as “Good,” “Normal,” and “Bad.” For this task, the study looked into four different methods: LSTM, CNN, NB, and SVM. 4680 test items, including kids’ tunes and disconnected scale sounds performed by 13 singers, were used in the research. With a classification accuracy of over 80%, the findings demonstrate that the CNN technique was better than the other techniques.

For professional self-improvement while studying or to engage in a collaborative learning environment in higher music education, study¹⁴ utilized a smartphone. The study’s objective was to assess the effectiveness of an application intended to enhance sheet music reading skills. A pertinent innovation in many educational institutions was the use of a mobile app to improve the perspective of music education. The test group followed the application program and utilized the software both inside and outside of the classroom. Using the Student’s t-test, the test outcomes were compared.

The quality of training in the traditional preschool piano curriculum was raised by a study.¹⁵ DL technology was employed in piano training to boost children’s interest in studying music. First, a focused music education plan was created after the issues with children’s traditional piano instruction were examined using the teaching methodologies covered by educational psychology. Second, technology for recognizing musical instruments was introduced, and DL was used to implement the recognition model. Third, to help children learn music and increase their enthusiasm for studying the piano, the suggested approach was implemented in children’s piano instruction. The suggested approach showed improved feature identification and acquisition.

A specific technique for evaluating the potential correlation between ML and piano teaching was provided.¹⁶ The enhanced T-test approach was integrated with the ML association rule mining technology. A novel measure and degree of effect on association rules were presented, and the enhanced T-test was used to measure association rules. The result indicated that the degree of interaction can serve as a measure of association rules for predicting the application of multimedia-assistant piano teaching evaluation data.

The usage of electronic piano teaching, its drawbacks, the notion of digital instruction as a single route of knowledge, and the lack of connectivity were all explored.¹⁷ The piano performance can be evaluated using the NN model, and it serves as a teacher substitute for assisting students with their exercises. The piano piece “Ode to Joy,” which was different from the set of NN training samples, was used to assess the effectiveness of the suggested system. Student A and student B, along with another piano teacher, performed the piece 10 times.

Investigating the role of interactive piano instruction in distance learning was the goal of the study.¹⁸ The study offered innovative methods for interactive piano teaching. The training program’s foundations include interactive groups, the Flowkey application, technical and psychological elements, role-shifting, improvisation, and the growth of self-control. Out of 120 pupils, 83% exhibited a good level of knowledge, according to the program findings. The remaining 2% indicated a poor level, which might be attributed to absence.

The broad examination of multimedia and complex networks in connection to piano performance and education was the main topic of study.¹⁹ Complexity systems and multimedia technologies were introduced into piano instructing instances in the article to categorize, analyze, research, and evaluate the cases, eliminate the extraneous, maintain the importance, and eliminate certain typical instructional scenarios that follow modern learning theories, teaching standards, and piano teaching features. The experimental findings demonstrated the beneficial supplementary effects of multimedia and network technologies on piano instruction and performance. Table 1 shows the existing works.

Table 1.

Depicts the existing works on piano music education and evaluation.

Reference	Technology used	Findings	Benefits
¹⁰	Mobile AR apps (flowkey, simply Piano, skoove, AR Pianist)	Useful for piano instructors seeking fresh teaching methods	Update teaching methods
¹¹	Intelligent technology	Used an average eight-point system	Enhanced music teaching in piano, violin, and percussion
¹²	RNN, spark-based MIDI piano evaluation system, Deeplearning4J	Lower error value in RNN compared to competitors; assessed piano performance	Ensured efficient assessment commensurate with real skill levels
¹³	LSTM, CNN, NB, SVM	CNN technique outperformed others with over 80% accuracy in classifying piano articulations	High classification accuracy
¹⁴	Smartphone app for sheet music reading	Improved sheet music reading skills	Improved sheet music reading inside and outside the classroom
¹⁵	DL technology in piano training	Boosted children’s interest in music. Enhanced feature identification and acquisition	Improved quality of preschool piano curriculum
¹⁶	ML, enhanced T-test, association rule mining	The potential usefulness of multimedia-assistant piano instruction identified	Novel measurement and impact assessment methods
¹⁷	NN model	NN model used as a substitute teacher to evaluate “Ode to Joy” performance	Assist students with exercises without the direct involvement of a teacher
¹⁸	Interactive piano instruction via distance learning	83% of students showed a good level of knowledge	Innovative methods for interactive and distance learning in piano teaching
¹⁹	Multimedia and complex network technologies	Beneficial effects on piano instruction and performance; focus on essential teaching scenarios	Elimination of extraneous content and focus on essential modern learning theories and practices

Proposed methodology

Dataset

The study involved 10 piano students and five piano teachers, for a total of fifteen participants. Recordings from the 15 participants, covering the 23 songs given to the students, yielded 6325 distinct samples, of which 5060 were used for training and 1265 for testing. These data were used to assess the model’s efficacy in each group and to evaluate the outcome measures accurately. There were 275 samples from every experiment participant in each of the 23 songs that were provided to the students. The data were split 80% for training and 20% for testing.
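The sample counts quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the 6325 samples arise as 275 samples for each of the 23 songs, which is one reading of the text, and that the 80/20 split is applied to that total; the constant names are illustrative only.

```python
# Arithmetic check of the dataset sizes quoted in the text.
# Assumption: 275 samples per song across 23 songs gives the total.
SONGS = 23
SAMPLES_PER_SONG = 275
TRAIN_FRACTION = 0.8

total = SONGS * SAMPLES_PER_SONG          # all samples
train = round(total * TRAIN_FRACTION)     # 80% used for training
test = total - train                      # remaining 20% used for testing

print(total, train, test)
```

Under these assumptions the three numbers agree with the totals stated in the text (6325, 5060, and 1265).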

Based on the training data, six individuals are categorized as follows: data from the 1st and 2nd teachers indicate “Good” achievement, data from the 4th and 5th students indicate “Fair” achievement, and data from the 2nd and 3rd students indicate “Poor” achievement. As a result, there are 5060 samples in the training set, and the remaining nine participants provide data for assessment. For the test set, we selected data from the 6th and 8th students to indicate “Fair” achievement, data from the 3rd, 4th, and 5th teachers to indicate “Good” achievement, and data from the 1st, 7th, 9th, and 10th students to indicate “Poor” achievement. Consequently, a total of 1265 test samples were available.

Evaluation of the short-term energy difference (STED) for an enhanced endpoint detection algorithm

Endpoint detection searches for the start and end points of the musical note signals within a segment of the audio file in order to determine the duration of every note in the audio. Audio recordings usually contain noise. The primary objective of the STED approach is to accurately distinguish the beginning and end of the audio signal from background noise. The most commonly used endpoint detection method is the double-threshold approach, which relies on time-domain features. It uses three parameters and a secondary assessment to determine endpoints. However, the double-threshold method has several disadvantages, including low noise resistance, complex threshold setting, and over-reliance on the chosen thresholds. To address these problems, this study proposes an enhanced endpoint detection technique based on the STED. The algorithm primarily uses the STED to identify abrupt energy changes and so establish each note’s start point. Next, based on the start point position, two levels of assessment are used to identify the end point corresponding to each start point. The sound waveform must be framed and windowed, and the short-term energy of each frame calculated, so that each note struck during the performance appears as an energy difference, as shown in equation (1):

F_{j} = \sum_{m = 0}^{K - 1} {| w (m) |}^{2}

(1)

In equation (1), $K$ is the window length, whose value depends on the sampling rate, and $w (m)$ is the magnitude of the $m$th point in the $j$th frame of the signal.

The STED $∆ F_{j}$ between two consecutive frames should then be calculated, as illustrated in:

∆ F_{j} = F_{j} - F_{j - 1}

(2)
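Equations (1) and (2) can be illustrated with a minimal sketch. It assumes a rectangular window and non-overlapping frames, neither of which is specified in the text, and the function names and the toy signal are hypothetical.

```python
def frame_signal(signal, frame_len, hop):
    """Split a list of samples into frames (rectangular window assumed)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Equation (1): sum of squared magnitudes within one frame."""
    return sum(x * x for x in frame)

def sted(signal, frame_len, hop):
    """Equation (2): energy difference between consecutive frames."""
    energies = [short_term_energy(f)
                for f in frame_signal(signal, frame_len, hop)]
    return [energies[j] - energies[j - 1] for j in range(1, len(energies))]

# Toy signal: silence followed by a constant-amplitude "note".
toy = [0.0] * 8 + [1.0] * 8
diffs = sted(toy, frame_len=4, hop=4)

# The frame with the largest positive energy jump is a candidate note onset.
onset_frame = max(range(len(diffs)), key=lambda j: diffs[j]) + 1
```

On real audio the frame length and hop would be chosen from the sampling rate, and a threshold on the difference would replace the simple maximum used here.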

By calculating the energy difference between two frames, rather than between two sample points, this approach removes minute energy variations from the audio stream. Furthermore, the difference function allows a more precise determination of each note’s start point and precisely depicts the sudden decrease in energy. To determine the end point that matches the start point of each note, two basic thresholds are set: one for the short-term energy (STE) and one for the short-term zero-crossing rate (STZCR). When both parameters of the signal fall below their thresholds, the point is identified as the approximate end point matching the current note’s start point. The end point that matches a start point is taken to lie $m$ frames before the next note begins, with $m = 10$. The second-level analysis concentrates on the distance between each start and end point pair. The distance between each pair of start and end points is calculated. A pair is considered to be noise if the separation between its start and end points is shorter than the shortest note duration, and it is removed from the set. Figure 1 depicts the design of the enhanced endpoint identification method using the STED.

Figure 1.

Architecture for STED.
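The second-level check described above, which discards start/end pairs shorter than the shortest note, might be sketched as follows. The frame indices and the minimum length are made-up values for illustration, and the function name is hypothetical.

```python
def drop_short_pairs(pairs, min_note_frames):
    """Keep (start, end) frame pairs at least as long as the shortest note.

    Pairs shorter than `min_note_frames` are treated as noise and removed.
    """
    return [(start, end) for (start, end) in pairs
            if end - start >= min_note_frames]

# Hypothetical candidates produced by the first-level detection.
candidates = [(2, 30), (31, 33), (40, 75)]
notes = drop_short_pairs(candidates, min_note_frames=5)
```

Here the middle pair spans only two frames, so it is treated as noise and dropped.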

Evaluation of the modified standard harmonic method for the RFE technique

In every musical style, the natural tone with the lowest pitch and highest intensity is referred to as the radical (fundamental) frequency. The pitch of the whole tone is determined directly by the radical frequency, which is the pitch of the fundamental tone; obtaining it is the task of radical frequency extraction (RFE). RFE techniques fall into three primary types: statistics-based, frequency-domain-based, and time-domain-based. In this section, frequency-domain-based RFE methods are employed. The HP technique and the assurance factor are the two types of frequency-domain extraction algorithms. The HP technique is a common approach used with the Discrete Wavelet Transform (DWT).

The DWT provides good time-frequency analysis by using long time windows for low-frequency signals and short time windows for higher frequencies. The DWT decomposition of a signal applies successive high-pass and low-pass filtering to the time series, each followed by downsampling by two. The high-pass filter $h (m)$ is the discrete mother wavelet, and the low-pass filter $g (m)$ is its mirror version. The approximation and detail coefficients, denoted by B₁ and C₁, are the outputs of the first low-pass and high-pass filters, respectively. B₁ is then further decomposed, and the process is repeated until the desired number of decomposition levels is reached.

The dilation function $φ_{i, l} (m)$ depends on the low-pass filter, and the wavelet function $ψ_{i, l} (m)$ depends on the high-pass filter.

φ_{i, l} (m) = 2^{\frac{i}{2}} g (2^{i} m - l)

(3)

ψ_{i, l} (m) = 2^{\frac{i}{2}} h (2^{i} m - l)

(4)

where $m = 0, 1, 2, \dots, N - 1$; $i = 0, 1, 2, \dots, I - 1$; $l = 0, 1, 2, \dots, 2^{i} - 1$; $I = \log_{2} (N)$; and $N$ is the length of the signal.

The primary frequency components of the given signal determine the maximum level of decomposition that can be defined. The DWT coefficients are obtained as the dot product of the original time series and the required basis functions. Equations (5) and (6) give the approximation coefficients $B_{j}$ and the detail coefficients $C_{j}$ at the $i$th level.

B_{j} = \frac{1}{\sqrt{N}} \sum_{m} w (m) \times φ_{i, l} (m)

(5)

C_{j} = \frac{1}{\sqrt{N}} \sum_{m} w (m) \times ψ_{i, l} (m)

(6)

Where

l = 0, 1, 2, \dots, 2^{i} - 1 .

The following formulas give the wavelet energy of the sub-bands at each decomposition level $j = 1, \dots, K$, from which the total and relative energies are obtained:

F_{C_{j}} = \sum_{i = 1}^{M} {| C_{j i} |}^{2}, j = 1, 2, 3, \dots, K

(7)

F_{B_{j}} = \sum_{i = 1}^{M} {| B_{j i} |}^{2}, j = K

(8)

The highest level of decomposition is denoted by $K$. Therefore, using equations (7) and (8), the total energy can be expressed as follows:

F_{T o t a l} = (\sum_{j = 1}^{K} F_{C_{j}} + F_{B_{K}})

(9)

The normalized energy values represent the relative wavelet energy:

F_{q} = \frac{F_{i}}{F_{T o t a l}}

(10)
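The sub-band energies of equations (7) to (10) can be illustrated with a small self-contained sketch. The text does not name the mother wavelet, so the orthonormal Haar wavelet is assumed here purely for simplicity; the function names and the sample signal are hypothetical.

```python
import math

def haar_step(x):
    """One DWT level with orthonormal Haar filters; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) / s for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / s for k in range(len(x) // 2)]
    return approx, detail

def wavelet_energies(x, levels):
    """Detail energies for levels 1..K, level-K approximation energy,
    and their sum, following equations (7) to (9)."""
    detail_energy = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)   # decompose the approximation again
        detail_energy.append(sum(c * c for c in detail))
    approx_energy = sum(b * b for b in approx)
    total = sum(detail_energy) + approx_energy
    return detail_energy, approx_energy, total

signal = [1.0, 3.0, 2.0, 6.0, 5.0, 1.0, 0.0, 2.0]
d_en, a_en, total = wavelet_energies(signal, levels=3)

# Equation (10): each sub-band energy divided by the total energy.
relative = [e / total for e in d_en + [a_en]]
```

Because the Haar filters are orthonormal, the total energy equals the energy of the input signal, so the relative energies sum to one.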

Selfish herds search-integrated improved Probabilistic Neural Network (SHS-IPNN)

There are several benefits to using the SHS-IPNN for piano music evaluation and education. By focusing on the features most important for instructional suggestions, the SHS algorithm optimizes the selection of characteristics from piano performances. The IPNN component then assesses these data and offers accurate evaluations of aspects such as dynamics and timing. The hybrid model offers high assessment accuracy and customized education recommendations by combining the sophisticated pattern recognition of the IPNN with the robust optimization of SHS. This integration allows for a customized learning experience that improves student development in piano studies as well as the effectiveness of teachers.

Selfish herd optimization algorithm

The SHO optimizes teaching techniques based on each student’s overall performance, thereby enhancing individualized learning in piano training. It evaluates the data to identify strengths and weaknesses and offers specific recommendations, which complements the evaluation method. This method efficiently customizes piano courses to advance learning goals and student involvement.

Initialization

The $M$ individuals making up the SHO population are denoted $T$. Each individual is defined as $T_{j} = (t_{j, 1}, t_{j, 2}, \dots, t_{j, m})$, with components initialized as

t_{j, i} = w_{i}^{low} + rand (0, 1) \cdot (w_{i}^{high} - w_{i}^{low})

(11)

where $w_{i}^{low}$ and $w_{i}^{high}$ denote the lower and upper bounds of the $i$th dimension.

Furthermore, the numbers of prey $M_{g}$ and predators $M_{o}$ are determined by the following equations:

M_{g} = floor (M \cdot rand (0.7, 0.9))

(12)

M_{o} = M - M_{g}

(13)

where $floor (\cdot)$ is a function that converts a real number into an integer.
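The initialization step can be sketched as follows. The sketch reads equation (11) as a uniform draw between the lower and upper bound of each dimension, which is an assumption about the intended formula; the bounds, the population size, and the function names are illustrative only.

```python
import math
import random

def init_population(M, low, high, rng):
    """Equation (11), read as low + rand(0, 1) * (high - low) per dimension."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(low, high)]
            for _ in range(M)]

def split_counts(M, rng):
    """Equations (12) and (13): number of prey and number of predators."""
    M_g = math.floor(M * rng.uniform(0.7, 0.9))
    return M_g, M - M_g

rng = random.Random(0)                      # fixed seed for repeatability
low, high = [-5.0, 0.0], [5.0, 1.0]         # hypothetical search bounds
population = init_population(20, low, high, rng)
M_g, M_o = split_counts(20, rng)
```

With 20 individuals, between 70% and 90% of the population become prey and the remainder become predators.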

Distribution of the survival values

The capacity to survive is represented by an individual’s survival value $({S V}_{t_{j}})$ . For the survival value, the following is the mathematical equation:

{S V}_{t_{j}} = \frac{e (t_{j}) - e_{w o r s t}}{e_{b e s t} - e_{w o r s t}}

(14)

where $e (\cdot)$ denotes the objective function, and $e_{best}$ and $e_{worst}$ are its best and worst values.
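Equation (14) normalizes each individual's objective value so that the best individual receives a survival value of 1 and the worst receives 0. A minimal sketch, assuming a minimization problem and taking the best and worst values from the current population (the text does not state either choice explicitly):

```python
def survival_values(objective_values, minimize=True):
    """Equation (14): (e - e_worst) / (e_best - e_worst)."""
    best = min(objective_values) if minimize else max(objective_values)
    worst = max(objective_values) if minimize else min(objective_values)
    if best == worst:
        # Degenerate case not covered by the text: all individuals are equal.
        return [1.0 for _ in objective_values]
    return [(e - worst) / (best - worst) for e in objective_values]

# Hypothetical objective values for three individuals.
sv = survival_values([3.0, 1.0, 5.0])
```

The individual with objective value 1.0 is the best under minimization, so it receives the highest survival value.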

Movement of the prey

This section covers the movement of the prey leader and the following or escape movements of the prey followers.

Movement of the prey leader

The prey leader is defined as the prey with the highest survival value:

{S V}_{g_{K}} = \max_{j \in \{1, 2, \dots, M_{g}\}} ({S V}_{g_{j}})

(15)

Its position is updated as follows:

g_{K} = \begin{cases} g_{K} + 2 \cdot α \cdot φ_{g_{K}, o_{N}} \cdot (o_{N} - g_{K}) & \text{if } {S V}_{g_{K}} = 1 \\ g_{K} + 2 \cdot α \cdot φ_{g_{K}, w_{best}} \cdot (w_{best} - g_{K}) & \text{if } {S V}_{g_{K}} < 1 \end{cases}

(16)

Here $o_{N}$ denotes the location that is relatively hazardous for the prey, $w_{best}$ denotes the globally best position, and $φ_{w, z}$ reflects the attraction between individuals $w$ and $z$. The quantities $φ_{w, z}$ and $o_{N}$ are defined as follows:

φ_{w, z} = {S V}_{z} \cdot e^{- \| w - z \|^{2}}

(17)

o_{N} = \frac{\sum_{j = 1}^{M_{o}} {S V}_{o_{j}} \cdot o_{j}}{\sum_{j = 1}^{M_{o}} {S V}_{o_{j}}}

(18)

where $\| w - z \|^{2}$ is the squared Euclidean distance between individuals $w$ and $z$, ${S V}_{o_{j}}$ denotes the survival value of predator $j$, and $o_{j}$ denotes the location of predator $j$.
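The attraction term of equation (17) and the survival-weighted predator center of equation (18) can be sketched directly. Positions are plain lists of floats, and all names and sample values are illustrative assumptions.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def attraction(sv_z, w, z):
    """Equation (17): survival value of z damped by the squared distance."""
    return sv_z * math.exp(-sq_dist(w, z))

def weighted_center(positions, sv):
    """Equation (18): survival-weighted mean of a group of positions."""
    total = sum(sv)
    dim = len(positions[0])
    return [sum(s * p[d] for s, p in zip(sv, positions)) / total
            for d in range(dim)]

# Two hypothetical predators with survival values 1 and 3.
predators = [[0.0, 0.0], [2.0, 0.0]]
center = weighted_center(predators, [1.0, 3.0])
pull = attraction(1.0, [0.0, 0.0], [1.0, 0.0])
```

The center is drawn toward the predator with the larger survival value, and the attraction decays exponentially with squared distance.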

Escape movement of the prey followers

Prey followers fall into two categories: following prey ($G_{E}$) and escape prey ($G_{C}$). Following prey ($G_{E}$) are further separated into dominant prey ($G_{c}$) and subordinate prey ($G_{t}$).

The definitions of $G_{E}$, $G_{C}$, $G_{c}$, and $G_{t}$ are as follows:

G_{E} = \{g_{j} \neq g_{K} \mid {S V}_{g_{j}} \geq rand (0, 1)\}

(19)

G_{C} = \{g_{j} \neq g_{K} \mid {S V}_{g_{j}} < rand (0, 1)\}

(20)

G_{c} = \{g_{j} \in G_{E} \mid {S V}_{g_{j}} \geq {S V}_{g_{v}}\}

(21)

G_{t} = \{g_{j} \in G_{E} \mid {S V}_{g_{j}} < {S V}_{g_{v}}\}

(22)

where ${S V}_{g_{v}}$ stands for the mean survival value of the prey, defined as follows:

{S V}_{g_{v}} = \frac{\sum_{j = 1}^{M_{g}} {S V}_{g_{j}}}{M_{g}}

(23)
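The grouping of prey followers can be illustrated with a short sketch. It assumes that escape prey are the non-leader prey whose survival value falls below a random draw, and that subordinate prey are the following prey whose survival value falls below the mean; these complementary conditions are an interpretation, since the two categories in each case are meant to be disjoint. Names and sample values are hypothetical.

```python
import random

def classify_prey(sv, leader_idx, rng):
    """Split non-leader prey into escape, dominant and subordinate groups."""
    following, escaping = [], []
    for j, s in enumerate(sv):
        if j == leader_idx:
            continue                      # the leader is handled separately
        if s >= rng.random():
            following.append(j)           # stays with the herd
        else:
            escaping.append(j)            # leaves the herd
    mean_sv = sum(sv) / len(sv)           # equation (23)
    dominant = [j for j in following if sv[j] >= mean_sv]
    subordinate = [j for j in following if sv[j] < mean_sv]
    return escaping, dominant, subordinate

sv = [1.0, 0.9, 0.2, 0.6]                 # index 0 is the leader
escaping, dominant, subordinate = classify_prey(sv, 0, random.Random(1))
```

Whatever the random draws, every non-leader prey ends up in exactly one of the three groups.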

The following is the location update formula for the following prey:

g_{j} = \begin{cases} g_{j} + 2 \cdot (β \cdot φ_{g_{j}, g_{K}} \cdot (g_{K} - g_{j}) + γ \cdot φ_{g_{j}, g_{d_{j}}} \cdot (g_{d_{j}} - g_{j})) & \text{if } g_{j} \in G_{c} \\ g_{j} + 2 \cdot δ \cdot φ_{g_{j}, g_{d_{j}}} \cdot (g_{K} - g_{j}) & \text{if } g_{j} \in G_{t} \end{cases}

(24)

where $β$, $γ$, and $δ$ are random numbers in the interval $[0, 1]$, $g_{d_{j}}$ represents a locally optimal individual, and $g_{N}$ stands for the prey’s comparatively secure location. These are defined as follows:

g_{d_{j}} = \{ g_{i} \in G, g_{i} \neq \{g_{j}, g_{K}\} \mid {S V}_{g_{i}}, q_{j, i} = \min_{i \in \{1, 2, \dots, M_{g}\}} \| g_{j} - g_{i} \| \}

(25)

g_{N} = \frac{\sum_{j = 1}^{M_{g}} {S V}_{g_{j}} \cdot g_{j}}{\sum_{j = 1}^{M_{g}} {S V}_{g_{j}}}

(26)

The following is the escape prey position update formula:

g_{j} = g_{j} + 2 . (β . φ_{g_{j} w_{b e s t}} . (w_{b e s t} - g_{j}) + γ . (1 - {S V}_{g_{j}}) . ε)

(27)

where $ε$ denotes a random direction in the solution space.

Predator hunting process

Predators use the following equation to update their position:

o_{j} = o_{j} + 2 . ρ . (g_{q} - o_{j})

(28)

where $g_{q}$ is a prey individual chosen at random from the prey population according to the predation probability, and $ρ$ is a random value within the interval $[0, 1]$. The predation probability $θ_{o_{j}, g_{i}}$ is defined as follows:

θ_{o_{j}, g_{i}} = \frac{ω_{o_{j}, g_{i}}}{\sum_{n = 1}^{M_{g}} ω_{o_{j}, g_{n}}}

(29)

where $ω_{o_{j}, g_{i}}$ indicates the connection between predator $j$ and prey $i$, defined as follows:

ω_{o_{j}, g_{i}} = (1 - {S V}_{g_{i}}) \cdot e^{- \| o_{j} - g_{i} \|^{2}}

(30)
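The predator update of equations (28) to (30) can be sketched as a weighted random choice followed by a step toward the chosen prey. The helper names and sample positions are assumptions; the sketch also assumes that at least one prey has a survival value below 1, since otherwise all weights are zero.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predation_probabilities(predator, prey, prey_sv):
    """Equations (29) and (30): weaker and closer prey are likelier targets."""
    weights = [(1.0 - s) * math.exp(-sq_dist(predator, g))
               for g, s in zip(prey, prey_sv)]
    total = sum(weights)
    return [w / total for w in weights]

def move_predator(predator, prey, prey_sv, rng):
    """Equation (28): step toward a prey drawn with the probabilities above."""
    probs = predation_probabilities(predator, prey, prey_sv)
    target = rng.choices(prey, weights=probs, k=1)[0]
    rho = rng.random()
    return [o + 2.0 * rho * (g - o) for o, g in zip(predator, target)]

prey = [[1.0, 0.0], [0.0, 1.0]]            # both at distance 1 from the origin
probs = predation_probabilities([0.0, 0.0], prey, [0.2, 0.8])
new_pos = move_predator([0.0, 0.0], prey, [0.2, 0.8], random.Random(0))
```

Both prey are equally distant in this example, so the probabilities depend only on survival value: the prey with the lower survival value is four times as likely to be pursued.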

Restoration stage and predation stage

Restoration stage

Through mating behavior, every prey killed by predators is replaced by a new individual. The mating probability of prey $g_{i}$ is defined as follows:

\partial_{g_{i}} = \frac{{S V}_{g_{i}}}{\sum_{(g_{n} \in N)} {S V}_{g_{n}}}, g_{i} \in N

(31)

Here $N$ stands for the set of prey that have not been hunted. The new individuals that result from mating are:

g_{new} = mix ([g_{q_{1}, 1}, g_{q_{2}, 2}, \dots, g_{q_{m}, m}])

(32)

where $mix (\cdot)$ selects the dimensional components taken from different individuals.

Predation stage

Predators use dangerous regions as hunting grounds. The dangerous domain is a circle with the following radius:

Q = \frac{\sum_{i = 1}^{m} | w_{i}^{low} - w_{i}^{high} |}{2 \cdot m}

(33)

The set of prey inside the dangerous domain of predator $o_{j}$ is described as

S_{o_{j}} = \{g_{i} \in G \mid {S V}_{o_{j}}, \| o_{j} - g_{i} \| \leq Q\}

(34)

The probability that a prey inside the dangerous domain is preyed upon is as follows:

μ_{o_{j}, g_{i}} = \frac{ω_{o_{j}, g_{i}}}{\sum_{(g_{n} \in S_{o_{j}})} ω_{o_{j}, g_{n}}}, g_{i} \in S_{o_{j}}

(35)
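The radius of equation (33) and the distance test of equation (34) can be sketched as follows. Only the distance condition is implemented, because the survival-value condition in equation (34) is not fully recoverable from the text; bounds, positions, and function names are hypothetical.

```python
import math

def danger_radius(low, high):
    """Equation (33): mean half-width of the search range over all dimensions."""
    return sum(abs(lo - hi) for lo, hi in zip(low, high)) / (2 * len(low))

def prey_in_danger(predator, prey, radius):
    """Indices of prey within `radius` of the predator (distance part of (34))."""
    return [i for i, g in enumerate(prey) if math.dist(predator, g) <= radius]

Q = danger_radius([-5.0, 0.0], [5.0, 1.0])
in_danger = prey_in_danger([0.0, 0.0], [[1.0, 1.0], [4.0, 4.0]], Q)
```

With these bounds the radius is 2.75, so only the first prey falls inside the predator's dangerous domain.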

Improved Probabilistic Neural Network (IPNN)

The highly accurate pattern identification of the IPNN enables extensive assessment with customized information for efficient learning, greatly improving piano music teaching and evaluation. Real-time analysis and rapid interactive adjustments during courses are ensured by its quick processing speed.

In this part, the IPNN design is presented, which substitutes the $f$-mean for the dot product of the training pattern and the test pattern. The neuron in the summation layer of the PNN adds up the Gaussians produced by the pattern layer. Because the sum of Gaussians is nonlinear, the PNN design is more complicated. Because the Gaussian is a nonlinear exponential function, it can be made linear by applying a logarithm-based mean. This simplifies the design and reduces the time needed to classify an unknown pattern. The complete derivation follows. Let the pattern presented to the PNN as input be

W = (W_{1}, W_{2}, \dots, W_{o})

(36)

The output of the pattern unit that corresponds to the training pattern $(S_{j 1}^{D_{l}}, S_{j 2}^{D_{l}}, \dots, S_{j o}^{D_{l}})$ is

h (Y_{j}) = \exp \left[ \frac{\sum_{i = 1}^{o} W_{i} \cdot S_{j i}^{D_{l}} - 1}{σ^{2}} \right]

(37)

The expression $\sum_{i = 1}^{o} W_{i} \cdot S_{j i}^{D_{l}}$ is the weighted average of the $j^{th}$ training pattern, while $\log (\sum_{i = 1}^{o} W_{i} \cdot e^{S_{j i}^{D_{l}}})$ is a weighted $f$-mean of the training pattern based on the exponential function.

N_{j} = \log \left( \sum_{i = 1}^{o} W_{i} \cdot e^{S_{j i}^{D_{l}}} \right)

(38)

Setting $σ = 1$ and substituting $N_{j}$ for $Y_{j}$ gives

h (N_{j}) = \exp \left[ \log \left( \sum_{i = 1}^{o} W_{i} \cdot e^{S_{j i}^{D_{l}}} \right) - 1 \right]

(39)

h (N_{j}) = \sum_{i = 1}^{o} W_{i} \cdot e^{S_{j i}^{D_{l}} - 1}

(40)

where $l$ is the number of training patterns for class $D_{l}$. Let $S_{C_{l}}$ denote the output of the summation unit. The summation neuron in a PNN adds up the outputs of the pattern units that belong to its class.

S_{C_{l}} = \sum_{j = 1}^{l} h (N_{j})

(41)

Substituting the value of $h (N_{j})$ from equation (40) gives

S_{C_{l}} = \sum_{j = 1}^{l} \sum_{i = 1}^{o} W_{i} \cdot e^{S_{j i}^{D_{l}} - 1}

(42)

The order of the summations over $j$ and $i$ in equation (42) can be interchanged, which gives

S_{C_{l}} = \sum_{i = 1}^{o} W_{i} \sum_{j = 1}^{l} e^{S_{j i}^{D_{l}} - 1}

(43)

Introducing $σ$ as a smoothing parameter in equation (43) gives

S_{C_{l}} = \sum_{i = 1}^{o} W_{i} {\left( \sum_{j = 1}^{l} e^{S_{j i}^{D_{l}} - 1} \right)}^{\frac{1}{σ^{2}}}

(44)

The classification accuracy on a training dataset changes with the value of $σ$, and a suitable $σ$ can be found for any given dataset. Note that when $σ = 1$, equations (43) and (44) are identical. Define

\emptyset^{C_{l}} = (\emptyset_{1}^{C_{l}}, \emptyset_{2}^{C_{l}}, \dots, \emptyset_{o}^{C_{l}})

(45)

where $\emptyset_{i}^{C_{l}} = {(\sum_{j = 1}^{l} e^{S_{j i}^{D_{l}} - 1})}^{\frac{1}{σ^{2}}}$ for every $i$ from 1 to $o$.

Equation (44) can then be rewritten as

S_{C_{l}} = \sum_{i = 1}^{o} W_{i} \cdot \emptyset_{i}^{C_{l}}

(46)

The trained weights that belong to class $C_{l}$ are denoted by

\emptyset_{i}^{C_{l}}, \quad i = 1, \dots, o

(47)

The IPNN has only three layers instead of four. The first layer has one neuron for every input feature. The second layer has one neuron for every class in the training data.

The weight between the $i^{th}$ neuron in the first layer and the neuron for class $C_{l}$ in the second layer is denoted by $\emptyset_{i}^{C_{l}}$. Every neuron in the second layer computes the dot product of the weight vector $\emptyset^{C_{l}} = (\emptyset_{1}^{C_{l}}, \emptyset_{2}^{C_{l}}, \dots, \emptyset_{o}^{C_{l}})$ with the input vector $W = (W_{1}, W_{2}, \dots, W_{o})$. The output of the second layer, $S_{C_{l}}$, is given by equation (46). The third layer is the decision layer, which compares the values sent by the second-layer neurons. The input pattern is assigned to the class whose second-layer neuron produces the largest value.

In a PNN, the computation required to classify an unknown point grows with the size of the training set, and the whole training set must be stored and used during classification. These are the two main shortcomings of the PNN. In the IPNN, by contrast, the time needed to classify an unknown point depends on the number of classes in the training data, because the second layer of the network has one neuron per class. Since there are usually far fewer classes than training patterns, the IPNN can classify an unknown point at low computational cost. The IPNN therefore removes these limitations of the PNN.
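The saving described above can be checked numerically. The sketch below evaluates a class score once as the double sum over all training patterns, as in equation (42), and once with the per-class weights of equations (44) to (46) computed in advance. The training patterns, labels, and function names are hypothetical, and the sketch reproduces only the arithmetic of the derivation, not the trained SHS-IPNN.

```python
import math


def score_direct(w, patterns):
    """Equation (42): double sum over the l training patterns and o features."""
    return sum(
        w[i] * math.exp(p[i] - 1.0)
        for p in patterns
        for i in range(len(w))
    )


def class_weights(patterns, sigma=1.0):
    """Equations (44)-(45): one precomputed weight per input feature."""
    o = len(patterns[0])
    return [
        sum(math.exp(p[i] - 1.0) for p in patterns) ** (1.0 / sigma ** 2)
        for i in range(o)
    ]


def score_precomputed(w, phis):
    """Equation (46): dot product of the input with the stored weights."""
    return sum(wi * phi for wi, phi in zip(w, phis))


def classify(w, weights_by_class):
    """Decision layer: return the class with the largest second-layer output."""
    scores = {c: score_precomputed(w, phis) for c, phis in weights_by_class.items()}
    return max(scores, key=scores.get)


# Hypothetical two-feature training patterns, already scaled to [0, 1].
training = {
    "Good": [[0.9, 0.8], [0.8, 0.9]],
    "Poor": [[0.2, 0.1], [0.1, 0.3]],
}
weights = {label: class_weights(p) for label, p in training.items()}

x = [0.85, 0.9]
direct = score_direct(x, training["Good"])
fast = score_precomputed(x, weights["Good"])
predicted = classify(x, weights)
```

With $σ = 1$ the two scores agree, so classification only needs the stored weight vectors, one per class, rather than the full training set.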

Development of an IPNN-based piano performance evaluation framework

MIDI standards-based music performance

MIDI is the standard for digital music. When a player performs on a MIDI instrument, the instrument translates the player's actions into MIDI signals, which are sent to the sequencer through the MIDI interface. The sequencer arranges, modifies, and exports the notes, rhythm, and other elements needed for a musical composition to the sound source for audio generation. A MIDI file stores the recorded MIDI signal. Once the MIDI signal has been obtained, sound properties such as pitch and time value can be examined. In a MIDI system built around a desktop computer, the MIDI ports of the various MIDI instruments are connected directly to the computer's sound card. Sequencers are computer programs designed to receive MIDI signals, which are first received by the computer's operating system. The computer's sound card provides the API. The MIDI messages received by the computer consist of several bytes.
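As an illustration of the byte structure of such messages, the sketch below decodes a three-byte note-on or note-off message: the upper four bits of the status byte give the message type, the lower four bits give the channel, and the two data bytes carry the note number and the velocity. The function name and the example message are hypothetical and not part of the evaluation system described here.

```python
def parse_note_message(data):
    """Decode a 3-byte MIDI note-on or note-off message."""
    status, note, velocity = data
    kind = status & 0xF0      # upper nibble: message type
    channel = status & 0x0F   # lower nibble: channel 0-15
    if kind == 0x90 and velocity > 0:
        name = "note_on"
    elif kind == 0x80 or (kind == 0x90 and velocity == 0):
        # A note-on with velocity 0 is conventionally treated as note-off.
        name = "note_off"
    else:
        raise ValueError("not a note-on/note-off message")
    return {"type": name, "channel": channel, "note": note, "velocity": velocity}


# Example: note-on for MIDI note number 74 on the first channel, velocity 95.
msg = parse_note_message(bytes([0x90, 74, 95]))
```

The decoded note number and velocity are the quantities from which pitch and strength features can later be derived, and the message timestamps give the time values.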

Establishing the structure of IPNN

The input parameters established below correspond to a total of 153 neurons in the input layer of the NN. One output neuron represents the overall evaluation of the player's performance. Expressiveness and rhythm are also commonly assessed in performance evaluation.

These two measures are represented by two further neurons, so the output layer of the NN has three neurons. Past experience suggests that the number of hidden layer nodes can exceed the combined number of input and output layer nodes. Starting from such an estimate, the number of hidden layer nodes is increased until the performance criteria are met. After several trials, the number of hidden layer nodes in the NN structure for the minuet was set to 168. Figure 2 shows the architecture of the IPNN-based piano assessment framework.

Figure 2.

IPNN-based piano assessment model’s architectural design.

MIDI files for the correction of errors in piano playing music

First, given a recording of a piano performance, the STED method described above is used with the enhanced endpoint detection technique to extract the start and end points of every note. This step also establishes the time value of each note. The analysis segments the original audio signal to isolate every note and to determine the rhythm. Second, the pitch of every note is determined using the RFE technique of the enhanced standard harmonics method. The note durations and pitches obtained in these two steps are then compared with the corresponding standard information in the MIDI file to identify the notes that the player performed incorrectly.
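The comparison step can be sketched as below. This is a simplified illustration under stated assumptions: the notes are assumed to be already aligned one-to-one with the reference, the `Note` structure and the 0.1 s duration tolerance are hypothetical, and the extraction of onsets and pitches from audio is not shown.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset: float     # start time in seconds
    duration: float  # time value in seconds
    pitch: int       # MIDI note number


def find_wrong_notes(played, reference, time_tol=0.1):
    """Compare notes pairwise and report the index and kind of each mismatch."""
    errors = []
    for i, (p, r) in enumerate(zip(played, reference)):
        if p.pitch != r.pitch:
            errors.append((i, "pitch"))
        if abs(p.duration - r.duration) > time_tol:
            errors.append((i, "duration"))
    return errors


# Hypothetical reference notes (from a MIDI file) and extracted performance.
reference = [Note(0.0, 0.5, 74), Note(0.5, 0.25, 67), Note(0.75, 0.25, 69)]
played = [Note(0.0, 0.5, 74), Note(0.52, 0.25, 66), Note(0.8, 0.5, 69)]

errors = find_wrong_notes(played, reference)
```

Here the second note is flagged for pitch and the third for duration. A practical system would also need to handle omitted or inserted notes before comparing pairs.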

Establishing the input parameters

Establish the chord characteristics’ parameters

Notes sounded simultaneously form a chord. The musical score and the standard MIDI file give the time point at which each chord occurs. Each chord is evaluated at its own time point during the performance. The root note is judged correct or incorrect based on the pitch difference. If the root note of a chord is played incorrectly, the chord judgment error is counted as 1; if one of the other notes is incorrect, it is counted as 0.5. The number of chord judgment errors and the number of chords determine the chord pitch input variable, which corresponds to 2 input layer neurons. Chord time and chord strength are determined with the same method.

Establish the pitch feature parameters

The opening bar of this piano piece consists of five notes. Its highest notes are D, G, A, B, and C, with MIDI note numbers 74, 79, 69, 83, and 72. If all five notes are played correctly, the input parameter is $5 / 5 = 1$; if four are played correctly, it is $4 / 5 = 0.8$, and so on. Since the full score has 48 bars, the input layer needs 48 neurons for the pitch feature parameters.
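The per-bar pitch parameter is simply the fraction of expected notes that were played correctly. A minimal sketch, in which the function name is hypothetical and the notes are assumed to be aligned one-to-one with the score:

```python
def pitch_parameter(played, expected):
    """Fraction of a bar's expected notes that were played at the right pitch."""
    correct = sum(1 for p, e in zip(played, expected) if p == e)
    return correct / len(expected)


expected_bar = [74, 79, 69, 83, 72]  # MIDI note numbers quoted in the text
all_right = pitch_parameter([74, 79, 69, 83, 72], expected_bar)
one_wrong = pitch_parameter([74, 79, 69, 83, 71], expected_bar)
```

Repeating this for each of the 48 bars yields the 48 pitch inputs, each in the range 0 to 1.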

Establish the melody characteristics parameters

A melody is a deliberate arrangement of musical notes. The minuet can be divided into two sections, and the first melody can be split into four phrases. The second melody is the repeated portion. The strength, pitch, and time value characteristics of each note segment can be summed to obtain the standard melody parameters. The standard melody characteristics of the minuet correspond to 6 input layer neurons.

Establish the rhythm characteristics parameters

Rhythm describes the duration of sounds, and the score represents time only in relative terms. To obtain the deviation in duration, relative time must be converted to absolute time. Take the first measure of the minuet as an example. The first note is a quarter note and the remaining four notes are eighth notes, each lasting 0.5 beat. Suppose the standard note durations are $0.25 s$ and $0.75 s$, respectively, and the starting time is $0 s$. The performer's sense of rhythm can then be computed if the performed onset times are $0.1$ and $0.8 s$, respectively.

e (w) = \sum_{j} \left[ \mathrm{abs} (T_{j} - A_{j}) R_{j} + \mathrm{abs} (F_{j} - C_{j}) R_{j} \right]

(48)

In equation (48), $T_{j}$ is the performed value for a quarter note, $A_{j}$ is the theoretical value for that quarter note, $F_{j}$ is the performed sounding duration of an eighth note, $C_{j}$ is the theoretical sounding duration of that eighth note, and $R_{j}$ is the weight. The performer's sense of rhythm is then evaluated as shown in equations (49) and (50):

\mathrm{abs} (0.1 - 0) + \mathrm{abs} (0.8 - 0.75) = 0.15

(49)

\mathrm{abs} (0.7 - 0.75) + \mathrm{abs} (0.3 - 0.25) = 0.1

(50)

When the duration deviation is multiplied by a weight of $0.2$, it affects the evaluation only slightly, and the rhythm input parameter becomes $0.15 + 0.1 \times 0.2 = 0.17$. The rhythm input parameters likewise require 48 neurons in the input layer.
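One possible reading of this rhythm computation is sketched below, taking performed onset times of 0.1 s and 0.8 s against standard onsets of 0 s and 0.75 s, and performed durations of 0.7 s and 0.3 s against standard durations of 0.75 s and 0.25 s. Applying a weight of 1 to the onset deviations and 0.2 to the duration deviations is an assumption made to reproduce the final value of 0.17; the function name is hypothetical.

```python
def rhythm_parameter(onsets, durations, duration_weight=0.2):
    """Sum of onset deviations plus the weighted sum of duration deviations.

    Each argument is a list of (performed, standard) pairs in seconds.
    """
    onset_error = sum(abs(played - standard) for played, standard in onsets)
    duration_error = sum(abs(played - standard) for played, standard in durations)
    return onset_error + duration_weight * duration_error


onsets = [(0.1, 0.0), (0.8, 0.75)]
durations = [(0.7, 0.75), (0.3, 0.25)]
value = rhythm_parameter(onsets, durations)
```

Rounded to two decimals, `value` is 0.17. A smaller value indicates that the performer's timing is closer to the standard.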

Establish the beat feature parameters

The beat describes the strength of the sound. Strength is expressed as a relative quantity, so the score gives only a rough indication of the sound's intensity. A MIDI standard signal, constructed manually to represent the average of input samples with excellent beat control, is used as the reference. Another bar of the minuet consists of three beats: weak, strong, and weak. Assume that the performer plays with strengths of 95, 75, and 70, respectively, whereas the standard strengths of the notes are 70, 80, and 100. For this bar, the beat input parameter is $5 + 5 + 10 = 20$. The beat input parameters require 48 neurons in the input layer.

Technique of experimental analysis

Training of IPNN model

The NN training procedure minimizes the Mean Square Error (MSE), which is defined as:

MSE = \frac{1}{m n} \sum_{p = 1}^{m} \sum_{j = 1}^{n} {({\hat{z}}_{p j} - z_{p j})}^{2}

(51)

In equation (51), $n$ denotes the number of output nodes, $m$ the number of training samples, ${\hat{z}}_{p j}$ the network's predicted output value, and $z_{p j}$ the actual output value.
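The mean square error of equation (51), read as the squared difference between predicted and actual outputs averaged over all training samples and output nodes, can be computed as in the sketch below. The function name and the example values are hypothetical.

```python
def mean_square_error(predicted, target):
    """Average squared difference over all samples and output nodes."""
    n_samples = len(predicted)
    n_outputs = len(predicted[0])
    total = sum(
        (z_hat - z) ** 2
        for row_hat, row in zip(predicted, target)
        for z_hat, z in zip(row_hat, row)
    )
    return total / (n_samples * n_outputs)


# Two hypothetical samples with three outputs each
# (overall performance, rhythm, expressiveness).
predicted = [[0.9, 0.8, 0.7], [0.4, 0.5, 0.6]]
target = [[1.0, 0.8, 0.5], [0.5, 0.5, 0.5]]
mse = mean_square_error(predicted, target)
```

Training stops once this value falls below the required accuracy threshold or the maximum number of epochs is reached.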

After the input variables that influence the performance evaluation and the parameters and structure of the IPNN have been determined, the NN must be trained to reach the required accuracy. First, MIDI files and the piano teacher's performance provide the standard data for each feature. Then, features are extracted from performances by players of varying ability and used as input. Three students and two piano teachers repeatedly playing the minuet provide the training samples for the NN. Once the NN input data have been obtained, the overall performance, rhythm, and expressiveness of the group are assessed manually. The training set consists of 10 samples, with input data ranging from 0 to 1.

Results

Experiment configuration

The piano sounds were recorded with an Aputure V-Mic D2, a directional condenser shotgun microphone with adjustable sensitivity. The baseline configuration used a monophonic channel with 16-bit resolution and a sampling rate of 22,050 Hz. The network was implemented with the TensorFlow DL framework, and the dataset was split 80:20 between training and test data. An Intel i7-10870H CPU, an RTX 3080 8 GB graphics card, 32 GB of DDR4-3200 RAM, and a 1 TB M.2 PCIe SSD were used in our research.

Confusion matrix

A 3x3 confusion matrix is used to display and examine the accuracy of the predictions produced by the classification system for piano music instruction and evaluation. The matrix helps evaluate the effectiveness of the method, which classifies items into three predetermined categories: “Good,” “Fair,” and “Poor.” These categories can describe instructional strategies, piano performance quality, or any other component being assessed. The confusion matrix is shown in Figure 3.

Figure 3.

Confusion matrix obtained with SHS-IPNN.

Outcomes of the model evaluation

This section examines precision, accuracy, F1-score, and recall, and compares the effectiveness of the traditional and proposed methods.

Accuracy is the ratio of correctly predicted observations (true positives and true negatives) to all observations.

Figure 4 shows the accuracy achieved by the proposed method. The proposed model reaches an accuracy of 96%, so SHS-IPNN outperforms the traditional methods.

Figure 4.

Results of accuracy.

Precision, or positive predictive value, is the ratio of correctly predicted positive observations to all predicted positives. It measures how reliable the positive predictions are.

Figure 5 shows the precision obtained by the proposed method. The proposed model reaches a precision of 95.5%, so SHS-IPNN outperforms the traditional methods.

Figure 5.

Results of precision.

Recall is the ratio of correctly predicted positive observations to all observations in the actual class. It assesses a model's capacity to identify all relevant cases.

Figure 6 shows the recall achieved by the proposed method. The proposed model reaches a recall of 90.5%, so SHS-IPNN outperforms the traditional methods.

Figure 6.

Results of recall.

The F1-score is the harmonic mean of precision and recall. It therefore takes both false positives and false negatives into account.

Figure 7 shows the F1-score obtained by the proposed method. The proposed model reaches an F1-score of 93.5%, so SHS-IPNN outperforms the traditional methods. Table 2 lists the values of recall, accuracy, F1-score, and precision.

Figure 7.

Results of F1-score.

Table 2.

Values of recall, accuracy, F1-score, and precision.

No. of epoch	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
10	87	86	87.1	87.2
20	88.2	87	87.8	88.5
30	90	88.2	88.2	89.2
40	92	88.5	88.5	90.2
50	96	95.5	90.5	93.5
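For reference, the four metrics can be computed from a 3x3 confusion matrix as in the sketch below. The counts are invented for illustration and are not the values behind Figure 3 or Table 2; averaging the per-class precision, recall, and F1-score with equal weight (macro-averaging) is also an assumption, since the averaging scheme is not stated.

```python
def macro_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total

    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(k))  # column sum
        actual_c = sum(cm[c])                          # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical counts; rows and columns are ordered Good, Fair, Poor.
cm = [
    [8, 2, 0],
    [1, 8, 1],
    [0, 1, 9],
]
metrics = macro_metrics(cm)
```

Because the F1-score is a harmonic mean, each per-class F1 value always lies between that class's precision and recall.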

The accuracy test outcome with error correction

Figure 8 shows the error correction rate (ECR) results of the IPNN piano performance assessment framework. The SHS-IPNN considerably improves the accuracy of pitch error correction for the performers. The model can help piano learners address their mistakes and make their learning process more accurate and effective.

Figure 8.

Test findings for the evaluation model’s error correction.

Conclusion

This study proposes a distinctive use of AI in piano music teaching by developing a Selfish Herds Search-integrated Improved Probabilistic Neural Network (SHS-IPNN). By using DWT for pitch accuracy and STED for precise note timing, the SHS-IPNN framework improves evaluation precision. Furthermore, optimizing the IPNN with the SHS technique considerably enhances the model's overall evaluation ability. Within a MIDI framework, the SHS-IPNN model assesses the rhythmic, expressive, and overall aspects of piano playing. The model was trained and tested using data from minuet performances by teachers and students, which provides a sound basis for confirming its efficacy. The SHS-IPNN model represents a significant development in piano teaching, as the experimental findings show that it can detect and correct performance errors more accurately than current approaches. Our findings emphasize the efficacy of the SHS-IPNN technique, as demonstrated by its overall performance in terms of recall (90.5%), accuracy (96%), F1-score (93.5%), and precision (95.5%). The method improves educational outcomes and the efficiency of piano teaching by helping students identify and correct errors on their own and by improving evaluation accuracy. Among the limitations of the SHS-IPNN in piano music education and evaluation are its high computational demand and its need for large, high-quality datasets. Because it depends on specific technical parameters, its evaluations may not fully capture the nuances of musical expression. In the future, the model could be extended to various piano types, its computational efficiency could be improved, and its sensitivity to expressive components could be increased. Expanding the training dataset to include a range of performance levels and styles could make SHS-IPNN a more flexible tool that can be applied to other musical instruments and performance settings.

Footnotes

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Exploration and Practice of Integrating Music and Virtual Reality Technology in Higher Education from the Perspective of New Liberal Arts, grant number: 220900583051429.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The authors declare that the data supporting the findings of this study are available within the article. The raw/derived data supporting the findings of this study are available from the corresponding author upon request.

Appendix

Augmented reality = AR	Machine learning = ML
Deep learning = DL	Neural network = NN
Recurrent neural network = RNN	Application Programming Interface = API
Musical Instrument Digital Interface = MIDI	Short-term energy = STE
Long Short-Term Memory = LSTM	Short-term zero-crossing rate = STZCR
Convolutional Neural Network = CNN	Radical Frequency Extraction = RFE
Naive Bayes = NB	Harmonic peak = HP
Support Vector Machine = SVM	Error correction rate = ECR
