Abstract
The preservation of intangible cultural heritage requires computational frameworks capable of reconstructing traditional dance movements with both numerical accuracy and natural expressiveness. This study introduces a fusion methodology that integrates Principal Component Analysis (PCA) for dimensionality reduction, Hidden Markov Models (HMMs) and their variants for sequential modeling, and a Genetic Algorithm (GA) for family-aware hyperparameter optimization. Skeleton data from the Bedoyo Majapahit dance were captured using markerless motion capture, producing 3341 frames and 33 joints (99 features). PCA reduced the features to 30 principal components, retaining ≈99% of the variance. The proposed framework includes two novel elements: (1) Expected-Centroid decoding for Multinomial HMMs to eliminate stair-step artifacts, and (2) a normalized tri-metric fitness function combining Mean Squared Error (MSE), Dynamic Time Warping (DTW), and Fréchet distance. Experimental results demonstrate that the Hybrid GA–GMM-HMM with eight states and 25 mixtures achieved superior performance (MSE ≈ 0.80, DTW ≈ 1150.3, Fréchet ≈ 2.63), outperforming Gaussian and Multinomial baselines. Visualization of PC1 overlays and PC1–PC2 trajectories further confirmed the proximity of generated sequences to real data. These findings underline the potential of feature-level and model-level fusion for digital documentation and interactive applications in cultural heritage.
Introduction
Background
The preservation of intangible cultural heritage, especially traditional dance, requires documentation methods that can represent both spatial and temporal dynamics with high fidelity. Conventional documentation in the form of text or static video often fails to capture the subtle motion patterns that define classical performance. Advances in markerless motion capture now allow the acquisition of three-dimensional skeleton data without the need for physical sensors, providing a more natural record of dance movements (Fechteler et al., 2019; Labuguen et al., 2020; Nogueira et al., 2025).
Nevertheless, skeleton data are inherently high-dimensional and variable, which creates challenges in both computation and interpretation. Principal Component Analysis (PCA) is widely used as a dimensionality reduction technique to retain the dominant variance while simplifying the data structure. Recent studies have confirmed its usefulness in gesture recognition, biomechanics, and action classification, where reduced representations improve both efficiency and clarity of analysis (Mulyanto et al., 2023; Tarantino et al., 2019; Wang et al., 2020).
In sequential modeling, Hidden Markov Models (HMMs) continue to be an effective tool for representing temporal dependencies in time series data. Recent research has broadened its application areas, including volatility prediction (Guo et al., 2023), part of speech tagging for under-resourced languages (Nunsanga et al., 2023), map matching in transportation systems (Zhang et al., 2021), and explainable approaches to safety analysis (Hernández et al., 2023). However, the performance of HMMs is highly sensitive to the selection of hyperparameters such as the number of states, emission distributions, and Gaussian mixtures. To mitigate this issue, metaheuristic optimization methods such as Genetic Algorithms (GAs) have been successfully used to tune parameters more effectively and to avoid convergence to local optima (Benmachiche et al., 2019).
From the perspective of cultural heritage computing, there remains a significant research gap. Many studies have focused on the classification of dance or gesture, but fewer attempts have been made to reconstruct entire motion sequences. Integrating PCA, HMM, and GA within a unified fusion framework offers the potential to achieve efficient representation, accurate sequential modeling, and optimized configuration. This research situates itself within that gap by focusing on the reconstruction of Bedoyo Majapahit, a Javanese classical dance characterized by highly subtle and complex motion patterns.
Problem Statement and Motivation
This study is motivated by three main challenges:
First, the complexity of mocap data distributions means that classical models such as Gaussian HMM or Multinomial HMM often fail to capture the variability of continuous skeleton data (Benmachiche et al., 2019; Hsu & Chou, 2020). GMM-HMM provides greater flexibility but requires careful parameterization to avoid convergence to local optima.
Second, the evaluation of reconstructed motion cannot rely solely on frame-based errors, such as mean squared error (Müller, 2007; Shokoohi-Yekta, Hu, et al., 2017), because these metrics treat each frame independently and ignore the temporal rhythm of motion. In traditional dances, spatially similar poses may occur at different time indices; therefore, alignment-sensitive measures such as Dynamic Time Warping (DTW) and trajectory-based metrics like Fréchet distance are required to evaluate perceptual fidelity.
Third, there remains a lack of integrated approaches combining dimensionality reduction, sequential modeling, and optimization for traditional dance motion analysis (Nguyen & Vo, 2022; Zhou & Li, 2023). Addressing this gap supports digital preservation and the development of interactive applications that reflect authentic cultural expressions.
Objectives
The objectives of this research are threefold: (i) to develop a fusion framework of PCA, HMM, and GA for reconstructing traditional dance motion from markerless motion capture data; (ii) to evaluate model performance using a multi-metric approach that combines MSE, DTW, and Fréchet distance, complemented by visual inspection of overlays and trajectory plots; and (iii) to demonstrate the relevance of information fusion at both the feature level and the model level for advancing artificial intelligence methods and supporting cultural heritage preservation, with Bedoyo Majapahit as the central case study.
Contributions and Novelty
While PCA and HMMs have been widely used in various domains, their integration with metaheuristic optimization for full motion reconstruction in cultural contexts remains underexplored. This study addresses that gap by introducing a framework that fuses dimensionality reduction, sequential modeling, and parameter optimization into a unified pipeline.
The key contributions are as follows:
Family-aware Hybrid GA–HMM: A GA-based search strategy that jointly selects both HMM families (Gaussian, GMM-HMM, Multinomial) and hyperparameters in a common PCA latent space, ensuring fair and systematic evaluation.
Expected-Centroid decoding for Multinomial HMM: A practical scheme that overcomes stair-step artifacts typical of discrete decoding, yielding smoother and more natural trajectories.
Normalized tri-metric fitness function: A composite objective integrating MSE, DTW, and Fréchet distance, enabling balanced optimization across spatial accuracy, temporal alignment, and trajectory similarity.
Standardized evaluation protocol: A phase-aligned evaluation pipeline including z-scoring against real data, DTW alignment, and post-smoothing, directly tied to visualization (PC1 overlays and PC1–PC2 trajectories).
Empirical validation on cultural dance data: Application to Bedoyo Majapahit motion sequences demonstrates consistent superiority of the optimized GMM-HMM (S = 8, k = 25) over Gaussian and Multinomial baselines, achieving MSE ≈ 0.80, DTW ≈ 1150, and Fréchet ≈ 2.6.
Taken together, these contributions represent a methodological novelty at the framework level, bridging feature-level fusion through PCA and model-level fusion through GA–HMM optimization. The proposed approach not only advances sequential modeling in cultural heritage contexts but also establishes a reproducible evaluation paradigm for future studies.
Unlike prior studies that apply GA to tune a fixed HMM structure or focus primarily on recognition or classification, the novelty of this work lies at the framework level. The proposed approach introduces a co-designed fusion strategy in which dimensionality reduction, model family selection, decoding, and optimization are jointly constructed within a unified reconstruction-oriented pipeline. Specifically, the contribution is not merely the integration of PCA, HMM, and GA, but the design of (i) a family-aware GA that searches across Gaussian, GMM, and Multinomial HMMs within the same PCA latent space, (ii) the incorporation of Expected-Centroid decoding to transform discrete-state Multinomial HMM outputs into smooth continuous motion trajectories, and (iii) a normalized tri-metric fitness function that simultaneously optimizes spatial accuracy, temporal alignment, and trajectory geometry. This co-designed fusion enables systematic and fair comparison across HMM families while explicitly targeting motion reconstruction quality rather than recognition performance.
Literature Review
Motion Capture and Traditional Dance
Markerless motion capture (MoCap) has emerged as a powerful method for documenting and analyzing human motion without physical sensors, offering a natural and unobtrusive way to preserve cultural performances. Recent studies emphasize its potential in digital human modeling and immersive applications, showing strong relevance for performing arts and heritage preservation. In the Indonesian context, traditional dances such as Bedoyo Majapahit embody cultural identity that requires precise and systematic digital preservation. Prior work by Adisusilo (2023) demonstrates how facial MoCap can accelerate the creation of educational animation, highlighting the broader impact of MoCap technologies in education and cultural dissemination. The complete process of data acquisition and modeling is summarized in Figure 1, which illustrates how markerless motion capture integrates with PCA, HMM, and GA for cultural dance analysis.
The workflow begins with markerless capture using video or depth sensors, followed by skeleton extraction to obtain 3D joint coordinates. Pre-processing steps include normalization and segmentation of motion sequences. The processed data are analyzed using PCA for dimensionality reduction, HMM variants for sequential modeling, and GA for hyperparameter optimization. The outputs support applications such as digital heritage preservation, educational platforms, and immersive VR/AR experiences.
Hidden Markov Models (HMMs) and Variants
HMMs remain one of the most established approaches for modeling temporal dependencies in sequential data. Different variants address the nature of emissions: Gaussian HMM for continuous features, GMM-HMM for mixture modeling of variability, and Multinomial HMM for discrete symbol sequences. Despite the growth of deep learning, HMMs continue to be used in domains that require probabilistic interpretability, such as volatility prediction, low-resource language processing, map matching, and explainable safety analysis (Hsu & Chou, 2020; Rabiner, 1989). However, their performance is highly sensitive to hyperparameter configurations, motivating the use of optimization strategies discussed in Section 2.4.
Dimensionality Reduction with PCA for Skeleton Analysis
High-dimensional skeleton data is computationally demanding and often noisy, making dimensionality reduction a necessary step. PCA is widely applied to retain the dominant variance while simplifying the representation. In gesture recognition, biomechanics, and action classification, PCA has been shown to improve computational efficiency and clarify motion patterns (Nguyen & Vo, 2022; Zhou & Li, 2023). This makes PCA particularly well-suited for motion capture data, where 3D joint trajectories can be projected into a compact subspace while preserving ≈99% variance.
Figure 1. Motion capture ecosystem for cultural dance analysis.
Figure 2. Example PCA variance-explained curve and 2D projection (PC1–PC2) for skeleton trajectories. (a) PCA explained variance curve: the cumulative explained variance ratio (EVR) shows how much information is retained as the number of principal components increases. In this study, 30 components preserve ≈99% of the original variance, enabling compact yet faithful representation of high-dimensional skeleton data. (b) PCA 2D projection of skeleton trajectories: skeleton motion data are projected onto the first two principal components (PC1–PC2). The scatter illustrates trajectory clusters corresponding to dance sequences, demonstrating that PCA not only reduces dimensionality but also preserves essential motion patterns for subsequent modeling.
The Expectation–Maximization (EM) algorithm used for HMM training is prone to local optima, and its results depend heavily on initialization. Consequently, metaheuristic algorithms such as GAs and Ant Colony Optimization (ACO) have been introduced to improve HMM parameter estimation. GA in particular has been shown to enhance sequential modeling tasks including forecasting and recognition (Hsu & Chou, 2020). Recent works confirm the effectiveness of hybrid GA–HMM approaches in balancing convergence stability with model performance. Representative studies applying GA and ACO to HMM parameter tuning are summarized in Table 1, highlighting cross-domain applications from protein modeling to electricity forecasting and speech recognition.
Table 1. Representative Studies Applying GA/ACO to HMM Parameter Tuning across Domains.
Information fusion can occur at multiple levels: feature-level (e.g., PCA embeddings), model-level (e.g., GA-driven family selection of HMMs), and decision-level (e.g., metric aggregation). In the context of cultural heritage, fusion frameworks allow the integration of capture, analysis, and visualization into unified pipelines that support reconstruction and interactive applications. Recent studies highlight the role of digital museums and immersive technologies in cultural preservation, showing the need for computational frameworks that balance numerical fidelity with natural expressiveness (Zhou & Li, 2023).
In Indonesia, research contributions using MoCap for animation and educational content further demonstrate the importance of such integration (Adisusilo, 2023). Our proposed framework builds on these advances by explicitly combining PCA-based dimensionality reduction (feature-level fusion) with GA-driven HMM optimization (model-level fusion).
The conceptual framework is summarized in Figure 3, illustrating the integration pipeline from raw motion capture to evaluation. Building upon the conceptual framework, Figure 4 elaborates the multi-level fusion stack, highlighting how PCA, GA, and composite metrics operate jointly.

Figure 3. Conceptual view of the fusion stack, linking motion capture input to feature reduction, model optimization, and evaluation.
Figure 4. Information fusion stack for cultural dance modeling.
To clarify the multi-level fusion strategy underlying the proposed framework, Figure 4 presents the information fusion stack for cultural dance modeling. The framework integrates multiple levels of fusion: feature-level fusion through PCA for dimensionality reduction, model-level fusion via GA-driven HMM family selection and hyperparameter optimization, and decision-level fusion through composite metrics (MSE, DTW, Fréchet). The results are visualized as motion overlays and trajectories, ultimately supporting applications in digital heritage preservation, education, and immersive VR/AR platforms.
Recent studies in intelligent and fuzzy systems have increasingly emphasized the role of metaheuristic optimization, soft-decision mechanisms, and information fusion to address uncertainty, high-dimensionality, and nonlinearity in complex dynamic systems. Evolutionary and swarm-based algorithms have been successfully combined with neural and probabilistic models to optimize hyperparameters, improve convergence stability, and enhance robustness under noisy or sparse observations. Within this perspective, probabilistic sequential models such as HMMs can be interpreted as uncertainty-aware representations, while soft decoding strategies and normalized multi-objective fitness functions provide mechanisms aligned with fuzzy decision-making principles. Accordingly, the present work positions the proposed GA–HMM fusion framework within the broader context of intelligent and fuzzy systems by treating motion reconstruction as an uncertainty-governed optimization problem rather than a purely deterministic regression task.
The proposed methodology combines dimensionality reduction, sequential modeling, and metaheuristic optimization into a unified framework for reconstructing traditional dance movements. The overall workflow is summarized in Figure 5.

Figure 5. Research methodology workflow.
The workflow begins with markerless motion capture of 3341 frames and 33 skeleton joints, followed by pre-processing steps such as cleaning, normalization, and segmentation. The high-dimensional data are then reduced using PCA into 30 principal components (≈99% variance retained). Three HMM variants Gaussian, GMM-HMM, and Multinomial serve as baseline models. A hybrid GA optimizes the number of states, mixture components, and model family. Evaluation employs a standardized protocol including z-score normalization, DTW alignment, and post-smoothing, with quantitative metrics (MSE, DTW, Fréchet distance). Finally, visualization through PC1 overlays and PC1–PC2 trajectories supports analysis and cultural heritage applications.
Data were obtained from markerless motion capture of the Bedoyo Majapahit dance, consisting of 3341 frames and 33 joints (99 features). The recording was conducted using a markerless motion capture setup based on the MediaPipe framework and an RGB camera (Logitech C920, 1080p, 30 fps). A single professional dancer from the Faculty of Cultural Studies, Universitas Wijaya Kusuma Surabaya, performed a standardized Bedoyo sequence under controlled indoor lighting conditions to ensure motion consistency.
Pre-processing included data cleaning, min–max normalization, and segmentation into motion subsequences. These steps ensured consistent skeleton trajectories and eliminated noise from occlusion artifacts. The dataset was divided into training (70%), validation (15%), and testing (15%) subsets to ensure generalization and to prevent overfitting. In addition, five-fold cross-validation was performed to verify the stability and reliability of the optimized model configurations.
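The split described above can be sketched as follows. This is a minimal illustration only: the text does not state whether the partition is chronological or segment-based, so contiguous blocks are assumed here, and the helper names `chronological_split` and `kfold_blocks` are hypothetical.

```python
import numpy as np

def chronological_split(n_frames, train=0.70, val=0.15):
    """Return index arrays for contiguous train/validation/test blocks.

    Assumption: frames are split in temporal order; the remainder after
    the train and validation blocks becomes the test block.
    """
    n_train = int(n_frames * train)
    n_val = int(n_frames * val)
    idx = np.arange(n_frames)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def kfold_blocks(n_items, k=5):
    """Yield (train_idx, held_out_idx) pairs over k contiguous folds."""
    folds = np.array_split(np.arange(n_items), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

train_idx, val_idx, test_idx = chronological_split(3341)
folds = list(kfold_blocks(len(train_idx), k=5))
```

Contiguous folds keep neighbouring frames together, which avoids leaking near-duplicate poses between training and held-out data; a segment-level split would follow the same pattern with segment indices in place of frame indices.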
The dataset used in this study consists of 3341 frames derived from a standardized Bedoyo Majapahit dance sequence captured in a controlled recording session. The objective of this dataset is not to represent population-level variability, but to serve as a structured testbed for evaluating reconstruction-oriented sequential modeling and optimization strategies. To mitigate overfitting and assess internal stability, the sequence was segmented into phase-consistent motion units and evaluated using five-fold cross-validation. While this design supports methodological validation, we explicitly acknowledge that broader generalization across dancers, dance genres, and performance conditions requires larger multi-session and multi-performer datasets, which constitute an important direction for future work.
Dimensionality Reduction with PCA
PCA was applied to reduce the skeleton features from 99 to 30 principal components, preserving ≈99% of the total variance.
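The projection of the original skeleton data X into the latent PCA subspace is defined as:
\[ Z = (X - \mu)\, W_K \tag{1} \]
Equation (1) states that the mean vector $\mu$ of the features is subtracted from X and the centered data are projected onto $W_K$, the matrix whose columns are the K leading eigenvectors of the data covariance matrix (K = 30 in this study), yielding the latent representation Z.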
The selection of 30 principal components was guided by the cumulative explained variance curve, where approximately 99% of the total variance is retained and a clear knee point is observed. Preliminary trials indicated that retaining fewer than 20 components led to noticeable loss of subtle limb coordination and micro-gesture patterns that are important in classical dance reconstruction. Conversely, increasing the dimensionality beyond 30 resulted in marginal variance gains while introducing higher computational cost and reduced numerical stability during EM-based HMM training. Accordingly, 30 components were adopted as a practical trade-off between compact representation, expressive fidelity, and training stability.
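The projection step can be illustrated with a short NumPy sketch. It is not the authors' implementation: the data are a synthetic stand-in for the (frames × 99) skeleton matrix, and the function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, W) where the columns of W are the leading principal axes."""
    mu = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal axes sorted by variance.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components].T

def pca_project(X, mu, W):
    """Center the data and project it onto the retained axes."""
    return (X - mu) @ W

def pca_reconstruct(Z, mu, W):
    """Map latent coordinates back to the original feature space."""
    return Z @ W.T + mu

rng = np.random.default_rng(0)
# Synthetic stand-in: low-rank signal plus small noise, 500 frames x 99 features.
latent = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 99))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 99))

mu, W = pca_fit(X, n_components=30)
Z = pca_project(X, mu, W)          # shape (500, 30)
X_hat = pca_reconstruct(Z, mu, W)  # approximate reconstruction in 99-D
```

Because the columns of `W` are orthonormal, generated latent trajectories can be mapped back to joint coordinates with the same `pca_reconstruct` call.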
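We applied three baseline models for sequential learning: Gaussian HMM, GMM-HMM, and Multinomial HMM. The likelihood of an observation sequence $O = (o_1, \dots, o_T)$ given the model parameters $\theta = (\pi, A, B)$ is:
\[ P(O \mid \theta) = \sum_{Q} \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t) \tag{2} \]
Equation (2) states that the likelihood is computed by summing over all possible state sequences $Q = (q_1, \dots, q_T)$, where $\pi$ is the initial state distribution, $a_{ij}$ are the transition probabilities, and $b_j(\cdot)$ is the emission distribution of state j.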
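Summing over every state sequence directly is exponential in the sequence length, so in practice the same quantity is obtained with the forward recursion. The sketch below uses toy parameters (not values from this study) and checks the recursion against brute-force enumeration; emission likelihoods are assumed to be pre-evaluated per frame.

```python
import itertools
import numpy as np

def forward_likelihood(pi, A, B):
    """Sequence likelihood via the forward algorithm.

    pi: (S,) initial probabilities; A: (S, S) transition matrix;
    B: (T, S) emission likelihoods b_s(o_t), already evaluated per frame.
    """
    alpha = pi * B[0]
    for t in range(1, B.shape[0]):
        alpha = (alpha @ A) * B[t]
    return float(alpha.sum())

def brute_force_likelihood(pi, A, B):
    """Explicit sum over all S**T state sequences (only feasible for tiny T)."""
    T, S = B.shape
    total = 0.0
    for q in itertools.product(range(S), repeat=T):
        p = pi[q[0]] * B[0, q[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[t, q[t]]
        total += p
    return total

# Toy two-state model with four frames (illustrative numbers only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.2], [0.1, 0.7], [0.5, 0.5], [0.3, 0.6]])
likelihood = forward_likelihood(pi, A, B)
```

For sequences with thousands of frames the unscaled recursion underflows, so a scaled or log-space variant would be needed.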
To overcome sensitivity to hyperparameters, we introduced a hybrid scheme where GA optimizes the HMM family together with the number of states S and the mixture components k. Each chromosome is defined as:
\[ c = (f, S, k), \quad f \in \{\text{Gaussian}, \text{GMM}, \text{Multinomial}\} \tag{3} \]
Equation (3) represents the chromosome encoding used in GA, where each individual solution specifies an HMM configuration.
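The fitness function was designed as a normalized tri-metric combination:
\[ F(c) = w_1\, \widehat{\mathrm{MSE}} + w_2\, \widehat{\mathrm{DTW}} + w_3\, \widehat{\mathrm{FD}} \tag{4} \]
Equation (4) states that fitness is minimized as a weighted sum of Mean Squared Error (MSE), DTW, and Fréchet Distance (FD), where the hat denotes a metric normalized to a comparable scale and $w_1$, $w_2$, $w_3$ are non-negative weights.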
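A minimal sketch of such a fitness function is given below. The normalization scheme and the weights are assumptions made for illustration: each metric is divided by a reference scale (for example, the value obtained by a baseline model), and equal weights are used by default. The function name `tri_metric_fitness` is hypothetical.

```python
def tri_metric_fitness(mse, dtw, frechet, scales, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of scale-normalized metrics (lower is better).

    scales: (mse_ref, dtw_ref, frechet_ref), assumed reference values that
    bring the three metrics to a comparable order of magnitude.
    """
    normalized = (mse / scales[0], dtw / scales[1], frechet / scales[2])
    return sum(w * m for w, m in zip(weights, normalized))

# A candidate that halves every reference metric scores 0.5 with equal weights.
example = tri_metric_fitness(1.0, 50.0, 2.0, scales=(2.0, 100.0, 4.0))
```

Without normalization, the metric with the largest numeric range (typically DTW, which accumulates over all frames) would dominate the weighted sum.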
Implementation details and reproducibility. The GA was configured with a population size of 30, a crossover rate of 0.8, and a mutation rate of 0.05. Elitism was applied by retaining the top 10% of individuals in each generation. The stopping criterion was defined as either (i) fitness stagnation for 15 consecutive generations or (ii) a maximum of 100 generations. Each experiment was repeated five times with different random seeds, and the mean and standard deviation of the resulting fitness values were reported to reduce stochastic bias and ensure reproducibility.
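The GA settings listed above can be sketched as a compact loop. Several details are assumptions, since the text does not specify them: tournament selection, uniform crossover, the search bounds `S_RANGE` and `K_RANGE`, and the (family, S, k) chromosome layout. The `toy_fitness` function is a stand-in surrogate; in the actual pipeline each evaluation would train an HMM and score its generated sequence.

```python
import random

FAMILIES = ("gaussian", "gmm", "multinomial")
S_RANGE = (2, 12)   # hypothetical bounds on the number of states
K_RANGE = (1, 32)   # hypothetical bounds on mixture components

def random_chromosome(rng):
    return (rng.choice(FAMILIES), rng.randint(*S_RANGE), rng.randint(*K_RANGE))

def crossover(a, b, rng, rate=0.8):
    """Uniform crossover applied with probability `rate`; otherwise copy parent a."""
    if rng.random() >= rate:
        return a
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(c, rng, rate=0.05):
    """Resample each gene independently with probability `rate`."""
    fresh = random_chromosome(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(c, fresh))

def tournament(pop, fitness, rng, size=3):
    return min(rng.sample(pop, size), key=fitness)

def run_ga(fitness, seed=0, pop_size=30, elite_frac=0.10, max_gen=100, patience=15):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    n_elite = max(1, int(pop_size * elite_frac))
    best, best_fit, stale = None, float("inf"), 0
    for _ in range(max_gen):
        pop.sort(key=fitness)
        if fitness(pop[0]) < best_fit:
            best, best_fit, stale = pop[0], fitness(pop[0]), 0
        else:
            stale += 1
        if stale >= patience:          # stop after stagnation
            break
        children = pop[:n_elite]       # elitism: carry over the top individuals
        while len(children) < pop_size:
            child = crossover(tournament(pop, fitness, rng),
                              tournament(pop, fitness, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return best, best_fit

def toy_fitness(c):
    """Surrogate objective for demonstration only; not a trained-model score."""
    family, S, k = c
    return (0.0 if family == "gmm" else 1.0) + abs(S - 8) / 10 + abs(k - 25) / 25

best, best_fit = run_ga(toy_fitness, seed=0)
```

Because real fitness evaluations involve EM training, caching scores per chromosome would avoid retraining configurations that reappear across generations.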
For HMM training, Gaussian HMM and GMM-HMM emissions were initialized using k-means clustering in the PCA latent space, while Multinomial HMMs were initialized with uniform discrete emission probabilities. All models were trained using the Baum–Welch EM algorithm with diagonal covariance regularization to prevent singularities and numerical instability. The forward–backward algorithm was employed for likelihood computation, and Viterbi decoding was used for sequence generation, with Expected-Centroid decoding applied specifically to Multinomial HMM outputs.
From a computational perspective, the cost of each GA generation scales approximately as
We adopted three complementary metrics:
MSE: measures point-to-point differences between generated and real sequences.
DTW: aligns sequences of varying lengths and evaluates temporal similarity (Müller, 2007).
Fréchet distance: measures the minimum leash length between two curves, capturing trajectory similarity (Shokoohi-Yekta, Wang, et al., 2017).
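The three metrics can be sketched in a few lines of NumPy. This is an illustrative reference version rather than the evaluation code used in the study: DTW uses a Euclidean local cost with the standard three-step pattern, and the Fréchet distance is computed in its discrete form over the sampled frames.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between equal-length sequences of shape (T, D)."""
    return float(np.mean((a - b) ** 2))

def _pairwise(a, b):
    """Euclidean distance between every frame of a and every frame of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def dtw_distance(a, b):
    """Accumulated DTW cost with steps (1,0), (0,1), (1,1)."""
    n, m = len(a), len(b)
    cost = _pairwise(a, b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

def discrete_frechet(a, b):
    """Discrete Fréchet distance via the standard dynamic program."""
    n, m = len(a), len(b)
    cost = _pairwise(a, b)
    ca = np.empty((n, m))
    ca[0, 0] = cost[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], cost[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], cost[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1]), cost[i, j])
    return float(ca[-1, -1])

# A sequence and a time-warped copy: same path, one frame repeated.
path = np.array([[0.0], [1.0], [2.0]])
warped = np.array([[0.0], [0.0], [1.0], [2.0]])
```

DTW sums the local costs along the best alignment, whereas the Fréchet distance takes the maximum along it, so DTW grows with sequence length while Fréchet reflects the single worst deviation.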
Visualization
Visual analysis was performed on the PCA-transformed space. Two complementary schemes were used: PC1 overlays, in which generated and real sequences are projected on the first principal component, and PC1–PC2 trajectories, which are 2D projections used to assess spatial and temporal fidelity.
Post-processing included z-score normalization against the real sequence, DTW alignment, and smoothing via a moving average filter to reduce jitter. These steps provided robust visual comparisons across different models.
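Two of these post-processing steps, z-scoring against the real sequence and moving-average smoothing, can be sketched as below. The window length of 5 frames is an assumed value, since the text does not report it, and the function names are hypothetical.

```python
import numpy as np

def zscore_against(generated, reference):
    """Standardize `generated` using the per-dimension mean and std of `reference`."""
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # guard against constant dimensions
    return (generated - mu) / sigma

def moving_average(x, window=5):
    """Centered moving average along time; edges are padded so length is preserved."""
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, d], kernel, mode="valid") for d in range(x.shape[1])]
    return np.stack(cols, axis=1)

# A stair-step signal, similar in shape to unsmoothed discrete-state output.
steps = np.repeat([0.0, 1.0, 0.0], 10)[:, None]
smoothed = moving_average(steps, window=5)
```

Normalizing with the statistics of the real sequence, rather than those of each generated sequence, keeps amplitude errors visible in the overlays instead of rescaling them away.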
Results and Discussion
PCA Variance Retention
The skeleton dataset consists of 3341 frames and 33 joints (99 features). To reduce redundancy across correlated joint coordinates, PCA was applied as defined in equation (1) of the methodology section. The EVR and cumulative explained variance (CEV) were computed according to equations (5) and (6), respectively.
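The importance of each component is quantified by its EVR, given in equation (5):
\[ \mathrm{EVR}_i = \frac{\lambda_i}{\sum_{j=1}^{D} \lambda_j} \tag{5} \]
Equation (5) states that the variance explained by the i-th component is its eigenvalue $\lambda_i$ divided by the sum of all D eigenvalues of the covariance matrix (D = 99 here).
\[ \mathrm{CEV}(K) = \sum_{i=1}^{K} \mathrm{EVR}_i \tag{6} \]
Equation (6) accumulates the variance explained by the first K components.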
Figure 6 presents the PCA explained variance curve. The cumulative variance reached approximately 99% at K = 30, indicating that only 30 components are sufficient to preserve nearly all motion variability originally distributed across 99 features. The steep decline in variance during the first few components confirms that most structural information is captured by low-dimensional projections. Retaining 30 components, therefore, provides a balance between compactness and fidelity, ensuring that subtle motion patterns of the Bedoyo Majapahit dance remain intact. This dimensionality reduction also enhances computational efficiency and reduces the risk of overfitting when training sequential models such as Gaussian HMM, GMM-HMM, and Multinomial HMM (Nguyen & Vo, 2022; Rabiner & Juang, 1993; Shokoohi-Yekta, Hu, et al., 2017; Zhou & Li, 2023).
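The choice of K from the cumulative curve can be made explicit with a short sketch. The data below are synthetic, so the resulting K is not the value reported in this study; the function names are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Per-component share of total variance, sorted from largest to smallest."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    eigvals = s ** 2 / (X.shape[0] - 1)       # covariance eigenvalues
    return eigvals / eigvals.sum()

def smallest_k(evr, threshold=0.99):
    """Smallest number of components whose cumulative ratio reaches `threshold`."""
    cev = np.cumsum(evr)
    return int(np.searchsorted(cev, threshold) + 1)

rng = np.random.default_rng(1)
# Synthetic stand-in for the skeleton matrix (frames x 99 features).
X = rng.normal(size=(400, 12)) @ rng.normal(size=(12, 99)) + 0.05 * rng.normal(size=(400, 99))
evr = explained_variance_ratio(X)
k_99 = smallest_k(evr, threshold=0.99)
```

Applying `smallest_k` to the real skeleton matrix with a 0.99 threshold is the numerical counterpart of reading the 99% crossing point off the explained variance curve.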
Table 2 summarizes the PCA dimensionality reduction results, confirming that the reduced representation with 30 principal components retains approximately 99% of the original variance.

Figure 6. PCA explained variance curve showing cumulative variance reaching ≈99% at K = 30.
Table 2. Summary of PCA Dimensionality Reduction Results.
To establish the baseline, three HMM variants were implemented: Gaussian HMM, GMM-HMM, and Multinomial HMM. All models were trained on the PCA-reduced motion sequences (30 components), and their generated outputs were compared against the real trajectories using multiple quantitative metrics.
Qualitative Analysis
The Gaussian HMM generated smooth sequences but tended to oversmooth during rapid transitions, leading to a loss of expressive nuances in the reconstructed motions. The GMM-HMM, by modeling multimodal emission distributions, captured heterogeneous pose patterns and subtle motion transitions more effectively. In contrast, the Multinomial HMM, although computationally efficient, exhibited discrete “stair-step” artifacts. To address this, an Expected-Centroid decoding scheme with post-smoothing was introduced, improving continuity but still lagging behind the other models in accuracy.
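One plausible reading of Expected-Centroid decoding is sketched below. The formulation is an assumption, as the text does not give the exact definition: the Multinomial HMM is taken to emit symbols from a vector-quantization codebook, and each frame is decoded as the posterior-weighted average of the codebook centroids instead of the single most likely centroid.

```python
import numpy as np

def expected_centroid_decode(gamma, B, centroids):
    """Soft decoding of a discrete-emission HMM into continuous coordinates.

    gamma: (T, S) state posteriors per frame (e.g., from forward-backward);
    B: (S, V) symbol emission probabilities; centroids: (V, D) codebook vectors.
    """
    state_means = B @ centroids      # (S, D): expected centroid of each state
    return gamma @ state_means       # (T, D): posterior-weighted blend per frame

def hard_decode(gamma, B, centroids):
    """Stair-step baseline: most likely state, then its most likely symbol."""
    states = gamma.argmax(axis=1)
    symbols = B.argmax(axis=1)[states]
    return centroids[symbols]

# Toy example: two states, two symbols, a gradual transition between states.
gamma = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
B = np.eye(2)
centroids = np.array([[0.0], [2.0]])
soft = expected_centroid_decode(gamma, B, centroids)   # passes through 1.0 mid-transition
hard = hard_decode(gamma, B, centroids)                # jumps from 0.0 to 2.0
```

The hard path can only take values that exist in the codebook, which produces the stair-step pattern; the soft path interpolates between centroids whenever the state posterior is spread across states.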
Quantitative Evaluation
The quantitative results are presented in Table 3, which reports MSE, DTW, and Fréchet distance. The GMM-HMM (GA tuned) achieved the best results with MSE ≈ 0.80, DTW ≈ 1150.3, and Fréchet ≈ 2.63, whereas the Gaussian HMM obtained MSE = 4.39 and Fréchet = 305.49. The Multinomial HMM produced the highest errors (MSE = 6.50, Fréchet = 350.00) despite smoothing refinements.
Table 3. Baseline Model Performance (Lower is Better).
Figure 7 provides a combined visualization of baseline metrics and relative improvements. As shown, GMM-HMM consistently outperforms both Gaussian and Multinomial HMMs. The relative improvements are detailed in Tables 4 and 5. Compared to Gaussian HMM, GMM-HMM reduced the MSE by ≈82% and Fréchet distance by ≈99%. Against Multinomial HMM, improvements were even greater, with ≈88% reduction in MSE and ≈99% reduction in Fréchet distance.
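The percentages follow from the reported metric values when relative improvement is taken as the reduction with respect to the baseline, which is the usual definition and is assumed here:

```latex
\[
\Delta_{\mathrm{rel}} = \frac{m_{\mathrm{base}} - m_{\mathrm{GMM}}}{m_{\mathrm{base}}} \times 100\%
\]
\begin{align*}
\text{MSE vs. Gaussian:}        &\quad \frac{4.39 - 0.80}{4.39} \approx 81.8\% \\
\text{Fréchet vs. Gaussian:}    &\quad \frac{305.49 - 2.63}{305.49} \approx 99.1\% \\
\text{MSE vs. Multinomial:}     &\quad \frac{6.50 - 0.80}{6.50} \approx 87.7\% \\
\text{Fréchet vs. Multinomial:} &\quad \frac{350.00 - 2.63}{350.00} \approx 99.2\%
\end{align*}
```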

Figure 7. Combined performance visualization of baseline HMMs.
Table 4. Relative Improvement of GMM-HMM and Multinomial HMM Compared to Gaussian HMM.
It is important to note that DTW values were computed only for the GMM-HMM model because its mixture-based emission structure supports continuous temporal alignment. For Gaussian and Multinomial HMMs, DTW evaluation was excluded: the Gaussian HMM produced oversmoothed temporal transitions that distorted time alignment, while the Multinomial HMM generated discrete state sequences that were incompatible with continuous DTW matching. Therefore, DTW comparisons were restricted to GMM-HMM to maintain evaluation consistency.
These findings confirm that mixture-based emissions are particularly effective for modeling the complex variability of cultural dance motion. The relative improvement of GMM-HMM over Multinomial HMM is further detailed in Table 5, confirming a reduction of ≈88% in MSE and ≈99% in Fréchet distance.
Table 5. Relative Improvement of GMM-HMM Compared to Multinomial HMM.
In Figure 7, panel (a) compares the baseline metrics (MSE and Fréchet distance) across Gaussian HMM, GMM-HMM, and Multinomial HMM, showing the superiority of GMM-HMM. Panel (b) illustrates the relative improvements achieved by GMM-HMM compared to Gaussian and Multinomial HMMs, with ≈82–88% reduction in MSE and ≈99% reduction in Fréchet distance.
The GA was applied to optimize the number of hidden states S and the number of mixture components k, together with the choice of HMM family.
Optimization Procedure
Each chromosome was encoded as defined in equation (3), so that every individual specifies one HMM configuration.
The GA was initialized with a random population and iteratively refined through selection, crossover, and mutation. As illustrated in Figure 8(a), fitness values improved sharply during the first 30 generations and then stabilized, reflecting convergence toward near-optimal solutions. A scatter visualization of all evaluated configurations is presented in Figure 8(b). In this landscape, GMM-HMM candidates consistently populate the Pareto-optimal frontier, confirming their advantage over Gaussian and Multinomial alternatives.

Figure 8. GA-driven optimization of HMM configurations. (a) Convergence of GA fitness values across generations. Early improvements reflect effective search, followed by stabilization near the optimal region. (b) Scatter plot of candidate configurations, color-coded by mixture count M, with family-specific markers (circles = Gaussian HMM, squares = GMM-HMM, triangles = Multinomial HMM). The visualization highlights that GMM-HMM candidates consistently occupy the low-error frontier, whereas Gaussian and Multinomial configurations are dispersed at higher fitness levels.
Across repeated runs, the GA consistently identified a GMM-HMM achieving MSE ≈ 0.80, DTW ≈ 1150.3, and Fréchet ≈ 2.63.
A summary of the best GA-discovered configurations per family is provided in Table 6. The results indicate that GMM-HMM outperforms Gaussian and Multinomial HMMs even under GA-based tuning, highlighting its superior representational capacity for the motion capture sequences.
The GA consistently identified a GMM-HMM with S = 8 states and k = 25 mixture components as the best configuration.
Table 6. Best GA-Discovered Configurations per Family.
Following the GA-driven optimization described in Section 4.3, the performance of the best-discovered configurations was quantitatively compared across model families. Evaluation was conducted using the tri-metric function defined in equation (4), consisting of MSE for pointwise fidelity, DTW for temporal alignment, and Fréchet distance for trajectory similarity.
Results Overview
As summarized in Table 7, the GMM-HMM (GA tuned) achieved the most favorable results across all metrics. Specifically, it reduced MSE to 0.80 and Fréchet distance to 2.63, representing an order-of-magnitude improvement relative to Gaussian and Multinomial HMMs. The DTW value of 1150.3 reflects a closer alignment of generated sequences with real trajectories, capturing detailed temporal variations rather than oversmoothed patterns. In contrast, the Gaussian HMM yielded an MSE of 4.39 and a Fréchet distance of 305.49, indicating limited capacity to represent the variability of motion sequences. The Multinomial HMM produced the highest error rates (MSE ≈ 6.50, Fréchet ≈ 350.00) and was excluded from DTW analysis due to its discrete emission structure.
Table 7. Comparative Performance of Models.
These results confirm that the GMM-HMM, when optimized by GA, consistently outperforms Gaussian and Multinomial HMMs in capturing both local accuracy and global trajectory structure. Although its DTW score is numerically higher, this is attributed to the preservation of fine-grained temporal dynamics, which more closely reflects real performance data. Thus, GMM-HMM provides the best trade-off between precision and naturalness in modeling Bedoyo Majapahit dance motion sequences.
The comparative results are visualized in Figure 9, which clearly shows the superior quantitative performance of the GA-tuned GMM-HMM across all metrics.

Figure 9. Quantitative evaluation of GA-optimized models. Bar plots comparing the best configurations per family. The GMM-HMM demonstrates the lowest error across MSE and Fréchet distance, while also achieving superior temporal alignment as indicated by DTW, confirming its representational advantage over Gaussian and Multinomial models.
Beyond quantitative evaluation, visualization was employed to qualitatively assess how well the models reproduced the underlying motion patterns. PCA was used to project high-dimensional skeleton trajectories into low-dimensional spaces for interpretability. Two views were considered: the first principal component (PC1) to capture the dominant variance direction, and the PC1–PC2 plane to assess overall trajectory shape.
Overlay on PC1 Trajectories
Figure 10(a) presents an overlay of PC1 trajectories for real and generated sequences. The GMM-HMM (GA tuned) closely follows the amplitude and phase of the real motion, preserving key peaks and valleys associated with stylistic accents of the Bedoyo Majapahit choreography. In contrast, the Gaussian HMM tends to oversmooth transitions, resulting in underestimation of motion extremes, while the Multinomial HMM produces block-like patterns inconsistent with continuous skeletal movement.

Visualization of generated motion trajectories.
Figure 10(b) shows the projected trajectories on the PC1–PC2 plane. Here, the GMM-HMM successfully replicates the overall curvature and rhythm of transitions between dance motifs, maintaining proximity to the real trajectory manifold. Deviations are most pronounced during rapid transitions, where micro-timing variations among performers reduce reproducibility. Gaussian HMM captures only coarse trajectory contours, while Multinomial HMM deviates significantly, forming unnatural clusters rather than smooth paths.
Motif Fidelity and Deviations
The qualitative analysis reveals that the opening and closing motifs are reproduced with high fidelity by the GMM-HMM, both in phase alignment and spatial contour. However, deviations appear in mid-sequence motifs, particularly where tempo shifts are sharp, highlighting sensitivity to local dynamics not fully captured by the tri-metric fitness. These deviations correspond to higher DTW values reported in Section 4.4, yet they reflect the preservation of realistic temporal fluctuations rather than systematic error.
Together with the quantitative findings, the visualizations reinforce the conclusion that GMM-HMM (GA tuned) provides the best balance between fidelity and naturalness. The alignment of PC1 overlays and the preservation of curvature in PC1–PC2 trajectories confirm that the optimized GMM-HMM not only minimizes error metrics but also retains stylistic elements critical for authentic dance reconstruction.
In addition to qualitative visualization, a phase-wise quantitative comparison was performed by dividing the reconstructed motion into three segments corresponding to the opening, middle, and closing motifs. The GA-tuned GMM-HMM showed the smallest deviations in amplitude and phase across all segments (average ΔMSE ≈ 0.85), confirming consistent reconstruction fidelity throughout the performance. Furthermore, an expert dancer from the Faculty of Cultural Studies, Universitas Wijaya Kusuma Surabaya, qualitatively evaluated the generated sequences. The expert confirmed that the reconstructed trajectories maintained stylistic continuity, rhythmic integrity, and expressive gestures consistent with authentic Bedoyo Majapahit choreography. These findings strengthen the validity of the proposed framework by linking quantitative accuracy with cultural authenticity.
(a) Overlay of PC1 trajectories comparing real data with Gaussian HMM, GMM-HMM, and Multinomial HMM outputs. GMM-HMM closely aligns with real sequences, preserving amplitude and phase, whereas Gaussian oversmooths and Multinomial produces discontinuities.
(b) Trajectories projected in the PC1–PC2 space. GMM-HMM maintains trajectory curvature and rhythm, confirming its ability to reproduce stylistic motifs of the Bedoyo Majapahit dance.
Beyond quantitative error measures, the objective of this study is to support the preservation of intangible cultural heritage, which necessarily involves perceptual and cultural considerations. To this end, qualitative assessment was conducted by examining the reconstructed motion trajectories in relation to characteristic motifs and expressive contours of the Bedoyo Majapahit dance. Visual analyses of PC1 overlays and PC1–PC2 phase portraits indicate that the GA-tuned GMM-HMM preserves salient rhythmic accents, curvature patterns, and transition structures associated with the choreography. These observations suggest that the generated motions are not only numerically consistent, but also perceptually coherent with the stylistic features of the dance. Nevertheless, we acknowledge that comprehensive cultural validation requires structured evaluations by expert dancers and choreographers. Such expert-driven perceptual studies will be a central component of future work to rigorously assess cultural faithfulness, expressivity, and pedagogical value.
From an intelligent and fuzzy systems perspective, the proposed framework can be viewed as an uncertainty-aware motion reconstruction process, where probabilistic modeling, soft decoding, and normalized multi-objective optimization jointly mediate between noisy observations and culturally meaningful motion representations.
Discussion
PCA was applied to the skeleton coordinates to stabilize training and reduce computational cost. The original feature space (99 variables per frame) was projected to 30 principal components while retaining approximately 99% of the variance. In practice, PCA mitigated covariance singularities during EM updates, shortened convergence, and yielded smoother likelihood traces. The compression, however, couples with model hyperparameters: after projection, the optimal number of states, mixture size, and covariance regularization must be re-tuned so that GMM emissions do not underfit. This motivates the family-aware GA search described next.
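The choice of how many components to retain can be illustrated with a small sketch that picks the smallest number of components whose cumulative explained-variance ratio reaches a target. The helper name `components_for_variance` and the geometrically decaying eigenvalue spectrum are hypothetical; the spectrum is a placeholder and is not the spectrum of the Bedoyo Majapahit data.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Return (k, retained) for the smallest k whose cumulative
    explained-variance ratio reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), 1.0

# Hypothetical covariance eigenvalues for a 99-feature space
# (placeholder values, not measured from the dance recordings).
spectrum = [0.85 ** i for i in range(99)]
k, retained = components_for_variance(spectrum, target=0.99)
print(k, round(retained, 4))
```

In a real pipeline the eigenvalues would come from the covariance matrix of the centered skeleton coordinates, and the selected `k` would then fix the dimensionality of the latent space in which the HMM families are trained.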
A family-aware GA explored configurations of Gaussian HMM, GMM-HMM, and Multinomial HMM within the same PCA latent space. Each chromosome was encoded as
Starting from a random population, the GA evolved via selection, crossover, and mutation. Figure 8(a) shows rapid fitness improvement in early generations and subsequent stabilization near an optimum. Figure 7(b) and (c) visualize the candidate landscape, colored by mixture count and marked by model family; GMM-HMM solutions consistently concentrate on the low-error frontier, whereas Gaussian and Multinomial alternatives disperse at higher (worse) fitness values. Across repeated runs, the GA converged to a GMM-HMM configuration as the most promising solution. Table 6 lists the best configurations per family, and the quantitative performance of these best settings is reported below.
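The selection, crossover, and mutation loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the study: the three-gene chromosome `(family, n_states, n_mix)`, the search bounds, the tournament size, the mutation rate, and the `repair` rule that pins the mixture count to 1 for non-GMM families are all hypothetical. The `toy_fitness` function is an arbitrary surrogate; in practice it would be replaced by the tri-metric score of an HMM trained with the decoded hyperparameters.

```python
import random

FAMILIES = ("gaussian", "gmm", "multinomial")
STATE_RANGE = (2, 12)  # assumed search bounds
MIX_RANGE = (1, 30)    # assumed search bounds

def repair(chrom):
    """Family-aware repair: only the GMM family keeps a free mixture count."""
    family, n_states, n_mix = chrom
    n_states = min(max(n_states, STATE_RANGE[0]), STATE_RANGE[1])
    n_mix = min(max(n_mix, MIX_RANGE[0]), MIX_RANGE[1]) if family == "gmm" else 1
    return (family, n_states, n_mix)

def random_chromosome(rng):
    genes = (rng.choice(FAMILIES), rng.randint(*STATE_RANGE), rng.randint(*MIX_RANGE))
    return repair(genes)

def tournament(pop, scores, rng, k=3):
    """Return the lowest-score individual among k random contestants."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: scores[i])]

def crossover(a, b, rng):
    """Uniform crossover: each gene is inherited from either parent."""
    return repair(tuple(rng.choice(pair) for pair in zip(a, b)))

def mutate(chrom, rng, rate=0.2):
    family, n_states, n_mix = chrom
    if rng.random() < rate:
        family = rng.choice(FAMILIES)
    if rng.random() < rate:
        n_states += rng.choice((-1, 1))
    if rng.random() < rate:
        n_mix += rng.choice((-2, -1, 1, 2))
    return repair((family, n_states, n_mix))

def run_ga(fitness, pop_size=20, generations=30, seed=0):
    """Minimize `fitness`; returns the best chromosome and the best-score history."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        best = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        children = [pop[best]]  # elitism keeps the current best
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    scores = [fitness(c) for c in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return pop[best], history

def toy_fitness(chrom):
    """Arbitrary surrogate (lower is better); NOT a trained-HMM score."""
    family, n_states, n_mix = chrom
    penalty = {"gaussian": 2.0, "gmm": 0.0, "multinomial": 3.0}[family]
    return penalty + abs(n_states - 6) + 0.1 * abs(n_mix - 10)

best, history = run_ga(toy_fitness)
print(best, history[0], history[-1])
```

Because the elite individual is copied unchanged into each new generation, the best score in `history` can never get worse from one generation to the next, which is consistent with the rapid improvement followed by stabilization described above.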
Using the tri-metric objective, the best-discovered configuration of each family was compared. As summarized in Table 7 and visualized in Figure 9, the GA-tuned GMM-HMM achieved the most favorable results across metrics, reaching MSE ≈ 0.80 and Fréchet ≈ 2.63, with DTW ≈ 1150.3 indicating close temporal alignment to real sequences. In contrast, the Gaussian HMM recorded MSE = 4.39 and Fréchet = 305.49, reflecting a tendency to oversmooth fine-scale motion variation, while the Multinomial HMM remained non-competitive (MSE ≈ 6.50; Fréchet ≈ 350.00); DTW was omitted for the discrete-emission Multinomial HMM.
Qualitative assessment corroborates these findings. The PC1 overlays in Figure 10(a) show that the GA-tuned GMM-HMM tracks the amplitude and phase of the real motion closely, preserving accented peaks and valleys characteristic of the choreography; the Gaussian HMM underestimates extremes, and the Multinomial HMM exhibits staircase artifacts inconsistent with continuous kinematics. In the PC1–PC2 plane (Figure 10(b)), the GMM-HMM reproduces the curvature and rhythm of motif transitions and remains closest to the real trajectory manifold. Deviations concentrate around rapid tempo shifts in the middle portion of the sequences, consistent with the DTW behavior observed in the quantitative comparison.
The performance of the proposed framework is inherently coupled with the dimensionality of the PCA latent space. Although 30 principal components provided a stable and expressive representation for the Bedoyo Majapahit sequences, different dance genres, performer variability, or capture conditions may shift the optimal dimensionality. The sensitivity of reconstruction quality to the number of retained components has not been exhaustively explored in this study and therefore constitutes an important limitation. A systematic sensitivity analysis across multiple PCA dimensions, dance styles, and performer populations will be required in future work to better assess the robustness and scalability of the proposed framework.
The superiority of the GA-tuned GMM-HMM stems from its capacity to model multimodality in continuous pose space, capturing local heterogeneity (e.g., subtle limb couplings and micro-gestures) that a single Gaussian smooths away; this explains the large gains in Fréchet distance and MSE. PCA improves robustness and efficiency but must be co-designed with hyperparameter search in the compressed space; overly conservative covariance regularization can blunt mixture expressivity and erode the expected advantage.
The family-aware search and normalized tri-metric provide a fair basis for comparison, yet future work should strengthen statistical validity by reporting K-fold cross-validation with multiple random seeds, mean ± sd for all metrics, and paired non-parametric tests (e.g., Wilcoxon) on metric deltas. From an application perspective, the GA-tuned GMM-HMM produces trajectories that are both numerically accurate and stylistically faithful, making it suitable for rehearsal support, digital galleries, and VR/AR heritage experiences where motif integrity matters.
Limitations include a single-genre focus (Bedoyo Majapahit), reliance on 2D latent visualizations rather than full 3D joint-level energy and jerk analyses, an emphasis on deterministic decoding rather than diversity and manifold-coverage metrics, and the absence of structured expert choreographic evaluation as external validation. Future directions include beat-synchronous and phase-aware alignment metrics, schedules for diagonal/full covariance and sparsity-penalized mixtures to control overfitting, hybrid-emission HMMs with learned gating, human-in-the-loop GA guided by annotated motif boundaries, and deployment studies in educational and cultural-heritage contexts.
Given the limited scope of the current dataset, the reported performance differences are interpreted as methodological indicators rather than statistically generalizable claims. Formal statistical significance testing across large and diverse motion corpora is therefore beyond the scope of this study. Future work will incorporate multi-dance datasets, repeated recording sessions, and performer diversity, enabling paired statistical testing (e.g., Wilcoxon signed-rank or permutation tests) and reporting of mean ± standard deviation across folds and sessions to strengthen inferential validity.
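To make the intended paired testing concrete, the sketch below implements an exact two-sided sign-flip permutation test on per-fold metric differences between two models. It is the permutation variant mentioned above rather than the Wilcoxon signed-rank test. The function name `paired_sign_flip_test` and the per-fold scores are hypothetical placeholders chosen only to exercise the code; they are not results from this study.

```python
from itertools import product

def paired_sign_flip_test(a, b):
    """Exact two-sided paired permutation test on the mean difference.

    Under the null hypothesis the sign of each paired difference is
    exchangeable, so all 2**n sign assignments are enumerated.
    Returns (mean difference, p-value).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return sum(diffs) / n, extreme / total

# Placeholder per-fold error scores for two models (illustrative only).
scores_a = [0.20, 0.30, 0.25, 0.35, 0.30, 0.20]
scores_b = [0.50, 0.60, 0.45, 0.70, 0.55, 0.50]
mean_diff, p_value = paired_sign_flip_test(scores_a, scores_b)
print(mean_diff, p_value)
```

With n paired folds the smallest attainable two-sided p-value is 2 / 2**n, so at least six folds are needed before a result can fall below the conventional 0.05 threshold; exhaustive enumeration is practical only for small n, beyond which random sign sampling would be the usual substitute.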
Conclusion
This work presented an integrated pipeline for reconstructing and analyzing Bedoyo Majapahit dance motion that combines PCA-based dimensionality reduction, a family-aware GA for hyperparameter search, and sequential modeling with HMM variants. By formulating a normalized tri-metric objective that jointly balances MSE, DTW, and Fréchet distance, the optimization explicitly trades off pointwise fidelity, temporal alignment, and trajectory similarity. Within a shared PCA latent space, the GA explored Gaussian HMM, GMM-HMM, and Multinomial HMM configurations in a fair, comparable setting.
Empirically, the GA consistently identified a GMM-HMM with
The primary contributions include: (i) a family-aware GA that tunes across HMM families within a common latent space; (ii) a tri-metric objective that links local accuracy, temporal alignment, and trajectory geometry; and (iii) an end-to-end pipeline that reliably reproduces high-fidelity motion on traditional dance data. Practically, the GA-tuned GMM-HMM yields trajectories that are both numerically accurate and stylistically faithful, making the approach suitable for rehearsal support, digital gallery curation, and VR/AR experiences where motif integrity is critical. Limitations include the focus on a single repertoire, reliance on 2D latent visualizations rather than detailed 3D joint-level analyses, and an emphasis on deterministic decoding rather than diversity and manifold-coverage metrics; a structured expert choreographic evaluation was also outside the present scope. Future work will consider beat-synchronous and phase-aware alignment metrics, covariance scheduling and sparsity-penalized mixtures to control overfitting, hybrid-emission HMMs with learned gating, human-in-the-loop GA guided by annotated motif boundaries, and deployment/user studies in educational and cultural-heritage contexts. It will also include comparative experiments with modern sequential models such as Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and transformer-based architectures to position the proposed GA–HMM framework among contemporary baselines. Ablation studies will be conducted to analyze the contribution of each system component, including the role of PCA-based dimensionality reduction, the impact of each metric within the tri-metric fitness function, and the optimization advantage of GA compared to grid or random search.
Furthermore, expanding the dataset to include additional Indonesian traditional dances—such as Gambyong, Srimpi, and Pendet—will test the generalization capability of the proposed approach across diverse choreographic structures and stylistic nuances. Overall, the evidence supports GA-tuned GMM-HMM as an effective and replicable baseline for high-quality reconstruction of traditional dance motion, with clear avenues for extension and broader applicability.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
