Computational framework with novel features for classification of foot postures in Indian classical dance

Abstract

There is a growing need to develop and establish Artificial Intelligence (AI) and its subdomains, as computing requirements are increasingly met by emerging hardware technologies. Machine-learning techniques are well suited to learning-based AI applications that are useful in daily life. Further, machine-learning applications can address numerous problems in South Indian Classical Dance (SICD). Nevertheless, these aspects have not yet been addressed thoroughly owing to the vastness of the domain. Moreover, the lack of combined expertise in both SICD and computing aggravates the problem. The automatic identification and annotation of a vital aspect called sthanas (foot postures) are necessary for digitizing, archiving and analyzing the SICD. Hence, in this paper, we propose a framework to classify SICD images based on the foot sthanas. The proposed framework incorporates methods to convert raw data to a curated dataset and to extract principal features that are unique to the various foot postures in classical dance, in order to generate an accurate classification. Among the different techniques that were evaluated, Naive Bayes, trained with the domain-specific features, outperformed all other classification models. The methodology followed in this work can be applied to various national and international dance forms with proper incorporation of their domain-specific features.

Keywords

Sthanas, classical dance, classification, digitization, foot postures, CNN, semantic segmentation, Naive Bayes

1. Introduction

Figure 1.

General postures of six sthanas.

The recent initiatives towards the digitization of art [1] have alleviated the difficulties in preserving the culture of art forms through archiving. Indian classical dance, or Shastriyanritha [2], is one of the widely recognized art forms, and works like Natya Shastra [3], Abhinaya Darpana, and Thandavalakshana have expounded its theories and practices lucidly. The digitization of these data opens up possibilities for computational analysis, since the domain has considerable potential. For example, different computational tools can be used to analyze the concept of Karanas in South Indian Classical Dance (henceforth SICD). A Karana consists of three elements: Sthanas (static postures of the feet), Nritha Hasthas (movements of the arms), and Chaaris (movements of the legs and feet). This work addresses the automatic identification of sthanas from images and video segments, which by itself poses a problem that requires an efficient computational model for analysis. The set of sthanas varies slightly with each dance form, even though Natya Shastra and Abhinaya Darpana have defined the static foot postures for SICD as a whole. Although the sthanas are primarily defined in terms of the lower body and the legs, the entire body posture of the dancer marks their features. In this paper, we consider the sthanas as they figure in the Kuchipudi dance form, in accordance with the definitions of Natya Shastra. In Natya Shastra, sthanas are defined according to their geometrical peculiarities as Vaishnava, Samapada, Vaishakha, Mandala, Alidha, and Prathyalida. Hence, this work considers these six classes of sthanas for classification and analysis. Fig. 1 shows the general postures of these sthanas.

We adopted a pipeline of computational processes like Digital Image Processing, Computer Vision, and Machine Learning techniques in order to classify SICD images according to the sthanas of the dancer.

The primary challenge for the work was the unavailability of a dataset for the SICD sthanas. Initially, the image processing and computer vision experiments were performed on images available on the internet. Gradually, we created our own dataset for the SICD sthanas, which consists of around 500 images of 6 sthanas. These images were collected from dance videos available on YouTube and from the archives of different dance schools.

Image processing and computer vision techniques were used to extract the required features from the images for learning. The unavailability of an efficient segmentation algorithm for properly separating the dancer from the background was a challenge at the image processing stage. Nevertheless, a deep learning-based semantic segmentation method helped to overcome this difficulty and obtain a good segmentation result. The image processing stage was completed using existing algorithms, and the sthana-based features were successfully extracted from the dataset of SICD sthanas.

Image classification is a widely known machine-learning problem. Supervised learning defines a set of target classes and builds a model that maps an input image to its class label. This paper describes the development of a classification model for the sthanas dataset. The CNN is the state-of-the-art method for image classification, and hence we first attempted to build a CNN model for the sthanas dataset. However, the dataset was too small, and the performance of the CNN was insufficient. As a result, in this work, we performed identification and extraction of the features followed by their classification. Further, different classification models such as SVM, KNN, and Naive Bayes were applied to the SICD sthanas dataset to obtain better classification accuracy.

2. Literature review

This section offers a brief review of different works that elucidate the theoretical aspects of the posture recognition in the dance forms. Computational models that are used in the field are also exhaustively studied.

Protopapadakis [4] evaluates classification techniques for the recognition of dance types from motion-captured human skeleton data. The authors used Microsoft Kinect 2 for sensing motion in traditional folk dances. The device is capable of capturing depth, video, and voice, and the authors make use of the depth data for real-time skeleton tracking. The overall architecture explained in the work consists of 3 stages: capturing of dance poses, identification of key poses, and classification of poses. The authors investigated classification techniques such as KNN, Naive Bayes, Discriminant Analysis, Classification Tree, Ensemble Methods and SVM in order to classify postures, and identified KNN and Random Forest as the best performing classifiers among them. Nagata et al. [5] used motion capture to detect motions in Latin dance forms and used the outcome to synthesize dance animations. Ofli et al. [6], in turn, proposed a Hidden Markov Model (HMM) to determine the dance figures that are present in a video recording. Their work captures the dynamics of the motion pattern of the body and synthesizes the dance figures. Further, Saha et al. [7] proposed a gesture recognition algorithm that uses a Kinect sensor as the best method for Indian Classical Dance forms. They extracted altogether 23 features using this framework, based on the distances between different parts of the upper human body. The velocity and acceleration generated during the performance, along with the angles between the different joints, were put to use in the analysis and interpretation of Indian Classical Dance. According to Partha [8], advanced sensor technology has led to the development of affordable multimedia cameras like Kinect that can detect and track various human movements that are synchronized with the audio streams. Incidentally, hardly any work has been done on the computational analysis of dance forms so far, and none of the proposed methods can be applied to the image data analyzed in this study.

Kannan et al. [9] propose a tutoring system for learning dance and summarize their previous work on modeling, extracting, querying and annotating the expressive semantics of a dance video, as well as its archiving and retrieval. Sangeetha [10] developed a computational model for Bharatanatyam choreography, which identifies and classifies different angalakshanas in the dance form. Hassan et al. [11] applied a method based on the Multiple Kernel Learning (MKL) algorithm to annotate a novel set of dance images using texture-based features. This annotation method gave promising results and was tested along with SIFT. Samanta et al. [12] suggested a new way to represent the dance poses in a video using a pose descriptor (action descriptor) based on the Histogram of Oriented Optical Flow (HOOF). This further led to the development of a classification model for dance videos. Again, as Indian Classical Dance uses expressive gestures called mudras that enhance the visual mode of communication with the audience, Mozarkar and Warnekar [13] called attention to the need for computer-aided recognition of Bharatanatyam mudras in order to depict the salient features of the static double-hand mudra image. Fukayama and Goto [14], drawing on a large amount of consumer-generated dance-motion data obtained from the web, suggested a probabilistic model that maps beat structures to dance movements using a Gaussian process. Jadhav et al. [15] recommended a stick figure representation for the automated choreography of dances, and Nelson Yalta advocated a sequential learning model for dancing robots. The model generates quasi-realistic patterns of dance moves without being constrained by the accompanying music. Karavarsamis et al. [16] defined a dance step as the shortest possible extract of bodily motion that can be uniquely identified as a particular repetitive movement through time, and proposed a classifier for the primitives of Salsa dance. Sicchio [17] put forward a machine-learning algorithm for layering the choreographic process. According to this model, the system creates live choreographic scores from a collection of time-lapsed photos using the t-SNE algorithm. King [18] developed a database for K-pop dance and designed an efficient Rectified Linear Unit (ReLU)-based Extreme Learning Machine Classifier (ELMC) with features extracted from Fisher-dance. Shubhangi [19] proposed an algorithm to archive the poses in the Indian Classical Dance domain using the SVM. Kishore [20] classifies images of hand mudras in various classical dances using HOG features and the SVM. Further, Kishore [21] developed a new model using the discrete wavelet transformation of local binary patterns that helps in the segmentation of dance images, and also compared the classification accuracy of AGM with that of SVM. Finally, P. Kishore classified the acts in Indian classical dance using a powerful artificial intelligence tool known as Convolutional Neural Networks (CNN). Different CNN architectures were designed and tested with our data to obtain better accuracy in recognition [22]. This exhaustive literature review shows that academic discourse has not yet addressed the proposed problem of classifying SICD images by sthanas.

3. Preliminaries

3.1 Hu moment

In the field of image processing, weighted averages of the image pixel intensities, called moments, or functions of such moments, are commonly used as invariant features for representing objects. Consider a digital image $f(x,y)$ of dimension $M\times N$. The 2D moment of order $(p+q)$ of such an image is defined as

$\displaystyle m_{pq}=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}x^{p}y^{q}f(x,y)$

where $p,q=0,1,2,3,\ldots$ The central moment corresponding to the 2D moment of order $(p+q)$ is defined as

$\displaystyle\mu_{pq}=\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y)$

where,

$\displaystyle\bar{x}=\frac{m_{10}}{m_{00}}\quad\text{and}\quad\bar{y}=\frac{m_{01}}{m_{00}}$

The normalized central moment is

$\displaystyle\eta_{pq}=\frac{\mu_{pq}}{\mu_{00}^{\gamma}},\quad\gamma=\frac{p+q}{2}+1$

The set of seven moments derived from the second- and third-order normalized central moments, which are invariant to translation, scale change, mirroring (up to a sign change in $\phi_{7}$) and rotation, are known as the Hu moments and are defined as

$\displaystyle\phi_{1}=\eta_{20}+\eta_{02}$

$\displaystyle\phi_{2}=(\eta_{20}-\eta_{02})^{2}+4\eta_{11}^{2}$

$\displaystyle\phi_{3}=(\eta_{30}-3\eta_{12})^{2}+(3\eta_{21}-\eta_{03})^{2}$

$\displaystyle\phi_{4}=(\eta_{30}+\eta_{12})^{2}+(\eta_{21}+\eta_{03})^{2}$

$\displaystyle\phi_{5}=(\eta_{30}-3\eta_{12})(\eta_{30}+\eta_{12})[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}]+(3\eta_{21}-\eta_{03})(\eta_{21}+\eta_{03})[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}]$

$\displaystyle\phi_{6}=(\eta_{20}-\eta_{02})[(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}]+4\eta_{11}(\eta_{30}+\eta_{12})(\eta_{21}+\eta_{03})$

$\displaystyle\phi_{7}=(3\eta_{21}-\eta_{03})(\eta_{30}+\eta_{12})[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}]-(\eta_{30}-3\eta_{12})(\eta_{21}+\eta_{03})[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}]$
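The definitions above can be checked numerically with a short sketch. The following pure-Python code is only an illustration, not the implementation used in this work: the function names, the helper `place`, and the toy L-shaped image are assumptions introduced here, and the image is stored as `img[y][x]`.

```python
def raw_moment(img, p, q):
    """m_pq = sum over x, y of x^p * y^q * f(x, y), with img[y][x] = f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_moments(img):
    """Return [phi_1, ..., phi_7] computed from normalized central moments."""
    m00 = raw_moment(img, 0, 0)
    xb = raw_moment(img, 1, 0) / m00
    yb = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        mu = sum(((x - xb) ** p) * ((y - yb) ** q) * v
                 for y, row in enumerate(img)
                 for x, v in enumerate(row))
        # mu_00 equals m_00, and gamma = (p + q) / 2 + 1
        return mu / (m00 ** ((p + q) / 2 + 1))

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03  # sums that recur in phi_4 .. phi_7
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = a ** 2 + b ** 2
    phi5 = ((n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
            + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2))
    phi6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    phi7 = ((3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
            - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2))
    return [phi1, phi2, phi3, phi4, phi5, phi6, phi7]


def place(shape, height, width, top, left):
    """Paste a small binary shape onto an empty height x width canvas."""
    canvas = [[0] * width for _ in range(height)]
    for dy, row in enumerate(shape):
        for dx, v in enumerate(row):
            canvas[top + dy][left + dx] = v
    return canvas


# Toy check of translation invariance: the same L-shape at two positions.
l_shape = [[1, 0], [1, 0], [1, 1]]
h_a = hu_moments(place(l_shape, 10, 10, 1, 1))
h_b = hu_moments(place(l_shape, 10, 10, 5, 6))
```

The two vectors `h_a` and `h_b` agree up to floating-point rounding, which is the property that makes these moments usable as position-independent shape features.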

3.2 K-Nearest Neighbour (KNN) classifier

The K-Nearest Neighbor (KNN) algorithm is a simple classifier that predicts unknown values by comparing them with the most similar known values. A simple measure of the closeness between two objects $P(p_{1},p_{2},\ldots,p_{n})$ and $Q(q_{1},q_{2},\ldots,q_{n})$ is the Euclidean distance,

$\displaystyle d_{PQ}=\sqrt{\sum_{i=1}^{n}{(q_{i}-p_{i})}^{2}}$

The KNN algorithm finds the K nearest neighbors based on the minimum distance from the query instance to the training samples. After obtaining the K nearest neighbors, the algorithm chooses the class held by the majority of them as the predicted class. In Fig. 2, three classes of data points are shown in red, blue and yellow. The new data point, shown in green, has four neighbors: three in the red class and one in the yellow class. It is therefore assigned to the red class, since the majority of its neighbors belong there.

Figure 2.

KNN classifier.
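The voting rule illustrated in Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration under assumed toy coordinates (the points below are invented to mimic the figure, not taken from the dataset), and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda sample: euclidean(sample[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented 2D points: with k = 4 the query has three red and one yellow neighbor.
train_x = [(1, 1), (1, 2), (2, 1), (2, 2.5), (0, 9), (8, 8), (9, 8), (8, 9)]
train_y = ["red", "red", "red", "yellow", "yellow", "blue", "blue", "blue"]
predicted = knn_predict(train_x, train_y, (1.5, 1.5), k=4)
```

Here `predicted` is `"red"`, matching the situation described for the green point in Fig. 2.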

3.3 Naive Bayes classifier

The Naive Bayes classifier is a simple and powerful classifier based on Bayes' theorem. It computes the probability of membership of a query instance in each of the existing classes and chooses the class with the maximum probability. Mathematically, it is modeled as follows.

To find the membership of a data item $\textit{data}_{j}$ in class $C$, we compute the probability

$\displaystyle P(C|\textit{data}_{j})=\frac{P(C)P(\textit{data}_{j}|C)}{P(\textit{data}_{j})}$

The class of $\textit{data}_{j}$ is given as

$\displaystyle C=\underset{C_{i}}{\text{argmax}}{P(C_{i}|\textit{data}_{j})}$

which is the class with the maximum probability of membership for $\textit{data}_{j}$ .
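As a minimal sketch of this decision rule, the code below assumes a Gaussian likelihood for each feature, which is one common choice for continuous features and is an assumption of this example (the likelihood model used in this work is not specified here). The feature values and function names are invented for illustration; the denominator $P(\textit{data}_{j})$ is omitted because it is the same for every class and does not affect the argmax.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # The small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, x):
    """Return argmax over classes of log P(C) + sum_i log P(x_i | C)."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))


# Invented two-feature samples for two of the six sthana classes.
X = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1], [2.0, 0.4], [2.2, 0.5], [1.9, 0.6]]
y = ["samapada", "samapada", "samapada", "mandala", "mandala", "mandala"]
nb_model = fit_gaussian_nb(X, y)
label_a = predict(nb_model, [1.0, 1.0])
label_b = predict(nb_model, [2.1, 0.5])
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.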

Figure 3.

Different approaches of multi-class SVM.

Figure 4.

BlitzNet architecture.

3.4 Support vector machine (SVM)

The support vector machine (SVM) is a classification tool used for binary classification. An SVM seeks a hyperplane that is characterized by a subset of the data points called support vectors. It is trained with a set of training vectors $x_{i}\in R^{n}$, each labelled with $y_{i}\in\{-1,1\}$. Training is posed as an optimization problem to find the decision boundary $D(x)$:

$\displaystyle D(x)=\underset{w,b,\lambda}{\text{min}}\left(\frac{1}{2}w^{T}w+c\sum_{i=1}^{l}\lambda_{i}\right)$

subject to,

$\displaystyle y_{i}\{w^{T}\phi(x_{i})+b\}\geqslant 1-\lambda_{i},\quad\lambda_{i}\geqslant 0,\quad i=1,2,3,\ldots,l$

where $l$ is the number of observations, $c$ is the regularization parameter, $w$ and $b$ are the weight vector and the bias respectively, $\lambda_{i}$ is the slack variable that handles misclassification, and $\phi(x)$ is the transformation function. For multi-class problems, variants such as one-vs-all or one-vs-one can be used, as shown in Fig. 3.
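To make the objective concrete, the sketch below evaluates it for a given $w$ and $b$, assuming the identity transformation $\phi(x)=x$ (a linear SVM) and taking each slack at its smallest feasible value, $\max(0, 1-y_{i}(w^{T}x_{i}+b))$. It does not solve the optimization problem; the function name and the toy points are assumptions of this example.

```python
def soft_margin_objective(w, b, X, y, c):
    """Return (0.5 * w.w + c * sum(slacks), slacks) for a linear SVM."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Smallest slack satisfying y_i (w.x_i + b) >= 1 - slack_i and slack_i >= 0.
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + c * sum(slacks), slacks


# Two points outside the margin and one point on the wrong side of the boundary.
X = [[2, 0], [-2, 0], [0.5, 0]]
y = [1, -1, -1]
objective, slacks = soft_margin_objective(w=[1, 0], b=0, X=X, y=y, c=1)
```

Only the third point contributes a non-zero slack, so increasing `c` raises the cost of leaving it misclassified relative to the cost of a larger weight vector.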

3.5 CNN and Semantic Segmentation

The Convolutional Neural Network (CNN) is a class of deep neural network. It has an organized multi-layer architecture that uses convolution for extracting features. The CNN performs the classification task through an architecture consisting of several convolution layers, pooling layers, ReLU layers and finally a fully connected layer. In this paper, the CNN was tried for the classification procedure, but it produced poor accuracy as the dataset was too small. Segmentation is one of the important procedures in image analysis. Most of the well-established segmentation algorithms failed when segmenting the dancer from the scene. Semantic segmentation [23, 24] is a method that relies on a deep neural network, where each pixel in an image is mapped to a class label. Figure 4 shows the BlitzNet [25] semantic segmentation architecture that is used in this work.

BlitzNet is trained with the Adam algorithm with a mini-batch size of 32 images. It uses ResNet-50 as a feature extractor, 512 feature maps for each layer in the down-scale and up-scale streams, and 64 channels for intermediate representations in the segmentation branches. BlitzNet300 takes input images of size 300 $\times$ 300 and BlitzNet512 uses 512 $\times$ 512 images.

4. Proposed framework

In this section, the workflow of the proposed classifier framework is described. Data collection and preparation, pre-processing, segmentation, feature extraction, and learning are the different stages of the classifier implementation, as illustrated in Fig. 5. Each of these stages is explained in the following subsections.

Figure 5.

Proposed system architecture.

4.1 Data collection

Procuring the relevant data was a primary requirement and the greatest challenge that we faced while conducting this study. Image processing and machine learning techniques have been unexplored avenues in the SICD domain until now, and no work has been done specifically in relation to the sthanas. This implies that no dataset was available for learning. Hence, the creation of a dataset for SICD sthanas was the priority of this work. After studying the theoretical and practical aspects of sthanas, a large collection of images and videos was examined in order to pick the frames that contain sthanas. These frames were cut out and kept as a labelled dataset. Around 500 images of 6 different sthanas were collected. The sources of the images and videos are different online archives, YouTube, and recordings of dance performances from different institutions.

4.2 Preprocessing

The properties of the data in this raw dataset of sthanas vary widely, owing to the diversity of our sources. All these data had to be brought into a common form for proper processing of the entire dataset. Hence, in the pre-processing stage of the image processing pipeline, each image in the dataset was routed through three stages, as shown in Fig. 6. Data obtained from different sources naturally have different dimensions. In the first stage, images with different dimensions were scaled to a common dimension of 225 $\times$ 300 pixels, which normalized the varied data to a homogeneous size. In the next stage, the normalized image was denoised using a Gaussian filter, as it removes noise quickly while keeping the edges relatively sharp. Finally, histogram equalization spread the image intensities more evenly over the available range, thereby increasing the contrast of the image. Figure 7 shows the preprocessed result of a sample image.
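The last of the three stages can be illustrated with a small pure-Python sketch of histogram equalization on an 8-bit grayscale image stored as a list of rows. This is only an illustration of the technique; the function name and the toy image are assumptions, and the resizing and Gaussian filtering stages are not shown. In OpenCV, the functions `cv2.resize`, `cv2.GaussianBlur` and `cv2.equalizeHist` cover the three stages, although the exact calls and parameters used in this work are not stated.

```python
def equalize_histogram(img, levels=256):
    """Map intensities through the normalized cumulative histogram."""
    flat = [v for row in img for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: there is nothing to stretch
        return [row[:] for row in img]

    # Look-up table that stretches the occupied range to [0, levels - 1].
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[v] for v in row] for row in img]


# A low-contrast toy image whose intensities occupy only 100..130.
low_contrast = [[100, 110], [120, 130]]
equalized = equalize_histogram(low_contrast)
```

After equalization the four intensities are spread over the full 0 to 255 range, which is the contrast increase described above.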

Figure 6.

Preprocessing pipeline.

Figure 7.

Preprocessed result of a sample image.

Figure 8.

Semantic Segmentation input image and segmented image.

Figure 9.

Segmentation procedure.

4.3 Segmentation

In the field of digital image processing and computer vision, segmentation is still a challenging problem. Many scholars have proposed segmentation algorithms based on image features such as pixel intensity value, color, and texture. However, no single algorithm is universally applicable, and not every algorithm suits every image type. In this work, the segmentation of the dancer from the background was crucial for extracting the dancer's geometric and shape-based features, and hence finding a suitable algorithm for proper segmentation was our greatest challenge. In the initial stages, algorithms such as the Watershed algorithm and the GrabCut algorithm were tried, but they failed to produce competent results. Finally, semantic segmentation, which supports both object detection and segmentation, matched our requirements for a suitable method. In general, semantic segmentation [23, 24] is a method that relies on a deep neural network, where each pixel in an image is mapped to a class label. A typical illustration is shown in Fig. 8. This work used a semantic segmentation model implementation called BlitzNet [25] for effectively segmenting the dancer from the image. BlitzNet is a deep architecture that performs both object detection and semantic segmentation in one forward pass, which allows real-time computation. We utilized the human object-detection capability of BlitzNet and segmented the human from the background. However, the segmented result did not completely match the requirements of the feature extraction stage. Hence, a post-processing step was added by modifying the code, which produced a competent segmented result in the form of a binary image. The overall flow of the procedure is shown in Fig. 9.

4.4 Feature extraction

Figure 10.

Geometrical attributes for feature extraction.

As already discussed, according to Natya Shastra, the sthanas are postures of the legs, which are aligned and placed according to the given definitions. From a computational perspective, these positions of the legs yield a set of geometrical attributes for each of the sthanas, as shown in Fig. 10. For example, in the mandala pada position, the posture can be divided vertically into two symmetrical halves, whereas this symmetry is not found in positions like alidha and prathyalida. Based on such observations, this paper proposes eight sthana-specific geometric features. Among the eight features, six correspond to ratios between different length measures obtained from the contour of the dancer in the segmented image, and two correspond to ratios between areas measured from the lower body. In the feature-extraction procedure, the input is the segmented image, $I_{m}$ . The initial step is to find the contours in the input image. Contours are curves that join all the continuous points along a boundary that have the same intensity. They are useful tools in shape analysis, object detection and object recognition. After marking the contours, the contour with the second largest area is chosen to define the silhouette of the dancer, because the contour with the largest area is the boundary of the entire image. The next step is to identify the rectangular boundary $<x,y,w,h>$ of the obtained contour and to find the contour moment center $(c_{x},c_{y})$ .

Further, using this moment center, the boundary rectangle is divided into four sub-rectangular regions: Left Upper Rectangle (LUR), Left Lower Rectangle (LLR), Right Upper Rectangle (RUR), and Right Lower Rectangle (RLR). The geometrical attributes of these sub-regions, such as their dimensions and areas, are very significant for the sthana-based features. From these sub-regions, four dimensions can be calculated: upper height (UH), lower height (LH), left width (LW), and right width (RW). By taking the ratios of different combinations of these dimensions, six features are extracted. Two area-based features are also extracted by taking the ratios of the white and black region areas in LLR and RLR.

Algorithm 1. Feature extraction

function $N_{w}(X)$ (get count of white pixels)
  $\textit{wcount}\leftarrow 0$
  for each pixel in $X$ do
    if $\textit{pixel}=1$ then $\textit{wcount}\leftarrow\textit{wcount}+1$
  return $\textit{wcount}$

function $N_{b}(X)$ (get count of black pixels)
  $\textit{bcount}\leftarrow(\textit{X.height}*\textit{X.width})-N_{w}(X)$
  return $\textit{bcount}$

function ExtractFeature( $I_{m}$ ) (extraction process)
  $C\leftarrow$ set of contours of $I_{m}$
  $c\leftarrow c_{i}$ , where $c_{i}\in C$ is the contour with the $2^{nd}$ largest area
  $<x,y,w,h>\leftarrow$ rectangular boundary of contour $c$
  $<c_{x},c_{y}>\leftarrow$ moment center of $c$
  $l_{w}\leftarrow c_{x}-x$ , $r_{w}\leftarrow x+w-c_{x}$
  $u_{h}\leftarrow c_{y}-y$ , $l_{h}\leftarrow y+h-c_{y}$
  compute the ratio vector $<r_{1},r_{2},r_{3},r_{4},r_{5},r_{6}>$ as
    $r_{1}\leftarrow\frac{l_{h}}{u_{h}}$ , $r_{2}\leftarrow\frac{r_{w}}{l_{w}}$ , $r_{3}\leftarrow\frac{l_{w}}{l_{h}}$ , $r_{4}\leftarrow\frac{r_{w}}{l_{h}}$ , $r_{5}\leftarrow\frac{l_{w}}{u_{h}}$ , $r_{6}\leftarrow\frac{r_{w}}{u_{h}}$
  compute the area vector $<a_{1},a_{2}>$ as
    $a_{1}\leftarrow\frac{N_{w}(\textit{LLR})}{N_{b}(\textit{LLR})}$ , $a_{2}\leftarrow\frac{N_{w}(\textit{RLR})}{N_{b}(\textit{RLR})}$
  $<h_{1},h_{2},h_{3},h_{4},h_{5},h_{6},h_{7}>\leftarrow$ HuMoment( $I_{m}$ )
  $F\leftarrow<r_{1},r_{2},r_{3},r_{4},r_{5},r_{6}>\cup<a_{1},a_{2}>\cup<h_{1},h_{2},h_{3},h_{4},h_{5},h_{6},h_{7}>$
  return $F$

Apart from these six features, the seven invariant Hu moments are also included in the feature vector to improve the learning accuracy. In total thirteen features are extracted from each segmented image. Algorithm 1 gives the exact steps followed in the feature extraction process. Fig. 11 shows sample data extracted from the dataset using the above procedure. The first six columns in the figure represent the ratio-based features, the next two represent the area-based features, and the remaining seven are the invariant Hu moments. After the extraction process, the data were stored as a comma-separated values (CSV) file.
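The geometric part of the extraction procedure can be sketched in pure Python as follows. This is a simplified illustration rather than the implementation used in this work: it assumes the binary image contains a single silhouette, so the bounding rectangle and moment center are taken directly from the white pixels instead of from a detected contour, and it assumes a non-degenerate silhouette (non-zero heights and widths). The function name, the rounding used to split the lower regions, and the toy mask are assumptions of this example; the Hu moments are computed separately.

```python
def extract_geometric_features(mask):
    """Return the six ratio features followed by the two lower-body area ratios."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v == 1]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    # Bounding rectangle <x0, y0, w, h> and moment center (cx, cy).
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0 + 1, max(ys) - y0 + 1
    cx, cy = sum(xs) / len(pts), sum(ys) / len(pts)

    lw, rw = cx - x0, x0 + w - cx  # left and right widths
    uh, lh = cy - y0, y0 + h - cy  # upper and lower heights
    ratios = [lh / uh, rw / lw, lw / lh, rw / lh, lw / uh, rw / uh]

    def white_black_ratio(x_from, x_to, y_from, y_to):
        cells = [mask[y][x] for y in range(y_from, y_to)
                 for x in range(x_from, x_to)]
        white = sum(cells)
        black = len(cells) - white
        return white / black if black else float("inf")

    # Lower-left (LLR) and lower-right (RLR) regions, split at the center.
    split_x, split_y = round(cx), round(cy)
    a_llr = white_black_ratio(x0, split_x, split_y, y0 + h)
    a_rlr = white_black_ratio(split_x, x0 + w, split_y, y0 + h)
    return ratios + [a_llr, a_rlr]


# Toy 4 x 4 silhouette: narrow upper body, wide stance below.
toy_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
features = extract_geometric_features(toy_mask)
```

For this left-right symmetric toy silhouette the two area ratios are equal, which mirrors the symmetry argument made for the mandala posture; an asymmetric stance would give different values for the two lower regions.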

Figure 11.

Sample data extracted from the dataset.

Figure 12.

Output at different processing stages.

Figure 13.

Confusion matrix for different classifiers.

Figure 14.

Classification summary for different classifiers.

4.5 Classification

The extraction of the features brings us one step closer to accomplishing the main objective of this work, which is building a framework for the classification of the SICD sthanas. The remaining task of identifying the best classifier model required a brute-force strategy: the features extracted from the dataset of SICD sthanas were fed to different classifiers, namely KNN, SVM, and Naive Bayes. After rigorous analysis of the classification results from each classifier, we identified Naive Bayes as the best classifier model for the purposes of this study. The details of the classifier implementation and analysis are elaborated in the next section.

5. Implementation and result analysis

The proposed work was implemented in the Python programming language. Different library packages were used for various purposes: sklearn for machine learning, OpenCV for computer vision and image processing, Yellowbrick and matplotlib for visualization, and other packages like NumPy and pandas for scientific computations. TensorFlow and related packages were also used for executing BlitzNet.

The experiment was first carried out with a CNN, as it is the most prominent tool for image classification. However, as the dataset is small, the CNN failed to give good accuracy. In the hand-crafted feature-based machine learning approach that was developed instead, the first phase consists of the implementation of the image-processing pipeline and feature extraction. The raw images from the dataset were processed at different stages, as already described in the discussion of the proposed framework. These processed images were then used for the extraction of features. A simple illustration of the outputs obtained at various stages is shown in Fig. 12. After the feature extraction was completed, three classifier models, namely SVM, KNN and Naive Bayes, were used for the classification of the data. An analysis of the results obtained from each of the classifier models is discussed below. In the first level of analysis, the entire dataset of SICD sthanas was split into 2 subsets with the ratios 9:1 (90% for training and 10% for testing), 8:2 (80% for training and 20% for testing), and 7:3 (70% for training and 30% for testing) in 3 different trials. Fig. 13 depicts the confusion matrices of the KNN classifier, the SVM classifier, and the Naive Bayes classifier respectively for the first trial.

Accuracy is another parameter used for validating the classifier models. In general, accuracy is a metric that measures the ratio of correct predictions made by the model to the total number of instances that are evaluated. The accuracy obtained for the models is shown in Table 1. From the table, it is clear that the Naive Bayes classifier gives the best accuracy of 82.21%, though this is for a single trial only.

To validate the obtained model, K-fold cross-validation was performed with $K=8$. The accuracy obtained after cross-validation is shown in Table 2.
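The splitting scheme behind K-fold cross-validation can be sketched as follows. The code is a plain-Python illustration, not the routine used in this work; the function names are hypothetical, and the sample count of 500 is taken from the approximate dataset size stated earlier. In practice the samples would normally be shuffled before splitting so that every fold contains all six classes.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs with near-equal test folds."""
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def mean_accuracy(fold_accuracies):
    """Cross-validated accuracy is the mean of the per-fold accuracies."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Eight folds over an assumed 500 samples: each sample is tested exactly once.
folds = k_fold_indices(500, 8)
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

Each model is trained eight times, once per fold, and the reported cross-validated accuracy is the average over the eight held-out folds.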

Table 1

Accuracy obtained from different classifiers

Classifier	CNN (without explicit features)	SVM	KNN	Naive Bayes
Accuracy	52.00%	71.92%	78.95%	82.21%

Table 2

Accuracy obtained from cross validation

Classifier	SVM	KNN	Naive Bayes
Accuracy	78.34%	84.16%	85.95%

Other metrics for evaluating a classifier include precision, recall, f1-score, and support. Precision is the ratio of correctly predicted observations of a class to all observations predicted as that class. High precision indicates a low false positive rate. Recall is the ratio of correctly predicted observations of a class to all actual observations of that class. The f1-score is the harmonic mean of precision and recall, and a good f1-score helps to select the best classifier. Support is the number of actual observations of each class in the test set. The classification summaries for the different models are visualized in Fig. 14.
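These definitions translate directly into a computation on a confusion matrix. The sketch below is illustrative only: the function names are hypothetical and the two-class matrix is invented, not taken from Fig. 13.

```python
def per_class_metrics(confusion):
    """confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(confusion)
    report = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(confusion[c]) - tp                       # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn})
    return report


def accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Invented two-class confusion matrix with 10 test samples per class.
toy_confusion = [[8, 2],
                 [1, 9]]
report = per_class_metrics(toy_confusion)
overall = accuracy(toy_confusion)
```

For the first class of this toy matrix, precision is 8/9 and recall is 8/10, which shows how the two measures can differ for the same class.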

6. Conclusion and future scope

Dance is best understood as a coordinated outcome of movements, postures, and expressions. These components of South Indian Classical Dance are well defined and theorized in seminal works like Natya Shastra and Abhinaya Darpana. The Natya Shastra has lucidly defined sthanas as the leg postures of the dancers. Nonetheless, despite the compelling presence of the sthanas in dance images, academic discourse has not produced a competent method to identify them. In order to address this computational problem, this work builds a classification framework for identifying and classifying the sthanas that are present in dance images. The proposed model was implemented as an application of image processing, computer vision, and machine learning. An image dataset containing six classes of sthanas was created for the SICD sthanas. This image dataset was processed through pre-processing, segmentation, and feature-extraction stages, thereby obtaining thirteen specific features of the sthanas. These extracted features were fed to the classifiers for building a classification model. Models like SVM, KNN and Naive Bayes were applied, and Naive Bayes gave the best classification accuracy of 85.95%. In this work, we have considered only sthanas for the purpose of classification. However, there are other concepts like hastha bedhas, mudras, and bhava that can also be analyzed computationally. Further, the features extracted in this work consider only the geometrical attributes of the sthanas. The classification could be done more precisely using deep learning models like CNN, but the size of the dataset is a concern for such methods. However, approaches like transfer learning and data augmentation can be adopted for such small datasets. The methodology followed in this work can be applied to various national and international dance forms with proper incorporation of their domain-specific features.

References

1. Motebennur, Lahkar, Gajakose. Digitisation of culture. 5th Convention PLANNER-2007, Gauhati University, Guwahati, December 7–8, 2007 ©INFLIBNET Centre, Ahmedabad. 2007; (December 2007).

2. Ministry. Indian Culture. Available from: https://indiaculture.nic.in/dance.

3. BharathaMuni – Translated By: Manomohan Ghosh. Natya Shastra (with English Translations). Asiatic Society of Bengal, Calcutta; 1951. Available from: https://archive.org/details/NatyaShastra.

4. Protopapadakis, Voulodimos, Doulamis, Camarinopoulos, Doulamis, Miaoulis. Dance pose identification from motion capture data: A comparison of classifiers. Technologies. 2018; 6(1): 31. Available from: http://www.mdpi.com/2227-7080/6/1/31.

5. Nagata, Okumoto, Iwai, Toro, Inokuchi. Analysis and Synthesis of Latin Dance Using Motion Capture Data. In 5th Pacific Rim Conference on Multimedia. 2004 November; pp. 566–574. Available from: https://link-springer-com.web.bisu.edu.cn/chapter/10.1007/978-3-540-30543-9_71.

6. Ofli, Erzin, Yemez, Tekalp, Erdem, et al. Unsupervised dance figure analysis from video for dancing avatar animation. In Proceedings – International Conference on Image Processing, ICIP. 2008; pp. 1484–1487.

7. Saha, Ghosh, Konar, Nagar. Gesture recognition from Indian classical dance using kinect sensor. In Proceedings – 5th International Conference on Computational Intelligence, Communication Systems, and Networks, CICSyN 2013. 2013; pp. 3–8.

8. Das, Majumdar, Dutt. Analysis and Interpretation of Indian Classical Dance. 2014; 721302.

9. Kannan, Andres, Ramadoss. Tutoring System for Dance Learning. In IEEE International Advance Computing Conference IACC? 9. 2010; pp. 9–11. Available from: http://arxiv.org/abs/1001.0440.

10. Jadhav, Mukundan. A computational model for bharata natyam choreography. International Journal of Computer Science and Information Security. 2010; 8(7): 7–9.

11. Hassan, Chaudhury, Gopal. Annotating dance posture images using multi kernel feature combination. In Proceedings – 2011 3rd National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG 2011. 2011; pp. 41–45.

12. Samanta, Purkait, Chanda. Indian Classical Dance classification by learning dance pose bases. In Proceedings of IEEE Workshop on Applications of Computer Vision. 2012; pp. 265–270.

13. Mozarkar, Warnekar, DCS. Recognizing bharatnatyam mudra using principles of gesture recognition. International Journal of Computer Science and Network. 2013; 2(4): 46–53. Available from: http://ijcsn.org/IJCSN-2013/2-4/IJCSN-2013-2-4-57.pdf.

14. Fukayama, Goto. Automated choreography synthesis using a Gaussian process leveraging consumer-generated dance motions. In Proceedings of the 11th Conference on Advances in Computer Entertainment Technology – ACE ’14. 2014; pp. 1–6. Available from: http://dl.acm.org/citation.cfm?doid=2663806.2663849.

15. Jadhav, Aras, Joshi, Pawar. An Automated Stick Figure Generation for BharataNatyam Dance Visualization. In Proceedings of the 2014 International Conference on Interdisciplinary Advances in Applied Computing – ICONIAAC ’14. 2014; pp. 1–8. Available from: http://dl.acm.org/citation.cfm?doid=2660859.2660917.

16. Karavarsamis, Ververidis, Chantas, Nikolopoulos, Kompatsiaris. Classifying Salsa dance steps from skeletal poses. In Proceedings – International Workshop on Content-Based Multimedia Indexing. 2016-June; pp. 1–6.

17. Sicchio. Layering the choreographic process: Making dance Work with machine learning. In CEUR Workshop Proceedings. 2017; p. 1907.

18. Kim, Kwak. Classification of K-pop dance movements based on skeleton information obtained by a kinect sensor. Sensors (Switzerland). 2017; 17(6).

19. Shubhangi, Tiwary US. Intelligent Human Computer Interaction. 2017; 10688: 67–80. Available from: https://link-springer-com.web.bisu.edu.cn/10.1007/978-3-319-72038-8.

20. Kumar KVV, Kishore PVV. Indian classical dance mudra classification using HOG features and SVM classifier. International Journal of Electrical and Computer Engineering (IJECE). 2017; 7(5): 2537. Available from: http://www.iaescore.com/journals/index.php/IJECE/article/view/8206.

21. Kumar KVV, Kishore PVV, Kumar. Indian Classical Dance Classification with Adaboost Multiclass Classifier on Multifeature Fusion. 2017; 2017.

22. Kishore PVV, Kumar KVV, Kiran Kumar, Sastry ASCS, Teja Kiran, Anil Kumar, et al. Indian Classical Dance Action Identification and Classification with Convolutional Neural Networks. Advances in Multimedia. 2018; 2018.

23. Guo, Liu, Georgiou, Lew. A review of semantic segmentation using deep neural networks. International Journal of Multimedia Information Retrieval. 2018; 7(2): 87–93. Available from: https://doi.org/10.1007/s13735-017-0141-z.

24. Deng, Zhou, Sha, Mori. Recalling Holistic Information for Semantic Segmentation. CoRR. 2016; abs/1611.08061. Available from: http://arxiv.org/abs/1611.08061.

25. Dvornik, Shmelkov, Mairal, Schmid. BlitzNet: A Real-Time Deep Network for Scene Understanding. In Proceedings of the IEEE International Conference on Computer Vision. 2017-Octob; pp. 4174–4182.

Table 1
Accuracy obtained from different classifiers