Abstract
Vision-based sign language recognition has been an open research problem for decades. Many existing sign recognition methods work well under restricted laboratory conditions but fail in real-time scenarios, because extracting manual and non-manual movements from signs whose shapes change constantly is a difficult problem in machine vision and machine learning. To overcome these shortcomings, this paper presents an interactive, real time sign recognition method based on a class level gesture similarity measure (CLGSM) and an Artificial Neural Network (ANN). The method takes sign images as input and first enhances their quality by equalizing the histograms of luminance and contrast. Hand features are then extracted as subunits from the enhanced image using template matching techniques. The extracted features are used to build a neural network, which is trained with different classes of signs. Classification is performed by estimating the class level gesture similarity measure of the input towards each class of signs and images, and the image and sign are classified based on the estimated measure. The result presented to the user is refined iteratively according to the user's actions until the user is satisfied. The method achieves higher sign recognition accuracy and reduces the false classification ratio.
Introduction
Information technology has been adapted to a wide range of problems across industries. Every industry aims to maintain its relationship with its users and customers, and to retain them an organization needs to identify their goals and interests. Once a user's interests are identified, information related to those interests can be generated, for example in web search. Similarly, in video playback, sign recognition can be applied to support multimedia entertainment organizations, and many other applications of sign recognition can be named. In the past, sign recognition was used mainly for learning purposes and by news agencies, but it can also support problems in many other sectors.
Sign recognition is the process of recognizing or classifying the sign produced by a person. For a human observer this amounts to viewing the image, but automatic recognition requires dedicated techniques. As in other image processing tasks, various features have been used to classify signs. Color based approaches are used in some articles but perform poorly because they capture only the luminance feature; for sign recognition, texture and shape features also have to be considered. In addition, algorithms used for sign recognition such as k-nearest neighbours (KNN) rely on a limited set of features, such as texture or color. This increases the false classification ratio, so it is necessary to consider a larger number of features and to measure similarity over several factors.
However many features are considered, identifying the gesture of the hands and fingers plays a vital role. In a given image, the face features can be identified using template matching, color features and so on. Once identified, they have to be separated from the image using subtraction algorithms. The hand gesture features can then be extracted using template matching techniques, and the similarity between different classes can be measured from these features. Even after the features have been extracted, it remains essential to identify the relevant signals and to take all feature values into account. The artificial neural network (ANN) adapts well to many problems, and sign recognition can be performed with it. When the neurons of the ANN are initialized with the hand gesture features, classification can be performed efficiently.
Modern society expects more specific results. To address this, an interactive model can be built. The model receives the result generated by the system and can adjust the parameters, which helps the system produce more accurate recognition. To achieve higher performance, fuzzy based algorithms can be adopted: the geometry of the hand gesture can be used to generate fuzzy rules, and classification can be performed based on these rules. With this aim, an interactive real time gesture similarity based sign recognition method is proposed and discussed in the following sections.
The rest of the paper is organized as follows: Section 2 reviews the related work in this domain, Section 3 describes the proposed CLGSM and its working strategy, Section 4 presents the qualitative and quantitative experimental results of the proposed CLGSM, and Section 5 concludes the paper.
Related works
A number of methods have been discussed for the problem of sign recognition. This section presents approaches related to the problem. In [1], the author explores the application of color and texture features in image mining. The color features are extracted and converted to feature descriptors. The popular Speeded-Up Robust Features (SURF) represent the color and texture features in the form of interest points. The interest points of the image are converted into feature descriptors of dimension 124, and the similarity between images is estimated from these descriptors to perform retrieval.
In [2], the author describes an automatic indexing scheme to support web search. The method extracts text features and image features from the images and, based on the extracted features, identifies the class under which each image has to be indexed. In [3], a texture based mammogram indexing scheme is presented. The method uses both textual features and image features for classification: it extracts the text words and identifies the class of the image based on the radiological information. Indexing is performed based on the reports and the image, which helps to classify the image.
In [4], an efficient approach for the indexing and classification of geographic images is presented. The semantic features, image features and text features are extracted initially. The structure feature of the image is extracted and, based on it, the semantic meanings are identified. In the classification phase, the semantic meanings and structure features are used to measure the similarity between images. In [5], the quantization of features is performed in an ordered manner to identify the distinct features, and the distinct feature descriptors are identified based on similarity estimation. Similarly, in [6], a local binary pattern (LBP) based approach is presented for content based retrieval. The method extracts features to generate local binary patterns, which are indexed and used to measure the similarity between images. Similar images are identified based on this similarity measure.
In [7], the correlation between pixel values is used for classification. The relation between neighbouring pixels is computed from the contrast values of the pixels. The relationship is measured in multiple directions and used to extract similar images. In [8], a tree based indexing scheme is presented. The image features, in coarse and fine form, are indexed at different levels of the tree, which makes the tree hierarchical. This helps to improve the accuracy of image retrieval.
In [9], a code book based algorithm is presented. The method compresses the features into blocks, and vector quantization is used to generate the codes. The generated codes are indexed with the coefficient codes, and the code books are represented by RGB values. Based on the histogram values available in the code book, the method identifies similar images. In [10], a graph based approach is presented for image retrieval. The method splits the image into a number of regions and generates, for each region, a node that contains the values of that region. Two nodes are connected with an edge based on the relationship between their values. Based on this feature, the method identifies the relevant images.
Gesture recognition method based on deep learning [11] proposes a new time based gesture recognition method. By analysing the kinematics of gestures, the features of gestures are extracted and classified using Recurrent Neural Networks and their variants. Sign language recognition using image based hand gesture recognition techniques [12] introduces a software prototype that automatically recognizes sign language to help deaf and speech-impaired people communicate more effectively with each other and with hearing people. Pattern recognition and gesture recognition are developing fields of research. As a significant part of nonverbal communication, hand gestures play a key role in daily life. A hand gesture recognition system provides an innovative, natural and user friendly way of communicating with the computer that is more familiar to human beings.
Deep convolutional neural networks for sign language recognition [13] proposes the recognition of Indian sign language gestures using convolutional neural networks (CNN). Selfie mode continuous sign language video is the capture method used in that work, so that a hearing-impaired person can operate the SLR mobile application independently. Because no datasets of mobile selfie sign language were available, the authors created a dataset with five different subjects performing 200 signs from 5 different viewing angles under various background environments. Real time static hand gesture recognition system prototype for Indonesian sign language [14] proposes a prototype system that recognizes hand gesture sign language in real time. It uses the HSV (Hue Saturation Value) color space combined with skin detection to remove the complex background and create segmented images. Contour detection is then applied to localize and save the hand area. Finally, the SURF algorithm is used to detect and extract key point features, and each hand gesture sign alphabet is recognized by comparison with the user image database.
Vision-based SLR systems [15] have greatly influenced researchers through their robustness and their ability to handle cluttered, dynamic, inhomogeneous environments and variations under different illuminations and occlusions in the feature extraction stage. Many publications have not addressed these issues explicitly; instead, they suggest that the system works well against a homogeneous background, with the common constraint that the signer wears long-sleeved clothes that differ from the signer's skin tone, so that a colour detection method can detect the signer's face and hands. Roussos [16] extracted hand shape features by employing an affine-invariant shape appearance model, which offers a compact and discriminative feature representation of hand configurations. The hand shape features extracted via this model were then integrated to form the subunits. However, this system achieved only around 85% accuracy at subunit level for isolated signs.
In [17], the author introduced a novel Bayesian Parallel Hidden Markov Model (BPaHMM) that combines manual and non-manual features classified using Parallel HMM; it also handles the problem of movement ambiguities using dynamic pruning of clustering with dynamic time warping (DTW). The BPaHMM combines the visual and linguistic transcriptions of the sign lexicon with the extracted manual and non-manual features to form a subunit gesture base. In that work, shape-based and region-based features are considered as subunit-manual spatial feature vectors, while motion trajectories are estimated to form the subunit-manual temporal feature vectors. The facial parameters, namely shape and texture information, are extracted to form the subunit-non-manual feature vectors. Extracting manual and non-manual features using subunit modeling helps to resolve movement ambiguities and occlusions, which are considered the most important issues in real world sign language recognition.
Gesture Recognition and Machine Learning Applied to Sign Language Translation [18] proposes an intelligent system for translating sign language into text. The approach consists of hardware and software. The hardware is formed by flex, contact and inertial sensors mounted on a polyester-nylon glove. The software consists of a classification algorithm based on k-nearest neighbours, decision trees and dynamic time warping. The system is able to recognize static and dynamic gestures and can learn to classify the specific gesture patterns of any person.
Hand Signal Recognition for Handling Surgical Instruments [19] presents a computer vision system that identifies hand symbols and the position of surgical instruments using a Kinect sensor in the operating room. The system finds and locates a hand and its midpoint in a depth image from the Kinect sensor, and is thus able to recognize the hand symbols of a surgeon in order to control a manipulator that picks the right instrument. The system then detects the position and direction of this instrument on a table to direct a robot to pick it up properly.
Hand Sign Recognition for Thai Finger Spelling: an Application of Convolution Neural Network [20] investigates Thai Finger Spelling (TFS), its unique characteristics, the design of automatic TFS recognition, and approaches to handle a key potential issue of TFS. The research designs automatic TFS recognition as a two-stage pipeline: (1) locating and extracting a signing hand in the image and (2) classifying the signing image into a valid TFS sign. The signing hand is located and extracted based on a color scheme and the contour area computed using Green's Theorem. Two approaches are examined for signing image classification: a Convolution Neural Network (CNN) based approach and a Histogram of Oriented Gradients (HOG) based approach. The algorithms discussed above do not produce efficient sign recognition results and introduce a higher false classification ratio.
To overcome these shortcomings, an interactive real time class level gesture similarity based sign recognition method using an Artificial Neural Network is proposed. The contributions of the proposed approach are as follows. The method takes sign images as input and starts by enhancing the image quality. The novelty of the method lies in this stage: the quality enhancement is performed by equalizing the histograms of luminance and contrast. From the enhanced image, the subunit hand features are extracted; these features are used to build a neural network, which is trained with different classes of signs. Classification is performed by estimating the class level gesture similarity measure towards each class of signs and images, and the image and sign are classified based on the estimated measure.
Real time class level gesture similarity measure based sign recognition
The architecture of the proposed real time fuzzy class level gesture similarity based sign recognition algorithm and the functional components of the system are presented in Fig. 1. The algorithm improves the quality of the image by applying histogram equalization to each layer of the image. On the enhanced image, the method performs template matching to identify the face features. The identified face features are removed from the image and the hand unit is extracted. The extracted features are used to train the neural network and to generate the fuzzy rules for the different classes. Finally, at classification, the method estimates the class level gesture similarity measure (CLGSM). The detailed approach is presented below.

Fig. 1. Architecture of the interactive real time class level gesture similarity based sign recognition.
Pre-processing is performed to improve the image quality. First, the input sign image is read and the signals of the R, G and B layers are extracted. Histogram equalization is performed on each layer. The equalized layers are then recombined to produce the improved image, from which the features used for classification are extracted.
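The per-layer equalization described above can be sketched as follows. This is a minimal illustration in plain Python rather than the MATLAB implementation used in the paper; the function names `equalize_channel` and `equalize_rgb`, the 8-bit intensity range and the nested-list image layout are assumptions made for the example.

```python
def equalize_channel(channel, levels=256):
    """Histogram-equalize one 2-D channel of integers in [0, levels - 1]."""
    # Count how often each intensity occurs.
    hist = [0] * levels
    for row in channel:
        for value in row:
            hist[value] += 1

    # Build the cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant channel: there is nothing to spread out.
        return [row[:] for row in channel]

    # Map each intensity so that the output histogram is roughly flat.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in channel]


def equalize_rgb(image):
    """Equalize the R, G and B layers separately and recombine them.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    layers = [[[pixel[k] for pixel in row] for row in image] for k in range(3)]
    r, g, b = (equalize_channel(layer) for layer in layers)
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(image[i]))]
        for i in range(len(image))
    ]
```

Equalizing the three layers independently follows the description above, although it can shift hues; equalizing only a luminance channel is a common alternative that the paper does not describe.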
Gesture feature extraction
In this stage, the method reads the quality improved image and extracts its textures. The extracted texture features are matched against templates of human faces. The features and the template region identified as a face are removed (subtracted) from the image. The subtracted image is retained to produce the hand gesture and unit features, and the hand gesture features are extracted to generate the fuzzy rules. In summary, the algorithm extracts the texture features of the face and the hand, eliminates the face features from the image and retains the hand gesture features, which are converted into a template for matching.
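A rough sketch of the face removal step is given below. The paper does not state which matching score is used, so the sum of absolute differences (SAD) over grey-level intensities is an assumption, as are the helper names `match_template` and `remove_region` and the nested-list image layout.

```python
def match_template(image, template):
    """Slide `template` over `image` and return the best match.

    Both arguments are 2-D lists of grey levels. The score is the sum of
    absolute differences (SAD); lower means more similar. Returns
    ((row, col), score) for the top-left corner of the best window.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def remove_region(image, top_left, height, width, fill=0):
    """Return a copy of `image` with the given rectangle blanked out."""
    r0, c0 = top_left
    out = [row[:] for row in image]
    for r in range(r0, r0 + height):
        for c in range(c0, c0 + width):
            out[r][c] = fill
    return out
```

With a face template, `match_template` would locate the face and `remove_region` would subtract it, leaving the hand region for the later stages. A practical system would normally also threshold the score so that images without a face are left untouched.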
Fuzzy rule generation is performed on the texture features extracted from the various images. The method reads the whole subset of extracted images or textures. For each image texture, the method reads the coordinate points in each row. For each coordinate pixel, the method computes a distance measure to all other coordinate values. For each coordinate, the method computes the minimum and maximum coordinate location. Similarly, for each distinct coordinate value, a set of minimum and maximum values is computed. The computed values are used to generate the fuzzy rule.
The algorithm described above generates fuzzy rules on the texture coordinates of the hand gesture. The generated fuzzy rules are used to perform sign recognition.
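One possible reading of the minimum and maximum computation is sketched below. It assumes that each hand gesture is available as a binary mask and that a rule for a class is simply one column interval per row, merged over all training masks of that class; the names `row_extents`, `build_rule` and `rule_agreement` are hypothetical, and the crisp interval check stands in for a fuzzy membership function that the paper does not specify.

```python
def row_extents(mask):
    """Per row, return (min_col, max_col) of the on-pixels, or None if empty."""
    extents = []
    for row in mask:
        cols = [j for j, value in enumerate(row) if value]
        extents.append((cols[0], cols[-1]) if cols else None)
    return extents


def build_rule(masks):
    """Merge the row extents of all training masks of one class."""
    rule = [None] * len(masks[0])
    for mask in masks:
        for i, ext in enumerate(row_extents(mask)):
            if ext is None:
                continue
            if rule[i] is None:
                rule[i] = ext
            else:
                rule[i] = (min(rule[i][0], ext[0]), max(rule[i][1], ext[1]))
    return rule


def rule_agreement(mask, rule):
    """Fraction of rows whose extent lies inside the rule interval."""
    hits = 0
    for ext, bounds in zip(row_extents(mask), rule):
        if ext is None or bounds is None:
            # Rows agree only if both are empty.
            hits += ext is None and bounds is None
        elif bounds[0] <= ext[0] and ext[1] <= bounds[1]:
            hits += 1
    return hits / len(rule)
```

For example, two training masks whose first rows span columns 1 to 2 and 0 to 1 yield the interval (0, 2) for that row, and a new mask agrees with the rule on that row only if its on-pixels stay within columns 0 to 2.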
The neural network is trained on the features and hand gestures generated in the previous stages. The proposed approach uses an ANN together with the CLGSM because an ANN is otherwise trained directly on raw data, whereas the proposed architecture works in three steps. First, the method reads the image set and performs pre-processing and feature extraction on each image. Second, it generates the fuzzy rules from the extracted texture features. Third, it initializes the neural network with a number of layers and neurons, and the neurons of each layer are initialized with the extracted features. The neurons compute the class level gesture similarity measure for each class. Based on the value of the CLGSM, the method classifies the images in order to index them. The algorithm thus indexes the sign images based on the estimated class level gesture similarity measure, and the grouped images are used to perform sign recognition.
The class level gesture similarity measure is estimated using the gesture images produced in the feature extraction stage. For the given gesture images, the method extracts the coordinate points of the image. Then, for each level (row) of the image, the method counts the coordinates that are on and off. Using these values, the method estimates the number of true on and true off coordinates, from which it computes the CLGSM.
In this stage, the gesture images are given as input and the corresponding feature coordinates are extracted. The fuzzy rule is framed according to the extracted coordinate points with respect to true features and false features. The similarity measure is estimated as the ratio of truly predicted features and falsely identified features to the total number of extracted features. The class level gesture similarity is obtained by computing the ratio of the total number of similar gestures to the total number of extracted features. The algorithm described above estimates the gesture similarity measure at each level of the feature, and the estimated measure is used to perform classification.
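The description above leaves the exact formula open, so the following sketch shows only one plausible interpretation. It assumes binary gesture masks of equal size, counts a coordinate as "true on" when it is on in both the input and a class template and as "true off" when it is off in both, and averages the resulting agreement ratio over the templates of a class. The names `gesture_similarity` and `clgsm` are hypothetical.

```python
def gesture_similarity(mask, template):
    """Share of coordinates on which two binary masks agree."""
    true_on = true_off = total = 0
    for mask_row, template_row in zip(mask, template):
        for m, t in zip(mask_row, template_row):
            total += 1
            if m and t:
                true_on += 1        # on in both input and template
            elif not m and not t:
                true_off += 1       # off in both input and template
    return (true_on + true_off) / total


def clgsm(mask, class_templates):
    """Average similarity of `mask` to all templates of one class."""
    scores = [gesture_similarity(mask, t) for t in class_templates]
    return sum(scores) / len(scores)
```

Under this reading the measure lies between 0 and 1, and a value of 1 means that the input matches every template of the class exactly.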
The proposed sign classification algorithm reads the input image and extracts the hand gesture features from it. The extracted features are fed through the designed neural network, in which each neuron estimates the class level gesture similarity measure. The CLGSM is measured towards each class of sign. Finally, the class with the highest CLGSM is used to perform recognition. The classified signs and pictures are presented to the user. The system collects the user's feedback on the result and, based on it, repeats the recognition iteratively.
The algorithm described above estimates the class level gesture similarity measure for each class of sign. Finally, the single sign with the maximum similarity measure is selected as the result.
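The selection of the class with the maximum measure, together with the interactive refinement, might look like the sketch below. The paper does not specify how user feedback changes the result, so offering the next best class after a rejection is an assumption, as are the function names and the `is_accepted` callback that stands in for the user.

```python
def rank_classes(scores):
    """Order class labels from highest to lowest CLGSM score."""
    return sorted(scores, key=lambda label: scores[label], reverse=True)


def recognize_interactively(scores, is_accepted):
    """Offer candidate signs in rank order until the user accepts one.

    `scores` maps each class label to its CLGSM value for the input image.
    `is_accepted` is a callback that returns True when the user is
    satisfied with the proposed label. Returns the accepted label, or
    None if every candidate was rejected.
    """
    for label in rank_classes(scores):
        if is_accepted(label):
            return label
    return None


# Example: the user rejects the first proposal and accepts the second.
scores = {"hello": 0.62, "thanks": 0.91, "yes": 0.74}
result = recognize_interactively(scores, lambda label: label != "thanks")
```

Without feedback, the first element of `rank_classes(scores)` is the single sign with the maximum similarity measure.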
The proposed real time class level gesture similarity measure based sign recognition algorithm has been implemented in MATLAB. The algorithm has been evaluated on various parameters and with different numbers of classes, and the results obtained are discussed below. The list of commands and gestures is presented in Table 1, and the details of the data set used for the evaluation are presented in Table 2. In addition to the content of Table 1, the method has been validated on different categories of gestures such as words, numbers, actions, habits and emotions.
Table 1. Command and gesture
Table 2. Details of data set
Table 3. Comparison on sign recognition accuracy with varying signs

Fig. 2. Comparison on sign recognition accuracy.

Fig. 3. Performance analysis on recognition accuracy towards a) number, b) alphabet, c) word and d) emotion.
The accuracy of sign recognition under a varying number of signs has been measured for the different methods as the number of correctly identified signs divided by the total number of testing samples. The results produced by the various methods at different numbers of signs have been compared with the results of the proposed class level gesture similarity algorithm. The proposed real time CLGSM algorithm achieves a recognition accuracy of up to 98%, which is higher than that of the other methods. The proposed approach has been evaluated on the standard benchmark dataset ASLLVD, which consists of 3300 individual signs performed by six signers. The proposed CLGSM algorithm produces higher classification accuracy than the other methods, and the results are presented in Table 3 and Fig. 2. The CLGSM outperforms the other methods in all validations because of the fine-tuned data given as input to the network.
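The accuracy definition used here, correctly identified signs divided by the total number of testing samples, can be written as a short helper. The function name `recognition_accuracy` and the sample labels are illustrative only and are not taken from the reported experiments.

```python
def recognition_accuracy(predicted, actual):
    """Correctly identified signs divided by the number of test samples."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels: three of the four test signs are recognized correctly.
accuracy = recognition_accuracy(["A", "B", "C", "A"], ["A", "B", "A", "A"])
```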
The sign recognition accuracy on different feature categories, namely numbers, alphabets, words and emotions, has been measured for each method. The results produced by the methods are compared in Table 4 and plotted in Fig. 3. The proposed algorithm produces higher recognition accuracy than the other methods because the quality of the features is enhanced.
Table 4. Comparison on sign recognition accuracy on different features
Table 5. Comparison on sign recognition accuracy on Traffic and Sports
The sign recognition accuracy on different domains, namely sports and traffic symbols, has been measured for each method. The results produced by the methods are compared in Table 5 and shown in Fig. 4. The proposed algorithm produces higher recognition accuracy than the other methods.

Fig. 4. Performance analysis on recognition accuracy towards traffic symbols and sports symbols.
The false classification ratio in sign recognition under a varying number of signs has been measured for the different methods and compared with the results of the proposed CLGSM algorithm. In each case, the proposed CLGSM algorithm produces a lower false classification ratio than the other methods, as tabulated in Table 6. The false classification ratio of the proposed CLGSM algorithm is up to 4%, as illustrated in Fig. 5.
Table 6. Comparison on false classification ratio
The false ratio produced by the proposed method in sign recognition is thus lower than that of the other methods. The time complexity of the recognition process has also been measured for the different methods under a varying number of signs and compared with that of the proposed CLGSM algorithm. The proposed CLGSM algorithm has a lower time complexity than the other methods; the comparative analysis is presented in Fig. 6. From this result, it is inferred that, with fine-tuned data, the proposed approach recognizes sign images faster than the existing approaches. The proposed approach does not require additional processing of the data during validation; hence the error rate and the complexity are reduced, which in turn increases accuracy and performance.
Comparison on time complexity

Fig. 5. Performance analysis on false classification ratio.

Fig. 6. Comparison on time complexity estimation.
Conclusion
In this paper, an efficient real time fuzzy class level gesture similarity based sign recognition method has been presented. The method first receives the image and pre-processes it to enhance its quality by applying histogram equalization to each layer of the image. The face features are then removed from the quality improved image, and the segmented image is used to extract the hand gesture features. The extracted features are used to build the neural network and to generate the fuzzy rules. Using all of these, the method estimates the class level gesture similarity measure, which is used to perform sign recognition. The method achieves higher sign recognition accuracy and reduces the false classification ratio. As future work, the proposed approach will be applied to the analysis of human actions in surveillance and can be used as a natural gesture interface.
