Tongue print identification using deep CNN for forensic analysis

Abstract

The need for newer biometric traits is increasing, as conventional biometric systems have been found to be vulnerable to forgery. Tongue print is gaining importance as a biometric trait, especially in forensics. The tongue is a well-protected vital organ that exhibits rich structural patterns. The success of tongue print as a biometric tool depends on how well discriminating features are extracted from it. Advances in deep neural networks and the availability of high-end computing environments have enabled remarkable progress in image recognition. A CNN learns hierarchically, extracting feature maps that characterize the training data. However, obtaining a tongue-print dataset large enough to train a CNN for recognition poses a huge challenge. Two alternative techniques allow a CNN to be used for recognition without such a dataset. The first fine-tunes a pre-trained CNN model, used as a classifier, with the new input dataset and class labels to perform tongue-print image recognition. The second uses a pre-trained CNN model as a feature extractor and passes the features extracted from the input tongue dataset to a state-of-the-art classifier for recognition. In this paper, we address three important factors regarding the deployment of tongue print as a biometric tool. Since no tongue-print dataset is publicly available, our first objective was to create a challenging tongue-print dataset. We then explored and evaluated different state-of-the-art CNN architectures for image recognition. These models vary in architecture and contain 5 million to 144 million parameters. Finally, we analyzed different approaches to using the pre-trained CNN models for the tongue-print identification task.

Keywords

Tongue print, biometric identification, CNN, support vector machine, forensics

1 Introduction

A biometric system should support identification, authentication and non-repudiation in information security. Conventional biometric systems fail to meet these requirements because they can be forged. Hence, tongue prints are gaining importance in biometric authentication as a new biometric trait. The tongue is a unique vital organ, and its characteristic features exhibit remarkable differences even between identical twins. In traditional Chinese medicine, the tongue plays an important role in diagnosing disease conditions through observation of characteristics such as colour and shape. Few studies have been conducted in the field of tongue print identification. Zhi Liu et al. [1] built a tongue database and, based on their analysis, concluded that the tongue print can be used for personal identification. Li Q et al. [2] and Manoj Diwakar et al. [3] further studied tongue print images to assess the possibility of using tongue prints as a biometric trait. They also proposed different methods for creating a tongue database.

Omer et al. [4] showed, using the tongue's cross-section, that the tongue has different characteristics even in identical twins, and stated that the tongue print image can be used as a new biometric trait for human identification. Radhika et al. [5] compared tongue prints with other biometric traits and highlighted their advantages over other biometric tools. Bob Zhang and Han Zhang [6] used geometric features extracted from the tongue print images of both healthy and unhealthy humans to study a patient's condition. Stefanescu et al. [7] reported a classification of tongues based on an analysis of morphological features.

Salim Lahmiri [8] used six statistical features for tongue print verification, extracting textural features from the tongue print image using a wavelet transform. Manoj Diwakar et al. [3] used histogram features extracted from tongue print images for human identification. Ryszard S. Choras [9] proposed steerable filters combined with the Weber Law Descriptor feature for identification. Zhang et al. [10] used both shape and textural features for identification, with geometrical features for shape characterization and textural codes as the textural features. Jeddy et al. [11] explained the use of tongue print as a method of biometric authentication for personal identification. Sivakumar et al. [12] studied the textural patterns of the tongue by extracting Local Binary Pattern (LBP) features and trained a linear Support Vector Machine (SVM) on the extracted features for personal identification.

1.1 Tongue-print as biometric

For a number of reasons, the tongue print can be considered a reliable biometric trait. First and foremost, the tongue print is fully formed during fetal development. Studies by Omer et al. [4] have shown that the tongue has different characteristics even in identical twins. Since the tongue is well protected inside the mouth, it is not affected by external factors, which makes it a reliable tool for forensic studies.

The study group comprises 180 randomly selected individuals of both genders, aged 17 to 35 years, who volunteered to participate and gave informed consent. The initial objective of our work is to build a challenging dataset of tongue-print images. Images of the dorsal tongue were captured under standardized lighting conditions using a SONY-WX 350 Compact Camera with 20× Optical Zoom (DSC-WX350), with fixed head position and tongue protrusion and a constant subject-to-camera distance. For each individual, 5 different images of the dorsal surface of the tongue were captured with different orientation, scale and shape. Our dataset therefore consists of a total of 900 images (180 individuals × 5 images) with resolution 4896 × 2752. To preserve generality, we did not pre-process the images to segment the dorsal surface of the tongue. Table 1 gives the description of the tongue-print dataset used for personal identification. Fig. 1 shows example tongue-print images from the Tongue DB dataset.

Table 1
Description of the tongue-print datasets used for the proposed method

Dataset Name	Sensor Type	Resolution (dpi)	Image Size	No. of Individuals	No. of Images
Tongue DB	CMOS Sensor "Exmor R"	300	4896 × 2752	180	900

Fig. 1

Examples from the Tongue DB dataset: eight tongue-print images from 4 different individuals that vary in scale and orientation.

The success of a tongue print image as a biometric tool for human identification depends on how well the geometric outline and physiological texture information of the human tongue capture its uniqueness [12]. Therefore, a study has been initiated to automatically extract the most suitable features from the dorsum of the tongue for the automatic identification of an individual.

A number of handcrafted feature extraction methods have been proposed in the literature, such as Gabor features, Histogram of Oriented Gradients (HoG), SURF features and the Visual Bag of Words (BoW) framework, and they have proven effective in their own domains. However, these features are hand-engineered, and their low-level semantics may not be well suited to a specific domain like tongue print identification. A method that learns features by itself is a good alternative. For instance, Convolutional Neural Network (CNN) based methods have proven able to learn to extract specific features automatically from the input image. A deep CNN architecture uses a stack of layers to learn features: the initial layers extract local features like blobs and edges, and the final layers extract global features that are used for recognition [13]. Recently, a variety of CNN-based models that learn features automatically for image recognition have been reported in the literature.

The paper is organized as follows: Section 2 explains the methodology used in this paper. Section 3 describes the different approaches used to deploy CNN to the tongue identification. The experimental setup, implementation details and the results are provided in Section 4. Section 5 draws the conclusions.

2 Methodology

Deep CNN methods are capable of learning feature representations that discriminate between different objects in the recognition task. Most of the popular deep CNN models are trained on a large-scale dataset like ImageNet [23]. Many studies [19, 20] have shown that the feature representations generated by pre-trained CNN models also achieve excellent performance in application domains such as classification, recognition and retrieval. However, existing CNN models are computationally expensive to train and have not been tested on a tongue-print dataset. Therefore, we applied pre-trained CNN models in different ways to investigate their performance on the tongue-print application.

2.1 Deep CNN models

We used four of the most popular deep neural network architectures, which vary in computational complexity, number of parameters, depth and representational power, to investigate how well features are extracted from tongue print images for human identification. A general architecture of a CNN is shown in Fig. 2.

Fig. 2

General architecture of a deep CNN model.

AlexNet: The first model used in our evaluation is AlexNet, proposed by Krizhevsky et al. [14], which was the first deep convolutional neural network to win the ImageNet [23] challenge, in 2012. It achieved a recognition accuracy that outperformed all the state-of-the-art techniques of that time in the classical classification and recognition task.

When it was proposed for object recognition, AlexNet was deeper and wider than earlier CNN models. It consists of a total of 8 layers, of which 5 are convolutional and the rest are fully connected. Among its major contributions to CNN design are response normalization and pooling. The first convolutional layer takes an input image of size 227 × 227 × 3 and filters it with 96 filters of size 11 × 11 × 3 with a stride of 4 pixels. The second layer performs the convolution using 256 filters of size 5 × 5 × 48, and the normalized, pooled output is fed as the input to the third layer. The third, fourth and fifth convolutional layers are connected to one another without any normalization or pooling layers; they use 384, 384 and 256 filters of size 3 × 3, respectively. The fully connected layers are used with dropout, have 4096 neurons each and are followed by a softmax layer for classification. The AlexNet architecture has over 60 million trainable parameters.
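The layer sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming no zero-padding in the first layer and one bias per filter; the helper names are illustrative and do not come from the original implementation.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv_param_count(kernel_size, in_channels, num_filters):
    """Trainable parameters of a conv layer: weights plus one bias per filter."""
    return num_filters * (kernel_size * kernel_size * in_channels + 1)


# First AlexNet convolutional layer as described in the text:
# 227 x 227 x 3 input, 96 filters of size 11 x 11 x 3, stride 4.
# Padding of 0 is an assumption of this sketch.
first_layer_output = conv_output_size(227, 11, 4)
first_layer_params = conv_param_count(11, 3, 96)

print(first_layer_output)   # 55, i.e. a 55 x 55 x 96 feature map
print(first_layer_params)   # 34944
```

The first layer therefore accounts for only a small share of the roughly 60 million parameters; most of them sit in the fully connected layers.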

VGGNet: The second deep CNN model used in our experiment is VGGNet, proposed by the Visual Geometry Group (VGG) in 2014 [15]. This model was proposed on the assumption that deeper networks give better recognition accuracy. The initial proposal introduced three variants named VGG-11, VGG-16 and VGG-19. These variants share the same architecture and differ only in depth. VGG-11 contains a total of 11 layers (8 convolutional layers and 3 fully connected layers), while VGG-16 and VGG-19 contain a total of 16 and 19 layers, respectively, with the same 3 fully connected layers.

VGG takes an input image of size 224 × 224 × 3. This input image is processed by a stack of convolutional layers whose depth depends on the variant used. The main change introduced in VGG compared to AlexNet is the use of very small filters of size 3 × 3. The number of filters doubles after each max-pooling layer, starting from 64 in the first layer until it reaches 512. The convolution stride is set to 1 pixel. VGG uses 5 max-pooling layers placed between some of the convolutional layers to perform spatial pooling. Three fully connected layers follow the stack of convolutional layers, and the final layer is a softmax layer. The 1000 output channels of the last fully connected layer are used for classification. VGG-16 has 138 million parameters, whereas VGG-19 has 144 million.
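The parameter totals quoted for VGG-16 and VGG-19 can be reproduced from the layer layout. The sketch below assumes the standard configurations from the VGG paper [15] (3 × 3 convolutions, 2 × 2 max pooling that halves the spatial size, two hidden fully connected layers of 4096 units); the function and variable names are illustrative.

```python
# Integers are the number of 3 x 3 filters in a convolutional layer,
# "M" marks a 2 x 2 max-pooling layer (configurations assumed from [15]).
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def vgg_param_count(config, input_size=224, num_classes=1000):
    """Count weights and biases of the convolutional and fully connected layers."""
    total, channels, size = 0, 3, input_size
    for layer in config:
        if layer == "M":
            size //= 2                      # pooling halves the spatial size
        else:
            total += layer * (3 * 3 * channels + 1)
            channels = layer
    fc_sizes = [channels * size * size, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        total += n_out * (n_in + 1)
    return total


print(vgg_param_count(VGG16))  # 138357544
print(vgg_param_count(VGG19))  # 143667240
```

Calling `vgg_param_count(VGG16, num_classes=180)` gives the count after the last layer is replaced by a 180-way layer, as done for the Tongue DB subjects.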

GoogLeNet: The third deep CNN model we used is GoogLeNet, proposed by Szegedy et al. in 2015 [16, 17]. This model was proposed to reduce the complexity of the training phase. Its main contribution was the introduction of the novel inception architecture. The GoogLeNet architecture consists of a total of 22 layers, which is deeper than AlexNet and VGG, yet it has far fewer parameters and lower computational complexity.

The idea of the inception architecture is to drastically reduce the number of parameters required for convolution. This is done by replacing a larger convolution filter of size n × n with a sequence of two smaller filters of size n × 1 and 1 × n. Overall, the GoogLeNet architecture consists of 2 convolutional layers, 2 pooling layers and 9 Inception layers. Each Inception layer is built from 6 convolutional layers and 1 pooling layer. Unlike the other models, GoogLeNet does not use a stack of fully connected layers; instead, it uses a global average pooling layer, and the activation values of the 1000 output channels are used for image classification.
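The saving from replacing an n × n filter with an n × 1 filter followed by a 1 × n filter can be illustrated by counting weights. This is a minimal sketch that ignores biases and assumes the intermediate feature map keeps the same number of channels as the output; the channel counts in the example are hypothetical.

```python
def full_kernel_weights(n, channels_in, channels_out):
    """Weights of a single n x n convolution."""
    return n * n * channels_in * channels_out


def factorized_kernel_weights(n, channels_in, channels_out):
    """Weights of an n x 1 convolution followed by a 1 x n convolution.

    Assumes the intermediate feature map has channels_out channels.
    """
    return n * channels_in * channels_out + n * channels_out * channels_out


# Hypothetical example: 7 x 7 filters with 64 input and 64 output channels.
full = full_kernel_weights(7, 64, 64)
factorized = factorized_kernel_weights(7, 64, 64)
print(full, factorized, full / factorized)  # 200704 57344 3.5
```

With equal channel counts the ratio is n / 2, so the saving grows with the filter size.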

ResNet: A simple yet powerful network called the residual network (ResNet), which consists of layers with identity-mapping skip connections, was proposed by He et al. [21]. The introduction of the ResNet architecture made it possible to create and train ultra-deep neural networks successfully without the vanishing gradient problem. The skip connection alleviates the vanishing gradient problem by allowing the gradient to flow through an alternative shortcut path. That is, the skip connection adds the outputs of previous layers to the outputs of the stacked layers. This also gives ResNet the ability to train deeper networks. ResNet variants are very deep, with 34, 50, 101, 152 and even 1202 layers.
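The skip connection described above can be sketched in a few lines. The example below is a simplification, assuming small fully connected layers in place of the convolution and batch-normalization layers used in the real ResNet; all names and values are illustrative.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    fx = linear(w2, relu(linear(w1, x)))
    # Skip connection: the block input is added to the stacked-layer output.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) = 0, so the block reduces to the identity mapping.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

Because the input reaches the output unchanged when the stacked layers contribute nothing, the shortcut gives the gradient a direct path back through the block.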

The overall ResNet-50 architecture can be viewed as a connection of 5 residual groups each with a convolution and identity block. Each convolution block has 3 convolution layers and each identity block also has 3 convolution layers. Feature maps computed by different layers in each group share the same resolution. The ResNet-50 has over 25 million trainable parameters. There are several advanced architectures that have been proposed with the combination of Inception and Residual units. The concept of Inception block with residual connections is introduced in the Inception-v4 architecture [22].

Table 2 summarizes the key properties of the deep CNN models we considered for training. The Output Size field in Table 2 specifies the number of output channels in the last fully connected layer of each selected deep CNN model. In our case the output size is 1 × 180, corresponding to the 180 subjects of our Tongue DB dataset.

Table 2

Comparison of the characteristics of the selected deep CNN models

Method	No. of Parameters	Input Size	Output Size	No. of Layers	Type
AlexNet	60M	227 × 227 × 3	1 × 1000	8	Convolutional
VGGNet16	138M	224 × 224 × 3	1 × 1000	16	Convolutional
VGGNet19	144M	224 × 224 × 3	1 × 1000	19	Convolutional
GoogLeNet	5M	224 × 224 × 3	1 × 1000	22	Inception
ResNet	25M	224 × 224 × 3	1 × 1000	51	Residual

3 Training protocols and transfer learning

Because CNN models are computationally expensive and require a large-scale dataset for the recognition task, we propose two approaches to applying a CNN to tongue print image recognition: 1) the first uses a pre-trained CNN model as a classifier and fine-tunes it on our dataset; 2) the second uses the pre-trained model as a feature extractor, and the extracted features are then used to train an SVM classifier for the final classification.

Deep CNN models can be either learned from scratch or fine-tuned from pre-trained models. Training the millions of parameters of deep CNN models like AlexNet and GoogLeNet from scratch, and annotating the enormous number of tongue print images this would require, poses an unmanageable challenge. Studies [19, 20] have shown that CNN models pre-trained on a dataset like ImageNet transfer well to other datasets with some degree of fine-tuning. Therefore, the idea of the first approach is to transfer the weights trained on the large-scale ImageNet dataset to make the tongue print image recognition task more effective.

For this approach, we used the methodology described in [18, 19], in which all CNN layers except the last one are fine-tuned at a learning rate 10 times smaller than the default. The last fully connected layer is replaced with a fresh layer that accommodates the 180 subject labels of our Tongue DB dataset. This layer is randomly initialised and trained afresh.
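The learning-rate scheme of this protocol can be sketched as a per-layer table. The example is a minimal illustration, not the toolbox configuration used in the experiments: the layer names and the base learning rate are hypothetical placeholders.

```python
def build_learning_rates(layer_names, base_lr, new_layer, factor=0.1):
    """Assign base_lr * factor to pre-trained layers and base_lr to the new layer.

    factor=0.1 reflects the "10 times smaller" rule described in the text.
    """
    return {
        name: base_lr if name == new_layer else base_lr * factor
        for name in layer_names
    }


# Hypothetical layer names; "fc_new" stands for the fresh 180-way layer.
layers = ["conv1", "conv2", "conv3", "fc6", "fc7", "fc_new"]
rates = build_learning_rates(layers, base_lr=0.01, new_layer="fc_new")

for name in layers:
    print(name, rates[name])
```

Only the replaced layer learns at the full rate, so the pre-trained filters change slowly while the new classifier adapts to the 180 subject labels.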

Our second approach is to use the pre-trained models as feature extractors. We considered the pre-trained models described in Section 2.1. In this approach, we give our tongue-print image dataset as input and use the unchanged parameters of the pre-trained model to obtain the final activation from the fully connected layer. This activation represents a global generic descriptor of the input image. The advantage of this approach is a shorter training time compared to the first approach, since the training phase does not modify the network parameters. We define this feature extraction process as

$y = f(x)$ (1)

where $x \in \mathbb{R}^{n}$ represents the input image, $f(\cdot)$ represents the selected deep CNN model and $y \in \mathbb{R}^{n'}$ denotes the extracted image descriptor. After the descriptors are computed for the tongue-print image dataset, they are given to a Support Vector Machine (SVM) classifier for the final classification.

3.1 Support vector machine

Given a set of N training samples $\{X_i, y_i\}$, $i = 1, \ldots, N$, where $X_i \in \mathbb{R}^{n}$ belongs to the binary class labeled by $y_i \in \{1, -1\}$, SVM implicitly maps the data into a higher-dimensional feature space and finds a separating linear hyperplane with maximum margin. For a new sample X, the SVM classifier uses the following function to decide its class:

$f(X) = sgn\left[\sum_{i=1}^{N} \alpha_{i} y_{i} K(X, X_{i}) + b\right]$ (2)

where $\alpha_i$ are the Lagrange multipliers of a dual optimization problem that describe the separating hyperplane, $K(\cdot, \cdot)$ is the kernel function and b is the bias term. $sgn(\psi)$ is the sign function

$sgn(\psi) = \begin{cases} 1, & \text{if } \psi \geq 0 \\ -1, & \text{otherwise} \end{cases}$ (3)

The training samples X_i with α_i > 0 are called support vectors, and SVM finds the separating hyperplane that maximizes the margin between the support vectors and the hyperplane. The most frequently used kernel functions are linear, polynomial and Radial Basis Function (RBF).
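The decision rule in Eqs. (2) and (3) can be written out directly. The following is a minimal sketch: the support vectors, multipliers and bias are made-up toy values rather than the solution of a trained SVM, and the kernel hyperparameters (`coef0`, `gamma`) are assumptions of this example.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decide(x, support_vectors, labels, alphas, b, kernel):
    """Eq. (2): sign of the kernel expansion over the support vectors plus bias."""
    score = sum(
        alpha * y * kernel(x, sv)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if score >= 0 else -1          # Eq. (3)


# Toy values for illustration only.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.5, 0.5]

print(svm_decide([2.0, 0.0], support_vectors, labels, alphas, 0.0, linear_kernel))       # 1
print(svm_decide([-2.0, 0.0], support_vectors, labels, alphas, 0.0, polynomial_kernel))  # -1
```

Swapping the `kernel` argument changes the shape of the decision boundary without changing the form of the decision function.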

As maximum margin classifiers, SVMs are designed to solve two-class problems, while tongue print identification is a q-class problem, where q is the number of known individuals. Two approaches can be taken to solve the q-class problem. The first is to reformulate the tongue print identification problem as several separate two-class problems, each separating one class from all the others (one-vs-all). The second is to employ a set of SVMs, one for each pair of classes, to solve the generic q-class recognition problem (one-vs-one) [12]. In this paper, we used the one-vs-all technique, which trains binary classifiers to separate one class from all other classes and outputs the class with the largest posterior probability.
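The one-vs-all scheme can be sketched in two steps: relabel the training data once per class, then pick the class whose binary classifier is most confident. The snippet is a minimal illustration; the subject identifiers and scores are hypothetical, and the binary classifiers themselves are not trained here.

```python
def binary_labels(labels, positive_class):
    """Relabel a q-class problem as one class (+1) against all others (-1)."""
    return [1 if label == positive_class else -1 for label in labels]


def one_vs_all_predict(scores):
    """Return the class whose binary classifier gives the largest score.

    scores maps each class label to that classifier's posterior estimate.
    """
    return max(scores, key=scores.get)


# Hypothetical training labels and classifier outputs for one test image.
train_labels = ["subject_001", "subject_002", "subject_001", "subject_003"]
print(binary_labels(train_labels, "subject_001"))  # [1, -1, 1, -1]

scores = {"subject_001": 0.12, "subject_002": 0.81, "subject_003": 0.07}
print(one_vs_all_predict(scores))  # subject_002
```

For q known individuals this requires q binary classifiers, compared with q(q − 1)/2 for the one-vs-one scheme.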

4 Experimental results and discussions

To test the proposed method, we used the Tongue DB dataset. We divided the dataset such that 60% of the images were used for training and the rest for testing the supervised classifiers. Since each CNN model used in the experiment requires a different input size, the images are automatically resized to the required input size for training. We investigated the performance of the deep CNN architectures described in Section 2.1 on the identification task. For the first approach, we used the default softmax layer for subject identification, while for the second approach we used the SVM classifier.
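One way to realise the 60/40 division is sketched below. It assumes the split is made per subject, so that every individual contributes 3 training and 2 test images; the text does not state how the images were assigned, so this is only one possible protocol. The function name and seed are illustrative.

```python
import random


def split_per_subject(num_subjects=180, images_per_subject=5,
                      train_fraction=0.6, seed=0):
    """Split (subject, image) pairs so each subject appears in both sets."""
    rng = random.Random(seed)
    n_train = round(images_per_subject * train_fraction)
    train, test = [], []
    for subject in range(num_subjects):
        indices = list(range(images_per_subject))
        rng.shuffle(indices)
        train += [(subject, i) for i in indices[:n_train]]
        test += [(subject, i) for i in indices[n_train:]]
    return train, test


train, test = split_per_subject()
print(len(train), len(test))  # 540 360
```

A per-subject split guarantees that every one of the 180 classes is seen during training, which a purely random split of 900 images would not.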

4.1 Experimental setup

The proposed tongue-print identification method is implemented using the MATLAB R2019a Deep Learning Toolbox. We ran the toolbox on a Tesla K80 dual-GPU environment with dynamic NVIDIA GPU Boost technology and the widely used CUDA parallel computing model.

In the first approach, we perform the fine-tuning of a pre-trained model using our tongue dataset as the input training set to train the new subject labels. The learning rate was set to a very low value (0.000003) to avoid large weight updates, preserving the useful features learned by the pre-trained model.

In the second approach, the pre-trained CNN model was used for feature extraction with the network's default weight parameters, using our tongue dataset as the input training set. The output of the last fully connected layer, with an output dimension of 1 × 1000, is taken as the final feature vector for classification. This feature vector is given as the input to the SVM classifier. We have set the total number of support vectors α_i = 3 and used 5-fold cross validation for training the samples. We achieved the best accuracy when a polynomial kernel of degree 2 (quadratic) was used with the SVM.

Table 3 details the results of the two approaches used to investigate the performance of different CNN models in the automatic tongue-print identification system. We achieved the best accuracy of 98.61% for the first approach (fine-tuned pre-trained model) and an accuracy of 96.94% for the second approach (pre-trained model without fine-tuning) in the identification task when the ResNet deep CNN model was used. Though the training time of the first approach is higher than that of the second, the results underline the superior performance of transfer learning for the identification task. The results also show that the ResNet deep CNN model outperformed the other models in both approaches.

Table 3
Comparison of the accuracy with the different deep CNN models used

CNN models used	Accuracy (%)
	CNN (Transfer learning)	CNN + SVM (Pre-trained CNN)
AlexNet	98.33	96.11
VGG16	96.11	92.78
VGG19	96.66	93.33
GoogLeNet	97.50	95.00
ResNet	98.61	96.94
InceptionResNet V2	93.61	93.61

4.2 Performance analysis of CNN models

To evaluate the performance of the proposed method, we compared its accuracy with that of the state-of-the-art techniques proposed in the literature by Salim Lahmiri [8], Zhang et al. [10] and Sivakumar et al. [12] on the same dataset. The comparison is shown in Table 4. The table shows that the proposed method performs better than the other state-of-the-art techniques.

Table 4
Performance comparison with other works reported in the literature

Method	Accuracy (%)
Salim Lahmiri [8]	90.44
Zhang et al. [10]	92.64
Sivakumar et al. [12]	93.13
Proposed Method	98.61

To further evaluate the performance, the False Acceptance Rate (FAR) and False Rejection Rate (FRR) are used. Every biometric system works within a range of operational values, and the system accepts the user as genuine once it outputs a value within this range. A Receiver Operating Characteristic (ROC) curve plots the Genuine Acceptance Rate (1 − FRR) against the False Acceptance Rate for all system operating points. The False Acceptance Rate is computed as

$FAR = \frac{\text{Number of accepted imposter claims}}{\text{Total number of imposter accesses}} \times 100$ (4)

and the False Rejection Rate as

$FRR = \frac{\text{Number of rejected genuine claims}}{\text{Total number of genuine accesses}} \times 100$ (5)

The ideal case for any biometric system is when both the FAR and the FRR are zero. In a real scenario, these two values meet at an operating point on the ROC curve. Fig. 3 shows the ROC curves of the ResNet CNN model, which performs best on our dataset, and of the LBP+SVM model proposed by Sivakumar et al. [12]. Even at very low FARs, the performance of the proposed system is far superior to that of the LBP-based method.

Fig. 3

ROC plot of experimental results on Tongue DB dataset for two different methods.
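Eqs. (4) and (5) and the points of an ROC curve can be computed from matching scores as sketched below. The example assumes that a claim is accepted when its score is at or above a threshold; the scores are made-up values for illustration and are not taken from the experiments.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) in percent for one operating point.

    A claim is accepted when its score is >= threshold (assumption).
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    far = 100.0 * false_accepts / len(impostor_scores)   # Eq. (4)
    frr = 100.0 * false_rejects / len(genuine_scores)    # Eq. (5)
    return far, frr


def roc_points(genuine_scores, impostor_scores, thresholds):
    """(FAR, genuine acceptance rate) pairs, one per threshold."""
    points = []
    for t in thresholds:
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        points.append((far, 100.0 - frr))
    return points


# Made-up matching scores for illustration.
genuine = [0.9, 0.8, 0.75, 0.4]
impostor = [0.1, 0.3, 0.5, 0.2]

print(far_frr(genuine, impostor, 0.45))  # (25.0, 25.0)
print(roc_points(genuine, impostor, [0.0, 0.45, 1.0]))
```

Sweeping the threshold from the lowest to the highest score traces the whole curve; in this toy example the threshold 0.45 is the point where FAR and FRR are equal.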

5 Conclusion

The tongue is a well-protected vital organ within the oral cavity and hence is not vulnerable to forgery. The dorsal surface of the tongue exhibits rich structural patterns that make it possible to use tongue prints as a novel biometric tool in forensic and biometric applications. In this paper, a framework for applying deep CNN models to automatic tongue-print identification is proposed. The framework uses two approaches for the identification task: the first uses a fine-tuned pre-trained deep CNN model as a classifier, while the second uses a pre-trained deep CNN model as a feature extractor. In the second approach, the features extracted by the CNN are used to train an SVM classifier for the final identification. Of the two approaches, the first with the ResNet CNN model achieves the best identification accuracy. The results show the superior performance of deep CNNs for personal identification using tongue prints, and thus the tongue print can be used as a reliable biometric trait.

Acknowledgment

Authors acknowledge the University Grants Commission (UGC) for providing the funding to set up Massive Parallel Processing systems at Department of Computer Science, Cochin University of Science and Technology (CUSAT), under UGC XII plan. (File No. PL.(UGC)1/SPG/2016-17 dated 08.07.2016)

Authors also acknowledge PMS College of Dental Science and Research, Thiruvananthapuram, Kerala for the partial funding of the project.

References

1. Zhi, Yan, Zhang and Tang, A tongue-print image database for recognition, In: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, 2007, pp. 19–22.

2. Li Q. and Zhi, Tongue color analysis and discrimination based on hyper spectral images, Computerized Medical Imaging and Graphics 33(3), 217–221.

3. Manoj and Manish, An extraction and recognition of tongue-print images for biometrics authentication system, International Journal of Computer Applications 61(3) (2013), 36–42.

4. Omer A.M., Tagwa E.E. and Muhammed E.H., Tongues: could they also be another fingerprint, Indian Journal of Forensic Medicine and Toxicology 8(1) (2014), 171–175.

5. Radhika, Jeddy and Nithya, Tongue prints: A novel biometric and potential forensic tool, Journal of Forensic Dental Sciences 8 (2016), 117–119.

6. Bob and Han, Evidence-based complementary and alternative medicine, 2015.

7. Stefanescu C.L., Popa M.F. and Candea L.S., Preliminary study on the tongue based forensic identification, Romanian Journal of Legal Medicine 22 (2014), 263–266.

8. Salim, Recognition of tongueprint textures for personal authentication: A wavelet approach, Journal of Advances in Information Technology 3(3) (2012), 168–175.

9. Ryszard S.C., Biometric identification through tongue texture measurements, International Journal of Computers 1 (2016).

10. Zhang, Liu, Yan and Shi, Tongue-Print: A Novel Biometrics Pattern, Springer, Seoul, Korea, 2007, pp. 1174–183.

11. Jeddy, Radhika and Nithya, Tongue prints in biometric authentication: A pilot study, Oral Maxillofac Pathol 21(1) (2017), 176–179.

12. Sivakumar, Nair, Geevar, Nair M.S. and Joseph, Identification of tongue print images for forensic science and biometric authentication, Journal of Intelligent & Fuzzy Systems (JIFS), IOS Press, 2018.

13. Muhammad, Ahmad and Baik S.W., Early Fire Detection using Convolutional Neural Networks during Surveillance for Effective Disaster Management, Neurocomputing, 2017.

14. Krizhevsky, Sutskever and Hinton G.E., Imagenet classification with deep convolutional neural networks, In Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

15. Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, [Online]. Available: https://arxiv.org/abs/arXiv:1409.1556.

16. Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke and Rabinovich, Going deeper with convolutions, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

17. Szegedy, Vanhoucke, Ioffe, Shlens and Wojna, Rethinking the inception architecture for computer vision, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

18. Girshick, Donahue, Darrell and Malik, Region-based convolutional networks for accurate object detection and semantic segmentation, IEEE Trans Pattern Anal Mach Intell, 2015.

19. Razavian A.S., Azizpour, Sullivan and Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, in IEEE CVPRW, 2014, pp. 512–519.

20. Zhou, Lapedriza, Xiao, Torralba and Oliva, Learning deep features for scene recognition using places database, in NIPS, 2014, pp. 487–495.

21. He, Zhang, Ren and Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

22. Szegedy, Ioffe and Vanhoucke, Inception-v4, inception-resnet and the impact of residual connections on learning, arXiv preprint arXiv:1602.07261, 2016.

23. Deng, et al., Imagenet: A large-scale hierarchical image database, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009.
