A performance evaluation of convolution neural networks for kinship discernment: An application in digital forensics

Abstract

Kinship Verification from facial images is known to have attracted major attention since time immemorial. Identifying the underlying patterns that exist between images and analysing the relationship hidden between them have enabled the multitudes of applications to utilize kinship relationships. This work serves as a study on the amount of influence that hereditary features can exert on the families tied together by lineage in identifying the relationship prevailing between them and whether it holds true or not. The approach employed involves detecting the relationship existing between the provided facial images using Siamese Network, which comprises two identical convolutional neural networks that share common weight values. A difference vector is computed from this Siamese CNN, which is then fed into a network of fully connected linear layers. This extended layer will determine whether the two individuals in the input images are related to each other or not.

Keywords

Convolution neural network siamese network deep learning kinship detection artificial intelligence

ï»¿

1. Introduction

The term Kinship can easily be traced way back centuries ago when facial traits aided people to distinguish between various families. Kinship truly acts as the universal form of human existence on the ties of blood and marriage. Several kinds of kinship exist between people wherein the consanguineal, also known as blood relations, holds the primary importance [9]. Affine relationships that are based on marriage do not necessarily act as a rudimentary factor in deciding one’s lineage. Major studies and researches have indicated that facial characteristics as well as gestures, serve as an integral key and criteria for classifying people belonging to certain families. In this Research predominantly focus is on the assertion of the former to make the relationship prediction process probable. If the relationship tying a family can be successfully predicted based on the provided images, it can further be utilized in several other prospects as described next. The traditional methods adopted by the law to search and find missing people [13] consume a large amount of time and effort considerably and ends up yielding poor outcomes. In addition to that, it is a tedious task to search for the ones who are related to you on any social media platform [11]. These two applications can make use of such a model to obtain fruitful and accurate results. The model curated in this approach deals with discerning the relationship that exists between 2 people vividly based on the images which are provided or in a nutshell predicting if two persons are belonging to the same family or not. These images are passed onto the Siamese network which produces a feature vector whose difference is computed and further used for classificationn [9]. Multiple works are related to what is presented here. The major difference is in the feature extraction techniques that are deployed here along with the scenario in which the Siamese network is used. Some of the main contributions of this research can be summarized as follows: A deep siamese network is proposed for kinship discernment. Calculating the element-wise difference for values in the feature vector of two images, which is then fed as an input to the extended CNN for classification.

2. Related work

In [1] major applications with regard to computer vision, which includes multi-view 3D reconstruction, retrieval of images, structure-from-motion (SFM), and image-based localization, finding out matching images across numerous datasets plays a vital role. This technique aims to find matching and non-matching pairs of images by representing them using neural network-based feature vectors wherein the similarity is computed using Euclidean distance. From the labelled instances of matching and non-matching image pairs which are obtained by the use of a contrastive loss function in a Siamese Architecture, the feature vectors are learnt. The experimental outcomes portray that the proposed features enhance the performance in comparison with baseline features produced with networks trained for the purpose of image classification tasks. The results are promising, taking into consideration image retrieval applications. In [2], Traditional kinship verification techniques containing discriminative features such as pitch are faced with a multitude of problems, namely concentrating on local region data comprising of a lot of noisy in other regions excluding the face, redundant information in overlapping regions. To eradicate these issues, a novel kinship verification method was devised based on nonlinear feature mapping and deep transfer learning. A new deep learning model that is trained on the face recognition dataset is transferred to kinship datasets wherein high-level features are extracted. The combination of triangular similarity metric learning and Siamese Multi-Layer Perceptron reduces the dimensionality of a feature vector and at the same time creates a larger distance between non-kin pairs and shorter distance within the kin pairs. In the end, the cosine similarity is calculated for each feature vector pairs, and classifiers such as Support Vector Machine can be used. In [3], due to the recent explosion of information on various platforms but prominently on social media, there is an urgent requirement to manage and organize images of people instantly. A solution is proposed by exploiting the relationship of people in the images by constructing two databases where the first one is UB KinFace Ver2.0 comprising of images of children, their young parents, and their old parents and the second database is named FamilyFace. To decrease the major differences in the distribution of appearance between the old parents and children facial images, the transfer subspace learning-based algorithm is developed. In addition to that, an algorithm is proposed which explores the semantic relevance of the corresponding metadata to predict the kin relationship concealed in an image. Human subjects are made use of in a baseline study. Experimental outcomes have proved that the proposed solution can efficiently annotate the kin relationships within people and semantic context can enhance the precision to a larger extent. In [4], karyotyping is the process of ordering and pairing 66 chromosomes from various images of cells depending upon facts such as size, centromere position, and banding pattern. Since the technique of karyotyping requires knowledge in the domain, experience and a large amount of manual effort to diagnose various diseases, automation is highly desirable to aid the assistants. It lessens the cognitive load required for the entire process. Based on the concepts in deep learning, this method aims to provide chromosome classifications wherein chromosomes are straightened and then fed into the Siamese networks to push the embeddings of samples emerging from labels closer. Balanced sampling is performed from the pairwise dataset and at the same time choosing different training pairs for Siamese Network and an MLP based prediction on the embeddings generated from the trained Siamese Networks. In [5], a new approach was proposed, namely DeepTensor $+$ ELM to predict the kinship relationship based on deep and tensor features through Extreme Learning Machine. Small size training features, dimensions, deep and tensor features are focused by ELM to provide significant over shallow features or vector-based counterparts. Pair of two facial images for instance, grandparent and grandchild, is provided as input and features are extracted into deep and tensor features for each pair. The cosine similarities between the deep and tensor features are calculated to encode the final metric. To the Extreme Learning Machine classifier, the similarities are fed wherein it is trained to predict whether two people are kin related or not. Fusing scores of three multi-view deep and tensor features learned with the proposed DeepTensor $+$ ELM method accounts for the best result. In [6], a massive range kinship detection challenge known as the Recognizing Families In The Wild (RFIW) used a FIW dataset. This is one of the biggest dataset comprising of more than 12000 family pictures and a 1000 families. The major duty of the database was to predict the kin relationship in case any exists between 2 individuals. A convolutional neural network is used for feature extraction and the reason why it out passes the rest of the techniques available is clearly demonstrated. The triplet loss function was used to verify the kinship relationship and for recognizing the families and ultimately, the softmax classifier was used in the network.

3. Methodology

3.1 Dataset description and instance generation

The dataset is collected from Kaggle, Families In the Wild (FIW) dataset issued by Northeastern SMILE Labs [22]. It comprises of images of people from 470 different families, with multiple images of each individual in the family. But not every member of the family shares a kin relation. For example, a mother and father are kin to their children, but not to each other since they hail from two different families. So, a comma-separated value file is given along with the dataset that describes people who share a kin relation. The CSV file has around 3350 rows, which means that there are those many pairs who share a kin relationship. The images in the dataset are of varying size, a clear image of the face of the individual. All the images are in RGB format with varying brightness and lightings. The CSV file provided with the dataset describes the instances of people who share a kin relationship. Although it has images of people belonging to different families, it is not explicitly mentioned anywhere. Hence, we generate instances for people who do not share a kin relation by picking individuals belonging to different families. Since we have 470 different families and an average of 4 individuals per family, it is possible to generate a large number of samples. But to maintain the balanced state of the dataset, we restrict our instance generation to 3350 samples. The generated instances are appended at the end of the CSV file obtained from Kaggle.

Figure 1.

Dataset example [22].

3.2 Data cleaning

All the data obtained from Northeastern SMILE Lab are not in the same state. In short, they differ from each other with regard to their size and other features. The directories which do not contain any images are removed from the dataset. Each image is been reshaped to a file dimension of 224x224 pixels without changing its depth from 3. The dataset is well shuffled to have an unbiased learning. A label of one is assigned to the image pairs that share a kin relation and label of zero otherwise. The overall dataset is been split into training and validation sets with 80 percentage of the data allocated for training and remaining 20 percentage for validation.

Fig. 2. ConvBlock1. Fig. 3. ConvBlock2.

Figure 4.

CNN Architecture.

3.3 Data augmentation

For any deep learning algorithm to learn, it is better to have more training instances as possible i.e. those who share a similarity and those who do not. Since the focus is on solving the complex problem the amount of quantity of training images available was not sufficient. To alleviate this problem, more input samples were generated from the available images. This process is called data augmentation. New samples are generated by altering multiple properties of the available images.

Here first we resized the images into 255x255 pixels by appending at the outer part of the image pixel matrix. Also the image is cropped at the center into 224x224 pixels. Then it is flipped horizontally with probability 0.5. At the end, it is also normalized with both mean and standard deviation of 0.5 for each channel of the input tensor.

3.4 CNN architecture

Convolutional Neural Network is a popular deep learning algorithm used for extracting important features from the images. In this approach, the CNN algorithm is used to extract important features such as the diameter of eyes [23, 24, 25], the distance between ears and eyes, length of the nose, etc., from the images of individuals’ faces. The proposed CNN consists of two ConvBlock1 and three ConvBlock2 with three fully connected layers at the ends, as shown in Figs 2 and 3. A single ConvBlock1 comprises of two convolutional layers with a kernel size of 3x3 and 1x1 striding. The padding of 1x1 is used along with these layers.

In the end, we have a maximum pooling layer of dimension 2x2 which is defined as

$\displaystyle y_{j,k}^{i}={}_{0\leqslant m,n\leqslant s}^{\max}\left\{x_{j.s+m% ,k.s+n}^{i}\right.$ (1)

Where $y^{i}_{j,k}$ denotes the output of the ith future map in the location of ( $j, k$ ) in the ith future Map. This will reduce the feature vector dimension to its half ConvBlock2 also has the same architecture except that it has three convolutional layers. The parameters of the additional convolutional layer remain the same as that of the other two layers, as shown in Figs 2 and 3.

The diagram in 4 shows the architecture of the Convolutional Neural Network. The input to the network is the face images in RGB mode with a fixed dimension of 224x224 pixels. This input is given to a ConvBlock1, which is then passed through another ConvBlock1 and three ConvBlock2. This helps in extracting the features and reducing the dimensionality of the feature maps. Three fully connected layers are with two having 4096 active units and one with 2622 units at the end. ReLu activation is used throughout the network, which has proven to yield better end results. With ReLu convolution operation is computed as

$\displaystyle y^{j(r)}=\max\left(0,b^{j(r)}+\mathop{\sum}\nolimits_{i}w^{ij(r)% }*x^{i(r)}\right)$ (2)

In Eq. (2) where $x^{i}$ is the i-th input map, $y^{j}$ the j-th output map, $w^{ij}$ denotes the weight between the i-th input map and the j-th output map and $b^{j}$ is the bias of the j-th output map. Two dropout layers are also used in the model to reduce the chances of model overfitting.

Figure 5.

Twin architecture of siamese model.

Figure 6.

Extended siamese for classification.

3.5 Siamese CNN for similarity

The CNN architecture explained in the previous section can extract features from a single image. But it cannot perform complex image comparison tasks. To compare two images, we use the twin architecture of the CNN model. This architecture is shown in 5. The diagram in 4 shows the architecture of the Convolutional Neural Network. The input to the network is the face images in RGB mode with a fixed dimension of 224x224 pixels. This input is given to a ConvBlock1, which is then passed through another ConvBlock1 and three The two CNN models are the exact replica of the CNN model architecture shown in Fig. 4. The input image pair will be passed through the twin CNN models. In the end, a feature map with 2622 features for both the images will be obtained since it had those many active units in the final fully connected layer. The similarity score is computed using these feature maps. It is nothing but the absolute difference between the two feature maps. Suppose if $X1$ and $X2$ are the input images and $f(X1)$ and $f(X2)$ are the corresponding feature vectors, then the similarity score is computed using the equation,

$\displaystyle\text{Similarity Score}=\text{sum}(||f(X1)-f(X2)||)$ (3)

This absolute feature difference is used to tune the parameters of the Siamese network. For the pair of images that is representing a kin relation, the difference is being reduced, and increase this parameter for those who are not unrelated. This will basically set a threshold value boundary for difference, below which it’s been stated that they are related. If the difference is outside the boundary, it is concluded that they are not related. By performing the operation of weight update on different image samples a certain number of times, the model weight values will be adjusted to rightfully classify the samples. One important characteristic of the Siamese network is that both the CNN model in the twin architecture will share the same weight values.

3.6 Siamese CNN for classification

A twin CNN model is instantiated for finding the feature difference between two images, which basically yields us the absolute feature difference. Here Siamese model has been extended for classification. This is shown in Fig. 6.

Figure 7.

Training loss curve.

Figure 8.

Validation loss curve.

The extended layer consists of five fully connected layers. In the first four layers, the dimension is reduced by half at every layer starting from 1024 active units in the first layer. The final fully connected layer consists of only two active states that determine the class to which it belongs by providing a probability distribution over the two classes. The one with the maximum probability is considered as the output class. The input to this extended model is a similarity tensor computed from the image feature vectors.

$\displaystyle\text{Similarity Tensor}=f(X1)-f(X2)$ (4)

In this equation, $X1$ and $X2$ represent the two input images, and $f(X1)$ , $f(X2)$ are their corresponding feature vectors obtained from Siamese CNN. This vector has the same dimensionality as that of the feature vectors. This input vector is further processed at different layers giving the right classification at the end.

Figure 9.

RoC curve.

3.7 Training strategy

The Siamese model is trained using the pre-processed training data. Iterating through the dataset considering one image pair at a time, pass the images through the Siamese architecture and obtain the feature maps. Further, compute the similarity vector from the obtained features. Then pass the obtained similarity vector through the extended Siamese network for classification. This will predict the output class to which it belongs. If it is the right class, go for the next sample in the dataset. If not, calculate the loss function and update the weight values to reduce the overall loss of the system. Here Binary Cross-Entropy Loss function has been used since two output classes are there. This loss function is given by,

$\displaystyle\text{Cross-Entropy Loss}=-y\log(p)-(1-y)$ $\displaystyle\quad\log(1-p)$ (5)

where $y$ is the target value that is either 0 or 1, and $p$ is the predicted value. Stochastic gradient descent optimizer is used with a learning rate of 0.008 and a momentum of 0.9. The gradient is cleared each other before calculating the new loss. These steps are repeated for 150 times, which is the number of epochs. At each iteration, the validation accuracy is checked to ensure the performance of the model.

Algorithm 1: Deep Siamese Network
Input: N Image pairs (N, X0, X1, Y)
Output: Updated weight values of Siamese CNN and extended network
Step 1: Instantiate Siamese CNN model
Step 2: Instantiate Extended model for classification Step 3: For epoch times:
Step 3.1: For each image pair (X0, X1):
Step 3.2: Pass the image pair through the Siamese network,
$f(Xi)=$ Siamese_CNN_Model(Xi), Xi € (X0, X1)
Step 3.3: Find the elementwise difference between the two resulting feature vectors Step
Step 3.4: Feed the difference vector to the extended network
Step 3.5: Obtain the output class using softmax, which is calculated as, Where is a 2D vector
Step 3.6: Calculate the loss Step
Step 3.7: Compute the gradients
Step 3.8: Propagate backwards updating weights
Step 4: Return the weight values.

In continuation to algorithm, As deep Siamese architecture do not use a classification loss function instead of that uses contrastive function it described in Eq. (3.7) and data acquisition [1, 2, 3]. The description of the course of research should be supported references, so theexplanation can be accepted scientifically [2, 4].

$\displaystyle\left(1-Y\right)\frac{1}{2}\left(\textit{DW}\right)\left(\textit{% DW}\right)^{2}+\left(Y\right)\frac{1}{2}$ $\displaystyle\quad\left\{{\max}\left(0,m-\textit{Dw}\right)\right\}^{2}$ (6)

4. Results

From the experiment, the overall accuracy of the trained model on the training data and validation data has been computed. The results proved that the model is 72 percent accurate on the training data and 65 percent accurate on the validation data. The confusion matrix for the training set and validation set are given in Tables 1 and 2, respectively.

Table 1
Confusion matrix for training

Kinship	Related	Unrelated
Related	480	144
Unrelated	346	430

Table 2

Confusion matrix for validation

Kinship	Related	Unrelated
Related	1973	903
Unrelated	665	2059

The loss curve of training and validation overtraining is shown in Tables 1 and 2, respectively. From the curve, it is inferred that the training loss is decreasing over time, whereas validation loss is varying largely from the mean throughout.

Here we plot ROC curve for both the class and its corresponding AUC metrics is displayed.

For People who don’t share any kin relation class – ROC curve.

The above diagram shows the ROC curve for class containing image of people who do not share any kin relation among their selves. In this study we give equal importance to both true positive rate and false positive rate, since we do not need any false classifications in the output. From the plot, it’s observed that for a smaller value of false positive rate, true positive rate is high and as false positive rate increases, increase in true positive rate is minimal. Area under curve of 0.84 suggests that at this point we have the best separation on results.

People who share kin relation class – ROC curve.

The below diagram shows ROC curve for people who share kin relation. Here also we can observe the same trend as in the other class. That is, among all instances most of the positive class is been predicted as positive comparing to being predicted as negative wrongly. Area under the curve metric suggests that we have 85% separability on the results.

5. Conclusion

A study of inherited facial features through blood relations is successfully completed. An accuracy of 65 percent over unseen set of validation data confirms that more than half of the facial features that an individual possess is inherited from their ancestors or it can also be stated as people who are related by blood will share similar facial characteristics. If the accuracy that is generated from the model can be further improved, a wide range of applications can utilize the former.

Searching and findng missing people always prove to be a source of hardship and a tedious process for the government bodies and agencies. If a kinship prediction model can be employed in such scenarios, the entire task can become extremely easier. Moreover, such a model can generate suggestions on social media platforms based on the images provided, thereby enhancing the user experience. Future development can be carried out to exalt the quality of the system, thereby increasing its chances of being exploited in further applications.

Footnotes

Acknowledgments

This project has been supported by VGST, Department of ITBT, Government of Karnataka. Project has been developed as a part of Project CoE Digital Forensics Intelligence.

References

Melekhov

Kannala

Rahtu

. Siamese network features for image matching. In 2016 23rd international conference on pattern recognition (ICPR). 2016 Dec 4 (pp. 378-383). IEEE.

Yang

. A novel kinship verification method based on deep transfer learning and feature nonlinear mapping. In Proc. Int. Conf. Artif. Intell. Eng. Appl. (AIEA). 2017 Oct, pp. 947-956.

Xia

Shao

Luo

. Understanding kin relationships in a photo. IEEE Transactions on Multimedia. 2012 Feb 10; 14(4): 1046-56.

Jindal

Gupta

Yadav

Sharma

Vig

. Siamese networks for chromosome classification. In Proceedings of the IEEE international conference on computer vision workshops. 2017; pp. 72-81.

Laiadi

Ouamane

Benakcha

Taleb-Ahmed

Hadid

. Kinship verification based deep and tensor features through extreme learning machine. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019). 2019 May 14, pp. 1-4. IEEE.

Robinson

Shao

. Families in the wild (fiw) large-scale kinship image database and benchmarks. In Proceedings of the 24th ACM international conference on Multimedia, 2016 Oct 1, pp. 242-246.

Robinson

Shao

Liu

Gillis

. Visual kinship recognition of families in the wild. IEEE Transactions on pattern analysis and machine intelligence. 2018 Apr 13; 40(11): 2624-37.

Wang

Robinson

. Kinship verification on families in the wild with marginalized denoising metric learning. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017 May 30, pp. 216-221.

Nandy

Mondal

. Kinship verification using deep siamese convolutional neural network. In 2019 14th IEEE international conference on automatic face & gesture recognition (FG 2019), 2019 May 14, pp. 1-5. IEEE.

10.

Boutellaa

Feng

Hadid

. Kinship verification from faces: Methods, databases and challenges. In 2016 IEEE international conference on signal processing, communications and computing (ICSPCC), 2016 Aug 5, pp. 1-6. IEEE.

11.

Sellam

Azzoune

. Linear Feature Learning for Kinship verification in the wild. In 2018 International Conference on Applied Smart Systems (ICASS), 2018 Nov 24, pp. 1-6.

12.

Patil

Chandra

. Deep learning based kinship verification on KinFaceW-I Dataset. In TENCON 2019-2019 IEEE Region 10 Conference (TENCON), 2019 Oct 17, pp. 2529-2532. IEEE.

13.

Alirezazadeh

Fathi

Abdali-Mohammadi

. A genetic algorithm-based feature selection for kinship verification. IEEE Signal Processing Letters. 2015 Oct 14; 22(12): 2459-63.

14.

Hao

Xie

. Deep fusion siamese network for automatic kinship verification. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020 Nov 16, pp. 892-899. IEEE.

15.

Cui

Zhan

Sun

Zhang

. Face recognition via convolutional neural networks and siamese neural networks. In 2019 International Conference on Intelligent Computing, Automation and Systems (ICICAS), 2019 Dec 6, pp. 746-750. IEEE.

16.

Chergui

Ouchtati

Sequeira

Bekhouche

Bougourzi

Telli

. Deep features for kinship verification from facial images. In 2019 International Conference on Advanced Systems and Emergent Technologies (IC_ASET), 2019 Mar 19, pp. 64-67. IEEE.

17.

Chergui

Ouchtati

Sequeira

Bekhouche

Bougourzi

. Kinship verification using BSIF and LBP. In 2018 International Conference on Signal, Image, Vision and their Applications (SIVA), 2018 Nov 26, pp. 1-5. IEEE.

18.

Rehman

Khalid

Asghar

Khan

. Kinship verification using deep neural network models. In 2019 International Symposium on Recent Advances in Electrical Engineering (RAEE), 2019 Aug 28, 4, pp. 1-6. IEEE.

19.

Dahan

Keller

Mahpod

. Kin-verification model on fiw dataset using multi-set learning and local features. In Proceedings of the 2017 Workshop on Recognizing Families In the Wild, 2017 Oct 27, pp. 31-35.

20.

Xia

Shao

. Kinship verification through transfer learning. InTwenty-Second International Joint Conference on Artificial Intelligence, 2011 Jun 29.

21.

Laiadi

Ouamane

Benakcha

Taleb-Ahmed

. Rfiw 2017: Lpq-sieda for large scale kinship verification. In Proceedings of the 2017 workshop on recognizing families in the wild, 2017 Oct 27, pp. 37-39.

22.

Robinson

Shao

. Families in the wild (fiw) large-scale kinship image database and benchmarks. In Proceedings of the 24th ACM international conference on Multimedia, 2016 Oct 1, pp. 242-246.

23.

Fernandes

Bala

. A comparative study on ICA and LPP based Face Recognition under varying illuminations and facial expressions. In 2013 International Conference on Signal Processing, Image Processing & Pattern Recognition, 2013 Feb 7, pp. 122-126. IEEE.

24.

Lawrence Fernandes

Josemin Bala

. 3D and 4D face recognition: a comprehensive review. Recent Patents on Engineering. 2014 Dec 1; 8(2): 112-9.

25.

Lawrence Fernandes

Josemin Bala

. Development and analysis of various state of the art techniques for face recognition under varying poses. Recent Patents on Engineering. 2014 Dec 1; 8(2): 143-6.

A performance evaluation of convolution neural networks for kinship discernment: An application in digital forensics

Abstract

Keywords

1. Introduction

2. Related work

3. Methodology

3.1 Dataset description and instance generation

3.4 CNN architecture

Table 1 Confusion matrix for training

Footnotes

Acknowledgments

References

Table 1
Confusion matrix for training