Abstract
The manipulation of digital images is now common owing to easy access to online photo editing applications and image editing software. Forged images are widely used on social media to create deceitful propaganda about an individual or a particular event, and even to fabricate evidence in court proceedings. Ensuring the integrity of digital images is therefore of prime significance and has become an active research area. In this paper, a novel technique for image forgery detection is proposed. The method uses the layer activation of Inception-ResNet-v2, a pretrained Convolutional Neural Network (CNN), to extract deep textural features from the Rotation Invariant – Local Binary Pattern (RI-LBP) map of the chrominance image. Non-negative Matrix Factorization (NMF) is used to reduce the dimensionality of the extracted features. The dimensionality-reduced features are used to train a quadratic Support Vector Machine (SVM) classifier that classifies images as forged or authentic. The method is assessed on four benchmark datasets (CASIA ITDE v1.0, CASIA ITDE v2.0, CUISDE and IFS-TC). Extensive experimental analysis shows improved detection accuracy compared to state-of-the-art methods.
Introduction
Images contain sensitive information, and every day a huge number of images are posted on various social media platforms. The authenticity of images shared on social media is uncertain, since anyone can purposefully and easily manipulate images using photo editing software such as Adobe Photoshop and PhotoScape. The widespread propagation of manipulated images may result in social or political unrest among the public. Thus, the privacy and authenticity of digital images shared via the Internet is an important concern. Images are also used in police investigations as scientific proof, and presenting a manipulated image to a court may lead to an incorrect judgment. Digital image forgery detection has therefore become an active research area, and researchers are widely exploring various methods to detect image forgeries.
Copy move forgery [3] and image splicing [22] are the two broad categories of image manipulation techniques. In copy move forgery, a region of an image is copied and pasted at another location in the same image, whereas a spliced image is a composite image created by copying portions from one or more images and pasting them onto another image. Figure 1 shows examples of image manipulations. Figure 1(a) is an original crime scene evidence image showing one cartridge, and Fig. 1(b) is the manipulated (copy move forged) version of the original image with two cartridges. The rooster displayed in Fig. 1(c) is copied and pasted onto Fig. 1(d) to create the composite image shown in Fig. 1(e).

Digital image forgery detection approaches can be broadly categorized as active and passive [30]. Active approaches rely on pre-embedded information such as a watermark [18] or a digital signature [19]. Passive techniques rely on image-specific features that can clearly discriminate a forged image from an authentic one. Only a few images in the media contain pre-embedded information, and hence passive methods are extensively studied. The traditional machine learning workflow of the passive methods consists of pre-processing the images, handcrafted feature extraction, selection of optimum features and then training a suitable classification model. Feature engineering is a time-consuming and complex part of any machine learning framework because it is difficult to define appropriate features for different types of manipulations.
Recent advances in data-driven techniques such as deep learning using Convolutional Neural Networks (CNNs) [15] have shown exceptional results in general image classification problems. CNNs are capable of learning rich feature representations directly from images [21]. The layer activations of pretrained CNN models can be used as feature extractors [4] for numerous applications in the field of computer vision.
In this work, a novel method is proposed for image forgery detection by exploiting the power of a pretrained CNN along with the rich texture representation capability of Rotation Invariant – Local Binary Pattern (RI-LBP) maps. The textural inconsistencies in images due to manipulations are captured by RI-LBP maps of chrominance images. Deep textural features are extracted from these RI-LBP maps using the layer activation of a pretrained CNN. The dimensionality of the extracted features is reduced using the Non-negative Matrix Factorization (NMF) technique. The dimensionality-reduced features are used for training a Support Vector Machine (SVM) classifier. The method is evaluated on four benchmark image forgery datasets: (i) Chinese Academy of Sciences Institute of Automation Image Tampering Detection Evaluation version 1.0 (CASIA ITDE v1.0) database, (ii) version 2.0 of CASIA ITDE database (CASIA ITDE v2.0) [9], (iii) Columbia Uncompressed Image Splicing Detection Evaluation (CUISDE) dataset [11] and (iv) Image Forensic Challenge (IFS-TC) dataset. The experimental analysis shows improved detection accuracy compared to other state-of-the-art methods for image forgery detection.
The remainder of this paper is structured as follows. Section 2 discusses the earlier works related to the proposed technique. The proposed method and its details are explained in Section 3, which is followed in Section 4 by the specifications of the experimental setup. The experimental results and discussions are presented in Section 5. Section 6 gives the conclusions and future work.
Many studies have been carried out on image forgery detection using handcrafted feature engineering techniques, and various techniques have been proposed for distinguishing manipulated images from authentic ones. In this section, we discuss some passive image forgery detection methods for copy move and image splicing forgeries.
Al-Hammadi et al. [1] proposed a technique using the curvelet transform and LBP for image forgery detection. The curvelet transform is applied to the chrominance images and the LBP histograms of the result are calculated. A feature vector is formed by combining these histograms and is fed to an SVM model. Muhammad et al. [20] applied a Steerable Pyramid Transform (SPT) to the chrominance images. LBP histograms are then obtained for each SPT sub band, and a feature vector is formed by concatenating these histograms. The feature vector is then fed to an SVM classifier.
Hussain et al. [12] suggested a forgery detection technique using multi-scale Weber Local Descriptors (WLD) and multi-scale LBP techniques. The Locally Learning Based (LLB) algorithm is used for feature selection and the selected features are then fed to an SVM classifier. Alahmadi et al. [2] proposed a block-based technique using LBP and the Discrete Cosine Transform (DCT) for image forgery detection. The chroma component of the image is divided into overlapping blocks, and the LBP image of each block is obtained. The DCT is then applied to each LBP image, and statistical measures of the DCT coefficients are used as the feature vector to train an SVM classifier. Vidyadharan and Thampi [32] used a multi-texture feature extraction technique that combines texture descriptors such as Local Phase Quantization (LPQ), LBP, Binary Gabor Pattern (BGP) and Binarized Statistical Image Features (BSIF) for detecting image forgeries. These texture features are extracted from the Steerable Pyramid Transform (SPT) sub bands of the image and are then combined to form the multi-texture descriptor. The ReliefF feature selection method is used to generate a compact representation of texture, and the selected features are given to a Random Forest classifier.
A method based on Gabor wavelets and LPQ for detecting image forgery is proposed by Isaac and Wilscy [13]. Gabor wavelet transform is applied on the Cr component of the image at different scales and orientations. Then the LPQ features obtained from the different Gabor sub band images are reduced using NMF technique and given to an SVM classifier. The above-mentioned approaches [1, 32] utilized handcrafted feature extraction techniques to capture the discriminative features between manipulated regions and authentic regions.
Recently, several image forgery detection methods have used CNNs. Rao and Ni [26] designed a CNN architecture and trained the network using image patch samples. The trained CNN is used to extract patch-based features of an image using a sliding window. The extracted features are combined using a feature fusion technique and fed to an SVM classifier. Rota et al. [27] proposed a CNN architecture for tampered image classification, in which the network is trained with patches of the training images. Zhou et al. [35] trained a rich model Convolutional Neural Network (rCNN) for detecting image forgery using a special block strategy. Shi et al. [29] proposed a dual-domain CNN architecture for image forgery detection; the network is trained with image patches to perform the classification. These methods [26, 35] use image patches or blocks for training a CNN; however, this patch-based approach may lead to the loss of evidence in image forgery detection. A large amount of labelled data is essential for obtaining an accurate and reliable classification model, and training a CNN architecture from scratch is computationally expensive. In image forgery detection tasks, it is often difficult to obtain a vast amount of labelled training data. To overcome these issues, we use the transfer learning approach [33], exploiting the power of a pretrained CNN as a feature extractor [6, 7] along with the rich textural description capability of LBPs. LBP is a discriminative and computationally efficient texture descriptor [23], so LBPs can capture hidden texture variations caused by image manipulations.
These observations motivated us to combine the deep feature extraction power of CNNs and the rich texture description capability of LBPs for forgery detection. In the proposed method, we extract deep textural features from the RI-LBP maps of chrominance images using the layer activation of a pretrained CNN. The NMF technique is used to reduce the features to an optimum number, and the reduced features are given to an SVM classifier with a quadratic kernel for training. The details of the proposed technique, the experimental setup and the analysis of experimental results are presented in the following sections.
Proposed method
An overview of the proposed technique for image forgery detection is provided in Fig. 2. The proposed method combines the high texture description power of LBP with the rich feature representations of a pretrained CNN. Deep textural features are extracted from RI-LBP maps of the chrominance (Cb, Cr) channels of images using the layer activation of Inception-ResNet-v2 [31], a pretrained CNN. The NMF technique is used to reduce the dimensionality of the extracted deep textural features, and the dimensionality-reduced features are used to train a quadratic SVM model for classifying images as forged or authentic.

An overview of the proposed image forgery detection method.
The method contains the following stages. (A) Conversion of RGB images to the YCbCr color space, since it has been shown in the literature [1, 34] that the chrominance channels (Cb, Cr) are more useful for identifying image forgeries. We conducted experiments to verify this claim and found that higher detection accuracies are obtained when chrominance channels are used, compared to the luminance channel (Y) or RGB images. The details of this experimental analysis are given in Section 5. (B) Obtaining RI-LBP maps from the chrominance components of images. RI-LBP is an effective texture descriptor that is able to capture texture variations in images due to manipulations. (C) Deep textural feature extraction using a pretrained CNN. The fully connected layer of the pretrained network Inception-ResNet-v2 is used as a feature extractor for extracting deep features from RI-LBP maps, since the deeper layers give rich discriminative features [25]. (D) Dimensionality reduction of the extracted features using NMF. (E) Classification of images as forged or authentic using an SVM classifier with a quadratic kernel. The various steps of the method are explained in detail in the following subsections.
In this pre-processing step, the RGB color image is converted into the YCbCr color space. The conversion is done using the JPEG 2000 standard as shown in Equation (1). The chrominance-blue (Cb) and chrominance-red (Cr) channels give higher forgery detection accuracies than the luminance (Y) component of the YCbCr image, and hence the two chrominance components (Cb, Cr) are considered for further processing.
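The color conversion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the coefficients of the irreversible color transform defined in JPEG 2000, and it assumes an offset of 128 on Cb and Cr so that neutral gray maps to the middle of the 8-bit range. The function names `rgb_to_ycbcr` and `chrominance_channels` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumption: JPEG 2000 irreversible colour transform coefficients, with an
    offset of 128 added to Cb and Cr so that neutral gray sits at mid-range.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr


def chrominance_channels(image):
    """Split an image (rows of (R, G, B) tuples) into its Cb and Cr planes."""
    cb_plane = [[rgb_to_ycbcr(*px)[1] for px in row] for row in image]
    cr_plane = [[rgb_to_ycbcr(*px)[2] for px in row] for row in image]
    return cb_plane, cr_plane
```

Only the two chrominance planes returned by `chrominance_channels` would be passed on to the texture description stage; the luminance value is computed but not used further.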
The texture of an image provides details about the spatial arrangement of color or intensity variations. Manipulations such as copy move or splicing induce unnatural textural variations in images. LBPs are discriminative and computationally efficient texture descriptors [23] which can capture hidden texture variations in manipulated images. Even though LBP efficiently captures the local texture structure, it is not rotation invariant. To attain rotation invariance in local binary patterns, the RI-LBP [24] is used in this work. $\text{RI-LBP}_X$ is obtained by circularly rotating the original $\text{LBP}_X$ until its minimum binary value is reached, as given in Equation (2), where $\text{LBP}_X$ is the local binary pattern of an image considering $X$ neighboring pixels.

$$\text{RI-LBP}_X = \min\{\mathrm{ROR}(\text{LBP}_X, i) \mid i = 0, 1, \ldots, X-1\} \tag{2}$$

The function $\mathrm{ROR}(\text{LBP}_X, i)$ denotes the circular shift operator, which circularly shifts the $\text{LBP}_X$ bit array $i$ times to the right.
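The rotation-invariant code described above can be illustrated with a short sketch. It assumes 8 neighbors in a 3×3 window and the common convention that a neighbor contributes a 1 bit when its value is greater than or equal to the center; the helper names (`ror`, `ri_lbp_code`, `ri_lbp_map`) and the neighbor ordering are illustrative choices, not taken from the paper.

```python
def ror(code, shift, bits=8):
    """Circularly shift a `bits`-wide integer `shift` positions to the right."""
    shift %= bits
    mask = (1 << bits) - 1
    return ((code >> shift) | (code << (bits - shift))) & mask


def ri_lbp_code(code, bits=8):
    """Minimum value over all circular rotations of an LBP code."""
    return min(ror(code, i, bits) for i in range(bits))


# Offsets of the 8 neighbours of a 3x3 window, visited in circular order.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ri_lbp_map(channel):
    """RI-LBP code for every interior pixel of a 2D list of intensities."""
    rows, cols = len(channel), len(channel[0])
    out = []
    for r in range(1, rows - 1):
        line = []
        for c in range(1, cols - 1):
            centre = channel[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Assumed convention: neighbour >= centre sets the bit.
                if channel[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            line.append(ri_lbp_code(code))
        out.append(line)
    return out
```

With 8 neighbors, the 256 possible LBP codes collapse to 36 distinct rotation-invariant codes, which is why the resulting map has far fewer distinct labels than a plain LBP image.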
In the proposed method, RI-LBPs are obtained from the chrominance (Cb, Cr) components of each image using a 3×3 neighborhood. LBP codes are unordered labels rather than intensities, so the convolution operations performed in a CNN are not meaningful on them and the codes cannot be given directly as input to a CNN. To overcome this issue, the local binary pattern codes are mapped to a 3D metric space using Multi-Dimensional Scaling (MDS) [17]. The resulting 3-channel RI-LBP maps are used as input to the CNN model.
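The mapping of codes to a 3D metric space can be sketched with classical MDS. The dissimilarity measure between codes is not stated in this section, so the sketch below assumes the Hamming distance (number of differing bits) purely for illustration; the measure used in [17] may differ. The names `ri_codes`, `classical_mds` and `code_to_point` are hypothetical.

```python
import numpy as np


def ri_codes(bits=8):
    """Sorted list of distinct rotation-invariant codes for `bits` neighbours."""
    mask = (1 << bits) - 1

    def ror(code, i):
        return ((code >> i) | (code << (bits - i))) & mask

    return sorted({min(ror(c, i) for i in range(bits)) for c in range(1 << bits)})


def classical_mds(dist, dims=3):
    """Embed points with pairwise distance matrix `dist` into `dims` dimensions."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]       # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


codes = ri_codes()
# Assumed dissimilarity: number of differing bits between two codes.
hamming = np.array([[bin(a ^ b).count("1") for b in codes] for a in codes], dtype=float)
embedding = classical_mds(hamming)               # one 3D point per code
code_to_point = {code: embedding[i] for i, code in enumerate(codes)}
```

Replacing every code in an RI-LBP map by its 3D point from `code_to_point` would give a 3-channel map in which nearby values correspond to similar patterns, which is the property a convolutional network needs from its input.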
A CNN is a deep neural network architecture that contains convolutional layers, pooling layers and classification layers. CNNs are used mainly for image recognition applications. One of the most significant results in deep learning is the use of CNNs for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [15]. CNNs pretrained on this data have learned rich feature representations for a wide range of images. The early layers of pretrained CNNs learn basic features, and the deeper layers use these features to produce richer, more discriminative features [25].
In the proposed method, the layer activation of Inception-ResNet-v2 [31], a pretrained CNN, is used as a feature extractor. The Inception-ResNet-v2 network achieved the lowest error percentage (Top-1 error percentage of 19.9% and Top-5 error percentage of 4.9%) on the ILSVRC dataset. Rich features are extracted from the layers nearer to the classification layer [21], and hence in the proposed technique the fully connected (fc) layer of Inception-ResNet-v2 is used to extract deep features. The dimension of the deep features extracted using the fc layer (‘prediction’) is 1000. Instead of RGB images, RI-LBP maps of images are given as inputs to the pretrained CNN for deep feature extraction, and the extracted deep features are called deep textural features. Hence in this work, we combine the power of a pretrained CNN and the rich texture description capability of LBPs for forgery detection. The dimension of the extracted deep features is reduced using the NMF technique, and the dimensionality-reduced features are used to train an SVM classifier.
Dimensionality reduction using NMF
The dimensionality reduction techniques help to select the relevant features for training a classifier by removing redundant features. This results in reduced size of feature vector and thereby reduces the training time of a classifier. It also helps to avoid overfitting of the classifier. In this work, the dimensionality of the extracted deep features is reduced to an optimum size using NMF technique [16]. Let $X = [x_1, x_2, \ldots, x_N]$ be the non-negative data matrix whose columns are the data samples. NMF approximates $X$ by the product of two non-negative matrices $U$ and $V$, as in Equation (3).

$$X \approx UV, \quad U \geq 0, \; V \geq 0 \tag{3}$$

Each sample $x_n$ in $X$ can be approximated as the linear combination of the columns of $U$ multiplied by the components of the $n$th column of $V$, as in Equation (4).

$$x_n \approx U v_n \tag{4}$$

Hence, $U$ is a collection of basis vectors, while $v_n$, the $n$th column of $V$, is the coding vector of the $n$th data sample. Squared Euclidean Distance (SED) is the cost function used to solve for $U$ and $V$, as shown in Equation (5).

$$\min_{U \geq 0,\; V \geq 0} \; \lVert X - UV \rVert_F^2 \tag{5}$$
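A minimal sketch of NMF under the squared Euclidean cost is given below. It assumes the classic multiplicative update rules, with samples stored as columns of `X`; the solver actually used in the paper is not specified, and the function names `nmf` and `sed`, the iteration count and the random initialization are illustrative assumptions. Note that NMF requires the input matrix to be non-negative.

```python
import numpy as np


def nmf(X, rank, iterations=200, seed=0, eps=1e-10):
    """Factorise non-negative X (features x samples) as U @ V with U, V >= 0.

    Assumption: multiplicative updates for the squared Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    U = rng.random((n_features, rank)) + eps
    V = rng.random((rank, n_samples)) + eps
    for _ in range(iterations):
        # Each update keeps the factors non-negative and does not increase the cost.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V


def sed(X, U, V):
    """Squared Euclidean distance between X and its approximation U @ V."""
    return float(np.sum((X - U @ V) ** 2))
```

In this layout the columns of `V` are the reduced feature vectors: a 1000-dimensional deep feature vector per image would be replaced by a `rank`-dimensional coding vector before classification.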
The dimensionality-reduced deep textural features are used to train an SVM classifier [10] that separates images into two classes, authentic and forged. We use an SVM with a quadratic kernel for classification. In initial testing, the quadratic kernel was found superior to other kernel functions such as linear, cubic and Gaussian. The quadratic kernel transforms the features into a higher dimensional feature space where the features can be linearly separated. The quadratic kernel function is defined in Equation (6).
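As an illustration of what a quadratic kernel computes, the sketch below assumes the common inhomogeneous polynomial form of degree 2, $(x^\top y + 1)^2$. The exact form and constants of the paper's Equation (6) are not reproduced here, so the unit offset and the function names are assumptions.

```python
def quadratic_kernel(x, y, offset=1.0):
    """Degree-2 polynomial kernel (x . y + offset) ** 2 for two feature vectors.

    Assumption: unit offset; the paper's Equation (6) may use other constants.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + offset) ** 2


def gram_matrix(samples):
    """Kernel value for every pair of samples, as a list of rows."""
    return [[quadratic_kernel(a, b) for b in samples] for a in samples]
```

An SVM trained on such a Gram matrix finds a linear separator in the implicit space of all first and second order feature products, which corresponds to a quadratic decision boundary in the original feature space.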
MATLAB R2019a is used for implementing the proposed technique on a GPU-based system. The system contains an Intel(R) Core(TM) i7 processor with an NVIDIA GTX 1060 graphics card. The details of the datasets and the performance evaluation metrics are discussed in the following subsections.
Datasets
CASIA ITDE v1.0, CASIA ITDE v2.0, CUISDE and IFS-TC are four publicly accessible image forgery detection datasets used for evaluating the proposed method. The details of these benchmark datasets are given in Table 1.
Details of datasets
In the proposed method, forged images are taken as positive and authentic images as negative. The following performance metrics are used for assessing the proposed method: accuracy, precision, recall and F-measure.
Accuracy is defined as the percentage of images that are classified correctly and is calculated as in Equation (7). Precision and recall of the classifier are obtained using Equations (8) and (9), respectively. The F-measure is the harmonic mean of precision and recall and is calculated using Equation (10).
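These four metrics follow the standard confusion-matrix definitions, which can be sketched as below. With forged images as the positive class, `tp` counts forged images detected as forged, `tn` authentic images detected as authentic, `fp` authentic images flagged as forged and `fn` forged images missed. The function name and the zero-division handling are illustrative choices.

```python
def forgery_metrics(tp, tn, fp, fn):
    """Accuracy (%), precision, recall and F-measure from confusion counts.

    Forged images are the positive class, authentic images the negative class.
    """
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure
```

For example, the hypothetical counts `forgery_metrics(90, 95, 5, 10)` correspond to an accuracy of 92.5% with a recall of 0.90.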
Four experiments are performed to study the efficacy of the proposed work. The experiments study: (i) the effect of chrominance channels on the detection accuracy, (ii) the effect of feature dimensionality reduction on the detection accuracy, (iii) the effect of extracting deep textural features from RI-LBPs and (iv) the performance of the SVM classifier compared with other classifiers.
The experiments are done on four standard image forgery datasets and finally, a comparison with state-of-the-art methods is done to evaluate the performance of the proposed work.
Effect of chrominance channels on the detection accuracy
In this experiment, we evaluate the effectiveness of the chrominance channels (Cb, Cr), the luminance channel (Y) and RGB images in detecting image forgeries. The detection accuracies are evaluated for each case. The dimension of the deep features extracted from the RI-LBP map of each image is 1000. The dimensionality of these features is reduced using the NMF technique, and the optimal feature dimension for each dataset is found empirically. The optimum feature dimension for each dataset is tabulated in Table 2. The experiments are performed using these optimum numbers of features. The results are presented in Fig. 3, which shows that higher accuracies are obtained when chrominance channels are used for feature extraction. Hence in this work, the chrominance channels (Cb, Cr) are used for forgery detection and analysis.
Optimum feature dimensions for various datasets

Detection accuracies of various datasets by considering chrominance channels (Cr and Cb), luminance channel (Y) and RGB images.
Table 3 gives the performance evaluation of the proposed method using the chrominance channels (Cb, Cr) and the effect of combining the chrominance channels. It can be observed from Table 3 that the highest detection accuracy of 99.1% is obtained for the CASIA ITDE v1.0 dataset, with a recall of 1.00 and a precision of 0.99, when the Cr channel is considered. The highest accuracy of 99.30% is obtained for the CASIA ITDE v2.0 dataset by considering the Cb channel. The highest detection accuracy of 98.3% is obtained for the CUISDE dataset when the features from the Cb and Cr channels are concatenated (Cb+Cr). It can also be seen that a detection accuracy greater than 99% is obtained for the CASIA ITDE v2.0 dataset in all three cases (Cb, Cr, Cb+Cr).
Performance evaluation
In this experiment, we study the impact of feature dimensionality reduction using the NMF technique on the detection accuracy. The number of deep features extracted from each image is 1000, and the optimum number of features after feature reduction for each dataset is given in Table 2. The detection accuracy before and after applying feature reduction on the chrominance Cb channel is shown in Fig. 4. The accuracies obtained without feature reduction are 94.2% for CASIA ITDE v1.0, 95.1% for CASIA ITDE v2.0, 91.5% for CUISDE and 94.0% for IFS-TC. After applying feature reduction using NMF, the detection accuracy improves on every dataset. The accuracy for CASIA ITDE v1.0 improves from 94.2% to 98.0% with a feature dimension of 500. For CASIA ITDE v2.0, the accuracy rises from 95.1% to 99.3% with a feature dimension of 600. The accuracy for CUISDE rises from 91.5% to 96.7% with a feature dimension of 52, and the accuracy for IFS-TC increases from 94.0% to 97.7% with a feature dimension of 350. Thus, feature reduction using NMF increases the detection accuracy by about 4 percentage points on average across the datasets.
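The average gain can be checked directly from the before and after accuracies quoted in this paragraph. The short calculation below uses only those figures; the variable names are illustrative.

```python
# Cb-channel accuracies (%) quoted in the text: (before NMF, after NMF).
accuracies = {
    "CASIA ITDE v1.0": (94.2, 98.0),
    "CASIA ITDE v2.0": (95.1, 99.3),
    "CUISDE": (91.5, 96.7),
    "IFS-TC": (94.0, 97.7),
}

# Gain in percentage points for each dataset, then the mean over datasets.
gains = {name: after - before for name, (before, after) in accuracies.items()}
mean_gain = sum(gains.values()) / len(gains)
print(f"mean gain: {mean_gain:.2f} percentage points")
```

The per-dataset gains range from 3.7 points (IFS-TC) to 5.2 points (CUISDE), and their mean is a little over 4.2 points, consistent with the stated average of about 4.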

Detection accuracies before and after feature reduction considering Cb channel.
In this experiment, the effect of using the RI-LBP map of images in the proposed method is investigated. To study this, deep features are extracted directly from the chrominance channels of images using the fully connected layer of Inception-ResNet-v2. The extracted features are reduced using the NMF technique and a quadratic SVM classifier is trained using these features. The results are compared with the detection accuracies obtained by using deep textural features extracted from RI-LBP maps of the chrominance channels. The Cb channel is considered for this comparison and the results are presented in Fig. 5. The detection accuracies improve considerably when deep features are extracted from RI-LBP maps. This is because local binary patterns are powerful texture descriptors, and any textural inconsistencies induced in forged images are well captured by LBPs. Thus, extracting deep textural features from RI-LBP maps of chrominance channels enhances the detection accuracy.

Detection accuracies for various datasets without and with RI-LBP map for deep feature extraction.
To examine the effectiveness of the quadratic SVM classifier in the proposed method, we compare its performance with that of other classifiers. K Nearest Neighbor (KNN) [8] and Decision Tree (DT) [5] are the two classifiers used for the comparison. The CUISDE dataset is considered for this evaluation. The experiment is done using the optimum deep features extracted from the RI-LBP maps of the (Cb+Cr) channels of the images, and Table 4 shows the comparison results. The quadratic SVM classifier clearly performs better than the other two.
Comparison of detection accuracy (%) obtained by using various classifiers on CUISDE dataset
The detection accuracy of the proposed method is compared with four state-of-the-art methods which use deep learning techniques for detecting image forgeries [26, 35]. These methods used the CASIA ITDE v1.0 and CASIA ITDE v2.0 datasets for experimental evaluation. Their detection accuracies are taken directly from the respective papers. Table 5 gives the comparison; the highest detection accuracies obtained by the proposed method (99.10% for CASIA ITDE v1.0 using Cr and 99.30% for CASIA ITDE v2.0 using Cb) are used for the comparison. The proposed method is also evaluated on the CUISDE and IFS-TC (Phase-1-Train) datasets. To the best of our knowledge, forgery detection on the CUISDE and IFS-TC datasets using deep learning techniques has not been reported in the literature so far. The highest detection accuracies reported using traditional machine learning techniques are 98.72% [14] for CUISDE and 85% [13] for IFS-TC. The proposed method obtained a comparable detection accuracy of 98.30% on the CUISDE dataset, while a large improvement is obtained for the IFS-TC dataset with an accuracy of 97.70%. This comparative analysis indicates that the proposed method, using deep textural features obtained from RI-LBP maps of chrominance images, outperforms the state-of-the-art methods on the CASIA ITDE and IFS-TC datasets and is comparable on CUISDE.
Comparison of the proposed method with state-of-the-art methods
This paper proposes a novel method for detecting image forgeries that uses the layer activation of a pretrained CNN, Inception-ResNet-v2, as a feature extractor. In this work, RGB images are converted into the YCbCr color space and the RI-LBP maps of the chrominance (Cb, Cr) channels are obtained. The fully connected layer of Inception-ResNet-v2 is used for extracting deep textural features from the RI-LBP maps. The NMF technique is used for reducing the dimensionality of the extracted deep features, and these features are used to train a quadratic SVM classifier for classifying images as forged or authentic. The proposed method is assessed on four benchmark datasets: CASIA ITDE v1.0, CASIA ITDE v2.0, CUISDE and IFS-TC. The results show that the proposed technique outperforms state-of-the-art methods. The power of the pretrained CNN, along with the rich texture representation capability of RI-LBP maps, helped to increase the performance of the method, as is evident from the results obtained. In future, we aim to develop efficient methods to detect deepfake images.
