Abstract
In the era of digital technology, it has become easy to share photographs and videos with loved ones using smartphones and social networking sites. At the same time, many photo editing tools have evolved that make it effortless to alter multimedia content, and people have become accustomed to modifying their photographs or videos, either for fun or to attract attention. Such alteration calls into question the validity and integrity of multimedia content shared over the internet when it is used as evidence in journalism or in a court of law. In multimedia forensics, intensive research has been under way over the past two decades to bring trustworthiness to multimedia content. This paper proposes an efficient method for identifying the manipulated region of a spliced image based on noise-level inconsistencies. The spliced image is segmented into irregular objects, and noise features are extracted in both the pixel and residual domains. The manipulated region is then exposed based on the cosine similarity of the noise levels among pairs of individual objects. The experimental results show the effectiveness of the proposed method compared with other state-of-the-art methods.
Keywords
Introduction
Nowadays, people share their ideas and moments with their loved ones through photographs and videos on social networking sites. With widely available photo editing tools such as Photoshop or Corel Draw, people can easily modify multimedia content, either for fun or to attract attention. An expert can alter an image in a way that cannot be detected visually. When used as documentary evidence in journalism or in a court of law, pictures play a significant role in supporting or disproving a particular argument or in removing doubt [6]. Such alteration therefore raises security challenges for the trustworthiness of content shared over the internet [35]. Government initiatives towards digitization also bring challenges in securing digital data [34]. These challenges create an urgent need for effective and efficient forensic tools that can detect malicious attacks and bring trustworthiness to multimedia content. Image forensics, a branch of multimedia forensics, aims to develop techniques and tools to detect manipulation attacks on images [10]. In traditional active forensic methods such as watermarking, an authentication code is embedded into the original image and later used for verification [29, 31]. In contrast, blind or passive forgery detection approaches assess image authenticity without any external clue [1]. These techniques assume that images taken with different cameras or subjected to different processing operations carry different inherent patterns [12]. These underlying patterns are usually consistent throughout an original image but inconsistent in a manipulated one, so such inconsistencies in intrinsic statistics can be used as forensic features to detect manipulation attacks [9]. Common manipulation attacks on digital images include copy-move, splicing, and resampling.
To detect such forgery attacks, these techniques extract predetermined forensic features from an image dataset, train a classifier on them, and use the trained model to assess whether an image has been maliciously altered [19, 36]. In recent years, researchers have focused not only on tampering detection but also on localizing the manipulated region [37]. Much research has been carried out on detecting copy-move and splicing attacks, and the robustness of splicing localization still needs to be improved because of the complex nature of this attack. In general, a splicing attack involves two different images, so the statistics of the affected area differ from those of the rest of the image; this difference is the key factor in localizing the tampered region [27].
Related work
Many algorithms in the literature reveal tampering by targeting specific operations, such as blurring or median filtering, applied to a spliced region [16]. Some techniques use camera fingerprints such as Photo Response Non-Uniformity (PRNU), which make no assumption about the manipulation but require the camera that captured the image [20]. The works [3–5] rely on the fact that the amount of intrinsic noise is generally consistent across the whole image. When a splicing attack occurs, different camera models or processing operations introduce noise inconsistencies, which are a significant cue for localizing the manipulated area. In [3], the authors detect small spliced regions based on local noise-level inconsistencies estimated from the wavelet coefficients of non-overlapping blocks; based on the homogeneity of the block noise levels, the image is segmented into sub-regions with different noise levels to expose the spliced region. The same idea was applied to static-scene video [1], where the Camera Response Function (CRF) is estimated for each frame and the local noise variance is obtained from a noise level function. In [4, 17], an image is divided into non-overlapping blocks of equal size, and the local noise variance is estimated based on positive kurtosis in the band-pass domain. In [5], the authors used Principal Component Analysis (PCA) to estimate the noise level of each block, with two block sizes for finer localization. The disadvantage of block-based methods is their high false alarm rate. To reduce it, [7] used simple linear iterative clustering (SLIC) superpixels to segment the image irregularly rather than into blocks. The authors estimated the noise level of each superpixel at different scales in the residual domain, derived a noise level function relating brightness and standard deviation across superpixels, and used k-means clustering to expose the spliced superpixels.
The technique can find more than one suspicious region, but the multi-scale analysis increases computational time. In [18], the authors extracted a regular shape from each superpixel and estimated its noise level based on PCA and noise distribution characteristics. They used fuzzy c-means clustering to separate the suspicious superpixels from the original ones. In color image steganalysis, residual features extracted from each color channel play a significant role in improving performance. In [8], the authors obtained noise features from color channel differences, estimated the noise level of each superpixel, and used Farthest Distributed Centroids Clustering (FDCC) to localize the spliced superpixels. This technique achieved better localization than other superpixel-based methods. Recently, deep learning techniques have shown strong performance in splicing localization. In [33], the authors proposed a fully convolutional neural network (FCN) for identifying manipulated regions in synthesized images. Its output tends to be over-smoothed, so small objects are often missed. To improve its performance, the authors of [21] added a region proposal network (RPN) to the FCN and made the whole network an end-to-end learning system. In [32], the authors propose a localization architecture that uses resampling features and a long short-term memory (LSTM) network to capture artifacts, together with an encoder network designed to distinguish tampered from non-tampered regions. A decoder network then learns features to localize the tampered region, and the network parameters are learned through back-propagation using a final softmax layer and ground-truth masks. The advantage of this technique is that it can localize tampering at the pixel level with high precision. The authors of [24] proposed a two-layer deep convolutional neural network (DNN) in which the first layer learns local residual features.
These residual features are further block-pooled to obtain the final discriminative features, which are classified with an SVM for splicing detection, and a conditional random field (CRF) is incorporated to localize the tampered region. The method is robust even to JPEG-compressed images. These deep learning-based techniques improve accuracy but require training on large labeled databases and have high computational complexity. Moreover, such networks extract high-level visual features and neglect low-level features, which can be sources of forensic cues. To overcome this, the authors of [28] propose a neural network, trained on a small splicing dataset, that learns low-level forensic features capable of detecting splicing forgery; the technique can detect splicing forgery in realistic scenes. This paper proposes a forensic technique that uses low-level features to localize the tampered region in a single image without any prior knowledge of the manipulation. Unlike conventional or deep learning techniques that learn features from the whole image, we propose an efficient statistical model that extracts noise-level features from individual objects in the pixel and residual domains to localize tampered regions.
Our contribution
The main contributions of the proposed work are: i) object-based segmentation is used to extract the features; ii) the noise features are estimated from individual objects to reduce computational time; iii) a localization algorithm is proposed that exposes the splicing attack using the cosine similarity among pairs of individual objects of a spliced image; and iv) to the best of our knowledge, this is the first use of object segmentation to localize tampered regions in noise-based localization techniques. The rest of the paper is organized as follows: Section 2 describes the proposed image splicing localization method, Section 3 presents the experimental results and performance analysis, and Section 4 concludes the paper.
Proposed method
The primary focus of this work is the localization of a spliced region. The proposed work introduces object-level segmentation [30] to improve the accuracy and efficiency of localizing the tampered region. As shown in Fig. 1, the work is carried out in three stages: i) segment the spliced image into irregular objects, ii) estimate the noise level of each object in the pixel and residual domains, and iii) compute the dissimilarity among pairs of objects and localize the spliced region.

The proposed framework.
Many algorithms for noise estimation in still images exist in the literature. The works [11, 15] use Principal Component Analysis (PCA) and are considered standard techniques. However, both approaches underestimate the noise variance because they take the smallest eigenvalue of the covariance matrix of selected low-rank patches. To improve noise level estimation, the authors of [13] proposed a method based on the assumption that patches taken from a noise-free image lie in a low-dimensional subspace that can be learned by PCA, rather than being spread uniformly across the whole space.
Noise estimation in pixel domain
The objective of noise estimation here is to separate the manipulated region from the original one, so this work uses the method of [13] to estimate the noise level in the pixel domain.
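A minimal NumPy sketch of this pixel-domain estimator is given below. It is based on our reading of [13], not on the authors' code: the 8 × 8 patch size, the use of all overlapping patches, and the name `estimate_noise_level` are assumptions. The function sorts the eigenvalues of the patch covariance matrix in decreasing order and discards the largest ones (the principal dimensions) until the remaining set, playing the role of S2, has as many eigenvalues above its mean as below it; the square root of that mean is returned as the noise standard deviation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def estimate_noise_level(img, patch=8):
    """Estimate the std of additive white Gaussian noise in a grayscale image.

    Sketch of the eigenvalue test of [13]; the patch size and the use of all
    overlapping patches are assumptions, not taken from the paper.
    """
    img = np.asarray(img, dtype=np.float64)
    # Every overlapping patch x patch window, flattened to an r-dim vector.
    X = sliding_window_view(img, (patch, patch)).reshape(-1, patch * patch)
    # Eigenvalues of the patch covariance matrix, in decreasing order.
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    # Remove the largest eigenvalues one at a time until the remaining set
    # has as many values above its mean as below it (mean close to median).
    for i in range(len(eig)):
        tail = eig[i:]
        tau = tail.mean()
        if np.count_nonzero(tail > tau) == np.count_nonzero(tail < tau):
            break
    return float(np.sqrt(max(tau, 0.0)))


rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 200.0, 128), (128, 1))  # smooth ramp image
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
print(round(estimate_noise_level(noisy), 2))  # expected to be close to 10
```

The loop always terminates, because a single remaining eigenvalue trivially has zero values above and below its mean. In the proposed method the estimate would be computed per object rather than for the whole image.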
The noisy image is expressed in terms of the original image as
The method of [13] assumes that clean patches lie in an $m$-dimensional subspace ($m$ being a positive integer), so each noisy patch $x_t$ in the set $X_s$ can be rewritten as
Define a rotation matrix $R = [A, U]$ satisfying $R^T R = I$, where $U$ is an additional matrix. $R$ can be obtained efficiently by eigen-decomposition of the covariance matrix $\Sigma_x$. Eq. 2 can then be rewritten as:
Eq. 4 shows that the noise component $\eta_t$ is separated from $x_t$ by the rotation matrix $R$. Because an orthogonal transformation of Gaussian noise is again Gaussian, the projected noise $U^T \eta_t$ is a random variable following a Gaussian distribution.
Since the rotation matrix $R$ consists of the eigenvectors of $\Sigma_x$, we have $R^T \Sigma_x R = \Phi$, where $\Phi$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\Sigma_x$.
Thus, the covariance matrix of $R^T x_t$ can be evaluated as:
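Eq. 6 indicates that the $i$-th eigenvalue $\lambda_i$ is the variance along the $i$-th dimension of the vector $R^T x_t$. The variances on the principal dimensions are larger than $\sigma$, whereas the variances on the redundant dimensions are equal to $\sigma$. Arranging the eigenvalues in decreasing order, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_r$, they are divided into two sets, $S = S_1 \cup S_2$, where $S_1 = \{\lambda_i\}_{i=1}^{m}$ contains the eigenvalues of the $m$ principal dimensions and $S_2 = \{\lambda_i\}_{i=m+1}^{r}$ those of the $r - m$ redundant dimensions.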
Let $\tau_t[i]$ denote the $i$-th element of the vector $\tau_t = U^T e_t$; the eigenvalues in the subset $S_2$ are then defined as
The denoised image $I_\eta$ is obtained by applying a filter $H(x, y)$, defined as
The residual image $I_r(x, y)$ contains the pixel noise and is obtained by subtracting the denoised image from the original image.
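Let $S_i$ denote the $i$-th irregular object obtained from the input image. For each object, the noise level is estimated as the variance of its residual pixels:
$$\sigma_r(S_i) = \frac{1}{|S_i|} \sum_{(x, y) \in S_i} \big( I_r(x, y) - \mu(S_i) \big)^2,$$
where $\mu(S_i)$ is the average of the residual pixels in $S_i$ and $|S_i|$ is the number of pixels in $S_i$.
The noise variance $\sigma_p$ in the pixel domain and $\sigma_r$ in the residual domain are used together as the noise-level estimate for localizing the tampered region.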
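The residual-domain feature can be sketched as follows. The filter $H(x, y)$ is not given in this section, so the 3 × 3 mean filter `box_filter3` below is a hypothetical stand-in, not the authors' choice; each object is assumed to be available as a boolean mask, and its noise level is the variance of the residual pixels inside that mask, as described above.

```python
import numpy as np


def box_filter3(img):
    """3x3 mean filter with edge padding (hypothetical stand-in for H)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def residual_noise_level(img, mask):
    """Variance of the residual I_r = I - I_eta over the pixels of one object."""
    img = np.asarray(img, dtype=np.float64)
    residual = img - box_filter3(img)            # residual image I_r
    values = residual[np.asarray(mask, dtype=bool)]
    mu = values.mean()                           # mu(S_i): mean residual in the object
    return float(np.mean((values - mu) ** 2))


rng = np.random.default_rng(0)
img = np.full((64, 128), 100.0)
img[:, :64] += rng.normal(0.0, 10.0, (64, 64))   # noisier left half
img[:, 64:] += rng.normal(0.0, 2.0, (64, 64))    # cleaner right half
left = np.zeros(img.shape, dtype=bool)
left[:, :64] = True
print(residual_noise_level(img, left), residual_noise_level(img, ~left))
```

With a 3 × 3 mean filter, the residual variance of white noise is about 8/9 of the noise variance, so the left half should report a value roughly 25 times larger than the right half.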
This work uses the Mask R-CNN image segmentation framework [41] to extract individual irregular segments together with their bounding boxes. For each segment, the object itself is treated as the foreground, and the remaining area of its bounding box as the background. Noise level characteristics are then estimated for both foreground and background objects, as discussed in Section 2.1. Finally, the spliced region is exposed by evaluating the cosine similarity among pairs of individual objects.
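The foreground/background split can be illustrated with NumPy. This sketch assumes each segment is available as a boolean mask, as produced by an instance-segmentation model such as Mask R-CNN, and derives the bounding box from the mask itself; the function name `split_foreground_background` is hypothetical.

```python
import numpy as np


def split_foreground_background(mask):
    """Split one segment into foreground and background masks.

    Foreground: the object's own pixels. Background: the remaining pixels of
    the object's bounding box (here derived from the mask itself).
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("empty mask")
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    box = np.zeros_like(mask)
    box[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = True  # bounding box
    return mask, box & ~mask


# A plus-shaped object inside a 5x5 image.
obj = np.zeros((5, 5), dtype=bool)
obj[2, 1:4] = True
obj[1:4, 2] = True
fg, bg = split_foreground_background(obj)
print(fg.sum(), bg.sum())  # → 5 4
```

Each returned mask can then be passed to a per-object noise estimator, so every segment contributes two noise-level vectors.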
For any pair of distinct foreground or background objects, let their noise-level estimates be $S_1$ and $S_2$. The cosine dissimilarity between the two objects is then defined as
$$L_D(S_1, S_2) = 1 - C(S_1, S_2), \qquad C(S_1, S_2) = \frac{S_1 \cdot S_2}{\lVert S_1 \rVert \, \lVert S_2 \rVert},$$
where $C(S_1, S_2)$ is the cosine of the angle between the two noise levels. Because noise levels are non-negative, $C$ lies in $[0, 1]$, so the metric $L_D$ also takes values in $[0, 1]$: values near 0 indicate similar noise levels and values near 1 indicate different noise levels.
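The dissimilarity measure can be checked numerically with the short function below. For illustration only, it assumes that each object's noise level is the two-element vector (σ_p, σ_r) and that neither vector is zero; the function name is hypothetical.

```python
import numpy as np


def cosine_dissimilarity(s1, s2):
    """L_D(S1, S2) = 1 - C(S1, S2), where C is the cosine of the angle
    between the two noise-level vectors (both assumed non-zero)."""
    s1 = np.asarray(s1, dtype=np.float64)
    s2 = np.asarray(s2, dtype=np.float64)
    c = s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return float(1.0 - c)


# Hypothetical (sigma_p, sigma_r) vectors for pairs of objects.
print(cosine_dissimilarity([4.0, 3.0], [8.0, 6.0]))  # → 0.0 (same direction)
print(cosine_dissimilarity([1.0, 0.0], [0.0, 1.0]))  # → 1.0 (orthogonal)
```

Note that the measure compares only the direction of the noise-level vectors: two objects whose pixel- and residual-domain noise levels scale by the same factor are treated as identical.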
Algorithm 1 Proposed localization algorithm
1: for each object i do
2:   for each object j do
3:     DM(i, j) = L_D(S_i, S_j)
4:   end for
5: end for
Find the pair having maximum dissimilarity
6: for each column j of DM do
7:   [COLMAX_j, COLIDX_j] = max(DM_j)
8: end for
9: [cmax, cidx] = max(COLMAX)
10: for each row i of DM do
11:   [ROWMAX_i, ROWIDX_i] = max(DM_i)
12: end for
13: [rmax, ridx] = max(ROWMAX)
rmax (= cmax, since DM is symmetric) is the maximum dissimilarity
Now find which object of the pair has the maximum dissimilarity
14: RROW = ridx, CCOL = COLIDX(ridx)
15: count = 0
16:
17:
18: count = count + 1
19:
20:
Return the tampered object
21:
22: TP = RROW
23:
24:
After the noise-level deviation has been evaluated for all pairs of objects, the object with the maximum deviation from all the others is identified using the proposed localization algorithm (Algorithm 1) and exposed as the spliced region.
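The following Python sketch illustrates the localization step. Steps 1–14 of Algorithm 1 (building DM and finding the most dissimilar pair) are implemented directly, but steps 16–24 are not spelled out, so the final decision here is our own assumption: over all remaining objects, count how often RROW is more dissimilar than CCOL, and declare RROW tampered if this holds for a majority, otherwise CCOL. The function names and the voting rule are hypothetical.

```python
import numpy as np


def dissimilarity_matrix(features):
    """DM(i, j) = 1 - cos(angle between noise-level vectors S_i and S_j)."""
    F = np.asarray(features, dtype=np.float64)
    unit = F / np.linalg.norm(F, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, 1.0)


def locate_tampered(features):
    """Index of the object judged to be spliced (hypothetical decision rule)."""
    dm = dissimilarity_matrix(features)
    # Steps 1-14: the pair (RROW, CCOL) with the maximum dissimilarity.
    rrow, ccol = np.unravel_index(np.argmax(dm), dm.shape)
    # Assumed steps 16-24: majority vote over the remaining objects.
    others = [k for k in range(len(dm)) if k not in (rrow, ccol)]
    count = sum(1 for k in others if dm[rrow, k] > dm[ccol, k])
    return int(rrow) if 2 * count > len(others) else int(ccol)


# Hypothetical (sigma_p, sigma_r) features; object 3 has a deviating noise profile.
objects = [[1.0, 1.0], [1.0, 1.1], [1.05, 1.0], [1.0, 3.0]]
print(locate_tampered(objects))  # → 3
```

Because DM is symmetric, a single argmax over the full matrix replaces the separate row and column maxima of steps 6–13.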
In this section, the proposed method is evaluated on two datasets, and its performance is compared with other state-of-the-art techniques in the literature. The Columbia dataset [39] is widely used for evaluating image splicing localization; it consists of 183 spliced and 180 original raw images that have not undergone any preprocessing. The proposed method is tested on spliced images from the Columbia dataset and compared visually with three state-of-the-art noise-level localization methods.
The qualitative evaluation of spliced images from the Columbia dataset is shown in Fig. 2. The first row contains sample images from the Columbia dataset, and the second row the corresponding ground-truth masks. The third row shows the results of the proposed method, with the detected spliced region highlighted in white. The remaining rows show the results of three state-of-the-art methods. The fourth row shows the results of [18], where the authors used superpixel-based noise distribution characteristics to localize tampered regions; this technique localizes the spliced object, but some false-positive and false-negative superpixels are observed. The fifth row shows the results of [5], where the authors use block-based PCA noise-level estimation; many blocks are falsely detected as spliced, and because objects are usually irregular in shape, the blocks cannot cover the entire spliced region. The last row shows the results of [4], where the authors used kurtosis-based noise-level estimation at each pixel; this technique gives a clue to the tampering, but many original pixels are falsely recognized as spliced. These results indicate that the proposed method localizes the spliced region accurately in all of the sample images. The state-of-the-art methods fail or produce high false alarm rates because the noise differences between original and spliced regions are typically small in the Columbia dataset, whereas the proposed method can distinguish the spliced region even when the noise difference is small.

To evaluate the performance quantitatively, two metrics are used, the True Positive Rate (TPR) and the False Positive Rate (FPR), defined as
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN},$$
where TP (true positives) is the number of tampered pixels detected correctly, FP (false positives) the number of original pixels incorrectly detected as tampered, TN (true negatives) the number of original pixels identified correctly, and FN (false negatives) the number of tampered pixels incorrectly identified as original. TPR is thus the fraction of spliced pixels that are recognized as spliced, and FPR is the fraction of original pixels that are falsely flagged as tampered. An effective technique should achieve a high TPR and a low FPR.
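Both metrics can be computed directly from pixel counts. The sketch below assumes a predicted mask and a ground-truth mask of equal shape in which True marks a tampered pixel, and that the ground truth contains both classes (otherwise one of the denominators is zero); the function name is hypothetical.

```python
import numpy as np


def tpr_fpr(pred, truth):
    """Pixel-level TPR = TP/(TP+FN) and FPR = FP/(FP+TN) for boolean masks
    in which True marks a tampered pixel."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)      # tampered pixels detected correctly
    fn = np.count_nonzero(~pred & truth)     # tampered pixels missed
    fp = np.count_nonzero(pred & ~truth)     # original pixels flagged as tampered
    tn = np.count_nonzero(~pred & ~truth)    # original pixels kept as original
    return tp / (tp + fn), fp / (fp + tn)


truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True                         # 4 tampered pixels
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True                          # 6 pixels flagged as tampered
print(tpr_fpr(pred, truth))                  # → (1.0, 0.16666666666666666)
```

Here every tampered pixel is found (TPR = 1), while 2 of the 12 original pixels are falsely flagged (FPR = 2/12).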
Table 1 reports the pixel-level evaluation of the proposed method on the Columbia splicing dataset. The results show that the proposed method localizes the spliced region accurately with a lower false alarm rate than the other existing methods. Fig. 3 shows that the TPR and FPR of the proposed method are both better and more stable than those of the other methods.
Pixel-based accuracy and comparative results on the Columbia dataset
Most noise-based localization methods are evaluated on datasets in which general objects are spliced. In practice, however, people or their faces are frequently used for tampering. To evaluate the proposed method in this setting, the DSO spliced image dataset [40] is used; it consists of 100 original and 100 spliced high-resolution images of size 1536 × 2048. Fig. 4 shows sample images from the DSO dataset and their corresponding masks.

Qualitative comparison of splicing localization on the Columbia dataset.

Sample spliced images from the DSO dataset [40] and the corresponding masks.
Table 2 reports the evaluation results of the proposed method on the DSO splicing dataset. The results show that the proposed method localizes the spliced region more effectively than the other existing methods. Fig. 5 shows that the TPR and FPR of the proposed method are stable.
Pixel-based accuracy and comparative results on the DSO dataset

Comparison of splicing localization accuracy on the DSO dataset.
To evaluate localization accuracy under small noise differences, zero-mean white Gaussian noise with variance σ² ranging from 0.01 to 0.09 in steps of 0.02 is added to a particular segment, which is then spliced into another original image. In this way, 500 spliced images with different noise levels are generated. The proposed method and the compared methods are then applied to these images.
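The noise-difference experiment can be sketched as follows. This is an illustration under stated assumptions, not the authors' generation script: pixel intensities are assumed to be scaled to [0, 1] (so that variances such as 0.01 are meaningful), the donor and host images are assumed to have the same size, the segment is given as a boolean mask, and splicing is modeled as a simple mask copy; `splice_with_noise` is a hypothetical name.

```python
import numpy as np


def splice_with_noise(donor, host, mask, variance, rng):
    """Add zero-mean white Gaussian noise of the given variance to the masked
    segment of `donor`, then paste that segment into `host`.
    Images are float arrays in [0, 1] with identical shapes (assumption)."""
    noisy = donor + rng.normal(0.0, np.sqrt(variance), donor.shape)
    noisy = np.clip(noisy, 0.0, 1.0)          # keep intensities in range
    spliced = host.copy()
    spliced[mask] = noisy[mask]               # copy only the segment
    return spliced


variances = np.arange(0.01, 0.10, 0.02)      # 0.01, 0.03, 0.05, 0.07, 0.09
rng = np.random.default_rng(0)
donor = np.full((32, 32), 0.5)
host = np.full((32, 32), 0.4)
mask = np.zeros((32, 32), dtype=bool)
mask[8:16, 8:16] = True
images = [splice_with_noise(donor, host, mask, v, rng) for v in variances]
print(len(images), np.round(variances, 2))
```

Clipping slightly reduces the effective variance at the larger noise levels, which is one reason the nominal σ² values should be read as the injected rather than the measured noise.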
Figure 6 shows the evaluation results on sample images; each row contains spliced images at a different noise level. Column (a) shows the results of the proposed method, where each spliced object is localized with high accuracy. Column (b) shows the results of [18], which localizes the spliced object, but a few false superpixels appear as the variance decreases. Column (c) shows the results of [5]; since this technique uses block-based segmentation, some blocks are falsely identified as spliced and others are falsely identified as non-spliced. The last column (d) shows the results of [4], which gives a clue to the spliced object but has a high false-positive rate. These results indicate that the proposed method is robust even to small differences in noise levels.

Fig. 7 summarizes the comparative TPR and FPR results on images from the DSO spliced dataset at different noise levels: panel (a) shows the average TPR and panel (b) the average FPR under zero-mean white Gaussian noise. The proposed method achieves a higher TPR and a lower FPR than all the other techniques. Its localization accuracy is also stable, with no significant degradation even when the noise variance is small.

Comparison of splicing localization accuracy under different noise levels on the DSO dataset.
A localization method is useful only if its results remain stable after post-processing operations are applied to the spliced image. To evaluate the robustness of the proposed method, Gaussian blur, JPEG compression, and gamma correction are applied to spliced images, and the visual results are shown in Fig. 8. In the first row, a Gaussian blur filter is applied to make the splicing harder to trace. In the second to fourth rows, JPEG compression with compression ratios of 90%, 80%, and 70% is applied to the spliced images, and in the last row, gamma correction (γ = 1) with a reduced dynamic range (50 to 200) is applied. The results of the proposed method remain stable and effective under all operations, with very high TPR and very low FPR.
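One reading of the gamma-correction setting for 8-bit images is the intensity mapping below, which compresses [0, 255] into [50, 200] and applies the exponent γ; with γ = 1 the mapping is linear. This is our interpretation of the setting, not the authors' code, and the function name `adjust_range` is hypothetical. Gaussian blur and JPEG compression are omitted because they need an image-processing library.

```python
import numpy as np


def adjust_range(img, low=50.0, high=200.0, gamma=1.0):
    """Map 8-bit intensities from [0, 255] into [low, high] with the given
    gamma; gamma = 1 gives a purely linear compression of the dynamic range."""
    x = np.asarray(img, dtype=np.float64) / 255.0   # normalize to [0, 1]
    return low + (high - low) * x ** gamma


print(adjust_range(np.array([0, 51, 255])))  # approximately [50, 80, 200]
```

Because the mapping is monotonic and, for γ = 1, affine, it rescales the noise standard deviation by the same factor (150/255) everywhere, which leaves the direction of per-object noise-level vectors largely unchanged.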

Visual evaluation of the robustness of the proposed method.
A method is efficient when the average computational time spent on segmentation, feature computation, and localization is small. In this work, after the image is segmented into individual objects, noise features are estimated only from each foreground and background, assuming that the noise level of the background is similar to that of the whole image, which saves considerable computation time. The pairwise dissimilarity is then computed among objects to localize the tampered region. Table 3 shows the average running time of the proposed method and of the state-of-the-art methods [4, 18] on the Columbia and DSO splicing datasets; the proposed method takes the least time to localize the tampered region.
Average Running Time
State-of-the-art methods rely on either regular (block-based) or irregular (superpixel-based) segmentation. To the best of our knowledge, no noise-based technique uses object-based segmentation for localization. The proposed method uses object-based segmentation and thereby achieves accurate and efficient localization of the tampered area.
The proposed method localizes robustly even when the spliced and original regions have small noise differences, and it achieves a higher TPR and a lower FPR than state-of-the-art methods.
The method is also robust to post-processing operations such as Gaussian blur, JPEG compression, and gamma correction applied to a spliced image. However, as the compression ratio decreases further, the method can no longer localize the spliced region accurately; addressing this is left for future work.
Conclusion
This paper proposes an effective method for localizing the tampered region of a spliced image using object-based segmentation. Each segment is split into foreground and background objects, and noise-level features are estimated for all objects in both the pixel and residual domains. The cosine similarity among pairs of objects is then computed, and the tampered region is exposed using the proposed algorithm. The experimental findings demonstrate that the proposed approach localizes the manipulated region more accurately, efficiently, and reliably than state-of-the-art noise-based methods.
Acknowledgments
The authors express their deep gratitude to the anonymous reviewers for their suggestions to improve the authenticity of the work. The authors also thank [5, ] for making their code and datasets available online.
