Video interframe forgery detection: Classification, technique & new dataset

Abstract

Multimedia communication and related technologies are growing rapidly. Digital content has traditionally been accepted as legitimate evidence, but the wide availability of video editing tools that can modify an original video has weakened this trust. To address this problem, a new technique is proposed for the detection of duplicated video sequences. The technique extracts Hu moment features from the gray values of each frame, and these features are then used to classify a video as authentic or forged. Validating such a technique requires training and test datasets, and the scarcity of these datasets is one of the key obstacles to assessing the effectiveness of video tampering detection techniques. For this reason, the Video Forensics Library for Frame Duplication (VLFD) dataset is introduced for frame duplication detection. The dataset consists of 210 native videos, in Ultra-HD and Full-HD resolution, captured with different cameras. Every video is 6 to 15 seconds long and runs at 30 frames per second. All recordings were acquired in landscape mode in three different scenarios (indoor, outdoor, nature). VLFD includes both authentic and manipulated video files. The dataset has been created as an initial repository of manipulated videos and will be extended with new features and new techniques in the future.

Keywords

Video forgery detection, Interframe forgery, Frame duplication, Hu moments, Dataset library

1. Introduction

Over the last two decades, digital multimedia technology has transformed the way people live. Multimedia forensics has become a significant and widely pursued field of research in this technological era. Digital multimedia mainly comprises images and videos, which are primarily used for communication and entertainment, and it also plays a crucial role in the military, surveillance and the judiciary. Multimedia content now pervades everyday life and continues to evolve rapidly. High-tech, low-cost smartphone cameras allow anyone to record multimedia content easily. With the advent of advanced yet readily available editing tools (such as Adobe Photoshop and Apple Final Cut), digital photos and videos are being modified more and more frequently. Such alterations are made, knowingly or unknowingly, by individuals who alter the truth for amusement or to carry out illegal activity [53]. A forger may duplicate or shuffle a video sequence, erase an object from a video sequence, insert an object from another video sequence, or create artifacts with any available computer graphics software. As a result, the authenticity of digital multimedia can no longer be taken for granted [47]. Detecting such tampering with the naked eye is extremely difficult. Video forgery affects several other domains as well, including the falsification of surveillance camera footage from public places, for example to defame celebrities or to disguise current trends. Establishing the source and content of digital visual information is also very challenging. All of this increases the need for digital video forgery detection techniques.

1.1. Measures of video forensics

Video forgery detection approaches fall into two groups: active approaches and passive approaches [3,34]. Among active approaches, digital signatures [21,52] and watermarking [30,40] have made great strides as effective techniques for identifying digital video forgery. However, an active approach requires data to be pre-extracted or pre-embedded during the acquisition phase [13,43]. Many cameras have no such feature, so passive approaches dominate recent research. Passive approaches do not require any prior information about the video. Videos can be manipulated by interframe or intraframe forgery. Interframe modification occurs at the sequence level: the pixels of the individual frames are retained, but the original frame sequence is altered [16]. Such manipulation generally involves removing frames from a video sequence, adding frames to it, duplicating frames, or shuffling the frame sequence [42]. Intraframe alteration occurs at the pixel level, where the pixels of individual frames are manipulated. The prevalent intraframe forgery techniques are copy-move, splicing and retouching [39]. In copy-move forgery, regions of a frame are copied and pasted elsewhere in the same frame to hide or add information in the original video, whereas in splicing two or more frame regions are combined to create a forged frame [26]. Retouching is a less destructive forgery technique in which the image or frame is not changed fundamentally, but some of its original features are enhanced [10,33].

This paper presents a video forgery detection technique based on Hu moments. Its main contributions are:

A technique for detecting frame duplication forgery in videos.

A new dataset for testing the tools and techniques proposed to detect the frame duplication forgery in videos.

The broader goal of this paper is to identify the challenges left open by past research in this field. More precisely, it focuses on the ongoing problem of the integrity and authenticity of video content. The paper is organized in six sections. Section 1 gives a concise description of digital image and video forensics, describes the techniques used for detecting forgery in images and videos, and explains the need for datasets for assessment. Section 2 reviews relevant existing papers on video forgery from 2005 to 2020, with a brief analysis of their detection techniques, and discusses the significance and unresolved issues of existing frame duplication forgery detection techniques. Section 3 states the problem addressed by the proposed technique and its significance. It begins with the basic principle of the algorithm and then describes each step of the proposed algorithm in detail. The experimental findings obtained with the proposed technique are presented in Section 4. Ground truth about the dataset contributions is discussed in Section 5. Finally, Section 6 summarizes the overall work and provides recommendations for further studies in the area of video forgery detection.

2. Related works

2.1. Inter-frame forgery detection

The detection of interframe forgeries covers four types of manipulation [42,46]: frame insertion, frame deletion, frame duplication and frame shuffling. To expose digital video forgery, [46] used motion-compensated edge artifacts (MCEA). The method requires a hard threshold, and its performance degrades on slow-motion videos; these limitations severely restrict its practical applicability. The authors of [11,24,32,45] introduced techniques for identifying frame deletion forgery, but not all of them work properly when the number of deleted frames is an integral multiple of the Group of Pictures (GOP) length. The scheme proposed in [23] is a coarse-to-fine technique that consists of candidate clip selection, estimation of spatial similarity, and identification of frame replication. The histogram difference of two adjacent frames in the RGB color space is used as a feature to select duplicate candidates in the temporal domain, and a block-based algorithm measures the spatial correlation. This strategy does not work well if the copied frames are shuffled before they are pasted into another region. It also requires more computational time and is unsuitable for videos with post-processing operations. To detect frame duplication forgery in videos, the authors of [22] introduced a passive-blind scheme, but the approach cannot perform localization. In [54], the authors introduced a method based on the consistency of the velocity field to detect interframe forgery in digital video. That technique only works on videos obtained through static surveillance. Another technique, in [41], adopted sub-block-based features to recognize frame replication forgery. Because the strategy relies on correlation, compression often degrades its performance.

The author of [56] presented a technique for frame duplication detection that divides the video into overlapping sub-sequences of frames and then applies singular value decomposition (SVD) to every frame. This approach cannot identify duplicated frames when there are fewer duplicated frames than the size of the window considered. In [25], Zernike moment correlation was used, with Zernike opponent chromaticity moments (ZOCM) calculated from the chromaticity space. The technique is adequate for static or slow-motion camera modes but does not work on videos with dynamic backgrounds. In [12], spatial energy (SE), temporal energy (TE) and SNR are determined from the residue data obtained during the decoding process, and the detection of interframe forgeries such as replication, insertion and removal depends on these measures. The method also fails to detect the forgery if frames are removed from a static scene. In [6], the authors suggested a two-stage forensic technique for detecting video interframe forgery. First, outlier frames are identified using Haralick-coded frame correlation; further analysis is then carried out to eliminate false positives, which increases the effectiveness of forgery detection. However, this technique fails to identify frame reshuffling and replacement tampering.

2.2. Intraframe forgery detection

A technique for detecting tampering in interlaced and de-interlaced videos was designed in [49]. The spatial and temporal correlations introduced by the deinterlacing algorithm are destroyed by manipulation of the video, which disrupts the motion between neighbouring frames. In [50], the authors also suggested a correlation-based technique for detecting region duplication, which fails when the replication is performed instantaneously with time. In [14], block-level correlation values of noise residuals are extracted and their distribution is modelled as a Gaussian mixture model (GMM) for forged and normal video. The expectation-maximization (EM) algorithm is then used to estimate the parameters of the GMM. Based on the estimated parameters, a Bayesian classifier is employed to find the optimal threshold value. This method is not good enough for moving-camera or dynamic-background videos. In [18], the magnitude and orientation of the motion vectors (MVs) are computed from adjacent frames and used to differentiate authentic regions from forged regions. The distribution of MVs is usually more uniform for normal movement than for a tampered region, so the variance of the MV angles is larger in tampered regions than in normally moving regions.

In [20], an approach is presented for detecting tampering performed with inpainting methods such as temporal copy-and-paste (TCP) and exemplar-based texture synthesis (ETS). These inpainting methods fill the holes left by removed objects. Frame motion information is calculated from the grayscale-converted video and used for frame grouping and alignment, so that camera motion is compensated and can be neglected in the subsequent analysis and detection steps. Spatio-temporal coherence analysis is performed over each frame group independently. This produces a group coherence abnormality pattern (GCAP), which is used to identify regions with unnaturally high or abnormally low coherence. Each spatio-temporal slice is compared to its GCAP to determine whether it has been tampered with. A method for detecting region-tampered frames (added or removed objects) using features extracted from the motion residuals of each frame was proposed in [7]. To make the system robust to videos with a variable GOP structure, collusion operators are used to generate the motion residues. To compute the motion residue of the kth frame, a sequence of frames centered at k is taken. A method for detecting copy-move forgery using Zernike moments and 3D PatchMatch was discussed in [9]. The authors extended the 2D PatchMatch for images used in [8] to account for temporal information. Although the method achieved very low accuracy, it is rotation and scale invariant. The authors of [5] developed a hybrid mechanism that compares triangles and then determines the interest points in the images to detect copy-move forgery. Block matching approaches work efficiently for pure translation but are extremely slow under geometric transformations, so they do not work well in that case.

To identify upscale-crop and splicing tampering, [37] introduced a resampling-based technique for digital videos. The modified Gallagher detector (MG) and the fractional modified Gallagher detector (F-MG) are used for pixel-covariance correlation analysis. The technique is also applied to regions of interest (ROIs) of video frames to identify splicing (region-level) tampering. The major challenge of this method is estimating the parameters used in the analysis, such as the scaling factors and the interpolation filter. For splicing (region-level) detection, [38] used sensor pattern noise (SPN), Hausdorff distance-based clustering and color filter array (CFA) for copy-paste forgery detection and localization when a specific area of a video frame is modified. Because frame-to-frame and region-based matching are involved, this procedure is computationally very expensive. Using the block correlation matrix, [4] presented a simple strategy for detecting copy-move attacks, but the strategy produces false positives and false negatives, which lead to detection errors.

In this paper, greater emphasis is placed on strategies for detecting inter-frame forgery. Table 1 gives a systematic review of the interframe forgery detection techniques discussed above, focusing on their research gaps, test datasets and the resolution of those datasets. In brief, certain techniques detect video inter-frame forgery effectively, but they require long computation times and are less suited to real-life applications [23,25,41,54,56]. Furthermore, some techniques have been evaluated only on low-resolution video datasets [6,12]. This work therefore aims to develop a fast method for detecting and locating duplicate frames in videos. A video interframe forgery detection technique based on Hu invariant moments and coarseness evaluation is presented to increase detection effectiveness and accuracy. In addition, the high-resolution VLFD dataset library has been designed to help future researchers evaluate their findings. It contains 420 original and forged videos in Full-HD and Ultra-HD resolution. The technique detects forgery efficiently in static scenes and also performs well in the case of reshuffling. Experiments are performed with high-resolution test videos. To the best of the authors' knowledge, no dataset repository with ultra-high resolution has been available until now. Section 4.2 gives a comparative overview of the available dataset libraries.

Table 1
Systematic review of interframe forgery detection techniques

Author Name	Test Dataset	Resolution of Test Dataset	Research Gaps
Lin et al. [23]	4 Videos	720 × 480	∙ High computational time ∙ It is not acceptable for the forged videos with post processing operations. ∙ This strategy doesn’t work if copied frames are shuffled before they are pasted
Lin & Chang [22]	15 Videos	720×480, 640×272	∙ This technique fails to perform localization
Wu et al. [54]	120 Videos	720 × 576	∙ The proposed technique only works on videos obtained through static surveillance.
Singh et al. [41]	5 videos	1080 × 1920, 240 × 320	∙ The forgery in the videos captured with the stationary camera is difficult to detect. ∙ Compression often degrades the system performance as the strategy works with correlation.
Yang et al. [56]	25 videos	320×240, 720×576, 640×480, 328×264, 320×134	∙ This approach limits the identification of duplicate frames when there are fewer duplicate frames than the size of the window considered.
Liu & Huang [25]	30 videos	320 × 240	∙ The technique was adequate for static or slow-motion camera mode but did not work in dynamic background videos.
Fadl et al. [12]	132 videos	352 × 288	∙ Fails to detect forgery in static-scene videos. ∙ Cannot handle noisy videos.
Bakas et al. [6]	150 videos	320 × 240, 352 × 288	∙ This technique fails to identify frame reshuffling and replacement tampering.
Wang & Farid [49]	2 Videos	480 × 720	∙ Lower detection accuracy ∙ Not very efficient technique ∙ Higher time complexity ∙ Localization not performed

3. Proposed technique

The proposed technique detects frame duplication forgery in videos using the seven invariant Hu moments. Hu moments extract features from an image or frame effectively and are robust against preprocessing. Figure 1 shows the phases of the proposed technique.

Fig. 1.

Workflow of proposed technique.

3.1. Preprocessing

The first step is to convert the video into a frame sequence. Every pixel of a frame is made up of three color components: red, green and blue (RGB). An RGB color image requires three times the storage space of a gray image. Consequently, to reduce computation, gray images are used to represent the original color images. The primary motivation for preprocessing each frame is to improve efficiency by eliminating redundancy. The operations in the preprocessing phase include image resizing and filtering. The grayscale image is denoted by $I$, the size of each frame is $n \times m$, and every frame is converted to grayscale using Eq. (1) [48]: $\begin{matrix} (1) & I = 0.299 R + 0.587 G + 0.114 B \end{matrix}$ where $R$, $G$ and $B$ represent the red, green and blue components respectively.
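As a minimal illustration of the grayscale conversion, the sketch below applies the standard luma weights 0.299, 0.587 and 0.114 to a frame stored as rows of (R, G, B) tuples. The function name `to_gray` and the tiny 2 × 2 test frame are assumptions made for the example and do not come from the paper.

```python
def to_gray(frame_rgb):
    """Convert a frame given as rows of (R, G, B) tuples to gray values.

    Uses the weighted sum I = 0.299 R + 0.587 G + 0.114 B.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame_rgb]


# Hypothetical 2 x 2 frame: white, pure red, pure green, pure blue.
frame = [[(255, 255, 255), (255, 0, 0)],
         [(0, 255, 0), (0, 0, 255)]]
gray = to_gray(frame)
print(gray)
```

Because the three weights sum to one, a white pixel keeps its full intensity, while the pure green pixel comes out brighter than the pure red or pure blue one.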

3.2. Feature extraction

Hu moments are widely used in image processing, video processing and computer vision [27]. These moments and their invariant functions are primarily applied for feature extraction in pattern recognition and in digital watermarking techniques.

3.2.1. Hu moments

In the present technique, feature extraction with Hu moments yields invariant feature vectors for each frame. The Hu invariant moments are statistical measures designed to remain unchanged under rotation, scaling and translation of an object. Hu moments give a generic description of objects and are convenient to extract [28]. A two-dimensional frame is represented as $f (x, y)$ and its moment $m_{pq}$ of order $(p + q)$ is given in Eq. (2) [55]: $\begin{matrix} (2) & m_{p q} = \sum_{x} \sum_{y} x^{p} y^{q} f (x, y) \end{matrix}$ where $p, q = 0, 1, 2, \dots, n$ . For the grayscale image obtained using Eq. (1), with pixel intensities $I (x, y)$ , the moments $m_{i j}$ are calculated by: $\begin{matrix} (3) & m_{i j} = \sum_{x} \sum_{y} x^{i} y^{j} I (x, y) \end{matrix}$ where $i, j = 0, 1, 2, \dots, n$ . The zero-order moment, which statistically defines the mass of each frame, is: $\begin{matrix} (4) & m_{0 0} = \sum_{x} \sum_{y} f (x, y) \end{matrix}$

The moments in Eq. (2) are not invariant when the frame $f (x, y)$ is translated, rotated or scaled. Invariant features can be obtained by using the central moments $μ_{pq}$ : $\begin{matrix} (5) & μ_{p q} = \sum_{x} \sum_{y} {(x - \bar{X})}^{p} {(y - \bar{Y})}^{q} f (x, y) \end{matrix}$

The centroid of the frame $f (x, y)$ is denoted by $(\bar{X}, \bar{Y})$ . If the centroid coincides with the origin of the coordinate system, then $\bar{X} = 0$ and $\bar{Y} = 0$ . The centroid of a frame is evaluated from the first-order moments $(m_{10}, m_{01})$ , i.e. $\bar{X} = \frac{m_{10}}{m_{00}}$ and $\bar{Y} = \frac{m_{01}}{m_{00}}$ . Normalization additionally provides invariance to scaling, and the normalised central moments are given by Eq. (6): $\begin{matrix} (6) & η_{p q} = \frac{μ_{p q}}{μ_{00}^{γ}} \end{matrix}$ in which $γ = \frac{(p + q + 2)}{2}$ , where $p + q = 2, 3, \dots, n$ .
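The moment definitions above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names, the nested-list frame layout `img[y][x]` and the small test frames are assumptions, and the normalisation divides by $μ_{00}^{γ}$ with $γ = (p + q + 2)/2$, which is the standard definition of normalised central moments.

```python
def raw_moment(img, p, q):
    """Raw moment m_pq = sum_x sum_y x^p * y^q * f(x, y), with img[y][x] = f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def centroid(img):
    """Centroid (X_bar, Y_bar) = (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img, p, q):
    """Central moment mu_pq, i.e. the raw moment taken about the centroid."""
    x_bar, y_bar = centroid(img)
    return sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalized_moment(img, p, q):
    """Normalised central moment eta_pq = mu_pq / mu_00 ** ((p + q + 2) / 2)."""
    gamma = (p + q + 2) / 2
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma


# Hypothetical 5 x 5 gray frame and the same content shifted right by 1 and down by 2.
blob = [[0, 0, 0, 0, 0],
        [0, 1, 2, 0, 0],
        [0, 3, 4, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
shifted = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 1, 2, 0],
           [0, 0, 3, 4, 0]]

print(raw_moment(blob, 1, 0), raw_moment(shifted, 1, 0))          # raw moments change
print(central_moment(blob, 2, 0), central_moment(shifted, 2, 0))  # central moments agree
```

The two frames have different raw first-order moments, but their central moments agree up to floating-point rounding, which is the translation invariance the central moments are introduced for.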

The author of [15] used the normalized central moments to introduce seven moment invariants. These invariants are computed for each frame for feature extraction and are nonlinear combinations of $η_{pq}$ , as given in Eq. (7): $\begin{array}{rl} H_{1} = & η_{20} + η_{02} \\ H_{2} = & {(η_{20} - η_{02})}^{2} + 4 {(η_{11})}^{2} \\ H_{3} = & {(η_{30} - 3 η_{12})}^{2} + {(η_{03} - 3 η_{21})}^{2} \\ H_{4} = & {(η_{30} + η_{12})}^{2} + {(η_{03} + η_{21})}^{2} \\ H_{5} = & (η_{30} - 3 η_{12}) (η_{30} + η_{12}) ({(η_{30} + η_{12})}^{2} - 3 {(η_{21} + η_{03})}^{2}) \\ & + (3 η_{21} - η_{03}) (η_{21} + η_{03}) (3 {(η_{30} + η_{12})}^{2} - {(η_{03} + η_{21})}^{2}) \\ H_{6} = & (η_{20} - η_{02}) ({(η_{30} + η_{12})}^{2} - {(η_{21} + η_{03})}^{2}) + 4 η_{11} (η_{30} + η_{12}) (η_{21} + η_{03}) \\ H_{7} = & (3 η_{21} - η_{03}) (η_{30} + η_{12}) ({(η_{30} + η_{12})}^{2} - 3 {(η_{21} + η_{03})}^{2}) \\ & - (η_{30} - 3 η_{12}) (η_{21} + η_{03}) (3 {(η_{30} + η_{12})}^{2} - {(η_{03} + η_{21})}^{2}) \qquad (7) \end{array}$ In the proposed approach, the mean of these seven invariant moments is determined for every individual frame and stored in a feature vector. The mean difference of adjacent frames is then determined, and the correlation of adjacent feature vectors is analysed using the Euclidean distance. The Euclidean distance is preferred because it has a much lower computational load than other measures. It has been observed that the duplicate frames of a forged video are highly correlated. The similarity between neighbouring frames is therefore computed as the Euclidean distance between their feature vectors to determine whether they are duplicates. The Euclidean distance [56] between $I_{i}$ and $I_{i + 1}$ is denoted by $Edis (i)$ and computed as in Eq. (8): $\begin{matrix} (8) & Edis (i) = \sqrt{\sum_{k = 1}^{r} {(x_{i, k} - x_{i + 1, k})}^{2}} \end{matrix}$ where $x_{i, k}$ is the $k$th component of the feature vector of frame $I_{i}$ and $r$ is the number of components.

$Edis (i)$ measures the difference between $I_{i}$ and $I_{i + 1}$ , and $Edis (1) = 0$ . Because only a single quantity is computed per pair of frames, the amount of calculation is significantly reduced. An SVM classifier is used to further improve the accuracy of the proposed technique.
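The feature extraction and distance steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the standard definitions of the seven Hu invariants, takes the seven invariants of a frame directly as its feature vector, and stores the distance between frames i-1 and i at position i so that the first entry is 0. The names `eta`, `hu_moments`, `edis` and the small test frames are hypothetical.

```python
import math


def eta(img, p, q):
    """Normalised central moment eta_pq of a gray frame stored as img[y][x]."""
    m00 = sum(map(sum, img))
    x_bar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v
             for y, row in enumerate(img) for x, v in enumerate(row))
    return mu / m00 ** ((p + q + 2) / 2)


def hu_moments(img):
    """Return the seven Hu invariants H1..H7 of a gray frame."""
    n = {(p, q): eta(img, p, q)
         for p in range(4) for q in range(4) if 2 <= p + q <= 3}
    a, b = n[3, 0] + n[1, 2], n[2, 1] + n[0, 3]
    c, d = n[3, 0] - 3 * n[1, 2], 3 * n[2, 1] - n[0, 3]
    return [
        n[2, 0] + n[0, 2],
        (n[2, 0] - n[0, 2]) ** 2 + 4 * n[1, 1] ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n[2, 0] - n[0, 2]) * (a ** 2 - b ** 2) + 4 * n[1, 1] * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def edis(features):
    """Euclidean distances between feature vectors of adjacent frames.

    Entry 0 is fixed to 0; entry i (i >= 1) is the distance between
    frames i-1 and i (0-based indexing, an assumption of this sketch).
    """
    return [0.0] + [math.dist(f, g) for f, g in zip(features, features[1:])]


# Hypothetical gray frames; B appears twice in a row to mimic a duplicated frame.
A = [[1, 2, 0], [0, 5, 0], [0, 0, 1]]
B = [[0, 0, 3], [4, 1, 0], [2, 0, 0]]
C = [[9, 0, 0], [0, 1, 0], [0, 0, 1]]
distances = edis([hu_moments(f) for f in (A, B, B, C)])
print(distances)
```

In this toy sequence the distance at the duplicated position is exactly zero while its neighbours are not, which is the kind of pattern the later steps look for.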

3.3. Classification

The feature vectors are fed to a Support Vector Machine (SVM) classifier [17], which labels video frames as authentic or tampered. The kernel function of the SVM maps the input feature vectors into a higher-dimensional feature space. The optimal hyperplane is then constructed and acts as the decision boundary between the classes. Frames with similar feature vectors are classified as tampered, while frames with dissimilar feature vectors are classified as genuine.

3.4. Abnormal point detection

After the video frames have been classified as authentic or tampered, the exact location of the duplicate frames is extracted. The technique first computes the mean $μ_{1}$ and variance $σ_{1}$ of the Euclidean distances. To identify extreme values of the distribution, a rule of the 3-sigma type is used, with the lower limit $l b$ and the upper limit $u b$ given by $\begin{array}{ll} (9) & l b = μ_{1} - (6.0 * σ_{1}) \\ (10) & u b = μ_{1} + (6.0 * σ_{1}) \end{array}$ Suspected abnormal points are then identified as the points lying outside the lower bound $l b$ and the upper bound $u b$ . However, if $σ_{1}$ is too low, the suspected points may also include ordinary points, which results in false detections. To minimize false detections, a threshold value $T$ is used. Finally, the duplicate frames in the video are located and highlighted.
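A minimal sketch of the bound computation in Eqs. (9) and (10) is given below. Several points are assumptions of the example rather than statements from the paper: $σ_{1}$ is treated as the population standard deviation, and the threshold $T$, whose exact role is not specified in the text, is applied here as a minimum absolute deviation from the mean. The function name `abnormal_points` and the synthetic distance series are hypothetical.

```python
from statistics import mean, pstdev


def abnormal_points(edis, k=6.0, T=0.0):
    """Return indices of suspected abnormal points and the bounds (lb, ub).

    lb = mu1 - k * sigma1 and ub = mu1 + k * sigma1, with k = 6.0 as in
    Eqs. (9) and (10). A suspect is kept only if it deviates from the
    mean by more than T (assumed use of the threshold).
    """
    mu1 = mean(edis)
    sigma1 = pstdev(edis)  # assumption: sigma1 taken as the standard deviation
    lb, ub = mu1 - k * sigma1, mu1 + k * sigma1
    suspects = [i for i, d in enumerate(edis) if d < lb or d > ub]
    kept = [i for i in suspects if abs(edis[i] - mu1) > T]
    return kept, (lb, ub)


# Synthetic distances: 49 ordinary values and one spike at index 30.
series = [1.0] * 50
series[30] = 10.0
points, (lb, ub) = abnormal_points(series)
print(points, lb, ub)
```

With a multiplier of 6.0, a single spike can only fall outside the bounds when the series is long enough (more than 37 samples for one isolated outlier), which is why the synthetic series uses 50 values.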

4. VLFD dataset contributions

In video forensics, scholars have proposed a variety of methodologies for identifying image and video forgery and thereby reducing multimedia crime. Dataset libraries such as RAISE, VISION, UCID and CASIA are available for testing and validating techniques proposed in image forensics. By contrast, reliable datasets for evaluating proposed video forgery detection techniques are limited.

Fig. 2.

Video interframe forgery.

4.1. Proposed dataset library

The Video Forensics Library for Frame Duplication (VLFD) is designed mainly for validating techniques proposed for video frame duplication forgery detection. Frame duplication is a kind of interframe video forgery in which a sequence of frames is copied and placed elsewhere in the same video. The attack is performed to hide an object present in the video frame sequence, for entertainment or sometimes to conceal criminal activity. Video inter-frame forgery attacks are illustrated in Fig. 2. As discussed in the literature review, various researchers have introduced techniques to detect these forgeries, but dataset libraries for validating them are scarce. Therefore, a new dataset library is introduced, in which 210 native videos were acquired with modern smartphone cameras from two brands, ASUS and VIVO. These devices produce standardized, high-quality videos.

Fig. 3.

Layout for building the library by utilizing data from different devices.

4.1.1. Generalized layout framework

Figure 3 shows a framework for building the library from data acquired with different devices. All videos from both devices were acquired in landscape mode. Video acquisition covers different scenarios: flat, indoor, outdoor and nature. Videos of the flat scenario show the sky, walls, etc. Videos taken inside a building belong to the indoor scenario, while videos taken outside in open, crowded areas such as gardens are considered outdoor videos. Nature corresponds to various aspects of the natural environment. Examples of the different scenarios are shown in Figs. 4, 5 and 6. Figures 4(a) and 4(b) depict the original and forged frames of a video of butterflies shot indoors. In this example, the video sequence in the first row has been manipulated by frame duplication forgery, while the second row depicts the original video frames. Original and forged video frames from the proposed dataset are shown in Figs. 5(a) and 5(b), where an outdoor video sequence depicts a car on the road and a tree. The first row contains the original video frames, while the second row depicts the manipulated sequence with frame replication forgery, in which some previous frames are duplicated. Finally, Figs. 6(a) and 6(b) illustrate a simple video sequence of nature. Two acquisition modes (still mode and move mode) were used for each scenario. In still mode, static areas are recorded, whereas in move mode there is some movement; for example, the user strolls while capturing the video. All captured videos contain static or moving objects and were recorded with or without a tripod.

Fig. 4.

Indoor video frames: (a) original frames (b) forged frames.

Fig. 5.

Outdoor video frames sequence: (a) original frames (b) forged frames.

Fig. 6.

Nature video frames sequence: (a) original frames (b) forged frames.

4.1.2. Video acquisition

The dataset library contains 50 still-mode and 60 move-mode videos recorded with an ASUS smartphone camera at 3840 × 2160 resolution. This model uses phase detection autofocus (PDAF) camera technology with an aperture of f2.2. The frame rate of each video is 30 frames per second (fps). The part of the dataset collected with the Vivo smartphone camera includes 50 still-mode and 50 move-mode videos at 1920 × 1080 resolution. This model also uses PDAF with an aperture of f2.2, and each video runs at 30 fps. All videos acquired with both cameras are less than 15 seconds long. The key features of the whole dataset are listed in Table 2, which gives the brand, model, camera, resolution, frame rate, aperture, length and number of videos captured on both devices. It also states the type of forgery implemented and the number of indoor, outdoor and nature videos.

4.1.3. Generalized framework of video forgery process

Frame duplication is the most common forgery in the VLFD video tampering dataset. The steps of the tampering procedure are:

Original videos are acquired using smartphone camera with file type (.mp4);

Import video to Video Editor;

Frame duplication is performed in Video Editor;

Export Tampered Video.

A basic flow chart of the video frame duplication tampering process is shown in Fig. 7. In the present work, the OpenShot software, which is well suited to video manipulation, was used to carry out the frame duplication. The tampering process begins by importing the videos into the software and dragging them one by one onto the timeline. Using the appropriate tools, a range of frames is copied and pasted at some random location in the same video. Finally, the tampered video is exported to the storage device.
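The copy-and-paste step of this procedure can be mimicked on a list of frame indices, which also makes it easy to record where the copied segment ends up. The sketch below is illustrative only and does not reproduce the editor workflow: the function name `duplicate_segment`, the ground-truth dictionary layout and the restriction that the paste position must not fall inside the copied segment are assumptions of the example.

```python
def duplicate_segment(frames, src_start, length, dst):
    """Copy frames[src_start:src_start + length] and paste the copy at index dst.

    Returns the forged frame list and a ground-truth record holding the
    inclusive index ranges of the source segment and of its copy, both
    expressed in the forged video.
    """
    src_end = src_start + length
    if length <= 0 or src_start < 0 or src_end > len(frames):
        raise ValueError("source segment outside the video")
    if not (0 <= dst <= src_start or src_end <= dst <= len(frames)):
        raise ValueError("paste position must lie outside the source segment")
    clip = frames[src_start:src_end]
    forged = frames[:dst] + clip + frames[dst:]
    # Pasting before the source shifts the source segment by `length` frames.
    shift = length if dst <= src_start else 0
    truth = {
        "source": (src_start + shift, src_end - 1 + shift),
        "copy": (dst, dst + length - 1),
    }
    return forged, truth


# Hypothetical 10-frame video represented by its frame indices.
video = list(range(10))
forged, truth = duplicate_segment(video, src_start=2, length=3, dst=7)
print(forged)
print(truth)
```

Keeping such a record alongside each forged clip is one simple way to store ground truth for later evaluation of a detector.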

Table 2
The key features of VLFD dataset

Brand	Model	Camera	Format	Resolution	Fps	Aperture	Length	Forgery Type	Indoor	Outdoor	Nature	Total
ASUS	ASUS_X00TD	PDAF	MP4	3840 × 2160 (UHD)	30	f/2.2	<15sec	Frame Duplication Forgery	25	55	30	110
VIVO	VIVO_1811	PDAF	MP4	1920 × 1080 (FHD)	30	f/2.2	<15sec	Frame Duplication Forgery	10	40	50	100

Fig. 7.

Generalized framework of video forgery process.

4.2. Qualitative & quantitative analysis with available dataset libraries

This section summarizes the current state of video forgery datasets. One of the most widely used is SULFA, created in 2012 by [31] and accessible through the University of Surrey website [31]. It has been used to validate techniques proposed for copy-paste forgery detection. The forgeries were produced from the original videos with Adobe Photoshop CS3 and CS5. The captured videos have a resolution of only 320 × 240, which is very low. The REWIND dataset extends the Surrey University Library for Forensic Analysis (SULFA) dataset; it contains 10 original and 10 forged videos at 320 × 240 pixel resolution and a frame rate of 30 fps. Another dataset library creates manipulated clips with the help of Mokey 4.1.4 [44]. Subsequently, [1] introduced the Video Tampering Dataset (VTD), a publicly accessible video forensic library hosted on YouTube [2]. Its primary aim is to evaluate methods for identifying splicing, frame swapping and copy-move forgery. Another large dataset, SYSU-OBJFORG, was developed by [7] for object-based video forgery, but it is not freely available to the public. The dataset introduced by [36] is available at the web address given in [35] and is freely accessible to the research community; however, it is not suitable for all forgery detection techniques. Table 3 lists a range of existing video manipulation repositories along with their key characteristics and compares the features of the proposed VLFD dataset with them. Compared with the existing datasets, the proposed dataset offers higher resolution: a PDAF camera is used to acquire FullHD and UltraHD videos. The VLFD dataset also provides ground truth containing significant information about the videos in the repository.

Table 3
Comparative summary of notable video forgery dataset contributions

Sr. No.	Tampering Dataset	Total Videos	Video Source	Resolution	Camera Type	Duration	Format	Forgery Type	Gaps
1.	SULFA2012 (Surrey University Library for Forensic Analysis) [31]	150 videos (30fps)	∙ Canon SX220 (codec H.264) ∙ Nikon S3000 (codec MJPEG) ∙ Fujifilm S2800HD (codec MJPEG)	320 × 240	Static Camera	10sec	MOV, AVI	Spatial & Temporal Copy-Move Forgery	∙ Low Resolution Videos ∙ Do not cover all the types of video tampering
2.	Liao et al. 2013 [19]	10 videos (25 & 30fps)	SONY DSCP10 & Internet	640 × 480	Static Camera	NA	MPEG-2	Copy Move Object Forgery	∙ This dataset has no ground truth to show the significant information such as video length and number of frames.
3.	Lichao et al. 2014 [44]	7 videos (25fps)	SONY DSC-P10 & Internet	640 × 480	Static Camera	NA	MPEG-2	Copy Move Object Forgery	∙ The video tampering of this dataset was performed depending on a single object movement and has small digital video origin.
4.	VTD2016 (Video Tampering Dataset) [1]	33 videos (30fps)	Nine different Smartphone cameras, YouTube & by exploring social networking sites	1280 × 720	Static & Dynamic Camera	16sec	AVI	Spatial & Temporal Domain (Splicing, Copy-Move, Frame-Shuffling)	∙ Distribution on YouTube means all videos affected by varying compression.
5.	SYSU-OBJFORG 2016 [7]	100 Public 100 Authentic videos (25fps)	Static commercial surveillance camera	1280 × 720	Static Camera	11sec	H.264/MPEG-4	Spatial Domain (Object based Forgery)	∙ No source for public download
6.	VISION 2017 [36]	622 Native 587 YouTube videos (24fps)	35 different portable devices & YouTube	FullHD	Static camera	>60sec	MOV, MP4, 3GP	Source Device Identification	∙ Due to large size videos it is more time consuming to evaluate a technique
7.	TDTVD 2020 [29]	210 Tampered Videos (30fps)	SULFA dataset & YouTube (VTD dataset)	320 × 240 or 640 × 360	Static Camera	6sec to 18sec	MOV, AVI	Temporal Domain Tampering	∙ Resolution is very low ∙ Compression affected videos
8.	Proposed VLFD 2020	200 original & 200 tampered videos (30fps)	∙ ASUS PDAF Camera ∙ VIVO PDAF Camera	1920 × 1080 (FullHD), 3840 × 2160 (UltraHD)	Static & Dynamic	>15sec	MP4, AVI	Interframe Forgery	∙ Intraframe forgery has not been implemented.

4.3. Ground truth

The primary objective of developing the ground truth for this dataset is to provide comprehensive details on the video manipulation applied in this study. The VLFD dataset allows investigators to rapidly test and analyze their methods using a reliable repository. The repository contains a large number of videos, the longest in the ground truth being 12 seconds. The contributions also include forged videos obtained by manipulating the original videos: the VLFD dataset includes 200 original and 200 forged videos. All the original videos are manipulated by applying frame duplication forgery, a category of video inter-frame forgery. The key facts of the dataset are shown in Table 4. It lists the name of the video, the category in which the video was captured, the duration of the video, the bitrate, the total number of frames, the range of tampered frames, the percentage of tampered frames and the execution time per frame. The dataset is intended to help researchers test their methods and gain a clearer understanding of the methodologies used in their studies.

Table 4
Ground truth of video forgery dataset

Sr. No.	Test_Video	Category	Total_length	Bitrate (kbps)	Total_Frames	Tampered_Frames	Tampering (%)	Execution_Time (s/frame)
1.	Monkey	Nature	00:00:09	79999	270	241–270	11.11%	1.06
2.	Crow	Nature	00:00:11	16554	302	271–302	10.26%	0.76
3.	Flowers	Nature	00:00:06	81400	180	151–180	16.11%	1.12
4.	River	Nature	00:00:07	16602	239	31–60	12.13%	0.80
5.	Bowl	Indoor	00:00:06	81201	180	151–180	16.66%	1.05
6.	Butterflies	Indoor	00:00:06	10285	180	151–180	16.66%	0.79
7.	Globe	Indoor	00:00:06	19146	182	151–182	17.58%	0.85
8.	Duck	Indoor	00:00:06	9752	173	126–150	13.87%	0.80
9.	Pine_Flower	Outdoor	00:00:06	41154	182	151–182	17.03%	0.80
10.	Traffic	Outdoor	00:00:06	81681	181	152–182	16.02%	1.03
11.	Car	Outdoor	00:00:08	80373	240	211–240	12.08%	1.10
12.	Road&Car	Outdoor	00:00:08	80314	240	211–240	12.08%	1.16
13.	ABoy	People	00:00:06	80675	180	31–60	16.11%	1.11
14.	Walking	People	00:00:05	15576	175	151–175	13.71%	0.81
15.	Playing_children	People	00:00:06	80933	180	151–180	16.11%	1.13
16.	Village_Women	People	00:00:12	16881	362	19–36	4.69%	0.77
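As a worked example of the Tampering (%) column, the sketch below computes the share of tampered frames from a frame range and the total frame count. The function name and the inclusive, 1-based counting of the range are assumptions; inclusive counting reproduces the Monkey and Globe rows, while some other rows appear to follow a different counting convention that the text does not state.

```python
def tampering_percentage(first_frame, last_frame, total_frames):
    """Percentage of frames inside the inclusive range [first_frame, last_frame].

    Frame numbers are 1-based, as in the Tampered_Frames column.
    Counting both end points is an assumption of this sketch.
    """
    if not 1 <= first_frame <= last_frame <= total_frames:
        raise ValueError("frame range must lie inside the video")
    tampered = last_frame - first_frame + 1
    return 100.0 * tampered / total_frames


# Monkey row: frames 241-270 of a 270-frame video.
monkey = round(tampering_percentage(241, 270, 270), 2)
# Globe row: frames 151-182 of a 182-frame video.
globe = round(tampering_percentage(151, 182, 182), 2)
```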

5. Implementation & results evaluation

5.1. Experimental environment

The proposed technique is implemented in MATLAB on a machine with an Intel(R) Core(TM) i3-7020U processor and 4 GB of RAM, running 64-bit Microsoft Windows 10. Parallel processing is used instead of serial processing to speed up execution.

5.2. Experimental procedure

The proposed frame duplication detection algorithm has been evaluated on the 420 original and forged videos of the VLFD dataset, with the simulation carried out in the MATLAB environment. First, the frame sequence is extracted from a video, and each RGB frame is converted to grayscale in the preprocessing stage. Hu invariant moment features are then extracted from each frame. Because frame duplication reproduces existing frames, the source and destination frames in a forged video have identical features. The feature values of the frames in the forged video are compared to find their variance, as shown in the graph in Fig. 8(a). Finally, abnormal points are generated to localize the duplicated frames in the video. Frames containing abnormal points are labelled as tampered, while the remaining frames are considered original. Figure 8(b) shows the results for a forged video.
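The pipeline described above (grayscale conversion, Hu moment extraction, frame-to-frame comparison) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MATLAB implementation: the standard Hu moment definitions from [15] are used, and the BT.601 grayscale weights, the function names and the exact-match tolerance `tol` are assumptions. The variance-based abnormal point detection of the paper is replaced here by a plain pairwise comparison.

```python
def to_gray(frame_rgb):
    """Convert rows of (R, G, B) pixels to rows of gray values.

    The BT.601 luma weights are an assumption; the text only says that
    each RGB frame is converted to grayscale.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in frame_rgb]


def hu_moments(gray):
    """Return the seven Hu invariant moments of a 2-D list of gray values."""
    pixels = [(x, y, v) for y, row in enumerate(gray) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    if m00 == 0:
        return [0.0] * 7  # blank frame: all moments are taken as zero
    xc = sum(x * v for x, _, v in pixels) / m00  # intensity centroid
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def find_duplicate_pairs(frames, tol=1e-12):
    """Return index pairs (i, j), i < j, whose Hu feature vectors match.

    `tol` is a hypothetical tolerance for this sketch; compressed video
    would likely need a looser, tuned value.
    """
    feats = [hu_moments(to_gray(f)) for f in frames]
    return [(i, j)
            for i in range(len(feats))
            for j in range(i + 1, len(feats))
            if max(abs(u - v) for u, v in zip(feats[i], feats[j])) <= tol]
```

Comparing every pair of frames is quadratic in the number of frames, which is acceptable for clips of a few hundred frames but would need a smarter search for longer videos.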

Fig. 8.

Visualization of tampered video frame sequence.
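Once individual frames have been labelled as tampered, localization amounts to reporting them as frame ranges, in the same form as the Tampered_Frames column of Table 4. The helper below is a small sketch of that bookkeeping step; its name and the inclusive range convention are assumptions, and it is not taken from the paper.

```python
def group_into_ranges(frame_numbers):
    """Collapse frame numbers into inclusive (first, last) ranges.

    Consecutive frame numbers are merged into one range; the input may
    be unsorted and may contain repeats.
    """
    ranges = []
    for n in sorted(set(frame_numbers)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current run
        else:
            ranges.append((n, n))  # start a new run
    return ranges


# Frames flagged as tampered, in arbitrary order.
flagged = [243, 241, 242, 31, 30]
localized = group_into_ranges(flagged)
```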

5.3. Quality metrics

The confusion matrix, precision, recall and accuracy are the metrics used to assess the performance of the proposed technique. The confusion matrix is a table that describes the performance of a classifier on a test dataset for which the true values are known. It allows the quality of an algorithm to be visualized and is illustrated in Table 5.

Table 5
Confusion matrix for evaluation process

Actual / Evaluated	Evaluated: Forged/Positive	Evaluated: Original/Negative
Actual: Forged/Positive	True Positive	False Negative
Actual: Original/Negative	False Positive	True Negative

True Positive (TP) denotes duplicated frames that are present in the video and are classified as forged after evaluation. True Negative (TN) denotes original frames that are classified as original. False Negative (FN) denotes duplicated frames that are present in the video but are classified as original. False Positive (FP) denotes original frames that are classified as forged. Accuracy is the overall proportion of frames that are classified correctly and is given by Eq. (11):

$$\text{Accuracy (\%)} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \tag{11}$$

Precision is the proportion of frames classified as forged that are actually forged. It is also termed the positive predictive value (PPV) and is given by Eq. (12):

$$\text{Precision (\%)} = \frac{TP}{TP + FP} \times 100 \tag{12}$$

Recall is the proportion of actually forged frames that are correctly classified as forged. It is also termed the true positive rate (TPR) and is given by Eq. (13):

$$\text{Recall (\%)} = \frac{TP}{TP + FN} \times 100 \tag{13}$$
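Equations (11) to (13) translate directly into code. The sketch below evaluates them from the four confusion matrix counts; the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch, since the text does not cover that case.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) in percent, per Eqs. (11)-(13)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified frame is required")
    accuracy = 100.0 * (tp + tn) / total
    # Undefined ratios are reported as 0.0 (an assumption of this sketch).
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative counts only, not results from the paper.
accuracy, precision, recall = classification_metrics(tp=8, tn=10, fp=2, fn=0)
```

With these illustrative counts, 18 of 20 frames are classified correctly, 8 of the 10 frames flagged as forged are truly forged, and all 8 forged frames are found.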

5.4. Performance of proposed technique on different video categories

In this section, experiments are performed on the original and forged videos of the proposed dataset to demonstrate the efficacy of the proposed technique. Features are first extracted using Hu moments. The feature vectors of the frames are then compared with each other to identify duplicate frames. These features make the proposed technique efficient and reliable, and it achieves higher accuracy on higher-resolution videos than other state-of-the-art techniques. The results obtained are promising. The performance of the proposed technique in frame duplication forgery detection is evaluated in terms of accuracy, precision and recall, using the videos of the proposed dataset. Figure 9 illustrates the accuracy, precision and recall of the Hu moment based technique for the different video categories: Fig. 9(a) shows the results on the ASUS videos, while Fig. 9(b) shows the results on the VIVO videos. For the ASUS PDAF camera, forgery detection in outdoor videos reaches the highest precision, up to 94.84%, and indoor videos also show high precision and recall. The TPR is highest for outdoor videos, slightly better than for nature videos, whereas the FPR for indoor videos is much better than for nature videos. Recall for people videos is also better than for the other categories. The proposed technique has been applied to all original and tampered videos to determine whether or not each video has been tampered with, and the results are shown in Fig. 9.

Fig. 9.

Dataset evaluation.

5.5. Comparison with state-of-the-art techniques

To determine its effectiveness, the proposed technique is compared with equivalent existing state-of-the-art techniques in terms of detection accuracy and execution time. While certain techniques have reasonable accuracy, the proposed technique is also computationally efficient. Table 6 compares the proposed technique with the existing techniques [6,12,19,22,23,25,41,51,54,56,57] in terms of average execution time, accuracy, localization and resolution of the test videos. The proposed technique takes less computation time than the existing techniques for which execution times are reported. Because of its optical flow estimation, the method in [51] takes the longest computation time. The methods in [22,23] do not identify the location of the forgery, whereas the proposed method returns the exact location of the forged frames. Furthermore, the resolution of the videos used to test the proposed technique is higher than that of the datasets used by the existing state-of-the-art techniques.

Table 6
Comparison with existing state-of-the-art techniques

Method	Execution Time (sec.)	Accuracy	Localization	Resolution of Test Videos
Lin et al. [23]	199.23	Low	No	720 × 480
Lin et al. [22]	–	Low	No	720 × 480
Liao et al. [19]	265.11	High	Yes	640 × 480
Wang et al. [51]	95182.69	Medium	Yes	720 × 576
Wu et al. [54]	–	Medium	No	720 × 576
Singh et al. [41]	337.01	Medium	Yes	320 × 240, 1080 × 1920
Yang et al. [56]	216.29	Medium	Yes	320 × 240, 720 × 576
Liu et al. [25]	447.6	High	Yes	320 × 240, 640 × 480
Zhao et al. [57]	337	High	Yes	320 × 240, 1920 × 1080
Bakas et al. [6]	–	High	Yes	320 × 240, 352 × 288
Proposed	187.67	High	Yes	1920 × 1080, 3840 × 2160

6. Conclusion

With the growth of video editing software and the increasing dependence on multimedia content, video forgery detection has become a necessity. This paper has proposed a technique for the detection of frame duplication forgery in videos. The features of each video frame are extracted by Hu moments on the basis of gray values and are then evaluated to identify and locate tampering in the video. Further, the paper proposes a new VLFD dataset containing 210 native videos, introduced as a valuable resource for evaluating forgery detection techniques. Phase detection autofocus (PDAF) camera technology has been used to acquire the dataset videos, which were recorded with or without a tripod. A detailed quantitative analysis of the proposed dataset in comparison with existing datasets has also been presented. Finally, the proposed dataset has been evaluated using the proposed technique. The proposed technique has been validated on the dataset and compares favourably with the existing state-of-the-art techniques, achieving high accuracy with less computation time. This paper may help future researchers to propose and evaluate new techniques in video forensics. The dataset has been created as an initial repository for manipulated videos and can be enhanced with more types of forgeries in the future. Anti-forensic counterparts for the proposed technique can also be explored in the future.

References

1. O.I. Al-Sanjary, A.A. Ahmed and Sulong, Development of a video tampering dataset for forensic investigation, Forensic science international 266 (2016), 565–572. doi:10.1016/j.forsciint.2016.07.013.

2. O.I. Al-Sanjary, A.A. Ahmed and Sulong, Development of a Video Tampering Dataset VTD, Youtube, https://www.youtube.com/channel/UCZuuu-iyZvPptbIUHT9tMrA.

3. O.I. Al-Sanjary and Sulong, Detection of video forgery: A review of literature, Journal of Theoretical & Applied Information Technology 74(2) (2015).

4. Aparicio-Díaz, Cumplido, M.L. Pérez Gort and Feregrino-Uribe, Temporal copy-move forgery detection and localization using block correlation matrix, Journal of Intelligent & Fuzzy Systems 36(5) (2019), 5023–5035. doi:10.3233/JIFS-179048.

5. Ardizzone, Bruno and Mazzola, Copy-move forgery detection by matching triangles of keypoints, IEEE Transactions on Information Forensics and Security 10(10) (2015), 2084–2094. doi:10.1109/TIFS.2015.2445742.

6. Bakas, Naskar and Dixit, Detection and localization of inter-frame video forgeries based on inconsistency in correlation distribution between Haralick coded frames, Multimedia Tools and Applications 78(4) (2019), 4905–4935. doi:10.1007/s11042-018-6570-8.

7. Chen, Tan, Li and Huang, Automatic detection of object-based forgery in advanced video, IEEE Transactions on Circuits and Systems for Video Technology 26(11) (2015), 2138–2151. doi:10.1109/TCSVT.2015.2473436.

8. Cozzolino, Poggi and Verdoliva, Copy-move forgery detection based on patchmatch, in: 2014 IEEE International Conference on Image Processing (ICIP), IEEE, 2014, pp. 5312–5316. doi:10.1109/ICIP.2014.7026075.

9. D’Amiano, Cozzolino, Poggi and Verdoliva, Video forgery detection and localization based on 3D patchmatch, in: 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, 2015, pp. 1–6.

10. Deep Kaur and Kanwal, An analysis of image forgery detection techniques, Statistics, Optimization & Information Computing 7(2) (2019), 486–500.

11. Dong, Yang and Zhu, A MCEA based passive forensics scheme for detecting frame-based video tampering, Digital Investigation 9(2) (2012), 151–159. doi:10.1016/j.diin.2012.07.002.

12. S.M. Fadl, Han and Li, Inter-frame forgery detection based on differential energy of residue, IET Image Processing 13(3) (2018), 522–528. doi:10.1049/iet-ipr.2018.5068.

13. N.K. Gill, Garg and E.A. Doegar, A review paper on digital image forgery detection techniques, in: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), IEEE, 2017, pp. 1–7. doi:10.1109/ICCCNT.2017.8203904.

14. C.-C. Hsu, T.-Y. Hung, C.-W. Lin and C.-T. Hsu, Video forgery detection using correlation of noise residue, in: 2008 IEEE 10th Workshop on Multimedia Signal Processing, IEEE, 2008, pp. 170–174.

15. M.-K. Hu, Visual pattern recognition by moment invariants, IRE transactions on information theory 8(2) (1962), 179–187. doi:10.1109/TIT.1962.1057692.

16. Johnston and Elyan, A review of digital video tampering: From simple editing to full synthesis, Digital Investigation (2019).

17. Kanwal, Girdhar, Kaur and J.S. Bhullar, Digital image splicing detection technique using optimal threshold based local ternary pattern, Multimedia Tools and Applications (2020), 1–18.

18. Li, Wang, Zhang, Yang and Hu, Detecting removed object from video with stationary background, in: International Workshop on Digital Watermarking, Springer, 2012, pp. 242–252.

19. S.-Y. Liao and T.-Q. Huang, Video copy-move forgery detection and localization based on Tamura texture features, in: 2013 6th International Congress on Image and Signal Processing (CISP), Vol. 2, IEEE, 2013, pp. 864–868. doi:10.1109/CISP.2013.6745286.

20. C.-S. Lin and J.-J. Tsay, A passive approach for effective detection and localization of region-level video forgery with spatio-temporal coherence analysis, Digital Investigation 11(2) (2014), 120–140. doi:10.1016/j.diin.2014.03.016.

21. C.-Y. Lin and S.-F. Chang, Generating robust digital signature for image/video authentication, in: Multimedia and Security Workshop at ACM Multimedia, Vol. 98, Citeseer, 1998, pp. 49–54.

22. G.-S. Lin and J.-F. Chang, Detection of frame duplication forgery in videos based on spatial and temporal analysis, International Journal of Pattern Recognition and Artificial Intelligence 26(07) (2012), 1250017. doi:10.1142/S0218001412500176.

23. G.-S. Lin, J.-F. Chang and C.-H. Chuang, Detecting frame duplication based on spatial and temporal analyses, in: 2011 6th International Conference on Computer Science & Education (ICCSE), IEEE, 2011, pp. 1396–1399.

24. Liu, Li and Bian, Detecting frame deletion in H.264 video, in: International Conference on Information Security Practice and Experience, Springer, 2014, pp. 262–270. doi:10.1007/978-3-319-06320-1_20.

25. Liu and Huang, Exposing video inter-frame forgery by Zernike opponent chromaticity moments and coarseness analysis, Multimedia Systems 23(2) (2017), 223–238. doi:10.1007/s00530-015-0478-1.

26. Mahmood, Nawaz, Irtaza, Ashraf, Shah and M.T. Mahmood, Copy-move forgery detection technique for forensic analysis in digital images, Mathematical Problems in Engineering 2016 (2016).

27. Mahmood, Nawaz, Shah, Khan, Ashraf and H.A. Habib, Copy-move forgery detection technique based on DWT and Hu Moments, International Journal of Computer Science and Information Security (IJCSIS) 14(5) (2016).

28. Otiniano-Rodríguez, Cámara-Chávez and Menotti, Hu and Zernike moments for sign language recognition, in: Proceedings of International Conference on Image Processing, Computer Vision, and Pattern Recognition, 2012, pp. 1–5.

29. H.D. Panchal and H.B. Shah, Video tampering dataset development in temporal domain for video forgery authentication, Multimedia Tools and Applications (2020), 1–25.

30. V.M. Potdar, Han and Chang, A survey of digital image watermarking techniques, in: INDIN’05. 2005 3rd IEEE International Conference on Industrial Informatics, IEEE, 2005, pp. 709–716. doi:10.1109/INDIN.2005.1560462.

31. Qadir, Yahaya and A.T.S. Ho, Surrey University Library for Forensic Analysis (SULFA) of video content, 2012, http://sulfa.cs.surrey.ac.uk/.

32. Shanableh, Detection of frame deletion for digital video forensics, Digital Investigation 10(4) (2013), 350–360. doi:10.1016/j.diin.2013.10.004.

33. Sharma and Kanwal, Multimedia forensics: Analysis, classification and future directions, Wireless Communication and Mathematics (2019), 131.

34. Sharma, Kanwal and R.S. Batth, An ontology of digital video forensics: Classification, research gaps & datasets, in: 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), IEEE, 2019, pp. 485–491. doi:10.1109/ICCIKE47802.2019.9004331.

35. Shullani, Fontani, Iuliani, Al Shaya and Piva, VISION: A video and image dataset for source identification, 2017, https://lesc.dinfo.unifi.it/en/datasets.

36. Shullani, Fontani, Iuliani, O.A. Shaya and Piva, Vision: A video and image dataset for source identification, EURASIP Journal on Information Security 2017(1) (2017), 15. doi:10.1186/s13635-017-0067-2.

37. R.D. Singh and Aggarwal, Detection of upscale-crop and splicing for digital video authentication, Digital Investigation 21 (2017), 31–52. doi:10.1016/j.diin.2017.01.001.

38. R.D. Singh and Aggarwal, Detection and localization of copy-paste forgeries in digital videos, Forensic science international 281 (2017), 75–91. doi:10.1016/j.forsciint.2017.10.028.

39. R.D. Singh and Aggarwal, Video content authentication techniques: A comprehensive survey, Multimedia Systems 24(2) (2018), 211–240. doi:10.1007/s00530-017-0538-9.

40. T.R. Singh, K.M. Singh and Roy, Video watermarking scheme based on visual cryptography and scene change detection, AEU-International Journal of Electronics and Communications 67(8) (2013), 645–651. doi:10.1016/j.aeue.2013.01.008.

41. V.K. Singh, Pant and R.C. Tripathi, Detection of frame duplication type of forgery in digital video using sub-block based features, in: International Conference on Digital Forensics and Cyber Crime, Springer, 2015, pp. 29–38. doi:10.1007/978-3-319-25512-5_3.

42. Sitara and B.M. Mehtre, Digital video tampering detection: An overview of passive techniques, Digital Investigation 18 (2016), 8–22. doi:10.1016/j.diin.2016.06.003.

43. Sowmya, Chennamma and Rangarajan, Video authentication using spatio temporal relationship for tampering detection, Journal of Information Security and Applications 41 (2018), 159–169. doi:10.1016/j.jisa.2018.07.002.

44. Su, Huang and Yang, A video forgery detection algorithm based on compressive sensing, Multimedia Tools and Applications 74(17) (2015), 6641–6656. doi:10.1007/s11042-014-1915-4.

45. Su, Nie and Zhang, A frame tampering detection algorithm for MPEG videos, in: 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, Vol. 2, IEEE, 2011, pp. 461–464. doi:10.1109/ITAIC.2011.6030373.

46. Su, Zhang and Liu, Exposing digital video forgery by detecting motion-compensated edge artifact, in: 2009 International Conference on Computational Intelligence and Software Engineering, IEEE, 2009, pp. 1–4.

47. Tao, Jia and You, Review of passive-blind detection in digital video forgery based on sensing and imaging techniques, in: International Conference on Optoelectronics and Microelectronics Technology and Application, Vol. 10244, International Society for Optics and Photonics, 2017, p. 102441C.

48. Wang, Li, Zhang and Ma, Video inter-frame forgery identification based on consistency of correlation coefficients of gray values, Journal of Computer and Communications 2(04) (2014), 51. doi:10.4236/jcc.2014.24008.

49. Wang and Farid, Exposing digital forgeries in video by detecting duplication, in: Proceedings of the 9th Workshop on Multimedia & Security, ACM, 2007, pp. 35–42. doi:10.1145/1288869.1288876.

50. Wang and Farid, Exposing digital forgeries in interlaced and deinterlaced video, IEEE Transactions on Information Forensics and Security 2(3) (2007), 438–449. doi:10.1109/TIFS.2007.902661.

51. Wang, Jiang, Wang, Wan and Sun, Identifying video forgery process using optical flow, in: International Workshop on Digital Watermarking, Springer, 2013, pp. 244–257.

52. Wang, Xue, Zheng, Liu and Li, Image forensic signature for content authenticity analysis, Journal of Visual Communication and Image Representation 23(5) (2012), 782–797. doi:10.1016/j.jvcir.2012.03.005.

53. Wary and Neelima, A review on robust video copy detection, International Journal of Multimedia Information Retrieval 8(2) (2019), 61–78. doi:10.1007/s13735-018-0159-x.

54. Wu, Jiang, Sun and Wang, Exposing video inter-frame forgery based on velocity field consistency, in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014, pp. 2674–2678. doi:10.1109/ICASSP.2014.6854085.

55. Yan, Mei and Chunqin, SAR image target recognition based on Hu invariant moments and SVM, in: 2009 Fifth International Conference on Information Assurance and Security, Vol. 1, IEEE, 2009, pp. 585–588. doi:10.1109/IAS.2009.289.

56. Yang, Huang and Su, Using similarity analysis to detect frame duplication forgery in videos, Multimedia Tools and Applications 75(4) (2016), 1793–1811. doi:10.1007/s11042-014-2374-7.

57. D.-N. Zhao, R.-K. Wang and Z.-M. Lu, Inter-frame passive-blind forgery detection for video shot based on similarity analysis, Multimedia Tools and Applications 77(19) (2018), 25389–25408. doi:10.1007/s11042-018-5791-1.
