Content relative thresholding technique for key frame extraction

Abstract

The growth in communication methods has motivated a large number of users to migrate from existing communication methods towards video-based communication. Video has thus become the basic communication medium in fields such as distance education, business, physical security monitoring, and news and media. Summarizing such video requires extracting its key components in order to reduce the size of the data without losing information. This process is called key frame extraction. Recognizing its importance, several parallel research attempts have tried to balance the competing demands of information loss and size reduction. Nevertheless, these processes have been criticised for high time complexity and, in some cases, for information loss. The issue with the standard or parallel methods is that they extract either too many or too few key frames, which results in large output size or high information loss, respectively. This work therefore aims to provide a novel key frame extraction process that uses image metadata together with an adaptive thresholding method. The work demonstrates a nearly 50% reduction in time complexity with 100% accuracy of the key frame extraction process, and a nearly 30% reduction in key frame replication.

Keywords

Key frame extraction, automated framework, data replication control, reduced time complexity, video stabilization

1. Introduction

A key frame extracted from a video can represent a large amount of information, since key frames provide suitable metadata for indexing, browsing and retrieval. The review by Aigrain et al. [12] presents significant proofs of concept demonstrating the benefits of key frame extraction for various video processing and information extraction methods. The notable work by Zhang et al. [6] also shows that key frame extraction and key frame-based indexing, searching and retrieval can be faster for any video dataset, because search methods can quickly locate the desired content by analysing only the key frames. No single study has predicted the exact number of key frames to be extracted based on the length of the video, because the optimal number of key frames depends on the density of the information present in the video. The information density can be estimated from the supported audio codec or from the colour variations between the frames of the video. A high-density video naturally carries more information than a less dense one. Thus, information density becomes the most prominent basis for key frame extraction. Zhang et al. [7] were the first to extract key frames based on the colour histogram difference between the first key frame and the subsequent frames. This idea was widely accepted because it accumulates the effects of object motion and camera motion. The proposal was enhanced by Gunsel and Tekalp [2], who improved the threshold-based key frame selection method and demonstrated significant improvements. Another direction of the same research is to cluster the frames based on colour depth and then extract the significant frame from each cluster in order to formulate the key frame sequence. This approach was first proposed by Hanjalic and Zhang [1].
Nevertheless, the selection of the significant frame for each cluster was criticised by various parallel studies and was met with a modified approach that selects the significant frame by trial. Zhuang et al. [21] propose a method that selects the significant frame for each cluster by first assuming the centroid frame and then adjusting the choice by comparing threshold values with other frames. The end result of this process is a key frame with the actual mid threshold value. Various other research attempts use a motion metric to detect key frames at the local minima of the optical flow as a function of time. The work by Wolf [17] demonstrates key frame selection using motion analysis for the first time. This work was motivated by Gresle and Huang [13], who proposed a key frame selection algorithm based on analysing the activity in each frame.

The key frame extraction process has also been approached from a different dimension by various researchers, whose studies focus on the specifications and benefits of the capture device. The method demonstrated by Liu and Zhao [4] utilizes I frames, P frames and B frames for extracting the key frames and demonstrated higher robustness. Another approach detects key frames based on a predefined summary of similar images, as demonstrated by Sujatha [14]. The most recent outcome from the parallel research is the extraction of key frames based on perceived motion energy by Liu et al. [15]. The work of Liu and Zhao [4] and the work of Liu et al. [15] were analysed, and an attempt to combine the two approaches was made, by Gargi et al. [16]. Hence, no single framework provides the optimal solution to the key frame extraction problem [18, 19, 20]. Thus, this work proposes a framework for video key frame extraction.

The rest of the work is organised as follows. The outcomes of the parallel research are compared and analysed in Section 2. The statistics behind the motivation of the research are furnished in terms of consumer video statistics in Section 3. The proposed framework for key frame extraction is elaborated in Section 4, and the algorithm deployed on the framework is presented in Section 5. The mathematical model is formulated in Section 6. The dataset used to establish the proposed method is analysed in Section 7. The proposed method is compared with the existing frameworks based on architecture, time complexity, network load and data replication control in Section 8. The results obtained from this method are elaborated and analysed in Section 9. The conclusion of the work is presented in Section 10.

2. Parallel research outcomes

Many algorithms have been proposed for key frame extraction from videos. These traditional algorithms combine key frame identification with object detection to guarantee that the extracted key frames contain valid information about the specified objects.

In general, the key frame extraction process is done in the following ways.

2.1 Sample frames

A sample frame is a frame used as a reference to extract relevant key frames from a motion video. Identifying such sample frames is simple, but algorithms that use this technique do not effectively represent some significant contents and semantic patterns in a frame.

2.2 Curvature interpretation

Some image processing techniques represent each frame as a group of points in the feature space. These points are connected sequentially to form an arbitrarily shaped curve. The traditional approaches use sequential information for curve simplification and key frame extraction. The main limitation of these approaches is the optimization of the curve representation. Different metrics have also been proposed for curve simplification by the discrete shape evolution algorithms.

2.3 Clustering

Clustering gathers similar objects into groups. Clustering algorithms are used in image and video retrieval to identify key frames: the frames closest to the cluster centres are taken as the key frames. The main advantage of clustering-based algorithms is that the extracted key frames inherit the global characteristics of the video.

Even though clustering techniques perform better in key frame extraction, they have limitations in attaining semantically meaningful clusters, especially given the high dimensionality of the data and the sequential nature of video frames.

This work performs $k$-means clustering, a technique for vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. $k$-means clustering aims to partition $n$ observations into $k$ clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, via an iterative refinement approach employed by both $k$-means and Gaussian mixture modelling. Both use cluster centres to model the data; however, $k$-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
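To illustrate the clustering idea, the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It assumes each frame has already been reduced to a feature vector (for example a colour histogram; the choice of feature is an assumption), runs the standard iterative refinement, and returns the frames closest to the cluster centres. The function name `kmeans_key_frames` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_key_frames(features, k, iters=50, seed=0):
    """Return sorted indices of the frames closest to each k-means centroid.

    features: (n_frames, n_dims) array, one feature vector per frame.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid for every frame.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # The key frame of a cluster is the frame nearest to its centre.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return sorted({int(i) for i in dists.argmin(axis=0)})
```

Because the heuristic only converges to a local optimum, different seeds may give different key frames on real data; two clusters may also share the same nearest frame, in which case fewer than `k` indices are returned.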

2.4 Sequential vs. global characteristic comparisons

Recent works on video processing focus on sequential and global characteristics to distinguish key frames among the frames extracted from a video. In sequential comparison, the current key frame is compared with each subsequent frame in turn until a frame that is very different from the key frame is found. This new frame is selected as the next key frame.

The merits of sequential comparison algorithms include their simplicity, intuitiveness and low computational complexity. However, these algorithms are less widely used because the extracted key frames represent only local properties of the video rather than its global properties, so there is a high chance of data replication.
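The sequential comparison described above can be sketched as follows. This is an illustrative example only: the intensity histogram, the L1 distance and the `threshold` value are assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalised intensity histogram of a frame with values in [0, 255]."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def sequential_key_frames(frames, threshold=0.5):
    """Select a new key frame whenever a frame differs enough from the last key frame.

    frames: non-empty sequence of 2-D intensity arrays.
    """
    key_indices = [0]                       # the first frame seeds the sequence
    reference = gray_histogram(frames[0])
    for idx in range(1, len(frames)):
        hist = gray_histogram(frames[idx])
        # The L1 distance between two normalised histograms lies in [0, 2].
        if np.abs(hist - reference).sum() > threshold:
            key_indices.append(idx)
            reference = hist                # later frames are compared with the new key frame
    return key_indices
```

Each decision depends only on the most recent key frame, which is why a scene that recurs later in the video is selected again and replication can occur.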

On the other hand, some algorithms use the global characteristics of the video by minimizing a predefined objective function that depends on the application.

The characteristics considered by global comparison algorithms include temporal variance, coverage, correlation, and reconstruction error.

2.5 Temporal variance

Variance is an important parameter to consider while extracting a key frame from each video segment. The temporal variance between successive extracted key frames should be equal. The objective function can be chosen as the sum of the differences between the temporal variances of all the segments. The temporal variance in a video can be approximated by the cumulative change of content across consecutive frames in the segment, or by the difference between the first and last frames in the segment.
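One simple way to approximate this criterion is sketched below, under stated assumptions: the change between consecutive frames is taken as the mean absolute pixel difference, and one key frame is placed at the centre of each equal share of the cumulative change. Both choices, and the name `equal_change_key_frames`, are hypothetical and serve only as an illustration.

```python
import numpy as np

def equal_change_key_frames(frames, n_key):
    """Place n_key key frames so each covers an equal share of cumulative change.

    frames: array of shape (n_frames, height, width).
    """
    frames = np.asarray(frames, dtype=float)
    # Content change between consecutive frames (mean absolute difference).
    diffs = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1).mean(axis=1)
    # Cumulative change observed up to and including each frame.
    cumulative = np.concatenate([[0.0], np.cumsum(diffs)])
    # Targets sit at the centre of n_key equal shares of the total change.
    targets = (np.arange(n_key) + 0.5) * cumulative[-1] / n_key
    # First frame at which the cumulative change reaches each target.
    return [int(np.searchsorted(cumulative, t)) for t in targets]
```

On a video with long static stretches this places the key frames where the content actually changes rather than at uniform time intervals.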

2.6 Maximum coverage

Capturing objects in a motion video is a complex task, and key frames offer a compact representation for object identification. There is no fixed limit on the number of key frames to be extracted. Traditional algorithms minimize the number of key frames subject to a predefined reliability criterion. These algorithms extract key frames by maximizing their representation coverage, which is the number of frames that the key frames can represent.

2.7 Minimum correlation

A few algorithms minimize the sum of correlations between key frames, making the key frames as uncorrelated with each other as possible. For instance, the frames and their correlations can be represented using a directed weighted graph. The shortest path in the graph is found, and the vertices on the shortest path, which correspond to the minimum correlation between frames, are designated as the key frames.

2.8 Minimum reconstruction error

Another set of algorithms extracts key frames so as to minimize the sum of the differences between each frame and its corresponding predicted frame, reconstructed from the set of key frames using interpolation. The sets of key frames extracted by considering global characteristics are more concise and less redundant than those produced by the sequential comparison-based algorithms. The limitation of the global comparison-based algorithms is that they are more computationally expensive than the sequential comparison-based algorithms.
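The reconstruction error of a candidate key frame set can be evaluated with a short sketch such as the one below. It assumes linear interpolation over time between neighbouring key frames and an absolute-difference error; these choices and the function name are illustrative assumptions, not taken from the cited methods.

```python
import numpy as np

def reconstruction_error(frames, key_indices):
    """Sum of absolute differences between each frame and its prediction
    interpolated linearly (over time) from the key frames."""
    frames = np.asarray(frames, dtype=float)
    keys = sorted(key_indices)
    times = np.arange(len(frames))
    flat = frames.reshape(len(frames), -1)
    predicted = np.empty_like(flat)
    # Interpolate every pixel position independently; frames outside the
    # key frame range take the value of the nearest key frame.
    for d in range(flat.shape[1]):
        predicted[:, d] = np.interp(times, keys, flat[keys, d])
    return float(np.abs(flat - predicted).sum())
```

A global method would search for the key frame set that minimises this value, which is where the additional computational cost arises. The per-pixel loop is kept for readability and would be slow on full-resolution frames.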

Further, the detailed analysis of the parallel research outcomes has contributed towards building guidelines for this work, identifying what should and should not be done while building a key frame extraction framework or algorithm (Table 1).

Table 1
Parallel research outcome review conclusion

Method name and author	Characteristics	Advantages	Shortcomings of the method
Clustering method Kong et al. [18]	Analysis on short boundary video	Faster processing	• Less key frame selection for single shot activity • More key frame selection for multiple shot activity
Entropy method Mentzelopoulos and Psarrou [11]	Best method for unpredictable data set	Local feature selection	• External effects such as lighting condition affects the performance
Histogram method Rasheed and Shah [22]	Similarity measure between key frames	High level segmentations	• Cannot consider the local similarities
Motion analysis method Wolf [17]	Optical flow based analysis	Faster mid-range key frame selection	• Highly depends on the static frame references
Triangle based method Liu et al. [15]	Determination of the motion characteristics	Reduces the motion effects on the video	• Cannot detect the colour based information change
3-D augmentation method Chao et al. [5]	Processing short and fast motion video data	Combines the video data into multi dimension model	• Highly time complex
Optimal key frame selection method Sze et al. [10]	Best method for continuously growing video sequence by adopting the temporary key frame	Faster processing due to probabilistic analysis	• Highly time complex
Context based method Chang et al. [8]	Best method for repetitive information contents	Generates a multilevel abstract of the information	• Information loss due to less key frame selection
Motion based extraction method Luo et al. [9]	Adopts the advantages from digital capture devices	Reduces the spatio-temporal effects	• High quality video information expected
Robust principal component analysis method Zhuang et al. [21]	Adopts the decomposition method for sparse component analysis	Analyses the frames of consumer videos with less content or rapid content shift	• Assumptions do not always lead to better results

The proposed framework can overcome the shortcomings identified, and the detailed analysis is provided in the following sections.

3. Consumer video statistics

Video data comprises content that sometimes includes people and their expressions and sometimes only objects. Hence, it is a primary task to understand the distribution of content over these two categories, as well as the content length, the motion of the objects in the video, and the motion of the capture devices. This understanding helps in building a more robust algorithm for key frame extraction (Table 2).

Table 2
Consumer video data analysis

Type of the video	Content	Average length	Camera motion
Animated	20% with people 80% without people	5 mins	95% steady
Indoor	30% without people 70% with people	15 mins	60% steady
Outdoor	50% without people 50% with people	20 mins	15% steady

These findings serve as the design guidelines for building the framework and the algorithm, which are presented in the following sections.

4. Proposed framework

The proposed framework is an automated framework for the extraction of key frames. The novelty of the framework is that it extracts a reduced number of frames with minimum time complexity. The framework also includes components for video capturing and video stabilization. The correlation agent is capable of adjusting the threshold during comparison and can learn based on the video type. The components of the proposed framework are elaborated here (Fig. 1).

Table 3
Capture library parameters

Parameter name	Parameter description
Frames per second	The number of frames in the original video from the capture source per second.
Scan system	The scan system classifies the video into two categories: • Progressive: all the scan lines of a frame are refreshed sequentially in a single pass • Interlaced: the odd and even scan lines are refreshed in alternating fields
Actual aspect ratio	The ratio between the original height and original width of the video captured from the web URL
Augmented aspect ratio	The ratio between the reduced height and reduced width of the video
BPP	Bits per pixel
Compression ratio	The ratio of the original video and the video after storage
Channel	Number of channels used to build the video sequence

Figure 1.

The proposed framework.

4.1 Capture code

The capture code agent in the framework is responsible for capturing the video from various internet sources as programmed. Firstly, this component detects the native format of the video and loads the appropriate library from the driver library associated with the capture module. Secondly, the frame size of the captured video is defined and the stream capture object is prepared. Then the native socket connection is established and the video from the source URL is captured. Before the captured video is written to the local machine, the video header with the metadata information is written to the local buffer and then stored in the local video repository.

4.2 Capture library

This static component of the framework houses the library and recommendation system for the video stabilization and key frame extraction processes. This agent uses the metadata extracted by the capture code agent to provide recommendations that reduce the overall time complexity of the algorithm. The parameters in the video library are elaborated here (Table 3).

4.3 Video stabilization

After the capturing process and the library metadata analysis of the videos, the next agent in the framework performs the video stabilization process. During video stabilization, reflexive points are extracted from the initial frames and compared with the subsequent frames. Once the initial sets of points are extracted, the relative points are measured. Based on the differences between the reflexive points and the relative points, the points are shifted to the relative average points. The result of this process is the stabilized video.
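The averaging step can be pictured with the following simplified sketch. It assumes that the frame-to-frame motion has already been estimated as a pure translation per frame, and it smooths the accumulated camera path with a moving average; the translation-only model, the window `radius` and the function name are all assumptions made for illustration, not the stabilization procedure of this framework.

```python
import numpy as np

def smoothing_corrections(shifts, radius=2):
    """Return the per-frame offsets that move a jittery camera path onto its
    moving-average path.

    shifts: (n_frames, 2) array of estimated (dx, dy) motion between frames.
    """
    shifts = np.asarray(shifts, dtype=float)
    trajectory = np.cumsum(shifts, axis=0)                    # accumulated camera path
    # Repeat the end points so the window is defined at the borders.
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode="edge")
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.column_stack([
        np.convolve(padded[:, c], kernel, mode="valid")
        for c in range(trajectory.shape[1])
    ])
    # Offset to apply to each frame so that it follows the smoothed path.
    return smoothed - trajectory
```

A larger `radius` removes more jitter but also suppresses intentional camera motion, so the window would need tuning per video type.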

4.4 Key frame extraction agent

One of the major components of this framework is the key frame extraction agent. The agent calculates the total (global) threshold of the video. The calculated threshold is then compared with the per-frame threshold and, based on the comparison, the key frame is selected.

4.5 Correlation analysis agent

The correlation agent controls replication after the key frames are extracted. The agent first converts each key frame into a binary image, resulting in a two-dimensional image. The converted images are then compared based on the correlation of the objects. Duplicated key frames are rejected, and the unique key frames are finally passed to the reporting agent.
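A minimal sketch of such a replication check is given below. It binarises each key frame at a fixed intensity level and rejects a key frame whose correlation with an already accepted key frame is too high. The binarisation `level`, the `max_corr` cut-off and the function names are hypothetical values and names chosen for illustration.

```python
import numpy as np

def binarize(frame, level=128):
    """Convert a grey-level frame into a two-dimensional binary image."""
    return (np.asarray(frame) >= level).astype(float)

def correlation(a, b):
    """Pearson correlation between two images of equal shape."""
    da, db = a.ravel() - a.mean(), b.ravel() - b.mean()
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    if denom == 0.0:
        # At least one image is constant: treat identical images as duplicates.
        return 1.0 if np.array_equal(a, b) else 0.0
    return float(da @ db / denom)

def drop_replicated(key_frames, max_corr=0.95):
    """Return the indices of key frames that are not duplicates of earlier ones."""
    kept = []
    for idx, frame in enumerate(key_frames):
        binary = binarize(frame)
        if all(correlation(binary, binarize(key_frames[j])) < max_corr for j in kept):
            kept.append(idx)
    return kept
```

The ratio of rejected to extracted key frames gives one possible replication measure for a video.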

4.6 Key frame reporting agent

The final UI agent of the framework is the visual representation of the images. The final key frames are presented to the end users.

4.7 Performance analyser

The internal framework agent called the performance analyser calculates the CPU time for each phase of the algorithm and performs the overall time complexity analysis. This analysis helps the framework to identify the agents that are underperforming, thus making the framework adaptive and continuously improving.
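The per-phase measurement can be sketched with the Python standard library as follows. The class and method names are hypothetical; `time.process_time` is used because it reports CPU time rather than wall-clock time.

```python
import time

class PerformanceAnalyser:
    """Accumulates the CPU time (in milliseconds) spent in each named phase."""

    def __init__(self):
        self.phase_ms = {}

    def run(self, phase, func, *args, **kwargs):
        """Run func, charge its CPU time to the given phase, and return its result."""
        start = time.process_time()
        result = func(*args, **kwargs)
        elapsed_ms = (time.process_time() - start) * 1000.0
        self.phase_ms[phase] = self.phase_ms.get(phase, 0.0) + elapsed_ms
        return result

    def slowest_phase(self):
        """Name of the phase with the largest accumulated CPU time, if any."""
        if not self.phase_ms:
            return None
        return max(self.phase_ms, key=self.phase_ms.get)
```

Wrapping each agent call in `run` yields a per-phase breakdown from which the underperforming agent can be identified.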

5. Proposed algorithm

The proposed algorithm is the prime component of the proposed framework and is furnished here.

Step-1.	Read frames from a web URL
Step-2.	Collect salient points from each frame
Step-3.	Select correspondences between points
Step-4.	Estimate the transform from noisy correspondences
Step-5.	Approximate and smooth the transform
Step-6.	Present the final (stabilized) video
Step-7.	Image metadata extraction
a.	Extract frames per second
b.	Extract scan system
c.	Extract actual aspect ratio
d.	Extract augmented aspect ratio
e.	Extract BPP
f.	Extract compression ratio
g.	Extract channel
h.	Calculate the global threshold of the video
i.	Extract the frames from the video
Step-8.	Compare each frame threshold with the global threshold
Step-9.	IF the frame threshold is greater than the global threshold
a.	Then accept the frame as a key frame
Step-10.	Else
a.	Reject the frame
Step-11.	Convert the accepted key frames into two-dimensional images
Step-12.	Calculate the correlations between all key frames
a.	Calculate the replication factor
Step-13.	Present the final key frames
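Steps 7h to 10 can be sketched as follows. The algorithm does not state how the per-frame value or the global threshold is computed, so this sketch makes two explicit assumptions: the per-frame value is the summed absolute difference from the previous frame, and the global threshold is the mean of these values plus `k` standard deviations. Both assumptions, the default `k=2.0` and the function names are hypothetical.

```python
import numpy as np

def frame_scores(frames):
    """Per-frame value: summed absolute difference from the previous frame.

    The first frame has no predecessor and receives a score of zero.
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1).sum(axis=1)
    return np.concatenate([[0.0], diffs])

def select_key_frames(frames, k=2.0):
    """Accept a frame as a key frame when its score exceeds the global threshold.

    Assumed global threshold: mean + k * standard deviation of all scores.
    """
    scores = frame_scores(frames)
    global_threshold = scores.mean() + k * scores.std()
    accepted = [i for i, score in enumerate(scores) if score > global_threshold]
    return accepted, float(global_threshold)
```

Because the threshold is derived from the statistics of the video itself, a video with frequent large changes raises its own threshold, which is the sense in which the selection is relative to the content.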

6. Mathematical model for the solution

In this section, the mathematical model for the solution is formulated.

Firstly, the video signal is considered to be the input for this process and is represented as

$\displaystyle v=\int^{t=n}_{t=0}f(t)\,dt$ (1)

where $n$ denotes the duration of the captured video and $f(t)$ is the collaboration function of the images.

Naturally, the image collaboration function is a general representation of the image sequence, thus

$\displaystyle f(t)=\sum^{t=n}_{t=0}I(t)$ (2)

The improvement demonstrated in this work, compared with the existing key frame extraction methods, is the consideration of both local and global thresholds for the identification of key frames. This work proposes to calculate the global threshold in the usual manner and to calculate a local threshold for each class (group or cluster) of images.

In order to calculate the local threshold, it is important to extract frames from the video signal and classify the images or frames.

$\displaystyle I=\frac{I(t)}{\Delta(t)}$ (3)

Here, $\Delta(t)$ denotes the pre-decided fraction of time and $I$ denotes any extracted frame.

Further, the detection of objects in each frame based on the characteristics set is performed.

$\displaystyle I(obj[])=\lfloor I\rfloor_{\text{featureset}}$ (4)

And

$\displaystyle\text{featureset}[]=[c,i,r,a]$ (5)

Here, $c$ denotes the colour range; $i$ denotes the intensity; $r$ denotes the region; $a$ denotes the area of the object.

After the calculation of the object sets for each image frame, this work calculates the number of objects and subsequently labels the objects for each class.

$\displaystyle\text{Class}(I)=\frac{I(obj[])}{\text{featureset}(n)}$ (6)

Hence, any image class for image $I$ will be based on the value proposition of the feature set parameter values.

Once the clustering of the images is completed, this work calculates the local threshold for each class and the global threshold for the complete video.

The local threshold is denoted by $g$,

$\displaystyle g=\frac{\text{Class}(I)}{\Delta\text{featureset}(n)\cdot\det(I)}$ (7)

Here, it is important to understand that the median set of values for each parameter in the feature set is the realization factor of the threshold.

Thus, the global threshold is denoted by $G$,

$\displaystyle G=\sum^{k=n}_{k=1}\int\text{Class}(I)\bigg/\oint_{I}g(I)$ (8)

Here, the natural combination of all the image classes and the natural distribution of the local thresholds are considered.

Finally, the key frame extraction process is relatively adaptive and based on the content, which separates the images into various clusters or classes.

$\displaystyle I^{\prime}=\frac{\delta\frac{\delta I(t)}{\delta g}y}{\delta G}$ (9)

The complete process is also illustrated schematically (Fig. 2).

7. Dataset analysis

This framework and the algorithm are rigorously tested on the following data (Table 4).

Table 4
Analysis of the testing dataset

Category name	Number of videos in the category	Average duration (min)	Capture frames per second
Animated	85	2.90	24
Indoor	94	6.46	24
Outdoor	122	3.19	24

Figure 2.

Schematic process of key frame extraction.

The dataset comprises three categories, each containing multiple videos of different lengths. All videos are captured at 24 FPS.

The results of the algorithm and the framework execution on the mentioned dataset are furnished in the further section of this work.

8. Comparative analysis

In order to understand the performance of the proposed framework and its advancements over the other parallel research outcomes, this work compares the proposed framework with the parallel outcomes on the basis of framework architecture, time complexity, network load, and data replication control.

8.1 Framework comparison

Firstly, the framework architecture is analysed in terms of the drawbacks identified from the parallel research outcome analysis. The shortcomings are then set against the advantages of this framework (Table 5).

Table 5
Deep analysis on the proposed framework for bottlenecks

Shortcomings of the existing methods	Advantages of the proposed framework
Less key frame selection for single shot activity; more key frame selection for multiple shot activity	The global threshold analysis of the video summarizes the overall threshold and thus makes a normalized threshold point to reduce the effect of low or high frame selections
External effects such as lighting conditions affect the performance	The video stabilization and normalization phase in the proposed algorithm reduces the effects of external lighting.
Cannot consider the local similarities	This algorithm considers the local and global thresholds of the frames and videos, respectively, to reduce the local similarity problem.
Highly depends on the static frame references	Requires an initial frame reference only for video stabilization
Cannot detect the colour-based information change	Colour based information can be preserved
Information loss due to less key frame selection	Optimal key frame selection
High quality video information expected	This framework does not depend on the quality of the video
Assumptions are not always reflecting better results	This framework does not rely on any assumptions

8.2 Time complexity analysis

The time complexity is a key factor of evaluation for any key frame extraction process, and the average times for each video category are analysed here (Table 6).

Table 6
Time comparison analysis

Model name	Average time complexity analysis (MSec)
	Animated video	Indoor video	Outdoor video
Clustering method	995	1698	703
Entropy method	893	1221	644
Histogram method	1258	1713	733
Proposed threshold based method	894	1519	695

Hence, the proposed algorithm is faster than the clustering and histogram methods in all three categories, although the entropy method remains faster.

8.3 Network load analysis

The simulation of the framework is carried out on a standalone system without network components. However, the key frames extracted from the video are transmitted over the network and generate network load. The generated load allows the network load or activity to be analysed (Table 7).

Table 7
Network load analysis

Video category	Number of frames (A)	Number of key frames (B)	Total size (MB) (C)	Network load (MBPS) ($\rm D=(C/A)*B$)
Animated	2802	81	156	4.50
Outdoor	7904	186	423	9.95
Indoor	7940	84	425	4.49

The analysis demonstrates the higher load in case of the outdoor videos. Nevertheless, for the animated and indoor videos the network load is reduced significantly.
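The network load column follows directly from the formula in the table header, D = (C/A) * B, that is, the average size per frame multiplied by the number of key frames transmitted. The short sketch below recomputes it from the values in Table 7; the function and variable names are illustrative.

```python
def network_load(total_frames, key_frames, total_size_mb):
    """D = (C / A) * B: average size per frame times the number of key frames."""
    return total_size_mb / total_frames * key_frames

# (A, B, C) for each category, taken from Table 7.
table_7 = {
    "Animated": (2802, 81, 156),
    "Outdoor": (7904, 186, 423),
    "Indoor": (7940, 84, 425),
}
loads = {name: network_load(*row) for name, row in table_7.items()}
```

The recomputed values agree with the reported column to within 0.01 (approximately 4.51, 9.95 and 4.50), the small differences being consistent with truncation in the table.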

8.4 Data replication analysis

As stated in the algorithm, replication control is also taken into consideration during the key frame extraction process. This is considered one of the major outcomes of the work. The comparative analysis of data or key frame replication control is therefore furnished here (Table 8).

Table 8
Data replication analysis [16]

Model name	Average data replication (%)
	Animated video	Indoor video	Outdoor video
Clustering method	1.9	1.3	0.7
Entropy method	2.1	1.4	0.9
Histogram method	1.0	1.1	0.4
Proposed threshold based method	0.1	0	0.1

The advantages of the proposed algorithm are discussed in the prior sections of this work.

8.5 Validation of proposed algorithm on standard dataset

In order to justify the correctness of the algorithm and the improvements obtained by the proposed framework, the algorithm and the framework are tested on the standard TREC Video Retrieval Test Collection, 2015 (Table 9).

Table 9
Key frame extraction on standard dataset

Video data item	Category	Total frames	Total extracted key frames	Key frame %
IBM’s universal virtual computer and preservation manager (2008)	Educational	70	69	98.57
TLT conference UNCH machinima presentation (2008)	Educational	20	20	100

Thus, having established the improvements over the parallel research in this section, this work presents the results of the framework in the next section.

9. Results and discussion

Firstly, the frame separation process outcomes are showcased (Table 10).

Table 10
Frame separation process

Video data item	Category	Duration (Min)	Camera motion	Object motion	Total frames
Sample-1	Animated	1.56	No	Yes	2802
Sample-2	Animated	4.23	No	Yes	7904
Sample-3	Animated	4.25	No	Yes	7940
Sample-4	Outdoor	3.19	Yes	Yes	4784
Sample-5	Indoor	8.41	No	No	12973
Sample-6	Indoor	4.50	No	Yes	7221

The results are analysed visually (Fig. 3). The result demonstrates that the nature of the video is significantly reflected in the number of frames separated from the video. This improvement is due to the availability of the capture library in the framework.

Secondly, the threshold analysis results are presented (Table 11).

Table 11

Mean, std. dev and threshold analysis

Video data item	Category	Size	Mean	Std. dev.	Threshold
Sample-1	Animated	156.00	12919.75	37757.91	89436.92
Sample-2	Animated	423.00	14383.59	48845.77	106380.11
Sample-3	Animated	425.00	9418.27	31048.01	68721.09
Sample-4	Outdoor	319.00	10054.95	22034.04	62253.83
Sample-5	Indoor	841.00	11014.45	26121.14	70178.93
Sample-6	Indoor	450.00	10465.47	25766.24	67628.11

The result is analysed visually (Fig. 4).

Furthermore, the number of key frames selected is analysed (Table 12).

Table 12

Key frame selection

Video data item	Category	Total frames	Total extracted key frames	Key frame %
Sample-1	Animated	2802	81	2.8908
Sample-2	Animated	7904	186	2.3532
Sample-3	Animated	7940	84	1.0579
Sample-4	Outdoor	4784	66	1.3796
Sample-5	Indoor	12973	126	0.9712
Sample-6	Indoor	7221	84	1.1633

The results are analysed graphically (Fig. 5).
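The key frame percentage in Table 12 is the ratio of extracted key frames to total frames. As a quick arithmetic check, the following sketch recomputes the column from the table values; the names used are illustrative.

```python
def key_frame_percentage(total_frames, key_frames):
    """Share of the frames of a video that were selected as key frames, in percent."""
    return 100.0 * key_frames / total_frames

# (total frames, extracted key frames, reported key frame %) from Table 12.
table_12 = {
    "Sample-1": (2802, 81, 2.8908),
    "Sample-2": (7904, 186, 2.3532),
    "Sample-3": (7940, 84, 1.0579),
    "Sample-4": (4784, 66, 1.3796),
    "Sample-5": (12973, 126, 0.9712),
    "Sample-6": (7221, 84, 1.1633),
}
max_deviation = max(
    abs(key_frame_percentage(total, keys) - reported)
    for total, keys, reported in table_12.values()
)
```

All six recomputed percentages match the reported values to four decimal places, so in every sample fewer than 3% of the frames are retained as key frames.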

Next, the accuracy in terms of key frame selection and rejection based on correlation analysis is furnished here (Table 13).

Table 13

Key frame accuracy

Video data item	Category	Total frames	Accepted key frames	Accuracy %
Sample-1	Animated	2802	81	100
Sample-2	Animated	7904	186	100
Sample-3	Animated	7940	84	100
Sample-4	Outdoor	4784	66	100
Sample-5	Indoor	12973	126	100
Sample-6	Indoor	7221	84	100

Figure 3.

Frame separation process outcomes.

Figure 4.

Threshold analysis of the video data.

Finally, the analysis of the time complexity results is furnished (Table 14).

Table 14

Time complexity analysis

Video data item	Category	Total frames	Accepted key frames	CPU time (MSec)
Sample-1	Animated	2802	81	386.84
Sample-2	Animated	7904	186	1093.59
Sample-3	Animated	7940	84	1201.84

Figure 5.

Key frame extraction (%).

The comparative analysis demonstrates significant improvements by this work over the parallel research outcomes. With these improvements acknowledged, the final research conclusion is presented in the next section.

10. Conclusion

Information extraction from video data, or video summarization, is a highly significant process. This work evaluates the popular parallel outcomes from recent research and establishes the improvements in the proposed framework. This work demonstrates 100% accuracy of key frame extraction and reduces the time complexity. The reductions in the network load and in the replication ratio of the key frames are also among the notable outcomes of this work. The proposal of including the extraction library is incorporated into the framework as a recommender system and shows significant improvements for key frame extraction irrespective of the video length, depending instead on the content type. This is made possible by the new concept of including the metadata extraction process and incorporating that information during the frame separation process. With these satisfactory results, this work lays the foundations for further study of information extraction from key frames and supports video indexing and information retrieval processes for improving information representation in the field.

References

1. Hanjalic and Zhang H.J., An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis, IEEE Trans Circuits Syst Video Technol 9 (Dec 1999), 1280–1289.

2. Gunsel and Tekalp A.M., Content-based video abstraction, in: Proc IEEE Int Conf Image Processing, Chicago, IL, 1998, pp. 128–132.

3. Dang and Radha, RPCA-KFE: Key frame extraction for video using robust principal component analysis, IEEE Transactions on Image Processing 24(11) (November 2015).

4. Liu and Zhao, Key frame extraction from MPEG video stream, Second Symposium International Computer Science and Computational Technology (ISCSCT ’09), Huangshan, China, 26–28 Dec 2009, pp. 7–11.

5. Chao G.-C., Tsai Y.-P. and Jeng S.-K., Augmented 3-D keyframe extraction for surveillance videos, IEEE Transactions on Circuits and Systems for Video Technology 20(11) (November 2010).

6. Zhang H.J., Wang J.Y.A. and Altunbasak, Content-based video retrieval and compression: A unified solution, in: Proc IEEE Int Conf Image Processing 1 (Oct 1997), 13–16.

7. Zhang H.J., Zhong and Smoliar S.W., An integrated system for content-based video retrieval and browsing, Pattern Recognit 30(4) (1997), 643–658.

8. Chang H.S., Sull and Lee S.U., Efficient video indexing scheme for content-based retrieval, IEEE Transactions on Circuits and Systems for Video Technology 9(8) (December 1999).

9. Luo, Papin and Costello, Towards extracting semantically meaningful key frames from personal video clips: From humans to computers, IEEE Transactions on Circuits and Systems for Video Technology 19(2) (February 2009).

10. Sze K.-W., Lam K.-M. and Qiu, A new key frame representation for video segment retrieval, IEEE Transactions on Circuits and Systems for Video Technology 15(9) (September 2005).

11. Mentzelopoulos and Psarrou, KeyFrame extraction algorithm using entropy difference, Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, MIR 2004, New York, NY, USA, (October 2004), 15–16.

12. Aigrain, Zhang and Petkovic, Content-based representation and retrieval of visual media: A state-of-the-art review, Multimedia Tools Applicat 3 (1996), 179–202.

13. Gresle P.O. and Huang T.S., Gisting of video documents: A key frames selection algorithm using relative activity measure, in: Proc 2nd Int Conf Visual Information Systems, 1997, pp. 279–286.

14. Sujatha, A study on keyframe extraction methods for video summary, IEEE Conference on Computational Intelligence and Communication Networks (CICN), Gwalior, India, (October 2011), 7–9.

15. Liu, Zhang H.-J. and Q, A novel video key-frame-extraction algorithm based on perceived motion energy model, IEEE Transactions on Circuits and Systems for Video Technology 13(10) (Oct 2003).

16. Gargi, Kasturi and Strayer S.H., Performance characterization of video-shot-change detection methods, IEEE Transactions on Circuits and Systems for Video Technology 10(1) (February 2000).

17. Wolf, Key frame selection by motion analysis, in: Proc IEEE Int Conf Acoust, Speech Signal Proc 2 (May 1996), 1228–1231.

18. Kong, Chen, Ren, Qian and Liu, Particle filter-based vehicle tracking via HOG features after image stabilisation in intelligent drive system, IET Intelligent Transport Systems 13(6) (2019).

19. Zhou, Huang, Liu and Niu, Learning content-adaptive feature pooling for facial depression recognition in videos, Electronics Letters (2019).

20. Zhuang, Rui, Huang T.S. and Mehrotra, Adaptive key frame extraction using unsupervised clustering, IEEE Int Conf on Image Proc (1998).

21. Zhuang, Rui, Huang T.S. and Mehrotra, Adaptive key-frame extraction using unsupervised clustering, in: Proc IEEE Int Conf Image Processing, Chicago, IL, (Oct 1998), 866–870.

22. Rasheed and Shah, Detection and representation of scenes in videos, IEEE Transactions on Multimedia 7(6) (December 2005).
