Abstract
With advances in video technology, spherical video (360° video) recorded using an omnidirectional camera offers viewers a limitless field-of-view (FoV). However, viewers suffer from the fear of missing out (FOMO) because they can see only a particular FoV at a time. Reviewing long recorded surveillance video (e.g., footage recorded 24 hours a day) is time-consuming due to temporal and spatial redundancy. A solution to this problem is to represent the video compactly as a video synopsis by shifting objects along the time domain. Multi-camera surveillance setups create blind spots, a problem that a spherical camera solves. Therefore, in this paper, we focus on creating and visualizing the synopsis of video recorded by a spherical camera. The optimization algorithm plays a key role in condensing the recorded video. Hence, a novel spherical video synopsis optimization framework is introduced to generate compact videos that eliminate FOMO. The synopsis is generated by shifting objects along the temporal axis and displaying them simultaneously while optimizing multiple constraints. It minimizes activity loss, virtual collisions, temporal inconsistencies, and synopsis video length while preserving interactions between objects. The proposed multiobjective optimization includes a new constraint that restricts the number of objects displayed per frame, reflecting the limits of the human visual system. Direction-based visualization methods are proposed to improve the viewer’s experience without FOMO. The proposed framework is evaluated with the latest metaheuristic optimization algorithms and compared against existing video synopsis optimization algorithms. The recent metaheuristic optimization algorithms are found to minimize the chronological disorder ratio and overall virtual collision more effectively than related works on video synopsis.
Keywords
Introduction
The spherical camera records the scene as a spherical environment viewed from the center, rather than the rectangular view used in traditional videography [1]. Figure 1 shows different FoVs of the input 360° video. With the growing popularity of spherical content, end-users are provided with convenient content consumption [2]. Blind spots are a common problem in multi-camera surveillance setups [3], but spherical camera technology solves this problem by capturing the scene in a realistic and unobstructed manner. To monitor a busy city, the 360° camera must be kept stationary at a fixed spot. At the visualization end, the viewer can see all sides of the scene (i.e., a 360° view), which allows a limitless FoV (i.e., no scenes are overlooked), whereas with multi-camera surveillance, the viewer has to switch between cameras manually to see the whole scene.

Different FoVs of the input 360° video
The steps involved in 360° video processing are recording, stitching, projecting, and visualizing the scene. Internally, the omnidirectional camera comprises multiple cameras placed on a specific rig. It records the entire panoramic scene, covering all events around the camera but excluding the camera itself. Once the scene is recorded, it is stitched, either by dedicated stitching software or by the camera itself, to synchronize the events [4]. After recording and stitching, the 360° scene is projected from one form to another based on the viewer’s interest. Generally, it can be projected in three ways: panoramic, cube map, and spherical projection. In this work, equirectangular projection is used. The 360° video is immersive and can be visualized on computers, smartphones, or a Head Mounted Display (HMD). On a computer, different FoVs can be visualized by clicking and dragging across the scene with the mouse cursor. On smartphones and HMDs, the FoV follows the orientation of the device.
With the exponential growth in the number of surveillance videos, searching and retrieving content can be time-consuming. Video abstraction is a common solution to this problem by creating a concise representation of the original video. However, the object-based synopsis of the 360° video is still an unexplored area. It provides a way of creating a condensed video from the input 360° surveillance video, highlighting only the activities of the key object.
Previous works on spherical video summarization [5–9] have focused on a fixed viewing angle with at least 12 normal FoV (NFoV) glimpses, and the saliency of spherical video scenes is not taken into account. Also, the presence of an unlimited FoV leads to FOMO. This motivated us to propose a spherical video synopsis approach that relies on visualizing prominent areas with only 4 NFoV glimpses without FOMO.
The human visual system is restricted to handling a fixed number of objects, fewer than eight, at the same time [10]. In spherical surveillance video synopsis, understanding the scene becomes challenging for the viewer when an unlimited number of objects is displayed per frame [11]. Therefore, the number of objects to be displayed must be considered. Existing traditional video synopsis works have ignored this concept. This motivated us to propose a multiobjective function that restricts the number of objects visible to the viewer simultaneously.
The contributions and focus of this paper lie in the formulation of the energy minimization problem. Hence, state-of-the-art methods are employed for the pre-processing and post-processing steps of spherical video synopsis generation, such as track extraction and tube stitching. Toward this end, this work proposes a novel spherical surveillance framework for video synopsis generation. A direction-based visualization framework is also proposed, as the classical video visualization approach is inappropriate for spherical videos.
The main contributions of this work are given as follows: A novel A novel For easy understanding of the scene, a viewer determined A comprehensive performance analysis of various latest optimization algorithms with the existing video synopsis works for both
This paper is organized as follows. Section 2 reviews the related research work on traditional video synopsis and 360° video summarization. Section 3 details the proposed method, including the problem formulation, generation, and visualization of spherical synopsis videos. Section 4 presents the experimental results and performance analysis of the proposed work. Finally, the conclusions are presented in Section 5.
This section gives a brief survey of related works on traditional video synopsis and 360° video summarization.
Traditional video synopsis
Acha et al. proposed two methods for video synopsis: low-level graph optimization and an object-based approach [12]. Low-level graph optimization processes every pixel in the video. Because this method allows pixels to come from any time, the resulting problem is difficult to solve; hence, an object-based approach was introduced. Pritch et al. suggested a video synopsis method for an infinite video stream [13]. The approach has two phases: an online phase and a response phase. Pritch et al. suggested two ways to create a condensed synopsis video [14]. It uses a greedy approach to generate a synopsis video from the input. Pritch et al. introduced a clustering-based video synopsis method to search and browse surveillance video [15]. Objects with similar motions are clustered together and displayed simultaneously. Tian et al. proposed an energy minimization-based trajectory mapping method using a genetic algorithm [16]. A synopsis video is generated by classifying the object tracks into four groups. A space-time-based event rearrangement optimization algorithm was suggested by Yi et al. [17]. It frames the problem as an iterative judgment of event association. Spatial integrity is maintained by introducing an event subsection modification approach. Ruan et al. introduced an online dynamic tube rearrangement method [18]. It includes a graph coloring approach in which the relationships between object tubes are modeled using dynamic graphs. An approach to generate a synopsis of the outdoor environment was given by Ahmed et al. [19]. Objects are categorized using deep learning mechanisms and grouped into entry and exit areas. Based on the viewer’s request, the energy minimization process generates the synopsis. A hybrid of simulated annealing and teacher learning-based optimization was suggested by Ghatak et al. [20]. It provides the best global solution with less computational time and effectively solves the problem of optimal tube rearrangement.
A hybridization of simulated annealing and the grey wolf optimizer for generating video synopsis was proposed by Ghatak et al. [21]. The overall efficiency of the synopsis framework is improved by using a Generative Adversarial Network-based foreground extraction approach, namely, a multi-frame and multiscale Generative Adversarial Network. The survey of works related to traditional video synopsis is given in Table 1. Existing video synopsis methods for traditional video are surveyed in [22–24]. The literature study shows that numerous works address the generation of synopsis videos from traditional videos.
Survey on Traditional Video Synopsis
Su et al. proposed a novel Pano2Vid framework to generate natural-looking NFoV video from panoramic 360° videos [5]. For a given input 360° video, Pano2Vid generates glimpses resembling traditional viewing, with a horizontal spanning angle of 65.5°. Pano2Vid uses a data-driven approach, namely AutoCam, which uses dynamic programming to select optimal camera trajectories for generating glimpses. The work also compiles a new dataset for automatic cinematography in 360° videos. It uses 18 azimuthal and 11 polar angles. Su et al. added further capability to the Pano2Vid task by controlling the FoV dynamically using a coarse-to-fine optimization technique [6]. In this work, Pano2Vid was extended to Zoom Lens Pano2Vid with three horizontal spanning angles, {104.3°, 65.5°, 46.4°}. It uses 18 azimuthal and 11 polar angles.
Hu et al. [7] proposed a solution for automatic 360° video piloting using an online agent. Given the previously selected viewing angle and the key object, the future viewing angle is predicted using a Recurrent Neural Network (RNN). For evaluation, a 360° sports video dataset comprising five sports areas was introduced. Spatial and temporal summarization of 360° video was introduced by Yu et al. [8]. Composition View Score (CVS), a deep learning-based ranking approach, was proposed to identify which view is appropriate for a 360° video highlight. CVS generates a spherical score map of composition per video segment. The CVS model is reported to outperform existing methods on the Pano2Vid dataset. It uses 4 azimuthal and 3 polar angles.
Lee et al. presented a story-based temporal 360° video summarization [9]. It uses a novel memory network model, Past Future Memory Network (PFMN), in which the past memory stores selected sub-shots and the future memory holds possible future sub-shots. It uses 9 azimuthal and 9 polar angles.
Xu et al. suggested a deep reinforcement learning-based approach to predict viewports [25]. It extracts the spatio-temporal features of attention-related information. It provides a database of head movements in panoramic videos. The proposed method was implemented in both online and offline versions. In the online approach, the future head movement is estimated based on the current head position, whereas the offline approach determines the head movement at each spherical frame. Then, a heat map is generated to predict the final head movement.
Ban et al. predicted future viewports from the viewer’s personalized behavior and cross-viewer behavior information [26]. A Quality of Experience (QoE)-based framework was also introduced to optimize available video streaming methodologies, along with an algorithm to solve the Nondeterministic Polynomial (NP) problem at low cost. Because the saliency computed at different scales differs, Xu et al. addressed the problem of tracking large-scale dynamic spherical videos [27]. Saliency maps are computed at three spatial scales: the sub-image patch centered at the current viewpoint, the sub-image corresponding to the FoV, and the whole spherical image. The saliency maps and the corresponding images are fed into a Convolutional Neural Network (CNN) for feature extraction. Long Short-Term Memory (LSTM) is used to encode the historical viewing path. The CNN and LSTM features are then combined to predict the gaze points. The work uses 208 spherical videos with at least 31 subjects.
Pietrangelo et al. introduced an algorithm for long-term viewport prediction in spherical videos [28]. Trajectories with similar viewing experiences are grouped, and a different function is used for each group. At run time, the pre-computed functions are used. In the field of viewport prediction, Wu et al. addressed various challenges with a preference-based viewport prediction network [29]. Feature extraction from 360° video is improved by using a spherical CNN (S2CNN). An RNN is used to extract content information and exploit the spatial attention of the regions of interest to achieve multimodal data learning. Tang et al. jointly exploited the viewport trajectory obtained with multi-object tracking and the historical viewport trajectory to predict the future viewport [30]. A multi-object selection procedure was introduced because panoramic video contains multiple objects. The proposed methodology supports multiple viewport prediction, whereas existing works predicted only a single viewport.
Chen et al. suggested a neural network architecture for predicting the future head movement [31]. The video features are extracted using a CNN, and the future head movement is predicted using an LSTM. It provides better prediction accuracy considering only the video’s position data. Chao et al. proposed transformer-based viewport prediction for spherical videos [32]. The suggested approach was implemented on three widely used datasets and was observed to outperform state-of-the-art methods in terms of computational complexity. It also provides high accuracy for both long-term and short-term viewport prediction.
Li et al. used S2CNN instead of a traditional two-dimensional CNN to mitigate the weight-sharing failure caused by video projection distortion [33]. Salient spatio-temporal features are extracted using a spherical convolution-based saliency detection model. An RNN is used to predict the time series of FoV information of the video. The salient features and the time series information are then combined to predict the viewer’s future head movement. Table 2 illustrates the existing works on the summarization of 360° videos.
Survey on 360° Video Summarization
The approach proposed in this work involves the following steps: First is the mathematical formulation of the energy minimization problem. The second is the generation and visualization of spherical video summaries.
Problem formulation
A spherical video synopsis (S) is obtained by shifting the objects in the recorded spherical surveillance video (r). This work concentrates on the temporal shifting of the objects, keeping the spatial domain unchanged. Equation 1 was formulated based on the literature on traditional video synopsis, with a new constraint, namely the display cost, integrated into it. The temporal shift (M) is obtained by minimizing the following energy cost function:
Activity cost: It penalizes the activity that is missed while mapping r to S. It is zero if all the objects (O) in r are mapped to S [12].
Collision cost: It adds a penalty for the collision of two temporally shifted objects in S [12].
Temporal consistency cost: It adds a penalty for violating the chronological order of the events that are shifted in S. If there is no violation, it is zero [13].
Length cost: It penalizes a synopsis that is lengthy in comparison with r [3].
Display cost: The human visual system is restricted to handling a fixed number of objects, fewer than eight, at the same time [10]. Existing traditional video synopsis frameworks have ignored this concept. The display cost penalizes showing more objects per frame in S than the limit set for the viewer.
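As a minimal sketch of how such an objective could be evaluated, the Python snippet below assumes that the total energy is a weighted sum of the five cost terms and that the display cost counts the objects shown beyond a per-frame limit. Both forms are assumptions made for illustration, and the names `Tube`, `objects_per_frame`, `display_cost`, and `total_energy` are hypothetical rather than taken from the proposed framework.

```python
from dataclasses import dataclass


@dataclass
class Tube:
    """Hypothetical object tube: only its duration in frames matters here."""
    length: int


def objects_per_frame(tubes, shifts):
    """Count how many tubes are visible in each synopsis frame.

    shifts[i] is the synopsis frame at which tube i starts (its temporal shift).
    """
    n_frames = max(s + t.length for t, s in zip(tubes, shifts))
    counts = [0] * n_frames
    for tube, start in zip(tubes, shifts):
        for frame in range(start, start + tube.length):
            counts[frame] += 1
    return counts


def display_cost(tubes, shifts, max_objects):
    """Assumed form: total number of objects shown beyond the per-frame limit."""
    return sum(max(0, c - max_objects) for c in objects_per_frame(tubes, shifts))


def total_energy(costs, weights):
    """Assumed form: weighted sum of the individual cost terms."""
    return sum(weights[name] * costs[name] for name in weights)


# Toy example with three tubes and a limit of two objects per frame.
tubes = [Tube(4), Tube(3), Tube(5)]
shifts = [0, 1, 2]
counts = objects_per_frame(tubes, shifts)
excess = display_cost(tubes, shifts, max_objects=2)

# Weights as reported in the experimental settings of this work; the cost
# values are raw, unnormalised toy numbers chosen only for illustration.
weights = {"activity": 0.1, "collision": 0.4, "temporal": 0.2, "length": 0.1, "display": 0.2}
costs = {"activity": 0, "collision": 0, "temporal": 0, "length": len(counts), "display": excess}
energy = total_energy(costs, weights)
```

In this toy case, frames 2 and 3 each show three objects, so the display term is the only penalty besides the synopsis length. An optimizer would move the start frames in `shifts` to trade the length term against the display and collision terms.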
The pipeline for generating and visualizing spherical video synopsis consists of four phases: recording, processing, stitching, and visualization. This is shown in Fig. 2.

Phases involved in the generation and visualization of spherical synopsis video
The 360° surveillance video is recorded using a spherical camera. Here, the scene is recorded using multiple cameras which are independent of each other and are stitched together to create a spherical view. The frame of the recorded 360° surveillance video in equirectangular projection is given in Fig. 3.

Frame No: 4574 extracted from the recorded spherical surveillance video in equirectangular projection
In the processing phase, spherical background extraction, object detection, and track extraction are performed. The background of r with no moving objects is extracted using time-lapse background video generation [14], which uses a temporal histogram. Figure 4 illustrates a time-lapse background extracted from the recorded spherical surveillance video.

Illustration of a time-lapse background extracted from a recorded spherical surveillance video
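The idea of a temporal histogram can be illustrated with a small sketch. It assumes that the background value of a pixel is the centre of its most frequent intensity bin over time, which is one common reading of the technique and not necessarily the exact procedure of [14]. The function name `temporal_histogram_background` and the bin size are hypothetical.

```python
from collections import Counter


def temporal_histogram_background(frames, bin_size=8):
    """Estimate a static background from grayscale frames (lists of rows).

    For each pixel, build a histogram of its quantised intensity over time and
    return the centre of the most frequent bin.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            hist = Counter(frame[y][x] // bin_size for frame in frames)
            most_common_bin = hist.most_common(1)[0][0]
            background[y][x] = most_common_bin * bin_size + bin_size // 2
    return background


# Toy 1x3 "video": the middle pixel is briefly covered by a bright moving object.
frames = [
    [[10, 12, 11]],
    [[11, 200, 10]],
    [[10, 13, 12]],
    [[12, 12, 11]],
]
background = temporal_histogram_background(frames)
```

The bright value in the second frame falls into a bin that occurs only once, so it does not influence the estimated background.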
Object detection and track extraction are the key tasks in generating an object-based video synopsis for spherical surveillance video. Faster R-CNN [35] is used to detect moving objects in r. Figure 5 illustrates object detection performed on recorded spherical video. Once the objects are detected, the track extraction is done by using Deep SORT [36]. A visualization of the 12 object track paths denoting the starting and end point in the input video is illustrated in Fig. 6. The object tube consists of a sequence of activities performed by the respective objects throughout the recorded surveillance video. Finally, extracted tracks and the objects’ respective timestamps are placed individually in an object store.

Object detection performed on recorded spherical video

A visualization of the 12 object track paths extracted from the recorded spherical video
The interactions between two moving objects, namely O_A and O_B, in r are combined into a single tube and preserved in S. Examples of interactions include two people talking for a few minutes, a handshake, and an exchange of items. The recursive tube-grouping algorithm [37] is used to group such interactive objects.
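Grouping of interacting objects can be sketched as follows. This is not the recursive tube-grouping algorithm of [37]; it is a simplified stand-in that merges tracks with a union-find structure, under the assumed criterion that two objects interact when their centres stay within a distance threshold for a minimum number of shared frames. The names `interacts` and `group_tubes` and both thresholds are hypothetical, and the horizontal wrap-around of the equirectangular frame is ignored for brevity.

```python
from math import hypot


def interacts(track_a, track_b, max_dist=50.0, min_frames=3):
    """Assumed criterion: centres within max_dist for at least min_frames shared frames.

    Each track maps a frame number to the (x, y) centre of the object.
    """
    shared = track_a.keys() & track_b.keys()
    close = sum(
        1
        for f in shared
        if hypot(track_a[f][0] - track_b[f][0], track_a[f][1] - track_b[f][1]) <= max_dist
    )
    return close >= min_frames


def group_tubes(tracks, **criterion):
    """Merge interacting tracks into groups using a union-find structure."""
    parent = list(range(len(tracks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if interacts(tracks[i], tracks[j], **criterion):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(tracks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two objects walking side by side and a third one far away.
tracks = [
    {f: (100.0 + f, 200.0) for f in range(5)},
    {f: (120.0 + f, 205.0) for f in range(5)},
    {f: (900.0, 50.0) for f in range(5)},
]
groups = group_tubes(tracks)
```

Each resulting group would then be shifted in time as a single tube, so that the interaction is preserved in the synopsis.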
The objects in the object store undergo the energy minimization process to minimize Equation (1). According to the best temporal positions obtained, the objects and their respective timestamps are stitched to the extracted spherical background image using Poisson image editing [38] in the stitching phase. Figure 7 represents the stitched objects with timestamps to the spherical background.

Representation of the stitched objects with timestamps to the extracted background
360° video offers an immersive experience to the viewers. However, a viewer looking at one part of the sphere is likely to miss an event occurring in another part. This is commonly termed FOMO [39], and it arises because the video contains multidirectional information about the scene. Therefore, it is important to view all the scenes while ignoring irrelevant content (i.e., giving less viewing importance to insignificant content). As a result, the audience can see and interpret all the key parts of the content that need to be viewed without FOMO. Visual saliency prediction allows the viewer to focus on certain locations or parts of the visual scene while ignoring other information based on its importance. Graph-Based Visual Saliency 360 (GBVS360) [34] is used to generate a spherical saliency map. Figure 8 illustrates the saliency map of the recorded spherical surveillance video. It provides a probability distribution of human visual attention within a spherical frame.

Illustration of recorded 360° surveillance video saliency map
In this work, it is observed that the top and bottom of the scene contain less salient information; hence, these two regions are given less importance. The viewport is then generated for the other regions using rectilinear projection, which maps a portion of the sphere to a flat plane [40]. Viewport generation depends primarily on the pitch, yaw, and roll angles. The pitch and yaw angles (θ, φ) are mapped to the row and column of the Cartesian coordinate (x, y), respectively. As per the literature, the roll angle is set to 0° [41]. Existing works on 360° video summarization use at least 12 NFoV glimpses. In this work, the condensed spherical video synopsis is instead visualized in multiple directions, as in traditional videography with a limited FoV, using fewer than 12 NFoV glimpses, i.e., θ = {0°} and φ = {0°, ±90°, 180°}. This allows all salient areas to be displayed in the video synopsis in multiple directions at the same time while ignoring less important areas (i.e., direction-based visualization). Visualizing the scene from multiple directions simultaneously eliminates FOMO from the generated video synopsis.
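The viewport mapping can be sketched as follows. This is a minimal illustration of rectilinear (gnomonic) projection under assumed conventions, not the implementation used in this work: the function name `viewport_to_equirect`, the 90° viewport FoV, and the axis and sign conventions are assumptions. The function maps a viewport pixel to the corresponding position in an equirectangular frame for a given yaw and pitch, with the roll fixed at 0°.

```python
from math import atan2, cos, hypot, pi, radians, sin, tan


def viewport_to_equirect(u, v, vp_size, fov_deg, yaw_deg, pitch_deg, eq_w, eq_h):
    """Map pixel (u, v) of a square viewport to equirectangular coordinates.

    Assumed conventions: yaw turns right, pitch looks up, roll is 0, and the
    centre of the equirectangular frame corresponds to yaw = pitch = 0.
    """
    # Focal length in pixels for the requested horizontal FoV.
    f = (vp_size / 2) / tan(radians(fov_deg) / 2)
    # Ray through the pixel in camera coordinates (x right, y down, z forward).
    x = u - (vp_size - 1) / 2
    y = v - (vp_size - 1) / 2
    z = f
    # Rotate the ray by the pitch (about the x axis), then the yaw (vertical axis).
    pitch, yaw = radians(pitch_deg), radians(yaw_deg)
    y1 = y * cos(pitch) - z * sin(pitch)
    z1 = y * sin(pitch) + z * cos(pitch)
    x2 = x * cos(yaw) + z1 * sin(yaw)
    z2 = -x * sin(yaw) + z1 * cos(yaw)
    # Convert the rotated ray to longitude and latitude on the sphere.
    lon = atan2(x2, z2)
    lat = atan2(-y1, hypot(x2, z2))
    # Convert the angles to (possibly fractional) equirectangular pixel positions.
    px = ((lon / (2 * pi) + 0.5) * eq_w) % eq_w
    py = min((0.5 - lat / pi) * eq_h, eq_h - 1.0)
    return px, py


# Centres of the four glimpses used in this work (pitch 0°, yaw 0°, ±90°, 180°),
# for a 1200 x 1200 viewport taken from a 5760 x 2880 equirectangular frame.
centres = {
    yaw: viewport_to_equirect(599.5, 599.5, 1200, 90, yaw, 0, 5760, 2880)
    for yaw in (0, 90, -90, 180)
}
```

Evaluating the function for every viewport pixel and sampling the equirectangular frame at the returned positions, with interpolation, would produce one NFoV glimpse. With a 90° FoV, the four yaw angles together cover the full horizontal band around the camera.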
Due to the unavailability of real-time spherical surveillance video, a user-generated video was recorded at the National Institute of Technology Puducherry at 24 fps with an Insta360 ONE X. The video duration (Z) is 01:03:11 (HH:MM:SS), and it contains 110 360° objects with 20 interacting objects. The objects include pedestrians, two-wheelers, and four-wheelers. The resolution of the recorded spherical frame and of the FoV of the generated synopsis video is 5760 × 2880 and 1200 × 1200, respectively. The number of frames of the recorded 360° surveillance video is 91,000. Exhaustive experiments are carried out on the recorded spherical surveillance videos for the performance analysis of the proposed framework. During the development and testing of the proposed approach, MATLAB (version 2019a) on an Intel Core i7 2.60 GHz CPU with 16 GB RAM was used. Figure 9 presents the viewport generation performed by the rectilinear projection of a 360° surveillance synopsis video. The preservation of interacting objects in the synopsis video is shown in Fig. 10. To track an object from one area to another, the direction-based video synopsis of each area must be played back simultaneously. Figure 11 gives the multiple direction-based video synopsis of the 35th frame, where each area is played synchronously.
The proposed generation of spherical surveillance synopsis video is evaluated in two cases:
Varying-length synopsis (V L): In this case, the length of the spherical synopsis video varies for each optimization algorithm based on its performance.
Fixed-length synopsis (F L): In this case, the length of the spherical synopsis video is set to the longest object tube length for all optimization algorithms.

Viewport generation performed using the rectilinear projection of a 360° surveillance synopsis video

Preservation of interacting objects in the synopsis video

Multiple direction-based synopsis of Frame No: 35, where each area is played synchronously with only 4 NFoV glimpses
The proposed spherical synopsis framework is evaluated with the following recent metaheuristic optimization algorithms: Ant Lion Optimization (ALO) [44], Aquila Optimization (AO) [45], Giza Pyramids Construction (GPC) [46], Grey Wolf Optimization (GWO) [47], Heap-Based Optimization (HBO) [48], and Hybrid of Particle Swarm and Grey Wolf Optimization (HPSOGWO) [49]. It is also compared with optimization algorithms used in existing traditional video synopsis works: Hybrid of Grey Wolf Optimization and Simulated Annealing (HGWOSA) [21], HSAJAYA [50], Hybrid of Simulated Annealing and Teacher Learning based Optimization (HSATLBO) [20], Particle Swarm Optimization (PSO) [42], and Simulated Annealing (SA) [12].
Collision cost is given a higher weight than the other costs to effectively reduce collisions between 360° objects. The activity, collision, temporal consistency, length, and display cost weights are 0.1, 0.4, 0.2, 0.1, and 0.2, respectively. Each optimization algorithm is run for 100 iterations with a population size of 10. The remaining algorithm-specific parameters are set as described in the relevant literature.
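To illustrate how the iteration count and population size enter the search, the sketch below runs a plain elitist random-mutation search over the start frames of the tubes. It is a stand-in and is not any of the metaheuristics listed above. The toy `energy` function models only the length and display terms, and the names `energy` and `optimise`, the tube lengths, and the per-frame limit are hypothetical.

```python
import random


def energy(shifts, lengths, max_objects=2, w_length=0.1, w_display=0.2):
    """Toy energy: weighted synopsis length plus weighted display excess.

    Only two of the five cost terms are modelled, to keep the sketch short.
    """
    n_frames = max(s + n for s, n in zip(shifts, lengths))
    counts = [0] * n_frames
    for s, n in zip(shifts, lengths):
        for f in range(s, s + n):
            counts[f] += 1
    excess = sum(max(0, c - max_objects) for c in counts)
    return w_length * n_frames + w_display * excess


def optimise(lengths, max_shift, iterations=100, population=10, seed=0):
    """Elitist random-mutation search over start frames (a stand-in optimiser)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, max_shift) for _ in lengths] for _ in range(population)]
    best = min(pop, key=lambda s: energy(s, lengths))
    history = [energy(best, lengths)]
    for _ in range(iterations):
        # Mutate one start frame per candidate; keep the child if it is no worse.
        for i, cand in enumerate(pop):
            child = list(cand)
            child[rng.randrange(len(child))] = rng.randint(0, max_shift)
            if energy(child, lengths) <= energy(cand, lengths):
                pop[i] = child
        best = min(pop + [best], key=lambda s: energy(s, lengths))
        history.append(energy(best, lengths))
    return best, history


# Hypothetical tube lengths in frames; 100 iterations and a population of 10.
lengths = [40, 25, 60, 30, 45, 20]
best_shifts, history = optimise(lengths, max_shift=200)
```

The list `history` records the best energy after each iteration and is therefore non-increasing, which is the kind of curve a convergence plot compares across algorithms.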
Here, the activity cost is zero in both cases, which indicates that no object is lost. Figure 12 illustrates the convergence characteristics of the optimization algorithms used. It is observed that ALO [44] and PSO [42] for V L , and ALO [44] for F L , converge in less time than the other optimization algorithms.

Analysis of convergence characteristics for both V L and F L
Table 3 presents the experimental analysis of the optimization algorithms used in the proposed framework for V L . From Table 3, the following can reasonably be inferred. Video synopses generated by optimization algorithms other than ALO [44] and HBO [48] shift the objects with high collision. Video synopses generated by optimization algorithms other than GPC [46] shift the objects with high temporal disorder. The video synopsis generated using PSO [42] has the minimum synopsis length. The video synopsis generated using HGWOSA [21] packs the desired number of objects per frame effectively.
Experimental analysis of the proposed framework for V L
Table 4 presents the experimental analysis of the optimization algorithms used in the proposed framework for F L . From Table 4, the following can reasonably be inferred. Video synopses generated by optimization algorithms other than SA [12] shift the objects with high collision. Video synopses generated by optimization algorithms other than GPC [46] and HBO [48] shift the objects with high temporal disorder. The video synopsis generated using HSATLBO [20] packs the desired number of objects per frame effectively.
Experimental analysis of the proposed framework for F L
The evaluation metrics used are object loss, activity preservation rate, space efficiency, chronological disorder ratio, and overall virtual collision rate.
Object loss: It is the ratio of the number of objects in the synopsis video (S_O) to the number of objects in the recorded 360° surveillance video (r_O) [16]. It is denoted as σ.
Activity preservation rate: It is the ratio of the number of activities in the synopsis video (A_S) to the number of activities in the recorded video (A_r) [51]. It is referred to as η.
Space efficiency: Space efficiency (S_E) is defined as the ratio of the number of frames discarded by the synopsis (f_d) to the number of frames (N) in the recorded video [43].
Chronological disorder ratio: It is defined as the ratio of the number of disordered objects (O_d) to the total number of moving objects (T_Mo) [37]. Disordered objects are objects that do not follow the order of appearance of the recorded 360° surveillance video. It is denoted as δ.
Overall virtual collision rate: It is denoted as χ and given as follows [37],
The performance metrics of the spherical synopsis video with V L are given in Table 5. PSO [42] generates a spherical synopsis with minimum length and maximum space efficiency. Better chronological disorder ratio and overall virtual collision are provided by GPC [46] and HBO [48] respectively.
Performance metrics of the spherical synopsis with V L
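The chronological disorder ratio can be sketched as follows, under one possible reading of the definition. The assumption here is that an object counts as disordered if some object that appeared strictly before it in the recording starts strictly after it in the synopsis. The exact counting rule of [37] may differ, and the function name and the toy start frames are hypothetical.

```python
def chronological_disorder_ratio(original_starts, synopsis_starts):
    """Fraction of objects that overtake at least one earlier object.

    Assumed interpretation: object i is disordered if some object j appeared
    strictly before it in the recording but starts strictly after it in the
    synopsis.
    """
    n = len(original_starts)
    disordered = 0
    for i in range(n):
        if any(
            original_starts[j] < original_starts[i]
            and synopsis_starts[j] > synopsis_starts[i]
            for j in range(n)
        ):
            disordered += 1
    return disordered / n


# Toy start frames: the third object is moved ahead of the second one.
original = [0, 100, 250, 400]
synopsis = [0, 30, 10, 40]
delta = chronological_disorder_ratio(original, synopsis)  # 0.25
```

Under this reading, a synopsis that keeps the original order of appearance yields a ratio of zero.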
Table 6 gives the performance metrics used in the proposed work for generating a fixed-length synopsis. For F L , the synopsis length is 17,345, equal to the longest object tube length in the recorded surveillance video, and the synopsis duration Z is 00:12:02. The space efficiency is 80.94%. The synopsis generated using HBO [48] and SA [12] provide better chronological disorder ratio and overall virtual collision results, respectively. For both cases V L and F L , all the objects in the recorded video are included in the synopsis video, thus making the object loss and activity preservation rate 0 and 100% respectively.
Performance metrics of the spherical synopsis with F L
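The reported figures for the fixed-length case can be cross-checked with a few lines of arithmetic. The sketch assumes that the number of discarded frames is the recorded frame count minus the synopsis length, which is the natural reading of the space efficiency definition. The helper names are hypothetical.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def seconds_to_hms(seconds):
    """Convert seconds to an HH:MM:SS string, dropping fractional seconds."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


FPS = 24
exact_frames = hms_to_seconds("01:03:11") * FPS  # 90984, reported as about 91,000
N = 91_000                 # recorded frame count as reported
synopsis_frames = 17_345   # longest object tube length (fixed-length case)

f_d = N - synopsis_frames  # assumption: discarded frames = N - synopsis length
space_efficiency = 100 * f_d / N                             # about 80.94 %
synopsis_duration = seconds_to_hms(synopsis_frames / FPS)    # "00:12:02"
```

Both derived values agree with the reported space efficiency of 80.94% and synopsis duration of 00:12:02.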
Table 7 presents the analysis of the activity preserved for the viewer-specified video duration for V L and F L for 4 minutes. Here, ALO [44] provides better results for both object loss and activity preservation rate for both V L and F L . Table 8 presents the analysis of the activity preserved for the viewer-specified video duration for V L and F L for 6 minutes. For object loss in V L and F L , ALO [44] and AO [45] perform better, respectively. For activity preservation rate in V L and F L , ALO [44] and AO [45] perform better, respectively.
Analysis of the activity preserved for the viewer specified video duration for V L and F L for 4 minutes
Analysis of the activity preserved for the viewer specified video duration for V L and F L for 6 minutes
Analysis of the activity preserved for the viewer-specified video duration for V L and F L for 8 minutes is given in Table 9. Here, for object loss in V L and F L , ALO [44] and AO [45] perform better, respectively. For activity preservation rate in V L and F L , ALO [44] and AO [45] perform better, respectively.
Analysis of the activity preserved for the viewer specified video duration for V L and F L for 8 minutes
Table 10 presents the analysis of the activity preserved for the viewer-specified video duration for V L and F L for 10 minutes. Here, ALO [44] provides better results for object loss and activity preservation rate.
Analysis of the activity preserved for the viewer specified video duration for V L and F L for 10 minutes
Table 11 presents the analysis of the activity preserved for the viewer-specified video duration for V L and F L for 12 minutes. Here, for object loss, ALO [44] and SA [12] perform better in V L , while ALO [44] performs better in F L . For activity preservation rate, ALO [44] and SA [12] perform better in V L , while ALO [44] performs better in F L .
Analysis of the activity preserved for the viewer specified video duration for V L and F L for 12 minutes
An analysis of the amount of activity held within the viewer-specified synopsis duration is performed to see how the optimization algorithm works regarding activity preservation rate. Figure 13 illustrates the analysis of the activity preserved for the viewer-specified synopsis video duration for V L and F L using recent metaheuristics optimization algorithms like ALO [44], AO [45], GPC [46], GWO [47], HBO [48], HPSOGWO [49] and compared with the existing traditional video synopsis works such as HGWOSA [21], HSAJAYA [50], HSATLBO [20], PSO [42], and SA [12].

Analysis of the activity preserved for the viewer-specified video duration (i.e., 4 minutes, 6 minutes, 8 minutes, 10 minutes, and 12 minutes) for V L and F L
To understand the effectiveness of the new display constraint in the proposed framework a user rating is performed to assess the human visual system experience of the user for the generated video synopsis. Here, 10 users are selected for rating purposes. Each user is requested to rate the individual algorithm out of 10 for three cases such as 6, 8, and 10 objects per frame. Table 12 presents the average user rating for the mentioned cases. It is observed that 8 objects per frame offer the human visual system a better opportunity to understand the scene perfectly.
User rating based on the number of objects displayed per frame
A comparative statistical analysis was performed to validate the effectiveness of the proposed framework across the considered metaheuristic methods. Table 13 presents the results in terms of best, mean, worst, standard deviation (S.D), and execution time in seconds. It is observed that the recent metaheuristic optimization algorithms yield better results than the existing traditional video synopsis works.
Statistical comparison to validate the effectiveness of the proposed framework
A corner case analysis is performed to assess the scalability of the proposed framework. This is shown in Table 14. Two test cases were used, and the proposed framework passed both of them. This indicates that the proposed framework is scalable. The proposed framework has also been tested on different types of spherical video, and it is found to serve as a generalized framework for creating spherical video synopsis.
Corner case analysis for the proposed framework
This paper proposes a novel spherical surveillance video synopsis generation and visualization framework with interaction preservation. The main contribution of this work is the elimination of FOMO through direction-based visualization using only four NFoV glimpses. To help the viewer understand the scene, the framework allows the number of objects shown in each frame to be restricted. A comprehensive comparative analysis of recent metaheuristic optimization algorithms and existing video synopsis works is performed for varying and fixed synopsis lengths. The analysis is also carried out for viewer-specified constraints on the length of the synopsis video. Various performance metrics are used to assess the efficacy of the proposed framework. Experimental analysis suggests that, by using recent metaheuristic optimization algorithms, the proposed framework handles the multiple constraints better than the state-of-the-art methods in the domain of video synopsis. The limitation of this work is that it forces the viewer to watch from multiple directions. This is a difficult task, as the viewer has to track a single object in the video synopsis across multiple directions simultaneously.
