Ontology based multiobject segmentation and classification in sports videos

Abstract

The primary objective is to identify and segment multiple, partly occluded objects in an image. The proposed approach starts with frame conversion. In the preprocessing stage, a Gaussian filter is employed for image smoothing. Multiple objects are then segmented from the preprocessed image through modified ontology-based segmentation, and edges are detected from the segmented images. Next, the area is extracted from the edge-detected frames, which yields the object-detected frames. In the feature extraction stage, attributes such as area, contrast, correlation, energy, homogeneity, color, perimeter and circularity are extracted from the detected objects. Based on the extracted attributes, the objects are categorized as human or other objects (bat/ball) by a feed-forward back propagation neural network (FFBNN) classifier.

Keywords

Object segmentation; Gaussian filtering; Object classification; Object detection; Feature extraction; Ontology

1 Introduction

Video segmentation is a major concern in several applications that identify an object in a temporal image series for subsequent processes such as tracking, editing, reconstruction or identification [1]. Video Object Segmentation (V.O.S.) aims to extract the foreground objects from the background scene in a video [2]. It is essential for several sophisticated vision applications, including video understanding, object recognition and video object replacement (cutting and pasting video objects). In image and video processing, image segmentation is a challenging task and one of the vital operations [3]. Segmentation depends on properties of the image such as grey level, depth, texture, colour or motion [4]. Every pixel in a region is similar with respect to some characteristic or computed property such as colour, intensity or texture, whereas adjacent regions differ with respect to the same characteristics [5].

Two situations occur as the segmentation scale varies, namely a) under-segmentation and b) over-segmentation [6, 7]. Under-segmentation arises when segmentation parameters such as heterogeneity, band count and resolution produce segments that are larger than the actual features [8]. Over-segmentation arises when the segmentation parameters produce segments that are smaller than the actual features [9]. Both under- and over-segmentation change the classification accuracy considerably; however, under-segmentation has the much worse consequence [10].

Image segmentation is employed in several applications such as biomedical image analysis, character detection and target identification [11]. In general, image segmentation techniques are broadly classified into three types, namely a) boundary-based techniques, b) region-based techniques, and c) hybrid techniques [12, 13]. Several fully automated and supervised techniques for segmenting objects have been developed, such as histogram-based, edge-based, region-based and Markov random field-based approaches [14]. Image processing is a rapidly growing area of computer science. Its growth has been driven by technological advances in computer processing, digital imaging and mass storage devices [15]. Image processing generally begins with edge detection, which is followed by feature extraction [16]. Edge detection is one of the most common operations in digital image processing. Image segmentation can be carried out through several edge detection methods such as Sobel, Prewitt, the EM algorithm, Otsu, Roberts, Canny, LoG, and the Genetic Algorithm [17]. Image and video object detection is a long-standing challenge in computer vision, and the complexity of the problem depends mainly on the constraints and limits imposed on the data [18]. The advantages of digital image processing are low-cost processing, reliably high image quality and the capability to control every aspect of the process [19]. A notable factor is the performance gap among recent video object segmentation approaches, a field that is currently experiencing significant growth [20].

The rest of this manuscript is organized as follows: Section 2 reviews related work on the problem addressed by the proposed approach. Section 3 gives a concise description of the proposed method, Section 4 analyses the experimental outcomes, and Section 5 concludes the work.

2 Related work

Ying pings Huang et al. [21] proposed a method for the concurrent detection and classification of multiple classes of obstacles such as automobiles, pedestrians and others. The technique combines stereovision-based obstacle detection with dynamic contour models. Stereovision is used to build a depth map, so that obstacles are separated from the cluttered background based on their locations. The dynamic contour models are used to extract contours. Geometrical attributes such as aspect ratio, area ratio and height are then used to classify the objects as automobiles, pedestrians or others.

Mohini Deokar et al. [22] proposed a technique for shot boundary detection through block-based histogram comparison to address the video segmentation problem. They first segment cricket videos based on color and motion attributes. Conventional shot detection approaches often fail because of the lack of visual dissimilarity across space and time. Their method is computationally efficient at finding both abrupt and gradual transitions. Frame features such as color and motion are employed so that detection remains stable under variations in illumination.

Jayanthh et al. [23] proposed vision-based object recognition algorithms to automate the extraction of video frames that show the sports action from a particular view. In preprocessing, they retained the frames showing the cricket field, where the field forms a region, and removed the frames showing spectators, close-up shots of individual players, advertisements, etc. The subset of frames containing the cricket field was then subjected to the Statistical Model of Grayscale histogram (SMoG). As SMoG does not exploit color or domain-specific information, they developed another method, Component Quantization-based RoI Extraction (CQRE), for pitch frame extraction.

Mojtaba Seyedhossein et al. [24] introduced a contextual framework known as the Contextual Hierarchical Model (C.H.M.), which learns contextual information in a hierarchical structure for semantic segmentation. It integrates the resulting multiresolution contextual information into a classifier to segment the input image at the original resolution. This procedure allows the joint posterior probability to be optimized at several resolutions through the hierarchy. The contextual hierarchical model depends only on input image patches and does not use any fragments or shape examples. It is therefore applicable to several problems such as object segmentation and edge detection.

Chi-Man Pun et al. [25] introduced an online video object segmentation approach based on illumination-invariant color-texture feature extraction and marker prediction. The object marker prediction approach propagates the user-specified markers and locates the object of interest in the subsequent frame through superpixel motion prediction using illumination-invariant optical flow, marker superpixel candidate generation using short-term superpixel similarity, and maximum likelihood estimation using long-term superpixel similarity.

Zuoyong Li et al. [26] proposed a robust hybrid single-object image segmentation technique that exploits the salient transition region. The technique first uses local complexity and local variance to identify the transition regions of an image. The transition region with the most pixels is then chosen as the salient transition region. Next, a gray level interval is determined from the transition regions and the image information, and one gray level of the interval is selected as the segmentation threshold using the salient transition region. Lastly, the image thresholding result is refined into the final segmentation outcome using the salient transition region to remove fake object regions.

With the emergence of Deep Neural Networks (D.N.N.s) [29], a significant benefit has been obtained from the introduction of Regions with CNN features (R-CNN), which have deeper architectures and the capacity to learn more complex features than shallow ones. The training algorithm is robust, as it performs the classification and bounding box regression tasks in an optimized way. Thus, the D.N.N. outperforms the basic CNN and makes real-time, accurate object detection more achievable. "You Only Look Once" (YOLO) is an advanced real-time object detection system created by Joseph Redmon and Ali Farhadi from the University of Washington. Their algorithm applies a neural network that partitions the picture into a grid and uses one forward propagation to detect all objects in an image.

Augmented reality (A.R.) and virtual reality (V.R.) are among the most active topics in technology today. Moreover, combining the two techniques is still challenging and has considerable potential for study. Some researchers have reviewed the methods with a focus on vision-based techniques, organizing them by their general principles and exploring the strengths and weaknesses of these principles [30]. Such work enhances the quality of experience and quality of service of reality applications to improve users' everyday lives. On the other hand, mobile A.R. frameworks can continuously track a camera's pose within the scene and can estimate the correct scale of the environment by using Visual-Inertial Odometry (V.I.O.) [31].

3 Proposed methodology

In this manuscript, a novel method is introduced to efficiently segment multiple objects and categorize them in sports videos. First, the input video is fetched from the database and split into frames. To simplify the segmentation process and to detect human objects accurately, both forward and backward tracking are carried out in the proposed method. During forward tracking, the frames are considered in order from first to last. Preprocessing is applied to every frame to remove noise; a Gaussian filter is used for this purpose. Then, modified ontology-based segmentation is applied to each preprocessed frame to segment the objects.

Meanwhile, backward tracking is also executed. Here, the frames are considered in order from last to first, and the same process carried out in forward tracking is repeated. The outcomes of forward and backward tracking are then intersected to obtain the segmentation result. Next, edges are detected from the segmented objects. Finally, the objects are detected by extracting the area from the edge-detected frames. Attributes such as contrast, color, area, perimeter, circularity, homogeneity, energy and correlation are extracted from the detected objects. Based on the extracted attributes, the objects are categorized as human, bat or ball by the FFBNN classifier. The block diagram of the proposed multi-object detection and classification approach is shown in Fig. 1.

3.1 Preprocessing

Let us assume a database I (D) containing several videos. The database is represented as follows.

$I(D) = \{ i_{1d}, i_{2d}, \dots, i_{nd} \}, \quad i = 1, 2, 3, \dots, n$

Where n denotes the total number of videos in the database I (D). Each video is split into frames, denoted i_id, and each frame has x rows and y columns. After the video is split into frames, each frame is converted from RGB to grey. Gaussian filtering is then applied to all frames for image smoothing.

3.1.1 Gaussian filtering

A Gaussian filter is employed to remove Gaussian noise. The Gaussian smoothing operator computes a weighted average of the surrounding pixels based on the Gaussian distribution. The weights give higher significance to pixels near the centre of the template, which reduces edge blurring compared with a uniform average. The degree of smoothing increases with σ (a larger σ gives more intensive smoothing), and the radius controls the size of the template; larger σ values call for a larger template. Gaussian filtering is thus employed to blur the image and to remove noise and fine detail. The Gaussian function is defined as

$g(y, z) = \frac{1}{2 \pi \sigma^{2}} \, e^{- \frac{y^{2} + z^{2}}{2 \sigma^{2}}}$

Where σ signifies the standard deviation of Gaussian distribution. The Gaussian distribution is considered to have a mean which is equal to 0. Gaussian smoothing is valuable for eliminating the Gaussian noise. The filtered image (I′ (x, y)) is then fed as input to the detection process.
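The smoothing step can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names `gaussian_kernel` and `gaussian_smooth`, the default radius and σ, and the edge-replication padding are all assumptions introduced here.

```python
import numpy as np


def gaussian_kernel(radius: int, sigma: float) -> np.ndarray:
    """Sample g(y, z) on a (2*radius+1) x (2*radius+1) grid, normalised to unit sum."""
    ax = np.arange(-radius, radius + 1)
    y, z = np.meshgrid(ax, ax, indexing="ij")
    g = np.exp(-(y ** 2 + z ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return g / g.sum()


def gaussian_smooth(gray: np.ndarray, radius: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Smooth a 2-D grey frame with the sampled kernel (borders replicated, an assumption)."""
    kernel = gaussian_kernel(radius, sigma)
    rows, cols = gray.shape
    padded = np.pad(gray.astype(float), radius, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    size = 2 * radius + 1
    # Accumulate the weighted, shifted copies of the frame; the kernel is symmetric,
    # so this correlation equals a convolution.
    for dy in range(size):
        for dz in range(size):
            out += kernel[dy, dz] * padded[dy:dy + rows, dz:dz + cols]
    return out
```

Normalising the sampled kernel keeps the mean brightness of a frame unchanged, which is why a flat region passes through the filter unaltered.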

3.2 Multi-object detection

The multi-object detection of our proposed technique includes the following three stages,

Segmentation.

Edge detection.

Area Extraction.

3.2.1 Ontology-based multiple object segmentation

The smoothed image obtained after applying the Gaussian filter is subjected to the segmentation process with modified ontology-based segmentation [28]. Here, the Dirichlet process mixture model is applied to the filtered image to transform the low-level visual space into an intermediate semantic space, which reduces the dimensionality of the features. Then, with the help of multiple C.R.F.s, the Dirichlet process features are weighted and learned separately within the context. After referencing to change the binary image to an RGB image, the segmentation technique resembles a classification technique. Finally, a modified k-means clustering technique is employed to obtain accurate segmented images, which are given as input to the edge detection process.

K-Means algorithm is an unsupervised clustering algorithm that classifies the input data points into multiple classes based on their inherent distance from each other. The algorithm assumes that the data features form a vector space and tries to find natural clustering. The points are clustered around centroids μ_i, i = 1 . . . k which are obtained by minimizing the objective

$O = \sum_{i = 1}^{k} \sum_{x_{j} \in S_{i}} (x_{j} - \mu_{i})^{2}$ (1)

where there are k clusters S_i, i = 1, 2, . . ., k, and μ_i is the centroid or mean point of all the points x_j belonging to S_i. Often the point that is farthest from any cluster center is chosen. When outliers are present, the resulting cluster centroids (prototypes) may not be as representative, which affects the segmentation process. To avoid this, the computation of μ_i is changed as in Equation (3).

Steps:

Compute the intensity distribution (histogram) of the image.

Initialize the centroids μ_i with k random intensities.

Repeat the following steps until the cluster labels of the image do not change

Cluster the points based on the distance of their intensities from the centroid intensities.

The computation of μ_i is modified as given in Equation (3).

$c_{i} = \min \left( {\| x_{j} - \mu_{i} \|}^{2} \right)$ (2)

where

$\mu_{iij} = \frac{1}{S_{i}} {\left[ \sum_{k = 1}^{S} \| x_{i} - \mu_{k} \| \right]}^{2 / (m - 1)}$ (3)

Here m is a constant.

Compute the new centroid (μ_i) for each of the clusters.

Finally, the human, bat and ball objects are segmented from each frame individually and passed on to the subsequent stages.
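The clustering loop in the steps above can be sketched in Python as follows. The sketch uses the standard mean update for the centroids, not the modified update of Equation (3), and the function name `kmeans_intensity`, the random seed and the iteration cap are assumptions made for illustration only.

```python
import numpy as np


def kmeans_intensity(gray, k=3, max_iter=100, seed=0):
    """Cluster pixel intensities with plain k-means (standard mean update).

    Requires at least k distinct intensity values in the frame.
    Returns (label image, centroid intensities).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(gray, dtype=float).ravel()
    # Initialise the centroids with k distinct intensities drawn at random.
    centroids = rng.choice(np.unique(x), size=k, replace=False)
    labels = np.full(x.shape, -1)
    for _ in range(max_iter):
        # Assign every pixel to the centroid with the closest intensity.
        dist = (x[:, None] - centroids[None, :]) ** 2
        new_labels = np.argmin(dist, axis=1)
        # Stop once the cluster labels no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the mean intensity of its cluster.
        for i in range(k):
            members = x[labels == i]
            if members.size:
                centroids[i] = members.mean()
    return labels.reshape(np.shape(gray)), centroids
```

Swapping the mean update inside the loop for another centroid rule is the only change needed to experiment with a modified variant.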

3.2.2 Object tracking

To attain a better detection rate on every shot of a video, detection and tracking are combined into a complete tracking procedure. The two types of tracking are

Forward Tracking and

Backward Tracking.

Forward tracking is applied to every frame, starting from the frame in which the object is first detected and proceeding to the end of the shot. Backward tracking is applied to every frame to yield an additional set of tracked objects: it propagates results from the frame where the final object detection is made back to the first frame of the shot, recovering objects that were missed there. Backward tracking proves effective because forward tracking may fail to locate the object in a particular frame, owing to occlusion, poor illumination or the tracker sticking to the background. For example, if an object is not tracked properly in frame 1 but the same object is tracked in frame 5, the data are propagated back, providing object tracking in the initial frames. Finally, the results of forward and backward tracking are intersected to obtain the final tracking result.
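The final intersection step can be illustrated with a short Python sketch. It assumes that each tracking pass yields one boolean object mask per frame and that the backward pass stores its masks in last-to-first order; both the mask representation and the function name `intersect_tracks` are hypothetical choices made here.

```python
import numpy as np


def intersect_tracks(forward_masks, backward_masks):
    """Intersect per-frame masks from the forward and backward passes.

    forward_masks:  boolean masks ordered first frame -> last frame.
    backward_masks: boolean masks ordered last frame -> first frame.
    Returns the combined masks in first -> last order.
    """
    if len(forward_masks) != len(backward_masks):
        raise ValueError("both passes must cover the same number of frames")
    # Re-order the backward pass so that index t refers to the same frame in both lists.
    aligned_backward = backward_masks[::-1]
    return [np.logical_and(f, b) for f, b in zip(forward_masks, aligned_backward)]
```

A pixel survives only if both passes mark it as part of an object in the same frame.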

3.2.3 Edge detection

An edge in an image is a significant local change in image intensity, generally associated with a discontinuity in the intensity. The edge representation of an image considerably reduces the quantity of data to be handled while preserving the vital information about the shapes of the objects in the image.
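The text does not name the edge operator that is applied to the segmented frames. Purely as an illustration, the following Python sketch assumes the Sobel operator and a fixed gradient-magnitude threshold; the helper names `filter3x3` and `sobel_edges` and the threshold value are hypothetical.

```python
import numpy as np

# Sobel masks for the horizontal and vertical intensity gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Apply a 3x3 mask to a 2-D image, replicating the border pixels."""
    rows, cols = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def sobel_edges(img, threshold):
    """Mark pixels whose gradient magnitude exceeds the threshold."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy) > threshold
```

The resulting boolean map keeps only the locations of strong intensity discontinuities, which is the data reduction described above.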

3.2.4 Area extraction

The area is extracted from the edge-detected frames for multi-object detection. The outcome is the multi-object detected frames. The area of the segmented image is computed through Equation (4). $Area = A = \frac{i (h)}{i (w)}$ (4)

Where i (h) and i (w) signify the height and width of the image.

3.3 Features extraction

Attributes such as perimeter, circularity, homogeneity, energy, contrast, color, area and correlation represent the image content. These attributes are employed for efficient multi-object classification (Fig. 2).
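The definitions of the individual attributes are not spelled out in the text. As a hedged sketch, the Python code below assumes that contrast, correlation, energy and homogeneity are the usual grey-level co-occurrence matrix (GLCM) statistics computed for horizontally adjacent pixels, and that circularity is 4πA/P²; the function names, the number of grey levels and the pixel offset are all assumptions.

```python
import math

import numpy as np


def glcm_features(gray, levels=8):
    """Texture statistics from a symmetric, normalised GLCM (offset: one pixel right)."""
    g = np.asarray(gray, dtype=float)
    span = g.max() - g.min()
    # Quantise the intensities into `levels` bins numbered 0 .. levels-1.
    if span == 0:
        q = np.zeros(g.shape, dtype=int)
    else:
        q = np.minimum((g - g.min()) / span * levels, levels - 1).astype(int)
    # Count co-occurrences of horizontally adjacent grey levels.
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    glcm = glcm + glcm.T
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = math.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = math.sqrt((((j - mu_j) ** 2) * p).sum())
    # Correlation is undefined for a perfectly uniform patch.
    if sd_i * sd_j > 0:
        corr = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        corr = float("nan")
    return {
        "contrast": float((((i - j) ** 2) * p).sum()),
        "correlation": float(corr),
        "energy": float((p ** 2).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
    }


def circularity(area, perimeter):
    """Shape compactness 4*pi*A / P^2, which equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2
```

Together with area, perimeter and a color descriptor, these values would form the eight-element feature vector passed to the classifier.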

3.4 Multi objects classification using FFBNN classifier

In the training phase, the extracted feature set is fed to the Feed Forward Back Propagation Neural Network (FFBNN) classifier. The FFBNN is trained using the extracted features. The features extracted from the processed image are area (A), contrast ( $\overset{\land}{C}$ ), correlation (C_n), energy (E), homogeneity (H), color (C), perimeter (P) and circularity (C_r), which are fed to the FFBNN classifier for training and classification. The FFBNN is formed with eight input units (one per extracted attribute), a layer of hidden units, and one output unit. The structure of the FFBNN classifier is illustrated in Fig. 3.

Fig. 1

Block diagram of the intended multi-object detection and classification.

Fig. 2

Properties of various Attributes.

Fig. 3

Structure of FFBNN.

1. The bias function of the neural network.

The input weights are assigned to every neuron except the neurons in the input layer. The bias function and activation function of the neural network are expressed below. The bias function represents the product of weights and inputs.

$b(f) = \alpha + \sum_{n = 1}^{h} \left( i(\overset{\land}{C}) + i(C) + i(A) + i(C_{n}) + i(E) + i(H) + i(P) + i(C_{r}) \right)$ (5)

In the bias function, A, $\overset{\land}{C}$ , C_n, E, H, C, P and C_r are the extracted features.

2. Activation functions for the neural network:

The activation function is a non-linear function, expressed as $H = \frac{1}{1 + \exp(- b(f))}$ (6)

3. Calculation of learning error of the neural network obtained is given below:

$Z = \frac{1}{s} \sum_{n = 0}^{s - 1} (r_{n} - t_{n})$ (7)

Here r_n and t_n signify the desired and the actual outputs of the FFBNN, and s is the total number of neurons in the hidden layer. The error that arises in the training phase is reduced through the back-propagation algorithm; the error minimization procedure is discussed in [27]. The FFBNN is trained using the extracted features, and the network then decides, from the specific feature values, whether or not an input belongs to a given class. The same FFBNN training procedure is applied to the entire feature set.
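The forward pass and one back-propagation update can be sketched in Python as follows. This is an illustrative toy network, not the trained classifier of the experiments: the class name `TinyFFBNN`, the hidden-layer size, the weight initialisation, the learning rate and the use of a mean squared error in place of the signed difference of Equation (7) are all assumptions.

```python
import numpy as np


def sigmoid(b):
    """Activation of Equation (6): 1 / (1 + exp(-b))."""
    return 1.0 / (1.0 + np.exp(-b))


class TinyFFBNN:
    """Eight inputs, one hidden layer, one output unit (sizes are assumptions)."""

    def __init__(self, n_in=8, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.a1 = np.zeros(n_hidden)  # bias term (alpha) of the hidden units
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.a2 = np.zeros(1)  # bias term (alpha) of the output unit

    def forward(self, x):
        # Bias plus weighted sum of the inputs, followed by the sigmoid.
        self.h = sigmoid(x @ self.w1 + self.a1)
        self.r = sigmoid(self.h @ self.w2 + self.a2)
        return self.r

    def train_step(self, x, t, lr=0.5):
        """One full-batch gradient step; returns the loss measured before the update."""
        r = self.forward(x)
        err = r - t
        loss = float(np.mean(err ** 2))
        # Back-propagate the error through the output and hidden sigmoids
        # (the constant factor 2 of the squared error is absorbed into lr).
        d2 = err * r * (1.0 - r) / len(x)
        d1 = (d2 @ self.w2.T) * self.h * (1.0 - self.h)
        self.w2 -= lr * self.h.T @ d2
        self.a2 -= lr * d2.sum(axis=0)
        self.w1 -= lr * x.T @ d1
        self.a1 -= lr * d1.sum(axis=0)
        return loss
```

For the three classes of this work (human, bat, ball), a single sigmoid output would need a class encoding or additional output units; the sketch stays with one output to match the structure described above.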

In the testing stage, a fresh set of testing frames is acquired for classification. The testing frames are processed using the proposed approach. From the detected images, the features area (A), contrast ( $\overset{\land}{C}$ ), correlation (C_n), energy (E), homogeneity (H), color (C), perimeter (P) and circularity (C_r) are computed, and the extracted feature values are fed to the trained FFBNN classifier. Depending on the input features, the FFBNN outputs one of the object classes: bat, ball or human.

4 Results and discussion

In the proposed object detection and tracking approach, the input video obtained from the database is split into frames. To achieve more precise object detection, noise is suppressed with a Gaussian filter. Subsequently, objects are segmented with the assistance of graph cut with shape prior technique. Backward tracking is performed, and the results of forward tracking and backward tracking are intersected in the object tracking phase. After every object in the frames has been detected, features are calculated to classify the objects into human and non-human. The proposed object detection and classification approach is implemented in MATLAB R2014a. The performance of the proposed method is examined using statistical measures.

4.1.1 Snapshots of object detection plus tracking

The performance of the proposed object tracking and classification method is evaluated against the conventional F.C.M. and K-means methods. For assessing the performance of the proposed approach, two videos (video 1 and video 2) are utilized. Figure 4 (i), (ii) & (iii) illustrates the sample input frames, the RGB to gray converted frames and the noise-free frames obtained from video 1 and 2. Figure 5 (i), (ii), (iii), (iv) & (v) illustrates the segmented objects obtained from the filtered images using F.C.M., K-Means and the proposed technique, the edge image, and the tracking result for video 1 and 2 respectively.

Fig. 4

(i) Sample Input frames from video, (ii) RGB to Gray Converted Image, (iii) Filtered Image.

Fig. 5

Segmentation via (i) FCM, (ii) K-Means, (iii) Proposed Technique, (iv) Edge Image (v) Detection.

4.1.2 Performance analysis

The performance of the proposed object detection and tracking approach is analysed on further images. The proposed approach is compared with the conventional methods in terms of precision, recall, F-measure, accuracy, specificity, sensitivity, F.D.R., F.P.R., F.N.R. and MCC. Table 1 reports these measures for the conventional and proposed systems for video 1 and video 2.
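For reference, the measures listed above can be derived from the per-frame confusion counts, as in the following Python sketch. The function name `confusion_metrics` and the convention of returning 0 when a denominator is zero are assumptions introduced here; the counts used in the experiments are not given in the text.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Pixel-level measures from true/false positive and negative counts."""

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sen": ratio(tp, tp + fn),   # sensitivity (recall)
        "Spec": ratio(tn, tn + fp),  # specificity
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "PPV": ratio(tp, tp + fp),   # precision
        "NPV": ratio(tn, tn + fn),
        "FDR": ratio(fp, fp + tp),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }
```

By construction F.P.R. = 1 − specificity and F.D.R. = 1 − PPV, which matches the relationship between the corresponding columns of Table 1.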

Table 1
Segmentation evaluation of the proposed and existing techniques for (a) video 1 and (b) video 2

Segmentation Result for Proposed Modified Ontology-based Semantic

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.985411016 1 0.986553819 0 1 0.853494751 0 0.917084036

Frame 2 0.977806531 0.967077268 0.976974826 0.032922732 0.997178652 0.785483651 0.002821348 0.859956353

Frame 3 0.997031175 0.957722029 0.993984375 0.042277971 0.996449832 0.964418631 0.003550168 0.957805953

Frame 4 0.983012175 0.989235255 0.983493924 0.010764745 0.999081977 0.830110562 0.000918023 0.897875445

Frame 5 0.982722287 0.990849893 0.983350694 0.009150107 0.999220379 0.827752767 0.000779621 0.897283706

Frame 6 0.984849554 0.99061745 0.985295139 0.00938255 0.999203043 0.845537812 0.000796957 0.907753728

Frame 7 0.981966473 0.991177793 0.982677951 0.008822207 0.999248541 0.821450193 0.000751459 0.893676825

a.(i)

Segmentation Result for Existing K-Means

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.957358821 0.982728556 0.958684896 0.017271444 0.999005993 0.559680318 0.000994007 0.72471646

Frame 2 0.956772572 0.984746187 0.958229167 0.015253813 0.999125009 0.555822159 0.000874991 0.72283689

Frame 3 0.956482118 0.982626766 0.957847222 0.017373234 0.999000364 0.554351904 0.000999636 0.720873103

Frame 4 0.958575112 0.97967311 0.959678819 0.02032689 0.998830811 0.566249461 0.001169189 0.728138421

Frame 5 0.955830704 0.984516773 0.957326389 0.015483227 0.999109737 0.550784706 0.000890263 0.719090991

Frame 6 0.955129908 0.984434826 0.956657986 0.015565174 0.999104295 0.546888005 0.000895705 0.716236769

Frame 7 0.956288983 0.982549443 0.95766059 0.017450557 0.998995364 0.55332491 0.001004636 0.720096866

a.(ii)

Segmentation Result for Existing F.C.M.

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.908137156 0 0.907313368 1 0.999001214 0 0.000998786 –0.009578692

Frame 2 0.956766237 0.982126528 0.958090278 0.017873472 0.998972005 0.555822159 0.001027995 0.721728638

Frame 3 0.907368311 0 0.906588542 1 0.999052976 0 0.000947024 –0.009366132

Frame 4 0.958577199 0.980568012 0.959726563 0.019431988 0.998883305 0.566249461 0.001116695 0.728520234

Frame 5 0.906718217 0 0.905911458 1 0.999018796 0 0.000981204 –0.00956705

Frame 6 0.955134222 0.986158593 0.956749132 0.013841407 0.999204882 0.546888005 0.000795118 0.716961164

Frame 7 0.956290584 0.98320306 0.957695313 0.01679694 0.999033636 0.55332491 0.000966364 0.720373058

a.(iii)

Segmentation Result for Proposed Modified Ontology-based Semantic

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.9504683 0.996274029 0.994722392 0.003725971 0.899437532 0.998259857 0.100562468 0.921893767

Frame 2 0.994188158 0.997417831 0.997309335 0.002582169 0.930477647 0.99979749 0.069522353 0.960451143

Frame 3 0.976073953 0.99703251 0.996327029 0.00296749 0.919726729 0.999164795 0.080273271 0.945610533

Frame 4 0.973486926 0.996818804 0.996040268 0.003181196 0.913520933 0.999082696 0.086479067 0.941012504

Frame 5 1 0.99609195 0.996223307 0.00390805 0.898988251 1 0.101011749 0.946295387

Frame 6 0.983299313 0.997508059 0.997040879 0.002491941 0.930628732 0.999431121 0.069371268 0.955096624

Frame 7 0.976681128 0.997183774 0.996491763 0.002816226 0.923747649 0.999183808 0.076252351 0.948056197

b.(i)

Segmentation Result for Existing K-Means

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.951119115 0.986438602 0.985600976 0.013561398 0.630134651 0.998797719 0.369865349 0.767893481

Frame 2 0.953375796 0.986616659 0.985820622 0.013383341 0.636070032 0.998841897 0.363929968 0.772536357

Frame 3 0.949152542 0.986506756 0.98561928 0.013493244 0.631255337 0.998747192 0.368744663 0.767768015

Frame 4 0.949235949 0.986484544 0.985607077 0.013515456 0.628860673 0.998760059 0.371139327 0.766340382

Frame 5 0.959470199 0.984349727 0.983776693 0.015650273 0.591057441 0.999030246 0.408942559 0.746281795

Frame 6 0.951788079 0.986879001 0.986070775 0.013120999 0.631015104 0.998849601 0.368984896 0.768916942

Frame 7 0.954072096 0.985788767 0.985064063 0.014211233 0.610873654 0.998911744 0.389126346 0.757042555

b.(ii)

Segmentation Result for Existing F.C.M.

No ‘Sen’ ‘Spec’ ‘Acc’ ‘F.P.R.’ ‘PPV’ ‘N.P.V.’ ‘F.D.R.’ MCC

Frame 1 0.951105332 0.986192246 0.985369128 0.013807754 0.623316857 0.998810375 0.376683143 0.763621862

Frame 2 0.011764706 0.965711843 0.964722392 0.034288157 0.000356125 0.998938617 0.999643875 –0.003985579

Frame 3 0.942387904 0.985830668 0.984813911 0.014169332 0.614482407 0.998601416 0.385517593 0.754371123

Frame 4 0.940385123 0.985697244 0.984649176 0.014302756 0.608881298 0.998570028 0.391118702 0.750033307

Frame 5 0.937868238 0.985477567 0.984392923 0.014522433 0.600892244 0.998532314 0.399107756 0.743959778

Frame 6 0.945508497 0.983626001 0.98276388 0.016373999 0.571964752 0.998719671 0.428035248 0.728177578

Frame 7 0.935640887 0.986055105 0.984917633 0.013944895 0.607657183 0.998495632 0.392342817 0.747454762

b.(iii)

Segmentation Result for Proposed Modified Ontology-based Semantic
Frame 1	0.985411016	1	0.986553819	0	1	0.853494751	0	0.917084036
Frame 2	0.977806531	0.967077268	0.976974826	0.032922732	0.997178652	0.785483651	0.002821348	0.859956353
Frame 3	0.997031175	0.957722029	0.993984375	0.042277971	0.996449832	0.964418631	0.003550168	0.957805953
Frame 4	0.983012175	0.989235255	0.983493924	0.010764745	0.999081977	0.830110562	0.000918023	0.897875445
Frame 5	0.982722287	0.990849893	0.983350694	0.009150107	0.999220379	0.827752767	0.000779621	0.897283706
Frame 6	0.984849554	0.99061745	0.985295139	0.00938255	0.999203043	0.845537812	0.000796957	0.907753728
Frame 7	0.981966473	0.991177793	0.982677951	0.008822207	0.999248541	0.821450193	0.000751459	0.893676825
a.(i)
Segmentation Result for Existing K-Means
No	‘Sen’	‘Spec’	‘Acc’	‘F.P.R.’	‘PPV’	‘N.P.V.’	‘F.D.R.’	MCC
Frame 1	0.957358821	0.982728556	0.958684896	0.017271444	0.999005993	0.559680318	0.000994007	0.72471646
Frame 2	0.956772572	0.984746187	0.958229167	0.015253813	0.999125009	0.555822159	0.000874991	0.72283689
Frame 3	0.956482118	0.982626766	0.957847222	0.017373234	0.999000364	0.554351904	0.000999636	0.720873103
Frame 4	0.958575112	0.97967311	0.959678819	0.02032689	0.998830811	0.566249461	0.001169189	0.728138421
Frame 5	0.955830704	0.984516773	0.957326389	0.015483227	0.999109737	0.550784706	0.000890263	0.719090991
Frame 6	0.955129908	0.984434826	0.956657986	0.015565174	0.999104295	0.546888005	0.000895705	0.716236769
Frame 7	0.956288983	0.982549443	0.95766059	0.017450557	0.998995364	0.55332491	0.001004636	0.720096866
a.(ii)
Segmentation Result for Existing F.C.M.
No	‘Sen’	‘Spec’	‘Acc’	‘F.P.R.’	‘PPV’	‘N.P.V.’	‘F.D.R.’	MCC
Frame 1	0.908137156	0	0.907313368	1	0.999001214	0	0.000998786	–0.009578692
Frame 2	0.956766237	0.982126528	0.958090278	0.017873472	0.998972005	0.555822159	0.001027995	0.721728638
Frame 3	0.907368311	0	0.906588542	1	0.999052976	0	0.000947024	–0.009366132
Frame 4	0.958577199	0.980568012	0.959726563	0.019431988	0.998883305	0.566249461	0.001116695	0.728520234
Frame 5	0.906718217	0	0.905911458	1	0.999018796	0	0.000981204	–0.00956705
Frame 6	0.955134222	0.986158593	0.956749132	0.013841407	0.999204882	0.546888005	0.000795118	0.716961164
Frame 7	0.956290584	0.98320306	0.957695313	0.01679694	0.999033636	0.55332491	0.000966364	0.720373058
a.(iii)
Segmentation Result for Proposed Modified Ontology-based Semantic
No	‘Sen’	‘Spec’	‘Acc’	‘F.P.R.’	‘PPV’	‘N.P.V.’	‘F.D.R.’	MCC
Frame 1	0.9504683	0.996274029	0.994722392	0.003725971	0.899437532	0.998259857	0.100562468	0.921893767
Frame 2	0.994188158	0.997417831	0.997309335	0.002582169	0.930477647	0.99979749	0.069522353	0.960451143
Frame 3	0.976073953	0.99703251	0.996327029	0.00296749	0.919726729	0.999164795	0.080273271	0.945610533
Frame 4	0.973486926	0.996818804	0.996040268	0.003181196	0.913520933	0.999082696	0.086479067	0.941012504
Frame 5	1	0.99609195	0.996223307	0.00390805	0.898988251	1	0.101011749	0.946295387
Frame 6	0.983299313	0.997508059	0.997040879	0.002491941	0.930628732	0.999431121	0.069371268	0.955096624
Frame 7	0.976681128	0.997183774	0.996491763	0.002816226	0.923747649	0.999183808	0.076252351	0.948056197
b.(i)
Segmentation Result for Existing K-Means
No	‘Sen’	‘Spec’	‘Acc’	‘F.P.R.’	‘PPV’	‘N.P.V.’	‘F.D.R.’	MCC
Frame 1	0.951119115	0.986438602	0.985600976	0.013561398	0.630134651	0.998797719	0.369865349	0.767893481
Frame 2	0.953375796	0.986616659	0.985820622	0.013383341	0.636070032	0.998841897	0.363929968	0.772536357
Frame 3	0.949152542	0.986506756	0.98561928	0.013493244	0.631255337	0.998747192	0.368744663	0.767768015
Frame 4	0.949235949	0.986484544	0.985607077	0.013515456	0.628860673	0.998760059	0.371139327	0.766340382
Frame 5	0.959470199	0.984349727	0.983776693	0.015650273	0.591057441	0.999030246	0.408942559	0.746281795
Frame 6	0.951788079	0.986879001	0.986070775	0.013120999	0.631015104	0.998849601	0.368984896	0.768916942
Frame 7	0.954072096	0.985788767	0.985064063	0.014211233	0.610873654	0.998911744	0.389126346	0.757042555
b.(ii)
Segmentation Result for Existing F.C.M.
Frame	Sen	Spec	Acc	F.P.R.	PPV	N.P.V.	F.D.R.	MCC
Frame 1	0.951105332	0.986192246	0.985369128	0.013807754	0.623316857	0.998810375	0.376683143	0.763621862
Frame 2	0.011764706	0.965711843	0.964722392	0.034288157	0.000356125	0.998938617	0.999643875	–0.003985579
Frame 3	0.942387904	0.985830668	0.984813911	0.014169332	0.614482407	0.998601416	0.385517593	0.754371123
Frame 4	0.940385123	0.985697244	0.984649176	0.014302756	0.608881298	0.998570028	0.391118702	0.750033307
Frame 5	0.937868238	0.985477567	0.984392923	0.014522433	0.600892244	0.998532314	0.399107756	0.743959778
Frame 6	0.945508497	0.983626001	0.98276388	0.016373999	0.571964752	0.998719671	0.428035248	0.728177578
Frame 7	0.935640887	0.986055105	0.984917633	0.013944895	0.607657183	0.998495632	0.392342817	0.747454762

4.1.3 Experimental results

Examining the tables, in video 1 the proposed object segmentation system attains the highest accuracy, 98%, compared with 93.7% and 95.7% for the conventional F.C.M. and K-Means segmentation techniques. In video 2, the proposed system attains 99.6%, compared with 98.2% and 98.5% for F.C.M. and K-Means. The proposed system therefore achieves higher accuracy than the conventional techniques. It also performs better on the other measures reported in the tables. The comparison graph is shown in Fig. 6.
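The summary percentages quoted above appear to be means over the per-frame rows. The following minimal Python sketch checks this for the two existing techniques, using the accuracy column copied from tables b.(i) and b.(ii). The averaging step and the helper name `mean_percent` are assumptions made for illustration; the paper does not state how the summary figures were derived.

```python
from statistics import mean

# Per-frame accuracy values copied from tables b.(i) and b.(ii).
kmeans_acc = [0.985600976, 0.985820622, 0.98561928, 0.985607077,
              0.983776693, 0.986070775, 0.985064063]
fcm_acc = [0.985369128, 0.964722392, 0.984813911, 0.984649176,
           0.984392923, 0.98276388, 0.984917633]


def mean_percent(values):
    """Return the arithmetic mean of fractional scores as a percentage."""
    return 100.0 * mean(values)


kmeans_mean = mean_percent(kmeans_acc)
fcm_mean = mean_percent(fcm_acc)
print(f"K-Means mean accuracy: {kmeans_mean:.1f}%")
print(f"F.C.M. mean accuracy: {fcm_mean:.1f}%")
```

Under this assumption the means round to 98.5% for K-Means and 98.2% for F.C.M., which agrees with the video 2 figures quoted in the text.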

Figure 6 compares the techniques; the graph is built from the average accuracy, specificity, sensitivity, positive predictive value, negative predictive value, false discovery rate, false-positive rate, false-negative rate and MCC reported in the tables above. Across these metrics, the proposed technique performs better than the existing techniques, which indicates that the proposed approach outperforms the conventional approaches.
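For reference, each of the metrics listed above can be computed from the four pixel-level confusion counts (true positives, true negatives, false positives, false negatives). The sketch below uses the standard textbook definitions. The function names, the convention of returning 0 when a denominator is zero, and the example counts are illustrative assumptions, not the authors' MATLAB implementation.

```python
import math


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Compute the reported metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sen": safe_div(tp, tp + fn),            # sensitivity (recall)
        "spec": safe_div(tn, tn + fp),           # specificity
        "acc": safe_div(tp + tn, tp + tn + fp + fn),
        "fpr": safe_div(fp, fp + tn),            # false-positive rate
        "fnr": safe_div(fn, fn + tp),            # false-negative rate
        "ppv": safe_div(tp, tp + fp),            # positive predictive value
        "npv": safe_div(tn, tn + fn),            # negative predictive value
        "fdr": safe_div(fp, fp + tp),            # false discovery rate
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = confusion_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

These definitions imply F.P.R. = 1 - Spec and F.D.R. = 1 - PPV, which is the pattern visible in every row of the segmentation tables.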

Fig. 6

Performance Comparison of the suggested and prevailing segmentation techniques.

The classification performance is tabulated in Table 2 based upon accuracy, specificity, sensitivity, F.D.R., F.N.R., F.P.R. and the Matthews correlation coefficient (MCC). Examining the table, in video 1 the proposed classification attains the highest sensitivity, 94%, compared with 62% and 91% for the conventional K.N.N. and ANFIS techniques respectively. Thus, the sensitivity of the proposed classification technique is 2-32% higher than that of the existing techniques. The proposed technique also exceeds the existing techniques in specificity and accuracy. In video 2, the proposed system attains the highest accuracy, 96.8%, compared with 89% and 90% for K.N.N. and ANFIS respectively. Thus, the proposed technique offers better classification performance, and it can be used for real-time purposes.

Table 2

Performance Comparison of the proposed and existing classification technique

Technique	Video 1			Video 2
	Sensitivity	Specificity	Accuracy	Sensitivity	Specificity	Accuracy
NN	0.94093223	0.979031407	0.968215159	0.939593301	0.97908398	0.96835443
KNN	0.629107981	0.928031832	0.863080685	0.821304348	0.931032135	0.892405063
ANFIS	0.914023428	0.969182943	0.947432763	0.83974359	0.938584566	0.905063291
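The margins discussed for Table 2 can be reproduced with a few lines of arithmetic. The Python sketch below copies the table values and reports the gap between the NN row and each baseline in percentage points. It assumes that the NN row is the proposed FFBNN classifier; the names `TABLE2` and `gaps_in_points` are hypothetical.

```python
# (sensitivity, specificity, accuracy) per video, copied from Table 2.
TABLE2 = {
    "NN": {"Video 1": (0.94093223, 0.979031407, 0.968215159),
           "Video 2": (0.939593301, 0.97908398, 0.96835443)},
    "KNN": {"Video 1": (0.629107981, 0.928031832, 0.863080685),
            "Video 2": (0.821304348, 0.931032135, 0.892405063)},
    "ANFIS": {"Video 1": (0.914023428, 0.969182943, 0.947432763),
              "Video 2": (0.83974359, 0.938584566, 0.905063291)},
}
METRICS = ("sensitivity", "specificity", "accuracy")


def gaps_in_points(proposed="NN"):
    """Return percentage-point gaps between the proposed row and each baseline."""
    out = {}
    for name, videos in TABLE2.items():
        if name == proposed:
            continue
        for video, scores in videos.items():
            ref = TABLE2[proposed][video]
            out[(name, video)] = {
                m: 100.0 * (p - b) for m, p, b in zip(METRICS, ref, scores)
            }
    return out


gaps = gaps_in_points()
for (name, video), diff in sorted(gaps.items()):
    print(name, video, {m: round(v, 2) for m, v in diff.items()})
```

On video 1 the sensitivity gap comes out at about 2.7 points against ANFIS and about 31.2 points against K.N.N., and every gap in both videos is positive.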

5 Conclusion

The proposed multiple-object detection and classification method uses forward and backward tracking based on modified ontology-based semantic segmentation and artificial intelligence approaches. The approach is implemented in MATLAB. Its performance is assessed on two shadowing videos using standard statistical measures such as precision, recall, F-measure and accuracy, and it is compared with that of conventional classifiers and segmentation techniques. The statistical measures show that the proposed method outperforms the conventional methods. It can be employed for real-time purposes.

References

1. Djelouah, Franco J.-S., Boyer and Perez, George Drettakis Inria Technicolor, “Cotemporal Multi-View Video Segmentation”, IEEE conference, (2016).

2. Gupta, A Survey on Video Content Analysis, International Journal of Research in Advance Engineering 1(2) (2015).

3. Shivhare and Gupta, Review of Image Segmentation Techniques Including Pre & Post Processing Operations, International Journal of Engineering and Advanced Technology (IJEAT) 4(3) (2015).

4. Rahini S. K.k., Review of Image Segmentation Techniques: A Survey, International Journal of Advanced Research in Computer Science and Software Engineering 4(7) (2014).

5. Joshi S.V. and Shire A.N., A Review of an Enhanced Algorithm for Color Image Segmentation, International Journal of Advanced Research in Computer Science and Software Engineering 3(3) (2013).

6. Haque M.E., Al-Ramadan and Johnson B.A., Rule-based land cover classification from very high-resolution satellite image with multiresolution segmentation, Journal of Applied Remote Sensing 10(3) (2016), 036004–036004.

7. Held, Guillory, Rebsamen, Thrun and Savarese, A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues.

8. Johnson and Xie, Unsupervised image segmentation evaluation and refinement using a multi-scale approach, ISPRS Journal of Photogrammetry and Remote Sensing 66(4) (2011), 473–483.

9. Mahapatra, Gilani S.O. and Saini M.K., Coherency based spatio-temporal saliency detection for video object segmentation, IEEE Journal of Selected Topics in Signal Processing 8(3) (2014), 454–462.

10. Maheswari S.U. and Ramakrishnan, Sports video classification using multi scale framework and nearest neighbor classifier, Indian Journal of Science and Technology 8(6) (2015), 529–535.

11. Abo-Eleneen Z.A. and Abdel-Azim, An Improved Image Segmentation Algorithm Based on M.E.T. Method, International Journal of Computer Science Issues 9(5) (2012), 3.

12. Wanjari M.T., Kalaskar K.D. and Dhore M.P., Document Image Segmentation using Region Based Methods, International Journal of Computing Science and Information Technology 3(3) (2015).

13. […], Liu, Zhang and Xu, Robust single-object image segmentation based on salient transition region, Pattern Recognition 52 (2016), 317–331.

14. Manikannan and Senthil Murugan, A Comparative Study about Region Based and Model Based Using Segmentation Techniques, International Journal of Innovative Research in Computer and Communication Engineering 3(3) (2015).

15. Sharma, Automatic Vehicle Detection Using Various Object Detecting Algorithm and Thresholding Methods, International Journal on Future Revolution in Computer Science & Communication Engineering 1(1) (2015), 16–28.

16. Adhikari, Kar and Dastidar J.G., An automatic and efficient foreground object extraction scheme, International Journal of Science and Applied Information Technology 3(2) (2014).

17. Ramadevi, Sridevi, Poornima and Kalyani, Segmentation and Object Recognition Using Edge Detection Techniques, International Journal of Computer Science & Information Technology (IJCSIT) 2(6) (2010).

18. […], Xu, Zhang, Lin and Ward R.K., Object-based multiple foreground video co-segmentation via multi-state selection graph, IEEE Transactions on Image Processing 24(11) (2015), 3415–3424.

19. Ghute M.S., Parkhi A.A., Kamble K.P. and Anjum, Iris Movement Detection by Morphological Operations for Robotic Control, International Journal of Engineering Science and Innovative Technology (IJESIT) 1(4) (2013).

20. Perazzi, Pont-Tuset, McWilliams, Van Gool, Gross and Sorkine-Hornung, A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, IEEE conference (2016).

21. Huang and Liu, Multi-class obstacle detection and classification using stereovision and improved active contour models, I.E.T. Intelligent Transport Systems 10(3) (2016), 197–205.

22. Deokar and Ruhikabra, Video Shot Detection & Classification in Cricket Videos, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) 4(7) (2015).

23. Jayanth S.B. and Srinivasa, Automated classification of cricket pitch frames in cricket video, ELCVIA Electronic Letters on Computer Vision and Image Analysis 13(1) (2014).

24. Seyedhosseini and Tasdizen, Semantic image segmentation with contextual hierarchical models, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(5) (2016), 951–964.

25. Pun C.-M. and Huang, On-line video object segmentation using illumination-invariant color-texture feature extraction and marker prediction, Journal of Visual Communication and Image Representation 41 (2016), 391–405.

26. […], Liu, Zhang and Xu, Robust single-object image segmentation based on salient transition region, Pattern Recognition 52 (2016), 317–331.

27. Dhanalaxmi, Apparao Naidu and Anuradha, Adaptive PSO Based Association Rule Mining Technique for Software Defect Classification Using ANN, Procedia Computer Science 46 (2015), 432–442.

28. […] and Manjunath B.S., Shape prior segmentation of multiple objects with graph cuts, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), IEEE (2008).

29. Zhao Z.-Q., Peng Zheng, Shou-tao Xu and Xindong Wu, Object Detection with Deep Learning: A Review, IEEE Transactions on Neural Networks and Learning Systems (2019).

30. Rungruanganukul and Thitirat Siriborvornratanakul, Deep Learning Based Gesture Classification for Hand Physical Therapy Interactive Program, International Conference on Human-Computer Interaction HCII (2020).

31. […], Tian, Zhang, Quan and Xu, Object Detection in the Context of Mobile Augmented Reality, IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (2020).