Content-Based Management Service for Medical Videos

Abstract

The development of health information technology has dramatically improved the efficiency and quality of medical care. Developing interoperable health information systems for healthcare providers has the potential to improve the quality and equitability of patient-centered healthcare. In this article, we describe an automated content-based medical video analysis and management service that provides convenient access to relevant medical video content without sequential scanning. The system provides effective temporal video segmentation and content-based visual information retrieval, which enable a more reliable understanding of medical video content. The system is implemented as a Web- and mobile-based service and has the potential to offer a knowledge-sharing platform for efficient access to medical video content.

Introduction

Medical video repositories play important roles in many health-related areas such as medical imaging, medical research and education, medical diagnostics, and training of medical professionals. Because of limitations in accessing medical expertise, the health maintenance system of a country may face a variety of problems that directly affect an individual's quality of life and the well-being of society as a whole. Connecting as many hospitals as possible to a medical information system would improve the standard of medical practice and would benefit the education of medical students and staff who cannot reach medical resources because of resource, geographical, and time constraints.^1,2

The maturation of Adobe Flash and Web 2.0 has led to the launch of several video-sharing Web sites such as YouTube³ and Vimeo,⁴ allowing users to post video clips online and share them with others.⁵ There are also topic-specific video sites like OrLive,⁶ an online surgical and healthcare video and Webcast platform. These Web sites enable users to stream video content. Although they are efficient in distributing videos over the broadband network, they lack mechanisms for effective content management and organization. Video content in digital libraries is of only limited utility without appropriate organization and management.⁷ Video streams must be divided into smaller meaningful segments, and their semantics must be described in order to construct an index for effective retrieval. Indexing the video makes access to certain entities of the content timely and efficient. The data must be partitioned in a hierarchical fashion into meaningfully clustered subgroups so that the foundational structures required for conducting point operation for extracting the related information are obtained.⁸

In this article, we present a content-based management service for medical videos that provides convenience and ease in accessing the relevant medical video content without sequential scanning. The proposed service (1) automatically detects the boundaries of the shot changes and partitions a video into shorter segments, (2) provides a pictorial summarization of the video, (3) enables retrieval and access of the video content based on query image, and (4) provides a variety of ways for accessing any particular part of the video (i.e., clicking the key frame starts the video playing from that point in time). We implemented the system for both Web and mobile environments. We give a high-level description of key system components here. The details of the algorithms used in the system can be found in our previous work.^9,10

Materials and Methods

BACKGROUND

This section presents a brief overview of temporal video segmentation, which comprises shot boundary detection and key frame extraction, and of the video content retrieval problem. Temporal segmentation detects shot boundaries in order to break the video into meaningful segments (shots), each containing the same semantic information; key frames are then selected to represent each shot.

Most existing methods use a similarity metric between successive frames to detect shot boundaries. Based on the similarity measure, the algorithms can be divided into three categories: pixel, block-based, and histogram comparisons.

Pixel-level comparison^11–15 is the simplest approach: it evaluates the difference in intensity values of corresponding pixels between successive frames. A shot boundary is declared if the mean absolute change in the intensity value of the pixels is greater than a prespecified threshold T.
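
The thresholding rule above can be illustrated with a minimal Python sketch. It assumes grayscale frames stored as nested lists of intensities; the function names, the toy frames, and the threshold value of 50 are illustrative assumptions and are not taken from the cited methods.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute intensity change between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def pixel_level_boundaries(frames, threshold):
    """Indices k where frame k differs from frame k-1 by more than the threshold T."""
    return [
        k for k in range(1, len(frames))
        if mean_abs_diff(frames[k], frames[k - 1]) > threshold
    ]


# Toy 2x2 frames (hypothetical data): two dark frames followed by two bright ones.
dark = [[10, 12], [11, 10]]
bright = [[200, 198], [201, 199]]
frames = [dark, dark, bright, bright]

boundaries = pixel_level_boundaries(frames, threshold=50)
print(boundaries)  # [2]
```

Because every pixel contributes equally, a fast camera pan can raise the mean difference as much as a true cut, which is the weakness the block-based and histogram methods try to address.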

Block-based approaches^11,13,16–18 are based on the comparison of corresponding regions (blocks) in two successive frames. Frames are divided into blocks that are compared with their corresponding blocks. In contrast to pixel-level comparison, which is based on global image characteristics, block-based approaches use local characteristics to increase the robustness to camera and object movement while retaining enough spatial information.
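
One simple variant of the block-based idea is sketched below in Python. It compares the mean intensity of corresponding blocks and reports the fraction of blocks that changed; the choice of block means as the block statistic, the function names, and the thresholds are assumptions made for illustration, and the cited methods use their own block measures.

```python
def block_means(frame, block):
    """Mean intensity of each non-overlapping block-by-block region, row by row."""
    rows, cols = len(frame), len(frame[0])
    means = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            cells = [
                frame[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            ]
            means.append(sum(cells) / len(cells))
    return means


def changed_block_fraction(frame_a, frame_b, block, block_threshold):
    """Fraction of corresponding blocks whose mean intensity changed noticeably."""
    means_a = block_means(frame_a, block)
    means_b = block_means(frame_b, block)
    changed = sum(
        1 for a, b in zip(means_a, means_b) if abs(a - b) > block_threshold
    )
    return changed / len(means_a)


# Hypothetical 4x4 frames: a blank background, the same scene with a small
# bright object in the top-left block, and a completely different scene.
background = [[0] * 4 for _ in range(4)]
small_object = [row[:] for row in background]
for i in range(2):
    for j in range(2):
        small_object[i][j] = 100
new_scene = [[100] * 4 for _ in range(4)]

object_change = changed_block_fraction(background, small_object, 2, 20)
scene_change = changed_block_fraction(background, new_scene, 2, 20)
print(object_change, scene_change)  # 0.25 1.0
```

A local object affects only the blocks it covers, so a boundary rule that requires most blocks to change would ignore the object while still flagging the scene change.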

To increase the robustness to camera and object motion, alternative approaches have been proposed based on the comparison of histograms of successive images. The histogram comparison algorithms can be divided into two categories: global and local histogram comparisons. Several comparisons of histogram-based techniques have been performed for shot boundary detection based on the difference between two histograms.^13,19–21 On the other hand, several local histogram comparison methods^22–24 have been proposed in which frames are divided into uniform, nonoverlapping regions. Histogram values of each region are then compared with those of the corresponding region of the successive frame.

Key frame extraction involves selecting one or multiple frames that will represent the content of the video. The techniques for key frame extraction can be classified into three categories: curve simplification, matrix-based, and clustering-based methods.

Curve simplification methods^25–27 are based on approximating a curve by line segments with a smaller number of vertices. A simplified curve is computed that approximates a trajectory curve representing a video sequence in a high-dimensional feature space, according to some predefined error criterion. The junctions between simplified curve segments are then chosen as key frames.

Another main approach to key frame extraction is matrix factorization.^28–30 The frames of a video sequence are represented as a feature-frame matrix. Key frames are then selected by applying a matrix factorization technique to this matrix.

Clustering-based techniques^8,9,31 are alternative methods for key frame extraction. After the extracted features are grouped into clusters, key frames are selected from these clusters.

Video content retrieval aims to provide access to video content based on a query image by applying content-based image retrieval principles. Features representing the visual content of the video frames and the query image are extracted. A similarity metric then measures how close the query image and the video frames are. Retrieval results are ranked according to the similarity score. Several general-purpose content-based image retrieval systems have been previously proposed. Some examples include SIMPLIcity,³² CIRES,³³ ALIPR,³⁴ FIRE,³⁵ AMORE (Advanced Multimedia Oriented Retrieval Engine),³⁶ and MARS.³⁷

System Overview

The system has two components: (1) temporal video segmentation and (2) content-based retrieval. The temporal video segmentation process includes partitioning a video sequence into a set of shots and extracting one or multiple key frames to represent each shot. In the retrieval process, video content can be searched, browsed, or retrieved based on a query image.

Temporal video segmentation

Temporal video segmentation is the first process for automatic video indexing, aiming to split visual data into coherent and smooth groups along the time axis. Figure 1 shows the overview of the temporal segmentation process. It includes two fundamental steps: (1) shot boundary detection and (2) key frame extraction. Shot boundary detection targets partitioning a video into shorter segments (shots). Key frame extraction provides a compact pictorial summarization and representation of a video sequence.

Fig. 1. Overview of the temporal segmentation process.

Shot boundary detection in our system is based on hue–saturation–value (HSV) color histogram comparison. The RGB color space is converted to HSV space, and the differences of HSV histograms between consecutive frames are computed using the equation

\begin{align*}D(k) = \sum_{i=1}^{N} \left| h_k(i) - h_{k-1}(i) \right| \tag{1}\end{align*}

where h_k is the color histogram of the kth frame of the video sequence with N bins.^9,10 Furthermore, color quantization to 256 colors (16 levels for the H channel, 4 levels for the S channel, and 4 levels for the V channel) is performed in order to reduce computational effort. Figure 2 shows the color histogram differences for part of a video. Peaks, where large discontinuities occur between histograms, are identified as shot boundaries.

Fig. 2. Hue–saturation–value (HSV) color histogram difference.¹⁰
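
The histogram difference of Equation 1 with the 16 x 4 x 4 quantization can be sketched in Python as follows. This is a minimal illustration and not the system's code: frames are assumed to be flat lists of 8-bit (r, g, b) tuples, the histograms are normalized by the pixel count, each bin contributes its absolute difference, and boundaries are picked with a fixed threshold. All of these choices, as well as the function names, are assumptions.

```python
import colorsys

H_LEVELS, S_LEVELS, V_LEVELS = 16, 4, 4
N_BINS = H_LEVELS * S_LEVELS * V_LEVELS  # 256 quantized colors


def hsv_bin(r, g, b):
    """Map an 8-bit RGB pixel to one of the 256 quantized HSV bins."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # min(...) keeps the value 1.0 inside the last level.
    hi = min(int(h * H_LEVELS), H_LEVELS - 1)
    si = min(int(s * S_LEVELS), S_LEVELS - 1)
    vi = min(int(v * V_LEVELS), V_LEVELS - 1)
    return (hi * S_LEVELS + si) * V_LEVELS + vi


def hsv_histogram(frame):
    """Normalized 256-bin histogram of a frame given as a list of (r, g, b)."""
    hist = [0.0] * N_BINS
    for r, g, b in frame:
        hist[hsv_bin(r, g, b)] += 1.0
    return [count / len(frame) for count in hist]


def histogram_differences(frames):
    """D(k) for k = 1 .. len(frames) - 1, summing absolute bin differences."""
    hists = [hsv_histogram(frame) for frame in frames]
    return [
        sum(abs(a - b) for a, b in zip(hists[k], hists[k - 1]))
        for k in range(1, len(hists))
    ]


def detect_boundaries(diffs, threshold):
    """Frame indices k whose difference D(k) exceeds the (assumed) threshold."""
    return [k + 1 for k, d in enumerate(diffs) if d > threshold]


# Hypothetical four-pixel frames: two red frames followed by two blue frames.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
diffs = histogram_differences([red, red, blue, blue])
print(diffs)                          # [0.0, 2.0, 0.0]
print(detect_boundaries(diffs, 1.0))  # [2]
```

With normalized histograms, D(k) ranges from 0 (identical color distributions) to 2 (no shared bins), which makes a single threshold easier to reason about across videos of different resolutions.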

After the video sequence is divided into shots, key frames are chosen from these shots such that each represents the content of the corresponding shot. Our system uses k-means clustering and principal component analysis for the key frame extraction process. Figure 3 shows a clustering plot of a video with k=3. The horizontal and vertical axes of the clustering plot are the projections of the HSV histogram vectors onto the first two principal components. The frames closest to the cluster centroids are selected as key frames.

Fig. 3. Clustering plot with k=3.⁹
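
The selection rule, in which the frame nearest each cluster centroid becomes a key frame, can be sketched with a small pure-Python k-means. This is an illustrative sketch under stated assumptions: it uses plain Lloyd iterations with centroids initialized at evenly spaced frames, it clusters the histogram vectors directly, and it omits the principal component projection used for the plot. The function names and the toy two-bin histograms are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; initial centroids are k evenly spaced frames (assumed)."""
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return centroids, assign(vectors, centroids)


def select_key_frames(vectors, k):
    """Frame index closest to each centroid, in temporal order."""
    centroids, labels = kmeans(vectors, k)
    key_frames = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            key_frames.append(
                min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
            )
    return sorted(key_frames)


# Hypothetical two-bin histograms for six frames forming two visual groups.
histograms = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],
]
key_frames = select_key_frames(histograms, k=2)
print(key_frames)  # [1, 3]
```

Choosing an actual frame rather than the centroid itself guarantees that every key frame is an image that exists in the video and can be shown to the user.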

Content-based retrieval

To search shots and images inside the video sequences, our system takes advantage of the visual features of the key frames. The query image and the key frames of the videos are represented by HSV color histogram feature vectors and are then compared using a similarity metric. Based on the similarity, the relevant shots or images are retrieved from the videos. The retrieval process is depicted in Figure 4.

Fig. 4. Overview of the retrieval process. HSV, hue–saturation–value.

To evaluate the similarity between the query image and the key frames, the Euclidean distance between the corresponding color histograms h^1 and h^2 is computed³⁸:

\begin{align*}L_2(h^1, h^2) = \sqrt{\sum\nolimits_x \sum\nolimits_y \sum\nolimits_z \left( h^1(x, y, z) - h^2(x, y, z) \right)^2} \tag{2}\end{align*}

where x, y, and z denote the color channels of hue, saturation, and value, respectively, for HSV space.
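
Ranking key frames by this distance can be sketched in Python as below. The sketch uses the standard Euclidean form (the square root of the summed squared bin differences) and treats each three-dimensional HSV histogram as a flat list, which gives the same sum. The key frame identifiers and the three-bin histograms are hypothetical.

```python
import math


def euclidean_distance(h1, h2):
    """Euclidean (L2) distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def rank_key_frames(query_hist, key_frame_hists):
    """Return (key_frame_id, distance) pairs, most similar first."""
    scored = [
        (kf_id, euclidean_distance(query_hist, hist))
        for kf_id, hist in key_frame_hists.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical histograms; real ones would have 256 bins.
query = [0.5, 0.5, 0.0]
stored = {
    "kf_a": [0.5, 0.5, 0.0],
    "kf_b": [0.0, 0.5, 0.5],
    "kf_c": [0.4, 0.6, 0.0],
}
ranked = rank_key_frames(query, stored)
print([kf_id for kf_id, _ in ranked])  # ['kf_a', 'kf_c', 'kf_b']
```

A smaller distance means a closer match, so the first entries of the ranked list are the shots presented to the user.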

Data Structure

We designed an XML (Extensible Markup Language) tree structure that stores the results of temporal video segmentation. The XML is used for further access, browsing, and retrieval of the video content. The information about the video sequence (its name, description, ID, and frames per second; its key frames and their histogram bins; the starting and end times of the shots; and the shot to which each key frame belongs) is organized using the nested hierarchy of XML elements. Figure 5 shows this information about the video described in the XML tree structure.

Fig. 5. Extensible Markup Language tree structure describing the information about the video. FPS, frames per second.

Using XML structural organization, users will be able to efficiently access specific parts of the video they are interested in, browse the video summary (key frames), and query the stored information to retrieve the video content.
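
A nested layout of this kind could look like the following Python sketch, which parses a small XML document and collects, for each key frame, its shot boundaries and histogram. The element and attribute names (video, shot, keyframe, histogram, start, end, time) are hypothetical; the actual schema is the one shown in Figure 5 and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: one <video> holding <shot> elements, each holding
# the <keyframe> elements that belong to it.
VIDEO_XML = """
<video id="v1" name="sample" fps="25">
  <description>Hypothetical example entry</description>
  <shot start="0.0" end="12.4">
    <keyframe id="kf1" time="3.2">
      <histogram>0.5 0.5 0.0</histogram>
    </keyframe>
  </shot>
  <shot start="12.4" end="30.0">
    <keyframe id="kf2" time="15.0">
      <histogram>0.0 0.25 0.75</histogram>
    </keyframe>
    <keyframe id="kf3" time="27.6">
      <histogram>0.1 0.2 0.7</histogram>
    </keyframe>
  </shot>
</video>
"""


def load_key_frames(xml_text):
    """Map each key frame id to its shot times, own time, and histogram bins."""
    root = ET.fromstring(xml_text.strip())
    key_frames = {}
    for shot in root.findall("shot"):
        for key_frame in shot.findall("keyframe"):
            bins = [float(x) for x in key_frame.findtext("histogram").split()]
            key_frames[key_frame.get("id")] = {
                "shot_start": float(shot.get("start")),
                "shot_end": float(shot.get("end")),
                "time": float(key_frame.get("time")),
                "histogram": bins,
            }
    return key_frames


key_frames = load_key_frames(VIDEO_XML)
# Clicking kf3 would start playback at the beginning of its shot.
print(key_frames["kf3"]["shot_start"])  # 12.4
```

Nesting key frames inside their shot means the shot a key frame belongs to never has to be stored separately; it is implied by the position of the element in the tree.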

Application

Our service is implemented for both Web and mobile environments. Figure 6 shows the overall view of the Web interface. The shot boundary locations of a video chosen from the video list on the right panel are automatically marked by red lines on the timer of the control bar. The key frames of the selected video are arranged on the key frame panel below.

Fig. 6. Overall view of the Web interface.

The user can browse the key frames of a video and select any particular key frame of interest. When the user clicks on a specific key frame, all key frames of the shot are shown in a separate window (Fig. 7). When the user clicks on one of them, the video can be viewed starting from the segment represented by the selected key frame. Therefore, users can go directly to the part that interests them without having to look through the entire video.

Fig. 7. Key frame selection and video browsing.

In the content-based retrieval module (Fig. 8), the user can search for and retrieve particular content from the video list based on a query image. Once a query image is specified, the HSV color histogram feature is extracted from the image and compared with the histogram values of the key frames that are stored in the XML file. According to the similarity score, relevant shots with their key frames are presented. Once the user identifies an interesting result, clicking on its key frame starts video playback from that point.

Fig. 8. Content-based retrieval module.

We also implemented our system for mobile devices. Figure 9 shows the mobile interface of the service deployed on a Windows Phone 7 emulator.

Fig. 9. Mobile interface on the Windows Phone 7 emulator.

A demo of our Web-based service is available online at http://h205629.dreamsparkhosting.com/ (username, Student1; password, UALR).

Discussion

Medical video libraries are dedicated to many health-related applications such as medical imaging, medical research and education, medical diagnostics, and training of medical professionals. In recent years, rapid expansion in the use of digital videos has led to a significant increase in the availability and the amount of video data.³⁹ In this article, we have presented an automated content-based management service for medical videos. The proposed service brings ease and effectiveness to the access of visual medical content. Our system is designed for Web and mobile platforms and has the potential to provide a robust framework for effective management and organization of medical video content.

Disclosure Statement

No competing financial interests exist.

References

1. Mendi, Cecen, Ermisoglu, Bayrak. Automated neurosurgical video segmentation and retrieval system. J Biomed Sci Eng, 2010; 3:618–624.

2. Mendi, Bayrak. Shot boundary detection and key frame extraction using salient region detection and structural similarity. 48th ACM Southeast Conference (ACM-SE '10), New York, ACM, 2010.

3. YouTube. www.youtube.com/. 2011 November.

4. Vimeo. http://vimeo.com/. 2011 November.

5. Fang. Video repository: A law library's approach. Newark, NJ: Rutgers-Newark Law Library for the Center of Law and Justice, 2011.

6. OrLive. www.orlive.com/. 2011 November.

7. Volkmer. Semantics of video shots for content-based retrieval [Ph.D. thesis]. Melbourne: RMIT University, 2007.

8. Cecen. Histogram based video segmentation and key frame extraction on SOM and DFT [Master's thesis]. Little Rock: University of Arkansas at Little Rock, 2009.

9. Mendi, Bayrak. Shot boundary detection and key frame extraction from neurosurgical video sequences. Imaging Sci J, 2011; 60:90–96.

10. Mendi, Bayrak. A Web-based medical video indexing environment. 4th IEEE International Conference on Semantic Computing (ICSC '10). New York: IEEE, 2010; 172–175.

11. Koprinska, Carrato. Temporal video segmentation: A survey. Signal Processing Image Commun, 2001; 16:477–500.

12. Porter. Video segmentation and indexing using motion estimation [Ph.D. thesis]. Bristol, UK: University of Bristol, 2004.

13. Boreczky, Rowe. Comparison of video shot boundary detection techniques. J Electron Imaging, 1996; 5:122–128.

14. Kikukawa, Kawafuchi. Development of an automatic summary editing system for the audio-visual resources. Trans Electron Inf, 1992; J75-A,2:204–212.

15. Zhang, Kankanhalli, Smoliar. Automatic partitioning of full-motion video. Multimedia Syst, 1993; 1:10–28.

16. Kasturi, Jain. Dynamic vision. In: Jain, Kasturi, eds. Computer Vision: Principles. Washington, DC: IEEE Computer Society Press, 1991; 469–480.

17. […] Y-F, Sheng, Chen, Zhang H-J. MSR-Asia at TREC-10 video track: Shot boundary detection task. The Tenth Text REtrieval Conference (TREC 2001). NIST Special Publication 500-250. Gaithersburg, MD: National Institute of Standards and Technology, 2001; 371–377.

18. Porter, Mirmehdi, Thomas. Detection and classification of shot transitions. British Machine Vision Conference (BMVC). Washington, DC: IEEE Computer Society, 2001; 73–82.

19. Gargi, Kasturi, Strayer. Performance characterization of video-shot-change detection methods. IEEE Trans Circuits Syst Video Technol, 2000; 10:1–13.

20. Lupatini, Saraceno, Leonardi. Scene break detection: A comparison. Proceedings 8th International Workshop on Research Issues in Data Engineering. Continuous-media databases and applications, New York, IEEE, 1998; 34–41.

21. Dailianas, Allen, England. Comparison of automatic video segmentation algorithms. SPIE Conf Ser, 1996; 2615:2–16.

22. Lee JC-M, Ip DM-C. A robust approach for camera break detection in color video sequence. Proceedings of IAPR Workshop on Machine Vision Application (MVA'94), Kawasaki, Japan, International Association for Pattern Recognition, 1994; 502–505.

23. Nagasaka, Tanaka. Automatic video indexing and full-video search for object appearances. In: Knuth, Wegner, eds. Visual Database Systems II. New York: Elsevier, 1995; 113–127.

24. Swanberg, Shu C-F, Jain. Knowledge-guided parsing in video databases. In: Niblack, ed. Proceedings of SPIE, volume 1908: Storage and retrieval for image and video databases. Bellingham, WA: SPIE, 1993; 13–24.

25. Lowe. Three-dimensional object recognition from single two dimensional images. Artif Intell, 1987; 31:355–395.

26. DeMenthon, Kobla, Doermann. Video summarization by curve simplification. Multimedia '98: Proceedings of the sixth ACM international conference on multimedia, New York, ACM, 1998; 211–218.

27. […], Zhang, Tretter. An overview of video abstraction techniques. HP Laboratory technical report HPL-2001-191. Palo Alto, CA: HP, 2001.

28. Gong, Liu. Video summarization using singular value decomposition. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2000. New York, IEEE, 2000; 174–180.

29. Cooper, Foote J. Summarizing video using non-negative similarity matrix factorization. 2002 IEEE Workshop on Multimedia Signal Processing, New York, IEEE, 2002; 25–28.

30. Abd-Almageed. Online, simultaneous shot boundary detection and key frame extraction for sports videos using rank tracing. IEEE International Conference on Image Processing (ICIP08), Washington, DC, IEEE Computer Society, 2008; 3200–3203.

31. Yueting, Yong, Huang, Mehrotra. Adaptive key frame extraction using supervised clustering. 1998 IEEE International Conference on Image Processing (ICIP-98), Washington, DC, IEEE Computer Society, 1998; 866–870.

32. Wang, Li, Wiederhold. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Trans Pattern Anal Machine Intell, 2001; 23:947–963.

33. Iqbal, Aggarwal. CIRES: A system for content-based retrieval in digital image libraries. 7th International Conference on Control Automation, Robotics and Vision, 2002 (ICARCV 2002), New York, IEEE, 2002; 205–210.

34. […], Wang. Real-time computerized annotation of pictures. IEEE Trans Pattern Anal Machine Intell, 2008; 30:985–1002.

35. Deselaers, Keysers, Ney. FIRE—Flexible image retrieval engine: Image CLEF2004 evaluation. In: Peters, Clough, Gonzalo, Jones GJF, Kluck, Magnini, eds. Lecture Notes in Computer Science 3491: Multilingual Information Access for Text, Speech and Images, 5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, Bath, UK, September 15–17, 2004, Revised Selected Papers. New York: Springer, 2005; 688–698.

36. Mukherjea, Hirata, Hara. Amore: A World Wide Web image retrieval engine. World Wide Web, 1999; 2:115–132.

37. Huang, Mehrotra, Ramchandran. Multimedia analysis and retrieval system—MARS project. In: Heidorn, Sandore, eds. Papers presented at the 1996 Clinic on Library Application of Data Processing, March 24–26, 1996, University of Illinois at Urbana-Champaign, 1997; 100–117.

38. Mendi, Bayrak. Performance evaluation of color image retrieval. 2010 39th IEEE Applied Imagery Pattern Recognition Workshop (AIPR). New York, IEEE, 2010; 1–5. 10.1109/AIPR.2010.5759680.

39. Mendi, Bayrak. Summarization of MPEG compressed video sequences. Adv Sci Lett, 2011; 4:3706–3708.