Abstract
This paper presents a rapid and efficient fish tracking method suitable for automatic underwater fish observation in the real world. Fish tracking allows biologists to observe fish and their ecological environment. A distributed real-time underwater video stream system has been developed in Taiwan for large-scale, long-term ecological observation. The system not only archives video data but also incorporates data analysis. However, in real-world underwater environments the water flow causes water plants to drift severely, which makes it difficult to discriminate moving fish from drifting water plants and complicates fish tracking in unconstrained water. To overcome this problem, we propose a bounding-surrounding boxes method that can be integrated with state-of-the-art tracking methods. The method requires fixed cameras: moving fish are classified as foreground objects and tracked, whereas drifting water plants are classified as background objects and removed from the tracked objects. This enables the efficient, rapid removal of irrelevant information (non-fish objects) from large-scale fish video data. Experimental results show that the proposed method achieves a classification success rate of approximately 90%.
1. Introduction
Ecological observation is imperative for marine scientists to study marine ecosystems. It is, however, difficult to sustain long-term, real-time observation, mainly because of the inaccessibility of the marine environment. A distributed real-time underwater video stream system has been developed for long-term observation of ecosystems on the southern tropical coast of Taiwan [6]. The video data is not only broadcast in real time via the Internet, but also archived to form a resource base for further analysis.
In recent years, considerable research has been conducted on video monitoring systems. Detecting and retrieving moving objects are necessary pre-processing steps for real-time monitoring applications [2, 9], and numerous algorithms for tracking objects on land have been proposed. Background subtraction is regarded as an approach that captures the complete shape of tracked objects [4]. In particular, the Gaussian Mixture Model (GMM) proposed by Stauffer and Grimson [8], an adaptive background subtraction method used widely in many areas, enables dynamic background models to be built. Despite these advances, uncontrolled conditions such as real-life underwater scenes remain a challenge [1, 7]. Fish detection and tracking are complicated by the variability of the underwater environment. Because the water flow makes water plants drift severely, they are detected as foreground objects, which makes it difficult to discriminate moving fish from drifting water plants. As a result, the accuracy of fish tracking with traditional methods is seriously degraded. In this paper, a rapid fish tracking method, bounding-surrounding boxes (BSB), is implemented to deal efficiently with the problem of drifting water plants: it classifies moving fish as foreground objects and drifting water plants as background objects, which enables the efficient, rapid removal of irrelevant information.
The rest of the paper is organized as follows. Section 2 briefly introduces the real world underwater video stream system. Section 3 describes the BSB method for fish tracking. Section 4 shows experimental results and the conclusion is drawn in Section 5.
2. Real World Underwater Video Stream System
In this paper, a distributed real-time underwater video stream system is developed to conduct long-term underwater fish observation. Figure 1 shows the architecture of the underwater video stream system. Four fixed CCTV cameras are set up underwater. The video signal captured from the CCTV cameras is converted into a Motion JPEG stream. However, the native Motion JPEG stream can reach 20 gigabytes per hour. With 12+ hours of usable daylight, this could lead to data on the order of 100 terabytes per year. To reduce this massive amount of data, we transcode the source stream into multiple encoded formats and bitrates. To find a setting that offers good quality at a low data volume, we compute the Peak Signal-to-Noise Ratio (PSNR) between the native Motion JPEG and each encoded format. Figure 2 shows the PSNR values for different encoded formats and bitrates. The MPEG-4 format with a 5 Mb bitrate (PSNR = 31.87) is adopted since it meets our requirements, allowing both real-time observation (low data volume) and fish tracking (good-quality data).

Figure 1. Architecture of the underwater video stream system.

Figure 2. The PSNR values with different encoded formats and bitrates.
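The PSNR comparison can be illustrated with a minimal sketch. It assumes 8-bit frames represented as nested lists of pixel values and uses the standard definition PSNR = 10·log10(MAX² / MSE); the function name `psnr` and the frame representation are illustrative assumptions, not part of the system described here.

```python
import math


def psnr(reference, encoded, max_value=255):
    """PSNR in dB between two equally sized frames (nested lists of pixel values)."""
    ref = [p for row in reference for p in row]
    enc = [p for row in encoded for p in row]
    if len(ref) != len(enc):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the native frame and the re-encoded frame
    mse = sum((a - b) ** 2 for a, b in zip(ref, enc)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the encoded frame is closer to the native Motion JPEG frame, so candidate formats and bitrates can be ranked by averaging this value over a set of frames.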
In addition, the interlace effect present in the source data might affect fish detection and tracking. Hence, a motion-adaptive deinterlacing method [3] is implemented to remove the interlace effect. First, the motion areas of the video data are detected. Then, intra-field deinterlacing is applied in motion areas and inter-field deinterlacing is applied in static areas. Figure 3 shows the workflow of the motion-adaptive deinterlacing method, which combines the advantages of both intra-field and inter-field deinterlacing. Figure 4(a) presents the original image with the interlace effect and Figure 4(b) shows the deinterlacing result.

Figure 3. The workflow of the motion-adaptive deinterlacing method.

Figure 4. (a) The original image with the interlace effect (b) the deinterlacing result.
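The idea of switching between intra-field and inter-field deinterlacing can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [3]: frames are grey-level nested lists, even rows are taken as the kept field, and motion is detected with a hypothetical per-pixel threshold on the difference from the previous frame.

```python
def deinterlace_motion_adaptive(prev_frame, frame, threshold=10):
    """Return a deinterlaced copy of `frame` (nested lists of grey levels).

    Assumptions for illustration: even rows form the kept field, odd rows come
    from the other field, and a pixel is "moving" when it differs from the
    previous frame by more than `threshold`.
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(1, height, 2):  # odd rows belong to the other field
        above = frame[y - 1]
        below = frame[y + 1] if y + 1 < height else frame[y - 1]
        for x in range(width):
            moving = abs(frame[y][x] - prev_frame[y][x]) > threshold
            if moving:
                # Intra-field: interpolate from neighbouring lines of the kept field
                out[y][x] = (above[x] + below[x]) // 2
            # Otherwise inter-field (weave): keep the pixel from the other field
    return out
```

Static areas keep their full vertical resolution, while moving areas avoid the comb-like artefacts caused by combining two fields captured at different times.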
Afterwards, the MPEG-4 stream is processed in two ways: one copy is converted into MPEG-4 video files and kept in local storage, and the other is transmitted directly into a multicasting pool via ADSL lines for real-time observation. Figure 5 shows the real-time underwater video stream observation via the Internet.

Figure 5. The real-time underwater video stream observation via the Internet.
3. Bounding-Surrounding Boxes Method for Fish Tracking
In this paper, we provide a Bounding-Surrounding Boxes (BSB) method that can be integrated with several tracking algorithms, such as particle filtering or probability hypothesis density filtering, to improve the tracking result. The aim of the BSB method is to efficiently separate the detected foreground objects into moving fish and drifting water plants. In this work, the BSB method is integrated with a tracking method based on the GMM.
3.1 Background Subtraction
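Fish tracking is implemented using the stored video data, and background subtraction is the first step. In this paper, the GMM [8] is utilized to build a background model and to update this model frame by frame. The model is updated and restructured according to the spatial variation of successive images. Each pixel is modelled by a mixture of G Gaussian distributions. The history of a pixel is defined as a time series {X_1, …, X_t}. Following [8], the probability function of the pixel value in frame t is:

$$P(X_t) = \sum_{i=1}^{G} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$$

where ω_{i,t}, μ_{i,t} and Σ_{i,t} are the weight, mean and covariance matrix of the i-th Gaussian at time t, and η is the Gaussian probability density function for an n-dimensional pixel value:

$$\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X_t-\mu)^{T}\,\Sigma^{-1}\,(X_t-\mu)\right)$$

where the covariance matrix is assumed to be of the form:

$$\Sigma_{i,t} = \sigma_{i,t}^{2}\, I$$

The weighted value ω_{i,t} is updated as:

$$\omega_{i,t} = (1-\alpha)\,\omega_{i,t-1} + \alpha\, M_{i,t}$$

where α is the learning rate and M_{i,t} is 1 for the distribution that matches the current pixel value and 0 otherwise. The parameters of the matched distribution are updated as:

$$\mu_{t} = (1-\rho)\,\mu_{t-1} + \rho\, X_t$$

$$\sigma_{t}^{2} = (1-\rho)\,\sigma_{t-1}^{2} + \rho\,(X_t-\mu_t)^{T}(X_t-\mu_t)$$

where ρ is the learning rate:

$$\rho = \alpha\, \eta(X_t \mid \mu_{t-1}, \sigma_{t-1})$$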
In the above formulas, α and ρ take values between zero and one. The previous estimates are multiplied by the weights (1 − α) and (1 − ρ), so their influence is reduced gradually. The background is thus updated by progressively discarding old pixel information and incorporating new pixel values. Figure 6(a) shows the current frame and Figure 6(b) shows the background model constructed using the GMM.

Figure 6. (a) The current frame (b) the background model.
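The per-pixel update with the learning rates α and ρ can be sketched for a single grey-level pixel as follows. This is a minimal illustration in the spirit of the adaptive mixture model of [8], not the implementation used in this work; the matching rule (within 2.5 standard deviations), the replacement rule for unmatched values, and the names `Gaussian`, `update_pixel_model`, `init_var` and `init_weight` are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_pixel_model(model, x, alpha=0.01, match_sigmas=2.5,
                       init_var=900.0, init_weight=0.05):
    """Update the mixture of one pixel in place; return the matched Gaussian or None."""
    matched = None
    for g in model:
        if abs(x - g.mean) <= match_sigmas * math.sqrt(g.var):
            matched = g
            break
    # Weight update: old weights decay by (1 - alpha); the matched one gains alpha
    for g in model:
        m = 1.0 if g is matched else 0.0
        g.weight = (1.0 - alpha) * g.weight + alpha * m
    if matched is not None:
        # Second learning rate rho, then mean and variance of the matched Gaussian
        rho = alpha * gaussian_pdf(x, matched.mean, matched.var)
        matched.mean = (1.0 - rho) * matched.mean + rho * x
        matched.var = (1.0 - rho) * matched.var + rho * (x - matched.mean) ** 2
    else:
        # Assumed rule: replace the least probable Gaussian (lowest weight / sigma)
        weakest = min(model, key=lambda g: g.weight / math.sqrt(g.var))
        weakest.weight, weakest.mean, weakest.var = init_weight, float(x), init_var
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total  # keep the weights summing to one
    return matched
```

Calling `update_pixel_model` once per frame for every pixel gives a background model that adapts gradually; a pixel value that matches none of the dominant Gaussians is a candidate foreground pixel.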
3.2 Foreground Segmentation and Tracking
After the background model is constructed, we subtract the pixel values of the background model from the current frame in each of the R, G and B channels to obtain the foreground objects. However, because the foreground objects move, background subtraction usually leaves residual shade in them, which results in broken and disordered shapes. To compensate for this, morphological operations, including erosion and dilation, are applied with a 3×3 cross-shaped structuring element. The foreground pixels are then segmented into regions by the connected components algorithm.
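The clean-up and segmentation step can be sketched on a small binary mask. The sketch assumes a mask stored as nested lists of 0/1 values, treats pixels outside the image as background, and uses 4-connectivity for the connected components; the order in which erosion and dilation are applied and the connectivity are not specified above, so both are assumptions of this example.

```python
from collections import deque

# Offsets of the 3x3 cross-shaped structuring element (centre plus 4 neighbours)
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _get(mask, y, x):
    return mask[y][x] if 0 <= y < len(mask) and 0 <= x < len(mask[0]) else 0


def erode(mask):
    return [[int(all(_get(mask, y + dy, x + dx) for dy, dx in CROSS))
             for x in range(len(mask[0]))] for y in range(len(mask))]


def dilate(mask):
    return [[int(any(_get(mask, y + dy, x + dx) for dy, dx in CROSS))
             for x in range(len(mask[0]))] for y in range(len(mask))]


def bounding_boxes(mask):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy, dx in CROSS[1:]:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

For example, `bounding_boxes(dilate(erode(mask)))` removes isolated noise pixels before the regions are extracted, so each remaining region yields one bounding box.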
Moving objects are tracked by comparing the colour information of the foreground images. In this paper, we apply a histogram correlation coefficient to measure the similarity between objects in two consecutive frames of the time sequence. The most familiar measure of dependence between two variables is Pearson's correlation [5]. It has the form:

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\,\sigma_Y}$$
where μX and σX are the mean and standard deviation of random variable X, μY and σY are the mean and standard deviation of random variable Y, and σXY is the covariance of X and Y. The correlation coefficient ρXY lies between −1 and 1. When ρXY = 1, X and Y have a perfect positive correlation; that is, Y increases linearly as X increases. When ρXY = −1, X and Y have a perfect negative correlation; that is, Y decreases linearly as X increases. When ρXY = 0, X and Y are uncorrelated; that is, there is no linear relation between X and Y. We therefore carry out the correlation coefficient analysis for the objects in two consecutive frames: the objects with the correlation coefficient closest to one and with the shortest distance between them are treated as the same object. Figure 7(a) shows the binary foreground image obtained by background subtraction with a morphology operation and Figure 7(b) shows the bounding boxes of the detected foreground objects.

Figure 7. (a) The binary foreground objects (b) the bounding boxes of the foreground objects.
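The matching of objects between two consecutive frames can be sketched as follows. The sketch assumes each object is described by a colour histogram and the centre of its bounding box; how the correlation and distance criteria are combined is not specified above, so the lexicographic rule (highest correlation first, then shortest distance) and the thresholds `min_corr` and `max_dist` are hypothetical choices for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        return 0.0  # undefined for a constant histogram; treated as uncorrelated here
    return cov / (std_x * std_y)


def match_object(prev_obj, candidates, min_corr=0.8, max_dist=50.0):
    """Return the candidate in the next frame that best matches prev_obj, or None."""
    best, best_key = None, None
    for cand in candidates:
        corr = pearson(prev_obj["hist"], cand["hist"])
        dist = math.dist(prev_obj["centre"], cand["centre"])
        if corr < min_corr or dist > max_dist:
            continue
        key = (-corr, dist)  # highest correlation first, then shortest distance
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best
```

Objects in the previous frame that find no match can be treated as having left the scene, and unmatched candidates as newly appeared objects.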
3.3 Bounding-Surrounding Boxes Method
Owing to the interference of severely drifting water plants, the real-world underwater environment is unconstrained, which makes it difficult to discriminate between moving fish and drifting water plants. Therefore, we propose a BSB method based on the observation that drifting water plants stay within a fixed region, whereas fish move freely beyond it. Because the CCTV cameras are fixed, the field of view is stationary, so our method can be applied. Each foreground object is circumscribed by its bounding box with width d1 and height h1. Let (c_x, c_y) be the centre point of the bounding box, so that its upper-left point is (c_x − 0.5×d1, c_y − 0.5×h1). Then, the surrounding box is set to T (T > 1) times the size of the bounding box with the same centre point. The bounding and surrounding boxes are illustrated in Figure 8.

Figure 8. Illustration of the bounding and surrounding boxes.
Let B and S be the bounding box and surrounding box of an object, respectively. The size and location of S are fixed in the image. The location of B is observed for a period of time τ. If the location of B always remains inside the range of S, the object is classified as a non-fish object (water plant) and is eliminated from the tracked objects. Conversely, if the location of B moves outside the range of S, the object is classified as a foreground object (fish) and continues to be tracked. Figure 9 shows the detection results after tracking for a period of time τ. In Figure 9(a), the yellow box represents the fixed surrounding box of the object, and the red box and red line represent the bounding box and the trajectory of the tracked object. The tracked object is classified as a fish because the location of B moves outside the range of S. The blue box and blue line in Figure 9(b) represent the bounding box and the trajectory of the tracked object. Because the location of B always remains inside the range of S, the object is classified as “non-fish”.

Figure 9. (a) The object (red box) is classified as fish (b) the object (blue box) is classified as non-fish (drifting water plant).
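The classification rule can be sketched as follows. The sketch makes three assumptions that are not fixed by the description above: T is interpreted as an area ratio with the aspect ratio preserved (so the width and height are scaled by √T), the surrounding box S is fixed at the first observation of the object, and "the location of B" is taken to be the centre point of B. The names `Box`, `surrounding_box`, `inside` and `classify` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    cx: float  # centre x
    cy: float  # centre y
    w: float   # width
    h: float   # height


def surrounding_box(bounding, T=4.0):
    """Box with the same centre whose area is T times that of the bounding box."""
    scale = math.sqrt(T)  # T = 4 gives twice the width and twice the height
    return Box(bounding.cx, bounding.cy, bounding.w * scale, bounding.h * scale)


def inside(bounding, surrounding):
    """True if the centre of the bounding box lies within the surrounding box."""
    return (abs(bounding.cx - surrounding.cx) <= surrounding.w / 2
            and abs(bounding.cy - surrounding.cy) <= surrounding.h / 2)


def classify(track, T=4.0):
    """Classify one object from its bounding boxes observed over the period tau."""
    s = surrounding_box(track[0], T)  # S stays fixed for the whole period
    return "non-fish" if all(inside(b, s) for b in track) else "fish"
```

With a frame rate of 20 fps and τ = 1 second, `track` would hold 20 bounding boxes; an object that merely sways around its starting position stays inside S and is discarded, whereas an object that travels across the image leaves S and keeps being tracked.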
4. Experimental Results
To evaluate the effectiveness of our proposed system, ten different underwater videos with multiple complex scenes were tested. Each video is ten minutes long with a frame rate of 20 fps and a resolution of 640×480. On a PC with an Intel Core i5-3570 CPU at 3.4 GHz, our method processes nine to ten frames per second. The aim of our experiment is to evaluate the accuracy of classifying moving fish as foreground objects and drifting water plants as background objects.
In our experiment, the size (area) of the surrounding box is set to four times (T = 4) that of the bounding box, i.e., d2 = 2×d1 and h2 = 2×h1, where d2 and h2 are the width and height of the surrounding box. The observation time τ is set to 1 second (20 frames) and 2 seconds (40 frames). We compute the Classification Success Rate (CSR), which is defined below:

$$\mathrm{CSR} = \frac{FF + PG}{FN + PN} \times 100\%$$
where
FN: the number of fish among the tracked objects.
PN: the number of non-fish objects among the tracked objects.
FF: the number of fish correctly classified as foreground objects.
PG: the number of non-fish objects correctly classified as background objects.
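As a small worked example, the CSR can be computed from the four counts. The sketch assumes that the CSR is the share of correctly classified objects among all tracked objects, (FF + PG) / (FN + PN), expressed as a percentage; the counts used below are made up for illustration and are not results from our experiments.

```python
def classification_success_rate(ff, pg, fn, pn):
    """CSR in percent: correctly classified objects over all tracked objects."""
    total = fn + pn
    if total == 0:
        raise ValueError("no tracked objects")
    return 100.0 * (ff + pg) / total


# Made-up counts for illustration only (not results from the paper):
# 50 fish of which 45 are classified correctly, 40 non-fish of which 36 are.
print(classification_success_rate(ff=45, pg=36, fn=50, pn=40))  # 90.0
```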
The CSR represents the accuracy of classifying moving fish as foreground objects and drifting water plants as background objects. Table 1 shows the experimental results for the different observation times. The CSR reaches approximately 90% in our cases.
Table 1. The CSR with different tracking times.
Figure 10 shows the fish tracking results obtained from another video using the bounding-surrounding boxes method. Figure 10(a) shows the binary foreground image obtained by background subtraction with a morphology operation and Figure 10(b) shows the background model constructed using the GMM. The red box and red line in Figure 10(c) represent the bounding box and the trajectory of the tracked object, which is classified as a fish. The blue box and blue line in Figure 10(d) represent the bounding box and the trajectory of the tracked object, which is classified as non-fish.

Figure 10. (a) The binary foreground objects (b) the background model (c) the object (red box) is classified as a fish (d) the object (blue box) is classified as non-fish (drifting water plant).
5. Conclusion
In this paper, a distributed real-time underwater video stream system is developed for fish observation and analysis in the real world. In particular, a bounding-surrounding boxes method, which can be integrated with several state-of-the-art tracking algorithms for fish tracking in complicated underwater scenes, is proposed. The method requires fixed cameras so that the detected foreground objects can be efficiently separated into moving fish and drifting water plants. That is, the moving fish are classified as foreground objects and continue to be tracked, while the drifting water plants are identified as background objects and are removed from the tracked objects. Measured by the classification success rate (CSR), our method achieves an accuracy of approximately 90%.
6. Acknowledgments
The research was funded by the Taiwan National Science Council (grant NSC 100-2933-I-492-001) and the European Commission (FP7 grant 257024) and undertaken in the Fish4Knowledge project (www.fish4knowledge.eu). We thank the Third Nuclear Power Plant of Taiwan Power Company and the National Museum of Marine Biology & Aquarium, Taiwan, for logistical support. We also thank the Ecogrid team at the National Centre for High-Performance Computing, Taiwan, for supplying the fish video data.
