A hybrid approach to generate visually seamless aerial mosaicks from unmanned aerial vehicles

Abstract

Understanding the terrain or condition of large areas from the air has gained prominence because aerial images provide clear coverage of the area under study. An individual image covers only a portion of the area, so the images must be mosaicked, or stitched, to understand the whole. Image mosaicking provides a "big picture" by joining the images taken during a flight. In this paper we propose a method that generates a seamless aerial mosaick using only the images captured by the UAV as input. The method identifies candidate images from those captured periodically by the UAV during its flight and stitches them together. Various feature descriptors and feature matching techniques are evaluated for integration into the mosaicking system. The proposed work is a hybrid approach that uses the Scale Invariant Feature Transform (SIFT) for feature extraction and the Fast Library for Approximate Nearest Neighbors (FLANN) for matching the key features. RANdom SAmple Consensus (RANSAC) is used to remove features that are redundant or act as outliers, providing candidates for homography estimation. This is followed by image stitching with Multi-Band Blending to produce a visually seamless mosaick. The quality of the results was evaluated using the Universal Image Quality Index (QIM), with an average score of 0.9998 against a maximum of 1.

Keywords

Mosaicking UAV SIFT FLANN RANSAC Homography Multi-band Blending

1 Introduction

Images mean a lot to us, and the saying "A picture speaks a thousand words" aptly captures how well we relate to them. This is why we use images to convey ideas, whether in meetings and discussions or when making ideas understandable to a general audience. Images present complex information simply, helping people understand concepts better and more easily. This is one reason why images are so widely used in complex fields such as medicine, astronomy, marketing and aerial imaging.

The proposed work focuses on the use of aerial images, which is gaining prominence in the field of image processing. Aerial imaging has come a long way since its initial stage, when images were captured from balloons. Over time it has become more accurate and cheaper, and it now covers larger areas.

With the emergence of Industry 4.0 [37], we have been exposed to technology and materials that were once beyond the reach of the common man. Aerial photography, surveillance and mapping using Unmanned Aerial Vehicles, commonly known as UAVs, is one such area. The use of UAVs, once mainly confined to the military domain, is now extending increasingly to the civilian domain for environmental monitoring, surveillance and similar tasks.

Mosaicking of images has always been an area of interest, mainly because merging multiple images reveals the big picture. Just as each piece of a puzzle holds only a little information and the completed puzzle reveals the whole, mosaicking provides an output with details that could not be obtained from a single image.

The availability of low-cost, high-grade hardware has led to the development of low-cost UAVs that offer the same services as high-end UAVs. With innovative developments in camera technology, affordable off-the-shelf cameras now provide the features of high-end cameras. These low-cost UAVs carry multiple sensors as payload, such as a gyroscope, Inertial Measurement Unit (IMU), Global Positioning System (GPS) receiver, accelerometer and cameras, but have limited flight capabilities. With advances in control engineering, UAVs have increased stability, which prevents distortions due to internal vibrations and external impacts from wind and turbulence during flight.

Here we evaluate various feature descriptors and matching techniques to come up with a hybrid strategy for mosaicking images captured by UAVs. The proposed method does not use any navigational sensor data for the mosaicking operation and only uses the images shared by the UAV for mosaicking.

The paper is organized as follows: Section 2 gives an overview of existing work in this field; Section 3 describes the proposed work in detail; Section 4 presents the results and analysis; and Section 5 concludes the paper.

2 Related works

Image mosaicking is a niche area with a firm presence in various domains. The mosaicking process has improved significantly with the introduction of feature descriptors such as Scale Invariant Feature Transform (SIFT) [1], Speeded Up Robust Features (SURF) [2] and Oriented FAST and Rotated BRIEF (ORB) [3]. These feature descriptors have changed the way traditional mosaicking operations are performed: Ground Control Points (GCPs) are no longer needed because of the robust features tracked by these descriptors. An extensive insight into the various strategies employed in image registration can be found in [4, 6]; the authors of [6] further stress that feature-based methods are apt for remote sensing applications. The works in [5–10] show how image mosaicking is carried out. It is evident from these works that feature-based methods are favoured over other approaches in the mosaicking process.

The use of on-board sensors and preprocessing of the images plays an important role in mosaicking aerial images from UAVs. The works in [11, 17–20] and [22] present methods that use only images obtained from UAVs as input to the mosaicking operation. Various strategies that use images obtained from UAVs along with the UAV metadata are described in [12, 17]. Saeed Yahyanejad et al. [15] focus on ortho-rectified mosaicks through a hybrid approach that uses the camera position and orientation along with the images. Debabrata Ghosh et al. [24] present the mosaicking of low-resolution images without any metadata, followed by conversion of the resultant mosaick to high resolution. Jinyan Tian et al. [23] aim to remove seams in mosaicks generated from UAVs using the Wallis dodging and Gaussian distance weight enhancement method. Saeed Yahyanejad and Bernhard Rinner [21] describe the fusion of images obtained from heterogeneous aerial sensors, focusing on inter-spectral image registration in real time. Saeed Yahyanejad et al. [25] explain how to correct errors that accumulate over time during the mosaicking process, especially in loop-independent traversal of the UAV, thereby improving the ortho-rectification [16].

The proposed method focuses on evaluating various feature descriptors and matching techniques to come up with a hybrid strategy for mosaicking images obtained from UAVs. This method differs from other methods in the use of an indigenously made UAV that provides images without any distortions. The proposed method also differs in the image capture rate during flight and makes use of a device with less computation power for the mosaicking operation.

3 Proposed method

The aim of the proposed method is to generate a visually seamless mosaick that is appealing to the eye and does not make use of any navigational sensor data for mosaicking. The resultant mosaick can be used for understanding the area covered by the UAV during its flight.

This method does not provide an ortho-mosaick and is not intended to be used for any precision calculations. The assumptions considered in this method are:

The UAV flies at constant altitude in "stable mode"

The images are captured in nadir view

There is no camera lens distortion

The proposed method generates the seamless mosaick by subjecting the images to the Scale Invariant Feature Transform (SIFT) [1] for feature extraction, the Fast Library for Approximate Nearest Neighbors (FLANN) [26] for fast feature matching and RANdom SAmple Consensus (RANSAC) [27] for the removal of features that are redundant or act as outliers among the extracted features. Homography [24] estimation using the resultant points from RANSAC is followed by image blending using Multi-Band Blending [8, 28], which generates a seamless mosaick. Figure 1 depicts the workflow of the proposed method.

Fig.1

Workflow of the proposed method.

3.1 Feature extraction

SIFT has been chosen as its goal is "Extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene" [1]. Figure 2 shows the keypoints extracted from the image and Fig. 3 provides a detailed representation which shows the orientation of the keypoints.

Fig.2

Features extracted using SIFT.

Fig.3

Detailed view of Features extracted using SIFT.

Stable keypoints are identified by first convolving the candidate image with Gaussian filters at different scales, separated by a constant multiplicative factor k, and then taking the difference between adjacent convolved images. The scale-space extrema of the Difference of Gaussians (DoG) are considered as the candidate stable keypoints [1]. The DoG is computed as follows:

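$\begin{aligned} D(x, y, \sigma) &= (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) \\ &= L(x, y, k\sigma) - L(x, y, \sigma) \end{aligned}$ (1)

where I is the candidate image, G is the Gaussian filter, σ is the standard deviation, k is the multiplicative factor, * denotes convolution and L(x, y, σ) = G(x, y, σ) * I(x, y) is the image smoothed at scale σ. SIFT was chosen over SURF and ORB because it detected a higher number of salient features. Table 1 shows the number of features detected by SIFT, SURF and ORB on each dataset. SIFT captures the highest number of salient features in every case, thereby providing more candidates for matching and leading to a better mosaick. Note that the ORB counts saturate at 500, which matches the default feature limit of the OpenCV ORB implementation, so the ORB figures likely reflect this cap rather than the detector's full output.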
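To make Equation 1 concrete, the following is a minimal pure-Python sketch of the DoG computation on a small greyscale image stored as a list of rows. It is an illustration only and not the implementation used in this work: the 3σ kernel truncation, the clamped border handling and the helper names are assumptions, and a practical system would rely on an optimised library instead.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel, truncated at about 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), computed separably."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def difference_of_gaussians(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma), as in Eq. (1)."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(row_c, row_f)]
            for row_c, row_f in zip(coarse, fine)]
```

On a uniform image the two smoothed versions coincide and D is zero everywhere, while an isolated bright pixel produces a strong negative response at its centre, which is the kind of extremum the keypoint detector looks for.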

Table 1

Comparative Analysis based on number of features detected

Dataset	SIFT (features detected)	SURF (features detected)	ORB (features detected)
1	552	455	495
2	655	537	500
3	855	680	500
4	1035	817	500
5	1217	819	500
6	1142	804	500
7	1071	813	500
8	1146	774	500
9	1026	699	500
10	897	634	500
11	859	600	500

3.2 Feature matching

The Fast Library for Approximate Nearest Neighbors (FLANN) is used to speed up the feature matching process. Depending on the input dataset, either a hierarchical k-means tree or randomized k-d trees [29, 30] are chosen automatically. The k-means tree construction time is reduced by limiting the number of iterations of the k-means clustering [26]. This adaptive behavior is highly advantageous in terms of processing speed. Figure 4 shows the keypoint matching done by FLANN.

Fig.4

Keypoint matching using FLANN.

FLANN has been chosen for feature matching because it matches points faster. Compared with the Brute Force matching approach [36], FLANN takes less time to match the same features and is therefore the better choice. Table 2 presents a comparative analysis of the Brute Force and FLANN approaches based on the number of points matched and the time consumed for matching. Figure 5 shows a graphical representation of the time consumed by the two techniques for each pair of images. Table 2 and Fig. 5 show that FLANN is faster in every stitch iteration while returning the same number of matches, which makes it an apt choice for feature matching in the proposed work.

Fig.5

Comparison of time taken for matching the features using Brute Force and FLANN approaches.

Table 2

Comparative analysis based on Feature matching and time consumed

Stitch Iteration	Image 1	Image 2	Matches (Brute Force)	Matching time, Brute Force (ms)	Matches (FLANN)	Matching time, FLANN (ms)
1	482	594	215	44.00	215	31.00
2	653	789	260	80.00	260	47.00
3	880	981	332	116.00	332	94.00
4	1072	1101	417	156.00	417	125.00
5	1242	994	381	172.00	381	111.00
6	1359	1000	416	200.00	416	140.00
7	1553	1036	437	240.00	437	187.00
8	1755	922	418	248.00	418	188.00
9	1756	799	393	216.00	393	172.00
10	1758	716	362	252.00	362	183.00
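For reference, the exhaustive search that FLANN approximates can be sketched in a few lines of pure Python. The sketch below compares every query descriptor with every train descriptor and keeps a match only if it passes the ratio test described in [1]. The use of the ratio test, the threshold of 0.75 and the function names are assumptions made for illustration; this work does not state which filtering rule was applied.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_match(query, train, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test.

    The ratio value of 0.75 is an illustrative assumption.
    """
    matches = []
    for qi, q in enumerate(query):
        # Exhaustive search: distance from q to every train descriptor.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (best, best_index), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_index))
    return matches
```

The cost of this search grows with the product of the two descriptor counts, which is consistent with the Brute Force timings in Table 2 rising as the images contribute more features. An approximate index such as FLANN trades the exhaustive scan for a tree lookup.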

3.3 Outlier removal and homography estimation

RANSAC is an approach for estimating model parameters that copes with a large proportion of outliers in the input data [31]. A perspective transform, or homography, operates on homogeneous coordinates, in which a point (x, y) of the inhomogeneous coordinate system is represented as (x, y, 1). The transform is given by Equations 2 and 3.

$\begin{pmatrix} x_{1}' \\ x_{2}' \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ 1 \end{pmatrix}$ (2)

$x' = Hx$ (3)
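Equations 2 and 3 hold up to a scale factor: the product Hx generally has a third component other than 1, and the image coordinates are recovered by dividing by it. A minimal sketch of this step follows; the function name and the list-of-rows matrix layout are assumptions for illustration.

```python
def apply_homography(h, x1, x2):
    """Map the point (x1, x2) through the 3x3 matrix h (a list of three rows).

    The product is computed in homogeneous coordinates and then divided by
    the third component to return ordinary image coordinates.
    """
    u = h[0][0] * x1 + h[0][1] * x2 + h[0][2]
    v = h[1][0] * x1 + h[1][1] * x2 + h[1][2]
    w = h[2][0] * x1 + h[2][1] * x2 + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w
```

For example, the matrix with rows (1, 0, 5), (0, 1, -3), (0, 0, 1) is a pure translation and maps (2, 2) to (7, -1).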

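The homography matrix has 8 degrees of freedom (DoF), since it is defined only up to scale. Each point correspondence provides two constraints, so four point correspondences, no three of them collinear, are enough to solve for the homography directly [32].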
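Because each correspondence contributes two linear equations, four correspondences determine the eight unknowns once h33 is fixed to 1, and this four-point solve is the minimal sample that RANSAC repeats on random subsets. The sketch below combines the two ideas in pure Python. It is a simplified illustration, not the implementation used in this work: the iteration count, the 1-pixel inlier threshold, the fixed seed and all function names are assumptions.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Returns None when the system is singular or nearly so.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography_from_four(pairs):
    """Minimal solver: four ((x, y), (x', y')) pairs, with h33 fixed to 1."""
    a, b = [], []
    for (x, y), (xp, yp) in pairs:
        a.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        a.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(a, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def project(h, point):
    """Apply h to a point; returns None if the point maps to infinity."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        return None
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def ransac_homography(pairs, iterations=200, threshold=1.0, seed=0):
    """Return (best_h, inlier_indices) over random four-point samples."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        h = homography_from_four(rng.sample(pairs, 4))
        if h is None:
            continue  # degenerate sample, e.g. collinear points
        inliers = []
        for i, (src, dst) in enumerate(pairs):
            p = project(h, src)
            if p is None:
                continue
            error_sq = (p[0] - dst[0]) ** 2 + (p[1] - dst[1]) ** 2
            if error_sq <= threshold ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

A common refinement, not shown here, is to re-estimate the homography from all inliers of the best sample by least squares rather than keeping the four-point solution.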

Transforming each point in the input to the output leaves holes in the resultant image. This can be avoided by using inverse homography instead of forward homography: each output pixel is mapped back into the input image, and the intensity values at the resulting non-integer pixel positions in the input are obtained by bilinear interpolation [10].
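The inverse mapping described above can be sketched as follows for a single-channel image stored as a list of rows. The function names, the fill value for pixels that fall outside the input and the convention that `h_inv` maps output coordinates to input coordinates are assumptions for illustration.

```python
def bilinear_sample(image, x, y):
    """Intensity at the real-valued position (x, y), or None if outside."""
    height, width = len(image), len(image[0])
    if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
        return None
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

def inverse_warp(image, h_inv, out_width, out_height, fill=0.0):
    """Fill every output pixel by mapping it back into the input image."""
    out = []
    for v in range(out_height):
        row = []
        for u in range(out_width):
            w = h_inv[2][0] * u + h_inv[2][1] * v + h_inv[2][2]
            if w == 0:
                row.append(fill)
                continue
            x = (h_inv[0][0] * u + h_inv[0][1] * v + h_inv[0][2]) / w
            y = (h_inv[1][0] * u + h_inv[1][1] * v + h_inv[1][2]) / w
            value = bilinear_sample(image, x, y)
            row.append(fill if value is None else value)
        out.append(row)
    return out
```

Since the loop visits every output pixel exactly once, no output position is left unassigned, which is why the holes produced by forward mapping do not occur.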

3.4 Image blending

M. Brown and D. G. Lowe [9] note that even though each pixel along a ray should have the same intensity in all the images it intersects, in reality it differs due to factors such as exposure effects, exposure time and aperture changes. This is why a good blending strategy is required [9].

The Multi-Band Blending strategy was developed by Burt and Adelson [28]. It blends an image not in a single band but across multiple frequency bands. Low frequencies are blended over large spatial ranges and high frequencies over small spatial ranges using a Laplacian pyramid. The low-frequency information is blended using a linear weighted sum, and the high-frequency information is selected from the image with the maximum weight. Each warped image is converted into a Laplacian pyramid, with each level smoothed by a factor of 2. The mask images associated with these source images are converted to low-pass Gaussian pyramids. The masks become the weights for a feathering blend at each level. All the information from the images is fused using the weight function, where the weights vary from 0 to 1. The final image is constructed by interpolating and summing all of the pyramid levels. Figure 6 shows the result of multi-band blending.
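The principle can be illustrated with a deliberately reduced version: two frequency bands instead of a full pyramid, applied to one-dimensional scanlines instead of images. The low band of each scanline is blended with a smoothed weight mask (a linear weighted sum), while the high band is taken from whichever scanline has the larger weight. This is a simplified sketch under those assumptions, not the Laplacian pyramid implementation used in this work, and the box filter and function names are hypothetical choices.

```python
def box_smooth(signal, radius):
    """Low-pass filter: moving average with clamped borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def two_band_blend(a, b, weight_a, radius=2):
    """Blend two aligned scanlines a and b of equal length.

    weight_a[i] in [0, 1] is the weight of a at sample i; b gets the rest.
    """
    low_a, low_b = box_smooth(a, radius), box_smooth(b, radius)
    high_a = [x - low for x, low in zip(a, low_a)]
    high_b = [x - low for x, low in zip(b, low_b)]
    # The smoothed mask spreads the low-band transition over a wide range.
    soft = box_smooth(weight_a, radius)
    out = []
    for i in range(len(a)):
        low = soft[i] * low_a[i] + (1 - soft[i]) * low_b[i]
        # High band: take the detail from the scanline with the larger weight.
        high = high_a[i] if weight_a[i] >= 0.5 else high_b[i]
        out.append(low + high)
    return out
```

Blending slow intensity changes over a wide region hides exposure differences between the two inputs, while switching fine detail abruptly avoids the ghosting that a wide blend of high frequencies would cause.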

Fig.6

Result of Multi-band Blending.

This method creates a mosaicked image I_N, which is the result of combining the N images captured by the UAV during its flight. It can be represented mathematically as I_N = MOSAICK(I_N, MOSAICK(I_N-1, . . .)). The MOSAICK function takes two operands, of which the second is the transformed image that is aligned to the first, which acts as the reference image. The result of the previous mosaicking operation acts as the first input in the next iteration. The operation is repeated iteratively, forming a final, visually seamless result that gives a complete overview of the area covered by the UAV. As mentioned earlier, Fig. 1 illustrates the workflow of the proposed work.

4 Results and analysis

In this section, we evaluate the performance of the proposed method qualitatively and quantitatively. The evaluation shows that the proposed work is suitable for producing seamless mosaicks. The experiment was split into two phases:

Phase 1: Data Acquisition

Phase 2: Image Mosaicking

In Phase 1, a UAV named "Drishti" was used to capture images [33]. The dataset comprises a sequence of 61 images taken over IIT Kanpur [34]. Details of the UAV flight and camera are given in Tables 3 and 4.

Table 3
UAV flight specification

UAV Name	Flight Duration (minutes)	Altitude Maintained (m)	Speed of UAV (m/s)
Drishti	10	150	5

Table 4

Camera specification

Camera Model	Frame Capture Interval (seconds)	Overlap between adjacent images (%)	Image Resolution (pixels)	Exposure Time (seconds)
Canon S110	2	>65	4000 x 3000	1/600

Phase 2 is done offline. Due to hardware restrictions, the images were down-sampled to 500 x 375 pixels. The images are stored in PNG format, which uses lossless compression, so saving the down-sampled images introduces no compression artifacts. The first 11 images of the dataset were used for the evaluation. The area covered by the UAV Drishti during its flight has been plotted using the Google Maps app and is shown in Fig. 7.

Fig.7

Area Covered by Drishti - A Satellite View using Google Maps.

The proposed method has been implemented using Python and OpenCV on a system with a 2.16 GHz Intel Pentium quad-core processor and 4 GB of DDR3 RAM. For the image sequence obtained from the UAV, a visually seamless mosaick is obtained.

Figure 8 shows the result obtained by applying the proposed method to the first 11 images of the dataset captured by Drishti.

Fig.8

Resultant Mosaick from the Proposed Work.

The Universal Image Quality Index (QIM) is used to evaluate the quality of the resultant mosaick. QIM is modeled as a combination of three factors, namely loss of correlation, luminance distortion and contrast distortion, making it more reliable than traditional metrics such as the Mean Square Error (MSE) [35]. It can be represented mathematically as:

$Q = \frac{4 \sigma_{xy} \bar{x} \bar{y}}{(\sigma_{x}^{2} + \sigma_{y}^{2}) [(\bar{x})^{2} + (\bar{y})^{2}]}$ (4)

where $\bar{x}$ and $\bar{y}$ are the mean intensities of the reference and test images, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are their variances and $\sigma_{xy}$ is their covariance. The range of QIM is [-1, +1]; the best value, +1, is attained only when the test image is identical to the reference image [35]. To evaluate the quality of the mosaick, each of the 11 image frames, numbered sequentially from 1 to 11, was taken individually and split into two sub-frames with an overlapping region greater than 65 percent. The two sub-frames were stitched together and the quality of the resulting mosaick was tested against the original frame, which served as the reference image. This was done because no reference image exists for the large mosaick generated.
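Equation 4 can be evaluated directly from its definition. The sketch below computes Q over two flattened pixel sequences of equal length using sample variances and covariance. Treating the whole image as a single window is a simplifying assumption made here for brevity; the index in [35] is computed over local sliding windows and averaged. The function name is hypothetical.

```python
def quality_index(x, y):
    """Universal image quality index Q of Eq. (4) for two pixel sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    denominator = (var_x + var_y) * (mean_x ** 2 + mean_y ** 2)
    if denominator == 0:
        raise ValueError("Q is undefined for these inputs")
    return 4 * cov_xy * mean_x * mean_y / denominator
```

As a worked check, comparing the sequence 1, 2, 3, 4 with itself gives Q = 1, while comparing it with 2, 4, 6, 8 gives Q = 0.64: the correlation is perfect, but the luminance and contrast terms each contribute a factor of 0.8.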

Figure 9 shows one of the images used as a test sample to carry out the evaluation and Fig. 10 shows the corresponding resultant mosaick using the proposed method. From Fig. 10, it is evident that the visual quality of the resultant mosaick is seamless with clear depiction of the area. Table 5 depicts the Quality index values obtained against each image in the sequence. Average QIM result seen from Table 5 is 0.9998 which implies that a very high quality mosaick was generated by the proposed method.

Fig.9

The test image used for evaluation.

Fig.10

Resultant Mosaick obtained by applying the proposed method.

Table 5

Quantitative analysis based on QIM

Image Sequence	QIM Value
Image 1	0.9995
Image 2	1
Image 3	1
Image 4	0.9998
Image 5	0.9996
Image 6	0.9999
Image 7	0.9998
Image 8	0.9999
Image 9	0.9995
Image 10	0.9998
Image 11	1
Average QIM Value	0.9998

5 Conclusion

In this paper, a hybrid method that combines several techniques to produce mosaicks of aerial images obtained from UAVs is proposed. The proposed method makes use of an indigenously made UAV which captures images at regular intervals during its flight. Various feature descriptors and matching techniques are evaluated to choose the ones best suited to the system. The proposed method uses SIFT for feature extraction, FLANN for feature matching, RANSAC for outlier removal and better estimation of the homography, and finally Multi-Band Blending for image blending. The implementation runs on a computer with modest processing power and produces seamless mosaick output for the dataset obtained from the UAV. Qualitative evaluation, together with quantitative evaluation using the Universal Image Quality Index (QIM), indicates that the proposed method is able to produce seamless mosaicks.

Acknowledgment

The authors would like to express their gratitude to Mr. Suhas and Aarav Unmanned Systems Pvt Ltd for providing the dataset containing UAV images for this work. The authors would like to thank Dr. S N Omkar, Chief Research Scientist, Department of Aerospace Engineering, IISc Bangalore, Dr. Adrian Rosebrock of PyImageSearch, Dr. Tessy Mathew (Associate Professor and Head) and all the faculty members of the Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Trivandrum for all the support rendered towards this work.

References

1. D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60(2) (2004), 91–110.
2. Bay, et al., Speeded-up robust features (SURF), Computer Vision and Image Understanding 110(3) (2008), 346–359.
3. Rublee, et al., ORB: An efficient alternative to SIFT or SURF, 2011 International Conference on Computer Vision, IEEE, 2011, pp. 2564–2571.
4. L.G. Brown, A survey of image registration techniques, ACM Computing Surveys (CSUR) 24(4) (1992), 325–376.
5. Szeliski and H.-Y. Shum, Creating full view panoramic image mosaics and environment maps, Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., 1997.
6. Zitova and Flusser, Image registration methods: A survey, Image and Vision Computing 21(11) (2003), 977–1000.
7. Szeliski, Image alignment and stitching: A tutorial, Foundations and Trends in Computer Graphics and Vision 2(1) (2006), 1–104.
8. Adel, Elmogy and Elbakry, Image stitching based on feature extraction techniques: A survey, International Journal of Computer Applications Volume (2014), 0975–8887.
9. Brown and D.G. Lowe, Recognising panoramas, ICCV 3 (2003), 1218.
10. Bhaskaranand and Bhat, Image Registration and Mosaicking. University of California.
11. Majumdar, Vinay and Selvi, Registration and mosaicing for images obtained from UAV, International Conference on Signal Processing and Communications, 2004, pp. 198–203.
12. Zhu, et al., An efficient method for georeferenced video mosaicing for environmental monitoring, Machine Vision and Applications 16(4) (2005), 203–216.
13. Suzuki, Amano and Hashizume, Vision based localization of a small UAV for generating a large mosaic image, SICE Annual Conference 2010, Proceedings of IEEE, 2010, pp. 2960–2964.
14. Xing and Huang, An improved mosaic method based on SIFT algorithm for UAV sequence images, International Conference on Computer Design and Applications (ICCDA), IEEE, Vol. 1, 2010, p. 414.
15. Yahyanejad, et al., Incremental mosaicking of images from autonomous, small-scale UAVs, Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, IEEE, 2010, pp. 329–336.
16. Orthorectification meaning, https://trac.osgeo.org/ossim/wiki/orthorectification, Accessed 14 December 2015.
17. Botterill, Mills and Green, Realtime aerial image mosaicing, Image and Vision Computing New Zealand (IVCNZ), 25th International Conference of IEEE, 2010, pp. 1–8.
18. Xiong, et al., A Real-time Stitching Algorithm for UAV Aerial Images, Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering, Atlantis Press, 2013.
19. Kekec, Yildirim and Unel, A new approach to real-time mosaicing of aerial images, Robotics and Autonomous Systems 62(12) (2014), 1755–1767.
20. Li, et al., A robust mosaicking procedure for high spatial resolution remote sensing images, ISPRS Journal of Photogrammetry and Remote Sensing 109 (2015), 108–125.
21. Yahyanejad and Rinner, A fast and mobile system for registration of low-altitude visual and thermal aerial images using multiple small-scale UAVs, ISPRS Journal of Photogrammetry and Remote Sensing 104 (2015), 189–202.
22. Xu, et al., Mosaicking of unmanned aerial vehicle imagery in the absence of camera poses, Remote Sensing 8(3) (2016), 204.
23. Tian, et al., An efficient seam elimination method for UAV images based on Wallis dodging and Gaussian distance weight enhancement, Sensors 16(5) (2016), 662.
24. Ghosh, Kaabouch and W.-C. Hu, A robust iterative super-resolution mosaicking algorithm using an adaptive and directional Huber-Markov regularization, Journal of Visual Communication and Image Representation 40 (2016), 98–110.
25. Yahyanejad, Quaritsch and Rinner, Incremental, orthorectified and loop-independent mosaicking of aerial images taken by micro UAVs, IEEE International Symposium on Robotic and Sensors Environments (ROSE), IEEE, 2011.
26. Muja and D.G. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, VISAPP (1) (2009), 331–340.
27. M.A. Fischler and R.C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24(6) (1981), 381–395.
28. P.J. Burt and E.H. Adelson, A multiresolution spline with application to image mosaics, ACM Transactions on Graphics (TOG) 2(4) (1983), 217–236.
29. Silpa-Anan and Hartley, Localisation using an image-map, Proceedings of the 2004 Australasian Conference on Robotics and Automation, 2004.
30. Silpa-Anan and Hartley, Optimised KD-trees for fast image descriptor matching, Computer Vision and Pattern Recognition, 2008, CVPR 2008, IEEE Conference on, IEEE, 2008.
31. K.G. Derpanis, Overview of the RANSAC algorithm, Image Rochester NY 4(1) (2010), 2–3.
32. Peter Capel, Image Mosaicing and Superresolution, 3rd Chapter-Geometric Registration, Robotics Research Group, Department of Engineering Science, University of Oxford, 2001.
33. Dataset Courtesy- Aarav Unmanned Systems Pvt Ltd. www.aus.co.in.
34. IIT Kanpur official website, http://www.iitk.ac.in/
35. Wang and A.C. Bovik, A universal image quality index, IEEE Signal Processing Letters 9(3) (2002), 81–84.
36. Brute Force Matching- A Tutorial, OpenCV, http://docs.opencv.org/3.0-beta/doc/pytutorials/pyfeature2d/pymatcher/pymatcher.html, Accessed 8 August 2016.
37. Bernard Marr, What Everyone Must Know About Industry 4.0, https://www.forbes.com/sites/bernardmarr/2016/06/20/whateveryone-must-know-about-industry-4-0/#6366fd9795f7, Accessed 8 August 2016.
