Abstract
Understanding the terrain or conditions of large areas from the air has gained prominence because aerial images provide clear coverage of the area under study. An individual image covers only a portion of the area, so the images must be mosaicked, or stitched together, to understand the area as a whole. Image mosaicking joins the images taken during a flight to produce a "big picture" of the area. In this paper, we propose a method that generates a seamless aerial mosaic using only the images captured by the UAV as input. The method identifies candidate images from those captured periodically by the UAV during its flight and stitches them together. We also evaluate various feature descriptors and feature matching techniques that can be integrated into the mosaicking system. The proposed work is a hybrid approach that uses the Scale Invariant Feature Transform (SIFT) for feature extraction and the Fast Library for Approximate Nearest Neighbors (FLANN) for matching the key features. RANdom Sample Consensus (RANSAC) removes redundant and outlier matches, providing candidates for homography estimation. The images are then stitched using Multi-Band Blending to produce a visually seamless mosaic. The quality of the results was evaluated using the Universal Image Quality Index (QIM), and the scores obtained indicate high quality.
Introduction
Images mean a great deal to us, and the saying "A picture speaks a thousand words" aptly expresses how readily we relate to them. This is why we use images to convey ideas, whether in meetings and discussions or when explaining ideas to a general audience. Images present complex information simply and help people understand concepts more easily. This is one reason why images are widely used in complex fields such as medicine, astronomy, marketing and aerial imaging.
The proposed work focuses on aerial images, which are gaining prominence in the field of image processing. Aerial imaging has come a long way since its early days, when images were captured from balloons, and it has become more accurate and cheaper, and able to cover larger areas.
With the emergence of Industry 4.0 [37], technology and materials that were once beyond the reach of the common man have become widely accessible. Aerial photography, surveillance and mapping using Unmanned Aerial Vehicles (UAVs) is one such area. UAVs were once used mainly in the military domain, but they are increasingly being used in the civilian domain for environmental monitoring, surveillance and similar tasks.
Image mosaicking has long been an area of interest, mainly because it makes it possible to show the
The availability of low-cost, high-grade hardware has led to the development of low-cost UAVs that offer the same services as high-end UAVs. Thanks to innovations in camera technology, affordable off-the-shelf cameras now provide features found in high-end cameras. These low-cost UAVs carry multiple sensors as payload, such as gyroscopes, Inertial Measurement Units (IMUs), Global Positioning System (GPS) receivers, accelerometers and cameras, but they have limited flight capabilities. Advances in control engineering have given UAVs increased stability. This helps prevent distortions caused by internal vibrations and by external disturbances such as wind and turbulence during flight.
Here we evaluate various feature descriptors and matching techniques to arrive at a hybrid strategy for mosaicking images captured by UAVs. The proposed method does not use any navigational sensor data. It relies only on the images shared by the UAV.
The paper is organized as follows: Section 2 gives an overview of existing work in this field; Section 3 describes the proposed work in detail; Section 4 presents the results and analysis; and Section 5 concludes the paper.
Related works
Image mosaicking is a specialized area that is firmly established in various domains. The mosaicking process has improved significantly with the introduction of feature descriptors such as the Scale Invariant Feature Transform (SIFT) [1], Speeded Up Robust Features (SURF) [2] and Oriented FAST and Rotated BRIEF (ORB) [3]. These descriptors have changed the way traditional mosaicking is performed. Because they track robust features, Ground Control Points (GCPs) are no longer needed to register the images. An extensive overview of the strategies employed in image registration can be found in [4, 6], and [6] further stresses that feature-based methods are well suited to remote sensing applications. The works in [5–10] show how image mosaicking is carried out, and they make it evident that feature-based methods are preferred over other approaches in the mosaicking process.
The use of on-board sensors and the preprocessing of images play an important role in mosaicking aerial images from UAVs. The methods in [11, 17–20] and [22] use only the images obtained from UAVs as input to the mosaicking operation, whereas the strategies in [12, 17] use the images together with UAV metadata. Saeed Yahyanejad et al. [15] focus on ortho-rectified mosaics, using a hybrid approach that combines the camera position and orientation with the image content. Debabrata Ghosh et al. [24] mosaic low-resolution images without any metadata and then convert the resulting mosaic to high resolution. Jinyan Tian et al. [23] remove seams in mosaics generated from UAV images using Wallis dodging and a Gaussian Distance Weight Enhancement method. Saeed Yahyanejad and Bernhard Rinner [21] fuse images obtained from heterogeneous aerial sensors, focusing on real-time inter-spectral image registration. Saeed Yahyanejad et al. [25] present a simple way of correcting errors that accumulate during the mosaicking process, especially in loop-independent traversal of the UAV, and thereby improve ortho-rectification [16].
The proposed method evaluates various feature descriptors and matching techniques to arrive at a hybrid strategy for mosaicking images obtained from UAVs. It differs from other methods in its use of an indigenously built UAV that provides images without distortion. It also differs in the image capture rate during flight, and it performs the mosaicking on a device with limited computational power.
Proposed method
The aim of the proposed method is to generate a visually seamless mosaic without using any navigational sensor data. The resultant mosaic can be used to understand the area covered by the UAV during its flight.
This method does not produce an orthomosaic and is not intended for precise measurements. It makes the following assumptions:
(i) the UAV flies at a constant altitude in "stable mode"; (ii) the images are captured in nadir view; and (iii) there is no camera lens distortion.
The proposed method generates a seamless mosaic using three techniques. The Scale Invariant Feature Transform (SIFT) [1] extracts features, the Fast Library for Approximate Nearest Neighbors (FLANN) [26] matches them quickly, and RANdom Sample Consensus (RANSAC) [27] removes redundant and outlier matches. The matches retained by RANSAC are used to estimate a homography [24], after which the images are blended using Multi-Band Blending [8, 28] to produce a seamless mosaic. Figure 1 depicts the workflow of the proposed method.

Workflow of the proposed method.
SIFT was chosen because its goal is "extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene" [1]. Figure 2 shows the keypoints extracted from an image, and Fig. 3 gives a detailed view showing the orientation of the keypoints.

Features extracted using SIFT.

Detailed view of features extracted using SIFT.
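Stable keypoints are identified in two steps. First, the candidate image is convolved with Gaussian filters at scales separated by a constant multiplicative factor k. Then the difference between adjacent blurred images is taken. The scale-space extrema of the resulting Difference of Gaussian (DoG) images are taken as candidate stable keypoints [1]. Following [1], the DoG is computed as:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1}$$

where $I(x, y)$ is the input image, $G(x, y, \sigma)$ is a Gaussian kernel with standard deviation $\sigma$, $*$ denotes convolution and $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$ is the blurred image.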
Comparative analysis based on the number of features detected
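The following minimal NumPy sketch makes the DoG computation concrete. It builds one octave of blurred images, takes differences of adjacent levels, and flags samples that are strict extrema of their 3 × 3 × 3 scale-space neighbourhood. The parameter values (sigma0 = 1.6, k = √2, five scales, a 0.01 contrast threshold) and all function names are illustrative assumptions. The sketch omits the multiple octaves, subpixel refinement and edge-response rejection of full SIFT [1], and it is not the implementation used in this work.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), using a separable kernel and edge padding."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    # Filter every row, then every column (the 2-D Gaussian is separable).
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def dog_stack(image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """D_i = L(k * sigma_i) - L(sigma_i) for sigma_i = sigma0 * k**i (a single octave)."""
    blurred = [gaussian_blur(image, sigma0 * k ** i) for i in range(num_scales)]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(num_scales - 1)])

def dog_extrema(dog, threshold=0.01):
    """Return (x, y, scale_index) of samples that are strict extrema of their 3x3x3 neighbourhood."""
    keypoints = []
    scales, h, w = dog.shape
    for s in range(1, scales - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                value = dog[s, y, x]
                if abs(value) < threshold:  # skip low-contrast samples
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbours = np.delete(cube, 13)  # index 13 is the centre sample
                if value > neighbours.max() or value < neighbours.min():
                    keypoints.append((x, y, s))
    return keypoints
```

As a usage example, a synthetic Gaussian blob centred at (32, 32) in a 64 × 64 image produces a DoG minimum at its centre, at the interior scale whose σ best matches the blob size. This is the scale-selection behaviour that the DoG extrema are meant to provide.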
The Fast Library for Approximate Nearest Neighbors (FLANN) is used to speed up feature matching. Depending on the input dataset, FLANN automatically chooses between k-means trees and randomized k-d trees [29, 30]. It also reduces k-means tree construction time by limiting the number of k-means clustering iterations [26]. This adaptive behavior gives a considerable advantage in processing speed. Figure 4 shows the keypoint matches found by FLANN.

Keypoint matching using FLANN.
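FLANN's randomized k-d trees build on the basic k-d tree idea. The descriptor space is split recursively along one coordinate at a time, and at query time any branch that cannot contain a closer point is pruned. The pure-Python sketch below shows a single, exact k-d tree. It is not FLANN itself, which builds several randomized trees and bounds the number of leaves checked in order to trade accuracy for speed. The function names are hypothetical.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # the median point becomes the splitting node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Exact nearest-neighbour search; returns (distance, point)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # The far subtree can only hold a closer point if the splitting plane is nearer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

For example, with `tree = build_kdtree([(x, y) for x in range(10) for y in range(10)])`, the call `nearest(tree, (3.2, 7.7))[1]` returns `(3, 8)`. Because of pruning, most subtrees are never visited. FLANN goes further by stopping the search early across several trees.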
FLANN was chosen for feature matching because it matches points quickly. Compared with the Brute Force matching approach [36], FLANN takes less time to match features, which makes it the better choice. Table 2 compares the Brute Force and FLANN approaches in terms of the number of points matched and the time taken. Fig. 5 plots the time taken by the two techniques for each image pair. Both show that FLANN is faster and is therefore suitable for feature matching in the proposed work.

Comparison of time taken for matching the features using Brute Force and FLANN approaches.
Comparative analysis based on feature matching and time consumed
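The brute-force approach computes the distance from every descriptor in one image to every descriptor in the other, so its cost grows with the product of the two descriptor counts. This is the cost that FLANN's approximate search avoids. The NumPy sketch below illustrates brute-force matching. The ratio-test filter keeps a match only if the nearest neighbour is clearly closer than the second nearest. It is Lowe's heuristic for SIFT matching, and the 0.75 ratio is an assumed value: the paper does not state how matches were filtered. The function name is hypothetical.

```python
import numpy as np

def brute_force_match(desc1, desc2, ratio=0.75):
    """Exhaustively match each row of desc1 against all rows of desc2 (L2 distance).

    desc2 must contain at least two descriptors for the ratio test.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # All pairwise squared distances at once: |a|^2 + |b|^2 - 2 a.b, shape (n1, n2).
    sq = (np.sum(desc1 ** 2, axis=1)[:, None]
          + np.sum(desc2 ** 2, axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    dist = np.sqrt(np.maximum(sq, 0.0))      # clamp tiny negative rounding errors
    order = np.argsort(dist, axis=1)
    matches = []
    for i, (best, second) in enumerate(order[:, :2]):
        # Ratio test: accept only clearly unambiguous nearest neighbours.
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches
```

The distance matrix has one entry per descriptor pair, so both time and memory grow quadratically. With hundreds to thousands of SIFT keypoints per image, this accounts for the timing gap between the two approaches.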
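RANSAC estimates model parameters while coping with a large proportion of outliers in the input data [31]. A perspective transform, or homography, operates on homogeneous coordinates. A point X = (x, y) in non-homogeneous (Cartesian) coordinates is written in homogeneous coordinates as (x, y, 1) and mapped to X' by a 3 × 3 matrix H, as shown in Equations 2 and 3:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{2}$$

$$X' = \left( \frac{x'}{w'}, \frac{y'}{w'} \right) = \left( \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \right) \tag{3}$$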
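Equations 2 and 3 translate directly into a few lines of NumPy: append a 1 to each point, multiply by H, and divide by the third coordinate. The function name below is illustrative and is not taken from the paper.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of Cartesian points through the 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])   # (x, y) -> (x, y, 1)
    mapped = homogeneous @ np.asarray(H, dtype=float).T      # each row is (x', y', w')
    return mapped[:, :2] / mapped[:, 2:3]                    # back to (x'/w', y'/w')

H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]])
print(apply_homography(H, [[2.0, 4.0]]))  # w' = 0.5 * 2 + 1 = 2, so (2, 4) -> [[1. 2.]]
```

The division by w' is what makes the mapping projective. When the bottom row of H is (0, 0, 1), w' is always 1 and the homography reduces to an affine transform.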
The homography matrix has 8 degrees of freedom (DoF), since it is defined only up to scale. Each point correspondence provides two constraints, so four correspondences are sufficient to solve for the homography directly [32].
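The direct linear transform (DLT) exploits this four-correspondence property, and RANSAC wraps around it. RANSAC repeatedly fits H to a random sample of four matches, counts how many matches the fitted model reprojects within a pixel threshold, and finally refits on the largest consensus set. The sketch below is a plain NumPy illustration under assumed parameters (a 1-pixel threshold, 500 iterations and a fixed seed), and its function names are hypothetical. It omits the coordinate normalisation and adaptive iteration count used in practical implementations. In OpenCV, this step is usually delegated to `cv2.findHomography` with the `cv2.RANSAC` method.

```python
import numpy as np

def apply_homography(H, points):
    """Map (N, 2) points through H using homogeneous coordinates."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))]) @ H.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]

def homography_dlt(src, dst):
    """Estimate H (scaled so that h33 = 1) from >= 4 correspondences by least squares via SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the nine entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)      # right singular vector of the smallest singular value
    return H / H[2, 2]

def ransac_homography(src, dst, threshold=1.0, iterations=500, seed=0):
    """Return (H, inlier_mask) for the model with the largest consensus set."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(all="ignore"):  # degenerate (e.g. collinear) samples can produce inf/nan
        for _ in range(iterations):
            sample = rng.choice(len(src), size=4, replace=False)
            H = homography_dlt(src[sample], dst[sample])
            errors = np.linalg.norm(apply_homography(H, src) - dst, axis=1)
            inliers = errors < threshold
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        raise ValueError("no model with at least four inliers was found")
    # Refit on every inlier of the best model for a least-squares estimate.
    return homography_dlt(src[best], dst[best]), best
```

The minimal sample size of four follows directly from the 8-DoF argument above. Refitting on all inliers at the end averages out noise that a single four-point fit would retain.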
Mapping each input pixel forward to the output leaves holes in the resulting image. This can be avoided by using inverse homography instead of forward homography. Each output pixel is mapped back into the input image with $H^{-1}$, and the intensity at the resulting non-integer position is obtained by bilinear interpolation [10].
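The following NumPy sketch illustrates inverse mapping. It generates the coordinates of every output pixel, maps them back through the inverse homography, and samples the input with bilinear interpolation, so every output pixel receives a value. The sketch handles grayscale images only, returns 0 for positions outside the input, and uses hypothetical function names. In OpenCV, the same operation is typically performed by `cv2.warpPerspective`, whose default interpolation is bilinear.

```python
import numpy as np

def bilinear_sample(image, x, y, eps=1e-9):
    """Sample a 2-D image at float coordinates (x, y); positions outside the image return 0."""
    h, w = image.shape
    # A small tolerance keeps points that land on the border only through rounding error.
    valid = (x >= -eps) & (x <= w - 1 + eps) & (y >= -eps) & (y <= h - 1 + eps)
    # Top-left neighbour, clipped so that the +1 neighbour always exists.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = np.clip(x - x0, 0.0, 1.0)
    fy = np.clip(y - y0, 0.0, 1.0)
    top = image[y0, x0] * (1 - fx) + image[y0, x0 + 1] * fx
    bottom = image[y0 + 1, x0] * (1 - fx) + image[y0 + 1, x0 + 1] * fx
    return np.where(valid, top * (1 - fy) + bottom * fy, 0.0)

def inverse_warp(image, H, out_shape):
    """Warp `image` by homography H by pulling each output pixel from H^-1 applied to its coordinates."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])   # homogeneous, shape (3, N)
    src = np.linalg.inv(np.asarray(H, dtype=float)) @ coords
    src_x, src_y = src[0] / src[2], src[1] / src[2]
    return bilinear_sample(image.astype(float), src_x, src_y).reshape(h_out, w_out)
```

Because the loop runs over output pixels rather than input pixels, no output location is skipped. This avoids the holes that forward mapping produces when the transform stretches the image.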
Brown and Lowe [9] note that every pixel along a ray should ideally have the same intensity in all the images it intersects. In practice, however, the intensities differ because of exposure differences, exposure time, aperture changes and similar effects. A good blending strategy is therefore required [9].
The Multi-Band Blending strategy was developed by Burt and Adelson [28]. Instead of blending images in a single band, it blends them over multiple frequency bands using a Laplacian pyramid. Low frequencies are blended over large spatial ranges and high frequencies over small spatial ranges. The low-frequency information is blended using a linear weighted sum, whereas the high-frequency information is selected from the image with the maximum weight. Each warped image is decomposed into a Laplacian pyramid in which every successive level is smoothed and downsampled by a factor of 2. The mask associated with each source image is converted into a low-pass Gaussian pyramid. The masks then act as weights, varying from 0 to 1, for a feathering blend at each level, so that the information from all the images is fused together. The final image is reconstructed by interpolating (upsampling) and summing all the pyramid levels. Figure 6 shows the result of multi-band blending.

Result of Multi-band Blending.
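The pyramid mechanics can be sketched for two grayscale images in NumPy as follows. The 5-tap binomial filter, the four pyramid levels and all function names are assumptions of this sketch. It also blends every band with the smoothed mask weights, following Burt and Adelson's formulation, rather than selecting high frequencies by maximum weight as described above. Colour images would be processed per channel.

```python
import numpy as np

# 5-tap binomial filter commonly used for Burt-Adelson pyramids (an assumed choice).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable 5-tap low-pass filter with reflected borders."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))
    return sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))

def downsample(img):
    """Low-pass filter, then keep every second row and column."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Insert zeros between samples and low-pass filter; the factor 4 restores brightness."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def gaussian_pyramid(img, levels):
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    # Band-pass levels, followed by the coarsest Gaussian level as the low-pass residual.
    return [g[i] - upsample(g[i + 1], g[i].shape) for i in range(levels)] + [g[-1]]

def multiband_blend(img_a, img_b, mask, levels=4):
    """Blend img_a (where mask is 1) with img_b (where mask is 0) band by band."""
    bands_a = laplacian_pyramid(img_a.astype(float), levels)
    bands_b = laplacian_pyramid(img_b.astype(float), levels)
    weights = gaussian_pyramid(mask.astype(float), levels)  # smoothed masks, values in [0, 1]
    blended = [w * a + (1.0 - w) * b for a, b, w in zip(bands_a, bands_b, weights)]
    # Collapse: start at the coarsest level, then repeatedly upsample and add the next band.
    out = blended[-1]
    for band in reversed(blended[:-1]):
        out = band + upsample(out, band.shape)
    return out
```

At coarse levels the mask has been blurred heavily, so low frequencies transition over a wide region. At the finest level the mask is almost a hard step, which keeps fine detail sharp near the seam.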
The method creates a mosaic $M_N$ from the $N$ images captured by the UAV during its flight. Writing $I_k$ for the k-th image and $M_k$ for the mosaic of the first k images, the process can be expressed recursively as

$$M_1 = I_1, \qquad M_k = \mathrm{MOSAIC}(M_{k-1}, I_k), \quad k = 2, \ldots, N$$

The MOSAIC function takes two operands. The first is the reference image, and the second is the new image, which is transformed to align with the reference. The result of each mosaicking operation becomes the first input of the next iteration. Repeating this operation yields a final, visually seamless mosaic that gives a complete overview of the area covered by the UAV. As mentioned earlier, Fig. 1 illustrates the workflow of the proposed work.
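The iterative process described above can be expressed as a left fold over the image sequence. In the pure-Python sketch below, `mosaic_pair` is a deliberately simplified, hypothetical stand-in for the MOSAIC function. It works on 1-D "strips" and registers each new strip against the end of the current mosaic by minimising the mean squared difference over candidate overlaps. The proposed method, by contrast, registers 2-D images with SIFT, FLANN, RANSAC and a homography and then applies multi-band blending. Only the structure, in which the previous result always serves as the reference, is meant to carry over.

```python
from functools import reduce

def mosaic_pair(reference, new, min_overlap=3):
    """Hypothetical 1-D stand-in for MOSAIC: align `new` to the end of `reference` and append it.

    The overlap with the smallest mean squared difference is chosen; if no overlap of at
    least `min_overlap` samples is possible, `new` is simply appended.
    """
    best_overlap, best_error = 0, float("inf")
    for overlap in range(min_overlap, min(len(reference), len(new)) + 1):
        tail, head = reference[-overlap:], new[:overlap]
        error = sum((p - q) ** 2 for p, q in zip(tail, head)) / overlap
        if error < best_error:
            best_overlap, best_error = overlap, error
    return reference + new[best_overlap:]

def build_mosaic(images):
    """M_1 = I_1 and M_k = MOSAIC(M_{k-1}, I_k): the running mosaic is always the reference."""
    return reduce(mosaic_pair, images[1:], images[0])
```

One consequence of this structure is that each new image is registered against an ever-growing mosaic, so any registration error made early on propagates into every later step.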
Results and analysis
In this section, we evaluate the performance of the proposed method qualitatively and quantitatively. The evaluation shows that the proposed work is suitable for producing seamless mosaics. The experiment was carried out in two phases: Phase 1, data acquisition, and Phase 2, image mosaicking.
In Phase 1, the images were captured using "Drishti", a UAV [33]. The dataset consists of a sequence of 61 images taken over IIT Kanpur [34]. Details of the UAV flight and the camera are given in Tables 3 and 4.
UAV flight specification
Camera specification
Phase 2 was carried out offline. Because of hardware restrictions, the images were downsampled to 500 px × 375 px. The images are stored in PNG format, which uses lossless compression, so saving the downsampled images introduces no compression artifacts. Downsampling itself, however, does reduce spatial resolution. The first 11 images of the dataset were used for the evaluation. The area covered by Drishti during its flight, plotted using Google Maps, is shown in Fig. 7.

Area covered by Drishti: a satellite view using Google Maps.
The proposed method was implemented in Python with OpenCV on a system with a 2.16 GHz Intel Pentium quad-core processor and 4 GB of DDR3 RAM. A seamless mosaic was obtained for the image sequence captured by the UAV.
Figure 8 shows the result of applying the proposed method to the first 11 images of the dataset captured by Drishti.

Resultant mosaic from the proposed work.
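The Universal Image Quality Index (QIM) is used to evaluate the quality of the resultant mosaic. QIM models distortion as a combination of three factors: loss of correlation, luminance distortion and contrast distortion [35]. This makes it more reliable than traditional metrics such as the Mean Square Error (MSE) [35]. For a reference image x and a test image y, it can be written as:

$$Q = \frac{4\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2 + \sigma_y^2)\left(\bar{x}^2 + \bar{y}^2\right)} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2\bar{x}\bar{y}}{\bar{x}^2 + \bar{y}^2} \cdot \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \tag{4}$$

Here $\bar{x}$ and $\bar{y}$ are the means, $\sigma_x^2$ and $\sigma_y^2$ the variances and $\sigma_{xy}$ the covariance of x and y. The three factors measure loss of correlation, luminance distortion and contrast distortion respectively. Q ranges from −1 to 1 and reaches 1 only when y = x.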
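As a rough illustration, the quality index can be computed with NumPy as shown below. Following the original definition of the index, Q is computed over an 8 × 8 window moved one pixel at a time, and the window scores are averaged. Two details are assumptions of this sketch: windows where the denominator is zero (so Q is undefined) are skipped, and the function names are hypothetical.

```python
import numpy as np

def quality_index(x, y):
    """Universal Image Quality Index of two equally sized windows, or None if it is undefined."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)          # unbiased variance estimates
    cov_xy = np.sum((x - mean_x) * (y - mean_y)) / (x.size - 1)
    denominator = (var_x + var_y) * (mean_x ** 2 + mean_y ** 2)
    if denominator == 0:
        return None
    return 4.0 * cov_xy * mean_x * mean_y / denominator

def mean_quality_index(reference, test, window=8):
    """Average Q over all window x window blocks, sliding one pixel at a time."""
    scores = []
    rows, cols = reference.shape
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            q = quality_index(reference[r:r + window, c:c + window],
                              test[r:r + window, c:c + window])
            if q is not None:          # assumption: undefined (flat) windows are skipped
                scores.append(q)
    return float(np.mean(scores))
```

A pure brightness shift leaves the correlation and contrast factors at 1 but lowers the luminance factor. This is why the index penalises such a shift while still separating it from structural distortion.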
Figure 9 shows one of the images used as a test sample for the evaluation, and Fig. 10 shows the corresponding mosaic produced by the proposed method. Figure 10 shows that the resultant mosaic is visually seamless and depicts the area clearly. Table 5 lists the quality index obtained for each image in the sequence. The average QIM value in Table 5 is

The test image used for evaluation.

Resultant mosaic obtained by applying the proposed method.
Quantitative analysis based on QIM
Conclusion
In this paper, we propose a hybrid method that combines several techniques to mosaic aerial images obtained from UAVs. The method uses an indigenously built UAV that captures images at regular intervals during its flight. Various feature descriptors and matching techniques were evaluated to select the ones used in the system. The resulting method uses SIFT for feature extraction, FLANN for feature matching, RANSAC for outlier removal and more reliable homography estimation, and Multi-Band Blending for image blending. Implemented on a low-powered computer, the method produces seamless mosaics for the dataset obtained from the UAV. Qualitative evaluation, together with quantitative evaluation using the Universal Image Quality Index (QIM), demonstrates the ability of the proposed method to produce seamless mosaics.
Acknowledgment
The authors would like to express their gratitude to Mr. Suhas and Aarav Unmanned Systems Pvt Ltd for providing the dataset of UAV images used in this work. The authors would also like to thank Dr. S N Omkar, Chief Research Scientist, Department of Aerospace Engineering, IISc Bangalore; Dr. Adrian Rosebrock of PyImageSearch; Dr. Tessy Mathew (Associate Professor and Head); and all the faculty members of the Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Trivandrum, for all the support rendered towards this work.
