Abstract
Direct web-based access to huge, ready-to-use archives of satellite images and cloud-based services for planetary-scale data processing (e.g., Google Earth Engine or Amazon S3) is making it possible to analyze unprecedented amounts of remotely sensed images simultaneously. Multiple images can be exploited to improve the results traditionally achieved through on-premises (on-site) processing by coupling cloud offerings with redundant image information.
This paper will introduce the concept of image network optimization for the case of registration problems based on groups of terrain-geocoded images. The particular case of multi-image registration will be discussed, although the proposed approach can be extended to other practical issues, as illustrated in the paper. The concept of network design and optimization for satellite images is mathematically formulated and quantified with a multi-purpose objective function comprising precision, reliability, and cost. Results are illustrated with theory and numerical simulations carried out with a rigorous stochastic approach, in which the significance of the different input variables is estimated. The developed network-based approach allows one to reduce the number of external constraints, focusing mainly on images and their increasing availability through web-services integrated by massive cloud computation capability.
Introduction
Why use network-based analysis for remotely sensed data?
The availability of a massive amount of satellite images directly accessible through the Internet is changing the traditional processing approach of remotely sensed data (Zhu et al., 2009). Processing is moving from personal computers and workstations (local storage) to cloud computing services. For instance, Sentinel-2 archives organized into tiles using the Military Grid Reference System are available to anyone via Amazon S3. The Sentinel image browser is available at
The opportunity to work without downloading massive datasets allows users to deal with an unprecedented amount of images with variable spatial, temporal, radiometric, and spectral resolution. Improved data access is correlated to more exhaustive usages of data, in which the combined use of more images than those traditionally exploited is quite attractive. On the other hand, large datasets introduce relevant problems and require efficient and reliable processing algorithms. Big data has, therefore, become a favorite word in remote sensing and, at the same time, a practical challenge for the development of applications and services capable of delivering information for productive work (Ma et al., 2015). The well-known 6Vs (Volume, Variety, Velocity, Veracity, Validity, and Volatility) have to be translated from a technological problem to a set of reliable solutions for data processing (Nativi et al., 2015).
Recent work carried out in the field of close-range photogrammetry and computer vision (say in the last 6–8 years) has increased the number of users of image-based reconstruction. Software like PhotoScan, PhotoModeler, 3D Zephyr, ContextCapture, and Pix4D Mapper (among others) is now used by operators in many different fields. This has also proved that (i) facilitating data availability and access (in this case digital cameras) coupled with (ii) automated processing solutions (such as low cost 3D reconstruction software) opens new opportunities not only for the specialists of the sector, but also for a wide community of users interested in digital reconstruction.
Another important consideration deserves to be mentioned. The availability of automated processing algorithms for 3D reconstruction has led to an increase in the number of images acquired and processed. This is due to two main reasons. First, automated orientation procedures (coined Structure from Motion in computer vision) require more images than those strictly needed for manual processing, where a human operator selects points with interactive measurements. Procedures for image matching and orientation based on scale invariant features require short baselines, increasing the final number of images. The second reason is less technical and depends on the user’s experience. Operators with limited expertise in image analysis tend to acquire several images for both simple and complex applications. The list of users includes geologists, engineers, archeologists, designers, architects, etc., who are probably interested in such techniques for productive purposes, but also casual snapshooters or professional photographers. Photogrammetrists and computer vision specialists are only a part of the users of image reconstruction software, where the high level of automation achieved in image processing has allowed users to “ignore” basic rules for the efficient production of 3D models. The opportunity to upload images and run automatic data processing is surely quite tempting. Obviously, a lack of “photogrammetric experience” also has implications regarding metric accuracy, level of detail, completeness, processing time, etc. (Nocerino et al., 2014).
In the case of satellite images, the author does not expect the growing availability of new cloud-based services to lead to massive usage among operators with limited experience in remote sensing. While the production of a 3D model from a set of images is a tangible result usable in a variety of applications (from pure visualization purposes to advanced applications where the model is corrected and refined), remotely sensed data processing still requires expert users who have to analyze and interpret the result to deliver reliable products. This is a remarkable difference compared to 3D modeling solutions, in which a “tangible visual result” is directly available (like an orthophoto or a textured 3D model). On the other hand, it is clear that the availability of a huge number of images coupled with efficient techniques for data processing will foster a more productive use of space data, requiring new downstream services and applications where satellite images collected by different sensors will be processed to deliver products.
More attention is also required to the different phases of the image processing pipeline. Reliable products from images can be automatically generated when all the various processing steps have reached a significant technological maturity, facing the new challenges of a multi-sensor approach with images with variable geometric, radiometric, temporal, and spectral resolution.
The work carried out in this paper tries to extend the geometric registration problem from a traditional “reference-to-sensed” approach to the simultaneous exploitation of a complete time series, in which images are registered in a single step. In fact, accurate image registration is still a significant preprocessing step in remote sensing applications (Dawn et al., 2010). The lack of accurate image-to-image registration can result in significant errors in practical applications, as demonstrated by Townshend et al. (1992). Previous work (Barazzetti et al., 2014) has shown that a multi-image registration approach is feasible and is both more precise and more robust than basic pairwise registration approaches. The availability of more than two images collected over the same area is an opportunity to achieve better results.
The aim of this paper is not an evaluation of specific case studies in different areas with different images. Instead, the work aims to highlight the potential (and limitations) of a generic network of satellite images, generalizing the approach from a single image pair to a complete network with an arbitrary number of images. It presents the concept of network optimization and design applied to satellite image registration problems, which can be defined as the estimation of network quality before processing real data.
Results through numerical simulations and theoretical explanations will be discussed to clarify the different aspects of the design problem, testing the improved precision when different parameters are varied, such as the number of images, image-to-image points, and image-to-control points, as well as their relative percentage and distribution. Outcomes will be available through rigorous stochastic approaches, in which the variance-covariance matrix of unknown parameters will provide a numerical solution to validate the different configurations.
Although the proposed work focuses on network registration problems, it can be extended to different applications with satellite images. Starting from a set of images, it is possible to improve the mathematical relationships (i.e., the functional model) between observed values and unknowns to take into consideration a complete network. For instance, the work presented in this paper could be extended to the relative normalization problem with pseudo-invariant features, where usually the same point is matched on single image pairs.
The multi-purpose objective function in satellite network design for data registration
We can define the aim of the optimization process for network-based image registration problems as the maximization of a multi-purpose objective function
where coefficients
Precision can be evaluated from the variance-covariance matrix of unknown parameters
Reliability is intended as the ability to automatically detect and remove gross errors (outliers), which play a fundamental role in the case of fully automated data processing. Modern techniques for outlier detection are sequential approaches that rely on robust estimators (such as the work initially presented in Fischler & Bolles, 1981), where a geometric model is coupled with iterative adjustment techniques based on random sampling. The aim is to find a set of inliers for the final estimation via least squares.
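As a rough illustration of random-sampling outlier rejection, the following minimal sketch applies a RANSAC-style loop to point correspondences between two images. It is not the implementation used in this work: the affine model, the function names (`affine_from_3`, `ransac_affine`), the 1-pixel threshold, the number of iterations, and the simulated data are all assumptions chosen for the example.

```python
import random


def affine_from_3(src, dst):
    """Exact affine (a, b, c, d, e, f) through three correspondences (Cramer's rule).

    Returns None when the three source points are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-9:
        return None
    params = []
    for k in (0, 1):  # k = 0: u = a*x + b*y + c, k = 1: v = d*x + e*y + f
        u1, u2, u3 = dst[0][k], dst[1][k], dst[2][k]
        params.append((u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det)
        params.append((x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det)
        params.append((x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
                       + u1 * (x2 * y3 - x3 * y2)) / det)
    return params


def ransac_affine(src, dst, threshold=1.0, iterations=200, seed=0):
    """Return the indices of the largest consensus set found by random sampling."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        sample = rng.sample(range(len(src)), 3)  # minimal sample for an affine model
        model = affine_from_3([src[i] for i in sample], [dst[i] for i in sample])
        if model is None:
            continue
        a, b, c, d, e, f = model
        inliers = [
            i for i, ((x, y), (u, v)) in enumerate(zip(src, dst))
            if ((a * x + b * y + c - u) ** 2 + (d * x + e * y + f - v) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best):
            best = inliers
    return best


# Simulated matches (hypothetical data): a regular grid with five gross errors.
src = [(100.0 * i, 100.0 * j) for i in range(6) for j in range(5)]
dst = [(1.001 * x + 0.002 * y + 3.0, -0.002 * x + 0.999 * y - 4.0) for x, y in src]
gross = (3, 11, 17, 22, 28)
for n, i in enumerate(gross):
    dst[i] = (dst[i][0] + 40.0 + 5 * n, dst[i][1] - 60.0)

inliers = ransac_affine(src, dst)
print(len(inliers), "inliers kept out of", len(src))
```

The surviving inliers would then feed the final least squares estimation. The fixed seed only makes the example repeatable; with a different seed the sampled triplets change, which is the lack of repeatability typical of random sampling.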
An additional consideration deserves to be mentioned. Precision, reliability, and accuracy are numerical indicators of random, gross and systematic errors in the network. Although this work will mainly focus on precision as an overall quality parameter of the network (see next sections), precision is not more important than reliability and accuracy. The choice of precision will be motivated by the intrinsic opportunity to specify (a priori) precision criteria. This aim can be achieved by exploiting the variance-covariance matrix, that can be analytically expressed as
The cost can be quantified by considering CPU time (or the number of floating-point operations) needed to complete data processing. CPU cost is a parameter that cannot be ignored in the case of a large dataset of images. The formulation of a specific cost function has to take into consideration the number of images and points used. An initial formulation of the cost function could be based on a quadratic form that depends on the combination of images. For instance, a dataset of
Different statistical parameters can be used to quantify the reliability of a network. On the other hand, recent experiences in the field of close-range photogrammetry have proved that reliable methods for automated image processing are now available. This is the case of photogrammetric applications where most steps of the processing workflow are fully automatic. As mentioned, different scientific papers and commercial software are an indicator of the significant maturity achievable in automated close-range image processing, where gross errors are detected and discarded exploiting robust estimators. Such algorithms can be adapted to deal with satellite images, as demonstrated in different scientific papers (e.g. Qiaoliang et al., 2009; Teke & Temizel, 2010; Zhao et al., 2009; Bouchiha & Besbes, 2013).
High reliability can be achieved by integrating several strategies for outlier rejection in the procedure. Previous work in close-range photogrammetry has demonstrated that outlier rejection strategies can rely on high breakdown point estimators (Christmann, 1996), which allow the detection of possible outliers in the observations. The idea of having robustness in the estimation is to have safeguards against deviations from the assumptions. Within robust estimators, gross errors are defined as observations that do not fit the stochastic model of the estimated parameters. Their efficiency depends on different factors but primarily on the percentage of outliers. Basically, there is a lack of repeatability because of their random way of selecting the points (random sampling). Compared to data snooping techniques (a statistical test of the normalized residuals), robust estimators do not provide a measure or a judgment about the quality of the found (or rejected) outliers.
For this reason, reliability can be assumed as a part of CPU cost because of the need to integrate specific, robust algorithms that increase CPU time. From this point of view, reliability can be integrated into (Eq. (1)) to become an effective part of cost:
where cost
The design process of the network is related to the choice of images and points, as well as their location and distribution, the operator for image matching and the mathematical model for geometric registration. Different operators for image matching were developed for photogrammetric and remote sensing applications. An extensive review is presented in Gruen (2012). The mathematical model for image registration is the geometric relationship between corresponding points in different images (Brown, 1992; Le Moigne et al., 2011). In the case of image matching between two images only, commercial software packages are based on models with different geometric transformations (such as rotation
Let us consider a set of
where
The idea is to extend the pairwise registration approach for a network of images. Networks require the extraction of image points from all image combinations, applying robust algorithms for outlier detection, and determining the registration parameters of the different images in a single step. This combined adjustment extends the traditional approach based on individual pairs and makes the analysis more exhaustive, incorporating all the data in a unique adjustment process. Indeed, in the case where points cannot be matched in some image pairs, the availability of more images allows one to exploit additional combinations. More images also provide higher data redundancy and therefore better results regarding precision and reliability.
The scheme of the proposed network-based approach. Image points can have a direct connection to control points (triangles) or can be visible in multiple images (circles). Images without control points (image 2) can be registered by concatenating images into a network.
The extension from a pair of images to a network (Fig. 1) does not modify the equations used for registration. Equation (1) remains the mathematical model used. As all the images have to be registered with an affine transformation (with different numerical values of transformation parameters) between image points and control points (reference), we can assume that the geometric model between two generic images is still an affine transformation. This means that the same point can be matched in different images, requiring a unique “id” for the points and the computation of the corresponding multiplicity, i.e., the number of images where the same point is visible. Points visible in a single image (in which case corresponding control points are needed), as well as points visible in other configurations (such as pairs or triplets of images), play a role similar to that of tie points in photogrammetric projects.
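The assignment of unique point ids and the computation of multiplicity can be sketched as follows. This is an illustrative sketch only: the input format (pairwise matches given as pairs of `(image, feature_index)` tuples), the function name `build_tracks`, and the union-find grouping are assumptions, not a description of the software used in this work.

```python
def build_tracks(pairwise_matches):
    """Group pairwise matches into multi-image points with a union-find structure."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for first, second in pairwise_matches:
        root_a, root_b = find(first), find(second)
        if root_a != root_b:
            parent[root_b] = root_a

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in tracks.values())


# Hypothetical matches: each feature is identified by (image, feature_index).
matches = [
    ((1, 10), (2, 7)),   # images 1-2
    ((2, 7), (3, 4)),    # images 2-3, same point as above
    ((1, 11), (3, 9)),   # images 1-3
    ((2, 20), (3, 15)),  # images 2-3
]

tracks = build_tracks(matches)
# Multiplicity: number of distinct images in which each point is visible.
multiplicities = [len({image for image, _ in track}) for track in tracks]
for point_id, (track, m) in enumerate(zip(tracks, multiplicities)):
    print(point_id, track, "multiplicity", m)
```

The position of each track in the sorted list serves here as the unique point id; any other stable labeling would work equally well.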
A generic block of
The algorithms for outlier rejection based on random sampling also require a small CPU cost, which is negligible if compared to the time for point detection in the images. Although robust estimators have to be applied to all image pairs, their contribution is relatively small when compared to the time needed to extract the points from all the images. Finally, the estimation of the solution via least squares is another operation which does not require a significant CPU cost, since optimized algorithms provide the solution vector without inverting the normal matrix.
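One common way to obtain the solution vector without inverting the normal matrix is a Cholesky factorization followed by two triangular solves. The sketch below is a generic textbook version written for illustration; the function names and the small 3 × 3 system are assumptions, and nothing here is claimed to be the specific algorithm used in this work.

```python
import math


def cholesky(N):
    """Lower-triangular factor L of a symmetric positive definite N (N = L L^T)."""
    n = len(N)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(N[i][i] - s)
            else:
                L[i][j] = (N[i][j] - s) / L[j][j]
    return L


def solve_normal(N, b):
    """Solve N x = b by forward and back substitution, with no explicit inverse."""
    L = cholesky(N)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Small hypothetical normal system whose exact solution is [1, -2, 3].
N = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [0.0, -5.0, 7.0]
x = solve_normal(N, b)
print(x)
```

When the precision of the parameters is also needed, the relevant elements of the inverse can still be derived from the same factor.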
The overall cost of data processing can be therefore computed as a linear function that depends only on the number of images, which is related to the detection phase of corresponding points thanks to the initial georeferencing parameters:
This relationship becomes even more useful for large networks in which the number of images increases. In such cases, the cost of point detection is more significant when compared to the other phases of data processing.
One may ask why the large cost of image resampling (after the computation of transformation parameters) is not taken into consideration. This is a linear cost that depends on the number of images. However, network resampling is no different from the traditional resampling used in pairwise image registration with commercial software. The network-based approach can be used with more images than those necessary, resampling only the images that will be processed to obtain additional products. The comparison presented in this paper is carried out from image matching to parameter computation, where the network-based approach is different from traditional processing.
In the case of a network where all the images require registration, the lack of a reference system gives a rank deficiency during the computation of registration parameters. The observables (image coordinates) do not contain information about the reference system, and a set of fixed control points has to be included in the registration. Fixed points have to be included in the adjustment and can be (i) pixel values of an additional image assumed as a reference, (ii) an external set of pixel coordinates, or (iii) a set of map coordinates.
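The difference between a cost that grows with the combinations of images and a cost that grows with the number of images can be made concrete with a small count. In the sketch below, the interpretation of "combinations" as unordered image pairs and the unit costs (`cost_per_pair`, `cost_per_image`) are assumptions introduced purely for illustration.

```python
from itertools import combinations


def image_pairs(n_images):
    """All unordered image pairs in a block of n_images images."""
    return list(combinations(range(1, n_images + 1), 2))


def pairwise_cost(n_images, cost_per_pair=1.0):
    """Quadratic growth: one unit of (hypothetical) cost per image pair."""
    return cost_per_pair * len(image_pairs(n_images))


def detection_cost(n_images, cost_per_image=1.0):
    """Linear growth: one unit of (hypothetical) cost per image."""
    return cost_per_image * n_images


for n in (2, 6, 12, 24):
    print(n, "images:", len(image_pairs(n)), "pairs,",
          "pairwise cost", pairwise_cost(n), "vs detection cost", detection_cost(n))
```

For 24 images the number of pairs is already more than ten times the number of images, which is why a cost driven by per-image point detection scales far better than one driven by image combinations.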
The datum problem (Zero Order Design, ZOD) involves the choice of an optimal reference system in which images will be registered (Fraser, 1982). The use of a common datum to establish the reference system is related to the need of georeferenced images in remote sensing applications. High, medium and low-resolution satellite images are (often) delivered with a reference system based on geographic or cartographic coordinates. Other solutions, such as free network adjustment, as described in Granshaw (1980), based on criteria for datum definition that improve parameter precision are not taken into account in this work to meet basic requirements of remote sensing applications.
Given a set of fixed coordinates (control points) corresponding to points visible in the images (at least a single image pair if the number and distribution of image-to-image matches guarantee a connection among all the images), Eq. (2) can be turned into a system of linear equations. Unknown parameters in the network are not only image transformation parameters. Additional unknowns are added to take into consideration points matched between two or more images, whose coordinates are not given in the control point list. Such additional unknowns correspond to the image-to-image coordinates projected in the control point reference system. Image-to-image points allow one to register images without direct visibility of control points and, at the same time, to run the adjustment process in a single step.
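The structure of such a combined adjustment can be sketched with a toy network of three images, in which the second image has no control points and is registered only through image-to-image points. The sketch rests on several assumptions that are not taken from the paper: a six-parameter affine model X = a·x + b·y + c, Y = d·x + e·y + f per image, two extra unknowns (X, Y) per image-to-image point, unit weights, noise-free simulated coordinates, and a plain Gauss-Jordan inverse that is only adequate for a system this small. All function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small demo)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal matrix: datum or connections missing")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def build_rows(n_images, n_ties, control, ties):
    """control: (image, x, y, X, Y); ties: (image, x, y, tie_index)."""
    n_unk = 6 * n_images + 2 * n_ties
    rows, rhs = [], []

    def add(image, x, y, targets, tie=None):
        for k in (0, 1):  # k = 0: X equation, k = 1: Y equation
            row = [0.0] * n_unk
            start = 6 * image + 3 * k
            row[start:start + 3] = [x, y, 1.0]
            if tie is not None:  # unknown reference coordinate of the tie point
                row[6 * n_images + 2 * tie + k] = -1.0
            rows.append(row)
            rhs.append(targets[k])

    for image, x, y, X, Y in control:
        add(image, x, y, (X, Y))
    for image, x, y, tie in ties:
        add(image, x, y, (0.0, 0.0), tie)
    return rows, rhs


def adjust(rows, rhs, sigma0=1.0):
    """Least squares solution and standard deviations from sigma0^2 * N^-1."""
    n_unk = len(rows[0])
    N = [[sum(r[i] * r[j] for r in rows) for j in range(n_unk)] for i in range(n_unk)]
    t = [sum(r[i] * l for r, l in zip(rows, rhs)) for i in range(n_unk)]
    N_inv = invert(N)
    x_hat = [sum(N_inv[i][j] * t[j] for j in range(n_unk)) for i in range(n_unk)]
    sigmas = [sigma0 * N_inv[i][i] ** 0.5 for i in range(n_unk)]
    return x_hat, sigmas


# Simulated network: each image is a shifted copy of the reference frame.
shifts = [(0.10, -0.05), (-0.20, 0.15), (0.05, 0.30)]
grid = [(X / 2, Y / 2) for X in range(3) for Y in range(3)]
control_ids = (0, 2, 6, 8)  # the four corners act as control points
tie_ids = [i for i in range(9) if i not in control_ids]


def to_image(point, image):
    return point[0] - shifts[image][0], point[1] - shifts[image][1]


# Images 0 and 2 see the control points; image 1 (the second image) sees only tie points.
control = [(img, *to_image(grid[i], img), *grid[i]) for img in (0, 2) for i in control_ids]
ties = [(img, *to_image(grid[i], img), t) for img in (0, 1, 2) for t, i in enumerate(tie_ids)]

rows, rhs = build_rows(3, len(tie_ids), control, ties)
x_hat, sigmas = adjust(rows, rhs)
image1_params = x_hat[6:12]  # a, b, c, d, e, f of the image without control points
print([round(p, 6) for p in image1_params])
```

In this toy case the image without control points recovers its simulated shift through the image-to-image points alone. Removing all control observations would leave the normal matrix singular, which is the rank deficiency associated with a missing datum.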
The linear functional and stochastic models based on Eq. (2) have the form:
where
As mentioned, the rank defect of
where
The solution (and its precision
Equations (7) are used for a theoretical description of the mathematical problem. On the other hand, they are not the best choice for practical applications, especially when the number of images and points becomes large. Other methods (e.g., Gaussian decomposition) can be used to estimate
The problem (FOD, first-order design) involves the choice of the number of images (
Different simulations are presented in the next sections, in which the parameters were varied to understand the role of the input variables. Simulations were carried out by using images of 10,000 pixels
The particular symmetric form of Eq. (2) provides the same result for the precision of pixel coordinates (
The influence of image points on network precision (
variable,
2,
)
Let us consider the simplest case of a single image pair (
The form of Eq. (2) separates the coefficients for parameters (the two blocks (
The achieved precision of transformation parameters depends on
The importance of control points: network analysis reduced to pairwise processing (
constant,
variable,
)
As mentioned in the previous section, a good choice regarding points is between 400 and 2500 points. In this example,
The precision of translation and scale parameters in the case of image processing with a variable number of images remains unvaried. The number of images does not provide any change in translation and scale parameters (
Network analysis to reduce control points (
variable,
const.,
variable)
Let us consider a constant number of points
Number of equations and unknowns for 2 and 24 images as a function of the decimation factor.
The choice of a constant value for
The number of equations and unknowns (i.e., rows and columns of matrix
The previous equations highlight that the number of equations is directly proportional to the number of images. Adding other images will immediately result in a substantial improvement of data redundancy because the number of unknowns is only increased by 6
Finally, the complete results of the simulation based on both a variable number of images and variable decimation factors are shown in Fig. 3. Results confirm that without control point decimation the number of images is not significant, whereas an analysis based on a network of images provides better results, with an improvement by a factor of 2.7 between the cases with 2 and 24 images when control points are decimated by 40 times. The second important outcome is the considerable variability between the solution with the full set of control points and the decimated dataset (40 times) for the case of 2 images: precision is much worse, by an estimated factor of 4.1. In the case of an analysis based on 24 images, the degradation of precision corresponds to a factor of 1.5. Increasing the number of images will provide a set of curves in Fig. 3 that tend to become horizontal lines, confirming a very limited degradation of precision despite the strong decimation factor. According to these considerations, we can state that increasing the number of images allows one to reduce the number of control points, giving the opportunity to focus on the images (now available in cloud-based services) rather than control points.
The precision of transformation parameters as a function of the number of images and control points. A similar graph is obtained for scale parameters.
The experiments in the previous section demonstrated that a huge number of image points is not significant, as precision improvement depends on
We could say that the results regarding precision for a network of satellite images requiring registration, with a limited number of control points and several images, become equivalent to those of the same network with the full set of control points. The precision achievable with several images and a limited dataset of control points can therefore be quantified as that achievable with the full set of control points, leading to the following relationships for the precision:
in which
The precision of translation parameters provides an immediate evaluation of the overall accuracy achievable from a network with a large number of images. A graphic visualization of
The precision for a network of several images as a function of image points. When the number of images increases, the effect of control points on parameter precision becomes less significant.
The final cost function introduced in Section 2 for a large network of images becomes:
The relationships illustrate that the cost can be minimized by reducing the number of images. On the other hand, reducing the number of images makes it infeasible to obtain sufficient precision with limited control points, as demonstrated in the previous sections. At the same time, precision can be improved by increasing the number of image points, which does not significantly affect CPU time. A network of multiple connected images provides better registration results for images with a weak connection to the final reference system (e.g., few or even no image-to-control points) because the additional images can add new connections inside the network. Such a result can also motivate the use of the large number of satellite images available in web-services, selecting more images than those strictly necessary for the following phases of the work.
This paper presented some considerations on the advantages and disadvantages of image network analysis applied to the registration of satellite images. Results achieved through simulation based on the variance-covariance matrix of least squares adjustment revealed that an increase in the number of images becomes significant when a complete dataset of control points is not available. Increasing the number of matched points is not significant once a regular distribution of only 400 points has been reached.
Results regarding precision for a network of satellite images, in which images have to be registered with a limited number of control points, become more similar to those of the same network with the full set of control points as the number of images increases. From the point of view of automation in satellite image registration, this is a significant point since most work can be directly carried out with the images, i.e., the data available in web-services. We could say that the network will be connected not only through control points (as in the independent processing of image pairs) but also through additional image-to-image connections, which will balance the lack of a large dataset of control points.
The proposed approach was presented for the particular case of registration problems. On the other hand, it could be extended to other practical issues where image redundancy can improve automatic data processing. An example is the case of the mathematical formulation used for atmospheric normalization via corresponding image points, in which the extension from independent image pairs to a network of images can be carried out by introducing a mathematical model that encapsulates all the images. Future work will be carried out to extend the proposed approach to other applications.
