Abstract
Many multimedia objects accept an abstract representation as point sets, or point clouds, in the plane. Searching for objects in a collection is traduced to searching for matching point clouds. In this paper algorithms and data structures are given for indexing and searching point clouds. The indexes are implemented using off-the-shelf, popular, software components. Experimental tests were performed on large databases, including a synthetic database of 10 million point clouds (1000 points per cloud) and the MIR Flickr-1M database, which contains 1 million high-resolution images. The performance of the proposed indexes was evaluated according to: Average search time, construction time, recall@k, memory usage and performance under insertions and deletions. A thorough comparison was performed between the fastest method available in the literature and a repertoire of implementations. The most competitive index is three orders of magnitude faster than the state of the art, in the image database, with recall @ 1 ≥0.989 for 20% insertions and deletions.
Introduction
The impressive development of computer power and digital storage in the last years, along with an ever increasing number of sensors, have motivated a huge increase in the size of multimedia databases. A useful abstraction of a multimedia object is a point cloud, representing an object by a set of points in a two or three-dimensional coordinate system. Some examples of point cloud instances of objects include key point location in images, such as SIFT image descriptors [13], anchor points in audio signal spectrograms in the popular audio matching software Shazam [19] and fingerprints minutiae [8]. Point clouds can be affected by noise, affine transformations, and or insertions and deletions. There are, however, methods to construct affine invariant representations of point clouds, using quotients of Fourier coefficients as in [4]. In other words, using invariants the index only needs to be robust to noise, insertions and deletions. This is the focus of this work. To make the approach scalable and to enable its use by practitioners, only off-the-shelf software components were used for the indexes implementation and the code has been made available through a public software repository [18].
Point cloud retrieval problem
Let a point cloudP be a finite subset of the plane
Since the two matching point clouds might have different sizes, the relative support can be defined dividing by the size of the respective sets. Let
Notice that it is just a generalization of the well known Jaccard’s index, if the match have maximal support, the index have a value of 1.
Please notice that the support set depends on δ. A larger δ gives a larger support. A special case to consider is when the representation space is a discrete grid, then a point cloud is a finite set of
Three approaches to the point cloud recovery problem are considered in this paper. The first one considers the construction of an inverted index, where every point
Up to the best of the knowledge of the authors of this paper, there is only one index tackling the specific point cloud matching problem. This index was presented by Wang in [19], and is the basis of the Shazam app, matching a song recorded with a mobile phone with songs stored in a database. The key idea is to index the highest energy points in a spectrogram as the fingerprint of the audio stream. Songs and audio segments are represented as two-dimensional point clouds. Matching of audio segment is attempted using a collection of tuples (combinatorial hashes) of the form (freq1, freq2, Δ time ). The hashes are indexed in a table and matching audio files are detected by counting the maximum number of common tuples.
The point cloud recovery problem slightly resembles range join or KNN-join problems. The KNN-join have as input two datasets and output the k-nearest neighbors in the complement set. In this setup, Q would be the same query and the set P would be the union of all of the database point clouds. There are several range and KNN-join algorithms in the literature. The main ideas on those proposals are:
A strong assumption of the above KNN-join indexes is that both sets are available at once. In the problem under consideration, the query cloud is not available when the index is constructed. Moreover, the point cloud collection size is much larger than the query point cloud size. Hence, KNN-join indexes discussed above cannot be applied.
Summary of results and experimental setup
The focus of this paper is experimental, aiming at applications and the adoption by practitioners. We let all the implementations open source [18]. A summary of results, and the experimental setup follow. The best result is a data structure with average search time of 14ms for a 10 million point clouds database (1000 points/cloud), obtaining a Recall@1=1. Several indexes were explored for the point cloud recovery problem. A metric space approach and several inverted indexes were tested. There were seven data structures in total. An evaluation of the seven data structures designed for the point cloud recovery problem is presented. Two synthetic databases of ten million point clouds (one thousand points per cloud) were used. Point clouds were of two types, uniformly distributed and from a three fold gaussian mixture. We also compared with the real life MIR Flickr-1M database, which contains 1 million high-resolution images producing point clouds of key points, e.g. SIFT, of various sizes.
Inverted indexes
An inverted index takes all the points in a collection P and points back to the point cloud ID. Every point
Inverted index based on R*-Tree
The R*-Tree proposed by Beckmann et al. [2] is one of the R-Tree variants having better performance. It is based on the same structure as the R-Tree but apply different heuristics to reduce the size and overlap of the minimum bounding rectangles (MBR). The Bulkloading algorithm for R*-Tree construction is employed here, performing a pre-processing of the data to reduce the index size. This may be realized by partitioning and sorting the data as well as estimating the number of MBR needed [11]. This is a fast construction method producing R-Tree indexes with better internal structure.
Inverted index based on Vantage Point Tree
Vantage Point Tree (VPT) index, proposed by Yianilos [20], is a metric space index performing a data decomposition as a balanced binary tree. This index, along with L2 metric were selected to realize similarity searches. The index presents a good performance in similarity searches when the intrinsic dimension of the data is small (in this case is two dimensions).
Inverted indexes using a discrete grid
Inverted Grid Index
The Inverted Grid Index (IGI) presented by Ji et al. [10] was proposed with the main goal of improving KNN and reverse nearest neighbor (RNN) searches in distributed spatial data bases. The index is constructed by partitioning data space in a discrete grid with uniformly distributed cells. An inverted list for each cell is attached. The index was built in Apache Spark with MapReduce.
In the implementation,
Inverted Grid Index + Vantage Point Tree/R*-Tree
A combination of two techniques is applied here. The technique was inspired on the paper of Du and Li [6] on KNN Join based index, they make data decomposition on blocks. For each block, a Hilbert R-Tree is obtained to find the k nearest neighbors of elements in set S for each element of a set R.
Following a similar strategy, a data decomposition using a discrete version was performed, the Inverted Grid Index (see Section 3.1). For each cell, a vector space index is generated using either the R*-tree or the Vantage Point Tree. Notice that a two-level inverted index is obtained, in the first level there is an R*-Tree or a VPT index for each cell. In a second level, a small inverted index is constructed and each cell have attached an R*Tree or a VPT index. The construction and querying of index can easily be parallelized.
Succinct Inverted Grid Index
A succinct data structure stores objects using information as near to the information-theory lower-bound as possible, allowing to perform a set of operations with a small performance penalty, compared to the full versions of the data structure [16].
Okanohara and Sadakane [17] proposed the succinct index sarray for bitmaps. This index requires a sparse bitmap, that is, if m is the number of 1’s in bitmap and n the bitmap size, m ⪡ n. The operation select
v
(B, j), returning j - th bit v ∈ {0, 1} in bitmap B, for any j ≥ 0 was used. This structure requires
A modification of IGI is proposed here. Let ρ be the average amount of points in each cell and t the number of clouds. The average number of elements in each cell can be fixed such as ρ ⪡ t (using an adaptive discrete grid). Then, the inverted identifier lists can be converted to bitmaps, for a posterior succinct array index representation. Using this scheme, the amount of memory used is near the information theoretic minimum.
For each activated cell at query time, an sarray index is recovered and using only select1 operations the identifiers stored in the index are the output. The scenario of this approach is in memory-restricted applications, where storing an explicit index is not feasible. This memory savings come with a price at query time. Let m be as before, the number of 1’s in the bitmap, we have to perform mselect1 operations per sarray to obtain identifiers. Nevertheless, according to results presented on Section 6, under some conditions, this index uses up to 54% less memory and presents an improvement of 37% in search time compared to IGI index. It is important to notice that this memory reduction allows to allocate more point clouds in main memory, which is always a scarce resource.
Metric indexes for point cloud recovery
Assuming that point clouds are located on a discrete grid, then a point cloud is a finite subset of
A compromise among δ and the dimension of representation is observed. Assigning a large δ reduces memory space and dimension of B. However, a considerable reduction on the grid resolution is produced, affecting the selection process on the database. On the other hand, by selecting a small δ, the required memory space and dimension of B increases considerably. Additional factors depending of the specific problem domain should also be considered.
The main focus here is on the bitmap-based metric indexes for point cloud representation utilizing a small amount of memory.
Point cloud index based on sarray and VPT
An alternative for representing the point clouds with bitmaps using a small amount of memory can be implemented when the sarray succinct index is considered. In the test performed and presented later, a bitmap that originally required 1 M bit was stored using only 12 K bits with the sarray index.
Because of that, the point clouds will be represented with a bitmap. The bitmap is stored using a sarray index, and similarity search is realized using VPT.
The evaluation of distance functions involves an inner product calculation, a linear function of each bitmap longitude. As the bitmaps are stored with sarray indexes, the inner product of B1 with B2 can be performed in sublinear time using select1 and intersection algorithm. Inner product is evaluated in O (m1 + m2), with m1, m2 the longitudes of B1, B2, respectively. The use of sarray for point cloud representation in bitmaps allows to attain a close to the information-theoretic minimum memory.
Point cloud index based on relative positions list and VPT
Another form of representing bitmaps is a relative positions list. Let B = {0, 1}
n
a longitude n bitmap, m the number of 1’s in B and j the j - th 1 in bitmap B. For every bitmap of the database, a list l is generated, where the position of each 1 in B is stored, that is:
The inner product evaluation is performed using SIMD Galloping algorithm of Lemire et al. [12].
With this representation, the VPT index is used to perform similarity search.
The evaluation details are presented below, for the insertion and deletion on the database. Search time, construction time and recall@k are evaluation parameters used for performance comparison.
Note that the query schema proposed in previous sections is quite flexible. For example, instead of providing δ, the user can provide a parameter k, such that a single point q of a point cloud matches with a point p from another point cloud, when p is in the kNN of q. In all the experimental tests presented in the Section 6, the parameter k = 1 is used instead of the δ.
Experimental framework
All indexes were constructed in C++, exception made of a second version of IGI, implemented on Apache Spark 2.0.1. This index was constructed using MapReduce methodology. All code is available on GitHub [18].

Libraries used in the implementation of indexes for point clouds.
Besides Apache Spark, the following libraries were used (Fig. 1): Boost 1.6.2, SDSL Lite 2.0 [7], and SIMD Compression and Intersection [12].
The experiments were made on a 64 core server Intel Xeon CPU E7-4809 v3@2.00 GHz with 1.5 TB of RAM, on Ubuntu Linux using G++ 4.7 compiler with -O3 optimization flag. All indexes implemented use a single core, exception made of second version of IGI on Apache Spark, that requires 60 cores.
Two databases were synthetically generated with 10 million two-dimensional point clouds with 1000 points/cloud in [0, 10000) × [0, 10000). Two distributions were considered for the generation: Uniform and a 3 components Gaussian Mixture (GM).
The GM distribution was considered as an adverse case simulation. Representing the case of data distributed over a discrete grid, with many empty cells and a small number of cells containing a big amount of the data.
Experimental tests were performed on 10 K, 100 K, 1 M and 10 M point clouds for both distributions.
Similarly, experimental tests were performed on real data, using the MIR Flickr-1M database, which contains 1 million high-resolution images from Flickr. Both the SIFT and SURF algorithms, were used to obtain the points of interest of each image.
Insertion/deletion evaluation
A subset of 1000 point clouds of the 10M synthetic databases and from the MIR Flickr-1M were selected to evaluate insertion and deletion operations for both distributions on the synthetic data and for the MIR Flickr-1M database according to the following configuration: 333 clouds were modified with 1% deletions and 1% insertions 333 clouds were modified with 5% deletions and 5% insertions 334 clouds were modified with 10% deletions and 10% insertions
Experimental results
This section is devoted to the experimental evaluation of the point cloud indexes. The performance measures focus on construction and searching time, at a certain recall, under insertion and deletion of points in the query cloud. The star goal would be to have sub-second queries for very large databases.
Up to the best of the authors’ knowledge, there is a single index designed for the point cloud recovery problem, that is the Wang index [19]. Consequently, this is the only evaluated index for comparison with the proposed approaches. The index proposed by Wang was implemented on C++ using an inverted index for the indexation process, following [15].
Data structure parameters
After several independent tests with each algorithm, the following parameters of the data structures were selected: R*tree: Max - Elements/Node = 20, Min - Elements/Node = 10 IGI: δ = 10 Wang [19]: t
offset
= 0, Δ
time
= 500, Δ
freq
= 500, Tuples/Anchor = 5
Recall@k - results
All evaluated indexes, including Wang’s attain a recall @ 1 =1 for synthetic databases and a recall @ 1 ≥0.989 for the MIR Flickr-1M.
This indicates that the point cloud abstraction and data structures are robust regarding insertion and deletion. Because of that, the main focus in the evaluations will be on search and construction times.
Synthetic databases - Inverted indexes
Inverted indexes are quite fast. In this section they are compared against the baseline and each other in the alternative implementations.
Search time
Searching times for the inverted indexes are shown on Fig. 2. It can be observed that the best performance is presented by R*-Tree based index of Section 2.1, with average search times of 12 ms and 14 ms for uniformly and GM distributed data, respectively for 10 M point clouds. It should be noticed that Wang’s index average times were of 34 ms y 512 ms for the same data.
Average time for Apache Spark version of IGI index is over 4.6 s for the 10 M database with GM distributed data. Even using 60 cores, the SQL module of Spark does not allow SQL indexes, producing sequential readings of all data [1].

Search time of inverted index based point cloud indexes for a 1NN consult for uniformly and GM distributed data.

Construction time for uniformly and GM distributed point cloud indexes.
Succinct IGI index of Section 3.3 presents comparable, and sometimes better, search times than original IGI index, on 100 K and 1 M point clouds. In some cases requiring 50% less memory. Tables 1 and 2 shows the memory requirements comparison for both structures.
Storage (MB) comparison for IGI and Succinct IGI indexes for uniformly distributed data with respect to the number of point clouds indexed
Storage (MB) comparison for IGI and Succinct IGI indexes for GM distributed data with respect to the number of point clouds indexed
IGI index implemented on Apache Spark attains the best construction time of all evaluated indexes. This is expected because this index uses 60 cores of the server for the construction process.
It should be noticed that the largest construction time is required by VPT based and Wang’s indexes, for both data distributions. The rest of the indexes present comparable construction times, exception made of IGI Spark index, as observed on Fig. 3.

Search times for metric space based indexes of uniformly and GM distributed point clouds, kNN query (k = 1).
The performance obtained with metric space based indexes is presented below.
Search times
Figure 4 present the search times for the metric space based indexes. It can be observed that succinct sarray based indexes attain search times greater than 1 s on 10 K and 100 K point clouds. As the goal is to be able to do similarity searches in less than 1 s for those databases, those indexes present a poor performance compared with the results of the previous section. It should be noticed that experimental tests were also conducted for bitmaps of higher dimension (1 M and 100 M), resulting in similar search times.
As the point clouds are represented by high dimensional (1 × 106) bitmaps, metric indexes are affected by the dimensionality curse, causing search times not competitive with those of Section 6.3, based on inverted indexes.
Synthetic databases - Overall comparison
A summary of experimental results is presented on Table 3 for the GM distributed 10 M point clouds database with construction and search times, standard deviation of search time, memory required, dependencies and implementation difficulty.
Two difficulty levels are assessed for the implementation (as a guide for the practitioner): Level 1 - Easy: An API is available and the dependencies are easily installed. Leve 2 - Moderate: There is not an API available but the dependencies are easily installed.
From Table 3 it can be observed that the best search times are attained by RTree kNN and VPT.
Point cloud index comparison for GM distributed data with 10 M clouds. The timeout label means search time is ⪢1
Point cloud index comparison for GM distributed data with 10 M clouds. The timeout label means search time is ⪢1
Search time - SIFT

Search time for point clouds indexes based on inverted indexes for 1NN - Flickr-1M MIR database with SIFT algorithm.
Figure 5 shows the average search time for point clouds obtained from the SIFT key points in an image database. According with experimental tests, Wang’s index has the largest average search time of all inverted indexes based indexes. Furthermore, the search time of Wang’s index is 3 orders of magnitude larger than the index based on VPT and almost 3 orders larger than the indexes based on IGI + R*-Tree and IGI + VPT. The VPT-based index has the best average search time of 2.8 ms compared with 3.2 s in Wang’s index, that is 1.14 × 103 times slower.

Search time for point clouds indexes based on inverted indexes for 1NN - Flickr-1M MIR database with SURF algorithm.
For SURF key points, the results are similar. We can observe that in the Fig. 6, Wang’s index search time has one of the highest average search times of all indices for point points based on inverted indexes, with a difference of almost 3 orders of magnitude with respect to the indexes for point clouds based on R*-Tree, VPT, IGI + R*-Tree and IGI + VPT. The VPT-based index has the best average search time of 3 ms, compared to 2.06 s of the Wang index.
For all of the previous experiments approximately 1000 key points per object were used in the synthetic and the Flickr-1M MIR databases. This effectively upper bound the average search and construction time, because in practice, the average search time is proportional to the number of points in a point cloud. Another measure of interest would be a lower bound on the number of key points needed to to retrieve a point cloud from a database, maintaining recall @ 1 ≈1, under the effect of insertions and deletions.
Search results in MIR Flickr-1M database for 25 points of interest per point clouds using the SIFT algorithm (23% of insertions and deletions per cloud)
Search results in MIR Flickr-1M database for 25 points of interest per point clouds using the SIFT algorithm (23% of insertions and deletions per cloud)
In Table 4 it can be observed that the average search time is improved in two orders of magnitude by using 25 points of interest instead of the 1000 points used in the experiments. Here the average search time was 36 us for the VPT-based Index, compared to 189 us for the Wang’s index. Note that in this experimental trial, the effect of insertions and deletions increased from an average of 5.33% of insertions/deletions from the previous tests, to a 23.33% of insertions/deletions maintaining recall @ 1 ≥0.999 for proposed indexes and recall @ 1 ≥0.989 for Wang’s index.
Three schemes for the point cloud matching problem were explored in this paper. Namely, inverted indexes, discrete grids and metric spaces as shown in Fig. 7.

Schemes for the point cloud recovery problem.
It was experimentally shown that the point cloud abstraction and the proposed data structures are robust to insertions and deletions, achieving a recall@1 = 1 in synthetic databases. The fastest index, the R*-Tree based, achieves an average search time of 12 and 14 ms, respectively, for uniform and Gaussian mixtures of 10 M point clouds. For the same data, the implementation of the only known point cloud data index (Wang’s) takes 34 and 512 ms, respectively.
The superiority of R*-Tree and VPT based algorithms can be explained noticing that the inverted list size of those algorithms its controlled by k in the kNN search. On the other side, Wang’s index is constructed on a time-frequency discrete grid. Due to this, search time increases for denser data distributions.
Since only off-the-shelf, well known tools, are used, this approach is amenable for practitioners in several tasks of pattern recognition and multimedia indexing. Further improvements can boost the IGI index in Apache Spark by using other tools, such as Apache Ignite, to improve the performance of SQL queries, but ultimately SQL does not seem to fit for this task. Another open avenue of performance improvement comes from parallelization. In this work only one processor core was used at query time.
Footnotes
Acknowledgments
This research was partially supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT) grant CB-179795. The first author had a graduate scholarship from CONACyT.
