Abstract
Approximate Membership Query (AMG) data structures are widely used in the networking area recently. The most popular AMQ data structures are Bloom, Quotient and Cuckoo Filter (CF). The CF is more efficient than the Quotient and Bloom filters in term of the lookup time and false positive rate. An issue in CF is utilization of two hash functions in the query phase that increases the lookup time. Another issue is the relocation loop problem in the programming phase that attempts to find an empty bucket for the collied items. In this paper, a new approach called Single-hash Lookup Cuckoo Filter (SLCF) using one hash function to lookup an item is proposed. In the programming phase, SLCF utilizes a sequence of hash functions without any relocation to find an empty bucket and storing the overflowed items. It finds a bucket with an empty place and sets an index in the original bucket to the address of found bucket. To accelerate the query phase, a handle as a tiny part of the fingerprint is placed in conjunction with the index value in the bucket. In the query phase, only one hash function is used and if the queried item cannot be found in the bucket, the handle is investigated to membership checking. Otherwise, the second bucket pointed by the index is accessed. The simulation results for name search in the named data networking demonstrate that the lookup time is improved 27% for 16 million names in comparison to CF.
Introduction
Emerging the new paradigms in computer networking, such as Named Data Networking (NDN) and Software-Defined Networking (SDN) need efficient data structures with high-speed lookup time. In NDN, the name lookup in Forwarding Information Base (FIB) and Pending Interest Table (PIT) and in SDN, the many-field packet classification in the flow tables of forwarding devices are challenging problems. Therefore, the using of efficient data structures to design the FIB, PIT and flow tables are definitely important. On the other side, the Approximate Membership Query (AMQ) data structures have been used widely in the networking area in recent years. The popular AMQ data structures are Bloom, Quotient and Cuckoo filters. They use a small amount of memory and are able to identify whether an element is member of a set that previously programmed or not. An AMQ data structure can answer that an element is not definitely in a set or may exist in a set with high probability. Among the AMQ data structures, Cuckoo filter is the most recent one which has a lower false positive rate and faster lookup operation in comparison to the Bloom and Quotient filters. Cuckoo Filter (CF) [5] is a set-membership data structure based on the Cuckoo hash table [10]. It takes a fingerprint of the inserting items and then uses an array of buckets and two hash functions to locate the inserting items. If the first place is filled, it will be placed in second place. Accordingly, each inserted item always will be in one of its own places. In this paper, in the programming phase, when the first bucket is overflowed we utilize a set of k hash functions sequentially that generates different addresses, hence the generated addresses by means of the different hash functions are investigated until the empty bucket is found. The address of the found bucket is set as an entry in index-array. In the meantime, a handle field of the overflow bucket is filled with the tiny part of the fingerprint to accelerate the search operation in the query phase. In the query phase, a single-hash approach is utilized that performs the lookup operation using a single hash operation. In each bucket, an index-array in conjunction with a tiny part of fingerprints of overflowed items is used. The index-array is used to point to the overflowed items in the second bucket found by hash functions in the programming phase. The tiny part of a fingerprint which called handle is used for fast lookup. The handle is a small part of the most significant bits of the fingerprint. The simulation results for the name search in named data networking show that the proposed approach can improve the lookup throughput 27% for 16 million names in comparison to the Cuckoo filter. The main contribution of this paper is in the following:
Proposal of using a single hash function in the query phase of the Cuckoo filter to accelerate the search operation.
Proposal of using multiple hashing functions to find the bucket with the empty bucket to avoid the relocation problem in the Cuckoo filter.
The rest of this paper is organized as follows. Section 2 presents related research in this area. Section 4 introduces the single-hash lookup Cuckoo filter concept and the related algorithms. Section 5 describes the evaluation of the SLCF approach and the Section 6 concludes the paper.
Related work
Some efforts have been done to improve the lookup time of the Cuckoo filter. The “dynamic cuckoo filter” [9] enables reliable delete operation for a dynamic set and exploits a novel extendible and compressible structure to make the data structure space efficient. “SmartCuckoo” [15] is proposed to address relocation loop problem. They proposed a hashing scheme using a direct pseudo forest to placement of the inserted item that solves the endless loop in relocation phase. The “length aware cuckoo filter (LCAF) for IP lookup” [7] uses different numbers of hash functions to store and search for entries based on the prefix length popularity of routing entries. Using the LACF decrease the false positive rate while the memory usage is increased. A new general-purpose AMQ, the counting quotient filter (CQF) [11] supports approximate membership testing and counting the occurrences of items in a data set. This general-purpose AMQ is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting (even on skewed data sets), resizing, merging, and highly concurrent access. The paper reports on the structure’s performance on both manufactured and application-generated data sets. In their experiments, the CQF performs in-memory inserts and queries up to an order-of-magnitude faster than the original quotient filter, several times faster than a Bloom filter, and similarly to the cuckoo filter. New dynamic set intersection data structures, which we call 2–3 cuckoo filters and hash tables is introduced [4]. These structures differ from the standard cuckoo hash tables and cuckoo filters in that they choose two out of three locations to store each item, instead of one out of two, ensuring that any item in an intersection of two structures will have at least one common location in both structures. They use these structures to design improved algorithms for dynamic set intersection, which can, in turn, be used for answering triangle listing queries faster than with previous approaches, in the practical RAM and EM models. Name Component Encoding method has been proposed in [16] which speeds up the name lookup through encoding components to codes before searching in FIB. However, the mappings between components and codes are stored in a character trie, hence the encoding process will potentially slow down the name lookup noticeably.
In [8] a name prefix matching technique has been proposed that depends on using Bloom filter as a pre-searching tool before accessing the hashing-based name prefix trie. The main issue that arise when using Bloom filter is the use of multiple hash functions that slowing down the system.
In [1], the authors propose a new variant of membership query data structure, called quotient-based Cuckoo filter (QCF), that reflects a merging process between QF and CF to minimise the calculation overhead and employing it in a DPI system. QCF employs only two hash functions and consumes less calculation overhead than BF, QF, and CF. In [13], a new approach called fast two-dimensional filter with hash table (FTDF-HT) is proposed. In the proposed approach, a hash table storing the name prefixes, in a hierarchical manner, is only accessed when the proposed filter (FTDF) states that the name under querying exists. Thus, reducing the unnecessary access to the hash table. Moreover, the access to the hash table will be done with same hash function that is used for FTDF and this will minimize the search time.
In [14], a novel PIT implementation called FTDF-PIT for NDN is proposed. This new design depends on employing an approximate data structure that called fast two dimensional filter (FTDF) which has better performance than the other proposed filters (Bloom and Quotient filters).
In our proposed work, the standard cuckoo filter is improved to accelerate the query time and the relocation operation is overcome.
Cuckoo filter concept
The Cuckoo filter [5] is a compact data structure for approximate set-membership queries where elements can be added and removed dynamically in
Cuckoo hashing basics
Cuckoo hash table [10] consists of an array of buckets where each element has two candidate buckets specified by two hash functions

(a) Before inserting element x. (b) After element x inserted. (c) A Cuckoo filter, two hash per element and functions and four entries per bucket.
The role of
Therefore, an insertion only utilizes information in the table, and never has to retrieve the original element x.
Single-hash lookup cuckoo filter
Single-hash Lookup Cuckoo Filter (SLCF) is a data structure for approximate set-membership based on the standard Cuckoo filter. SLCF uses multiple hash functions in the programming phase rather than doing relocation by removing items. During the insertion phase, if each address is filled, the algorithm calculates a new address using a new hash function.

(a) Schema of programming of SLCF with k hash functions. (b) SLCF query with one hashing function.
Figure 2(a) shows a schema of the programming phase of SLCF with k hash functions. In AMQ data structures, by given two hash functions, it can be generated others using an equation introduced by Kirsch and Mitzenmacher [6] as can be observed in Eq. (3).

An example of initially stored items a, b, and c in the SLCF.
Algorithm 1 depicts the insertion procedure. Initially, the fingerprint of the given item x in line 2 and then the first hash function in line 3 are calculated. In lines 4–7, If the initial bucket which is obtained by the hash function has an empty entry, therefore, the fingerprint is inserted, otherwise, in lines 8 and 9, more hash functions are calculated until a maximum number of attempts (considered as 150). Then, in line 11, the fingerprint is inserted at this entry of the bucket and in line 12, the Most Significant Bits (MSB) of the fingerprint as the “handle” is extracted and in line 13, a Relocate Index (RI) for this fingerprint in the initial bucket toward the inserted bucket is set. The size of RI is equal to the sum of a tiny part of the fingerprint (the “handle”) and an index for containing the address of the inserted item’s bucket, which can be different depending on the implementation. We considered 8 bits of the fingerprint as the “handle”.

Insert
Algorithm 2 depicts the lookup procedure. Initially, the fingerprint of the given item x in line 2 and then the first hash function in line 3 are calculated. In line 4, if the algorithm finds the fingerprint in the bucket of the first hash functions’ bucket, then it is assumed that x exists in the CF in line 5, otherwise, the algorithm attempts to find an RI which contains the “handle” of x’s fingerprint in the bucket. If it is successful to find the RI (line 7) then it is assumed that x exists in the CF (line 8), and if more than one RI are found, the algorithm starts to discovering the RIs until it finds the fingerprint (lines 10–16). Otherwise, it returns a negative answer to discovering x’s fingerprint in the CF (line 17).

Lookup
Algorithm 3 depicts the deletion procedure. Initially, the fingerprint of the given item x in line 2 and the first hash function in line 3 are calculated. If it finds the fingerprint in the first hash function’s bucket (line 4), it will be deleted (line 5), otherwise, in lines 8–15, the algorithm attempt to find the fingerprint’s RI in the bucket. If it finds any RI, the RI will be deleted and the procedure will be terminated.

Delete
Our approach is evaluated using JAVA language on a machine with the following specifications: Operating system is Microsoft Windows 10, the processor is Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz, 2 cores, 4 logical processors, 4MB cache and memory is 8GB DDR3 RAM. We utilized the standard dataset of Content Name Collection (CNC) which contains unique and different length names [12] generated for NDN networks. The dataset has the following features: Hierarchical infrastructure of URL names, no duplicates, ordered, lowercase and uppercase letters, application layer protocol removed, and “www” removed. There are some examples of the dataset names: “/com/vimeo/33561523”, “/com/canaryislandgoing/index.php”, “/com/blogspot/ cocinaconsilvia/2011/12/ turron-de-chocolate-con-almendras.html”, and “/com/facebook/ photo.php”
In the implementation, the utilized hash function for SLCF is xxHash [2] and for CF are xxHash and Murmur3. It is assumed
We ran each experiment 20 times and then calculated the average results. Three existing approaches for name matching in NDN have been chosen for comparison with our approach: Bloom Hash (BH) [3], Components-Trie (CT) [16] and Name prefix trie with Bloom filter (NPT-BF) [8].
Lookup evaluation
Figure 4 shows the lookup throughput of BF, CF, Bloom Hash (BH) [3], Components-Trie (CT) [16], Name prefix trie with Bloom filter (NPT-BF) [8] and SLCF. We considered the number of entries per each bucket (e) equals to 4 because in this case, the memory usage is the most efficient [5]. The results in Fig. 4 demonstrate that SLCF improves lookup time by average approximately 19% in comparison to CF. The improvement of lookup time is due to that when the CF is on the verge of overflow, more items have been relocated, hence for more items it needs to calculate the second hash function. In addition, Fig. 4 shows that the SLCF outperforms the BH, CT, NPT-BF and CF and improves the lookup time.
Moreover, we evaluated the lookup procedure for different number of entries per each bucket which is shown in Table 1. The results demonstrate that lookup time is improved approximately by an average of 21%.

Lookup time comparison of BF, CF, and SLCF in milliseconds
1 Number of entries per each bucket of CF and SLCF.
2 Improvement of SLCF than CF.
Figure 5 shows the comparison of the insertion throughput. The results demonstrate that SLCF’s insertion throughput is improved than CF by average approximately 8%. In CF, time is lost in two ways while relocating an item. The first is calculating a hash function of the stored item’s fingerprint and performing an XOR operation and the second is performing the relocation operation. However, in SLCF, the only required operation is calculating a hash function without relocating the item.

Insertion throughput comparison of BF, CF, and SLCF.
Figure 6 shows the comparison of the deletion throughput. The results demonstrate that SLCF is improved than CF by average approximately 9% of deletion time. This improvement is due to that in the CF if the algorithm does not found an item by the hash function, it performs an XOR operation to find the item. However, in SLCF, discovering the existence of an RI does not require any calculation.

Deletion throughput comparison between CF and SLCF.
We evaluated the load factor (α) by means of an individual experiment. We performed it by inserting maximum possible items in order to obtain the overflow. The number of inserted items (the maximum number of possible insertable items) was different in CF and SLCF. Therefore the load factor using Eq. (4) is measured.
Load factor comparison between CF and SLCF
Load factor comparison between CF and SLCF
1 Number of entries per each bucket of CF and SLCF.
2 Improvement of SLCF than CF.
The results demonstrate that SLCF is improved than CF by average approximately 4% in load factor. This improvement is significant if the number of entries per each bucket is lower. This improvement is due to that in SLCF, sufficient bit count is extracted depending on the number of buckets (22 bits in our evaluation) from the hash function in the inserting phase. There is a possible circumstance which the hash functions generated by Eq. (3) are repeated. To solve this problem in such cases, a different substring of the bits from one of the hash functions in Eq. (3) is extracted and utilized to generate others.
Figure 7 shows the comparison of memory usage measured after insertion items.

Memory usage comparison of BF, CF, and SLCF.
It shows by average approximately 34% increase in memory usage. It is applied 32 bits for each address and 8 bits for each tiny fingerprint handle in an RI. In our experiment, CF and SLCF have
The false positive rate in SLCF is increased than CF by average of approximately
False positive rate comparison between CF and SLCF
False positive rate comparison between CF and SLCF

False positive comparison between CF and SLCF. (a) In term of number of buckets. (b) In term of load factor.
As can be observed in Figs 8(a) and (b), the false positive of SLCF is close to the CF, while the lookup time of SLCF has been improved.
In this paper, a variation of Cuckoo filter called Single-hash Lookup Cuckoo Filter (SLCF) was proposed. The SLCF utilizes k hash functions in the programming phase to overcome the relocation overheads. It additionally utilized a relocation index to point to the bucket which stores overflowed items and a handle to store a tiny fingerprint of overflowed items. In the query phase, the SLCF uses a single hash function to perform the lookup. In this phase, initially, it will check the tiny fingerprint then, if there is more than one matched tiny fingerprint, the fingerprints in the overflowed bucket will be investigated.
