Many-field packet classification using AMQ-R-tree

Abstract

In recent years, a modern networking paradigm called Software-Defined Networking (SDN) has been introduced to manage the large networks and datacenters. This paradigm was introduced to keep pace with the rapid growth rates on the Internet and to support advanced network applications. SDN uses an open standard protocol called OpenFlow to manage the network traffic. In OpenFlow switches, the software control plane has been decoupled from the hardware data plane and placed in a controller which has a network-wide view. OpenFlow process all incoming packets that have the same characteristics (lie in the same class) in the same manner. In this paper, we propose and use a generalized extension of BR-tree data structure called AMQ-R tree to organize the rule-sets in a multi-dimensional tree integrated with approximate membership query data structures including Bloom, Quotient and Cuckoo filters in order to get higher quality packet classification performance. Three types of approximate membership query filters integrated with the R-tree have been designed and implemented. Also, a multi-threading technique has been used to accelerate the packet classification by querying the tree in a parallel manner. The experimental results using simulated rule-sets show that this technique can classify incoming packets in up to 1.4 MPPS (Mega Packet per Second) with very small memory requirements.

Keywords

Software defined networking OpenFlow many-field packet classification R-tree approximate membership query Cuckoo filter

1. Introduction

Software Defined Networking (SDN) is a modern networking paradigm based on the OpenFlow (OF) protocol [30] that decouples the network software control plane from the hardware data forwarding plane [7]. In SDN, the network applications are equipped with a global network view so as to support complex implementations, such as fault tolerance, network verification, quality-of-service (QoS), access control, and network security. To support modern network applications, SDN needs a kernel network function called packet classification.

Unlike the traditional packet classification problem that concerts in forwarding packets according to five fields (source/destination IP addresses, source/destination port numbers, and protocol type), in OpenFlow packet classification, to route a packet, network global view are required at each network switch to decide the behavior of the packet, where the packet behavior is a set of switch actions on that packet [22]. The OpenFlow contains a set of predefined rules called rule-set, depending on the OpenFlow version, each rule in that rule-set contains a set of matching fields. OpenFlow version 1.4 supports up to 40 matching fields [37], these fields have arbitrary size and required different types of match. In the SDN packet classification, more than 15 fields in the incoming packet header fields are required to check against the rule-set in order to decide the packet behavior, hence it called many-field packet classification. The same behaviors are set to all incoming packets that belong to the same rule in a rule-set [14]. Recent rule-sets contain thousands or more of 18 fields’ rules, which make classifying packets against such rule-sets at the line speed a challenging problem [14].

In recent years, a large group of researchers made their interest to develop solutions to improve packet classification performance. Many hardware and software OpenFlow packet classification solutions have been proposed. The most of the hardware solutions use parallel lookup techniques to perform high-speed classification. They use a special type of high-speed memory such as ternary content addressable memory (TCAM) devices to perform their solutions, but using such hardware devices may limit the size of rule-sets [39]. To address the scalability problems, software solutions have been proposed [8,10]. Software solutions need more resources and they are more scalable such that they affect the classification performance significantly. Software solutions can be classified into several classes. The solutions that use tree data structures have a good memory accesses rate per classification, but they duplicate the classification rules among the tree more than one time, hence they suffered from high memory requirements [9,36]. Recently, the approximate membership query (AMQ) data structures have become an important component in much research area [15]. AMQ is a data structure used in many databases, routers, and storage system to handle the large amounts of data that they process. An AMQ data structure represents a large set of elements in a small amount of memory and supports a membership query with small false positive probability. In this work, we propose a generalized extension of BR-tree for many-field packet classification using the rectangular tree (R-tree) to speed up the classification throughput integrated with an approximate membership query data structure to minimize the memory requirements. R-tree requires only ${log}_{M} N$ memory accesses to visit the tree, where each rule is represented as Minimum Bounded Rectangle (MBR) in the R-tree by using its source IP address ( $SIP$ ) and destination IP address ( $DIP$ ).

The Bloom filter based R-tree (BR-tree) has been used, and two new extensions of BR-tree including the Quotient filter based R-tree (QR-tree) and the Cuckoo filter based R-tree (CR-tree) have been proposed in this work and then used in the many-field packet classification. In this work, we call the R-tree that integrated with an approximate membership query data structure as AMQ-R-tree. The AMQ data structure (filter) uses a little amount of memory to represent large rule-sets, also the AMQ data structure can handle the arbitrary size of rule fields. Using the R-tree data structure for packet classification provides high classification throughput, but it needs additional memory requirements to store metadata in the internal nodes. Hence, the AMQ data structures have been used to reduce the memory requirements, and to eliminate the redundant R-tree search at the similar branch of the tree. In the performance evaluation section, we perform a performance comparison with the other well-known many-field packet classification algorithms. Our proposed solution achieves better search performance and memory requirement than other many-field packet classification algorithms in the most cases. The main contributions of this paper are as follows:

Proposing of two new extensions of BR-tree including Quotient filter based R-tree (QR-tree) and Cuckoo filter based R-tree (CR-tree), that we name them AMQ-R-trees.

Using AMQ-R-trees for many-field packet classification.

Evaluating three variants of AMQ-R-tree (Bloom filter based R-tree, Quotient filter based R-tree, and Cuckoo filter based R-tree).

Performing a performance comparison with other many-field packet classification algorithms.

The rest of this paper is organized as follows. The related works are discussed in Section 2. The approximate membership query and BR-tree data structures concept are described in Section 3. The proposed AMQ-R-tree is introduced in Section 4, and the proposed AMQ-R-tree in many-field packet classification is presented in Section 5. A performance evaluation and simulation results are described in Section 6, and a conclusion about the paper is presented in Section 7.

2. Related works

Packet classification is an important network kernel function, therefore, it takes a big attention from the network developers. The classical packet classification is the network function that used by the traditional network devices to deal with forwarding packet according to 5 packet header fields only. Many-field packet classification is the network function that used to classify packets into flows according to a large predefined set of rules called rule-set. Each rule in that rule-set contains up to 18 matching fields. In the following sections, we will describe the current classical and many-field packet classification techniques briefly.

2.1. Classical packet classification

Many of hardware and software solutions for the classical packet classification problem have been proposed. Hardware solutions such as Ternary Content Addressable Memory (TCAM) [27] and Bitmap Intersection [26,39] provide a good classification performance by using a parallel lookup, but using hardware resources limit the size of the forwarding tables and rule-sets. Hence, the software solutions are required to solve the scalability problem. Software solutions use extendible data structures to store the rule-sets and forwarding tables. The current classical packet classification solutions can be classified as:

Basic Search Algorithms: This group contains the simplest solutions for packet classification such as Linear Search, Hierarchical Tries (H-Tries), and Hierarchical Binary Search Tree. The linear search algorithm uses a simple list to store the rules in decreasing priority order and searches for matching in that list in a sequential manner. Hierarchical tries use the prefixes to recursively build two-dimensional trie, and then, for each incoming packet, it traverses that trie looking for matching rule. The solutions that belong to this group store the rule-sets in a basic data structure with a small amount of memory, but it suffers from the scalability problem.

Heuristic algorithms: This group contains solutions such as the Recursive Flow Classification (RFC) [16], and Distributed Cross-producing of Field Labels (DCFL) [40]. The classification query is divided into a number of exact match queries. RFC solution can be used for classical and many-field packet classification. It divides the rule-set into a smaller set and performs the same single-field matching over several phases. This group requires more preprocessing time to perform rule-sets and query dividing.

Decomposition-based algorithms: In this group, the rule-sets are divided into several single-field rule sets, and then lookup for incoming packet fields has performed separately. The results are then combined from all fields. Field Split Bit-Vector (FSBV) [23] is an example of this group solution. This group spends more additional memory and processing time to merge the fields results in order to obtain the final result.

Decision Tree-Based Algorithm: In decision tree-based solutions, each header field is defined as a search space, and then, these spaces are divided into smaller sub-spaces and so on. To classify incoming packet, it performs a linear search on these sub-spaces. The HyperCuts [36,46] algorithm generates a shorter decision tree by performing multiple field cutting per step, to form shorter decision tree. It has high scalability and low rule-set dependency. The main Decision-tree-based solutions drawback is its high memory requirement.

Some researchers have turned their research on finding alternative ways to provide the packet classification services. The authors in [13] studied the non-linear dynamics of some specified traffic flow types and proposed a traffic classification approach. They based their analysis on non-linear statistics of the IP traffic flows that are not affected by payload encryption or dynamic port change. The non-linear analysis can be considered as an eminent alternative to linear analysis of traffic data. The authors in [11] use a cuckoo hashing for field-programmable gate arrays-based IP address filtering. Their proposed hash function is pipelined reconfigurable and supports parallel computation. They achieved a constant worst case lookup by using cuckoo hashing instead of the classical hashing in FPGAs. The classical hash functions have linear lookup and may map many IP addresses to a single table position while Cuckoo hashing guaranteed mapping each IP address to a unique hashing table position. The authors in [21] use neural network based on training function to perform packet classification. In their work, they used Artificial Neural Network (ANN) with combination of hidden layer neuron and training functions as an effective tool for packet classification. The authors in [28] study the existing decision-tree algorithms and their cutting schemes weaknesses and propose a new cutting schemes, they call their scheme BitCuts. BitCuts generates a decision-tree algorithm performs bit-level cutting and achieves 2.0×–2.2× the throughput of HyperCuts.

2.2. Many-field packet classification

Recently, the researchers focus in their works on many-field packet classification problem that classifies the incoming packets according to the rule-sets of more than 15 fields in each of their rules. We will list a group of the most recent works that try to accelerate the packet classification from different directions. A new approach to identify properties of the many-field packet classifier has been proposed in [25]. This approach has a linear space and logarithmic time complexity. It can classify packets with fields of more types including range type, but this approach uses a hardware device (TCAM). These devices are expensive and have a limited size that prevents to scale such approach to work on large rule-sets. Many software many-field packet classification algorithms had been proposed, many of these algorithms use tree structures to represent the rulesets, some of them use a probabilistic data structures to accelerate querying these trees. The authors in [20] proposed a packet classification algorithm using Bloom filter in a leaf-pushing area-based quad-tree. They hash the tree nodes in Bloom filter that stored in on-chip memories. Another Bloom search solution has been proposed in [42], using the GPU-accelerated implementation. The authors in [29] proposed a decision-tree based solution for many-field packet classification, they called their solution Binary search on levels (BSOL). They represent the rules as a hyper-rectangle in a two-dimensional address space, and then they generate a binary decision tree and associate each tree node with space. The space of each node is divided between its two children. They associate each node with a filter list. They concluded that their solution has a high classification speed. The authors of [9] employed a replication control scheme on the Binary search on levels (BSOL) algorithm to minimize the memory requirements, and the processing latency, they call the new algorithm as (BSOL-RC). The memory requirement is still high because they have to store the entire rule-set plus many additional data in their tree. A two-dimensional pipelined method for many-field packet classification has been proposed in [35], they used a two-dimensional array of Processing Elements (PEs). The array works as horizontal and vertical pipeline at the same time, and its PEs can access their memory locally and can support prefix, and exact match. Their method requires a special high-performance processing elements.

The authors in [34] proposed many-field packet classification approach using range-tree [45,47] and hash table [33] to build an efficient search algorithm. Firstly, they construct a range tree or hash table for each filed according to on the field’s type. Next they query their data structure for each field of the incoming packets independently. Later they merge the partial searching results to compute the final result. This approach also requires large preprocessing time and need high processing latency including the time required to reach in the range trees, the time required to search in the hash tables and the merging time.

In our previous work [2], we used the classical R-tree for packet classification. In this work, we introduce a generalized extension of BR-tree called AMQ-R-tree that integrates AMQ data structure with R-tree. We use this AMQ-R-tree in many-field packet classification, first, we use the Bloom filter based R-tree structure (BR-tree), and then we introduce and use two new data structures named Quotient filter based R-tree (QR-tree), and Cuckoo filter based R-tree (CR-tree). In the proposed solution, we avoid all the factors that restrict the scalability of the classifier and we shift the tree construction to the off-line phase to avoid the preprocessing time that is needed by the most of the classification methods that described in this section.

3. Approximate membership query and BR-tree data structures concepts

In this section, we will provide a general background about all of the data structures that we need in our work.

3.1. BR-tree

A BR-tree [19] is d-dimensional tree data structure. Each node in the BR-tree is R-tree node in the combination with Bloom filter array. Figure 1 depicts an example of a simple BR-tree data structure. In BR-tree of M degree, each node can contain between (m and M) d-dimensional entries, $m ⩾ \frac{M}{2}$ . In BR-tree nodes, each entry attribute is stored as a hashed item in the corresponding Bloom filter.

Fig. 1.

An example of Bloom-filter-based R-tree data structure.

BR0 is the root node and contains a big range to represent all possible attributes. In BR-tree of M degree, each internal node can contain m $(\frac{M}{2} ⩽ m ⩽ M)$ pointers to next level nodes, and the leaf nodes contain m $(\frac{M}{2} ⩽ m ⩽ M)$ pointers to the database location. When the number of entries becomes more than M for some nodes, BR-tree performs splitting on that node to generate two new nodes. Each node contains m entries as minimum allowed number of entries. Each internal node contains up to M entries in the form of $({MBR}_{i}, Bloomfilter, Pointer)$ where ${MBR}_{i} = (m b r_{0}, m b r_{1}, \dots, m b r_{M - 1})$ is a rectangle that bounds the entry range in minimum area called Minimum Bounded Rectangle (MBR), Pointer is a link to the next level node, and Bloom filter is a bit-array which stores all hashed attributes of the corresponding item. The internal nodes store a meta-data and ranges about their children, and the real data are stored only in the leaf nodes that appear in the last tree level. The leaf nodes are defined as $({MBR}_{i}, Bloomfilter, Item-pointer)$ where Item-pointer represents a pointer to the database location. BR-tree guarantees the data integrity so that the range union of all entries at each level is the same as the range union as the upper level.

3.2. Approximate membership query data structures

The approximate membership query (AMQ) data structures use a small amount of memory and have the ability to identify whether an item has a match in a large previously programmed dictionary or not. The AMQ data structures can answer the item is definitely not in the set or it may be in the set with high probability [6]. In this work, three types of AMQ filters have been used. In the next sections, we will introduce a summary of the filters that used in this work for approximate membership query purpose.

3.2.1. Bloom filter

Bloom filter (BF) [24] proposed by Burton Howard Bloom in 1970 is a probabilistic data structure, used to query about the elements membership. When querying BF about an item if it in a set, the result will be possibly in the set, with some error probability (false positive), or definitely not in the set without any false negative. The classical BF is a bit-array structure, where the items can be hashed to the BF, but cannot remove. Advanced BF versions (such as Counting BF) have the ability to remove items. The BF contains an array of m bits all set to 0, to add n new items to a BF of size m, first, feed each of these items to K different hash functions. These hash functions return K array positions, and set these positions in the BF array to 1. To query BF for an item, first feed that item to the same K hash functions, if all the K positions that returned from hash functions are already set to 1 in the BF array, then the item possibly in the set. If any of the K positions is 0 in the BF array, then the item is definitely not in the set. The size of the BF m, and the number of hash functions k are related to the required false positive rate. Also adding more items to BF increase the probability of false positive. BF false positives probability $= {(1 - e^{- k n / m})}^{k}$ .

Fig. 2.

A simple example to add three items ${X, Y, Z}$ to BF, and querying the BF for an item ${W}$ .

Figure 2 depicts a simple example to add three items ${X, Y, Z}$ to the BF, and querying the BF for an item ${W}$ . In this example, to add $(n = 3)$ elements, we feed each of them to $(K = 3)$ hash functions, and set the resulted position to 1. The colored arrows point to the resulted positions in the BF array of size $(m = 17)$ in this example. To query for an element W, feeds it to the same hash functions, and checks the resulted K positions if any of them pointing to a location its value is 0, then the element is not in that set, otherwise, it is possibly in that set. In the example, W is not in the set ${X, Y, Z}$ , and one of its pointers points on 0.

3.2.2. Quotient filter

A Quotient Filter (QF) is a probabilistic data structure proposed by Bender et al. in 2011 [4]. Like Bloom filter, QF has the ability to reply either the queried item is definitely not in the set or it possibly in the set. QF has better false positive rate than BF. There is a trade-off between the false positive rate and the filter size. Also adding more items to fixed size QF increases the false positive rate. In addition to insertion and query operations, QF can perform removing operation without affects the storage or the false positive rate. Also, the resize and merge operations can perform without needing of additional storage or re-hashing operations. Unlike the BF, an empty QF is a set of m slots, each slot contains a key and three additional meta-data bits. QF uses only one hash function to generate a fingerprint of p- bits from the inserted item. It uses r least significant bits called the remainder as a key, and the quotient $q = p - r$ most significant bits to set the slot number (insertion position) for that item. The quotient bits $(q)$ determine the QF size so that the QF has $2^{q}$ slots.

QF uses three meta-data bits to deal with collisions (when different items got the same keys and the same slot number). The quotient filter defines a $run$ as a set of fingerprints having the same quotient ( $q = p - r$ ) most significant bits that stored in contiguous slots, and the run whose first fingerprint occupies its canonical slot indicates the start of a $cluster$ . The $cluster$ is a set of runs ends by the unoccupied slot or the start of another cluster. The three additional meta-data bits are: $is_occupied$ , $is_continuation$ , and $is_shifted$ . The $is_occupied$ specifies whether the slot is the canonical slot for a key stored in the filter, $is_continuation$ is set only when slot is occupied but its remainder is not the first remainder in its run, a slot with $is_continuation$ clear indicates a start of a new run and end of the current run, $is_shifted$ is set to 1 when the corresponding remainder stored not in its canonical slot and when set to 0 indicates the start of a cluster.

The look up for an item X in a quotient filter is performed as follows. First, we hash the item X using a hash function to produce the $X_{p}$ -bits fingerprint, then use the $X_{r}$ least significant bits as a key (the remainder), and the quotient $X_{q} = X_{p} - X_{r}$ most significant bits as a slot number [15]. The slot number $X_{q}$ is the item’s canonical slot. If the three metadata bits are not set, then that slot will be empty and the quotient filter does not contain the item. Otherwise, if slot $X_{q}$ is occupied, we must find the quotient run (a set of reminders that have the same quotient and stored contiguously), and the start of the cluster (a contiguous set of runs). We must start from the canonical slot and scan left to find the start of the cluster (a slot with $is_shifted$ is not set), and then scan right to find the quotient’s run. We can find the quotient’s run by scanning and counting the runs to the right. We increment the run’s count when find any slot has $is_occupied$ set, and decrement the run count when find a slot with its $is_continuation$ not set (a start of another run and means end of the current run). When we reach the quotient’s run (when the run’s count reaches zero), we compare each remainder in that run with the $X_{r}$ -bits, if it is not found then the item is definitely not in that quotient filter, otherwise, it is possibly in the quotient filter with some error probability. To insert an item X in a quotient filter, we follow the same steps of looking up until we find quotient’s run and conclude that the key is not in the quotient filter. Then we insert the $X_{r}$ -bits (the reminder) in a slot in the quotient’s run. The run must remain in a sorted order, we should shift the remainders in that cluster and update the slot bits. If we shift an existing run during the insertion then we should set its $is_shifted$ bit. If we insert the $X_{r}$ at the start of other run, then that run will be shifted and we should set its $is_continuation$ bits.

3.2.3. Cuckoo filter

A Cuckoo Filter (CF) [43] is another probabilistic data structure. Its idea was originated from Cuckoo hashing, which was proposed by Pagh et al. [32]. CF can perform the usual probabilistic data structures operations (insertion, and query), as well as removing, re-size, and merging. An empty CF is an array of buckets, each bucket can store K fingerprints. CF usually uses two hash functions to generate a key and two positions. If the CF fails to insert the key in the first position (the slot already contains k keys), then it will try to insert it to the second position. If the alternative position is also full, then the CF uses specific operation called reallocation operation to deal with such situations. An advanced version of CF uses only one position to insert the element’s key, and if it fails to insert the key in that position, then it generates another position using the key in order to accelerate the reallocation operation. CF has better query performance and fewer memory requirements than both BF and QF. Figure 3 depicts a Cuckoo filter with size of 8 buckets, each bucket can store k=1 item. Initially, the items b and c are already in the Cuckoo filter. Figure 3(a) depicts how to add element “a” into the filter, $h 1 (a) = 1$ , $h 2 (a) = 6$ , we will check both buckets if one of them is empty we can add $f h (a)$ into that empty bucket, in this case, both $h 1 (a) = 1$ , and $h 2 (a) = 6$ buckets are empty, then we select one of them randomly (bucket(6)) and insert $f h (a)$ into it. Figure 3(b) depicts how to add element “d” into the filter, $h 1 (d) = 2$ , $h 2 (d) = 5$ . In this case, one bucket is empty (bucket(5)), then we can add $f h (d)$ into that empty bucket. Figure 3(c) depicts how to add element “e” into the filter, $h 1 (e) = 2$ , $h 2 (e) = 6$ . In this case, both buckets are not empty, then we add it into one of that buckets (bucket 6), kicks out the existing item (“a”), and reinserts this old item to its own alternate location (bucket 1). The process of kicking out of existing items and reinserting them in their alternate locations called reallocation process. In some cases, reallocating the old item may also require kicking out another existing item, and this procedure may repeat until a vacant bucket is found or the maximum number of reallocation is reached (e.g., 150 times in our implementation). Even with long sequence of reallocation process, Cuckoo filter still has $O (1)$ insertion time and high space occupancy.

To lookup Cuckoo filter for a given item x, first calculate x’s fingerprint $f h (x)$ and two candidate buckets $h 1 (x)$ , and $h 2 (x)$ , and then read these two buckets. If there is fingerprint match $f h (x)$ in these two buckets then returns true. Otherwise, returns false. The Cuckoo filter has more advanced features such as: it ensures that there is no false negatives as long as bucket overflow never occurs, it has better lookup performance and it supports deleting items dynamically [17].

Fig. 3.

(a) Inserting “a” at bucket (6). (b) Inserting “d” at bucket (5). (c) Inserting item “e” at bucket (6).

4. The proposed approximate membership query-based R-tree (AMQ-R-tree)

In this section, we propose AMQ-R-tree, which is a new extension of the BR-tree that uses the rectangular tree with integrated AMQ filters in its nodes. The AMQ-R-tree data structure is a combination of root, internal nodes, and leaves. Figure 4 shows the main structure of the AMQ-R-tree. Each node in AMQ-R-tree is an R-tree node integrated with an AMQ filter where an AMQ filter is a small data structure that hashes a set with n items by applying k hash functions on that set. In AMQ-R-tree, the root node’s entries (e.g., AMQ-R1, and AMQ-R2) hash all possible attributes. Let M be the degree of AMQ-R-tree (the maximum number of the entries of each node), the root node contains (1 to M) entries, internal nodes and leaves contain m entries $(\frac{M}{2} ⩽ m ⩽ M)$ . AMQ-R-tree can dynamically organize elements that contain multi-dimensional information such as rectangles or polygons and geographical objects. To represent such set of objects where each object contains p multidimensional attributes and q identities. The root and internal nodes contain entries in the form of (MBR, DAMQ, Pointer) where MBR is a d-dimensions bounded rectangle which covers the p multidimensional attributes. DAMQ is a dynamic length AMQ filter which stores the q identities whose multidimensional attributes are represented by the MBR, and the Pointer is the address of a child node. The leaves contain entries in the form of (MBR, SAMQ, Pointer) where MBR is a d-dimensions bounded rectangle which covers the p multidimensional attributes. SAMQ is a static length AMQ filter which stores the q identities whose multidimensional attributes are represented by the MBR, and the Pointer is address of items in database.

Fig. 4.

An example of AMQ-R-tree data structure.

4.1. AMQ-R-tree insertion

To insert a new entry to an AMQ-R-tree, subsequent steps should be followed. We describe the insertion process in a simple example depicted in Fig. 5 that shows how to insert the objects of Table 1. In this table there is a set of objects, each object has one multidimensional attribute ( $range$ ), two identities ( $length$ , and $color$ ). We need to insert these objects one by one in an empty R-tree of degree $M = 3$ (maximum number of entries at each node) and m $⩾ \frac{M}{2}$ (minimum number of entries at each node = 2). In the first step, we generate an MBR from each entry. Since an inserted MBR should be placed in a leaf node, we first need to locate the appropriate leaf node and then insert it. To locate the appropriate leaf, the tree from the root node to the leaves is traversed. At each level, the node entry that can cover the new MBR with the minimum area is selected, and the search is moved to its child until reaching the leaves. At the corresponding leaf, if that leaf can accommodate the new MBR, then the AMQ-R-tree inserts that MBR into that leaf and updates the MBRs in the path from the root to that leaf so that all of them should cover the new MBR. For the instance, Fig. 5(a) inserts K, L, and M to the root node. In this case, the root node is a leaf at the same time. Figure 5(c) depicts the insertion of items J to O.

Table 1
A simple example of 8 objects, each object with one multidimensional attribute and two identities

Object name K L M N J O P I

Range 9–15 10–19 20–25 23–30 3–9 27–35 35–40 1–5

Length 7 10 6 8 7 9 6 5

Color green white black yellow blue purple black red

Object name	K	L	M	N	J	O	P	I
Range	9–15	10–19	20–25	23–30	3–9	27–35	35–40	1–5
Length	7	10	6	8	7	9	6	5
Color	green	white	black	yellow	blue	purple	black	red

Fig. 5.

The full steps of generating R-tree from the objects in Table 1.

In the case of the selected leaf is already full, then splitting operation is performed on the selected leaf (Figs 5(b) and (d)). The splitting operation is performed by selecting two seeds ( $s 1$ , $s 2$ ) from the M entries that the full node contains plus the new entry (in Fig. 5(d) $M = {M, N, O}$ and the new entry is P so that $s 1$ and $s 2$ have the maximum distance between them among all other pairs ( $s 1 = M$ , $s 2 = P$ ). Two new nodes (Node1, Node2) have been formed so that Node1 contains $s 1$ , and Node2 contains $s 2$ (Node1 contains M, and Node2 contains P). The remaining entries $(M - 2) {N, O}$ have been allocated to Node1, and Node2 depending on which of Node1, and Node2 can accommodate each of the remaining entries with minimum area (allocate N to Node1, O to Node2), then a new entries in the parent have been generated and the parent’s entries node have been updated to cover these two new nodes (generate F and update $E^{'} s$ range in the parent). After that, AMQ-R-tree performs splitting into the upper levels if necessary. If splitting is required on the root node, then it splits the root node into two new nodes and generates a new root with two entries to cover the newly generated nodes, and increases the tree height by one. As can be seen in Fig. 5(e), inserting I to leaf node that is already full needs to split the old leaf, and adding a new entry to its parent and updating the other entries in its parent. Since the parent (the root in this case) is also already full (contain 3 entries $D, E$ , and F), the root node has to split too. The splitting operation is performed by selecting two seeds ( $C, F$ ) from the M entries that the full node contains plus the new entry ( $D, E, F$ , and the new entry C). Then forms two new nodes (Node1, Node2) so that Node1 contains C, Node2 contains F, allocates D to Node1, and E to Node2. After that, it generates a new root node with two entries (A, and B) so that A covers $C, D$ and B covers $E, F$ , then increases the tree height by one.

4.2. AMQ programming

The second step to construct AMQ-R-tree is to add an AMQ filter for each node’s entry. The AMQ filter is a small data structure which stores all hashed attributes of the corresponding entry. In the AMQ-R-tree root and internal nodes, the AMQ filter stores the hashed attributes of all the nodes in the lower levels that belong to its corresponding entry. In our implementation of AMQ-R-tree, a dynamic length AMQ filter at each tree level has been used, starting from the leaves. We select the static size m of required AMQ filter in order to insert all of node entries’ item identities with false positive rate less than decided upper bound of false positive probability. Then group all leaf entries AMQ filters in one larger dynamic length AMQ filter to hash them at the next upper level AMQ filters, so that the size of AMQ filter in the next upper level can hold all AMQ filters for the entries that bounded by that node entry.

Fig. 6.

An example of how to add AMQ filters to R-tree nodes, SAMQ shows the static AMQ filter and DAMQ shows dynamic AMQ filter.

To program AMQ filter, we use a set S of hash functions, each hash function returns insertion position with P bits, the number of P bits depends on the used AMQ filter. To insert a new item to the AMQ filter, we feed it to all of these S hash functions, each of the S hash functions returns P bits, since the size of the AMQ filters is variable at each tree level. We select a d bits from these P bits to fit the size of the AMQ filters at the corresponding level. Figure 6 shows a simple example of how to add AMQ filters to R-tree nodes. After constructing the R-tree, we start from the leaves and hash all the item identities in a static length AMQ filter. We select a d bits from the P bits that generated by the S hash functions. In the next upper level, we hash all the item identities in a dynamic length AMQ filter of the parent node, and so on till the root. To hash the item identities in the dynamic length AMQ filter of the parent node, we use $d + c$ bits so that c is a positive integer value, so that $d + c ⩽ p$ , and $d + c$ fits the length of the dynamic length AMQ filter of the parent.

4.3. AMQ-R-tree query

To query AMQ-R-tree for an item, first we should create MBR from that item’s multi-dimensional attributes and feed its identities to the same S hash functions that used at AMQ filters programming, then the AMQ-R-tree will traverse from the root to the leaves. At each level, we select the appropriate node entry that its MBR covers the queried MBR with minimum area, and its filter gives a positive result to the query identities. When querying the AMQ filters, we select appropriate number from the bits returned from the S hash functions according to the AMQ filter size at that level. At the leaves level, if the queried MBR matches any leaf entry and the AMQ filter gives a positive result for all the query attributes, we report that leaf’s entry (or entries) as a matching result.

5. Many-field packet classification using AMQ-R tree

Figure 7 depicts the main structure of our proposed classifier.

Fig. 7.

The proposed approximate membership query-based R-tree (AMQ-R-tree) data structure.

As it can be observed, the proposed classifier classifies incoming packets using the proposed AMQ-R-tree in two main phases:

Phase 1: Off-line AMQ-R-tree construction, in this phase, the proposed AMQ-R-tree maps the rule-set into an R-tree based on the source and destination address prefixes (SIP & DIP), then appends a static length AMQ filter for each leaf node entry and programs all the remaining fields from each rule to its corresponding AMQ filter. After that, the AMQ-R-tree creates a dynamic length AMQ filter for each node entry at the next upper level to describe all of the leaves that it covers. Subsequently, performs the same procedure until reaching the root, so that the AMQ filters at the root node entries hash all the rule-set.

Phase 2: On-line AMQ-R-tree query, in this phase, the proposed algorithm classifies N incoming packets by using K threads. The tree from phase 1 is used to accelerate classification process.

The Bloom Filter (BF), Quotient Filter (QF), and Cuckoo Filter (CF) have been used as AMQ filters to create AMQ-R-trees. The following subsections describe the proposed classifier in details.

5.1. Minimum bounding rectangles (MBRs) creation

For each rule in the rule-set, we generate one MBR as the tree entry. To create MBR, first we convert the SIP & DIP from each rule into range values, and represent these ranges as MBR, then describe the similar objects with bigger MBR in the next higher level. The converting of prefixes into ranges is performed by representing the minimum allowed value for the corresponding prefix as the start of the range, and the maximum allowed value as the end of the range. Table 2 depicts a simple example of 13 rules with SIP & DIP and their corresponding MBRs after conversion. Figure 8(a) depicts the corresponding MBRs for Table 2 after representing them in the geographic coordinate system. The dotted blue rectangles (R3-R7) represent the upper-level rectangles that AMQ-R-tree define, and the dotted black rectangles (R1, and R2) represent the root rectangles that AMQ-R-tree define. Figure 8(b) shows the corresponding AMQ-R-tree for Fig. 8(a).

Table 2
A simple example of 13 rules with SIP & DIP length = 8 and their corresponding ranges after conversion

Rule Name R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20

SIP 000* 00* 010* 0011* 011* 001* 000* 00* 11* 111* 1* 10100010 101*

DIP 0101* 010* 001111* 101* 1101* 111* 1001* 1011* 10* 00* 001* 1111000 01*

SIP range 0–31 0–63 64–95 48–63 96–127 32–63 0–31 0–63 192–255 224–255 128–255 220–220 160–191

DIP range 80–95 64–95 60–63 160–191 208–223 224–255 144–159 176–191 128–191 0–63 32–63 100–100 64–127

Rule Name	R8	R9	R10	R11	R12	R13	R14	R15	R16	R17	R18	R19	R20
SIP	000*	00*	010*	0011*	011*	001*	000*	00*	11*	111*	1*	10100010	101*
DIP	0101*	010*	001111*	101*	1101*	111*	1001*	1011*	10*	00*	001*	1111000	01*
SIP range	0–31	0–63	64–95	48–63	96–127	32–63	0–31	0–63	192–255	224–255	128–255	220–220	160–191
DIP range	80–95	64–95	60–63	160–191	208–223	224–255	144–159	176–191	128–191	0–63	32–63	100–100	64–127

Fig. 8.

Minimum bounding rectangles (MBRs) creation & AMQ-R-tree construction.

5.2. AMQ-R-tree insertion and programming operations

In our implementation of AMQ-R-tree, three different types of AMQ filters (Bloom Filter (BF), Quotient Filter (QF), and Cuckoo Filter (CF)) have been used to generate three different types of AMQ-R-tree that we named them BR-tree, QR-tree, and CR-tree. To insert a new rule to an AMQ-R-tree, first we generate an MBR from SIP & DIP, and then traverse the tree from the root node to the leaves, and perform the insertion operation as described in Section 4. After that, the remaining fields from each rule is hashed in AMQ filter. Each of the proposed trees adopts different standards for the programming operation. The following sections describe the programming operation for each of the proposed trees.

5.2.1. BR-tree programming

In our implementation of BR-tree, a dynamic length Bloom filter has been used in each node entry, and a set S of different hash functions have been used. Each leaf in BR-tree contains m $(\frac{M}{2} ⩽ m ⩽ M)$ entries in MBR. Each MBR represents the SIP & DIP for one rule from the rule-set, and the reminder fields of the rule are programmed in a Bloom filter attached with the MBR. Four different hash functions have been used to program each field (Murmurhashing2 [38], CRC32 [31], Bernsteinhash [5], Highwayhash [3]). A small register has been attached with each MBR to mention the fields that not required for matching (wild-card fields) in the corresponding rule. After that, the small BFs from each leaf has been represented by one larger BF at the next upper-level nodes, and so on until reaching the root. The BFs at the root node describes the entire rule-set’s fields. To prevent calculating the hashing indexes more times, we use a hash function that returns P-bits at each level. We select a number of bits to represent indexes appropriate for the size of the BF at the corresponding tree level.

5.2.2. QR-tree programming

In our implementation of QR-tree, we use a variable number of slots as needed in the QF for each tree node. We use Murmurhashing2 [38] to generate P-bits fingerprint. QF takes the R (remainder) least significant bits of these P-bits as the insertion position, and $Q = P - R$ bits as key to inserted instead of inserting the elements. In QR-tree, we use the same length R least significant bits as remainder with variable length of most significant bits as $Q = P - R$ quotient bits according to the Quotient filter size in the corresponding tree node.

At each leaf, we insert the reminder fields in the node by using the fingerprint bits that is generated by the hash function. These fingerprints are generated one time only and used the same number of R-bit and a variable number of quotient bits depending on the size of the QF at the corresponding level. The small QFs from each leaf has been represented by one larger QF at the next upper-level nodes to insert these fields, and so on until reaching the root. Table 3 depicts a simple example to show the size and value of the R, and Q bits when the word “TCP” in RQ-tree with 5 levels using Murmur hash function is inserted. After that Murmur hash function returns “110100110011110010001110” as a fingerprint. From Table 3, it can be seen that the QF at the root node has $2^{16}$ slots, and at the leaves has $2^{6} = 64$ slots only.

Table 3
A simple example to show the size and value of the remainder bits (R), and quotient bits (Q) when hashing the word “TCP” in a leaf node at the proposed RQ-tree with 5 levels using Murmur hash function

Tree level R size Q size R value Q value

Root 8 bits 16 bits 10001110 1101001100111100

level 2 8 bits 13 bits 10001110 1101001100111

level 3 8 bits 11 bits 10001110 11010011001

level 4 8 bits 8 bits 10001110 11010011

leaf 8 bits 6 bits 10001110 110100

Tree level	R size	Q size	R value	Q value
Root	8 bits	16 bits	10001110	1101001100111100
level 2	8 bits	13 bits	10001110	1101001100111
level 3	8 bits	11 bits	10001110	11010011001
level 4	8 bits	8 bits	10001110	11010011
leaf	8 bits	6 bits	10001110	110100

5.2.3. CR-tree programming

The Cuckoo filter (CF) is a table of buckets, that each bucket stores M keys. CF uses two hash functions to provide two possible locations in the buckets table for each key, and one hash function to generate the key. In our implementation of CR-tree, we use a variable size CF’s buckets table at each tree level. In each location of buckets table, we use a bucket that can store eight keys. In general case, we use only two hash functions, one to set location and the other for generating item key. In the case of the set position bucket is already full, we perform XOR operation between the two hash function results (location XOR key) to get a new position. In the case of that new location’s bucket also is already full, we put the new key in that location and give one of the old items as an alternative location by performing XOR between its current location and its key. The operation of setting alternative location to an item is called reallocation operation. In our implementation, we perform the reallocation operation 150 times if required. If there are a need to performing more than 150 reallocation operations to insert an item, then we decide that the CF is full, and extend the size of CF to insert the remaining items.

The insertion operation in CR-tree is very similar to QR-tree insertion. We use increasing size CF’s buckets table at each tree level. To insert a rule’s field to the CF, we calculate the hash functions for all rule fields one time only at the leaves and insert all remainder fields keys at that node’s CF at locations commensurate with the size of the current node’s CF size by selecting a set of bits from positions bits. Then move to the next upper levels until reaching the root. At each level, we insert these rule’s fields at the corresponding node’s entry.

5.3. On-line AMQ-R-tree query

To classify N incoming packets, K independent threads have been used, where each thread is responsible for classifying $\frac{N}{K}$ incoming packets. Each thread will perform the same on-line AMQ-R-tree query, then the intermediate results from each thread will add to the answer set.

Each thread, to classify an incoming packet (test whether it maps a rule in the rule-set), first it creates the query’s MBR from the incoming packet’s SIP & DIP, and feeds the remaining fields to the hash functions to get the keys and locations. Then the thread performs two queries simultaneously, one is MBR query, and the second is reminder fields approximate membership query. The MBR query is the same query at each of the proposed trees, where the thread will query the AMQ-R-tree by traversing it from the root to the leaves. At each level, it selects an appropriate node that its MBR covers the queried MBR with minimum area, then it moves the search to the next tree level and so on until reaching the leaves. At the leaves, it performs a matching between the queried MBR and the selected node’s MBRs and reports the corresponding rule as a primary search result, if there is any match. The thread will report these results as final results if it is a success at the remainder fields approximate membership query too. To perform the remainder fields approximate membership query, there are three cases:

Using Bloom filter based R-tree (BR-tree): We feed each of the remaining fields to K hash functions to get k array positions for each field, then traverse the BR-tree from the root to the leaves at each level. We map the array positions to the Dynamic Bloom filter DBF integrated at the appropriate node entry (the same node entry that resulted from the MBR query at the current level). If any of the bits at these positions is 0 and it mentioned as not wild-card field in wild-card register then the packet definitely not matches any rule in that rule-set, and we end the search (if it was, then all the bits would have been set to 1 when it was programmed). If all bits are 1 and the MBR query success at that level, we move the search to next tree level and so on until reaching the leaves. At the leaves, we perform the array position mapping with BF.

Using Quotient Filter based R-tree (QR-tree): The query in QR-tree is very fast and simple. For each incoming packet first, we generate MBR and calculate the fingerprints for the incoming packet fields and then we traverse the tree from root to leaves. At each level, we select all node entries that can cover the queried MBR and its QF gives a positive result for all the remaining fields taking into account the appropriate length for quotient bits at every level. If there are no node entries that can cover the queried MBR or any of fields get a negative result in QF query, then the QR-tree reports a negative result for the queried packet, otherwise moves the search into the next level and repeats these steps until reaching the leaves.

Using Cuckoo Filter based R-tree (CR-tree): To query an incoming packet in CR-tree, we calculate the MBR and hash functions for all incoming packet fields at the beginning and traverse the tree from root to the leaves. At each level, we choose the node entry that can cover the rule’s MBR in a smaller area, and map all remainder fingerprints with that node’s CF. Taking into account the appropriate position at every level that commensurates with the size of the current node’s CF size by ignoring a set of bits from previous level positions. Note that when query CF for an item, we should try in its alternative position if we fail to find its fingerprint in its position’s bucket. If any of incoming packet’s fields gives a negative result in CF, we will try in other node entries that can cover the queried MBR. If there are no node entries that can cover the queried MBR and query fails in its CF, then the RC-tree reports a negative result for the queried packet. Otherwise, we move the search into the next level and repeat these steps until reaching the leaves.

If there is more than one rule in the result, then the rule with the highest priority is selected, and the results from each thread are added to the answer set.

5.4. Theoretical analysis

In this section, we use the most commonly used two metrics in literature including query time, and memory consumption.

5.4.1. Query time analysis

Query time (T) is the amount of time that the classifier needed to classify single packet theoretically. It has been proved in [18] that the maximum height (h) for an R-tree that stores N items (rules) and has m minimum number of entries at each node is: $\begin{array}{l} (1) & h_{max} = ⌈ {log}_{m} N ⌉ - 1 . \end{array}$ To query an R-tree, we need to visit one branch only at each level. If we store one rule at each R-tree entry and each rule contains Q fields, then the R-tree query time is: $\begin{array}{l} (2) & T_{R-tree} = ⌈ Q {log}_{m} N ⌉ - 1 . \end{array}$

Since we use one AMQ filter at each node’s entry, then we need to query m filters at each level. Bloom filter requires $O (K)$ query time where K is the number of hash functions [24], and the total query time for the BR-tree will be: $\begin{array}{l} (3) & T_{BR-tree} = O (K {log}_{m} N) . \end{array}$ Note that Q is much bigger than K.

Quotient filter requires $O ({log}_{2} l \times p)$ where l is the number of elements [4], and p is the number of slots, then the total query time for QR-tree is: $\begin{array}{l} (4) & T_{QR-tree} = O (({log}_{m} N) \times ({log}_{2} l \times p)) ≃ O ({log}_{m} N) . \end{array}$ Cuckoo filter it requires a fixed number memory accesses (at most two memory accesses) [17]. The total query time for CR-tree is: $\begin{array}{l} (5) & T_{CR-tree} = O ({log}_{m} N) . \end{array}$

5.4.2. Memory consumption analysis

Memory consumption M is the amount of memory needed to store the rule-set in the proposed structure. Since we store the whole rule-set fields in the AMQ filters at each tree level, and the R-tree has ${log}_{m} N$ levels. To store rule-set with N rules that each rule has K fields. Since we have constructed the R-tree using two fields only and hashed the remaining fields in integrated AMQ structures, then the total space size that required in case of BR-tree. Each Bloom filter requires s bits. $\begin{array}{l} (6) & s = \frac{g ln P}{{(ln 2)}^{2}}, \end{array}$ where g is the number of hashed items and P is the maximum allowed probability of false positive. The whole ruleset has been hashed at each level with $(g - 2)$ fields at each rule. Then the BR-tree memory space will be: $\begin{array}{l} (7) & M_{BR-tree} = ({log}_{m} N) \times (\frac{(N (g - 2)) ln P}{{(ln 2)}^{2}} + 2 N) . \end{array}$ In case of QR-tree, the Quotient filter requires s bits [4]. $\begin{array}{l} (8) & s = b \times 2^{q} \times (r^{'} + log r), \end{array}$ where b is the number of bucket in the hash table, r is the remainder, and $r^{'}$ is the size of reduced remainder measured in bits. The total space size that is required to store the BR-tree in the memory is: $\begin{array}{l} (9) & M_{QR-tree} = ({log}_{m} N) \times (N (g - 2) \times (b \times 2^{q} \times (r^{'} + log r)) + 2 N) . \end{array}$ In case of CR-tree, to store one item in Cuckoo filter required C bits [12]. $\begin{array}{l} (10) & C = \frac{f}{α} bits, \end{array}$ where f is the fingerprint size, and α is the load factor. To store N rules with K fields for each rule, then the total space size that is required to store the CR-tree in the memory is: $\begin{array}{l} (11) & M_{CR-tree} = ({log}_{m} N) \times ((N (K - 2)) (\frac{f}{α} bits) + 2 N) . \end{array}$

6. Performance evaluation

In this Section, we will evaluate our work in term of different performance metrics. Simulation using Embarcadero RAD Studio 2010 (Delphi 2010) at Core-i7 machine with 8 GB RAM has been performed. For this simulation, we have created a number of rule-sets by using Classbench tool [1,41]. These rule-sets have similar characteristics to the real rule-sets and have different types (access control list (ACL), firewall (FW), and IP chain (IPC)), with different size (about 1K, 5K, and 10K rules) for each type. Table 4 depicts the simulated rule-sets and the tracing data size and specifications.

Table 4
The simulation rule-set and tracing data type, size, and specifications

Rule-set type ACL 1K ACL 5K ACL 10K IPC 1K IPC 5K IPC 10K FW 1K FW 5K FW 10K Input tracing packets

No. of rule 916 4417 9602 937 4459 9037 790 4652 9602 100, 1K,

Specifications Medium overlapping, high wild-card Medium overlapping, low wild-card High overlapping, medium wild-card 10K, 100K

Rule-set type	ACL 1K	ACL 5K	ACL 10K	IPC 1K	IPC 5K	IPC 10K	FW 1K	FW 5K	FW 10K	Input tracing packets
No. of rule	916	4417	9602	937	4459	9037	790	4652	9602	100, 1K,

Specifications	Medium overlapping, high wild-card	Medium overlapping, low wild-card	High overlapping, medium wild-card	10K, 100K

Table 5

Performance comparison in terms of the required storage, and number of memory accesses in both average and worst cases using variable size and type of rule-sets for our proposed trees and other packet classification algorithms

Rule-set type		ACL 1K	ACL 5K	IPC 1K	IPC 5K	FW 1K	FW 5K
No. of rules (N)		902	4,660	972	4,468	852	4,351
Memory requirement (Kbytes)	BR-tree	27.1	176	31.1	171	34.2	183
	QR-tree	32.2	201	34.2	199	39.1	215
	CR-tree	30.2	198	32.3	192	35.2	205
	BSOL-RC	463.1	893	324.1	567	250.1	513.4
	DBAMCP	161.1	211.5	90.1	188.2	118.3	323.2
	BF-AQT	177.2	230.7	125.3	265.8	184.2	644.4
Memory accesses (average)	BR-tree	23.1	26.4	21.3	24.5	25.1	35.2
	QR-tree	21.5	24.2	19.5	22.6	24.1	34.1
	CR-tree	18.4	20.0	19.4	22.2	20.1	30.3
	BSOL-RC	26	32	21.5	27.9	28.1	37.7
	DBAMCP	25.2	29.1	20.8	26.3	25.7	35.7
	BF-AQT	23.1	32.1	66.9	87.4	261.1	349.3
Memory accesses worst-case	BR-tree	55	84	55	84	55	84
	QR-tree	43	53	42	54	43	54
	CR-tree	44	46	39	44	38	46
	BSOL-RC	51	78	50	62	49	71
	DBAMCP	46	71	45	58	52	63
	BF-AQT	49	71	70	119	201	497

6.1. Memory requirement and memory access evaluation

The search speed is highly depended on AMQ filter performance and the number of memory accesses. In this evaluation, the memory access has been considered as the most time-consuming operation. We have used the dynamic size AMQ filters for each of our proposed trees (BR-tree, QR-tree, and CR-tree), then the average of memory access number to classify packets using each of the proposed data structures have been counted. In Table 5, we show performance comparison in terms of the required storage, and the number of memory accesses in both average and worst cases using variable size and type of rule-sets for our proposed trees against three many-field packet classification algorithms (A Decomposition-Based Approach for Scalable Many-Field Packet Classification on Multi-core Processors (DBAMCP) [34] , Binary Search on Levels with Replication Control (BSOL-RC) [9], and Packet Classification Using a Bloom Filter in a Leaf-Pushing Area-based Quad-Trie (BF-AQT) [20]). We have chose these algorithms because they related to our proposed algorithm so that they use tree structures in their solutions. In [20] they use a tree structure combined with a bloom filter.

As can be seen, the performance of the proposed classification algorithm for all the simulated rule-sets is distinctly better or similar to the other solutions. In BSOL-RC, each node in the tree has a filter list, these filters overlap the spaces of the corresponding nodes. The filters of each node are inserted to the child that overlaps its space. In the case of the parent space overlap both of its two children, insert the filter in both of its two children. However, inserting the parent space in both of its two children causes a memory expansion. Also, when the number of replicated filters became unacceptable due to the high decision tree, they divide the filter set into multi-subsets, and they generate a hash table for each tree level contains a leaf, this may generate a large set of hash tables and affects the classification performance. The range tree in DBAMCP has empty nodes and more tree levels than CR-tree in our approach. Also, it uses separated filters to stores the remainders fields. Hence, it required more memory accesses and storage memory. The BF-AQT builds a leaf-pushing Area base Quad Tree by assigning codewords for $SIP & DIP$ fields. They give every $SIP & DIP$ that have the same length one codewords. For $SIP & DIP$ that have varied lengths, they determine the codewords by the shorter length. For example, for a given rule with SIP&DIP = (11*, 101*), since DIP is longer, they extend the SIP to three bits, 110* and 111*, then they generate two codewords for each one combined with DIP. For $SIP & DIP$ that varied in length with large number of bits, the BF-AQT generates a huge number of codewords and stores these codewords in a hash table. Hence, it requires more storage memory and memory accesses.

Table 5 shows that the performance of the proposed solution on the IPC rule-sets type is better than other types in terms of worst and average number of memory accesses. Since IPC type has a small number of the overlapped rules, the more overlapped rules lead to more tree overlapped MBRs, which make the internal tree nodes have the minimum allowed number of entries that affect the tree height, and increases the required number of memory access. In case of FW type, it has worst performance, since it has many overlapped rules that affect the tree height and needs more number of memory accesses to traverse the tree. The number of memory accesses that required in the ACL type is between two other rule-sets. Our proposed trees show very good performance in comparison to the other packet classification algorithms, especially CR-tree that requires storing single fingerprint for each entry, and one or two memory access only. CR-tree performs as the best data structure of our proposed data structures and almost the best one between all the other techniques.

CR-tree has high scalability and low memory storage as we describe in Section 6.4 comparing to other many-field packet classification. BSOL-RC stores the replicated filters in a new decision tree, and this will increase its memory usage and memory accesses. DBAMCP uses unbalanced range tree with many empty unused nodes to stores prefixes and ranges fields, these nodes cost more memory storage and memory accesses. BF-AQT needs to store large number of rules to represents small rulesets, and huge number of rule to represents big rulesets especially for those rulesets that contain $SIP & DIP$ with different size.

6.2. Processing latency

The processing latency (L) is defined as average processing time to classify a single packet. We assume that all 15 field of the packet header is already available for use. To measure the processing latency, we use rule-sets of type IPC with variable size (1K, 2K, 4K, 8K, 16K, 32K, and 64K) rules. Then we compare the processing latency of our approach with the many-field packet classification in [9] (BSOL-RC), [34] (DBAMCP), [20] (BF-AQT). Each of these approaches uses rule-sets generated by the same tool (Classbench tool) that we use to generate our rule-sets. The DBAMCP uses range-tree and hashing to search the fields of the input packet header in parallel. It merges the partial results from all the fields in rules ID to produce the final match results. BSOL-RC enhances the binary search on levels BSOL [44] by employing a replication control scheme. The range tree in DBAMCP has more empty nodes and more tree level than R-tree in our approach. Also, it uses a separated filters to store the remainders fields. The overall processing latency of DBAMCP contains range tree search, filter query, and partial results merging process. Hence, as can be seen in Fig. 9, the proposed approaches latency outperform the DBAMCP approach. CR-tree outperforms the DBAMCP approach by (2×) times with respect to various rule-set sizes. The BSOL-RC constructs an unbalanced binary search tree and uses a set of hash tables to store the leaves nodes for every tree level that contains a leaf. Moreover, in BSOL-RC, the rule-set may divide into more than two sub-rule-sets because that single decision tree may not be scalable for large rule-sets. Also BSOL-RC dynamically stores replicated filters in a new decision tree to result in fewer memory requirements. Hence, as can be seen in 9, the BSOL-RC outperforms the DBAMCP only in the small rule-sets, and increasing the rule-sets size increases BSOL-RC latency rapidly. Also, the proposed approaches outperform BSOL-RC and BF-AQT with respect to various rule-set sizes.

Fig. 9.

Processing latency comparison with respect of various rule-set sizes.

6.3. Classification throughput

To evaluate the throughput for our proposed approaches, we define classification throughput as the number of the incoming packets that our system can classify in a second measured as Mega packet per second (MPPS). We have generated variable size of rule-sets, for each type (100, 1k, 5k, and 10k) rules. Then we perform the tree query for N incoming packets using a variable number of threads, at each time. We have used a varied number of tracing data as incoming packets. We have generated the tracing data by the same tool that we use to generate the simulated rule-sets. The results show that the best number of the threads should be used in our machine is 32. For example Fig. 10(a), (b), and (c) show the throughput of our proposed BR-tree, QR-tree, and CR-tree, respectively. All of these trees were built for IPC rule-sets using a different number of threads to classify a different number of the incoming packets. The best number of threads to be used for classification is highly depends on the used machine specifications, and it is variable from machine to other.

Fig. 10.

The average throughput for BR-tree, QR-tree, and CR-tree using a different number of threads.

As can be seen in Fig. 10(a) that BR-tree can classify the incoming packets in up to 1 MPPS when working on rule-set of type IPC with size 100 rules, and up to 0.830 MPPS when working on the rule-set of type IPC with size 10K rules. For QR-tree and CR-tree, we get the same results about the number of threads. Figure 10(b) shows the average throughput for QR-tree using a different number of threads to query IPC rule-set. Figure 10(c) shows the average throughput for CR-tree using a different number of threads to query IPC rule-set. From this results, we conclude that the best number of threads to classify packets in our proposed trees in our machine was 32 threads. In the rest of this paper, we will use 32 threads to measure all of the proposed data structures performance. Figure 11 shows our proposed trees throughput for different types of rule-sets using 32 threads.

Fig. 11.

The proposed data structures throughput for different types of rule-sets using 32 threads for querying.

As can be seen from Fig. 11, the throughput of our three trees have same patterns, and this is because the throughput highly depends on the R-tree height multiplied by the number of memory access required to query each node’s AMQ filters. The main R-tree was built independently from the AMQ filters, that means the R-tree height is not affected of what type of AMQ filter used in it, and the number of nodes that required to be visited is fixed. The only difference of the performance of our three trees is how many memory access required to matching the incoming packets in each tree level at query operation. The CR-tree presents the best classification performance among our other proposed trees because it requires the minimum number of memory accesses. BR-tree requires more memory accesses because it uses more hash functions. QR-tree uses two hash functions, but it needs more processing overhead to check up additional bits to mention the exact location of stored items. On its best case, CR-tree can classify up to 1.42 MPPS on rule-sets of size 100 rules and up to 1.2 MPPS on rule-sets of size 10K rules. In comparing to BR-tree and QR-tree, the best throughput that BR-tree can perform was 0.98 MPPS and the best throughput that QR-tree can perform was 1.21 MPPS.

6.4. Scalability evaluation

To evaluate the scalability of the proposed data structures from the respects of storage requirements and the average number of memory accesses. We have generated 15 large rule-sets with different size (from 20K rules to 100K rules) and characteristics. Since this experiment uses rule-sets with different size, we use Bytes per rule (BpR) to provide a better evaluation of proposed data structures scalability [44]. BpR is equal to the total required storage to represent the entire rule-set in the proposed data structures divided by the number of rules.

Fig. 12.

Storage evaluation using Bytes per rule, and throughput evaluation of the proposed data structures.

Figure 12(a) shows the required Bytes per rule (BpR) for the proposed data structures. As can be seen, the proposed data structures require 40 to 96 Bytes to store each rule for rule-sets of size 20K to 100K. As a result, the proposed data structures do not incur any exponentially increased storage. Moreover, the average required storage of the proposed data structures is only 70 Bytes for rule-sets with size of up to 100K rules. The proposed data structures are rectangular trees integrated with AMQ data structures, and these data structures are balanced and semi-full trees. The main factor to increase the BpR in such trees is the tree height. Increasing the tree height by one required to represent each rule in one more AMQ data structure in the new level. For large rule-sets, it requires adding a large number of rules till the tree became full or semi-full to increase the tree height one more time. Also, the bigger AMQ filters in the internal levels can hash relatively larger number of rules. Therefore, the proposed data structures could maintain the required storage without being affected by the growing size of rule-sets. Figure 12(b) shows the average number of memory accesses for the proposed data structures with variable size rule-sets. For each rule, up to one million packets have been used as tracing data to measure the average number of memory accesses. The proposed data structures require 20 to 49 memory accesses to classify each packet with average memory access equal to 34 for rule-sets of size up to 100K rules. The average required memory accesses increases only when the tree height increases. Hence, the average required memory accesses increases moderately in very slow rate with the growing size of rule-sets.

In our experimental scalability evaluation, the results show that the proposed data structures could obtain regular storage requirement and reasonable memory accesses for 15 different rule-sets, and provide good scalability in terms of variable sizes and characteristics of rule-sets.

6.5. Classification accuracy evaluation

In our proposed system, we have used a set of space-efficient probabilistic data structures. These data structures provide false positive in a very small probability, but they do not provide false negatives. In other words, when querying such data structures, the result will be either “possibly in the set” or “definitely not in set”. The false positive rate in such data structures is not a constant value, instead of that, it has a probability depends on a set of factors, such as the number of items to be inserted, the size of the data structure, the number of hash functions, and the size of the fingerprints. Each of the AMQ data structures that used in the proposed solution has a standard equation to calculate the false positives probability. The Bloom filter, Quotient filter, and Cuckoo filter have been used with R-tree data structure. To insert n elements, if k is the number of hash functions, m is the number of bits in the filter array, f is the fingerprint bits, and b is bucket size, then: $\begin{array}{l} (12) & BF false positive probability = {(1 - e^{- k n / m})}^{k}, \\ (13) & QF false positive probability = (1 - e^{- α / 2^{- f}}) ⩽ 2^{- f} : α = \frac{n}{m}, \\ (14) & CF false positive probability = 1 - {(1 - \frac{1}{2^{f}})}^{2 b} ≃ 2 b / 2^{f} . \end{array}$ All these filters used in our system in a dynamic form, so that the exact number of the array bits is not static, and hence the false positive probability is very difficult to calculate in a theoretical manner. For each of our proposed data structures, we have calculated the false positives probability experimentally by classification the variable sets of the incoming packets that generated especially to match the existing rules, and then we find the ratio between the incoming packet that classified as expected and those that fail to find its corresponding rules and classified to wrong rules. Table 6 depicts the false positive probability for all our proposed solutions.

Table 6
False positive probability of the proposed structures using up to one million packets as a tracing data

Tracing data 1K 10K 100K 1 M 1K 10K 100K 1 M False positive probability

Algorithms False positive True positive

BR-tree 3 26 234 2148 997 9974 99766 997852 0.00217

QR-tree 1 9 88 749 999 9991 99912 999251 0.000762

CR-tree 0 2 18 168 1000 9998 99982 999832 0.000169

Tracing data	1K	10K	100K	1 M	1K	10K	100K	1 M	False positive probability
BR-tree	3	26	234	2148	997	9974	99766	997852	0.00217
QR-tree	1	9	88	749	999	9991	99912	999251	0.000762
CR-tree	0	2	18	168	1000	9998	99982	999832	0.000169

The results in Table 6 show that our proposed solutions have false positive rates ( $2 \times 10^{- 3}$ , $7 \times 10^{- 4}$ , and $1 \times 10^{- 4}$ ) for BR, QR and CR trees respectively, that rate includes false positive from all levels and the R-tree itself. Note that the quality of the employed hash function and the length of the array will affect noticeably on the false positive rate. We have used simple hash functions such as (Murmurhashing2 [38], and CRC32 [31]). In general, BR-tree has the highest false positive rate, this is because Bloom filter uses a very simple hashing technique, QR-tree uses high-quality hashing technique, but still has high hard collision probability, and CR-tree is the best data structure among the other proposed data structures, which has very small false positive probability. This is because that CF uses the reallocation technique that prevents inserted elements collision.

7. Conclusion

In this paper, a generalized extension of BR-tree integrated with approximate membership query data structures has been proposed and used in many-field packet classification. In addition, multi-threading technique has been used to classify multiple packets in a parallel manner. Three types of the approximate membership query filters have been used separately in this tree, for each type, and performance evaluation has been performed using simulated rule-sets generated by Classbench tools. In addition, the scalability and the accuracy of the packet classification have been practically measured. From the results, we concluded that the best number of threads to be used in our machine was 32 threads to get the best result. The proposed technique can classify packets in up to 1.4 MPPs using Cuckoo filter based R-tree (CR-tree) with very small memory requirements. It can classify packets in up to 1 MPPS and 1.2 MPPS using Bloom filter based R-tree (BR-tree) and Quotient filter based R-tree (QR-tree), respectively. In the performance comparison with the other well-known packet classification, we concluded that our proposed techniques have less memory accesses and less memory requirements in most cases. Also, we performed performance comparison with two modern 15-fields packet classification approaches from respect of the processing latency. The results showed that the proposed approaches outperformed these approaches in more than 2× times.

References

http://www.arl.wustl.edu/~det3/ClassBench/index.htm.

Abdulhassan and

Ahmadi, Parallel Many Fields Packet Classification Technique Using R-Tree, Annual Conference on New Trends in Information and Communications Technology Applications (NTICT 2017), IEEE, 2017. doi:10.1109/NTICT.2017.7976137.

Alakuijala and

Wassenberg, HighwayHash on Github, 3-2016.

M.A.

Bender,

Farach-Colton,

Johnson,

Kraner,

B.C.

Kuszmaul,

Medjedovic and

Zadok, Don’t thrash: How to cache your hash on flash, in: Proceedings of the VLDB Endowment, 2012, pp. 1627–1637.

Bernstein hash, djb2, http://www.cse.yorku.ca/~oz/hash.html.

Breitinger,

Guttman,

McCarrin,

Roussev and

White, Approximate matching: definition and terminology, NIST Special Publication 800-168, https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-168.pdf.

J.D.

Burman, Software-Defined Networks (SDN): What systems integrators need to know, 2013.

Chang,

Lee and

Su, Multi-field range encoding for packet classification in TCAM, in: Proceedings IEEE (INFOCOM 2011), IEEE, 2011, pp. 196–200. doi:10.1109/INFCOM.2011.5935001.

Y.C.

Cheng and

P.C.

Wang, Packet classification using dynamically generated decision trees, IEEE Transactions on Computers 64(2) (2015), 582–586. doi:10.1109/TC.2013.227.

10.

Cohen and

Raz, Simple efficient TCAM based range classification, in: INFOCOM, 2010 Proceedings IEEE, 2010, pp. 1–5.

11.

Dobai,

Korenek and

Sekanina, Evolutionary design of hash function pairs for network filters, Applied Soft Computing 56 (2017), 173–181. doi:10.1016/j.asoc.2017.03.009.

12.

Fan,

D.G.

Andersen,

Kaminsky and

M.D.

Mitzenmacher, Cuckoo filter: Practically better than bloom, in: Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, ACM, 2014, pp. 75–88.

13.

Francesco and

Fiore, A nonlinear, recurrence-based approach to traffic classification, Computer Networks 53(6) (2009), 761–773. doi:10.1016/j.comnet.2008.12.015.

14.

Ganegedara,

Jiang and

V.K.

Prasanna, A scalable and modular architecture for high-performance packet classification, IEEE Transactions on Parallel and Distributed Systems 25(5) (2014), 1135–1144. doi:10.1109/TPDS.2013.261.

15.

Goudarzi,

Tabatabaei and

Ahmadi, Khorramshahr: A scalable peer to peer architecture for port warehouse management system, Journal of Network and Computer Applications 76 (2016), 49–59. doi:10.1016/j.jnca.2016.09.015.

16.

Gupta and

McKeown, Packet classification on multiple fields, ACM SIGCOMM Computer Communication Review 29(4) (1999), 147–160. doi:10.1145/316194.316217.

17.

Gupta and

Breitinger, How Cuckoo Filter Can Improve Existing Approximate Matching Techniques, International Conference on Digital Forensics and Cyber Crime, Springer International Publishing, 2015.

18.

Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, Vol. 14, ACM, 1984.

19.

Hua,

Xiao and

Wang, BR-tree: A scalable prototype for supporting multiple queries of multidimensional data, IEEE Transactions on Computers 58(12) (2009), 1585–1598. doi:10.1109/TC.2009.97.

20.

Hyesook and

Young Byun, Packet Classification Using a Bloom Filter in a Leaf-Pushing Area-Based Quad-Trie, Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems, IEEE Computer Society, 2015.

21.

Imam and

A.W.

Muhammad, Network packet classification using neural network based on training function and hidden layer neuron number variation, International Journal of Advanced Computer Science and Applications 8(6) (2017), 248–252.

22.

Inoue,

Mano,

Mizutani,

S.I.

Minato and

Akashi, Rethinking packet classification for global network view of software-defined networking, in: 2014 IEEE 22nd International Conference on Network Protocols (ICNP), IEEE, 2014, pp. 296–307. doi:10.1109/ICNP.2014.52.

23.

Jiang and

V.K.

Prasanna, Field-split parallel architecture for high performance multi-match packet classification using FPGAs, in: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, ACM, 2009, pp. 188–196.

24.

Kaler, Cache efficient bloom filters for shared memory machines, https://pdfs.semanticscholar.org/465b/0d7872764bba2a6daadffa9a169d056b8c7c.pdf.

25.

Kogan,

Nikolenko,

Rottenstreich,

Culhane and

Eugster, SAX-PAC (Scalable and Expressive Packet Classification.), in: ACM SIGCOMM Computer Communication Review, ACM, 2014, pp. 15–26.

26.

T.V.

Lakshman and

Stiliadis, High-speed policy-based packet forwarding using efficient multi-dimensional range matching, in: ACM SIGCOMM Computer Communication Review, ACM, 1998, pp. 203–214.

27.

A.X.

Liu,

R.M.

Chad and

Eric, Packet classification using binary content addressable memory, Biological Cybernetics 24(3) (2016), P:1295–P:1307.

28.

Liu,

Sun,

Zhu,

Gao and

Li, BitCuts: A fast packet classification algorithm using bit-level cutting, Computer Communications 109 (2017), 38–52. doi:10.1016/j.comcom.2017.05.001.

29.

Lu and

Sahni, O (logW) multidimensional packet classification, IEEE/ACM Transactions on Networking (TON) 15(2) (2007), 462–472. doi:10.1109/TNET.2007.892845.

30.

McKeown,

Anderson,

Balakrishnan,

Parulkar,

Peterson and

Rexford, OpenFlow: Enabling innovation in campus networks, ACM SIGCOMM Computer Communication Review 38(2) (2008), 69–74. doi:10.1145/1355734.1355746.

31.

Mitra and

Tapan, Reconfigurable very high throughput low latency VLSI (FPGA) design architecture of CRC 32, Integration, the VLSI Journal 56 (2017), 1–14. doi:10.1016/j.vlsi.2016.09.005.

32.

Pagh,

Pagh and

S.S.

Rao, An optimal Bloom filter replacement, in: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2005, pp. 823–829.

33.

Pagh and

F.F.

Rodler, Cuckoo Hashing, Springer, Berlin, 2001.

34.

R.Y.

Qu,

Shijie and

K.P.

Viktor, A decomposition-based approach for scalable many-field packet classification on multi-core processors, International Journal of Parallel Programming 43(6) (2015), 965–987. doi:10.1007/s10766-014-0325-6.

35.

R.Y.

Qu and

K.P.

Viktor, High-performance and dynamically updatable packet classification engine on FPGA, IEEE Transactions on Parallel and Distributed Systems 27(1) (2016), 197–209. doi:10.1109/TPDS.2015.2389239.

36.

Singh,

Baboescu,

Varghese and

Wang, Packet classification using multidimensional cutting, in: Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM’03), ACM, New York, 2003, pp. 213–224. doi:10.1145/863955.863980.

37.

Specification, OpenFlow Switch, Version 1.4. 0, October 14, 2013.

38.

Tanjent wrote, MurmurHash first announcement, Tanjent.livejournal.com, January 2012.

39.

D.E.

Taylor, Survey and taxonomy of packet classification techniques, ACM Computing Surveys (CSUR) 37(3) (2005), 238–275. doi:10.1145/1108956.1108958.

40.

D.E.

Taylor and

J.S.

Turner, Scalable packet classification using distributed crossproducing of field labels, in: 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2005), Proceedings IEEE, IEEE, 2005, pp. 269–280. doi:10.1109/INFCOM.2005.1497898.

41.

D.E.

Taylor and

J.S.

Turner, Classbench: A packet classification benchmark, IEEE/ACM Transactions on Networking 15(3) (2007), 499–511. doi:10.1109/TNET.2007.893156.

42.

Varvello,

Laufer,

Zhang and

T.V.

Lakshman, Multilayer packet classification with graphics processing units, IEEE/ACM Transactions on Networking 24(5) (2016), 2728–2741. doi:10.1109/TNET.2015.2491265.

43.

G.S.

Walia and

Kapoor, Particle filter based on cuckoo search for non-linear state estimation, in: Advance Computing Conference (IACC), 2013 IEEE 3rd International, IEEE, 2013, pp. 918–924. doi:10.1109/IAdCC.2013.6514349.

44.

P.C.

Wang, Scalable packet classification with controlled cross-producting, Computer Networks 53(6) (2009), 821–834. doi:10.1016/j.comnet.2008.11.017.

45.

Warkhede,

Suri and

Varghese, Multiway range trees: Scalable IP lookup with fast updates, Computer Networks, 44(3) (2004), 289–303. doi:10.1016/j.comnet.2003.09.004.

46.

Wee and

Wooguil, Fast packet classification based on hybrid cutting, IEEE Communications Letters 21(5) (2017), 1011–1014. doi:10.1109/LCOMM.2017.2658605.

47.

Zhong, An IPv6 address lookup algorithm based on recursive balanced multi-way range trees with efficient search and update, in: Proceedings of the International Conference on Computer Science and Service System (CSSS), 2011, pp. 2059–2063.

Many-field packet classification using AMQ-R-tree

Abstract

Keywords

1. Introduction

2. Related works

2.1. Classical packet classification

2.2. Many-field packet classification

3. Approximate membership query and BR-tree data structures concepts

3.1. BR-tree

3.2.1. Bloom filter

3.2.3. Cuckoo filter

Table 1 A simple example of 8 objects, each object with one multidimensional attribute and two identities Object name K L M N J O P I Range 9–15 10–19 20–25 23–30 3–9 27–35 35–40 1–5 Length 7 10 6 8 7 9 6 5 Color green white black yellow blue purple black red

5. Many-field packet classification using AMQ-R tree

5.2.1. BR-tree programming

5.2.2. QR-tree programming

5.3. On-line AMQ-R-tree query

5.4. Theoretical analysis

5.4.1. Query time analysis

5.4.2. Memory consumption analysis

6. Performance evaluation

6.2. Processing latency

References

Table 1
A simple example of 8 objects, each object with one multidimensional attribute and two identities

Object name K L M N J O P I

Range 9–15 10–19 20–25 23–30 3–9 27–35 35–40 1–5

Length 7 10 6 8 7 9 6 5

Color green white black yellow blue purple black red