Abstract
The main challenge of problem lies in the perception of Cognitive Radio technology is to discover licensed empty spectrum pattern. The efficient model is needed for allocation among licensed and unlicensed users in wireless spectrum to improve the extraction rate and collision rate. To discover the spectrum hole in spectrum paging bands, stirred by FP mining technique proposed an efficient enumeration approach, namely Constraint Based Frequent Periodic Pattern Mining (CBFPP). The proposed algorithm uses TRIE-like data structure with data mining constraints. CBFPP algorithm predicts periodic spectrum occupancy holes in the paging bands. It is shown that CBFPP has a high prediction accuracy with reasonable time complexity. Experiment with synthetic and real data validate higher prediction accuracy and with reasonable time complexities. The unlicensed user utilizes the predicted spectrum pattern in spectrum usage of channel without significant interference to licensed users.
Introduction
In recent wireless applications, to exploit vacant licensed spectrum chunks in a dense allocation of frequency bands has become difficult [16]. The variations in both spatial and temporal spectrum domains resulted in hardness when using traditional static allocation. In recent trends, the Cognitive Radio (CR) concept is being used to find the vacant spectrum chunk of learning the surrounding environment [4, 8]. In CR network, finding spectrum holes in licensed channels for secondary unlicensed Secondary Users (SUs) been rigged. The secondary user must leave the channel based on Primary Users (PUs) request or return. The Dynamic spectrum access (DSA) assigns the unoccupied slots of PUs to SUs [4]. Limitations in CR network, spectrum holes are allocated to SUs without considering time slots, leads to a high probability of collision [7, 8]. Many service bands avoid database access method, due to high overhead by short and frequent session. Some SUs may act as virtual PUs depending on physical layer standards, lead to tough spectrum sharing problem.
A Frequent Pattern Mining method used to avoiding collision rate in wireless communication, namely Partial Periodic Pattern Mining (PPPM) [27] was introduced to find patterns in spectrum usage. The collision rate can be bounded to occupancy prediction, a pattern represents the channel utilization within certain time but not regular. A partial periodic pattern refers to the channel utilization in a day without the peal channel utilization. The channel state prediction rules derived by PPPM, useful in identifying occupancy pattern even in the presence of noise, sensing errors and irregular user behaviors. PPPM uses candidate pattern generation and diminish the number of candidate pattern that cannot construct future longer patterns. In CR devices, PPPM optimize the strategies using candidate pattern for channel switching [27]. Mining patterns in sequence databases [3, 36] are not appropriate for finding spectrum slots. In single period, [16], [35] has explored time series database for all partial periodic. Due to complex in looping to mine in single span and no need in generating all pattern in a single span, made many pattern mining algorithm inappropriate. The problem focuses on whether a frequent pattern exists in channel spectrum in the presence of multiple devices.
In sum up, the problem is not straightforward and none of the solutions mentioned can deal with all the aspects relevant to it. To solve the problem of finding the periodic pattern, we need to create a robust algorithm that manages asynchronous periodicity by locating periodic patterns to permissible limit. We present an algorithm that uses a TRIE structure alike a Frequent Pattern (FP) tree constructed by CBFPP called a consensus tree, which enable parallel searching with the sub tree. The consensus tree’s progress is abridged using mining constraints namely level and rule constraints [19]. In [28] Generalized Center String (GCS) problem is fixed parameter tractable (FPT) concerning pattern length L and symbol set size |∑|. CBFPP using the downward closure property will avoid candidate generation and using level-wise search strategy, search space is limited as mentioned in [24]. CBFPP technique is proposed based on two tasks; first it searches all patterns for particular spectrum occupancy pattern at a particular instance. Secondly, it mines all patterns of any length with or without mismatch (wild card symbol). In summary, the contributions are to mine pattern representing a spectrum hole with the proposed CBFPP algorithm in the wireless channel. The second consensus tree is used to predict the pattern for the next time slot, which in turn represents the spectrum gaps in the wireless channel.
The rest of paper is organized as follows: In Section2, related work is presented in this Section. In Section 3, we generalize the problem for finding the spectrum pattern. In Section 4, we present the CBFPP algorithm and discuss its time and space complexity. Experimental results are reported in Section 5 using both synthetic and real data. Finally, in Section 6, concludes the paper.
Related work
Many papers [6, 38] describe a model that predicts the signal power, channel state and its variations. Auto regressive integrated moving average model (ARIMA) [17] is a combination of Moving Average model (MA) and Auto-Regressive model (AR) [38] for fixed time series in TV channels. ARIMA is designed to convert non stationary time series similar to time shifted series. Both ARMA and ARIMA model needs to convert time series into fixed and then analyse it, suffer as snag in real time. Hidden Markov Model (HMM) considers the spectrum channel as binary, such as occupied or unoccupied and use binary time series [5, 20]. In [26], addressed Multi class classification problem and using RF traces, the spectrum prediction through Convolution Neural Network and Long Short Term Memory techniques. A statistical approach is designed by [13], the classification framework use cooperative prediction to analyse the spectrum. Multilayer perceptron (MLP) used along with HMM to forecast the channel state, described in [5]. In [37], channel state predictor introduced, namely 3-state high order HMM [14] and also shows that channel spectrum contains Markov chain. In [30], Neural network based energy detection algorithm is used classify occupied and vacant slot in CR. The neural network increase the prediction accuracy in continuous sensing [15]. Spectrum profile and its usage are forecasted from historical spectrum statistics using Functional Link Artificial Neural Network (FLANN), prediction depend on how hidden layer is designed [9].
In sequence databases, spectrum gaps are represented as mined patterns discussed in [3, 36] not suitable due to fixed parameters. All single instance periodic patterns are generated in time series database using Periodic pattern mining [28], incurs high overhead for repeating mining process. In predicting occupancy spectrum, enumerating all patterns is inappropriate in spectrum occupancy prediction, and PPPM using conditional entropy to find spectrum gaps [27]. Pattern Sequence Based Forecasting (PSF) method for finding spectrum allocation to Sus [18] and it works only for particular Frequency bands [25]. Neural Networl based energy detection algorithm used for classifying the channel but highly influenced by threshold and bias term [30]. Bayesian online learning BOL model [2] is employed to detect the availability of PUs and predicting PU channel state, but state transmission within short time interval affects prediction [1]. The user provides anticipated period value to PPPM for detecting patterns in a section of the channel spectrum. In this work, Constraint Based Frequent Periodic Pattern Mining (CBFPP) algorithm is designed to find all possible length and position of all instances for patterns, which is described in our previous work [10]. The CBFPP algorithm can also find periodicity among pattern for a particular instance of the channel spectrum, which in turn can describe the PUs and SUs characteristics.
Problem definition
The spectrum holes availability in a channel is sensed by SUs in CR network. The spectrum holes are identified before commencing any transmission, and multiple PUs are considered as virtual PU to avoid interference created by PUs. The channel period can be represented by 1 and 0 denoted busy and idle channel. A wild card character matches 1 or 0; specify the channel state is uncertain. The pattern is a spectrum hole (gap) with a wild card, size, number of repetition and pattern are formulated by [27] for PPPM. Using PPPM parameters a pattern T = 110 110 110 101 110, symbol 0 has periodicity p = 3. Similarly, a partial periodic pattern is more than a symbol in channel spectrum is periodic. For example T= 110110110101110, the sequence 11’s starting position 4 with periodicity p = 4 periodicity; and the periodicity for pattern with wild card (*) symbol 11* in T is 4.
The main goal is to find a pattern is a partial periodic frequent pattern in channel with confidence not less than a user-specified threshold. However, in many real applications, the length l of pattern is not known in advance. The user behavior, sensing error and noise affects the spectrum channel. The spectrum occupancy pattern likely may contain wild card symbols. To solve the above problem, uses the definition of Generalized Center String (GCS) problem [11, 16]. GCS problem search for a pattern of any length, 1 ≤ l ≤ L, in input pattern length L. In N input patterns (0 ≤ q ≤ N), the search pattern l sequence expected to present for at least q times, with d mismatches. An enumeration approach Constraint Based Frequent Periodic Pattern Mining (CBFPP) [10], based on idea of frequent pattern mining techniques is proposed to solve the GCS in spectrum channel. In this paper, two techniques likely Frequent Pattern Tree (FP-tree) and Constraint Based Mining are integrated to solve GCS problem in spectrum occupancy pattern.
Constraint based frequent periodic pattern mining technique
A TRIE structure alike consensus tree is constructed by Constraints Based Frequent Periodic Pattern mining techniques along with two constraints [19, 32] to find spectrum gaps in channel spectrum. CBFPP technique outperforms than existing technique like PPPM and HMM because CBFPP is FPT and handles wild card as mismatches. CBFPP technique use level-wise search strategy to mine pattern along with mismatched (i.e. wildcard symbol) d ≥ 0 from the given instance of channel spectrum [12, 21]. The consensus tree grows with |∑| branches only if recommended support and confidence level fulfilled by each node in every. The value (j, k, e) represents the position of mutated pattern exists in each node of consensus tree (refers pattern with wild card symbol). The value (j, k, e) refers to sub pattern of jth in kth input of giving N input sequences with mutation e <d. The number of mismatches or wild card used in a pattern is referred as mutation factor e. A spectrum gap pattern present in a particular instance of channel spectrum, is considered as a path in consensus tree from root to any node.
CBFPP technique uses backward closure property to restrict FP tree growth, using antimonotone, monotonic and succinct constraints [19, 25]. An antimonotone constraint is the nodes with conf (b)=((N-sup(b))/(N-q)) <1, confidence value will be suppressed [32]. The number of pointers in each input pattern of channel spectrum is support value sup (s t ) of sub pattern. Monotonic constraint will suppress the consensus tree branch with support value is e < q. Each position of the consensus node with mutation e < d will be pruned, continued with other position like succinct constraint [19].
Each path p of the consensus tree refers the consensus pattern. There is a one to one correspondence between the paths and the patterns. The node t contains parent node’s subset of the pointer. Any nodes with less q pointers of different origins are pruned, which does not contribute to the production of sub-pattern patterns. The consensus tree is highly parallelized order, constructed in a breath-first strategy. The nodes are pruned using the downward closure property. Each |∑| branches are restricted with sup(b) < a and conf(b) >1, which reduces undesirable pointer checking and node formation in the consensus tree. All mutated sub-pattern are not generated as candidates for pattern, for each instance only one mutated copy is generated to reduce the space consumption. The patterns takes all degenerative neighbourhoods of all consensus slots as search space, not stored in consensus tree. If the downward closure property is applied, then tree nodes with sup(b) < a and conf(b) >1, will pruned which is pointing to less than q input patterns. The level constraint stops the growth of consensus tree at some level if level constraint is not satisfied.
The level-wise structure of consensus tree is shown in Fig. 1. In level l=1, contains sub-pattern 0, 1 in given sequences. The second level of the tree contains the pattern 00, 01, 10, and 11 based on constraints. Rule constraint prune the pattern (0) and (00), since (0) is a leaf node and (00) pattern appears in less than four sequences. Any sub-pattern containing (00) will not be generated in next level due to downward closure property. For the third level of consensus tree, the pattern {011,101, 110} among candidates {010, 011, 101, 110, 111}, are allowed to grow to the next level. The level constraint allows {1011} guaranteed to be patterned in the next level of consensus tree because all other contiguous sub patterns with length 3. CBFPP algorithm terminates tree, construct for the next level does not satisfy level constraint.

CBFPP construct consensus tree with d=0, ∑ = = {0, 1} and S={(0101101), (1101111), (1011100), (0110110)}, N=4, L=7, q=4, and ∑=2.
FOR each pointer j in input N
{
FOR k = j to L
{
IF ∃ k in i th symbol of b1 ∈ ∑ {
Create node Sb1 = (j, k, 0), find Sub-
pattern Sb′1 = (j, k, 1) ∀ b′1 ≠ b1
and j in Tb″1 for each b′′1 ∈ ∑
if and only if sup(b1) > threshold
}
}
}
FOR i=1 to L
{
FOR each sub-slot’s
with conf (b1, b2, b3, . . . . , b(i+1)) ≥1
{
FOR ∀ (j, k, e) in Sb1,b2, . . . , b i where k < L - i + 1
{
IF the k + i in j of bi+1 ∈ ∑
and sup(bi+1) < q
{
Sb1,b2, . . . , b i = (j, k, e + 1)
IF (e < d) and (∀b′i+1 ≠ bi+1)
IF conf (b1, b2, . . . , bi+1) ≥1
Sb1,b2, . . . , b i = (j, k, e + 1)
}
IF conf (bi+1) <1
Remove (S b i+1 );
}
FOR ∀Sb1,b2, . . . , b i ≠ Θ
{
FOR ∀Sb′1,b′2, . . . , b′ i with (b i , b′ i ) ≤ d
{
FOR ∀Sb1,b2, . . . , b i ≠ Θ
and conf (Sb′′1,b′′2, . . . , b′′ i ) ≥ q
IF (b′′ i , b′ i ) ≤ d
{
IF conf (b′′ i ) <1
Remove Sb′′ i
Create a new level,
Tb′′1,b′′2,...,b′′ i ← Tb′′1,b′′2,...,b′′ i ∪ Sb′′1,b′′2,...,b′′ i
IF ∃Tb′′1,b′′2,...,b′′ i
i = i + 1
GOTO L2:
else
Print (b′′ i , Sb′′ i )
}
L2:
}
}
IF ∀ (Sb1,b2,...,bi+1) =
Print (b1, b2, . . . , bi+1 ; Sb1,b2, . . . , bi+1)
Remove (Sb1,b2,...,b i )
}
i = i + 1;
}
CBFPP mining algorithms produce all sub-patterns, and using the upper bound to analyze the complexity of the algorithm with existing techniques. O(i) deals with comparison of symbols in two patterns in each ith level of consensus tree. The time complexity is O (N × L), since each pattern in each node is
For each mutated copy that passes rule constraint, the instance of the channel spectrum scanned where it is contained and get the starting positions of the mutated pattern. The mutated node created one at a time and maximal number of nodes generated at each level not exceeds N × (L - i + 1). Hence the total space complexity of CBFPP algorithm is O (N × L). Time complexity to |∑| L influenced by Rule and level constraints would, by lacking testing of all potentials of consensus pattern. No restriction for Hamming distance computation, it can be supported by other types of measures too, like average distance, etc. This computation can be done while checking each pattern during the nested loops. The aspirant pattern is generated if the consensus pattern satisfies the prescribed rule constraint. Therefore, the time complexity of CBFPP bounded by O (N × L) with certain low space complexity.
To construct a consensus tree, CBFPP need a single scan for channel at particular instance; all spectrum patterns with length ≥ 2 is generated. To obtain the pattern, ample comparisons between two patterns in ith levels of the FP tree is performed by CBFPP algorithm, with distance calculation as N× (L - i + 1), complexity is
To evaluate the performance, experiments performed on an IBM synthetic dataset (T1014D100K). In Fig. 2, shows performance of CBFPP algorithm, with randomly inserted pattern with d=2 variations inserted in synthetic sample data. The total number of generating patter denoted by TOTND. The maximum modes residing in virtual memory as M×N and the running time of the algorithm runs are denoted by M×N and RT. The total number of pattern stated in given data, and found by the algorithm at the same level. From Fig. 2, shows CBFPP running time increases slowly as the value of L and N are scaled larger. The space requirement remains intact with increase of N and L.

CBFPP performance with existing technique using network traffic traces. (a). Total Number of Patterns and Maximal number of nodes among N background sequences with d=2, (b). Total Number of Patterns and Maximal number of nodes among fixed length background with d=2, (c). Running time RT (ms) with varying N and L Values.
CBFPP technique compared with PPPM, Frequent Pattern Mining (FPM) model [34] and HMM model to show the pattern predicted are more accurate and generalized to predict the future pattern which in turns describes the prediction performance. Pre-processed data are obtained from Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/. The pattern extraction performance is presented in Fig. 3 across a long period of time with slow changing signal paging band. The frequent patterns are identified by CBFPP are more accurate than existing models in a short duration of observation under different confidence constraints. CBFPP is able to find all frequent pattern with a single span of paging band, hence miss rate is less and accuracy is high. PPPM uses rules to predict patterns, if no rule found, then consider as miss and an error. PPPM prediction accuracy increased by predicting shorter lengths. CBFPP is fixed length parameter, hence accuracy is not depending on the length of the pattern and its accuracy depended on observation time. Frequent pattern with degenerative or mutation enable to predict patterns with different binary time series of channel states. CBFPP pattern can also predict pattern with mutation among different time slots which in turn find the correlation of pattern among different time slots.

CBFPP pattern prediction in slow changing signal paging band across a long period of time.
CBFPP algorithm is tested with SIGCOMM’08 [33] of real-world network traffic traces. The network traces are sliced into 20ms smaller time slots and channel states converted into binary time series. In CR device a beacon is set for every 100 ms and interval represented by 5 bits. The channel activities are detected by the spectrum hole with a transmission. The detection of channel activities makes entire channel busy. CBFPP is tested with one day’s training data and test to show its accuracy and miss rate under different confidence constraints shown in Fig. 4. Usually CBFPP miss rate is less than existing PPPM, FPM and HMM model, since CBFPP scan the entire channel at a particular time slot and construct a consensus tree. CBFPP has some missing rate when using mutation factor and presence of noise in the channel. CBFPP miss rate does not affect the prediction accuracy in a larger extent. The lower miss rate implies more useful predicted patterns mined by CBFPP, which well again than PPPM. FPM does not discover partial periodicity.

Among N Background Sequences with d=2, pattern 10100100 is inserted 3N/4. (a). Total Number of Patterns and Maximal number of nodes among N background sequences with d=2, (b).Total Number of Patterns and Maximal number of nodes among fixed length background with d=2.
In Fig. 4 Rule confidence is reduced, the CBFPP accuracy remains same, with most slots are predicted with accuracy above 80 percent. The accuracy is calculated with predicted slots with the total number of slots. In single scan of channel spectrum, CBFPP finds maximum slots; few slots are missing at the rear end of the pattern due to contiguous in nature. PPPM, HMM and FPM predict the slots based the fixed length of slots and based on the prediction rules. The total accuracy shows that CBFPP is best making using of mutation factors, handles irregularity far better than PPPM. CBFPP prediction rate is higher in PCS bands with different rule constraints in Fig. 4. The miss rate varies with real confidence affects the accuracy due to random activities of patterns exists in PCS bands. CBFPP handle this random activity and which hard for FPM and HMM to predict. The total accuracy of PPPM is less compared to CBFPP with a slight variation because CBFPP predicts all patterns slots within PCS bands. The prediction performance of CBFPP is far better than PPPM, overheads the analysis in dynamic spectrum access.
In this proposed model, Constraint Based Frequent Pattern mining is used to predict the spectrum occupancy slots in real-world spectrum bands. In real world traffic irregularity, mining algorithm is suitable for such variations. The proposed CBFPP algorithm combines FP tree technique with Constraint based Data mining constraints. The downward closure property and level-wise search strategy on TRIE like structure improves the prediction accuracy. The CBFPP algorithm, including existing algorithms, for predicting occupancy slots in real time spectrum bands are exponential due to the nature of spectrum bands, which is foreseeable. CBFPP pruning technique using TRIE structure will impact on reducing the complexity. CBFPP also outperforms current methods to be more effective against anomalies. There are many spectrum bands idle due to tight interference in utilizing the slots between SUs and PUs. While there are many underused spectrum bands, due to the tight interference restrictions of the PUs, the actual number of spectrum holes which SUs can use are few. We also demonstrate that with data collected in the paging bands, the prediction of high/low channel utilization duration reaches an accuracy of 93 percent with a miss rate of 5 percent. Under low utilization periods, it is observed that CBFPP mines slots with substantial accuracy which can improve utilization of the spectrum band with varying the constraints.
