High utility itemsets mining with negative utility value: A survey

Abstract

Mining high utility itemsets (HUIs) is a fundamental task that extends frequent itemsets mining (FIM). In recent years, a trend in FIM has been to design algorithms for mining HUIs, because FIM assumes that each item cannot appear more than once in a transaction and that all items have the same importance (weight, unit profit, price, etc.). In real-world data, however, items may appear more than once in a transaction and differ in importance. HUIs mining therefore takes both the quantity and the importance of items into account. Traditional HUIs mining algorithms assume that items have only positive unit profits. In practice, items may also have negative unit profits. For example, it is common for a retail store to sell items at a loss to stimulate the sale of other related items or simply to attract customers to its retail location. Such items occur with negative unit profit, i.e., negative utility. To account for negative unit profits, HUIs mining with negative utility has been introduced. This paper surveys recent studies on HUIs mining with negative utility and their applications. The main goal is to provide a survey of recent advancements and research opportunities. The paper presents key concepts and terminology related to HUIs mining with negative utility, together with a taxonomy of the algorithms that consider negative utility. To the best of our knowledge, this is the first survey on the task of mining HUIs with negative utility. The paper also presents research opportunities and challenges in HUIs mining.

Keywords

High utility itemsets mining; Utility mining; Negative utility

1. Introduction

Frequent itemsets mining (FIM) [1, 60] is a data mining technique that extracts meaningful information from raw data. FIM finds sets of items that often occur together. The first and most widely used application of FIM is market basket analysis. FIM mines important information, but it has two limitations. Firstly, it treats the occurrence of an item as binary, and secondly, it does not consider the relative importance of an item, such as its unit profit or price. To address these two issues, high utility itemsets (HUIs) mining has been actively studied [5, 56]. In utility mining, each item is associated with a utility (or unit profit), and each transaction also records the purchase quantity of each of its items. The utility is a measure of the importance of an itemset. It can be defined as the profit generated by the sale of the items in a retail store.

High utility itemsets mining is the process of transforming raw transactional datasets into useful rules for effective strategic and decision-making purposes so that real business benefits can be yielded. HUIs mining is not an easy task because utility does not satisfy the anti-monotonic property¹ [1], which is what reduces the search space and prunes non-HUIs in the early stages of mining. To solve this problem, Liu et al. proposed an algorithm named Two-Phase [34], which introduces a TWU-based anti-monotonic property. Many algorithms have been proposed that follow the Two-Phase model [25, 55]. All these algorithms suffer from excessive candidate generation, itemset joining operations and multiple dataset scans. To reduce these limitations, tree-based HUIs mining algorithms [3, 59] have been proposed. Tree-based algorithms mine HUIs without an expensive candidate generate-and-test strategy. However, these algorithms scan the dataset more than once and still generate too many candidate itemsets. To overcome these limitations, utility-list based algorithms [16, 32] have been proposed. Each itemset has a corresponding utility-list that stores all the information about that itemset. A utility-list is a vertical representation of the dataset, as in Eclat [61]. Utility-list based algorithms, however, consider itemsets that do not appear in the dataset. To remove this limitation, Zida et al. proposed a horizontal dataset projection-based algorithm [66].

¹ If a set cannot pass a test, all of its supersets will fail the same test as well.

All the algorithms discussed above consider positive utility only. However, real-life transactional datasets may contain negative utility. For example, a retail store may sell items at a loss to stimulate the sale of other related items or simply to attract customers to its retail location. If we mine HUIs from a dataset with negative items using traditional algorithms, some itemsets may be lost [7]. To address this issue, in 2009 Chu et al. proposed an algorithm named HUINIV-Mine [7]. HUIs mining with negative items is difficult for newcomers to understand. Hence, in this paper, we present an up-to-date survey of HUIs mining with negative utility that can serve both as an introduction and as a guide to recent advancements and opportunities in the field. The major contributions of this paper are:

We present a basic taxonomy of the most common approaches for HUIs mining with negative utility, covering mining from data stream, transactional, on-shelf, sequential and uncertain datasets, as depicted in Fig. 1.

We present a brief review and analysis of existing HUIs mining algorithms.

We categorize HUIs mining with negative utility approaches into level-wise, tree-based and utility-list based mining.

We analyze current HUIs mining with negative utility approaches and discuss their pros and cons in-depth.

We also give future directions in negative utility based HUIs mining.

Fig. 1.

Taxonomy of high utility itemsets mining with negative utility.

The rest of the paper is organized as follows. Section 2 describes preliminary definitions, properties to handle negative utility items and basic high utility itemsets mining algorithms. Section 3 discusses algorithms of high utility itemsets mining with negative items. Section 4 presents a summary and discussion on negative utility based high utility itemsets mining. Section 5 discusses research opportunities of high utility itemsets mining with negative items. Finally, Section 6 draws conclusions.

2. High utility itemset mining

2.1. Preliminaries and definitions

Let I = {i₁, i₂, …, i_n} be a set of distinct items. Each item i is associated with an external utility, denoted as EU(i), which may be the unit profit or price of item i. Each item i in a transaction T_j is also associated with an internal utility (quantity), denoted as IU(i, T_j).

Let us consider the transactional dataset shown in Table 1, which will be used as a running example. This dataset contains seven transactions (T₁, T₂, …, T₇) and five items A, B, C, D and E. The external utilities of the items are presented in Table 2. For example, transaction T₁ contains four items A, B, D and E with internal utilities 2, 2, 1 and 3, respectively. The external utilities of items A, B, C, D and E are, respectively, 2, -3, 1, 4 and 1. Thus, item B is sold at a loss.

Table 1
Transactional dataset

TID	Transaction
T₁	(A, 2) (B, 2) (D, 1) (E, 3)
T₂	(B, 1) (C, 5) (E, 1)
T₃	(B, 2) (C, 1) (D, 3) (E, 2)
T₄	(C, 2) (D, 1) (E, 3)
T₅	(A, 2)
T₆	(A, 2) (B, 1) (C, 4) (D, 2) (E, 1)
T₇	(B, 3) (C, 2) (E, 2)

Table 2

External utility of items

Item	A	B	C	D	E
External Utility	2	-3	1	4	1

Definition 2.1 (Utility of an item). The utility of an item i in a transaction T_j is denoted as U(i, T_j) and is defined as U(i, T_j) = IU(i, T_j) × EU(i). The utility of an itemset X in a transaction T_j is denoted as U(X, T_j) and is defined as U(X, T_j) = ∑_{i∈X} U(i, T_j).

For example, the utility of item A in transaction T₁ is 2 × 2 = 4. The utility of itemset {A, D} in transaction T₁ is 4 + 4 = 8. Note that the utility of an itemset may be negative, which means that the itemset has generated a loss (a negative profit). For example, if X is a single item with negative unit profit, then the utility of X is negative in every transaction that contains it.

Definition 2.2 (Utility of an itemset in dataset). The utility of an itemset X in a dataset D is denoted as U(X) and is defined as U(X) = ∑_{X⊆T_j∧T_j∈D} U(X, T_j).

For example, the utility of itemset {A, D} is U({A, D}, T₁) + U({A, D}, T₆) = 8 + 12 = 20.
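Definitions 2.1 and 2.2 can be checked mechanically on the running example. The following is a minimal sketch in Python; the dictionary layout (`EU`, `DB`) and the function names are our own illustrative choices and are not taken from any of the surveyed algorithms.

```python
# External utilities (Table 2) and purchase quantities (Table 1).
EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def item_utility(item, tid):
    """U(i, T_j) = IU(i, T_j) x EU(i)."""
    return DB[tid][item] * EU[item]


def itemset_utility_in(itemset, tid):
    """U(X, T_j): sum of the utilities of the items of X in one transaction."""
    return sum(item_utility(i, tid) for i in itemset)


def itemset_utility(itemset):
    """U(X): sum of U(X, T_j) over all transactions that contain X."""
    return sum(
        itemset_utility_in(itemset, tid)
        for tid, transaction in DB.items()
        if set(itemset).issubset(transaction)
    )


print(item_utility("A", "T1"))                # 4
print(itemset_utility_in({"A", "D"}, "T1"))   # 8
print(itemset_utility({"A", "D"}))            # 20
print(itemset_utility({"B"}))                 # -27 (item B is sold at a loss)
```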

Definition 2.3 (Transaction utility). The transaction utility of a transaction T_j is denoted as TU(T_j) and is defined as TU(T_j) = ∑_{i∈T_j} U(i, T_j).

For example, the TU of T₁ is U(A, T₁) + U(B, T₁) + U(D, T₁) + U(E, T₁) = 4 + (-6) + 4 + 3 = 5. The third column of Table 3 shows the TU of each transaction in the running example.

Table 3

TU and RTU for the running example

TID	Transaction	TU	RTU
T₁	A, B, D, E	5	11
T₂	B, C, E	3	6
T₃	B, C, D, E	9	15
T₄	C, D, E	9	9
T₅	A	4	4
T₆	A, B, C, D, E	14	17
T₇	B, C, E	-5	4

Definition 2.4 (Transaction weighted utilization). The transaction weighted utilization (TWU) of an itemset X is denoted as TWU(X) and is defined as TWU(X) = ∑_{X⊆T_j∧T_j∈D} TU(T_j).

For example, the TWU of item A is TU(T₁) + TU(T₅) + TU(T₆) = 5 + 4 + 14 = 23. The TWU value of each item is shown in Table 4.

Table 4

TWU and RTWU of each item

Item	A	B	C	D	E
TWU	23	26	30	37	35
RTWU	32	53	51	52	62

Property 2.1 (Pruning using TWU). If the TWU of an itemset X is less than the minimum utility threshold (min_util), then X and all of its supersets are non-HUIs. The min_util threshold is a user-defined value.

HUIs mining algorithms use the TWU to obtain an anti-monotonic property.
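The TU column of Table 3 and the TWU row of Table 4 can be reproduced with a few lines of Python. This is only an illustrative sketch of Definitions 2.3 and 2.4; the names `transaction_utility` and `twu` are hypothetical helpers, not part of any surveyed implementation.

```python
EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def transaction_utility(transaction):
    """TU(T_j): sum of the utilities of all items in the transaction."""
    return sum(qty * EU[item] for item, qty in transaction.items())


def twu(itemset):
    """TWU(X): sum of TU(T_j) over the transactions that contain X."""
    return sum(
        transaction_utility(t)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


TU = {tid: transaction_utility(t) for tid, t in DB.items()}
TWU = {item: twu({item}) for item in EU}

print(TU)   # {'T1': 5, 'T2': 3, 'T3': 9, 'T4': 9, 'T5': 4, 'T6': 14, 'T7': -5}
print(TWU)  # {'A': 23, 'B': 26, 'C': 30, 'D': 37, 'E': 35}
```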

Definition 2.5 (High utility itemsets). An itemset X is called a high utility itemset (HUI) if the utility of X is no less than min_util. Otherwise, X is a low-utility itemset.

Table 5 shows the HUIs for the running example when min_util is 10.

Table 5

HUIs of the running example

Itemsets	Utility	Itemsets	Utility
{A}	12	{A, D}	20
{A, C, D}	16	{A, D, B}	11
{A, C, D, B}	13	{A, D, E}	24
{A, C, D, E}	17	{A, D, E, B}	15
{A, C, D, E, B}	14	{A, E}	12
{C}	14	{C, E}	23
{C, D}	31	{D}	28
{C, D, B}	16	{D, E}	37
{C, D, E}	37	{D, E, B}	15
{C, D, E, B}	19	{E}	12
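Because the running example has only five items, Table 5 can be verified by exhaustive enumeration. The sketch below is a brute-force check of Definition 2.5, not one of the surveyed algorithms; it is exponential in the number of items and is meant only to make the definition concrete. The function names are our own.

```python
from itertools import combinations

EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def utility(itemset):
    """U(X) over the whole dataset (Definition 2.2)."""
    return sum(
        sum(t[i] * EU[i] for i in itemset)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


def brute_force_huis(min_util):
    """Enumerate every non-empty itemset and keep those with U(X) >= min_util."""
    items = sorted(EU)
    huis = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset)
            if u >= min_util:
                huis[itemset] = u
    return huis


huis = brute_force_huis(min_util=10)
print(len(huis))                   # 20 itemsets, as in Table 5
print(huis[("A", "D")])            # 20
print(huis[("C", "D", "E")])       # 37
```

Itemsets are represented as alphabetically sorted tuples, so {A, D, B} in Table 5 appears here as `("A", "B", "D")`.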

2.2. Properties to handle negative utility items

The traditional HUIs mining algorithms can only handle positive utility values and rely on the TWU-based Property 2.1 to prune the search space. These positive-utility mining algorithms cannot be directly applied when negative utility is considered [7]. To handle the problem of mining HUIs with negative utility, HUINIV-Mine utilizes the redefined transaction utility (RTU) as well as the redefined TWU (RTWU), as described below.

Definition 2.6 (Redefined transaction utility). The redefined transaction utility of a transaction T_j is denoted as RTU(T_j) and is computed as RTU(T_j) = ∑_{x∈T_j∧EU(x)>0} U(x, T_j). To calculate RTU, items must be sorted in descending order of their utility values [7].

For the running example, the RTU of T₁ is computed as RTU(T₁) = U(A, T₁) + U(D, T₁) + U(E, T₁) = 4 + 4 + 3 = 11. The utility of item B is not included because the RTU only adds up the items with positive utility. It follows that RTU(T_j) ≥ TU(T_j), because TU is the sum over all items in a transaction, including the negative ones, whereas RTU is the sum over the items with positive utility only. Table 3 shows the TU and RTU values of each transaction.
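The RTU column of Table 3, and the relation between RTU and TU, can be reproduced as follows. This is a sketch of Definition 2.6 only; the helper names `tu` and `rtu` are ours, and the sorting step mentioned in the definition is omitted because it does not affect the sum.

```python
EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def tu(transaction):
    """TU(T_j): utilities of all items, negative ones included."""
    return sum(qty * EU[item] for item, qty in transaction.items())


def rtu(transaction):
    """RTU(T_j): utilities of the items with positive external utility only."""
    return sum(qty * EU[item] for item, qty in transaction.items() if EU[item] > 0)


RTU = {tid: rtu(t) for tid, t in DB.items()}
print(RTU)  # {'T1': 11, 'T2': 6, 'T3': 15, 'T4': 9, 'T5': 4, 'T6': 17, 'T7': 4}

# Dropping the negative terms can only increase the sum, so RTU >= TU.
print(all(rtu(t) >= tu(t) for t in DB.values()))  # True
```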

Definition 2.7 (Redefined transaction weighted utility). The redefined transaction weighted utility (RTWU) of an itemset X is defined as RTWU(X) = ∑_{X⊆T_j∧T_j∈D} RTU(T_j) [7].

For example, the RTWU of item A is RTU(T₁) + RTU(T₅) + RTU(T₆) = 11 + 4 + 17 = 32. The RTWU value of each item is shown in Table 4.

Property 2.2 (Pruning using RTWU). The RTWU downward closure property states that any superset of a low-RTWU itemset is a low-utility itemset. For an itemset X, if RTWU(X) < min_util, then X is not a HUI and none of the supersets of X are HUIs. The detailed proof of this property can be found in [7, 30].

For example, with min_util = 10, every item in Table 4 has an RTWU of at least 32, so no single item is pruned by Property 2.2 in the running example.
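The running example also shows why the plain TWU is not a safe bound once negative utilities are present, whereas the RTWU is. The sketch below compares the two for the itemset {A, D, E}; the threshold min_util = 24 is a hypothetical value chosen only for this illustration and does not appear in the running example.

```python
from itertools import combinations

EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def covering(itemset):
    """Transactions that contain every item of the itemset."""
    return [t for t in DB.values() if set(itemset).issubset(t)]


def utility(itemset):
    return sum(sum(t[i] * EU[i] for i in itemset) for t in covering(itemset))


def twu(itemset):
    return sum(sum(q * EU[i] for i, q in t.items()) for t in covering(itemset))


def rtwu(itemset):
    return sum(
        sum(q * EU[i] for i, q in t.items() if EU[i] > 0) for t in covering(itemset)
    )


x = ("A", "D", "E")
print(utility(x), twu(x), rtwu(x))  # 24 19 28

# With the hypothetical threshold min_util = 24, TWU(X) = 19 < 24 would prune X
# although U(X) = 24 makes it a HUI; RTWU(X) = 28 >= 24 keeps it.
all_itemsets = [c for k in range(1, 6) for c in combinations(sorted(EU), k)]
print(all(rtwu(c) >= utility(c) for c in all_itemsets))  # True

print({i: rtwu((i,)) for i in sorted(EU)})
# {'A': 32, 'B': 53, 'C': 51, 'D': 52, 'E': 62}
```

The negative contribution of item B lowers TU(T₁) and TU(T₆) below the utility that {A, D, E} actually earns in those transactions, which is exactly the situation that the RTU is designed to avoid.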

2.3. Traditional high utility itemsets mining algorithms

Traditional HUIs mining algorithms generate rules from transactional datasets. Transactional datasets usually contain a large number of distinct items, and the number of their combinations is huge. Therefore, basic HUIs mining algorithms need upper bounds or other strategies to prune the search space. To address this issue, Liu et al. proposed the Two-Phase algorithm [34], which introduced a TWU-based pruning strategy. The algorithms in [25, 55] use the two-phase model and the TWU-based pruning strategy. All these two-phase based algorithms suffer from multiple dataset scans and expensive joining operations. To overcome these limitations, in 2010, Tseng et al. proposed a tree-based algorithm named UP-Growth [50]. UP-Growth follows the popular FIM algorithm FP-Growth [20]. Later on, other tree-based algorithms were proposed, such as [3, 59], which improve on the basic UP-Growth algorithm. UP-Growth and other UP-tree based algorithms still suffer from the generation of many candidate itemsets and from multiple dataset scans.

To overcome the limitations of tree-based algorithms, Liu et al. proposed the HUI-Miner algorithm [32], which uses a utility-list based data structure. Later on, variations of the utility-list and extensions of HUI-Miner, such as FHM [16] and HUP-Miner [21], were proposed to improve performance. However, utility-list based algorithms suffer from generating candidate itemsets that are not present in the dataset. They perform worst when a utility-list contains an entry for each transaction. To target these issues, Zida et al. proposed the EFIM algorithm [66], which uses a pattern-growth approach to mine HUIs. None of the algorithms discussed above work with negative utility values, although in real life items may occur with negative utility. To address this problem, algorithms for mining HUIs with negative utility have been presented. We discuss these algorithms in the next section.

3. High utility itemset mining with negative utility

HUIs mining is an important research area of data mining and has gained importance because of its wide range of applications. HUIs mining with negative items has recently received much attention from the decision-making community. It is useful in the marketing and retail communities as well as in other, more diverse fields.

This paper reviews three main types of HUIs mining algorithms as depicted in Fig. 2: level-wise (Apriori-based or Two-Phase based), tree-based (UP-tree like) and utility-list (vertical dataset format) based algorithms.

Fig. 2.

Taxonomy of high utility itemsets mining with negative utility according to mining approaches.

3.1. Level-wise mining algorithms

Level-wise algorithms follow an Apriori-like approach to candidate generation, in which k-itemsets are joined to generate (k + 1)-candidates. A level-wise approach thus generates all itemsets of length k before those of length k + 1. Only two algorithms (HUINIV-Mine and TS-HOUN) are available in the literature to mine HUIs with negative utility using the level-wise approach.
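The join step of a level-wise approach can be sketched as follows. This is the generic Apriori-style join and prune on sorted tuples, written under our own assumptions; it is not the exact procedure of HUINIV-Mine or TS-HOUN, and the test that decides which k-itemsets survive a level (for instance, a TWU or RTWU threshold) is deliberately left outside the function.

```python
from itertools import combinations


def generate_candidates(level_k):
    """Join k-itemsets that share their first k-1 items into (k+1)-candidates.

    Itemsets are alphabetically sorted tuples. A candidate is kept only if
    all of its k-subsets survived the previous level (downward closure).
    """
    level = sorted(level_k)
    survivors = set(level)
    candidates = []
    for a, b in combinations(level, 2):
        if a[:-1] == b[:-1]:                      # same (k-1)-prefix
            cand = a + (b[-1],)                   # still sorted because a < b
            if all(s in survivors for s in combinations(cand, len(a))):
                candidates.append(cand)
    return candidates


# Hypothetical surviving 2-itemsets, used only to exercise the join.
level_2 = [("A", "D"), ("A", "E"), ("C", "D"), ("C", "E"), ("D", "E")]
print(generate_candidates(level_2))
# [('A', 'D', 'E'), ('C', 'D', 'E')]

# Without ('D', 'E'), the candidate ('A', 'D', 'E') is pruned before counting.
print(generate_candidates([("A", "D"), ("A", "E")]))
# []
```

Every level produced in this way must be kept in memory until the next one is generated, which is the memory cost attributed to level-wise algorithms in this section.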

HUINIV-Mine: In 2009, Chu et al. proposed an algorithm named HUINIV-Mine [7]. It is the first level-wise algorithm that considers negative utility. It extends the Two-Phase algorithm and mines HUIs with negative items by following the Two-Phase model. HUINIV-Mine introduces an RTWU-based overestimation and pruning strategy. The authors demonstrated that a HUI in a dataset with negative utility must contain at least one positive item; otherwise, its utility would be negative and it could not be a HUI. Because it is a level-wise algorithm, it needs to maintain a large number of itemsets in memory to find larger itemsets. It scans the dataset three times to mine HUIs.
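The observation that a HUI must contain at least one positive item follows directly from Definitions 2.1 and 2.2. The short derivation below assumes, as is usual but not stated explicitly above, that all purchase quantities are positive and that min_util > 0.

```latex
\forall i \in X:\ EU(i) < 0
\;\Longrightarrow\;
U(X) \;=\; \sum_{X \subseteq T_j \wedge T_j \in D}\ \sum_{i \in X} IU(i, T_j) \times EU(i)
\;\le\; 0 \;<\; \mathit{min\_util}
```

Every term of the double sum is a positive quantity multiplied by a negative unit profit, so the sum cannot be positive, and it equals zero only when no transaction contains X.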

TS-HOUN: In 2014, Lan et al. proposed an algorithm named TS-HOUN [24]. It is the first algorithm that mines HUIs with negative utility while taking on-shelf time periods into account. Using on-shelf time periods, the actual utility values of itemsets in a temporal dataset can be accurately evaluated. Most algorithms assume that items have the same shelf time, i.e., that all items are on sale for the same time period. In real life, some items are only sold during a short time period (e.g., the summer). TS-HOUN scans the dataset three times and finds high on-shelf utility itemsets with negative profit values from temporal datasets. It uses the Two-Phase based (TWU) pruning strategy to prune the search space. Because it is an extension of the Two-Phase approach, it generates a lot of candidates and needs considerable runtime and memory to finish the mining task.

3.2. Tree-based mining algorithms

Tree-based algorithms build on the concept of a set-enumeration tree. The candidates can be explored using a lexicographic tree or enumeration tree. The main characteristic of tree-based algorithms is that the enumeration tree (or lexicographic tree) provides a certain order of exploration that can be extremely useful in many scenarios. It is assumed that a lexicographic ordering exists among the items in the dataset. This lexicographic ordering is essential for efficient set enumeration without repetition.

Tree-based mining algorithms mine HUIs by starting from an itemset of length 1 (as an initial suffix itemset), constructing its UP-tree and performing mining recursively on that tree. Pattern growth is achieved by concatenating the suffix itemset with the HUIs found in the UP-tree. Tree-based algorithms thus transform the problem of finding long itemsets into recursively searching for shorter ones and then concatenating the suffix. Five algorithms (MHUI-BIT-NIP, MHUI-TID-NIP, UP-GNIV, HUSP-NIV and EHIN) are available in the literature to mine HUIs with negative utility using the tree-based approach.

MHUI-BIT-NIP & MHUI-TID-NIP: In 2011, Li et al. proposed two algorithms to mine HUIs with negative item profits over continuous data streams using transaction-sensitive sliding windows [28]. An efficient data structure, LexTree-2HTU (Lexicographical Tree with 2-HTU-itemsets), is presented for maintaining the set of 2-HTU (high transaction-weighted utilization) itemsets from the current transaction-sensitive sliding window. LexTree-2HTU consists of two components: item-information and a set of trees with prefixes. The item-information consists of bit-vectors for the MHUI-BIT algorithm and TID-lists for the MHUI-TID algorithm. The prefix is an entry contained in the item-information. The bit-vector and TID-list representations improve the performance of the proposed algorithms. The MHUI-BIT-NIP and MHUI-TID-NIP algorithms mine HUIs in three phases: a window initialization phase, a window sliding phase and a HUIs generation phase. Both algorithms use a TWU-based search space pruning technique.

UP-GNIV: In 2015, Subramanian et al. proposed an algorithm named UP-GNIV (Utility Pattern-Growth approach for Negative Item Values) [46] to mine HUIs with negative values using a tree-based approach without candidate generation. It is a negative-utility version of the UP-Growth algorithm [50]. It maintains the information about items in a UP-Tree, similar to the one proposed in [2]. UP-GNIV introduces two strategies, RNU (Removing Negative item Utilities) and PNI (Pruning Negative Itemsets). UP-GNIV calculates the TU using the RNU strategy, and PNI is applied when mining HUIs. The authors compared the performance of the proposed algorithm with that of the state-of-the-art algorithm HUINIV-Mine on IBM synthetic datasets.

HUSP-NIV: In 2017, Xu et al. proposed HUSP-NIV, an algorithm for mining high utility sequential itemsets with negative utility values [54]. In high utility sequential itemsets mining, an item may occur more than once in a sequence. None of the algorithms discussed so far are suitable for sequential mining, and HUSP-NIV is the first algorithm that mines high utility sequential itemsets with negative utility. It is an extension of the USpan algorithm [58] and uses the same LQS-tree (lexicographic quantitative sequence tree) as USpan to extract high utility sequences. The I-Concatenation and S-Concatenation mechanisms are adopted from USpan to generate new candidate sequences and to calculate the utility of sub-nodes based on the utility of their super node.

EHIN: In 2018, Singh et al. proposed EHIN [40], an algorithm for mining HUIs with negative utility using a pattern-growth tree. EHIN reduces the dataset scanning cost by merging identical transactions. It also applies transaction merging to projected datasets to further reduce the scanning cost. EHIN introduces two techniques to prune the search space, named redefined subtree utility and redefined local utility. The authors compared the relative runtime and memory usage of EHIN with those of the state-of-the-art algorithm FHN and reported that EHIN is up to 28 times faster and consumes up to 10 times less memory. The experimental results show that EHIN always performs better on dense datasets.
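The idea of merging identical transactions can be illustrated on the running example, where T₂ and T₇ contain the same items. The sketch below is our own simplification: it treats two transactions as identical when they contain the same set of items and sums their per-item utilities. How EHIN actually stores merged transactions, and how it merges projected datasets, is not described in this survey, so those details are assumptions.

```python
from collections import defaultdict

EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def merge_identical(db, eu):
    """Merge transactions with the same item set by summing item utilities."""
    merged = defaultdict(lambda: defaultdict(int))
    for transaction in db.values():
        key = frozenset(transaction)              # identical = same items
        for item, qty in transaction.items():
            merged[key][item] += qty * eu[item]   # accumulate U(i, T_j)
    return {key: dict(utils) for key, utils in merged.items()}


merged = merge_identical(DB, EU)
print(len(DB), "->", len(merged))                 # 7 -> 6
print(sorted(merged[frozenset("BCE")].items()))   # [('B', -12), ('C', 7), ('E', 3)]
```

The total utility of the dataset is unchanged by the merge, while later scans visit six transactions instead of seven; on dense datasets, where many transactions coincide, the saving can be much larger.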

3.3. Utility-list based mining algorithms

Both the level-wise and tree-based algorithms mine HUIs from a set of transactions in a horizontal data format. Alternatively, mining can be performed on data presented in a vertical data format, as proposed in Eclat [61]. With a vertical data format, the first scan of the dataset builds the TID set of every single item. The TID sets of the (k + 1)-itemsets are then computed by intersecting the TID sets of the corresponding k-itemsets. This process repeats until no more itemsets satisfying the min_util threshold can be generated. The merit of this method is that, when generating candidate (k + 1)-itemsets from k-itemsets, there is no need to scan the dataset to find the utility of the (k + 1)-itemsets, because the TID set of each k-itemset carries the complete information required for computing utility. Five algorithms (FHN, FOSHU, GHUM, HUPNU and KOSHU) are available that mine HUIs with negative utility items using the utility-list based approach.
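The vertical representation and its intersection step can be sketched on the running example. The structure below is a deliberately simplified utility-list that maps each TID to the utility of the itemset in that transaction; real utility-lists, such as those of HUI-Miner or FHN, store additional fields (for example remaining utility, or separate positive and negative parts) that are omitted here. The function names are hypothetical.

```python
EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def build_item_lists(db, eu):
    """Single dataset scan: one {TID: utility} list per item."""
    lists = {}
    for tid, transaction in db.items():
        for item, qty in transaction.items():
            lists.setdefault(item, {})[tid] = qty * eu[item]
    return lists


def join(px, py, p=None):
    """Build the list of Pxy from the lists of Px and Py by TID intersection.

    p is the list of the shared prefix P (None when P is empty); its utility
    is counted in both Px and Py, so it is subtracted once.
    """
    joined = {}
    for tid in sorted(px.keys() & py.keys()):
        joined[tid] = px[tid] + py[tid] - (p[tid] if p else 0)
    return joined


lists = build_item_lists(DB, EU)
ad = join(lists["A"], lists["D"])          # {A, D}
ae = join(lists["A"], lists["E"])          # {A, E}
ade = join(ad, ae, p=lists["A"])           # {A, D, E}, prefix {A}

print(ad, sum(ad.values()))                # {'T1': 8, 'T6': 12} 20
print(ade, sum(ade.values()))              # {'T1': 11, 'T6': 13} 24
```

No transaction is re-read after `build_item_lists`: the utilities of {A, D} and {A, D, E} are obtained from the lists alone, which is the merit described above.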

FHN: In 2014, Fournier-Viger et al. proposed an algorithm named FHN (Faster High-utility itemset miner with Negative unit profits) [13]. FHN is an extension of the FHM algorithm [16]. It utilizes the utility-list structure to explore the search space of itemsets, with separate utility-list entries for positive and negative utility values. It also utilizes the EUCS (Estimated Utility Co-occurrence Structure), which provides an efficient pruning strategy to limit the search space. The authors demonstrated that FHN is up to 500 times faster and uses up to 250 times less memory than the state-of-the-art algorithm HUINIV-Mine. Later, in 2016, an extended version of the basic FHN algorithm was proposed. The extended FHN [30] utilizes the LA-Prune strategy proposed in [21] to prune the search space, and it is shown to be 2-3 orders of magnitude faster than HUINIV-Mine.
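A co-occurrence structure of this kind can be sketched as a table keyed by pairs of items. The sketch below assumes that the structure stores, for each pair, the RTWU of Definition 2.7, and that an extension is skipped when the stored value is below min_util; both the layout and the function names are our assumptions rather than details taken from FHN, and min_util = 20 is a hypothetical threshold.

```python
from itertools import combinations

EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def build_pair_table(db, eu):
    """For every co-occurring pair of items, accumulate the RTU of the transaction."""
    table = {}
    for transaction in db.values():
        rtu = sum(q * eu[i] for i, q in transaction.items() if eu[i] > 0)
        for pair in combinations(sorted(transaction), 2):
            table[pair] = table.get(pair, 0) + rtu
    return table


def may_extend(table, x, y, min_util):
    """False when the pair {x, y} cannot be part of any HUI (Property 2.2)."""
    return table.get(tuple(sorted((x, y))), 0) >= min_util


table = build_pair_table(DB, EU)
print(table[("A", "D")], table[("A", "C")])       # 28 17
print(may_extend(table, "A", "D", min_util=20))   # True
print(may_extend(table, "A", "C", min_util=20))   # False: skip every superset of {A, C}
```

The lookup replaces a utility-list intersection with a dictionary access, which is where the saving comes from when many pairs fall below the threshold.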

GHUM: In 2017, Krishnamoorthy et al. proposed an efficient algorithm named GHUM (Generalized High Utility Mining) [22]. GHUM presents a simplified utility-list based data structure to store the information about itemsets. It does not use a separate utility-list to store negative items. It sorts the negative items in ascending order of support (frequency) to generate candidates efficiently. GHUM adopts and modifies the existing pruning strategies U-Prune and LA-Prune [21]. Because utility-list based algorithms require expensive intersection operations to evaluate candidates, GHUM presents a novel pruning strategy (N-Prune) to significantly reduce the total number of evaluations. It also presents a pruning strategy based on an anti-monotonic property (A-Prune) for mining HUIs with negative items. GHUM is shown to achieve more than an order of magnitude improvement, at a fraction of the memory, over the state-of-the-art FHN.

HUPNU: In 2017, Gan et al. proposed an algorithm named HUPNU (mining High-Utility itemsets with both Positive and Negative unit profits from Uncertain databases) to mine HUIs with negative utility values from uncertain datasets [19]. It considers the probability values of items when mining HUIs. It uses a vertical PU±-list (Probability-Utility list with Positive-and-Negative profits) structure to store both negative and positive items. It presents six pruning strategies to reduce the search space, so that a number of unpromising itemsets can be pruned early while constructing the PU±-lists.

FOSHU: In 2015, Fournier-Viger et al. proposed an algorithm named FOSHU (Faster On-Shelf High Utility itemset miner) [17] to mine HUIs with negative utility from on-shelf datasets. On-shelf mining considers the shelf time of items. FOSHU is an extension of the FHN algorithm [13] and therefore utilizes the utility-list structure to store the information about items. FOSHU mines itemsets in a single phase without generating candidates, and it mines all time periods at the same time rather than mining each time period separately, unlike USpan [58]. Therefore, it avoids the costly operations of merging the itemsets found in each time period.

KOSHU: In 2017, Dam et al. proposed an algorithm named KOSHU [8], which is an extension of FOSHU. KOSHU addresses a new research issue in which both the negative utility and the shelf time of items are considered, and it targets the limitations of on-shelf HUIs mining based on a minimum utility threshold. KOSHU mines the top-k high on-shelf utility itemsets (HOUIs). Hence, it allows the user to specify a value of k, the number of itemsets to be found, instead of the min_util threshold. Like FOSHU, KOSHU mines the top-k HOUIs in all time periods at the same time. KOSHU introduces three novel strategies named EMPRP (efficient estimated co-occurrence maximum period rate pruning), CE2P (Concurrence existing of pair 2-itemset pruning) and PUP (Period utility pruning). It also proposes a threshold raising strategy named RIRU (Relative utilities threshold raising strategy), which is inspired by RIU, the traditional top-k HUIs mining strategy proposed in REPT [38].
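The general idea behind replacing min_util by k, namely an internal threshold that rises as better itemsets are found, can be sketched with a bounded min-heap. The sketch below is not KOSHU: it ignores on-shelf time periods and all of the pruning strategies named above, enumerates itemsets by brute force on the running example, and uses function names of our own choosing.

```python
import heapq
from itertools import combinations

EU = {"A": 2, "B": -3, "C": 1, "D": 4, "E": 1}
DB = {
    "T1": {"A": 2, "B": 2, "D": 1, "E": 3},
    "T2": {"B": 1, "C": 5, "E": 1},
    "T3": {"B": 2, "C": 1, "D": 3, "E": 2},
    "T4": {"C": 2, "D": 1, "E": 3},
    "T5": {"A": 2},
    "T6": {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    "T7": {"B": 3, "C": 2, "E": 2},
}


def utility(itemset):
    return sum(
        sum(t[i] * EU[i] for i in itemset)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


def top_k_itemsets(k):
    """Keep the k best itemsets in a min-heap; its root is the raised threshold."""
    heap = []                               # entries are (utility, itemset)
    threshold = float("-inf")               # internal min_util, raised over time
    items = sorted(EU)
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > threshold:
                heapq.heapreplace(heap, (u, itemset))   # evict the current worst
            if len(heap) == k:
                threshold = heap[0][0]
    return heap, threshold


best, threshold = top_k_itemsets(k=3)
print(threshold)                            # 31
print(sorted(itemset for _, itemset in best))
# [('C', 'D'), ('C', 'D', 'E'), ('D', 'E')]
```

In a real top-k miner, the raised threshold is fed back into the pruning strategies so that the search space shrinks as the search proceeds; here it is only reported.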

4. Summary and discussion

The previous section reviewed three main types of algorithms for mining HUIs with negative utility. The key differences between these algorithms can be described in terms of dataset scanning, the data structure used, the pruning strategies and the base algorithm. Table 6 summarizes the characteristics of the negative utility based HUIs mining algorithms discussed in Section 3. To our knowledge, there is no comprehensive study covering all of these algorithms in the literature.

Table 6
An overview of High utility itemset mining algorithms with negative utility values

Algorithm	Year	Author	Dataset scanning	Data structure	Dataset	Mining	Pruning strategy	State-of-the-art algorithms	Base algorithm
Level-wise mining algorithms
HUINIV-Mine [7]	2009	Chu et al.	Three times	…	Transactional	HUIs	TWU-based	None (first of its kind)	Two-Phase [34]
TS-HOUN [24]	2014	Lan et al.	Three times	…	Temporal transactional	HOUIs^a	TWU-based	None (first of its kind)	Two-Phase [34]
Tree-based mining algorithms
MHUI-BIT-NIP & MHUI-TID-NIP [28]	2011	Li et al.	Once	Lexicographical tree	Data stream	HUIs	TWU-based	self	THUI-Mine [26] (Two-Phase)
UP-GNIV [46]	2015	Subramanian et al.	Two times	UP-Tree	Transactional	HUIs	DGU, DGN [50]	HUINIV-Mine	UP-Growth [50]
HUSP-NIV [54]	2017	Xu et al.	Once	Lexicographical tree	Sequential transactional	High utility sequential itemsets	Depth-pruning, width-pruning, and depth & width pruning	None (first of its kind)	USpan [58]
EHIN [40]	2018	Singh et al.	Once	UP-Tree	Transactional	HUIs	RLU & RSU	FHN [30]	EFIM [66]
Utility-list based mining algorithms
FHN [13]	2014	Fournier-Viger et al.	Once	Utility-list	Transactional	HUIs	TWU-based & EUCP	HUINIV-Mine	FHM [16]
FOSHU [17]	2015	Fournier-Viger et al.	Once	Utility-list	Temporal transactional	HOUIs	TWU-based & EUCP	TS-HOUN	FHN
FHN [30] (extended)	2016	Lin et al.	Once	Utility-list	Transactional	HUIs	TWU-based, EUCP & LA-Prune	HUINIV-Mine	FHN
GHUM [22]	2017	Krishnamoorthy	Once	Simplified utility-list	Transactional	HUIs	U-Prune, LA-Prune, N-Prune & A-Prune	FHN	FHN
HUPNU [19]	2017	Gan et al.	Once	PU±-list^b	Uncertain transactional	HUIs	PU-Prune, RTWU & EUCP	None (first of its kind)	…
KOSHU [8]	2017	Dam et al.	Once	Utility-list	Temporal transactional	top-k HOUIs	EUCP, EMPRP, CE2P & PUP	TS-HOUN & FOSHU	FOSHU

^aHigh on-shelf utility itemsets. ^bProbability-Utility list with Positive-and-Negative profits.

Table 6 lists basic details of the algorithms, such as the name, the year of publication and the authors. The "Dataset scanning" column gives the number of times an algorithm scans the dataset; "Data structure" shows the type of data structure used to store the information about items; "Dataset" shows the type of input dataset; "Mining" gives the type of patterns produced; "Pruning strategy" shows the strategies used to prune the search space; "State-of-the-art algorithms" names the state-of-the-art algorithms used for comparison; and "Base algorithm" gives the algorithm on which the presented algorithm is based. The table covers all the algorithms used to mine HUIs with negative utility values.

Negative utility based mining can also be used in other domains, such as market basket analysis [2, 56], website click-stream analysis [27, 67], cross-marketing in retail stores [2, 57], biomedical applications [5] and mobile commerce applications [39, 47].

4.1. Datasets

Both real and synthetic datasets with varied characteristics have been used in the experiments. Two synthetic datasets, T10I4D100K and T40I10D100K, are used by most of the algorithms; these datasets are also used in traditional HUIs mining. Table 7 shows the statistical characteristics of the datasets. The external utilities (unit profits) of items are generated between -1000 and 1000 using a log-normal distribution [30, 40]. The internal utilities (quantities) of items are generated randomly between 1 and 5 [16, 49]. We categorize the datasets into two groups, dense and sparse. In dense datasets, items appear in almost all transactions. In sparse datasets, each transaction contains only a few of the items. The datasets used in the studied papers can be found on the SPMF website [15].
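A utility assignment of this kind can be sketched as follows. Only the value range (-1000 to 1000), the log-normal distribution and the quantity range (1 to 5) come from the description above; the distribution parameters `mu` and `sigma`, the share of negative items `negative_ratio`, the clipping and the function names are assumptions made for illustration and are not taken from the cited papers.

```python
import random


def generate_external_utilities(items, rng, mu=3.0, sigma=1.0,
                                negative_ratio=0.2, cap=1000):
    """Draw a log-normal unit profit per item and negate a share of them.

    mu, sigma and negative_ratio are assumed values, not reported settings.
    """
    utilities = {}
    for item in items:
        magnitude = min(cap, max(1, round(rng.lognormvariate(mu, sigma))))
        sign = -1 if rng.random() < negative_ratio else 1
        utilities[item] = sign * magnitude
    return utilities


def generate_quantities(transaction_items, rng, low=1, high=5):
    """Draw a uniform purchase quantity in [low, high] for each item."""
    return {item: rng.randint(low, high) for item in transaction_items}


rng = random.Random(42)                       # fixed seed for repeatability
eu = generate_external_utilities(range(100), rng)
quantities = generate_quantities([3, 17, 42], rng)

print(min(eu.values()), max(eu.values()))
print(quantities)
```

Most generated profits are small and a few are large, which is the usual motivation for using a log-normal distribution for prices.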

Table 7
Statistical information about datasets

Dataset	# of transactions	# of distinct items	Avg. length	Max. length	Type
accidents	340183	468	33.8	51	Dense
chess	3196	75	37	37	Dense
mushroom	8124	119	23	23	Dense
pumsb	49046	2113	74	74	Dense
T40I10D100K	100000	942	39.6	77	Dense
BMSPOS	515366	1656	6.51	164	Sparse
retail	88162	16470	10.3	76	Sparse
T10I4D100K	100000	870	10.1	29	Sparse
kosarak	990002	41270	8.09	2498	Sparse (Large)
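Table 7 labels each dataset as dense or sparse without giving a numeric criterion. One rough proxy, assumed here for illustration and not taken from the survey, is the ratio of the average transaction length to the number of distinct items. The sketch below computes that ratio from the figures in Table 7.

```python
# (name, number of distinct items, average transaction length, type) copied from Table 7
TABLE_7 = [
    ("accidents", 468, 33.8, "Dense"),
    ("chess", 75, 37, "Dense"),
    ("mushroom", 119, 23, "Dense"),
    ("pumsb", 2113, 74, "Dense"),
    ("T40I10D100K", 942, 39.6, "Dense"),
    ("BMSPOS", 1656, 6.51, "Sparse"),
    ("retail", 16470, 10.3, "Sparse"),
    ("T10I4D100K", 870, 10.1, "Sparse"),
    ("kosarak", 41270, 8.09, "Sparse (Large)"),
]

def density(distinct_items, avg_length):
    """Assumed proxy: fraction of all items that an average transaction contains."""
    return avg_length / distinct_items

densities = {name: density(n_items, avg_len) for name, n_items, avg_len, _ in TABLE_7}
dense = [densities[name] for name, _, _, kind in TABLE_7 if kind.startswith("Dense")]
sparse = [densities[name] for name, _, _, kind in TABLE_7 if kind.startswith("Sparse")]

for name, _, _, kind in TABLE_7:
    print(f"{name:12s} {100 * densities[name]:6.2f}%  {kind}")
```

Under this proxy, every dataset labelled dense in Table 7 scores above 3%, and every dataset labelled sparse scores below 1.2%, so the ratio is consistent with the labels even though it is not their stated definition.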

5. Future directions for high utility itemsets mining with negative utility value problem

Considerable research progress has recently been made in high utility itemsets mining with negative utility values. However, emerging issues continue to challenge the traditional methods. In this section, we identify the related research opportunities in high utility itemsets mining with negative utility values.

5.1. Concise high utility itemsets mining with negative utility value

Closed HUIs are HUIs that have no proper superset with the same support count. Discovering closed HUIs instead of all HUIs reduces the number of itemsets. Closed HUIs are also more interesting and actionable because they are a lossless representation of all HUIs: from the closed HUIs, the information about all HUIs, including their support, can be recovered without scanning the dataset. Each closed HUI is the largest itemset among the HUIs that occur in the same transactions, so redundant itemsets are not repeated. Many algorithms for mining closed HUIs have recently been proposed [18, 52]. However, no algorithm has yet been proposed for mining closed HUIs with negative utility values. Therefore, mining closed HUIs with negative utility values is a good area to explore.
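The closedness condition can be illustrated on the transactions of Table 1. The sketch below is a brute-force check, not one of the cited closed HUIs algorithms, and it tests closedness only; a closed HUI would additionally have to reach the minimum utility threshold. The function names are hypothetical.

```python
from itertools import combinations

# Items of each transaction in Table 1 (quantities do not affect support).
TRANSACTIONS = [
    {"A", "B", "D", "E"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
    {"C", "D", "E"},
    {"A"},
    {"A", "B", "C", "D", "E"},
    {"B", "C", "E"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))

def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def is_closed(itemset):
    """True if no proper superset has the same support.

    Checking one-item extensions is enough, because adding items can
    never increase the support.
    """
    s = support(itemset)
    return s > 0 and all(
        support(itemset | {item}) < s for item in ITEMS if item not in itemset
    )

all_itemsets = [
    frozenset(c) for n in range(1, len(ITEMS) + 1) for c in combinations(ITEMS, n)
]
closed = [x for x in all_itemsets if is_closed(x)]
```

For example, {B} is not closed because {B, E} occurs in the same five transactions, whereas {B, E} itself is closed.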

5.2. Constraint-based high utility itemsets mining with negative utility value

Although mining with negative utility uncovers thousands of HUIs, the user is often interested only in the long and more actionable itemsets. Many algorithms have been proposed to reduce the total number of itemsets generated by imposing constraints on the resulting patterns. Hence, users are more interested in constrained HUIs mining. Pei and Han [36] presented many constraints for FIM. These constraints can also be pushed into HUIs mining.

Length-based HUIs mining plays an important role in constraint-based HUIs mining. Two length bounds (a minimum length and a maximum length) can be defined to mine length-constrained HUIs. To remove very short itemsets, the user can set the minimum length threshold. Length-based HUIs mining can remove many small itemsets and produce more interesting and actionable HUIs. An HUIs mining algorithm with length thresholds has been presented by Fournier-Viger et al. [14]. However, this work does not consider negative utility values. Therefore, length constraints can be utilized in HUIs mining with negative utility values. To find HUIs more efficiently, the constraints need to be pushed as deep into the mining process as possible.
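A minimal sketch of pushing the length constraints into the search is given below. It uses the quantities of Table 1 with hypothetical unit profits (the `PROFIT` values, including the negative ones, are assumptions for illustration) and a plain depth-first enumeration. It is not the algorithm of [14] and applies no utility-based pruning; it only shows the maximum length stopping the search and the minimum length filtering the output.

```python
# Quantities from Table 1; the unit profits are hypothetical.
DB = [
    {"A": 2, "B": 2, "D": 1, "E": 3},
    {"B": 1, "C": 5, "E": 1},
    {"B": 2, "C": 1, "D": 3, "E": 2},
    {"C": 2, "D": 1, "E": 3},
    {"A": 2},
    {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    {"B": 3, "C": 2, "E": 2},
]
PROFIT = {"A": 4, "B": 3, "C": -1, "D": 2, "E": -2}  # assumed for illustration
ITEMS = sorted(PROFIT)

def utility(itemset):
    """Sum of quantity * unit profit over the transactions containing the whole itemset."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in DB
        if all(i in t for i in itemset)
    )

def mine(min_util, min_len, max_len):
    """Return {itemset: utility} for itemsets within the length bounds and above min_util."""
    results = {}

    def extend(prefix, start):
        if len(prefix) >= min_len:
            u = utility(prefix)
            if u >= min_util:
                results[prefix] = u
        if len(prefix) == max_len:
            return  # constraint pushed into the search: longer itemsets are never built
        for k in range(start, len(ITEMS)):
            extend(prefix + (ITEMS[k],), k + 1)

    extend((), 0)
    return results
```

With these assumed profits, `mine(20, 2, 3)` returns {A, B}: 25, {A, D}: 22, {B, D}: 27 and {A, B, D}: 31. The itemset {A, B, D, E} has utility 23 but is never generated, because it exceeds the maximum length.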

5.3. Top-k high utility itemsets mining with negative utility value

Mining HUIs using min_util is difficult for users because specifying an appropriate min_util is not an easy task. Hence, the top-k HUIs mining problem was proposed. Many top-k HUIs mining algorithms have been proposed, such as [11, 53], but none of them can mine HUIs with negative utility values. Therefore, top-k HUIs mining with negative utility can also be explored in the future.
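The idea of replacing a user-given min_util with a parameter k can be sketched with a min-heap whose smallest entry acts as an internal threshold that rises during the search. The sketch below enumerates all itemsets by brute force over the quantities of Table 1 with hypothetical unit profits; it is not one of the cited top-k algorithms, and the names `top_k` and `PROFIT` are assumptions.

```python
import heapq
from itertools import combinations

# Quantities from Table 1; the unit profits are hypothetical.
DB = [
    {"A": 2, "B": 2, "D": 1, "E": 3},
    {"B": 1, "C": 5, "E": 1},
    {"B": 2, "C": 1, "D": 3, "E": 2},
    {"C": 2, "D": 1, "E": 3},
    {"A": 2},
    {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    {"B": 3, "C": 2, "E": 2},
]
PROFIT = {"A": 4, "B": 3, "C": -1, "D": 2, "E": -2}  # assumed for illustration

def utility(itemset):
    """Sum of quantity * unit profit over the transactions containing the whole itemset."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in DB
        if all(i in t for i in itemset)
    )

def top_k(k):
    """Return the k itemsets with the highest utility as (utility, itemset) pairs."""
    if k <= 0:
        return []
    heap = []                  # min-heap; heap[0] is the weakest itemset kept so far
    min_util = float("-inf")   # internal threshold, raised once k itemsets are held
    items = sorted(PROFIT)
    for n in range(1, len(items) + 1):
        for itemset in combinations(items, n):
            u = utility(itemset)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > min_util:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                min_util = heap[0][0]
    return sorted(heap, reverse=True)
```

With the assumed profits, `top_k(1)` returns the itemset {A, B, D} with utility 31. A practical algorithm would also use the rising threshold to prune the search space, which requires upper bounds that remain valid in the presence of negative utilities.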

5.4. High utility itemsets mining with negative utility value from data stream

HUIs mining from data streams has many applications, such as retail market analysis, wireless sensor networks, and stock market prediction. Handling huge and dynamic datasets is a challenging task. To handle both data streams and negative utility values, only the MHUI-BIT-NIP and MHUI-TID-NIP [28] algorithms have been proposed. These algorithms follow an ordinary lexicographical tree structure. Hence, there is considerable scope to improve them and to propose new algorithms for mining HUIs with negative utility from data streams.
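To make the streaming setting concrete, the sketch below tracks the utility of a single itemset over a sliding window of the most recent transactions. It is a generic illustration and does not reproduce MHUI-BIT-NIP or MHUI-TID-NIP; the class name, the window size, and the unit profits are assumptions.

```python
from collections import deque

PROFIT = {"A": 4, "B": 3, "C": -1, "D": 2, "E": -2}  # hypothetical unit profits

class SlidingWindowUtility:
    """Keeps the utility of one itemset over the last `size` transactions of a stream."""

    def __init__(self, itemset, size):
        self.itemset = frozenset(itemset)
        self.size = size
        self.window = deque()  # per-transaction contributions currently in the window
        self.total = 0

    def _contribution(self, transaction):
        # A transaction contributes only if it contains the whole itemset.
        if all(item in transaction for item in self.itemset):
            return sum(transaction[item] * PROFIT[item] for item in self.itemset)
        return 0

    def push(self, transaction):
        """Add a new transaction, evict the oldest one if needed, return the window utility."""
        value = self._contribution(transaction)
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total

# The transactions of Table 1, fed in order as a stream.
stream = [
    {"A": 2, "B": 2, "D": 1, "E": 3},
    {"B": 1, "C": 5, "E": 1},
    {"B": 2, "C": 1, "D": 3, "E": 2},
    {"C": 2, "D": 1, "E": 3},
    {"A": 2},
    {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    {"B": 3, "C": 2, "E": 2},
]
tracker = SlidingWindowUtility({"B", "D"}, size=3)
totals = [tracker.push(t) for t in stream]
```

With a window of three transactions, the utility of {B, D} evolves as 8, 8, 20, 12, 12, 7, 7, so an itemset can be high utility in one window and fall below the threshold in the next.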

5.5. High utility itemsets mining on big data

The algorithms discussed in this paper do not scale to big data. In the literature, no algorithm is available for mining HUIs with negative utility from big data. Big data can be handled using multi-core computing, grid computing, graphics processing units (GPUs), MapReduce, and the Apache Hadoop and Spark frameworks. Some works [4, 65] deal with big data and describe basic data mining operations on big data. In the future, HUIs mining with negative utility can be extended to big data as well.
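As a hedged illustration of how a basic utility computation decomposes in the MapReduce style, the sketch below computes per-item utilities in a map phase over dataset partitions and sums them in a reduce phase. It runs sequentially in one process and uses no Hadoop or Spark API; the partitioning, the function names, and the unit profits are assumptions.

```python
from functools import reduce

# Quantities from Table 1; the unit profits are hypothetical.
DB = [
    {"A": 2, "B": 2, "D": 1, "E": 3},
    {"B": 1, "C": 5, "E": 1},
    {"B": 2, "C": 1, "D": 3, "E": 2},
    {"C": 2, "D": 1, "E": 3},
    {"A": 2},
    {"A": 2, "B": 1, "C": 4, "D": 2, "E": 1},
    {"B": 3, "C": 2, "E": 2},
]
PROFIT = {"A": 4, "B": 3, "C": -1, "D": 2, "E": -2}  # assumed for illustration

def map_partition(partition):
    """Map phase: utility of each item within one partition of the dataset."""
    partial = {}
    for transaction in partition:
        for item, quantity in transaction.items():
            partial[item] = partial.get(item, 0) + quantity * PROFIT[item]
    return partial

def merge(left, right):
    """Reduce phase: add two partial results key by key, keeping negative totals."""
    merged = dict(left)
    for item, value in right.items():
        merged[item] = merged.get(item, 0) + value
    return merged

# Three partitions stand in for data blocks that a cluster would process in parallel.
partitions = [DB[0:3], DB[3:5], DB[5:7]]
item_utilities = reduce(merge, map(map_partition, partitions), {})
```

Plain dictionaries are merged explicitly here because adding `collections.Counter` objects with `+` discards non-positive totals, which would silently drop items with negative utility.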

5.6. Other problems

Other extensions of HUIs mining with negative utility value can mine rich itemsets in various ways, such as fuzzy HUIs mining, periodic HUIs mining, and episode HUIs mining. The latest preprocessing techniques can be used to improve the accuracy, completeness, and consistency of the algorithms. Some works [37, 63] present recent preprocessing techniques that can be utilized with HUIs mining.

6. Conclusion

HUIs mining has numerous applications, such as market basket analysis [2, 56], website click-stream analysis [27, 67], cross-marketing in retail stores [2, 57], business intelligence [9, 10], biomedical applications [5], and mobile commerce applications [39, 47]. In this paper, we introduce the fundamental methods to readers so that they can choose the proper method for their applications. This work can reduce the cost of finding a suitable algorithm for an application and may improve retailer satisfaction. The extracted HUIs help corporate managers target customers and gain more profit. HUIs also help companies make decisions promptly. Overall, HUIs mining has a bright future and deserves research attention.

In this survey, we presented an in-depth analysis of a number of existing algorithms that made significant contributions to improving the efficiency of HUIs mining with negative utility.

Moreover, the paper presents important extensions of the HUIs mining with negative utility problem that address some of its shortcomings. In addition, the paper discusses other research problems related to HUIs mining with negative utility, such as closed HUIs mining, constraint-based HUIs mining, HUIs mining from data streams, top-k HUIs mining, and periodic HUIs mining with negative utility.

7. Compliance with ethical standards

The authors declare no conflicts of interest. This article does not contain any studies with human participants or animals performed by any of the authors. The article gives a review of high utility itemsets mining with negative utility values.

References

1. Agrawal and Srikant, Fast algorithms for mining association rules in large databases, In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., 1994, pp. 487–499.

2. C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong and Y.-K. Lee, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans on Knowl and Data Eng 21(12) (2009), 1708–1721.

3. C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong and Y.-K. Lee, HUC-Prune: An efficient candidate pruning technique to mine high utility patterns, Applied Intelligence 34(2) (2011), 181–198.

4. Apiletti, Baralis, Cerquitelli, Garza, Pulvirenti and Michiardi, A parallel MapReduce algorithm to efficiently support itemset mining on high dimensional data, Big Data Research 10 (2017), 53–69.

5. Chan, Yang and Y.-D. Shen, Mining high utility itemsets, In Proceedings of the Third IEEE International Conference on Data Mining, ICDM '03, Washington, DC, USA, IEEE Computer Society, 2003, pp. 19–26.

6. C.-J. Chu, V.S. Tseng and Liang, An efficient algorithm for mining temporal high utility itemsets from data streams, Journal of Systems and Software 81(7) (2008), 1105–1117.

7. C.-J. Chu, V.S. Tseng and Liang, An efficient algorithm for mining high utility itemsets with negative item values in large databases, Applied Mathematics and Computation 215(2) (2009), 767–778.

8. T.-L. Dam, Li, Fournier-Viger and Q.-H. Duong, An efficient algorithm for mining top-k on-shelf high utility itemsets, Knowledge and Information Systems 52(3) (2017), 621–655.

9. Duan and Xiong, Big data analytics and business analytics, Journal of Management Analytics 2(1) (2015), 1–21.

10. Duan and L.D. Xu, Business intelligence for enterprise systems: A survey, IEEE Transactions on Industrial Informatics 8(3) (2012), 679–687.

11. Q.-H. Duong, Liao, Fournier-Viger and T.-L. Dam, An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies, Knowledge-Based Systems 104 (2016), 106–122.

12. Erwin, R.P. Gopalan and N.R. Achuthan, Efficient Mining of High Utility Itemsets from Large Datasets, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 554–561.

13. Fournier-Viger, FHN: Efficient mining of high-utility itemsets with negative unit profits, In X. Luo, J.X. Yu and Z. Li, editors, Advanced Data Mining and Applications, Cham, Springer International Publishing, 2014, pp. 16–29.

14. Fournier-Viger, J.C.-W. Lin, Q.-H. Duong and T.-L. Dam, FHM+: Faster High-Utility Itemset Mining Using Length Upper-Bound Reduction, Springer International Publishing, Cham, 2016a, pp. 115–127.

15. Fournier-Viger, J.C.-W. Lin, Gomariz, Gueniche, Soltani, Deng and H.T. Lam, The SPMF Open-Source Data Mining Library Version 2, Springer International Publishing, Cham, 2016b, pp. 36–40.

16. Fournier-Viger, C.-W. Wu, Zida and V.S. Tseng, FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-Occurrence Pruning, Springer International Publishing, Cham, 2014, pp. 83–92.

17. Fournier-Viger and Zida, FOSHU: Faster on-shelf high utility itemset mining - with or without negative unit profit, In Proceedings of the 30th Annual ACM Symposium on Applied Computing, SAC '15, New York, NY, USA, ACM, 2015, pp. 857–864.

18. Fournier-Viger, Zida, J.C.-W. Lin, C.-W. Wu and V.S. Tseng, EFIM-Closed: Fast and Memory Efficient Discovery of Closed High-Utility Itemsets, Springer International Publishing, Cham, 2016c, pp. 199–213.

19. Gan, J.C.-W. Lin, Fournier-Viger, H.-C. Chao and V.S. Tseng, Mining high-utility itemsets with both positive and negative unit profits from uncertain databases, In J. Kim, K. Shim, L. Cao, J.-G. Lee, X. Lin and Y.-S. Moon, editors, Advances in Knowledge Discovery and Data Mining, Cham, Springer International Publishing, 2017, pp. 434–446.

20. Han, Pei and Yin, Mining frequent patterns without candidate generation, In ACM SIGMOD Record 29 (2000), 1–12, ACM.

21. Krishnamoorthy, Pruning strategies for mining high utility itemsets, Expert Systems with Applications 42(5) (2015), 2371–2381.

22. Krishnamoorthy, Efficiently mining high utility itemsets with negative unit profits, Knowledge-Based Systems (2017a).

23. Krishnamoorthy, HMiner: Efficiently mining high utility itemsets, Expert Systems with Applications 90(Supplement C) (2017b), 168–183.

24. G.-C. Lan, T.-P. Hong, J.-P. Huang and V.S. Tseng, On-shelf utility mining with negative item values, Expert Syst Appl 41(7) (2014a), 3450–3459.

25. G.-C. Lan, T.-P. Hong and V.S. Tseng, An efficient projection-based indexing approach for mining high utility itemsets, Knowledge and Information Systems 38(1) (2014b), 85–107.

26. Lee, S.-H. Park and Moon, Utility-based association rule mining: A marketing solution for cross-selling, Expert Systems with Applications 40(7) (2013), 2715–2725.

27. H.F. Li, H.Y. Huang, Y.C. Chen, Y.J. Liu and S.Y. Lee, Fast and memory efficient mining of high utility itemsets in data streams, In 2008 Eighth IEEE International Conference on Data Mining, 2008a, pp. 881–886.

28. H.-F. Li, H.-Y. Huang and S.-Y. Lee, Fast and memory efficient mining of high-utility itemsets from data streams: With and without negative item profits, Knowledge and Information Systems 28(3) (2011), 495–522.

29. Y.-C. Li, J.-S. Yeh and C.-C. Chang, Isolated items discarding strategy for discovering high utility itemsets, Data and Knowledge Engineering 64(1) (2008b), 198–217. Fourth International Conference on Business Process Management (BPM 2006) and 8th International Conference on Enterprise Information Systems (ICEIS 2006).

30. J.C.-W. Lin, Fournier-Viger and Gan, FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits, Knowledge-Based Systems 111 (2016), 283–298.

31. Liu, Wang and B.C.M. Fung, Mining high utility patterns in one phase without generating candidates, IEEE Transactions on Knowledge and Data Engineering 28(5) (2016), 1245–1257.

32. Liu and Qu, Mining high utility itemsets without candidate generation, In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, New York, NY, USA, ACM, 2012, pp. 55–64.

33. Liu, W.-K. Liao and Choudhary, A fast high utility itemsets mining algorithm, In Proceedings of the 1st International Workshop on Utility-based Data Mining, UBDM '05, New York, NY, USA, ACM, 2005a, pp. 90–99.

34. Liu, W.-K. Liao and Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, In Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD '05, Berlin, Heidelberg, Springer-Verlag, 2005b, pp. 689–695.

35. Martin, Martinez-Ballesteros, Garcia-Gil, Alcala-Fdez, Herrera and Riquelme-Santos, MRQAR: A generic MapReduce framework to discover quantitative association rules in big data problems, Knowledge-Based Systems 153 (2018), 176–192.

36. Pei and Han, Constrained frequent pattern mining: A pattern-growth view, SIGKDD Explor Newsl 4(1) (2002), 31–39.

37. Ramirez-Gallego, Krawczyk, Garcia, Wozniak and Herrera, A survey on data preprocessing for data stream mining: Current status and future directions, Neurocomputing 239 (2017), 39–57.

38. Ryang and Yun, Top-k high utility pattern mining with effective threshold raising strategies, Knowledge-Based Systems 76 (2015), 109–126.

39. B.-E. Shie, H.-F. Hsiao, V.S. Tseng and P.S. Yu, Mining High Utility Mobile Sequential Patterns in Mobile Commerce Environments, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 224–238.

40. Singh, H.K. Shakya, Abhimanyu and Biswas, Mining of high utility itemsets with negative utility, Expert Systems (2018a), e12296, doi:10.1111/exsy.12296.

41. Singh, H.K. Shakya and Biswas, An efficient approach to discovering frequent patterns from data cube using aggregation and directed graph, In Proceedings of the Sixth International Conference on Computer and Communication Technology, ICCCT '15, New York, NY, USA, ACM, 2015, pp. 31–35.

42. Singh, H.K. Shakya and Biswas, Discovery of Multi-frequent Patterns Using Directed Graph, Springer India, New Delhi, 2016a, pp. 153–162.

43. Singh, H.K. Shakya and Biswas, Frequent Patterns Mining from Data Cube Using Aggregation and Directed Graph, Springer India, New Delhi, 2016b, pp. 167–177.

44. Singh, S.S. Singh, Kumar and Biswas, TKEH: An efficient algorithm for mining top-k high utility itemsets, Applied Intelligence (2018b).

45. Song, Liu and Li, BAHUI: Fast and memory efficient mining of high utility itemsets based on bitmap, Int J Data Warehous Min 10(1) (2014), 1–15.

46. Subramanian and Kandhasamy, UP-GNIV: An expeditious high utility pattern mining algorithm for itemsets with negative utility values, International Journal of Information Technology and Management 14(1) (2015), 26–42.

47. V.S. Tseng, B.-E. Shie, C.-W. Wu and P.S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans on Knowl and Data Eng 25(8) (2013), 1772–1786.

48. V.S. Tseng, C.W. Wu, Fournier-Viger and P.S. Yu, Efficient algorithms for mining the concise and lossless representation of high utility itemsets, IEEE Transactions on Knowledge and Data Engineering 27(3) (2015), 726–739.

49. V.S. Tseng, C.W. Wu, Fournier-Viger and P.S. Yu, Efficient algorithms for mining top-k high utility itemsets, IEEE Transactions on Knowledge and Data Engineering 28(1) (2016), 54–67.

50. V.S. Tseng, C.-W. Wu, B.-E. Shie and P.S. Yu, UP-Growth: An efficient algorithm for high utility itemset mining, In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, New York, NY, USA, ACM, 2010, pp. 253–262.

51. Uno, Kiyomi and Arimura, LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets, In IEEE ICDM Workshop on Frequent Itemset Mining Implementations, volume 126, Brighton, UK, 2004.

52. C.W. Wu, Fournier-Viger, J.Y. Gu and V.S. Tseng, Mining closed+ high utility itemsets without candidate generation, In 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2015, pp. 187–194.

53. C.W. Wu, B.-E. Shie, V.S. Tseng and P.S. Yu, Mining top-k high utility itemsets, In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, New York, NY, USA, ACM, 2012, pp. 78–86.

54. Xu, Dong, Xu and Dong, Mining high utility sequential patterns with negative item values, International Journal of Pattern Recognition and Artificial Intelligence 31(10) (2017), 1750035.

55. Yao and H.J. Hamilton, Mining itemset utilities from transaction databases, Data Knowl Eng 59(3) (2006), 603–626.

56. Yao, H.J. Hamilton and C.J. Butz, A foundational approach to mining itemset utilities from databases, In Proceedings of the Third SIAM International Conference on Data Mining, 2004, pp. 482–486.

57. S.-J. Yen and Y.-S. Lee, Mining High Utility Quantitative Association Rules, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 283–292.

58. Yin, Zheng and Cao, USpan: An efficient algorithm for mining high utility sequential patterns, In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, New York, NY, USA, ACM, 2012, pp. 660–668.

59. Yun, Ryang and K.H. Ryu, High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert Systems with Applications 41(8) (2014), 3861–3878.

60. M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering 12(3) (2000a), 372–390.

61. M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering 12(3) (2000b), 372–390.

62. Zhai, Li and Wang, A cross-selection instance algorithm, Journal of Intelligent & Fuzzy Systems 30(2) (2016a), 717–728.

63. Zhai, Wang and Pang, Voting-based instance selection from large data sets with MapReduce and random weight networks, Information Sciences 367-368 (2016b), 1066–1077.

64. Zhai, Zhang and Wang, The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers, International Journal of Machine Learning and Cybernetics 8(3) (2017), 1009–1017.

65. Zhai, Zhang, Zhang and Liu, Fuzzy integral-based ELM ensemble for imbalanced big data classification, Soft Computing 22(11) (2018), 3519–3531.

66. Zida, Fournier-Viger, J.C.-W. Lin, C.-W. Wu and V.S. Tseng, EFIM: A fast and memory efficient algorithm for high-utility itemset mining, Knowledge and Information Systems 51(2) (2017), 595–625.

67. Zihayat and An, Mining top-k high utility patterns over data streams, Information Sciences 285 (2014), 138–161. Processing and Mining Complex Data Streams.
