Actionable knowledge discovery from social networks using causal structures of structural features

Abstract

Knowledge discovery and data mining provide an array of solutions for real-world problems. When facing business requirements, the ultimate goal of knowledge discovery is not the knowledge itself but rather making the gained knowledge practical. Consequently, the models and patterns found by mining methods often require post-processing. To this end, actionable knowledge discovery has been introduced to extract actionable knowledge from data. Its output is a set of actions that help the domain expert achieve the desired outcome. The process of extracting such a set of actions is called action extraction. One of the challenges of action extraction is to incorporate causal dependencies among the variables in order to find actions that are more effective than those found without such dependencies. This paper addresses the lesser studied subject of “action discovery in social networks” and extracts actions by utilizing the causal structures discovered from such data. Furthermore, in order to capture the underlying information within a social network, we extract the corresponding structural features. We propose a method called SF-ICE-CREAM (Social Features included Inductive Causation Enabled Causal Relationship-based Economical Action Mining) to overcome the challenges introduced above. This method uses structural features to find the underlying causal structures within a social network and incorporates them into the action extraction process.

Keywords

Actionable knowledge discovery, action extraction, causal network, feature extraction

1 Introduction

Social networks have been a rich source of study due to their continuous evolution over the past decade. Scientists have applied a variety of data mining methods to social data to extract knowledge. Detection of social anomalies and smart advertisement are just a few of the applications of such knowledge. Typical data mining methods process such data to provide either predictive or descriptive models, which results in useful knowledge. But when we face real-world problems, having only a model of the data does not suffice, because we need to change the current situation towards a desirable one through immediate actions and interventions according to the business requirements. In order to intervene, i.e. apply the suggestions that could potentially result in an improved situation, a field of study called “Actionable Knowledge Discovery” has been introduced. However, in the actionable knowledge discovery domain, as in other typical data mining tasks, the foundation of the gained knowledge is the correlations discovered inside the input data. These correlations do not necessarily imply causality, and as a result the suggested actions are based solely on statistical correlations rather than actual causal dependencies. Consequently, these methods fail to guarantee the desired outcome. Pearl introduced the impact of causality in data mining for the first time in [1] and then, with Verma, proposed a method to extract causal structures from data in [2]. Extracting causal structures has many limitations and complexities and is nearly impossible in most cases. But works such as [3] show that these structures can be found if certain assumptions are made, and that using them can improve the quality of actions compared to using only correlations.
Considering causal structures in the actionable knowledge discovery domain has been a lesser researched subject to the best of our knowledge and none of the works in the field has discovered actionable knowledge from social networks while incorporating causality.

Notably, action mining in social networks has recently been introduced in [4, 5], where the relationships between individuals are incorporated in the mining process. In this work, cost-effective action mining is turned into an optimization problem where the objective is to obtain a desired label for an intended object while minimizing the cost of the changes suggested by the extracted actions. The problem is modeled using random walks over the network and solved by Stochastic Gradient Descent. Although this work comprehensively extracts actions from a social network, it does not take the impact of causality into account, whereas in this paper we intend to incorporate causality in the action mining process.

In addition to causality, the relationships between objects are another important factor in social network data. To the best of our knowledge, methods of actionable knowledge discovery use data where each object and its features are independent of others. In other words, the data is in a tabular format where each record comprises features without regard to the dependencies inside the data.

In this paper, we propose a method that extracts actionable knowledge while exploiting causal structures inside social network data. We also intend to shift away from the typical features used in this domain and use features based solely on the relationships among the objects inside the network. Such features are called structural features. To the best of our knowledge, this is the first research that tackles the extraction of actionable knowledge from causal structures of social networks. Our detailed contributions in this paper are as follows:

Extract interpretable structural features based on the relationships inside a social network.

Extract causal structures from the extracted structural features.

Extract actions from social network graph using causal structures of structural features.

Our goal is ultimately to obtain a set of suggestions that could help to alter the undesired situation. In other words, for an intended object in an undesired class, we are looking for a set of suggestions to change the feature values of the intended object such that the altered object would be in the desired class. This is the definition of an action. That is, an action is a suggestion to change the value of a specific feature of the intended object in order to make its situation more desirable. A process in which a set of such actions is extracted for any given object in an undesirable state is called Action Mining. In some methods like [6] and also in ours, a cost is associated with each change of a feature value, which depends on the difference between the original feature value and the value suggested by the action. Such methods are called economical because they incorporate such costs. We will show that our method outperforms its competitors on social network data in terms of different measures of action quality such as effectiveness.

Since actions suggest changes to the values of features, the structural features extracted by the feature extraction method should be interpretable by a domain expert. Therefore, we utilize the ReFeX method introduced in [6], extending it to provide distinguishable features. For action extraction, we use the ICE-CREAM method proposed in [3], exploiting the causal structures extracted so far. We name the resulting method SF-ICE-CREAM (Structural Features-based ICE-CREAM).

The rest of this paper is structured as follows. We review the latest works of this field in the related works section. In the terminology section, we redefine some terms and definitions for disambiguation and integrity. In the proposed method section we thoroughly examine our method. Then we show the experimental results and present comparisons with other state-of-the-art methods in the experiments section. Finally, in the discussion and future works section, we conclude the research and propose future works.

2 Related work

Since the scientific contribution of our work is mainly focused on action mining, in this section we review the latest research in the literature. Each work and its application will be described concisely. It is noteworthy that very few of the works in this field have studied causality. Furthermore, to the best of our knowledge, causality in social networks has not been investigated in the action mining literature.

Action mining is part of a subdomain of data mining called Actionable Knowledge Discovery (AKD), which is concerned with finding the knowledge which not only is of technical significance but also satisfies domain expectations, and can be applied to operations with minimal further effort of domain experts [7]. Action mining methods find an optimal action for a given instance (Transductive methods) or find rules for different groups of similar instances (Inductive methods) which can afterwards be used to produce actions.

Ras et al. in (Ras & Wieczorkowska, 2000) defined the concept of action rules and proposed a method to produce the rules using pairs of classification rules. Afterwards, this work was developed to mine action rules without pre-existing classification rules through an Apriori-like algorithm to handle big data based on Map-Reduce framework [8].

Su et al. [9] introduced actionable behavioral rule mining, which aims to extract action rules from object-based data for affecting an entity's behavior. The data includes observations (objects) of an entity instead of members of an entity. In addition, there is a current observation where each change of a proposed action rule needs to change an attribute value of the entity from the current observation. In [10], Su et al. proposed a method to extract actionable behavioral rules based on decision trees.

While an action rule is a set of changes that need to be made for achieving the desired result, meta-actions are the actions that need to be executed in order to trigger the corresponding changes. In [11], Ranganathan et al. proposed a new efficient system to generate meta-actions by implementing Specific Action Rule discovery based on Grabbing Strategy (SARGS) and applying it to Twitter data for semantic analysis.

In [12], Kuang et al. worked on a hierarchical meta-action ordering strategy (used to represent action rules and their corresponding triggers in a tree-like structure) along with the effects of meta-nodes. The method in [13], developed by Almardini et al., aims to find meta-actions by clustering patients using a graph-based model. This method is specifically designed to predict the treatment path that patients take from readmission to the end of their treatment. Kuang et al. in [14] proposed one of the main modules of HAMIS (a recommender system designed for 34 industrial services companies). Their module is responsible for discovering meta-actions from a massive set of textual comments obtained from customers during surveys about their satisfaction with the company’s provided services.

In the work developed by Touati et al. [15], action rules are evaluated by measuring the benefits of meta-actions using two measures, probability and reliability of performance.

Cui et al. used additive tree models (ATM) to extract actions [18] using a transductive method. ATMs are typically not easy to interpret when extracting actions, so the paper proposes a new framework to post-process such ATMs that is capable of proposing an actionable plan to modify each input with minimum cost to achieve the desired state. The main approach of this work is to formulate optimal action mining as an integer linear programming problem. In [19] Lu et al. proposed yet another approach to extract actions from ATMs. Here the optimal actionable planning (OAP) is targeted and defined for each ATM. Then a heuristic method is proposed to find the optimal solution from the state space graph.

In [4] action mining from social networks is tackled. An action suggests a set of optimal changes in the weights of edges. The proposed method utilizes a node classification technique based on label propagation to ultimately calculate the optimal changes in the edges of the input graph which turn the label of the intended node into a desired one. The problem is an optimization problem which is modeled using random walks and solved using stochastic gradient descent.

In [20] Li et al. refer to the concepts of both actionable knowledge and causality and assert causality as a principle in data mining. In this paper, actionability is declared an important property of knowledge. In order to carry out an action, the discovered knowledge has to imply causal relationships to be able to justify the occurrences inside the data under consideration. Having a set of simple data mining tools can be beneficial for discovering causal relationships. Although these tools do not guarantee that their findings are truly causal, they are capable of finding candidates while excluding non-causal features. In [3] Shamsinejad et al. try to solve the problem that the correlations found do not necessarily imply causation. To address this, they proposed ICE-CREAM, in which actions are extracted using inductive causation introduced by Pearl and Verma in [2]. They benefit from a causal structure obtained from data and use causal inference to extract actions based on causality.

The common characteristic of these works is that they are mainly focused on either business or health domains. The works focused on social network data do not incorporate causal dependencies; hence, they fail to carry out causal inference. In contrast, our method focuses on social network data, incorporates causal dependencies in the network and performs causal inference to achieve higher-quality actions.

3 Terminology and preliminary

The concepts and definitions of actionable knowledge discovery vary across sources. We therefore redefine the terms used in this paper for clarification and better understanding. Table 1 briefly describes the notations and symbols used throughout the paper.

Table 1
Notations and definitions

Notation	Definition
G = (V, E)	a network
F	feature matrix of the input network
f_o	feature vector of object o
A_i(o)	the ith feature of o
L	target feature
l	desired state
O^-	the set of objects in an undesirable state
Γ	an action set
p_l	the profit associated with l
C()	cost function
NetProfit(Γ, o)	the net profit of an action set
Pr_o	the probability of object o being positively labeled
Pr_oΓ	the probability of object o being positively labeled after applying action set Γ

Feature Vector: Assuming an object o with m features, its characteristics are defined as the vector f_o = [A₁ (o) , A₂ (o) , …, A_m (o)] in which A_i is the ith feature of o and A_i (o) is a value from its domain, i.e. A_i (o) ∈ Dom (A_i).

Feature Matrix: Assuming there are n objects, a feature matrix is an n × m matrix in which each row corresponds to the feature vector of an object. An entry F_ij corresponds to the jth feature value of the object in the ith row, where 1 ≤ i ≤ n and 1 ≤ j ≤ m.

Social Network: We model a social network as the tuple <G, F>. G = (V, E) is the underlying graph where V is the set of n nodes, each representing an actor inside the network, and E is the set of edges showing the links between actors. As an example, the edge (u, v) shows some interaction between u and v. Finally, F is the feature matrix described earlier.

Class Label: Assuming A is the set of all features, one feature L and a specific value l ∈ Dom (L) is chosen to determine the state of the object according to the business requirements. Given an object o, its state is defined as desirable if L (o) = l and undesirable otherwise. The set of objects in an undesirable state is denoted as O^- and the chosen feature that determines the state of the object is called the target feature.

Action: For an intended object o ∈ O^-, an action is formally denoted by α = (A, a → a′) where A ≠ L, a is the current value of A for o and a′ is the new value suggested by α.

Action Set: The result of an action mining process is a set of actions denoted by Γ = {α₁, …, α_k} where k ≤ m and no two actions change the same feature. Given an object o ∈ O^-, an action set Γ will turn L (o) into l if all the actions inside Γ are applied. The state of o after the application of the action set Γ is denoted by o_Γ.

Cost: A cost is associated with each feature, namely the cost of changing the feature value. A given action α therefore has a cost C (α), since it suggests a feature value change. Usually, the cost of an action set Γ is calculated as the sum of the costs of its individual actions, i.e. C (Γ) = ∑_{α∈Γ} C (α).

Profit: Profit is a business gain achieved by the application of action set which results in a desirable state. A profit value p_l > 0, associated with l, is defined by the domain expert.

Net Profit: the net profit of an action set is defined as:

$NetProfit (Γ, o) = \begin{cases} p_{l} - C (Γ), & L (o) \neq l,\ L (o_{Γ}) = l \\ - C (Γ), & \text{otherwise} \end{cases}$ (1)
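The cost and net profit definitions can be sketched in a few lines of Python. This is only an illustration: the `Action` class, the function names and the linear cost model (a hypothetical per-feature unit cost multiplied by the size of the change) are assumptions made for the example and are not prescribed by the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    feature: str   # feature A (never the target feature L)
    old: float     # current value a
    new: float     # suggested value a'


def action_cost(action, unit_cost):
    # Assumed cost model: unit cost of the feature times the size of the change.
    return unit_cost[action.feature] * abs(action.new - action.old)


def action_set_cost(action_set, unit_cost):
    # C(Γ) is the sum of the costs of the individual actions.
    return sum(action_cost(a, unit_cost) for a in action_set)


def net_profit(action_set, unit_cost, profit, was_desirable, is_desirable_after):
    # Formula (1): the profit p_l is earned only if the object moves from an
    # undesirable to the desirable state; the cost is always paid.
    cost = action_set_cost(action_set, unit_cost)
    if not was_desirable and is_desirable_after:
        return profit - cost
    return -cost
```

For instance, two actions that raise `degree` from 3 to 5 (unit cost 1) and `egonet_degree` from 4 to 5 (unit cost 2) cost 4 in total, so with p_l = 10 the net profit is 6 if the object reaches the desired state and -4 otherwise.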

Effectiveness: we introduce this measure to quantify the ability of an action set to boost the likelihood that the intended object is in a desirable state. It compares the probability of an intended object being positively labeled before (Pr_o) and after (Pr_oΓ) applying an action set. Given o ∈ O^-, the effectiveness of an action set is defined as:

$Effectiveness (Γ, o) = \Pr_{o Γ} - \Pr_{o}$ (2)

Structural Feature: the structural feature is the type of feature generated by solely investigating the structure of the network data, i.e. the nodes and links in the graph. As an example, given a node v in the social network modeled as graph G, its degree, ego-net degree (the number of edges connecting the neighbors of v) and ego-net out-degree (the number of edges connecting the neighbors of v to the other nodes in the graph) [6] are structural features.
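As an illustration, the three example features can be computed directly from an adjacency structure. The sketch below follows the wording of the definitions above; it assumes a simple undirected graph stored as a dictionary of neighbor sets, and the function name is hypothetical.

```python
def local_features(adj, v):
    """Return (degree, ego-net degree, ego-net out-degree) of node v.

    adj maps each node to the set of its neighbors (simple undirected graph,
    no self-loops; this representation is an assumption of the sketch).
    """
    neighbors = adj[v]
    degree = len(neighbors)
    # Edges whose two endpoints are both neighbors of v. Each such edge is
    # seen from both ends, hence the division by two.
    egonet_degree = sum(1 for a in neighbors for b in adj[a] if b in neighbors) // 2
    # Edges from a neighbor of v to a node outside v's ego-net.
    ego = neighbors | {v}
    egonet_out_degree = sum(1 for a in neighbors for b in adj[a] if b not in ego)
    return degree, egonet_degree, egonet_out_degree


# Toy graph with edges 1-2, 1-3, 2-3, 3-4, 4-5.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(local_features(toy, 1))
```

For node 1 of the toy graph, the neighbors are 2 and 3, the edge 2-3 connects them, and the edge 3-4 leaves the ego-net.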

Causal Structure: Discovering causal structures from data is a highly practical field in data mining [20]. This concept rests on the idea that correlations inside the data do not necessarily imply causation and that statistical dependencies cannot simply be given a causal meaning. Finding such structures is very complex and often nearly impossible, but there are methods such as [21] and [22] in which certain assumptions make it possible to find such structures approximately. In this paper, we have been inspired by the method introduced by Pearl in [23] called Inductive Causation (IC). According to his work, finding causal structures among system variables is possible provided that, for each target variable, the values of at least two other dependent variables are known. The output of this method is similar to a Bayesian network, except that the links inside this network show causal relationships between variables and the direction of each link determines the cause and the effect.

Causal Network (CN): a causal network is a partially directed acyclic graph where each input feature is mapped to a node in the network. In such a network a link denoted by a → b means that a is the cause of b. In other words, a is upstream to b and b is downstream from a. Fig. 1 is an example of a causal network where Wet is the target feature and Season, Sprinkler and Rain are cause features.

Fig. 1

Sample causal network.

In our method, a causal network is extracted from a set of structural features and we use them to extract actions. We will dive into the details of this method in the next section.

4 The proposed method

We intend to extract actions using the causal network extracted from a social network graph. To this end, we are going to take the following steps:

Extract the structural features from the social network represented as a graph.

Extract a causal network based on the structural features.

The causal network will then be used in the action extraction process.

4.1 Feature extraction

To capture the information which lies within the dependencies inside the network data, we extract structural features and later feed them into the action mining process. To extract such features we utilize ReFeX [6]. The output of this method is a set of features built from the structure of the network. ReFeX calculates three simple measures for each node: degree, ego-net degree and ego-net out-degree. These features are called local, meaning they can be easily derived by probing only the neighborhood of each node [6]. Furthermore, to generate more features that further describe the nodes of the graph, ReFeX performs an iterative process. It first calculates the three aforementioned measures for each node v inside the graph G, which yields a feature vector of size three for each node. Then, for each node v and each measure m, it defines a new measure m′ which is the aggregate of m over the neighbors of v. These aggregated values are then added to the corresponding feature vectors. The feature generation process continues until the newly added features provide no new information. In other words, ReFeX keeps only the most informative features.
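The iterative aggregation step can be sketched as follows. This is a simplified illustration and not the ReFeX implementation: it uses sum and mean as example aggregates, runs for a fixed number of rounds, and omits the step that discards uninformative features. The function name and the data layout are assumptions.

```python
def aggregate_features(adj, local, rounds=2):
    """Extend local feature vectors with neighbor aggregates.

    adj:   dict node -> set of neighbors
    local: dict node -> list of local feature values (same length per node)
    In every round, each feature created in the previous round yields two
    new features: its sum and its mean over the node's neighbors.
    """
    features = {v: list(vals) for v, vals in local.items()}
    width = len(next(iter(features.values())))
    start = 0  # index of the first feature created in the previous round
    for _ in range(rounds):
        new_cols = {}
        for v in adj:
            agg = []
            for j in range(start, width):
                vals = [features[u][j] for u in adj[v]]
                total = sum(vals)
                agg.extend([total, total / len(vals) if vals else 0.0])
            new_cols[v] = agg
        # Append only after all nodes are processed, so that every aggregate
        # of this round is computed from the previous round's values.
        for v in adj:
            features[v].extend(new_cols[v])
        start, width = width, width + 2 * (width - start)
    return features
```

With three local features, one round yields 9 features per node and two rounds yield 21, which shows why a pruning criterion is needed to keep the feature set small.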

There are other novel methods in this category, such as [24–27], which also extract structural features. These methods produce informative structural features by either matrix factorization or random walks. However, it is unclear what exactly their features and their values represent. In other words, there is no set of predefined features, and since an action suggests a change in the value of some features, we clearly need to extract a predefined set of structural features. In particular, real-world business requirements call for interpretable changes in feature values that make sense in the related domain. In this regard, we chose ReFeX and extended it by retaining additional descriptions of the structural features (in the feature vectors) in order to provide a set of trackable features, that is, features for which we can determine how the changes suggested by an action set should be applied inside the network.

To address this, we extend ReFeX (and name the extension ReFeX2) by keeping additional information for each feature during the feature extraction phase, which subsequently helps to determine the changes exactly. We keep the normalization factor, the iteration number and the feature type to meet the requirements of our method. With these elements we can recover the initial value and the type of each feature, its neighborhood and the edges to be changed:

Normalization factor: a number used to normalize the value of features during the feature extraction phase. There are a variety of functions to do so. Having this number and the function used for normalization enables us to recover the original value from its normalized one.

Iteration number: this number identifies the neighborhood to which the changes should be applied. Given an object o, the features generated in the 1st iteration describe the neighbors of o, features from the 2nd iteration describe the distance-2 neighbors of o, and so on. In general, the iteration number k corresponds to the neighborhood distance, measured from the object for which we suggested an action, at which the changes need to be applied.

Feature type: finally, we need to determine which edges are to be modified according to the changes suggested by the action. We need to know whether the change relates to the degree, ego-net degree or ego-net out-degree in order to know exactly where the edges have to change. To do this, we keep the name of each feature generated in the first phase and forward this name to the features extracted in the aggregation phase. This way, when the feature extraction process finishes, each feature has an indicator attached to it showing its type, i.e. degree, ego-net degree or ego-net out-degree.
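One possible way to carry these three elements alongside each feature is sketched below. The class and function names are hypothetical, and the sketch assumes that normalization is a division by the stored factor, which is only one of the possible normalization functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureInfo:
    base_type: str      # "degree", "egonet_degree" or "egonet_out_degree"
    iteration: int      # 0 for a local feature, k after k aggregation rounds
    norm_factor: float  # divisor applied when the value was normalized

    def original_value(self, normalized):
        # Assumes value_normalized = value_original / norm_factor.
        return normalized * self.norm_factor


def nodes_at_distance(adj, source, k):
    """Nodes whose shortest-path distance from source is exactly k."""
    frontier, seen = {source}, {source}
    for _ in range(k):
        frontier = {u for v in frontier for u in adj[v]} - seen
        seen |= frontier
    return frontier
```

Given an action on a feature with `iteration = 2` and `base_type = "degree"`, `nodes_at_distance(adj, o, 2)` returns the nodes whose degree would have to change, and `original_value` translates the suggested normalized value back to an edge count.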

4.2 Constructing a causal network based on the structural features

As mentioned earlier, inductive causation (IC-Algorithm) [2] will be used to extract the causal structures. This method searches for V-structures inside the data, i.e. causal inference about a variable C can only be made when the values of at least two of its upstream variables, such as A and B, are known (Fig. 2).

Fig. 2

A and B are not independent when the value of C is known.

To find the causal network, the IC-Algorithm uses conditional independence relationships between features.
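The dependence pattern of a V-structure can be checked on a small worked example. The distribution below is an assumption chosen for illustration: A and B are independent fair coins and C = A or B, so that A → C ← B. The sketch enumerates the four equally likely outcomes and computes conditional probabilities exactly.

```python
from fractions import Fraction
from itertools import product

# The four equally likely worlds (a, b, c) with c = a OR b.
worlds = [(a, b, int(a or b)) for a, b in product((0, 1), repeat=2)]


def prob(event, given=lambda w: True):
    """Exact conditional probability of `event` given `given`."""
    matching = [w for w in worlds if given(w)]
    return Fraction(sum(1 for w in matching if event(w)), len(matching))


a_is_1 = lambda w: w[0] == 1

p_a = prob(a_is_1)                                                   # P(A=1)
p_a_given_b = prob(a_is_1, lambda w: w[1] == 1)                      # P(A=1 | B=1)
p_a_given_c = prob(a_is_1, lambda w: w[2] == 1)                      # P(A=1 | C=1)
p_a_given_bc = prob(a_is_1, lambda w: w[1] == 1 and w[2] == 1)       # P(A=1 | B=1, C=1)

print(p_a, p_a_given_b, p_a_given_c, p_a_given_bc)
```

Without knowledge of C, learning B does not change the probability of A (both are 1/2). Once C = 1 is known, the probability of A is 2/3, but it drops back to 1/2 when B = 1 is also known, so A and B become dependent given C.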

Given the structural features extracted by ReFeX, we can extract a causal network from them using the IC algorithm, which is the basis of many proposed methods. IC relies on conditional independencies in the input data to find cause and effect relationships; its main idea is to find the V-structures mentioned earlier.

The IC algorithm takes the following steps to find the causal dependencies. First, for every feature pair (a, b) in the input data, search for a set S_ab such that a and b are conditionally independent given the features in S_ab. Second, create an undirected graph G with an edge between a and b if and only if no such S_ab can be found. Third, for every pair of non-adjacent features a and b that have a common neighbor c, check whether c ∈ S_ab. If so, continue; otherwise, direct both edges towards c, i.e. a → c ← b, which forms a V-structure. Finally, direct the remaining undirected edges subject to two conditions: 1) the direction must not form a new V-structure and 2) the direction must not create a cycle inside the graph.
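The first three steps can be sketched as below. The sketch assumes that a conditional independence test is available as a callable `independent(a, b, cond)`, which is a hypothetical stand-in for a statistical test on the data, and it omits the final orientation of the remaining edges. The exhaustive search over conditioning sets makes the exponential cost of the procedure visible.

```python
from itertools import combinations


def ic_skeleton_and_colliders(variables, independent):
    """Return (undirected edges, directed collider edges).

    independent(a, b, cond) -> bool is an assumed conditional independence
    oracle; cond is a frozenset of conditioning variables.
    """
    sepset, edges = {}, set()
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        found = None
        # Search all conditioning sets, smallest first.
        for size in range(len(others) + 1):
            for cond in combinations(others, size):
                if independent(a, b, frozenset(cond)):
                    found = frozenset(cond)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((a, b)))      # no separating set: keep a - b
        else:
            sepset[frozenset((a, b))] = found

    directed = set()
    for a, b in combinations(variables, 2):
        pair = frozenset((a, b))
        if pair in edges:
            continue                          # only non-adjacent pairs qualify
        for c in variables:
            if c in pair:
                continue
            if (frozenset((a, c)) in edges and frozenset((b, c)) in edges
                    and c not in sepset[pair]):
                directed.add((a, c))          # a -> c
                directed.add((b, c))          # b -> c
    return edges, directed
```

With an oracle that reports A and B as independent only when C is not conditioned on, the function returns the edges A - C and B - C and orients them as A → C ← B. With an oracle that reports independence only given C (a chain), no edge is oriented.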

The idea of using V-structures is apparent in the above procedure. The output of IC is a causal network, which is used for inference in our method. However, because the complexity of the IC algorithm is exponential in the number of features, we apply a greedy approach, since we are only interested in finding a partial causal structure leading to the intended target feature. Therefore, we check different combinations of features and stop as soon as a causal network is found. The overall idea is illustrated by the following example.

Fig. 3 shows a simplified hypothetical Bayesian network built from a dataset with features S₁ = {A, B, C, D, E, F, G}, where G is selected as the target feature.

Hereon, the IC algorithm is applied to the Bayesian network to find the underlying causal network. As mentioned earlier, this method is interested in finding the V-Structures affiliated with the target feature (G in this example). The network shown by Fig. 4 is the only possible causal network according to Fig. 3 because only D and E are not independent if the value of G is known.

Fig. 3

A Bayesian network.

Fig. 4

The Causal Network inferred for the Bayesian network in Fig. 3.

According to the obtained causal network (depicted in Fig. 4), D and E are the nodes upstream of G, which gives the candidate feature set S₂ = {D, E, G}. Instead of incorporating S₁ in the action extraction process, only S₂ (where S₂ ⊂ S₁) is used, which leads to much less computational overhead.

4.3 Action Extraction from causal network

We are going to extract actions based on causal relationships between features where these relationships are represented inside a causal network.

In order to extract actions based on causal dependencies rather than correlations, a causal network is extracted and incorporated in the action mining process. To extract actions, we need two elements: the feature vectors and the underlying causal structure. We have adopted the method in [3] to extract actions, but the features we use are structural and the causal network is extracted from those features. In the causal network, each node represents a feature and one of them is chosen as the class label. We seek causal actions, so the class label has to be selected from the causal network. The nodes upstream to a target are called cause nodes. Finding such cause nodes therefore yields the set of candidate target features. Once a target feature is chosen, its upstream nodes are the candidate features in the action mining process, because only these nodes can causally affect the target feature. Any other feature is discarded in the action mining task. These candidate nodes are called action features only when they show up inside an action.
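Selecting the cause nodes of a target amounts to collecting the nodes from which the target can be reached along directed links. The sketch below assumes the causal network is given as a list of (cause, effect) pairs and treats every upstream node, direct or indirect, as a cause node; the function name is hypothetical. The edge list is an assumed reading of the sprinkler example, since the text only states which features are causes of Wet.

```python
def cause_nodes(directed_edges, target):
    """Return all nodes upstream of `target` in a directed causal network."""
    parents = {}
    for cause, effect in directed_edges:
        parents.setdefault(effect, set()).add(cause)
    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


# Assumed edges for the sprinkler example (not taken from the figure itself).
sprinkler_cn = [
    ("Season", "Sprinkler"),
    ("Season", "Rain"),
    ("Sprinkler", "Wet"),
    ("Rain", "Wet"),
]
print(sorted(cause_nodes(sprinkler_cn, "Wet")))
```

Features outside the returned set cannot causally affect the target and can be dropped before action extraction.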

Assuming we have extracted the causal network CN, we try to find the action set Γ for an intended object o ∈ O^-. The quality of an action set is evaluated using two measures, namely effectiveness (formula 2) and net profit (formula 1). For a given object o, the higher the effectiveness of the action set, the higher the probability of turning the object into the desirable state. It is noteworthy that the causal network is incorporated when calculating the effectiveness of an action set. The effectiveness of an action set Γ with respect to the causal network CN is as follows:

$Effectiveness (Γ, o) = (\Pr_{o Γ} \mid C^{'}) - (\Pr_{o} \mid C)$

where C is the set of cause nodes and C′ is the set of cause nodes that are not downstream from any action feature of Γ. In other words, Pr_o|C is calculated using the original values of the cause features and Pr_oΓ|C′ is calculated using the values of the cause features inside C′ suggested by Γ.

Ultimately, the goal of action extraction for an object o is to find an action set that maximizes the net profit. Each action of the form (A, a → a′), where A is a feature corresponding to a cause node, is a candidate for changing the value of the goal feature. If C is the set of cause nodes, the total number of candidate actions is ∑_{A∈C} (|Dom (A)| − 1) and the number of possible action sets is $\prod_{A \in C} | Dom (A) |$.

In order to extract action sets, two approaches have been proposed in the literature. One is to investigate all action sets one by one (exhaustive search or ES) [3]. ES calculates the net profit of each possible action set and chooses the best one. However, searching the state space exhaustively is clearly intractable for medium to large datasets. The other approach is to follow a greedy search (GS) [3].
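A short calculation makes the two counts concrete. Each cause feature can either keep its current value or move to one of the |Dom (A)| − 1 other values, which gives |Dom (A)| choices per feature; the product therefore also counts the empty action set in which nothing changes. The function name and the example domains below are hypothetical.

```python
from math import prod


def count_candidates(domains):
    """domains: dict mapping each cause feature to its collection of values."""
    n_actions = sum(len(values) - 1 for values in domains.values())
    n_action_sets = prod(len(values) for values in domains.values())
    return n_actions, n_action_sets


# Two cause features with 3 and 2 possible values.
print(count_candidates({"D": [0, 1, 2], "E": [0, 1]}))
```

For the example there are 2 + 1 = 3 candidate actions and 3 × 2 = 6 possible action sets. With ten cause features of five values each, the number of action sets already approaches ten million, which motivates the greedy alternative.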

Algorithm: SF-ICE-CREAM (T, O, C, pg, G, t, max)

Input: target attribute T, data objects O,

cost data C, profit pg,

Social Network Graph G,

unit by which threshold is increased t,

maximum iterations max,

Output: one action set for each object o ∈ O

1: F ← REFEX2(G, t, max) // a feature matrix

2: Model ← findBNfromSamples(F) // Bayesian model

3: CN ← model2graph(Model)

4: if CN.inDegree(T) > 1 then

5: ICE-CREAM(T, O, C, pg, CN)

6: else return NULL

The GS approach starts with an empty action set. Then, for each object o in an undesirable state (negative label), we select the candidate action that improves the net profit of the action set the most compared to the other candidate actions. The new action set is then compared to the previous one; if its net profit is higher, the new action is permanently added to the action set. This continues until no action can be added that improves the net profit of the action set or no action remains. The whole process of extracting actions is briefly illustrated in the algorithm above, which we name SF-ICE-CREAM. The algorithm uses the features generated by REFEX2, provided as a feature matrix F. Then, a Bayesian network is constructed from the features, from which the corresponding causal network is extracted using the IC algorithm (line 3). Finally, the greedy approach of the ICE-CREAM algorithm generates the action sets for the given set of input objects.
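The greedy loop for a single object can be sketched as follows. This is an illustration under assumptions rather than the ICE-CREAM implementation: candidate actions are (feature, new value) pairs, and `net_profit` is a hypothetical callable that scores a list of actions, standing in for the causal-network-based evaluation described above.

```python
def greedy_action_set(candidates, net_profit):
    """Greedily build an action set for one object.

    candidates: iterable of (feature, new_value) pairs
    net_profit: assumed callable mapping a list of actions to a number
    """
    chosen = []
    best = net_profit(chosen)
    remaining = list(candidates)
    while remaining:
        # An action set may change each feature at most once.
        used = {feature for feature, _ in chosen}
        options = [a for a in remaining if a[0] not in used]
        if not options:
            break
        top = max(options, key=lambda a: net_profit(chosen + [a]))
        top_value = net_profit(chosen + [top])
        if top_value <= best:
            break  # no remaining action improves the net profit
        chosen.append(top)
        best = top_value
        remaining.remove(top)
    return chosen, best
```

Each round evaluates every remaining candidate once, so the number of evaluations grows at most quadratically with the number of candidate actions, in contrast to the exponential number of action sets examined by exhaustive search. The price is that a combination of actions that only pays off jointly can be missed.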

5 Experimental results

In this section, we compare three methods. The first is our proposed SF-ICE-CREAM. The others are ICE-CREAM [3] and ICE-CREAM-EXT¹. The latter is proposed because in some cases ICE-CREAM fails to extract causal structures from non-structural features. Therefore, for the sake of better comparison and contextualization, we extended ICE-CREAM with a limited number of simple structural features. In particular, for a node x, we adopted the following structural features in the action extraction process to extend ICE-CREAM to ICE-CREAM-EXT:

The total number of edges adjacent to x.

The total number of edges between x and its positive neighbors.

The total number of edges between neighbors of x.

The total number of edges between positive neighbors of x.
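The four features listed above can be computed directly from an adjacency structure. The sketch below is illustrative only: it assumes an undirected simple graph stored as a dictionary of neighbor sets and a set of positively labeled nodes, and the helper names (`build_adjacency`, `simple_structural_features`) and the toy graph are hypothetical. A graph library such as NetworkX could be used in the same way.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def simple_structural_features(adj, positive, x):
    """Return the four simple structural features of node x."""
    neighbors = adj[x]
    pos_neighbors = neighbors & positive

    def edges_within(nodes):
        # Count edges whose two endpoints both lie in `nodes`.
        return sum(1 for u, v in combinations(nodes, 2) if v in adj[u])

    return {
        "edges_adjacent": len(neighbors),
        "edges_to_positive_neighbors": len(pos_neighbors),
        "edges_between_neighbors": edges_within(neighbors),
        "edges_between_positive_neighbors": edges_within(pos_neighbors),
    }

# Toy graph, for illustration only.
edges = [("x", "a"), ("x", "b"), ("x", "c"), ("a", "b"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)
features = simple_structural_features(adj, positive={"a", "b", "d"}, x="x")
print(features)
```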

The number of structural features is limited to this list in order to avoid complicating the method. These features are less complex and less comprehensive than the features extracted by ReFeX, which are used in SF-ICE-CREAM. Beyond contextualization, our experiments demonstrate that ICE-CREAM occasionally fails to extract a causal network from node features, and that in some cases even a predefined set of structural features does not lead to a causal network. In contrast, the experiments show that using ReFeX structural features enables SF-ICE-CREAM to extract a causal network in all cases.

Using structural features improves the results compared to the competing methods. It can also be noted that we are not bound to node features when extracting causal structures; structural features can be used as well.

We used a computer with 8 GB of RAM and a 1.6 GHz Core i7 processor. The methods were implemented in Python, using the Pomegranate, scikit-learn, and NetworkX packages for learning the causal network, causal inference, and traversing the network, respectively.

5.1 Data

Popular, publicly available friendship and co-authorship networks have been selected for the experiments. The details of the datasets are given in Table 2. In the friendship networks, nodes are users and edges indicate friendship relations. We consider the following friendship networks: Facebook, where labels are locales, and Google+, where labels are places. Due to memory limitations in our implementation, we divided the Google+ dataset into two smaller subgraphs. These friendship networks also come with node features, which are necessary for the comparison because the competing methods rely on this type of feature. In co-authorship networks, the nodes are authors and an edge exists between two authors if they have co-authored a paper. We consider the co-authorship network DBLP. To label the input data in a binary class format, objects whose label equals the target value are treated as positive and all others as negative.

Table 2
Information about datasets

Dataset	Number of node features	Number of edges	Number of nodes
Facebook	27	88234	4039
Google+ largest node set	6	54792	4938
Google+ largest edge set	6	1143015	4926
DBLP	11	8803	4112

Since DBLP does not explicitly include node features, we extracted them by exploring the textual description and the underlying citation graph attached by the provider of the dataset. For every node u of the network we generate the following features:

Number of papers u authored

Number of papers u authored in goal conference

Number of papers in which u is the first author

The time since u authored the last paper

Time since u last authored a paper in goal conference

Number of time slices in which u authored a paper

Number of Conferences/Journals in which u authored a paper

Number of Conferences/Journals in which u was the first author

Number of citations of u

Number of citations of u in the goal conference

Number of the papers cited by u

Number of the papers in the goal conference cited by u

5.2 Evaluations

To compare and evaluate the results of each method, we measure effectiveness, cost, net profit, time efficiency, generality, and coverage. The first three measures are defined in Section 3. The rest are defined as follows:

Time Efficiency of method:

For each o ∈ $O^{-}$, the running time from the beginning of feature extraction until the best action set is found is recorded. This measurement is used to compare the running times of the methods (in milliseconds).

Generality of action sets extracted by a method:

It is the ratio of the number of features included in the action set to the total number of features. The closer the value is to 1, the larger the proportion of features that has to be changed. The fewer changes an action set suggests, the more general the action is. This matters for some business requirements, which may prefer fewer feature changes. The generality of a given method m is defined as follows (assuming that the method extracts at least one action set for a given object): $G (m) = \frac{\text{average number of features in the action sets extracted by } m}{\text{total number of features}}$

The averaged value is used if the method is applied to a given set of objects.

Coverage of method:

For a given method m, coverage is the fraction of objects in an undesirable state for which the method can extract an action set with positive net profit. As mentioned before, our goal is to find an action set for a given object which not only increases the likelihood of being in a desirable situation but also maximizes the net profit. The introduced action mining methods guarantee one action set for each object o ∈ $O^{-}$, but naturally some action sets may result in profits smaller than or equal to zero due to their high costs. In these cases, we set the net profit to zero for better comparison. Given a set of objects $O^{-}$ in an undesirable state, the coverage of method m is as follows: $Coverage (m, O^{-}) = \frac{\text{number of action sets with } NetProfit > 0 \text{ extracted by } m}{| O^{-} |}$
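The generality and coverage measures defined above reduce to simple ratios. The following sketch shows one way to compute them; the function names and the input encoding (one list of changed features per extracted action set, one net profit per object in the undesirable state) are assumptions made for illustration, and the numbers are invented.

```python
def generality(action_sets, total_features):
    """Average number of features per action set, divided by the total number of features."""
    sizes = [len(action_set) for action_set in action_sets]
    return (sum(sizes) / len(sizes)) / total_features

def coverage(net_profits):
    """Fraction of objects whose extracted action set has positive net profit."""
    # Non-positive profits are set to zero, as described in the text.
    clipped = [max(p, 0.0) for p in net_profits]
    return sum(1 for p in clipped if p > 0) / len(clipped)

# Invented inputs, for illustration only.
action_sets = [["f1"], ["f1", "f4"], ["f2", "f3", "f7"]]
g = generality(action_sets, total_features=10)
c = coverage([0.4, -1.0, 0.0, 2.5])
print(g, c)
```

Under this definition a lower generality value means that fewer features have to be changed on average.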

5.3 Causal networks

The main contribution of our work is to show by evidence that it is possible to extract a causal network from structural features in social networks. In the initial step, given a network dataset, a set of structural features is extracted by ReFeX. The number of structural features extracted from each dataset and its corresponding graph is given in Table 3:

Table 3
Number of structural features of each dataset

Dataset	Number of structural features
Facebook	223
Google+ Largest Node Set	90
Google+ Largest Edge Set	123
DBLP	339

The next step is to extract the causal network from the structural features extracted by ReFeX2. Off-the-shelf libraries for finding causal structures are severely limited in the number of input variables they can handle: feeding more than about 20 features causes IC to halt. We have therefore empirically set the number of input features to 15. We use the greedy approach, since an exhaustive search over the space of possible combinations is nearly infeasible for the networks under experiment.

In our experiments, we initially select the first 15 features shown as nodes numbered 0–14 in the resulting causal networks (Fig. 5 through Fig. 8). All these features were extracted by ReFeX and the first 15 features of all the datasets were of the same type. More precisely, each feature is either a degree or an aggregation of the node degree in different areas of the social network. The following are the causal networks extracted from each dataset:

Fig. 5

The causal network of DBLP.

Fig. 6

The causal network of Facebook.

Fig. 7

Causal network of the Google+ Largest Nodes subgraph.

Fig. 8

Causal network of the Google+ Largest Edges subgraph.

In each of the induced causal networks, the 15th feature was selected as the target as it is at the end of most of the possible topological sorts of each causal network. Having these networks, SF-ICE-CREAM conducts action extraction. The results are presented in the following section.
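One way to read the selection rule "at the end of most topological sorts" is to enumerate the topological orderings of the causal network and count how often each node comes last. The sketch below does this for a tiny hypothetical DAG; the function names and the example edges are assumptions, and the enumeration is exponential in general, so it is meant only to illustrate the criterion on small graphs. Only nodes without outgoing edges can ever appear last.

```python
from collections import Counter

def all_topological_sorts(nodes, edges):
    """Yield every topological ordering of a small DAG by backtracking."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    order, placed = [], set()

    def backtrack():
        if len(order) == len(indegree):
            yield tuple(order)
            return
        for n in indegree:
            if n not in placed and indegree[n] == 0:
                # Place n, release its children, recurse, then undo.
                placed.add(n)
                order.append(n)
                for c in children[n]:
                    indegree[c] -= 1
                yield from backtrack()
                for c in children[n]:
                    indegree[c] += 1
                order.pop()
                placed.remove(n)

    yield from backtrack()

def pick_target(nodes, edges):
    """Return the node that ends the largest number of topological orderings."""
    last_counts = Counter(order[-1] for order in all_topological_sorts(nodes, edges))
    return last_counts.most_common(1)[0][0], last_counts

# Hypothetical causal network, for illustration only.
target, last_counts = pick_target([0, 1, 2, 3], [(0, 1), (1, 3), (0, 2)])
print(target, dict(last_counts))
```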

5.4 Results

For the experiments, we ran each method 100 times over 100 objects randomly selected from $O^{-}$. We also selected the target feature randomly from the end of the causal network, to avoid biased results and to better challenge the methods, since it is more reasonable to traverse towards the end of the network. In other words, a target feature is one that appears at the end of most topological sorts of the causal network. The values in the results tables are averaged over all input instances.

The following tables show that our method outperforms the other methods in effectiveness and net profit. For simplicity, SF-ICE-CREAM, ICE-CREAM-EXT, and ICE-CREAM are abbreviated as SFIC, ICX, and IC, respectively.

Table 4 shows the performance of the methods on the Facebook dataset. Our method outperforms the others in net profit and effectiveness. It is noteworthy that the effectiveness and the net profit of ICX are higher than those of IC, which shows that even simple structural features can improve performance.

Table 4
Comparison of methods in the Facebook dataset

	SFIC	ICX	IC
NetProfit	0.015	0.0009	0.0004
Effectiveness	0.03	0.01	0.002
Cost	15.43	10.56	2.08
Time	0.04	0.03	0.10
Coverage	0.97	1.0	0.15
Generality	0.07	0.25	0.01

Since IC fails to extract a causal network from the Google+ dataset, only SFIC and ICX are compared. Our method performs better here as well (Table 5).

Table 5

Comparison of SFIC and ICX in Google+ dataset

	SFIC (Google+ Largest Edges subgraph)	SFIC (Google+ Largest Nodes subgraph)	ICX (Google+ Largest Edges subgraph)	ICX (Google+ Largest Nodes subgraph)
NetProfit	0.05	0.02	0.001	0.01
Effectiveness	0.14	0.04	0.009	0.02
Cost	90.71	23.22	8.03	5.53
Time	0.10	0.03	0.02	0.02
Coverage	0.56	0.44	0.99	0.99
Generality	0.15	0.05	0.24	0.25

In the co-authorship network, ICX fails to extract a causal network even with the simple structural features. Only SFIC runs successfully on this dataset. Table 6 shows the results of SFIC on the DBLP dataset.

Table 6

SFIC results in DBLP dataset

	SFIC
NetProfit	0.01
Effectiveness	0.02
Cost	13.09
Time	0.20
Coverage	0.98
Generality	0.07

According to the results, SFIC achieves a considerably higher NetProfit than the other methods. The ultimate goal of all three methods is to obtain the maximum possible net profit, and the proposed method achieves it. As formally defined earlier, NetProfit is tightly related to effectiveness and cost. In most cases, the NetProfit of SFIC is high in spite of its high cost: although the cost of SFIC is much higher, its markedly higher effectiveness offsets that cost, which results in a better NetProfit.

The generality of SFIC is better than that of the competitors on average, i.e. the actions suggested by SFIC are more general than the ones suggested by the other methods. In fact, the number of features to be tweaked in order to reach a desirable situation is smaller than for the others (IC does not involve structural features at all). This is attractive for business owners, because they prefer to tweak as few features as possible while still gaining a higher profit.

The drawback of our method shows when coverage is taken into account: SFIC is not guaranteed to find a profitable action set for every input object, yet it produces comparable results.

The running times of the methods are in the same range, since there are no significant differences between them on average.

6 Discussion and future works

We proposed the idea of using the underlying causal network inside social network data and utilized it for action mining. We discussed how structural features can be regarded as a principle when analyzing such data. We also targeted the problem of neglecting the structural features when working on network data and we reintroduced one of the methods addressing this issue. It could be seen that using such features can greatly enhance the quality of action sets compared to when no features derived from graph structure are used. We introduced a hybrid method called SF-ICE-CREAM which is a combination of two methods, one in the field of structural feature extraction and the other for extracting actions with respect to the causal network.

Even though our method clearly improves on its base methods, it needs improvements itself. Resolving the uncertainty of automatically induced causal networks is a bottleneck in this method: the induction is time-consuming, has high computational complexity, and relies on many assumptions that may degrade the truthfulness of the discovered causal dependencies and therefore reduce the reliability of the extracted action sets. In addition, our method assumes that there are no missing or noisy data and that all variables are discrete. Another drawback of the method is that its computational complexity grows as the number of features increases. In this case, we need to limit the number of input variables, which means that not all of the features can be included in the process. A framework specifically designed to preprocess the input social data could be beneficial.

The massive amount of generated social data is also a challenge, since it prevents our method from running on real-world massive social graphs. A distributed version of the proposed method could increase its applicability to real-world problems.

Finding causal networks is itself challenging. In our method, one of the main steps is to find the causal network from the input data. To this end, a handful of libraries and methods have been introduced, but none of them is able to include all the features, since their complexity increases exponentially as the number of inputs grows. This is also a drawback, because it limits the ability of the method to solve real-world problems with a large number of features.

Footnotes

ICE-CREAM Extended.

References

1. Pearl, “A theory of inferred causation,” Causality, pp. 41–64.

2. Pearl and Verma, “A theory of inferred causation,” Studies in Logic and the Foundations of Mathematics 134 (1995), 789–911.

3. Shamsinejad, Saraee and Blockeel, “Causality-based Cost-effective Action Mining,” Intelligent Data Analysis 17(6) (2013), 1075–1091.

4. Kalanat and Khanjari, “Action extraction from social networks,” Journal of Intelligent Information Systems, 2019.

5. Kalanat and Khanjari, “Extracting Actionable Knowledge from Social Networks with Node Attributes,” Expert Systems with Applications, vol. 3, 2019.

6. Henderson, Gallagher, Li, Akoglu, Eliassi-Rad, Tong and Faloutsos, “It’s who you know: graph mining using recursive structural features,” KDD, 2011.

7. Cao, Zhao, Zhang, et al., “Flexible Frameworks for Actionable Knowledge Discovery,” IEEE Transactions on Knowledge and Data Engineering 22 (2010), 1299–1312.

8. Tzacheva et al., “MR-Apriori count distribution algorithm for parallel Action Rules discovery,” In Knowledge Engineering and Applications (ICKEA), IEEE International Conference, 2016.

9. …, Mao, Zeng and Zhao, “Mining actionable behavioral rules,” Decision Support Systems 54 (2012), 142–152.

10. …, Jian, Zhenpeng and Yuan, “Mining Actionable Behavioral Rules Based on Decision Tree Classifier,” Semantics, Knowledge and Grids (SKG), pp. 139–143, 2017.

11. Ranganathan, Allen, Arunkumar and Angelina, “Actionable pattern discovery for Sentiment Analysis on Twitter Data in clustered environment,” Journal of Intelligent & Fuzzy Systems, 2018.

12. Kuang et al., “Personalized meta-action mining for NPS improvement,” In International Symposium on Methodologies for Intelligent Systems, pp. 79–87, 2015.

13. Almardini et al., “Reduction of readmissions to hospitals based on actionable knowledge discovery and personalization,” In Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery, pp. 39–55, 2015.

14. Kuang and W., “In Search for Best Meta Actions to Boost Businesses Revenue,” In Flexible Query Answering Systems, pp. 431–443, 2016.

15. Touati et al., “Meta-actions as a tool for action rules evaluation,” Feature Selection for Data and Pattern Recognition, pp. 177–197, 2015.

16. Pothirattanachaikul, Takehiro, Sumio, Akira and Katsumi, “Mining Alternative Actions from Community Q&A Corpus,” Journal of Information Processing, pp. 427–438, 2018.

17. Subramani and Manjula, “Extracting Actionable Knowledge from Domestic Violence Discourses on Social Media,” arXiv preprint, 2018.

18. Cui et al., “Optimal action extraction for random forests and boosted trees,” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 179–188, 2015.

19. …, Zhicheng, Yixin and Xiaoping, “Extracting optimal actionable plans from additive tree models,” Frontiers of Computer Science 1 (2017), 160–173.

20. “Beyond Understanding and Prediction: Data Mining for Action,” In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 1361–1361, 2017.

21. Spirtes, Glymour and Scheines, Causation, prediction, and search, MIT Press.

22. Spirtes and Glymour, “An algorithm for fast recovery of sparse causal graphs,” Social Science Computer Review 9(1) (1991), 62–72.

23. Pearl, Causality: models, reasoning, and inference, Cambridge University Press, 2018.

24. Aditya and Leskovec, “node2vec: Scalable feature learning for networks,” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016.

25. Tang et al., “LINE: Large-scale Information Network Embedding,” WWW, 2015.

26. Tang and Liu, “Leveraging social media networks for classification,” Data Mining and Knowledge Discovery 23(3) (2011), 447–478.

27. Perozzi et al., “DeepWalk: Online learning of social representations,” KDD, 2014.
