Abstract
In this paper, a decision support tool is developed in order to assist senior managers to choose a dynamic community detection algorithm. The main objective behind this novel approach is to improve the search strategy of algorithms to decrease the search time of the best fitting one according to the study case. The suggested study ranks existing dynamic community detection algorithms in order to find the most appropriate one according to some preferences. For that purpose, we need a multi-criteria ranking method such as PROMETHEE II. This latter allows ranking alternatives from the best to the least according to some criteria extracted from the analysis of a provided questionnaire on dynamic community detection algorithms. The process of our approach is as follows: first, the questionnaire is provided to some experts in community detection domain. Second, answers are collected and analyzed in order to extract criteria. Third, a ranking of the algorithms with PROMETHEE II is operated according some extracted criteria and user preferences. Finally, the first ranked algorithm by PROMETHEE II is provided to the user in order to detect the communities.
Introduction
A social network is a set of actors (individuals, groups or organizations) that are connected by social interactions. These networks tend to be dynamic and scalable; nodes can join and leave the network over time. One feature of these networks is that they behave locally as a clique i.e. a group of nodes are strongly connected to each other and weakly connected to the rest of the network; this is how a community is generally defined. The community detection in dynamic networks has become a widely discussed topic by many researchers over the world. These latter have proposed different approaches they applied on real networks [13].
Many companies that need a community detection algorithm for professional use find it hard to choose one among many different existing algorithms that best suits their needs, this inspired us to provide them with a novel approach in order to assist these companies with a decision support process that is based mainly on the preferences of experts or senior managers working for these companies.
The main objective of our study is to develop a simple and easy to use decision support tool that aims to assist companies that want to detect communities within their network for business use. It allows an enterprise manager to choose the most fitting algorithm for dynamic community detection according to his preferences.
The major steps of our approach are as follows: First, a questionnaire on dynamic community detection algorithms is provided to some experts in the field of community detection in order to extract criteria. Next, algorithms are ranked with PROMETHEE II according to the extracted criteria; we also take into consideration user preferences. Finally, we provide to the user the first ranked algorithm in order to detect the communities.
Given the diversity and the number of community detection algorithms that exist in the literature, it becomes necessary to assist a company that wishes to use one of these algorithms to manage its professional enterprise network.
Our approach presents the following contributions:
A simple and effective decision support tool that facilitates the search and selection of a dynamic community detection algorithm. This tool is based on a set of knowledge that is related to the preferences of a decision-maker or a manager of the professional business network. A ranking of the most known algorithms that are used for dynamic and/or static community detection, based on the PROMETHE II method and according to several criteria such as operations on communities, scalability and the size of the network. An approach that allows decision makers to test both the best response (according to their preferences) that is provided by this tool (best algorithm) and the worst response; this will give more confidence to the decision-maker. Our tool is very simple and easy-to-use thanks to the graphical interfaces it offers throughout its use. It saves research time and provides several reliable answers.
The remainder of this paper is organized as follows: the second section presents a literature review on community detection and PROMETHEE II. The third section describes our proposed approach and some relevant results. The last section gives our conclusion and some future research directions.
This section is composed of two parts namely: community detection algorithms with their use in professional networks and multi-criteria methods.
Dynamic community detection
Below, we shade light to some definitions which are repeated in this paper.
A community is generally defined as a dense group of nodes, i.e. nodes inside a community are strongly connected to each other, and at the same time they have little connection to nodes outside the community. The community detection problem is thus formalized in the search for a partition of a graph in dense groups partially connected to each other [16]. There are definitions of communities where a node can belong to various communities. Such communities are called overlapping communities.
To evaluate the quality of a partition, a so-called quality function that gives a score to a partition is generally used. Then, it remains to maximize this function to find the best score for a given graph. There are several quality functions; the most used one is modularity that is defined by Newman and Girvan [7]. The optimization of this value leads theoretically to the best possible grouping of the nodes of a given network. Many of static algorithms are based on the well-known Louvain Algorithm, which is used in weighted graphs; it has been widely used because of its rapid convergence properties, high modularity and hierarchical partitioning [17]. Despite these advantages, it faces certain problems such as the limit of resolution, which restricts the ability to detect small communities even if they are well defined. Also, the order in which the nodes are treated has an influence on the graph partitioning [13].
A growing interest has been paid to the dynamic aspect of temporal networks, evolving and time-varying graphs, the number of proposed approaches for the dynamic community detection is increasing to date. Many definitions have been proposed for what a dynamic community is, some consider it as a succession of static communities, and others define it as a community that evolves over time in different ways. However there are still new questions in this regard and one of the most relevant is community operations [16].
Dynamic community detection is defined as the process of finding relevant communities in a network that changes over time. A dynamic community can evolve in different ways, adding a node corresponds to the expansion of a community, removing a node corresponds to the operation of contraction. Two communities that become similar over time will be merged into one, and a community can be split into two new and smaller communities [3]. And since we are talking about dynamic networks that evolve over time, new communities may appear, and other old communities may disappear, which corresponds respectively to the birth and death operations of communities.
In [3], Rossetti and Cazabet proposed a categorization of proposed approaches for dynamic community discovery. They presented three main classes of approaches namely Instant Optimal Community discovery, Temporal Trade-Off Community discovery and Cross-Time community discovery, these latter are described below:
In the first category, methods are based on applying existing static community algorithms on dynamic networks, which is considered as a major advantage. Clusters at
While in the second category (Temporal Trade-Off Community discovery), algorithms take into consideration the communities that are discovered in the step
Finally, Algorithms that are presented in the third category (Cross-Time community discovery), take into account all the network snapshots simultaneously (past and future states of the network), and run the community detection process in a single shot. The problem of instability is overcome, but these approaches suffer from some drawbacks such as the absence of merge and split operations, and have a fixed number of communities, they also cannot manage real-time community detection. Ben Jdidia et al. [10] created a single network from different snapshots by linking together similar nodes appearing at different snapshots. They also created links between nodes connected to at least one common neighbor in two consecutive snapshots. In [16], Instead of detecting communities on each time step, authors detected a unique decomposition in communities that is relevant for every time step during a given period called the time window. Matias and Miele [2] proposed to use a Stochastic Block Model to find clusters in temporal networks. They suggested an adapted Expectation-Maximization to find the belonging of nodes that better fit the observed network.
After the study of all related works cited above, the following remarks are made:
No study directly treats the addressed problem that we posed in the introduction.
The Works presented in Cazabet and Amblard [14] and Rossetti et al. [4] were very interesting for our research ongoing: We were simply inspired by the approaches taken from these works, which defined a list of rules that determine how network changes lead to the updating of communities. These works allow most operations on communities (birth, death, expansion, contraction, fusion and division); they can also be executed on large scale networks.
We also chose in our study the two methods ALPA and GLmetric where ALPA [5], is an adaptive label propagation algorithm that was proposed by Han et al. in order to discover and monitor communities in dynamic networks. Sami and Hamzeh [8] proposed GLmetric which is a two-phase approach; it explores the global information in the first phase and then exploits the local information in the second phase to discover communities more accurately.
Multi-criteria methods
In a situation where multiple criteria are involved confusion can arise if a logical, well-structured decision-making process is not followed.
Multi-Criteria Decision Aiding (MCDA) is a decision-making tool that is developed for complex problems. It deals with multiple conflicting criteria, establishing scientific bases to elaborate recommendations according to the needs of each decision maker. It aims to compare different actions or solutions according to multiple criteria and policies.
Multi-criteria methods can be divided into three categories depending on how judgments are aggregated [1]:
Complete aggregation (top-down approach), where we seek to aggregate the
Partial Aggregation (bottom-up approach). In this category, we are looking to compare potential actions or rankings to each other and to establish between these elements outranking relationships. Partial aggregation allows respecting the incomparability. Principal methods: ELECTRE, PROMETHEE.
Local aggregation, where we look in the first place, for a starting solution. Thereafter, an iterative search is conducted to find a better solution. Example of methods using local aggregation: Stem (Pop), Uta interactive.
The PROMETHEE method (Preference Ranking Organization Method for Enrichment Evaluations) was developed by Jean-Pierre Brans in 1982. It is based on pair-wise comparison of the alternatives. PROMETHEE differs from other partial aggregation methods in the fact that it constructs a valued outranking relation reflecting preference intensity. It helps decision makers find the alternative that best suits their goal and their understanding of the problem [6].
There are two most known PROMETHEE methods applied for ranking problems: PROMETHEE I which provides a partial ranking relations, and PROMETHEE II which yields a total ranking of alternatives from the best to the worst. For this reason we chose in our study to use PROMETHEE II.
The PROMETHEE II method performs the analysis through a comparison of alternatives per pair for each criterion, to obtain the preference relations between two alternatives [11]. The conditions for applying the PROMETHEE II method, assume that for each criterion is assigned a weight and a preference function that represents the type of criterion. It is denoted
There are several algorithms that are proposed for dynamic community detection, but not all of them provide satisfying results. The main goal of our study is to find the most suitable algorithm for community detection in dynamic social networks among these algorithms and suggest it to the user. We describe in the following our suggested approach.
Proposed approach
Our approach is designed for companies that use Web 2.0 and wish to identify communities within their organizations. These communities can be used to establish statistics, to make an inventory, or to identify the type of clients to provide the appropriate services. For that, the acquisition of preferences according to the field of application will be very useful. As there is a large number of proposed community detection algorithms on the web, a selection of these latter is necessary. This filtering makes it possible to have the algorithm that best suits the case study.
Among the preference-based multi-criteria methods, we have chosen to use PROMETHEE II since it provides a total outranking of alternatives (in our case the algorithms) from the best to the least good, this allows the user to switch to the next ranked alternative if the first one is not satisfied.
More precisely, our approach consists in ranking some known dynamic community detection algorithms using a preference based multi-criteria method, we have chosen PROMETHEE II which provides a total ranking of actions according to selected criteria. A general overview of the major steps that we followed in this approach is given in Fig. 1.
In order to better understand how our decision support tool works, we propose to follow three sequential steps; each one is illustrated in respectively Algorithms 1–3.
A general architecture of our approach.
Algorithm 1 starts the process by sending the questionnaire to all experts from the initiator (which is the admin profile). After filling all the questions (lines 2–4) by the experts, the initiator analyzes the obtained answers and gets two scenarios: an ideal case where answers are similar, and a conflict case where answers are different (lines 5–8). In the ideal case (lines 9–12), we extract criteria from the answers in order to fill the evaluation matrix (which is the input of PROMETHEE II). Then we execute the PROMETHEE II method which ranks the algorithms from the best to the least appropriate and finally recommend the first ranked one to the user. Algorithm 1 corresponds to the architecture presented in Fig. 1.
Algorithm 2 describes the ranking process with PROMETHEE II. First, we calculate the intensity of preferences
This algorithm presents the negotiation process in the proposed approach. In case of a conflict in answering the questionnaire, the initiator sends a proposal
Dynamic community detection questionnaire.
In order to construct the evaluation matrix which is used as the input of PROMETHEE II, we consider for our study the following algorithms for the simple reason that the results they provide are very relevant: iLCD [14], FacetNet [18], ALPA [5], Tiles [4] and GLmetric [8]. These latter serve as “alternatives” in the evaluation matrix. a questionnaire is used in order to extract criteria about these algorithms.
Extracting criteria method
In this section, we execute PROMETHEE II in order to classify the selected algorithms.
In order to extract criteria to use in classifying algorithms, we propose a questionnaire that contains some questions about the algorithms chosen in our study. This questionnaire is designed to collect information about the most known community detection algorithms in order to extract criteria to use in PROMETHEE II to classify these algorithms.
Example of answers of experts E1 and E2.
We distinguish two profiles, which are the Expert profile and the Admin profile, each profile has a specific role in our questionnaire.
The role of the Expert profile is to answer (fully or partly) the questionnaire. We defined the experts as the researchers in community detection domain, e.g. someone who proposed or contributed in one of the existing algorithms, since they have knowledge on community detection algorithms enough to answer our questionnaire. For example: Rémy Cazabet, is a researcher in community detection domain who proposed the iLCD algorithm, he can answer questions about his own algorithm (iLCD) and about the other algorithms (Tiles, FacetNet, ALPA and GLmetric).
The Admin profile is the one who starts the decision support process, consults, collects and analyzes the responses of the experts. At the first session of the process, the admin is at a cold start state, he starts the process, analyzes the responses of the experts to extract criteria in order to execute PROMETHEE II and proceed with the support process. As the process is repeated, he obtains knowledge and a history of the chosen algorithms at previous sessions.
To test our approach, we consider a sample implying four expert profiles (E1, E2, E3 and E4) to answer our questionnaire. A part of this latter is illustrated in Fig. 2.
A conflict example.
After collecting all the responses, we analyze each one in order to have a clear view on the criteria to be extracted for use in PROMETHEE II. In our study, two cases are considered namely: an ideal case and a conflict case, the process of both cases will be described in the following.
In the ideal case, the experts answers are converging i.e. answers are similar to each other for all experts. This will make it easier to extract the most relevant criteria in choosing a community detection algorithm. Figure 3 shows an example of the experts’ responses to our questionnaire. We can see that there is a resemblance between the expert E1 and E2’s responses about the proposed algorithms.
As for the conflict case, it occurs when human experts don’t agree on an answer of a question in the questionnaire. In order to resolve this conflict, we use the Contra Net Protocol (CNP) as a negotiation mechanism, which is a dynamic and easy to implement algorithm. We have selected some scenarios of negotiation between experts.
Contract Net Protocol, proposed originally by Smith (1980) is a high-level protocol for communication between nodes of a distributed problem solver as a classical negotiation strategy. It facilitates distributed control of the execution of cooperative tasks with efficient communication between nodes [15]. According to Smith, bargaining has four important components:
It is a local process that does not involve centralized control. The exchange of information takes place in both directions. Each party in the negotiation evaluates information from its own perspective. The final agreement is made by mutual selection.
The negotiation protocol is triggered when a conflict occurs between experts. We supposed that there’s a conflict in the question about “fusion and division” in the section of “iLCD” as it is shown in Fig. 4.
The first scenario represents a failing negotiation case. The initiator sends to the experts E1, E2, E3 and E4 the first proposal, which is a proposition of an answer to the question that led to a conflict. All the experts reject the initiator’s proposal. Thus, the negotiation failed and the question will be removed from the questionnaire. This scenario is illustrated in Fig. 5.
Scenario 1 of negotiation.
The second scenario represents a successful negotiation case. The initiator sends to the experts E1, E2, E3 and E4 the first proposal. Each expert proposes to modify the proposal and sends the modification to the initiator. The initiator realized that the suggested modifications coming from all participants are equivalent so, after an evaluation, he notes that the final proposal is
Scenario 2 of negotiation.
After analyzing the experts’ answers, four criteria are extracted. The first one is the basic operations on communities (birth, death, expansion, contraction). Second one treats the fusion/Division of communities. The third one represents the size of the network. And finally the fourth criterion expresses the scalability (possibility of large-scale transition for a network).
User preferences such as “Algorithm’s clarity” and “Availability of algorithm’s source code” are also taken into consideration.
Figure 7 illustrates a user interface to collect information about the network and users preferences. To test our approach, we simulate a user U1 to fill in the fields of the user interface. U1 wants an algorithm that can treats a network of 3000 nodes, which allows scalability and all operations on communities. He also wants its source code to be available, or at least the pseudo-code of the algorithm is provided and easy to understand in order to implement it.
Information collecting interface.
The evaluation matrix which is the input of PROMETHEE II is shown in Fig. 8. In our case study, alternatives are the algorithms iLCD, FacetNet, ALPA, Tiles and GLmetric; and criteria are network size, basic operations (birth/death/expansion and contraction), fusion/division of communities, network scalability, source code availability, algorithm clarity.
Evaluation matrix.
The ranking provided by PROMETHEE II illustrated in Fig. 9 shows that the best algorithm according to the user preferences is Tiles.
Ranked algorithms by PROMETHEE II.
After ranking algorithms with PROMETHEE II, the first ranked one (in the obtained results) is suggested to the user which is in our case the algorithm “Tiles”. As it is seen from the answers of the questionnaire, “Tiles” is an algorithm which can treat a network of more than 5000 nodes; it allows network scalability and all operations about communities. Also its source code is available online, and it is not much complicated to understand. Therefore, “Tiles” is best suited to the user preferences.
The suggestion would be the source code which is available online. In case it wasn’t provided by the authors, the pseudo-code of the algorithm is suggested, and it’s up to the user to implement it.
In our study, it is supposed that the user is a computer scientist or a programmer.
Conclusion
Many of the existing algorithms for static community detection are based on modularity optimization which is proposed by Girvan and Newman. It gives satisfying results, but it is time-consuming, which restricts its application to only small networks where communities are clearly defined. Several approaches that tried to treat dynamic community detection, especially the older ones, used the temporal graph as a succession of independent snapshots. A first step was to apply a static algorithm on each of these snapshots, and then find a match between the existing communities in consecutive snapshots. This method has an intrinsic weakness, on two graphs relatively close topologically; the same algorithm will sometimes find communities strongly differing.
In this paper, we presented a new approach that consists on classifying some existing dynamic community detection algorithms by using the well known method PROMETHEE II, then suggesting the source-code (which might be available online) of the most fitting one according to user preferences.
Additionally, in order to extract criteria used in PROMETHEE II, firstly, a questionnaire that contains questions about the chosen algorithms is proposed and provided to domain experts. Secondly, the answers to the questionnaire are analyzed according to two cases: an ideal case where the answers are similar, and a conflict case where the answers are different; thus, a negotiation protocol is used to resolve the conflict and thereafter extract criteria. The algorithms are then ranked using PROMETHEE II as a total ranking method according to the extracted criteria. Finally, the first ranked algorithm is recommended to the user.
The experimental part was done with four human experts to answer our questionnaire, and a user to provide user preferences and receive the decision support.
As future works, we would like to test our approach with an enterprise manager and ask for more participation and collaboration with more human participants operating as experts or simple collaborators with expertise on the dynamic management of communities. Furthermore, we want to automate the analysis of questionnaire’s answers instead of doing it manually. Also we would like to develop our own algorithm to detect communities and compare it with the first suggested algorithm in the given approach in this article. This will provide us with better dynamic community detection.
Footnotes
Authors’ Bios
