Abstract
In this work we introduce the notion of the weighted graph associated with a fuzzy measure. Having a finite set of elements between which there exists an affinity fuzzy relation, we propose the definition of a group based on that affinity fuzzy relation between the individuals. Then, we propose an algorithm based on the Louvain’s method to deal with community detection problems with additional information independent of the graph. We also provide a particular method to solve community detection problems over extended fuzzy graphs. Finally, we test the performance of our proposal by means of some detailed computational tests calculated in several benchmark models.
Introduction
Clustering methods are unsupervised learning tasks, whose main aim is the decomposition of a given input set of objects into subgroups called clusters or communities. This decomposition is based on the similarity among several objects. The goal is to obtain a partition in such a way that those objects belonging to the same community are as similar as possible, whereas that elements belonging to different clusters are as dissimilar as possible. This process of grouping elements is known as clustering.
From an extensive literature review, we have found a broad range of sectors and backgrounds in which clustering has become a widely accepted tool of exploratory data analysis and model development due to its skill to gain an interesting insight into data structure. Some of these areas are business and economics, engineering and astronomy, medical and biological disciplines [32, 47]. In [5] it can be found a detailed motivation for this structure searching. This type of analysis has also taken an essential role in solving pattern recognition or image processing problems [49, 54]. Some examples of real-world networks whose nodes are prone to exhibit a modular structure can be found in [9, 11]. It is well established that nodes belonging to the same clusters are more likely to be ‘related’ to one another than they are to the rest of the network (Friedkin held that ‘the more distant two vertices, the less they influence each other’ [14]). The modular structure of the graphs shows clearly the relations among the, besides the inner information of the network. Then, we can affirm that one of the most important analysis to be done with reality networks is about community detection. Along the years, many authors have focused their attention on this problem. Some extensive analyses and comparison between several representative methods on community detection can be found in [25, 53]. These papers provide a clear vision of the state of art of this field.
A graph is a mathematical object that allows to model (and also visualize) the relationships that exist between a set of objects, V. In the topic of community detection problems, thousands of works can be found in which a graph models the relationships between different objects with the aim of identifying strongly cohesive communities or groups [10, 39]. Undoubtedly, the relationships between objects (valued or not) should be the main factor when identifying communities in a network. Nevertheless, it would not be unreasonable to use additional information to which a graph provides in order to extend the concept of group or community [22, 43].
Let us illustrate this idea with a toy example. We consider a set of 10 individuals whose relationships can be seen in Fig. 1. The set of links is E = {(1, 2) , (1, 3) , (2, 3) , (2, 4) , (3, 5) , (4, 5) , (5, 6) , (6, 7) , (6, 8) , (7, 9) , (8, 9) , (8, 10) , (9, 10)}. Then, we assume that we have additional information about these 10 individuals. For example, we know their particular opinion (positive or negative) in favor of a politician. In particular, let us assume that the following vector represents the opinion of these 10 individuals O = (+ , + , + , - , - , + , + , - , - , -). From a structural point of view (i.e. the graph), any community detection algorithm will identify clearly two groups or communities: C1 = {1, 2, 3, 4, 5} and C2 = {6, 7, 8, 9, 10}. However, if we reformulate the idea of a group and we incorporate the information provided by each individual’s opinion, it seems logical to think that there should be four groups, C1 = {1, 2, 3}, C2 = {4, 5}, C3 = {6, 7}, C4 = {8, 9, 10}.

Graph with 10 nodes.
In the literature of Social Network Analysis, (SNA), we have found some authors that have incorporated additional information to the graph to enrich some of the classic problems of this field (see for example [3, 43]). In the case presented in [16], the information provided by a game is incorporated in order to reformulate the idea of centrality to obtain new and different measures. In the topic of fuzzy sets, we can found that the information provided by a fuzzy set has also been incorporated into the graph to enrich the classic problem of community detection or other SNA problems (see for example [12, 48]). Nevertheless, as it is mentioned in [20], the usual way of defining fuzzy graphs only takes into account the associated uncertainty for each pair of adjacent nodes of a given graph. However, there are some situations in which this model looks insufficient to capture possible larger interactions as well as the capacity of those sets of nodes whose cardinality is greater than two.
Considering this, in this work we analyze the community detection problem when the available information has two different and independent parts: a graph G = (V, E), and a fuzzy measure μ defined over V (see [46] for further details), the set of nodes of the network. See that a fuzzy measure can be viewed as an extension of probabilistic, plausibility/belief or possibilities measures. Particularly in this paper, fuzzy measures are assumed to be the representation of the affinity among nodes. Let us remark that in this study we work in a much more general case than fuzzy graphs are, in which the only available information is about the edges of the graph: our research is based on
Our starting point is one of the most popular community discovery algorithms, which was introduced in 2008 by Blondel et al. [4], called Louvain algorithm. It is based on modularity optimization, which is an NP-hard problem. In this paper, we first propose a modification of it which can manage several information sources when solving community detection problems, apart from the structure of the graph. Then, we explain a particular application of our proposal to deal with community detection problems over extended fuzzy graphs in which the fuzzy measure μ defines some affinity relation among the nodes. We also explain the particular case in which the fuzzy measure is additive so the application of our algorithm is almost trivial. To carry on with it, we first introduce the notion of the
The remainder of the paper is organized as follows. In Section 2 we introduce some new concepts related to fuzzy measures. Then, in Section 3 we work with fuzzy measures obtained from fuzzy relations. In Section 4 we propose a modification of the Louvain algorithm which can manage different information sources. We also provide a community detection algorithm to find communities in extended fuzzy graphs. The performance of our algorithm is tested with some computational results based on benchmarking in Section 5. We conclude the paper with some conclusions in Section 6.
Let V = {1, …, n} be a finite set of elements called nodes, and let E = {{i, j} | i, j ∈ V} be a set of links or edges, it is, the set of neighboring or related elements of V. It is assumed that if the elements i, j are related, then ∃ e = {i, j} ∈ E; otherwise, ∄ e = {i, j} ∈ E. Hence, the pair G = (V, E) is a graph or network. A network can also be represented by means of its adjacency matrix, denoted as A. It shows the direct connections between the elements of V in that sense: if A ij ¬ =0, then ∃ e = {i, j} ∈ E; else, if A ij = 0, then ∄ e = {i, j} ∈ E. A particular case of graph is that in which each edge of E has a numerical weight assigned. This type of graphs are known as weighted graphs. In theses cases, ∀i, j ∈ V, the element A ij represents the weight of the edge e = {i, j} (let us note that it has to be not negative, but not necessarily 0 or 1).
A wider concept is showed by fuzzy graphs. This type of model representation is really useful in those cases in which there exists some uncertainty or vagueness about the description of the objects, its relations, or even, in both aspects. This notion was introduced by Rosenfeld [41], and it is based on the fuzzy relations between the elements [52], under the assumption that they represent the degree of relationship between the individuals. In Definition 2 we provide a formal characterization of fuzzy graphs.
Let us emphasize that many authors simplify previous definition of fuzzy graphs under the assumption that the fuzzy vertex is crisp, so the only information is given by the pair G = (V, ψ). Also, it is important to note that another common assumption in fuzzy graphs is that there exists a set of crisp links E, that forces a fuzzy edge to be zero if the corresponding link does not exist in E. Taking into account these considerations, we present the following and simplified definition of a fuzzy graph:
Some remarks could be done about the previous definition of a crisp graph with fuzzy edges ψ. The first one is that if a pair of nodes are not adjacent in the crisp graph G, then they can not have any membership degree in ψ. Then, the only available information given by this model is the crisp graph as well as the memberships degree of each pair of adjacent links. Hence, we can see this type of graph as a weighed graph where the weights represent the memberships degree of each pair of links.
In order to give an extended vision of the information, we provide the Definition 2 as an extension of fuzzy graphs, that is far from the notion of valued graphs. First, let us describe another concept which holds the definition of the extended fuzzy graphs.
Then, an extended fuzzy graph will be a graph together with a fuzzy measure defined over the set of nodes. Formally:
Now, let us introduce more important concepts, which fix the essentials of this paper.
Then, we define the k-additivity of a fuzzy measure depending on its Möbius transformation.
Obviously, 1-additive fuzzy measures are additive fuzzy measures. Along this paper we will focus on 2-additive fuzzy measures.
Fuzzy measures based on relations
In this section we provide a practical definition of a particular fuzzy measure. Our purpose is to build this fuzzy measure based on a fuzzy relation. Then, let V = {1, …, n} be a finite set of elements, and let us suppose that there is a fuzzy affinity relation between the elements of V. Let us suppose that matrix R characterizes this relation, defining the degree of affinity among each pair of elements of V, in the sense that the element R ij quantifies the affinity between the individuals i and j. Let us suppose that R is a non-negative, symmetric and normalized matrix whose main diagonal is null. Then, we define μ R : 2 V ⟶ [0, 1] as follows:
μ
R
(∅) =0 Trivial. μ
R
(V) =1 due to R’s normalization. μ
R
is monotonous ⇔ R
ij
≥ 0. Trivial due to R definition.
Then we show some properties of this fuzzy measure μ R based on a fuzzy relation.
Let us define a
i
= 0 ∀ i ∈ V, and a
ij
= R
ij
∀ {i, j} ⊂ V. Then, according to Equation (1),
Let μ be a fuzzy measure which defines some affinity relation between all the pairs of elements of a finite set V. In this subsection we define the weighted graph associated with μ. Its definition is based on the Shapley value [44] as follows: for each pair of elements i, j, we quantify how the absence of one individual affects the other element of the pair by means of the value F ij :
The calculation of Shapley value has exponential complexity, which could be moderately solved with some sampling techniques (see [6, 7] for further information). However, when additive fuzzy measures are considered, this calculation is immediate. According to Proposition 2, and under the assumption that the matrix R characterizes the affinity relations among the nodes, the fuzzy measure μ R obtained from R following Equation (1) is 2-additive, so the calculation of the Shapley value does not imply any complexity increasing.
For a half of o ∈ π (n), k ∈ pred (i), then, for a half of the values of previous summation, it is held that x k = 1. Hence,
Due to R’s simmetry, R ik = R ki . Moreover, R ii = 0. Then
Then, the demonstration of point 2 is analogous.
The following proposition is an immediate consequence of Proposition 4.
The proof of point 2 is analogous.
Hence, we can define the affinity between each pair of nodes as
An alternative definition of the affinity between a pair of nodes{i, j} is related to the interaction index proposed by Grabisch [17].
Then, according to Equation 1, we can rewrite the components of the interaction index as follows:
Hence,
The problem of finding a good partition for a given graph is known as Community Detection Problem. In the classical version of this problem, each node just belongs to one community, this is, only non overlapping partitions are allowed [31]. However, we can find in the literature several variations of it, in which nodes can belong to more than one community can be found in [8, 51].
In this work we propose a new concept of a group which is based on the notion of the extended fuzzy graphs (see Section 2). Hence, under the assumption that
In this section we show a modification of the Louvain algorithm [4], which can deal with different information sources when solving community detection problems.
The Louvain algorithm has two key points in its performance. One of these points is the search of ‘possible’ clusters, finding for each node in the graph all its neighbours (we say ‘possible’ clusters because two nodes can not be in the same module if they are not connected). The other point is about the calculation of the modularity variation that takes place when a node is moved from its community to another. Both points are faced with the support of the adjacency matrix of the graph, which is the input parameter of this algorithm. Nevertheless, we propose a new vision of this method, assuming it has two input parameters, with different purposes: one is devoted to connection analysis and other is devoted to modularity calculation. This approach is summarized in Algorithm 1.
Then, our proposed community detection algorithm is based on this vision of the method with two input parameters. The main aim of our method is to find community structures in networks, not only considering the structure of the graph, as classical methods do, but also with the analysis of any other type of information. Let A be the adjacency matrix of the graph G = (V, E), and let
1: ∀i ∈ V, let i be an isolated community
2: o = permutation (V)
3:
4: search in A all the neighbours of k, j
5: ∀j, calculate ΔQ
k
(j) in matrix
6:
7:
8: Move node k to j* ’s community
9:
10: k remains in its community
11:
12:
1: P1 is a single-community partition with all the vertices
2: P2 is a partition of V such that ∀i ∈ V, i is an isolated community in P2.
3: A1 = A, A2 = A,
4:
5: P1 = P2
6: P2=
7: A2 is the aggregated matrix obtained from the network A1, whose nodes are the communities of P2.
8:
9: A1 = A2,
10:
11: P = P2
Let us remark that the application of Duo_Louvain(A, A), using the adjacency matrix for both input parameters, is exactly equal as the application of the Louvain algorithm over the matrix A. Furthermore, the complexity of our proposal is the same as the Louvain algorithm’s complexity is.
Thus far we have not questioned the origin of matrix
Then, we explain how to adapt Duo Louvain Algorithm for its application in community detection problems over extended fuzzy graphs. In this framework, the additional matrix will be related to the corresponding fuzzy measure. Let
1:
2:
3: Duo_Louvain(
Assuming that μ
R
is built from the matrix R according to Equation (1),
Let us recall the example proposed in the Introduction. Now we show that the Algorithm 3, Duo Affinity Louvain detects the communities which seemed to be logical when considering the political point of view of the individuals.

Extended Fuzzy Graph
Considering
Benchmark graphs
One of the most important points to compare different methods besides their complexity or speed, is the quality metric used to measure the goodness of obtained groups. There are a lot of measures to test the ‘goodness’ of a partition, as for example, the modularity [40] or the centrality degree [26]. Another type of metrics have been deeply studied by means of the performance of clustering algorithms in ‘benchmark graphs’ [30]. Benchmark graphs are artificial graphs built with a known clustering structure integrated in the network by construction. This structure is considered the ‘standard’ that any community detection algorithm should find. Under the assumption that the expected degree of each node, <k>, and the community size of each cluster, |C t |, are power laws with exponents α and β, the construction of benchmark graphs is based on the Equation (4): given the cluster C t , for each pair of nodes (i, j), there will be a link between them with the probability defined below.
A clustering method is as good as its ability to detect this fixed structure is good: the best algorithm will be the one which provides a clustering structure as similar as possible to the standard.
This similarity between the standard structure and the obtained partition is usually measured with the normalized mutual information NMI [29].
In this work, we empirically analyze the performance of our community detection algorithm based on extended fuzzy graphs, Duo Affinity Louvain Algorithm, by means of its performance in four different benchmark graphs, following the idea proposed in [15].
Each benchmark graph, representing the extended fuzzy graph
To build these two matrices randomly, A and R, different values of in/out degree (z in and z out respectively) will be considered, as can be seen in Table 1. Both networks will be generated according to the Equation (4). The standard partition will depend on the benchmark graph considered.
Benchmark Parameters
Benchmark Parameters
Hence, we propose the generation of four different benchmark models, as can be seen in Figs. 3, 4, 5, 6, all based on the idea about the construction of synthetic or benchmark graphs proposed in [15]. We assume that nodes in the boxes of the ‘main’ diagonal of each relation network have high affinity degree, it is, a strong relation in μ R . In the same sense, nodes in the boxes of the ‘main’ diagonal of each adjacency graph are tightly-knit connected. Following the idea of group given in Section 4, when possible, connected nodes with a strong good relation should be in the same module.

Benchmark Model. Case 1.

Benchmark Model. Case 2.

Benchmark Model. Case 3.

Benchmark Model. Case 4.
In this work we empirically test the performance of our method. Hence, we analyze several linear aggregations of corresponding matrices,
NMI. Benchmark Model Case 1
NMI. Benchmark Model Case 2
NMI. Benchmark Model Case 3
NMI. Benchmark Model Case 4
Now we show the performance of our algorithm in the most basic case. Here, the relations graph is defined in the same way as it was done in [15], considering 256 nodes. Regarding the structure of the graph, in matrix A1 the nodes are organized into 2 modules whose expected size is
Obtained results for different values of α and β (in Table 1 we can see the label assigned for each value of these parameters) are shown in Table 2.
Benchmark model. Case 2
In this subsection we show the performance of our algorithm in a more complex structure. Now the ‘gold’ structure is finer than it is in the situation showed in Subsection 5.2.1. In this case, the graph is organized into 4 clusters, where
Benchmark model. Case 3
In this subsection we apply our algorithm in a more complex situation: we broke the symmetry of the structure in the relations matrix R3, hindering the clustering process (see Fig. 5) Then, the expected size of these 5 clusters is
Benchmark model. Case 4
Finally, we increment the complexity of the framework. Not only the structure of the relation network is asymmetric, but the communities of the graph are somewhat small (see Fig. 6). The adjacency matrix A4 is built in the same way as it is built matrix A2 in Subsection 5.2.2. Regarding the relations matrix R4, it is built with 8 embedded clusters, whose expected sizes are
Conclusions and further research
Graphs are used to model and visualize the existing relationships between a set of objects. However, in real-life problems there is always some extra information which is not represented by graphs. Hence, graphs are not enough for modeling those situations in which there is some additional information apart from relationships. Here lies the importance of considering networks including additional information. Then, our proposal is based on the use of extended fuzzy graphs.
The main aim of this work is related to community detection problems over extended fuzzy graphs [20]. We can summarize it as the search of community structures in problems modeled with a network in which there exists some additional information apart from the structure of the graph. This type of situation is defined by means of an extended fuzzy graph, G = (V, E, μ), where μ is a fuzzy measure defining some affinity relation among the elements of V, which is not inherent to the structure of the graph.
To carry on with it, first we do a recall of some basic concepts about fuzzy framework. We provide the definition of extended fuzzy graphs, a triplet
Apart of these contributions, we propose a particular definition of a fuzzy measure μ R based on matrix R which defines some affinity relation. We also demonstrate different properties of μ R , for example, it is a 2-additive fuzzy measure. Then, we define the weighted graph associated with a fuzzy measure. It is the key point of the algorithm that we propose to deal with community detection problems over extended fuzzy graphs, in which there exists some additional information defined by a fuzzy measure that is related to the affinity relation among the nodes.
Then, we suggest a modification of the Louvain algorithm [4], called Duo Louvain Algorithm, which can manage several information sources when dealing with community detection problems. This proposal has the same computational complexity as the Louvain method. Moreover, it supposes a valuable way to find good partitions. We also provide a particular application of Duo Louvain Algorithm to work with extended fuzzy graphs, Duo Affinity Louvain Algorithm. This algorithm provides a community structure in which the nodes assigned to the same group have high affinity degree. For the specific case of a fuzzy measure built from a fuzzy relation (see Section 3), we show that its application is immediate. In fact, the only input parameters of the algorithm here proposed are a graph and a relation matrix related to the nodes, it is, two matrices (n x n). This basic framework allows us to model great complexity problems that had not been possible to address until now. Also, let us remark the importance of working with simple fuzzy measures as the one here proposed (2-additive). The ‘low’ complexity in its formulation, allow us to model in a quite simple way real situations.
It is easy to imagine a background application for the problem here proposed. Just imagine a set of people with several physical connections, which have different affinities depending on their political opinion or hobbies. More examples can be found in social networks or microblogging services, such as Twitter. In this context, we can see the different pairwise of users as a network in which there is a link between two profiles (nodes) if they follow each other. We can assume that this type of information defines the structure of a graph. Obviously, the footprint of the user’s profiles is much more rich than it: taking a look to the social interactions, such tweeting, or the textual information, such as tweets, retweets or hashtags, we can obtain a rich source of knowledge about how users are related and how they should be grouped, beyond the existing links in the network. Some researches under this starting point can be found in [2, 35].
To finish, let us remark that this contribution provides a great improvement with respect to the classical community detection. Although some good ideas have been proposed along the years to include some extra information in clustering problems [3, 43], all of them just consider the topology of the graph with a pre-processing step or similar strategies.
In this work we also show the performance of our proposal in several benchmark models [1] with different levels of complexity. As we show in Section 5, the performance of our algorithm is excellent. In Tables 2, 3, 4, 5 we provide the average of the calculation of the NMI [28] of obtained partitions for 100 iterations of each case. As it can be seen, it is almost 1 in all the situation. These values allows us to affirm that our method guarantees very good results in solving clustering problems with additional information.
As further research, we will work in the develop of our idea considering different fuzzy measures (the bipolar case was analyze with a simple example in [21]). We will also study some properties of the fuzzy measure and we will apply it in a real-case example, a database extracted from Twitter. Other work direction is about the inclusion of the additional information in others clustering methods, such as the Girvan-Newman algorithm defined in [15].
Acknowledgments
This research has been partially supported by the Government of Spain, Grant Plan Nacional de I+D+i, MTM2015-70550-P, PGC2018096509-B-I00 and TIN2015-66471-P.
