Abstract
Social networks currently belong to a vast area of research as information spreads at a remarkable speed due to technology, and social connections have become easily accessible in the online environment. Social networks are dynamic entities, which new individuals can join, or other links can be lost because members no longer interact with one-another. Dynamic analysis of social networks is important in topology changes of the network and also in information diffusion. Some information that spreads through the social network may be untrue, hence in this paper we propose a protocol based on evidence theory with Dempster-Shafer and Yager’s rules in which the network becomes more immune to false information. We also analyze the impact of topology change for an initial network by adding new connections in the information diffusion process. We show information diffusion by coloring the nodes of the network and also illustrate the time evolution of messages for a better accuracy in our comparisons. The experimental results confirm that the proposed model fits the behavior of inhibiting false information.
Introduction
In recent years, social networks have had a quick development and an increase in their diversity, so people can connect and interact with other users in a very easy way. A social network is an efficient way of spreading news and facts, but it also has the disadvantage that some information may be untrue. We propose a protocol that makes the network more immune to the diffusion of false information based on evidence theory with Dempster-Shafer and Yager’srule.
The present paper is an extended version of [1], in which we analyze the diffusion of two contradictory pieces of information spread by two specified source nodes. Here, we consider new case studies: a different positioning of the source nodes, the possibility that a source node does not always transmit the same information during a simulation, a larger network with 1000 nodes, the addition of new connections to the original networks, and an analysis of the number of messages during the information diffusion.
In the first case, we consider that the source node with false information can also transmit true information with a certain probability. We choose this scenario because it is quite likely that a source node that lies all the time becomes less credible for its neighbors. Another case we consider is the one in which we choose a central and peripheral source to analyze the impact that this central source has in spreading information through the network. In the third case study, we analyze the impact of changing the topology of the initial network on the information diffusion and thus we introduce the possibility of dynamically adding new connections to the existing nodes in the network. We also aim at tracking the evolution of the two types of information diffusion over time, both initially and after establishing the true information.
The paper is organized as follows. In Section 2 we present some related work about credibility analysis and the evolution of dynamic networks. In Section 3 we briefly describe the basics of evidence theory and the Dempster-Shafer and Yager’s rules of aggregating evidence. The characteristics of the model under study are presented in Section 4, followed by some case studies and experimental results in Section 5. Finally, in Section 6 we present the conclusions of our work.
Related work
In [2], the authors present a sophisticated Knowledge-Based Trust (KBT) method to evaluate the trustworthiness of web pages with regard to the information they provide. The first step is to parse data to obtain a certain format: (subject, predicate, object). This knowledge triplet is provided by various extractors, i.e. methods for information extraction from web pages. However, this extracted data may be erroneous, but also the information published on the web pages may be untrue. KBT is a multi-level probabilistic model that can distinguish between these main sources of error: incorrect data on a web page and incorrect extractions made by the extractor.
A model that evaluates the content of posts in a social network, as well as the interest in a post to determine the credibility of the person which distributed the news is proposed in [3].
In [4], the Dempster-Shafer theory of combined evidence is used to identify the insider attacker from a wireless sensor network (WSN) by observing the parameters of the neighbor nodes. To identify the inside attacker, the authors take into account the observations of the neighbor nodes regarding the behavior of the suspected attacker. Data from neighbors is considered evidence. They combine these independent pieces of evidence and decide based on the Dempster-Shafer theory.
The authors of [5] describe a way to solve the problem of identifying credible sources of relevant information in social networks. In order to evaluate the information sources, their approach combines the analysis of the link structure of social networks with topic content models of messages. They develop a method to automatically identify and categorize users based on relevance and knowledge in a particular domain for any given subject.
In [6] the authors aim at defining a model for creating social networks with a topology as realistic as possible. They approach various strategies for creating connections between individuals, and also strategies to inhibit them. For creating connections, they present two relatively simple models. Nodes are represented as agents and, initially, they do not have fixed connections between them, but rather all agents have the opportunity to visit any other agent in the network. Each agent has its own set of weights for everyone else, which represents the probability that it may visit one of those agents. The first model provides a directed graph. The second model of creating connections is similar to the first one, but the connections are bidirectional. By applying a large number of iterations on these models, the authors notice that interesting network structures may appear randomly, but there may also be a weight convergence issue observed on a simple three-agent network. They also approach the case where connections can be inhibited, but all these models converge to network structures that are not realistic. For this reason, two ways of perturbing these structures are introduced: altering the past and adding noise. These methods are relatively simple, but they can lead to surprising results.
A multi-agent model with the purpose of reproducing realistic diffusions of information and misinformation in scale-free networks is presented in [7]. The authors investigate the information diffusion in detail on a real data set and make some assumptions to model the behavior of the agents. A random threshold is assigned to each agent, which is used to determine whether the agent will spread the news or not. This threshold represents the personal preparation of the agent on the topic. Three types of diffusions are distinguished in their model: spontaneous spreading, collective influence, and communication persuasion. Every agent has the possibility to spread the information using one of these three ways of diffusion. In the spreading of false information, they introduce a new type of agent which is able to recognize the misinformation and to alert its neighbors. Their model behaves well on real social network data and they also manage to reproduce real phenomena at the macroscopic level.
There are also several approaches that analyze the credibility of information diffusion in social networks. For example, in [8] an algorithm to detect the spreading of false information through the network is presented. It uses the collaborative filtering property of social networks to measure the credibility of the sources of information as well as the quality of news items. Two aspects regarding the diffusion of misinformation in social networks are presented in [9]. These problems identify the misinformation sources and limit its diffusion in the network. Paper [10] analyzes the credibility of information in tweets corresponding to fourteen high impact news events of 2011 around the globe. To predict the credibility of information in a tweet, they use regression analysis to identify the most relevant features on the Twitter social network that can help in assessing the credibility of the messages. The most relevant features found are content-based (unique characters, swear words, pronouns, emoticons) and user-based features (number of followers, length of username). The CredRank algorithm is proposed in [11] to measure the credibility of social media users based on their online behavior by finding the users with similar behavior and clustering them.
The factors that influence the individuals’ perceived information credibility are studied in [12]. Five factors are identified as the most relevant to assess online information: medium dependency, interactivity, medium transparency, argument strength and information quality. A learning method called Information Credibility Evaluation (ICE) is proposed in [13], used to learn representations of information credibility where learning is based on user credibility, behavior types, temporal properties, and comment attitudes.
Other machine learning or simulation models could be used as well [14–16].
Brief description of evidence theory
Evidence theory can be considered an extension of the classical probability model because the single value that represents a probability is replaced by confidence intervals.
When we relate to some information, there may be different evidence to support it in different and possibly contradictory degrees. A first way to combine evidence from different sources was developed by Dempster and Shafer [17]. The confidence about information coming from different sources is represented by an interval in which the lower bound is called Belief (denoted Bel) and the upper bound is called Plausibility (denoted Pl). For a piece of information A, the plausibility is determined by:
The values for A and not A (
Let Θ be the set of all mutually exclusive hypotheses, also called the frame of discernment. In this case
Let m be a function called the mass function, m : ℘ (Θ) → [0, 1], where ℘ (Θ) is the powerset of Θ. The values of m (A) are called basic belief assignments (BBA). By applying the properties of the Dempster-Shafer theory, we have:
The Dempster-Shafer fundamental equation for combining two pieces of evidence of m1 and m2 into a new one m3 is:
Unlike the Dempster-Shafer rule. Yager’s rule [18] does not normalize the conflict, instead it adds it to the Θ set. The following equations are used to combine two pieces of evidence:
When more pieces of evidence are combined, Yager’s rule is defined by:
In this paper we analyze the spreading of information starting simultaneously from two source nodes, but we consider that these nodes spread two different pieces of information. Let x I be one of the two pieces of information, and we consider that it is true. Let
The information diffusion protocol
The protocol of information diffusion is a cascaded one. We initialize a queue with the two source nodes and update it as follows:
After a node has transmitted, it is removed from the queue; If a node receives a piece of information, it is added to the queue if it is not already contained by the queue; A node can be re-added to the queue for retransmission if it has received a piece of information that is different from the one which it has previously sent.
The information diffusion process stops when the queue no longer contains any nodes. We define a round as the action of information diffusion starting with the moment in which the queue is initialized with the two source nodes until the queue becomes empty. We consider a simulation to be the execution of a certain number of rounds. When a round is completed, the queue is reinitialized with the two source nodes, but the nodes of the network retain the statistical data: the information type (I or
Regarding the transmission probability of the node, we use Dempster-Shafer or Yager’s rule together with a Gaussian distribution. When a node sends a type of information, the receiver node retains the fact that its neighbor has sent that specific information as well as its confidence degree. From the point of view of the receiver node, the received confidence degree is taken as a piece of evidence, namely belief. In our model we have no plausibility evidence, so it will have the default value of 1. In order to keep statistics for the received information type, some specific counters are incremented.
The beliefs accumulated by a node which must send are transformed into confidence intervals, where only the plausibility is set by default with value 1.
Then, we use Dempster-Shafer or Yager’s rule to combine the intervals, but only if the node contains two such intervals and the information type is different. If the node contains only one interval, then the combining procedure is ignored. In this case, the transmission probability is given by the Gaussian distribution, where the mean is equal to the average of belief and plausibility. If the obtained probability is higher than the send threshold value, the node transmits the information.
Once two confidence intervals have been combined, we obtain two more different confidence intervals. At this point we distinguish several cases in our chosen probabilistic transmission protocol, depending on the newly computed beliefs and plausibilities, as follows:
If both intervals are equal, the node will transmit the information with the higher generated number based on the Gaussian distribution and this number will also be higher than the send threshold (Fig. 1a); If both the lower and the upper bounds are higher than the other interval limits, the information that has the interval with the larger limits will always be transmitted (Fig. 1b); If the absolute difference of the means from the two interval limits is less than or equal to a chosen value ɛ = 0.05, both intervals have the chance to further send the information. This case is only considered if the above cases are not satisfied (Fig. 1c). Cases for the chosen probabilistic transmission protocol.

For all the above cases, we choose the Gaussian distribution to have the variance σ = 0.025.
A node contains a list of confidence degrees, one for each neighbor. Thus, each node has its own point of view towards a neighbor. For example, the confidence degree of node 3 towards node 1 may be different from the confidence degree of node 3 towards node 2. This approach fits well with the real behavior because each individual has his/her own point of view and it is not totally influenced by the opinion of others on the same common friend.
We quantify this confidence degree as a real number in the range of [0, 1]. The higher it is, the higher the probability for the receiver node to get information from its transmitting neighbor, to whom this confidence degree is attached.
In the first phase of the information diffusion through the network, when it is not yet known which of the two pieces of information is true, we consider the confidence degree in all node lists to be initialized with 0.9. At the moment that the round number has reached the established one, when the true and false information is specified, the confidence degree in the node lists will be recomputed using the Laplace correction [19].
After the execution of all rounds until the true information is established, each node also stores statistics with the received information type from neighbors as well as their total number.
For the computation of the confidence degrees of each node, the order in which they are processed is relevant. For this reason, before the main simulation, we establish the node transmission order considering the simple case in which the diffusion is permanently possible for all nodes. Thus, we establish a queue in which the nodes are introduced as they transmit information. This queue is initialized with the two source nodes and it will be updated until there are no longer transmitting nodes. We expect the queue to contain all the nodes of the network and we chose to determine it separately, before the start of the main simulation. In this way, we avoid the case in which it is possible that some nodes do not transmit or receive information, thus leading to an incomplete queue.
Once the confidence degree of the nodes has been computed, we let the simulation run further in order to observe how this impacts the information diffusion process.
The simulation algorithm contains three main phases:
The execution of k rounds with the initial confidence degrees; The computation of confidence degrees after k rounds (i.e. when the ground truth about the information is revealed); The execution of j rounds with the new confidence degrees to observe the effect of the truth recently found.
Adding connections
In this study, we consider a 100-node network with a scale-free topology, where each new node randomly connects to any other node in the network, but the connection probability is also influenced by the node degree to which the new connection is made. In this type of topology, the node degrees have a power law distribution, i.e. the probability of connecting to a node with a higher degree is higher [20]:
We assume that a new incoming node can only create one connection to any other existing node in the network, thus maintaining the power law property.
The method of adding new connections is to randomly choose two nodes i and j that belong to the network, and to create a new connection between them. We also want the network to maintain its scale-free property. Thus, the connection probability of the two randomly chosen nodes is based on their degree, thus the power law distribution is satisfied. The initial search of the two candidate nodes is performed until both of the following conditions are met: the two nodes are not identical and they are not already connected.
In Pseudocode 1 the extraction procedure of a random node based on the power law distribution is presented.
Regarding the evolution of message flow, we want to see the total number of messages received by the nodes at each simulation step, where both true and false information is collected. This number of messages may be different from one round to another, so we compute the average number of messages between the simulation rounds.
Simulation results
The experimental results confirm that the network becomes more immune to the transmission of false information. In [1], we chose three networks of different sizes, two of them for model validation, and the third one, a 100-node network with free scale topology, was used as a major case study. As in [1], the two chosen sources are labeled on figures as S1 and S2 and we decide that S1 will transmit the information I, and S2 will transmit the information
In order to see how the two information types spread over the network, we color the nodes with different grey intensity levels, where the white color means the node has received only the type I information and black color, only
The fraction is the normalized quantity of type I information received by a node, in the range of [0, 1], and M is the maximum allowed value of the pixel color representation –in our case, M = 255.
In the previous study [1], we analyzed the initial diffusion of the two information types for send threshold values of 0.5 and 0.55. Once we have established the ground truth to be transmitted by S1, i.e. I is true, it was observed that the confidence degrees of the nodes for their neighbors have changed, so that the information I spread more easily through the network.
In terms of experimental results, the new simulations are made on a 100-node network and a 1000-node network taking into account another several points described below.
First, we study the impact on information diffusion when the specified source nodes have a different positioning. We focus our analysis on two major cases: in the first one, both source nodes are peripheral; in the second one, a source node is peripheral while the other is central. As for the first case, both peripheral sources are chosen in the same way as in the previous study [1]. Another contribution is the addition of new links, in which we insert a fixed number of connections based on a percentage (e.g. 10%) of the total number of connections in the original network.
We also discuss the time evolution of messages throughout the network for the information I and
Once the ground truth is revealed, the time evolution of messages is represented with the same grey scales, but with dotted lines.
A final point is to analyze the behavior of information diffusion when the source node which spreads the false information also spreads the true information with a certain probability. For all of these cases, we choose three values for the information transmission threshold: 0.45, 0.5 and 0.55, respectively.
In the case with two peripheral source nodes, we see in Fig. 2a that both I and

Time evolution of messages for two peripheral sources with threshold value of: a) 0.5, b) 0.55.
The information I is encouraged to spread, while
Therefore, the send threshold is a critical parameter in our model because a change in its value leads to a significant impact in the diffusion of information.
In the case study with the peripheral and central sources, we consider that the information I is spread by the peripheral source and
By comparing Fig. 3a and 3b, one can see that the initial information transmitted by the central source S2 is widespread throughout the network and when the true information is established as being transmitted by S1, i.e. I, the diffusion of this information type is encouraged.

The information diffusion through the network with central and peripheral sources and transmission threshold of 0.45: a) initial diffusion, b) diffusion after the ground truth has been established.
For a better comparison, we can see in Fig. 4a the time evolution of messages for a threshold value of 0.45 and we also illustrate in Fig. 3 the colored networks for this threshold. Around simulation steps 3 and 4, we see again that the information I is encouraged to spread after the ground truth is revealed, while

Time evolution of messages for central and peripheral sources with threshold value of: a) 0.45, b) 0.5, c) 0.55.
In Fig. 4b and 4c, we increase the threshold value to 0.5 and 0.55 and we can see how the spread of I is diminished both in the initial diffusion and after the ground truth is established. By increasing the threshold value, we notice that the information of the peripheral source (i.e. I) is transmitted more difficultly through the network, and the diffusion is completed in fewer simulation rounds. However, we do not see this behavior in the case of the central source which transmits information
In the scenario with the addition of new connections, we show how information diffusion is influenced by these new connections. The modified networks with the new connections are illustrated in Fig. 5, where we only show the initial information diffusion for the two main cases: the first one with two peripheral source nodes (Fig. 5a) and the second one with a peripheral and a central source node (Fig. 5b). The insertion of new connections changed the topology of networks and we rearranged the nodes for a better view. However, the source nodes are the same as in the original network.

The initial diffusion with a send threshold value of 0.5 through the modified networks which contain the newly added connections: a) peripheral sources, b) peripheral and central sources.
In the case of two peripheral source nodes, we use the same send threshold values of 0.5 and 0.55.
The source S2 spreads the information

Time evolution of messages for peripheral sources with the newly added connections, and the threshold value of: a) 0.5, b) 0.55.
Thus, the initial diffusion of I is now more inhibited in the new network due to the fact that the source S2 has a closer access to the central part of the network.
In case of a higher send threshold, i.e. 0.55, a comparison between Fig. 6a and 6b shows that the number of messages decreases and the information diffusion is finished slightly earlier.
For a central source node and added connections, the time evolution of messages does not have a significant impact (Fig. 7) compared to the original network (Fig. 4b). The behavior is the same, i.e. the information

Time evolution of messages for central and peripheral sources with the newly added connections and a threshold value of 0.5.
In all previous scenarios, the source nodes S1 and S2 always transmit their assigned information with a probability of 100%. In the current scenario, we want to analyze the impact of information diffusion when S2 sends both information types. Previously, S2 sent only the information
In the case of peripheral sources, comparing Fig. 8a and 8b, there is no great difference when S2 periodically sends the opposite information type. The spread of information I is slightly encouraged and the most visible moments are observed at simulation steps 3 and 8.

Time evolution of messages for peripheral sources with a threshold value of 0.45, when S2 sends I with a probability of: a) 0%; b) 20%.
When we have a central source and a peripheral one, comparing the Figs. 4a and Figs. 9, the number of messages for the information I becomes higher (when S2 periodically transmits I as well) in that region where the false information spreading is most active, i.e. around simulation step 4. We can also see that the number of messages from information I decreases at simulation step 7, compared to the case when S2 always sends

Time evolution of messages for central and peripheral sources with a threshold value of 0.45, when S2 sends I with a probability of 20%.
We made simulations on a 1000-node network and we choose a send threshold value of 0.45. The colored networks for both the initial diffusion and the established ground truth are presented in Fig. 10. The behavior is similar to that of a 100-node network, but due to the increased network size, there is a difference in the average number of messages. We can see in Fig. 11a that the information spread from peripheral sources is transmitted more difficultly through the network at the beginning. Afterwards, the evolution is followed by a higher increase because the messages manage to reach the central nodes.

The information diffusion through the 1000-node network with central and peripheral sources and a transmission threshold of 0.45: a) initial diffusion, b) diffusion after the ground truth has been established.

Time evolution of messages for the 1000-node network with threshold value of 0.45: a) peripheral sources, b) central and peripheral sources.
When we have a central source node and a peripheral one, in Fig. 11b, we can observe that information is spread more easily over the network, the diffusion is completed more rapidly, and the average number of messages peak increases from 120 (Fig. 11a) to 160.
Based on the presented results, we have seen that our model is very sensitive to the send threshold parameter, but only after the two information types collide during the diffusion.
The information collision is an interesting behavior which our model incorporates and it was best revealed by the case study with two peripheral sources. When the ground truth is revealed, the time evolution of messages for both information types suffers changes at the same time: the false information is inhibited, while the true one is encouraged. This effect is less visible in case of peripheral and central sources because the false information is transmitted from the central node and its inhibition is notsignificant.
A possible future direction of research is to use the model repeatedly on the same network with new information types. Thus, a peripheral node which sends only true information may have a chance to spread its information in a vast area of the network, perhaps a similar concept related to the popularity of an individual.
