Abstract
Collective knowledge is understood as the common knowledge state of a collective consisting of autonomous units. The knowledge states given by these autonomous units reflect the real knowledge state of a subject in the real world to some degree, but because of incompleteness and uncertainty that degree is not known. Determining collective knowledge is an important task because these knowledge states can differ from each other; a further important issue is the quality of the result. The quality of collective knowledge is based on the difference between the real knowledge state and the collective knowledge. In this study, we investigate the influence of the number of collective members on the quality of collective knowledge. The experimental analysis indicates that the larger the collective, the better the quality of its collective knowledge; in other words, a large number of collective members positively affects the quality of collective knowledge. In addition, we prove some theorems about the relationship between the collective knowledge and the knowledge states in a collective, and about the influence of adding or removing members on the quality of collective knowledge.
Introduction
The use of multiple autonomous units (collective members) such as agents or experts for solving common subjects in the real world is increasingly popular. For example, in multi-agent technology a set of autonomous sources is used in decision-making processes [5, 20], and in clinical decision support systems tasks are performed by taking into account a number of sources consisting of a set of databases and a set of data sources [47]. On the one hand, this phenomenon is useful in giving a proper solution [49, 51], since a large collective may have additional knowledge that a collective with few members does not possess. This knowledge may be relevant to the subject being solved [52] and it can be reused for solving similar subjects in the future [48]. On the other hand, this phenomenon also causes conflicts because collective members can have different (or inconsistent) elements of knowledge about the same subject in the real world. This is due to the fact that each collective member can be a good specialist in one or two dimensions whereas subjects in the real world are multi-dimensional.
The concept of collective knowledge has remained controversial [44]. There exist many notions of collective knowledge, such as a justified true belief or a belief acceptable to most members in a collective [45]; the sum of shared contributions among collective members [42]; or the knowledge of a collective as a whole [37]. In this study, a collective is understood as a set of knowledge states given by collective members on the same subject. The real knowledge state of the subject exists but is not known by collective members when they are asked to give their knowledge states about the subject. Because of incompleteness and uncertainty, the knowledge states in a collective reflect the real knowledge state only to some degree. The knowledge of a collective as a whole (collective knowledge) is determined on the basis of the knowledge states in a collective [37]. In collective knowledge determination, based on the way in which the real knowledge state exists, there may be two possible cases: (1) the real knowledge state exists dependently on the knowledge states in a collective; (2) the real knowledge state exists independently of the knowledge states in a collective.
The first case is related to problems such as election committees and group decision making. In this case the collective knowledge and the real knowledge state are identical. Thus, we call this the subjective case. In the second case, however, the existence of the real knowledge state is independent of the knowledge states in a collective, as in the problem of forecasting the weather for a future day or the currency rate in the next month. The collective knowledge of a collective to some degree reflects the real knowledge state of a subject in the real world. Thus, we call it the objective case.
In [35] Nguyen has worked out many consensus-based algorithms for determining the collective knowledge of a collective for different knowledge representations such as logical structures, relational structures, ontologies, etc. Consensus choice has usually been understood as a general agreement in situations where participants have not agreed on some matters [10]. It does not require a full degree of agreement, i.e. unanimity [23], because unanimity is difficult to achieve in practice. In this study, however, we do not aim to propose a consensus model to reach a high level of agreement between collective members' knowledge states as in [11, 56]. Instead, a consensus model is used to determine a knowledge state which is considered as the representative of the knowledge states in a collective. Consensus models have been proved useful in solving conflicts arising in the process of collective knowledge determination [3, 32].
The general research problem of the quality of collective knowledge for objective case is described in Fig. 1 as follows:
For a given subject in the real world, a set of collective members are asked to give their opinions (knowledge states). These opinions to some degree reflect the real knowledge state of the subject. Then collective knowledge is determined on the basis of these knowledge states. Two factors are the most important for the quality of collective knowledge: the inconsistency degree of a collective and the number of collective members. The inconsistency degree represents the coherence and density levels of the knowledge states in a collective [35]. In previous works [35, 36], based on the Euclidean space, we have proved that the collective knowledge is better than the worst knowledge state in a collective; in case the distances from all knowledge states to the real state are identical, the collective knowledge is the best one in comparison with all knowledge states in a collective. In addition, through simulation [39], the authors have concluded that the larger the number of collective members, the better the quality of collective knowledge. In [41] it is shown that, with some restrictions, the hypothesis “the higher the inconsistency degree, the better the quality of collective knowledge” is true. In other words, the inconsistency degree positively affects the quality of collective knowledge. In [16, 25], the inconsistency degree also has a positive influence on the new knowledge elements which arise in the process of collective knowledge determination. This is due to the fact that a homogeneous collective, whose members have the same knowledge states about a real-world subject, cannot generate any new additional knowledge elements.
Beyond the inconsistency degree, the influence of the number of collective members on the quality of collective knowledge is also an important issue. In some situations small collectives can give better solutions, as in [14, 21]. In other situations, however, large collectives play an important role in giving better solutions, as in [7, 53]. For example, consider the following collectives, whose members are invited to answer the question “Will it be rainy tomorrow in Wroclaw, Poland?”: X = {yes, no}, Y = {yes, no, yes}, Z = {yes, no, yes, yes}. With collective X, it is difficult to infer which answer should be chosen as the representative of the collective. However, for collective Y, “yes” is more probable, and it is the most probable in the case of collective Z. In this example, the number of collective members plays an important role not only in determining the representative of a collective but also in evaluating the quality of that representative. Thus, in this work, we investigate the influence of the number of collective members on the quality of collective knowledge for the objective case (assuming that the cost of inviting more collective members is acceptable). This approach can be considered as a revised and expanded version of the papers from ACIIDS 2015 and 2016 [39, 40], with three separate datasets from the field of wisdom of crowds. The experimental analysis shows that a large number of collective members has a positive impact on the quality of collective knowledge. However, the experiments also reveal that the difference between the real knowledge state and the collective knowledge of a collective becomes insignificant when the number of collective members is large enough. This means that, beyond that point, increasing the number of collective members does not bring the collective knowledge noticeably closer to the real knowledge state.
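The rainy-day example can be sketched in a few lines of Python. This is only an illustration under the assumption that the representative of a collective of yes/no answers is chosen by simple majority; the helper name `majority_share` is hypothetical and not part of the paper.

```python
from collections import Counter

def majority_share(collective):
    """Return (most common answer, its share of the votes).

    The answer is None when the two most common answers are tied,
    i.e. when no representative can be inferred by majority.
    """
    counts = Counter(collective).most_common()
    top_answer, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    return (None if tied else top_answer, top_count / len(collective))

# The three collectives from the example in the text.
X = ["yes", "no"]
Y = ["yes", "no", "yes"]
Z = ["yes", "no", "yes", "yes"]

for name, collective in (("X", X), ("Y", Y), ("Z", Z)):
    answer, share = majority_share(collective)
    print(name, answer, round(share, 2))
```

For X the majority is tied, while the share of “yes” grows from 2/3 in Y to 3/4 in Z, which matches the informal argument above.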
In addition, we prove some theorems about the relationship between the collective knowledge and the knowledge states in a collective, and about the influence of adding or removing members on the quality of collective knowledge. These are the main contributions of the paper.
The remaining part of this paper is organized as follows. In Section 2, some works related to the influence of the number of collective members on the quality of collective knowledge are briefly introduced. Section 3 reviews some basic notions related to consensus choice, collectives of knowledge states, the knowledge of a collective, and a method for measuring the quality of collective knowledge. Section 4 presents the proposed method of the paper, the experimental results and their evaluation. Finally, some conclusions and future works are pointed out in Section 6.
Related works
The use of a large collective to solve common problems in the real world has become more popular in recent years, after the influential book The Wisdom of Crowds was published by Surowiecki [51]. However, the idea has existed for a very long time, especially in the field of psychology. For instance, in 1907 Galton analyzed a contest in which participants guessed the weight of an ox [13]. The collective guess was only 9 pounds off the actual weight. Similarly, in the case of ranking the weights of objects, introduced by Gordon in [15], the average of all collective members' rankings was 0.79 for collectives of ten, 0.86 for collectives of twenty, and 0.94 for collectives of fifty.
Recently, many statistical results have shown that a collective is better than a single member in solving common problems in the real world. According to [49], statistics from the game show “Who Wants to Be a Millionaire?” show that only 65% of the answers given by experts were on target, while 91% of the answers given by the public were correct. This is due to the fact that a single member can be a good specialist in one or two dimensions, whereas problems in the real world are multi-dimensional. The author then put forward the hypothesis that a collective is more intelligent than an individual. Later, this hypothesis was confirmed by Surowiecki in [51]. In that work the author described several experiments supporting the statement that the reconciled decision of a collective is in general more proper than individual decisions. For instance, a number of collective members were asked to guess how many beans were in a jar. The average of the results given by the collective members was 871 beans and the actual number of beans in the jar was 850; the error is only 2.5%. In [53] the authors invited a collective of 500 members (478 valid guesses) to solve some common problems such as forecasting temperature and guessing the weights of specific amounts of coffee, milk, gasoline, air, and gold. Through experimental analysis with different numbers of collective members (10, 20, 50, 100, and 200), the authors concluded that a large collective has a positive influence on the quality of collective judgment. In [7], through experiments using prediction markets, the authors highlighted the important role of the number of collective members in producing a good forecast. Concretely, the large collectives (18 members) performed substantially better than the small collectives (8 members).
The success of these works is attributed to the Law of Large Numbers, which has been proved useful for error reduction in collective intelligence [22, 50]. Besides, in [55] the authors have stated that both the speed and the accuracy of decision making increase with the number of collective members.
In addition to these related works, this study presents another approach to determining the influence of the number of collective members on the quality of collective knowledge. For this aim, we increase the number of collective members from a minimal value (n) up to a maximal value (m), each step by k members. This is helpful in determining a number of collective members beyond which a further increase no longer changes the difference between the collective knowledge and the real knowledge state significantly. The knowledge states in the collective at step t are also included in the collective at step (t + 1). Finally, based on the Manhattan distance space, the influence of adding or removing members on the quality of collective knowledge is formally proved.
Background
Collective of knowledge states
By U we denote a set of objects representing the potential elements of knowledge referring to a concrete real-world subject. The elements of U can represent, for example, logic expressions, tuples, etc. Symbol $2^U$ denotes the powerset of U, that is, the set of all subsets of U. By $\Pi_k(U)$ we denote the set of all k-element subsets (with repetitions) of set U for k ∈ N (N is the set of natural numbers), and let
$$\Pi(U) = \bigcup_{k \in N} \Pi_k(U)$$
Thus ∏ (U) is the set of all non-empty finite subsets with repetitions of set U. A set X ∈ ∏ (U) can represent the knowledge states in a collective given by collective members on the same subject where each element x ∈ X represents the knowledge state of a collective member. We also call X a collective. Set U can contain elements which are inconsistent with each other. In this work, a collective is described as follows:
$$X = \{x_i = (x_{i1}, x_{i2}, \ldots, x_{im}) : i = 1, 2, \ldots, n\}$$
where $x_{ik} \in R$, k = 1, 2, …, m, and m is the number of dimensions.
It is a multi-dimensional vector including a set of collective members’ opinions about a real-world subject. Concretely, we consider the following example about the forecasts of Euro to Dollar Exchange Rate (EUR/USD).
Collective X is called:
homogeneous, if all its knowledge states are identical, that is $x_i = x_j$ for all i, j ∈ {1, 2, …, n} (n is a natural number, n ∈ N);
heterogeneous, if it is not homogeneous, that is $\exists x_i, x_j \in X : x_i \neq x_j$.
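The two notions above can be checked mechanically. The following is a minimal sketch, assuming that a collective is stored as a list of tuples of real numbers; the function names and the sample forecast values are hypothetical and serve only as an illustration.

```python
def is_homogeneous(collective):
    """True when all knowledge states in the collective are identical."""
    first = collective[0]
    return all(state == first for state in collective)

def is_heterogeneous(collective):
    """True when at least two knowledge states differ."""
    return not is_homogeneous(collective)

# Hypothetical two-dimensional knowledge states (illustrative values only).
same_opinions = [(1.10, 1.12), (1.10, 1.12), (1.10, 1.12)]
mixed_opinions = [(1.10, 1.12), (1.08, 1.15), (1.10, 1.12)]

print(is_homogeneous(same_opinions))     # all states equal
print(is_heterogeneous(mixed_opinions))  # at least one pair differs
```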
Elements of U have two structures. They are: macrostructure and microstructure. The microstructure is considered as the structure of elements of the set U such as: linear orders [1, 57], semi-lattices [4], n-tree [10, 54], ordered partitions and coverings [8], incomplete ordered partitions [17], non-ordered partitions [2], weak hierarchies [26], time interval [30, 38], etc. The macrostructure is understood as the relationship between elements in a collective. In general, the macrostructure is often defined by distance functions for measuring difference between elements in a collective. The definition of the macrostructure is as follows:
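$$d : U \times U \to [0, 1]$$
which is:
Nonnegative, i.e. $\forall x, y \in U : d(x, y) \geq 0$
Reflexive, i.e. $\forall x, y \in U : d(x, y) = 0 \iff x = y$
Symmetrical, i.e. $\forall x, y \in U : d(x, y) = d(y, x)$
where [0,1] is the closed interval of real numbers between 0 and 1. Pair (U, d) is called a distance space. The definition of a distance function is independent of the structure of elements of U.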
In this work, the Manhattan distance function is used for determining a representative of each collective (collective knowledge) and proving some theorems about the relationship between the collective knowledge and the knowledge states in a collective; the influence of adding or removing members on the quality of collective knowledge.
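As an illustration, the Manhattan distance between two knowledge states and the three properties of a distance function can be sketched as follows. The sketch assumes knowledge states are equal-length tuples of numbers and does not normalize the distance to [0,1]; the helper names are hypothetical.

```python
from itertools import product

def manhattan(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("knowledge states must have the same dimension")
    return sum(abs(a - b) for a, b in zip(x, y))

def satisfies_distance_properties(points):
    """Check nonnegativity, reflexivity and symmetry on a finite sample."""
    for x, y in product(points, repeat=2):
        d = manhattan(x, y)
        if d < 0:                      # nonnegative
            return False
        if (d == 0) != (x == y):       # reflexive: d(x, y) = 0 iff x = y
            return False
        if d != manhattan(y, x):       # symmetrical
            return False
    return True

sample = [(1, 2), (4, 6), (0, 0), (4, 6)]
print(manhattan((1, 2), (4, 6)))
print(satisfies_distance_properties(sample))
```

A check on a finite sample is of course not a proof; it only shows how the properties read in code.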
As mentioned above, the knowledge of a collective (or collective knowledge) is considered as the consensus of the members' knowledge states in a collective. It is considered as the representative of a collective. In [33, 35] the author has presented many criteria for collective knowledge determination such as: reliability, unanimity, simplification, quasi-unanimity, consistency, Condorcet consistency, general consistency, proportion, 1-optimality and 2-optimality (or O1 and O2 for short). Criteria O1 and O2 are the most popular criteria used in collective knowledge determination because satisfying them implies satisfying the majority of the other criteria [32, 35]. Many consensus-based algorithms have been proposed for determining the knowledge of a collective for different knowledge representations such as ordered partitions [9], relational structures [33, 35], interval numbers [30, 31], hierarchical structures [24], ontologies [35], etc.
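A knowledge state x* satisfies:
criterion O1 if:
$$d(x^*, X) = \min_{y \in U} d(y, X)$$
criterion O2 if:
$$d^2(x^*, X) = \min_{y \in U} d^2(y, X)$$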
where x* is the collective knowledge of collective X, $d(x^*, X)$ is the sum of distances from x* to the elements of collective X, and $d^2(x^*, X)$ is the sum of squared distances from x* to the elements of collective X.
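In case of satisfying criterion O2, x* has the following form:
$$x^* = \left( \frac{1}{n}\sum_{i=1}^{n} x_{i1}, \frac{1}{n}\sum_{i=1}^{n} x_{i2}, \ldots, \frac{1}{n}\sum_{i=1}^{n} x_{im} \right)$$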
In this study, criterion O1 will be used to determine the collective knowledge of a collective while some theorems as mentioned in Introduction section will be proved based on criterion O2.
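For one-dimensional knowledge states (single numeric guesses) the two criteria can be sketched as below. The sketch assumes U is the set of real numbers and the distance is the absolute difference; under that assumption a median minimizes the sum of distances (O1) and the arithmetic mean minimizes the sum of squared distances (O2). The function names and the sample guesses are hypothetical.

```python
from statistics import mean, median

def sum_of_distances(candidate, collective):
    """d(candidate, X): sum of absolute differences to all members."""
    return sum(abs(candidate - x) for x in collective)

def sum_of_squared_distances(candidate, collective):
    """d^2(candidate, X): sum of squared differences to all members."""
    return sum((candidate - x) ** 2 for x in collective)

def collective_knowledge_o1(collective):
    """Representative satisfying O1 for scalar states (a median)."""
    return median(collective)

def collective_knowledge_o2(collective):
    """Representative satisfying O2 for scalar states (the mean)."""
    return mean(collective)

# Hypothetical guesses, e.g. for the number of beans in a jar.
guesses = [820, 850, 900, 870, 1200]
x_o1 = collective_knowledge_o1(guesses)
x_o2 = collective_knowledge_o2(guesses)
print(x_o1, sum_of_distances(x_o1, guesses))
print(x_o2, sum_of_squared_distances(x_o2, guesses))
```

The outlier 1200 pulls the O2 representative upwards much more than the O1 representative, which is one practical difference between the two criteria.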
For objective case, the real knowledge state of a subject is independent of the knowledge states in a collective and the collective knowledge reflects the real knowledge state to some degree. Thus, the quality of collective knowledge is based on the distance from the knowledge of a collective to the real knowledge state [35, 36]. In this case, the best collective knowledge is the closest one to the real knowledge state.
From Fig. 2, the quality of collective knowledge is measured by taking into account the distance from the collective knowledge to the real knowledge state. Its definition is as follows:
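Theorem 1. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collective, let r be the real knowledge state, and let vector $x^* = (x^*_1, x^*_2, \ldots, x^*_m)$ be its knowledge satisfying criterion O2; then
$$d(r, x^*) \leq \frac{1}{n}\sum_{i=1}^{n} d(r, x_i)$$
Proof. For any dimension j, we know that:
$$|r_j - x^*_j| = \left| r_j - \frac{1}{n}\sum_{i=1}^{n} x_{ij} \right| = \frac{1}{n}\left| \sum_{i=1}^{n} (r_j - x_{ij}) \right| \leq \frac{1}{n}\sum_{i=1}^{n} |r_j - x_{ij}|$$
Then,
$$\sum_{j=1}^{m} |r_j - x^*_j| \leq \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} |r_j - x_{ij}|$$
In conclusion, the following dependency is true:
$$d(r, x^*) \leq \frac{1}{n}\sum_{i=1}^{n} d(r, x_i)$$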
According to Theorem 1, the distance from the collective knowledge to the real knowledge state does not exceed the average of the distances from the knowledge states in a collective to the real knowledge state. In other words, if collective X is not homogeneous, then the collective knowledge is closer to the real knowledge state than the average of distances from the elements in a collective to the real knowledge state. Thus, the average of distances from the knowledge states in a collective to the real knowledge state can be considered as the upper limit of the difference between the real knowledge state and the collective knowledge. In the next section, we will propose a method to determine the influence of the number of collective members on the quality of collective knowledge for objective case.
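The statement of Theorem 1 can be checked numerically. The sketch below assumes the Manhattan distance and takes the component-wise mean as the collective knowledge; the randomly generated collective and the names `componentwise_mean` and `theorem1_gap` are hypothetical.

```python
import random

def manhattan(x, y):
    """Manhattan distance between two equal-length tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))

def componentwise_mean(collective):
    """Mean of the knowledge states, taken dimension by dimension."""
    n = len(collective)
    return tuple(sum(column) / n for column in zip(*collective))

def theorem1_gap(collective, real_state):
    """Average member distance minus the distance of the collective knowledge.

    Theorem 1 says this value is never negative.
    """
    x_star = componentwise_mean(collective)
    average = sum(manhattan(real_state, x) for x in collective) / len(collective)
    return average - manhattan(real_state, x_star)

rng = random.Random(0)              # fixed seed so the run is repeatable
real_state = (1.0, 2.0, 3.0)
collective = [tuple(v + rng.uniform(-1, 1) for v in real_state) for _ in range(25)]
print(theorem1_gap(collective, real_state) >= 0)
```

For a homogeneous collective the gap is zero, because the collective knowledge coincides with every member.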
The proposed method
The problem of analyzing the influence of the number of collective members on the quality of collective knowledge is described as follows: for a given subject in the real world, a set of collective members are asked to give their opinions (knowledge states). These knowledge states reflect the real knowledge state of the subject (which is not known to them) to some degree. To determine the influence of the number of collective members on the quality of collective knowledge, a collective with the minimal number of members is enlarged up to the maximal number of members, each step by k members. The detailed procedure of the proposed method is described in Fig. 3 as follows:
We start with a collective consisting of the minimal number of collective members (n). Then the number of collective members is increased up to the maximal number of collective members (m). Y j is an added collective with k members. At the end of this process we have (m - n)/k enlarged collectives in addition to the initial one. Then criterion O1 is used to determine a representative (collective knowledge) of each collective. As mentioned above, the quality is based on the distance from the collective knowledge to the real knowledge state. The best collective knowledge is the closest one to the real knowledge state. Based on the differences between the real knowledge state and the collective knowledge of these collectives, we determine whether a large number of collective members positively affects the quality of collective knowledge.
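The procedure can be sketched as a short loop. This is a minimal illustration, assuming scalar opinions, the absolute difference as distance, and a median as the O1 representative; the synthetic opinions and the function name `grow_and_measure` are hypothetical and do not reproduce the datasets used in the paper.

```python
import random
from statistics import median

def grow_and_measure(opinions, real_state, n, m, k):
    """Grow a nested collective from n to m members in steps of k.

    Returns a list of (size, distance) pairs, where distance is the
    absolute difference between the real state and the O1 representative
    (here a median) of the first `size` opinions.
    """
    results = []
    for size in range(n, m + 1, k):
        collective = opinions[:size]   # members of step t stay in step t + 1
        x_star = median(collective)
        results.append((size, abs(real_state - x_star)))
    return results

# Synthetic opinions scattered around a hypothetical real state.
rng = random.Random(42)
real_state = 850
opinions = [real_state + rng.gauss(0, 200) for _ in range(101)]

for size, distance in grow_and_measure(opinions, real_state, n=3, m=101, k=14):
    print(size, round(distance, 1))
```

Slicing the same list at every step keeps the collectives nested, so each enlarged collective contains all members of the previous one.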
Experimental results
Datasets
In this study, we use three separate datasets from the field of wisdom of crowds to determine the influence of the number of collective members on the quality of collective knowledge. The first dataset was collected by Gideon Rosenblatt and concerns guessing the total number of pieces of cereal in a weird-looking glass vase. The second one concerns guessing the weight of a cow. The last one concerns guessing how many coins were in a huge jar. The details of these datasets are described in Table 2.
In these datasets, of course, the real state (actual value) is not known by collective members when they are asked to give their opinions. Thus, the collective opinion (collective knowledge) of the collective consisting of these opinions may reflect the real state to some degree.
Experimental results and their evaluation
In this section, we present the experimental results of the proposed method on the above datasets. Choosing a number of collective members that is sufficient for a given subject is a difficult and complicated task. Thus, the minimal number of collective members is an odd number, as mentioned in [34, 35]. First, the experiments are performed with collectives of 3 to 331 members (k = 1). For each collective size we repeat the experiment with 50 collectives. For each collective we calculate the difference between the collective knowledge and the real knowledge state (d (r, x*)), and then the average of these differences across the 50 collectives. Criterion O1 is used to determine the collective knowledge of each collective.
In [35] the distance between elements of set U is normalized to the interval [0,1]. However, according to Table 2, the difference between the min and max values is very large (such as 1 and 14,555 in the case of the cow weight dataset). Thus, normalizing the whole dataset would make the differences between members in a collective, and between the collective knowledge and the real knowledge state, too small. In addition, the difference seems to become insignificant when the number of collective members is increased. According to Fig. 2, the better collective knowledge is the one closer to the real knowledge state. Thus, in this study, the distance between two elements of set U is not normalized to [0,1]. In the following figures (Figs. 4–7), d rx represents the distance from the collective knowledge of a collective to the real knowledge state (on the y-axis). The x-axis represents the number of collective members.
From these figures we can state that the number of collective members has an influence on the quality of collective knowledge. Concretely, if we increase the number of members in a collective, then the difference between the real knowledge state and the collective knowledge tends to decrease. However, with the cow weight dataset, when the number of collective members increases up to 331, the influence of the number of collective members on the quality of collective knowledge seems insignificant in comparison to the case of increasing the number of collective members up to 100.
Another question is whether the relationship between the number of collective members and the quality of collective knowledge is significant. To answer it, a statistical test (correlation coefficient) is used to evaluate the strength of the relationship. The data from our experiments do not come from a normal distribution (according to the Shapiro-Wilk tests). Therefore the Spearman correlation coefficient is used for measuring the strength of the relationship.
From Table 3, the relationship between the number of collective members and the difference between the collective knowledge of a collective and the real knowledge state is very strong (-0.957 for Cereal vase, -0.818 for Coin jar, and -0.746 for Cow weight). In addition, the p-values are very small. Therefore, we can conclude that the correlation coefficient in this case is statistically significant. Generally, with some restrictions, the hypothesis that a large number of collective members positively affects the quality of collective knowledge is true. In other words, a large number of collective members is helpful in reducing the difference between the collective knowledge of a collective and the real knowledge state.
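For readers who want to reproduce this kind of analysis, the Spearman coefficient can be computed as the Pearson correlation of the ranks. The sketch below uses only the Python standard library and does not compute p-values; the sample sizes and distances are hypothetical values chosen for illustration, not the results reported in Table 3.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1       # average position of the tied block
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical collective sizes and distances to the real state.
sizes = [3, 53, 103, 153, 203]
distances = [40.0, 22.0, 25.0, 12.0, 9.0]
print(round(spearman(sizes, distances), 3))
```

A value close to -1 means that the distance to the real knowledge state tends to fall as the collective grows.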
In addition, the experiments also reveal that the difference between the real knowledge state and the collective knowledge of a collective becomes insignificant when the number of collective members is large enough. This means that, beyond that point, increasing the number of collective members does not bring the collective knowledge of a collective noticeably closer to the real knowledge state. Concretely, we consider the following figures:
Figures 8–10 present the experiments with the number of collective members from 3 to 353, where in each step the number of collective members is increased by k = 50 members (not by 1 member as in the above figures). According to these figures, the difference between the real knowledge state and the collective knowledge seems to become insignificant when the number of collective members is larger than 100. However, this phenomenon should be formally proved and should be the subject of future work. With such a result we could determine the number of collective members that is sufficient for a given subject in the real world. This problem has very high practical impact.
In the next section, we will prove some theorems about the influence of increasing or decreasing the number of collective members on the quality of collective knowledge in case of satisfying criterion O2.
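Theorem 2. Let x*, y*, and z* be the collective knowledge (satisfying criterion O2) of collectives X, Y, and Z respectively, where Y is obtained from X by adding a member y and Z is obtained from X by removing a member z.
(a) If d (r, y) ≤ d (r, x*), then d (r, y*) ≤ d (r, x*).
(b) If d (r, z) ≤ d (r, x*), then d (r, z*) ≥ d (r, x*).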
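(a) For 2a):
Since x* and y* satisfy criterion O2 and Y consists of the n members of X together with y, for every dimension j we have $y^*_j = \frac{n x^*_j + y_j}{n + 1}$.
We need to prove:
$$\sum_{j=1}^{m} |r_j - y^*_j| \leq \sum_{j=1}^{m} |r_j - x^*_j|$$
From d (r, y) ≤ d (r, x*), we have:
$$\sum_{j=1}^{m} |r_j - y_j| \leq \sum_{j=1}^{m} |r_j - x^*_j|$$
By adding $n\sum_{j=1}^{m} |r_j - x^*_j|$ to both sides of the above formula, we have:
$$n\sum_{j=1}^{m} |r_j - x^*_j| + \sum_{j=1}^{m} |r_j - y_j| \leq (n + 1)\sum_{j=1}^{m} |r_j - x^*_j|$$
By applying |a| + |b| ≥ |a + b|, the left side of the above formula satisfies:
$$n\sum_{j=1}^{m} |r_j - x^*_j| + \sum_{j=1}^{m} |r_j - y_j| \geq \sum_{j=1}^{m} |n(r_j - x^*_j) + (r_j - y_j)| = (n + 1)\sum_{j=1}^{m} |r_j - y^*_j|$$
In conclusion, we have:
$$(n + 1)\sum_{j=1}^{m} |r_j - y^*_j| \leq (n + 1)\sum_{j=1}^{m} |r_j - x^*_j|$$
In other words, the following dependency is true: d (r, y*) ≤ d (r, x*)
(b) For 2b):
Since Z consists of the members of X without z, for every dimension j we have $z^*_j = \frac{n x^*_j - z_j}{n - 1}$ (n ≥ 2).
We need to prove:
$$\sum_{j=1}^{m} |r_j - z^*_j| \geq \sum_{j=1}^{m} |r_j - x^*_j|$$
By applying |a - b| ≥ |a| - |b| we have the following dependency:
$$(n - 1)\sum_{j=1}^{m} |r_j - z^*_j| = \sum_{j=1}^{m} |n(r_j - x^*_j) - (r_j - z_j)| \geq n\sum_{j=1}^{m} |r_j - x^*_j| - \sum_{j=1}^{m} |r_j - z_j|$$
In addition,
$$n\sum_{j=1}^{m} |r_j - x^*_j| - \sum_{j=1}^{m} |r_j - z_j| = (n - 1)\sum_{j=1}^{m} |r_j - x^*_j| + \left( \sum_{j=1}^{m} |r_j - x^*_j| - \sum_{j=1}^{m} |r_j - z_j| \right)$$
According to the condition of removing member z, d (r, z) ≤ d (r, x*), we have:
$$\sum_{j=1}^{m} |r_j - x^*_j| - \sum_{j=1}^{m} |r_j - z_j| \geq 0$$
In conclusion, the following dependency is true:
$$(n - 1)\sum_{j=1}^{m} |r_j - z^*_j| \geq (n - 1)\sum_{j=1}^{m} |r_j - x^*_j|, \quad \text{that is,} \quad d(r, z^*) \geq d(r, x^*)$$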
From Theorem 2, if we add members which are closer to the real knowledge state than the collective knowledge of a collective, then the quality of the new collective (after adding those members) will be better. Conversely, the quality is worse in case of removing members which are closer to the real knowledge state than the collective knowledge of the collective. In addition, if we add to a collective a member which is equal to the collective knowledge of that collective, then the quality of collective knowledge is unchanged. Concretely, consider collectives X and Y as follows:
$$X = \{x_1, x_2, \ldots, x_n\}, \quad Y = \{x_1, x_2, \ldots, x_n, x^*\}$$
According to [35], x* and y* are identical. That is:
$$y^* = x^*$$
Thus, the quality of collective knowledge becomes neither better nor worse when the collective knowledge itself is added to the collective. In other words, in the objective case, this member is not helpful in improving the quality of collective knowledge. It is only useful in improving the consistency degree of a collective.
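The three situations discussed above (adding a close member, removing a close member, and adding the collective knowledge itself) can be illustrated numerically. The sketch assumes the Manhattan distance and the component-wise mean as the O2 representative; the collective, the real state and the helper names are hypothetical.

```python
def manhattan(x, y):
    """Manhattan distance between two equal-length tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mean_state(collective):
    """Component-wise mean, used here as the O2 collective knowledge."""
    n = len(collective)
    return tuple(sum(column) / n for column in zip(*collective))

def distance_to_real(collective, real_state):
    """d(r, x*): distance from the collective knowledge to the real state."""
    return manhattan(real_state, mean_state(collective))

r = (5.0, 5.0)                                   # hypothetical real state
X = [(5.0, 5.5), (2.0, 4.0), (11.0, 8.5)]        # hypothetical collective
x_star = mean_state(X)
base = distance_to_real(X, r)

y = (5.0, 6.0)                                   # closer to r than x*
after_adding = distance_to_real(X + [y], r)

z = X[0]                                         # closer to r than x*
after_removing = distance_to_real(X[1:], r)

after_adding_x_star = distance_to_real(X + [x_star], r)

print(base, after_adding, after_removing, after_adding_x_star)
```

In this small example adding y moves the collective knowledge towards r, removing z moves it away, and adding x* leaves it where it was.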
Conclusions and future works
In this paper, we have investigated the influence of the number of collective members on the quality of collective knowledge for the objective case. The experimental analysis shows that the number of collective members positively influences the quality of collective knowledge. However, the experiments also reveal that the difference between the real knowledge state and the collective knowledge of a collective seems to become insignificant when the number of collective members is large enough. In this case, a further increase of the number of collective members does not bring the collective knowledge of a collective noticeably closer to the real knowledge state. In addition, the following statements have been formally proved. The distance from the collective knowledge to the real knowledge state does not exceed the average of the distances from the knowledge states in a collective to the real knowledge state. The quality of collective knowledge is better in case of adding members which are closer to the real knowledge state than the collective knowledge of a collective. Conversely, the quality is worse in case of removing members of this kind from the collective.
For future work, we will investigate the influence of the number of collective members on the quality of collective knowledge by taking into account the inconsistency degree of a collective. Determining the number of collective members that is sufficient for solving common problems in the real world is also considered as future work. For these problems, paraconsistent logics could be useful [27].
