Abstract
Discovering an effective team of experts toward accomplishing the specific task in social networks has been considered in many real projects. The communication and collaboration among the members and the small cardinality of the team are in the opposite direction to success of the projects. In this paper, we show that the type of a similarity function is also impressive. Its importance is revealed on determining what similar or dissimilar experts should be selected or rejected in the process of the assignment. Considering the graph of underlying social network as conceptual social networking websites, we attribute the team formation problem as a vertices similarity environment based on their common neighbors regarding their co-authored papers. Also, the implicit similarities are used with respect to inattention of additional intermediates between any two nodes in the graph. In addition, taking inspiration from human–human interactions, using just the implicit vertex similarities propose a collaborative recommendation that is based on the team formation framework. They can also identify effectors in social networks established by the structural equivalence relation. Thus, they make the algorithm faster on searching for members of the team. Moreover, the proficiency similarity measures of authors are considered as their potential characteristics that measure their skillfulness level and real contribution corresponding to the required skill. The combination of similarity measures in the cost function causes the algorithm to search the more effective team specially in equal situations. The experimental results on DBLP co-authorship graph show the effectiveness of using the new similarity measures in the proposed method.
Introduction
A social network is a complex network presenting social interactions between people [38]. Recently, online social networking web sites such as Facebook, Twitter,1
MySpace, LinkedIn,2 Xing3 etc. have become so popular and applicable. Infact, these networks improve communications of people and society and let them to share their opinions, shopping and etc. all around the world. Social networks like many other networks show several interesting properties such as high network transitivity [39], power-law degree distributions [4], the existence of repeated local motifs [27] and groups structure [29]. Such highly informative environments can help experts to improve their collaborations. Therefore, finding qualified people and teams for a highly specific purpose can be achievable. In order to reduce costs, increase speed and improve quality and agility under the effects of changing business in social environments, organizations have flattened, and teams have emerged as a key organizational tool. Integrated product teams, task force teams, management teams, self-directed teams, multi-functional teams, leadership teams, quality improvement teams and venture start-up teams are just a few examples of how teams are used as an effective response to a range of business challenges. Clearly, teams have their problems. There have been enough studies on teams untill now to understand the critical phases that teams go through during their formations and developments. What is rarely considered as a cause of team failures is flaws in the initial configuration and structure of the team itself. Several researchers try to present a high-performance algorithm to find the desired team. Salas et al. define a team as a distinguishable set of two or more people who interact dynamically, interdependently and adaptively toward a common and valued goal/object/mission which was assigned by specific roles or functions to perform with a limited life span of membership [32]. Thus, teams are the special type of groups that are interdependent from each other and play specific roles for different team members. However, members of the team may be expert in different skills, but they can complement each other when they collaborate for the specific project according to their deadlines. Therefore, the measure of cooperation in teams is more significant than groups. Although members of the team maybe separate from each other, they can be selected from disconnected (variant) groups in the social graphs. The team’s purpose and the nature of the performed work are greatly influenced by a team’s configuration, formation and development. Likewise, in this paper, the appropriate team of experts is determined based on a given task as a set of skills.In the past few years, finding a team of experts has become one of the most important and popular subjects in the realm of social network analysis. It is necessary for many companies or departments to detect persons who have enough relevant experiences and expertise to accomplish a given specific task. Consequently, the success of the project is not only defined by the accomplishment of the task but also influenced by the communications and collaborations of team members that play significant role in the performance of the project. So, a project manager tries to evaluate and improve these relations. In addition, it is desirable to achieve a minimum possible cost in the process of accomplishing the project. Also, selecting two experts with respect to their previous collaborations and skills would greatly influence on the final team in real projects. Thus, considering the skill grading of selected individuals can affect the efficiency and quality of the project.
There are several investigations on detecting a team of experts among the works done recently by Lappas et al. [21]. The authors propose the cost function as a diameter of the selected sub-graph to estimate the communication costs of teams. They suggest the connection weights calculated by the Jaccard similarity measure. However, in present paper, we want to investigate other similarity measures for the final assignment. The proficiency and vertex similarity measures that indicate the basic concept of relationships and collaborations in the social network are considered. Suppose that, there is a set of candidates

Overal design of proposed cost function. Basic similarity measures and their mixed mode are used in this paper.
Figure 1 depicts the overall design of the new cost function in regard to the new similarity measures. As it is shown, the proposed cost function utilizes four factors in practical mode: first, the diameter distance that uses Jaccard or Cosine similarity measures based on the number of co-authored papers. Second, the implicit vertex similarity that contains the similarity measures based on the inattention of additional intermediate co-authorship concept. Third, the proficiency similarity that indicates the skillfulness level of each author according to the given task. The last one is the JaccardPearson skill grading which considers both the skill profile of the author and his contribution measure in the network. Moreover, combinations of these similarities are used in the cost function for searching more potential experts in a process of the algorithm. Here, Jaccard similarity, Cosine similarity, Topology Overlap similarity and the relation strength similarity are defined as vertex similarity measures. Furthermore, the contribution of skill (COS) and the significance of interests of skill (SOI) for the specific expert are determined as proficiency similarity measures. All of these similarity measures are charted in Fig. 2 and studied in details in Section 3.3.

Overal design of proposed cost function in details. The explicit similarity implies the direct connection of authors in regard to their co-authored papers. The implicit similarity is based on the virtual connection according to common neighbors. The proficiency similarity and JaccardPearson skill grading indicate the measure of author’s expertise and contributions.
The main contribution of the proposed method is the comparisons of the team performance upon different similarity functions and the combination of these functions. Basically, the Jaccard similarity function which is mentioned by Lappas et al. is used in this paper. In this function, experts are comprised based on their more quantitative common co-authored paper, and favorite team is formed by the lowest cost. It means that the comminucation cost is reduced when two members have more common co-authored paper. This similarity function may be sufficient for an investigation of the algorithm which is used by Lappas et al. [21]. However, this similarity function, Jaccard, is not adequate to introduce team members who are skilled and efficient specially in the equal conditions. Hence, we try to improve the team formation problem by using implicit similarity functions based on common neighbors. Also, the proficiency similarity function of each member is presented based on required skill. Actually, this team is not only connected based on the number of co-authored papers but also investigated by other prospectives such as level of skill and degree of co-neighbors.
Suppose that person A wants to choose one between person B and C as his next teammate. The one who has more co-authored papers is chosen to reduce the communication cost. In the previous algorithm, the next teammate is chosen randomly when they have similar communication cost. However, in this paper, the one who has shared parameters from other perspectives is selected. In fact, based on the social network structure, it can be stated that two individuals are more similar if they have more co-neighbors. Actually, there exist more individuals who are collaborating and trusting on them. Therefore, it seems that these two members has more commons and similarities in other skills in addition to their current skill. Also, they are capable for further collaborations. Hence, this factor is highly important. Moreover, in the next step, the personal specification of these members is considered. This character can be evaluated by the degree of favorite and/or contribution rank of each member on the required skill. Then, it can be possible to prefer some members rather than others according to this parameter.
It is noted that the proper position of these similarity functions is in the cost function. As the other example, suppose that person X is selected and skilled in our required task in the initial process. The algorithm should gradually improve to find proper team members as the teammates of X. So, person X should find his next skilled teammate (like person Y) through his connected neighbors and add these connected neighbors (i.e. path) to his related team. On the other hand, if these two members (person X and Y) are virtually connected, the cost of the final team will be decreased. By forming the virtual network in the initial step, the people are virtually linked based on the rank of their common neighbors. So, the algorithm will find team members faster and more economic in this new network since the main effectors are only inserted. It is noted that this advanced team is formed when the priority of diameter distance is assumed zero in the cost function, and just the implicit similarity for each pair of nodes are initially defined. In this method, first, the virtual network is constructed regards to the RSS measure of each two nodes in the graph. Then, the algorithm is implemented on this virtual graph and forms the final team. The similarity function of each skill is used for quality enlargement, and it does not interfere to its contributions. Although the members of this team may not directly connect to each other, the final team is expert and highly contributed. Then, we can use this team as a good framework for recommending of effectors to each other for collaboration in the future. These effectors are chosen according to their level of abilities, skills, contributions, shared papers and neighbors. Therefore, they are more reliable and can be recommended as proper co-workers. This network can be organized as an efficient social network in its next generation.
Figure 3(a) is given here to clarify our contribution comprehensively. We have an expert social graph where each node represents an expert, and the undirected weighted edges indicate the existence of communication between corresponding nodes. The
Regarding the proficiency similarity, we immediately conclude that
The other main contribution is depicted in Fig. 3(b). In this graph, implicit connections between nodes are assigned by assisting their intermediates. In the real scenario, when author
Here, several factors that influence on performance of the team are presented. The effectiveness of the team of experts is considered by various parameters and perspectives. Corresponding to our viewpoint, these factors are categorized as follows: first, the communication cost of the team should be minimized as much as possible. In fact, this function shows the similarity measure of each member. For example, Lappas et al. depicted that the communication cost is reduced when two people co-author more papers. Second, the size of the team must be reduced since it often has a positive effects on the expenses of a project. Third, connected teams should be preferred rather than disconnected ones. It means that the algorithm first finds the team which is connected to accomplish the required task, otherwise the disconnected one is chosen. Furthermore, the chosen members must be highly expert and skilled to support the required task. This factor can be the significance of interest and the quantity of contribution of task. Finally, the members who are highly communicated to other members are preferred. In fact, the members who have more collaborators are more trusty and effective in social networks. This parameter is depicted by more co-neighbors in our paper. The mentioned parameters of effectiveness are used in experiments and formulation of the problem, and these recommended methods show their performance in quantitative results.
The remainder of this paper is organized as five sections. The Section 2 gives brief reviews on the related works in the field of team formation and similarity measures. The Section 3 introduces the proposed framework as follows: first, the problem definition and preliminaries are defined, then suggested similarities including vertex, proficiency and skill grading similarity measures are mentioned. The experimental results are presented in the Section 4, and finally we conclude the paper in the Section 5.
The related works can be classified in two perspectives: (1) the previous studies on the team formation problem, and (2) the related works on proposed similarity measures which are used in this paper.
Team formation
The issue of selecting project team members has many usages in real projects. Particularly, it is more crucial for companies, R&D-oriented institutes and human resource management departments. For company’s activities, in-house staff members are organized in teams and departments. These organized teams perform as intended units and influence on the performance of the entire organization [26]. Team formation has great effects on creating organizational performance, so it becomes a very critical issue in the modern company environment. The R&D activity in institutes is typically run with the same approach as the project team management, and the main part of the organization is its related team [37]. However, the success of a project highly depends on human resources rather than the various resources required for the project. Like a project team, the task force team (TFT) organized for companies to carry out certain tasks is also characterized by its performance. They are highly depended on the organization of team members. Actually, being a functional team in companies, a project team in research organizations, or a TFT for a specific purpose, the team performance depends upon whether team members are well organized and collaborating or not. Wi et al. refer a team formation as the basis of a project team in a research organization. This viewpoint can be consistently applied to a functional team and TFT in companies [40].
A considerable number of works in the literature have been devoted to study the team formation problem. In these works, the authors study the team formation problem by transforming it into an integer programming. Simulated annealing [5], branch-and-cut [42] and genetic algorithm [40] are used to find an optimal match between individuals and requirements. Chen et al. use a psychological test to form a team by estimating the individual’s interpersonal relational attributes and their personalities [10]. The drive and temperament of individuals for achieving the fine quality of the team are evaluated by Fitzpatrick et al. [14]. However, Gaston et al. show the correlation between different graph structures and performance of a team, but they do not consider a computational problem of finding the team formation [15]. Cheatham et al. consider the structure of the social network by collecting the neighbors surrounding each skill in a social-concept [8]. All of these works have been done based on psychology, and they do not pay attention to the graph structure among individuals and the communication cost among them. The specific techniques required to estimate the strength of relationship among individuals are valuable researchs that have been pursued in other research works [23,41]. However, these directions of researches are used for general purpose and do not consider the team formation problem and its’ constrains. Backstrom et al. provide useful information about different aspects of the team formation in social networks [3]. In Ref. [2], the authors study a version of the team formation problem that the goal is to balance the load of individuals (i.e. the number of projects one is assigned to). However, this version of the problem overlooks the social collaboration and cost altogether.
The team formation problem in the presence of a social network of individuals is first addressed by Lappas et al. by considering the communication cost function [21]. They also prove that the problem of finding this kind of teams is NP-hard. In their work, they propose the RarestFirst and EnhancedSteiner algorithm to solve the team formation problem based on diameter and minimum spanning tree (MST) respectively. In RarestFirst approach which we use here as the basic algorithm, it first estimates each required skill’s supporter. Then, the skill with rarest sponsors is determined. Afterward, for each of its candidates, a sub-graph is defined by investigating the closest connected individuals in other support sets. In the last step, it selects the sub-graph with the minimum communication cost by using the diameter metric. The problem of finding a team of experts with a leader is introduced in Ref. [17] by Kargar et al. They also determine the modified cost function as the sum of distances of the expert team. The sum of the shortest distances between experts is responsible for each pair of skills. C. Li et al. in Ref. [24] propose a grouping graph method that condenses the expert information to a grouped graph according to the required skills. They extend the EnhancedSteiner algorithm of Lappas et al. to its generalized version for generalized tasks. Also, they offer a Generalized EnhancedSteiner algorithm for generalized tasks by associating each required skill with a specified number of experts.
Proposed similarity measures
Matching the proficiency level of individuals are considered in many communities. These studies are on individuals who highly satisfy the task requirements. Thus, the expert level is based on the number of evaluated residual skill sets which are or are not satisfied by related individuals. To formulate these applications, Zakarian et al. [42] and Boon et al. [6] aim to handle the problem with a fuzzy approach. Zakarian et al. in Ref. [42] propose an analytical model for selecting the multi functional teams that prioritize ‘team members’ based on task requirements. Boon et al. in Ref. [6] enhance the quality of the team by combining the qualities of candidates with functional requirements in order to assign the right individuals to right teams. In another study, to measure the suitability of the individual’s skill set for performing a task, Korwin et al. [20] address an algorithm that considers skills of candidates and utilizes fuzzy set theory to evaluate compatibilities of skill sets. In Ref. [11], forming a qualified and attributed team is presented where each attributed team satisfies at least the minimum requirements and required skills for a specific task. In this case of study, a fuzzy mathematical programming is modeled, and a solution algorithm based on the simulated annealing method is proposed for building an effective and qualified attributed team. Liu et al. [28] measure the similarity between users based on the correlation between their rankings of items rather than the rating values. Moreover, they propose a new collaborative filtering algorithm for ranking items based on the preferences of similar users. Candillier et al. in Ref. [7] give guidelines and discuss about combining traditional similarity measures to design a new weighted similarity criterion in the field of recommended systems.
There are several areas like social network analysis [22], information retrieval in World Wide Web [19] and collaborating filtering [34] that use vertex similarity measures to determine the similarity of two vertices based on the structure of the network. Some measures compute the number of common neighbors. Jaccard similarity [35], Cosine similarity [33] and Topology Overlap similarity [30] measures are usually used in many cases. These similarities indicate that two vertices have more similarities if they have more common neighbors. Although they consider only local information, they are computationally efficient.
Furthermore, the global similarity measures can be used for the vertex similarity measures. Although they look for the global network structure and consider extra appearance of the network, their computational complexity is prohibitive. Additionally, small changes in the network such as adding or deleting one node would affect the global structure based on similarities of whole network. So, using global structures is not feasible for large scale dynamic networks. Therefore, Chen et al. in Ref. [9] introduce the relation strength similarity function which is the same as global structures in large degree and local measures based similarities in low degree. The relation strength similarity shows the similarity of two vertices based on their common neighbors, and the separation degree marks the scale of separation between two vertices. Thus, by balancing the degree of separation, the compromise between local and global information search is achieved. The details of relation strength similarity measure are studied in Section 3.3.
Here, we use a skill grading method for achieving the effective team based on given weighted social network and communication cost. To achieve a specific weighted similarity measure, a formula is proposed by combining Pearson and Jaccard measures to benefit their complementarity. Pearson correlation corresponds to the Cosine deviation from the mean and considers only common attributes. On the contrary, Jaccard similarity measures the overlap that two individuals share with their skills. However, it does not take into account the different uses of rating scale by various individuals. So, the combination of these two traditional measures with the simple production is useful to benefit from their advantages.
The proposed framework
In this section, first, the problem definition of our work and preliminaries are completely reviewed. Consequently, our framework is presented.
Problem statement
As being humans, it is necessary for all of us to make social interactions with each other. None of us like to be alone, and we often have incentives to establish collaboration with everyone that has a common behavior, ethics, social interactions, and specially skill profile with us. This is the reason that we belong to the teams whose members are very similar to us in both real and cyberspace lives. This motivates us to consider the team formation problem in social networks as a play of co-authorship interactions between their constituents, i.e. people.
We indicate similarity measures on our work because they provide rigorous mathematical models of strategic interactions between experts and intelligent authors. Indeed, the fine similarity measure is a good tool to capture both the proficient level of expert’s skill profiles and the strategic interactions among them. It shows the influence of our contribution on the expert social behavior. According to this fact, the team formation problem is formally defined as follows: Given the social network graph
Preliminaries
Terminologies used in this paper
Terminologies used in this paper
The social network is modeled as an undirected and weighted graph
Furthermore, it is assumed that we have a set of m skills,
Framework foundation
In this section, we formally define our proposed method in details. Finding a team of experts while minimizing the presented communication cost function is proved to be an NP-hard problem in Ref. [21]. So, we cannot hope to compute optimum solutions for our team formation problem. In this investigation, we try to achieve the better team than previous studies with auxiliary parameters. The presented approximated Generalized Diameter algorithm based on diameter distance is proposed the same as the RarestFirst algorithm [21]. It addresses the team formation problem for basic tasks to compute closer optimum teams due to the proposed cost models. In this section, we turn our attention from an improved greedy and incremental method to the modified version of generalized tasks by presenting the Generalized Diameter algorithm in Algorithm 1. In general, the inputs of the algorithm are a graph
Identical to previous paper [13], the algorithm works as follows: it first computes the support set

The Generalized Diameter algorithm
Among all of the candidates from the set
The simple weight function can be calculated by Jaccard or Cosine similarities which is defined by explicit vertex similarity measures:
It is noted that there are several studies which show the Cosine similarity generally performs better than Jaccard similarity in most practical situations [16].
We assume that all of the shortest paths between each pair of nodes, proficiency and vertex similarity measures and all author’s skill grading have been pre-calculated and stored in a hash table. Then, for the graph
The vertex similarity measures describe how much two authors are similar regarding their previous co-authorship relations. In addition, their motivations to collaborate in the future are discussed based on the structure of social networks. At first, two well-known similarity measures are defined as Jaccard and Cosine similarity functions. In order to discriminate the last version of the Jaccard and Cosine similarity, we call them VicinityJaccard and VicinityCosine here:
Topology Overlap similarity also uses neighborhood information and presented as follow:
Although global similarity measures consider the whole network, these methods are not feasible for large scale social networks since they have high time complexity. However, Chen et al. [9] suggested a relation strength similarity function (RSS). The RSS of two experts is proportional to the number of their co-authored papers and stated as follow:
The relation strength similarity for two non-adjacent vertices shows that if two authors (
Suppose that there exists a simple path
For M distinct paths

Example of connected graph: (a) Explicit connected graph. (b) Implicit connected graph.
The proficiency similarity mapping table corresponding to Fig. 3
Although RSS is a good option for using in the cost function as it can consider the whole network, its complexity of computing the similarity from one vertex to all other vertices is
In this section, solutions for scaling the skill abilities of individuals are discussed. The proficiency similarity measures applied in this paper are the significance of interests of skill (SOI) and the contribution of skill (COS) which are defined as follows:
Table 2 represents the sample of skill mapping which is depicted by Fig. 3. Regarding the searching process of the best author to support required skill
JaccardPearson skill grading
We believe that the skill grading model for experts in a social network should not only rely on their skill profiles. Indeed, the main idea of our method is to present a formulation which considers both the expert skill information and his abilities of team working in a collaborative network. We focus on two well-known traditional similarity metrics (Pearson correlation and Jaccard similarity) that have been attracted by extensive researchs on Recommender Systems community (see [7,12,18,25,31]). Pearson correlation corresponds to the Cosine deviation of the mean and considers only the attributes in the common. On the other hand, Jaccard similarity measures the overlap that every two individuals share with their attributes. However, it does not take into account the different usages of rating scale by different users. In our proposed JaccardPearson skill grading, these two traditional measures are combined with simple multiple function to use the benefit of their advantages. This combined measure comprises between 0 and 1. In this adapted formulization, for determining the skillfulness level of each author
It is noted that these formulations, including vertices proficiency similarities, vertex similarities and the above skill grading, are implemented offline as a preprocessing phase in our settings.
Cost function
In the present study, the diameter is investigated as a criterion to measure the effectiveness of the selected team. The diameter definition refers to the largest shortest path between any pair of nodes in the graph. By considering the Dijkstra’s algorithm as a method to compute the distance among nodes, the diameter computation only relies on the physical distance. Since effectiveness of the team is a relatively general term, we could generalize its meaning by redefining the magnitude of the diameter as an instantiation of the communication cost. Unlike Lappas et al., we tend to optimize the diameter as a measure of both distance and skillfulness level of team members regards to their ability of contribution. Hence, according to this modification, a new definition of communication cost is presented. The cost of a path which is followed by individual
Where the parameter
Based on the related surveys on the deficiencies of the Pearson correlation [1,36], we trust on SOI and COS more than the skill grading of JaccardPearson similarity measure. Also, corresponding example mentioned afterwards in Section 4.2 verifies this shortage. So, if we want to use the proficiency similarity in the combined cost function, the JaccardPearson skill grading is not appropriate to integrate with other implicit vertex similarities. Despite of its deficiencies, this similarity could be used separately in the aforementioned cost function due to consider both skillfulness level of expert and its ability of contribution. Moreover, JaccardPearson skill grading somehow states the combined concept of our vertex and proficiency similarities. In addition, the algorithm can use the cost function to compare experts based on their minimum diameter distance, implicit vertex similarity or proficiency similarity functions. Then, the algorithm chooses one path to go ahead on searching process and finally gives us the appropriate team.
So, we define new cost function as follows:
Above parameters are balanced by the project manager. If he just evaluates based on diameter distance, γ should be one, and β and α are zero. On the other side, if he just evaluates upon quantity of co-neighbors, α should be one, and β and γ are defined zero. Finally, if these three factors are simultaneously evaluated, the compromising must be done. The balanced γ is set to 0.5 and
Experimental evaluation
Data set
Finding datasets that reflect the application scenarios of the team formation is the main challenge of current study. In fact, such systems are not widely used, and most data from them is not publicly available. For our experiments, we use the DBLP dataset as a benchmark dataset which is publicly available from the DBLP portal. The snapshot of this dataset was taken on April 12, 2006, while the data is related to papers which are published in areas of Database (DB), Data mining (DM), Artificial intelligence (AI), and Theory (T) conferences. They are used in order to cover the diverse fields of study (including 19 venues as follows: {SIGMODE, VLDB, ICDE, ICDT, EDBT, PODS, WWW, KDD, SDM, PKDD, ICDM, ICML, ECML, COLT, UAI, SODA, FOCS, STOC and STACS}).
The expert social network is constructed by using co-authorship graph in the following procedure. First, to collect the experts, the authors who have less than three papers in DBLP are discarded. This initial purification on dataset removes many authors who are not enough qualified for the selection. Moreover, these authors severely increase cost and cardinality of the team and negatively influence on quality of selections. Then, the skill set
The graph distance between two nodes in the graph
The preliminaries for performing our method are as follows. Each task
In this section, our results are compared with the well-known algorithm, RarestFirst (Lappas et al.). The influences of proposed similarity measures and their combinations on the cost function to search for suitable teammates are also analyzed. The comparisons are done with respect to the communication cost, team cardinality, the number of disconnected teams and the time complexity under consideration of different similarity measures.

Average communication cost, cardinality and number of disconnected team reported by RarestFirst algorithms: (a) Average communication cost. (b) Average cardinality. (c) Average number of disconnected teams. (Colors are visible in the online version of the article;

Average communication cost and cardinality of teams reported by Generalized Diameter algorithms: (a) Average communication cost. (b) Average cardinality. (Colors are visible in the online version of the article;
The average communication cost, team cardinality and the number of disconnected teams of the Diameter algorithm based on different similarity measures for each specified task are depicted in Figs 4 and 5. Figure 4 shows the efficiencies of similarity measures by RarestFirst Diameter algorithm, and Fig. 5 illustrates them with Generalized Diameter algorithm. The following results are achieved from the analysis of these figures. As we can see in Fig. 4(a) and Fig. 5(a), by increasing the number of required skills, the communication costs of the algorithms grow considerably. It is because the search space which is needed to be explored will be expanded in this case. In the other words, the probability of the existence of experts who are capable to do the required skills decreases due to the sparsity of the underlying graph. Therefore, the size of the team is clearly growing by increasing the number of required skills in Fig. 4(b) and Fig. 5(b).
In the case of similarity measures, Fig. 4(a) and Fig. 5(a) show that the Cosine similarity is less dominated than Jaccard similarity which Lappas et al. used in their cost function. Besides, the VicinityJaccard, VicinityCosine and TopologyOverlap similarity measures present relatively equal results. However, VicinityCosine is somehow better than VicinityJaccard similarity. This result is reasonable in many studies (like [16]) and hence, VicinityCosine is more reliable than VicinityJaccard in the practice. Regarding the relation strength similarity, the outcome of RSSmax is relatively identical to Jaccard similarity, and RSSsigma is partly the same as other three implicit vertex similarities. It is due to the fact that RSSmax selects the largest path and benefits only from one path of the expert’s contribution. While RSSsigma uses the advantages of collaboration abilities of all neighbors, it determines much more effective team than RSSmax. Hence, like Chen et al. [9], the sigma version of RSS is recommended for using relation strength similarity.
The observations from the analysis of Fig. 4(b) and Fig. 5(b) show the comparisons of teams cardinality measures. The algorithm with different similarity measures has been developed regarding the growth of required skills. However, the proposed similarity measures such as VicinityJaccard, VicinityCosine, TopologyOverlap and RSSsigma outperform those other similarities. Hence, they can control the size of the team much better.
According to the implicit vertex similarity measures like RSS, it should be considered that when a degree of separation is increased, the results can be much more effective. Furthermore, the presented framework is general and could be easily extended to be used in higher degrees. Actually, the final selected team would be more efficient by ascending the degree of separation. In this case, the teammate selection is directly formed by the author in the network, and more effectively indirect collaborators are founded. Hence, the algorithm searches for effectors in the whole picture of the social graph. Also, it may be developed in the next snapshot because of introducing effectors for co-authoring in the next generation. So, the communication cost and size of the selected team will become small as much as possible. Although this concept means that a current expert can explore more potential collaborations by degree
Our final evaluation is about the number of disconnected teams. In the real world projects, the employees who are in the same department can communicate and collaborate more easily than other employees who are in the outside. Therefore, it is so importance to detect connected teams to minimize the communication cost. As it is depicted in Fig. 4(c), the Diameter algorithm by various similarity measures like Jaccard similarity first try to find the connected teams. Then, if these teams are not available, they determine the disconnected ones.
As we can see in Fig. 5, compared to RarestFirst algorithm, Generalized Diameter algorithm generally achieves a dramatic decrease in the size of selected team and communication cost. As a matter of fact, in the Generalized Diameter algorithm, for covering each skill, the organized team is checked whether the visited individuals satisfy other required skills or not. This improvement leads to a remarkable reduction in the size of the team and its communication cost. Since the size of selected team could play crucial role in the expense of the project, it has been included and compared with our experimental evaluations. Furthermore, the number of disconnected teams is the same as the RarestFirst algorithm. Hence, the experimental results based on the number of disconnected teams are not depicted again.

Average communication cost and cardinality of teams reported by Generalized Diameter algorithms with combination of Cosine for diameter cost, RSS for implicit vertext and SOI, COS for proficiency similarity measure by considering
According to combination of similarities, Fig. 6 shows that using the proposed similarities and their combinations are satisfied in the cost function. In this experiment, the Cosine similarity is considered for a diameter cost between authors due to approximately better results than Jaccard similarity measure shown in Fig. 4 and Fig. 5. Moreover, RSS is appointed as a representative of the implicit vertex similarity because it can be improved by increasing the degree of separation (hole size) in practice. Hence, we believe that this implicit vertex similarity participating in the comparing process is more appropriate.
As it is mentioned in Section 3.3.5, for combined scenario, the constants are defined as

Network of connections between individuals in
Although the JaccardPearson skill grading helps the algorithm defining the effective team, the approach has many deficiencies. There are many studies indicating that the Pearson correlation is problematic [1,36]. The following example proves this claim. Suppose we have the weighted graph shown in Fig. 7: In Fig. 7(a), with respect to author

The efficiency evaluation of using different similarity measures on the Generalized Diameter algorithm for the supposed graph. (Colors are visible in the online version of the article;
Figure 8 indicates the efficient evaluation of various similarity measures which are used in the Generalized Diameter algorithm on DBLP dataset. We find the followings: first, as seen in Fig. 8, the results show that all of the algorithms associated with RSS similarity measure outperform the other algorithms. The average running time of Diameter algorithm by using RSS, VicinityJaccard and VicinityCosine similarities have gradually increased by rising the number of required skills. The RSS method which provides effectiveness and scalability of the results reduces the search space into a well-condensed form of main effectors. Also, the Diameter algorithm using RSS provides a significant improvement in term of efficiency compared with explicit vertex similarity. The main reason is that when we use approaches like implicit vertex similarities, the intermediates are released in a searching process. In addition, if two authors need to join to each other for making the other searching generation, they can do it at first. Furthermore, these implicit vertex similarities can be used to find more effective team of experts in social graph. They are chosen based on their expertise profiles and their effective contributions in spite of their direct co-authors collaborators. Thus, these implicit co-authors can be suggested to each other to achieve more effective teams in the future.
In summary, it can be seen that more appropriate team of experts can be achieved by using our Generalized Diameter based on proposed similarity measures. The analysis of the experimental results imply that our proposed framework is a selective method by offering promising performance on communication cost and team cardinality of the selected team for the required given task.
In the present paper, the problem of finding the effective team of experts which can accomplish a specific given task with minimum communication cost is studied. The proposed framework is formed under different similarity measures. We attribute the team formation problem as a vertices similarity environment based on their common neighbors. Actually, they regard their co-authored paper and their implicit similarities with respect to releasing their intermediates. Moreover, the implicit vertex similarities propose a collaborative recommendation that is based on the team formation framework. They can also identify effectors in social networks established by the structural equivalence relation. Thus, they make the algorithm faster on searching for members of the team. The proficiency similarities are also studied to be involved in the searching process. This measure analyzes the potential expertise of authors according to the measure of interest and contribution of the required task in the social scenario. Although the combination of similarity measures in the cost function causes the algorithm to have more complexity, they help the algorithm to search for the more effective team specially in equal situations. While the implicit vertex similarities are used, applying the maximum degree of separation to search for the suitable collaborators is not feasible due to the very large strategy space. In fact, the influence of using the implicit vertext similarity rather than diameter measure is important for us. Hence, just a minimum degree of separation is applied in presented experiments to show its effect. However, the framework could be easily extended to be used in higher degrees. In addition, it is obvious that the final selected team would be more efficient by rising the degree of separation.
The experimental results on DBLP show that the effective teams could be found with the minimum communication cost, cardinality and number of disconnected teams. Also, the time complexity of using different similarity measures is studied for efficiency estimation of the approach.
For the future works, more constraint teams can be considered. Furthermore, the generalized tasks can be studied and defined with the required skills which should be supported by the minimum number of experts.
Footnotes
Acknowledgement
This work is supported by Iranian Telecommunication Research Center (ITRC) under Grant No. T/500/13226.
