Bayesian inference of financial networks

Abstract

Network data arise naturally in a wide variety of applications in different fields. In this article we discuss in detail the statistical modeling of financial networks. The structure of such networks has not been studied thoroughly in the past, mainly due to limited accessible data. We explore the structure of a real trading network corresponding to transactions within the natural gas futures market over a four-year period. The detection of meaningful communities of actors within networks is particularly relevant to understanding the topology of a complex system like this. We explore the use of stochastic block models in conjunction with a nonparametric Bayesian approach in order to identify clusters of traders in a flexible modeling framework. Our findings strongly indicate that the proposed models are highly reliable at detecting community structures.

Keywords

Bayesian inference; community structure; stochastic block models; trading networks

1. Introduction

Due to substantial recent computational advances as well as vast data resources, interest in developing methods to analyze data associated with network structures has increased dramatically in diverse research areas during the last two decades. In this regard, Newman (2003) provides a general classification of network data into four groups: information networks, technological networks, biological networks, and social networks. A popular example of an information network is the World Wide Web, whose pages cite one another through hyperlinks, while the Internet and telephone systems are examples of technological networks. A variety of biological systems can be represented through a network structure, such as food webs and neural networks. Finally, social networks represent relationships between individuals, also known as actors in this context. Examples of social networks include networks of friendships in a community and financial exchanges between traders.

Statistical methods for network analysis are mainly oriented toward exploring local and global features, in order to explain the structure of the network as well as the processes that created it. Basic analysis tools have been developed to describe properties such as the centrality and connectivity of the network elements, but more sophisticated analyses allow us to estimate unobserved structural features and make inferences about the network properties.

1.1 Basics

A network is defined as a collection of inter-connected entities. Such elements are typically called nodes or vertices. The underlying structure of the network is given by the so-called links or edges between pairs of vertices. Networks can be regarded as weighted networks when each edge is associated with a particular number known as weight, as opposed to binary networks, in which only the presence or absence of an edge between vertices is relevant. For example, weights might represent costs of transactions between companies, distances between locations, or the strength of the relationship between individuals.

An edge between a pair of vertices can be directed, meaning that knowing which element initiated the connection is relevant for the structure of the network. In this case, the network is said to be a directed network, and it is called undirected network otherwise. A classical example of a directed network is the network of citations between academic papers, where there is an intrinsic edge directionality due to the chronological nature of the system. Examples of undirected networks include the network of sexual contacts within a community and the network of diplomatic exchange between countries.

For the analysis of networks, the associated graph is commonly represented using an adjacency matrix, also known as a sociomatrix. In the case of a binary network with $n$ vertices, the adjacency matrix is an $n\times n$ binary matrix $\mathbf{Y}=[y_{i,j}]$, where $y_{i,j}=1$ or $0$ depending on whether vertices $i$ and $j$ are connected or not. For weighted networks the entries of $\mathbf{Y}$ are the corresponding weights. The adjacency matrix of an undirected network is necessarily symmetric. Also, the diagonal entries are considered structural zeros, i.e., edges connecting a vertex to itself are not allowed. This work is mainly concerned with binary social networks, but most of the approaches discussed here can be extended to other types of relational data.

1.2 Descriptive methods

The properties of a network and its elements are useful for exploring the overall structure of the entire system. Some characteristics and measures of interest are the following:

•
Degree: Number of edges connected to a given vertex. In directed networks, the degree can be defined in terms of in-coming edges (in-degree) and out-going edges (out-degree). For a network with $n$ vertices, the vertex degree can take values from $0$ (isolated vertex) to $n-1$ (fully connected vertex). Thus, the degree of a vertex can be considered as a measure of centrality of the vertex in the network.
•
Betweenness: Vertex centrality measure typically defined as

$\displaystyle B_{v}=\sum_{i\neq j\neq v}\frac{\mathfrak{g}(i,j\mid v)}{\mathfrak{g}(i,j)}\,,$

where $\mathfrak{g}(i,j\mid v)$ is the number of shortest paths connecting vertices $i$ and $j$ that go through vertex $v$, and $\mathfrak{g}(i,j)$ is the total number of shortest paths between vertices $i$ and $j$ (regardless of whether they go through vertex $v$ or not).
•
Clustering coefficient: Network transitivity measure defined as the propensity with which two neighbors of the same vertex will be neighbors too. It is typically calculated as

$\displaystyle C=3\times\frac{\text{number of triangles in the network}}{\text{number of connected triples of vertices}},$

where a triangle is a set of three vertices connected to each other by three edges, and a connected triple is a sub-graph of three vertices connected by two edges.
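The three measures above can be computed directly from the adjacency matrix. The following is a minimal sketch for an undirected binary network using NumPy and the standard library; all function names are hypothetical. Shortest paths are counted by breadth-first search, the betweenness sum runs over ordered pairs $(i,j)$ (so each unordered pair contributes twice), and the clustering coefficient uses the fact that $\operatorname{tr}(\mathbf{Y}^{3})$ equals six times the number of triangles.

```python
from collections import deque
import numpy as np

def degrees(Y):
    """Vertex degrees of an undirected binary network (row sums of Y)."""
    return np.asarray(Y).sum(axis=1)

def clustering_coefficient(Y):
    """C = 3 x triangles / connected triples, for symmetric 0/1 Y with zero diagonal."""
    A = np.asarray(Y, dtype=float)
    d = A.sum(axis=1)
    closed = np.trace(A @ A @ A)          # equals 6 x (number of triangles)
    triples = np.sum(d * (d - 1)) / 2.0   # connected triples centred at each vertex
    return 0.0 if triples == 0 else (closed / 2.0) / triples

def _bfs(Y, s):
    """Shortest-path distances and numbers of shortest paths from source s."""
    n = len(Y)
    dist, sigma = [-1] * n, [0] * n
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in np.flatnonzero(Y[u]):
            if dist[w] < 0:               # first time w is reached
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:    # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(Y):
    """B_v = sum over ordered pairs i != j != v of g(i,j|v) / g(i,j)."""
    Y = np.asarray(Y)
    n = len(Y)
    paths = [_bfs(Y, s) for s in range(n)]
    B = np.zeros(n)
    for v in range(n):
        dv, sv = paths[v]
        for i in range(n):
            di, si = paths[i]
            for j in range(n):
                if len({i, j, v}) < 3 or di[j] < 0:
                    continue              # skip repeated indices and unreachable pairs
                if di[v] > 0 and dv[j] > 0 and di[v] + dv[j] == di[j]:
                    B[v] += si[v] * sv[j] / si[j]   # g(i,j|v) = paths i->v times paths v->j
    return B

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(degrees(path))                 # [1 2 1]
print(betweenness(path))             # [0. 2. 0.]
print(clustering_coefficient(path))  # 0.0
```

For large networks, Brandes' algorithm avoids the triple loop over $(v,i,j)$; the direct version is kept here only because it mirrors the formula term by term.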

Other relevant properties of a network are resilience, community structure, and mixing patterns. See Kolaczyk and Csárdi (2020) for a practical discussion of these and other network-related topics. Centrality measures reflect the importance of a vertex in the topology of the network, which allows us to identify those vertices whose removal causes structural changes in the full system and therefore affects the network resilience. In addition, it is often important to identify clusters of vertices with a large number of edges between them and low connectivity with vertices belonging to other groups. This behavior, which reflects a community structure, is commonly observed in social networks and is natural in this type of data. More generally, it is common to observe selective linking between actors, leading to mixing patterns in the network. For example, assortative mixing, or homophily, occurs when actors with similar attributes are more likely to relate to one another. Assortativity is a possible source of community structure as well as of transitivity in the network. However, a network can have assortative mixing patterns and no community structure.

1.3 Scope

This work is particularly interested in methods for community detection in networks. Hierarchical clustering has traditionally been used to find communities by assigning linkage weights to pairs of vertices according to a similarity measure. In particular, two vertices can be considered similar when they are connected to the same vertices. This similarity criterion is known as structural equivalence (a pattern in which nodes can be divided into groups such that members of the same group have similar patterns of relationships) and, for binary networks, it is usually measured by means of the Hamming (or edit) distance. This structural equivalence approach to identifying community structure was extended in the stochastic block modeling literature, first introduced by White (1976). More measures and properties of general networks are discussed in Newman (2003), and specific analyses in social network applications are developed at length in Wasserman and Faust (1994). See also Nowicki and Snijders (2001) for foundational aspects of the stochastic block modeling approach.
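As an illustration of this similarity criterion, the hypothetical helper below (a sketch, not the procedure used in the original analysis) computes, for every pair of vertices $i$ and $j$ of a directed binary sociomatrix, the number of third vertices $k$ whose ties with $i$ and with $j$ disagree, in both the sending and the receiving direction. Structurally equivalent vertices end up at distance zero, and the resulting matrix can be passed to any hierarchical clustering routine.

```python
import numpy as np

def hamming_distances(Y):
    """Hamming distance between the tie profiles of every pair of vertices.

    D[i, j] counts the vertices k not in {i, j} with y_ik != y_jk (out-ties)
    plus those with y_ki != y_kj (in-ties).
    """
    Y = np.asarray(Y, dtype=int)
    n = Y.shape[0]
    D = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            keep = np.ones(n, dtype=bool)
            keep[[i, j]] = False          # ignore entries involving i and j themselves
            out_diff = np.sum(Y[i, keep] != Y[j, keep])
            in_diff = np.sum(Y[keep, i] != Y[keep, j])
            D[i, j] = D[j, i] = out_diff + in_diff
    return D

# A star: vertex 0 is tied to vertices 1, 2 and 3 in both directions.
star = np.zeros((4, 4), dtype=int)
star[0, 1:] = 1
star[1:, 0] = 1
print(hamming_distances(star))
```

In this example the three leaves are at distance 0 from each other (they are structurally equivalent), while each leaf is at distance 4 from the centre.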

2. Network modeling

Descriptive analysis techniques such as the ones discussed above allow us to explore network features that, as the complexity of the network increases, can be difficult to associate with a particular global process. However, it is also of interest to estimate, and make inferences about, unobserved structural features of the network. Hence, statistical modeling strategies for network analysis become relevant.

The simplest model for networks is the class of random graphs proposed by Erdős and Rényi (1960). In this model, $\mathcal{G}_{n,p}$ denotes the set of all possible graphs with $n$ vertices in which an edge between any pair of nodes is included independently with probability $p$. Under these conditions, the probability distribution of the sociomatrix is given by

$\displaystyle p(\mathbf{Y}\mid p)=p^{s}(1-p)^{n(n-1)-s},$

where $s=\sum_{i\neq j}y_{i,j}$ for directed networks. Thus, the degree distribution of any vertex has the form

$\displaystyle p(d\mid p)=\binom{n-1}{d}p^{d}(1-p)^{n-1-d}\approx\frac{e^{-\lambda}\lambda^{d}}{d!},$

where the approximation is obtained by assuming a constant mean degree $\lambda=(n-1)p$ for large $n$ . Subsequently, generalized random graph models arise as an extension of random graph models. Popular examples include those models whose degree distribution follows a power-law distribution of the form

$\displaystyle p(d\mid\alpha)=\frac{d^{-\alpha}}{\zeta(\alpha)},$

where $\zeta(\alpha)=\sum_{d=1}^{\infty}d^{-\alpha}$ is the Riemann zeta function, which serves as the normalizing constant for $d=1,2,\ldots$ and $\alpha>1$. Even though such a distribution exhibits a realistic behavior often seen in real-world networks, this model assumes independence between neighbors of vertices, and therefore other structural network features, such as the network transitivity, fail to be captured under this approach.
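Returning to the Erdős-Rényi model, a quick simulation illustrates the Poisson approximation of its degree distribution. The sketch below draws one directed graph from $\mathcal{G}_{n,p}$, computes the maximum likelihood estimate $\hat{p}=s/[n(n-1)]$ implied by the sociomatrix distribution, and compares the empirical out-degree frequencies with the Binomial$(n-1,p)$ and Poisson$(\lambda)$ probabilities. The values of $n$, $p$ and the random seed are arbitrary choices made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(2024)
n, p = 2000, 0.002
Y = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(Y, 0)                     # structural zeros: no self-loops

s = Y.sum()
p_hat = s / (n * (n - 1))                  # MLE of p under p(Y | p) = p^s (1-p)^(n(n-1)-s)
out_deg = Y.sum(axis=1)
lam = (n - 1) * p                          # constant mean degree

for d in range(6):
    empirical = np.mean(out_deg == d)
    binom = math.comb(n - 1, d) * p**d * (1 - p) ** (n - 1 - d)
    poisson = math.exp(-lam) * lam**d / math.factorial(d)
    print(f"d={d}: empirical={empirical:.3f}  binomial={binom:.3f}  poisson={poisson:.3f}")
print(f"p_hat={p_hat:.5f}")
```

With $n$ large and $\lambda$ fixed, the Binomial and Poisson columns are nearly identical, which is the approximation used in the text.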

In order to tackle the transitivity issue described above and build more realistic models for social networks, Frank and Strauss (1986) proposed the introduction of Markov dependence in the context of exponential random graph models (also known as $p^{*}$ models). The general exponential random graph model uses local features to explain the global structure of the network. The model assigns a joint probability distribution to the elements of $\mathbf{Y}$ based on simple summary statistics (Robins et al., 2007). More specifically, the model is given by

$\displaystyle p(\mathbf{Y}\mid\theta_{1},\ldots,\theta_{K})\propto\exp{\left\{\sum_{k=1}^{K}\theta_{k}S_{k}\right\}},$

where $S_{k}$ is a network statistic (such as the number of links or the number of triangles), and $\theta_{k}$ is the corresponding parameter. These models are computationally challenging because the normalizing constant of the likelihood (which involves a sum over a finite but very large number of terms) depends on the parameters $\theta_{1},\ldots,\theta_{K}$, which restricts the use of traditional likelihood methods. Moreover, the main drawback of this class of models is that they may produce poorly fitting models for real data, as they are weak at capturing local (as opposed to global) network features (Snijders, 2002).
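To make the normalizing-constant problem concrete, the sketch below enumerates every undirected graph on $n=4$ vertices ($2^{6}=64$ graphs) and evaluates the model with two statistics, the number of edges and the number of triangles. This brute-force sum is feasible only for tiny $n$, because the number of terms grows as $2^{n(n-1)/2}$; it is an illustration, not an estimation method. A built-in check: when the triangle parameter is zero, the model reduces to the Erdős-Rényi model and $\log Z=\binom{n}{2}\log(1+e^{\theta_{1}})$.

```python
import itertools
import math

def ergm_log_normalizer(theta_edges, theta_triangles, n=4):
    """log Z(theta) for p(Y) proportional to exp(theta_1 * edges + theta_2 * triangles)."""
    pairs = list(itertools.combinations(range(n), 2))
    triads = list(itertools.combinations(range(n), 3))
    log_terms = []
    for bits in itertools.product([0, 1], repeat=len(pairs)):   # every possible graph
        edges = {pair for pair, b in zip(pairs, bits) if b}
        n_triangles = sum(
            1 for i, j, k in triads
            if (i, j) in edges and (i, k) in edges and (j, k) in edges
        )
        log_terms.append(theta_edges * len(edges) + theta_triangles * n_triangles)
    m = max(log_terms)                      # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

print(ergm_log_normalizer(0.0, 0.0))        # log(64): all 64 graphs equally likely
print(ergm_log_normalizer(-1.0, 0.5))
```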

Alternatively, Holland and Leinhardt (1981) proposed the class of $p_{1}$ models for directed networks. In the binary case, the likelihood in exponential family form using the conditional independence of the entries of $\mathbf{Y}$ is given by

$\displaystyle p(\mathbf{Y}\mid\bm{\eta})=\exp{\left\{\sum_{i\neq j}(\eta_{i,j}y_{i,j}-\log(1+e^{\eta_{i,j}}))\right\}},$ (1)

where the natural parameters in $\bm{\eta}=[\eta_{i,j}]$ are such that

$\displaystyle\eta_{i,j}=\text{logit}(\theta_{i,j})=\mu+\alpha_{i}+\beta_{j}+\rho_{i,j},$ (2)

with $\mu$ denoting the network density, $\alpha_{i}$ and $\beta_{j}$ the propensity of nodes to produce and attract links, $\rho_{i,j}$ the reciprocation degree of links, and $\theta_{i,j}$ the interaction probabilities. Although $p_{1}$ models directly account for local node features (such as in-degree, out-degree, and reciprocity), they do not provide information about network community structure. Furthermore, under Eq. (2) the model is over-parameterized, as the number of parameters exceeds the number of data points, which means that regularization techniques are required to fit the model.
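The logarithm of Eq. (1) is a sum of Bernoulli log-likelihoods on the logit scale. The sketch below evaluates it for a directed sociomatrix with the linear predictor of Eq. (2). The parameter values are random placeholders; in practice they would be estimated, with regularization as noted above. The term $\log(1+e^{\eta_{i,j}})$ is computed with `np.logaddexp` to avoid overflow, and the function names are hypothetical.

```python
import numpy as np

def p1_linear_predictor(mu, alpha, beta, rho):
    """eta_ij = mu + alpha_i + beta_j + rho_ij, as in Eq. (2)."""
    return mu + alpha[:, None] + beta[None, :] + rho

def log_likelihood_eq1(Y, eta):
    """sum over i != j of eta_ij * y_ij - log(1 + exp(eta_ij)), as in Eq. (1)."""
    Y = np.asarray(Y, dtype=float)
    off_diag = ~np.eye(Y.shape[0], dtype=bool)   # diagonal entries are structural zeros
    terms = eta * Y - np.logaddexp(0.0, eta)
    return terms[off_diag].sum()

rng = np.random.default_rng(0)
n = 5
Y = (rng.random((n, n)) < 0.4).astype(int)
np.fill_diagonal(Y, 0)
eta = p1_linear_predictor(mu=-0.5, alpha=rng.normal(size=n), beta=rng.normal(size=n),
                          rho=np.zeros((n, n)))
print(log_likelihood_eq1(Y, eta))
```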

A related approach was introduced in Hoff et al. (2002) using latent space models. Under this perspective, the interaction probability between actors decreases as their latent positions in the so-called social space grow farther apart. The latent positions $\bm{z}_{1},\ldots,\bm{z}_{n}$ are estimated from the data, which allows us to characterize, to some extent, similarities and dissimilarities among subjects. Once again, the entries of $\mathbf{Y}$ are considered conditionally independent given the latent positions and other model parameters, and therefore this model uses the likelihood given in Eq. (1) with

$\displaystyle\eta_{i,j}=\text{logit}(\theta_{i,j})=\mu+\alpha_{i}+\beta_{j}-\|\bm{z}_{i}-\bm{z}_{j}\|,$

where $\mu$, $\alpha_{i}$ and $\beta_{j}$ are defined as before, and $\|\bm{z}_{i}-\bm{z}_{j}\|$ is the Euclidean distance between the unknown latent positions of actors $i$ and $j$ in the social space. Network transitivity is naturally introduced in the model, since neighbors of the same subject tend to be close to each other in the social space and therefore have a higher probability of being related. Although estimation under this approach is simpler than for exponential random graph models, latent space models are not specifically designed to identify network clustering patterns (Sosa & Buitrago, 2021).
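The latent space predictor differs from Eq. (2) only in replacing $\rho_{i,j}$ by minus the Euclidean distance between latent positions. A small sketch with made-up two-dimensional positions (hypothetical values, not estimates) shows the mechanism behind the transitivity argument: actors placed close together receive higher interaction probabilities than distant ones.

```python
import numpy as np

def latent_space_probabilities(mu, alpha, beta, Z):
    """theta_ij = logistic(mu + alpha_i + beta_j - ||z_i - z_j||), with zero diagonal."""
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    eta = mu + alpha[:, None] + beta[None, :] - dist
    theta = 1.0 / (1.0 + np.exp(-eta))
    np.fill_diagonal(theta, 0.0)       # no self-ties
    return theta

# Hypothetical positions: actors 0 and 1 are close, actor 2 is far away.
Z = np.array([[0.0, 0.0], [0.5, 0.0], [4.0, 3.0]])
zeros = np.zeros(3)
theta = latent_space_probabilities(mu=1.0, alpha=zeros, beta=zeros, Z=Z)
print(np.round(theta, 3))
```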

In order to address the issue of cluster identification, the notion of stochastic block models (SBMs) was first introduced in Wang and Wong (1987) for data with pre-specified groups (also known as clusters, blocks, or factions). The block modeling approach relies on the concept of structural equivalence to create a partition of similar subjects, allowing the modeler to learn about the community structure of the network. Structural equivalence implies that the probability distribution of links with other subjects is approximately the same for vertices in the same group, so that the partition is induced not only by the relations within clusters but also by the interactions among subjects belonging to different clusters (Wasserman & Faust, 1994).

Wang and Wong (1987) proposed to generalize $p_{1}$ models in Eq. (1) by including an interaction term in the linear predictor as follows:

$\displaystyle\eta_{i,j}=\text{logit}(\theta_{i,j})=\mu+\alpha_{\xi_{i}}+\beta_{\xi_{j}}+\gamma_{\xi_{i},\xi_{j}},$ (3)

where $\mu$, $\alpha_{\xi_{i}}$ and $\beta_{\xi_{j}}$ can be interpreted as before, $\gamma_{k,\ell}$ are block parameters introducing unobserved cluster information, $\xi_{i}\in\{1,\ldots,K\}$ are cluster assignments ($\xi_{i}=k$ means that actor $i$ belongs to cluster $k$), and $K$ is the number of pre-specified network factions. For a known vector of cluster assignments $\bm{\xi}=(\xi_{1},\ldots,\xi_{n})$, this model is a good alternative to the over-parameterized formulation given in Eq. (2). However, in the case of SBMs, $\bm{\xi}$ is unknown and needs to be estimated from the data along with all the other parameters. This challenge leads to the use of latent variable models and mixture models. Nowicki and Snijders (2001) proposed a simple Bayesian model using a Dirichlet prior for the probabilities of the latent classes. Airoldi et al. (2008) introduced the idea of mixed membership SBMs for binary networks, where actors can belong to more than one latent class, which allows the exploration of subjects with multiple roles in the network. Some extensions using a Bayesian approach are based on the Dirichlet process prior within the scope of nonparametric hierarchical Bayesian models (Kemp et al., 2006; Xu et al., 2006).

To conclude this brief review, it is important to distinguish the network modeling approaches discussed here from other areas, such as graphical models (also called Bayesian networks), in which nodes represent random variables and the goal of the model is to infer the structure of the graph rather than to understand how the network itself arises.

3. SBMs as generalized linear models

Community structure is a common feature of real social networks, and the ability to find and interpret clusters can provide a powerful tool for understanding the structure of a given network. This work mainly focuses on SBMs for directed binary networks, but extensions to undirected and weighted networks are also feasible. Thus, the entries of $\mathbf{Y}$ can be modeled as conditionally independent given the cluster assignments $\bm{\xi}=(\xi_{1},\ldots,\xi_{n})$, for a fixed number of groups $K$, and the cluster-specific interaction probabilities $\mathbf{\Theta}=[\theta_{k,\ell}]$. Further, it is natural to assume Bernoulli distributions for the entries of $\mathbf{Y}$, and therefore the likelihood has the following general form:

$\displaystyle p(\mathbf{Y}\mid\bm{\xi},\mathbf{\Theta})=\prod_{i,j:i\neq j}\textsf{Ber}(y_{i,j}\mid\theta_{\xi_{i},\xi_{j}})=\prod_{k=1}^{K}\prod_{\ell=1}^{K}\prod_{(i,j)\in A_{k,\ell}}\textsf{Ber}(y_{i,j}\mid\theta_{k,\ell}),$ (4)

where $A_{k,\ell}=\{(i,j):i\neq j,\xi_{i}=k,\xi_{j}=\ell\}$ and $\theta_{k,\ell}=\textsf{Pr}(y_{i,j}=1\mid(i,j)\in A_{k,\ell})$ . In the case of undirected binary networks, the likelihood is computed only for $i<j$ .
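Because the likelihood in Eq. (4) depends on the data only through the number of ties in each block and the number of ordered pairs in each block, it can be evaluated as $\sum_{k,\ell}[y_{k,\ell}\log\theta_{k,\ell}+(n_{k,\ell}-y_{k,\ell})\log(1-\theta_{k,\ell})]$, where $y_{k,\ell}$ and $n_{k,\ell}$ are these two counts. A sketch for the directed case follows; labels run from $0$ to $K-1$ rather than from $1$ to $K$, as is customary in Python, and the function names are hypothetical.

```python
import numpy as np

def block_counts(Y, xi, K):
    """Return (y_kl, n_kl): number of ties and number of ordered pairs i != j in each block."""
    Y = np.asarray(Y)
    xi = np.asarray(xi)
    Z = np.eye(K, dtype=int)[xi]                 # n x K one-hot membership matrix
    off_diag = 1 - np.eye(len(xi), dtype=int)    # exclude i == j
    y_kl = Z.T @ (Y * off_diag) @ Z
    n_kl = Z.T @ off_diag @ Z
    return y_kl, n_kl

def sbm_log_likelihood(Y, xi, Theta):
    """log p(Y | xi, Theta) from Eq. (4) for a directed binary network."""
    K = Theta.shape[0]
    y_kl, n_kl = block_counts(Y, xi, K)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = y_kl * np.log(Theta) + (n_kl - y_kl) * np.log1p(-Theta)
    return np.nansum(terms)    # treats 0 * log(0) as 0

Y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 1, 0]])
xi = [0, 0, 1, 1]
Theta = np.array([[0.9, 0.1], [0.2, 0.8]])
print(block_counts(Y, xi, 2))
print(sbm_log_likelihood(Y, xi, Theta))
```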

Under the Bayesian paradigm, the cluster-specific interaction probabilities can be modeled by imposing a prior directly on $\mathbf{\Theta}$, for example, by letting $\theta_{k,\ell}\overset{\text{iid}}{\sim}\textsf{Beta}(a,b)$ for all $k$ and $\ell$. Alternatively, the parameters can be linked to a linear predictor through a link function $g(\cdot)$, such that $\theta_{\xi_{i},\xi_{j}}=g^{-1}(\eta_{i,j})$, where $\eta_{i,j}$ is given as in Eq. (3). In particular, the probit link $g=\Phi^{-1}$, with $\Phi$ the standard Normal cumulative distribution function, is a convenient choice because of its computational simplicity under Gaussian priors for $\mu$, $\{\alpha_{k}\}$, $\{\beta_{k}\}$, and $\{\gamma_{k,\ell}\}$. Furthermore, under the probit link the linear predictor resembles a random-effects two-way analysis of variance, which simplifies the interpretation of the model parameters. Tests for the $\gamma$ parameters in this model are useful for detecting transitivity patterns in the network.
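Under the probit link, the block interaction probabilities follow from the two-way layout of Eq. (3) as $\theta_{k,\ell}=\Phi(\mu+\alpha_{k}+\beta_{\ell}+\gamma_{k,\ell})$. The sketch below uses arbitrary illustrative parameter values and the identity $\Phi(x)=\tfrac{1}{2}[1+\operatorname{erf}(x/\sqrt{2})]$ from the standard library; the function name is hypothetical.

```python
import math
import numpy as np

def probit_block_probabilities(mu, alpha, beta, gamma):
    """Theta[k, l] = Phi(mu + alpha_k + beta_l + gamma_kl), Phi the standard Normal CDF."""
    eta = mu + np.asarray(alpha)[:, None] + np.asarray(beta)[None, :] + np.asarray(gamma)
    Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    return Phi(eta)

# Illustrative values for K = 2: a positive gamma on the diagonal favours within-block ties.
Theta = probit_block_probabilities(
    mu=-1.0, alpha=[0.0, 0.0], beta=[0.0, 0.0], gamma=[[1.5, 0.0], [0.0, 1.5]]
)
print(np.round(Theta, 3))
```

Here the diagonal blocks get $\Phi(0.5)\approx 0.69$ and the off-diagonal blocks $\Phi(-1)\approx 0.16$, an assortative (community) pattern driven entirely by the $\gamma$ terms.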

3.1 Optimal cluster number selection

The Bayesian Information Criterion (BIC) is useful for a preliminary exploration of both the number of groups and the agglomeration method that yields the best clustering pattern. The traditional BIC cannot be computed for the model given in Eq. (3) when blocks containing only zeros or only ones are observed. In that case, the maximum likelihood estimate of the probability of ties in such a block is simply $\hat{\theta}_{k,\ell}=0$ or $\hat{\theta}_{k,\ell}=1$, respectively, which corresponds to an infinite value of the linear predictor in Eq. (3). To address this issue, we propose the following modified version of the BIC:

$\displaystyle\text{BIC}^{*}=\sum_{k=1}^{K}\sum_{\ell=1}^{K}\sum_{(i,j)\in A_{k,\ell}}\log\textsf{Ber}(y_{i,j}\mid\tilde{\theta}_{k,\ell})-{\textstyle\frac{K}{2}}\log n,$

where

$\displaystyle\tilde{\theta}_{k,\ell}=\frac{y_{k,\ell}+0.5}{n_{k,\ell}+1}$

is a shrinkage estimator of $\theta_{k,\ell}$, with $y_{k,\ell}=\sum_{(i,j)\in A_{k,\ell}}y_{i,j}$ and $n_{k,\ell}=|A_{k,\ell}|$. Computing $\text{BIC}^{*}$ for different values of $K$ and/or different agglomeration methods for hierarchical clustering provides a straightforward approach to identifying communities in the data.
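The criterion can be computed directly from the block counts $y_{k,\ell}$ and $n_{k,\ell}$. The sketch below follows the formula exactly as written, including the penalty $\tfrac{K}{2}\log n$ with $n$ the number of vertices, for a directed network and a given partition with labels $0,\ldots,K-1$. Because $0<\tilde{\theta}_{k,\ell}<1$, every logarithm is finite even for blocks with only zeros or only ones. The function name is hypothetical.

```python
import numpy as np

def bic_star(Y, xi):
    """Modified BIC using the shrinkage estimates (y_kl + 0.5) / (n_kl + 1)."""
    Y = np.asarray(Y)
    xi = np.asarray(xi)
    n = len(xi)
    K = xi.max() + 1
    Z = np.eye(K, dtype=int)[xi]                 # one-hot cluster memberships
    off_diag = 1 - np.eye(n, dtype=int)          # only pairs with i != j
    y_kl = Z.T @ (Y * off_diag) @ Z              # ties in each block
    n_kl = Z.T @ off_diag @ Z                    # ordered pairs in each block
    theta = (y_kl + 0.5) / (n_kl + 1.0)
    loglik = np.sum(y_kl * np.log(theta) + (n_kl - y_kl) * np.log1p(-theta))
    return loglik - K / 2.0 * np.log(n)

# Two disconnected cliques of three actors each.
Y = np.zeros((6, 6), dtype=int)
Y[:3, :3] = 1
Y[3:, 3:] = 1
np.fill_diagonal(Y, 0)
print(bic_star(Y, [0, 0, 0, 0, 0, 0]), bic_star(Y, [0, 0, 0, 1, 1, 1]))
```

In a hierarchical clustering workflow, this function would be evaluated on the partition obtained by cutting each dendrogram at $K=1,2,\ldots$, keeping the method and $K$ with the largest value.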

3.2 Bayesian nonparametric modeling

A fully Bayesian approach to inference using the model given in Eq. (4) requires a prior specification for both $\mathbf{\Theta}$ and $\bm{\xi}$. The specification of nonparametric priors allows us to infer the number of clusters, and hence the number of block parameters, automatically from the data, which addresses the over-fitting problem traditionally observed in parametric models. In particular, we consider a prior for the community indicators such that

$\displaystyle\xi_{i}\mid\bm{\omega}\overset{\text{iid}}{\sim}\sum_{k=1}^{\infty}\omega_{k}\delta_{k}$

and $\omega_{k}=u_{k}\prod_{h<k}(1-u_{h})$ are weights constructed from a sequence $u_{1},u_{2},\ldots$, with $u_{k}\overset{\text{ind}}{\sim}\textsf{Beta}(1-a,b+ka)$ for $0\leqslant a<1$ and $b>-a$. The joint distribution of the set of weights $\bm{\omega}=(\omega_{1},\omega_{2},\ldots)$ is called a stick-breaking prior with parameters $a$ and $b$, which is denoted by $\bm{\omega}\sim\textsf{SB}(a,b)$. This formulation is intrinsically connected to the stick-breaking construction of the Poisson-Dirichlet process (Pitman & Yor, 1997). The stick-breaking representation associated with the Dirichlet process is the special case $a=0$.
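The weights can be simulated by truncating the construction at a finite number of sticks. The sketch below does this with NumPy; the truncation level of 200 and the parameter values are assumptions made for illustration, and the leftover mass $\prod_{h\leqslant T}(1-u_{h})$, equal to one minus the sum of the $T$ simulated weights, measures the truncation error.

```python
import numpy as np

def stick_breaking_weights(a, b, truncation, rng):
    """Draw omega_1..omega_T with independent u_k ~ Beta(1 - a, b + k a)."""
    if not (0 <= a < 1 and b > -a):
        raise ValueError("need 0 <= a < 1 and b > -a")
    k = np.arange(1, truncation + 1)
    u = rng.beta(1.0 - a, b + k * a)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - u)[:-1]))  # prod_{h<k} (1 - u_h)
    return u * remaining

rng = np.random.default_rng(1)
omega = stick_breaking_weights(a=0.25, b=1.0, truncation=200, rng=rng)
print(omega[:5].round(3), 1.0 - omega.sum())   # first weights and leftover mass
```

With $a>0$ the weights decay polynomially rather than geometrically, so a Poisson-Dirichlet prior generally needs a longer truncation than a Dirichlet process ($a=0$) to reach the same leftover mass.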

Under the previous prior specification, the cluster assignments $\bm{\xi}$ follow a simple set of predictive rules satisfying $\xi_{1}=1$ and

$\displaystyle\xi_{i}\mid\xi_{i-1},\ldots,\xi_{1}\sim\sum_{k=1}^{K^{*}}\frac{m_{k}^{i}-a}{b+i-1}\,\delta_{k}+\frac{b+K^{*}a}{b+i-1}\,\delta_{K^{*}+1},\quad\text{for }i\geqslant 2,$

where $K^{*}=\max\{\xi_{1},\ldots,\xi_{i-1}\}$ is the number of distinct values among the first $i-1$ cluster assignments, and $m_{k}^{i}=\sum_{j=1}^{i-1}1_{\{\xi_{j}=k\}}$ is the number of cluster assignments taking the value $k$ among the first $i-1$ observations, for $k=1,\ldots,K^{*}$ . This generative model for $\bm{\xi}$ is often called the generalized Chinese Restaurant Process, which is denoted by $\bm{\xi}\sim\text{CRP}(a,b)$ . Furthermore, it can be shown that the joint distribution of $\bm{\xi}$ is given by

$\displaystyle p(\bm{\xi})=\frac{\Gamma(b+1)}{(b+Ka)\Gamma(b+n)}\prod_{k=1}^{K}\frac{(b+ka)\Gamma(m_{k}-a)}{\Gamma(1-a)},$ (5)

where $K$ is the number of distinct values among $\xi_{1},\ldots,\xi_{n}$ , and $m_{k}$ is the number of cluster assignments taking the value $k$ .
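The predictive rule and Eq. (5) can be checked against each other numerically. The sketch below (labels start at 1, as in the text; function names are hypothetical) samples $\bm{\xi}$ sequentially from the generalized Chinese Restaurant Process and evaluates $\log p(\bm{\xi})$ from Eq. (5) with `math.lgamma`. Summing Eq. (5) over all 15 partitions of $n=4$ items, each written with labels in order of first appearance, should return one. The parameter values are arbitrary.

```python
import math
import numpy as np

def sample_crp(n, a, b, rng):
    """Sequentially draw xi_1..xi_n from CRP(a, b) using the predictive rule."""
    xi = [1]
    for i in range(2, n + 1):
        K = max(xi)
        counts = np.bincount(xi, minlength=K + 1)[1:]          # m_k for k = 1..K
        probs = np.append(counts - a, b + K * a) / (b + i - 1)  # existing clusters, then new
        xi.append(int(rng.choice(K + 1, p=probs)) + 1)
    return xi

def log_eppf(xi, a, b):
    """log p(xi) from Eq. (5); labels must be 1..K in order of first appearance."""
    n, K = len(xi), max(xi)
    m = np.bincount(xi)[1:]
    out = math.lgamma(b + 1) - math.log(b + K * a) - math.lgamma(b + n)
    for k in range(1, K + 1):
        out += math.log(b + k * a) + math.lgamma(m[k - 1] - a) - math.lgamma(1 - a)
    return out

def canonical_labelings(n):
    """All label vectors with labels in order of first appearance (one per partition)."""
    if n == 1:
        return [[1]]
    return [xi + [k] for xi in canonical_labelings(n - 1) for k in range(1, max(xi) + 2)]

a, b = 0.3, 1.5
total = sum(math.exp(log_eppf(xi, a, b)) for xi in canonical_labelings(4))
print(len(canonical_labelings(4)), round(total, 12))   # 15 1.0
print(sample_crp(10, a, b, np.random.default_rng(7)))
```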

3.3 Illustration of community detection: Sampson’s monastery data

The classical Sampson network dataset consists of the relationships among $n=18$ novice monks in a New England monastery (Sampson, 1968). The data can be represented as a directed binary network with 88 edges. For this social network, the clustering coefficient is 0.465, which is higher than the estimated probability of a link, $\hat{p}=0.288$, suggesting that transitivity is an important feature of the network structure. An initial exploration of the community structure is carried out by choosing the optimal number of groups and agglomeration method with our proposed metric $\text{BIC}^{*}$.
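The comparison is meaningful because, under the Erdős-Rényi model, the expected clustering coefficient is approximately the link probability itself. For reference, $\hat{p}$ is the proportion of ordered pairs of distinct monks that are linked:

$\displaystyle\hat{p}=\frac{s}{n(n-1)}=\frac{88}{18\times 17}=\frac{88}{306}\approx 0.288.$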

The first panel in Fig. 1 shows the values of $\text{BIC}^{*}$ for $K=1,\ldots,10$ using seven different agglomeration methods (variants of hierarchical clustering that use different linkage strategies). The complete-linkage hierarchical clustering method with $K=3$ produces the maximum value of $\text{BIC}^{*}$. The results of the hierarchical clustering obtained from this optimal choice are presented in the dendrogram (upper-left corner) in Fig. 1. In addition, the second row of this figure displays the sociomatrix along with the estimated interaction probability matrix, computed with the shrinkage estimator $\tilde{\theta}_{k,\ell}$ and rearranged according to the optimal clustering.

Our findings show three tight factions of monks, which is reflected in the interaction probability matrix by high values within the corresponding groups (red blocks) and low interaction probabilities between them. The identified factions are consistent with the original observations made by Sampson, in which the monks were classified into four groups: the loyal opposition, consisting of the novices who entered the monastery first (Bonaventure, Ambrose, Berthold, Peter, Louis); the young turks, who joined later (Winfrid, John, Gregory, Hugh, Boniface, Mark, Albert); the outcasts, who were not accepted by either of the two previous groups (Basil, Elias, Simplicius); and the interstitial group, with monks who wavered between the two main factions (Romuald, Victor, Amand).

Figure 1.

$\text{BIC}^{*}$ values using seven methods of agglomeration, dendrogram for the optimal hierarchical clustering, adjacency matrix, and estimated probability matrix according to the optimal clustering.

Figure 2.

Incidence matrices (first row) and posterior interaction probability matrices (second row) for the Probit and Beta-Binomial models.

In addition to the previous characterization, the generalized linear model

$\displaystyle y_{i,j}\mid\theta_{k,\ell}\overset{\text{ind}}{\sim}\textsf{Ber}(\theta_{k,\ell}),\quad\theta_{k,\ell}=\Phi(\eta_{k,\ell}),\quad\eta_{k,\ell}=\mu+\alpha_{k}+\beta_{\ell}+\gamma_{k,\ell},$

was fitted conditional on $(\xi_{i},\xi_{j})=(k,\ell)$ given by the optimal clustering identified above. The test for the interaction term of the probit model was significant ($p$-value $<2.2\times 10^{-16}$), indicating that network transitivity is a relevant source of community patterns in these data.

Finally, we consider fully Bayesian SBMs using two slightly different approaches:

•

Probit: using a representation through auxiliary variables $z_{i,j}$ such that $y_{i,j}=1$ if $z_{i,j}>0$ and $y_{i,j}=0$ if $z_{i,j}\leqslant 0$. Thus, $z_{i,j}\mid\eta_{\xi_{i},\xi_{j}}\overset{\text{ind}}{\sim}\textsf{N}(\eta_{\xi_{i},\xi_{j}},1)$, and $\eta_{k,\ell}\overset{\text{iid}}{\sim}\textsf{N}(m,v)$.

•

Beta-Binomial: $\theta_{k,\ell}\overset{\text{iid}}{\sim}\textsf{Beta}(c,d)$.
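For a fixed partition $\bm{\xi}$, both specifications lead to simple conditional updates, sketched below with hypothetical function names. In the Beta-Binomial model the conjugate update is $\theta_{k,\ell}\mid\mathbf{Y},\bm{\xi}\sim\textsf{Beta}(c+y_{k,\ell},\,d+n_{k,\ell}-y_{k,\ell})$, with $y_{k,\ell}$ and $n_{k,\ell}$ the block counts defined in Section 3.1. In the probit model, each $z_{i,j}$ is drawn from $\textsf{N}(\eta_{\xi_{i},\xi_{j}},1)$ truncated to $(0,\infty)$ if $y_{i,j}=1$ and to $(-\infty,0]$ otherwise, and $\eta_{k,\ell}$ is then drawn from its Normal full conditional. This is the standard one-at-a-time data-augmentation scheme, not the joint update of Held and Holmes (2006) described in the text, and the update of $\bm{\xi}$ itself is omitted.

```python
from statistics import NormalDist
import numpy as np

STD_NORMAL = NormalDist()

def draw_z(y, eta, rng):
    """z ~ N(eta, 1) truncated to (0, inf) if y == 1, else to (-inf, 0], by inverse CDF."""
    p0 = STD_NORMAL.cdf(-eta)                  # P(z <= 0) when z ~ N(eta, 1)
    u = rng.uniform(p0, 1.0) if y == 1 else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1 - 1e-12)          # inv_cdf needs 0 < u < 1; fine for moderate eta
    return eta + STD_NORMAL.inv_cdf(u)

def update_eta_block(z_block, m, v, rng):
    """Normal full conditional of eta_kl given the z_ij of block (k, l) and prior N(m, v)."""
    z_block = np.asarray(z_block, dtype=float)
    post_var = 1.0 / (1.0 / v + z_block.size)
    post_mean = post_var * (m / v + z_block.sum())
    return rng.normal(post_mean, np.sqrt(post_var))

def update_theta_block(y_kl, n_kl, c, d, rng):
    """Beta-Binomial conjugate update for theta_kl."""
    return rng.beta(c + y_kl, d + n_kl - y_kl)

rng = np.random.default_rng(11)
y_block = np.array([1, 1, 0, 1, 0, 1])        # hypothetical ties within a single block
eta = 0.0
for _ in range(200):                          # a short Gibbs run for that block
    z = [draw_z(y, eta, rng) for y in y_block]
    eta = update_eta_block(z, m=-0.5, v=1.0, rng=rng)
theta = update_theta_block(y_block.sum(), y_block.size, c=0.288, d=0.712, rng=rng)
print(eta, theta)
```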

In both cases, we let $\bm{\xi}\sim\text{CRP}(a,b)$ be the prior distribution for the cluster assignments. Furthermore, in the probit representation of the model, $z_{i,j}$ and $\eta_{i,j}$ are highly correlated, which can produce poor mixing when Markov chain Monte Carlo (MCMC) is used to fit the model. We therefore implement the method described in Held and Holmes (2006) to update $z_{i,j}$ and $\eta_{i,j}$ jointly in order to deal with this slow mixing. For the Beta-Binomial model, the hyperparameters are taken as $c=\hat{p}$ and $d=1-c$, where $\hat{p}=0.288$ is the estimated probability of a link in the network. Following the same idea, the hyperparameters $m$ and $v$ of the probit model are chosen so that, after transforming back from the probit link, the expectation and variance of the interaction probabilities are similar to those of the Beta-Binomial model.
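One way to carry out this matching is sketched below, under the assumption that "similar expectation and variance" means equal first two moments of $\theta=\Phi(\eta)$ with $\eta\sim\textsf{N}(m,v)$ and of $\theta\sim\textsf{Beta}(c,d)$; the values of $m$ and $v$ actually used are not given in the text. The mean has the closed form $\textsf{E}[\Phi(\eta)]=\Phi(m/\sqrt{1+v})$, so setting $m=\Phi^{-1}(\hat{p})\sqrt{1+v}$ fixes it for any $v$. The variance is computed by Gauss-Hermite quadrature, and $v$ is found by bisection, since with the mean held fixed the variance increases with $v$. With $c+d=1$, the Beta variance is $\hat{p}(1-\hat{p})/2$.

```python
from statistics import NormalDist
import numpy as np

PHI = NormalDist()
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(80)

def probit_normal_moments(m, v):
    """Mean and variance of Phi(eta), eta ~ N(m, v), by Gauss-Hermite quadrature."""
    eta = m + np.sqrt(2.0 * v) * NODES
    theta = np.array([PHI.cdf(x) for x in eta])
    w = WEIGHTS / np.sqrt(np.pi)                  # weights for a standard Normal expectation
    mean = np.sum(w * theta)
    return mean, np.sum(w * theta**2) - mean**2

def match_beta(p_hat, lo=1e-6, hi=50.0, tol=1e-10):
    """(m, v) such that Phi(eta) has the mean and variance of Beta(p_hat, 1 - p_hat)."""
    target_var = p_hat * (1.0 - p_hat) / 2.0      # Beta(c, d) variance when c + d = 1
    q = PHI.inv_cdf(p_hat)
    while hi - lo > tol:                          # bisection on v
        v = 0.5 * (lo + hi)
        _, var = probit_normal_moments(q * np.sqrt(1.0 + v), v)
        lo, hi = (v, hi) if var < target_var else (lo, v)
    v = 0.5 * (lo + hi)
    return q * np.sqrt(1.0 + v), v

m, v = match_beta(0.288)
print(m, v, probit_normal_moments(m, v))
```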

Figure 2 presents the incidence matrices (first row) and the posterior interaction probability matrices (second row) for the two proposed models. The incidence matrix gives the proportion of iterations in which two monks belong to the same community. These results are based on a Markov chain with 30,000 iterations after a burn-in period of 10,000 iterations. The algorithms mix quickly for these data because of the strong cluster signals. As expected, the results for the two models are very similar and consistent with the initial analysis. The models recover the three factions identified initially, with the additional advantage of better discriminating actors according to their roles within the larger communities. For example, Amand wavered between the three factions, as shown by the posterior interaction probabilities under both models.
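The incidence matrix is an average over the MCMC draws of the partition. Assuming the retained draws are stored as an array with one row per iteration and one column per actor (a hypothetical layout), it can be computed as in the sketch below. Cluster labels need not be consistent across draws, because only co-membership is used, so no relabeling step is required.

```python
import numpy as np

def incidence_matrix(xi_draws):
    """Proportion of draws in which actors i and j share a cluster."""
    xi_draws = np.asarray(xi_draws)
    same = xi_draws[:, :, None] == xi_draws[:, None, :]   # draws x n x n boolean array
    return same.mean(axis=0)

# Three hypothetical draws for four actors; labels are arbitrary within each draw.
draws = [[1, 1, 2, 2],
         [2, 2, 1, 1],
         [1, 1, 1, 2]]
print(incidence_matrix(draws))
```

For long chains on larger networks, the draws-by-$n$-by-$n$ array can become large; accumulating the co-membership counts draw by draw avoids that.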

4. Trading network data

Trading networks belong to the more general category of social networks. In this type of application, the actors are traders and the edges represent transactions between pairs of traders. Trading networks in finance are particularly interesting because the structure of the network can be affected by the failure or success of traders in the market and by market changes over time. These factors are relevant for studying the global resiliency and evolution of the network, as well as for determining how traders' relationships affect both the traders themselves and the market.

4.1 Descriptive analysis

We used data from transactions within the natural gas futures market during the period from January 2005 to December 2008 to construct directed binary networks. Natural gas futures were traded on the New York Mercantile Exchange (NYMEX) only through traditional open-outcry trading until September 5, 2006. After this date, an electronic trading platform was introduced. The networks were constructed from weekly trades. For each week in the four-year period, a link from trader A to trader B was established if there was at least one transaction during that week in which A was the seller and B was the buyer.
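The construction rule can be sketched as follows. The sketch assumes that each transaction record is a `(date, seller, buyer)` tuple (a hypothetical format, since the layout of the raw NYMEX data is not described here) and that weeks are counted in blocks of seven days from a chosen start date. Each week yields a set of directed seller-to-buyer links, so repeated transactions between the same pair within a week produce a single link.

```python
from collections import defaultdict
from datetime import date

def weekly_edge_sets(transactions, start):
    """Map week index -> set of directed (seller, buyer) links observed that week."""
    weeks = defaultdict(set)
    for day, seller, buyer in transactions:
        week = (day - start).days // 7
        if seller != buyer:                  # self-ties are structural zeros
            weeks[week].add((seller, buyer))
    return dict(weeks)

def adjacency(edges, traders):
    """0/1 sociomatrix (list of lists) for a given ordering of traders."""
    index = {t: i for i, t in enumerate(traders)}
    Y = [[0] * len(traders) for _ in traders]
    for seller, buyer in edges:
        Y[index[seller]][index[buyer]] = 1
    return Y

trades = [
    (date(2005, 1, 3), "A", "B"),
    (date(2005, 1, 5), "A", "B"),   # repeated A -> B in the same week: still one link
    (date(2005, 1, 6), "C", "A"),
    (date(2005, 1, 11), "B", "C"),
]
weeks = weekly_edge_sets(trades, start=date(2005, 1, 3))
print(adjacency(weeks[0], ["A", "B", "C"]))
```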

Table 1
Summary of descriptive statistics computed over the 201 weekly networks of the natural gas market

Measure	Min.	25th pct.	Median	75th pct.	Max.
Active traders	213	264	275	289	339
Mean in-degree	4	6	7	7	9
Mean out-degree	4	6	7	7	9
Max. in-degree	74	102	138	159	201
Max. out-degree	75	103	137	160	195
Degree corr.	0.952	0.973	0.978	0.981	0.989
Clustering coeff.	0.331	0.383	0.415	0.480	0.526

Figure 3.

Graph of the trading network for the week starting on July 1, 2005. The network has 263 active traders participating in a total of 6377 transactions. Red nodes represent traders with a low number of links during this week.

A total of 970 traders were involved in transactions during the 201 weeks, with an average of 277 active traders per week. Figure 3 displays the graph of the trading network for the week starting on July 1, 2005, showing only those traders participating in the market that week. The direction of each arrow is determined by the direction of the financial transaction, so that a vertex with an outgoing (incoming) arrow is referred to as a seller (buyer).

Table 1 presents summary statistics computed over the whole observation range (201 networks), including the number of active nodes, the mean and maximum in- and out-degree, the degree correlation, and the clustering coefficient. The number of active nodes in each weekly network is small compared with the total number of traders involved in the market during the four-year period, and the mean in- and out-degree values are very low. This feature also reflects a high presence of transient traders, which is partly a consequence of trader failures. The high positive values of the degree correlation suggest that traders tend to make buy and sell transactions with the same partners.
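The statistics in Table 1 can be computed week by week from each sociomatrix. The sketch below rests on definitions that are our reading of the table rather than statements in the text: a trader is active in a week if it has at least one link, degrees are summarized over active traders only, and "degree correlation" is the Pearson correlation between the in- and out-degrees of active traders. Under these definitions the mean in-degree and mean out-degree always coincide, since both equal the number of links divided by the number of active traders, which is consistent with the identical rows in Table 1.

```python
import numpy as np

def weekly_summary(Y):
    """Descriptive statistics of one directed binary weekly network (0/1 matrix Y)."""
    Y = np.asarray(Y, dtype=int)
    out_deg = Y.sum(axis=1)                  # number of buyers each trader sold to
    in_deg = Y.sum(axis=0)                   # number of sellers each trader bought from
    active = (out_deg + in_deg) > 0
    out_a, in_a = out_deg[active], in_deg[active]
    return {
        "active traders": int(active.sum()),
        "mean in-degree": in_a.mean(),
        "mean out-degree": out_a.mean(),
        "max in-degree": int(in_a.max()),
        "max out-degree": int(out_a.max()),
        "degree corr.": np.corrcoef(in_a, out_a)[0, 1],
    }

rng = np.random.default_rng(5)
Y = (rng.random((300, 300)) < 0.02).astype(int)   # simulated stand-in for one week
np.fill_diagonal(Y, 0)
Y[250:, :] = 0                                    # make the last 50 traders inactive
Y[:, 250:] = 0
print(weekly_summary(Y))
```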

Figure 4.

Time series plots for the number of active traders, clustering coefficient, mean degree, and maximum degree of the trading network. The vertical line represents the week of introduction of electronic trading, and the horizontal lines represent the median during each period.

It is particularly interesting to observe the evolution of the network over time, as the introduction of electronic exchange represents a drastic shift in the trading mechanism. This perturbation of the market environment tests the resiliency of the network and the ability of traders to adapt and survive. Figure 4 shows time series plots for the number of active traders, the clustering coefficient, and the mean and maximum total degree (sum of in- and out-degrees). The plots show that, after the first week of September 2006, the number of active traders and the clustering coefficient tend to decrease while the maximum degree increases considerably. Wilcoxon rank tests were performed to evaluate the differences in the median values of these metrics before and after the implementation of the hybrid trading exchange. The results show significant differences in all cases ($p$-value $<1.084\times 10^{-5}$), suggesting that the establishment of the electronic platform as an alternative trading method had a relevant effect on the network structure of the natural gas market.
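A minimal version of such a before/after comparison is sketched below as a two-sample Wilcoxon rank-sum test with the large-sample Normal approximation, using average ranks for ties and no tie or continuity correction. It is a simplified stand-in written with NumPy and the standard library; established implementations such as `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu` should be preferred in practice. The weekly values in the usage example are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0   # average of ranks i+1..j+1
        i = j + 1
    return ranks

def rank_sum_test(before, after):
    """Two-sided Wilcoxon rank-sum test (Normal approximation); returns (z, p-value)."""
    n1, n2 = len(before), len(after)
    ranks = average_ranks(np.concatenate([before, after]))
    w = ranks[:n1].sum()                               # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))

before = [0.48, 0.45, 0.50, 0.47, 0.52, 0.44]   # hypothetical weekly clustering coefficients
after = [0.39, 0.41, 0.37, 0.40, 0.36, 0.42]
print(rank_sum_test(before, after))
```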

The sharp increase in the maximum degree suggests the emergence of central traders that concentrate a large number of partners. This behavior has an expected effect on network transitivity, reflected in the decrease of the clustering coefficient. Furthermore, changes in the market structure can alter the roles that traders play, causing prominent traders to exit the market and low-profile traders to become important. Figure 5 displays the betweenness time series for six selected traders. The first four traders, DOIT, GALT, JTAV, and WACK, played a key role in the network before electronic trading, but their centrality dropped dramatically afterwards. In contrast, the traders identified as EDFM and PIOF had a very limited number of partners during the first period but became central after the implementation of electronic exchange.

Figure 5.

Evolution of betweenness for the six selected traders with the highest values over the four-year period.

4.2 Exploring the community structure

The topology of a network can be further explored through SBMs in order to identify community patterns. In this section, each week is analyzed independently, grouping traders that are structurally equivalent. We first measure structural equivalence among subjects using the Hamming distance along with a number of different clustering methods. The number of groups and the method of agglomeration are chosen by maximizing $\text{BIC}^{*}$, our modified version of the BIC.

Figure 6.

In the first row, values of $\text{BIC}^{*}$ for $K=1,\ldots,50$ for the following methods of agglomeration: Ward (red), complete (black), single (green), average (blue), McQuitty (aquamarine), median (pink), centroid (yellow). In the second and third rows, raw data and the estimated probability matrix according to the optimal clustering for the selected traders in each week.

To illustrate the results, three weeks were selected according to the maximum (week 31), approximate mean (week 91), and minimum (week 179) observed values of the clustering coefficient throughout the 201 weekly networks. For these weeks, the first row of Fig. 6 displays the $\text{BIC}^{*}$ values for numbers of groups $K=1,\ldots,50$ under seven different hierarchical clustering strategies. Based on the maximum value of $\text{BIC}^{*}$, the optimal group configuration is obtained with Ward's minimum variance method (red line), which yields 13, 14, and 12 clusters for weeks 31, 91, and 179, respectively. To observe the evolution of the network over time for a fixed group of traders, the 970 traders were reduced to the 290 traders who participated in transactions in at least 25 weeks and made at least 50 transactions over the four-year period. The reduced data for the selected weeks were rearranged according to the optimal clustering, and the probabilities of interaction between the resulting communities were estimated using the shrinkage estimator $\tilde{\theta}_{k,\ell}$.
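The clustering step can be sketched with SciPy's hierarchical clustering. This is a minimal illustration under assumptions, not the authors' code: the profile of each trader is taken to be its outgoing row followed by its incoming column, the number of groups $K$ is fixed by hand because $\text{BIC}^{*}$ is not reproduced here, and `block_proportions` returns raw link proportions rather than the shrinkage estimator $\tilde{\theta}_{k,\ell}$. Both function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def structural_clusters(Y, K, method="ward"):
    """Cluster traders by structural equivalence for a fixed number of groups K."""
    Y = np.asarray(Y, dtype=int)
    # Profile of trader i: outgoing ties (row i) followed by incoming ties (column i).
    profiles = np.hstack([Y, Y.T])
    # Condensed Hamming distances (proportion of mismatching entries).
    dist = pdist(profiles, metric="hamming")
    tree = linkage(dist, method=method)
    return fcluster(tree, t=K, criterion="maxclust")  # labels in 1..K


def block_proportions(Y, labels):
    """Raw (unshrunk) proportion of realised links between each pair of clusters."""
    Y = np.array(Y, dtype=int)
    np.fill_diagonal(Y, 0)
    groups = np.unique(labels)
    theta = np.zeros((groups.size, groups.size))
    for r, k in enumerate(groups):
        for c, l in enumerate(groups):
            rows, cols = labels == k, labels == l
            # Ordered pairs (i, j) with i != j; self-pairs only occur when k == l.
            n_pairs = rows.sum() * cols.sum() - (rows.sum() if k == l else 0)
            if n_pairs > 0:
                theta[r, c] = Y[np.ix_(rows, cols)].sum() / n_pairs
    return theta


# Toy network: two groups of three traders that only trade within their group.
Y = np.kron(np.eye(2, dtype=int), np.ones((3, 3), dtype=int))
np.fill_diagonal(Y, 0)
labels = structural_clusters(Y, K=2)
print(labels)
print(block_proportions(Y, labels))
```

SciPy's `method` argument also accepts `"complete"`, `"single"`, `"average"`, `"weighted"` (WPGMA, the McQuitty method), `"median"`, and `"centroid"`, so the seven strategies in Fig. 6 can be compared by looping over these names and scoring each resulting partition with a criterion such as $\text{BIC}^{*}$.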

The second and third rows of Fig. 6 present plots of the raw data and the estimated interaction probabilities for the selected 290 traders. The raw data show high reciprocity, as expected from the high in- and out-degree correlation values in Table 1. The probability plots in Fig. 6 show how the community structure of the network changes over time. In week 31, most of the communities show an assortative pattern, but after the establishment of electronic trading the communities in weeks 91 and 179 are mainly disassortative.

Additionally, it is of interest to understand how stable the communities are over time. The average incidence matrix $\bar{\mathbf{D}}$ was computed for the reduced data as

$\displaystyle\bar{\mathbf{D}}=\frac{1}{201}\sum_{t=1}^{201}\mathbf{D}_{t},$

where $\mathbf{D}_{t}=[d_{t,i,j}]$ is the incidence matrix at time $t$, with $d_{t,i,j}=1$ if actors $i$ and $j$ are in the same community at time $t$, and $0$ otherwise. The average incidence matrix $\bar{\mathbf{D}}$, whose entries give the proportion of weeks in which two traders belong to the same community (not shown here), suggests that some communities are preserved over time. In particular, a large portion of traders consistently participate in a very small number of trades.
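The average incidence matrix can be computed directly from the weekly label vectors. In this sketch (the function name `average_incidence` is hypothetical) each row of `label_history` holds the community labels of the same $n$ traders in one week; labels need not be comparable across weeks, because $\mathbf{D}_{t}$ only records whether two traders share a label.

```python
import numpy as np


def average_incidence(label_history):
    """Average co-clustering (incidence) matrix over T weekly partitions.

    label_history is a T x n integer array whose row t holds the community
    labels of the n traders at week t. Labels only need to be consistent
    within a week, because D_t depends only on whether two labels are equal.
    """
    L = np.asarray(label_history)
    T, n = L.shape
    D_bar = np.zeros((n, n))
    for labels in L:
        # D_t[i, j] = 1 if traders i and j share a community at week t.
        D_bar += labels[:, None] == labels[None, :]
    return D_bar / T


# Two toy weeks: traders 0 and 1 share a group in week 1, traders 1 and 2 in week 2.
print(average_incidence([[1, 1, 2], [5, 7, 7]]))
```

Because only label equality matters, this summary is immune to the label-switching problem that arises when comparing partitions from independent weekly fits.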

4.3 Choosing a nonparametric model

Recently, the use of a Dirichlet process (DP) as a prior specification for $\bm{\xi}$ has become quite popular when fitting SBMs. Instead, we propose using the more general Poisson-Dirichlet process (PDP) as the prior distribution for the cluster assignments. To support this proposal, we computed the maximum likelihood estimators of $a$ and $b$ associated with the optimal clustering identified above. Working with the PDP, the likelihood function corresponds to the joint distribution of $\bm{\xi}$ given in Eq. (5) (recall that $a=0$ for the DP). Table 2 displays the number of clusters $K$, the maximum likelihood estimates of the model parameters $a$ and $b$, and the maximum log-likelihood Lik, under the DP and PDP prior models, for the selected weeks 31, 91, and 179.

Table 2
Optimal number of clusters, maximum likelihood estimates of model parameters $a$ and $b$ , and maximum log-likelihood Lik, for DP and PDP prior models

Measure	Week 31	Week 91	Week 179
$K$	13	14	12
DP $\hat{b}$	2.017	2.213	1.825
DP $\hat{a}$	0.000	0.000	0.000
DP Lik	$-877.48$	$-943.99$	$-824.87$
PDP $\hat{b}$	1.150	1.795	0.924
PDP $\hat{a}$	0.110	0.042	0.133
PDP Lik	$-876.51$	$-943.26$	$-823.81$

The estimates of the model parameter $a$ are clearly different from zero, in particular for weeks 31 and 179, and the PDP attains a slightly higher maximum log-likelihood than the DP in all three weeks, suggesting that a Poisson-Dirichlet process is an appropriate prior specification for these data.
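Estimates like those in Table 2 can be obtained by maximizing the partition probability over $(a,b)$ for the cluster sizes of a fixed clustering. The sketch below assumes that Eq. (5), which is not reproduced in this section, is the standard Poisson-Dirichlet (Pitman-Yor) exchangeable partition probability function with discount $a$ and concentration $b$, namely $\prod_{k=1}^{K-1}(b+ka)\,\prod_{k}(1-a)_{n_k-1}/(b+1)_{n-1}$; this matches the weights $(m_k-a)$ and $(b+aK)$ in Eq. (6). The function names, the optimizer settings, the restriction $b>0$, and the cluster sizes in the usage lines are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def pdp_log_eppf(sizes, a, b):
    """Log-probability of a partition with the given cluster sizes under a
    Poisson-Dirichlet (Pitman-Yor) process with discount a and concentration b.
    Setting a = 0 gives the Dirichlet process."""
    sizes = np.asarray(sizes, dtype=float)
    n, K = sizes.sum(), sizes.size
    log_new = np.sum(np.log(b + a * np.arange(1, K)))     # prod_{k=1}^{K-1} (b + k a)
    log_norm = gammaln(b + n) - gammaln(b + 1)             # log (b + 1)_{n-1}
    log_join = np.sum(gammaln(sizes - a) - gammaln(1 - a)) # sum_k log (1 - a)_{n_k - 1}
    return log_new - log_norm + log_join


def fit_prior(sizes, dp=False):
    """ML estimates (a_hat, b_hat, max log-likelihood) for a fixed partition."""
    if dp:
        # DP: a = 0; optimise log b so that b stays positive.
        res = minimize(lambda x: -pdp_log_eppf(sizes, 0.0, np.exp(x[0])),
                       x0=[0.0], method="Nelder-Mead")
        return 0.0, float(np.exp(res.x[0])), float(-res.fun)
    # PDP: restrict to 0 <= a < 1 and b > 0 (a subset of the valid region b > -a).
    res = minimize(lambda x: -pdp_log_eppf(sizes, x[0], x[1]), x0=[0.1, 1.0],
                   method="L-BFGS-B", bounds=[(0.0, 0.99), (1e-6, None)])
    return float(res.x[0]), float(res.x[1]), float(-res.fun)


# Hypothetical cluster sizes for 290 traders (not taken from the paper).
sizes = [60, 45, 40, 30, 25, 20, 18, 15, 12, 10, 8, 5, 2]
print(fit_prior(sizes, dp=True))
print(fit_prior(sizes))
```

Since the DP is the special case $a=0$, the PDP maximum log-likelihood can never be lower than the DP one at the exact optimum; the size of the gap, rather than its sign, is what is informative.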

4.4 Fitting independent stochastic block models

To complement the previous analysis, Bayesian SBMs were fitted independently to each of the three selected weeks using the reduced data with 290 traders. The results for the probit and Beta-Binomial models with the Poisson-Dirichlet process prior for $\bm{\xi}$ are summarized by the incidence matrix and the interaction probability matrix in each case. For both models, the hyperparameters were specified based on the estimated probability of a link in each week, which is 0.070, 0.082, and 0.066 for weeks 31, 91, and 179, respectively. These values are low compared with the clustering coefficients of the reduced networks (0.570, 0.448, and 0.393, respectively). In a random graph without community structure, the clustering coefficient would be of the same order as the link probability, so this gap suggests that transitivity is a relevant source of community structure in these data.
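The comparison between link probability and clustering coefficient can be made concrete with a structureless benchmark. The sketch below (hypothetical function name) computes the density of a directed network and the global clustering coefficient (transitivity) of its symmetrized version; symmetrizing is an assumption, since the exact clustering definition used in the paper is not restated here. For independent directed ties with probability $p$, the symmetrized tie probability is $1-(1-p)^2\approx 2p$, which is also the expected transitivity, still far below the observed values.

```python
import numpy as np


def density_and_transitivity(Y):
    """Link probability of a directed 0/1 network and the global clustering
    coefficient (transitivity) of its undirected version."""
    Y = np.array(Y, dtype=int)
    np.fill_diagonal(Y, 0)
    n = Y.shape[0]
    density = Y.sum() / (n * (n - 1))   # proportion of realised ordered pairs
    A = ((Y + Y.T) > 0).astype(int)     # symmetrise: tie in either direction
    deg = A.sum(axis=1)
    closed = np.trace(A @ A @ A)        # 6 x number of triangles
    triples = np.sum(deg * (deg - 1))   # 2 x number of connected triples
    transitivity = closed / triples if triples > 0 else 0.0
    return float(density), float(transitivity)


# Structureless benchmark: 290 traders with independent directed ties, p = 0.07.
rng = np.random.default_rng(42)
Y_random = (rng.random((290, 290)) < 0.07).astype(int)
d, c = density_and_transitivity(Y_random)
print(d, c)  # c should be close to 1 - 0.93**2, roughly 0.135
```

Observed clustering coefficients several times larger than this benchmark indicate that ties are concentrated within groups, which is the structure the block models are designed to capture.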

The adjacency matrices, incidence matrices, and posterior interaction probability matrices under the probit and Beta-Binomial models for the 290 selected traders in weeks 31, 91, and 179, reordered according to the optimal clustering for week 31 (not shown here), show that a portion of the network structure is preserved over time.

As before, these results are based on a Markov chain with 30,000 iterations after a burn-in period of 10,000 iterations. These findings are consistent with both the raw data and the block model analysis based on $\text{BIC}^{*}$.

5. Discussion

In this paper, we have provided a rigorous statistical framework that uses stochastic block models and Bayesian nonparametric methods to cluster directed binary networks. Our findings strongly indicate that the proposal is highly reliable at detecting community structures, since our experiments with real trading data produce consistent clustering patterns.

We also believe that our approach is beneficial for several reasons: it is flexible (cluster assignments are based on nonparametric methods), it is informative (cluster and interaction probabilities are easily obtained), and it supports other tasks such as prediction. Lastly, because our proposal is formulated through a well-established statistical model, it is amenable to generalization and formal evaluation.

The MCMC algorithms showed poor mixing when applied to the trading network data. For the probit model, the method introduced in Held and Holmes (2006) was used to address mixing issues, but the improvement was not substantial compared with conventional Gibbs sampling algorithms. This is likely because the community indicators $\bm{\xi}$ are strongly correlated with other model parameters in the posterior distribution. Another way to address this issue is to sample the cluster assignments from the marginal distribution $p(\xi_{i}\mid\bm{\eta})$, after integrating out the $\{z_{i,j}\}$.

Additionally, more sophisticated methods are now available to address slow convergence and mixing problems. These would allow us to go beyond the typical cluster assignment sampler considered here, in which each indicator is sampled one at a time from its full conditional distribution. They include the samplers for DP mixture models described by Neal (2000), the split-merge algorithm (Jain & Neal, 2004), and the chaperones algorithm (Betancourt et al., 2016). The main idea behind these samplers is to generate new Markov chain states by modifying several cluster assignments simultaneously, either through more plausible moves or through cluster merges and splits.

Finally, reciprocity is commonly observed in directed social networks, and including a reciprocity parameter would help explain tie patterns in the network. To this end, it is of interest to extend the probit model so that $(z_{i,j},z_{j,i})\mid\bm{\eta},\mathbf{\Sigma}\sim\textsf{N}(\bm{\eta},\mathbf{\Sigma})$, with

$\displaystyle\bm{\eta}=(\eta_{i,j},\eta_{j,i})\quad\text{and}\quad\mathbf{\Sigma}=\begin{bmatrix}\sigma^{2}_{1}&\rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2}&\sigma^{2}_{2}\end{bmatrix},$

where $\eta_{i,j}$ is given as in Eq. (3), $\sigma^{2}_{1}$ and $\sigma^{2}_{2}$ are variance components, and $\rho$ is a correlation coefficient that can be interpreted in this context as a reciprocity parameter. Such an extended model can again be fitted under the Bayesian paradigm by specifying a joint prior distribution (possibly with independent components) for the new model parameters $(\sigma^{2}_{1},\sigma^{2}_{2},\rho)$. A standard choice to be tested in the future is $\sigma_{k}^{2}\sim\textsf{Half-Cauchy}(a_{\sigma},b_{\sigma})$, for $k=1,2$, and $\rho\sim\textsf{Unif}(a_{\rho},b_{\rho})$, for carefully chosen values of the new hyperparameters $a_{\sigma},b_{\sigma},a_{\rho},b_{\rho}$. Either simulation-based methods (Gamerman & Lopes, 2006) or variational approximations (Blei et al., 2017) can then be employed to approximate the posterior distribution. This approach will be pursued elsewhere.
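To see why $\rho$ acts as a reciprocity parameter, the sketch below computes the probability of a mutual tie, $\Pr(y_{i,j}=1,\,y_{j,i}=1)$, under the proposed bivariate model, assuming (as in the probit construction of the appendix) that a tie is present exactly when the latent variable is nonnegative. The function name is hypothetical. With $\bm{\eta}=\bm{0}$ and unit variances, independence ($\rho=0$) gives $1/4$, while the bivariate normal orthant probability $1/4+\arcsin(\rho)/(2\pi)$ grows with $\rho$.

```python
import numpy as np
from scipy.stats import multivariate_normal


def mutual_tie_probability(eta_ij, eta_ji, sigma1, sigma2, rho):
    """P(y_ij = 1 and y_ji = 1) when (z_ij, z_ji) ~ N(eta, Sigma) and y = 1{z >= 0}."""
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    # P(z_ij >= 0, z_ji >= 0) = P(-z_ij <= 0, -z_ji <= 0), with -z ~ N(-eta, Sigma).
    return float(multivariate_normal(mean=[-eta_ij, -eta_ji], cov=cov).cdf([0.0, 0.0]))


for rho in (0.0, 0.5, 0.9):
    print(rho, mutual_tie_probability(0.0, 0.0, 1.0, 1.0, rho))
```

Each marginal tie probability stays at $1/2$ in this example, so any increase in the mutual-tie probability is attributable to $\rho$ alone.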


Statements and declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Appendix

MCMC algorithm: Probit model

Let us introduce latent variables $\{z_{i,j}\}$ and set

$\displaystyle y_{i,j}=\begin{cases}1&\text{if }z_{i,j}\geqslant 0;\\ 0&\text{if }z_{i,j}<0,\end{cases}$

where $z_{i,j}\mid\eta_{\xi_{i},\xi_{j}}\overset{\text{ind}}{\sim}\textsf{N}(\eta_{\xi_{i},\xi_{j}},1)$. The full conditional distribution of the auxiliary variables $\{z_{i,j}\}$ for the standard version of the algorithm is given by

$\displaystyle z_{i,j}\mid\eta_{\xi_{i},\xi_{j}},y_{i,j}\sim\begin{cases}\textsf{N}_{[0,\infty)}(\eta_{\xi_{i},\xi_{j}},1)&\text{if }y_{i,j}=1;\\ \textsf{N}_{(-\infty,0)}(\eta_{\xi_{i},\xi_{j}},1)&\text{if }y_{i,j}=0.\end{cases}$

However, the method described in Held and Holmes (2006) instead samples from the leave-one-out marginal predictive densities

$\displaystyle z_{i,j}\mid\bm{z}_{-(i,j)},y_{i,j}\sim\begin{cases}\textsf{N}_{[0,\infty)}(\tilde{h}_{i,j},\tilde{v}^{2}_{i,j})&\text{if }y_{i,j}=1;\\ \textsf{N}_{(-\infty,0)}(\tilde{h}_{i,j},\tilde{v}^{2}_{i,j})&\text{if }y_{i,j}=0,\end{cases}$

where, for $(i,j)\in A_{k,\ell}=\{(i^{\prime},j^{\prime}):i^{\prime}\neq j^{\prime},\,\xi_{i^{\prime}}=k,\,\xi_{j^{\prime}}=\ell\}$,

$\displaystyle\tilde{h}_{i,j}=\frac{\frac{h}{v^{2}}+z_{k,\ell}^{-(i,j)}}{\frac{1}{v^{2}}+m_{k,\ell}^{-(i,j)}}\quad\text{and}\quad\tilde{v}^{2}_{i,j}=1+\frac{1}{\frac{1}{v^{2}}+m_{k,\ell}^{-(i,j)}},$

$z_{k,\ell}^{-(i,j)}$ is the sum of the auxiliary variables over $A_{k,\ell}$ with $z_{i,j}$ removed, and $m_{k,\ell}^{-(i,j)}=|A_{k,\ell}|-1$ .
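Both latent-variable updates draw from a normal distribution truncated to the side of zero indicated by $y_{i,j}$: mean $\eta_{\xi_i,\xi_j}$ and variance 1 in the standard version, mean $\tilde{h}_{i,j}$ and variance $\tilde{v}^{2}_{i,j}$ in the Held and Holmes (2006) version. The sketch below (hypothetical function name) uses SciPy's `truncnorm`, whose truncation bounds are expressed in standardized units. It is vectorized for the standard update; in the Held and Holmes scheme it would be called one pair at a time, because $\tilde{h}_{i,j}$ depends on the current values of the other latent variables.

```python
import numpy as np
from scipy.stats import truncnorm


def sample_latent(mean, var, y, rng):
    """Draw z ~ N(mean, var) truncated to [0, inf) where y = 1 and to (-inf, 0) where y = 0."""
    mean = np.asarray(mean, dtype=float)
    sd = np.sqrt(np.asarray(var, dtype=float))
    y = np.asarray(y)
    # truncnorm expects the bounds in standardised units: (bound - loc) / scale.
    lower = np.where(y == 1, -mean / sd, -np.inf)
    upper = np.where(y == 1, np.inf, -mean / sd)
    return truncnorm.rvs(lower, upper, loc=mean, scale=sd, random_state=rng)


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
# Standard update: all pairs share eta = 0.3 here, with unit variance.
z = sample_latent(np.full(1000, 0.3), 1.0, y, rng)
# Single Held-Holmes-style draw with hypothetical h_tilde = -2.0, v_tilde^2 = 1.5.
z_single = sample_latent(-2.0, 1.5, 1, rng)
```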

The full conditional distributions of the block means are given by

$\displaystyle\eta_{k,\ell}\mid\bm{\xi},\bm{z}\sim\textsf{N}\left(\frac{\frac{h}{v^{2}}+z_{k,\ell}}{\frac{1}{v^{2}}+m_{k,\ell}},\frac{1}{\frac{1}{v^{2}}+m_{k,\ell}}\right)$

for $z_{k,\ell}$ the sum of the auxiliary variables over $A_{k,\ell}$ and $m_{k,\ell}=|A_{k,\ell}|$ .

Finally, the posterior full conditional of each community indicator $\xi_{i}$ has the following form:

$\displaystyle\textsf{Pr}(\xi_{i}=k\mid\bm{\xi}_{-i},\bm{z})\propto\begin{cases}(m_{k}^{-i}-a)\displaystyle\prod_{\ell=1}^{K^{-i}}\frac{p(z_{i,j}\mid A^{i}_{k,\ell},m^{i}_{k,\ell})}{p(z_{i,j}\mid A^{-i}_{k,\ell},m^{-i}_{k,\ell})}\,\frac{p(z_{j,i}\mid A^{i}_{k,\ell},m^{i}_{k,\ell})}{p(z_{j,i}\mid A^{-i}_{k,\ell},m^{-i}_{k,\ell})}&\text{if }k\leqslant K^{-i};\\ (b+aK^{-i})\displaystyle\prod_{\ell=1}^{K^{-i}}p(z_{i,j}\mid A^{-i}_{\ell},m^{-i}_{\ell})&\text{if }k=K^{-i}+1,\end{cases}\qquad(6)$

where $K^{-i}=\max_{j:j\neq i}\{\xi_{j}\}$, $m_{k}^{-i}=\sum_{j:j\neq i}\mathbf{1}\left\{\xi_{j}=k\right\}$, $m_{k,\ell}^{-i}=|A_{k,\ell}^{-i}|$, $m_{k,\ell}^{i}=|A_{k,\ell}^{i}|$, and $m_{\ell}^{-i}=|A_{\ell}^{-i}|$, with

$\displaystyle A_{k,\ell}^{-i}=\{(i^{\prime},j^{\prime}):i^{\prime}\neq j^{\prime},\,j^{\prime}\neq i,\,\xi_{i^{\prime}}=k,\,\xi_{j^{\prime}}=\ell\},\quad A_{k,\ell}^{i}=A_{k,\ell}^{-i}\cup\{(i^{\prime},j^{\prime}):i^{\prime}=i,\,\xi_{j^{\prime}}=\ell\},\quad A_{\ell}^{-i}=\{j:j\neq i,\,\xi_{j}=\ell\},$

and finally,

$\displaystyle p(z_{i,j}\mid A,m)=(2\pi)^{-m/2}(1+mv^{2})^{-1/2}\exp\left\{-\frac{1}{2}\left[\frac{h^{2}}{v^{2}}+\sum_{A}z_{i,j}^{2}-\frac{\left(\frac{h}{v^{2}}+\sum_{A}z_{i,j}\right)^{2}}{\frac{1}{v^{2}}+m}\right]\right\}.$
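The block-mean update, the leave-one-out predictive, and the block marginal above are three faces of the same normal-normal conjugate model. The sketch below assumes the prior $\eta_{k,\ell}\sim\textsf{N}(h,v^{2})$, which is the prior implied by the factor $(1+mv^{2})^{-1/2}$ in the marginal (the prior itself is specified outside this appendix); the argument `v2` stands for $v^{2}$ and all function names are hypothetical. As a consistency check, the chain rule says the block marginal must equal the product of sequential predictive densities $\textsf{N}(\tilde{h},\tilde{v}^{2})$.

```python
import numpy as np
from scipy.stats import norm


def eta_posterior(z, h, v2):
    """Posterior mean and variance of a block mean eta ~ N(h, v2) given z_1..z_m ~ N(eta, 1)."""
    z = np.asarray(z, dtype=float)
    precision = 1.0 / v2 + z.size
    return (h / v2 + z.sum()) / precision, 1.0 / precision


def predictive(z_others, h, v2):
    """Mean and variance of another z in the same block, with eta integrated out."""
    mean, var = eta_posterior(z_others, h, v2)
    return mean, 1.0 + var


def log_marginal(z, h, v2):
    """log p(z_1, ..., z_m) for one block, with eta integrated out."""
    z = np.asarray(z, dtype=float)
    m = z.size
    quad = h**2 / v2 + np.sum(z**2) - (h / v2 + z.sum())**2 / (1.0 / v2 + m)
    return -0.5 * m * np.log(2 * np.pi) - 0.5 * np.log(1 + m * v2) - 0.5 * quad


rng = np.random.default_rng(3)
z = rng.normal(0.5, 1.0, size=8)   # latent variables of one hypothetical block
h, v2 = 0.0, 2.0                   # hypothetical prior mean and variance
mean, var = eta_posterior(z, h, v2)
eta_draw = rng.normal(mean, np.sqrt(var))  # one Gibbs draw of eta_{k,l}

# Chain-rule check: p(z_1..z_m) = prod_t p(z_t | z_1..z_{t-1}).
chain = 0.0
for t in range(z.size):
    mu, s2 = predictive(z[:t], h, v2)
    chain += norm.logpdf(z[t], loc=mu, scale=np.sqrt(s2))
print(chain, log_marginal(z, h, v2))
```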

MCMC algorithm: Beta-Binomial Model

The posterior full conditional for the interaction probabilities is

$\displaystyle\theta_{k,\ell}\mid\bm{\xi},\mathbf{Y}\sim\textsf{Beta}(a+y_{k,\ell},\,b+m_{k,\ell}-y_{k,\ell}),$

where $a$ and $b$ are the hyperparameters of the prior distribution of $\theta_{k,\ell}$ , $y_{k,\ell}=\sum_{A_{k,\ell}}y_{i,j}$ , with $A_{k,\ell}=\{(i,j):i\neq j,\xi_{i}=k,\xi_{j}=\ell\}$ , $m_{k,\ell}=|A_{k,\ell}|$ .
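The sufficient statistics $y_{k,\ell}$ and $m_{k,\ell}$ can be computed for all blocks at once with a one-hot membership matrix, after which the Gibbs draw of every $\theta_{k,\ell}$ is a single vectorized Beta sample. In this sketch the function names are hypothetical, cluster labels are assumed to run from 0 to $K-1$, and `alpha`, `beta` play the role of the Beta hyperparameters $a$ and $b$ in the text.

```python
import numpy as np


def block_counts(Y, xi, K):
    """y[k, l]: links from cluster k to cluster l; m[k, l]: ordered pairs (i, j), i != j."""
    Y = np.array(Y, dtype=int)
    np.fill_diagonal(Y, 0)
    Z = np.eye(K, dtype=int)[xi]                 # n x K one-hot memberships, xi in 0..K-1
    y = Z.T @ Y @ Z
    sizes = Z.sum(axis=0)
    m = np.outer(sizes, sizes) - np.diag(sizes)  # drop self-pairs in diagonal blocks
    return y, m


def sample_theta(Y, xi, K, alpha, beta, rng):
    """One Gibbs draw of theta_{k,l} ~ Beta(alpha + y_kl, beta + m_kl - y_kl)."""
    y, m = block_counts(Y, xi, K)
    return rng.beta(alpha + y, beta + m - y)


Y = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
xi = np.array([0, 0, 1, 1])
y, m = block_counts(Y, xi, K=2)
theta = sample_theta(Y, xi, 2, alpha=1.0, beta=1.0, rng=np.random.default_rng(0))
print(y, m, theta, sep="\n")
```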

The posterior full conditional of $\bm{\xi}$ has the same form as Eq. (6), with $z_{i,j}$ replaced by $y_{i,j}$, and the predictive distribution of $y_{i,j}$ after integrating over $\theta_{k,\ell}$ is now given by

$\displaystyle p(y_{i,j}\mid A,m)=\frac{\Gamma(a+b)\,\Gamma(a+\sum_{A}y_{i,j})\,\Gamma(b+m-\sum_{A}y_{i,j})}{\Gamma(a)\,\Gamma(b)\,\Gamma(a+b+m)}.$
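A collapsed Gibbs update of one indicator $\xi_i$ for the Beta-Binomial model can be sketched by combining the Poisson-Dirichlet prior weights of Eq. (6), $(m_k^{-i}-a)$ for an existing cluster and $(b+aK^{-i})$ for a new one, with the block marginal above, which equals $B(a+y_{k,\ell},\,b+m_{k,\ell}-y_{k,\ell})/B(a,b)$. For clarity this sketch recomputes the full marginal likelihood $p(\mathbf{Y}\mid\bm{\xi})$ for every candidate cluster instead of using the ratio form of Eq. (6); the two give the same normalized probabilities, but the ratio form is much cheaper. The function names are hypothetical, and the PDP parameters are renamed `pdp_a`, `pdp_b` to avoid clashing with the Beta hyperparameters `alpha`, `beta`.

```python
import numpy as np
from scipy.special import betaln


def log_marginal_blocks(Y, xi, alpha, beta):
    """log p(Y | xi) with every theta_{k,l} ~ Beta(alpha, beta) integrated out."""
    Y = np.array(Y, dtype=int)
    np.fill_diagonal(Y, 0)
    _, xi = np.unique(xi, return_inverse=True)      # relabel clusters as 0..K-1
    Z = np.eye(xi.max() + 1, dtype=int)[xi]
    y = Z.T @ Y @ Z                                 # links between clusters
    sizes = Z.sum(axis=0)
    m = np.outer(sizes, sizes) - np.diag(sizes)     # ordered pairs with i != j
    return float(np.sum(betaln(alpha + y, beta + m - y) - betaln(alpha, beta)))


def gibbs_update_node(i, Y, xi, alpha, beta, pdp_a, pdp_b, rng):
    """Collapsed Gibbs update of xi[i] under a PDP(pdp_a, pdp_b) prior on the partition."""
    xi = np.array(xi)
    others = np.delete(xi, i)
    existing, counts = np.unique(others, return_counts=True)
    candidates = np.append(existing, others.max() + 1)  # last entry = new cluster
    # PDP prior weights: (m_k - a) for an existing cluster, (b + a K) for a new one.
    log_w = np.log(np.append(counts - pdp_a, pdp_b + pdp_a * existing.size))
    for c, k in enumerate(candidates):
        xi[i] = k
        log_w[c] += log_marginal_blocks(Y, xi, alpha, beta)
    probs = np.exp(log_w - log_w.max())
    xi[i] = rng.choice(candidates, p=probs / probs.sum())
    return xi


# Toy network: two groups of three traders; trader 5 starts in the wrong group.
Y = np.kron(np.eye(2, dtype=int), np.ones((3, 3), dtype=int))
np.fill_diagonal(Y, 0)
rng = np.random.default_rng(0)
draws = [gibbs_update_node(5, Y, [0, 0, 0, 1, 1, 0], 1.0, 1.0, 0.1, 1.0, rng)[5]
         for _ in range(20)]
print(draws)
```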

Notation

The cardinality of a set $A$ is denoted by $|A|$ . If P is a logical proposition, then $\mathbf{1}\left\{\text{P}\right\}=1$ if P is true, and $\mathbf{1}\left\{\text{P}\right\}=0$ if P is false. $\lfloor x\rfloor$ denotes the floor of $x$ , whereas $[n]$ denotes the set of all integers from 1 to $n$ , i.e., $\{1,\ldots,n\}$ . The Gamma function is given by $\Gamma(x)=\int_{0}^{\infty}u^{x-1}e^{-u}\text{d}u$ .

Matrices and vectors with entries consisting of subscripted variables are denoted by a boldfaced version of the letter for that variable. For example, $\bm{x}=(x_{1},\ldots,x_{n})$ denotes an $n\times 1$ column vector with entries $x_{1},\ldots,x_{n}$ . We use $\bm{0}$ and $\bm{1}$ to denote the column vector with all entries equal to 0 and 1, respectively, and $\mathbf{I}$ to denote the identity matrix. A subindex in this context refers to the corresponding dimension; for instance, $\mathbf{I}_{n}$ denotes the $n\times n$ identity matrix. The transpose of a vector $\bm{x}$ is denoted by $\bm{x}^{\textsf{T}}$ ; analogously for matrices. Moreover, if $\mathbf{X}$ is a square matrix, we use $\mathsf{tr}(\mathbf{X})$ to denote its trace and $\mathbf{X}^{-1}$ to denote its inverse. The norm of $\bm{x}$ , given by $\sqrt{\bm{x}^{\textsf{T}}\bm{x}}$ , is denoted by $\|\bm{x}\|$ .

References

Airoldi, E.M., Blei, D.M., Fienberg, S.E., & Xing, E.P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research.

Betancourt, Zanella, Miller, J.W., Wallach, Zaidi, & Steorts, R.C. (2016). Flexible models for microclustering with application to entity resolution. Advances in Neural Information Processing Systems, 29.

Blei, D.M., Kucukelbir, & McAuliffe, J.D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859-877.

Erdős, & Rényi (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1), 17-60.

Frank, & Strauss (1986). Markov graphs. Journal of the American Statistical Association, 81(395), 832-842.

Gamerman, & Lopes, H.F. (2006). Markov chain Monte Carlo: Stochastic simulation for Bayesian inference. CRC Press.

Held, & Holmes (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1), 145-168.

Hoff, P.D., Raftery, A.E., & Handcock, M.S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460), 1090-1098.

Holland, & Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373), 33-50.

Jain, & Neal (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1), 158-182.

Kemp, Tenenbaum, J.B., Griffiths, T.L., Yamada, & Ueda (2006). Learning systems of concepts with an infinite relational model. In AAAI, Vol. 3, p. 5.

Kolaczyk, E.D., & Csárdi (2020). Statistical analysis of network data with R, Vol. 65. Springer.

Neal, R.M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249-265.

Newman (2003). The structure and function of complex networks. SIAM Review, 45(2), 167-256.

Nowicki, & Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077-1087.

Pitman, & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 855-900.

Robins, Pattison, Kalish, & Lusher (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2), 173-191.

Sampson, S.F. (1968). A novitiate in a period of change: An experimental and case study of social relationships. Cornell University.

Snijders (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 1-40.

Sosa, & Buitrago (2021). A review of latent space models for social networks. Revista Colombiana de Estadística, 44(1), 171-200.

Wang, Y.J., & Wong, G.Y. (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397), 8-19.

Wasserman, & Faust (1994). Social network analysis: Methods and applications. Cambridge University Press.

White, H.C., Boorman, S.A., & Breiger, R.L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730-780.

Tresp, Kriegel, H.-P., et al. (2006). Learning infinite hidden relational models. Uncertainty in Artificial Intelligence (UAI2006), 2.