The global shape of a protein molecule is believed to be dominant in determining low-frequency deformational motions. However, how structure dynamics relies on residue interactions remains largely unknown. The global residue community structure and the local residue interactions are two important coexisting factors imposing significant effects on low-frequency normal modes. In this work, an algorithm for community structure partition is proposed by integrating Miyazawa–Jernigan empirical potential energy as edge weight. A sensitivity parameter is defined to measure the effect of local residue interaction on low-frequency movement. We show that community structure is a more fundamental feature of residue contact networks. Moreover, we surprisingly find that low-frequency normal mode eigenvectors are sensitive to some local critical residue interaction pairs (CRIPs). A fair amount of CRIPs act as bridges and hold distributed structure components into a unified tertiary structure by bonding nearby communities. Community structure analysis and CRIP detection of 116 catalytic proteins reveal that breaking up of a CRIP can cause low-frequency allosteric movement of a residue at the far side of protein structure. The results imply that community structure and CRIP may be the structural basis for low-frequency motions.
1. Introduction
Protein functionality strongly relies on conformational changes. Theoretical and experimental works have revealed that protein molecule dynamics spontaneously rises from the hierarchical organization of amino acids (Torchia and Ishima, 2003; Bahar et al., 2007; Moritsugu et al., 2015). In the protein molecule, a number of residues are in close contact and form clusters under the influence of inter-residue interactions, such as hydrophobic effect, hydrogen bonding, van der Waals and electrostatic force (Sun and He, 2010). Beyond the cluster level, protein structure forms a 3D network with residues as nodes and interactions as edges.
Conformational motion of residue network occurs over a wide range of spatial and temporal scales. Studies suggest that structure dynamics can be regarded as superposition of different modes, from local oscillations of side chains at subpicosecond time scale to collective movements of larger segments at nanosecond time scale or longer (Simonson and Perahia, 1992; Kornev and Taylor, 2015). The collective movement, such as slow conformational changes in catalysis, may require microseconds to seconds. Many biological processes depend on large-scale collective motions (Kurkcuoglu et al., 2009; Hong et al., 2016; Smith et al., 2016). Unfortunately, the relationship between global allosteric movement and local residue interactions has not been fully explained. The mechanism underlying the formation of collective synchronization behaviors from locally coupled oscillating residues remains largely unknown.
It has been suggested that low-frequency movement mainly depends on the shape of macromolecules, rather than the local connectivity in residue contact networks (Lu and Ma, 2005). As a matter of fact, protein molecule structures are organized hierarchically from residue clusters to networks (Sun and He, 2010). In the residue network, a group of amino acids contact with each other and form a community. Recent researches have highlighted the potential roles of community structure as a framework for complex networks (Redner, 1998; Vespignani, 2003; Mina et al., 2009; Shen, 2013). However, few studies have focused on community structures of the residue contact network, which is crucial to the understanding of protein molecule dynamics.
Residue communities in native protein structure usually share bridging residues (Sun and He, 2010). Structure components (such as alpha helix and beta sheet) are not isolated from each other for the existence of such bridging residues. A number of critical residue interaction pairs (CRIPs) play an essential role in binding secondary structure and stabilizing tertiary structure. However, some, but not all, CRIPs are bridging residues. The mechanisms by which these CRIPs affect low-frequency movement are just starting to be understood, and many of the important details have yet to be uncovered.
In this work, we attempt to relate low-frequency motion of protein structure to the interplay between residue network topology and local interaction stiffness of residue pairs. The goal is to partition the amino acid network into different communities. Meanwhile, we aim to investigate the influence of CRIPs on protein structure dynamics by using the normal mode analysis method. Finally, a set of catalytic proteins are used to verify the reliability of the proposed method.
2. Methods
2.1. Amino acid contact criteria and interaction network
In the protein structure, two residues contact with each other when they are close enough in space. The interaction-based contact definition is usually replaced by distance-based contact criteria. A popular contact distance criterion is 7.0 Å between \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${C_ \alpha }$$
\end{document} atoms (Tudos et al., 1994; Ravasz et al., 2002; Atilgan et al., 2004; Barah and Sinha, 2008). However, significant errors have been observed by using a single fixed cutoff distance value (Sun and He, 2011), especially for residue pairs with bulky side chains. A cutoff distance criterion incorporating chain anisotropy can greatly improve the residue contact determination.
In this study, we use the atom distance criterion (ADC) model to judge residue contact relationship (Sun and He, 2011). Consider atom i in residue A and atom j in residue B, if the distance \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${r_{ij}}$$
\end{document} between atoms i and j satisfies the criterion \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${r_{ij}} < r_{cnt}^{ij}$$
\end{document}, residues A and B are in contact. The atom contact cutoff distance \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r_{cnt}^{ij}$$
\end{document} depends on amino acid type. The values of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r_{cnt}^{ij}$$
\end{document} for all types of residue contact pairs are given by Sun and He (2011).
In the normal mode analysis (NMA) method, inter-residue stiffness is usually a constant value or a function of the atom distance. In fact, the force field between residues is associated with amino acid contact energy. Here, we define elastic constant \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma}$$
\end{document} as follows:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{ \gamma _ { mn } } = \left( { \frac { e_ { mn } ^ { \max } - { e_ { mn } } } { e_ { mn } ^ { \max } - e_ { mn } ^ { \min } } } + 1 \right) { \gamma _0 } \tag { 1 }
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${e_{mn}}$$
\end{document} is the Miyazawa and Jernigan contact energy related to residue types (Miyazawa and Jernigan, 1996). The subscripts m and n represent amino acid types of residues A and B. The maximal and minimal values of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${e_{mn}}$$
\end{document} are \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$e_{mn}^{ \max }$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$e_{mn}^{ \min }$$
\end{document}. \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma _0}$$
\end{document} is a force constant reflecting average residue interaction stiffness, which is usually set to 1.0. With the new force constant \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma _{mn}}$$
\end{document}, the residue network dynamics explicitly relies on protein sequence.
2.2. Quantify community structure in residue networks
Identifying community structure in residue network consists of two steps, including generating a dendrogram and breaking the dendrogram into communities. First, we find out all maximal cliques in the network by the Bron-Kerbosch algorithm (Bron and Kerbosch, 1973; Calzals and Karande, 2008). Each maximal clique is taken as an initial community. Then, we calculate the similarity between each pair of communities and select the pair of communities with the maximum similarity, combine them into a new one, then calculate the similarity between the new communities and other communities until only one community remains.
The second step is to cut the dendrogram. A measurement extended-modularity quantity is defined (Shen, 2013):
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
E = \frac { 1 } { L } \mathop \sum \limits_i { \mathop \sum \limits_ { v \in { C_i } , w \in { C_i } } { \frac { 1 } { { { O_v } { O_w } } } } } \left( { { A_ { vw } } - \frac { { { k_v } { k_w } } } { L } } \right) , \tag { 2 }
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${A_{vw}}$$
\end{document} is the element of adjacency matrix of the network. Here, we construct \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${A_{vw}}$$
\end{document} by using the fractional contact parameter γmn. The quantity \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$L = \sum \nolimits_{vw} {{A_{vw}}}$$
\end{document} is the total number of edges in the network and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${k_v} = \mathop \sum \limits_w {{A_{vw}}}$$
\end{document} is the degree of node v. The larger the value of E, the better the cut for the cover. \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O{}_v$$
\end{document} is the number of communities that node v belongs to. Maximal E means the number of communities the network can be divided into. Thus, we use it to cut through the dendrogram.
For a cover of network, however, a node may belong to more than one communities. With the belonging coefficient \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \alpha _{vc}}$$
\end{document}, which reflects how much the node v belongs to the community c, the quantity of a cover c can be measured by (Shen, 2013)
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
Q = \frac { 1 } { L } \mathop \sum \limits_ { c \in C } { \mathop \sum \limits_ { vw } { { \alpha _ { vc } } } } { \alpha _ { wc } } \left( { A_ { vw } } - \frac { { { k_v } { k_w } } } { L } \right). \tag { 3 }
\end{align*}
\end{document}
The belonging coefficient should satisfy a normalization property. This property is formally written as 0≤αvc≤1, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\forall v \in V , { \forall _{}}c \in C$$
\end{document}, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathop \sum \limits_{c \in C} {{ \alpha _{vc}}} = 1$$
\end{document}.
Here, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ { \alpha _ { vc } } = \frac { 1 } { { { \alpha _v } } } \mathop \sum \limits_ { w \in V ( c ) } { { \frac { O_ { vw } ^c } { { O_ { vw } } } } } { A_ { vw } } $$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${O_{vw}}$$
\end{document} denote the number of maximal cliques containing the edge (v, w) in the whole network, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$O_{vw}^c$$
\end{document} denotes the number of maximal cliques containing the edge (v, w) in the community c, and αv is a normalization term denoted as (Shen, 2013):
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{ \alpha _v } = \mathop \sum \limits_ { c \in C } { \mathop \sum \limits_ { w \in V ( c ) } { { \frac { O_ { vw } ^c } { { O_ { vw } } } } { A_ { vw } } } } . \tag { 4 }
\end{align*}
\end{document}
2.3. Detect the CRIP pair
The low-frequency normal mode eigenvectors are sensitive to a few local CRIPs. To detect such CRIPs, the inter-residue stiffness coefficients are perturbed to 50% of the original values one by one. The structure of the Hessian matrix is maintained, that is, the inter-residue connectivity will not break down and new connectivity will not emerge. Then, the normal mode vector R is recalculated for each residue interaction perturbation. The difference between perturbed and original mode vectors is normalized as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\delta { \bf{R}} = \left( {{ \bf{R}} - {{ \bf{R}}_0}} \right) { \rm{ / }}{{ \bf{R}}_{ \rm{0}}} \tag{5}
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${{ \bf{R}}_0}$$
\end{document} is the vector before stiffness perturbation. A scaler parameter is defined to measure the sensitivity of normal mode displacement to CRIP
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
< \delta { \bf { R } } > = \sqrt { \frac { 1 } { n } \mathop \sum \limits_ { i = 1 } ^n { \delta R_i^2 } } , \tag { 6 }
\end{align*}
\end{document}
where n is the dimension of δR and δRi is the element of δR. Residues A and B are an interacting (contact) pair if distance \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${r_{ij}}$$
\end{document} between atom i in residue A and atom j in residue B is less than \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r_{cnt}^{ij}$$
\end{document}. The interaction stiffness \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma _{mn}}$$
\end{document} between residues A and B is defined as Eqn. (1). In a residue network, we perturb each of the stiffness values one by one by replacing \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma _{mn}}$$
\end{document} with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \gamma _{mn}}$$
\end{document}/2.
3. Data Set and Results
We selected 116 protein data bank (PDB) structures from the Catalytic Site Atlas database (Furnham et al., 2014), a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure. A heme oxygenase (PDB ID:1DVE) enzyme and a ribonuclease (PDBID: 1BOL) are taken as examples.
3.1. The hierarchical and overlapping community structure of protein 1DVE
The residue contact matrix of protein 1DVE is constructed with the ADC model. After finding out all maximal cliques, the similarity parameter is calculated and the dendrogram of 1DVE residue interaction networks is clustered. Therefore, we calculate the E value for community number from 0 to 20 (Fig. 1), which shows that the maximal E is achieved at community number 6.
The community cluster of residue interaction network in 1DVE. (a) The network is divided into six communities. (b) The leaf nodes correspond to best cover communities. The label of each leaf node is the numerical order of different communities.
As community structure is extracted, different colors are used to distinguish the six communities. The residues that only belong to one single community are denoted in circular nodes. Those residues shared by at least two communities are shown as squares. The larger square indicates that this node is shared by more communities.
Figure 2 shows that more than half of the residues are in a single community. Among all 214 residues, 89 residues are shared by at least two communities. The residues in a single community are connected to each other more closely. It should be noted that in the case of the residue network, we find that local communities more or less share residues with each other and it is nearly impossible to partition the network into absolutely isolated communities.
Community structures of protein 1DVE are shown in different colors. The shared residues by multiple communities are shown as squares. The red edges denote CRIPs. Ncmn is the community number. Ncnt is the number of edges. Nnodeovl is the number of nodes shared by at least two communities. Red edges denote the CRIPs. The markers with a thick red edge denote catalytic residues. CRIP, critical residue interaction pair.
The emerging community structure provides a quantitative measurement of the tertiary structure organization. An interesting observation is that the top 1 significant CRIP (ALA165-GLY161) crosses different communities (Fig. 3). For example, ALA165 and GLY161 are the two residues forming CRIP ALA165-GLY161. ALA165 is shared by communities 3 and 5. GLY161 is also shared by communities 3 and 5. The CRIP is a link between bulky residue groups and transfer momentum from one community to the other.
The structure of catalytic protein 1DVE. Catalytic sites are shown in solid surface style. CRIPs are shown in wireframe surface style. Catalytic residue CRIPS are colored according to residue type. Different communities are shown in different colors.
As a matter of fact, such intercommunity bridges can be regarded as pathways channeling kinetic energy throughout the whole protein molecule. It is reasonable to expect that the vibrations will be locked inside the isolated community if such CRIPs break down. Recent studies have suggested that localized vibrations may play an active role in enzyme catalysis (Sitnitsky, 2006). Discrete breathers also have been predicted as localized modes in the nonlinear network model of proteins (Juanico et al., 2007). In this sense, variation of such local intercommunity CRIPs will obviously lead to substantial changes in collective movements of protein structure.
Figure 4 confirms remarkable changes of the eighth mode displacement caused by the stiffness variation of CRIP ALA165-GLY161. It can be observed that a 50% reduction in interaction stiffness between ALA165 and GLY161will lead to a remarkable change (about 142% of the original mode, see Fig. 4b) in the low-frequency mode displacements. The maximal model variation even reaches more than 1335% (Fig. 4a).
Remarkable changes of the eighth mode displacement caused by the stiffness variation of CRIP ALA165-GLY161. (a) The normalized eighth mode displacement difference after the stiffness of CRIP ALA165-GLY161 reduced by 50%. The normal vector value at HIS119 increased to about 1335%. (b) The average mode displacement difference for all the CRIPs detected by perturbation of residue–residue interaction stiffness.
A surprising observation is that the largest value of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\delta { \bf{R}}$$
\end{document} occurs at residue HIS119, which is on the far opposite side of the 3D structure. The large \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\delta { \bf{R}}$$
\end{document} indicates that the motion of residue HIS119 is highly sensitive to the interaction stiffness between residues in pair ALA165-GLY161, which is far from HIS119. It is not unreasonable, therefore, to believe that there are hidden long-range interactions between different residues. Change of local interactions in one CRIP of residue networks may lead to significant movement in another site that is not explicitly associated with the CRIP.
In addition to ALA165-GLY161, many other CRIPs can be detected (Fig. 4b), which strongly indicates that the low-frequency normal mode displacement not only depends on the global topology structure of the protein molecule (such as community structure) but also on local CRIPs connecting communities in residue networks.
3.2. The community structure and CRIPs of 1BOL
The community structure of ribonuclease 1BOL (PDB ID) is analyzed and the CRIPs are detected. 1BOL contains helix and beta-sheets (Fig. 5). The catalytic sites His46, Glu105, and His109 are closely connected residues between a helix and a beta-strand. The results indicate that 1BOL is partitioned into four communities (Fig. 6). In the residue network, communities are shown in different colors (Fig. 7).
The structure of catalytic protein 1BOL. Catalytic sites are shown in solid surface style. CRIPs are shown in wireframe surface style. Catalytic residue CRIPS are colored according to residue type. Different communities are shown in different colors.
The community cluster of residue interaction networks in 1BOL. (a) The network is divided into four communities. (b) The leaf nodes correspond to best cover communities. The label of each leaf node is the numerical order of different communities.
Community structures of protein 1BOL are shown in different colors. The shared residues by multiple communities are shown as squares. The red edges denote CRIPs. Ncmn is the community number. Ncnt is the number of edges. Nnodeovl is the number of nodes shared by at least two communities. Red edges denote the CRIPs. The markers with a thick red edge denote catalytic residues.
Ncnt is the number of edges. Nnodeovl is the number of nodes shared by at least two communities. Red edges denote the CRIPs. The markers with a thick red edge denote catalytic residues.
Figure 8 shows the changes of the eighth mode displacement caused by the stiffness variation of CRIP SER209-ALA178. Here, eighth mode means the second mode after extracting six rigid modes. The root mean square difference <δR> = 41% is observed in the low-frequency mode displacements. The maximal model variation even reaches more than 225%. Although the change of molecule structure motion is not as large as that of 1DVE, the importance of CRIPs can still be identified (Fig. 8b). This example again confirms that not only the global topology but also CRIPs play essential roles in determining the movement of protein structures.
Changes of the eighth mode displacement caused by the stiffness variation of CRIP SER209-ALA178. (a) The normalized eighth mode displacement difference after the stiffness of CRIP SER209-ALA178 reduced by 50%. (b) The average mode displacement difference for all the CRIPs detected by perturbation of residue–residue interaction stiffness.
3.3. Statistics of community structure and CRIPs of 116 catalytic proteins
We analyze the community structures for all the 116 catalytic proteins. The community number Ncmn values are shown in Figure 9. Figure 9 indicates that the maximal community number is 9 in the current PDB set. Although the data points are widely distributed, it should be noted that there is a roughly increasing trend between community number and protein chain length naa.
Community number versus protein chain length of 116 catalytic proteins.
An interesting observation is that among PDBs with similar size, the PDB with a compact structure (smaller Rg) has fewer communities. In Figure 9, the chain lengths of 1K4L and 1GLO are 216 and 217, respectively. However, the community numbers of 1K4L and 1GLO are quite different. Detailed investigations of PDB structures show that 1K4L has a long tail and the whole structure is relatively loose (with a radius of gyration around 16.7 Å). The structure of 1GLO is quite similar to the main body of 1K4L except for the long tail. 1GLO has a smaller Rg of 15.8 Å. It is possible that the higher number of communities in 1K4L (Ncmn = 9) is a consequence of incompact structure.
If two residues forming a CRIP are in the same community, it is an intracommunity CRIP. Otherwise, it is an intercommunity CRIP. Figure 10 shows that 71% of the 116 PDBs have an intercommunity residue pair, which has essential influence on the low-frequency movement of the whole structure.
The percentage of proteins with intercommunity CRIP in the top three CRIPs in a set of 116 protein data bank (PDB) structures.
In the case of residue mutation, the local interaction between residue pairs may change significantly. If mutation occurs in the CRIP, the whole structure may undergo a sharp transformation due to long-range interactions through the residue network. The residue with prominent displacement, which we call target residue here, is usually far from the CRIP. For the 116 PDBs, we calculated the relative position of target residue and CRIP. Taking the center of protein structure as the origin point x0, we construct two vectors v1 and v2 representing the target residue and CRIP, respectively.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{{ \bf{v}}_1} = {{ \bf{x}}_t} - {{ \bf{x}}_0} , \tag{7}
\end{align*}
\end{document}\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{{ \bf{v}}_2} = {{ \bf{x}}_{CRIP}} - {{ \bf{x}}_0} , \tag{8}
\end{align*}
\end{document}
where xt is the center position of target residue and xCRIP is the center position of CRIP. The angle θ between v1 and v2 is in the range of [0, 180°]. The length of v1 is normalized by Rg. Angle θ and length of v1 are shown in polar coordinates (Fig. 11).
The relative position between the top 1 CRIP and target residue with most significant movement variations for 116 catalytic proteins.
Figure 11 shows that the angle between target residue vector and CRIP vector is in the range of [0, 180°], while the length of target protein is uniformly distributed between Rg/2 and 1.5 × Rg. When θ is about 180°, the target residue is on the opposite side of the protein structure. In such case, we find that a kick to the CRIP causes a significant low-frequency movement at the other side of the protein. This may be a possible mechanism behind significant conformational changes of the transmembrane protein upon binding of a signal molecule.
To figure out the effect of a CRIP on target residues, we calculated low-frequency movement variations after CRIP stiffness decreases to 50%. Here, the movement of eighth normal mode was used. The movement change δRCRIP of the community containing the top 1 CRIP is calculated. In the same way, we calculated δRtarget of the community with the target residue. It is believed that a significant value of δRtarget/δRCRIP is direct evidence that high correlations exist between target residue and CRIP. Here, we checked the target residue communities with δRtarget/δRCRIP ≥0.9.
The histogram of δRtarget/δRCRIP for 116 PDBs is shown in Figure 12. Among all 116 proteins, we find that around 87% proteins have δRtarget/δRCRIP ≥0.9. This result indicates that collective movements of the target cluster have high correlation with that of CRIP. Among these proteins, the movement variation of the community containing the target residue is at least no less than that of the CRIP community and is, at most, 11 times the movement of the CRIP community.
The histogram of normalized movement changes of target residue for 116 catalytic proteins.
We also calculated histograms for other low-frequency modes. Table 1 shows the movement variation ratios for normal modes 7–10. For all these low-frequency modes, nearly 90% proteins in the current data set have high movement variation ratios (≥0.9) between the target community and CRIPs. The results clearly indicate that conformational transformations at target residues closely correlate with the decrease of CRIP stiffness.
Movement Variation Ratio Between Target Residue Community and Critical Residue Interaction Pair
Movement variation ratio
Mode 7 (%)
Mode 8 (%)
Mode 9 (%)
Mode 10 (%)
δRtarget/δRCRIP ≥0.9
89
87
89
90
δRtarget/δRCRIP <0.9
11
13
11
10
CRIP, critical residue interaction pair.
4. Conclusions
In summary, this work shows that community structure is a fundamental feature of the residue contact network in a protein molecule. The residues in the local community have more connection to each other. Such communities usually form cores of the protein molecule. In residue networks, communities overlap with each other and share many common residues. Intercommunity residue pairs act as bridges connecting bulky residue groups. Normal mode analysis of residue networks with perturbed local interaction stiffness shows that the most important CRIPs usually bridge different communities. Low-frequency normal mode displacements are sensitive to such CRIPs, suggesting that the newly developed method has a remarkable advantage in exploring the effect of global topology and local stiffness on low-frequency movements. In addition, we show that community structure analysis method is functionally reliable in terms of network partition for the current PDB set containing 116 catalytic proteins.
Footnotes
Acknowledgment
This research was sponsored by the C.C. Lin specific fund.
Author Disclosure Statement
No competing financial interests exist.
References
1.
AtilganA.R., AkanP., and BaysalC.2004. Small-world communication of residues and significance for protein dynamics. Biophys. J., 86, 85–91.
2.
BaharI., ChennubhotlaC., and TobiD.2007. Intrinsic dynamics of enzymes in the unbound state and relation to allosteric regulation. Curr. Opin. Struct. Biol., 17, 633–640.
3.
BarahP., and SinhaS.2008. Analysis of protein folds using protein contact networks. Pramana J. Phys. 71, 369–378.
4.
BronC., and KerboschJ.1973. Finding all cliques of an undirected graph [H]. Commun. ACM, 16, 575–577.
5.
CalzalsF., and KarandeC.2008. A note on the problem of reporting maximal cliques. Theor. Comp. Sci., 407, 564–568.
6.
FurnhamN., HollidayG.L., de BeerT.A.P., et al.2014. The Catalytic Site Atlas 2.0: Cataloging catalytic sites and residues identified in enzymes. Nucl. Acids Res., 42, D485–D489.
7.
HongL., JainN., ChengX., et al.2016. Determination of functional collective motions in a protein at atomic resolution using coherent neutron scattering. Sci. Adv. 2, e1600886.
8.
JuanicoB., SanejouandY.H., PiazzaF., et al.2007. Discrete breathers in nonlinear network models of proteins. Phys. Rev. Lett. 99, 238104.
9.
KornevA.P., and TaylorS.S.2015. Dynamics-driven allostery in protein kinases. Trends Biochem. Sci., 40, 628–647.
10.
KurkcuogluO., KurkcuogluZ., DorukerP., et al.2009. Collective dynamics of the ribosomal tunnel revealed by elastic network modeling. Proteins, 75, 837–845.
11.
LuM.Y., and MaJ.P.2005. The role of shape in determining molecular motions. Biophys. J., 89, 2395–2401.
12.
MinaZ., Keivan AghababaeiS., and Gholam RezaO.2009. Complex eigenvectors of network matrices give better insight into the community structure. J. Stat. Mech. Theor. Exp. 2009, P10018.
13.
MiyazawaS., and JerniganR.L.1996. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol., 256, 623–644.
14.
MoritsuguK., KoikeR., YamadaK., et al.2015. Motion tree delineates hierarchical structure of protein dynamics observed in molecular dynamics simulation. PLoS One, 10, e0131583.
15.
RavaszE., SomeraA.L., MongruD.A., et al.2002. Hierarchical organization of modularity in metabolic networks. Science, 297, 1551–1555.
16.
RednerS.1998. How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B, 4, 131–134.
17.
ShenH.-W.2013. Community Structure of Complex Networks. Springer Berlin, Heidelberg.
18.
SimonsonT., and PerahiaD.1992. Normal modes of symmetric protein assemblies. Application to the tobacco mosaic virus protein disk. Biophys. J., 61, 410–427.
19.
SitnitskyA.E.2006. Dynamical contribution into enzyme catalytic efficiency. Physica A, 371, 481–491.
20.
SmithC.A., BanD., PratiharS., et al.2016. Allosteric switch regulates protein–protein binding through collective motion. Proc. Natl Acad. Sci. 113, 3269–3274.
21.
SunW., and HeJ.2010. Understanding on the residue contact network using the log-normal cluster model and the multilevel wheel diagram. Biopolymers, 93, 904–916.
22.
SunW., and HeJ.2011. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One, 6, e19238.
23.
TorchiaD.A., and IshimaR.2003. Molecular structure and dynamics of proteins in solution: Insights derived from high-resolution NMR approaches. Pure Appl. Chem. 75, 1371–1381.
24.
TudosE., FiserA., and SimonI.1994. Different sequence environments of amino-acid-residues involved and not involved in long-range interactions in proteins. Int. J. Peptide Protein Res., 43, 205–208.