Abstract
In this minireview, we focus on advances in our knowledge of the human erythrocyte proteome and interactome that have occurred since our seminal review on the topic published in 2007. As will be explained, the number of unique proteins has grown from 751 in 2007 to 2289 as of today. We describe how proteomics and interactomics tools have been used to probe critical protein changes in disorders impacting the blood. The primary example used is the work done on sickle cell disease where biomarkers of severity have been identified, protein changes in the erythrocyte membranes identified, pharmacoproteomic impact of hydroxyurea studied and interactomics used to identify erythrocyte protein changes that are predicted to have the greatest impact on protein interaction networks.
Introduction
Over five years ago we published, in this journal, the first review written on the erythrocyte proteome including the first attempts to look at the red blood cell interactome and how it changes in the case of sickle cell disease (SCD). 1 At that time, there had been 751 unique proteins that were demonstrated to be part of the erythrocyte proteome based on studies from several laboratories including our own.2–5 Today, that number stands at 2289 proteins, and several computational and mathematical techniques have been utilized to understand the resulting interactome and how it is impacted by proteomic changes in blood disorders such as SCD. In this minireview, we summarize the advances over the period of 2007–2012 in both the proteomics and interactomics of the human erythrocyte and how they are impacted by SCD. The aim of the study was to give our view of the flow of this field and, due to the page and reference limitations of a minireview, it will not be an exhaustive review of the entire literature on this subject. We apologize in advance to numerous authors of articles that we will not be able to discuss or reference and refer the reader to several reviews that followed our seminal review, on this subject.6–16
Advances in erythrocyte proteomics
The erythrocyte proteome in 2007 had 751 unique proteins.
1
We listed them in a table and produced the very first erythrocyte interactome maps which included a repair or destroy (ROD) box that contained a dense set of interconnected and clustered nodes, including proteasomal, chaperonin and heat shock proteins.
1
The presence of proteasomes in mature erythrocytes, first suggested by proteomic studies,
3
was confirmed when Neelam et al. recently demonstrated functional 20 S proteasomes within these cells.
17
The 751 proteins listed came primarily from three studies, each of which brought sequential improvements in methodology and/or technology over the previous one. Low et al.
2
in 2002 used a two-dimensional IEF-SDS PAGE followed by in gel trypsin digestion and MALDI TOF mass spectrometry (MS) and looked only at erythrocyte membrane proteins. They identified 84 unique proteins. Our own study followed in 2004 and used a well-established set of cell biological approaches to expose various membrane surfaces and membrane skeletal proteins, as well as separating cytosolic proteins by gel filtration, prior to digestion with trypsin in solution.
3
We then utilized reverse phase HPLC coupled to a ThermoFinnigan LCQ DECA XP Ion Trap Tandem MS to identify proteins. The result of this next step up in methodology and technology was the identification of 181 unique membrane and cytosolic proteins. The third study in 2006 was by Pasini et al.
4
in which the jump in proteins identified was based on the availability of more sensitive technology. They used a similar methodology to that used by Kakhniashvili et al.,
3
but coupled it to very sensitive and accurate mass spectrometers which had become commercially available (Applied Biosystems Quadropole TOF Q-STAR MS and a Thermo Electron hybrid linear ion trap Fourier transform MS with sensitivity down to attomoles and accuracy to 1 ppm).
4
The result was identification of 566 unique membrane and cytosolic proteins.
4
In total, these three studies led to the 751 proteins known to be in the erythrocyte proteome in 2007, which represent one-third of the proteins known to be part of the proteome today (Figure 1).
1
Growth of our knowledge of the erythrocyte proteome from 2002 to 2012
Within one year of the publication of our initial review on this subject, published in Experimental Biology and Medicine, the number of proteins within the erythrocyte proteome had more than doubled to greater than 1800. The greatest jump in the number of erythrocyte proteins within the proteome between 2007 and the end of 2008 was in our knowledge of cytosolic proteins. This growth had been due to a new methodology being applied to the erythrocyte proteome: peptide ligand library and advanced MS.18–21 Using two different combinatorial libraries of hexapeptides in series, and a Thermo Electron LTQ Orbitrap MS, Roux-Dalvai et al. were able to decrease the dynamic range in solution between the most and least prominent proteins and identify 1578 cytosolic proteins. 20 An interesting update of our review by D’Alessandro et al. 12 which was published in 2010 included proteomic studies published after the submission of our review. 1 These studies included the analysis by the combinatorial library Proteominer Technology18–21 and studies on stored blood,22–24 and in total the erythrocyte proteome stood at 1989 distinct gene products in this review. 12 Interestingly, D'Alessandro and colleagues interactive network analysis of this larger database of erythrocytes proteins confirmed the central role that the ROD box 1 plays in maintaining cellular homoeostasis in the face of oxidative stress and added the possibility that the ROD box proteins are functioning as a catalytic ring. 12 The total number of unique erythrocyte proteome proteins based on our analysis of the literature stood at 2082 by the end of the period (Supplementary Table 1 and Figure 1) summarized by D’Alessandro et al. 12 This increase in 1331 proteins represented 58% of the erythrocyte proteome known today and was largely due to the combinatorial library Proteominer Technology expanding our understanding of the erythrocyte cytosolic protein complement (Figure 1).
What has been accomplished between 2010 and today in expanding our knowledge of the erythrocyte proteome? During the past three years, the growth in our knowledge of the human erythrocyte proteome has been mainly in the area of the membrane proteins. The challenges concerning the mining of membrane proteins are well known and chronicled, 25 and relate to high hydrophobicity and isoelectric point of many transmembrane integral membrane proteins, and heavy glycosylation and the low copy of some.
During the past three years since the review by D’Allesandro et al., 12 there have been several important technical advances that have uncovered new membrane proteins. van Gestel et al., by using a two-dimensional blue-native PAGE which separates protein complexes followed by SDS PAGE, were able to see 150 protein spots, from an RBC membrane preparation, which led to the identification of 524 proteins of which only 67 were cytosolic. 26 More recently, this approach has been expanded to a four-dimensional orthogonal electrophoresis system which includes non-denaturing thin layer IEF followed by native-PAGE, to separate protein complexes, followed by denaturing IEF and SDS PAGE to separate individual proteins. 27 This approach allowed the proteomic identification of proteins in six different complexes, the 20 S proteasome, haemoglobin α2β2, haemoglobin α2δ2, carbonic anhydrase, heat shock protein-60 and peryoxiredoxin-2. 27 De Palma et al. 28 starting with erythrocyte membranes first performed trypsin digestion of intact cells, as Kakhniashvili et al. had done, 3 then isolated membranes, solubilized in triton X-100, separated a soluble fraction and a skeletal pellet and trypsin digested both. Each of the three fractions of tryptic peptides were then separated by multidimensional protein identification technology (MudPIT) and then analysed on a linear ion trap LTQ MS. The result was identification of 299 unique proteins of which 211 were identified as being from the membrane. 28 Recently, Speicher and colleagues 29 have separated erythrocyte membrane proteins by 1D SDS PAGE and found that when they cut the gels into 30 uniform slices, and performed in gel digestion of each, they were able to identify 842 unique proteins utilising an LTQ-Orbitrap XL MS. This is interesting as Bosman et al. by using the same 1D SDS PAGE approach with a 7 T linear quadrupole ICR-FT MS identified 257 unique proteins in 2008. 22 Apparently, the critical difference was that Bosman et al. cut their gels into six segments 22 while Pesciotta et al. utilized 30 slices. 29 Bosman et al. by studying the proteome of various age erythrocytes and microparticles isolated from plasma identified 271 proteins in the erythrocyte membranes and 71 proteins in the erythrocyte-derived microparticles. 23
While the studies listed above have been impressive in applying creative proteomic approaches to identifying erythrocyte membrane proteins, the end result is that the erythrocyte proteome grew from 2082 proteins in 2010 (Figure 1) to 2289 proteins today (see Figure 1 and Supplementary Table 1 which provides the entire list with gene codes). The growth of unique proteins within the erythrocyte proteome was only 207 from 2010 to 2012 which represents 9% of our currently identified erythrocyte proteome (Figure 1). This means that either we are nearing the end of the mining of the erythrocyte proteome or that we now must await the next improvements of mass spectrometers with even higher sensitivity and accuracy to mine for those proteins which are present in one to one hundred copy numbers. It is possible that both are true. We believe that there is more work to be done on the low copy number erythrocyte proteome.
Probably the greatest advances over the past three years is the application of our knowledge of the normal erythrocyte proteome and interactomics to gain a better understanding of the molecular basis, identification, severity and stage of disease, and the effect of various therapeutic drugs upon the erythrocyte proteome. While great advances have been made in SCD (discussed next), there has also been major advances in our understanding of proteomic changes in the erythrocyte membrane in malaria,30–36 various hemolytic anaemias37–41 and haemoglobinopathies,1,42,43 cell storage and aging,44–47 Alzheimer’s disease, 48 schizophrenia,49–51 chronic pulmonary disorder, 52 chronic kidney disease 53 and diabetes.54,55
Sickle cell disease
SCD was the first genetic disorder where the precise molecular basis was understood at the protein level. It is an autosomal recessive disorder where the inheritance of two defective β-globin genes results in homozygous sickle cell (SS) disease. Adult haemoglobin is a α2β2 tetramer, and in SCD the α subunits are normal. A point mutation in the β-globin gene on chromosome 11, thymine replaces adenine and causes the single amino acid substitution of valine for glutamic acid in the sixth residue of β-globin. This single amino acid substitution in β-globin creates a sickle cell hemoglobin (HbS) molecule which polymerizes into a 14 stranded polymer when in its deoxy state and depolymerizes when it is well oxygenated. The polymerization of HbS causes the characteristic sickled shape and the polymerization depolymerization cycle leads to reversibly sickled SS erythrocytes. SS subjects also have dense irreversible sickled cells (ISCs) which remain sickled even when HbS is in its oxygenated depolymerized form. First, we will discuss changes in the erythrocyte membrane skeleton, which lead to the ISC, and then the process of vasooclusion, as a prelude to the discussion of the use of proteomics in understanding SCD (review1,56).
We demonstrated that reversible oxidative damage to actin and diminished ubiquitination of spectrin leads to ISC.57–65 The formation of a disulphide bridge between Cys 284 and Cys 373 in β-actin leads to an actin filament that will not depolymerize at 37℃.57–59 Goodman and colleagues have demonstrated that spectrin is a chimeric E2/E3 ubiquitin ligase which can ubiquitinate itself and several other membrane skeletal proteins.66–70 Spectrin’s E2/E3 ubiquitin conjugating/ligating activity is diminished in SCD with the result being 50–90% decrease in α-spectrin ubiquitination in repeat units 20/21.62,65 We further demonstrated that ubiquitination regulates the dissociation of the spectrin-4.1-actin 62 and spectrin–adducin–actin ternary complex, 63 and that reduced ubiquitination leads to ternary complexes that dissociate poorly at 37℃.62,63,71 We demonstrated that antioxidants that raise the reduced glutathione levels within RBCs can block the formation of dense ISCs in vitro and in vivo.72,73 Our phase 2 human trial with n-acetyl-cysteine demonstrated a 60% reduction in SS crisis rate, at 2400 mg/day, with no obvious side-effects. 73
The pathophysiology of SCD is primarily caused by vasoocclusion of the circulatory system leading to impaired oxygen delivery to cells, tissues and organs. Vasoocclusion leads to the painful SS crises or vasoocclusive episodes, which in turn are correlated to the survival of the SS patient. (review1,56) Vasoocclusion is caused based on changes in erythrocytes, white blood cells, platelets and plasma factors causing these diverse cells to associate with each other and the blood vessel endothelial cells resulting in clogging of the circulatory system. Reversible sickled cells tend to be adhesive to themselves, white blood cells and the blood vessel endothelium, while the ISCs become trapped in the narrowing passage way of blood vessels and capillaries. There is a 15-year difference in 50% survival probability between those SS subjects with one or fewer SS crises per year (mild) versus three or greater crises per year (severe). So, despite SCD being caused by a single point mutation in the β-globin locus of chromosome 11, there is great variability is SS severity and outcome between homozygous SS patients (review1,56).
Proteomics and interactomics have been and will continue to be used to answer many important questions in SCD. What protein changes exist on the membrane and cytosol of erythrocytes, leukocytes and blood vessel endothelial cells in SS patient versus control blood? What changes occur in the plasma in SCD versus control blood? What changes occur in these cells and plasma when patients or their blood are treated with various drugs? (This is the subject of pharmaco-proteomics.) Can we define biomarkers in erythrocytes, leukocytes, blood vessel endothelial cells or plasma that can reliably predict SS severity? If this is possible, then we will have made available personalized medicine for SCD. We are pleased to report that we have made substantial progress in all of these areas, but there is a long road left to travel.
We performed the first protein profiling on SS versus control RBC membrane protein using the two-dimensional difference gel electrophoresis (2D DIGE) technique and tandem MS. 74 Of the 500 fluorescent spots studied approximately, 49 changed by at least 2.5 fold when comparing SS versus control (AA) erythrocyte membranes. Thirty-eight proteins increased and 11 decreased by 2.5 fold or greater. The 38 analysed spots yielded 44 protein forms with their modifications and 22 unique proteins. When considering the 22 unique proteins, they fell into a few functional categories: membrane skeletal (actin accessory proteins), protein repair participants, lipid raft components, protein turnover components, scavengers of oxygen radicals and other categories. Interestingly, the results indicated that lipid raft components (stomatin and flotillin) are diminished >2.5 fold in SS RBC membranes, while proteins involved in an adaptive response to oxidative stress were increased by >2.5 fold in SS RBC membranes (heat shock protein subunits, chaperonin subunits, proteasomal subunits, peroxyredoxin and catalase). The protein that changes to the greatest extent was Heat Shock 70 KDa protein 8 subunit 1.
We followed this 2D DIGE study with a proteomic study on SS versus AA core membrane skeletal proteins using cleavable isotope-coded affinity tags (cICAT) which label cysteine residues. We demonstrated that the method variation was 14.1% in log ratios and the variation of the sample population was 13.8% in log ratios, so the calculated total variation in log ratios is 19.7%. Neither α-spectrin, β-spectrin, β-actin or protein 4.1 varied from a ratio of 1 in SS versus AA skeletal proteins. 75 Therefore, while the cICAT method detects no change in total spectrin, 4.1 or actin, 75 the 2D DIGE method which can separately detect post-translational modifications (PTMs) and alternate splice forms did demonstrate changed amounts of the modified forms. 74 As an example, protein 4.1 can be found in 11 discrete spots on the 2D IEF-SDS-PAGE gel and seven of these spots either increase (six spots) or decrease (one spot). 74 Therefore, when taking these two methods together, protein 4.1 does not change in total content in SCD versus AA erythrocyte membranes, but specific modified forms are changing. If the modifications alter binding affinities or capacity, then interactions in the membrane skeleton will be changing.
Towards the goal of identifying protein biomarkers that can predict SS severity, we conducted protein profiling studies on erythrocytes, leukocytes and plasma derived from patients with known five-year crisis rates or Acute Chest syndrome rates. Our initial success has come in 2D DIGE studies on monocytes. First, we wanted to look at the protein variance in the monocyte proteome in control populations
76
before moving on to SS versus reference controls.
77
Using isolated membrane and cytoplasmic fractions from highly purified monocytes isolated from 18 healthy individuals, we performed 2D DIGE experiments.
76
We studied ∼900 fluorescent protein spots and identified 31 cytosolic and 12 membrane proteins that had the greatest person to person variability in this control population. Twenty-seven cytoplasmic proteins and nine membrane proteins were identified by trypsin digestion and tandem MS. Of the cytosolic proteins, enolase-1 and WD (tryptophan-aspartate) repeat containing protein 1 demonstrated the largest standard deviations (SD). In the membrane fraction, the largest SD was observed for lamin B1 and
Studies on the plasma of subjects with SCD with or without accompanying pulmonary hypertension (PH) have indicated that decreased apolipoprotein A-I (apoA-I) is a potential marker for PH risk. 56 Further, the same laboratory has demonstrated that elevation of the serum amyloid A/apoA-I ratio in plasma could be a marker for increased crisis rate. 56
These same protein profiling approaches are being used to study the proteomic changes induced by drug therapies, and this is referred to as pharmaco-proteomics. Hydroxyurea (HU) raises fetal haemoglobin levels leading to a milder version of SCD. However, substantial evidence exists that it also affects mean cell volume (MCV), reduced SS adhesion to the endothelium and increased deformability of SS erythrocytes (review1,56). We therefore performed pharmaco-proteomic studies on the impact of HU upon the SS erythrocyte proteome utilizing the 2D DIGE protein profiling methodology and tandem MS. 78 We demonstrated that when SS erythrocytes were incubated with a clinically relevant concentration of HU, we saw PTMs of the following functional categories of proteins: antioxidant enzymes (catalase, peroxyredoxins), oxidoreductases (aldehyde dehydrogenase), protein repair participants (human T complex protein delta subunit, chaperoning containing TCP 1 subunit 7), protein degradation machinery (proteasome alpha 2 subunit variant), CO2 conversion (carbonic anhydrase I and II), membrane skeletal protein (p55) and haemoglobin (beta subunit). In the case of catalase, we demonstrated that the increase in acidic forms was due to phosphorylation of tyrosine residues. 78 This in vitro study was then followed with an in vivo determination of HU-dependent changes in protein expression in SS subjects. 79 We again used 2D DIGE to perform protein profiling studies on erythrocytes membranes from five SS subjects that had received HU treatment at 30–38 mg/kg/day, for a minimum of seven months, versus a reference control from five SS subjects who had not received HU. 79 The results indicated increases in expression of multiple modified forms of membrane proteins (band 3, protein 4.1, ankyrin, actin, tropomodulin, stomatin and p55) and glycolytic enzymes (glyceraldehyde 3-phosphate dehydrogenase and fructose-bisphosphate aldolase) and decreases in chaperonin containing TCP1 subunit 2 and proteasome subunit alpha type 4. Only palmitoyalated membrane protein (p55) was increased in both in vitro and in vivo studies. In the in vivo studies where altered expression can indicate increased synthesis and/or PTM in erythropoetic cells or PTMs in the mature erythrocytes, the increased p55 expression in SCD patients receiving HU was five to 10 fold higher than SCD subjects not being treated with HU. 79 A study with a larger number of SCD patients on HU versus not treated with HU will be of interest to determine whether p55 increases are important in the improved clinical status for SCD subjects being treated with HU.
We refer the reader to other reviews on the proteomics of SCD.43,56
Interactomics
As biologists keep discovering new RBC proteins and protein interactions from wet lab experiments, mathematicians and computer scientists, on the other hand, provide efficient computational methods so that the newly obtained data can be interpreted accurately and efficiently. Furthermore, wet lab experiments are expensive and time consuming, and thus the importance of obtaining results by efficient and relatively inexpensive computational methods cannot be stressed enough.
A key step in analysing interactomics data is constructing a protein–protein interaction (PPI) network. The foundations of PPI networks lay in the graph theory, one of the core subjects in theoretical computer science. Formally, a graph G is an ordered pair (V(G), E(G)) consisting of a set of nodes and a set of edges, V(G) and E(G), respectively, with an incidence function that associates each edge of G with a pair of (not necessarily distinct) nodes of G. A graph is said to be weighted if a numerical value is associated with each edge in the graph. In the case of PPI networks, proteins are represented by nodes, and if two proteins are known to interact they are connected by an edge. Edge weights denote the probability that the corresponding two proteins will interact. The process of building a PPI network has two major steps: identifying the list of proteins experimentally and discovering their interactions from available interaction databases. RBC interactions from the papers we analyse in this review are obtained from Unified Human Interactome Database (UniHI). Each interaction is assigned the Spearman correlation coefficient, derived from gene expression data, which represents the confidence level of the interaction. Unfortunately, databases such as UniHi contain a number of false positives and false negatives. False positives are interactions that are listed in a database but are non-existent in the real world, while false negatives are existing interactions omitted from the database. In our case, in order to reduce the chance of a false positive interaction being listed, researchers introduced a threshold and all interactions with Spearman coefficient below the threshold of 0.3 are left out.
Once a PPI network is constructed, follow-up steps include utilizing computational methods to analyse the network. Due to the fact that PPI analysis is a broad field and beyond the scope of this review, we only focus on the methods that have been used for the RBC PPI network. Those methods include: centrality measures, Voronoi diagram for graphs and Ingenuity pathway/network analysis.
Centrality measures 80 was the first computational approach that was utilized in analysing the RBC PPI network and the impact SCD has on it. The idea was to apply a method that was used in social network analysis to a biological network. Centrality measures mainly comprise three parameters that are used to predict the ‘importance’ of a node in a given network, which are as follows: degree centrality, closeness centrality and betweenness centrality. Degree centrality is arguably the simplest and the most intuitive of the three. It represents the number of edges a particular node is incident to. In a PPI network, in particular, it represents the number of interacting partners for a given protein. Degree centrality is fast and easy to compute, but it considers only individual nodes and not the network as a whole. Therefore, some discrepancies in results could occur. For instance, a node with a high degree centrality can still be disconnected from major parts of the network. Closeness centrality measures how close a given node is to all other nodes in the network. Formally, it is the inverse of the average length of all shortest paths from the node of interest. Consequently, nodes with low closeness centrality take small amount of time to propagate through the network. Furthermore, this measure is more accurate than degree centrality as it considers the relationship between the node and the entire network. Betweenness centrality is the ratio between the number of shortest paths going through a vertex and the total number of existing shortest paths in the network. Unfortunately, centrality measures algorithms are slow when it comes to worst case scenarios. It takes O(|V2|) to compute degree centrality and O(|V3|) to compute closeness and betweenness centrality, where |V| denotes the number of nodes in the network. Moreover, although centrality measures give us a notion of which nodes have the highest impact in the network, they do not provide any information regarding the vertices being impacted by a particular node, which is extremely important in the RBC/SCD study as we want to know which proteins will be impacted by a particular SCD altered protein.
SCD-altered protein ranking according to degree, closeness and betweenness centrality measures
Note: The following gene code abbreviations have been used in the table: PSMC6 = Proteasome 26S subunit; PSMA1 = Proteasome subunit alpha type 1; PSMB1 = Proteasome subunit beta type 1; CCT6A = Chaperonin containing TCP1 subunit 6 A isoform a; CCT2 = T-complex protein 1 beta subunit; PRDX1 = Peroxiredoxin 1; CCT4 = T-complex protein 1 delta subunit; CCT7 = T-complex protein 1 eta subunit; ANK1 = Ankyrin 1; HSPA8 = Heat shock 70 kDa protein 8; TUBA6 = Tubulin alpha 6.
Ammann and Goodman 81 utilized statistical cluster analyses to measure the similarity of nodes within a network a method called Generalized Topological Overlap Measure (GTOM). Using Agnes (agglomerative nesting) for GTOM 1, we demonstrated that multiple SCD altered proteins in the ROD Box group: proteasomal subunits and chaperonins fell within large clusters 1 and 4. Using Diana (divisive analysis) on GTOM 1, we demonstrated that the largest cluster 1 contains the proteasomal subunits altered in SCD. 81 As the Ammann and Goodman article 81 was submitted prior to the appearance of the combinatorial library study by Roux-Dalvai et al., 20 like Kurdia et al., 80 it relied upon the erythrocyte proteome 751 proteins described by Goodman et al. 1
In an effort to continue RBC interactome analysis and enhance methods used in Kurdia et al. 80 and Ammann and Goodman, 81 Zivanic et al. 82 proposed the use of Voronoi diagram for graphs (VDG). The Voronoi diagram is a distance-based decomposition of a metric space relative to a discrete set S of Voronoi sites. Each Voronoi site determines a Voronoi region, which is a set of points that are closer to that site than to any other site. The common boundary of two Voronoi regions is called a Voronoi edge, and two Voronoi edges meet at a Voronoi vertex. VDG is a generalization of the Voronoi diagram and was proposed by Erwig. 83 VDG provides an efficient way to cluster nodes in the network based on their distance to the members of a predetermined subset of cluster centres called Voronoi sites. Formally, let G = (V, E, w) be a graph, where V denotes the nodes, E denotes the edges and w is a weigh function that assigns a weight w > 0 to each edge in E. The VDG for G and a subset K = {v1, v2, … , v k ) of V is a partition Vor(G,K) such that for each node u of V, d (u, vi) ≤ d(u, vj) for all j = 1, … , k, where d(u,v) is the shortest path distance between the nodes u and v. If a node is equidistant from multiple cluster centres, it is assigned to all corresponding clusters. The running time of the VDG algorithm is O(|E|) in an unweighted graph, and O(|E| + log |V|) in a weighted graph. This turned out to be the most suitable approach for the study of the RBC interactome and its relationship with SCD. SCD altered proteins serve as cluster centres. Proteins belonging to the same cluster are more likely to be affected by the corresponding SCD-altered protein than any other SCD-altered protein, which was not the case with the centrality measures approach. Furthermore, VDG is much faster than centrality measures and clustering algorithms.
Zivanic et al.
82
compiled a list of 1834 proteins by assembling data from Roux-Dalvai et al.
20
and Goodman et al.
1
After excluding proteins with no interaction and applying the Spearman correlation threshold, the number of proteins dropped to 829 and those were used to create the PPI network. Of the 22 SCD altered proteins, 16 proteins were present in that network implying that the resulting VDG clustering will yield 16 clusters. They consider both the unweighted and the weighted network. In the unweighted network, the distance between any two nodes is represented by the number of edges on the shortest path between those two nodes. In the weighted network, the edge weight is set equal to the confidence level for the interaction between the corresponding two proteins. The distance between any two edges is equal to the product of the edge weights on the shortest path between the two nodes. Figure 2 illustrates the results obtained by Zivanic et al.
82
where again the chaperonin, proteasomal and antioxidant proteins altered in SCD were all part of major clusters while the ANK1 cluster is small and disconnected from the rest (Figure 2).
Voronoi regions induced by the nodes corresponding to the proteins altered by SCD. Each Voronoi site, shown as a square labelled with its gene symbol, and the nodes of its induced Voronoi region are distinctly marked. Triangle-shaped nodes belong to more than one cluster. yEd graph drawing tool
94
has been used to generate the image. This figure was previously published in our recent article on this subject
82
and is being reproduced with the permission of the copyright holder (Elsevier)
D’Alessandro et al. published a review summarizing the contemporary state of the RBC proteome and interactome as of 2010. 12 They merged the data from available RBC proteomics studies and performed pathway and network analysis using Ingenuity Software. Each gene identifier from the list they compiled was mapped to the corresponding gene object in the Ingenuity Pathway Knowledge Base. Of the 2086 proteins that were in the list originally, 1574 of those had a match in the database and were eligible for the network analysis, whereas 1374 proteins were eligible for pathway analysis. The association between the data set canonical and toxicity pathways was assigned a score, where the highest scores are proportional to a lower probability of casual association. The software identified 69 main canonical pathways and 850 different subpathways. When it comes to the network analysis, they focused on the top 50 subnetworks generated by the software. An in-depth examination showed that the top two ultra-networks share similar functions and at least one node. They concluded that ultra-networks display a well-ordered structure and are focused around the activity of several key nodes, in spite of the fact that some of them have as many as a few hundred nodes. They also considered the ROD box proposed by Goodman et al. 1 in their network and found that the results they obtained were in agreement with our analysis.1,12
Future directions
What lies ahead of us in the field of erythrocyte, and SS, proteomics and interactomics. Five years from now, we believe that we will have seen the following advances.
It is reasonable to expect that improvements to mass spectrometers will allow us to better study those proteins that are present in extremely low copy number (1 to 100 copies). This will expand the known erythrocyte proteome. While it will not approach the number of unique proteins in nucleated cells, with a complete complement of organelles, there is no reason to believe that it will not be higher than today. If the sensitivity of our protein profiling approaches improves in parallel, then we will be able to ascertain the role of these low copy number proteins to erythrocyte-based disorders.
The further development of single cell proteomic technology84–86 could allow the first comparison of the proteomes between individual ISCs versus RSCs from an SCD patient’s blood sample; comparisons of individual WBC classes’ proteomes in severe versus mild SCD and the effect of drug therapy on individual cells. While this statement concerns SCD, it should be apparent to the reader that the same will be true for any erythrocyte disorder or changes as blood bank blood ages or during normal or pathologic in vivo ageing of erythrocytes.
The fusion of proteomics with computational analysis and informatics has led to software which can couple interactome networks to the three-dimensional structure of the interacting proteins.87–93 When this is coupled to software that can analyse the affinities and on/off rates of PPI, then we will have more valuable 3D interactome networks which will be able to accurately predict in silico the physiological effects of disease-related protein defects and the results of pharmaco-proteomic changes upon cellular homoeostasis. Five years from now, the combined software capabilities and MS hardware sophistication should be in place to link modified structure and function of proteins in SS erythrocytes and leukocytes to changes in their 3D interactome network that leads to the pathophysiology of SCD including the cellular interactions that lead to vasoocclusion.
Footnotes
Authors contributions
SRG wrote the proteomics and sickle cell sections. OD and MZ wrote the interactomics section and DGK did the research to develop Supplementary Table 1.
Acknowledgements
This work was supported by NSF grant CCF-0635013 to Ovidiu Daescu and the SD fund from SUNY Upstate for Steven R. Goodman.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
