Abstract
Although many interactions between HIV-1 and human proteins have been reported in the scientific literature, no publicly accessible source for efficiently reviewing this information was available. Therefore, a project was initiated in an attempt to catalogue all published interactions between HIV-1 and human proteins. HIV-related articles in PubMed were used to develop a database containing names, Entrez GeneIDs, and RefSeq protein accession numbers of interacting proteins. Furthermore, brief descriptions of the interactions, PubMed identification numbers of articles describing the interactions, and keywords for searching the interactions were incorporated. Over 100,000 articles were reviewed, resulting in the identification of 1448 human proteins that interact with HIV-1 comprising 2589 unique HIV-1-to-human protein interactions. Preliminary analysis of the extracted data indicates 32% were direct physical interactions (e.g., binding) and 68% were indirect interactions (e.g., up-regulation through activation of signaling pathways). Interestingly, 37% of human proteins in the database were found to interact with more than one HIV-1 protein. For example, the signaling protein mitogen-activated protein kinase 1 has a surprising range of interactions with 10 different HIV-1 proteins. Moreover, large numbers of interactions were published for the HIV-1 regulatory protein Tat and envelope proteins: 30% and 33% of total interactions identified, respectively. The database is accessible at
To make progress in understanding HIV-1 replication, and the mechanisms of viral pathogenesis, it is important to understand the interactions of HIV-1 proteins with the vast array of human cellular proteins exploited for virus replication. 4 While these interactions can be direct viral-host cell protein-protein interactions, many are indirect, such as the regulatory interactions that alter expression of a human gene. An in-depth review and comprehension of these interactions enhance insight into HIV pathogenesis on a cellular level and are also essential for focusing efforts on new drug targets and for better understanding vaccine immune responses. Since 1984 the HIV/AIDS research community has published extensively on virus-host interactions, but unlike HIV sequence, resistance, and immunology data (
To facilitate the development of a database describing HIV-1 and human protein interactions, the annotation on the HIV-1 reference sequence (RefSeq nucleotide accession number NC_001802.1) was first improved and expanded. RefSeq 5 is a database of nucleotide and protein reference sequences annotated and distributed by the National Center for Biotechnology Information (NCBI). The HIV-1 reference sequence was updated in order to provide functional information on the HIV-1 polyproteins and mature peptides and to use standard names. Explicit representation of each mature peptide product supports more specific and accurate cataloguing of the interactions. All references within the database are catalogued using this primary reference sequence and the associated protein accession numbers. Subsequently, literature indexed in PubMed was searched using keywords for each of the HIV-1 proteins [e.g., “HIV and (matrix or p17)”]. The article titles and abstracts that were retrieved were reviewed to identify papers describing interactions between HIV-1 and human proteins. Relevant publications were collected and used to manually catalogue interactions into a database containing the (1) protein names, (2) Entrez GeneIDs (species-specific gene identification numbers from Entrez Gene, NCBI's database for gene-specific information), 6 (3) RefSeq protein accession numbers, (4) brief descriptions of the interactions, (5) PubMed identification numbers (PMIDs) of articles describing the interactions, and (6) keywords for searching the interactions. Since the articles were from peer-reviewed publications, all identified interactions were incorporated into the database without placing additional judgment on the scientific validity of the report. If two different reports had conflicting conclusions, a comment describing the data ambiguity was incorporated into the interaction descriptions. For the majority of publications the full-text paper was reviewed, but some interactions were catalogued based on abstract only (e.g., when abstracts contained complete descriptions of the interactions or full copies of the articles could not be obtained). Upon completion, the searchable database was uploaded into the HIV-1 protein interactions section of NCBI's Entrez Gene at
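The six catalogued fields can be pictured as a simple record type. The Python sketch below is illustrative only: the class name, field names, and placeholder values are assumptions and do not reflect the actual NCBI schema. Only the PubMed keyword pattern is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InteractionRecord:
    """One catalogued HIV-1-to-human interaction (hypothetical layout)."""
    hiv_protein_name: str        # (1) protein names
    human_protein_name: str
    human_gene_id: int           # (2) Entrez GeneID
    human_refseq_protein: str    # (3) RefSeq protein accession number
    description: str             # (4) brief description of the interaction
    pmids: List[int] = field(default_factory=list)  # (5) supporting PMIDs
    keyword: str = "interacts with"                 # (6) search keyword


def pubmed_query(*names: str) -> str:
    """Build a keyword query of the form quoted in the text."""
    return "HIV and (" + " or ".join(names) + ")"


# Placeholder identifiers only; these are not real database values.
record = InteractionRecord(
    hiv_protein_name="matrix",
    human_protein_name="example human protein",
    human_gene_id=0,
    human_refseq_protein="NP_000000.0",
    description="HIV-1 matrix binds the example human protein",
    keyword="binds",
)

print(pubmed_query("matrix", "p17"))
```

Storing the PMIDs as a list mirrors the fact that a single interaction description may be supported by several articles.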
The HIV-1 human interaction data can be accessed at NCBI through three mechanisms: (1) Entrez Gene, (2) the dedicated web site, or (3) by file transfer protocol (FTP). Records in Entrez Gene that contain HIV-1 interaction data can be retrieved with the query: “hiv1interactions”[Properties] AND “Homo sapiens”[Organism]. The “Table of Contents” for each gene record provides a link to the section entitled “HIV-1 protein interactions” where data are presented as a table. For human gene records, the table includes a link to the gene record for the HIV-1 protein, the interaction comment, and links to the PubMed citation(s) supporting the comment. On HIV-1 records, the table reports the interaction keyword, provides a link to the corresponding human interactant gene record, and links to the supporting PubMed citation(s). The “Links” menu for each gene record also allows users to readily navigate from Entrez Gene to other NCBI resources to find additional data related to the interacting proteins. For example, gene expression data can be retrieved via the link to “GEO Profiles.” 8 Similarly, all gene records that contain HIV-1 interaction data and that also have gene expression data in the GEO 8 database can be identified in Entrez Gene using the query: “hiv1interactions”[Properties] AND gene_geo[filter]. Many other queries can be designed to retrieve and study similar subsets of the database. For instance, using specific Gene Ontology 9 (GO) terms, queries can be built to find all HIV-1 interacting proteins that are located in the cytoplasm (e.g., “hiv1interactions”[Properties] AND cytoplasm[go]) or that are involved in stem cell maintenance (e.g., “hiv1interactions”[Properties] AND “stem cell maintenance”[go]). Furthermore, in addition to Entrez Gene, the HIV-1 human protein interaction database web site lists all interaction types described per HIV-1 gene. Reports include the HIV-1 protein, the interaction type, and the human protein (linked to the gene record). Report pages include a query interface that facilitates accessing gene records that have a specific type of interaction (such as activation) with a specific HIV-1 protein. Results from these queries return as a list with links to the gene records. The report interface also provides an option to download all, or a subset, of the data in a columnar text format. Finally, the complete dataset can be transferred by FTP from
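The Entrez Gene queries quoted above all share the same property term combined with additional filters, so they can be composed programmatically. The following Python sketch builds the query strings and, as an assumption not described in the article, wraps them in a request URL for NCBI's E-utilities `esearch` service. The helper names are hypothetical, and whether these property filters are still indexed in the same way has not been verified here. No network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (its use here is an assumption of this sketch).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(*extra_terms: str) -> str:
    """Combine the HIV-1 interaction property with further Entrez terms."""
    terms = ['"hiv1interactions"[Properties]', *extra_terms]
    return " AND ".join(terms)


def esearch_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL for the Gene database; nothing is fetched."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# The four example queries given in the text.
human = gene_query('"Homo sapiens"[Organism]')
with_geo = gene_query("gene_geo[filter]")
cytoplasm = gene_query("cytoplasm[go]")
stem_cell = gene_query('"stem cell maintenance"[go]')

for query in (human, with_geo, cytoplasm, stem_cell):
    print(query)
    print(esearch_url(query))
```

Keeping query construction separate from URL construction means the same query strings can also be pasted directly into the Entrez Gene search box.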
For the entire database, over 100,000 journal abstracts published between 1984 and 2007 were identified by PubMed queries and further reviewed, leading to the identification of 3200 papers describing putative interactions between HIV-1 and human proteins. Table 1 summarizes the HIV-1 protein interactions catalogued from these papers (see supporting online materials for a listing of all interactions). 10 A total of 1448 human proteins that interact with HIV-1, comprising 2589 unique HIV-1-to-human protein interactions, were identified. In addition, 5135 summary descriptions of the interactions were generated, with a total of 14,312 PMID references to the original articles that reported the interactions. Sixty-eight unique keywords (directional from HIV-1 protein to human protein) are associated with these descriptions. Keywords were selected based on the text in the original journal articles by identifying the most important functional keyword used by the authors to describe the interaction. Whenever possible, similar keywords were combined. For example, “downregulates” and “downmodulates” were combined to use the single keyword “downregulates.” The most frequent keywords in the database are “interacts with,” 17.0%; “upregulates,” 11.9%; “binds,” 11.7%; “activates,” 9.7%; “downregulates,” 7.8%; “inhibits,” 6.9%; “inhibited by,” 4.1%; “processed by,” 2.6%; “regulated by,” 2.1%; and “phosphorylated by,” 1.5% (see supporting online materials for a listing of all keywords). 10 Although it cannot be excluded that some of these interactions are nonspecific or result from human error, 58% of the interactions were confirmed by more than one publication. Collectively, the catalogued interactions provide a unique collection of data generated from the available scientific literature.
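The keyword consolidation and the summary percentages described above amount to simple counting. The Python sketch below shows one way this could be done. The function names, the toy keyword list, and the toy PMID lists are invented for illustration, and only the "downmodulates" to "downregulates" mapping is stated in the text.

```python
from collections import Counter

# Synonym table; only this single mapping is given in the article.
SYNONYMS = {"downmodulates": "downregulates"}


def normalize(keyword: str) -> str:
    """Lower-case a keyword and fold known synonyms into one term."""
    kw = keyword.strip().lower()
    return SYNONYMS.get(kw, kw)


def keyword_percentages(keywords):
    """Percentage of descriptions per normalized keyword, most common first."""
    counts = Counter(normalize(k) for k in keywords)
    total = sum(counts.values())
    return {k: round(100.0 * n / total, 1) for k, n in counts.most_common()}


def fraction_multiply_supported(pmids_per_interaction):
    """Fraction of interactions backed by more than one distinct PMID."""
    supported = sum(1 for pmids in pmids_per_interaction if len(set(pmids)) > 1)
    return supported / len(pmids_per_interaction)


# Toy inputs, not data from the database.
toy_keywords = ["binds", "Downmodulates", "downregulates", "upregulates"]
toy_pmids = [[1, 2], [3], [4, 4], [5, 6, 7]]

print(keyword_percentages(toy_keywords))
print(fraction_multiply_supported(toy_pmids))
```

Duplicate PMIDs are collapsed with `set` so that an article listed twice for the same interaction is not counted as independent confirmation.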
To demonstrate the complexity of the HIV-1 human protein interaction network, the catalogued data were visualized using InterView 11 and Gene Ontology 9 (GO) terms (Fig. 1). The Gene Ontology is a set of three structured controlled ontologies that describes gene products in terms of their associated cellular component (Fig. 1), biological process, or molecular function in a species-independent manner. GO terms were collected from Entrez Gene for each HIV-interacting human protein (see supporting online materials for alternative visualizations based on biological process and molecular function GO terms, and for distribution of interactions based on GO terms). 10 Proteins without GO terms in the three ontologies were annotated as unknown by using the root ontology term (cellular component, 14%; biological process, 9.5%; and molecular function, 7.8%). These visualizations reveal the extent to which HIV-1 interacts with diverse human proteins and demonstrate that there are many examples of the virus interacting with human proteins that are part of the same functional category. Interestingly, the majority of interactions, 68%, are indirect (e.g., altered expression of a human protein), while only 32% are direct physical interactions (e.g., binding) (Table 1). In addition, 529 (37%) of the human proteins in the database were found to interact with more than one HIV-1 protein. For example, the signaling protein mitogen-activated protein kinase 1 (MAPK1) has a surprising range of interactions with 10 different HIV-1 proteins. MAPK1 is a member of the MAP kinase family involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation, and development. Thus, it is likely that MAPK1 is intimately involved in many steps of the HIV-1 replication cycle.
Similarly, mitogen-activated protein kinase 3 (MAPK3), protein kinase C-alpha (PRKCA), and interferon-gamma (IFNG) have been described as interacting with nine different HIV-1 proteins each, suggesting these proteins also play important roles in HIV-1 replication and pathogenesis.
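Counting how many distinct HIV-1 proteins each human protein interacts with is a grouping operation over (HIV-1 protein, human protein) pairs. The Python sketch below illustrates the idea on invented toy pairs. The function names and the human protein labels "A", "B", and "C" are hypothetical, and only the final check uses figures reported in the text (529 of 1448 proteins).

```python
from collections import defaultdict


def partners_per_human_protein(pairs):
    """Map each human protein to its number of distinct HIV-1 partners."""
    partners = defaultdict(set)
    for hiv_protein, human_protein in pairs:
        partners[human_protein].add(hiv_protein)
    return {human: len(hivs) for human, hivs in partners.items()}


def percent_multi(counts):
    """Percentage of human proteins with more than one HIV-1 partner."""
    multi = sum(1 for n in counts.values() if n > 1)
    return 100.0 * multi / len(counts)


# Toy pairs invented for illustration; the repeated pair is counted once.
toy_pairs = [
    ("Tat", "A"), ("gp120", "A"),
    ("Nef", "B"), ("Tat", "B"), ("Tat", "B"),
    ("Vpr", "C"),
]

counts = partners_per_human_protein(toy_pairs)
print(counts)
print(round(percent_multi(counts), 1))

# Consistency check on the reported figure: 529 of 1448 human proteins.
print(round(100 * 529 / 1448))
```

Using a set per human protein ensures that several keyword-level interactions with the same HIV-1 protein count as a single partner.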

Fig. 1. Visualization of the HIV-1 human protein interaction network. The network was visualized with InterView. 11 Gray ovals represent HIV-1 proteins. Colored circles represent human proteins and are shown clustering around the HIV-1 protein they interact with. Human proteins that interact with multiple viral proteins are shown toward the center of the image rather than clustered around a specific HIV-1 protein. Colors correspond to human protein categories based on cellular component Gene Ontology (GO) terms. 9 The number of proteins in each category and percentage of the total 1448 human proteins interacting with HIV-1 are indicated in parentheses. Black lines correspond to direct interactions (e.g., binding) and gray lines to indirect interactions (e.g., downregulation). Circles for specific human proteins of interest discussed in the text (MAPK1, MAPK3, PRKCA, and IFNG) are indicated with dashed red lines. Overlapping circles are a result of the visualization process and do not imply any specific relationship. See supporting online materials for alternative visualizations of the HIV-1 human protein interaction network based on biological process and molecular function GO terms, and for information about specific human proteins that interact with multiple viral proteins. 10
Table 2 summarizes the distribution and number of interacting HIV-1 proteins per cellular protein (see supporting online materials for a comprehensive listing of the specific HIV-1 proteins interacting with each cellular protein). 10 Large numbers of interactions were also published for the HIV-1 regulatory protein Tat and for the envelope proteins: 30% and 33% of the total interactions identified, respectively. Of particular note is the cataloguing of 273 different nuclear proteins that interact with Tat, including over 40 transcription factors and regulators. Similarly, 219 extracellular and plasma membrane proteins were identified as interacting with HIV-1 gp120 and 67 were identified as interacting with gp41, including over 70 cellular receptors, integrins, and adhesion molecules. Overall, the database catalogues a wealth of information that can be mined to better understand the breadth of the HIV-1 human protein interaction network.
In conclusion, the HIV-1 human protein interaction network forms the basis of a detailed map for tracking the cellular interactions that drive HIV-1 replication and pathogenesis. Integration of the database into NCBI's online resources 7 (
Acknowledgments
We thank Joel Gillman for providing database support. R.G.P., W.F., and B.E.S.-B. were funded in whole with federal funds from the NIH, NIAID under Contracts N01-AI-05415 and N01-AI-70042, and wish to thank Dr. Roger Miller, Project Officer, DAIDS. J.E.D. was supported by a Wellcome Trust studentship and J.W.P. by a BBSRC project grant (BB/C515412/1). M.N.R., K.S.K., D.R.M., and K.D.P. were supported by the Intramural Research Program of the NIH, National Library of Medicine. R.G.P. and W.F. catalogued the interactions and curated the database with support from B.E.S.-B.; J.E.D., J.W.P., and D.L.R. carried out the Gene Ontology analysis and visualization of the interactions; M.N.R. and R.G.P. generated the improved NC_001802.1 RefSeq annotation; K.S.K., D.R.M., and K.D.P. generated the open access database resource at NCBI and incorporated the data into Entrez Gene; C.W.D. provided the original concept for the database and support for the project through NIAID/DAIDS.
Disclosure Statement
No competing financial interests exist.
