Abstract
Here we present WikiCell, a database that serves as a portal for a unified view of the human transcriptome. At present, WikiCell consists of Expressed Sequence Tags (ESTs). Users can access, curate, and submit data interactively, and can also browse, query, upload, and download sequences. Researchers can use the transcriptome model, which is based on a human taxonomy graph. The sequences in each model are sorted by attributes such as physiological or pathological sample origin. The GenBank EST data format is preserved. Gene information is provided, including housekeeping genes, taxonomy location, and gene ontology (GO) description. We believe that WikiCell provides a useful resource for defining expression patterns and tissue differentiation based on the human taxonomy model. It can be accessed at http://www.wikicell.org/.
Introduction

Expressed Sequence Tag (EST) annual cumulative growth rate. Few EST data were available before 2001. The statistics span nearly a decade. The quality of EST sequences is increasing every year, but the rate trend is steady.
Fortunately, the development of MediaWiki software (Mediawiki, 2007; www.mediawiki.org) resolves the deficiency described above. Current biological databases provide users with data submission forms, tools, and access to the compiled data via websites, FTP sites, or programmatic interfaces, while internal curation teams organize and update the data. A wiki model for biological databases, as used by WikiPathways (Kelder et al., 2009; Pico et al., 2008), Proteopedia (Eran et al., 2008), and Genewiki (Jon et al., 2008, 2009), provides a single, intuitive interface for submitting, updating, organizing, and accessing data. This allows users to participate in the curation process and keep up with the influx of new data.
To exploit the features of these databases, we have developed WikiCell (www.wikicell.org), a portal that provides a unified view of the human transcriptome. WikiCell is an open and public platform dedicated to the annotation of the human transcriptome. Researchers can contribute transcriptome data, including ESTs and annotations. The wiki format allows authors to create and edit any number of interlinked webpages. Based on the anatomy of the human body, the logical structure traces out an image of refined classification from nine major systems to cell level, and includes both physiological and pathological transcriptome data.
Materials and Methods
The overall EST statistics for human tissues are shown in Table 1. Data were collected from two sources: the majority came from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov), and the remainder from the Beijing Institute of Genomics (www.big.cas.cn). The second set represents data that we generated independently. The sequences were downloaded from the dbEST database, which contained 8,444,018 reported EST entries from Homo sapiens as of October 2011. Only ESTs belonging to groups of 100 or more were considered members of a library, which left 5,943,083 EST sequences. These sequences were sorted by physiological or pathological origin according to the source information recorded for each library, which describes the anatomical, pathological, and medical context of the sample. The human EST data comprised more than 4000 libraries, and several libraries often belonged to the same tissue or cell type. Accordingly, these libraries were assigned to 231 clusters. Physiological sequences from different tissues or cells were used to study gene expression relationships. EST sequences specific to pathological tissue can be helpful for investigating the causes of disease.
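The library-size filter and the grouping of libraries into tissue or cell clusters described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used for WikiCell: the record layout, the accession and library names, and the `library_to_tissue` mapping are all hypothetical, and the toy call lowers the threshold so that the small example yields output.

```python
from collections import Counter, defaultdict

MIN_LIBRARY_SIZE = 100  # threshold stated in the text: groups of 100 or more ESTs


def filter_by_library_size(est_records, min_size=MIN_LIBRARY_SIZE):
    """Keep only ESTs whose library contains at least `min_size` ESTs.

    `est_records` is a list of (accession, library_name) tuples
    (a hypothetical layout chosen for this sketch).
    """
    library_sizes = Counter(library for _, library in est_records)
    return [(acc, lib) for acc, lib in est_records if library_sizes[lib] >= min_size]


def group_by_cluster(est_records, library_to_tissue):
    """Group ESTs into tissue/cell clusters; several libraries may share a cluster."""
    clusters = defaultdict(list)
    for accession, library in est_records:
        clusters[library_to_tissue[library]].append(accession)
    return dict(clusters)


# Toy data: accessions, library names, and tissue labels are made up.
records = [
    ("EST1", "lib_kidney_a"), ("EST2", "lib_kidney_a"),
    ("EST3", "lib_kidney_b"), ("EST4", "lib_kidney_b"),
    ("EST5", "lib_tiny"),
]
library_to_tissue = {
    "lib_kidney_a": "kidney",
    "lib_kidney_b": "kidney",
    "lib_tiny": "liver",
}

kept = filter_by_library_size(records, min_size=2)  # toy threshold, not 100
clusters = group_by_cluster(kept, library_to_tissue)
print(clusters)
```

In this toy run the single-EST library is discarded and the two kidney libraries collapse into one cluster, which mirrors how more than 4000 libraries reduce to a much smaller number of clusters.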
The WikiCell online database (www.wikicell.org) was implemented using the MediaWiki engine, a MySQL relational database, and PHP, and provides a simple way to access the EST data and their annotations.
WikiCell presents a new model for EST databases that enhances and complements ongoing efforts. Each taxonomy node in WikiCell has a dedicated wiki page displaying dynamic pictures, descriptions, references, a system tree diagram, statistics, and a path. The statistics section gives the number of ESTs belonging to the current node and its sub-nodes. Users can enter the summary page only from the statistics section of a leaf node page, as shown in Figure 2.

Flow chart for browsing pages. Users can browse each of the nine systems from the main page. Each system has a corresponding taxonomy graph, in which linked nodes indicate that the page contains Expressed Sequence Tag data. Summary pages are entered from the statistics region of the node page.
All pages of WikiCell are freely accessible and do not require registration. However, only registered users who are logged in can contribute information. Our registration policy balances two conflicting aims: encouraging users to contribute by keeping the process simple, and ensuring the reliability of the information provided. A mandatory but liberal registration policy seemed to be the best way to balance these two aims.
Results
WikiCell has three main page types: taxonomy, summary, and EST data. Taxonomy pages cover human systems, organs, tissues, and cells, as shown in Figure 3. Summary and EST pages are located one level below tissue and cell pages. Taxonomy pages resemble a tree trunk, and summary and EST pages represent the branches. The majority of WikiCell is devoted to ESTs, with each page displaying the GenBank record, information on the EST location, and gene annotation data.

Sample human taxonomy page. This page on the WikiCell website is one node page in the human taxonomy graph. It presents information about the corresponding part of the human body.
WikiCell can be searched by organ, tissue, and cell type, as well as by GenBank accession number. In addition, users can browse available human gene information, including chromosome location, housekeeping (HK) genes, gene description, and gene ontology (GO) function. Gene information may also be sorted into three categories: chromosome, HK, and GO function. We use the definition of HK genes from Zhu's group (Jiang et al., 2008a, 2008b). Through the analysis of public expression data (from ESTs and microarray data) from 18 human tissues, these researchers found that 40% of the currently annotated human genes were constitutively expressed in at least 16 of 18 tissues. We have adopted the most rigorous definition of an HK gene, in which a gene must be expressed in all 18 tissues examined. Under this definition, 3182 HK genes were compiled and are listed by chromosome.
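The difference between the relaxed criterion (expressed in at least 16 of 18 tissues) and the strict criterion adopted here (expressed in all 18 tissues) can be made concrete with a small sketch. The tissue names, gene names, category labels, and expression sets below are placeholders invented for illustration; they are not taken from the cited analysis.

```python
# Placeholder names standing in for the 18 tissues examined in the cited analysis.
TISSUES = [f"tissue_{i}" for i in range(1, 19)]


def classify_gene(expressed_in, tissues=TISSUES, relaxed_min=16):
    """Classify a gene by the number of tissues in which it is expressed.

    Returns "housekeeping" if expressed in every tissue (strict definition),
    "broadly_expressed" if expressed in at least `relaxed_min` tissues,
    and "other" otherwise. The labels are hypothetical.
    """
    n_expressed = len(set(expressed_in) & set(tissues))
    if n_expressed == len(tissues):
        return "housekeeping"
    if n_expressed >= relaxed_min:
        return "broadly_expressed"
    return "other"


# Toy expression table: gene -> set of tissues with detected expression.
expression = {
    "GENE_A": set(TISSUES),       # all 18 tissues
    "GENE_B": set(TISSUES[:16]),  # 16 of 18 tissues
    "GENE_C": set(TISSUES[:3]),   # 3 of 18 tissues
}

hk_genes = sorted(g for g, t in expression.items() if classify_gene(t) == "housekeeping")
print(hk_genes)
```

Only `GENE_A` passes the strict test in this toy table; `GENE_B` would count under the 16-of-18 criterion but is excluded by the definition adopted here.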
Features of WikiCell
WikiCell is a portal for querying, browsing, communicating, uploading, and downloading contributed datasets. One important characteristic of this database compared to a traditional database is that users can access, curate, and submit data interactively, which speeds up the maintenance and renewal of WikiCell. First, WikiCell provides EST data in relation to spatial expression in the human body. Users can access the “body path” of the data, which identifies the system, organ, tissue, and/or cell in which the transcript is expressed. For example, there are 185 EST sequences for the intraglomerular mesangial cell. WikiCell shows the following taxonomy path: Human -> Urinary System -> Kidney -> Renal Parenchyma -> Uriniparous Tubule -> Renal Corpuscle -> Glomerulus -> Intraglomerular Mesangial Cell. It also indicates the number and type of EST data that come from the same library. This database structure is useful for further research on defining expression patterns and tissue differentiation. Second, WikiCell allows searches across multiple page types, as shown in Figure 4. A user may query with general keywords, such as taxonomy terms, or with specific terms, such as GenBank accession numbers. Third, each EST page has three parts. The GenBank EST reporting format is preserved to display identifiers such as clone information, primers, sequence, comments, library, submitter, and citations. In addition, there is detailed information about the EST chromosomal location (e.g., matches, strand, sequence size, chromosome number, and start and end location) and block features (block count and block size). The name of the gene in which the EST is located and the taxonomy path are also provided at the bottom of the page.
Fourth, links to relevant human integrative resources are provided on the front page, including: GeneCards (http://www.genecards.org/), the human gene compendium; MetaCyc (http://metacyc.org/), a member of the BioCyc database collection; Reactome (http://www.reactome.org/ReactomeGWT/entrypoint.html), a curated knowledgebase of biological pathways in humans; MOPED (http://moped.proteinspire.org), a model organism protein expression database; and WikiPathways (http://wikipathways.org/index.php/WikiPathways), pathways for the people (Rebhan et al., 1997; Ron et al., 2008; Joshi-Tope et al., 2005; Kolker et al., 2012; Kelder et al., 2009).

Search function. Two search tools are available on the WikiCell website: the built-in MediaWiki search and the Google custom search extension. Users can enter keywords to find matching pages.
Finally, WikiCell offers a bulk data download option, which increases database utility for computational biologists, who may use WikiCell data to conduct in silico analyses of transcriptome activity.
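The “body path” described under the first feature can be modeled as a walk from a node up to the root of the taxonomy tree. The sketch below is an assumption about one simple way to represent this, using a child-to-parent dictionary; it is not a description of how WikiCell stores its taxonomy. The node names are taken from the example path in the text, and the function name `body_path` is hypothetical.

```python
# Child -> parent links, taken from the example taxonomy path in the text.
PARENT = {
    "Urinary System": "Human",
    "Kidney": "Urinary System",
    "Renal Parenchyma": "Kidney",
    "Uriniparous Tubule": "Renal Parenchyma",
    "Renal Corpuscle": "Uriniparous Tubule",
    "Glomerulus": "Renal Corpuscle",
    "Intraglomerular Mesangial Cell": "Glomerulus",
}


def body_path(node, parent=PARENT, separator=" -> "):
    """Return the path from the root of the taxonomy down to `node`."""
    path = [node]
    # Follow parent links until a node with no recorded parent (the root) is reached.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return separator.join(reversed(path))


print(body_path("Intraglomerular Mesangial Cell"))
```

Storing only the parent of each node keeps the representation compact, and the full path for any tissue or cell page can be rebuilt on demand.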
The content management system
MediaWiki was chosen as the software platform for WikiCell. Its interface is identical to that used by Wikipedia, so many users will quickly become familiar with the system. In addition, MediaWiki allows the use of extensions, and because of the popularity of the software, many extensions are readily available. For WikiCell, we use several extensions: DataInvoker (www.mediawiki.org/wiki/Extension:DataInvoker), which introduces new parser functions allowing database data retrieval; Secure HTML (www.mediawiki.org/wiki/Extension:Secure_HTML), which allows arbitrary HTML to be embedded provided that it is accompanied by a corresponding hash created by combining the HTML input with an authorized key; and ConfirmEdit (www.mediawiki.org/wiki/Extension:ConfirmEdit), which enables a simple text CAPTCHA that should minimize automated editing.
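The general idea of authorizing HTML by pairing it with a keyed hash can be illustrated generically. The sketch below uses HMAC-SHA256 from Python's standard library; this choice is an assumption made for illustration and is not claimed to be the algorithm, key format, or interface of the Secure HTML extension. The key, the HTML snippet, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_html(html: str, key: bytes) -> str:
    """Compute a keyed hash over the HTML input (HMAC-SHA256, illustrative choice)."""
    return hmac.new(key, html.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authorized(html: str, supplied_hash: str, key: bytes) -> bool:
    """Accept the HTML only if the supplied hash matches the one derived from the key."""
    expected = sign_html(html, key)
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(expected.encode("utf-8"), supplied_hash.encode("utf-8"))


# Hypothetical key and snippet, for illustration only.
secret_key = b"example-key-not-for-production"
snippet = "<b>EST summary</b>"
snippet_hash = sign_html(snippet, secret_key)

print(is_authorized(snippet, snippet_hash, secret_key))
print(is_authorized(snippet + "<script>", snippet_hash, secret_key))
```

Because the hash depends on both the key and the exact HTML text, an editor without the key cannot produce a valid hash for modified markup.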
Discussion and Future Directions
With the exponential growth of biological wikis, it is clear that the wiki model resonates with biological and scientific communities. However, many of these biological wikis appear to suffer from a lack of participation. Establishing a critical mass of users and useful content appears to be the most common obstacle in these efforts.
Active participation by members of the transcriptomics community in sharing data through WikiCell has resulted in a prolific increase in the amount of transcriptome annotations. A great deal of additional transcriptome data, such as SAGE, SRA, GEO, and RNA-Seq data, will be added to WikiCell (Wang et al., 2009). Useful tools for transcriptomics research, such as DEGseq, TopHat, and Cufflinks, will be integrated for online data analysis (Likun et al., 2010; Roberts et al., 2011; Trapnell et al., 2009). In addition, further research on expression patterns in different tissue and cell types based on these data will yield secondary analysis data, which will also be integrated into WikiCell as an important resource. With the increased participation of the scientific community, we foresee that WikiCell will serve as a unified resource for transcriptome research. It will be an important resource for defining transcriptome models and tissue diversity using the human taxonomy method.
Footnotes
Acknowledgements
We are grateful to Jin Jing, Zhang Ning, and Li Hongjie for their helpful suggestions. This study was supported by a grant from the National Basic Research Program (973 Program, no. 2010CB126604), a grant from the Special Foundation Work Program (no. 2009FY120100), a grant (2012AA020409) from National Programs for High Technology Research and Development (863 Program), support from the Ministry of Science and Technology of the People's Republic of China, and a grant from the National Science Foundation of China (no. 31071163 and 31101063).
Author Disclosure Statement
No competing financial interests exist.
