Editor: Induced pluripotent stem cells (iPSCs) are valuable tools for disease modeling and might be used for future therapeutic applications. A critical step for all studies using iPSCs is to validate their pluripotent nature. Assays based on genome-wide expression profiles are an alternative to the classical teratoma assay [1]. PluriTest®, a patented bioinformatics assay for the quality assessment of iPSCs, has developed into an important platform [2,3]. Model testing of next-generation sequencing (NGS) data for PluriTest has not yet been reported. We therefore redefined the thresholds for pluripotency and novelty using publicly available datasets and then tested, with cells sequenced in our laboratory, whether PluriTest can distinguish between pluripotent and nonpluripotent stem cell preparations (Fig. 1).
FIG. 1. (A) Scatterplot showing the novelty and pluripotency scores calculated from the dataset E-GEOD-41716 [4], for which array and NGS data are available. The iPSCs (n = 21) from this dataset were used to calculate the statistical thresholds. iPSCs are represented by red dots and fibroblasts by blue dots. Red dashed lines indicate the calculated statistical thresholds for novelty and pluripotency. High-quality cells should be located in the upper left quadrant, with a pluripotency score above the threshold and a novelty score below the threshold. A reduction in pluripotency is correlated with a gain in novelty. As expected, all fibroblasts have a novelty score above the novelty threshold and a pluripotency score below the pluripotency threshold. (B) Analysis of a test dataset of 69 high-quality pluripotent cells and 197 other cell lines or tissues. Kernel density estimations for ESCs and nuclear transfer iPSCs (yellow to red), brain tissue samples, glioblastoma tumor samples, fetal tissues, differentiated and differentiating ESCs, cancer cells and other cell cultures, embryonal carcinoma and seminoma cells, nonpluripotent stem cells, as well as partially reprogrammed cells (shades of blue/gray) are shown. Separately highlighted (colored points) are differentiating/differentiated ESCs (purple), nonpluripotent and malignant stem cells (brown), and partially reprogrammed cells (turquoise). Red dashed lines indicate the calculated statistical thresholds for novelty and pluripotency, and black dashed lines indicate the empirical thresholds maximized for specificity. The statistical thresholds detect high-quality pluripotent cells with a sensitivity close to 100%, but the population they select still contains some false positives. To exclude those, an empirical cutoff is required. (C) Differentiation time course for pancreatic differentiation from E-MTAB-1086. Dashed lines indicate the calculated statistical (red) and empirical (black) thresholds. Over time, a drop in the pluripotency score is correlated with a gain in novelty. As early as day 2, the differentiating cells are clearly no longer classified as high-quality stem cells. (D) Scatterplot showing the novelty and pluripotency scores calculated from NGS data generated in our own laboratory. Dashed lines indicate the calculated statistical (red) and empirical (black) thresholds. Most of the established iPSCs are located in the upper left quadrant defined by the empirical thresholds, whereas nonpluripotent cell preparations are characterized by a loss in pluripotency and a gain in novelty. (E) Suggested workflow for the usage of PluriTest® with NGS data. ESCs, embryonic stem cells; iPSCs, induced pluripotent stem cells; NGS, next-generation sequencing. Color images available online at www.liebertpub.com/scd
First, a highly validated iPSC population was selected for which NGS data as well as microarray-based PluriTest classifications were available [4]. With this high-quality dataset, we reset the thresholds using a statistical (s) approach according to a classical outlier definition (Supplementary Data including Supplementary Figs. S1–S3 and Supplementary Tables S1–S4; Supplementary Data are available online at www.liebertpub.com/scd). This gave a statistical pluripotency threshold, pluripotency(s), of 961.24 and a statistical novelty threshold, novelty(s), of 2.54. For further model testing and refinement, we performed a systematic and unbiased analysis of the ArrayExpress database (www.ebi.ac.uk/arrayexpress) supplemented with Gene Expression Omnibus (GEO) data and extracted 266 datasets, 69 of which were from high-quality pluripotent cells (Supplementary Data). Analysis of these datasets revealed that the statistical approach provides a fast and useful estimate of the quality of iPSCs. Even when only embryonic stem cells (ESCs) and differentiated or nonpluripotent stem cells were considered, sensitivity was 98.55% and specificity 87.90%. The PluriTest readout was robust for replicates and across different sequencing platforms (Supplementary Fig. S2).
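The exact outlier rule is given only in the Supplementary Data and is not reproduced in this letter. As a minimal sketch, assuming the "classical outlier definition" is Tukey's rule (values beyond 1.5 times the interquartile range from the quartiles), statistical thresholds could be derived from a reference iPSC population as follows. The function names, the factor k = 1.5, and the quantile method are assumptions made for illustration, and Python is used here although the letter refers to R.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR.

    Assumption: k = 1.5 and the 'inclusive' quantile method; the letter
    does not state which outlier rule or quantile estimator was used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def statistical_thresholds(pluripotency_scores, novelty_scores, k=1.5):
    """Derive thresholds from scores of a validated iPSC reference set.

    Pluripotency uses the lower fence (scores should lie above it);
    novelty uses the upper fence (scores should lie below it).
    """
    pluripotency_lower, _ = tukey_fences(pluripotency_scores, k)
    _, novelty_upper = tukey_fences(novelty_scores, k)
    return pluripotency_lower, novelty_upper
```

Under these assumptions, the function would be called with the pluripotency and novelty scores of the validated reference iPSCs (n = 21 in the letter). Whether this reproduces the reported values of 961.24 and 2.54 depends on the actual rule described in the Supplementary Data.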
As nonpluripotent cells can be very similar to pluripotent cells, we then calculated empirical (e) pluripotency and novelty thresholds, similar to the original PluriTest article [2]. This approach requires processing large amounts of data. Indeed, empirical refinement [pluripotency(e) (1450) and novelty(e) (2.43)] increased specificity to 100% for NGS data. However, sensitivity decreased to 82.61%, which excluded some well-characterized stem cell populations. Cells mapping between the statistical and the stricter empirical thresholds should therefore not be discarded outright, but should be critically evaluated for pluripotency, including teratoma formation in the teratoma assay.
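The two sets of thresholds define three zones for a sample. The sketch below applies the threshold values reported in this letter; the function names, the zone labels, and the use of strict inequalities at the boundaries are assumptions made for illustration.

```python
# Threshold values as reported in the letter.
STATISTICAL = {"pluripotency": 961.24, "novelty": 2.54}
EMPIRICAL = {"pluripotency": 1450.0, "novelty": 2.43}

def passes(pluripotency, novelty, thresholds):
    """True if the sample lies in the upper left quadrant of the thresholds."""
    return (pluripotency > thresholds["pluripotency"]
            and novelty < thresholds["novelty"])

def classify(pluripotency, novelty):
    """Assign a sample to one of three zones (labels are hypothetical)."""
    if passes(pluripotency, novelty, EMPIRICAL):
        return "high-quality pluripotent"
    if passes(pluripotency, novelty, STATISTICAL):
        return "needs further evaluation"
    return "not high-quality pluripotent"

def sensitivity(true_positives, false_negatives):
    """Fraction of truly pluripotent samples that pass the thresholds."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of nonpluripotent samples that fail the thresholds."""
    return true_negatives / (true_negatives + false_positives)
```

As a plausibility check, the reported sensitivities of 98.55% and 82.61% are consistent with 68 and 57 of the 69 high-quality datasets passing the statistical and empirical thresholds, respectively: `sensitivity(68, 1)` and `sensitivity(57, 12)` round to these percentages. These counts are inferred from the percentages and are not stated in the letter.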
Finally, we tested the derived thresholds with NGS data from our own laboratory. We used iPSCs (n = 24), ESCs (n = 2), cancer cells (n = 14), and fibroblasts (n = 12). All of our cells were correctly classified, with only a few iPSC lines falling between the statistical and empirical thresholds.
Taken together, we show that PluriTest is compatible with NGS data and is a valuable tool for the analysis of NGS-based iPSC profiles. Statistical threshold adaptation for NGS data is possible on a small training dataset, can be performed by any investigator proficient in R, and allows for a reasonable quality estimation of pluripotent cells. However, as different mapping algorithms exist, it is mandatory that individual experimenters strictly follow the methods outlined and compare their calculated scores with the ones calculated by us (Supplementary Data). In contrast, the original PluriTest application [2] relies on empirically derived thresholds. Such thresholds can just as well be calculated for NGS data by employing high-quality cells from different public sources. They lead to slightly more conservative cutoff criteria, but require a bioinformatically well-equipped laboratory environment.
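The comparison of an investigator's own scores with the published reference scores can be sketched as below. The letter does not specify how close the scores must be, so the data layout, the function name, and the tolerance values are hypothetical and would need to be set by the investigator.

```python
def compare_scores(own, reference, tolerance):
    """Report samples whose scores deviate from the reference scores.

    own, reference: dict mapping sample ID -> (pluripotency, novelty).
    tolerance: (max pluripotency deviation, max novelty deviation);
               hypothetical values, not taken from the letter.
    Returns a dict of sample ID -> (pluripotency deviation, novelty deviation)
    for every shared sample that exceeds either tolerance.
    """
    deviations = {}
    for sample_id in sorted(own.keys() & reference.keys()):
        d_pluripotency = abs(own[sample_id][0] - reference[sample_id][0])
        d_novelty = abs(own[sample_id][1] - reference[sample_id][1])
        if d_pluripotency > tolerance[0] or d_novelty > tolerance[1]:
            deviations[sample_id] = (d_pluripotency, d_novelty)
    return deviations
```

An empty result would suggest that the mapping and scoring pipeline reproduces the reference scores within the chosen tolerance. Any reported sample points to a difference in processing that should be resolved before the thresholds are applied.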
References
1. Buta C, David R, Dressel R, Emgard M, Fuchs C, Gross U, Healy L, Hescheler J, Kolar R, et al. (2013). Reconsidering pluripotency tests: do we still need teratoma assays? Stem Cell Res, 11:552–562.
2. Muller FJ, Schuldt BM, Williams R, Mason D, Altun G, Papapetrou EP, Danner S, Goldmann JE, Herbst A, et al. (2011). A bioinformatic assay for pluripotency in human cells. Nat Methods, 8:315–317.
3. Loring JF and Muller FJ (2013). Compositions and methods for defining cells. US Patent 8,442,772.
4. Abyzov A, Mariani J, Palejev D, Zhang Y, Haney MS, Tomasini L, Ferrandino AF, Rosenberg Belmaker LA, Szekely A, et al. (2012). Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature, 492:438–442.