Loss-Function Learning for Digital Tissue Deconvolution

Abstract

The gene expression profile of a tissue averages the expression profiles of all cells in this tissue. Digital tissue deconvolution addresses the following inverse problem: given the expression profile y of a tissue, what is the cellular composition c of that tissue? If X is a matrix whose columns are reference profiles of individual cell types, the composition c can be computed by minimizing $ℒ (y - X c)$ for a given loss function $ℒ$ . Current methods use predefined all-purpose loss functions. They successfully quantify the dominating cells of a tissue, while often falling short in detecting small cell populations. In this study we use training data to learn the loss function $ℒ$ along with the composition c . This allows us to adapt to application-specific requirements such as focusing on small cell populations or distinguishing phenotypically similar cell populations. Our method quantifies large cell fractions as accurately as existing methods and significantly improves the detection of small cell populations and the distinction of similar cell types.

1. Introduction

Different tissues of the body have different cellular compositions. The composition of tumor tissue is different from that of normal tissue. Also, when comparing two tumor tissues, their cellular composition can differ greatly. The relatively small populations of tumor-infiltrating immune cells are of particular importance. They affect progression of disease (Galon et al., 2006) and success of treatment (Fridman et al., 2012). Immune therapies block communication lines between tumor cells and infiltrating immune cells. Whether they are successful or not depends on the presence, quantity, and molecular subtype of the infiltrating immune cells (Hackl et al., 2016). Immune cell populations are typically small, and their molecular phenotype can be difficult to observe under the microscope. Single-cell technologies such as fluorescence-activated cell sorting (FACS; e.g., [Ibrahim and van den Engh, 2007]), cytometry by time-of-flight (Bendall et al., 2011), and single-cell RNA sequencing (Wu et al., 2014) assess molecular features on the single-cell level and can thus be used to determine the cellular tissue composition experimentally.

A more cost- and work-efficient alternative to single-cell assays is a combination of bulk tissue gene expression profiling with digital tissue deconvolution (DTD) (Lu et al., 2003; Abbas et al., 2009; Gong et al., 2011; Qiao et al., 2012; Altboum et al., 2014; Newman et al., 2015; Li et al., 2016). DTD addresses the following inverse problem: given the bulk gene expression profile y of a tissue, what is the cellular composition c of that tissue? Supervised DTD assumes that there is a matrix X whose columns are reference profiles of individual cell types. The composition c of y can be computed by minimizing $ℒ (y - X c)$ for a given loss function $ℒ$ . Competing DTD methods use different predefined all-purpose loss functions $ℒ$ and different estimation algorithms to distil c from y and X.

The practical objective of DTD is to estimate c correctly, whereas the formal objective of common DTD algorithms is to estimate y correctly. If tissue expression profiles were exact mixtures of reference profiles, existing methods should work perfectly. They are not, and this causes three problems:

(1) Collections of reference profiles can be incomplete. There might be cells in the tissue that are not represented by the reference profiles. In that case the global DTD problem is not solvable, and DTD algorithms will compensate for the contributions of these cells by increasing the frequencies of other cell types.

(2) Small cell fractions are hard to quantify. From a practical point of view, this is probably the most important point, and improvements are needed badly. Immunological cell populations in a tumor are small, but they may determine the reaction of a tumor to immunotherapy. Therefore, DTD algorithms must use faint signals from small cell populations more effectively.

(3) Some cell types can hardly be distinguished by their expression profiles. The profile of an epithelial cell differs greatly from that of a lymphoid cell. For two immunological subentities of CD8⁺ T cells, the differences are more subtle. The more similar two cell types are, the more similar are their expression profiles, and the more difficult is their distinction.

In summary, different applications need different approaches. One way to adapt the estimation of c is to adapt the loss function $ℒ$ . If the focus of an application is on a predefined set of cell types, genes that are informative to distinguish exactly these cells should dominate $ℒ$ . This is even more important if the focus is on small cell populations, the faint signals of which must not be suppressed. Unfortunately, it is not clear a priori which genes to ignore and which to focus on.

2. Methods

2.1. Notations

Let $X \in ℛ^{p \times q}$ be a matrix with cellular reference profiles $X_{\cdot, j}$ in its columns, where the dot stands for all row indices. $X_{i j}$ is the reference expression value of gene i in cells of type j, p the number of genes, and q the number of cell types in X, respectively. We further introduce a matrix $Y \in ℛ^{p \times n}$ with bulk profiles of n cell mixtures $Y_{\cdot, k}$ in its columns and a matrix $C \in ℛ^{q \times n}$ with the cellular compositions of the mixtures $C_{\cdot, k}$ as columns.

2.2. Loss-function learning

Following the established linear DTD algorithms, we approximate the mixture $Y_{\cdot, k}$ by a linear combination of reference profiles (the columns of X) with $C_{., k}$ as weights and estimate the composition of the k-th mixture $C_{\cdot, k}$ by minimizing $ℒ_{g} (Y_{\cdot, k} - X C_{\cdot, k}),$ (1)

where $ℒ_{g} = \| \mathrm{diag} (g) (Y_{\cdot, k} - X C_{\cdot, k}) \|_{2}^{2} .$ (2)
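To make the role of the gene weights concrete, the following minimal sketch evaluates Equation (2) as printed for a single mixture. The function name `weighted_squared_loss` and all numbers are illustrative assumptions, not part of the original study.

```python
def weighted_squared_loss(g, y, X, c):
    """Inner loss of Equation (2): || diag(g) (y - X c) ||_2^2."""
    total = 0.0
    for i in range(len(y)):
        fitted = sum(X[i][j] * c[j] for j in range(len(c)))
        total += (g[i] * (y[i] - fitted)) ** 2
    return total

# Toy example (made-up numbers): 3 genes, 2 cell types.
X = [[10.0, 0.0],    # gene 1: marker of the abundant cell type
     [0.0, 8.0],     # gene 2: marker of the rare cell type
     [5.0, 5.0]]     # gene 3: expressed equally in both
y = [9.0, 0.8, 5.0]  # noise-free mixture of 90% and 10%

c_true = [0.9, 0.1]
c_wrong = [0.9, 0.0]            # misses the rare population entirely
uniform = [1.0, 1.0, 1.0]       # corresponds to the standard choice g = 1
rare_marker_only = [0.0, 1.0, 0.0]

print(weighted_squared_loss(uniform, y, X, c_true))
print(weighted_squared_loss(uniform, y, X, c_wrong))
print(weighted_squared_loss(rare_marker_only, y, X, c_wrong))
```

Setting a weight to zero removes the corresponding gene from the fit, so the choice of g decides which residuals the estimate of c has to explain.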

In contrast to standard DTD algorithms, which determine g by prior knowledge or separate statistical analysis, we learn g directly from data. To this end, we assume that we have a training set of mixtures $Y_{\cdot, k}$ from a specific application context with known cellular proportions $C_{\cdot, k}$ that sum to 1. The entries of g are the gene weights that define the loss function. We want to learn g from the training data such that minimizing $ℒ_{g} (y - X c)$ with respect to c yields accurate quantifications of cell populations for future samples with similar characteristics as those used for training.

Our method has two nested objective functions: an outer function $L (g)$ and an inner function $ℒ_{g}$, which is here given by Equation (2). L evaluates discrepancies between the estimated and the true cellular frequencies of cell types across samples by Pearson correlation: $L (g) = - \sum_{j = 1}^{q} \mathrm{cor} (C_{j, \cdot}, Ĉ_{j, \cdot} (g)) \quad \text{subject to } g_{i} \geq 0 \text{ and } \| g \|_{2} = 1,$ (3)

where the $Ĉ_{j, \cdot} (g)$ are the estimates of $C_{j, \cdot}$ given g. To evaluate $L (g)$, we need to calculate all $Ĉ_{j, \cdot} (g)$, which requires optimizing $ℒ_{g}$ with respect to all $C_{\cdot, k}$. Note that if $ĝ$ is a minimum of L, so is $α ĝ$ for $α > 0$. The constraint $\| g \|_{2} = 1$ is thus needed to ensure unique solutions. We chose Pearson correlation in Equation (3) because it is insensitive to normalization uncertainties, as described in section 5.1.
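The outer objective of Equation (3) can be sketched as follows for given true and estimated compositions. This is an illustrative stand-alone version: the helper names `pearson` and `outer_loss` and the toy matrices are assumptions, and the constraints on g are not handled here.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def outer_loss(C_true, C_hat):
    """Minus the sum over cell types of cor(C[j, :], C_hat[j, :]).

    Rows are cell types, columns are samples.
    """
    return -sum(pearson(t, e) for t, e in zip(C_true, C_hat))

# Toy data (made up): 2 cell types, 4 samples.
C_true = [[0.1, 0.2, 0.3, 0.4],
          [0.9, 0.8, 0.7, 0.6]]
perfect = [row[:] for row in C_true]
reversed_type_2 = [C_true[0], C_true[1][::-1]]

print(outer_loss(C_true, perfect))          # close to -q = -2, the best possible value
print(outer_loss(C_true, reversed_type_2))  # close to 0: +1 for type 1, -1 for type 2
```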

The minimum of $ℒ_{g}$ can be calculated analytically, yielding $\hat{C} (g) = {(X^{T} Γ X)}^{- 1} X^{T} Γ Y$ (4)

with $Γ = \mathrm{diag} (g)$. Inserting this term into L leaves us with a single optimization problem in g. We minimize L by a gradient-descent algorithm. Let $μ_{j}$ and $σ_{j}$ be the mean and standard deviation of $C_{j, \cdot}$, and ${\hat{μ}}_{j}$ and ${\hat{σ}}_{j}$ those of $Ĉ_{j, \cdot}$, respectively. We obtain the gradient $\frac{\partial L (g)}{\partial g_{i}} = \sum_{j = 1}^{q} \sum_{k = 1}^{n} \frac{1}{σ_{j} {\hat{σ}}_{j}} \left( \frac{\mathrm{cov} (C_{j, \cdot}, Ĉ_{j, \cdot})}{n {\hat{σ}}_{j}^{2}} (Ĉ_{j k} - {\hat{μ}}_{j}) - \frac{1}{n} (C_{j k} - μ_{j}) \right) \frac{\partial Ĉ_{j k} (g)}{\partial g_{i}}$ (5)

with $\frac{\partial Ĉ (g)}{\partial g_{i}} = {(X^{T} Γ X)}^{- 1} X^{T} δ (i) (1 - X {(X^{T} Γ X)}^{- 1} X^{T} Γ) Y,$ (6)

where $δ (i) \in ℛ^{p \times p}$ is defined as $δ {(i)}_{j k} = \begin{cases} 1 & \text{if } i = j = k, \\ 0 & \text{else} . \end{cases}$ (7)

Using these equations, we implemented our optimization algorithm. For a more efficient implementation, see section 5.2. The constraints $\| g \|_{2} = 1$ and $g_{i} \geq 0$ were incorporated by normalizing g by its length and by restricting the search space to $g_{i} \geq 0$.
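The closed-form estimate of Equation (4) and the handling of the constraints can be illustrated with a small dependency-free sketch. All function names are hypothetical, the linear solver is a plain Gauss-Jordan routine chosen only to keep the example self-contained, and clipping negative weights to zero is one assumed way of restricting the search space to $g_{i} \geq 0$. The weights enter linearly, as in Equation (4); read literally, Equation (2) corresponds to weights $g_{i}^{2}$, which for $g_{i} \geq 0$ is only a reparametrization.

```python
import math

def solve(A, B):
    """Solve A Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    q = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(q)]  # augmented matrix [A | B]
    for col in range(q):
        pivot = max(range(col, q), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(q):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[q:] for row in M]

def estimate_compositions(X, Y, g):
    """Equation (4): C_hat = (X^T G X)^-1 X^T G Y with G = diag(g)."""
    p, q, n = len(X), len(X[0]), len(Y[0])
    XtGX = [[sum(X[i][a] * g[i] * X[i][b] for i in range(p)) for b in range(q)]
            for a in range(q)]
    XtGY = [[sum(X[i][a] * g[i] * Y[i][k] for i in range(p)) for k in range(n)]
            for a in range(q)]
    return solve(XtGX, XtGY)

def project(g):
    """Assumed constraint step: clip negative weights, rescale to unit length."""
    clipped = [max(v, 0.0) for v in g]
    norm = math.sqrt(sum(v * v for v in clipped))
    return [v / norm for v in clipped]

# Toy data (made up): 3 genes, 2 cell types, 2 noise-free mixtures with Y = X C.
X = [[10.0, 0.0], [0.0, 8.0], [5.0, 5.0]]
C = [[0.9, 0.5], [0.1, 0.5]]
Y = [[9.0, 5.0], [0.8, 4.0], [5.0, 5.0]]

g = project([1.0, 2.0, -0.5])
C_hat = estimate_compositions(X, Y, g)
C_hat_scaled = estimate_compositions(X, Y, [5.0 * v for v in g])
print(C_hat)
print(C_hat_scaled)  # rescaling g leaves the estimate unchanged
```

The second call illustrates why the length of g has to be fixed: multiplying g by a positive constant does not change $Ĉ (g)$ and therefore does not change $L (g)$.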

3. Results

3.1. DTD of melanomas

For both training and validation, we need expression profiles of cellular mixtures of known composition. We used expression data of melanomas whose composition has been experimentally resolved using single-cell RNAseq profiling (Tirosh et al., 2016). These data included 4645 single-cell profiles from 19 melanomas. The cells were annotated as T cells (2068), B cells (515), macrophages (126), endothelial cells (65), cancer-associated fibroblasts (CAFs) (61), natural killer (NK) cells (52), and tumor/unclassified (1758). The first 9 melanomas defined our validation cohort and the remaining 10 defined our training data.

First, data were transformed into transcripts per million. Then, for each cell cluster we sampled 20% of single-cell profiles in the training data, summed them up, normalized them to a common number of counts, and removed them from the training data. This yielded reference profiles $X_{\cdot, j}$. The 1000 genes with the highest variance across all reference profiles were used to train models.

The sum of all single-cell profiles of a melanoma gave us bulk profiles. In addition, we generated a large number of artificial bulk profiles by randomly sampling single-cell profiles and summing them up. All bulk profiles were normalized to the same number of reads as those in $X_{\cdot, j}$ .

3.2. Loss-function learning improves DTD accuracy in the case of incomplete reference data

We generated 2000 artificial cellular mixtures from our training cohort. For each of these mixtures, we randomly drew 100 single-cell profiles, summed up their raw counts, and normalized them to a fixed number of total counts. Analogously, we generated 1000 artificial cellular validation mixtures.
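The mixture simulation described above can be sketched as follows. The function name, the target library size of one million, sampling without replacement, and the random toy data are assumptions made for illustration; the text only specifies that 100 single-cell profiles were drawn, summed, and normalized to a fixed number of total counts.

```python
import random
from collections import Counter

def simulate_mixture(profiles, labels, cell_types, n_cells=100, total=1e6, rng=random):
    """Draw n_cells single-cell profiles, sum their counts, rescale to `total`.

    Returns the artificial bulk profile and the true composition,
    i.e. the fraction of drawn cells per cell type.
    """
    idx = rng.sample(range(len(profiles)), n_cells)  # assumed: without replacement
    n_genes = len(profiles[0])
    bulk = [sum(profiles[i][gene] for i in idx) for gene in range(n_genes)]
    scale = total / sum(bulk)
    bulk = [v * scale for v in bulk]
    counts = Counter(labels[i] for i in idx)
    composition = [counts[t] / n_cells for t in cell_types]
    return bulk, composition

# Toy single-cell data (random, for illustration only): 300 cells, 5 genes.
rng = random.Random(0)
cell_types = ["T", "B", "macrophage"]
labels = [rng.choice(cell_types) for _ in range(300)]
profiles = [[rng.randint(1, 20) for _ in range(5)] for _ in range(300)]

bulk, composition = simulate_mixture(profiles, labels, cell_types, rng=rng)
print(composition)
```

Repeating the call yields training pairs of a bulk profile (a column of Y) and its known composition (a column of C).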

Then, we restricted X to three cell types (T cells, B cells, and macrophages). Hence endothelial cells, CAFs, NK cells, and tumor/unclassified cells in the mixtures are not represented in X. For standard DTD with $g = (1, \dots, 1)$, we observed correlation coefficients of 0.70 (T cells), 0.39 (B cells), and 0.52 (macrophages) between true and estimated cell population sizes for the validation mixtures. These improved to 0.86 (T cells), 0.89 (B cells), and 0.83 (macrophages) for loss-function learning, after we ran 1000 iterations of the gradient descent algorithm on the training data. We tested our gradient descent algorithm on the 100 most variable genes for 100 different uniformly drawn starting points $g \in {[0, 1]}^{p}$. The maximal Euclidean distance between resulting composition vectors c was 2%.

To test the limits of the approach, we excluded all but the macrophages, which account for <3% of all cells, from the reference data X. We observed that standard DTD broke down, whereas loss-function learning yielded a model that predicted macrophage abundances that still correlated well ($r = 0.84$) with the true abundances (Fig. 1).

FIG. 1.

Deconvolution performance with only a single reference profile (macrophages). Predicted cell frequencies are plotted versus real frequencies. Results from the standard DTD model with $g = 1$ are shown in (a), for DTD with loss-function learning in (b). DTD, digital tissue deconvolution.

3.3. Loss-function learning improves the quantification of small cell populations

We generated data as above for mixtures of T cells, B cells, macrophages, endothelial cells, CAFs, NK cells, and tumor/unclassified cells, and used all cells except the tumor cells in X. This time we controlled the abundance of B cells in the simulated mixtures at 0–5, 5–15, 15–30, 30–50, and 50–75 out of 100 cells. Not surprisingly, small fractions of B cells were harder to quantify than large fractions. Loss-function learning improved the accuracy for all amounts of B cells, but the improvements were greatest for small amounts (Fig. 2a). With only 0–5 B cells in a mixture, the accuracy improved from $r = 0.22$ to $r = 0.79$. Furthermore, we observed that loss-function learning on small B cell proportions yielded a model that was highly predictive of B cell contributions over the whole spectrum (Fig. 2a, green stars).

FIG. 2.

Plot (a) shows how the correlation between predicted and true cellular frequencies for B cells depends on the proportion of B cells. The blue triangles correspond to models from loss-function learning and red diamonds correspond to the standard DTD model with $g = 1$. Furthermore, the green stars show how the model trained on mixtures with 0 to 5% B cells extrapolates to higher B cell proportions. The orange line, in contrast, shows a model that was trained on mixtures with 50 to 75% B cells and extrapolates to lower B cell proportions. Plot (b) shows a heatmap of the 50 most important genes corresponding to the green star model (genes were ranked by $ĝ_{i} \times \mathrm{var} (X_{i, \cdot})$). Plot (c) shows an analogous heatmap for loss-function learning on macrophages only. Blue corresponds to low expression and red to high expression.

If we compare the top-ranked genes of the model learned for the small B cell population (Fig. 2b) with that of the macrophage-focused simulation (Fig. 2c), we observe that the former still comprises marker genes to distinguish all cell types, whereas the latter focuses on genes that characterize macrophages.

3.4. Loss-function learning improves the distinction of closely related cell types

The cell types that were annotated by Tirosh et al. (2016) displayed very different expression profiles. If we are interested in T cell subtypes such as CD8⁺ T cells, CD4⁺ T-helper (Th) cells, and regulatory T cells (Tregs), reference profiles are more similar and DTD is more challenging. We subdivided the fraction of annotated T cell profiles as follows: all T cells with positive CD8 (sum of CD8A and CD8B) and zero CD4 count were labeled CD8⁺ T cells (1130). Vice versa, T cells with zero CD8 and positive CD4 count were labeled CD4⁺ T cells (527). These were further split into Tregs if both their FOXP3 and CD25 (IL2RA) count was positive (64), and CD4⁺ Th cells otherwise (463). T cells that fulfilled neither the CD4⁺ nor the CD8⁺ criteria (411) contributed to the mixtures, but were not assessed by DTD. We augmented the reference matrix X, here consisting of T cells, B cells, macrophages, endothelial cells, CAFs, and NK cells, by these cell types, replacing the original all T cell profile with the more specific profiles for CD8⁺ T cells, CD4⁺ Th, and Tregs. Then we simulated 2000 training and 1000 test mixtures as already described.
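The labeling rules for the T cell subtypes can be written down compactly. The sketch below follows the criteria stated above; the function name, the dictionary input format, and the label strings are illustrative assumptions.

```python
def label_t_cell(counts):
    """Assign a T cell subtype from the marker counts of one cell.

    `counts` maps gene symbols to expression counts; missing genes count as 0.
    """
    cd8 = counts.get("CD8A", 0) + counts.get("CD8B", 0)
    cd4 = counts.get("CD4", 0)
    if cd8 > 0 and cd4 == 0:
        return "CD8+ T"
    if cd4 > 0 and cd8 == 0:
        # CD4+ cells are Tregs only if both FOXP3 and CD25 (IL2RA) are detected.
        if counts.get("FOXP3", 0) > 0 and counts.get("IL2RA", 0) > 0:
            return "Treg"
        return "CD4+ Th"
    # Neither criterion fulfilled: the cell contributes to mixtures
    # but is not assessed by DTD.
    return "unassigned"

print(label_t_cell({"CD8A": 3, "CD4": 0}))
print(label_t_cell({"CD4": 2, "FOXP3": 1, "IL2RA": 4}))
print(label_t_cell({"CD4": 2, "FOXP3": 1}))
print(label_t_cell({"CD4": 1, "CD8B": 1}))
```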

For standard DTD with $g = 1$ , we observed correlation coefficients of 0.19 (CD4⁺ Th), 0.53 (CD8⁺), and 0.08 (Tregs) between true and estimated cell population sizes. These improved to 0.58 (CD4⁺ Th), 0.78 (CD8⁺), and 0.57 (Tregs) for our method (Fig. 3).

FIG. 3.

Deconvolution of T cell subentities. Results from the standard DTD model with $g = 1$ are shown in the upper row, plots (a–c), results from loss-function learning in the lower row, plots (d–f).

3.5. Loss-function learning is beneficial even for small training sets, and the performance improves as the training data set grows

We repeated the simulation in section 3.4, but varied the size of the training data set. We observed that loss-function learning improved accuracy for training data sets as small as 15 samples. Moreover, with more training data added, the boost in performance grew and saturated only for training sets with >1000 samples (Fig. 4).

FIG. 4.

Performance with and without loss-function learning as a function of the size of the training set. Performance was assessed by calculating the average correlations between predicted and true cellular contributions over all cell types. The blue diamonds and black triangles correspond to the performance of loss-function learning for the validation mixtures and training mixtures, respectively. The performance of standard DTD with $g = 1$ is shown as a red line for the validation mixtures.

3.6. High-performance computing-empowered loss-function learning rediscovers established cell markers and complements them by new discriminatory genes for improved performance

Here, we introduce a final model, optimized on the 5000 most variable genes. For this purpose, we generated 25,000 training mixtures from the melanomas of the training data. With standard desktop workstations, the solution of this problem was computationally not feasible. A single computation of the gradient took 16 hours (2x Intel Xeon CPU [X5650; Nehalem Six Core, 2.67 GHz], 148 GB RAM), and the gradient needs to be computed several hundred times until convergence. Therefore, we developed a high-performance computing implementation of our code by parallelizing Equations (3) and (6) with MPI, using the pbdMPI library (Chen et al., 2012a, 2012b) as an interface. Furthermore, we linked R with the Intel Math Kernel Library for threaded and vectorized matrix operations. We ran the algorithm on 25 nodes of our QPACE 3 machine (Georg et al., 2017) with 8 MPI tasks per node and 32 hardware threads per task, where each thread can use two AVX512 vector units. In 16 hours, 5086 iterations were finished, after which the loss (3) was stable to within 1%.

The high-performance model includes several genes, whose expression is characteristic for the cells distinguished in this study. These include, among others, the CD8A gene, which encodes an integral membrane glycoprotein essential for the activation of cytotoxic T-lymphocytes (Veillette et al., 1988) and the protection of a subset of NK cells against lysis, thus enabling them in contrast to CD8⁻ NK cells to lyse multiple target cells (Addison et al., 2005). As evident from Figure 5, NK cells are clearly set apart from all the other cell types studied by the expression of the killer cell lectin-like receptor genes KLRB1, KLRC1, and KLRF1 (Moretta et al., 2001). B cells, in contrast, are clearly characterized by the expression of (1) CD19, which assembles with the antigen receptor of B lymphocytes and influences B cell selection and differentiation (Rickert et al., 1995), (2) CD20 (MS4A1), which is coexpressed with CD19 and functions as a store-operated calcium channel (Li et al., 2003), (3) B lymphocyte kinase (BLK), a src-family protein tyrosine kinase that plays an important role in B cell receptor signaling and phosphorylates specifically (4) CD79A at Tyr-188 and Tyr-199 as well as CD79B (not among the top 150 genes) at Tyr-196 and Tyr-207, which are required for the surface expression and function of the B cell antigen receptor complex (Hsueh and Scheuermann, 2000), and (5) B Cell Linker (BLNK), which bridges BLK activation with downstream signaling pathways (Wienands et al., 1998). The expression of FOXP3 is also highly cell specific. FOXP3 distinguishes Tregs from other CD4⁺ cells and functions as a master regulator of their development and function (Hori et al., 2003). Finally, CD4⁺ Th cells are distinguished indirectly from all the other aforementioned lymphocytes by the lack of expression of cell type-specific genes.

FIG. 5.

Heatmap of X for the features with the top 150 weights ($ĝ_{i} \times \mathrm{var} (X_{i, \cdot})$). Blue corresponds to low expression and red to high expression. The data were clustered by Euclidean distance.

In contrast to lymphocytes, macrophages, CAFs, and endothelial cells (which line the interior surface of blood vessels and lymphatic vessels) are each characterized by a much larger number of genes. Exemplary genes include CD14, CD163, MSR1, STAB1, and CSF1R for macrophages. The monocyte differentiation antigen CD14, for instance, mediates the innate immune response to bacterial lipopolysaccharide by activating the NF-κB pathway and cytokine secretion (Haziot et al., 1996), whereas the colony stimulating factor 1 receptor (CSF1R) acts as a receptor for the hematopoietic growth factor CSF1, which controls the proliferation and function of macrophages (Sherr et al., 1985). CAFs, in contrast, are distinguished by the expression of genes encoding extracellular matrix proteins such as fibulin-3 (EFEMP1), various collagens (COL1A1, COL3A1, COL6A1, and COL6A3), versican (VCAN), a well known mediator of cell-to-cell and cell-to-matrix interactions (Wu et al., 2005) that plays critical roles in cancer biology (Du et al., 2013), as well as the matrix metalloproteinases MMP1 and MMP2, two collagen degrading enzymes that allow cancer cells to migrate out of the primary tumor to form metastases (Gupta et al., 2014).

Noteworthy is also GREM1, an antagonist of the bone morphogenetic protein pathway. Its expression and secretion by stromal cells in tumor tissues promote the survival and proliferation of cancer cells (Sneddon et al., 2006). Genes characteristic for endothelial cells include among others CDH5, a member of the cadherin superfamily essential for endothelial adherens junction assembly and maintenance (Gory-Faure et al., 1999), the endothelial cell-specific chemotaxis receptor (ECSCR) gene, which encodes a cell surface single-transmembrane domain glycoprotein that plays a role in endothelial cell migration, apoptosis and proliferation (Shi et al., 2011), claudin-5 (CLDN5), which forms the backbone of tight junction strands between endothelial cells (Haseloff et al., 2015), and the von Willebrand factor, which mediates the adhesion of platelets to sites of vascular damage by binding to specific platelet membrane glycoproteins and to constituents of exposed connective tissue (Sadler, 1998).

We discussed 28 genes of the top 150 shown in Figure 5. These genes have a total weight of 28% of all 5000 gene weights (calculated as $ĝ_{i} \times \mathrm{var} (X_{i, \cdot})$). Our algorithm complements this gene set with additional genes, including some that were, to our knowledge, not yet used to characterize cell types. An interesting example is CXorf36 (DIA1R), which has been described as being expressed at low levels in many tissues and deletion and/or mutations of which have been associated with autism spectrum disorders (Aziz et al., 2011). However, nothing is known about its function to date. Therefore, its observed overexpression in endothelial cells may provide an important clue for future study on its function.
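The gene ranking and the weight share quoted above both derive from the score $ĝ_{i} \times \mathrm{var} (X_{i, \cdot})$. A minimal sketch of this bookkeeping is given below; the function names, gene names, and numbers are made up, and the sample variance is an assumption (the ranking and the share do not depend on whether the sample or population variance is used, because the factor is the same for all genes).

```python
from statistics import variance

def gene_scores(g, X):
    """Score of gene i: learned weight g_i times the variance of its reference row."""
    return [g_i * variance(row) for g_i, row in zip(g, X)]

def rank_genes(g, X, names):
    """Gene names sorted by decreasing score."""
    scores = gene_scores(g, X)
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]

def weight_share(g, X, names, selected):
    """Fraction of the total score carried by the genes in `selected`."""
    scores = dict(zip(names, gene_scores(g, X)))
    return sum(scores[name] for name in selected) / sum(scores.values())

# Toy example (made up): 3 genes, 3 cell types.
names = ["geneA", "geneB", "geneC"]
g_hat = [0.2, 0.5, 0.3]
X = [[1.0, 1.0, 1.0],   # constant across cell types: variance 0
     [0.0, 2.0, 4.0],   # variance 4
     [0.0, 0.0, 3.0]]   # variance 3

ranking = rank_genes(g_hat, X, names)
share_top1 = weight_share(g_hat, X, names, ranking[:1])
print(ranking, share_top1)
```

A gene with a large learned weight but no variation across the reference profiles receives a score of zero, which is why the weight alone is not used for ranking.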

3.7. Loss-function learning shows similar performance as CIBERSORT for the dominating cell populations and improves accuracy for small populations and in the distinction of closely related cell types

Next, we compared our model trained in section 3.6 with a competing method. For this, we generated 1000 test mixtures from our validation melanomas. We chose CIBERSORT (Newman et al., 2015) for comparison, because it was consistently among the best DTD algorithms in a broad comparison of five different algorithms on several benchmark data sets (Newman et al., 2015). We ran CIBERSORT on the test mixtures, using two distinct approaches. First, we uploaded our validation data to CIBERSORT using their reference profiles. The performance is summarized in Figure 6 as CIBERSORT^a. We observed that the large population of B cells was estimated accurately, whereas estimates for smaller populations (NK cells, Tregs) were inaccurate. Second, we uploaded our reference profiles and used the CIBERSORT gene selection (CIBERSORT^b, green in Fig. 6). With this setup, the dominating cell populations were predicted with high accuracy. However, the distinction of similar cell types such as CD4⁺ T helper cells and Tregs was compromised, $r = 0.42$ and $r = 0.42$, respectively. Similarly, predictions for the small populations of CAFs were compromised. That might be explained by the fact that CIBERSORT does not take into account their distinction and thus appropriate marker genes might be missing. In a direct comparison with CIBERSORT, our method showed similar or better performance.

FIG. 6.

Performance comparison. The methods are from left to right: standard DTD with $g = 1$ on the 5000 most variable genes (red), CIBERSORT^b (green), loss-function learning (blue), the study where bulk and reference profiles were generated with different technologies (pink), and CIBERSORT^a (yellow). Performance was calculated as correlation between predicted and true frequencies on 1000 validation mixtures. Endothelial cells (endo.) and CAFs were not estimated by CIBERSORT^a, and microarray reference profiles for them were not available; thus, no yellow and pink bars are shown for these cell types. CAF, cancer-associated fibroblast.

Next, we tested whether our method would have also worked for bulk profiles generated by a different technology than the reference profiles. We used the scRNAseq-derived loss-function and the bulk profiles already described but replaced the reference profiles in X by microarray data downloaded from the CIBERSORT webpage. We rescaled the microarray matrix X such that the gene-wise means were identical to the scRNAseq data. Results are shown in Figure 6 in pink. Although accuracy was slightly reduced, we still improved on the CIBERSORT results.
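The rescaling of the microarray reference matrix can be sketched as a gene-wise multiplicative correction. This is an assumed reading of the step described above: the function name and the numbers are illustrative, the mean is taken across cell types, and setting genes with zero microarray mean to zero is an arbitrary choice made here to avoid a division by zero.

```python
def match_gene_means(X_array, X_seq):
    """Rescale each gene (row) of X_array so that its mean equals that of X_seq."""
    rescaled = []
    for row_a, row_s in zip(X_array, X_seq):
        mean_a = sum(row_a) / len(row_a)
        mean_s = sum(row_s) / len(row_s)
        factor = mean_s / mean_a if mean_a != 0 else 0.0  # assumed zero handling
        rescaled.append([v * factor for v in row_a])
    return rescaled

# Toy example (made up): 2 genes, 2 cell types.
X_seq = [[2.0, 4.0], [10.0, 30.0]]    # scRNAseq-derived reference, gene means 3 and 20
X_array = [[1.0, 3.0], [4.0, 6.0]]    # microarray reference, gene means 2 and 5

X_rescaled = match_gene_means(X_array, X_seq)
print(X_rescaled)
```

The correction keeps the relative expression of a gene across cell types as measured on the microarray and only changes its overall scale.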

3.8. Loss-function learning improves the decomposition of bulk melanoma profiles

All mixtures discussed so far were artificial because only 100 single-cell profiles were chosen randomly. They might differ significantly from mixtures in real tissue. Therefore, we generated 19 full bulk melanoma profiles by summing up the respective single-cell profiles. These should reflect bulk melanomas (Marinov et al., 2014). Our predictions are contrasted with the true proportions in Figure 7. Only the predictions for Tregs were compromised with $r = 0.48$, whereas the predictions for all other cell types were reliable with correlations ranging from $r = 0.70$ (CD4⁺ Th) to $r = 0.99$ (CAFs) on the validation melanomas.

FIG. 7.

Deconvolution of melanoma tissues. The circles indicate melanomas from the validation data and plusses indicate those from the training data. (a–h) Correspond to B cells, macrophages, endothelial cells, CAFs, NK cells, CD4⁺ Th cells, CD8⁺ T cells, and CD4⁺ Tregs, respectively. The solid black lines show the corresponding linear regression fits on the validation data, the dashed lines show the identity. NK, natural killer; Th, T-helper; Tregs, regulatory T cells.

4. Discussion

We suggest using training data for loss-function learning for DTD to adapt the deconvolution algorithm to the requirements of specific application domains. The concept is similar to an embedded feature-selection approach in regression or classification problems. In both contexts, feature selection is directly linked to a prediction algorithm and not treated as an independent preprocessing step.

The main limitation of our method is the availability of training data. Other methods do not use, and cannot use, training data. In fact, the strength of loss-function learning results primarily from the additional information in training data with known cellular compositions. Such data are not always available, but with current improvements in FACS and single-cell sequencing technology, they are becoming increasingly available.

We described and tested a specific instance of loss-function learning using squared residuals for $ℒ_{g}$ . The concept is not limited to this type of inner loss function and can also be used in combination with other loss functions such as those from penalized least-squares regression (Altboum et al., 2014), l₁ regression, or support vector regression (Newman et al., 2015). However, the least-squares loss function allowed us to state the outer optimization problem in a closed analytical form, reducing computational burden.

The outer loss function L evaluates the fit of estimated and true cellular proportions in the training samples. We chose the correlation of estimated versus true quantities across samples rather than an absolute measure of deviation such as $\| c - ĉ \|_{2}^{2}$, which does not fulfill the symmetry in Equation (8). Moreover, we did not require the estimated proportions $Ĉ_{\cdot, k}$ for tissue k to sum up to 1. Consequently, the estimated cellular composition for a given cell type is comparable between tissues, but the estimated cellular composition across cell types is not. When testing our method, we did not look at absolute deviations of true versus estimated cell proportions but only at their correlation. We do not infer how many cells of a specific type (e.g., T cells) are in a tissue (Fig. 7), nor whether they constitute 10% or 20% of the cells in this tissue. However, if we had two tissues and estimated that there were more cells of that type in the first tissue compared with the second, this relation was also found in the true cell populations.

In summary, we introduced loss-function learning as a new machine-learning approach to the DTD problem. It allows us to adapt to application-specific requirements such as focusing on small cell populations or delineating similar cell types. In simulations and in an application to melanoma tissues, the use of training data allowed our method to quantify large cell fractions as accurately as existing methods and significantly improved the detection of small cell populations and the distinction of similar cell types.

5. Appendix

5.1. The choice of the objective function

The loss function (3) is not unique. However, Pearson correlation has advantages with respect to data normalization as shown in the following. First, note that $\mathrm{cor} (C_{j, \cdot}, a_{j} Ĉ_{j, \cdot}) = \mathrm{cor} (C_{j, \cdot}, Ĉ_{j, \cdot}),$ (8)

where $a_{j}$ is an arbitrary positive constant. This symmetry is important, since bulk and reference profiles must be normalized to a common mean across genes or to a common library size. A normalized reference profile $X_{\cdot, j}$ of a cell type reflects the true RNA content ${\tilde{X}}_{\cdot, j}$ of these cells only up to an unknown factor: $X_{\cdot, j} = α_{j} {\tilde{X}}_{\cdot, j}$. Large cells with a lot of RNA have smaller $α_{j}$ than smaller cells. The same is true for the bulk profiles $Y_{\cdot, k}$, where we have $Y_{\cdot, k} = β_{k} Ỹ_{\cdot, k}$. The deconvolution equation $Ỹ_{\cdot, k} = \tilde{X} {\tilde{C}}_{\cdot, k} + ε$ (9)

yields estimates ${\tilde{C}}_{j k}$ that reflect the number of cells of type j. However, $Ỹ$ and $\tilde{X}$ are not observable in practice, and consequently, $\tilde{C}$ is not accessible by DTD directly. We need to work with X and Y instead.

Note that $C_{\cdot, k} = {\tilde{C}}_{\cdot, k} ∕ \sum_{j = 1}^{q} {\tilde{C}}_{j k}$. Consider now the hypothetical deconvolution formula with normalized Y but the unobservable true $\tilde{X}$: $Y_{\cdot, k} = \tilde{X} C'_{\cdot, k} + ε .$ (10)

Here, we assume $C'_{\cdot, k} = c C_{\cdot, k}$ for all k, where c is a positive constant. In other words, we assume that if the library size of $Y_{\cdot, k}$ is the same for all samples, we will roughly need the same number of cells to account for it. This allows us to replace $Ỹ$ by Y.

The choice of the correlation in the definition of $L (g)$ also allows us to replace $\tilde{X}$ by X. If we write Equation (10) using X, we obtain $Y_{\cdot, k} = \sum_{j = 1}^{q} \frac{1}{α_{j}} X_{\cdot, j} C'_{j k} + ε .$ (11)

Thus, the estimated cell frequencies are $\frac{1}{α_{j}} C'_{j, \cdot} = \frac{c}{α_{j}} C_{j, \cdot}$ , and can be quite different from the training proportions $C_{j, \cdot}$ in absolute numbers. Nevertheless, they correlate with $C_{j, \cdot}$ and will thus generate small losses $L (g)$ .

In summary, data normalization makes tissue deconvolution a nonstandard deconvolution problem. The choice of correlation as loss function allows us to estimate cell frequencies independent of normalization factors.
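For completeness, the symmetry stated in Equation (8) follows in one line from the definition of the Pearson correlation. The worked derivation below is a sketch that writes $\mathrm{sd} (\cdot)$ for the standard deviation (notation introduced here for illustration) and uses only the linearity of the covariance and the fact that the standard deviation scales with $| a_{j} |$:

```latex
\begin{aligned}
\mathrm{cor}(C_{j,\cdot},\, a_j \hat{C}_{j,\cdot})
  &= \frac{\mathrm{cov}(C_{j,\cdot},\, a_j \hat{C}_{j,\cdot})}
          {\sigma_j \, \mathrm{sd}(a_j \hat{C}_{j,\cdot})} \\
  &= \frac{a_j \, \mathrm{cov}(C_{j,\cdot},\, \hat{C}_{j,\cdot})}
          {\sigma_j \, a_j \, \hat{\sigma}_j} \\
  &= \mathrm{cor}(C_{j,\cdot},\, \hat{C}_{j,\cdot}),
  \qquad a_j > 0 .
\end{aligned}
```

Applied with $a_{j} = c ∕ α_{j}$, this is the statement used above: estimates that differ from the training proportions by a cell-type-specific positive factor give the same value of $L (g)$.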

5.2. Calculation of the loss-function gradient

The gradient $\frac{\partial L (g)}{\partial g_{i}}$ can be rewritten as follows. We define

$\begin{aligned} A_{j, k} &= \frac{1}{σ_{j} \hat{σ}_{j}} \left( \frac{\mathrm{cov} (C_{j, \cdot}, \hat{C}_{j, \cdot})}{n \hat{σ}_{j}^{2}} (\hat{C}_{j k} - \hat{μ}_{j}) - \frac{1}{n} (C_{j k} - μ_{j}) \right), \\ B &= (X^{T} Γ X)^{- 1} X^{T}, \\ C &= (1 - X (X^{T} Γ X)^{- 1} X^{T} Γ) Y, \end{aligned}$ (12)

and

$E = C \cdot A^{T} \cdot B.$ (13)

Then, we get

$\begin{aligned} \frac{\partial L (g)}{\partial g_{i}} &= \mathrm{Tr} \left[ A^{T} \cdot \frac{\partial \hat{C} (g)}{\partial g_{i}} \right] \\ &= \mathrm{Tr} [A^{T} \cdot B \cdot δ (i) \cdot C] \\ &= \mathrm{Tr} [E \cdot δ (i)] \\ &= E_{i i}. \end{aligned}$ (14)

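Here, the second equality uses $\frac{\partial \hat{C} (g)}{\partial g_{i}} = B \cdot δ (i) \cdot C$, which follows from differentiating $\hat{C} (g) = (X^{T} Γ X)^{- 1} X^{T} Γ Y$ with respect to $g_{i}$, and the third equality uses the cyclic invariance of the trace. Thus, the gradient is given by the diagonal elements of $C \cdot A^{T} \cdot B$.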
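The closed form in Equations (12)–(14) can be compared with a finite-difference approximation of the loss. The following is a minimal sketch in plain Python, assuming $Γ = \mathrm{diag} (g)$, $\hat{C} (g) = (X^{T} Γ X)^{- 1} X^{T} Γ Y$, and a loss equal to the negative sum of per-cell-type Pearson correlations. The toy data, the restriction to q = 2 cell types (so that a 2 x 2 inverse suffices), and all helper names are hypothetical. The matrix called C in Equation (12) is named `R` in the code to avoid a clash with the training proportions.

```python
import math

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a 2 x 2 matrix (enough for q = 2 cell types)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def weighted(X, g):
    """Return X^T Gamma with Gamma = diag(g); X is p x q."""
    return [[x * gi for x, gi in zip(col, g)] for col in zip(*X)]

def estimate(X, Y, g):
    """C_hat(g) = (X^T Gamma X)^{-1} X^T Gamma Y."""
    XtG = weighted(X, g)
    return matmul(inv2(matmul(XtG, X)), matmul(XtG, Y))

def moments(t, e):
    """Means, standard deviations and covariance (all with 1/n) of two rows."""
    n = len(t)
    mu, mu_h = sum(t) / n, sum(e) / n
    sd = math.sqrt(sum((a - mu) ** 2 for a in t) / n)
    sd_h = math.sqrt(sum((b - mu_h) ** 2 for b in e) / n)
    cov = sum((a - mu) * (b - mu_h) for a, b in zip(t, e)) / n
    return n, mu, mu_h, sd, sd_h, cov

def loss(C, C_hat):
    """Assumed loss: negative sum of row-wise Pearson correlations."""
    total = 0.0
    for t, e in zip(C, C_hat):
        n, mu, mu_h, sd, sd_h, cov = moments(t, e)
        total -= cov / (sd * sd_h)
    return total

def gradient(X, Y, C, g):
    """Diagonal of E = R A^T B, where R is the matrix named C in Equation (12)."""
    C_hat = estimate(X, Y, g)
    A = []
    for t, e in zip(C, C_hat):
        n, mu, mu_h, sd, sd_h, cov = moments(t, e)
        A.append([(cov / (n * sd_h ** 2) * (b - mu_h) - (a - mu) / n) / (sd * sd_h)
                  for a, b in zip(t, e)])
    B = matmul(inv2(matmul(weighted(X, g), X)), transpose(X))
    fitted = matmul(X, C_hat)
    R = [[y - f for y, f in zip(y_row, f_row)] for y_row, f_row in zip(Y, fitted)]
    E = matmul(matmul(R, transpose(A)), B)
    return [E[i][i] for i in range(len(g))]

# Hypothetical toy data: p = 4 genes, q = 2 cell types, n = 5 mixtures.
X = [[5.0, 1.0], [1.0, 4.0], [2.0, 2.0], [0.5, 3.0]]
C = [[0.8, 0.3, 0.5, 0.6, 0.1], [0.2, 0.7, 0.5, 0.4, 0.9]]
clean = matmul(X, C)
# Deterministic pseudo-noise so that the residual R is not zero.
Y = [[v + 0.3 * math.sin(3 * i + 7 * k + 1) for k, v in enumerate(row)]
     for i, row in enumerate(clean)]
g = [1.0, 0.5, 2.0, 1.5]

grad = gradient(X, Y, C, g)

# Central finite differences of the loss as an independent check.
h = 1e-6
numeric = []
for i in range(len(g)):
    up = [gi + h if j == i else gi for j, gi in enumerate(g)]
    down = [gi - h if j == i else gi for j, gi in enumerate(g)]
    numeric.append((loss(C, estimate(X, Y, up)) - loss(C, estimate(X, Y, down))) / (2 * h))

# Largest absolute deviation between analytic and numeric gradient.
print(max(abs(a - b) for a, b in zip(grad, numeric)))
```

Because $\hat{C} (g)$ does not change when g is multiplied by a positive constant, the analytic gradient should also satisfy $\sum_{i} g_{i} E_{i i} = 0$ up to rounding, which offers a second, cheap consistency check.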

Footnotes

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

This study was supported by BMBF (eMed Grant No. 031A428A) and DFG (FOR-2127 and SFB/TRR-55).
