NeuroMDAVIS: Visualization of Single-Cell Multi-Omics Data under Deep Learning Framework

Abstract

Single-cell technologies have fostered extensive advances in cell-type discovery, cell-state identification, lineage tracing, and disease understanding, among others. Further, single-cell multi-omics data generated using modern technologies provide several views of omics contributions for the same set of cells. Analyzing these views is hindered by their high dimensionality. In this regard, one effective approach is dimensionality reduction and, thereby, visualization (in 2D or 3D space) of the integrated views of multi-omics data. However, dimension reduction and visualization of these datasets remain challenging, since it is difficult to obtain a low-dimensional embedding that preserves both the local and global structures in the data. Moreover, combining the views obtained from each omics layer to interpret the underlying biology is even more challenging. In this work, we introduce NeuroMDAVIS, a novel unsupervised deep neural network model for joint visualization of biological datasets having multiple modalities. Joint visualization refers to transforming the feature space of each modality and combining them to produce a single latent embedding that supports visualization of the multi-modal dataset in the newly transformed feature space. NeuroMDAVIS integrates the transformed modalities into a shared latent space, capturing both modality-specific information and information common across all modalities. The model learns both local and global relationships within the data, providing a meaningful low-dimensional representation for further analysis.
In terms of visualization capability, NeuroMDAVIS competes with state-of-the-art visualization models such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), Fast interpolation-based t-SNE (Fit-SNE), the Siamese network-based visualization method (IVIS), and the manifold learning-based generalized version of UMAP, called MultiMAP. Downstream analyses have shown effective classification and clustering performance across all the datasets, in terms of accuracy, precision, recall, F1 score, and various cluster validity indices. To the best of our knowledge, NeuroMDAVIS is the first model to offer joint visualization of multi-modal biological datasets, providing a robust and efficient approach for understanding complex multi-omics data.

Keywords

ATAC-seq; Mass cytometry; CITE-seq; deep learning; global structure preservation; multi-omics visualization; shape preservation; single-cell omics; unsupervised learning

1. INTRODUCTION

With technological advancements, molecular biology has expanded into a plethora of possibilities. Single-cell technology has opened up new dimensions for omics data analyses, helping to unravel disease and developmental processes at the cellular level. Single-cell RNA-sequencing (scRNA-seq) measures mRNA expression at cellular resolution. Current scRNA-seq protocols include DROP-seq (Macosko et al., 2015), SMART-seq2 (Picelli et al., 2013), 10x Genomics (Weisenfeld et al., 2017; Marks et al., 2019; Zheng et al., 2017; Satpathy et al., 2019), 10x Chromium (Farbehi et al., 2019), CEL-seq2 (Hashimshony et al., 2016) and MARS-seq (Adil et al., 2021). Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) (Buenrostro et al., 2015), on the other hand, sequences the open chromatin regions within the genome. More recently, joint sequencing technologies have emerged that simultaneously measure more than one modality, thereby offering multiple views of the same cells of interest. These technologies include CITE-seq (Stoeckius et al., 2017) and REAP-seq (Peterson et al., 2017) (joint measurement of proteins and transcriptome); scM&T-seq (Angermueller et al., 2016), scMT-seq (Y. Hu et al., 2016) and scTrio-seq (Hou et al., 2016) (simultaneous measurement of methylome and transcriptome); and sci-CAR (Cao et al., 2018), SNARE-seq (S. Chen et al., 2019) and SHARE-seq (Clyde, 2021) (joint measurement of chromatin accessibility and transcriptome in single cells or nuclei). All these technologies facilitate a deeper understanding of cellular identity and processes in organs and tissues.

As data volumes have increased, addressing the challenges in data analysis with an eye toward interpretability has become necessary. A recently developed interpretable framework capable of multi-task learning, called scMoMtf (Lan et al., 2024), can address some of the challenges involved in single-cell multi-omics data analysis. However, multi-omics datasets originate from different sources, and the large number of dimensions in each modality makes the data highly complex. Thus, reducing dimensionality while preserving the inherent structures within the data is a basic requirement for easy exploration and interpretation. Visualizations help in identifying patterns, clusters, outliers, and correlations within these single- and multi-omics datasets.

The task of visualization can be thought of as a refined dimension reduction (DR) task, where the number of reduced dimensions is limited to two or three. Several existing DR techniques are used for visualization. Earlier, most DR methods were linear (Van Der Maaten et al., 2009). However, a large number of nonlinear DR techniques for data visualization, capable of handling nonlinearity in the data, have emerged during the last decade. Methods such as Stochastic Neighbor Embedding (SNE), t-distributed SNE (t-SNE) (Van der Maaten and Hinton, 2008), and Fast interpolation-based t-SNE (Fit-SNE) (Linderman et al., 2019) assume that the data follow a particular theoretical distribution and try to learn a low-dimensional embedding that follows a similar distribution; real data, however, may violate this assumption. Some methods, such as Isomap and Uniform Manifold Approximation and Projection (UMAP) (Becht et al., 2018), try to reconstruct the hidden topological structure within the data but fail to do so when the actual structure is complex. Other neural network-based methods, such as IVIS (Szubert et al., 2019), SOM (Kohonen, 1990), and autoencoders, try to learn a suitable nonlinear transformation to project the data into a low-dimensional space. Neural network-based methods have an edge over the others since they mostly do not assume any distribution of the data and are, in this sense, nonparametric.

Interestingly, when it comes to visualizing observations from multi-omics experiments, there is no joint data visualization method that can combine views from several omics layers. Such a joint visualization could help researchers better explain the underlying biology of a system or process. MultiMAP (Jain et al., 2021), a recently developed method for dimensionality reduction and multi-omics integration that generalizes UMAP, enables shared visualization of multi-omics datasets. It creates a nonlinear manifold representing different omics modalities, allowing the identification of shared features across cells in a co-embedded space. However, the shared visualization produced by MultiMAP consists of separate embeddings for each omics modality.

Thus, motivated by the wide benefits of neural networks, in this work we introduce NeuroMDAVIS, a multi-modal data visualization model developed under a deep learning framework to address this challenge. NeuroMDAVIS is capable of extracting crucial features from the data and producing an effective joint visualization. It is a feed-forward deep learning model that does not assume any data distribution to visualize high-dimensional data. Most state-of-the-art methods have been observed to perform well on complex datasets when combined with an initialization step, such as applying Principal Component Analysis (PCA) to reduce the number of dimensions to 50 or fewer (Kobak and Linderman, 2021). NeuroMDAVIS does not require any such initialization, which removes this extra preprocessing step from the pipeline. NeuroMDAVIS is the first of its kind to generate a joint visualization of different omics modalities, providing an integrated view that captures correlations and interactions among modalities directly in the same space. Since our primary aim is multi-omics data visualization, we have limited our experiments in this study to multi-omics datasets, although NeuroMDAVIS can be applied to other domains as well. Additionally, we have explored how NeuroDAVIS (Maitra et al., 2024), a precursor of NeuroMDAVIS, performs on these multi-omics datasets when applied to each omics modality separately.

2. METHODS

The unsupervised deep learning model developed in this work, called NeuroMDAVIS, allows joint visualization of high-dimensional multi-omics datasets. To the best of our knowledge, it is the first model that allows joint visualization of multi-omics data. NeuroMDAVIS is a generalized version of NeuroDAVIS (Maitra et al., 2024), which was introduced earlier to visualize a single data modality; however, NeuroMDAVIS incorporates significant changes to the network architecture. Figure 1A and B describe the overall workflow and the detailed architecture of the proposed model, respectively.

FIG. 1.

(A) A graphical abstract reflecting the overall workflow. (B) The architecture of NeuroMDAVIS, developed for multi-omics data visualization.

2.1. Architecture

The novel NeuroMDAVIS architecture, shown in Figure 1B, consists of four types of layers, viz., an Input layer, a Latent layer, one or more Hidden layer(s), and a Reconstruction layer. The Hidden layers are of two types, viz., Shared hidden layer(s) and Modality-specific hidden layer(s). The Input layer, Latent layer, and Shared hidden layer(s) are densely connected, i.e., each node in any of these layers is connected to every node of its adjacent layer(s). The Modality-specific hidden layer(s), however, are not fully densely connected; they are densely connected only within each modality.

Let $X = \{x_{ij} : x_{ij} \in \mathbb{R}^{d_j} \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, m\}$ be a dataset containing $m$ different omics modalities. Each modality has the same number of samples/observations (say $n$), and each sample has data for all $m$ omics modalities. The $j^{th}$ modality ($j = 1, 2, \dots, m$) is characterized by $d_j$ features, i.e., $x_{ij} = [x_{ij1}, x_{ij2}, \dots, x_{ijd_j}]^T$, $i = 1, 2, \dots, n$; $j = 1, 2, \dots, m$. The Input layer of NeuroMDAVIS has $n$ neurons, while the Latent layer has $k$ ($= 2$ or $3$) neurons, where $k$ is the desired number of dimensions for visualization. Both the Modality-specific hidden layer(s) and the Reconstruction layer have $m$ modality-specific sub-modules, and the $j^{th}$ sub-module of the Reconstruction layer has $d_j$ neurons. The number of neurons in the Hidden layer(s) is decided empirically.

The Input layer of NeuroMDAVIS takes an identity matrix of order $n \times n$ as input. As in NeuroDAVIS, the Input layer creates a random latent embedding of the $n$ samples at the Latent layer. The Latent layer then tries to regress all the data modalities simultaneously through the aforesaid Hidden layers. A Shared hidden layer captures the information common across different modalities for a particular sample/observation, whereas a Modality-specific hidden layer extracts the modality-specific information for that sample/observation. In this work, only one Shared hidden layer and one Modality-specific hidden layer have been used; one can use multiple such layers depending on the problem. When $m = 1$, the architecture resembles that of NeuroDAVIS, with general Hidden layer(s) instead of distinct Shared and Modality-specific hidden layer(s). Finally, the Reconstruction layer reconstructs the data in hand, i.e., the original individual omics modalities.
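To make the layer sizes concrete, the following minimal sketch allocates parameters for an architecture with one Shared hidden layer and one Modality-specific hidden layer ($s = p = 1$), as used in this work. The function name `init_params`, the hidden widths (64 and 32) and the initialization scale are illustrative assumptions, not the authors' settings. Keeping a separate weight matrix per modality is what restricts the Modality-specific layer to dense connections within each modality only.

```python
import numpy as np

def init_params(n, dims, k=2, shared=64, specific=32, seed=0):
    """Allocate weights for a NeuroMDAVIS-style network with s = p = 1.

    n        : number of paired samples (Input layer size)
    dims     : feature counts d_j, one per modality
    k        : Latent layer size (2 or 3)
    shared   : Shared hidden layer width (illustrative value)
    specific : width of each Modality-specific sub-module (illustrative value)
    """
    rng = np.random.default_rng(seed)

    def dense(out_dim, in_dim):
        # Weight matrix (out x in) with small random entries and a zero bias.
        return 0.1 * rng.standard_normal((out_dim, in_dim)), np.zeros(out_dim)

    return {
        "latent": dense(k, n),                                # W_1: Input -> Latent
        "shared": dense(shared, k),                           # W_2: Latent -> Shared hidden
        "specific": [dense(specific, shared) for _ in dims],  # V^(j): Shared -> j-th sub-module
        "recon": [dense(d, specific) for d in dims],          # V^(j): j-th sub-module -> d_j outputs
    }

params = init_params(n=100, dims=[50, 25])
```

Note that the Input-to-Latent matrix grows with the number of samples $n$, which is why the Input layer is tied to a particular dataset.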

2.2. Forward propagation

We consider a dataset $X$ with $m$ omics modalities, each having $n$ paired samples, i.e., $X = \{x_{ij} : x_{ij} \in \mathbb{R}^{d_j} \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, m\}$. NeuroMDAVIS takes an identity matrix $I$ of order $n \times n$ as input. The $i^{th}$ column vector $e_i$ of $I$ is used for regressing the $i^{th}$ paired sample. More precisely, on presentation of the $i^{th}$ sample, $e_i$ propagates through the layers of NeuroMDAVIS and produces an approximate version $[\tilde{x}_{i1}, \tilde{x}_{i2}, \dots, \tilde{x}_{im}]$ of $[x_{i1}, x_{i2}, \dots, x_{im}]$ at the Reconstruction layer. Let $a_{il}$ and $h_{il}$ denote the input to and output from the $l^{th}$ layer, respectively. Let $W_l$ and $b_l$ be the weight matrix between the $(l-1)^{th}$ and $l^{th}$ layers ($l = 1, 2, \dots, s+1$) and the bias term for nodes in the $l^{th}$ layer, respectively, where $s$ is the total number of Shared hidden layers in the network. An $l^{th}$ layer may be the Input layer ($l = 0$), the Latent layer ($l = 1$) or a Shared hidden layer ($l = 2, 3, \dots, s+1$). Thus, for the Input layer, we have

\begin{array}{l} a_{i0} = e_i, \\ h_{i0} = e_i, \quad i = 1, 2, \dots, n \end{array}

(1)

For the Latent layer, we have

\begin{array}{l} a_{i1} = W_1 e_i + b_1, \\ h_{i1} = a_{i1}, \quad i = 1, 2, \dots, n \end{array}

(2)

Here, the weight parameters selected by the input $e_i$ create an independent low-dimensional representation of the $i^{th}$ sample at the Latent layer. Since $e_i$ is one-hot, $W_1 e_i$ is simply the $i^{th}$ column of $W_1$; this ensures that, on presentation of the $i^{th}$ sample, only the links connected to the $i^{th}$ neuron of the Input layer activate neurons in the Latent layer. Then, for the $s$ Shared hidden layer(s), we have

\begin{array}{l} a_{il} = W_l h_{i(l-1)} + b_l, \\ h_{il} = \mathrm{ReLU}(a_{il}), \quad l = 2, 3, \dots, s+1;\ i = 1, 2, \dots, n \end{array}

(3)

where $\mathrm{ReLU}(y) = \max(0, y)$, with max (maximum) applied element-wise. The output from the last Shared hidden layer flows to the end of the network through the Modality-specific hidden layer(s) and reconstructs every modality at the Reconstruction layer. Let the number of Modality-specific hidden layers be $p$. Let $A_{il}^{(j)}$ and $H_{il}^{(j)}$ denote the input to and output from the $j^{th}$ module of the $l^{th}$ layer, respectively. Let $V_l^{(j)}$ and $B_l^{(j)}$ denote the weight matrix between the $j^{th}$ modules of the $(l-1)^{th}$ and $l^{th}$ layers (for $l = s+2$, between the last Shared hidden layer and the $j^{th}$ module), and the bias term for nodes in the $j^{th}$ module of the $l^{th}$ layer, respectively. Here, an $l^{th}$ layer may be any of the Modality-specific hidden layers ($l = s+2, s+3, \dots, s+p+1$) or the Reconstruction layer ($l = s+p+2$). Thus, for the first Modality-specific hidden layer, we have

\begin{array}{l} A_{i(s+2)}^{(j)} = V_{s+2}^{(j)} h_{i(s+1)} + B_{s+2}^{(j)}, \\ H_{i(s+2)}^{(j)} = \mathrm{ReLU}(A_{i(s+2)}^{(j)}), \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m \end{array}

(4)

For the remaining Modality-specific hidden layer(s), we have

\begin{array}{l} A_{il}^{(j)} = V_l^{(j)} H_{i(l-1)}^{(j)} + B_l^{(j)}, \\ H_{il}^{(j)} = \mathrm{ReLU}(A_{il}^{(j)}), \quad l = s+3, s+4, \dots, s+p+1;\ i = 1, 2, \dots, n;\ j = 1, 2, \dots, m \end{array}

(5)

Finally, a lossy reconstruction of the original data is formed at the Reconstruction layer, for which we have

\begin{array}{l} A_{i(s+p+2)}^{(j)} = V_{s+p+2}^{(j)} H_{i(s+p+1)}^{(j)} + B_{s+p+2}^{(j)}, \\ H_{i(s+p+2)}^{(j)} = A_{i(s+p+2)}^{(j)}, \\ \tilde{x}_{ij} = H_{i(s+p+2)}^{(j)}, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m \end{array}

(6)

Thus, NeuroMDAVIS projects the latent embedding of a sample, obtained at the Latent layer, through the Hidden layer(s) into $m$ different spaces, the $j^{th}$ of which is $d_j$-dimensional and corresponds to the $j^{th}$ omics modality, so that the sample is reconstructed at the Reconstruction layer. The vector $H_{i(s+p+2)}^{(j)}$ represents the lossy reconstruction $\tilde{x}_{ij}$ of $x_{ij}$, the $i^{th}$ sample in the $j^{th}$ omics modality.
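A batched sketch of Eqs. (1) to (6) follows, assuming $s = p = 1$ (so Eq. (5) is not needed) and NumPy arrays in which row $i$ holds sample $i$. Feeding the rows of the identity matrix at once is equivalent to presenting each $e_i$ in turn. The function names, toy sizes and random weights are hypothetical and only illustrate the shapes involved; the final check confirms that the latent point of sample $i$ is column $i$ of $W_1$ plus $b_1$.

```python
import numpy as np

def relu(y):
    """Element-wise max(0, y)."""
    return np.maximum(0.0, y)

def forward(W1, b1, W2, b2, spec, recon):
    """Batched forward pass of Eqs. (1)-(4) and (6) with s = p = 1.

    spec[j]  = (V, B) of the j-th Modality-specific sub-module
    recon[j] = (V, B) of the j-th Reconstruction sub-module
    """
    n = W1.shape[1]
    H0 = np.eye(n)                          # Eq. (1): identity input, one row per sample
    H1 = H0 @ W1.T + b1                     # Eq. (2): linear Latent layer (n x k)
    H2 = relu(H1 @ W2.T + b2)               # Eq. (3): Shared hidden layer
    X_tilde = []
    for (V, B), (Vr, Br) in zip(spec, recon):
        Hj = relu(H2 @ V.T + B)             # Eq. (4): j-th Modality-specific module
        X_tilde.append(Hj @ Vr.T + Br)      # Eq. (6): linear reconstruction of x_ij
    return H1, X_tilde

# Toy sizes (illustrative only): 6 cells, two modalities with 5 and 3 features.
rng = np.random.default_rng(1)
n, k, width_shared, width_spec, dims = 6, 2, 8, 4, [5, 3]
W1, b1 = rng.standard_normal((k, n)), rng.standard_normal(k)
W2, b2 = rng.standard_normal((width_shared, k)), np.zeros(width_shared)
spec = [(rng.standard_normal((width_spec, width_shared)), np.zeros(width_spec)) for _ in dims]
recon = [(rng.standard_normal((d, width_spec)), np.zeros(d)) for d in dims]

latent, X_tilde = forward(W1, b1, W2, b2, spec, recon)
# Because e_i is one-hot, the latent point of sample i is column i of W1 plus b1.
assert np.allclose(latent[3], W1[:, 3] + b1)
```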

2.3. Learning

NeuroMDAVIS enables DR and visualization of high-dimensional multi-omics data. Like NeuroDAVIS, NeuroMDAVIS tries to reconstruct the data to be visualized. For the $i^{th}$ sample $x_{ij}$ of the $j^{th}$ modality, NeuroMDAVIS minimizes the reconstruction error $\lVert x_{ij} - \tilde{x}_{ij} \rVert$ to find an optimal reconstruction $\tilde{x}_{ij}$. Since different omics modalities may have different numbers of dimensions, a balancing parameter $\lambda_j$ has been introduced into the objective function to avoid disparity in learning. The value of $\lambda_j$ lies in $(0, 1]$, where a higher value of $\lambda_j$ gives higher weightage to the $j^{th}$ data modality. $\lambda_j$ is useful when knowledge about the data modalities is available a priori; in the absence of such knowledge, one can simply choose $\lambda_j = 1$ $\forall j$. The objective function thus becomes

L_{NeuroMDAVIS} = \frac{1}{n} \sum_{j=1}^{m} \lambda_j \sum_{i=1}^{n} \lVert x_{ij} - \tilde{x}_{ij} \rVert^2

(7)
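Eq. (7) amounts to a few lines of NumPy. In this sketch (the function name `neuromdavis_loss` is hypothetical), each modality is an $n \times d_j$ array and `lam` holds the balancing weights $\lambda_j$; setting every entry to 1 reproduces the default choice described above.

```python
import numpy as np

def neuromdavis_loss(X, X_tilde, lam):
    """Eq. (7): (1/n) * sum_j lambda_j * sum_i ||x_ij - x~_ij||^2.

    X, X_tilde : lists of (n x d_j) arrays, one per modality
    lam        : balancing weights lambda_j in (0, 1], one per modality
    """
    n = X[0].shape[0]
    total = 0.0
    for Xj, Xtj, lj in zip(X, X_tilde, lam):
        total += lj * np.sum((Xj - Xtj) ** 2)   # squared error summed over samples and features
    return total / n

# Two samples, modalities with 3 and 1 features; the second modality is down-weighted.
X = [np.ones((2, 3)), np.zeros((2, 1))]
X_tilde = [np.zeros((2, 3)), np.ones((2, 1))]
loss = neuromdavis_loss(X, X_tilde, lam=[1.0, 0.5])   # (1 * 6 + 0.5 * 2) / 2 = 3.5
```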

To avoid overfitting and to reduce model complexity, L2 regularization on both the node activities and the weights has been applied. The objective function thus becomes

\begin{array}{l} L_{NeuroMDAVIS} = \frac{1}{n} \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_j \lVert x_{ij} - \tilde{x}_{ij} \rVert^2 + \alpha \sum_{i=1}^{n} \left[ \sum_{l=1}^{s+1} \lVert h_{il} \rVert_2 + \sum_{l=s+2}^{s+p+1} \sum_{j=1}^{m} \lVert H_{il}^{(j)} \rVert_2 \right] \\ + \beta \left[ \sum_{l=1}^{s+1} \lVert W_l \rVert_F + \sum_{l=s+2}^{s+p+1} \sum_{j=1}^{m} \lVert V_l^{(j)} \rVert_F \right], \end{array}

(8)

where $\alpha$ and $\beta$ are regularization parameters set empirically.
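The sketch below evaluates Eq. (8) exactly as printed: the activity term sums the per-sample Euclidean norms of the latent and hidden outputs, and the weight term sums Frobenius norms, neither of them squared (many library L2 regularizers penalize squared norms instead). Which arrays go into `activations` and `weights` follows the index ranges of Eq. (8), so the Reconstruction-layer weights are not penalized. The function name and argument layout are assumptions for illustration.

```python
import numpy as np

def regularized_loss(X, X_tilde, lam, activations, weights, alpha, beta):
    """Eq. (8) as printed.

    activations : list of (n x width) outputs h_il (l = 1..s+1) and H_il^(j)
                  (l = s+2..s+p+1); each row is one sample
    weights     : list of matrices W_l (l = 1..s+1) and V_l^(j) (l = s+2..s+p+1)
    """
    n = X[0].shape[0]
    recon = sum(lj * np.sum((Xj - Xtj) ** 2) for Xj, Xtj, lj in zip(X, X_tilde, lam)) / n
    act = sum(np.linalg.norm(A, axis=1).sum() for A in activations)  # per-sample L2 norms
    wts = sum(np.linalg.norm(W, "fro") for W in weights)             # Frobenius norms
    return recon + alpha * act + beta * wts

# Perfect reconstruction, one activation row of norm 5, one weight matrix of norm 5.
value = regularized_loss(
    X=[np.zeros((1, 2))], X_tilde=[np.zeros((1, 2))], lam=[1.0],
    activations=[np.array([[3.0, 4.0]])],
    weights=[np.array([[3.0, 0.0], [0.0, 4.0]])],
    alpha=0.1, beta=0.01)   # 0.1 * 5 + 0.01 * 5 = 0.55
```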

We have used the Adam optimizer to train NeuroMDAVIS. The number of epochs needed for convergence has been set empirically by monitoring the training loss. Section 3.4 further explains how the other hyperparameters have been tuned. On convergence, NeuroMDAVIS has learned a parametric function that reconstructs the omics modalities from the low-dimensional latent embedding. This latent embedding is then extracted to produce a joint visualization of the multi-omics data.
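For readers unfamiliar with the optimizer named above, the sketch below writes out the standard Adam update by hand and runs it on a toy quadratic standing in for Eq. (8), recording the loss at every epoch in the spirit of monitoring convergence. The learning rate, moment coefficients, epoch count and the helper name `adam_step` are assumptions; the paper's own settings are not reproduced here.

```python
import numpy as np

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter after one Adam step; `state` is updated in place."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # first-moment estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias corrections
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy objective f(w) = ||w - target||^2 standing in for the training loss.
target = np.array([1.0, -2.0])
w = np.zeros(2)
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
losses = []
for epoch in range(500):
    losses.append(float(np.sum((w - target) ** 2)))   # monitor the training loss
    w = adam_step(w, 2 * (w - target), state, lr=0.1)
```

Plotting `losses` against the epoch index and stopping once the curve flattens is one simple way to choose the number of epochs empirically.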

2.4. Projection of new observations

The first two layers of NeuroMDAVIS, viz., the Input layer and the Latent layer, are used to learn suitable regressors. In other words, they control the positions of the points in the low-dimensional embedding in each iteration, while the remaining part of the network optimizes a function that projects the low-dimensional data into the high-dimensional space. Once NeuroMDAVIS is trained on a training dataset, the data can be visualized at the Latent layer. The weights from the Latent layer through the Reconstruction layer are learned such that they can project any low-dimensional point to the high-dimensional space.

To visualize new observations (not present during training but having a similar distribution), the Input layer of NeuroMDAVIS has to be presented with an identity matrix whose order equals the number of observations in this unseen data (test dataset). The test dataset must contain the same number of omics modalities as the training data. Thus, the Input layer dimension equals the size of the test dataset. The weights between the Input layer and the Latent layer are re-initialized, and only the sub-network comprising these two layers is re-trained, so that the weights connecting them are updated while all other weights are kept frozen at their already learned values. These weights are then used to visualize the new samples.

Let there be $n_{1}$ observations in the test data. In order to visualize the test data, an identity matrix of size $n_{1} \times n_{1}$ is fed to the Input layer of the new network. Similar to the training phase, a loss is calculated at the Reconstruction layer. However, unlike training, during visualization of new samples, only the weights connecting the Input layer to the Latent layer are updated via standard back-propagation. Upon convergence, the final latent embedding for these unseen samples can be extracted from the Latent layer.
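The following is a simplified sketch of this projection step, not the authors' implementation. Because the input is an identity matrix, the latent point of new sample $i$ is just column $i$ of the re-initialized Input-to-Latent weights, so re-training those weights amounts to directly optimizing the $n_1$ new coordinates against a frozen decoder. Simplifying assumptions, all hypothetical: a single Shared hidden layer, no Modality-specific hidden layer, linear reconstruction heads, a zero Latent bias, and plain gradient descent with hand-written gradients of Eq. (7) in place of Adam.

```python
import numpy as np

def relu(y):
    return np.maximum(0.0, y)

def recon_loss(Z, X, W2, b2, recon, lam):
    """Eq. (7) for a latent embedding Z (n1 x k) passed through the frozen decoder."""
    H = relu(Z @ W2.T + b2)
    return sum(lj * np.sum((Xj - (H @ Vj.T + Bj)) ** 2)
               for Xj, (Vj, Bj), lj in zip(X, recon, lam)) / Z.shape[0]

def project_new(X_new, W2, b2, recon, lam, k=2, lr=0.1, epochs=200, seed=0):
    """Embed n1 unseen samples by re-training only W1; W2, b2 and recon stay frozen."""
    n1 = X_new[0].shape[0]
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((k, n1))    # re-initialized Input -> Latent weights
    for _ in range(epochs):
        Z = W1.T                                # identity input: latent point i = column i of W1
        A = Z @ W2.T + b2
        H = relu(A)
        dH = np.zeros_like(H)
        for Xj, (Vj, Bj), lj in zip(X_new, recon, lam):
            residual = Xj - (H @ Vj.T + Bj)
            dH += (-2.0 * lj / n1) * residual @ Vj   # dLoss/dH through the j-th head
        dZ = (dH * (A > 0)) @ W2                # back through ReLU and W2
        W1 -= lr * dZ.T                         # only W1 is updated
    return W1.T.copy()

# Hypothetical frozen decoder and unseen data: 20 cells, modalities with 5 and 3 features.
rng = np.random.default_rng(42)
n1, k, width, dims = 20, 2, 16, [5, 3]
W2, b2 = 0.3 * rng.standard_normal((width, k)), np.zeros(width)
recon = [(0.3 * rng.standard_normal((d, width)), np.zeros(d)) for d in dims]
X_new = [rng.standard_normal((n1, d)) for d in dims]
lam = [1.0, 1.0]
Z_init = project_new(X_new, W2, b2, recon, lam, epochs=0)   # embedding before re-training
Z_new = project_new(X_new, W2, b2, recon, lam)
```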

3. RESULTS

NeuroMDAVIS has been shown to outperform state-of-the-art visualization methods such as t-SNE, UMAP, IVIS, and Fit-SNE on multi-omics datasets (Table 1), including CITE-seq (Section 3.1.1) and multiome data (Section 3.1.2). It has also been compared against the shared embedding produced by MultiMAP (Section 3.2). Table 2 compares the basic features of NeuroMDAVIS with those of these existing methods. Finally, the sensitivity analysis of hyperparameters is discussed in Section 3.4.

Table 1.
Summary of Datasets Used in This Work

Dataset	Technology	Description	#Cells	#RNAs	#ADTs (CITE-seq) / #peaks (Multiome)	Batches present	Source
bmcite30k	CITE-seq	scRNA-seq profiles measured alongside a panel of antibodies from bone marrow	30672	17009	25	Yes	Stuart et al., 2019
kotliarov50k	CITE-seq	CITE-seq profiling of 82 surface proteins and transcriptomes of 53,201 single-cells from healthy high and low influenza-vaccination responders	58654	32738	87	Yes	Kotliarov et al., 2020; Lotfollahi et al., 2022
pbmc10k	Multiome	single-cell multiome ATAC and gene expression data from cryopreserved human peripheral blood mononuclear cells of a healthy female donor	11909	36601	108377	No	Bredikhin et al., 2022

ATAC, Assay for Transposase-Accessible Chromatin; scRNA-seq, single-cell RNA-sequencing.

Table 2.

Comparison of Basic Features of the Visualization Methods Used in This Study

Method	Type	Supports multi-modal visualization (Yes/No)	Makes assumption about data distribution (Yes/No)
NeuroMDAVIS	Neural network-based	Yes (joint visualization)	No
NeuroDAVIS	Neural network-based	No	No
t-SNE	Distribution-based	No	Yes
UMAP	Manifold-based	No	No
Fit-SNE	Distribution-based	No	Yes
IVIS	Neural network-based	No	No
MultiMAP	Manifold-based	Yes (shared visualization)	No

Fit-SNE, Fast interpolation-based t-SNE; SNE, Stochastic Neighbor Embedding; t-SNE, t-distributed SNE; UMAP, Uniform Manifold Approximation and Projection.

3.1. Visualizing multi-omics data using NeuroMDAVIS

To evaluate the performance of NeuroMDAVIS for multi-omics data visualization, we have used three multi-omics datasets: two CITE-seq datasets (joint profiling of RNA and surface protein measurements) and one multiome dataset (paired RNA-seq and ATAC-seq). Since the existing visualization methods used for comparison do not support multi-modal visualization, for each dataset we have concatenated the omics modalities available for the set of paired cells and used the result as input to these methods. The following subsections describe the experiments carried out on these datasets and their results.
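For the baseline methods, the concatenation step can be sketched as below. `concatenate_modalities` is a hypothetical helper; any scaling or normalization of the individual modalities before concatenation is outside this sketch. The shape check encodes what "paired" means here: every modality must describe the same cells in the same row order.

```python
import numpy as np

def concatenate_modalities(modalities):
    """Column-wise concatenation of paired modalities, each of shape (cells x features)."""
    n = modalities[0].shape[0]
    if any(M.shape[0] != n for M in modalities):
        raise ValueError("modalities must have the same number of (paired) cells")
    return np.hstack(modalities)

# Toy example: 4 cells with 10 RNA features and 3 protein (ADT) features.
rng = np.random.default_rng(0)
rna, adt = rng.random((4, 10)), rng.random((4, 3))
joint_input = concatenate_modalities([rna, adt])   # shape (4, 13), fed to t-SNE, UMAP, etc.
```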

3.1.1. Visualizing CITE-seq data

First, we have considered two CITE-seq datasets, viz., bmcite30k and kotliarov50k. These datasets have been downloaded and preprocessed following (Maitra et al., 2023). CITE-seq allows concurrent measurement of mRNA and cell surface proteins. To trace cell types, most existing visualization methods either use separate projections for antibody-derived tags (ADTs) and RNA molecules or project the protein expressions onto the transcriptomic landscape. NeuroMDAVIS, in contrast, supports joint visualization of multiple modalities. As shown in Figure 2, for both datasets, NeuroMDAVIS has kept distinct cell types in distinct clusters, and only IVIS has produced qualitatively comparable clustering. t-SNE, UMAP, and Fit-SNE, however, have been unable to keep the clusters well separated. We initially suspected that this might be due to the absence of a prior initialization step, such as PCA, which sometimes serves as a good initialization for state-of-the-art methods like t-SNE and UMAP (Chari and Pachter, 2023). Hence, we have carried out further experiments using PCA as a prior initialization step for t-SNE, UMAP, and Fit-SNE, as shown in Figure 2. Although the results improved considerably, NeuroMDAVIS and IVIS have remained the best-performing models in terms of visualization.
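The PCA initialization used for the PCA + t-SNE, PCA + UMAP and PCA + Fit-SNE baselines can be sketched with a thin SVD of the centered, concatenated matrix. The cap of 50 components follows the convention cited in the Introduction; the helper name `pca_init` is hypothetical. For matrices the size of bmcite30k, a truncated or randomized SVD would be the practical choice, since a full SVD is costly.

```python
import numpy as np

def pca_init(X, n_components=50):
    """Scores of X (cells x features) on its top principal components.

    Centers each feature, takes a thin SVD and returns U * S for the first
    n_components components (capped by the number of singular values).
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    r = min(n_components, S.size)
    return U[:, :r] * S[:r]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 80))   # stand-in for a concatenated RNA + ADT matrix
Y = pca_init(X)                      # 100 x 50 input for t-SNE, UMAP or Fit-SNE
```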

FIG. 2.

2-dimensional embeddings produced by NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS for bmcite30k (first row) and kotliarov50k (second row) datasets. Fit-SNE, Fast interpolation-based t-SNE; PCA, Principal Component Analysis; SNE, Stochastic Neighbor Embedding; t-SNE, t-distributed SNE; UMAP, Uniform Manifold Approximation and Projection.

Thereafter, we have used k-NN and Random Forest classifiers to identify cell types based on the NeuroMDAVIS-generated projection, and compared the results with those obtained on projections generated by other existing methods. Training and test datasets have been prepared in an 80:20 ratio. To check the consistency of results, the number of neighbors k in k-NN has been varied between 5 and 45, and the number of estimators n in Random Forest between 20 and 100. As can be seen in Figure 3A–D, NeuroMDAVIS has outperformed all other methods with respect to accuracy, precision, recall, and F1-score for both datasets, consistently over all values of k (for k-NN) and n (for Random Forest), on the 20% held-out test dataset. Prior application of PCA has improved the performance of t-SNE, UMAP, and Fit-SNE, but they have still fallen short of the accuracy of NeuroMDAVIS.
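The classification protocol can be sketched in plain NumPy: an 80:20 random split, a brute-force k-NN majority vote, and accuracy with precision, recall and F1-score. The text does not state how the per-class scores are averaged, so macro averaging is an assumption here, as are the helper names. The Random Forest arm could be run by swapping `knn_predict` for scikit-learn's `RandomForestClassifier(n_estimators=...)`.

```python
import numpy as np

def split_80_20(n, seed=0):
    """Random 80:20 train/test split of n cell indices."""
    perm = np.random.default_rng(seed).permutation(n)
    cut = int(0.8 * n)
    return perm[:cut], perm[cut:]

def knn_predict(Z_train, y_train, Z_test, k=5):
    """Majority vote among the k nearest training cells (Euclidean distance).

    Builds the full test x train distance matrix, so it suits small examples only.
    Labels must be non-negative integers; ties go to the smallest label.
    """
    d = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    precision, recall, f1 = [], [], []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    accuracy = float(np.mean(y_true == y_pred))
    return accuracy, float(np.mean(precision)), float(np.mean(recall)), float(np.mean(f1))

# Toy 2-D "embedding": two well-separated groups of cells.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])
y = np.repeat([0, 1], 100)
train, test = split_80_20(len(y))
y_hat = knn_predict(Z[train], y[train], Z[test], k=5)
acc, prec, rec, f1 = macro_scores(y[test], y_hat)
```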

FIG. 3.

(A) and (B) show classification performance on low-dimensional embeddings of bmcite30k dataset generated using NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS, in terms of accuracy, precision, recall, and F1-score using k-NN and Random Forest classifiers, respectively. (C) and (D) show the same on low-dimensional embeddings of kotliarov50k dataset.

Further, k-means clustering on the embedding generated by NeuroMDAVIS has shown performance highly competitive with, if not better than, that of the other existing methods in terms of Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Fowlkes–Mallows Index (FMI) scores. As depicted in Figure 4A and B, for the bmcite30k dataset, NeuroMDAVIS has outperformed all the other methods, while for the kotliarov50k dataset, NeuroMDAVIS has surpassed t-SNE, UMAP, and Fit-SNE by a large margin. To assess robustness, results from 100 distinct runs using different train-test partitions have been reported.
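The three cluster validity indices can be computed from the contingency table between the true cell-type labels and the k-means cluster labels, as in the sketch below (the k-means step itself is omitted, and the function names are hypothetical). NMI is normalized by the arithmetic mean of the two entropies, which is the scikit-learn default; the normalization used in the paper is not stated here, so this is an assumption.

```python
import numpy as np
from math import comb

def contingency(labels_a, labels_b):
    """Counts of cells carrying label u in labels_a and label v in labels_b."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    C = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(C, (a, b), 1)
    return C

def n_pairs(counts):
    """Sum of C(c, 2) over an array of counts."""
    return sum(comb(int(c), 2) for c in np.ravel(counts))

def ari_nmi_fmi(labels_true, labels_pred):
    C = contingency(labels_true, labels_pred)
    n = int(C.sum())
    index = n_pairs(C)                    # pairs grouped together in both labelings
    a = n_pairs(C.sum(axis=1))            # pairs together in labels_true
    b = n_pairs(C.sum(axis=0))            # pairs together in labels_pred
    expected = a * b / comb(n, 2)
    ari = (index - expected) / ((a + b) / 2 - expected)
    fmi = index / np.sqrt(a * b)
    P = C / n                             # joint distribution of the two labelings
    pu, pv = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pu, pv)[nz]))
    h_u = -np.sum(pu * np.log(pu))
    h_v = -np.sum(pv * np.log(pv))
    nmi = mi / ((h_u + h_v) / 2)          # arithmetic-mean normalization
    return float(ari), float(nmi), float(fmi)

ari, nmi, fmi = ari_nmi_fmi([0, 0, 1, 1], [0, 0, 1, 2])   # ARI = 4/7, NMI = 0.8, FMI = 1/sqrt(2)
```

All three indices equal 1 when the clustering matches the cell types up to a renaming of cluster labels.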

FIG. 4.

Clustering performance on low-dimensional embeddings generated using NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS, with respect to ARI, NMI, and FMI scores of (A) bmcite30k, and (B) kotliarov50k datasets, using k-means clustering algorithm. ARI, Adjusted Rand Index; FMI, Fowlkes–Mallows Index; NMI, Normalized Mutual Information.

For further validation, we have performed a clinical classification on the kotliarov50k embedding generated by NeuroMDAVIS. The kotliarov50k dataset contains single-cell CITE-seq data gathered from high and low influenza-vaccine responders. We have explored whether the NeuroMDAVIS-generated projection is capable of classifying cells into low and high responder categories. For each unique cell cluster, we have performed an 80:20 train-test split, and a k-NN classifier with k = 45 has been used for this binary classification. Figure 5 (right) demonstrates that the Area-Under-the-Curve (AUC) score obtained using NeuroMDAVIS on each cell type has been highly competitive with those obtained using the other state-of-the-art methods, viz., t-SNE, UMAP, IVIS, and Fit-SNE. It may be noted that none of the methods could achieve an AUC beyond 58%, which suggests that the cell type labels obtained for this dataset may not be optimal [Figure 5 (left)].
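Per-cell-type responder classification can be sketched as follows: within each cell type, an 80:20 split, a k-NN score equal to the fraction of high-responder neighbors (k = 45 as stated in the text), and the AUC computed as the normalized Mann-Whitney statistic. Encoding high responders as 1, the helper names and the random split are assumptions; a cell type whose test split contains only one class yields an undefined AUC (NaN).

```python
import numpy as np

def knn_scores(Z_train, y_train, Z_test, k=45):
    """Fraction of the k nearest training cells labeled as high responders (1)."""
    d = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def auc_per_cell_type(Z, responder, cell_type, k=45, seed=0):
    """80:20 split within each cell type, k-NN scores on the test part, then AUC."""
    rng = np.random.default_rng(seed)
    aucs = {}
    for ct in np.unique(cell_type):
        idx = rng.permutation(np.flatnonzero(cell_type == ct))
        cut = int(0.8 * idx.size)
        train, test = idx[:cut], idx[cut:]
        s = knn_scores(Z[train], responder[train], Z[test], k=min(k, train.size))
        aucs[ct] = roc_auc(responder[test], s)
    return aucs

# Toy data only: two "cell types", responder status tied to the first coordinate.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 2))
cell_type = np.repeat([0, 1], 100)
responder = (Z[:, 0] > 0).astype(int)
aucs = auc_per_cell_type(Z, responder, cell_type)
```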

FIG. 5.

NeuroMDAVIS-produced sub-embeddings (based on cell types), colored by high/low influenza vaccine responder (left). Area-Under-the-Curve (AUC)-score for high/low influenza vaccine responder classification obtained using k-NN classifier (with $k = 50$ ) on the NeuroMDAVIS, t-SNE, UMAP, IVIS, and Fit-SNE projections of kotliarov50k dataset (right).

3.1.2. Visualizing multiome data

ATAC-seq is a commonly used method for assessing chromatin accessibility across the genome. By sequencing open chromatin regions, ATAC-seq can reveal how chromatin packaging and other factors affect gene expression.

In this work, a multiome dataset, pbmc10k, containing paired RNA-seq and ATAC-seq data, has further been used to demonstrate the effectiveness of NeuroMDAVIS for multi-omics data visualization. The pbmc10k dataset has first been preprocessed using MUON (Bredikhin et al., 2022), reduced to highly variable features only, and then further processed to match cells in both the RNA and ATAC modalities, as done in (Maitra et al., 2023). NeuroMDAVIS has then been applied to the multi-omics data to generate a joint embedding of the paired assays. Compared with t-SNE, UMAP, and Fit-SNE, NeuroMDAVIS has produced a better visualization of the pbmc10k data, mapping each cell type to a distinct cluster (Fig. 6). Prior application of PCA has improved the visualization for t-SNE, UMAP, and Fit-SNE, but some overlap among cell types remains. Only the embedding produced by IVIS is comparable to that generated by NeuroMDAVIS (Fig. 6).
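The cell-matching step can be sketched as an intersection of cell barcodes followed by reordering both matrices. The feature-selection helper below ranks features by plain variance; it is only a crude stand-in for the highly-variable-feature selection performed with MUON, and both helper names are hypothetical.

```python
import numpy as np

def match_cells(barcodes_rna, X_rna, barcodes_atac, X_atac):
    """Keep only cells present in both modalities, in a common (sorted) order."""
    common = np.intersect1d(barcodes_rna, barcodes_atac)   # sorted unique shared barcodes
    pos_rna = {b: i for i, b in enumerate(barcodes_rna)}
    pos_atac = {b: i for i, b in enumerate(barcodes_atac)}
    rows_rna = [pos_rna[b] for b in common]
    rows_atac = [pos_atac[b] for b in common]
    return common, X_rna[rows_rna], X_atac[rows_atac]

def top_variable(X, n_top):
    """Indices of the n_top features (columns) with the largest variance."""
    return np.argsort(X.var(axis=0))[::-1][:n_top]

# Toy example: cell c2 lacks ATAC data and c4 lacks RNA data, so both are dropped.
cells, X_rna_m, X_atac_m = match_cells(
    np.array(["c1", "c2", "c3"]), np.arange(6).reshape(3, 2),
    np.array(["c3", "c1", "c4"]), np.array([[30.0], [10.0], [40.0]]))
```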

FIG. 6.

2-dimensional embeddings produced by NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS for pbmc10k dataset.

Additionally, to evaluate the quality of the embedding quantitatively, we have performed further downstream analyses, i.e., classification and clustering of cell types on the NeuroMDAVIS-generated embedding. The same 80:20 train-test split as for the CITE-seq data has been used for classification. Classification results on the held-out test dataset, shown in Figure 7A and B, demonstrate that NeuroMDAVIS is the best-performing model for DR and visualization across all measures, viz., accuracy, precision, recall, and F1-score. Similarly, k-means clustering on the NeuroMDAVIS projection has produced the highest ARI, FMI, and NMI scores compared with all the other methods, viz., t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS, as shown in Figure 7C.

FIG. 7.

(A) and (B) show classification performance on projections of pbmc10k dataset generated by NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS in terms of accuracy, precision, recall, and F1-score using k-NN and Random Forest classifiers. (C) k-means clustering performance on embeddings of pbmc10k dataset produced by NeuroMDAVIS, t-SNE, PCA + t-SNE, UMAP, PCA + UMAP, Fit-SNE, PCA + Fit-SNE, and IVIS, in terms of ARI, NMI, and FMI scores.

3.2. Comparison against separate and shared visualization

To demonstrate the effectiveness of a joint embedding of multi-omics data, we have compared the joint embedding generated by NeuroMDAVIS with the separate, per-modality embeddings produced by NeuroDAVIS. We have further compared NeuroMDAVIS against MultiMAP, another state-of-the-art method for dimensionality reduction and multi-omics integration. MultiMAP, however, produces a shared visualization of multiple omics modalities: it projects different omics layers onto separate latent spaces while taking into account the combined effect of all modalities. In contrast, a joint visualization, as produced by NeuroMDAVIS, refers to a single, unified latent embedding obtained by integrating multiple omics modalities.

Figure 8 illustrates the joint embedding generated by NeuroMDAVIS, the separate embeddings produced by NeuroDAVIS (a precursor of NeuroMDAVIS), and the shared embeddings produced by MultiMAP on the bmcite30k (CITE-seq) and pbmc10k (multiome) datasets. As discussed earlier, MultiMAP generates a separate embedding for each modality (RNA and protein for bmcite30k; RNA and ATAC for pbmc10k). As observed in Figure 8, the joint embedding generated by NeuroMDAVIS shows better separation between cell clusters than both the MultiMAP-generated and the NeuroDAVIS-generated modality-specific embeddings. In addition, the three methods have been compared on classification and clustering tasks. Classification performance on the NeuroMDAVIS embedding and on the modality-specific embeddings generated by NeuroDAVIS and MultiMAP has been assessed using accuracy, precision, recall, and F1-score, while clustering performance on the same embeddings has been evaluated using ARI, NMI, and FMI. Figure 9 shows that the NeuroMDAVIS-generated embeddings have outperformed their counterparts generated by NeuroDAVIS and MultiMAP in all downstream tasks for both datasets.

FIG. 8.

2-dimensional joint embedding produced by NeuroMDAVIS, separate embeddings produced by NeuroDAVIS, and shared embeddings produced by MultiMAP for the bmcite30k (first row) and pbmc10k (second row) datasets.

FIG. 9.

Classification and clustering performance on projections generated by NeuroMDAVIS, NeuroDAVIS, and MultiMAP, in terms of accuracy, precision, recall and F1-score, and ARI, NMI, and FMI using k-NN classifiers and k-means clustering, respectively, of (A) bmcite30k and (B) pbmc10k datasets.

3.3. Projection of new observations

NeuroMDAVIS can be used as a pretrained model to visualize data that were not present during training. To evaluate this capability, we have used both the CITE-seq and the multiome datasets. Each dataset has been split 60:40 into training and test sets; NeuroMDAVIS has been trained on the 60% training data, and the held-out 40% test data have been used to visualize new observations. As shown in Figure 10A, for the bmcite30k dataset, the embeddings of the training samples and of the test samples are very similar. In both embeddings, the CD8, CD4, CD14, NK, and B cell types are clearly visible. Similar results have been observed on the kotliarov50k dataset (Fig. 10B): all the cell types are identifiable in both the training and test embeddings, and their distributions are quite close to each other.
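A 60:40 split can be drawn per cell type so that rare cell types appear in both parts. Stratification is an assumption made for this sketch (the text specifies only the 60:40 ratio), and `stratified_split` and the toy labels are hypothetical.

```python
import numpy as np

def stratified_split(labels, train_frac=0.6, seed=0):
    """Return (train_idx, test_idx), splitting each class train_frac : 1 - train_frac."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * len(idx)))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.sort(np.array(train)), np.sort(np.array(test))

# Toy labels with one small class, mimicking a rare cell type.
labels = np.array(["T"] * 100 + ["NK"] * 20 + ["mono"] * 5)
train_idx, test_idx = stratified_split(labels, train_frac=0.6, seed=0)
for c in np.unique(labels):
    # Per-class counts in the training and held-out parts.
    print(c, np.sum(labels[train_idx] == c), np.sum(labels[test_idx] == c))
```

Even with stratification, a class with only a handful of cells contributes very few test samples, so its test-set metrics remain noisy.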

FIG. 10.

(A), (B), and (C) show 2D embeddings produced by NeuroMDAVIS on both the training and test datasets for bmcite30k, kotliarov50k, and pbmc10k, respectively.

For the multiome dataset, as depicted in Figure 10C, the major cell types, viz., T, NK, and B cells, are clearly identifiable in both the training and test embeddings, and all subtypes of T cells are well separated. Thus, even when the entire dataset is not available at training time, NeuroMDAVIS supports the visualization of new observations added to an existing embedding at runtime.

Finally, classification and clustering performance have been compared between the training and test embeddings produced by NeuroMDAVIS. In terms of accuracy, precision, recall, and F1-score, classification performance on the training and test embeddings is comparable for both the multiome dataset (pbmc10k) and the CITE-seq datasets (bmcite30k and kotliarov50k), as shown in Figure 11. Similar results have been observed for clustering in terms of ARI, NMI, and FMI. The drop in test performance for the CITE-seq datasets can be attributed to the small sample sizes of some clusters; for example, the ‘Classical monocytes and mDC’ and ‘Non-classical monocytes’ classes could not be identified properly in the test embedding because of their small sample sizes, as shown in Figure 10B.

FIG. 11.

Classification and clustering performance on projections generated by NeuroMDAVIS over the train and test parts of bmcite30k, kotliarov50k, and pbmc10k datasets, in terms of accuracy, precision, recall, and F1-score, and ARI, NMI, and FMI using k-NN classifiers and k-means clustering, respectively.

3.4. Sensitivity analysis

NeuroMDAVIS, being a neural network model, involves several hyperparameters. To evaluate its robustness, we have performed a sensitivity analysis in which key hyperparameters have been systematically varied around their original values. For each hyperparameter value, a latent embedding has been generated for each multi-omics dataset, and its quality has been assessed quantitatively through classification and clustering. Classification performance has been measured in terms of accuracy, precision, recall, and F1-score to assess how well the classes can be discriminated, while clustering performance has been assessed using ARI, NMI, and FMI to evaluate how well the intrinsic data structure is preserved in the learned latent space.

Figure 12 indicates that the performance of NeuroMDAVIS is robust to variations in both batch size and number of hidden layer neurons. Across all three datasets (bmcite30k, kotliarov50k, and pbmc10k), the classification and clustering metrics remain remarkably stable. For classification, the k-NN and Random Forest classifiers have consistently achieved high accuracy, precision, recall, and F1-score, with minimal variation among the different embeddings. Similarly, k-means clustering performance, as measured by ARI, FMI, and NMI, has shown little to no change across the embeddings. This consistency suggests that the model's ability to generate high-quality latent representations is not significantly affected by changes in batch size or in the number of neurons in the hidden layers, demonstrating its robustness and stability.
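One way to make the notion of minimal variation concrete is to summarize each metric's spread across the six hyperparameter settings. The sketch below is hypothetical: the `max_range` threshold is an arbitrary illustrative choice, and the toy values are not results from this study.

```python
import numpy as np

def stability_summary(scores):
    """scores maps a metric name to its values over the hyperparameter settings."""
    summary = {}
    for name, values in scores.items():
        v = np.asarray(values, dtype=float)
        summary[name] = {
            "mean": float(v.mean()),
            "std": float(v.std(ddof=1)),        # sample standard deviation
            "range": float(v.max() - v.min()),  # worst-case spread across settings
        }
    return summary

def is_stable(summary, max_range=0.05):
    """Hypothetical criterion: every metric varies by at most max_range."""
    return all(s["range"] <= max_range for s in summary.values())

toy = {"ARI": [0.80, 0.81, 0.80, 0.79, 0.80, 0.81],  # toy values, not results
       "F1": [0.90, 0.90, 0.91, 0.90, 0.89, 0.90]}
summary = stability_summary(toy)
print(summary["ARI"], is_stable(summary))
```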

FIG. 12.

Sensitivity analysis for the hyperparameters (A) ‘batch size’ and (B) ‘hidden layer neurons’ on the bmcite30k, kotliarov50k, and pbmc10k datasets. (A) The batch sizes for the six embeddings have been set to 80, 96, 112, 128, 144, and 160 for bmcite30k and kotliarov50k, and to 32, 48, 64, 80, 96, and 112 for pbmc10k. (B) The hidden layer configurations for the six embeddings have been set to [32, [100, 64]], [32, [128, 64]], [16, [100, 64]], [16, [128, 64]], [64, [100, 64]], and [64, [128, 64]] for bmcite30k and kotliarov50k, and to [32, [100, 100]], [32, [128, 128]], [16, [100, 100]], [16, [128, 128]], [48, [100, 100]], and [48, [128, 128]] for pbmc10k. Here, $[a, [b_{1}, b_{2}]]$ denotes the structure of the hidden layers, where $a$ is the number of nodes in the shared hidden layer, and $b_{1}$ and $b_{2}$ are the numbers of nodes in the two modality-specific hidden layer modules, respectively.
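The six configurations listed in the caption follow a simple grid: batch sizes increase in steps of 16, and the hidden-layer settings are the Cartesian product of three shared-layer sizes with two modality-specific settings. The sketch below regenerates them; `sweep` and the `evaluate` callable it receives are hypothetical stand-ins for training NeuroMDAVIS and scoring the resulting embedding.

```python
from itertools import product

def hidden_layer_grid(shared_sizes, modality_sizes):
    """Enumerate [a, [b1, b2]] configurations in the order listed in the caption."""
    return [[a, list(b)] for a, b in product(shared_sizes, modality_sizes)]

def sweep(values, evaluate):
    """Vary one hyperparameter at a time; `evaluate` is a user-supplied callable."""
    return {str(v): evaluate(v) for v in values}

batch_cite = list(range(80, 161, 16))  # bmcite30k and kotliarov50k
batch_pbmc = list(range(32, 113, 16))  # pbmc10k
layers_cite = hidden_layer_grid([32, 16, 64], [(100, 64), (128, 64)])
layers_pbmc = hidden_layer_grid([32, 16, 48], [(100, 100), (128, 128)])
print(batch_cite, batch_pbmc)
print(layers_cite)
print(layers_pbmc)
```

Varying batch size and layer sizes separately (one factor at a time) keeps each panel of the figure interpretable, at the cost of not probing interactions between the two.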

4. DISCUSSION

In this work, we have developed a neural network model, called NeuroMDAVIS, for multi-omics data visualization. NeuroMDAVIS provides a joint visualization that combines all the omics modalities and is, to our knowledge, the first of its kind. The model is a generalization of NeuroDAVIS (Maitra et al., 2024), which was recently developed by the authors for visualizing a single data modality. NeuroMDAVIS represents multiple omics modalities in a single joint embedding, giving a unified perspective that depicts correlations and interactions among the modalities in the same space. The performance of NeuroMDAVIS has been demonstrated on CITE-seq and multiome datasets, and the results have been compared with several state-of-the-art methods, such as t-SNE, UMAP, IVIS, Fit-SNE, and MultiMAP. Both NeuroDAVIS and NeuroMDAVIS are feed-forward neural network architectures that produce a latent 2-dimensional embedding from high-dimensional single-omics and multi-omics data, respectively. This latent embedding captures significant data characteristics that can be useful for various downstream tasks, including classification and clustering.
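To make the architecture description concrete, here is a forward-pass sketch of a NeuroMDAVIS-like network in NumPy. The layer layout is an assumption inferred from the $[a, [b_{1}, b_{2}]]$ notation of Fig. 12: identity-matrix input, a 2-unit latent layer, a shared hidden layer of $a$ nodes, one hidden layer per modality, and a linear reconstruction per modality. Activations, initialization, and the summed mean-squared-error objective are illustrative choices, and training is omitted. The sketch also shows that feeding rows of the identity matrix is equivalent to looking up rows of the first weight matrix.

```python
import numpy as np

def build(n_cells, dims, a=32, b=(100, 64), latent=2, seed=0):
    """Hypothetical layer shapes; dims = features per modality, e.g. (n_genes, n_proteins)."""
    rng = np.random.default_rng(seed)
    init = lambda shape: rng.normal(scale=0.1, size=shape)
    return {
        "W_lat": init((n_cells, latent)),                   # one row of 2-D coordinates per cell
        "W_sh": init((latent, a)),                          # shared hidden layer
        "W_mod": [init((a, bm)) for bm in b],               # modality-specific hidden layers
        "W_out": [init((bm, d)) for bm, d in zip(b, dims)], # per-modality reconstructions
    }

def forward(params, batch_idx):
    relu = lambda x: np.maximum(x, 0.0)
    # A one-hot row of the identity matrix times W_lat is simply a row lookup.
    z = params["W_lat"][batch_idx]          # (batch, 2) joint embedding
    h = relu(z @ params["W_sh"])            # (batch, a)
    recons = [relu(h @ Wm) @ Wo for Wm, Wo in zip(params["W_mod"], params["W_out"])]
    return z, recons

def loss(recons, targets):
    # Assumed objective: sum of per-modality mean squared reconstruction errors.
    return float(sum(np.mean((r - t) ** 2) for r, t in zip(recons, targets)))

n_cells = 50
X_rna = np.random.default_rng(1).random((n_cells, 200))  # toy stand-ins for two modalities
X_prot = np.random.default_rng(2).random((n_cells, 20))
params = build(n_cells, dims=(200, 20), a=32, b=(100, 64))
batch = np.arange(8)
z, recons = forward(params, batch)
print(z.shape, [r.shape for r in recons], loss(recons, [X_rna[batch], X_prot[batch]]))
```

In a sketch of this kind, the learned rows of `W_lat` would be the 2-D coordinates that are plotted, which is why a single embedding can serve every modality.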

In the context of multi-omics data visualization, NeuroMDAVIS has produced qualitatively better visualizations than the other state-of-the-art methods. When used for cell-type classification or clustering, its latent embedding has outperformed those obtained by most of the other methods. One of its key strengths is that NeuroMDAVIS does not assume any prior data distribution and is thus nonparametric. Furthermore, to our knowledge, no existing method supports the joint visualization of multiple modalities. Existing methods either visualize one modality overlaid on the other(s) or produce a distinct embedding for each modality; NeuroMDAVIS fills this gap. In addition, unlike some existing methods, NeuroMDAVIS does not require PCA for initialization. Moreover, unlike major existing visualization methods such as t-SNE and UMAP, both NeuroDAVIS and NeuroMDAVIS can be used as pretrained models to visualize streaming samples, which allows new data to be integrated iteratively and continuously. This property makes them well suited to large-scale projects such as the Human Cell Atlas (HCA) and the Human Tumor Atlas Network initiatives. Thus, NeuroDAVIS and NeuroMDAVIS together can be claimed to be the new state-of-the-art methods for omics data visualization.

NeuroMDAVIS requires as input an identity matrix whose size equals the number of samples, which may lead to high memory consumption. The algorithm itself runs in polynomial time. Because the identity matrix is sparse, the memory overhead remains manageable, and this sparsity can be further exploited to address scalability for large datasets. In the present study, NeuroMDAVIS has been successfully applied to datasets of substantial size without encountering computational bottlenecks. This is likely due to the use of small batch sizes during training, i.e., in each training iteration, only a mini-batch of samples is used instead of the entire training set. For extremely large datasets (e.g., more than 100k cells), additional optimizations, such as block-wise computation or approximation strategies, may be incorporated to further improve scalability; this remains a direction for future work.
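The memory argument can be checked with simple arithmetic. The sketch below compares a dense float32 identity matrix, a CSR-style sparse one, and the dense one-hot rows needed for a single mini-batch. It assumes 4-byte values and indices, and `onehot_batches` is a hypothetical generator, not part of the released code.

```python
import numpy as np

def dense_identity_bytes(n, itemsize=4):
    # All n x n entries stored explicitly.
    return n * n * itemsize

def csr_identity_bytes(n, itemsize=4, index_size=4):
    # CSR keeps n non-zero values, n column indices, and n + 1 row pointers.
    return n * itemsize + n * index_size + (n + 1) * index_size

def minibatch_onehot_bytes(batch_size, n, itemsize=4):
    # Only the rows of I_n belonging to the current mini-batch are materialized.
    return batch_size * n * itemsize

def onehot_batches(n, batch_size, seed=0):
    """Yield (indices, one-hot rows of I_n) for shuffled mini-batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        x = np.zeros((len(idx), n), dtype=np.float32)
        x[np.arange(len(idx)), idx] = 1.0
        yield idx, x

n = 100_000
print(dense_identity_bytes(n), csr_identity_bytes(n), minibatch_onehot_bytes(128, n))
```

For n = 100,000 cells, the dense identity alone would need 40 GB, whereas the sparse form and a mini-batch of 128 one-hot rows need roughly 1.2 MB and 51 MB, respectively.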

AUTHORS’ CONTRIBUTIONS

C.M.: Conceptualization, methodology, data curation, data analysis, formal analysis, implementation, investigation, code review, validation, writing—initial draft preparation. D.B.S.: Conceptualization, methodology, data curation, data analysis, formal analysis, investigation, code review, validation. V.D.: Conceptualization, methodology, data curation, data analysis, formal analysis, writing—review and editing. R.K.D.: Conceptualization, methodology, writing—review and editing, overall supervision.


ACKNOWLEDGMENT

R.K.D. acknowledges the Department of Biotechnology, Government of India, for partially supporting this research in the form of the grant (Sanction Order Number: BT/PR40176/BTIS/137/84/2023).

AUTHOR DISCLOSURE STATEMENT

V.D. works as a Lead Data Scientist at Novo Nordisk A/S, Søborg. He has received no funds for this work.

FUNDING INFORMATION

The computing system procured through the grant (Sanction Order Number: BT/PR40176/BTIS/137/84/2023) is funded by the Department of Biotechnology, Government of India, and has been used extensively in this work.

DATA AND CODE AVAILABILITY

The datasets used in this study can be downloaded from https://doi.org/10.5281/zenodo.10623932. The code to reproduce the results can be found at .

References

1. Adil, Kumar, Tasleem Jan, et al. Single-cell transcriptomics: Current methods and challenges in data acquisition and analysis. Front Neurosci, 2021; 15:591122.

2. Angermueller, Clark, Lee, et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods, 2016; 13(3):229–232.

3. Becht, McInnes, Healy, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol, 2018; 37(1):38–44.

4. Bredikhin, Kats, Stegle. Muon: Multimodal omics analysis framework. Genome Biol, 2022; 23(1):42.

5. Buenrostro, Wu, Chang, et al. ATAC-seq: A method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol, 2015; 109(1):21.29.1–21.29.9.

6. Cao, Cusanovich, Ramani, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science, 2018; 361(6409):1380–1385.

7. Chari, Pachter. The specious art of single-cell genomics. PLoS Comput Biol, 2023; 19(8):e1011288–e20; doi: 10.1371/journal.pcbi.1011288

8. Chen, Lake, Zhang. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol, 2019; 37(12):1452–1457.

9. Clyde. SHARE-seq reveals chromatin potential. Nat Rev Genet, 2021; 22(1):2.

10. Farbehi, Patrick, Dorison, et al. Single-cell expression profiling reveals dynamic flux of cardiac stromal, vascular and immune cells in health and injury. Elife, 2019; 8:e43882.

11. Hashimshony, Senderovich, Avital, et al. CEL-Seq2: Sensitive highly-multiplexed single cell RNA-Seq. Genome Biol, 2016; 17(1):1–7.

12. Hou, Guo, Cao, et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Res, 2016; 26(3):304–319.

13. […], Huang, An, et al. Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biol, 2016; 17(1):88.

14. Jain, Polanski, Conde, et al. MultiMAP: Dimensionality reduction and integration of multimodal data. Genome Biol, 2021; 22(1):346.

15. Kobak, Linderman. Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nat Biotechnol, 2021; 39(2):156–157.

16. Kohonen. The self-organizing map. Proc IEEE, 1990; 78(9):1464–1480.

17. Kotliarov, Sparks, Martins, et al. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. Nat Med, 2020; 26(4):618–629.

18. Lan, Ling, Chen, et al. scMoMtF: An interpretable multitask learning framework for single-cell multi-omics data analysis. PLoS Comput Biol, 2024; 20(12):e1012679.

19. Linderman, Rachh, Hoskins, et al. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat Methods, 2019; 16(3):243–245.

20. Lotfollahi, Litinetskaya, Theis. Multigrate: Single-cell multi-omic data integration. BioRxiv, 2022:2022–2023.

21. Macosko, Basu, Satija, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 2015; 161(5):1202–1214.

22. Maitra, Seal, De. NeuroDAVIS: A neural network model for data visualization. Neurocomputing (Amst), 2024; 573:127182.

23. Maitra, Seal, Das, et al. Unsupervised neural network for single cell Multi-omics INTegration (UMINT): an application to health and disease. Front Mol Biosci, 2023; 10:1184748.

24. Marks, Garcia, Barrio, et al. Resolving the full spectrum of human genome variation using Linked-Reads. Genome Res, 2019; 29(4):635–645.

25. Peterson, Zhang, Kumar, et al. Multiplexed quantification of proteins and transcripts in single cells. Nat Biotechnol, 2017; 35(10):936–939.

26. Picelli, Bjorklund, Faridani, et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods, 2013; 10(11):1096–1098.

27. Satpathy, Granja, Yost, et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat Biotechnol, 2019; 37(8):925–936.

28. Stoeckius, Hafemeister, Stephenson, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat Methods, 2017; 14(9):865–868.

29. Stuart, Butler, Hoffman, et al. Comprehensive integration of single-cell data. Cell, 2019; 177(7):1888–1902.e21.

30. Szubert, Cole, Monaco, et al. Structure-preserving visualisation of high dimensional single-cell datasets. Sci Rep, 2019; 9(1):8914–8910.

31. Van der Maaten, Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008; 9:2579–2605.

32. Van Der Maaten, Postma, van den Herik, et al. Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 2009; 10(66–71):13.

33. Weisenfeld, Kumar, Shah, et al. Direct determination of diploid genome sequences. Genome Res, 2017; 27(5):757–767.

34. Zheng GXY, Terry, Belgrader, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun, 2017; 8(1):14049–14012.