DRGAT: Predicting Drug Responses Via Diffusion-Based Graph Attention Network

Abstract

Accurately predicting drug response from a patient’s genomic profile is critical for advancing personalized medicine. The rise of deep learning approaches, especially graph neural networks that leverage large-scale omics datasets, has been a key driver of research in this area. However, these biological datasets, which are typically high dimensional but have small sample sizes, present challenges such as overfitting and poor generalization in predictive models. As a complicating matter, models of gene expression (GE) data must capture complex inter-gene relationships, which exacerbates these issues. In this article, we tackle these challenges by introducing a drug response prediction method called drug response graph attention network (DRGAT). DRGAT combines a denoising diffusion implicit model for data augmentation with a prediction module based on recently introduced graph attention networks (GATs) with high-order neighbor propagation (HO-GATs). Our approach improved the area under the receiver operating characteristic curve by almost 5% compared with state-of-the-art models for many of the studied drugs, indicating that our method generalizes reasonably well. Moreover, our experiments confirm the potential of diffusion-based generative models, a core component of our method, to mitigate the inherent limitations of omics datasets by effectively augmenting GE data.

1. INTRODUCTION

Precision (or personalized) oncology, which tailors oncological treatment to each individual patient according to their tumor type, has emerged as a promising strategy to improve patient outcomes while reducing health care costs by avoiding ineffective therapies (Al-Mekhlafi et al., 2022). This approach could be further enhanced by the growing availability of individual medical data, paving the way for advances in web-based precision medicine (Kurzlechner et al., 2023; Robson and Boray, 2016; Zarei et al., 2020). Recently, large-scale omics datasets have become publicly available, such as the Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al., 2013), the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012), the Cancer Therapeutics Response Portal (CTRP) (Seashore-Ludlow et al., 2015), the Patient-Derived Tumor Xenograft (PDX) encyclopedia (Gao et al., 2015), and The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013). These resources offer multi-omics data, including gene expression (GE), somatic mutations, copy number aberrations, and drug response information.

Accurately predicting how a patient will respond to a specific drug is increasingly critical in precision oncology. However, genetic variations among patients lead to differences in drug responses, making accurate prediction a complex challenge (Arik and Pfister, 2020; Gezici and Sefer, 2024; Patro et al., 2011, 2012; Sefer, 2021, 2022a, 2022b; Sefer and Kingsford, 2011). Numerous studies (Battaglia et al., 2018; Chadebec and Allassonnière, 2021; Chaudhari et al., 2020; Chiu et al., 2019) have explored deep learning (DL) models that predict drug responses from patients’ genetic data, and many previous works have shown that GE data, which measures gene activity, is the most effective data type for such predictions (Chiu et al., 2019; Ding et al., 2018; Li et al., 2021; Park et al., 2021; Sharifi-Noghabi et al., 2019; Sharma et al., 2023; Xu et al., 2019b). In bioinformatics studies that predict cancer subtypes or drug responses from omics data such as GE, a key challenge is the high dimensionality of the data combined with a limited number of samples. A large number of features relative to the number of samples increases the risk of overfitting during training, which reduces the model’s ability to generalize (Al-Mekhlafi et al., 2022; Liu et al., 2017; Lv, 2013). Moreover, outliers in small datasets further compromise the robustness of these models. Thus, balancing the number of features against the number of available samples is crucial, especially for omics data with vast dimensions and limited sample sizes. This study aims to address and mitigate these challenges.

To tackle the problem of high-dimensional data with a limited number of samples, three main strategies have been proposed from different perspectives: (1) incorporating inductive bias, (2) feature selection, and (3) data augmentation. First, an effective way to enhance predictive power with limited training data is to take the relationships among genes into account, which predictive models can use as an inductive bias (Battaglia et al., 2018; Hamilton et al., 2017). One way to introduce such an inductive bias is to use graph neural networks (GNNs), which can infer knowledge over existing biological pathways (Kim et al., 2021). However, further improving predictive power requires methods that integrate information from distinct sets of pathways and capture how important each gene-gene relationship is for drug response prediction. Second, feature selection approaches use biological pathways to select only the genes that are highly relevant to the disease or drug (Fernández-Torras et al., 2019; Guney et al., 2016), and studies have shown that pathway-based feature selection effectively reduces dimensionality. Lastly, training data can be augmented with generative models such as generative adversarial networks (GANs) (Karras et al., 2020) and variational autoencoders (VAEs) (Chadebec and Allassonnière, 2021), which have been widely explored in image processing. For instance, some studies have demonstrated improved cancer classification performance by augmenting GE datasets with GANs (Chaudhari et al., 2020; Xu et al., 2019a) and have proposed universal tabular generative models applicable to GE data. Another set of studies (Lacan et al., 2023; Lee et al., 2020) also uses deep generative models such as GANs to generate additional samples.
However, these generative models often overlook the relationships between genes, which limits their ability to capture the underlying biological mechanisms. Recent advances in generative modeling, such as denoising diffusion probabilistic models (DDPMs) (Ho et al., 2020; Song and Ermon, 2019) and denoising diffusion implicit models (DDIMs) (Song et al., 2020), which have been reported to surpass GANs and VAEs in sample quality, offer new opportunities for data augmentation to mitigate data sparsity.

To address the limitations of prior studies, we propose DRGAT, a method designed to overcome overfitting caused by the high dimensionality and limited sample sizes of omics datasets in drug response prediction. DRGAT consists of three key components: (1) feature selection using biological pathway analysis to tackle high dimensionality, (2) GE data augmentation via a diffusion-based generative model (DDIM) to mitigate small sample sizes, and (3) drug response prediction through recently introduced graph attention networks (GATs) with high-order neighbor propagation (HO-GATs). The feature selection module reduces dimensionality by identifying the biological pathways most relevant to the target protein of each drug and selecting the genes in these pathways. The second module augments GE data using DDIM, a recently popular probabilistic generative model, paired with a graph autoencoder (AE) that captures biological mechanisms. Lastly, the drug response prediction module employs a GAT with high-order neighbor propagation that leverages prior knowledge of biological pathways, enhancing the model’s generalization when trained on limited data.

Overall, our main contributions are summarized as follows:

1.
We propose DRGAT, a novel and accurate drug response prediction approach. Our method uses a specialized type of GAT with high-order neighbor propagation, which integrates information from distinct biological pathways while also accounting for how close each pathway’s genes are to the drug’s target proteins/genes.
2.
We integrate the recently introduced diffusion-based model DDIM into our method to augment GE data. This DDIM-based augmentation is key to addressing the challenge of small expression datasets.
3.
DRGAT predicts drug response more accurately than competing approaches across many drugs, some of which were previously unseen. Additionally, the data generated by our diffusion-based generative model is of higher quality than the data generated by competing generative models.

1.1. Related work

Various methods have been used to predict drug responses. Traditionally, machine learning techniques have been used to select key features for prediction tasks (Ding et al., 2018; Geeleher et al., 2014; Graim et al., 2019; Iorio et al., 2016), while AEs have been applied for feature extraction to improve model performance (Xu et al., 2019b). Ding et al. (2018) use an AE for feature selection, with an elastic net and a support vector machine (SVM) as classifiers. Many studies emphasize dimensionality reduction because the success of these methods often depends on how effectively essential features are extracted from high-dimensional and complex multi-omics data (Eraslan et al., 2019). AutoBorutaRF (Xu et al., 2019b) uses random forests for classification after performing feature selection with an AE and the Boruta algorithm (Kursa and Rudnicki, 2010). In DeepDR (Chiu et al., 2019), an AE is pre-trained on TCGA data without drug response information, and its weights are then used to initialize a prediction model that is later trained on labeled GDSC data. DeepDSC (Li et al., 2021) uses a stacked deep AE to reduce the dimensionality of multi-omics data. Guney et al. (2016) present a drug-disease proximity measure that quantifies the relationship between drug targets and diseases. This measure addresses known biases in the interactome, helps reveal the therapeutic effects of drugs, and differentiates between palliative and effective treatments.

Recently, advances in DL have driven progress in drug response prediction. For instance, the multi-omics late integration (MOLI) model (Sharifi-Noghabi et al., 2019) uses a deep neural network architecture in which each type of omics data is encoded separately and the encodings are integrated at a later stage of the network. MOLI differs from other models in its combined loss function: the encoders and the classifier are trained jointly, and a key component of this loss is a triplet loss (Schroff et al., 2015) designed to better distinguish resistant samples from sensitive ones. Another approach, supervised feature extraction learning with triplet loss (Super.FELT) (Park et al., 2021), reduces the dimensionality of multi-omics data through feature selection, followed by a supervised encoder that extracts vital information from the reduced data. The encoded data is then classified by a neural network to predict drug response. The DeepInsight-3D model (Sharma et al., 2023) takes a different approach, transforming structured data into images and using convolutional neural networks (CNNs) to predict patient-specific anti-cancer drug responses. Lastly, GraphCDR (Liu et al., 2022) builds a graph neural network from multi-omics profiles of cancer cell lines, drug chemical structures, and known cancer cell line-drug responses to predict cancer drug response (CDR). It incorporates a contrastive learning task as a regularizer within a multi-task learning framework to improve generalization.

On the other hand, generative models have become a cornerstone of modern machine learning, with a wide range of applications, including image generation, text synthesis, drug discovery, and anomaly detection (Radford, 2015). These models learn the underlying data distribution and generate new samples that resemble the training data. By capturing the data’s probabilistic structure, generative models can create novel instances, simulate scenarios, and fill in missing information. Among the best-known generative models are GANs (Goodfellow et al., 2014). GANs consist of two neural networks, a generator and a discriminator, that are trained simultaneously. The generator creates fake data samples, while the discriminator attempts to distinguish between real and fake samples. Through this adversarial process, the generator improves its ability to produce realistic data, while the discriminator improves its ability to spot fake data. VAEs (Higgins et al., 2017b; Kingma, 2013) are another class of generative models based on the AE architecture. VAEs assume that the data is generated from a latent variable model, with both the encoder and decoder learning a probabilistic distribution. By constraining the latent space to follow a known prior distribution (e.g., Gaussian), VAEs can generate new samples by sampling from this latent space. Normalizing flows (Rezende and Mohamed, 2015) are a class of generative models that transform a simple distribution, such as a Gaussian, into a more complex data distribution through a series of invertible transformations. The key advantage of normalizing flows is that they allow exact likelihood computation, which supports model evaluation, while their invertibility makes sampling efficient. These models have been particularly useful in applications where interpretability and tractability of the latent variable space are important, such as density estimation and generative modeling of molecular structures.

2. PRELIMINARIES AND NOTATIONS

A collection of biological pathways is represented as $\mathcal{G} = \{G_{1}, G_{2}, \dots, G_{k}\}$, where each pathway is a subgraph $G = (V, E)$. Each subgraph $G$ is an undirected graph with nodes $v_{i} \in V$ and edges $e_{ij} = (v_{i}, v_{j}) \in E$. Here, each node represents a gene, and each edge indicates an interaction or connection between two genes.

2.1. Denoising diffusion implicit models

DDIMs (Song et al., 2020) are a variant of DDPMs designed to improve the efficiency of the diffusion process by reducing the number of sampling steps required to generate high-quality data, such as images. DDIMs maintain the generative power of DDPMs while making the reverse diffusion process faster and, optionally, deterministic. Some key features of DDIMs can be summarized as follows:

1.
Deterministic sampling: Unlike DDPMs, which typically use a stochastic reverse diffusion process, DDIMs introduce a deterministic way to reverse the noise. This means that, for a given starting point (random noise), DDIMs can generate the same output each time.
2.
Reduced sampling steps: DDPMs usually require hundreds to thousands of reverse steps to generate high-quality samples, which makes them slow. DDIMs significantly reduce the number of steps needed while still maintaining competitive sample quality. Fewer steps mean faster generation.
3.
Implicit model: DDIMs achieve this speed-up by introducing an implicit generative model, which allows the process to skip some of the intermediate noisy steps, making the generation process more efficient.
4.
Flexibility in trade-offs: One of the strengths of DDIMs is that they offer a trade-off between speed and sample quality. Users can adjust the number of sampling steps based on their needs, choosing to prioritize faster generation or higher-quality output.
5.
Improved sample quality: DDIMs tend to produce higher-quality images with fewer sampling steps compared with traditional DDPMs, which is crucial for real-time or resource-constrained applications.

2.2. Mathematical framework of DDIM

DDIM builds on the diffusion model framework, in which the forward process gradually adds noise to data over multiple time steps and the reverse process aims to recover the original data. Our problem can be stated as Problem 1.

Problem 1. Given gene expression data $X$, a drug response label $c$ for each sample in $X$, and a collection of biological pathways $\mathcal{G} = \{G_{1}, G_{2}, \dots, G_{k}\}$, each a subgraph $G = (V, E)$, our goal is to predict whether each sample will respond to a given drug.

2.2.1. Forward process (non-Markovian diffusion process)

The forward process in DDIM defines a set of noisy variables $\{x_{t}\}_{t=0}^{T}$, starting from the data point $x_0$. The key difference between DDPM and DDIM is that DDIM uses a non-Markovian forward process instead of a Markov chain, while keeping the same marginals as DDPM. Noise is added to the data gradually according to a predetermined noise schedule:
$$q(x_{t} \mid x_{0}) = \mathcal{N}\left(x_{t}; \sqrt{\alpha_{t}}\, x_{0}, (1 - \alpha_{t}) I\right) \tag{1}$$
where $x_0$ is the original data point (e.g., an image), $x_t$ is the noisy version of $x_0$ at step $t$, $\alpha_t$ is a decreasing noise schedule defined as a function of $t$ (in DDIM’s notation, $\alpha_t$ corresponds to the cumulative product $\bar{\alpha}_t$ of DDPM), $\mathcal{N}$ denotes a Gaussian distribution, and $I$ is the identity matrix.

Because this marginal has a closed form, $x_t$ can be sampled directly from $x_0$ for any $t$:
$$x_{t} = \sqrt{\alpha_{t}}\, x_{0} + \sqrt{1 - \alpha_{t}}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) \tag{2}$$
where $\epsilon$ is standard Gaussian noise.
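As an illustration of Eqs. (1) and (2), the following minimal Python sketch draws $x_t$ from $x_0$ in a single step. It assumes a linear $\beta$ schedule from $10^{-4}$ to $0.02$ over 1000 steps (a common DDPM default, not necessarily the schedule used in DRGAT) and builds $\alpha_t$ as the cumulative product of $1-\beta_s$; time steps are 0-indexed in the code, and the function names are hypothetical.

```python
import numpy as np

def cumulative_alphas(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_t in DDIM notation: cumulative product of (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alphas, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot (Eq. (2)); also return the noise used."""
    eps = rng.standard_normal(x0.shape)
    a_t = alphas[t]
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps, eps

rng = np.random.default_rng(0)
alphas = cumulative_alphas()
x0 = rng.standard_normal((4, 16))                  # 4 toy vectors of dimension 16
x_mid, eps_mid = q_sample(x0, 500, alphas, rng)    # partially noised
x_end, eps_end = q_sample(x0, 999, alphas, rng)    # alpha_T is close to 0: nearly pure noise
```

Because the noise level depends only on $\alpha_t$, training can pick any step $t$ at random without simulating the chain step by step.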

2.2.2. Reverse process (deterministic sampling)

Unlike the stochastic reverse process in DDPM, DDIM can employ a deterministic reverse process. Starting from a noisy sample $x_T$, it iteratively removes noise to reconstruct $x_0$. This reverse process is non-Markovian and deterministic, and because it can skip intermediate steps (Section 2.2.4), it is faster than DDPM sampling. It relies on the model’s noise prediction $\epsilon_{\theta}(x_{t}, t)$, where $\epsilon_{\theta}$ is typically parameterized by a neural network. The reverse update at step $t$ is:
$$x_{t-1} = \sqrt{\alpha_{t-1}} \left( \frac{x_{t} - \sqrt{1 - \alpha_{t}}\, \epsilon_{\theta}(x_{t}, t)}{\sqrt{\alpha_{t}}} \right) + \sqrt{1 - \alpha_{t-1}}\, \epsilon_{\theta}(x_{t}, t) \tag{3}$$
where $\epsilon_{\theta}(x_{t}, t)$ is the noise estimate at step $t$, $\alpha_t$ and $\alpha_{t-1}$ are values of the predefined noise schedule, $x_t$ is the noisy sample at time step $t$, and $x_{t-1}$ is the denoised sample at the next step. Repeating this update progressively transforms the noisy sample $x_T$ into $x_0$.

2.2.3. Interpretation of the reverse process

The reverse update can be interpreted as separating the signal from the noise:
$$x_{t-1} = \sqrt{\alpha_{t-1}}\,(\text{estimate of } x_{0}) + \sqrt{1 - \alpha_{t-1}}\,(\text{estimate of } \epsilon) \tag{4}$$
where the first term rescales the current estimate of the clean sample (computed from $x_t$), and the second term re-adds the predicted noise at the lower noise level of step $t-1$. The noise estimate $\epsilon_{\theta}(x_{t}, t)$ is the model’s prediction of the added noise at each step.
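The deterministic update of Eqs. (3) and (4) can be sketched in a few lines of Python. The function name `ddim_step` and the toy values are illustrative; the check uses the true noise as a stand-in for a trained $\epsilon_\theta$, in which case the update should land exactly on the same noising trajectory at level $\alpha_{t-1}$.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_t, alpha_prev):
    """Deterministic DDIM update (Eqs. (3)-(4)); alphas are cumulative (DDIM notation)."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)  # estimate of x_0
    direction = np.sqrt(1.0 - alpha_prev) * eps_pred                       # noise at level t-1
    return np.sqrt(alpha_prev) * x0_hat + direction

rng = np.random.default_rng(0)
x0, eps = rng.standard_normal(5), rng.standard_normal(5)
alpha_t, alpha_prev = 0.5, 0.8                     # toy schedule values (alpha_prev > alpha_t)
x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps
# With a perfect noise estimate (eps_pred = eps), the step stays on the same trajectory.
x_prev = ddim_step(x_t, eps, alpha_t, alpha_prev)
```

No random noise is drawn inside `ddim_step`, which is what makes the sampler deterministic for a fixed starting point.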

2.2.4. Connection to DDPM

DDIM generalizes DDPM by introducing flexibility into the reverse process. While DDPM’s reverse process is stochastic (it adds noise sampled from a Gaussian distribution at each step), DDIM allows deterministic sampling. The amount of stochasticity is controlled by the variance $\sigma_t$ of the reverse process, commonly parameterized by $\eta$: setting $\eta = 0$ yields deterministic DDIM sampling, whereas $\eta = 1$ recovers the behavior of DDPM.

DDIM also allows fewer sampling steps than DDPM, because sampling can follow a subsequence of the forward time steps rather than every step. This flexibility makes DDIM much faster in practice while retaining high sample quality. For example, while DDPM might require 1000 steps to generate high-quality samples, DDIM can achieve similar results with only 50 or 100 steps, reducing computational cost.
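The following sketch illustrates step skipping: it runs the deterministic update on an evenly spaced subsequence of 50 out of 1000 time steps. The linear schedule, the spacing rule, and the `oracle` noise predictor (which knows $x_0$ and stands in for a trained network) are assumptions made for illustration; with such a perfect predictor, the sampler should recover $x_0$ exactly.

```python
import numpy as np

def ddim_sample(eps_model, x_T, alphas, num_steps=50):
    """Deterministic DDIM sampling on a strided subsequence of the T forward steps."""
    T = len(alphas)
    tau = np.linspace(0, T - 1, num_steps).round().astype(int)  # e.g., 50 of 1000 steps
    x = x_T
    for i in range(num_steps - 1, -1, -1):
        a_t = alphas[tau[i]]
        a_prev = alphas[tau[i - 1]] if i > 0 else 1.0  # last step lands on clean data
        eps = eps_model(x, tau[i])
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x

alphas = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))   # assumed schedule
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
eps_true = rng.standard_normal(8)
x_T = np.sqrt(alphas[-1]) * x0 + np.sqrt(1.0 - alphas[-1]) * eps_true

def oracle(x, t):
    """Hypothetical perfect noise predictor that knows x0 (illustration only)."""
    return (x - np.sqrt(alphas[t]) * x0) / np.sqrt(1.0 - alphas[t])

x_rec = ddim_sample(oracle, x_T, alphas, num_steps=50)
```

In practice, `eps_model` would be the trained network and `x_T` would be drawn from $\mathcal{N}(0, I)$; `num_steps` is the knob that trades speed for sample quality.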

3. OUR SOLUTION DRGAT

Here, we introduce DRGAT, our proposed method for drug response prediction. It consists of three parts, which are described in the following subsections.

3.1. Selecting features via graph analysis

To determine which biological pathways most influence drug response prediction, we identify the pathways that are statistically significantly close to drug-related genes.

Following Guney et al. (2016), we measure the proximity between each biological pathway and the drug-related genes as the average, over the drug-related genes, of the shortest-path length to the nearest pathway gene:
$$d_{closest} = \frac{1}{|T|} \sum_{t \in T} \min_{s \in S} d(s, t) \tag{5}$$
where $S$ is the set of pathway genes, $T$ is the set of drug-associated genes (target proteins/genes), and $d(s, t)$ is the shortest-path length between pathway gene $s$ and drug-associated gene $t$.

We assessed the statistical significance of each pathway’s distance against a reference distribution, which removes the dependence of the raw distance on the number of vertices (genes) in the pathway. We generated this reference distribution by repeatedly sampling random gene sets that match the size and degrees of the original drug-related genes and computing the closest distance for each random set. We then calculated a z-score for each pathway from the mean and standard deviation of this reference distribution, ranked the pathways by their z-scores, and selected the top K as the proximal pathways. Finally, the genes in the selected pathways, which have the greatest impact on drug response prediction, are used in the following steps.
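The proximity computation of Eq. (5) and the degree-matched reference distribution can be sketched in plain Python as follows. Several details are assumptions rather than the paper’s stated implementation: the interactome is an unweighted, undirected adjacency dictionary; degree matching uses exact degrees (degree binning is also common); pathways are ranked by ascending z-score, on the assumption that closer-than-random pathways receive negative z-scores; and all gene and pathway names are hypothetical.

```python
import random
import statistics
from collections import deque

def closest_distance(adj, pathway_genes, target_genes):
    """Eq. (5): mean over targets t of the shortest-path length to the nearest pathway gene."""
    total = 0.0
    for t in target_genes:
        dist, queue = {t: 0}, deque([t])
        while queue:  # breadth-first search from target gene t (undirected graph)
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += min(dist.get(s, float("inf")) for s in pathway_genes)
    return total / len(target_genes)

def degree_matched_sample(adj, genes, rng):
    """Random gene set with the same size and the same (exact) degrees as `genes`."""
    by_degree = {}
    for g, nbs in adj.items():
        by_degree.setdefault(len(nbs), []).append(g)
    sample = set()
    for g in genes:
        pool = [c for c in by_degree[len(adj[g])] if c not in sample]
        sample.add(rng.choice(pool))
    return sample

def proximity_z(adj, pathway_genes, target_genes, n_iter=200, seed=0):
    """z-score of the observed distance against the degree-matched reference distribution."""
    rng = random.Random(seed)
    observed = closest_distance(adj, pathway_genes, target_genes)
    null = [closest_distance(adj, pathway_genes,
                             degree_matched_sample(adj, target_genes, rng))
            for _ in range(n_iter)]
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    return (observed - mu) / sd if sd > 0 else 0.0

def select_top_k(z_scores, k):
    """Keep the k pathways with the most negative z-scores (closest to the targets)."""
    return sorted(z_scores, key=z_scores.get)[:k]

adj = {f"g{i}": set() for i in range(6)}          # toy chain g0-g1-g2-g3-g4-g5
for i in range(5):
    adj[f"g{i}"].add(f"g{i + 1}")
    adj[f"g{i + 1}"].add(f"g{i}")
d_obs = closest_distance(adj, {"g0"}, {"g3"})                   # 3.0
z = proximity_z(adj, pathway_genes={"g0"}, target_genes={"g1"})
top = select_top_k({"P1": -2.0, "P2": 0.5, "P3": -1.0}, k=2)    # ['P1', 'P3']
```

Here the target `g1` is adjacent to the pathway gene `g0`, which is closer than most degree-matched random genes, so its z-score comes out negative.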

3.2. Data augmentation for generalizability

In this subsection, we introduce the second component of DRGAT, which addresses the limited amount of training data. To improve generalization, we propose a data augmentation module consisting of two steps. First, in the projection step, a graph AE projects the GE dataset into a latent space. Second, DDIM generates new samples in this latent space. The augmented latent samples are then transformed back into GE profiles by the trained decoder of the graph AE.

3.2.1. Gene expression profiles graphical compression via autoencoder

Our proposed graphical compression model for GE profiles was built on an AE architecture. Graph AEs provide an effective way to learn graph-structured data representations by encoding structural and feature information into a lower-dimensional space. Graph AEs capture the rich interconnections between nodes, making them powerful tools for tasks like link prediction, clustering, and anomaly detection in various domains such as social networks, bioinformatics, and recommendation systems.

In our case, the AE’s input is built on the graph structure of the K pathways selected in Section 3.1, and this input is encoded into a graph embedding in latent space. We feed the GE data $X \in \mathbb{R}^{N \times D}$ to the AE, which represents each pathway as a subnetwork as described in Section 3.3.1. The attributes of each subgraph consist of gene indicators and GE profiles. The encoder has the same structure as the graph attention layer described in Section 3.3.1, but after graph attention it produces latent embeddings $Z \in \mathbb{R}^{N \times d}$ ($d \ll D$) through an additional affine layer. Alongside this encoder, we train a decoder made of affine layers to reconstruct the original vertex states containing the GE profiles. We train the AE by minimizing the following loss with respect to $\psi$ and $\phi$:
$$L_{Autoencoder} = \sum_{i=1}^{N} \left\| x_{i} - \mathrm{De}_{\phi}\left(\mathrm{En}_{\psi}(x_{i}, G)\right) \right\|^{2} \tag{6}$$
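The reconstruction objective of Eq. (6) is sketched below with NumPy. To keep the example short, the graph attention encoder is replaced by a hypothetical simplified encoder that averages each gene’s expression with its pathway neighbors before an affine map, and only the affine decoder is fitted, in closed form by least squares, rather than training $\mathrm{En}_{\psi}$ and $\mathrm{De}_{\phi}$ jointly by gradient descent. All sizes and the chain-shaped pathway are toy assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    """Row-normalized adjacency with self-loops: D^-1 (A + I)."""
    A_hat = A + np.eye(len(A))
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def encode(X, A_norm, W_enc):
    """Simplified En_psi: average each gene with its pathway neighbors, then an affine map."""
    return np.tanh(X @ A_norm.T) @ W_enc          # (N, D) -> (N, d)

def decode(Z, W_dec, b_dec):
    """Affine decoder De_phi mapping latent codes back to expression space."""
    return Z @ W_dec + b_dec

def ae_loss(X, X_hat):
    """Eq. (6): summed squared reconstruction error over samples and genes."""
    return float(np.sum((X - X_hat) ** 2))

rng = np.random.default_rng(0)
N, D, d = 32, 10, 3                               # samples, genes, latent size (toy values)
A = np.zeros((D, D))
for j in range(D - 1):                            # toy chain-shaped pathway
    A[j, j + 1] = A[j + 1, j] = 1.0
A_norm = normalize_adjacency(A)
X = rng.standard_normal((N, D))
W_enc = rng.standard_normal((D, d))
Z = encode(X, A_norm, W_enc)
loss_random = ae_loss(X, decode(Z, rng.standard_normal((d, D)), np.zeros(D)))
# For fixed codes Z, the least-squares decoder minimizes Eq. (6) over (W_dec, b_dec).
Z1 = np.hstack([Z, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(Z1, X, rcond=None)
loss_fit = ae_loss(X, decode(Z, coef[:-1], coef[-1]))
```

With only $d = 3$ latent dimensions for 10 genes the reconstruction cannot be perfect; the bottleneck is what forces the codes to summarize the pathway-level structure.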

The generative model described in the next subsection is then trained to generate samples in the latent space produced by this step.

3.2.2. Generative modeling of latent space

The trained graph AE, consisting of $\mathrm{En}_{\psi}$ and $\mathrm{De}_{\phi}$ as described above, provides a low-dimensional latent space that encodes information about biological relationships. Unlike typical image generation tasks, where DDPMs and DDIMs are frequently used, this study required adjustments to the model to generate latent representations that capture both GE data and the relationships between genes.

In image generation, where DDPMs and DDIMs are frequently used, the backbone architecture of DDIM-based models is typically a U-Net (Ronneberger et al., 2015), because the convolutions in a U-Net align well with the local patterns of image data. However, a CNN-based U-Net is unsuitable for our purposes, because our data lacks the local structure and neighboring-pixel dependencies found in images. Given the nature of GE data, we modified the backbone architecture by replacing the convolution layers with affine layers.

This module improves the model’s ability to predict whether a sample is sensitive or resistant to a drug by expanding the training data. Instead of generating samples indiscriminately, it generates samples labeled as sensitive or resistant to the drug. In this conditional generation approach, each input $x_0$ is paired with a condition term $c$ (the sensitive or the drug-resistant group). The diffusion model is then adapted to incorporate $c$ into the reverse process, so that it learns a conditional generative model $p_{\theta}(x_{0} \mid c)$ with reverse transitions
$$p_{\theta}(x_{t-1} \mid x_{t}, c) = \mathcal{N}\left(x_{t-1}; \mu_{\theta}(x_{t}, t, c), \sigma_{t}^{2} I\right) \tag{7}$$

During the reverse (denoising) process, each transition is conditioned on $c$. The noise injection, however, is the same for all classes; that is, the forward process does not depend on the condition. Because the reverse-process mean changes from $\mu_{\theta}(x_{t}, t)$ to $\mu_{\theta}(x_{t}, t, c)$, the condition $c$ must be given as an additional input to the trainable backbone. With this modification, the simplified training objective of Ho et al. (2020) becomes
$$L_{cond} := \mathbb{E}_{t, \epsilon, x_{0}} \left[ \left\| \epsilon - \epsilon_{\theta}\left(\sqrt{\bar{\alpha}_{t}}\, x_{0} + \sqrt{1 - \bar{\alpha}_{t}}\, \epsilon, t, c\right) \right\|^{2} \right] \tag{8}$$
where, in DDPM notation, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ is the cumulative product of the per-step coefficients $\alpha_s$.
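A minimal NumPy sketch of a Monte Carlo estimate of the conditional objective in Eq. (8) follows. The affine backbone with additive time and class embeddings (`eps_model`), the linear noise schedule, the label encoding, and all sizes are assumptions for illustration, since the exact backbone configuration is not specified here; only the structure of the loss (noise the latent code independently of $c$, predict the noise given $t$ and $c$, take the squared error) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, latent_dim, n_classes, hidden = 1000, 8, 2, 32            # toy sizes (assumptions)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))     # DDPM notation: cumulative product

# Hypothetical affine backbone: input layer, time and class embeddings, output layer.
params = {
    "W_in": rng.standard_normal((latent_dim, hidden)) * 0.1,
    "t_emb": rng.standard_normal((T, hidden)) * 0.1,
    "c_emb": rng.standard_normal((n_classes, hidden)) * 0.1,  # 0 = resistant, 1 = sensitive (assumed)
    "W_out": rng.standard_normal((hidden, latent_dim)) * 0.1,
}

def eps_model(x_t, t, c, p):
    """eps_theta(x_t, t, c) built from affine layers only (no convolutions)."""
    h = np.maximum(0.0, x_t @ p["W_in"] + p["t_emb"][t] + p["c_emb"][c])
    return h @ p["W_out"]

def conditional_loss(x0, c, p, rng):
    """Monte Carlo estimate of Eq. (8) for a batch of latent codes x0 with labels c."""
    t = rng.integers(0, T, size=len(x0))                     # one random step per sample
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps           # forward noising ignores c
    residual = eps - eps_model(x_t, t, c, p)
    return float(np.mean(np.sum(residual ** 2, axis=1)))

x0 = rng.standard_normal((16, latent_dim))                   # stand-in for encoder outputs
c = rng.integers(0, n_classes, size=16)
loss = conditional_loss(x0, c, params, rng)
```

In a full implementation this loss would be minimized with respect to the backbone parameters by a gradient-based optimizer.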

To generate latent variables that capture biological information under specific conditions, the stochastic generation step, formulated as ancestral sampling by Ho et al. (2020) and Song et al. (2021), must be adjusted so that the inference process produces stochastic samples that adhere to the given condition:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_{t}}} \left( x_{t} - \frac{1 - \alpha_{t}}{\sqrt{1 - \bar{\alpha}_{t}}}\, \epsilon_{\theta}(x_{t}, t, c) \right) + \sigma_{t} z \tag{9}$$
where $\alpha_t$ is the per-step coefficient, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, $\sigma_t$ is the standard deviation of the reverse transition in Eq. (7), and $z \sim \mathcal{N}(0, I)$.
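The conditional ancestral sampling loop of Eq. (9) can be sketched as follows. We assume $\sigma_t^2 = \beta_t = 1 - \alpha_t$, one of the two variance choices discussed by Ho et al. (2020), a short 100-step toy schedule, and a placeholder `placeholder_eps_model` in place of the trained conditional network; the integer encoding of $c$ is hypothetical.

```python
import numpy as np

def ancestral_sample(eps_model, c, shape, betas, rng):
    """Conditional ancestral sampling, Eq. (9), assuming sigma_t^2 = beta_t."""
    alphas = 1.0 - betas                        # per-step alpha_t (DDPM notation)
    alpha_bar = np.cumprod(alphas)              # cumulative alpha_bar_t
    x = rng.standard_normal(shape)              # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t, c)
        mean = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)  # no noise at the last step
        x = mean + np.sqrt(betas[t]) * z
    return x

def placeholder_eps_model(x_t, t, c):
    """Stand-in for a trained conditional network; always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)            # short toy schedule
c = np.ones(4, dtype=int)                       # hypothetical label: 1 = "sensitive"
latent_samples = ancestral_sample(placeholder_eps_model, c, (4, 8), betas, rng)
```

The resulting latent vectors would then be passed through the trained decoder $\mathrm{De}_{\phi}$ to obtain synthetic GE profiles for the requested class.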

Lastly, we decode the generated latent vectors back into GE data using the trained graph AE’s decoder. In summary, this section proposes a data augmentation method for GE profiles that aims to enhance the generalization performance of drug response prediction models. It addresses the challenge of generating GE data by using a graph AE to map GE data into a low-dimensional latent space that retains biological structure information, in which new samples are then generated with the DDIM model. This method offers several benefits. Incorporating a graph AE that captures biological relationships reduces the complexity of training the generative model. Additionally, mapping high-dimensional GE data into a low-dimensional latent space lowers computational demands.

3.3. Drug response prediction with HO-GATs

Now, we introduce the last component of our method, a drug response prediction technique that uses high-order neighbor propagation graph attention networks (HO-GATs) (Xiong et al., 2024). Each biological pathway is modeled as a subgraph $G = (V, E)$, and the HO-GAT model is applied to these subgraphs to capture gene relationship patterns. HO-GATs extend the basic GAT model by considering multi-hop (high-order) neighbors of a node, which provides more global context and enhances feature propagation within the graph. This makes the model particularly useful for tasks where information from beyond a node’s immediate neighbors is important. Each node $v \in V$ is associated with an input feature vector $h_{v}^{(0)} \in \mathbb{R}^{F}$, where $F$ is the dimension of the input feature space. The following subsections describe the method in more detail.

Overall, high-order propagation has the following benefits: (1) Global context: HO-GATs leverage the graph structure more fully by incorporating information from distant nodes, allowing for a more global perspective on node relationships. (2) Expressiveness: By considering high-order neighbors, HO-GATs capture more complex patterns and interactions within the graph. (3) Flexibility: The model adapts to various tasks where the immediate neighborhood may not provide sufficient information, such as in semi-supervised learning tasks on graphs with long-range dependencies.

3.3.1. Basic graph attention network model

In a traditional GAT, for each node $v$, the attention mechanism is applied to its direct neighbors $u \in N(v)$, where $N(v)$ is the set of neighbors of node $v$. A shared attention mechanism computes a pairwise attention score between the feature vectors of $v$ and $u$, which is used to weight the aggregation of the neighbors’ features. The key operations in a GAT are:

1.
Attention coefficients:
$$e_{vu} = \mathrm{LeakyReLU}\left(a^{T} \left[ W h_{v}^{(0)} \,\|\, W h_{u}^{(0)} \right]\right) \tag{10}$$
where $W \in \mathbb{R}^{F' \times F}$ is a learnable weight matrix, $a \in \mathbb{R}^{2F'}$ is the attention vector, and $\|$ denotes concatenation. $e_{vu}$ is the unnormalized attention score for the node pair $(v, u)$.
2.
Normalization (softmax): The attention scores are normalized across all neighbors $u \in N(v)$ using the softmax function:
$$\alpha_{vu} = \frac{\exp(e_{vu})}{\sum_{k \in N(v)} \exp(e_{vk})} \tag{11}$$
where $\alpha_{vu}$ is the normalized attention coefficient.
3.
Feature aggregation: The feature update for node $v$ is then given by:
$$h_{v}^{(1)} = \sigma\left( \sum_{u \in N(v)} \alpha_{vu} W h_{u}^{(0)} \right) \tag{12}$$
where $\sigma$ is a non-linear activation function, such as ReLU.
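The three operations above can be sketched for a single head in NumPy. Splitting $a$ into a source half and a neighbor half is an algebraic rewrite of $a^{T}[W h_v \| W h_u]$; following common GAT practice, we add self-loops so that every node has at least one neighbor (an assumption here), and the 4-gene chain and all weights are toy values.

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head GAT layer implementing Eqs. (10)-(12).

    H: (n, F) node features; A: (n, n) adjacency (nonzero = neighbor);
    W: (F', F) weight matrix; a: (2F',) attention vector.
    """
    Wh = H @ W.T                                   # W h_u for every node, shape (n, F')
    f = Wh.shape[1]
    # a^T [W h_v || W h_u] = a[:F'] . W h_v + a[F':] . W h_u
    e = (Wh @ a[:f])[:, None] + (Wh @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)              # LeakyReLU, Eq. (10)
    e = np.where(A > 0, e, -np.inf)                # keep only u in N(v)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)      # softmax over N(v), Eq. (11)
    return np.maximum(0.0, alpha @ Wh), alpha      # sigma = ReLU, Eq. (12)

rng = np.random.default_rng(0)
A = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)   # 4-gene chain with self-loops
H = rng.standard_normal((4, 3))                    # F = 3 input features per gene
W = rng.standard_normal((5, 3))                    # F' = 5
a = rng.standard_normal(10)                        # 2F' = 10
H1, alpha = gat_layer(H, A, W, a)
```

Each row of `alpha` is a probability distribution over that gene’s neighbors, and non-neighbors receive exactly zero weight.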

3.3.2. High-order neighbor propagation in HO-GATs

HO-GATs extend the GAT by including higher-order neighbors, that is, the neighbors’ neighbors (and so on), in the message-passing mechanism. This allows information from more distant nodes in the graph to influence the representation of each node. In high-order neighbor aggregation, instead of aggregating information only from direct (1-hop) neighbors, the HO-GAT aggregates information from multi-hop neighbors. Specifically, for a node $v$, we can consider neighbors up to $k$ hops away, so the feature propagation incorporates not only immediate neighbors but also second-, third-, and higher-order neighbors.

The node features consist of GE data and gene indicators. Each gene is assigned a unique gene indicator through a trainable embedding matrix, similar to the token embeddings in Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019). Nodes in different biological pathways that share a gene symbol share the same gene embedding. The connectivity of each gene is given by an adjacency matrix obtained by preprocessing data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto, 2000). Let $h_{v}^{(l)}$ denote the feature vector of node $v$ after $l$-hop message passing. The attention mechanism at the $l$-th layer is applied to the $l$-hop neighbors of each node. For a $k$-order GAT, the node features are updated as follows:
$$h_{v}^{(l+1)} = \sigma\left( \sum_{u \in N^{l}(v)} \alpha_{vu}^{(l)} W_{l} h_{u}^{(l)} \right) \tag{13}$$
where $N^{l}(v)$ denotes the $l$-hop neighbors of node $v$, and $\alpha_{vu}^{(l)}$ are the attention weights at the $l$-th layer, learned from the node pairs $(v, u)$. These coefficients are computed as in the 1-hop GAT, except that they consider nodes that are $l$ hops away.

The final node representation after k-order neighbor aggregation is given by $h_{v}^{(k)} = \sigma\left(\sum_{l=1}^{k} \sum_{u \in N^{l}(v)} \alpha_{vu}^{(l)} W_{l} h_{u}^{(l)}\right)$ (14), where the attention mechanism of each layer learns to weigh the importance of different neighbors, including high-order ones, allowing richer feature propagation.
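Eq. (14) can be sketched in NumPy as follows. The sketch assumes that $N^{l}(v)$ contains the nodes whose shortest-path distance to v is exactly l (computed from powers of the adjacency matrix), takes the per-order features $h^{(l)}$ as given, and uses a hypothetical dot-product score in place of the learned attention scores; only the masked per-order softmax and the summation over orders mirror the equation.

```python
import numpy as np

def hop_masks(A, k):
    """masks[l-1][v, u] is True iff the shortest path from v to u has exactly l hops."""
    n = len(A)
    step = ((A > 0) | np.eye(n, dtype=bool)).astype(int)
    reach = np.eye(n, dtype=bool)                 # nodes within 0 hops
    masks = []
    for _ in range(k):
        new_reach = (reach.astype(int) @ step) > 0
        masks.append(new_reach & ~reach)
        reach = new_reach
    return masks

def masked_softmax(scores, mask):
    """Row-wise softmax restricted to mask; rows with no True entry become all zeros."""
    s = np.where(mask, scores, -np.inf)
    m = s.max(axis=1, keepdims=True)
    e = np.exp(s - np.where(np.isfinite(m), m, 0.0))
    total = e.sum(axis=1, keepdims=True)
    return np.divide(e, total, out=np.zeros_like(e), where=total > 0)

def high_order_aggregate(Hs, A, Ws, score_fn):
    """Eq. (14): ReLU( sum_l sum_{u in N^l(v)} alpha^(l)_vu W_l h_u^(l) )."""
    out = 0.0
    for l, (H, W, mask) in enumerate(zip(Hs, Ws, hop_masks(A, len(Ws))), start=1):
        Z = H @ W
        alpha = masked_softmax(score_fn(Z, l), mask)
        out = out + alpha @ Z
    return np.maximum(out, 0.0)

def dot_scores(Z, l):
    """Hypothetical attention score; a learned scoring function would replace it."""
    return Z @ Z.T

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])   # path 0-1-2-3
Hs = [rng.normal(size=(4, 3)) for _ in range(2)]   # h^(1), h^(2), assumed given
Ws = [rng.normal(size=(3, 2)) for _ in range(2)]   # W_1, W_2
h_k = high_order_aggregate(Hs, A, Ws, dot_scores)
```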

3.3.3. Multi-head attention

HO-GATs typically employ multi-head attention, in which the attention mechanism is applied several times in parallel. Each attention head produces its own set of attention weights and node updates, and the results are then combined by concatenation or averaging. For H attention heads combined by concatenation, the output is $h_{v}^{(l+1)} = \Big\Vert_{h=1}^{H} \sigma\left(\sum_{u \in N^{l}(v)} \alpha_{vu}^{(l,h)} W_{l}^{(h)} h_{u}^{(l)}\right)$ (15), where $\Vert$ denotes concatenation and each head h has its own weight matrix $W_{l}^{(h)}$.
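A sketch of multi-head attention under standard GAT assumptions follows. With `combine="concat"` it applies the ReLU to each head and concatenates the results, as in Eq. (15); with `combine="mean"` it averages the heads before the nonlinearity, which is how Veličković et al. (2018) combine heads in their output layer. The helper names, the LeakyReLU slope, and the sizes are hypothetical.

```python
import numpy as np

def head_messages(H, mask, W, a):
    """Pre-activation output of one attention head: sum_u alpha_vu W h_u."""
    Z = H @ W
    f = Z.shape[1]
    e = (Z @ a[:f])[:, None] + (Z @ a[f:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)                    # LeakyReLU
    e = np.where(mask, e, -np.inf)                     # restrict to neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax per node
    return alpha @ Z

def multi_head(H, mask, Ws, As, combine="concat"):
    """concat: ReLU per head, then concatenate (Eq. 15); mean: average heads, then ReLU."""
    heads = [head_messages(H, mask, W, a) for W, a in zip(Ws, As)]
    if combine == "concat":
        return np.concatenate([np.maximum(h, 0.0) for h in heads], axis=1)
    return np.maximum(np.mean(heads, axis=0), 0.0)

rng = np.random.default_rng(2)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
mask = (A > 0) | np.eye(3, dtype=bool)                 # self-loops keep every row non-empty
H = rng.normal(size=(3, 4))
Ws = [rng.normal(size=(4, 2)) for _ in range(3)]       # 3 heads, F' = 2 each
As = [rng.normal(size=4) for _ in range(3)]
print(multi_head(H, mask, Ws, As).shape)               # (3, 6)
print(multi_head(H, mask, Ws, As, "mean").shape)       # (3, 2)
```

Concatenation multiplies the output width by the number of heads, which is why averaging is usually reserved for the final layer.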

3.3.4. Predicting drug response via readout and concatenation

Once all vertices in every subgraph have been updated, we apply a readout function to aggregate the information in each subgraph. The readout of each subgraph is then weighted by the inverse of the distance between the target protein and the biological pathway represented by that subgraph, and the weighted readouts are concatenated, as described in Eq. (16): $Z_{G} = \mathrm{concat}\left(d_{k}^{-1} \cdot \mathrm{MLP}_{readout}(H^{L,k}) \mid k = 0, \dots, K\right)$ (16)

Here, $d_{k}$ denotes the distance between the target protein and its corresponding subgraph (biological pathway), and $H^{L,k}$ refers to the final node states of the k-th subgraph [the graph AE’s encoder layer in Section 3.2.1 is similarly structured as shown in Eq. (17)]. After the information from all subgraphs has been gathered and concatenated into the representation vector $Z_{G}$, this vector is used to predict the final drug response: $y_{pred} = \mathrm{MLP}_{pred}(Z_{G})$ (17)
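Eqs. (16) and (17) can be sketched as follows. Because the text does not fix these details, the sketch assumes that MLP_readout sum-pools the node states of a subgraph and applies one ReLU layer shared by all subgraphs, and that MLP_pred reduces to a logistic output; the subgraph states and distances are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
F, R = 4, 3                              # node-state and readout sizes (assumed)
W_read = rng.normal(size=(F, R))         # readout weights shared by all subgraphs (assumption)

def readout(H):
    """MLP_readout sketch: sum-pool the node states, then one ReLU layer."""
    return np.maximum(H.sum(axis=0) @ W_read, 0.0)

def graph_representation(subgraph_states, distances):
    """Eq. (16): concatenate d_k^{-1} * MLP_readout(H^{L,k}) over the subgraphs k."""
    return np.concatenate([readout(H) / d for H, d in zip(subgraph_states, distances)])

def predict(z, w, b=0.0):
    """Eq. (17) with MLP_pred reduced to a logistic output (assumption)."""
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

states = [rng.normal(size=(5, F)), rng.normal(size=(7, F))]   # two pathway subgraphs
distances = [1.0, 3.0]                                        # made-up target-to-pathway distances
z = graph_representation(states, distances)                   # shape (2 * R,)
p = predict(z, rng.normal(size=z.shape[0]))                   # probability in (0, 1)
```

Dividing by $d_k$ shrinks the contribution of pathways far from the drug target before the prediction head sees them.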

The approach described in this subsection differs from the model proposed by Ryu et al. (2018) in two key ways. First, the gene indicator among the node features of each subgraph is obtained from a shared, trainable gene-embedding matrix. Sharing this matrix lets the networks exploit information common to all biological pathways (shared genes) while still learning the distinct traits of each pathway (its geometric information). Second, by integrating the distance from the target protein into the readout function, the model captures the connection between GE and drug response more effectively, leading to more accurate predictions. The rationale is that genes closer to the target protein have a greater influence on drug response than those farther away (Guney et al., 2016).

4. RESULTS

4.1. Experimental setup

We adopted the same dataset configuration as a previous study (Sharifi-Noghabi et al., 2019). The training set comprised GE values and the corresponding drug response data for cell lines from the GDSC database (Yang et al., 2013). To label drug response, we used the experimentally determined cutoffs of a prior study (Iorio et al., 2016) to classify samples as sensitive or resistant for each drug. The test set included GE values and drug response data from TCGA patients (Weinstein et al., 2013) and the PDX encyclopedia (Gao et al., 2015). The data were downloaded from the Zenodo repository (Sharifi-Noghabi et al., 2019). Our code is available at https://github.com/seferlab/drgat.

For gene interaction and connectivity information, we used the protein–protein interaction (PPI) network of the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (Szklarczyk et al., 2019). STRING assigns each interaction a confidence score based on several types of evidence, such as experimental data, computational predictions, and curated databases. We selected the largest connected subgraph of the STRING network and kept only high-confidence interactions, defined as those with a confidence score above 0.7; a high score indicates a greater likelihood that the interaction is biologically relevant rather than random or spurious. To ensure the relevance of the extracted pathways, we eliminated drugs and pathways without genes in the PPI graph, which yielded 1923 biological pathways.
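A sketch of this filtering step, using only the standard library, follows. It assumes STRING-style edge lists in which `combined_score` is stored as an integer from 0 to 1000 (so a confidence of 0.7 corresponds to 700), applies the threshold before extracting the largest connected component (the order of the two steps is an assumption here), and uses hypothetical protein identifiers.

```python
from collections import defaultdict, deque

def high_confidence_lcc(edges, threshold=700):
    """Largest connected component of the graph formed by edges with score > threshold.

    edges: iterable of (protein_a, protein_b, combined_score), scores on a 0-1000 scale.
    Returns an adjacency dict {protein: set of neighbours} restricted to that component.
    """
    adj = defaultdict(set)
    for a, b, score in edges:
        if score > threshold:          # "more than 0.7" on the 0-1 scale
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:                   # breadth-first search over one component
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    component.add(u)
                    queue.append(u)
        if len(component) > len(best):
            best = component
    return {v: set(adj[v]) for v in best}

edges = [("A", "B", 900), ("B", "C", 750), ("C", "D", 400), ("E", "F", 800)]
ppi = high_confidence_lcc(edges)
print(sorted(ppi))   # ['A', 'B', 'C']
```

The low-confidence edge C–D is dropped before the component search, so D never joins the network, and the smaller component {E, F} is discarded.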

We evaluated drug response prediction performance with several metrics, including the F1 score, the area under the receiver operating characteristic curve (AUC), and the precision-recall (PR) curve. Let TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The evaluation metrics are then defined as: $\begin{array}{l} \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \\ \mathrm{Precision} = \frac{TP}{TP + FP} \\ \mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \\ F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \\ \mathrm{Specificity} = \frac{TN}{TN + FP} \end{array}$
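These definitions translate directly into code. The sketch below takes binary labels in which 1 marks the positive class (which response class is treated as positive is an assumption here) and returns 0 when a denominator is empty.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), F1, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0   # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }

m = confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
# TP = 3, TN = 2, FP = 2, FN = 1 -> accuracy 0.625, precision 0.6, recall 0.75, specificity 0.5
```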

ROC AUC is the area under the receiver operating characteristic (ROC) curve, which plots sensitivity (the true positive rate, TPR) against 1 − specificity (the false positive rate, FPR) across classification thresholds. It takes values between 0 and 1, and a random guess obtains an ROC AUC of 0.5. The PR curve plots precision on the y-axis against recall on the x-axis, and PR AUC is the area under this curve. ROC AUC summarizes in a single value the model’s ability to discriminate between the two classes; because it aggregates over all thresholds, it reflects the trade-off between sensitivity (how well the model identifies true positives) and specificity (how well it avoids false positives) without committing to a single operating point.
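The sketch below computes ROC AUC through its rank interpretation, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), and summarizes the PR curve by average precision, one common estimator of PR AUC. It assumes untied scores for average precision and is illustrative rather than the evaluation code used in this study.

```python
def roc_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (assumes untied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)

y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y, s))             # 0.75
print(average_precision(y, s))   # (1 + 2/3) / 2, about 0.833
```

A constant score ranks every positive-negative pair as a tie, which gives exactly the 0.5 of a random guess.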

4.2. Drug response prediction enhancement

We trained our model on the GDSC dataset and tested it on the PDX/TCGA datasets, focusing on six drugs. To ensure a fair comparison, we kept the same training and test set configurations, drug selection, and evaluation metrics as previous studies (Park et al., 2021; Sharifi-Noghabi et al., 2019; Sharma et al., 2023). The training set was split 80:20 for validation, and stratified 5-fold cross-validation was used to determine the optimal number of epochs and the best optimizer, with early stopping. Early stopping was triggered when the validation loss stopped decreasing, with a patience of 10 epochs. The synthetic data generated as in Section 3.3 were added only to the training subsets and never to the validation sets. The data augmentation rate for drug response prediction was set to 70% of the real sample count, the optimal ratio identified in the hyperparameter search discussed in Section 4.5. For example, if the numbers of real resistant and sensitive samples are n and m and the synthetic samples are split evenly between the two labels, a 100% rate would add $\frac{n + m}{2}$ synthetic samples per label, so the 70% rate adds $0.7 \cdot \frac{n + m}{2}$ per label. The evaluation of the synthetic data is described in detail in Section 4.4. We measured the test-set (PDX/TCGA) AUC using the parameters that achieved the highest average validation AUC.
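Two details of this protocol are sketched below under stated assumptions: the per-label synthetic sample count, assuming the rate·(n + m) synthetic samples are split evenly between the labels, and patience-based early stopping on the validation loss. The names `synthetic_counts` and `EarlyStopping` are hypothetical, and the loss sequence is made up.

```python
def synthetic_counts(n_resistant, n_sensitive, rate=0.7):
    """Synthetic samples per label, assuming rate * (n + m) samples split evenly."""
    total = round(rate * (n_resistant + n_sensitive))
    return total // 2, total - total // 2


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


print(synthetic_counts(100, 60))          # (56, 56)
stopper = EarlyStopping(patience=10)
losses = [1.0, 0.9, 0.8] + [0.85] * 15    # made-up validation losses
stop_epoch = next(e for e, loss in enumerate(losses) if stopper.step(loss))
print(stop_epoch)                          # 12
```

The loss stops improving after epoch 2, so the tenth non-improving epoch (index 12) triggers the stop.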

We compared DRGAT with several state-of-the-art methods for predicting anticancer drug response from multi-omics data. Specifically, three recently developed methods, MOLI, Super.FELT, and DeepInsight-3D, served as benchmarks. We also included non-negative matrix factorization, a feedforward network, and the method proposed in Sharifi-Noghabi et al. (2019) in the comparison. In addition, we compared other methods cited in a previous study (Park et al., 2021), namely the AE, the artificial neural network after feature selection, AutoBoruta Random Forest, and SVM. Lastly, we evaluated the proposed method against TabNet (Arik and Pfister, 2020), a widely used model for tabular data.

4.2.1. Performance

Table 1 summarizes the generalization performance (AUC) of the baseline models and the proposed model for drug response prediction on the PDX and TCGA benchmark datasets. The drugs are abbreviated as follows: Pac, Paclitaxel; Erl, Erlotinib; Cet, Cetuximab; Doc, Docetaxel; Cis, Cisplatin; Gem, Gemcitabine. DRGAT achieved the highest AUC for all PDX drugs except Erlotinib (Erl), where DeepInsight and DeepInsight (Aug) scored 0.87 versus 0.86 for DRGAT, and for all TCGA drugs. Overall, DRGAT achieved the highest average AUC across the six drugs (0.82), indicating its strong potential to improve the accuracy of drug response prediction on the PDX and TCGA benchmarks. Among the baselines, DeepInsight generally performed best, so we also applied our augmentation technique to DeepInsight and report the results as DeepInsight (Aug); augmentation raised DeepInsight’s average AUC from 0.74 to 0.79.

Table 1.
Drug Response Prediction Performance on the PDX and TCGA Datasets

	PDX			TCGA
Model	Pac	Cet	Erl	Doc	Cis	Gem	Avg
NMF	0.25	0.55	0.26	0.38	0.41	0.59	0.40
AE	0.45	0.45	0.34	0.49	0.45	0.49	0.44
ABRF	0.50	0.34	0.30	0.44	0.47	0.56	0.45
SVM	0.50	0.42	0.69	0.54	0.48	0.48	0.51
TabNet	0.53	0.53	0.60	0.54	0.52	0.52	0.53
FFN	0.69	0.41	0.38	0.68	0.43	0.67	0.54
Geeleher et al.	0.54	0.57	0.69	0.60	0.61	0.53	0.59
ANNF	0.66	0.41	0.65	0.65	0.67	0.58	0.60
MOLI	0.77	0.56	0.65	0.60	0.68	0.69	0.66
Super.FELT	0.68	0.58	0.79	0.68	0.77	0.64	0.69
DeepInsight	0.76	0.73	0.87	0.81	0.71	0.57	0.74
DeepInsight (Aug)	0.80	0.79	0.87	0.83	0.81	0.70	0.79
GraphCDR	0.75	0.77	0.74	0.76	0.77	0.73	0.75
DRGAT (no-aug)	0.72	0.69	0.82	0.74	0.77	0.68	0.73
DRGAT	0.81	0.85	0.86	0.86	0.84	0.75	0.82

Bold values indicate methods with best performance for each drug.

ABRF, AutoBoruta Random Forest; AE, autoencoder; ANNF, artificial neural network after feature selection; CDR, cancer drug response; FFN, feedforward network; GAT, graph attention network; MOLI, multi-omics late integration; NMF, non-negative matrix factorization; PDX, Patient-Derived Tumor Xenografts; Super.FELT, supervised feature extraction learning with triplet loss; SVM, support vector machine; TCGA, The Cancer Genome Atlas.

Because previous studies reported poor generalization due to insufficient training data, we also evaluated the proposed method without the data augmentation module to determine whether its performance drops without this component. This experiment also allows a direct comparison between the baseline models and the GAT with pathway subnetworks alone, without data augmentation, which can be seen as another contribution of this study. As Table 1 shows, the model trained solely on real data had a lower AUC than the model trained with augmented data for all six drugs (average 0.73 vs. 0.82). This indicates that augmenting the training data consistently improves generalization to unseen benchmark datasets and that the augmented data are well suited to drug response prediction. Furthermore, the GAT alone (the third module of DRGAT), trained without augmentation, achieved a higher average AUC than most baselines in Table 1 (e.g., 0.73 vs. 0.66 for MOLI and 0.69 for Super.FELT), although it did not exceed DeepInsight (0.74), DeepInsight (Aug) (0.79), or GraphCDR (0.75). These findings emphasize the importance of both the data augmentation module and the high-order GAT in achieving strong generalization performance in DRGAT.

We also evaluated the methods on the full CCLE and CTRP datasets in terms of PR AUC (Tables 2 and 3), where DRGAT (no-aug) denotes DRGAT without augmentation. DRGAT achieved the highest PR AUC for the majority of drugs, while Super.FELT, DeepInsight, or GraphCDR performed best for the others. Overall, these results indicate that our method has strong potential to improve the accuracy of drug response predictions.

Table 2.

Results of Validation on CCLE and CTRP in Terms of PR AUC

Drug	Super.FELT	DeepInsight	GraphCDR	DRGAT (no-aug)	DRGAT
17-AAG	0.682	0.676	0.648	0.677	0.698
Afatinib	0.782	0.786	0.771	0.788	0.796
Axitinib	0.785	0.777	0.785	0.781	0.792
AZD7762	0.781	0.785	0.777	0.782	0.784
AZD8055	0.679	0.690	0.678	0.675	0.693
BI-2536	0.510	0.521	0.489	0.530	0.547
Bleomycin	0.611	0.602	0.599	0.611	0.615
BMS-345541	0.740	0.736	0.739	0.736	0.741
BMS-754807	0.641	0.645	0.643	0.649	0.662
Bortezomib	0.293	0.274	0.278	0.303	0.333
Bosutinib	0.774	0.778	0.773	0.760	0.790
CAL-101	0.772	0.748	0.748	0.752	0.754
Crizotinib	0.718	0.727	0.685	0.715	0.735
Cytarabine	0.801	0.805	0.815	0.795	0.809
Dabrafenib	0.810	0.801	0.795	0.810	0.821
Dasatinib	0.669	0.664	0.633	0.655	0.657
Docetaxel	0.469	0.412	0.354	0.450	0.476
Doxorubicin	0.649	0.634	0.606	0.615	0.654
Erlotinib	0.698	0.665	0.664	0.667	0.680
Etoposide	0.814	0.813	0.827	0.814	0.820
EX-527	0.739	0.706	0.721	0.700	0.774
GDC0941	0.704	0.702	0.717	0.703	0.723
Gefitinib	0.680	0.683	0.684	0.676	0.695
Gemcitabine	0.614	0.628	0.621	0.607	0.618
GW843682X	0.454	0.534	0.413	0.483	0.582
Imatinib	0.682	0.679	0.648	0.690	0.698
JNJ-26854165	0.740	0.707	0.698	0.700	0.745
KU-55933	0.653	0.661	0.646	0.647	0.650
Lapatinib	0.661	0.638	0.692	0.662	0.680
Masitinib	0.820	0.813	0.816	0.816	0.819
Methotrexate	0.727	0.722	0.724	0.724	0.729
MG-132	0.394	0.403	0.405	0.403	0.412
Mitomycin C	0.707	0.655	0.657	0.672	0.682
MK-2206	0.730	0.733	0.710	0.717	0.722
Nilotinib	0.599	0.587	0.602	0.601	0.611
Nutlin-3a	0.928	0.906	0.888	0.910	0.938
NVP-BEZ235	0.636	0.584	0.543	0.607	0.621
NVP-TAE684	0.535	0.551	0.494	0.538	0.570
OSI-027	0.681	0.690	0.678	0.675	0.688

Bold values indicate methods with best performance for each drug.

AUC, area under receiver operating characteristic curve; CCLE, Cancer Cell Line Encyclopedia; CTRP, Cancer Therapeutics Response Portal; PR, precision recall.

Table 3.

Results of Validation on CCLE and CTRP Continued in Terms of PR AUC

Drug	Super.FELT	DeepInsight	GraphCDR	DRGAT (no-aug)	DRGAT
OSI-930	0.758	0.755	0.756	0.760	0.773
PAC-1	0.721	0.680	0.706	0.708	0.729
Paclitaxel	0.449	0.455	0.412	0.436	0.444
Parthenolide	0.623	0.622	0.586	0.616	0.632
Pazopanib	0.670	0.655	0.656	0.659	0.691
PD-0325901	0.854	0.841	0.833	0.834	0.840
PD-0332991	0.645	0.629	0.678	0.652	0.671
PHA-665752	0.524	0.533	0.497	0.535	0.547
PHA-793887	0.822	0.808	0.815	0.818	0.825
PI-103	0.775	0.759	0.758	0.762	0.784
PIK-93	0.827	0.821	0.826	0.822	0.826
Piperlongumine	0.727	0.739	0.692	0.701	0.717
PLX4720	0.865	0.873	0.866	0.859	0.871
Ruxolitinib	0.763	0.756	0.765	0.765	0.774
SN-38	0.735	0.730	0.717	0.730	0.749
SNX-2112	0.780	0.789	0.782	0.770	0.783
Sorafenib	0.652	0.633	0.595	0.616	0.643
Sunitinib	0.582	0.642	0.578	0.628	0.651
Tamoxifen	0.617	0.630	0.591	0.629	0.645
Temozolomide	0.821	0.827	0.820	0.826	0.828
Temsirolimus	0.731	0.692	0.689	0.710	0.749
TG101348	0.771	0.766	0.763	0.764	0.774
TGX221	0.511	0.521	0.534	0.544	0.555
TPCA-1	0.784	0.779	0.788	0.788	0.795
Trametinib	0.782	0.773	0.746	0.762	0.780
Tubastatin A	0.870	0.876	0.868	0.867	0.872
TW 37	0.599	0.564	0.574	0.572	0.620
Vorinostat	0.763	0.770	0.747	0.776	0.779
VX-680	0.724	0.773	0.719	0.765	0.802
YK 4–279	0.780	0.763	0.785	0.769	0.776
YM155	0.602	0.591	0.602	0.601	0.613
ZSTK474	0.773	0.783	0.779	0.776	0.789

Bold values indicate methods with best performance for each drug.

For the genes associated with each drug, we conducted a functional enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Huang et al., 2009) to explore gene ontology (GO) terms (Consortium, 2008). Among the considered drugs, cisplatin is an alkylating agent used to treat various cancers; it inhibits DNA synthesis and RNA transcription by damaging DNA (Bloemink and Reedijk, 1996). The genes related to cisplatin were enriched in terms such as cell division (GO:0051301), DNA replication (GO:0006260), and cellular response to DNA damage stimulus (GO:0006974), supporting the idea that the genes distinguishing sensitive from resistant samples are linked to DNA synthesis. Temozolomide, another alkylating agent, is used to treat brain tumors; it methylates the purine bases of DNA, which triggers tumor cell death (Zhang et al., 2012). The enrichment test identified significant terms related to its mechanism, including mitotic nuclear division (GO:0007067), cell division (GO:0051301), and DNA replication (GO:0006260). Docetaxel, a taxoid antineoplastic agent, binds to microtubules and prevents their depolymerization induced by calcium ions (Kumar, 1981), disrupting the cytoskeleton of cancer cells during mitosis (Trendowski, 2014). For docetaxel, we found significant GO molecular function terms, including calcium ion binding (GO:0005509) and structural constituent of cytoskeleton (GO:0005200).
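For intuition about what an enrichment test computes, the sketch below evaluates the one-sided hypergeometric p-value of observing at least k annotated genes in a list of n genes drawn from N background genes, K of which carry the GO term. DAVID itself reports a modified Fisher exact p-value (the EASE score), so this plain hypergeometric tail is an illustration rather than a reproduction of DAVID’s output; the counts in the example are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes, K: background genes annotated with the GO term,
    n: genes in the query list, k: annotated genes in the query list.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 3 of 4 query genes carry a term held by 5 of 20 background genes
p = enrichment_pvalue(k=3, n=4, K=5, N=20)
print(round(p, 4))   # 0.032
```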

4.3. Ablation studies

To study the impact of running latent diffusion on either the autoencoder (AE) or the variational autoencoder (VAE) latent space, as well as of a diffusion model that directly generates expression data, we report ablation results for drug response prediction on the PDX and TCGA datasets in Table 4, in terms of F1 score. In the table, latent diff on AE denotes a latent diffusion model on the AE latent space, diff direct denotes a diffusion model that directly generates expression data, latent diff on VAE denotes a latent diffusion model on the VAE latent space, DRGAT (no FS) denotes DRGAT without the feature selection part, and DRGAT (no HO-GAT) denotes DRGAT in which the high-order GAT is replaced with a traditional GAT (Veličković et al., 2018). DRGAT outperforms all ablation variants. Among them, latent diffusion on the VAE latent space performs best (average 0.76), consistent with prior findings that training diffusion models in a latent space can improve performance (Rombach et al., 2022), possibly because a variational encoder is effective at searching through the latent space (Sohn et al., 2015a). Latent diffusion on the plain AE latent space (0.69), however, performed worse than direct diffusion on expression data (0.72).

Table 4.
Ablation Performance in Terms of F1 Score on Drug Response Prediction on the PDX and TCGA Datasets

	PDX			TCGA
Model	Pac	Cet	Erl	Doc	Cis	Gem	Avg
Latent diff on AE	0.79	0.59	0.68	0.64	0.70	0.71	0.69
Diff direct	0.72	0.61	0.81	0.71	0.80	0.67	0.72
Latent diff on VAE	0.78	0.75	0.85	0.83	0.73	0.63	0.76
DRGAT (no FS)	0.72	0.77	0.75	0.72	0.63	0.70	0.71
DRGAT (no HO-GAT)	0.70	0.69	0.79	0.73	0.68	0.53	0.68
DRGAT (no-aug)	0.72	0.69	0.82	0.74	0.77	0.68	0.73
DRGAT	0.81	0.85	0.86	0.86	0.84	0.75	0.82

Bold values indicate methods with best performance for each drug.

HO-GAT, high-order neighbor propagation graph attention networks; VAE, variational autoencoder.

We also compared our augmentation module with four generative models designed for tabular data, excluding image-based generative models; Table 5 reports the results in terms of F1 score. The conditional variational autoencoder (CVAE) (Sohn et al., 2015b) extends the traditional VAE with conditional variables that guide the generation process, mapping them to the output data distribution to generate new samples. We used a CVAE with the KL annealing technique, inspired by β-VAE (Higgins et al., 2017a). The conditional tabular GAN (CTGAN) (Xu et al., 2019a) learns the distribution of real tabular data and uses its generator to produce synthetic data that mirror the statistical properties of the original data. Tabular VAE (TVAE) (Xu et al., 2019a) is a VAE designed for tabular data that learns a compact latent representation for tasks such as data generation and anomaly detection. Lastly, CopulaGAN (Patki et al., 2016), implemented in the Synthetic Data Vault library, uses copulas to capture complex multivariate dependencies in tabular data, allowing it to generate synthetic data whose statistical structure resembles that of the original dataset.
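KL annealing can be sketched as a weight on the KL term of the CVAE loss that grows from 0 to a maximum value. The linear schedule, the 10-epoch warm-up, and the diagonal-Gaussian KL term below are assumptions for illustration, not the settings used for this baseline.

```python
import math

def kl_weight(epoch, warmup_epochs=10, beta_max=1.0):
    """Linear KL annealing: the weight rises from 0 to beta_max over warmup_epochs, then stays."""
    return beta_max * min(1.0, epoch / warmup_epochs)

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def cvae_loss(recon_loss, kl_div, epoch, **schedule):
    """Reconstruction loss plus the annealed KL term."""
    return recon_loss + kl_weight(epoch, **schedule) * kl_div

print([kl_weight(e) for e in (0, 5, 10, 20)])   # [0.0, 0.5, 1.0, 1.0]
```

Keeping the KL weight small early in training lets the decoder learn to reconstruct before the latent code is pulled toward the prior, which is the usual motivation for annealing.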

Table 5.

Ablation Performance in Terms of F1 Score on Drug Response Prediction on the PDX and TCGA Datasets

	PDX			TCGA
Model	Pac	Cet	Erl	Doc	Cis	Gem	Avg
DRGAT (with CVAE)	0.77	0.77	0.81	0.81	0.79	0.70	0.75
DRGAT (with CTGAN)	0.80	0.81	0.83	0.81	0.79	0.70	0.78
DRGAT (with TVAE)	0.81	0.82	0.83	0.82	0.80	0.70	0.79
DRGAT (with CopulaGAN)	0.79	0.80	0.84	0.81	0.78	0.69	0.77
DRGAT	0.81	0.85	0.86	0.86	0.84	0.75	0.82

Bold values indicate methods with best performance for each drug.

CTGAN, conditional tabular GAN; CVAE, conditional variational autoencoder; TVAE, tabular VAE.

4.4. Synthetic data results

In the prior experiments, incorporating the data generated by the proposed augmentation technique into the training set improved prediction performance on the benchmark datasets. However, it remained uncertain whether this improvement was attributable to the quality of the generated data. In this section, we therefore examine whether the data produced by the augmentation module accurately replicate the distribution of the real data.

In the synthetic data experiments, we used the generative module with the hyperparameters described in Section 4.2 to generate as many synthetic samples as there were real samples. To quantitatively compare the quality of the samples generated by existing methods and by our approach, we used four evaluation metrics inspired by a previous study (Goncalves et al., 2020): cosine similarity, log-cluster, pairwise difference (PD), and Kullback–Leibler divergence (KLD). Cosine similarity measures the cosine of the angle between two non-zero vectors in an inner product space, indicating how similar they are. The log-cluster metric evaluates how similar the latent cluster structures of the real and synthetic datasets are. PD is the Euclidean distance between pairs of real and synthetic samples. Finally, KLD compares the marginal probability mass functions of the real and synthetic data. Lower values indicate better performance for KLD, PD, and log-cluster, whereas higher values are better for cosine similarity.
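The four metrics can be sketched in NumPy as follows. Several aggregation choices are assumptions, since they are not specified above: KLD is averaged over per-feature histogram PMFs, PD pairs real and synthetic rows by index, cosine similarity compares mean feature profiles, and log-cluster runs a plain k-means (five clusters) on the pooled data and computes $\log\big(\frac{1}{G}\sum_{j}(n_{j}^{R}/n_{j} - c)^{2}\big)$, where c is the overall fraction of real samples. The Gaussian toy data are synthetic.

```python
import numpy as np

def marginal_kld(real, synth, bins=10, eps=1e-10):
    """Mean over features of KL(P_real || P_synth) between histogram PMFs."""
    klds = []
    for j in range(real.shape[1]):
        lo = min(real[:, j].min(), synth[:, j].min())
        hi = max(real[:, j].max(), synth[:, j].max())
        p, _ = np.histogram(real[:, j], bins=bins, range=(lo, hi))
        q, _ = np.histogram(synth[:, j], bins=bins, range=(lo, hi))
        p, q = p / p.sum() + eps, q / q.sum() + eps      # eps avoids log(0)
        klds.append(np.sum(p * np.log(p / q)))
    return float(np.mean(klds))

def pairwise_difference(real, synth):
    """Mean Euclidean distance between paired rows (assumes equal sample counts)."""
    return float(np.linalg.norm(real - synth, axis=1).mean())

def cosine_similarity(real, synth):
    """Cosine similarity between the mean feature profiles (an assumed aggregation)."""
    u, v = real.mean(axis=0), synth.mean(axis=0)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def log_cluster(real, synth, n_clusters=5, n_iter=50, seed=0):
    """log( mean_j (n_j^R / n_j - c)^2 ) over k-means clusters of the pooled data."""
    X = np.vstack([real, synth])
    is_real = np.r_[np.ones(len(real)), np.zeros(len(synth))]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):                               # plain Lloyd iterations
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for g in range(n_clusters):
            if np.any(labels == g):
                centers[g] = X[labels == g].mean(axis=0)
    c = len(real) / len(X)
    terms = [(is_real[labels == g].mean() - c) ** 2
             for g in range(n_clusters) if np.any(labels == g)]
    return float(np.log(np.mean(terms) + 1e-12))         # small constant avoids log(0)

rng = np.random.default_rng(0)
mu = np.arange(1.0, 6.0)                                  # feature means of the toy "real" data
real = rng.normal(mu, 1.0, size=(200, 5))
good = rng.normal(mu, 1.0, size=(200, 5))                 # same distribution as real
bad = rng.normal(mu[::-1], 1.0, size=(200, 5))            # shifted marginals
```

On these toy data, the well-matched `good` sample should score better than `bad` on all four metrics: lower KLD, PD, and log-cluster, and higher cosine similarity.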

As in Section 4.3, the baselines were four generative models designed for tabular data (image-based generative models were excluded): CVAE with KL annealing (Sohn et al., 2015b; Higgins et al., 2017a), CTGAN (Xu et al., 2019a), TVAE (Xu et al., 2019a), and CopulaGAN (Patki et al., 2016), implemented in the Synthetic Data Vault library.

4.4.1. Quantitative performance

Using the evaluation metrics described above, we assessed the quality of the generated GE data for each of the six drugs and averaged each metric across the drugs (Table 6). Compared with the four baseline models, our augmentation module (DRGAT in Table 6) achieved the best KLD (0.0041), PD (4.565), and cosine similarity (0.992), but not the best log-cluster score, for which CTGAN (−1.535) performed better. These results indicate that the proposed module closely approximates the distribution of real data and captures the relationships between features.

Table 6.
Comparison of Generation Performances of Different Models with DRGAT in Terms of Four Metrics

Method	KLD	PD	Log-cluster	Cosine sim
CVAE	0.0149	8.260	−1.312	0.975
CTGAN	0.0111	6.121	−1.535	0.983
TVAE	0.0058	4.765	−1.443	0.986
CopulaGAN	0.0128	7.021	−1.356	0.977
DRGAT	0.0041	4.565	−1.497	0.992

Bold values indicate methods with best performance for evaluation metric.

KLD, Kullback–Leibler divergence; PD, pairwise difference.

4.4.2. Qualitative performance

To qualitatively compare the sampling diversity of the baseline generative models used in the quantitative assessment with that of our augmentation module, we visualized real and synthetic data with t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008). Figure 1A–D shows the results for CTGAN, CVAE, CopulaGAN, and TVAE, respectively. Compared with the t-SNE distribution of the real data, the synthetic data generated by CTGAN and CopulaGAN deviated markedly, owing to an excess of values that do not occur in the real data. The CVAE model generated data within the range of the real data but lacked diversity, and the TVAE model likewise produced data within the real range with limited diversity. In contrast, the proposed augmentation module showed greater sampling diversity than the baselines, with a distribution nearly identical to that of the real data (Figure 1E).

FIG. 1.

t-SNE visualizations of the synthetic and real data. (A) CTGAN. (B) CVAE. (C) CopulaGAN. (D) TVAE. (E) Proposed vs. real. CTGAN, conditional tabular GAN; CVAE, conditional variational autoencoder; TVAE, tabular VAE; t-SNE, t-distributed stochastic neighbor embedding.

4.4.3. Controllable label generation

To demonstrate that our augmentation module can generate samples conditioned on the drug response label, we augmented the training data in both matched and unmatched settings. In matched cases, the real and synthetic labels agree (both resistant or both sensitive); in unmatched cases, they differ (one resistant and the other sensitive). The augmentation rate was set to 70% of the real data, consistent with Section 4.2. Figure 2 shows the validation loss when training the drug response prediction model for Docetaxel on augmented data in both settings. The validation loss decreased less in the unmatched setting than in the matched setting, suggesting that the prediction model struggled to learn drug response when trained on unmatched data. This indicates that our generative model can control the response class of the generated samples according to the given condition, whether resistance or sensitivity.
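One possible reading of the matched and unmatched settings is sketched below: each synthetic sample is generated under a condition label, and the label used for training either agrees with that condition (matched) or is flipped (unmatched). This interpretation, the 0/1 encoding of resistant/sensitive, and the function name `augment` are assumptions.

```python
def augment(real_X, real_y, synth_X, synth_cond, matched=True):
    """Append synthetic samples to the real training set.

    synth_cond holds the label each synthetic sample was generated under
    (0 = resistant, 1 = sensitive, an assumed encoding). matched=True trains on
    that label; matched=False flips it, giving the unmatched setting.
    """
    synth_y = list(synth_cond) if matched else [1 - c for c in synth_cond]
    return list(real_X) + list(synth_X), list(real_y) + synth_y

X, y = augment([[0.1]], [1], [[0.2], [0.3]], [0, 1], matched=False)
print(y)   # [1, 1, 0]
```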

FIG. 2.

Augmentation of training data for matched and unmatched cases.

4.5. Optimal hyperparameter tuning

These experiments determine the number of biological pathways selected in the first module and the amount of augmented data generated in the second module that yield optimal performance. The experimental setup was similar to that of Section 4.2. The training set was split 80:20 for model validation, with synthetic data excluded from the validation set, and the independent benchmark datasets (PDX and TCGA) were not used in this analysis. Within the training set, we performed stratified 5-fold cross-validation, which maintains the same 80:20 ratio for training and validation. The performance metric was the average final validation loss across folds under these experimental conditions.

4.5.1. Performance

Figure 3A shows the average final cross-validation loss for different levels of training data augmentation, and Figure 3B shows the same quantity for different proportions of selected biological pathways. The effects of augmentation and pathway selection varied across the six drugs. However, all drugs shared a threshold beyond which further data augmentation no longer improved performance; in other words, simply increasing the amount of data did not yield unlimited improvements in drug response prediction. Similarly, there was a threshold for pathway selection beyond which selecting more pathways did not continue to improve performance. Augmentation levels were evaluated at 25% increments from 0% to 150%, and optimal performance was generally achieved with augmentation ratios between 75% and 125% of the original data across all drugs. Specifically, the lowest average validation loss was observed at an augmentation ratio of 70–75%. Likewise, the optimal pathway selection, following the previously described method, was 5% of all pathways.
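Selecting the augmentation ratio amounts to a grid search over the candidate ratios, keeping the one with the lowest mean final validation loss across folds. The sketch below uses hypothetical loss values purely for illustration; they are not the results shown in Figure 3.

```python
def select_ratio(cv_losses):
    """Augmentation ratio with the lowest mean final validation loss over the folds."""
    return min(cv_losses, key=lambda r: sum(cv_losses[r]) / len(cv_losses[r]))

# Hypothetical per-fold losses for ratios 0%, 25%, ..., 150% (two folds shown for brevity)
losses = {
    0.00: [0.70, 0.72], 0.25: [0.66, 0.65], 0.50: [0.62, 0.63], 0.75: [0.58, 0.60],
    1.00: [0.59, 0.61], 1.25: [0.61, 0.62], 1.50: [0.64, 0.63],
}
best = select_ratio(losses)   # 0.75 for these made-up numbers
```

The same loop applies to the pathway selection ratio by keying the dictionary on the fraction of selected pathways instead.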

FIG. 3.

Validation losses across augmentation ratio and selection ratio. (A) Augmentation ratio. (B) Selection ratio.

5. CONCLUSION

Here, we proposed a novel drug response prediction method that tackles the high dimensionality and small sample size of omics datasets. We addressed this challenge with a network-based feature selection method that identifies key variables by calculating the proximity between a drug and its target proteins/genes. We also augmented the GE dataset with DDIM-based generative models and used a high-order GAT for drug response prediction, incorporating biological pathways into the graph as prior knowledge. Compared with competing approaches, our method generalized well to unseen patient datasets when predicting drug responses. In addition, our diffusion-based generation module for GE data augmentation produced higher-quality samples than the competing baselines.

Footnotes

AUTHOR’S CONTRIBUTIONS

E.S.: Methodology, software, visualization, and writing. The author read and approved the final article.

AUTHOR DISCLOSURE STATEMENT

The author declares no conflict of interest.

FUNDING INFORMATION

This research was funded by TUBITAK (Scientific and Technological Research Council of Turkey) 3501 Project with grant number 122E706.

References

Al-Mekhlafi, Becker, Klawonn. Sample size and performance estimation for biomarker combinations based on pilot studies with small sample sizes. Communications in Statistics - Theory and Methods, 2022; 51(16):5534–5548; doi: 10.1080/03610926.2020.1843053

2. Arik, Pfister. TabNet: Attentive interpretable tabular learning, 2020.

3. Barretina, Caponigro, Stransky, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 2012; 483(7391):603–607.

4. Battaglia, Hamrick JBC, Bapst, et al. Relational inductive biases, deep learning, and graph networks. arXiv, 2018. Available from: https://arxiv.org/pdf/1806.01261.pdf

5. Bloemink, Reedijk. Cisplatin and derived anticancer drugs: Mechanism and current status of DNA binding. Met Ions Biol Syst, 1996; 32:641–685.

6. Chadebec, Allassonnière. Data augmentation with variational autoencoders and manifold sampling. In: Deep Generative Models, and Data Augmentation, Labelling, and Imperfections. (Engelhardt, Oksuz, Zhu, Yuan, Mukhopadhyay, Heller, Huang S. X, Nguyen, Sznitman, and Xue, eds). Springer International Publishing: Cham; 2021. 184–192. ISBN 978-3-030-88210-5.

7. Chaudhari, Agrawal, Kotecha. Data augmentation using MG-GAN for improved cancer classification on gene expression data. Soft Comput, 2020; 24(15):11381–11391; doi: 10.1007/s00500-019-04602-2

8. Chiu Y-C, Chen H-IH, Zhang, et al. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genomics, 2019; 12(Suppl 1):18; doi: 10.1186/s12920-018-0460-9

9. Consortium. The Gene Ontology project in 2008. Nucleic Acids Research, 2008; 36(suppl_1):D440–D444.

10. Devlin, Chang M-W, Lee, et al. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). (Burstein, Doran, and Solorio, eds). Association for Computational Linguistics: Minneapolis, Minnesota; June 2019. 4171–4186; doi: 10.18653/v1/N19-1423

11. Ding, Chen, Cooper, et al. Precision oncology beyond targeted therapy: Combining omics data with machine learning matches the majority of cancer cells to effective therapeutics. Mol Cancer Res, 2018; 16(2):269–278; doi: 10.1158/1541-7786.MCR-17-0378

12. Eraslan, Avsec, Gagneur, et al. Deep learning: New computational modelling techniques for genomics. Nat Rev Genet, 2019; 20(7):389–403.

13. Fernández-Torras, Duran-Frigola, Aloy. Encircling the regions of the pharmacogenomic landscape that determine drug response. Genome Med, 2019; 11(1):17; doi: 10.1186/s13073-019-0626-x

14. Gao, Korn, Ferretti, et al. High-throughput screening using Patient-Derived Tumor Xenografts to predict clinical trial drug response. Nat Med, 2015; 21(11):1318–1325; doi: 10.1038/nm.3954

15. Geeleher, Cox, Huang. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol, 2014; 15(3):R47; doi: 10.1186/gb-2014-15-3-r47

16. Gezici AHB, Sefer. Deep transformer-based asset price and direction prediction. IEEE Access, 2024; 12:24164–24178; doi: 10.1109/ACCESS.2024.3358452

17. Goncalves, Ray, Soper, et al. Generation and evaluation of synthetic patient data. BMC Med Res Methodol, 2020; 20(1):108; doi: 10.1186/s12874-020-00977-1

18. Goodfellow, Pouget-Abadie, Mirza, et al. Generative adversarial nets. Advances in Neural Information Processing Systems, 2014; 27.

19. Graim, Friedl, Houlahan, et al. PLATYPUS: A Multiple-View Learning Predictive Framework for Cancer Drug Sensitivity Prediction. Pac Symp Biocomput, 2019; 24:136–147; doi: 10.1142/9789813279827_0013

20. Guney, Menche, Vidal, et al. Network-based in silico drug efficacy screening. Nat Commun, 2016; 7(1):10331; doi: 10.1038/ncomms10331

21. Hamilton, Ying, Leskovec. Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17. Curran Associates Inc.: Red Hook, NY, USA; 2017. 1025–1035. ISBN 9781510860964.

22. Higgins, Matthey, Pal, et al. beta-VAE: Learning basic visual concepts with a constrained variational framework. ICLR (Poster), 2017b; 3.

23. Higgins, Matthey, Pal, et al. beta-VAE: Learning basic visual concepts with a constrained variational framework. In: 5th International Conference on Learning Representations, ICLR 2017, April 24-26, 2017, Conference Track Proceedings. OpenReview.net: Toulon, France; 2017a. https://openreview.net/forum?id=Sy2fzU9gl

24. , Jain, and Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020.

25. Huang, Sherman, Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 2009; 4(1):44–57.

26. Iorio, Knijnenburg, Vis, et al. A landscape of pharmacogenomic interactions in cancer. Cell, 2016; 166(3):740–754; doi: 10.1016/j.cell.2016.06.017

27. Kanehisa, Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 2000; 28(1):27–30.

28. Karras, Aittala, Hellsten, et al. Training generative adversarial networks with limited data. In: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20. Curran Associates Inc.: Red Hook, NY, USA; 2020. ISBN 9781713829546.

29. Kim, Bae, Piao, et al. Graph convolutional network for drug response prediction using gene expression data. Mathematics, 2021; 9(7):772; doi: 10.3390/math9070772

30. Kingma D. P. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

31. Kumar. Taxol-induced polymerization of purified tubulin. Mechanism of action. J Biol Chem, 1981; 256(20):10435–10441.

32. Kursa, Rudnicki. Feature selection with the Boruta package. J Stat Soft, 2010; 36(11):1–13.

33. Kurzlechner, Kishnani, Chowdhury, et al. DiscoVari: A web-based precision medicine tool for predicting variant pathogenicity in cardiomyopathy- and channelopathy-associated genes. Circ Genom Precis Med, 2023; 16(4):317–327; doi: 10.1161/CIRCGEN.122.003911

34. Lacan, Sebag, Hanczar. GAN-based data augmentation for transcriptomics: Survey and comparative assessment. Bioinformatics, 2023; 39(Suppl 1):i111–i120.

35. Lee, Park, Doing, et al. Correcting for experiment-specific variability in expression compendia can remove underlying signals. Gigascience, 2020; 9(11):giaa117.

36. , Wang, Zheng, et al. DeepDSC: A deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform, 2021; 18(2):575–582; doi: 10.1109/TCBB.2019.2919581

37. Liu, Song, Huang, et al. GraphCDR: A graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform, 2022; 23(1):bbab457; doi: 10.1093/bib/bbab457

38. Liu, Wei, Zhang, et al. Deep neural networks for high dimension, low sample size data. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI'17. AAAI Press; 2017. 2287–2293. ISBN 9780999241103.

39. Impacts of high dimensionality in finite samples. Ann Statist, 2013; 41(4):2236–2262; doi: 10.1214/13-AOS1149

40. Park, Soh, Lee. Super.FELT: Supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data. BMC Bioinformatics, 2021; 22(1):269; doi: 10.1186/s12859-021-04146-z

41. Patki, Wedge, Veeramachaneni. The synthetic data vault. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016. 399–410; doi: 10.1109/DSAA.2016.49

42. Patro, Sefer, Malin, et al. Parsimonious reconstruction of network evolution. In: Algorithms in Bioinformatics. (Przytycka T. M and Sagot M.-F, eds). Springer Berlin Heidelberg: Berlin, Heidelberg; 2011. 237–249. ISBN 978-3-642-23038-7.

43. Patro, Duggal, Sefer, et al. The missing models: A data-driven approach for learning how networks grow. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12. Association for Computing Machinery: New York, NY, USA; 2012. 42–50. ISBN 9781450314626; doi: 10.1145/2339530.2339541

44. Radford. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

45. Rezende, Mohamed. Variational inference with normalizing flows. In: International Conference on Machine Learning. PMLR; 2015. 1530–1538.

46. Robson, Boray. Studies of the role of a smart web for precision medicine supported by biobanking. Per Med, 2016; 13(4):361–380; doi: 10.2217/pme-2015-0012

47. Rombach, Blattmann, Lorenz, et al. High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. 10684–10695.

48. Ronneberger, Fischer, Brox. U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. (Navab, Hornegger, Wells W. M, and Frangi A. F., eds). Springer International Publishing: Cham; 2015. 234–241. ISBN 978-3-319-24574-4.

49. Ryu, Lim, Hong S. H., and Kim W. Y. Deeply learning molecular structure-property relationships using attention- and gate-augmented graph convolutional network, 2018.

50. Schroff, Kalenichenko, Philbin. FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 815–823.

51. Seashore-Ludlow, Rees, Cheah, et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov, 2015; 5(11):1210–1223.

52. Sefer. Biocode: A data-driven procedure to learn the growth of biological networks. IEEE/ACM Trans Comput Biol Bioinform, 2022a; PP(6):3103–3113.

53. Sefer. Hi-C interaction graph analysis reveals the impact of histone modifications in chromatin shape. Appl Netw Sci, 2021; 6(1):54; doi: 10.1007/s41109-021-00396-1

54. Sefer. Probc: Joint modeling of epigenome and transcriptome effects in 3D genome. BMC Genomics, 2022b; 23(1):287.

55. Sefer, Kingsford. Metric labeling and semi-metric embedding for protein annotation prediction. In: Research in Computational Molecular Biology. (Bafna and Sahinalp S. C., eds). Springer Berlin Heidelberg: Berlin, Heidelberg; 2011. 392–407. ISBN 978-3-642-20036-6.

56. Sharifi-Noghabi, Zolotareva, Collins, et al. MOLI: Multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics, 2019; 35(14):i501–i509; doi: 10.1093/bioinformatics/btz318

57. Sharma, Lysenko, Boroevich, et al. DeepInsight-3D architecture for anti-cancer drug response prediction with deep-learning on multi-omics. Sci Rep, 2023; 13(1):2483; doi: 10.1038/s41598-023-29644-3

58. Sohn, Lee, Yan. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems, 2015a; 28.

59. Sohn, Yan, Lee. Learning structured output representation using deep conditional generative models. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS'15. MIT Press: Cambridge, MA, USA; 2015b. 3483–3491.

60. Song, Ermon. Generative Modeling by Estimating Gradients of the Data Distribution. Curran Associates Inc.: Red Hook, NY, USA; 2019.

61. Song, Meng, and Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.

62. Song, Sohl-Dickstein, Kingma, et al. Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations, 2021. https://openreview.net/forum?id=PxTIG12RRHS

63. Szklarczyk, Gable, Lyon, et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res, 2019; 47(D1):D607–D613; doi: 10.1093/nar/gky1131

64. Trendowski. Exploiting the cytoskeletal filaments of neoplastic cells to potentiate a novel therapeutic approach. Biochim Biophys Acta, 2014; 1846(2):599–616.

65. Van der Maaten, Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008; 9(11).

66. Veličković, Cucurull, Casanova, et al. Graph attention networks, 2018. Available from: https://arxiv.org/abs/1710.10903

67. Weinstein, Collisson, Mills, et al. Cancer Genome Atlas Research Network. The Cancer Genome Atlas pan-cancer analysis project. Nat Genet, 2013; 45(10):1113–1120.

68. Xiong, Sun, Luo, et al. Graph attention network with high-order neighbor information propagation for social recommendation. In: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Main Track. (Larson, ed). International Joint Conferences on Artificial Intelligence Organization; 2024. 2478–2486; doi: 10.24963/ijcai.2024/274

69. , Gu, Wang, et al. Autoencoder based feature selection method for classification of anticancer drug response. Front Genet, 2019b; 10:233; doi: 10.3389/fgene.2019.00233

70. , Skoularidou, Cuesta-Infante, et al. Modeling Tabular Data Using Conditional GAN. Curran Associates Inc.: Red Hook, NY, USA; 2019a.

71. Yang, Soares, Greninger, et al. Genomics of drug sensitivity in cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res, 2013; 41(Database issue):D955–D961.

72. Zarei, Costas, Orozco, et al. A web-based pharmacogenomics search tool for precision medicine in perioperative care. J Pers Med, 2020; 10(3):65; doi: 10.3390/jpm10030065

73. Zhang, Stevens MFG, Bradshaw. Temozolomide: Mechanisms of action, repair and resistance. Curr Mol Pharmacol, 2012; 5(1):102–114.