Gene Network Enrichment Analysis and Its Application to Explore Enriched Immune Disease Pathways for Gene Network of Acute Myeloid Leukemia Cell Lines

Abstract

Recent advances in gene network analysis have improved our understanding of complex disease mechanisms; however, interpreting estimated gene networks remains challenging. Existing methods for pathway enrichment analysis focus on gene sets and therefore fail to capture interaction-level information that is critical for understanding disease-related molecular interplays. Here, we propose a novel computational strategy for gene network enrichment analysis (GNEA) that evaluates pathway overrepresentation at the edge level, explicitly incorporating both network structure and the biological importance of hub genes, and is thereby expected to provide biologically reliable results. We demonstrated the efficacy of our approach through Monte Carlo simulations of myeloid neoplasm- and pan-cancer-related pathway-enriched gene network analysis. We then applied the proposed strategy to immune disease pathway-enriched gene network analysis. Our results identified inflammatory bowel disease-related pathways as enriched in both the acute myeloid leukemia (AML)-aged and AML-young networks, and asthma-related pathways as enriched in the healthy-young networks. They also suggested that “activation of CD40 and CD40LG” and “mutual activation between HLA-DPB1 and IL4R” are potential markers for uncovering AML-related mechanisms. Overall, this study demonstrates that GNEA provides a powerful framework for uncovering biologically meaningful interaction-level insights into complex diseases.

Keywords

acute myeloid leukemia; enrichment analysis; functional pathway; gene network

1. INTRODUCTION

Gene network analyses provide a comprehensive understanding of the relationships among genes involved in disease mechanisms at the system level. The molecular mechanisms of diseases involve abnormalities in complex molecular networks rather than disorders in a single gene. Thus, heterogeneous gene networks have garnered considerable attention for understanding complex disease mechanisms. Various computational approaches, such as L1-type regularization and deep learning methods, have been developed to infer gene regulatory networks and have been used for gene network-based analyses. Although several studies on computational network biology have been conducted to infer gene networks, the interpretation of the estimated large gene networks remains challenging. Yet crucial clinical insights cannot be obtained without interpreting the estimated molecular interplays. Uncovering the biological meaning and function of the estimated network is critical for identifying biomarkers and related mechanisms involved in diseases.

Several studies have identified the biological significance of gene sets (Kim and Volsky, 2005; Huang et al., 2009). Huang et al. (2009) developed a computational strategy called gene enrichment analysis, which detects the gene sets that are differentially enriched under experimental conditions. Overrepresentation analysis (ORA) is a commonly used methodology for pathway analysis of gene sets (Draghici et al., 2003; da et al., 2009). ORA determines whether known biological functions are overrepresented in the gene set of interest, based on a hypergeometric test. The network-based gene set enrichment analysis, also known as the efficient network enrichment analysis test (NEAT) (Signorelli et al., 2016), measures the level of enrichment based on the association between genes in the gene set of interest and those in the functional set. Although several computational strategies for gene set enrichment analysis have been developed and applied to interpret experimental results, relatively little attention has been paid to the pathway analysis of gene networks.

In this study, we aimed to develop a novel strategy for gene network enrichment analysis (GNEA). The developed method measures the enrichment of a gene network for a specific functional pathway based on the overrepresentation of edges. In other words, we assessed the enrichment of a gene network by considering the number of edges that were overrepresented in the network of genes involved in a specific functional pathway. The developed method enabled us to perform GNEA based on comprehensive information about the gene network, that is, edge structure and genes belonging to the network, because the edge structure is determined by genes. Furthermore, hub genes linked to several genes have a major effect on the computation of the gene network enrichment score because hub genes have several edges. This implies that our method incorporates the knowledge of network biology that hub genes play key roles in biological processes. Thus, we can obtain biologically reliable results.

Through Monte Carlo simulations, we demonstrated the effectiveness of the proposed strategy in myeloid neoplasm- and pan-cancer-related pathway-enriched gene network analysis. We applied our strategy to explore the enriched immune disease pathways for gene networks of the “healthy-young,” “healthy-aged,” “acute myeloid leukemia (AML)-young,” and “AML-aged” groups. Our strategy revealed that the AML-aged and AML-young networks were enriched in the inflammatory bowel disease (IBD) pathway, whereas the asthma pathway was identified as the enriched pathway for the healthy-young network. As AML-specific molecular interplays, “suppression of interplay between HLA-DPB1 and IL4R” and “inhibition of interplay between JUN and HLA-DPB1” were identified. Furthermore, “disappearance of the interplay between RNASE3 and PRG2” was identified as a specific molecular characteristic of healthy-young individuals. We suggest that “activation of CD40 and CD40LG” and “mutual activation between HLA-DPB1 and IL4R and between JUN and HLA-DPB1” may be key markers to uncover molecular mechanisms underlying AML.

The remainder of this article is organized as follows. In Section 2, we introduce previous studies on the pathway analysis of gene sets. In Section 3, we propose a novel computational strategy for GNEA. In Section 4, we evaluate our method through Monte Carlo simulations of myeloid neoplasm- and pan-cancer-related pathway-enriched gene network analysis. Finally, we describe the results of immune disease pathway-enriched gene network analysis. The conclusions are presented along with the discussion.

2. METHODS

The gene network can be represented by a weighted graph $G = (V, E, W)$, where $V = \{1, \ldots, p\}$ is the set of vertices corresponding to $p$ genes and $E \subseteq V \times V$ is the set of edges (i.e., pairs $(i, j)$ with $i, j \in V$, where $(i, j) \in E \Leftrightarrow (j, i) \in E$). The edge weights are given by $W = (w_{i j})$, $(i, j) \in E$.
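As a rough illustration of this representation, the sketch below stores $G = (V, E, W)$ as a vertex set, an edge set, and a dictionary of weights. The function name `make_weighted_graph` and the rule of reusing $w_{ij}$ when $w_{ji}$ is not supplied are assumptions made for this example only; they are not part of the original method.

```python
def make_weighted_graph(p, weighted_pairs):
    """Build (V, E, W) from (i, j, w_ij) triples over genes 1..p."""
    vertices = set(range(1, p + 1))
    weights = {}
    for i, j, w in weighted_pairs:
        if i not in vertices or j not in vertices:
            raise ValueError(f"edge ({i}, {j}) uses a gene outside 1..{p}")
        weights[(i, j)] = w
    # Enforce (i, j) in E <=> (j, i) in E. When w_ji was not given, w_ij is
    # reused; this is an assumption of the sketch, not stated in the text.
    for (i, j), w in list(weights.items()):
        weights.setdefault((j, i), w)
    edges = set(weights)
    return vertices, edges, weights


V, E, W = make_weighted_graph(3, [(1, 2, 0.5), (2, 3, -1.0), (3, 2, 0.25)])
```

Here the pair $(2, 1)$ is added automatically with weight 0.5, whereas $(3, 2)$ keeps its own supplied weight.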

2.1. ORA

ORA is one of the most widely used methods for pathway analysis of gene sets. This method determines whether known biological functions are overrepresented in the gene set of interest (Boyle et al., 2004). ORA is based on the hypergeometric distribution, which gives the probability of $x$ successes in $N_{s}$ random draws without replacement from a population of total size $T$ that contains $N_{p}$ successes.

For the $i^{th}$ target pathway, $T$ is the total number of genes in all pathways of the collection, $N_{p}$ is the number of genes in the $i^{th}$ target pathway, $N_{s}$ is the number of genes in the gene set of interest, and $x$ is the number of those $N_{s}$ genes that belong to the $i^{th}$ target pathway. The observed $x$ can be considered the realization of a random variable $X$ that follows the hypergeometric distribution

$$X \sim \mathrm{hypergeom}(n = N_{s}, K = N_{p}, N = T). \tag{1}$$

The probability of observing an intersection of size $x$ is given by

$$P(X = x) = \frac{\binom{N_{p}}{x} \binom{T - N_{p}}{N_{s} - x}}{\binom{T}{N_{s}}}, \tag{2}$$

where $\binom{i}{j}$ is the binomial coefficient. The significance of overrepresentation (i.e., enrichment) can be measured by the following p-value:

$$p\text{-value}_{\mathrm{ORA}} = \sum_{y = x}^{N_{s}} \frac{\binom{N_{p}}{y} \binom{T - N_{p}}{N_{s} - y}}{\binom{T}{N_{s}}}. \tag{3}$$
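The upper-tail sum in Equation (3) can be sketched in a few lines of Python using exact integer binomial coefficients. The function names `hypergeom_upper_tail` and `ora_p_value`, and the choice to restrict both gene sets to the universe of $T$ genes, are assumptions of this illustration rather than a description of any particular ORA implementation.

```python
from math import comb


def hypergeom_upper_tail(x, n_draws, n_success, n_total):
    """P(X >= x) for X ~ hypergeom(n = n_draws, K = n_success, N = n_total)."""
    upper = min(n_draws, n_success)  # terms beyond this bound are zero
    # Sum exact integer terms first, then divide once to limit rounding error.
    numer = sum(
        comb(n_success, y) * comb(n_total - n_success, n_draws - y)
        for y in range(max(x, 0), upper + 1)
    )
    return numer / comb(n_total, n_draws)


def ora_p_value(genes_of_interest, pathway_genes, universe):
    """ORA p-value as in Eq. (3); both sets are restricted to the universe."""
    universe = set(universe)
    interest = set(genes_of_interest) & universe   # N_s genes
    pathway = set(pathway_genes) & universe        # N_p genes
    x = len(interest & pathway)                    # observed overlap
    return hypergeom_upper_tail(x, len(interest), len(pathway), len(universe))


# Toy example: T = 20 genes, a 5-gene pathway, 4 genes of interest, overlap 2.
p = ora_p_value([0, 1, 10, 11], range(5), range(20))
```

For this toy input the sum is $(1050 + 150 + 5)/4845 \approx 0.249$, so the overlap would not be called significant.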

2.2. An efficient NEAT

Signorelli et al. (2016) developed a computational strategy for gene set enrichment analysis based on gene networks, called NEAT. The associations and/or dependencies between genes can be described by a network; that is, the relationship between two gene sets is determined by the presence or absence of links connecting genes in the two sets. NEAT performs enrichment analysis by evaluating the level of association between genes in the gene set of interest and those in the functional set. In network-based enrichment analysis, two gene sets are considered enriched when the number of links between them is larger (or smaller) than expected by chance. Thus, the presence of enrichment from gene set A (i.e., $V_{A}$) to gene set B (i.e., $V_{B}$) is determined by the number of arrows from the genes in $V_{A}$ to the genes in $V_{B}$. This number is denoted by $x_{A B}$, which can be considered a realization of a random variable $X_{A B}$. The significance of the enrichment is measured using a hypergeometric distribution, as in ORA, where the arrows from $V_{A}$ that reach genes in $V_{B}$ are considered “successful” and the remaining ones “unsuccessful” (Signorelli et al., 2016). The arrows from the genes in $V_{A}$ are considered a random sample without replacement from the population of arrows in the graph. Thus, if no relationship (i.e., no enrichment) exists between $V_{A}$ and $V_{B}$, the distribution of $X_{A B}$ is as follows:

$$X_{A B} \sim \mathrm{hypergeom}(n = N_{A}, K = N_{B}, N = T_{a}), \tag{4}$$

where the sample size $N_{A}$ is the total number of arrows exiting genes belonging to $V_{A}$, the number of successful cases in the population $N_{B}$ is the number of arrows entering genes in $V_{B}$, and the population size $T_{a}$ is the total number of arrows. NEAT computes the probability of observing exactly $x_{A B}$ arrows from $V_{A}$ to $V_{B}$ as follows:

$$P(X_{A B} = x_{A B}) = \frac{\binom{N_{B}}{x_{A B}} \binom{T_{a} - N_{B}}{N_{A} - x_{A B}}}{\binom{T_{a}}{N_{A}}}. \tag{5}$$

The significance of the enrichment is assessed using the following p-value:

$$p\text{-value}_{\mathrm{NEAT}} = \sum_{y = x_{A B}}^{N_{A}} \frac{\binom{N_{B}}{y} \binom{T_{a} - N_{B}}{N_{A} - y}}{\binom{T_{a}}{N_{A}}}. \tag{6}$$

The p-value_NEAT and p-value_ORA are often adjusted for multiple comparisons.
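To make the quantities in Equations (4) to (6) concrete, the following sketch counts arrows in a small directed network and evaluates the upper-tail p-value. It is a simplified illustration, not the NEAT software itself; the function names `neat_counts` and `neat_p_value` and the toy gene labels are hypothetical.

```python
from math import comb


def neat_counts(arrows, set_a, set_b):
    """Return (x_AB, N_A, N_B, T_a) for directed arrows given as (source, target)."""
    n_a = sum(1 for u, _ in arrows if u in set_a)                  # arrows leaving V_A
    n_b = sum(1 for _, v in arrows if v in set_b)                  # arrows entering V_B
    x_ab = sum(1 for u, v in arrows if u in set_a and v in set_b)  # arrows from V_A to V_B
    return x_ab, n_a, n_b, len(arrows)


def neat_p_value(arrows, set_a, set_b):
    """Upper-tail hypergeometric p-value as in Eq. (6)."""
    x_ab, n_a, n_b, t_a = neat_counts(arrows, set_a, set_b)
    numer = sum(
        comb(n_b, y) * comb(t_a - n_b, n_a - y)
        for y in range(x_ab, min(n_a, n_b) + 1)
    )
    return numer / comb(t_a, n_a)


arrows = [("g1", "g3"), ("g1", "g4"), ("g2", "g3"),
          ("g5", "g6"), ("g6", "g1"), ("g4", "g5")]
p = neat_p_value(arrows, {"g1", "g2"}, {"g3", "g4"})
```

In this toy network all three arrows leaving $V_{A}$ reach $V_{B}$, giving $x_{AB} = N_{A} = N_{B} = 3$, $T_{a} = 6$, and a p-value of $1/\binom{6}{3} = 0.05$.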

The gene set enrichment analysis of ORA and NEAT can be extended to GNEA by considering a set of nodes (genes) in the network as the gene set of interest.

Limitations of the existing methods: Although the existing methods can effectively uncover functional gene sets, they cannot perform enrichment analysis of a gene network itself. That is, the existing methods are designed not for GNEA but for network-based gene set enrichment analysis. Thus, functional gene networks cannot be effectively identified, because the existing methods do not incorporate the characteristics (information) of gene networks, such as edge structures.

3. GENE NETWORK ENRICHMENT ANALYSIS

We developed a novel computational strategy called GNEA. The overall framework of our strategy is given as follows (see Fig. 1).

FIG. 1.

Overview of the gene behavior-based network enrichment analysis. Given a query gene network, edges are classified based on whether they overlap with the network formed by genes involved in a target pathway. That is, the edges in the query network that are also present in the network of genes associated with the target pathway are labeled as successful (e.g., red and blue edges corresponding to pathways A and B, respectively), while the remaining edges are labeled as unsuccessful. A hypergeometric test is then performed to assess whether the number of successful edges observed in the query network is greater than expected by chance, given the total number of possible edges. Pathways with statistically significant enrichment are identified as significantly enriched pathways.

3.1. Query network

The query network represents the gene network of interest to be tested for enrichment (i.e., subject of GNEA). It is constructed from the data under a specific biological condition (e.g., disease or age group), where nodes correspond to genes and edges represent inferred gene–gene interactions. The query network can be estimated using various approaches, including regression-based methods, correlation-based networks, or protein–protein interaction-derived networks.

3.2. Target network

The target network represents a reference gene network associated with a predefined biological pathway (i.e., pathway-related network). In our study, nodes of the target network correspond to genes involved in a specific pathway. The target network can also be obtained from curated pathway databases or estimated using network inference methods applied to pathway-restricted gene sets.

3.3. Association between query and target networks

The GNEA evaluates whether the query network contains a large number of edges that overlap with the target (pathway) network. Specifically, the overlap is defined as the set of edges that are present in both networks. The statistical significance of this overlap is assessed using a hypergeometric test, which evaluates whether the observed number of shared edges exceeds what would be expected by chance.

We assessed the enrichment of the gene network by considering the number of edges in the network of genes involved in specific functional pathways. This implies that enrichment is measured by the information of not only the edge structure but also the genes in a network because the edge structure is determined by genes. We then proposed a gene network enrichment test based on hypergeometric distribution to assess the significance of the number of overrepresented edges. That is, we marked the edges in the network of interest that belong to the network of genes involved in the target pathway as “successful” and the remaining ones as “unsuccessful.” In our strategy, edges are treated as independent units under the null hypothesis, analogous to the independence assumption of genes in conventional gene set enrichment analysis. This assumption is used to define a hypergeometric null model for assessing whether the observed overlap of edges exceeds random expectation, given the fixed network size and pathway-specific edge count.

We denote $X_{e}$ as the number of edges in the network of interest belonging to the network of genes involved in the target pathway. Thus, if there is no relationship (i.e., no enrichment) between the gene networks of interest and the target pathway, the distribution of $X_{e}$ (the number of successes in the sample) is

$$X_{e} \sim \mathrm{hypergeom}(n = N_{e(i)}, K = N_{e(t)}, N = |E|), \tag{7}$$

where $N_{e(i)}$ is the number of edges in the network of interest, $N_{e(t)}$ is the number of successful cases in the population (i.e., the number of edges in the network of genes involved in the target pathway), and the population size $|E|$ denotes the total number of gene–gene interactions in the estimated network, that is, the nonzero edges identified among the $p$ genes.

The hypergeometric distribution gives the probability of observing exactly $x_{e}$ edges from the network of interest that belong to the network of genes involved in the target pathway.

$$P(X_{e} = x_{e}) = \frac{\binom{N_{e(t)}}{x_{e}} \binom{|E| - N_{e(t)}}{N_{e(i)} - x_{e}}}{\binom{|E|}{N_{e(i)}}}. \tag{8}$$

We then assessed the significance of gene network enrichment based on the hypergeometric distribution as follows:

$$p\text{-value}_{\mathrm{GNEA}} = P(X_{e} \geq x_{e}) = \sum_{y = x_{e}}^{N_{e(i)}} \frac{\binom{N_{e(t)}}{y} \binom{|E| - N_{e(t)}}{N_{e(i)} - y}}{\binom{|E|}{N_{e(i)}}}. \tag{9}$$
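The edge-level test in Equations (7) to (9) can be sketched as follows. This is a minimal illustration under stated assumptions: edges are represented as hashable pairs, the query and target edge sets are intersected with the population of edges before counting, and the function name `gnea_p_value` is hypothetical.

```python
from math import comb


def gnea_p_value(query_edges, target_edges, population_edges):
    """Return (x_e, p-value) for the edge-overlap test of Eq. (9)."""
    population = set(population_edges)          # all |E| edges among the p genes
    query = set(query_edges) & population       # N_e(i) edges of the network of interest
    target = set(target_edges) & population     # N_e(t) "successful" edges
    x_e = len(query & target)                   # observed number of shared edges
    n_i, n_t, n_all = len(query), len(target), len(population)
    numer = sum(
        comb(n_t, y) * comb(n_all - n_t, n_i - y)
        for y in range(x_e, min(n_i, n_t) + 1)
    )
    return x_e, numer / comb(n_all, n_i)


# Toy population of 10 edges; the query has 4 edges, the target 3, and 2 are shared.
edges = [(i, i + 1) for i in range(10)]
x_e, p = gnea_p_value(edges[:4], [edges[2], edges[3], edges[7]], edges)
```

For this toy input $x_{e} = 2$ and the p-value equals $(3 \cdot 21 + 7)/210 = 1/3$. Because every edge incident to a hub gene is counted separately, a hub shared by the query and target networks contributes many successful edges at once.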

The p-value_GNEA was adjusted for multiple comparisons using the false discovery rate q-value based on the Benjamini–Hochberg correction (Benjamini and Hochberg, 1995). This correction is applied to control the expected proportion of false positives arising from multiple testing, where each hypothesis corresponds to assessing whether a given pathway exhibits significant network enrichment.
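For readers unfamiliar with the Benjamini–Hochberg step-up adjustment, a compact sketch is given below. It assumes one p-value per tested pathway and returns adjusted values in the original order; the function name `benjamini_hochberg` is chosen for this example only.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With these four p-values the adjusted values are 0.04, 0.0533, 0.0533, and 0.20 (to three significant figures), so only the first pathway would pass a threshold of 0.05.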

3.4. Advantages of the GNEA

3.4.1. Comprehensive information-based GNEA

Our method performs GNEA by jointly considering genes (nodes) and their interactions (edges). Since each edge represents a regulatory or functional relationship between a pair of genes, information at both the node and edge levels is explicitly incorporated into the enrichment analysis.

3.4.2. Biologically interpretable results

Hub genes linked with several genes have been recognized as crucial markers for understanding biological mechanisms because they play critical roles in gene regulation and biological processes. In our analysis, hub genes have a large effect on the computation of the enrichment score because they have several edges. That is, our method treats the hub genes that play a key role in biological processes as crucial factors in the GNEA. This implies that our strategy incorporates the knowledge of network biology into the GNEA and thus can provide biologically interpretable and reliable results.

4. MONTE CARLO SIMULATIONS

We illustrate the performance of the proposed strategy using Monte Carlo simulations. To estimate gene networks, we used the publicly available CCLE expression dataset from the DepMap database (https://depmap.org/portal/), which consists of mRNA expression levels of 19,221 genes in 1406 cell lines. The CCLE expression dataset comprises 30 lineages, with lung cancer representing the largest proportion (over 14%), and most samples originate from individuals aged 40 years or older. Table 1 shows demographic information about the dataset.

Table 1.
Demographic Characteristics (i.e., Lineage, Gender, and Age) of the CCLE Expression Dataset, Where the Column “Percent” Indicates the Proportion of Samples in Each Category

	Percent (%)
Lineage (top 5)
Lung	14.7
Blood	7.4
Central nervous system	6.1
Skin	6.0
Lymphocyte	5.9
Gender
Female	34.9
Male	43.6
Unknown	11.0
Age
∼20s	18.1
20s∼40s	18.8
40s∼60s	55.4
60s∼	45.1
Unknown	31.3

In this study, the following linear regression model was used to describe the gene network:

$$y_{i l} = \beta_{l}^{T} x_{i} + \epsilon_{i l}, \quad i = 1, \ldots, n, \quad l = 1, \ldots, q, \tag{10}$$

where $y_{l} = (y_{1 l}, \ldots, y_{n l})^{T} \in \mathbb{R}^{n}$ and $x_{i} = (x_{i 1}, \ldots, x_{i p})^{T} \in \mathbb{R}^{p}$ are the expression levels of the target and regulator genes, respectively, $\beta_{l} = (\beta_{l 1}, \ldots, \beta_{l p})^{T}$ is the regression coefficient vector that represents the effect of the $p$ regulator genes on the $l^{th}$ target gene, and $\epsilon_{i l}$ is a random error.

The elastic net is a well-established strategy for high-dimensional settings, such as gene expression data, where the number of predictors (e.g., regulator genes) far exceeds the sample size. It simultaneously performs variable selection and coefficient estimation by combining $L_{1}$ and $L_{2}$ penalties, making it suitable for inferring sparse gene regulatory relationships. In gene expression data, predictors corresponding to genes within the same pathway or functional module often exhibit strong correlations. While the lasso [i.e., $\delta = 1$ in (11)] tends to select only a small subset of correlated predictors, the elastic net addresses this issue through its grouping effect, allowing sets of correlated genes to be selected jointly (Zou and Hastie, 2005).

Beyond these statistical advantages, the regression framework also provides biologically meaningful representations of molecular interplays. In contrast to correlation-based networks, which reflect marginal associations, regression-based approaches aim to capture conditional dependencies among genes, thereby more closely approximating regulatory or functional interactions. Furthermore, the grouping effect of the elastic net enables the joint selection of genes that are functionally related, such as genes participating in the same signaling pathway or regulatory module (Zou and Hastie, 2005). This leads to subnetworks that are more coherent in terms of known biological processes, providing a biologically interpretable basis for downstream GNEA. As a result, elastic net-based models have been widely used for gene network estimation in high-dimensional biological data (Shimamura et al., 2011; Lee et al., 2019). Taken together, these characteristics make the elastic net-based regression framework a suitable and biologically relevant choice for constructing the query networks in our study. Thus, we employed the following elastic net to estimate the gene network (Zou and Hastie, 2005),

$$\hat{\beta}_{l} = \underset{\beta_{l}}{\arg\min} \left\{ \frac{1}{2} \sum_{i = 1}^{n} (y_{i l} - \beta_{l}^{T} x_{i})^{2} + \lambda \sum_{j = 1}^{p} \left[ \frac{1}{2} (1 - \delta) \beta_{l j}^{2} + \delta |\beta_{l j}| \right] \right\}, \tag{11}$$

where $\lambda > 0$ is a regularization parameter that controls the degree of shrinkage applied to $\beta_{l}$, and $0 \leq \delta \leq 1$ is a mixing parameter between the ridge (Hoerl and Kennard, 1970) and lasso (Tibshirani, 1996) penalties. The optimal values of the regularization parameters $\lambda$ and $\delta$ were selected based on 3-fold cross-validation. The elastic net provides sparse estimates of $\beta_{l}$; that is, the method performs regulator gene selection and edge weight estimation simultaneously. Specifically, for the $l^{th}$ target gene, a directed edge from gene $j$ to the $l^{th}$ target gene is included in the network if the estimated coefficient $\hat{\beta}_{l j} \neq 0$. No additional thresholding is applied beyond the sparsity induced by the elastic net regularization. The gene network is then constructed by aggregating all such edges across all target genes, with the nonzero coefficients serving as edge weights.
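The two ingredients of this construction, the penalized objective in Equation (11) and the rule that nonzero coefficients become directed edges, can be sketched as below. The code only evaluates the objective for a given coefficient vector and converts an already estimated coefficient matrix into an edge list; it does not fit the model. The function names, the nested-list layout `coef[l][j]`, and the toy gene labels are assumptions made for illustration.

```python
def elastic_net_objective(y, X, beta, lam, delta):
    """Value of the objective in Eq. (11) for one target gene.

    y: n responses; X: n rows of p regulator values; beta: p coefficients.
    """
    rss = sum(
        (yi - sum(b * xij for b, xij in zip(beta, xi))) ** 2
        for yi, xi in zip(y, X)
    )
    penalty = sum(0.5 * (1.0 - delta) * b * b + delta * abs(b) for b in beta)
    return 0.5 * rss + lam * penalty


def edges_from_coefficients(coef, regulators, targets):
    """Map coef[l][j] (estimate of beta_lj) to {(regulator_j, target_l): weight}."""
    edges = {}
    for l, target in enumerate(targets):
        for j, regulator in enumerate(regulators):
            w = coef[l][j]
            if w != 0.0:  # an edge j -> l exists only for nonzero coefficients
                edges[(regulator, target)] = w
    return edges


y = [1.0, 2.0]
X = [[1.0, 0.0], [0.0, 1.0]]
lasso_value = elastic_net_objective(y, X, [1.0, -1.0], lam=2.0, delta=1.0)
ridge_value = elastic_net_objective(y, X, [1.0, -1.0], lam=2.0, delta=0.0)
network = edges_from_coefficients([[0.0, 0.7], [-0.2, 0.0]], ["A", "B"], ["A", "B"])
```

Setting `delta=1.0` reduces the penalty to the lasso and `delta=0.0` to the ridge penalty, which gives objective values of 8.5 and 6.5 for this toy input. The resulting `network` contains the two directed edges B to A (weight 0.7) and A to B (weight -0.2).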

4.1. Myeloid neoplasm-related pathways

We considered the following myeloid neoplasm-related pathways and related pathways as the target pathways: AML (hsa05221), hematopoietic cell lineage (HMT: hsa04640), chronic myeloid leukemia (CLM: hsa05220), myelodysplastic syndrome (MDS: H01481), and myelodysplastic/myeloproliferative neoplasms (H02410) from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (https://www.genome.jp/kegg/pathway.html). Table 2 lists the myeloid neoplasm-related KEGG pathways, entries, and the number of genes involved in the pathways.

Table 2.
Myeloid Neoplasm-Related KEGG Pathways

Abbreviation	Name	Entry	# genes
AML	Acute myeloid leukemia	hsa05221	68
HMT	Hematopoietic cell lineage	hsa04640	99
CLM	Chronic myeloid leukemia	hsa05220	77
MDS	Myelodysplastic syndrome	H01481	9
	Myelodysplastic/myeloproliferative neoplasms	H02410	20

KEGG, Kyoto Encyclopedia of Genes and Genomes.

The proposed method was evaluated via GNEA using four target pathways (AML, HMT, CLM, and MDS). We extracted the genes involved in myeloid neoplasm-related KEGG pathways and belonging to the CCLE expression dataset (i.e., AML: 64, HMT: 87, CLM: 41, and MDS: 22), where the extracted 214 genes were named KEGG myeloid neoplasm genes.

We estimated the gene network between $p$ genes consisting of the 214 KEGG myeloid neoplasm genes and the $p - 214$ genes with the highest variance in expression levels. The network between the $p$ genes describes the population of edges.

The target networks of the AML, HMT, CLM, and MDS pathways were estimated using the 64, 87, 41, and 22 genes involved in the AML, HMT, CLM, and MDS pathways, respectively. For the query network of the enrichment test (QR-“XXX”nwt), we estimated the gene network based on 90% (75%) of randomly selected genes involved in a specific pathway and 10% (25%) of genes not involved in the pathway; that is, we considered noise rates of 10% and 25% for the query networks. For example, the query network of the AML-pathway enrichment gene network analysis (QR-“AML”nwt) with a 10% noise rate was estimated by randomly selecting 58 ($64 \times 0.9$, rounded) genes involved in the AML pathway and six ($64 \times 0.1$, rounded) genes not involved in the pathway. If the p-value of the QR-“AML”nwt was smaller than the significance level in the AML-pathway-enriched network analysis, we defined the result as a true positive. In contrast, if the p-value of the QR-“AML”nwt was smaller than the significance level in the HMT, CLM, or MDS pathway-enriched network analysis, we defined the result as a false positive. Query networks were generated 100 times by randomly selecting 90% of the genes involved in the AML pathway and 10% of the genes not involved in the pathway. We evaluated our method based on the proportions of true and false positives among the 100 simulated networks. For the HMT, CLM, and MDS pathways, we performed GNEA in the same manner as the AML-pathway-enriched network analysis. We set the significance level to $\alpha = 0.01$.
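The sampling scheme and the summary statistic described above can be sketched as follows. This is an illustrative reading of the design, assuming that the numbers of pathway and non-pathway genes are obtained by rounding; the function names `sample_query_genes` and `detection_rate`, the gene labels, and the fixed random seed are hypothetical.

```python
import random


def sample_query_genes(pathway_genes, other_genes, noise_rate, rng):
    """Draw (1 - noise_rate) of the genes from the pathway and noise_rate from outside it."""
    n = len(pathway_genes)
    n_in = round(n * (1.0 - noise_rate))   # e.g., 58 of 64 genes at a 10% noise rate
    n_out = round(n * noise_rate)          # e.g., 6 non-pathway genes
    return rng.sample(pathway_genes, n_in) + rng.sample(other_genes, n_out)


def detection_rate(p_values, alpha=0.01):
    """Proportion of simulated networks called enriched at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


rng = random.Random(0)  # fixed seed so the toy example is reproducible
pathway = [f"P{i}" for i in range(64)]
others = [f"O{i}" for i in range(100)]
query_genes = sample_query_genes(pathway, others, noise_rate=0.10, rng=rng)
```

When the tested pathway is the one used to build the query network, `detection_rate` applied to the 100 simulated p-values corresponds to a true-positive rate; for any other target pathway it corresponds to a false-positive rate.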

The results of our strategy were compared with those of existing methods, namely NEAT, ORA, signaling pathway impact analysis (SPIA; Tarca et al., 2009), and Discriminative Random Walk with Restarts (DRaWR; Blatti and Sinha, 2016). For ORA, we considered two types of functional sets: ORA_NW, the genes belonging to the target pathway-enriched networks (target AML, HMT, CLM, and MDS pathway-enriched networks), and ORA_GENE, the genes involved in the target pathways (AML, HMT, CLM, and MDS pathways). We considered total numbers of genes $p = 250$, 500, 750, and 1000. Tables 3 and 4 list the proportion of networks identified as enriched in these pathways (p-value $< 0.01$) across the 100 simulations for error rates of 10% and 25%, respectively. The gray and white columns correspond to the true-positive and false-positive rates, respectively. As shown in Tables 3 and 4, the proposed method and DRaWR show strong performance in identifying enriched gene networks. Although NEAT is effective in identifying CLM- and MDS-enriched gene networks, it does not perform well for HMT pathway-enriched gene networks. Furthermore, DRaWR and NEAT exhibit reduced performance in terms of false-positive rates (equivalently, lower true-negative rates) under the higher noise level (i.e., a 25% error rate). In contrast, our method is robust to the noise level. Overall, SPIA and ORA (ORA_NW and ORA_GENE) do not perform effectively.

Table 3.

Gene Network Enrichment Analysis Results for Myeloid Neoplasm-Related KEGG Pathways with 10% Error Rate

p	Method	Target: AML				Target: CLM
p	Method	AML	CLM	HMT	MDS	AML	CLM	HMT	MDS
250	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.15	0.00	0.00	1.00	0.00	0.00
	SPIA	0.25	0.00	0.00	0.00	0.00	0.11	0.00	0.00
	DRaWR	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	1.00	0.00	1.00	0.00	1.00
500	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	SPIA	0.17	0.00	0.00	0.00	0.00	0.11	0.00	0.00
	DRaWR	1.00	0.00	0.01	0.00	0.00	1.00	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.78
750	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	SPIA	0.21	0.00	0.00	0.00	0.00	0.05	0.00	0.00
	DRaWR	1.00	0.00	0.01	0.00	0.00	1.00	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	0.85	0.00	1.00	0.00	0.87
1000	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.30	0.00	0.00	1.00	0.00	0.00
	SPIA	0.21	0.00	0.00	0.00	0.00	0.07	0.00	0.00
	DRaWR	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	1.00	0.00	1.00	0.00	0.94
		Target: HMT				Target: MDS
		AML	CLM	HMT	MDS	AML	CLM	HMT	MDS
250	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.13	0.00	1.00	0.00	0.01	0.00	0.00	0.99
	SPIA	0.00	0.00	0.93	0.00	0.00	0.00	0.01	0.10
	DRaWR	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.57	0.00	0.00	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	1.00	1.00	0.00	0.00	1.00
500	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.00	0.00	1.00	0.00	0.00	0.00	0.00	0.99
	SPIA	0.00	0.00	0.88	0.00	0.00	0.00	0.01	0.10
	DRaWR	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.60	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.00	0.00	0.82	0.00	1.00
750	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.00	0.00	1.00	0.00	0.01	0.00	0.00	0.99
	SPIA	0.00	0.00	0.89	0.00	0.00	0.00	0.00	0.11
	DRaWR	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.15	0.10	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.00	0.40	1.00	0.00	1.00
1000	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.34	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	SPIA	0.00	0.00	0.92	0.00	0.00	0.00	0.00	0.12
	DRaWR	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.11	0.01	0.19	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.72	1.00	0.93	0.18	1.00

Each value represents the proportion of detected enriched gene networks for each pathway across 100 simulations, with significance defined as $p < 0.01$. The gray and white columns correspond to the true-positive rates and false-positive rates, respectively.

AML, acute myeloid leukemia; CLM, chronic myeloid leukemia; DRaWR, discriminative random walk with restart; HMT, hematopoietic cell lineage; MDS, myelodysplastic syndrome; NEAT, network enrichment analysis test; ORA, overrepresentation analysis; Pro, proposed method; SPIA, signaling pathway impact analysis.

Table 4.

Gene Network Enrichment Analysis Results for Myeloid Neoplasm-Related KEGG Pathways with 25% Error Rate

p	Method	Target: AML				Target: CLM
p	Method	AML	CLM	HMT	MDS	AML	CLM	HMT	MDS
250	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.02	0.08	0.00	0.00	1.00	0.00	0.00
	SPIA	0.09	0.00	0.00	0.00	0.00	0.04	0.02	0.00
	DRaWR	0.97	0.01	0.06	0.00	0.01	1.00	0.02	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	1.00	0.03	1.00	0.00	1.00
500	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.01
	NEAT	1.00	0.00	0.00	0.01	0.00	1.00	0.00	0.00
	SPIA	0.02	0.00	0.00	0.00	0.00	0.01	0.01	0.00
	DRaWR	0.69	0.01	0.03	0.00	0.00	0.99	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.66
750	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.03	0.01	0.00	1.00	0.00	0.00
	SPIA	0.05	0.00	0.00	0.00	0.00	0.01	0.00	0.00
	DRaWR	0.66	0.00	0.01	0.00	0.00	0.99	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.01
	ORA $_{GENE}$	1.00	0.00	0.00	0.59	0.00	1.00	0.00	0.77
1000	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	NEAT	1.00	0.00	0.18	0.00	0.01	1.00	0.00	0.00
	SPIA	0.07	0.00	0.00	0.00	0.00	0.03	0.00	0.00
	DRaWR	0.80	0.00	0.01	0.00	0.00	0.99	0.00	0.00
	ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
	ORA $_{GENE}$	1.00	0.00	0.00	0.97	0.00	1.00	0.00	0.77
		Target: HMT				Target: MDS
		AML	CLM	HMT	MDS	AML	CLM	HMT	MDS
250	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.04	0.00	1.00	0.00	0.01	0.00	0.00	0.51
	SPIA	0.00	0.00	0.84	0.00	0.00	0.00	0.00	0.06
	DRaWR	0.00	0.00	1.00	0.00	0.00	0.00	0.02	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.15	0.00	0.00	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	1.00	1.00	0.00	0.00	1.00
500	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.00	0.00	1.00	0.00	0.00	0.00	0.00	0.80
	SPIA	0.00	0.00	0.65	0.00	0.00	0.00	0.04	0.02
	DRaWR	0.00	0.00	0.86	0.00	0.00	0.00	0.02	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.47	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.00	0.00	0.80	0.00	1.00
750	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.00	0.00	1.00	0.01	0.00	0.01	0.01	0.82
	SPIA	0.00	0.00	0.66	0.00	0.00	0.00	0.02	0.06
	DRaWR	0.00	0.00	0.87	0.00	0.00	0.00	0.01	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.14	0.13	0.00	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.00	0.38	0.97	0.00	1.00
1000	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
	NEAT	0.16	0.00	1.00	0.00	0.00	0.00	0.00	0.81
	SPIA	0.00	0.00	0.81	0.00	0.00	0.00	0.01	0.03
	DRaWR	0.00	0.00	0.95	0.00	0.00	0.00	0.00	1.00
	ORA $_{NW}$	0.00	0.00	1.00	0.00	0.05	0.00	0.12	1.00
	ORA $_{GENE}$	0.00	0.00	1.00	0.61	0.95	0.83	0.20	1.00

4.2. Pan-cancer-related pathways

We also evaluated our method on pan-cancer pathway-enriched gene network analysis using four pan-cancer-related pathways: the PPAR signaling pathway (hsa03320), calcium signaling pathway (hsa04020), MAPK signaling pathway (hsa04010), and focal adhesion (hsa04510), as listed in Table 5.

Table 5.
Pan-Cancer-Related KEGG Pathways

	Name	Entry	$#$ genes
PPAR	PPAR signaling pathway	hsa03320	76
Calcium	Calcium signaling pathway	hsa04020	254
MAPK	MAPK signaling pathway	hsa04010	300
Focal	Focal adhesion	hsa04510	203

Similar to the myeloid neoplasm-related pathway-enriched gene network analysis, we first extracted genes that are involved in the pan-cancer-related KEGG pathways and present in the CCLE expression dataset (i.e., PPAR: 74, calcium: 228, MAPK: 253, and focal: 142), and we refer to the extracted 697 genes as KEGG pan-cancer genes. We estimated the gene network among p genes consisting of the 697 KEGG pan-cancer genes and the $p - 697$ genes with the highest variances in their expression levels. The target networks of the PPAR, calcium, MAPK, and focal pathways were estimated using the 74, 228, 253, and 142 extracted genes, respectively. The pan-cancer-related pathway-enriched gene network analysis thus involves larger networks than the myeloid neoplasm-related pathway-enriched gene network analysis. Table 6 presents the enrichment analysis results for $p = 750$ and 1000. Our method, DRaWR, and ORA based on the functional set of the gene network (ORA_NW) showed effective performance, whereas NEAT did not; for example, NEAT frequently flagged the MAPK pathway when the target was PPAR. Although ORA_NW provided effective results, ORA based on the functional set of genes involved in the pathway (ORA_GENE) did not perform well. Furthermore, the performance of DRaWR decreased under a higher noise level (i.e., a 25% error rate). In contrast, our method remained robust under both noise levels. These results indicate that the proposed method is a useful tool for GNEA.
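The assembly of the p-gene panel described above (all pathway genes present in the expression data, padded with the highest-variance remaining genes) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_gene_panel`, the dictionary layout of `expression`, and the toy values are assumptions made for the example.

```python
from statistics import pvariance


def build_gene_panel(expression, pathway_genes, p):
    """Return p genes: pathway genes found in the data, then top-variance fillers.

    expression    : dict mapping gene name -> list of expression values (assumed layout)
    pathway_genes : set of gene names annotated to the pathways of interest
    p             : total number of genes in the network to be estimated
    """
    # Keep only pathway genes that actually appear in the expression data.
    present = [g for g in expression if g in pathway_genes]
    n_fill = p - len(present)
    if n_fill < 0:
        raise ValueError("p is smaller than the number of pathway genes in the data")
    # Rank the remaining genes by variance (largest first) and take the top n_fill.
    others = [g for g in expression if g not in pathway_genes]
    others.sort(key=lambda g: pvariance(expression[g]), reverse=True)
    return present + others[:n_fill]


# Toy data (hypothetical values, not taken from CCLE).
expression = {
    "G1": [1.0, 1.1, 0.9],
    "G2": [0.0, 5.0, 10.0],
    "G3": [2.0, 2.0, 2.0],
    "G4": [1.0, 3.0, 5.0],
}
panel = build_gene_panel(expression, pathway_genes={"G1", "G9"}, p=3)
print(panel)
```

In this toy case the single pathway gene found in the data is kept and the two most variable remaining genes fill the panel; "G9" is ignored because it has no expression values.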

Table 6.

Gene Network Enrichment Analysis Results for Pan-Cancer-Related KEGG Pathways

	p	Method	Target: PPAR				Target: Calcium
	p	Method	PPAR	Calcium	MAPK	Focal	PPAR	Calcium	MAPK	Focal
10% error rate	750	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		NEAT	1.00	0.32	1.00	0.00	0.31	1.00	0.06	0.00
		SPIA	0.16	0.00	0.00	0.00	0.00	0.75	0.00	0.00
		DRaWR	1.00	0.00	0.01	0.00	0.00	1.00	0.01	0.00
		ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		ORA $_{GENE}$	1.00	0.00	1.00	0.00	1.00	1.00	0.00	0.00
	1000	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		NEAT	1.00	0.00	0.99	0.00	0.00	1.00	0.01	0.00
		SPIA	0.16	0.00	0.00	0.00	0.00	0.71	0.00	0.00
		DRaWR	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.01
		ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		ORA $_{GENE}$	1.00	0.00	0.00	0.00	0.37	1.00	0.00	0.00
			Target: MAPK				Target: Focal
			PPAR	Calcium	MAPK	Focal	PPAR	Calcium	MAPK	Focal
	750	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		NEAT	1.00	0.08	1.00	0.02	0.00	0.00	0.01	1.00
		SPIA	0.00	0.00	0.44	0.00	0.00	0.00	0.00	0.60
		DRaWR	0.00	0.02	1.00	0.04	0.00	0.00	0.00	1.00
		ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		ORA $_{GENE}$	1.00	0.00	1.00	0.00	1.00	0.00	0.94	1.00
	1000	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		NEAT	0.99	0.00	1.00	0.00	0.00	0.00	0.01	1.00
		SPIA	0.00	0.00	0.41	0.00	0.00	0.00	0.00	0.59
		DRaWR	0.01	0.02	1.00	0.02	0.00	0.00	0.00	1.00
		ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		ORA $_{GENE}$	0.04	0.00	1.00	0.00	0.20	0.00	0.00	1.00

	p	Method	Target: PPAR				Target: Calcium
	p	Method	PPAR	Calcium	MAPK	Focal	PPAR	Calcium	MAPK	Focal
25% error rate	750	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		NEAT	1.00	0.14	0.95	0.01	0.14	1.00	0.00	0.01
		SPIA	0.12	0.00	0.00	0.00	0.00	0.35	0.00	0.00
		DRaWR	0.91	0.02	0.07	0.00	0.00	1.00	0.25	0.06
		ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		ORA $_{GENE}$	1.00	0.00	1.00	0.00	1.00	1.00	0.10	0.00
	1000	Pro	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		NEAT	1.00	0.00	0.76	0.00	0.00	1.00	0.01	0.01
		SPIA	0.10	0.00	0.00	0.00	0.00	0.23	0.00	0.00
		DRaWR	0.65	0.04	0.06	0.01	0.01	1.00	0.23	0.04
		ORA $_{NW}$	1.00	0.00	0.00	0.00	0.00	1.00	0.00	0.00
		ORA $_{GENE}$	1.00	0.00	0.00	0.00	0.95	1.00	0.00	0.00
			Target: MAPK				Target: Focal
			PPAR	Calcium	MAPK	Focal	PPAR	Calcium	MAPK	Focal
	750	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		NEAT	0.77	0.01	1.00	0.00	0.00	0.00	0.01	1.00
		SPIA	0.00	0.00	0.06	0.00	0.00	0.00	0.00	0.67
		DRaWR	0.12	0.74	1.00	0.52	0.00	0.00	0.00	1.00
		ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		ORA $_{GENE}$	1.00	0.00	1.00	0.00	1.00	0.00	1.00	1.00
	1000	Pro	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		NEAT	0.88	0.01	1.00	0.01	0.00	0.00	0.07	1.00
		SPIA	0.00	0.00	0.03	0.00	0.00	0.00	0.00	0.47
		DRaWR	0.14	0.41	0.92	0.36	0.00	0.00	0.00	1.00
		ORA $_{NW}$	0.00	0.00	1.00	0.00	0.00	0.00	0.00	1.00
		ORA $_{GENE}$	0.91	0.00	1.00	0.00	0.77	0.00	0.00	1.00

These results demonstrated that our strategy provides stable performance across query networks with substantially different sizes and densities, ranging from sparse networks with fewer than 30 genes (i.e., myeloid neoplasm-related pathway analysis) to dense networks containing up to 300 genes (i.e., pan-cancer-related pathways). Moreover, consistent enrichment results were observed across target pathway-related networks of varying scales (250–1000 genes), indicating that the method is robust to variations in network density and degree structure and is not driven by highly connected hub genes.
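The true-positive and false-positive rates reported in the simulation tables are proportions of simulations in which a pathway was called enriched at $p < 0.01$. A minimal sketch of that bookkeeping is given below; the function name `detection_rates`, the input layout, and the toy p-values are assumptions for illustration and are not simulation outputs from this study.

```python
def detection_rates(p_values_by_pathway, target, alpha=0.01):
    """Compute the detection rate per pathway across simulations.

    p_values_by_pathway : dict mapping pathway name -> list of p-values,
                          one per simulation run (assumed layout)
    target              : name of the pathway that is truly enriched
    alpha               : significance threshold
    Returns (true_positive_rate, dict of false_positive_rates).
    """
    rates = {
        name: sum(p < alpha for p in pvals) / len(pvals)
        for name, pvals in p_values_by_pathway.items()
    }
    true_positive = rates[target]
    false_positive = {n: r for n, r in rates.items() if n != target}
    return true_positive, false_positive


# Toy p-values for four simulated runs (hypothetical numbers).
toy = {
    "PPAR": [0.001, 0.002, 0.0005, 0.003],
    "MAPK": [0.2, 0.005, 0.6, 0.9],
}
tpr, fpr = detection_rates(toy, target="PPAR")
print(tpr, fpr)
```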

5. UNCOVERING IMMUNE DISEASE PATHWAYS FOR THE GENE NETWORK OF AML

We aimed to explore the enriched immune disease pathways for AML-specific gene networks. We estimated the gene networks in AML-aged and -young and healthy-aged and -young samples and identified the enriched immune disease pathways in each network. We used publicly available single-cell gene expression data from bone marrow donors (Triana et al., 2021). The dataset encompasses expression levels for 461 genes across 49,507 cell lines, sourced from “3 healthy-young and 3 healthy-aged bone marrow donors.” We also used the “15 leukemic bone marrow donors” dataset, consisting of the expression levels for 458 genes in 31,586 cell lines, where the cell lines with a development stage of $\leq 56$ and with a development stage of $\geq 61$ are defined as AML-young cell lines (20,154 cell lines) and AML-aged cell lines (11,432 cell lines), respectively. For the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups, we estimated four gene networks based on the expression levels of each group; these four networks are the query networks of the GNEA. Figure 2 shows the query gene networks of the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups. For clarity of visualization, we focused on the 0.1% of edges with the largest weights. As shown in Figure 2, the gene networks of the healthy groups exhibit more active interplays, characterized by a larger number of edges, compared with those of the AML groups. The HLA and S100A gene families were hub genes across all groups. The interplays involving EEF1A1, GAS5, RACK1, HINT1, NPM1, B2M, and HLA-A/B are specific to the healthy groups and are absent from the gene networks of the AML groups.
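The visualization filter mentioned above (keeping only the 0.1% of edges with the largest weights) can be sketched as below. Because the edges carry signed weights, this sketch assumes that "largest" refers to the absolute weight; that choice, the function name `top_weight_edges`, and the toy weights are assumptions and not details confirmed by the text.

```python
import math


def top_weight_edges(weights, fraction=0.001):
    """Keep the given fraction of edges with the largest absolute weights.

    weights  : dict mapping (source, target) -> signed edge weight
    fraction : share of edges to keep (0.001 corresponds to 0.1%)
    """
    # Always keep at least one edge so that tiny networks are not emptied.
    n_keep = max(1, math.ceil(len(weights) * fraction))
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:n_keep])


# Toy network with four directed edges (hypothetical weights).
weights = {
    ("A", "B"): 0.9,
    ("B", "C"): -0.1,
    ("C", "D"): -0.7,
    ("D", "A"): 0.2,
}
kept = top_weight_edges(weights, fraction=0.5)
print(kept)
```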

FIG. 2.

Query gene network of the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups. Directed arrows ( $\oplus \to \otimes$ ) indicate regulatory effects from $\oplus$ to $\otimes$ . The thickness of each edge reflects its weight, while the edge color denotes the sign of regulation (blue: positive; red: negative). AML, acute myeloid leukemia.

We then collected the eight immune disease pathways from the KEGG pathway database and used them as the target pathways of our GNEA (see Table 7). For each of the eight target immune disease pathways, we estimated the target networks among the genes involved in the pathway based on the expression levels of the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups, respectively. The enrichment analysis of the gene network for the “healthy-young” group was performed based on the degree of overlapping edges between the network of the “healthy-young” group and the networks of the genes involved in the eight target pathways, estimated from the expression levels of the “healthy-young” group. For the “healthy-aged,” “AML-young,” and “AML-aged” gene networks, the enrichment analysis was performed in the same manner.

Table 7.

Immune Disease Pathways

Target immune disease pathways	Entry	♯ genes
Asthma	hsa05310	31
Autoimmune thyroid disease	hsa05320	53
Inflammatory bowel disease	hsa05321	65
Systemic lupus erythematosus	hsa05322	139
Rheumatoid arthritis	hsa05323	94
Allograft rejection	hsa05330	38
Graft-versus-host disease	hsa05332	44
Primary immunodeficiency	hsa05340	38

Table 8 shows the GNEA results (i.e., the p-values of the target immune disease pathways for the gene networks of the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups). As shown in Table 8, the gene networks of the AML-aged and -young groups were enriched in the inflammatory bowel disease (IBD; hsa05321) pathway, whereas the gene networks of the healthy groups were not enriched in this pathway. The gene network of the healthy-young group was enriched in the asthma (hsa05310) pathway.

Table 8.

Enriched Immune Disease Pathway Results for Bone Marrow-Related Gene Networks, Where the Numbers Indicate p-Value of the Enrichment Analysis of the Gene Networks of AML-Aged/-Young and Healthy-Aged/-Young Samples for the Immune Disease Pathways

		Gene networks
Immune disease pathways		AML		Healthy
Name	Entry	Aged	Young	Aged	Young
Asthma	hsa05310	$3.07 \times 10^{-1}$	$8.70 \times 10^{-2}$	$5.79 \times 10^{-1}$	$7.00 \times 10^{-3}$
Autoimmune thyroid disease	hsa05320	$8.15 \times 10^{-1}$	$6.39 \times 10^{-1}$	$1.00 \times 10^{0}$	$1.00 \times 10^{0}$
Inflammatory bowel disease	hsa05321	$4.00 \times 10^{-3}$	$3.00 \times 10^{-3}$	$9.98 \times 10^{-1}$	$9.98 \times 10^{-1}$
Systemic lupus erythematosus	hsa05322	$8.43 \times 10^{-1}$	$5.94 \times 10^{-1}$	$1.00 \times 10^{0}$	$1.00 \times 10^{0}$
Rheumatoid arthritis	hsa05323	$4.92 \times 10^{-1}$	$1.09 \times 10^{-1}$	$9.99 \times 10^{-1}$	$9.95 \times 10^{-1}$
Allograft rejection	hsa05330	$7.27 \times 10^{-1}$	$4.42 \times 10^{-1}$	$1.00 \times 10^{0}$	$1.00 \times 10^{0}$
Graft-versus-host disease	hsa05332	$3.59 \times 10^{-1}$	$2.30 \times 10^{-1}$	$1.00 \times 10^{0}$	$1.00 \times 10^{0}$
Primary immunodeficiency	hsa05340	$3.26 \times 10^{-1}$	$5.59 \times 10^{-1}$	$1.00 \times 10^{0}$	$6.34 \times 10^{-1}$

5.1. Inflammatory bowel disease: hsa05321

IBD is a chronic inflammatory disorder of the gastrointestinal tract, including Crohn’s disease and ulcerative colitis. Previous epidemiological and clinical studies have reported an increased risk of hematological malignancies, particularly AML, in patients with IBD (Askling et al., 2005). The enrichment of the IBD pathway in AML-aged and AML-young gene networks suggests that immune-mediated inflammatory processes associated with IBD may be reflected in AML-specific network structures.

5.2. Asthma: hsa05310

Asthma is a chronic inflammatory disease of the airways characterized by dysregulated immune responses and typically manifests early in life. The enrichment of the asthma pathway in the healthy-young gene network, but not in AML networks, may reflect age-dependent immune regulatory patterns that are preserved in healthy individuals and disrupted in disease conditions.

To interpret the significantly enriched pathways (i.e., IBD and asthma), we extracted information regarding the molecular interplays of each group. That is, we extracted the overlapping edges between the gene networks of the four groups and the target gene networks of the significantly enriched pathways. The networks comprising the extracted edges can be considered the molecular interplays involved in the enriched pathways. Table 9 lists the genes comprising the networks involved in the identified enriched pathways, together with their numbers of edges. These genes can be considered crucial markers for uncovering the biological meaning of the IBD- and asthma-enriched gene networks.
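The extraction step described above can be sketched as a set intersection of directed edges followed by a per-gene edge count. This is an illustrative sketch only: the function names are hypothetical, the edge lists are made-up toy examples rather than edges from the estimated networks, and counting both incoming and outgoing edges per gene is an assumption about how the edge counts are tallied.

```python
from collections import Counter


def overlapping_edges(query_edges, target_edges):
    """Return directed edges present in both the query and the target network."""
    return set(query_edges) & set(target_edges)


def edges_per_gene(edges):
    """Count, for each gene, the edges it takes part in (incoming plus outgoing)."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


# Toy edge sets (hypothetical; gene names reused only for readability).
query = {("HLA-DRA", "IL4R"), ("JUN", "HLA-DPB1"), ("IL1B", "JUN"), ("NPM1", "B2M")}
target = {("HLA-DRA", "IL4R"), ("JUN", "HLA-DPB1"), ("IL1B", "JUN"), ("IL21R", "IL4R")}

shared = overlapping_edges(query, target)
counts = edges_per_gene(shared)
print(sorted(shared))
print(counts.most_common())
```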

Table 9.
Genes and $#$ of Edges in Enriched Pathway-Related Networks

AML				Healthy
Aged		Young		Aged		Young
Genes	♯ edges	Genes	♯ edges	Genes	♯ edges	Genes	♯ edges
hsa05321: Inflammatory bowel disease (sig in aged and young-AML)
HLA-DPA1	12	HLA-DPA1	14	HLA-DPA1	11	HLA-DPA1	10
HLA-DPB1	13	HLA-DPB1	13	HLA-DPB1	12	HLA-DPB1	12
HLA-DQB1	14	HLA-DQB1	15	HLA-DQB1	14	HLA-DQB1	14
HLA-DRA	14	HLA-DRA	14	HLA-DRA	15	HLA-DRA	15
HLA-DRB1	13	HLA-DRB1	14	HLA-DRB1	12	HLA-DRB1	12
HLA-DRB5	16	HLA-DRB5	16	HLA-DRB5	16	HLA-DRB5	16
IL1B	9	IL1B	9	IL1B	6	IL1B	6
IL21R	4	IL21R	6	IL21R	4	IL21R	4
IL4R	7	IL4R	9	IL4R	12	IL4R	11
JUN	12	JUN	12	JUN	10	JUN	10
hsa05310: Asthma (sig in young-healthy)
CD40	7	CD40	7	CD40	16	CD40	14
CD40LG	1	CD40LG	1	CD40LG	8	CD40LG	8
FCER1A	15	FCER1A	15	FCER1A	19	FCER1A	16
FCER1G	14	FCER1G	14	FCER1G	14	FCER1G	13
HLA-DPA1	17	HLA-DPA1	17	HLA-DPA1	19	HLA-DPA1	17
HLA-DPB1	17	HLA-DPB1	16	HLA-DPB1	17	HLA-DPB1	16
HLA-DQB1	15	HLA-DQB1	15	HLA-DQB1	18	HLA-DQB1	16
HLA-DRA	16	HLA-DRA	16	HLA-DRA	20	HLA-DRA	18
HLA-DRB1	15	HLA-DRB1	15	HLA-DRB1	21	HLA-DRB1	17
HLA-DRB5	17	HLA-DRB5	17	HLA-DRB5	19	HLA-DRB5	17
PRG2	8	PRG2	8	PRG2	11
RNASE3	4	RNASE3	3	RNASE3	12

The AML-aged and -young gene networks involved in the IBD pathway are dominated by HLA-D (i.e., HLA-DPA1, HLA-DPB1, HLA-DQB1, HLA-DRA, HLA-DRB1, and HLA-DRB5) and IL (IL1B, IL21R, and IL4R) genes, while CD40 (CD40 and CD40LG), FCER1 (FCER1A and FCER1G), and HLA-D genes dominate the healthy-young networks involved in the asthma pathway.

5.3. HLA class II: HLA-DP, HLA-DQ, and HLA-DR

HLA class II molecules are primarily expressed on antigen-presenting cells and play a central role in immune recognition. Nelde et al. (2023) reported that HLA class II-presented antigens are associated with leukemia stem and progenitor cells, and Cao et al. (2020) identified HLA-DP and HLA-DQ variants linked to increased AML risk. The enrichment of HLA class II genes in our analysis suggests a potential involvement of altered antigen presentation in AML-specific network structures.

5.4. CD40 genes: CD40 and CD40LG

CD40 and CD40LG mediate immune cell interactions and have been implicated in AML. Previous studies have shown that altered CD40 signaling affects immune recognition, prognosis, and therapeutic response in AML (Aldinucci et al., 2002; Hock et al., 2006; Feng and Chen, 2020; Li et al., 2023). Their enrichment in our results highlights the role of CD40-related immune interactions in AML-associated gene networks.

5.5. FCER1 genes: FCER1A and FCER1G

FCER1 genes are involved in immune signaling and have been associated with AML progression. FCER1G has been reported to be upregulated in AML, linked to poor prognosis, and identified as a hub gene in AML-related networks (Tan et al., 2020). The enrichment of FCER1-related genes supports their relevance in AML-specific immune network architectures.

5.6. IL genes: IL1B and IL21R

Grauers Wiktorin et al. (2021) demonstrated that interleukin-1 beta (IL-1β), a proinflammatory cytokine, plays a role in the expansion of myeloid progenitors in AML and contributes to the suppression of lymphocyte-mediated immunity against malignant cells by myeloid cells. Chen et al. (2024) uncovered novel predictors for outcomes in AML and suggested further investigation into the potential use of the TXNIP, NLRP3, and IL1B genes for targeted therapies in AML. Chen et al. (2024) also suggested that IL-1β may serve as a target for preventing relapse and improving survival outcomes in individuals with AML. These results emphasize the critical role of ex vivo functional screening for pinpointing common and actionable extrinsic pathways in genetically heterogeneous cancers. Carey et al. (2017) highlighted that prior research has catalyzed the clinical development of therapies focused on the IL-1/IL1R1/p38MAPK signaling pathway for use in AML. IL-21R expression was absent on monocytes derived from the lymph nodes of patients with AML (Ma et al., 2011).

The literature reviewed above supports the identified genes as key markers of AML. Figure 3 shows the enriched gene networks for the IBD pathway. The mutual suppression between HLA-DPB1 and IL4R can be considered an AML-specific molecular interplay in the IBD-enriched gene networks, whereas these genes activate each other in the gene networks of the healthy groups (i.e., healthy-aged and -young groups). Furthermore, the mutual inhibition between JUN and HLA-DPB1 is also an AML-specific molecular interplay that disappears in the gene networks of the healthy groups.

FIG. 3.

Inflammatory bowel disease (hsa05321) pathway-enriched gene networks. Directed arrows ( $\oplus \to \otimes$ ) indicate regulatory effects from $\oplus$ to $\otimes$ . The thickness of each edge reflects its weight, while the edge color denotes the sign of regulation (blue: positive; red: negative).

Figure 4 shows the asthma-enriched gene networks for the “healthy-young,” “healthy-aged,” “AML-young,” and “AML-aged” groups. CD40 showed higher activity in the networks of the healthy-aged and -young groups than in those of the AML groups. The interplay between RNASE3 and PRG2 was observed in the networks of the AML-aged, AML-young, and healthy-aged groups but disappeared in the network of the healthy-young group; this disappearance can therefore be considered a specific molecular characteristic of the healthy-young network enriched in the asthma pathway. The hubness of CD40LG is also considered a health-specific characteristic, whereas CD40LG inhibits only HLA-DPA1 in the gene networks of the AML-aged and -young groups. Given the density of the networks in Figures 2 and 3, all edges and their corresponding edge weights are provided in the Supplementary Data for completeness.

FIG. 4.

Asthma (hsa05310) pathway-enriched gene networks. Directed arrows ( $\oplus \to \otimes$ ) indicate regulatory effects from $\oplus$ to $\otimes$ . The thickness of each edge reflects its weight, while the edge color denotes the sign of regulation (blue: positive; red: negative).

Our results of the IBD and asthma pathway-enriched gene network analyses suggest that controlling the activation of CD40 and CD40LG and the mutual activation between HLA-DPB1 and IL4R and between JUN and HLA-DPB1 may provide crucial clinical clues for revealing AML-related mechanisms.

To demonstrate the statistical validity of the observed pathway enrichment results, we performed permutation testing at the network level. For each of the AML-young, AML-aged, healthy-young, and healthy-aged gene networks, permutation networks were generated by randomly selecting edges while preserving the total number of edges in the original network. For each group, 200 permutation networks were constructed. We then applied our strategy to each permutation network across the eight immune disease pathways. Permutation-based p-values were computed for each network–pathway combination, and the average p-values across the 200 permutations were summarized. Notably, none of the immune disease pathways showed significant enrichment in the permutation networks, with the averaged permutation p-values for all four networks across the eight immune disease pathways equal to one. This result indicates that the significant pathway enrichments observed in the original AML gene networks are unlikely to arise from random network structures with matched edge density, providing statistical validation of the pathway enrichment analysis and suggesting that the results reflect disease-relevant network organization.
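The generation of permutation networks described above (random edge selection that preserves the total number of edges) can be sketched as follows. This is a minimal sketch under stated assumptions: edges are directed, self-loops are excluded, and candidates are drawn uniformly from all ordered gene pairs. The function name `permutation_networks` and its parameters are hypothetical, and the enrichment test that would be applied to each permuted network is not shown.

```python
import random
from itertools import permutations


def permutation_networks(genes, n_edges, n_perm=200, seed=0):
    """Generate random directed networks with a fixed number of edges.

    genes   : list of gene names in the original network
    n_edges : number of edges in the original network (kept constant)
    n_perm  : number of permutation networks to generate
    seed    : seed for reproducibility
    """
    rng = random.Random(seed)
    # All ordered gene pairs without self-loops form the candidate edge pool.
    candidates = list(permutations(genes, 2))
    if n_edges > len(candidates):
        raise ValueError("n_edges exceeds the number of possible directed edges")
    return [set(rng.sample(candidates, n_edges)) for _ in range(n_perm)]


# Toy example: four genes, five edges, three permutation networks.
nets = permutation_networks(["A", "B", "C", "D"], n_edges=5, n_perm=3, seed=1)
for net in nets:
    print(sorted(net))
```

Each permuted network would then be passed through the same enrichment procedure as the original network, and the resulting p-values averaged per network and pathway.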

6. DISCUSSION

Our study aimed to uncover enriched functional pathways of gene networks for interpretable gene network analysis. We developed a novel computational methodology for GNEA. Specifically, our method measures the enrichment of a gene network based on the overrepresentation of edges shared between the gene network of interest and the network of genes involved in the functional pathway. Similar edge structures between the gene network of interest and the functional network indicate that the gene network of interest is enriched in the specific functional pathway. To illustrate the efficacy of the proposed strategy, we performed Monte Carlo simulations based on myeloid neoplasm- and pan-cancer-related pathway-enriched gene network analyses. In these simulations, our strategy detected the target pathways with high true-positive rates while keeping false-positive rates low.

We applied the proposed method to explore the enriched immune disease pathways in AML-specific gene networks. For the AML-aged and -young and healthy-aged and -young groups, we estimated each gene network and then applied our method to reveal the enriched immune disease pathways of the estimated networks. Our results demonstrate that the AML-aged and -young networks were enriched in the IBD pathway, whereas the healthy-young network was enriched in the asthma pathway. The results of the IBD pathway-enriched gene network analysis revealed the “suppression of the interplay between HLA-DPB1 and IL4R” and “inhibition of the interplay between JUN and HLA-DPB1” as AML-specific molecular interplays. In the asthma pathway-enriched gene network analysis, the disappearance of the interplay between RNASE3 and PRG2 was identified as a specific molecular characteristic of healthy-young individuals. Furthermore, hubness of CD40LG was identified as a health-specific characteristic. Our results, combined with a review of the literature, suggest that activation of CD40 and CD40LG and mutual activation between HLA-DPB1 and IL4R and between JUN and HLA-DPB1 may provide crucial insights into understanding AML-related mechanisms.

In the current study, we described our strategy for gene networks estimated by a regression framework with the elastic net. However, our strategy can be applied to networks constructed using a wide range of approaches, including correlation-based networks, regression-based gene regulatory networks (e.g., elastic net, lasso), protein–protein interaction-derived networks, or other externally defined prior networks. Importantly, the accuracy of network estimation is a critical issue for downstream enrichment analysis, as uncertainty or instability in the inferred network may affect the enrichment results. While the proposed framework treats the estimated network as an input object, it can be combined with upstream procedures such as bootstrap-based network estimation or stability selection to improve network reliability and assess robustness across network realizations.
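One way to picture the bootstrap-based upstream procedure mentioned above is to re-estimate the network on resampled data and record how often each edge is selected. The sketch below is an illustration under assumptions, not part of the proposed method: `edge_selection_frequency` and `toy_estimator` are hypothetical names, and the toy estimator (a sign check on a covariance) merely stands in for a real network estimator such as an elastic net regression.

```python
import random
from collections import Counter


def edge_selection_frequency(samples, estimate_edges, n_boot=100, seed=0):
    """Estimate how often each edge is selected across bootstrap resamples.

    samples        : list of observations (any structure the estimator accepts)
    estimate_edges : callable mapping a list of observations to a set of edges
    n_boot         : number of bootstrap resamples
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(samples)
    for _ in range(n_boot):
        # Resample observations with replacement, then re-estimate the network.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        counts.update(set(estimate_edges(resample)))
    return {edge: c / n_boot for edge, c in counts.items()}


def toy_estimator(rows):
    """Stand-in estimator: draw A -> B when the two columns covary positively."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return {("A", "B")} if cov > 0 else set()


rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]  # hypothetical data
freq = edge_selection_frequency(rows, toy_estimator, n_boot=50, seed=0)
print(freq)
```

Edges with a selection frequency above a chosen cutoff could then be retained as the query network before the enrichment analysis; the cutoff itself would need to be set by the analyst.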

Although our strategy demonstrated consistently effective performance across gene networks of various scales, potential biases arising from network density and degree heterogeneity could be mitigated further. As future work, we will explore incorporating degree-preserving randomization as an explicit normalization strategy, thereby enhancing the robustness of the proposed framework across diverse network structures.
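Degree-preserving randomization is commonly implemented by repeated edge swaps. The sketch below shows one such variant for directed networks: two edges (a, b) and (c, d) are replaced by (a, d) and (c, b), which leaves every gene's in-degree and out-degree unchanged. It is offered only as an illustration of the idea raised above, not as a component of the current method; the function names and the toy network are assumptions.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a directed network by edge swaps that preserve in/out degrees."""
    rng = random.Random(seed)
    edge_list = sorted(set(edges))  # sorted for reproducibility across runs
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against networks with few valid swaps
    while done < n_swaps and attempts < max_attempts and len(edge_list) >= 2:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_set


def degree_sequences(edges):
    """Return (out-degree, in-degree) counters for a set of directed edges."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


# Toy network (hypothetical edges).
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")}
rewired = degree_preserving_rewire(edges, n_swaps=10, seed=1)
print(sorted(rewired))
print(degree_sequences(edges) == degree_sequences(rewired))
```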

A potential limitation of our strategy is that larger networks tend to exhibit a higher number of overlapping edges with pathway-specific networks simply due to their increased size, which may confound the results of GNEA. In the proposed method, this issue is partially alleviated by evaluating enrichment relative to a hypergeometric null model that explicitly conditions on the total number of edges in both the query network and the pathway-specific network. As a result, statistical significance reflects whether the observed overlap exceeds random expectation given the network sizes, rather than being driven by absolute edge counts. Nevertheless, future work may further refine the null model by incorporating matched network size or permutation-based strategies to more explicitly account for variability in network scale across different conditions.
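The size-conditioning argument above can be made concrete with a plain hypergeometric tail probability on edges. The sketch below is a simplified illustration and not the full proposed statistic: it ignores the hub-gene weighting used by the method, and it assumes that the universe of possible edges is the set of all ordered gene pairs, $p(p-1)$. The function name `edge_overlap_pvalue` and the toy numbers are hypothetical.

```python
from fractions import Fraction
from math import comb


def edge_overlap_pvalue(n_universe, n_target, n_query, n_overlap):
    """Upper-tail hypergeometric p-value for the edge overlap.

    n_universe : number of possible edges (assumed here to be p * (p - 1))
    n_target   : number of edges in the pathway (target) network
    n_query    : number of edges in the query network
    n_overlap  : number of edges shared by the two networks
    Returns P(X >= n_overlap) when n_query edges are drawn at random.
    """
    upper = min(n_target, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_target, i) * comb(n_universe - n_target, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic avoids overflow before the final conversion.
    return float(Fraction(tail, total))


# Toy example: 5 genes give 5 * 4 = 20 possible directed edges.
p_value = edge_overlap_pvalue(n_universe=20, n_target=5, n_query=4, n_overlap=2)
print(round(p_value, 4))
```

Because both network sizes enter the null distribution, a larger query network raises the expected overlap as well as the observed one, which is the sense in which the test conditions on network size.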

AUTHORS’ CONTRIBUTIONS

H.P. developed the method, performed the analysis, and drafted the article. S.M. supervised the work. All authors have read and approved the final version of the article.

DATA AVAILABILITY STATEMENT

The datasets used in the Application section are from the CELL×GENE database (https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d). The code and dataset used in our study are available on GitHub (https://github.com/HeewonGitHub/GNEA_JCB).

Footnotes

ACKNOWLEDGMENT

This research used the computational resources of the Supercomputer System, Human Genome Center, Institute of Medical Science, University of Tokyo.

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

This work was supported by the Sungshin Women’s University Research Grant of 2024 (Grant Recipient: Heewon Park). This research was also supported by AMED under grant numbers 23tk0124003h0001, 24tk0124003h0002, and 25tk0124003h0003, and JSPS KAKENHI grant number JP24H00009.

Supplemental Material

References

1. Aldinucci, Poletto, Nanni, et al. CD40L induces proliferation, self-renewal, rescue from apoptosis, and production of cytokines by CD40-expressing AML blasts. Exp Hematol, 2002; 30(11):1283–1292; doi: 10.1016/s0301-472x(02)00921-9

2. Askling, Brandt, Lapidus, et al. Risk of haematopoietic cancer in patients with inflammatory bowel disease. Gut, 2005; 54(5):617–622; doi: 10.1136/gut.2004.051771

3. Benjamini, Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B, 1995; 57(1):289–300; doi: 10.1111/j.2517-6161.1995.tb02031.x

4. Blatti, Sinha. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks. Bioinformatics, 2016; 32(14):2167–2175; doi: 10.1093/bioinformatics/btw151

5. Boyle, Weng, Gollub, et al. GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms. Bioinformatics, 2004; 20(18):3710–3715; doi: 10.1093/bioinformatics/bth456

6. Carey, Edwards, Eide, et al. Identification of interleukin-1 as a key mediator of cellular expansion and disease progression in acute myeloid leukemia. Cell Rep, 2017; 18(13):3204–3218; doi: 10.1016/j.celrep.2017.03.018

7. Cao, Wu, Qian, et al. Genetic variants in HLA-DP/DQ contribute to risk of acute myeloid leukemia: A case-control study in Chinese. Pathol Res Pract, 2020; 216(3):152829; doi: 10.1016/j.prp.2020.152829

8. Chen, Hou, Chang, et al. Analysis of prognostic biomarker models of the TXNIP/NLRP3/IL1B inflammasome pathway in patients with acute myeloid leukemia. Int J Med Sci, 2024; 21(8):1438–1446; doi: 10.7150/ijms.96627

9. Draghici, Khatri, Martins, et al. Global functional profiling of gene expression. Genomics, 2003; 81(2):98–104; doi: 10.1016/s0888-7543(02)00021-6

10. Feng, Chen. Raised CD40L expression attenuates drug resistance in adriamycin-resistant THP-1 cells. Exp Ther Med, 2020; 19(3):2188–2194; doi: 10.3892/etm.2020.8452

11. Grauers Wiktorin, Aydin, Christenson, et al. Impact of IL-1β and the IL-1 receptor antagonist on relapse risk and survival in AML patients undergoing immunotherapy. Oncoimmunology, 2021; 10(1):1944538; doi: 10.1080/2162402X.2021.1944538

12. Hock, McKenzie, Patton, et al. Circulating levels and clinical significance of soluble CD40 in patients with hematologic malignancies. Cancer, 2006; 106(10):2148–2157; doi: 10.1002/cncr.21816

13. Hoerl, Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970; 12(1):55–67; doi: 10.1080/00401706.1970.10488634

14. Huang, Sherman, Lempicki. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res, 2009; 37(1):1–13; doi: 10.1093/nar/gkn923

15. Kim, Volsky. PAGE: Parametric analysis of gene set enrichment. BMC Bioinformatics, 2005; 6:144; doi: 10.1186/1471-2105-6-144

16. Lee, Lee, Seok, et al. Regression-based network estimation for high-dimensional genetic data. J Comput Biol, 2019; 26(4):336–349; doi: 10.1089/cmb.2018.0225

17. Nelde, Schuster, Heitmann, et al. Immune Surveillance of Acute Myeloid Leukemia Is Mediated by HLA-Presented Antigens on Leukemia Progenitor Cells. Blood Cancer Discov, 2023; 4(6):468–489; doi: 10.1158/2643-3230.BCD-23-0020

18. Shimamura, Imoto, Shimada, et al. A novel network profiling analysis reveals system changes in epithelial-mesenchymal transition. PLoS One, 2011; 6(6):e20804; doi: 10.1371/journal.pone.0020804

19. Signorelli, Vinciotti, Wit EC. NEAT: an efficient network enrichment analysis test. BMC Bioinformatics, 2016; 17(1):352; doi: 10.1186/s12859-016-1203-6

20. Tan, Zheng, Du, et al. Identification of the hub genes and pathways involved in acute myeloid leukemia using bioinformatics analysis. Medicine (Baltimore), 2020; 99(35):e22047; doi: 10.1097/MD.0000000000022047

21. Tarca, Draghici, Khatri, et al. A novel signaling pathway impact analysis. Bioinformatics, 2009; 25(1):75–82; doi: 10.1093/bioinformatics/btn577

22. Tibshirani. Regression shrinkage and selection via the lasso. J R Stat Soc B, 1996; 58(1):267–288; doi: 10.1111/j.2517-6161.1996.tb02080.x

23. Triana, Vonficht, Jopp-Saile, et al. Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states. Nat Immunol, 2021; 22(12):1577–1589; doi: 10.1038/s41590-021-01059-0

24. Zou, Hastie. Regularization and variable selection via the elastic net. J R Stat Soc B, 2005; 67(2):301–320; doi: 10.1111/j.1467-9868.2005.00503.x
