TFF3 Gene as a Molecular Target Potentially Shared by Idiopathic Pulmonary Fibrosis and Pulmonary Hypertension: Drug Repurposing Prospect with Aminoglutethimide

Abstract

Idiopathic pulmonary fibrosis (IPF) and pulmonary hypertension (PH) are two chronic conditions that can coexist occasionally, resulting in high morbidity and mortality. Despite their clinical association, the underlying genetic mechanisms and therapeutic targets that might link these two chronic disorders remain poorly understood. The present study used in silico analyses and machine learning to uncover genetic features and potential therapeutic targets shared by IPF and PH. Differentially expressed genes (DEGs) were identified using RNA sequencing data from the Gene Expression Omnibus, which revealed a total of 13 common DEGs between IPF and PH. Importantly, among the identified genes, TFF3 was significantly upregulated in both diseases. TFF3 is targeted by aminoglutethimide as identified through the Drug Gene Interaction Database, and with an interaction score of 3.26. Using the Protein Contact Atlas, PROCHECK, PROSA, and ProtParam tools, the structural model of TFF3 was validated. Finally, molecular docking analysis demonstrated a binding affinity score of −6.1 kcal/mol between TFF3 and aminoglutethimide, indicating a stable and potentially effective interaction between aminoglutethimide and the target protein. Aminoglutethimide displayed favorable ADMET properties as well. In conclusion, this in silico study reports (1) potential overlapping molecular links between IPF and PH, and (2) in silico potential of aminoglutethimide and TFF3 in drug repurposing for therapeutic interventions targeting both IPF and PH. These findings also challenge the traditional paradigm of pharmaceutical innovation that has long relied on the “one drug, one disease” premise, and highlight the potentials of “one drug, polydisease” paradigm of drug discovery and development.

Introduction

Idiopathic pulmonary fibrosis (IPF) is a chronic progressive interstitial lung disease (ILD) characterized by rapid scarring or fibrosis of the lung parenchyma, leading to a decline in respiratory function (Richeldi et al., 2017). The prevalence of IPF varies globally, ranging from 7 to 1650 cases per 100,000 individuals, with higher rates observed in older populations (Shah Gupta et al., 2023). Diagnosis is often challenging and typically involves high-resolution computed tomography scans, and in some cases, lung biopsies to be able to ascertain the hallmark patterns of the disease.

IPF frequently coexists with comorbidities such as cardiovascular diseases, gastroesophageal reflux disease, and pulmonary hypertension (PH), which significantly worsen symptoms and clinical outcomes of the disease (Caminati et al., 2019; Dasgupta, 2025a). Among these, PH is particularly severe, as it often worsens breathlessness, limits exercise capacity, and is associated with reduced survival. PH is characterized by elevated blood pressure in the pulmonary arteries, which, if left untreated, can progress to right heart failure (Choudhury et al., 2021; Collum et al., 2017). The cross talk between IPF and PH is mediated through overlapping mechanisms such as vascular remodeling, chronic inflammation, and hypoxia-driven signaling cascades. It is well known that both diseases share common hallmarks such as chronic inflammation, Transforming Growth Factor-beta (TGF-β) signaling, oxidative stress, and endothelial-to-mesenchymal transition (EndMT) (Ruffenach et al., 2020). In PH, structural alterations affect all layers of the vascular wall, with key involvement of endothelial and smooth muscle cells (Graham et al., 2011). In IPF, fibroblast foci and disrupted alveolar architecture dominate the pathology (Yamashita et al., 2009). Identifying common genetic features between IPF and PH is crucial, as it may uncover novel molecular targets for therapeutic intervention (Alharbi, 2024; Collum et al., 2017; Ruffenach et al., 2020).

The reported prevalence of PH among patients with IPF varies widely, especially among those evaluated for lung transplantation. Studies have described PH in 20–46% of IPF patients undergoing right heart catheterization (RHC) as part of transplant assessment (Patel et al., 2007; Rivera-Lebron et al., 2013; Shorr et al., 2007). Notably, Nathan et al. (2008) demonstrated that PH prevalence increased from 38.6% at initial transplant evaluation to 86.4% at the time of transplantation, suggesting that disease progression and timing of diagnostic assessment critically influence detection rates (Nathan et al., 2008). Furthermore, a population-based study from the United States in 2018 reported 30,335 hospitalizations with a diagnosis of IPF, of which 8075 (26.6%) were also diagnosed with PH (Desai et al., 2024). These data reflect the significant clinical burden and demographic disparities in the coexistence of IPF and PH.

Despite the clinical association between IPF and PH, the underlying genetic mechanisms and therapeutic targets that might link these two chronic disorders remain inadequately understood. Hadi et al. (2023) reported the potential of saracatinib and sotatercept in managing IPF associated with PH (Hadi et al., 2023). Rajagopal et al. (2021) reviewed the molecular mechanisms linking lung fibrosis and vascular remodeling, in support of the development of targeted pharmacotherapies. Waxman et al. (2022) discussed diagnostic and therapeutic advances in PH associated with ILDs, highlighting the significance of RHC in diagnosis. Collum et al. (2017) explored the high prevalence of PH in IPF patients and the associated mortality risk, calling for deeper insights into molecular mechanisms and treatment strategies. Gaikwad et al. (2020) delved into EndMT as a crucial mechanism connecting vascular remodeling and fibrosis, proposing it as a potential therapeutic target. Collectively, these studies underscore the urgent need for innovative approaches to understand and manage the intricate relationship between IPF and PH.

Bioinformatic-based research has identified several genetic targets and predictive drug candidates that may be effective for the treatment of conditions that can occasionally overlap such as IPF and PH (Barrett et al., 2013; Dasgupta, 2025b). Studies have also used detailed structural investigation of the identified important proteins to understand their physicochemical and stereochemical properties (Hossain et al., 2022; Tripathi et al., 2024). Bioinformatic tools such as ProtParam can be used to evaluate the physicochemical properties of common proteins implicated in both IPF and PH, while the Protein Contact Atlas provides structural information. Furthermore, PROCHECK tool aids in the assessment of stereochemical quality of the target protein (Kayikci et al., 2018). In silico docking studies involving these shared proteins and potential therapeutic compounds offer critical insights into drug–protein interactions. Molecular docking helps predict the binding affinity and orientation of drugs within the active sites of target proteins, aiding in the rational design of precision therapeutics.

The present study leverages bioinformatic analyses and tools as noted above to uncover the potential genetic overlaps between IPF and PH, aiming to deepen our understanding of the molecular interconnections underlying these diseases. A predictive machine learning model, built on the basis of altered genes, evaluated their significance as potential disease biomarkers. The expression of these genes was validated using independent datasets to ensure robustness. In addition, an in silico analysis highlighted a key protein and its interacting drug molecule, with their interaction confirmed through molecular docking analysis. A detailed structural snapshot of the target protein and the drug candidate provided further insights into its functional dynamics, potentially informing the future design of drug repurposing and precision medicine strategies.

Materials and Methods

Identification of the DEGs in IPF and PH

The data harnessed in this study were obtained from the publicly available and accessible National Center for Biotechnology Information-Gene Expression Omnibus (NCBI-GEO) database (http://www.ncbi.nlm.nih.gov/geo/) and did not require research ethics board approval or informed consent. The study and analyses reported herein were conducted under the overall research ethics oversight of the author’s institution.

Using the keywords “idiopathic pulmonary fibrosis” and “pulmonary hypertension,” relevant datasets were identified and screened. Datasets were selected based on the following criteria: (1) human-derived samples, (2) relevance to IPF or PH, (3) inclusion of both patient and healthy control samples, and (4) availability of raw or processed RNA-seq data. This study incorporated the datasets GSE231693 and GSE236251, which provide insights into gene expression changes in IPF and PH compared with controls, respectively.

The GSE231693 dataset comprises bulk RNA sequencing data generated from lung tissue samples of 20 IPF patients and 18 healthy controls. Sequencing was performed using the Illumina HiSeq 4000 platform (Jia et al., 2023). The dataset was developed by Genentech from explanted lung tissues collected during transplantation from patients diagnosed with ILD, including systemic sclerosis-associated ILD, and IPF. The broader cohort included a total of 208 subjects with a mean age of 48.09 years (standard deviation: 12.43). The median forced vital capacity percentage predicted was 82.48, with an interquartile range of 71.12–93.30. For the present study, a subset of this cohort was selected, consisting of 20 IPF patients and 20 healthy controls.

The GSE236251 dataset comprises RNA sequencing data obtained from pulmonary artery trunk samples collected during heart transplantation. The original cohort for this dataset consisted of four groups as follows: donor controls (n = 33), patients with left heart disease without PH (LHD w/o PH; n = 41), patients with PH-LHD (n = 49), and patients with pulmonary arterial hypertension (n = 4). Demographic characteristics showed that the donor group included 10 females and 23 males with a median age of 41 years (range: 16–72). The PH-LHD group included 14 females and 35 males with a median age of 55 years (range: 31–65). Median body weights varied across subgroups, with females in the PH-LHD group averaging 65 kg (range: 50–91) and males 85 kg (range: 55–127). For the purpose of the present analysis, pulmonary artery trunk samples from patients with PH-LHD (n = 18) and healthy donor controls (n = 4) were included to ensure direct disease-relevant comparison. RNA was extracted using the TRIzol method following snap-freezing of the arterial tissue in liquid nitrogen. Sequencing was conducted using the Illumina HiSeq 2000 platform (Kucherenko et al., 2023). The overall study design of the present study is shown in Figure 1.

FIG. 1.

Workflow of the study design.

Screening of common DEGs

The GEO2R package was used to analyze RNA sequencing data from the GSE231693 and GSE236251 datasets. The updated definitions provided by GEO2R for sample and control groups were followed during dataset analysis (https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html#groups). This tool uses the limma and GEOquery algorithms to perform gene expression analyses, identifying DEGs with rigorous control over multiple testing. The significant DEGs were initially screened using a raw p value threshold of <0.05. Subsequently, gene expressions meeting the criteria of |log₂(fold change)| > 2 and an adjusted p value <0.05 were selected for further analysis. The adjusted p values were calculated using the Benjamini–Hochberg method to control for false discovery rate). This two-step filtering approach ensures robust statistical significance while minimizing type I errors that are common in high-throughput RNA-seq analyses.

Validation of the common genes using independent dataset

The expression levels of the identified DEGs were validated using additional datasets obtained from the NCBI-GEO repository. Two datasets, GSE79544 and GSE188938, were selected for this validation process. The GSE79544 dataset includes RNA-seq transcriptomic profiles from 7 normal smoking controls and 16 IPF smoking patients, based on RNA extracted from macrophages isolated from bronchoalveolar lavage fluid (Lee et al., 2018). The GSE188938 dataset comprises 12 peripheral blood samples, including 7 samples from patients with chronic thromboembolic PH (CTEPH), a subtype of PH, and 5 samples from healthy individuals (Xu et al., 2022a). Validation in these biologically distinct but disease-relevant tissues further supports the robustness and cross-cohort consistency of the DEGs identified in the discovery datasets.

Machine learning model for biomarker candidate discovery

The machine learning approach was used to evaluate the potential of the common DEGs as disease biomarkers using the MetaboAnalyst web tool (Pang et al., 2021; Xia et al., 2009). The random forest algorithm was applied to assess the predictive value of the genes, with the analysis conducted using 500 trees and 7 predictors. The algorithm was implemented with 500 trees, as this is the default and widely accepted setting in the MetaboAnalyst platform. This configuration is supported by the foundational work of Breiman (2001) and Liaw and Wiener (2002), who demonstrated that increasing the number of trees improves classification stability until reaching a performance plateau (Breiman, 2001; Liaw and Wiener, 2002). This approach is also consistent with established practices in omics-based random forest applications (Pawloski et al., 2022; Pink, 2016). Feature importance plots were generated to rank the significance of each gene in distinguishing disease states.

Correlation and comparison between gene expression in IPF and PH

The correlation between gene expression in IPF and PH was analyzed. A heatmap was generated to visualize the relationships between gene expression levels across the two conditions. Pearson correlation coefficients were used to quantify the strength and direction of the associations. Darker blue indicates negative correlation, whereas the red color suggests positive correlation between the genes of IPF and PH subjects. The expression levels of the genes were also compared between IPF and PH using GraphPad Prism version 8.0.1. The bar chart plots showed the expression of these DEGs in both cases.

Drug–gene interaction identification

The Drug Gene Interaction Database (DGIdb) was used to identify potential molecular drug candidates that could target the commonly dysregulated genes in IPF and PH. DGIdb (https://www.dgidb.org/search_interactions) is a comprehensive resource for drug–gene interactions, containing information about human drugs, “druggable genes,” and various drug–gene interactions from multiple sources. At present, it includes over 14,144 drug–gene interactions covering 2611 human genes and 6307 drugs (Wagner et al., 2016).

Absorption, distribution, metabolism, excretion, and toxicity studies

Absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis is essential for assessing the pharmacodynamic properties of potential drug candidates. This analysis is based on the ligand SMILES, obtained via the PubChem tool, and provides a structured approach to evaluating drug viability. This approach aids in the systematic assessment of drug-likeness and safety profiles of the identified potential drug from DGIdb, thereby supporting the selection of promising therapeutic candidates with favorable ADMET characteristics (Ferreira and Andricopulo, 2019).

Toxicity and bioavailability prediction

The toxicity levels of the predicted drugs were thoroughly examined using the Protox-II webtool (https://tox-new.charite.de/protoxII/). Protox-II uses computational models trained on real in vitro and in vivo data to estimate the toxicity potential of both existing and virtual compounds. For each input compound, the tool calculates the acute toxicity class and various endpoints by analyzing chemical similarities to known toxic substances and applying these trained models (Banerjee et al., 2018). The toxicity analysis results provide the predicted median lethal dose (LD₅₀) in mg/kg body weight, along with the toxicity class and prediction accuracy for each drug compound analyzed.

Homology modeling

To assess the homology modeling of the target protein, the corresponding Protein Data Bank (PDB) ID was retrieved from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB-PDB) database and analyzed using the SWISS-MODEL tool (https://swissmodel.expasy.org/) (Waterhouse et al., 2018). Model accuracy was evaluated using the PROCHECK tool (https://saves.mbi.ucla.edu/). A high-quality model is characterized by an ERRAT quality factor exceeding 50 and more than 80% of residues achieving a score ≥0.2 in the 3D/1D profile, as determined by the VERIFY3D server (Gundampati et al., 2012). The atomic composition and amino acid profile of the target protein is evaluated using ProtParam tool (https://web.expasy.org/protparam/). The quality of modeled structures was also validated by ProSA tool (https://prosa.services.came.sbg.ac.at).

Molecular docking

Preparation of protein and ligand

The target protein was prepared following the methodology described by Tallei et al. (2020). For this study, the 3D crystal structure of the target receptor was retrieved from the RCSB-PDB (http://www.rcsb.org/pdb/).

Similarly, the 3D structures of the selected drugs were obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/compound) and were downloaded in a simplified data format for further processing.

Docking of ligands into protein

Molecular docking was carried out using AutoDock Vina integrated with UCSF-Chimera© version 1.13 software (Trott and Olson, 2010). For receptor preparation, water molecules were removed, and polar hydrogen atoms and Gasteiger charges were added. The ligand structures were also processed by incorporating hydrogen atoms and charges. The processed protein and ligand structures were saved as PDBQT files before initiating the docking procedure.

The docking process generated 10 configuration files, representing the top 10 docking poses for each protein–ligand complex. The pose with the lowest binding energy (kcal/mol) and minimal root mean square deviation (RMSD) was selected as the optimal docking configuration, indicating the strongest and most favorable interaction.

Results

Identification of the common DEGs in IPF and PH

In case of IPF, 871 upregulated and 211 downregulated genes were observed compared with the controls. A comparative analysis of gene expression between PH and controls revealed a total of 245 elevated and 59 decreased genes in the patient group. Comparative analysis revealed 13 DEGs with consistent expression trends in both diseases, suggesting potential shared molecular mechanisms. Among these, ADAMTS18, TFF3, MUC16, PPP1R1A, SYT12, HCAR1, C14orf180, DNAH9, and C6 were significantly upregulated, whereas RNASE2, CNTN6, SIGLEC10, and SIGLEC10-AS1 were consistently downregulated in both IPF and PH, compared with controls.

Validation of common genes

The majority of the 13 common DEGs appeared to exhibit similar expression patterns across independent validation datasets (GSE79544 and GSE188938), thereby supporting the observations made during the discovery phase. This consistency may suggest that these DEGs play a reproducible role in the molecular landscape shared by IPF and PH, highlighting their potential relevance for further investigation as diagnostic or therapeutic targets.

Machine learning model analysis

Random forest, a popular machine learning model, was developed to distinguish IPF and PH from controls based on the expression of 13 common genes. The model’s performance was evaluated using the out-of-bag (OOB) error. For IPF classification, the model achieved an OOB error of 0.154. For PH classification, the model demonstrated improved performance with an OOB error of 0.105. These results suggest that the random forest model effectively differentiates both disease groups from controls, with particularly high accuracy in classifying PH cases. These results demonstrate the model’s strong potential in distinguishing IPF and PH from controls, with varying levels of classification accuracy across the groups (Supplementary Fig. S1).

Correlation and comparison between IPF and PH

A heatmap was generated to visualize the correlation between IPF and PH using 13 DEGs, with blue indicating negative correlations and red representing positive correlations close to 1. The analysis revealed strong positive correlations among several genes, such as PPP1R1A and C14orf180 (0.952966), as well as SIGLEC10 and SIGLEC10-AS1 (0.988131). Moderate positive correlations were observed between CNTN6 and RNASE2 (0.546617), as well as HCAR1 and C6 (0.80817). Negative correlations were prominent between PPP1R1A and genes such as CNTN6 (−0.25324) and RNASE2 (−0.20742). Similarly, C14orf180 showed a negative correlation with CNTN6 (−0.2259) and RNASE2 (−0.16939) (Supplementary Fig. S2). These findings highlight intricate gene interactions, providing insights into the molecular underpinnings shared between IPF and PH.

Upon direct comparison between IPF and PH, it was observed that PPP1R1A and C14orf180 showed a higher expression in PH than in IPF. In contrast, C6, DNAH9, RNASE2, TFF3, HCAR1, SIGLEC10, SYT12, MUC16, ADAMTS18, and SIGLEC10-AS1 exhibited a relatively higher expression in IPF compared with PH. These findings support the outline of the disease-specific transcriptional signatures and provide additional insights into the pathobiological differences between IPF and PH. The expression profiles of these genes in the IPF versus PH group are illustrated in Supplementary Figure S3.

Structural analysis of potential drug

A total of 36 drugs targeting the 13 common DEGs were identified using the DGIdb database, of which 11 are currently approved for clinical use. The complete list of drug–gene interactions, along with their approval status by the Food and Drug Administration (FDA) and the corresponding DGIdb interaction scores, is provided in Supplementary Table S1. Among these, aminoglutethimide was selected for further analysis as it is an FDA-approved drug, its chemical structure is available in the PubChem database, and it targets TFF3, one of the overlapping genes identified between IPF and PH in this study (interaction score: 3.26). Aminoglutethimide was further evaluated through ADMET analysis to assess its pharmacokinetic properties and toxicity profile as a potential therapeutic candidate targeting TFF3. Its chemical formula is C13H16N2O2, with a molecular weight of 232.28 g/mol and 17 heavy atoms. Aminoglutethimide passes all key pharmacological filters, including Lipinski’s, Ghose, Veber, Egan, and Muegge rules, with a bioavailability score of 0.55. Toxicity analysis shows an LD₅₀ value of 1800 mg/kg, classifying it as toxicity class 4 (low toxicity), suggesting that aminoglutethimide could be a viable candidate for further investigation in the treatment of IPF and PH (Fig. 2).

FIG. 2.

(A) Physicochemical space for oral bioavailability, illustrating the position of aminoglutethimide in relation to Lipinski’s rule of five. The pink zone represents the optimal range for oral bioavailability, with aminoglutethimide’s properties falling within this favorable region. (B) Boiled egg plot depicting the bioavailability score of aminoglutethimide. The compound demonstrates a bioavailability score of 0.55, suggesting its potential for efficient oral absorption. (C) Toxicity assessment of aminoglutethimide, highlighting its safety profile and potential adverse effects.

Structural validation and composition analysis of TFF3

The structural model of TFF3 was validated using Protein Contact Atlas, PROCHECK, PROSA, and ProtParam tools. Ramachandran Plot analysis revealed 90.2% of residues in the most favored regions, 9.8% in additional allowed regions, and none in generously allowed or disallowed regions, indicating a high-quality model. G-factor scores for dihedral angles and main-chain covalent forces were −0.15 and 0.43, respectively, with an overall average score of 0.09, reflecting good stereochemical properties. The PROSA Z-score of −5.62 further confirmed the structural quality compared with high-resolution crystal structures (Fig. 3). ProtParam analysis determined the atomic composition as C374H589N103O112S10, with 1188 atoms in total. The amino acid composition revealed high proportions of alanine (11.2%), leucine (10%), and cysteine (10%), along with other residues such as glycine (6.2%) and proline (7.5%) (Fig. 4A). Finally, the structural and interaction insights of the TFF3 protein based on Protein Contact Atlas analysis are shown in Figure 4B–D. The validation results and composition metrics demonstrate that the TFF3 structural model is of reliable quality and suitable for further computational and experimental analyses.

FIG. 3.

Validation of the TFF3 structural model. (A) Ramachandran plot (PROCHECK), (B and C) PROSA Z-score, and (D) secondary structural features of the TFF3 protein structure, showing high-quality modeling results.

FIG. 4.

Structural and interaction analysis of the TFF3 protein using in silico tools. (A) ProtParam analysis showing the atomic composition and amino acid profile of the TFF3 protein. Notable residues include alanine (11.2%), leucine (10%), cysteine (10%), proline (7.5%), and glycine (6.2%), indicating a structure enriched in stabilizing amino acids. Visualization of protein contacts and atomic neighborhood for TFF3. (B) Chord plots illustrating secondary-level protein contacts, revealing inter-residue connectivity and interaction patterns. (C) Scatter plots displaying quantitative residue characteristics, including network centrality, residue extent, and solvent-accessible surface area, providing insights into structural and functional roles. (D) Asteroid plots offering a ligand- or residue-centric view of the atomic neighborhood, highlighting residues in the first and second interaction shells.

Molecular docking analysis

Molecular docking analysis was performed to assess the binding affinity between aminoglutethimide and TFF3 protein. The binding site of the TFF3 protein was identified and visualized using UCSF Chimera. The docking procedure was performed using AutoDock Vina, with the grid box defined around the predicted binding domain. Default grid parameters were applied, centered on the largest surface-accessible pocket of the modeled structure. The grid box was centered at coordinates (X: 4.23, Y: 12.90, Z: −10.16) with dimensions (Size X: 44.44, Y: 46.18, Z: 53.07) to encompass the entire active site region. The final docking run generated multiple binding poses, and the best pose was selected based on binding energy and RMSD criteria. The docking analysis demonstrated a binding affinity score of −6.1 kcal/mol, indicating a stable and potentially effective interaction between the drug and the target protein (Fig. 5). A lower binding affinity score generally reflects a stronger and more favorable interaction between the ligand (aminoglutethimide) and the receptor (TFF3).

FIG. 5.

Molecular docking results of aminoglutethimide and TFF3. Binding pose of aminoglutethimide within the active site of the TFF3 protein. The interaction features a binding affinity score of −6.1 kcal/mol, indicating strong ligand–receptor interaction.

Discussion

Drug repurposing and thinking beyond the disease silos in drug design, discovery, and clinical development are among the contemporary approaches in pharmaceutical innovation (Dasgupta et al., 2023; Dasgupta, 2024a, 2024b;). Uncovering the molecular substrates shared among two or more diseases can help develop interventions with a broad therapeutic and diagnostic scope. The overlap in the genetic signatures between IPF and PH observed in the present study highlights a potential convergence in their molecular mechanisms, which may help explain their frequent coexistence in clinical settings.

The identification of 13 DEGs with consistent expression trends in both diseases suggests a shared pathogenic framework that could be driven by chronic inflammation, fibrosis, and vascular remodeling. Genes such as ADAMTS18, TFF3, and MUC16, which were significantly upregulated as observed in the present analyses, are associated with processes such as extracellular matrix remodeling, epithelial integrity, and immune response modulation (Chen et al., 2024; Meyer zum Büschenfelde et al., 2006; Nie et al., 2024). Similarly, the consistent downregulation of genes such as RNASE2 and CNTN6 points toward disrupted cellular communication and immune regulation (Acquati et al., 2019). This shared gene expression profile suggests that common environmental triggers, such as hypoxia or inflammatory insults, might initiate or exacerbate pathological changes in both diseases. Supplementary Figure S4 illustrates the possible involvement of common genetic features, cellular components, and pathways that may potentially contribute to the pathogenesis of both IPF and PH.

The performance of the random forest model in distinguishing IPF and PH from controls using these 13 DEGs underscores the potential for biological relevance. The model’s high classification accuracy, particularly for PH, reflects the potential utility and promise of these shared genes as diagnostic biomarker candidates. However, the slightly higher classification error observed for IPF patients suggests greater heterogeneity in its genetic and phenotypic presentation. This variability may stem from differences in disease progression, the involvement of secondary pathways, or comorbidities that complicate the underlying molecular landscape. These findings emphasize the need for further refinement of machine learning algorithms and the inclusion of additional multi-omics data to improve diagnostic precision.

Moreover, the correlation patterns among the 13 DEGs provide intriguing insights into the regulatory networks shared by IPF and PH. Strong positive correlations, such as those between PPP1R1A and C14orf180, may reflect coordinated regulatory mechanisms that amplify fibrotic or vascular responses. On the contrary, negative correlations, such as between CNTN6 and PPP1R1A, suggest opposing functional roles in the disease processes, possibly as a result of compensatory mechanisms or divergent signaling pathways. These findings suggest that the interplay between genes involved in fibrosis, immunity, and vascular biology contributes to the pathogenesis of both conditions, offering potential targets for dual-disease therapies. The differential expression patterns between IPF and PH further emphasize distinct molecular mechanisms, highlighting potential targets for developing tailored therapeutic strategies specific to each disease.

It is important to mention that this study points that aminoglutethimide could be a potential drug for the treatment option for IPF associated with PH. The structural and pharmacological profiling of aminoglutethimide highlights its potential as a decent candidate drug for targeting TFF3, a protein implicated in the overlapping molecular mechanisms of IPF and PH. Its bioavailability score of 0.55 and classification under toxicity class 4, with an LD₅₀ of 1800 mg/kg, suggest a favorable safety and efficacy profile. These properties underscore its suitability for further in vitro and in vivo validation in the context of dual-disease pathology. The robust ADMET characteristics, combined with the compound’s interaction score from DGIdb, present aminoglutethimide as a viable candidate in the therapeutic landscape of fibrotic and vascular comorbidities.

In parallel, validation of the TFF3 structural model was performed to ensure reliability in downstream computational analyses. Quality indicators such as 90.2% residues in favored regions of the Ramachandran plot, a PROSA Z-score of −5.62, and favorable G-factor scores confirmed the model’s structural integrity. Furthermore, the amino acid composition analysis revealed a structure rich in stabilizing residues, including cysteine, proline, and alanine. The subsequent molecular docking analysis provided a binding affinity score of −6.1 kcal/mol, indicative of a moderately strong and energetically favorable interaction between aminoglutethimide and TFF3. In clinical practice this drug is used for treatment of seizures, Cushing’s syndrome, breast cancer, and prostate cancer (Santen and Misbin, 1981). However, its potential in other lung disease spectra cannot be overlooked, particularly given its reported efficacy in neutrophil-predominant severe asthma (Xu et al., 2022b). Collectively, these findings provide a strong foundation for future experimental exploration and rational drug design aimed at addressing the shared molecular underpinnings of IPF and PH.

Despite the valuable insights provided, this study has several limitations that warrant consideration. First, the sample size used for the gene expression and machine learning analyses was relatively small, which may limit the generalizability of the findings to broader patient populations. Second, the study relied primarily on transcriptomic data, without incorporating other multi-omics layers such as proteomics or metabolomics, which could provide a more comprehensive understanding of the molecular mechanisms underlying IPF and PH. In addition, while the random forest model demonstrated robust classification performance, the inclusion of more advanced algorithms or ensemble approaches might further enhance predictive accuracy and account for the inherent heterogeneity in these diseases. In addition, model optimization was limited to the use of default parameters. Future studies should consider systematic hyperparameter tuning—such as varying the number of trees and comparing model accuracy across different tree depths—using cross-validation to enhance model generalizability and performance. Another limitation is the lack of experimental validation for the identified genes, as the findings are primarily based on computational analyses. Although validation was performed using an independent dataset, it involved a different biofluid type and focused on CTEPH, which differs biologically from PH associated with LHD. Future validation efforts using datasets specific to PH due to LHD would enhance the robustness and disease relevance of the findings by ensuring alignment within the same subclass of the PH spectrum. Finally, while aminoglutethimide showed promising potential as a therapeutic candidate, its efficacy and safety in the context of IPF and PH need to be rigorously evaluated through preclinical and clinical studies. Addressing these limitations in future research could strengthen the translational relevance of the findings and pave the way for more effective diagnostic and therapeutic strategies.

Conclusions

This study reveals a potential genetic convergence and shared molecular pathways between IPF and PH, highlighting the prospects for unified diagnostic and therapeutic approaches. The consistent dysregulation of 13 genes across both conditions, along with their robust performance in classification models, reinforces their potential as biomarker candidates. Among the therapeutic leads identified, aminoglutethimide emerged as a promising candidate for potential drug repurposing targeting TFF3, a protein that appears to be notably altered in both IPF and PH based on the present in silico study. Its favorable ADMET profile and strong interaction score from DGIdb underscore its therapeutic potential, although further preclinical and in vitro and in vivo studies and validation are necessary and called for. Structural validation and molecular docking analyses further support aminoglutethimide’s potential to effectively bind and modulate TFF3 activity. Despite the limitations of in silico studies, a modest sample size, and reliance on transcriptomic data, the findings offer a valuable framework for future research aimed at precision diagnostics and the development of targeted therapies for patients with IPF and PH. Finally, these findings also challenge the traditional paradigm of pharmaceutical innovation that has long relied on the “one drug, one disease” premise, and highlight the potentials of “one drug, polydisease” paradigm of drug discovery and development.

Footnotes

Acknowledgments

The author is thankful for the permission to use the freely available figure-making tool BioRender.com and the NCBI-GEO tool for the gene expression data.

Author’s Contributions

S.D.: Conceptualization, formal analysis, investigation, visualization, and writing—review/editing.

Author Disclosure Statement

The author declares she has no conflicting financial interests.

Funding Information

No funding was received for the present study.

Supplementary Material

Abbreviations Used

References

Acquati

, Mortara

, De Vito

, et al. Innate immune response regulation by the human RNASET2 Tumor Suppressor Gene. Front Immunol, 2019; 10:2587; doi: 10.3389/fimmu.2019.02587

Alharbi

. Management of pulmonary hypertension in idiopathic pulmonary fibrosis. J Nat Sci Med, 2024; 7(4):229–236; doi: 10.4103/jnsm.jnsm_98_23

Banerjee

, Eckert

, Schrey

, et al. ProTox-II: A webserver for the prediction of toxicity of chemicals. Nucleic Acids Res, 2018; 46(W1):W257–W263; doi: 10.1093/nar/gky318

Barrett

, Wilhite

, Ledoux

, et al. NCBI GEO: Archive for functional genomics data sets—Update. Nucleic Acids Res, 2013; 41(Database issue):D991–D995; doi: 10.1093/nar/gks1193

Breiman

. Random forests. Mach Learn, 2001; 45(1):5–32; doi: 10.1023/A:1010933404324

Caminati

, Lonati

, Cassandro

, et al. Comorbidities in idiopathic pulmonary fibrosis: An underestimated issue. Eur Respir Rev, 2019; 28(153); doi: 10.1183/16000617.0044-2019

Chen

, Sandrine

, Yang

, et al. MUC1 and MUC16: Critical for immune modulation in cancer therapeutics. Front Immunol, 2024; 15:1356913; doi: 10.3389/fimmu.2024.1356913

Choudhury

, Bhattacharya

, Dasgupta

, et al. Identification of novel metabolic signatures potentially involved in the pathogenesis of COPD associated pulmonary hypertension. Metabolomics, 2021; 17(10):94; doi: 10.1007/s11306-021-01845-9

Collum

, Amione-Guerra

, Cruz-Solbes

, et al. Pulmonary hypertension associated with idiopathic pulmonary fibrosis: Current and future perspectives. Can Respir J, 2017; 2017:1430350; doi: 10.1155/2017/1430350

10.

Dasgupta

, Das

, Patidar

, et al. Identification of common dysregulated genes in COVID-19 and hypersensitivity pneumonitis: A systems biology and machine learning approach. OMICS, 2023; 27(5):205–214; doi: 10.1089/omi.2022.0171

11.

Dasgupta

. Thinking beyond disease silos: Dysregulated genes common in tuberculosis and lung cancer as identified by systems biology and machine learning. OMICS, 2024a;28(7):347–356; doi: 10.1089/omi.2024.0116

12.

Dasgupta

. Systems biology and machine learning identify genetic overlaps between lung cancer and gastroesophageal reflux disease. OMICS, 2024b;28(10):492–503; doi: 10.1089/omi.2024.0150

13.

Dasgupta

. Unravelling genetic commonalities between idiopathic pulmonary fibrosis and gastroesophageal reflux disease: A bioinformatics exploration. Human Gene, 2025a;43:201375; doi: 10.1016/j.humgen.2025.201375

14.

Dasgupta

. Identification of overlapping genetic signatures between obstructive sleep apnea and lung cancer: Moving Beyond “One Drug, One Disease” Paradigm of Pharmaceutical Innovation. OMICS, 2025b;29(5):221–228; doi: 10.1089/omi.2025.0010

15.

Desai

, Katukuri

, Kanagala

, et al. Examining prevalence and predictors of pulmonary hypertension in adults with idiopathic pulmonary fibrosis: A population-based analysis in the United States. J Med Life, 2024; 17(1):35–40; doi: 10.25122/jml-2023-0324

16.

Ferreira

LLG

, Andricopulo

. ADMET modeling approaches in drug discovery. Drug Discov Today, 2019; 24(5):1157–1165; doi: 10.1016/j.drudis.2019.03.015

17.

Gaikwad

, Eapen

, McAlinden

, et al. Endothelial to Mesenchymal Transition (EndMT) and vascular remodelling in pulmonary hypertension and idiopathic pulmonary fibrosis. Expert Rev Respir Med, 2020; 14(10):1027–1043; doi: 10.1080/17476348.2020.1795832

18.

Gundampati

, Chikati

, Kumari

, et al. Protein-Protein docking on molecular models of aspergillus niger rnase and human actin: Novel target for anticancer therapeutics. J Mol Model, 2012; 18(2):653–662; doi: 10.1007/s00894-011-1078-4

19.

Graham

, Zhang

, Tuder

, et al. Histological and pathological diagnosis of pulmonary hypertension: pathological classification of pulmonary vascular lesions. In: Textbook of Pulmonary Vascular Disease. ( Yuan

JX-J

, Garcia

JGN

, West

, et al. eds.) Springer: Boston; 2011, pp. 1413–1423; doi: 10.1007/978-0-387-87429-6_101

20.

Hadi

, Marsool

MDM

, Marsool

ADM

, et al. Idiopathic pulmonary fibrosis: Addressing the current and future therapeutic advances along with the role of sotatercept in the management of pulmonary hypertension. Immun Inflamm Dis, 2023; 11(11):e1079; doi: 10.1002/iid3.1079

21.

Hossain

, Ferdous

, Reza

, et al. Pathogen-driven gene expression patterns lead to a novel approach to the identification of common therapeutic targets. Sci Rep, 2022; 12(1):21070.

22.

Jia

, Ramalingam

, Heiden

, et al. An Interleukin 6 responsive plasma cell signature is associated with disease progression in systemic sclerosis interstitial lung disease. iScience, 2023; 26(11):108133; doi: 10.1016/j.isci.2023.108133

23.

Kayikci

, Venkatakrishnan

, Scott-Brown

, et al. Visualization and analysis of non-covalent contacts using the protein contacts atlas. Nat Struct Mol Biol, 2018; 25(2):185–194; doi: 10.1038/s41594-017-0019-z

24.

Kucherenko

, Sang

, Yao

, et al. Elastin stabilization prevents impaired biomechanics in human pulmonary arteries and pulmonary hypertension in rats with left heart disease. Nat Commun, 2023; 14(1):4416; doi: 10.1038/s41467-023-39934-z

25.

Lee

, Arisi

, Puxeddu

, et al. Bronchoalveolar lavage (BAL) cells in idiopathic pulmonary fibrosis express a complex pro-inflammatory, pro-repair, angiogenic activation pattern, likely associated with macrophage iron accumulation. PLoS One, 2018; 13(4):e0194803; doi: 10.1371/journal.pone.0194803

26.

Liaw

, Wiener

. Classification and regression by randomForest. R News, 2002; 2(3):18–22.

27.

Meyer Zum Büschenfelde

, Tauber

, Huber

. TFF3-peptide increases transepithelial resistance in epithelial cells by modulating Claudin-1 and -2 expression. Peptides (NY), 2006; 27(12):3383–3390; doi: 10.1016/j.peptides.2006.08.020

28.

Nathan

, Shlobin

, Ahmad

, et al. Serial development of pulmonary hypertension in patients with idiopathic pulmonary fibrosis. Respiration, 2008; 76(3):288–294; doi: 10.1159/000114246

29.

Nie

, Dang

, Zhu

, et al. ADAMTS18 deficiency associates extracellular matrix dysfunction with a higher risk of HER2-positive mammary tumorigenesis and metastasis. Breast Cancer Res, 2024; 26(1):19; doi: 10.1186/s13058-024-01771-3

30.

Pang

, Chong

, Zhou

, et al. MetaboAnalyst 5.0: Narrowing the gap between raw spectra and functional insights. Nucleic Acids Res, 2021; 49(W1):W388–W396; doi: 10.1093/nar/gkab382

31.

Patel

, Lederer

, Borczuk

, Kawut

. Pulmonary hypertension in idiopathic pulmonary fibrosis. Chest, 2007; 132(3):998–1006.

32.

Pawloski

, Gonen

, Wen

, et al. Supervised machine learning model to predict oncotype DX risk category in patients over age 50. Breast Cancer Res Treat, 2022; 191(2):423–430; doi: 10.1007/s10549-021-06443-w

33.

Pink

. Forensic ancestry assessment using cranial nonmetric traits traditionally applied to biological distance studies. In: Biological Distance Analysis. ( Pink

., ed.) Academic Press: San Diego; 2016, pp. 213–230.

34.

Rajagopal

, Bryant

, Sahay

, et al. Idiopathic pulmonary fibrosis and pulmonary hypertension: Heracles meets the hydra. Br J Pharmacol, 2021; 178(1):172–186; doi: 10.1111/bph.15036

35.

Richeldi

, Collard

, Jones

. Idiopathic pulmonary fibrosis. Lancet, 2017; 389(10082):1941–1952; doi: 10.1016/S0140-6736(17)30866-8

36.

Rivera-Lebron

, Forfia

, Kreider

, et al. Echocardiographic and hemodynamic predictors of mortality in idiopathic pulmonary fibrosis. Chest, 2013; 144(2):564–570.

37.

Ruffenach

, Hong

, Vaillancourt

, et al. Pulmonary hypertension secondary to pulmonary fibrosis: Clinical data, histopathology and molecular insights. Respir Res, 2020; 21(1):303.

38.

Santen

, Misbin

. Aminoglutethimide: Review of pharmacology and clinical use. Pharmacotherapy, 1981; 1(2):95–120; doi: 10.1002/j.1875-9114.1981.tb03557.x

39.

Shah Gupta

, Koteci

, Morgan

, et al. Incidence and prevalence of interstitial lung diseases worldwide: A systematic literature review. BMJ Open Respir Res, 2023; 10(1); doi: 10.1136/bmjresp-2022-001291

40.

Shorr

, Wainright

, Cors

, et al. Pulmonary hypertension in patients with pulmonary fibrosis awaiting lung transplant. Eur Respir J, 2007; 30(4):715–721.

41.

Tallei

, Tumilaar

, Niode

, et al. Potential of plant bioactive compounds as SARS-CoV-2 Main Protease (M(pro)) and Spike (S) glycoprotein inhibitors: A Molecular Docking Study. Scientifica (Cairo), 2020; 2020:6307457; doi: 10.1155/2020/6307457

42.

Tripathi

, Singh

, Kumar

, et al. Identification of hub genes and potential networks by centrality network analysis of PCR amplified Fusarium oxysporum f. sp. lycopersici EF1α gene. BMC Microbiol, 2024; 24(1):336.

43.

Trott

, Olson

. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem, 2010; 31(2):455–461; doi: 10.1002/jcc.21334

44.

Wagner

, Coffman

, Ainscough

, et al. DGIdb 2.0: Mining clinically relevant drug-gene interactions. Nucleic Acids Res, 2016; 44(D1):D1036–D44; doi: 10.1093/nar/gkv1165

45.

Waterhouse

, Bertoni

, Bienert

, et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res, 2018; 46(W1):W296–W303; doi: 10.1093/nar/gky427

46.

Waxman

, Elia

, Adir

, et al. Recent advances in the management of pulmonary hypertension with interstitial lung disease. Eur Respir Rev, 2022; 31(165); doi: 10.1183/16000617.0220-2021

47.

Xia

, Psychogios

, Young

, et al. MetaboAnalyst: A web server for metabolomic data analysis and interpretation. Nucleic Acids Res, 2009; 37(Web Server issue):W652–W660; doi: 10.1093/nar/gkp356

48.

, Deng

, Meng

, et al. The alterations in molecular markers and signaling pathways in chronic thromboembolic pulmonary hypertension, a study with transcriptome sequencing and bioinformatic analysis. Front Cardiovasc Med, 2022a;9:961305; doi: 10.3389/fcvm.2022.961305

49.

, Chen

, Ge

, et al. Identification of potential biomarkers and pathogenesis in neutrophil-predominant severe asthma: A comprehensive bioinformatics analysis. Medicine (Baltimore), 2022b;101(38):e30661; doi: 10.1097/MD.0000000000030661

50.

Yamashita

, Yamauchi

, Chiba

, et al. The definition of fibrogenic processes in fibroblastic foci of idiopathic pulmonary fibrosis based on morphometric quantification of extracellular matrices. Hum Pathol, 2009; 40(9):1278–1287.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.79 MB