The Application of Spearman Partial Correlation for Screening Predictors of Weight Loss in a Multiomics Dataset

Abstract

Obesity has reached epidemic proportions in the United States, but little is known about the mechanisms of weight gain and weight loss. Integration of omics data is becoming a popular tool to increase understanding of such complex phenotypes. Biomarkers come in abundance, but small sample size remains a serious limitation in clinical trials. In the present study, we developed a strategy to screen predictors from a multiomics, high-dimensional, and longitudinal dataset from a small cohort of 10 women with obesity who were provided an identical very-low-calorie diet. Our proposal explores the combinatorial space of potential predictors from transcriptomics, microbiome, metabolome, fecal bile acids, and clinical data with the application of the first-order Spearman partial correlation coefficient. Two statistics are proposed for screening predictors: the partial association score and the persistent significance. We applied our strategy to predict rates of weight loss in our sample of participants in a hospital metabolic facility. Our method reduced an initial set of 42,000 biomarker candidates to 61 robust predictors. The results show baseline fecal bile acids and regulation in RT-polymerase chain reaction as the most predictive data sources in forecasting the rate of weight loss. In summary, the present study proposes a strategy based on nonparametric statistics for ranking and screening predictors of weight loss from a multiomics study. The proposed biomarker screening strategy warrants further translational clinical investigation in obesity and other complex clinical phenotypes.

Introduction

Obesity has reached epidemic proportions in the United States, with about two-thirds of adults classified as overweight or obese (Hales et al, 2018). In obese subjects, gradual weight loss ameliorates adipose tissue inflammation and related systemic changes. Little is known about the simultaneous effects of rapid weight loss induced by a clinically relevant very-low-calorie diet (VLCD) on subcutaneous adipose tissue inflammation, the plasma metabolome, microbiome, and bile acid content. It is well known that weight loss and weight gain occur at differing rates in individuals, but the mechanisms responsible are still unclear (Bouchard et al, 1990; Leibel et al, 1995). A greater understanding of the factors contributing to an individual's rapid weight loss might significantly advance the development and efficacy of weight loss therapies.

This gap is being gradually filled with the development of omics and bioinformatics technologies and their ability to generate large amounts of data from complex biological systems. Recently, the integration of lipidomics, transcriptomics, proteomics, and metabolomics was used for deep phenotyping of glycemic responders upon clinical weight loss (Valsesia et al, 2020).

Despite the excitement and decreasing costs for these new technologies, small sample sizes are still an obstacle, and consequently, underpowered studies labeled as exploratory generate incipient insights for developing therapies. The combination of high dimensionality, small sample sizes, and heterogeneous data sources in multiomics poses a challenge for bioinformatics and statistical methods. Assumptions such as normality, linearity, and additive effects are hard to assess, and complex models cannot be fully validated. Type-I error is dramatically inflated when performing variable selection in small samples for multiomics datasets (Kirpich et al, 2018).

Until now, methods to handle multiomics data with small sample sizes lacked parsimonious and informative solutions to unravel disease mechanisms (McCabe et al, 2020).

The present study proposes a strategy based on nonparametric statistics for ranking and screening predictors of weight loss from a multiomics study with a small sample size. This strategy consists of (1) dimensionality reduction, (2) first-order partial Spearman correlation, (3) a scaled Shannon information (S-value) integrating the coefficient of correlation and its p-value, and (4) biological validation with inferred networks.

We applied the proposed strategy to a weight loss study (Alemán et al, 2017) performed in a group of 10 obese females fed an identical diet in a metabolic ward. Data were generated from six sources: clinical, transcriptomics (RNA-seq and RT-polymerase chain reaction [RT-PCR]), plasma metabolome, gut microbiome, and fecal bile acids.

Materials and Methods

Dataset

A single-center interventional study performed at The Rockefeller University Hospital between September 2012 and August 2013 studied 10 very obese (body mass index ≥35 kg/m²) postmenopausal women (defined by ≥2 years without menstrual periods). The participants consumed a VLCD until they lost ∼10% of their baseline weight. This diet provided ∼800 kcal/d and consisted of a choice of shakes, soups, bars, and puddings. The study investigated clinical outcomes and systemic biomarkers of inflammation and metabolism at baseline and after VLCD-induced weight loss.

RNA was extracted from adipose tissue biopsies, and gene expression was assessed by bulk RNA-seq and RT-PCR. Stool 16S rRNA sequencing and mass spectrometry were performed for fecal microbiota and fecal bile acids, respectively (Alemán et al, 2018; Alemán et al, 2017).

The study protocol and the consent procedures were approved by the Institutional Review Boards at The Rockefeller University, Weill Cornell Medical College, and Memorial Sloan Kettering Cancer Center, and the study was registered under ClinicalTrials.gov identifier NCT01699906. Written informed consent was obtained from all study participants.

Supplementary Table S1 shows the number of predictors within each data source, followed by the relative frequency. Any biomarker with more than 20% missing data was excluded from analyses.

The rate of weight loss (Eq. 1) is the daily average loss in kilograms observed in a study participant. $WL = \frac{\mathrm{Weight}_{\mathrm{post}} - \mathrm{Weight}_{\mathrm{pre}}}{\mathrm{Days}}$ (1)

To account for potential confounding created by different levels of physical activity among the participants, we adjusted the daily rate of weight loss by the total number of steps during the study period. Therefore, we use $Y = WL / \mathrm{TotalSteps}$ as the response variable across our analyses.
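The two definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the participant values are hypothetical and are not taken from the study data.

```python
def weight_loss_rate(weight_pre_kg, weight_post_kg, days):
    """Eq. (1): average daily weight change in kg (negative when weight is lost)."""
    return (weight_post_kg - weight_pre_kg) / days


def step_adjusted_rate(weight_pre_kg, weight_post_kg, days, total_steps):
    """Response Y: the daily rate from Eq. (1) divided by the total number of steps."""
    return weight_loss_rate(weight_pre_kg, weight_post_kg, days) / total_steps


# Hypothetical participant: 110 kg at baseline, 99 kg after 50 days, 200,000 steps.
wl = weight_loss_rate(110.0, 99.0, 50)
y = step_adjusted_rate(110.0, 99.0, 50, 200_000)
print(wl, y)
```

With Eq. (1) written as post minus pre, a participant who loses weight has a negative rate, so the ordering of participants by Y should be read with that sign convention in mind.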

Screening strategy

Our pipeline (Fig. 1) for screening predictors of weight loss accomplishes the following: (1) performing dimensionality reduction; (2) iterating the first-order partial Spearman correlation coefficient over the set of predictors and confounders; (3) ranking predictors based on the median partial correlation coefficient and robustness of statistical significance; (4) screening predictors among the top-ranked ones; and (5) representing the connection between data sources with a network.

FIG. 1.

Schematic of multiomic pipeline for screening predictors of weight loss. (a) Multiple data sources serve as input to the predictive model. After dimensionality reduction, all possible monotonic regressions, including two predictors, are run. The first-order Spearman partial correlation is used to measure the association between Y and X after removing the effect of the potential confounder Z. (b) Evaluation of direct association between Y and predictor X. Partial correlation coefficients are computed over the set of confounders and mapped to the partial association score $(Π (X))$ . Predictors are ranked and then screened according to the persistent significance. A network based on aggregated measures of partial correlation is built to illustrate interactions between data sources.

Dimensionality reduction

According to Supplementary Table S1, there is a significant imbalance in the number of predictors from different data sources, as the majority comes from mRNA expression levels derived from RNA-seq. To mitigate selection bias caused by overrepresentation of a specific data source, we reduced the dimensionality in adipose tissue RNA-seq analysis and plasma metabolites using Gene Set Variation Analysis (GSVA) (Hänzelmann et al, 2013). This technique mapped transcriptomic and metabolomics expression levels into pathway activity scores.

GSVA scores were computed for 4107 canonical pathways and 54 metabolic pathways for RNA-seq genes and metabolites, respectively. We further reduced RNA-seq data to 176 pathways that were differentially expressed when comparing post- versus pre-VLCD activity scores. Differential expression analysis was performed with mixed effect models implemented in the limma package for R (Bioconductor).

Spearman partial correlation coefficient

Spearman's coefficient of correlation $γ_{Y X}$ is a nonparametric measure of association between two ranked variables. This coefficient is a less restrictive approach compared to the Pearson product-moment correlation that is traditionally used for measuring the linear association between quantitative variables. The only assumption is that the scale of measurements is at least ordinal. The Spearman coefficient is associated with monotonic regression analysis, representing linear and curvilinear monotonic associations (van den Heuvel and Zhan, 2022).

Given a sample of n observations from the pair $(Y, X)$ , two variables measured at the ordinal level, there are two standard formulations to estimate the Spearman correlation coefficient. The first one is shown in (Eq. 2), $r s_{Y X} = 1 - \frac{6 \sum_{i} d_{i}^{2}}{n (n^{2} - 1)}$ (2)

where n is the sample size and d_i is the difference between the ranks of y_i and x_i, i-th observed values from Y and X, respectively. The Spearman's correlation coefficient lies in the interval between $- 1$ and $+ 1$ , and the closer to the interval limits, the stronger is the evidence of association.

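The second and most familiar representation (Eq. 3) is exactly the sample correlation used for estimating the Pearson correlation coefficient, but with the observations y and x replaced by their rankings, v and w, respectively. $rs_{YX} = \frac{\sum_{i} (v_{i} - \bar{v}) (w_{i} - \bar{w})}{\sqrt{\sum_{i} {(v_{i} - \bar{v})}^{2} \sum_{i} {(w_{i} - \bar{w})}^{2}}}$ (3)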
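To make the two formulations concrete, the following sketch computes the Spearman coefficient both from rank differences (Eq. 2) and as the Pearson correlation of the ranks. It uses only the Python standard library; the helper names and the five data points are illustrative assumptions. The rank-difference formula is exact only when there are no ties, whereas the rank-correlation form handles ties through average ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_from_differences(y, x):
    """Eq. (2): 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); assumes no tied values."""
    v, w = average_ranks(y), average_ranks(x)
    n = len(y)
    d2 = sum((a - b) ** 2 for a, b in zip(v, w))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_from_ranks(y, x):
    """Pearson correlation computed on the ranks v and w."""
    v, w = average_ranks(y), average_ranks(x)
    mv, mw = sum(v) / len(v), sum(w) / len(w)
    cov = sum((a - mv) * (b - mw) for a, b in zip(v, w))
    var_v = sum((a - mv) ** 2 for a in v)
    var_w = sum((b - mw) ** 2 for b in w)
    return cov / math.sqrt(var_v * var_w)


# Hypothetical outcome and predictor values for five participants.
y_obs = [0.21, 0.35, 0.18, 0.40, 0.27]
x_obs = [3.1, 4.8, 2.0, 4.1, 3.9]
print(spearman_from_differences(y_obs, x_obs), spearman_from_ranks(y_obs, x_obs))
```

In this tie-free example the two functions agree, which is the equivalence the text describes.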

Our approach to screening predictors of weight loss relies heavily on the use of the first-order Spearman partial correlation coefficient (Eq. 4). $γ_{Y X . Z} = \frac{γ_{Y X} - γ_{Y Z} γ_{X Z}}{\sqrt{(1 - γ_{Y Z}^{2}) (1 - γ_{X Z}^{2})}}$ (4)

The coefficient $γ_{Y X . Z}$ measures the association between Y and X, after removing the effect of a third variable Z that is a potential source of confounding. Z is associated with each element in the pair $(X, Y)$ , being a common cause of spurious correlation. This coefficient can be seen as the correlation between probability-scaled residuals obtained after regressing Y on Z and X on Z (Kim, 2015; Whittaker, 1990).
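As a small worked example, the first-order partial correlation can be computed directly from the three pairwise Spearman coefficients. The sketch below uses the standard form of the first-order partial correlation, with the squared coefficients inside the square root of the denominator; the function name and the input values are hypothetical.

```python
import math


def partial_correlation(r_yx, r_yz, r_xz):
    """First-order partial correlation of Y and X given Z from pairwise coefficients."""
    denominator = math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))
    return (r_yx - r_yz * r_xz) / denominator


# Hypothetical pairwise Spearman coefficients.
r_partial = partial_correlation(r_yx=0.8, r_yz=0.5, r_xz=0.6)
print(round(r_partial, 4))
```

Here the raw association of 0.8 shrinks to about 0.72 once Z is accounted for. When Z is uncorrelated with both Y and X, the partial coefficient equals the unadjusted one. The function is undefined when Z is perfectly correlated with Y or X, since the denominator is then zero.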

We show in Eq. (5) a test statistic (Sheskin, 2003; Weatherburn, 1968) for the hypothesis $H_{0} : γ_{Y X . Z} = 0$ against the two-sided alternative. Under H₀, this statistic follows the Student's t distribution with n − 3 degrees of freedom. $t_{γ_{Y X . Z}} = r s_{Y X . Z} \sqrt{\frac{n - 3}{1 - r s_{Y X . Z}^{2}}}$ (5)
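The test in Eq. (5) can be sketched with the standard library alone. The names below are illustrative, and the two-sided p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an implementation choice made here for self-containment rather than the authors' procedure. In practice a library routine such as `scipy.stats.t.sf` would typically replace the integration.

```python
import math


def partial_corr_t(r_partial, n):
    """Eq. (5): t statistic for a first-order partial correlation from n observations."""
    return r_partial * math.sqrt((n - 3) / (1 - r_partial ** 2))


def t_density(u, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """P(|T_df| > |t_stat|), integrating the density on [0, |t_stat|] by Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Hypothetical partial correlation of 0.7 in a sample of n = 10 (n - 3 = 7 degrees of freedom).
n = 10
t_stat = partial_corr_t(0.7, n)
p_value = two_sided_p(t_stat, n - 3)
print(t_stat, p_value)
```

Because the p-value is computed as one minus an integral, very small p-values lose relative precision in this sketch; a dedicated survival function avoids that.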

Partial association score

The use of the partial correlation coefficient for robust selection of features in high-dimensional data has been discussed in diverse applications (de la Fuente et al, 2004; Li et al, 2012; Raghuraj Rao and Lakshminarayanan, 2007). Consider Y a response variable, X a potential predictor, and $Z$ a vector of p confounders. We propose a scaled S-value (Greenland, 2019), a statistic that combines first-order partial correlation and its p-value (Eq. 6), to measure the strength of the relationship between Y and X after removing confounding effects from Z. $S_{Y X} (Z) = - |r s_{Y X . Z}| \log (p_{γ_{Y X . Z}})$ (6)

In Eq. (6), $p_{γ_{Y X . Z}} = 2 \times P (T_{n - 3} > |t_{γ_{Y X . Z}}|)$ where $T_{n - 3}$ follows the Student's t distribution with $n - 3$ degrees of freedom. Rescaling the observed p-value to the S-value, we obtain a better scale for measuring the amount of information supplied by the test against the null hypothesis (Greenland, 2019). We multiply the S-value by the absolute estimated first-order partial correlation to embed the magnitude of the association in the proposed metric. The statistic $S_{Y X} (Z)$ is written as a function of Z and can be evaluated for elements in the set of potential confounders $Z = \{Z_{1}, Z_{2}, \dots, Z_{p}\}$ . Therefore, we propose for each predictor X, the Partial Association Score described in Eq. (7). $Π (X) = \mathrm{Median} \{S_{Y X} (Z_{1}), S_{Y X} (Z_{2}), \dots, S_{Y X} (Z_{p})\} .$ (7)

$Π (X)$ is a robust statistic of direct relationship between Y and X accounting for effects from confounders. This metric is agnostic to the direction of effects and functional importance of the confounders, returning a non-negative value that incorporates the magnitude of association between Y and X, statistical significance, and confounding effects from all variables in the dataset.
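Equations (6) and (7) can be illustrated with a short sketch. The function names and the three (coefficient, p-value) pairs are hypothetical. The base of the logarithm is exposed as a parameter because Eq. (6) writes only "log": the default below is the natural logarithm, which is an assumption, while the S-value as defined by Greenland (2019) uses base 2.

```python
import math
from statistics import median


def scaled_s_value(r_partial, p_value, base=math.e):
    """Eq. (6): absolute partial correlation times the negative log of its p-value."""
    return -abs(r_partial) * math.log(p_value, base)


def partial_association_score(results, base=math.e):
    """Eq. (7): median of the scaled S-values over all confounders.

    results holds one (r_partial, p_value) pair per confounder Z_j.
    """
    return median(scaled_s_value(r, p, base) for r, p in results)


# Hypothetical results for one predictor X adjusted for three confounders.
results = [(0.8, 0.01), (-0.7, 0.03), (0.2, 0.60)]
score = partial_association_score(results)
print(score)
```

The score is non-negative for any p-value in (0, 1], and the sign of the partial correlation does not affect it, in line with the metric being agnostic to the direction of effects. A p-value of exactly zero would need special handling, since its logarithm is undefined.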

Persistent significance

$Π (X)$ is formulated for ranking biomarkers according to direct association with the outcome Y, but it does not perform feature selection. To accomplish this task, we propose the use of a complementary metric that we call Persistent Significance $Ψ_{α} (X)$ in Eq. (8). $Ψ_{α} (X) = \frac{1}{p} \sum_{Z} I (p_{γ_{Y X . Z}} < α) .$ (8)

$I (.)$ is an indicator variable. The statistic in Eq. (8) represents the proportion of times, in which the confounding variable does not remove statistical significance from the partial coefficient of correlation, at a fixed $α$ level. It is a predictor's metric of robustness to confounders in the association with the outcome.

We propose an ad hoc threshold $θ$ for $Ψ_{α} (X)$ in such a way that a predictor X is screened if the estimated $Ψ_{α} (X)$ is greater than $θ$ .
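The screening rule based on Eq. (8) reduces to counting significant p-values. In the sketch below, the predictor names and p-values are hypothetical; the defaults of 0.05 for the significance level and 0.7 for the threshold are the values used later in the Results.

```python
def persistent_significance(p_values, alpha=0.05):
    """Eq. (8): proportion of confounders for which the partial correlation stays significant."""
    return sum(p < alpha for p in p_values) / len(p_values)


def screen_predictors(p_values_by_predictor, alpha=0.05, theta=0.7):
    """Keep the predictors whose persistent significance is greater than theta."""
    return [
        name
        for name, p_values in p_values_by_predictor.items()
        if persistent_significance(p_values, alpha) > theta
    ]


# Hypothetical p-values of two predictors, each adjusted for five confounders.
p_values_by_predictor = {
    "bile_acid_A": [0.01, 0.02, 0.04, 0.03, 0.20],
    "gene_B": [0.01, 0.30, 0.40, 0.04, 0.50],
}
kept = screen_predictors(p_values_by_predictor)
print(kept)
```

Here the first predictor remains significant under four of five confounders (0.8) and is kept, while the second (0.4) is dropped.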

Network analysis

We further explore the Spearman partial correlation to propose a metric for the strength of connection between two nodes of a weighted directed network. Consider X and Z, two potential predictors of Y, to be the vertices of a directed graph. We propose the edges of this graph to be represented according to Eq. (9). $C (X, Z) = \frac{γ_{Y X}^{2} - γ_{Y X . Z}^{2}}{γ_{Y X}^{2}}$ (9)

The connectivity in this network characterizes how much the relationship between Y and X is affected by the confounder Z. Note that Eq. (9) represents the proportion of the coefficient of determination that is reduced (or increased) due to confounding. The farther $C (X, Z)$ is from zero, the more significant the shared importance between X and Z to predict the outcome Y.

The network built on $C (X, Z)$ connectivity can be rather complex due to the number of combinations. We thus explore the median connectivity to learn about the connection between different omics. Given omics data blocks $k \in \{1, 2, \dots, K\}$ , we evaluate the interaction between two distinct omics k₁ and k₂ with a graph where the edges are quantified in Eq. (10).

Note that $C (k_{1}, k_{2}) \neq C (k_{2}, k_{1})$ and $C (k_{i}, k_{i}) \neq 0$ , meaning that the direction is important to quantify the impact of an omic on another one.

We first evaluate $C (X, Z)$ for all pairs of predictors, then we summarize the connection between omics with a more compact network based on $C (k_{1}, k_{2})$ . Visualization of the network is done with the R package qgraph (Epskamp et al, 2012).
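The pairwise connectivity of Eq. (9) and its aggregation by data block can be sketched as follows. The block-level summary is one plausible reading of the "median connectivity" described in the text, namely the median of C(X, Z) over predictors X in the first block and confounders Z in the second; that reading, the function names, and all values are assumptions made for illustration.

```python
from statistics import median


def connectivity(r_yx, r_yx_given_z):
    """Eq. (9): relative change in the squared correlation of Y and X after adjusting for Z."""
    return (r_yx ** 2 - r_yx_given_z ** 2) / r_yx ** 2


def block_connectivity(edges, block_of, k1, k2):
    """Median of C(X, Z) over X in block k1 and Z in block k2 (assumed aggregation)."""
    values = [
        c
        for (x, z), c in edges.items()
        if x != z and block_of[x] == k1 and block_of[z] == k2
    ]
    return median(values) if values else None


# Hypothetical predictors, their data blocks, and precomputed C(X, Z) values.
block_of = {"CA": "bile_acids", "TCA": "bile_acids", "LDLR": "rt_pcr"}
edges = {("LDLR", "CA"): 0.40, ("LDLR", "TCA"): 0.10, ("CA", "LDLR"): -0.05}

print(connectivity(0.8, 0.6))  # adjusting for Z weakens the association
print(connectivity(0.5, 0.6))  # adjusting for Z strengthens it (negative value)
print(block_connectivity(edges, block_of, "rt_pcr", "bile_acids"))
print(block_connectivity(edges, block_of, "bile_acids", "rt_pcr"))
```

The two block-level values differ, which illustrates why the resulting network is directed.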

Results

Weight loss rates differ across individuals

The rates of weight loss, characterized by their longitudinal trajectories and estimated slopes, differ markedly between study participants (Fig. 2). In general, the participants showed a linear decay in weight at different rates. Adding a quadratic term to describe the weight loss trajectory had no statistical significance (data not shown).

FIG. 2.

Rate of diet-induced weight loss is highly variable and not predicted by baseline weight. (a) Dynamics of the individual weight losses across the study period. The y-axis displays the % weight loss from baseline and x-axis the number of days after baseline, at which the weight measurements were taken. Smoothed curves and their CIs are estimated by the Loess method and overlaid to individual data points. (b) Estimated slope coefficients and their 95% CIs obtained from the OLS regression of the weight loss as a linear function of the days on study. The subjects are ordered according to their weights at baseline (x-axis). CI, confidence interval; OLS, ordinary least squares.

The most straightforward predictor equation, assuming each pound lost to be equivalent to a 3500-kcal deficit, did not predict individual rates. The slopes are different not only in their magnitude but also in the level of uncertainty estimated by the standard error. Although subjects were enrolled in different months over 1 year, there is no statistical evidence of seasonal components affecting the weight loss dynamics.
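The per-participant slopes and standard errors discussed above come from ordinary least squares fits of weight change on days in the study. A minimal standard-library sketch of such a fit is given below; the function name and the trajectory are hypothetical.

```python
import math


def ols_slope(x, y):
    """Slope of the least squares line y = a + b x and the standard error of b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(sse / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    return slope, std_error


# Hypothetical trajectory: % weight change from baseline measured weekly.
days = [0, 7, 14, 21, 28]
pct_change = [0.0, -1.6, -3.1, -4.4, -6.0]
slope, std_error = ols_slope(days, pct_change)
print(slope, std_error)
```

For this trajectory the slope is about -0.21% per day; a participant with noisier measurements would show a similar slope with a larger standard error.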

Omics heterogeneity

Omics profiles are heterogeneous and unstable, that is, they vary between subjects and fluctuate over time. In Supplementary Table S2, we show the median coefficient of variation, and interquartile range, across data sources. We used the recently developed Variance Partition method (Hoffman and Schadt, 2016), implemented in the variancePartition R package, to describe how much of the variance of each predictor can be explained by biological variability (within-subject) and effects of covariates (deterministic component). The residual component is related to random or nonexplained variance.

The method consists of fitting a linear model to each predictor, including fixed and random factors that are potential sources of variation. The method quantifies the proportion of total variation explained by each factor. In our data, after log transformation, we partitioned the variance in each predictor by fitting a mixed model with Time (pre-VLCD and post-VLCD) as a fixed effect and a random intercept for each subject. We show in Figure 3a how sources of variation are distributed within each omics.

FIG. 3.

Bile acids are the least correlated with VLCD and the most correlated with weight loss rate. (a) The violin plots show variance partition in the different omics. Variance is explained by three components: SUBJECT associated with the biological variation, VLCD accounting for time change induced by diet, and RESIDUAL for the nonexplained variability. The plot shows RNA-seq and RT-PCR as data sources with the highest variation explained by biological variability. (b) The boxplots show that the correlation of biomarkers with weight loss depends on the data source and time point in the study. The box shows quartiles of the Spearman correlation coefficient. The horizontal blue line locates the null correlation, and the red lines correlations of intermediate magnitude $(r = 0.5)$ . The proximity of the center of the box to the red lines shows that bile acids are the most highly correlated predictors at pre-VLCD and metabolites at post-VLCD. Changes in post- versus pre-VLCD demonstrate RT-PCR as the most highly correlated data source. RT-PCR, real-time-polymerase chain reaction; VLCD, very-low-calorie diet.

Overall, we see in Figure 3a that a large proportion of metabolites, bile acids, and microbiota exhibit high nonexplained variation, suggesting that the impact of VLCD on biomarkers originating from these omics is less predictable. In contrast, mRNA-based predictors from RT-PCR and RNA-seq showed larger within-subject variation. For clinical data, as expected, the variation is less explained by VLCD compared to measurements at the molecular level. Another interesting finding is that VLCD could only explain a very small portion of the variation in bile acids. At the same time, within-subject variation in some bile acids is higher than in most RNA-seq or RT-PCR biomarkers.

Spearman correlation between weight loss and predictors

Figure 3b illustrates how the Spearman correlation coefficient between weight loss rate and predictors ranges across data sources and time points. In this figure, the blue horizontal line is placed at the null association level $(γ_{Y X} = 0)$ , and the red horizontal lines represent moderate association $(γ_{Y X} = 0.5)$ . Changes from baseline in RT-PCR are the only scenario in which the median Spearman correlation coefficient exceeds the moderate association level, that is, in which most of the predictors exceed it. Bile acids' correlations are close to this level but still below the 0.5 threshold.

Screening weight loss predictors based on partial association score and persistent significance

Our approach ranked and selected predictors of weight loss by setting $α = 0.05$ and a threshold for persistent significance of $θ = 0.7$ . Therefore, after applying $Π (X)$ to rank predictors from all omics sources, we selected among the top ones a robust set of predictors with high persistent significance, that is, ${\hat{Ψ}}_{0.05} (X) > 0.7$ . The predictors of weight loss were examined and screened in three different scenarios: pre-VLCD (baseline), post-VLCD (after the participant reached 10% weight loss), and changes from pre-VLCD, that is, the differences between post-VLCD and pre-VLCD.

Figure 4a and Supplementary Tables S3–S5 summarize the predictors screened from our strategy. Figure 4a shows a bar plot with the estimated partial association score for predictors screened in pre-VLCD (n = 26), post-VLCD (n = 27), and changes from pre-VLCD (n = 22). Our strategy screened 61 unique predictors over these three scenarios. The cholic acid and genus Alistipes were screened in all scenarios.

FIG. 4.

Metrics for screening predictors of weight loss rate. (a) Biomarkers were selected at pre-VLCD, post-VLCD, and for changes from pre-VLCD according to the partial association score and persistent significance >0.7. Screened predictors are labeled on the Y-axis, and bar plots are colored to distinguish the data source. (b) A bubble plot based on GSEA results explains how data sources are enriched in the ranked list of biomarkers. The ranking criterion is based on the partial association score. The bubble size is proportional to the NES, and the gradient color schema indicates statistical significance. Bile acids are significantly enriched in all scenarios, pre-VLCD, post-VLCD, and for changes from pre-VLCD. GSEA, Gene Set Enrichment Analysis; NES, Normalized Enrichment Score.

At pre-VLCD, predictors from clinical data, metabolomics, and microbiota were equally represented, 23.1% in each of these data sources (Supplementary Table S3). The genera Butyricicoccus, Eggerthella, and Defluvitalea, oxygen binding, and primary bile acid metabolism were among the five top-ranked predictors. RNA-seq (33.3%) and bile acids (22.2%) were important data sources at post-VLCD. TCA, TCDCA, and cholic acid were at the top of the ranking, together with genus Alistipes and Response to DNA Damage Stimulus (Supplementary Table S4).

Among the 22 predictors screened for changes from pre-VLCD (Supplementary Table S5), the most represented data sources are RNA-seq (22.7%) and RT-PCR (22.7%), followed by bile acids (18.2%) and microbiota (18.2%). Insulin is the only predictor screened in the clinical data source. Allolithocholic acid showed the highest partial association score (5.18) and persistent significance (0.99).

In Supplementary Figure S1, violin plots show the distribution of $Π (X)$ across data sources and time references (pre-VLCD, post-VLCD, and changes from pre-VLCD). We can infer from each distribution the relative importance of the data source in predicting weight loss. The central tendency of the distribution is a proxy for the data source importance, and the dispersions help to compare heterogeneity from different omics. The densities in Supplementary Figure S1 are skewed toward zero. The high concentration of predictors with estimated $Π (X)$ close to zero suggests that most of them cannot directly explain weight loss variation.

Enrichment of data sources

The $Π (X)$ metric provides a way of ranking all 412 predictors integrating all data sources. To verify the enrichment of individual omics on top-ranked biomarkers, we applied Gene Set Enrichment Analysis (GSEA) (Subramanian et al, 2005) to predictors ranked in descending order according to the estimated $Π (X)$ . The Normalized Enrichment Score and the associated p-value were used for enrichment. The results are shown in Figure 4b, where bile acids and RT-PCR predictors are enriched in pre-VLCD, post-VLCD, and changes from pre-VLCD.
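The core of the enrichment step is a running-sum statistic over the ranked list. The sketch below implements only the unweighted, Kolmogorov-Smirnov-like enrichment score, in which each ranked predictor from the data source of interest moves the running sum up and every other predictor moves it down. Normalization and permutation-based p-values, which GSEA also provides, are omitted, and the labels and function name are hypothetical.

```python
def enrichment_score(ranked_labels, target):
    """Maximum signed deviation from zero of the unweighted running sum."""
    n = len(ranked_labels)
    n_hits = sum(label == target for label in ranked_labels)
    if n_hits == 0 or n_hits == n:
        raise ValueError("target must label some, but not all, ranked items")
    hit_step = 1 / n_hits
    miss_step = 1 / (n - n_hits)
    running, best = 0.0, 0.0
    for label in ranked_labels:
        running += hit_step if label == target else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical data-source labels of six predictors, ordered by decreasing score.
ranked_labels = ["bile_acids", "bile_acids", "rna_seq", "bile_acids", "rna_seq", "rna_seq"]
es_bile = enrichment_score(ranked_labels, "bile_acids")
es_rna = enrichment_score(ranked_labels, "rna_seq")
print(es_bile, es_rna)
```

A positive score indicates that the data source is concentrated near the top of the ranking, and a negative score that it is concentrated near the bottom.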

RT-PCR and fecal bile acids as network hubs

Finally, we aimed to understand the connectivity between the examined predictors by network analysis. In Figure 5, we show two networks (pre-VLCD and changes from pre-VLCD) built with the connectivity measures described in Eqs. (9) and (10). The directed arrows in the network indicate how a data source in the origin node impacts the median association with weight loss on the data source in the descendant node. In Figure 5b, when analyzing changes from pre-VLCD, all data sources have a clear impact on RT-PCR, meaning that gene expression interacts heavily with other data sources to predict weight loss rate. A self-loop in RT-PCR is also expected since the selected genes for amplification are supposed to be jointly associated with weight loss mechanisms. This dynamic differs from the one in Figure 5a, where bile acids and clinical data sources interact widely with RNA-seq.

FIG. 5.

Network analysis of biomarker interactions for weight loss rate. Network analyses at pre-VLCD and changes from pre-VLCD show how data sources interact when predicting the weight loss rate. Nodes represent individual data sources, and the line thickness represents the strength of the interaction. (a) At pre-VLCD, transcriptomics (RNA-Seq, RT-PCR) and fecal microbiota interact with bile acids. (b) In changes from pre-VLCD, the network structure shows RT-PCR occupying a hub position, interacting with other data sources.

Discussion

We propose a method for ranking and screening predictors of weight loss in a high-dimensional study with a small sample size. A multiomics pipeline performs the data integration and feature selection and explores the interconnection between omics. The strategy relies heavily on the use of the first-order partial correlation coefficient between the outcome and predictors, adjusting for any potential confounder. Two metrics, the Partial Association Score $(Π (X))$ and the Persistent Significance $(Ψ_{α} (X))$ , were proposed and jointly used to screen a robust set of weight loss predictors.

We applied the method to omics data collected in a small cohort of obese females provided a VLCD. We investigated the results in prestudy, poststudy, and changes from prestudy to poststudy. We evaluated heterogeneity in omics data sources and Spearman's correlation with the weight loss rate. We also investigated, within each data source, the empirical distribution of the Partial Association Score. The higher the entropy in the distribution, the more important the data source (Supplementary Fig. S1).

Clinical data and metabolites are more relevant predictors of weight loss at baseline (prestudy). This is supported by Figure 3b, where almost 25% of the predictors are in the region of moderate-to-large univariate correlation. GSEA results in Figure 4b show significant overrepresentation of these data sources in the predictors' ranking. Primary bile acid metabolism and high-density lipoprotein (HDL) are highly ranked in prestudy data (Supplementary Table S3).

A closer look at the top predictors revealed changes from baseline in abundance of the genus Alistipes, a member of the Bacteroidetes phylum, as the most important predictor within fecal microbiota (Supplementary Table S5), and three genera among the top-ranked predictors at prestudy: Butyricicoccus, Defluvitalea, and Eggerthella (Supplementary Table S3). The impact of Bacteroidetes on host metabolism was recently demonstrated (Gutiérrez-Repiso et al, 2022). The role of baseline gut microbiota in weight loss prediction has been discussed in recent literature (Diener et al, 2021).

Among the 22 biomarkers selected for changes from baseline, five came from the RT-PCR data source (Supplementary Table S5). These markers include LDLR and AKT1, which are part of the lipoprotein response pathway. Association between lipoprotein and weight loss has been extensively reported in several studies (Falkenhain et al, 2021; Ge et al, 2020; Rosenkilde et al, 2018). The importance of changes from prestudy in RT-PCR gene expression is also evident in univariate correlation with weight loss (Fig. 3b).

Subsets of fecal bile acids show large within-subject variability and moderate correlation with the rate of weight loss (Fig. 3). Unlike omics heavily skewed toward small values of the Partial Association Score, bile acids are overrepresented in the upper tail of the $Π (X)$ distribution (Supplementary Fig. S1). This is confirmed by GSEA (Fig. 4b). Finally, in Figure 5a, for prestudy data, bile acids occupied a hub position in the omics interaction network. Thus, analysis of our data through several approaches points to fecal bile acids as playing a role in determining the rates of weight loss.

Conjugated bile acids are synthesized from cholesterol in the liver (Quarfordt and Greenfield, 1973), pass into the small intestine by contraction of the gall bladder during a meal, and are extensively reabsorbed in the ileum. Although only about 5% of bile acids escape small intestinal absorption, the bile acid pool circulates up to six times per day, permitting a significant mass of these contents to enter the colon. In the colon and, to some extent, in the small intestine, microbiota first deconjugate the conjugated bile acids and then further metabolize these to form numerous metabolites. Several bile acids have been shown to stimulate the production of gut peptides that have important metabolic consequences (Vítek and Haluzík, 2016). Whether these could influence the rate of weight loss is presently unknown.

Conclusion

This study integrated data from multiple sources to increase understanding of the biological processes of weight loss. Because of the small sample size, we developed a strategy based on nonparametric ranking statistics, namely the first-order partial correlation coefficient. Data originated from 10 obese postmenopausal women provided a VLCD in a metabolic facility in New York, USA. The small sample size is a limitation; therefore, we adopted a nonparametric approach that explored parsimonious monotonic regressions.

Major variations in weight loss and weight gain rates have been observed in previous studies (Bouchard, 2021; Stunkard, 1996). Hypotheses about the determinants of weight loss rates in individual subjects have focused on genetic factors, differences in metabolic rates, energy expenditure, microbiota composition, and metabolites. Some individuals have been shown to have a “thrifty” phenotype with lower weight loss rates during caloric restrictions than those with a “spendthrift” phenotype who lost more weight (Reinhardt et al, 2015).

Our study provided elements to examine the relative importance of several omics used as predictors of weight loss rate. The top-ranked predictors include allolithocholic acid, taurocholic acid, microbiota abundance in genera Alistipes, Butyricicoccus, Defluvitalea, and Eggerthella, primary bile acid, and purine metabolisms. We also highlight that our method placed high predictive importance on RT-PCR gene expression. Screened predictors such as LDLR, DGAT2, AKT1, and PIK3CA have been associated with weight loss mechanisms.

A striking point was an overrepresentation of fecal bile acids on ranked predictors from prestudy and for changes from prestudy. Bile acids in the gut are now recognized as potent signaling molecules for receptors that act on systemic lipid and carbohydrate metabolism (Molinaro et al, 2018).

It must be pointed out that our data were derived only from postmenopausal women and are therefore limited to this population. However, we know of no data that suggest a gender difference in the variability of the weight loss rate. Furthermore, we studied only 10 subjects, which permits us only to develop a hypothesis about the important role of gut bile acids in determining the rate of weight loss. We hope our results will stimulate further studies on the biological role of gut bile acids in this process.

Footnotes

Acknowledgment

We thank the participants of the original clinical study and the coauthors of prior publications from this research program.

Authors' Contributions

J.C.d.R., J.O.A., J.L.B., and P.R.H. conceived the study. J.C.d.R. and J.M. performed the statistical design and analyses. J.O.A. executed the clinical transcriptomic and RT-PCR analyses, and Y.L. executed bioinformatic analyses. J.C.d.R., J.O.A., J.L.B., and P.R.H. wrote the article with input from all coauthors.

Availability of Data and Materials

The clinical trial design is reported under ClinicalTrials.gov identifier NCT01699906. Deidentified transcriptomic data are available from GEO under accession number GSE106289.

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

This work was supported by the National Center for Advancing Translational Sciences (grant no. UL1 TR000043), the National Institutes of Health Clinical and Translational Science Award Program to Rockefeller University, and The Rockefeller University Center for Basic and Translational Research on Disorders of the Digestive System.

References

1. Alemán, Bokulich, Swann, et al. Fecal microbiota and bile acid interactions with systemic and adipose tissue metabolism in diet-induced weight loss of obese postmenopausal women. J Transl Med, 2018; 16(1):244; doi: 10.1186/s12967-018-1619-z

2. Alemán, Iyengar, Walker, et al. Effects of rapid weight loss on systemic and adipose tissue inflammation and metabolism in obese postmenopausal women. J Endocr Soc, 2017; 1(6):625–637; doi: 10.1210/js.2017-00020

3. Bouchard. Genetics of obesity: What we have learned over decades of research. Obesity, 2021; 29(5):802–820; doi: 10.1002/oby.23116

4. Bouchard, Tremblay, Després J-P, et al. The response to long-term overfeeding in identical twins. N Engl J Med, 1990; 322(21):1477–1482; doi: 10.1056/NEJM199005243222101

5. de la Fuente, Bing, Hoeschele, et al. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics, 2004; 20(18):3565–3574; doi: 10.1093/bioinformatics/bth445

6. Diener, Qin, Zhou, et al. Baseline gut metagenomic functional gene signature associated with variable weight loss responses following a healthy lifestyle intervention in humans. Ercolini D, ed. mSystems, 2021; 6(5):e0096421; doi: 10.1128/mSystems.00964-21

7. Epskamp, Cramer AOJ, Waldorp, et al. Qgraph: Network visualizations of relationships in psychometric data. J Stat Soft, 2012; 48(4):1–8; doi: 10.18637/jss.v048.i04

8. Falkenhain, Roach, McCreary, et al. Effect of carbohydrate-restricted dietary interventions on LDL particle size and number in adults in the context of weight loss or weight maintenance: A systematic review and meta-analysis. Am J Clin Nutr, 2021; 114(4):1455–1466; doi: 10.1093/ajcn/nqab212

9. […], Sadeghirad, Ball GDC, et al. Comparison of dietary macronutrient patterns of 14 popular named dietary programmes for weight and cardiovascular risk factor reduction in adults: Systematic review and network meta-analysis of randomised trials. BMJ, 2020; 369:m696; doi: 10.1136/bmj.m696

10. Greenland. Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values. Am Statist, 2019; 73(Suppl. 1):106–114; doi: 10.1080/00031305.2018.1529625

11. Gutiérrez-Repiso, Garrido-Sánchez, Alcaide-Torres, et al. Predictive role of gut microbiota in weight loss achievement after bariatric surgery. J Am Coll Surg, 2022; 234(5):861–871; doi: 10.1097/XCS.0000000000000145

12. Hales, Fryar, Carroll, et al. Trends in obesity and severe obesity prevalence in US youth and adults by sex and age, 2007–2008 to 2015–2016. JAMA, 2018; 319(16):1723; doi: 10.1001/jama.2018.3060

13. Hänzelmann, Castelo, Guinney. GSVA: Gene set variation analysis for microarray and RNA-seq data. BMC Bioinform, 2013; 14(1):7; doi: 10.1186/1471-2105-14-7

14. Hoffman, Schadt. VariancePartition: Interpreting drivers of variation in complex gene expression studies. BMC Bioinform, 2016; 17(1):483; doi: 10.1186/s12859-016-1323-z

15. Kim. Ppcor: An R package for a fast calculation to semi-partial correlation coefficients. CSAM, 2015; 22(6):665–674; doi: 10.5351/CSAM.2015.22.6.665

16. Kirpich, Ainsworth, Wedow, et al. Variable selection in omics data: A practical evaluation of small sample sizes. Orloff MS, ed. PLoS One, 2018; 13(6):e0197910; doi: 10.1371/journal.pone.0197910

17. Leibel, Rosenbaum, Hirsch. Changes in energy expenditure resulting from altered body weight. N Engl J Med, 1995; 332(10):621–628; doi: 10.1056/NEJM199503093321001

18. […], Peng, Zhang, et al. Robust rank correlation based screening. Ann Statist, 2012; 40(3); doi: 10.1214/12-AOS1024

19. McCabe, Lin D-Y, Love. Consistency and overfitting of multi-omics methods on experimental data. Brief Bioinform, 2020; 21(4):1277–1284; doi: 10.1093/bib/bbz070

20. Molinaro, Wahlström, Marschall H-U. Role of bile acids in metabolic control. Trends Endocrinol Metab, 2018; 29(1):31–41; doi: 10.1016/j.tem.2017.11.002

21. Quarfordt, Greenfield. Estimation of cholesterol and bile acid turnover in man by kinetic analysis. J Clin Invest, 1973; 52(8):1937–1945; doi: 10.1172/JCI107378

22. Raghuraj Rao, Lakshminarayanan. Partial correlation based variable selection approach for multivariate data classification methods. Chemom Intell Lab Syst, 2007; 86(1):68–81; doi: 10.1016/j.chemolab.2006.08.007

23. Reinhardt, Thearle, Ibrahim, et al. A human thrifty phenotype associated with less weight loss during caloric restriction. Diabetes, 2015; 64(8):2859–2867; doi: 10.2337/db14-1881

24. Rosenkilde, Rygaard, Nordby, et al. Exercise and weight loss effects on cardiovascular risk factors in overweight men. J Appl Physiol, 2018; 125(3):901–908; doi: 10.1152/japplphysiol.01092.2017

25. Sheskin DJ. Handbook of Parametric and Nonparametric Statistical Procedures: Third Edition. Chapman and Hall/CRC; 2003; doi: 10.1201/9781420036268

26. Stunkard. Current views on obesity. Am J Med, 1996; 100(2):230–236; doi: 10.1016/S0002-9343(97)89464-8

27. Subramanian, Tamayo, Mootha, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 2005; 102(43):15545–15550; doi: 10.1073/pnas.0506580102

28. Valsesia, Chakrabarti, Hager, et al. Integrative phenotyping of glycemic responders upon clinical weight loss using multi-omics. Sci Rep, 2020; 10(1):9236; doi: 10.1038/s41598-020-65936-8

29. van den Heuvel, Zhan. Myths about linear and monotonic associations: Pearson's r, Spearman's ρ, and Kendall's τ. Am Statist, 2022; 76(1):44–52; doi: 10.1080/00031305.2021.2004922

30. Vítek, Haluzík. The role of bile acids in metabolic regulation. J Endocrinol, 2016; 228(3):R85–R96; doi: 10.1530/JOE-15-0469

31. Weatherburn CE. A First Course in Mathematical Statistics. Cambridge University Press; 1968.

32. Whittaker. Graphical Models in Applied Multivariate Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley: Chichester [England]; New York, NY, USA; 1990.
