Abstract
Metabolic dysfunction-associated steatotic liver disease (MASLD) is a leading cause of global liver morbidity. Genetic variants, particularly in the patatin-like phospholipase domain-containing 3 (PNPLA3) gene, are critical determinants of metabolic traits and disease progression. This study presents the first meta-analysis to clarify the association between the PNPLA3 rs2896019 polymorphism and MASLD susceptibility and severity. We systematically searched PubMed, Embase, Web of Science, and Google Scholar for relevant articles published up to February 12, 2026. Data were extracted, and summary estimates of the association between PNPLA3 rs2896019 and MASLD were assessed. Odds ratios (ORs) and 95% confidence intervals (CIs) were used to measure the effect. Ten eligible case–control cohorts involving 15,028 participants (3118 cases and 11,910 controls) were included. Our results demonstrated that the G allele is significantly associated with a 51% increased risk of MASLD (OR: 1.51, 95% CI: 1.23–1.85, P < 0.0001). A clear gene-dose effect was observed, with risks escalating in the homozygous model (OR: 2.32, 95% CI: 1.53–3.53, P < 0.0001). Subgroup analysis revealed that diagnostic modality was the primary source of heterogeneity, whereby restricting the analysis to biopsy-proven cohorts eliminated heterogeneity (I2 = 0%) and strengthened the association (OR = 1.91). Furthermore, the G allele increased the risk of severe MASLD/MASH (OR = 2.06) more strongly than that of mild steatosis (OR = 1.40). In conclusion, the PNPLA3 rs2896019 G allele is a robust, dose-dependent risk factor for both the development and severe progression of MASLD.
Introduction
Metabolic dysfunction-associated steatotic liver disease (MASLD) was recently renamed from non-alcoholic fatty liver disease (NAFLD) to more accurately reflect its underlying pathophysiology. 1 The disease continuum initiates as simple hepatic steatosis, characterized by excessive fat accumulation in over 5% of hepatocytes. Without timely intervention, this condition can progress to metabolic dysfunction-associated steatohepatitis (MASH), a more severe histological state characterized by hepatocyte ballooning and lobular inflammation. 2 Over time, chronic inflammation in MASH drives progressive fibrogenesis, eventually culminating in cirrhosis and hepatocellular carcinoma. 3 Fueled by the rising global prevalence of obesity, type 2 diabetes mellitus, and metabolic syndrome, 4 MASLD has emerged as the most prevalent chronic liver disease worldwide. 5 It currently affects an estimated 38% of the global adult population. 6 The staggering clinical and economic burden associated with MASLD 7 underscores the urgent need for a deeper understanding of the factors that precipitate its onset and accelerate its progression.
While metabolic and environmental drivers are central to MASLD pathogenesis, substantial evidence suggests that individual susceptibility and disease progression are strongly influenced by genetic factors. 8 Accumulating evidence from familial and epidemiological studies strongly implicates genetic predisposition in the natural history of the disease. 9 Extensive genome-wide association studies (GWAS) have identified several genetic loci linked to the disease, 10 with the patatin-like phospholipase domain-containing protein 3 (PNPLA3) gene, located on chromosome 22q13.31, consistently emerging as the major genetic determinant of MASLD. 11 Specifically, the single nucleotide polymorphism (SNP) of this gene, rs2896019, has been associated with the full spectrum of liver injury, ranging from simple steatosis to liver cancer. 12 It is important to note that the primary causal variant driving PNPLA3-related liver injury is rs738409, a non-synonymous mutation resulting in an isoleucine-to-methionine substitution (I148M) that directly alters hepatic lipid metabolism. 13 In contrast, rs2896019 is an intronic polymorphism that does not alter the protein-coding sequence. 14 However, it has been frequently reported in GWAS and clinical genotyping platforms because it is in strong linkage disequilibrium (LD) with rs738409 in several populations, thereby serving as a “tag SNP,” a proxy for the causal variant at this locus.12,15 Because the strength of this LD can fluctuate based on ancestral genetic architecture, 16 the reliability of rs2896019 as a surrogate marker for the causal variant is not uniform. This may contribute to inconsistencies observed across individual association studies. 17 Therefore, the rationale of our study is not to imply causality of rs2896019 per se, but rather to systematically evaluate whether this commonly used proxy SNP reliably reflects MASLD risk across diverse populations and study designs.
This is particularly relevant for interpreting GWAS datasets and non-targeted genotyping panels where rs738409 may not have been directly assessed. Furthermore, the existing literature is complicated by significant methodological heterogeneity. Particularly, variations in study design, sample sizes, and diagnostic modalities used to define hepatic steatosis create variance in reported risk estimates.18–24
The recent transition in diagnostic criteria from NAFLD to MASLD necessitates a rigorous, updated synthesis of genetic evidence that aligns with current clinical definitions. Despite the recognized importance of the PNPLA3 locus, prevailing discrepancies in reported effect sizes, often confounded by ethnic diversity and varying diagnostic methods, remain unresolved. To address these gaps, we conducted the first systematic review and meta-analysis to establish a precise estimate of the association between the PNPLA3 rs2896019 polymorphism and MASLD susceptibility in adult populations. This study seeks to clarify the gene-dose effect of the “G” allele. Ultimately, this research aims to provide high-level evidence confirming PNPLA3 rs2896019 as a risk factor for MASLD.
Method
Search strategy
A comprehensive literature search was conducted to identify studies evaluating the association between the PNPLA3 rs2896019 variant and MASLD. Electronic databases including PubMed, Embase, Web of Science, and Google Scholar were systematically searched, with the most recent update performed on February 12, 2026. The search strategy followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The following keyword combinations were used: “PNPLA3 and (rs2896019 or genetic variant or SNP) and (MASLD or MAFLD or NAFLD or fatty liver or steatosis).” Additionally, reference lists of relevant original articles and review articles were manually screened to ensure comprehensive coverage and identify any potential articles that could have been missed in the initial search.
Study eligibility
Only published articles that fulfilled the following inclusion criteria were included in this study: (i) full-text articles available in English in a readable and interpretable format; (ii) MASLD defined as the primary clinical outcome; (iii) evaluation of the association between the PNPLA3 rs2896019 polymorphism and MASLD; (iv) case–control study design; (v) sufficient genotype data reported to calculate, or allow the calculation of, odds ratios (ORs) with 95% confidence intervals (CIs); and (vi) study populations consisting exclusively of adults (≥18 years of age). Studies were excluded if they: (i) lacked accessible or comprehensible full text; (ii) did not focus primarily on MASLD; (iii) investigated PNPLA3 polymorphisms other than rs2896019; (iv) used non-case–control designs; (v) lacked a relevant genetic association analysis; or (vi) investigated pediatric or adolescent populations (<18 years of age).
Data extraction and quality score assessment
Data from eligible articles were extracted independently by two investigators (W.-Y.L. and R.M.) using standardized extraction forms. Initial screening was conducted based on titles and abstracts, followed by full-text evaluation where eligibility was uncertain. The following data were extracted: first author, year of publication, country, ethnicity, age group, diagnostic modality, genotyping method, sample size, and genotype distributions for cases and controls. Any discrepancies in data extraction were resolved through mutual discussion; if a consensus could not be reached, a third author (S.M.Z.) was consulted to adjudicate and make the final decision. The methodological quality of the included observational studies was assessed using the Newcastle–Ottawa Scale, which evaluates study selection, comparability of study groups, and outcome assessment. Scores were given on a scale from 0 to 9, with studies receiving a score of 7 or above categorized as high quality. Details of the PRISMA criteria and the flow diagram are provided in Supplementary Table S1 and Supplementary Fig. S1, respectively.
Statistical analysis
Meta-analysis was carried out using Review Manager (RevMan 5.4, The Cochrane Collaboration, Copenhagen, Denmark). The primary outcome of this meta-analysis was the pooled OR with its 95% CI, used to determine the strength of association between the variant and MASLD. Allele frequencies were determined using the allele counting method. Hardy-Weinberg equilibrium (HWE) of the genotypes from the control group in each individual study was determined using a goodness-of-fit χ2-test, with deviations defined as P < 0.05. LD between rs2896019 and the I148M (rs738409) variant was assessed using LDlink based on 1000 Genomes Project population data. 25 The following outcomes of interest were assessed across five genetic models of inheritance: allelic (G vs. T), dominant (GG+TG vs. TT), recessive (GG vs. TG+TT), homozygote (GG vs. TT), and heterozygote (TG vs. TT). The Mantel–Haenszel test was performed to estimate the pooled ORs and corresponding 95% CIs. Visual inspection of the forest plots was first carried out, followed by the χ2-test of heterogeneity (a test of significance for heterogeneity) and the inconsistency index I2 (for the magnitude of heterogeneity). Significant heterogeneity was defined when the χ2-test yielded a P-value < 0.10 and I2 values exceeded 50%. Due to the presence of substantial clinical and statistical heterogeneity (I2 > 50%) across the overall analyses, a random-effects model was systematically adopted. Subgroup analysis of different covariates (specifically diagnostic modality, population ethnicity, and disease severity) was carried out in consideration of the influence of these clinical variables on the overall effect size.
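As an illustration of how the five genetic models map onto 2×2 tables, the following minimal Python sketch derives a per-study OR from genotype counts. It is not the RevMan computation: the CI uses the Woolf (logit) approximation rather than Mantel–Haenszel pooling, the function names are our own, and the genotype counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d):
    """OR with a 95% CI (Woolf logit approximation) for one 2x2 table.
    a/b = risk/reference counts in cases, c/d = risk/reference counts in controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

def model_tables(cases, controls):
    """Build the 2x2 table for each genetic model.
    cases and controls are (GG, TG, TT) genotype counts."""
    gg1, tg1, tt1 = cases
    gg0, tg0, tt0 = controls
    return {
        # Allele counting: each GG carries two G alleles, each TG one.
        "allelic (G vs. T)": (2 * gg1 + tg1, 2 * tt1 + tg1, 2 * gg0 + tg0, 2 * tt0 + tg0),
        "dominant (GG+TG vs. TT)": (gg1 + tg1, tt1, gg0 + tg0, tt0),
        "recessive (GG vs. TG+TT)": (gg1, tg1 + tt1, gg0, tg0 + tt0),
        "homozygote (GG vs. TT)": (gg1, tt1, gg0, tt0),
        "heterozygote (TG vs. TT)": (tg1, tt1, tg0, tt0),
    }

# Hypothetical genotype counts, for illustration only.
cases = (30, 50, 20)
controls = (20, 50, 30)
for name, table in model_tables(cases, controls).items():
    or_, lo, hi = odds_ratio(*table)
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A table containing a zero cell would need a continuity correction before this formula is applied; the sketch does not handle that case.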
Publication bias and sensitivity analysis
Potential publication bias was assessed by visual inspection of funnel plot symmetry and quantitatively evaluated using Begg and Mazumdar’s rank correlation test and Egger’s linear regression test, performed in R software (version 4.0.5). Sensitivity analyses were conducted by sequentially excluding individual studies to examine their influence on pooled effect estimates. Statistical significance was defined as a two-tailed P value <0.05.
Result
Study characteristics
The study selection procedure is detailed in Figure 1. Overall, 696 articles matched the search terms. Following the removal of duplicates and screening of abstracts, 17 full-text articles were evaluated for eligibility. Eight studies were excluded due to investigating other polymorphisms, incorrect subject populations, lacking a relevant genetic association analysis, or investigating outcomes other than MASLD (Fig. 1). A total of 10 independent adult case–control cohorts from 9 eligible studies were included in the final meta-analysis. The baseline characteristics of the included populations are summarized in Table 1. The genotype frequencies of the PNPLA3 rs2896019 polymorphism in the control groups were consistent with HWE (P > 0.05) in all cohorts except for one. The study by Tsedendorj et al. demonstrated a deviation from HWE (P < 0.05), which was likely attributable to the small sample size of its control group (n = 50). All of the studies scored well on methodological quality assessment, which evaluated selection criteria, comparability of cases and controls in terms of design and analysis, and availability of genetic data (Supplementary data).
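The HWE check applied to each control group can be sketched as follows. This is a minimal illustration of a one-degree-of-freedom goodness-of-fit χ2-test, assuming a biallelic SNP; the function name and the genotype counts are hypothetical and are not taken from any included study.

```python
import math

def hwe_chi_square(n_gg, n_tg, n_tt):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.
    Returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are present, so that no expected count is zero."""
    n = n_gg + n_tg + n_tt
    p = (2 * n_gg + n_tg) / (2 * n)  # G allele frequency by allele counting
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_gg, n_tg, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control-group genotype counts (GG, TG, TT).
chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3g}")
```

With small control groups the χ2 approximation becomes unreliable, and an exact test is often preferred in that setting.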

PRISMA flow diagram detailing the literature search strategy, screening process, and final study selection.
Baseline Characteristics of the Case–Control Studies Included in the Meta-Analysis
SNP, single nucleotide polymorphism; PCR-RFLP, polymerase chain reaction-restriction fragment length polymorphism; RT-PCR, real-time polymerase chain reaction; ASA, Asian Screening Array.
Quantitative synthesis
Due to the presence of significant statistical heterogeneity across the included studies (I2 = 88%, P < 0.00001), a random-effects model was systematically employed for all primary analyses. The quantitative synthesis revealed a highly significant association between the PNPLA3 rs2896019 variant and MASLD susceptibility. Under the allelic model (G vs. T), the risk G allele was significantly more prevalent in cases, conferring a 51% increased risk of MASLD (OR: 1.51, 95% CI: 1.23–1.85, P < 0.0001). Consistent associations demonstrating a clear gene-dose effect were observed across all other genetic inheritance models. Significantly increased risks were identified in the heterozygous (OR: 1.49, 95% CI: 1.34–1.66, P < 0.00001), dominant (OR: 1.58, 95% CI: 1.21–2.07, P = 0.0008), recessive (OR: 1.96, 95% CI: 1.49–2.59, P < 0.00001), and homozygous (OR: 2.32, 95% CI: 1.53–3.53, P < 0.0001) models (Fig. 2, Table 2).
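For readers unfamiliar with the quantities reported above, the following sketch shows how Cochran's Q, I2, and a random-effects pooled OR relate to one another. It assumes inverse-variance weights with the DerSimonian–Laird estimator of between-study variance, which may differ in detail from the Mantel–Haenszel-based routine in RevMan; the function name and the input values are hypothetical.

```python
import math

def dersimonian_laird(log_ors, ses):
    """Random-effects pooling of per-study log ORs and their standard errors.
    Returns the pooled OR, its 95% CI, Cochran's Q, I2 (%), and tau2."""
    w = [1.0 / se ** 2 for se in ses]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributable to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
    return {"or": math.exp(pooled), "ci": ci, "q": q, "i2": i2, "tau2": tau2}

# Hypothetical per-study log ORs and standard errors.
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(result)
```

When Q does not exceed its degrees of freedom, tau2 is truncated at zero and the random-effects estimate coincides with the fixed-effect one.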

Forest plot of the overall association between the PNPLA3 rs2896019 polymorphism and MASLD risk under the primary allelic model (G vs. T).
Summary of Pooled Odds Ratios (ORs) and Heterogeneity Estimates for the Association Between the PNPLA3 rs2896019 Polymorphism and MASLD across Five Genetic Models of Inheritance
CI, confidence interval; RE, random-effects model; FE, fixed-effect model.
Subgroup analysis by diagnosis modality
Stratification by diagnostic modality successfully resolved the primary source of statistical heterogeneity. In the subgroup of patients diagnosed via the gold-standard liver biopsy, the genetic risk was highly consistent (OR: 1.91, 95% CI: 1.75–2.08, P < 0.00001) with heterogeneity dropping to an I2 of 0% (P = 0.80). Conversely, studies utilizing non-invasive imaging or ultrasonography showed a weaker, non-significant association (OR: 1.28, 95% CI: 0.96–1.71, P = 0.10) and retained high heterogeneity (I2 = 86%). These findings indicate that diagnostic methodology acts as a significant modifier of the observed genetic risk (Fig. 3).

Forest plot of the subgroup analysis stratified by diagnostic modality (Liver Biopsy vs. Diagnostic Imaging) under the allelic model (G vs. T). Note: The replication cohort from Kitamoto et al. was excluded from this specific subgroup analysis to prevent methodological confounding, as the study utilized a combined diagnostic approach of both liver biopsy and CT/MRI, precluding its strict assignment to a single diagnostic category.
Subgroup analysis by ethnicity
Stratification by ethnicity revealed distinct risk profiles across diverse populations. The G allele significantly increased MASLD risk in Asian cohorts (OR: 1.69, 95% CI: 1.46–1.96, P < 0.00001, I2 = 70%) and Caucasian cohorts (OR: 1.61, 95% CI: 1.30–1.99, P < 0.0001, I2 = 0%). In contrast, a reduced risk association was observed in the single Hispanic cohort (OR: 0.64, 95% CI: 0.50–0.82). These variations suggest a potential influence of underlying population-specific genetic architectures (Fig. 4).

Forest plot of the subgroup analysis stratified by population ethnicity (Asian, Caucasian, Hispanic) under the allelic model (G vs. T).
Subgroup analysis by disease severity
To assess the role of the polymorphism in disease progression, pooled risks were evaluated across different disease stages. The G allele conferred a significant risk for mild, simple steatosis (OR: 1.40, 95% CI: 1.17–1.66, P = 0.0002). Notably, this pooled risk estimate was markedly higher in patients diagnosed with severe MASLD, including MASH and MASH-HCC (OR: 2.06, 95% CI: 1.65–2.57, P < 0.00001). This contrast in the pooled estimates between mild and severe disease stages supports an association between the PNPLA3 G allele and the progression of hepatic damage (Fig. 5).
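One way to gauge the contrast between the two severity subgroups is a z-test on the difference of the log ORs, recovering each standard error from the reported 95% CI. The sketch below is illustrative only and was not part of the reported analysis: it assumes the two subgroups are independent (which may not hold if control groups are shared) and that the CIs are symmetric on the log scale. The function names are our own.

```python
import math

def se_from_ci(lo, hi, z=1.96):
    """Standard error of a log OR recovered from its 95% CI bounds."""
    return (math.log(hi) - math.log(lo)) / (2 * z)

def compare_subgroups(or_a, ci_a, or_b, ci_b):
    """z-test for the difference between two independent pooled log ORs.
    Returns (z, two-sided p-value)."""
    diff = math.log(or_b) - math.log(or_a)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Pooled estimates as reported above: mild steatosis vs. severe MASLD/MASH.
z, p = compare_subgroups(1.40, (1.17, 1.66), 2.06, (1.65, 2.57))
print(f"z = {z:.2f}, P = {p:.4f}")
```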

Forest plot of the subgroup analysis stratified by disease severity (Mild MASLD vs. Severe MASLD/MASH) under the allelic model (G vs. T).
Subgroup analysis by genotyping platform
To evaluate whether the choice of laboratory methodology influenced the observed association, a subgroup analysis stratified by genotyping platform was conducted. Within the subset of studies utilizing SNP arrays, the PNPLA3 G allele demonstrated a trend toward increased MASLD risk, though it did not reach independent statistical significance (P = 0.08). Conversely, significant associations were observed across all other genotyping platforms, yielding robust risk estimates for cohorts assessed via real-time polymerase chain reaction (RT-PCR) (OR: 1.50, 95% CI: 1.28–1.77) and polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) (OR: 1.67, 95% CI: 1.22–2.28). The replication cohort reported by Kitamoto et al. was excluded, as it was the only study employing a multiplex genotyping panel. Despite the variance observed within the SNP array subgroup, the overarching risk estimates across these diverse experimental techniques continue to support the reliability of the overall genetic association (Fig. 6).

Forest plot of the subgroup analysis stratified by genotyping platform (SNP Array, RT-PCR, PCR-RFLP) under the allelic model (G vs. T).
Sensitivity analysis
A leave-one-out sensitivity analysis was conducted under the primary allelic model (G vs. T) to evaluate the stability of the overall pooled ORs by sequentially omitting one study at a time and recalculating the effect size. The systematic exclusion of any single study did not materially alter the overall significance or direction of the pooled estimates, confirming the high stability and reliability of the findings. Notably, the exclusion of the Hispanic cohort (Larrieta-Carrasco et al.) resulted in a substantial reduction in statistical heterogeneity (I2 improved from 88% to 61%, P = 0.008), while the primary association remained robustly significant (OR: 1.68, 95% CI: 1.49–1.90, P < 0.00001). This identifies this distinct population as a primary source of statistical variance.
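The leave-one-out procedure can be sketched as follows. This is a minimal illustration that assumes inverse-variance DerSimonian–Laird pooling rather than the exact RevMan routine; the study labels, effect sizes, and function names are hypothetical.

```python
import math

def pool_dl(y, se):
    """Compact DerSimonian-Laird pooling; returns (pooled log OR, I2 in %)."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return pooled, i2

def leave_one_out(labels, y, se):
    """Re-pool after omitting each study in turn.
    Returns {omitted study: (pooled OR, I2 in %)}."""
    results = {}
    for i, label in enumerate(labels):
        pooled, i2 = pool_dl(y[:i] + y[i + 1:], se[:i] + se[i + 1:])
        results[label] = (math.exp(pooled), i2)
    return results

# Hypothetical studies: C points in the opposite direction to A and B.
labels = ["Study A", "Study B", "Study C"]
log_ors = [0.5, 0.5, -0.5]
ses = [0.1, 0.1, 0.1]
for label, (or_, i2) in leave_one_out(labels, log_ors, ses).items():
    print(f"omit {label}: OR {or_:.2f}, I2 {i2:.0f}%")
```

A study whose omission sharply lowers I2 while leaving the pooled OR significant is flagged as a source of heterogeneity, which mirrors the pattern described above for the Hispanic cohort.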
Publication bias
Visual inspection of the funnel plot, along with Egger’s regression test (P = 0.26) and Begg and Mazumdar’s rank correlation test (P = 0.087), provided no evidence of publication bias (Supplementary Fig. S2).
Discussion
The PNPLA3 gene plays an important role in hepatic lipid metabolism. 28 While the extensively studied rs738409 variant results in a functional isoleucine-to-methionine (I148M) substitution, rs2896019 is a distinct intronic polymorphism within the PNPLA3 locus. 14 Although it does not alter the protein-coding sequence, rs2896019 is strongly associated with altered liver enzyme levels and the pathogenesis of steatotic liver disease. 26 In addition, its strong LD with the functional I148M variant in several populations highlights its potential pathogenic relevance. Therefore, clarifying the association between rs2896019 and both MASLD risk and disease severity across diverse populations remains of substantial clinical importance.
To our knowledge, this is the first comprehensive meta-analysis investigating the association between PNPLA3 rs2896019 polymorphism and MASLD susceptibility in adults. Drawing on 10 independent case–control cohorts involving over 15,000 subjects, our findings demonstrate that the G allele confers a significantly increased risk of MASLD, with a 1.51-fold higher odds under the primary allelic model. Importantly, this association exhibited a pronounced gene-dosage effect, escalating to a 132% increased risk in homozygous (GG) individuals. These results underscore PNPLA3 rs2896019 polymorphism as an important genetic biomarker for MASLD risk assessment.
Interestingly, we noted discrepancies in the directionality of allele effects across ethnicities. While the G allele was a consistent risk factor in Asian and Caucasian populations, the single Hispanic cohort demonstrated a divergent, protective trend. 27 This finding is particularly paradoxical given that the PNPLA3 locus typically exerts its greatest risk effect within Hispanic populations, driven by the high prevalence of the functional I148M variant. 15 As noted previously, the strength of the LD between rs2896019 and I148M varies across populations due to differences in genetic architectures and recombination patterns, meaning the reliability of rs2896019 as a surrogate marker is not uniform. While rs2896019 is an excellent proxy for the causal variant in many populations, Hispanic cohorts are characterized by complex historical admixture. 29 Frequent genetic recombination events in admixed populations can alter local haplotype blocks, causing the statistical correlation between the tag SNP and the causal mutation to degrade. 30 However, our LD analysis (LDlink) showed that in Hispanic sub-populations, rs2896019 is in strong LD (D’ = 0.97) with the causal I148M variant. This suggests that the divergence in this specific cohort may stem from its unique haplotype structure. Sensitivity analysis showed that excluding this cohort reduced heterogeneity (I2 from 88% to 61%) while the pooled association remained significant.
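For reference, the pairwise LD statistics reported by tools such as LDlink can be computed from haplotype and allele frequencies as sketched below. This is a generic illustration of the standard definitions of D, D’, and r2, not a reproduction of the LDlink query; the function name and the frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.
    p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: frequencies of allele A and allele B.
    Returns (D, D', r2)."""
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: allele A always travels with allele B,
# but allele B is more common than allele A.
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

D’ and r2 answer different questions: D’ reflects the absence of historical recombination between the two sites, whereas r2 reflects how well one genotype predicts the other, so the two can diverge when allele frequencies differ.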
A major strength of this study is its large aggregated sample of over 15,000 individuals, enabling robust and well-powered evidence for the significant contribution of rs2896019 to MASLD susceptibility. Nevertheless, a notable limitation of the overall analysis was the presence of substantial heterogeneity (I2 = 88%). Our subgroup stratification by diagnostic modality successfully identified the source of this variance. Specifically, in cohorts utilizing the gold-standard liver biopsy, heterogeneity was entirely eliminated (I2 = 0%). Within this biopsy-proven subgroup, the pooled risk estimate was both highly consistent and stronger (OR: 1.91) than in studies utilizing imaging or ultrasonography (OR: 1.28). Ultrasonography is highly operator-dependent and relatively insensitive to mild steatosis, often detecting hepatic fat only when it exceeds 20%–30% of the liver parenchyma. 31 Our findings suggest that imaging-based phenotypic misclassification may introduce substantial variability into genetic association analyses. Conversely, accurate histological diagnosis reveals a more consistent underlying genetic risk for the PNPLA3 variant. Furthermore, our analysis provides evidence that the rs2896019 polymorphism is associated not only with early steatosis but also with disease severity. Subgroup analysis demonstrated a marked increase in risk from simple steatosis (OR: 1.40) to severe MASLD and MASH (OR: 2.06).
We also investigated the impact of genotyping methodology. The association between the G allele and MASLD remained robust across various targeted genotyping platforms, including RT-PCR, PCR-RFLP, and the Invader assay. High-throughput genome-wide SNP arrays demonstrated a trend toward increased risk, though without independent statistical significance, likely due to the non-targeted nature of GWAS designs. The consistent direction of effect across diverse platforms reinforces that the observed genetic risk reflects a true biological phenomenon rather than an assay-specific artifact, supporting the reliability of our pooled estimates.
Overall, this study was conducted using a rigorous methodology, including a comprehensive literature search, stringent selection criteria, detailed quality assessment, and evaluation across five genetic models. Sensitivity analysis confirmed the robustness of results, reinforcing the credibility of our conclusions.
Conclusion
In conclusion, this meta-analysis provides the first aggregated evidence that the PNPLA3 rs2896019 G allele significantly increases both the susceptibility to and severity of MASLD. These findings support the potential of PNPLA3 rs2896019 as a robust genetic biomarker in MASLD.
Original Work Contribution
This contribution represents original work that has not been previously published or simultaneously submitted for publication elsewhere.
Author Contribution and Approval
All authors contributed to the conception and design of the article and interpreting the relevant literature. All authors were involved in writing the article or revising it for intellectual content. The article has been read and approved by all the authors who agree to be accountable for all aspects of the work. All ICMJE conditions have been met.
Authors’ Contributions
W.-Y.L.: Writing—review and editing, writing—original draft, conceptualization, data curation, and formal analysis. R.M.: Writing—review and editing, writing—original draft, conceptualization, data curation, and formal analysis. Y.-F.P.: Writing—review and editing, and conceptualization. H.-K.L.: Writing—review and editing. R.-X.N.: Writing—review and editing. S.M.Z.: Writing—review and editing, conceptualization, and supervision. All authors approved the final version of the article.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
The work was not supported by any external funding.
