Abstract
BACKGROUND:
Ever since the GALAD (gender-age-Lens culinaris agglutinin-reactive alpha-fetoprotein-alpha-fetoprotein-des-gamma-carboxy prothrombin) logistic regression model was established to diagnose hepatocellular carcinoma (HCC), there has been no high-level evidence that evaluates and summarizes it.
OBJECTIVE:
This meta-analysis was performed to assess the diagnostic ability of the GALAD model.
METHODS:
The following databases were systematically searched for original diagnostic studies on HCC: PubMed, Embase, Medline, the Web of Science, the Cochrane Library, China National Knowledge Infrastructure, Wanfang (China), Wiper, and the Chinese BioMedical Literature Database. After the search results were screened against our criteria, the Quality Assessment of Diagnostic Accuracy Studies 2 tool was used to evaluate methodological quality, and statistical software was used to compute the pooled statistics.
RESULTS:
Ultimately, 10 studies were included and analyzed. The results revealed the pooled sensitivity and specificity of the GALAD model to be 0.86 (95% confidence interval [CI]: 0.82, 0.90) and 0.90 (95% CI: 0.87, 0.92), respectively, for all-stage HCC. The area under the curve (AUC) was 0.94. For early-stage HCC, the pooled sensitivity and specificity of the GALAD model were 0.83 (95% CI: 0.78, 0.87) and 0.81 (95% CI: 0.78, 0.83), respectively. The AUC was 0.90.
CONCLUSION:
This meta-analysis confirmed that the GALAD model has excellent diagnostic performance for all-stage HCC and maintains high sensitivity and specificity in early-stage HCC. The GALAD model is therefore suitable for screening patients with chronic liver disease for early-stage HCC.
Introduction
In 2018, approximately 841,000 patients worldwide were newly diagnosed with hepatocellular carcinoma (HCC), and about 781,000 patients died of the disease. The World Health Organization estimated that there were 905,677 new cases of HCC worldwide in 2020 [1]. Today, HCC is the sixth most common malignant tumor in the world, and the number of new cases is expected to exceed 1 million annually by 2025 [1, 2]. This increase may be related to the population’s increased exposure to risk factors.
There are several proven risk factors associated with HCC. For instance, viral hepatitis or liver cirrhosis caused by the hepatitis B virus (HBV) or the hepatitis C virus (HCV), excessive aflatoxin intake, fatty liver caused by long-term alcoholism, obesity, smoking, diabetes mellitus type 2, and other factors all can lead to HCC. The main risk factors vary from region to region. In most of the world’s high-risk areas (e.g., China and East Africa), long-term aflatoxin intake and chronic HBV infection are the primary factors, while in some countries (e.g., Japan and Egypt), chronic HCV infection may be the main cause [3].
The 5-year survival rate of patients with early-stage HCC (Barcelona Clinic Liver Cancer [BCLC] stage 0–A) who receive timely radical treatment reaches 69.0%–86.2%, compared with 5%–30% in patients who are treated late [4, 5]. The key to increasing the overall survival rate of HCC is to identify the high-risk population systematically and to formulate a stratified monitoring strategy for early detection and diagnosis [6].
At present, the screening and detection methods of HCC include imaging methods, serological biomarkers, epigenetic markers, and pathological biopsy. In terms of imaging, the American Association for the Study of Liver Diseases recommended in its 2018 guidelines [7] that regular abdominal ultrasound should be the primary monitoring method for high-risk individuals. However, the benefits of regular ultrasound examination for the early detection of HCC are limited, whether combined with serum alpha-fetoprotein (AFP) or not, and several studies have demonstrated that the efficacy of ultrasound in detecting early HCC is unsatisfactory. For liver cancers
There is no doubt that computed tomography (CT) and magnetic resonance imaging (MRI) are more reliable than ultrasound for accurately diagnosing HCC. The detection of lesions by CT is still regarded as one of the necessary criteria in the Chinese diagnostic standard, with the overall sensitivity and specificity of the technique reaching 70%–81% and 79%–94%, respectively [13, 14, 15]. Although MRI and positron emission tomography are more accurate than ultrasound, they take longer and are more expensive, so they are not generally used for primary screening [13].
Biomarkers include serum markers and cancer gene tests. Gene testing requires technologies such as DNA sequencing to measure telomerase reverse transcriptase, microRNAs, tumor protein 53, and other targets in order to clarify the diagnosis or to assess treatment efficacy [12, 16]. However, cancer gene testing is still an immature laboratory technique with poor reproducibility, whereas the use of serum biomarkers, including AFP, Lens culinaris agglutinin-reactive AFP (AFP-L3), carcinoembryonic antigen (CEA), and des-gamma-carboxy prothrombin (DCP, also known as PIVKA-II), is well established [6]. Liebman [17] reported DCP as a remarkably specific indicator of HCC; its diagnostic value has been verified globally, and it has been included in the Chinese diagnosis and treatment guidelines for HCC and approved for risk stratification in the United States [18]. However, although the specificity of DCP can reach 81%–98%, its sensitivity reaches only 48%–62%, which is not conducive to the detection of early-stage HCC [19].
Several studies have shown that combining DCP with other serum biomarkers, such as AFP, CEA, and AFP-L3, can effectively raise diagnostic sensitivity while maintaining high specificity [18]. In 2014, Johnson [20] established the GALAD (gender-age-Lens culinaris agglutinin-reactive alpha-fetoprotein-alpha-fetoprotein-des-gamma-carboxy prothrombin) logistic regression model, which combines gender, age, AFP-L3, AFP, and DCP. The authors found that the GALAD model had high diagnostic accuracy for both all-stage and early-stage HCC. The GALAD model has been preliminarily validated in China, Germany, the U.K., the U.S., Japan, and many other countries, and most of these studies reported excellent diagnostic efficacy [21, 22, 23].
Consequently, this meta-analysis aimed to evaluate the diagnostic accuracy of the GALAD model for all-stage and early-stage HCC by integrating previous relevant high-quality studies on the application of the model for diagnosing HCC to provide reference data for accurate screening, diagnosis, and prognosis assessments for HCC in clinical practice.
Methods
Study retrieval strategies
Each database was searched from its inception to June 2022. English-language databases included PubMed, Embase, Medline, the Web of Science, and the Cochrane Library, while Chinese-language databases included the China National Knowledge Infrastructure, Wanfang, Wiper, and the Chinese BioMedical Literature Database. The search strategy for each database was agreed on in advance. Our search string combined synonyms for each medical term to reduce the probability of missing studies.
Study inclusion criteria
The inclusion criteria were studies that (1) used pathological or histopathological biopsy or imaging examination to establish a definite diagnosis of HCC, (2) used the GALAD model to diagnose HCC and analyzed its diagnostic efficacy (patients with HCC were compared with control groups), (3) provided true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) data, or statistical data from which these could be derived, and (4) were prospective or retrospective randomized controlled studies.
Study exclusion criteria
The exclusion criteria were studies that (1) were not human clinical trials, (2) were expert consensuses, conference papers, monographs, books, reviews, or interviews, (3) were duplicate reports or duplicate records retrieved from different databases, (4) did not present specific key data, (5) did not use biopsy or radiography as the diagnostic gold standard, (6) had an interval of more than 6 months between the gold standard test and blood sampling, or (7) included fewer than 50 cases in total.
Study quality assessment
Two reviewers independently read the title, abstract, and full text of the retrieved records to determine which to include and exclude. Disagreements arising from this process were resolved by discussion. The quality and eligibility of the candidate studies were assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [24].
Data analysis
(1) Key data extracted by two reviewers from each study independently were summarized in two-by-two tables of TP, FP, FN, and TN values based on the formulas for sensitivity (TP rate)
(2) When the same disease is diagnosed, the sensitivity and specificity of the same index differ with the cut-off value used; this produces the threshold effect, which is the main source of heterogeneity. Therefore, in our study, Spearman’s correlation coefficient, assessed with the Moses-Shapiro-Littenberg model (i.e., the Moses model), was used in Meta-DiSc statistical software (version 1.4) [25] to check for heterogeneity caused by the threshold effect. Just as the
(3) An inconsistency test and Cochran’s Q statistics were used to evaluate the heterogeneity among studies induced by the non-threshold effect. In this test,
(4) Forest plots of sensitivity and specificity for each biochemical index and summary receiver operating characteristic (SROC) curves were drawn in Stata 16 software. The SROC curves and areas under the curve (AUCs) of the indicators were compared to assess the advantages and disadvantages of each as a diagnostic marker of HCC.
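The per-study quantities described in the steps above can be sketched in a few lines. This is a minimal illustration using standard textbook definitions, not the computation performed by Meta-DiSc or Stata; the function names and the example counts are hypothetical, and the pooled estimates in this paper come from a random-effects model rather than from these simple per-study formulas.

```python
from typing import NamedTuple


class DiagnosticMetrics(NamedTuple):
    sensitivity: float
    specificity: float
    plr: float  # positive likelihood ratio
    nlr: float  # negative likelihood ratio
    dor: float  # diagnostic odds ratio


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute accuracy measures from one study's two-by-two table.

    Assumes no cell is zero; a continuity correction would be needed otherwise.
    """
    sensitivity = tp / (tp + fn)             # true-positive rate
    specificity = tn / (tn + fp)             # true-negative rate
    plr = sensitivity / (1 - specificity)    # how much a positive result raises the odds
    nlr = (1 - sensitivity) / specificity    # how much a negative result lowers the odds
    dor = (tp * tn) / (fp * fn)              # equals plr / nlr
    return DiagnosticMetrics(sensitivity, specificity, plr, nlr, dor)


def i_squared(q: float, n_studies: int) -> float:
    """Inconsistency index I^2 (percent) from Cochran's Q and the study count."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(diagnostic_metrics(tp=86, fp=10, fn=14, tn=90))
    print(i_squared(q=30.0, n_studies=10))
```

A higher I^2 indicates that more of the between-study variation is due to heterogeneity rather than chance.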
Results
Retrieval results
According to the jointly agreed retrieval scheme, 340 records were obtained through preliminary searches. Seventeen duplicate records were removed, and 276 records irrelevant to the diagnosis of HCC were excluded after the abstracts were read. After the full texts were read, 25 records without GALAD model analysis and 12 records without key data were excluded. Ultimately, 10 studies [20, 21, 22, 23, 27, 28, 29, 30, 31, 32] were retained for further analysis. Figure 1 shows part of the jointly agreed retrieval strategy, along with the overall retrieval flow chart and selection process.
Part of the retrieval strategy in PubMed (A) and the study retrieval flow chart (B).
All 10 studies contained data on using the GALAD model to diagnose HCC and included 10,251 patients with HCC. The GALAD value calculation methods used in the included studies were the same as those of Johnson [20]:
[Z
In this equation, the “sex” value for males
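For readers who want to see how a GALAD value is computed, the following is a minimal sketch. The coefficients are those commonly reported for the model of Johnson [20]; they are not recoverable from the text here and should be treated as an assumption to be checked against the original publication. The function names, the units (AFP and DCP in ng/mL, AFP-L3 as a percentage), and the coding of sex as 1 for males and 0 for females are likewise assumptions of this sketch.

```python
import math

# ASSUMPTION: coefficients as commonly reported for the GALAD model [20];
# verify against the original publication before any real use.
INTERCEPT = -10.08
COEF_AGE = 0.09
COEF_SEX = 1.67
COEF_LOG_AFP = 2.34
COEF_AFP_L3 = 0.04
COEF_LOG_DCP = 1.33


def galad_z(age_years: float, is_male: bool, afp: float,
            afp_l3_percent: float, dcp: float) -> float:
    """Linear predictor Z of the logistic regression model."""
    if afp <= 0 or dcp <= 0:
        raise ValueError("AFP and DCP must be positive to take log10")
    sex = 1 if is_male else 0  # assumed coding: male = 1, female = 0
    return (INTERCEPT
            + COEF_AGE * age_years
            + COEF_SEX * sex
            + COEF_LOG_AFP * math.log10(afp)
            + COEF_AFP_L3 * afp_l3_percent
            + COEF_LOG_DCP * math.log10(dcp))


def galad_probability(z: float) -> float:
    """Convert Z to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    z = galad_z(age_years=60, is_male=True, afp=10.0, afp_l3_percent=5.0, dcp=10.0)
    print(z, galad_probability(z))
```

Taking log10 of AFP and DCP compresses their skewed distributions, which is why both must be strictly positive.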
Table 1 provides the quality evaluation results for all 10 studies through the application of the QUADAS-2 tool. Two reviewers used the QUADAS-2 tool independently for evaluation before summarizing the data. In the case of dispute, we re-evaluated the data through discussion or by specifying more detailed criteria. Table 2 lists the regions, the total number of participants, the diagnostic indicators used, and the nature of the control group in each included study; additionally, it presents the final data extracted from each study. For cases of early-stage HCC, the data calculation results of the GALAD model are shown in Table 3.
QUADAS-2 tool for study quality evaluation. Low: Low risk; High: High risk; ?: Unclear risk.
To enhance the comparability of the data, AFP, AFP-L3, DCP, and their combination were also tested with Spearman’s correlation coefficient and Moses’ model, in addition to the GALAD model, to check for heterogeneity arising from the threshold effect. However, the cut-off values of AFP and AFP-L3 in Lin’s study [30] were 186 ng/mL and 1%, respectively, and the cut-off value of AFP in Huang’s study [27] was 7.75 ng/mL, whereas the other studies used 20 ng/mL and 10%, respectively. Therefore, these outlying data were not included in the threshold effect test or the final analysis of the results.
Funnel plot: GALAD model (A); AFP 
Table 4 shows Spearman’s correlation coefficient and Moses’ model test results, which revealed that there was no threshold effect for each serum biomarker. Funnel plots were obtained using Deeks’ funnel plot asymmetry test. The funnel plots are presented in Fig. 2 and indicate that there was no publication bias in the included studies for each biomarker. The heterogeneity test conducted in Meta-DiSc software showed that heterogeneity existed among different studies; hence, the random-effects model in Stata software was used in this meta-analysis.
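The threshold effect check can be illustrated with a small sketch. It assumes the usual approach of correlating each study's sensitivity with its false-positive rate (1 - specificity); because Spearman's coefficient is rank based, applying a logit transform first would give the same value. The function names and example values are hypothetical, and this is not the Meta-DiSc implementation.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def threshold_effect_rho(sensitivities: Sequence[float],
                         specificities: Sequence[float]) -> float:
    """Correlate sensitivity with the false-positive rate across studies."""
    false_positive_rates = [1 - s for s in specificities]
    return spearman_rho(sensitivities, false_positive_rates)


if __name__ == "__main__":
    # Hypothetical per-study values, for illustration only.
    print(threshold_effect_rho([0.80, 0.85, 0.90], [0.95, 0.90, 0.85]))
```

A strong positive correlation suggests that studies trade specificity for sensitivity by using different cut-offs, that is, a threshold effect; a weak or non-significant correlation is consistent with its absence.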
Basic characteristics of the included studies and the data extracted from them
Total: Total number of cases and controls; AFP: Alpha-fetoprotein. The cut-off values are in parentheses. The cut-off value of AFP was 186 ng/mL in the fourth paper, 7.75 ng/mL in the eighth paper, and 20 ng/mL in the rest; AFP-L3: The percentage of Lens culinaris agglutinin-reactive alpha-fetoprotein; DCP: Des-
Data results for early-stage HCC
The GALAD model was used to diagnose patients with early liver cancer (BCLC 0/A). Total: Total number of cases and controls in early liver cancer patients.
Results of Spearman’s correlation coefficient and Moses’ model test
It can be considered that there is no threshold effect if the
A meta-analysis of the GALAD model for the diagnosis of all-stage and early-stage HCC was conducted, and the three serum biomarkers mentioned above and their combination for the diagnosis of all-stage HCC were analyzed for comparison. The preliminary analysis revealed that the pooled sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) of the GALAD model were 0.86 (95% confidence interval [CI]: 0.82, 0.90), 0.90 (95% CI: 0.87, 0.92), 8.4 (95% CI: 6.6, 10.7), and 0.15 (95% CI: 0.12, 0.20), respectively, for all-stage HCC. The diagnostic odds ratio (DOR) was 56 (95% CI: 39, 79), and the AUC was 0.94 (95% CI: 0.92, 0.96). For early-stage HCC classified by BCLC, the pooled sensitivity, specificity, and DOR of the GALAD model were 0.83 (95% CI: 0.78, 0.87), 0.81 (95% CI: 0.78, 0.83), and 22.49 (95% CI: 12.84, 39.37), respectively, and the AUC was 0.899.
The results for the three serum markers that constitute the GALAD model, and for their combination, were as follows; all relate to all-stage HCC. For each, the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) are given in that order, followed by the AUC.
(1) AFP: 0.62 (95% CI: 0.56, 0.68), 0.92 (95% CI: 0.88, 0.95), 8.1 (95% CI: 5.3, 12.3), 0.41 (95% CI: 0.36, 0.47), and 20 (95% CI: 13, 29), respectively. The AUC was 0.82 (95% CI: 0.78, 0.85).
(2) AFP-L3: 0.59 (95% CI: 0.51, 0.66), 0.93 (95% CI: 0.86, 0.96), 7.9 (95% CI: 4.3, 14.5), 0.45 (95% CI: 0.38, 0.53), and 18 (95% CI: 10, 33), respectively. The AUC was 0.79 (95% CI: 0.75, 0.82).
(3) DCP: 0.76 (95% CI: 0.68, 0.82), 0.88 (95% CI: 0.84, 0.91), 6.4 (95% CI: 5.1, 8.0), 0.28 (95% CI: 0.21, 0.36), and 23 (95% CI: 17, 32), respectively. The AUC was 0.90 (95% CI: 0.87, 0.93).
(4) Combination of the three biomarkers: 0.85 (95% CI: 0.81, 0.88), 0.86 (95% CI: 0.80, 0.91), 6.3 (95% CI: 4.3, 9.2), 0.17 (95% CI: 0.14, 0.21), and 37 (95% CI: 24, 55), respectively. The AUC was 0.92 (95% CI: 0.89, 0.94).
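Likelihood ratios of the kind used in a Fagan nomogram convert a pre-test probability into a post-test probability. The sketch below assumes that the pooled GALAD values of 8.4 and 0.15 reported for all-stage HCC are read as positive and negative likelihood ratios, and it uses a purely hypothetical pre-test probability of 20%; the function name is also an assumption of this example.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    pre_test = 0.20  # hypothetical prevalence in a surveillance population
    print(post_test_probability(pre_test, 8.4))   # after a positive GALAD result
    print(post_test_probability(pre_test, 0.15))  # after a negative GALAD result
```

Under these assumptions, a positive result raises the probability of HCC from 20% to roughly 68%, and a negative result lowers it to roughly 4%.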
Forest plot of sensitivity and specificity for GALAD model (A) and combination of AFP, AFP-L3 and DCP (B).
SROC curve (A), meta regression for looking for sources of heterogeneity (B), and Fagan nomogram of GALAD model (C).
Figure 3A is the forest plot of the GALAD model and shows the sensitivity and specificity of each included study together with the pooled estimates. The forest plot of the combination of AFP, AFP-L3, and DCP is shown in Fig. 3B. Figure 4 presents an overview of the SROC curves for all indicators and their AUCs.
In addition, meta-regression in Stata 16 was applied to look for sources of heterogeneity caused by non-threshold effects. Using the information given in the included studies, we classified each study as “yes” or “no” on the following criteria: the region of the study cases (Europe and Asia), the type of control group selected, the type of study, whether the controls were followed up, and the number of participants included. The “Quality Evaluation” section in Table 2 shows the results of our classification based on the information in each paper. Figure 4B shows the results of the meta-regression; none of our classifications was identified as a source of heterogeneity.
Discussion
Hepatic carcinoma is one of the world’s major cancers, and China is one of the hardest-hit regions. According to China’s 2020 cancer statistics, HCC is the fourth most commonly diagnosed malignant tumor in the country but causes the second highest number of deaths among all malignant diseases [33]. In China, chronic HBV infection, which can cause cirrhosis, is still the main cause of HCC [34]. Monitoring patients with hepatitis-related cirrhosis for carcinogenesis is therefore a particularly important measure for prevention and control. Serum biomarker monitoring is inexpensive and reasonably accurate, which makes it convenient for long-term, repeated monitoring of patients with chronic liver disease (CLD). Furthermore, it can both reduce medical costs and decrease mortality among patients at high risk of HCC [35]. In this meta-analysis, the diagnostic efficacy of AFP, AFP-L3, DCP, and their combination (AFP
In Zhou’s [39] meta-analysis, the AUC of AFP-L3 was 0.755. In a meta-analysis by Tzartzeva [9], the pooled sensitivity of ultrasound alone for early-stage HCC (staged according to the Milan criteria) was just 47%. When ultrasound was combined with AFP, the sensitivity reached 63%. The sensitivity of CT or MRI was higher, reaching 84%, but this was still lower than the 85% sensitivity of the GALAD model.
Although the GALAD model may not be the optimal solution for diagnosing and screening all-stage HCC, its clear advantage over other indicators in diagnosing early-stage disease means that it can play a better role in the early detection of HCC during regular screening of patients with CLD. Currently, the main clinical screening methods for HCC are ultrasound and AFP, which have been used for a long time and seem reliable, but we should not stop there. Patients today often visit the hospital only occasionally and need a quick and reliable diagnosis. Previous studies have shown that, compared with combined ultrasound and AFP, the combination of AFP, AFP-L3, and DCP can significantly improve diagnostic efficiency [37]. The GALAD model improves diagnostic accuracy further by building on this combination and making full use of the information (gender and age) that patients provide when they register for blood testing. Additionally, computers can perform the model’s calculations quickly, which gives rapid results and helps establish the algorithm best suited to each laboratory.
Two further points are worth mentioning. First, changes in serum AFP and AFP-L3 levels are used as indicators of HCC recurrence, and the studies of Xu et al. [32] and Huang [27] showed that the GALAD model can likewise serve as an indicator of HCC prognosis and help estimate survival after resection. On average, survival time in patients with low GALAD model scores was 32 to 36 months longer than in patients with high scores. Furthermore, the diagnosis of HCC by this model is independent of HBV-DNA status, which is helpful for screening for HCC in patients with cirrhosis caused by chronic viral hepatitis. Second, the studies of Tong et al. [31] and He et al. [29] showed that the GALAD model performs well in identifying microvascular invasion, which suggests that the model can guide the staging of HCC and help decide whether to expand the surgical scope. In addition, in Yang’s [23] study, when ultrasound was incorporated into the GALAD model to form the GALADUS model, the AUC reached 0.98 for all-stage HCC and 0.97 for early-stage HCC. If this finding is confirmed, the non-invasive diagnosis of HCC could approach the accuracy of the gold standard. However, studies from additional regions and further meta-analyses are needed to confirm this high diagnostic value.
The GALAD model has both merits and shortcomings. Among its advantages is a high cost-performance ratio. In a study by Piratvisuth [41], 50 HCC-related biomarkers were evaluated for their ability to diagnose all-stage and early-stage HCC, and 3–5 biomarkers were combined in an attempt to improve diagnostic efficacy. The combination of AFP, DCP, matrix metalloproteinase-3 (MMP3), insulin-like growth factor-binding protein-3 (IGFBP3), and cartilage oligomeric matrix protein (COMP), together with gender and age, achieved the best diagnostic efficacy. Both the calculation method and the diagnostic efficacy are similar to those of the GALAD model; however, assays for IGFBP3, COMP, and MMP3 are not yet technically mature, and their limited availability and high cost restrict their application. In contrast, the GALAD model is a promising method for HCC screening because of its low cost, wide availability, and high diagnostic efficacy.
Logistic regression algorithms also need to be improved. In this meta-analysis, according to the sensitivity
The GALAD model also performs well in assessing HCC prognosis. However, the BALAD-2 model combined with glutamic pyruvic transaminase and aspartate aminotransferase seemed to have a better predictive effect. Therefore, it remains to be studied which indicators are most closely related to prognosis [42]. In addition, some biomarkers with high specificity, such as Golgi transmembrane glycoprotein 73 [43], have been reported recently but have not yet been included in logistic regression model studies. When additional effective serum biomarkers are discovered and widely adopted, logistic regression analysis may be used, as it was for the GALAD model, to develop more accurate indicators for HCC screening and diagnosis.
This meta-analysis has some limitations. First, only 10 studies were included, and this small number prevented us from investigating all the potential sources of heterogeneity. Second, the inclusion of healthy negative controls in the included studies may have inflated the pooled estimates of diagnostic efficacy. Third, the included cases covered a long period, and changes in the prevalence and detection rate of HCC may also have affected the results. Fourth, most of the included studies did not follow up the control group over the long term and did not exclude pregnancy or tumors of reproductive origin, which may have raised the positive rate in women and thus lowered the diagnostic efficacy estimated in our meta-analysis. Finally, the serum DCP concentration is affected by oral anticoagulants such as warfarin [44]; warfarin raises the DCP concentration and can produce FPs in the GALAD model. However, some of the included studies did not mention related exclusion criteria.
Conclusion
In conclusion, our meta-analysis revealed that the GALAD model can maintain excellent sensitivity while ensuring high specificity for all-stage and early-stage HCC diagnoses. Therefore, it is suitable for the regular screening of HCC in patients with CLD.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Ethical approval
The study did not require ethical board approval because it did not involve human or animal trials.
Footnotes
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.
