Abstract
Despite the appearance of new oral antiviral drugs, pegylated interferon (PEG-IFN)/RBV may remain the standard of care therapy for some time, and several viral and host factors are reported to be correlated with therapeutic effects. This study aimed to reveal the independent variables associated with failure of sustained virological response (SVR) to PEG-IFN alpha-2a versus PEG-IFN alpha-2b in treatment-naive Egyptian patients with chronic hepatitis C virus (HCV) infection, using both statistical methods and data mining techniques. This retrospective cohort study included 3,235 chronic hepatitis C patients enrolled in a large Egyptian medical center: 1,728 patients had been treated with PEG-IFN alpha-2a plus ribavirin (RBV) and 1,507 patients with PEG-IFN alpha-2b plus RBV between 2007 and 2011. Both multivariate analysis and a Reduced Error Pruning Tree (REPTree)-based model were used to reveal the independent variables associated with treatment response. In both treatment types, alpha-fetoprotein (AFP) >10 ng/mL and HCV viremia >600 × 10³ IU/mL were the independent baseline variables associated with failure of SVR, while male gender, decreased hemoglobin, and decreased thyroid-stimulating hormone were the independent variables associated with good response (P < 0.05). The REPTree-based model showed that low AFP was the factor of initial split (best predictor) of response for both PEG-IFN alpha-2a and PEG-IFN alpha-2b (cutoff values 8.53 and 4.89 ng/mL, AUROC = 0.68 and 0.61, respectively, P = 0.05). Serum AFP >10 ng/mL and viral load >600 × 10³ IU/mL are variables associated with failure of response in both treatment types. A REPTree-based model could be used to assess predictors of response.
Introduction
Despite the appearance of new oral antiviral drugs, pegylated interferon (PEG-IFN)/RBV may remain the standard of care therapy for some time, particularly with the high costs of the new treatment (Esmat and others 2015).
The sustained virological response (SVR) in genotype 4 Egyptian patients treated with PEG-IFN/RBV was estimated around 60% (El Makhzangy and others 2009). Two forms of PEG-IFN, PEG-IFN alpha-2a (40 KD) (Pegasys; Hoffmann-La Roche) and PEG-IFN alpha-2b (12 KD) (PegIntron; Schering-Plough Corporation), are commercially available, which differ in terms of their pharmacokinetic, viral kinetic, and tolerability profiles (Bruno and others 2004).
Knowledge of predictors of response is extremely valuable, as it supports pretreatment counseling and can spare patients the side effects and cost of therapy (Abdo and Sanai 2009). However, it is difficult to determine which factor is the most important predictor for an individual patient. Data mining analysis is useful for combining all of these factors to predict the therapeutic effects (Namiki and others 2010).
Data mining is a method of predictive analysis, which can explore information to discover hidden patterns and relationships. Decision tree analysis is a core component of data mining that can be used to build predictive models (Breiman and others 1980). This method has been used to define prognostic factors in various diseases, such as liver cell failure (Baquerizo and others 2003), and for the prediction of virological response in HCV patients (Kurosaki and others 2010).
The aim of this study was to explore the independent variables associated with poor response to treatment with PEG-IFN alpha-2a compared with PEG-IFN alpha-2b in treatment-naive Egyptian patients with chronic HCV, using both statistical methods and data mining techniques.
Patients and Methods
Type of the study
This was a retrospective single-center study, conducted as part of a national program, focused on chronic HCV patients recruited between 2007 and 2011 from the Cairo Fatemic Hospital, Egyptian Ministry of Health and Population (MOHP), located in Cairo, Egypt.
Study sample
This study involved 3,235 interferon-naive chronic HCV Egyptian patients recruited from one of the largest centers affiliated to the MOHP in Cairo for treatment of viral hepatitis. Adult patients aged 18–60 years who were recruited had serological, virological, and histopathological evidence of HCV infection.
A detailed history was obtained, and a thorough clinical examination, basic laboratory tests, and liver biopsy were performed before treatment and after obtaining a written informed consent from each patient and local ethics committee approval at the time of patients' recruitment. The study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki with ethics committee approval from the Egyptian MOHP for this study.
Inclusion criteria for patients' enrollment were as follows:
• Adult HCV patients of both sexes who were positive for HCV antibodies using a third-generation enzyme-linked immunosorbent assay (ELISA) test and had detectable HCV RNA expressed in IU/mL and measured by COBAS® AmpliPrep/COBAS TaqMan® HCV assay, which uses real-time reverse transcriptase–polymerase chain reaction (PCR) to measure HCV viral load over a broad dynamic range
• Hematological profile: white blood cell (WBC) >4,000/mm³, neutrophil count >2,000/mm³, and platelets >100,000/mm³
• Biochemical liver profile: direct bilirubin 0.3 mg/dL or within 20% of upper limit of normal (ULN), prothrombin time (PT) <2 s above ULN, and albumin >3.5 g/dL
• Normal kidney function tests and controlled blood sugar in diabetic patients
• Thyroid-stimulating hormone (TSH) within normal range and antinuclear antibody (ANA) titer <1:20.
Exclusion criteria were as follows: other causes of liver disease, decompensated liver disease, coinfected patients with hepatitis B virus (HBV; positive HBsAg) and HIV (positive HIV-1 or HIV-2), and patients with hepatocellular carcinoma, severe psychiatric disease, or serious comorbid conditions.
Laboratory procedures
Blood tests performed at enrollment included complete blood count, quantitative HCV RNA by PCR, liver enzymes (aspartate aminotransferase [AST] and alanine aminotransferase [ALT]), serum albumin, INR, TSH, and ANA.
Liver biopsy was performed under ultrasonography guidance after checking the adequacy of the patients' coagulation profile. The METAVIR scoring method was used for histopathological assessment (Bedossa and Poynard 1996). Only liver biopsy samples at least 10 mm long or that had 6 portal tracts were examined to allow for adequate interpretation.
Randomization method
A random allocation method was used to place patients into 2 groups. The 2 treatment types were assigned on an alternating weekly basis: participants who fulfilled the eligibility criteria received the treatment type assigned to the corresponding week.
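The alternating weekly assignment described above can be sketched as follows. This is only an illustration: the program start date and the choice of which treatment type takes the first week are assumptions, since neither is stated in the text.

```python
from datetime import date

# Assumption: alpha-2a is given in even-numbered weeks (counting from 0)
# and alpha-2b in odd-numbered weeks; the text does not state the order.
ARMS = ("PEG-IFN alpha-2a", "PEG-IFN alpha-2b")


def assigned_arm(enrollment: date, program_start: date) -> str:
    """Return the treatment type assigned to the week of enrollment."""
    days = (enrollment - program_start).days
    if days < 0:
        raise ValueError("enrollment precedes program start")
    week_index = days // 7          # 0 for the first week, 1 for the second, ...
    return ARMS[week_index % 2]     # alternate every other week


# Hypothetical start date, for illustration only.
start = date(2007, 1, 1)
print(assigned_arm(date(2007, 1, 3), start))   # first week
print(assigned_arm(date(2007, 1, 8), start))   # second week
```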
Patients were randomized to receive either PEG-IFN alpha-2a (Pegasys) 180 μg/week subcutaneously plus ribavirin (RBV) 15 mg/kg/day orally or PEG-IFN alpha-2b (PegIntron) 1.5 μg/kg/week subcutaneously plus RBV 15 mg/kg/day orally.
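As a worked illustration of the dosing arithmetic, the sketch below computes the nominal doses for a hypothetical 70 kg patient. The function names are illustrative, and no rounding to available vial or tablet strengths is applied, which is an assumption of the sketch.

```python
def weekly_peg_ifn_dose_ug(regimen: str, weight_kg: float) -> float:
    """Nominal weekly PEG-IFN dose in micrograms for the two regimens."""
    if regimen == "alpha-2a":
        return 180.0                # fixed dose, independent of body weight
    if regimen == "alpha-2b":
        return 1.5 * weight_kg      # 1.5 ug/kg/week
    raise ValueError(f"unknown regimen: {regimen}")


def daily_rbv_dose_mg(weight_kg: float) -> float:
    """Nominal daily ribavirin dose in milligrams (15 mg/kg/day)."""
    return 15.0 * weight_kg


weight = 70.0  # hypothetical patient weight in kg
print(weekly_peg_ifn_dose_ug("alpha-2a", weight))  # 180.0
print(weekly_peg_ifn_dose_ug("alpha-2b", weight))  # 105.0
print(daily_rbv_dose_mg(weight))                   # 1050.0
```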
Follow-up
Virological parameters (HCV RNA) were tested at baseline and at weeks 12, 24, 48, and 72. Patients who did not achieve early virological response (EVR) at 12 weeks discontinued treatment. Clinical and laboratory follow-up data were collected. A 3-fold increase in AST or ALT was considered a significant elevation. TSH abnormalities refer to an increase in blood level; for patients who had an endocrinal consultation, no reduction in treatment dose or discontinuation was recommended. The duration of follow-up for relapse cases and side effects was 72 weeks for compliant patients.
Data collection
Ethical approval for data collection was obtained from the MOHP ethics committee. Data obtained from HCV patients' medical records included demographic and laboratory data and liver biopsy results.
Data mining
For the data mining analysis, we constructed a decision tree using the Reduced Error Pruning Tree (REPTree) learning algorithm, a fast decision tree learner (published by Mehta, Agrawal, and Rissanen in 1996). The proposed model was used to identify variables associated with SVR among patients with chronic HCV who received PEG-IFN alpha-2a versus PEG-IFN alpha-2b using 20 attributes, including clinical, laboratory, and histopathological variables.
Feature selection
A subset of 20 attributes contained 3 demographic variables [age, gender, and body mass index (BMI)], 3 hematological variables (hemoglobin, WBCs, and platelets), laboratory variables [serum glucose, total bilirubin, albumin, AST, ALT, alkaline phosphatase, prothrombin concentration, alpha-fetoprotein (AFP), creatinine, ANA, and TSH], and HCV viral load in addition to 2 histopathological variables (activity and stage of fibrosis).
Data formatting
A number of data transformation techniques have been used to format and prepare the patient records to be processed by the learning algorithms.
REPTree was used to build the predictors for both PEG-IFN alpha-2a and PEG-IFN alpha-2b. REPTree is an example of a commonly used decision tree with high accuracy in medical classification, and it can handle both categorical and numerical data. In addition, it is a fast method: the time taken to build the model was 0.03 s.
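To illustrate how a decision tree learner chooses a cutoff on a numerical attribute, the sketch below selects the threshold that maximizes information gain, the splitting criterion commonly used by REPTree implementations for classification. It covers split selection only, not reduced-error pruning, and the AFP values and outcomes are invented for illustration; they are not study data.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_split(values, labels):
    """Return (threshold, gain) for the best split 'value < threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = (base
                - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Invented toy data: baseline AFP (ng/mL) and SVR outcome (1 = SVR).
afp = [2.1, 3.0, 4.2, 5.5, 9.0, 12.4, 15.0, 20.3]
svr = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, gain = best_split(afp, svr)
print(threshold, gain)
```

The attribute whose best threshold yields the largest gain becomes the initial split of the tree.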
Validation of the decision tree
Internal validation was performed using 10-fold cross-validation, which is generally applied to estimate the performance of a model on a validation set using computation in place of mathematical analysis. It is a technique for assessing how the results of a statistical analysis will generalize to an independent data set.
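The partitioning behind 10-fold cross-validation can be sketched as follows: the records are shuffled, divided into 10 folds, and each fold serves once as the test set while the remaining 9 are used for training. This is a plain (unstratified) sketch; the seed and function name are assumptions, and the study's software may have stratified folds by outcome.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with the size of the PEG-IFN alpha-2a group (1,728 patients).
splits = list(k_fold_indices(1728, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```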
Performance of algorithm
The performance of the algorithm was assessed using an evaluation matrix based on the following values: correctly classified instances, precision, recall, F-score, and the receiver operating characteristic (ROC) curve.
• Recall (sensitivity): the ability of the test to correctly identify those with the disease (true +ve rate).
• Precision (positive predictive value): the proportion of those identified as positive by the test who truly have the disease. Specificity, by contrast, is the ability of the test to correctly identify those without the disease (true −ve rate).
• ROC area: area under the curve (AUC) serves as an indicator of the overall performance of the algorithm.
• Correctly classified instances, which evaluate the overall accuracy.
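The measures listed above can all be derived from the four cells of a confusion matrix. The sketch below shows the standard formulas, with precision and specificity computed as separate quantities; the counts are invented for illustration and are not study data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)         # positive predictive value
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # correctly classified instances
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Invented counts for illustration.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```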
Statistical methods
The descriptive statistics were expressed as mean ± standard deviation (SD) or median for nonparametric data. The χ² test and Student's t-test were used for analysis of qualitative and quantitative variables, respectively. Pearson correlation was used to correlate continuous variables, while Spearman correlation was used to correlate fibrosis stages with other variables. In all tests, P values <0.05 were considered significant. For the multivariate analysis of factors associated with treatment failure, logistic regression models with backward selection were used to identify the independent predictors of response. Variables that showed a significant association with response by univariate analysis were included in the multivariate analysis. AUC >0.60 with P value <0.05 was considered significant.
Results
This study included 3,235 patients: 1,728 received PEG-IFN alpha-2a, while 1,507 received PEG-IFN alpha-2b. The 2 arms of the study were comparable. There was male predominance in both groups, with no statistically significant difference between them (80.3% for PEG-IFN alpha-2a and 82% for PEG-IFN alpha-2b, P = 0.32).
Baseline demographic features and laboratory data of the study patients are shown in Table 1 with no significant difference between responders and nonresponders in each treatment group apart from age and serum AFP (both were lower among responders in each group).
Bold values represent statistically significant values.
Upper level of normal of ALT and AST = 40 U/L and AFP = 10 ng/mL.
AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; HCV, hepatitis C virus; PEG-IFN, pegylated interferon; SD, standard deviation; TSH, thyroid-stimulating hormone; WBC, white blood cell.
The histopathological features did not show a statistically significant difference between both treatment groups (Table 2).
Regarding the response to treatment, the difference in EVR between the 2 groups was not statistically significant; however, the end of treatment response (ETR) and the SVR differed significantly (64.10% versus 58.20% and 59.60% versus 53.90%, respectively, P < 0.05), with a better response in those treated with PEG-IFN alpha-2a (Fig. 1).

Fig. 1. Virological response in relation to the type of treatment.
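The comparison of response rates between the 2 groups can be checked approximately from the reported percentages and group sizes. The sketch below uses a two-proportion z-test, which is equivalent to a χ² test without continuity correction on the 2 × 2 table; it is not necessarily the exact procedure used in the study, and because it works from rounded percentages, the result is approximate.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value


n_2a, n_2b = 1728, 1507   # group sizes reported in the study

z_etr, p_etr = two_proportion_z_test(0.6410, n_2a, 0.5820, n_2b)
z_svr, p_svr = two_proportion_z_test(0.5960, n_2a, 0.5390, n_2b)
print(f"ETR: z = {z_etr:.2f}, P = {p_etr:.4f}")
print(f"SVR: z = {z_svr:.2f}, P = {p_svr:.4f}")
```

Under these assumptions, both comparisons give P values below 0.05, in line with the reported significance.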
Both univariate and multivariate analyses showed that serum AFP >10 ng/mL and viral load >600 × 10³ IU/mL were the independent baseline predictors associated with treatment failure, while male gender was associated with SVR in patients receiving either PEG-IFN alpha-2a or PEG-IFN alpha-2b (P < 0.05) (Tables 3 and 4).
ALT level >40 U/L was associated with treatment failure in patients who received PEG-IFN alpha-2a, while this was not evident in those who received PEG-IFN alpha-2b using multivariate analysis.
Follow-up data showed that decreases in hemoglobin and TSH levels (reflecting patients' compliance) were independent variables associated with favorable treatment response in patients who received either PEG-IFN alpha-2a or PEG-IFN alpha-2b using both univariate and multivariate regression analyses (P < 0.05) (Tables 3 and 4).
When data mining techniques were applied using REPTree, the decision tree model built for each group (patients receiving PEG-IFN alpha-2a and patients receiving PEG-IFN alpha-2b) selected baseline AFP as the variable of initial split (most decisive). Among patients who were compliant to treatment with PEG-IFN alpha-2a, those with baseline AFP <8.53 ng/mL were classified as the high probability group (SVR 78.89%), while in those with AFP ≥8.53 ng/mL, SVR dropped to 51.92% (low probability group), with sensitivity of 75.8% and specificity of 94%. Among patients who were compliant to treatment with PEG-IFN alpha-2b, those with baseline AFP <4.89 ng/mL were classified as the high probability group (SVR 74.52%), while in those with AFP ≥4.89 ng/mL, SVR dropped to 52.13% (low probability group), with sensitivity of 69% and specificity of 92% (Figs. 2 and 3).

Fig. 2. REPTree algorithm for the likelihood of SVR among chronic HCV patients who received PEG-IFN alpha-2a. HCV, hepatitis C virus; SVR, sustained virological response; PEG-IFN, pegylated interferon.

Fig. 3. REPTree algorithm for the likelihood of SVR among chronic HCV patients who received PEG-IFN alpha-2b.
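The initial split of each tree can be written as a simple rule. The sketch below encodes only the first split, using the cutoffs and SVR rates reported in the text; deeper nodes of the trees are not reproduced, and the function and dictionary names are illustrative.

```python
# First-split cutoffs (ng/mL) and SVR rates reported for each treatment type.
AFP_CUTOFF_NG_ML = {"alpha-2a": 8.53, "alpha-2b": 4.89}
SVR_RATE = {                      # (AFP below cutoff, AFP at or above cutoff)
    "alpha-2a": (0.7889, 0.5192),
    "alpha-2b": (0.7452, 0.5213),
}


def first_split(regimen: str, baseline_afp: float):
    """Classify a patient by the initial AFP split of the reported tree."""
    cutoff = AFP_CUTOFF_NG_ML[regimen]
    rate_below, rate_at_or_above = SVR_RATE[regimen]
    if baseline_afp < cutoff:
        return "high probability", rate_below
    return "low probability", rate_at_or_above


# A hypothetical baseline AFP of 5.0 ng/mL falls on different sides of the
# split depending on the treatment type.
print(first_split("alpha-2a", 5.0))
print(first_split("alpha-2b", 5.0))
```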
In contrast, other attributes, such as age, BMI, ALT, AST, prothrombin concentration, serum creatinine, WBCs, serum glucose, hepatic fibrosis, and activity, had a less decisive role in the prediction of response.
To assess the performance of this REPTree-based model, an ROC curve was constructed, and the AUC showed that the model could be used for prediction of response in either treatment group (for PEG-IFN alpha-2a, AUC = 0.68, and for PEG-IFN alpha-2b, AUC = 0.61, P = 0.05).
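For reference, the AUC can be computed directly as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen nonresponder, with ties counted as one half. The sketch below uses this rank-based definition on invented scores and outcomes, not study data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # responder ranked above nonresponder
            elif p == n:
                wins += 0.5      # ties count as one half
    return wins / (len(pos) * len(neg))


# Invented predicted probabilities of SVR and observed outcomes (1 = SVR).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

An AUC of 0.5 corresponds to chance-level discrimination, so values of 0.68 and 0.61 indicate modest discriminative ability.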
Discussion
Treatment with either PEG-IFN alpha-2a or PEG-IFN alpha-2b, plus RBV, was recommended for patients infected with HCV genotype 4, the most common genotype in Egypt (Bruno and others 2004). In addition, it still has a place according to the EASL Guidelines 2015. However, few data comparing both treatment types are available.
In this study, we aimed to explore the independent variables associated with poor response to treatment with PEG-IFN alpha-2a versus PEG-IFN alpha-2b in treatment-naive Egyptian patients with chronic HCV, using both statistical methods and data mining techniques.
A total of 3,235 treatment-naive subjects were randomly allocated into 2 treatment groups: 1,728 who received PEG-IFN alpha-2a and 1,507 who received PEG-IFN alpha-2b. Both treatment groups were comparable in their baseline laboratory data and histopathological pattern.
Regarding the response to treatment, the difference in EVR between the 2 groups was not statistically significant; however, ETR and SVR rates for patients receiving PEG-IFN alpha-2a versus PEG-IFN alpha-2b were 64.10% versus 58.20% and 59.60% versus 53.90%, respectively, a statistically significant difference with a better response in those treated with PEG-IFN alpha-2a (P < 0.05).
Similarly, several trials have also reported higher SVR rates with PEG-IFN alpha-2a compared with PEG-IFN alpha-2b (Ascione and others 2010). The majority of head-to-head randomized controlled trials, including the large randomized IDEAL (Individualized Dosing Efficacy versus Flat Dosing to Assess Optimal Pegylated Interferon Therapy) trial (n = 3,070), demonstrated similar SVR rates for PEG-IFN alpha-2a and PEG-IFN alpha-2b (41% versus 39% in IDEAL) in combination with RBV; however, 2 randomized controlled trials (n = 431 and 320) demonstrated a statistically significant benefit for PEG-IFN alpha-2a (66% versus 54% and 69% versus 54%). Furthermore, 2 large retrospective studies and 1 prospective observational study in real-life settings have shown a significant benefit for PEG-IFN alpha-2a versus PEG-IFN alpha-2b, although SVR rates were generally lower than those seen in controlled trials (Foster 2010).
Male gender was an independent factor associated with good treatment response using both univariate and multivariate analyses in both treatment groups. This disagrees with reports that female gender favors HCV clearance (Bakr and others 2006; Berg and others 2006).
In large prospective studies of PEG-IFN and RBV combination therapy, younger age correlated significantly with an SVR when assessed by univariate and multivariate analyses, and patients younger than 40–45 years showed the best response rates (Shiffman and others 2007). Most of our study patients, in either treatment group, were <50 years old. Univariate analysis showed that age >50 years was associated with poor response to treatment with PEG-IFN alpha-2a only. However, this was not confirmed by the multivariate analysis.
Obesity is a known predictor of disease progression in patients with chronic HCV. In a prospective trial, a BMI ≥25 kg/m² was significantly associated with fibrosis progression (Ortiz and others 2002). A high BMI, but not body weight, was also inversely correlated with SVR in both IFN- and PEG-IFN-treated individuals (Berg and others 2006). A lower baseline body weight (75–80 kg) was significantly associated with achieving SVR across all genotypes (Shiffman and others 2007). In the current study, the majority of patients in both treatment groups had a high BMI (≥25 kg/m²), and BMI was not a significant variable associated with poor response.
A study by Zechini and others (2004) showed a statistically significant positive correlation of baseline aminotransferase values with the hepatitis activity index and fibrosis score. In our study, transaminases were mildly elevated in both treatment groups (<2-fold rise), and ALT level >40 U/L was a factor associated with failure of treatment response in patients who received PEG-IFN alpha-2a, whereas this was not the case in those who received PEG-IFN alpha-2b using multivariate analysis.
Serum baseline AFP was significantly higher in the nonresponders, both in those treated with PEG-IFN alpha-2a and in those treated with PEG-IFN alpha-2b (P = 0.00). Furthermore, AFP >10 ng/mL was an independent variable associated with failure of treatment response in both treatment groups using both univariate and multivariate analyses. This is in agreement with many studies. Higher serum AFP was found to be independently and negatively associated with SVR in Egyptian patients with genotype 4: Males and others (2007) found that higher baseline serum AFP levels independently predicted a lower SVR rate among patients with chronic HCV. Similar findings have been reported in Egyptian patients with chronic HCV, in whom a low baseline AFP level could be a potential predictor for SVR, while the decrease in serial AFP levels was related to antiviral therapy irrespective of treatment outcome (Mabrouk and others 2013).
Although HCV RNA quantification was not shown to be predictive for the degree of HCV-related liver injury or the progression of disease, assessment of viral load before, during, and after therapy is an important tool for the prediction of treatment outcome. A low baseline viral load (<600,000–800,000 IU/mL) was shown to be an independent predictor of SVR, regardless of the genotype in several studies (Shiffman and others 2007).
The effect of viral load as a predictor was found to be nonlinear. While for HCV RNA concentrations up to ∼400,000 IU/mL a linear correlation with SVR was shown, for higher HCV RNA levels, relatively stable SVR rates without a significant further decline have been observed in PEG-IFN alpha-2a/RBV-treated patients (Zeuzem and others 2006).
In our study, HCV viral load >600 × 10³ IU/mL was an independent factor associated with failure of treatment response in both treatment groups. This coincides with other studies documenting low baseline viral load to be a significant predictor of SVR, regardless of the genotype (Poynard and others 2000; Berg and others 2006).
In this study, neither the histological activity nor the stage of fibrosis showed a statistically significant difference between the 2 treatment groups; in addition, advanced fibrosis (F > 2) was not an independent variable associated with poor response. This finding is contrary to that of Derbala and others (2006), who found a positive association between SVR and pretreatment histopathological injury. In addition, in other previous studies, the absence of cirrhosis and bridging fibrosis was identified as an independent predictor of SVR (Poynard and others 2000; Berg and others 2006).
During the follow-up, decreased hemoglobin and decreased TSH were independent variables that made poor response less likely (P < 0.01) in both treatment groups. This finding matches that of other studies, which showed that anemia may increase the likelihood of achieving SVR during PEG-IFN and RBV treatment of HCV infection (Sulkowski and others 2010; Sievert and others 2011). In addition, other studies found that interferon-alpha and RBV therapy induce thyroid dysfunction in chronic hepatitis C patients, and that interferon-induced thyroid dysfunction is not associated with severity of disease or response to therapy (Masood and others 2008; Nadeem and Aslam 2012).
The data mining techniques and decision tree analysis were used to design a simple decision tree model for both treatment groups using the already available routine tests to predict SVR with a high probability (Kurosaki and others 2011a; Kurosaki and others 2011b). With the help of this model, rapid estimation of response before treatment can be made.
An interesting finding was that low serum AFP was the first split variable (best predictor) in the predictive model for response in both treatment groups but with different cutoff levels. The cutoff was higher for patients treated with PEG-IFN alpha-2a: those with AFP <8.53 ng/mL were classified as the high probability group for SVR (78.89%), with sensitivity of 75.8% and specificity of 94%. For those treated with PEG-IFN alpha-2b, AFP <4.89 ng/mL indicated a high probability for SVR (74.52%), with sensitivity of 69% and specificity of 92%. This further confirms the findings of the univariate and multivariate analyses.
Our study, in which we applied one of the decision tree algorithms, confirmed the findings of previous studies regarding the ability of AFP to predict the likelihood of SVR in Egyptian patients with chronic HCV. Zayed and others (2014) found that, when applying the REP decision tree to treatment outcome of PEG-IFN/RBV therapy among Egyptian HCV patients, a baseline AFP cutoff level of <2.48 ng/mL was associated with a high probability (72%) of SVR, which approaches the rates in other easily treatable genotypes (2 and 3) (Zayed and others 2013).
According to these findings, we may consider baseline AFP to be equal to other well-accepted predictive variables of treatment response, such as HCV genotype and viral load. Other variables, such as patients' age, gender, serum ALT, stages of hepatic fibrosis, and grades of activity, had less predictive value.
Conclusions
Studying the independent variables of response in relation to both treatment types revealed that serum AFP >10 ng/mL and viral load >600 × 10³ IU/mL were variables associated with failure of response, while male gender, decreased hemoglobin, and decreased TSH were variables associated with favorable response in both treatment types. Applying the REPTree-based model further confirmed the predictive role of baseline serum AFP in both treatment types.
Footnotes
Acknowledgments
We thank the Egyptian National Committee for Control of Viral Hepatitis and the Science and Technology Development Fund (STDF) for their support of this work. We also thank Dr. Wafaa Al akel, associate professor of Endemic Hepatology and Gastroenterology, for performing the statistical part of the study.
Author Disclosure Statement
No competing financial interests exist.
