A Systematic Gene–Gene and Gene–Environment Interaction Analysis of DNA Repair Genes XRCC1, XRCC2, XRCC3, XRCC4, and Oral Cancer Risk

Abstract

Oral cancer is the sixth most common cancer worldwide and has a high mortality rate. Biomarkers that anticipate susceptibility, prognosis, or response to treatment are much needed. Oral cancer is a polygenic disease involving complex interactions among genetic and environmental factors, which require multifaceted analyses. Here, in a dataset of 103 oral cancer cases and 98 controls from Taiwan, we examined the association between oral cancer risk and the DNA repair genes X-ray repair cross-complementing group (XRCCs) 1–4, together with the environmental factors of smoking, alcohol drinking, and betel quid (BQ) chewing. We employed logistic regression, multifactor dimensionality reduction (MDR), and hierarchical interaction graphs to analyze gene–gene (G×G) and gene–environment (G×E) interactions. We identified a significantly elevated risk for the XRCC2 rs2040639 heterozygous variant among smokers [adjusted odds ratio (OR)=3.7, 95% confidence interval (CI)=1.1–12.1] and alcohol drinkers [adjusted OR=5.7, 95% CI=1.4–23.2]. The best two-factor G×G interaction model for oral cancer included XRCC1 rs1799782 and XRCC2 rs2040639 [OR=3.13, 95% CI=1.66–6.13]. For the G×E interaction, the estimated ORs of oral cancer for two (drinking–BQ chewing), three (XRCC1–XRCC2–BQ chewing), four (XRCC1–XRCC2–age–BQ chewing), and five factors (XRCC1–XRCC2–age–drinking–BQ chewing) were 32.9 [95% CI=14.1–76.9], 31.0 [95% CI=14.0–64.7], 49.8 [95% CI=21.0–117.7], and 82.9 [95% CI=31.0–221.5], respectively. Taken together, the genotypes of the DNA repair gene polymorphisms XRCC1 rs1799782 and XRCC2 rs2040639 appear to be significantly associated with oral cancer, and these associations were enhanced by exposure to certain environmental factors. The observations presented here warrant further research in larger study samples to examine their relevance for routine clinical care in oncology.

Introduction

Oral cancer is the sixth most common cancer worldwide and causes high mortality because it is commonly ignored at the early stage (Warnakulasuriya, 2009). Three common environmental exposures, namely betel quid (BQ) chewing, cigarette smoking, and alcohol drinking, were reported to be highly associated with oral cancer in Taiwan (Ko et al., 1995). These factors were also reported to induce DNA damage and apoptosis. For example, alcohol was reported to induce DNA damage such as acetaldehyde-derived DNA adducts in esophageal (Yukawa et al., 2014) and oral cells (Balbo et al., 2012). BQ chewing was reported to induce mitochondrial DNA mutation (Lee et al., 2001; Tan et al., 2003) and oxidative DNA damage (Chen et al., 2002). Cigarette smoke was also reported to generate toxic carbonyl compounds (Fujioka and Shibamoto, 2006), and its condensate was reported to induce DNA damage in oropharyngeal mucosa biopsies (Baumeister et al., 2009) and lung cells (Nyunoya et al., 2014). Accordingly, all three environmental factors (alcohol, BQ, and smoking) contribute to oral carcinogenesis (Chiang et al., 2013), depending in part on the extent of the DNA damage they cause.

Recently, growing evidence indicated that single nucleotide polymorphisms (SNPs) of DNA repair genes were associated with oral cancer susceptibility (Gal et al., 2005; Yang et al., 2012; Yen et al., 2008). For example, SNPs of DNA repair genes such as X-ray repair cross-complementing group 1 (XRCC1) (Wu et al., 2014; Zhang et al., 2013), XRCC2 (Romanowicz-Makowska et al., 2012), XRCC3 (Tsai et al., 2014), and XRCC4 (Chiu et al., 2008; Tseng et al., 2008) were reported to be associated with oral cancer. However, most of these studies focused on the single SNP effect or single SNP–environment effect. The complex gene–gene (G×G) and gene–environment (G×E) interactions associated with oral cancer are less addressed.

Analysis of G×G and G×E interactions is a well-established way to detect epistasis, that is, complex associations among disease- or cancer-related genes, in case-control and family-based association studies (Chang et al., 2008; Chen et al., 2013, 2014; Chuang et al., 2012; Lin et al., 2009; Moore et al., 2010; Steen, 2012; Yang et al., 2011). Characterizing epistasis helps us to understand the causes of disease and cancer. Multifactor dimensionality reduction (MDR) is an epistasis detection approach (Hahn et al., 2003; Ritchie et al., 2001), and several improved MDR variants have been proposed for particular kinds of data, such as imbalanced data sets (Yang et al., 2013). MDR-ER (Yang et al., 2013) allows G×G interaction detection to work on imbalanced data sets without requiring a separate data-balancing procedure, and it provides strong analytical ability for detecting possible multifactor interactions in such data sets.

In this study, we examined G×G and G×E effects with an improved MDR (MDR-ER), jointly considering genetic factors (four SNPs of XRCCs 1–4) and environmental factors (gender, age, smoking, alcohol drinking, and BQ chewing) in a dataset of 103 oral cancer cases and 98 controls from Taiwan. Oral cancer risk was ranked in terms of the G×G and G×E interactions.

Methods

Multifactor dimensionality reduction (MDR)

The nonparametric and model-free MDR method is widely used in the investigation of G×G and G×E interactions (Ritchie et al., 2001). It can detect nonlinear interactions among multiple genetic and environmental factors even when each individual factor shows no significant effect on its own (Ritchie et al., 2003). MDR is a data reduction method that searches for multifactor combinations associated with either high or low risk of oral cancer: each combination of genetic and environmental factor levels is classified as high or low risk, so that the multidimensional data space is reduced to a two-way contingency table. The ability of a high-order interaction model to classify and predict outcome risk status can then be evaluated by cross-validation (CV) and permutation testing.

Suppose that a case-control data set contains N SNPs and that M is the maximum order of the G×G and G×E interactions considered (i.e., M≤N). Let m be the order of the interaction being evaluated (m≤M). The procedure for using MDR to detect the best m-way G×G and G×E interaction models is illustrated in Figure 1. The MDR procedure can be divided into the following eight steps:

FIG. 1.

MDR flowchart.

Step 1. Divide the data set into k subsets; select the i^th subset as the test data set and use the remaining subsets as the training data set.

Step 2. Select a set of m factors (loci) from all factors.

Step 3. Represent all possible combinations of genotypes of the m factors in m-dimensional space (multifactor cells). Equation 1 defines a multifactor cell as a set of m genetic and environmental factors.

\begin{align*}L = \{ l_1 , l_2 , l_3 , \ldots , l_m \} \tag{Eq.1}\end{align*}

Step 4. Define each multifactor cell as high or low risk. Equation 2 computes the ratio of cases to controls in a cell; the function u() returns a score of “1” if all elements l in L match a sample in P or N, and a score of “0” otherwise. Each multifactor cell is labelled ‘H’ or ‘L’: ‘H’ indicates a high-risk cell, in which the ratio meets or exceeds a threshold, while ‘L’ indicates a low-risk cell.

\begin{align*}f ( L ) = \frac { \sum_{ j = 1 }^{ P^{*} } u ( L , P_j ) } { \sum_{ j = 1 }^{ N^{*} } u ( L , N_j ) } \tag{Eq.2}\end{align*}

where

\begin{align*}u ( L , A ) = \begin{cases}1, & \text{if } l \in A \text{ for all } l \in L \\ 0, & \text{otherwise}\end{cases}\end{align*}

and

P: the case data set;

N: the control data set;

P*: the number of cases in the training set;

N*: the number of controls in the training set;

L: a vector of variable combinations.

Step 5. Repeat steps 2–4 until all possible sets of m factors are evaluated.

Step 6. Evaluate the error rates of all possible sets of m factors; the model with the minimum training error rate (classification error rate) is chosen as the best model in each CV fold. Step 4 reduces the possible combinations of the m factors to a two-way contingency table, so Equation 3 can be used to evaluate the model error rate.

\begin{align*}f ( C ) = \frac { FN + FP } { TP + FN + FP + TN } \tag{Eq.3}\end{align*}

where

C: the evaluated model;

TP: true positives, the total number labeled ‘H’ in the case data;

FP: false positives, the total number labeled ‘H’ in the control data;

FN: false negatives, the total number labeled ‘L’ in the case data;

TN: true negatives, the total number labeled ‘L’ in the control data.

Step 7. After the classification error rates of all possible G×G interaction models have been evaluated, the model with the minimum error rate is regarded as the best model of the training data in the i^th fold of the CV. This best model is then evaluated on the testing data in the same way as in steps 3–6, except that only the best model of the training data is evaluated.

Step 8. Repeat steps 1–7 for the next fold until all k folds have been used as the test data set. Once all k folds are evaluated, the best models of the individual folds are collected to compute the cross-validation consistency (CVC), and the model selected most frequently is chosen as the best G×G or G×E interaction model. If two or more models have equal CVC, the model found first is chosen. The classification error rate of the finally selected best model is the average of its classification error rates across the CV folds.
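The eight steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the threshold of 1.0, the interleaved fold assignment, and the handling of empty or unseen cells are all assumptions made for the example, and the toy data are synthetic.

```python
from collections import Counter
from itertools import combinations


def cell_of(row, factors):
    """Multifactor cell (Eq. 1): the levels of the chosen factors for one sample."""
    return tuple(row[f] for f in factors)


def label_cells(rows, labels, factors, threshold=1.0):
    """Steps 3-4: label each observed cell 'H' or 'L' from its case:control ratio (Eq. 2)."""
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        (cases if y == 1 else controls)[cell_of(row, factors)] += 1
    cell_labels = {}
    for cell in set(cases) | set(controls):
        # Assumption: a cell with cases but no controls counts as high risk.
        ratio = float("inf") if controls[cell] == 0 else cases[cell] / controls[cell]
        cell_labels[cell] = "H" if ratio >= threshold else "L"
    return cell_labels


def error_rate(rows, labels, factors, cell_labels):
    """Eq. 3: (FN + FP) / (TP + FN + FP + TN)."""
    wrong = 0
    for row, y in zip(rows, labels):
        # Assumption: a cell never seen in training is predicted low risk.
        predicted = 1 if cell_labels.get(cell_of(row, factors), "L") == "H" else 0
        wrong += predicted != y
    return wrong / len(labels)


def best_training_model(rows, labels, n_factors, m):
    """Steps 2-6: try every set of m factors and keep the minimum training error."""
    best = None
    for factors in combinations(range(n_factors), m):
        cell_labels = label_cells(rows, labels, factors)
        err = error_rate(rows, labels, factors, cell_labels)
        if best is None or err < best[0]:  # strict '<' keeps the model found first
            best = (err, factors, cell_labels)
    return best


def mdr(rows, labels, n_factors, m, k=5):
    """Steps 1, 7, 8: k-fold CV, CVC voting, and the mean test error of the winner."""
    n = len(labels)
    votes, test_errors = Counter(), {}
    for i in range(k):
        test_idx = set(range(i, n, k))  # assumption: interleaved folds
        train = [j for j in range(n) if j not in test_idx]
        test = sorted(test_idx)
        _, factors, cell_labels = best_training_model(
            [rows[j] for j in train], [labels[j] for j in train], n_factors, m)
        err = error_rate([rows[j] for j in test], [labels[j] for j in test],
                         factors, cell_labels)
        votes[factors] += 1
        test_errors.setdefault(factors, []).append(err)
    winner, cvc = votes.most_common(1)[0]  # ties resolve to the model seen first
    return winner, cvc, sum(test_errors[winner]) / len(test_errors[winner])


# Toy data: the outcome is the XOR of factors 0 and 1; factor 2 is pure noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
labels = [a ^ b for a, b, _ in rows]
winner, cvc, test_err = mdr(rows, labels, n_factors=3, m=2, k=5)
print(winner, cvc, test_err)
```

On this toy data neither factor 0 nor factor 1 has a main effect, yet the pair is selected in every fold, which is the kind of purely interactive signal MDR is designed to find.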

MDR-ER

MDR-ER was introduced for imbalanced case-control data sets (Yang et al., 2013) and has been shown to identify significant G×G and G×E interactions effectively. MDR-ER modifies only the classification function and the error rate evaluation function, using the proportion of cases and the proportion of controls; all other procedures are the same as in MDR, as described above. Equation 4 is the improved classification function. As before, the function u() returns a score of “1” if all elements l in L match a sample in P or N, and a score of “0” otherwise.

\begin{align*}f ( L ) = \frac { N^{*} \left[ \sum_{ j = 1 }^{ P^{*} } u ( L , P_j ) \right] } { P^{*} \left[ \sum_{ j = 1 }^{ N^{*} } u ( L , N_j ) \right] } \tag{Eq.4}\end{align*}

where

P: the case data set;

N: the control data set;

P*: the number of cases in the training set;

N*: the number of controls in the training set;

L: a vector of variable combinations.

Equation 5 is the improved error rate evaluation function, which is based on the arithmetic mean of sensitivity and specificity: it averages the false-negative rate (1 − sensitivity) and the false-positive rate (1 − specificity).

\begin{align*}f ( C ) = 0.5 \times \left( \frac { FN } { TP + FN } + \frac { FP } { FP + TN } \right) \tag{Eq.5}\end{align*}

where

TP: the total number in the high-risk group in the case data;

FP: the total number in the high-risk group in the control data;

FN: the total number in the low-risk group in the case data;

TN: the total number in the low-risk group in the control data.
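To make the effect of Equations 4 and 5 concrete, the following Python sketch compares them with Equations 2 and 3 on a deliberately imbalanced example. The function names and all counts are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def mdr_ratio(cases_in_cell, controls_in_cell):
    """Eq. 2: raw case:control ratio within one multifactor cell."""
    return cases_in_cell / controls_in_cell


def mdr_er_ratio(cases_in_cell, controls_in_cell, n_cases, n_controls):
    """Eq. 4: the same ratio rescaled by the overall numbers of controls and cases."""
    return (n_controls * cases_in_cell) / (n_cases * controls_in_cell)


def mdr_error(tp, fp, fn, tn):
    """Eq. 3: overall misclassification rate."""
    return (fn + fp) / (tp + fn + fp + tn)


def mdr_er_error(tp, fp, fn, tn):
    """Eq. 5: mean of the false-negative rate and the false-positive rate."""
    return 0.5 * (fn / (tp + fn) + fp / (fp + tn))


# Hypothetical imbalanced training set: 300 cases and 100 controls.
n_cases, n_controls = 300, 100

# A cell holding 12 cases and 6 controls looks high risk under Eq. 2 (ratio 2.0),
# but it is below the 3:1 background ratio, so Eq. 4 puts it under a threshold of 1.
raw = mdr_ratio(12, 6)
adjusted = mdr_er_ratio(12, 6, n_cases, n_controls)

# A useless model that labels every cell 'H' (TP=300, FP=100, FN=0, TN=0).
plain = mdr_error(300, 100, 0, 0)        # looks acceptable only because cases dominate
balanced = mdr_er_error(300, 100, 0, 0)  # no better than chance
print(raw, adjusted, plain, balanced)
```

When the numbers of cases and controls are equal, the scaling factors in Equation 4 cancel and the ratio reduces to Equation 2.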

Statistical analysis

The odds ratio (OR), 95% confidence interval (CI), and p value of each SNP were estimated by logistic regression, comparing individuals carrying other genotypes with those homozygous for the major allele of the SNP. The OR indicates the risk of disease, and the p value indicates the significance of the difference between groups. Both logistic regression and MDR-ER were used to evaluate the G×G and G×E interactions. For the logistic regression, we used a multinomial logistic regression model without interaction terms among factors. Statistical analyses were performed with SPSS version 19.0 for Windows (SPSS Inc., Chicago, IL), and power analyses were performed with the Power and Sample Size Calculations tool (Dupont and Plummer, 1998).
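As a worked illustration of what an OR and its 95% CI express, the sketch below computes a crude (unadjusted) OR with a Wald interval from a 2×2 table, using the BQ chewing counts given in Table 1. The helper name is hypothetical, and the calculation is a textbook approximation rather than the adjusted logistic regression run in SPSS, so its output should not be expected to match the model-based ORs reported in the Results.

```python
import math


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald 95% CI computed on the log scale."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# BQ chewing in Table 1: cases 91 yes / 12 no; controls 23 yes / 75 no.
or_bq, ci_low, ci_high = crude_odds_ratio(91, 12, 23, 75)
print(f"crude OR = {or_bq:.2f}, 95% CI = {ci_low:.1f}-{ci_high:.1f}")
```

A CI that excludes 1 corresponds to a significant association at the 5% level; adjusting for covariates in a regression model can move both the point estimate and the interval.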

Dataset

The oral cancer case-control data set containing genotypes of the XRCCs 1–4 genes and habits (smoking, alcohol drinking, and BQ chewing) was derived from our previous study (Yen et al., 2008), which was approved by the Institutional Review Board of Chi-Mei Medical Center in Tainan, Taiwan. This anonymous and delinked data set is available at http://bioinfo.kmu.edu.tw/ORCA-XRCCs-ABC.xls. All patients were pathologically confirmed to have primary oral squamous cell carcinoma (OSCC). Importantly, G×G and G×E interactions in this data set had not previously been investigated by MDR-ER analysis.

Results

Information on the oral cancer patients and controls

Table 1 lists the genotypes and the prevalence of conventional oral cancer risk factors (smoking, alcohol drinking, and BQ chewing) in oral cancer patients and controls. The genotype frequencies showed no significant difference between cases and controls, whereas the percentages of all three risk factors were higher in cases than in controls. In terms of age grouping (Supplementary Table S1), the studied population contained more patients at age <50 than at age ≥50. The percentage of males was higher among cases than among controls, and the percentage of age ≥50 was higher among cases. The prevalence of smoking, alcohol drinking, and BQ chewing was higher among cases in both age groups.

Table 1.

Information on the OSCC Patients and Controls used in this Study

	Controls¹ (n=98), N (%)	Cases² (n=103), N (%)	χ² value	P value
XRCC1
CC	54 (55.1)	48 (46.6)
CT	35 (35.7)	40 (38.8)	0.68	0.41
TT	9 (9.2)	15 (14.6)	1.85	0.17
XRCC2
AA	18 (18.4)	16 (15.5)
AG	41 (41.8)	61 (59.2)	1.69	0.19
GG	39 (39.8)	26 (25.2)	0.46	0.50
XRCC3
CC	89 (90.8)	96 (93.2)
CT	9 (9.2)	7 (6.8)	0.39	0.53
TT	0 (0.0)	0 (0.0)	—	—
XRCC4
TT	4 (4.1)	3 (2.9)
TG	36 (36.7)	37 (35.9)	0.16	0.69
GG	58 (59.2)	63 (61.2)	0.23	0.64
Gender
Male	49 (50.0)	100 (97.1)	58.06	2.54E-14
Female	49 (50.0)	3 (2.9)
Smoking
No	68 (69.4)	12 (11.7)	69.87	6.33E-17
Yes	30 (30.6)	91 (88.3)
Drinking
No	76 (77.6)	26 (25.2)	54.98	1.22E-13
Yes	22 (22.4)	77 (74.8)
Betel nut chewing
No	75 (76.5)	12 (11.7)	86.11	1.70E-20
Yes	23 (23.5)	91 (88.3)

¹ The controls (Yen et al., 2008) were collected from people who attended for routine physical checkups, non-neoplastic minor operations, or maxillofacial trauma.

² The cases of data sets (Yen et al., 2008) were collected from the three pathologically confirmed primary OSCC patients.
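The χ² and P values for the binary factors in Table 1 can be checked with a short calculation. The sketch below uses the Pearson χ² statistic for a 2×2 table without continuity correction; the text does not state which variant was used, so that choice is an assumption, although it appears to agree with the smoking row. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Smoking row of Table 1: non-smokers 68 controls / 12 cases; smokers 30 controls / 91 cases.
chi2_smoking, p_smoking = chi_square_2x2(68, 12, 30, 91)
print(f"chi2 = {chi2_smoking:.2f}, p = {p_smoking:.2e}")
```

The same function applied to a genotype row compares that genotype with the reference genotype only, for example CT versus CC for XRCC1.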

Logistic regression analyses for independent environmental effects to each SNP

We previously found that the individual SNP effects of XRCC1 rs1799782, XRCC3 rs861539, and XRCC4 rs2075685 were not significantly associated with oral cancer risk in terms of their overall ORs adjusted for all listed factors (age, gender, alcohol drinking, smoking, and BQ chewing; left column of Table 2) (Yen et al., 2008). However, the impact of each of these risk factors on oral cancer was not investigated individually.

Table 2.

Effects of Genotypes of Four DNA Repair Genes on Oral Cancer Risk by Overall and by Smoking, Alcohol Drinking, and BQ Chewing Status

	Adjusted⁵ overall OR⁹	Never smoking⁶ (n=80)	Smoking⁶ (n=121)	Never drinking⁷ (n=102)	Drinking⁷ (n=99)	Never BQ chewing⁸ (n=87)	BQ chewing⁸ (n=114)
XRCC1: rs1799782¹
CC	1.0	1.0	1.0	1.0	1.0	1.0	1.0
CT	1.0 (0.4–2.4)	5.2 (0.4–60.9)	0.7 (0.3–1.9)	0.8 (0.2–3.7)	1.1 (0.4–3.3)	3.3 (0.5–19.5)	0.7 (0.3–1.9)
TT	1.6 (0.4–6.1)	8.7 (0.3–287.4)	1.3 (0.3–5.5)	2.4 (0.3–22.2)	1.5 (0.3–8.3)	3.6 (0.3–43.3)	1.5 (0.3–7.6)
XRCC2: rs2040639²
AA	1.0	1.0	1.0	1.0	1.0	1.0	1.0
AG	2.9 (1.0–8.8)	1.3 (0.1–21.0)	3.7 (1.1–12.1)	1.1 (0.2–7.6)	5.7 (1.4–23.2)	2.5 (0.3–18.9)	3.4 (0.9–12.6)
GG	1.0 (0.3–3.1)	1.2 (0.1–27.5)	0.9 (0.3–3.1)	0.4 (0.0–3.8)	1.3 (0.3–4.8)	1.1 (0.1–12.8)	1.0 (0.3–3.6)
XRCC3: rs861539³
CC	1.0	1.0	1.0	1.0	1.0	1.0	1.0
CT	0.4 (0.1–1.7)	—	0.5 (0.1–2.5)	—	0.8 (0.1–4.7)	0.5 (0.0–11.5)	0.4 (0.1–1.9)
TT	—	—	—	—	—	—	—
XRCC4: rs2075685⁴
TT	1.0	1.0	1.0	1.0	1.0	1.0	1.0
TG	1.2 (0.1–11.7)	—	0.8 (0.1–8.8)	—	0.7 (0.1–7.9)	—	0.8 (0.1–9.1)
GG	1.2 (0.1–12.1)	—	1.1 (0.1–12.1)	—	1.0 (0.1–11.5)	—	1.2 (0.1–13.4)

Bold fonts indicate significant relationships.

¹ rs1799782, Arg194Trp, allele=C/T, MAF=0.130, Chromosome (Chr.) 19; ² rs2040639, 5′ locus, allele=A/G, MAF=0.371, Chr. 7; ³ rs861539, Thr241Met, allele=C/T, MAF=0.251, Chr. 14; ⁴ rs2075685, T1394G intron 1, allele=G/T, MAF=0.385, Chr. 5; ⁵ Overall adjustment for age, gender, alcohol drinking, smoking, and BQ chewing, as reported in our previous work (Yen et al., 2008); ⁶ Adjusted only for age, gender, alcohol drinking, and BQ chewing; ⁷ Adjusted only for age, gender, smoking, and BQ chewing; ⁸ Adjusted only for age, gender, alcohol drinking, and smoking; ⁹ OR, odds ratio; CI, confidence interval.

In the current study, we used logistic regression to adjust for these risk factors separately, so as to analyze the effect of each risk factor on each genotype of these SNPs of DNA repair genes (Table 2). For XRCC2 rs2040639, the heterozygous genotype was significantly associated with oral cancer among smokers and among alcohol drinkers (adjusted overall OR=2.9, 95% CI=1.0–8.8; adjusted OR in smoking=3.7, 95% CI=1.1–12.1; adjusted OR in alcohol drinking=5.7, 95% CI=1.4–23.2). In combination with BQ chewing, XRCC2 rs2040639 showed no significant association with oral cancer [adjusted OR=3.4, 95% CI=0.9–12.6]. The numbers of nonsmokers and smokers among the cases are imbalanced, which may limit generalization. However, the stratified analysis in Table 2 controls for confounding among smoking, drinking, and BQ chewing. The results suggest that smoking and drinking have a significant effect on oral cancer; these factors have also been reported to be associated with oral cancer (Ko et al., 1995).

Analyses of gene–gene interaction

All significant two-factor G×G interaction models generated by MDR-ER are shown in Table 3. Among them, the best model (XRCC1 rs1799782+XRCC2 rs2040639) was selected by its minimum classification error rate (0.365). Using the same strategy, the best one-factor model and the best 2–4 factor G×G interaction models of the MDR-ER analysis are shown in Table 4. XRCC2 rs2040639 was the best single factor associated with oral cancer (error rate=0.413, CVC=4/5). Its p value (0.028; Table 4) reflects how rarely a classification error rate ≤0.413 would be observed by chance in randomized data under the null hypothesis of no association. The combination of XRCC1 rs1799782 and XRCC2 rs2040639 was the best two-factor model, with an error rate of 0.365 and a CVC of 5/5. The best three-factor model added XRCC4 rs2075685 to XRCC1 rs1799782 and XRCC2 rs2040639 (error rate=0.339). The four-factor model of XRCC1 rs1799782, XRCC2 rs2040639, XRCC3 rs861539, and XRCC4 rs2075685 was the most accurate for oral cancer prediction (error rate=0.330, CVC=5/5). The full MDR-ER outcome for the genotype combinations of these four factors, showing the high- and low-risk cells for oral cancer, is presented in Figure 2.

FIG. 2.

Summary of XRCC1, XRCC2, XRCC3, and XRCC4 genotype combinations associated with high and low risks for oral cancer from MDR-ER with the lowest prediction error. For each genotype combination (cell), the left bar of the histogram shows the number of cases, while the right bar shows the number of controls. High-risk cells are shaded darker. As evidence of epistasis, the risk attached to each genotype depends on the genotypes at the other loci, and the risk of each cell is calculated accordingly.

Table 3.

Two-Factor G×G Interactions among the Four SNPs Assessed by MDR-ER

Two-factors	OR (95% CI)	95% Bootstrap CI	P value	Error rate
XRCC1+XRCC2	3.133 (1.63–6.01)	(1.66–6.13)	5.18E-04	0.365
XRCC1+XRCC4	2.189 (1.16–4.13)	(1.14–4.31)	0.015	0.407
XRCC2+XRCC3	2.112 (1.13–3.95)	(1.11–4.00)	0.019	0.408
XRCC2+XRCC4	2.189 (1.16–4.13)	(1.20–4.22)	0.015	0.407

All two-factor G×G interactions listed were identified by the MDR-based method with the function for unbalanced data (MDR-ER); they showed significant testing accuracy but not necessarily the best CVC. OR, odds ratio.

Table 4.

The Identified Best Model in One Factor and 2- to 4-Factors G×G Interaction by MDR-ER

Factor no.	Best model	CVC¹	Error rate	Power	OR (95% CI)	Bootstrap 95% CI²	p value
1	XRCC2	4/5	0.413	0.689	2.009 (1.08–3.76)	(1.07–3.76)	0.028
2	XRCC1, XRCC2	5/5	0.365	0.978	3.133 (1.63–6.01)	(1.66–6.13)	5.18E-04
3	XRCC1, XRCC2, XRCC4	4/5	0.339	0.998	4.251 (2.21–8.18)	(1.99–7.41)	4.10E-05
4	XRCC1, XRCC2, XRCC3, XRCC4	5/5	0.330	0.999	4.543 (2.34–8.82)	(2.31–8.98)	7.11E-06

¹ CVC, cross-validation consistency; ² Bootstrap 95% CI, the 95% CI adjusted by bootstrapping 1000 samples.

Table 4 also shows that the OR increased from 2.009 to 4.543 across the 1–4 factor models, with the bootstrap 95% CIs of the OR (based on 1000 bootstrap samples) spanning 1.07 to 8.98. The p values of the 1–4 factor models decreased from 0.028 to 7.11E-06. These OR values indicate that oral cancer risk is significantly raised by the joint effect of multiple genotypes. The power of the 1–4 factor models ranged from 0.689 to 0.999 and exceeded 0.9 for the 2–4 factor models; for these three models, we can therefore reject the null hypothesis that the OR equals 1 with a probability over 0.9.
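For readers unfamiliar with bootstrap intervals, the following Python sketch shows one common variant, the percentile bootstrap, applied to an OR. The text states only that 1000 bootstrap samples were used, so the percentile method, the resampling unit, the seed, and the function names here are assumptions. The example groups XRCC2 AG against the other two genotypes using the counts in Table 1, purely as an illustration.

```python
import random


def odds_ratio(sample):
    """OR for a list of (is_case, is_high_risk) pairs."""
    a = sum(1 for case, high in sample if case and high)
    b = sum(1 for case, high in sample if case and not high)
    c = sum(1 for case, high in sample if not case and high)
    d = sum(1 for case, high in sample if not case and not high)
    return (a * d) / (b * c)


def bootstrap_percentile_ci(sample, n_boot=1000, alpha=0.05, seed=0):
    """Resample individuals with replacement and take empirical percentiles of the OR."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    stats = sorted(odds_ratio(rng.choices(sample, k=len(sample)))
                   for _ in range(n_boot))
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# XRCC2 in Table 1: cases 61 AG / 42 other; controls 41 AG / 57 other.
sample = ([(True, True)] * 61 + [(True, False)] * 42
          + [(False, True)] * 41 + [(False, False)] * 57)
point = odds_ratio(sample)
boot_low, boot_high = bootstrap_percentile_ci(sample)
print(point, boot_low, boot_high)
```

Because this crude grouping is not the cross-validated MDR-ER model, the numbers it prints are not expected to reproduce Table 4.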

Analyses of gene–environment interaction

Oral carcinogenesis may result from the failure of multiple interacting genes to respond to the damage induced by environmental exposures. Therefore, the effects of G×E interactions on oral cancer risk were evaluated in this study by MDR-ER analysis of four SNPs in the DNA repair genes XRCCs 1–4 and five environmental factors (gender, age, smoking, BQ chewing, and alcohol drinking).

Supplementary Figures S1 to S4 illustrate the best models of 2–5 factors (Table 5), respectively, in association with high and low risks of oral cancer. Table 5 shows that BQ chewing [i.e., BQ (Y)] was the best single factor associated with oral cancer, with a low error rate (0.18) and a good CVC (5/5). The best two-factor model was the combination of alcohol drinking and BQ chewing, with a testing error rate of 0.18 and a CVC of 4/5. The best three-factor model combined two genetic factors, XRCC1 rs1799782 and XRCC2 rs2040639, with the environmental factor BQ chewing (error rate=0.18; CVC=4/5). These two models were both highly accurate and had a high CVC. The best four-factor model included XRCC1 rs1799782, XRCC2 rs2040639, age, and BQ chewing. XRCC1 rs1799782, XRCC2 rs2040639, and BQ chewing were the three factors most common to the best models; however, the addition of age slightly decreased the testing accuracy of the best four-factor model (error rate=0.20; CVC=3/5). The best five-factor model included XRCC1 rs1799782, XRCC2 rs2040639, age, alcohol drinking, and BQ chewing. Compared with the best four-factor model, the addition of alcohol drinking slightly decreased the testing accuracy of the best five-factor model (error rate=0.21; CVC=3/5).

Table 5.

Best Model in One Factor and 2- to 5-Factors G×E Interaction by MDR-ER¹

Best model	Low risk	High risk	CVC²	Testing err.³	OR (95% CI)
One factor: BQ chewing	1. BQ (N)⁴	1. BQ (Y)⁵	5/5	0.18	25.49 (10.8–59.8)
Two factors: Drinking, BQ chewing	1. Drinking (N) or BQ (N)	1. Drinking (Y) or BQ (Y)	4/5	0.18	32.9 (14.1–76.9)
Three factors: XRCC1, XRCC2, BQ chewing	1. BQ (N); 2. XRCC1=TT, XRCC2=AA, BQ (Y); 3. XRCC1=CT, XRCC2=GG, BQ (Y)	1. Any other combination	4/5	0.18	31.0 (14.0–64.7)
Four factors: XRCC1, XRCC2, Age, BQ chewing	1. Age <50, BQ (N); 2. XRCC1=CC, age ≥50, BQ (N); 3. XRCC1=CT, XRCC2=AG, age ≥50, BQ (N); 4. XRCC1=TT, XRCC2=AA, age ≥50, BQ (N) or BQ (Y); 5. XRCC1=CC, XRCC2=CT, age <50, BQ (Y); 6. XRCC1=CT, XRCC2=GG, age <50, BQ (Y)	1. Any other combination	3/5	0.20	49.8 (21.0–117.7)
Five factors: XRCC1, XRCC2, Age, Drinking, BQ chewing	1. Age <50, drinking (N), BQ (N); 2. XRCC2=AG, age <50, drinking (Y), BQ (N); 3. XRCC1=CC, age ≥50, drinking (N), BQ (N); 4. Any other combination	1. Age ≥50, drinking (N), BQ (Y); 2. XRCC2=AG, BQ (Y); 3. XRCC1=CC, age ≥50, drinking (Y); 4. XRCC1=CC, age <50, drinking (N), BQ (Y); 5. Age ≥50, drinking (Y), BQ (N); 6. XRCC1=CT, XRCC2=GG, age <50, drinking (Y), BQ (N); 7. XRCC1=CT, XRCC2=AA, drinking (Y), BQ (Y); 8. XRCC1=CT, XRCC2=AA or GG, age ≥50, drinking (N), BQ (N); 9. XRCC1=TT, XRCC2=AG, age ≥50, drinking (N), BQ (N); 10. XRCC1=TT, XRCC2=GG, age ≥50, drinking (Y), BQ (Y)	3/5	0.21	82.9 (31.0–221.5)

¹ Detailed MDR-ER charts for the best models of two to five factors are provided in Supplementary Figures S1–S4, respectively; ² CVC, cross-validation consistency; ³ Testing err., average testing error rate; ⁴ N, No; ⁵ Y, Yes.
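Once the cells are labeled, an MDR-ER model acts as a simple lookup rule. The sketch below encodes the best three-factor model exactly as listed in Table 5; the function and variable names are hypothetical, and genotypes are passed as plain strings for illustration.

```python
# Genotype pairs (XRCC1, XRCC2) listed as low risk among BQ chewers in Table 5.
LOW_RISK_AMONG_CHEWERS = {("TT", "AA"), ("CT", "GG")}


def three_factor_risk(xrcc1, xrcc2, bq_chewing):
    """Return 'L' or 'H' for the best three-factor model (XRCC1, XRCC2, BQ chewing)."""
    if not bq_chewing:
        return "L"  # rule 1: BQ (N) is low risk regardless of genotype
    if (xrcc1, xrcc2) in LOW_RISK_AMONG_CHEWERS:
        return "L"  # rules 2 and 3: two genotype pairs stay low risk among chewers
    return "H"      # any other combination


print(three_factor_risk("CC", "AG", True))
print(three_factor_risk("TT", "AA", True))
print(three_factor_risk("CC", "AG", False))
```

Written this way, it is easy to see that the genetic factors change the classification only among BQ chewers.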

Discussion

In this study, we demonstrated a systematic analysis of complex interactions between genetic and environmental risk factors in oral cancer. We used a multifaceted analytical method that combines regular statistical methods with MDR-ER to identify associations between DNA repair polymorphisms and oral cancer risk. Accordingly, the potential effects of DNA repair genes may be explored through G×G or G×E interactions (i.e., in this study, the associations among genotypes of SNPs of the XRCCs 1–4 genes and between genotype combinations and smoking, BQ chewing, and alcohol drinking status).

Traditional statistical analyses have been reported to model multifactor interactions inadequately (Moore and Williams, 2002). For example, Andrew et al. (2006) used logistic regression to evaluate three-way interactions, but the model failed to converge because of the small number of individuals in some cells. Alternatively, the risk effects associated with multiple factors have been analyzed by the MDR approach (Collins et al., 2013; Greene et al., 2010; Lee et al., 2014; Oh et al., 2013), which enhances the statistical power to explore possible G×G and G×E interactions (Moore, 2004). However, MDR is inadequate for analyzing imbalanced data sets, because the cells tend to be assigned predominantly to either the high- or the low-risk group. Therefore, we used MDR-ER to assess and interpret possible G×G and G×E interactions; this approach addresses the disadvantage of MDR on imbalanced data sets (Yang et al., 2013).

We identified the best two-factor combinations in terms of G×G and G×E interactions. MDR-ER analysis selected XRCC1 rs1799782 and XRCC2 rs2040639 as the best two biomarkers of oral cancer risk in the G×G interaction analysis (Table 3), while alcohol drinking and BQ chewing were the best two factors for oral cancer risk in the G×E interaction analysis (Table 5). Moreover, BQ chewing was the best single-factor model among all individual genetic and environmental factors. These results suggest that, under the best 1–2 factor models, environmental factors alone may have a dominant effect on oral cancer susceptibility compared with genetic factors alone.

In independent analyses, XRCC3 was reported to be associated with oral cancer risk together with either XRCC1 or XRCC2 (Benhamou et al., 2004; Dos Reis et al., 2013). XRCC1 rs1799782 (Ramachandran et al., 2006; Yen et al., 2008) and XRCC2 rs2040639 (Yen et al., 2008) were reported to be individually associated with oral cancer. However, the combined effect of XRCC1 rs1799782 and XRCC2 rs2040639, as well as their interaction with environmental factors, had not been reported. Recently, G×E interactions of several genes with smoking and with BQ chewing have been reported (Chiu et al., 2008; Sugimura et al., 2006). Similarly, in our best models of 3–5 factors, the interaction effect of XRCC1 rs1799782 and XRCC2 rs2040639 appeared to be markedly increased by environmental factors such as age, alcohol drinking, and BQ chewing, but not by smoking [i.e., the OR increased from 31.0 to 82.9 (Table 5)].

Furthermore, combined effects of SNPs with environmental factors (tobacco products and alcohol) have also been reported for other genes (IL12RB2, Rad 52, XRCC2, P53, CCND3, and ABCA1) (Cederblad et al., 2013). Under the best models of 3–5 factors, the combined effect of XRCC1 rs1799782 and XRCC2 rs2040639 was more strongly associated with oral cancer than a single environmental factor such as age. Additionally, when environmental factors were present, higher oral cancer risk was found only in patients heterozygous for the XRCC2 rs2040639 5′UTR polymorphism and not in homozygotes for the major or minor allele (Table 2). This may be partly explained by “incomplete dominance”, under which dominant, recessive, and intermediate phenotypes may all appear.

The results show the XRCC2 polymorphism and BQ chewing as the best single genetic and environmental factors associated with oral cancer. However, no gene–environment interaction was found between them unless the combination of XRCC2 and XRCC1, rather than the XRCC2 polymorphism alone, was considered in the model. This can be explained by the fact that the 2–5 factor G×E analysis by MDR-ER covered G×G, G×E, and E×E interactions. For two factors, drinking (Y) or BQ (Y) is high risk and drinking (N) or BQ (N) is low risk, which is an E×E interaction. Other two-factor G×G and G×E models, which were not the best models, are not shown.

MDR uses a machine learning technique to find nonlinear models associated with disease, so biologically meaningful results can be detected without a large study population. For example, associations between renin-angiotensin system-related gene polymorphisms and atrial fibrillation risk were reported for 97 cases and 97 controls (Asselbergs et al., 2006), and early onset coronary artery disease was analyzed by MDR using 90 cases and 90 controls (Agirbasli et al., 2011). Furthermore, the performance of MDR and penalized logistic regression (PLR) was compared for detecting G×G interactions associated with acute rejection in kidney transplant patients, based on 120 randomly selected Caucasian patients and different two-way and three-way interaction models (He et al., 2009). Using receiver operating characteristic analysis, the authors suggested that MDR outperforms PLR in detecting G×G interactions in small samples of a real data set.

Computational time remains an important limitation of MDR as well as MDR-ER because of the astronomical number of higher-order combinations. Consequently, given the large amount of time required, it is difficult to implement multiple testing when identifying more complex interactions between genes.
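
The growth of the search space can be made concrete with a short Python sketch. For n candidate factors, the number of k-factor models is the binomial coefficient C(n, k), and each model of three-level genotypes has 3^k multifactor cells. The function name `mdr_search_space` and the example value of 20 factors are hypothetical and serve only as an illustration.

```python
from math import comb

def mdr_search_space(n_factors, max_order, levels=3):
    """Return (order, number of models, cells per model) for each interaction order.

    Assumption: every factor has the same number of levels
    (3 for a biallelic SNP genotype); binary exposures would have 2.
    """
    rows = []
    for k in range(1, max_order + 1):
        n_models = comb(n_factors, k)  # ways to choose k of n factors
        n_cells = levels ** k          # genotype combinations per model
        rows.append((k, n_models, n_cells))
    return rows

# Hypothetical example: 20 candidate factors, models of up to 5 factors.
for order, n_models, n_cells in mdr_search_space(n_factors=20, max_order=5):
    print(f"order={order}: models={n_models}, cells per model={n_cells}")
```

Each of these models must additionally be re-evaluated in every cross-validation fold and, if significance is assessed by permutation, in every permuted dataset, which multiplies the cost further.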

In conclusion, our results indicate that the MDR-ER methodology was effective in identifying G×G and G×E interactions in an oral cancer association study. We found that XRCC1 rs1799782 and XRCC2 rs2040639 of the DNA repair genes were significantly associated with oral cancer susceptibility and that this risk may be enhanced by exposure to certain environmental factors. Furthermore, our findings highlight the impact of MDR-ER-based G×G and G×E interactions on oral cancer susceptibility, providing a potential algorithm to apply to the prediction of other complex diseases. Finally, we suggest that the observations presented here warrant further research in larger study samples to examine their relevance for routine clinical care in oncology.

Acknowledgments

This work was partly supported by funds of the MOST 103-2221-E-151-029-MY3, MOST 103-2320-B-037-008, the Kaohsiung Medical University “Aim for the Top Universities Grant, Grant No. KMU-TP103A33”, the 103CM-KMU-09, the KMU-TP103A33, the National Sun Yat-sen University-KMU Joint Research Project (#NSYSU-KMU 103-p014 and #NSYSU-KMU 101-p036), and the Health and welfare surcharge of tobacco products, the Ministry of Health and Welfare, Taiwan, Republic of China (MOHW104-TDU-B-212-124-003 and MOHW103-TDU-212-114007). We also thank Dr. Hans-Uwe Dahms for his help with English editing.

Author Disclosure Statement

The authors declare that there are no conflicting financial interests.

References

Agirbasli, Guney, Ozturhan, et al. (2011). Multifactor dimensionality reduction analysis of MTHFR, PAI-1, ACE, PON1, and eNOS gene polymorphisms in patients with early onset coronary artery disease. Eur J Cardiovasc Prev Rehab, 18, 803–809.

Andrew, Nelson, Kelsey, et al. (2006). Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility. Carcinogenesis, 27, 1030–1037.

Asselbergs, Moore, van den Berg, et al. (2006). A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: A nested case control study. BMC Med Genet, 7, 39.

Balbo, Meng, Bliss, Jensen, Hatsukami, and Hecht (2012). Kinetics of DNA adduct formation in the oral cavity after drinking alcohol. Cancer Epidemiol Biomarkers Prev, 21, 601–608.

Baumeister, Reiter, Kleinsasser, Matthias, and Harreus (2009). Epigallocatechin-3-gallate reduces DNA damage induced by benzo[a]pyrene diol epoxide and cigarette smoke condensate in human mucosa tissue cultures. Eur J Cancer Prev, 18, 230–235.

Benhamou, Tuimala, Bouchardy, Dayer, Sarasin, and Hirvonen (2004). DNA repair gene XRCC2 and XRCC3 polymorphisms and susceptibility to cancers of the upper aerodigestive tract. Int J Cancer, 112, 901–904.

Cederblad, Thunberg, Engstrom, Castro, Rutqvist, and Laytragoon-Lewin (2013). The combined effects of single-nucleotide polymorphisms, tobacco products, and ethanol on normal resting blood mononuclear cells. Nicotine Tob Res, 15, 890–895.

Chang, Chuang, Ho, Chang, and Yang (2008). Odds ratio-based genetic algorithms for generating SNP barcodes of genotypes to predict disease susceptibility. OMICS, 12, 71–81.

Chen, Chi, and Liu (2002). Hydroxyl radical formation and oxidative DNA damage induced by areca quid in vivo. J Toxicol Environ Health A, 65, 327–336.

Chen, Chuang, Lin, et al. (2013). Preventive SNP-SNP interactions in the mitochondrial displacement loop (D-loop) from chronic dialysis patients. Mitochondrion, 13, 698–704.

Chen, Chuang, Lin, et al. (2014). Genetic algorithm-generated SNP barcodes of the mitochondrial D-loop for chronic dialysis susceptibility. Mitochondrial DNA, 25, 231–237.

Chiang, Lee, Chang, et al. (2013). Combined effects of differentiation factor 15 and substance use of alcohol, betel quid and cigarette on risk of head and neck cancer. Head Neck Oncol, 5, 23.

Chiu, Tsai, Tseng, et al. (2008). A novel single nucleotide polymorphism in XRCC4 gene is associated with oral cancer susceptibility in Taiwanese patients. Oral Oncol, 44, 898–902.

Chuang, Chang, Lin, and Yang (2012). Chaotic particle swarm optimization for detecting SNP-SNP interactions for CXCL12-related genes in breast cancer prevention. Eur J Cancer Prev, 21, 336–342.

Collins, Hu, Wejse, Sirugo, Williams, and Moore (2013). Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis. BioData Min, 6, 4.

Dos Reis, Losi-Guembarovski, de Souza Fonseca Ribeiro, et al. (2013). Allelic variants of XRCC1 and XRCC3 repair genes and susceptibility of oral cancer in Brazilian patients. J Oral Pathol Med, 42, 180–185.

Dupont and Plummer Jr. (1998). Power and sample size calculations for studies involving linear regression. Control Clin Trials, 19, 589–601.

Fujioka and Shibamoto (2006). Determination of toxic carbonyl compounds in cigarette smoke. Environ Toxicol, 21, 47–54.

Gal, Huang, Chen, Hayes, and Schwartz (2005). DNA repair gene polymorphisms and risk of second primary neoplasms and mortality in oral cancer patients. Laryngoscope, 115, 2221–2231.

Greene, Sinnott-Armstrong, Himmelstein, Park, Moore, and Harris (2010). Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS. Bioinformatics, 26, 694–695.

Hahn, Ritchie, and Moore (2003). Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics, 19, 376–382.

He, Oetting, Brott, and Basu (2009). Power of multifactor dimensionality reduction and penalized logistic regression for detecting gene-gene interaction in a case-control study. BMC Med Genet, 10, 127.

Ko, Huang, Lee, Chen, Lin, and Tsai (1995). Betel quid chewing, cigarette smoking and alcohol consumption related to oral cancer in Taiwan. J Oral Pathol Med, 24, 450–453.

Lee, Yin, Yu, et al. (2001). Accumulation of mitochondrial DNA deletions in human oral tissues: Effects of betel quid chewing and oral cancer. Mutat Res, 493, 67–74.

Lee, Jin, Lee, Ha, Yeo, and Oh (2014). Gene-gene interactions of fatty acid synthase (FASN) using multifactor-dimensionality reduction method in Korean cattle. Mol Biol Rep, 41, 2021–2027.

Lin, Tseng, Yang, et al. (2009). Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS, 13, 165–172.

Moore (2004). Computational analysis of gene-gene interactions using multifactor dimensionality reduction. Expert Rev Mol Diagn, 4, 795–803.

Moore, Asselbergs, and Williams (2010). Bioinformatics challenges for genome-wide association studies. Bioinformatics, 26, 445–455.

Moore and Williams (2002). New strategies for identifying gene-gene interactions in hypertension. Ann Med, 34, 88–95.

Nyunoya, Mebratu, Contreras, Delgado, Chand, and Tesfaigzi (2014). Molecular processes that drive cigarette smoke-induced epithelial cell fate of the lung. Am J Respir Cell Mol Biol, 50, 471–482.

Oh, Jin, Lee, et al. (2013). Identification of stearoyl-CoA desaturase (SCD) gene interactions in Korean native cattle based on the multifactor-dimensionality reduction method. Asian-Australas J Anim Sci, 26, 1218–1228.

Ramachandran, Ramadas, Hariharan, Rejnish Kumar, and Radhakrishna Pillai (2006). Single nucleotide polymorphisms of DNA repair genes XRCC1 and XPD and its molecular mapping in Indian oral cancer. Oral Oncol, 42, 350–362.

Ritchie, Hahn, and Moore (2003). Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol, 24, 150–157.

Ritchie, Hahn, Roodi, et al. (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet, 69, 138–147.

Romanowicz-Makowska, Smolarz, Gajecka, et al. (2012). Polymorphism of the DNA repair genes RAD51 and XRCC2 in smoking- and drinking-related laryngeal cancer in a Polish population. Arch Med Sci, 8, 1065–1075.

Steen (2012). Travelling the world of gene-gene interactions. Brief Bioinform, 13, 1–19.

Sugimura, Kumimoto, Tohnai, et al. (2006). Gene-environment interaction involved in oral carcinogenesis: Molecular epidemiological study for metabolic and DNA repair gene polymorphisms. J Oral Pathol Med, 35, 11–18.

Tan, Chang, Chen, et al. (2003). Novel heteroplasmic frameshift and missense somatic mitochondrial DNA mutations in oral cancer of betel quid chewers. Genes Chromosomes Cancer, 37, 186–194.

Tsai, Chang, Liu, Tsai, Lin, and Bau (2014). Contribution of DNA double-strand break repair gene XRCC3 genotypes to oral cancer susceptibility in Taiwan. Anticancer Res, 34, 2951–2956.

Tseng, Tsai, Chiu, et al. (2008). Association of XRCC4 codon 247 polymorphism with oral cancer susceptibility in Taiwan. Anticancer Res, 28, 1687–1691.

Warnakulasuriya (2009). Global epidemiology of oral and oropharyngeal cancer. Oral Oncol, 45, 309–316.

Wu, Liu, Yin, Guan, Li, and Zhou (2014). Association of X-ray repair cross-complementing group 1 Arg194Trp, Arg399Gln and Arg280His polymorphisms with head and neck cancer susceptibility: A meta-analysis. PLoS One, 9, e86798.

Yang, Chuang, Chen, Tseng, and Chang (2011). Computational analysis of simulated SNP interactions between 26 growth factor-related genes in a breast cancer association study. OMICS, 15, 399–407.

Yang, Chuang, Cheng, et al. (2012). Single nucleotide polymorphism barcoding to evaluate oral cancer risk using odds ratio-based genetic algorithms. Kaohsiung J Med Sci, 28, 362–368.

Yang, Lin, Chuang, Chen, and Chang (2013). MDR-ER: Balancing functions for adjusting the ratio in risk classes and classification errors for imbalanced cases and controls using multifactor-dimensionality reduction. PLoS One, 8, e79387.

Yen, Liu, Chen, et al. (2008). Combinational polymorphisms of four DNA repair genes XRCC1, XRCC2, XRCC3, and XRCC4 and their association with oral cancer in Taiwan. J Oral Pathol Med, 37, 271–277.

Yukawa, Ohashi, Amanuma, et al. (2014). Impairment of aldehyde dehydrogenase 2 increases accumulation of acetaldehyde-derived DNA damage in the esophagus after ethanol ingestion. Am J Cancer Res, 4, 279–284.

Zhang, Wang, Wu, and Li (2013). XRCC1 Arg194Trp polymorphism is associated with oral cancer risk: Evidence from a meta-analysis. Tumour Biol, 34, 2321–2327.
