Abstract
Aims:
Our study was designed to explore the applied characteristics of the back propagation artificial neural network (BPANN) in studying the association between genetic variants in the adiponectin (ADIPOQ), peroxisome proliferator-activated receptor (PPAR)-γ, and retinoid X receptor-α (RXR-α) genes and type 2 diabetes mellitus (T2DM) risk in a Chinese Han population.
Subjects and Methods:
We used BPANN as the fitting model based on data gathered from T2DM patients (n=913) and normal controls (n=1,001). The mean impact value (MIV) for each input variable was calculated, and the factors were ranked by their absolute MIVs.
Results:
The results from BPANN were compared with multiple logistic regression analysis, and the generalized multifactor dimensionality reduction (GMDR) method was used to calculate the joint effects of ADIPOQ, PPAR-γ, and RXR-α genes. By BPANN analysis, the T2DM risk factors ranked by importance were serum adiponectin level, rs3856806, rs7649121, hypertension, rs3821799, rs17827276, rs12495941, rs4240711, age, rs16861194, waist circumference, rs2241767, rs2920502, rs1063539, alcohol drinking, smoking, hyperlipoproteinemia, gender, rs3132291, T2DM family history, rs4842194, rs822394, rs1801282, rs1045570, rs16861205, rs6537944, body mass index, rs266729, and rs1801282. In contrast, only 11 factors were statistically significant in multiple logistic regression analysis. After overweight and obesity were taken as environment adjustment factors into the analysis, model A2 B4 C5 C6 C8 (rs3856806, rs4240711, rs7649121, rs3821799, rs12495941) was the best model (cross-validation consistency=10/10, P=0.0107) in the GMDR method.
Conclusions:
These results suggested the interactions of ADIPOQ, PPAR-γ, and RXR-α genes might play a role in susceptibility to T2DM. BPANN could be used to analyze the risk factors of diseases and provide more complicated relationships between inputs and outputs.
Introduction
Back propagation artificial neural network (BPANN) is an unconventional multidimensional data analysis method that is mainly applied to problems of prediction and discriminant analysis. It learns the essential features of a problem by training on representative examples. A well-trained BPANN, with strong self-organization, adaptive ability, and high fault tolerance, is theoretically capable of approximating any nonlinear mapping between input (independent variables) and output (dependent variables). BPANN does not require its variables to satisfy normality, independence, or other such conditions. Thus, it overcomes limitations of case-control studies and traditional logistic regression and provides a new method for etiology research. 4 We used the BPANN method to analyze our data and compared our findings with the results of logistic regression analysis, both to explore the applied characteristics of BPANN in studying the association of genetic variants in the ADIPOQ, PPAR-γ, and RXR-α genes with T2DM risk in a Chinese Han population and to better understand the use of artificial neural networks in epidemiology.
Subjects and Methods
Subjects
Our study selected a total of 1,914 subjects, consisting of 913 T2DM patients and 1,001 T2DM-free controls, with their informed consent. All subjects were genetically unrelated ethnic Han Chinese. The cases included all eligible patients newly diagnosed with T2DM according to the diagnosis criteria of the World Health Organization for diabetes, who were consecutively recruited between March 2008 and August 2010 at the diabetes outpatient clinics of three affiliated hospitals of Nanjing Medical University (NJMU) (the Affiliated Changzhou 2nd Hospital of NJMU, the 3rd Affiliated Hospital of NJMU, and the Affiliated Nanjing 1st Hospital of NJMU) without restriction on age or sex (448 men and 465 women; 57.21±10.99 years old). A diagnosis of T2DM required either fasting plasma glucose ≥7.0 mmol/L (126 mg/dL) or 2-h glucose ≥11.1 mmol/L (200 mg/dL) after an oral glucose tolerance test. 16 All the patients were tested for glutamic acid decarboxylase autoantibodies and islet-cell antibodies (ICA512) (RSI Company, Oxford, UK) to exclude patients with type 1 diabetes. T2DM-free control subjects who had no history of T2DM, frequency-matched to the cases on age (±5 years) and gender, were randomly selected from the outpatient clinics within the same geographical area and during the same period as the cases. Controls were enrolled during routine annual health examinations and did not have diabetes as determined by an oral glucose tolerance test (75 g of glucose), performed according to World Health Organization criteria (472 men and 529 women; 57.48±11.21 years old). Those with medical illnesses such as hyperthyroidism, pituitary diseases, and chronic liver diseases and those taking glucocorticoids or other medications affecting glucose metabolism were excluded from the study. After giving informed consent, participants were administered standard questionnaires to collect information on demographic data and environmental exposure history, including smoking and drinking.
After interview, an approximately 5-mL venous blood sample was collected from each participant.
We used BPANN as the fitting model based on data gathered from T2DM patients and normal controls. The mean impact value (MIV) for each input variable was calculated, and the sequence of the factors according to their absolute MIVs was sorted.
Measurements
The anthropometric measurements included body weight, height, waist circumference (WC), and hip circumference. Body mass index (BMI) was calculated according to the Quetelet equation. Blood pressure was measured on the right arm with the individual in a sitting position after a 10-min rest using a standard sphygmomanometer of appropriate cuff size. The mean value of two consecutive blood pressure readings was taken into account. After an overnight fast, venous blood samples were collected and promptly centrifuged, and the serum was stored at −20°C until the adiponectin assay was performed. All samples were tested in the same assay. Serum adiponectin was measured by enzyme-linked immunosorbent assay (human adiponectin enzyme-linked immunosorbent assay kit; RapidBio Co., Seattle, WA, USA). Fasting plasma glucose was measured in the laboratories in the three affiliated hospitals of NJMU with the glucose oxidase method. Total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides were determined in the three affiliated hospitals by an enzymatic colorimetric method (Au5400; Olympus, Tokyo, Japan). DNA was extracted from peripheral blood by the use of proteinase K and phenol/chloroform. This study was approved by the Research Ethics Committee of NJMU.
Single nucleotide polymorphism selection and genotyping assays
Two potentially functional single nucleotide polymorphisms (SNPs) (rs16861194 and rs266729) of the ADIPOQ gene, three of the PPAR-γ gene (rs2920502, rs3856806, and rs1801282), and four of RXR-α (rs1045570, rs3132291, rs4240711, and rs4842194) with minor allele frequency of ≥0.05 in the Chinese Han population were identified from the National Center for Biotechnology Information dbSNPs database (
Genomic DNA was extracted from peripheral blood samples of all subjects. 5'-Nuclease TaqMan® assays were used to genotype the polymorphisms in 384-well plates by the ABI PRISM 7900HT Sequence Detection system (Applied Biosystems, Foster City, CA). The primers and probes of TaqMan assays were designed using Primer Express Oligo Design software version 2.0 (ABI PRISM) and are available upon request as TaqMan Pre-Designed SNP Genotyping Assays. The primer and probe sequences used are shown in Table 1. Polymerase chain reactions were performed in a 5-μL reaction mixture containing 5 ng of DNA, 2.5 μL of 2× TaqMan Universal PCR Master Mix, and 0.083 μL of 40× Assay Mix. The polymerase chain reaction conditions were 50°C for 2 min, 95°C for 10 min, 95°C for 15 s, and then 60°C for 1 min; 40 cycles of real-time polymerase chain reaction were performed. Individual genotype identification was performed by SDS software version 2.0 (ABI). Each plate contained two samples from the same individual as positive controls and two blank samples as negative controls for genotyping quality confirmation. In addition, 10% of the samples were randomly selected for repeated assays; the results were 100% concordant.
SNP, single nucleotide polymorphism.
Statistical analysis
All the statistical analyses were performed by SPSS software (version 13.0) (SPSS, Inc., Chicago, IL). The associations between genotypes and T2DM were estimated by computing the standardized partial regression coefficient (β), the odds ratios, and 95% confidence intervals from both univariate and multinomial logistic regression analysis. Among controls, genotype frequencies for each SNP were tested for Hardy–Weinberg equilibrium. Generalized multifactor dimensionality reduction (GMDR) software (version 1.0.1) (
Results
Univariate logistic regression analysis for the associations between the risk factors and T2DM
The genotype distribution for all the SNPs did not show any deviation from the Hardy–Weinberg equilibrium (all P>0.05 in controls). Binary logistic regression analysis was used to analyze the association among the 29 variables, including 10 SNPs of the ADIPOQ gene, four of the PPAR-γ gene, and five of the RXR-α gene, serum adiponectin concentration and other possible environmental factors, and T2DM. Table 2 displays the sequences of the influence of the various factors with T2DM risk. The results showed that the top 10 sequences of the factors associated with T2DM risk were T2DM family history, hypertension history, WC, rs17817276, smoking, rs3856806, drinking alcohol, serum adiponectin, rs1801282, and BMI.
BMI, body mass index; CI, confidence interval; OR, odds ratio; T2DM, type 2 diabetes mellitus; WC, waist circumference.
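The Hardy–Weinberg check reported above can be illustrated with a minimal sketch. It assumes a biallelic SNP and a one-degree-of-freedom chi-square goodness-of-fit test; the source does not state which test was used, and the function name and genotype counts below are hypothetical, not data from the study.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Assumes a biallelic SNP with both alleles observed, so that every
    expected genotype count is greater than zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts in controls (illustration only)
chi2, p_value = hwe_chi_square(410, 460, 131)
print(f"chi2={chi2:.3f}, P={p_value:.3f}")
```

A SNP would be treated as consistent with equilibrium when the returned P value exceeds 0.05, matching the criterion quoted in the text.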
Multiple logistic regression analysis for the associations between the risk factors and T2DM
In multiple logistic regression analysis using the stepwise method, 11 factors were statistically significant. We ranked the factors by their influence on T2DM risk. The order was family history, drinking alcohol, hypertension history, smoking, WC, rs7649121, rs3856806, rs12495941, gender, rs16861194, and serum adiponectin (Table 3). All samples entered the model; there were no missing values.
CI, confidence interval; OR, odds ratio; T2DM, type 2 diabetes mellitus; WC, waist circumference.
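The quantities reported for each factor (β, odds ratio, 95% confidence interval) and the ranking by standardized coefficient can be sketched as follows. This is an illustration under assumptions: the Wald interval is the usual default, and the standardization shown (β multiplied by the predictor's standard deviation and divided by π/√3) is one common convention; the source does not say which one was applied. All names and numbers are hypothetical.

```python
from math import exp, pi, sqrt

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

def standardized_beta(beta, sd_x):
    """Rescale beta by the predictor SD and the SD of the standard
    logistic distribution (pi / sqrt(3)). Assumed convention."""
    return beta * sd_x / (pi / sqrt(3))

# Hypothetical fitted values: name -> (beta, SE, SD of the predictor)
fits = {
    "factor_a": (0.80, 0.10, 0.5),
    "factor_b": (-0.05, 0.01, 10.0),
    "factor_c": (0.30, 0.12, 0.4),
}

# Rank factors by the absolute standardized coefficient, largest first
ranked = sorted(
    fits,
    key=lambda name: abs(standardized_beta(fits[name][0], fits[name][2])),
    reverse=True,
)

for name in ranked:
    beta, se, sd_x = fits[name]
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(f"{name}: std beta={standardized_beta(beta, sd_x):+.3f}, "
          f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In this made-up example the factor with the smallest raw β ranks first after standardization, which is why rankings are based on the standardized rather than the raw coefficient.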
BPANN multiple analysis for the associations between the risk factors and T2DM
Using 29 factors as input variables and T2DM diagnosis as the output variable, we established the BPANN model with all available samples. The transfer function was the logsig function, the learning rate was 0.1, the training error was 0.01, and the maximum number of training steps was 1,000. After training terminated, the MIV was obtained. Then, according to the absolute value of MIV, we ranked the related factors in the order of serum adiponectin level, rs3856806, rs7649121, hypertension, rs3821799, rs17827276, rs12495941, rs4240711, age, rs16861194, WC, rs2241767, rs2920502, rs1063539, drinking alcohol, smoking, hyperlipoproteinemia, gender, rs3132291, T2DM family history, rs4842194, rs822394, rs1801282, rs1045570, rs16861205, rs6537944, BMI, rs266729, and rs1801282 (Table 4).
BMI, body mass index; MIV, mean impact value; T2DM, type 2 diabetes mellitus; WC, waist circumference.
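The training settings and the MIV ranking described above can be sketched in plain Python. The logsig transfer function, learning rate 0.1, error goal 0.01, and 1,000-step limit come from the text. Everything else is an assumption made for illustration: a single hidden layer of three units, online weight updates, the initial weight range, and the common MIV recipe of perturbing each input by ±10% and averaging the change in output. The toy data are synthetic, and this is not the authors' implementation.

```python
import math
import random

def logsig(z):
    z = max(-60.0, min(60.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(net, x):
    W1, b1, W2, b2 = net
    h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return logsig(sum(w * hk for w, hk in zip(W2, h)) + b2)

def train_bpann(X, y, n_hidden=3, lr=0.1, goal=0.01, max_epochs=1000, seed=0):
    """One-hidden-layer network trained by online back propagation (squared error)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, t in zip(X, y):
            h = [logsig(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k])
                 for k in range(n_hidden)]
            o = logsig(sum(w * hk for w, hk in zip(W2, h)) + b2)
            sq_err += (o - t) ** 2
            d_out = (o - t) * o * (1.0 - o)                  # output error signal
            for k in range(n_hidden):
                d_hid = d_out * W2[k] * h[k] * (1.0 - h[k])  # back-propagated signal
                W2[k] -= lr * d_out * h[k]
                b1[k] -= lr * d_hid
                for i in range(n_in):
                    W1[k][i] -= lr * d_hid * x[i]
            b2 -= lr * d_out
        if sq_err / len(X) <= goal:                          # stop at the error goal
            break
    return W1, b1, W2, b2

def mean_impact_values(net, X, delta=0.10):
    """MIV per input: mean output difference between +delta and -delta perturbation."""
    mivs = []
    for j in range(len(X[0])):
        total = 0.0
        for x in X:
            up, down = list(x), list(x)
            up[j] *= 1.0 + delta
            down[j] *= 1.0 - delta
            total += predict(net, up) - predict(net, down)
        mivs.append(total / len(X))
    return mivs

# Synthetic toy data: the outcome depends on input 0 only; input 1 is noise
rng = random.Random(1)
X = [[rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)] for _ in range(40)]
y = [1.0 if x[0] > 1.0 else 0.0 for x in X]

net = train_bpann(X, y)
miv = mean_impact_values(net, X)
ranking = sorted(range(len(miv)), key=lambda j: abs(miv[j]), reverse=True)
print("MIV:", miv, "ranking by |MIV|:", ranking)
```

The sign of an MIV indicates the direction of the effect and its absolute value the relative importance, which is why the ranking sorts on the absolute value.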
Locus–locus interactions of ADIPOQ, PPAR-γ, and RXR-α genes
The SNPs rs17817276, rs3856806, rs4240711, rs16861194, rs7649121, rs3821799, and rs12495941, which were located in the top 10 in BPANN multiple analysis, were termed, respectively, A1, A2, B4, C2, C5, C6, and C8. GMDR software was used to consider the joint effects of the seven SNPs. The results showed that the A2 B4 C5 C6 C8 (rs3856806, rs4240711, rs7649121, rs3821799, and rs12495941) model was the best model (cross-validation consistency 10/10, P=0.0107) (Table 5). After taking overweight and obesity as environment adjustment factors into the analysis, the results showed that A2 B4 C5 C6 C8 was still the best model (cross-validation consistency 10/10, P=0.0107) (Table 6).
A1, rs17827276; A2, rs3856806; B4, rs4240711; C2, rs16861194; C5, rs7649121; C6, rs3821799; C8, rs12495941; CV, cross-validations.
Adjusted for overweight and obesity.
A1, rs17827276; A2, rs3856806; B4, rs4240711; C2, rs16861194; C5, rs7649121; C6, rs3821799; C8, rs12495941; CV, cross-validations.
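The two statistics quoted for the best model, balanced accuracy and cross-validation consistency, can be illustrated with a simplified sketch. It implements plain multifactor dimensionality reduction without covariate adjustment, not the score-based GMDR procedure or the GMDR software itself. Function names and the synthetic data are hypothetical, and the code assumes every training and testing split contains both cases and controls.

```python
from collections import Counter
from itertools import combinations

def mdr_balanced_accuracy(train, test, snp_idx):
    """Label genotype cells high or low risk on `train`, then score on `test`.

    Each row is (genotypes, is_case). Returns (sensitivity + specificity) / 2.
    """
    cases, controls = Counter(), Counter()
    for g, is_case in train:
        key = tuple(g[i] for i in snp_idx)
        (cases if is_case else controls)[key] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    high_risk = {k for k in set(cases) | set(controls)
                 if cases[k] > threshold * controls[k]}
    tp = fn = tn = fp = 0
    for g, is_case in test:
        pred = tuple(g[i] for i in snp_idx) in high_risk
        if is_case:
            tp += pred
            fn += not pred
        else:
            fp += pred
            tn += not pred
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def cv_consistency(data, n_snps, order, k=10):
    """Count how often the same SNP combination wins across k training sets."""
    folds = [data[i::k] for i in range(k)]
    splits = [([r for j, f in enumerate(folds) if j != i for r in f], folds[i])
              for i in range(k)]
    winners = Counter()
    for train, _ in splits:
        best = max(combinations(range(n_snps), order),
                   key=lambda idx: mdr_balanced_accuracy(train, train, idx))
        winners[best] += 1
    model, count = winners.most_common(1)[0]
    test_ba = sum(mdr_balanced_accuracy(tr, te, model) for tr, te in splits) / k
    return model, count, test_ba

# Synthetic data: case status depends on SNP 0 and SNP 1 jointly; SNP 2 is noise
data = [((a, b, c), (a + b) % 2 == 1)
        for _ in range(10) for a in range(3) for b in range(3) for c in range(3)]

model, count, test_ba = cv_consistency(data, n_snps=3, order=2)
print(f"best model: SNPs {model}, CV consistency {count}/10, "
      f"testing balanced accuracy {test_ba:.4f}")
```

A consistency of 10/10 means the same combination was selected in every training set. The testing balanced accuracy then measures how well that combination separates cases from controls in the held-out folds, where 0.5 corresponds to chance.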
Discussion
T2DM is a typical multifactorial disease caused by both genetic and environmental factors. Because of confounding factors and multicollinearity, the interactions among factors cannot be fully captured when associations between genes and T2DM are analyzed with traditional models. We had to consider not only SNPs, haplotypes, and gene–gene interactions, but also environmental elements such as behaviors, living habits, and family history. Traditional models are often limited in analyzing gene–environment interactions, and no fully appropriate method has been available for this kind of problem. BPANN, with its self-learning ability, places no requirements on the distribution or independence of variables and handles collinearity better. It not only helped us to find unknown relationships among variables but also contributed to the analysis of complex relationships, showing a particular advantage in gene–environment interaction analysis. Therefore, we examined the associations of ADIPOQ, PPAR-γ, and RXR-α gene polymorphisms with susceptibility to T2DM in China's southern Han population with the BPANN method and contrasted the results with logistic regression analysis, for a better understanding of the use of artificial neural networks in epidemiology.
In this study, we preliminarily screened factors, especially genetic factors, using univariate logistic regression, multiple logistic regression, and BPANN analysis. According to the values of MIV and β, we could rank the importance of each factor and provide the basis for further analysis. We analyzed 29 factors that may be involved in the risk of T2DM by univariate logistic regression. According to the standardized partial regression coefficient (β) of each variable, we ranked the factors by their influence on T2DM risk. The results showed that environmental factors such as T2DM family history, hypertension history, smoking, and drinking alcohol ranked highest, and most of the SNPs were less important. In the presence of the environmental factors, the effects of gene polymorphisms on the disease were masked and faint. Univariate logistic regression cannot consider the combined action of factors and only provides a reference for establishing the multifactor model.
Compared with univariate logistic regression, multiple logistic regression identified independent factors after adjustment for other factors and also reflected part of the combined action among various factors. We screened out 11 variables that had positive or negative statistically significant associations (P<0.01) with the risk of T2DM in the order of T2DM family history, drinking alcohol, hypertension, smoking, WC, rs7649121, rs3856806, rs12495941, gender, rs16861194, and serum adiponectin. SNPs rs7649121, rs12495941, and rs16861194 belong to the ADIPOQ gene. Reports have shown that the ADIPOQ gene is associated with susceptibility to diabetes. Sanghera et al. 17 found that two-site haplotype analysis in the ADIPOQ locus using only two marginally associated SNPs (rs182052 and rs7649121) revealed a significant protective association of the GA haplotype with T2DM. Multiple logistic regression analysis also revealed significant association of an ADIPOQ variant (rs12495941) with total body weight, waist, and hip. Fumeron et al. 18 found that the −11391 GA genotype (rs16861194) is a risk factor for hyperglycemia. SNP rs3856806 belongs to the PPAR-γ gene. Up to now, studies investigating the association between rs3856806C/T of the PPAR-γ gene and the risk of T2DM have rarely been reported, and their conclusions have been inconsistent. In these studies, some researchers found a statistically significant association between the rs3856806C/T polymorphism and obese and overweight patients with T2DM. 19 –22 However, Evans et al. 23 did not find a statistically significant association between the rs3856806C>T polymorphism and T2DM in a German population study. We failed to find those results in our study. This may be due to differences in ethnicity and T2DM susceptibility variants.
Although the effect of serum adiponectin concentration appeared in the model, because of genetic and environmental multiple influences and the complex interaction between variables, the model contribution of adiponectin was weakened.
We performed this analysis using BPANN, which is a computer-based algorithm that can be trained to recognize and categorize complex patterns. 24 –27 It was able to process data containing complex (nonlinear) relationships and interactions that were often too difficult or complex to interpret by conventional linear methods. 28 Meanwhile, BPANN did not require variables to satisfy normality and independence conditions and could handle collinearity between variables. Thus it had a unique advantage in gene–environment interaction analysis. According to the absolute value of MIV, we ranked the relevant factors. Compared with univariate and multiple logistic regression analysis, the rank of serum adiponectin concentration rose markedly: it moved from last place in the multiple logistic regression sequence to first place. We therefore considered that adiponectin might have a great influence in the pathogenesis of T2DM. Studies have proven that adiponectin can regulate insulin sensitivity. Treatment with adiponectin can reverse insulin sensitivity in the fat atrophy rat. 29 Studies of rhesus monkeys in vivo showed that the adiponectin concentration began to drop early in the onset of obesity and continued to decrease with the development of T2DM. Low levels of adiponectin decreased the glucose absorption rate of insulin, and the reduction of adiponectin paralleled the progress of insulin resistance. 30
The direct influence of adiponectin on T2DM was embodied in the BPANN model. The ADIPOQ gene locus rs3821799, the PPAR-γ gene locus rs17817276, and the RXR-α gene locus rs4240711 had no associations with T2DM in univariate and multiple logistic regression analysis, but the three polymorphisms ranked near the top in BPANN analysis. This result suggested that interactions among the three genes may be associated with the risk of T2DM. To test this hypothesis, we used the GMDR method to analyze the interaction of seven polymorphic loci (rs7649121, rs3856806, rs12495941, rs16861194, rs3821799, rs17817276, and rs4240711) that were screened out from multiple logistic regression and BPANN multifactor analysis. GMDR has been applied to SNP association studies for detecting gene–gene interactions associated with several complex diseases and is applicable to both continuous and dichotomous phenotypes. In addition, GMDR permits adjustment for discrete and quantitative covariates in various population-based studies with unbalanced case–control samples. 31 Model A2 B4 C5 C6 C8 (rs3856806, rs4240711, rs7649121, rs3821799, and rs12495941) was the best model because it had the highest test balanced accuracy (0.5407) (cross-validation consistency=10/10). It showed that interactions existed among the loci of the five SNPs and might point to interactions among the three genes. PPAR-γ is a member of the nuclear receptor family of ligand-activated transcription factors that heterodimerize with the RXR to regulate gene expression. 15 The PPAR-γ/RXR heterodimer binds directly to the PPAR-responsive element and increases the promoter activity. The present study has identified a functional PPAR-responsive element in the adiponectin promoter and has demonstrated that it plays a significant role in the transcriptional activation of the adiponectin gene in adipocytes. 15
The related research has focused only on individual or a few sites of the PPAR-γ and ADIPOQ genes. So far, no studies have addressed the RXR-α gene or the interactions of the three genes in relation to T2DM.
Compared with traditional analysis methods, BPANN provided much more abundant information about the relationships among variables and a model that was more practical to use. In practical application, BPANN analysis and multiple logistic regression analysis can be used together to select common influencing factors, especially in the early screening of genetic factors, according to the rankings by MIV and standardized partial regression coefficient (β). For instance, when we entered all 19 SNPs into the GMDR analysis without screening, we failed to get an optimal model, and we might have concluded that there was no interaction among the three genes. However, when we entered the seven SNPs selected by multiple logistic regression and BPANN analysis, we obtained an optimal model with statistical significance (cross-validation consistency=10/10, P=0.0107). These results suggested that an interaction existed among the ADIPOQ, PPAR-γ, and RXR-α genes. This study suggested that the practical value of BPANN lies mainly in its output MIV, which, being relatively stable, can be used as a reference for the early screening of variables, especially SNPs, and provides a basis for further analysis of gene interactions. Even so, BPANN analysis gave only the weights and MIV of each variable. Unlike logistic regression analysis, it could not clearly reject or select variables or give an exact P value for judging whether the association of an independent variable with the dependent variable was statistically significant. A solution to this problem needs to be sought in further study.
Footnotes
Acknowledgments
We would like to thank the National Natural Science Foundation of China for grant 30771858 and the Jiangsu Provincial Natural Science Foundation for grant BK2007229.
Author Disclosure Statement
No competing financial interests exist.
