Algorithm for diabetes risk evaluation from past gestational diabetes data

Abstract

Gestational diabetes mellitus (GDM) is defined as glucose intolerance that is diagnosed in pregnancy period, leading to possible complications for both mother and fetus during pregnancy. The aim of this study was to build an objective method to evaluate diabetes mellitus (DM) risk from past GDM data recorded 15 years ago and find a short list of most informative indicators.

The dataset consists of demographic, lifestyle, clinical, genetic and pregnancy related information recorded 15 years ago. Due to the large time gap data are limited and have missing values (MVs). Follow-up tests were performed to see if DM or impaired metabolism has developed after pregnancy with previously diagnosed GDM. The research steps involve pre-processing data to evaluate MVs, finding most informative attributes and testing standard classification algorithms to combine in to most effective voting meta-algorithm. Initially the attributes and records with large number of MVs were rejected. A small percentage (2.04%) was imputed using regression based methods. The data set was prepared for two scenarios: classification in two classes (1-healthy; 2-impaired metabolism including DM) and three classes (1-healthy; 2-impaired metabolism; 3-DM).

Voting meta-algorithm combining best algorithms of 21 from five different groups including Bayesian, regression, lazy, rule, and decision trees makes classification more objective and not depending on preferences. Relative frequency of occurrence (RFO) analysis of attributes combined with voting meta-algorithm helped finding optimal amount of attributes giving best possible classification result. The algorithm applied to two class data set with 12 selected attributes produced accuracy of 75.85 and AUC $=$ 0.82 with standard error of 0.11. Similarly for three class dataset the 9 attributes were selected allowing to reach classification accuracy 63.77 and AUC $=$ 0.76 with standard error of 0.1.

Meta-algorithm based classification of limited anamnestic GDM related data for DM prediction is proving to be effective. Testing multiple algorithms and performing RFO analysis appears to be natural and objective way of selecting most informative attributes and evaluating their importance.

Keywords

Diabetes prediction classification gestational diabetes attribute ranking

1. Introduction

The prevalence of GDM is increasing worldwide, intensified with elder maternal age, ethnicity, obesity, family history of diabetes, and sedentary lifestyle [1, 2]. GDM is diagnosed in pregnancy, and increases the risk of complications for both – the mother and the fetus [3, 4]. Recognised risk factors include obesity and family history of T2DM which are steadily increasing in the general population [5]. It is estimated that GDM affects around 7% of all pregnancies worldwide although prevalence is difficult to estimate as rates vary from study to study because of a lack of accepted diagnostic criteria and differences in screening procedures [4]. Other systematic meta-studies also show that there are higher related risk of T2DM for women with GDM [6].

In addition to adverse outcomes during pregnancy and birth, the consequences of GDM extend beyond pregnancy with affected women having a seven fold increased risk of type 2 diabetes mellitus compared to women who have not had GDM [6]. Rates of T2DM after a diagnosis of GDM vary depending on the population and length of follow up, but have been reported to be as high as 70%. Women are thought to be at the greatest risk of developing T2DM in the first five years following a pregnancy with GDM, with incidence of T2DM plateauing at around 10 years. Although women who have had GDM are at an increased risk of T2DM, research has shown that by making lifestyle changes they can prevent or delay progression to T2DM [4]. GDM increases maternal and their offspring risk of T2DM, obesity and cardiovascular disease. Recent research indicates that high glucose levels or over nutrition during pregnancy can result in a ‘metabolic programming’ effect on the fetus, predisposing the child to early onset diabetes or obesity [7].

There are several recommendations of post-partum screening among health organizations. Normally first post-partum test is performed after weeks and if normal – annually. The standard tests include 75 g 2-hour glucose or fasting blood glucose test. The sensitivity and specificity data of those strategies do not exist [8]. The intensive studies involving machine learning methods mainly focuses on biomarker prediction and diagnosis of DM [9]. There are a limited number of studies of diabetes prediction in future for women with previously diagnosed GDM. However there is a lack of data related lifestyle, behaviour and inheritance including link with other pathophysiological conditions especially when talking about patients prior GDM. As DM develops after the GDM not in all women, there is a necessity of effective algorithm which is capable of classifying who of women are at highest risk for development of DM (even with limited anamnestic GDM related data with MVs) in to evolved pathologies during long period of time.

1.1 The missing values

When analysing anamnestic data, it usually comes with MVs. Some old records may be unclear to interpret while others are unknown or simply missing. The best case would be to avoid MVs in the data collecting stage, but in reality, especially in medical field and considering long time periods incomplete data is practically inevitable. For large datasets it would be simply possible to remove the records with MVs and leaving only complete data for further analysis. In small dataset removing all records with MVs can be critical, so softer approach to this issue should be used. The only way to soften the problem of MVs is to use an imputation method, which would fill the MVs with most probable data. If number of MVs in data set is high and not randomly distributed, there is a risk of oversimplifying the problem and producing invalid knowledge [10]. Classifying dataset containing significant amount of MVs can lead to series of difficulties such as loss of efficiency, complications in handling and analysis of data using standard tools and systematic bias rising between missing and complete data [11]. In the first stage of preprocessing the data having more than 10% of MVs are removed from dataset. According to Cheema if total percentage of missing values do not exceed 10%, the correlation between imputed and original data does not exceed 0.95 [12]. The remaining MVs can be imputed using one of several available methods. The most basic are statistical ones where MVs are replaced with mean of median of attribute values. They are least accurate, but easy and fast to implement. The more advanced method giving better accuracy and is K-Nearest Neighbor (KNN) imputation method, which computes the most similar instance of a given number of surrounding data when similarity is determined using distance function [13]. The regression based prediction models that try to predict the most probable attribute values from relations of existing data. They usually require preparation of data and predictors to specific cases and might not be suitable for small datasets with significant amount of MVs. In either case, the imputation is a synthetic process that normally doesn’t bring valuable information. The main reason of applying imputation to data with MVs is that it is easier to apply different classification algorithms. Also imputation allows avoiding bias errors that may rise due to possible correlation of MVs to class attribute [14]. The effectiveness of different imputation methods depend on classification algorithm used. One method closest to reality is using prediction based imputation [10]. For data with small number of MVs the regression based Iterative Robust Model-Based Imputation (IRMI) algorithm can be used, which iteratively fills missing attribute values, while remaining attributes serve as learning data. Algorithm is implemented in to VIMGUI package of R language which automatically adopts best regression model for the data [15].

1.2 Optimizing attribute set

The dataset with all possible attributes do not always mean best classification result. The higher dimension data may lead to negative effects like redundancy and noise. There are two method groups of selecting best performing attributes. One group is classifier independent where attributes are ranked by information input to target, other analyze correlation to find usefulness or redundancy [16]. It may be possible to find best attribute subset that works well with particular classifier, but with different classifier model wouldn’t be optimal. Second group of methods is more suitable for multiple classifier approach. One of the best practices is to use classifier-dependent attribute backward feature elimination method also known as wrapper. Wrapping allows eliminating attributes iteratively until smallest tolerable selected error rate is achieved [17]. In all tests default threshold of 1% is selected. Wrapper algorithm is robust and can be implemented before individual classification routines. The downside of wrapper method is that it cannot tell how attributes are ranked by importance [18]. Each algorithm may produce differing subset of attributes so they may need to be joined in to single subset when ensemble methods with multiple classifiers are used.

In this paper the main focus is to evaluate best voting meta-algorithm for prediction of evolving DM and impaired metabolism for women who have previously been diagnosed with GDM. The data were collected 15 and more years ago which have a limited number of possible attributes some having up to 30% of MVs. Recent repeated clinical tests have been made to find out if DM or impaired metabolism has been developed over the years. The data carries valuable long term information which may be used for predicting the evaluation of DM for women with prior GDM cases.

The aim of this study is to build an objective method and find the most informative attributes to evaluate DM and impaired metabolism risk from GDM complications recorded in remote past.

The tasks include following:

•
Processing MVs with minimal impact to classification error.
•
Evaluating best performing classification meta-algorithm for DM prediction.
•
Finding the minimal number of most informative attributes related with development of DM and impaired carbohydrates metabolism by classifying data in 2 and 3 classes.

2. Materials and methods

2.1 Dataset analysis

The dataset was collected mostly in two time periods: 15 years old and more retrospective data was collected from women previously diagnosed GDM; and recently to detect if DM and impaired metabolism evolved. After pre-processing anamnestic data (when GDM was diagnosed) the final set consists of 151 patient records having 44 attributes. The attributes can be classified in to five groups including: demographic, the pregnancy related data during GDM diagnosis, clinical, genetic factors and lifestyle (Table 1).

Table 1
Attributes and their short-codes used in dataset

Demographic	X1 – residence; X7 – weight before pregnancy with GDM; X20 – marital status
Pregnancy related	X2 – number of pregnancies; X3 – number of given births; X4 – GDM detection time; X5 – year of first given birth; X6 – delivery status during GDM; X9 – weight of first born child
Clinical	X10 – GDM treatment; X19 – DM family history
Genetic [19]	X11 – rs7901695_TT; X12 – rs7903146_CC; X13 – rs7895340_GG; X14 – rs7895340_AA; X15 – rs11196205_CC; X16 – rs11196205_GG; X17 – rs12255372_GG; X18 – rs12255372_GT
Lifestyle	X8 – smoking; X21 – job status; X22 – education; X23 – income level; X24 – work physical intensity; X25 – time needed to get to work; X26 – physical activity; X27 – physical activity duration; X28 – cycling and walking duration; X29 – physical form; X30 – did activity maintain current physical form; X31 – total meals per day; X32 – fish per week; X33 – poultry per week; X34 – meat per week; X35 – vegetables per week; X36 – fast food per week; X37 – common fat type; X38 – amount of vegetables per meal; X39 – salad dressing; X40 – fruits; X41 – nuts and seeds; X42 – bread and butter; X43 – desserts; X44 – sugar consumption

39.4% of the women indicated that they did not follow any dietary and treatment recommendations during pregnancy. 46.3% of the participants followed dietary recommendations and 14.4% of the women received insulin therapy during GDM.

TCF7L2 is one of mostly researched gene in women having GDM [20]. The single nucleotide polymorphism (SNP) rs7903146 has shown strong associations with developing T2DM in different ethnical groups [21]. The relation has also been noticed of TCF7L2 SNPs rs290487, rs6585194 and rs7094463 between insulin resistance and insulin secrete for patients with diagnosed GDM [22]. Totally eight SNP related attributes have been included in the research dataset for analysis.

The largest part of data consists of lifestyle habits that relate to occupation, physical activity and eating habits which was acquired by filling questionnaire form.

The recent diagnostic data serves for class assignment:

2 class problem: class1 – healthy; class2 – impaired carbohydrates metabolism (including DM); 3 class problem: class1 – healthy; class2 – impaired carbohydrates metabolism; class3 – DM.

2.2 Algorithms used

For preprocessing clinical data the IRMI algorithm has been chosen in order to get best possible values for imputation. No other imputation methods have been tested since total amount of MVs in cleaned dataset is low (2.04%) which leads to minimal impact to results. Testing multiple imputation methods is beyond the scope of this research.

Waikato Environment for Knowledge Analysis (WEKA) data mining software, hosted by University of Waikato, has an extensive set of machine learning tools that may be used to run multiple experiments [23]. By analyzing included machine learning algorithms, total of 21 classifiers from five groups were selected for further experiments, including Bayesian, regression, lazy, rule based, and decision trees (Table 2).

Table 2
Selected algorithms for research

Algorithm group	Algorithms
Bayesian	Bayes Net, Naive Bayes
Regression	Logistic, Multilayer Perceptron, SGD, Simple Logistic, SMO, Voted Perceptron
Lazy	IBk, Kstar
Rule	Decision Table, JRip, OneR, Part
Decision tree	Decision Stump, Hoeffding Tree, J48, LMT, Random Forest

Each algorithm has its pros and cons. The models of regression algorithms including support vector machine (SVM) have lower variance, but higher bias leading to underfit. Decision trees are prone to overfit but work well with categorical data. Bayesian algorithms are computationally fast, work well with large number of attributes but relies on independence assumption [24, 25].

The five best classification algorithms from different groups are selected by the highest area under ROC (AUC) value. Each selected algorithm is included in voting meta-algorithm where result are combined in to final prediction using averaging of results [26, 27].

In this study the wrapper and relative frequency of occurrence (RFO) [28] methods together were used to filter and rank attributes by importance. The proposed ranking method works in two stages: finding most informative attribute subsets for each standard algorithm using wrapper method. Then all subsets are combined in to single set by calculating RFO for each attribute as seen in Fig. 1.

Figure 1.

Calculation RFO of attributes using two stage method: wrapper and RFO.

The attributes with highest RFO can be marked as most informative. The accuracy of this method may depend on number of classification algorithms used in the first stage.

During test stage, each classification algorithm is run iteratively within wrapper algorithm (Fig. 1) to find the smallest subset of informative attributes. As the optimization criteria the AUC parameter is selected since it represent better intuition between sensitivity and specificity [29]. Each algorithm performs a 10-fold cross-validation in order to get a more generalized and independent classification results leading to less overfitting.

3. Results

3.1 Data preparation and processing missing values

The amount of MVs in dataset ranges from 0% to more than 30% on each attribute. More than 10% of MVs for a single patient record were considered to have high impact to final classification error, so they were rejected initially. The total amount of remaining MVs were 2.04%, and according to [12] imputation is suitable in order to get classification results close to possible true complete data. The final dataset consists of 44 attributes and 151 patient records.

The dataset with small amount of remaining MVs were imputed by regression based IRMI algorithm using VIMGUI package of R language. After imputing missing values, dataset was prepared for two test cases – classification in two classes and three classes. For 2 class setup 43.71% of records are assigned to class 1 (healthy); to class 2 (impaired carbohydrates metabolism including DM) 56.29% of records. For 3 class problem to class 1 (healthy) there are 43.71% of records assigned, to class 2 (impaired carbohydrates metabolism) – 15.89% of records and to class 3 (DM) – 40.4%.

3.2 Two class test results

Test results of every algorithm are represented in Fig. 2 box plot diagram where statistical distribution of 10 times cross validation is visible. Classification algorithms are compared by mean AUC parameter. Testing of algorithms shows that best performing is Logistic classifier having accuracy of 78.54% and AUC 0.85 with standard error 0.03. The selected number of optimal features is 15 (X1, X5, X7, X9, X11, X12, X13, X16, X17, X23, X,31, X40, X43, X44). The following high performing algorithms are Naïve Bayes, KStar and Hoeffding tree having AUC values of 0.83 with attribute numbers of 13, 13 and 11 respectively.

Figure 2.

Box plot of performance of classification algorithms compared by AUC parameter.

All 21 classification algorithms generated their own optimal set of attributes using wrapper method. RFO of attribute of all tested algorithms indicate the ranking by their importance. Evaluation of attribute appearance frequency indicates most informative ones while the rest – rarely appearing – can be omitted thus saving clinical resources.

Figure 3.

Attribute RFO chart of all tested classification algorithms for two class problem. Axis x values indicate attribute index (see Table 1 for reference).

Figure 4.

Box chart of performance of classification algorithms.

As can be seen in Fig. 3 the most commonly used attributes are X23 – ‘income level’ and X44 – ‘sugar consumption’ attributes that are selected by almost all classification algorithms. The following six are less than twice frequent in all classification algorithms. Further attributes are gradually losing their importance and few attributes (X28, X30, X34 and X39) are not even selected by any classification algorithm. The RFO analysis sorted attributes by their importance allowing to select most informative in order to reduce cost in decision making.

3.3 Three class test results

In the second scenario, the same dataset with three class attribute was tested. In this case the accuracy and AUC values came out lower. The best performing algorithm was Hoeffding tree (Fig. 4) with accuracy 63.57%, AUC $=$ 0.78 with standard error 0.03 and optimal set of attributes equal to 10 (X3, X7, X9, X10, X23, X26, X28, X35, X43 and X44). The second best algorithm came out Logistic with AUC $=$ 0.77 and optimal attribute set of 8.

In the attribute RFO chart (Fig. 5) it can be seen that the leading attribute is X44 – ‘sugar consumption’ which was selected by all tested algorithms. It can be considered one of potential major causes of developing DM type 2. The other attributes appear moderately important and further loose importance Fig 6. In this case several attributes including X11, X13, X14, X15, X25, X29, X40 and X42 were also not selected by any algorithm and can be excluded from further research.

Figure 5.

Attribute RFO chart of all tested classification algorithms for three class problem. Axis x values indicate attribute index (see Table 1 for reference).

3.4 Selecting optimal attribute set from RFO list

In each test case the attributes were sorted according to their RFO. In order to find RFO threshold from which most informative attributes give the best classification result, the voting meta-algorithm was implemented. In meta-algorithm the best classification algorithms from each group were implemented. The final AUC score was calculated by averaging resulting AUC distributions of each classifier [30].

Figure 6.

AUC parameter dependence on RFO threshold of attributes used in meta-algorithm.

In Fig. 6 it can be seen that AUC curves depend on RFO threshold of selected attributes in meta-algorithm for both scenarios. Both AUC curves have peaks at 20% RFO threshold point. In two class problem, the number of selected optimal attributes is 12 and in 3 class case – 19. Towards the lower selected RFO threshold values AUC parameter drops because less informative and probably even misleading attributes are included. Similarly AUC value is dropping down when higher attribute RFO threshold points are used. This happens because at higher values more informative attributes are rejected resulting in higher errors in classification results.

In Fig. 7 it can be seen that the number of most informative attributes depends on different selected RFO thresholds. For instance in class 3 case the AUC values at the RFO thresholds 20% and 30% do not differ significantly with values 0.775 and 0.76 respectively but number of selected attributes differ significantly – from 19 vs 9 attributes. It is worth to consider selecting 30% RFO threshold for better clinical resource saving and cost effective solution. The t-paired statistical test between 20% and 30% RFO threshold points with significance level of 0.05 showed that difference is insignificant with calculated $p=$ 0.14. In this case we can select 30% RFO threshold point with 9 selected attributes.

Classification meta-algorithm results at the optimal RFO threshold points are presented in Table 3.

Table 3

Meta-algorithm results at the optimal attribute subset selection point

Test case	Accuracy, %	Precision	Recall	AUC	STD
2 class	75.85	0.74	0.73	0.82	0.11
3 class	63.77	0.61	0.64	0.76	0.10

Figure 7.

Number of attributes N depending on different RFO thresholds of attributes used in meta-algorithm.

Using meta-algorithm is more objective way of classifying subjects than relying on single best performing standard classifier, because it diversifies results of best algorithms from different groups. It allows minimization of possible specific errors and bias brought by distinct algorithms.

4. Discussion

This work describes the applications of modern data mining approach to solve DM classification/prediction problem for women with previously diagnosed GDM using anamnestic data collected 15 years ago. The MVs of dataset were filled by using regression method. Other imputation methods were not tested because the amount of MVs was not big (2.04%) and according to Cheema correlation between possible original dataset and imputed data is above 0.95 [12].

The dataset was prepared by assigning two and three class attributes: the two class attribute contain healthy patients and DM joined with impaired carbohydrate metabolism.

Voting meta-algorithm combining best algorithms of 21 from five different groups makes classification more objective and not depending on preferences. The RFO analysis of attributes and voting meta-algorithm based optimization helped finding optimal threshold point indicating minimal amount of attributes from RFO ranking list.

For two class scenario the optimal attribute cut off frequency is 20%. The optimal number of attributes is 12 and they are ranked by importance as follows: X23 – ‘income level’, X44 – ‘sugar consumption’, X12 – ‘rs7903146_CC’, X17 – ‘rs12255372_GG’, X43 – ‘desserts’, X5 – ‘year of first given birth’, X7 – ‘weight before pregnancy with GDM’, X9 – ‘weight of first born child’, X26 – ‘physical activity’, X3 – ‘number of given births’, X1 – ‘residence’, and X25 – ‘time needed to get to work’ (Table 1 for reference). Using selected attributes, meta-algorithm produced accuracy 75.85 and AUC $=$ 0.82 with standard error of 0.11. Since two class test results were good, problem was expanded to three class problem using same dataset. In this case the optimal selected attribute RFO threshold is 30% with 9 most informative attributes ranked by importance as follows: X44 – ‘sugar consumption’, X23 – ‘income level’, X3 – ‘number of given births’, X22 – ‘education’, X9 – ‘weight of first born child’, X5 – ‘year of first given birth’, X7 – ‘weight before pregnancy with GDM’, X18 – ‘rs12255372_GT’, and X43 – ‘desserts’ (Table 1 for reference). Meta-algorithm produced accuracy 63.77 and AUC $=$ 0.76 with standard error of 0.1.

In both classification cases the number of attributes were reduced from 44 to 12 (two classes) and 9 (three classes) indicate the potential of saving clinical resources while collecting data.

Table 4
Comparison of the proposed method with similar research

Method	Attributes	Period in years	AUC
Proposed meta-algorithm with RFO	Demographic, the pregnancy related data during GDM diagnosis, clinical, genetic factors, and lifestyle	Over 15	0.82
Univariate logistic regression [31]	Glycated haemoglobin (HbA1c), age, BMI, family history of diabetes	Over 4	0.88 (Caucasian)

We can see a comparison of one similar study that has been found during study in Table 4. The study was mainly focused on DM prognosis in a Caucasian population. The secondary test on a group was performed after 4 years which is small enough compared to our study. Their prognosis included glycated haemoglobin test result which is considered a strong predictor. In our study we have not included any glucose tests, but results of AUC are on the same level. This indicates that even without a control of initial attributes the proposed method provides strong results.

5. Conclusion

This work shows that data mining and computation methods can be effectively adapted to clinical medicine that uses patient specific information to predict outcome of interest. Standard classification algorithms combined into voting meta-algorithm, attribute ranking methods such as RFO could be applied to the construction of predictive models for prognosis of developing DM for women with a GMD diagnosis. Since the proposed approach is quite objective and adaptable, the algorithm could be applied far beyond DM problems, e.g. for other similar classification tasks when datasets have lot of attributes and MVs.

5.1 Limitations and further research

There might be several influences in to objectivity of the results. Instead of the initial rejection of the records with most appearing MVs there could be iterative evaluation process included. There can be situation, when rejected attributes could have significant value to classification and those might be treated differently. There can be more attention payed to data pre-processing stage and more comprehensive analysis of MVs could be performed.

There was an attempt to find the rank of attributes by calculating their RFO. The number of the classification algorithms used can also influence the ranking result. The more objective ranking result could be achieved by including more classification algorithms and their variations.

Ranking value of attributes could also be augmented by introducing attribute cost evaluation. The more expensive clinical test could be more penalized than simple data collection. Thus the economic gain could be achieved by reducing data acquisition cost. With the bigger and more complete data there could be even more classes included to classify in to more cases.

The algorithms in WEKA were used with their default parameters. The initial study showed that default settings work well, so no further parameter tweaking was performed. In order to fine-tune the results, parameter optimization could potentially positively influence the results.

Footnotes

Conflict of interest

None to report.

References

Pons

Rockett

de Almeida Rubin

Oppermann

Bosa

. Risk factors for gestational diabetes mellitus in a sample of pregnant women diagnosed with the disease, Diabetol. Metab. Syndr. 2015; 7(Suppl 1): A80.

Witczak

et al., What is the impact of gestational diabetes mellitus on frequency of structural chromosome aberrations in pregnant women and their offspring, Mutat. Res. Toxicol. Environ. Mutagen. Jun. 2017; 818: 27-30.

Eadara Krishna Murthy

ŽM.

Ivana Pavlic-Renar, DIABETES AND PREGNANCY, Diabetol. Croat. 2002; 31(3): 131-146.

Eades

Cameron

Evans

JMM

. Prevalence of gestational diabetes mellitus in Europe: A meta-analysis, Diabetes Res. Clin. Pract. Jul. 2017; 129: 173-181.

Gestational diabetes, Obstet. Gynaecol. Reprod. Med. 2017; 27(6): 171-176.

Bellamy

Casas

J-P

Hingorani

Williams

. Type 2 diabetes mellitus after gestational diabetes: a systematic review and meta-analysis, Lancet. May 2009; 373(9677): 1773-1779.

A systematic review of interventions for Hispanic women with or at risk of Gestational diabetes mellitus (GDM), Sex. Reprod. Healthc. 2017; 13: 14-22.

Kim

. Maternal outcomes and follow-up after gestational diabetes mellitus. Diabet. Med. Mar. 2014; 31(3), 292-301.

Machine Learning and Data Mining Methods in Diabetes Research, Comput. Struct. Biotechnol. J. Jan. 2017; 15: 104-116.

10.

Luengo

García

Herrera

. On the choice of the best imputation methods for missing values considering three groups of classification methods, Knowl Inf Syst. 2012; 32: 77-108.

11.

Barnard

Meng

X-L

. Applications of multiple imputation in medical studies: from AIDS to NHANES, Stat. Methods Med. Res. Feb. 1999; 8(1): 17-36.

12.

Cheema

. Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research, J. Mod. Appl. Stat. Methods. Nov. 2014; 13(2): 53-75.

13.

Singh

Prasad

. Estimation of Missing Values in the Data Mining and Comparison of Imputation Methods, Math. J. Interdiscip. Sci. 2013; 1(2): 147-162.

14.

Kang

. The prevention and handling of the missing data, Korean J. Anesthesiol. May 2013; 64(5): 402-6.

15.

Kowarik

Templ

. Imputation with the R package VIM, J. Stat. Softw. 2006; 74(7), 1-16.

16.

Guyon

Elisseeff

. An Introduction to Variable and Feature Selection, J. Mach. Learn. Res. 2003; 3: 1157-1182.

17.

Talavera

. An evaluation of filter and wrapper methods for feature selection in categorical clustering, Adv. Intell. Data Anal. VI, 2005; p. 742.

18.

Tang

Alelyani

Liu

. Feature selection for classification: A review Title, Data Classif. Algorithms Appl. CRC Press, 2014; 37-64.

19.

Michalak-Wojnowska

et al., Association between rs7901695 and rs7903146 polymorphisms of the TCF7L2 gene and gestational diabetes in the population of Southern Poland, Ginekol. Pol. Nov. 2016; 87(11): 745-750.

20.

Lin

P-C

Lin

W-T

Yeh

Y-H

Wung

S-F

. Transcription Factor 7-Like 2 (TCF7L2) rs7903146 Polymorphism as a Risk Factor for Gestational Diabetes Mellitus: A Meta-Analysis, PLoS One. Apr. 2016; 11(4): e0153044.

21.

Cauchi

et al., TCF7L2 rs7903146 variant does not associate with smallness for gestational age in the French population, BMC Med. Genet. Dec. 2007; 8(1): 37.

22.

Liu

P-H

et al., Genetic Variants of TCF7L2 Are Associated with Insulin Resistance and Related Metabolic Phenotypes in Taiwanese Adolescents and Caucasian Young Adults, J. Clin. Endocrinol. Metab. Sep. 2009; 94(9): 3575-3582.

23.

Weka 3: Data Mining Software in Java, 2016. [Online]. Available: http://wwwcs.waikato.ac.nz/ml/weka/.

24.

Fernández-Delgado

Cernadas

Barro

Amorim

Amorim Fernández-Delgado

. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J. Mach. Learn. Res. 2014; 15: 3133-3181.

25.

Jordan

. On generative vs. discriminative classifiers: A comparison of logistic regression and naive bayes, Proc. Adv. Neural Inf. Process. 2002; 28(3): 169-187.

26.

(Computer scientist) Zhou

Z-H

. Ensemble methods: foundations and algorithms. Taylor & Francis, 2012.

27.

Marozas

Zykus

Sakalauskas

Kupčinskas

Lukoševičius

. Noninvasive evaluation of portal hypertension using a supervised learning technique, J. Healthc. Eng. 2017; 2017.

28.

Sanfeliu

Lazo Corteés

. Progress in pattern recognition, image analysis and applications: 10th Iberoamerican Congress on Pattern Recognition, CIARP 2005, Havana, Cuba, 15–18 November 2005: proceedings. Springer, 2005.

29.

Swets

. Measuring the accuracy of diagnostic systems, Science. Jun. 1988; 240(4857): 1285-93.

30.

Hill

Miller

. How to combine independent data sets for the same quantity, Chaos An Interdiscip. J. Nonlinear Sci. Sep. 2011; 21(3): 33102.

31.

Oldfield

Donley

Walwyn

Scudamore

Gregory

. Long term prognosis of women with gestational diabetes in a multiethnic population. Postgrad. Med. J. Jun. 2007; 83(980): 426-30.