Abstract
BACKGROUND:
Timely and accurate diagnosis of genetic diseases can lead to proper action and prevention of irreparable events.
OBJECTIVE:
In this work, we propose an integrated genetic-neural network (GNN) to improve risk prediction for trisomy disorders, including Down’s syndrome (T21), Edwards’ syndrome (T18) and Patau’s syndrome (T13).
METHODS:
A dataset of 561 pregnant women was created. In the integrated model, the structure and input parameters of the proposed multilayer feedforward network (MFN) were optimized by a genetic algorithm (GA).
RESULTS:
On the testing dataset, the GNN distinguished anomalies from healthy fetuses with 97.58% accuracy, 99.44% sensitivity and 85.65% specificity. From the candidate functions, the GA selected the Levenberg-Marquardt (LM) training algorithm and the radial basis (Radbas) transfer function. Moreover, the GA selected maternal age, nuchal translucency (NT), crown-rump length (CRL) and pregnancy-associated plasma protein A (PAPP-A) as the most effective factors for distinguishing healthy fetuses from cases with fetal disorders.
CONCLUSION:
The proposed computerized model improves the diagnostic performance of physicians, especially in accurately detecting healthy fetuses with non-invasive and low-cost methods.
Background
Chromosomal anomalies such as Down’s syndrome (T21), Edwards’ syndrome (T18) and Patau’s syndrome (T13) are among the most common fetal anomalies encountered in pregnancy [1, 2, 3, 4, 5]. T21 comprises a set of physical, psychological and functional anomalies, such as a distinctive appearance accompanied by defects in vital organs and systems such as the heart and the immune system [6]. T18, which results from trisomy of chromosome 18, is the second most common autosomal trisomy syndrome, with a prevalence of 1 in 6000 to 1 in 8000 live births [7]. T13 is also a severe clinical condition, resulting from trisomy of chromosome 13; 82% of affected cases do not live more than one month, and 85% do not live more than one year [8].
Trisomy detection methods during pregnancy are divided into two classes: invasive and non-invasive [9]. Invasive methods usually carry higher risk but are more accurate than non-invasive methods. With non-invasive methods, the risk of the disorder is examined first, and if the risk is high, invasive methods are suggested to the pregnant woman for greater confidence. In effect, non-invasive methods determine the risk of disease and the need for invasive methods. Screening is among the usual non-invasive methods for determining the risk of trisomy during pregnancy.
Maternal age, the difference between paired gonadotropin concentrations and pregnancy-associated plasma protein A (PAPP-A) during week 16 of pregnancy, nuchal translucency (NT) and crown-rump length (CRL) obtained by fetal sonography in weeks 11 to 14 of pregnancy, and sampling of fetal DNA from the mother’s blood are used to assess the risk of these syndromes [10]. Although non-invasive methods carry a low risk of intervention, they have a diagnostic error in determining the risk of disease, so invasive methods such as amniocentesis are used in specialist clinical laboratories as a more effective way to examine the health of the pregnancy [11]. The condition of the fetus can be determined by examining the karyotype of fetal DNA in the amniotic fluid. This type of diagnosis is usually performed in the second trimester of pregnancy. Its advantage is high accuracy (over 99%); its disadvantages are leakage of amniotic fluid, miscarriage, infection, insufficient samples and the risk of damage to the fetus. The reported miscarriage rate is between 0.05 and 1 percent [12]. High cost and the long wait for results are further disadvantages of these methods. Moreover, because definitive diagnostic methods for genetic anomalies are expensive, in many cases couples who could not afford them decided to terminate the pregnancy based on the first-trimester screening results alone. This, in turn, can reduce the fertility rate and accelerate population aging. Predictions show that Iran will rank third in the world in population aging rate, after the UAE and Bahrain [13].
In addition, pregnant women’s anxiety about visiting screening centers during pandemics such as COVID-19 has reduced their attendance for screening tests. This has increased the risk that fetal genetic diseases go undiagnosed [14, 15, 16].
The application of artificial intelligence in medicine and human health, one of the most essential concerns in everyone’s life, has grown considerably in recent decades [17]. Thanks to artificial intelligence and machine learning methods, stronger decision support systems, more accurate diagnostic tools and even robotic surgery are emerging. Artificial intelligence uses learning algorithms as computational techniques to enhance performance or generate accurate predictions [18, 19]. Researchers now regard artificial neural networks (ANNs), which use machine learning techniques to solve difficult problems in medicine and health, as an accurate tool for diagnosis and classification [20]. Akbarian et al. obtained an average accuracy of 90.9% using ANNs with 45 variables affecting the pregnancy outcomes of mothers with lupus, tested on just 149 pregnant women [21].
Katlan et al. used an ANN model to examine 97 pregnant women considered high-risk in screening tests for fetal health [22]. The feed-forward neural network is also a popular ANN that has been used to classify and diagnose diseases in several studies [23, 24, 25]. These studies differ mainly in the type of ANN, the number of layers and neurons, and the network architecture.
An ANN’s success is determined by its design and architecture. Typically, ANN optimization is performed by manually tuning the variables that define the network’s structure [26]. There is no clear and accurate guideline for ANN optimization, which is often accomplished through repeated trial and error. This is generally difficult and time-consuming, and it does not always produce the intended outcome [26]. In recent years, there has been a surge in the use of ANN design optimization techniques, particularly genetic algorithms (GAs) and hybrid models [27, 28, 29, 30, 31, 32]. These optimization algorithms fall into two types: techniques based on reinforcement learning (RL) and methods based on evolutionary algorithms (EA). In RL-based techniques, choosing a component of the ANN design is called an action, and a sequence of actions builds an ANN. The accuracy of the constructed ANN is then used as the performance metric. In EA-based techniques, the search for the optimum architecture proceeds by mutating and recombining components of the ANN architecture [31]. The designs with the best performance are selected and used to continue the evolutionary process.
GA is an evolutionary algorithm widely used to optimize machine learning systems; it provides a near-optimal solution through a random search technique [32, 33]. The algorithm models chromosome reproduction, recombination and mutation in a process similar to what occurs in meiosis. To solve a problem, an initial set of solutions, called a population, is created randomly. Each solution is considered a chromosome. Each time the recombination and mutation cycle is repeated, the chromosomes form a new generation, and the fitness of each generation of chromosomes is evaluated. Chromosomes whose phenotypes perform best in the ANN become the parents of the next generation. Thus the population evolves over successive generations. After several generations, the population converges to the best chromosome, which provides the desired ANN architecture [34, 35]. In this study, an intelligent model was built by combining an ANN with GA optimization to identify the risk of T13, T18 and T21 during the first trimester of pregnancy.
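The generational cycle described above (random initial population, fitness evaluation, parent selection, recombination and mutation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `run_ga`, the integer gene coding, single-point crossover, elitism, the parameter values and the toy fitness function are all hypothetical. In the setting of this paper, the fitness would instead be the classification performance of the ANN built from each chromosome.

```python
import random

def run_ga(fitness, gene_ranges, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Maximize `fitness` over integer chromosomes; gene i lies in [0, gene_ranges[i])."""
    rng = random.Random(seed)
    # Initial population: randomly generated chromosomes.
    population = [[rng.randrange(r) for r in gene_ranges] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank chromosomes by fitness; the better half become parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        # Elitism (an assumption): carry the best chromosome over unchanged.
        children = [ranked[0][:]]
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Recombination: single-point crossover.
            cut = rng.randrange(1, len(gene_ranges))
            child = mother[:cut] + father[cut:]
            # Mutation: resample each gene with a small probability.
            child = [rng.randrange(r) if rng.random() < mutation_rate else g
                     for g, r in zip(child, gene_ranges)]
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy fitness (hypothetical): closeness to an arbitrary target chromosome.
target = [3, 7, 1, 0, 5]
def toy_fitness(chromosome):
    return -sum(abs(a - b) for a, b in zip(chromosome, target))

gene_ranges = [10] * len(target)
initial_best = run_ga(toy_fitness, gene_ranges, generations=0)
best = run_ga(toy_fitness, gene_ranges, generations=30)
```

Because the best chromosome is carried over unchanged each generation, the best fitness in the population can never decrease from one generation to the next.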
Method
The current research was carried out in three phases. First, the research dataset was generated and the variables useful for detecting prenatal anomalies were identified. The intelligent model was then developed to predict the genetic anomalies, and finally the constructed model was evaluated. To assess the reliability of the model’s performance, the proposed GNN model was run 10 times on the datasets, and the average results are reported for every experiment.
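The evaluation step, in which metrics are averaged over repeated runs, can be illustrated with a short sketch. It assumes a binary labelling in which 1 marks an anomaly and 0 a healthy fetus; which class is treated as positive is not stated in this section, so that choice, the function names and the example labels are assumptions made for illustration only. The sketch also assumes both classes occur in every run, since otherwise sensitivity or specificity is undefined.

```python
from statistics import mean

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives correctly detected
        "specificity": tn / (tn + fp),  # share of negatives correctly detected
    }

def average_over_runs(runs):
    """Average each metric over a list of (y_true, y_pred) pairs, one pair per run."""
    per_run = [binary_metrics(t, p) for t, p in runs]
    return {name: mean(m[name] for m in per_run) for name in per_run[0]}

# Two hypothetical runs on the same eight test cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
run_a = [1, 1, 0, 0, 0, 0, 0, 1]   # one missed positive, one false alarm
run_b = list(y_true)               # a perfect run
single = binary_metrics(y_true, run_a)
averaged = average_over_runs([(y_true, run_a), (y_true, run_b)])
```

With 10 runs, as in this study, `runs` would hold 10 such pairs.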
Dataset
The dataset in this study consists of pregnant women aged 15 to 45 years who were referred to the Genetics Laboratory Center in Ahvaz, a provincial capital in southwest Iran, for screening tests between the 11th and 14th weeks of pregnancy. The required information on 561 pregnant women was extracted from files in the center’s archives, including maternal age, race, smoking during pregnancy, maternal diabetes mellitus, maternal preeclampsia, fetal heart rate, crown-rump length (CRL), fetal nasal bone, human chorionic gonadotropin (hCG), and pregnancy-associated plasma protein A (PAPP-A). Pregnancy outcomes and fetal health status were also retrieved from the mothers’ files. Data were collected from February 2018 to July 2019. The data included the initial results of the non-invasive prenatal testing (NIPT) analysis, which, according to the International Society of Prenatal Diagnosis (ISPD) protocol, were graded on three scales: low, moderate and high risk. For moderate- and high-risk cases, pregnancy outcomes and fetal health status were followed into the second trimester of pregnancy, when additional tests were performed.
The selection of effective diagnostic factors
After extracting the diagnostic factors from the scientific resource, and in consultation with the obstetrician, maternal age, maternal weight, smoking, pre-eclampsia, fetal heart rate, CRL, NT, free beta placental gonadotropin (Free
Intelligent neuro-genetic integrated model development
In this study, among several machine learning methods, an integrated ANN system was developed as the classification model for identifying three prenatal anomalies. Several studies have demonstrated that ANNs outperform other models in medical diagnosis in the majority of situations [36, 37, 38, 39, 40].
ANN
The proposed ANN model in this study is a multilayer feedforward network (MFN), a kind of supervised ANN. The MFN has an input layer, an output layer, and multiple layers of processing neurons (middle layers) in which information moves in one direction, from the input layer neurons to the output layer neurons. In other words, the output of one layer becomes the input for the next. A transfer function (activation function) is necessary in ANNs to transform input signals into output signals. The transfer function governs how neurons work. There are various transfer functions for activating ANN neurons, each with its own characteristics, such as whether it is linear or nonlinear [39].
Every connection in the ANN is associated with a weight. The output of the ith neuron in the middle layer is obtained from the following formula [41]:
Where
In this work, to find the most appropriate transfer function for the input and middle layers, 10 types of transfer functions were considered (Table 1). To create an optimal ANN, the appropriate transfer function for each layer was selected by a GA.
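How a layer of neurons turns weighted inputs into outputs through a transfer function can be sketched as follows. The sketch assumes the weighted-sum-plus-bias form commonly used for feedforward networks and takes the radial basis function as exp(-n²); the notation, the function names and the example weights are assumptions for illustration and are not taken from [41].

```python
import math

def radbas(n):
    """Radial basis transfer function, taken here as exp(-n^2) (an assumption)."""
    return math.exp(-n * n)

def layer_output(inputs, weights, biases, transfer):
    """Output of each neuron i: transfer(sum_j weights[i][j] * inputs[j] + biases[i])."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical layer with two inputs and two neurons.
inputs = [1.0, 2.0]
weights = [[0.5, -0.25],   # neuron 0: 0.5*1.0 - 0.25*2.0 = 0.0
           [1.0, 1.0]]     # neuron 1: 1.0*1.0 + 1.0*2.0 = 3.0
biases = [0.0, -2.0]       # net inputs become 0.0 and 1.0
outputs = layer_output(inputs, weights, biases, radbas)
```

Feeding `outputs` into a further call to `layer_output` with another weight matrix reproduces the layer-to-layer flow of a feedforward network; swapping `radbas` for another callable changes the transfer function without touching the rest of the computation.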
As mentioned, this network has multiple hidden layers. The number of neurons in each of these layers was set between 1 and 20. To train the ANN, 10 types of training functions and 15 types of learning functions were considered (Table 1). To create the optimal ANN structure, the training function and the learning function were optimized by the GA.
The ANN output distinguishes healthy cases from cases with the genetic disorders T21, T18, and T13.
The parameters that define the ANN structure and the input factors were encoded in haploid chromosomes whose genes include the number of neurons in the hidden layers, the transfer functions, the training function, the learning function and the presence or absence of each of the seven ANN input factors. Figure 1 shows the structure of the GA and its operation to optimize the ANN prediction performance in the GNN model. Furthermore, Table 1 shows the transfer and training functions for the hidden layers and the output layer to be used by the GA in the integrated GNN model.
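One way to picture this encoding is a decoder that maps a flat list of integer genes to an ANN configuration. The gene order, the use of two hidden layers, the modulo mapping and every name below are assumptions made for illustration; the text specifies only the gene types and their ranges (1 to 20 neurons, 10 transfer functions, 10 training functions, 15 learning functions and seven input factors). Functions and input factors are referred to by index because their names are not listed here.

```python
from dataclasses import dataclass

N_TRANSFER, N_TRAINING, N_LEARNING, N_INPUTS = 10, 10, 15, 7
N_HIDDEN_LAYERS = 2  # assumption: the number of hidden layers is not stated in the text

@dataclass
class AnnConfig:
    hidden_neurons: list   # neurons per hidden layer, each in 1..20
    transfer_fn: list      # transfer-function index per hidden layer, 0..9
    training_fn: int       # training-function index, 0..9
    learning_fn: int       # learning-function index, 0..14
    selected_inputs: list  # indices of the input factors that are switched on

def decode(chromosome):
    """Map a flat list of non-negative integer genes to an ANN configuration."""
    h = N_HIDDEN_LAYERS
    expected = 2 * h + 2 + N_INPUTS
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    mask = chromosome[2 * h + 2:]  # one presence/absence gene per input factor
    return AnnConfig(
        hidden_neurons=[1 + g % 20 for g in chromosome[:h]],
        transfer_fn=[g % N_TRANSFER for g in chromosome[h:2 * h]],
        training_fn=chromosome[2 * h] % N_TRAINING,
        learning_fn=chromosome[2 * h + 1] % N_LEARNING,
        selected_inputs=[i for i, bit in enumerate(mask) if bit % 2 == 1],
    )

# Hypothetical chromosome: 2 neuron genes, 2 transfer genes,
# 1 training gene, 1 learning gene, 7 input-mask genes.
config = decode([4, 19, 3, 9, 0, 14, 1, 0, 1, 1, 0, 0, 1])
```

A GA would evaluate each decoded configuration by training the corresponding ANN and using its classification performance as the fitness of the chromosome.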
Table 1. ANN training and transfer functions used in the GA optimization process
Footnotes
Acknowledgments
The authors would like to thank the Fertility, Infertility and Perinatology Research Center of Ahvaz Jundishapur University of Medical Sciences and the vice chancellor of Research Affairs of Ahvaz Jundishapur University of Medical Sciences for their financial and administrative support to undertake this project. Research project No: FIRC-9709. Research ethics code: IR.AJUMS.REC.1397.826.
Conflict of interest
None to report.




