Abstract
Fetal electrocardiography (FECG) monitoring is essential for assessing fetal health and making timely decisions throughout pregnancy. Large electrocardiogram datasets are freely accessible from PhysioNet ATM: Dataset 1, the Abdominal and Direct Fetal ECG Database (adfecgdb); Dataset 2, the Fetal ECG Synthetic Database (fecgsyndb); and Dataset 3, the Non-Invasive Fetal ECG Database (nifecgdb). In this study, FECG recordings from these three online datasets are classified as normal or abnormal (atrial fibrillation). Two deep learning models, a Convolutional Neural Network (CNN) and Transfer Learning (TL), are investigated. The FECG is separated from the composite abdominal signal using a wavelet transform approach. The best model for classifying the FECG is determined through a comparative analysis, and performance is improved using the Continuous Wavelet Transform (CWT). The CNN-based technique achieves an accuracy of 98.59%, whereas the transfer learning model achieves 99.01% for FECG classification. Metric parameters are computed for all the datasets. The TL model classifies normal and abnormal (atrial fibrillation) recordings better than the CNN. Real-time data are analyzed for PQRST plotting, and a comparative study using Net Reclassification Improvement (NRI) gives NRI = 13%, a z statistic of 3.7641, and a p-value of 0.00016721. Acute Myocardial Infarction (AMI) is identified from the ST segment of Maternal ECG (MECG) images to analyze the risk of heart attack. Because of its end-to-end properties and its intrinsic expandability to the diagnosis of multi-lead heart disorders, the proposed work can be used to track FECG waveforms in real time in wearable technology.
Keywords
Introduction
Heart defects are a significant factor in prenatal stillbirths globally, and perinatal complications account for over 30% of all perinatal and maternal fatalities. As a result, it is important to monitor the fetal heart rate (FHR) both during pregnancy and during delivery. Electrocardiogram (ECG) signal analysis has long been recognized as a fairly reliable method for identifying heart diseases [1, 2]. In principle, an ECG taken from the mother’s abdomen during pregnancy can be used to track the electrical activity of the growing fetus’s heart. Cardiotocography (CTG) can detect FHR and uterine contractions. During the perinatal phase, doctors analyze changes in heart rhythm, respiratory rate, and the fetal electrocardiogram to track the growth and development of the fetus. The FECG displays the activity of the heart as the fetus develops. Anomalies during fetal development, such as fetal distress and intrauterine hypoxia [3, 4], may be identified through FECG waveform analysis. Since CTG is susceptible to signal degradation and only offers an estimate of the FHR, it cannot be used for an extended period of time. The Maternal ECG (MECG) makes up the majority of the signal captured at the chest leads. Although its energy is frequently lower than that of the MECG, the ECG signal at the abdominal leads, obtained in the later months of pregnancy, is known to carry information on the FECG.
Motivation
The aim is to achieve high performance, measured by accuracy and other metrics, from deep learning models applied to large datasets, and to process the massive data volumes of biomedical imaging so that the results help clinics make the proper diagnoses.
Contributions
The contribution is to identify the deep learning model that offers the fastest performance and the best analysis. The model can be used in hospitals to give doctors immediate results for treatment decisions.
Currently, there are two methods for acquiring an FECG. The invasive scalp electrode technique tracks the pure FECG signal directly. It can only acquire FECG signals at birth and is obtrusive, so it might be detrimental to both the mother and the fetus. The non-invasive abdominal electrode method is a more recent technique [5, 6]. An electrode patch applied to the mother’s abdomen collects signals from the abdominal body surface, which makes long-term pregnancy monitoring possible.
The abdominal recordings contain a variety of interferences and noises, including the MECG, power line noise, baseline wander, fetal and maternal muscle noise, and movement artefacts. The recovered FECG signals frequently have an extremely poor signal-to-noise ratio (SNR) because the waveforms of some of these interferences overlap with the FECG in both time and frequency. FECG extraction methods commonly share three stages: preprocessing, separation, and post processing. Preprocessing eliminates unwanted noise, including baseline wander and power line interference. In the separation stage, the FECG is obtained by subtracting the estimated maternal ECG from the signals. Finally, post processing is used to improve the quality of the retrieved fetal ECG signals [7]. Several studies have proposed methods for improving the FECG signal quality in order to increase the SNR. However, work on non-invasive FECG analysis has mostly concentrated on the first two stages and on the advancement of the acquisition equipment, and only a few studies have focused on post processing of the acquired signals [8]. A common technique for increasing the SNR of the obtained signals is beat-to-beat averaging, which has the drawback of eliminating small variations in pulse shape.
Ultrasound is not a passive technique, and the ultrasound transducer must be repositioned periodically. Therefore, a practical, safe, and comfortable long-term fetal monitoring system is required to monitor the fetus’s health throughout daily activities. Extracting the fetal ECG is challenging because the maternal ECG measured at the abdomen is 10 times larger than the fetal ECG [9, 10]. It is therefore essential to create a non-invasive technique that can successfully extract the FECG. Convolutional neural networks (CNNs) [11, 12] and Transfer Learning (TL) [13, 14] are examples of deep learning techniques that have been widely used in recent years to reduce noise in signals and images [15, 16]. These techniques have also been widely used to detect arrhythmia [17], remove noise from adult ECGs, and extract FECGs. This paper proposes the use of deep learning models, CNN and TL, to classify FECG waveforms based on a time-frequency representation [18].
Literature survey
KwangJinLee [19] et al., proposed a novel method using an end-to-end deep learning network architecture based on W-net that successfully separates a single-channel maternal abdominal ECG into a MECG and FECG. To train the model, a simulation dataset is utilized. After that, a realistic maternal abdominal ECG is used to create a FECG. On the basis of the identification of QRS complexes, the suggested architecture’s performance is evaluated against a number of cutting-edge deep learning models. Higher precision, recall, and F1 scores are reported by the suggested model. These results show that the suggested approach can successfully produce an ECG for the foetus from a mother’s single-channel abdominal ECG. Continuous maternal and fetal monitoring should be supported by commercial applications.
Yuwei Zhang [20] et al., suggested a portable ECG monitoring device that might be used to monitor FHR at home. The system captures a pregnant woman’s abdominal ECG (AECG), from which the maternal ECG and the FECG are recorded. The data collection circuits, data transfer module, and pattern recognition platform that support the ECG monitoring system all have low input-referred noise, high input impedance, and high resolution. The adaptive dual threshold (ADT) and independent component analysis (ICA) techniques are used to distinguish the FECG from the AECG. The results show that the proposed method can accurately record the AECG with good signal quality, providing FECG and heart rate data in a range of locations. The average Se, PPV, ACC, and F1 scores for the fQRS complexes extraction are 99.62%, 97.90%, 97.40%, and 98.66%, respectively. This research demonstrates the suggested system’s potential for use in fetal health monitoring.
Fabian Braun [21] et al., tested a wearable system based on cooperative sensors from CSEM, a flexible technology that enables the detection of numerous biosignals and simple integration into a garment or patch. The method was tested on 25 women with singleton pregnancies and gestational ages of less than 37 weeks. Inaccurate estimates of the FHR can be discarded using the signal quality index of the signal processing approach. With a mean absolute error of less than 5 bpm and an acceptance rate of more than 70%, the FHR estimates were excellent in 12 out of the 21 patients who were available for the trial. The remaining 9 patients, on the other hand, had poor acceptance rates and high error rates.
Yu-Ching Ting [22] et al., developed a method using a convolutional neural network (CNN) to identify a fetus’s ECG from an abdominal ECG data. Pre-processing and classification are the two stages of the flow. During pre-processing, the spectrogram is produced using the short-time Fourier transform, and it is then classified using a 2D CNN. With a high detection accuracy of up to 95.2%, the CNN-based technique outperforms the conventional algorithm after combining categorised data from numerous channels. The 2D CNN classifier and spectrogram processor make up the hardware of the FECG detector, which is implemented on an FPGA platform.
Li Yuan [23] et al., created a lightweight, portable device for collecting FECG data that continually captures signals from the mother’s abdomen. A smartphone client received the ECG data over Bluetooth. Smartphone app software was developed based on the Android operating system. The software combines the sample entropy technique and the fast fixed-point algorithm for independent component analysis (FastICA) to swiftly extract the FECG signals from the mother abdominal ECG data. Using the FECG signals that were recovered, the FHR was calculated. According to experimental findings, the FastICA algorithm can clearly extract a fetal ECG, and the sample entropy can properly identify the channel in which the FECG is present. The suggested real-time, non-invasive FECG monitoring equipment could be useful.
Eleni Fotiadou [24] et al., proposed a deep convolutional encoder-decoder network with symmetric skip-layer connections for learning end-to-end mappings from noisy FECG signals to clean ones. Experiments on simulated data revealed an average improvement in signal-to-noise ratio (SNR) of 9.5 dB for FECG signals with input SNR varying from -20 to 20 dB. The approach was also assessed on a sizable collection of real signals, showing that it may significantly enhance the quality of noisy fetal ECG signals. The network was shown to perform better and more reliably when using multi-channel signal information than a single-channel network. The suggested approach does not require prior knowledge of the noise or pulse locations and can withstand beat-to-beat morphological fluctuations.
Zhong W [25] et al., demonstrated an innovative technique for retrieving FECG data from single-channel AECG recordings using a deep learning algorithm. A residual convolutional encoder-decoder network (RCED-Net) was designed for FECG extraction. The single-channel AECG recording serves as the input to the RCED-Net. After extracting the AECG features, the RCED-Net directly provides an estimate of the FECG component in the AECG recording. The effectiveness of the proposed approach is demonstrated on AECG recordings from two separate databases. The suggested strategy outperforms the other methods mentioned in the literature by a wide margin. According to this analysis, the suggested method can successfully isolate the FECG component from AECG recordings. The emphasis on single-channel FECG extraction techniques is conducive to commercial applications for long-term fetal monitoring. healthy foetus, 300 participants were involved in the study. This assessment looks at the foetal heart axis, section intervals, standardized amplitude, and FHR. A questionnaire is used to determine whether a newborn is healthy.
Most previous works based on CNN and TL deep learning models did not plot the PQRST waveform and discussed only the best performance. In this paper, the plotting of the PQRST waveform for a real-time dataset is verified using the deep learning models under study.
Database specifications
The databases used in this study are described in detail here.
-Dataset1- Abdominal and Direct Fetal ECG Database (adfecgdb)
The recordings were obtained from women in labour between weeks 38 and 41 of pregnancy. Four signals were gathered from the mother’s abdomen, and a direct electrocardiogram was recorded simultaneously from the fetal head. 7000 sample points in total were recorded. The electrodes were in the same location during every recording. Ag-AgCl electrodes were used, and abrasive material was applied to improve skin conductivity. The bandwidth is 1 to 150 Hz. The signals are sampled at 1 kHz with 16-bit resolution, with additional digital filtering to remove power-line interference (50 Hz) and baseline drift.
-Dataset2- Fetal ECG Synthetic Database (fecgsyndb)
The FECGSYNDB is a large library of simulated non-invasive FECG (NI-FECG) signals containing adult and fetal ECG components, and it provides a trustworthy, reproducible resource to support research in the field. 5000 sample points in total were counted. The bandwidth is 500 Hz. The data are produced with the FECGSYN simulator. The simulated cases are numbered from 0 to 5: case 0 is baseline with noise, case 1 is fetal movement with noise, case 2 is acceleration and deceleration of the MHR/FHR, case 3 is uterine contraction, and case 4 is ectopic beats.
-Dataset3- Non-Invasive Fetal ECG Database (nifecgdb)
A single subject provided the 55 multichannel abdominal non-invasive FECG recordings in this database between weeks 21 and 40 of pregnancy. The recordings were taken weekly and vary in duration. 15,000 sample points were recorded in total. These recordings may prove to be quite helpful when testing signal separation methods. Each recording contains two thoracic signals and three or four abdominal signals, with the electrode positions varied to improve the SNR. The signals have a bandwidth of 0.01 Hz to 100 Hz, a 50 Hz notch filter, a sampling rate of 1 kHz, and 16-bit resolution.
Data processing
The data are prepared for training and testing at this step. The data are first segmented using an altered data store and the resize data helper function. Next, the 1-D FECG signals are converted into 2-D colored scalogram images using the Continuous Wavelet Transform (CWT). As preprocessing, the data are normalized using the mean-standard-deviation method [26]. The mean of the training data is subtracted from all subsets of the data, including the test, validation, and training subsets. The result is then divided by the standard deviation of the training data [27, 28]. The resulting training data have a mean of zero and a standard deviation of one. The deep neural network is trained for 100 iterations of the Adam optimizer with a learning rate of 0.003 and a decay rate of 10⁻⁵.
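The normalization step can be illustrated with a minimal NumPy sketch. The array shapes, the random stand-in data, and the helper names `fit_standardizer` and `standardize` are assumptions made for illustration and do not come from the original pipeline.

```python
import numpy as np

def fit_standardizer(train):
    """Return the mean and standard deviation of the training data only."""
    return train.mean(), train.std()

def standardize(x, mean, std, eps=1e-8):
    """Subtract the training mean, then divide by the training standard deviation."""
    return (x - mean) / (std + eps)

# Hypothetical stand-in data: 100 training and 20 test segments of 500 samples.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 500))
test = rng.normal(loc=5.0, scale=3.0, size=(20, 500))

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)   # the test set reuses the training statistics
```

Only the training subset is exactly zero-mean and unit-variance after this step; the validation and test subsets are close to it when they follow the same distribution.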
Data segmentation
Each recording contains 65,536 samples. Each recording is partitioned into ten segments of 500 samples, and the remaining samples are discarded [29, 30]. There are 900 recordings, each of which is broken into ten chunks of 500 samples [31].
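A short sketch of this segmentation, assuming each recording is a 1-D NumPy array; the function name `segment_recording` is hypothetical.

```python
import numpy as np

def segment_recording(signal, segment_len=500, n_segments=10):
    """Keep the first n_segments * segment_len samples and discard the remainder."""
    needed = segment_len * n_segments
    if signal.shape[0] < needed:
        raise ValueError("recording is shorter than the requested segments")
    return signal[:needed].reshape(n_segments, segment_len)

# Stand-in recording of 65,536 samples (a ramp, so segment boundaries are easy to check).
recording = np.arange(65536, dtype=float)
segments = segment_recording(recording)
```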
Methodology
Two deep learning models, CNN and Transfer Learning, are compared on the FECG datasets. The workflow involves preprocessing, augmentation, normalization, validation, and evaluation steps, as shown in Fig. 1.

The Proposed flow chart for finding Metrics.
Convolutional neural networks, a particular subtype of deep neural networks, are used for processing input that has a grid-like structure. A CNN can handle data such as time-series data (1D grid) and image data (2D grid). This model extracts the characteristics of the data through sparse interactions, parameter sharing, and equivariant representations. For this reason, CNNs have been used in many different domains, including fault diagnosis, bioelectrical signal analysis, and image processing. The CNN model in this paper is built from convolutional, max pooling, and dense layers. Each convolutional layer is followed by a batch normalization layer and then a max pooling layer. Dense layers are used to enhance the accuracy. The CNN uses 5 filters with a stride of 1. For multi-class output, the final layer uses the softmax function for classification. The data field is a 162 by 65536 matrix in which each row is one FECG recording; the sampling rate varies from one dataset to another. The Labels field is a 162 by 1 cell array containing one diagnostic label for each row of the data. Data augmentation techniques are used to enlarge the set of measured FECG signals, and cross validation is performed. With the exception of the final layer, each layer uses a Rectified Linear Unit (ReLU) activation function to obtain better performance. This function is linear for positive inputs and zero for negative inputs. The last layer uses a softmax activation function to predict the probability of each class and outputs the class with the highest probability. The model was optimised using Adaptive Moment Estimation (Adam), and the error was calculated using a categorical cross-entropy loss function. Max pooling is applied to each feature map to retain the highest value in each pooling window. A fully connected feed-forward neural network produces the output from the final pooling layer.
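The three building blocks named in this paragraph (ReLU, max pooling, and softmax) can be written in a few lines of NumPy. This is a sketch of the operations themselves, not of the paper's network; the input values and the pool size of 2 are arbitrary assumptions.

```python
import numpy as np

def relu(x):
    """Linear for positive inputs, zero otherwise."""
    return np.maximum(x, 0.0)

def max_pool_1d(x, pool=2):
    """Keep the largest value in each non-overlapping window of `pool` samples."""
    usable = (x.shape[0] // pool) * pool
    return x[:usable].reshape(-1, pool).max(axis=1)

def softmax(logits):
    """Turn raw scores into class probabilities that sum to one."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

features = relu(np.array([-2.0, 0.5, 3.0, -1.0, 4.0, 1.0]))
pooled = max_pool_1d(features, pool=2)
probs = softmax(np.array([2.0, 0.5]))        # two classes, e.g. normal and abnormal
predicted_class = int(np.argmax(probs))      # class with the highest probability
```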
The Adam optimizer combines the adaptive learning rates of RMSprop with momentum-based stochastic gradient descent. The model architecture is shown in Fig. 2.
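As a rough illustration of how Adam combines a momentum term with an RMSprop-style running average of squared gradients, the sketch below applies the standard Adam update to a toy quadratic. The learning rate of 0.003 is taken from the text; the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are an assumption, as is the toy objective.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad           # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # RMSprop part: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective f(w) = ||w||^2 with gradient 2w; the minimum is at the origin.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```

On the first step the update has magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is relatively insensitive to the scale of the loss.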

The Proposed 2D CNN model architecture used for FECG.
An ablation study is carried out for the 1D CNN and the 2D CNN, each with 5 filters and a stride of 1.
The hyperparameters of the 2D CNN are provided in Table 1. The learning rate for training is 0.003, chosen as the value that best suits the accuracy of the model.
Hyper parameters and its values used in the proposed 2D CNN model
Baseline drift, electrode contact noise, electromyographic noise, motion artefacts, power line interference, and instrument noise are the disturbances that frequently interfere with FECG outcomes. These noises may result in a wrong classification or an inaccurate diagnosis. Denoising the original FECG signal using a wavelet transform technique, which is suitable for all frequency ranges, increases accuracy in both the time and frequency domains. The CWT essentially maps the signals into a time-scale domain, which also makes the frequency components of the signals more accessible for study.
Where τ is the scale variable represents the translation variable. After translation and scaling, the wavelet transform is performed, and in addition to identifying the frequency components of the signal, it also examines the precise locations of various frequencies throughout time. Wavelet transform, in contrast to Fourier transform, provides a movable time-frequency window, allowing dynamic adjustment of its scale. For the duration of the finite support window, it is assumed that the signal remains steady in order to explain such signals. The following formulas demonstrate the conversion of a 1D signal into 2D (224 x 224) data.
Where N represents the window length and x[i] denotes the input signal. The log values of YCWT[a,b] form the spectrogram (224×224) data. Several wavelet families are used mainly to enhance the accuracy of the FECG signals: frequency B-spline wavelets (fbsp), complex Morlet wavelets (cmor), the Mexican hat wavelet (mexh), Daubechies wavelets (db), biorthogonal wavelets (bior), and symlets (sym) [32].
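To make the 1D-to-2D conversion concrete, the following NumPy-only sketch computes a CWT magnitude with the Mexican hat wavelet, takes its logarithm, and samples it to a 224×224 array. It is an illustration under stated assumptions rather than the paper's implementation: the scale range, the 10 Hz test sinusoid, the truncation of the wavelet at ±5 scale units, and the helper names `mexican_hat`, `cwt_scalogram`, and `to_image` are all hypothetical. The wavelet itself is the standard unit-energy form, ψ(t) = 2/(√3 π^(1/4)) (1 − t²) exp(−t²/2).

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, normalized to unit energy."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt_scalogram(x, scales, dt=1.0):
    """Return |CWT| with one row per scale and one column per sample."""
    n = x.shape[0]
    out = np.empty((len(scales), n))
    for k, a in enumerate(scales):
        # Sample the dilated wavelet on +/- 5 scale units, capped at the signal length.
        half = min(int(np.ceil(5.0 * a / dt)), (n - 1) // 2)
        t = np.arange(-half, half + 1) * dt
        psi = mexican_hat(t / a) / np.sqrt(a)
        # For a real, symmetric wavelet, correlation equals convolution.
        out[k] = np.abs(np.convolve(x, psi, mode="same")) * dt
    return out

def to_image(scalogram, size=224, eps=1e-12):
    """Sample the time axis to `size` columns and take the log magnitude."""
    cols = np.linspace(0, scalogram.shape[1] - 1, size).round().astype(int)
    return np.log(scalogram[:, cols] + eps)

fs = 250.0
t = np.arange(500) / fs
x = np.sin(2 * np.pi * 10.0 * t)          # stand-in for one 500-sample FECG segment
scales = np.geomspace(1.0, 32.0, 224)     # 224 scales, measured in samples
image = to_image(cwt_scalogram(x, scales))
```

A colored scalogram image would then be obtained by mapping these log values through a colormap, a step that is omitted here.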
The fbsp function is described as
Where n is an integer order parameter, f_a is a bandwidth parameter, and f_b is the wavelet center frequency.
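For reference, the conventional definition of the frequency B-spline wavelet, written with the symbols defined above, is sketched below. This assumes that the paper follows the standard form used in common wavelet toolboxes, with sinc(x) = sin(πx)/(πx); the original equation is not reproduced in this text.

```latex
\psi(t) = \sqrt{f_a}\,
          \left[\operatorname{sinc}\!\left(\frac{f_a\, t}{n}\right)\right]^{n}
          e^{\,2\pi i f_b t}
```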
The Mexh function is described as
Where
The wavelets that have no closed-form mathematical expression are the Daubechies wavelets (db), biorthogonal wavelets (bior), and symlets (sym).
The wavelet functions are related to scaling functions shown in Equations (6), (7).
An optimal scale is decided and given as
The Signal to Noise Ratio (SNR) is calculated as,
Where sAnoise is the sum of the amplitude disturbances of the FECG signals and sPRnoise is the sum of the amplitudes of the R-peak signals. Two test-related parameters, dilation and thrhd, should remain constant at 0.1 and 0.4, respectively, when performing peak detection. The loop value equals 10 for Database 1, 25 for Database 2, and 30 for Database 3 [33].
K value for different wavelet transform
The difference between the targets and the estimated labels is represented by the cost function. It uses the optimizer function to minimize the gap. The cross-entropy function is the most frequently employed cost function in the neural network.
Where Cfn stands for the cost function to be minimized. The target value is xc, and c is the class index. M is the number of possible classes, and b stands for the real value. Gradient descent with a learning rate of 0.003 is used.
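A minimal sketch of the categorical cross-entropy in its usual form, the negative sum over classes of the target times the log of the predicted probability, averaged over samples. The variable names `targets` and `probs` and the example values are assumptions; they are not the paper's notation.

```python
import numpy as np

def categorical_cross_entropy(targets, probs, eps=1e-12):
    """Mean over samples of -sum_c target_c * log(predicted_c)."""
    probs = np.clip(probs, eps, 1.0)   # avoid log(0)
    return float(-(targets * np.log(probs)).sum(axis=1).mean())

# Two samples, two classes (one-hot targets and predicted probabilities).
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_cross_entropy(targets, probs)   # -(ln 0.9 + ln 0.8) / 2
```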
SqueezeNet, pretrained for image recognition, is used for FECG waveform classification based on a time-frequency representation. Its design (a) replaces 3x3 filters with 1x1 filters, (b) reduces the number of input channels to the remaining 3x3 filters, and (c) downsamples late in the network, using a stride greater than 1 only in the final stages, so that the convolution layers have large activation maps. These operations are given in Equations (12)–(16).
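The effect of strategies (a) and (b) on model size can be checked with simple arithmetic. The sketch below compares the weight count of a plain 3x3 convolution with that of a squeeze-and-expand block of the kind used in SqueezeNet (called a fire module in the SqueezeNet design). The channel counts are illustrative assumptions, and biases are ignored.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def fire_module_weights(c_in, squeeze, expand_1x1, expand_3x3):
    """Squeeze with 1x1 filters, then expand with a mix of 1x1 and 3x3 filters."""
    return (conv_weights(1, c_in, squeeze)
            + conv_weights(1, squeeze, expand_1x1)
            + conv_weights(3, squeeze, expand_3x3))

# Illustrative sizes: 96 input channels and 128 output channels in both cases.
plain = conv_weights(3, 96, 128)
fire = fire_module_weights(96, squeeze=16, expand_1x1=64, expand_3x3=64)
reduction = plain / fire
```

With these numbers the plain layer has 110,592 weights and the squeeze-and-expand block has 11,776, roughly a ninefold reduction for the same input and output widths.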
The feature map of the image is obtained from Equation (6) after employing the activation function.
Where xi,n is the signal from the ith lead, and xi,max and xi,min are the maximum and minimum of xi,n for 0≤n≤N, with (N/fs)=120 s. When xi,n is segmented, adjacent segments overlap by 250 ms at both ends to avoid fragmenting a QRS complex.
Transfer learning (TL) is the area of machine learning (ML) that aims to transfer knowledge gained by solving one problem to a different but related problem. The deep convolutional neural network (CNN) is initially pretrained using the fetal dataset from PhysioNet ATM as the training set. A domain A has two components: the feature space Y and the marginal probability distribution P(Y), where the sample Y={y1, . . . , yn} belongs to the feature space and consists of n features. A domain is therefore given by A={Y, P(Y)}, and two domains that differ have different feature spaces or different marginal probability distributions. A task is composed of two elements: a label space Z and an objective predictive function f(.). Although this function is not observed directly, it can be learned from the pairs that make up the training data {Y, Z}, where Y ∈ Y and Z ∈ Z. The learned function can then be used to predict the label of a new instance V. The task is given by W={Z,f(.)}. Given a source domain with its learning task and a target domain with its learning task, transfer learning improves the target predictive function using the knowledge from the source, where the source domain is not equal to the target domain or the source task is not equal to the target task. Figure 3 gives the steps involved in the TL model.

Steps involved in the Transfer Learning.
The size of the dataset is reduced, and many layers are frozen in the TL model during training.
The complete workflow diagram is shown in Fig. 4.

The complete flow diagram of the deep learning model under study.
As part of the preprocessing, each recording is downsampled from 500 Hz to 250 Hz to match the FECG sampling rate. The signal is normalized using the mean and standard deviation estimated over the complete dataset. A constant signal length of 60 s is achieved by zero-padding the shorter recordings. Prior to training, the dataset is partitioned into training, test, and validation sets containing 75%, 20%, and 5% of the recordings, respectively, with the class ratio kept constant in each set. Mini-batches are created from random samples during pretraining. An FECG frame is defined as a segment of the continuous FECG signal shorter than one minute. Each frame is normalized using the mean and standard deviation calculated across the full dataset. Extra training samples for pretraining are collected by taking an average of 4096 FECG frames from each patient. The CNN weights are saved every few thousand training steps. Upon completion of the pretraining, the model is reset to the checkpoint at which it performed most accurately during validation. Figure 5 shows a sample FECG waveform from one of the datasets. The waveforms for Dataset 2 and Dataset 3 are constructed similarly. An ablation study is carried out for the two activation functions, ReLU and softmax.
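The zero-padding and the class-preserving 75/20/5 partition described above can be sketched as follows. The 600/300 class balance, the 40 s example recording, and the helper names `pad_to_length` and `stratified_split` are assumptions for illustration; only the 60 s length, the 250 Hz rate, the split fractions, and the count of 900 recordings come from the text.

```python
import numpy as np

def pad_to_length(signal, target_len):
    """Zero-pad (or truncate) a 1-D signal to exactly target_len samples."""
    out = np.zeros(target_len, dtype=float)
    n = min(signal.shape[0], target_len)
    out[:n] = signal[:n]
    return out

def stratified_split(labels, fractions=(0.75, 0.20, 0.05), seed=0):
    """Return train/test/validation indices with the class ratio kept constant."""
    rng = np.random.default_rng(seed)
    train, test, val = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_test = int(round(fractions[1] * idx.size))
        train.extend(idx[:n_train].tolist())
        test.extend(idx[n_train:n_train + n_test].tolist())
        val.extend(idx[n_train + n_test:].tolist())   # remainder goes to validation
    return np.array(train), np.array(test), np.array(val)

fs = 250
padded = pad_to_length(np.ones(40 * fs), 60 * fs)   # a 40 s recording padded to 60 s
labels = np.array([0] * 600 + [1] * 300)            # hypothetical class balance
train_idx, test_idx, val_idx = stratified_split(labels)
```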

FECG waveform from online database (Dataset 1).
This step also applies data augmentation techniques to both positive and negative samples, which accelerates convergence and increases the data volume. Two additional hyperparameters control the augmentation: the magnitude of each augmentation, denoted by M, and the number of augmentations applied in each batch, denoted by N. Oversampling and linear transformations with multiple values are used for data augmentation. This strategy increases the diversity of the data for the target object and can also raise segmentation accuracy. Synthetic samples are generated from the trained models and merged with the initial data of all datasets so that the number of AF class samples equals the number of Normal class samples.
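As a simplified illustration of class balancing by oversampling with a linear transformation, the sketch below resamples the minority class and applies a random gain and offset to each copy. It substitutes this simple transformation for the model-based sample generation described above, and it assumes a binary problem; the function name and the `max_scale` and `max_shift` ranges are hypothetical.

```python
import numpy as np

def balance_by_augmentation(x, y, minority=1, max_scale=0.1, max_shift=0.05, seed=0):
    """Oversample the minority class with random linear transforms until classes match."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority)
    deficit = int((y != minority).sum() - minority_idx.size)
    if deficit <= 0:
        return x, y                                   # already balanced
    picks = rng.choice(minority_idx, size=deficit, replace=True)
    scale = 1.0 + rng.uniform(-max_scale, max_scale, size=(deficit, 1))
    shift = rng.uniform(-max_shift, max_shift, size=(deficit, 1))
    synthetic = x[picks] * scale + shift              # linear transformation a*x + b
    x_bal = np.concatenate([x, synthetic])
    y_bal = np.concatenate([y, np.full(deficit, minority, dtype=y.dtype)])
    return x_bal, y_bal

rng = np.random.default_rng(1)
x = rng.normal(size=(90, 500))                        # 90 stand-in segments
y = np.array([0] * 60 + [1] * 30)                     # 60 Normal, 30 AF (hypothetical)
x_bal, y_bal = balance_by_augmentation(x, y)
```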
FECG signal denoising
Each slice of the direct transmission was 3 s long. This value was picked as a good middle ground among the values used in practice, because shorter slices may capture incomplete waveform information while longer slices contain more waveforms, which could affect the neural network’s detection capabilities. The separation of the original signal and the noise is shown in Fig. 6.

Adaptive noise cancelling.
Figure 7 depicts the original signal and the filtered signal after undergoing noise cancellation.

Original and reconstructed FECG waveform.
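One common way to realize adaptive noise cancelling is a least-mean-squares (LMS) filter driven by a noise reference. The text does not state which adaptive algorithm was used, so the sketch below is an assumption, as are the tap count, the step size `mu`, and the synthetic 2 Hz and 50 Hz stand-in signals.

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=8, mu=0.01):
    """Subtract the part of `primary` that can be predicted from `reference`."""
    w = np.zeros(n_taps)
    cleaned = np.zeros_like(primary)
    for n in range(n_taps - 1, primary.shape[0]):
        window = reference[n - n_taps + 1:n + 1][::-1]   # newest reference sample first
        noise_estimate = w @ window
        cleaned[n] = primary[n] - noise_estimate          # error output = signal estimate
        w += 2.0 * mu * cleaned[n] * window               # LMS weight update
    return cleaned

fs = 250
t = np.arange(0, 10, 1 / fs)
signal = 0.5 * np.sin(2 * np.pi * 2.0 * t)        # stand-in for the wanted FECG component
interference = np.sin(2 * np.pi * 50.0 * t)       # stand-in for power line interference
primary = signal + 0.8 * interference             # contaminated recording
cleaned = lms_noise_canceller(primary, interference)
```

After the filter has converged, the output follows the wanted component while the interference that is correlated with the reference is largely removed.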
The datasets used to categorize FECG as normal or abnormal employ different parameters and sample sizes. Therefore, when implementing deep learning models, it is important to take the parameters and their sizes into account. Each recording is 10 s long. The minimum and maximum values for each dataset are given in Table 2.
Minimum and Maximum values
Deep learning models are used for training and testing; the CNN and TL models were employed for the FECG datasets. The datasets have different parameters and sample sizes, as shown in Table 3, so all samples are brought to the same shape during the data augmentation phase. Both models are trained with 50 epochs, a learning rate of 0.003, and 225 iterations. The growth of the dataset without null values is shown in Fig. 7 for both before and after the preprocessing phase.
Size of the samples
Before the preprocessing phase, the percentage of the dataset is about 75%. After the preprocessing stage, it increases to a maximum of 90%. Because the datasets are distinct and use different parameters, the accuracy varies, as shown in Fig. 8.

Preprocessing process (a) Before preprocessing stage, (b) After preprocessing stage.
In the early stages, the TL model performs best, above 84% for Dataset 1, whose sample values range from -15 μV to 15 μV, and above 88% for Dataset 2, whose sample values are extremely low, from -8 μV to 8 μV. After the preprocessing stage, Dataset 2, which needs more augmentation, gives a higher TL accuracy of above 89%. The accuracy on the online datasets is increased based on the CWT ablation over the wavelet types. The training and testing accuracy for all three datasets depends on the SNR shown in Table 4.
Sum of SNR for all database
It is found that cmor1-1.5 has a lower SNR than the other wavelet types in all the datasets under study and can be used to improve the testing accuracy.
The model was developed, tested, and analyzed in Python using Google Colab and Jupyter Notebook. Table 4 shows the time required for training and testing the deep learning models according to the sample size used for each dataset. Because transfer learning needs less data for training and testing, it saves training time. The comparison of computation times is listed in Table 5.
Comparison of time duration taken for training and testing using 2D CNN model and TL model
When augmentation is minimal, Dataset 2, which has more data samples, produces highly accurate results for each model; when augmentation is maximal, the results are less accurate. The metrics derived from the confusion matrix are as follows. Accuracy is the proportion of predictions that are correct, where 1.0 is the best outcome and 0.0 is the worst. Specificity is the number of correctly predicted negative outcomes divided by the total number of actual negative outcomes. Sensitivity, also known as recall (REC), is the number of correctly predicted positive outcomes divided by the total number of actual positive outcomes. Precision, also known as Positive Predictive Value (PPV), is the number of correct positive predictions divided by the total number of positive predictions. The F1 score is the harmonic mean of REC and PPV. True positive (TP) = the number of FECG signals correctly identified as abnormal. False positive (FP) = the number of FECG signals incorrectly identified as abnormal. True negative (TN) = the number of FECG signals correctly identified as normal. False negative (FN) = the number of FECG signals incorrectly identified as normal.
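These definitions translate directly into code. The counts in the example call are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall (REC)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value (PPV)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 100 abnormal and 100 normal recordings.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```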
Dataset 3, whose sample values range from -74 μV to 60 μV, is found to give much higher accuracy for all the models than Dataset 1 and Dataset 2. The loss is greater for Dataset 2, at 0.4% to 0.5%, because its sample values are extremely low and it requires more augmentation; the loss for the other datasets is much smaller, as given in Table 6. The baseline for the deep learning models ranges from 0.99 to 1.00 for Dataset 3, which has the maximum FECG amplitude and requires less preprocessing and data augmentation.
Metric analysis comparison for all FECG dataset
Figure 9 displays the accuracy and loss up to epoch 15, taken as an example, for the deep learning models involved. The loss is found to be 10% to 20% greater for Dataset 2, whose samples require more preprocessing and augmentation. This dataset has extremely small samples with very low amplitudes, making the data challenging to analyze if noise is also present. Dataset 1 and Dataset 3 have maximum sample amplitudes of 16 μV and 60 μV, so if noise is present it is easy to denoise the samples. Compared with the datasets with lower FECG amplitudes, Dataset 3, with its high-amplitude FECG samples, reaches a maximum per-epoch accuracy of 91%. Figure 10 depicts the accuracy and loss bar chart for all the datasets under study. Dataset 3 is found to have the lowest loss.

Accuracy and Loss plot (a) TL Accuracy, (b) TL Loss, (c) CNN Accuracy, (d) CNN Loss.

Illustration of confusion Matrix (a) 2D CNN model accuracy 98.26%, (b) 1D CNN model with accuracy 97.45%.
The confusion matrix is given in Table 7, in which the TL model produced the highest level of classification. Here, each column represents the expected level, and each row represents the output’s actual level. The deep learning models, CNN and TL, all give more false predictions as manual augmentation is done to increase the sample size.
Confusion Matrix analysis for all FECG datasets
Figure 10 depicts the ablation study of the CNN model with 1D and 2D convolution layers for 2 classes; the 2D model performs better, with 98.26% accuracy compared to 97.45% for the 1D model. The best accuracy and loss curves are shown in Fig. 8 for both the CNN and TL models. TL gives an accuracy of 99% compared to 98% for CNN, and the loss is much lower for TL. The Receiver Operating Characteristic (ROC) curve, used for binary classification, is a plot of the true positive rate against the false positive rate. The Area Under the Curve (AUC) gives an overall evaluation of the deep learning model’s performance. Figure 11 depicts the ROC curves for CNN and TL. The ROC performance is better for CNN, at 95%, than for TL, at 91%, as TL freezes some layers for training and makes use of a smaller dataset with transferred features.
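To make the ROC and AUC definitions concrete, the following minimal Python sketch builds the ROC points by sweeping a threshold over the predicted scores and integrates them with the trapezoidal rule. It is an illustration under stated assumptions, not the evaluation code of this study: the function names, labels and scores are hypothetical, and label 1 is simply treated as the positive class.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, highest threshold first."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Sort samples by descending score so that lowering the threshold adds samples one by one.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample sharing this score so that ties produce a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and predicted scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))
```

An AUC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.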

Receiver operating characteristic curve 0- Abnormal, 1-Normal (a) CNN (b) TL.
Four real-time datasets of Maternal ECG and Fetal ECG recordings were obtained from a private hospital during the trimester period (week 36) to compare the performance of the deep learning models under consideration. The recordings are from pregnant women aged 26 to 37 years. The details of the four datasets are provided in Table 8. As the FECG amplitude is very low compared to the MECG, MECG recordings were taken for better outcomes. These recordings were taken with an electrocardiograph for a duration of 5 to 10 min. As small variations in the “low-frequency” ST segment are analysed, baseline drift must be removed in order to minimise changes in beat shape that do not have a cardiac cause. This is shown in Figure. As patient4 is older than the other patients involved, this patient is taken for our performance study. From Fig. 12, R-peak detection gives a heart rate of 67 bpm for patient4, while the simulation gives 74 bpm. This shows that our CNN model is trained well for detecting anomalies. The PQRST plot for patient4 is shown in Fig. 13.
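The baseline-drift removal step mentioned above can be illustrated with a running-median baseline estimate that is subtracted from the signal. The text does not state which method was used, so the running median, the function name, the window length and the synthetic signal below are all assumptions made for this minimal Python sketch.

```python
from statistics import median


def remove_baseline_drift(signal, window):
    """Subtract a running-median baseline estimate from the signal.

    `window` is the length (in samples) of the centred window; it should be
    longer than a single beat feature so that narrow peaks are not absorbed
    into the baseline. Near the edges the window is truncated.
    """
    half = window // 2
    baseline = [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]
    return [s - b for s, b in zip(signal, baseline)]


# Synthetic example (hypothetical values): a slow linear drift plus two narrow spikes.
drifting = [0.01 * n + (1.0 if n in (50, 150) else 0.0) for n in range(200)]
cleaned = remove_baseline_drift(drifting, window=21)
print(round(cleaned[50], 2), round(cleaned[100], 2))
```

A median is used here instead of a mean because narrow, high-amplitude peaks barely shift it, so the peaks survive the subtraction while the slow drift is removed.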

R peak detection for Patient4 and heartbeat calculation.

PQRST plotting for patient4.
Real-time dataset with PQRST values
The MATLAB code to calculate the R peak is provided below.
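As an illustration of R-peak detection and heart-rate calculation, the following is a minimal Python sketch. It is not the MATLAB listing referred to above. It assumes a simple amplitude threshold with a refractory period, and the function names, the threshold ratio, the refractory time and the synthetic test signal are hypothetical choices made for this example.

```python
import math


def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Return sample indices of local maxima above a threshold.

    threshold_ratio and refractory_s are illustrative defaults (assumptions),
    not parameters taken from this study.
    """
    threshold = threshold_ratio * max(signal)
    min_gap = int(refractory_s * fs)  # minimum distance between beats, in samples
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def heart_rate_bpm(peaks, fs):
    """Heart rate from the mean R-R interval, in beats per minute."""
    if len(peaks) < 2:
        raise ValueError("at least two R peaks are needed")
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(rr) / len(rr))


def synthetic_beats(n_samples, period_samples, width_samples=2.5):
    """Synthetic signal: one narrow Gaussian pulse in the middle of every period."""
    half = period_samples // 2
    return [
        math.exp(-0.5 * (((n % period_samples) - half) / width_samples) ** 2)
        for n in range(n_samples)
    ]


fs = 250                                   # sampling rate in Hz (assumed)
ecg = synthetic_beats(n_samples=2000, period_samples=200)  # one beat every 0.8 s
r_peaks = detect_r_peaks(ecg, fs)
print(len(r_peaks), round(heart_rate_bpm(r_peaks, fs), 1))
```

On real recordings the threshold and refractory period would need tuning, and baseline drift and noise should be removed first.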
The ROC curve compares predictions across the whole range of risks by showing how well the model can rank predicted probabilities. It can be useful for classification, where discriminating between those with and without disease is of most interest. The ROC curve directly assesses discrimination and is not affected by calibration, that is, how well the predicted risks match those observed. When comparing models, changes in ranks may or may not be as important in patient care as changes in the levels of absolute risk. AUC emphasizes discrimination, which is not the only factor, and might not even be the most significant one, when it comes to risk assessment. It is vital to assess the models before adding new markers to existing prediction models in order to establish their efficacy.
Net Reclassification Improvement (NRI) focuses on the movement in predicted risks, both upward and downward, among those with and without events. Upward and downward movement can be described as simple increases and decreases in predicted risk. NRI assesses the ‘net’ difference in the number of people correctly categorised using CNN by counting how many people experiencing an event moved to a higher absolute risk or risk category (for example, medium to high) and how many people experiencing an event moved to a lower absolute risk or risk category (for example, medium to low). Subjects without events that are correctly reclassified as lower risk, and subjects with events that are correctly reclassified as higher risk, are scored +1. Similarly, subjects without events that are incorrectly reclassified as higher risk, and subjects with events that are reclassified as lower risk, are scored -1. Subjects that are not reclassified are scored 0. From Dataset3, 80 subjects with normal heart rate were reclassified upward by the model (moving up) and 40 subjects with abnormal heart rate were reclassified downward (moving down). Among those who did not experience the event (D = 0, n = 3081), 164 were reclassified up and 182 were reclassified down. The resulting NRI is 0.1308, with z = 3.7641 and p-value = 0.00016721 (p < 0.001), where z is the test statistic for the NRI. This corresponds to improved classification, with an NRI of 13 per cent, for individuals with events, with no net loss for non-events. The outcomes are shown in Table 9.
NRI Moving UP/DOWN values
NRI is calculated using the formula:
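In its standard form, the NRI is the sum of two net proportions: the proportion of event subjects moving up minus the proportion moving down, plus the proportion of non-event subjects moving down minus the proportion moving up. The minimal Python sketch below computes the NRI together with the usual asymptotic z statistic and a two-sided normal p-value. The function name is hypothetical. The sketch also assumes that the 80 upward and 40 downward moves belong to the event group. The number of event subjects is not stated in the text, and 320 is an assumption used here only because it reproduces the reported values.

```python
import math


def net_reclassification_improvement(up_events, down_events, n_events,
                                     up_nonevents, down_nonevents, n_nonevents):
    """Return (NRI, z statistic, two-sided p-value)."""
    # Net proportion of event subjects correctly moved to higher risk.
    nri_events = (up_events - down_events) / n_events
    # Net proportion of non-event subjects correctly moved to lower risk.
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    nri = nri_events + nri_nonevents
    # Asymptotic standard error under the null hypothesis of no improvement.
    se = math.sqrt((up_events + down_events) / n_events ** 2
                   + (up_nonevents + down_nonevents) / n_nonevents ** 2)
    z = nri / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return nri, z, p_value


# n_events=320 is an assumption (not given in the text); the other counts are as reported.
nri, z, p = net_reclassification_improvement(
    up_events=80, down_events=40, n_events=320,
    up_nonevents=164, down_nonevents=182, n_nonevents=3081,
)
print(f"NRI={nri:.4f}, z={z:.4f}, p={p:.8f}")
```

With these inputs the sketch gives an NRI of about 0.1308 and a z of about 3.764, in line with the reported values.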
1000 samples from Dataset 2 are taken, as it contains MECG information; the real-time dataset values are appended to them, and the PQRST plot is produced, as shown in Fig. 14.

Labeling of PQRST plot for MECG signal.
Acute Myocardial Infarction (AMI) is a medical condition that can be fatal, in which the heart muscle’s blood supply is suddenly interrupted, causing tissue damage. This typically happens when one or more coronary arteries become occluded. A blockage may arise as a result of the build-up of plaque, which is primarily composed of fat, cholesterol, and cellular waste materials, or as a result of a sudden blood clot that forms on the obstruction. Left anterior descending artery (LAD) obstruction tends to be the cause of anterior ST elevation myocardial infarction (STEMI). AMI identification is done for the real-time patient4 shown in Table 7, which will be helpful in identifying AMI for the real-time dataset. Figure 15 shows the AMI findings: hyperacute T waves with a lower R wave height are present in V2–6. Q waves can be deep (>1/3 of an R wave’s height) or broad (>1 mm in duration). Hyperacute T waves, which are tall and uniform and appear during the first few minutes of the event, are the earliest ECG finding in acute myocardial infarction. These then go away, and over the course of minutes to hours, ST elevation occurs in the afflicted leads, whereas ST depression occurs in the opposite leads.

AMI findings (a) Normal ECG, (b) Patient4 ST segment visualization with area of AMI (STEMI).
In this paper, the Hampel filter is used as an outlier detector for true and false predictions. In this method, the data are scanned using a moving window whose width can be adjusted. The median and standard deviation are determined for each window, which consists of the observation and the 2 × window size surrounding samples (window size on each side). The Hampel filter is implemented for time series data using the rolling method with a lambda function.
When using the rolling approach, the window size is doubled and centering is used, putting the observation under consideration in the middle of a window of length [2 * window size + 1]. The Hampel filter’s attempt to determine whether a prediction is true or false is shown in Fig. 16. However, only one of the three sample series could be successfully predicted, an accuracy of 33%. First, this might be due to difficulty in identifying outliers at the start and end of the series: when the window is incomplete (even on one side), the function does not identify potential outliers. Second, when the outliers are scattered over the window range, the filter struggles to identify them.
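The windowing described above can be sketched in a few lines of Python. This is a plain-loop illustration, not the rolling and lambda implementation used in this work. It assumes the common Hampel formulation in which the standard deviation is estimated from the median absolute deviation (MAD) scaled by 1.4826; the function name, the threshold of 3 and the sample series are hypothetical.

```python
from statistics import median


def hampel_outliers(series, window_size=3, n_sigmas=3.0):
    """Flag outliers using a centred window of length 2 * window_size + 1.

    Points whose window is incomplete (the first and last window_size samples)
    are never flagged, mirroring the edge limitation noted in the text.
    """
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    flags = [False] * len(series)
    for i in range(window_size, len(series) - window_size):
        window = series[i - window_size: i + window_size + 1]
        med = median(window)
        mad = median(abs(x - med) for x in window)
        if abs(series[i] - med) > n_sigmas * k * mad:
            flags[i] = True
    return flags


# Hypothetical sample series with a single spike in the middle.
samples = [1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 1.0]
print(hampel_outliers(samples))
```

Moving the spike to the first sample leaves it unflagged, which reproduces the first limitation described above: samples without a complete window on both sides are never tested.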

Outlier detection for a set of sample values of FECG (a) Outlier plotting, (b) Outlier true and false.
The classification of FECG data from online databases is discussed in this study, along with methods for noise removal and performance metrics for evaluating the deep learning models used. The metrics of the CNN and TL models involved in the work have all been analyzed. These measurements can be computed using a confusion matrix. Numerous studies have employed a neural network as a classifier and the MIT-BIH arrhythmia database as a dataset, particularly for ECG categorization. With a value of 99%, TL provides better accuracy. Future applications of this categorization study include early FECG prediction, which enables clinicians to act quickly in cases where pregnant women are in emergency situations.
Future scope
Piezoelectric sensors worn as a belt to measure abdominal fetal electrocardiography are currently undergoing a stage of intense development, and in the near future this approach may largely replace the well-known Doppler ultrasound method while ensuring that the accuracy of the fetal heart rate measurement is similar to that of direct electrocardiography. The cost of recording bioelectrical data is unquestionably lower, so the economic side is also quite significant. Additionally, it is a more practical choice for telemedical home care, which is gaining popularity in high-risk pregnancy monitoring.
