Lipoprotein detection: Hybrid deep classification model with improved feature set

Abstract

Patients with chronic liver diseases typically experience lipid profile problems, and mortality from cirrhosis complicated by portal vein thrombosis (PVT) is high. A lipoprotein (Lp) is a biochemical assembly whose main job is to transport hydrophobic fat molecules in water. Lipoproteins are present in all eubacterial walls and are of great interest in the study of the pathogenic mechanisms of spirochaetes. Because spirochaete lipobox sequences are more malleable than those of other bacteria, it has proven difficult to apply current prediction methods to new sequence data. The major goal of this work is to present a lipoprotein detection model in which correlation features, improved log energy entropy, raw features, and semantic similarity features are extracted. The extracted features are passed to a hybrid model that combines a Gated Recurrent Unit (GRU) and a Long Short-Term Memory (LSTM) network, and the outputs of the GRU and LSTM are averaged to obtain the final output. The GRU weights are optimized via the Selfish combined Henry Gas Solubility Optimization with cubic map initialization (SHGSO) model.

Keywords

Lipoproteins, improved log energy, gated recurrent unit, LSTM, selfish combined Henry gas solubility optimization

1. Introduction

Lipoproteins are complex molecular compounds, made up mostly of free and fermented proteins, triglycerides, phospholipids, Apolipoprotein A-I (APO A-I), and cholesterol, that transport plasma lipids [1, 2, 3]. The four key types of lipoprotein, i.e., chylomicrons, Very-Low-Density Lipoprotein (VLDL), Low-Density Lipoprotein (LDL), and High-Density Lipoprotein (HDL), as well as the two minor classes, i.e., Lp (a) and Intermediate-Density Lipoprotein (IDL), are differentiated by density, size, lipid content and composition, and protein content and composition [4, 5, 6].

Nomenclature
Abbreviation	Description
APO A-I	Apolipoprotein A-I
ANN	Artificial Neural Network
ALTs	Advanced Lipoprotein Tests
CSO	Cat Swarm Optimization
CNN	Convolutional Neural Network
CRP	C-Reactive Protein
DBN	Deep Belief Network
IDL	Intermediate-Density Lipoprotein
FPR	False Positive Rate
FNR	False Negative Rate
GRU	Gated Recurrent Unit
HDL-C	High-Density Lipoprotein Cholesterol
HDL-P	HDL Particle Concentration
HBA	Honey Badger Algorithm
HGSO	Henry Gas Solubility Optimization
HC	Hybrid Classifier
LSTM	Long Short-Term Memory
LDL-C	Low-Density Lipoprotein Cholesterol
LEE	Log energy entropy
Lp	Lipoprotein
LDL	Low-Density Lipoprotein
MRA	Multivariable Regression Analysis
MMS	Modified Marshall Score
MCC	Matthews Correlation Coefficient
NSTEMI	Non-ST-Segment Elevation Myocardial Infarction
NPV	Negative Predictive Value
RF	Random Forest
RBFNN	Radial Basis Function Neural Networks
RNN	Recurrent Neural Network
SVM	Support Vector Machine
SSO	Shark Smell Optimization
SHGSO	Selfish combined Henry Gas Solubility Optimization with cubic map initialization
SFO	Selfish Optimizer
TIMI	Thrombolysis In Myocardial Infarction
VLDL	Very-Low-Density Lipoprotein

For the past fifty years, sub-fractionation and improved lipoprotein analysis have been applied in human medical research [7, 8]. The Framingham Heart Study and the Lawrence Livermore Study, both undertaken at the University of California, were the first to show the therapeutic relevance of sophisticated screening and laid the foundation for future studies. Clinical research over the next 50 years has shown that certain parts of these tests provide insight into the atherogenic processes independent of routine lipid results and findings [9, 10]. Several lifestyle and pharmacologic therapy trials have revealed substantial disparities in treatment response based on lipoprotein subclass categorization. Differences in arteriographic results were associated with alterations in lipoprotein subtype distribution [11, 12]. Over the last two decades, standard lipoprotein measures of total cholesterol, triglycerides, High-Density Lipoprotein Cholesterol (HDL-C), and Low-Density Lipoprotein Cholesterol (LDL-C) have failed to uncover numerous lipoprotein anomalies that lead to coronary heart disease and peripheral vascular cancer risk [13, 14, 15, 16]. The ensemble technique combines several trained base classifiers [40]. These techniques are typically referred to as ensemble learning, which is known to lower the classifiers' variance and increase the robustness and accuracy of the decision-making system [41]. The abundance of training data is the primary determinant of machine learning models that generalize successfully. Ensemble approaches are machine learning techniques that were created to address these issues. To categorize new examples, an ensemble of classifiers combines the judgments of the individual classifiers in some way [42, 43, 44].
Advanced Lipoprotein Tests (ALTs) provide an understanding of minor but significant elements of lipoproteins and atherosclerosis, which may assist in explaining why the LDL-C–lowering technique has been so ineffective in halting the atherosclerosis epidemic. ALTs can be used in four different ways: (1) to improve atherosclerosis prognosis, (2) to improve outcome forecasting, (3) to aid in therapy selection and dose modification, and (4) to advise first degree relations of a patient with atherosclerosis [17].

The contributions are as follows:

•
Introduces a lipoprotein prediction model, where features like correlation features, improved entropy, raw features, and semantic similarity features are derived.
•
Then, derived ones are classified via hybrid classifier. The hybrid classifier is the combination of LSTM and GRU. GRU weights are optimized via Selfish combined HGSO with cubic map initialization.
•
Proposes a new, Selfish combined Henry Gas Solubility Optimization with cubic map initialization (SHGSO) model, which is the hybridized version of existing Selfish Optimizer (SFO) and HGSO models.

Sections 2 and 3 review the extant Lp (a) schemes and explain the proposed system. Sections 4 and 5 describe the features and classifiers. Sections 6 and 7 discuss the results and conclusions.
2. Literature review

In 2018, Nancy et al. [18] investigated the relationship between Lp (a) and incident cardiovascular disease in women. Lp (a) was measured using a turbidimetric assay in three women's cohorts (the Women's Health Study [ $N=$ 24.558], a case-cohort sample from the Women's Health Initiative Observational Study [ $n=$ 1.815 cases, subcohort $n=$ 1.989], and the JUPITER [Justification for Use of Statins in Prevention] trial [ $n=$ 2.5]). The kind of relationship with incident cardiovascular disease was investigated using a Women's Health Study derivation sample ( $n=$ 16.400). Risk reclassification was used to evaluate models with and without Lp (a) that included standard cardiovascular risk variables.

In 2018, Peter et al. [19] used patient-level data from seven randomized, placebo-controlled statin outcome studies to determine hazard ratios for cardiac disease, classified as fatal or non-fatal coronary heart disease, stroke, or revascularization operations. Hazard ratios for coronary heart disease events were also determined. Without effect modification by any other study-level or patient-level factors, the link between on-statin coronary heart disease risk and Lp (a) was larger for placebo Lp (a). It was much more evident at an early age. In 2021, Qi et al. [20] enlisted 83 UAP patients and 105 NSTEMI patients. Lp-PLA2 tertile data was used to create another group division. The artery flow state was represented using the corrected Thrombolysis In Myocardial Infarction (TIMI) frame count (CTFC). CTFC and other clinical markers were analyzed for correlation. The researchers performed MRA to find the parameters influencing cardiac flow in NSTEMI patients.

Chen et al. [21] set out to see if APO A-I and HDL-C could predict permanent health complications in 2018. A maximum of 102 adolescent acute pancreatitis patients with local problems, organ damage, or progression of previous comorbid disease during treatment were integrated into a retrospective analysis between January 2011 and September 2016. The association between serum lipids and clinical outcomes or grading systems was calculated. In addition, the AUCs for predicting recurrent OF were computed and compared.

In 2022, Anindita et al. [22] looked into whether Lp (a) cascade screening was efficient in finding cases of increased Lp (a) in families. In a tertiary hospital setting, relatives with Lp (a) concentrations less than 100 mg/dl were examined for high Lp (a) (50 mg/dl) using a cascade analysis method. The prevalence and the sensitivity to detect novel cases of increased Lp (a) in relatives were investigated. When probands possess elevated amounts of Lp (a), the probability of identifying raised Lp (a) is higher, and it outperformed the identification of relatives with Combined Hyperlipidemia (CH) and Familial Combined Hyperlipidemia (FCHL).

In 2022, Robert et al. [23] investigated the therapeutic value of HDL subclasses and HDL-P in mortality risk classification and discriminatory models in a high-risk cardiac cohort. We measured HDL and HDL-P subclasses in 3972 people enrolled in the CATHGEN coronary catheterization biorepository using nuclear magnetic resonance spectroscopy, evaluated for associations including all death rates in robust clinical models, and looked at the functionality of HDL sub-domains in gradual mortality risk discriminatory practices and categorization.

In 2020, Kamil et al. [24] examined the pooled and individual predictive values of basal HDL-C and CRP levels in patients who had completed a 2-year follow-up after receiving transcatheter aortic valve implantation (TAVI). We looked at 334 patients who received CRP and HDL-C readings on admissions during certification for TAVI from January 2010 to July 2017. HDL-C levels of 46 mg/dl (AUC $=$ 0.657) and CRP levels of 0.20 mg/dl (AUC $=$ 0.634) were significant forecasters of two years mortality.

Table 1
Reviews of lipoprotein prediction models

Employed models	Features	Challenges	Authors
JUPITER trial	High recall; better risk prediction	Not efficiently used	Nancy et al. [18]
Meta-analysis	Lower plasma protein levels; reflects high residual risk	Personal data cannot be obtained from outcomes	Peter et al. [19]
Correlation analysis	High specificity; high sensitivity	A more precise cut-off is required	Qi et al. [20]
Modified Marshall Score (MMS)	High sensitivity; high specificity	Randomized control trials are needed	Chen et al. [21]
Cascade testing protocol	Low cost; high effectiveness	Cost analysis is required	Anindita et al. [22]
Logistic Regression	The risk level of coronary heart disease is reduced	Causality is not inferred	Robert et al. [23]
Fisher's analysis	High predictive value; estimated hazard ratios	The cause of death was not observed	Kamil et al. [24]
Sorting Intolerant from Tolerant (SIFT)	High accuracy; interprets biological annotations	Needs analysis on varied mutations	Jiayan et al. [25]
Machine Learning techniques	High accuracy; low error	Expensive method	Malaysha et al. [45]
Stacking Ensemble Learning	The prediction accuracy is high	Not used in dynamic traffic segmentation	Li et al. [46]

In 2019, Jiayan et al. [25] created in silico prediction models, dubbed Structure-based Functional Impact Prediction for Mutation Identification, for familial hypercholesterolemia (FH) with low density lipoprotein receptor (LDLR) single missense mutations. The model's functional impact and morbidity predictions were compared to those of other traditional processes with evidence-based variants and in vitro functional lab tests for LDLR variant patients.

In 2022, Malaysha et al. [45] supported the recognition and diagnosis of LDL-C, based on past medical history and heuristic data, with machine learning approaches. In this study, the LDL-C was predicted and classified using machine learning algorithms. The methods used for HDL-C categorization and prediction are also included. ANNs, RNN, RBFNN, fuzzy logic, SVM, Decision Tree, Logistic Regression, and a hybrid model combining ANNs and fuzzy logic are the techniques used to improve the accuracy of the results and lower the classification error.

In 2020, Li et al. [46] proposed a technique for predicting mobile traffic based on stacking ensemble learning. The model has two components: a distributed multilayer perceptron (MLP) base learner and a Self-adaptive Support Vector Regression (SSVR) meta-learner. Various real-world mobile traffic flows from mobile applications at various base stations are used.

Table 1 reviews the lipoprotein prediction models. Clinical advice for patients with elevated Lp (a) currently comprises the usage of statin and aspirin medication, even though statins may cause Lp (a) levels to rise slightly. However, there is debate about the shape of the Lp (a) risk curve. While Lp (a) predicts risk in people with low Total Cholesterol or LDL-C levels due to statin treatment, its relevance as a risk predictor in people with naturally low lipid levels is unknown [18].

Figure 1.

Representation of proposed lipoprotein prediction scheme.

3. Proposed lipoprotein prediction scheme

The proposed lipoprotein prediction scheme encompasses the following steps.

•
Primarily, features like correlation features, improved entropy, raw features, and cosine semantic similarity features are derived.
•
Then, prediction occurs using hybrid classifier (LSTM and GRU).
•
Further, the outputs of GRU and LSTM are averaged to obtain the final output.
•
The GRU weights are optimally chosen via the SHGSO model.

Figure 1 depicts the proposed lipoprotein prediction scheme.
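The averaging step in the list above can be sketched as follows. This is a minimal illustration, assuming both classifiers emit one score in [0, 1] per sample; the function name `hybrid_output` and the 0.5 decision threshold are assumptions for illustration and are not taken from the paper.

```python
def hybrid_output(gru_out, lstm_out, threshold=0.5):
    """Average per-sample GRU and LSTM scores and threshold the mean.

    The threshold is an assumed value; the paper only states that the
    two outputs are averaged.
    """
    if len(gru_out) != len(lstm_out):
        raise ValueError("both classifiers must score the same samples")
    scores = [(g + l) / 2.0 for g, l in zip(gru_out, lstm_out)]
    labels = [1 if s >= threshold else 0 for s in scores]
    return scores, labels
```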
4. Feature extraction

The features from input data are: (i) correlation features, (ii) improved entropy, (iii) raw features, and (iv) semantic similarity features.

4.1 Correlation features

The correlation coefficient [26] is a metric that quantifies the strength of the linear association between two variables in a correlation study, and it is indicated by $fe_{CF}$ . It is computed as in Eq. (1), wherein $z$ and $v\rightarrow$ sample values and $s\rightarrow$ number of samples.

$\displaystyle fe_{CF}=\frac{s\left({\sum zv}\right)-\left({\sum z}\right)\left({\sum v}\right)}{\sqrt{\left[{s\sum z^{2}-\left({\sum z}\right)^{2}}\right]\left[{s\sum v^{2}-\left({\sum v}\right)^{2}}\right]}}$ (1)
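As a minimal sketch, the correlation feature can be computed with the standard Pearson form, $s\sum z^{2}-(\sum z)^{2}$ in each denominator factor. The function name `correlation_feature` and the handling of constant samples are assumptions made for illustration.

```python
import math


def correlation_feature(z, v):
    """Pearson correlation of two equal-length samples (standard form of Eq. (1))."""
    s = len(z)
    if s != len(v) or s < 2:
        raise ValueError("z and v must have the same length (at least 2)")
    sum_z, sum_v = sum(z), sum(v)
    sum_zv = sum(a * b for a, b in zip(z, v))
    sum_zz = sum(a * a for a in z)
    sum_vv = sum(b * b for b in v)
    numerator = s * sum_zv - sum_z * sum_v
    denominator = math.sqrt((s * sum_zz - sum_z ** 2) * (s * sum_vv - sum_v ** 2))
    if denominator == 0.0:
        return 0.0  # assumption: a constant sample is treated as uncorrelated
    return numerator / denominator
```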

4.2 Improved log energy entropy (LEE)

Entropy is a concept that refers to how much information is carried by a signal. Furthermore, LEE [27] delivers dependable characteristics with a 0.01 error rate. The LEE is traditionally given by Eq. (2). However, certain modifications to the existing method address the reliability problems. The improved entropy is shown in Eq. (3), wherein $wt\rightarrow$ weight, computed as in Eq. (4), and $E({y_{i}})\rightarrow$ conventional entropy as in Eq. (5).

$\displaystyle fe_{LE}(y)=-\sum\limits_{i=0}^{n-1}{({\log({P_{i}(y)})})^{2}}$ (2)

$\displaystyle fe_{\textit{ILE}}(y)=-\sum\limits_{i=0}^{n-1}{({\log({P_{i}(y)})})^{2}}*wt$ (3)

$\displaystyle wt=\frac{1}{1+e^{-E({y_{i}})}}$ (4)

$\displaystyle E({y_{i}})=-\sum\limits_{i=0}^{n}P_{i}\log P_{i}$ (5)

The improved LEE features are denoted by $fe_{\textit{ILE}}$ .
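Equations (2)–(5) can be sketched as below. The paper does not state how the probabilities $P_{i}(y)$ are estimated, so the relative-frequency estimate in `probabilities` is an assumption, as are all function names and the use of the natural logarithm.

```python
import math
from collections import Counter


def probabilities(y):
    """Assumed estimator: relative frequency of each distinct value in y."""
    counts = Counter(y)
    n = len(y)
    return [c / n for c in counts.values()]


def shannon_entropy(p):
    """Conventional entropy, Eq. (5)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def log_energy_entropy(p):
    """LEE as printed in Eq. (2): negative sum of squared log-probabilities."""
    return -sum(math.log(pi) ** 2 for pi in p if pi > 0)


def improved_log_energy_entropy(p):
    """Improved LEE, Eq. (3), weighted by the sigmoid of the entropy, Eq. (4)."""
    wt = 1.0 / (1.0 + math.exp(-shannon_entropy(p)))
    return log_energy_entropy(p) * wt
```

The sigmoid weight lies in (0.5, 1) for any non-degenerate distribution, so under these assumptions the improved value is a damped version of the LEE.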

4.3 Raw features

The extracted raw (original) features are indicated by $fe_{ra}$ . The original data is extracted in raw features.

4.4 Semantic similarity features

In information retrieval, cosine similarity is a broadly used metric [28]. Textual data is represented as a vector of phrases in this metric. The cosine similarity returns the similarity between the target and the data, i.e., the pair-wise similarities between the row vectors. The extracted cosine similarity features are indicated by $fe_{ss}$ .
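A minimal sketch of the pair-wise cosine similarity between row vectors is given below. The function names and the convention of returning 0 for a zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: similarity with a zero vector is 0
    return dot / (norm_a * norm_b)


def pairwise_cosine(rows):
    """Matrix of pair-wise similarities between the row vectors."""
    return [[cosine_similarity(r1, r2) for r2 in rows] for r1 in rows]
```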

5. Hybrid classifiers: Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM)

Here, LSTM and GRU are used for lipoprotein prediction to get the final result.

5.1 Gated Recurrent Unit (GRU)

GRU [29] features special gates, known as the reset gate and the update gate, for minimizing dispersion with minimal loss. As in Eq. (6), the update gate ( ${ut}$ ) substitutes for the forget and input gates of the LSTM.

$\displaystyle ut=\mu({W_{u}.[{R_{t-1},\textit{Fea}_{t}}]+f_{u}})$ (6)

where $\mu$ denotes the sigmoid activation function, with values between 0 and 1, $\textit{Fea}_{t}$ stands for the input matrix at time step $t$ , and $R_{t-1}$ stands for the hidden state at the prior time step $t-1$ . $W_{u}$ stands for the weight matrix of $ut$ and $f_{u}$ stands for the bias matrix of $ut$ . The reset gate ( ${rt}$ ) regulates how much past information has to be ignored, as shown in Eq. (7), wherein $W_{r}$ denotes the weight matrix of $rt$ and $f_{r}$ denotes the bias matrix of $rt$ .

$\displaystyle rt=\mu({W_{r}.[{R_{t-1},\textit{Fea}_{t}}]+f_{r}})$ (7)

The candidate hidden state is given in Eq. (8), in which $\tanh$ stands for the hyperbolic tangent activation function, $f_{R}$ and $W_{R}$ stand for the bias matrix and weight matrix of the new cell state, and $*$ stands for element-wise multiplication. The output ( $R_{t}$ ) is a linear interpolation between ( ${\tilde{R}_{t}}$ ) and $R_{t-1}$ , as in Eq. (9).

$\displaystyle\tilde{R}_{t}=\tanh({W_{R}.[{R_{t-1}*rt,\textit{Fea}_{t}}]+f_{R}})$ (8)

$\displaystyle R_{t}=({1-ut})*R_{t-1}+ut*\tilde{R}_{t}$ (9)

The GRU output is formulated in Eq. (10), where $\overleftarrow{R}_{t}$ and $\overrightarrow{R}_{t}\rightarrow$ hidden states of the backward and forward GRU, respectively, and $Ct$ corresponds to the technique for combining the outputs in the two directions.

$\displaystyle Yt=Ct({\overrightarrow{R}_{t},\overleftarrow{R}_{t}})$ (10)

The GRU outputs are denoted as $\textit{GRU}_{\textit{Out}}$ .
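A single GRU time step following Eqs (6)–(9) can be sketched in plain Python as below. The parameter dictionary layout and the helper names are assumptions for illustration; each weight matrix acts on the concatenation of the previous hidden state and the current input, and the reset gate is applied element-wise to the previous hidden state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Compute W.x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def gru_step(R_prev, fea_t, params):
    """One GRU update; params holds W_u, f_u, W_r, f_r, W_R, f_R."""
    concat = R_prev + fea_t
    ut = [sigmoid(a) for a in affine(params["W_u"], concat, params["f_u"])]  # Eq. (6)
    rt = [sigmoid(a) for a in affine(params["W_r"], concat, params["f_r"])]  # Eq. (7)
    gated = [r * h for r, h in zip(rt, R_prev)] + fea_t
    R_cand = [math.tanh(a) for a in affine(params["W_R"], gated, params["f_R"])]  # Eq. (8)
    # Eq. (9): interpolate between the previous and the candidate state.
    return [(1.0 - u) * h + u * c for u, h, c in zip(ut, R_prev, R_cand)]
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which makes a convenient sanity check.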

5.2 Long Short Term Memory classifier (LSTM)

The LSTM [30] comprises three parts: a forget gate, an input gate, and an output gate. Let the variables ${Z}$ and $D$ be the hidden and cell states, and let ( ${X_{t},D_{t-1},Z_{t-1}}$ ) and ( ${{Z}_{t},D_{t}}$ ) be the inputs and outputs of the layer.

The LSTM uses the forget gate $F_{t}$ to filter the data, as in Eq. (11), in which $\sigma\rightarrow$ activation function, and ( ${J_{ZF},L_{ZF}}$ ) and ( ${J_{IF},L_{IF}}$ ) denote the weight and bias parameters that map the hidden and input layers to the forget gate. At time $t$ , the forget gate $\rightarrow F_{t}$ , input gate $\rightarrow I_{t}$ , and output gate $\rightarrow O_{t}$ .

$\displaystyle F_{t}=\sigma({J_{IF}X_{t}+J_{ZF}Z_{t-1}+L_{IF}+L_{ZF}})$ (11)

The input gate $I_{t}$ is applied by the LSTM as in Eqs (12)–(14); here, ( ${J_{ZG},L_{ZG}}$ ) and $({J_{IG},L_{IG}})\rightarrow$ weight and bias parameters that map the hidden and input layers to the cell gate $G_{t}$ , and ( ${J_{ZI},L_{ZI}}$ ) and ( ${J_{II},L_{II}}$ ) denote the weight and bias parameters that map the hidden and input layers to $I_{t}$ . The LSTM obtains the output hidden state from $O_{t}$ as in Eqs (15) and (16), wherein ( ${J_{ZO},L_{ZO}}$ ) and $({J_{IO},L_{IO}})\rightarrow$ weight and bias parameters that map the hidden and input layers to $O_{t}$ .

$\displaystyle G_{t}=\tanh({L_{IG}+J_{IG}X_{t}+L_{ZG}+J_{ZG}{Z}_{t-1}})$ (12)

$\displaystyle I_{t}=\sigma({J_{II}X_{t}+J_{ZI}Z_{t-1}+L_{II}+L_{ZI}})$ (13)

$\displaystyle D_{t}=F_{t}D_{t-1}+I_{t}G_{t}$ (14)

$\displaystyle O_{t}=\sigma({L_{IO}+J_{IO}X_{t}+L_{ZO}+J_{ZO}{Z}_{t-1}})$ (15)

$\displaystyle{Z}_{t}=O_{t}\tanh({D_{t}})$ (16)

The LSTM outputs are denoted by $\textit{LSTM}_{\textit{Out}}$ .
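One LSTM time step following Eqs (11)–(16) can be sketched as below. The dictionary keys mirror the paper's symbols (for example `J_IF`, `J_ZF`, `L_IF`, `L_ZF` for the forget gate); the helper names and the dictionary layout are assumptions for illustration, and the sigmoid is assumed for $\sigma$.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(J, x):
    return [sum(j * xi for j, xi in zip(row, x)) for row in J]


def gate(p, name, X_t, Z_prev, act):
    """act(J_I* X_t + J_Z* Z_prev + L_I* + L_Z*) for the gate called `name`."""
    a = matvec(p["J_I" + name], X_t)
    b = matvec(p["J_Z" + name], Z_prev)
    return [act(ai + bi + li + lz)
            for ai, bi, li, lz in zip(a, b, p["L_I" + name], p["L_Z" + name])]


def lstm_step(X_t, Z_prev, D_prev, p):
    """Return the new hidden state Z_t and cell state D_t."""
    F_t = gate(p, "F", X_t, Z_prev, sigmoid)    # Eq. (11)
    G_t = gate(p, "G", X_t, Z_prev, math.tanh)  # Eq. (12)
    I_t = gate(p, "I", X_t, Z_prev, sigmoid)    # Eq. (13)
    D_t = [f * d + i * g for f, d, i, g in zip(F_t, D_prev, I_t, G_t)]  # Eq. (14)
    O_t = gate(p, "O", X_t, Z_prev, sigmoid)    # Eq. (15)
    Z_t = [o * math.tanh(d) for o, d in zip(O_t, D_t)]  # Eq. (16)
    return Z_t, D_t
```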

Advantages of LSTM and GRU:

•

LSTM is more accurate on a larger dataset. It is able to deal with longer data sequences.

•

GRU uses fewer training parameters and less memory.

•

GRU runs faster and provides the results quickly.

5.3 Proposed Selfish combined Henry Gas Solubility Optimization scheme with cubic map initialization (SHGSO) model

Objective: The objective is to minimize the error ( $er$ ), as given in Eq. (17).

$\displaystyle\textit{Obj}=\min(er)$ (17)

Solution Encoding: The GRU weights $W$ are chosen by the SHGSO scheme. The solutions are shown in Fig. 2, wherein the total count of GRU weights is indicated by $un$ .

Figure 2.

Solution encoding.

The existing HGSO [31] identifies optimum solutions but suffers from slow convergence and a lack of accuracy. Many evolutionary algorithms share common weaknesses, including premature convergence and difficulty in escaping local optima. The operators used to update the positions of search agents are typically the source of these problems. For instance, in particle swarm optimization, search agents are typically drawn to the location of the current best individual, which drives the entire population to congregate around the best particle found so far and favors premature convergence. Hybridization refers to the merging of metaheuristic algorithms to create a new, more powerful algorithm that draws on the strengths of the merged algorithms, and hybridized optimization techniques are reported to be well suited to specific search problems [33, 34, 35, 36]. To alleviate the drawbacks of standard HGSO, SHGSO therefore combines the concepts of the Selfish Optimizer (SFO) [32] and HGSO; the SFO update is intended to provide better convergence and improved accuracy.

Henry's Law: According to Henry's law, the amount of a specific gas that dissolves in a given kind and volume of liquid at a fixed temperature is directly proportional to the partial pressure of that gas in equilibrium with that liquid. Henry's law is therefore temperature dependent, and the solubility of a gas ( ${B_{n}}$ ) is directly proportional to its partial pressure ( ${R_{n}}$ ), as specified by Eq. (18), where $G\rightarrow$ Henry's constant.

$\displaystyle B_{n}=G\times R_{n}$ (18)

Step 1: Initialization: Equation (19) is conventionally used to assign the number of gases (population size $P$ ) and their positions. In SHGSO, the initialization instead depends on a cubic map, as stated in Eq. (20), wherein $x^{m}$ lies in [0, 1] and $\rho=$ 2.59.

$\displaystyle H_{i}({\hat{l}+1})=H_{\min}+r\times({H_{\max}-H_{\min}})$ (19)

$\displaystyle x^{m+1}=\rho x^{m}({1-({x^{m}})^{2}})$ (20)

where $H_{i}$ denotes the location of the $i^{\text{th}}$ gas in the population $P$ , $r$ is a random number between 0 and 1, $H_{\max}$ and $H_{\min}$ are the problem's boundaries, and $\hat{l}$ is the iteration index. Equation (21) is used to initialize, for gas $i$ , the Henry's constant of type $\hat{m}$ ( $G_{\hat{m}}(\hat{l})$ ), the partial pressure $R_{i,\hat{m}}$ of gas $i$ in the cluster $\hat{m}$ , and the $\frac{\nabla_{\textit{sol}}A}{S}$ constant value of type $\hat{m}$ ( $T_{\hat{m}}$ ); here, $\hat{k}_{1}$ , $\hat{k}_{2}$ , and $\hat{k}_{3}$ are constants with values equal to $5\times 10^{-2}$ , $100$ , and $1\times 10^{-2}$ , respectively.

$\displaystyle G_{\hat{m}}(\hat{l})=\hat{k}_{1}\times\textit{rand}({0,1})$

$\displaystyle R_{i,\hat{m}}=\hat{k}_{2}\times\textit{rand}({0,1})$ (21)

$\displaystyle T_{\hat{m}}=\hat{k}_{3}\times\textit{rand}({0,1})$
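The initialization step can be sketched as below. This sketch assumes that the chaotic value produced by the cubic map of Eq. (20) takes the place of the random number $r$ in Eq. (19), and it reads the printed constants $\hat{k}_{1}$ , $\hat{k}_{2}$ , $\hat{k}_{3}$ as 5E-02, 100, and 1E-02; the function names and the random starting point of the chaotic sequence are illustrative assumptions.

```python
import random

RHO = 2.59  # cubic-map parameter stated with Eq. (20)


def cubic_map(x, rho=RHO):
    """Eq. (20): x_{m+1} = rho * x_m * (1 - x_m^2)."""
    return rho * x * (1.0 - x * x)


def init_population(pop_size, dim, h_min, h_max, seed=0):
    """Positions scaled into [h_min, h_max] from one chaotic sequence."""
    rng = random.Random(seed)
    x = rng.uniform(0.05, 0.95)  # assumed starting point of the sequence
    population = []
    for _ in range(pop_size):
        agent = []
        for _ in range(dim):
            x = cubic_map(x)
            agent.append(h_min + x * (h_max - h_min))
        population.append(agent)
    return population


def init_gas_constants(n_clusters, pop_size, rng, k1=5e-2, k2=100.0, k3=1e-2):
    """Eq. (21): Henry's constants G, partial pressures R, and constants T."""
    G = [k1 * rng.random() for _ in range(n_clusters)]
    R = [k2 * rng.random() for _ in range(pop_size)]
    T = [k3 * rng.random() for _ in range(n_clusters)]
    return G, R, T
```

For $\rho=$ 2.59 the map attains its maximum of about 0.997 at $x=1/\sqrt{3}$ , so the sequence stays inside [0, 1] and the scaled positions stay inside the bounds.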

Step 2: Clustering: The population is divided into clusters of gas types; the Henry's constant value $({G_{\hat{m}}})$ is the same for all gases within a cluster because the gases in each cluster are similar.

Step 3: Evaluation: Every cluster $\hat{m}$ is examined, and the gases are ranked.

Step 4: Update Henry's coefficient as in Eq. (22). Conventionally, $C(\hat{l})$ is calculated as in Eq. (23). As per SHGSO, $C(\hat{l})$ is calculated as in Eq. (24).

$\displaystyle G_{\hat{m}}({\hat{l}+1})=G_{\hat{m}}(\hat{l})\times\exp\left({-T_{\hat{m}}\times\left({\frac{1}{C(\hat{l})}-\frac{1}{C^{\theta}}}\right)}\right)$ (22)

$\displaystyle C(\hat{l})=\exp\left({-\frac{\hat{l}}{\textit{iter}}}\right)$ (23)

$\displaystyle C(\hat{l})=\exp\left({-\hat{l}*\left({1-\frac{\hat{l}}{\hat{l}_{\max}}}\right)}\right)$ (24)

where $G_{\hat{m}}$ is the Henry's coefficient of cluster $\hat{m}$ , $C$ is the temperature, $C^{\theta}$ is a constant equal to 298.15, and $\textit{iter}$ and $\hat{l}_{\max}$ indicate the total number of iterations and the maximal iteration, respectively.
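The two temperature schedules and the coefficient update of Eq. (22) can be compared with the sketch below. The function names are illustrative, and the lower bound on the temperature is an added numerical safeguard that is not part of the paper.

```python
import math

C_THETA = 298.15  # reference constant stated with Eq. (22)


def temperature_hgso(l, total_iter):
    """Conventional schedule, Eq. (23)."""
    return math.exp(-l / total_iter)


def temperature_shgso(l, l_max):
    """SHGSO schedule, Eq. (24)."""
    return math.exp(-l * (1.0 - l / l_max))


def update_henry(G_m, T_m, temp, c_theta=C_THETA):
    """Henry's coefficient update, Eq. (22)."""
    temp = max(temp, 1e-300)  # safeguard (assumption): avoid division by zero
    return G_m * math.exp(-T_m * (1.0 / temp - 1.0 / c_theta))
```

Under Eq. (24) the temperature equals 1 at both the first and the last iteration and dips in between, whereas Eq. (23) decays monotonically.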

Step 5: Update solubility: It is updated as in Eq. (25); here, $R_{i,\hat{m}}$ denotes the partial pressure of gas $i$ in cluster $\hat{m}$ , $O$ denotes a constant, and $B_{i,\hat{m}}$ indicates the solubility of gas $i$ in cluster $\hat{m}$ .

$\displaystyle B_{i,\hat{m}}(\hat{l})=O\times G_{\hat{m}}({\hat{l}+1})\times R_{i,\hat{m}}(\hat{l})$ (25)

Step 6: Update position: Conventionally, the position is updated by the HGSO scheme. As per SHGSO, the position is updated with the Selfish Optimizer (SFO) update, as specified in Eq. (26).

$\displaystyle H_{i,\hat{m}}({\hat{d}^{\hat{l}+1}})=u_{i,\hat{m}}(\hat{d})+2\,\varpi\,({h_{r}^{\hat{d}}-H_{i,\hat{m}}(\hat{d})})$ (26)

where $H_{i,\hat{m}}$ denotes the attacking predator, $\varpi$ refers to an arbitrary number, and $h_{r}^{\hat{d}}$ indicates a randomly selected herd member.
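Equations (25) and (26) can be sketched as below. Several choices here are assumptions: the leading term $u_{i,\hat{m}}$ of Eq. (26) is taken to be the agent's current position, $\varpi$ is drawn uniformly from [0, 1), and the new position is clipped to the search bounds. The function names are illustrative.

```python
import random


def update_solubility(O, G_next, R_i):
    """Eq. (25): solubility from the updated Henry's coefficient."""
    return O * G_next * R_i


def sfo_position_update(H_i, herd, rng, h_min, h_max):
    """Eq. (26): move an agent toward a randomly selected herd member."""
    h_r = rng.choice(herd)
    w = rng.random()  # assumed range of the arbitrary number
    moved = [h + 2.0 * w * (t - h) for h, t in zip(H_i, h_r)]
    return [min(max(x, h_min), h_max) for x in moved]  # assumed bound handling
```

Because the step factor lies in [0, 2), the agent can overshoot the selected herd member, which keeps some exploration in the move.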

Step 7: Escaping from local optima: Equation (27) is used to rank and select the number of worst agents ( ${\hat{M}_{\textit{worst}}}$ ), where $\hat{M}\rightarrow$ number of search agents.

$\displaystyle\hat{M}_{\textit{worst}}=\hat{M}\times({\textit{rand}({\hat{a}_{2}-\hat{a}_{1}})+\hat{a}_{1}}),\quad\hat{a}_{1}=0.1\,\text{and}\,\hat{a}_{2}=0.2$ (27)

Step 8: Worst agents position: Update the worst agents' positions as in Eq. (28), wherein $\hat{D}_{({i,\hat{m}})}$ is the location of gas $i$ in the cluster $\hat{m}$ , $r$ denotes a random number, and $\hat{D}_{\min}$ , $\hat{D}_{\max}$ are the problem's boundaries.

$\displaystyle\hat{D}_{({i,\hat{m}})}=\hat{D}_{\min({i,\hat{m}})}+r\times({\hat{D}_{\max({i,\hat{m}})}-\hat{D}_{\min({i,\hat{m}})}})$ (28)
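Steps 7 and 8 can be sketched as below for a minimization problem. The sketch reads Eq. (27) as a uniform random fraction between $\hat{a}_{1}$ and $\hat{a}_{2}$ of the population; rounding to the nearest integer and the function names are assumptions for illustration.

```python
import random


def worst_agent_count(M, rng, a1=0.1, a2=0.2):
    """Eq. (27): number of worst agents, between 10% and 20% of M."""
    return int(round(M * (rng.random() * (a2 - a1) + a1)))


def reinit_worst(population, fitness, rng, d_min, d_max):
    """Eq. (28): re-sample the worst agents uniformly inside the bounds.

    Assumes minimization, so a larger fitness value is worse.
    Returns the indices of the agents that were re-initialized.
    """
    n_worst = worst_agent_count(len(population), rng)
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    worst = order[:n_worst]
    for i in worst:
        population[i] = [d_min + rng.random() * (d_max - d_min)
                         for _ in population[i]]
    return worst
```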

Exploration and exploitation phases: Equations (29) and (30) are utilized to compute the exploration/exploitation percentages, wherein $\textit{Div}^{\hat{l}}$ and $\textit{Div}_{\max}\rightarrow$ population diversity at the $\hat{l}^{\text{th}}$ iteration and maximum diversity over all iterations, respectively.

$\displaystyle\textit{Exploration}\%=\frac{\textit{Div}^{\hat{l}}}{\textit{Div}_{\max}}\times 100$ (29)

$\displaystyle\textit{Exploitation}\%=\frac{|{\textit{Div}^{\hat{l}}-\textit{Div}_{\max}}|}{\textit{Div}_{\max}}\times 100$ (30)
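Equations (29) and (30) can be sketched as below. The paper does not define how the diversity itself is measured, so `population_diversity` uses one common choice, the mean absolute deviation from the per-dimension median, purely as an assumption; all function names are illustrative.

```python
from statistics import median


def population_diversity(population):
    """Assumed measure: mean absolute deviation from the median, averaged over dimensions."""
    columns = list(zip(*population))
    per_dim = [sum(abs(median(col) - x) for x in col) / len(col) for col in columns]
    return sum(per_dim) / len(per_dim)


def exploration_pct(div_l, div_max):
    """Eq. (29)."""
    return div_l / div_max * 100.0


def exploitation_pct(div_l, div_max):
    """Eq. (30)."""
    return abs(div_l - div_max) / div_max * 100.0
```

Since the diversity at any iteration cannot exceed the maximum diversity, the two percentages sum to 100.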

Algorithm 1: Developed SHGSO scheme
1. Initialize as per Eq. (20)
2. Divide the population agents into a number of gas types with the same Henry's constant value $G_{\hat{m}}$
3. Evaluate each cluster $\hat{m}$
4. Obtain the best gas $H_{({i,\textit{best}})}$ , and best search agent $H_{\textit{best}}$
5. while $\hat{l}<$ maximum number of iterations do
6. for each search agent do
7. Update the position based upon the Selfish Optimizer (SFO) as in Eq. (26).
8. end for
9. Update Henry's coefficient of each gas type using Eq. (22).
10. Update solubility of each gas using Eq. (25).
11. Rank and select the number of worst agents using Eq. (27).
12. Update the position of the worst agents using Eq. (28).
13. Update the best gas $H_{({i,\textit{best}})}$ , and best search agent $H_{\textit{best}}$
14. $\hat{l}=\hat{l}+1$
15. end while
16. return $H_{\textit{best}}$
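The overall flow of Algorithm 1 can be sketched on a toy objective as below. This is a simplified illustration and not the authors' Matlab implementation: it assumes a minimization problem, takes the leading term of Eq. (26) to be the current position, draws $\varpi$ from [0, 1), sets $O=1$ , assigns gases to clusters by index, and reads the printed constants as 5E-02, 100, and 1E-02. In the paper, the position vector would hold the GRU weights and the objective would be the error of Eq. (17).

```python
import math
import random


def shgso(objective, dim, h_min, h_max, pop_size=20, n_clusters=4,
          l_max=50, seed=0):
    """Minimize `objective` with a simplified SHGSO-style loop."""
    rng = random.Random(seed)
    # Step 1: cubic-map initialization, Eq. (20), with rho = 2.59.
    x = rng.uniform(0.05, 0.95)  # assumed start of the chaotic sequence
    pop = []
    for _ in range(pop_size):
        agent = []
        for _ in range(dim):
            x = 2.59 * x * (1.0 - x * x)
            agent.append(h_min + x * (h_max - h_min))
        pop.append(agent)
    # Eq. (21): Henry's constants, T constants, and partial pressures.
    G = [5e-2 * rng.random() for _ in range(n_clusters)]
    T = [1e-2 * rng.random() for _ in range(n_clusters)]
    R = [100.0 * rng.random() for _ in range(pop_size)]
    fit = [objective(a) for a in pop]
    i_best = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = list(pop[i_best]), fit[i_best]
    history = [best_fit]
    for l in range(1, l_max + 1):
        # Step 6: SFO-style move toward a random herd member, Eq. (26).
        for i in range(pop_size):
            h_r = pop[rng.randrange(pop_size)]
            w = rng.random()
            pop[i] = [min(max(h + 2.0 * w * (t - h), h_min), h_max)
                      for h, t in zip(pop[i], h_r)]
        # Step 4: temperature by Eq. (24), Henry's coefficient by Eq. (22).
        temp = max(math.exp(-l * (1.0 - l / l_max)), 1e-300)
        G = [g * math.exp(-t * (1.0 / temp - 1.0 / 298.15))
             for g, t in zip(G, T)]
        # Step 5: solubility, Eq. (25); the move of Eq. (26) as printed
        # does not consume it, so it is only tracked here.
        solubility = [G[i % n_clusters] * R[i] for i in range(pop_size)]
        fit = [objective(a) for a in pop]
        # Steps 7-8: re-initialize the worst agents, Eqs (27) and (28).
        n_worst = int(round(pop_size * (rng.random() * 0.1 + 0.1)))
        worst = sorted(range(pop_size), key=fit.__getitem__, reverse=True)[:n_worst]
        for i in worst:
            pop[i] = [h_min + rng.random() * (h_max - h_min) for _ in range(dim)]
            fit[i] = objective(pop[i])
        i_best = min(range(pop_size), key=fit.__getitem__)
        if fit[i_best] < best_fit:
            best, best_fit = list(pop[i_best]), fit[i_best]
        history.append(best_fit)
    return best, best_fit, history
```

For example, calling `shgso(lambda a: sum(v * v for v in a), dim=3, h_min=-5.0, h_max=5.0)` returns the best position found, its objective value, and the best-so-far history, which is non-increasing because the best agent is retained across iterations.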

6. Results and discussion

6.1 Simulation set up

The lipoprotein prediction scheme was implemented in Matlab. HC $+$ SHGSO was compared against HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO and HC $+$ SFO. The comparison was also carried out against classifiers such as CNN, RNN, SVM, RF, DBN, ANN [37], and SVM-DBN [38]. The evaluation was performed for varied learning percentages, and the dataset was downloaded from [39].

Dataset description: Circulating microRNAs (miRNA) are relatively stable in plasma and are a new class of disease biomarkers. Here we present evidence that human HDL transports endogenous miRNAs and delivers them to recipient cells with functional targeting capabilities. Highly-purified fractions of human HDL contain small RNAs, and the HDL-miRNA profile from normal subjects is significantly different than familial hypercholesterolemia subjects. miRNAs were demonstrated to associate with native and reconstituted HDL particles and reconstituted HDL injected into mice retrieved distinct miRNA profiles from normal and atherogenic models. Cellular export of miRNAs to HDL was regulated by neutral sphingomyelinase. HDL-mediated delivery of miRNAs to recipient cells was demonstrated to be scavenger receptor BI-dependent.

Furthermore, HDL delivery of both exogenous and endogenous miRNAs resulted in the direct targeting of mRNA reporters. Notably, HDL-miRNA from atherosclerotic subjects induced differential gene expression, with significant loss of conserved mRNA targets in cultured hepatocytes. These observations suggest that HDL participates in a novel mechanism of intercellular communication involving the transport and delivery of miRNAs. Gene expression changes in human Huh7 cells with familial hypercholesterolemia HDL treatment. Gene expression (mRNA) profiles in human Huh7 cells treated with normal HDL ( $n=$ 3) or familial hypercholesterolemia (FH) HDL ( $n=$ 3) in lipoprotein-depleted serum (48 h).

6.2 Analysis on performance

HC $+$ SHGSO is evaluated against traditional models on different metrics. The assessment of HC $+$ SHGSO over the HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO and HC $+$ SFO schemes is shown in Figs 3–5 for varied learning percentages. Tables 2–4 report the comparison of HC $+$ SHGSO with the traditional CNN, RNN, SVM, RF, DBN, ANN [37], and SVM-DBN [38]. Here, HC $+$ SHGSO has provided better results than HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, HC $+$ SFO, CNN, RNN, SVM, RF, DBN, ANN [37] and SVM-DBN [38]. Higher specificity and accuracy for HC $+$ SHGSO are obtained at the 80% learning percentage. At the 70% learning percentage, HC $+$ SHGSO has gained rather higher sensitivity than at the 60%, 80% and 90% learning percentages. In Table 2, HC $+$ SHGSO has gained the best accuracy outcome of 0.96 at the 80% learning percentage. Further, ANN has gained better outcomes after HC $+$ SHGSO. Thus, the superiority of HC $+$ SHGSO is established over HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, HC $+$ SFO, CNN, RNN, SVM, RF, DBN, ANN and SVM-DBN.

Figure 3.

Analysis via HC $+$ SHGSO.

Figure 4.

Analysis via HC $+$ SHGSO.

Figure 5.

Analysis via HC $+$ SHGSO.

Table 2

Analysis via HC $+$ SHGSO over other classifier schemes

Learning Percentage $=$ 60%
Metrics	CNN	RNN	SVM	RF	DBN	ANN [37]	SVM-DBN [38]	HC $+$ SHGSO
FDR	0.3674	0.25912	0.39263	0.3422	0.36156	0.36947	0.25105	0.098148
Sensitivity	0.79792	0.78971	0.80269	0.73663	0.83775	0.85646	0.74895	0.90553
Accuracy	0.86137	0.85423	0.86544	0.80428	0.89399	0.90826	0.81651	0.94597
F1-score	0.8876	0.8825	0.89055	0.84834	0.91171	0.92268	0.85646	0.94839
Precision	0.77464	0.82571	0.80417	0.83768	0.79554	0.74017	0.71397	0.90185
FPR	0.33349	0.35377	0.39931	0.393	0.37755	0.38776	0.34587	0.10729
MCC	0.74398	0.73169	0.75103	0.64659	0.80095	0.82635	0.66733	0.89727
Specificity	0.83148	0.79178	0.80649	0.82285	0.80026	0.77007	0.78003	0.89271
FNR	0.20208	0.21029	0.19731	0.26337	0.16225	0.14354	0.25105	0.13575
NPV	0.74133	0.72044	0.70384	0.72562	0.77986	0.80458	0.75198	0.89271
Learning Percentage $=$ 70%
Specificity	0.81502	0.84305	0.74633	0.70512	0.70845	0.84737	0.79539	0.91553
NPV	0.78052	0.8146	0.79799	0.70914	0.80167	0.77696	0.71807	0.91553
Accuracy	0.8519	0.87228	0.83288	0.87364	0.81658	0.87228	0.79484	0.95788
FPR	0.27955	0.26855	0.37237	0.28528	0.28297	0.33401	0.27588	0.084469
Sensitivity	0.79198	0.81532	0.77138	0.81693	0.75455	0.81532	0.73322	0.91825
FNR	0.20802	0.18468	0.22862	0.18307	0.24545	0.18468	0.26678	0.11444
F1-score	0.88392	0.89827	0.87093	0.89924	0.8601	0.89827	0.84608	0.95969
MCC	0.72323	0.75932	0.68978	0.76174	0.66122	0.75932	0.62314	0.91901
Precision	0.74015	0.75516	0.75722	0.82883	0.76342	0.79832	0.84559	0.9225
FDR	0.27353	0.34598	0.38494	0.32271	0.37993	0.35344	0.3918	0.0775
Learning Percentage $=$ 80%
Sensitivity	0.75	0.77716	0.75405	0.83036	0.75815	0.84802	0.87736	0.8999
Accuracy	0.81059	0.83707	0.81466	0.88391	0.81874	0.89817	0.92057	0.96538
FDR	0.32953	0.34765	0.37443	0.36277	0.38092	0.28036	0.38439	0.064394
Precision	0.74348	0.80901	0.72359	0.83688	0.83734	0.72994	0.7838	0.93561
Specificity	0.81506	0.79087	0.82285	0.81079	0.7954	0.80865	0.73283	0.93033
FPR	0.363	0.27355	0.29767	0.29298	0.32519	0.30496	0.26441	0.069672
F1-score	0.85714	0.87461	0.85978	0.90732	0.86244	0.91776	0.93467	0.96673
MCC	0.64884	0.69562	0.65603	0.77917	0.66323	0.805	0.84614	0.93296
FNR	0.25	0.22284	0.24595	0.16964	0.24185	0.15198	0.19001	0.12264
NPV	0.8463	0.7527	0.75716	0.84236	0.76791	0.79279	0.73961	0.93033
Learning Percentage $=$ 90%
Specificity	0.83523	0.78295	0.78889	0.71905	0.75205	0.83418	0.70696	0.90833
FDR	0.32009	0.32596	0.30314	0.35054	0.3991	0.34197	0.39282	0.080882
Accuracy	0.8898	0.77959	0.80816	0.80816	0.77959	0.86531	0.84898	0.9551
FPR	0.3132	0.35048	0.39688	0.33952	0.33768	0.31092	0.36314	0.091667
MCC	0.79153	0.59886	0.64849	0.64849	0.59886	0.74809	0.71945	0.91371
FNR	0.16463	0.28272	0.25543	0.25543	0.28272	0.19412	0.21264	0.11244
Precision	0.71812	0.75912	0.80692	0.80177	0.70286	0.80317	0.8303	0.91912
F1-score	0.9103	0.83537	0.85358	0.85358	0.83537	0.89251	0.88103	0.95785
Sensitivity	0.83537	0.71728	0.74457	0.74457	0.71728	0.80588	0.78736	0.91946
NPV	0.74534	0.79773	0.76427	0.72225	0.81255	0.71735	0.80982	0.90833
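For reference, the ten metrics listed in Table 2 can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper's experiments. It assumes every denominator is non-zero, that is, both classes occur and both are predicted at least once.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives. Illustrative helper only.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Precision": precision,
        "NPV": npv,
        "Accuracy": accuracy,
        "F1-score": f1,
        "MCC": mcc,
        "FDR": 1 - precision,      # false discovery rate
        "FPR": 1 - specificity,    # false positive rate
        "FNR": 1 - sensitivity,    # false negative rate
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
for name, value in confusion_metrics(tp=90, fp=10, fn=10, tn=90).items():
    print(f"{name}: {value:.4f}")
```

Higher values are better for sensitivity, specificity, precision, NPV, accuracy, F1-score and MCC, while lower values are better for FDR, FPR and FNR.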

6.3 Statistical analysis

Table 3 presents the statistical study of accuracy for HC $+$ SHGSO against HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, HC $+$ SFO, CNN, RNN, SVM, RF, DBN, ANN and SVM-DBN. Because metaheuristic schemes are stochastic, each model is evaluated several times to allow a fair comparison. HC $+$ SHGSO attains an accuracy of 0.956 in both the median and mean cases, while HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, HC $+$ SFO, CNN, RNN, SVM, RF, DBN, ANN, and SVM-DBN attain lower median and mean accuracy. HC $+$ SHGSO also attains the highest accuracy in the best case. Thus, incorporating the enhanced features and the optimization improves the HC $+$ SHGSO method.

Table 3
Statistical study on accuracy

Metrics	SVM-DBN	ANN	DBN	RF	SVM	RNN	CNN	HC $+$ CSO	HC $+$ SSO	HC $+$ HBA	HC $+$ HGSO	HC $+$ SFO	HC $+$ SHGSO
Std dev	0.0549	0.0205	0.0480	0.0421	0.0257	0.0401	0.0328	0.0233	0.0185	0.0172	0.0204	0.0081	0.0080
Best	0.9206	0.9083	0.8940	0.8839	0.8654	0.8723	0.8898	0.9306	0.9307	0.9347	0.9246	0.9061	0.9654
Worst	0.7948	0.8653	0.7796	0.8043	0.8082	0.7796	0.8106	0.8787	0.8860	0.8941	0.8807	0.8900	0.9460
Median	0.8328	0.8852	0.8177	0.8409	0.8238	0.8457	0.8566	0.9015	0.9130	0.9190	0.9184	0.8977	0.9565
Mean	0.8452	0.8860	0.8272	0.8425	0.8303	0.8358	0.8534	0.9031	0.9107	0.9167	0.9106	0.8979	0.9561
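As an illustration of how such summary statistics are obtained, the sketch below summarizes the four HC $+$ SHGSO accuracies reported in Table 2 (learning percentages 60, 70, 80 and 90). That the Table 3 column was computed from exactly these four values is an assumption based on the numbers agreeing, not something the text states. The helper name `summarize` is hypothetical, the best case is taken to be the highest accuracy, and the sample standard deviation (n − 1 denominator) is used.

```python
import statistics

# HC + SHGSO accuracies from Table 2 at learning percentages 60, 70, 80, 90.
accuracies = [0.94597, 0.95788, 0.96538, 0.9551]


def summarize(values):
    """Return best, worst, mean, median and sample standard deviation."""
    return {
        "best": max(values),      # highest accuracy
        "worst": min(values),     # lowest accuracy
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample std (n - 1)
    }


summary = summarize(accuracies)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, these values agree with the HC $+$ SHGSO column of Table 3: mean 0.9561, median 0.9565, standard deviation 0.0080, and extreme values 0.9654 and 0.9460.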

6.4 Convergence analysis

Fig. 6 shows the cost of the SHGSO scheme against HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, and HC $+$ SFO. Notably, SHGSO attains a lower cost from the 20th to the 50th iteration. As Fig. 6 shows, SHGSO reaches a cost of 1.0418, which is lower than that of HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO and HC $+$ SFO. Thus, the SHGSO scheme achieves better results.

Figure 6.

Convergence analysis of SHGSO over others.

6.5 Ablation study

Table 4 compares the developed HC $+$ SHGSO technique with the adopted model without optimization, HC $+$ SHGSO with conventional entropy, and HC $+$ SHGSO with raw features for varied learning percentages. At the 60% learning percentage, HC $+$ SHGSO records an FDR of 0.098, which is lower than that of the adopted model without optimization (0.303), HC $+$ SHGSO with conventional entropy (0.258), and HC $+$ SHGSO with raw features (0.306). After HC $+$ SHGSO, the variant with conventional entropy gives better values than the adopted model without optimization and the variant with raw features at most learning percentages. This improvement is attributed to the proposed features and the SHGSO concept.

Table 4
Comparison of HC $+$ SHGSO

Learning Percentage $=$ 60%
Metrics	An adopted model with no optimization	HC $+$ SHGSO with conventional entropy	HC $+$ SHGSO with raw features	HC $+$ SHGSO
NPV	0.69775	0.7637	0.70358	0.89271
Precision	0.69671	0.74206	0.69366	0.90185
Sensitivity	0.85201	0.83069	0.8213	0.90553
Accuracy	0.82161	0.85933	0.82263	0.94597
F1-score	0.82125	0.85193	0.81913	0.94839
FDR	0.30329	0.25794	0.30634	0.098148
Specificity	0.69775	0.7637	0.70358	0.89271
FPR	0.30225	0.2363	0.29642	0.10729
MCC	0.69723	0.7528	0.6986	0.89727
FNR	0.23899	0.38711	0.36603	0.13575
Learning Percentage $=$ 70%
FPR	0.29867	0.28913	0.32854	0.084469
NPV	0.70133	0.71087	0.67146	0.91553
Accuracy	0.81658	0.81929	0.78261	0.95788
FNR	0.33979	0.30123	0.36993	0.11444
Sensitivity	0.81155	0.86791	0.84341	0.91825
Precision	0.6778	0.67482	0.6088	0.9225
F1-score	0.80797	0.80584	0.75684	0.95969
Specificity	0.70133	0.71087	0.67146	0.91553
MCC	0.68947	0.69261	0.63936	0.91901
FDR	0.3222	0.32518	0.3912	0.0775
Learning Percentage $=$ 80%
Accuracy	0.81263	0.84929	0.80244	0.96538
NPV	0.70886	0.75	0.69876	0.93033
FNR	0.35299	0.26515	0.26045	0.12264
Precision	0.65543	0.72491	0.63534	0.93561
FPR	0.29114	0.25	0.30124	0.069672
Specificity	0.70886	0.75	0.69876	0.93033
MCC	0.68162	0.73735	0.66629	0.93296
Sensitivity	0.89036	0.80586	0.87347	0.8999
F1-score	0.79186	0.84052	0.77701	0.96673
FDR	0.34457	0.27509	0.36466	0.064394
Learning Percentage $=$ 90%
Accuracy	0.82857	0.75102	0.82857	0.9551
Sensitivity	0.82448	0.84998	0.89291	0.91946
MCC	0.70461	0.59854	0.70257	0.91371
Precision	0.67939	0.57042	0.66929	0.91912
FDR	0.32061	0.42958	0.33071	0.080882
Specificity	0.73077	0.62805	0.7375	0.90833
FPR	0.26923	0.37195	0.2625	0.091667
FNR	0.39	0.33069	0.29523	0.11244
F1-score	0.80909	0.72646	0.80189	0.95785
NPV	0.73077	0.62805	0.7375	0.90833

6.6 Receiver operating characteristic (ROC) curve

A receiver operating characteristic (ROC) curve is a graphical plot that shows how the diagnostic ability of a binary classifier changes as its discrimination threshold is varied. It summarizes performance by combining the confusion matrices obtained at all threshold values. Figure 7 depicts the ROC curve. The values for the proposed approach are higher than those of the other conventional approaches.

6.7 Area under curve

AUC stands for area under the ROC curve. It measures the degree of separability, that is, how well the model distinguishes between classes. The higher the AUC, the better the model is at differentiating between the positive and negative classes. Table 5 depicts the AUC analysis.

Table 5
Area under curve analysis

Methods	Learning Percentage
	60	70	80	90
HC $+$ CSO	0.9102	0.8869	0.9111	0.8567
HC $+$ SSO	0.8960	0.9165	0.8806	0.8715
HC $+$ HBA	0.9073	0.9284	0.9229	0.8664
HC $+$ HGSO	0.8827	0.9136	0.8782	0.8821
HC $+$ SFO	0.8988	0.8998	0.8794	0.8664
HC $+$ SHGSO	0.9534	0.9674	0.9487	0.9084
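The ROC and AUC quantities discussed in Sections 6.6 and 6.7 can be illustrated with a minimal sketch. It sweeps the decision threshold over the classifier scores, records the (FPR, TPR) pair at each threshold, and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `auc`, as well as the labels and scores, are hypothetical examples and not the paper's data; the sketch assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points of the ROC curve.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Visit samples from the highest score to the lowest, which is
    # equivalent to lowering the decision threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9 (about 0.889). A classifier that ranks every positive above every negative reaches an AUC of 1.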

Figure 7.

ROC curve.

7. Conclusion

This paper proposed an Lp detection model in which correlation features, improved log energy entropy, raw features, and semantic similarity features were extracted. These features were fed to a hybrid model that combined a GRU and an LSTM, and the GRU and LSTM outputs were averaged to obtain the final output. To improve prediction accuracy, the GRU weights were optimized via the SHGSO model. HC $+$ SHGSO produced better results than HC $+$ CSO, HC $+$ SSO, HC $+$ HBA, HC $+$ HGSO, HC $+$ SFO, CNN, RNN, SVM, RF, DBN, ANN, and SVM-DBN. Its specificity and accuracy were highest at the 80% learning percentage, where it achieved its best accuracy of 0.965, and its sensitivity at the 70% learning percentage exceeded that at 60% and 80%. Further, ANN came closest to HC $+$ SHGSO among the conventional classifiers at the 60% learning percentage. In the future, causality rates should be considered.

Footnotes

Declaration of statement

To the best of the authors’ knowledge, the paper entitled “Lipoprotein Detection: Hybrid Deep Classification Model with Improved Feature Set” is not considered for publication elsewhere and has not been published anywhere.

Authors’ Bios

Pravin Kathavate obtained an M.E. degree in Computer Engineering from Pune Institute of Computer Technology, Pune, India. He is presently Assistant Professor of Information Technology at Walchand Institute of Technology, Solapur, Maharashtra. He is a Research Scholar pursuing a Ph.D. in Computer Science & Engineering at Vijayawada, Andhra Pradesh. His current research interests are in the areas of Internet of Things (IoT), Machine Learning, Software Engineering, and Software Testing & Quality Assurance. He is presently an Associate Member of the Institution of Engineers, India (IEI), Kolkata. He has published 8 papers in journals, including Scopus-indexed journals.

Dr. J. Amudhavel, Ph.D., is Assistant Director – Digital Outreach at VIT Bhopal University, Bhopal, Madhya Pradesh, India. He develops real-world societal applications based on Machine Learning, Deep Learning and Artificial Neural Networks. He has delivered hands-on guest lectures and workshops organized by various reputed institutions of national importance on Augmented Reality, Deep Learning, Research Methodologies, Scientific Research Paper Writing, etc. He has 10 licenses and certifications in various courses, such as Amazon Web Services and Cloud Architecture core concepts.

References

1. Gao, Xiao and Zhang, High-density lipoprotein cholesterol for the prediction of mortality in cirrhosis with portal vein thrombosis: A retrospective study, Lipids Health Dis 18 (2019).
2. Erdem and Kaya, Prediction of diabetic retinopathy in patients with type 2 diabetes mellitus by using monocyte to high-density lipoprotein-cholesterol ratio, Int J Diabetes Dev Ctries, 2021.
3. Liu, Fan and Wu, Compared with the monocyte to high-density lipoprotein ratio (MHR) and the neutrophil to lymphocyte ratio (NLR), the neutrophil to high-density lipoprotein ratio (NHR) is more valuable for assessing the inflammatory process in Parkinson’s disease, Lipids Health Dis 20 (2021).
4. Y.E., Qiu H.X. and Wu R.Z., Oxidised low-density lipoprotein and its receptor-mediated endothelial dysfunction are associated with coronary artery lesions in Kawasaki disease, J. of Cardiovasc. Trans. Res. 13 (2020), 204–214.
5. Huang Q.x. and Zeng H.l., Prediction of Metabolic Disorders Using NMR-Based Metabolomics: The Shanghai Changfeng Study, Phenomics 1 (2021), 186–198.
6. Zhou, Shen and Wang, Association between carotid intima media thickness and small dense low-density lipoprotein cholesterol in acute ischaemic stroke, Lipids Health Dis 19 (2020), 177.
7. Fonseca M.I.H., da Silva I.T. and Ferreira S.R.G., Impact of menopause and diabetes on atherogenic lipid profile: Is it worth to analyse lipoprotein subfractions to assess cardiovascular risk in women? Diabetol Metab Syndr 9 (2017), 22.
8. Y.Q., Y.Y. and Li G.N., Rare novel LPL mutations are associated with neonatal onset lipoprotein lipase (LPL) deficiency in two cases, BMC Pediatr 21 (2021), 414.
9. Yokoyama, Tani and Matsuo, Increased triglyceride/high-density lipoprotein cholesterol ratio may be associated with reduction in the low-density lipoprotein particle size: Assessment of atherosclerotic cardiovascular disease risk, Heart Vessels 34 (2019), 227–236.
10. Lin, Xia and Yu, The predictive study of the relation between elevated low-density lipoprotein cholesterol to high-density lipoprotein cholesterol ratio and mortality in peritoneal dialysis, Lipids Health Dis 19 (2020), 51.
11. R.X., Zhang and Zhang, Effects of pitavastatin on lipoprotein subfractions and oxidized low-density lipoprotein in patients with atherosclerosis, Curr Med Sci 40 (2020), 879–884.
12. Chen and Li, Non-high-density lipoprotein cholesterol/high-density lipoprotein cholesterol ratio serve as a predictor for coronary collateral circulation in chronic total occlusive patients, BMC Cardiovasc Disord 21 (2021), 311.
13. Bando, Wakaguri and Aoki, Non-high-density cholesterol level as a predictor of maximum carotid intima-media thickness in Japanese subjects with type 2 diabetes: A comparison with low-density lipoprotein level, Diabetol Int 7 (2016), 34–41. doi: 10.1007/s13340-015-0208-0.
14. and Wei, Association between small dense low-density lipoprotein cholesterol and neuroimaging markers of cerebral small vessel disease in middle-aged and elderly Chinese populations, BMC Neurol 21 (2021), 436.
15. Lin and Hu, Relationship between non-high-density lipoprotein cholesterol and carotid atherosclerosis in normotensive and euglycemic Chinese middle-aged and elderly adults, Lipids Health Dis 16 (2017), 55.
16. Tohidi, Baghbani-Oskouei and Ahanchi, Fasting plasma glucose is a stronger predictor of diabetes than triglyceride-glucose index, triglycerides/high-density lipoprotein cholesterol, and homeostasis model assessment of insulin resistance: Tehran lipid and glucose study, Acta Diabetol 55 (2018), 1067–1074.
17. M.F., K.Z. and Guo Y.G., Lipoprotein(a) and atherosclerotic cardiovascular disease: Current understanding and future perspectives, Cardiovasc Drugs Ther 33 (2019), 739–748.
18. Cook N.R., Mora and Ridker P.M., Lipoprotein(a) and cardiovascular risk prediction among women, Journal of the American College of Cardiology 72 (2018), 287–296.
19. Willeit, Ridker P.M., Nestel P.J., Simes, Tonkin A.M., Pedersen T.R., Schwartz G.G., Olsson A.G., Colhoun H.M., Kronenberg, Drechsler, Wanner, Mora, Lesogor and Tsimikas, Baseline and on-statin treatment lipoprotein(a) levels for prediction of cardiovascular events: Individual patient-data meta-analysis of statin outcome trials, The Lancet 392 (2018), 1311–1320.
20. Liang, Lei, Huang, Fan and Yu, Elevated Lipoprotein-Associated Phospholipase A2 is Valuable in Prediction of Coronary Slow Flow in Non-ST-Segment Elevation Myocardial Infarction Patients, Current Problems in Cardiology 46 (2020).
21. Zhou C.L., Zhang C.H., Zhao X.Y., Chen S.H., Liang H.J., C.L. and Chen N.W., Early prediction of persistent organ failure by serum apolipoprotein A-I and high-density lipoprotein cholesterol in patients with acute pancreatitis, Clinica Chimica Acta 476 (2018), 139–145.
22. Chakraborty, Chan D.C., Ellis K.L., Pang, Barnett, Woodward A.M. and Watts G.F., Cascade testing for elevated lipoprotein(a) in relatives of probands with high lipoprotein(a), American Journal of Preventive Cardiology 10 (2022).
23. de la Cruz-Ares, Leon-Acuña, Yubero-Serrano E.M., Torres-Peña J.D., Arenas-de Larriva A.P., Cardelo M.P. and Delgado-Lista, High density lipoprotein subfractions and extent of coronary atherosclerotic lesions: From the cordioprev study, Clinica Chimica Acta, available online, 2022.
24. Zieliński, Kalińczuk Ł., Chmielak, Mintz G.S., Dąbrowski, Pręgowski and Witkowski, Additive Value of High-Density Lipoprotein Cholesterol and C-Reactive Protein Level Assessment for Prediction of 2-year Mortality After Transcatheter Aortic Valve Implantation, The American Journal of Cardiology 126 (2020), 66–72.
25. Guo, Gao and Jiang, Systematic prediction of familial hypercholesterolemia caused by low-density lipoprotein receptor missense mutations, Atherosclerosis 281 (2019), 1–8.
26. Correlation Coefficient: Simple Definition, Formula, Easy Steps, https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/.
27. Aydın, Saraoğlu H.M. and Kara, Log energy entropy-based EEG classification with multilayer neural networks in seizure, Annals of Biomedical Engineering 37 (2009), 2626–2630.
28. Faisal, Kitasuka and Aritsugi, Semantic Cosine Similarity, 2012.
29. Xiao, Wang and Zhang, Time-series production forecasting method based on the integration of Bidirectional Gated Recurrent Unit (Bi-GRU) network and Sparrow Search Algorithm (SSA), Journal of Petroleum Science and Engineering 208 (2022).
30. Zhou, Lin, Zhang, Shao and Liu, Improved itracker combined with bidirectional long short-term memory for 3D gaze estimation using appearance cues, Neurocomputing, in press, corrected proof, 2019.
31. Hashim F.A., Houssein E.H. and Mirjalili, Henry gas solubility optimization: A novel physics-based algorithm, Future Generation Computer Systems 101 (2019), 646–667.
32. Fausto, Cuevas, Valdivia and González, A global optimization algorithm inspired in the behavior of selfish herds, Biosystems 160 (2017), 39–55.
33. Marsaline Beno, Valarmathi I.R., Swamy S.M. and Rajakumar B.R., Threshold prediction for segmenting tumour from brain MRI scans, International Journal of Imaging Systems and Technology 24 (2014), 129–137.
34. Thomas and Rangachar M.J.S., Hybrid optimization based DBN for face recognition using low-resolution images, Multimedia Research 1 (2018), 33–43.
35. Devagnanam and Elango N.M., Optimal resource allocation of cluster using hybrid grey wolf and cuckoo search algorithm in cloud computing, Journal of Networking and Communication Systems 3 (2020), 31–40.
36. Mahammad Shareef S.K. and Srinivasa Rao, A hybrid learning algorithm for optimal reactive power dispatch under unbalanced conditions, Journal of Computational Mechanics, Power System and Control 1 (2018), 26–33.
37. Sanad Awad and Mohammed Hadrob, Classification and Prediction of Low-Density Lipoprotein Cholesterol LDL-C in The Palestinian Patients Using Machine Learning Techniques, Intelligent Networks and Systems Society (INASS), International Journal of Intelligent Engineering and Systems 15 (2022), 453–463.
38. C.Çubukçu and Topcu D.İ., Estimation of low-density lipoprotein cholesterol concentration using machine learning, Lab Med 53 (2022), 161–171.
39. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-25311/samples/.
40. Zhi-Hua, Ensemble methods: foundations and algorithms, CRC Press, 2012.
41. Cha and Ma, eds, Ensemble machine learning: methods and applications, Springer Science & Business Media, 2012.
42. Tagel, Rorissa and Srinivasagan, Stacking-based ensemble learning method for multi-spectral image classification, Technologies 10 (2022), 17.
43. Hyunjin, Park and Lee, Stacking ensemble technique for classifying breast cancer, Healthcare Informatics Research 25 (2019), 283–288.
44. Alexandropoulos S.A.N., Aridas C.K., Kotsiantis S.B. and Vrahatis M.N., Stacking strong ensembles of classifiers, in: IFIP International Conference on Artificial Intelligence Applications and Innovations, Springer, Cham, 2019.
45. Sanad, Awad and Hadrob, Classification and Prediction of Low-Density Lipoprotein Cholesterol LDL-C in The Palestinian Patients Using Machine Learning Techniques, 2022.
46. Cai, Wang, Qin and Fu, A Stacking Ensemble Learning Model for Mobile Traffic Prediction, in: 2020 IEEE/CIC International Conference on Communications in China (ICCC), IEEE, 2020.
