Classification of data on stacked autoencoder using modified sigmoid activation function

Abstract

A neural network is one of the techniques used to classify data. In this paper, we propose an effective two-layer stacked autoencoder that uses a modified sigmoid activation function, and we compare it with the existing autoencoder technique, which generally uses the logsigmoid activation function. In multiple cases the existing technique does not achieve good results, and our technique may be used in those cases instead. Our proposed autoencoder may achieve better results than the existing autoencoder technique because our modified sigmoid activation function gives more variation for different input values. We tested our proposed autoencoder on the iris, glass, wine, ovarian, and digit image datasets for comparison purposes. The existing autoencoder technique achieved 96% accuracy on the iris, 91% on the wine, 95.4% on the ovarian, 96.3% on the glass, and 98.7% on the digit (image) dataset. Our proposed autoencoder achieved 100% accuracy on the iris, wine, ovarian, and glass datasets, and 99.4% accuracy on the digit (image) dataset. To further verify the effectiveness of our proposed autoencoder, we used three more datasets: abalone, thyroid, and chemical. Our proposed autoencoder achieved 100% accuracy on the abalone and chemical datasets, and 96% accuracy on the thyroid dataset.

Keywords

Autoencoder, sigmoid activation function, logsigmoid, neural network, classification, stacked autoencoder

1 Introduction

Machine learning techniques are used to train models on data. Training can follow three types of learning processes: supervised, reinforcement, and unsupervised learning. If the input values and the output (target) values are given, then we use a supervised learning process. This process is also called the classification technique. Some classification techniques are Decision Trees [1], Bayesian Classifiers [2], Neural Networks [3], K-Nearest Neighbour [4], Support Vector Machines [5], linear regression [6], and Logistic Regression [7]. Time series classification, dummy query sequences (for mobile users), the Back-propagation (BP) neural network, the Stochastic Neural Network, and the linguistics-based stacking approach are some modern classification techniques. Time series classification is a supervised learning process in which each time series in a set is associated with a class label. Symbolic Aggregate approXimation (SAX) is a well-known method for reducing the dimension of time series, but Bing Bai et al. [8] used multi-feature dictionary representation and ensemble learning for time series classification. Using this method, the authors first extract mean features, then construct multiple single classifiers, and finally build an ensemble classifier to improve classification performance [8]. To protect users’ privacy in location-based services (LBS), Zongda Wu et al. [9] constructed a group of dummy query sequences. Using this method, they successfully addressed the personal privacy problems, i.e., location privacy and query privacy. Siyuan Tang and Feifei Yu [10] achieved better accuracy in retinal vessel segmentation using a retinal segmentation algorithm based on the BP neural network. They achieved 94.77% accuracy on the DRIVE library and 94.98% accuracy on the STARE library. Danni Lu [11] provided a detailed study of the exponential synchronization problem of stochastic neural networks.
They proposed a dynamic model of master-slave neural networks that contains time-varying delays and Levy noises [11]. Yuwei Zeng et al. [12] used vogers to classify ISP-level DNS traffic into disposable and normal types of domains. They achieved a true positive rate of 98.98%, which is 0.41% higher than the methods of Y. Chen et al. The authors used 12 basic linguistic features: Length, Label Level, Average Label Length, (Vowel Ratio, Consonant Ratio, and Digit Ratio), (Consecutive Consonants Ratio, Consecutive Digit Ratio), Character Cardinality, Inner digits number, Word ratio, and Entropy [12]. Classification of innovation input and innovation output is used in intelligent manufacturing. In the manufacturing process, we use intelligent manufacturing to optimize real data analysis. Measuring fiscal policies and the innovation environment is known as innovation input, and the results of these innovative activities within the economy are known as innovation output. Limeng Ying showed that intelligent manufacturing can promote both the innovation input and the innovation output of enterprises [13]. Jing Feng et al. [14] proposed a sliding force prediction model based on the Belief Rule-Based (BRB) inferential methodology. Using this method, the authors accurately predict the slope of landslides and achieve early learning for landslide disasters. Losses to the lives and property of people are avoided by using this information. With the help of rock mass classification techniques, we classify rock mass into groups or classes. Fei Song et al. [15] represented the post-failure behavior and time-dependent behavior of rock masses using the ViscoElastic-ViscoPlastic Strain-Softening (VEVP-SS) model. Area coverage classification is very useful in the mobile robot field. A mobile robot moves through its surroundings using area coverage classification techniques. Sheng Feng defines three coverage areas [16].
They proposed the Optimal k-coverage WSN deployment problem (OLXWDP) model. Using this model, the authors achieved average rates of nearly 100% for 1-coverage, 91.34% for 2-coverage, and 89.00% for 3-coverage. A neural network is a branch of machine learning [17]. Clustering, classification, recognition, etc. are major application areas of the Artificial Neural Network (ANN) [17-20]. There are many neural network techniques. The Time Delay Neural Network [21], Multilayer feed-forward neural network, Self Organization Map, Convolutional neural network, Deep network, and autoencoder are some of them.

Some classification applications based on stacked autoencoders are: Hyperspectral image classification [22], Network traffic classification [23], Electricity theft detection [24], Alzheimer’s disease classification [25], Cervical cancer prediction and classification [26], Early gastric cancer classification [27], COVID-19 diagnosis [28], ECG beat classification [29], Brain Tumor Detection and Classification [30], Fault Diagnosis of Permanent Magnet Synchronous Motor [31], Cross-domain fault diagnosis [32], Real-time radio technology and modulation classification [33], Weld defect classification [34], and Heart Sound classification [35]. Hinton and the PDP group proposed the first autoencoder in the 1980s (Rumelhart et al., 1986) [41]. An autoencoder is an unsupervised learning network. Better performance of the autoencoder is obtained by using a stacked autoencoder [42]. Generative Adversarial Network with Autoencoder (GAN-AE), Sparse Autoencoder, Variational autoencoder, Stacked Sparse Autoencoder (SSAE), Locality-Constraint Sparse Autoencoder (LSAE), Weight-Clustering Sparse Autoencoder (WCSAE), and Stacked Pruning Sparse Autoencoder (SPSAE) are some representations of stacked autoencoder models. Ian Goodfellow and his colleagues proposed the GAN in June 2014. This is an unsupervised learning technique and therefore these networks create their own training data. Stephanic Ger et al. [43] proposed a novel GAN with autoencoder (GAN-AE). In this model, the authors developed a GAN architecture with an additional autoencoder component and used a recurrent neural network (RNN) to generate synthetic data. In the Sparse Autoencoder, the output is achieved using the sparsity method. The result of the loss function also depends on the sparsity regularization value [see Equation (3)]. In the Variational autoencoder (VAE), the output values are obtained in a probabilistic manner.
Its loss function depends upon the reconstruction loss value and the Kullback-Leibler divergence value. In the SSAE, each hidden layer is made of the hidden layer of an individually trained Sparse Autoencoder [44]. The main property of this autoencoder is that the input values of each sparse autoencoder are obtained from the output values of the previous sparse autoencoder. During experiments on the CIFAR-10, STL-10, and Caltech-101 datasets, Luo et al. [45] proposed LSAE. They used the logistic sigmoid activation function in this network. The authors found a local subspace of the training data using this proposed SAE. Fan et al. [46] proposed a novel approach for change detection using weight-clustering sparse autoencoders (WCSAE), which combined object-oriented classification with difference images (DIs). The authors also introduced L1/2 regularization to extract more sparse features and avoid over-fitting. For diagnosing bearing faults, Haiping Zhu et al. [47] proposed an SPSAE model. In this model, the input of each layer comes from the output of all the previous layers, and the feature information of previous layers can be shared with subsequent layers.

The activation function is one of the important parts of the autoencoder. We may increase the power of the autoencoder if we choose a better activation function. Table 1 lists some commonly used activation functions. The logsigmoid, Hexpo, RelTanh, and tansigmoid functions use exponential formulas (Table 1). The purelin and satlin functions do not use exponential functions. The exponential function is more useful for nonlinear complex datasets, so in most cases purelin and satlin do not achieve results as good as the logsigmoid and tansigmoid activation functions (AFs). The ReLU does not use exponential or trigonometric functions. The ReLU suffers from the dying ReLU problem.
This function is mainly used in CNNs and very deep neural networks. In the case of the ReLU AF, a neuron is deactivated if the output of the linear transformation is less than 0 (i.e., for negative values). The equation of the RelTanh AF is shown in Table 1, at serial number 8. The RelTanh AF is an improved version of tanh. Xin Wang et al. achieved 96.15% testing accuracy with the help of RelTanh on the fault dataset of CWRU [38]. ReLU and its variants are generally not used for autoencoders, so we compare our proposed autoencoder with the purelin, satlin, tansigmoid, and logsigmoid AFs. In Table 2, we compare three other techniques and their results with our proposed autoencoder technique and our results.

Table 1
Different activation function, their mathematical equations, derivatives and range

Sr.	Activation function	Equation	Derivative	Range
1	Logsigmoid or Sigmoid [18, 19]	$f (n) = \frac{1}{1 + e^{- n}}$	$f^{'} (n) = \frac{e^{- n}}{{(1 + e^{- n})}^{2}}$	(0, 1)
2	Tansigmoid or Tanh [18, 19]	$f (n) = \frac{e^{n} - e^{- n}}{e^{n} + e^{- n}}$	$f^{'} (n) = \frac{4 e^{2 n}}{{(1 + e^{2 n})}^{2}}$	(-1, 1)
3	Modified activation function	$f (n) = \frac{e^{\frac{2 n}{3}} - 1}{{(1 + e^{2 n})}^{\frac{1}{3}}}$	$f^{'} (n) = \frac{2 e^{\frac{2 n}{3}}}{3 {(1 + e^{2 n})}^{\frac{1}{3}}} - \frac{2 e^{2 n} (e^{\frac{2 n}{3}} - 1)}{3 {(1 + e^{2 n})}^{\frac{4}{3}}}$	(-1, 1)
4	Linear or purelin [18]	$f (n) = n$	$f^{'} (n) = 1$	unbounded
5	Saturating Linear or satlin [18]	$f (n) = \begin{cases} 0, & n < 0; \\ n, & 0 \leq n \leq 1; \\ 1, & n > 1 \end{cases}$	$f^{'} (n) = \begin{cases} 0, & n < 0; \\ 1, & 0 \leq n \leq 1; \\ 0, & n > 1 \end{cases}$	[0, 1]
6	ISigmoid [36]	$f (n) = \begin{cases} α (n - a) + Sigmoid (a), & n \geq a; \\ Sigmoid (n), & - a < n < a; \\ α (n + a) + Sigmoid (a), & n \leq - a \end{cases}$	$f^{'} (n) = \begin{cases} α, & | n | \geq a; \\ {Sigmoid}^{'} (n), & | n | < a \end{cases}$	(depends on conditions)
			where ‘a’ is the threshold, α is the slope, and both of them are preset
7	ReLU [37]	$f (n) = \begin{cases} n, & n \geq 0; \\ 0, & n < 0 \end{cases}$	$f^{'} (n) = \begin{cases} 1, & n > 0; \\ 0, & n < 0 \end{cases}$	[0, ∞)
8	RelTanh [38]	$f (n) = \begin{cases} {tanh}^{'} (λ^{+}) (n - λ^{+}) + tanh (λ^{+}), & n \geq λ^{+}; \\ tanh (n), & λ^{-} < n < λ^{+}; \\ {tanh}^{'} (λ^{-}) (n - λ^{-}) + tanh (λ^{-}), & n \leq λ^{-} \end{cases}$	$f^{'} (n) = \begin{cases} {tanh}^{'} (λ^{+}), & n \geq λ^{+}; \\ {tanh}^{'} (n), & λ^{-} < n < λ^{+}; \\ {tanh}^{'} (λ^{-}), & n \leq λ^{-} \end{cases}$	(depends on conditions)
			where $λ_{lower}^{+} \leq λ^{+} \leq λ_{upper}^{+}$ and $λ_{lower}^{-} \leq λ^{-} \leq λ_{upper}^{-}$
9	Swish [39]	$f (n) = n \cdot α (n)$ where $α (n) = \frac{1}{1 + e^{- n}}$	$f^{'} (n) = f (n) + α (n) (1 - f (n))$	(unbounded above and bounded below)
10	Hexpo [40]	$f (n) = \begin{cases} - a (e^{- \frac{n}{b}} - 1), & n \geq 0; \\ c (e^{\frac{n}{d}} - 1), & \text{otherwise} \end{cases}$	$f^{'} (n) = \begin{cases} \frac{a}{b} e^{- \frac{n}{b}}, & n \geq 0; \\ \frac{c}{d} e^{\frac{n}{d}}, & \text{otherwise} \end{cases}$	(-c, a)

Table 2

Comparison of three other techniques and experiment with our proposed autoencoder

Sr.	Autoencoder	Techniques	Experiment
1	Csaha Brunner et al. [48]	They proposed SAE-SNN (Sparse Autoencoder stacked neural network), AE-SNN (Autoencoder enhanced SNN), and SNN using logsigmoid activation function.	The authors have classified the NSL-KDD dataset using these techniques.
2	S. F. Qadri et al. [49]	They proposed stacked sparse autoencoder-based patch classification modeling for vertebrae segmentation (SVeg).	The authors have classified MICCAI Computational Spine Imaging (CSI) dataset using these techniques.
3	Gated relational probabilistic stacked denoising autoencoder (GRSLA) [50]	This network is an extension of the probabilistic stacked denoising autoencoder (PSDAE) obtained by adding gated units.	Tao Dai et al. [50] used this network on the AAN, DBLE, and CORD-19 datasets. They achieved better results compared with previous global citation recommendation methods.
4	Our proposed autoencoder	We made a two-layer stacked autoencoder with a modified sigmoid activation function.	We tested our proposed autoencoder on the iris, glass, wine, ovarian, digit (image), abalone, thyroid, and chemical datasets.

1.1 Contributions

Our main aim in this paper is to propose an effective autoencoder technique that achieves better results than the existing autoencoder technique. Specifically, the contribution of this paper is to change the activation function of an autoencoder.

To achieve this aim, we first took eight datasets whose attributes and targets differ. Second, we decided to classify these datasets. The autoencoder is one of the best deep learning techniques, so we took an existing autoencoder technique and classified the datasets with it. However, we found that this technique does not achieve good results on these datasets. The accuracy of an autoencoder depends on the number of hidden nodes, the number of epochs, L2 regularization, L2 weight regularization, sparsity proportion, sparsity regularization, the training algorithm, and the activation functions. Our main contribution starts here: we decided to change the activation function (AF) used in the existing technique. We therefore proposed a new AF and, using this AF, a new autoencoder. Finally, using this autoencoder, we analyzed the datasets again with the same configuration. In multiple cases, we achieved results on par with or better than those of the existing autoencoder technique. In the discussion section, we present full details of why our autoencoder gives better results using this modified AF.

2 Autoencoder

An autoencoder has the same number of neurons in the input layer and the output layer. With the help of an autoencoder, we reduce the dimensions of the data [41]. Its main work is to copy its input to its output. There are two types of autoencoders: the sparse autoencoder and the stacked autoencoder. An autoencoder has three layers: the input layer, the code (hidden) layer, and the output layer. It has two main processes or components: the encoder and the decoder (Fig. 1a).

Fig. 1

(a) Autoencoder, (b) stacked autoencoder.

Encoder process: The output value of this process is calculated with the help of Equation (1). Suppose n_i is the output value of the i^th neuron; then $n_{i} = f (\sum_{j = 1}^{m} w_{j} x_{j} + b)$ (1)

Where w_j is the weight value of the j^th neuron, x_j is the input value of the j^th neuron, b is a bias value, m is the total number of inputs, and f(.) denotes the activation function of the encoder.

Decoder process: The output value of this process is calculated with the help of Equation (2). Suppose nn_i is the output value of the i^th neuron; then $n n_{i} = g (\sum_{j = 1}^{mm} w_{j} n_{j} + bb)$ (2)

Where w_j is the weight value of the j^th neuron, n_j is the input value (i.e., the output value of the encoder) of the j^th neuron, bb is a bias value, mm is the total number of inputs, and g(.) denotes the activation function of the decoder.
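Equations (1) and (2) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: NumPy, the matrix form of the weights, the function names, and the random placeholder weights are all assumptions made for illustration, and logsigmoid is used for both f and g.

```python
import numpy as np

def logsig(n):
    """Logistic sigmoid, used here for both f(.) and g(.)."""
    return 1.0 / (1.0 + np.exp(-n))

def encode(x, W_enc, b_enc, f=logsig):
    """Equation (1) for all hidden neurons at once: n = f(W_enc x + b_enc)."""
    return f(W_enc @ x + b_enc)

def decode(n, W_dec, b_dec, g=logsig):
    """Equation (2) for all output neurons at once: nn = g(W_dec n + b_dec)."""
    return g(W_dec @ n + b_dec)

rng = np.random.default_rng(0)        # fixed seed, illustrative only
m, hidden = 4, 3                      # hypothetical sizes: 4 inputs, 3 hidden neurons
W_enc = rng.normal(scale=0.1, size=(hidden, m))   # placeholder (untrained) weights
b_enc = np.zeros(hidden)
W_dec = rng.normal(scale=0.1, size=(m, hidden))
b_dec = np.zeros(m)

x = np.array([0.2, 0.5, 0.1, 0.9])    # made-up input vector
code = encode(x, W_enc, b_enc)        # hidden (code) layer output
x_hat = decode(code, W_dec, b_dec)    # reconstruction of the input
print(code.shape, x_hat.shape)
```

The output layer has the same size as the input layer, which is the defining property stated at the start of this section; training would adjust the weights so that x_hat approaches x.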

If we use the best activation function in the encoder and the decoder, then we may get the best results; if the activation function is not well chosen, we may not. So, in this paper, we use a modified activation function to obtain better results.

During the design of an autoencoder, we calculate the loss function. This loss function, $L (y_{ij}, y_{ij}^{'})$, is the sum of the mean squared error (MSE), the L2 regularization term ($L_{2}$) multiplied by a control value (λ), and the sparsity regularization term (Ψ(n)) multiplied by a control value (β), as in Equation (3). $L (y_{ij}, y_{ij}^{'}) = MSE + λ * L_{2} + β * Ψ (n)$ (3)

Here, $MSE = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{m} {(y_{ij} - y_{ij}^{'})}^{2}}{n}$ (4)

Where y_ij is the target value, y_ij’ is the predicted value, n is the number of input values, and m is the number of variables or classes.

The number of hidden nodes, epochs, L2 regularization, L2 weight regularization, sparsity proportion, and sparsity regularization are some parameters used during the training of the autoencoder. Hidden neurons (or nodes) work as feature detectors, so with the help of the hidden neurons we find the features. To get high accuracy, we run the network many times; the number of epochs specifies how many times we run it. L2 weight regularization is also known as ridge regression or Tikhonov regularization. The L2 weight regularizer is used for weight decay only; it is not applied to the biases. The regularized objective function J’ is calculated with Equation (5). $J^{'} (θ; x, y) = J (θ; x, y) + α Ω (θ)$ (5)

Where J is the objective function, θ denotes the parameters, α is a hyperparameter, x is an input value, and y is a target value.

If we put $Ω (θ) = \frac{1}{2} w^{T} w$ in the above equation, then we get Equation (6) for the regularized objective function J’. This form is called weight decay. $J^{'} (θ; x, y) = J (θ; x, y) + \frac{α}{2} w^{T} w$ (6)

Now if we calculate the gradient of the above equation with respect to the weights, then we get Equation (7). $Δ_{w} J^{'} (w; x, y) = Δ_{w} J (w; x, y) + α w$ (7)

Now, we update the weights by using Equation (8). $\begin{matrix} w_{new} = w_{old} - η Δ_{w} J^{'} (w; x, y) \\ = w_{old} - η [Δ_{w} J (w; x, y) + α w] \end{matrix}$ (8)

Where η is the learning rate.

To get better accuracy of the result of the w_new, we minimize the value of Δ_w J (w; x, y). So, we typically put its value as quite small.
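The weight-decay update of Equation (8) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: NumPy is used, the function name is hypothetical, and the gradient values are made up rather than computed from a real network.

```python
import numpy as np

def weight_decay_step(w, grad_J, alpha, eta):
    """One update of Equation (8): w_new = w_old - eta * (grad_J + alpha * w)."""
    return w - eta * (grad_J + alpha * w)

w_old = np.array([0.5, -1.0, 2.0])
grad_J = np.array([0.1, 0.0, -0.2])    # made-up gradient of the unregularized objective J
w_new = weight_decay_step(w_old, grad_J, alpha=0.001, eta=0.1)
print(w_new)

# With a zero gradient, the step only shrinks each weight by the factor (1 - eta * alpha).
w_shrunk = weight_decay_step(w_old, np.zeros(3), alpha=0.001, eta=0.1)
print(w_shrunk / w_old)
```

The second call shows why the penalty is called weight decay: even when the data term contributes nothing, every weight is pulled slightly toward zero at each step.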

In Equation (3), β * Ψ(n) is added to the loss function. Its main work is to control the sparsity, and β is the sparsity regularization coefficient that sets the weight of this term. The desired average activation ρ is also known as the sparsity proportion; we generally take a low value of ρ so that we get a high degree of sparsity. Sparsity regularization is calculated with the help of the Kullback-Leibler divergence [41] function, which is defined in Equation (9). $Ψ (n) = KL (ρ \| ρ_{j}) = ρ \log (\frac{ρ}{ρ_{j}}) + (1 - ρ) \log (\frac{1 - ρ}{1 - ρ_{j}})$ (9)

Where ρ is the desired value and ρ_j is the average output activation value of the j^th neuron, which is calculated with Equation (10). $ρ_{j} = \frac{1}{n} \sum_{i = 1}^{n} f (w_{j}^{T} x_{i} + b_{j})$ (10)

Here x_i is the i^th input sample, and w_j and b_j are the weights and bias of the j^th neuron.
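The terms of Equation (3) can be put together in a short Python sketch. It rests on several assumptions that the text does not state: NumPy is used, the L2 term is taken as half the sum of squared weights (following Equation (6)), the Kullback-Leibler term is written in its standard form with a logarithm in both parts and summed over the hidden neurons, and all names and sample values are hypothetical.

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    """Kullback-Leibler sparsity term (Equation (9)), summed over hidden neurons."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def autoencoder_loss(X, X_hat, H, W_list, lam, beta, rho):
    """Equation (3): MSE + lam * L2 + beta * sparsity.

    X, X_hat : (n_samples, m) inputs and reconstructions
    H        : (n_samples, hidden) encoder activations, each in (0, 1)
    W_list   : list of weight arrays (biases are not penalised)
    """
    n = X.shape[0]
    mse = np.sum((X - X_hat) ** 2) / n                # Equation (4)
    l2 = 0.5 * sum(np.sum(W ** 2) for W in W_list)    # assumed form of the L2 term
    rho_hat = H.mean(axis=0)                          # average activation per hidden neuron
    return mse + lam * l2 + beta * kl_sparsity(rho, rho_hat)

# Tiny made-up example: two samples, two inputs, one hidden neuron.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_hat = np.array([[0.1, 0.9], [0.8, 0.2]])
H = np.array([[0.1], [0.3]])                          # average activation is 0.2
W_list = [np.array([[1.0, -1.0]])]
loss = autoencoder_loss(X, X_hat, H, W_list, lam=0.001, beta=4.0, rho=0.2)
print(loss)
```

In this example the average activation equals the target ρ, so the sparsity term vanishes and the loss reduces to the MSE plus the weighted L2 term; moving the average activation away from ρ makes the penalty positive.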

If we want the range of the input data to match the range of the transfer function of the decoder, then we scale the data. We maximize or minimize the value of a function with the help of optimization algorithms. Extrema come in two classes: local maxima and minima, and global maxima and minima. Optimization algorithms are used when calculating new weights and biases, and the loss is reduced with their help. SCG, Adagrad, RMSprop, and Adam are some well-known optimization algorithms. The scaled conjugate gradient (SCG) algorithm is based on conjugate directions and is used for finding a local minimum of a differentiable function. Whenever the derivatives of the weight, input, and transfer functions are available in the network, the SCG method may be used for training the network.

In the SCG case, the parameter of each training set is updated by the Equation (11). $θ = θ - η . Δ_{θ} J (θ; x^{(i)}, y^{(i)})$ (11)

SCG is used by many researchers. Upadhyay et al. (2019) used SCG for violation prediction in cloud computing [51]. Jyotiprakash et al. used LM, SCG, and BR for the prediction of the water quality index with an ANN [52]. S.O. Sada used LM and SCG for the prediction of accuracy in a mild steel turning operation [53]. Abdollahi et al. proposed a new conjugate gradient method that was based on SCG and was used for solving an optimization problem [54]. Sujatha et al. used BR, SCG, and LM for the analysis of Bitcoin trends [55]. SCG is used for large networks. It is a fast algorithm and can also be used during online training. Due to these uses and properties of SCG, we test our proposed autoencoder with SCG.

If we make a network by stacking more than one autoencoder together with the softmax layer, then it is called a Stacked Autoencoder (Fig. 1b.).
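The forward pass of such a stacked network can be sketched as follows. This is a minimal Python illustration and not the authors' MATLAB code: it assumes the common practice of keeping only the encoder half of each trained autoencoder when stacking, uses NumPy, and fills the weights with random placeholders instead of trained values. The layer sizes (4, 4, 50, 3) mirror the iris configuration described in Section 4.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract the max for numerical stability
    return e / e.sum()

def stacked_forward(x, encoders, W_soft, b_soft):
    """Pass x through each encoder in turn, then through the softmax layer."""
    h = x
    for W, b in encoders:
        h = logsig(W @ h + b)
    return softmax(W_soft @ h + b_soft)

rng = np.random.default_rng(0)          # fixed seed, illustrative only
n_in, h1, h2, n_classes = 4, 4, 50, 3   # input, autoencoder 1, autoencoder 2, classes
encoders = [
    (rng.normal(scale=0.1, size=(h1, n_in)), np.zeros(h1)),   # encoder of autoencoder 1
    (rng.normal(scale=0.1, size=(h2, h1)), np.zeros(h2)),     # encoder of autoencoder 2
]
W_soft = rng.normal(scale=0.1, size=(n_classes, h2))
b_soft = np.zeros(n_classes)

x = np.array([5.1, 3.5, 1.4, 0.2])      # an example four-attribute input
probs = stacked_forward(x, encoders, W_soft, b_soft)
print(probs, probs.sum())
```

The softmax output is a vector of class probabilities that sums to one; the predicted class is the index of the largest entry.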

3 Comparisons of related activation functions

We may solve classification problems with the help of neural network techniques. There are several types of neural network techniques, and the autoencoder is one of them. In this paper, we have proposed an effective two-layer stacked autoencoder with a modified sigmoid activation function and compared it with the existing autoencoder technique, which was used by the authors of papers [41, 58-65]. The existing autoencoder technique generally uses the logsigmoid activation function. Therefore, we first discuss the logsigmoid Activation Function (AF) and its properties, then our modified AF and its properties, and lastly we explain why our autoencoder achieved better results.

A) logsigmoid Activation Function (AF) and its properties

Equation (12) is a log sigmoid function. Figure 2 is a plot of this AF.

Fig. 2

The plot of the logsigmoid and modified sigmoid activation functions. This graph clearly shows that the range of logsigmoid is (0, 1) and the range of our modified sigmoid is (-1, 1).

$f (n) = \frac{1}{1 + e^{- n}}$ (12)

Where ‘n’ is input and f (n) is output.

This AF has the following properties:

Nonlinear: The logsigmoid activation function uses the exponential function. Therefore, this activation function is also used for solving nonlinear problems.

Range and Shape: The range of this activation function is (0, 1). This range is a finite range. Therefore, in the case of pattern recognition and classification, this activation function will show more stability. The shape of this activation function is ‘S’ (Fig. 2).

Continuously Differentiable: This AF is a continuously differentiable function (Fig. 3). Its differentiation is given in equation (13). Therefore, we may use this AF in the gradient-based Optimization method.

Fig. 3

The plot of differentiation of logsigmoid and our modified sigmoid activation function.

$f^{'} (n) = \frac{e^{- n}}{{(1 + e^{- n})}^{2}}$ (13)

Zero-Centered: This AF is not a zero-centered activation function.
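As a short worked step that is not part of the original derivation, Equation (13) can be rewritten in terms of f(n) itself. This makes explicit that the slope of logsigmoid is largest at n = 0 and shrinks toward zero for large |n|:

```latex
\begin{aligned}
1 - f(n) &= 1 - \frac{1}{1 + e^{-n}} = \frac{e^{-n}}{1 + e^{-n}}, \\
f'(n) &= \frac{e^{-n}}{(1 + e^{-n})^{2}}
       = \frac{1}{1 + e^{-n}} \cdot \frac{e^{-n}}{1 + e^{-n}}
       = f(n)\,\bigl(1 - f(n)\bigr), \\
f'(0) &= \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, \qquad
f'(n) \to 0 \ \text{as } |n| \to \infty .
\end{aligned}
```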

B) Modified Sigmoid Activation Function and its properties

Equation (14) is our modified AF. Figure 2 is a plot of this AF. $f (n) = \frac{e^{\frac{2 n}{3}} - 1}{{(1 + e^{2 n})}^{\frac{1}{3}}}$ (14)

Our modified AF has the following properties:

Nonlinear: Our modified AF uses an exponential function like logsig AF. Therefore, we can use our AF for solving nonlinear problems. Due to this reason, our AF gives better results compared to the purelin and satlin AFs.

Range and Shape: The range of our AF is (-1, 1). This range is a finite range. Therefore, in the case of pattern recognition and classification, our AF will show more stability. The shape of this activation function is ‘S’ (Fig. 2).

Continuously Differentiable: Our AF is a continuously differentiable function (Fig. 3). Its differentiation is given in Equation (15). Therefore, we may use our AF in the gradient-based Optimization method. $f^{'} (n) = \frac{2 e^{\frac{2 n}{3}}}{3 {(1 + e^{2 n})}^{\frac{1}{3}}} - \frac{2 e^{2 n} (e^{\frac{2 n}{3}} - 1)}{3 {(1 + e^{2 n})}^{\frac{4}{3}}}$ (15)

Zero-Centered: Our AF is a zero-centered activation function.
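The properties listed above can be checked numerically. The following is a minimal Python sketch under assumptions: it uses NumPy rather than the authors' MATLAB code, and the names `modified_sigmoid` and `modified_sigmoid_grad` are illustrative. It evaluates Equation (14), compares Equation (15) with a central finite difference, and checks that f(-n) = -f(n) and f(0) = 0.

```python
import numpy as np

def modified_sigmoid(n):
    """Equation (14): (e^(2n/3) - 1) / (1 + e^(2n))^(1/3)."""
    n = np.asarray(n, dtype=float)
    return (np.exp(2 * n / 3) - 1) / np.cbrt(1 + np.exp(2 * n))

def modified_sigmoid_grad(n):
    """Equation (15): analytic derivative of Equation (14)."""
    n = np.asarray(n, dtype=float)
    e23, e2 = np.exp(2 * n / 3), np.exp(2 * n)
    return (2 * e23) / (3 * (1 + e2) ** (1 / 3)) \
        - (2 * e2 * (e23 - 1)) / (3 * (1 + e2) ** (4 / 3))

# Moderate inputs only: np.exp(2 * n) overflows for very large n in this direct form.
n = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (modified_sigmoid(n + h) - modified_sigmoid(n - h)) / (2 * h)

grad_error = np.max(np.abs(numeric - modified_sigmoid_grad(n)))
odd_error = np.max(np.abs(modified_sigmoid(-n) + modified_sigmoid(n)))
print(grad_error)               # small: Equation (15) matches the finite difference
print(odd_error)                # small: f(-n) = -f(n), symmetric about the origin
print(modified_sigmoid(0.0))    # zero at the origin
```

The odd symmetry is what the zero-centered property refers to: positive and negative inputs of equal magnitude give outputs of equal magnitude and opposite sign.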

Compared with the logsigmoid AF, our activation function gives more variation for different input values (Table 3), so we can get more accuracy for different input values with the help of our AF. As with the logsigmoid AF, our modified AF also allows backpropagation, because it is a differentiable function. Our modified AF is zero-centered, whereas logsigmoid is not. Therefore, our AF works better than logsigmoid. The logsigmoid is not symmetric around zero, but our AF is. For this reason, we may use this AF for solving very complex (nonlinear) problems, such as audio, image, or other high-dimensional problems.

Table 3

The output value of logsigmoid and modified sigmoid on different input values

Input value	Logsigmoid value	Modified sigmoid value
1	0.73105858	0.4664253
2	0.88079708	0.7319611
3	0.95257413	0.8639515
4	0.98201379	0.9304125
5	0.99330715	0.9643114
6	0.99752738	0.9816824
7	0.99908895	0.9905962
8	0.99966465	0.995172
9	0.99987661	0.9975212
10	0.9999546	0.9987274
11	0.9999833	0.9993466
12	0.99999386	0.9996645
13	0.99999774	0.9998278
14	0.99999917	0.9999116
15	0.99999969	0.9999546
16	0.99999989	0.9999767
17	0.99999996	0.999988
18	0.99999998	0.9999939
19	0.99999999	0.9999968
20	1	0.9999984
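The entries of Table 3 can be reproduced directly from Equations (12) and (14). The sketch below is a minimal Python version, assuming NumPy and illustrative function names rather than the original MATLAB code. It also prints 1 - f(n), the gap to the upper bound, which is one way to read the statement that the modified function keeps more variation for large inputs.

```python
import numpy as np

def logsig(n):
    """Equation (12)."""
    return 1.0 / (1.0 + np.exp(-n))

def modified_sigmoid(n):
    """Equation (14)."""
    return (np.exp(2.0 * n / 3.0) - 1.0) / np.cbrt(1.0 + np.exp(2.0 * n))

print("input  logsigmoid  modified   gap_logsig  gap_modified")
for n in (1, 2, 5, 10, 20):
    a, b = logsig(n), modified_sigmoid(n)
    print(f"{n:5d}  {a:.8f}  {b:.7f}  {1 - a:.2e}    {1 - b:.2e}")
```

For the same input, the modified function stays further from its upper bound than logsigmoid does, so neighbouring large inputs remain easier to tell apart.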

As with logsigmoid, our AF has two limitations. First, our AF has a finite range, so for very high or very low input values there is almost no change in the prediction. This problem is also called the vanishing gradient problem. Second, our AF may converge slowly.

4 Experiments

We performed the experiments on an Intel Core i7 machine running Windows 10 with MATLAB 2021a [56]. In this research, we have proposed a stacked autoencoder consisting of one input layer, two autoencoder layers, a softmax layer, and an output layer. We took five datasets from the MATLAB software for comparison purposes: the iris, glass, wine, ovarian, and digit (image) datasets. After selecting these datasets, we chose the values of the regularizers.

Iris dataset: The iris dataset has 150 samples of three flower species. This dataset has four attributes and three targets. We used 150 samples for training and 150 samples for testing. For autoencoder 1, we set the hidden neuron size to 4, maximum epochs to 400, L2WeightRegularization to 0.001, SparsityRegularization to 4, SparsityProportion to 0.05, and the decoder transfer function to logsigmoid or ‘our formula’. For autoencoder 2, we set the hidden neuron size to 50, maximum epochs to 400, L2WeightRegularization to 0.0015, SparsityRegularization to 4, and SparsityProportion to 0.05. For this dataset, we placed our modified activation formula in the decoder of autoencoder 1.

Wine, ovarian, and glass datasets: The wine dataset has 178 samples of wine data, with 13 attributes and 3 targets. We used 178 samples for training and 178 samples for testing. The ovarian dataset has 216 samples of ovarian data, with 100 attributes and 2 targets. We used 216 samples for training and 216 samples for testing. The glass dataset has 214 samples of glass values, with 9 attributes and 2 targets. We used 214 samples for training and 214 samples for testing. For autoencoder 1, we set the hidden neuron size to 10, maximum epochs to 400, L2WeightRegularization to 0.001, SparsityRegularization to 4, SparsityProportion to 0.05, and the decoder transfer function to logsigmoid or ‘our formula’. For autoencoder 2, we set the hidden neuron size to 50, maximum epochs to 400, L2WeightRegularization to 0.0015, SparsityRegularization to 4, and SparsityProportion to 0.05. For these datasets, we placed our modified activation formula in the decoder of autoencoder 1.

Digit dataset: This dataset consists of 5000 training images and 5000 testing images, each of size 28-by-28 pixels. For autoencoder 1, we set the hidden neuron size to 100, the maximum epochs to 400, L2WeightRegularization to 0.004, SparsityRegularization to 4, and SparsityProportion to 0.15. For autoencoder 2, we set the hidden neuron size to 50, the maximum epochs to 100, L2WeightRegularization to 0.002, SparsityRegularization to 4, SparsityProportion to 0.1, and the decoder transfer function to logsigmoid or ‘our formula’. For this dataset, we tested our modified activation formula in autoencoder 2, where we placed it at the decoder. Fig. 4.b shows the position of our modified sigmoid.

Fig. 4

Details of the autoencoders built for the digit dataset. (a) is the first autoencoder. (b) is the second autoencoder, in which we placed our modified sigmoid at the decoder. (c) is the softmax layer, and (d) is the stacked network made by stacking (a), (b), and (c).
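The regularization parameters listed above for each dataset can be read as weights in the autoencoder training cost. The following is a minimal sketch, assuming the commonly documented sparse autoencoder objective (mean squared error plus an L2 weight term plus a Kullback-Leibler sparsity term), which is our understanding of how these three settings are used; the function names are hypothetical, and the logsigmoid is used at the decoder as a placeholder because the modified sigmoid formula is defined elsewhere in the paper.

```python
import numpy as np

def logsig(z):
    """Logistic sigmoid (the 'logsigmoid' transfer function)."""
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_cost(x, w_enc, b_enc, w_dec, b_dec,
                            l2_weight=0.001, sparsity_reg=4.0,
                            sparsity_prop=0.05):
    """Assumed cost: MSE + l2_weight * L2 term + sparsity_reg * KL term.

    x has shape (n_samples, n_features). The default values follow the
    iris row of Table 4.
    """
    hidden = logsig(x @ w_enc + b_enc)        # encoder activations
    recon = logsig(hidden @ w_dec + b_dec)    # decoder output (placeholder AF)
    mse = np.mean(np.sum((x - recon) ** 2, axis=1))
    l2_term = 0.5 * (np.sum(w_enc ** 2) + np.sum(w_dec ** 2))
    # Mean activation of each hidden unit, clipped to keep the logs finite.
    rho_hat = np.clip(hidden.mean(axis=0), 1e-8, 1 - 1e-8)
    rho = sparsity_prop
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return mse + l2_weight * l2_term + sparsity_reg * kl

# Illustrative call with random (untrained) weights and iris-like shapes.
rng = np.random.default_rng(0)
x = rng.random((150, 4))
w_enc, b_enc = rng.normal(size=(4, 4)), np.zeros(4)
w_dec, b_dec = rng.normal(size=(4, 4)), np.zeros(4)
cost = sparse_autoencoder_cost(x, w_enc, b_enc, w_dec, b_dec)
```

In this reading, a larger SparsityRegularization pushes the mean hidden activation toward SparsityProportion, while L2WeightRegularization penalizes large weights.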

The parameter settings are presented in Table 4 for autoencoder 1 and Table 5 for autoencoder 2. With these regularizer values, we achieve good results (accuracy above 90%); when we change these values, the accuracy decreases. We therefore used these values for testing our network. After setting the regularization values, we added a softmax layer. To train the softmax layer, we used the cross-entropy loss function (Equation (16)), because in multi-class classification problems it gives better results than the mean squared error loss function (Equation (4)) [57]. $E = - \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{m} \left( y_{ij} \cdot \ln y'_{ij} + (1 - y_{ij}) \cdot \ln (1 - y'_{ij}) \right)}{n}$ (16)

Table 4

Parameter values of autoencoder 1

Sr. No.	Name of dataset	Hidden neuron size	L2Weight regularization	Sparsity regularization	Sparsity proportion	Decoder transfer function
1	Iris	4	0.001	4	0.05	logsigmoid/our formula
2	Wine, ovarian, glass, abalone, chemical	10	0.001	4	0.05	logsigmoid/our formula
3	Digit	100	0.004	4	0.15	logsigmoid
4	Thyroid	28	0.001	4	0.05	logsigmoid/our formula

Table 5

Parameter values of autoencoder 2

Sr. No.	Name of dataset	Hidden neuron size	L2Weight regularization	Sparsity regularization	Sparsity proportion	Decoder transfer function
1	Iris, wine, ovarian, glass, abalone, thyroid, chemical	50	0.0015	4	0.05	logsigmoid
2	Digit	50	0.002	4	0.1	logsigmoid/our formula

In Equation (16), y_ij is the target value, y’_ij is the predicted value, n is the number of input samples, and m is the number of output classes.
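As a worked illustration of Equation (16), the following minimal sketch evaluates the cross-entropy loss for a small batch. The function name and the clipping constant `eps` are our own assumptions, added only to keep the logarithms finite; they are not part of the equation.

```python
import numpy as np

def cross_entropy(y, y_pred, eps=1e-12):
    """Cross-entropy of Equation (16).

    y and y_pred have shape (n, m): n samples, m outputs per sample.
    """
    y = np.asarray(y, dtype=float)
    # Clip predictions away from 0 and 1 so that log() stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    n = y.shape[0]
    return -np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)) / n

# Two samples, two classes, completely uninformative predictions.
targets = [[1, 0], [0, 1]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss_uniform = cross_entropy(targets, uniform)   # equals 2 * ln(2)
loss_perfect = cross_entropy(targets, targets)   # close to zero
```

The loss shrinks toward zero as the predicted values approach the targets, which is the behaviour the softmax layer training relies on.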

In ‘softmax’, ‘soft’ indicates that the function is continuous and differentiable [17]. The softmax function is now mostly used in the output layer of a classifier [17]. If n is the input vector, then the i-th output a_i is obtained with the help of Equation (17), where the sum runs over the m elements of n. $a_{i} = \frac{e^{n_{i}}}{\sum_{j = 1}^{m} e^{n_{j}}}$ (17)
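Equation (17) can be sketched in a few lines. This is an illustrative implementation only; subtracting the maximum before exponentiating is a standard numerical-stability step that we add here and that does not change the result.

```python
import numpy as np

def softmax(n):
    """Softmax of Equation (17) for a 1-D input vector n."""
    n = np.asarray(n, dtype=float)
    shifted = n - np.max(n)   # stability shift; cancels in the ratio
    e = np.exp(shifted)
    return e / e.sum()

a = softmax([1.0, 2.0, 3.0])   # largest input gets the largest probability
b = softmax([0.0, 0.0])        # equal inputs give equal probabilities
```

The outputs are positive and sum to one, so they can be read as class probabilities.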

First, we trained our autoencoders in an unsupervised manner, up to the softmax layer; after that, we joined all these networks. This joined network is known as a stacked autoencoder. Figure 4.d is the stacked autoencoder made by stacking Fig. 4.a, 4.b, and 4.c (MATLAB-generated images). Finally, we trained this stacked autoencoder in a supervised manner to obtain high accuracy. We repeat this process until the stopping criterion is met. After completing these processes, we obtained the final weights and biases of both layers of the network. Figure 5 is an example of the weight image of the first autoencoder for the digit dataset. After that, we calculated the errors and accuracy with the help of a confusion matrix. All confusion matrices are shown in Fig. 6.
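The structure of the stacked network described above can be sketched as a forward pass through two encoders followed by a softmax layer. This is a structural illustration only, under several assumptions: the weights are random placeholders rather than trained values, the encoders use the logsigmoid, only the encoder halves of the autoencoders are stacked (as is usual for stacked autoencoders), and the digit dataset is assumed to have 10 classes. All function names are hypothetical.

```python
import numpy as np

def logsig(z):
    """Logistic sigmoid used here for the encoder layers."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax_rows(z):
    """Row-wise softmax for a batch of pre-activations."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stacked_forward(x, layers):
    """Forward pass: every layer but the last is an encoder,
    the last (W, b) pair is the softmax layer."""
    h = x
    for w, b in layers[:-1]:
        h = logsig(h @ w + b)
    w, b = layers[-1]
    return softmax_rows(h @ w + b)

# Digit-like shapes: 28*28 inputs, hidden sizes 100 and 50 (Tables 4 and 5),
# and an assumed 10 output classes.
rng = np.random.default_rng(0)
sizes = [784, 100, 50, 10]
layers = [(rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
probs = stacked_forward(rng.random((5, 784)), layers)
```

In practice, the placeholder weights would be replaced by the weights obtained from the layer-wise training, and the whole stack would then be fine-tuned in a supervised manner.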

Fig. 5

Weights of the first autoencoder for the digit dataset.

Fig. 6

Confusion matrices obtained from all eight datasets. The first column is obtained from the existing autoencoder technique. The second column is obtained from our proposed autoencoder technique. Panels ‘a.1’ and ‘a.2’ are the iris, ‘b.1’ and ‘b.2’ the wine, ‘c.1’ and ‘c.2’ the ovarian, ‘d.1’ and ‘d.2’ the glass, ‘e.1’ and ‘e.2’ the digit, ‘f.1’ and ‘f.2’ the abalone, ‘g.1’ and ‘g.2’ the thyroid, and ‘h.1’ and ‘h.2’ the chemical confusion matrices.

To further verify the effectiveness of our proposed autoencoder, we took three more datasets: the abalone, thyroid, and chemical datasets. The abalone dataset has 4177 samples of abalone shell rings, with 8 attributes and 1 target; we used all 4177 samples for training and the same 4177 samples for testing. The thyroid dataset has 7200 samples of thyroid patients, with 21 attributes and 3 targets; we used all 7200 samples for training and the same 7200 samples for testing. The chemical dataset has 498 samples of chemical sensors, with 8 attributes and 1 target; we used all 498 samples for training and the same 498 samples for testing. The parameter settings for these datasets are presented in Table 4 for autoencoder 1 and Table 5 for autoencoder 2.

5 Results and discussion

After completing the training process of this network, we obtained the confusion matrices. In this research, we used eight datasets for comparison purposes, so we obtained 16 confusion matrices. In each confusion matrix, the output class (row) is the predicted class and the target class (column) is the true class. The diagonal values are the correctly classified samples and the other values are the incorrectly classified samples. In the last column, the upper and lower values are the percentage of correctly classified predictions (the positive predictive value, or precision) and the false discovery rate, respectively. In the last row, the upper and lower values are the percentage of correctly classified examples of each true class (the recall, or true positive rate) and the false negative rate, respectively. The bottom-right value is the overall accuracy. The confusion matrices of Fig. 6.a.1, 6.b.1, 6.c.1, 6.d.1, 6.e.1, 6.f.1, 6.g.1, and 6.h.1 were obtained with the existing autoencoder technique, and the confusion matrices of Fig. 6.a.2, 6.b.2, 6.c.2, 6.d.2, 6.e.2, 6.f.2, 6.g.2, and 6.h.2 were obtained with our proposed autoencoder technique.

For the iris dataset (Fig. 6.a.1 and 6.a.2), we achieved 96.0% accuracy with the existing autoencoder technique and 100.0% accuracy with our proposed autoencoder technique. For the wine dataset (Fig. 6.b.1 and 6.b.2), we achieved 91.0% accuracy with the existing technique and 100.0% with our proposed technique. For the ovarian dataset (Fig. 6.c.1 and 6.c.2), we achieved 95.4% accuracy with the existing technique and 100.0% with our proposed technique. For the glass dataset (Fig. 6.d.1 and 6.d.2), we achieved 96.3% accuracy with the existing technique and 100.0% with our proposed technique. For the digit dataset (Fig. 6.e.1 and 6.e.2), we achieved 98.7% accuracy with the existing technique and 99.4% with our proposed technique.

For the abalone dataset (Fig. 6.f.1 and 6.f.2), we achieved 100% accuracy with both the existing technique and our proposed technique. For the thyroid dataset (Fig. 6.g.1 and 6.g.2), we achieved 95% accuracy with the existing technique and 96% with our proposed technique. For the chemical dataset (Fig. 6.h.1 and 6.h.2), we achieved 100% accuracy with both the existing technique and our proposed technique.
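The quantities read from the margins of each confusion matrix can be computed as in the following minimal sketch, which uses the same orientation as above (rows are predicted classes, columns are true classes). The 3-class matrix is a hypothetical example chosen to give 96% accuracy on 150 samples; it is not one of the matrices of Fig. 6, and the function name is our own.

```python
import numpy as np

def confusion_metrics(cm):
    """Per-class and overall metrics for a confusion matrix whose rows are
    predicted classes and whose columns are true classes."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    precision = correct / cm.sum(axis=1)   # last column, upper value
    recall = correct / cm.sum(axis=0)      # last row, upper value
    return {
        "precision": precision,
        "false_discovery_rate": 1 - precision,
        "recall": recall,
        "false_negative_rate": 1 - recall,
        "accuracy": correct.sum() / cm.sum(),   # bottom-right value
    }

# Hypothetical matrix: 144 of 150 samples on the diagonal.
example_cm = [[50, 0, 0],
              [0, 47, 3],
              [0, 3, 47]]
metrics = confusion_metrics(example_cm)
```

This sketch assumes that every class is predicted at least once and occurs at least once; otherwise a row or column sum is zero and the corresponding ratio is undefined.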

We have compared the performance of our modified activation function (AF) with the purelin, satlin, and tansigmoid AFs on the autoencoder. In multiple cases, our proposed autoencoder achieved equal or better results (Table 6).

Table 6
Comparison of accuracy (%) for different activation functions (AF) on the two-layer autoencoder

Sr. No.	Activation functions	Iris	Wine	Ovarian	Glass	Digit	Abalone	Thyroid	Chemical
1	Purelin	33.3	100	100	97.7	98.9	100	96.1	100
2	Logsig	96.0	91.0	95.4	96.3	98.7	100	95.0	100
3	Satlin	98.7	100	99.1	97.7	99.0	100	96.0	100
4	Tansig	98.7	100	100	96.3	98.7	100	93.5	100
5	Modified sigmoid	100	100	100	100	99.4	100	96.0	100
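The difference between the modified sigmoid and the logsig rows of Table 6 can be tabulated directly. The following small sketch only performs this subtraction on the accuracy values copied from the table; the variable names are our own.

```python
# Accuracy values (%) copied from the Logsig and Modified sigmoid rows of Table 6.
logsig_acc = {"Iris": 96.0, "Wine": 91.0, "Ovarian": 95.4, "Glass": 96.3,
              "Digit": 98.7, "Abalone": 100.0, "Thyroid": 95.0, "Chemical": 100.0}
modified_acc = {"Iris": 100.0, "Wine": 100.0, "Ovarian": 100.0, "Glass": 100.0,
                "Digit": 99.4, "Abalone": 100.0, "Thyroid": 96.0, "Chemical": 100.0}

# Gain in percentage points, rounded to one decimal to hide floating-point noise.
gains = {name: round(modified_acc[name] - logsig_acc[name], 1)
         for name in logsig_acc}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} percentage points")
```

On these values, the modified sigmoid is never below logsig, and the two are tied on the abalone and chemical datasets.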

We have also checked the effectiveness of our proposed autoencoder with more hidden layers (three and four) in the stacked autoencoder. For all these numbers of layers, our proposed autoencoder achieved acceptable results (Table 7).

Table 7

Accuracy (%) on different datasets for different numbers of layers of the stacked autoencoder

Sr. No.	No. of layers of stacked autoencoder	Iris	Wine	Ovarian	Glass	Digit	Abalone	Thyroid	Chemical
1	2	100	100	100	100	99.4	100	96.0	100
2	3	98.7	100	100	76.2	98.1	100	98.2	100
3	4	97.3	86.0	100	100	97.0	100	97.8	100

From all the above results, we can say that in multiple cases our proposed autoencoder technique performs very effectively and can achieve equal or better results compared to existing autoencoder techniques.

To check for overfitting of the proposed autoencoder, we divided each dataset into separate training and testing samples. Table 8 shows the numbers of training and testing samples, together with the training and testing accuracy. These accuracies were achieved using two layers of our proposed autoencoder. All these accuracies are acceptable.

Table 8

Training samples, testing samples, total samples, training accuracy and testing accuracy on different datasets

Sr. No.	Dataset	Training samples	Testing samples	Total samples	Training accuracy (%)	Testing accuracy (%)
1	iris	120	30	150	100	100
2	wine	148	30	178	100	96.7
3	ovarian	166	50	216	100	96.0
4	glass	164	50	214	100	92.0
5	digit	5000	5000	10000	100	99.4
6	abalone	3133	1044	4177	100	100
7	thyroid	6200	1000	7200	98.2	96.9
8	chemical	400	98	498	100	100
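The division into training and testing samples used for Table 8 can be illustrated with a simple holdout split. This is a sketch under the assumption of a random split with disjoint index sets; the paper does not state how the samples were selected, and the function name and seed are hypothetical.

```python
import numpy as np

def holdout_split(n_samples, n_test, seed=0):
    """Return disjoint (train_idx, test_idx) index arrays.

    The random shuffling is an assumption made for this illustration.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[n_test:], order[:n_test]

# Iris row of Table 8: 150 samples split into 120 training and 30 testing.
train_idx, test_idx = holdout_split(150, 30)
```

Because the two index sets do not overlap, the testing accuracy is measured on samples that were not seen during training, which is what makes the comparison of training and testing accuracy informative about overfitting.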

Csaba Brunner et al. [48] achieved 73.2% accuracy with SAE-SNN, 74.26% accuracy with AE-SNN, and 77.25% accuracy with SNN on the NSL-KDD dataset. S. F. Qadri et al. [49] achieved 91.53% precision with SVseg on the MICCAI dataset. In Table 9, we present some other autoencoders, their techniques, and their results, and we compare our technique and results with them. The autoencoder of serial no. 1 was proposed by Jinling Zhao et al. and is based on a combination of the SAE and 3DDRN networks. The autoencoder of serial no. 2 was proposed by Yongming Li et al. and is based on ESGSAE. The autoencoder of serial no. 3 was proposed by R. Dhanuka et al. and is based on a semi-supervised autoencoder-based approach. The autoencoder of serial no. 4 was proposed by O. Aouedi et al. and is based on a novel federated semi-supervised learning scheme. The autoencoder of serial no. 5 is our proposed autoencoder.

Table 9

Comparison of four other techniques with our proposed autoencoder

Sr. No.	Autoencoder	Techniques	Results
1	Combining the SAE and 3DDRN network [66]	This network is a combination of a stacked autoencoder and a 3D convolutional neural network.	Jinling Zhao et al. [66] used this network on the Indian Pines (IP), Pavia University (PU), and Salinas (SA) datasets. They achieved an overall accuracy of 98.97% on IP, 99.69% on PU, and 99.24% on SA.
2	Embedded stacked group sparse autoencoder (ESGSAE) [67]	To exploit the complementarity between the original features and the hidden outputs, the authors proposed an embedded stacked group sparse autoencoder (ESGSAE), which embeds the original features into the hidden layers. For further improvement, the authors used an L1 regularization-based feature selection strategy.	Yongming Li et al. [67] used this network on several public datasets and improved the classification accuracy by up to 13.33%, 7.33%, and 7.55% respectively.
3	A semi-supervised autoencoder-based approach for protein function prediction [68]	R. Dhanuka et al. [68] proposed three variants of this method: the sequential method, the sequential & interaction method, and the sequential & interaction & physicochemical method.	For biological process, they achieved an average precision of 0.331 (first method), 0.4019 (second method), and 0.4013 (third method). For molecular function, they achieved an average precision of 0.4347 (first method), 0.5078 (second method), and 0.5129 (third method).
4	A novel federated semi-supervised learning scheme	O. Aouedi et al. proposed this learning scheme. During training on each device, they used an autoencoder.	They obtained 95.84% overall accuracy on the SCADA system dataset.
5	Our proposed autoencoder	We made a two-layer stacked autoencoder with a modified sigmoid activation function.	Our proposed autoencoder achieved 100% accuracy on the iris, wine, ovarian, glass, abalone, and chemical datasets, 96% accuracy on the thyroid dataset, and 99.4% accuracy on the digit (image) dataset.

5.1 Performance analysis

Figure 7 shows the performance analysis of three AFs. This analysis was obtained with the SCG training algorithm on the glass dataset. We trained these networks for up to 10000 epochs. With our proposed activation function, the best training performance was 4.9753e-16 at epoch 2469. With tanh, the best training performance was 1.8993e-15 at epoch 9468. With logsigmoid, the best training performance was 7.922e-16 at epoch 6478. Thus, for the same training samples, algorithm, and configuration, our proposed activation function gives the best performance: it reaches the lowest cross-entropy value of the three and does so in the fewest epochs.

Fig. 7

Cross-entropy value over 10000 epochs. Panels (a), (b), and (c) were obtained with our proposed activation function, tansigmoid, and logsigmoid, respectively.

6 Conclusion and future scope

In this paper, we have presented our proposed stacked autoencoder with a modified sigmoid activation function. Our modified sigmoid activation function is smooth and S-shaped, has a bounded range (-1, 1), and is continuously differentiable and zero-centered. Vanishing gradient and slow convergence are two limitations of our modified sigmoid activation function. We compared our proposed stacked autoencoder with the presently used autoencoder technique, testing both on the iris, glass, wine, ovarian, and digit (image) datasets. The existing autoencoder technique gave 96% accuracy on the iris, 91% on the wine, 95.4% on the ovarian, 96.3% on the glass, and 98.7% on the digit (image) dataset. Our proposed autoencoder gave 100% accuracy on the iris, wine, ovarian, and glass datasets, and 99.4% accuracy on the digit (image) dataset. In the discussion part, we explained why our proposed stacked autoencoder is effective and gives better results than the existing autoencoder technique. Compared with the existing autoencoder technique, the accuracy achieved with our proposed autoencoder is higher by 4 percentage points on the iris, 9 on the wine, 4.6 on the ovarian, 3.7 on the glass, and 0.7 on the digit dataset. To further verify the effectiveness of our proposed autoencoder, we took three more datasets: the abalone, thyroid, and chemical datasets. Our proposed autoencoder achieved 100% accuracy on the abalone and chemical datasets, and 96% accuracy on the thyroid dataset.

Therefore, in future work, our proposed stacked autoencoder may be considered for use in neural networks to obtain better accuracy.

References

1. Luo, Xi, Jinwen Xia, Yanfeng Liu, Extraction of dynamic operation strategy for standalone solar-based multi-energy systems: A method based on decision tree algorithm, Sustainable Cities and Society 70 (2021), 102917.

2. Bhajantri, Lokesh, Basavaraj Kumbar, Cluster-based data aggregation in wireless sensor networks: a Bayesian classifier approach. In Evolutionary Computing and Mobile Sustainable Networks (2021), 971–980. Springer, Singapore.

3. Alkhodari, Mohanad, Luay Fraiwan, Convolutional and recurrent neural networks for the detection of valvular heart diseases in phonocardiogram recordings, Computer Methods and Programs in Biomedicine 200 (2021), 105940.

4. Zhao, Dongdong, Xiaoyi Hu, Shengwu Xiong, Jing Tian, Jianwen Xiang, Jing Zhou, Huanhuan Li, K-means clustering and kNN classification based on negative databases, Applied Soft Computing 110 (2021), 107732.

5. Ketu, Shwet, Pramod Kumar Mishra, Scalable kernel-based SVM classification algorithm on imbalance air quality data for proficient healthcare, Complex & Intelligent Systems (2021), 1–19.

6. Karimuzzaman, Md Nusrat Islam, Sabrina Afroz, Md Moyazzem Hossain, Predicting stock market price of Bangladesh: a comparative study of linear classification models, Annals of Data Science 8(1) (2021), 21–38.

7. Itoo, Fayaz, Satwinder Singh, Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection, International Journal of Information Technology 13(4) (2021), 1503–1511.

8. Bai, Bing, Guiling Li, Senzhang Wang, Zongda Wu, Wenhe Yan, Time series classification based on multi-feature dictionary representation and ensemble learning, Expert Systems with Applications 169 (2021), 114162.

9. Wu, Zongda, Guiling Li, Shigen Shen, Xinze Lian, Enhong Chen, Guandong Xu, Constructing dummy query sequences to protect location privacy and query privacy in location-based services, World Wide Web 24(1) (2021), 25–49.

10. Tang, Siyuan, Feifei Yu, Construction and verification of retinal vessel segmentation algorithm for color fundus image under BP neural network model, The Journal of Supercomputing 77(4) (2021), 3870–3884.

11. Lu, Danni, Dongbing Tong, Qiaoyu Chen, Wuneng Zhou, Jun Zhou, Shigen Shen, Exponential Synchronization of Stochastic Neural Networks with Time-Varying Delays and Lévy Noises via Event-Triggered Control, Neural Processing Letters 53(3) (2021), 2175–2196.

12. Zeng, Yuwei, Xiaochun Yun, Xunxun Chen, Boquan Li, Haiwei Tsang, Yipeng Wang, Tianning Zang, Yongzheng Zhang, Finding disposable domain names: A linguistics-based stacking approach, Computer Networks 184 (2021), 107642.

13. Ying, Limeng, Xiaojing Liu, Menghao Li, Lipeng Sun, Pishi Xiu, Jie Yang, How does intelligent manufacturing affects enterprise innovation? The mediating role of organisational learning, Enterprise Information Systems (2021), 1–38.

14. Feng, Jing, Xiaobin Xu, Pan Liu, Feng Ma, Chengrong Ma, Zhigang Tao, Slope Sliding Force Prediction via Belief Rule-Based Inferential Methodology, International Journal of Computational Intelligence Systems 14(1) (2021), 965–977.

15. Song, Fei, Alfonso Rodriguez-Dono, Sebastia Olivella, Zhen Zhong, Analysis and modelling of longitudinal deformation profiles of tunnels excavated in strain-softening time-dependent rock masses, Computers and Geotechnics 125 (2020), 103643.

16. Feng, Sheng, Haiyan Shi, Longjun Huang, Shigen Shen, Shui Yu, Hua Peng, Chengdong Wu, Unknown hostile environment-oriented autonomous WSN deployment using a mobile robot, Journal of Network and Computer Applications 182 (2021), 103053.

17. Goodfellow, Bengio, Courville, Deep Learning, MIT Press (2016).

18. Hagan, M.T., Neural Network Design, 2nd Edition, 2014.

19. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Prentice Hall, 2009.

20. Charu Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, 2018.

21. Wang, Tianzuo, Linxiang Wang, Fei Xue, Mengya Xue, Identification of crack development in granite under triaxial compression based on the acoustic emission signal, International Journal of Distributed Sensor Networks 17(1) (2021), 1550147720986116.

22. Zhao, Jinling, Lei Hu, Yingying Dong, Linsheng Huang, Shizhuang Weng, Dongyan Zhang, A combination method of stacked autoencoder and 3D deep residual network for hyperspectral image classification, International Journal of Applied Earth Observation and Geoinformation 102 (2021), 102459.

23. D’Angelo, Gianni, Francesco Palmieri, Network traffic classification using deep convolutional recurrent autoencoder neural networks for spatial–temporal features extraction, Journal of Network and Computer Applications 173 (2021), 102890.

24. Lin, Guoying, Xiaofeng Feng, Wenchong Guo, Xueyuan Cui, Shengyuan Liu, Weichao Jin, Zhenzhi Lin, Yi Ding, Electricity theft detection based on stacked autoencoder and the undersampling and resampling based random forest algorithm, IEEE Access 9 (2021), 124044–124058.

25. Ferri, Raffaele, Claudio Babiloni, Vania Karami, Antonio Ivano Triggiani, Filippo Carducci, Giuseppe Noce, Roberta Lizio, et al., Stacked autoencoders as new models for an accurate Alzheimer’s disease classification support using resting-state EEG and MRI measurements, Clinical Neurophysiology 132(1) (2021), 232–245.

26. Khamparia, Aditya, Deepak Gupta, Joel JPC Rodrigues, Victor Hugo C. de Albuquerque, DCAVN: Cervical cancer prediction and classification using deep convolutional and variational autoencoder network, Multimedia Tools and Applications 80(20) (2021), 30399–30415.

27. Aslam, Muhammad Aqeel, Cuili Xue, Yunsheng Chen, Amin Zhang, Manhua Liu, Kan Wang, Daxiang Cui, Breath analysis based early gastric cancer classification from deep stacked sparse autoencoder neural network, Scientific Reports 11(1) (2021), 1–12.

28. Li, Daqiu, Zhangjie Fu, Jun Xu, Stacked-autoencoder-based model for COVID-19 diagnosis on CT images, Applied Intelligence 51(5) (2021), 2805–2817.

29. Siouda, Roguia, Mohamed Nemissi, Hamid Seridi, ECG beat classification using neural classifier based on deep autoencoder and decomposition techniques, Progress in Artificial Intelligence (2021), 1–15.

30. Abd El Kader, Isselmou, Guizhi Xu, Zhang Shuai, Sani Saminu, Imran Javaid, Isah Salim Ahmad, Souha Kamhi, Brain Tumor Detection and Classification on MR Images by a Deep Wavelet Auto-Encoder Model, Diagnostics 11(9) (2021), 1589.

31. Xu, Xiaowei, Jingyi Feng, Liu Zhan, Zhixiong Li, Feng Qian, Yunbing Yan, Fault Diagnosis of Permanent Magnet Synchronous Motor Based on Stacked Denoising Autoencoder, Entropy 23(3) (2021), 339.

32. Deng, Ziwei, Zhuoyue Wang, Zhaohui Tang, Keke Huang, Hongqiu Zhu, A deep transfer learning method based on stacked autoencoder for cross-domain fault diagnosis, Applied Mathematics and Computation 408 (2021), 126318.

33. Ke, Ziqi, Haris Vikalo, Real-time radio technology and modulation classification via an LSTM auto-encoder, IEEE Transactions on Wireless Communications (2021).

34. Yang, Lu, Hongquan Jiang, Weld defect classification in radiographic images using unified deep neural network with multi-level features, Journal of Intelligent Manufacturing 32(2) (2021), 459–469.

35. Deperlioglu, Omer, Heart sound classification with signal instant energy and stacked autoencoder network, Biomedical Signal Processing and Control 64 (2021), 102211.

36. Qin, Yi, Xin Wang, Jingqiang Zou, The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines, IEEE Transactions on Industrial Electronics 66(5) (2018), 3814–3824.

37. Li, Yuanzhi, Yang Yuan, Convergence analysis of two-layer neural networks with relu activation, arXiv preprint arXiv:6 (2017).

38. Wang, Xin, Yi Qin, Yi Wang, Sheng Xiang, Haizhou Chen, ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis, Neurocomputing 363 (2019), 88–98.

39. Ramachandran, Prajit, Barret Zoph, Quoc Le, Swish: a self-gated activation function, arXiv preprint arXiv:17 (2017).

40. Kong, Shumin, Masahiro Takatsuka, Hexpo: A vanishing-proof activation function. In 2017 International Joint Conference on Neural Networks (IJCNN), IEEE (2017), 2562–2567.

41. Zheng-Lin, Tong, Wang Hai-Xing, Yuan Shao-Xun, Sun Xiao, Xie Jian-Ming, Classification of Alzheimer’s Disease Based on Stacked Denoising Autoencoder. In 2018 4th Annual International Conference on Network and Information Systems for Computers (ICNISC), IEEE (2018), 248–253.

42. Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, Pierre-Antoine Manzagol, Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (2008), 1096–1103.

43. Ger, Stephanie, Diego Klabjan, Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification, arXiv preprint arXiv:1901.02514 (2019).

44. Yan, Binghao, Guodong Han, Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system, IEEE Access 6 (2018), 41238–41248.

45. Luo, Wei, Jian Yang, Wei Xu, Tao Fu, Locality-constrained sparse auto-encoder for image classification, IEEE Signal Processing Letters 22(8) (2014), 1070–1073.

46. Fan, Jianchao, Kai Lin, Min Han, A novel joint change detection approach based on weight-clustering sparse autoencoders, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12(2) (2019), 685–699.

47. Zhu, Haiping, Jiaxin Cheng, Cong Zhang, Jun Wu, Xinyu Shao, Stacked pruning sparse denoising autoencoder based intelligent fault diagnosis of rolling bearings, Applied Soft Computing 88 (2020), 106060.

48. Brunner, Ko and Fodor, An Autoencoder-Enhanced Stacking Neural Network Model for Increasing the Performance of Intrusion Detection, Journal of Artificial Intelligence and Soft Computing Research 12(2) (2022), 149–163.

49. Qadri, S.F., Shen, Ahmad, Qadri, Zareen and Akbar, M.A., SVseg: Stacked Sparse Autoencoder-Based Patch Classification Modeling for Vertebrae Segmentation, Mathematics 10(5) (2022), 796.

50. Dai, Tao, Wenjun Yan, Kaiqi Zhang, Chen Qiu, Xiangmo Zhao, Shirui Pan, Gated relational stacked denoising autoencoder with localized author embedding for global citation recommendation, Expert Systems with Applications 184 (2021), 115359.

51. Upadhyay, P. K., Pandita, Joshi, Scaled Conjugate Gradient Backpropagation based SLA Violation Prediction in Cloud Computing. In 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), IEEE, 203–208.

52. Nayak, Jyotiprakash, Patil, L. G., Vinayak Patki, Artificial neural network based water quality index (WQI) for river Godavari (India), Materials Today: Proceedings (2021).

53. Sada, Samuel, Improving the predictive accuracy of artificial neural network (ANN) approach in a mild steel turning operation, The International Journal of Advanced Manufacturing Technology (2021), 1–10.

54. Abdollahi, Fahimeh, Masoud Fatemi, A new conjugate gradient method based on a modified secant condition with its applications in image processing, RAIRO-Operations Research 55(1) (2021), 167–187.

55. Sujatha, Mareeswari, Jyotir Moy Chatterjee, A. Mousa Abd Allah and Aboul Ella Hassanien, A Bayesian Regularized Neural Network for Analyzing Bitcoin Trends, IEEE Access 9 (2021), 37989–38000.

56. Website of MATLAB program, https://www.mathworks.com.

57. Ho, Yaoshiang, Samuel Wookey, The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling, IEEE Access 8 (2019), 4806–4813.

58. Liu, Guangyuan, Yangyang Li, Licheng Jiao, Yanqiao Chen, Ronghua Shang, Multiobjective evolutionary algorithm assisted stacked autoencoder for PolSAR image classification, Swarm and Evolutionary Computation 60 (2021), 100794.

59. Yu, Miao, Tianqi Quan, Qinglong Peng, Xu Yu, Lei Liu, A model-based collaborate filtering algorithm based on stacked AutoEncoder, Neural Computing and Applications (2021), 1–9.

60. Deperlioglu, Omer, Heart sound classification with signal instant energy and stacked autoencoder network, Biomedical Signal Processing and Control 64 (2021), 102211.

61. Aslam, Muhammad Aqeel, Cuili Xue, Yunsheng Chen, Amin Zhang, Manhua Liu, Kan Wang, Daxiang Cui, Breath analysis based early gastric cancer classification from deep stacked sparse autoencoder neural network, Scientific Reports 11(1) (2021), 1–12.

62. Khamparia, Aditya, Deepak Gupta, Joel JPC Rodrigues, Victor Hugo C. de Albuquerque, DCAVN: Cervical cancer prediction and classification using deep convolutional and variational autoencoder network, Multimedia Tools and Applications 80(20) (2021), 30399–30415.

63. Sánchez-Morales, Adrián, José-Luis Sancho-Gómez, Aníbal Figueiras-Vidal, Complete autoencoders for classification with missing values, Neural Computing and Applications 33(6) (2021), 1951–1957.

64. Gupta, Prabhav, Yash Ghatole, Nihal Reddy, Stacked Autoencoder based Intrusion Detection System using One-Class Classification. In 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), IEEE (2021).

65. Gulhane, Sushen Rameshpant, Suresh Damodar Shirbahadurkar, Sanjay Shrikrushna Badhe, Indian classical musical instrument classification using Timbral features, Concurrency and Computation: Practice and Experience (2021), e6418.

66. Zhao, Jinling, Lei Hu, Yingying Dong, Linsheng Huang, Shizhuang Weng, Dongyan Zhang, A combination method of stacked autoencoder and 3D deep residual network for hyperspectral image classification, International Journal of Applied Earth Observation and Geoinformation 102 (2021), 102459.

67. Li, Yongming, Yan Lei, Pin Wang, Mingfeng Jiang, Yuchuan Liu, Embedded stacked group sparse autoencoder ensemble with L1 regularization and manifold reduction, Applied Soft Computing 101 (2021), 107003.

68. Dhanuka, Tripathi, Singh, J.P., A semi-supervised autoencoder-based approach for protein function prediction, IEEE Journal of Biomedical and Health Informatics (2022).