Neural network with NewSigmoid activation function

Abstract

The power of Artificial Neural Networks (ANNs) depends strongly on the Activation Function (AF). The tansig and logsig functions are widely used AFs, but there is still room for improvement. In this paper, we propose a NewSigmoid AF for neural networks. NewSigmoid is as powerful as tansig and logsig, and in multiple cases it gives better or equivalent performance compared with both. Like these AFs, NewSigmoid is a smooth, S-shaped, bounded, and continuously differentiable function, and like tansig it is zero-centered. NewSigmoid is therefore also suitable for solving non-linear problems. We tested this AF on the iris, cancer, glass, chemical, bodyfat, wine, and ovarian datasets, using the Scaled Conjugate Gradient (SCG), Levenberg-Marquardt (LM), and Bayesian Regularization (BR) algorithms to optimize the neural network. With multilayer neural networks, the maximum accuracies obtained were 100% on the iris dataset using LM and BR; 99.9% on the cancer dataset using BR; 100% on the glass dataset using BR; 100% on the chemical and bodyfat datasets using SCG, LM, and BR; 100% on the wine dataset using LM and BR; and 99.1% on the ovarian dataset using BR. NewSigmoid also achieves 100% training and validation accuracy on the mathwork-cap image dataset using SCG.

Keywords

Logsigmoid, tansigmoid, neural network, activation function, multilayer network

1 Introduction

Neural networks are a branch of machine learning [1]. Clustering, classification, and recognition are major application areas of Artificial Neural Networks (ANNs) [1–4]. Applications of ANNs include classification of tobacco leaf pests [5], image compression with VGG16 [6], garbage recognition and classification [7], facial emotion recognition [8], human action recognition [9], prediction of chloride diffusivity in concrete [10], improved salient object detection [11], prediction of COVID-19 patients [12], stock price pattern classification and prediction [13], and network traffic classification [14]. ANNs are well suited to nonlinear and complex problems, so other important applications include robotic control, quality control, and disease detection in the medical field. ANNs may also be used to detect relationships between things that are represented by graphs, and are therefore used for Community Detection (CD) in complex networks [15]. Moh. Al. Andoli et al. used a parallel stacked autoencoder with PSO as a community detector in complex networks. Subspace groups or individuals can also be detected with the help of CD [16].

The architecture of an ANN is inspired by the biological human brain [1–4]. An ANN is normally built from multiple layers, which fall into three categories: an input layer, an output layer, and one or more hidden layers [2]. One input layer, one hidden layer, and one output layer together form a single-hidden-layer neural network. Training an ANN is called the learning process, which is divided into three categories: supervised learning, reinforcement learning, and unsupervised learning. To test our work, we use a Multilayer Feedforward Neural Network (MLFFNN), which is trained by supervised learning. An MLFFNN has either a single hidden layer or multiple hidden layers. The value of each neuron is calculated in the hidden layer. The neuron is also known as the information-processing unit of the ANN [3].

The operation of a neuron depends on three basic elements [3]. The first is the set of synapses, or connecting links, where each input is assigned a weight and multiplied by it. The second is the adder, which sums these weighted inputs; the resulting sum is the value of the neuron. The third is the activation function, which computes the output from this neuron value.

1.1 Working of MLFFNN

Figure 1 shows an ANN model. In this figure, the first layer is the input layer, the second is the hidden layer, and the third is the output layer. Each layer has its own weight matrix, bias vector, net input vector, and output vector. Hidden neurons work like feature detectors [3]; they are therefore used to extract the features that the network relies on.

Fig. 1

Sample pattern recognition neural network, built for the iris dataset. The network has 4 inputs and 3 outputs. The tansig function is used in the hidden layer and the softmax function is used in the output layer.

A multilayer perceptron can be trained with the back-propagation algorithm, which includes the least mean-square (LMS) algorithm (presented by Widrow and Hoff) as a special case [3]. This training has two phases [2–4].

1.1.1 Forward phase

During this phase, the input signal is propagated through the network layer by layer, using the current synaptic weights, until it reaches the outputs. The output is then calculated with the help of Equation (2). The final predicted output is used for calculating the errors.

Suppose v_j is the value of neuron j, i.e., its induced local field: the sum of its inputs multiplied by their weights, as given in Equation (1). $v_{j} = \sum_{i = 0}^{m} w_{ji} y_{i}$ (1)

Here, w_ji is the weight connecting input i to neuron j, y_i is the output of neuron i in the previous layer (equivalently, the i^th input of neuron j), and m is the total number of inputs (including the bias). If the activation function (AF) is φ_j (v_j), then the output y_j of neuron j is calculated with the help of Equation (2). $y_{j} = φ_{j} (v_{j})$ (2)
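As an illustration, Equations (1) and (2) for a single neuron can be sketched in Python as follows. This is a minimal sketch and not the MATLAB implementation used in this work; the function names, the choice of tanh as the AF, and the sample weights and inputs are assumptions made only for the example.

```python
import math

def induced_local_field(weights, inputs):
    """Equation (1): weighted sum of the inputs; index 0 carries the bias."""
    return sum(w * y for w, y in zip(weights, inputs))

def neuron_output(weights, inputs, activation=math.tanh):
    """Equation (2): apply the activation function to the induced local field."""
    return activation(induced_local_field(weights, inputs))

# Hypothetical values: inputs[0] = 1.0 is the fixed bias input and
# weights[0] is the corresponding bias weight.
weights = [0.1, 0.5, -0.3]
inputs = [1.0, 2.0, 1.0]

v = induced_local_field(weights, inputs)  # 0.1*1.0 + 0.5*2.0 - 0.3*1.0
y = neuron_output(weights, inputs)        # tanh(v)
print(v, y)
```

Treating the bias as the weight of a constant input of 1 is what allows the sum in Equation (1) to start at i = 0.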

Generally, logsig and tansig activation functions are used in neural networks [17, 18]. Arvind et al. [18], Fatemeh et al. [19], and Chandra et al. [20] have used tansig AF in ANN. Sunil et al. [21] have used logsig AF in ANN. Raja et al. [22] have used log-sigmoid, radial-base, and tan-sigmoid AF in ANN.

If y_j is the final predicted output value and d_j is the given target value, then the error (e_j) of the network is calculated with the help of Equation (3). $e_{j} = d_{j} - y_{j}$ (3)

If this error is nonzero, we apply the backward phase to reduce it; otherwise, we stop training the network.

1.1.2 Backward phase

The errors arise because the initial weight and bias values are chosen randomly, so the main aim of this phase is to update the weight and bias values. The error of Equation (3) is propagated back through the network, layer by layer, in the backward direction. The derivative of the activation function is required during the back-propagation process.

Instantaneous error energy (ɛ) of the neuron at j^th layer is calculated with the help of Equation (4). $ɛ_{j} = \frac{1}{2} (e_{j}^{2})$ (4)

We apply the chain rule to Equation (4) with respect to the weight and the bias. The result is Equations (5) and (6). $\frac{\partial ɛ_{j}}{\partial w_{ji}} = \frac{\partial ɛ_{j}}{\partial e_{j}} \times \frac{\partial e_{j}}{\partial y_{j}} \times \frac{\partial y_{j}}{\partial v_{j}} \times \frac{\partial v_{j}}{\partial w_{ji}}$ (5) $\frac{\partial ɛ_{j}}{\partial b_{ji}} = \frac{\partial ɛ_{j}}{\partial e_{j}} \times \frac{\partial e_{j}}{\partial y_{j}} \times \frac{\partial y_{j}}{\partial v_{j}} \times \frac{\partial v_{j}}{\partial b_{ji}}$ (6)

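Now, we update the weight and bias using Equations (7) and (8), where η is the learning rate and the step is taken against the gradient so that the error energy decreases. $w_{new} = w_{old} - η \frac{\partial ɛ_{j}}{\partial w_{ji}}$ (7) $b_{new} = b_{old} - η \frac{\partial ɛ_{j}}{\partial b_{ji}}$ (8)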
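The backward phase for a single neuron can be sketched as below. This is an illustrative plain gradient-descent step under stated assumptions, not the SCG, LM, or BR procedure used in the experiments: the AF is taken to be tanh, the bias is treated as the weight of a constant input of 1 (so the weight and bias updates share one formula), and the function names, learning rate, and sample values are hypothetical.

```python
import math

def forward(weights, inputs):
    """Forward phase: induced local field followed by a tanh activation."""
    v = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(v)

def error_energy(weights, inputs, target):
    """Instantaneous error energy: half the squared error."""
    e = target - forward(weights, inputs)
    return 0.5 * e * e

def gradient_step(weights, inputs, target, eta=0.1):
    """One update: chain rule for the gradient, then a step against it."""
    y = forward(weights, inputs)
    e = target - y
    d_energy_d_e = e          # derivative of 0.5 * e^2 with respect to e
    d_e_d_y = -1.0            # derivative of (d - y) with respect to y
    d_y_d_v = 1.0 - y * y     # derivative of tanh(v) with respect to v
    # The derivative of v with respect to each weight is the matching input.
    grads = [d_energy_d_e * d_e_d_y * d_y_d_v * x for x in inputs]
    return [w - eta * g for w, g in zip(weights, grads)]

weights = [0.1, 0.5, -0.3]    # weights[0] plays the role of the bias
inputs = [1.0, 2.0, 1.0]      # inputs[0] = 1.0 is the constant bias input
target = 0.2

before = error_energy(weights, inputs, target)
new_weights = gradient_step(weights, inputs, target)
after = error_energy(new_weights, inputs, target)
print(before, after)          # the error energy should go down
```

Stepping against the gradient is what makes the error energy decrease for a sufficiently small learning rate; stepping along it would increase the error instead.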

Optimization algorithms are used to calculate the new weights and biases, and they reduce the loss. Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and Bayesian Regularization (BR) are some well-known optimization algorithms. LM was proposed by D. W. Marquardt [23]. The SCG algorithm is based on conjugate directions [24]. LM, SCG, and BR are used by many researchers. Liang et al. (2020) used LM in the load frequency system [25]. Ju et al. (2020) used LM in power flow calculation [26]. Upadhyay et al. (2019) used SCG for violation prediction in cloud computing [27]. Jyotiprakash et al. used LM, SCG, and BR in an ANN for the prediction of the water quality index [28]. Vidha et al. used LM for the prediction of daily nitrogen oxide for health management [29]. Faraggi et al. used LM for scoring protein models [30]. S.O. Sada used LM and SCG for the prediction of accuracy in a mild steel turning operation [31]. Abdollahi et al. proposed a new conjugate gradient method, based on SCG, for solving an optimization problem [32]. Sujatha et al. used BR, SCG, and LM for the analysis of Bitcoin trends [33]. Bakhshayesh et al. used LM and BR for accurate estimation in nuclear power plants (NPP) [34]. Aneja et al. used BR in an ANN to predict the strength characteristics of fly-ash and bottom-ash based geopolymer concrete [35]. Handayani et al. used BR to solve inverse kinematics on planar manipulators [36]. In batch training mode, LM is often the fastest training algorithm. SCG is used for large networks. BR gives good generalization without the need for a validation set, can be used on networks of arbitrary size, and ensures that only the required number of parameters is effectively used. Because SCG, LM, and BR are so widely used, we test our NewSigmoid activation function with all three.

1.2 Some activation functions and their properties

Activation functions are divided into two categories: linear activation functions and non-linear activation functions.

Linear activation function: Linear activation functions are useful for linearly separable problems. The hard limit, symmetrical hard limit, and linear functions are some examples. The hard limit function outputs zero if the input is less than zero, and one otherwise. The symmetrical hard limit function outputs negative one if the input is less than zero, and one otherwise. The linear function returns its input (n) unchanged.

Non-linear activation function: Non-linear activation functions are more useful for problems that are not linearly separable. The log sigmoid, hyperbolic tangent sigmoid, Elliot symmetric sigmoid transfer function (elliotsig), and softmax are some examples. For an input n, the log sigmoid function outputs $1 / (1 + e^{- n})$ (Table 1), and the hyperbolic tangent sigmoid function outputs $(e^{n} - e^{- n}) / (e^{n} + e^{- n})$ (Table 1). For an input vector n, the softmax function outputs exp (n) / sum (exp (n)). The "soft" in softmax indicates that the function is continuous and differentiable [24]. The softmax function is now mostly used in the output layer of classifiers [24].
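The functions described in the two paragraphs above can be written out as a short Python sketch. The function names are illustrative and are not the MATLAB toolbox functions; the only detail added beyond the text is that the softmax subtracts the largest input before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import math

def hard_limit(n):
    """0 for negative input, 1 otherwise."""
    return 0.0 if n < 0 else 1.0

def symmetric_hard_limit(n):
    """-1 for negative input, 1 otherwise."""
    return -1.0 if n < 0 else 1.0

def linear(n):
    """Returns the input unchanged."""
    return n

def logsig(n):
    """Log sigmoid: 1 / (1 + e^-n), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Hyperbolic tangent sigmoid: (e^n - e^-n) / (e^n + e^-n), range (-1, 1)."""
    return (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))

def softmax(values):
    """exp(n) / sum(exp(n)) over a list of inputs."""
    shift = max(values)  # subtracting the maximum does not change the result
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [x / total for x in exps]

probabilities = softmax([1.0, 2.0, 3.0])
print(logsig(0.0), tansig(0.5), probabilities)
```

The softmax outputs are positive and sum to one, which is why the function is convenient as the output layer of a classifier.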

Table 1
Different activation functions with their mathematical equations, derivatives, and ranges

Sr. No.	Activation function	Equation	Derivative	Range
1	Logsigmoid or Sigmoid	$f (n) = \frac{1}{1 + e^{- n}}$	$f^{'} (n) = \frac{e^{- n}}{{(1 + e^{- n})}^{2}}$	(0, 1)
2	Tansigmoid or Tanh	$f (n) = \frac{e^{n} - e^{- n}}{e^{n} + e^{- n}}$	$f^{'} (n) = \frac{4 e^{2 n}}{{(1 + e^{2 n})}^{2}}$	(–1, 1)
3	NewSigmoid	$f (n) = \frac{e^{n} - e^{- n}}{\sqrt{2 (e^{2 n} + e^{- 2 n})}}$	$f^{'} (n) = \frac{e^{n} + e^{- n}}{\sqrt{2 (e^{2 n} + e^{- 2 n})}} - \frac{4 (e^{- 2 n} - e^{2 n}) (e^{- n} - e^{n})}{2 {(2 e^{2 n} + 2 e^{- 2 n})}^{\frac{3}{2}}}$	(–0.7071, 0.7071)
4	ISigmoid [38]	α(x − a) + Sigmoid(a); x ≥ a	α; |x| ≥ a	(depends on conditions)
		Sigmoid(x); −a < x < a	Sigmoid′(x); |x| < a
		α(x + a) + Sigmoid(a); x ≤ −a
		where a is the threshold and α is the slope; both are preset.
5	ReLU [39]	x; x ≥ 0	1; x ≥ 0	[0, ∞)
		0; x < 0	0; x < 0
6	RelTanh [39]	Tanh′(λ⁺)(x − λ⁺) + Tanh(λ⁺); x ≥ λ⁺	Tanh′(λ⁺); x ≥ λ⁺	(depends on conditions)
		Tanh(x); λ⁻ < x < λ⁺	Tanh′(x); λ⁻ < x < λ⁺
		Tanh′(λ⁻)(x − λ⁻) + Tanh(λ⁻); x ≤ λ⁻	Tanh′(λ⁻); x ≤ λ⁻
		where $λ_{lower}^{+}$ ≤ λ⁺ ≤ $λ_{upper}^{+}$ and $λ_{lower}^{-}$ ≤ λ⁻ ≤ $λ_{upper}^{-}$

The Aranda, Bi-sig1, Bi-sig2, Bi-tanh1, Bi-tanh2, cloglog, cloglogm, Elliott, Gaussian, logarithmic, loglog, logsigm, modified Elliott, rootsig, saturated, sech, sigmoidalm, sigmoidalm2, sigt, skewed–sig, softsign, and wave functions are some other activation functions [17]. Every sigmoid function has certain properties, including nonlinearity, range, continuous differentiability, and shape. Most AFs for nonlinear problems use the exponential function. The derivative of the logsig AF is $e^{- n} / (1 + e^{- n})^{2}$ and its range is (0, 1) (Table 1). The derivative of the tanh AF is $4 e^{2 n} / (1 + e^{2 n})^{2}$ and its range is (–1, 1) (Table 1) [1–4]. The derivative of the Elliott AF is $0.5 / (1 + |x|)^{2}$ and its range is (0, 1) [1–4]. Many AFs are available today. Of these, tansig and logsig are the most commonly used sigmoid AFs. No single AF is suitable for every problem; different AFs must be used for different problems to achieve better performance. For example, Yi Qin et al. achieved a classification accuracy of 93.73% using sigmoid and 95.56% using the improved logistic sigmoid (Isigmoid) on the MNIST dataset [38]. The equation of the Isigmoid AF [38] is shown in Table 1, at serial number 4. The equation of the ReLU AF is shown in Table 1, at serial number 5. ReLU is one of the fastest AFs to compute, but ReLU and its minor variants are not normally used for 2- or 3-layer networks. ReLU does not use an exponential or trigonometric function, so it does not give a local effect. For this reason, it is mainly used in CNNs and very deep neural networks. With the ReLU AF, a neuron is deactivated if the output of the linear transformation is less than 0 (i.e., for negative values). The equation of the RelTanh AF is shown in Table 1, at serial number 6. RelTanh is an improved version of tanh. Xin Wang et al. achieved 96.15% testing accuracy with the help of RelTanh on the CWRU fault dataset [39]. Tansigmoid and logsigmoid do not always achieve the best performance.
Taking this as motivation, we set out to develop a new activation function for solving non-linear complex problems that performs on par with or better than the existing ones at some hidden-layer sizes. We succeeded in finding a function that performs on par or better, especially when compared with logsig. In the results section, we show that in multiple cases our activation function achieves better performance. ReLU and its variants are mainly used in very deep networks and CNNs; for networks that are 2 or 3 layers deep, ReLU is not normally used. We therefore compare our proposed AF with the tansigmoid and logsigmoid AFs. If tansigmoid and logsigmoid do not give satisfactory results, our proposed activation function can be used to obtain better results.

2 NewSigmoid activation function and its properties

Equation (9) is our proposed NewSigmoid activation function (AF). $f (n) = \frac{e^{n} - e^{- n}}{\sqrt{2 (e^{2 n} + e^{- 2 n})}}$ (9)

Here, n is the input value and f (n) is the output value.

The NewSigmoid AF has the following four properties:

Nonlinear: Like the tansig and logsig AFs, the NewSigmoid AF uses the exponential function (Equation (9)). Therefore, NewSigmoid can be used for solving nonlinear problems.

Range and Shape: The range of the NewSigmoid function is (–0.7071, 0.7071); the output approaches ±1/√2 ≈ ±0.7071 as n tends to ±∞. Because this range is finite, the function tends to be more stable in pattern recognition. The shape of the proposed activation function is an ‘S’ (Fig. 2).

Continuously Differentiable: NewSigmoid is a continuously differentiable function. Its derivative is given by Equation (10). Therefore, the NewSigmoid AF can be used in gradient-based optimization methods.

Fig. 2

Plot of the tansigmoid, logsigmoid, and myActivationFun (i.e., NewSigmoid) activation functions. The graph clearly shows that the range of myActivationFun is (–0.7071, 0.7071).

$f^{'} (n) = \frac{e^{n} + e^{- n}}{\sqrt{2 (e^{2 n} + e^{- 2 n})}} - \frac{4 (e^{- 2 n} - e^{2 n}) (e^{- n} - e^{n})}{2 {(2 e^{2 n} + 2 e^{- 2 n})}^{\frac{3}{2}}}$ (10)
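As a cross-check on the function and its derivative, both can be rewritten with hyperbolic functions, using $e^{n} - e^{- n} = 2 \sinh n$, $e^{n} + e^{- n} = 2 \cosh n$, and $e^{2 n} + e^{- 2 n} = 2 \cosh 2n$. This compact form is a derivation added for illustration; it is not used elsewhere in the paper.

```latex
f(n) = \frac{2 \sinh n}{\sqrt{4 \cosh 2n}} = \frac{\sinh n}{\sqrt{\cosh 2n}}

f'(n) = \frac{\cosh n}{\sqrt{\cosh 2n}} - \frac{\sinh n \, \sinh 2n}{(\cosh 2n)^{3/2}}
      = \frac{\cosh n \cosh 2n - \sinh n \sinh 2n}{(\cosh 2n)^{3/2}}
      = \frac{\cosh n}{(\cosh 2n)^{3/2}}

\lim_{n \to \pm\infty} f(n) = \pm \frac{1}{\sqrt{2}} \approx \pm 0.7071
```

The last step of the derivative uses the identity $\cosh (a - b) = \cosh a \cosh b - \sinh a \sinh b$ with a = 2n and b = n. Since the resulting expression is positive for every n, the function is strictly increasing and only approaches its bounds without reaching them.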

Figure 3 shows the graph of the derivative of the proposed activation function.

Fig. 3

Plot of the derivatives of the tansigmoid, logsigmoid, and myActivationFun (i.e., NewSigmoid) activation functions. The graph clearly shows that the derivative of myActivationFun (f′) is continuous at 0.

Suppose n = 0, then the value of Equation (9) is as under:

$f (0) = \frac{e^{0} - e^{- 0}}{\sqrt{2 (e^{2 \cdot 0} + e^{- 2 \cdot 0})}} = \frac{0}{2} = 0$ (11) and the value of Equation (10) is as under: $f^{'} (0) = \frac{e^{0} + e^{- 0}}{\sqrt{2 (e^{0} + e^{- 0})}} - \frac{4 (e^{- 0} - e^{0}) (e^{- 0} - e^{0})}{2 {(2 e^{0} + 2 e^{- 0})}^{\frac{3}{2}}} = \frac{2}{2} - 0 = 1$ (12)

Equations (11) and (12) give f (0) = 0 and f^′ (0) = 1, and f^′ (n) is continuous at 0. Therefore, NewSigmoid approximates the identity function near the origin, i.e., f (n) ≈ n for small n. This makes optimization easier with the proposed activation function than with the logistic function.

Zero-Centered: NewSigmoid is a zero-centered activation function.
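The properties listed above can be checked numerically with a short Python sketch. The function names, the sample points, and the finite-difference step are assumptions chosen for this illustration; the two formulas are the NewSigmoid function and its derivative exactly as written in this section.

```python
import math

def new_sigmoid(n):
    """NewSigmoid: (e^n - e^-n) / sqrt(2 (e^2n + e^-2n))."""
    return (math.exp(n) - math.exp(-n)) / math.sqrt(
        2.0 * (math.exp(2.0 * n) + math.exp(-2.0 * n)))

def new_sigmoid_derivative(n):
    """Derivative of NewSigmoid, written term by term."""
    g = 2.0 * (math.exp(2.0 * n) + math.exp(-2.0 * n))
    first = (math.exp(n) + math.exp(-n)) / math.sqrt(g)
    second = (4.0 * (math.exp(-2.0 * n) - math.exp(2.0 * n))
              * (math.exp(-n) - math.exp(n))) / (2.0 * g ** 1.5)
    return first - second

def central_difference(f, n, h=1e-6):
    """Numerical derivative used to cross-check the analytic one."""
    return (f(n + h) - f(n - h)) / (2.0 * h)

bound = 1.0 / math.sqrt(2.0)  # about 0.7071
samples = [-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0]

for n in samples:
    value = new_sigmoid(n)
    slope = new_sigmoid_derivative(n)
    numeric = central_difference(new_sigmoid, n)
    print(n, value, slope, numeric)
```

At n = 0 the function value is 0 and the slope is 1, every sampled value stays strictly inside the bound, and the values at n and –n are negatives of each other. Written this way, the formulas overflow for inputs whose magnitude exceeds roughly 350, because `math.exp` cannot represent $e^{2 n}$ beyond that point.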

3 Experiments

We ran these experiments on an Intel Core i7 machine with Windows 10 and MATLAB 2020b [37]. The research followed these six steps:

Step 1 (Datasets): We took seven datasets from the MATLAB software toolbox: the iris, cancer, glass, bodyfat, chemical, wine, and ovarian datasets. The iris dataset has 150 samples of three flowers, with 4 attributes and 3 targets. The cancer dataset has 699 samples of patient data, with 9 attributes and 2 targets. The glass dataset has 214 samples of glass values, with 9 attributes and 2 targets. The bodyfat dataset has 252 samples of bodyfat data, with 13 attributes and 1 target. The chemical dataset has 498 samples of chemical data, with 8 attributes and 1 target. The wine dataset has 178 samples of wine data, with 13 attributes and 3 targets. The ovarian dataset has 216 samples of ovarian data, with 100 attributes and 2 targets. We also tested the NewSigmoid activation function on an image dataset, the mathwork-cap dataset, which has 75 images in 5 classes. The size of each image is 227-by-227-by-3.

For this work, each dataset was divided into three parts: 70% for the training set, 15% for the validation set, and 15% for the testing set.
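A 70/15/15 division of this kind can be sketched in Python as follows. This is only an illustration of the idea and does not reproduce the data division performed by MATLAB; the function name, the fixed seed, and the rounding rule are assumptions.

```python
import random

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle the sample indices and cut them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded so the split is repeatable
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]   # whatever remains goes to testing
    return train_idx, val_idx, test_idx

# Example with the 150 samples of the iris dataset.
train_idx, val_idx, test_idx = split_indices(150)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because 15% of 150 is not a whole number, the validation and testing parts differ by one sample here; the remainder is assigned to the testing set so that every sample is used exactly once.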

Step 2 (Network Selection): In all seven datasets, the inputs and their corresponding outputs are given, so these problems can be solved by any supervised learning model. To test the NewSigmoid activation function, we use a three-layer MLFFNN: an input layer, a hidden layer, and an output layer. We use the tansig, logsig, and NewSigmoid activation functions between the input and hidden layers, one function per network.

The softmax activation function is usually used for multi-class classification, whereas the logistic function is usually used for binary classification; we therefore use the softmax activation function between the hidden and output layers. To decide the number of hidden neurons (nodes) in the network, we first run the network with the number of hidden nodes varied from 1 to 50 on the iris, cancer, and glass datasets, and then check the network performance.

Step 3 (Selecting Training algorithm): We use the LM, SCG, and BR training algorithms in the networks, one algorithm per network.

Step 4 (Weight initialization): For this MLFFNN, we randomly assign initial weights and biases.

Step 5 (Run the network): In this step, we train/run the network with the following six conditions:

In the case of the SCG algorithm, Tables 1, 2, and 3 show that, for hidden node numbers 17, 20, 21, 25, 41, 44, and 50, the NewSigmoid activation function performs as well as or better than the tansig activation function. For hidden node numbers 25, 34, 38, 41, 44, 47, and 48, it performs as well as or better than the logsig activation function. These results are for three datasets only, i.e., the iris, glass, and cancer datasets. For testing on the other four datasets, we randomly chose hidden neuron sizes 25, 41, and 44 for the SCG algorithm (by observing these two results).

In the case of the SCG algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to $10^{-6}$, the maximum number of validation failures to 6, the change in weight for the second-derivative approximation to $5 \times 10^{-5}$, and the parameter regulating the indefiniteness of the Hessian to $5 \times 10^{-7}$. In this case, network performance is measured by cross-entropy.

In the case of the LM algorithm, Tables 1, 2, and 3 show that, for hidden node numbers 49, 15, 21, 23, 25, 30, 31, 33, 39, 47, and 48, the NewSigmoid activation function performs as well as or better than the tansig activation function. For hidden node numbers 5, 21, 25, 30, 32, 35, 40, 41, 44, 45, 46, and 50, it performs as well as or better than the logsig activation function. These results are for three datasets only, i.e., the iris, glass, and cancer datasets. For testing on the other four datasets, we randomly chose hidden neuron sizes 25, 30, and 39 for the LM algorithm (by observing these two results).

In the case of the LM algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to $10^{-7}$, the maximum number of validation failures to 6, and the initial Marquardt adjustment parameter (mu) to 0.001, which can be increased by a factor of 10 up to a maximum of $10^{10}$. In this case, network performance is measured by mean squared error (mse).

In the case of the BR algorithm, Tables 1, 2, and 3 show that, for hidden node numbers 2, 5, 6, 7, 9, 14, 16, 18, 20, 31, 32, 33, 39, 41, 44, 45, 47, 49, and 50, the NewSigmoid activation function performs as well as or better than the tansig activation function. For hidden node numbers 2, 7, 9, 12, 14, 16, 18, 21, 29, 30, 31, 32, 39, 42, 47, 49, and 50, it performs as well as or better than the logsig activation function. These results are for three datasets only, i.e., the iris, glass, and cancer datasets. For testing on the other four datasets, we randomly chose hidden neuron sizes 16, 18, and 32 for the BR algorithm (by observing these two results).

In the case of the BR algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to $10^{-7}$, the maximum number of validation failures to 6, and the initial Marquardt adjustment parameter (mu) to 0.1, which can be increased by a factor of 10 up to a maximum of $10^{10}$. In this case, network performance is measured by mse.
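For reference, the training settings and the chosen hidden neuron sizes quoted in the conditions above can be collected in one place. The sketch below uses a plain Python dictionary; the key names are hypothetical labels invented for this summary and are not MATLAB parameter names, while the values are the ones stated in the text.

```python
# Settings shared by all three training algorithms (values from the text).
COMMON = {
    "max_epochs": 1000,
    "epochs_between_displays": 25,
    "performance_goal": 0.0,
    "max_validation_failures": 6,
}

# Hypothetical key names; one entry per training algorithm.
TRAINING_SETTINGS = {
    "SCG": {
        **COMMON,
        "min_performance": 1e-6,
        "second_derivative_weight_change": 5e-5,
        "hessian_indefiniteness_parameter": 5e-7,
        "performance_measure": "cross-entropy",
        "hidden_sizes_for_other_datasets": [25, 41, 44],
    },
    "LM": {
        **COMMON,
        "min_performance": 1e-7,
        "mu_initial": 0.001,
        "mu_increase_factor": 10,
        "mu_max": 1e10,
        "performance_measure": "mse",
        "hidden_sizes_for_other_datasets": [25, 30, 39],
    },
    "BR": {
        **COMMON,
        "min_performance": 1e-7,
        "mu_initial": 0.1,
        "mu_increase_factor": 10,
        "mu_max": 1e10,
        "performance_measure": "mse",
        "hidden_sizes_for_other_datasets": [16, 18, 32],
    },
}

for name, settings in TRAINING_SETTINGS.items():
    print(name, settings["performance_measure"],
          settings["hidden_sizes_for_other_datasets"])
```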

Table 2
Errors, average error, and average accuracy values on the iris dataset for the hidden node varied from 1 to 50 using tansig, logsig and NewSigmoid on SCG, LM and BR training algorithms

No of hidden node	SCG algorithm			LM algorithm			BR algorithm
	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid
1	0.013333	0.02	0.333333	0.306667	0.34	0.386667	0.006667	0.006667	0.34
2	0.026667	0.033333	0.086667	0.04	0.02	0.04	0.033333	0.02	0.006667
3	0.02	0.026667	0.166667	0.026667	0.033333	0.026667	0.02	0.02	0.013333
4	0.026667	0.026667	0.066667	0.033333	0.026667	0.026667	0.006667	0.006667	0.006667
5	0.04	0.046667	0.313333	0.033333	0.033333	0.033333	0.02	0.006667	0.013333
6	0.033333	0.033333	0.02	0.02	0.02	0.02	0.02	0.02	0.02
7	0.013333	0.02	0.02	0.013333	0.02	0.033333	0.006667	0.006667	0.006667
8	0.026667	0.02	0.02	0.033333	0.02	0.033333	0.013333	0.02	0.02
9	0.066667	0.046667	0.046667	0.02	0.04	0.013333	0.013333	0.013333	0.013333
10	0.013333	0.013333	0.026667	0.006667	0.006667	0.006667	0.006667	0.006667	0.006667
11	0.02	0.02	0.026667	0.02	0.02	0.026667	0.013333	0.013333	0.013333
12	0.026667	0.02	0.026667	0.013333	0.013333	0.013333	0.013333	0.013333	0.013333
13	0.013333	0.026667	0.02	0.033333	0.053333	0.06	0.013333	0.013333	0.013333
14	0.026667	0.02	0.026667	0.026667	0.053333	0.046667	0.006667	0.006667	0.006667
15	0.026667	0.033333	0.02	0.04	0.02	0.013333	0.006667	0.006667	0.006667
16	0.04	0.04	0.046667	0.006667	0.02	0.046667	0.006667	0.006667	0.006667
17	0.026667	0.026667	0.026667	0.02	0.006667	0.04	0.006667	0.006667	0.006667
18	0.026667	0.026667	0.04	0.02	0.02	0.08	0.013333	0.013333	0.013333
19	0.033333	0.033333	0.04	0.033333	0.02	0.046667	0.013333	0.013333	0.013333
20	0.04	0.02	0.04	0.013333	0.013333	0.006667	0.013333	0.006667	0.013333
21	0.026667	0.106667	0.02	0.073333	0.02	0.013333	0.006667	0	0.006667
22	0.02	0.046667	0.04	0.046667	0.02	0.026667	0.006667	0.006667	0.006667
23	0.013333	0.046667	0.053333	0.02	0.013333	0.02	0	0.006667	0.006667
24	0.013333	0.013333	0.013333	0.013333	0.013333	0.006667	0.006667	0.006667	0.006667
25	0.033333	0.033333	0.013333	0.06	0.02	0.02	0.006667	0.013333	0.013333
26	0.02	0.02	0.02	0.013333	0.006667	0.02	0.006667	0.006667	0.013333
27	0.02	0.02	0.026667	0.02	0.02	0.02	0.013333	0.013333	0.02
28	0.026667	0.046667	0.04	0.02	0.026667	0.02	0.013333	0.006667	0.013333
29	0.033333	0.02	0.033333	0.026667	0.02	0.026667	0.02	0.02	0.013333
30	0.006667	0.006667	0.006667	0.4	0.006667	0.006667	0.006667	0.006667	0.006667
31	0.04	0.04	0.04	0.066667	0.046667	0.073333	0.013333	0.013333	0.013333
32	0.02	0.026667	0.02	0.006667	0.02	0.02	0.006667	0.006667	0.006667
33	0.02	0.033333	0.04	0.013333	0.006667	0.013333	0.013333	0.666667	0.013333
34	0.02	0.02	0.02	0	0.006667	0.013333	0.006667	0.006667	0.013333
35	0.02	0.02	0.02	0.033333	0.026667	0.013333	0.013333	0.006667	0.006667
36	0.013333	0.013333	0.02	0	0.013333	0.026667	0	0.873333	0.006667
37	0.013333	0.013333	0.013333	0	0	0.006667	0	0.013333	0
38	0.006667	0.006667	0.006667	0.006667	0	0	0.013333	0.013333	0.013333
39	0.013333	0.013333	0.013333	0.013333	0	0.013333	0.006667	0.006667	0.006667
40	0.026667	0.02	0.013333	0.013333	0.013333	0.013333	0	0.006667	0.006667
41	0.02	0.026667	0.02	0.02	0.02	0.006667	0.006667	0.006667	0.006667
42	0.026667	0.026667	0.013333	0.006667	0.013333	0.013333	0.013333	0.02	0.02
43	0.02	0.02	0.026667	0.033333	0.04	0.04	0	0.006667	0
44	0.02	0.013333	0.013333	0	0.333333	0	0	0.006667	0
45	0.02	0.013333	0.02	0.026667	0.02	0.013333	0.013333	0.013333	0.013333
46	0.02	0.02	0.013333	0.006667	0.013333	0.006667	0	0	0.006667
47	0.006667	0.013333	0.013333	0.08	0.02	0.073333	0.006667	0.006667	0.006667
48	0.013333	0.013333	0.013333	0.033333	0.033333	0.026667	0.006667	0.006667	0.006667
49	0.006667	0.02	0.006667	0.006667	0.006667	0.006667	0.006667	0.006667	0.006667
50	0.02	0.013333	0.02	0.013333	0	0	0.006667	0.006667	0.006667
Sum of errors	1.14	1.3	2.046667	1.833333	1.6	1.56	0.473333	2.006667	0.82
Average error	0.0228	0.026	0.040933	0.036667	0.032	0.0312	0.009467	0.040133	0.0164
Average Accuracy	0.9772	0.974	0.959067	0.963333	0.968	0.9688	0.990533	0.959867	0.9836

Table 3

Errors, average error, and average accuracy on the cancer dataset for 1 to 50 hidden nodes, using the tansig, logsig, and NewSigmoid activation functions with the SCG, LM, and BR training algorithms

No of hidden node	SCG algorithm			LM algorithm			BR algorithm
	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid
1	0.035765	0.037196	0.114449	0.030043	0.028612	0.032904	0.027182	0.027182	0.032904
2	0.037196	0.057225	0.107296	0.028612	0.027182	0.06867	0.017167	0.018598	0.021459
3	0.035765	0.031474	0.057225	0.02432	0.02432	0.028612	0.014306	0.012876	0.014306
4	0.031474	0.030043	0.032904	0.028612	0.02289	0.025751	0.011445	0.014306	0.008584
5	0.027182	0.025751	0.055794	0.028612	0.028612	0.02289	0.007153	0.010014	0.007153
6	0.030043	0.030043	0.041488	0.025751	0.030043	0.02289	0.008584	0.008584	0.008584
7	0.02432	0.02289	0.032904	0.025751	0.008584	0.020029	0.005722	0.012876	0.012876
8	0.027182	0.028612	0.032904	0.014306	0.027182	0.021459	0.010014	0.007153	0.007153
9	0.030043	0.028612	0.040057	0.025751	0.011445	0.025751	0.007153	0.005722	0.005722
10	0.041488	0.041488	0.042918	0.027182	0.027182	0.028612	0.005722	0.002861	0.007153
11	0.038627	0.035765	0.037196	0.020029	0.02289	0.027182	0.007153	0.010014	0.012876
12	0.02289	0.044349	0.02432	0.008584	0.02289	0.017167	0.005722	0.008584	0.007153
13	0.030043	0.030043	0.032904	0.027182	0.027182	0.021459	0.004292	0.007153	0.008584
14	0.02432	0.02432	0.015737	0.012876	0.018598	0.021459	0.004292	0.007153	0.002861
15	0.028612	0.030043	0.038627	0.025751	0.021459	0.020029	0.007153	0.005722	0.007153
16	0.037196	0.037196	0.021459	0.032904	0.021459	0.012876	0.004292	0.008584	0.002861
17	0.032904	0.017167	0.032904	0.020029	0.021459	0.014306	0.007153	0.005722	0.008584
18	0.028612	0.035765	0.027182	0.015737	0.015737	0.012876	0.010014	0.004292	0.008584
19	0.032904	0.030043	0.028612	0.02289	0.028612	0.008584	0.004292	0.002861	0.007153
20	0.025751	0.028612	0.025751	0.02289	0.02432	0.025751	0.007153	0.005722	0.004292
21	0.028612	0.025751	0.027182	0.021459	0.030043	0.020029	0.010014	0.008584	0.005722
22	0.027182	0.028612	0.035765	0.018598	0.027182	0.015737	0.005722	0.008584	0.014306
23	0.025751	0.032904	0.030043	0.030043	0.030043	0.020029	0.004292	0.005722	0.004292
24	0.034335	0.034335	0.035765	0.014306	0.012876	0.014306	0.005722	0.010014	0.007153
25	0.030043	0.028612	0.027182	0.02289	0.010014	0.002861	0	0.001431	0.004292
26	0.030043	0.02289	0.025751	0.027182	0.017167	0.014306	0.008584	0.008584	0.010014
27	0.032904	0.032904	0.032904	0.025751	0.025751	0.020029	0.005722	0.004292	0.005722
28	0.041488	0.032904	0.038627	0.017167	0.027182	0.027182	0.008584	0.004292	0.014306
29	0.030043	0.025751	0.021459	0.027182	0.017167	0.038627	0.007153	0.004292	0.004292
30	0.011445	0.030043	0.034335	0.028612	0.028612	0.025751	0.001431	0.001431	0.001431
31	0.031474	0.035765	0.034335	0.02289	0.027182	0.018598	0.010014	0.010014	0.008584
32	0.041488	0.034335	0.032904	0.028612	0.025751	0.025751	0.007153	0.010014	0.002861
33	0.025751	0.027182	0.02432	0.02432	0.034335	0.017167	0.015737	0.011445	0.008584
34	0.02289	0.032904	0.027182	0.008584	0.010014	0.020029	0.005722	0.005722	0.007153
35	0.011445	0.035765	0.02289	0.02432	0.02289	0.018598	0.005722	0.344778	0.010014
36	0.028612	0.025751	0.02432	0.031474	0.02289	0.02289	0.005722	0.008584	0.010014
37	0.028612	0.025751	0.038627	0.014306	0.011445	0.014306	0.010014	0.011445	0.011445
38	0.042918	0.041488	0.038627	0.021459	0.027182	0.027182	0.010014	0.012876	0.010014
39	0.035765	0.032904	0.031474	0.027182	0.028612	0.025751	0.007153	0.008584	0.007153
40	0.031474	0.027182	0.028612	0.014306	0.015737	0.014306	0.010014	0.005722	0.015737
41	0.008584	0.028612	0.005722	0.015737	0.02289	0.017167	0.008584	0.005722	0.008584
42	0.027182	0.030043	0.02432	0.041488	0.02432	0.02432	0.008584	0.010014	0.008584
43	0.034335	0.032904	0.031474	0.02432	0.015737	0.02432	0.008584	0.008584	0.007153
44	0.030043	0.030043	0.027182	0.018598	0.085837	0.012876	0.010014	0.008584	0.014306
45	0.02432	0.030043	0.021459	0.021459	0.027182	0.010014	0.005722	0.002861	0.005722
46	0.021459	0.031474	0.034335	0.008584	0.025751	0.02432	0.001431	0.002861	0.010014
47	0.04721	0.037196	0.02289	0.02432	0.017167	0.014306	0.005722	0.005722	0.005722
48	0.028612	0.035765	0.031474	0.015737	0.005722	0.012876	0.004292	0.005722	0.007153
49	0.038627	0.032904	0.04721	0.007153	0.008584	0.021459	0.007153	0.008584	0.007153
50	0.044349	0.040057	0.044349	0.034335	0.041488	0.021459	0.007153	0.005722	0.004292
Sum of errors	1.519313	1.589413	1.745351	1.130186	1.187411	1.065808	0.387697	0.736767	0.437768
Average error	0.030386	0.031788	0.034907	0.022604	0.023748	0.021316	0.007754	0.014735	0.008755
Average Accuracy	0.969614	0.968212	0.965093	0.977396	0.976252	0.978684	0.992246	0.985265	0.991245

Step 6 (Stopping criteria): Training stops when the network meets any one of the following conditions:

The number of epochs reaches the maximum of 1000.

The performance goal reaches 0 (zero).

The performance gradient falls to 10⁻⁶.

The number of validation failures reaches 6.

In the case of BR, the value of mu reaches 10¹⁰.
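The stopping conditions of Step 6 can be sketched as a single check evaluated once per epoch. This is a minimal illustration, not the training code used in the experiments: the `TrainingState` container, its field names, and `stop_reason` are hypothetical, and reading "validation failure" as a count of epochs without validation improvement is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from Step 6.
MAX_EPOCHS = 1000
GOAL = 0.0
MIN_GRADIENT = 1e-6
MAX_VALIDATION_FAILURES = 6
MAX_MU = 1e10  # checked only when mu is tracked (BR)


@dataclass
class TrainingState:
    """Hypothetical snapshot of the quantities monitored during training."""
    epoch: int
    performance: float             # current error measure, e.g. mean square error
    gradient: float                # magnitude of the performance gradient
    validation_failures: int = 0   # assumed: epochs without validation improvement
    mu: Optional[float] = None     # damping parameter; None when not using BR


def stop_reason(state: TrainingState) -> Optional[str]:
    """Return the first stopping condition that is met, or None to keep training."""
    if state.epoch >= MAX_EPOCHS:
        return "maximum epochs reached"
    if state.performance <= GOAL:
        return "performance goal met"
    if state.gradient <= MIN_GRADIENT:
        return "minimum gradient reached"
    if state.validation_failures >= MAX_VALIDATION_FAILURES:
        return "validation failures reached"
    if state.mu is not None and state.mu >= MAX_MU:
        return "maximum mu reached"
    return None
```

A training loop would call `stop_reason` after each epoch and exit as soon as it returns a non-`None` value.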

After completing the above process, the final weights and biases of the network are obtained for both layers. We then calculated the errors, accuracy, and average accuracy, which are shown in Tables 2 to 11.
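The summary rows of Tables 2 to 4 follow from the per-node errors: the average error is the sum of errors divided by the 50 hidden node settings, and the average accuracy is one minus the average error. The sketch below reproduces the tansig/SCG summary of Table 3 from its reported sum of errors; the function names are hypothetical.

```python
def summarize_errors(errors):
    """Return (sum of errors, average error, average accuracy) for a list of errors."""
    total = sum(errors)
    average_error = total / len(errors)
    return total, average_error, 1.0 - average_error


def summarize_from_sum(sum_of_errors, num_settings=50):
    """Same summary when only the reported sum of errors is available."""
    average_error = sum_of_errors / num_settings
    return average_error, 1.0 - average_error


# Sum of errors for tansig with SCG on the cancer dataset (Table 3).
average_error, average_accuracy = summarize_from_sum(1.519313)
print(round(average_error, 6), round(average_accuracy, 6))  # 0.030386 0.969614
```

The printed values match the "Average error" and "Average Accuracy" rows reported for that column.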

Table 4

Errors, average error, and average accuracy on the glass dataset for 1 to 50 hidden nodes, using the tansig, logsig, and NewSigmoid activation functions with the SCG, LM, and BR training algorithms

No of hidden node	SCG algorithm			LM algorithm			BR algorithm
	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid	tansig	logsig	NewSigmoid
1	0.051402	0.084112	0.065421	0.074766	0.056075	0.051402	0.042056	0.056075	0.042056
2	0.03271	0.065421	0.084112	0.014019	0.023364	0.046729	0.009346	0.009346	0.014019
3	0.042056	0.046729	0.060748	0.023364	0.03271	0.03271	0.009346	0.009346	0.023364
4	0.060748	0.084112	0.060748	0.065421	0.116822	0.028037	0.004673	0.004673	0.014019
5	0.046729	0.023364	0.065421	0.014019	0.037383	0.037383	0.009346	0.009346	0.009346
6	0.088785	0.051402	0.079439	0.018692	0.060748	0.042056	0.004673	0	0.004673
7	0.014019	0.102804	0.051402	0.03271	0.014019	0.023364	0.009346	0.009346	0.004673
8	0.023364	0.042056	0.056075	0.037383	0.028037	0.03271	0.009346	0.009346	0.018692
9	0.051402	0.051402	0.051402	0.023364	0.079439	0.023364	0.014019	0.018692	0.014019
10	0.03271	0.046729	0.037383	0.03271	0.046729	0.056075	0.014019	0.014019	0.014019
11	0.046729	0.060748	0.046729	0.028037	0.238318	0.023364	0.009346	0.009346	0.004673
12	0.037383	0.060748	0.051402	0.028037	0.028037	0.042056	0.014019	0.018692	0.014019
13	0.046729	0.056075	0.056075	0.023364	0.037383	0.028037	0.014019	0.014019	0.018692
14	0.065421	0.065421	0.070093	0.028037	0.023364	0.023364	0.018692	0.018692	0.018692
15	0.056075	0.051402	0.060748	0.074766	0.028037	0.051402	0.014019	0.018692	0.018692
16	0.03271	0.037383	0.042056	0.014019	0.014019	0.042056	0.009346	0.009346	0.009346
17	0.079439	0.088785	0.079439	0.037383	0.056075	0.065421	0.004673	0.009346	0.004673
18	0.037383	0.046729	0.037383	0.009346	0.046729	0.014019	0.004673	0.004673	0
19	0.074766	0.065421	0.084112	0.042056	0.014019	0.014019	0.018692	0.009346	0.014019
20	0.102804	0.088785	0.093458	0.018692	0.051402	0.023364	0	0	0
21	0.074766	0.042056	0.042056	0.004673	0.023364	0.009346	0.004673	0.014019	0.009346
22	0.056075	0.056075	0.023364	0.023364	0.023364	0.03271	0.014019	0.009346	0.014019
23	0.03271	0.03271	0.028037	0.023364	0.046729	0.014019	0.009346	0.014019	0.009346
24	0.051402	0.046729	0.004673	0.023364	0.004673	0.037383	0.009346	0	0.009346
25	0.056075	0.074766	0.056075	0.023364	0.03271	0.023364	0.004673	0.009346	0.004673
26	0.046729	0.051402	0.051402	0.018692	0.023364	0.023364	0	0.238318	0
27	0.046729	0.056075	0.056075	0.009346	0.004673	0.018692	0.004673	0.004673	0.004673
28	0.070093	0.074766	0.070093	0.004673	0.009346	0.018692	0.004673	0.014019	0.009346
29	0.018692	0.023364	0.023364	0.018692	0.023364	0.023364	0.009346	0.018692	0.018692
30	0.060748	0.056075	0.046729	0.037383	0.074766	0.018692	0.009346	0.014019	0.014019
31	0.056075	0.046729	0.051402	0.03271	0.009346	0.009346	0.004673	0.004673	0.004673
32	0.046729	0.042056	0.056075	0.051402	0.028037	0.014019	0.004673	0.004673	0.004673
33	0.051402	0.051402	0.037383	0.03271	0.070093	0.03271	0.004673	0	0.004673
34	0.070093	0.070093	0.046729	0.014019	0.023364	0.018692	0.004673	0.004673	0.004673
35	0.037383	0.042056	0.051402	0.028037	0.028037	0.037383	0.018692	0.023364	0.028037
36	0.018692	0.028037	0.018692	0.018692	0.018692	0.023364	0.023364	0.023364	0.023364
37	0.028037	0.023364	0.023364	0.004673	0.004673	0.004673	0	0	0.004673
38	0.042056	0.051402	0.046729	0.060748	0.03271	0.037383	0.009346	0.014019	0.018692
39	0.060748	0.070093	0.074766	0.051402	0.084112	0.042056	0.009346	0.009346	0.009346
40	0.037383	0.070093	0.051402	0.154206	0.042056	0.028037	0.028037	0.018692	0.028037
41	0.088785	0.065421	0.088785	0.03271	0.03271	0.028037	0.023364	0.023364	0.023364
42	0.065421	0.079439	0.074766	0.009346	0.009346	0.023364	0.004673	0.004673	0.004673
43	0.042056	0.060748	0.070093	0.023364	0.042056	0.014019	0.009346	0.014019	0.014019
44	0.056075	0.074766	0.03271	0.121495	0.018692	0.023364	0.004673	0.004673	0.004673
45	0.051402	0.056075	0.060748	0.037383	0.060748	0.056075	0.023364	0.023364	0.023364
46	0.028037	0.051402	0.051402	0.03271	0.028037	0.023364	0.018692	0.014019	0.014019
47	0.065421	0.056075	0.056075	0.088785	0.042056	0.060748	0.014019	0.023364	0.014019
48	0.042056	0.070093	0.023364	0.03271	0.023364	0.018692	0.009346	0.009346	0.009346
49	0.028037	0.023364	0.009346	0.023364	0.023364	0.004673	0.004673	0.009346	0.004673
50	0.051402	0.070093	0.023364	0.009346	0.03271	0.014019	0.014019	0.014019	0.014019
Sum of errors	2.504673	2.836449	2.584112	1.686916	1.953271	1.434579	0.537383	0.831776	0.61215
Average error	0.050093	0.056729	0.051682	0.033738	0.039065	0.028692	0.010748	0.016636	0.012243
Average Accuracy	0.949907	0.943271	0.948318	0.966262	0.960935	0.971308	0.989252	0.983364	0.987757

Table 5

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the SCG training algorithm on the wine dataset. At a hidden neuron size of 25, we show training accuracy (Tr. acc.), validation accuracy (Val. acc.), testing accuracy (Test. acc.), and the accuracy over all samples (Avg. acc.). At hidden neuron sizes 41 and 44, we show average accuracy only

Sr.	Activation Function (AF)	At hidden neuron size (i) = 25				i = 41	i = 44
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	96.3	100	99.4	98.9	98.9
2	logsig	100	96.3	100	99.4	98.3	97.8
3	NewSigmoid	100	96.3	100	99.4	99.4	98.9

Table 6

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the LM training algorithm on the wine dataset. At a hidden neuron size of 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 30 and 39, we show average accuracy only

Sr.	Activation Function	At hidden neuron size (i) = 25				i = 30	i = 39
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	92.6	100	98.9	98.9	100
2	logsig	100	96.3	100	99.4	97.8	27
3	NewSigmoid	100	100	100	100	97.8	99.4

Table 7

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the BR training algorithm on the wine dataset. At a hidden neuron size of 16, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 18 and 32, we show average accuracy only

Sr.	Activation Function	At hidden neuron size (i) = 16				i = 18	i = 32
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	–	100	100	100	99.4
2	logsig	100	–	100	100	100	99.4
3	NewSigmoid	100	–	100	100	100	99.4

Table 8

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the SCG training algorithm on the ovarian dataset. At a hidden neuron size of 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 41 and 44, we show average accuracy only

Sr.	Activation Function	At hidden neuron size (i) = 25				i = 41	i = 44
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	96.9	84.4	97.2	93.1	94
2	logsig	98	96.9	87.5	96.3	93.5	94.4
3	NewSigmoid	100	96.9	84.4	97.2	91.7	94

Table 9

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the LM training algorithm on the ovarian dataset. At a hidden neuron size of 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 30 and 39, we show average accuracy only

Sr.	Activation Function	At hidden neuron size (i) = 25				i = 30	i = 39
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	100	81.3	97.2	90.7	98.6
2	logsig	100	96.9	84.4	97.2	96.8	98.6
3	NewSigmoid	100	96.9	84.4	97.2	94.0	99.5

Table 10

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions with the BR training algorithm on the ovarian dataset. At a hidden neuron size of 16, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 18 and 32, we show average accuracy only

Sr.	Activation Function	At hidden neuron size (i) = 16				i = 18	i = 32
		Tr. acc.	Val. acc.	Test. acc.	Avg. acc.	Avg. acc.	Avg. acc.
1	tansig	100	–	90.6	98.6	98.6	96.8
2	logsig	100	–	87.5	98.1	99.1	56
3	NewSigmoid	100	–	93.8	99.1	98.6	97.2

Table 11

Accuracy values (%) of the tansig, logsig, and NewSigmoid activation functions at a hidden neuron size of 41 with the SCG training algorithm on the mathwork-cap dataset

Sr.	Activation Function	Tr. acc.	Val. acc.	Test acc.	All acc.
1	tansig	100	100	63.6	94.7
2	logsig	100	63.6	90.9	93.3
3	NewSigmoid	100	100	72.7	96

4 Results

After completing the training process of these networks, we observed the following eight sets of results:

In the case of the iris dataset (Table 2), the SCG training algorithm achieves a maximum accuracy of 99.3% with tansig, logsig, and NewSigmoid. The LM training algorithm achieves a maximum accuracy of 100% with all three functions. Averaged over hidden node numbers 1 to 50, LM achieves 96.3% with tansig, 96.8% with logsig, and 96.9% with NewSigmoid, which is the highest. The BR training algorithm also achieves a maximum accuracy of 100% with all three functions. Averaged over hidden node numbers 1 to 50, BR achieves 99.05%, 96%, and 98.4% with tansig, logsig, and NewSigmoid, respectively.

In the case of the cancer dataset (Table 3), the SCG training algorithm achieves a maximum accuracy of 98.8% with tansig, 98.3% with logsig, and 99.4% with NewSigmoid, which is the highest. The LM training algorithm achieves a maximum accuracy of 99.3% with tansig, 99.4% with logsig, and 99.7% with NewSigmoid, which is the highest. Averaged over hidden node numbers 1 to 50, LM achieves 97.7% with tansig, 97.6% with logsig, and 97.9% with NewSigmoid, which is the highest. The BR training algorithm achieves a maximum accuracy of 100% with tansig and 99.9% with both logsig and NewSigmoid. Averaged over hidden node numbers 1 to 50, BR achieves 99.2%, 98.5%, and 99.1% with tansig, logsig, and NewSigmoid, respectively.

In the case of the glass dataset (Table 4), the SCG training algorithm achieves a maximum accuracy of 98.6%, 97.7%, and 99.5% with tansig, logsig, and NewSigmoid (highest), respectively. The LM training algorithm achieves a maximum accuracy of 99.5% with tansig, logsig, and NewSigmoid. Averaged over hidden node numbers 1 to 50, LM achieves 96.6% with tansig, 96.1% with logsig, and 97.1% with NewSigmoid, which is the highest. The BR training algorithm achieves a maximum accuracy of 100% with tansig, logsig, and NewSigmoid. Averaged over hidden node numbers 1 to 50, BR achieves 98.9%, 98.3%, and 98.8% with tansig, logsig, and NewSigmoid, respectively.

In the case of the chemical dataset, 100% accuracy has been achieved using tansig, logsig, and NewSigmoid on SCG, LM, and BR training algorithms.

In the case of the bodyfat dataset, 100% accuracy has been achieved using tansig, logsig, and NewSigmoid on SCG, LM, and BR training algorithms.

In the case of the wine dataset with the SCG training algorithm (Table 5), tansig, logsig, and NewSigmoid all achieve 99.4% accuracy at 25 hidden nodes. At 41 hidden nodes, NewSigmoid achieves 99.4% accuracy, which is the highest. At 44 hidden nodes, NewSigmoid achieves 98.9% accuracy, which is equivalent to tansig. With the LM training algorithm (Table 6), NewSigmoid achieves 100% accuracy at 25 hidden nodes, which is the highest. At 30 hidden nodes, NewSigmoid achieves 97.8% accuracy, which is equivalent to logsig. At 39 hidden nodes, NewSigmoid achieves 99.4% accuracy, which is much better than logsig. With the BR training algorithm (Table 7), NewSigmoid achieves 100% accuracy at 16 and 18 hidden nodes, which is the highest. At 32 hidden nodes, NewSigmoid achieves 99.4% accuracy, which is equivalent to tansig and logsig.

In the case of the ovarian dataset with the SCG training algorithm (Table 8), NewSigmoid achieves 97.2% accuracy at 25 hidden nodes, which is equivalent to tansig. At 41 hidden nodes, NewSigmoid achieves 91.7% accuracy. At 44 hidden nodes, NewSigmoid achieves 94% accuracy, which is equivalent to tansig. With the LM training algorithm (Table 9), tansig, logsig, and NewSigmoid all achieve 97.2% accuracy at 25 hidden nodes. At 30 hidden nodes, NewSigmoid achieves 94% accuracy, which is better than tansig. At 39 hidden nodes, NewSigmoid achieves 99.5% accuracy, which is the highest. With the BR training algorithm (Table 10), NewSigmoid achieves 99.1% accuracy at 16 hidden nodes, which is the highest. At 18 hidden nodes, NewSigmoid achieves 98.6% accuracy, which is equivalent to tansig. At 32 hidden nodes, NewSigmoid achieves 97.2% accuracy, which is the highest.

In the case of the mathwork-cap image dataset, we randomly chose a hidden neuron size of 41 and the SCG training algorithm (Table 11). NewSigmoid achieves 96% accuracy, which is the highest; tansig and logsig achieve 94.7% and 93.3% accuracy, respectively.
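The maximum accuracies quoted above correspond to the smallest error in a table column, converted to a percentage. As a small sketch of that lookup, the block below uses a five-row excerpt of the NewSigmoid/LM column of Table 3 (hidden nodes 23 to 27); the function name `best_setting` is hypothetical.

```python
def best_setting(errors_by_nodes):
    """Return (hidden_nodes, accuracy_percent) for the lowest-error setting."""
    nodes = min(errors_by_nodes, key=errors_by_nodes.get)
    return nodes, 100.0 * (1.0 - errors_by_nodes[nodes])


# NewSigmoid errors with LM on the cancer dataset, hidden nodes 23 to 27 (Table 3).
lm_newsigmoid = {23: 0.020029, 24: 0.014306, 25: 0.002861, 26: 0.014306, 27: 0.020029}

nodes, accuracy = best_setting(lm_newsigmoid)
print(nodes, round(accuracy, 1))  # 25 99.7
```

The result agrees with the 99.7% maximum reported for NewSigmoid with LM on the cancer dataset.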

4.1 Training, validation and testing accuracy

We also report training, validation, and testing accuracy in Tables 5 to 11. Using NewSigmoid on the wine dataset at a hidden neuron size of 25, we achieved 100% training and testing accuracy with both SCG and LM, and validation accuracy of 96.3% with SCG and 100% with LM (Tables 5 and 6). The validation accuracy with LM (Table 6) is the highest among the three AFs. With the BR training algorithm at a hidden neuron size of 16, we achieved 100% training and testing accuracy on the wine dataset (Table 7). On the ovarian dataset at a hidden neuron size of 25, we achieved 100% training accuracy, 84.4% testing accuracy, and 96.9% validation accuracy with both SCG and LM (Tables 8 and 9). With the BR training algorithm at a hidden neuron size of 16, we achieved 100% training and 93.8% testing accuracy on the ovarian dataset (Table 10). On the mathwork-cap dataset at a hidden neuron size of 41, we achieved 100% training and validation accuracy and 72.7% testing accuracy with SCG (Table 11).
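The relation between the per-subset accuracies and the "Avg. acc." column can be illustrated by pooling the correctly classified samples of the three subsets. The sketch below is only an illustration under a stated assumption: the subset sizes 124/27/27 (a 70/15/15 division of the 178-sample wine dataset) are not given in this section and are hypothetical, as is the function name `overall_accuracy`.

```python
def overall_accuracy(split_sizes, split_accuracies_percent):
    """Pool correct predictions over the subsets and return overall accuracy in percent."""
    correct = sum(
        round(size * accuracy / 100.0)
        for size, accuracy in zip(split_sizes, split_accuracies_percent)
    )
    return 100.0 * correct / sum(split_sizes)


# Assumed (not stated in the text): training/validation/testing sizes for wine.
ASSUMED_WINE_SPLIT = (124, 27, 27)

# NewSigmoid with SCG at 25 hidden neurons (Table 5): Tr., Val., Test. accuracy.
newsigmoid_scg = (100.0, 96.3, 100.0)

print(round(overall_accuracy(ASSUMED_WINE_SPLIT, newsigmoid_scg), 1))  # 99.4
```

Under this assumed split, the pooled value agrees with the 99.4% average accuracy listed in Table 5.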

4.2 Performance analysis

Figure 4 shows the performance analysis of the three AFs. This analysis was obtained with the LM training algorithm and a hidden neuron size of 39 on the wine dataset. The best training performance is 1.2408e-24 at epoch 1000 with tansigmoid, 5.1271e-24 at epoch 1000 with logsigmoid, and 5.8873e-05 at epoch 1000 with NewSigmoid. For the same training samples, algorithm, and hidden neuron size, NewSigmoid achieves the highest accuracy.

Fig. 4

Mean square error. Panels (a), (b), and (c) correspond to tansigmoid, logsigmoid, and NewSigmoid, respectively.
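The performance measure in Fig. 4 is the mean square error between network outputs and targets. A minimal sketch of that quantity is given below; the function name and the flat-list input format are assumptions made for illustration.

```python
def mean_squared_error(targets, outputs):
    """Average of squared differences between targets and network outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and of equal length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


# Two outputs that each miss their target by 0.5 give an MSE of 0.25.
print(mean_squared_error([1.0, 0.0], [0.5, 0.5]))  # 0.25
```

A value close to zero, such as those reported for tansigmoid and logsigmoid, means the outputs almost exactly reproduce the training targets.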

5 Discussion

In this paper, our goal is to introduce a new activation function and its properties. We propose a new activation function, called the NewSigmoid activation function, and compare it with logsigmoid and tansigmoid, two widely used activation functions. Based on seven numerical datasets and one image dataset, we may say that in multiple cases NewSigmoid achieves better or equivalent results compared with logsigmoid and tansigmoid. NewSigmoid achieves the highest average accuracy on LM for the iris dataset (96.9%), the cancer dataset (97.9%), and the glass dataset (97.1%). So, NewSigmoid may achieve better results than the other two activation functions where the LM algorithm is used. Compared with logsigmoid, NewSigmoid achieves better average accuracy on BR for the iris dataset (98.4%), the cancer dataset (99.1%), and the glass dataset (98.8%). So, NewSigmoid may achieve better results than the logsigmoid activation function where the BR algorithm is used. Like logsigmoid and tansigmoid, NewSigmoid also achieves 100% accuracy on the iris, glass, chemical, bodyfat, and wine datasets. On the ovarian dataset, NewSigmoid achieves the highest accuracy (99.5%) on LM. On the mathwork-cap image dataset, NewSigmoid achieves the highest accuracy (96%). These accuracies may be increased further with preprocessing techniques such as normalization, or by using other networks. Like logsigmoid and tansigmoid, our proposed AF also allows backpropagation, because it is a differentiable function. NewSigmoid is zero-centered, whereas logsigmoid is not; therefore, our function works better than logsigmoid. Logsigmoid is also not symmetric around zero, but NewSigmoid is symmetric around zero like tansigmoid. For this reason, NewSigmoid may be used for solving very complex and non-linear problems such as audio, images, or other high-dimensional problems. Farzad et al. [17] observed that sigmoid activation with range [–0.5, 1.5] generally produced better or more accurate results. The range of NewSigmoid is [–0.7071, 0.7071], and this AF also produces better results in multiple cases compared with logsigmoid (range [0, 1]) and tansigmoid (range [–1, 1]).
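The zero-centering and symmetry properties discussed here can be checked numerically for the two reference functions. The sketch below uses the standard definitions of logsigmoid and tansigmoid; the NewSigmoid formula is defined earlier in the paper and is deliberately not reproduced, so it is not part of this check.

```python
import math


def logsig(x):
    """Logistic sigmoid, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x):
    """Hyperbolic tangent sigmoid, range (-1, 1)."""
    return math.tanh(x)


for x in (0.5, 1.0, 3.0):
    # tansig is odd: f(-x) == -f(x), so its outputs are centered on zero.
    assert math.isclose(tansig(-x), -tansig(x))
    # logsig is centered on 0.5 instead: f(x) + f(-x) == 1.
    assert math.isclose(logsig(x) + logsig(-x), 1.0)

print(logsig(0.0), tansig(0.0))  # 0.5 0.0
```

A zero-centered function returns 0 for a zero input, which is what the text claims for NewSigmoid and tansigmoid but not for logsigmoid.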

Like logsigmoid and tansigmoid, the NewSigmoid AF has two limitations. First, the AF has a finite range, so for very high or very low input values there is almost no change in the prediction. This is known as the vanishing gradient problem, and it can be addressed by scaling methods. Second, the AF may converge slowly.
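The saturation effect described above can be seen in the derivatives of the two reference functions, which shrink toward zero for large inputs, and in how scaling the inputs keeps them in the responsive region. The sketch below uses the standard derivatives of logsigmoid and tansigmoid; min-max scaling to [-1, 1] is one possible scaling method chosen for illustration, and the function names are hypothetical.

```python
import math


def logsig_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tansig_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def scale_to_unit_range(values):
    """Min-max scale values to [-1, 1] (an assumed choice of scaling method)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [2.0 * (v - low) / (high - low) - 1.0 for v in values]


# Gradients are largest at zero and nearly vanish for large inputs.
print(logsig_grad(0.0), tansig_grad(0.0))  # 0.25 1.0
print(logsig_grad(10.0) < 1e-4, tansig_grad(10.0) < 1e-7)  # True True

# After scaling, raw inputs as large as 1000 fall inside [-1, 1].
print(scale_to_unit_range([0.0, 500.0, 1000.0]))  # [-1.0, 0.0, 1.0]
```

Within [-1, 1] both derivatives stay well above zero, so weight updates do not stall in the way they do for unscaled inputs.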

6 Conclusion and future scope

In this paper, we have presented the proposed NewSigmoid Activation Function (AF). First, we explained neural networks and their uses. We took a Multilayer Feedforward Neural Network and explained how it works. We then explained some activation functions and afterward proposed our NewSigmoid AF, describing its properties. Our AF is a smooth, S-shaped, bounded (range (–0.7071, 0.7071)), continuously differentiable, and zero-centered function. Vanishing gradient and slow convergence are two limitations of the NewSigmoid AF. We have shown through eight datasets that in multiple cases this AF is on par with the tansig and logsig activation functions, and that in some cases it achieves better results than both. We tested our activation function on the iris, cancer, glass, chemical, bodyfat, wine, and ovarian datasets, using the SCG, LM, and BR algorithms to optimize the neural network. With NewSigmoid, we obtained the following results. On the iris dataset, we achieved a maximum of 99.3% accuracy with SCG and 100% accuracy with LM and BR. On the cancer dataset, we achieved a maximum of 99.4% accuracy with SCG, 99.7% with LM, and 99.9% with BR. On the glass dataset, we achieved a maximum of 99.5% accuracy with SCG and LM, and 100% with BR. On the chemical and bodyfat datasets, we achieved a maximum of 100% accuracy with SCG, LM, and BR. On the wine dataset, we achieved a maximum of 99.4% accuracy with SCG (at hidden neuron sizes 25 and 41), 100% with LM (at hidden neuron size 25), and 100% with BR (at hidden neuron sizes 16 and 18). On the ovarian dataset, we achieved a maximum of 97.2% accuracy with SCG (at hidden neuron size 25), 99.5% with LM (at hidden neuron size 39), and 99.1% with BR (at hidden neuron size 16). We also tested on the mathwork-cap image dataset, where we achieved 96% accuracy with SCG (at hidden neuron size 41). We also achieved 100% training and testing accuracy on some datasets.

In view of the above, NewSigmoid may be considered in place of the logsig and tansig activation functions for achieving better accuracy when experimenting with the LM or BR algorithm on numerical datasets and the SCG algorithm on image datasets. Neither tansigmoid nor logsigmoid performs best at every hidden node size: at some hidden neuron sizes tansigmoid achieves better results, and at others logsigmoid does. So, if tansigmoid and logsigmoid do not achieve satisfactory results, NewSigmoid may be tried to check for better results. In the future, we will strive to develop further activation functions with the help of NewSigmoid or existing AFs, and we will also use NewSigmoid in deep neural networks.

References

1. Goodfellow, Bengio, Courville, Deep Learning, MIT Press (2016).

2. Hagan M.T., Neural Network Design, 2nd Edition (2014).

3. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson Prentice Hall (2009).

4. Aggarwal C.C., Neural Networks and Deep Learning: A Textbook, Springer (2018).

5. Swasono D.I., Tjandrasa, Fathicah, Classification of Tobacco Leaf Pests Using VGG16 Transfer Learning, 12th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia (2019), 176–181.

6. Selimovic, Meden, Peer, Hladnik, Analysis of Content-Aware Image Compression with VGG16, 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), San Carlos (2018), 1–7.

7. Wang, Garbage Recognition and Classification System Based on Convolutional Neural Network VGG16, 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Shenzhen, China (2020), 252–255.

8. Enkhtaivan, Adesuyi T.A., Kim, Facial Emotion Recognition using Convolutional Neural Network Based on Repetitive Learning Blocks Approach (2020), 512–514.

9. Hanxu, Yue, Hao, Qiongyang, Xiaonan, Yongquan, Jun, Research on Human Action Recognition Based on Improved Pooling Algorithm, 2020 Chinese Control And Decision Conference (CCDC), IEEE (2020).

10. Liu Q.-f., Iqbal M.F., Yang, Lu X.-y., Zhang, Rauf, Prediction of chloride diffusivity in concrete using artificial neural network: Modelling and performance evaluation, Construction and Building Materials 268 (2021), 121082.

11. Kousik N.V., Natarajan, Raja R.A., Kallam, Patan, Gandomi A.H., Improved salient object detection using hybrid Convolution Recurrent Neural Network, Expert Systems with Applications 166 (2021), 114064.

12. Shorfuzzaman, Shamim, Hossain, MetaCOVID: A Siamese neural network framework with contrastive loss for n-shot diagnosis of COVID-19 patients, Pattern Recognition 113 (2021), 107700.

13. Zhang, Lou, The application research of neural network and BP algorithm in stock price pattern classification and prediction, Future Generation Computer Systems 115 (2021), 872–879.

14. Ren, Gu, Wei, Tree-RNN: Tree structural recurrent neural network for network traffic classification, Expert Systems with Applications 167 (2021), 114363.

15. Al-Andoli, Tan S.C., Cheah W.P., Parallel stacked autoencoder with particle swarm optimization for community detection in complex networks, Applied Intelligence (2021), 1–21.

16. Al-Andoli M.N., Tan S.C., Cheah W.P., Tan S.Y., A Review on Community Detection in Large Complex Networks from Conventional to Deep Learning Methods: A Call for the Use of Parallel Meta-Heuristic Algorithms, IEEE Access 9 (2021), 96501–96527.

17. Farzad, Mashayekhi, Hassanpour, A comparative performance analysis of different activation functions in LSTM networks for classification, Neural Computing and Applications 31(7) (2019), 2507–2521.

18.

Arvind

T.K.R.

, Brand

, Heidorn

, Boppu

, Hannig

, Teich

, Hardware Implementation of Hyperbolic Tangent Activation Function for Floating Point Formats, In 2020 24th International Symposium on VLSI Design and Test (VDAT), 1–6, IEEE, (2020).

19.

Shakiba

F.M.

, Zhou

, Novel Analog Implementation of a Hyperbolic Tangent Neuron in Artificial Neural Networks, IEEE Transactions on Industrial Electronics (2020).

20.

Chandra

, A Novel Method for Scalable VLSI Implementation of Hyperbolic Tangent Function, IEEE Design & Test (2021).

21.

Kumar

, Kumar

, Singh

A.K.

, Artificial Neural Network Model Development for the Analysis of Maximum Pressure of Hole Entry Journal Bearing Using SciLab, In Emerging Trends in Mechanical Engineering 19–29, Springer, Singapore, (2021).

22.

Raja

M.A.Z.

, Shah

F.H.

, Tariq

, Ahmad

, Design of artificial neural network models optimized with sequential quadratic programming to study the dynamics of nonlinear Troesch’s problem arising in plasma physics, Neural Computing and Applications 29(6) (2018), 83–109.

23.

Marquardt

D.W.

, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics 11(2) (1963), 431–441.

24. Moller M.F., A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks 6 (1993), 525–533.

25. Liang, Ning, Parameter optimization of load frequency control system composed of hydroelectric and thermal power units based on Levenberg-Marquardt algorithm, In 2020 5th Asia Conference on Power and Electrical Engineering (ACPEE), 75–80, IEEE.

26. Wang, Zhang, Huang, Lin, A Calculation Method for Three-Phase Power Flow in Micro-Grid Based on Smooth Function, IEEE Transactions on Power Systems (2020).

27. Upadhyay P.K., Pandita, Joshi, Scaled Conjugate Gradient Backpropagation based SLA Violation Prediction in Cloud Computing, In 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), 203–208, IEEE.

28. Nayak J.G., Patil L.G., Patki V.K., Artificial neural network based water quality index (WQI) for river Godavari (India), Materials Today: Proceedings (2021).

29. Yadav, Nath, Novel Application of Linear Scaling to Improve Accuracy of Optimized Artificial Neural Network Using Levenberg-Marquardt Algorithm in Prediction of Daily Nitrogen Oxide for Health Management, Metaheuristic and Evolutionary Computation: Algorithms and Applications, 665–688, Springer, Singapore, (2021).

30. Faraggi, Jernigan R.L., Kloczkowski, A Hybrid Levenberg–Marquardt Algorithm on a Recursive Neural Network for Scoring Protein Models, In Artificial Neural Networks, 307–316, Humana, New York, NY, (2021).

31. Sada S.O., Improving the predictive accuracy of artificial neural network (ANN) approach in a mild steel turning operation, The International Journal of Advanced Manufacturing Technology (2021), 1–10.

32. Abdollahi, Fatemi, A new conjugate gradient method based on a modified secant condition with its applications in image processing, RAIRO-Operations Research 55(1) (2021), 167–187.

33. Sujatha, Mareeswari, Chatterjee J.M., Mousa Abd Allah and Hassanien A.E., A Bayesian Regularized Neural Network for Analyzing Bitcoin Trends, IEEE Access 9 (2021), 37989–38000.

34. Moshkbar-Bakhshayesh, Identification of the appropriate architecture of multilayer feed-forward neural network for estimation of NPPs parameters using the GA in combination with the LM and the BR learning algorithms, Annals of Nuclear Energy 156 (2021), 108222.

35. Aneja, Sharma, Gupta, Yoo D.-Y., Bayesian Regularized Artificial Neural Network Model to Predict Strength Characteristics of Fly-Ash and Bottom-Ash Based Geopolymer Concrete, Materials 14(7) (2021), 1729.

36. Handayani A.N., Lathifah, Herwanto H.W., Asmara R.A., Arai, Neural Network Bayesian Regularization Backpropagation to Solve Inverse Kinematics on Planar Manipulator, In 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 99–104, IEEE.

37. https://www.mathworks.com, Website of MATLAB program.

38. Qin, Wang, Zou, The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines, IEEE Transactions on Industrial Electronics 66(5) (2018), 3814–3824.

39. Wang, Qin, Wang, Xiang, Chen, ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis, Neurocomputing 363 (2019), 88–98.