Abstract
Activation functions (AFs) increase the power of artificial neural networks. Tansig and logsig are widely used AFs, but there is still room for improvement. In this paper, we propose the NewSigmoid AF for neural networks. NewSigmoid is as powerful as tansig and logsig, and in multiple cases it gives better or equivalent performance compared with both. Like these AFs, NewSigmoid is a smooth, S-shaped, bounded, continuously differentiable, and zero-centered function, so it is also suitable for solving non-linear problems. We tested this AF on the iris, cancer, glass, chemical, bodyfat, wine, and ovarian datasets, using the Scaled Conjugate Gradient (SCG), Levenberg-Marquardt (LM), and Bayesian Regularization (BR) algorithms to optimize the neural network. With multilayer neural networks, we obtained a maximum accuracy of 100% on the iris dataset using LM and BR; 99.9% on the cancer dataset using BR; 100% on the glass dataset using BR; 100% on the chemical and bodyfat datasets using SCG, LM, and BR; 100% on the wine dataset using LM and BR; and 99.1% on the ovarian dataset using BR. NewSigmoid also achieves 100% training and validation accuracy on the mathwork-cap image dataset using SCG.
Introduction
Neural networks are a branch of machine learning [1]. Clustering, classification, and recognition are some major application areas of Artificial Neural Networks (ANNs) [1–4]. Applications of ANNs include classification of tobacco leaf pests [5], image compression with VGG16 [6], garbage recognition and classification [7], facial emotion recognition [8], human action recognition [9], prediction of chloride diffusivity in concrete [10], improved salient object detection [11], prediction of COVID-19 patients [12], stock price pattern classification and prediction [13], and network traffic classification [14]. ANNs are well suited to nonlinear and complex problems, so other important applications include robotic control, quality control, and disease detection in the medical field. ANNs may also be used to detect relationships between things that are represented by graphs, which is why they are used for Community Detection (CD) in complex networks [15]. Moh. Al. Andoli et al. used a Parallel Stacked Autoencoder with PSO as a community detector in complex networks. Subspaces groups or individuals are also detected with the help of CD [16].
The architecture of an ANN is inspired by the biological human brain [1–4]. Normally, an ANN is built from multiple layers, which fall into three categories: an input layer, an output layer, and one or more hidden layers [2]. One input layer, one output layer, and one hidden layer together make a network with a single hidden layer. Training an ANN is called the learning process. Learning is categorized into three types: supervised learning, reinforcement learning, and unsupervised learning. To test our work, we use a Multilayer Feedforward Neural Network (MLFFNN), which is trained by supervised learning. An MLFFNN has either a single hidden layer or multiple hidden layers. The value of each neuron is calculated in the hidden layer. The neuron is also known as the information-processing unit of the ANN [3].
The working of a neuron depends on three basic elements [3]. The first element is the set of synapses, or connecting links, where a weight is assigned to each input and multiplied by that input's value. The second element is the adder, which sums these weighted inputs; the resulting sum is the value of the neuron. The third element is the activation function, which computes the neuron's output from this value.
Working of MLFFNN
Figure 1 is an ANN model. In this figure, the first layer is the input layer, the second layer is the hidden layer, and the third layer is the output layer. Each layer has its own weight matrix, its own bias vector, a net input vector, and an output vector. Hidden neurons work like feature detectors [3]. Therefore, by applying the hidden neurons, we find out the features of the network.

Sample pattern recognition neural network, built for the iris dataset. The network has 4 inputs and 3 outputs. The tansig function is used in the hidden layer and the softmax function is used in the output layer.
A multilayer perceptron can be trained with the back-propagation algorithm, which includes the least mean-square (LMS) algorithm (presented by Widrow and Hoff) as a special case [3]. This training has two phases [2–4]: a forward phase and a backward phase.
During the forward phase, the input signal is propagated through the network layer by layer, using the current synaptic weights, until it reaches the outputs. The output is calculated with Equation (2). The final predicted output is used to calculate the error.
Suppose v_j is the value of the neuron at the jth layer. (The sum of the products of the inputs and their weights is called the induced local field; see Equation (1).)
Here, w_ji is the weight connecting the ith neuron to the jth-layer neuron, y_i is the output of the ith-layer neuron (that is, the input to the jth-layer neuron), and m is the total number of inputs (including the bias). If the activation function (AF) is φ(v_j), then the output of the jth layer is calculated with Equation (2), where y_j is the output of the jth-layer neuron.
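The forward computation for one neuron can be sketched in Python as follows. Equations (1) and (2) are not reproduced in this text, so the sketch assumes the standard form in which v_j is the sum of w_ji * y_i over all m inputs and y_j = φ(v_j). The function names, the example weights, and the choice of tanh as φ are illustrative assumptions, not the authors' code.

```python
import numpy as np

def induced_local_field(weights, inputs):
    """Weighted sum of a neuron's inputs (assumed form of Equation (1)).

    The bias is handled as one extra weight on a fixed input of 1.0.
    """
    return float(np.dot(weights, inputs))

def neuron_output(weights, inputs, activation=np.tanh):
    """Apply the activation function to the induced local field (assumed form of Equation (2))."""
    return float(activation(induced_local_field(weights, inputs)))

# Hypothetical values: a bias weight of 0.5 followed by two synaptic weights.
w = np.array([0.5, -1.0, 2.0])
y_prev = np.array([1.0, 3.0, 1.5])   # the leading 1.0 is the fixed bias input

v = induced_local_field(w, y_prev)   # 0.5 - 3.0 + 3.0 = 0.5
y = neuron_output(w, y_prev)         # tanh(0.5)
```

Treating the bias as a weight on a constant input keeps the sum over "m inputs (including the bias)" in a single dot product.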
Generally, logsig and tansig activation functions are used in neural networks [17, 18]. Arvind et al. [18], Fatemeh et al. [19], and Chandra et al. [20] have used tansig AF in ANN. Sunil et al. [21] have used logsig AF in ANN. Raja et al. [22] have used log-sigmoid, radial-base, and tan-sigmoid AF in ANN.
Suppose y_j is the final predicted output and d is the given target value; the error (e) of the network is then calculated with Equation (3).
If this error is nonzero, we apply the backward phase to minimize it; otherwise we stop training the network.
The errors arise because the weight and bias values are initialized randomly, so the main aim of the backward phase is to update the weights and biases. The error of Equation (3) is propagated back through the network, layer by layer, in the backward direction. The derivative of the activation function is required during this back-propagation process.
Instantaneous error energy (ɛ) of the neuron at jth layer is calculated with the help of Equation (4).
We apply the chain rule to Equation (4) with respect to the weights and biases, which gives Equations (5) and (6).
We then update the weights and biases using Equations (7) and (8).
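The backward phase for a single output neuron can be illustrated with the short sketch below. Equations (3) to (8) are not reproduced in this text, so the sketch assumes the usual delta-rule forms: e = d - y, error energy 0.5 * e^2, a gradient obtained by the chain rule, and a plain gradient-descent update. The learning rate `lr`, the tanh activation, and all numeric values are assumptions made for illustration only.

```python
import numpy as np

def backward_step(w, x, d, lr=0.1):
    """One gradient-descent update for a single tanh neuron.

    w  : weights, with the bias weight first
    x  : inputs, with a fixed 1.0 first for the bias
    d  : target value
    lr : learning rate (hypothetical; not given in the text)
    Returns the updated weights and the error energy before the update.
    """
    v = np.dot(w, x)                 # induced local field
    y = np.tanh(v)                   # neuron output
    e = d - y                        # error (assumed form of Equation (3))
    energy = 0.5 * e ** 2            # instantaneous error energy (assumed form of Equation (4))
    # Chain rule: d(energy)/dw = -e * phi'(v) * x, with phi'(v) = 1 - tanh(v)^2
    grad = -e * (1.0 - y ** 2) * x
    return w - lr * grad, float(energy)

w0 = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 1.5])
w1, energy_before = backward_step(w0, x, d=1.0)
_, energy_after = backward_step(w1, x, d=1.0)   # energy measured with the updated weights
```

In this example the error energy measured after the update is lower than before it, which is the behaviour the backward phase is meant to produce.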
Optimization algorithms are used to calculate the new weights and biases and thereby reduce the loss. Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and Bayesian Regularization (BR) are some well-known optimization algorithms. LM was proposed by D. W. Marquardt [23]. The SCG algorithm is based on conjugate directions [24]. LM, SCG, and BR are used by many researchers. Liang et al. (2020) used LM in the load frequency system [25]. Ju et al. (2020) used LM in power flow calculation [26]. Upadhyay et al. (2019) used SCG in violation prediction in Cloud Computing [27]. Jyotiprakash et al. used LM, SCG, and BR for the prediction of water quality index in the ANN [28]. Vidha et al. used LM in the prediction of daily nitrogen oxide for health management [29]. Faraggi et al. used LM for scoring protein models [30]. S.O. Sada used LM and SCG for the prediction of accuracy in a mild steel turning operation [31]. Abdollahi et al. proposed a new conjugate gradient method that was based on SCG and was used for solving an optimization problem [32]. Sujatha et al. used BR, SCG, and LM for analysis of Bitcoin Trends [33]. Bakhshayesh et al. used LM and BR for accurate estimation of Nuclear power plants (NPP) [34]. Aneja et al. used BR in ANN to predict strength characteristics of Fly-Ash and Bottom-Ash based Geopolymer Concrete [35]. Handayani et al. used BR to solve inverse kinematics on planar manipulators [36]. In batch training mode, LM is the fastest training algorithm. SCG is used for large networks. BR is used for good generalization without the need for a validation set. The BR method is used on arbitrary sizes and ensures that only the required number of parameters is effectively used. Because SCG, LM, and BR are so widely used, we test our NewSigmoid activation function with all three.
Activation functions are categorized into two types: linear activation functions and non-linear activation functions.
Linear activation function: The linear activation function is useful for linearly separable problems. Hard limit, symmetrical hard limit, and linear are some examples. In the hard limit function, the output is zero if the input is less than zero and one otherwise. In the symmetrical hard limit function, the output is negative one if the input is less than zero and one otherwise. In the linear activation function, the output equals the input (n) for any value of n.
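The three functions described above can be written in a few lines of Python. This is a minimal sketch; the names `hardlim`, `hardlims`, and `purelin` are borrowed from common toolbox naming and are used here only as labels.

```python
import numpy as np

def hardlim(n):
    """Hard limit: 0 for inputs below zero, 1 otherwise."""
    return np.where(np.asarray(n) < 0, 0.0, 1.0)

def hardlims(n):
    """Symmetrical hard limit: -1 for inputs below zero, 1 otherwise."""
    return np.where(np.asarray(n) < 0, -1.0, 1.0)

def purelin(n):
    """Linear: the output equals the input."""
    return np.asarray(n, dtype=float)

sample = [-2, 0, 3]
hard = hardlim(sample)
sym = hardlims(sample)
lin = purelin(sample)
```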
Non-linear activation function: The non-linear activation function is more useful for problems that are not linearly separable. The log-sigmoid, hyperbolic tangent sigmoid, Elliot symmetric sigmoid transfer function (elliotsig), and softmax are some examples. For an input n, the log-sigmoid function outputs 1 / (1 + e^(-n)) (Table 1), the hyperbolic tangent sigmoid function outputs (e^n - e^(-n)) / (e^n + e^(-n)) (Table 1), and the softmax function outputs exp(n) / sum(exp(n)). The "soft" in softmax means that it is continuous and differentiable [24]. The softmax function is now mostly used in the output layer of a classifier [24].
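A minimal Python sketch of the three formulas quoted above follows. The function names are illustrative, and the subtraction of the maximum inside `softmax` is an added numerical-stability step that does not change the result.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid: 1 / (1 + e^(-n)), with outputs in (0, 1)."""
    n = np.asarray(n, dtype=float)
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Hyperbolic tangent sigmoid: (e^n - e^(-n)) / (e^n + e^(-n)), with outputs in (-1, 1)."""
    n = np.asarray(n, dtype=float)
    return (np.exp(n) - np.exp(-n)) / (np.exp(n) + np.exp(-n))

def softmax(n):
    """Softmax: exp(n) / sum(exp(n)), computed on a vector of scores."""
    n = np.asarray(n, dtype=float)
    shifted = np.exp(n - np.max(n))   # same ratios as exp(n), but avoids overflow
    return shifted / np.sum(shifted)

probs = softmax([1.0, 2.0, 3.0])      # three class scores -> three probabilities
```

The softmax outputs are non-negative and sum to one, which is why the function is convenient as the output layer of a classifier.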
Different activation function, their mathematical equations, derivatives and range
Aranda, Bi-sig1, Bi-sig2, Bi-tanh1, Bi-tanh2, cloglog, cloglogm, Elliott, Gaussian, logarithmic, loglog, logsigm, modified Elliott, rootsig, saturated, sech, sigmoidalm, sigmoidalm2, sigt, skewed–sig, softsign, and wave are some other activation functions [17]. Every sigmoid function has certain properties, among them nonlinearity, range, continuous differentiability, and shape. Nonlinear problems mostly use the exponential function. The derivative of the logsig AF is e^(-n) / (1 + e^(-n))^2 and its range is (0, 1) (Table 1). The derivative of the tanh AF is 4e^(2n) / (1 + e^(2n))^2 and its range is (–1, 1) (Table 1) [1–4]. The derivative of the Elliott AF is 0.5 / (1 + |x|)^2 and its range is (0, 1) [1–4]. Many AFs are available today; of these, tansig and logsig are the most commonly used sigmoid AFs. No single AF suits every problem, so different AFs have to be used for different problems to achieve better performance. For example, Yi Qin et al. achieved a classification accuracy of 93.73% using sigmoid and 95.56% using the improved logistic sigmoid (Isigmoid) on the MNIST dataset [38]. The equation of the Isigmoid AF [38] is shown in Table 1, at serial number 4, and the equation of the ReLU AF at serial number 5. ReLU is the fastest AF, but we do not normally use ReLU and its minor variants for 2- or 3-layer networks. ReLU does not use an exponential or trigonometric function, so it does not give a local effect; for this reason it is mainly used in CNNs and very deep neural networks. With the ReLU AF, a neuron is deactivated if the output of the linear transformation is less than 0 (i.e. for negative values). The equation of the RelTanh AF, an improved version of tanh, is shown in Table 1, at serial number 6. Xin Wang et al. achieved 96.15% testing accuracy with RelTanh on the fault dataset of CWRU [39]. Tansigmoid and logsigmoid do not always achieve the best performance. 
Taking this as motivation, we set out to develop a new activation function for non-linear, complex problems that performs on par with or better than these AFs at some hidden neuron sizes. We succeeded in finding a function that performs on par or better, especially compared with logsig. In the results section, we show that in multiple cases our activation function achieves better performance. ReLU and its variants are mainly used in very deep networks and CNNs; for a network 2 or 3 layers deep, we normally do not use the ReLU AF. We therefore compare our proposed AF with the tansigmoid and logsigmoid AFs. If tansigmoid and logsigmoid do not give satisfactory results, our proposed activation function can be used to seek better results.
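The closed-form derivatives quoted above for logsig and tanh can be checked numerically. The sketch below compares each formula with a central finite difference on a small grid; the helper names, the grid, and the step size `h` are arbitrary choices made for this check.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def dlogsig(n):
    """Derivative of logsig: e^(-n) / (1 + e^(-n))^2."""
    return np.exp(-n) / (1.0 + np.exp(-n)) ** 2

def dtanh(n):
    """Derivative of tanh: 4e^(2n) / (1 + e^(2n))^2."""
    return 4.0 * np.exp(2.0 * n) / (1.0 + np.exp(2.0 * n)) ** 2

def central_diff(f, n, h=1e-5):
    """Numerical derivative by central differences."""
    return (f(n + h) - f(n - h)) / (2.0 * h)

points = np.linspace(-3.0, 3.0, 13)
err_logsig = np.max(np.abs(dlogsig(points) - central_diff(logsig, points)))
err_tanh = np.max(np.abs(dtanh(points) - central_diff(np.tanh, points)))
```

Both maximum errors come out far below the size of the derivatives themselves, so the quoted expressions agree with the numerical slopes on this grid.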
Equation (9) is our proposed NewSigmoid activation function (AF).
Here, n is the input value and f(n) is the output value.
The NewSigmoid AF has the following four properties. Nonlinear: like the tansig and logsig AFs, the NewSigmoid AF uses the exponential function (Equation (9)), so it can be used to solve nonlinear problems. Range and Shape: the range of the NewSigmoid function is [–0.7071, 0.7071]. Because this range is finite, the function will show more stability in pattern recognition. The shape of the proposed activation function is an 'S' (Fig. 2). Continuously Differentiable: NewSigmoid is a continuously differentiable function whose derivative is given by Equation (10), so it may be used in gradient-based optimization methods.

Plot of the tansigmoid, logsigmoid, and myActivationFun (NewSigmoid) activation functions. The graph clearly shows that the range of myActivationFun is [–0.7071, 0.7071].
Figure 3 shows the graph of the derivative of the proposed activation function.

Plot of the derivatives of the tansigmoid, logsigmoid, and myActivationFun (NewSigmoid) activation functions. The graph clearly shows that the derivative of myActivationFun (f′) is continuous at 0.
Suppose n = 0; then the value of Equation (9) is as follows:
On the basis of Equations (11) and (12), we can say that f′(n) is continuous at 0. NewSigmoid therefore approximates the identity near the origin, which makes the proposed activation function easier to optimize than the logistic function. Zero-Centered: NewSigmoid is a zero-centered activation function.
We ran this experiment on an Intel Core i7 machine with Windows 10 and MATLAB 2020b [37]. The research followed six steps:
Step 1 (Datasets): We took seven datasets from the MATLAB software toolbox: iris, cancer, glass, bodyfat, chemical, wine, and ovarian. The iris dataset has 150 samples of three flowers, with 4 attributes and 3 targets. The cancer dataset has 699 samples of patient data, with 9 attributes and 2 targets. The glass dataset has 214 samples of glass values, with 9 attributes and 2 targets. The bodyfat dataset has 252 samples of bodyfat data, with 13 attributes and 1 target. The chemical dataset has 498 samples of chemical data, with 8 attributes and 1 target. The wine dataset has 178 samples of wine data, with 13 attributes and 3 targets. The ovarian dataset has 216 samples of ovarian data, with 100 attributes and 2 targets. We also tested our NewSigmoid activation function on an image dataset, the mathwork-cap dataset, which has 75 images in 5 classes. Each image is 227-by-227-by-3.
For this work, each dataset was divided into three parts: 70% for the training set, 15% for the validation set, and 15% for the testing set.
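A 70/15/15 split of this kind can be sketched as below. The text does not say how samples were assigned to the three parts, so the random shuffle, the rounding rule, the seed, and the function name `split_indices` are all assumptions made for illustration.

```python
import numpy as np

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly split sample indices into training, validation, and testing parts.

    Whatever is left after the training and validation shares goes to testing.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]
    return train_idx, val_idx, test_idx

# Example with the 150 samples of the iris dataset.
train_idx, val_idx, test_idx = split_indices(150)
```

Every sample lands in exactly one of the three parts, so validation and testing accuracy are measured on data the network was not trained on.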
Step 2 (Network Selection): All seven datasets provide input values with their corresponding output values, so their problems can be solved by any supervised learning model. To test our NewSigmoid activation function, we use a three-layer MLFFNN consisting of an input layer, a hidden layer, and an output layer. Between the input and hidden layers we use the tansig, logsig, or NewSigmoid activation function, one function per network.
The logistic function is usually used for binary classification models, whereas the softmax activation function is usually used for multi-class classification models; we therefore use the softmax activation function between the hidden and output layers. To decide the number of hidden neurons (nodes) in the network, we first run the network with the number of hidden nodes varied from 1 to 50 on the iris, cancer, and glass datasets, and then check the network performance.
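The network described in this step can be sketched as a single forward pass in Python. The sketch uses 4 inputs and 3 outputs, as in the iris network of Figure 1; the 10 hidden nodes, the random weights, the example input, and the use of tanh as the hidden AF are illustrative assumptions, and any other hidden AF could be passed in its place.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, hidden_af=np.tanh):
    """Forward pass of a three-layer MLFFNN with a softmax output layer."""
    hidden = hidden_af(W1 @ x + b1)               # input layer -> hidden layer
    scores = W2 @ hidden + b2                     # hidden layer -> output layer
    exp_scores = np.exp(scores - scores.max())    # softmax, shifted for numerical stability
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 10, 3                  # the hidden size of 10 is arbitrary

# Randomly assigned weights and biases (illustrative values only).
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = rng.normal(size=n_out)

x = np.array([5.1, 3.5, 1.4, 0.2])                # one hypothetical 4-attribute sample
probs = forward(x, W1, b1, W2, b2)                # one probability per class
```

Swapping `hidden_af` while keeping everything else fixed mirrors the comparison made in this paper, where only the hidden-layer AF changes between networks.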
Step 3 (Selecting Training algorithm): We use LM, SCG, and BR training algorithms in the networks, a single algorithm in a single network.
Step 4 (Weight initialization): For this MLFFNN, we randomly assign initial weights and biases.
Step 5 (Run the network): In this step, we train/run the network under the following six conditions.
In the case of the SCG algorithm, Tables 2, 3, and 4 show that at hidden node numbers 17, 20, 21, 25, 41, 44, and 50 the NewSigmoid activation function works better than or equivalent to the tansig activation function, and at hidden node numbers 25, 34, 38, 41, 44, 47, and 48 it works better than or equivalent to the logsig activation function. These results are for three datasets only: iris, glass, and cancer. For testing on the other four datasets, we randomly chose hidden neuron sizes 25, 41, and 44 for the SCG algorithm (by observing these two results).
For the SCG algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to 10^-6, the maximum number of validation failures to 6, the change in weight for the second-derivative approximation to 5×10^-5, and the parameter for regulating the indefiniteness of the Hessian to 5×10^-7. In this case, network performance was checked with cross-entropy.
In the case of the LM algorithm, Tables 2, 3, and 4 show that at hidden node numbers 49, 15, 21, 23, 25, 30, 31, 33, 39, 47, and 48 the NewSigmoid activation function works better than or equivalent to the tansig activation function, and at hidden node numbers 5, 21, 25, 30, 32, 35, 40, 41, 44, 45, 46, and 50 it works better than or equivalent to the logsig activation function. These results are for three datasets only: iris, glass, and cancer. For testing on the other four datasets, we randomly chose hidden neuron sizes 25, 30, and 39 for the LM algorithm (by observing these two results).
For the LM algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to 10^-7, the maximum number of validation failures to 6, and the initial Marquardt adjustment parameter (mu) to 0.001, which can be increased up to a maximum of 10^10 by a factor of 10. In this case, network performance was checked with the mean square error (mse).
In the case of the BR algorithm, Tables 2, 3, and 4 show that at hidden node numbers 2, 5, 6, 7, 9, 14, 16, 18, 20, 31, 32, 33, 39, 41, 44, 45, 47, 49, and 50 the NewSigmoid activation function works better than or equivalent to the tansig activation function, and at hidden node numbers 2, 7, 9, 12, 14, 16, 18, 21, 29, 30, 31, 32, 39, 42, 47, 49, and 50 it works better than or equivalent to the logsig activation function. These results are for three datasets only: iris, glass, and cancer. For testing on the other four datasets, we randomly chose hidden neuron sizes 16, 18, and 32 for the BR algorithm (by observing these two results).
For the BR algorithm, we set the maximum number of epochs to 1000, the number of epochs between displays to 25, the performance goal to 0, the minimum performance gradient to 10^-7, the maximum number of validation failures to 6, and the initial Marquardt adjustment parameter (mu) to 0.1, which can be increased up to a maximum of 10^10 by a factor of 10. In this case, network performance was checked with mse.
Errors, average error, and average accuracy values on the iris dataset for the hidden node varied from 1 to 50 using tansig, logsig and NewSigmoid on SCG, LM and BR training algorithms
Errors, average error, and average accuracy values on the cancer dataset for the hidden nodes varied from 1 to 50 using tansig, logsig and NewSigmoid on SCG, LM and BR training algorithms
Step 6 (Stopping Criteria): The network stops training/running when it meets any one of the following conditions: the number of epochs reaches 1000 (the maximum); the performance goal of 0 (zero) is reached; the performance gradient reaches 10^-6; the number of validation failures reaches 6; or, in the case of BR, the value of mu reaches 10^10.
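The stopping conditions listed in this step can be expressed as one check that a training loop would call after every epoch. This is a sketch under assumptions: the function name, its arguments, and the returned labels are hypothetical, and the thresholds are the values stated above.

```python
def should_stop(epoch, performance, gradient, val_failures, mu=None,
                max_epochs=1000, goal=0.0, min_grad=1e-6,
                max_fail=6, mu_max=1e10):
    """Return the reason training should stop, or None to continue.

    mu is only passed for algorithms that use the Marquardt adjustment parameter.
    """
    if epoch >= max_epochs:
        return "maximum epochs reached"
    if performance <= goal:
        return "performance goal met"
    if gradient <= min_grad:
        return "minimum gradient reached"
    if val_failures >= max_fail:
        return "validation stop"
    if mu is not None and mu >= mu_max:
        return "maximum mu reached"
    return None

# Example: epoch 120, six consecutive validation failures.
reason = should_stop(epoch=120, performance=0.02, gradient=1e-3, val_failures=6)
```

The checks are evaluated in a fixed order, so only the first matching reason is reported when several conditions hold at once.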
After completing the above process, the final weights and biases of the network are obtained for both layers. We then calculated the errors, accuracy, and average accuracy, which are shown in Tables 2 to 11.
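As a worked illustration of how an accuracy value relates to an error count, the sketch below assumes that "errors" means the number of misclassified samples and that accuracy is the percentage of samples classified correctly. The text does not state this definition, so it is an assumption; under it, one error on the 150 iris samples gives 99.3%.

```python
def accuracy_percent(n_samples, n_errors):
    """Percentage of correctly classified samples (assumed definition)."""
    return 100.0 * (n_samples - n_errors) / n_samples

# One misclassified sample out of the 150 iris samples.
iris_one_error = round(accuracy_percent(150, 1), 1)      # 99.3

# One misclassified sample out of the 699 cancer samples.
cancer_one_error = round(accuracy_percent(699, 1), 1)    # 99.9
```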
Errors, average error, and average accuracy values on the glass dataset for the hidden nodes varied from 1 to 50 using tansig, logsig and NewSigmoid on SCG, LM and BR training algorithms
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the SCG training algorithm on the wine dataset. At hidden neuron size 25, we show training accuracy (Tr. acc.), validation accuracy (Val. acc.), testing accuracy (Test. acc.), and all confusion accuracy (Avg. acc.). At hidden neuron sizes 41 and 44, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the LM training algorithm on the wine dataset. At hidden neuron size 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 30 and 39, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the BR training algorithm on the wine dataset. At hidden neuron size 16, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 18 and 32, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the SCG training algorithm on the ovarian dataset. At hidden neuron size 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 41 and 44, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the LM training algorithm on the ovarian dataset. At hidden neuron size 25, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 30 and 39, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions with the BR training algorithm on the ovarian dataset. At hidden neuron size 16, we show Tr. acc., Val. acc., Test. acc., and Avg. acc. At hidden neuron sizes 18 and 32, we show average accuracy only
Accuracy values of the tansig, logsig, and NewSigmoid activation functions at hidden neuron size 41 with the SCG training algorithm on the mathwork-cap dataset
After completing the training process of these networks, we observed the following eight sets of results.
Iris dataset (Table 2): With the SCG training algorithm, a maximum accuracy of 99.3% was achieved using tansig, logsig, and NewSigmoid. With the LM training algorithm, a maximum accuracy of 100% was achieved using tansig, logsig, and NewSigmoid. Over hidden node numbers 1 to 50 with LM, the average accuracy was 96.3% using tansig, 96.8% using logsig, and 96.9% using NewSigmoid, which is the highest. With the BR training algorithm, a maximum accuracy of 100% was achieved using tansig, logsig, and NewSigmoid. Over hidden node numbers 1 to 50 with BR, the average accuracies were 99.05%, 96%, and 98.4% using tansig, logsig, and NewSigmoid respectively.
Cancer dataset (Table 3): With the SCG training algorithm, the maximum accuracy was 98.8% using tansig, 98.3% using logsig, and 99.4% using NewSigmoid, which is the highest. With the LM training algorithm, the maximum accuracy was 99.3% using tansig, 99.4% using logsig, and 99.7% using NewSigmoid, which is the highest. Over hidden node numbers 1 to 50 with LM, the average accuracy was 97.7% using tansig, 97.6% using logsig, and 97.9% using NewSigmoid, which is the highest. With the BR training algorithm, the maximum accuracy was 100% using tansig and 99.9% using logsig and NewSigmoid. Over hidden node numbers 1 to 50, the average accuracies were 99.2%, 98.5%, and 99.1% using tansig, logsig, and NewSigmoid respectively.
Glass dataset (Table 4): With the SCG training algorithm, the maximum accuracies were 98.6%, 97.7%, and 99.5% using tansig, logsig, and NewSigmoid (highest) respectively. With the LM training algorithm, a maximum accuracy of 99.5% was achieved using tansig, logsig, and NewSigmoid. Over hidden node numbers 1 to 50, the average accuracy was 96.5% using tansig, 96.1% using logsig, and 97.1% (highest) using NewSigmoid. With the BR training algorithm, a maximum accuracy of 100% was achieved using tansig, logsig, and NewSigmoid. Over hidden node numbers 1 to 50, the average accuracies were 98.9%, 98.3%, and 98.8% using tansig, logsig, and NewSigmoid respectively.
Chemical dataset: 100% accuracy was achieved using tansig, logsig, and NewSigmoid with the SCG, LM, and BR training algorithms.
Bodyfat dataset: 100% accuracy was achieved using tansig, logsig, and NewSigmoid with the SCG, LM, and BR training algorithms.
Wine dataset: According to Table 5, with the SCG training algorithm at hidden node number 25, 99.4% accuracy was achieved using tansig, logsig, and NewSigmoid. At hidden node number 41, 99.4% accuracy (highest) was achieved using NewSigmoid. At hidden node number 44, 98.9% accuracy was achieved using NewSigmoid, which is equivalent to tansig. According to Table 6, with the LM training algorithm at hidden node number 25, 100% accuracy (highest) was achieved using NewSigmoid. At hidden node number 30, 97.8% accuracy was achieved using NewSigmoid, which is equivalent to tansig. At hidden node number 39, 99.4% accuracy was achieved using NewSigmoid, which is much better than logsig. According to Table 7, with the BR training algorithm at hidden node numbers 16 and 18, 100% accuracy (highest) was achieved using NewSigmoid. At hidden node number 32, 99.4% accuracy was achieved using NewSigmoid, which is equivalent to tansig and logsig.
Ovarian dataset: According to Table 8, with the SCG training algorithm at hidden node number 25, 97.2% accuracy was achieved using NewSigmoid, which is equivalent to tansig. At hidden node number 41, 91.7% accuracy was achieved using NewSigmoid. At hidden node number 44, 94% accuracy was achieved using NewSigmoid, which is equivalent to tansig. According to Table 9, with the LM training algorithm at hidden node number 25, 97.2% accuracy was achieved using tansig, logsig, and NewSigmoid. At hidden node number 30, 94% accuracy was achieved using NewSigmoid, which is better than tansig. At hidden node number 39, 99.5% accuracy was achieved using NewSigmoid (highest). According to Table 10, with the BR training algorithm at hidden node number 16, 99.1% accuracy was achieved using NewSigmoid, which is the highest. At hidden node number 18, 98.6% accuracy was achieved using NewSigmoid, which is equivalent to tansig. At hidden node number 32, 97.2% accuracy was achieved using NewSigmoid, which is the highest.
Mathwork-cap image dataset: We randomly chose hidden neuron size 41 and the SCG training algorithm (Table 11). 96% accuracy (highest) was achieved using the NewSigmoid activation function, while 94.7% and 93.3% accuracy were achieved using the tansig and logsig activation functions respectively.
Training, validation and testing accuracy
Tables 5 to 11 also show training and testing accuracy. On the wine dataset at hidden neuron size 25, we achieved 100% training and testing accuracy with both SCG and LM, and a validation accuracy of 96.3% with SCG and 100% with LM (Tables 5 and 6). The highest validation accuracy was achieved with LM (Table 6) using NewSigmoid. With the BR training algorithm at hidden neuron size 16, we achieved 100% training and testing accuracy on the wine dataset (Table 7). On the ovarian dataset at hidden neuron size 25, we achieved 100% training accuracy, 84.4% testing accuracy, and 96.9% validation accuracy with SCG and LM (Tables 8 and 9). With the BR training algorithm at hidden neuron size 16, we achieved 100% training and 93.8% testing accuracy on the ovarian dataset (Table 10). On the mathwork-cap dataset at hidden neuron size 41, we achieved 100% training and validation accuracy and 72.7% testing accuracy with SCG (Table 11).
Performance analysis
Figure 4 shows the performance analysis of the three AFs, obtained with the LM training algorithm and hidden neuron size 39 on the wine dataset. The best training performance is 1.2408e-24 at epoch 1000 with tansigmoid, 5.1271e-24 at epoch 1000 with logsigmoid, and 5.8873e-05 at epoch 1000 with NewSigmoid. For the same training samples, algorithm, and hidden neuron size, NewSigmoid achieves the highest accuracy.

Mean square error. Panels (a), (b), and (c) correspond to tansigmoid, logsigmoid, and NewSigmoid respectively.
In this paper, our goal is to introduce a new activation function and its properties. We propose a new activation function, called the NewSigmoid activation function, and compare it with logsigmoid and tansigmoid, two widely used activation functions. Across seven numerical datasets and one image dataset, NewSigmoid achieves better or equivalent results compared with logsigmoid and tansigmoid in multiple cases. With the LM algorithm, NewSigmoid achieves the highest average accuracy on the iris (96.9%), cancer (97.9%), and glass (97.1%) datasets, so it may achieve better results than the other two activation functions where LM is used. With the BR algorithm, NewSigmoid achieves better average accuracy than logsigmoid on the iris (98.4%), cancer (99.1%), and glass (98.8%) datasets, so it may achieve better results than logsigmoid where BR is used. Like logsigmoid and tansigmoid, NewSigmoid also achieves 100% accuracy on the iris, glass, chemical, bodyfat, and wine datasets. On the ovarian dataset, NewSigmoid achieves the highest accuracy (99.5%) with LM. On the mathwork-cap image dataset, NewSigmoid achieves the highest accuracy (96%). These accuracies may be increased further with preprocessing techniques such as normalization, or by using other networks. Like logsigmoid and tansigmoid, our proposed AF supports backpropagation because it is a differentiable function. NewSigmoid is zero-centered, whereas logsigmoid is not, which is a likely reason our function works better than logsigmoid. Logsigmoid is not symmetric around zero, but NewSigmoid, like tansigmoid, is. For this reason, NewSigmoid may be used for very complex and non-linear problems involving audio, images, or other high-dimensional data. Amir Farazad et al. [17] observed that sigmoid activation with range [–0.5, 1.5] generally produced more accurate results. The range of NewSigmoid is (–0.7071, 0.7071), and this AF also produces better results in multiple cases compared with logsigmoid (range (0, 1)) and tansigmoid (range (–1, 1)).
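The zero-centering and symmetry properties discussed above can be checked numerically for the two reference functions. The following is a minimal Python/NumPy sketch, not the code used in our experiments; the helper names `logsig`, `tansig`, and `is_odd` and the evaluation grid are illustrative assumptions. Any other candidate activation, including NewSigmoid, could be passed to the same helper.

```python
import numpy as np

def logsig(x):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Hyperbolic tangent sigmoid; output lies in (-1, 1)."""
    return np.tanh(x)

def is_odd(f, xs, tol=1e-12):
    """True if f(-x) == -f(x) on the grid xs, i.e. f is symmetric about the origin."""
    return bool(np.allclose(f(-xs), -f(xs), atol=tol))

# Symmetric evaluation grid (an arbitrary choice for illustration).
xs = np.linspace(-6.0, 6.0, 241)

report = {}
for name, f in [("logsig", logsig), ("tansig", tansig)]:
    report[name] = {
        "f(0)": float(f(0.0)),                 # zero-centered functions give 0 here
        "odd": is_odd(f, xs),                  # symmetry about the origin
        "mean_output": float(np.mean(f(xs))),  # average output over the symmetric grid
    }

for name, props in report.items():
    print(name, props)
```

On a symmetric input grid, the mean output of tansig is approximately zero, whereas logsig averages about 0.5, which is what "not zero-centered" means in practice.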
Like logsigmoid and tansigmoid, the NewSigmoid AF has two limitations. First, because the AF has a finite range, very high or very low input values produce almost no change in the prediction. This is known as the vanishing gradient problem, and it can be mitigated by scaling methods. Second, this AF may converge slowly.
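The saturation effect described above can be illustrated with the derivatives of logsigmoid and tansigmoid, which share this limitation. The sketch below is a hedged illustration in Python/NumPy, not part of our experiments. The feature values in `raw` are hypothetical, and z-score standardization is assumed as one example of a scaling method.

```python
import numpy as np

def logsig_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x)); maximum 0.25 at x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def tansig_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2; maximum 1 at x = 0."""
    return 1.0 - np.tanh(x) ** 2

# Hypothetical unscaled feature values (assumption, for illustration only).
raw = np.array([150.0, 200.0, 250.0, 300.0])

# Z-score standardization, assumed here as one possible scaling method.
scaled = (raw - raw.mean()) / raw.std()

grad_raw = tansig_grad(raw)        # saturated region: gradients are numerically zero
grad_scaled = tansig_grad(scaled)  # inputs near zero: gradients are usable again

print("gradient on raw inputs:   ", grad_raw)
print("gradient on scaled inputs:", grad_scaled)
print("logsig gradient at x=10:  ", logsig_grad(10.0))
```

With the raw inputs, the gradient is effectively zero, so weight updates stall. After standardization, the same inputs fall in the steep part of the curve and the gradient is no longer negligible.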
Conclusion and future scope
In this paper, we have proposed and explained the NewSigmoid Activation Function (AF). First, we explained neural networks and their uses. We took a Multilayer Feedforward Neural Network and explained how it works. Then we explained some existing activation functions. Afterward, we proposed our NewSigmoid AF and described its properties. Our AF is a smooth, S-shaped, bounded (range (–0.7071, 0.7071)), continuously differentiable, and zero-centered function. Vanishing gradient and slow convergence are two limitations of the NewSigmoid AF. We have shown through eight datasets that in multiple cases this AF is on par with the tansig and logsig activation functions, and in some cases it achieves better results than both. We tested our activation function on the iris, cancer, glass, chemical, bodyfat, wine, and ovarian datasets. We used the SCG, LM, and BR algorithms to optimize the neural network. With NewSigmoid, we obtained better results in multiple cases. On the iris dataset, we achieved a maximum of 99.3% accuracy with SCG and 100% accuracy with LM and BR. On the cancer dataset, we achieved a maximum of 99.4% accuracy with SCG, 99.7% with LM, and 99.9% with BR. On the glass dataset, we achieved a maximum of 99.5% accuracy with SCG and LM, and 100% with BR. On the chemical and bodyfat datasets, we achieved a maximum of 100% accuracy with SCG, LM, and BR. On the wine dataset, we achieved a maximum of 99.4% accuracy with SCG (at hidden neuron sizes 25 and 41), 100% with LM (at hidden neuron size 25), and 100% with BR (at hidden neuron sizes 16 and 18). On the ovarian dataset, we achieved a maximum of 97.2% accuracy with SCG (at hidden neuron size 25), 99.5% with LM (at hidden neuron size 39), and 99.1% with BR (at hidden neuron size 16).
We also tested NewSigmoid on the mathwork-cap image dataset, where we achieved 96% accuracy with SCG (at hidden neuron size 41). We also achieved 100% training and testing accuracy on some datasets.
In view of the above, NewSigmoid may be considered in place of the logsig and tansig activation functions for achieving better accuracy when experimenting with the LM or BR algorithm on numerical datasets and with the SCG algorithm on image datasets. Neither tansigmoid nor logsigmoid performs best across all hidden-layer sizes: on some hidden neuron sizes tansigmoid achieves better results, and on others logsigmoid does. So, if tansigmoid and logsigmoid do not achieve satisfactory results, NewSigmoid may be tried as an alternative to check whether it yields better results. In the future, we will work on developing further activation functions based on NewSigmoid or existing AFs. We will also use NewSigmoid in deep neural networks.
