Abstract
In recent years, the Extreme Learning Machine (ELM) has attracted the interest of many researchers because of its strong generalization and approximation capability. The network architecture and the type of activation function are two important factors that influence the performance of an ELM. Hence, in this study, a multi-criteria decision making (MCDM) framework based on the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is suggested for analyzing ELM models developed with distinct activation functions with respect to sixteen evaluation criteria. Evaluating the performance of the ELM with respect to multiple criteria instead of a single criterion can help in designing a more robust network. The proposed framework is used as a binary classification system for the problem of stock index price movement prediction. The model is empirically evaluated using historical data of three stock indices: BSE SENSEX, S&P 500 and NIFTY 50. The empirical study shows promising results from evaluating ELMs with different activation functions against multiple criteria.
Introduction
In recent years, financial markets have been filling with companies and investors at an unprecedented speed. With advances in technology and the growing availability of data, investors are becoming more knowledgeable about the stock market and its working principles. This may be a main reason for their growing involvement in stock market investments. Equity, stocks, bonds, currencies, gold etc. are some of the products or commodities that investors buy and sell on a daily basis. Although people are aware of the high potential profit of these investments, the risk of investment cannot be eliminated. Hence predicting upcoming prices and their movement is extremely important for reducing this risk. Moreover, the stock market is very volatile and non-linear in nature, owing to its dependency upon external factors like inflation, deflation, gold price, economic policies of a country, current trends etc. [1]. So predicting a stock’s upcoming price is a very demanding and difficult task.
The literature shows a rapid rise in the application of computational techniques to different aspects of stock market analysis such as stock price prediction, price movement prediction, volatility prediction, stock index return prediction, stock trading and so on. Among these techniques, the Artificial Neural Network (ANN) is one popular stream, which has been widely applied in stock market analysis. Though different types of ANN have performed well from time to time, most of them suffer from low generalization, high complexity, difficulty in choosing the network architecture, rigorous parameter tuning, human intervention and long training time. Deciding the structure of the network, the activation functions to be used and the learning algorithm has always been challenging for researchers. In the existing literature, the simple Feed Forward Neural Network (FFNN) with a single hidden layer has been applied extensively to various prediction and classification tasks. The back propagation (BP) algorithm is one commonly used training algorithm for it. BP is popular, but it suffers from local minima and has a slow convergence rate. So researchers have introduced a variety of nature-inspired learning algorithms for ANNs. Though these address the local minima problem, they increase the learning time. In both approaches, the unknown weights of the network are adjusted iteratively throughout the learning process, which plays a major role in increasing the training time of the network. To solve these issues, Huang et al. [15] proposed a new training algorithm, known as the Extreme Learning Machine (ELM), for training the FFNN with a single hidden layer. This simple FFNN uses two sets of weights. One set is related to the connections between the input and hidden layers; these weights are assigned randomly during network creation and are not tuned further in the learning phase. 
The other set of weights is associated with the connections between the hidden and output layers, and is obtained by a pseudo-inverse least squares computation. Unlike traditional training algorithms, ELM completes the network learning within a single iteration. As a result, the overall model execution time is greatly reduced, making it a preferable alternative to other training methods. Being a competent and extremely fast learning algorithm, ELM has been successfully applied to various regression and classification problems in the financial domain. However, the number of hidden layer neurons and the activation function used in the hidden layer strongly influence the overall outcome of an ELM. These components need to be examined thoroughly before modelling an ELM for any application. As the performance of ELM varies for different activation functions with respect to different performance measures, it is impractical to model an ELM with one specific activation function for all purposes. Hence, developing an efficient ELM by investigating the impact of different activation functions, so that it produces better results with respect to multiple criteria, can be considered a Multi Criteria Decision Making (MCDM) problem.
In this study, TOPSIS, one of the in-demand MCDM techniques, is suggested to examine the performance of ELMs modelled with distinct activation functions with respect to several evaluation criteria. Estimating the capability of the ELM with respect to multiple criteria instead of a single criterion can help in designing a more robust network. The proposed framework is used as a binary classification system for the problem of stock index price movement prediction. TOPSIS is used to rank ELMs modelled separately with fifteen activation functions: Sigmoid, Gaussian, Sinusoid, TanH, ArcTan, ArcSinH, Rectified Linear Unit (ReLU), Leaky ReLU (LReLU), Parametric ReLU (PReLU), SinC, SoftPlus, Exponential Linear Unit (ELU), Inverse Square Root Unit (ISRU), ElliotSig, and Square Nonlinearity (SQNL). The outcomes of all fifteen ELM classifiers are observed for sixteen evaluation criteria: True Positive (TP), True Negative (TN), Accuracy (ACC), Precision (PREC), Recall (REC), F-Measure (FM), G-Mean (GM), Specificity (SPEC), Negative Prediction Value (NPV), False Positive Rate (FPR), False Negative Rate (FNR), Youden’s Index (YI), Markedness (MK), Balanced Classification Rate (BCR), Half Total Error Rate (HTER) and Jaccard (JACC). Afterward it is analyzed as a 15
Fifteen distinct ELM models are developed using fifteen different activation functions in the hidden layer of a single hidden layer feedforward network (SLFN) for stock index price movement prediction. The ELM models are ranked separately in the training and testing phases to identify the activation functions that perform well in designing an ELM based classifier. A TOPSIS based approach is applied for ranking the ELMs by considering sixteen evaluation criteria.
The rest of the paper is organized as follows: Section 2 provides a concise overview of the literature on ELM prediction models and TOPSIS. Section 3 details the suggested model for stock index price movement prediction, which implements various activation functions in ELM models, ranks these models using TOPSIS and obtains the best ELM model for this experiment. Section 4 presents the empirical results obtained from classification and TOPSIS. Section 5 gives the concluding observations and future prospects.
Advances in technology, computing power and algorithms have made prediction more effective than ever. There are several approaches for predicting stock index movement as well as stock prices. Researchers have tried to tackle this problem by using statistical methods and intelligent computing techniques. As computers nowadays are more powerful than their predecessors, more complex yet efficient prediction models such as Artificial Neural Networks (ANN) [2, 3, 4, 5, 8], Support Vector Machine (SVM) [6, 7], Decision Tree (DT) [10, 11], Naïve Bayes (NB) [12], K-Nearest Neighbour (KNN) [13, 14] and so on are being used by researchers for such analysis. In [3] Guresen et al. proposed an MLP neural network based time series forecasting model for the NASDAQ stock index. They compared the results of their proposed model with two benchmark models, generalized autoregressive conditional heteroscedasticity (GARCH) and dynamic artificial neural network (DAN2), and found MLP to be the better performing one. In [4] Mostafa et al. hypothesized that neural network models are better suited for time series forecasting than traditional statistical models. They advocated a forecasting model which employs MLP and generalized regression neural network (GRNN) for predicting the price movements of the Kuwait stock exchange. Huang et al. presented an SVM model in [6] for predicting the direction of the NIKKEI 225 index and the exchange rate of USD. They used the random walk model (RM), linear discriminant analysis (LDA), Elman back propagation neural network (EBNN) and quadratic discriminant analysis (QDA) as benchmark models and found SVM to be the best performing model. The inclusion of kernel functions and the capacity control of the decision function make SVM a promising trend prediction model in [7]. 
A Radial Basis Function Network (RBFN) trained using ridge ELM (RELM) is proposed in [8] for future stock index movement prediction of two popular stock indices, BSE SENSEX and S&P500. The authors investigated the performance of RBFN with 7 different basis functions under three different learning algorithms. The results reveal the better performance of the model trained using RELM with Inverse Multi Quadratic and Gaussian radial basis functions for both datasets. A computationally efficient functional link artificial neural network (CEFLANN) model trained using ELM emerged in [9] as a promising model for classifying future stock index price movements of the BSE SENSEX and S&P500 indices. The model showed better results than the Chebyshev FLANN and RBFN models. In [10] another SVM based trend prediction system is presented, in which DT is used for feature selection. The parameters of DT and SVM are optimized by applying a genetic algorithm. Tiwari et al. implemented a hybrid model comprising decision tree rough set and Hierarchical Hidden Markov Model (HHMM) in [11] to predict trends in BSE SENSEX. They also compared the results of the proposed model with a hybrid decision tree rough set based model and a standalone rough set based model and identified the superiority of the proposed model over the benchmark models. An automation model based on NB is proposed in [12], which alerts users so that they can take decisions on buying or selling a company’s stock. The model is validated over real-time as well as dummy datasets. Though the above models and methods give very promising results in their application areas, they are limited by the high computational time needed for model training.
In 2004, Huang et al. proposed a fast learning algorithm for SLFNs termed ELM [15]. According to their experiments, ELM overcomes drawbacks of traditional gradient based learning algorithms such as parameter tuning, the local minima problem and slow convergence. They showed the superiority of ELM over FFNN trained with BP, RBF, SVM, bagging and boosting methods. The authors used a real world medical diagnostics dataset and a US forest service dataset for binary classification. ELM outperformed the other algorithms in terms of accuracy, parameter setup and training speed by a very large margin. Further, the authors examined the implications of ELM for various fields of research in [16, 17]. Owing to its better generalization and extremely fast training speed, ELM has shown promising outcomes in real world problems such as regression, large and complex applications, signal processing [18], image processing [19], financial time series prediction [20, 39], automatic control [21] and medical diagnostics [22]. ELM also overcomes issues like local minima, over-fitting and irregular learning rate encountered by traditional gradient based learning algorithms. In [23] the authors used ELM to predict reservoir permeability and contrasted it with SVM. In [24] ELM is applied to remote sensing image fusion, where it learns the regression relationship between remote sensing image data from training samples in order to improve the quality of thermal infrared image data. Another feasible and effective application of ELM is found in lithologic identification and the establishment of a lithologic classification model in [25]. Huang et al. also argued in [26] that, unlike classic gradient based learning algorithms which only work with differentiable activation functions, ELM can work with a variety of activation functions including non-differentiable ones. 
According to their proposal, ELM is able to approximate any continuous function and can implement any classification problem. SLFNs trained in this way do not require tuning of the hidden layer parameters; instead, randomly assigned parameters can yield better performance [27]. Several other variations of ELM, such as Fully complex ELM [28], Online Sequential ELM (OS-ELM) [29], Optimally pruned ELM (OP-ELM) [30], Error Minimized ELM (EM-ELM) [31], Multiple Kernel Based ELM (MK-ELM) [32] and Incremental ELM (I-ELM) [33], have been proposed for classification and regression, and these ELMs have better generalization and faster convergence than conventional learning algorithms. The ELM literature suggests that ELMs are highly dependent on the activation functions used in their hidden layer neurons. There are several activation functions available, each of which performs better or worse than the others depending on the test case. It is therefore uncertain which activation function is best suited for ELM. So in this research, a few of the commonly used activation functions have been selected from the literature, to analyse the performance of each when it is applied to the hidden layer neurons of an ELM.
As suggested in [34], evaluating the performance of a classification model with respect to different evaluation criteria is more reliable and robust. The authors observed that different models having the same accuracy can have different performance characteristics. So, evaluating classification by employing multiple performance criteria can be beneficial. In the literature, it can be observed that the majority of studies have considered one or two performance measures to show the validity and reliability of their proposed models. However, real world settings often present multiple alternatives that must be evaluated with respect to multiple criteria; this MCDM problem is currently gaining priority in several domains. It can also be noted that evaluating the performance of a classifier with respect to multiple criteria can give vital information regarding the decisions, behaviour and functioning of the classifier, and in the end can produce a more valid and reliable classification. The following studies describe various MCDM approaches that evaluate multiple alternatives against multiple criteria and produce a mathematically sound ranking so that the user can choose the best alternative. Dash et al. presented a TOPSIS based classifier ranking approach for stock index movement prediction in [35]. Their experiment shows that evaluating the classification performance of different classifiers with respect to multiple evaluation criteria and ranking them using an MCDM model can give robust and reliable prediction. In [36] the authors examined various MCDM models such as TOPSIS, VIKOR, PROMETHEE II, Elimination and Choice Expressing REality (ELECTRE) III and Grey Relational Analysis (GRA) with the help of Spearman’s rank correlation coefficient to find the better MCDM model among them. A PROMETHEE based assessment of 11 common classifiers with a few performance measures is presented in [37]. 
Further, a ranking of classifiers using PROMETHEE and TOPSIS is presented in [38] by considering 10 classifier models and 6 performance evaluation criteria. An extension of that work appeared in [40], where TOPSIS was applied to select well performing base classifiers for a crow search based classifier ensemble. However, the primary aim of MCDM is to strengthen the subjective evaluation of models considering multiple conflicting criteria.
A review of the literature on stock index movement prediction and on the application areas of ELM and MCDM shows that, prior to this work, no other study has examined the performance of ELM with a diverse set of activation functions. As the performance of ELM varies for different activation functions with respect to different performance measures, it is impractical to model an ELM with one specific activation function for all purposes. Hence, it is essential to analyze the performance of ELMs with different activation functions. Moreover, instead of selecting and ranking the best performing model on the basis of a single criterion, taking the decision with respect to multiple criteria can help in designing a more robust predictor. Ranking ELMs using the TOPSIS approach for predicting future stock index movements is also a novel contribution of this work.
An ELM network architecture.
This section provides a brief theoretical description about ELM, different activation functions, evaluation criteria and TOPSIS used in the proposed work.
Extreme learning machine
ELM is a learning algorithm that was originally proposed for an SLFN by Huang et al. in [15] and was then extended to generalized SLFNs, where no tuning of the hidden layer is required. Analytical calculation of the output weights in a single iteration, following the random assignment of weights between the input layer and hidden layer nodes, is one of the important features of ELM. This arbitrary weight assignment can produce better generalization than traditional learning algorithms. Moreover, the negligible parameter tuning and reduced user intervention make learning much faster than with other learning algorithms. ELM, as seen in the literature, is very useful for function approximation, regression and real world classification problems, and is well suited to applications in which fast prediction and response are needed. Owing to its superior performance, several variations of ELM have been applied to real world problems. Using ELM for classification and regression is advantageous because it trains faster, needs less intervention for parameter tuning and supports various activation functions. The network architecture used for ELM learning is represented in Fig. 1. The network is created using
where,
Equation (1) can be written as follows:
where
where
In our proposed model, the output weights linking hidden layer neurons and output neurons are acquired mathematically using identical concepts of ELM.
Here
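The training procedure described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `train_elm` and `predict_elm`, the uniform range [-1, 1] for the random input weights and biases, the default TanH activation and the 0.5 decision threshold are all assumptions made for the example. Only the overall scheme (random, untuned hidden layer followed by a pseudo-inverse solution for the output weights) is taken from the text.

```python
import numpy as np

def train_elm(X, y, n_hidden=4, activation=np.tanh, seed=0):
    """Fit a single-hidden-layer network with ELM; returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once at random and never tuned.
    # The range [-1, 1] is an assumption made for this sketch.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = activation(X @ W + b)        # hidden layer output matrix
    # Output weights: least squares solution via the Moore-Penrose pseudo-inverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def predict_elm(X, W, b, beta, activation=np.tanh, threshold=0.5):
    """Binary prediction; the 0.5 threshold is an assumption of this sketch."""
    scores = activation(X @ W + b) @ beta
    return (scores >= threshold).astype(int)
```

Because the hidden layer is fixed after initialization, training reduces to a single linear solve, which is the source of the speed advantage discussed above.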
An activation function acts as a mathematical gateway that regulates the output of an artificial neural network. These functions are associated with each individual neuron in the network and decide whether that neuron should fire or not. Another key responsibility of an activation function is to normalize the output value of each neuron within a certain range. Neurons are activated or deactivated based on certain rules or threshold values. An activation function can simply be a step function that sets the output of a neuron to on or off. As the complexity of the network increases, the demands on the activation function also increase. In general, two types of activation functions are used: linear and non-linear. Linear activation functions do not give much flexibility in training the model and have a very restricted ability to capture complex datasets. Nowadays, neural networks use non-linear activation functions, which can handle complex datasets and approximate arbitrary functions. These non-linear functions allow the network to construct complex mappings between its inputs and outputs.
While constructing a neural network model for a real-world problem, determining a suitable activation function is a crucial task. Evaluating several activation functions for a given problem helps in achieving a better outcome. Hence in this study, fifteen different activation functions are taken into account for analyzing the performance of ELM with respect to each one. Figure 2 presents the graphical representation of these activation functions, and their details are listed as follows:
Activation functions.
It is one of the most common non-linear activation functions used by researchers in several domains. Its output values lie in the range between 0 and 1. The sigmoid function has a smooth gradient and is also known as the logistic function. The sigmoid transformation of any value
ISRU has smooth and continuous derivatives and it is faster than TanH. The mathematical expression of ISRU is given below:
The Gaussian function is another of the most popular activation functions, and its output ranges between 0 and 1. It is a continuous function and has a bell shaped curve. The Gaussian transformation can be carried out by using the following formula:
Sinusoid or sine wave is a mathematical curve and is continuous in nature. Due to its periodic nature, a sine wave can give rise to a ‘rippling’ cost function, making the network difficult to train. The output values of the sinusoid activation function range between
The TanH activation function is a non-linear function that transforms real valued numbers into the range between -1 and 1. Due to its stronger gradient, it is often preferred over the sigmoid activation function. For inputs of large magnitude, TanH suffers from the ‘vanishing gradient problem’. Its mathematical equation is as follows:
ArcTan or Tan
It is also known as inverse hyperbolic sine and it transforms real values into the range between
ReLU is one of the most popular activation functions being used. It is non-linear in nature and its output value ranges between 0 and
As compared to ReLU, instead of making the value of output 0 at
Unlike LReLU where a predetermined slope 0.01 has been used, PReLU introduces a
In SinC activation function, the output decreases in ratio of the distance from the origin. When the input value
Softplus transforms real valued inputs into the range between 0 and
ELU is one of the most commonly used activation functions. It has a
ElliotSig or Softsign doesn’t require any trigonometric or exponential functions for calculation, which makes it fast and simple. Its input and output values range between
SQNL is a non-linear activation function which transforms real values into the range between
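For reference, the fifteen activation functions can be written compactly as follows. The formulas in this sketch are the commonly used textbook definitions and are an assumption here, since the equations are not reproduced in this text; in particular the PReLU slope (0.25), the ELU and ISRU parameters (1.0) and the unnormalized form sin(x)/x for SinC are illustrative choices, and the dictionary name `ACTIVATIONS` is hypothetical.

```python
import numpy as np

def sinc(x):
    """sin(x)/x, with the value at x = 0 defined as 1."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)          # avoid division by zero
    return np.where(x == 0.0, 1.0, np.sin(safe) / safe)

def sqnl(x):
    """Square nonlinearity: quadratic on [-2, 2], saturating at -1 and 1."""
    x = np.clip(np.asarray(x, dtype=float), -2.0, 2.0)
    return x - np.sign(x) * x ** 2 / 4.0

# Parameter defaults (a=...) are assumptions; PReLU's slope is normally learned.
ACTIVATIONS = {
    "sigmoid":   lambda x: 1.0 / (1.0 + np.exp(-x)),
    "gaussian":  lambda x: np.exp(-x ** 2),
    "sinusoid":  np.sin,
    "tanh":      np.tanh,
    "arctan":    np.arctan,
    "arcsinh":   np.arcsinh,
    "relu":      lambda x: np.maximum(0.0, x),
    "lrelu":     lambda x: np.where(x > 0, x, 0.01 * x),
    "prelu":     lambda x, a=0.25: np.where(x > 0, x, a * x),
    "sinc":      sinc,
    "softplus":  lambda x: np.logaddexp(0.0, x),   # log(1 + exp(x))
    "elu":       lambda x, a=1.0: np.where(x > 0, x, a * (np.exp(x) - 1.0)),
    "isru":      lambda x, a=1.0: x / np.sqrt(1.0 + a * x ** 2),
    "elliotsig": lambda x: x / (1.0 + np.abs(x)),
    "sqnl":      sqnl,
}
```

Each entry accepts a NumPy array and returns an array of the same shape, so any of them could be passed as the hidden layer activation of an ELM.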
Performance assessment is a vital step in evaluating a classifier. Performance criteria help in the process of training the classifier and also provide useful information about its ability to handle unknown data during the testing phase. A single performance measure may not always provide all the key information for post-classification decision making, and in some cases it may lead to an ambiguous outcome. For example, two different classifiers can have exactly the same accuracy yet make different positive and negative decisions during classification. Although their accuracy is identical, their other performance measures need not be. So in many cases, single criterion evaluation is not a suitable path to follow. Hence in this study, sixteen performance measures, namely TP, TN, ACC, PREC, REC, FM, GM, SPEC, NPV, FPR, FNR, YI, MK, BCR, HTER and JACC, are calculated and used for evaluating the classification performance of each individual ELM. Their definitions, significance and formulas are given as follows.
True positive
If a sample has the actual class value 1 and is also classified as 1, it is a correctly classified positive sample and is denoted as a true positive. A higher value of TP signifies a better classification model.
If a sample has 0 as its actual class label and is also classified as 0, it is a correctly classified negative sample and is denoted as a true negative. A higher value of TN signifies a better classification model.
Sensitivity, or recall, is the ratio of correctly classified positive samples to the total number of positive samples. Hence it can be considered as another form of accuracy representing the positive samples only. A higher value of sensitivity signifies a better classification model.
Similar to sensitivity, specificity is the ratio of correctly classified negative samples to the total number of negative samples, and can be considered the accuracy in terms of negative samples. A higher value of specificity signifies a better classification model.
Accuracy is the ratio of correctly classified samples, both positive and negative, to the total number of samples. Accuracy can be misleading when two classifiers yield the same accuracy but make different positive and negative decisions. A higher value of accuracy signifies a better classification model.
Positive prediction value or precision represents the proportion of true positive samples to the total number of positive predicted samples. A higher value of precision signifies a better classification model.
Negative prediction value represents the proportion of true negative samples to the total number of negative predicted samples. A higher value of NPV signifies a better classification model.
F1-score or F-measure represents the harmonic mean of precision and recall. It ranges from 0 to 1. A higher value of F-measure signifies a better classification model.
The Jaccard or Tanimoto similarity coefficient ignores the correct classification of negative samples, as shown in Eq. (28). A higher value of JACC signifies a better classification model.
Markedness depends on the PPV and NPV values; hence any change in those two values results in a variation of markedness. A higher value of markedness signifies a better classification model.
Youden’s index combines specificity and sensitivity. For a classifier that performs no worse than chance it ranges from 0 to 1: values closer to 0 represent poor classification and values closer to 1 represent near-perfect classification. A higher value of YI signifies a better classification model.
Balanced classification rate combines the values of specificity and sensitivity as represented in Eq. (31). A higher value of BCR signifies a better classification model.
The complement of BCR is called the Half Total Error Rate and is represented as follows:
A lower value of HTER signifies a better classification model.
False positive rate or false alarm rate or fallout is the ratio between the incorrectly classified negative samples to the total number of negative samples. In general it is the proportion of the negative samples that are incorrectly classified and is the complement of specificity. A lower value of FPR signifies a better classification model.
False negative rate or miss rate is the proportion of positive samples that are incorrectly classified. Hence it is the complement of sensitivity. A lower value of FNR signifies a better classification model.
As the aim of classification is to improve the sensitivity without sacrificing specificity, Geometric mean combines these two and is calculated using the formula as follows:
A higher value of GM signifies a better classification model.
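The sixteen criteria can all be derived from the four cells of the confusion matrix. The sketch below uses the standard definitions of these measures, which is an assumption here because the formulas themselves are not reproduced in this text; the function name `classification_metrics` is hypothetical, and zero denominators (for example a classifier that never predicts the positive class) are deliberately left unhandled.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Return the sixteen evaluation criteria from confusion matrix counts."""
    rec = tp / (tp + fn)       # sensitivity / recall / TPR
    spec = tn / (tn + fp)      # specificity / TNR
    prec = tp / (tp + fp)      # precision / PPV
    npv = tn / (tn + fn)       # negative prediction value
    bcr = (rec + spec) / 2     # balanced classification rate
    return {
        "TP": tp,
        "TN": tn,
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PREC": prec,
        "REC": rec,
        "FM": 2 * prec * rec / (prec + rec),   # harmonic mean of PREC and REC
        "GM": math.sqrt(rec * spec),
        "SPEC": spec,
        "NPV": npv,
        "FPR": 1 - spec,
        "FNR": 1 - rec,
        "YI": rec + spec - 1,
        "MK": prec + npv - 1,
        "BCR": bcr,
        "HTER": 1 - bcr,
        "JACC": tp / (tp + fp + fn),           # ignores true negatives
    }
```

For instance, `classification_metrics(40, 30, 10, 20)` describes a classifier with 70 correct decisions out of 100, so its accuracy is 0.7 while its recall (40/60) and specificity (30/40) differ, which is exactly the kind of information a single criterion hides.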
Evaluating the performance of ELMs on the basis of sixteen different criteria is clearly a multi-criteria decision making (MCDM) problem. Hence in this study, TOPSIS, one of the in-demand MCDM techniques, is suggested to examine the performance of ELMs modelled with distinct activation functions with respect to several evaluation criteria. TOPSIS is built on the idea of identifying the best alternative as the one that is farthest from a negative-ideal solution and closest to an ideal solution. The combination of the best performance values exhibited by any alternative for each criterion represents the ideal solution, and the combination of the worst values produces the negative-ideal solution. The steps involved in TOPSIS are summarized as follows:
A decision matrix ( A standardized decision matrix (SM) is obtained from
A weighted standardized decision matrix (WM) is obtained from SM as follows:
where, is the weight of the The ideal solution (
where, The separation measures
The relative closeness (
Finally, the alternatives are ranked according to the decreasing order of their (
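The TOPSIS steps summarized above can be sketched as follows. This is an illustrative implementation of the textbook procedure, not necessarily the exact variant used by the authors: vector (Euclidean) normalization for the standardized matrix and Euclidean separation measures are assumptions, as are the function name `topsis` and the boolean `benefit` mask that marks criteria for which a higher value is better.

```python
import numpy as np

def topsis(dm, weights, benefit):
    """Rank alternatives (rows of dm) over criteria (columns of dm).

    Returns (closeness, ranks), where rank 1 is the best alternative.
    """
    dm = np.asarray(dm, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    # Standardized decision matrix (SM): column-wise vector normalization.
    sm = dm / np.sqrt((dm ** 2).sum(axis=0))
    # Weighted standardized decision matrix (WM).
    wm = sm * weights

    # Ideal solution: best value per criterion; negative-ideal: worst value.
    ideal = np.where(benefit, wm.max(axis=0), wm.min(axis=0))
    neg_ideal = np.where(benefit, wm.min(axis=0), wm.max(axis=0))

    # Separation of each alternative from the two reference solutions.
    s_plus = np.sqrt(((wm - ideal) ** 2).sum(axis=1))
    s_minus = np.sqrt(((wm - neg_ideal) ** 2).sum(axis=1))

    # Relative closeness to the ideal solution, in [0, 1].
    closeness = s_minus / (s_plus + s_minus)

    # Rank in decreasing order of closeness.
    ranks = np.empty(len(closeness), dtype=int)
    ranks[np.argsort(-closeness)] = np.arange(1, len(closeness) + 1)
    return closeness, ranks
```

In the setting of this paper, `dm` would hold one row per ELM and one column per evaluation criterion. The sketch does not guard against an all-zero criterion column or against all alternatives being identical, both of which would cause a division by zero.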
Proposed TOPSIS-ELM stock index price movement prediction model.
This section describes the detailed steps of stock index price movement prediction using the ELM classifier. To understand how the dataset responds to the various activation functions used in the hidden layer neurons of the ELM, 15 activation functions are taken into consideration. The ELMs based on these 15 activation functions are further analysed with the help of 16 performance evaluation criteria. Since multiple classifier models are evaluated using multiple criteria, this is a suitable setting for an MCDM approach, here TOPSIS, to generate a rank for each activation function based ELM model. The proposed model is illustrated in Fig. 3. The model works in five phases.
Initially the historical prices of three benchmark stock indices BSE SENSEX, S&P 500 and NIFTY 50 are collected from 10
where,
The output of the prediction model is the upward or downward movement of the stock index price, i.e., if the closing price of the stock index on the current day is higher than that of the previous day, it is moving in the upward direction, and if its closing price on the current day is lower than that of the previous day, it is moving in the downward direction. To translate this movement into a suitable mathematical expression, the upward movement is labelled as 1 and the downward movement as 0, making this a binary classification problem with the two classes 0 and 1. The movement of the stock index has been derived by using the formula,
where,
After pre-processing, 1209 data samples for each stock index have been extracted for classification, out of which 847 are utilized for training and the remaining 362 for testing.
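The labelling rule and the 847/362 partition can be illustrated with the short sketch below. Two points are assumptions rather than statements from the text: days on which the closing price is unchanged are labelled 0, and the split is taken chronologically (first 847 samples for training). The function names are hypothetical, and aligning each label with the features of the preceding day for day-ahead prediction is left to the caller.

```python
import numpy as np

def movement_labels(close):
    """1 if today's close is higher than yesterday's, else 0.

    The first day has no previous close, so the result has len(close) - 1
    entries. Unchanged prices are labelled 0 (an assumption of this sketch).
    """
    close = np.asarray(close, dtype=float)
    return (close[1:] > close[:-1]).astype(int)

def chronological_split(X, y, n_train=847):
    """Use the first n_train samples for training and the rest for testing."""
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

With 1209 samples and `n_train=847`, the remaining 362 samples form the test set, matching the counts given above.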
Dataset description
Classifiers and activation functions
In the third phase of the model, 15 separate SLFNs are created with six input layer neurons and one output layer neuron for predicting the day-ahead stock index price movement. Each network is trained using ELM. Keeping the architecture and the training approach of the networks fixed, the models are varied by using different activation functions in the hidden layer neurons. The activation function used in the hidden layer of an ELM plays a vital role and strongly influences the performance of the model. Hence in this study, fifteen commonly used activation functions as explained in Section 3.2 have been implemented on the hidden layer neurons, resulting in fifteen distinct ELMs. Furthermore, the stock index price movement of each day is empirically predicted using these 15 ELMs. The number of hidden layer neurons used in each ELM model during the training phase is kept fixed at 4, which was found from prior simulations. The weights and biases linking the input layer and hidden layer neurons are assigned randomly where the values lie between
In the fourth phase, the performance of the fifteen ELMs is observed over the sixteen evaluation measures described in Section 3.3. The measures used for evaluating the classification performance are categorized as type 1 and type 2. The type 1 set includes those measures for which a higher value indicates better classification performance, whereas the type 2 set includes those for which a lower value implies better classification performance. In this analysis, a combination of these two types of performance measures is taken into consideration. Here, the type 1 performance measures are accuracy, precision, recall, FM, TP, TN, GM, specificity, NPV, YI, BCR, HTER and JACC, and the type 2 performance measures are FPR, FNR and MK. Further, based on the number of calculation steps needed to derive them from the confusion matrix of the network output, all 16 measures are categorized into three groups. Group 1 consists of TP, TN, TPR (recall) and TNR (specificity), which can be obtained directly from the confusion matrix and are therefore considered the initial classification performance measures. Using these four basic values, various other performance measures can be calculated. Group 2 consists of YI, BCR, HTER, GM, FPR and FNR, which can be obtained from the group 1 measures. Group 3 consists of ACC, JACC, NPV, PPV (precision), FM and MK, which can be calculated in two steps from the original output. The criteria are grouped so that different weights can be assigned to them during the ranking of the ELMs.
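As an illustration of how the sixteen measures follow from a confusion matrix, the sketch below computes them from the four cell counts. The formulas are the common textbook definitions and are an assumption here; the exact definitions in Section 3.3 may differ. The function name is hypothetical, and the code assumes no denominator is zero.

```python
import math


def classification_measures(tp, tn, fp, fn):
    """Compute the sixteen measures from confusion-matrix counts.
    Assumption: standard textbook formulas; all denominators are non-zero."""
    tpr = tp / (tp + fn)   # recall / sensitivity
    tnr = tn / (tn + fp)   # specificity
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)
    return {
        # Group 1: read directly from the confusion matrix
        "TP": tp, "TN": tn, "TPR": tpr, "TNR": tnr,
        # Group 2: derived from group 1
        "YI": tpr + tnr - 1,
        "BCR": (tpr + tnr) / 2,
        "HTER": (fpr + fnr) / 2,
        "GM": math.sqrt(tpr * tnr),
        "FPR": fpr, "FNR": fnr,
        # Group 3: two steps from the original output
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "JACC": tp / (tp + fp + fn),
        "NPV": npv, "PPV": ppv,
        "FM": 2 * ppv * tpr / (ppv + tpr),
        "MK": ppv + npv - 1,
    }


# Toy counts, purely illustrative.
measures = classification_measures(tp=40, tn=30, fp=10, fn=20)
```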
Finally, in the last phase, the ELMs are ranked using the TOPSIS approach. The weights used in Eq. (3.3.17) to find the weighted standardized decision matrix are normally assigned at the user’s convenience. In this study, however, the weights are assigned according to group precedence: 40% of the total weight is given to group 1 and 30% each to group 2 and group 3. The performance measures within a group share that group’s total weight equally, so the sum of all weights equals 1. Applying Eqs (3.3.17) to (40) over the derived WM with this weighting scheme, the relative closeness is obtained for each ELM and is used to rank them. The models are ranked separately over the training and testing sets in order to reach an appropriate conclusion.
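The group-based weighting and the TOPSIS ranking can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes the common vector (Euclidean) normalization of the decision matrix, which may not match the standardization used in Eq. (3.3.17), and all names are hypothetical. The group memberships and the type 2 (lower-is-better) set are taken from the description above.

```python
import math

GROUPS = {
    1: ["TP", "TN", "TPR", "TNR"],
    2: ["YI", "BCR", "HTER", "GM", "FPR", "FNR"],
    3: ["ACC", "JACC", "NPV", "PPV", "FM", "MK"],
}
GROUP_WEIGHT = {1: 0.40, 2: 0.30, 3: 0.30}
TYPE_2 = {"FPR", "FNR", "MK"}  # lower-is-better criteria, as listed in the text


def criterion_weights():
    """Share each group's weight equally among its measures."""
    return {name: GROUP_WEIGHT[g] / len(names)
            for g, names in GROUPS.items() for name in names}


def topsis(matrix, weights, benefit):
    """Return the relative closeness of each alternative (row).
    Assumption: vector normalization; no all-zero column, rows not all identical."""
    n_crit = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness


def ranks(closeness):
    """Rank 1 goes to the alternative with the highest relative closeness."""
    order = sorted(range(len(closeness)), key=lambda i: -closeness[i])
    result = [0] * len(closeness)
    for position, i in enumerate(order):
        result[i] = position + 1
    return result


weights = criterion_weights()  # 0.10 per group 1 measure, 0.05 per group 2 or 3 measure

# Tiny two-criterion example: one benefit criterion, one cost criterion.
toy_closeness = topsis([[0.9, 0.1], [0.5, 0.5]], [0.5, 0.5], [True, False])
toy_ranks = ranks(toy_closeness)
```

For the full problem, each row of the matrix would hold the sixteen measure values of one ELM, with `benefit[j]` set to `False` for the measures in `TYPE_2`.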
This section presents the day-ahead stock index price movement prediction for three benchmark stock indices, namely BSE SENSEX, NIFTY 50 and S&P 500, obtained using fifteen ELM models, along with their outcomes during the training and testing phases and a detailed analysis. The main purpose of this analysis is to evaluate the various activation functions used in the hidden layer neurons of the ELM models for predicting forthcoming stock index price movements. To examine these models, the historical price values of the three benchmark stock indices are collected across 10
Result analysis over training samples
As this is a supervised learning method, all 15 ELM models are first trained using the training sample set, in which the 6 normalized technical indicator values are given as input and the respective class labels as output to the classifiers. A total of 15 ELM classifier models with distinct activation functions in their hidden layer neurons are thus trained for predicting future stock index movements. The training performance of these classifier models is averaged over 20 runs and evaluated using 16 performance evaluation criteria. The ELM classifier models and their respective activation functions are listed in Table 2. Tables 3 to 5 show the classification results over the training samples for BSE SENSEX, NIFTY 50 and S&P 500 respectively.
Training classification result of BSE SENSEX
Training classification result of NIFTY50
Training classification result of S&P500
From Table 3 it is observed that, for the BSE SENSEX dataset, ELM11 produces the best ACC, PREC, TN, GM, SPEC, FPR, YI, MK, BCR and JACC values, whereas ELM2 produces the best FM value, ELM5 the best HTER value and ELM8 the best REC, TP, NPV and FNR values. Similarly, Table 4 shows that for the NIFTY 50 dataset ELM11 produces the best ACC, PREC, TN, GM, SPEC, FPR, YI, MK and BCR values, whereas ELM2 produces the best REC, FM and TP values, ELM5 the best HTER value and ELM9 the best NPV, FNR and JACC values. From the outcomes for the S&P 500 dataset in Table 5, it is observed that ELM15 produces the best ACC, FM and GM values, whereas ELM2 produces the best REC, TP, SPEC, NPV, FPR, FNR, YI, MK, BCR and JACC values, ELM1 the best HTER value and ELM12 the best PREC and TN values. After analysing the training performance of each ELM model across the 16 performance evaluation criteria, it is found that no ELM model gives the best result for all performance measures. Although a few ELMs perform better on a few criteria, they fall short on the remaining ones. This calls for a more reliable decision that considers multiple criteria together. As several alternative ELM models are evaluated with respect to several criteria, TOPSIS, an MCDM approach, is further implemented to analyse and rank all 15 alternatives with respect to the 16 criteria. Accordingly, Tables 3 to 5 are each considered as a decision matrix of size 15
TOPSIS ranking of ELM models with respect to their training classification result
Top five ELM model with respect to their training classification result
From Table 7, it is found that ElliotSig has rank 1 for BSE SENSEX and NIFTY 50, and ISRU has rank 1 for S&P 500. Further analysis shows that ELU, ArcSinH and LReLU are the three common performers in the top 5 ranked list for all three datasets. From these observations, it is still uncertain which activation functions are comparatively more suitable for the benchmark datasets. To observe the capability of the ELMs in terms of generalization and handling unknown data, the validity and reliability of these models are further tested on the testing samples.
In real-world classification, classifiers have to deal with unknown data samples, and the testing performance of a classifier indicates its generalization capability. Hence, in the next step of the analysis, the considered ELMs are applied to 362 testing samples, each consisting of 6 columns holding the normalized values of the 6 technical indicators. The testing classification performance of these 15 ELM models with respect to the 16 performance evaluation criteria is shown in Tables 8 to 10 for BSE SENSEX, NIFTY 50 and S&P 500 respectively.
Testing classification result of BSE SENSEX
Testing classification result of NIFTY 50
Testing classification result of S&P 500
From Table 8, it is observed that, for the BSE SENSEX dataset, ELM1 produces the best PREC, TN, SPEC and FPR values, ELM4 the best REC, TP, NPV, FNR and HTER values, ELM9 the best ACC, FM, GM, YI, BCR and JACC values, and ELM14 the best MK value. Similarly, Table 9 shows that for the NIFTY 50 dataset ELM1 produces the best PREC, TN, SPEC and FPR values, ELM3 the best REC, TP, FNR and HTER values, ELM5 the best NPV value, and ELM14 the best ACC, FM, GM, YI, MK, BCR and JACC values. From the outcomes for the S&P 500 dataset in Table 10, it is observed that ELM2 produces the best TN, SPEC, FPR and HTER values, ELM5 the best REC, TP, NPV and FNR values, ELM9 the best PREC, TN, SPEC and FPR values, and ELM15 the best ACC, FM, GM, YI, BCR and JACC values. Over the testing samples, too, no ELM model gives the best result for all performance measures. Hence, treating this as an MCDM problem, the TOPSIS ranking of all ELMs over the test samples is obtained and shown in Table 11.
After the TOPSIS ranking, it is found that ELMs 9, 12, 14, 6 and 11 are the top 5 performers for BSE SENSEX during testing. Similarly, ELMs 14, 12, 9, 11 and 6 for NIFTY 50 and ELMs 15, 4, 12, 13 and 1 for S&P 500 are found to be the 5 best performing ELM models during testing. Table 12 lists the top 5 ELM performers for all three datasets over the test samples. From Table 12 it is observed that ELM 12, which uses ArcSinH as the activation function in the hidden layer neurons, is the common performer among all three datasets. Apart from ArcSinH, ELM 9 (LReLU), ELM 11 (ElliotSig) and ELM 6 (Sinusoid) are common to the BSE SENSEX and NIFTY 50 datasets. Comparing the training and testing rankings in Tables 7 and 12 respectively, it is observed that ELM 12 with the ArcSinH activation function is among the top performers on both training and testing samples. Besides it, ELM 9 with the LReLU activation function also performs well on both training and testing samples for the BSE SENSEX and NIFTY 50 datasets.
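The overlap between the top-five lists quoted above can be checked with a simple set intersection. The sketch below uses the ELM indices exactly as given in the text; the variable and function names are illustrative.

```python
# Top-5 ELM indices over the testing samples, as quoted in the text.
top5_testing = {
    "BSE SENSEX": [9, 12, 14, 6, 11],
    "NIFTY 50": [14, 12, 9, 11, 6],
    "S&P 500": [15, 4, 12, 13, 1],
}


def common_models(top_lists, datasets):
    """Return the ELM indices present in the top-5 list of every named dataset."""
    sets = [set(top_lists[name]) for name in datasets]
    return set.intersection(*sets)


common_all = common_models(top5_testing, list(top5_testing))
common_bse_nifty = common_models(top5_testing, ["BSE SENSEX", "NIFTY 50"])

print(sorted(common_all))        # [12]
print(sorted(common_bse_nifty))  # [6, 9, 11, 12, 14]
```

With the lists as quoted, ELM 12 is the only model shared by all three datasets, and ELM 14 also appears in both the BSE SENSEX and NIFTY 50 lists.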
TOPSIS ranking of ELM models with respect to their testing classification result
Top five ELM models with respect to their testing classification result
After observing the training and testing classification results, it is evident that no single ELM performs best over all the criteria. Over the training samples, ELU, ArcSinH and LReLU are the only three activation functions that appear among the top 5 performing models according to the TOPSIS ranking across all three datasets. In testing, however, ArcSinH is the only one that is common to the top 5 performing models of all three datasets. Although LReLU is common to the BSE SENSEX and NIFTY 50 datasets in testing, it is not present for S&P 500. ArcSinH is therefore the only activation function that shows promising results and is common to both training and testing classification. From these observations it is inferred that ArcSinH is better suited to stock index movement prediction for the BSE SENSEX, NIFTY 50 and S&P 500 datasets than the other activation functions for the considered time period. The above empirical analysis also shows that selecting the activation function for an ELM model is a crucial task: ELMs with different activation functions yield varying classification results. In addition, evaluating the classification results in terms of multiple criteria gives a better understanding of the performance of the classifier.
Conclusion
Understanding and accurately predicting stock index price movements can help stock traders in their financial decision making. As stock market data is diverse, volatile and dynamic in nature, predicting the future movement of a stock index is a challenging task. ELM, being a fast learning algorithm, shows very promising results in stock index price movement prediction. However, the classification result of an ELM model is heavily affected by the number of neurons in the hidden layer and by the activation function used in the hidden layer. As determining the best activation function for an ELM model is important for achieving better classification results, three benchmark stock indices, i.e. BSE SENSEX, S&P 500 and NIFTY 50, are empirically analysed in our experiment. Fifteen activation functions have been used to train distinct ELM models for predicting the upcoming movements of the stock indices. The performance of each model has been assessed with respect to sixteen performance evaluation criteria. The main reason for evaluating the classification performance with respect to more than one criterion is to obtain a better performing classifier and a more robust classification. The classification models are further ranked using an MCDM framework, i.e. TOPSIS, to determine the best performing models on the training and testing datasets. The empirical analysis shows that certain activation functions such as ArcSinH and LReLU perform better while training the classifiers for all three benchmark datasets, whereas only ArcSinH emerges as a common top performer over the testing samples of all three datasets. As classifiers are dataset dependent and the ELM classifier depends on the activation function used in the hidden layer, it can be concluded that classification using ELM is activation function dependent. Choosing a suitable activation function for an ELM classifier is therefore a crucial task, and this experiment describes one of the many ways to complete this task in a robust and accurate way. The experiment also shows that the generalization performance of ELMs depends on both the dataset and the activation function used.
The proposed work can be summarized into the following key points:
An ELM model has been considered for stock index price movement prediction.
To understand the response of the network, fifteen activation functions are applied to the hidden layer neurons of distinct models.
Input weights and biases are assigned randomly and output weights are calculated analytically.
Empirical analysis of the model has been done over three distinct stock index datasets, namely BSE SENSEX, S&P 500 and NIFTY 50, collected from 10
An MCDM framework, TOPSIS, has been implemented to rank the training and testing performance of the fifteen classification models with respect to sixteen criteria.
The activation functions that perform better for each individual stock index dataset are reported separately for training and testing samples.
Although the proposed TOPSIS-ELM framework is helpful in identifying suitable activation functions when designing an ELM based classifier, a number of challenges remain to be addressed in more detail. Further work on ELMs for stock index price movement prediction can be done by varying the number of hidden layer neurons, exploring other learning approaches used in ELMs, optimizing the structure of the ELMs using effective optimization algorithms, and so on. The generalization ability of ELM can also be validated over more authentic datasets.
