Abstract
Index tracking is one of the most popular passive strategies in portfolio management. However, due to practical constraints, a full replication is difficult to obtain. Many mathematical models have failed to generate good results for partially replicated portfolios, but in recent years a data-driven approach has begun to take shape. This paper proposes three heuristic methods for both the selection and the allocation of the most informative stocks in an index tracking problem, namely XGBoost, Random Forest and LASSO with stability selection. Alongside these, recent deep autoencoders have also been tested. All selected algorithms outperformed the benchmarks in terms of tracking error. The empirical study was conducted on one of the largest financial indices, by number of components, in each of three countries: the Russell 1000 for the USA, the FTSE 350 for the UK, and the Nikkei 225 for Japan.
Introduction
Investors have two main investment strategies for generating returns: active and passive portfolio management. In an active strategy, the portfolio manager tries to pick the best-performing stocks using experience and judgement. A passive investment, on the other hand, is based on the assumption that “you can’t beat the market” and that, in the long run, an active strategy will lead to diminishing returns due to transaction costs and other market frictions. One of the most popular passive strategies is to track a stock index that is considered a proxy for market behavior.
Index tracking describes the process of attempting to replicate the performance of a financial index over time. A straightforward approach is to buy each stock constituting the market index in proportion to its market weight. Although in theory this leads to a perfect (or full) replication, it is rarely used in practice because of several practical constraints. First, a large amount of capital is needed to buy all the stocks in the given proportions, which makes this approach impossible for most individual investors. Second, index weights change over time. For example, the composition of the S&P 500 changed 60 times in the year 2000 (Beasley et al., 2003). Moreover, the exact composition of some indices is not known, or investment in some constituents is not possible, as Andriosopoulos and Nomikos (2014) noted for the Spot Energy index. Therefore, a partial replication is preferred.
In order to make a performant partial replication, two steps are required: selecting the best k stocks and computing a weight for each of them. Classical approaches are based on the mean-variance portfolio framework proposed by Markowitz (1952). Roll (1992) defines the variance as the tracking error relative to a benchmark. Rohweder (1998) enhances this approach by including transaction costs in the objective function, but minimizing the objective function results in a quadratic program, as in Jansen and Dijk (2002). Other authors, such as Corielli and Marcellino (2006), investigate a factor-based approach to index tracking using the Arbitrage Pricing Theory framework, and in a more recent paper Strub and Baumann (2018) introduce practical constraints in a mixed-integer linear programming formulation in order to obtain superior tracking performance.
Another line of research has been explored thanks to the increasing performance of Machine Learning algorithms. Beasley et al. (2003) showed that an evolutionary heuristic method for the index tracking problem is more desirable than a full replication. Oh et al. (2005) used a genetic algorithm to optimize the weights of stocks selected through fundamental analysis, and Chiam et al. (2013) used the same procedure to minimize both tracking error and transaction costs. Another method with great potential was proposed by Heaton et al. (2017), who used deep neural networks for the index tracking problem. The most recent papers build on this approach, namely Ouyang et al. (2019) and Kim and Kim (2019), which use deep neural networks for both selection and dynamic allocation.
This paper extends the research on heuristic methods by proposing new algorithms for stock selection, namely tree-based algorithms (Random Forest and XGBoost) and Lasso with stability selection. To the best of the author's knowledge, none of these methods has been used before in an index tracking problem. However, they have been successfully used in other feature selection problems in finance, as in Liu et al. (2015), Nobre and Neves (2019), or Sohrabi and Movaghari (2020). As benchmarks, the autoencoders proposed by Kim and Kim (2019) and a strategy based on the largest stocks have been considered. On the allocation side, three schemes have been tested: the neural network sensitivity approach proposed by Ouyang et al. (2019), a simple OLS method, and a dummy equally weighted scheme for a robust comparison.
The paper is organized as follows. The next two sections briefly discuss the models used for stock selection and for allocation. Section 4 discusses the data and methodology, Section 5 presents the empirical performance of the proposed methodology, and Section 6 concludes.
Selection
This section briefly discusses the algorithms used to determine the most relevant stocks in an index. As Kim and Kim (2019) noted, a more direct way is to choose the largest components of the index. However, there are cases in which this approach cannot be applied, either because it is not possible to invest in all components or because the exact structure of the index is not known. Therefore, a more general procedure is required.
The task of finding the most relevant stocks can be seen as a classical feature selection problem. There are three general classes of feature selection algorithms (Miao & Niu, 2016): filter methods, which apply a statistical measure (such as the Chi-squared test, information gain or a correlation coefficient score) to assign a score to each feature; wrapper methods, which treat feature selection as a search problem in which different combinations are prepared, evaluated, and compared to one another, recursive feature elimination being one example; and embedded methods, which determine which features contribute most to the accuracy of the model while the model is being built.
This paper focuses on the last class of feature selection algorithms. Among the embedded methods, Random Forest, XGBoost and a Lasso model combined with a wrapper method (stability selection) have been chosen. For comparison, a deep autoencoder algorithm has also been considered.
Random forest
The Random Forest model introduced by Breiman (2001) is one of the most efficient algorithms for both classification and regression tasks. The model is based on the bagging principle (Breiman, 1996), an aggregation scheme that generates multiple data sets by bootstrapping from the original input set, makes a prediction for each set using a CART model, and aggregates the predictions into a single result.
The characteristics of this algorithm make it suitable for the selection task. One of the biggest issues in detecting relevant features is distinguishing between variables that only seem important because of random fluctuations and variables that are weakly but genuinely relevant. In a Random Forest model, each variable has a chance of being included in the tree construction, so even weakly relevant features that are only marginally related to the decision attribute will be used.
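Following Genuer et al. (2010), the importance of a variable $X_j$ is defined as follows. For each tree $t$ of the forest, we consider the associated out-of-bag sample ($OOB_t$), on which we compute $errOOB_t$, the error (mean squared error in this case) of the single tree $t$ on this $OOB_t$ sample. By randomly permuting the values of $X_j$ in $OOB_t$, a perturbed sample denoted by $\widetilde{OOB}_t^{\,j}$ is obtained, together with $err\widetilde{OOB}_t^{\,j}$, the error of tree $t$ on the perturbed sample. The importance of $X_j$ is then the average increase in error over all $ntree$ trees of the forest: $VI(X_j) = \frac{1}{ntree} \sum_{t} \left( err\widetilde{OOB}_t^{\,j} - errOOB_t \right)$.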
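The permutation idea behind this importance measure can be illustrated with a minimal sketch. It is not the paper's implementation: a least-squares predictor stands in for the forest, a single evaluation sample stands in for the per-tree out-of-bag samples, and the function name `permutation_importance` and the toy data are hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one column of X is randomly permuted."""
    rng = np.random.default_rng(seed)
    base_err = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            perm_err = np.mean((y - predict(X_perm)) ** 2)
            increases.append(perm_err - base_err)
        importances[j] = np.mean(increases)
    return importances

# Toy data (assumption): column 0 matters a lot, column 1 a little, column 2 not at all
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.1 * X[:, 1]

# Stand-in predictor: a least-squares fit instead of a random forest
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda A: A @ coef

vi = permutation_importance(predict, X, y)
top_k = [int(j) for j in np.argsort(vi)[::-1][:2]]
print(vi, top_k)
```

In a forest, the same comparison would be made tree by tree on each tree's own out-of-bag sample and then averaged over the trees.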
The XGBoost model developed by Chen and Guestrin (2016) is an efficient and scalable implementation of the Gradient Boosting Machine. Its popularity in Machine Learning competitions is due to numerous optimizations, such as (i) the addition of a regularization term that improves generalization, (ii) multithreaded parallel computing, which increases speed by more than 10 times according to Chen and Guestrin (2016), and (iii) efficient handling of missing data.
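To train the model, the following regularized objective function must be minimized: $\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{m} \Omega(f_m)$, with $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$, where $l$ is a differentiable loss function, $f_m$ is the $m$-th tree, $T$ is the number of leaves in a tree and $w$ is its vector of leaf weights (Chen and Guestrin, 2016).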
Stock selection is made using the same approach as for Random Forest: features are ranked by their importance in the model and the best k stocks are selected.
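The ranking step shared by the two tree-based models can be sketched in a few lines. The helper `select_top_k`, the tickers and the importance values below are hypothetical and serve only as an illustration.

```python
def select_top_k(importances, k):
    """Return the k names with the largest importance scores, best first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical importance scores; tickers and values are made up
scores = {"AAA": 0.41, "BBB": 0.07, "CCC": 0.29, "DDD": 0.15, "EEE": 0.08}
selected = select_top_k(scores, k=3)
print(selected)
```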
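The Lasso (least absolute shrinkage and selection operator) model, popularized by Tibshirani (1996), minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. This constraint shrinks the coefficients towards zero, and the coefficients that remain non-zero correspond to the most informative variables. In its equivalent penalized form, the Lasso estimator is $\hat{\beta}^{\lambda} = \arg\min_{\beta} \left( \lVert Y - X\beta \rVert_2^2 + \lambda \sum_{k=1}^{p} |\beta_k| \right)$, where $\lambda \geq 0$ is the regularization parameter.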
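How the penalty drives some coefficients exactly to zero can be seen in a minimal coordinate-descent sketch. Several things here are assumptions rather than the paper's setup: the objective is scaled as $\frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$ (a common software convention that differs from the unscaled form only in the meaning of $\lambda$), the data are synthetic, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta

# Synthetic data (assumption): only features 0 and 3 drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [int(j) for j in np.flatnonzero(beta)]
print(beta, selected)
```

The non-zero entries of `beta` play the role of the selected stocks; a larger `lam` would leave fewer of them.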
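Stability selection was proposed by Meinshausen and Bühlmann (2010) as a technique designed to improve existing selection methods. They consider a p-dimensional vector $\beta$ in which $s < p$ components are non-zero. Denote the set of non-zero components by $S = \{k : \beta_k \ne 0\}$. The goal of structure estimation is to recover the set $S$ from noisy observations. For every value of the regularization parameter $\lambda \in \Lambda \subseteq \mathbb{R}^{+}$, an estimate $\hat{S}^{\lambda} \subseteq \{1, \dots, p\}$ of this structure is obtained, and the question is whether some $\lambda$ exists for which $\hat{S}^{\lambda}$ coincides with $S$ with high probability.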
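In order to do that, a subsample of size n/2 is randomly drawn without replacement and the Lasso algorithm is applied to it. This procedure is repeated many times, and each iteration yields a structure estimate $\hat{S}^{\lambda}(I)$ for the subsample $I$. The variables that are selected with high frequency across the subsamples form the stable set.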
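The subsampling loop can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: a single value of $\lambda$ is used instead of a grid $\Lambda$, the selection threshold of 0.6 and the number of subsamples are arbitrary choices, the Lasso is a small coordinate-descent routine with the $\frac{1}{2n}$ scaling, and the data are synthetic.

```python
import numpy as np

def lasso_support(X, y, lam, n_iter=100):
    """Boolean mask of non-zero Lasso coefficients (cyclic coordinate descent)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j]) / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_scale[j]
    return beta != 0

def stability_selection(X, y, lam, n_subsamples=50, threshold=0.6, seed=0):
    """Selection frequency of each feature over random half-samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Subsample of size n/2, drawn without replacement
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_support(X[idx], y[idx], lam)
    frequency = counts / n_subsamples
    stable = [int(j) for j in np.flatnonzero(frequency >= threshold)]
    return frequency, stable

# Synthetic data (assumption): only features 0 and 3 are truly relevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

frequency, stable = stability_selection(X, y, lam=0.1)
print(frequency, stable)
```

Ranking the features by `frequency` rather than thresholding it would give a top-k selection comparable to the tree-based rankings.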
Autoencoders are among the most widely used dimensionality reduction techniques. They have been successfully applied to the index tracking problem by Heaton et al. (2016), Ouyang et al. (2019) and Kim and Kim (2019). The goal is to create a deep network architecture that reconstructs the input vector in the output layer as accurately as possible. In other words, for any given input $x_i$, we try to obtain, through a series of nonlinear transformations, an output $\hat{x}_i$ that is as close as possible to $x_i$.
Autoencoders usually have a symmetric architecture. The middle layer consists of one or more neurons. Ouyang et al. (2019) argue that a structure with one neuron in the middle shares certain similarities with the Capital Asset Pricing Model (CAPM), because the value of that neuron can be interpreted as a market portfolio. For this reason, in this paper the center layer has exactly one node.
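Selection of the best stocks is made using the methodology of Heaton et al. (2016). The most informative stocks are the ones that have the highest similarity in the autoencoder. This is measured using the 2-norm difference between each stock's input series and its autoencoded reconstruction, a smaller difference indicating a higher similarity.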
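The ranking step can be illustrated without training a deep network. In the sketch below, a rank-1 linear reconstruction (obtained from a singular value decomposition) stands in for the trained autoencoder with a one-node bottleneck, and similarity is assumed to be the Euclidean distance between each series and its reconstruction. Both choices, the synthetic data and the function name `rank_by_reconstruction` are assumptions made for illustration only.

```python
import numpy as np

def rank_by_reconstruction(series):
    """Rank columns by distance between each series and its rank-1 reconstruction."""
    centered = series - series.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # One-dimensional bottleneck: keep only the first singular component
    reconstructed = (U[:, :1] * s[:1]) @ Vt[:1, :]
    distance = np.linalg.norm(centered - reconstructed, axis=0)
    order = [int(j) for j in np.argsort(distance)]  # most similar first
    return order, distance

# Synthetic data (assumption): three series follow a common factor, one is pure noise
rng = np.random.default_rng(0)
factor = rng.normal(size=300)
series = np.column_stack([
    factor + 0.1 * rng.normal(size=300),
    factor + 0.1 * rng.normal(size=300),
    factor + 0.1 * rng.normal(size=300),
    rng.normal(size=300),
])

order, distance = rank_by_reconstruction(series)
print(order, distance)
```

With a trained nonlinear autoencoder, only the line that computes `reconstructed` would change; the ranking by distance would stay the same.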
This section discusses stock allocation schemes that can be used after the stock selection stage. The allocation procedure consists of finding weights for the selected stocks, in which we can invest, such that the difference between the return of the partially replicated index and the true index return is as small as possible at the end of the testing period.
As stated in the Introduction, many scholars use optimization functions based on the Markowitz framework to determine the weights (e.g., Roll (1992), Rohweder (1998) or Jansen and Dijk (2002)). Other approaches use an equally weighted scheme (Heaton et al., 2016) or schemes based on the correlation of returns (Chen and Kwon (2012), Kim and Kim (2019)). Among heuristic methods, one of the most widely used approaches is to determine the weights through an evolutionary algorithm applied to an objective function (Beasley et al. (2003), Oh et al. (2005), Chiam et al. (2013)). However, if the objective function has a formulation similar to the mean squared error, the weights computed through an evolutionary algorithm will tend to the weights computed through the ordinary least squares method, which makes the evolutionary approach superfluous.
Despite their success in regression analysis, other Machine Learning algorithms, such as tree-based models or deep learning models, are not suited for this task because they cannot express the output as the linear combination of features required in trading. However, Ouyang et al. (2019) propose using a sensitivity analysis to determine the weight of each stock. They argue that a neural network model can efficiently extract a representation of each stock's prices and model the nonlinear interactions between them.
In this paper two different allocation schemes have been considered, namely a linear approach based on the OLS method and a nonlinear approach based on the sensitivity of a deep neural network.
Ordinary least squares (OLS)
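OLS is a statistical method used to determine the unknown parameters of a linear regression. The algorithm chooses the parameters according to the least squares principle: minimizing the sum of the squared differences between the observed values of the dependent variable and those predicted by the linear function. The mathematical formulation can be written as: $\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^{\top}X)^{-1}X^{\top}y$, where $y$ is the vector of observed values of the dependent variable and $X$ is the matrix of explanatory variables.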
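A minimal sketch of OLS allocation follows, assuming the index values are regressed on the values of the selected stocks without an intercept. Whether the paper includes an intercept is not stated here, and the synthetic prices and the helper name `ols_weights` are hypothetical.

```python
import numpy as np

def ols_weights(stock_prices, index_prices):
    """Least-squares weights w minimizing ||index_prices - stock_prices @ w||^2."""
    weights, *_ = np.linalg.lstsq(stock_prices, index_prices, rcond=None)
    return weights

# Synthetic example (assumption): the index is an exact combination of three stocks
rng = np.random.default_rng(1)
stock_prices = rng.uniform(10.0, 100.0, size=(250, 3))
true_weights = np.array([0.5, 0.3, 0.2])
index_prices = stock_prices @ true_weights

weights = ols_weights(stock_prices, index_prices)
print(weights)
```

Nothing in this formulation prevents negative weights, which is why the approach implicitly assumes that short positions are allowed.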
A deep neural network is a set of interconnected processing nodes whose functioning is modeled on biological neural networks; the underlying neuron model was first introduced by McCulloch and Pitts (1943).
Any neural network model has three different types of layers: an input layer holding the explanatory variables (stock prices in this case), one or more hidden layers, and an output layer (the index price). Each layer contains many neurons. The functionality of an individual neuron is simple and direct. Each neuron sums all the signals sent to it, adds a bias term and performs a nonlinear transformation through an activation function. The activation (transfer) function is a monotonically increasing function, most often a logistic function, a hyperbolic tangent or a ReLU. The signal transformed by a neuron is forwarded, multiplied by a certain weight, to a neuron in another layer, and the process is repeated. This process is called the feedforward step. The processing power of the network is determined by the weights assigned to each neuron, which are computed using the backpropagation method (see Rumelhart et al. (1986) for details).
For a simple network (one hidden layer) case, the feedforward step can be written as:
The weight matrices $W_1$ and $W_2$ reflect the relationship between different units and different layers in the neural network, but they cannot reflect the relationship between the input and the output. To overcome this issue, Ouyang et al. (2019) propose a sensitivity analysis to determine the direct influence of the input on the output. This sensitivity can be interpreted as the weights of the stocks in a portfolio.
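The idea of reading input sensitivities off a network can be sketched for the one-hidden-layer case. The sketch assumes a hyperbolic tangent hidden layer and a linear output, $\hat{y} = W_2 \tanh(W_1 x + b_1) + b_2$, so that the gradient of the output with respect to the input is $W_2\,\mathrm{diag}(1 - h^2)\,W_1$. The random, untrained weights, the averaging over samples and the function names are illustrative assumptions and not necessarily the exact formulation of Ouyang et al. (2019).

```python
import numpy as np

def feedforward(x, W1, b1, W2, b2):
    """One hidden tanh layer followed by a linear output layer."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h

def sensitivity(x, W1, b1, W2, b2):
    """Gradient of the output with respect to the input, evaluated at x."""
    _, h = feedforward(x, W1, b1, W2, b2)
    # Chain rule: W2 * tanh'(.) * W1, with tanh'(z) = 1 - tanh(z)^2
    return (W2 * (1.0 - h ** 2)) @ W1

# Random, untrained network (assumption): 4 inputs (stocks), 3 hidden units, 1 output
rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 3
W1 = rng.normal(size=(n_hidden, n_inputs))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(1, n_hidden))
b2 = rng.normal(size=1)

# Average the per-sample sensitivities to obtain one value per stock
X = rng.normal(size=(20, n_inputs))
avg_sensitivity = np.mean([sensitivity(x, W1, b1, W2, b2)[0] for x in X], axis=0)
print(avg_sensitivity)
```

Any rescaling of `avg_sensitivity` into portfolio weights is left out, since the normalization used in the paper is not recoverable from the text.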
For a general deep network case, the equation can be written as:
In order to highlight the practicality of heuristic approaches, three of the biggest indices in terms of number of components have been considered: the Russell 1000 for the USA, the FTSE 350 for the UK, and the Nikkei 225 for Japan. They represent over 90% of total market capitalization in each country. The data used in this analysis are the daily prices of each index and of their corresponding stocks traded between 01.01.2010 and 31.12.2020. To ensure the robustness of the methodology, 6 rolling windows have been considered, each with 5 years of training data and 1 year of out-of-sample data. The source of the datasets is Thomson Reuters Tick History, and the components of the indices are the ones at the end of each training period. Stocks that were not traded in the training period have been eliminated. Where there are non-trading days for some stocks, the last available price has been used for the missing days. This also covers the cases in which a company was delisted in the out-of-sample dataset because of a merger, an acquisition or a bankruptcy.
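The window layout and the treatment of missing prices described above can be sketched as follows. The helpers `rolling_windows` and `forward_fill` are hypothetical names, windows are expressed in whole calendar years, and missing prices are represented by `None`; all of these are illustrative assumptions.

```python
def rolling_windows(first_year, last_year, train_years=5, test_years=1):
    """List of (train, test) year ranges, shifted forward by the test length."""
    windows = []
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train = (start, start + train_years - 1)
        test = (start + train_years, start + train_years + test_years - 1)
        windows.append({"train": train, "test": test})
        start += test_years
    return windows

def forward_fill(prices):
    """Replace missing prices (None) with the last available price."""
    filled, last = [], None
    for price in prices:
        if price is not None:
            last = price
        filled.append(last)
    return filled

windows = rolling_windows(2010, 2020)
for window in windows:
    print(window)

# Toy price series with non-trading days marked as None
filled = forward_fill([10.0, None, None, 11.5, None])
print(filled)
```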
The aim of the empirical analysis is to track an index using fewer constituents. Therefore, k stocks have been selected, where k is 10, 25 and 50, respectively. Using only those stocks, the cumulative return of the index is replicated so as to obtain the smallest possible tracking error. The input for the Machine Learning models is the daily prices of each stock, and the output is the daily prices of the corresponding index.
Section 2 briefly discusses the algorithms and the methodology used in selection. In addition to those algorithms, the largest k stocks at the end of each training set have been considered as a benchmark. Random Forest was estimated using 100 decision trees with a maximum depth of 20 levels. The minimum number of samples required to split an internal node is 2, and the minimum number of samples required in a leaf is 1. A node is split only if the split induces a decrease in impurity greater than 0. The criterion for measuring the quality of a split is the mean squared error. XGBoost uses a gbtree booster with a learning rate of 0.3. The maximum depth of a tree is 20, as in Random Forest, and the minimum loss reduction required for a node partition is 0. The method used to sample the training instances is uniform. The L2 regularization term has a value of 1, and the objective is to minimize the sum of squared errors. The hyperparameter tuning strategy for these algorithms is random search.
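For readers who wish to reproduce this configuration, the settings above can be collected into parameter dictionaries. The paper does not name the software used; the keys below follow the parameter names of scikit-learn's `RandomForestRegressor` and of XGBoost, which is an assumption made here for illustration.

```python
# Random Forest settings, using scikit-learn's parameter names (assumed mapping)
random_forest_params = {
    "n_estimators": 100,            # number of decision trees
    "max_depth": 20,                # maximum depth of each tree
    "min_samples_split": 2,         # minimum samples to split an internal node
    "min_samples_leaf": 1,          # minimum samples in a leaf
    "min_impurity_decrease": 0.0,   # required impurity decrease for a split
    "criterion": "squared_error",   # named "mse" in older scikit-learn releases
}

# XGBoost settings, using XGBoost's parameter names (assumed mapping)
xgboost_params = {
    "booster": "gbtree",
    "learning_rate": 0.3,
    "max_depth": 20,
    "gamma": 0,                     # minimum loss reduction for a node partition
    "sampling_method": "uniform",
    "reg_lambda": 1,                # L2 regularization term
    "objective": "reg:squarederror",
}

print(random_forest_params)
print(xgboost_params)
```

If those libraries are installed, the dictionaries should be usable as keyword arguments, for example `RandomForestRegressor(**random_forest_params)`.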
In the case of autoencoders, the same architecture as in Kim and Kim (2019) has been used, namely a deep autoencoder with 3 hidden layers in which the first and second layers have 1/4 and 1/16 as many neurons as there are input stocks. The middle layer has only one neuron, as in Ouyang et al. (2019). The activation function is the hyperbolic tangent, $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.
Section 3 briefly discusses the algorithms and the methodology used in stock allocation. The main disadvantage of most heuristic models is that they are nonlinear and therefore unsuited to computing weights in a portfolio management problem. Kim and Kim (2019) tested 3 allocation schemes, 2 based on the correlation of the stock returns with the index and one based on solving a quadratic programming problem. However, they found that those approaches are not better than a simple equally weighted scheme. Therefore, in this paper I have chosen to compare the neural network sensitivity approach proposed by Ouyang et al. (2019) (which has not previously been compared with other approaches) with a standard OLS approach and with an equally weighted scheme. The purpose of this analysis is not to find the best possible calibration of the parameters, but to show that even an arbitrary configuration can produce notable results. Each strategy can be further optimized based on the number of inputs or the rolling window.
As a measure of tracking error, the tracking error volatility of returns (TEV), introduced by Roll (1992) and used by Kim and Kim (2019), has been considered:
Here $T$ is the number of days in the out-of-sample data, $R$ is the daily return of the index, $k$ is the total number of selected stocks, $w^{(a)}$ are the weights computed under allocation scheme $(a)$, and $r$ is the daily return of a stock $s$. Another measure of performance used in this paper is the correlation between the partially replicated index and the market value of the index.
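The two performance measures can be sketched as follows. The sketch assumes that TEV is the sample standard deviation of the daily difference between the index return and the weighted return of the selected stocks. This is one common form of the measure, and the exact normalization used in the paper is not recoverable from the text. The function names and the synthetic returns are hypothetical.

```python
import numpy as np

def tracking_error_volatility(index_returns, stock_returns, weights):
    """Sample standard deviation of (index return - replicated return)."""
    replicated = stock_returns @ weights
    active = index_returns - replicated
    return float(np.std(active, ddof=1))

def tracking_correlation(index_values, replica_values):
    """Pearson correlation between the index and its partial replica."""
    return float(np.corrcoef(index_values, replica_values)[0, 1])

# Synthetic daily returns (assumption): 250 days, 3 selected stocks
rng = np.random.default_rng(0)
stock_returns = rng.normal(0.0, 0.01, size=(250, 3))
weights = np.array([0.5, 0.3, 0.2])
noise = rng.normal(0.0, 0.002, size=250)
index_returns = stock_returns @ weights + noise

tev = tracking_error_volatility(index_returns, stock_returns, weights)
corr = tracking_correlation(index_returns, stock_returns @ weights)
print(tev, corr)
```

A perfect replica gives a TEV of zero and a correlation of one; in this toy example the TEV equals the standard deviation of the added noise.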
Table 1 shows the tracking error, expressed as both TEV and correlation, for the three indices: FTSE 350, Russell 1000, and Nikkei 225. The tracking error has been computed for 5 different selection strategies, namely Autoencoders, Lasso with stability selection, Random Forest, XGBoost, and the Largest stocks in terms of market capitalization, each selection algorithm being combined with three different allocation schemes: Neural Network Sensitivity (NNS), ordinary least squares (OLS) and an equally weighted scheme as a benchmark. For robustness, three sets of stocks have been selected for each strategy, with 10, 25 and 50 stocks respectively. The results represent an average over 6 one-year rolling windows from 2015 to 2020. The individual results for each year can be found in the Appendix. Most of the results are in the same range, the error being, on average, only 0.4%–0.7%.
Average tracking error expressed in both tracking error volatility (TEV) and correlation for each index, selection strategy, allocation scheme and number of stocks, over the one-year rolling windows from 2015 to 2020. The best combinations for each index have been highlighted.
To highlight the performance of each algorithm, a more in-depth analysis has been conducted. Table 2 presents the average tracking error and correlation with respect to the selection strategy for each index. On average, in the analyzed period, all heuristic methods outperformed the benchmark based on the Largest companies in the index, with XGBoost having the smallest tracking error. Although they were successfully used by Kim and Kim (2019), Autoencoders performed worse than the other data-driven approaches. However, they seem to give better results when the number of inputs is larger, as in the case of the Russell 1000. Table 3 shows the average tracking error and correlation with respect to the allocation scheme for each index. In all cases, an allocation based on an ordinary least squares model is better than the dummy equally weighted scheme. The neural network sensitivity model proposed by Ouyang et al. (2019) does not outperform the benchmark.
Average tracking error (TEV) and correlation grouped by selection strategy for each index. The best selection strategy for each index has been highlighted.
Average tracking error (TEV) and correlation grouped by allocation scheme for each index. The best allocation scheme for each index has been highlighted.
Table 4 presents the average tracking error and correlation by the number of stocks chosen in the selection step for each index. There is a clear negative relationship between the number of stocks in the partially replicated index and the tracking error: a higher number of stocks yields smaller errors. This can be explained by the diversification of the portfolio, which reduces overall volatility. Table 5 shows the average errors with respect to the rolling window. Note that the tracking error varies by more than a factor of 2 from period to period, which calls for training on a larger training set in order to capture market volatility.
Average tracking error (TEV) and correlation grouped by number of stocks (k) selected for each replicated index. The best results for each index have been highlighted
Average tracking error (TEV) and correlation grouped by one year rolling window (starting from 2015) for each index
Figures 1–3 show the cumulative return curves for each index together with the partially replicated indices generated by the three allocation schemes (NNS, OLS and equal weights) for one of the best selection strategies in terms of lowest tracking error, namely Lasso with stability selection for FTSE 350 in 2018, XGBoost for Nikkei 225 in 2015 and Random Forest for Russell 1000 in 2016. The selected periods capture both sideways movement in the markets and episodes of very high volatility, with corrections of over 30% in one week. In all cases, the replicated index closely follows the true value of the index.

Cumulative result for FTSE 350 with Lasso selection on 50 stocks in 2019.

Cumulative result for Russell 1000 with Random Forest on 50 stocks in 2016.

Cumulative result for Nikkei 225 with XGBoost selection on 50 stocks in 2015.
The index tracking problem is of great practical importance in financial economics. This paper extends the latest methodologies in stock selection and allocation by proposing three new approaches for selection, namely XGBoost, Random Forest and Lasso with stability selection. Alongside these, autoencoders have also been used. For allocation, ordinary least squares has been compared with a neural network sensitivity approach and an equally weighted strategy. For robustness, three different indices have been considered: Russell 1000, FTSE 350, and Nikkei 225.
Empirical results suggest that the proposed selection strategies outperformed the considered benchmarks, namely the largest stocks in terms of market capitalization and the deep autoencoders that were successfully used in recent studies. Among them, XGBoost had the best performance. In the case of allocation schemes, ordinary least squares outperforms the dummy equally weighted allocation scheme and the neural network sensitivity approach. However, the differences are not significant. The number of stocks selected in the partially replicated index has a clearly negative relationship with the tracking error volatility, the error decreasing as the number of stocks increases. The error is not constant over time and can vary by more than a factor of 2 from period to period.
The main advantage of the data driven approach is that it can be used to recreate any index return from a given pool of financial assets. Moreover, the transaction costs are very low because it does not require a dynamic allocation. One of the biggest limitations of this study is the assumption that the market allows short-selling which is not always the case. Moreover, stocks with higher beta could negatively influence the newly generated index. More robust tests should be performed in order to confirm the performance of this methodology.
Appendix
Tracking error expressed in both tracking error volatility (TEV) and correlation for each index, selection strategy, allocation scheme and number of stocks for the 1st rolling window (Jan 2015–Dec 2015). The best combinations for each index have been highlighted.
k = 10
k = 25
k = 50
Selection
Allocation
TEV
Correlation
TEV
Correlation
TEV
Correlation
FTSE 350
0.52%
89.90%
0.42%
89.99%
0.73%
82.42%
2.31%
85.87%
0.76%
94.08%
0.33%
96.51%
0.57%
86.13%
0.51%
93.52%
0.38%
95.17%
0.61%
63.67%
0.52%
87.85%
0.40%
92.53%
2.09%
72.65%
0.49%
90.11%
0.37%
94.47%
1.19%
83.77%
0.44%
90.47%
0.39%
93.53%
0.46%
89.35%
0.41%
95.49%
0.37%
95.38%
0.47%
89.81%
0.44%
94.66%
0.39%
97.22%
0.52%
88.35%
0.45%
93.83%
0.44%
94.65%
0.46%
89.59%
0.54%
85.30%
0.54%
85.04%
0.51%
87.32%
0.47%
89.00%
0.43%
93.06%
0.53%
86.29%
0.53%
85.75%
0.55%
88.49%
0.54%
85.46%
0.42%
92.11%
0.40%
92.76%
0.54%
85.75%
0.38%
93.27%
0.42%
91.41%
Russell 1000
0.42%
95.68%
0.35%
97.03%
0.46%
94.24%
96.65%
98.67%
0.52%
93.42%
0.40%
0.27%
0.67%
86.96%
0.42%
94.98%
0.23%
98.59%
0.73%
81.45%
0.54%
91.83%
0.24%
98.40%
0.57%
90.47%
0.62%
92.33%
0.28%
97.89%
0.79%
87.56%
0.46%
94.27%
0.35%
96.73%
0.77%
71.42%
0.53%
92.52%
0.48%
93.71%
1.21%
83.39%
0.52%
92.53%
0.42%
95.16%
0.91%
73.60%
0.52%
93.02%
0.27%
97.94%
0.68%
86.21%
0.50%
94.57%
0.26%
98.25%
0.62%
90.92%
0.50%
93.98%
0.30%
97.66%
0.51%
92.99%
0.39%
95.96%
0.33%
97.33%
0.61%
91.81%
0.39%
96.10%
0.23%
98.58%
0.61%
90.91%
0.40%
96.00%
0.25%
98.46%
Nikkei 225
0.59%
80.49%
0.63%
78.84%
0.45%
88.86%
0.68%
78.19%
0.46%
89.28%
0.39%
91.78%
0.80%
73.02%
0.58%
83.71%
0.48%
88.69%
0.49%
88.87%
0.32%
95.40%
0.29%
95.51%
0.45%
90.57%
0.51%
91.53%
0.37%
92.71%
0.61%
83.18%
0.41%
92.49%
0.43%
91.26%
0.49%
92.08%
0.44%
94.33%
0.37%
97.65%
0.59%
87.79%
0.42%
92.76%
0.36%
97.31%
0.53%
88.29%
0.46%
95.52%
0.38%
96.88%
0.61%
84.55%
0.44%
89.35%
0.42%
90.67%
0.39%
93.02%
0.47%
90.00%
0.41%
92.54%
0.41%
94.16%
0.43%
95.92%
0.37%
96.63%
0.35%
93.88%
0.27%
96.13%
0.26%
96.38%
0.35%
94.26%
0.30%
95.69%
0.29%
96.25%
Tracking error expressed in both tracking error volatility (TEV) and correlation for each index, selection strategy, allocation scheme and number of stocks for the 2nd rolling window (Jan 2016–Dec 2016). The best combinations for each index have been highlighted.
k = 10
k = 25
k = 50
Selection
Allocation
TEV
Correlation
TEV
Correlation
TEV
Correlation
FTSE 350
1.56%
37.27%
0.60%
89.12%
1.02%
75.80%
0.64%
83.13%
0.61%
95.13%
0.31%
95.57%
0.92%
81.26%
0.46%
91.48%
0.38%
94.35%
0.74%
81.21%
0.47%
90.39%
0.49%
88.54%
2.86%
80.40%
0.57%
85.62%
0.44%
90.58%
0.83%
80.80%
0.48%
90.16%
0.45%
91.62%
0.90%
0.62%
93.75%
0.53%
93.06%
0.90%
92.52%
0.62%
93.06%
0.46%
0.92%
85.57%
0.69%
93.22%
0.57%
94.62%
1.02%
58.50%
0.46%
90.41%
0.87%
58.78%
0.68%
79.35%
0.53%
88.79%
0.45%
91.68%
0.93%
75.47%
0.80%
79.02%
0.54%
87.44%
0.94%
54.03%
0.40%
94.18%
0.58%
88.88%
71.02%
96.81%
0.85%
87.34%
0.51%
92.83%
0.37%
93.72%
Russell 1000
93.59%
0.58%
94.68%
0.55%
94.73%
0.60%
94.14%
95.52%
0.43%
96.87%
0.64%
93.89%
0.60%
0.71%
90.99%
0.53%
94.97%
0.50%
96.08%
0.75%
89.92%
0.54%
95.03%
0.46%
96.50%
0.73%
92.90%
0.60%
93.54%
0.49%
96.09%
0.93%
86.28%
0.79%
90.98%
0.52%
95.59%
0.96%
75.42%
0.65%
84.10%
0.63%
93.73%
1.52%
83.59%
1.15%
92.80%
0.71%
93.43%
0.71%
92.82%
0.54%
95.24%
0.45%
96.60%
0.68%
94.10%
0.59%
94.56%
0.49%
96.27%
0.69%
91.91%
0.55%
94.76%
0.46%
96.46%
0.73%
0.63%
94.52%
0.50%
96.04%
0.76%
93.79%
0.55%
95.24%
0.47%
96.41%
0.86%
92.25%
0.82%
92.21%
0.62%
94.83%
Nikkei 225
0.65%
69.02%
0.75%
70.43%
0.33%
92.98%
0.92%
66.76%
0.55%
86.32%
0.44%
91.06%
0.77%
74.71%
0.56%
84.88%
0.47%
89.03%
0.56%
80.25%
0.36%
94.20%
0.31%
94.06%
0.55%
83.88%
0.48%
91.20%
0.32%
92.99%
0.60%
86.18%
0.47%
93.44%
0.36%
92.67%
0.64%
89.16%
0.47%
0.38%
0.63%
0.49%
94.44%
0.39%
95.67%
0.67%
88.84%
0.47%
94.88%
0.37%
94.16%
0.94%
51.03%
0.45%
86.33%
0.32%
93.14%
0.45%
89.86%
0.37%
91.69%
0.29%
95.35%
0.48%
89.68%
0.38%
93.22%
0.35%
91.52%
0.73%
76.39%
0.37%
90.62%
0.33%
92.84%
90.94%
94.44%
95.52%
0.55%
83.04%
0.32%
94.28%
0.28%
95.96%
Tracking error expressed in both tracking error volatility (TEV) and correlation for each index, selection strategy, allocation scheme and number of stocks for the 3rd rolling window (Jan 2017–Dec 2017). The best combinations for each index have been highlighted.
k = 10
k = 25
k = 50
Selection
Allocation
TEV
Correlation
TEV
Correlation
TEV
Correlation
FTSE 350
0.95%
22.26%
0.41%
74.26%
0.60%
60.34%
0.36%
71.44%
0.34%
87.56%
0.30%
86.64%
0.86%
51.52%
0.47%
84.15%
0.35%
86.68%
0.38%
74.67%
0.32%
81.13%
0.35%
80.65%
0.46%
69.40%
0.38%
74.41%
0.29%
86.19%
0.53%
65.00%
0.39%
74.52%
0.30%
83.08%
0.51%
0.35%
0.24%
91.18%
0.53%
73.99%
0.38%
90.04%
0.24%
91.39%
0.57%
85.23%
0.41%
87.24%
0.27%
87.37%
0.79%
37.14%
0.44%
58.52%
0.29%
81.62%
0.39%
73.05%
0.36%
78.77%
0.25%
87.72%
0.82%
42.06%
0.42%
58.72%
0.35%
72.03%
81.03%
0.34%
82.24%
0.27%
86.90%
0.29%
84.35%
87.75%
0.59%
61.40%
0.37%
79.58%
0.24%
89.40%
Russell 1000
0.33%
91.56%
0.22%
95.71%
0.20%
96.37%
0.38%
88.31%
0.30%
94.20%
0.24%
94.44%
0.52%
76.78%
0.31%
91.20%
0.21%
95.97%
0.41%
84.87%
0.32%
90.15%
0.20%
96.68%
0.54%
75.64%
0.35%
90.19%
0.23%
95.25%
0.57%
77.40%
0.43%
83.91%
0.31%
91.32%
0.57%
76.77%
0.41%
84.78%
0.29%
92.19%
0.69%
71.21%
0.52%
77.74%
0.39%
86.30%
0.43%
85.19%
0.34%
91.41%
0.28%
92.81%
0.38%
87.51%
0.32%
92.23%
0.18%
96.94%
0.49%
81.77%
0.27%
93.68%
0.24%
95.12%
0.41%
85.69%
0.27%
93.50%
0.25%
94.75%
0.51%
78.75%
0.33%
91.57%
0.26%
94.83%
0.55%
76.04%
0.39%
85.60%
0.22%
95.72%
Nikkei 225
1.08%
21.59%
0.34%
76.41%
0.57%
58.92%
0.78%
50.91%
0.57%
64.27%
0.46%
70.76%
0.60%
60.81%
0.44%
72.01%
0.36%
78.48%
0.65%
41.45%
0.30%
84.41%
0.26%
86.25%
0.69%
45.30%
0.48%
72.11%
0.28%
84.77%
0.64%
66.27%
0.45%
83.72%
0.34%
82.33%
0.54%
72.41%
0.38%
0.34%
88.01%
0.65%
65.82%
0.51%
73.63%
0.33%
88.22%
0.64%
67.80%
0.45%
82.03%
0.35%
88.14%
0.67%
53.00%
0.60%
49.89%
0.39%
72.13%
0.42%
73.97%
0.42%
75.28%
0.37%
74.33%
0.35%
79.81%
0.32%
83.84%
0.22%
89.61%
0.67%
35.39%
0.34%
73.55%
0.25%
86.36%
0.39%
68.39%
0.33%
79.56%
0.29%
81.48%
84.26%
Tracking error expressed in both tracking error volatility (TEV) and correlation for each index, selection strategy, allocation scheme and number of stocks for the 4th rolling window (Jan 2018–Dec 2018). The best combinations for each index have been highlighted.
k = 10
k = 25
k = 50
Selection
Allocation
TEV
Correlation
TEV
Correlation
TEV
Correlation
FTSE 350
0.88%
53.23%
0.66%
61.19%
0.63%
76.71%
1.01%
54.81%
0.64%
76.79%
0.41%
85.95%
0.51%
75.57%
0.48%
78.61%
0.45%
81.49%
0.65%
65.22%
0.52%
77.23%
0.51%
79.87%
0.58%
71.15%
0.48%
79.60%
0.45%
83.80%
0.70%
70.28%
0.58%
76.80%
0.47%
82.33%
0.49%
81.32%
88.11%
0.52%
78.25%
0.43%
0.55%
82.75%
0.53%
84.08%
0.48%
87.04%
0.66%
71.03%
0.66%
71.07%
0.58%
67.29%
0.62%
74.48%
0.51%
78.98%
0.51%
80.07%
1.04%
57.82%
0.78%
67.99%
0.50%
76.12%
0.53%
77.83%
0.45%
84.62%
0.44%
84.58%
0.51%
79.64%
0.45%
85.14%
0.42%
87.76%
0.79%
70.52%
0.60%
77.14%
0.48%
82.47%
Russell 1000
0.51%
91.71%
0.48%
92.77%
0.31%
96.50%
0.44%
93.86%
0.33%
96.03%
0.27%
97.39%
0.61%
85.82%
0.48%
91.38%
0.29%
97.00%
0.67%
84.53%
0.50%
90.95%
0.29%
96.94%
0.51%
89.97%
0.53%
89.49%
0.27%
97.28%
0.58%
88.14%
0.55%
89.25%
0.39%
94.45%
0.70%
83.71%
0.49%
84.64%
0.49%
91.61%
0.52%
90.28%
0.73%
91.01%
0.43%
93.13%
0.67%
84.94%
0.39%
94.79%
0.35%
95.75%
0.61%
87.63%
0.36%
95.20%
0.35%
95.66%
0.51%
92.79%
0.37%
95.08%
0.35%
95.97%
0.67%
83.88%
0.41%
93.76%
0.33%
96.05%
0.55%
90.45%
0.44%
93.24%
0.34%
95.85%
0.52%
91.18%
0.36%
95.32%
0.30%
96.85%
Nikkei 225
0.63%
84.39%
0.62%
75.26%
0.47%
89.87%
0.88%
68.62%
0.50%
91.29%
0.54%
91.08%
0.73%
82.20%
0.49%
88.63%
0.40%
92.68%
0.82%
44.61%
92.98%
0.80%
84.64%
0.47%
93.71%
0.35%
95.60%
0.80%
87.73%
0.55%
92.85%
0.51%
94.71%
0.75%
80.49%
0.53%
95.50%
0.39%
96.70%
0.81%
84.02%
0.58%
95.14%
0.41%
95.92%
0.76%
89.38%
0.56%
0.48%
95.47%
0.77%
83.12%
0.74%
80.43%
0.47%
91.34%
0.61%
86.44%
0.55%
84.45%
0.35%
95.75%
0.56%
91.01%
0.66%
90.43%
0.37%
94.97%
1.17%
66.24%
0.59%
85.62%
0.35%
94.39%
0.66%
83.38%
0.41%
92.93%
0.66%
89.42%
0.54%
92.70%
0.35%
95.49%
Appendix
Tracking error, expressed as both tracking error volatility (TEV) and correlation, for each index, selection strategy, allocation scheme and number of stocks (k) in the 5th rolling window (Jan 2019 – Dec 2019). The best combinations for each index have been highlighted.
Columns: Selection, Allocation, then TEV and Correlation for each of k = 10, k = 25 and k = 50.
FTSE 350
0.86%
66.04%
0.52%
75.04%
0.97%
52.39%
0.74%
64.57%
0.72%
76.66%
0.42%
85.55%
0.56%
76.32%
0.49%
83.61%
0.37%
85.45%
0.56%
67.80%
0.49%
80.86%
0.45%
81.45%
0.51%
78.99%
0.45%
80.87%
0.43%
85.22%
0.61%
68.89%
0.45%
80.12%
0.41%
83.44%
0.53%
78.17%
0.40%
90.59%
88.89%
0.60%
74.72%
0.43%
88.91%
0.41%
0.60%
82.95%
0.41%
90.06%
0.40%
88.75%
0.55%
82.39%
0.53%
75.56%
0.52%
73.70%
0.37%
87.89%
1.02%
65.90%
0.64%
73.07%
0.46%
76.55%
0.48%
82.87%
0.51%
79.82%
0.42%
84.49%
0.37%
87.46%
0.37%
86.86%
0.32%
89.90%
0.84%
72.54%
0.60%
79.88%
0.45%
85.52%
Russell 1000
0.54%
81.33%
0.44%
89.30%
0.30%
94.43%
0.31%
93.92%
0.42%
89.79%
0.18%
97.81%
0.63%
78.36%
0.41%
88.97%
0.28%
94.65%
0.64%
81.35%
0.43%
88.18%
0.30%
94.24%
0.53%
83.71%
0.49%
83.39%
0.29%
94.29%
0.69%
79.19%
0.50%
85.07%
0.42%
88.60%
0.90%
74.74%
0.62%
80.03%
0.45%
87.43%
0.63%
77.05%
0.53%
81.65%
0.39%
89.79%
0.52%
81.86%
0.38%
90.84%
0.40%
89.51%
0.55%
82.79%
0.47%
87.19%
0.42%
91.48%
0.46%
89.43%
0.42%
91.32%
0.45%
92.01%
0.52%
84.13%
0.38%
90.86%
0.26%
95.53%
0.49%
87.38%
0.31%
93.78%
0.25%
96.23%
0.61%
87.98%
0.43%
88.56%
0.27%
95.17%
Nikkei 225
0.71%
66.37%
0.55%
75.97%
0.61%
65.01%
0.92%
62.30%
0.45%
88.70%
0.36%
83.12%
0.75%
68.87%
0.48%
81.27%
0.47%
88.94%
0.86%
71.60%
0.37%
92.98%
0.26%
95.26%
0.77%
79.21%
0.41%
92.90%
0.31%
94.99%
0.73%
81.49%
0.48%
90.22%
0.40%
91.69%
0.54%
86.46%
0.44%
90.61%
0.34%
95.12%
0.69%
74.63%
0.42%
92.14%
0.36%
95.07%
0.64%
80.85%
0.48%
89.22%
0.38%
94.57%
0.74%
75.04%
0.48%
86.98%
0.48%
84.02%
0.49%
90.13%
0.49%
86.67%
0.36%
92.17%
0.48%
0.42%
93.70%
0.36%
92.55%
0.52%
80.76%
0.33%
93.00%
0.31%
92.93%
89.08%
0.59%
87.79%
0.46%
91.56%
0.30%
95.82%
Appendix
Tracking error, expressed as both tracking error volatility (TEV) and correlation, for each index, selection strategy, allocation scheme and number of stocks (k) in the 6th rolling window (Jan 2020 – Dec 2020). The best combinations for each index have been highlighted.
Columns: Selection, Allocation, then TEV and Correlation for each of k = 10, k = 25 and k = 50.
FTSE 350
1.14%
81.01%
0.75%
91.07%
0.70%
75.01%
1.30%
82.13%
93.33%
0.73%
91.73%
0.97%
88.98%
0.69%
93.87%
0.64%
94.24%
0.84%
89.07%
0.74%
91.54%
0.67%
92.94%
0.92%
88.28%
0.73%
79.24%
0.65%
94.29%
1.07%
87.65%
0.70%
92.17%
0.64%
93.67%
1.43%
72.86%
0.78%
95.11%
0.66%
95.37%
1.34%
81.08%
0.78%
91.10%
0.65%
93.43%
1.45%
73.38%
0.80%
93.84%
0.71%
94.60%
0.91%
88.67%
0.90%
89.35%
0.79%
90.64%
0.82%
90.80%
0.66%
93.30%
1.09%
90.17%
0.93%
90.55%
0.83%
91.30%
0.98%
87.08%
0.98%
91.65%
0.65%
94.82%
0.70%
93.60%
0.68%
95.60%
1.60%
85.18%
0.89%
94.35%
0.65%
Russell 1000
0.65%
92.33%
0.38%
97.22%
0.52%
94.98%
92.24%
0.38%
97.21%
0.29%
98.39%
0.52%
95.43%
0.82%
87.95%
0.68%
91.12%
0.40%
96.96%
0.87%
86.69%
0.74%
89.08%
0.48%
95.66%
0.80%
90.61%
0.76%
88.68%
0.39%
97.15%
1.00%
81.81%
0.78%
88.26%
0.56%
94.75%
1.31%
71.04%
0.93%
87.15%
0.73%
94.37%
1.02%
81.77%
0.78%
88.25%
0.51%
95.07%
0.76%
88.64%
0.64%
92.07%
0.66%
92.12%
0.88%
84.57%
0.78%
88.05%
0.63%
92.56%
0.65%
91.91%
0.50%
95.81%
0.41%
97.25%
0.88%
86.23%
0.60%
93.03%
0.40%
97.01%
0.66%
92.44%
0.61%
92.90%
0.45%
96.14%
0.67%
0.49%
95.57%
0.39%
97.22%
Nikkei 225
94.45%
0.57%
96.55%
0.90%
93.28%
1.02%
88.98%
0.69%
95.15%
0.71%
96.02%
0.79%
93.23%
0.62%
95.85%
0.52%
97.09%
0.82%
93.40%
97.58%
0.53%
97.62%
0.93%
91.45%
0.51%
97.49%
0.43%
98.35%
0.87%
93.71%
0.49%
97.51%
0.64%
95.54%
0.95%
96.43%
0.69%
97.83%
0.65%
96.77%
0.93%
0.71%
0.61%
94.35%
0.97%
95.24%
0.71%
97.52%
0.68%
95.46%
0.80%
91.82%
0.75%
95.73%
0.58%
97.05%
1.04%
91.08%
0.75%
96.38%
1.83%
72.99%
0.74%
97.15%
1.31%
88.33%
1.03%
77.31%
0.87%
94.71%
0.77%
87.03%
1.05%
90.36%
0.86%
95.13%
0.50%
97.99%
0.82%
94.44%
0.71%
96.38%
0.45%
97.87%
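The two metrics reported in these appendix tables can be illustrated with a minimal sketch. It assumes that TEV is the sample standard deviation of the difference between portfolio and index returns over one evaluation window, and that correlation is the Pearson correlation between the two return series. These are the common definitions, but the exact formulas, return frequency and any annualization used in the study are not shown in this section, so the function name `tracking_metrics` and the synthetic data below are purely illustrative.

```python
import numpy as np


def tracking_metrics(portfolio_returns, index_returns):
    """Return (TEV, correlation) for two aligned return series.

    Assumption: TEV is the sample standard deviation of the return
    difference (portfolio minus index); correlation is Pearson's r.
    """
    p = np.asarray(portfolio_returns, dtype=float)
    b = np.asarray(index_returns, dtype=float)
    if p.ndim != 1 or p.shape != b.shape or p.size < 2:
        raise ValueError("expected two aligned 1-D series with >= 2 observations")

    active = p - b                     # period-by-period return difference
    tev = active.std(ddof=1)           # sample standard deviation
    corr = np.corrcoef(p, b)[0, 1]     # off-diagonal entry of the 2x2 matrix
    return tev, corr


# Synthetic returns for illustration only; these are not the paper's data.
rng = np.random.default_rng(0)
index_returns = rng.normal(0.0005, 0.01, size=250)
portfolio_returns = index_returns + rng.normal(0.0, 0.003, size=250)

tev, corr = tracking_metrics(portfolio_returns, index_returns)
print(f"TEV = {tev:.2%}, correlation = {corr:.2%}")
```

A lower TEV and a higher correlation both indicate closer tracking, which is how the rows of the tables would be compared under these assumptions.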
