In technical analysis-based algorithmic trading strategies, we use historical price patterns to predict future prices and trade accordingly. This is analogous to machine learning, where we use existing data patterns to classify or predict new patterns. This paper uses this analogy and explains trading strategies as a machine learning classification problem. We derive simple approximations that relate the performance of trading strategies to machine learning statistics. We introduce a new performance measure, the Return Efficiency Index. This index provides a link between trading strategy return statistics and classification accuracy. It has a simple geometric interpretation, similar to the ROC index in machine learning, and can be used to compare strategies in terms of their ability to capture the potential returns possible with the underlying assets. We illustrate the proposed approach with a detailed comparison of daily trading strategies designed by analogy to nearest-neighbor classification, widely used in machine learning, and to some strategies based on deep learning.
In the fundamental analysis of stock pricing, we focus on financial statements and economic statistics to predict the intrinsic value. By contrast, in technical analysis, we examine historical market data to find patterns that predict future asset price movements and design trading strategies accordingly. This is very similar to what we do in machine learning (Bishop, 2016; Hastle, 2018) when we use existing data to find patterns that help us classify or predict unseen data patterns. Not surprisingly, in recent years there has been an increased interest in using machine learning techniques to design new trading algorithms or improve the existing ones.
In these models, the effectiveness of using machine learning is evaluated using the standard machine learning metrics (like accuracy, confusion matrix, etc.) and the effectiveness of trading is evaluated using financial metrics (like return and volatility). We propose a unified approach to evaluate strategies using machine learning. We derive (approximate) expressions relating the entries of the confusion matrix to return and volatility. We introduce a new index, which we call the return efficiency index. This index is analogous to the area under the curve (AUC) measure in machine learning but relates both the accuracy of the underlying classifier and the return characteristics of the assets. This new metric has a simple interpretation: it is the proportion of the possible return range between the worst and the best trading strategies that is captured by the strategy. It allows a direct comparison of the performance of different trading strategies with possibly different underlying assets as well as a comparison to a random flip. It has a simple geometrical interpretation as the cosine similarity between the metrics that reflect the accuracy of the strategy as a classifier and the underlying return profiles of the traded securities. To illustrate, we consider two daily strategies built by analogy to nearest-neighbor classification: Growth-Value (trading the S&P Growth vs. S&P Value) and Market-Cash (trading S&P-500 vs. cash) strategies.
This paper is organized as follows: In Section “Machine-learning interpretation of trading strategies”, we describe trading strategies in terms of machine learning and list some important statistics from an ML viewpoint. In Section “Analysis of strategy performance by machine learning metrics” we relate returns to label counts. We compare simple and logarithmic returns in terms of accuracy and volatility and justify our choice of logarithmic returns. In Section “Analysis of volatility and Sharpe ratios by corresponding machine learning metrics”, we present a framework to express volatility and Sharpe ratios in terms of the confusion matrix and its corresponding ratios. In Section “The “return efficiency index”” we introduce and discuss the return efficiency index to compare trading strategies. We illustrate this with a detailed example in Section “A detailed example”. In Section “Example: k-NN “winners” and “losers” trading strategies” we introduce “Winner” and “Loser” Growth-Value and Market-Cash strategies based on analogies to \(k\)-NN. In Section “Results and discussion” we present a detailed comparison and show that “loser” strategies outperform “winner” strategies, with the best choice being the Growth-Value strategies. In Section “Choosing the number of nearest neighbors and transaction costs” we address the question of choosing the best value of \(k\) and identify that value. We summarize our key findings and conclude in Section “Concluding remarks”.
Machine-learning interpretation of trading strategies
In a typical trading strategy, for any subsequent time period \(t\), we choose between investing in asset \(A\) or asset \(B\) according to some rule for that strategy. In our discussion, we assume that our periods are days, and our rule tries to predict whether the daily return for \(A\) is higher or lower than the corresponding daily return for \(B\).
In the ideal case, we would like to invest in \(A\) for the day if the daily return for \(A\) for that day is higher than the daily return for \(B\). Similarly, we would like to invest in \(B\) for the day if the daily return for \(B\) for that day is higher than the daily return for \(A\). Accordingly, we can assign the so-called “True” (or “Ground Truth”) labels to each trading day as follows:
a true label “+” for the day means we would like to be invested in \(A\) for that day
a true label “−” for the day means we would like to be invested in \(B\) for that day.
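The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `true_labels`, the sample returns, and the choice to label ties as “+” (in line with the non-negative-return rule used for the Market-Cash example) are assumptions made here.

```python
def true_labels(returns_a, returns_b):
    """Label a day '+' when A does at least as well as B, otherwise '-'.

    Ties are labeled '+' here; this tie rule is an assumption of the sketch.
    """
    if len(returns_a) != len(returns_b):
        raise ValueError("both series must cover the same days")
    return ["+" if ra >= rb else "-" for ra, rb in zip(returns_a, returns_b)]


# Hypothetical daily returns (in %), for illustration only.
market = [1.0, -2.0, 0.5, -0.3]
cash = [0.0] * len(market)  # cash earns nothing in this sketch
labels = true_labels(market, cash)
```

Because the rule needs the realized returns of both assets, it can only be evaluated after the trading day has ended.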
However, in the context of trading algorithms, this assignment of Ground Truth labels can only be done for past historical data. We illustrate this with the following two examples.
Security \(A\) is the S&P-500 index and security \(B\) is cash. We call such strategies Market-Cash strategies. They can be implemented by trading the SPY Exchange-Traded Fund. The benchmark for this class of strategies is the “Buy-and-Hold” strategy: passive investing in the S&P-500 index.
In these strategies, we would like to be invested on positive return days and be in cash on negative return days. We try to predict “+” (invest) or “−” (cash) decisions for day \(t\) based on historical returns of the S&P-500 index.
Suppose that for 10 consecutive days the daily returns for the S&P-500 index are known. We can then assign daily true labels as shown in Table 1 below:
Assignment of true labels for market-cash (MC) strategies.
Day
1
3
2
4
2
2
1
3
1
1
:
Positive true labels (“+”) are assigned to the six days with non-negative daily returns. On these six days, we would like to be invested in the S&P-500 index. By contrast, negative true labels (“−”) are assigned to the remaining four days with negative daily returns. We want to remain in a cash position on these four days.
Security \(A\) is the S&P Growth index and security \(B\) is the S&P Value index. We call such strategies Growth-Value strategies. They can be implemented by trading the S&P Growth and S&P Value Exchange-Traded Funds SPYG and SPYV respectively. In these strategies, we are always fully invested. The benchmark for this class of strategies is the “Buy-and-Hold” strategy: passive investing in the S&P Growth or S&P Value indices.
We try to predict “+” (invest in S&P Growth index) or “−” (invest in S&P Value index) decisions for day \(t\) based on comparative returns of the Value and Growth indices. For example, we believe that Growth stocks have higher average returns than Value stocks.
Suppose that for 10 consecutive days the daily returns for the Growth and Value indices are known. We can then assign daily True labels as shown in Table 2 below:
Assignment of true labels for growth-value (GV) strategies.
1
2
3
1
3
2
2
5
1
1
0
1
2
2
2
1
1
2
3
1
Positive true labels (“+”) are assigned to the six days when the Growth index outperforms the Value index. On these days we would like to be invested in the S&P Growth index. By contrast, negative true labels (“−”) are assigned to the remaining four days when the Growth index underperforms the Value index. On these days, we would like to invest in the S&P Value index.
Schematic Representation: A trading strategy decides on investing for the day by predicting a label according to some rule. For every day, we have a predicted label, and our trading strategy is implemented as follows:
label “+” for the day means we invest in \(A\) for that day
label “−” for the day means we invest in \(B\) for that day
If and represent days with predicted “+” and “−” labels then schematically, our trading strategy can be represented as follows
Trading according to true labels represents an ideal strategy in which the predicted labels coincide with the true labels: we never make a mistake in predicting labels. However, in practice, that is not possible. Therefore, our strategy would make mistakes in predicting “+” and/or “−” labels. As a result, its performance will be lower than that of an ideal strategy. The higher the accuracy of our strategy in predicting true labels, the better its performance.
A similar situation arises in many problems in machine learning, such as supervised learning. In these problems, we are given true “+” and “−” labels for some datasets and we need to design rules for classifying the unseen data points. In the language of machine learning, we are given true labels for past trading days and are asked to predict future labels. However, our predictions of labels are not perfect, and on some days, we predict labels incorrectly. The resulting statistics of classification are summarized in the so-called 4-element “confusion” matrix. Therefore, as in machine learning, we can split all days into four disjoint groups:
\(TP\) (true positive): on these days, the true label was “+” and we correctly invested in security \(A\)
\(FP\) (false positive): on these days, the true label was “−” but we incorrectly predicted it as “+”. As a result, we invested in security \(A\) instead of \(B\). Such misclassification of true “−” labels is called a Type I error
\(TN\) (true negative): on these days, the true label was “−” and we correctly invested in security \(B\)
\(FN\) (false negative): on these days, the true label was “+” but we incorrectly predicted it as “−”. As a result, we invested in security \(B\) instead of \(A\). Such misclassification of true “+” labels is called a Type II error
Define \(P = TP + FN\) and \(N = TN + FP\). Then \(P\) and \(N\) represent the number of days with true “+” and “−” labels respectively. From the above entries in the confusion matrix, we can compute the following ratios:
True Pos. Rate (recall or sensitivity): \(TPR = TP/(TP + FN)\)
True Negative Rate (specificity): \(TNR = TN/(TN + FP)\)
Accuracy: \((TP + TN)/(TP + FP + TN + FN)\)
Pos. Pred. Value (precision): \(PPV = TP/(TP + FP)\)
Neg. Predicted Value: \(NPV = TN/(TN + FN)\)
Prevalence: \((TP + FN)/(TP + FP + TN + FN)\)
The entries of the confusion matrix, the rates, and the number of positive and negative labels are related as follows:
Intuitively, the performance of the trading strategy would depend on the number of False Positive days (Type I error), False Negative days (Type II error), and the difference in returns of securities and on these days. We will quantify this more precisely in the next section.
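As a minimal sketch of how these statistics could be computed from a history of true and predicted labels, the following Python snippet counts the four confusion-matrix entries and derives the ratios listed above. It uses the standard convention (a false positive is a true “−” day predicted as “+”); the function names and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count the four confusion-matrix entries for '+'/'-' labels."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return {
        "TP": pairs[("+", "+")],  # true '+', predicted '+'
        "FN": pairs[("+", "-")],  # true '+', predicted '-'
        "TN": pairs[("-", "-")],  # true '-', predicted '-'
        "FP": pairs[("-", "+")],  # true '-', predicted '+'
    }


def classification_ratios(c):
    """Standard ratios; assumes both classes and both predictions occur."""
    positives = c["TP"] + c["FN"]
    negatives = c["TN"] + c["FP"]
    total = positives + negatives
    return {
        "recall": c["TP"] / positives,
        "specificity": c["TN"] / negatives,
        "accuracy": (c["TP"] + c["TN"]) / total,
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "npv": c["TN"] / (c["TN"] + c["FN"]),
        "prevalence": positives / total,
    }


# Hypothetical 10-day history, for illustration only.
true_days = list("++++++----")
pred_days = list("++++--+---")
counts = confusion_counts(true_days, pred_days)
ratios = classification_ratios(counts)
```

In this example six days carry a true “+” label and four a true “−” label, and the strategy misclassifies two “+” days and one “−” day.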
Analysis of strategy performance by machine learning metrics
In machine learning, the metrics are derived from counting the number of correctly identified labels. On the other hand, the total returns are computed by multiplication. Therefore, to analyze the performance of trading strategies in terms of machine learning metrics, we need to consider approximations to total returns that are additive (Hudson and Gregoriou, 2015).
Approximating total returns
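We start by considering two commonly used methods to express total returns in terms of suitable averages. Recall that if we have \(n\) days with daily returns \(r_1, \ldots, r_n\), then the total return \(R\) over these days is
\[ R = \prod_{i=1}^{n} (1 + r_i) - 1. \tag{2} \]
We consider the following two commonly used methods to approximately relate the individual returns \(r_i\) to the total return \(R\):
simple averaging:
\[ \bar{r} = \frac{1}{n} \sum_{i=1}^{n} r_i, \qquad R \approx n\bar{r}. \tag{3} \]
This method approximates the total return for \(n\) days as \(n\bar{r}\). This method is simple but it ignores the compounding effect.
average logarithmic return:
\[ \bar{\ell} = \frac{1}{n} \sum_{i=1}^{n} \ln(1 + r_i). \tag{4} \]
If \(S_0, S_1, \ldots, S_n\) denote prices on days \(0, 1, \ldots, n\), then for this return we have
\[ \bar{\ell} = \frac{1}{n} \ln \frac{S_n}{S_0}. \]
This method approximates the total return for \(n\) days as \(n\bar{\ell}\). Logarithmic returns capture the compounding effect by calculating the logarithm of the ratio of the final price \(S_n\) to the initial price \(S_0\).
To compare and interpret the two methods, we proceed as follows: from the definition of the total compounded return over \(n\) days in equation (2), we have
\[ (1 + R)^{1/n} = \Bigl( \prod_{i=1}^{n} (1 + r_i) \Bigr)^{1/n}. \tag{5} \]
The term on the right in equation (5) is the geometric mean of the terms \(1 + r_i\). We can transform this geometric mean into an arithmetic mean by taking logarithms to obtain
\[ \frac{1}{n} \ln(1 + R) = \frac{1}{n} \sum_{i=1}^{n} \ln(1 + r_i) = \bar{\ell}. \tag{6} \]
Using the first-order Taylor expansion applied to the function \(\ln(1 + x)\) at the point \(x = 0\) gives us \(\ln(1 + x) \approx x\). Applying this to \(\ln(1 + R)\) we obtain
\[ R \approx \sum_{i=1}^{n} \ln(1 + r_i) = n\bar{\ell}. \tag{7} \]
From equations (3) and (7), we can approximate the total return by adding simple or logarithmic daily returns. The accuracy would depend on the values of \(r_i\) and the number of days \(n\).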
All of these are approximations to true return. To illustrate the differences between the methods, let us consider two simple examples:
Consider an investment of $100 that increases 20% (\(r_1 = 0.20\)) on the first day from $100 to $120 and then decreases 16.67% (\(r_2 = -0.1667\)) on the second day back to the original amount of $100. The total return is therefore \(R = 0\). For the two approximation methods, we have
simple averaging method: . This would result in the final amount of
average log return method: The average log return is calculated as
which is the same as the actual total return. In this example, simple averaging over-estimates the total return, whereas the average log return captures the compounding effect better.
Suppose that for 10 days the % returns were . There are days with positive returns , , , , , and . There are days with negative returns , , , and .
The total return for this strategy is
A $100 investment would grow to $103.82 after 10 days. Let us compare the two methods for this example:
simple averaging method: we compute a simple average of all returns over 10-day period
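Using this method, the average daily return is 0.40%, so the approximate total return over 10 days is \(10 \times 0.40\% = 4.0\%\) and a $100 investment would grow to $104.00 after 10 days.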
average logarithmic return:
Using this method, a $100 investment would grow to $103.75 after 10 days.
When we compare the two methods, we see that simple averaging gives the worse result because it ignores the effect of compounding. The logarithmic return method gives the value closest to the exact result.
The above example illustrates a well-known result: for long-term time-series data, simple averaging can generate very inaccurate results as it ignores the effect of compounding.
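The comparison of the exact, simple, and logarithmic totals can be reproduced with a short Python sketch. The function names are hypothetical; the logarithmic total is taken as the sum of the daily log returns, and the constant daily rate of 0.10% over 250 days matches one of the settings compared in Table 3.

```python
import math


def exact_total(returns):
    """Compounded total return: product of (1 + r_i) minus 1."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def simple_total(returns):
    """Simple-averaging approximation: n times the mean, i.e. the sum."""
    return sum(returns)


def log_total(returns):
    """Logarithmic approximation: sum of daily log returns."""
    return sum(math.log1p(r) for r in returns)


# Two-day example: +20% followed by -16.67% (exactly -1/6).
two_day = [0.20, -1.0 / 6.0]
two_day_exact = exact_total(two_day)    # zero up to rounding
two_day_simple = simple_total(two_day)  # about 0.0333
two_day_log = log_total(two_day)        # zero up to rounding

# Constant 0.10% daily return over 250 days.
constant = [0.001] * 250
constant_pct = [
    round(100 * f(constant), 2) for f in (exact_total, simple_total, log_total)
]
```

For the constant-rate case the three percentages round to 28.39, 25.00 and 24.99, so with a steady positive drift both approximations fall short of the compounded result.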
To see the relationship between the two returns, we proceed as follows (Hudson and Gregoriou, 2015). Recall the Taylor series expansion of \(\ln(1 + x)\) around \(x = 0\):
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \tag{8} \]
Therefore, for the mean logarithmic return in equation (4) we have
\[ \frac{1}{n} \sum_{i=1}^{n} \ln(1 + r_i) = \frac{1}{n} \sum_{i=1}^{n} r_i - \frac{1}{2n} \sum_{i=1}^{n} r_i^2 + \cdots \tag{9} \]
For large \(n\) and a small overall mean simple return, the mean daily logarithmic return is approximately the mean daily return minus one-half the variance of daily returns. In particular, the total return obtained by logarithmic daily return could be higher or lower than the simple average return. The results obtained by these returns would differ at times of high volatility.
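A quick numerical check of this second-order relationship is sketched below. The sample returns are hypothetical, and the estimate uses the mean of squared returns, which is close to the variance when the mean return is small.

```python
import math


def mean_log_return(returns):
    """Average of the daily log returns ln(1 + r_i)."""
    return sum(math.log1p(r) for r in returns) / len(returns)


def second_order_estimate(returns):
    """Mean simple return minus half the mean squared return."""
    n = len(returns)
    mean = sum(returns) / n
    mean_square = sum(r * r for r in returns) / n
    return mean - 0.5 * mean_square


# Hypothetical daily returns, for illustration only.
sample = [0.01, 0.03, -0.02, -0.04, 0.02, 0.02, -0.01, 0.03, -0.01, 0.01]
gap = abs(mean_log_return(sample) - second_order_estimate(sample))
```

For daily returns of a few percent the remaining gap comes only from cubic and higher-order terms, so it is orders of magnitude smaller than the returns themselves.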
We illustrate the differences in accuracy of the two approximations for constant daily returns (from 0.05% to 0.50%) and for different time periods (from 25 to 250 days) in Table 3.
Comparison of exact (E), simple (S) and logarithmic (L) results for total return (%) for different daily return rates and time periods.

| Daily return (%) | 25 days: E | S | L | 50 days: E | S | L | 125 days: E | S | L | 250 days: E | S | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 1.26 | 1.25 | 1.25 | 2.53 | 2.50 | 2.50 | 6.45 | 6.25 | 6.25 | 13.31 | 12.50 | 12.50 |
| 0.10 | 2.53 | 2.50 | 2.50 | 5.12 | 5.00 | 5.00 | 13.31 | 12.50 | 12.49 | 28.39 | 25.00 | 24.99 |
| 0.15 | 3.82 | 3.75 | 3.75 | 7.78 | 7.50 | 7.49 | 20.61 | 18.75 | 18.74 | 45.46 | 37.50 | 37.47 |
| 0.20 | 5.12 | 5.00 | 5.00 | 10.51 | 10.00 | 9.99 | 28.37 | 25.00 | 24.98 | 64.79 | 50.00 | 49.95 |
| 0.25 | 6.44 | 6.25 | 6.24 | 13.30 | 12.50 | 12.48 | 36.63 | 31.25 | 31.21 | 86.68 | 62.50 | 62.42 |
| 0.30 | 7.78 | 7.50 | 7.49 | 16.16 | 15.00 | 14.98 | 45.42 | 37.50 | 37.44 | 111.46 | 75.00 | 74.89 |
| 0.35 | 9.13 | 8.75 | 8.73 | 19.09 | 17.50 | 17.47 | 54.76 | 43.75 | 43.67 | 139.52 | 87.50 | 87.35 |
| 0.40 | 10.50 | 10.00 | 9.98 | 22.09 | 20.00 | 19.96 | 64.71 | 50.00 | 49.90 | 171.29 | 100.00 | 99.80 |
| 0.45 | 11.88 | 11.25 | 11.22 | 25.17 | 22.50 | 22.45 | 75.28 | 56.25 | 56.12 | 207.25 | 112.50 | 112.25 |
| 0.50 | 13.28 | 12.50 | 12.47 | 28.32 | 25.00 | 24.94 | 86.53 | 62.50 | 62.34 | 247.95 | 125.00 | 124.69 |
For example, when the daily return is 0.10%, the exact return over 25 days is 2.53%, while the simple and logarithmic returns are both 2.50%. If we consider a longer period of 250 days, the exact return is 28.39%, compared to 25.00% for simple returns and 24.99% for logarithmic returns. The difference with exact values becomes significantly larger for higher daily return values. For example, if we take a daily return of 0.40%, then the exact return over 25 days is 10.50%, while the simple and logarithmic returns are 10.00% and 9.98%. Over 250 days, the exact return grows to 171.29%, compared to 100.00% for simple returns and 99.80% for logarithmic returns. This difference between the exact returns and the values obtained by simple or logarithmic averaging becomes larger for higher daily returns and longer time periods.
In real market scenarios, we have both positive and negative returns. Table 14 compares real market results and shows that logarithmic returns are closer to exact values than simple averaging. As for the volatility of logarithmic returns, using an approximation in equation (9) and ignoring cubic and higher terms, we have
Therefore, the volatility of logarithmic returns is approximately equal to the volatility of simple returns.
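We can also derive some of these results more formally if we make a commonly used assumption in finance that prices follow a log-normal distribution (Satchell and Knight, 2001). Let \(N(\mu, \sigma^2)\) denote the normal distribution and let \(X \sim N(\mu, \sigma^2)\) and \(Y = e^X\). In other words, \(Y\) represents prices and \(X\) represents the (logarithmic) returns. Then \(Y\) follows a log-normal distribution with mean and variance given by Johnson and Kotz (1970)
\[ \mathrm{E}[Y] = e^{\mu + \sigma^2/2} \]
and
\[ \mathrm{Var}[Y] = \bigl(e^{\sigma^2} - 1\bigr)\, e^{2\mu + \sigma^2}. \]
Using the approximation \(e^x \approx 1 + x\), we obtain
\[ \mathrm{E}[Y] \approx 1 + \mu + \frac{\sigma^2}{2}, \qquad \mathrm{Var}[Y] \approx \sigma^2 \bigl(1 + 2\mu + \sigma^2\bigr) \]
and for small \(\mu\) and \(\sigma\), we have
\[ \mathrm{E}[Y] - 1 \approx \mu + \frac{\sigma^2}{2}, \qquad \mathrm{Var}[Y] \approx \sigma^2. \]
Therefore, the average of daily returns is approximately \(\sigma^2/2\) higher than the average of daily logarithmic returns. The difference would depend on the volatility of daily returns.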
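The size of this gap under the log-normal assumption can be checked with the closed-form moments of the log-normal distribution. The sketch below is illustrative only: the function names and the daily parameters `mu` and `sigma` are hypothetical values chosen to resemble a typical equity index.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of exp(X) for X ~ Normal(mu, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_variance(mu, sigma):
    """Variance of exp(X) for X ~ Normal(mu, sigma^2)."""
    return (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)


# Hypothetical daily parameters of the log returns.
mu, sigma = 0.0004, 0.01

mean_simple = lognormal_mean(mu, sigma) - 1.0  # mean simple daily return
excess = mean_simple - mu                      # close to sigma^2 / 2
vol_simple = math.sqrt(lognormal_variance(mu, sigma))
```

With these values the mean simple return exceeds the mean log return by about half the variance, while the two volatilities are nearly identical.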
It is instructive to examine Table 18 and relate the differences in approximating annual return results to daily return volatility for S&P-500 provided in Table 14. Here are some observations:
For most years (16 out of 23), simple averaging has given us better results than logarithmic averaging.
For the simple averaging method, the average relative error of with a very large standard deviation .
The approximation was the worst in 2011, when its relative error was 138.94%. The exact return for B&H was 1.89%, whereas the simple averaging method gave a value of 4.53%, a relative error of more than 100%.
For the average logarithmic returns, the average relative error was and a much lower standard deviation of error . its worst performance was in 2008 when the relative error was about 25%. Interestingly, this method has a relative error of less than 1% in 2011,
For the 5 years with the highest daily return volatility ( in 2009, in 2020, in 2009, in 2002 and in 2002) we had very large differences in accuracy between the two methods.
For the 5 years with the lowest daily return volatility, ( in 2017, in 2006, in 2005, in 2014 and in 2004) we had very small differences in both return estimates and they were within of the exact results.
Overall, the average absolute error is about half-lower for logarithmic returns than for simple returns ( vs. ) with a much lower standard deviation ( vs. ).
Therefore, for computing strategy returns, we will use average logarithmic returns. Using such returns supports time additivity and retains the compounding effect. We emphasize that although it provides a good approximation and captures the compounding effect, it is still an approximate value.
Evaluating strategy performance
The ability to relate returns in terms of averages allows us to analyze and compare the strategies by partitioning trading days into disjoint groups of correctly and incorrectly classified true labels and then (approximately) relate the strategy returns to average returns for these groups.
We proceed as follows. Let \(\mu_A^{+}\) and \(\mu_B^{+}\) denote the average log returns for \(A\) and \(B\) on days with true “+” labels. Similarly, let \(\mu_A^{-}\) and \(\mu_B^{-}\) denote the average log returns for \(A\) and \(B\) on days with true “−” labels.
Finally, let us define
A trading strategy invests in or for each day based on predicted labels for that day. Since the order of days is not important for the final return, we can schematically describe any trading strategy “Str” as follows:
We can measure the performance of strategies relative to the ideal (“max”) strategy and the worst (“min”) strategy, as well as to buy-and-hold benchmark strategies and (tracking errors). Each of these strategies can be represented schematically as follows:
a generic strategy “str” is schematically composed of four parts, corresponding to entries in the confusion matrix as follows:
ideal (“max”) strategy: predicted labels are all correct for each day. This means that we invest in \(A\) on all days with true positive labels and invest in \(B\) on all days with true negative labels.
worst (“min”) strategy: predicted labels are all incorrect for each day. This means that we invest in \(B\) on all days with true positive labels and invest in \(A\) on all days with true negative labels as follows:
Buy-and-Hold (B&H) “A” strategy: all predicted labels are “+” for each day. This means that we invest in security \(A\) on all days as follows:
Buy-and-Hold (B&H) “B” strategy: all predicted labels are “−” for each day. This means that we invest in security \(B\) on all days as follows:
We now make a connection between trading statistics and machine learning statistics by approximately expressing the strategy returns and other metrics in terms of confusion matrix counts and average daily logarithmic returns. We will use the notation “*” and the prefix ML (machine learning) to emphasize that these are approximations.
With this notation, we summarize ML-returns as follows:
The underperformance of the strategy relative to the ideal strategy is
The first contribution to underperformance is due to our error in predicting days with true “” labels. The second contribution is due to an error in predicting days with “” true labels.
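The additive structure of these ML-returns can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's own formulas: the names (`GroupMeans`, `ml_returns`) and the sample values are hypothetical, and the counts follow the standard convention in which FN days are true “+” days spent in \(B\) and FP days are true “−” days spent in \(A\).

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Average daily log returns by asset and by true label (assumed names)."""
    a_pos: float  # asset A on true '+' days
    b_pos: float  # asset B on true '+' days
    a_neg: float  # asset A on true '-' days
    b_neg: float  # asset B on true '-' days


def ml_returns(tp, fn, tn, fp, mu):
    """Approximate total log returns of the strategy and its reference cases."""
    p, n = tp + fn, tn + fp
    return {
        "strategy": tp * mu.a_pos + fn * mu.b_pos + tn * mu.b_neg + fp * mu.a_neg,
        "max": p * mu.a_pos + n * mu.b_neg,   # every label predicted correctly
        "min": p * mu.b_pos + n * mu.a_neg,   # every label predicted incorrectly
        "bh_a": p * mu.a_pos + n * mu.a_neg,  # always invested in A
        "bh_b": p * mu.b_pos + n * mu.b_neg,  # always invested in B
    }


# Hypothetical counts and group averages, for illustration only.
mu = GroupMeans(a_pos=0.02, b_pos=0.01, a_neg=-0.02, b_neg=-0.01)
res = ml_returns(tp=4, fn=2, tn=3, fp=1, mu=mu)

# Shortfall relative to the ideal strategy, split by error type.
shortfall = res["max"] - res["strategy"]
from_fn = 2 * (mu.a_pos - mu.b_pos)  # missed gains on misclassified '+' days
from_fp = 1 * (mu.b_neg - mu.a_neg)  # extra losses on misclassified '-' days
```

In this example the shortfall of 0.03 splits into 0.02 from the two misclassified “+” days and 0.01 from the one misclassified “−” day.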
The ML-tracking errors and relative to and are given by
To outperform benchmark we need . To outperform benchmark we need the . To outperform both benchmarks, we need
The ML-based metric for the underperformance of Buy-and-Hold in and relative to the ideal strategy is
For days, we have an average negative return . Unlike the ideal strategy, in the Buy-and-Hold strategy BH-A, we were fully invested in during these days, resulting in a loss given by equation (14). Although we do not make trading decisions in a Buy-and-Hold strategy, intuitively, we can think of this loss as consisting of two components: we lost a portion of the potential return in negative true labels days. These were correctly identified by the strategy. The second component of the loss in days. These days were incorrectly identified by our strategy as well.
Machine-learning interpretations for market-cash
For the Market-Cash strategies, security is S&P-500 index, and security is cash. For S&P-500, we have and . For cash, we have , and therefore, and Therefore, the expressions relating returns to machine learning metrics are reduced to a much simpler form. For the Market-Cash strategy we have:
The expressions for ML-returns in equation (13) are reduced to a much simpler form
The difference in ML-returns with the ideal strategy is then
This has a simple and intuitive interpretation: our under-performance relative to the ideal strategy is composed of two parts:
for FP days, we misclassified the true negative days as “+” and invested in the losing days, and therefore, losing (approximately)
for FN days, we misclassified the true positive days as “−”, stayed in cash, and did not invest in the positive days. This resulted in the “opportunity” loss of approximately
Finally, from equations (15) we can estimate the ML-tracking error as
Therefore, the tracking error consists of two components:
For days, the market was positive. The market-cash strategy was in cash, while the Buy-and-Hold was invested. This resulted in a loss of approximately to the tracking error.
For days, the market was negative. The market-cash strategy was in cash, while the Buy-and-Hold was invested. This resulted in a gain of approximately .
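For the Market-Cash case the same bookkeeping simplifies because cash earns nothing. The sketch below is a hypothetical illustration: `mu_pos` and `mu_neg` stand for the average market log returns on true “+” and true “−” days, and the sample counts are made up.

```python
def market_cash_ml(tp, fn, tn, fp, mu_pos, mu_neg):
    """Approximate Market-Cash returns from confusion-matrix counts.

    mu_pos > 0 and mu_neg < 0 are the average market log returns on
    true '+' and true '-' days; cash is assumed to return zero.
    """
    strategy = tp * mu_pos + fp * mu_neg  # invested only on predicted '+' days
    ideal = (tp + fn) * mu_pos            # invested on every true '+' day
    buy_hold = (tp + fn) * mu_pos + (tn + fp) * mu_neg
    return {
        "strategy": strategy,
        "shortfall_vs_ideal": ideal - strategy,  # FN opportunity loss + FP loss
        "tracking_vs_bh": strategy - buy_hold,   # TN gain minus FN loss
    }


# Hypothetical inputs, for illustration only.
mc = market_cash_ml(tp=4, fn=2, tn=3, fp=1, mu_pos=0.02, mu_neg=-0.02)
```

Here the strategy gains 3 × 0.02 on the correctly avoided negative days and gives up 2 × 0.02 on the missed positive days, for a net tracking difference of 0.02 over Buy-and-Hold.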
Analysis of volatility and Sharpe ratios by corresponding machine learning metrics
We now consider volatility. We will use to denote volatility with appropriate subscripts to indicate specific trading strategies or benchmarks, and to represent ML-volatility computed from the confusion matrix. If denotes the standard deviation of daily returns, then the volatility over days is given by . Recall that the standard deviation of simple returns is approximately equal to the standard deviation of logarithmic returns as shown in equation (10). Let denote the prevalence. As before, we assume that daily returns are independent.
Let and denote the standard deviation of daily returns for and , then the volatilities for the buy-hold strategies and we have
In Appendix A we derive the following expressions for ML-volatility and ML-Sharpe ratios for different strategies. We summarize our results below: ML-Volatility:
ideal strategy:
worst strategy:
generic strategy:
Ignoring the risk-free rate, we can write the expressions relating the Sharpe Ratios with confusion matrix entries. We will use the notation to denote the Sharpe ratio. ML-Sharpe Ratio:
Buy-and-Hold strategies and we have
ideal strategy:
worst strategy:
generic strategy:
The proposed ML-metrics are approximations and cannot be used to exactly compare two trading strategies. However, these approximations of strategy performance could offer insights to explain the relative underperformance of strategies in terms of prediction accuracy. For example, if we can improve the strategy by identifying only one extra true positive day, this means that we increase by 1 and decrease by 1. This results in an increase of to the return in the numerator. This will also increase by 1 and decrease by 1. As a result, the expression in the denominator under the square root will increase by .
On the other hand, if we can improve the strategy by identifying only one extra true negative day, this means that we increase by 1 and decrease by 1. This results in an increase of to the return in the numerator. This will also decrease by 1 and increase by 1. As a result, the expression in the denominator under the square root will decrease by .
Finally, if we can improve the strategy by identifying only one extra true positive and one extra negative day, then and will remain unchanged. Therefore, the volatility will remain unchanged, whereas the return will increase by increasing the Sharpe’s ratio.
This means that we always increase both the returns and Sharpe’s ratio by increasing our accuracy in identifying only negative labels. On the other hand, if we increase our accuracy in identifying only the positive labels, we increase our returns but not necessarily the Sharpe’s ratio. Increasing the accuracy in both positive and negative will increase both the return and Sharpe’s ratio.
For the Market-Cash strategies, \(A\) is the S&P-500 index and \(B\) is cash. In this case, the average returns and the volatility of \(B\) are all zero. Therefore, the above expressions for volatility and Sharpe ratio reduce to a much simpler form
The “return efficiency index”
In the previous section, we derived (approximate) expressions for strategy performance in terms of machine learning metrics. But how should we compare any two strategies? Many metrics are used in finance, such as tracking error, Sharpe ratio, drawdowns, and others. One drawback of such metrics is that they do not take into account how close the trading strategy is to the ideal case. Maybe it was not possible to significantly outperform the benchmark.
Is there a way to quantify this and is there a way to compare strategies not by comparing their relative absolute performance but by assigning them a universal score from 0 to 1, reflecting their ability to capture the maximum possible return?
We suggest proceeding as follows: consider the worst and the best trading strategies with corresponding returns \(R_{\min}\) and \(R_{\max}\) respectively. For any strategy with return \(R\), we have \(R_{\min} \le R \le R_{\max}\). Unless all daily returns are the same, we have \(R_{\min} < R_{\max}\). We define the strategy return efficiency index as
\[ E = \frac{R - R_{\min}}{R_{\max} - R_{\min}}. \tag{18} \]
For brevity, we will refer to it as the efficiency index. The above formula is analogous to “min-max” scaling of data widely used in machine learning.
The above definition implies that \(0 \le E \le 1\) for any strategy. The numerator is the excess return compared to the worst return \(R_{\min}\), whereas the denominator is the maximum possible excess return generated by predicting all true labels correctly.
Therefore, the return efficiency index has the following simple interpretation: it tells us what fraction of the possible return range (from the worst to the best possible strategy) is captured by our strategy. For the worst strategy, this index is 0, and for the best possible strategy, it is 1.
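The min-max definition of the index translates directly into code. The following is a minimal sketch with a hypothetical function name and made-up return values.

```python
def efficiency_index(r_strategy, r_min, r_max):
    """Fraction of the worst-to-best return range captured by a strategy."""
    if r_max <= r_min:
        raise ValueError("the best return must exceed the worst return")
    return (r_strategy - r_min) / (r_max - r_min)


# Hypothetical total returns: strategy, worst case, best case.
index = efficiency_index(0.05, -0.02, 0.08)
worst = efficiency_index(-0.02, -0.02, 0.08)
best = efficiency_index(0.08, -0.02, 0.08)
```

A strategy earning 0.05 when the attainable range runs from −0.02 to 0.08 captures 70% of that range, while the worst and best strategies score 0 and 1 by construction.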
For any strategy, its efficiency index would depend on how good the strategy is in predicting positive and negative true labels. To express the efficiency index in terms of machine learning metrics, we compute ML-efficiency index as follows. As before, recall and . Note that and .
From our equations (13) we obtain the following for the ML-efficiency index:
We can rewrite this in terms of recall and as follows:
To interpret the index in equation (19) we note that for the ideal strategy and its efficiency index can be written as
The first term in in the above equation (20) is the fraction of excess return over captured by the positive True labels, and the second term is the fraction of excess return captured by the negative True labels.
Therefore, the (return) efficiency index for any strategy has the following interpretation: it is the weighted sum of fractions of excess returns captured by the positive and negative true labels with precision and recall as the corresponding weights.
Next, consider a buy-and-hold strategy of investing only in \(A\) or only in \(B\). These can be thought of as trading strategies where all predicted labels are “+” (for \(A\)) or all predicted labels are “−” (for \(B\)).
From equations (13) and (19) the return efficiency indices for these buy-and-hold strategies are
Therefore, we can rewrite the efficiency index for any strategy in terms of the efficiency indices of buy and hold strategies for and as follows:
This provides an alternative interpretation of the efficiency index: it is the weighted sum of the efficiency indices of buy-and-hold strategies taken with weights and respectively.
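This decomposition can be checked numerically. In the sketch below, `delta_pos` and `delta_neg` are assumed names for the average daily gain from holding the right asset on true “+” days (\(A\) over \(B\)) and on true “−” days (\(B\) over \(A\)), and the weights are taken to be the recall and the specificity, which is what the additive ML-return approximation gives under the standard confusion-matrix convention.

```python
def ml_efficiency(tp, fn, tn, fp, delta_pos, delta_neg):
    """Efficiency index from confusion-matrix counts (illustrative sketch).

    delta_pos: average gain of A over B on true '+' days (assumed > 0)
    delta_neg: average gain of B over A on true '-' days (assumed > 0)
    """
    p, n = tp + fn, tn + fp
    full_range = p * delta_pos + n * delta_neg  # best minus worst return
    e_bh_a = p * delta_pos / full_range         # buy-and-hold A
    e_bh_b = n * delta_neg / full_range         # buy-and-hold B
    recall, specificity = tp / p, tn / n
    return recall * e_bh_a + specificity * e_bh_b, e_bh_a, e_bh_b


# Hypothetical inputs, for illustration only.
e_strategy, e_a, e_b = ml_efficiency(4, 2, 3, 1, delta_pos=0.01, delta_neg=0.01)

# The same value from the direct definition (captured range / full range).
direct = (4 * 0.01 + 3 * 0.01) / (6 * 0.01 + 4 * 0.01)
```

The two buy-and-hold indices sum to one, so the strategy index is a weighted combination of them and both routes give 0.7 for these inputs.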
We can re-write the above equation (22) in terms of the (unscaled) dot product as follows:
The above expression for is the dot-product of two vectors:
the vector is machine learning related. It describes the accuracy of our strategy as a classifier predicting positive and negative labels.
the vector is returns-related. It describes the return “profiles” defined by the benchmarks and .
Geometrically, this unscaled dot product is related to the cosine of the angle between and via . This is illustrated in Figure 1.
AUC Score (left) and Return Efficiency Index (right).
We can compare our strategy to benchmarks by comparing the corresponding efficiency indices instead of examining the tracking errors. From equations (21) and (18) we have
Therefore, our ability to outperform the benchmark depends on our ability to correctly identify labels and on the relative values of and . From equation (23) we have
Finally, consider a “random flip” strategy where we flip a coin to decide the predicted labels. Let \(p\) be the probability of “heads” (we invest in security \(A\)) and \(q = 1 - p\) be the probability of “tails” (we invest in security \(B\)). This means that out of \(P\) true “+” labels we correctly identify, on average, \(pP\) positive true labels, and out of \(N\) true “−” labels we correctly identify, on average, \(qN\) negative true labels. This is equivalent to a true positive rate of \(p\) and a true negative rate of \(q\). Therefore, from equation (19) we obtain
Therefore, the efficiency index of the random strategy is a weighted average of and with weights and respectively.
From equations (19) and (24), it follows that to outperform a random index, we must have
If then our strategy outperforms a -random flip strategy if
Therefore, we can compare our strategy to the efficiency index of some benchmarks and to a random strategy. This is similar to analyzing classifiers in machine learning, where we compute the so-called Area Under the Curve (AUC) and compare it to 0.5 to see how different our predictions are from those generated by a random coin flip.
The equation (19) gives us a universal way to compare any strategies in terms of their ability to capture the potential excess return over the worst strategy.
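The interpretation above can be sketched numerically. The snippet below is a minimal illustration, assuming that the index is the share of the worst-to-best return range captured by a strategy and that a random flip has an index equal to the weighted average of the two buy-and-hold indices, as stated in the text. The function names and all numbers are hypothetical and are not taken from the paper's tables.

```python
def return_efficiency_index(r, r_worst, r_best):
    """Share of the worst-to-best return range captured by a strategy."""
    if r_best <= r_worst:
        raise ValueError("r_best must exceed r_worst")
    return (r - r_worst) / (r_best - r_worst)


def random_flip_index(p, e_a, e_b):
    """Index of a coin-flip strategy that picks security A with probability p."""
    return p * e_a + (1.0 - p) * e_b


def equivalent_flip_probability(e, e_a, e_b):
    """Probability p at which the coin-flip index equals the strategy index e."""
    if e_a == e_b:
        raise ValueError("benchmarks have equal indices; p is undefined")
    return (e - e_b) / (e_a - e_b)


# Hypothetical returns (in percent): worst strategy -2, best strategy 8.
e = return_efficiency_index(r=4.0, r_worst=-2.0, r_best=8.0)    # strategy
e_a = return_efficiency_index(r=5.0, r_worst=-2.0, r_best=8.0)  # buy-and-hold A
e_b = return_efficiency_index(r=1.0, r_worst=-2.0, r_best=8.0)  # buy-and-hold B
p = equivalent_flip_probability(e, e_a, e_b)
print(e, e_a, e_b, p)
```

A strategy with an index above `random_flip_index(p, e_a, e_b)` captures more of the available return range than the corresponding coin flip.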
Consider the Market-Cash (MC) strategies. For these strategies, we have and therefore, and . Note that and therefore . They can be interpreted as the absolute average log returns for positive and negative return days respectively.
The efficiency index of the strategy and benchmark S&P-500 index is given by
The market-cash strategy can outperform the index if we do not have too many False Negative (FN) predictions, namely, we must have .
A detailed example
Let us present a detailed example of comparing two strategies, and . Strategy is a Growth-Value strategy and strategy is a Market-Cash strategy. We assume that we have the following daily data for eleven days starting with day . The detailed results for returns , balances , true labels and predicted labels for each day are summarized in Table 4.
Daily data for and strategies.
S&P
G
V
Strategy
Strategy
1
101
1
101
0
100
1
101
+
+
1
101
+
+
3
104
2
103
1
101
2
103
+
+
3
104
+
2
102
3
100
2
99
3
100
2
102
4
98
1
99
2
97
2
98
0
102
2
100
3
102
2
99
3
101
+
+
0
102
+
2
102
2
104
1
100
2
103
+
+
3
105
+
+
1
101
2
102
1
99
1
102
1
104
+
3
104
5
107
2
101
2
104
0
104
1
105
2
109
3
104
3
107
2
106
+
1
104
1
108
1
103
1
106
+
1
105
+
The growth of $100 investments in the S&P Buy-and-Hold strategies and our two strategies is shown below in Figure 2.
Example of Strategy Comparison.
We first compute the relative performance of the two strategies over 10 days starting with day .
Our results are summarized in Table 5. All strategies start with the same balance . The higher final balance (after rounding to the nearest integer) of $106 is achieved by the Growth-Value , while the lower final balance of $105 was achieved by the Market-Cash . The corresponding total returns for these strategies are and , respectively. The total return of Strategy is more than twice the total return of Strategy .
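Performance measures for strategies.

| Metrics | Buy-and-Hold S&P | Buy-and-Hold Gr. | Buy-and-Hold Val | Strategy X (Growth-Value) | Strategy Y (Market-Cash) |
|---|---|---|---|---|---|
| Final balance | 104 | 108 | 103 | 106 | 105 |
| Total return | 4 | 8 | 3 | 6 | 5 |
| (Daily) st.dev. | 2.2 | 2.4 | 1.7 | 2.1 | 1.6 |
| Volatility (risk) | 7.0 | 7.5 | 5.3 | 6.5 | 5.2 |
| Tracking error | * | 4 | 2 | 2 | 1 |
| Sharpe ratio | 0.6 | 1.1 | 0.6 | 0.9 | 1.0 |
| MDD - max draw | 6 | 4 | 5 | 5 | 2 |
| # Trades | 1 | 1 | 1 | 5 | 5 |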
Next, we compare volatility. If denotes the standard deviations of daily returns over days, then we can compute the risk (volatility of returns over days) as . The volatility for Strategy is and is higher than the volatility of for Strategy .
From the above, we can compute the corresponding Sharpe ratios for the two strategies. For strategy , the Sharpe’s ratio (after rounding to decimals) is , whereas for Strategy the Sharpe’s ratio is . Therefore, Strategy has a higher absolute return than Strategy , but not on a risk-adjusted basis as shown by the Sharpe’s ratio.
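The volatility and Sharpe ratio computations can be sketched as follows. This is a minimal illustration, assuming square-root-of-time scaling of the daily standard deviation over 10 days and a zero risk-free rate; both assumptions are ours, chosen because they reproduce the rounded values in Table 5. The function names are hypothetical.

```python
import math


def period_volatility(daily_std, n_days):
    """Scale a daily standard deviation to an n-day horizon (sqrt-of-time rule)."""
    return daily_std * math.sqrt(n_days)


def sharpe_ratio(total_return, volatility, risk_free=0.0):
    """Excess return per unit of volatility; the risk-free rate is assumed zero."""
    return (total_return - risk_free) / volatility


# S&P-500 buy-and-hold column of Table 5: daily st.dev. 2.2, total return 4.
vol_sp = period_volatility(2.2, 10)
sharpe_sp = sharpe_ratio(4, vol_sp)
print(round(vol_sp, 1), round(sharpe_sp, 1))
```

After rounding to one decimal, the two printed values agree with the S&P entries for volatility and Sharpe ratio in Table 5.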
Next, we examine the maximum drawdowns. For Strategy , the maximum decrease was from to . This gives the maximum drawdown . For Strategy , the largest decrease in balance was from to . This gives the maximum drawdown . For the S&P-500, the largest decrease was from to , giving us the maximum drawdown . For the S&P Growth index, the maximum decrease was from to , giving us . Finally, for the S&P Value index, the maximum decrease was from to , giving us .
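The drawdown computation can be sketched in a few lines. This is a minimal illustration, assuming the drawdown is measured as the largest peak-to-trough decline relative to the peak. The balances are the rounded S&P-500 balances read from Table 4, with the starting balance of 100 prepended; the function name is hypothetical.

```python
def max_drawdown(balances):
    """Largest peak-to-trough decline, as a percentage of the running peak."""
    peak = balances[0]
    worst = 0.0
    for b in balances:
        peak = max(peak, b)                    # highest balance seen so far
        worst = max(worst, (peak - b) / peak)  # decline from that peak
    return 100.0 * worst


# Rounded S&P-500 balances from Table 4, starting from 100.
sp500 = [100, 101, 104, 102, 98, 100, 102, 101, 104, 105, 104]
mdd_sp = max_drawdown(sp500)
print(round(mdd_sp, 1))
```

The largest decline here is from 104 to 98, which rounds to the value of 6 reported for the S&P-500 in Table 5.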
We summarize the results (after rounding) in Table 5.
Machine learning description of strategy
From Table 4 we can represent this strategy schematically as:
There are days , , , , with Positive True Labels. The average log returns of Growth and Value indices on these days are
There are days , , with Negative True Labels. The average log returns of Growth and Value indices on these days
This gives us and .
Next, we examine the confusion matrix for Strategy :
The corresponding recall , specificity and accuracy . From Table 5, we have , and . From equation (13) we have
The Predicted Positive Value and the Predicted Negative Value . The ML-volatility of strategy- is then
and this gives us the corresponding Sharpe’s ratio
We now compute the return efficiency indices , and . From Table 5 and equation (26) we have
To compute the equivalent -random strategy, we obtain from equation (25)
The strategy recovered about 3/5 of the difference in returns between the best and the worst strategy. In terms of the return efficiency index, we have : strategy outperforms the buy-and-hold in Value but underperforms the buy-and-hold in Growth strategy. And this strategy outperforms the random flip strategy with .
Machine learning description of strategy
For this strategy, security is the market (“M”), the S&P-500 index, and security is cash (“C”). From Table 4 we can represent Strategy- schematically as:
There are days , , , , and with Positive True Labels and there are days , , , and with negative True Labels. The corresponding average log returns of S&P-500 (market index) index on these days are
The returns for investing in cash on these days are . This gives and .
The confusion matrix is
The corresponding recall , specificity and accuracy . From Table 5 we have , and . For the ML-returns of the best and worst strategies, from equation (15) we have
The Predicted Positive Value and the Predicted Negative Value . The ML-volatility of strategy- is then
and this gives us the corresponding Sharpe’s ratio
We now compute the return efficiency indices , and . From Table 5 and equation (27) we have
To compute the equivalent -random strategy, we obtain from equation (25)
The strategy recovered of the difference in returns between the best and the worst strategy. Its return efficiency index is greater than that of a random flip strategy.
In terms of accuracy indices (machine learning-based metric) . This result tells us that strategy provides very low advantage over buy-and-hold: it is only 5% better at recovering the potential return compared with buy-and-hold.
Machine learning comparison of X and Y
We summarize the comparison results for strategies and in Table 6.
Machine learning metrics for strategies.

| Metrics | Strategy X (Growth-Value) | Strategy Y (Market-Cash) |
|---|---|---|
| True Positive (TP) | 4 | 4 |
| False Positive (FP) | 2 | 3 |
| True Negative (TN) | 2 | 1 |
| False Negative (FN) | 2 | 2 |
| Recall (TPR) | 0.67 | 0.6 |
| Specificity (TNR) | 0.5 | 0.25 |
| Accuracy (ACC) | 0.6 | 0.5 |
| Precision (PPV) | 0.67 | 0.57 |
| Predicted Negative Value (PNV) | 0.5 | 0.33 |
| Return Efficiency Index | 0.61 | 0.65 |
| Equivalent Random Flip probability | 0.71 | 0.5 |
Strategy predicts both the positive and negative True labels at a higher rate, resulting in higher overall accuracy ACC (0.6 vs. 0.5). However, in terms of return efficiency, strategy is more efficient: it recovers about 65% of the potential return vs. 61% for . Recall that this potential return depends on the benchmarks used and is different for these strategies. For the Growth-Value, the range of returns was 7.2, and for Market-Cash, the range was 20 and was much wider. For the equivalent random strategy, Growth-Value requires a larger value of (0.71 vs. 0.5). This is consistent with higher accuracy for and a lower True Negative Rate for .
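The classification metrics in Table 6 follow from the confusion matrix counts by the standard definitions. The sketch below is a minimal illustration using the counts of the Growth-Value strategy from Table 6; the function name is hypothetical, and the key `NPV` corresponds to what the table calls the Predicted Negative Value.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a two-label classifier."""
    return {
        "TPR": tp / (tp + fn),                   # recall
        "TNR": tn / (tn + fp),                   # specificity
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "PPV": tp / (tp + fp),                   # precision
        "NPV": tn / (tn + fn),                   # negative predictive value
    }


# Growth-Value counts from Table 6: TP = 4, FP = 2, TN = 2, FN = 2.
metrics = classification_metrics(tp=4, fp=2, tn=2, fn=2)
print({name: round(value, 2) for name, value in metrics.items()})
```

After rounding to two decimals, these values agree with the first column of Table 6.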
Comparing these two strategies, we see that the strategy has a higher return but this comes at the cost of higher volatility and drawdowns as well as increased trading frequency. The risk-adjusted return of is lower than for as measured by the Sharpe’s ratio.
Example: k-NN “winners” and “losers” trading strategies
In the previous sections, we drew an analogy between a trading strategy that chooses the appropriate asset to invest in and a machine learning problem of predicting a label. We explore this idea further and suggest some trading strategies based on analogies to machine learning.
One of the simplest algorithms for classification in machine learning is the so-called k-NN (k-nearest neighbor) classification (Bishop, 2016). In this method, we assume that we are given a distance metric between any two points. To assign a label to a new point, we find its k closest labeled points (the so-called “neighbors”) and assign the label held by the majority of these neighbors. For a two-label problem, the number k must be odd to have a well-defined predicted label. The simplest case is k = 1: we assign to the new point the label of its closest neighbor.
An example is illustrated in Figure 3 where we have six True (“Ground Truth”) labels in the training set.
Example of Nearest Neighbor Classification.
In this example, we need to assign a label to a new point. If we take k = 1, then the nearest neighbor is point 1 with a (Ground Truth) “green” label. In this case, we assign the label “green” to the new point. If we take k = 3 neighbors from the training set, the nearest three neighbors are point 1 (“green”), point 2 (“red”) and point 3 (“red”). The majority of these labels are “red”, and therefore the new point will be assigned the label “red”. Finally, take k = 5. The nearest five neighbors are point 1 (“green”), point 2 (“red”), point 3 (“red”), point 4 (“green”) and point 5 (“green”). Most of these five points have the Ground Truth label “green”, and therefore the new point will be assigned “green”. This example shows that the final label depends on the value of k. In practice, this value is chosen experimentally.
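The dependence of the predicted label on the number of neighbors can be reproduced with a short sketch. It assumes that the neighbors have already been sorted by distance, nearest first, in the order described above (points 1 to 5); the function name is hypothetical.

```python
from collections import Counter


def knn_label(neighbor_labels, k):
    """Majority label among the k nearest neighbors (list sorted nearest first)."""
    votes = Counter(neighbor_labels[:k])
    return votes.most_common(1)[0][0]


# Ground truth labels of points 1 to 5, nearest first, as in the example.
nearest_first = ["green", "red", "red", "green", "green"]
for k in (1, 3, 5):
    print(k, knn_label(nearest_first, k))
```

With two labels and an odd number of neighbors, the vote cannot end in a tie.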
Let us consider a trading strategy based on this analogy. Unlike typical scenarios in applying supervised machine learning algorithms where ground truth labels are known in advance (Bishop, 2016), in many trading algorithms these ground truth labels are known only for historical data. This is illustrated in Figure 4 where we need to make a prediction for based on historical ground truth labels .
Next-Day Label Prediction by k-NN Analogy.
By analogy to k-NN in machine learning, we define the distance between any two days as the number of days between them. With this definition, the k neighbors of any day are the k previous days. In the simplest case of k = 1, the nearest neighbor of a day is the previous day, so we assign the predicted label for a day based on the true label of the previous day. In Figure 4, the ground truth label for the previous day is “green”, so we assign “green” as the predicted label for the next day. In the more general setting of k > 1, we assign a predicted label to a day based on the majority of the ground truth labels of its k “nearest neighbors”, namely the labels of the k preceding days.
If the predicted label is taken to be the majority of the true labels of the k neighbors, we call such a strategy a k-winners strategy. We denote this strategy as AB-kW to indicate that it trades in two securities A and B, and uses the majority (“winning”) of true labels of the previous k days (“neighbors”). Formally, in the AB-kW strategy, we assign the predicted label for day as:
If the predicted label is taken to be the minority of the true labels of the k neighbors, we call such a strategy a k-losers strategy. We denote such a strategy as AB-kL to indicate that it trades in two securities A and B, and uses the minority (“losing”) of true labels of the previous k days (“neighbors”). Formally, in AB-kL strategies, the predicted label for day is:
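The winners and losers rules can be sketched as a rolling majority vote over the previous true labels. This is a minimal illustration under our own conventions: labels are the strings "+" and "-", the function name `predict_labels` is hypothetical, and days without enough history receive no prediction (`None`). Only the first three labels come from the MC-3W example discussed later in the text; the remaining two are invented.

```python
def predict_labels(true_labels, k, mode="W"):
    """Predicted labels for a k-winners ('W') or k-losers ('L') strategy."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that the majority is well defined")
    if mode not in ("W", "L"):
        raise ValueError("mode must be 'W' or 'L'")
    flip = {"+": "-", "-": "+"}
    predicted = [None] * min(k, len(true_labels))  # not enough history yet
    for t in range(k, len(true_labels)):
        window = true_labels[t - k:t]              # the k preceding days
        majority = "+" if window.count("+") > k // 2 else "-"
        predicted.append(majority if mode == "W" else flip[majority])
    return predicted


true_labels = ["+", "+", "-", "-", "+"]
print(predict_labels(true_labels, k=3, mode="W"))
print(predict_labels(true_labels, k=3, mode="L"))
print(predict_labels(true_labels, k=1, mode="W"))
```

For two labels, the losers prediction is simply the opposite of the winners prediction on every day that has one.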
Market-cash k-NN strategies
In this class of strategies, we predict the choice of investments for the next day (market or cash) based on the daily returns of the S&P-500 index over the last k days.
Recall that in Market-Cash strategies, to each day we assign a True label “+” or “−” depending on the daily return of the S&P-500 index as follows:
Once the true labels are assigned to each trading day, we generate predicted labels (trading signal) for the day based on the majority (“winning”) or the minority (“losing”) of True labels in the previous day(s).
Our trading algorithm invests in day based on predicted label for that day as follows:
predicted label “+”: (re)invest in S&P-500 Index for day
predicted label “−”: be in Cash for day
In MC-kW (Market-Cash k-winners) strategies, we predict the next day label “+” if most daily returns of the S&P-500 index over the previous k days were non-negative. In such strategies, we tend to believe that the current positive momentum has some inertia and will continue. In MC-kL (Market-Cash k-losers) strategies, we predict the next day label “−” if most daily returns of the S&P-500 index over the previous k days were positive. In these strategies, we tend to believe that the current positive momentum is about to change.
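The trading rule can be sketched as a small simulation. This is a minimal illustration under stated assumptions: a non-negative daily return gives the true label "+", cash earns a zero return, days without a prediction are spent in cash, and transaction costs are ignored. The function names and the return series are hypothetical.

```python
def market_cash_true_labels(returns):
    """True label '+' for a non-negative daily return, '-' otherwise (assumed)."""
    return ["+" if r >= 0 else "-" for r in returns]


def market_cash_balance(returns, predicted, start=100.0):
    """Grow `start` by holding the index on '+' days and cash on all other days."""
    balance = start
    for r, label in zip(returns, predicted):
        if label == "+":
            balance *= 1.0 + r  # invested in the index for this day
        # any other label (or no label): stay in cash, balance unchanged
    return balance


# Hypothetical simple daily returns of the index.
returns = [0.01, -0.02, 0.03, -0.01]
true_labels = market_cash_true_labels(returns)
# 1-day losers predictions: the opposite of the previous day's true label.
predicted_1l = [None, "-", "+", "-"]
final_balance = market_cash_balance(returns, predicted_1l)
print(true_labels, round(final_balance, 2))
```

In this toy series the losers rule is invested only on the third day, so the balance grows by that day's return alone.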
We illustrate this by the following example of k = 1 and k = 3 for 11 days , , …, with true and predicted labels as shown in Table 7.
Predicted labels for different MC-* trading strategies.
Day
(MC)
(MC-1W)
n/a
(MC-1L)
n/a
(MC-3W)
n/a
n/a
n/a
(MC-3L)
n/a
n/a
n/a
For , we can assign a signal starting on the day . For the three preceding (“neighbor”) days have true labels , and respectively. This means that on these days, the S&P-500 index had mostly non-negative daily returns. The majority of these labels are “+” and therefore, the 3-Day Losers strategy assigns predicted label (trading signal) for the fourth day , indicating to be in Cash position.
For example, consider MC-3W (“winners”) strategy and the assignment of predicted labels starting with day . The true labels for the preceding three (“neighbor”) days , , and were “+”, “+” and “−” respectively. The majority of these three labels was “+”. Therefore, for nearest neighbor winning strategy, we would assign a predicted label for the day in strategy MC-3W. Therefore, the MC-3W strategy suggests to be invested in S&P index for the day . By contrast, for the same day we would assign a predicted label by the “losers” strategy MC-3L. The MC-3L strategy suggests being in cash position for the day
Growth-value k-NN strategies
In this class of strategies, we predict the choice of investments for the next day (growth or value) based on the relative performance of the corresponding indices over the last k days (“neighbors”).
Recall that for Growth-Value strategies, to each day we assign a true label “+” or “−” depending on the daily returns and of the Growth and Value indices as follows:
Once the true labels are assigned to each trading day, we generate the predicted label (trading signal) for the day based on the assigned True labels in the previous day(s). Our trading algorithm invests in day based on the predicted label for that day. We consider two trading signals:
predicted label “+”: choose S&P-500 Growth index for day
predicted label “−”: choose S&P-500 Value index for day
In GV-kW (Growth-Value k-winners) strategies, we predict the next day label “+” if, on most of the previous k days, the daily return of the S&P Growth index was higher than the daily return of the S&P Value index. We predict the next day label “−” if, on most of the previous k days, the daily return of the S&P Growth index was lower than the daily return of the S&P Value index. In such strategies, we tend to believe that the current overperformance of one index over the other has some inertia and will continue. By contrast, in GV-kL (Growth-Value k-losers) strategies, we believe that the current overperformance of one index over another is about to change.
We illustrate this by the following example of k = 1 and k = 3 for 11 days , , …, with true and predicted labels as shown in Table 8 below:
Predicted labels for different GV-* trading strategies.
Day
True Label
GV-1W
n/a
GV-1L
n/a
GV-3W
n/a
n/a
n/a
GV-3L
n/a
n/a
n/a
Summary metrics for KNN, LSTM and CNN models.

| Strategy | GV-1L | 3-Month CNN | 3-Month LSTM | 6-Month CNN | 6-Month LSTM |
|---|---|---|---|---|---|
| Average ML Statistics | | | | | |
| TPR (%) | 52 | 42 | 43 | 41 | 41 |
| TNR (%) | 55 | 60 | 61 | 62 | 62 |
| Acc. (%) | 53 | 49 | 50 | 49 | 49 |
| Average Trading Statistics | | | | | |
| Final $ | 2,642 | 383 | 300 | 338 | 304 |
| Annual | 16.6 | 7.6 | 6.7 | 7.3 | 6.6 |
| MDD | 15.2 | 16.5 | 16.9 | 16.7 | 16.5 |
| Volatility | 18.6 | 17.7 | 17.7 | 17.9 | 17.7 |
| Sharpe | 1.2 | 0.7 | 0.7 | 0.7 | 0.7 |
| Summary Statistics of Return Efficiency Index (REI) | | | | | |
| Median | 0.41 | 0.35 | 0.36 | 0.35 | 0.34 |
| Mean | 0.40 | 0.34 | 0.33 | 0.34 | 0.34 |
| St. dev. | 0.10 | 0.08 | 0.08 | 0.08 | 0.09 |
For , we can assign a signal starting on the day . For the first three days , and have true labels , and respectively. This means that on these days, the S&P Growth index overperformed the Value index (in terms of number of days when ). The majority of these true labels are “+” and therefore, the 3-Day “Winners” strategy GV-3W assigns predicted label (trading signal) for the day , indicating to be invested in the S&P Growth index. By contrast, the 3-day losers strategy assigns predicted label (trading signal) for the day , indicating to be invested in the S&P Value index.
Results and discussion
We now turn to analyze results for the nearest neighbor Growth-Value and Market-Cash strategies using Machine learning metrics.
We present the following comparisons:
growth comparison
comparison of returns and machine learning metrics
comparison of volatility and drawdowns
comparison of tracking errors and Sharpe ratios
comparison by return efficiency ratio
choosing the number of neighbors and transaction costs
The detailed tables for each year as well as summary statistics (min, max, median, average and standard deviation) are presented in the Appendix. In most tables, we used the color “green” to identify the best value, “red” to identify the worst value, and “yellow” to identify the median value.
Growth comparison
We start by considering the three Buy-and-hold strategies investing in the three indices S&P-500, S&P-Growth and S&P-Value and in day Growth-Value and Market Cash Strategies. The growth of $100 for these is shown in Figure 5
Comparison of Growth.
As can be seen from this graph, the highest growth is achieved by the Growth-Value “Loser” strategy GV-1L with . The detailed annual growth is summarized in Table 10.
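Comparison of annual end balances of investment strategies.

| Year | Buy-and-Hold S&P | Buy-and-Hold G | Buy-and-Hold V | GV-1W | GV-1L | MC-1W | MC-1L |
|---|---|---|---|---|---|---|---|
| 2001 | 88 | 74 | 94 | 60 | 117 | 93 | 95 |
| 2002 | 69 | 50 | 77 | 25 | 157 | 67 | 103 |
| 2003 | 89 | 65 | 97 | 26 | 238 | 73 | 122 |
| 2004 | 98 | 68 | 110 | 27 | 281 | 77 | 127 |
| 2005 | 103 | 70 | 116 | 26 | 308 | 73 | 141 |
| 2006 | 119 | 76 | 141 | 29 | 367 | 81 | 147 |
| 2007 | 125 | 84 | 143 | 29 | 413 | 71 | 176 |
| 2008 | 79 | 53 | 91 | 16 | 303 | 41 | 194 |
| 2009 | 100 | 72 | 106 | 19 | 408 | 43 | 234 |
| 2010 | 115 | 84 | 123 | 21 | 501 | 43 | 265 |
| 2011 | 117 | 88 | 122 | 20 | 547 | 46 | 256 |
| 2012 | 136 | 100 | 143 | 19 | 745 | 53 | 258 |
| 2013 | 180 | 133 | 188 | 24 | 1,045 | 60 | 299 |
| 2014 | 204 | 153 | 211 | 26 | 1,222 | 62 | 332 |
| 2015 | 207 | 161 | 205 | 26 | 1,278 | 62 | 333 |
| 2016 | 232 | 172 | 240 | 29 | 1,419 | 59 | 392 |
| 2017 | 282 | 218 | 276 | 35 | 1,720 | 63 | 445 |
| 2018 | 269 | 218 | 252 | 34 | 1,604 | 69 | 388 |
| 2019 | 353 | 285 | 331 | 47 | 2,016 | 83 | 424 |
| 2020 | 418 | 381 | 336 | 60 | 2,137 | 72 | 577 |
| 2021 | 538 | 503 | 419 | 85 | 2,486 | 82 | 659 |
| 2022 | 440 | 355 | 397 | 65 | 2,161 | 72 | 615 |
| 2023 | 555 | 461 | 486 | 85 | 2,642 | 78 | 717 |
| IRR | 7.7 | 6.9 | 7.1 | −0.7 | 15.3 | −1.1 | 8.9 |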
The GV-1L strategy ends with the highest balance of $2,642 whereas the Market-Cash MC-1L ends with $717. Both strategies outperform the Buy-and-hold S&P-500, S&P Growth and S&P Value strategies that yielded $555, $461 and $486 respectively. Both Growth-Value and Market-Cash “Winner” strategies resulted in a loss.
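The IRR row of Table 10 can be checked against the end balances. The sketch below is a minimal illustration, assuming the IRR is the compound annual growth rate of a $100 investment over the 23 years from 2001 to 2023; the function name is hypothetical.

```python
def annualized_return(final_balance, start_balance=100.0, years=23):
    """Compound annual growth rate in percent (assumed meaning of IRR here)."""
    return 100.0 * ((final_balance / start_balance) ** (1.0 / years) - 1.0)


# Final 2023 balances from Table 10.
final_balances = {"S&P": 555, "GV-1W": 85, "GV-1L": 2642, "MC-1L": 717}
for name, balance in final_balances.items():
    print(name, round(annualized_return(balance), 1))
```

A final balance below the initial $100, as for the “Winner” strategies, corresponds to a negative annualized return.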
Comparison of returns and machine learning metrics
A detailed breakdown of annual returns for 23 years for these strategies is presented in Table 11. A detailed Table 13 of machine learning metrics is presented in the Appendix.
Comparison of annual returns of investment strategies.
Buy-and-Hold
Growth-Value
Market-Cash
Year
S&P
G
V
GV-1W
GV-1L
MC-1W
MC-1L
2001
11.8
25.9
5.6
40.2
17.0
7.1
5.0
2002
21.6
32.1
18.0
58.5
34.2
28.0
8.9
2003
28.2
28.3
25.2
6.1
51.4
8.9
17.8
2004
10.7
5.3
13.2
0.7
18.4
5.9
4.5
2005
4.8
2.8
5.4
1.2
9.6
5.4
10.8
2006
15.8
9.0
21.6
11.4
19.0
11.1
4.3
2007
5.1
10.8
1.4
0.2
12.5
11.9
19.3
2008
36.8
37.4
36.3
45.7
26.5
42.8
10.6
2009
26.4
37.0
17.1
19.3
34.4
5.0
20.3
2010
15.1
16.2
15.5
9.4
22.7
1.4
13.5
2011
1.9
4.6
0.7
4.9
9.2
5.5
3.4
2012
16.0
14.2
17.2
1.8
36.3
15.2
0.7
2013
32.3
32.6
31.8
24.6
40.3
14.1
16.0
2014
13.5
14.8
12.2
10.2
16.9
2.3
11.0
2015
1.2
5.1
3.2
2.7
4.6
1.0
0.3
2016
12.0
6.8
17.1
12.6
11.1
5.0
17.9
2017
21.7
27.2
15.4
21.1
21.2
7.4
13.4
2018
4.6
0.1
9.0
2.5
6.7
9.2
12.6
2019
31.2
30.8
31.7
37.1
25.7
20.1
9.2
2020
18.3
33.5
1.4
27.7
6.0
13.1
36.1
2021
28.7
32.0
24.9
41.7
16.3
12.8
14.1
2022
18.2
29.4
5.3
23.1
13.1
12.4
6.6
2023
26.2
30.0
22.2
30.0
22.2
8.3
16.5
32.3
37.0
31.8
41.7
51.4
20.1
36.1
36.8
37.4
36.3
58.5
26.5
42.8
12.6
13.5
10.8
13.2
6.1
17.0
5.0
10.8
9.4
9.4
8.5
3.1
16.6
0.1
9.5
18.3
22.3
16.6
25.4
17.4
14.6
10.8
Yearly confusion matrix statistics for strategies.
GV-1L
MC-1L
Confusion Matrix
True Label
Confusion Matrix
True Label
Year
TP
FP
TN
FN
TP
FP
TN
FN
2001
67
60
67
54
121
127
64
59
64
61
125
123
2002
81
54
80
37
118
134
73
60
72
47
120
132
2003
73
52
73
54
127
125
66
42
66
78
144
108
2004
70
61
71
50
120
132
60
46
61
85
145
107
2005
66
65
66
55
121
131
65
47
65
75
140
112
2006
67
76
66
42
109
142
58
52
58
83
141
110
2007
71
42
71
67
138
113
71
41
71
68
139
112
2008
66
61
67
59
125
128
63
65
62
63
126
127
2009
67
40
67
78
145
107
62
48
63
79
141
111
2010
68
59
68
57
125
127
63
43
62
84
147
105
2011
70
56
69
57
127
125
59
56
60
77
136
116
2012
75
59
75
41
116
134
57
55
57
82
139
111
2013
70
49
70
63
133
119
64
39
64
85
149
103
2014
66
46
66
74
140
112
65
37
66
84
149
103
2015
66
48
67
71
137
115
66
65
66
55
121
131
2016
63
60
63
66
129
123
74
40
74
64
138
114
2017
66
40
66
79
145
106
67
40
67
77
144
107
2018
60
48
59
84
144
107
60
57
59
75
135
116
2019
58
70
59
65
123
129
57
45
57
93
150
102
2020
57
46
57
93
150
103
72
35
72
74
146
107
2021
59
68
59
66
125
127
63
42
64
83
146
106
2022
64
75
63
49
113
138
61
80
61
49
110
141
2023
62
45
63
80
142
108
63
46
63
78
141
109
81
76
80
93
150
142
74
80
74
93
150
141
57
40
57
37
109
103
57
35
56
47
110
102
66
56
67
63
127
125
63
46
64
77
141
111
67
56
67
63
129
122
64
50
64
74
138
114
6
11
5
14
11
11
5
11
5
12
11
10
Comparison of machine learning metrics.
Recall, Specificity, and Accuracy
Precision, NPV, and Prevalence
TPR
TNR
ACC
PPV
NPV
Year
GV
MC
GV
MC
GV
MC
GV
MC
GV
MC
GV
MC
2001
55
51
53
52
54
52
53
52
55
51
49
50
2002
69
61
60
55
64
58
60
55
68
61
47
48
2003
57
46
58
61
58
52
58
61
57
46
50
57
2004
58
41
54
57
56
48
53
57
59
42
48
58
2005
55
46
50
58
52
52
50
58
55
46
48
56
2006
61
41
46
53
53
46
47
53
61
41
43
56
2007
51
51
63
63
57
57
63
63
51
51
55
55
2008
53
50
52
49
53
49
52
49
53
50
49
50
2009
46
44
63
57
53
50
63
56
46
44
58
56
2010
54
43
54
59
54
50
54
59
54
42
50
58
2011
55
43
55
52
55
47
56
51
55
44
50
54
2012
65
41
56
50
60
45
56
51
65
41
46
56
2013
53
43
59
62
56
51
59
62
53
43
53
59
2014
47
44
59
64
52
52
59
64
47
44
56
59
2015
48
55
58
50
53
52
58
50
49
55
54
48
2016
49
54
51
65
50
59
51
65
49
54
51
55
2017
46
47
62
63
53
53
62
63
46
47
58
57
2018
42
44
55
51
47
47
56
51
41
44
57
54
2019
47
38
46
56
46
45
45
56
48
38
49
60
2020
38
49
55
67
45
57
55
67
38
49
59
58
2021
47
43
46
60
47
50
46
60
47
44
50
58
2022
57
55
46
43
51
49
46
43
56
55
45
44
2023
44
45
58
58
50
50
58
58
44
45
57
56
69
61
63
67
64
59
63
67
68
61
59
60
38
38
46
43
45
45
45
43
38
38
43
44
53
45
55
57
53
50
56
57
53
45
50
56
52
47
55
57
53
51
55
57
52
47
51
55
7
6
5
6
4
4
5
6
7
6
5
4
Before discussing the overall results, let us focus on 2023 and illustrate the computations of returns for Growth-Value and Market-Cash strategies using the data for 2023 for machine learning and returns from Tables 13 and 14. To that end, recall the general equation (13) linking the returns of a strategy with machine learning metrics is ML-return
Daily risk and return details.
St. Dev (risk)
Average Log Return for True Labels
Buy-and-Hold
GV
MC
Year
2001
1.39
2.58
1.09
1.51
1.67
0.00
0.04
1.00
1.12
2002
1.67
2.09
1.55
1.09
1.25
0.08
0.08
1.25
1.32
2003
1.04
1.15
1.04
0.49
0.30
0.09
0.27
0.79
0.83
2004
0.70
0.70
0.66
0.26
0.20
0.11
0.19
0.51
0.60
2005
0.65
0.64
0.61
0.24
0.20
0.07
0.11
0.48
0.56
2006
0.63
0.68
0.62
0.29
0.16
0.05
0.10
0.47
0.47
2007
1.00
0.92
1.05
0.00
0.09
0.32
0.41
0.67
0.78
2008
2.60
2.35
2.57
0.34
0.03
1.06
0.68
1.52
1.87
2009
1.68
1.61
1.75
0.18
0.05
0.31
0.57
1.16
1.26
2010
1.13
1.19
1.12
0.35
0.22
0.00
0.11
0.73
0.88
2011
1.45
1.36
1.49
0.08
0.12
0.48
0.49
0.96
1.11
2012
0.80
0.76
0.88
0.04
0.06
0.34
0.42
0.59
0.60
2013
0.70
0.69
0.66
0.27
0.06
0.04
0.18
0.55
0.52
2014
0.71
0.77
0.65
0.35
0.31
0.11
0.03
0.48
0.58
2015
0.97
1.00
0.97
0.17
0.16
0.14
0.14
0.75
0.68
2016
0.82
0.84
0.83
0.20
0.16
0.08
0.21
0.57
0.59
2017
0.42
0.45
0.45
0.17
0.01
0.10
0.27
0.33
0.27
2018
1.07
1.23
0.95
0.50
0.68
0.08
0.20
0.67
0.82
2019
0.79
0.81
0.80
0.22
0.00
0.05
0.26
0.57
0.57
2020
2.10
2.18
2.21
0.38
0.28
0.37
0.55
1.21
1.49
2021
0.82
1.03
0.82
0.62
0.39
0.12
0.29
0.63
0.63
2022
1.53
1.93
1.21
1.25
1.27
0.47
0.43
1.28
1.14
2023
0.82
0.84
0.84
0.23
0.06
0.06
0.27
0.66
0.64
2.60
2.58
2.57
1.54
0.12
0.48
0.71
1.56
0.27
0.42
0.45
0.45
0.31
1.64
1.02
0.42
0.33
1.84
0.97
1.00
0.95
0.26
0.16
0.08
0.22
0.67
0.68
1.11
1.21
1.08
0.38
0.30
0.12
0.21
0.78
0.83
0.53
0.61
0.52
0.42
0.47
0.28
0.26
0.32
0.37
For Growth-Value GV-1L strategy, is the S&P-500 Growth and is the S&P-500 Value index. For Market-Cash, is the S&P-500 index and is cash.
Growth-Value: From Table 14, the average log returns for GV-1L in 2023 were , , , and . For confusion matrix counts, from Table 12 we obtain: , , , and . Substituting these values in the above equation for , we get
Market-Cash: From Table 14, the average log returns for MC-1L in 2023 were , and . For confusion matrix counts, from Table 12 we have , , , and . Substituting these values in above equation for we get:
We now examine the returns data in Table 11. The outperformance of GV-1L strategy over other strategies is very significant - about 700 basis points higher than buy-and-hold strategies and about 600 basis points higher than MC-1L. This is somewhat unexpected: intuitively, we would expect the opposite since in Growth-Value strategies we are always invested and the indices themselves are correlated, whereas in Market-Cash we seek to avoid losses by having a cash position. However, these results tell us that our intuition is wrong. One plausible explanation for this is that on most days markets overreact to news and this is somewhat corrected in the next day(s).
We also observe that the annual return’s standard deviation of the MC-1L strategy is very stable at 10.8, much better than GV-1L at 17.4, Buy-and-hold strategies investing in the three indices S&P-500 at 18.3, S&P-Growth at 22.3, and S&P-Value at 16.6. This stability makes MC-1L a viable option for those seeking a more stable return.
Therefore, we remove the “winner” strategies from the analysis and focus on a comparison of Growth-Value and Market-Cash “loser” strategies. We first consider the case .
In 23 years from 2001 to 2023, compared with MC-1L, the GV-1L strategy has a higher True Positive Rate in 17 years, higher TNR in 10 years, and higher overall accuracy in 18 years. The difference in TPR was quite significant (53 vs. 45 median values), whereas the underperformance in TNR was minor (55 vs. 57), as was the difference in overall accuracy (53 vs. 51). Nevertheless, the significant overperformance in TPR resulted in significant overperformance of the GV-1L strategy.
One exception to this is the year 2020. In that year, the GV-1L strategy returned 6% (vs. median return ) whereas MC-1L returned 36.1% (vs. median return ). We can explain this by examining the machine learning metrics in Table 13 and average benchmark returns in Table 14.
Growth-Value GV-1L: for most years, the number of “+” and “−” days is about equal, with median values of and , respectively. However, in 2020, the number of “+” days increased significantly to and the number of “−” days decreased significantly to . At the same time, as seen from Table 13, for that year the True Positive Rate dropped to compared to its median . This resulted in , much lower than the median value of . The True Negative Rate remained at its typical (median) value of . As seen from Table 11, although the value of was higher than the median value of , the decrease in and the number of “+” days resulted in big drop in returns for this strategy. Also, 2020 was one of the few years when and , compared to typical values, the overall decline in to 57 contributed to significant underperformance of GV-1L strategy for that year with unusually low return of compared to the typical annual return of
Market-Cash MC-1L: in 2020, the number of “+” days and the number of “−” days were close to the median values of and . As seen from Table 13, for that year the True Positive Rate was compared to its median . This resulted in , much higher than the median value of . The True Negative Rate was compared to its median value . This resulted in , much higher than the median value . As seen from Table 11, for 2020 the returns and were twice the median values of and respectively. However, a much higher value for means that we remained in cash positions for more days, and a much higher value for means that we took more advantage of higher returns on the “+” days. The number of correctly predicted true labels was also significantly higher than in a typical year. All this resulted in an unusually high return of for the MC-1L strategy, compared to the typical value of .
The above discussion illustrates how one can connect machine learning and return statistics and explain the difference in strategy performance.
Comparison of volatility and drawdowns
Next, we consider the volatility. We saw in Section “Analysis of volatility and Sharpe ratios by corresponding machine learning metrics” that increasing TPR increases both the returns and the volatility, whereas increasing TNR increases the return by a smaller amount and decreases volatility. A detailed comparison of volatility and drawdowns is presented in the Appendix in Table 15. A summary of median values is presented in Table 15.
Annual maximum drawdown and volatility of investment strategies.
Maximum Drawdowns
Annual Volatility
Buy-and-Hold
day
Buy-and-Hold
day
Year
S&P
G
V
GV
MC
S&P
G
V
GV
MC
2001
28.8
48.6
18.1
32.0
19.7
21.9
40.7
17.2
33.4
16.2
2002
33.0
40.1
31.2
26.4
18.0
26.4
33.1
24.5
29.2
19.5
2003
13.7
12.8
15.3
13.9
6.5
16.5
18.2
16.5
17.2
10.9
2004
7.5
10.8
7.5
6.5
6.6
11.1
11.1
10.5
11.2
7.4
2005
7.0
8.0
6.2
5.6
4.9
10.3
10.2
9.8
10.1
7.1
2006
7.6
9.0
7.4
7.7
6.8
10.0
10.7
9.8
10.6
7.0
2007
9.9
9.0
12.0
10.1
5.7
15.9
14.6
16.7
15.9
11.0
2008
47.1
48.0
47.0
41.8
20.9
41.4
37.4
40.9
39.9
33.3
2009
27.1
23.0
30.6
28.4
14.7
26.6
25.5
27.8
26.9
18.9
2010
15.7
16.3
14.5
14.3
8.0
17.9
18.8
17.8
18.1
14.6
2011
18.6
16.5
21.9
16.8
16.1
23.0
21.5
23.7
22.2
16.3
2012
9.7
8.3
11.2
6.3
9.2
12.7
12.0
13.9
13.1
9.5
2013
5.6
5.8
5.1
4.3
4.7
11.1
10.9
10.4
10.8
7.7
2014
7.3
7.4
7.4
7.4
4.2
11.2
12.2
10.3
11.3
8.0
2015
11.9
11.8
13.6
13.2
11.7
15.4
15.8
15.3
15.7
12.6
2016
9.2
9.5
8.6
10.8
3.8
13.1
13.4
13.3
13.8
8.8
2017
2.6
2.4
4.4
3.3
1.8
6.7
7.2
7.2
7.2
4.9
2018
19.3
20.6
19.2
19.5
19.9
17.0
19.5
15.1
17.1
14.0
2019
6.6
6.4
7.7
7.3
6.6
12.5
12.9
12.8
13.0
9.1
2020
33.7
31.3
36.9
36.0
17.6
33.5
34.7
35.2
35.3
26.3
2021
5.1
8.7
5.8
7.9
4.3
13.0
16.3
13.0
15.3
9.4
2022
24.5
32.3
17.9
19.5
17.0
24.2
30.6
19.2
26.3
18.0
2023
10.0
9.1
10.9
9.8
7.0
13.0
13.3
13.3
13.5
9.3
2.6
2.4
4.4
3.3
1.8
41.4
40.7
40.9
39.9
33.3
47.1
48.6
47.0
41.8
20.9
6.7
7.2
7.2
7.2
4.9
10.0
10.8
12.0
10.8
7.0
15.4
15.8
15.1
15.7
10.9
15.7
17.2
15.7
15.2
10.2
17.6
19.2
17.1
18.6
13.0
11.5
13.6
11.2
10.8
6.2
8.4
9.7
8.4
9.0
6.8
Not surprisingly, the much higher TPR for GV-1L translated into significantly higher volatility. In fact, in each of the 23 years from 2001 to 2023, the volatility of the GV-1L strategy was higher than the volatility of MC-1L, by about 50% as measured by the median values (15.7 vs. 10.9). For the maximum drawdowns, GV-1L had higher drawdowns than MC-1L in 19 out of 23 years, also by about 50% as measured by the median values ( vs. ).
For volatility, recall the equation (16) linking ML-volatilities of the benchmarks and machine learning metrics:
From the above, we compute the volatilities of the two strategies for 2023. For illustration, we will compute these from the ratios.
Growth-Value: we have , , , and , , . Substituting this in equation (16) we obtain .
Market-Cash: we have , , , and , , . Substituting this in equation (16) we obtain .
We could consider other measures of volatility, for example, the integrated volatility defined by . This integrated volatility can be expressed in terms of the average return and volatility by , and therefore, can be expressed in terms of confusion matrix counts. Similarly, from equation (10) we can consider integrated volatility in terms of logarithmic returns.
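The volatility and maximum drawdown statistics compared in this section can be computed from a series of daily returns. The following is a minimal sketch rather than the authors' implementation: it assumes 252 trading days per year, square-root-of-time scaling (which relies on the independence of daily returns assumed in this paper), and a drawdown measured on the compounded equity curve. The function names and the sample returns are hypothetical.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: number of trading days per year


def annualized_volatility(daily_returns, periods=TRADING_DAYS):
    """Sample standard deviation of daily returns scaled by sqrt(periods)."""
    return statistics.stdev(daily_returns) * math.sqrt(periods)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve,
    returned as a positive fraction (0.5 means a 50% drawdown)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r                         # growth of one unit invested
        peak = max(peak, equity)                  # highest value seen so far
        worst = max(worst, 1.0 - equity / peak)   # decline from that peak
    return worst


# Hypothetical daily returns, for illustration only
sample_returns = [0.10, -0.50, 0.20]
print(annualized_volatility(sample_returns))
print(max_drawdown(sample_returns))
```

Reporting the drawdown as a positive fraction is a presentation choice of this sketch; the sign convention used in the tables may differ.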
Comparison of tracking errors and Sharpe ratios
A detailed comparison of tracking errors and Sharpe ratios is presented in Table 16.
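Table 16. Comparison of tracking errors and Sharpe ratios.

| Year | Return: S&P | Tracking error: G | Tracking error: V | Tracking error: GV | Tracking error: MC | Sharpe ratio: S&P | Sharpe ratio: G | Sharpe ratio: V | Sharpe ratio: GV | Sharpe ratio: MC |
|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | 11.8 | 14.1 | 6.2 | 28.8 | 6.7 | 0.5 | 0.6 | 0.3 | 0.5 | 0.3 |
| 2002 | 21.6 | 10.5 | 3.6 | 55.7 | 30.5 | 0.8 | 1.0 | 0.7 | 1.2 | 0.5 |
| 2003 | 28.2 | 0.2 | 3.0 | 23.3 | 10.4 | 1.7 | 1.6 | 1.5 | 3.0 | 1.6 |
| 2004 | 10.7 | 5.4 | 2.5 | 7.7 | 6.2 | 1.0 | 0.5 | 1.3 | 1.6 | 0.6 |
| 2005 | 4.8 | 2.1 | 0.5 | 4.8 | 6.0 | 0.5 | 0.3 | 0.5 | 1.0 | 1.5 |
| 2006 | 15.8 | 6.8 | 5.7 | 3.1 | 11.6 | 1.6 | 0.8 | 2.2 | 1.8 | 0.6 |
| 2007 | 5.1 | 5.6 | 3.7 | 7.4 | 14.2 | 0.3 | 0.7 | 0.1 | 0.8 | 1.8 |
| 2008 | 36.8 | 0.6 | 0.5 | 10.3 | 47.3 | 0.9 | 1.0 | 0.9 | 0.7 | 0.3 |
| 2009 | 26.4 | 10.6 | 9.3 | 8.1 | 6.1 | 1.0 | 1.5 | 0.6 | 1.3 | 1.1 |
| 2010 | 15.1 | 1.2 | 0.4 | 7.7 | 1.6 | 0.8 | 0.9 | 0.9 | 1.3 | 0.9 |
| 2011 | 1.9 | 2.8 | 2.6 | 7.4 | 5.3 | 0.1 | 0.2 | 0.0 | 0.4 | 0.2 |
| 2012 | 16.0 | 1.8 | 1.2 | 20.3 | 15.3 | 1.3 | 1.2 | 1.2 | 2.8 | 0.1 |
| 2013 | 32.3 | 0.3 | 0.5 | 7.9 | 16.3 | 2.9 | 3.0 | 3.0 | 3.7 | 2.1 |
| 2014 | 13.5 | 1.3 | 1.3 | 3.4 | 2.5 | 1.2 | 1.2 | 1.2 | 1.5 | 1.4 |
| 2015 | 1.2 | 3.8 | 4.4 | 3.3 | 1.0 | 0.1 | 0.3 | 0.2 | 0.3 | 0.0 |
| 2016 | 12.0 | 5.2 | 5.1 | 0.9 | 5.9 | 0.9 | 0.5 | 1.3 | 0.8 | 2.0 |
| 2017 | 21.7 | 5.5 | 6.3 | 0.5 | 8.3 | 3.2 | 3.8 | 2.1 | 3.0 | 2.7 |
| 2018 | 4.6 | 4.5 | 4.4 | 2.2 | 8.1 | 0.3 | 0.0 | 0.6 | 0.4 | 0.9 |
| 2019 | 31.2 | 0.4 | 0.5 | 5.5 | 22.0 | 2.5 | 2.4 | 2.5 | 2.0 | 1.0 |
| 2020 | 18.3 | 15.1 | 17.0 | 12.3 | 17.8 | 0.5 | 1.0 | 0.0 | 0.2 | 1.4 |
| 2021 | 28.7 | 3.3 | 3.8 | 12.4 | 14.6 | 2.2 | 2.0 | 1.9 | 1.1 | 1.5 |
| 2022 | 18.2 | 11.2 | 12.9 | 5.1 | 11.5 | 0.8 | 1.0 | 0.3 | 0.5 | 0.4 |
| 2023 | 26.2 | 3.8 | 4.0 | 3.9 | 9.7 | 2.0 | 2.3 | 1.7 | 1.6 | 1.8 |
| Max | 32.3 | 15.1 | 12.9 | 55.7 | 47.3 | 3.2 | 3.8 | 3.0 | 3.7 | 2.7 |
| Min | 36.8 | 14.1 | 17.0 | 12.4 | 22.0 | 0.9 | 1.0 | 0.9 | 0.7 | 0.9 |
| Median | 13.5 | 0.3 | 0.5 | 5.1 | 5.3 | 0.9 | 0.8 | 0.9 | 1.2 | 1.0 |
| Mean | 9.4 | 0.0 | 0.9 | 7.2 | 0.0 | 0.9 | 0.9 | 0.8 | 1.2 | 0.9 |
| Std. dev. | 18.3 | 6.8 | 6.0 | 14.4 | 16.1 | 1.2 | 1.2 | 1.1 | 1.1 | 0.9 |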
In 17 out of 23 years, the tracking error of GV-1L was superior to that of Market-Cash; the corresponding median and mean values are reported in Table 16. The Market-Cash strategy offered very little advantage over S&P Buy-and-Hold. In 2008, it outperformed the index by 47.3%. Without this year, the strategy would probably result in a loss. By contrast, the Growth-Value strategy would still outperform the benchmark even if we were to remove its best year, 2002.
Next, we consider the Sharpe ratios. First, we illustrate the computation for 2023 using the previous results that we computed using machine learning and return statistics:
Growth-Value: and . This gives ML-Sharpe’s ratio
Market-Cash: and . This gives ML-Sharpe’s ratio
We now examine Sharpe ratios in more detail. From Table 16, GV-1L has the highest Sharpe ratio, with a median and mean value of 1.2. This value is higher than that of the benchmarks (0.8–0.9 range). By contrast, the Sharpe ratio of MC-1L is lower than that of GV-1L and is comparable to that of the benchmarks. In fact, in 15 out of 23 years, the Sharpe ratio of GV-1L was higher than that of MC-1L.
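As a quick check on the Sharpe ratios discussed above, the ratio with the risk-free rate ignored is simply the annual return divided by the annualized volatility. The sketch below assumes this definition. It also assumes that the tracking-error columns of Table 16 report the difference between a strategy's annual return and that of the S&P-500; the table suggests this reading, but it is an assumption. The function names are hypothetical, and the inputs 26.2 and 13.0 are the 2023 S&P-500 return and volatility reported in the tables of this section.

```python
def sharpe_ratio(annual_return, annual_volatility):
    """Sharpe ratio with the risk-free rate ignored."""
    return annual_return / annual_volatility


def return_difference(strategy_return, benchmark_return):
    """Excess of the strategy's annual return over the benchmark's
    (assumed here to be what the tracking-error columns report)."""
    return strategy_return - benchmark_return


# S&P-500 buy-and-hold in 2023: return 26.2, volatility 13.0 (both in percent)
print(round(sharpe_ratio(26.2, 13.0), 1))

# Hypothetical strategy returning 30.0 against a benchmark returning 26.2
print(round(return_difference(30.0, 26.2), 1))
```

The first value rounds to the S&P-500 Sharpe ratio listed for 2023 in Table 16, which supports the return-over-volatility reading.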
Comparison of strategies by return efficiency ratios
In the previous sections, we compared the strategies by examining returns, volatility, and Sharpe’s ratio. We now ask the question: how efficient are the strategies, and do they outperform a simple random flip strategy? To that end, we compute the return efficiency index of the strategies and compare it to the return efficiency index of random flip.
The detailed comparison is presented in the Appendix in Table 17. Over the 23 years from 2001 to 2023, in terms of the Return Efficiency Index, GV-1L outperformed the random flip strategy in 16 years, with a mean difference of 0.047 in return efficiency ratios. MC-1L outperformed the random flip in 14 years, by a much smaller mean difference of 0.013. The median value of the Return Efficiency Index for GV-1L was 0.41, about 25% higher than the median value of 0.32 for MC-1L. Not only does GV-1L give a higher return and Sharpe ratio, but it is also more efficient as a strategy in capturing the potential return range.
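To make the comparison concrete, the index can be sketched as the position of a strategy's return within the range spanned by the worst and the best possible strategies on the same pair of assets. The code below is a hedged illustration, not the paper's exact computation: it assumes returns are compounded over the period, that the best (worst) strategy holds the better (worse) asset every day, and that a random flip picks either asset with probability 0.5. All names and sample data are hypothetical.

```python
import math
import random


def compound(daily_returns):
    """Compounded return over the period."""
    return math.prod(1.0 + r for r in daily_returns) - 1.0


def strategy_returns(positions, returns_a, returns_b):
    """Daily returns when holding asset A on days where positions[t] is True,
    and asset B otherwise."""
    return [a if hold_a else b
            for hold_a, a, b in zip(positions, returns_a, returns_b)]


def return_efficiency_index(positions, returns_a, returns_b):
    """Fraction of the worst-to-best return range captured by the strategy.
    Undefined (division by zero) if both assets have identical returns."""
    best = compound(max(a, b) for a, b in zip(returns_a, returns_b))
    worst = compound(min(a, b) for a, b in zip(returns_a, returns_b))
    realized = compound(strategy_returns(positions, returns_a, returns_b))
    return (realized - worst) / (best - worst)


# Hypothetical daily returns of two assets, for illustration only
returns_a = [0.01, -0.02, 0.03]
returns_b = [-0.01, 0.01, 0.00]

rng = random.Random(0)  # fixed seed so the random flip is reproducible
random_flip = [rng.random() < 0.5 for _ in returns_a]

print(return_efficiency_index([True, False, True], returns_a, returns_b))  # ideal strategy
print(return_efficiency_index(random_flip, returns_a, returns_b))          # random flip
```

By construction the ideal strategy scores 1 and the worst strategy scores 0, so every other strategy, including buy-and-hold and the random flip, falls in between.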
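Table 17. Return efficiency ratios (C denotes cash).

| Year | Buy-and-Hold: S&P | Buy-and-Hold: G | Buy-and-Hold: V | Buy-and-Hold: C | Growth-Value: GV-1L | Growth-Value: Rand. | Market-Cash: MC-1L | Market-Cash: Rand. |
|---|---|---|---|---|---|---|---|---|
| 2001 | 0.19 | 0.11 | 0.14 | 0.23 | 0.18 | 0.13 | 0.21 | 0.21 |
| 2002 | 0.14 | 0.17 | 0.21 | 0.19 | 0.38 | 0.19 | 0.21 | 0.17 |
| 2003 | 0.32 | 0.33 | 0.32 | 0.22 | 0.45 | 0.32 | 0.28 | 0.27 |
| 2004 | 0.37 | 0.35 | 0.42 | 0.30 | 0.47 | 0.38 | 0.33 | 0.33 |
| 2005 | 0.36 | 0.39 | 0.42 | 0.33 | 0.47 | 0.40 | 0.40 | 0.34 |
| 2006 | 0.42 | 0.34 | 0.51 | 0.30 | 0.48 | 0.42 | 0.33 | 0.36 |
| 2007 | 0.30 | 0.46 | 0.35 | 0.28 | 0.48 | 0.40 | 0.37 | 0.29 |
| 2008 | 0.08 | 0.28 | 0.29 | 0.13 | 0.37 | 0.29 | 0.15 | 0.11 |
| 2009 | 0.21 | 0.41 | 0.29 | 0.15 | 0.39 | 0.35 | 0.20 | 0.18 |
| 2010 | 0.30 | 0.40 | 0.39 | 0.24 | 0.46 | 0.40 | 0.29 | 0.27 |
| 2011 | 0.22 | 0.41 | 0.35 | 0.21 | 0.45 | 0.38 | 0.20 | 0.21 |
| 2012 | 0.37 | 0.37 | 0.40 | 0.28 | 0.57 | 0.39 | 0.28 | 0.32 |
| 2013 | 0.44 | 0.43 | 0.42 | 0.25 | 0.53 | 0.43 | 0.34 | 0.34 |
| 2014 | 0.39 | 0.44 | 0.40 | 0.30 | 0.46 | 0.42 | 0.37 | 0.34 |
| 2015 | 0.29 | 0.46 | 0.35 | 0.29 | 0.45 | 0.41 | 0.29 | 0.29 |
| 2016 | 0.36 | 0.35 | 0.45 | 0.29 | 0.39 | 0.40 | 0.40 | 0.33 |
| 2017 | 0.54 | 0.49 | 0.35 | 0.29 | 0.41 | 0.42 | 0.44 | 0.41 |
| 2018 | 0.27 | 0.40 | 0.32 | 0.29 | 0.35 | 0.36 | 0.23 | 0.28 |
| 2019 | 0.42 | 0.41 | 0.42 | 0.25 | 0.35 | 0.42 | 0.30 | 0.33 |
| 2020 | 0.17 | 0.34 | 0.22 | 0.14 | 0.23 | 0.28 | 0.21 | 0.16 |
| 2021 | 0.39 | 0.30 | 0.28 | 0.24 | 0.24 | 0.29 | 0.32 | 0.32 |
| 2022 | 0.16 | 0.21 | 0.33 | 0.21 | 0.29 | 0.27 | 0.19 | 0.18 |
| 2023 | 0.38 | 0.44 | 0.37 | 0.25 | 0.37 | 0.40 | 0.33 | 0.31 |
| Max | 0.54 | 0.49 | 0.51 | 0.33 | 0.57 | 0.43 | 0.44 | 0.41 |
| Min | 0.08 | 0.11 | 0.14 | 0.13 | 0.18 | 0.13 | 0.15 | 0.11 |
| Median | 0.32 | 0.39 | 0.35 | 0.25 | 0.41 | 0.39 | 0.29 | 0.29 |
| Mean | 0.31 | 0.36 | 0.35 | 0.25 | 0.4 | 0.35 | 0.29 | 0.28 |
| Std. dev. | 0.11 | 0.1 | 0.08 | 0.05 | 0.1 | 0.08 | 0.08 | 0.08 |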
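A comparison of simple vs. logarithmic returns.

| Year | B&H | True label: "+" days | True label: "−" days | Average daily return: "+" days | Average daily return: "−" days | App. | Err. (%) | Average log return: "+" days | Average log return: "−" days | App. | Err. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | 11.76 | 125 | 123 | 1.01 | 1.11 | 10.11 | 14.01 | 1.00 | 1.12 | 12.51 | 6.37 |
| 2002 | 21.58 | 120 | 132 | 1.26 | 1.30 | 20.83 | 3.48 | 1.25 | 1.32 | 24.31 | 12.65 |
| 2003 | 28.18 | 144 | 108 | 0.80 | 0.82 | 26.20 | 7.04 | 0.79 | 0.83 | 24.83 | 11.88 |
| 2004 | 10.70 | 145 | 107 | 0.52 | 0.60 | 10.78 | 0.81 | 0.51 | 0.60 | 10.16 | 5.06 |
| 2005 | 4.83 | 140 | 112 | 0.48 | 0.56 | 5.25 | 8.66 | 0.48 | 0.56 | 4.72 | 2.34 |
| 2006 | 15.85 | 141 | 110 | 0.47 | 0.47 | 15.21 | 4.01 | 0.47 | 0.47 | 14.71 | 7.17 |
| 2007 | 5.15 | 139 | 112 | 0.67 | 0.78 | 6.28 | 21.99 | 0.67 | 0.79 | 5.02 | 2.54 |
| 2008 | 36.79 | 126 | 127 | 1.56 | 1.84 | 37.36 | 1.53 | 1.52 | 1.87 | 45.87 | 24.67 |
| 2009 | 26.35 | 141 | 111 | 1.17 | 1.24 | 26.93 | 2.19 | 1.16 | 1.26 | 23.39 | 11.25 |
| 2010 | 15.06 | 147 | 105 | 0.73 | 0.88 | 15.63 | 3.82 | 0.73 | 0.88 | 14.03 | 6.79 |
| 2011 | 1.89 | 136 | 116 | 0.97 | 1.10 | 4.53 | 138.94 | 0.96 | 1.11 | 1.89 | 0.47 |
| 2012 | 15.99 | 139 | 111 | 0.59 | 0.60 | 15.64 | 2.20 | 0.59 | 0.60 | 14.83 | 7.23 |
| 2013 | 32.31 | 149 | 103 | 0.55 | 0.52 | 28.62 | 11.41 | 0.55 | 0.52 | 28.00 | 13.32 |
| 2014 | 13.46 | 149 | 103 | 0.49 | 0.58 | 13.27 | 1.47 | 0.48 | 0.58 | 12.64 | 6.12 |
| 2015 | 1.23 | 121 | 131 | 0.75 | 0.68 | 2.41 | 95.63 | 0.75 | 0.68 | 1.22 | 1.12 |
| 2016 | 12.00 | 138 | 114 | 0.57 | 0.59 | 12.18 | 1.56 | 0.57 | 0.59 | 11.33 | 5.56 |
| 2017 | 21.71 | 144 | 107 | 0.33 | 0.27 | 19.88 | 8.42 | 0.33 | 0.27 | 19.64 | 9.49 |
| 2018 | 4.57 | 135 | 116 | 0.68 | 0.81 | 3.23 | 29.27 | 0.67 | 0.82 | 4.67 | 2.22 |
| 2019 | 31.22 | 150 | 102 | 0.57 | 0.56 | 27.97 | 10.42 | 0.57 | 0.57 | 27.17 | 12.97 |
| 2020 | 18.33 | 146 | 107 | 1.23 | 1.46 | 22.47 | 22.57 | 1.21 | 1.49 | 16.84 | 8.16 |
| 2021 | 28.73 | 146 | 106 | 0.63 | 0.62 | 26.11 | 9.12 | 0.63 | 0.63 | 25.26 | 12.09 |
| 2022 | 18.18 | 110 | 141 | 1.29 | 1.13 | 17.13 | 5.73 | 1.28 | 1.14 | 20.06 | 10.39 |
| 2023 | 26.18 | 141 | 109 | 0.66 | 0.63 | 24.11 | 7.91 | 0.66 | 0.64 | 23.25 | 11.19 |
| Max | 32.31 | 150 | 141 | 1.56 | 0.27 | 28.62 | 138.94 | 1.56 | 0.27 | 28.00 | 24.67 |
| Min | 36.79 | 110 | 102 | 0.33 | 1.84 | 37.36 | 0.81 | 0.33 | 1.84 | 45.87 | 0.47 |
| Median | 13.46 | 141 | 111 | 0.67 | 0.68 | 13.27 | 7.91 | 0.67 | 0.68 | 12.64 | 7.23 |
| Mean | 9.40 | 138 | 114 | 0.78 | 0.83 | 9.34 | 17.92 | 0.78 | 0.83 | 7.46 | 8.31 |
| Std. dev. | 18.25 | 10.6 | 10.4 | 0.32 | 0.37 | 17.40 | 32.88 | 0.32 | 0.37 | 18.60 | 5.37 |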
For ML-Return Efficiency Ratios, recall the equation linking ratios of the benchmarks and machine learning metrics:
Let us illustrate the computation of the Return Efficiency Ratios of the two strategies for 2023:
Growth-Value: we have , , , and . Substituting this in the above equation, we obtain .
From equation (25) we compute the probability of the equivalent -strategy:
Market-Cash: we have , , , and . Substituting this in the above equation, we obtain .
From equation (25) we compute the probability of the equivalent -strategy:
By the return efficiency index, the GV-1L strategy is superior to a random flip strategy, whereas the MC-1L strategy is inferior to a random flip strategy. Although the and are similar for both strategies, the return efficiency indices of the benchmarks are quite different.
Choosing the number of nearest neighbors and transaction costs
In nearest-neighbor algorithms, one of the important hyperparameters (besides the distance metric) is k, the number of neighbors to use. Since the decision on predicting labels is made by the majority (minority) of true labels, k must be odd; in the simplest case, k = 1. The value of k is determined by comparing the results for different k.
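The role of k can be illustrated with a small majority-vote predictor. This is a minimal sketch under stated assumptions, not the strategy's actual implementation: the feature vectors are hypothetical one-dimensional patterns, the distance is Euclidean, and only the majority rule is shown (the minority rule would return the other label).

```python
import math
from collections import Counter


def knn_predict_label(history, query, k=1):
    """Predict a "+" or "-" label for `query` by majority vote among the
    k nearest past patterns.

    history: list of (feature_vector, true_label) pairs from past trading days.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd to avoid tied votes between two labels")
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history of (pattern, true label) pairs, for illustration only
history = [((0.0,), "+"), ((1.0,), "-"), ((1.1,), "-"), ((0.2,), "+"), ((0.9,), "-")]

for k in (1, 3, 5):
    print(k, knn_predict_label(history, (0.1,), k=k))
```

With k equal to the size of the history, the vote no longer depends on the query and simply returns the most frequent label, which is the large-k behavior discussed in this section.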
We start by comparing the growth for different k. The results are presented in Figures 7, 8, and 9. As we increase k, the performance of the Growth-Value strategy degrades rapidly. Intuitively, we can explain this as follows: in daily movements, the market overreacts to positive and negative news, and this is corrected in the very near term (the next day). However, this effect is very short-lived. For very large k, we end up predicting labels according to the proportion of true labels in the past trading days for both strategies. To illustrate this further, consider the annual returns for 2023 for different k:
Growth-Value: annual returns decrease as k increases.
Market-Cash: annual returns also show a decreasing trend as k increases.
Figure 7. Return Efficiency Index for different k.
Figure 8. Annual Return Efficiency Index for trading algorithms.
Figure 9. Growth, maximum drawdown, volatility, and number of transactions for different k.
In 2023, the Growth-Value strategy’s returns were higher for lower k because it capitalized on short-term market overreactions. As k increased, returns decreased due to reduced responsiveness. For the Market-Cash strategy, returns also declined with higher k, showing the market’s lack of memory and short-lived price movements. Over the whole range, both strategies showed diminishing returns with increasing k.
Next, we compare maximum drawdowns (MDD) and volatility, shown in Figure 9. The MDD of both strategies remains stable as k changes, with the MDD of the Growth-Value strategy about 50% higher than that of the Market-Cash strategy. The annualized volatility likewise does not change with k. On average, the annualized volatility of Growth-Value is about 18%, whereas that of Market-Cash is about 13%. The volatility of the Growth-Value strategy is about 50% higher than that of Market-Cash and is similar to the volatility of the S&P-500 index. Since returns decrease for larger k whereas volatility does not change, the risk-adjusted returns as measured by the Sharpe ratio decrease for larger k as well.
We analyzed the differences between the return efficiency index of each strategy and that of a random flip for different values of k. For GV, these differences were 0.047, 0.038, 0.022, 0.016, and 0.017 for the five values of k considered, in increasing order. For MC, the corresponding differences were 0.013, 0.006, 0.008, 0.008, and 0.003. As k increases, the differences for both strategies tend to decrease to 0. This means that for larger k, the return efficiency of both strategies approaches that of a random strategy. However, this convergence is not towards a random flip strategy as such, but most likely towards a random flip with probability equal to the prevalence. For larger k, the probability that a majority label is “+” would correspond to the proportion of “+” labels in the data. A similar situation is often encountered in machine learning, where increasing the number of neighbors does not necessarily increase the accuracy, and for large k the accuracy simply converges to the proportion of labels of some class in the dataset.
Finally, we compare the transaction costs for different k in Figure 9. The number of transactions is comparable for the Growth-Value and Market-Cash strategies and drops rapidly as k increases. For the smallest k, the number of transactions is about 120, indicating that, on average, these strategies would require trading every two days. For larger k, the number of trades drops to about 30, indicating trading every 5-7 days.
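The number of transactions can be read directly off the sequence of daily positions: a trade is needed whenever the asset held changes from one day to the next. The sketch below assumes this definition and a 252-day trading year; the function name and the sample positions are hypothetical.

```python
def count_transactions(positions):
    """Number of days on which the held asset differs from the previous day's."""
    return sum(1 for prev, cur in zip(positions, positions[1:]) if prev != cur)


# Hypothetical position sequence: "G" = hold Growth, "V" = hold Value
positions = ["G", "G", "V", "G", "G"]
print(count_transactions(positions))

# About 120 switches in an assumed 252-day year means a trade roughly every two days
print(252 / 120)
```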
Additional examples on using return efficiency index
In this paper, we presented a detailed analysis of a simple trading strategy based on k-NN (nearest neighbor) classification, which is popular in machine learning (Bishop, 2016). One of our main results is the concept of the “Return Efficiency Index”, which can be used to connect machine learning accuracy and the returns of strategies and to provide a universal metric for comparing these algorithms.
Therefore, we can apply a similar analysis to describe the performance of trading algorithms based on other machine learning algorithms (Joiner et al., 2022). For example, consider two popular deep learning architectures (Ahmed et al., 2022; Hu et al., 2021) for label prediction: LSTM (Long-Short Term Memory) and CNN (Convolutional Neural Networks). In both architectures, we predict the next trading label based on input patterns of 10-day trading periods.
In Figure 8, we compare these deep-learning-based models with k-NN-based models using the Return Efficiency Index. The index captures not only accuracy but also robustness in financial prediction tasks. It is a key metric for comparing the reliability of models.
3-Month vs. 6-month Performance: For 6-month forecasts, CNN and LSTM have nearly identical performance, with average Return Efficiency Index values both rounding to 0.34. For 3-month forecasts, CNN performs slightly better than LSTM (CNN: 0.34, LSTM: 0.33). This shows that both models handle short- and medium-term trends similarly well.
GV-1L: The GV-1L strategy shows the highest Return Efficiency Index at 0.40, particularly excelling in earlier years. However, its value declines over time and gets lower than both LSTM and CNN in later years.
CNN Architecture: The CNN model includes convolutional layers with max-pooling and dense layers.
The Return Efficiency Index is a valuable tool for evaluating models. It links the forecasting accuracy of label prediction (Machine Learning) to trading algorithm financial performance. As such, it could be an additional useful metric for comparing trading algorithms.
Concluding remarks
In this paper, we presented an approach to analyze and explain the performance of algorithmic trading strategies by using an analogy to classification in machine learning. We derived explicit expressions that qualitatively relate strategy statistics to machine learning metrics derived from the underlying confusion matrices. We introduced a new performance metric, the Return Efficiency Index. This index provides a link between the return performance of trading strategies and the accuracy of the strategy as a machine learning classifier. It gives a universal scale for comparing trading strategies in terms of their ability to capture the possible returns and outperform a random flip strategy. We applied our approach to trading strategies designed by analogy to machine learning. Future work will focus on applying other methods and ideas from machine learning to algorithmic trading.
Footnotes
Acknowledgements
We thank Metropolitan College, Boston University, for its support.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contributions
All authors contributed equally to the effort.
Funding
This research was conducted without any external funding. All aspects of the study, including design, data collection, analysis, and interpretation, were carried out using the resources available within the authors’ institution.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
All the relevant data and analysis are available via:
Appendix A: Details on ML-volatility and ML-Sharpe ratio computations
We will use to denote volatility with appropriate subscripts to indicate specific trading strategies or benchmarks, and to represent ML-volatility computed from the confusion matrix. If denotes the standard deviation of daily returns, then the volatility over days is given by . Recall that the standard deviation of simple returns is approximately equal to the standard deviation of logarithmic returns as shown in equation (10). Let denote the prevalence. As before, we assume that daily returns are independent.
Let and denote the standard deviation of daily returns for and , then the volatilities for the buy-hold strategies and we have
For the ideal strategy, we have for ML-volatility
Similarly, for the worst strategy we have
For a generic strategy, we have
Since
we can rewrite the expression for ML-volatility for a generic strategy as
Ignoring the risk-free rate, we can write the expressions relating the Sharpe Ratios with confusion matrix entries. We will use the notation to denote the Sharpe ratio.
For the Buy-and-Hold strategies and we have
The ML-Sharpe’s ratio for the ideal strategy
The ML-Sharpe’s ratio for the worst strategy
Note that although , it is possible that . For example, if the numbers of true positive and true negative labels are the same then and . In this case, from the above two equations we obtain
Although we assume , it is still quite possible to have resulting in .
Finally, the ML-Sharpe’s ratio for a generic strategy is
Appendix B: Detailed tables and figures
This appendix contains tables with detailed annual statistics for the strategies.
References
Agrawal M, Khan A, Kumar S (2019) Stock price prediction using technical indicators: A predictive model using optimal deep learning. International Journal of Recent Technology and Engineering. https://api.semanticscholar.org/CorpusID:219325700.
Ahmed DM, Hassan MM, Mstafa RJ (2022) A review on deep sequential models for forecasting time series data. Applied Computational Intelligence and Soft Computing.
Ashitha B, Sakshi S, Vishal S, et al. (2023) Prediction and sentiment analysis of stock using machine learning. International Journal for Research in Applied Science and Engineering Technology. DOI: https://doi.org/10.22214/ijraset.2023.53169.
Ayala J, García-Torres M, Noguera J, et al. (2021) Technical analysis strategy optimization using a machine learning approach in stock market indices. Knowledge-based Systems 225: 107119. DOI: https://doi.org/10.1016/J.KNOSYS.2021.107119.
Beg MO, Awan MN, Ali A (2019) Algorithmic machine learning for prediction of stock prices. DOI: 10.4018/978-1-5225-7805-5.CH007.
Bishop C (2016) Pattern Recognition and Machine Learning. New York: Springer.
Bitvai Z, Cohn T (2015) Day trading profit maximization with multi-task learning and technical analysis. Machine Learning 101: 187–209. DOI: https://doi.org/10.1007/s10994-014-5480-x.
Buachuen W, Kantavat P (2023) Automated stock trading system using technical analysis and deep learning models. In: Proceedings of the 13th international conference on advances in information technology, pp.1–9. DOI: 10.1145/3628454.3631670.
Chavarnakul T, Enke D (2009) A hybrid stock trading system for intelligent technical analysis-based equivolume charting. Neurocomputing 72: 3517–3528. DOI: https://doi.org/10.1016/j.neucom.2008.11.030.
Chen L (2020) Using machine learning algorithms on prediction of stock price. Journal of Modeling and Optimization 12(2): 84–99. DOI: https://doi.org/10.32732/jmo.2020.12.2.84.
Chen L, Li X, Xie Z (2023) Two-stage attentional temporal convolution and LSTM model for financial data forecasting. In: International conference on electronic information engineering and data processing (EIEDP 2023), volume 12700, pp.122–130. SPIE.
Choudhry R, Garg K (2008) A hybrid machine learning system for stock market forecasting. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering 2: 689–692.
Dash R, Dash P (2016) A hybrid stock trading framework integrating technical analysis with machine learning techniques. The Journal of Finance and Data Science 2: 42–57. DOI: https://doi.org/10.1016/J.JFDS.2016.03.002.
Gerlein E, McGinnity T, Belatreche A, et al. (2016) Evaluating machine learning classification for financial trading: An empirical approach. Expert Systems With Applications 54: 193–207. DOI: https://doi.org/10.1016/j.eswa.2016.01.018.
Grigoryan H (2017) Stock market trend prediction using support vector machines and variable selection methods. In: Proceedings of the 2017 international conference on applied mathematics, modelling and statistics application (AMMSA 2017), pp.210–213. DOI: 10.2991/ammsa-17.2017.45.
Gu S, Kelly BT, Xiu D (2019) Empirical asset pricing via machine learning. In: Chicago Booth Research Paper No. 18-04, 31st Australasian Finance and Banking Conference 2018, Yale ICF Working Paper No. 2018-09. DOI: 10.2139/ssrn.3159577. https://ssrn.com/abstract=3159577.
Hastle T (2018) Elements of Statistical Learning. New York: Springer.
Hsu M, Lessmann S, Sung M, et al. (2016) Bridging the divide in financial market forecasting: Machine learners vs. financial economists. Expert Systems With Applications 61: 215–234. DOI: https://doi.org/10.1016/j.eswa.2016.05.033.
Hu Z, Zhao Y, Khushi M (2021) A survey of forex and stock price prediction using deep learning. Applied System Innovation 4(1): 9.
Hudson R, Gregoriou A (2015) Calculating and comparing security returns is harder than you think: A comparison between logarithmic and simple returns. International Review of Financial Analysis 38: 151–162. DOI: https://doi.org/10.1016/j.irfa.2014.10.008.
Jagadisha N, Raghuram P, Praveen PP, et al. (2022) Stock price movement prediction using machine learning. International Journal of Advanced Research in Science, Communication and Technology. DOI: https://doi.org/10.48175/ijarsct-7774.
Johnson N, Kotz S (1970) Distributions in Statistics. New York: Wiley.
Joiner D, Vezeau A, Wong A, et al. (2022) Algorithmic trading and short-term forecast for financial time series with machine learning models; state of the art and perspectives. In: 2022 IEEE international conference on recent advances in systems science and engineering (RASSE), pp.1–9. DOI: 10.1109/RASSE54974.2022.9989592.
Karthik K (2023) Applications of machine learning in predictive analysis and risk management in trading. International Journal of Innovative Research in Computer Science and Technology 11(6): 18–25. DOI: https://doi.org/10.55524/ijircst.2023.11.6.4.
Khan A, Shah AA, Shahid R, et al. (2023) A performance comparison of machine learning models for stock market prediction with novel investment strategy. PLoS ONE 18(9): e0286362.
Kim H, Won J (2018) An ensemble model integrating machine learning algorithms with technical indicators for stock price prediction. Expert Systems with Applications 107: 123–130. DOI: https://doi.org/10.1016/j.eswa.2018.04.021.
Lumoring N, Chandra D, Agung A, et al. (2023) A systematic literature review: Forecasting stock price using machine learning approach. In: 2023 International conference on data science and its applications (ICoDSA), pp.129–133. DOI: 10.1109/ICoDSA58501.2023.10277318.
Mahfooz S, Iftikhar A, Khan MN (2022) Improving stock trend prediction using LSTM neural network trained on a complex trading strategy. International Journal for Research in Applied Science and Engineering Technology 10(7): 4361–4371. DOI: https://doi.org/10.22214/ijraset.2022.45961.
Meesad P, Boonmatham S (2023) A combination of machine learning-based natural language processing with technical analysis for stock trading. Indonesian Journal of Electrical Engineering and Computer Science 30: 422–434. DOI: https://doi.org/10.11591/ijeecs.v30.i1.pp422-434.
Mndawe ST, Paul BS, Doorsamy W (2022) Development of a stock price prediction framework for intelligent media and technical analysis. Applied Sciences 12(2): 719. DOI: https://doi.org/10.3390/app12020719.
Nousi P, Tsantekidis A, Passalis N, et al. (2018) Machine learning for forecasting mid-price movements using limit order book data. IEEE Access 7: 64722–64736. DOI: https://doi.org/10.1109/ACCESS.2019.2916793.
Ntakaris A, Kanniainen J, Gabbouj M, et al. (2018) Mid-price prediction based on machine learning methods with technical and quantitative indicators. PLoS ONE 15. DOI: https://doi.org/10.2139/ssrn.3213389.
Oyewola D, Dada E, Olaoluwa O, et al. (2019) Predicting Nigerian stock returns using technical analysis and machine learning. European Journal of Electrical Engineering and Computer Science 3. DOI: https://doi.org/10.24018/EJECE.2019.3.2.65.
Padhi D, Padhy N, Bhoi A, et al. (2022) An intelligent fusion model with portfolio selection and machine learning for stock market prediction. Computational Intelligence and Neuroscience 2022. DOI: https://doi.org/10.1155/2022/7588303.
Park J, Shin S (2019) Stock price prediction using deep learning models and technical analysis indicators. IEEE Access 7: 115463. DOI: https://doi.org/10.1109/ACCESS.2019.2933336.
Pasupulety U, Anmol A, Mohan B (2019) Predicting stock prices using ensemble learning and sentiment analysis. In: 2019 IEEE second international conference on artificial intelligence and knowledge engineering (AIKE), pp.215–222. DOI: 10.1109/AIKE.2019.00045.
Patel J, Shah SR, Thakkar P, et al. (2015) Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems With Applications 42: 259–268. DOI: https://doi.org/10.1016/j.eswa.2014.07.040.
Pholsri P (2023) Combining technical analysis and deep learning models for stock market trading. DOI: 10.58837/chula.the.2022.103.
Pradip G, Bari C, Nandhini J (2018) Stock market prediction using machine learning. Journal of Computational and Theoretical Nanoscience. DOI: https://doi.org/10.1166/jctn.2020.8405.
Satchell S, Knight J (2001) Return Distributions in Finance (Quantitative Finance). New York: Butterworth-Heinemann.
Tran Van Q, Nguyen Bao T, Pham Minh T (2023) Integrated hybrid approaches for stock market prediction with deep learning, technical analysis, and reinforcement learning. In: Proceedings of the 12th international symposium on information and communication technology, pp.213–220. DOI: 10.1145/3628797.3629018.
Tsantekidis A, Passalis N, Toufa A, et al. (2020) Price trailing for financial trading using deep reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems 32: 2837–2846. DOI: https://doi.org/10.1109/tnnls.2020.2997523.
Wang J, Sun T, Liu B, et al. (2018) Financial markets prediction with deep learning. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp.97–104. DOI: 10.1109/ICMLA.2018.00022.
Wong A, Figini J, Raheem A, et al. (2023) Forecasting of stock prices using machine learning models. In: 2023 IEEE international systems conference (SysCon), pp.1–7. DOI: 10.1109/SysCon53073.2023.10131091.
Yao W, Gu Y, Chang JL, et al. (2022) Stock price analysis and forecasting based on machine learning. In: Third international conference on computer science and communication technology (ICCSCT 2022), volume 12506, pp.1503–1510. SPIE.
Yu S, Yang S, Yoon S (2023) The design of an intelligent lightweight stock trading system using deep learning models: Employing technical analysis methods. MDPI Systems 11(9). DOI: https://doi.org/10.3390/systems11090470.
Zarkias K, Passalis N, Tsantekidis A, et al. (2019) Deep reinforcement learning for financial trading using price trailing. In: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp.3067–3071. DOI: 10.1109/ICASSP.2019.8683161.
Zhang L, Yuan H, Wu M, et al. (2021) A hybrid deep learning model for stock price prediction using technical analysis indicators. Journal of Finance and Data Science 7: 67–78. DOI: https://doi.org/10.1016/j.jfds.2021.01.003.
Zhong X, Enke D (2019) Predicting the daily return direction of the stock market using hybrid machine learning algorithms. Financial Innovation 5: 1–20. DOI: https://doi.org/10.1186/s40854-019-0138-0.