Abstract
Pairs trading is a widespread market-neutral trading strategy that exploits the relationship between pairs of financial instruments in efficient markets, where the movements of individual assets are theoretically unpredictable. Following statistical analysis, the strategy buys the underpriced asset of a pair while short selling the overpriced one. The expected price relationship is determined through analysis of historical spread data between the members of the pair. The investor expects that, in an efficient market, the price difference will converge and the stocks will return to their ‘fair value’, at which point the positions are closed and the profit is realized. The main focus of this study is the contribution of the fuzzy engine to the existing pairs trading strategy. A widespread classical ‘crisp’ technique is chosen, applied, and compared with the developed ‘fuzzy’ model throughout the paper. To further improve this contribution, expert opinions extracted from the Bloomberg database are also integrated into the fuzzy decision-making process. Most studies simply ignore transaction costs. As a final robustness check, transaction costs are also considered. The improvement achieved by the developed fuzzy technique is observed to be even more remarkable in this case.
Introduction
Studied for decades by financial and industrial engineers, mathematicians, and market quants, pairs trading is a popular market-neutral trading strategy that tries to take advantage of differences in the prices of similar stocks. In principle, the strategy buys the stock that is underpriced relative to its counterpart in the pair, short sells the other stock, and expects the prices of the two stocks to converge within the intended time horizon. Once convergence happens, both positions are closed. In this study, the strategy is applied to end-of-day data from 2012 to 2013 for the energy sector stocks in the Nasdaq Index.
Pairs trading aims to exploit this mispricing by examining pairs of assets instead of a single stock, using historical data to predict future movements. Such strategies belong to the broad class of statistical arbitrage strategies, whose overall performance has been analyzed extensively [22]. Pairs trading is a particular kind within this class.
As with any other algorithmic trading method, the strength of this strategy and its variants diminishes over time as more market actors employ it. However, Do and Faff [21] find evidence that pairs trading strategies perform quite well during turbulent times in the financial markets, such as the global crisis of 2008. Elliott et al. also point out a significant increase in the efficiency of pairs trading strategies when the market is out of equilibrium [23]. Huck comments on market timing and finds a strong connection between the returns of pairs trading strategies and the level of volatility in the market [18]. The sensitivity of the returns of pairs trading strategies is well explained by Huck [19]. In line with these findings, we believe that in a complex and vague environment such as the financial markets, using ‘crisp’ rules and strategies may lead to missed opportunities in the trade signal decision-making step of any algorithm. Therefore, we propose using a fuzzy inference model to better exploit these opportunities. We believe fuzzy logic holds great potential for further research in similar trading environments that involve human actors, as it allows imitation of the human decision-making process.
This study focuses on signal generation for opening and unwinding the position in a trade. We use the co-integration measure during the pair generation period and the spread measure for opening and closing positions in the corresponding pair. Other measures studied in the literature include stationarity, partial co-integration [16], and the stochastic control approach [3]; see Krauss [6] for a comparison of the effectiveness of these techniques. The main focus of this study, however, is the contribution of the fuzzy engine to the existing pairs trading strategy, so the most common techniques are chosen and used throughout the paper. To further improve this contribution, expert opinions extracted from the Bloomberg database are also integrated into our fuzzy decision-making engine. Most studies simply ignore transaction costs. Do and Faff [4] report that transaction costs in fact diminish the efficiency of pairs trading strategies. As a final robustness check, transaction costs are also considered.
The rest of this paper is structured as follows. Section 2 introduces the concepts of statistical arbitrage, specifically the pairs trading strategy for efficient stock markets, and reviews previous studies. Section 3 presents our materials and methods, consisting of the traditional ‘crisp’ strategy and the proposed fuzzy model. Section 4 lays out detailed results with fuzzy engine figures and compares them with the traditional ‘crisp’ pairs trading technique. Conclusions are discussed in Section 5.
Statistical arbitrage & pairs trading
Algorithmic trading strategies usually make use of the quantitative indicators that are formulated from the past price information. These indicators create crisp (buy or sell) trading recommendations. Most of those models are not based on strong theory and ignore fundamental information about the price; yet, most of the time they have been proven to be quite successful in terms of making profitable trades.
Pairs trading, one of the most successful algorithmic trading strategies, was first discovered in the 1980s by a team of quantitative traders on Wall Street led by Nunzio Tartaglia. After the team successfully employed the strategy and made millions of dollars, they disclosed it to the market in 1987. Two years later, in 1989, the group fell apart and its members started working in various places, which made the strategy even more widespread. Since then, pairs trading has grown significantly and has become a standard trading strategy adopted by many investment banks, hedge funds, and other financial institutions [10].
The starting point of this study is the seminal paper of Gatev et al. [8], where the concept of pairs trading is described in its most basic form. The overall procedure can be analyzed in two separate steps. The first step is choosing two stocks whose prices tend to move closely together in a time interval called the formation period. The second step is watching the pair in the following time interval, called the trading period. If the prices of the stocks move away from each other, the relatively underpriced one is bought and the relatively overpriced one is sold short. When the difference between the stock prices reverts to its original level, both positions are closed and a profit is realized. At the end of the trading period, all open positions are closed regardless of the size of the spread.
Gatev et al. [8] find evidence that pairs trading works well even when transaction costs are included. Do and Faff [21] apply the same methodology to a broader set of data and find that it still works, although at a declining rate. Those studies use daily data, with a training period of 12 months and a trading period of 6 months. A similar framework is adopted in this study, since the aim is not to find the best possible pairs trading strategy. Rather, this study focuses on the improvements attained by adding a fuzzy engine in different ways.
Applying fuzzy systems theory to financial problems is not new in the literature. Gradejovic and Gencay [17] apply fuzzy logic to a portfolio optimization problem and report an improvement in the performance of the algorithmic trading strategy considered. Kahraman and Kaya [5] employ a similar technique in an investment analysis problem and use fuzzy logic to improve the estimation of interest rates. Kablan [1] uses fuzzy logic reasoning mechanisms to measure momentum in a high frequency trading problem. Bayram and Akat [15] use fuzzy logic techniques to make better decisions in pairs trading. Although this paper adopts a similar strategy of integrating a fuzzy logic engine into a financial problem, it differs significantly in how it does so. In all of the above studies, fuzzy logic is intended to improve a crisp technical measure. In this study, in addition, expert opinions are converted to a technical indicator via fuzzy logic theory.
Another important aspect of pairs trading strategies is the impact of transaction costs on performance, which has attracted considerable interest in the field. Primbs and Yumada [11] develop a model predictive control approach to trading a portfolio of pairs of stocks under proportional transaction costs. Do and Faff [4] examine the impact of trading costs on pairs trading profitability in the U.S. equity market from 1963 to 2009. They report that the strategy remains profitable, but that its efficiency declines significantly. An important contribution of this paper is that it addresses the issue of transaction costs. With the help of the expert opinions, the fuzzy-enriched pairs trading strategy avoids some unnecessary trades. The impact of the fuzzy engine is even more visible when transaction costs are taken into consideration.
Many studies in the literature try to incorporate fuzzy logic techniques into the world of finance. However, most of them are applications of fuzzy expert systems for decision-making support in the stock trading process [9]. Zapart [7] improves the performance of statistical arbitrage strategies via neural networks. Kablan [2] employs neuro-fuzzy systems for high frequency trading and stock price movement prediction. None of these works combines pairs trading strategies with fuzzy logic systems.
From a fuzzy logic perspective, the study most closely related to this work is the paper by Cao et al. [12], which proposes a fuzzy genetic algorithm framework for financial pairs mining to discover pair relationships between financial entities such as stocks and markets. Their findings show that 13 highly correlated pairs, out of 32 tested in total, came from different sectors. This supports the idea, also noted in our conclusion, that potential pairs do not necessarily come from the same sector, as traders and financial researchers commonly presume [14].
Kablan mentioned the cumulated order quantity of large order executing institutional traders and proposed a novel way of momentum analysis named as ‘fuzzy momentum analysis’ which makes use of fuzzy logic reasoning mechanisms [1]. Kablan is only interested in high frequency trading and large volume trading. In this study only the end of day data is considered as it is the most common approach in the pairs trading strategies in general.
Materials and methods
In line with the aim of this study, and to demonstrate its contribution to the domain of statistical arbitrage, the widespread pairs trading strategy (referred to as the classical ‘crisp’ strategy throughout this paper) is employed and compared with the developed fuzzy-logic based method. The classical ‘crisp’ method uses the co-integration measure to select suitable pairs and distance (spread) values to generate trade signals.
Description of the ‘crisp’ method
The crisp method of pairs trading can be described mainly in two parts. The pairs are chosen based on an observation of the stock price movements over a 12-month period which we call the training period. The pairs that are detected to be co-integrated are traded for the next 12 months following the training period which is called the trading period. The choices of the lengths of the training period and the trading period are arbitrary. One could try different time frames, however these lengths remain the same throughout the entire study.
Training period
End-of-day data from 2012 to 2013 for the energy sector stocks in the Nasdaq Index are considered. All stocks that dropped out of the index during this period are screened out, so there are no missing data points for any of the 67 stocks used (Fig. 1 – Process Box 1).

Fig. 1. Flow diagram of the proposed algorithm.
As a result, there are 2211 possible pairs to consider for trading. The Engle-Granger co-integration test [20] is run on all 2211 pairs, and every pair found not to be co-integrated is discarded. This brings the number of pairs under consideration down to 240 (Fig. 1 – Process Box 2–4).
Next, a return index series is created for each stock over the training period. All stock prices are first normalized to 1, so that every stock starts at the same price. From this point onwards, the stock return data is used instead of the stock price data (Fig. 1 – Process Box 5).
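The pair count follows from choosing 2 stocks out of 67 without regard to order. A minimal sketch of the enumeration, assuming placeholder ticker names (the real symbols are the ones listed in Table 2):

```python
from itertools import combinations

# Hypothetical placeholder tickers; the actual 67 symbols are those in Table 2.
tickers = [f"STOCK{i:02d}" for i in range(1, 68)]

# Unordered pairs without repetition: C(67, 2) = 67 * 66 / 2
pairs = list(combinations(tickers, 2))
n_pairs = len(pairs)
print(n_pairs)  # 2211
```

Each of these candidate pairs is then passed to the co-integration test.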
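The normalization step can be sketched as follows. This is a minimal illustration that assumes the return index is simply each price divided by the first price of the period; the function name `return_index` and the sample prices are hypothetical.

```python
def return_index(prices):
    """Rescale a price series so that it starts at 1 (assumed normalization)."""
    if not prices or prices[0] <= 0:
        raise ValueError("need a non-empty series with a positive first price")
    base = prices[0]
    return [p / base for p in prices]

# Made-up end-of-day prices for two stocks of a pair
prices_a = [20.0, 21.0, 19.0, 22.0]
prices_b = [50.0, 50.5, 52.0, 49.0]

index_a = return_index(prices_a)
index_b = return_index(prices_b)
```

After this step both series start at 1, so their difference is comparable across pairs regardless of the original price levels.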
Following the calculations and analysis throughout the training period, the pairs are sorted based on the co-integration measure and put together in the trading period; the top 50 (1–50), second top 50 (51–100), and all co-integrated pairs are traded based on the historical distance criteria. The reason for working with different groups can be considered as a robustness check for the efficiency of the pairs trading technique being used as well as a robustness check for the contribution of the fuzzy method.
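The grouping can be sketched as a sort followed by slicing. The records below are synthetic stand-ins for the 240 co-integrated pairs, and the sketch assumes that a smaller test p-value means a more strongly co-integrated pair.

```python
# Hypothetical (pair, p-value) records standing in for the 240 co-integrated pairs.
results = [((f"S{i}", f"S{i + 1}"), ((i * 37) % 240 + 1) / 5000) for i in range(240)]

# Assumption: a smaller p-value ranks a pair as more strongly co-integrated.
ranked = sorted(results, key=lambda item: item[1])

groups = {
    "top_50": ranked[:50],        # ranks 1-50
    "second_50": ranked[50:100],  # ranks 51-100
    "all": ranked,                # every co-integrated pair
}
```

Each group is then traded with the same rules, so differences in performance across groups reflect the strength of co-integration rather than the trading logic.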
The common practice for the historical distance metric is the standard deviation approach. A position in a pair is opened if the return indices diverge by more than two historical standard deviations, which was calculated in the training period. The position is closed at the next time the indices intersect, and a profit is realized as a result. If the indices do not meet before the end of the trading period, the position is closed no matter what. In this case a profit or a loss could be realized (Fig. 1 – Process Box 8).
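The crisp rule can be sketched as a small state machine over the daily spread. This is an illustration under stated assumptions, not the authors' Matlab code: the spread is taken as the difference of the two return indices, "intersect" is read as the spread reaching or crossing zero, and the sign convention (+1 means short the first stock and buy the second) follows the membership function description. The name `crisp_positions` is hypothetical.

```python
def crisp_positions(spread, sigma, k=2.0):
    """Daily position per the k-sigma rule.

    +1: short first stock / long second, -1: the reverse, 0: flat.
    sigma is the spread's standard deviation from the training period.
    """
    positions = []
    current = 0
    for s in spread:
        if current == 0:
            if s > k * sigma:
                current = 1
            elif s < -k * sigma:
                current = -1
        elif (current == 1 and s <= 0) or (current == -1 and s >= 0):
            current = 0  # indices have crossed: close the position
        positions.append(current)
    if positions:
        positions[-1] = 0  # forced close at the end of the trading period
    return positions

# Made-up spread path with sigma = 0.1, so the thresholds are +/- 0.2
example = crisp_positions([0.0, 0.05, 0.25, 0.1, -0.01, -0.3, -0.1], sigma=0.1)
print(example)  # [0, 0, 1, 1, 0, -1, 0]
```

In the example the last position is still open on the final day and is closed by the end-of-period rule, which is the case where either a profit or a loss can be realized.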
Description of the fuzzy model
The fuzzy algorithm proposed in this study follows the classical crisp method for the training part. The crisp and fuzzy strategies are therefore identical in how they choose appropriate pairs for trading, via Engle-Granger co-integration testing. The novelty and contribution of the fuzzy strategy lie in the trading period. While the classical strategy uses a two standard deviation (2σ) spread to generate buy and sell signals, the proposed fuzzy model uses fuzzy logic based variables of spread and expert inputs for the same purpose. Figure 1 shows the flowchart of the algorithm built to execute and compare both strategies. The two strategies diverge at the <8. Generate ‘Crisp’ Position Array> and <9. Generate ‘Fuzzy’ Position Array> process steps of the algorithm. The differences between the performances of the algorithms are analyzed and the results are presented in Section 4. The details of the fuzzy signal generation are given in the following subsection (Fig. 1 – Process Box 6, 7).
Trading period under the fuzzy algorithm
The method proposed in this study is a pairs trading strategy employing fuzzy logic with three inputs: i. spread, ii. expert opinions on the first stock of the pair, ‘expStock1’, and iii. expert opinions on the second stock of the pair, ‘expStock2’.
Expert analyst inputs extracted from the Bloomberg Terminal database, partly displayed in Table 1, are structured as daily buy/sell/hold recommendations by several stock market brokers, which may coincide or differ. Overlapping and numerous inputs increase the strength rating of a specific recommendation for the corresponding stock, and vice versa. This strength rating then determines the fuzzy variable used in the fuzzy decision-making process, as shown in Figs. 4 and 5. As described in the results section of this paper, the analyst inputs, employed as fuzzy variables together with the spread measure, serve as an inhibiting factor in the order signal generation process, yielding better choices and a lower number of transactions (Fig. 1 – Process Box 8–10).
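One simple way to turn a day's broker recommendations into a single strength rating is a signed average. The encoding below (buy = +1, hold = 0, sell = -1) and the name `recommendation_strength` are assumptions made for illustration; the paper does not specify how the rating is computed.

```python
# Assumed numeric encoding of the linguistic recommendations
SCORES = {"buy": 1.0, "hold": 0.0, "sell": -1.0}

def recommendation_strength(recs):
    """Average signed score in [-1, 1]; 0.0 when no broker has issued an opinion."""
    scored = [SCORES[r.lower()] for r in recs]
    return sum(scored) / len(scored) if scored else 0.0

agree = recommendation_strength(["buy", "buy", "buy"])    # coinciding opinions
mixed = recommendation_strength(["buy", "sell", "hold"])  # differing opinions
```

Under this encoding, coinciding recommendations push the rating toward +1 or -1, while conflicting ones pull it toward 0, which matches the qualitative behavior described above.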

Fig. 2. Proposed fuzzy inference structure. Spread is the measure of deviation from the historical distance between the normalized price series; expStock1 and expStock2 are the buy/hold/sell recommendations by experts concerning the first and the second stock.

Fig. 3. Membership functions for the input variable ‘spread’, according to the standard deviation from the historical mean. Positive membership values indicate taking a short position in the first member of the pair while buying the second; negative values indicate buying the first member while short selling the second.

Fig. 4. Membership functions for the input ‘expStock1’, analyst recommendation for the first stock of the corresponding pair.

Fig. 5. Membership functions for the input ‘expStock2’, analyst recommendation for the second stock of the corresponding pair.
Table 1. Analyst data layout extracted from the Bloomberg Terminal database
The three fuzzy inputs (spread, expStock1, expStock2) are processed through the Sugeno-type fuzzy inference engine of the Matlab Fuzzy Logic Toolbox and yield one of three signals: ‘tradePositive’, ‘tradeNegative’ or ‘noTrade’. Expert opinions are natural linguistic variables for a fuzzy inference structure. The analyst data acquired from the Bloomberg database are therefore converted to a technical indicator via fuzzy logic theory. To give the reader an idea of the nature of this data, a portion of it is shown in Table 1.
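To illustrate how a zero-order Sugeno engine maps the three inputs to one of the three signals, the sketch below uses a small hypothetical rule base. The membership breakpoints, the rules, the min/max operators, and the 0.5 decision threshold are all assumptions chosen for illustration; they are not the membership functions or rules used in the study, and the code does not use the Matlab toolbox.

```python
def ramp(x, lo, hi):
    """Linear ramp membership: 0 below lo, 1 above hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

def fuzzy_signal(z, exp1, exp2, threshold=0.5):
    """z: spread in training-period standard deviations.
    exp1, exp2: analyst strength ratings in [-1, 1] for stock 1 and stock 2."""
    # Fuzzification (assumed breakpoints)
    high = ramp(z, 1.5, 2.5)
    low = ramp(-z, 1.5, 2.5)
    normal = 1.0 - max(high, low)
    buy1, sell1 = ramp(exp1, 0.0, 1.0), ramp(-exp1, 0.0, 1.0)
    buy2, sell2 = ramp(exp2, 0.0, 1.0), ramp(-exp2, 0.0, 1.0)

    # Hypothetical rules as (firing strength, constant output).
    # AND = min, OR = max, NOT = 1 - membership.
    rules = [
        (min(high, 1 - buy1, 1 - sell2), 1.0),  # short stock 1, buy stock 2
        (min(low, 1 - sell1, 1 - buy2), -1.0),  # buy stock 1, short stock 2
        (normal, 0.0),                          # spread unremarkable
        (min(high, max(buy1, sell2)), 0.0),     # experts oppose the trade
        (min(low, max(sell1, buy2)), 0.0),      # experts oppose the trade
    ]

    # Sugeno defuzzification: weighted average of the rule outputs
    total = sum(w for w, _ in rules)
    score = sum(w * out for w, out in rules) / total if total > 0 else 0.0

    if score > threshold:
        return "tradePositive"
    if score < -threshold:
        return "tradeNegative"
    return "noTrade"

print(fuzzy_signal(3.0, -0.5, 0.5))  # tradePositive
print(fuzzy_signal(3.0, 1.0, 0.0))   # noTrade
```

The second call shows the inhibiting role of the expert inputs: the spread alone would trigger a trade, but a strong buy rating on the stock that would be shorted suppresses the signal.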
This paper, to our knowledge, is the first study utilizing expert analyst recommendations as inputs for algorithmic trading, through fuzzy inference. This serves as a critical contribution to algorithmic trading for the domain of financial engineering. Flow diagram of the proposed algorithm is presented in Fig. 1.
The algorithm is implemented using Matlab R2018a software and employing Fuzzy Logic Toolbox for the trade signal generation step. Bloomberg terminal data was used to download the analyst data for each stock and the stock price information was downloaded from Yahoo Finance open-source online database.
The stock data from the US Nasdaq energy sector was chosen as the sample dataset because of its high trade volume, its volatility, and the number of stocks remaining in the index for two consecutive years (67 stocks). The stocks analyzed in this study are listed in Table 2. The stock data is downloaded automatically by the algorithm during code execution from the Yahoo Finance database URL. This makes it possible to plug any online database into the algorithm for further research.
Table 2. Nasdaq energy sector stocks used as inputs for this study
The sixty-seven stocks that stay in the index for the two consecutive years yield 2211 possible pairs. Using the candidate pair information, the proposed algorithm gathers the historical price information of the corresponding stocks for further analysis. Through the Yahoo Finance gateway, the price information for the unique stocks in the 2211 pairs is downloaded for the year 2012. After the price data for the pairs is constructed, Engle-Granger co-integration testing [20] is used to reduce the set of pairs and reach the best candidates for pairs trading. This method has been used in numerous studies throughout the statistical arbitrage literature [3, 24]. It tests whether a combination of series that are not necessarily stationary is itself stationary. After running the test for every possible pair, the remaining co-integrated candidates are listed in Table 3 (240 pairs out of 2211).
Table 3. Pairs chosen for trade, based on the Engle & Granger cointegration measure
The ‘training’ step of the algorithm involves analysis of the stock price series and calculation of means and variances using the end-of-day price data for the training year, 2012. Historical spread values are determined after normalization.
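The two-step idea behind the Engle-Granger test can be sketched in plain Python: regress one series on the other, then check the residuals for stationarity with a Dickey-Fuller type statistic. The version below is deliberately simplified (no lagged difference terms, no p-value), the function names are hypothetical, and the critical value of roughly -3.34 at the 5% level for two series is quoted as an approximate reference only.

```python
import random

def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def engle_granger_stat(x, y):
    """Dickey-Fuller t-statistic (no lags, no constant) on the residuals of y ~ x."""
    a, b = ols(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    lagged = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    # Regress the residual change on the lagged residual: diff_t = rho * e_{t-1} + u_t
    sll = sum(l * l for l in lagged)
    rho = sum(l * d for l, d in zip(lagged, diff)) / sll
    errors = [d - rho * l for l, d in zip(lagged, diff)]
    s2 = sum(e * e for e in errors) / (len(errors) - 1)
    return rho / (s2 / sll) ** 0.5

# Synthetic co-integrated pair: y tracks a random walk x up to stationary noise
random.seed(0)
x = [0.0]
for _ in range(249):
    x.append(x[-1] + random.gauss(0, 1))
y = [2.0 + 0.5 * xi + random.gauss(0, 0.2) for xi in x]

stat = engle_granger_stat(x, y)
is_cointegrated = stat < -3.34  # approximate 5% critical value, an assumption here
```

A strongly negative statistic means the residuals revert quickly to their mean, which is the property the trading rule relies on. A production test would add lagged difference terms and use tabulated critical values for the sample size.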
The algorithm proceeds to the trading phase following the analysis and calculations in the training step. This part of the algorithm is designed to be able to run in real time using the corresponding stock price gateway. However, for the sake of analysis, historical price information for the second year, 2013, is used. Price data is normalized and trade is executed using ‘crisp’ and ‘fuzzy’ methods consecutively.
The trading stage of the proposed method is based mainly on the notion of mispricing within a pair, taking its co-integrated nature into account. A short selling signal is generated for the higher valued stock and a buy signal for the lower valued stock. Positions are unwound when the spread reverts to the determined minimum.
The spread of prices in the strategy is simply denoted as:
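A form consistent with the description in the training and trading steps, assuming the spread is the difference between the two normalized price series and is measured in training-period standard deviations, would be the following. The symbols $\tilde{P}$, $S_t$, $\mu_S$, $\sigma_S$ and $z_t$ are notation introduced here for illustration.

```latex
\tilde{P}^{(i)}_t = \frac{P^{(i)}_t}{P^{(i)}_0}, \qquad
S_t = \tilde{P}^{(1)}_t - \tilde{P}^{(2)}_t, \qquad
z_t = \frac{S_t - \mu_S}{\sigma_S}
```

Here $P^{(i)}_t$ is the end-of-day price of stock $i$ on day $t$, $P^{(i)}_0$ is its first price in the period, and $\mu_S$ and $\sigma_S$ are the mean and standard deviation of the spread over the training year. Under this notation the crisp rule opens a position when $|z_t| > 2$.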
The constructed fuzzy inference engine is shown in Fig. 2. Figures 3–5 show the fuzzy membership functions for the variables ‘spread’, ‘expert suggestions for stock 1’ and ‘expert suggestions for stock 2’. The three outputs yielded by the fuzzy strategy are shown in Fig. 6: tradePositive, tradeNegative and noTrade. Output ‘tradePositive’ is the signal for buying the lower priced stock of the pair while short selling the higher priced one, and vice versa. Output ‘noTrade’ stands for keeping a position closed or unwinding an open position.

Fig. 6. Output variables of the fuzzy inference engine for trade signal generation.
In the ‘crisp’ trading phase, trades are executed with the widely accepted distance strategy using the crisp rule of a 2σ historical deviation. Following this step, the constructed fuzzy strategy is executed with the Matlab Fuzzy Logic Toolbox, using the fuzzy inputs of the spread and the expert opinions on the individual stocks.
Stock markets inherently impose several fixed and variable costs on each transaction. Fixed transaction costs are an obvious source of loss for any trading strategy. An algorithmic trading strategy that generates a high number of transactions may see its gains eroded by high aggregate transaction costs. Therefore, the rule base of the proposed strategy is constructed so that it does not generate a signal to take a position in a pair when this is not recommended or encouraged by the experts, in order to avoid high relative costs and the risk of bankruptcy. The investment value is always 1 unit, which may be extended to different investment patterns and differing risk profiles of the selected pair [17].
Following signal generation by both the crisp and the fuzzy strategies, the signals are collected in separate matrices, and short and long returns are calculated for both strategies and compared with and without trade costs.
The algorithm was run using the stock price data from 2012 for the training phase and from 2013 for the trading phase, and results from the crisp and fuzzy strategies were obtained. All open positions were unwound on the last trading day of the year, regardless of what the rule base recommended. Figures 7–11 show several randomly chosen trades within a year (252 trading days), with the generated signals, to give a better overview of both strategies under different conditions. As the figures demonstrate, the two strategies may differ in the decision, timing, and length of a trade.

Fig. 7. Several ‘crisp’ and ‘fuzzy’ signals for buy and short sell orders, showing that the strategies, although similar, may differ in signal generation for opening and unwinding positions.

Fig. 8. Comparison of ‘crisp’ and ‘fuzzy’ signals, showing that the crisp strategy yields more transactions in the same time frame and for the same pair, which leads to increased transaction costs.

Fig. 9. Another example, showing one extra ‘crisp’ signal for opening a reverse position and a pair of ‘crisp’ and ‘fuzzy’ signals opened and unwound simultaneously for the same pair.

Fig. 10. The fuzzy engine may generate early signals, which lets it exploit profit opportunities that the crisp rule base may miss.

Fig. 11. An additional signal and an early opening generated by the fuzzy engine for the same pair in the same time frame.
At the end of the 2013 trading year, the results are compared in terms of financial profit. Both strategies use the 2012 training data and the distance method for signal generation, and the fuzzy strategy additionally uses expert analyst data as linguistic inputs. The main results are given in Table 4.
Table 4. Performance of the classical method and the proposed method, with expert inputs for each member of the pair as fuzzy variables and with numerical inputs (spread and volatility). (Transaction cost: 0.005)
Table 4 summarizes the returns of pairs trading strategies under the crisp and two different fuzzy algorithms for the trading period of year 2013 for all the co-integrated pairs of stocks in the Energy Sector of the NASDAQ index. The profits generated under all algorithms are nominal values since the trading strategy does not require any initial capital, as the funding for buying a stock comes from the short sales of its counterpart in the pair. Note that the short sales costs are ignored in this study.
In Panel A of Table 4, the inputs to the fuzzy engine are the historical spread and the volatility of the historical spread, both of which are originally quantitative variables. These numerical variables are converted to linguistic variables, and an improvement of 8.38% is observed in the profit generated. However, this improvement holds only under the assumption of no transaction costs. When a modest transaction cost is introduced, there is no improvement at all. In fact, the profit is 16.15% lower than under the crisp algorithm. This is simply due to the high number of transactions under the fuzzy method: it makes more profit by opening and closing many more positions. If the transaction cost is higher, this may even create a bankruptcy risk for the trader. The transaction cost used in Table 4 is 0.005, or 0.5%. That is, for each position traded (opened or closed) that is worth 1 dollar, a cost of 0.005 dollars, or 0.5 cents, is incurred.
In Panel B of Table 4, the fuzzy variables fed to the fuzzy engine are the historical spread and the expert opinions for each member of the pair. The historical spread is inherently a quantitative variable, whereas the expert opinion is a qualitative one. The contribution of the fuzzy algorithm with this combination of inputs is clearer. Without transaction costs, the improvement in profit over the crisp algorithm is 8.15%. This time, however, the improvement does not come from taking many more positions in the pairs. On the contrary, the number of transactions is 15.04% lower than under the crisp algorithm. This makes the fuzzy algorithm even more attractive once transaction costs are included, as the improvement is then 21.97%.
For a further robustness check, different groups of co-integrated pairs are compared. Table 5 presents the results for these groups. If the top 50 most co-integrated pairs are considered, the improvement due to the fuzzy algorithm is very slight, 1.67%. However, for the next 50 most co-integrated pairs the difference is much larger, 25%. When all the co-integrated pairs are considered the difference is 9.65%, as seen in Table 5. This 9.65% difference can be regarded as the average percentage improvement among all pairs. Determining which group shows the greatest improvement requires further research.
Table 5. Returns grouped by the list of generated co-integrated pairs, sorted by the p-values of the co-integration test
To summarize, Table 4 Panel B and Table 5 show that the fuzzy algorithm consistently improves the performance of the crisp pairs trading strategy. When market trade costs are taken into account, which may differ from market to market and from one type of transaction to another, the fuzzy strategy also performs better thanks to the linguistic analyst inputs, which make the fuzzy signal generation engine more hesitant to open positions. The fuzzy engine achieved its profit with 15% fewer transactions, which suggests higher robustness than the crisp rule-based method. With trading costs included, the difference between the crisp and the fuzzy strategies increases to almost 22%. Naturally, trading costs bring down the profitability of the strategy regardless of how it is implemented. However, the reduction in profits is much more dramatic in the crisp case.
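The effect of the transaction count on net profit can be made concrete with a short calculation. The sketch assumes a flat proportional cost of 0.005 per unit position opened or closed, as stated above. The gross profits and transaction counts are made-up numbers chosen only to show the mechanism and are not taken from Table 4; the name `net_profit` is hypothetical.

```python
def net_profit(gross_profit, n_transactions, cost_rate=0.005, position_value=1.0):
    """Gross profit minus a proportional cost charged on every open or close."""
    return gross_profit - n_transactions * cost_rate * position_value

# Made-up figures: strategy B earns slightly more gross with fewer transactions
gross_a, trades_a = 10.0, 400
gross_b, trades_b = 10.8, 340

net_a = net_profit(gross_a, trades_a)
net_b = net_profit(gross_b, trades_b)

gross_gain_pct = 100 * (gross_b - gross_a) / gross_a
net_gain_pct = 100 * (net_b - net_a) / net_a
```

With these illustrative inputs the relative advantage of strategy B is larger after costs than before, because B pays the per-transaction cost fewer times. The same mechanism works in reverse for a strategy that earns its extra profit through many additional transactions.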
This study shows, via a sample data taken from sixty-seven stocks appearing in NASDAQ energy sector, the improvement obtained by the fuzzy logic techniques over the standard crisp pairs trading strategy based on the co-integration approach for choosing the pairs to trade and the spread approach for opening and closing positions in the pair. The main contribution of this application is acquired from the additional input of the expert opinions for the individual stocks. With the help of the fuzzy logic system, it is possible to convert this qualitative data to an indicator, a fuzzy variable, and feed into the strategy. On the other hand, it is well documented in the literature that the pairs trading strategy returns are quite sensitive to volatility of the market and the transaction costs. Pairs trading strategies tend to work better when the market is highly volatile, and the returns diminish significantly when the transaction costs are included.
The time period chosen in this study, 2012 to 2013, was a remarkably steady time span for the financial markets. A key contribution of this study is the improvement observed even in such a period. Moreover, the difference between crisp and fuzzy pairs trading becomes even more visible when transaction costs are included. This can be interpreted as fuzzy pairs trading significantly cutting down the number of unnecessary positions opened. In Section 4, where the impact of transaction costs is analyzed, the number of transactions is compared along with the returns, and the difference is quite noticeable.
For future studies, we recommend considering several market-related and linguistic inputs through more sophisticated fuzzy models. Buy and sell signals may be graded as strong or weak in order to support different investment structures. Since co-integration does not necessarily require the stocks of a pair to belong to the same sector or even the same market, several sectors and markets may be analyzed. Furthermore, we conclude that fuzzy logic holds great potential for use in financial markets, where human actors are involved, as it allows imitation of the human decision-making process.
