Abstract
We propose factor models for the cross-section of daily cryptoasset returns and provide source code for data downloads, computing risk factors and backtesting them out-of-sample. In “cryptoassets” we include all cryptocurrencies and a host of various other digital assets (coins and tokens) for which exchange market data is available. Based on our empirical analysis, we identify the leading factor that appears to strongly contribute to daily cryptoasset returns. Our results suggest that cross-sectional statistical arbitrage trading may be possible for cryptoassets subject to efficient executions and shorting.
Introduction
Cryptoassets 2 have a sizable (albeit highly volatile) total market capitalization measuring in the hundreds of billions of dollars. Superfluously, there is also a sizable number of these cryptoassets, edging toward 2,000 as of this writing. The question we ask – and, at least to some degree, answer – in this note is this: Are there common (risk) factors underlying the cross-section of cryptoasset returns?
There are no evident “fundamentals” for cryptoassets based on which one could attempt to build “fundamental” long-horizon factors for cryptoassets akin to value, growth, etc., for stocks. 3 However, even for stocks, on short horizons (e.g., overnight returns) things in some sense become simpler as the longer-horizon “fundamentals” (again, such as value and growth) are no longer relevant [Kakushadze and Liew, 2015]. This underlies the construction of the 4-factor model of [Kakushadze, 2015] for equity returns on short horizons. It is therefore natural to extend the ideas set forth in [Kakushadze, 2015] to cryptoassets, to wit, to daily open-to-close returns.
And this is precisely what this note does. We consider 4(+) factors, to wit, cap (or size, based on market cap), mom (momentum), hlv (based on average intraday volatility), and vol (or liquidity, based on average daily dollar volume). By running out-of-sample Fama-MacBeth regressions [Fama and MacBeth, 1973] and computing the annualized t-statistic from the time series of the corresponding regression coefficients, we conclude that vol is not a good predictor (with a possible exception of the previous day’s volume). One possible explanation is that cryptoassets on average trade much less in comparison to their market caps (low “turnover”), so vol does not actually meaningfully measure liquidity. The other three factors cap, mom and hlv do add value, with mom leading by a large margin. In fact, momentum from the day before the previous day is also predictive. The sign of the mom regression coefficient is negative, which indicates a mean-reversion effect in cryptoasset daily returns. 4
The remainder of this note is organized as follows. In Section 2 we describe the data, define our factors, and discuss the results of our regressions. Section 3 briefly concludes with some comments. Appendix A gives R source code for data downloads and running factor regressions. 5 Tables and figures summarize our results.
Factors
Setup and Data
Unlike stocks, barring any special circumstances such as unexpected halts in trading,
cryptoassets trade continuously, 24/7. So, while there are notions of “open” and “close”
for cryptoassets, their meanings are different from those for stocks. For our purposes
here, again, barring any special circumstances, “open” on any given day means the price
right after midnight (UTC time), while “close” on any given day means the price right
before midnight (UTC time). In this regard, absent trading halts, the open on a given day
is very close to the close of the previous day. The high and low prices then have the
usual meaning within the 24 hour window between the open and the close. And the volume is
the dollar volume traded in said 24 hour interval. All the prices are also measured in
dollars, as is the market cap. We use index i = 1, …, N
to label N different cryptoassets cross-sectionally, and index
s = 0, 1, 2, … to denote the dates, with s = 0
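corresponding to the most recent date in the time series. So: $P^O_{is}$, $P^C_{is}$, $P^H_{is}$ and
$P^L_{is}$ are the open, close, high and low prices, $V_{is}$ is the daily dollar volume, and
$C_{is}$ is the market cap of the cryptoasset labeled by i on the date labeled by s.
Next, we define our daily returns as open-to-close intraday returns:
$$R_{is} = \ln\left(P^C_{is} / P^O_{is}\right) \qquad (1)$$
The use of the log-return (or “continuously compounded” return) is intentional here. For
small values it is approximately the same as the standard (“single-period”) return defined
as
$$\widetilde{R}_{is} = P^C_{is} / P^O_{is} - 1 \qquad (2)$$
However, cryptoassets can be very volatile, on average, much more so than stocks, and log-returns “smooth out” the outliers somewhat, so below we use $R_{is}$.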
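To make the comparison between the two return definitions concrete, here is a minimal R sketch. The prices are hypothetical and chosen only to show a small move next to two large upward moves; the variable names are not taken from Appendix A.

```r
# Hypothetical open and close prices for three cryptoassets on one day.
open  <- c(100, 0.50, 2.0)
close <- c(101, 0.75, 8.0)

log.ret    <- log(close / open)   # open-to-close log-return
simple.ret <- close / open - 1    # standard single-period return

# For the small move the two agree closely; for the large upward moves
# the log-return is visibly smaller than the single-period return.
print(round(cbind(log.ret, simple.ret), 4))
```

The two definitions differ by about 0.00005 for the 1% move, while for the fourfold price increase the log-return is roughly 1.39 against a single-period return of 3.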
Unlike with stocks, there are no “dividends” to worry about for cryptoassets, to wit, in terms of adjusting prices for dividends. However, an issuer can split its cryptoasset. Thus, Xaurum had a forward split 8000-to-1 on August 23, 2016, so its price decreased accordingly. Unfortunately, https://coinmarketcap.com does not adjust historical prices for splits, and there does not appear to be a simple source to look up historical splits data. Fortunately, for the purposes of analyzing our factor models here, such splits are immaterial as all our factors are defined such that they are unaffected by splits. Note that market cap and dollar volume are unaffected, only prices are. However, in our factor definitions for any given day we only use ratios of intraday prices, which therefore are also unaffected. In this paper, the only place where splits become important is when we plot price weighted indexes, and we account for the aforesaid Xaurum split there (see below).
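The split-invariance argument can be checked numerically. The following R sketch assumes hypothetical intraday prices and uses the high-low range relative to the close purely as an example of a ratio of intraday prices; only the 8000-to-1 split ratio comes from the text.

```r
# Hypothetical intraday prices on a pre-split basis.
prc <- c(open = 8000, high = 8800, low = 7600, close = 8400)
k <- 8000                        # forward split ratio (8000-to-1)
prc.split <- prc / k             # the same prices on a post-split basis

# Open-to-close log-return before and after rescaling.
ret       <- log(prc["close"] / prc["open"])
ret.split <- log(prc.split["close"] / prc.split["open"])

# An illustrative intraday price ratio: (high - low) / close.
rng       <- (prc["high"] - prc["low"]) / prc["close"]
rng.split <- (prc.split["high"] - prc.split["low"]) / prc.split["close"]

print(c(ret = unname(ret), ret.split = unname(ret.split)))
print(c(rng = unname(rng), rng.split = unname(rng.split)))
```

Both pairs coincide because the split ratio cancels in any ratio of same-day prices.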
The factor model is of the form
$$R_{is} = \sum_{A=1}^{K} \beta_{iAs}\, f_{As} + \epsilon_{is} \qquad (3)$$
Here K is the number of risk factors, $f_{As}$ are the K factor returns, $\epsilon_{is}$ are the
residuals, and $\beta_{iAs}$ are the factor loadings. We include the intercept in $\beta_{iAs}$, so
for a given date s, the N × K matrix $\beta_{iAs}$ contains a column equal to the unit N-vector,
which we will take to be the first column in $\beta_{iAs}$. Below, instead of using the index A,
we will denote each column by the name of the corresponding factor (int, cap, mom, etc.).
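A single cross-sectional regression of this form can be sketched in R as follows. Everything here is simulated: the loadings, the "true" factor returns f.true and the noise level are assumptions made for the example, not values from the data used in this note.

```r
set.seed(1)
N <- 200                                   # number of cryptoassets (hypothetical)

# Loadings for one date: the unit vector (intercept) plus two made-up factors.
beta <- cbind(int = 1, cap = rnorm(N), mom = rnorm(N))
f.true <- c(0.001, 0.002, -0.05)           # assumed factor returns
R <- as.vector(beta %*% f.true) + rnorm(N, sd = 0.01)

# Cross-sectional OLS; "-1" suppresses lm()'s own intercept because the
# unit vector is already the first column of beta.
fit <- lm(R ~ -1 + beta)
f.hat <- coef(fit)                         # estimated factor returns for this date
eps <- residuals(fit)                      # regression residuals

print(round(f.hat, 4))
print(sum(eps))                            # zero up to rounding
```

Because the unit vector is among the loadings, the residuals sum to zero cross-sectionally.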
Table 1. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 386 = 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 362 cryptoassets. The regressions are run over 365 days starting with the most recent date in the time series and going back in time. The t-statistic (t-stat) is annualized, by multiplying the daily t-statistic by $\sqrt{365}$.
Table 2. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 751 = 2 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 127 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices; however, these stale prices occur prior to the most recent year in the time series, so these 2 cryptoassets are kept when the regressions are run only over said year, which is the case in some of the other tables). The regressions are run over 730 = 2 × 365 days starting with the most recent date in the time series and going back in time (i.e., over the 2 years in the 2-year time series).
Table 3. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 751 = 2 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 129 cryptoassets. The regressions are run over 365 days starting with the most recent date in the time series and going back in time (i.e., over the 1st year in the 2-year time series).
Table 4. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 751 = 2 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 127 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 365 days by skipping the most recent 365 days, starting with the 366th day in the time series and going back in time (i.e., over the 2nd year in the 2-year time series).
Table 5. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 64 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 1095 = 3 × 365 days starting with the most recent date in the time series and going back in time (i.e., over the 3 years in the 3-year time series).
Table 6. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 64 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 730 = 2 × 365 days starting with the most recent date in the time series and going back in time (i.e., over the 1st and 2nd years in the 3-year time series).
Table 7. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 64 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 730 = 2 × 365 days by skipping the most recent 365 days, starting with the 366th day in the time series and going back in time (i.e., over the 2nd and 3rd years in the 3-year time series).
Table 8. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 66 cryptoassets. The regressions are run over 365 days starting with the most recent date in the time series and going back in time (i.e., over the 1st year in the 3-year time series).
Table 9. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 64 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 365 days by skipping the most recent 365 days, starting with the 366th day in the time series and going back in time (i.e., over the 2nd year in the 3-year time series).
Table 10. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and vol defined in Section 2 (along with parameters $d_{vol}$ and $d_{hlv}$). The universe is based on cryptoassets with historical data with non-NA close, high, low, open, volume and market cap, and nonzero volume for a lookback of 1116 = 3 × 365 + 20 + 1 days from August 18, 2018 (inclusive). This universe consists of 64 cryptoassets (after eliminating 2 cryptoassets with apparently “artifact” stale prices). The regressions are run over 365 days by skipping the most recent 730 = 2 × 365 days, starting with the 731st day in the time series and going back in time (i.e., over the 3rd year in the 3-year time series).
Table 11. Results for regressions (3) with int (intercept) plus 3 factors cap, mom and hlv defined in Section 2 (along with $d_{hlv}$). The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 12 (Table 11 continued). Results for regressions (3) with int (intercept) plus 3 factors cap, mom and hlv defined in Section 2 (along with $d_{hlv}$). The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 13. Results for regressions (3) with int (intercept) plus 2 factors cap and mom defined in Section 2. The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 14. Results for regressions (3) without int (intercept) and with 3 factors cap, mom and hlv defined in Section 2 (along with $d_{hlv}$) for $d_{hlv}$ = 5. The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 15. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, mom1 and hlv defined in Section 2 (along with $d_{hlv}$). The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 16 (Table 15 continued). Results for regressions (3) with int (intercept) plus 4 factors cap, mom, mom1 and hlv defined in Section 2 (along with $d_{hlv}$). The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
Table 17. Results for regressions (3) with int (intercept) plus 7 factors cap, mom, mom1, mom2, mom3, mom4 and hlv defined in Section 2 (along with $d_{hlv}$) for $d_{hlv}$ = 5. The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined). The remaining columns show the corresponding t-stat.
Table 18. Results for regressions (3) with int (intercept) plus 4 factors cap, mom, hlv and mnbl defined in Section 2 (along with $d_{hlv}$) for $d_{hlv}$ = 5. The period over which the regressions are run is the same as in the table shown in the first column below (where the corresponding cryptoasset universe is also defined).
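To determine whether a given factor adds value, as in [Fama and MacBeth, 1973], we use
the annualized t-statistic $\tau_A$ for each factor, which can be computed using the daily
returns $f_{As}$ as follows:
$$\tau_A = \sqrt{365}\; \frac{\overline{f}_A}{\sigma_A}$$
$$\overline{f}_A = \frac{1}{T} \sum_{s=t}^{t+T-1} f_{As}$$
$$\sigma_A^2 = \frac{1}{T-1} \sum_{s=t}^{t+T-1} \left(f_{As} - \overline{f}_A\right)^2$$
Here t is the beginning of the period for which the t-statistic is
computed, and T is the length of said period. Throughout, all time
quantities are measured in days. Also, the annualization factor 6 is $\sqrt{365}$.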
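The annualized t-statistic can be sketched in R as below. This is a minimal sketch under two assumptions that should be checked against the full text: the daily t-statistic is taken to be the mean of the daily factor returns divided by their standard deviation, and annualization multiplies it by the square root of 365. The function name ann.tstat and the simulated series are hypothetical.

```r
# Annualized t-statistic of a series of daily factor returns (assumed form:
# sqrt(days per year) times mean over standard deviation).
ann.tstat <- function(f, days.per.year = 365)
{
  sqrt(days.per.year) * mean(f) / sd(f)
}

set.seed(2)
# Simulated daily factor returns over a T = 365 day period, negative on average.
f <- rnorm(365, mean = -0.01, sd = 0.02)
tau <- ann.tstat(f)
print(tau)
```

Under these assumptions a factor whose daily returns are consistently of one sign relative to their dispersion produces a large annualized t-statistic of that sign.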
The intercept plays the role of the “market beta”:
$$\beta^{\text{int}}_{is} \equiv 1$$
Neutrality w.r.t. the intercept is the same as dollar neutrality.
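Size is the natural logarithm of the market cap: 7
$$\beta^{\text{cap}}_{is} = \ln\left(C_{i,s+1}\right)$$
Here $C_{is}$ is the market cap of cryptoasset i on date s. So, on date s we use the previous day’s market cap (recall that s = 0 is the most recent date, so s + 1 labels the previous day), which is 100% out-of-sample.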
There are various ways to define the momentum factor loading. For our purposes here, we
will define it as the previous day’s open-to-close return:
$$\beta^{\text{mom}}_{is} = R_{i,s+1}$$
Again, this definition is 100% out-of-sample. Below we will also analyze momenta (which
we refer to as mom1, mom2, …) defined using the days prior to the previous day:
$$\beta^{\text{mom1}}_{is} = R_{i,s+2}, \qquad \beta^{\text{mom2}}_{is} = R_{i,s+3}, \qquad \ldots$$
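The out-of-sample construction of the size and momentum loadings can be sketched in R as follows. The data are simulated, and the layout (rows are cryptoassets, columns are dates with the most recent date first, so column 1 is s = 0) is an assumption made for the example.

```r
set.seed(3)
N <- 4; D <- 6
# Columns are dates, most recent first: column 1 is s = 0, column 2 is s = 1, etc.
open  <- matrix(runif(N * D, 1, 2), N, D)
close <- open * exp(matrix(rnorm(N * D, sd = 0.05), N, D))
cap   <- matrix(runif(N * D, 1e6, 1e9), N, D)

ret <- log(close / open)           # open-to-close log-returns

s <- 0                             # date on which the regression is run
col <- s + 1                       # R indexing starts at 1
beta.cap  <- log(cap[, col + 1])   # log of the previous day's market cap
beta.mom  <- ret[, col + 1]        # previous day's return
beta.mom1 <- ret[, col + 2]        # return from the day before the previous day

# Loadings matrix for date s, with the unit vector as the first column.
beta <- cbind(int = 1, cap = beta.cap, mom = beta.mom, mom1 = beta.mom1)
print(round(beta, 4))
```

None of the loadings for date s uses data from date s itself, which is what makes the regression out-of-sample.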
There are various ways to define the intraday volatility factor. Below we will use the
following simple definition: 8
Averaging over the previous $d_{hlv}$ days (which is done 100% out-of-sample) is necessary to smooth out the noise. Since here we are dealing with a variance-like quantity (as opposed to a correlation-like quantity), looking back $d_{hlv}$ days does not introduce out-of-sample instabilities associated with correlations and time series based betas. Below we use the values $d_{hlv}$ = 20, 15, 10, 5. Also, note that (13) is the logarithmic intraday volatility. This is because, as is generally the case with volatility, the intraday volatility itself (without taking the log) has a skewed, long-tailed distribution for higher-end values, and, among other things, as a factor it would adversely interfere with the intercept and also result in unnaturally skewed regression residuals $\epsilon_{is}$.
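To illustrate the out-of-sample averaging and the log transform, here is an R sketch of one possible variance-like intraday volatility loading. The specific formula (half the log of the average squared high-low range relative to the close) is an assumption chosen for illustration and is not claimed to be the definition used in this note; the function name calc.hlv, the data and the column layout (most recent date first) are likewise hypothetical.

```r
set.seed(4)
N <- 3; D <- 30
# Columns are dates, most recent first (column 1 is s = 0).
close <- matrix(runif(N * D, 1, 2), N, D)
high  <- close * (1 + matrix(runif(N * D, 0, 0.1), N, D))
low   <- close * (1 - matrix(runif(N * D, 0, 0.1), N, D))

calc.hlv <- function(high, low, close, s = 0, d.hlv = 20)
{
  # Use only the previous d.hlv days, i.e., dates s + 1, ..., s + d.hlv.
  take <- (s + 2):(s + 1 + d.hlv)
  x <- ((high[, take, drop = FALSE] - low[, take, drop = FALSE]) /
        close[, take, drop = FALSE])^2
  0.5 * log(rowMeans(x))           # log of the root-mean-square relative range
}

hlv <- calc.hlv(high, low, close)
print(round(hlv, 4))
```

Because only ratios of same-day prices enter, rescaling all prices by a split ratio leaves such a loading unchanged, and changing the data on date s itself has no effect on it.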
For stocks, the average daily dollar volume is a measure of liquidity. For cryptoassets
the situation is murkier (see below). Nonetheless, following [Kakushadze, 2015] we define the volume factor loading as follows:
Recall that $V_{is}$ is the daily dollar volume, so
Some cryptoassets are minable and some are not. Therefore, it is natural to consider a binary factor loading defined as follows (notwithstanding that a priori it is not clear why it would have an impact on daily returns):
$$\beta^{\text{mnbl}}_{i} = \begin{cases} 1, & \text{if cryptoasset } i \text{ is minable} \\ 0, & \text{otherwise} \end{cases}$$
This factor loading is independent of the time index s and only depends on i.
We downloaded 9 the data from https://coinmarketcap.com for all cryptoassets as of August 19, 2018 (so the most recent date in the data is August 18, 2018), whose number was 1,855. Out of those, 1,851 had downloadable data, albeit for many of them various fields were populated with “?”, which we converted into NAs. Despite a seemingly large number of cryptoassets, the useful data going back any reasonable number of days only exists for a fraction of them. In order to be able to run our regressions (see below), we only kept cryptoassets with non-NA price (open, close, high, low), volume and market cap data, with an additional filter that no null volume was allowed either (to avoid contamination by stale prices). There were only 362 cryptoassets with such properties starting August 18, 2018 (inclusive) and going back 365 + 20 + 1 days (i.e., 1 year “padded” with an additional 21 days to be able to compute 20-day moving averages out-of-sample), 129 cryptoassets when the filters were applied to 2 × 365 + 20 + 1 days (i.e., 2 years “padded” with an additional 21 days), and only 66 cryptoassets when the filters were applied to 3 × 365 + 20 + 1 days (i.e., 3 years “padded” with an additional 21 days). 10 So, the cross section is pretty “thin”, nowhere near as rich as for stocks. However, we must work with what we have.
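The universe filter described above can be sketched in R on toy data. The matrices, the three-day lookback and the helper name ok are hypothetical; the layout (rows are cryptoassets, columns are dates with the most recent first) is an assumption made for the example.

```r
# Toy data: rows are cryptoassets, columns are dates, most recent first.
vol <- rbind(a = c(5, 3, 2, 4), b = c(1, 0, 2, 3),
             c = c(2, NA, 1, 1), d = c(7, 8, 9, NA))
cap <- rbind(a = c(9, 9, 9, 9), b = c(4, 4, 4, 4),
             c = c(3, 3, 3, 3), d = c(6, 6, 6, 6))
lookback <- 3                      # e.g., 365 + 20 + 1 = 386 for the 1-year universe

# TRUE for rows with no NAs (and, optionally, no zeros) within the lookback window.
ok <- function(x, allow.zero = TRUE)
{
  y <- x[, 1:lookback, drop = FALSE]
  good <- rowSums(is.na(y)) == 0
  if(!allow.zero)
    good <- good & rowSums(y == 0, na.rm = TRUE) == 0
  good
}

take <- ok(vol, allow.zero = FALSE) & ok(cap)
univ <- rownames(vol)[take]
print(univ)
```

Here "b" is dropped for a zero volume and "c" for an NA inside the window, while "d" survives because its NA lies outside the lookback. In practice the same check would be applied to the open, close, high and low matrices as well.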
Regressions and Results
So, we run regressions of the daily returns $R_{is}$ over the loadings $\beta_{iAs}$ (computed as above) for various periods and their subperiods described above and in Tables 1 through 18. Tables 1 through 10 include int (intercept), cap (size), mom (momentum), hlv (intraday volatility) and vol (volume), for various values of $d_{hlv}$ and $d_{vol}$. From these tables it appears that the average daily dollar volume for $d_{vol}$ > 1 is a poor predictor. For $d_{vol}$ = 1 (i.e., the previous day’s dollar volume) the results are at best mixed and certainly not ground-shaking. While it is possible that the previous day’s volume adds value, it might do so more efficiently as an additional filter (when, e.g., defining a trading signal) as opposed to a stand-alone factor. Tables 11 and 12, which exclude vol and include only int, cap, mom and hlv, appear to support this conclusion. Further, hlv appears to be rather stable w.r.t. the choice of $d_{hlv}$. The mom factor leads by a large margin in all periods (see Tables 11 and 12). The cap and hlv factors appear to work well except in the first year (meaning, going back in time from August 18, 2018, so this is the most recent 1-year period) for the smaller universes based on 129 and 66 cryptoassets (see above). However, these factors work much better for the same period for the larger universe based on 362 cryptoassets (see above), which subsumes the aforesaid smaller universes. Therefore, the most likely explanation would appear to be that this is due to the smallness of the cross-sectional samples for these smaller universes. Further, the hlv factor appears to be the weakest among cap, mom and hlv. However, removing hlv worsens the results (see Table 13), as does removing the intercept (see Table 14). 11
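The overall procedure (a cross-sectional regression for each date, followed by a t-statistic computed from the time series of coefficients) can be sketched in R on simulated data. The returns below are generated with an assumed reversal coefficient of -0.2, only the intercept and the previous day's return are used as loadings, and the annualization by the square root of 365 is likewise an assumption of this sketch.

```r
set.seed(5)
N <- 100; D <- 120
# Simulated returns, columns most recent first (column 1 is s = 0); each
# day's return partially reverses the previous day's (hypothetical data).
ret <- matrix(0, N, D)
ret[, D] <- rnorm(N, sd = 0.05)
for(j in (D - 1):1)
  ret[, j] <- -0.2 * ret[, j + 1] + rnorm(N, sd = 0.05)

days <- 100
f <- matrix(NA, days, 2, dimnames = list(NULL, c("int", "mom")))
for(j in 1:days)
{
  mom <- ret[, j + 1]                   # previous day's return (out-of-sample)
  f[j, ] <- coef(lm(ret[, j] ~ mom))    # lm() adds the intercept itself
}

# Annualized t-statistics from the daily regression coefficients.
tstat <- sqrt(365) * colMeans(f) / apply(f, 2, sd)
print(round(colMeans(f), 4))
print(round(tstat, 2))
```

With reversal built into the simulated returns, the mom coefficient averages close to -0.2 and its t-statistic is large and negative, which is the qualitative pattern a mean-reversion effect produces.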
Considering how strong the mom factor is, it is natural to also look at momenta from days prior to the previous day, i.e., mom1, mom2, … (defined above). Tables 15 and 16 suggest that mom1 indeed appears to be a good predictor. Going beyond mom1 (see Table 17, which also includes mom2, mom3 and mom4) unsurprisingly gives mixed results as any effect from momentum is expected to decay with time. Note that the regression coefficients for the momenta are negative, which suggests a substantial mean-reversion effect in the aforesaid cryptoasset returns. Including the dummy variable mnbl defined above does not appear to add value (see Table 18).
“Sanity Check”
Considering the “flukes” in the performance of cap and hlv during the most recent 1-year
period for smaller universes (see above), it makes sense to get at least a superficial
visual confirmation that nothing utterly “odd” happened with those universes during that
period. A simple thing to do here is to look at the performance of “market indexes” built
based on the aforesaid 362, 129 and 66 cryptoasset universes. Thus, we can construct the
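following indexes (among myriad others):
$$I^{\text{cap}}_s = \sum_{i=1}^{N} C_{is}, \qquad I^{\text{prc}}_s = \sum_{i=1}^{N} P^C_{is}$$
Here $I^{\text{cap}}_s$ is the market cap weighted index and $I^{\text{prc}}_s$ is the price weighted index, with $C_{is}$ the market cap and $P^C_{is}$ the close price of cryptoasset i on date s.

Figure 1. The market cap weighted index (bottom) and price weighted index (top) for the same 1-year period and cryptoasset universe as in Table 1. Both indexes are normalized to 1 on the first day of the period. The spike in the price weighted index is due to a high-priced cryptoasset (42-coin) with a small market cap.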
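A normalized index of this kind can be computed along the lines of the calc.ix helper in Appendix A. The sketch below applies the same three steps to a toy market cap matrix; the numbers are hypothetical and the columns are assumed to be ordered with the most recent date first.

```r
# Toy market caps: rows are cryptoassets, columns are dates, most recent first.
cap <- rbind(c(120, 110, 100),
             c( 60,  45,  50))
days <- 3

ix <- colSums(cap[, 1:days])   # cross-sectional sum for each date
ix <- ix[days:1]               # reorder chronologically (oldest first)
ix <- ix / ix[1]               # normalize to 1 on the first day of the period
print(ix)
```

Applying the same steps to a matrix of close prices instead of market caps would give a price weighted index, which is why a single high-priced cryptoasset can dominate it.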

Figure 2. The market cap weighted index (top) and price weighted index (bottom) for the same 1-year period and cryptoasset universe as in Table 3. Both indexes are normalized to 1 on the first day of the period.

Figure 3. The market cap weighted index (top) and price weighted index (bottom) for the same 1-year period and cryptoasset universe as in Table 4. Both indexes are normalized to 1 on the first day of the period.

Figure 4. The market cap weighted index (top) and price weighted index (bottom) for the same 1-year period and cryptoasset universe as in Table 8. Both indexes are normalized to 1 on the first day of the period.

Figure 5. The market cap weighted index (top) and price weighted index (bottom) for the same 1-year period and cryptoasset universe as in Table 9. Both indexes are normalized to 1 on the first day of the period.
Despite the superfluous ubiquity of cryptoassets, the amount of cross-sectionally available data is still limited as for most cryptoassets it simply does not go back far enough. Cryptoassets come and go, many coins and tokens disappear a short while after issuance, and many are perceived as being scams to raise a quick buck and run with the unsuspecting or uninformed investors’ money. This is still a very young field despite Bitcoin having been around for over 9 years, so the kind of analyses performed in [Kakushadze, 2015] on 2,000, 3,000 and even 4,000 stock tickers going back 5 years are simply impossible for cryptoassets at this nascent stage in their development. It might take another 5–10 years to collect that kind of data, depending on their survival rate – and assuming the whole field does not disappear due to regulatory or some other (less foreseeable) issues. 12 Only time will tell.

Figure 6. The market cap weighted index (top) and price weighted index (bottom) for the same 1-year period and cryptoasset universe as in Table 10. Both indexes are normalized to 1 on the first day of the period.

Figure 7. The Bitcoin price drop (along with the broad cryptoasset market) on September 5, 2018. This snapshot was generated using an interactive chart on https://www.coinbase.com.
Nonetheless, it is pleasant to observe that the 3 short-horizon factors discussed in [Kakushadze, 2015] for stocks, to wit, cap (size), mom (momentum) and hlv (intraday volatility) work well for cryptoassets as well. The fact that vol (volume) does not seem to add value for cryptoassets (at least beyond the previous day’s volume) is not necessarily surprising as volume is not directional. In fact, on its own it is not a good predictor for stocks either and only works when combined with the other factors. One possible explanation for why vol does not add value for cryptoassets is that it is less evident how well vol describes actual liquidity in cryptoasset markets considering a very different (from stocks) fee structure and executions. Thus, one way to quantify and see the difference between cryptoassets and stocks (as it relates to volume) is to compare their respective cross-sections of ratios of, say, the 20-day average daily dollar volume to the market cap, which is a measure of average daily “turnover”. For the 362 cryptoasset universe (see above) as of August 18, 2018, the cross-sectional summary of this ratio is as follows: Min = $2.00 \times 10^{-10}$, 1st Quartile = $6.87 \times 10^{-7}$, Median = $3.03 \times 10^{-6}$, Mean = $3.16 \times 10^{-5}$, 3rd Quartile = $1.44 \times 10^{-5}$, Max = $2.71 \times 10^{-3}$. For stocks, for the universe of the 500 highest market cap stocks as of (a randomly chosen date) August 31, 2010, that summary reads: Min = $1.05 \times 10^{-5}$, 1st Quartile = $6.09 \times 10^{-3}$, Median = $8.57 \times 10^{-3}$, Mean = $1.05 \times 10^{-2}$, 3rd Quartile = $1.25 \times 10^{-2}$, Max = $1.08 \times 10^{-1}$. The bottom line is that on a relative basis (i.e., based on the average daily “turnover”) cryptoassets do not trade anywhere near as much as stocks, so it is not that surprising that the volume is not a good predictor for cryptoassets.
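The cross-sectional “turnover” summary quoted above can be produced along the following lines. This R sketch uses simulated lognormal volumes and market caps, so the printed numbers are not comparable to the figures in the text; the variable names and the column layout (most recent date first) are assumptions.

```r
set.seed(6)
N <- 50; d <- 20
# Hypothetical data: rows are cryptoassets, columns are dates, most recent first.
vol <- matrix(rlnorm(N * d, meanlog = 10), N, d)   # daily dollar volume
cap <- rlnorm(N, meanlog = 18)                     # market cap on the most recent date

# Ratio of the 20-day average daily dollar volume to the market cap.
turnover <- rowMeans(vol[, 1:d]) / cap

# Min, quartiles, median, mean and max across the cross-section.
print(summary(turnover))
```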
The fact that momentum dominates as a factor for cryptoasset returns means that on short horizons the market is strongly mean-reverting (cross-sectionally). This in turn implies that if one could short a large cross-section of cryptoassets and trade them on short horizons cost-effectively, one might be able to trade a cross-sectional dollar-neutral mean-reversion statistical arbitrage strategy with cryptoassets, along the lines of similar strategies for stocks – perhaps one day soon.
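As a purely illustrative sketch of what dollar-neutral mean-reversion weights could look like, the R code below bets against the demeaned previous day's returns. The returns are simulated, the weighting scheme is one simple choice among many, and trading costs, shorting constraints and liquidity are ignored entirely; it is not a tested strategy.

```r
set.seed(7)
N <- 10
mom <- rnorm(N, sd = 0.05)     # previous day's returns (hypothetical)

w <- -(mom - mean(mom))        # bet against the demeaned previous-day return
w <- w / sum(abs(w))           # normalize gross exposure to 1

print(round(w, 4))
print(sum(w))                  # zero up to rounding: dollar neutrality
```

Demeaning cross-sectionally is what makes the long and short sides offset, so the weights sum to zero.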
Footnotes
3. For some literature on long-horizon factors for equities, see, e.g., [Amihud, 2002], [Ang et al., 2006], [Anson, 2013], [Asness, 1995], [Asness et al., 2001], [Asness, Porter and Stevens, 2000], [Banz, 1981], [Basu, 1977], [Carhart, 1997], [Fama and French, 1992], [Fama and French, 1993], [Fama and French, 1996], [Haugen, 1995], [Jegadeesh and Titman, 1993], [Lakonishok, Shleifer and Vishny, 1994], [Liew and Vassalou, 2000], [Pástor and Stambaugh, 2003], [Scholes and Williams, 1977].
4. To our knowledge, our analysis here is the first of its kind. For some cryptoasset investment and trading related literature, see, e.g., [Alessandretti et al., 2018], [Amjad and Shah, 2017], [Baek and Elbeck, 2014], [Bariviera et al., 2017], [Bouoiyour et al., 2016], [Bouri et al., 2017], [Brandvold et al., 2015], [Briere, Oosterlinck and Szafarz, 2015], [Cheah and Fry, 2015], [Cheung, Roca and Su, 2015], [Ciaian, Rajcaniova and Kancs, 2015], [Colianni, Rosales and Signorotti, 2015], [Donier and Bouchaud, 2015], [Dyhrberg, 2015], [Eisl, Gasser and Weinmayer, 2015], [Gajardo, Kristjanpoller and Minutolo, 2018], [Garcia and Schweitzer, 2015], [Georgoula et al., 2015], [Harvey, 2016], [Jiang and Liang, 2017], [Kim et al., 2016], [Kristoufek, 2015], [Lee, Guo and Wang, 2018], [Li et al., 2018], [Liew, Li and Budavari, 2018], [Nakano, Takahashi and Takahashi, 2018], [Ortisi, 2016], [Shah and Zhang, 2014], [Van Alstyne, 2014], [Wang and Vergne, 2017].
5. The source code given in Appendix A hereof is not written to be “fancy” or optimized for speed or in any other way. Its sole purpose is to illustrate the algorithms described in the main text in a simple-to-understand fashion. Some important legalese is relegated to Appendix B.
6. Note that the t-statistic is a horizon-dependent quantity; it scales as the square root of the horizon.
7. In [Kakushadze, 2015] for stock returns the log of the price was used instead as the market cap is the price times the shares outstanding, and the latter change negligibly on short horizons. The same generally holds for cryptoassets, so we could use the price instead of the market cap. However, considering the split adjustment issues mentioned above, it is simpler to use market cap.
8. For alternative definitions, see [Kakushadze, 2015].
9. R source code for data downloads and running factor regressions is given in Appendix A.
10. Actually, 2 cryptoassets had apparently “artifact” stale prices in the second and third year (looking back), so they had to be excluded from the corresponding regressions (see below).
11. The intercept has variable t-statistic across various periods and universes, including changing its sign. However, this is not surprising as the intercept plays the role of the “market beta”, which is expected to be highly variable on general grounds, as it is for stocks as well as other assets.
12. Thus, as this note was being finalized, the cryptoasset market had yet another crash on September 5, 2018 on the news that Goldman Sachs reportedly is putting on hold its plans for a cryptocurrency trading desk [Campbell and Chaparro, 2018]. Also see Figure 7.
Appendix A. R Source Code: Downloads and Regressions
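In this appendix we give R (R Project for Statistical Computing, https://www.r-project.org/) source code for downloading cryptoasset data and
running factor regressions discussed in the main text. The code is straightforward and
self-explanatory. The first function is crypto.data(), which downloads the list of cryptoassets from https://coinmarketcap.com and writes it to the file crypto.cap.txt.
The second function crypto.hist.prc(date) downloads the historical data through date for each cryptoasset listed in crypto.cap.txt, writes it to the directory CryptoHistData, and records the cryptoassets whose data cannot be parsed in crypto.bad.txt.
The third function crypto.prc.files() assembles the downloaded historical data into matrices of close prices, market caps, highs, lows, volumes and opens, which it writes to the files cr.prc.txt, cr.cap.txt, cr.high.txt, cr.low.txt, cr.vol.txt and cr.open.txt (along with cr.mnbl.txt and cr.name.txt).
The fourth and last function crypto.prc() runs the factor regressions.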
crypto.data <- function ()
{
   # Download the cryptoasset list and write it to crypto.cap.txt
   require(XML)
   require(httr)

   url <- "https://coinmarketcap.com/all/views/all/"
   z <- x <- shared.get.webpage(url)
   x <- shared.parse.html(x, keyword = "table")

   # Strip page elements that are not part of the data table
   u <- c(x[22:28])
   for(i in 1:length(u))
   {
      u1 <- grep(u[i], x)
      x <- x[-u1]
   }
   u1 <- grep("[*]", x)
   x1 <- x[-u1]

   hdr <- c("Rank", "Name", "Symbol", "MktCap", "Price", "Supply", "Volume", "Ch1h", "Ch24h", "Ch7d", "URL", "Minable")
   x1 <- x1[-(1:10)]
   x1 <- matrix(x1, length(x1)/11, 11, byrow = T)
   x1 <- x1[, -2]
   x1 <- gsub(",", "", x1)
   x1 <- gsub(" $", "", x1)
   x1 <- gsub("% ", "", x1)
   # Append the URL column (filled in below) and the Minable column (default "Y")
   x1 <- cbind(x1, rep("", nrow(x1)), rep("Y", nrow(x1)))

   # Extract the per-cryptoasset URL names from the raw page
   y <- grep("currency-symbol visible-xs", z)
   z1 <- z[y]
   y <- grep("link-secondary", z1)
   z1 <- z1[y]
   y <- grep("href", z1)
   z1 <- z1[y]
   y <- grep("currencies", z1)
   z1 <- z1[y]
   if(length(z1) != nrow(x1))
      stop("ERROR")

   x <- x[-(1:10)]
   for(i in 1:length(z1))
   {
      u <- unlist(strsplit(z1[i], "/"))[3]
      x1[i, 11] <- u
      # An asterisk in the row marks the cryptoasset as not minable
      u1 <- grep("[*]", x[1:11])
      if(length(u1) > 0)
      {
         x1[i, 12] <- "N"
         x <- x[-(1:12)]
      }
      else
         x <- x[-(1:11)]
   }
   x1 <- rbind(hdr, x1)
   shared.write.table(x1, "crypto.cap.txt", T)
}
crypto.hist.prc <- function (date)
{
   # Download historical data through date for each cryptoasset in crypto.cap.txt
   require(XML)
   require(httr)

   x <- read.delim("crypto.cap.txt", header = T)
   x <- as.matrix(x)
   univ <- x[, 11]
   for(i in 1:length(univ))
   {
      url <- paste("https://coinmarketcap.com/currencies/",
         univ[i],
         "/historical-data/?start=20000101&end=", date, sep = "")
      x <- shared.get.webpage(url)
      x <- shared.parse.html(x, keyword = "table")
      # Each row must have exactly 7 fields; otherwise log the cryptoasset as bad
      n <- length(x)/7
      if(n < 1 | trunc(n) != n)
      {
         write(univ[i], "crypto.bad.txt", append = T)
         next
      }
      x <- matrix(x, length(x)/7, 7, byrow = T)
      x[1,] <- c("Date", "Open", "High", "Low",
         "Close", "Volume", "MktCap")
      file <- paste("CryptoHistData/", univ[i], ".txt", sep = "")
      shared.write.table(x, file, T)
   }
}
crypto.prc.files <- function ()
{
   # Assemble the per-cryptoasset files into date-aligned matrices
   match.univ <- function (univ1, univ2)
   {
      good <- match(univ1, univ2, nomatch = 0)
      univ <- univ2[good]
      return(univ)
   }

   read.file <- function(file, header = T)
   {
      x <- read.delim(file, header = header)
      x <- as.matrix(x)
      x <- gsub(",", "", x)
      return(x)
   }

   x <- read.file("crypto.cap.txt")
   univ <- x[, 11]
   mnbl <- x[, 12]
   name <- x[, 2]
   # Drop the cryptoassets whose historical data could not be parsed
   bad <- readLines("crypto.bad.txt")
   take <- is.na(match(univ, bad))
   mnbl <- mnbl[take]
   name <- name[take]
   mnbl[mnbl == "Y"] <- 1
   mnbl[mnbl == "N"] <- 0
   univ <- univ[take]
   shared.write.table(mnbl, file = "cr.mnbl.txt", T)
   shared.write.table(name, file = "cr.name.txt", T)

   n <- length(univ)
   for(i in 1:n)
   {
      x <- read.file(paste("CryptoHistData/", univ[i],
         ".txt", sep = ""))
      if(i == 1)
      {
         # The first cryptoasset defines the set of dates (columns)
         dates <- x[, "Date"]
         d <- length(dates)
         prc <- matrix(NA, n, d)
         cap <- matrix(NA, n, d)
         high <- matrix(NA, n, d)
         low <- matrix(NA, n, d)
         vol <- matrix(NA, n, d)
         open <- matrix(NA, n, d)
         dimnames(prc)[[2]] <- dates
         dimnames(cap)[[2]] <- dates
         dimnames(high)[[2]] <- dates
         dimnames(low)[[2]] <- dates
         dimnames(vol)[[2]] <- dates
         dimnames(open)[[2]] <- dates
         prc[1,] <- x[1:d, "Close"]
         cap[1,] <- x[1:d, "MktCap"]
         high[1,] <- x[1:d, "High"]
         low[1,] <- x[1:d, "Low"]
         vol[1,] <- x[1:d, "Volume"]
         open[1,] <- x[1:d, "Open"]
      }
      else
      {
         # Fill in only the dates this cryptoasset shares with the first one
         dates1 <- x[, "Date"]
         dates1 <- match.univ(dates, dates1)
         prc[i, dates1] <- x[1:length(dates1), "Close"]
         cap[i, dates1] <- x[1:length(dates1), "MktCap"]
         high[i, dates1] <- x[1:length(dates1), "High"]
         low[i, dates1] <- x[1:length(dates1), "Low"]
         vol[i, dates1] <- x[1:length(dates1), "Volume"]
         open[i, dates1] <- x[1:length(dates1), "Open"]
      }
   }

   mode(prc) <- "numeric"
   mode(cap) <- "numeric"
   mode(high) <- "numeric"
   mode(low) <- "numeric"
   mode(vol) <- "numeric"
   mode(open) <- "numeric"
   shared.write.table(prc, file = "cr.prc.txt", T)
   shared.write.table(cap, file = "cr.cap.txt", T)
   shared.write.table(high, file = "cr.high.txt", T)
   shared.write.table(low, file = "cr.low.txt", T)
   shared.write.table(vol, file = "cr.vol.txt", T)
   shared.write.table(open, file = "cr.open.txt", T)
}
crypto.prc <- function (days = 365, back = 0,
  lookback = days, d.r = 20, d.v = 20, d.i = 20)
{
  # index = column sums over the first 'days' columns, reversed in order
  # and normalized to 1 at its first value
  calc.ix <- function(z, days)
  {
    ix <- colSums(z[, 1:days])
    ix <- ix[days:1]
    ix <- ix / ix[1]
    return(ix)
  }
  read.prc <- function(file, header = F, make.numeric = T)
  {
    x <- read.delim(file, header = header)
    x <- as.matrix(x)
    if(make.numeric)
      mode(x) <- "numeric"
    return(x)
  }
  # d.r-column moving average: column i averages columns i to i + d.r - 1
  calc.mv.avg <- function(x, days, d.r)
  {
    if(d.r == 1)
      return(x[, 1:days])
    y <- matrix(0, nrow(x), days)
    for(i in 1:days)
      y[, i] <- rowMeans(x[, i:(i + d.r - 1)], na.rm = T)
    return(y)
  }
  prc <- read.prc("cr.prc.txt")
  cap <- read.prc("cr.cap.txt")
  high <- read.prc("cr.high.txt")
  low <- read.prc("cr.low.txt")
  vol <- read.prc("cr.vol.txt")
  open <- read.prc("cr.open.txt")
  mnbl <- read.prc("cr.mnbl.txt")
  name <- read.prc("cr.name.txt", make.numeric = F)
  d <- days + d.r + 1
  prc <- prc[, 1:d]
  cap <- cap[, 1:d]
  high <- high[, 1:d]
  low <- low[, 1:d]
  vol <- vol[, 1:d]
  open <- open[, 1:d]
  # keep only assets with complete data and nonzero volume on all d days
  take <- rowSums(is.na(prc)) == 0 & rowSums(is.na(cap)) == 0 &
    rowSums(is.na(high)) == 0 & rowSums(is.na(low)) == 0 &
    rowSums(is.na(vol)) == 0 & rowSums(is.na(open)) == 0 &
    rowSums(vol == 0) == 0
  # ret[, j] is the log of the close in column j over the close in
  # column j + 1; all other data are shifted by one column, so the
  # factors in column j are lagged relative to ret[, j] (out-of-sample)
  ret <- log(prc[take, -d] / prc[take, -1])
  prc <- prc[take, -1]
  cap <- cap[take, -1]
  high <- high[take, -1]
  low <- low[take, -1]
  vol <- vol[take, -1]
  open <- open[take, -1]
  mnbl <- mnbl[take, 1]
  name <- name[take, 1]
  if(back > 0)
  {
    ret <- ret[, (back + 1):ncol(ret)]
    prc <- prc[, (back + 1):ncol(prc)]
    cap <- cap[, (back + 1):ncol(cap)]
    high <- high[, (back + 1):ncol(high)]
    low <- low[, (back + 1):ncol(low)]
    vol <- vol[, (back + 1):ncol(vol)]
    open <- open[, (back + 1):ncol(open)]
  }
  days <- lookback
  # av: log of the d.v-day average volume
  av <- log(calc.mv.avg(vol, days, d.v))
  # hlv: half the log of the d.i-day average of ((high - low) / close)^2
  hlv <- (high - low)^2 / prc^2
  hlv <- 0.5 * log(calc.mv.avg(hlv, days, d.i))
  take <- rowSums(!is.finite(hlv)) == 0
  av <- av[take,]
  hlv <- hlv[take,]
  # mom: open-to-close log return; mom1 to mom4 are shifted by 1 to 4 columns
  mom <- log(prc / open)[take, 1:days]
  mom1 <- log(prc / open)[take, 1:days + 1]
  mom2 <- log(prc / open)[take, 1:days + 2]
  mom3 <- log(prc / open)[take, 1:days + 3]
  mom4 <- log(prc / open)[take, 1:days + 4]
  size <- log(cap)[take, 1:days]
  ret <- ret[take, 1:days]
  mnbl <- mnbl[take]
  name <- name[take]
  # cross-sectional regression for each day; fac stores the intercept
  # and the four factor coefficients
  for(i in 1:days)
  {
    flm <- cbind(size[, i], mom[, i], hlv[, i], av[, i])
    if(i == 1)
      fac <- matrix(NA, ncol(flm) + 1, days)
    reg <- lm(ret[, i] ~ flm)
    fac[, i] <- coefficients(reg)
  }
  # annualized t-statistic of each coefficient's time series
  t.stat <- sqrt(365) * rowMeans(fac) / apply(fac, 1, sd)
  t.stat <- round(t.stat, 2)
  prc <- prc[take,]
  cap <- cap[take,]
  # rescale Xaurum prices above 1 by a factor of 8000
  y <- prc[name == "Xaurum",]
  prc[name == "Xaurum", y > 1] <-
    prc[name == "Xaurum", y > 1] / 8000
  ix.cap <- calc.ix(cap, days)
  ix.prc <- calc.ix(prc, days)
  plot(1:length(ix.cap), ix.cap, type = "l",
    col = "green", xlab = "days", ylab = "index value",
    ylim = c(min(c(ix.cap, ix.prc)) - 0.5,
    max(c(ix.cap, ix.prc)) + 0.5))
  lines(1:length(ix.prc), ix.prc, col = "blue")
  return(t.stat)
}
Appendix B. Disclaimers
Wherever the context so requires, the masculine gender includes the feminine and/or neuter, and the singular form includes the plural and vice versa. The author of this paper (“Author”) and his affiliates including without limitation Quantigic® Solutions LLC (“Author’s Affiliates” or “his Affiliates”) make no implied or express warranties or any other representations whatsoever, including without limitation implied warranties of merchantability and fitness for a particular purpose, in connection with or with regard to the content of this paper including without limitation any code or algorithms contained herein (“Content”).
The reader may use the Content solely at his/her/its own risk and the reader shall have no claims whatsoever against the Author or his Affiliates and the Author and his Affiliates shall have no liability whatsoever to the reader or any third party whatsoever for any loss, expense, opportunity cost, damages or any other adverse effects whatsoever relating to or arising from the use of the Content by the reader including without any limitation whatsoever: any direct, indirect, incidental, special, consequential or any other damages incurred by the reader, however caused and under any theory of liability; any loss of profit (whether incurred directly or indirectly), any loss of goodwill or reputation, any loss of data suffered, cost of procurement of substitute goods or services, or any other tangible or intangible loss; any reliance placed by the reader on the completeness, accuracy or existence of the Content or any other effect of using the Content; and any and all other adversities or negative effects the reader might encounter in using the Content irrespective of whether the Author or his Affiliates is or are or should have been aware of such adversities or negative effects.
The R code included in Appendix A hereof is part of the copyrighted R code of Quantigic® Solutions LLC and is provided herein with the express permission of Quantigic® Solutions LLC. The copyright owner retains all rights, title and interest in and to its copyrighted source code included in Appendix A hereof and any and all copyrights therefor.
