Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset's return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
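The definition of alpha above can be illustrated with a small regression. The following is a minimal sketch, assuming simulated daily returns and made-up parameter values (`true_alpha`, `true_beta`); it regresses an asset's returns on benchmark returns so that the slope estimates the benchmark exposure (beta) and the intercept estimates alpha.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical daily returns for a benchmark and a single asset (simulated data).
n_days = 2_500
benchmark = rng.normal(loc=0.0004, scale=0.01, size=n_days)
true_alpha, true_beta = 0.0002, 1.2  # assumed values used only to generate data
asset = true_alpha + true_beta * benchmark + rng.normal(scale=0.005, size=n_days)

# OLS regression of asset returns on benchmark returns:
# the slope is beta (benchmark exposure), the intercept is alpha.
X = np.column_stack([np.ones(n_days), benchmark])
(alpha, beta), *_ = np.linalg.lstsq(X, asset, rcond=None)

# The residual is the part of the return not explained by the benchmark.
residual = asset - alpha - beta * benchmark

print(f"alpha per day: {alpha:.5f}, beta: {beta:.2f}")
```

By construction of the least-squares fit, the residual return is uncorrelated with the benchmark in the sample, which is the sense in which alpha signals aim at uncorrelated returns.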
If you are already familiar with ML, you may know that feature engineering is a key ingredient for successful predictions. This is no different in trading. Investment research, however, benefits from decades of work on how markets function and on which features tend to explain or predict price movements better than others. This chapter provides an overview as a starting point for your own search for alpha factors.
This chapter also presents key tools that facilitate computing and testing alpha factors. We will highlight how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data and present popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data.
We also preview how you can use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics like the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 6, which covers the ML4T workflow that we will use throughout the book to evaluate trading strategies.
In particular, this chapter will address the following topics:
- Which categories of factors exist, why they work, and how to measure them
- Creating alpha factors using NumPy, pandas, and TA-Lib
- How to denoise data using wavelets and the Kalman filter
- Using Zipline offline and on Quantopian to test individual and multiple alpha factors
- How to use Alphalens to evaluate predictive performance and turnover using, among other metrics, the information coefficient (IC)
Please see the Appendix - Alpha Factor Library for additional material on this topic, including numerous code examples that compute a broad range of alpha factors.
Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.
There are also factors that explain price movements based on the economics or institutional setting of financial markets, or investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased, or not entirely rational behavior of agents that is not arbitraged away.
In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.
- Dissecting Anomalies by Eugene Fama and Ken French (2008)
- Explaining Stock Returns: A Literature Review by James L. Davis (2001)
- Market Efficiency, Long-Term Returns, and Behavioral Finance by Eugene Fama (1997)
- The Efficient Market Hypothesis and Its Critics by Burton Malkiel (2003)
- The New Palgrave Dictionary of Economics (2008) by Steven Durlauf and Lawrence Blume, 2nd ed.
- Anomalies and Market Efficiency by G. William Schwert (Ch. 15 in Handbook of the Economics of Finance, by Constantinides, Harris, and Stulz, 2003)
- Investor Psychology and Asset Pricing, by David Hirshleifer (2001)
Based on a conceptual understanding of key factor categories, their rationale, and popular metrics, a key task is to engineer factors that better capture the risks embodied by the return drivers laid out previously, or to find new drivers. In either case, it will be important to compare the performance of innovative factors to that of known factors to identify incremental signal gains.
The notebook feature_engineering.ipynb in the data directory illustrates how to engineer basic factors.
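As a rough illustration of what engineering basic factors with pandas can look like, the sketch below computes trailing returns and a simple momentum measure. It is not taken from the notebook: the prices are simulated, the tickers are made up, and the lookback lengths in trading days are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical daily closing prices for three made-up tickers (simulated data).
dates = pd.bdate_range("2018-01-01", periods=600)
log_returns = rng.normal(loc=0.0003, scale=0.01, size=(len(dates), 3))
prices = pd.DataFrame(
    100 * np.exp(log_returns.cumsum(axis=0)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)

# Trailing returns over roughly 1, 3, and 12 months of trading days.
lookbacks = {"ret_1m": 21, "ret_3m": 63, "ret_12m": 252}
factors = pd.concat(
    {name: prices / prices.shift(n) - 1 for name, n in lookbacks.items()},
    axis=1,
)

# 12-1 momentum: the 12-month return that skips the most recent month.
momentum_12_1 = prices.shift(21) / prices.shift(252) - 1

# A cross-sectional percentile rank per day makes values comparable across dates.
momentum_rank = momentum_12_1.rank(axis=1, pct=True)

print(factors.dropna().tail(3))
```

Using `shift` on a wide price table keeps each factor aligned with the date on which it would have been known, which helps avoid accidental look-ahead.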
- Fama French Data Library
- numpy website
- pandas website
- alphatools - Quantitative finance research tools in Python
- mlfinlab - Package based on the work of Dr Marcos Lopez de Prado regarding his research with respect to Advances in Financial Machine Learning
The notebook how_to_use_talib illustrates the usage of TA-Lib, which includes a broad range of common technical indicators. These indicators have in common that they only use market data, i.e., price and volume information.
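To make the idea of indicators built purely from price data concrete, here is a minimal sketch that hand-computes two common ones, Bollinger Bands and the relative strength index (RSI), with pandas. It deliberately does not call TA-Lib so that it runs without the compiled library; the function names, window lengths, and simulated prices are assumptions, and the values may differ from TA-Lib's output in warm-up and smoothing details.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical closing prices (simulated random walk).
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 250).cumsum()), name="close")


def bollinger_bands(close, window=20, n_std=2.0):
    """Rolling mean plus/minus a multiple of the rolling standard deviation."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)
    return pd.DataFrame(
        {"upper": middle + n_std * std, "middle": middle, "lower": middle - n_std * std}
    )


def rsi(close, window=14):
    """RSI with Wilder-style smoothing (exponential weights, alpha = 1 / window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)


bands = bollinger_bands(close)
rsi_14 = rsi(close)
print(pd.concat([close, bands, rsi_14.rename("rsi")], axis=1).tail(3))
```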
The notebook common_alpha_factors in the appendix contains dozens of additional examples.
The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter using the PyKalman package for smoothing; we will also use it in Chapter 9 when we develop a pairs trading strategy.
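The mechanics behind Kalman filtering can be sketched without any library. The example below is a hand-written scalar filter for a local-level (random walk plus noise) model, not the PyKalman API; the function name, the variance parameters, and the simulated series are assumptions chosen for illustration.

```python
import numpy as np


def kalman_filter_1d(observations, process_var=0.0025, obs_var=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    observations = np.asarray(observations, dtype=float)
    state_means = np.empty_like(observations)
    mean, var = observations[0], 1.0  # initial state estimate and its uncertainty
    for t, obs in enumerate(observations):
        # Predict: the state follows a random walk, so uncertainty grows.
        var_pred = var + process_var
        # Update: blend the prediction and the observation via the Kalman gain.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (obs - mean)
        var = (1 - gain) * var_pred
        state_means[t] = mean
    return state_means


rng = np.random.default_rng(seed=2)
true_level = rng.normal(scale=0.05, size=500).cumsum()  # hidden, slowly moving level
noisy = true_level + rng.normal(scale=1.0, size=500)    # what we actually observe
filtered = kalman_filter_1d(noisy)

raw_mse = np.mean((noisy - true_level) ** 2)
filtered_mse = np.mean((filtered - true_level) ** 2)
print(f"MSE raw: {raw_mse:.3f}, MSE filtered: {filtered_mse:.3f}")
```

Each estimate uses only observations up to that point in time, so the filtered series can serve as a feature without introducing look-ahead bias. The ratio of `process_var` to `obs_var` controls how quickly the estimate follows new data.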
- PyKalman documentation
- Tutorial: The Kalman Filter
- Understanding and Applying Kalman Filtering
- How a Kalman filter works, in pictures
The notebook kalman_filter_and_wavelets also demonstrates how to work with wavelets using the PyWavelets package.
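The idea behind wavelet denoising is to decompose a series into coarse and detail coefficients, shrink the small detail coefficients that mostly carry noise, and reconstruct. The sketch below hand-rolls this with the Haar wavelet in NumPy rather than using PyWavelets; the function name, the number of levels, and the universal threshold with a median-based noise estimate are assumptions for illustration, and PyWavelets supports many more wavelet families.

```python
import numpy as np


def haar_denoise(signal, levels=3, threshold=None):
    """Denoise by soft-thresholding the detail coefficients of a Haar transform."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2**levels:
        raise ValueError("signal length must be divisible by 2**levels")

    # Forward transform: split repeatedly into averages and differences.
    approx, details = x, []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)

    if threshold is None:
        # Universal threshold; noise scale estimated from the finest details.
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2 * np.log(len(x)))

    # Soft thresholding shrinks every detail coefficient towards zero.
    details = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0) for d in details]

    # Inverse transform: rebuild from the coarsest level upwards.
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx


rng = np.random.default_rng(seed=5)
t = np.linspace(0, 1, 512)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)
denoised = haar_denoise(noisy)

noisy_mse = np.mean((noisy - clean) ** 2)
denoised_mse = np.mean((denoised - clean) ** 2)
print(f"MSE noisy: {noisy_mse:.3f}, MSE denoised: {denoised_mse:.3f}")
```

Note that this transform uses the full series at once, so applying it to data used for prediction would require recomputing it on a rolling basis to avoid look-ahead.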
- PyWavelets - Wavelet Transforms in Python
- An Introduction to Wavelets
- The Wavelet Tutorial
- Wavelets for Kids
- The Barra Equity Risk Model Handbook
- Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk by Richard Grinold and Ronald Kahn, 1999
- Modern Investment Management: An Equilibrium Approach by Bob Litterman, 2003
- Quantitative Equity Portfolio Management: Modern Techniques and Applications by Edward Qian, Ronald Hua, and Eric Sorensen
- Spearman Rank Correlation
The open source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm-development and live-trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.
Chapter 8 contains a more comprehensive introduction to Zipline.
- The current release 1.3 has a few shortcomings such as the dependency on benchmark data from the IEX exchange and limitations for importing features beyond the basic OHLCV data points.
- To enable the use of zipline, I've provided a patched version that works for the purposes of this book.
- Please follow the instructions in the installation folder.
The notebook single_factor_zipline develops and tests a simple mean-reversion factor that measures how much recent performance has deviated from the historical average. Short-term reversal is a common strategy that takes advantage of the weakly predictive pattern that stock price increases are likely to mean-revert back down over horizons from less than a minute to one month.
The Quantopian research environment is tailored to the rapid testing of predictive alpha factors. The process is very similar because it builds on zipline, but offers much richer access to data sources.
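A factor of this kind can be sketched in plain pandas, outside of Zipline. The example below is an assumption-laden illustration rather than the notebook's implementation: it uses simulated prices, made-up tickers, a 21-day "month", and a 252-day "year", and expresses the latest monthly return as a z-score relative to its own trailing history.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)

# Hypothetical daily prices for four made-up tickers (simulated data).
dates = pd.bdate_range("2017-01-02", periods=400)
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.01, size=(len(dates), 4)).cumsum(axis=0)),
    index=dates,
    columns=["AAA", "BBB", "CCC", "DDD"],
)

MONTH, YEAR = 21, 252  # assumed trading-day lengths

# Rolling one-month return for every day and asset.
monthly_returns = prices / prices.shift(MONTH) - 1

# Deviation of the latest monthly return from its trailing one-year average,
# scaled by the trailing standard deviation (a z-score).
hist_mean = monthly_returns.rolling(YEAR).mean()
hist_std = monthly_returns.rolling(YEAR).std()
mean_reversion = (monthly_returns - hist_mean) / hist_std

# Under a reversal view, the most negative scores are long candidates
# and the most positive scores are short candidates.
latest = mean_reversion.iloc[-1]
print(latest.sort_values())
```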
The notebook multiple_factors_quantopian_research illustrates how to compute alpha factors not only from market data as previously but also from fundamental and alternative data.
The notebook performance_eval_alphalens introduces the alphalens library for the performance analysis of predictive (alpha) factors, open-sourced by Quantopian. It demonstrates how it integrates with the backtesting library zipline and the portfolio performance and risk analysis library pyfolio that we will explore in the next chapter.
alphalens facilitates the analysis of the predictive power of alpha factors concerning the:
- Correlation of the signals with subsequent returns
- Profitability of an equal or factor-weighted portfolio based on (a subset of) the signals
- Turnover of factors to indicate the potential trading costs
- Factor-performance during specific events
- Breakdowns of the preceding by sector
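The first item in the list, the correlation of signals with subsequent returns, is what the information coefficient (IC) measures. The sketch below computes a daily IC by hand as the Spearman rank correlation between factor values and forward returns across assets. It does not use the alphalens API; the data is simulated and the strength of the relationship (`0.1`) is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)

n_days, n_assets = 250, 50
dates = pd.bdate_range("2020-01-01", periods=n_days)
assets = [f"A{i:02d}" for i in range(n_assets)]

# Hypothetical factor values and forward returns with a weak built-in relationship.
factor = pd.DataFrame(rng.normal(size=(n_days, n_assets)), index=dates, columns=assets)
noise = rng.normal(size=(n_days, n_assets))
forward_returns = 0.01 * (0.1 * factor + noise)

# Spearman correlation equals the Pearson correlation of the ranks,
# computed here per day across all assets.
factor_ranks = factor.rank(axis=1)
return_ranks = forward_returns.rank(axis=1)
daily_ic = factor_ranks.corrwith(return_ranks, axis=1)

mean_ic = daily_ic.mean()
ic_ratio = mean_ic / daily_ic.std()  # mean IC relative to its variability
print(f"mean IC: {mean_ic:.3f}, IC mean/std: {ic_ratio:.2f}")
```

Because the IC relies on ranks, it is less sensitive to outliers in either the factor or the returns than a plain linear correlation.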
The analysis can be conducted using tearsheets or individual computations and plots. The tearsheets are illustrated in the online repo to save some space.
- See here for a detailed alphalens tutorial by Quantopian
- QuantConnect
- Alpha Trading Labs
- WorldQuant
- Python Algorithmic Trading Library PyAlgoTrade
- pybacktest
- Trading with Python
- Interactive Brokers