
Machine Learning for Factor Investing

✍ Scribed by Guillaume Coqueret, Tony Guida


Publisher
CRC Press LLC
Year
2023
Tongue
English
Leaves
358
Category
Library


✦ Synopsis


Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stock selection. The technicality of the subject can make it hard for non-specialists to join in, as the jargon and coding requirements may seem out of reach. Machine Learning for Factor Investing: Python Version bridges this gap. It provides a comprehensive tour of modern ML-based investment strategies that rely on firm characteristics. The book covers a wide array of subjects, ranging from economic rationales to rigorous portfolio backtesting, and encompasses both data processing and model interpretability. Common supervised learning algorithms such as tree models and neural networks are explained in the context of style investing, and the reader can also dig into more advanced techniques such as autoencoder models of asset returns, Bayesian additive trees, and causal models. All topics are illustrated with self-contained Python code samples and snippets applied to a large public dataset that contains over 90 predictors. The material, along with the content of the book, is available online so that readers can reproduce and extend the examples at their convenience. Readers with even a basic knowledge of quantitative finance will find that this combination of theoretical concepts and practical illustrations helps them learn quickly and deepen their financial and technical expertise.

✦ Table of Contents


I Introduction

1 Notations and data

    1.1 Notations

    1.2 Dataset

2 Introduction

    2.1 Context

    2.2 Portfolio construction: the workflow

    2.3 Machine learning is no magic wand

3 Factor investing and asset pricing anomalies

    3.1 Introduction

    3.2 Detecting anomalies

        3.2.1 Challenges

        3.2.2 Simple portfolio sorts

        3.2.3 Factors

        3.2.4 Fama-MacBeth regressions

        3.2.5 Factor competition

        3.2.6 Advanced techniques

    3.3 Factors or characteristics?

    3.4 Hot topics: momentum, timing, and ESG

        3.4.1 Factor momentum

        3.4.2 Factor timing

        3.4.3 The green factors

    3.5 The links with machine learning

        3.5.1 Short list of recent references

        3.5.2 Explicit connections with asset pricing models

    3.6 Coding exercises

4 Data preprocessing

    4.1 Know your data

    4.2 Missing data

    4.3 Outlier detection

    4.4 Feature engineering

        4.4.1 Feature selection

        4.4.2 Scaling the predictors

    4.5 Labelling

        4.5.1 Simple labels

        4.5.2 Categorical labels

        4.5.3 The triple barrier method

        4.5.4 Filtering the sample

        4.5.5 Return horizons

    4.6 Handling persistence

    4.7 Extensions

        4.7.1 Transforming features

        4.7.2 Macroeconomic variables

        4.7.3 Active learning

    4.8 Additional code and results

        4.8.1 Impact of rescaling: graphical representation

        4.8.2 Impact of rescaling: toy example

    4.9 Coding exercises

II Common supervised algorithms

5 Penalized regressions and sparse hedging for minimum variance portfolios

    5.1 Penalized regressions

        5.1.1 Simple regressions

        5.1.2 Forms of penalizations

        5.1.3 Illustrations

    5.2 Sparse hedging for minimum variance portfolios

        5.2.1 Presentation and derivations

        5.2.2 Example

    5.3 Predictive regressions

        5.3.1 Literature review and principle

        5.3.2 Code and results

    5.4 Coding exercise

6 Tree-based methods

    6.1 Simple trees

        6.1.1 Principle

        6.1.2 Further details on classification

        6.1.3 Pruning criteria

        6.1.4 Code and interpretation

    6.2 Random forests

        6.2.1 Principle

        6.2.2 Code and results

    6.3 Boosted trees: Adaboost

        6.3.1 Methodology

        6.3.2 Illustration

    6.4 Boosted trees: extreme gradient boosting

        6.4.1 Managing loss

        6.4.2 Penalization

        6.4.3 Aggregation

        6.4.4 Tree structure

        6.4.5 Extensions

        6.4.6 Code and results

        6.4.7 Instance weighting

    6.5 Discussion

    6.6 Coding exercises

7 Neural networks

    7.1 The original perceptron

    7.2 Multilayer perceptron

        7.2.1 Introduction and notations

        7.2.2 Universal approximation

        7.2.3 Learning via back-propagation

        7.2.4 Further details on classification

    7.3 How deep we should go and other practical issues

        7.3.1 Architectural choices

        7.3.2 Frequency of weight updates and learning duration

        7.3.3 Penalizations and dropout

    7.4 Code samples and comments for vanilla MLP

        7.4.1 Regression example

        7.4.2 Classification example

        7.4.3 Custom losses

    7.5 Recurrent networks

        7.5.1 Presentation

        7.5.2 Code and results

    7.6 Other common architectures

        7.6.1 Generative adversarial networks

        7.6.2 Autoencoders

        7.6.3 A word on convolutional networks

        7.6.4 Advanced architectures

    7.7 Coding exercise

8 Support vector machines

    8.1 SVM for classification

    8.2 SVM for regression

    8.3 Practice

    8.4 Coding exercises

9 Bayesian methods

    9.1 The Bayesian framework

    9.2 Bayesian sampling

        9.2.1 Gibbs sampling

        9.2.2 Metropolis-Hastings sampling

    9.3 Bayesian linear regression

    9.4 Naïve Bayes classifier

    9.5 Bayesian additive trees

        9.5.1 General formulation

        9.5.2 Priors

        9.5.3 Sampling and predictions

        9.5.4 Code

III From predictions to portfolios

10 Validating and tuning

    10.1 Learning metrics

        10.1.1 Regression analysis

        10.1.2 Classification analysis

    10.2 Validation

        10.2.1 The variance-bias tradeoff: theory

        10.2.2 The variance-bias tradeoff: illustration

        10.2.3 The risk of overfitting: principle

        10.2.4 The risk of overfitting: some solutions

    10.3 The search for good hyperparameters

        10.3.1 Methods

        10.3.2 Example: grid search

        10.3.3 Example: Bayesian optimization

    10.4 Short discussion on validation in backtests

11 Ensemble models

    11.1 Linear ensembles

        11.1.1 Principles

        11.1.2 Example

    11.2 Stacked ensembles

        11.2.1 Two-stage training

        11.2.2 Code and results

    11.3 Extensions

        11.3.1 Exogenous variables

        11.3.2 Shrinking inter-model correlations

    11.4 Exercise

12 Portfolio backtesting

    12.1 Setting the protocol

    12.2 Turning signals into portfolio weights

    12.3 Performance metrics

        12.3.1 Discussion

        12.3.2 Pure performance and risk indicators

        12.3.3 Factor-based evaluation

        12.3.4 Risk-adjusted measures

        12.3.5 Transaction costs and turnover

    12.4 Common errors and issues

        12.4.1 Forward looking data

        12.4.2 Backtest overfitting

        12.4.3 Simple safeguards

    12.5 Implication of non-stationarity: forecasting is hard

        12.5.1 General comments

        12.5.2 The no free lunch theorem

    12.6 First example: a complete backtest

    12.7 Second example: backtest overfitting

    12.8 Coding exercises

IV Further important topics

13 Interpretability

    13.1 Global interpretations

        13.1.1 Simple models as surrogates

        13.1.2 Variable importance (tree-based)

        13.1.3 Variable importance (agnostic)

        13.1.4 Partial dependence plot

    13.2 Local interpretations

        13.2.1 LIME

        13.2.2 Shapley values

        13.2.3 Breakdown

14 Two key concepts: causality and non-stationarity

    14.1 Causality

        14.1.1 Granger causality

        14.1.2 Causal additive models

        14.1.3 Structural time series models

    14.2 Dealing with changing environments

        14.2.1 Non-stationarity: yet another illustration

        14.2.2 Online learning

        14.2.3 Homogeneous transfer learning

15 Unsupervised learning

    15.1 The problem with correlated predictors

    15.2 Principal component analysis and autoencoders

        15.2.1 A bit of algebra

        15.2.2 PCA

        15.2.3 Autoencoders

        15.2.4 Application

    15.3 Clustering via k-means

    15.4 Nearest neighbors

    15.5 Coding exercise

16 Reinforcement learning

    16.1 Theoretical layout

        16.1.1 General framework

        16.1.2 Q-learning

        16.1.3 SARSA

    16.2 The curse of dimensionality

    16.3 Policy gradient

        16.3.1 Principle

        16.3.2 Extensions

    16.4 Simple examples

        16.4.1 Q-learning with simulations

        16.4.2 Q-learning with market data

    16.5 Concluding remarks

    16.6 Exercises

V Appendix

17 Data description

18 Solutions to exercises

    18.1 Chapter 3

    18.2 Chapter 4

    18.3 Chapter 5

    18.4 Chapter 6

    18.5 Chapter 7: the autoencoder model and universal approximation

    18.6 Chapter 8

    18.7 Chapter 11: ensemble neural network

    18.8 Chapter 12

        18.8.1 EW portfolios

        18.8.2 Advanced weighting function

    18.9 Chapter 15

    18.10 Chapter 16

Bibliography

Index


📜 SIMILAR VOLUMES


Machine Learning for Factor Investing: R
✍ Guillaume Coqueret, Tony Guida 📂 Library 📅 2020 🏛 Chapman and Hall/CRC 🌐 English

Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stocks selection. The technicality of the subject can make it hard for non-special


Probabilistic Machine Learning for Finan
✍ Deepak Kanungo 📂 Library 📅 2022 🏛 O'Reilly Media, Inc. 🌐 English

Whether based on academic theories or machine learning strategies, all financial models are at the mercy of modeling errors that can be mitigated but not eliminated. Probabilistic ML technologies are based on a simple and intuitive definition of probability and the rigorous calculus of probability t

Financial Machina: Machine Learning For
✍ Sampson, Josh; Strauss, Johann; Bisette, Vincent; Van Der Post, Hayden 📂 Library 📅 2024 🏛 Reactive Publishing 🌐 English

"Step beyond the horizon of traditional finance with "Financial Machina: The Quintessential Compendium." This magnum opus isn't just a guide; it's your cipher to decode the enigmas of financial data science. Perfect for the finance maverick hungry for the acumen that only mach

