In the investment industry, algorithmic models, especially machine learning models, have increasingly replaced classical statistical models in recent years. However, overfitting is a common problem in both machine learning and statistics. When used incorrectly, ML models carry a higher risk of overfitting than classical methods.
Traditional linear models cannot capture complex patterns in data, such as non-linear and hierarchical relationships. They handle low-dimensional, numeric data, whereas modern applications involve categorical and high-dimensional data. In addition, applying linear models has several pitfalls.
In general, portfolio construction has centered on optimizing risk-adjusted return. In recent years, institutional investors have begun to add environmental and social impact goals to their portfolios’ objectives. This focus on sustainability, in addition to risk and return, affects asset allocation and is important in equity portfolios. The question is: how can investors construct equity portfolios that achieve both risk/return and sustainability objectives?
In portfolio construction, minimizing volatility (variance) is a commonly used objective. However, volatility is a symmetric measure: it does not differentiate between upside and downside risk.
An alternative risk measure is CVaR (conditional value-at-risk), which focuses on the expected loss in the worst-case tail of the return distribution.
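As a rough illustration of why a tail measure can be more informative than volatility, the sketch below estimates historical VaR and CVaR with NumPy. The function name historical_var_cvar, the two return series, and the 90% confidence level are assumptions made up for this example, and the empirical (historical) estimator is only one of several ways to compute CVaR.

```python
import numpy as np

def historical_var_cvar(returns, alpha=0.95):
    """Historical (empirical) VaR and CVaR, both reported as positive losses."""
    losses = -np.asarray(returns, dtype=float)  # a loss is a negative return
    var = np.quantile(losses, alpha)            # loss threshold at confidence level alpha
    cvar = losses[losses >= var].mean()         # average loss at or beyond that threshold
    return float(var), float(cvar)

# Two hypothetical return series that are mirror images of each other
downside_heavy = np.array([-0.10, -0.05, 0.00, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
upside_heavy = -downside_heavy

# The standard deviation is identical for both series ...
print(np.std(downside_heavy), np.std(upside_heavy))
# ... while CVaR is larger for the series with the heavier loss tail
print(historical_var_cvar(downside_heavy, alpha=0.9))
print(historical_var_cvar(upside_heavy, alpha=0.9))
```

Because the two series mirror each other, volatility cannot tell them apart, whereas CVaR is driven only by the loss tail.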
Here we introduce the main steps of developing machine learning models. To demonstrate, we provide online Python code for each step and a complete case study.
Online portfolio selection is a fundamental problem in investment that has been studied extensively using machine learning and statistics. Here we provide an introduction to online portfolio selection techniques.
...
From a machine learning perspective, online portfolio selection can be formulated as a sequential decision problem.
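To make the sequential-decision framing concrete, here is a minimal sketch of one well-known online strategy, the exponentiated gradient update. It is chosen purely for illustration and is not necessarily the technique covered here. The function name, the learning rate eta, and the toy price relatives are assumptions. In each period the investor commits to weights, observes the price relatives (the ratio of closing price to previous closing price), and then updates the weights for the next period.

```python
import numpy as np

def exponentiated_gradient(price_relatives, eta=0.05):
    """Run an exponentiated-gradient portfolio over a sequence of price relatives.

    price_relatives: array of shape (n_periods, n_assets), each entry p_t / p_{t-1}.
    Returns the final cumulative wealth (starting from 1.0) and the weights
    that were held in each period.
    """
    x = np.asarray(price_relatives, dtype=float)
    n_periods, n_assets = x.shape
    w = np.full(n_assets, 1.0 / n_assets)  # start from the uniform portfolio
    wealth = 1.0
    held = []
    for t in range(n_periods):
        held.append(w.copy())                       # decision made before seeing x[t]
        period_return = float(w @ x[t])             # portfolio growth over period t
        wealth *= period_return
        w = w * np.exp(eta * x[t] / period_return)  # multiplicative weight update
        w /= w.sum()                                # project back onto the simplex
    return wealth, np.array(held)

# Hypothetical data: asset 0 rises 10% and asset 1 falls 10% in every period
toy = np.array([[1.1, 0.9]] * 5)
final_wealth, weights = exponentiated_gradient(toy)
print(final_wealth)
print(weights)
```

Each row of the returned weights sums to one, and weight shifts gradually toward the asset that has performed better so far.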
Meta labeling is particularly helpful when we want to achieve higher F1-scores by filtering out false positives. Meta labeling is a powerful tool for delivering more robust and reliable outcomes than other labeling methods.
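The filtering idea can be sketched as follows, assuming a primary model that produces trade signals and a secondary (meta) model that estimates the probability that each signal is correct. All arrays, the 0.5 threshold, and the helper names below are hypothetical; in practice the meta probabilities would come from a trained classifier rather than hand-written values.

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1 score for binary labels, computed from true/false positives and false negatives."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def apply_meta_filter(primary_pred, meta_prob, threshold=0.5):
    """Keep a primary signal only when the meta model is confident enough in it."""
    primary_pred = np.asarray(primary_pred, dtype=bool)
    return primary_pred & (np.asarray(meta_prob, dtype=float) >= threshold)

# Hypothetical outcomes, primary signals, and meta-model confidence scores
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
primary_pred = np.array([1, 1, 1, 1, 1, 0, 1, 0])
meta_prob = np.array([0.9, 0.2, 0.8, 0.3, 0.7, 0.5, 0.6, 0.9])

filtered_pred = apply_meta_filter(primary_pred, meta_prob, threshold=0.5)
print(f1_score_binary(y_true, primary_pred))
print(f1_score_binary(y_true, filtered_pred))
```

In this toy data the filter removes two of the three false positives without dropping any true positives, so precision and F1 both rise. The meta model can only veto signals, never add them, so recall cannot improve and the gain depends on how well the meta model separates good signals from bad ones.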