Probability of Backtest Overfitting: Combinatorially Symmetric Cross-Validation (CSCV)

The core question we are asking is this: what constitutes a legitimate empirical finding in the context of investment research? The question of "legitimate empirical findings" is particularly troubling when researchers conduct multiple tests, because the probability of finding false positives increases with the number of tests conducted on the same data. Since genetic algorithms and walk-forward optimization can generate an extremely large number of trials, a false positive can emerge purely by chance, so StrategyQuant needs a robust statistical method to assess the probability of backtest overfitting.
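To make the multiple-testing effect concrete, here is a small Python illustration (mine, not from the attached paper): if each independent trial has a 5% chance of producing a spurious "discovery", the chance of at least one false positive grows as 1 - (1 - 0.05)^N.

# Hypothetical illustration of the multiple-testing problem.
# alpha is an assumed per-trial false positive rate.
alpha = 0.05
for n_trials in (1, 10, 100, 1000):
    p_any = 1 - (1 - alpha) ** n_trials
    print(f"{n_trials:>5} trials -> P(at least one false positive) = {p_any:.4f}")

Already at 100 trials this probability exceeds 99%, which is why naively selecting the best strategy out of thousands of generated variants is unreliable.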


The goal should be to develop computational techniques that control for the increased probability of false positives as the number of trials grows, applied to the particular field of investment strategy research.
Typically, the principal reason a strategy underperforms out-of-sample is that the in-sample (IS) "optimal" strategy is so closely tied to the noise contained in the training set that further optimization becomes pointless, or even detrimental, for the purpose of extracting the signal.


We set as the null hypothesis that backtest overfitting has indeed taken place, and develop an algorithm that tests for this hypothesis. For a given strategy, the probability of backtest overfitting (PBO) is then evaluated as the conditional probability that this strategy underperforms the median out-of-sample (OOS) while remaining optimal in-sample (IS). For this we need a specific implementation, called combinatorially symmetric cross-validation (CSCV). One advantage of this solution is that it only requires the time series of backtested performance. It also avoids the credibility issue of preserving a truly out-of-sample test set: instead of requiring a fixed "hold-out", it swaps all IS and OOS datasets symmetrically.
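Concretely, the criterion can be written down as follows (this formalization follows my reading of the paper's notation and may differ in minor details). For each IS/OOS split c, let n*_c be the strategy with the best IS performance, and let rank_OOS(n*_c) be its performance rank among all N trials out-of-sample:

omega_c = rank_OOS(n*_c) / (N + 1)
lambda_c = ln( omega_c / (1 - omega_c) )
PBO = #{ c : lambda_c <= 0 } / #{ all splits c }

The logit lambda_c is zero exactly at the median rank and negative below it, so PBO is simply the fraction of splits in which the IS-optimal strategy ends up at or below the median OOS.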


Under the CSCV algorithm, the backtest strategy selection process overfits if the strategy with optimal performance IS has an expected ranking below the median OOS. In other words, a strategy selection process overfits if the expected performance of the strategies selected IS is less than the median performance rank OOS of all strategies. In that situation, the strategy selection process is in fact detrimental.
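Below is a minimal Python sketch of the CSCV procedure as I understand it from the paper; the function name cscv_pbo, the block count S, and the use of the Sharpe ratio as the performance metric are my illustrative choices, not StrategyQuant code.

from itertools import combinations
import numpy as np

def sharpe(returns):
    # Per-period Sharpe ratio as the performance metric (an assumption;
    # any metric applied consistently IS and OOS would work).
    sd = returns.std(axis=0, ddof=1)
    return np.where(sd > 0, returns.mean(axis=0) / sd, 0.0)

def cscv_pbo(M, S=16):
    # M is a (T, N) matrix of backtested returns, one column per strategy
    # trial; S is an even number of row blocks. All C(S, S/2) symmetric
    # IS/OOS splits are evaluated.
    T, N = M.shape
    blocks = np.array_split(np.arange(T), S)
    logits = []
    for is_blocks in combinations(range(S), S // 2):
        oos_blocks = [s for s in range(S) if s not in is_blocks]
        J = np.concatenate([blocks[s] for s in is_blocks])    # IS rows
        Jc = np.concatenate([blocks[s] for s in oos_blocks])  # OOS rows
        n_star = np.argmax(sharpe(M[J]))             # best strategy in-sample
        oos_perf = sharpe(M[Jc])
        rank = (oos_perf <= oos_perf[n_star]).sum()  # OOS rank of the IS winner
        omega = rank / (N + 1.0)                     # relative rank in (0, 1)
        logits.append(np.log(omega / (1.0 - omega)))
    return (np.array(logits) <= 0).mean()            # PBO estimate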


Estimating this probability in a particular application relies on a scheme for selecting samples of IS and OOS pairs. This section is devoted to establishing such a procedure, named combinatorially symmetric cross-validation and abbreviated as CSCV for convenience of reference. The CSCV algorithm is detailed on pages 11 and 12 of the attached research paper "The Probability of Backtest Overfitting" (2015).
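As a quick sanity check of the sketch above (hypothetical usage, reusing cscv_pbo): on pure noise the IS winner has no genuine edge, so its OOS rank is roughly uniform and the PBO estimate should come out near 0.5.

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(0.0, 0.01, size=(1000, 50))  # 1000 bars, 50 noise "strategies"
print("PBO estimate:", cscv_pbo(M, S=12))   # expect a value close to 0.5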

Attachments
The Probability of Backtest Overfitting 2015.pdf (1.16 MiB)
  • Votes: +4
  • Project: StrategyQuant X
  • Type: Feature
  • Status: Archived
  • Priority: Normal
  • Assignee: None

History

#1 jdelcarm66, 11.05.2020 18:16: Task created
#2 jdelcarm66, 11.05.2020 18:17: Voted for this task.
#3 RNG, 13.06.2020 12:05: Voted for this task.
#4 BobS, 27.07.2020 21:51: Voted for this task.
#5 Venus, 01.08.2020 09:41: Voted for this task.
#6 Mark Fric, 16.03.2021 10:35: Status changed from New to Archived

