DELETE
Attachments
No attachments
  • Votes +9
  • Project StrategyQuant X
  • Type Feature
  • Status New
  • Priority Normal

History

#1

Karish

29.03.2021 21:31

Task created

#2

Karish

29.03.2021 21:31
Voted for this task.
#3

Jabezz

30.03.2021 07:07
Voted for this task.
#4

lemming78

30.03.2021 07:30
Voted for this task.
#5

Mark Fric

30.03.2021 09:05
This looks simple enough. How would you want to configure it?


I assume there should be a choice of K (number of parts): 3, 4, 5, 6.


Should all the parts be the same size, or can they be different?

#6

Karish

30.03.2021 09:43

Yay, I'm glad you like the idea. From my research, this method is widely used to validate predictive models in the data science field.


About your question on the implementation:
Yes, there should be a value "K", the number of folds, and each fold would then hold TotalAmountOfData / K of the data.

So, for example, if we want 5 folds, the data would be divided into 5 parts of 20% each (10 folds would give 10 parts of 10%).

All the parts should be the same size.


(I was kinda drunk when writing the first post; I noticed I had miscalculated: 4 folds means 25% per fold, not 20%. Sorry about that, hehe...)

Let's take an example with 5 folds this time:

The user sets the K-Folds parameter to "5".
SQ divides the data into 5 EQUALLY SIZED folds (parts), 100% / 5 = 20% per part:


Run 1 = [IS][IS][IS][IS][OOS]
Run 2 = [OOS][IS][IS][IS][IS]
Run 3 = [IS][OOS][IS][IS][IS]
Run 4 = [IS][IS][OOS][IS][IS]
Run 5 = [IS][IS][IS][OOS][IS]
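The rotation above can be sketched in Python. This is a minimal sketch, assuming the data is just a list of sample indices that divides evenly into K folds; `k_fold_splits` and `run_patterns` are hypothetical helpers, not anything in SQ. Here run i uses fold i as OOS; the runs above are listed in a different order, but in both cases every fold is OOS exactly once.

```python
def k_fold_splits(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n_samples-1 into k contiguous, equally sized folds.

    Returns one (is_indices, oos_indices) pair per run: fold i is OOS in run i,
    the other k-1 folds are IS.
    """
    if n_samples % k != 0:
        raise ValueError("n_samples must be divisible by k for equal-sized folds")
    size = n_samples // k
    folds = [list(range(i * size, (i + 1) * size)) for i in range(k)]
    runs = []
    for oos in range(k):
        is_idx = [j for f in range(k) if f != oos for j in folds[f]]
        runs.append((is_idx, folds[oos]))
    return runs


def run_patterns(k: int) -> list[str]:
    """Render each run as the [IS]/[OOS] pattern used in the post."""
    return [
        "".join("[OOS]" if f == oos else "[IS]" for f in range(k))
        for oos in range(k)
    ]


for i, pattern in enumerate(run_patterns(5), start=1):
    print(f"Run {i} = {pattern}")

runs = k_fold_splits(100, 5)
print(len(runs[0][0]), len(runs[0][1]))  # 80 20 (80% IS, 20% OOS)
```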


Each run would also produce some kind of score (maybe we can use R-Squared?).
If the average R-Squared across all the runs meets some threshold, we would consider that the strategy passed the K-Fold Cross Validation.


More information about the benefits and proper usage of this method is easy to find by searching for "K-Fold Cross Validation".

#7

hankeys

30.03.2021 10:01
If the strategy has already been tested and passed on IS IS IS IS OOS, why test it on the same data as OOS IS IS IS IS?
#8

Karish

30.03.2021 10:24
Good point, Hankeys.

We would get a score on the OOS part of each K-Fold run,
and then we would be able to see the average of that score:

Run 1 = [IS][IS][IS][IS][OOS]  =  Score = 0.21 R-Squared
Run 2 = [OOS][IS][IS][IS][IS]  =  Score = 0.43 R-Squared
Run 3 = [IS][OOS][IS][IS][IS]  =  Score = 0.47 R-Squared
Run 4 = [IS][IS][OOS][IS][IS]  =  Score = 0.72 R-Squared
Run 5 = [IS][IS][IS][OOS][IS]  =  Score = 0.75 R-Squared

Overall score = (run1 + run2 + run3 + run4 + run5) / K

Or with the numbers:
Overall score = (0.21 + 0.43 + 0.47 + 0.72 + 0.75) / 5 = 2.58 / 5 = [Average R-Squared of 0.516]
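The averaging step is a plain mean of the per-run OOS scores. A minimal sketch using the five scores from the post; `MIN_AVG_SCORE` is a hypothetical pass threshold, since the post does not fix one.

```python
def overall_score(scores: list[float]) -> float:
    """Mean of the per-run OOS scores: (run1 + ... + runK) / K."""
    if not scores:
        raise ValueError("need at least one run")
    return sum(scores) / len(scores)


scores = [0.21, 0.43, 0.47, 0.72, 0.75]  # OOS R-squared of runs 1..5, from the post
avg = overall_score(scores)
print(round(avg, 3))  # 0.516

MIN_AVG_SCORE = 0.5  # hypothetical threshold, not an SQ setting
print("passed" if avg >= MIN_AVG_SCORE else "failed")  # passed
```

The spread between the weakest run (0.21) and the strongest (0.75) also says something about stability, which the average alone hides.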

Essentially, we want to know how stable and robust our model is. This method can be applied to optimization tests as well.

What I would like to see is this K-Fold method implemented together with the Genetic Algorithm inside SQ; this way the Genetic Algorithm would favor robust strategies rather than being biased toward overfitting.
#9

Karish

30.03.2021 11:18
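I researched it more deeply and concluded that standard K-Fold validation cannot be used for time series: in most runs the IS folds include data from AFTER the OOS fold, so the model is trained on the future.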
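To make the time-series problem concrete: with contiguous folds and the index standing for time, a quick count shows how many IS samples lie after the OOS fold in each run. A minimal sketch with a hypothetical helper `future_is_samples`:

```python
def future_is_samples(n_samples: int, k: int) -> list[int]:
    """For contiguous K-Fold on time-ordered data (index = time), count, per run,
    how many IS (training) samples come AFTER the end of the OOS fold."""
    size = n_samples // k
    counts = []
    for oos in range(k):
        oos_start, oos_end = oos * size, (oos + 1) * size  # OOS = [oos_start, oos_end)
        is_idx = [t for t in range(k * size) if not (oos_start <= t < oos_end)]
        counts.append(sum(1 for t in is_idx if t >= oos_end))
    return counts


print(future_is_samples(100, 5))  # [80, 60, 40, 20, 0]
```

Only the run whose OOS fold is the last one trains purely on the past.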

However, I would still like to see a more accurate and robust way to use the results of Random/Genetic generation,

because the Builder is, to some extent, an optimization engine: it tries different parameters to satisfy our Ranking filters and then shows us the result as a strategy in our databank.

In this so-called "optimization" process we could already apply some "forecasting procedure" validation from the get-go, so that a strategy is checked for predictive power
BEFORE it enters our databank.


I found some good references:

This one explains why K-Fold cannot be used and describes several alternative methods that are essentially forms of walk-forward validation:
https://medium.com/@soumyachess1496/cross-validation-in-time-series-566ae4981ce4

OneStepCross-Validation (looks like what we already have in WFM, but the IS part keeps the same starting period)
+
MultiStepCross-Validation (same as above, but seems to be more predictive of the future?)
https://www.youtube.com/watch?v=oGqsyv49Wvo
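Assuming OneStep/MultiStep cross-validation mean one-step-ahead and h-step-ahead evaluation on an anchored (expanding) IS window, as they are commonly described, the splits can be sketched like this. `anchored_splits` is a hypothetical helper, and the exact definitions in the linked sources may differ.

```python
def anchored_splits(n_samples: int, min_train: int, horizon: int = 1) -> list[tuple[list[int], int]]:
    """Anchored (expanding-window) splits over time-ordered samples.

    Every IS window starts at index 0 and grows by one sample per split.
    The OOS test point lies `horizon` steps after the last IS sample:
    horizon=1 -> one-step-ahead, horizon>1 -> multi-step-ahead.
    """
    splits = []
    for end in range(min_train, n_samples - horizon + 1):
        is_idx = list(range(0, end))  # IS = [0, end)
        oos_idx = end + horizon - 1   # `horizon` steps after index end-1
        splits.append((is_idx, oos_idx))
    return splits


for is_idx, oos_idx in anchored_splits(6, min_train=3, horizon=1):
    print(f"IS 0..{is_idx[-1]} -> OOS {oos_idx}")
# IS 0..2 -> OOS 3, IS 0..3 -> OOS 4, IS 0..4 -> OOS 5

for is_idx, oos_idx in anchored_splits(6, min_train=3, horizon=2):
    print(f"IS 0..{is_idx[-1]} -> OOS {oos_idx}")
# IS 0..2 -> OOS 4, IS 0..3 -> OOS 5
```

Unlike K-Fold, every OOS point here comes strictly after all of its IS data.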



#10

tnickel

30.03.2021 11:39
Voted for this task.
#11

Stormin_Norman2

30.03.2021 11:48
Voted for this task.
#12

bentra

30.03.2021 17:26
RE: putting K-Fold in the BUILDER...

If you want to analyze a strategy for stability by breaking the data up and getting scores for smaller segments, what's wrong with [is1][is2][is3][is4][is5] and then using appropriate filters? Or a simple custom column script could take the first 20% of trades and compute a stat, then the next 20%, etc., and then give an average. No need to retest 5x.
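The "no need to retest" alternative can be sketched as splitting one backtest's chronological trade list into consecutive 20% chunks and scoring each. A minimal sketch with made-up trade P/L values; `segment_stats` is hypothetical and does not model SQX's custom column API.

```python
from typing import Callable, Sequence


def segment_stats(
    trade_pnls: Sequence[float],
    parts: int = 5,
    stat: Callable[[Sequence[float]], float] = sum,
) -> tuple[list[float], float]:
    """Split chronological trade P/Ls into `parts` consecutive segments
    (first 20%, next 20%, ... for parts=5), apply `stat` to each segment,
    and return the per-segment values plus their average."""
    n = len(trade_pnls)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    values = [stat(trade_pnls[bounds[i]:bounds[i + 1]]) for i in range(parts)]
    return values, sum(values) / parts


pnls = [10, -5, 20, 15, -10, 5, 30, -20, 25, 10]  # hypothetical trades, oldest first
values, avg = segment_stats(pnls)
print(values, avg)  # [5, 35, -5, 10, 35] 16.0
```

Any per-segment statistic (win rate, profit factor, etc.) can be passed as `stat`; only one backtest is needed.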


K-Fold analysis is for training on IS and verifying on OOS. It checks the performance of the training process itself, i.e. an optimization process or a machine-learning training process. So what training process are we really trying to analyze here, the builder itself? Or do you want to do a mini optimization within the building process? (If so, it will be way slower than 5x validations, since it means more tests per ITERATION. I also think it would be comparable or even better to do a batch build first and then a K-Fold analysis optimization task afterwards.)

K-Fold would be awesome as a new task or implemented in the optimization task, but why complicate the builder itself?

#13

Karish

30.03.2021 17:47
Bentra,
I already noted that K-Fold cannot be used with time-series data such as market data, so K-Fold cannot help us here.

This method is used in data analysis of things that are NOT time-series based.


So I changed the subject of the topic to the following:



OneStepCross-Validation (looks like what we already have in WFM, but the IS part keeps the same starting period) + MultiStepCross-Validation (same as above, but seems to be more predictive of the future?) https://www.youtube.com/watch?v=oGqsyv49Wvo


It would be an awesome feature to have in SQ if the SQ builder engine had a robustness method built into it.
This would help us a lot.


The two methods above are already partly implemented in SQX, like the first one, OneStepCross-Validation,
and WF in general is the ultimate method in data analysis for validating whether models are overfitted or not.

If this method were available to us in the first steps of strategy mining, it would save us a huge amount of work.
I'm not saying that SQ's dev team needs to change the whole damn thing; what I am saying is to give this to us as an option to tick with a "V" and use.
If the user doesn't want to use it, they could simply turn it off.



#14

bentra

30.03.2021 18:41
OneStepCross-Validation (looks like what we already have in WFM, but the IS part keeps the same starting period)

Yes, it's anchored, and we already have it in SQX. In SQX it is the "floating/fixed" option (select "fixed" to have an anchored IS start time).
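Going by the description above (not SQX's actual code), the two window modes can be sketched as follows. Bar ranges are half-open [start, end), and `walk_forward_windows` is a hypothetical helper.

```python
def walk_forward_windows(
    n_bars: int, is_len: int, oos_len: int, anchored: bool
) -> list[tuple[int, int, int]]:
    """Return (is_start, is_end, oos_end) for each walk-forward run.

    anchored=False ("floating"): the IS window keeps its length and slides forward.
    anchored=True ("fixed" start): the IS window always starts at bar 0 and grows.
    The OOS window of each run is [is_end, oos_end).
    """
    windows = []
    is_end = is_len
    while is_end + oos_len <= n_bars:
        is_start = 0 if anchored else is_end - is_len
        windows.append((is_start, is_end, is_end + oos_len))
        is_end += oos_len  # step forward by one OOS period
    return windows


print(walk_forward_windows(10, 4, 2, anchored=False))  # [(0, 4, 6), (2, 6, 8), (4, 8, 10)]
print(walk_forward_windows(10, 4, 2, anchored=True))   # [(0, 4, 6), (0, 6, 8), (0, 8, 10)]
```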

#15

bentra

30.03.2021 18:51
If this method were available to us in the first steps of strategy mining, it would save us a huge amount of work.

I'm not saying that SQ's dev team needs to change the whole damn thing; what I am saying is to give this to us as an option to tick with a "V" and use.

If the user doesn't want to use it, they could simply turn it off.



To be clear, this ticket is now only about adding WF into the builder?


So I changed the subject of the topic to the following:

Nope, it still says K-Fold in the subject, and the main text of this ticket still shows K-Fold examples.

#16

Karish

30.03.2021 19:01

Subject changed from K-Fold Cross Validation to Builder's Cross Validation for less overfitting strategies from the get-go ?

#17

Karish

30.03.2021 19:02


To be clear, this ticket is now only about adding WF into the builder?


Yes, I guess so. The builder is already an optimization engine as it is, so why not add some kind of validation method to make it robust from the get-go? At least as an option.

#18

bentra

30.03.2021 19:28

OK, but it looks like Clonex started working out the leakage issue to try to make K-Fold useful.

https://analyticsindiamag.com/can-we-trust-k-fold-cross-validation-for-financial-modelling/
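One well-known way to reduce that leakage is to "purge" IS samples right before the OOS fold and "embargo" IS samples right after it. The sketch below is a simplified version with a single symmetric `gap`; it is not necessarily what the linked article does, and `purged_k_fold` is a hypothetical name. It only removes leakage at the fold boundaries; IS data from after the OOS fold is still used.

```python
def purged_k_fold(n_samples: int, k: int, gap: int) -> list[tuple[list[int], list[int]]]:
    """Contiguous K-Fold on time-ordered samples, dropping IS samples that lie
    within `gap` positions before (purge) or after (embargo) the OOS fold."""
    size = n_samples // k
    runs = []
    for f in range(k):
        oos_start, oos_end = f * size, (f + 1) * size  # OOS = [oos_start, oos_end)
        is_idx = [
            t for t in range(k * size)
            if t < oos_start - gap or t >= oos_end + gap
        ]
        runs.append((is_idx, list(range(oos_start, oos_end))))
    return runs


for is_idx, oos_idx in purged_k_fold(20, 4, gap=1):
    print(f"OOS {oos_idx[0]}..{oos_idx[-1]}, IS size {len(is_idx)}")
```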


As for putting it into the builder: it seems like you'd just be moving CPU load from one part of your workflow to another. It shouldn't necessarily be more efficient to do it at the beginning with the builder than at the end. It's preferable to filter with the quickest tests first so we have fewer strategies to run the longer tests on. Any kind of WF or K-Fold analysis is a long test. The results are exactly the same if you do the long test first, except that you've done more work.
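The ordering argument can be checked with a small cost model. The costs and pass rates below are made up for illustration; the point is that both orders leave the same survivors but the total work differs. `total_cost` is a hypothetical helper.

```python
def total_cost(stages: list[tuple[float, float]], n_strategies: int) -> tuple[float, float]:
    """Run filter stages in order. Each stage is (cost_per_strategy, pass_rate);
    only strategies that pass a stage go on to the next one.
    Returns (total_cost, expected_survivors)."""
    cost, remaining = 0.0, float(n_strategies)
    for cost_per, pass_rate in stages:
        cost += remaining * cost_per
        remaining *= pass_rate
    return cost, remaining


quick_test = (1.0, 0.1)   # hypothetical: quick backtest filter, 10% pass
long_test = (20.0, 0.3)   # hypothetical: WF / K-Fold style test, 30% pass

print(total_cost([quick_test, long_test], 1000))  # cost 3000.0, ~30 survivors
print(total_cost([long_test, quick_test], 1000))  # cost 20300.0, ~30 survivors
```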

The builder is already an optimization engine as it is,
I don't think so, not really. The builder is not constrained to a single strategy like the optimizer; it actually swaps new blocks in to make completely different strategies. In theory you could use a restrictive template to build with an in-situ "analysis" (K-Fold or other) to check the TEMPLATE performance itself, though. In that case, the building process combined with the template is what you'd be analyzing. To analyze each strategy with WF or K-Fold, we need to run a full WF or K-Fold on each strategy, and for that, as I pointed out above, it can and probably should be a different task.

#19

Karish

30.03.2021 22:19
This feature by @Hankeys looks unique.
Please review:

https://roadmap.strategyquant.com/tasks/sq4_5699/edit


Another thing I thought about is this:

For each strategy we find, we would optimize all its parameters within ±X% of each value, in steps of ±Y%. For example, say the generation finds the strategy Bar Closed > MA100, TP 200, SL 200. The method would take the three values available in this example (MA, TP, SL) and automatically check whether the surrounding parameter values are robust. This could work separately for each parameter OR for all the parameters at once (by choosing to do so).

MA = 100, TP = 200, SL = 200

Let's say we optimize all the parameters in this simple example with 10 steps each:
STEPS of optimization = 10
MAXimum optimization range = 25%

So MA 100 would be optimized like this: (100 = 100%), (100 * 0.25 = 25), (25 / 10 = 2.5).
TP 200 would be optimized like this: (200 = 100%), (200 * 0.25 = 50), (50 / 10 = 5).
SL 200 would be optimized like this: (200 = 100%), (200 * 0.25 = 50), (50 / 10 = 5).

If all the STEPS surrounding the parameters pass our criteria, then the strategy passes this validation method. What do you think? Seems to be a simple one..
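The ±X% neighborhood check described above can be sketched like this. It is a minimal sketch, assuming `passes` is a hypothetical callback that backtests a parameter set and applies the user's criteria; nothing here is an SQ API. With the post's numbers (±25%, 10 steps) the step is 2.5 for MA 100 and 5 for TP/SL 200.

```python
from itertools import product
from typing import Callable


def neighborhood(value: float, max_pct: float = 25.0, steps: int = 10) -> list[float]:
    """Values from value*(1 - max_pct/100) to value*(1 + max_pct/100),
    `steps` increments on each side; step = value * max_pct / 100 / steps."""
    step = value * max_pct / 100.0 / steps
    return [value + i * step for i in range(-steps, steps + 1)]


def robustness_check(
    params: dict[str, float],
    passes: Callable[[dict[str, float]], bool],
    max_pct: float = 25.0,
    steps: int = 10,
    all_at_once: bool = False,
) -> bool:
    """True if every neighboring parameter set passes the criteria.

    all_at_once=False: vary one parameter at a time, others at their found values.
    all_at_once=True: test the full grid of combinations.
    """
    grids = {name: neighborhood(v, max_pct, steps) for name, v in params.items()}
    if all_at_once:
        names = list(grids)
        combos = (dict(zip(names, vals)) for vals in product(*(grids[n] for n in names)))
    else:
        combos = ({**params, name: v} for name, grid in grids.items() for v in grid)
    return all(passes(c) for c in combos)


params = {"MA": 100, "TP": 200, "SL": 200}  # the strategy from the example
print(neighborhood(params["MA"])[:3])  # [75.0, 77.5, 80.0]
# Hypothetical criteria: pretend the strategy breaks down for MA periods below 80.
print(robustness_check(params, lambda p: p["MA"] >= 80))  # False
```

Testing each parameter separately costs 3 × 21 = 63 backtests here, while all at once costs 21³ = 9261. Integer parameters such as the MA period would also need rounding, which this sketch leaves out.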



#20

Karish

31.03.2021 18:06

This is a new, more specific feature suggestion based on what I wrote above. Please vote:


https://roadmap.strategyquant.com/tasks/sq4_7901

#21

Karish

02.04.2021 18:18

Made this one: https://roadmap.strategyquant.com/tasks/sq4_7915/edit


Less work, because it's already implemented elsewhere inside SQ.


Please vote.

#22

TiNTa

03.05.2021 18:49
Voted for this task.
#23

Karish

09.05.2021 15:02

Please check:


https://roadmap.strategyquant.com/tasks/sq4_8062



#24

Chris G

10.05.2021 05:42
Voted for this task.
#25

beppil

26.05.2021 11:33
Voted for this task.
#26

Karish

17.03.2024 21:29

Subject changed from Builder's Cross Validation for less overfitting strategies from the get-go ? to DELETE

Description changed:

DELETE

