I tried enabling debug mode, but with it on the crash no longer occurs. I do not know whether the issue is related to the huge number of strategies created (more than 1.5 million/hour in non-debug mode, dropping to 800K/hour in debug mode). When the crashes occur, all physical parameters of the PC are OK (memory, CPU temperature, ...).
Let me know how I can further support debugging this issue.
M
Attachment Immagine 2022-10-15 162446.png added
The graph of HW resources during the run is attached: the indicators look pretty normal.
I have already underpowered the CPU to keep the temperature stable, and ran a 24h HW stress test with XTU at 100% CPU on all cores without any problem.
Moreover, I checked the XTU logs from just before the PC crashed: the HW utilization was very low:
In any case, I will try lowering the number of cores in SQX to see what happens.
Did you test the memory?
Actually, SQX crashed the PC several times BEFORE I underpowered it.
In fact, I thought the high core temps I saw while running SQX could be the root cause. This is why I underpowered the CPU.
In any case, the underpowered config is very stable, both under stress tests and with other apps.
The problem doesn't occur if the number of strategies/hour is "normal". For example, I've been running the WFlow with M30 instead of H4 for the last 4 hours at 400K strategies/hour and everything is going fine.
(*) btw I am not sure the config is really applied. This is what I find in the logs
16:37:18.127 [Thread-91632] INFO c.s.p.Task.impl.Retest.RetestTask - Batch size computed to: 5, SingleOptims: false, str to test: 7, totalCores: 24, Backtest mode: 1
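To check whether a core setting was really picked up, the fields of that log line can be pulled out and compared against the configured value. A minimal Python sketch, assuming the message after " - " is always a comma-separated list of "key: value" pairs as in the line above; `parse_retest_line` is a hypothetical helper, not part of SQX:

```python
# The log line exactly as quoted in the post above.
LINE = (
    "16:37:18.127 [Thread-91632] INFO c.s.p.Task.impl.Retest.RetestTask - "
    "Batch size computed to: 5, SingleOptims: false, str to test: 7, "
    "totalCores: 24, Backtest mode: 1"
)


def parse_retest_line(line):
    """Return the 'key: value' pairs found after the ' - ' separator.

    Assumption: the message part is a comma-separated list of pairs.
    """
    _, _, message = line.partition(" - ")
    fields = {}
    for part in message.split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that do not contain a colon
            fields[key.strip()] = value.strip()
    return fields


fields = parse_retest_line(LINE)
print(fields["totalCores"])  # prints 24
```

If the number printed differs from the number of cores set in the SQX configuration, the setting was probably not applied to this task.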
I see, but had you already done an OC? I'd try a default, non-OC setup.
It's just so unlikely that SQX can crash Windows. I read all the bug reports, have read thousands over the last few years, and have not seen this before except for maybe 2 hardware issues. It did happen to me once, though, when I had an unstable OC. I had to reduce the clock speed AND increase the voltage to get it stable; even though it passed some other stability tests fine, SQX still crashed. The fact that you need to hit a certain number of strats per hour to crash lends credence to the unstable-OC hypothesis. So yeah, I'd go for lower clock speeds AND an increase in voltage, obviously minding temps.
In fact, at first I had exactly the same thoughts as you: "the HW utilization must be too high and the temps not well controlled". This is why I started looking for a safer config.
Actually, the custom config is not overclocked. I just set the Core Package setting to get a more stable temperature. This makes the CPU run at a slightly lower clock speed than the standard config (4.6 GHz instead of 4.8 GHz).
This config has been checked with a 24h stress test and a 2h memory test without any issue. Temps have been kept well under control (80C vs 100C of the unlimited config).
Very tricky.
Anyhow, if the issue does not happen under "normal" strategy generation, it is not that serious. My only concern is that there could be some "hidden" issue in SQX when using high-performance HW and/or a high number of parallel threads, which could become more serious in the future as powerful HW becomes more and more available at low prices.
Attachment Graph 1.png added
Attachment Graph 2.png added
I ran a lower strategy throughput (M30, approx 400K/hour) cycle for approx 12h without any issue.
I then reset the number of CPU cores to the original values (1 for GUI, 23 for computing) but slightly changed the memory configuration like this:
I then restarted the high-throughput cycle (H4, approx 1.7M/hour) to re-create the issue. I noticed that CPU utilization and parameters are lower compared to the low-throughput scenario (left part of Graph 1), while memory utilization is significantly higher (right part of Graph 1).
The cycle has been running regularly for the last 3 hours without any crash, and memory seems well under control (Graph 2). This never happened before with the standard memory config.
So I am wondering whether the issue could be related to memory allocation under heavy-throughput conditions.
I will leave the cycle running for some hours and let you know what happens.
Attachment log_2022_10_18.log added
Attachment Graph 3.png added
Attachment Graph 4.png added
My personal record of over 1.8M strategies generated/hour and 140K accepted/hour was reached (Graph 3).
All PC resources seemed OK throughout the test and also just before the crash, including memory, which reached the maximum of 32GB during one of the cycles without any issue (Graph 4).
The following error, which I haven't seen before, appeared in the log (attached) several times, including a few seconds before the crash.
ERROR c.s.p.S.impl.Project.ProjectServlet - UI error: DatabankCtrl for Build & Test 1/Backup is overwhelmed, old updates were not processed yet and new updates are coming
The crash corrupts the Workflow, and all strategies, including those already saved, are lost.
I hope this helps with debugging.
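To see how often the "overwhelmed" error quoted above shows up, and for which databank, the log can be scanned line by line. A rough Python sketch, assuming every occurrence has the same wording as the one quoted; `count_overwhelmed` is a hypothetical helper, and the sample list only reuses lines already quoted in this thread:

```python
import re
from collections import Counter

# Pattern derived from the single error message quoted above (an assumption:
# other occurrences may be worded differently).
PATTERN = re.compile(r"DatabankCtrl for (?P<databank>.+?) is overwhelmed")


def count_overwhelmed(lines):
    """Count 'overwhelmed' errors per databank name in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("databank")] += 1
    return counts


# Two lines quoted earlier in this thread, used as a stand-in for a log file.
sample = [
    "ERROR c.s.p.S.impl.Project.ProjectServlet - UI error: DatabankCtrl for "
    "Build & Test 1/Backup is overwhelmed, old updates were not processed yet "
    "and new updates are coming",
    "16:37:18.127 [Thread-91632] INFO c.s.p.Task.impl.Retest.RetestTask - "
    "Batch size computed to: 5, SingleOptims: false, str to test: 7, "
    "totalCores: 24, Backtest mode: 1",
]

counts = count_overwhelmed(sample)
print(counts)

# On the real file, something like this should work:
# with open("log_2022_10_18.log", errors="replace") as f:
#     print(count_overwhelmed(f))
```

Comparing the counts between a run that crashed and one that did not might show whether this error really correlates with the crash.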