129 dev 6 SQX only sees cores on one socket of dual socket server cpu

On Windows Server I have 104 cores across a dual-socket CPU, but SQX only sees the cores on one socket.

See the attached screenshot.

Attachments
Screen Shot 2020-07-10 at 15743 AM.png
(702.70 KiB)
  • Votes +3
  • Project StrategyQuant X
  • Type Bug
  • Status New
  • Priority Normal

History

#1

kainc301

10.07.2020 08:02

Task created

#2

kainc301

10.07.2020 08:11

Subject changed from "129 dev 6 SQX only sees cores on one socket on dual socket server cpu" to "129 dev 6 SQX only sees cores on one socket of dual socket server cpu"

#3

Enyx

10.07.2020 08:43

Is this a regression compared to a previous version?


My bet: you are hitting the well-known Windows processor group limit. Unmodified applications run in only a single processor group, which can show up in the odd way you are seeing. To be clear, this is not an SQX problem but a consequence of Windows' design.


As a workaround for now, either disable threading or find an external app that changes the application's processor group so it sees all processors. Future LNX (Linux) versions may not have this limit.

#4

Enyx

10.07.2020 08:47

FYI:


When the system starts, the operating system creates processor groups and assigns logical processors to the groups. If the system is capable of hot-adding processors, the operating system allows space in groups for processors that might arrive while the system is running. The operating system minimizes the number of groups in a system. For example, a system with 128 logical processors would have two processor groups with 64 processors in each group, not four groups with 32 logical processors in each group.


==> As you described, you see half of the processors.


https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups
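To make the arithmetic in the quoted documentation concrete, here is a minimal Python sketch of how logical processors might be split into groups. It is a simplified model and an assumption, not Windows' actual algorithm: it uses the fewest groups of at most 64 processors and splits them evenly, ignoring NUMA node boundaries and hot-add reservations. The function name `partition_into_groups` is hypothetical.

```python
import math
from typing import List

MAX_GROUP_SIZE = 64  # a Windows processor group holds at most 64 logical processors


def partition_into_groups(logical_processors: int,
                          max_group_size: int = MAX_GROUP_SIZE) -> List[int]:
    """Simplified model: use the fewest groups possible and split the
    processors as evenly as possible. Ignores NUMA nodes and hot-add."""
    if logical_processors <= 0:
        raise ValueError("need at least one logical processor")
    groups = math.ceil(logical_processors / max_group_size)
    base, extra = divmod(logical_processors, groups)
    # The first `extra` groups receive one additional processor.
    return [base + 1 if i < extra else base for i in range(groups)]


print(partition_into_groups(128))  # [64, 64], the example from the documentation
print(partition_into_groups(104))  # [52, 52]
visible = partition_into_groups(104)[0]  # what a single-group process can see
print(visible)  # 52
```

If the 104 cores on this server are logical processors, the model gives two groups of 52, so a process confined to one group sees 52 of them, which matches the "half of the processors" symptom.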


Just curious: is that a VM? Do you have CPU hot-add enabled? Is it possible to disable it? It should create two unequal processor groups.


#5

kainc301

10.07.2020 08:56
Okay, so this is the 64-logical-processor limitation I heard about. There's no way to set affinity to cores on both sockets for the same SQX program.


The workaround is to install two separate instances of SQX and then set the affinity of one of them to the other socket in Task Manager. Everything seems to work this way. However, I have not yet been able to confirm that the two programs actually use different sockets after changing the CPU affinity of one of them.
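For the two-instance workaround, the affinity mask for one socket can also be computed instead of being ticked by hand in Task Manager. This is a minimal Python sketch, assuming each socket is one NUMA node and processor group with processors numbered from 0 within it; `group_affinity_mask` and `start_command` are hypothetical helpers, and the executable path is a placeholder. The generated line uses the `/NODE` and `/AFFINITY` options of the Windows `start` command; check `start /?` on the server before relying on it.

```python
MAX_GROUP_SIZE = 64  # one affinity mask covers at most 64 logical processors


def group_affinity_mask(first: int, count: int) -> int:
    """Bit mask selecting `count` consecutive logical processors, starting at
    index `first`, numbered from 0 within a single group/node."""
    if first < 0 or count <= 0 or first + count > MAX_GROUP_SIZE:
        raise ValueError("mask must fit within one 64-processor group")
    return ((1 << count) - 1) << first


def start_command(node: int, mask: int, exe: str) -> str:
    """Build a cmd.exe `start` line (empty window title first) that launches
    `exe` on NUMA node `node` with the given hexadecimal affinity mask."""
    return f'start "" /NODE {node} /AFFINITY {mask:#x} "{exe}"'


# Hypothetical layout: 52 logical processors per socket; the path is a placeholder.
mask = group_affinity_mask(0, 52)
print(hex(mask))  # 0xfffffffffffff
print(start_command(1, mask, r"C:\path\to\sqx.exe"))
```

According to the `start` documentation, when `/NODE` and `/AFFINITY` are combined the mask is interpreted relative to that node's processors, so the same 52-bit mask would select all processors of node 0 for one instance and node 1 for the other.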


This can be closed. 

#6

Enyx

10.07.2020 09:09

Indeed, it's not a hard limit. By default, applications run in a single processor group; additional processor groups have to be enabled programmatically. It can certainly be fixed, but the developers need to take care of it (I am not on the core team). It's absolutely possible, but not a quick fix.
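To illustrate what enabling extra processor groups programmatically involves, here is a minimal Python planning sketch (a model, not SQX code): it assigns each worker thread a (group, processor) slot, interleaving the groups so work lands on both sockets. In a native Windows implementation, each slot would roughly correspond to a `GROUP_AFFINITY` passed to `SetThreadGroupAffinity`, with group sizes obtained from `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`. The function name `plan_thread_placement` is hypothetical.

```python
from typing import List, Tuple


def plan_thread_placement(group_sizes: List[int], threads: int) -> List[Tuple[int, int]]:
    """Return a (group, processor-in-group) slot for each worker thread.
    Slots are taken round-robin across groups so every group gets work."""
    if not group_sizes:
        raise ValueError("need at least one processor group")
    slots = [
        (group, proc)
        for proc in range(max(group_sizes))
        for group, size in enumerate(group_sizes)
        if proc < size
    ]
    if threads > len(slots):
        raise ValueError("more threads than logical processors")
    return slots[:threads]


# Simulated 4-core machine from this thread: 2 groups of 2 processors.
print(plan_thread_placement([2, 2], 4))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Interleaving is one design choice; filling one group before the next would keep threads closer to their local memory on a NUMA system, but would leave the second socket idle at low thread counts.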


You can check here: https://bitsum.com/portfolio/groupextend. I have never tried it, but it is a good start. I am sure it's doable, and it's worth using all cores.


Enyx

#7

mabi

10.07.2020 10:55
I tried Bitsum. When enabled, it does use all cores, but never more than 50-60%, which was weird. I had to uninstall it because even when I started another SQ session, usage never exceeded 60%. There was no way to reset it, so the only option was to uninstall Bitsum and reboot.
#8

kainc301

10.07.2020 11:47
If it can be done, then by all means go ahead, because using multiple instances is inconvenient. I have not found a way to share data between the instances, so the data has to be copied, which is annoying.
#9

Emmanuel

10.07.2020 12:20
Voted for this task.
#10

Enyx

10.07.2020 14:16

This has become an interesting discussion. I am interested in both the Windows and Linux affinity code, so I will check the native SQX library. Stay tuned.


#11

Enyx

10.07.2020 16:12

Step 1) The problem is that the native runtime call does not report the total number of cores, only those within the associated processor group (PG). SQX uses that number as a hard limit, which explains why the Bitsum hack did not work, along with the other symptoms.


I simulated the problem with 4 cores / 2 NUMA nodes divided into 2 processor groups. The initial manifestation is the same as with 64+ cores:


16:00:52.193 [main] WARN  c.s.g.utils.CoreUsagesEvaluator - Number of cores is higher that available cores. Using 2 core.
16:00:52.193 [main] INFO  c.s.g.utils.CoreUsagesEvaluator - Total cores available: 2, using: 2
16:00:52.197 [main] DEBUG c.s.g.c.p.MultithreadComputePerformer - Using: 2 cores.
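The clamping described in Step 1 can be sketched as follows. This is a hypothetical reconstruction based only on the log messages above, not the actual `CoreUsagesEvaluator` code: the core count reported for the current processor group acts as an upper bound on the number of worker threads, regardless of how many logical processors exist in total. Both function names are hypothetical.

```python
def effective_worker_threads(requested: int, reported_cores: int) -> int:
    """Hypothetical clamp: the core count reported by the runtime (only the
    current processor group) is used as a hard upper limit."""
    if reported_cores < 1:
        raise ValueError("reported_cores must be at least 1")
    return max(1, min(requested, reported_cores))


def max_utilization(requested: int, reported_cores: int, total_logical: int) -> float:
    """Fraction of all logical processors that can be busy, assuming each
    worker thread keeps at most one logical processor fully loaded."""
    return effective_worker_threads(requested, reported_cores) / total_logical


# Simulated case from Step 1: 4 cores in 2 processor groups, 2 reported.
print(effective_worker_threads(4, 2))  # 2
print(max_utilization(4, 2, 4))        # 0.5
```

Under this model, a 104-processor server split into two groups of 52 is capped at 50% total utilization even if an external tool lets the threads run in both groups, which is in the same range as the 50-60% ceiling reported with Bitsum.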


Step 2) Once this code is fixed (even though it is not really a bug; it works as designed), the built-in affinity code, which "should work", needs to be validated.



So far, it really does not seem like something the developers cannot fix. We will see.


Enyx

#12

echelon

21.07.2020 00:00
Voted for this task.
#13

Mark Fric

22.07.2020 08:54
We are looking into it, but it is not that simple. Moving the task to the next build.
