In my SDL 2 racing game I use 2 threads, which together use approx. 25% of the CPU. The CPU runs at 3.80 GHz in this case. The game runs in real time, and in every frame the code takes approx. 5 - 7 ms.
I tested with 3 threads, and then the code takes approx. 3 - 5 ms per frame. So 3 threads is faster. However, CPU utilization rises to 37%! And that is on my PC. Slower PCs might see an even higher CPU utilization.
So I thought: maybe I can use a mutex, so that the operating system can use the cores during the part of each frame that my threads do not need.
It worked. I used EnterCriticalSection(pCritSec) and LeaveCriticalSection(pCritSec). CPU utilization goes down from 37% to 12%. A gain of 25 percentage points!
However, the CPU clocks down from 3.80 GHz to 2.0 GHz and the time per frame becomes unpredictable: 3 - 13 ms. That is unacceptable.
Then I left the mutex out of one of the 3 threads. The result: the CPU runs at 3.80 GHz again, but CPU utilization is 25%. The code takes 4 - 6 ms per frame. This seems to be the best solution.
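The idea of letting an idle thread block instead of burning CPU can be sketched roughly as follows. This is not the game's code: it is a portable stand-in that uses std::mutex and std::condition_variable rather than the Win32 EnterCriticalSection/LeaveCriticalSection calls mentioned above, and the class name FrameWorker and its methods are hypothetical names chosen for illustration.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: one worker thread that sleeps between frames.
class FrameWorker {
public:
    explicit FrameWorker(std::function<void()> job)
        : job_(std::move(job)), thread_([this] { run(); }) {}

    ~FrameWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    // Main thread: hand the worker its job for this frame.
    void start_frame() {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_ = true;
            done_ = false;
        }
        cv_.notify_all();
    }

    // Main thread: block until the worker has finished this frame's job.
    void wait_frame() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return done_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // The worker blocks here, so the OS can schedule something else.
            cv_.wait(lock, [this] { return pending_ || quit_; });
            if (quit_) return;
            pending_ = false;
            lock.unlock();
            job_();          // do the frame's work without holding the lock
            lock.lock();
            done_ = true;
            cv_.notify_all();
        }
    }

    std::function<void()> job_;
    std::mutex m_;
    std::condition_variable cv_;
    bool pending_ = false;
    bool done_ = true;
    bool quit_ = false;
    std::thread thread_;  // declared last: the members above exist before it starts
};
```

The main loop would call start_frame() at the beginning of a frame and wait_frame() before it needs the results. A blocked worker costs no CPU time, which fits the lower utilization measured above. It also lets the processor drop to a lower clock, which may explain the jitter.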
Saturday 16 November 2019
Saturday 2 November 2019
Modern processor rep movsb performance
According to https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy, the assembly instruction "rep movsb" has been enhanced since Haswell (e.g. the Core i7-4770) to use 256-bit operations internally.
I notice when writing assembly that "rep movsq" is as fast as "rep movsb". That is a bit counter-intuitive. You tend to think of assembly as the lowest level of optimization, but that is not the case: the processor's microcode also optimizes these instructions.
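For anyone who wants to try the two instructions side by side, here is a minimal sketch. It assumes GCC or Clang extended inline assembly on x86-64, where the ABI guarantees the direction flag is clear on function entry. On any other compiler or architecture it falls back to std::memcpy. The function names copy_movsb and copy_movsq are hypothetical. It only shows that both forms copy the same data and is not a benchmark.

```cpp
#include <cstddef>
#include <cstring>

// Copy n bytes with "rep movsb" (RDI = destination, RSI = source, RCX = count).
inline void copy_movsb(void* dst, const void* src, std::size_t n) {
#if defined(__x86_64__) && defined(__GNUC__)
    asm volatile("rep movsb"
                 : "+D"(dst), "+S"(src), "+c"(n)
                 :
                 : "memory");
#else
    std::memcpy(dst, src, n);  // fallback on other platforms
#endif
}

// Copy `qwords` 8-byte units with "rep movsq"; the count is in quadwords, not bytes.
inline void copy_movsq(void* dst, const void* src, std::size_t qwords) {
#if defined(__x86_64__) && defined(__GNUC__)
    asm volatile("rep movsq"
                 : "+D"(dst), "+S"(src), "+c"(qwords)
                 :
                 : "memory");
#else
    std::memcpy(dst, src, qwords * 8);  // fallback on other platforms
#endif
}
```

Note that "rep movsq" needs the byte count divided by 8, so a buffer whose size is not a multiple of 8 needs a separate tail copy.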
The naming of the instructions is confusing: movsq moves 8 bytes. The name "quad" makes you think that 4 bytes are moved. No, that is done by the movsd instruction, which stands for "move doubleword".
"Double" makes you think that 2 bytes are moved. No, that is done by the movsw instruction, which stands for "move word".
In the 16-bit era a "word" (2 bytes) was the native unit of the processor, so a doubleword is 2 words and a quadword is 4 words. That's fine with me, but calling 32 bits a double word is confusing. They should have named it Quad, and 8 bytes an Oct.
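The sizes behind the names can be written down as a small table in code. The constant names and the helper rep_count are hypothetical, chosen only to illustrate the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes moved per iteration by each string-move instruction.
constexpr std::size_t kMovsbBytes = sizeof(std::uint8_t);   // byte       = 1
constexpr std::size_t kMovswBytes = sizeof(std::uint16_t);  // word       = 2
constexpr std::size_t kMovsdBytes = sizeof(std::uint32_t);  // doubleword = 4
constexpr std::size_t kMovsqBytes = sizeof(std::uint64_t);  // quadword   = 8

// Hypothetical helper: the repeat count needed to copy `bytes` bytes
// when each iteration moves `element` bytes (assumes bytes % element == 0).
constexpr std::size_t rep_count(std::size_t bytes, std::size_t element) {
    return bytes / element;
}

static_assert(kMovsqBytes == 4 * kMovswBytes, "a quadword is four 16-bit words");
static_assert(kMovsdBytes == 2 * kMovswBytes, "a doubleword is two 16-bit words");
```

So "quad" and "double" count 16-bit words, not bytes: copying 64 bytes takes 64 iterations of movsb but only 8 of movsq.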