Saturday, 2 November 2019

Modern processor rep movsb performance

According to https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy, the assembly instruction "rep movsb" has been enhanced since Haswell (4770) to use 256-bit operations internally.

I notice when coding assembly that "rep movsq" is as fast as "rep movsb". That is a bit counter-intuitive. You tend to think of assembly as the lowest level of optimization, but that is not the case: the processor's microcode, which executes these string instructions, also does optimization.
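For anyone who wants to try this, here is a minimal sketch of the two variants as GCC/Clang inline assembly. It assumes an x86-64 target and falls back to memcpy elsewhere. The function names copy_rep_movsb and copy_rep_movsq are my own, made up for illustration. It only shows that both forms copy the same data; the timing is left out, because the result depends on the processor and the buffer size.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes one byte per iteration with "rep movsb".
 * RDI = destination, RSI = source, RCX = count.
 * Assumes the direction flag is clear, as the x86-64 ABIs require. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);            /* fallback: not an x86-64 build */
#endif
}

/* Copy n bytes with "rep movsq" (8 bytes per iteration),
 * then finish the 0..7 leftover bytes with "rep movsb". */
static void copy_rep_movsq(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    size_t qwords = n / 8;          /* number of 8-byte moves */
    size_t tail   = n % 8;          /* bytes left over        */
    __asm__ volatile ("rep movsq"
                      : "+D"(dst), "+S"(src), "+c"(qwords)
                      :
                      : "memory");
    /* dst and src were advanced by the instruction above. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(tail)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);            /* fallback: not an x86-64 build */
#endif
}
```

Call both on the same source buffer and compare the destinations with memcmp. To compare speed, wrap each call in a timer of your choice and use buffers much larger than a few cache lines.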

The naming of the instructions is confusing: movsq moves 8 bytes. The name Quad makes you think that 4 bytes are moved. No, that is done by the movsd instruction, which stands for move doubleword.
Double makes you think that 2 bytes are moved. No, that is done by the movsw instruction, which stands for move word.
In the 16-bit era a "word" was the native unit of the processor. That's fine with me, but calling 32 bits a double word is confusing. They should have named 4 bytes a Quad, and 8 bytes an Oct.
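
To keep the suffixes straight, the mapping can be written down as a small C helper. This is only a cheat sheet; the function name movs_element_bytes is hypothetical and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes moved per iteration by each MOVS variant, keyed by the
 * last letter of the mnemonic. Returns 0 for an unknown suffix. */
static size_t movs_element_bytes(char suffix)
{
    switch (suffix) {
    case 'b': return sizeof(uint8_t);   /* movsb: byte,       1 byte  */
    case 'w': return sizeof(uint16_t);  /* movsw: word,       2 bytes */
    case 'd': return sizeof(uint32_t);  /* movsd: doubleword, 4 bytes */
    case 'q': return sizeof(uint64_t);  /* movsq: quadword,   8 bytes */
    default:  return 0;
    }
}
```

So the "quad" in movsq means four 16-bit words, not four bytes.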
