According to https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy, the x86 instruction "rep movsb" has been enhanced since Haswell (e.g. the Core i7-4770) to use 256-bit operations internally.
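I noticed this when writing assembly: "rep movsb" is as fast as "rep movsq", even though it nominally moves one byte per iteration instead of eight. That is a bit counter-intuitive. You tend to think of assembly as the lowest level of optimization, but it is not: the processor optimizes too. It decodes instructions into micro-operations, and string instructions such as "rep movsb" are implemented in microcode that can move data in much larger chunks than the instruction's nominal element size.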
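To try the comparison yourself, here is a minimal sketch of both copies in C using GCC/Clang extended inline assembly. It assumes an x86-64 target; the function names copy_rep_movsb and copy_rep_movsq are made up for this example, and on other targets they simply fall back to memcpy. Both instructions take the count in RCX, the source in RSI and the destination in RDI; the only difference is that "rep movsq" counts 8-byte elements instead of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes with `rep movsb`: RCX = byte count, RSI = source,
   RDI = destination. x86-64 calling conventions require the direction
   flag to be clear on function entry, so the copy runs forward. */
void copy_rep_movsb(void *dst, const void *src, size_t n) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* fallback for non-x86-64 targets */
#endif
}

/* Copy n bytes with `rep movsq`, which moves 8 bytes per iteration,
   so RCX holds n / 8. For simplicity this sketch requires n to be a
   multiple of 8; a general memcpy would copy the tail separately. */
void copy_rep_movsq(void *dst, const void *src, size_t n) {
    assert(n % 8 == 0);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    size_t qwords = n / 8;
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* fallback for non-x86-64 targets */
#endif
}
```

To compare speed, call each function in a loop on a large buffer and time it, for example with clock_gettime. On CPUs with enhanced "rep movsb" you would expect the two to be close, as observed above, but measure on your own machine rather than taking that for granted.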
The naming of the instructions is also confusing: movsq moves 8 bytes, but the name "quad" makes you think 4 bytes are moved. No, that is done by the movsd instruction, which stands for move double word.
"Double" makes you think 2 bytes are moved. No, that is done by the movsw instruction, which stands for move word.
In the 16-bit era, a "word" was the native unit of the processor, and the x86 names have kept that meaning ever since. That's fine with me, but calling 32 bits a double word is confusing. They should have called it a quad, and 8 bytes an oct.
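The naming scheme boils down to a small lookup table: every suffix counts in 16-bit words. This is an illustrative sketch only; the struct, the table and the movs_element_size function are hypothetical helpers, and the sizes are taken from the fixed-width types in stdint.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element size moved per iteration by each x86 MOVS variant. The suffix
   uses Intel's naming from the 16-bit 8086 era: a "word" is 16 bits,
   so a "double word" is 32 bits and a "quadword" is 64 bits. */
typedef struct {
    const char *mnemonic;
    const char *intel_name;
    size_t bytes;
} movs_variant;

static const movs_variant MOVS_VARIANTS[] = {
    {"movsb", "byte", sizeof(uint8_t)},         /* 1 byte            */
    {"movsw", "word", sizeof(uint16_t)},        /* 2 bytes           */
    {"movsd", "double word", sizeof(uint32_t)}, /* 4 bytes = 2 words */
    {"movsq", "quadword", sizeof(uint64_t)},    /* 8 bytes = 4 words */
};

/* Returns the element size in bytes, or 0 for a mnemonic not in the table. */
size_t movs_element_size(const char *mnemonic) {
    assert(mnemonic != NULL);
    size_t count = sizeof MOVS_VARIANTS / sizeof MOVS_VARIANTS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(MOVS_VARIANTS[i].mnemonic, mnemonic) == 0)
            return MOVS_VARIANTS[i].bytes;
    }
    return 0;
}
```

For example, movs_element_size("movsd") gives 4, which is exactly the "double word = two 16-bit words" convention described above.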