Redundant clearing of the top 32 bits before certain calls to PDEP #442
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
optimization
Repro Repo:
https://github.com/damageboy/coreclr-redundant-mov-around-pdep
Relevant piece of code:
https://github.com/damageboy/coreclr-redundant-mov-around-pdep/blob/2f87e88ccd419f26d2b5afcfe09cfadd7a220996/Program.cs#L40-L47
Generated asm:
https://github.com/damageboy/coreclr-redundant-mov-around-pdep/blob/2f87e88ccd419f26d2b5afcfe09cfadd7a220996/listing.asm#L56-L87
Issue
In the asm listing, we can clearly see that the upper 32 bits of the source register for PDEP are cleared as part of the implicit cast to `ulong`. While this is sensible behaviour in general, since PDEP might end up reading those bits, in cases such as the above, where PDEP is supplied with a constant mask that "happens" to never read a single bit past the first 32 bits anyway, clearing these top bits seems redundant.
From the Intel docs:
In other words, the number of set bits in the mask controls how many low-order bits are read from the source.
As such, a constant mask with 24 bits set, as in this case (0x0707070707070707: each 0x07 byte contributes 0b111, and 3 bits x 8 bytes == 24 bits), means that not a single bit from the upper 32 bits will ever be read by this instruction, so clearing those bits as part of the cast is meaningless.
For this case, skipping the zero-extension would shave off the 8 `mov reg,reg` instructions that are currently emitted to clear those bits.
category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium