Make LLVM better at using XMM registers to perform structure moves #35093
Tagging this as help wanted because it's a significant (and kinda obvious) optimization. Unfortunately we don't know how to make LLVM do it. Will require some digging.
@brson It might be impossible in the general case if LLVM can't get some guarantees about bytes that are never touched by the individual copies (i.e. the padding bytes).
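To make the padding concern concrete, here is a minimal sketch in Rust. The `Padded` struct is hypothetical (it is not from Servo), and `#[repr(C)]` is used only so the field layout is predictable. It shows that a struct can occupy more bytes than its fields do, and those extra bytes are the ones a field-by-field copy never touches.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example type, not taken from Servo.
// With #[repr(C)] the fields are laid out in declaration order.
#[repr(C)]
#[derive(Clone, Copy)]
struct Padded {
    a: u8,  // offset 0, followed by 3 padding bytes
    b: u32, // offset 4
    c: u16, // offset 8, followed by 2 trailing padding bytes
}

// Number of bytes in `Padded` that belong to no field.
fn padding_bytes() -> usize {
    let field_bytes = size_of::<u8>() + size_of::<u32>() + size_of::<u16>();
    size_of::<Padded>() - field_bytes
}

fn main() {
    println!(
        "size = {}, align = {}, padding = {}",
        size_of::<Padded>(),
        align_of::<Padded>(),
        padding_bytes()
    );
}
```

A whole-struct copy with wide loads and stores would read and write those padding bytes as well, which is presumably why a guarantee about them matters here.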
It may also be undesirable in the general case for security reasons.
Are you sure this is a release build? That looks extremely suspicious.
Looking at the IR for
Honestly, I have no idea where exactly those assembly instructions come from. This part is a potential clue, but I can't find the
servo[0x100bf97c8] <+6760>: mov eax, dword ptr [r13 + 0x15c]
servo[0x100bf97cf] <+6767>: test ah, 0x6
servo[0x100bf97d2] <+6770>: je 0x100bf9949 ; <+7145>
@pcwalton I convinced myself this is a release + debuginfo build, but that's not actually true, is it? This explains where all the code comes from (
Is this before or after optimization?
@DemiMarie @eefriedman Turns out the IR is from a debug build (my fault, forgot to add
!tbaa.struct is a way to describe padding holes when doing
Triage: not aware of any changes here. This is a pretty abstract bug...
Here's a small snippet of Servo code from block flow fragmentation:
I see this all over the place. It should be using XMM registers instead. This is bad because: (a) it clogs up the instruction stream; (b) it's an inefficient way to perform structure moves; (c) it kills tons of registers, resulting in spills elsewhere (notice rax, rdx, rdi, r8, r9, and r10 are all dead for no good reason); (d) it puts pressure on the register allocator, making compile times worse.
Is there some way to get LLVM to emit the right thing here?
cc @eddyb @brson
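For anyone who wants to look at the generated code for a structure move without building Servo, a small reproduction along these lines may help. This is only a sketch: the `Block` type and `move_struct` function are hypothetical stand-ins, not the Servo code from the issue, and which instructions the compiler picks will depend on the compiler version, target, and optimization level.

```rust
// Hypothetical stand-in for a medium-sized struct; not the Servo type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Block {
    a: u64,
    b: u64,
    c: u64,
    d: u64,
}

// #[inline(never)] keeps the copy in its own function so it is easy
// to find in the assembly output.
#[inline(never)]
fn move_struct(dst: &mut Block, src: &Block) {
    // A plain 32-byte structure copy.
    *dst = *src;
}

fn main() {
    let src = Block { a: 1, b: 2, c: 3, d: 4 };
    let mut dst = Block { a: 0, b: 0, c: 0, d: 0 };
    move_struct(&mut dst, &src);
    println!("{:?}", dst);
}
```

Compiling with `rustc -O --emit asm` and inspecting `move_struct` shows whether the copy goes through general-purpose registers or vector registers on a given toolchain.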