-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve 32bit ld/st addressing mode propagation #3421
Conversation
Still looking to see if I can improve other types of addressing modes, so this is still marked as draft. |
Was going to say there are no asm_tests failures so it looks like the current change is ok, although I am adding access in 32bits to a block that has been 64 bits only. However, I just noticed glibc tests failed so I will look at those. |
151adb7
to
ee23b55
Compare
This is still failing but I am gaining an understanding of the complexities of this optimization. It's currently failing. My last optimization was a result of fixing this: which was generating a LoadMem between a register (repr eax) and the two complement of 0x40, but failing to sign extend it. The current issue is in :
Here we are loading eax, adding eax and edi and then storing to eax. But when we create the LoadMem, and simplify it we are sign extending the incorrect base. Unsure if there's a sure-fire way to know which of the registers is the base and which is the offset, or if it makes sense to ask this question even. WIP |
3ae67d0
to
645da81
Compare
Spurred on by FEX-Emu#3421. To ensure that applications don't take advantage of small address wrap around, allocate the first 4GB in the 64-bit space. Some context. Linux always reserves the first 16KB of virtual address space (unless you tinker with some settings which nobody should do). Example of 32-bit code: lea eax, [0xffff_0000] mov ebx, [eax + 0x1_0000] The address calculated by the mov will wrap around to 0x0 which will result in SIGSEGV. If FEX messes up zero extensions then it would try to access 0x1_0000_0000 instead. This could result in a 32-bit application potentially accessing some FEX memory instead of crashing. Add this safety net which will still SIGSEGV and we will be able to see the crash.
Spurred on by FEX-Emu#3421 This does a bunch of GPR and vector loads to showcase addressing limitations between ARM and x86. Tests: - 8/16/32/64-bit GPR loads - Both 32-bit and 64-bit addressing modes - 32/64/128-bit Vector loads - Both 32-bit and 64-bit addressing modes - Duplicate the tests for 32-bit addressing mode with a 32-bit process - Since it should change behaviour. Untested: - 8/16-bit vector loadstores since those don't exist on x86 - 16-bit x87 integer load exists but that doesn't go through a vector load in FEX.
…2-bit Spurred on by FEX-Emu#3421. To ensure that applications don't take advantage of small address wrap around, allocate the second 4GB of virtual memory. Some context. Linux always reserves the first 16KB of virtual address space (unless you tinker with some settings which nobody should do). Example of 32-bit code: lea eax, [0xffff_0000] mov ebx, [eax + 0x1_0000] The address calculated by the mov will wrap around to 0x0 which will result in SIGSEGV. If FEX messes up zero extensions then it would try to access 0x1_0000_0000 instead. This could result in a 32-bit application potentially accessing some FEX memory instead of crashing. Add this safety net which will still SIGSEGV and we will be able to see the crash.
…2-bit Spurred on by FEX-Emu#3421. To ensure that applications don't take advantage of small address wrap around, allocate the second 4GB of virtual memory. Some context. Linux always reserves the first 16KB of virtual address space (unless you tinker with some settings which nobody should do). Example of 32-bit code: lea eax, [0xffff_0000] mov ebx, [eax + 0x1_0000] The address calculated by the mov will wrap around to 0x0 which will result in SIGSEGV. If FEX messes up zero extensions then it would try to access 0x1_0000_0000 instead. This could result in a 32-bit application potentially accessing some FEX memory instead of crashing. Add this safety net which will still SIGSEGV and we will be able to see the crash.
This'll need a rebase so it can pick up the new instcountci changes before it can get merged |
645da81
to
b120df1
Compare
done - thx. |
IsInlineConstant is your new friend |
That's awesome - thanks. Friends are never enough. :) |
Not as straightforward as calling the method. It's in |
b120df1
to
5c476f3
Compare
Oops, my bad! I meant IsValueConstant .. sorry for the mixup |
Ah, yes, I was wondering the different between a constant and an inline constant. :) |
5c476f3
to
019555a
Compare
When the source arguments for LoadMem/StoreMem have bit 31 set then they are incorrectly sign extending in some instances. Detected this when testing FEX-Emu#3421 but I don't have a proper fix for it.
When the source arguments for LoadMem/StoreMem have bit 31 set then they are incorrectly sign extending in some instances. Detected this when testing FEX-Emu#3421 but I don't have a proper fix for it.
When the source arguments for LoadMem/StoreMem have bit 31 set then they are incorrectly sign extending in some instances. Detected this when testing FEX-Emu#3421 but I don't have a proper fix for it.
c5e3122
to
5f27180
Compare
I just made some sort of mistake pushing the last patch - sorry. Fixing. |
5f27180
to
8beeef3
Compare
Thank you |
…u#3421 Doesn't quite match the libc code directly because it uses `[gs:eax]` with both having the sign bit set and we can't deal with that with ASM tests. So match the behaviour in a different way.
b7f7aa9
to
f4b3870
Compare
Rebased new patch on main - which should now include the #3466 test. |
This time found in MGRR. It flips the problem space on its head.
Spurred on by FEX-Emu#3421 This does a bunch of GPR and vector loads to showcase addressing limitations between ARM and x86. Tests: - 8/16/32/64-bit GPR loads - Both 32-bit and 64-bit addressing modes - 32/64/128-bit Vector loads - Both 32-bit and 64-bit addressing modes - Duplicate the tests for 32-bit addressing mode with a 32-bit process - Since it should change behaviour. Untested: - 8/16-bit vector loadstores since those don't exist on x86 - 16-bit x87 integer load exists but that doesn't go through a vector load in FEX.
…2-bit Spurred on by FEX-Emu#3421. To ensure that applications don't take advantage of small address wrap around, allocate the second 4GB of virtual memory. Some context. Linux always reserves the first 16KB of virtual address space (unless you tinker with some settings which nobody should do). Example of 32-bit code: lea eax, [0xffff_0000] mov ebx, [eax + 0x1_0000] The address calculated by the mov will wrap around to 0x0 which will result in SIGSEGV. If FEX messes up zero extensions then it would try to access 0x1_0000_0000 instead. This could result in a 32-bit application potentially accessing some FEX memory instead of crashing. Add this safety net which will still SIGSEGV and we will be able to see the crash.
When the source arguments for LoadMem/StoreMem have bit 31 set then they are incorrectly sign extending in some instances. Detected this when testing FEX-Emu#3421 but I don't have a proper fix for it.
Spurred on by FEX-Emu#3421 This does a bunch of GPR and vector loads to showcase addressing limitations between ARM and x86. Tests: - 8/16/32/64-bit GPR loads - Both 32-bit and 64-bit addressing modes - 32/64/128-bit Vector loads - Both 32-bit and 64-bit addressing modes - Duplicate the tests for 32-bit addressing mode with a 32-bit process - Since it should change behaviour. Untested: - 8/16-bit vector loadstores since those don't exist on x86 - 16-bit x87 integer load exists but that doesn't go through a vector load in FEX.
…2-bit Spurred on by FEX-Emu#3421. To ensure that applications don't take advantage of small address wrap around, allocate the second 4GB of virtual memory. Some context. Linux always reserves the first 16KB of virtual address space (unless you tinker with some settings which nobody should do). Example of 32-bit code: lea eax, [0xffff_0000] mov ebx, [eax + 0x1_0000] The address calculated by the mov will wrap around to 0x0 which will result in SIGSEGV. If FEX messes up zero extensions then it would try to access 0x1_0000_0000 instead. This could result in a 32-bit application potentially accessing some FEX memory instead of crashing. Add this safety net which will still SIGSEGV and we will be able to see the crash.
When the source arguments for LoadMem/StoreMem have bit 31 set then they are incorrectly sign extending in some instances. Detected this when testing FEX-Emu#3421 but I don't have a proper fix for it.
…u#3421 Doesn't quite match the libc code directly because it uses `[gs:eax]` with both having the sign bit set and we can't deal with that with ASM tests. So match the behaviour in a different way.
ASM: Another sign extend bug in #3421
@Sonicadvance1 The bug you found in #3471 did feel like a nail in the coffin of this optimization. Problem being that anything that wraps around in 32bits cannot be faithfully represented in a single 64bit ld/st instructions. After a couple of iterations, I thought just reg+reg was problematic, but after your last bug, even reg+const is problematic. If the value in reg is high enough we wrap around and cannot fold the addition into the memory addressing. In other words, you can't generally simplify:
into For the past few days I have been playing with possible solutions including something like mapping the first 4G of pages and the second 4G of pages to the same physical pages. Something like: However, I think I found something that's easier to integrate into FEX. The use of userfaultfd available at least since kernel 4.11. https://gist.github.com/pmatos/677b1e3a3390c00f5c046fbc9bca7f68 What do you think about this? |
This is exactly what #3439 is for. In particular the remark:
That means that (ignoring sign-extension at least), if the constant is < 16384, the fold is legal. Since either:
|
ecfd88f
to
9c16415
Compare
Folds reg+const memory address into addressing mode, if the constant is within 16Kb. Update instcountci files. Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
9c16415
to
a86f2d3
Compare
After that explanation the 16Kb comments makes even more sense. Which means that ok - without any other handling - we can indeed deal with 16Kb positive and negative offsets. I have pushed a patch to that effect. I am both upset and happy that I didn't see this coming. On the one hand, I spent a ton of time looking into userfaultfd which was not strictly necessary for this. On the other hand, looking into userfaultfd did show me a couple of cool tricks that could open the door for further optimization. |
Fantastic research in to finding out the limits of 32-bit addressing. I knew it would have problems but I wasn't sure where the limits were at. Known unknowns and whatnot. userfaultfd doesn't necessarily work because it will do copying and accesses that split the ranges will get decoupled. Additionally device mapped memory will have problems. Good to know these problems now and this optimization must be limited to constants <16KB |
@Sonicadvance1 ping - what do you think about merging this now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Gave this some testing and I don't see any regressions from the more limited scope of this now.
Thanks for pushing this through and finding the issues.
Signed-off-by: Paulo Matos [email protected]