-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shim occasionally crashes in _relocate()
on AArch64
#371
Comments
console log of a linux reboot -> no crash -> linux reboot -> crash -> watchdog reboot -> crash -> watchdog reboot -> crash |
I also experienced the crash of AArch64 build. Based on your finding, I looked into crt0-efi-aarch64.S which invokes _relocate(): When checking _DYNAMIC, it's mentioned in the gnuefi document: Per the document, _DYNAMIC is supposed to be the offset to .dynamic. Unlike x86_64, objcopy doesn't support AArch64 EFI, so those pe-coff section header are crafted manually in crt0-efi-aarch64.S, and .dynamic was merged into .data. To track _DYNAMIC, I first checked
and then dumped
On the other hand, the same content in shimaa64.efi was in a different offset:
But, there is no
The same content in shimaa64.so:
So it seems there is a gap of 0x10000 in shimaa64.efi, and maybe this caused the crash. |
The first 0x10000 data of shimaa64.so is the elf header and we use |
Nice find @lcp . I had added some debug |
Disassembled _start in shimaa64.so:
_DYNAMIC is the offset mentioned at '1018': 0x7a000.
Disassembled _start in shimaa64.efi:
In this case, the offset to _DYNAMIC is 495616=0x79000.
So at least the offset to _DYNAMIC is correct. |
Indeed, I also found that a huge
DT_RELA: 0x7d000 In my case,
R_AARCH64_RELATIVE is 1027(0x403), so the entries containing |
Keeping in mind I've little experience w/ binary formats, I wonder what it means that
Is it possible that there simply is no .dynamic array, and the value of |
The dynamic section is merged into the data section: Ideally, those elf sections would be converted to pe-coff sections by objcopy, but it wasn't implemented for AArch64 yet, so shim currently merges those sections into text, data, or rodata sections and crafts the section headers with assembly. |
Yeah, I tried moving the EDIT: I was doing it wrong, I do see a |
I started to realize that the relocation is necessary. For example, a function pointer is stored in .data section and its content points to a function in .text section. When EFI image is loaded, those function pointers needs to be relocated from the original addresses (rel->r_addend) to the actual running addresses (ImageBase + rel->r_addend). Correction: It's ld not objcopy to write the section size. |
The previous commit(*) merged .rel* and .dyn* into .rodata. However, this makes ld to generate the wrong section size for .rel* to cover other unrelated sections. This commit moves .rel* and .dyn* out of .rodata in the ld script. However, since the related variables, such as _evrodata, _rodata_size, and _rodata_vsize, are moved, so the crafted pe-coff section header for .rodata still covers our new .rela and .dyn sections. (*) 212ba30 ("arm/aa64 targets: put .rel* and .dyn* in .rodata") Issue: rhboot#371 Signed-off-by: Gary Lin <[email protected]>
I wrote a testing patch for elf_aarch64_efi.lds: In the .dynamic section of patched shimaa64.efi:
DT_RELA: 0x9b000 The size looks much reasonable now :) |
The previous commit(*) merged .rel* and .dyn* into .rodata, and this made ld to generate the wrong size for .rela* sections that covered other unrelated sections. When the EFI image was loaded, _relocate() went through the unexpected data and may cause unexpected crash. This commit moves .rel* and .dyn* out of .rodata in the ld script but also moves the related variables, such as _evrodata, _rodata_size, and _rodata_vsize, to the end of the new .dyn section, so that the crafted pe-coff section header for .rodata still covers our new .rela and .dyn sections. (*) 212ba30 ("arm/aa64 targets: put .rel* and .dyn* in .rodata") Fix issue: rhboot#371 Signed-off-by: Gary Lin <[email protected]>
Survived > 1600 reboots before I stopped it, so with this arm64 booting seems pretty robust :) |
Yay, good news. Although I'm definitely feeling we should be focussing on getting proper toolchain support which would make this moot... :-) |
Is there a concise technical description of the missing pieces in the toolchain? I know some of the Arm folks working in that area and would be happy to convey a detailed request for assistance. |
@vielmetti Since objcopy doesn't support PE/COFF for AArch64, shim/gnu-efi has to craft the PE/COFF header like this: To make the header work properly, we have to tweak some fields in the section table such as VirtualSize and the variables in the ld script. If objcopy could support pe-aarch64 target, the we can get rid of those workarounds and just convert the elf sections into the result EFI binary. |
The binutils bug entry: |
Cool, I hadn't found the bug report yet and was just thinking about adding one myself |
Hi @vielmetti LTNS! I've just mailed the linaro-toolchain list about this as well, with a reference to the binutils bug that @lcp referenced. Waiting on that showing up in the archives... |
Thanks @steve-mcintyre - relayed this to some Arm folks who may be able to get some additional eyes on task. |
The previous commit(*) merged .rel* and .dyn* into .rodata, and this made ld to generate the wrong size for .rela* sections that covered other unrelated sections. When the EFI image was loaded, _relocate() went through the unexpected data and may cause unexpected crash. This commit moves .rel* and .dyn* out of .rodata in the ld script but also moves the related variables, such as _evrodata, _rodata_size, and _rodata_vsize, to the end of the new .dyn section, so that the crafted pe-coff section header for .rodata still covers our new .rela and .dyn sections. (*) 212ba30 ("arm/aa64 targets: put .rel* and .dyn* in .rodata") Fix issue: #371 Signed-off-by: Gary Lin <[email protected]>
@dannf @steve-mcintyre et al - I'm coming back to this problem after about a month away - there has clearly been some progress in resolving this, but I'm not sure where the current discussion lives (or if it's been summarized recently). My goal is to provide direction for our team that builds Ubuntu and Debian images to make sure they pick up the right versions of the right components so that GRUB works correctly on Ubuntu 20.04 / Debian 10 on arm64 server systems (specifically the Ampere systems at Equinix Metal). |
For Ubuntu this has been resolved in shim 15.4-0ubuntu7 - see https://launchpad.net/bugs/1928010. |
Upstream issue rhboot#371. Closes: #990082
The previous commit(*) merged .rel* and .dyn* into .rodata, and this made ld to generate the wrong size for .rela* sections that covered other unrelated sections. When the EFI image was loaded, _relocate() went through the unexpected data and may cause unexpected crash. This commit moves .rel* and .dyn* out of .rodata in the ld script but also moves the related variables, such as _evrodata, _rodata_size, and _rodata_vsize, to the end of the new .dyn section, so that the crafted pe-coff section header for .rodata still covers our new .rela and .dyn sections. (*) 212ba30 ("arm/aa64 targets: put .rel* and .dyn* in .rodata") Fix issue: rhboot#371 Signed-off-by: Gary Lin <[email protected]> (cherry picked from commit 34e3ef2)
For an unknown reason Debian has `.dyn` between `.vendor_cert` and `.sbat`, while the rest of the world has it between `.rela*` and `.reloc`: <rhboot#371> <lcp@9828f65> <rhboot@34e3ef2> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990082> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990190> <https://salsa.debian.org/efi-team/shim/-/commit/9ace660bae651b1e07ddfbd1e37d6ae2a11165a7>
Upstream issue rhboot#371. Closes: #990082, #990190
I've been trying to track down an issue with Ubuntu's shim, that I've now found is reproducible with latest upstream shim (as of commit 4068fd4) as well.
If I put a guest in a reboot loop, it will eventually crash:
After it has crashed once, it will enter a watchdog reset/shim crash infinite loop.
I attached gdb to the QEMU process, and I'm pretty confident that it is crashing in
_relocate()
- specifically here:Object file layouts are out of my wheelhouse, but I did try to see if I could understand what is going on. First off, it seems odd that
_relocate()
is finding any relocations to apply in the first place. If I understand this correctly, this is just agnu-efi
's generic loader for EFI binaries, and executables shouldn't have relocations. Indeed, objdump shows no relocations in the ELF shim binary (I don't know how to scan the PE/COFF one).The faulting address appears to be
0x00000000BB9752C8
, which should fall within the.text
section. If I understand this correctly, it's setting up the attributes for the text and data sections:The text section would be the 2nd range there, which has the READONLY attribute set. So, that explains why we're getting the exception. But if relocations need to occur, seems like this would need to be writeable though, again, it doesn't seem like an executable would need relocations...
The text was updated successfully, but these errors were encountered: