-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle multiple LOAD program headers with different biases #107
Conversation
Running this code on patch, I'm seeing a lot of what could be garbage (indicative of bad bias calculations?):
Similarly, when i run bingrep using the new patch (it's much better than before, and gets past the program headers), but fails printing the remainder, in particular the size of a note is bad (and I'm sure if it got to the symbols, it would fail there as well. Before:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapping out goblin backend for bingrep seems to confirm this still has trouble with the numpy binaries; need to get it figured out before can merge.
} // end if_std | ||
} | ||
|
||
if_alloc! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: how come this got moved to alloc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was always in alloc. I just moved it into another macro and kept it in alloc without thinking about whether it should be. I can move it out as part of this PR if you want, but it's probably better as another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah just leave
@@ -347,7 +324,6 @@ if_sylvan! { | |||
is_64: is_64, | |||
is_lib: is_lib, | |||
entry: entry as u64, | |||
bias: bias as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm; I don't like to see this field go, but given now we're supporting multiple biases we have to. Or maybe I'm wrong and its useless? Still might be nice to expose the iteration macro as a function hanging off the Elf type for users to pass an unbiased value in and get biased value out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's no single bias, this field seems wrong or at least misleading to me. Exposing the iteration might be ok, but we can wait until someone needs it. The current iteration can't be part of the Elf type because we need it during construction of the Elf type, but any exposed version would be best as part of the Elf type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense
macro_rules! elf_dynamic_info_std_impl { | ||
($size:ident, $phdr:ty) => { | ||
/// Convert a virtual memory address to a file offset | ||
fn vm_to_offset(phdrs: &[$phdr], address: $size) -> Option<$size> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some reservations about looping over the phdrs everytime an address lookup is requested; we really need to create an interval tree (and we could also use this in PE, which has very similar lookups).
I know you're trying to avoid allocation, but core dumps, for example, can have hundreds of phdrs, but then that might not ever have any bias calculations? I guess if this only occurs in the single pass over the dynamic info we're ok. I dunno, just some thoughts here, as opposed to allocating an associative list or a proper interval tree (which rust afaics doesn't have a robust one with a nice api unfortunately...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be overkill to do that for the single pass over the dynamic info.
No, this is because the dynamic info only has the start of the symtab, not it's size. The current calculation assumes it finishes at the strtab, and for this file, the symtab is at the start and the strtab is at the end. I guess we could do better by assuming it ends at the end of the program header containing its virtual address, it's still too long though so there will still be invalid symbols. In my opinion the best fix is to use the |
Looks to me like the file really does have a bad size for the note.. or more likely, that note segment is garbage, because the previous note doesn't look correct either, and there is no corresponding note section to match it. Note that binutils 'readelf -n' doesn't display notes in NOTE segments unless it is a core file. So in my opinion the fix is bingrep should handle bad notes. |
Wow ok, will check if skipping the note can let me parse rest of binary. Just realized that the debug dump using rdr example can parse it, so might be ok. Let me get back to you; great work though, thanks! I think we’ll publish a new 0.18 once this goes in, its a good feature :) |
@philipc are you sure the symbol table is correctly constructed in this new patch? I'm seeing complete garbage using both example=rdr and bingrep to parse the multiarray.so example: 451e78c085fffff6 LOCAL NOTYPE BAD NAME 0x4cda8948c931c031 BAD_IDX=40424 0x0 c489411ff8c1ffff UNKNOWN_STB UNKNOWN_STT BAD NAME 0xaed6e8fffffbf2e9 BAD_IDX=38294 0xe8 7b2e058b48fffffd UNKNOWN_STB NOTYPE BAD NAME 0xaebee8188b480034 BAD_IDX=33156 0xf fc085fff8abb3e8 UNKNOWN_STB NUM BAD NAME 0xbc86e8fffffd5f84 BAD_IDX=56969 0x48 7c8b48fffffbb2e9 GLOBAL UNKNOWN_STT BAD NAME 0xff4824448b482024 0x44 894118247c8b48ff UNKNOWN_STB COMMON BAD NAME 0x8b48fff8a88ae8c4 BAD_IDX=65534 0x3d 1f0ffffffb82e9 UNKNOWN_STB NOTYPE BAD NAME 0xc08548fff8ae63e8 .fini(9) 0xe3 824448b49fffffd UNKNOWN_STB UNKNOWN_STT BAD NAME 0x79e93050ffe7894c BAD_IDX=2537 0xff e9e93050ffe7894c UNKNOWN_STB UNKNOWN_STT BAD NAME 0xfff8a9d0e8fffffc BAD_IDX=2084 0x44 90fffffcd5e90009 WEAK NOTYPE BAD NAME 0xd3894853cd894855 BAD_IDX=58166 0xe8 2404c748 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x102444c748 BAD_IDX=1849 0x6 ffff8420e8e28948 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x74c0854824048b48 BAD_IDX=4132 0x4c 247cf7489948c3af UNKNOWN_STB UNKNOWN_STT BAD NAME 0x48c0310045894810 BAD_IDX=3912 0x32 841f0f UNKNOWN_STB UNKNOWN_STT BAD NAME 0xbb7504473904468b BAD_IDX=11878 0xc3 1ba10244c8b48 NUM OBJECT BAD NAME 0x2948c3af0f480000 BAD_IDX=57323 0xc0 c4834800458948f9 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x2e66c35d5bc03128 BAD_IDX=63304 0x99 66a8ebffffffffb8 LOCAL NOTYPE BAD NAME 0x841f0f 0x0 4855cc89495441d5 UNKNOWN_STB OBJECT BAD NAME 0x8348f3894853fd89 BAD_IDX=35137 0x55 c6f7000000a8b7 UNKNOWN_STB UNKNOWN_STT BAD NAME 0xbf840f180000 BAD_IDX=35656 0x8 ab87f60000 WEAK FILE BAD NAME 0x102444c74810 0x20 48de894901038348 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x24748d482024548d 0x1 3b7880fc085ff UNKNOWN_STB UNKNOWN_STT BAD NAME 0x85482024448b4800 BAD_IDX=63674 0x27 167840f03f883 LOCAL OBJECT BAD NAME 0x8d4810247c8b4800 BAD_IDX=18432 0x0 302444c7480000 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x45e8ff3145000000 .rodata(10) 0xba 24442b483024448b UNKNOWN_STB UNKNOWN_STT BAD NAME 0xf412024443b4810 BAD_IDX=18468 0x4 fff854500000384 WEAK UNKNOWN_STT BAD NAME 0xf63145000000a884 BAD_IDX=33807 0x1 445c70000000e00 UNKNOWN_STB COMMON BAD NAME 0x906623eb00000001 BAD_IDX=17863 0x33 14e840fff007d83 UNKNOWN_STB NOTYPE BAD NAME 0xf63145df89480000 BAD_IDX=12404 0x1 f0894448c4834824 UNKNOWN_STB UNKNOWN_STT BAD NAME 0x5e415d415c415d5b BAD_IDX=1161 0x49 objdump doesn't appear to have any issues parsing the dynamic and symbol table, so I think we're in error here... |
Also we have an infinite loop in our note iterator implementation... |
objdump doesn't have a problem because it uses the re the infinite loop, gimli-rs/object#74 may be of interest. |
Well i'm shocked, had no idea objdump (and readelf!) relied on the section headers, which are technically strippable. We don't need to use .dynamic though, the dynamic linker has to know which dynamic symbols are available without section headers, so imho we should only use that (downstream clients can use the .dynamic section if they want manually, or we can implement a specific iterator over it if needed). There just shouldn't be garbage coming out, it's just a bug in our implementation; i'm just trying to figure out if it was already present, or is a result of this patch? probably the former? |
The bug was already there. It's at this line where it is calculating the number of symbols. The problem is that without a |
https://flapenguin.me/2017/04/24/elf-lookup-dt-hash/ and https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ have good explanations of the hash tables. For |
Ok it sounds like that’s somewhat unrelated to this patch, and this is ready to go ? I’ll let you merge. Thanks for this and the subsequent discussion :) |
readelf will read the hash table instead if you use |
Partial fix for #106
Note that this changes
elf::dyn::DynamicInfo
so that it is no longer the same aself::dyn::dyn32::DynamicInfo
.