Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PE exception tables #136

Merged
merged 6 commits into from
Apr 1, 2019
Merged

Implement PE exception tables #136

merged 6 commits into from
Apr 1, 2019

Conversation

jan-auer
Copy link
Contributor

This implements parsing for the exception tables data directory and x64 unwind codes. Respective documentation from Microsoft is located here: https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017.

The exception directory is only present on x64 builds, so the entire implementation can assume x64 exception handling.

Tests are still missing as I haven't taken the time to extract relevant bits from an actual executable for realistic testing yet.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven’t given this a full review yet but this is really incredible work, awesome job!

src/pe/exception.rs Outdated Show resolved Hide resolved
src/pe/exception.rs Show resolved Hide resolved
@jan-auer
Copy link
Contributor Author

jan-auer commented Mar 29, 2019

Updated for compatibility with Rust 1.20.0.


Even though it's not that much code, I'd like to offer a suggestion to sanity-check this:

  • PE::exception_table -> ExceptionData::functions -> ExceptionData::get_unwind_info -> UnwindInfo::unwind_codes -> UnwindCode::operation
  • UnwindInfo::chained_info -> ExceptionData::get_unwind_info.
  • Comments on StackFrameOffset (used in some operations) regarding how to load register values

A usage example for this is my port of Breakpad's dump_syms here.


There might be more, one could do to help interpreting the unwind info:

  • For one, I did not implement the new UWOP_EPILOG operation. Almost noone seems to interpret it correctly, and its behavior is quite complex.
  • Next, there is nothing that "helps" with unwinding, e.g. an unwind context that keeps register state and restores register values. This could be a nice feature, but I think it's out of scope for this crate.
  • Finally, exception and termination handlers are only exposed as (unbounded) raw buffers. For my case, and probably most other cases, these won't be necessary. But if someone really wants to runtime-unwind, this would come in handy. Same applies as above, probably a good case for a separate crate.

For full transparency, here are two more thoughts that I had when implementing:

  • The manually implemented binary search in find_function is not ideal. However, I found no ergonomic way in scroll to map a &[u8] to &[T]. Transmuting and implementing a custom RuntimeFunction struct that converts from little endian on the fly wasn't pretty either...
  • Only RuntimeFunction derives Pwrite at the moment, but it might be nice to also write the other unwind info eventually. I wasn't sure if that's necessary though and what the best pattern would be, especially for the UnwindCodes iterator.

@philipc
Copy link
Collaborator

philipc commented Mar 30, 2019

Updated for compatibility with Rust 1.20.0.

Note that #134 is bumping to rust 1.31.1 and changing to 2018 edition.

src/pe/exception.rs Outdated Show resolved Hide resolved
src/pe/exception.rs Show resolved Hide resolved
src/pe/exception.rs Show resolved Hide resolved
}

let rva = function.unwind_info_address as usize;
let offset = utils::find_offset(rva, sections, self.file_alignment).ok_or_else(|| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we store the RVA bias in ExceptionData instead of looking it up every time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point. I wasn't sure if the RVAs always point into the same section. This would make sense generally, but I don't trust this is the case for every PE. Especially the tooling Microsoft applies to their DLLs might invalidate this assumption. I'll try to investigate this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you're right. breakpad doesn't appear to do anything clever either.

src/pe/exception.rs Outdated Show resolved Hide resolved
@philipc
Copy link
Collaborator

philipc commented Mar 30, 2019

The manually implemented binary search in find_function is not ideal. However, I found no ergonomic way in scroll to map a &[u8] to &[T]. Transmuting and implementing a custom RuntimeFunction struct that converts from little endian on the fly wasn't pretty either...

Maybe transmute to &[[u8; RUNTIME_FUNCTION_SIZE]]?

@jan-auer
Copy link
Contributor Author

jan-auer commented Mar 30, 2019

Maybe transmute to &[[u8; RUNTIME_FUNCTION_SIZE]]?

Yeah, I was playing around with something similar for a moment... I'll let you choose what you like more 😄

type Placeholder = [u32; 3];
let slice = unsafe { std::slice::from_raw_parts(
    self.bytes.as_ptr() as *const Placeholder,
    self.bytes.len() / RUNTIME_FUNCTION_SIZE,
) };

let index = match slice.binary_search_by_key(&rva, |p| u32::from_le(p[0])) {
    Ok(i) => i,
    Err(0) => return Ok(None),
    Err(i) => i - 1,
};

let function = self.get_function(index)?;
if function.end_address > rva {
    Ok(Some(function))
} else {
    Ok(None)
}

I'm just learning about scroll::FromCtx. Theoretically, reading RuntimeFunction is infallible given that the bounds are checked ahead of time. The Results could be removed everywhere, then. It's just unfortunate, Pread doesn't work with FromCtx.

@philipc
Copy link
Collaborator

philipc commented Mar 30, 2019

Let's stick with the version that doesn't use unsafe.

Copy link
Collaborator

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@m4b
Copy link
Owner

m4b commented Mar 31, 2019

yes, I agree with @philipc , lets leave out unsafe.

However, I found no ergonomic way in scroll to map a &[u8] to &[T]. Transmuting and implementing a custom RuntimeFunction struct that converts from little endian on the fly wasn't pretty either...

I'm curious though, @jan-auer , so gread_inout_with allows you to pass a mutable buffer of type &'mut [T] where T is your type you want to deserialize out of the bytes you give it (and which has a TryFromCtx impl). It copies in the same host endianness case, but in practice i don't think this is generally much of an issue on most hardware these days.
I considered adding a pread method for essentially performing transmutes but several people convinced me not to, and i think it was a good idea not to in the end. If the user needs the extra perf boost, then they should transmute themselves, and guarantee the host <-> binary blob is the same endianness; on the other hand, if they're deserializing from arbitrary binary formats on disk, its generally not a great idea to be transmuting, and one should just copy out.
Anyway, great work here!

@m4b
Copy link
Owner

m4b commented Mar 31, 2019

Only RuntimeFunction derives Pwrite at the moment, but it might be nice to also write the other unwind info eventually. I wasn't sure if that's necessary though and what the best pattern would be, especially for the UnwindCodes iterator.

If you or someone else wants to start on the pwrite implementation for a PE binary I would be super excited, or at least getting the ball rolling. m4b/faerie#46 is be major reason for this :)
Imho, it's the last major thing I'd like to see done before we take this library 1.0 :D

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to be rebased for the R2018 changes, yes? After that LGTM, and we can merge

@m4b
Copy link
Owner

m4b commented Mar 31, 2019

After this PR lands i'm going to release 0.22, got some great stuff in this release!

@philipc
Copy link
Collaborator

philipc commented Apr 1, 2019

@m4b This is already rebased after #134. I think all your review requests have been addressed?

@jan-auer
Copy link
Contributor Author

jan-auer commented Apr 1, 2019

@m4b, It is rebased 👍 Tests are still missing, though. I hope to finish that today.

gread_inout_with allows you to pass a mutable buffer of type

Unfortunately, that reads the entire thing into a secondary buffer, which is not really in the spirit of memory mapped binary search :/ I was probably more looking for something along the lines of what we did in the PDB crate (and here the corresponding struct, which could be derived). As I've seen this pop a couple of times now, I might try to make some time for a PR to scroll.

If you or someone else wants to start on the pwrite implementation for a PE binary

Phew 😅

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, great work!

@m4b m4b merged commit e6f784e into m4b:master Apr 1, 2019
@jan-auer jan-auer deleted the pe-exceptions branch April 1, 2019 18:29
@jan-auer
Copy link
Contributor Author

@m4b do you have an idea when you'll be able to run a release?

@m4b
Copy link
Owner

m4b commented Apr 10, 2019

Sorry got distracted I can try for this evening otherwise for sure this weekend!

@m4b
Copy link
Owner

m4b commented Apr 14, 2019

Ok 0.0.22 is published :)

//! equal to the prolog size encoded in the unwind info. The effects of the prolog are unwound
//! by scanning forward through the unwind codes array for the first entry with an offset less
//! than or equal to the offset of the RIP from the function start, then undoing the effect of
//! all remaining items in the unwind code array. Step 1 is then repeated.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*step 2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's step 1 from https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017#unwind-procedure (the 1,2,3 here are actually case a,b,c from that doc).

/// An x64 register used during unwinding.
///
/// - `0` - `15`: General purpose registers
/// - `17` - `32`: XMM registers
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add 16 (rip) to the list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be added, but please note that for unwinding $rip can never occur as it cannot be represented.

30 => "$xmm13",
31 => "$xmm14",
32 => "$xmm15",
_ => "",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please return something like "" here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants