-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
elf.parse: added lazy_parse function #254
Conversation
Cool! Yea this seems like a great compromise to get a lazy parsing approach, I like it! Reducing some of the boilerplate into common function would be good, as you said, but i understand that this is just an initial sketch/PR? What api were you thinking for filling in the rest of the structures, e.g., section headers, etc.? And thanks for the PR! :) |
Thank you for your reply. Glad you are interested. I push a new version to reduce the repeated code, make API more general and add a unit test which can also be an example.
With goblin's well-designed APIs for each part (program headers, section headers ...), user can read in the contents they need and parse the specific part with corresponding You may start the review. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks actually really great! Want some clarification on the new pub functions, and also the new parse_elf_hdr
, otherwise I think we can merge this very soon, and thanks again for the PR! I didn't think a lazy elf parser would be so easy;
Oh it might also be good to add a new example showing how to parse out portions in a lazy manner, etc. the test is good, but example filling in other parts of structs, just as proof of concept would be good (i assume the user has all the facilities to do this at the moment?)
The only other concern i have is the "semantics" of a lazy elf parse; i assume it's safe to use, and no unusual things will happen if the user creates an elf struct in the lazy manner (i.e., no other public methods on the struct will panic if called and it was constructed in the lazy manner, yes?)
src/elf/mod.rs
Outdated
@@ -202,22 +202,45 @@ if_sylvan! { | |||
pub fn is_object_file(&self) -> bool { | |||
self.header.e_type == header::ET_REL | |||
} | |||
|
|||
pub fn parse_elf_hdr(bytes: &'a [u8]) -> error::Result<Header> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a pub fn (or a function at all actually?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this to parse_header then since elf is redundant a bit since user would write Elf::parse_elf_hdr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe this function is very likely to be called directly by users who would like to load in a lazy manner. As elf header contains much information and user can just read in the header part with system IO and parse only the header to get the info and can then decide whether to read in other parts.
Renamed. Thanks for your review.
src/elf/mod.rs
Outdated
@@ -346,7 +369,7 @@ if_sylvan! { | |||
} | |||
} | |||
|
|||
fn gnu_hash_len(bytes: &[u8], offset: usize, ctx: Ctx) -> error::Result<usize> { | |||
pub fn gnu_hash_len(bytes: &[u8], offset: usize, ctx: Ctx) -> error::Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be pub
now? I'd like to not increase api scope unless absolutely necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was writing the test case and was thinking this could be called by users if they want to do the same as the test case.
However, I don't think it is necessary. It is a truly low-level function. Maybe we can mark this pub
when there are strong demands. I am okay to keep it private for now.
Changed back to private functions. Done
src/elf/mod.rs
Outdated
@@ -379,7 +402,7 @@ if_sylvan! { | |||
} | |||
} | |||
|
|||
fn hash_len(bytes: &[u8], offset: usize, machine: u16, ctx: Ctx) -> error::Result<usize> { | |||
pub fn hash_len(bytes: &[u8], offset: usize, machine: u16, ctx: Ctx) -> error::Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also feels very lowlevel as a pub function; on argument is a u16
machine value, which doesn't seem great for user to think about; if this is needed for some reason, i think we can provide something better here (still not sure why it's needed exactly, hoping you can fill that in)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -388,6 +411,38 @@ if_sylvan! { | |||
}; | |||
Ok(nchain) | |||
} | |||
|
|||
struct Misc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review.
I second this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m4b Thank you for your review. I changed gnu_hash_len
and hash_len
back to private functions. I can add an example later.
src/elf/mod.rs
Outdated
@@ -202,22 +202,45 @@ if_sylvan! { | |||
pub fn is_object_file(&self) -> bool { | |||
self.header.e_type == header::ET_REL | |||
} | |||
|
|||
pub fn parse_elf_hdr(bytes: &'a [u8]) -> error::Result<Header> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe this function is very likely to be called directly by users who would like to load in a lazy manner. As elf header contains much information and user can just read in the header part with system IO and parse only the header to get the info and can then decide whether to read in other parts.
Renamed. Thanks for your review.
src/elf/mod.rs
Outdated
@@ -346,7 +369,7 @@ if_sylvan! { | |||
} | |||
} | |||
|
|||
fn gnu_hash_len(bytes: &[u8], offset: usize, ctx: Ctx) -> error::Result<usize> { | |||
pub fn gnu_hash_len(bytes: &[u8], offset: usize, ctx: Ctx) -> error::Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was writing the test case and was thinking this could be called by users if they want to do the same as the test case.
However, I don't think it is necessary. It is a truly low-level function. Maybe we can mark this pub
when there are strong demands. I am okay to keep it private for now.
Changed back to private functions. Done
src/elf/mod.rs
Outdated
@@ -379,7 +402,7 @@ if_sylvan! { | |||
} | |||
} | |||
|
|||
fn hash_len(bytes: &[u8], offset: usize, machine: u16, ctx: Ctx) -> error::Result<usize> { | |||
pub fn hash_len(bytes: &[u8], offset: usize, machine: u16, ctx: Ctx) -> error::Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -388,6 +411,38 @@ if_sylvan! { | |||
}; | |||
Ok(nchain) | |||
} | |||
|
|||
struct Misc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review.
1ae2c00
to
2c2794a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once doc comment and renamed I think this is ready to merge, thank you for the PR! This is great stuff.
It's also occurred to me in future, we could add e.g., elf.parse_section_headers()
which takes &mut self
; then the Elf::parse
code could do:
let mut elf = Elf::lazy_parse()?;
elf.parse_section_headers();
elf.parse_whatever()?;
and ditto for users of lazy parsing, easier for them to fill out which portions they want without doing the freestanding parse functions filling a field up, etc. just a thought ( definitely don't do that now) :)
It seems that I can't fill in all parts without |
Example getting interpreter would be great! Re gnu hash, let's leave it private for now, that way we don't commit to an api; if someone needs this functionality, we can always make a PR/take a PR and review how best to expose it. let mut num_syms = if let Some(gnu_hash) = dyn_info.gnu_hash {
gnu_hash_len(bytes, gnu_hash as usize, ctx)?
} else if let Some(hash) = dyn_info.hash {
hash_len(bytes, hash as usize, header.e_machine, ctx)?
} else {
0
};
let max_reloc_sym = dynrelas.iter()
.chain(dynrels.iter())
.chain(pltrelocs.iter())
.fold(0, |num, reloc| cmp::max(num, reloc.r_sym));
if max_reloc_sym != 0 {
num_syms = cmp::max(num_syms, max_reloc_sym + 1);
}
dynsyms = Symtab::parse(bytes, dyn_info.symtab, num_syms, ctx)?; as far as i can see, this is only code that uses it; we could just a a helper function that does all this and takes whichever minimal arguments requierd, seems like For now i think it's better to leave private, as someone can always do as you say and parse whole thing :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thanks so much for this!
I can add this change onto the 0.3 branch and do (another :D ) release, 0.3.4 if you like? |
@m4b Thank you for your review. Yeah sure, I was just going to ask whether there will be a recent release with this feature that can be used directly from crates.io? |
You got it, 0.3.4 is out! |
Great! Thank you very much! |
lazy_parse will only generate a dummy Elf struct with only Header. Users can choose to parse whatever they want and fill the Elf based on their needs.
We are developing a libOS kernel (https://github.com/occlum/occlum) and would like to load ELF as lazy as possible. However, now goblin does all the read and parse in memory so we must read the whole ELF file to kernel memory (we can't mmap the file to memory due to some constraint).
Thus, I came up with this method to load lazily:
read ELF header only -> parse header -> get program header table -> only read in PT_LOAD and PT_INTERP segments
And by adding these two functions, I can achieve this. The implementation can be found here. I would like to see this get merged to master so that people with similar requirements can do this. But I am not sure if it is a common need. If it is, I can spend some time to reduce the repeated code.
Comments are welcome. Thank you.