-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ReadDir iterator #103135
Optimize ReadDir iterator #103135
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
The job Click to see the possible cause of the failure (guessed by this bot)
|
Eliminate 280-byte memset from ReadDir iterator This guy: https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589 It turns out `libc::dirent64` is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory. Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead. ## History This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750. Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr. <table><tr><td>copy 280 bytes</td></tr></table> This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.) The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table> After my PR we just grab the 9 needed bytes directly from entry_ptr. <table><tr><td>copy 9 bytes</td></tr></table> The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better. ## References - https://man7.org/linux/man-pages/man3/readdir.3.html
Eliminate 280-byte memset from ReadDir iterator This guy: https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589 It turns out `libc::dirent64` is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory. Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead. ## History This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750. Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr. <table><tr><td>copy 280 bytes</td></tr></table> This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.) The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table> After my PR we just grab the 9 needed bytes directly from entry_ptr. <table><tr><td>copy 9 bytes</td></tr></table> The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better. ## References - https://man7.org/linux/man-pages/man3/readdir.3.html
I was recently dealing with a directory containing a large number of files and tried to see if there was any room for optimizing
std::fs::ReadDir
for this situation. This commit ended up improvingcount()
by only 8.5% which I didn't find to be enough to justify merging after taking into account how often anybody else would care about this operation. However I think some of the changes tonext()
are an improvement independent of performance so I am going to move those into a separate PR.r? @ghost