-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zero first byte of CString on drop #36264
Conversation
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
817b00c
to
c5d1a6e
Compare
I don't know what stability attribute I should put here :) |
@matklad stability attributes on impls don't do anything anyway, so I believe they are usually marked stable. |
// memory unsafe code from working by accident. | ||
impl Drop for CString { | ||
fn drop(&mut self) { | ||
unsafe { *self.inner.get_unchecked_mut(0) = 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed that a CString always points to something? It can't be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, even unsafe constructors like from_raw
can't break this invariant.
I'm worried that this will mask the failure, because accessing the empty string won't crash. Would it be better to set the pointer to a known invalid value so as to cause a segfault as soon as possible? |
c5d1a6e
to
3e9fe0c
Compare
You've already given the pointer away - there's nothing you can do to it anymore. |
I wish we could do that, but looks like we can't? It's to late to change the pointer in the Drop, because when we created a |
3e9fe0c
to
d20fa49
Compare
Ah, good point. If we do this it should be tagged relnotes because it's a silent behavior change. |
d20fa49
to
e9923f7
Compare
Behaviour change wrt. what? This memory is deallocated right after and it is invalid to use the pointer after that. All things considered, I kinda like this PR. |
This seems like a neat trick to me to help a footgun in many cases. My only worry would be about performance where this is adding code where previously there was none (and may not be easy to optimize out?) @matklad could you try benchmarking the time it takes to deallocate a |
I made some benchmarks, the difference is within the error:
benchmarking just drop is somewhat difficult, because our benchmarking framework does not provide a way to initialize iteration state without counting time for it. |
I think the only way this could cause performance to suffer is if the memory for the CString has been evicted from the cache or paged out and freeing that memory does not involve paging it back in or pulling it into cache, so that writing that single null byte would result in a cache miss or page fault that wouldn't otherwise occur. However, this sort of situation is kinda rare, and I think the benefits of this PR definitely outweigh the performance costs of writing a single byte. |
e9923f7
to
a4e0630
Compare
@alexcrichton I sure hope it doesn't get optimized out :) |
Should we add a note to the |
This is my concern as well. I've verified manually that without this impl you are able to recover the contents of the string after the drop and that with the impl you get an empty string back. But I think that any such test will definitely invoke dereferencing a freed pointer, and I assume that we should not add such tests.
I don't think so because I can't think of a proper formulation for the docs, this is too hacky to be explained nicely. And hopefully if you have read the docs for |
I've tried adding a run-fail test. It fails with the current compiler, as expected, but I don't have time to wait for |
8cc7691
to
c25447f
Compare
fn is_hello(s: *const c_char) -> bool { | ||
let s = unsafe { CStr::from_ptr(s) }; | ||
let hello = CString::new("Hello").unwrap(); | ||
s == hello.as_ref() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is technically undefined behavior, so perhaps this test could be isolated into a separate process? That is, it's fine if the process segfaults or otherwise aborts abnormally, but it's not fine for it to return successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I though that run-fail
tests are already executed as a separate processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, yeah, but it's asserted that this test's output will always contain the text "assertion failed" which I don't think is guaranteed to happen (due to this being UB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to omit error-pattern
here precisely for this reason, but the test infra does not allow me to do so :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's fine to just ship this over to a run-pass test which re-executes itself (there's a few other tests that do that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a few other tests that do that
Can you point me to a particular example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, Command::new(&args[0])
is stunningly beautiful, never heard of this trick!
This should prevent code like ``` let ptr = CString::new("hello").unwrap().as_ptr(); ``` from working by accident.
c25447f
to
f9a3408
Compare
Looks good to me, thanks @matklad! |
And I've verified that the new test fails if the |
📌 Commit f9a3408 has been approved by |
Zero first byte of CString on drop Hi! This is one more attempt to ameliorate `CString::new("...").unwrap().as_ptr()` problem (related RFC: rust-lang/rfcs#1642). One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code. Looks like writing a `null` byte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte in `Drop` should be cheap enough compared to actual memory deallocation which would follow. I would actually prefer to do something like ```Rust impl Drop for CString { fn drop(&mut self) { let pattern = b"CTHULHU FHTAGN "; let bytes = self.inner[..self.inner.len() - 1]; for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) { *d = *s; } } } ``` because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things `cfg(debug_assertions)` conditionally in stdlib. Not sure if the whole idea or my implementation (I've used ~~`transmute`~~ `mem::unitialized` to workaround move out of Drop thing) makes sense :)
@matklad |
Would it make sense to do this exclusively in debug mode, rather than in release mode? That would avoid even the minor performance hit, while ensuring that tests run in debug mode would fail. |
@joshtriplett std is compiled in release mode, so it would always be disabled, regardless of whether the Rust program you're running is using debug or release, so that idea won't work. |
It is formally possible just because it invokes UB. I repeat test ten times just because it fails on my machine if I use only one string. Is there perhaps some way to add a hook to rust memory deallocation function to make this test UB-free? |
I think it would probably be a good idea to make do some benchmarking to make sure that the cost of writing a single byte is noticeable in any way compared to the other work going on to e.g. deallocate the CString and do whatever work the CString was created for in the first place. |
|
@sfackler see #36264 (comment) for benchmarks. We probably should remove the test as is. If we had a way to specify allocator for data inside CString, then it would be possible to use some Vec-based ptr-bump allocator which makes sure that data, once allocated, never gets overwritten, making sure the the whole test is not one big glaring UB hole. |
Would something like this work: |
Yes, it would. The only thing you really need to do is to override |
If the purpose of the test is to check the destructor behavior, then
This fixes the test too. |
@petrochenkov the drop glue of |
Yep. Less UB is still UB. Allocator override would certainly be better. |
I'll send a PR with a test fixed later this week, unless someone wants to step in ;) |
LLVM can, in theory, optimize the write away: https://godbolt.org/g/mfFfKu This doesn't affect us currently since LLVM specifically looks for the One possible workaround would be to use |
This appears to have succeeded at catching at least one bug in the wild: https://www.reddit.com/r/rust/comments/5r6ccm/an_unsafe_gotcha/ |
🍒 Yay! 🎉 |
Hi! This is one more attempt to ameliorate
CString::new("...").unwrap().as_ptr()
problem (related RFC: rust-lang/rfcs#1642).One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code.
Looks like writing a
null
byte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte inDrop
should be cheap enough compared to actual memory deallocation which would follow.I would actually prefer to do something like
because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things
cfg(debug_assertions)
conditionally in stdlib.Not sure if the whole idea or my implementation (I've used
transmute
mem::unitialized
to workaround move out of Drop thing) makes sense :)