Zero first byte of CString on drop #36264

matklad · 2016-09-04T15:51:54Z

Hi! This is one more attempt to ameliorate CString::new("...").unwrap().as_ptr() problem (related RFC: rust-lang/rfcs#1642).

One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code.

Looks like writing a null byte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte in Drop should be cheap enough compared to actual memory deallocation which would follow.

I would actually prefer to do something like

impl Drop for CString {
    fn drop(&mut self) {
        let pattern = b"CTHULHU FHTAGN ";
        let bytes = self.inner[..self.inner.len() - 1];
        for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) {
            *d = *s;
        }
    }
}

because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things cfg(debug_assertions) conditionally in stdlib.

Not sure if the whole idea or my implementation (I've used ~~transmute~~ mem::unitialized to workaround move out of Drop thing) makes sense :)

rust-highfive · 2016-09-04T15:52:00Z

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

matklad · 2016-09-04T16:14:38Z

I don't know what stability attribute I should put here :)

durka · 2016-09-04T17:32:15Z

@matklad stability attributes on impls don't do anything anyway, so I believe they are usually marked stable.

durka · 2016-09-04T17:33:54Z

src/libstd/ffi/c_str.rs

+// memory unsafe code from working by accident.
+impl Drop for CString {
+    fn drop(&mut self) {
+        unsafe { *self.inner.get_unchecked_mut(0) = 0; }


Is it guaranteed that a CString always points to something? It can't be empty?

Yes, even unsafe constructors like from_raw can't break this invariant.

durka · 2016-09-04T17:49:54Z

I'm worried that this will mask the failure, because accessing the empty string won't crash. Would it be better to set the pointer to a known invalid value so as to cause a segfault as soon as possible?

sfackler · 2016-09-04T19:01:48Z

You've already given the pointer away - there's nothing you can do to it anymore.

matklad · 2016-09-04T19:01:55Z

Would it be better to set the pointer to a known invalid value so as to cause a segfault as soon as possible?

I wish we could do that, but looks like we can't? It's to late to change the pointer in the Drop, because when we created a CStr we already had copied the pointer value.

durka · 2016-09-04T19:08:54Z

Ah, good point. If we do this it should be tagged relnotes because it's a silent behavior change.

nagisa · 2016-09-04T20:53:07Z

Ah, good point. If we do this it should be tagged relnotes because it's a silent behavior change.

Behaviour change wrt. what? This memory is deallocated right after and it is invalid to use the pointer after that.

All things considered, I kinda like this PR.

alexcrichton · 2016-09-06T05:05:44Z

This seems like a neat trick to me to help a footgun in many cases. My only worry would be about performance where this is adding code where previously there was none (and may not be easy to optimize out?)

@matklad could you try benchmarking the time it takes to deallocate a CString just so we can see the comparison?

nagisa · 2016-09-06T13:12:14Z

I made some benchmarks, the difference is within the error:

                                old                             new
test drop_1000_cstring_run1     17,802 ns/iter (+/- 677)        17,776 ns/iter (+/- 1,137)
test drop_1000_cstring_run2     17,780 ns/iter (+/- 1,008)      17,838 ns/iter (+/- 901)
test drop_1000_cstring_run3     17,880 ns/iter (+/- 1,003)      17,786 ns/iter (+/- 249)

benchmarking just drop is somewhat difficult, because our benchmarking framework does not provide a way to initialize iteration state without counting time for it.

retep998 · 2016-09-06T13:38:38Z

I think the only way this could cause performance to suffer is if the memory for the CString has been evicted from the cache or paged out and freeing that memory does not involve paging it back in or pulling it into cache, so that writing that single null byte would result in a cache miss or page fault that wouldn't otherwise occur. However, this sort of situation is kinda rare, and I think the benefits of this PR definitely outweigh the performance costs of writing a single byte.

durka · 2016-09-07T02:48:14Z

@alexcrichton I sure hope it doesn't get optimized out :)

durka · 2016-09-07T02:48:59Z

Should we add a note to the as_ptr docs about this behavior?

matklad · 2016-09-07T05:57:16Z

@alexcrichton I sure hope it doesn't get optimized out :)

This is my concern as well. I've verified manually that without this impl you are able to recover the contents of the string after the drop and that with the impl you get an empty string back.

But I think that any such test will definitely invoke dereferencing a freed pointer, and I assume that we should not add such tests.

Should we add a note to the as_ptr docs about this behavior?

I don't think so because I can't think of a proper formulation for the docs, this is too hacky to be explained nicely. And hopefully if you have read the docs for as_ptr, then you have noticed the big warning :)

matklad · 2016-09-07T07:16:02Z

I've tried adding a run-fail test. It fails with the current compiler, as expected, but I don't have time to wait for make check-stage1-rfail NO_REBUILD=1 atm :(

alexcrichton · 2016-09-07T16:41:34Z

src/test/run-fail/cstring-drop.rs

+fn is_hello(s: *const c_char) -> bool {
+    let s = unsafe { CStr::from_ptr(s) };
+    let hello = CString::new("Hello").unwrap();
+    s == hello.as_ref()


I think this is technically undefined behavior, so perhaps this test could be isolated into a separate process? That is, it's fine if the process segfaults or otherwise aborts abnormally, but it's not fine for it to return successfully.

Hm, I though that run-fail tests are already executed as a separate processes?

They are, yeah, but it's asserted that this test's output will always contain the text "assertion failed" which I don't think is guaranteed to happen (due to this being UB)

I would like to omit error-pattern here precisely for this reason, but the test infra does not allow me to do so :(

Yeah it's fine to just ship this over to a run-pass test which re-executes itself (there's a few other tests that do that)

there's a few other tests that do that

Can you point me to a particular example?

Wow, Command::new(&args[0]) is stunningly beautiful, never heard of this trick!

This should prevent code like ``` let ptr = CString::new("hello").unwrap().as_ptr(); ``` from working by accident.

alexcrichton · 2016-09-07T20:22:54Z

Looks good to me, thanks @matklad!

matklad · 2016-09-07T20:34:28Z

And I've verified that the new test fails if the Drop impl is empty.

alexcrichton · 2016-09-12T22:25:25Z

Discussed at libs triage the conclusion was that this is good to go. Thanks again for the neat trick @matklad!

@bors: r+

bors · 2016-09-12T22:25:26Z

📌 Commit f9a3408 has been approved by alexcrichton

bors · 2016-09-13T11:57:24Z

⌛ Testing commit f9a3408 with merge c87ba3f...

Zero first byte of CString on drop Hi! This is one more attempt to ameliorate `CString::new("...").unwrap().as_ptr()` problem (related RFC: rust-lang/rfcs#1642). One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code. Looks like writing a `null` byte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte in `Drop` should be cheap enough compared to actual memory deallocation which would follow. I would actually prefer to do something like ```Rust impl Drop for CString { fn drop(&mut self) { let pattern = b"CTHULHU FHTAGN "; let bytes = self.inner[..self.inner.len() - 1]; for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) { *d = *s; } } } ``` because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things `cfg(debug_assertions)` conditionally in stdlib. Not sure if the whole idea or my implementation (I've used ~~`transmute`~~ `mem::unitialized` to workaround move out of Drop thing) makes sense :)

bors · 2016-09-13T15:06:51Z

petrochenkov · 2016-09-20T20:13:37Z

@matklad
The test fails on my machine, btw. (Windows/GNU/64-bit)
I'm trying to investigate why.

joshtriplett · 2016-09-20T20:14:32Z

Would it make sense to do this exclusively in debug mode, rather than in release mode? That would avoid even the minor performance hit, while ensuring that tests run in debug mode would fail.

retep998 · 2016-09-20T20:21:32Z

@joshtriplett std is compiled in release mode, so it would always be disabled, regardless of whether the Rust program you're running is using debug or release, so that idea won't work.

matklad · 2016-09-20T20:31:32Z

The test fails on my machine, btw. (Windows/GNU/64-bit)
I'm trying to investigate why.

It is formally possible just because it invokes UB. I repeat test ten times just because it fails on my machine if I use only one string.

Is there perhaps some way to add a hook to rust memory deallocation function to make this test UB-free?

joshtriplett · 2016-09-20T20:36:29Z

@retep998 Hmmm, true.

#36372 was what gave me hope that this kind of debug/release distinction might be possible in std.

Have there been any discussions or plans about doing cross-crate LTO, including std?

sfackler · 2016-09-20T20:38:33Z

I think it would probably be a good idea to make do some benchmarking to make sure that the cost of writing a single byte is noticeable in any way compared to the other work going on to e.g. deallocate the CString and do whatever work the CString was created for in the first place.

petrochenkov · 2016-09-20T20:39:23Z

#[inline(never)] annotation on fn is_hello seems to fix the test in my case.

nagisa · 2016-09-20T20:41:27Z

@sfackler see #36264 (comment) for benchmarks.

We probably should remove the test as is. If we had a way to specify allocator for data inside CString, then it would be possible to use some Vec-based ptr-bump allocator which makes sure that data, once allocated, never gets overwritten, making sure the the whole test is not one big glaring UB hole.

matklad · 2016-09-20T20:48:29Z

If we had a way to specify allocator for data inside CString

Would something like this work:
https://github.com/rust-lang/rust/blob/master/src/test/run-pass/allocator-override.rs ?

nagisa · 2016-09-20T20:52:30Z

Yes, it would. The only thing you really need to do is to override free to be no-op.

petrochenkov · 2016-09-20T20:52:31Z

If the purpose of the test is to check the destructor behavior, then CStrings can be dropped in place without deallocating xs.

for x in &mut xs {
    unsafe { std::ptr::drop_in_place(x); }
}

This fixes the test too.

nagisa · 2016-09-20T20:55:52Z

@petrochenkov the drop glue of CString also frees the data buffer, so its still UB to try read the pointer afterwards.

petrochenkov · 2016-09-20T21:02:35Z

Yep. Less UB is still UB. Allocator override would certainly be better.

matklad · 2016-09-20T21:08:59Z

I'll send a PR with a test fixed later this week, unless someone wants to step in ;)

Amanieu · 2016-09-20T21:33:58Z

LLVM can, in theory, optimize the write away: https://godbolt.org/g/mfFfKu

This doesn't affect us currently since LLVM specifically looks for the free or C++ delete symbols while Rust uses a wrapper around those, but this could be added in the future.

One possible workaround would be to use write_volatile here to prevent it from being optimized away.

bstrie · 2017-01-31T06:42:54Z

This appears to have succeeded at catching at least one bug in the wild: https://www.reddit.com/r/rust/comments/5r6ccm/an_unsafe_gotcha/

matklad · 2017-01-31T06:44:25Z

🍒 Yay! 🎉

rust-highfive assigned aturon Sep 4, 2016

matklad force-pushed the zeroing-cstring branch from 817b00c to c5d1a6e Compare September 4, 2016 16:05

durka reviewed Sep 4, 2016
View reviewed changes

matklad force-pushed the zeroing-cstring branch from c5d1a6e to 3e9fe0c Compare September 4, 2016 18:55

matklad force-pushed the zeroing-cstring branch from 3e9fe0c to d20fa49 Compare September 4, 2016 19:06

matklad force-pushed the zeroing-cstring branch from d20fa49 to e9923f7 Compare September 4, 2016 19:09

matklad force-pushed the zeroing-cstring branch from e9923f7 to a4e0630 Compare September 6, 2016 19:06

matklad force-pushed the zeroing-cstring branch from 8cc7691 to c25447f Compare September 7, 2016 08:54

alexcrichton reviewed Sep 7, 2016
View reviewed changes

matklad added 2 commits September 7, 2016 22:17

Zero first byte of CString on drop

04b776e

This should prevent code like ``` let ptr = CString::new("hello").unwrap().as_ptr(); ``` from working by accident.

Add a test for CString drop

f9a3408

matklad force-pushed the zeroing-cstring branch from c25447f to f9a3408 Compare September 7, 2016 19:34

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 7, 2016

bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 8, 2016

bors merged commit f9a3408 into rust-lang:master Sep 13, 2016

matklad mentioned this pull request Sep 21, 2016

UB free test for CString Drop #36607

Closed

matklad deleted the zeroing-cstring branch April 26, 2019 13:17

Zero first byte of CString on drop #36264

Zero first byte of CString on drop #36264

Conversation

matklad commented Sep 4, 2016 • edited Loading

rust-highfive commented Sep 4, 2016

matklad commented Sep 4, 2016

durka commented Sep 4, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

durka commented Sep 4, 2016

sfackler commented Sep 4, 2016

matklad commented Sep 4, 2016

durka commented Sep 4, 2016

nagisa commented Sep 4, 2016

alexcrichton commented Sep 6, 2016

nagisa commented Sep 6, 2016

retep998 commented Sep 6, 2016 • edited Loading

durka commented Sep 7, 2016

durka commented Sep 7, 2016

matklad commented Sep 7, 2016

matklad commented Sep 7, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Sep 7, 2016

matklad commented Sep 7, 2016

alexcrichton commented Sep 12, 2016

bors commented Sep 12, 2016

bors commented Sep 13, 2016

bors commented Sep 13, 2016

petrochenkov commented Sep 20, 2016

joshtriplett commented Sep 20, 2016

retep998 commented Sep 20, 2016

matklad commented Sep 20, 2016

joshtriplett commented Sep 20, 2016 • edited Loading

sfackler commented Sep 20, 2016

petrochenkov commented Sep 20, 2016

nagisa commented Sep 20, 2016

matklad commented Sep 20, 2016

nagisa commented Sep 20, 2016

petrochenkov commented Sep 20, 2016

nagisa commented Sep 20, 2016

petrochenkov commented Sep 20, 2016

matklad commented Sep 20, 2016

Amanieu commented Sep 20, 2016

bstrie commented Jan 31, 2017

matklad commented Jan 31, 2017

matklad commented Sep 4, 2016 •

edited

Loading

retep998 commented Sep 6, 2016 •

edited

Loading

matklad commented Sep 7, 2016 •

edited

Loading

joshtriplett commented Sep 20, 2016 •

edited

Loading