CStr::from_bytes_with_nul #30614

arcnmx · 2015-12-29T18:57:04Z

I'm a bit iffy on the return type, but Result would also be a bit weird... The two error cases are Unterminated and InteriorNul(usize).

I considered from_chars(&[c_char]) but this meshes better with as_bytes_with_nul() and Rust code in general.

Should there also be a corresponding unsafe from_bytes_unchecked variant?

rust-highfive · 2015-12-29T18:57:17Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

brson · 2015-12-29T19:15:28Z

cc @rust-lang/libs

nagisa · 2015-12-29T19:30:05Z

See rust-lang/rfcs#1264

arcnmx · 2015-12-29T19:51:40Z

@nagisa aha! I do feel like an explicit error type is probably better.

A Cow conversion would actually be quite nice too, but that one maybe belongs outside the standard library. from_bytes -> Result<Cow<CStr>, usize> seems like a little too much magic to me personally, but I wouldn't be entirely opposed to it. An additional method could work too, but I'd have no idea what to call it... from_bytes_cow is a terrible name.

EDIT: On second thought, I can't imagine a case when you wouldn't know beforehand whether your byte slice is terminated or not, and we already have CString::new for that.

alexcrichton · 2016-01-11T18:08:03Z

cc #20475, another issue that I believe this would handle.

I'm gonna tag this with T-libs so it comes up during normal triage. I'm not so sure about this API personally, could you elaborate on some of the use cases you have in mind?

arcnmx · 2016-01-11T19:10:44Z

Well, I'd break it into...

It's currently "impossible" to create a &CStr from a byte slice. You can use a transmute but that makes assumptions about the memory layout of the CStr type, which may change in the future to a thin pointer. So from_bytes_unchecked is pretty important to have as a basic building block.
from_bytes is the safe checked version of the above. It's the borrowed version of CString::new, which I find to be useful.
It's essentially a lifetime-safe version of from_ptr, so similar usecases apply. Mapping data inside another slice to a &CStr (for example when zero-copy parsing some binary formats) is a concrete one that's come up for me.
It feels nicer than CStr::from_ptr(b"hello\0".as_ptr() as *const _) for literals. Making it const unsafe would be really nice too, though I'm not sure if that's possible to implement in today's rust?

alexcrichton · 2016-01-12T00:31:35Z

It's currently "impossible" to create a &CStr from a byte slice

This is actually somewhat intentional, because the ideal representation of &CStr is just a pointer. The CStr type itself should continue to be unsized, but it's just an implementation detail that the only way to do that today is via [u8]. The intention was that CStr doesn't keep track of the length, and hence it architecturally is indeed impossible to go from &[u8] to &CStr in a "lossless fashion".

I do agree that literal construction is pretty difficult, but that's more along the lines of needs-to-be-a-compiler-plugin than anything else I think to ensure that the nul byte is always at the end. In theory the from_ptr function is a noop (and hence a const candidate), but due to the reasons mentioned above it needs to calculate the length and hence isn't a const candidate.

arcnmx · 2016-01-12T01:42:47Z

This is actually somewhat intentional, because the ideal representation of &CStr is just a pointer. The CStr type itself should continue to be unsized, but it's just an implementation detail that the only way to do that today is via [u8].

Whether the &CStr type encodes the length as part of its type or calculates it when necessary is an implementation detail though! That's exactly why it helps to have that method as a forward-compatible way of converting, since an unsafe transmute won't necessarily work in future Rust versions.

The intention was that CStr doesn't keep track of the length, and hence it architecturally is indeed impossible to go from &[u8] to &CStr in a "lossless fashion".

My view would be that as long as you can reconstruct the &[u8] from the &CStr (even if a length calculation is involved), the conversion can be considered to be lossless and the types can be considered semantically equivalent (plus the extra invariants guaranteed by the &CStr type).

I do agree that literal construction is pretty difficult, but that's more along the lines of needs-to-be-a-compiler-plugin than anything else I think to ensure that the nul byte is always at the end.

Well, if we assume a rustc with a more useful const fn story, we could implement it as a macro like so:

macro_rules! cstr {
    ($s:expr) => {
        unsafe { ::std::ffi::CStr::from_bytes_unchecked(concat!($s, "\0").as_bytes()) }
    }
}

The same could be implemented via from_ptr if we can get rid of the length calculation. It's true that's a discussion not really that related to this PR.

Another thought is that we could make from_ptr not involve a length calculation today, and defer it to when it's needed (if someone calls .len(), .as_bytes() or similar). Simply use 0 length as a sentinel that means the length isn't known. Or -1 as usize. And if you feel super adventurous, update the length with a volatile store (on platforms where that's guaranteed to be atomic) once it's calculated upon a call of one of those methods to avoid the check in the future.

It's true that from_bytes_unchecked would be unnecessary from a performance perspective if from_ptr didn't involve a length calculation, as they'll potentially be equivalent in the future. I still feel like the lifetime tie-in is a nice advantage that makes this a useful API in any case. The safe version that checks for interior and trailing null is mainly for convenience.

Also, another case where this API is useful is when creating C strings on the stack in a [u8; MAXPATHLEN], both for the lifetime borrow checking metadata and length check bypass.

alexcrichton · 2016-01-12T23:17:11Z

I think one part I'm worried about is that both of these assertions may fail

assert_eq!(CStr::from_bytes(slice).unwrap().to_bytes(), slice);
assert_eq!(CStr::from_bytes_unchecked(slice).to_bytes(), slice);

The first line will truncate slice to the first nul byte when calling to_bytes, as will the second. The second also runs the risk of memory unsafety (e.g. reading extra memory) if there's no nul byte.

Out of curiosity, what's the use case of creating a CStr from a &[u8] in the first place? In general Rust uses slices for passing data around, so the &CStr type should only rarely be used at FFI boundaries at most.

arcnmx · 2016-01-12T23:40:14Z

I think one part I'm worried about is that both of these assertions may fail

Ah, do you mean the naming is inconsistent vs .to_bytes_with_nul(), since those assertions will pretty much always fail?

The second also runs the risk of memory unsafety (e.g. reading extra memory) if there's no nul byte.

It's an unsafe method that runs the risk of memory unsafety if you provide it data that doesn't uphold the invariants, that's pretty much the point.

Out of curiosity, what's the use case of creating a CStr from a &[u8] in the first place? In general Rust uses slices for passing data around, so the &CStr type should only rarely be used at FFI boundaries at most.

Mm, FFI is the main place to be using &CStrs. One example out there right now is in nix.

alexcrichton · 2016-01-14T07:12:21Z

Oh I'm not worried about naming inconsistency, it just seems that if you construct a CStr from a slice, when you convert it back I would expect the same slice to come back. In this implementation, however, that's not always the case.

arcnmx · 2016-01-15T17:56:21Z

Well, if we break it apart...

assert_eq!(CStr::from_bytes(slice).unwrap().to_bytes_with_nul(), slice);

In the current implementation, this will always succeed, if slice has a trailing null byte and no interior bytes. If neither of those are true, the from_bytes/unwrap will fail instead. I'm partial to changing this to only check for a trailing null for performance reasons, which would make it act like you suggest, potentially returning a shorter slice than was originally provided. I'd consider that "defined undefined behaviour", but if you really care about performance that's what the unchecked version is for anyway.

assert_eq!(CStr::from_bytes_unchecked(slice).to_bytes(), slice);

Again, this is unsafe so you have to assert you're maintaining the invariants, or you will indeed get a different slice back, read past allocated memory, etc.

alexcrichton · 2016-01-19T22:52:51Z

Ah yes that's correct, I think I misread the implementation of from_bytes as truncating if a zero were found sooner.

Ok, thinking about this some more, I think my confused stemmed from perhaps misconstruing this PR as sort of reinterpreting a CStr as a "general purpose container of bytes." To me &[u8] is indeed this sort of container, but CStr is rather a general purpose container of non-0 bytes. It seems like the motivation here is a safe API that can convert to and from these containers, which seems reasonable to me.

Along those lines, however, I wonder if perhaps a better name would be from_bytes_with_nul instead of just from_bytes. That would clearly match up with to_bytes_with_nul, and those I believe are indeed the symmetric values here.

arcnmx · 2016-01-19T23:29:00Z

CStr is rather a general purpose container of non-0 bytes

Basically this, and it still also applies if in the future the type is changed to not literally store its length alongside the pointer. Though I'd argue that CStr is a general purpose container of non-0 bytes that ends in a terminating 0 byte and that to_bytes is the wrong default; to_bytes()/to_inner_bytes() would've made more sense IMO, but it's too late to argue that now.

I wonder if perhaps a better name would be from_bytes_with_nul instead of just from_bytes. That would clearly match up with to_bytes_with_nul

Unfortunate naming, but I agree that it would be more consistent/symmetrical. I mean damn, from_bytes_with_nul_unchecked(data) is the same amount of typing as from_ptr(data.as_ptr() as *const _)... Unless from_bytes_unchecked can still sort of apply maybe...

alexcrichton · 2016-01-20T00:12:12Z

Ah well, we'll discuss this at the next libs meeting regardless and I'm sure we'll have some feedback as well :)

alexcrichton · 2016-01-21T18:56:28Z

The libs team discussed this during triage yesterday and the conclusion was that we're ok with these methods so long as they have the with_nul suffix. @arcnmx would you be ok implementing that?

arcnmx · 2016-01-25T18:11:28Z

@alexcrichton like this? ab136bf623e291a51067c70ec6c8a17cab0ac8e8

ranma42 · 2016-01-25T18:40:46Z

src/libstd/ffi/c_str.rs

+    /// ```
+    #[unstable(feature = "cstr_from_bytes", reason = "recently added", issue = "0")]
+    pub fn from_bytes_with_nul<'a>(bytes: &'a [u8]) -> Option<&'a CStr> {
+        if bytes.is_empty() || memchr::memchr(0, &bytes) != Some(bytes.len() - 1) {


Isn't bytes.is_empty() redundant?

EDIT: I see, it's needed to avoid an underflow in bytes.len() - 1

alexcrichton · 2016-01-25T18:55:23Z

Thanks @arcnmx! Could you also hook this up to #31190 (the tracking issue) and add a few tests as well?

kamalmarhubi · 2016-01-26T04:07:56Z

src/libstd/ffi/c_str.rs

+    /// Creates a C string wrapper from a byte slice.
+    ///
+    /// This function will cast the provided `bytes` to a `CStr` wrapper after
+    /// ensuring that it is null terminated but does not contain any interior


"and does not contain"?

kamalmarhubi · 2016-01-26T04:12:37Z

src/libstd/ffi/c_str.rs

+    /// # }
+    /// ```
+    #[unstable(feature = "cstr_from_bytes", reason = "recently added", issue = "0")]
+    pub fn from_bytes_with_nul<'a>(bytes: &'a [u8]) -> Option<&'a CStr> {


does lifetime elision not work here?

arcnmx · 2016-02-23T15:15:12Z

Sorry that took so long...

alexcrichton · 2016-02-23T17:29:37Z

@bors: r+ 71f29cd

bors · 2016-02-23T20:08:55Z

⌛ Testing commit 71f29cd with merge 281f9d8...

I'm a bit iffy on the return type, but `Result` would also be a bit weird... The two error cases are `Unterminated` and `InteriorNul(usize)`. I considered `from_chars(&[c_char])` but this meshes better with `as_bytes_with_nul()` and Rust code in general. Should there also be a corresponding unsafe `from_bytes_unchecked` variant?

bors · 2016-02-23T23:06:24Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-debug-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-linux-cross-opt, auto-linux-musl-64-opt, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-mac-ios-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

rust-highfive assigned alexcrichton Dec 29, 2015

arcnmx force-pushed the cstr-bytes branch 2 times, most recently from 641ac9e to c5d9806 Compare December 29, 2015 19:06

arcnmx force-pushed the cstr-bytes branch from c5d9806 to bed74df Compare December 29, 2015 19:32

arcnmx force-pushed the cstr-bytes branch 4 times, most recently from c9c189e to 9c46d43 Compare December 31, 2015 22:52

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 11, 2016

ranma42 reviewed Jan 25, 2016
View reviewed changes

alexcrichton mentioned this pull request Jan 25, 2016

Tracking issue for CStr::from_bytes_with_nul #31190

Closed

kamalmarhubi reviewed Jan 26, 2016
View reviewed changes

alexcrichton changed the title ~~CStr::from_bytes~~ CStr::from_bytes_with_nul Jan 27, 2016

alexcrichton removed the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 27, 2016

arcnmx force-pushed the cstr-bytes branch from ab136bf to d595152 Compare February 23, 2016 06:36

arcnmx added 3 commits February 23, 2016 01:37

CStr::from_bytes

a70ae2f

Rename CStr::from_bytes to from_bytes_with_nul

16f218b

Link cstr from_bytes to tracking issue

9414c4e

arcnmx force-pushed the cstr-bytes branch from d595152 to c5cd9a7 Compare February 23, 2016 15:12

arcnmx force-pushed the cstr-bytes branch from c5cd9a7 to 71f29cd Compare February 23, 2016 16:51

CStr::from_bytes_with_nul tests

71f29cd

bors merged commit 71f29cd into rust-lang:master Feb 23, 2016

llogiq mentioned this pull request Feb 28, 2016

TWIR 120's CotW + PRs rust-lang/this-week-in-rust#174

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CStr::from_bytes_with_nul #30614

CStr::from_bytes_with_nul #30614

arcnmx commented Dec 29, 2015

rust-highfive commented Dec 29, 2015

brson commented Dec 29, 2015

nagisa commented Dec 29, 2015

arcnmx commented Dec 29, 2015

alexcrichton commented Jan 11, 2016

arcnmx commented Jan 11, 2016

alexcrichton commented Jan 12, 2016

arcnmx commented Jan 12, 2016

alexcrichton commented Jan 12, 2016

arcnmx commented Jan 12, 2016

alexcrichton commented Jan 14, 2016

arcnmx commented Jan 15, 2016

alexcrichton commented Jan 19, 2016

arcnmx commented Jan 19, 2016

alexcrichton commented Jan 20, 2016

alexcrichton commented Jan 21, 2016

arcnmx commented Jan 25, 2016

ranma42 Jan 25, 2016

alexcrichton commented Jan 25, 2016

kamalmarhubi Jan 26, 2016

kamalmarhubi Jan 26, 2016

arcnmx commented Feb 23, 2016

alexcrichton commented Feb 23, 2016

bors commented Feb 23, 2016

bors commented Feb 23, 2016

CStr::from_bytes_with_nul #30614

CStr::from_bytes_with_nul #30614

Conversation

arcnmx commented Dec 29, 2015

rust-highfive commented Dec 29, 2015

brson commented Dec 29, 2015

nagisa commented Dec 29, 2015

arcnmx commented Dec 29, 2015

alexcrichton commented Jan 11, 2016

arcnmx commented Jan 11, 2016

alexcrichton commented Jan 12, 2016

arcnmx commented Jan 12, 2016

alexcrichton commented Jan 12, 2016

arcnmx commented Jan 12, 2016

alexcrichton commented Jan 14, 2016

arcnmx commented Jan 15, 2016

alexcrichton commented Jan 19, 2016

arcnmx commented Jan 19, 2016

alexcrichton commented Jan 20, 2016

alexcrichton commented Jan 21, 2016

arcnmx commented Jan 25, 2016

ranma42 Jan 25, 2016

Choose a reason for hiding this comment

alexcrichton commented Jan 25, 2016

kamalmarhubi Jan 26, 2016

Choose a reason for hiding this comment

kamalmarhubi Jan 26, 2016

Choose a reason for hiding this comment

arcnmx commented Feb 23, 2016

alexcrichton commented Feb 23, 2016

bors commented Feb 23, 2016

bors commented Feb 23, 2016