Support `no_std` with and without `alloc`, various other improvements #1

kupiakos · 2024-02-19T01:37:51Z

See commits for further details

Features should be referenced by docs.rs now, a couple more keywords are added, and a link has been fixed.

There should be no breaking changes with the changes I've made.

CAD97

Looks to pass the sniff test; I should hopefully be able to merge and publish this more properly later this week. If I don't, please feel free to ping me.

src/macros.rs

.github/workflows/ci.yaml

CAD97 · 2024-02-20T01:09:48Z

Cargo.toml

@@ -1,19 +1,28 @@
 [package]
 name = "cstr8"
-version = "0.1.2"
+version = "0.1.3"


With no_std support added, I think it's viable to cut a 1.0 release. Thoughts?

I'd release a semver-hack 0.1.x re-export of ^1 ofc. It'd be nice to have this library at 1.0 so it's easier to justify its usage. Especially with c"" literals expected to stabilize in 1.77.

My initial look says "yes, 1.0 is good". I'll do a more thorough review soon. I have an idea for improving this macro further to accept &str, &[u8], &CStr, and [u8; N] input, though it should be a non-breaking change.

Somewhat interestingly, I was working with / writing about macro specialization earlier today, and built out the (annoying) machinery required to do autoderef specialization in a const context (i.e. where you can't directly call trait methods). I don't think I'll need it anytime soon (so probably won't implement it myself) but would be happy to review/merge an implementation. Here's the playground where I got const autoderef specialization working.

Nice playground, always glad to see more type inspection and const sorcery. #3 is much simpler than that; it's just many const fn with the same name in distinct impls. I wasn't planning on exposing a trait bound for valid input types to the macro or for it to be extensible, so I don't think it needs to be more complex.

Upon further review, my primary concern with CStr8 stabilization is that it exposes that it is backed by a str, via its Deref and as_ methods, instead of being backed by a CStr. In contrast, CString8 is backed by a CString (which is currently backed by a Box<[u8]>).

That's a fine library choice to make, but it should be seriously reviewed. Is it more valuable to have these cheap impls, or to have API & ABI compatibility with CStr?

Imo CStr8 is a bit more "str but nul terminated" than it is "CStr but UTF-8". It also isolates CStr8 from if &CStr ever manages to become a thin pointer. Native Rust code deals significantly more in str than CStr (being able to subslice is a huge benefit missed with nul-terminated strings) so it makes sense to privilege the Rust native choice instead of the FFI one.

The heap layout of both is identical, so it's fine that both as_str() and as_c_str() exist and both can exist whether CStr8 wraps str or CStr. That CStr8 uses str but CString8 uses CString is more an accident of convenience than deliberate choice.

I see the decision, though I'll admit that it is a little disappointing from an optimization perspective. If you're writing code that receives a raw pointer from C, you must perform strlen in order to unsafely convert to CStr8 even in a thin-&CStr future. This seriously hampers what I can do safely and efficiently with UTF-8 c-strings long-term and sort of defeats my use case for the crate.

A few final notes you might find relevant:

For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use &str.

CStr8 could do a strlen on deref if we really wanted it to be thin. But an as-needed conversion to &str risks making apparently linear code accidentally quadratic.

I don't see the use case for threading FFI-created nul-terminated strings around (without ever inspecting them as &str¹) that benefits from using a safe mapping for C char8_t[restrict *].

My use case is primarily passing &CStr8 created from Rust to FFI, so having fat pointers on the Rust side isn't problematic. This isn't a PDP11 where avoiding passing the extra state is desirable, and I don't expect Rust to store enough &CStr8 for it to add up (and what the other side of FFI stores is irrelevant to our use).

I have some doubts that &std::ffi::CStr will actually become thin, despite the intent to leave that option open. Adding support for “?Sized + DynSized” types without breaking the world has proven difficult enough without the actually API-breaking change to make CStr not MetaSized.

In theory, my crate erasable enables using Thin<&CStr8> as a thin pointer even with &CStr8 being fat, for any P<T> which can be converted to/from a thin pointer. I don't currently actually implement the requisite trait, and erasable is due a refactor to take advantage of newer Rust capabilities, but that's the way I prefer having thin slice pointers, making the thin/fatten cost more evident.

I'm hoping to do the erasable refactor soonish so maybe that works for your use case.

Footnotes

If you do create &str, doing the strlen once up front is IMHO preferable, to eliminate the risk of causing accidentally quadratic code (same link). ↩

My "mediocre combination of both worlds" solution is to introduce a new intermediate type NulStr. CStr8 would directly wrap CStr and provide to_str and to_nul_str methods. NulStr wraps str and derefs to it for cheap string operations, so a rename of what CStr8 is today. That way, you can preserve the "known to end in nul" state while also caching the length, allowing users to only do strlen in the exact places where it is needed.

But I can understand not wanting another string type 😄

For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use &str

You could say the same about &CStr and accessing the bytes via &[u8]. If this library were consistent with CStr, it would provide a to_str that performs the strlen, and not eagerly deref to str. Any operations that don't require knowing the length up-front could be reimplemented to not need to convert to &str first.

I'm hoping to do the erasable refactor soonish so maybe that works for your use case

It's a technically workable solution for my use case, though is less ergonomic than ideal. What's missing other than trait impls is unsafe fn thin_from_ptr<'a>(p: *const i8) -> Thin<&'a CStr8> - a way to construct a Thin<&CStr8> from a raw pointer without going through CStr8::from_ptr/strlen first. That'd be useful to have for ffi::CStr too!

If it ends up being a problem, the plan is to give utf8-cstr some TLC. I'll probably do so anyways because I think a "CStr that is UTF-8" type should exist as a dedicated type. This crate, despite its name, is distinctly not that but rather a "str that is nul-terminated".

have some doubts that &std::ffi::CStr will actually become thin, despite the intent to leave that option open

Agree to disagree. extern type and its impacts like thin-&CStr are useful for so many cases that I believe its landing is inevitable and long-coming (like GATs). My hope is that post-monomorphization errors are seen as the more acceptable cost than additional rarely-used traits, panicking in size_of_val, or refusing to stabilize. It'd be a perfect-is-the-enemy-of-good decision like allowing both panic in const and associated const in traits despite the risk to compile guarantees for generic code. And with the improvements to post-monomorphization errors for inline-const, this kind of error is becoming more and more acceptable.

actually API-breaking change to make CStr not MetaSized

How is that an API-breaking change exactly? Pointer metadata isn't stable yet.

CAD97 · 2024-03-03T03:53:30Z

Published as v0.1.3 🎉

https://github.com/CAD97/cstr8/releases/tag/v0.1.3

https://crates.io/crates/cstr8/0.1.3

CAD97 · 2024-03-03T22:39:45Z

src/macros.rs

-        }
-    };
+    ($s:expr) => {{
+        const CSTR: $crate::__internal_unstable::CStr8Array<


FYI, the reason I renamed the const item back to __CSTR is that while local binding names are hygienic in macro_rules!, item names aren't. Thus using __CSTR as the name avoids causing issues when the user has a CSTR name already in scope (and potentially uses it in $s). Double underscore doesn't prevent such issues, but it makes them significantly less likely to happen accidentally.

Ah, you're right, I was depending on old (probably always incorrect) info: https://danielkeep.github.io/tlborm/book/mbe-min-hygiene.html

kupiakos added 8 commits February 18, 2024 16:30

Support no_std and remove thiserror dependency

6a09437

Doc improvements

6d3bfba

Features should be referenced by docs.rs now, a couple more keywords are added, and a link has been fixed.

Simplify the cstr8! macro

a5fe1e9

Fix no_std! alloc builds

61752dd

Fix clippy::just_underscores_and_digits

134c1fb

Improve CI

9ed8810

Add shields to README

36e5fbb

Bump to version 0.1.3

707534c

There should be no breaking changes with the changes I've made.

CAD97 reviewed Feb 20, 2024

View reviewed changes

CAD97 merged commit 707534c into CAD97:main Mar 3, 2024
3 checks passed

CAD97 reviewed Mar 3, 2024

View reviewed changes

kupiakos mentioned this pull request Apr 25, 2024

Integrate with erasable #6

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support `no_std` with and without `alloc`, various other improvements #1

Support `no_std` with and without `alloc`, various other improvements #1

kupiakos commented Feb 19, 2024

CAD97 left a comment

CAD97 Feb 20, 2024

kupiakos Mar 3, 2024

CAD97 Mar 3, 2024

kupiakos Mar 3, 2024 •

edited

Loading

CAD97 Mar 4, 2024

kupiakos Apr 17, 2024 •

edited

Loading

CAD97 Apr 24, 2024

kupiakos Apr 25, 2024

CAD97 commented Mar 3, 2024

CAD97 Mar 3, 2024

kupiakos Mar 3, 2024

Support no_std with and without alloc, various other improvements #1

Support no_std with and without alloc, various other improvements #1

Conversation

kupiakos commented Feb 19, 2024

CAD97 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kupiakos Mar 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kupiakos Apr 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

CAD97 commented Mar 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Support `no_std` with and without `alloc`, various other improvements #1

Support `no_std` with and without `alloc`, various other improvements #1

kupiakos Mar 3, 2024 •

edited

Loading

kupiakos Apr 17, 2024 •

edited

Loading