Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support no_std with and without alloc, various other improvements #1

Merged
merged 8 commits into from
Mar 3, 2024

Conversation

kupiakos
Copy link
Contributor

See commits for further details

Features should be referenced by docs.rs now, a couple more keywords
are added, and a link has been fixed.
There should be no breaking changes with the changes I've made.
Copy link
Owner

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to pass the sniff test; I should hopefully be able to merge and publish this more properly later this week. If I don't, please feel free to ping me.

src/macros.rs Show resolved Hide resolved
.github/workflows/ci.yaml Show resolved Hide resolved
@@ -1,19 +1,28 @@
[package]
name = "cstr8"
version = "0.1.2"
version = "0.1.3"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With no_std support added, I think it's viable to cut a 1.0 release. Thoughts?

I'd release a semver-hack 0.1.x re-export of ^1 ofc. It'd be nice to have this library at 1.0 so it's easier to justify its usage. Especially with c"" literals expected to stabilize in 1.77.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial look says "yes, 1.0 is good". I'll do a more thorough review soon. I have an idea for improving this macro further to accept &str, &[u8], &CStr, and [u8; N] input, though it should be a non-breaking change.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat interestingly, I was working with / writing about macro specialization earlier today, and built out the (annoying) machinery required to do autoderef specialization in a const context (i.e. where you can't directly call trait methods). I don't think I'll need it anytime soon (so probably won't implement it myself) but would be happy to review/merge an implementation. Here's the playground where I got const autoderef specialization working.

Copy link
Contributor Author

@kupiakos kupiakos Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice playground, always glad to see more type inspection and const sorcery. #3 is much simpler than that; it's just many const fn with the same name in distinct impls. I wasn't planning on exposing a trait bound for valid input types to the macro or for it to be extensible, so I don't think it needs to be more complex.

Upon further review, my primary concern with CStr8 stabilization is that it exposes that it is backed by a str, via its Deref and as_ methods, instead of being backed by a CStr. In contrast, CString8 is backed by a CString (which is currently backed by a Box<[u8]>).

That's a fine library choice to make, but it should be seriously reviewed. Is it more valuable to have these cheap impls, or to have API & ABI compatibility with CStr?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo CStr8 is a bit more "str but nul terminated" than it is "CStr but UTF-8". It also isolates CStr8 from if &CStr ever manages to become a thin pointer. Native Rust code deals significantly more in str than CStr (being able to subslice is a huge benefit missed with nul-terminated strings) so it makes sense to privilege the Rust native choice instead of the FFI one.

The heap layout of both is identical, so it's fine that both as_str() and as_c_str() exist and both can exist whether CStr8 wraps str or CStr. That CStr8 uses str but CString8 uses CString is more an accident of convenience than deliberate choice.

Copy link
Contributor Author

@kupiakos kupiakos Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the decision, though I'll admit that it is a little disappointing from an optimization perspective. If you're writing code that receives a raw pointer from C, you must perform strlen in order to unsafely convert to CStr8 even in a thin-&CStr future. This seriously hampers what I can do safely and efficiently with UTF-8 c-strings long-term and sort of defeats my use case for the crate.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few final notes you might find relevant:

  • For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use &str.
    • CStr8 could do a strlen on deref if we really wanted it to be thin. But an as-needed conversion to &str risks making apparently linear code accidentally quadratic.
    • I don't see the use case for threading FFI-created nul-terminated strings around (without ever inspecting them as &str1) that benefits from using a safe mapping for C char8_t[restrict *].
    • My use case is primarily passing &CStr8 created from Rust to FFI, so having fat pointers on the Rust side isn't problematic. This isn't a PDP11 where avoiding passing the extra state is desirable, and I don't expect Rust to store enough &CStr8 for it to add up (and what the other side of FFI stores is irrelevant to our use).
  • I have some doubts that &std::ffi::CStr will actually become thin, despite the intent to leave that option open. Adding support for “?Sized + DynSized” types without breaking the world has proven difficult enough without the actually API-breaking change to make CStr not MetaSized.
  • In theory, my crate erasable enables using Thin<&CStr8> as a thin pointer even with &CStr8 being fat, for any P<T> which can be converted to/from a thin pointer. I don't currently actually implement the requisite trait, and erasable is due a refactor to take advantage of newer Rust capabilities, but that's the way I prefer having thin slice pointers, making the thin/fatten cost more evident.

I'm hoping to do the erasable refactor soonish so maybe that works for your use case.

Footnotes

  1. If you do create &str, doing the strlen once up front is IMHO preferable, to eliminate the risk of causing accidentally quadratic code (same link).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My "mediocre combination of both worlds" solution is to introduce a new intermediate type NulStr. CStr8 would directly wrap CStr and provide to_str and to_nul_str methods. NulStr wraps str and derefs to it for cheap string operations, so a rename of what CStr8 is today. That way, you can preserve the "known to end in nul" state while also caching the length, allowing users to only do strlen in the exact places where it is needed.

But I can understand not wanting another string type 😄

For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use &str

You could say the same about &CStr and accessing the bytes via &[u8]. If this library were consistent with CStr, it would provide a to_str that performs the strlen, and not eagerly deref to str. Any operations that don't require knowing the length up-front could be reimplemented to not need to convert to &str first.

I'm hoping to do the erasable refactor soonish so maybe that works for your use case

It's a technically workable solution for my use case, though is less ergonomic than ideal. What's missing other than trait impls is unsafe fn thin_from_ptr<'a>(p: *const i8) -> Thin<&'a CStr8> - a way to construct a Thin<&CStr8> from a raw pointer without going through CStr8::from_ptr/strlen first. That'd be useful to have for ffi::CStr too!

If it ends up being a problem, the plan is to give utf8-cstr some TLC. I'll probably do so anyways because I think a "CStr that is UTF-8" type should exist as a dedicated type. This crate, despite its name, is distinctly not that but rather a "str that is nul-terminated".

have some doubts that &std::ffi::CStr will actually become thin, despite the intent to leave that option open

Agree to disagree. extern type and its impacts like thin-&CStr are useful for so many cases that I believe its landing is inevitable and long-coming (like GATs). My hope is that post-monomorphization errors are seen as the more acceptable cost than additional rarely-used traits, panicking in size_of_val, or refusing to stabilize. It'd be a perfect-is-the-enemy-of-good decision like allowing both panic in const and associated const in traits despite the risk to compile guarantees for generic code. And with the improvements to post-monomorphization errors for inline-const, this kind of error is becoming more and more acceptable.

actually API-breaking change to make CStr not MetaSized

How is that an API-breaking change exactly? Pointer metadata isn't stable yet.

@CAD97 CAD97 merged commit 707534c into CAD97:main Mar 3, 2024
3 checks passed
@CAD97
Copy link
Owner

CAD97 commented Mar 3, 2024

Published as v0.1.3 🎉

https://github.com/CAD97/cstr8/releases/tag/v0.1.3

https://crates.io/crates/cstr8/0.1.3

}
};
($s:expr) => {{
const CSTR: $crate::__internal_unstable::CStr8Array<
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the reason I renamed the const item back to __CSTR is that while local binding names are hygienic in macro_rules!, item names aren't. Thus using __CSTR as the name avoids causing issues when the user has a CSTR name already in scope (and potentially uses it in $s). Double underscore doesn't prevent such issues, but it makes them significantly less likely to happen accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right, I was depending on old (probably always incorrect) info: https://danielkeep.github.io/tlborm/book/mbe-min-hygiene.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants