-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support no_std
with and without alloc
, various other improvements
#1
Conversation
Features should be referenced by docs.rs now, a couple more keywords are added, and a link has been fixed.
There should be no breaking changes with the changes I've made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to pass the sniff test; I should hopefully be able to merge and publish this more properly later this week. If I don't, please feel free to ping me.
@@ -1,19 +1,28 @@ | |||
[package] | |||
name = "cstr8" | |||
version = "0.1.2" | |||
version = "0.1.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With no_std support added, I think it's viable to cut a 1.0 release. Thoughts?
I'd release a semver-hack 0.1.x re-export of ^1 ofc. It'd be nice to have this library at 1.0 so it's easier to justify its usage. Especially with c""
literals expected to stabilize in 1.77.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial look says "yes, 1.0 is good". I'll do a more thorough review soon. I have an idea for improving this macro further to accept &str
, &[u8]
, &CStr
, and [u8; N]
input, though it should be a non-breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat interestingly, I was working with / writing about macro specialization earlier today, and built out the (annoying) machinery required to do autoderef specialization in a const
context (i.e. where you can't directly call trait methods). I don't think I'll need it anytime soon (so probably won't implement it myself) but would be happy to review/merge an implementation. Here's the playground where I got const
autoderef specialization working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice playground, always glad to see more type inspection and const
sorcery. #3 is much simpler than that; it's just many const fn
with the same name in distinct impl
s. I wasn't planning on exposing a trait bound for valid input types to the macro or for it to be extensible, so I don't think it needs to be more complex.
Upon further review, my primary concern with CStr8
stabilization is that it exposes that it is backed by a str
, via its Deref
and as_
methods, instead of being backed by a CStr
. In contrast, CString8
is backed by a CString
(which is currently backed by a Box<[u8]>
).
That's a fine library choice to make, but it should be seriously reviewed. Is it more valuable to have these cheap impls, or to have API & ABI compatibility with CStr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo CStr8
is a bit more "str
but nul terminated" than it is "CStr
but UTF-8". It also isolates CStr8
from if &CStr
ever manages to become a thin pointer. Native Rust code deals significantly more in str
than CStr
(being able to subslice is a huge benefit missed with nul-terminated strings) so it makes sense to privilege the Rust native choice instead of the FFI one.
The heap layout of both is identical, so it's fine that both as_str()
and as_c_str()
exist and both can exist whether CStr8
wraps str
or CStr
. That CStr8
uses str
but CString8
uses CString
is more an accident of convenience than deliberate choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the decision, though I'll admit that it is a little disappointing from an optimization perspective. If you're writing code that receives a raw pointer from C, you must perform strlen
in order to unsafe
ly convert to CStr8
even in a thin-&CStr
future. This seriously hampers what I can do safely and efficiently with UTF-8 c-strings long-term and sort of defeats my use case for the crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few final notes you might find relevant:
- For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use
&str
.CStr8
could do astrlen
on deref if we really wanted it to be thin. But an as-needed conversion to&str
risks making apparently linear code accidentally quadratic.- I don't see the use case for threading FFI-created nul-terminated strings around (without ever inspecting them as
&str
1) that benefits from using a safe mapping for Cchar8_t[restrict *]
. - My use case is primarily passing
&CStr8
created from Rust to FFI, so having fat pointers on the Rust side isn't problematic. This isn't a PDP11 where avoiding passing the extra state is desirable, and I don't expect Rust to store enough&CStr8
for it to add up (and what the other side of FFI stores is irrelevant to our use).
- I have some doubts that
&std::ffi::CStr
will actually become thin, despite the intent to leave that option open. Adding support for “?Sized + DynSized
” types without breaking the world has proven difficult enough without the actually API-breaking change to makeCStr
notMetaSized
. - In theory, my crate erasable enables using
Thin<&CStr8>
as a thin pointer even with&CStr8
being fat, for anyP<T>
which can be converted to/from a thin pointer. I don't currently actually implement the requisite trait, and erasable is due a refactor to take advantage of newer Rust capabilities, but that's the way I prefer having thin slice pointers, making the thin/fatten cost more evident.
I'm hoping to do the erasable refactor soonish so maybe that works for your use case.
Footnotes
-
If you do create
&str
, doing thestrlen
once up front is IMHO preferable, to eliminate the risk of causing accidentally quadratic code (same link). ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My "mediocre combination of both worlds" solution is to introduce a new intermediate type NulStr
. CStr8
would directly wrap CStr
and provide to_str
and to_nul_str
methods. NulStr
wraps str
and derefs to it for cheap string operations, so a rename of what CStr8
is today. That way, you can preserve the "known to end in nul" state while also caching the length, allowing users to only do strlen
in the exact places where it is needed.
But I can understand not wanting another string type 😄
For Rust usage, the UTF-8 guarantee isn't particularly useful without the ability to get and use
&str
You could say the same about &CStr
and accessing the bytes via &[u8]
. If this library were consistent with CStr
, it would provide a to_str
that performs the strlen
, and not eagerly deref to str
. Any operations that don't require knowing the length up-front could be reimplemented to not need to convert to &str
first.
I'm hoping to do the erasable refactor soonish so maybe that works for your use case
It's a technically workable solution for my use case, though is less ergonomic than ideal. What's missing other than trait impls is unsafe fn thin_from_ptr<'a>(p: *const i8) -> Thin<&'a CStr8>
- a way to construct a Thin<&CStr8>
from a raw pointer without going through CStr8::from_ptr
/strlen
first. That'd be useful to have for ffi::CStr
too!
If it ends up being a problem, the plan is to give utf8-cstr
some TLC. I'll probably do so anyways because I think a "CStr that is UTF-8" type should exist as a dedicated type. This crate, despite its name, is distinctly not that but rather a "str that is nul-terminated".
have some doubts that &std::ffi::CStr will actually become thin, despite the intent to leave that option open
Agree to disagree. extern type
and its impacts like thin-&CStr
are useful for so many cases that I believe its landing is inevitable and long-coming (like GATs). My hope is that post-monomorphization errors are seen as the more acceptable cost than additional rarely-used traits, panicking in size_of_val
, or refusing to stabilize. It'd be a perfect-is-the-enemy-of-good decision like allowing both panic
in const
and associated const
in traits despite the risk to compile guarantees for generic code. And with the improvements to post-monomorphization errors for inline-const, this kind of error is becoming more and more acceptable.
actually API-breaking change to make CStr not MetaSized
How is that an API-breaking change exactly? Pointer metadata isn't stable yet.
Published as v0.1.3 🎉 |
} | ||
}; | ||
($s:expr) => {{ | ||
const CSTR: $crate::__internal_unstable::CStr8Array< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the reason I renamed the const
item back to __CSTR
is that while local binding names are hygienic in macro_rules!
, item names aren't. Thus using __CSTR
as the name avoids causing issues when the user has a CSTR
name already in scope (and potentially uses it in $s
). Double underscore doesn't prevent such issues, but it makes them significantly less likely to happen accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right, I was depending on old (probably always incorrect) info: https://danielkeep.github.io/tlborm/book/mbe-min-hygiene.html
See commits for further details