RFC: c"…" string literals #3348

Merged: 10 commits merged on Dec 14, 2022

Conversation

m-ou-se
Member

@m-ou-se m-ou-se commented Nov 15, 2022

@m-ou-se m-ou-se added T-lang Relevant to the language team, which will review and decide on the RFC. A-syntax Syntax related proposals & ideas labels Nov 15, 2022
@m-ou-se m-ou-se added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Nov 15, 2022
@m-ou-se
Member Author

m-ou-se commented Nov 15, 2022

Three weeks ago, the lang team said they would be interested in potentially doing this in the future. So here's an RFC. :)

@clarfonthey

I'm on board. I'd even consider that a future extension might be to allow os"..." string literals, but that seems probably more iffy since it'd be the first case of a language item not being available in no_std environments. (I think?)

One other potential thing to think about is whether c"..." string patterns should be allowed. That is, completely outside the realm of constant patterns, whether c"..." would be considered a valid pattern for macros, etc.


Accepted escape codes: [Quote](https://doc.rust-lang.org/reference/tokens.html#quote-escapes) & [Unicode](https://doc.rust-lang.org/reference/tokens.html#unicode-escapes) & [Byte](https://doc.rust-lang.org/reference/tokens.html#byte-escapes).

Unicode characters are accepted and encoded as UTF-8. That is, `c"🦀"`, `c"\u{1F980}"` and `c"\xf0\x9f\xa6\x80"` are all accepted and equivalent.
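
The equivalence stated above can be checked with literals that already exist. The following is a minimal sketch that assumes only stable `std` APIs. The helper names `crab_spellings` and `crab_cstr` are made up for illustration, and `c"…"` itself is deliberately not used, since it needs a toolchain that implements this RFC.

```rust
use std::ffi::CStr;

/// Returns the bytes produced by three spellings of the crab emoji.
/// (`crab_spellings` is a hypothetical helper, not part of the RFC.)
fn crab_spellings() -> [&'static [u8]; 3] {
    [
        "🦀".as_bytes(),        // literal character, stored as UTF-8
        "\u{1F980}".as_bytes(), // Unicode escape for the same scalar value
        b"\xf0\x9f\xa6\x80",    // the same four UTF-8 bytes as byte escapes
    ]
}

/// Builds the nul-terminated value that `c"🦀"` is described as producing.
/// (`crab_cstr` is a hypothetical helper as well.)
fn crab_cstr() -> &'static CStr {
    // The checked constructor rejects interior nul bytes and a missing terminator.
    CStr::from_bytes_with_nul(b"\xf0\x9f\xa6\x80\0").expect("no interior nul")
}
```

All three spellings yield the same four bytes, and the `CStr` adds exactly one trailing nul on top of them.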
Member

I wish byte string literals had this support too, so big 👍 on this!

Member Author

It might be worth proposing that in a separate RFC. That would also resolve one unresolved question of concat_bytes, if we accept that mixing UTF-8 and non-UTF-8 in byte strings is okay.

Member Author

@m-ou-se m-ou-se Nov 15, 2022

Wrote an RFC for that: #3349

@m-ou-se
Member Author

m-ou-se commented Nov 15, 2022

> I'd even consider that a future extension might be to allow os"..." string literals

I was hoping to make things like os!"..." possible without extending the language for each prefix: #3267. But that proposal turned out to be quite controversial and was rejected.

An alternative would be to allow literals like "…" to implicitly convert to more than just &str (just like how 123 can be u32 or i64, etc. etc.). Some kind of const FromLiteral trait or something, once we have const traits. Then "…" could implicitly become a &CStr, and 123 a BigNum, etc. Not sure how exactly that feature would work though, but I'll mention it in the alternatives section.

@afetisov

One concern I have is that if single-letter prefixes become common, extending the language with new prefixes can become confusing. Although, if br and cr are treated as fixed literals rather than composition, this may be a non-issue.

Co-authored-by: konsumlamm <[email protected]>
@nagisa
Member

nagisa commented Nov 15, 2022

I have two rhetorical questions with regard to the RFC text:

  1. What does the dependence of this feature on the standard library types mean for #[no_core] crates? Would it be possible to do something that would still make #[no_core] crates using c"" literals work out of the box?
  2. What does defaulting to UTF-8 encoding mean when interacting with C source that targets non-UTF-8 locales (let's say the linked-in C code is encoded in JIS, and the environment is also set up for JIS)? How does that interact with whatever reasonable assumptions a developer might make about c""?

@m-ou-se
Member Author

m-ou-se commented Nov 15, 2022

> 1. What does the dependence of this feature on the standard library types mean for #[no_core] crates? Would it be possible to do something that would still make #[no_core] crates using c"" literals work out of the box?

Do we even support no_core? I suppose it just means that they'd have to define the CStr lang item if they want to use c"" syntax. I think we could make not the type but a constructor function the lang item, such that they can decide themselves what to do with the [u8; N]. (In core, that'd basically be CStr::from_bytes_with_nul_unchecked.)
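
To make the constructor idea concrete, here is a rough sketch of what a `c"hello"` literal could lower to if the compiler emitted a byte array and passed it to a constructor function. The names `HELLO_BYTES` and `hello` are hypothetical stand-ins, and this is an illustration under that assumption, not the actual compiler lowering.

```rust
use std::ffi::CStr;

// Hypothetical stand-in for the data the compiler would emit for c"hello":
// the literal's bytes followed by a single nul terminator.
static HELLO_BYTES: [u8; 6] = *b"hello\0";

/// Hypothetical stand-in for the constructor function discussed above.
fn hello() -> &'static CStr {
    // SAFETY: HELLO_BYTES ends with a nul byte and contains no interior nul.
    unsafe { CStr::from_bytes_with_nul_unchecked(&HELLO_BYTES) }
}
```

The unchecked constructor is only sound here because the compiler would have validated the literal already; hand-written code would normally prefer `CStr::from_bytes_with_nul`.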

> 2. What does defaulting to UTF-8 encoding mean when interacting with C source that targets non-UTF-8 locales (let's say the linked-in C code is encoded in JIS, and the environment is also set up for JIS)? How does that interact with whatever reasonable assumptions a developer might make about c""?

The exact same as would happen when using regular string literals. For example, libc::puts("我名字叫玛拉。".as_ptr() as _) is already possible. It'll just pass the string as UTF-8 encoded bytes. 🤷‍♀️
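
As a small illustration of "passed as UTF-8 encoded bytes", the sketch below counts what a C function would receive for that string. It uses `CString::new` from `std` to add the nul terminator, because a plain Rust string literal does not carry one (so the one-liner quoted above illustrates the encoding point rather than a safe call). The helper name `utf8_sizes` is made up for illustration.

```rust
use std::ffi::CString;

/// Returns (character count, UTF-8 byte count, byte count including the nul).
/// (`utf8_sizes` is a hypothetical helper.)
fn utf8_sizes(text: &str) -> (usize, usize, usize) {
    // CString::new copies the bytes, checks for interior nuls, and appends a terminator.
    let owned = CString::new(text).expect("text must not contain an interior nul");
    (
        text.chars().count(),
        owned.as_bytes().len(),
        owned.as_bytes_with_nul().len(),
    )
}
```

For `"我名字叫玛拉。"` this gives 7 characters, 21 UTF-8 bytes, and 22 bytes with the terminator. C code that expects a JIS encoding would see those same 21 bytes and interpret them differently.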


- Also add `c'…'` C character literals? (`u8`, `i8`, `c_char`, or something more flexible?)

- Should we make `&CStr` a thin pointer before stabilizing this? (If so, how?)
Member

I think this should be a blocker on stabilization, yeah.

Member

@Kixiron Kixiron Nov 15, 2022

I don't see how this feature is blocked by that at all really. It produces an &'static CStr regardless of what &CStr itself is made of.

Member

@Kixiron To be clear, I think considering that question should be a blocker for stabilization.

Given that a major use case of this will be FFI, it seems important that we have a simple, not-error-prone way of passing a C string to C functions. If we decide that &CStr wasn't that mechanism, then we should decide what that mechanism should be, and make sure c"..." works well with that.
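
For readers unfamiliar with the thin-pointer question, the sketch below compares the size of `&CStr` with the raw pointer that C functions take, and shows the `CStr::as_ptr` conversion that FFI callers use today. The helper names `pointer_sizes` and `as_c_pointer` are hypothetical. At the time of this discussion `&CStr` was a fat pointer (pointer plus length), but the standard library documents that representation as an implementation detail, so the sketch does not rely on it.

```rust
use std::ffi::{c_char, CStr};
use std::mem::size_of;

/// Returns (size of `&CStr`, size of `*const c_char`) in bytes.
/// (`pointer_sizes` is a hypothetical helper.)
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&CStr>(), size_of::<*const c_char>())
}

/// Converts a `&CStr` to the thin pointer that C functions expect.
/// (`as_c_pointer` is a hypothetical helper around `CStr::as_ptr`.)
fn as_c_pointer(s: &CStr) -> *const c_char {
    // The returned pointer is only valid while `s` is alive.
    s.as_ptr()
}
```

If `&CStr` were itself thin, the explicit `as_ptr` step at the FFI boundary could in principle go away, which is why the question matters for how `c"…"` is used in practice.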

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Nov 29, 2022
@rfcbot
Collaborator

rfcbot commented Nov 29, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Nov 29, 2022
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Dec 9, 2022
@rfcbot
Collaborator

rfcbot commented Dec 9, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@CAD97

CAD97 commented Dec 13, 2022

Just a minor additional note: I want to second that even if c"…" and bc"…" both create &CStr, having the former carry a guarantee of well-formed UTF-8 is beneficial, because readers of the code then know that the former is UTF-8 encoded (and that the latter is probably intended to contain non-UTF-8 data). This includes procedural macros, which can see the prefix used and rely on the UTF-8 guarantee for third-party guaranteed-UTF-8 CStr variants such as cstr8::CStr8 (disclaimer: my own crate) and for interop with C++ std::u8string/std::u8string_view.

Just having c"…" be &CStr and allowing arbitrary non-nul bytes is probably the more practical choice. The proc macro that would have used the UTF-8 guarantee can just as easily take a normal string literal and convert it to a c"…" literal internally, as it would today (but benefiting from the automatic interior-nul checking).

(Polymorphic string literals are probably the ideal long-term position, but having c", c8", c16", u", u8", u16", u32", char" (etc. or w/e) prefixes to explicitly disambiguate which string type is meant, among whatever string types this theoretical future std provides, is still reasonable and a good idea. (Super explicit: not proposing any of these at this time.))

However, as a data point, the windows crate provides c!("…") as just concat!("…", "\0").as_ptr(), and despite the lack of interior-nul checking, the guaranteed-UTF-8 is useful. (They also currently provide w! for the same thing but for UTF-16, and h! for HSTRING.) Asking the team working on the windows crate how they'd ideally like to utilize c"…" is probably worth doing sometime before stabilization. (Not to prioritize windows over Linux or macOS; it's just what I'm familiar with. It's probably worth asking the Rust-for-Linux and Android people for their input as well.)
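
The trade-off described above can be sketched as follows. The macro `c_ptr!` is a hypothetical stand-in modelled on the `concat!("…", "\0").as_ptr()` description, not the `windows` crate's actual source, and `bytes_seen_by_c` is a made-up helper that reads a pointer back the way a C function would.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical macro modelled on the description above: append a nul at
// compile time and hand out a raw pointer. No interior-nul check happens.
macro_rules! c_ptr {
    ($s:literal) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

/// Reads the bytes a C function would see when handed `ptr`.
/// (`bytes_seen_by_c` is a hypothetical helper.)
///
/// # Safety
/// `ptr` must point to a nul-terminated byte sequence that outlives the call.
unsafe fn bytes_seen_by_c(ptr: *const c_char) -> Vec<u8> {
    // CStr::from_ptr scans forward to the first nul byte, as C string functions do.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().to_vec()
}
```

With this macro, `c_ptr!("ab\0cd")` compiles without complaint and C code sees only `ab`, whereas the checked `CStr::from_bytes_with_nul(b"ab\0cd\0")` returns an error. That silent truncation is what automatic interior-nul checking in `c"…"` is meant to rule out.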

@tmandry tmandry merged commit 873890e into rust-lang:master Dec 14, 2022
@tmandry
Member

tmandry commented Dec 14, 2022

Huzzah! The @rust-lang/lang team has decided to accept this RFC.

To track further discussion, subscribe to the tracking issue here:
rust-lang/rust#105723

@m-ou-se m-ou-se deleted the c-str-literal branch January 3, 2023 13:57
Manishearth added a commit to Manishearth/rust that referenced this pull request May 4, 2023
…r-errors

Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: rust-lang#105723
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 4, 2023
…r-errors

Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: rust-lang#105723
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request May 5, 2023
…r-errors

Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: rust-lang#105723
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request May 20, 2023
Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: #105723
thomcc pushed a commit to tcdi/postgrestd that referenced this pull request Jul 18, 2023
Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: #105723
Labels
A-syntax Syntax related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce