Create eep-0063.md: Lightweight UTF-8 binary string literals and patterns #46

Closed
wants to merge 2 commits into from

TD5
Contributor

@TD5 TD5 commented Jun 1, 2023

No description provided.

@josevalim
Contributor

Btw, have you considered using u (for utf-8) instead of b? Thoughts?

@okeuday

okeuday commented Jun 1, 2023

@TD5 It is possible to create a bytestring type as:

-type nonempty_bytestring() :: nonempty_list(byte()).
-type bytestring() :: list(byte()).

Adding a bytestring type to Erlang/OTP as part of this would be helpful. If the compiler knew the content was UTF-8, it could use a special type that is separate from bytestring but similar (such as utf8string).

@TD5
Contributor Author

TD5 commented Jun 5, 2023

> Btw, have you considered using u (for utf-8) instead of b? Thoughts?

I hadn't considered it, but it sounds reasonable. I don't myself have a preference. Either could be fine in my view. Why might u be better? Because it implies the specific encoding? Because it aligns with the syntax of other languages with similar features?

@josevalim
Contributor

josevalim commented Jun 5, 2023

No particular reason in isolation, but I think it matters for concepts like interpolation: you need a stronger indicator to know whether you are interpolating a list of bytes or a list of characters, and I believe the u"..." sigil makes the latter clear.
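
To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It uses Python rather than Erlang purely for illustration, and the sample text is an arbitrary assumption. The same text produces two different lists depending on whether it is viewed as Unicode code points or as UTF-8 bytes, which is the ambiguity an interpolation syntax would have to resolve.

```python
# Illustrative only: the sample text is an arbitrary assumption.
text = "né"

# View 1: a list of characters (Unicode code points).
code_points = [ord(ch) for ch in text]

# View 2: a list of bytes (the UTF-8 encoding of the same text).
utf8_bytes = list(text.encode("utf-8"))

print(code_points)  # [110, 233]
print(utf8_bytes)   # [110, 195, 169]
```

The two lists agree for ASCII text and diverge as soon as a character needs more than one byte in UTF-8.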

@jchristgit

> I hadn't considered it, but it sounds reasonable. I don't myself have a preference. Either could be fine in my view. Why might u be better? Because it implies the specific encoding? Because it aligns with the syntax of other languages with similar features?

Not sure how relevant Python is for Erlang here, but its Unicode string literals (when introduced) are written u"like this" while byte string literals are written b"like this" (in Python 3, u more or less became the default):

>>> type("foo")
<class 'str'>
>>> type(b"foo")
<class 'bytes'>
>>> type(u"foo")
<class 'str'>

For Erlang, in theory I think both would fit, since the value is both UTF-8 and of the binary type. But when I think "bytes", I think of binary data that goes over the wire, and when I think "UTF-8", I think of a user-facing string. So as a small outside voice I'd vote for u here 🙂 Maybe b could be used for plain binary "strings" without /utf8.

@kikofernandez kikofernandez self-assigned this Sep 13, 2023
@paulnice

To me, bytes can contain any binary data, while a UTF-8 literal must contain a valid UTF-8 string.
A byte string can be perfectly valid as bytes and still be invalid as UTF-8.

So I'd prefer a u literal for UTF-8.
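
A minimal sketch of this point, written in Python purely for illustration: every UTF-8 string encodes to a valid byte string, but not every byte string decodes as UTF-8. The helper name `is_valid_utf8` and the sample inputs are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes produced by encoding text are always valid UTF-8.
print(is_valid_utf8("héllo".encode("utf-8")))  # True

# An arbitrary byte string need not be: the byte 0xFF never occurs in UTF-8.
print(is_valid_utf8(b"\xff\xfe"))  # False
```

A literal syntax that promises UTF-8 can reject the second kind of content up front, whereas a plain bytes literal has to accept both.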

@RaimoNiskanen
Contributor

Has EEP 66 (now PR #55) obsoleted this PR?

@TD5
Contributor Author

TD5 commented Nov 27, 2023

> Has EEP 66 (now PR #55) obsoleted this PR?

I believe so 🙂

@TD5 TD5 closed this Nov 27, 2023
@TD5 TD5 reopened this Nov 27, 2023
@TD5
Contributor Author

TD5 commented Nov 27, 2023

Actually, I am now sure that covers patterns, only literals?

@RaimoNiskanen
Contributor

RaimoNiskanen commented Nov 27, 2023

> Actually, I am now sure that covers patterns, only literals?

Did you mean "not sure"?

It isn't stated in EEP 66, but sigils are syntactic sugar (a transformation) that is applied before the parser tries to figure out what is a pattern.

In general, the parser may transform a sigil into any expression; for string interpolation, for instance, it could become a call to a formatter. Subsequent compilation steps will then see that such an expression cannot appear in a pattern. But for the suggested ~b, ~B, ~s, ~S and ~ sigil prefixes, the content is just transformed into another literal, which is allowed in a pattern.

I can clarify this in EEP 66.
Edit: I have clarified this in EEP 66, or rather PR #55.

@TD5
Contributor Author

TD5 commented Nov 27, 2023

Yep, I meant "not sure", but I'm glad to hear this is handled now 🙂

@TD5 TD5 closed this Nov 27, 2023
@TD5 TD5 deleted the patch-1 branch November 27, 2023 16:44
7 participants