Right now you can get a stringview from a string, but not a string from a stringview. We should change the proposal to ensure that you can go back to string from view.
The motivation is that, depending on the Opinions™ that a source language has about what a string consists of, you might want to make views the primary representation for a string.
For example Java/C#/JS/Dart/Kotlin, which consider a string to be any sequence of 16-bit code units, probably want to live in the WTF-16 world. When a string comes in from outside, you'll generally eagerly convert it to a stringview_wtf16, and then operate on it like that.
However there are some operations that are common between strings and which don't logically relate to the view, for example string.concat or string.eq or even string.new_wtf8 (which specifies an external encoding without necessarily caring about internal encoding). Right now if you just use stringview_wtf16 you don't have access to these. We need a way to go from view back to string.
There are three options that I see:
1. Hang all the view functionality off of string, i.e. string.get_wtf16_codeunit, string.advance_wtf8, and so on.
2. Make views subtypes of string, so that string.concat implicitly works on views. See "Subtyping relationship between stringref and stringviews" (#3) and friends.
3. A string view has-a string. There's an instruction for each kind of view that can get the string. (Whether the string would actually be held by reference or not is related to "Is stringref a subtype of eqref?" (#20).)
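To make the has-a option a bit more concrete, here is a minimal sketch in Python. It is only a model, not proposal syntax: `Str`, `ViewWtf16`, `as_wtf16` and `to_string` are hypothetical names made up for illustration, and Python's `surrogatepass` error handler stands in for WTF-16 encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Str:
    """Hypothetical stand-in for a host string; its internal encoding is opaque."""
    value: str

    def concat(self, other: "Str") -> "Str":
        # An operation that belongs to strings, not to any particular view.
        return Str(self.value + other.value)

    def as_wtf16(self) -> "ViewWtf16":
        # The conversion cost is paid here, explicitly, when the view is made.
        data = self.value.encode("utf-16-le", "surrogatepass")
        units = tuple(
            int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)
        )
        return ViewWtf16(source=self, units=units)


@dataclass(frozen=True)
class ViewWtf16:
    """Hypothetical WTF-16 view that keeps a link back to its source string."""
    source: Str
    units: tuple

    def length(self) -> int:
        return len(self.units)

    def get_codeunit(self, pos: int) -> int:
        return self.units[pos]

    def to_string(self) -> Str:
        # "View has-a string": going back is just following the stored link.
        return self.source
```

A language that keeps views as its primary representation could then reach string-level operations by going through the link, e.g. `a.to_string().concat(b.to_string()).as_wtf16()`.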
I think that (1) would be fine if we were in a WTF-16-only situation -- both for source languages and implementations. If we keep the goal of allowing WTF-8 implementations and codepoint/WTF-8 access for source languages, having views does have the good property of making conversion costs explicit. Basically what @jakobkummerow said here: #12 (comment)
For (2) I am less up-to-date on what GC people are thinking -- is it assumed that upcasting always keeps the same value representation? I.e. casting from view to string is just a type question and doesn't generate any code? If this is the case then I think that rules out (2) in practice. No browser JS implementation has a native WTF-8 string representation. It would also rule out any stateful iterator view (for better or for worse; I am not married to that choice).
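As a rough illustration of why the representation question matters, the sketch below compares how the same string is laid out as 16-bit code units versus as WTF-8 bytes. It assumes Python's `surrogatepass` handler as an approximation of WTF-8 and WTF-16 (it matches for ordinary text and lone surrogates, which is all that is used here); the helper names are made up for this example.

```python
def wtf16_unit_count(s: str) -> int:
    # Number of 16-bit code units, as a WTF-16 engine would see the string.
    return len(s.encode("utf-16-le", "surrogatepass")) // 2


def wtf8_bytes(s: str) -> bytes:
    # Approximate WTF-8 encoding; a lone surrogate becomes a 3-byte sequence.
    return s.encode("utf-8", "surrogatepass")


def layouts(s: str) -> tuple:
    # (code units in WTF-16, bytes in WTF-8) for the same logical string.
    return (wtf16_unit_count(s), len(wtf8_bytes(s)))
```

Since the two layouts differ in length and indexing for anything outside ASCII, a WTF-8 view over a WTF-16 string needs its own transcoded buffer, so treating it as the very same value as the string (a no-op upcast) does not work without extra machinery.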
I think (3) is doable. FWIW currently the V8 implementation I had doesn't include a link from WTF-8 view to string, but it would be no big deal to add. Therefore I would propose to add:
I agree that (2) would be problematic. In particular, since upcasts can happen implicitly, they indeed must never require any representation changes. So while stringview_wtf8 <: string would be technically implementable, it would be a lot of work.
I have no objections to (3). I'm not yet convinced that adding this feature is necessary, but I don't mind adding it.
Of course names could change; see #12.
Thoughts?