It should be possible to get a string from a stringview #44

Open
wingo opened this issue Sep 12, 2022 · 1 comment

@wingo
Collaborator

wingo commented Sep 12, 2022

Right now you can get a stringview from a string, but not a string from a stringview. We should change the proposal to ensure that you can go back to string from view.

The motivation is that, depending on the Opinions™ that a source language has about what a string consists of, you might want to make views the primary representation for a string.

For example Java/C#/JS/Dart/Kotlin, which consider a string to be any sequence of 16-bit code units, probably want to live in the WTF-16 world. When a string comes in from outside, you'll generally eagerly convert it to a stringview_wtf16, and then operate on it like that.
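To make "any sequence of 16-bit code units" concrete, here is a rough Python sketch of the eager conversion such a language might do at the boundary. The helper name `to_wtf16_code_units` is made up for illustration and is not part of the proposal; it leans on Python's `surrogatepass` error handler so that lone surrogates survive, which is the difference between WTF-16 and strict UTF-16.

```python
import struct
from typing import List


def to_wtf16_code_units(text: str) -> List[int]:
    """Hypothetical helper: the 16-bit code units a JS/Java-style language sees."""
    # "surrogatepass" lets unpaired surrogates through instead of raising,
    # which is what a WTF-16 (rather than strict UTF-16) view would allow.
    raw = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


# One ASCII character plus one astral character: three code units in total.
units = to_wtf16_code_units("a\U0001F600")
print([hex(u) for u in units])
```

The astral character comes out as a surrogate pair, so indexing by code unit and indexing by codepoint disagree, which is why the choice of view matters to the source language.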

However, some operations apply to strings in general and don't logically relate to any view, for example string.concat, string.eq, or even string.new_wtf8 (which specifies an external encoding without necessarily caring about internal encoding). Right now, if you only hold a stringview_wtf16, you don't have access to these. We need a way to go from view back to string.

There are three options that I see:

  1. Hang all the view functionality off of string. I.e. string.get_wtf16_codeunit, string.advance_wtf8, and so on.
  2. A string view is-a string. string.concat implicitly works on views. See "Subtyping relationship between stringref and stringviews" (#3) and friends.
  3. A string view has-a string. There's an instruction for each kind of view that can get the string. (Whether the string would actually be held by reference or not is related to "Is stringref a subtype of eqref?" (#20).)

I think that (1) would be fine if we were in a WTF-16-only situation -- both for source languages and implementations. If we keep the goal of allowing WTF-8 implementations and codepoint/WTF-8 access for source languages, having views does have the good property of making conversion costs explicit. Basically what @jakobkummerow said here: #12 (comment)

For (2) I am less up-to-date on what GC people are thinking -- is it assumed that upcasting always keeps the same value representation? I.e. casting from view to string is just a type question and doesn't generate any code? If this is the case then I think that rules out (2) in practice: no browser JS implementation has a native WTF-8 string representation, so a WTF-8 view could not share a representation with the string it views. It would also rule out any stateful iterator view (for better or for worse; I am not married to that choice).

I think (3) is doable. FWIW currently the V8 implementation I had doesn't include a link from WTF-8 view to string, but it would be no big deal to add. Therefore I would propose to add:

(stringview_wtf8.as_string view:stringview_wtf8)
  -> str:stringref
(stringview_wtf16.as_string view:stringview_wtf16)
  -> str:stringref
(stringview_iter.as_string view:stringview_iter)
  -> str:stringref

Of course names could change; see #12.
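
As a rough model of the has-a relationship in (3), here is a small Python sketch. Everything in it is a stand-in made up for illustration (`StringRef`, `StringViewWtf16`, `as_wtf16`, `view_as_string`, `concat`); it says nothing about how V8 or any other engine would actually represent these values. The point is only that the view keeps a link to its source string, so going back costs nothing and string-level operations stay defined on strings alone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StringRef:
    """Stand-in for stringref; its internal encoding is deliberately opaque."""
    text: str


@dataclass(frozen=True)
class StringViewWtf16:
    """Stand-in for stringview_wtf16: has its own code units and has-a string."""
    source: StringRef
    code_units: Tuple[int, ...]


def as_wtf16(s: StringRef) -> StringViewWtf16:
    # The conversion cost is paid here, explicitly, when the view is created.
    raw = s.text.encode("utf-16-le", "surrogatepass")
    units = tuple(int.from_bytes(raw[i:i + 2], "little")
                  for i in range(0, len(raw), 2))
    return StringViewWtf16(s, units)


def view_as_string(view: StringViewWtf16) -> StringRef:
    # Models stringview_wtf16.as_string: just follow the stored link.
    return view.source


def concat(a: StringRef, b: StringRef) -> StringRef:
    # Models string.concat, which is defined on strings only, not on views.
    return StringRef(a.text + b.text)


left = as_wtf16(StringRef("foo"))
right = as_wtf16(StringRef("bar"))
joined = concat(view_as_string(left), view_as_string(right))
print(joined.text)
```

A language that lives in the WTF-16 world would hold `left` and `right` as its primary representation and only drop back to the string for operations like concat or eq.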

Thoughts?

@jakobkummerow
Collaborator

I agree that (2) would be problematic. In particular, since upcasts can happen implicitly, they indeed must never require any representation changes. So while stringview_wtf8 <: string would be technically implementable, it would be a lot of work.

I have no objections to (3). I'm not yet convinced that adding this feature is necessary, but I don't mind adding it.
