Is stringref a subtype of eqref? #20
I think that you would agree that it would be an error if we allowed WebAssembly to distinguish strings that can't be distinguished on the JS level -- that |
To add to that:
(+1 to |
Yes, not having a stringref <: eqref and relying on string.eq would be great. Just wanted to get this clarified. |
+1 to |
I don't think it's that easy. And I'm highly uncomfortable with the "because JavaScript" argument. For better or worse, there are languages that have strings with identity. Should we punish them? |
@rossberg in what way do you feel that anything is getting "punished"? |
@jakobkummerow, having string identity implies the existence of an operation to actually observe it by comparing string identity. And that's expected to be a constant time operation. |
My experience in Scheme (gosh I am an old person now) is that languages in which strings have identity lead to unspecified behavior and thus users shying away from using the feature -- pointer equality over strings then becomes a language gotcha or misfeature. Like whether Just for information, @rossberg do you know of languages with immutable strings in which strings have identity? |
JavaScript, PHP, Lua, and Julia. @rossberg String equality varies quite a lot between languages. Most languages do not do 100% string interning, and so string equality is not constant time. It's quite rare for a language to have guaranteed If |
@Pauan, JS strings do not have identity. I'm not sure about the others. But you can e.g. add Scheme and OCaml to that list. For OCaml, OO languages like Java typically also have string objects with identity, though it's unclear whether they would be able to implement those using plain stringrefs in the first place. That structural equality is not O(1) is exactly my point. That makes it no substitute for reference equality. |
@rossberg JS strings have identity in the sense that there isn't any way to do a pointer comparison in JS, the only identity operator is I specifically excluded languages like Scheme, Python, Java, etc. because they do have ways to distinguish strings at the pointer level. Which means that two equal strings can have different pointers, and so their identity is muddled. The definition of identity I am using is the same one used by the famous egal paper. For immutable objects, the only sane behavior is that identity is structural equality.
Yes, but the only way to get That's a very severe performance cost, especially for large strings. As I said before, a language having If I think it would be much better to have a separate |
@Pauan, I see, but usually folks say values have identity to mean that you can observe "pointer equivalence" between them. While the intent of the egal paper is of course correct (for a user-facing language!), their way of redefining the meaning of "identity" itself is perhaps less than helpful in this discussion. But if you prefer, s/identity/reference equality/g in all I (and others) said above. :)
Not all hosts; it's only an issue for a host language like JS that does not have reference equality on strings but simultaneously expects "seamless" interop. And even then we could under-specify the behaviour at the boundary, if that's what it takes. Don't forget that our primary customers are languages compiling to Wasm, not host languages, so we ought to meet their needs first of all.
When you say "pointer equivalence" do you mean that equal strings will also be equal pointers? Or do you mean like in Python/Scheme/Java/etc. where the pointers aren't equivalent even if the string is structurally equal?
Of course, but we are talking about a host string type. The stringref type is primarily intended to communicate with the host (and with other Wasm modules which accept stringref). When it comes to a language's internal string type (like a Java String, or a Python string) they will likely continue to use linear memory (or the GC proposal) in order to get exactly the semantics that they want. It sounds like you are expecting most languages to use stringref even for their internal string type, but it's unlikely that a stringref would have the exact right behavior, API, and performance guarantees. So I expect that in practice most languages will convert from stringref into their internal string type. So that leads to a question: are there any languages which would desire to use stringref for their internal string type, and they are only capable of using stringref iff stringref supports
But JS is expected to be the most common host (thanks to the web), and so I don't think we should ignore the JS host concerns either. |
By the way, if a language wants to use stringref internally, and it requires That would allow the language to fully support |
Luckily we'll find out soon: this proposal's design specifically took Java's needs into account, and Java (in the form of the J2Wasm compiler) is the first user (as far as I'm aware) to target stringrefs.
Exactly. |
Pointer equivalence is just equality on pointers. It's fully agnostic to the contents of the objects it is pointing to and does not imply anything about it (other than equal pointers implying equal contents).
I don't think that is a useful characterisation. If holding host strings was the sole purpose, then externref would already be a perfectly fine type for that. The point of stringref is to enable sharing the same string representation between program and host, so that communicating it is cheap. Naturally, this only has any benefit if the compiled language is able to use it for its own strings. Otherwise you'd just shift the cost of boundary copies from the engine to the language runtime. That would make the whole proposal moot, AFAICS.
With everything else being equal, I agree we should accommodate JS. But when everything else is not equal, JS interop has lower priority.
Then every string would require twice the allocations and every access would be twice as expensive (and unlike with what we currently accept e.g. for GCed array objects, there would be no way forward out of that). I doubt that languages would be interested in making this choice, in particular, since exposing physical equality is primarily a means for optimising code, so requiring a more expensive implementation to support it would be self-defeating. |
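A rough Python sketch of the wrapping approach this reply appears to be answering. The comment it responds to is truncated, so reading it as "box each string in a separate heap object" is an assumption, and `StringBox`, `ref_eq`, and `string_eq` are hypothetical names. It shows where the extra allocation and the extra indirection come from.

```python
class StringBox:
    """Hypothetical wrapper: one extra heap object per string, giving it identity."""
    __slots__ = ("value",)

    def __init__(self, value):
        # The underlying immutable string; every access goes through this field.
        self.value = value

def ref_eq(a, b):
    # Constant-time reference comparison on the wrappers, not on the contents.
    return a is b

def string_eq(a, b):
    # Structural comparison of the contents; linear in the length in the worst case.
    return a.value == b.value

x = StringBox("hello")
y = StringBox("hello")   # same contents, separately allocated wrapper
alias = x                # same wrapper, so same identity
```

Here `x` and `y` compare equal structurally but not by reference, while `x` and `alias` are equal both ways. Each string now costs two objects (the box and the string), and reading the contents takes one more indirection.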
Has there been any thought about pointer-equality exposing internal details of the host, and thus could be a security leak (or enable a side-channel for communication)?
I didn't say sole purpose, just the primary purpose. Many languages simply cannot use stringref at all for their internal string type (for example, C, C++, Rust, etc.). And even if a language would like to use stringref, it might need more control over the API and performance guarantees. So if the goal is for stringref to replace most languages' internal string types, I think that's a much larger proposal than what we have right now. Perhaps that discussion can be delayed to a future proposal?
I think languages will need to bear that cost regardless, because many languages attach extra metadata to strings (internal object slots, caching, methods, etc.) So which languages do you think would be able to adopt stringref for their internal string type without any sort of extra wrapping? And which of those languages absolutely require |
I don't know about a security leak, but having I think there are two mismatches we have to choose from:
(1) seems unlikely to work due to the above implementation detail concerns. I wonder if (2) is easier to support by having these languages fall back to using |
Right, and that's one of the key problems. With my apologies to @wingo, but it seems to me that this proposal has not yet made up its mind. Either it extends core Wasm to serve language implementations, but then it needs to serve a sufficiently broad range of languages and be more flexible than what's currently offered (for example, ref equality, constant-time random byte access, a predictable cost model, etc.). Or it is only meant to serve the interaction with specific hosts, but then it is out of place in the core language and the functionality should somehow be moved to the respective API.
My goodness, I just ran into this for a Scheme compiler; I am using |
If so, how do we determine the identity of strings that come from a JS host so that we can perform ref.eq?
My understanding is that JS strings don't have an identity that can meaningfully be spoken of. Implementations have flexibility to canonicalize and mess around with which string operations create a new pointer value.