core data types: IEEE floating point (in)compatibility #58
The reason we'd prefer to omit -0 is to avoid the proliferation of similar but different equality operators. JavaScript has four equality-like operations (`==`, `===`, `Object.is`, and the internal SameValueZero used by Maps and Sets) that disagree with each other only on how they treat -0 and NaN.
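(A minimal editor's sketch of the proliferation being described, not part of the original comment — the four JS equality notions disagree only on -0 and NaN:)

```js
// How JS's equality notions differ on -0 and NaN:
console.log(0 === -0);              // true  — === conflates the zeros
console.log(Object.is(0, -0));      // false — Object.is distinguishes them
console.log(NaN === NaN);           // false — === treats NaN as unequal to itself
console.log(Object.is(NaN, NaN));   // true
console.log([NaN].includes(NaN));   // true  — SameValueZero, also used by Map/Set
console.log(new Set([0, -0]).size); // 1     — SameValueZero conflates the zeros
```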
We spent years on the Records and Tuples proposal, which started out simple and remained simple in most ways, but could never resolve how to generalize the JS equality ops to operate on containers containing -0 or NaNs. Ultimately, despite literally years of effort, this "minor detail" caused the proposal to fail.

FWIW, the stance of the JS language on the observable NaN bits when casting numbers to bits is interesting. The JS language clearly states that there is only one NaN in the JS number type. But it does not specify which NaN bit representation should be revealed by casting. Rather, it considers the bit representations of all IEEE NaNs to be one equivalence class: a NaN can cast into any of them, and all of them must cast back to a NaN. Thus, NaN does faithfully round trip through the bit representations, but no one bit representation is guaranteed to round trip through NaN. WASM has a similar stance on NaN bits. In both cases, this is deliberate indeterminacy in the spec. Those WASM systems that need strong determinism, like all WASM-on-blockchain systems, refine the WASM spec to require one particular bit encoding. But this is just a conforming refinement of the spec: any WASM system that conforms to the refined spec also conforms to the WASM spec itself.

From the perspective of defining the OCapN abstract data model, what it means to omit -0 can equivalently be described as treating -0 and 0 as members of the same equivalence class. Either one, round tripping through the protocol and coming back as the other, is a successful round trip. In exchange for this loss of precision, there would be no difference between the remaining equality operators over transmitted values. As with deterministic WASM, if OCapN treats -0 and 0 as one equivalence class, particular implementations of OCapN can state that they obey a refinement of the OCapN spec which either rejects -0 at the boundary or normalizes -0 to 0 there.
Agoric would do the second. None of this is evidence that the loss would not still be problematic. It is currently just my guess that the pain of building abstractions over number that want to consider -0 and 0 equivalent for some purposes, and distinct for other purposes, is greater than the pain of this loss of precision. I would want to see some real evidence that such a loss in transmission between systems would actually cause, or has actually caused, real problems in practice.

Complicating my own position, I want OCapN to guarantee that denormals round trip. There are strong reasons why some platforms want to lose the extra precision of denormals when doing internal high-speed computation. But OCapN doesn't do any arithmetic over numbers, so I see no benefit from allowing OCapN to reduce a denormal to a nearby non-denormal merely because the numbers were transmitted between systems. I'm not suggesting we should have any controversy about denormals. I bring this up because it seems to be a bit on the opposite side of the argument I'm making about -0, possibly weakening my position. |
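(Editor's sketch of the "conforming refinement" idea above — an encoder, with a hypothetical name not from any spec, that always transmits the canonical representative of the {0, -0} equivalence class:)

```js
// -0 === 0 in JS, so this maps -0 to +0 and leaves every other number alone.
const canonicalizeZero = (x) => (x === 0 ? 0 : x);
// Hypothetical encode step of a refined-spec implementation:
const encodeNumber = (x) => canonicalizeZero(x);

console.log(Object.is(encodeNumber(-0), 0)); // true — -0 leaves the node as +0
console.log(encodeNumber(-1.5));             // -1.5 — everything else unchanged
```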
To keep the ball moving: do Spritely and capnp agree? Is it OK to omit the distinction between -0 and 0 in the core data model? |
Forgive me for pushing on this... this may be the only chance to review the question for a long time. Note that my comment has nothing to do with existing OCapN-like implementations; they could all agree on eliminating -0 and all still be incorrect from an IEEE 754 point of view, which has to do with the portability of programs that use floating point, not ocap interoperability. Taking an existing program that uses IEEE floating point and making it run in a distributed manner over OCapN seems a perfectly plausible and important use case, and this becomes much, much harder if the program first has to be reviewed globally for any significant use of -0.

The philosophy of IEEE 754 is that every possible kind of floating point hardware is going to have warts, and the only way to get portability of floating point programs is to decide once and for all what those warts are and what the behavior has to be for each one. Once this is done, and the hardware is validated, we don't have to worry about getting different results on different platforms. This philosophy has worked beautifully. If you eliminate -0, you are saying there is one 'wart' that behaves at variance with the standard, so the whole system is no longer really IEEE 754. Remember that most programmers don't understand floating point and won't care about this, but a certain important community (including scientific programming and perhaps some game developers?) cares enormously about the fine details of floating point. I'm talking about satisfying the latter, not the former.

What JSON does with regard to reading and printing floating point does not seem relevant to me. Serious floating point programmers are already wary of reading and printing, which are almost always lossy. FWIW, Python supports a distinct -0.0 but has 0.0 == -0.0, arguing that == addresses certain kinds of bugs (obviously different kinds of bugs than what I'm talking about). (I'm pretty sure no one should ever be comparing floating point numbers for equality, by the way.)

I sort of see the rationale now, so thanks for the explanation. Looks like a rock and a hard place. "I want OCapN to guarantee that denormals round trip" sounds as if it might address my concern, if we broaden it a little to "I want OCapN to guarantee that IEEE 754 floating point entities (numbers whether normalized or not, infinities, zeroes, NaNs) round trip". That is, we should leave any interpretation of floats up to the host system(s), with the possible exception of ordering, which IEEE 754 prescribes bitwise in a simple manner. |
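(For reference, an editor's sketch of the "simple bitwise" ordering mentioned above — IEEE 754's totalOrder — mapping each float64's bits to an unsigned key whose integer order matches the prescribed total order. Illustrative JS, not from the thread:)

```js
const bitsOf = (x) => {
  const dv = new DataView(new ArrayBuffer(8));
  dv.setFloat64(0, x);
  return dv.getBigUint64(0);
};
const totalOrderKey = (x) => {
  const b = bitsOf(x);
  // sign bit set: flip all bits; otherwise: set the sign bit
  return b >> 63n ? ~b & 0xffffffffffffffffn : b | (1n << 63n);
};

console.log(totalOrderKey(-0) < totalOrderKey(0));  // true — totalOrder puts -0 before +0
console.log(totalOrderKey(-2) < totalOrderKey(-1)); // true
console.log(totalOrderKey(1) < totalOrderKey(2));   // true — NaNs sort by their payload bits
```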
I don't think capnp has a dog in this fight; every implementation I know of just casts the bits and calls it a day, so we can be as faithful when round-tripping as any proposal we might come up with. |
This is a really good point. I will say that from Spritely's high-level work of targeting social network issues, faithfulness to IEEE floating point strangeness is not something that is likely to matter to us. However, I do have many friends in the gamedev and scientific computing worlds, and I know they care deeply about these issues. Since, in effect, we really are using IEEE floats, I think we should support all their needs fully. Again, not for Spritely's high-level goals, but so that OCapN is most useful to a broad set of possible users. |
I think what @jar398 brought up in #58 (comment) makes a lot of sense, and if we can be faithful to the spec, that'd be good. @erights, obviously you've expressed a preference to not have -0 round trip, or to normalize it to zero. How much of a problem would it be to support negative zero over CapTP? |
I don't know, but we can explore it. Agoric would at least need to change how -0 is handled below our Map/Set layer.
This one directly touches on the unpleasantness we experience in JS, with too many equality operators. (Though still fewer than many Lisps!) Until this step, if two JS values are the same according to `Object.is`, our system treats them as interchangeable everywhere.

The representational trick that some of our Maps use for keys (and likewise Sets and Bags for elements) is to use an encoding of the key as the actual index, an encoding under which -0 and 0 coincide. Thus, -0 would still not round trip through being stored into a Map/Set and then being retrieved. But since Maps, Sets, Keys, and pattern matching are not concepts until layers above the core data model, that normalization need not prevent -0 itself from round tripping through the protocol.

This is all plausible enough that we should give it a serious try. I think it is also plausible enough that OCapN can proceed tentatively assuming that we will succeed unless/until we report an unexpected problem. Only one way to find out.

I cannot imagine anyone has a problem requiring denormals to round trip. But just reiterating in case anyone sees a problem. Anyone? |
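(Editor's note: native JS collections already behave this way. Per the language spec, Map.prototype.set replaces a -0 key with +0, so -0 does not round trip through a native Map key either:)

```js
const m = new Map();
m.set(-0, 'value');                    // spec: "If key is -0, set key to +0"
const storedKey = [...m.keys()][0];
console.log(Object.is(storedKey, -0)); // false — the key came back as +0
console.log(Object.is(storedKey, 0));  // true
console.log(m.get(-0) === m.get(0));   // true — lookup uses SameValueZero
```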
See endojs/endo#1602 |
Recognizing that NaNs seem less important than -0, but also recognizing that I don't understand how the community uses NaNs, so it's possible they're important too:
JS and wasm both allow NaN canonicalization, and some implementations do so. Thus, only one NaN could round trip through these languages and back as a NaN.
As a fellow language implementor, you may be amused that one of the uses of NaN canonicalization is NaN boxing, so that non-canonical NaN values can be interpreted as something else. |
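(A toy editor's sketch of NaN boxing for the curious: stash a 32-bit payload in the low mantissa bits of a quiet NaN. Real engines box pointers this way; all names here are illustrative:)

```js
const scratch = new DataView(new ArrayBuffer(8));

const box = (payload) => {
  scratch.setUint32(0, 0x7ff80000);    // quiet-NaN exponent + top mantissa bit
  scratch.setUint32(4, payload >>> 0); // payload in the low 32 bits
  return scratch.getFloat64(0);        // a NaN whose bits carry `payload`
};
const unbox = (nan) => {
  scratch.setFloat64(0, nan);          // an engine MAY write canonical NaN bits here
  return scratch.getUint32(4);
};

console.log(Number.isNaN(box(42))); // true — at the value level it is "just" a NaN
console.log(unbox(box(42)));        // 42 on engines that preserve NaN bits;
                                    // a canonicalizing engine may legally lose it
```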
There are lots of search results for "NaN canonicalization". The first one I looked at (WebAssembly/design#1463) seemed quite informative, but I didn't have the energy to digest it thoroughly (maybe later). However, the topic seems to be what happens when operators are applied, not when a value is passed through (which doesn't really come up). As far as I can tell, if you were to pass a noncanonical NaN into webassembly code, which then passes it through and spits it out somewhere (no operators like max or + 0 applied), it comes out unchanged. Reading between the lines, it sounds like the same would be true of ecmascript.

And that's the situation I think is important in OCapN: if you pass a NaN in a message through an identity function in the vat, sending it onwards (either to an object or to another node) unchanged, will the NaN be swapped out for a different NaN? Or: is an OCapN node permitted to canonicalize a NaN on receipt or on transmission? I think we can say no, without compromising the ability of the objects connected to the OCapN layer to canonicalize NaNs (especially when they perform operations like max). That is, leave it up to the objects when and whether to canonicalize or internally use boxed NaNs. |
If that were true, NaN boxing would not work. Could someone check a NaN boxing implementation? |
Couldn't you unbox when the NaN exits whatever environment boxed it? |
I do not understand. |
What I was thinking was that logically we have a circle with OCapN on the outside and webassembly or JS on the inside (some sort of logical 'container'). If a NaN gets boxed as it goes from outside to inside the circle (which, as far as I can tell from my reading so far, never happens), then you could just invert that operation (unbox) as it goes from inside to outside the circle, and from an external perspective nothing has happened to the data. If there are any operations involved causing the NaN to be replaced by a different NaN, or by a box, that's not a problem; it's just part of the circle's (object's) behavior. But I think this is moot because a NaN that is not subject to any operation is left untouched (and unboxed?) by wasm or JS. Maybe I am wrong.

This goes to my hypothetical use case, which was using OCapN to decompose and 'distribute' a complex program, perhaps even a C++ program, that uses floating point and is connected together using generic modules (routing, scheduling, scatter/gather, etc.) built using OCapN, written perhaps in some other language like JS. The connector code doesn't know what's being done with the floating point and itself does no operations on floating point, but the code being connected cares that the data is transmitted through the connectors faithfully. |
Can someone clarify something: what does it mean for a JS node to "read" a NaN? JS doesn't have a native way of serializing NaN values, which means the serialization / deserialization has to use a "custom" encoding. From what I understand, the suggestion above is that the encoding is simply the 64 bits of the IEEE floating point. I assume that means the serialization / deserialization step would use something like a `DataView` over the raw bytes. |
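(A minimal editor's sketch under that assumption — encoding a JS number as its raw 8 IEEE bytes via DataView; function names are hypothetical, big-endian chosen arbitrarily:)

```js
const encodeFloat64 = (x) => {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setFloat64(0, x); // the engine MAY canonicalize a NaN here
  return bytes;
};
const decodeFloat64 = (bytes) =>
  new DataView(bytes.buffer, bytes.byteOffset, 8).getFloat64(0);

console.log(Object.is(decodeFloat64(encodeFloat64(-0)), -0)); // true — -0 survives
console.log(decodeFloat64(encodeFloat64(1.5e-310)));          // denormals survive too;
// only NaN bit patterns are at the engine's discretion.
```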
I assume "read" means "convert into a native js number." how we do that is probably up to us, since we'll have to implement the decoders. I realized a couple small caveats wrt capnp: the node implementation parses things into native data structures, so probably has the same constraints as agoric is dealing with re: round-tripping. I have a couple suspicions:
|
If the ocapn spec says that any NaN value is canonicalized when decoded or encoded, then even if some implementation out there does not canonicalize when encoding, receiving from that implementation will be compatible, as long as the receiving implementation is able to decode a non-canonical NaN value. As I mentioned, there is no guarantee in JS that the language engine will not do this canonicalization, so I don't believe any guaranteed preservation through a JS node is in fact possible. |
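(The decode-side rule described above could be as small as this editor's sketch: accept any NaN bit pattern off the wire, but hand the host exactly one NaN value:)

```js
// Leaves -0, infinities, and denormals untouched; collapses every NaN to one.
const canonicalize = (x) => (Number.isNaN(x) ? NaN : x);
```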
This has been helpful. For me the issue goes to the structure of the OCapN specs, which seem to have a lot of moving parts. A traditional protocol spec like HTTP is purely syntactic: it says what messages are OK to send, perhaps in the context of previous messages. Any semantics is purely motivational or advisory. It gives a concrete syntax, i.e. no notion of an abstract syntax with 'bindings' to a set of concrete syntaxes. It does not talk about how one programs an endpoint at all - programming language, data model, anything like that.

We seem to be talking about the 'bindings' to various languages and what is supposed to happen to data when that happens (e.g. NaN changes or 0/-0 canonicalization). Those questions just don't come up in an IETF protocol spec; they would be relegated to a language-specific API for using the protocol. We know the protocol concrete syntax is orthogonal to the language used by the endpoint, since otherwise we could not get interoperation. I think making these separations might make talking about this issue a little easier. E.g. we could say that floating point in the protocol is IEEE floating point with no substitutions on transmission, even if a particular language binding did not use IEEE floating point, or "modified" it via canonicalization or boxing. |
I think what we're really trying to do is define a set of values and an equivalence relation on that set, and the present questions are:
...I actually think that the fact that some specs only specify syntax formally is not necessarily something to emulate. There are counterexamples, e.g. The Definition of Standard ML, and if you look far enough back you can find examples of specifications that described even syntax in plain English, which is unthinkable today. I think what we do or don't decide to formalize ought to be motivated by what will make building interoperable software easiest, which is going to require talking about possible & likely implementations, even if those discussions don't make it into the spec proper. |
I guess an operational way to specify this type of requirement is via the Echo Gc object specified in the test suite: if e.g. the test suite passes in -0 and gets back +0, should it fail? |
If we decide that floating point -0 and 0 are distinct equivalence classes, yes, such a test should fail. The first tests for maintenance of equivalence classes should be round-trip tests through combinations of concrete representation conversions. |
Re specs and syntax, a protocol spec is very different from a programming language spec. IETF has been very successful with its style, and I'd be reluctant to innovate. But maybe there is another protocol spec we can emulate. I don't think what you're saying is that different from what I said. You're saying that the bindings in the various languages should be designed, perhaps even coordinated, to promote interoperation. That is hard to argue with. But it does not preclude saying something stronger at the protocol syntax level, such as that all IEEE floats are expressible 'on the wire', even if some language bindings choose to normalize them on receipt. That way two bindings that don't want to normalize can talk to one another without normalization. Maybe that's an interoperability risk, and it would be better if those two endpoints represented their IEEE floats as a data type distinct from OCapN floats. |
I agree. Or rather, I think we need to think in terms of layers of spec, and dependencies between them.

The lowest data layer is the abstract data model, which can equally well be described as an abstract syntax. It defines the equivalence classes. For Agoric, this is the Passable data model of `@endo/pass-style`.

Concrete language-binding specs are layered on top of the abstract spec, hopefully but not necessarily one per endpoint language. For Agoric, this is the remaining responsibility of `@endo/pass-style`.

Concrete syntax specs (hopefully but not necessarily singular) are also, and separately, layered on top of the abstract spec. For Agoric, this is the smallcaps encoding of `@endo/marshal`.

Abstract tagged-interpretation layers are also, and separately, layered on top of the abstract spec. For Agoric, this is `@endo/patterns`.

Concrete language bindings for taggeds are then layered on the abstract tagged interpretation. For Agoric, these are the concrete APIs provided by `@endo/patterns`.

The layering diagram is the most relevant slide for this part of the layering. From https://ocapn.org/files/ocapn-layers-orders-ocapn-talk.pdf |
Maybe programming language specs are simply the better precedent to emulate. The advantage of starting with the abstract syntax / abstract data model is that both protocol and language are concrete syntaxes of the same abstract syntax. The whole premise of what we're doing is that we at least have adapters at each endpoint that translate between a concrete language implementation and a concrete protocol implementation. If the equivalence classes do not round trip across such adapters between concrete syntaxes, then there is a bug somewhere. This same perspective enables concrete language bindings to be coupled to each other by non-protocol means. We would want this case to be equivalent, from the pov of the code in each respective language, to being coupled via the protocol. For the same spec to govern this non-protocol-based interoperation, it cannot fundamentally be a concrete protocol spec. |
I think I'm saying this, but also something much stronger: that the fundamental spec is the definition of equivalence classes over the abstract data. Promoting interoperation is the point! It is not simply a nice-to-have additional property.
This can be accommodated within the abstract-first layering for any given concrete protocol. Let's take the NaN example. A protocol that just transmits the IEEE bit representation will naturally be able to represent all IEEE-representable NaNs. But the binding of this concrete syntax to the abstract data model would say that all of these represent the one NaN equivalence class. Thus, an adapter, perhaps a membrane that converts from one instance of this concrete syntax to another, that substituted one NaN representation for another in the conversion would still preserve the abstract requirements, and would still pass all round-trip tests.
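(An editor's sketch of such an equivalence-preserving adapter: it may rewrite NaN bit patterns freely, because every NaN pattern denotes the same abstract value, while all other bit patterns pass through untouched. All names illustrative:)

```js
// Big-endian 8-byte float64: NaN iff exponent is all ones and mantissa != 0.
const isNaNBits = (bytes) => {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, 8);
  const hi = dv.getUint32(0);
  const lo = dv.getUint32(4);
  const expAllOnes = (hi & 0x7ff00000) === 0x7ff00000;
  const mantissaNonzero = (hi & 0x000fffff) !== 0 || lo !== 0;
  return expAllOnes && mantissaNonzero; // excludes the infinities
};
const CANONICAL_NAN = Uint8Array.of(0x7f, 0xf8, 0, 0, 0, 0, 0, 0);
// Conformant per the abstract model, even though it rewrites NaN bits:
const adaptFloat64 = (bytes) => (isNaNBits(bytes) ? CANONICAL_NAN : bytes);
```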
If their correctness depends on not normalizing, then I'd say they depend on a stronger spec than ocapn. Placing the above ocapn-correctness-preserving adapter between them would violate their correctness. The spec they count on can be a refinement of the ocapn spec, in that any correct implementation of their refined spec is necessarily a correct implementation of ocapn. But this hypothetical adapter would be ocapn conformant while violating the refined spec. This is a perfectly sensible layering. |
Cross referencing #47 (comment) for the application of this abstract-syntax / equivalence-class / round-trip perspective to the outstanding Unicode string questions. |
@erights I don't agree that the fundamental spec is a definition of equivalence; that doesn't feel right to me, ontologically. I think any spec for anything has to be an actionable definition of conformance of a given artifact to some set of criteria, and an equivalence predicate (or even a 'data model') doesn't provide this on its own. What I'm trying to get my head around now is what kind of artifact we're talking about (an endpoint, I presume, but what exactly constitutes an endpoint) and how conformance is to be tested in general, given that the 'data model' (or equivalence relation etc.) seems to be so important while at the same time the language and protocol details can vary. I'm going to go off and think about this for a while (it's outside the scope of this issue), but I'm happy if others continue to try to develop consensus on this issue. At least I'm starting to understand the non-IEEE (single zero and/or single NaN) position now, which I didn't before. |
Just quickly googling for an explanation of NaN boxing, I found WebAssembly/design#1463 (comment) which I like. |
@jar398, I think informally my comment above (#58 (comment)) captures how I think we will test this: fuzz an echo server and make sure the return values obey the equivalence relation. I think the harder question is what the spec language looks like, and that is non-obvious to me as well -- but it does seem like something that's out of scope here, and more generally it can probably be figured out independently of agreeing on what that equivalence relation is. |
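(A hypothetical shape for such a fuzz test, editor's sketch — `echo` stands in for the real network round trip, and the equivalence relation assumed here is one NaN class with distinct zeros:)

```js
// Draw random 64-bit patterns: mostly normals and denormals, with ±0,
// ±Infinity, and NaNs turning up occasionally (exponent all ones).
const randomFloat64 = () => {
  const dv = new DataView(new ArrayBuffer(8));
  dv.setUint32(0, (Math.random() * 0x100000000) >>> 0);
  dv.setUint32(4, (Math.random() * 0x100000000) >>> 0);
  return dv.getFloat64(0);
};
const echo = (x) => x; // stand-in for a round trip through a real echo server
const sameClass = (a, b) => Object.is(a, b); // one NaN at the JS value level

for (let i = 0; i < 10_000; i += 1) {
  const x = randomFloat64();
  if (!sameClass(echo(x), x)) throw new Error(`round trip failed for ${x}`);
}
```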
I am trying to understand your position, but I do not yet. What artifact does one test to test conformance to a syntactic protocol spec? |
Let's move the discussions about what a spec should look like over to #71, and keep this issue confined to floats. |
(Moved here from #5 (comment))
It's proposed (sorry, can't find the exact reference now) that -0 be excluded from OCapN core data values. While this value may not be used often in real programs, it is used, and it is potentially important. It seems unfortunate to be compatible with IEEE 754 floating point while deviating in this one small aspect. The difference could lead to program failures that would be very difficult to find, test for, and diagnose, and that could happen at inopportune moments.
I don't remember the rationale for this exclusion.
I'm less concerned about NaNs but it's conceivable they might matter as well.