Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change name of identityof operator to digestof. #774

Closed
Praetonus opened this issue Apr 23, 2016 · 8 comments
Closed

Change name of identityof operator to digestof. #774

Praetonus opened this issue Apr 23, 2016 · 8 comments

Comments

@Praetonus
Copy link
Member

This isn't consistent with other pointer/address related stuff. Is it an oversight from the 32 bits port, which introduced USize?

@jemc
Copy link
Member

jemc commented Apr 23, 2016

Probably, yes. But I'll wait for @sylvanc to weigh in.

@jemc
Copy link
Member

jemc commented Apr 27, 2016

We discussed this on this week's sync call.

This question brings up another problem, which is that the identityof operator doesn't return a unique value. For example, on U128 and on tuples, we are using XOR to "encode" more bits into a U64. This means that there are many different U128s that have the same identityof. However, the is operator doesn't use the same mechanism, and actually compares correctly/uniquely. So the problem is that (identityof x) == (identityof y) would intuitively seem like it works like x is y, but it doesn't.

We decided that the name of identityof should be changed to digestof, to reflect that it is intended as a hashable number, instead of as a unique identity. The digestof operator should just be used for hashing and hash-like operations, and the is operator should be used to compare identities.

I'm changing the name of the ticket to reflect this strategy.

@jemc
Copy link
Member

jemc commented Apr 27, 2016

Also, we want to keep the size at U64, even on 32-bit machines, since it means fewer hash collisions.

@jemc jemc changed the title identityof operator returns U64 Change name of identityof operator to disgestof. Apr 27, 2016
@jemc jemc changed the title Change name of identityof operator to disgestof. Change name of identityof operator to digestof. Apr 27, 2016
@Praetonus
Copy link
Member Author

Praetonus commented Apr 29, 2016

I'm not sure everything is consistent here. We currently have a Hashable interface and an operator hashing stuff.

The interface

  • Isn't provided for everything.
  • Is provided for the numeric primitives. The implementation for I/U64 and below is doing a sequence of xor and bit shift. The implementation for I/U128 is xor'ing the upper part with the lower part.
  • On complex objects, the implementation is probably producing a digest depending on the internals of the objects.

The operator

  • Works on everything having a name (i.e. an identity).
  • For I/U64 and below it returns the number itself. For I/U128 it is also xor'ing the upper part with the lower part.
  • For complex objects it returns the address as an U64 (effectively getting the identity).
  • For tuples, it is xor'ing the identity/digest of every member together. So it is xor'ing plain numbers when it contains numbers and xor'ing addresses when it contains objects.

The operator doesn't look good to me. Sometimes it is operating on the contents and some other times on the addresses with no real reason other than "machine words aren't really objects", which I don't think is true especially with the machine word autobox in (and I expect the result of the operator on the same number boxed and unboxed will be different, which is bad).

I think we should

  • Keep the identityof operator and make it work as the name implies ((identityof x) == (identityof y) <=> x is y) for every object. There is however a small concern with garbage collection and object reallocation but I think it can be resolved easily. Basically, if we get the identity of an object x, the value we get isn't associated with the object itself. So x could be GC'ed and a new object could be allocated at the previous address of x. And now we think we have the identity of x but x doesn't exist anymore and we have the identity of some new random object. This could lead to nasty bugs. But I think it could be resolved by treating the result of identityof as a tag alias to the object. It would still have a distinct type and be Equatable to other identities, but it would also be traced by the GC. Then, an object wouldn't be collected until it isn't referenced either by direct aliases or by identity.
  • Make Hashable the standard way to hash objects and provide Hashable for tuples if every member is hashable. my_tuple.hash() would be sugared to something like my_tuple._1.hash() xor my_tuple._2.hash() xor ...

@jemc
Copy link
Member

jemc commented Apr 29, 2016

I'm not sure everything is consistent here. We currently have a Hashable interface and an operator hashing stuff.

That's not quite true, the identityof operator isn't providing a hash - it provides a number that can be hashed. The provided number isn't really intended to be used for anything else, which is why identityof is a bad name.

Keep the identityof operator and make it work as the name implies ((identityof x) == (identityof y) <=> x is y) for every object.

As you imply in this paragraph, if we want to make (identityof x) == (identityof y) work, then the identityof operator can't return a number as it does now, making it useless in the one way the operator is actually used now in stdlib (taking an arbitrary object and providing a hashable number for it). Furthermore, it's not clear to me that there is a real use case for making (identityof x) == (identityof y) work when we already have a working x is y. If you have a use case, let's please talk about it.

But I think it could be resolved by treating the result of identityof as a tag alias to the object.

This is kind of exactly the point. The use cases for the behaviour that you're proposing foridentityof seems like they can already be serviced by simply making a tag alias of the object and holding that as the "identity", which can be compared using is. You don't need a special operator for that, you can do it with simple assignment.

@sylvanc
Copy link
Contributor

sylvanc commented May 2, 2016

I agree with @jemc. The name identityof has mislead people. It does not, never has, and never can, return an identity. What it can return is a number suitable for hashing that is derived from an identity - in other words, a digest.

The relation (identityof x) == (identityof y) <=> x is y cannot, and need not, hold. Treating an identityof result as a tag is also fundamentally wrong: it confuses addresses and numbers.

For example, given an object allocated at 0x3ABC and the number 15036, should the identity of the latter prevent the GC of the former?

As @jemc points out, is is for is :) It works with no GC issues, no special casing, etc.

@Praetonus: I think this conversation has been super productive, actually, because it reinforces that identityof was a bad choice by me, as it mislead you as to the purpose of the operator. Really, it's only purpose is hashfun.pony:47 and related uses. Note that it isn't a hash or an identity there, regardless of my extraordinarily poorly chosen name!

@sylvanc
Copy link
Contributor

sylvanc commented May 2, 2016

PR #791

@sylvanc
Copy link
Contributor

sylvanc commented May 2, 2016

Closed via #791

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants