Generalized hashing of keys for memoization #1074

jlstevens · 2017-01-18T17:51:55Z

Addresses #1067.

This approach isn't perfect but it will allow a lot more keys to be memoized than before (sets, lists, dictionaries). The main limitation is that composite values as dictionary keys are not allowed by the JSON spec.

Currently pandas Dataframes/Series and numpy arrays are allowed as mutables. In general you will want to support mutable types explicitly as using the id does not reflect the contents of the object.

This PR isn't ready yet - I want to add a bunch of unit tests.

jlstevens · 2017-01-18T17:57:40Z

Just some quick examples/performance statistics:

%%timeit
   ...: keyhash([datetime.datetime(1,2,3), set([1,2,3]), 
                 pd.DataFrame({'a':[1,2],'b':[3,4]}), 
                 np.array([1,2,3]), np.int64(34)])
   ...:
1000 loops, best of 3: 659 µs per loop

%%timeit
   ...: keyhash([1,2,3, (4,5)])
   ...:
10000 loops, best of 3: 23.5 µs per loop

Unsurprisingly, big keys hash slower than small keys (I expect most keys to be pretty small). Note that datetime objects are also supported.

philippjfr · 2017-01-18T18:22:58Z

Looks good to me. I doubt we'll catch all the cases we'll need straight away so it's good you can add repr_hashable and str_hashable types. Happy to merge when you've added some unit tests.

jbednar · 2017-01-18T18:27:25Z

holoviews/core/util.py

+    supplied, the objects id is used - this may be an issue when dealing
+    with mutable objects. For this reason, it is important to support
+    common mutable types such as pandas Dataframes and numpy arrays
+    (supported).


The bit discussing mutable objects is confusing -- it calls out Pandas and Numpy, but doesn't mention dicts, lists, sets, etc. -- surely those are both mutable and supported as well?

The distinction is between what is supported directly by JSON (dicts, lists) and things that we need to add support for it HashGenerator. Currently support is added for sets, datetime, numpy arrays and pandas dataframes/series. More things may need to be explicitly supported in future.

Ok, please reword to convey that.

jbednar · 2017-01-18T18:29:49Z

holoviews/core/util.py

+    """
+    Extends JSONEncoder to generate a hashable string for as many types
+    of key as possible. These keys are supplied by widgets and Stream
+    parameters and may be nested. If an unrecognized object type is


I don't think the docstring for this class should mention widgets or Streams in any way other than something explicitly labeled as an example. It could mention memoization, though, so that we know the goal of this object is to generate an id that can tell whether two objects differ, not to provide an encoding of the object.

And I'm confused about whether it's truly a HashGenerator -- the docstring says it will generate a hashable string, not a hash, and the code indicates that sometimes it's one, sometimes it's the other. Which is it?

HashableStringGenerator is a pretty ugly name but probably more accurate.

I don't think the docstring for this class should mention widgets or Streams in any way other than something explicitly labeled as an example. It could mention memoization, though, so that we know the goal of this object is to generate an id that can tell whether two objects differ, not to provide an encoding of the object.

Happy to update the docstring as suggested.

Please do! Thanks.

Hmm, all strings are hashable already. It is about taking an arbitrary thing and generating a unique string from it suitable to generate a hash. I agree that is a less ugly name though... @philippjfr Any suggestions?

StringForHashing? Not great.

HashableJSON might be better - the output is a JSON string with some customised support for additional types.

HashableJSONEncoder if we want to be pedantic...

HashableJSON seems fine to me.

jbednar · 2017-01-18T18:40:30Z

holoviews/core/util.py

+
+
+
+def keyhash(key, as_string=False):


Or is this hashable_string?

In any case, I'm not sure in what sense the argument is a "key".

Or is this hashable_string?

No. I'll get rid of the as_string argument as I only needed it temporarily for debugging. The output is an integer, as returned by the Python hash function.

I am happy to rename it. In the context it is used (memoisation) the argument is what we have always called a 'key'. I'll just call it obj here... but then this function will then need a new name. We can't simply use hash clashes with the built-in. Again, any suggestions?

Sure, you'll use it as a key, but that doesn't mean the function has anything to do with keys. obj is a good name for the argument; that's what it is (an arbitrary object). For the function name, maybe objhash (to contrast it with the regular hash(), which doesn't work for arbitrary objects)?

Sure, you'll use it as a key, but that doesn't mean the function has anything to do with keys.

Agreed! Not that keen on objhash as a name though...

jlstevens · 2017-01-18T20:56:06Z

Implemented suggestions and added unit tests. Ready for final review/merge once tests pass.

jbednar · 2017-01-18T21:17:22Z

holoviews/core/util.py

-    (supported).
+    of object as possible including nested objects and objects that are
+    not normally hashable. The purpose of this class is to generate
+    unique strings that once hashed are suitable for memoization.


Strictly speaking, it's not the hashed strings that are memoized, it's the actual results (and only implicitly even those, in this case). The hashed strings are useful for memoization, but they themselves are not memoized.

So maybe "The purpose of this class is to generate unique strings that once hashed are suitable for use in memoization and other cases where deep equality must be tested without storing the entire object".

jbednar · 2017-01-18T21:18:24Z

holoviews/core/util.py

+
+    By default JSONEncoder supports booleans, numbers, strings, lists,
+    tuples and dictionaries. In order to support other types such as
+    sets, datetime objects and mutable objects such as pandas Dataframes


and custom mutable

jbednar · 2017-01-18T21:19:31Z

Not sure why it's failing, but it looks good to me apart from the minor docstring fixes suggested.

jlstevens · 2017-01-18T22:21:27Z

Python 3 has an additional limitation - the sort_keys option in json.dumps fails if it encounters dictionaries with heterogenous keys. This is a very clear bug as the final JSON keys are always strings which can be sorted but unfortunately this issue is unlikely to be fixed.

If we are happy to accept these limitations, the PR is ready to merge now the test have passed. Even with its caveats/holes, this approach greatly expands what can be memoized (before this PR, even a simple list wouldn't allow memoization).

jbednar · 2017-01-18T22:36:43Z

Looks good to me.

philippjfr · 2017-01-18T22:45:08Z

Annoying this turned out to be such a pain. We'll just have to see if we run into any issues in practice. Currently the types of objects that will appear in a stream are fairly limited so I don't foresee any immediate issues. Will merge.

github-actions · 2024-10-26T02:53:03Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

jlstevens added 3 commits January 18, 2017 15:57

Added HashGenerator and keyhash function to core.util

166e951

Callable now memoizes on the util.keyhash value of the key

17d93ff

Skipping memoization if key hashing fails

10f6440

jbednar reviewed Jan 18, 2017

View reviewed changes

jlstevens added 7 commits January 18, 2017 20:26

Updated docstring of HashGenerator

3b85f34

Renamed HashGenerator to HashableJSON

a474cac

Renamed core.util.keyhash to deephash

cc655f0

Removed as_string optional argument of deephash

2d6fd2d

Renamed 'key' argument of deephash to 'obj'

78c3953

Simplified deephash definition

c7225fe

Added 17 deephash unit tests

2636055

jbednar reviewed Jan 18, 2017

View reviewed changes

Updated HashableJSON docstring

7ac04a5

jlstevens added 3 commits January 18, 2017 21:23

Fix for Python 3 compatibility

d597fe1

Using repr for pandas objects for Python 3 compatibility

adcc5cb

Removed heterogeneous keys incompatible with Python 3

c1cd64e

philippjfr merged commit facf323 into master Jan 18, 2017

philippjfr deleted the nested_key_hashes branch January 27, 2017 02:53

github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalized hashing of keys for memoization #1074

Generalized hashing of keys for memoization #1074

jlstevens commented Jan 18, 2017 •

edited

Loading

jlstevens commented Jan 18, 2017 •

edited

Loading

philippjfr commented Jan 18, 2017

jbednar Jan 18, 2017

jlstevens Jan 18, 2017

jbednar Jan 18, 2017

jbednar Jan 18, 2017 •

edited

Loading

jbednar Jan 18, 2017

jlstevens Jan 18, 2017

jlstevens Jan 18, 2017

jbednar Jan 18, 2017

jlstevens Jan 18, 2017 •

edited

Loading

jbednar Jan 18, 2017

jlstevens Jan 18, 2017

jlstevens Jan 18, 2017

jbednar Jan 18, 2017

jbednar Jan 18, 2017

jlstevens Jan 18, 2017 •

edited

Loading

jbednar Jan 18, 2017

jlstevens Jan 18, 2017

jlstevens commented Jan 18, 2017

jbednar Jan 18, 2017

jbednar Jan 18, 2017

jbednar commented Jan 18, 2017

jlstevens commented Jan 18, 2017 •

edited

Loading

jbednar commented Jan 18, 2017

philippjfr commented Jan 18, 2017

github-actions bot commented Oct 26, 2024

Generalized hashing of keys for memoization #1074

Generalized hashing of keys for memoization #1074

Conversation

jlstevens commented Jan 18, 2017 • edited Loading

jlstevens commented Jan 18, 2017 • edited Loading

philippjfr commented Jan 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbednar Jan 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlstevens Jan 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlstevens Jan 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlstevens commented Jan 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbednar commented Jan 18, 2017

jlstevens commented Jan 18, 2017 • edited Loading

jbednar commented Jan 18, 2017

philippjfr commented Jan 18, 2017

github-actions bot commented Oct 26, 2024

jlstevens commented Jan 18, 2017 •

edited

Loading

jlstevens commented Jan 18, 2017 •

edited

Loading

jbednar Jan 18, 2017 •

edited

Loading

jlstevens Jan 18, 2017 •

edited

Loading

jlstevens Jan 18, 2017 •

edited

Loading

jlstevens commented Jan 18, 2017 •

edited

Loading