Generalized hashing of keys for memoization #1074
Merged
Changes from 10 commits
14 commits
166e951 (jlstevens) Added HashGenerator and keyhash function to core.util
17d93ff (jlstevens) Callable now memoizes on the util.keyhash value of the key
10f6440 (jlstevens) Skipping memoization if key hashing fails
3b85f34 (jlstevens) Updated docstring of HashGenerator
a474cac (jlstevens) Renamed HashGenerator to HashableJSON
cc655f0 (jlstevens) Renamed core.util.keyhash to deephash
2d6fd2d (jlstevens) Removed as_string optional argument of deephash
78c3953 (jlstevens) Renamed 'key' argument of deephash to 'obj'
c7225fe (jlstevens) Simplified deephash definition
2636055 (jlstevens) Added 17 deephash unit tests
7ac04a5 (jlstevens) Updated HashableJSON docstring
d597fe1 (jlstevens) Fix for Python 3 compatibility
adcc5cb (jlstevens) Using repr for pandas objects for Python 3 compatibility
c1cd64e (jlstevens) Removed heterogeneous keys incompatible with Python 3
@@ -9,6 +9,8 @@
import numpy as np
import param

import json

try:
    from cyordereddict import OrderedDict
except:
@@ -24,6 +26,61 @@
except ImportError:
    dd = None




class HashableJSON(json.JSONEncoder):
    """
    Extends JSONEncoder to generate a hashable string for as many types
    of object as possible including nested objects and objects that are
    not normally hashable. The purpose of this class is to generate
    unique strings that once hashed are suitable for memoization.

    By default JSONEncoder supports booleans, numbers, strings, lists,
    tuples and dictionaries. In order to support other types such as
    sets, datetime objects and mutable objects such as pandas DataFrames
[Inline review comment on the line above: "and custom mutable"]
    or numpy arrays, HashableJSON has to convert these types to
    data structures that can normally be represented as JSON.

    Support for other object types may need to be introduced in
    future. By default, unrecognized object types are represented by
    their id.

    One limitation of this approach is that dictionaries with composite
    keys (e.g. tuples) are not supported due to the JSON spec.
    """
    string_hashable = (dt.datetime,)
    repr_hashable = ()

    def default(self, obj):
        # Called by JSONEncoder only for objects it cannot serialize natively
        if isinstance(obj, set):
            return hash(frozenset(obj))
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        if pd and isinstance(obj, (pd.Series, pd.DataFrame)):
            return sorted(obj.to_dict().items())
        elif isinstance(obj, self.string_hashable):
            return str(obj)
        elif isinstance(obj, self.repr_hashable):
            return repr(obj)
        # Fall back to the built-in hash, then to the object's id
        try:
            return hash(obj)
        except:
            return id(obj)



def deephash(obj):
    """
    Given an object, return a hash using HashableJSON. This hash is not
    architecture, Python version or platform independent.
    """
    try:
        return hash(json.dumps(obj, cls=HashableJSON, sort_keys=True))
    except:
        return None


# Python3 compatibility
import types
if sys.version_info.major == 3:
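To make the approach in the diff easier to follow, here is a minimal standalone sketch of the same idea: subclass `json.JSONEncoder`, override `default` for types JSON cannot represent, and hash the output of `json.dumps` with `sort_keys=True`. The names `SketchEncoder` and `sketch_deephash` are hypothetical and are not part of this PR. The sketch also deviates from the diff on purpose: sets are encoded as a sorted list of element reprs rather than as a frozenset hash, and unrecognized types raise instead of falling back to `id`.

```python
import datetime as dt
import json


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration; not the PR's HashableJSON."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize natively.
        if isinstance(obj, (set, frozenset)):
            # Sort the element reprs so the output ignores set ordering.
            return sorted(repr(item) for item in obj)
        if isinstance(obj, dt.datetime):
            return obj.isoformat()
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


def sketch_deephash(obj):
    """Return a hash of the JSON encoding, or None if encoding fails."""
    try:
        return hash(json.dumps(obj, cls=SketchEncoder, sort_keys=True))
    except (TypeError, ValueError):
        return None


# Same content, different dictionary and set ordering.
a = {"x": [1, 2, {3, 4}], "when": dt.datetime(2017, 1, 1)}
b = {"when": dt.datetime(2017, 1, 1), "x": [1, 2, {4, 3}]}
same = sketch_deephash(a) == sketch_deephash(b)

# Composite (tuple) keys are rejected by json.dumps, so hashing fails.
tuple_keys = sketch_deephash({(1, 2): "value"})

# An arbitrary object is not handled by this sketch either.
unsupported = sketch_deephash({"obj": object()})
```

Here `same` is `True`, while `tuple_keys` and `unsupported` are both `None`, which mirrors the docstring's note about composite keys. Because the built-in `hash` of a string can differ between interpreter runs, values like these are only suitable as in-process cache keys.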
Strictly speaking, it's not the hashed strings that are memoized, it's the actual results (and only implicitly even those, in this case). The hashed strings are useful for memoization, but they themselves are not memoized.
So maybe "The purpose of this class is to generate unique strings that once hashed are suitable for use in memoization and other cases where deep equality must be tested without storing the entire object".
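The distinction drawn in this comment can be sketched as follows: the deep hash is only the lookup key, and the computed results are what actually get cached. This is an illustrative sketch, not the Callable implementation from the PR. `sketch_key` and `MemoizedCall` are hypothetical names, and `sketch_key` uses plain `json.dumps` as a stand-in for `deephash`. When hashing fails, the call simply bypasses the cache, in the spirit of the "Skipping memoization if key hashing fails" commit.

```python
import json


def sketch_key(obj):
    """Hypothetical stand-in for deephash: hash of sorted JSON, or None on failure."""
    try:
        return hash(json.dumps(obj, sort_keys=True))
    except (TypeError, ValueError):
        return None


class MemoizedCall:
    """Caches results of `fn`, keyed on a deep hash of its single argument."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}  # hash -> result; the results are what is memoized
        self.calls = 0   # number of times fn actually ran

    def __call__(self, arg):
        key = sketch_key(arg)
        if key is not None and key in self.cache:
            return self.cache[key]
        self.calls += 1
        result = self.fn(arg)
        if key is not None:
            # Only store a result when the argument could be hashed.
            self.cache[key] = result
        return result


total = MemoizedCall(lambda d: sum(d["values"]))
first = total({"values": [1, 2, 3], "label": "a"})
second = total({"label": "a", "values": [1, 2, 3]})  # same content, reordered keys

# A set is not JSON-serializable here, so these two calls are never cached.
uncached = total({"values": [1, 2, 3], "extra": {1, 2}})
uncached_again = total({"values": [1, 2, 3], "extra": {1, 2}})
```

After these four calls, `total.calls` is 3 and the cache holds a single entry. One caveat of keying on the hash alone is that a hash collision between two different arguments would return the wrong cached result; storing the encoded string itself as the key avoids that at the cost of more memory.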