Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Serializable class. #62

Merged

Conversation

pipermerriam
Copy link
Member

@pipermerriam pipermerriam commented Apr 12, 2018

What was wrong

The Serializable class was ready for some cleanup.

How was it fixed

  • replaced usage of __setattr_ and __getattr__ with metaclass mechanics and property based setters and getters.
  • removed the cls.exclude API.
  • changed is_mutable to be property based.
  • removed mutation based make_mutable API in favor of a copy based API.
  • implemented __iter__
  • improved __eq__ implementation to only return true on equality of rich values rather than serialized values.
  • added new obj[field_name] and obj[0] based field lookups.
  • removed cls.fields in favor of cls._meta which holds field names, sedes and some extra meta info.
  • reworked __init__ argument handling to be a bit more explicit in how it handles args and kwargs.

@pipermerriam
Copy link
Member Author

@carver @gsalgado ready for high level preliminary review. The collections.namedtuple idea didn't work out as planned due to the way that namedtuple works. It doesn't really create a class that you can properly inherit from (or I was just doing it wrong because of metaclass wizardry).

Copy link
Contributor

@gsalgado gsalgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, and should allow me to get rid of gsalgado/py-evm@28b9f69#diff-18b46e5ecb111d05094ba0820424e8c7R28

}
obj = cls(**dict(list(params.items()) + list(kwargs.items())))

# TODO: mutability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything really left todo here?

raise NotImplementedError("Not yet implemented")

@property
def is_immutable(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need this as we can just inline the not self.is_mutable, no?

is_mutable = True

_cached_rlp_value = None
_cached_rlp_key = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something about having two separate attributes for this that makes me uncomfortable... Did you consider using a dict for that? Maybe _cached_snapshot = {}, or maybe a _get_cached_rlp(snapshot_key) method decorated with a lru_cache

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've changed to use a class level cache. Each subclass of Serializable now has it's own cache which is shared across instances of those objects as well as across both the mutable and immutable versions of those objects.

@pipermerriam pipermerriam changed the title STUB Rework Serializable class. Apr 13, 2018
field = self._meta.field_names[idx]
return getattr(self, field)
else:
raise NotImplementedError("not implemented")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif isinstance(idx, str):
    return getattr(self, idx)

Seems like a straightforward and nice thing to do here.

def __eq__(self, other):
if not hasattr(other, 'serialize'):
return False
return self.serialize(self) == other.serialize(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is enough. An rlp with type (bytes32, bytes32) and one with type (int, int) can be trivially made to be equal here, which feels wrong to me.

Maybe test the deserialized values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Note that this is the current way that rlp.sedes.Serializable determines equality:

pyrlp/rlp/sedes/lists.py

Lines 204 to 208 in 188abd6

def __eq__(self, other):
"""Two objects are equal, if they are equal after serialization."""
if not hasattr(other.__class__, 'serialize'):
return False
return self.serialize(self) == other.serialize(other)

I may go back to using a version of the following:

return len(self) == len(other) and all(
    compare_eq(left, right)
    for left, right
    in zip(self, other)
)

Where compare_eq is a function which handles things like equality between tuples and lists with the same values which a naive == comparison will return False.

@pipermerriam
Copy link
Member Author

pipermerriam commented Apr 16, 2018

Performance Numbers:

This Branch

Block serializations / sec: 1914.65
Block deserializations / sec: 2519.28
TX serializations / sec: 24268.42
TX deserializations / sec: 33170.06

Master @ f30fe1e

Block serializations / sec: 1935.49
Block deserializations / sec: 774.44
TX serializations / sec: 25199.91
TX deserializations / sec: 13675.92

@gsalgado
Copy link
Contributor

For block importing (i.e. Chain.import_block()), we seem to consistently spend more than twice as long in rlp.decode than in rlp.encode_raw, but that's with py-rlp 0.6.0

@pipermerriam pipermerriam force-pushed the piper/rework-serializable-class branch from 66b5d4a to 738d9da Compare April 16, 2018 19:57
@pipermerriam pipermerriam force-pushed the piper/rework-serializable-class branch from 738d9da to fd14033 Compare April 16, 2018 20:01
@pipermerriam
Copy link
Member Author

@gsalgado @carver can I get a final round of 👁 👁

@pipermerriam
Copy link
Member Author

@gsalgado looking at stats, it looks like we have 2x calls to decode than encode which means this could have a significant effect on the amount of time spend dealing width decoding rlp objects since this branch almost doubles the speed of decoding rlp Serializable instances.

@@ -218,7 +222,6 @@ def decode(rlp, sedes=None, strict=True, **kwargs):
obj = sedes.deserialize(item, **kwargs)
if hasattr(obj, '_cached_rlp'):
obj._cached_rlp = rlp
assert not isinstance(obj, Serializable) or not obj.is_mutable()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed in favor of having mutable Serializable instances automatically invalidate their cache.

Copy link
Contributor

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet

@@ -29,185 +71,209 @@ class RLPType2(Serializable):
(RLPType2, [RLPType1(8, b'a', (7, ''))], 'field2_2'),
),
)
def test_validation(rlptype, fields, exception_includes):
def test_serializable_initialization_validation(rlptype, fields, exception_includes):
# TODO: expand tests to combine *args and **kwargs instantiation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to do?

assert rlp_obj.is_mutable is True

# cache should be populated now.
assert encode(rlp_obj, cache=True) == rlp_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to turn on caching by default?

@pipermerriam pipermerriam merged commit e5018ce into ethereum:master Apr 17, 2018
@pipermerriam pipermerriam deleted the piper/rework-serializable-class branch April 17, 2018 02:54
pacrob pushed a commit to pacrob/pyrlp that referenced this pull request Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants