-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RLP fields as proper object attributes. #44
Comments
Actually, overwriting a parent's fields attribute in a rlp.Serializable does not work currently: #45 |
This issue now has a funding of 0.4 ETH (413.69 USD) attached to it.
|
If you are interested in claiming this bounty please submit a proposal here on this issue as well as asking any questions you may have prior to claiming the bounty. I'd like to avoid:
|
Brute force way for starters, unless I am misunderstanding the requirements: class sedesBaseClass:
pass # Whatever base class all fields have in common
class rlpSerializeable:
# Generate fields on demand
@property
def fields(self):
# I know there's a way to make this method a tuple generator
ls = []
for field in dir(self):
if field is 'fields':
continue # Avoid infinite recursion
val = getattr(self, field)
if isinstance(val, sedesBaseClass):
ls.append((field, val))
return tuple(ls)
# or whatever function serializes the object
def serialize(self):
return self.fields
class A(rlpSerializeable):
field_a = sedesBaseClass()
class B(A):
field_b = sedesBaseClass()
class C(A):
field_c = sedesBaseClass()
if __name__ == '__main__':
a_fields = A().serialize()
assert a_fields[0][0] == 'field_a'
b_fields = B().serialize()
assert b_fields[0][0] == 'field_a'
assert b_fields[1][0] == 'field_b'
c_fields = C().serialize()
assert c_fields[0][0] == 'field_a'
assert c_fields[1][0] == 'field_c'
print("Works!") |
👍 for including backwards compatability with the However, I believe this fails at the class level. |
Can you provide a counter example that shows that? I'm not sure I understand what you mean. Do these classes dynamically add/remove fields? Also, wouldn't it be able to get these fields at the classes level because they're defined as class variables? I'm not too familiar with Django to know the idea you described offhand. |
Actually, while working up an example to show you, it exposed some even deeper problems with your approach (hint, the |
Another issue: RLP is order-dependent. What is the right order? I assume it's that the super-class fields all come first, in definitional order. Then subclasses, in defined-order. Can we even retrieve attributes in definitional order?
|
Haha, me too. I just looked at the issue at the brute force way seemed like a good enough start. Someone else can build off this if don't get a chance to revisit. |
@carver looking at the docs for python3, it looks like metaclasses can maintain field ordering. import collections
class OrderedClass(type):
@classmethod
def __prepare__(metacls, name, bases, **kwds):
return collections.OrderedDict()
def __new__(cls, name, bases, namespace, **kwds):
result = type.__new__(cls, name, bases, dict(namespace))
result.members = tuple(namespace.keys())
result.base_members = [tuple(base.members) for base in bases if hasattr(base, 'members')]
return result
class A(metaclass=OrderedClass):
def one(self): pass
def two(self): pass
def three(self): pass
def four(self): pass
class B(A):
def five(self): pass
>>> A.members
('__module__', '__qualname__', 'one', 'two', 'three', 'four')
>>> A.base_members
[]
>>> B.members
('__module__', '__qualname__', 'five')
>>> B.base_members
[('__module__', '__qualname__', 'one', 'two', 'three', 'four')] With that information, we would be able to maintain field ordering within a class, as well as treating new fields in subclasses as they should be appended by default with
That gets us most of the way there. I think the only thing we need to allow full control over field ordering is a way to dictate where in the previous class's fields a new field should go assuming they don't want to use the default ordering. Two options come to mind for dealing with this and we might want to do both (or some other better idea). Support
|
Fixed the brute force case so that it recurses in Inheritance order. This method only works if there is not dynamic modification of the fields, which I think is true based on using class variables class sedesClass:
pass # Whatever class(es) needed
def get_fields(cls):
fields = []
if cls is rlpSerializeable:
return tuple() # End recursion
for parent in cls.__bases__:
fields.extend(get_fields(parent))
# Fields are either empty tuple or tuple of tuples
if len(cls._fields) > 0 and isinstance(cls._fields[0], tuple):
fields.extend(list(cls._fields))
else:
fields.append(cls._fields)
return fields
class rlpSerializeable:
@property
def fields(self):
return get_fields(self.__class__)
# or whatever function serializes the object
def serialize(self):
return tuple(self.fields)
class A(rlpSerializeable):
_fields = (
('field_a', sedesClass)
)
class B(A):
_fields = (
('field_b', sedesClass)
)
class C(A):
_fields = (
('c_field', sedesClass),
('another_field', sedesClass)
)
class D(C):
pass # No fields
if __name__ == '__main__':
a_fields = A().serialize()
assert a_fields[0][0] == 'field_a'
b_fields = B().serialize()
assert b_fields[0][0] == 'field_a'
assert b_fields[1][0] == 'field_b'
c_fields = C().serialize()
assert c_fields[0][0] == 'field_a'
assert c_fields[1][0] == 'c_field'
assert c_fields[2][0] == 'another_field'
d_fields = D().serialize()
assert d_fields[0][0] == 'field_a'
assert d_fields[1][0] == 'c_field'
assert d_fields[2][0] == 'another_field'
print("Works!") |
Hah, cool that this is literally the textbook example for metaclasses.
I'm less keen on this style, because the ordering information doesn't semantically belong in the sedes
This seems cleaner and more explicit, with a bonus of being familiar to anyone with Django experience. |
To summarize:
|
Alternative proposal: Keep the class Parent(rlp.Serializable):
fields = [
('a', big_endian_int),
('b', big_endian_int),
]
class Child1(Parent):
""""Appends a field"""
fields = parent_fields + [
('c', big_endian_int)
]
class Child2(Parent):
"""Prepends a field"""
fields = [
('z', big_endian_int)
] + parent_fields
class Child3(Parent):
""""Replaces all fields""""
fields = [
('x', big_endian_int)
] To make creating class Child4(Parent):
""""Replaces some field"""
fields = replace_fields(parent_fields, [
('a', binary)
])
class Child5(Parent):
""""Inserts a field.""""
fields = insert_fields_after(parent_fields, 'a', [
('a2', big_endian_int),
])
class Child6(Parent):
""""Remove a field."""
fields = remove_fields(parent_fields, [
'a',
] Additionally, one could replace by setting an additional field: class Child6(Parent):
""""Replace some fields.""""
replaced_fields = [
('a', binary)
] If we turn Implementation would be straightforward: class SerializableMeta(type):
@classmethod
def __prepare__(metacls, name, bases):
namespace = {}
# look for `fields` in bases (if there are multiple, either raise an error or take the first)
# if found set `namespace['parent_fields'] = bases[0].fields`
return namespace
def __new__(cls, name, bases, namespace):
# look for `'replaced_fields'` in namespace
# if found replace fields in `namespace['fields']` accordingly
return result Obviously, this doesn't use Django-style field properties. So if that's a hard requirement, this approach wouldn't work. I personally prefer the old style though (because I find it more explicit and clearer), but I'm probably biased in that regard. |
My intent with this idea was to make overriding existing fields easy. I didn't at the time consider the field ordering issue which makes the solution less perfect, however, I think that the solutions that are currently on the table to manage ordering are good-enough. I also think that it'd be fine to do something the way you propose with some helpers. I don't like the solution as much because it feels less elegant but elegance isn't a hard requirement. The field class approach loses some of its elegance when futzing with field ordering comes into play so that's one mark against it, however, the helper functions that operate on the |
If a second RLP renames or reorders fields of a first, then I think the second RLP is not a strict subclass of the first, and we probably shouldn't encourage treating it as one. Instead, we might do something like COMMON_FIELDS = (
('a', big_endian_int),
('b', big_endian_int),
)
class Serial1(rlp.Serializable):
_fields = COMMON_FIELDS
class SerialChild(Serial1):
"""
Appends a field. All the parent fields are present, in the same order.
"""
_fields = (
('c', big_endian_int),
)
class Serial2(rlp.Serializable):
"""Prepends a field"""
_fields = (
('z', big_endian_int),
) + COMMON_FIELDS
class Serial3(rlp.Serializable):
""""Replaces all fields""""
_fields = (
('x', big_endian_int),
) I agree that the Django-style approach is more elegant, especially for single RLP class definitions. Since I think single explicitly-defined RLP classes are the common case (and maybe we should encourage them to be), I prefer the Django-style approach. class Serial1(rlp.Serializable):
a = big_endian_int
b = big_endian_int
class SerialChild(Serial1):
"""
Appends a field. All the parent fields are present, in the same order.
"""
c = big_endian_int
class Serial2(rlp.Serializable):
"""Prepends a field"""
z = big_endian_int
a = big_endian_int
b = big_endian_int
class Serial3(rlp.Serializable):
""""Replaces all fields""""
x = big_endian_int That little bit of repetition on |
The more I think about it, the more I think that RPL object inheritance for the purpose of changing fields won't be a major use case. This is based on how things are setup in Py-EVM. One option we haven't considered is requiring that either all fields be overridden in subclasses or none at all. Basically, rather than having a subtle API for ordering fields that we understand but is probably not intuitive, we can instead raise an exception during class construction if not all of the fields were overridden. This forces explicit field ordering in a manner that should make all RLP objects easy to know what the field ordering is since you just have to trace up the inheritance chain to the first object which defines any fields. |
👍 I'm in favor of this |
I've updated the main issue to reflect what I believe is the current consensus for how this should be implemented. |
@pipermerriam Is this one you still want to be worked? I'll resend it out on Slack / Twitter, if so |
@vs77bb yes, this is an issue that would be a big nice-to-have improvement to the python RLP API. |
hey @pipermerriam, we are in the middle of a migration between smart contracts and in order to migrate away from all the old code paths, we are cleaning up some of the straggler bounties. we're sorry this one didnt work out for ya. do you want to kill the bounty here to recover your funds? it'll take only 30 seconds... and itll mean you get that 0.4 ETH back. any questions, let me know. |
happy to put a bounty up for ya if you want. is it worth creating a clean issue ? |
@pipermerriam I was going through this issue and wanted to take a shot at it. Is this issue still up (or) has been resolved in the past? If it's still up, the whole issue at this point of time is boiled down to just Implement |
I don't think it's worth prioritizing over other efforts. Maybe do it in the SSZ repo instead and we can backport the implementation to pyrlp? Probably of higher value in the SSZ repo and lower risk. |
I feel that a solution using This is how we define a class Base(rlp.Serializable):
fields = (
('field_a', sedes.Binary),
)
class InheritsFromBase(MyBaseObject):
fields = (
('field_a', sedes.Binary),
('another_field', sedes.Binary),
) This is the API I propose: @rlp.serializable
class Base:
field_a: sedes.Binary
@rlp.serializable
class InheritsFromBase(MyBaseObject):
another_field: sedes.Binary From what I have understood so far, we have the following requirements / constraints with such classes:
Proposed solutionIf it's not obvious already from the proposed API, I intend to use CaveatsField orderingAccording to the Python docs, from dataclasses import dataclass
from rlp import sedes
@dataclass
class Base:
a: sedes.Binary
b: sedes.Binary
@dataclass
class Inherited(Base):
z: sedes.Binary
a: sedes.BigEndianInt
>>> Base.__dataclass_fields__.keys()
dict_keys(['a', 'b'])
>>> Inherited.__dataclass_fields__.keys()
dict_keys(['a', 'b', 'z']) This could be a deal breaker if:
Python 3.7 only
The alternativeProvided that we're okay with a third-party dependency, we can solve both the caveats mentioned above, by using attrs. See example below: import attr
from rlp import sedes
@attr.s
class Base:
a: sedes.Binary = attr.ib()
b: sedes.Binary = attr.ib()
@attr.s
class Inherited(Base):
z: sedes.Binary = attr.ib()
a: sedes.BigEndianInt = attr.ib()
>>> [each.name for each in Base.__attrs_attrs__]
['a', 'b']
>>> [each.name for each in Base.__attrs_attrs__]
['b', 'z', 'a'] ConclusionI'll conclude with two questions:
This is also a good opportunity to remove some of the black magic in this file. |
I'm not strictly against a decorator API, but in my opinion defining methods in a metaclass is the cleaner approach than monkey patching them in a decorator. I agree that I don't like the way inheritance and field ordering is handled in both |
In my experience, meta-programming produces code that's hard to debug, less readable, and often difficult to extend. Some of these problems are solved to a great extent by
Personally, I would prefer an explicit class MyBaseObject(rlp.Serializable, metaclass=Wizardy): # same as @Wizardry decorator
field_a = sedes.Binary() over this: class MyBaseObject(rlp.SerializableClassThatUsesMetaclassWizardy):
field_a = sedes.Binary() because there are less number of clever things happening behind the scenes, and there is more transparency to someone extending RLP classes. The fact that an inherited class can change the very structure of the derived class seems dangerous to me.
Can you please help me understand what an ideal inheritance behaviour would look like? |
Ideally, order and set of fields is as easy as possible to define and to determine. I'm not certain that it's the ideal solution, but I like what Piper suggested above:
I guess this would be quite straightforward to implement in both the metaclass and the third party library approach. |
My intuition is that using I think we could spin/bikeshed on design/approach for a while here and not really get anywhere because there's a lot of subjectivity as well as different angles to evaluate each approach. I'd advocate for you to do an MVP implementation of whatever approach you think is best (taking into consideration all of the things that have been discussed in this issue) at which point we can evaluate something more concrete. |
Also, I am still reasonably strongly of the opinion that a metaclass approach is going to be adequate, maintainable, and intuitive enough. |
👍 I'm still in favor of this |
…n-docs Replace web3 reference with <MODULE_NAME>
What is the problem
RLP objects are not very good at inheritance.
There is no good pythonic way to extend a base class's fields.
How can we fix it.
Metaclass wizardry!
Much the same way that django models perform magical metaclass voodoo, we can do the same. Implement a metaclass for use that finds all attributes which are valid
sedes
types and does all the fancy stuff that happens within the__init__
method ofSerializable
. In general, doing away withSerializable.fields
seems like a nice cleanup.The example above would look something like:
Implementation specifics.
Field class
We need a
Field
class which we can use to define RLP fields. This should be a thin wrapper of sorts around arlp.sedes
object. Ideally, we can have some sort of thin shortcut for converting asedes
object to aField
object using something like a classmethodField.from_sedes(...)
Metaclass
Unless an alternate solution is presented, the
rlp.Serializable
class needs to be converted to use a metaclass. This metaclass will need to introspect the current fields as well as all of the fields for the base class to construct the class.I suggest this part be done in two steps.
The first step would be to have the metaclass construct an
rlp.Serializable
instance with the properfields
property, and to allow the existing internals ofrlp.Serializable
to populate the actual attribute setters and getters. Under this approach accessing the field properties from the class would likely result in anAttributeError
as they would not be constructed until class instantiation.The second step is to invert this, and have the metaclass (or the underlying
Field
class) create these attributes at the class level, likely using descriptors (which theField
class could implement). Then, thefields
property on the class would only be present for backwards compatibility.Field ordering
The field ordering for RLP objects is important. Luckily, this is easy to do using metaclasses.
https://docs.python.org/3/reference/datamodel.html#metaclass-example
However, this gets complicated in cases of inheritance. From the discussion on this issue, we've decided that the best solution to fixing this is to required that either all fields or zero field be overridden when a class is sub classed.
Documentation
The documentation will need to be updated to reflect this new API.
The text was updated successfully, but these errors were encountered: