Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RLP fields using annotated class attributes #109

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

onyb
Copy link

@onyb onyb commented Mar 17, 2019

What was wrong?

See #44

How was it fixed?

The feature and the desired behavior has been discussed at length in #44. The implementation in this PR is based on PEP 526 -- Syntax for Variable Annotations.

In the SerializableBase metaclass, I am inspecting the __annotations__ attribute of RLP classes to filter Sede objects, and pass them as fields while constructing the RLP class.

Important notes

  • Duplicate definitions of RLP fields do not raise a TypeError anymore. For example, the following RLP class definition is considered valid:
    class Txn(rlp.Serializable):
        sender: sedes.text
        receiver: sedes.text
        sender: sedes.text
        amount: big_endian_int
    The (ordered) field names are ("sender", "receiver", "amount",).
  • New syntax doesn't work on Python 3.5.
    Because the syntax for annotated class attributes is not supported in Python < 3.6.
  • The new syntax also fails on PyPy3, since it's based on Python 3.5. PyPy3.6 is currently in beta.
  • We can merge and release this PR nonetheless since the code is backwards compatible.

Cute Animal Picture

@carver
Copy link
Contributor

carver commented Mar 23, 2019

Dropping support for py3.5 and pypy3 is not backwards compatible. It would require a major version bump. (I'm not necessarily opposed to that, but it does come with costs)

Why allow duplicate definitions of attributes? That seems quite surprising to me.

@onyb
Copy link
Author

onyb commented Mar 24, 2019

Dropping support for py3.5 and pypy3 is not backwards compatible. It would require a major version bump. (I'm not necessarily opposed to that, but it does come with costs)

We won't need to drop support for py3.5 or pypy3 in the code. It's just that if you're using the new annotation syntax for defining RLP fields, you must use Python 3.6.

Why allow duplicate definitions of attributes? That seems quite surprising to me.

I don't understand this remark. Could you please give an example?

@carver
Copy link
Contributor

carver commented Mar 27, 2019

We won't need to drop support for py3.5 or pypy3 in the code. It's just that if you're using the new annotation syntax for defining RLP fields, you must use Python 3.6.

Okay, cool. I'll do a deeper review when py3.5 tests are succeeding then 👍

Why allow duplicate definitions of attributes? That seems quite surprising to me.

I don't understand this remark. Could you please give an example?

Yeah, I was referring to:

  • Duplicate definitions of RLP fields do not raise a TypeError anymore. For example, the following RLP class definition is considered valid:
    class Txn(rlp.Serializable):
        sender: sedes.text
        receiver: sedes.text
        sender: sedes.text
        amount: big_endian_int

@pipermerriam
Copy link
Member

Something that I think needs to be addressed before this can be considered is constricting the definition of what a field is. Currently this library (and pull request) has used hasattr checks. I see this becoming an issue in the near-term future with the introduction of SSZ and the py-ssz library and the potential to accidentally mix-and-match SSZ/RLP primatives. py-ssz uses an isinstance approach which should solve this on that side, but I think that an SSZ object could sneak into py-rlp and it either would silently work or if it did error, I'd expect the error to be confusing.

I'd consider the cleanup of converting all sedes to inherit from a base class and getting all the hasattr checks out of this codebase to be a blocker for this.

@jannikluhn
Copy link
Contributor

Something that I think needs to be addressed before this can be considered is constricting the definition of what a field is.

This may be getting somewhat off topic, but in addition to what you said we should also distinguish between sedes (something that serializes/deserializes) and field (some kind of attribute whose value will be serialized). I thought a while ago about this and found descriptors which seem like the right tool to implement fields. The main utility we'd get from using them is proper type hinting support, at least for attribute access.

@onyb
Copy link
Author

onyb commented Mar 27, 2019

We won't need to drop support for py3.5 or pypy3 in the code. It's just that if you're using the new annotation syntax for defining RLP fields, you must use Python 3.6.

Okay, cool. I'll do a deeper review when py3.5 tests are succeeding then

Okay. I think I just need to find a clever way of skipping the (parametrized) test cases that use annotation syntax on py3.5.

Why allow duplicate definitions of attributes? That seems quite surprising to me.

I don't understand this remark. Could you please give an example?

Yeah, I was referring to:

  • Duplicate definitions of RLP fields do not raise a TypeError anymore. For example, the following RLP class definition is considered valid:
    class Txn(rlp.Serializable):
        sender: sedes.text
        receiver: sedes.text
        sender: sedes.text
        amount: big_endian_int

I didn't find a good way to circumvent this. Python stores the annotations in a dictionary, so only one of the fields will actually get considered. The following snippet shows the observed behavior:

class Txn(rlp.Serializable):
    sender: sedes.text
    receiver: sedes.text
    sender: sedes.text
    amount: big_endian_int
>>> Txn.__annotations__

{'sender': big_endian_int, receiver: sedes.text, amount: big_endian_int}

Note that in terms of order, sender comes first, but it takes the annotation as specified in the second occurrence.

@jannikluhn
Copy link
Contributor

Duplicate definitions of RLP fields do not raise a TypeError anymore.

I think this is less of an issue with class attributes compared to the fields list as mypy would complain about this (whereas it doesn't for the old fields approach). So in my opinion we could drop this requirement.

@pipermerriam
Copy link
Member

So in my opinion we could drop this requirement.

I think this just comes down to syntax and language constraints.

I think that the following two produce identical AST.

class A:
    field_a: int
    field_b: str
    field_a: bool

class B:
    field_b: str
    field_a: bool

So as far as I can tell we are unable to enforce this restriction since as far as python is concerned, there is not a duplicate field. It's only our visual inspection of the code makes it appear the field is duplicated when in fact the second declaration is overwriting the first at the lowest level of the language.

So 👍 for having a test that shows that this in fact works as expected and as long as we're enforcing the uniqueness of fields at the internal levels I'm 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants