Add as_dict() method to struct #32

Open
kawing-chiu opened this issue Nov 27, 2017 · 7 comments

Comments

@kawing-chiu

namedtuple has the _asdict() method, and pycapnp has to_dict(). I think we should add something equivalent to capnpy that directly converts a struct into an OrderedDict.

@colinfang
Collaborator

What happens if the struct contains an unnamed union or a nested struct?

@colinfang
Collaborator

My colleagues do sometimes find to_dict useful for simple, plain structs. Currently we add the methods via *_extended.py.

@antocuni
Owner

I think it's not so easy to design something that behaves reasonably w.r.t. all the possible combinations of capnproto features. Some random thoughts:

  • unions: do we include all the fields, or just the one which is set? Do we also include a special "which" key?
  • Void fields? Do we include them or not?
  • Schema evolution: what do we do with fields which are not statically known at compile time? Just ignore them?
  • AnyPointer: how to deal with it?
  • groups: are they rendered as nested dicts, or using dotted keys?
  • NULL pointers: if we have a NULL Text, do we render it as None or ""?
  • default Text values: same as above, but in the case the field has a default value

I am sure that depending on the exact use case, you would need slightly different answers to the questions above. So, I am tempted to say that this feature should not be part of the capnpy core, at least for now.
It would be nice to have it as an external library or plugin: then, as @colinfang says, you can easily integrate it into your schema using *_extended.py.

@kawing-chiu
Author

Well...I wrote this without noticing your replies...

I haven't really used these advanced features of capnp yet. Will have a look tomorrow~

@kawing-chiu
Author

kawing-chiu commented Nov 29, 2017

I've investigated the issue a bit more, here are my thoughts:

First of all, this issue is not about converting the whole capnp data structure into native Python types, but about "shallowly" converting a struct to a dict, so nested structs are certainly not converted and most field values don't need to be rendered at all. In general, I think a deep conversion of that kind cannot and should not be done. For example:

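from collections import namedtuple

Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])
o = Object(Dimension(10, 15, 20), 50)
o._asdict()  # shallow: the 'dimension' value is still a Dimension namedtuple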

Will the nested Dimension be converted? No. But users can always choose to do it themselves. Another example:

from types import MappingProxyType
d = {'nested': {'a': 1}, 'b': 2}
m = MappingProxyType(d)
m['nested']  # still a plain dict, not a MappingProxyType

Will m['nested'] become a MappingProxyType? No. But the user can choose to do it with one more line. Also note that namedtuple._asdict() is indifferent to the type of each field: it can be a cffi pointer or whatever.
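As a minimal sketch of the "do it yourself" step in both examples above: asdict_deep is a hypothetical helper name, not part of namedtuple or capnpy, and it only recurses into values that are themselves namedtuples.

```python
from collections import namedtuple
from types import MappingProxyType

Object = namedtuple('Object', ['dimension', 'weight'])
Dimension = namedtuple('Dimension', ['x', 'y', 'z'])


def asdict_deep(value):
    """Hypothetical helper: recursively convert nested namedtuples to dicts."""
    # A namedtuple is a tuple subclass that exposes _fields.
    if isinstance(value, tuple) and hasattr(value, '_fields'):
        return {name: asdict_deep(getattr(value, name)) for name in value._fields}
    return value


o = Object(Dimension(10, 15, 20), 50)
shallow = o._asdict()   # the nested Dimension is left untouched
deep = asdict_deep(o)   # the nested Dimension becomes a dict as well

d = {'nested': {'a': 1}, 'b': 2}
m = MappingProxyType(d)
# The "one more line": wrap the nested mapping explicitly when needed.
nested_view = MappingProxyType(m['nested'])
```

In both cases the shallow behaviour stays the default and the deeper conversion is an explicit, opt-in step taken by the caller.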

Secondly, I don't see how *_extended.py can solve this issue easily. My schema has ~30 fields. Maybe I missed it, but I couldn't find a way to get or iterate over the field names easily. So to write equivalent methods in *_extended.py, I would have to list all the fields manually. This is unacceptable, given that I have already written a .capnp file containing all the relevant information.

Handling data that consists of (possibly nested) dicts/lists of primitive types should cover at least 90% of the usage of a serialization library (which is quite a conservative figure, I would say). I think an API as succinct as possible should be provided for such usage. In our app, the serialization layer has a fixed API: dict <-> bytes, while the serialization lib can be changed. We have tried quite a few libs, and most can do the job in one or two lines.
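A minimal sketch of what such a fixed dict <-> bytes layer might look like. The class names Serializer and JsonBackend are hypothetical, and json is used only as a stand-in backend because it ships with Python; it is not what our app or capnpy actually uses.

```python
import json


class JsonBackend:
    """Stand-in backend; any library offering dumps/loads could replace it."""

    def dumps(self, data: dict) -> bytes:
        return json.dumps(data).encode('utf-8')

    def loads(self, raw: bytes) -> dict:
        return json.loads(raw.decode('utf-8'))


class Serializer:
    """Hypothetical fixed API: the application only ever sees dict <-> bytes."""

    def __init__(self, backend):
        self._backend = backend

    def to_bytes(self, data: dict) -> bytes:
        return self._backend.dumps(data)

    def from_bytes(self, raw: bytes) -> dict:
        return self._backend.loads(raw)


serializer = Serializer(JsonBackend())
payload = {'name': 'Bob', 'tags': ['a', 'b'], 'size': {'x': 1, 'y': 2}}
raw = serializer.to_bytes(payload)
restored = serializer.from_bytes(raw)
```

Swapping the serialization lib then means writing another small backend class, which is only a line or two per direction when the library can produce and consume dicts directly.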

@kawing-chiu
Author

kawing-chiu commented Nov 29, 2017

Given the philosophy above, advanced fields that can normally be retrieved as attributes just work. For example, a group:

>>> mod = capnpy.load_schema('example_group')
>>> Point = mod.Point
>>> p = Point(position=(3, 4), color='red')
>>> p._fields
('position', 'color')
>>> p._asdict()
OrderedDict([('position', <Point.position: (x = 3, y = 4)>), ('color', b'red')])

A named union:

>>> mod = capnpy.load_schema('example_named_union')
>>> Person = mod.Person
>>> p = Person(name='Bob', job=Person.Job(employer='Capnpy corporation'))
>>> p._fields
('name', 'job')
>>> p._asdict()
OrderedDict([('name', b'Bob'), ('job', <Person.job: (employer = "Capnpy corporation")>)])

There might be some corner cases left to handle, most notably unnamed unions. Even with a few exceptions, I think this feature is still very useful. The user can always choose to further process the data as needed.
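As a sketch of such further processing: the outputs above show Text values as bytes, and a caller who wants str could decode them afterwards. decode_text is a hypothetical helper, and the OrderedDict below is written by hand as a stand-in for the proposed _asdict() result.

```python
from collections import OrderedDict


def decode_text(d, encoding='utf-8'):
    """Hypothetical post-processing step: decode bytes values, keep the rest."""
    return OrderedDict(
        (key, value.decode(encoding) if isinstance(value, bytes) else value)
        for key, value in d.items()
    )


# Hand-written stand-in for what p._asdict() is proposed to return.
shallow = OrderedDict([('name', b'Bob'), ('age', 30)])
decoded = decode_text(shallow)
```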

@kawing-chiu
Author

kawing-chiu commented Nov 29, 2017

As for unnamed unions, I propose two possible solutions:

  • Omit unnamed union fields from _fields and _asdict(). This is the simplest one.

  • Include the currently set field of the union. This is arguably the more reasonable one, and closer to the definition of 'union'. For example:

@0x8ced518a09aa7ce3;
struct Shape {
  area @0 :Float64;
  union {
    circle @1 :Float64;      # radius
    square @2 :Float64;      # width
  }
}

>>> s = Shape(area=20, circle=5)
>>> s._fields
# ('area', 'circle')
>>> s._asdict()
# OrderedDict([('area', 20.0), ('circle', 5.0)])

Note that no matter which one is chosen, the user can always choose to process it further.
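The two options can be compared side by side with a plain-Python stand-in. FakeShape, its _plain_fields/_union_fields/_active attributes, and the helper functions are all hypothetical and exist only for illustration; none of this is code generated by capnpy.

```python
from collections import OrderedDict


class FakeShape:
    """Plain-Python stand-in for the Shape struct above."""

    # Assumed layout: ordinary fields, then the members of the unnamed union.
    _plain_fields = ('area',)
    _union_fields = ('circle', 'square')

    def __init__(self, area, **union_member):
        if len(union_member) != 1 or next(iter(union_member)) not in self._union_fields:
            raise ValueError('exactly one union member must be set')
        self.area = float(area)
        # Name of the union member that is currently set.
        self._active = next(iter(union_member))
        setattr(self, self._active, float(union_member[self._active]))


def fields_omit_union(s):
    """Option 1: leave the unnamed union out entirely."""
    return s._plain_fields


def fields_active_member(s):
    """Option 2: include only the union member that is currently set."""
    return s._plain_fields + (s._active,)


def asdict(s, fields):
    return OrderedDict((name, getattr(s, name)) for name in fields)


s = FakeShape(area=20, circle=5)
option1 = asdict(s, fields_omit_union(s))
option2 = asdict(s, fields_active_member(s))
```

With option 2 the set of keys depends on the instance rather than on the type, which is the main trade-off against option 1.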

3 participants