ak.Array constructor from {"key": whole_array} #156

Closed
tamasgal opened this issue Mar 10, 2020 · 5 comments
Labels
duplicate This issue or pull request already exists

Comments

@tamasgal

In one of my analysis frameworks I introduced a Table class, which is a thin wrapper around numpy.recarray and also has a constructor that takes a dictionary as input. It became quite popular among our users because it makes it quick to create type-safe 2D recarrays. Here is an example:

In [1]: import km3pipe as kp

In [2]: t = kp.Table({'a': [1,2,3], 'b': [4.5, 6.7, 8.9], 'c': False})

In [3]: t
Out[3]: Generic Table <class 'km3pipe.dataclasses.Table'> (rows: 3)

In [4]: print(t)
Generic Table <class 'km3pipe.dataclasses.Table'>
HDF5 location: /misc (no split)
<i8 (dtype: a) = [1 2 3]
<f8 (dtype: b) = [4.5 6.7 8.9]
|b1 (dtype: c) = [False False False]

In [5]: t.dtype
Out[5]: dtype((numpy.record, [('a', '<i8'), ('b', '<f8'), ('c', '?')]))

In [6]: t[0]
Out[6]: (1, 4.5, False)

In [7]: type(t[0])
Out[7]: numpy.record

In [8]: t[0].b
Out[8]: 4.5

In [9]: t[1:3]
Out[9]: Generic Table <class 'km3pipe.dataclasses.Table'> (rows: 2)

The inspiration came from pandas, which offers a similar constructor:

In [5]: import pandas as pd

In [6]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': True})

In [7]: df
Out[7]:
   a  b     c
0  1  4  True
1  2  5  True
2  3  6  True

With Awkward Array, the corresponding constructor only does a "for element in dict"-like iteration, which in Python boils down to iterating over the dictionary's .keys(). This might confuse people who have already worked with pandas.DataFrame or similar classes. For the sake of completeness, here is what we get when passing a dictionary (no surprise for us, but I post it here for future reference):

In [1]: import awkward1 as ak

In [2]: arr = ak.Array({'a': [1,2,3], 'b': [4,5,6,7,8]})

In [3]: arr
Out[3]: <Array ['a', 'b'] type='2 * string'>

In [4]: arr.a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-7be3c5184803> in <module>
----> 1 arr.a

~/Dev/awkward-1.0/awkward1/highlevel.py in __getattr__(self, where)
     95                     raise AttributeError("while trying to get field {0}, an exception occurred:\n{1}: {2}".format(repr(where), type(err), str(err)))
     96             else:
---> 97                 raise AttributeError("no field named {0}".format(repr(where)))
     98
     99     def __dir__(self):

AttributeError: no field named 'a'
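The result above follows from how Python itself iterates over a dict. A minimal plain-Python sketch (no awkward1 involved), assuming the constructor simply consumes whatever iteration over its argument yields:

```python
# Plain-Python illustration: iterating over a dict yields its keys,
# so a constructor that just iterates its argument only ever sees the keys.
data = {"a": [1, 2, 3], "b": [4, 5, 6, 7, 8]}

seen = [element for element in data]  # equivalent to list(data.keys())
print(seen)
```

The values never enter the picture, which is why the resulting array has two string entries and no field named 'a'.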

With pandas it'

Of course, the DataFrame and Table classes are mostly simple 2D tables that require equal lengths for every field (which is why they, e.g., auto-repeat-expand single-value attributes), but I think this constructor would be a handy addition to awkward.Arrays, especially since it accepts awkward data shapes and layouts 😉
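To make the "auto-repeat-expand" behaviour concrete, here is a rough plain-Python sketch of the idea. The helper name expand_columns is hypothetical and this is not the actual km3pipe or pandas implementation; it only models the rule that scalars are repeated to the common column length and that unequal lengths are rejected:

```python
from collections.abc import Sized


def expand_columns(columns):
    """Broadcast scalar values to the common column length (hypothetical helper)."""

    # Strings are treated as scalars, not as sequences of characters.
    def is_column(value):
        return isinstance(value, Sized) and not isinstance(value, (str, bytes))

    lengths = {len(v) for v in columns.values() if is_column(v)}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    # If only scalars were given, fall back to a single row.
    n = lengths.pop() if lengths else 1
    return {k: list(v) if is_column(v) else [v] * n for k, v in columns.items()}


table = expand_columns({"a": [1, 2, 3], "b": [4.5, 6.7, 8.9], "c": False})
print(table["c"])
```

Under this rule the scalar False becomes a column of three values, matching the Table output shown earlier, while an input such as {'a': [1,2,3], 'b': [4,5,6,7,8]} raises an error instead of being padded.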

What do you think? I am not sure I will find the time to come up with a solid PR before 1.0.

@jpivarski
Member

jpivarski commented Mar 10, 2020

Actually, I was just thinking about getting to work on #77, the ak.zip method, which might be what you're talking about here. That one would be a good first issue.

The signature

ak.Array({'a': [1,2,3], 'b': [4,5,6,7,8]})

should be forbidden, since the data between parentheses is actually a Record. Let's call that a bug-fix: it shouldn't turn a dict into an array of its keys. The error message should probably point out that if you want a record, you can build a record:

ak.Record({'a': [1,2,3], 'b': [4,5,6,7,8]})

but that constructor doesn't invoke fromiter yet. That's explicitly a FIXME:

https://github.com/scikit-hep/awkward-1.0/blob/e60faf1bf1815b44124a71f444c969194118481d/src/awkward1/highlevel.py#L173-L186

@jpivarski jpivarski changed the title Constructor with dictionaries (recarray-like) ak.Array constructor from {"key": whole_array} Mar 10, 2020
@jpivarski jpivarski added the duplicate This issue or pull request already exists label Mar 10, 2020
@jpivarski
Member

I'm marking this as a duplicate because it's very similar to what the ak.zip function (#77) should provide. It should be a different function from the ak.Array constructor because it's doing a very different thing with the data it is given.

@tamasgal
Author

Yes I see, thanks for the feedback!

@jpivarski
Member

I'm going to open a PR to handle these constructor issues. I'm not sure how much it will try to address, but there will be a place to look.

@jpivarski
Member

I really should have linked these issues to the PR. The signature you were asking for is available in ak.zip, but it's not an ak.Array signature.

>>> import awkward1 as ak
>>> array = ak.zip({"x": ak.Array([[1, 2], [], [3]]), "y": ak.Array([1.1, 2.2, 3.3])})

>>> print(array)
[{x: [1, 2], y: 1.1}, {x: [], y: 2.2}, {x: [3], y: 3.3}]

>>> ak.typeof(array)
3 * {"x": var * int64, "y": float64}

Since Python lists are "array-like," they can be used for small examples:

>>> array = ak.zip({"x": [[1, 2], [], [3]], "y": [1.1, 2.2, 3.3]})

>>> print(array)
[{x: [1, 2], y: 1.1}, {x: [], y: 2.2}, {x: [3], y: 3.3}]

>>> ak.typeof(array)
3 * {"x": var * int64, "y": float64}
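For readers who think in terms of Python objects, the zip operation shown above can be modelled roughly as follows. This is only a plain-Python sketch with a hypothetical helper name, zip_columns; it assumes equal-length columns zipped at the outermost level and says nothing about how ak.zip is implemented on its columnar data:

```python
def zip_columns(columns):
    """Combine equal-length columns into a list of per-entry records (hypothetical helper)."""
    names = list(columns)
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # Take the i-th entry of every column and label it with the column name.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


records = zip_columns({"x": [[1, 2], [], [3]], "y": [1.1, 2.2, 3.3]})
print(records)
```

The key point is the change of orientation: the input is one dict of whole columns, the output is one record per entry, which is a different operation from converting a Python object into an array as-is.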

It would be confusing to allow this syntax in the ak.Array constructor because that constructor's argument is interpreted as the inverse of ak.tolist, so a dict means a single record:

>>> not_array = ak.Record({"x": [[1, 2], [], [3]], "y": [1.1, 2.2, 3.3]})
>>> print(not_array)
{x: [[1, 2], [], [3]], y: [1.1, 2.2, 3.3]}
>>> ak.typeof(not_array)
{"x": var * var * int64, "y": var * float64}

>>> ak.Array({"x": [[1, 2], [], [3]], "y": [1.1, 2.2, 3.3]})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/highlevel.py", line 34, in __init__
    raise TypeError("could not convert dict into an awkward1.Array; try awkward1.Record")
TypeError: could not convert dict into an awkward1.Array; try awkward1.Record
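The guard behind a traceback like the one above can be sketched in a few lines. The function name check_array_input and the exact message are hypothetical, chosen only to illustrate the design of rejecting a dict early with a hint rather than silently iterating over its keys:

```python
def check_array_input(data):
    """Reject a dict with a pointer to the record type (hypothetical helper)."""
    if isinstance(data, dict):
        # A dict describes one record, not a sequence of entries.
        raise TypeError("could not convert dict into an array; try a record instead")
    return list(data)


entries = check_array_input([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
print(len(entries))
```

A list of dicts passes through as an array of records, whereas a bare dict fails immediately with a message that points to the alternative.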
