Extend __getitem__ to include jagged and masked arrays in slices. #67

Closed
jpivarski opened this issue Jan 14, 2020 · 8 comments

@jpivarski

jpivarski commented Jan 14, 2020

Relies upon #66.

Follow pyarrow.Array's behavior for slicing with masked arrays (IndexedOptionArray, BitMaskedArray, and eventually ByteMaskedArray).

Will need to extend Slice hierarchy and add jagged and masked cases to Content::getitem_*.

@jpivarski jpivarski self-assigned this Jan 14, 2020
@jpivarski jpivarski added the feature New feature or request label Jan 14, 2020
@nsmith-

nsmith- commented Jan 16, 2020

Putting here an example of the pyarrow behavior:

In [1]: import pyarrow as pa

In [2]: pa.array(range(5))
Out[2]:
<pyarrow.lib.Int64Array object at 0x112289c90>
[
  0,
  1,
  2,
  3,
  4
]

In [3]: pa.array(range(5)).take(pa.array([1, None, 2]))
Out[3]:
<pyarrow.lib.Int64Array object at 0x1122dd130>
[
  1,
  null,
  2
]

@jpivarski

pyarrow doesn't support it, but a logical extension should also do this:

>>> pa.array(range(5)).compress(pa.array([False, True, None, None, True]))
[
   1,
   null,
   null,
   4
]

Of course, "compress" is a terrible name, and pyarrow's compress function does the more logical thing: lossless compression. However, when these are used in __getitem__ without special names like take and compress, the above is what a user would expect.
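For reference, the intended semantics of both cases can be written as a plain-Python model. This is only a sketch of the behavior described in this thread, not pyarrow or awkward code; the function names `take_with_none` and `mask_with_none` are made up for illustration.

```python
def take_with_none(values, indices):
    """Integer-array selection: a None index produces a None result."""
    return [None if i is None else values[i] for i in indices]


def mask_with_none(values, mask):
    """Boolean selection: True keeps the value, False drops it,
    and None emits a None in the output (it is not dropped)."""
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")
    out = []
    for value, keep in zip(values, mask):
        if keep is None:
            out.append(None)
        elif keep:
            out.append(value)
    return out


data = list(range(5))
print(take_with_none(data, [1, None, 2]))
print(mask_with_none(data, [False, True, None, None, True]))
```

The key design point is that a None in the mask is neither True nor False: it contributes a missing value to the output, so the result length is the number of True entries plus the number of None entries.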

@jpivarski

Step 1 is done (in PR #111):

>>> ak.Array(range(5))[ak.Array([1, None, 2])]
<Array [1, None, 2] type='3 * ?int64'>

@jpivarski

Step 2 is done (also in PR #111):

>>> ak.Array(range(5))[ak.Array([False, True, None, None, True])]
<Array [1, None, None, 4] type='4 * ?int64'>

@jpivarski

And all the jagged slices:

>>> array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]])
>>> ak.tolist(array[[[0, -1], [], [], [0, 0, 0], [-1, -2, -3]]])
[[1.1, 3.3], [], [], [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, None, -1], [None], [], [0, None, 0], [-1, -2, -3]]])
[[1.1, None, 3.3], [None], [], [6.6, None, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, -1], None, [], [], None, [0, 0, 0], [-1, -2, -3]]])
[[1.1, 3.3], None, [], [], None, [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, None, -1], None, [None], [], None, [0, 0, 0], [-1, -2, -3]]])
[[1.1, None, 3.3], None, [None], [], None, [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
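The four jagged-index cases above can be reproduced by a small plain-Python model. It is a sketch inferred from the outputs shown here, not the library's implementation, and `jagged_take` is a hypothetical name. The assumption it encodes is that a None at the outer level inserts a None without consuming a row of the array, while a None at the inner level produces a None element.

```python
def jagged_take(array, index):
    """Apply one list of integer indices per row of a list of lists.

    Outer-level None: emits None and does not consume a row.
    Inner-level None: emits None in place of an element.
    """
    n_sublists = sum(1 for sub in index if sub is not None)
    if n_sublists != len(array):
        raise ValueError("need exactly one non-None index list per row")
    rows = iter(array)
    out = []
    for sub in index:
        if sub is None:
            out.append(None)
        else:
            row = next(rows)
            out.append([None if i is None else row[i] for i in sub])
    return out


array = [[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]]
print(jagged_take(array, [[0, -1], [], [], [0, 0, 0], [-1, -2, -3]]))
print(jagged_take(array, [[0, None, -1], None, [None], [], None, [0, 0, 0], [-1, -2, -3]]))
```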

@jpivarski

And jagged mask (almost forgot the most important case!):

>>> ak.tolist(array[[[False, False, True], [], [True, True], [False], [True, False, True]]])
[[3.3], [], [4.4, 5.5], [], [7.7, 9.9]]

@jpivarski

This can also have None:

>>> ak.tolist(array[[[False, False, True], None, [], None, [True, True], [False], [True, False, True]]])
[[3.3], None, [], None, [4.4, 5.5], [], [7.7, 9.9]]

@jpivarski

Getting None values in the inner layer (correctly across jagged boundaries) was more difficult, but it's done now:

>>> ak.tolist(array[[[False, True, None], [None], [None, True], [False], [True, False, True]]])
[[2.2, None], [None], [None, 5.5], [], [7.7, 9.9]]

You can even do them at both levels. :)

>>> ak.tolist(array[[[False, True, None], None, [None], None, [None, True], [False], [True, False, True]]])
[[2.2, None], None, [None], None, [None, 5.5], [], [7.7, 9.9]]

So this issue is closed. The test file tests/test_PR111_jagged_and_masked_getitem.py is much more extensive.
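As a reading aid, the jagged-mask outputs in this thread are consistent with the plain-Python model below. It is a sketch inferred from those outputs, not the library's code, and `jagged_mask` is a hypothetical name. Two assumptions are baked in: an outer-level None emits None without consuming a row, and an inner-level None emits None at its position without reading the row. The second assumption is why the mask `[None]` applied to the empty row `[]` yields `[None]`; the model deliberately does not check inner lengths.

```python
def jagged_mask(array, mask):
    """Apply one boolean mask per row of a list of lists.

    Outer-level None: emits None and does not consume a row.
    Inner-level None: emits None; True keeps row[j]; False drops it.
    """
    n_sublists = sum(1 for sub in mask if sub is not None)
    if n_sublists != len(array):
        raise ValueError("need exactly one non-None mask list per row")
    rows = iter(array)
    out = []
    for sub in mask:
        if sub is None:
            out.append(None)
            continue
        row = next(rows)
        picked = []
        for j, keep in enumerate(sub):
            if keep is None:
                picked.append(None)      # missing value, row is not read
            elif keep:
                picked.append(row[j])    # positional match with the row
        out.append(picked)
    return out


array = [[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]]
print(jagged_mask(array, [[False, True, None], None, [None], None,
                          [None, True], [False], [True, False, True]]))
```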
