-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: named axis for ak.Array
#3238
Conversation
ak.Array
ak.Array
Progressgeneral
broadcasting
slicing
Unary and binary operations
high-level functionsNew:
Can be used with named axis:
Independent of named axis: improvements / bugs found that are fixed by this PR aswell:
|
And all the data types that can be passed into square brackets with |
…xing named axis propagation
…x named axis propagation in indexing for type tracers
…ak.mean; remove inplace addition of arrays from test
…tible with branched structures;fix regularize_axis in all highlevel ops
…ranch_depth using inner_shape property
…at depended on broadcasting to work)
The PR is ready for review @agoose77 and @jpivarski. If you know why the windows tests are failing, please let me know if you have an idea... (I thought I'm sorry that the PR got so huge, but the largest part are tests related to named axes and translating named axes to positional axes in each high-level function. |
@pfackeldey the |
I'd like to see the rendered documentation, but the Deploy Branch Preview jobs are getting skipped. I don't know why: they were running PRs relatively recently. Meanwhile, I'm still going over the rest of the PR. Edit: It's because this is a fork, not a branch. That's okay; the markdown looks good; we'll see the rendered version after merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to review such a large PR, but we went over it at length during development. I think we iterated much more quickly because of that, and so I don't have much to say here, on the PR itself. The spot-check issues below are all minor. Still, we should check in again before the final merger, since everybody working on the codebase needs to be ready to incorporate the update when it comes.
Oh, and I should have mentioned that this is very high quality code (type annotations, docstrings, comments)! Thank you! |
Thank you very much for your review @jpivarski! I'll add your suggestions soon 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small review -- I'll pop back tomorrow.
Proposal for named axis
This PR addresses #2596.
References for other named axis implementations:
Motivation
As argumented at PyHEP.dev 2023 and by the Harvard NLP group in their "Tensor Considered Harmful" write-up, named axis can be a powerful tool to make code more readable and less error-prone.
Design
ak.Array
with named axisNamed axis are implemented through a mapping from named axis to positional axis.
named axis are hashables (currently restricted to strings), except for integers (and None) as they are reserved for positional axis.
By default an
ak.Array
uses positional axis, but named axis can be added to the array in the following ways:The
named_axis
argument of the constructor of anak.Array
is a either tuple ofAxisName
or a dict ofAxisName
to integers.It is stored in the
.attrs
attribute of the array with a reserved key"__named_axis__"
of typedict[AxisName, int]
.The two types of axis can be accessed through the
named_axis
andpositional_axis
property (always represented as a tuple):Named axis in high-level functions
Named axis can be used by all high-level functions, e.g.
ak.sum
,ak.max
, etc.:There are different scenarios how named axis are propagated to the resulting array:
ak.sum(array, axis="jets", keepdims=True)
orarray ** 2
.ak.sum(array, axis="jets")
.ak.Array
or broadcasting:Here, checks for matching named axis are performed, the rules are:
Named axis in indexing
In addition, named axis can be used to select data:
For synthatic sugar use
np.s_
:This PR has to touch a lot of code and needs to add custom named axis propagation to each high-level operation. Thus, this PR is currently in draft mode.
Looking forward to ideas, thoughts, feedback on this effort!