-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ak.unzip
visits all contents
#2373
Conversation
Huh, hold on a moment, this no longer fixes things for me. Perhaps I regressed something ... |
Codecov Report
Additional details and impacted files
|
Ah, of course, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was supposed to be an approve. (Though the test should be added, if it can be.)
Let me check it with a full nanoevents form. |
Even for my repro this branch doesn't fix the over-touching. |
Ah, excuse me, it has fixed part of it:
It doesn't touch the "crossref"s any more at an early stage but still touches all the jet fields. |
Yes, I'm not super familiar with the purpose yet, but I can see that I wouldn't expect these cases to touch these buffers. |
If you mean the purpose of the "crossref", it could could be any name, the purpose is sort of aside the point. I could call it "cheeseburger" and it would still overtouch the data (so long as cheeseburger is a ListOffsetArray) when I index after a cartesian. |
No, you're fine there! It's really that I don't understand the details of nanoevents just yet, at least, not as a user. |
Ah, that has to do with the oral tradition of how CMS/ATLAS HEP physicists are trained to think about their data. It is highly object oriented and full of relational linkages (leading to lots of structure and arrays that are indexers of other arrays in awkward). It's a pretty decent way to think about the data, and maybe we can improve that with time, but it is hard to tell people to change the way they think in more than one direction at once (and moving to awkward array is already a big jump). Nanoevents accomplishes this by taking columns following a naming convention in some file format (which is usually collections of flat or singly jagged arrays in root files with a particular naming convention) and generating the form (with appropriate form keys to the original arrays) that uses the structure and relationships possible in awkward array to expose the structured / relational nature of collider physics data. The details I think you're wondering about are probably mostly related to cross-cleaning of physics objects and relationships of jets to constituents and such. The implementation details are mostly @nsmith-, if you want to dig very deep. |
This fixes #2372 by arresting recursion at the first record or union. I think this was the original intention, but the codebase was in a lot of flux at this point, and so it was likely missed.
Note that, in #2372 there are additional cases below the
unzip
that might also need addressing. We should raise these as new bugs.