Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ak.unzip visits all contents #2373

Merged
merged 4 commits into from
Apr 7, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Apr 7, 2023

This fixes #2372 by arresting recursion at the first record or union. I think this was the original intention, but the codebase was in a lot of flux at this point, and so it was likely missed.

Note that, in #2372 there are additional cases below the unzip that might also need addressing. We should raise these as new bugs.

@agoose77 agoose77 requested a review from jpivarski April 7, 2023 17:43
@agoose77 agoose77 marked this pull request as draft April 7, 2023 17:49
@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

Huh, hold on a moment, this no longer fixes things for me. Perhaps I regressed something ...

@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Merging #2373 (5f8ac2e) into main (01417bc) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/operations/ak_unzip.py 95.83% <100.00%> (-0.33%) ⬇️

@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

Ah, of course, we None is treated as an implicit recursion. Fixed!

@agoose77 agoose77 marked this pull request as ready for review April 7, 2023 17:53
@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 17:59 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, good!

I thought I'd checked ak.unzip as a possible culprit. Is this the only one, or are there any errors left in #2372 after this?

If this is it, then the test case on #2372 can become a unit test here.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was supposed to be an approve. (Though the test should be added, if it can be.)

@lgray
Copy link
Contributor

lgray commented Apr 7, 2023

Let me check it with a full nanoevents form.

@lgray
Copy link
Contributor

lgray commented Apr 7, 2023

Even for my repro this branch doesn't fix the over-touching.

@lgray
Copy link
Contributor

lgray commented Apr 7, 2023

Ah, excuse me, it has fixed part of it:

(coffea-dev) lgray@Lindseys-MBP awkward % python -i ../coffea/form_madness.py      
ab>>> ['muon_list!', 'jet_list!'] 

dr>>>> ['muon_list!', 'jet_list!', 'muon_eta!', 'jet_eta!', 'muon_phi!', 'jet_phi!'] 

mmin>> ['muon_list!', 'jet_list!', 'muon_eta!', 'jet_eta!', 'muon_phi!', 'jet_phi!'] 

pt>>>> ['muon_list!', 'jet_list!', 'muon_eta!', 'jet_eta!', 'muon_phi!', 'jet_phi!', 'jet_pt!', 'jet_crossref_index!', 'jet_crossref_content!', 'jet_thing1!'] 

It doesn't touch the "crossref"s any more at an early stage but still touches all the jet fields.
I suppose this is what is meant by "cases below unzip".
Let me see if I can back-out unzip and still get the problem.

@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

It doesn't touch the "crossref"s any more at an early stage but still touches all the jet fields.
I suppose this is what is meant by "cases below unzip".
Let me see if I can back-out unzip and still get the problem.

Yes, I'm not super familiar with the purpose yet, but I can see that I wouldn't expect these cases to touch these buffers.

@agoose77 agoose77 enabled auto-merge (squash) April 7, 2023 18:48
@lgray
Copy link
Contributor

lgray commented Apr 7, 2023

If you mean the purpose of the "crossref", it could could be any name, the purpose is sort of aside the point. I could call it "cheeseburger" and it would still overtouch the data (so long as cheeseburger is a ListOffsetArray) when I index after a cartesian.

@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 18:57 — with GitHub Actions Inactive
@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

If you mean the purpose of the "crossref", it could could be any name, the purpose is sort of aside the point. I could call it "cheeseburger" and it would still overtouch the data (so long as cheeseburger is a ListOffsetArray) when I index after a cartesian.

No, you're fine there! It's really that I don't understand the details of nanoevents just yet, at least, not as a user.

@agoose77 agoose77 merged commit ee25546 into main Apr 7, 2023
@agoose77 agoose77 deleted the agoose77/fix-unzip-recursion branch April 7, 2023 18:59
@lgray
Copy link
Contributor

lgray commented Apr 7, 2023

Ah, that has to do with the oral tradition of how CMS/ATLAS HEP physicists are trained to think about their data. It is highly object oriented and full of relational linkages (leading to lots of structure and arrays that are indexers of other arrays in awkward). It's a pretty decent way to think about the data, and maybe we can improve that with time, but it is hard to tell people to change the way they think in more than one direction at once (and moving to awkward array is already a big jump).

Nanoevents accomplishes this by taking columns following a naming convention in some file format (which is usually collections of flat or singly jagged arrays in root files with a particular naming convention) and generating the form (with appropriate form keys to the original arrays) that uses the structure and relationships possible in awkward array to expose the structured / relational nature of collider physics data. The details I think you're wondering about are probably mostly related to cross-cleaning of physics objects and relationships of jets to constituents and such.

The implementation details are mostly @nsmith-, if you want to dig very deep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.unzip touches all contents whilst typetracing
3 participants