fix: protect Uproot's 'project_columns' from Dask node names. #801

Merged


jpivarski
Member

When computing Dask graphs with operations inside a __getitem__,

da = uproot.dask(skhep_testdata.data_path("uproot-issue-791.root") + ":tree")
da[da.int_branch < 0].compute()

Uproot is receiving names like "less-06b0b18209c65504e8506df9da02f75d" as branch names in project_columns. The strings we get in project_columns should be a subset of the strings in the input set of columns. This PR filters the names down to that subset, but dask-awkward shouldn't be passing these names in the first place.

I'm using a version of dask-awkward from git... 7fe448a1d448fa30f09693be0b14634c7968161a from December 9, 2022.
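To illustrate the filtering described above, here is a minimal standalone sketch. It does not use Uproot or Dask; `known_branches` and `requested` are hypothetical stand-ins for the tree's branch names and for the column names passed to project_columns, and only `int_branch` and the `less-...` name come from the example above.

```python
# Hypothetical stand-in for the branch names the tree actually has.
known_branches = ["int_branch", "float_branch", "str_branch"]

# Hypothetical stand-in for what project_columns receives: one real
# branch name plus a Dask node name that leaked in from the graph.
requested = ["int_branch", "less-06b0b18209c65504e8506df9da02f75d"]

# Keep only the requested names that are real branches,
# preserving the order in which they were requested.
projected = [x for x in requested if x in known_branches]

print(projected)  # ['int_branch']
```

The Dask node name is dropped without raising an error, so the read only touches branches that exist.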

Comment on lines +600 to +604
return _UprootRead(
    self.ttrees,
    [x for x in branches if x in self.branches],
    self.interp_options,
)
Collaborator

This could be a set intersection instead, although the list comprehension is unlikely to be a significant performance bottleneck.
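A rough sketch of the trade-off raised in this comment, assuming both inputs are plain lists of strings. The function names are hypothetical and are not part of Uproot. A plain set intersection loses the request order, so one middle ground is to keep the list comprehension but test membership against a set.

```python
def project_with_list(requested, available):
    # Same shape as the code in the diff: each membership test
    # scans the list of available names.
    return [x for x in requested if x in available]


def project_with_set_lookup(requested, available):
    # Build the set once so each membership test is a hash lookup;
    # the result keeps the order of the requested names.
    available_set = set(available)
    return [x for x in requested if x in available_set]


def project_with_intersection(requested, available):
    # A plain set intersection: same members, but the result is
    # unordered and duplicates are collapsed.
    return set(requested) & set(available)


available = ["a", "b", "c", "d"]
requested = ["d", "less-06b0b18209c65504e8506df9da02f75d", "b"]

print(project_with_list(requested, available))        # ['d', 'b']
print(project_with_set_lookup(requested, available))  # ['d', 'b']
```

For a handful of branches the difference is negligible, which matches the remark that this is unlikely to be a bottleneck.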

@jpivarski merged commit e24bfb7 into main Dec 15, 2022
@jpivarski deleted the jpivarski/protect-uproot-project_columns-from-dask-node-names branch December 15, 2022 17:59
Successfully merging this pull request may close these issues.

Slicing a Dask RecordArray from Uproot raises a key error