Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading DAOD_PHYSLITE prototype #123

Closed
nikoladze opened this issue Oct 6, 2020 · 8 comments · Fixed by #124
Closed

Reading DAOD_PHYSLITE prototype #123

nikoladze opened this issue Oct 6, 2020 · 8 comments · Fixed by #124

Comments

@nikoladze
Copy link
Contributor

nikoladze commented Oct 6, 2020

It might make sense to discuss this here. I'm trying to read as much as possible from the current prototype for the DAOD_PHYSLITE format at ATLAS using uproot. The following uses these example root files: DAOD_PHYSLITE.art.pool.root and DAOD_PHYSLITE.art_split99.pool.root. I'm using uproot4 0.0.27. The things i currently can't read are the following:

  • Links into multiple other collections (branches whoose names end in "Links"). On the one hand those are vector<vector<...> and on the other hand a custom type (ElementLink<DataVector<something>>). Uproot can read these, but i get some "Unknown" objects out:
>>> import uproot4
>>> f = uproot4.open("DAOD_PHYSLITE.art.pool.root")
>>> t = f["CollectionTree"]
>>> arrays = t["AnalysisElectronsAuxDyn.trackParticleLinks"].array(library="np")
>>> arrays[8]
<STLVector [[<Unknown ElementLink<DataVector<xAOD::TrackParticle_v1_3e at 0x7fa30578b190>, ...], ...] at 0x7fa30578b2b0>

In principle the data part of these ElementLinks are just an index and a hash to identify the collection linked to. This can be seen for the branches that are just vector<ElementLink<...>> (single jagged). ROOT seems to be able to split these for the single-jagged case:

>>> t["AnalysisJetsAuxDyn.btaggingLink"]
<TBranchElement 'AnalysisJetsAuxDyn.btaggingLink' (2 subbranches) at 0x7fa304c861c0>
>>> t["AnalysisJetsAuxDyn.btaggingLink"].keys()
['AnalysisJetsAuxDyn.btaggingLink.m_persKey', 'AnalysisJetsAuxDyn.btaggingLink.m_persIndex']
>>> t["AnalysisJetsAuxDyn.btaggingLink.m_persIndex"].array()
<Array [[0, 1, 2, 3, 4, 5, ... 2, 3, 4, 5, 6]] type='100 * var * uint32'>
>>> t["AnalysisJetsAuxDyn.btaggingLink.m_persKey"].array()
<Array [[1030373024, ... 1030373024]] type='100 * var * uint32'>

Would there be a reasonable way to also read the multi-jagged links when i know that they should contain these uint32 numbers (m_persIndex and m_persKey)? So far i couldn't get ROOT to split them ...

  • When i set the default split level to 99 one more thing ROOT seems to split is a larger structure used for association of soft terms to the MET (branches with names *METAssoc*.*). In principle the branches are mostly vector<vector<float>, but i get the following error when trying to read them:
>>> import uproot4
>>> f = uproot4.open("DAOD_PHYSLITE.art_split99.pool.root")
>>> t = f["CollectionTree"]
>>> t["METAssoc_AnalysisMETAux.trkpx"].array()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    t["METAssoc_AnalysisMETAux.trkpx"].array()
  File "/home/nikolai/.local/lib/python3.8/site-packages/uproot4/behaviors/TBranch.py", line 1984, in array
    interpretation = self.interpretation
  File "/home/nikolai/.local/lib/python3.8/site-packages/uproot4/behaviors/TBranch.py", line 2120, in interpretation
    self._interpretation = uproot4.interpretation.identify.interpretation_of(
  File "/home/nikolai/.local/lib/python3.8/site-packages/uproot4/interpretation/identify.py", line 443, in interpretation_of
    if branch.streamer is not None:
  File "/home/nikolai/.local/lib/python3.8/site-packages/uproot4/behaviors/TBranch.py", line 2232, in streamer
    for element in streamerinfo.walk_members(self._file.streamers):
  File "/home/nikolai/.local/lib/python3.8/site-packages/uproot4/streamers.py", line 372, in walk_members
    base = streamers[element.name][element.base_version]
KeyError: -1

The error also occurs when i run .show() - both on the branch and on the whole tree.

@jpivarski jpivarski linked a pull request Oct 6, 2020 that will close this issue
@jpivarski
Copy link
Member

#124 is chipping away at the assortment of issues this has raised. Try checking that out and seeing how far you get with it; post any other issues here. If it works, I'll merge it as-is.

(They've all been little things, except AsGrouped, which I had thought was already implemented, but apparently not.)

If the PR doesn't get merged for a while, be sure to ping me because I'm likely to forget! Thanks!

@nikoladze
Copy link
Contributor Author

Thanks for the quick fix! #124 seems to solve the issues on my second bullet point for reading the split branches from the MET association map. The errors are gone and as far as i can see uproot now correctly reads the data from these branches.

About the first bullet point with the double-jagged element links - i'm not sure if it is possible to split these (such that i would have instead of vector<vector<ElementLink<something>> two branches vector<vector<unsigned int> for the members m_persIndex and m_persKey). In case that does not work out - is there a way (possibly with custom deserialization code) to read these?

@jpivarski
Copy link
Member

Right, I should look into why the ElementLink stranger send to be missing. Is that an ATLAS thing or a ROOT thing?

My first guess at a custom interpretation would be

AsObjects(AsVector(AsVector(AsDtype([("persIndex", "u8"), ("persKey", "u8")]))))

but that leaves undetermined whether one or both of those vectors have headers. (Need to guess or look at the raw bytes with branch.debug(0).)

@nikoladze
Copy link
Contributor Author

That is an ATLAS thing - i'm no expert on athena internals, but i guess the stuff is defined somewhere here https://gitlab.cern.ch/atlas/athena/-/tree/21.2/Control/AthLinks

@jpivarski
Copy link
Member

So, a1f4c37 fixes the problem with ElementLinks being "unknown." If you'd believe it, it was because the name of the class sometimes has spaces before the closing > and sometimes doesn't (in the file).

(The lookup for "ElementLink<DataVector<xAOD::TrackParticle_v1>>" was failing because it was named "ElementLink<DataVector<xAOD::TrackParticle_v1> >" or vice-versa. They're all forced to be "ElementLink<DataVector<xAOD::TrackParticle_v1>>" now.)

>>> import uproot4
>>> f = uproot4.open("issue-123a.root")
>>> t = f["CollectionTree"]
>>> arrays = t["AnalysisElectronsAuxDyn.trackParticleLinks"].array(library="np")
>>> arrays[8]
<STLVector [[<ElementLink<DataVector<xAOD::TrackParticle_v1>> (version 1) at 0x7f4eeeaf5cd0>, ...], ...]>
>>> arrays[8][0][0]
<ElementLink<DataVector<xAOD::TrackParticle_v1>> (version 1) at 0x7f4eeeaf5cd0>
>>> arrays[8][0][0].all_members
{'m_persKey': 776133387, 'm_persIndex': 0}
>>> arrays[8][0][0].member("m_persKey")
776133387
>>> arrays[8][0][0].member("m_persIndex")
0

@nikoladze
Copy link
Contributor Author

Thanks a lot! I think with these fixes i should be able to read everything that's needed - i'll continue annoying you when i find further issues.

@nikoladze
Copy link
Contributor Author

Just for the record: We also have some branches where it seems they are affected by #38. E. g.

>>> import uproot4
>>>
>>> f = uproot4.open("DAOD_PHYSLITE.art.pool.root")
>>> f["MetaData"]["EventFormat"].array()
[...]
    xAOD::EventFormat_v1 version 9 as uproot4.dynamic.Model_xAOD_3a3a_EventFormat_5f_v1_v1 (46284 bytes)
Members for xAOD::EventFormat_v1: m_branchNames, m_classNames, m_parentNames, m_branchHashes

attempting to get bytes 83840:84064
outside expected range 0:83941 for this Chunk
in file DAOD_PHYSLITE.art.pool.root
in object /MetaData;2

Uproot seems to get the members correctly, there should be 3 strings and then one integer hash. Seems they are also stored "member wise". The .debug output is a bit long, but if i do .basket(0).data.tobytes() i get first a long block of stuff that looks like the branch names, then a long block of stuff that looks like the class names, etc and in the end a long block of garbage which are probably the hashes.

@jpivarski
Copy link
Member

Thanks for the note; memberwise splitting is going to be important. (I think you're the 7th to report it.)

The .basket(0).data.tobytes() method of debugging is good, though you might find more information in .debug(entryNum) and .debug_array(entryNum). One of the goals of Uproot 4 was to add more debugging tools because I think we'll always be encountering new types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants