This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Unable to read nested branches #510

Closed
soleti opened this issue Jul 15, 2020 · 3 comments

Comments


soleti commented Jul 15, 2020

I have a weird issue with a ROOT file with this structure:

[Image: tree structure]

I am able to access, e.g., Primaries.Position, but I can't access Trajectories.Points.Position:

>>> import uproot
>>> fi = uproot.open("nu.root")
>>> evs = fi["EDepSimEvents"]["Event"]
>>> prim = evs['Primaries']
>>> prim['Primaries.Position'].array()
<JaggedArrayMethods [[TLorentzVector(x=-4456, y=-3414.9, z=4859.5, t=1)] [TLorentzVector(x=-2018.8, y=94.998, z=7546.5, t=1)] [TLorentzVector(x=518.64, y=-3308, z=9187.8, t=1)] ... [TLorentzVector(x=-1115.3, y=-2646.9, z=8872.3, t=1)] [TLorentzVector(x=-3466.1, y=-3515.7, z=9152.3, t=1)] [TLorentzVector(x=-1750.7, y=-1975.8, z=7106.5, t=1)]] at 0x0001127e3040>
>>> traj = evs['Trajectories']
>>> points = traj['Trajectories.Points']
>>> points.keys()
[]

Why is that, am I doing something wrong? And is there a way to fix this? I uploaded a test file here.
Thank you!

soleti changed the title from "Impossible to read nested branches" to "Unable to read nested branches" on Jul 15, 2020

jpivarski (Member) commented

If you have the ability to rewrite this file, try increasing the splitLevel on the branch. I think the current value might be around 2; try turning it up to 99. The data are split down to the Trajectories.Points level and no further, but there's no reason why they couldn't be split further.

In this TTree:

>>> import uproot4
>>> tree = uproot4.open("issue510.root:EDepSimEvents")
>>> tree.show()
name                 | typename             | interpretation                    
---------------------+----------------------+-----------------------------------
Event                | TG4Event             | AsObjects(Model_TG4Event)         
Event/TObject        | unknown              | None                              
Event/TObject/fUniqu | uint32_t             | AsDtype('>u4')                    
Event/TObject/fBits  | uint32_t             | AsDtype('>u4')                    
Event/RunId          | int32_t              | AsDtype('>i4')                    
Event/EventId        | int32_t              | AsDtype('>i4')                    
Event/Primaries      | std::vector<TG4Prima | AsObjects(AsVector(True, Model_TG4
Event/Primaries/Prim | uint32_t[]           | AsJagged(AsDtype('>u4'))          
Event/Primaries/Prim | uint32_t[]           | AsJagged(AsDtype('>u4'))          
Event/Primaries/Prim | std::vector<TG4Prima | AsObjects(AsVector(True, Model_TG4
Event/Primaries/Prim | std::vector<TG4Prima | AsObjects(AsVector(True, Model_TG4
Event/Primaries/Prim | TLorentzVector       | AsStridedObjects(Model_TLorentzVec
Event/Primaries/Prim | std::string          | AsStrings(header_bytes=6)         
Event/Primaries/Prim | std::string          | AsStrings(header_bytes=6)         
Event/Primaries/Prim | std::string          | AsStrings(header_bytes=6)         
Event/Primaries/Prim | int32_t[]            | AsJagged(AsDtype('>i4'))          
Event/Primaries/Prim | float[]              | AsJagged(AsDtype('>f4'))          
Event/Primaries/Prim | float[]              | AsJagged(AsDtype('>f4'))          
Event/Primaries/Prim | float[]              | AsJagged(AsDtype('>f4'))          
Event/Primaries/Prim | float[]              | AsJagged(AsDtype('>f4'))          
Event/Trajectories   | std::vector<TG4Traje | AsObjects(AsVector(True, Model_TG4
Event/Trajectories/T | uint32_t[]           | AsJagged(AsDtype('>u4'))          
Event/Trajectories/T | uint32_t[]           | AsJagged(AsDtype('>u4'))          
Event/Trajectories/T | std::vector<TG4Traje | AsJagged(AsStridedObjects(Model_TG
Event/Trajectories/T | int32_t[]            | AsJagged(AsDtype('>i4'))          
Event/Trajectories/T | int32_t[]            | AsJagged(AsDtype('>i4'))          
Event/Trajectories/T | std::string          | AsStrings(header_bytes=6)         
Event/Trajectories/T | int32_t[]            | AsJagged(AsDtype('>i4'))          
Event/Trajectories/T | TLorentzVector       | AsStridedObjects(Model_TLorentzVec
Event/SegmentDetecto | std::map<std::string | AsObjects(AsMap(True, AsString(Tru

the Event/Trajectories/Trajectories.Points is shown as a single branch:

>>> tree["Event/Trajectories/Trajectories.Points"].show()
name                 | typename             | interpretation                    
---------------------+----------------------+-----------------------------------
Trajectories.Points  | std::vector<TG4Traje | AsJagged(AsStridedObjects(Model_TG

It consists of a vector of TG4TrajectoryPoint objects:

>>> tree["Event/Trajectories/Trajectories.Points"].typename
'std::vector<TG4TrajectoryPoint>'

and the TG4TrajectoryPoint class has a fixed width:

>>> tree.file.streamer_named("TG4TrajectoryPoint").show()
TG4TrajectoryPoint (v1): TObject (v1)
    Position: TLorentzVector (TStreamerObject)
    Momentum: TVector3 (TStreamerObject)
    Process: int (TStreamerBasicType)
    Subprocess: int (TStreamerBasicType)

It consists of nothing but TLorentzVector (fixed-width), TVector3 (fixed-width), and two integers (fixed-width). ROOT's splitting algorithm ought to be able to split that into four branches: a jagged array of position, a jagged array of momentum, a jagged array of process, and a jagged array of subprocess, and then it would be easy to get at the data.
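To illustrate what such a split would look like, here is a minimal sketch in plain Python (it is not uproot or ROOT code). The dict records and the `split_points` helper are hypothetical stand-ins for TG4TrajectoryPoint objects; only the four member names are taken from the streamer shown above.

```python
# Hypothetical stand-in: each trajectory point is a plain dict, not a ROOT object.
# Member names follow the TG4TrajectoryPoint streamer printed above.
MEMBERS = ("Position", "Momentum", "Process", "Subprocess")


def split_points(events):
    """Turn a jagged list of point records into one jagged list per member."""
    columns = {name: [] for name in MEMBERS}
    for points in events:  # one list of points per event
        for name in MEMBERS:
            columns[name].append([point[name] for point in points])
    return columns


# Made-up values, for illustration only.
events = [
    [  # event 0: two points
        {"Position": (0.0, 0.0, 0.0, 1.0), "Momentum": (1.0, 0.0, 0.0),
         "Process": 1, "Subprocess": 10},
        {"Position": (1.0, 2.0, 3.0, 2.0), "Momentum": (0.5, 0.0, 0.0),
         "Process": 2, "Subprocess": 20},
    ],
    [],  # event 1: no points
    [  # event 2: one point
        {"Position": (4.0, 5.0, 6.0, 3.0), "Momentum": (0.0, 0.1, 0.0),
         "Process": 3, "Subprocess": 30},
    ],
]

columns = split_points(events)
print(columns["Process"])
```

Each of the four columns keeps the same per-event lengths (2, 0, 1 here), which is what would make them readable as four independent jagged arrays.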

As for why it doesn't work out of the box, I'm confused by the number of headers that this object has when unsplit.


soleti commented Jul 20, 2020

Thank you for the suggestion. @mastbaum (the original author of the file) provided me with a copy of the file, this time with splitLevel=99. Unfortunately I am still not able to read those branches. In case you would like to take a look, I uploaded it here https://drive.google.com/file/d/1-0xcHlPFMsj4wFwgtLtWHGHPQsJC_3bj/view?usp=sharing.

jpivarski (Member) commented

As it turns out, the error above is because I was unaware of ROOT's "memberwise splitting," and (if I said anything to the contrary above), it has nothing to do with Boost serialization. This same error came up in 6 different issues, so further discussion on it will be consolidated into scikit-hep/uproot5#38. (This comment is a form message I'm writing on all 6 issues.)

As of PR scikit-hep/uproot5#87, we can now detect such cases, so at least we'll raise a NotImplementedError instead of letting the deserializer fail in mysterious ways. Someday, it will actually be implemented (watch scikit-hep/uproot5#38), but in the meantime, the thing you can do is write your data "objectwise," not "memberwise." (See this comment for ideas on how to do that, and if you manage to do it, you can help a lot of people out by sharing a recipe.)
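A calling script could handle that case roughly as follows. This is only a sketch under assumptions: it assumes the NotImplementedError surfaces when `.array()` is called, and that the branch object exposes `.name` and `.array()`. The `read_or_explain` helper and the two `Fake...` classes are hypothetical; the fakes stand in for real branches so the example runs without a ROOT file.

```python
def read_or_explain(branch):
    """Return (array, None) on success, or (None, message) if reading is unsupported.

    `branch` is any object with a `.name` attribute and an `.array()` method.
    """
    try:
        return branch.array(), None
    except NotImplementedError as err:
        return None, f"{branch.name}: cannot be read yet ({err})"


class FakeMemberwiseBranch:
    """Hypothetical stand-in for a branch that was written memberwise."""
    name = "Trajectories.Points"

    def array(self):
        raise NotImplementedError("memberwise serialization")


class FakeReadableBranch:
    """Hypothetical stand-in for a branch that reads normally."""
    name = "Primaries.Position"

    def array(self):
        return [[1.0], [2.0, 3.0]]


for branch in (FakeReadableBranch(), FakeMemberwiseBranch()):
    data, problem = read_or_explain(branch)
    print(branch.name, "->", data if problem is None else problem)
```

Catching only NotImplementedError keeps other failures (a missing file, a wrong branch name) visible instead of hiding them behind the same message.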
