IndexError while reading a vector of custom class objects from tree #475
Comments
I was looking at the file to fix Uproot for this case (to get it to deserialize slowly, rather than not at all), and it doesn't seem to be ROOT serialized. That's the IndexError: while trying to deserialize these objects, following the prescription set by the TStreamerInfo, the file pointer gets sent to a crazy position (268436067). The reading is not simply offset by a few bytes by an earlier mistake—there are no offsets that give you data like
Instead, there are batches of int32s, followed by batches of float32s, like:
and
ROOT serialization doesn't put the variables of the same type in batches within a TBranch like this (one might call it "splitting within a branch"). Within a TBranch, we should see the integers and floats interleaved in the order described by the TStreamerInfo (the table above). I've seen this before, in #403, in which the data serialization ignored the TStreamers and serialized with Boost. Is your file similar to the one described in that issue? Deserializing Boost-in-ROOT is beyond Uproot's scope. In principle, you need the original C++ methods to do that, since the Boost serialization is described in code. Alternatively, if you can just write the data with ROOT's splitLevel turned on, each field would be in a separate TBranch. Then it wouldn't just be possible to deserialize, it would also use NumPy rather than falling back to
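To make the interleaved-versus-batched distinction concrete, here is a minimal sketch using only Python's `struct` module. It is a toy layout, not the real ROOT format: the two members (an int32 and a float32), the record values, and the function names are all assumptions made for illustration. It shows how a reader that expects interleaved members produces nonsense integers when the bytes are actually batched by type, which is the kind of value that can send a file cursor to an out-of-range position.

```python
import struct

# Hypothetical records with two members: an int32 and a float32.
records = [(1, 1.5), (2, 2.5), (3, 3.5)]
n = len(records)

# Interleaved layout: int, float, int, float, ... (big-endian, as ROOT uses).
interleaved = b"".join(struct.pack(">if", i, x) for i, x in records)

# Batched layout: all the ints first, then all the floats.
batched = (struct.pack(f">{n}i", *(i for i, _ in records))
           + struct.pack(f">{n}f", *(x for _, x in records)))

def read_interleaved(buf, n):
    """Read n (int32, float32) pairs stored one record after another."""
    return [struct.unpack_from(">if", buf, 8 * k) for k in range(n)]

def read_batched(buf, n):
    """Read n int32 values followed by n float32 values, then pair them up."""
    ids = struct.unpack_from(f">{n}i", buf, 0)
    xs = struct.unpack_from(f">{n}f", buf, 4 * n)
    return list(zip(ids, xs))

right = read_batched(batched, n)       # matches the original records
wrong = read_interleaved(batched, n)   # floats reinterpreted as ints, and vice versa

print(right)
print(wrong)
```

In the mismatched read, the third "integer" is really the bit pattern of the float 2.5. If a deserializer treated such a value as a byte count or offset, it would jump far outside the buffer.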
Thanks very much for your quick reply, and the helpful explanation! I'm not sure how the data were being serialized, but after some digging through our software framework, we found that the splitlevel was being set to 1 when branches were created. Changing this to 99 allows the splitting to happen, and enables me to access the However, this seems to cause another problem in one of our other data structures. We have another class called
works fine, but if I run
I get the following error:
I get the same failure, always at event 4561, no matter what the simulation inputs are (for example, same error if I change the random seed). The failure is not present when the branch splitlevel is 1. Is this also an issue with serialization / how we're writing the data to disk? I've uploaded an example file here: Thanks again!
The short, and probably good, news is that it only affects a TBranch you don't care about. The one TBranch that is giving you this error is named
[x for x in events.allkeys() if events[x].interpretation is not None and x != b"fBits"]
The reason you saw the error at a particular event number didn't have anything to do with that event; it was the threshold where you read out more than one TBasket of
Specifically, these are the TBasket data sizes that Uproot predicts:
[events["fBits"].basket_uncompressedbytes(i) for i in range(events["fBits"].numbaskets)]
[22808, 22808, 4408]
and these are the sizes that come out:
[events["fBits"].basket(i).nbytes for i in range(events["fBits"].numbaskets)]
[9120, 9120, 1760]
The prediction is exactly
Was this file actually produced by Geant? Geant has its own ROOT file writer, and it's possible that it gets some things wrong, such as this
I could put in specialized logic to predict the size of non-jagged numerical types as the number of entries times the item size, but that would assume that we trust the number of entries more than
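The "entries times item size" idea mentioned above can be sketched in plain Python. This is not Uproot code: the helper name `expected_nbytes`, the 4-byte big-endian unsigned item format, and the per-basket entry counts are assumptions for illustration (the entry counts are simply the observed sizes divided by 4). Only the two lists of sizes come from the output quoted above.

```python
import struct

def expected_nbytes(num_entries, item_format):
    """Size of a non-jagged numerical basket: entries times item size."""
    return num_entries * struct.calcsize(item_format)

# Sizes quoted above: what the basket headers predict vs. what is read out.
header_sizes = [22808, 22808, 4408]
observed_sizes = [9120, 9120, 1760]

# Assumption: each item is a 4-byte big-endian unsigned integer.
ITEM_FORMAT = ">I"

# Assumption: entry counts inferred from the observed sizes, not read from the file.
entries_per_basket = [2280, 2280, 440]

expected = [expected_nbytes(n, ITEM_FORMAT) for n in entries_per_basket]

# Flag baskets whose header-reported size disagrees with entries * item size.
suspicious = [i for i, (h, e) in enumerate(zip(header_sizes, expected)) if h != e]

print(expected)    # sizes implied by the entry counts
print(suspicious)  # indices of baskets whose header size looks wrong
```

Under these assumptions the entry-based sizes agree with what was actually read, and every basket header is flagged. As noted in the comment, a check like this only helps if the entry counts are more trustworthy than the sizes recorded in the headers.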
I don't believe this file was produced directly by Geant4. We are using a software framework called SNiPER (developed for the JUNO experiment), and the framework appears to handle all the ROOT I/O. I myself am just getting started with it, so I don't know the intricate details. But I will forward this information to more knowledgeable people. In the meantime, I think we can work around this using the prescription you suggest. Thanks very much!
I'm guessing I can close this? Let me know if I'm wrong.
As it turns out, the error above is because I was unaware of ROOT's "memberwise splitting," and (if I said anything to the contrary above), it has nothing to do with Boost serialization. This same error came up in 6 different issues, so further discussion on it will be consolidated into scikit-hep/uproot5#38. (This comment is a form message I'm writing on all 6 issues.) As of PR scikit-hep/uproot5#87, we can now detect such cases, so at least we'll raise a |
Hi,
I have a use case in which we have a branch which is a vector of custom class objects (in our case, the class is called ElecChannel), and I would like to access individual members of this object (which include some vector<short> objects). If I try following the instructions from Issue #371, I get an IndexError:
Using show tells me that the fElecChannels branch is being interpreted as a generic object, though I'm not sure exactly what that means:
Is there a way I can read this out into arrays?
An example file can be found here
Thanks so much!