uproot 3.11.7 fails to read branch with AssertionError #495
Comments
Thanks for pointing this out! Apparently, the interpretation is wrong. Depending on timing, a fix might make it into uproot4 and not uproot3. One of the things I'm building into uproot4 is more "expert tools" for investigating missing or wrong interpretations like this one. I was looking at this file using
>>> branch.array(uproot.asdebug)[0]
array([ 64, 0, 0, 86, 64, 9, 0, 1, 0, 0, 0, 3, 0,
1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2,
0, 0, 0, 0, 0, 13, 83, 0, 0, 13, 94, 0, 0,
13, 146, 63, 197, 128, 193, 63, 203, 77, 82, 63, 104, 210,
116, 0, 0, 0, 0, 0, 0, 0, 0, 64, 77, 148, 232,
17, 171, 18, 0, 64, 106, 159, 139, 3, 134, 241, 0],
      dtype=uint8)
and I was looking for "charge" with type
>>> np.array([-1], ">f4").view("u1")
array([191, 128, 0, 0], dtype=uint8)
>>> np.array([1], ">f4").view("u1")
array([ 63, 128, 0, 0], dtype=uint8)
but I don't see anything like that. Do you see any of the values you expect in there? |
Thanks for looking at this issue. In this example file, the charge is the charge measured by a photomultiplier tube (PMT), scaled to units of photoelectrons (for this data it should be a positive floating-point number of order 1, but not exactly 1). It is a small file, so there is only 1 event with 3 PMT hits. The values I get from a ROOT TTree scan are:
So, to answer your question, none of the values in your debug array match the expected values (although I'm not sure how to interpret the debug array). For debugging, it might be better to focus on the ID since it should be exact integers 3411, 3422, or 3474. |
I'm not sure if this is useful, but with uproot.asdebug it does seem that there may be some offset issue. Element zero of the debug array, when converted to bits, does contain the first hit PMT ID=3411. For example:
import uproot

rootfile = uproot.open("example.root")
branch = rootfile["T"]["ev.pmt"]
arr = branch.array(uproot.asdebug)[0]

# Concatenate every byte of the entry into one long string of bits.
bytestr = "".join(f"{x:08b}" for x in arr)
print(f"3411 (0b{3411:032b}) is at index:", bytestr.find(f"{3411:032b}"))

# Cut the bit string into 32-bit words at each possible starting offset
# and report the offsets at which 3411 appears as a whole word.
for bitshift in range(0, 33):
    bytes_ = [bytestr[i:i+32] for i in range(bitshift, len(bytestr), 32)]
    if any(b == f"{3411:032b}" for b in bytes_):
        print("shifted by:", bitshift)
prints the output:
|
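A byte-aligned variant of the search above may be easier to read. This is only a sketch: it hard-codes the 90 debug bytes quoted earlier in this thread instead of opening the file, and `find_be_int32` is a hypothetical helper name, not part of uproot. It assumes the IDs are stored as big-endian 32-bit integers.

```python
import numpy as np

# The 90 bytes of entry 0, copied from the asdebug output quoted in this thread.
debug_entry = np.array([
    64, 0, 0, 86, 64, 9, 0, 1, 0, 0, 0, 3, 0,
    1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
    0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2,
    0, 0, 0, 0, 0, 13, 83, 0, 0, 13, 94, 0, 0,
    13, 146, 63, 197, 128, 193, 63, 203, 77, 82, 63, 104, 210,
    116, 0, 0, 0, 0, 0, 0, 0, 0, 64, 77, 148, 232,
    17, 171, 18, 0, 64, 106, 159, 139, 3, 134, 241, 0,
], dtype=np.uint8)


def find_be_int32(haystack, value):
    """Return the first byte offset of `value` encoded as big-endian int32, or -1."""
    needle = np.array([value], ">i4").tobytes()
    return haystack.tobytes().find(needle)


# Look for each expected PMT ID and record where it starts.
offsets = {pmt_id: find_be_int32(debug_entry, pmt_id) for pmt_id in (3411, 3422, 3474)}
for pmt_id, offset in offsets.items():
    print(f"{pmt_id} found at byte offset {offset}")
```

For these bytes the three IDs sit back to back at byte offsets 42, 46, and 50, which suggests the IDs of all hits are stored contiguously rather than interleaved with the other fields.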
Thanks, that's right; I could have used I don't plan to support Boost serialization, and I don't know if it can even be detected. I think the C++ libraries that usually load these data override the TStreamers that are included in the file, meaning that there is no way, looking only at the file, to know how to deserialize them. If they can be read in ROOT without For your example, an entry of data can be deserialized like this:
>>> entry_number = 0 # the entry we want to read
>>> debug_array = branch.array(uproot.asdebug)
>>> debug_entry = debug_array[entry_number]
>>> pos = 8 # some outer header
>>> length = debug_entry[pos : pos + 4].view(">i4")[0]; pos += 4
>>> length
3
>>> pos += 6 # some inner header
>>> fBits = debug_entry[pos : pos + length*4].view(">u4"); pos += length*4
>>> fBits # but you don't care about the fBits
array([33554432, 65536, 512], dtype=uint32)
>>> fUniqueID = debug_entry[pos : pos + length*4].view(">u4"); pos += length*4
>>> fUniqueID # but you don't care about the fUniqueID
array([ 1, 0, 33554432], dtype=uint32)
>>> id = debug_entry[pos : pos + length*4].view(">i4"); pos += length*4
>>> id
array([3411, 3422, 3474], dtype=int32)
>>> charge = debug_entry[pos : pos + length*4].view(">f4"); pos += length*4
>>> charge
array([1.5429918 , 1.5882971 , 0.90946126], dtype=float32)
>>> time = debug_entry[pos : pos + length*8].view(">f8"); pos += length*8
>>> time
array([ 0. , 59.16333218, 212.98571946])
>>> assert pos == len(debug_entry) # Did we use all the bytes? Good.
This would have to be a Python for loop over all entries because of the way the fields are interleaved; it can't be a NumPy all-at-once operation. If this is not Boost serialization, or there's some indicator in the ROOT file specifying that we should follow this very different kind of deserialization algorithm, then I'll have to figure out what that indicator is. As pointed out above, there have been several files so far with this weird feature. Your new message crossed in the mail; I'll check it out. |
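The interactive steps above can be collected into a function and applied entry by entry. This is a minimal sketch, not uproot functionality: `parse_pmt_entry` is a hypothetical name, the header sizes (8 and 6 bytes) are simply taken from the worked example and are not otherwise explained, and the entry bytes are hard-coded from the debug output in this thread so that the block runs without the file.

```python
import numpy as np

# Field names and big-endian dtypes, in the order they appear in the worked example.
FIELDS = [("fBits", ">u4"), ("fUniqueID", ">u4"), ("id", ">i4"),
          ("charge", ">f4"), ("time", ">f8")]


def parse_pmt_entry(debug_entry):
    """Deserialize one entry given as a 1D uint8 array; returns a dict of arrays."""
    pos = 8                                   # outer header (size assumed from the example)
    length = int(debug_entry[pos:pos + 4].view(">i4")[0])
    pos += 4
    pos += 6                                  # inner header (size assumed from the example)
    out = {}
    for name, dtype in FIELDS:                # each field is stored for all hits at once
        nbytes = length * np.dtype(dtype).itemsize
        out[name] = debug_entry[pos:pos + nbytes].view(dtype)
        pos += nbytes
    if pos != len(debug_entry):               # every byte should have been consumed
        raise ValueError(f"consumed {pos} of {len(debug_entry)} bytes")
    return out


# Entry 0, copied from the asdebug output quoted in this thread.
entry0 = np.array([
    64, 0, 0, 86, 64, 9, 0, 1, 0, 0, 0, 3, 0,
    1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
    0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2,
    0, 0, 0, 0, 0, 13, 83, 0, 0, 13, 94, 0, 0,
    13, 146, 63, 197, 128, 193, 63, 203, 77, 82, 63, 104, 210,
    116, 0, 0, 0, 0, 0, 0, 0, 0, 64, 77, 148, 232,
    17, 171, 18, 0, 64, 106, 159, 139, 3, 134, 241, 0,
], dtype=np.uint8)

# The Python-level loop over entries; here the "file" holds a single entry.
parsed = [parse_pmt_entry(entry) for entry in [entry0]]
print(parsed[0]["id"], parsed[0]["charge"], parsed[0]["time"])
```

With the real file, the list being looped over would be the result of `branch.array(uproot.asdebug)` instead of the hard-coded `[entry0]`.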
Thanks again for looking at this. The file can be read in ROOT without loading libraries with ".L". By "can be read" I mean that I can open the file in a TBrowser and double-click branches to produce plots, although it does complain about missing dictionaries when you do this. I don't know if Boost serialization or something else weird is being done to write these files. It is produced by experiment software that I don't control. I will have to delve into the code and try to figure out what is being done. |
In other cases that looked like this, it was Boost serialization used in custom streamers, but if you're able to read these data without any |
As it turns out, the error above is because I was unaware of ROOT's "memberwise splitting," and (if I said anything to the contrary above), it has nothing to do with Boost serialization. This same error came up in 6 different issues, so further discussion on it will be consolidated into scikit-hep/uproot5#38. (This comment is a form message I'm writing on all 6 issues.) As of PR scikit-hep/uproot5#87, we can now detect such cases, so at least we'll raise a |
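To illustrate why a member-wise layout defeats an object-wise interpretation, the following sketch builds both byte layouts with NumPy for a simplified hit record. It is only a conceptual illustration: it omits the headers and the TObject members (fBits, fUniqueID), the charge and time values are made up, and it does not claim to reproduce ROOT's exact on-disk format.

```python
import numpy as np

# Three hits; the IDs are from this thread, the charge and time values are made up.
ids = np.array([3411, 3422, 3474], ">i4")
charges = np.array([1.5, 1.25, 0.75], ">f4")
times = np.array([0.0, 59.0, 213.0], ">f8")

# Object-wise layout: each hit's (id, charge, time) is stored together.
record = np.dtype([("id", ">i4"), ("charge", ">f4"), ("time", ">f8")])
objectwise = np.zeros(3, dtype=record)
objectwise["id"], objectwise["charge"], objectwise["time"] = ids, charges, times
objectwise_bytes = objectwise.tobytes()

# Member-wise layout: all ids, then all charges, then all times.
memberwise_bytes = ids.tobytes() + charges.tobytes() + times.tobytes()

# Same total size, different byte order, so a size check alone cannot tell them apart.
print(len(objectwise_bytes), len(memberwise_bytes))

# Reading member-wise bytes as object-wise records scrambles the fields.
misread = np.frombuffer(memberwise_bytes, dtype=record)
print("misread ids:", misread["id"])

# Reading them column by column recovers the values.
reread_ids = np.frombuffer(memberwise_bytes, ">i4", count=3, offset=0)
reread_charges = np.frombuffer(memberwise_bytes, ">f4", count=3, offset=12)
reread_times = np.frombuffer(memberwise_bytes, ">f8", count=3, offset=24)
print("reread ids:", reread_ids)
```

Only the first misread ID comes out right, because the first four bytes happen to coincide in both layouts; every later field lands on bytes that belong to a different member.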
uproot fails to read a branch, raising an AssertionError from numerical.py line 159. The branch contains an std::vector containing a custom object inheriting from TObject that contains 3 primitives (1 int, 1 float and 1 double).
The branch interpretation appears to be correct (although I notice that it sets fBits and fUniqueID to 8 bytes each rather than the 4 bytes that I was expecting).
An example ROOT file is stored in example-root.zip.