Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TH2 variances returned by .to_hist().variances() are incorrect (swapped x and y axis and perhaps some counting issues) #888

Closed
AWildridge opened this issue May 8, 2023 · 3 comments · Fixed by #965
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@AWildridge
Copy link

AWildridge commented May 8, 2023

This is for every TH2 that I have thus far inspected. If it matters, the TH2 histograms were written into a TTree originally by C++ ROOT code. This is most easily seen if your TH2 is not a square histogram (number of bins along x-axis != number of bins along y-axis). Here's some below code to reproduce the behavior showing that the 3rd plot on the right is indeed incorrect:

import uproot
import matplotlib
from matplotlib import pyplot as plt
import mplhep as hep

fig, ax = plt.subplots(1, 3, figsize=(30, 10))

hep.hist2dplot(fileptr['th2_name'].variances(), norm=matplotlib.colors.LogNorm(), ax=ax[0])
myhist = fileptr['th2_name'].to_pyroot()

variances = [[myhist.GetBinError(i+1, j+1)**2 for j in range(768)] for i in range(192)]
hep.hist2dplot(variances, norm=matplotlib.colors.LogNorm(), ax=ax[1])
hep.hist2dplot(fileptr['th2_name'].to_hist().variances(), norm=matplotlib.colors.LogNorm(), ax=ax[2])

Figure showing differences:
th2_variances

Edit: Added image showing example output of above code

@AWildridge AWildridge added the bug (unverified) The problem described would be a bug, but needs to be triaged label May 8, 2023
@jpivarski
Copy link
Member

Are you using an old version? Transposed axes (or something more complex if it was not square) was an issue early on, but it was caught and fixed in #198. It's hard to see how that could happen again.

I looked into this, and I don't see a problem in the most recent version of the code. We have a test file with a 2D histogram, uproot-hepdata-example.root, with two unfortunate features: it's square, and it's an old version of the histogram class that can't be serialized through to_pyroot. It also has values and variances that are equal to each other, but that helps us to see that we're not treating values and variances differently. We can use this file as a test of your issue by concentrating on the bins in the center of the randomly filled blob—we would be able to tell the difference between one matrix and its transpose—and we can make it to_pyroot-serializable by changing the version numbers of its classes and filling the missing member data with default values.

So first, this is how to_hist().plot2d() plots it:

import uproot
import skhep_testdata
import ROOT
import mplhep as hep

f = uproot.open(skhep_testdata.data_path("uproot-hepdata-example.root"))

f["hpxpy"].to_hist().plot2d()

download

Look at the yellowest bins near the center: their pattern is easy to distinguish from the transpose. (There's a line of three bright ones in a horizontal line, with one below and to the left and another above and to the right. If this were to be transposed, that would change.)

Now let's use mplhep's hist2dplot directly to see if it has a different interpretation of horizontal and vertical.

hep.hist2dplot(f["hpxpy"].to_hist().variances())

download

No it doesn't. (Those central yellow bins are showing the same pattern.)

Now what about ROOT's interpretation of the same histogram?

tfile = ROOT.TFile("~/irishep/OLD/uproot3/tests/samples/hepdata-example.root")
histogram = tfile.Get("hpxpy")

c1 = ROOT.TCanvas()
histogram.Draw("COLZ")
c1.Draw()   # necessary because I tested this in a Jupyter notebook...

download

Although the colors are different, the maximal bins have the same pattern in the center.

Suppose we do what you did and collected GetBinError in a list comprehension (which would be another opportunity to interpret horizontal and vertical differently, depending on the order of list comprehension and the interpretation of the two arguments of GetBinError).

errors = [[histogram.GetBinError(i+1, j+1) for j in range(40)] for i in range(40)]

hep.hist2dplot(errors)

download

Same picture: the yellow bins in the center are still in a horizontal line.

Okay, what about converting the Uproot histogram into a PyROOT histogram through to_pyroot, rather than reading it directly with ROOT? In principle, the to_pyroot serialization might be getting the wrong axis ordering (C contiguous instead of Fortran contiguous or something similar). The issue here is that our file has an old histogram version and needs to be updated manually.

h = f["hpxpy"]

h.__class__ = uproot.models.TH.Model_TH2F_v4

h.bases[0].__class__ = uproot.models.TH.Model_TH2_v5

h.bases[0].bases[0].__class__ = uproot.models.TH.Model_TH1_v8

h.bases[0].bases[0]._speedbump1 = None
h.bases[0].bases[0].members["fStatOverflows"] = 0

h.bases[0].bases[0].bases[1].__class__ = uproot.models.TAtt.Model_TAttLine_v2

h.bases[0].bases[0].bases[2].__class__ = uproot.models.TAtt.Model_TAttFill_v2

h.bases[0].bases[0].members["fXaxis"].__class__ = uproot.models.TH.Model_TAxis_v10
h.bases[0].bases[0].members["fYaxis"].__class__ = uproot.models.TH.Model_TAxis_v10
h.bases[0].bases[0].members["fZaxis"].__class__ = uproot.models.TH.Model_TAxis_v10

h.bases[0].bases[0].members["fXaxis"].members["fModLabs"] = None
h.bases[0].bases[0].members["fYaxis"].members["fModLabs"] = None
h.bases[0].bases[0].members["fZaxis"].members["fModLabs"] = None

Although this is some invasive surgery, it definitely would not change how the NumPy array is serialized (which happens in the TArrayF class, whose version didn't even need to be changed).

pyroot_hist = h.to_pyroot()

pyroot_hist.Draw("COLZ")
c1.Draw()

download

It's still the same picture. I don't see any issue. We could do this test with a non-square and more recent histogram class version, but it wouldn't change the outcome, based on what I've seen above.

I think the most likely thing is that you have a version of Uproot from before #198 was fixed. If you update to the most recent version of Uproot and still have this problem, you can reopen this issue (or comment here asking us to, if the permissions don't let you reopen), but then you should also include the file so that we can reproduce it.

@AWildridge
Copy link
Author

AWildridge commented Jun 16, 2023

Thank you for taking the time to look into this.

I am using uproot 5.0.7 which seems to be almost the newest stable release (5.0.8 as of this writing). I am creating these histograms from TUnfoldBinning objects within a C++ framework using TUnfoldBinning::CreateHistogramOfMigrations. I have produced 833 histograms and every single histogram shows this feature.

Here is an example ROOT file containing histograms with these features: https://drive.google.com/file/d/1etIvX7jxyhhkZi3NjFKIQofxgXwS3gao/view?usp=sharing

The below code can be used to replicate what I am seeing:

import uproot
import matplotlib
from matplotlib import pyplot as plt
import mplhep as hep

file = uproot.open('histosTUnfold_combined_ttbarsignalplustau.root')

fig, ax = plt.subplots(1, 3, figsize=(30, 10))

hep.hist2dplot(file['hrecoVsgen_ll_cHel_400mttbar'].variances(), norm=matplotlib.colors.LogNorm(), ax=ax[0])
myhist = file['hrecoVsgen_ll_cHel_400mttbar'].to_pyroot()

variances = [[myhist.GetBinError(i+1, j+1)**2 for j in range(768)] for i in range(192)]
hep.hist2dplot(variances, norm=matplotlib.colors.LogNorm(), ax=ax[1])
hep.hist2dplot(file['hrecoVsgen_ll_cHel_400mttbar'].to_hist().variances(), norm=matplotlib.colors.LogNorm(), ax=ax[2])

@jpivarski Could you please re-open this issue?

@jpivarski jpivarski reopened this Jun 16, 2023
@jpivarski
Copy link
Member

It's reopened so that we can reproduce it and find out why the histogram axes seem to be backward for your case and not for other cases. (There's some mixup going on at some level; we need to get to the bottom of this.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged
Projects
None yet
2 participants