Help: Cut application while iterating a tree is not working properly #194

Closed
JuanBSLeite opened this issue Nov 19, 2020 · 5 comments · Fixed by #195
Labels
bug The problem described is something that must be fixed

Comments

JuanBSLeite commented Nov 19, 2020

Hi,

I'm trying to reduce a very large sample by applying rectangular cuts, but the cuts are not being applied correctly.
For this, I'm following the tree iterator example in the tutorial.

`
import uproot4

tree = uproot4.open("file_MagDown.root:DecayTree")

tree_sel = []

for tree_sel in tree.iterate(cut="D_MM>1910 and D_MM<2030.",step_size=100000):
    print(repr(tree_sel))
`
When I plot the mass distribution using the returned tree (tree_sel), it shows that no cut has been applied.

Am I doing something wrong, or is it a bug? :/

@JuanBSLeite
Author

I made the change below and plotted histograms of tree_sel[0]['D_MM'] and tree_sel[1]['D_MM']. The cuts were applied only in the first one.

`
tree_sel = []
Cuts = "(D_MM>1910) & (D_MM<2030)"

for x,report in tree.iterate(entry_stop=100000,step_size=50000,cut=Cuts,report=True):
    print(report)
    tree_sel.append(x)
`
:/

jpivarski added the "question" label (Open-ended questions from users) Nov 19, 2020
@jpivarski
Member

You found the issue: it's not "and", it's "&". This is a Python and NumPy convention that we simply have to get used to: Python does not allow "and" to be overloaded, so NumPy and Awkward Array overload "&" instead. This update will improve the situation by making the __bool__ method raise an exception, the way NumPy does:

>>> # allowed
>>> ak.Array([True, False, True]) & ak.Array([False, True, True])
<Array [False, False, True] type='3 * bool'>

>>> # not allowed
>>> ak.Array([True, False, True]) and ak.Array([False, True, True])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/highlevel.py", line 1440, in __bool__
    raise ValueError(
ValueError: the truth value of an array whose length is not 1 is ambiguous; use ak.any() or ak.all()

This PR was included in awkward1==0.4.4, so upgrading to the latest version should provide this error message, at least.
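
The mechanism described above can be illustrated without NumPy or Awkward Array. The following is only a minimal sketch: `Mask` is a hypothetical toy class invented for this example, not the real array type. It shows that `&` dispatches to a method the class can define, while `and` only asks for the truth value of its left operand.

```python
class Mask:
    """Toy stand-in for a boolean array (hypothetical, not the NumPy or Awkward class)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # `a & b` calls this method, so it can combine the values element by element.
        return Mask(x and y for x, y in zip(self.values, other.values))

    def __bool__(self):
        # `a and b` first evaluates bool(a); Python offers no hook for `and` itself.
        raise ValueError("the truth value of an array is ambiguous")


low = Mask([True, False, True])
high = Mask([False, True, True])

elementwise = low & high
print(elementwise.values)  # [False, False, True]

try:
    low and high
except ValueError as err:
    print("and failed:", err)
```

Note that `&` binds more tightly than `>` and `<`, so in a real cut each comparison needs its own parentheses, as in `(D_MM>1910) & (D_MM<2030)`.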

On the second point, is the cut only being applied to the first file or the first step in iteration? Can you be explicit and provide a reproducer?

Here's a test I just ran:

>>> for arrays in uproot4.iterate("Zmumu*.root:events", ["px1"]): print(arrays)
... 
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: -41.2}, {px1: 35.1}, {px1: 35.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
>>> for arrays in uproot4.iterate("Zmumu*.root:events", ["px1"], cut="px1 > 0"): print(arrays)
... 
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]
[{px1: 35.1}, {px1: 35.1}, {px1: 34.1}, ... {px1: 32.4}, {px1: 32.4}, {px1: 32.5}]

The cut applied to all of these files, eliminating the negative px1 in all cases. (The files are different compressions of the same data, for tests.)

@JuanBSLeite
Author

Hi @jpivarski,

If I understand the tutorial correctly, each step of the iterator produces a batch array containing the events selected by the cuts. I'm saving these batches in the tree_sel list and plotting the invariant mass distribution of each of the two batches to check whether the cut has been applied.

The problem is that the cuts seem to be applied only to the batch returned in the first step. The second batch is still untouched.

As I'm using LHCb data, I don't think I can provide a reproducer here. But I can show you in a Zoom call if that is OK.

Thanks!

jpivarski linked a pull request Nov 19, 2020 that will close this issue
@jpivarski
Member

I'm in a Zoom meeting, and I think it's technically impossible to run two at once. (I have used Zoom and Vidyo at the same time, but that gets complicated fast!)

Also, debugging through Zoom is going to be hard, since I wouldn't be able to touch the code. It sounds like your procedure for identifying this is complex—the first thing we'd have to do anyway is break it down to focus just on Uproot itself.

The cut string does nothing more than put the code from the string inside the square brackets of the result, so

Cuts = "(D_MM>1910) & (D_MM<2030)"
for x,report in tree.iterate(entry_stop=100000,step_size=50000,cut=Cuts,report=True):

should be entirely equivalent to

for arrays,report in tree.iterate(entry_stop=100000,step_size=50000,report=True):
    x = arrays[(arrays.D_MM>1910) & (arrays.D_MM<2030)]

If this isn't true, then there's some bug with cut.
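
One way to test this numerically, rather than by eye from histograms, is to count per batch how many values still lie outside the mass window. The sketch below uses plain Python lists and made-up numbers in place of the real D_MM branch; `batches`, `passes`, and `d_mm` are hypothetical names for illustration and are not part of Uproot.

```python
LOW, HIGH = 1910.0, 2030.0


def batches(values, step_size):
    """Split `values` into consecutive batches of at most `step_size` entries."""
    for start in range(0, len(values), step_size):
        yield values[start:start + step_size]


def passes(mass):
    # Same selection as "(D_MM>1910) & (D_MM<2030)", for a single value.
    return LOW < mass < HIGH


# Toy stand-in for the D_MM branch (made-up numbers).
d_mm = [1800.0, 1950.0, 2100.0, 2000.0, 1900.0, 1920.0, 2029.0, 2500.0]

# Selection applied by hand inside each step.
manual = [[m for m in batch if passes(m)] for batch in batches(d_mm, step_size=4)]

# Per batch, count the values that are still outside the window.
leaks = [sum(not passes(m) for m in batch) for batch in manual]

print(manual)  # [[1950.0, 2000.0], [1920.0, 2029.0]]
print(leaks)   # [0, 0]
```

With real data, the same count could be computed on the batches returned with `cut=`. A nonzero entry would identify which step was left unfiltered.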


In writing this response, I noticed a difference between my example and yours—mine used uproot4.iterate and yours used ttree.iterate. Having created a reproducer, I found the issue and fixed it in PR #195. That will get deployed as a new version relatively soon, as we're doing the name transition Dec 1.

jpivarski added the "bug" label (The problem described is something that must be fixed) and removed the "question" label (Open-ended questions from users) Nov 19, 2020
@JuanBSLeite
Author

Hi @jpivarski ,

The equivalent way is working fine!

Thank you very much! :)
