Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of event data in NXdetector #123

Merged
merged 9 commits into from
Apr 17, 2023

Conversation

SimonHeybrock
Copy link
Member

This resolves a number of inconsistencies and broken things. For example, slicing pixels works now. Behavior wise, we are now closer to the pre-v2 implementation.

When an NXdetector contains both dense data and events, this is not returning a Dataset. I don't know if this is useful in practice, or if we should just go back to returning the events by default?

@SimonHeybrock SimonHeybrock requested a review from jl-wynen April 5, 2023 11:55
@jl-wynen
Copy link
Member

jl-wynen commented Apr 5, 2023

I don't know if this is useful in practice, or if we should just go back to returning the events by default?

Is the dense data typically used as a histogram of the events? If so, there should be an option to select which one to load. Or at least a postprocessing function.

@@ -84,15 +86,24 @@ def _init_signal(self, name: Optional[str], children):
if name is not None and name in children:
self._signal_name = name
self._signal = children[name]
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fishy. There are now 3 blocks that each can set self._signal regardless of whether it is already set. So if more than one signal-detection method succeeds, only the result of the last will be used.
Is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is intentional and I fear also required (even though I am far from happy about this), and not new (just added one more case, in a sense). For example, McStas NeXus files use the "legacy" mechanism for defining the signal. However, for some reason (bad luck?) they also define the non-legacy signal group-attr, but with an unrelated value. Therefore, the search continues, even if that is found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can all 3 methods appear in the same file? Or only the 2 that McStas uses?

Can you reverse the order and return after the first signal was found? This would make it clearer that there are multiple, independent encodings. As it stands, one has to read everything in detail to understand that they don't depend on each other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can you log which one gets used to help with debugging?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScippNexus does not log. Printing warnings is also quite user-unfriendly, as there can easily be hundreds of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to change the current behavior?

No. Why would this change behaviour? From what I understood, the 3rd block will override the signal set earlier (if it finds one). And it will always set aux signals. So moving this to the top and returning if it find a signal should be the same behaviour, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood, the 3rd block will override the signal set earlier (if it finds one). And it will always set aux signals. So moving this to the top and returning if it find a signal should be the same behaviour, no?

No. It does not override previously found signals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see. How about

self._aux_signals = self._find_aux_signals()
if self._signal is None:
  self._signal_name = self._aux_signals.pop(0)  # or w/o 0 if the order is irrelevant
  self._signal = children[self._signal_name]

to make it more explicit?

I keep asking for changes because I am, as you can tell, confused by this function and the precise logic behind it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am considering changing the mechanism such that a user has to select via definitions which "version" of the standard they want to use. The would prevent support for mixed files though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add comments on #126, as that changed this function some more.

Comment on lines 347 to 349
sizes = dict(self._grouping.sizes)
sizes.update(self._event_data.sizes)
return sizes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sizes = dict(self._grouping.sizes)
sizes.update(self._event_data.sizes)
return sizes
return {**self._grouping.sizes, **self._event_data.sizes}

def __init__(self, event_data: Group, grouping_name: str, grouping: Field) -> None:
self._event_data = event_data
self._grouping_name = grouping_name
self._grouping = grouping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what 'grouping' means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a docstring.

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Apr 11, 2023

I don't know if this is useful in practice, or if we should just go back to returning the events by default?

Is the dense data typically used as a histogram of the events? If so, there should be an option to select which one to load. Or at least a postprocessing function.

I guess that the dense data is typically just meant as a "preview" of what you would get when histogramming the events. But good luck finding that documented anywhere. It is not even clear if having both is legal.

@jl-wynen
Copy link
Member

I don't know if this is useful in practice, or if we should just go back to returning the events by default?

Is the dense data typically used as a histogram of the events? If so, there should be an option to select which one to load. Or at least a postprocessing function.

I guess that the dense data is typically just meant as a "preview" of what you would get when histogramming the events. But good luck finding that documented anywhere. It is not even clear if having both is legal.

Based on your answer, I suspect that it will be most useful to return the events as a data array and log that there was dense data that was not read. But this is just a hunch

@SimonHeybrock SimonHeybrock merged commit 2d38ec9 into main Apr 17, 2023
@SimonHeybrock SimonHeybrock deleted the time-dependent-signal-handling branch April 17, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants