Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve depends_on-chains eagerly so positions can be computed when subgroups are loaded #221

Merged
merged 6 commits into from
Aug 8, 2024

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Aug 7, 2024

Problem description

We have recently moved to loading individual components from NeXus files, such as NXmonitor. After such a load snx.compute_positions can then not resolve depends_on chains that loaded outside that group.

A previous implementation attempted to perform this eagerly while loading, but it causes performance issues. For example, a Bifrost file (873855_00000015.hdf, a "known" worst case) currently loads in 1 s, but takes more than 10 s if we do a naive implementation that directly follows and loads each chain.

Solution

  • I tried adding a cache to a naive solution, but this still takes around 3.5 s, and it likely introduces other problems.
  • Resolving only depends_on attributes that lead outside the current group brings this down to 2 s. I believe this is acceptable as most other files will see a much smaller overhead. Newer Bifrost files might also be less of an issue as it was planned to improve the chain structure (using shorter chains).

Remaining subtlety: If there is a chain with 2 jumps cut by a depend_on in the current group this implementation will not resolve the second jump. This may be rare enough to matter, so I am avoiding complications until this actually causes issues. Example:

group1/transformations/t1/depends_on = group2/transformations/t2
group2/transformations/t2/depends_on = t3  # breaks resolve
group2/transformations/t3/depends_on = group3/transformations/t4
grouped3/transformations/t4

@SimonHeybrock SimonHeybrock requested a review from jl-wynen August 7, 2024 11:40
@SimonHeybrock SimonHeybrock changed the title Resolve chains eagerly so positions can be computed when subgroups are loaded Resolve depends_on-chains eagerly so positions can be computed when subgroups are loaded Aug 7, 2024
resolved = self._obj.parent[depends_on][()]
except Exception: # noqa: S110
# Catchall since resolving not strictly necessary, we should not
# fail the rest of the loading process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you log a message here to help with debugging if we hit this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScippNexus is not using logging so far. Are you suggesting to change this?

If this fail, you will get an error later when calling snx.compute_positions instead. Most likely cause for exceptions here is a dead link.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

# chain after loading, so we try to resolve it directly.
if relative.startswith('..'):
try:
resolved = self._obj.parent[depends_on][()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your description of the PR, did you mean that this line does not work whenever there is a chain that has multiple items within the same group? (group2 in your example)

Does the try-except here catch this case? Or do we need to check whether resolved is a data group?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean that we will resolve a continuous series of jumps to other groups, but if there is a jump followed by a non-jump the resolving will terminate there. I'll think some more about whether we can fix this...

@SimonHeybrock
Copy link
Member Author

I pushed an update, changing the mechanism. This now follows multiple hops correctly. It makes the "worst case" Bifrost load take 6 seconds.

@jl-wynen
Copy link
Member

jl-wynen commented Aug 8, 2024

Why does this affect the performance so much? Does the BIFROST file contain many cases that were not properly resolved with the first version of this PR?

@SimonHeybrock
Copy link
Member Author

Why does this affect the performance so much? Does the BIFROST file contain many cases that were not properly resolved with the first version of this PR?

Yes, there was a chain of hops, each to a different group. Now, ideally one would want to resolve only hops that lead outside the top-level loaded group, not outside the current subgroup being loaded. That sounds complicated with the current mechanism, so maybe some more need for rethinking this entirely?

@SimonHeybrock
Copy link
Member Author

With a recent Bifrost file version the extra overhead with this PR is below the noise, a fraction of a second.

@jl-wynen
Copy link
Member

jl-wynen commented Aug 8, 2024

Then I would say we can leave it as is. We should not make it more complicated than needed.

@SimonHeybrock SimonHeybrock merged commit 5fd2af1 into main Aug 8, 2024
3 checks passed
@SimonHeybrock SimonHeybrock deleted the resolve-chains branch August 8, 2024 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants