Discussion: Performance improvements and aggregation #401
Comments
I did some additional benchmarking. I skipped the benchmarks for older package versions because I previously had to add some shims to account for backwards-incompatible changes in flow and didn't want to do that work again. There are still significant internal refactors that need to be done.
Regarding my initial assumption: when we print the status, we have to evaluate all pre/postconditions to determine both "eligible" (all preconditions met and some unmet postcondition) and "complete" (all postconditions met). With run/submit, we can exit early as soon as some precondition is unmet, but that is not the case for status (we must do more work). I found and implemented a partial optimization that will help in certain cases.
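To make the eligible/complete distinction concrete, here is a minimal sketch (hypothetical names, not flow's actual implementation) of why the status path cannot short-circuit the way run/submit can:

```python
# Minimal sketch (hypothetical names, not flow's actual implementation) of the
# difference between a run/submit eligibility check and building a status entry.

def is_eligible(op, job):
    """run/submit can short-circuit: the first unmet precondition ends the check."""
    if not all(pre(job) for pre in op.preconditions):
        return False
    # Eligible only if at least one postcondition is still unmet.
    return not all(post(job) for post in op.postconditions)

def status_entry(op, job):
    """Status needs both flags, so every pre/postcondition may be evaluated."""
    pre_met = [pre(job) for pre in op.preconditions]       # no early exit here
    post_met = [post(job) for post in op.postconditions]   # nor here
    return {
        "eligible": all(pre_met) and not all(post_met),
        "complete": all(post_met),
    }
```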
Another place where I am trying a new approach is the status check itself. The previous logical flow performed two separate iterations; in the new callback approach (#417), I combine these two iterations into one by providing a callback to the status-fetching code. I am continuing to benchmark the current state, and there are further areas for (performance) improvement.
This has been resolved by #417.
This issue exists to document a series of performance improvements I have worked on recently. We created the `next` branch to be a testing area for the new aggregation features, where we can focus on eliminating any bugs and improving performance before merging into `master` and releasing 0.12.0.

In PR #399, I have created a "benchmark" project with 3 operations and 2 labels. This project does not utilize groups or aggregation, so that we can use it as a fair performance comparison with older versions of signac.
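For context, a rough sketch of what a benchmark project of that shape could look like is shown below (this is not the actual project from PR #399; the operation, label, condition, and file names here are hypothetical):

```python
# project.py -- a sketch of a benchmark-style FlowProject with 3 operations and
# 2 labels (hypothetical names; the actual benchmark project lives in PR #399).
from flow import FlowProject


class BenchmarkProject(FlowProject):
    pass


@BenchmarkProject.label
def started(job):
    return job.doc.get("started", False)


@BenchmarkProject.label
def processed(job):
    return job.isfile("output.txt")


@BenchmarkProject.operation
@BenchmarkProject.post(started)
def initialize(job):
    job.doc.started = True


@BenchmarkProject.operation
@BenchmarkProject.pre(started)
@BenchmarkProject.post(processed)
def process(job):
    with open(job.fn("output.txt"), "w") as file:
        file.write(str(job.sp))


@BenchmarkProject.operation
@BenchmarkProject.pre(processed)
@BenchmarkProject.post.isfile("analysis.txt")
def analyze(job):
    with open(job.fn("analysis.txt"), "w") as file:
        file.write("done\n")


if __name__ == "__main__":
    BenchmarkProject().main()
```

Timing commands like `python project.py status` against a project of this shape, across different signac/flow versions, is essentially what the comparison below measures.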
Comparison of runtimes on the test project (2 labels, 3 operations with pre/postconditions):
We're getting the performance back down to a tolerable level (within the bounds of past releases). It was something like 100-110 seconds before I made the recent round of PRs to signac and signac-flow. Measuring the scaling for larger workspaces will also be important before we draw any final conclusions.
It looks like the performance improvements in signac have been extremely helpful, and using signac `master` with the last flow release 0.11.0 or the `master` branch of flow gives the overall best performance thus far (and it would be even better with the `_dfs_convert` optimization in this PR). The overall trend is that newer signac versions are faster, while newer flow versions are slower (as we would expect, since signac has received more optimizations over time and flow has received more complex features over time).

The primary difference between flow's `master` and `next` branches is the introduction of aggregation. Breaking down the profile results for those two, the difference in performance appears to be solely affecting status check performance (`print_status`). Running and submitting jobs appear to have nearly identical performance to past versions, and with similar profiles before/after aggregation.

I studied `_fetch_status` for a while to identify the cause of the slowdown with aggregation. I believe the difference is due to the swapped order of iteration. With the new aggregation logic, the status fetching essentially does "for every group, get the status of each aggregate" by calling `_get_group_status`, which returns a dictionary structured like `{"operation_name": ..., "job_status_details": {aggregate_id: JobStatus for aggregate_id in operation_aggregate_store}}`, and then it combines the results from every group. The output is a list of dictionaries structured like `[{"job_id": job_id, "operations": {operation_name: JobStatus}, "labels": [...]}, ...]` (I'm leaving out some details like error handling).
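For illustration, here is a simplified sketch (hypothetical helper and variable names; the real code also handles aggregates that span multiple jobs, error reporting, etc.) of how per-group results of that shape could be stitched into the per-job output list:

```python
# Simplified sketch (hypothetical names) of combining per-group status results
# into the per-job output list, assuming each aggregate is a single job.
from collections import defaultdict


def combine_group_results(group_results, labels_by_job):
    """group_results: [{"operation_name": ..., "job_status_details": {agg_id: status}}, ...]"""
    operations_by_job = defaultdict(dict)
    for result in group_results:
        op_name = result["operation_name"]
        for aggregate_id, status in result["job_status_details"].items():
            operations_by_job[aggregate_id][op_name] = status
    return [
        {
            "job_id": job_id,
            "operations": operations,
            "labels": labels_by_job.get(job_id, []),
        }
        for job_id, operations in operations_by_job.items()
    ]
```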
The previous code could loop over the whole project one time, fetching the status of every operation for each job. Iterating over the project one time is faster than iterating over the aggregate store for each group and then stitching the results from each group together. My profiling tool doesn't quite show the right info here, but this is my best guess. If my assessment is accurate, there are some potential solutions. One of them is a change under which `Project.open_job` would become much cheaper, so that the order of iteration (groups, then aggregate store) or (aggregate store, then groups) wouldn't matter nearly as much; it would also provide performance benefits across the board.

From discussions in dev meetings and on Slack, there seems to be a consensus that we should perform a total refactor of status-related code, which will be helped by some of the pickle-related changes in #369 and #383 (the code can be parallelized more easily if we utilize `cloudpickle`).
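As a side note on the `cloudpickle` point: conditions and callbacks are often lambdas or closures, which the standard library `pickle` refuses to serialize but `cloudpickle` handles, and that is what makes distributing the work to other processes easier. A tiny illustration (not code from flow):

```python
# Why cloudpickle matters for parallelization: standard pickle rejects
# lambdas/closures, cloudpickle serializes them for worker processes.
import pickle

import cloudpickle  # third-party dependency

condition = lambda job: job.isfile("output.txt")  # noqa: E731 -- typical condition style

try:
    pickle.dumps(condition)
except pickle.PicklingError as error:
    print(f"standard pickle failed: {error}")

payload = cloudpickle.dumps(condition)   # works, so it can be shipped to workers
restored = cloudpickle.loads(payload)    # workers re-create and call the condition
```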