
Benchmark refactoring: tidy data and multi-node capability via --scheduler-file #940

Merged: 17 commits merged into rapidsai:branch-22.08 on Jul 1, 2022

Conversation

@wence- (Contributor) commented Jun 22, 2022

This is a move towards using the benchmarks for regular profiling on more than one node.

That requires two substantive changes:

  1. Refactor benchmark running and data processing so that a client can be brought up from an external source (here via --scheduler-file; I think dask-mpi could also be used, but I haven't tried it).
  2. As well as producing human-readable output, produce data that can be consumed by downstream scripts.

I've refactored the benchmarks into common infrastructure, which simplifies new benchmark creation.
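As background on the --scheduler-file mechanism: `dask-scheduler --scheduler-file sched.json` writes a small JSON file recording the scheduler's address, which workers and clients on other nodes can then read to attach to the same cluster (`distributed.Client` also accepts `scheduler_file=` directly). A minimal sketch, with an illustrative helper name not taken from this PR:

```python
import json


def scheduler_address(scheduler_file):
    """Return the scheduler address recorded in a scheduler file.

    `dask-scheduler --scheduler-file sched.json` writes a JSON file
    whose "address" key holds the scheduler's TCP address; any node
    that can read the file can then attach a client to the cluster.
    """
    with open(scheduler_file) as f:
        return json.load(f)["address"]


# With a live cluster one would connect directly, e.g.:
#   from dask.distributed import Client
#   client = Client(scheduler_file="sched.json")
```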

wence- added 5 commits June 22, 2022 13:37
Removes the async from the Cluster and Client definitions (every
future was immediately waited on, so no need for this complication).

Factor out the monolithic run and produce data function into pieces,
and build tidy dataframe (one row per experimental condition) for
saving.
Adds option to connect to cluster specified in a file (via
--scheduler-file) as well as creating a tidy data table for later
visualisation/analysis.
@wence- wence- requested a review from a team as a code owner June 22, 2022 16:15
@github-actions github-actions bot added the python python code needed label Jun 22, 2022
@wence- (Contributor, Author) commented Jun 22, 2022

This will conflict with #937 and #938 so I should coordinate with @pentschev.

@pentschev (Member) left a comment:

Overall it looks good. I haven't gone through it line-by-line yet, but it seems this PR could build on #937 and #938; in other words, we could merge those two first and then resolve conflicts here. What do you think, @wence-?

Review thread on dask_cuda/benchmarks/utils.py (resolved)
@wence- (Contributor, Author) commented Jun 23, 2022

Overall it looks good. I haven't gone through it line-by-line yet, but it seems this PR could build on #937 and #938; in other words, we could merge those two first and then resolve conflicts here. What do you think, @wence-?

Yes, happy to do that.

@codecov-commenter commented Jun 24, 2022

Codecov Report

Merging #940 (0844040) into branch-22.08 (be4aa70) will not change coverage.
The diff coverage is 0.00%.

❗ Current head 0844040 differs from pull request most recent head 1d4c744. Consider uploading reports for the commit 1d4c744 to get more accurate results

@@              Coverage Diff              @@
##           branch-22.08    #940    +/-   ##
=============================================
  Coverage          0.00%   0.00%            
=============================================
  Files                22      22            
  Lines              3149    3260   +111     
=============================================
- Misses             3149    3260   +111     
Impacted Files Coverage Δ
dask_cuda/benchmarks/local_cudf_merge.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/utils.py 0.00% <0.00%> (ø)
dask_cuda/cli/dask_cuda_worker.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_shuffle.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy_map_overlap.py 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be4aa70...1d4c744. Read the comment docs.

@pentschev pentschev added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 24, 2022
@wence- (Contributor, Author) commented Jun 27, 2022

Let's think about the data we want to store as part of a benchmark. At the moment, I store some aggregate data, and then summary statistics for point-to-point messaging between workers. This really blows up the size of the data when using JSON files (a disadvantage of the tidy-data format, since much information is repeated).

For example, on 256 GPUs, a single run's benchmark data with this format consumes 40MB. The majority of that is bumf. The in-memory representation is 18MB (which is still blown up).

I think that the p2p data is best stored just as a dense matrix, at single precision this would be (ngpus**2)*4 bytes (or 256KB in this case), multiplied by the number of different statistical measures we want (probably less than 10). The remaining overall summary statistics will then be a few KB, which is much more workable.

Other things that one should probably capture:

  • Mapping from integer node/device ID to node name/device name (in case the p2p data reveals bad behaviour, we might want to match it to bad hardware)

Following on from #938, the minimum information one should probably gather is:

  • Throughput (data processed per second), with quartiles, median, min, max, and standard deviation
  • Observed aggregated bandwidth (if possible, cf linked discussion above), + statistical measures
  • Time to solution
  • Your favourite metric here.
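The dense-matrix arithmetic above can be sketched in pure Python with the stdlib `array` module (a real implementation would more likely use NumPy; the function name here is illustrative). Packing one statistic for 256 GPUs into single precision gives exactly the 256KB figure:

```python
from array import array


def dense_p2p(ngpus, pair_stats):
    """Pack one p2p statistic {(src, dst): value} into a dense,
    row-major, single-precision matrix with ngpus * ngpus entries."""
    # bytes(n) is n zero bytes, so this builds a zeroed float32 buffer
    m = array("f", bytes(4 * ngpus * ngpus))
    for (src, dst), value in pair_stats.items():
        m[src * ngpus + dst] = value
    return m


m = dense_p2p(256, {(0, 1): 11.5})
# 256**2 entries * 4 bytes each = 262144 bytes = 256 KiB per statistic
```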

@pentschev (Member) commented:

The metrics, and the means to achieve reasonable results, sound sensible to me; I'm +1 on the whole idea.

My only comment: I assume you're already considering this, but for completeness, perhaps we should make this easily extendable. That way we don't need to know in advance all the metrics we may want in the future, and can add them as the need arises.

wence- added 2 commits June 30, 2022 16:31
Simplifies new benchmark creation and ensures that data aggregation scales to large worker counts for all benchmarks.

The idea is that a benchmark now needs to define three functions:

- bench_once (running the benchmark a single time)
- pretty_print_results (to produce a human-readable output)
- create_tidy_results (to produce computer-readable output)
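The three-function contract might look like the following toy skeleton. The function names come from the commit message; the bodies are illustrative stand-ins, not dask_cuda's actual code:

```python
import time


def bench_once(n):
    """Run the workload once and return the wall time
    (a toy stand-in for the real Dask computation)."""
    start = time.perf_counter()
    sum(i * i for i in range(n))
    return time.perf_counter() - start


def pretty_print_results(times):
    """Human-readable summary for the terminal."""
    print(f"{len(times)} runs, best {min(times):.6f}s, worst {max(times):.6f}s")


def create_tidy_results(times, n):
    """Computer-readable output: one row per run, ready to become
    a dataframe for downstream analysis."""
    return [{"size": n, "run": i, "wall_time": t} for i, t in enumerate(times)]


times = [bench_once(10_000) for _ in range(3)]
rows = create_tidy_results(times, 10_000)
```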
@wence- (Contributor, Author) commented Jun 30, 2022

My only comment: I assume you're already considering this, but for completeness, perhaps we should make this easily extendable. That way we don't need to know in advance all the metrics we may want in the future, and can add them as the need arises.

This is now hopefully easier: to add a metric, one adapts the aggregate_transfer_log_data function.
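One common way to structure this kind of extensibility (purely illustrative; only the `aggregate_transfer_log_data` name comes from the PR, and this is not its actual implementation) is a table of named reductions that the aggregation loop iterates over, so adding a metric is a one-line change:

```python
import statistics

# Registry of summary statistics; extending the benchmark's output
# means adding an entry here, not editing the aggregation logic.
METRICS = {
    "median": statistics.median,
    "mean": statistics.mean,
    "max": max,
}


def aggregate(samples):
    """Apply every registered metric to a list of samples."""
    return {name: fn(samples) for name, fn in METRICS.items()}


# Later, without touching aggregate():
#   METRICS["stdev"] = statistics.stdev
```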

Review threads (resolved) on: dask_cuda/benchmarks/local_cudf_merge.py (×2), dask_cuda/benchmarks/local_cupy.py, dask_cuda/benchmarks/utils.py
@wence- (Contributor, Author) commented Jun 30, 2022

I think this is ready for another look.

@wence- wence- added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jun 30, 2022
@pentschev (Member) left a comment:

Overall this all looks really well-organized and more complete, certainly a big leap from the old implementation, thanks @wence- ! I left a few suggestions.

Review threads (resolved) on: dask_cuda/benchmarks/local_cudf_merge.py, dask_cuda/benchmarks/utils.py (×2, outdated)
wence- added 3 commits July 1, 2022 09:32
Rather than randomly sorting the workers on a host (by port number)
sort by host and then device id (if it is set).
@pentschev (Member) left a comment:

Looks very nice now; just one minor change still needed and then we're done.

Review thread on dask_cuda/benchmarks/utils.py (outdated, resolved)
@pentschev (Member) left a comment:

LGTM now, but I think we'll need the fixes for cuda-python<11.7.1 to be merged before we can merge this. Thanks @wence- for the nice work in making things look much more professional!

@pentschev (Member) commented:

Once more, thanks @wence- !

@pentschev (Member) commented Jul 1, 2022

@gpucibot merge


@rapids-bot rapids-bot bot merged commit 9515a22 into rapidsai:branch-22.08 Jul 1, 2022
@wence- wence- deleted the wence/benchmark-tidy-data branch July 1, 2022 20:54