shuffle-benchmark: add --partition-distribution
#1081
I canceled the latest GH Action run because the Python tests had been in progress for over 4 hours, dramatically longer than the ~30 minutes they usually take. The logs are below; it looks like there were some errors throughout the tests. To prevent this from happening again, I will open a PR to put a reasonable timeout on the Python tests. This also happened on #981.
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of 45 minutes to that particular job. The job usually takes ~25 minutes to complete, so 45 minutes should be plenty. This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- #981 (comment)
- #1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of ~~45 minutes to that particular job~~ 30 minutes to the `pytest` command. The job usually takes ~25 minutes to complete entirely, so 30 minutes just for `pytest` should be plenty. This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).

Authors:
- AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
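The idea of bounding a test command with a wall-clock timeout can be sketched in Python as follows. This is only an illustration of the technique, not the change made in the referenced PR, whose actual mechanism is not shown in this conversation. The helper name `run_with_timeout` is hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd` and return (finished, returncode).

    `finished` is False and `returncode` is None when the command
    exceeded `timeout_seconds`. (Hypothetical helper for illustration.)
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process and waits for it
        # before re-raising, so nothing is left hanging here.
        return False, None
    return True, completed.returncode
```

A caller could then bound a test run to 30 minutes with something like `run_with_timeout(["pytest"], 30 * 60)` and fail the job when `finished` is `False`, instead of letting a hung run occupy a CI node for hours.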
Codecov Report

Base: 87.89% // Head: 87.89% // No change to project coverage 👍

Additional details and impacted files

| Coverage Diff | branch-23.02 | #1081 | +/- |
|---|---|---|---|
| Coverage | 87.89% | 87.89% | |
| Files | 17 | 17 | |
| Lines | 2296 | 2296 | |
| Hits | 2018 | 2018 | |
| Misses | 278 | 278 | |
Minor niggle in an error message, but otherwise I think this looks fine.
…al_cudf_shuffle_partition_argument
…dsbk/dask-cuda into local_cudf_shuffle_partition_argument
Thanks!
/merge
Implements a `--partition-distribution` argument to `local_cudf_shuffle.py`.
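As a rough sketch of how such a flag might be wired into a benchmark script's command line, the example below uses `argparse`. The conversation does not say what values the argument accepts, so this assumes, purely for illustration, that it takes a list of integer partition counts. The parser, the helper names, and the validation rule are all hypothetical and are not taken from `local_cudf_shuffle.py`.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag name comes from the PR description.
    parser = argparse.ArgumentParser(
        description="Illustrative shuffle benchmark options"
    )
    parser.add_argument(
        "--partition-distribution",
        nargs="+",
        type=int,
        default=None,
        metavar="N",
        help="ASSUMED semantics: one integer partition count per entry.",
    )
    return parser


def parse_partition_distribution(argv):
    """Return the parsed list of ints, or None if the flag is absent."""
    args = build_parser().parse_args(argv)
    dist = args.partition_distribution
    if dist is not None and any(n < 0 for n in dist):
        # Assumed validation rule: counts cannot be negative.
        raise ValueError("partition counts must be non-negative")
    return dist
```

With these assumptions, `parse_partition_distribution(["--partition-distribution", "1", "2", "3"])` yields a list of three integers, and omitting the flag yields `None` so the script can fall back to its default behavior.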