Switch to the new dask CLI #981
Conversation
Codecov Report

Base: 0.00% // Head: 87.93% // Increases project coverage by +87.93%.

Additional details and impacted files:

@@            Coverage Diff             @@
## branch-23.02 #981 +/- ##
=================================================
+ Coverage 0.00% 87.93% +87.93%
=================================================
Files 26 17 -9
Lines 3439 2295 -1144
=================================================
+ Hits 0 2018 +2018
+ Misses 3439 277 -3162
☔ View full report at Codecov. |
The idea seems nice, but this is a very intricate component in many scripts and I imagine this will be annoying to abruptly change. Would we be able to keep the old `dask-cuda-worker` entry point around for a while with a deprecation warning? |
AIUI, the old entry-point continues to exist and will work. I agree that a deprecation warning is a good idea, but not sure exactly how to hook that up. |
Yes, you're right. That was my bad, I didn't pay a lot of attention to the actual changes.
I think we can simply check whether the old entry point was used and emit the deprecation warning from there. |
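For illustration, a minimal sketch of how that deprecation warning could be hooked into the old console script; the function name and the commented-out import path are hypothetical, not the actual dask-cuda code:

```python
import warnings

import click


@click.command(
    name="dask-cuda-worker",
    context_settings={"ignore_unknown_options": True, "allow_extra_args": True},
)
@click.pass_context
def legacy_worker(ctx):
    """Old console-script entry point, kept only for backwards compatibility."""
    warnings.warn(
        "`dask-cuda-worker` is deprecated; please use `dask cuda worker` instead.",
        FutureWarning,
    )
    # Hand the original arguments off to the new `dask cuda worker` command
    # (hypothetical import path; left commented to avoid guessing at it).
    # from dask_cuda.cli import worker
    # ctx.forward(worker)
```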
This PR has been labeled as inactive. |
This is still blocked on the upstream PR (dask/dask#9283). |
The new CLI has been merged and released, so we should be good to merge this now. I've bumped the minimum required `dask` version here accordingly. |
Overall LGTM, thanks @jacobtomlinson. I now wonder, could we have a `dask cuda scheduler` without increasing code duplication too much now? I think this could be useful to have some default overrides, more specifically `DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True`.
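To make the idea concrete, here is a rough sketch of what such a subcommand might look like; the command itself and the hand-off to the scheduler are assumptions, only the config key mirrors the environment variable mentioned above:

```python
import click
import dask.config


# In practice this would hang off the existing `cuda` click group
# (i.e. `@cuda.command(...)`); shown standalone here to keep the sketch short.
@click.command(name="scheduler")
@click.option("--protocol", default="ucx", show_default=True)
def scheduler(protocol):
    """Start a scheduler with CUDA-friendly config defaults (sketch only)."""
    # DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True maps to this
    # config key (dashes and underscores are interchangeable in dask config).
    dask.config.set({"distributed.comm.ucx.create-cuda-context": True})
    # From here the command would hand off to the regular distributed
    # scheduler entry point; omitted to avoid guessing at its exact API.
    click.echo(f"Would start a scheduler using protocol={protocol}")
```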
It would be good to add the entry point to the Conda recipe as well (as is done in the feedstock). |
Thanks @charlesbluca for picking this up, I've left a few suggestions/questions.
@click.option(
    "--get-cluster-configuration",
    "get_cluster_conf",
    default=False,
    is_flag=True,
    required=False,
    show_default=True,
    help="""Print a table of the current cluster configuration""",
)
I guess this is the only argument, and without it nothing will happen. Does it make more sense to remove it and make the default action (i.e. when no non-mandatory arguments are passed) print the configuration?
Yeah, was thinking that as well, just wasn't sure if it made more sense to do in a separate PR since the diff here is fairly large already.
(cc @quasiben in case there's something non-obvious this command does when `--get-cluster-configuration` isn't passed as an option)
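For illustration, a sketch of what the flag-less version of the command could look like, assuming printing simply becomes the default behaviour; `print_cluster_configuration` is a hypothetical helper, not the real dask-cuda code:

```python
import click


@click.command(name="config")
@click.option("--scheduler", default=None, help="Scheduler address to query.")
def config(scheduler):
    """Print the cluster configuration by default, with no extra flag needed."""
    # Printing is the only action this command performs, so do it
    # unconditionally instead of gating it behind --get-cluster-configuration.
    print_cluster_configuration(scheduler)


def print_cluster_configuration(scheduler_address):
    """Placeholder for the routine that connects to the scheduler and
    prints the configuration table."""
    click.echo(f"Cluster configuration for scheduler at {scheduler_address}")
```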
I canceled the latest GH Action run because it looks like the Python tests were in progress for nearly 4 hours, which is dramatically longer than the usual ~30 minutes that they take. The logs are below; it looks like there were some errors throughout the tests. To prevent this from happening again, I will open a PR to put a reasonable timeout on the Python tests. This also happened on #1081. |
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of 45 minutes to that particular job. The job usually takes ~25 minutes to complete, so 45 minutes should be plenty. This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).
conda/recipes/dask-cuda/meta.yaml
{% for e in data.get("project", {}).get("scripts", {}).keys() %}
- {{ e }} --help
{% endfor %}
- dask cuda --help
- dask cuda worker --help
- dask cuda config --help
Actually we might be able to do this to consolidate them further
Suggested change (replacing the block quoted above):

- dask cuda --help
{% for e in data.get("project", {}).get("scripts", {}).keys() %}
- {{ e }} --help
- {{ e|replace("-", " ") }} --help
{% endfor %}
Good catch! Yeah, for now we can do this; I'm interested in whether we intend to deprecate and eventually remove the old entry points, which would force us to hardcode this at some point in the future (cc @jacobtomlinson in case you've had any discussion around this).
Yeah, that's fair. For that reason I wasn't sure whether we would want to do this or not.
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- #981 (comment)
- #1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of ~~45 minutes to that particular job~~ 30 minutes to the `pytest` command. The job usually takes ~25 minutes to complete entirely, so 30 minutes just for `pytest` should be plenty. This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).

Authors:

- AJ Schmidt (https://github.com/ajschmidt8)

Approvers:

- Jake Awe (https://github.com/AyodeAwe)
…port` (#1085)

Changed this because IIUC `pkg_resources.working_set` is listing the installed distributions and not necessarily the importable modules; this becomes an issue if the distribution and module names aren't the same (e.g. one would `conda install pillow` and then `import PIL`), which was causing some failures in CI that seem unrelated to the changes here.

_Originally posted by @charlesbluca in #981 (comment)_

Authors:

- Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:

- Peter Andreas Entschev (https://github.com/pentschev)

URL: #1085
Approving `ops-codeowner` file changes.
LGTM, thanks @jacobtomlinson and @charlesbluca for working on this!
/merge
Have noted the corresponding changes for the conda-forge recipe in issue conda-forge/dask-cuda-feedstock#17. Please feel free to note anything there I may have missed.
…dask cuda config` (#1088)

As @pentschev brought up in #981 (comment), we shouldn't need the `--get-cluster-configuration` option for `dask cuda config` since it only enables/disables printing the cluster configuration. Also added a check to ensure that a scheduler address or scheduler file has been specified, as otherwise IIUC running `dask cuda config` would just end up starting up and querying a local cluster on CPU.

EDIT: Modified the scheduler check for `dask cuda worker` as well since it seems like a general improvement.

Authors:

- Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:

- Peter Andreas Entschev (https://github.com/pentschev)

URL: #1088
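The scheduler check described above could look roughly like this; the function name and message wording are assumptions for illustration, not the actual #1088 code:

```python
import click


def check_scheduler_specified(scheduler=None, scheduler_file=None):
    """Hypothetical guard shared by `dask cuda config` and `dask cuda worker`."""
    if scheduler is None and scheduler_file is None:
        raise click.UsageError(
            "A scheduler address or scheduler file must be provided; "
            "otherwise a local CPU cluster would be started and queried instead."
        )
```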
In dask/dask#9283 we are adding a new top-level `dask` CLI command which can be extended by other modules using entry points. A primary motivation here is to improve discoverability by uniting everything under one tool and allowing folks to run `dask --help` and `dask <subcommand> --help` to learn more about the various tools.

This PR adds a new `click` group called `cuda` and moves the `dask-cuda-worker` command under that group with the name `worker`. This means `dask-cuda-worker` becomes `dask cuda worker` in the new CLI tool.

I haven't made any changes to the existing `dask-cuda-worker` console script, so that will still continue to work, but maybe we should add a deprecation warning to it?

I went with this name rather than `dask cuda-worker` because I think it is more readable and also leaves us open to adding more subcommands in the future without cluttering up the top-level `dask` namespace.

The CLI PR needs to be merged and released before this can be merged.

Fixes #1038
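For readers unfamiliar with how the new CLI discovers subcommands, a rough sketch of what the `cuda` group and its registration could look like is below. The module path, function bodies, and the `dask_cli` entry-point group name are assumptions for illustration, not a copy of the actual dask-cuda source.

```python
# dask_cuda/cli.py: illustrative layout, not the exact dask-cuda source
import click


@click.group(name="cuda")
def cuda():
    """GPU-specific commands, surfaced as `dask cuda <subcommand>`."""


@cuda.command(name="worker")
@click.argument("scheduler", required=False)
def worker(scheduler):
    """Roughly where the old `dask-cuda-worker` logic would live."""
    click.echo(f"Starting a CUDA worker connected to {scheduler}")


# The group is then advertised to the new `dask` CLI via an entry point,
# e.g. in pyproject.toml (assuming the entry-point group is named `dask_cli`):
#
#   [project.entry-points."dask_cli"]
#   cuda = "dask_cuda.cli:cuda"
```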