
Switch to the new dask CLI #981

Merged: 20 commits, merged Jan 19, 2023

Conversation

jacobtomlinson (Member) commented Aug 24, 2022

In dask/dask#9283 we are adding a new top level dask CLI command which can be extended by other modules using entry points. A primary motivation here is to improve discoverability by uniting everything under one tool and allowing folks to run dask --help and dask <subcommand> --help to learn more about the various tools.

This PR adds a new click group called cuda and moves the dask-cuda-worker command under that group with the name worker. This means the dask-cuda-worker becomes dask cuda worker in the new CLI tool.

I haven't made any changes to the existing dask-cuda-worker console script, so it will continue to work, but maybe we should add a deprecation warning to it?

I went with this name rather than dask cuda-worker because I think it is more readable and also leaves us open to adding more subcommands in the future without cluttering up the top-level dask namespace.

$ dask --help             
Usage: dask [OPTIONS] COMMAND [ARGS]...

  Dask command line interface.

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  cluster    Manage dask clusters.
  cuda       GPU subcommands.
  docs       Open Dask documentation (https://docs.dask.org/) in a web browser.
  info       Information about your dask installation.
  scheduler  Launch a distributed scheduler.
  ssh        Launch a distributed cluster over SSH.
  worker     Launch a distributed worker attached to an existing SCHEDULER.
$ dask cuda --help        
Usage: dask cuda [OPTIONS] COMMAND [ARGS]...

  GPU subcommands.

Options:
  -h, --help  Show this message and exit.

Commands:
  worker  Launch a distributed worker with GPUs attached to an existing SCHEDULER.
$ dask cuda worker --help  
Usage: dask cuda worker [OPTIONS] [SCHEDULER] [PRELOAD_ARGV]...

  Launch a distributed worker with GPUs attached to an existing SCHEDULER.

  See https://docs.rapids.ai/api/dask-cuda/stable/quickstart.html#dask-cuda-worker for
  info.

Options:
  --host TEXT                     IP address of serving host; should be visible to the
                                  scheduler and other workers. Can be a string (like
                                  ``"127.0.0.1"``) or ``None`` to fall back on the
                                  address of the interface specified by
                                  ``--interface`` or the default interface.
  --nthreads INTEGER              Number of threads to be used for each Dask worker
                                  process.  [default: 1]
...
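
For context, here is a minimal sketch of how a subcommand group can be hooked into the new dask CLI. The command bodies and the entry-point wiring below are illustrative assumptions (entry-point group name assumed to be "dask_cli"), not the exact dask-cuda implementation:

```python
# Illustrative sketch (not the actual dask-cuda code): a click group named
# "cuda" with a "worker" subcommand, exposed to the top-level `dask` command
# via the entry-point mechanism added in dask/dask#9283.
import click


@click.group(name="cuda")
def cuda():
    """GPU subcommands."""


@cuda.command(name="worker")
@click.argument("scheduler", required=False)
def worker(scheduler):
    """Launch a distributed worker with GPUs attached to an existing SCHEDULER."""
    ...  # delegate to the existing dask-cuda worker startup code


# The group would then be registered in the package metadata, e.g. in setup.py:
#
# setup(
#     ...,
#     entry_points={"dask_cli": ["cuda = dask_cuda.cli:cuda"]},
# )
```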

The CLI PR needs to be merged and released before this can be merged.

Fixes #1038

github-actions bot added the python (python code needed) label Aug 24, 2022
jacobtomlinson added the 5 - DO NOT MERGE (Hold off on merging; see PR for details), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Aug 24, 2022
codecov-commenter commented Aug 24, 2022

Codecov Report

Base: 0.00% // Head: 87.93% // Increases project coverage by +87.93% 🎉

Coverage data is based on head (f91ff34) compared to base (8c87288).
Patch coverage: 87.28% of modified lines in pull request are covered.

Additional details and impacted files
@@                Coverage Diff                @@
##           branch-23.02     #981       +/-   ##
=================================================
+ Coverage          0.00%   87.93%   +87.93%     
=================================================
  Files                26       17        -9     
  Lines              3439     2295     -1144     
=================================================
+ Hits                  0     2018     +2018     
+ Misses             3439      277     -3162     
Impacted Files Coverage Δ
dask_cuda/initialize.py 87.23% <ø> (+87.23%) ⬆️
dask_cuda/cli.py 87.37% <75.00%> (ø)
dask_cuda/utils.py 83.43% <82.35%> (+83.43%) ⬆️
dask_cuda/explicit_comms/dataframe/shuffle.py 98.65% <95.91%> (+98.65%) ⬆️
dask_cuda/cuda_worker.py 77.92% <100.00%> (+77.92%) ⬆️
dask_cuda/explicit_comms/comms.py 99.05% <100.00%> (+99.05%) ⬆️
dask_cuda/get_device_memory_objects.py 92.85% <100.00%> (+92.85%) ⬆️
dask_cuda/local_cuda_cluster.py 89.01% <100.00%> (+89.01%) ⬆️
dask_cuda/proxify_host_file.py 93.71% <100.00%> (+93.71%) ⬆️
dask_cuda/benchmarks/common.py
... and 24 more


pentschev (Member) commented

The idea seems nice, but this is a very intricate component in many scripts and I imagine this will be annoying to abruptly change. Would we be able to keep dask-cuda-worker for some time with a DeprecationWarning added to it?

wence- (Contributor) commented Aug 25, 2022

> The idea seems nice, but this is a very intricate component in many scripts and I imagine this will be annoying to abruptly change. Would we be able to keep dask-cuda-worker for some time with a DeprecationWarning added to it?

AIUI, the old entry-point continues to exist and will work. I agree that a deprecation warning is a good idea, but not sure exactly how to hook that up.

pentschev (Member) commented

> AIUI, the old entry-point continues to exist and will work.

Yes, you're right. That was my bad, I didn't pay a lot of attention to the actual changes.

> I agree that a deprecation warning is a good idea, but not sure exactly how to hook that up.

I think we can simply check if dask_cuda_worker.py is running without the cuda argument and then raise the warning.
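
As a rough sketch of that idea (the module layout and function name in dask-cuda are assumptions here), the legacy console-script entry point could simply emit a warning before falling through to the shared implementation:

```python
# Hypothetical sketch: warn when the old `dask-cuda-worker` console script is
# used, then continue on the same code path as `dask cuda worker`.
import warnings


def legacy_worker_main():
    """Entry point assumed to back the old `dask-cuda-worker` console script."""
    warnings.warn(
        "`dask-cuda-worker` is deprecated; please use `dask cuda worker` instead.",
        FutureWarning,
    )
    # ...then invoke the same worker startup logic used by `dask cuda worker`.
```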

github-actions bot commented

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

jacobtomlinson (Member, Author) commented

This is still blocked on the upstream PR.

jacobtomlinson removed the 5 - DO NOT MERGE (Hold off on merging; see PR for details) label Nov 9, 2022
jacobtomlinson marked this pull request as ready for review November 9, 2022 11:30
jacobtomlinson requested a review from a team as a code owner November 9, 2022 11:30
jacobtomlinson (Member, Author) commented

The new CLI has been merged and released so we should be good to merge this now.

I've bumped the minimum version of dask and distributed to one with the CLI, although I'm not sure this is actually necessary, as folks can still use dask-cuda-worker and the new dask cuda CLI namespace is opt-in.

pentschev changed the base branch from branch-22.10 to branch-22.12 November 9, 2022 12:27
pentschev (Member) left a comment

Overall LGTM, thanks @jacobtomlinson . I now wonder, could we have a dask cuda scheduler without increasing code duplication too much now? I think this could be useful to have some default overrides, more specifically DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True.
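
To make the idea concrete, a heavily hedged sketch of what such a command could look like (nothing below exists in this PR; the command name, config-key spelling, and hand-off are assumptions mirroring the environment variable above):

```python
# Hypothetical sketch of a `dask cuda scheduler` command that applies a
# CUDA-oriented default before starting an ordinary distributed scheduler.
import click
import dask.config


@click.command(name="scheduler")
def scheduler():
    """Launch a distributed scheduler with UCX/CUDA-friendly defaults."""
    # Equivalent of setting DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True
    # in the environment.
    dask.config.set({"distributed.comm.ucx.create-cuda-context": True})
    ...  # then start the scheduler exactly as `dask scheduler` would
```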

requirements.txt: review comment (outdated, resolved)
setup.py: review comment (outdated, resolved)
jakirkham (Member) commented Nov 11, 2022

It would be good to add the entry point to the Conda recipe as well (as is done in the feedstock).

github-actions bot added the conda (conda issue) label Jan 13, 2023
pentschev (Member) left a comment

Thanks @charlesbluca for picking this up; I've left a few suggestions/questions.

Comment on lines +430 to +438
@click.option(
"--get-cluster-configuration",
"get_cluster_conf",
default=False,
is_flag=True,
required=False,
show_default=True,
help="""Print a table of the current cluster configuration""",
)
A member commented:

I guess this is the only argument and without it nothing will happen. Does it make more sense to remove and make the default action (without any non-mandatory arguments) print the configuration?

A member replied:

Yeah, I was thinking that as well; I just wasn't sure if it made more sense to do it in a separate PR, since the diff here is fairly large already.

(cc @quasiben in case there's something non-obvious this command does when --get-cluster-configuration isn't passed as an option)

dask_cuda/cli.py: review comment (outdated, resolved)
dask_cuda/cli.py: review comment (outdated, resolved)
ajschmidt8 (Member) commented Jan 13, 2023

I canceled the latest GH Actions run because it looks like the Python tests were in progress for nearly 4 hours, which is dramatically longer than the usual ~30 minutes they take.

The logs are below; it looks like there were some errors throughout the tests.

To prevent this from happening again, I will open a PR to put a reasonable timeout on the Python tests.

This also happened on #1081.

ajschmidt8 added a commit to ajschmidt8/dask-cuda that referenced this pull request Jan 13, 2023
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of 45 minutes to that particular job.

The job usually takes ~25 minutes to complete, so 45 minutes should be plenty.

This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).
Comment on lines 45 to 50
{% for e in data.get("project", {}).get("scripts", {}).keys() %}
- {{ e }} --help
{% endfor %}
- dask cuda --help
- dask cuda worker --help
- dask cuda config --help
A member commented:

Actually, we might be able to do this to consolidate them further:

Suggested change, replacing:

{% for e in data.get("project", {}).get("scripts", {}).keys() %}
- {{ e }} --help
{% endfor %}
- dask cuda --help
- dask cuda worker --help
- dask cuda config --help

with:

- dask cuda --help
{% for e in data.get("project", {}).get("scripts", {}).keys() %}
- {{ e }} --help
- {{ e|replace("-", " ") }} --help
{% endfor %}

A member replied:

Good catch! Yeah, for now we can do this. I'm interested in whether we intend to deprecate and eventually remove the old entry points, which would force us to hardcode this at some point in the future (cc @jacobtomlinson if you've had any discussion around this).

A member replied:

Yeah, that's fair. For that reason I wasn't sure whether we would want to do this or not.

ajschmidt8 added a commit that referenced this pull request Jan 13, 2023
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- #981 (comment)
- #1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of ~~45 minutes to that particular job~~ 30 minutes to the `pytest` command.

The job usually takes ~25 minutes to complete entirely, so 30 minutes just for `pytest` should be plenty.

This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).

Authors:
   - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
   - Jake Awe (https://github.com/AyodeAwe)
rapids-bot bot pushed a commit that referenced this pull request Jan 17, 2023
…port` (#1085)

Changed this because IIUC `pkg_resources.working_set` is listing the installed distributions and not necessarily the importable modules; this becomes an issue if the distribution and module names aren't the same (e.g. one would `conda install pillow` and then `import PIL`), which was causing some failures in CI that seem unrelated to the changes here.

Originally posted by @charlesbluca in #981 (comment)

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1085
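
For readers less familiar with the distinction being drawn above, a small illustration (the package name is only an example) of why listing installed distributions is not the same as checking importability:

```python
# pkg_resources.working_set lists installed *distributions*, while importlib
# checks for importable *modules*; Pillow vs. PIL is the classic mismatch.
import importlib.util

import pkg_resources

installed = {dist.key for dist in pkg_resources.working_set}  # normalized distribution names
print("pillow" in installed)  # True when the Pillow distribution is installed
print("pil" in installed)     # False: "PIL" is the module name, not the distribution name

print(importlib.util.find_spec("PIL") is not None)  # True when the PIL module is importable
```
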
ajschmidt8 (Member) left a comment

Approving ops-codeowner file changes

jakirkham requested a review from pentschev January 19, 2023 00:49
pentschev (Member) left a comment

LGTM, thanks @jacobtomlinson and @charlesbluca for working on this!

pentschev (Member) commented

/merge

jakirkham (Member) commented

I've noted the corresponding changes for the conda-forge recipe in issue conda-forge/dask-cuda-feedstock#17. Please feel free to note anything there I may have missed.

rapids-bot bot pushed a commit that referenced this pull request Jan 19, 2023
…dask cuda config` (#1088)

As @pentschev brought up in #981 (comment), we shouldn't need the `--get-cluster-configuration` option for `dask cuda config` since it only enables/disables printing the cluster configuration.

Also added a check to ensure that a scheduler address or scheduler file has been specified, as otherwise IIUC running `dask cuda config` would just end up starting up and querying a local cluster on CPU.

EDIT:

Modified the scheduler check for `dask cuda worker` as well, since it seems like a general improvement.

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1088
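
A minimal sketch of the kind of guard described in that commit, with hypothetical parameter and function names (the real option handling in dask_cuda/cli.py may differ):

```python
# Hypothetical sketch: refuse to run when no scheduler was specified, rather
# than silently spinning up and querying a fresh local (CPU) cluster.
import click


def require_scheduler(scheduler, scheduler_file):
    """Raise a CLI error unless a scheduler address or scheduler file was given."""
    if scheduler is None and scheduler_file is None:
        raise click.UsageError(
            "A scheduler address or scheduler file must be specified."
        )
```
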
jacobtomlinson deleted the dask-cli branch February 13, 2023 14:43
Labels: conda (conda issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), python (python code needed)

Projects: Archived in project

Development: Successfully merging this pull request may close these issues: Rename dask-config CLI for clarity

8 participants