[REVIEW] Unpin dask & distributed for development #11058

Merged (22 commits, Jun 15, 2022)

Changes from all commits

ci/benchmark/build.sh (3 additions, 3 deletions)

@@ -37,7 +37,7 @@ export GBENCH_BENCHMARKS_DIR="$WORKSPACE/cpp/build/gbenchmarks/"
export LIBCUDF_KERNEL_CACHE_PATH="$HOME/.jitify-cache"

# Dask & Distributed option to install main(nightly) or `conda-forge` packages.
-export INSTALL_DASK_MAIN=0
+export INSTALL_DASK_MAIN=1

function remove_libcudf_kernel_cache_dir {
EXITCODE=$?

@@ -82,8 +82,8 @@ if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
gpuci_logger "gpuci_mamba_retry update dask"
gpuci_mamba_retry update dask
else
gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall"
gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall
gpuci_logger "gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall"
gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall
fi

# Install the master version of streamz

ci/gpu/build.sh (3 additions, 3 deletions)

@@ -32,7 +32,7 @@ export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
unset GIT_DESCRIBE_TAG

# Dask & Distributed option to install main(nightly) or `conda-forge` packages.
-export INSTALL_DASK_MAIN=0
+export INSTALL_DASK_MAIN=1

# ucx-py version
export UCX_PY_VERSION='0.27.*'

@@ -92,8 +92,8 @@ function install_dask {
gpuci_mamba_retry update dask
conda list
else
gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall"
gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall
gpuci_logger "gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall"
gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall
fi
# Install the main version of streamz
gpuci_logger "Install the main version of streamz"

conda/environments/cudf_dev_cuda11.5.yml (2 additions, 2 deletions)

@@ -46,8 +46,8 @@ dependencies:
   - pydocstyle=6.1.1
   - typing_extensions
   - pre-commit
-  - dask==2022.05.2
-  - distributed==2022.05.2
+  - dask>=2022.05.2
+  - distributed>=2022.05.2
   - streamz
   - arrow-cpp=8.0.0
   - dlpack>=0.5,<0.6.0a0

conda/recipes/custreamz/meta.yaml (2 additions, 2 deletions)

@@ -29,8 +29,8 @@ requirements:
     - python
     - streamz
     - cudf {{ version }}
-    - dask==2022.05.2
-    - distributed==2022.05.2
+    - dask>=2022.05.2
+    - distributed>=2022.05.2
     - python-confluent-kafka >=1.7.0,<1.8.0a0
     - cudf_kafka {{ version }}


conda/recipes/dask-cudf/meta.yaml (4 additions, 4 deletions)

@@ -24,14 +24,14 @@ requirements:
   host:
     - python
     - cudf {{ version }}
-    - dask==2022.05.2
-    - distributed==2022.05.2
+    - dask>=2022.05.2
+    - distributed>=2022.05.2
     - cudatoolkit {{ cuda_version }}
   run:
     - python
     - cudf {{ version }}
-    - dask==2022.05.2
-    - distributed==2022.05.2
+    - dask>=2022.05.2
+    - distributed>=2022.05.2
     - {{ pin_compatible('cudatoolkit', max_pin='x', min_pin='x') }}

test: # [linux64]

python/dask_cudf/dask_cudf/backends.py (10 additions, 4 deletions)

@@ -13,6 +13,7 @@
     categorical_dtype_dispatch,
     concat_dispatch,
     group_split_dispatch,
+    grouper_dispatch,
     hash_object_dispatch,
     is_categorical_dtype_dispatch,
     make_meta_dispatch,
@@ -296,12 +297,17 @@ def is_categorical_dtype_cudf(obj):
     return cudf.api.types.is_categorical_dtype(obj)


+@grouper_dispatch.register((cudf.Series, cudf.DataFrame))
+def get_grouper_cudf(obj):
+    return cudf.core.groupby.Grouper
+
+
 try:
-    from dask.dataframe.dispatch import grouper_dispatch
+    from dask.dataframe.dispatch import pyarrow_schema_dispatch

-    @grouper_dispatch.register((cudf.Series, cudf.DataFrame))
-    def get_grouper_cudf(obj):
-        return cudf.core.groupby.Grouper
+    @pyarrow_schema_dispatch.register((cudf.DataFrame,))
+    def get_pyarrow_schema_cudf(obj):
+        return obj.to_arrow().schema

 except ImportError:
     pass

Review thread on lines -300 to +306:

Member: Do we need to bump the minimum Dask version because of this?

Contributor Author: Not a pressing issue to bump the minimum requirement; since it is in a try/except block, the current code should just work for both old and new versions of Dask.
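The pattern discussed in the thread above is worth spelling out: registering a backend inside try/except ImportError lets dask_cudf pick up a new Dask dispatch when it exists and silently skip it otherwise, so the minimum Dask pin does not have to move. The sketch below restates that pattern in isolation; it reuses the names from the diff, but the comments and the older-Dask fallback description are my own gloss rather than code from this PR.

    # A minimal sketch of the optional-dispatch pattern, assuming only that
    # newer Dask releases expose `pyarrow_schema_dispatch`.
    import cudf

    try:
        # Present only in newer Dask releases.
        from dask.dataframe.dispatch import pyarrow_schema_dispatch

        @pyarrow_schema_dispatch.register((cudf.DataFrame,))
        def get_pyarrow_schema_cudf(obj):
            # Convert the GPU DataFrame to a pyarrow Table and return its schema.
            return obj.to_arrow().schema

    except ImportError:
        # Older Dask: the hook does not exist, nothing is registered, and the
        # rest of dask_cudf keeps working against the existing minimum version.
        pass

Once the minimum Dask version guarantees the hook, the registration can move to module level, which is what this PR does for grouper_dispatch.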

python/dask_cudf/dask_cudf/tests/test_distributed.py (3 additions, 1 deletion)

@@ -1,10 +1,12 @@
# Copyright (c) 2020-2022, NVIDIA CORPORATION.

import numba.cuda
import pytest

+import dask
from dask import dataframe as dd
from dask.distributed import Client
-from distributed.utils_test import loop  # noqa: F401
+from distributed.utils_test import cleanup, loop  # noqa: F401

import cudf
from cudf.testing._utils import assert_eq

Review thread on the `cleanup` import:

Member: Is `cleanup` used below somewhere?

Contributor Author: We had to import it due to a CI failure; I think the reason is similar to https://github.com/rapidsai/dask-cuda/pull/924/files.

Contributor Author (@galipremsagar, Jun 15, 2022): Just tested removing `cleanup` locally and got errors around the cluster failing to start, so it will probably need to be imported explicitly.

Member: Thanks for checking 🙏 Strange. Maybe we should raise a Distributed issue about this?
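For readers following the fixture discussion, here is a rough sketch of how a distributed dask_cudf test typically leans on `loop` (and, indirectly, `cleanup`) from distributed.utils_test. The test name, body, and cluster settings are illustrative assumptions, not code from this PR.

    # Sketch only: exercises a round trip through an in-process cluster.
    import dask_cudf
    from dask.distributed import Client
    # `loop` drives the test's event loop; importing `cleanup` pulls in the
    # fixture it depends on so clusters and workers tear down between tests.
    from distributed.utils_test import cleanup, loop  # noqa: F401

    import cudf
    from cudf.testing._utils import assert_eq


    def test_roundtrip_over_cluster(loop):  # hypothetical test
        with Client(loop=loop, processes=False):
            gdf = cudf.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
            ddf = dask_cudf.from_cudf(gdf, npartitions=2)
            assert_eq(ddf.compute(), gdf)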

python/dask_cudf/setup.py (2 additions, 2 deletions)

@@ -10,8 +10,8 @@

 install_requires = [
     "cudf",
-    "dask==2022.05.2",
-    "distributed==2022.05.2",
+    "dask>=2022.05.2",
+    "distributed>=2022.05.2",
     "fsspec>=0.6.0",
     "numpy",
     "pandas>=1.0,<1.5.0dev0",
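All of the `==2022.05.2` to `>=2022.05.2` edits above (setup.py, the conda recipes, and the dev environment) do the same thing: stop rejecting newer dask and distributed builds while development tracks their main branches. A small illustration with the packaging library, which is not used by this PR, shows what the relaxed specifier accepts:

    # Illustration only: compare the old exact pin with the new lower bound.
    # The `packaging` library and the example version are assumptions here.
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    old_pin = SpecifierSet("==2022.05.2")   # before this PR
    new_pin = SpecifierSet(">=2022.05.2")   # after this PR

    newer = Version("2022.6.0")             # e.g. a later dask/distributed release

    print(newer in old_pin)  # False: the exact pin blocks newer releases
    print(newer in new_pin)  # True: the lower bound allows them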