[REVIEW] Unpin dask & distributed for development #11058

Merged: 22 commits, Jun 15, 2022
Changes from 12 commits
6 changes: 3 additions & 3 deletions ci/benchmark/build.sh
@@ -37,7 +37,7 @@ export GBENCH_BENCHMARKS_DIR="$WORKSPACE/cpp/build/gbenchmarks/"
export LIBCUDF_KERNEL_CACHE_PATH="$HOME/.jitify-cache"

# Dask & Distributed option to install main(nightly) or `conda-forge` packages.
-export INSTALL_DASK_MAIN=0
+export INSTALL_DASK_MAIN=1

function remove_libcudf_kernel_cache_dir {
EXITCODE=$?
@@ -82,8 +82,8 @@ if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
gpuci_logger "gpuci_mamba_retry update dask"
gpuci_mamba_retry update dask
else
-  gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall"
-  gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall
+  gpuci_logger "gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall"
+  gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall
fi

# Install the master version of streamz
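The pin change above is the core of this PR: `==2022.05.2` admits exactly one version, while `>=2022.05.2` also admits newer releases and nightlies. A minimal illustration of the two operators on dask's calendar-versioning scheme (a simplified comparison for this sketch, not a real conda match-spec parser; version strings are examples):

```python
# Compare dask-style YYYY.MM.PATCH versions against "==" and ">=" specs.
# This deliberately ignores pre-releases, epochs, and build strings, which
# real tools (conda, pip) do handle.

def parse(version):
    """Split '2022.05.2' into a comparable tuple (2022, 5, 2)."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, spec):
    """Check a version string against a spec like '==2022.05.2' or '>=2022.05.2'."""
    op, pinned = spec[:2], spec[2:]
    if op == "==":
        return parse(version) == parse(pinned)
    if op == ">=":
        return parse(version) >= parse(pinned)
    raise ValueError(f"unsupported operator in {spec!r}")

for v in ("2022.05.2", "2022.6.0"):
    print(v, satisfies(v, "==2022.05.2"), satisfies(v, ">=2022.05.2"))
# 2022.05.2 True True
# 2022.6.0 False True
```

With the exact pin, a newer dask nightly would be rejected; the lower bound is what lets CI track dask `main` during development.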
4 changes: 3 additions & 1 deletion ci/cpu/build.sh
@@ -3,7 +3,7 @@
##############################################
# cuDF CPU conda build script for CI #
##############################################
-set -e
+set -ex

# Set path and build parallel level
# FIXME: PATH variable shouldn't be necessary.
@@ -41,6 +41,8 @@ fi
gpuci_logger "Check environment variables"
env

+#ls $WORKSPACE
+#cp -r $WORKSPACE/nvcc_linux-64_activate.sh /opt/conda/envs/rapids/etc/conda/activate.d/nvcc_linux-64_activate.sh
gpuci_logger "Activate conda env"
. /opt/conda/etc/profile.d/conda.sh
conda activate rapids
6 changes: 3 additions & 3 deletions ci/gpu/build.sh
@@ -32,7 +32,7 @@ export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
unset GIT_DESCRIBE_TAG

# Dask & Distributed option to install main(nightly) or `conda-forge` packages.
-export INSTALL_DASK_MAIN=0
+export INSTALL_DASK_MAIN=1

# ucx-py version
export UCX_PY_VERSION='0.27.*'
@@ -92,8 +92,8 @@ function install_dask {
gpuci_mamba_retry update dask
conda list
else
-    gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall"
-    gpuci_mamba_retry install conda-forge::dask==2022.05.2 conda-forge::distributed==2022.05.2 conda-forge::dask-core==2022.05.2 --force-reinstall
+    gpuci_logger "gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall"
+    gpuci_mamba_retry install conda-forge::dask>=2022.05.2 conda-forge::distributed>=2022.05.2 conda-forge::dask-core>=2022.05.2 --force-reinstall
fi
# Install the main version of streamz
gpuci_logger "Install the main version of streamz"
4 changes: 2 additions & 2 deletions conda/environments/cudf_dev_cuda11.5.yml
@@ -46,8 +46,8 @@ dependencies:
- pydocstyle=6.1.1
- typing_extensions
- pre-commit
-  - dask==2022.05.2
-  - distributed==2022.05.2
+  - dask>=2022.05.2
+  - distributed>=2022.05.2
- streamz
- arrow-cpp=7.0.0
- dlpack>=0.5,<0.6.0a0
1 change: 1 addition & 0 deletions conda/recipes/cudf/meta.yaml
@@ -67,6 +67,7 @@ requirements:
test: # [linux64]
requires: # [linux64]
- cudatoolkit {{ cuda_version }}.* # [linux64]
+  - {{ compiler('cuda') }} {{ cuda_version }}
imports: # [linux64]
- cudf # [linux64]

4 changes: 2 additions & 2 deletions conda/recipes/custreamz/meta.yaml
@@ -29,8 +29,8 @@ requirements:
- python
- streamz
- cudf {{ version }}
-  - dask==2022.05.2
-  - distributed==2022.05.2
+  - dask>=2022.05.2
+  - distributed>=2022.05.2
- python-confluent-kafka >=1.7.0,<1.8.0a0
- cudf_kafka {{ version }}

8 changes: 4 additions & 4 deletions conda/recipes/dask-cudf/meta.yaml
@@ -24,14 +24,14 @@ requirements:
host:
- python
- cudf {{ version }}
-  - dask==2022.05.2
-  - distributed==2022.05.2
+  - dask>=2022.05.2
+  - distributed>=2022.05.2
- cudatoolkit {{ cuda_version }}
run:
- python
- cudf {{ version }}
-  - dask==2022.05.2
-  - distributed==2022.05.2
+  - dask>=2022.05.2
+  - distributed>=2022.05.2
- {{ pin_compatible('cudatoolkit', max_pin='x', min_pin='x') }}

test: # [linux64]
106 changes: 106 additions & 0 deletions nvcc_linux-64_activate.sh
@@ -0,0 +1,106 @@
#!/bin/bash
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
# Backup environment variables (only if the variables are set)
if [[ ! -z "${CUDA_HOME+x}" ]]
then
export CUDA_HOME_CONDA_NVCC_BACKUP="${CUDA_HOME:-}"
fi

if [[ ! -z "${CUDA_PATH+x}" ]]
then
export CUDA_PATH_CONDA_NVCC_BACKUP="${CUDA_PATH:-}"
fi

if [[ ! -z "${CFLAGS+x}" ]]
then
export CFLAGS_CONDA_NVCC_BACKUP="${CFLAGS:-}"
fi

if [[ ! -z "${CPPFLAGS+x}" ]]
then
export CPPFLAGS_CONDA_NVCC_BACKUP="${CPPFLAGS:-}"
fi

if [[ ! -z "${CXXFLAGS+x}" ]]
then
export CXXFLAGS_CONDA_NVCC_BACKUP="${CXXFLAGS:-}"
fi

if [[ ! -z "${CMAKE_ARGS+x}" ]]
then
export CMAKE_ARGS_CONDA_NVCC_BACKUP="${CMAKE_ARGS:-}"
fi

# Default to using $(cuda-gdb) to specify $(CUDA_HOME).
if [[ -z "${CUDA_HOME+x}" ]]
then
CUDA_GDB_EXECUTABLE=$(which cuda-gdb || exit 0)
if [[ -n "$CUDA_GDB_EXECUTABLE" ]]
then
CUDA_HOME=$(dirname $(dirname $CUDA_GDB_EXECUTABLE))
else
echo "Cannot determine CUDA_HOME: cuda-gdb not in PATH"
return 1
fi
fi

if [[ ! -d "${CUDA_HOME}" ]]
then
echo "Directory specified in CUDA_HOME(=${CUDA_HOME}) doesn't exist"
return 1
fi

if [[ ! -f "${CUDA_HOME}/lib64/stubs/libcuda.so" ]]
then
echo "File ${CUDA_HOME}/lib64/stubs/libcuda.so doesn't exist"
return 1
fi

if [[ -z "$(${CUDA_HOME}/bin/nvcc --version | grep "Cuda compilation tools, release 11.5")" ]]
then
echo "Version of installed CUDA didn't match package"
return 1
fi

export CUDA_HOME="${CUDA_HOME}"
export CFLAGS="${CFLAGS} -isystem ${CUDA_HOME}/include"
export CPPFLAGS="${CPPFLAGS} -isystem ${CUDA_HOME}/include"
export CXXFLAGS="${CXXFLAGS} -isystem ${CUDA_HOME}/include"

### CMake configurations

# CMake looks up components in CUDA_PATH, not CUDA_HOME
export CUDA_PATH="${CUDA_HOME}"
# New-style CUDA integrations in CMake
CMAKE_ARGS="${CMAKE_ARGS:-} -DCUDAToolkit_ROOT=${CUDA_HOME}"
# Old-style CUDA integrations in CMake
## See https://github.com/conda-forge/nvcc-feedstock/pull/58#issuecomment-752179349
CMAKE_ARGS+=" -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME}"
## Avoid https://github.com/conda-forge/openmm-feedstock/pull/44#issuecomment-753560234
## We need CUDA_HOME in _front_ of CMAKE_FIND_ROOT_PATH
CMAKE_ARGS="$(echo ${CMAKE_ARGS} | sed -E -e "s|(-DCMAKE_FIND_ROOT_PATH=)(\S+)|\1$CUDA_HOME;\2|")"
export CMAKE_ARGS="${CMAKE_ARGS}"

### /CMake configurations

mkdir -p "${CONDA_BUILD_SYSROOT}/lib"
mkdir -p "${CONDA_PREFIX}/lib/stubs"

# Add $(libcuda.so) shared object stub to the compiler sysroot.
# Needed for things that want to link to $(libcuda.so).
# Stub is used to avoid getting driver code linked into binaries.


if [[ "${CONDA_BUILD}" == 1 ]]
then
# Make a backup of $(libcuda.so) if it exists
if [[ -f "${CONDA_BUILD_SYSROOT}/lib/libcuda.so" ]]
then
LIBCUDA_SO_CONDA_NVCC_BACKUP="${CONDA_BUILD_SYSROOT}/lib/libcuda.so-conda-nvcc-backup"
mv "${CONDA_BUILD_SYSROOT}/lib/libcuda.so" "${LIBCUDA_SO_CONDA_NVCC_BACKUP}"
fi
ln -s "${CUDA_HOME}/lib64/stubs/libcuda.so" "${CONDA_BUILD_SYSROOT}/lib/libcuda.so"
else
ln -sf "${CUDA_HOME}/lib64/stubs/libcuda.so" "${CONDA_PREFIX}/lib/stubs/libcuda.so"
fi
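The `CUDA_HOME` detection in the script above relies on `cuda-gdb` normally living at `<toolkit-root>/bin/cuda-gdb`, so stepping up two directories from the binary recovers the toolkit root. A stand-alone sketch of that derivation (the path below is illustrative, not taken from the PR):

```python
# Mirror of the shell logic:
#   CUDA_HOME=$(dirname $(dirname $CUDA_GDB_EXECUTABLE))
# Two dirname calls strip "cuda-gdb" and then "bin", leaving the root.
import os.path

cuda_gdb = "/usr/local/cuda-11.5/bin/cuda-gdb"  # example location
cuda_home = os.path.dirname(os.path.dirname(cuda_gdb))
print(cuda_home)  # /usr/local/cuda-11.5
```

This is why the script errors out when `cuda-gdb` is not on `PATH`: without it there is no anchor from which to infer the toolkit root.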

14 changes: 10 additions & 4 deletions python/dask_cudf/dask_cudf/backends.py
@@ -13,6 +13,7 @@
categorical_dtype_dispatch,
concat_dispatch,
group_split_dispatch,
+    grouper_dispatch,
hash_object_dispatch,
is_categorical_dtype_dispatch,
make_meta_dispatch,
@@ -296,12 +297,17 @@ def is_categorical_dtype_cudf(obj):
return cudf.api.types.is_categorical_dtype(obj)


+@grouper_dispatch.register((cudf.Series, cudf.DataFrame))
+def get_grouper_cudf(obj):
+    return cudf.core.groupby.Grouper


try:
-    from dask.dataframe.dispatch import grouper_dispatch
+    from dask.dataframe.dispatch import pyarrow_schema_dispatch
Comment on lines -300 to +306

Member: Do we need to bump the minimum Dask version because of this?

Contributor Author: Not a pressing issue to bump the minimum requirement; since it is in a try/except block, the current code should work with both old and new versions of Dask.

-    @grouper_dispatch.register((cudf.Series, cudf.DataFrame))
-    def get_grouper_cudf(obj):
-        return cudf.core.groupby.Grouper
+    @pyarrow_schema_dispatch.register((cudf.DataFrame,))
+    def get_pyarrow_schema_cudf(obj):
+        return obj.to_arrow().schema

except ImportError:
pass
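The guarded-registration pattern discussed in the thread above can be sketched stand-alone: a dispatch that only newer Dask exposes is registered inside a try/except, so the same module imports cleanly against older Dask. In this sketch, `Dispatch` and `DataFrame` are simplified stand-ins for `dask.utils.Dispatch` and the cudf types, and the dict lookup stands in for the import that may fail; only the shape of the guard mirrors the diff.

```python
# Minimal model of Dask's type-based dispatch plus the try/except guard.

class Dispatch:
    """Simplified stand-in for dask.utils.Dispatch."""

    def __init__(self):
        self._lookup = {}

    def register(self, types):
        def wrapper(func):
            for typ in types:
                self._lookup[typ] = func
            return func
        return wrapper

    def __call__(self, obj):
        return self._lookup[type(obj)](obj)


class DataFrame:  # stand-in for cudf.DataFrame
    def to_arrow_schema(self):
        return "arrow-schema"


def register_optional(api):
    """Attach a handler only if the running 'dask' provides the dispatch."""
    try:
        # On an older "dask" this lookup fails, mirroring the ImportError path.
        dispatch = api["pyarrow_schema_dispatch"]
    except KeyError:
        return None

    @dispatch.register((DataFrame,))
    def get_pyarrow_schema(obj):
        return obj.to_arrow_schema()

    return dispatch


# Newer "dask": the dispatch exists, so the handler is attached and usable.
dispatch = register_optional({"pyarrow_schema_dispatch": Dispatch()})
print(dispatch(DataFrame()))   # arrow-schema

# Older "dask": registration is skipped without raising.
print(register_optional({}))   # None
```

This is why no minimum-version bump is strictly required: the except branch simply leaves the optional dispatch unregistered.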
4 changes: 3 additions & 1 deletion python/dask_cudf/dask_cudf/tests/test_distributed.py
@@ -1,10 +1,12 @@
# Copyright (c) 2020-2022, NVIDIA CORPORATION.

import numba.cuda
import pytest

import dask
from dask import dataframe as dd
from dask.distributed import Client
-from distributed.utils_test import loop  # noqa: F401
+from distributed.utils_test import cleanup, loop  # noqa: F401
Member: Is cleanup used below somewhere?

Contributor Author: We had to import it due to a CI failure; I think the reason is similar to: https://github.com/rapidsai/dask-cuda/pull/924/files

Contributor Author (galipremsagar, Jun 15, 2022): Just tested removing cleanup locally and got errors around the Cluster failing to start, so it will probably need to be imported explicitly.

Member: Thanks for checking 🙏 Strange. Maybe we should raise a Distributed issue about this?


import cudf
from cudf.testing._utils import assert_eq
4 changes: 2 additions & 2 deletions python/dask_cudf/setup.py
@@ -10,8 +10,8 @@

install_requires = [
"cudf",
-    "dask==2022.05.2",
-    "distributed==2022.05.2",
+    "dask>=2022.05.2",
+    "distributed>=2022.05.2",
"fsspec>=0.6.0",
"numpy",
"pandas>=1.0,<1.5.0dev0",