Merge pull request #298 from rapidsai/branch-0.14

[RELEASE] dask-cuda v0.14
rapidsai · Jun 3, 2020 · 9db4453
2 parents 7f94db5 + f6fe329
commit 9db4453

Showing 38 changed files with 1,505 additions and 548 deletions.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -1,3 +1,24 @@
+0.14
+----
+- Publish branch-0.14 to conda (#262) `Paul Taylor`_
+- Fix behavior for `memory_limit=0` (#269) `Peter Andreas Entschev`_
+- Raise serialization errors when spilling (#272) `John Kirkham`_
+- Fix dask-cuda-worker memory_limit (#279) `Peter Andreas Entschev`_
+- Add NVTX annotations for spilling (#282) `Peter Andreas Entschev`_
+- Skip existing on conda uploads (#284) `Ray Douglass`_
+- Local gpuCI build script (#285) `Eli Fajardo`_
+- Remove deprecated DGX class (#286) `Peter Andreas Entschev`_
+- Add RDMACM support (#287) `Peter Andreas Entschev`_
+- Read the Docs Setup (#290) `Benjamin Zaitlen`_
+- Raise ValueError when ucx_net_devices="auto" and IB is disabled (#291) `Peter Andreas Entschev`_
+- Multi-node benchmarks (#293) `Peter Andreas Entschev`_
+- Add docs for UCX (#294) `Peter Andreas Entschev`_
+- Add `--runs` argument to CuPy benchmark (#295) `Peter Andreas Entschev`_
+- Fixing LocalCUDACluster example. Adding README links to docs (#297) `Randy Gelhausen`_
+- Add `nfinal` argument to shuffle_group, required in Dask >= 2.17 (#299) `Peter Andreas Entschev`_
+- Initialize parent process' UCX configuration (#301) `Peter Andreas Entschev`_
+- Add Read the Docs link (#302) `John Kirkham`_
+
 0.13
 ----
 - Use RMM's `DeviceBuffer` directly (#235) `John Kirkham`_
@@ -119,3 +140,6 @@
 .. _`Richard (Rick) Zamora`: https://github.com/rjzamora
 .. _`Benjamin Zaitlen`: https://github.com/quasiben
 .. _`Ray Douglass`: https://github.com/raydouglass
+.. _`Paul Taylor`: https://github.com/trxcllnt
+.. _`Eli Fajardo`: https://github.com/efajardo-nv
+.. _`Randy Gelhausen`: https://github.com/randerzander
diff --git a/README.md b/README.md
@@ -18,6 +18,8 @@ cluster = LocalCUDACluster()
 client = Client(cluster)
 ```
 
+Documentation is available [here](https://dask-cuda.readthedocs.io/).
+
 What this is not
 ----------------
 

diff --git a/ci/checks/style.sh b/ci/checks/style.sh
@@ -20,7 +20,7 @@ BLACK=`black --check .`
 BLACK_RETVAL=$?
 
 # Run flake8 and get results/return code
-FLAKE=`flake8 python`
+FLAKE=`flake8 dask_cuda`
 RETVAL=$?
 
 # Output results if failure otherwise show pass

diff --git a/ci/cpu/upload-anaconda.sh b/ci/cpu/upload-anaconda.sh
@@ -25,4 +25,4 @@ fi
 
 echo "Upload"
 echo ${UPLOADFILE}
-anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
+anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
diff --git a/ci/local/README.md b/ci/local/README.md
@@ -0,0 +1,57 @@
+## Purpose
+
+This script is designed for developer and contributor use. This tool mimics the actions of gpuCI on your local machine. This allows you to test and even debug your code inside a gpuCI base container before pushing your code as a GitHub commit.
+The script can be helpful in locally triaging and debugging RAPIDS continuous integration failures.
+
+## Requirements
+
+```
+nvidia-docker
+```
+
+## Usage
+
+```
+bash build.sh [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]
+Build and test your local repository using a base gpuCI Docker image
+
+where:
+    -H   Show this help text
+    -r   Path to repository (defaults to working directory)
+    -i   Use Docker image (default is gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6)
+    -s   Skip building and testing and start an interactive shell in a container of the Docker image
+```
+
+Example Usage:
+`bash build.sh -r ~/rapids/dask-cuda -i gpuci/rapidsai-base:cuda10.1-ubuntu16.04-gcc5-py3.6`
+
+For a full list of available gpuCI docker images, visit our [DockerHub](https://hub.docker.com/r/gpuci/rapidsai-base/tags) page.
+
+Style Check:
+```bash
+$ bash ci/local/build.sh -r ~/rapids/dask-cuda -s
+$ source activate gdf    #Activate gpuCI conda environment
+$ cd /rapids/dask-cuda
+$ flake8 dask_cuda
+```
+
+## Information
+
+There are some caveats to be aware of when using this script, especially if you plan on developing from within the container itself.
+
+
+### Docker Image Build Repository
+
+The docker image will generate build artifacts in a folder on your machine located in the `root` directory of the repository you passed to the script. For the above example, the directory is named `~/rapids/dask-cuda/build_rapidsai-base_cuda10.1-ubuntu16.04-gcc5-py3.6/`. Feel free to remove this directory after the script is finished.
+
+*Note*: The script *will not* overwrite your local build repository. Your local environment stays intact.
+
+
+### Where The User is Dumped
+
+The script will build your repository and run all tests. If any tests fail, the script drops you into a shell inside the Docker container so that you can debug there. If all tests pass, the container exits and is automatically removed. Remember to exit the container if tests fail and you do not wish to debug inside it.
+
+
+### Container File Structure
+
+Your repository will be located in the `/rapids/` folder of the container. This folder is volume mounted from the local machine, so any changes to the code in this repository are replicated onto the local machine. The `cpp/build` and `python/build` directories within your repository are on separate mounts to avoid conflicting with your local build artifacts.
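
The build-directory naming described in the README above can be sketched in shell. This is a minimal sketch, not the script's own code: `container_build_dir` is a hypothetical helper name, the `/home/user` path is made up for illustration, and the sketch uses parameter expansion and `tr` where `ci/local/build.sh` uses `basename` and `sed`.

```bash
#!/bin/bash

# Hypothetical helper: derive the per-image build directory on the host.
#   $1 = path to the local repository
#   $2 = Docker image name, e.g. "gpuci/rapidsai-base:<tag>"
container_build_dir() {
    local repo_path="$1"
    # Drop everything up to the last "/" (what `basename` would do for an image name).
    local image_name="${2##*/}"
    # Replace ":" with "_" so the tag separator does not appear in the directory name.
    printf '%s/build_%s\n' "${repo_path}" "$(printf '%s' "${image_name}" | tr ':' '_')"
}

container_build_dir "/home/user/rapids/dask-cuda" \
    "gpuci/rapidsai-base:cuda10.1-ubuntu16.04-gcc5-py3.6"
# prints /home/user/rapids/dask-cuda/build_rapidsai-base_cuda10.1-ubuntu16.04-gcc5-py3.6
```

Because the directory name includes the image tag, builds made with different images can sit side by side in the same repository checkout.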
diff --git a/ci/local/build.sh b/ci/local/build.sh
@@ -0,0 +1,142 @@
+#!/bin/bash
+
+DOCKER_IMAGE="gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6"
+REPO_PATH=${PWD}
+RAPIDS_DIR_IN_CONTAINER="/rapids"
+CPP_BUILD_DIR="cpp/build"
+PYTHON_BUILD_DIR="python/build"
+CONTAINER_SHELL_ONLY=0
+
+SHORTHELP="$(basename "$0") [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]"
+LONGHELP="${SHORTHELP}
+Build and test your local repository using a base gpuCI Docker image
+
+where:
+    -H   Show this help text
+    -r   Path to repository (defaults to working directory)
+    -i   Use Docker image (default is ${DOCKER_IMAGE})
+    -s   Skip building and testing and start an interactive shell in a container of the Docker image
+"
+
+# Limit GPUs available to container based on CUDA_VISIBLE_DEVICES
+if [[ -z "${CUDA_VISIBLE_DEVICES}" ]]; then
+    NVIDIA_VISIBLE_DEVICES="all"
+else
+    NVIDIA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}
+fi
+
+while getopts ":hHr:i:s" option; do
+    case ${option} in
+        r)
+            REPO_PATH=${OPTARG}
+            ;;
+        i)
+            DOCKER_IMAGE=${OPTARG}
+            ;;
+        s)
+            CONTAINER_SHELL_ONLY=1
+            ;;
+        h)
+            echo "${SHORTHELP}"
+            exit 0
+            ;;
+        H)
+            echo "${LONGHELP}"
+            exit 0
+            ;;
+        *)
+            echo "ERROR: Invalid flag"
+            echo "${SHORTHELP}"
+            exit 1
+            ;;
+    esac
+done
+
+REPO_PATH_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename "${REPO_PATH}")"
+CPP_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename "${REPO_PATH}")/${CPP_BUILD_DIR}"
+PYTHON_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename "${REPO_PATH}")/${PYTHON_BUILD_DIR}"
+
+
+# BASE_CONTAINER_BUILD_DIR is named after the image name, allowing for
+# multiple image builds to coexist on the local filesystem. This will
+# be mapped to the typical BUILD_DIR inside of the container. Builds
+# running in the container generate build artifacts just as they would
+# in a bare-metal environment, and the host filesystem is able to
+# maintain the host build in BUILD_DIR as well.
+# FIXME: Fix the shellcheck complaints
+# shellcheck disable=SC2001,SC2005,SC2046
+BASE_CONTAINER_BUILD_DIR=${REPO_PATH}/build_$(echo $(basename "${DOCKER_IMAGE}")|sed -e 's/:/_/g')
+CPP_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/cpp
+PYTHON_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/python
+# Create the build directories up front so that they are owned by the current user.
+# If they do not exist, docker creates them as volume mount points, and they end up
+# owned by root.
+mkdir -p "${REPO_PATH}/${CPP_BUILD_DIR}"
+mkdir -p "${REPO_PATH}/${PYTHON_BUILD_DIR}"
+
+BUILD_SCRIPT="#!/bin/bash
+set -e
+WORKSPACE=${REPO_PATH_IN_CONTAINER}
+PREBUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/prebuild.sh
+BUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/build.sh
+cd \${WORKSPACE}
+if [ -f \${PREBUILD_SCRIPT} ]; then
+    source \${PREBUILD_SCRIPT}
+fi
+yes | source \${BUILD_SCRIPT}
+"
+
+if (( CONTAINER_SHELL_ONLY == 0 )); then
+    COMMAND="${CPP_BUILD_DIR_IN_CONTAINER}/build.sh || bash"
+else
+    COMMAND="bash"
+fi
+
+# Create the build dir for the container to mount, generate the build script inside of it
+mkdir -p "${BASE_CONTAINER_BUILD_DIR}"
+mkdir -p "${CPP_CONTAINER_BUILD_DIR}"
+mkdir -p "${PYTHON_CONTAINER_BUILD_DIR}"
+echo "${BUILD_SCRIPT}" > "${CPP_CONTAINER_BUILD_DIR}/build.sh"
+chmod ugo+x "${CPP_CONTAINER_BUILD_DIR}/build.sh"
+
+# Mount passwd and group files to docker. This allows docker to resolve username and group
+# avoiding these nags:
+#   * groups: cannot find name for group ID ID
+#   * I have no name!@id:/$
+# For LDAP users, user information is not present in the system /etc/passwd and /etc/group files.
+# Hence we generate dummy files for LDAP users, which docker uses to resolve the username and group.
+
+PASSWD_FILE="/etc/passwd"
+GROUP_FILE="/etc/group"
+
+USER_FOUND=$(grep -wc "$(whoami)" < "$PASSWD_FILE")
+if [ "$USER_FOUND" == 0 ]; then
+  echo "Local user not found, LDAP workaround for docker mounts activated. Creating dummy passwd and group"
+  echo "files to allow docker to resolve username and group"
+  cp "$PASSWD_FILE" /tmp/passwd
+  PASSWD_FILE="/tmp/passwd"
+  cp "$GROUP_FILE" /tmp/group
+  GROUP_FILE="/tmp/group"
+  echo "$(whoami):x:$(id -u):$(id -g):$(whoami),,,:$HOME:$SHELL" >> "$PASSWD_FILE"
+  echo "$(whoami):x:$(id -g):" >> "$GROUP_FILE"
+fi
+
+# Run the generated build script in a container
+docker pull "${DOCKER_IMAGE}"
+
+DOCKER_MAJOR=$(docker -v|sed 's/[^[0-9]*\([0-9]*\).*/\1/')
+GPU_OPTS="--gpus device=${NVIDIA_VISIBLE_DEVICES}"
+if [ "$DOCKER_MAJOR" -lt 19 ]
+then
+    GPU_OPTS="--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}"
+fi
+
+docker run --rm -it ${GPU_OPTS} \
+       -u "$(id -u)":"$(id -g)" \
+       -v "${REPO_PATH}":"${REPO_PATH_IN_CONTAINER}" \
+       -v "${CPP_CONTAINER_BUILD_DIR}":"${CPP_BUILD_DIR_IN_CONTAINER}" \
+       -v "${PYTHON_CONTAINER_BUILD_DIR}":"${PYTHON_BUILD_DIR_IN_CONTAINER}" \
+       -v "$PASSWD_FILE":/etc/passwd:ro \
+       -v "$GROUP_FILE":/etc/group:ro \
+       --cap-add=SYS_PTRACE \
+       "${DOCKER_IMAGE}" bash -c "${COMMAND}"
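
The GPU flag selection in `ci/local/build.sh` can be illustrated in isolation. The following is a sketch under stated assumptions, not the script's implementation: `docker_major_version` and `gpu_opts` are hypothetical helper names, the version string is a made-up sample of `docker -v` output, and the flags simply mirror the ones the script passes (`--gpus device=...` for Docker 19 and later, `--runtime=nvidia` plus `NVIDIA_VISIBLE_DEVICES` for older versions).

```bash
#!/bin/bash

# Hypothetical helper: extract the major version from a `docker -v`-style string.
docker_major_version() {
    # Skip leading non-digits, keep the first run of digits, drop the rest.
    printf '%s\n' "$1" | sed 's/[^0-9]*\([0-9]*\).*/\1/'
}

# Hypothetical helper: print the GPU-related `docker run` arguments, one per line.
#   $1 = Docker major version
#   $2 = devices to expose (defaults to "all", as the script does when
#        CUDA_VISIBLE_DEVICES is unset)
gpu_opts() {
    local docker_major="$1"
    local devices="${2:-all}"
    if [ "${docker_major}" -lt 19 ]; then
        # Older Docker: select the nvidia runtime and pass devices via the environment.
        printf '%s\n' "--runtime=nvidia" "-e" "NVIDIA_VISIBLE_DEVICES=${devices}"
    else
        # Docker 19 and later: use the --gpus option.
        printf '%s\n' "--gpus" "device=${devices}"
    fi
}

# Sample input only; a real run would pass "$(docker -v)" instead.
major="$(docker_major_version "Docker version 19.03.8, build afacb8b")"
gpu_opts "${major}" "${CUDA_VISIBLE_DEVICES:-all}"
```

Printing one argument per line keeps each flag a separate word; in bash 4 or later the output can be read into an array with `mapfile -t` and expanded as `"${arr[@]}"` on the `docker run` command line, which avoids embedding quote characters in a single string.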
diff --git a/conda/environments/builddocs_py37.yml b/conda/environments/builddocs_py37.yml
@@ -0,0 +1,6 @@
+name: dask_cuda_docs
+channels:
+- rapidsai-nightly
+- conda-forge
+dependencies:
+- rapids-doc-env
diff --git a/dask_cuda/__init__.py b/dask_cuda/__init__.py
@@ -1,5 +1,4 @@
 from ._version import get_versions
-from .dgx import DGX
 from .local_cuda_cluster import LocalCUDACluster
 
 __version__ = get_versions()["version"]

diff --git a/dask_cuda/benchmarks/__init__.py b/dask_cuda/benchmarks/__init__.py