Releases: cupy/cupy

v10.4.0

27 Apr 07:44
173260d

This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel matching the user environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Suggestions and comments are very welcome; please visit #6688.
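
To illustrate the naming scheme that cupy-wheel resolves on the user's behalf, the sketch below maps a toolkit version to the wheel names that appear in these release notes (cupy-cuda116, cupy-rocm-5-0). The helper wheel_name is hypothetical and is not part of CuPy; the detection logic actually used by cupy-wheel may differ.

```python
def wheel_name(platform: str, version: str) -> str:
    """Return the binary wheel name for a CUDA or ROCm toolkit version.

    Hypothetical helper for illustration only. It mirrors the naming seen in
    these notes (cupy-cuda116, cupy-rocm-5-0), not cupy-wheel's real logic.
    """
    major, minor = version.split(".")[:2]
    if platform == "cuda":
        # CUDA wheels concatenate major and minor: 11.6 -> cupy-cuda116
        return f"cupy-cuda{major}{minor}"
    if platform == "rocm":
        # ROCm wheels separate major and minor with dashes: 5.0 -> cupy-rocm-5-0
        return f"cupy-rocm-{major}-{minor}"
    raise ValueError(f"unknown platform: {platform!r}")


print(wheel_name("cuda", "11.6"))
print(wheel_name("rocm", "5.0"))
```

With cupy-wheel, a downstream library would not need this kind of mapping at all, since the meta package selects the matching wheel at install time.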

Changes

Enhancements

  • Add missing cudaDevAttrMemoryPoolsSupported to hip (#6626)
  • Add CC 3.2 to Tegra arch list (#6647)
  • Add a few driver/runtime/nvrtc API wrappers (#6651)

Bug Fixes

  • Define float16::operator-() only for ROCm 5.0+ (#6629)
  • JIT: fix access to cached codes (#6642)
  • [v10] Fix Mempool attr for Cuda Python (#6654)
  • Fix int64 overflow in cupy.polyval (#6666)

Documentation

  • Documentation update for ROCm 5.0 (#6607)
  • Add --pre option to instructions installing pre-releases (#6614)
  • Fix typo in performance guide (#6659)
  • JIT: fix function signatures in the docs (#6660)

Installation

  • Add universal CuPy package (#6683)

Tests

  • Remove jenkins requirements (#6634)
  • CI: Trigger FlexCI for hotfix branches (#6636)
  • Fix TestIncludesCompileCUDA for HEAD tests (#6650)
  • Trigger CUDA Python tests with /test mini (#6655)
  • Fix missing f prefix on f-strings fix (#6679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi

v10.3.1

08 Apr 01:52
48119b9

This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.

This is a hot-fix release for v10.3.0 which contained a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Changes

Bug Fixes

  • Define float16::operator-() only for ROCm 5.0+ (#6630)

Installation

  • Bump version to v10.3.1 (#6633)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @takagi

v11.0.0b1

31 Mar 07:02
afa9bdc
Pre-release

This is the release note of v11.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-05)

We have identified that this release contains a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier). We are planning to fix this issue in the next pre-release. See #6615 for the details.

Highlights

Increase coverage of cupyx.scipy.special APIs (#6461, #6582, #6571)

A series of scipy.special routines have been added to cupyx with optimized CUDA raw kernel implementations. loggamma, multigammaln, fast Hankel transforms and several other special utility functions were added in this series of PRs by @grlee77 and @khushi-411.

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda116 -f https://pip.cupy.dev/pre

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-rocm-5-0 -f https://pip.cupy.dev/pre

Changes without compatibility

Use CUB by default (#6549)

CUB support in CuPy is now enabled by default. This speeds up general reductions and routines such as sum, argmax and argmin. Note that CUB may introduce some non-deterministic behavior; it can be disabled by setting the CUPY_ACCELERATORS="" environment variable.
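
A minimal sketch of opting out of CUB for a single run, assuming the variable has to be present in the environment before CuPy is imported. The helper env_without_cub and the script name train.py are hypothetical.

```python
import os


def env_without_cub(base_env=None):
    """Return a copy of the environment with CuPy accelerators disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value turns off the accelerators, including CUB.
    env["CUPY_ACCELERATORS"] = ""
    return env


env = env_without_cub()
# Pass `env` to a child process, for example (train.py is a placeholder):
#   subprocess.run([sys.executable, "train.py"], env=env, check=True)
```

Running the workload in a child process keeps the setting scoped to that run rather than changing the parent's environment.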

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend using ROCm 4.3 or 5.0 instead.

Changes

New Features

  • Add cupyx.scipy.special statistical distributions (#6461)
  • Add cupy.real_if_close API (#6475)
  • Add cupyx.scipy.special loggamma, multigammaln and fast Hankel transforms (#6528)
  • Add cupyx.scipy.special.{i0e, i1e} (#6571)

Enhancements

  • Update cupy.array_api (#6486)
  • Fix for supporting ROCm 5.0 (#6524)
  • Use CUB by default (#6549)
  • Fix cupy.copyto to take NumPy array scalars (#6584)
  • Implement ndarray.ravel(order="K") (#6585)
  • Make einsum accept subscripts in numpy int (#6506)

Performance Improvements

  • Support cusparseSpGEMM() (#6511)
  • eigsh: Prefer gemv over gemm (#6570)
  • Performance improvement of cupy.in1d (#6583)

Bug Fixes

  • Fix cupy.fill to properly take zero-dim cupy.ndarray (#6481)
  • Fix error message in vectorize (#6499)
  • Fix cupy.cumsum on ROCm 5.0 (#6520)
  • Fix coo_matrix.diagonal (#6522)
  • Fix array creation shape (#6545)
  • Fix out args parser of ufunc (#6546)
  • Fix may_share_memory algorithm (#6560)
  • Avoid using the same kernel from different devices in JIT (#6575)
  • Fix cupy.full and cupy.full_like to make unsafe casting (#6587)
  • Fix device context management in MemoryAsyncPool (#6590)

Code Fixes

  • mypy: array_api (#6438)
  • Minor fixes on uarray backend support (#6526)

Documentation

  • Fix documents for CUDA 11.6 (#6405)
  • Remove description about issues from contribution guide (#6497)
  • Documentation update for ROCm 5.0 (#6530)

Installation

  • Skip appending --compiler-bindir if cl.exe is already on PATH (#6510)
  • Bump version to v11.0.0b1 (#6601)

Tests

  • Add FlexCI projects for Windows (#5889)
  • Run cupy-benchmark on CI (#6417)
  • Disable CentOS 8 test (#6492)
  • Fix Dockerfile broken for array-api tests (#6508)
  • CI: Trigger push event of FlexCI via GitHub Actions (#6538)
  • Skip async_malloc tests on unsupported device (#6541)
  • Fix flaky test_inverse_indices_shape (#6551)
  • Trigger CUDA 11.6 Windows CI when push/pull-request (#6553)
  • CI: Fix event name in dispatcher (#6555)
  • CI: Fix rule name in dispatcher (#6556)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @Onkar627 @peterbell10 @pri1311 @Smit-create @takagi @toslunar @tushxr16

v10.3.0

31 Mar 07:02
5ae1db4

This is the release note of v10.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notice (2022-04-08)

We have published a hot-fix release v10.3.1 which addresses a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).

Highlights

Support for CUDA 11.6

Full support for CUDA 11.6 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-cuda116

Support for ROCm 5.0

Full support for ROCm 5.0 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-rocm-5-0

Changes

Enhancements

  • Support ROCm 5.0 (#6496)
  • Support cuSPARSELt 0.2.0 (repost) (#6507)
  • Update cupy.array_api (#6550)
  • Fix cupy.copyto to take NumPy array scalars (#6593)
  • Fix for supporting ROCm 5.0 (#6599)
  • Make einsum accept subscripts in numpy int (#6516)

Bug Fixes

  • Fix error message in vectorize (#6515)
  • Fix cupy.cumsum on ROCm 5.0 (#6525)
  • Fix coo_matrix.diagonal (#6533)
  • Fix out args parser of ufunc (#6547)
  • Fix cupy.fill to properly take zero-dim cupy.ndarray (#6548)
  • Fix cuSPARSELt 0.1.0 support in v10 (#6563)
  • Fix may_share_memory algorithm (#6565)
  • Avoid using the same kernel from different devices in JIT (#6581)
  • Fix array creation shape (#6592)
  • Fix cupy.full and cupy.full_like to make unsafe casting (#6595)
  • Fix device context management in MemoryAsyncPool (#6596)

Code Fixes

  • mypy: array_api (#6552)

Documentation

  • Remove description about issues from contribution guide (#6542)
  • Fix documents for CUDA 11.6 (#6543)

Installation

  • Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6488)
  • Skip appending --compiler-bindir if cl.exe is already on PATH (#6514)
  • Bump version to v10.3.0 (#6602)

Tests

  • Ignore warnings from Optuna 3.0 pre-releases (#6490)
  • Disable CentOS 8 test (#6519)
  • Add FlexCI projects for Windows (#6540)
  • Skip async_malloc tests on unsupported device (#6544)
  • CI: Trigger push event of FlexCI via GitHub Actions (#6554)
  • CI: regenerate matrix (#6557)
  • CI: Fix rule name in dispatcher (#6558)
  • CI: Fix event name in dispatcher (#6559)
  • Fix flaky test_inverse_indices_shape (#6573)
  • Trigger CUDA 11.6 Windows CI when push/pull-request (#6578)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @asi1024 @kmaehashi @leofang @Onkar627 @takagi @toslunar @tushxr16

v11.0.0a2

25 Feb 07:05
57ccc98
Pre-release

This is the release note of v11.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines have been proposed as good first issues, and as a result an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented issues is available at #6078.

Initial support for cupy.typing (#6251)

An API equivalent to numpy.typing has been added, allowing type annotations to be introduced in CuPy and in user code.
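
As a sketch of how such annotations might look, assuming cupy.typing exposes an NDArray alias in the same way numpy.typing does (an assumption; check the API reference for the names actually provided). The import is guarded by TYPE_CHECKING and the annotations are written as strings, so they are only resolved by a static type checker. The function scale is a hypothetical example.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Only evaluated by static type checkers such as mypy.
    from cupy.typing import NDArray


def scale(x: "NDArray[Any]", factor: float) -> "NDArray[Any]":
    """Multiply an array by a scalar; annotated with the assumed cupy.typing alias."""
    return x * factor
```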

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Support for ROCm 5.0 (#6466)

Initial support for ROCm 5.0 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Changes without compatibility

Drop support for ROCm 4.0 (#6420)

CuPy v11 will drop support for ROCm 4.0. We recommend using ROCm 4.2/4.3 instead.

Changes

New Features

  • Add cupy.isneginf and cupy.isposinf (#6089)
  • Add cupy.typing (#6251)
  • Add asarray_chkfinite API. (#6275)
  • Add Box-Cox transformations to cupyx.scipy.special (#6302)
  • Use CUDA's log1p for cupyx.scipy.special.log1p (#6315)
  • Add special functions from the CUDA Math API (#6317)
  • Add beta functions to cupyx.scipy.special (#6318)
  • Add cupy.union1d API. (#6357)
  • Add cupy.float_power (#6371)
  • Add cupy.intersect1d API. (#6402)
  • Add cupy.setdiff1d api. (#6433)
  • Add cupy.format_float_scientific API (#6474)

Enhancements

  • First step of mypy introduction (#4955)
  • Fix CI failure to support SciPy 1.8.0 (#6249)
  • implement overwrite_input in cupy.{percentile,quantile} (#6298)
  • avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6321)
  • Support NumPy 1.22 (#6323)
  • Remove batched QR solver's experimental mark (#6327)
  • Make scipy.special ufuncs work with CuPy inputs (#6341)
  • Fix thrust related build issue with CUDA 11.6 (#6346)
  • Support CUDA 11.6 (#6349)
  • Fix CI failure to support SciPy 1.8.0 (#6362)
  • Fix type annotations in installer (#6382)
  • Add __cupy_get_ndarray__ dunder method to transform objects to arrays' (#6414)
  • Bump Jitify version to fix memory leak (#6430)
  • Support cuSPARSELt 0.2.0 (repost) (#6436)
  • Support ROCm 5.0 (#6466)
  • Warn if unexpectedly failed to detect device count in cupy.show_config() (#6472)
  • Fix verbose LOBPCG for SciPy 1.8 (#6388)

Performance Improvements

  • Reduce memory usage in cupy.sort (#6392)

Bug Fixes

  • Fix JIT to support notebook environment (#6329)
  • Fix cupyx.ndimage.spline_filter1d for HIP (#6406)
  • Fix cupy.nan_to_num (#6408)
  • Fix cupyx.special.gammainc, lpmv and sph_harm for hip (#6409)
  • Fix boolean views for HIP (#6412)
  • Fix reduction contiguous size calculation (#6457)

Code Fixes

  • Remove global use_hip flag in setup (#6391)
  • Hide private names in cupyx.scipy.linalg (#6449)
  • Hide private names in cupyx.scipy.ndimage (#6450)
  • Hide private names in cupyx.scipy.signal (#6451)
  • Hide private names in cupyx.scipy.sparse (#6454)
  • Hide private names in cupyx.scipy.stats (#6456)

Documentation

  • Use cupy.__version__ instead of pkg_resources (#6332)
  • Tentatively pin intersphinx to SciPy 1.7.1 docs (#6440)
  • Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6479)

Installation

  • Avoid monkeypatching distutils (#6273)
  • Eliminate unnecessary configuration pass in setup (#6389)
  • Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6390)
  • Drop support for ROCm 4.0 (#6420)
  • Bump version to v11.0.0a2 (#6501)

Tests

  • CI: allow discarding docker image cache manually (#6269)
  • Add slow tests for stable branch (#6340)
  • Parameterize library installer tests (#6343)
  • Fix tests for eigh() for CUDA 11.6 (#6347)
  • Avoid empty notification message for scheduled tests (#6363)
  • Support SciPy 1.8 (#6365)
  • Add cupy.testing.installed (#6381)
  • Mark XFAIL for SciPy 1.8 release candidate (#6385)
  • CI: Bump ROCm version from 4.3 to 4.3.1 (#6415)
  • CI: build docs in parallel (#6416)
  • CI: Add HEAD tests for stable branch (#6423)
  • CI: Use default schema/matrix path in generate.py (#6424)
  • Skip hfft related tests in HIP (#6427)
  • CI: Manage test tags in yaml (#6429)
  • CI: coverage in reST (#6445)
  • CI: fix NCCL 2.10 unit test not covered (#6448)
  • CI: Fix CUDA 11.6 driver update steps (#6467)
  • Ignore warnings from Optuna 3.0 pre-releases (#6470)
  • Fix failing tests in ROCm (#6482)

Others

  • CI: allow specifying special skip tag (#6468)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@amanchhaparia @anaruse @asi1024 @emcastillo @grlee77 @IvanYashchuk @khushi-411 @kmaehashi @pri1311 @saswatpp @takagi

v10.2.0

25 Feb 07:05
834f9a3

This is the release note of v10.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA 11.6 (#6349)

Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.

Changes

Enhancements

  • Support cuDNN 8.3.2 (#6328)
  • Support cuTENSOR 1.4.0 (#6330)
  • Support CUDA 11.5.1 (#6331)
  • Support NumPy 1.22 (#6354)
  • avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6379)
  • Fix thrust related build issue with CUDA 11.6 (#6386)
  • Fix type annotations in installer (#6395)
  • Support CUDA 11.6 (#6422)
  • Bump Jitify version to fix memory leak (#6432)
  • Add __cupy_get_ndarray__ dunder method to transform objects to arrays' (#6465)
  • Warn if unexpectedly failed to detect device count in cupy.show_config() (#6476)
  • Fix verbose LOBPCG for SciPy 1.8 (#6394)

Bug Fixes

  • Fix JIT to support notebook environment (#6356)
  • Fix cuDNN installer not working (#6368)
  • Fix cupyx.ndimage.spline_filter1d for HIP (#6411)
  • Fix boolean views for HIP (#6418)
  • Fix cupy.nan_to_num (#6431)
  • Fix reduction contiguous size calculation (#6464)

Code Fixes

  • Remove global use_hip flag in setup (#6398)

Documentation

  • Use cupy.__version__ instead of pkg_resources (#6380)
  • Tentatively pin intersphinx to SciPy 1.7.1 docs (#6442)
  • Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6480)

Installation

  • Fix for cuDNN directory structure in Windows (#6369)
  • Install lib directory on Windows in cuDNN installer (#6370)
  • Avoid monkeypatching distutils (#6373)
  • Eliminate unnecessary configuration pass in setup (#6399)
  • Bump version to v10.2.0 (#6502)

Tests

  • CI: use CUDA docker images for CUDA Python CI (#6338)
  • Avoid empty notification message for scheduled tests (#6364)
  • CI: allow discarding docker image cache manually (#6372)
  • Parameterize library installer tests (#6374)
  • Fix tests for eigh() for CUDA 11.6 (#6376)
  • Add cupy.testing.installed (#6387)
  • Mark XFAIL for SciPy 1.8 release candidate (#6396)
  • CI: build docs in parallel (#6419)
  • CI: Bump ROCm version from 4.3 to 4.3.1 (#6421)
  • CI: Use default schema/matrix path in generate.py (#6428)
  • CI: Manage test tags in yaml (#6441)
  • Support SciPy 1.8 (#6444)
  • CI: coverage in reST (#6447)
  • CI: fix NCCL 2.10 unit test not covered (#6452)
  • Skip hfft related tests in HIP (#6458)
  • CI: Fix CUDA 11.6 driver update steps (#6471)

Others

  • CI: allow specifying special skip tag (#6477)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @emcastillo @grlee77 @kmaehashi @takagi

v11.0.0a1

20 Jan 08:11
2a4d08e
Pre-release

This is the release note of v11.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Improved NumPy functions coverage (#6078)

A series of NumPy routines have been proposed as good first issues, and as a result an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented issues is available at #6078.

Add cupyx.scipy.special functions (#5687)

Spherical harmonics, Legendre and Gamma functions are implemented using highly performant specific CUDA kernels. Thanks to @grlee77!

Initial support for CUDA Graph API by means of stream capture API (#4567)

This PR adds the ability to use the CUDA Graph API to greatly reduce the overhead of kernel launches. This is done through the stream capture API; an example follows.
Thanks to @leofang!

import cupy as cp
import numpy as np

a = cp.random.randint(0, 10, 100, dtype=np.int32)
s = cp.cuda.Stream(non_blocking=True)

with s:
    s.begin_capture()
    a += 3
    a = cp.abs(a)
    g = s.end_capture()  # work is queued, but not yet launched
g.launch()
s.synchronize()

Support __device__ function in CuPy JIT (#6265)

The new interface cupyx.jit.rawkernel(device=True) is supported to define a CUDA device function.

from cupyx import jit

@jit.rawkernel(device=True)
def getitem(x, tid):
    return x[tid]

@jit.rawkernel()
def elementwise_copy(x, y):
    tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
    y[tid] = getitem(x, tid)

The following CUDA code is generated from the above Python code.

__device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
  return x[tid];
}
extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
  unsigned int tid;
  tid = (threadIdx.x + (blockDim.x * blockIdx.x));
  y[tid] = getitem_1(x, tid);
}

Changes

New Features

  • Support stream capture (#4567)
  • Add additional special functions (spherical harmonics, Legendre, Gamma functions) (#5687)
  • Add cupy.asfarray (#6085)
  • Add cupy.trapz (#6107)
  • Add cupy.array_api.linalg (#6131)
  • Add cupy.mask_indices (#6156)
  • Add cupy.array_equiv API. (#6254)
  • Add cupy.cublas.syrk and cupy.cublas.sbmv (#6278)
  • Add cupy.vander API. (#6279)
  • Add cupy.ediff1d API. (#6280)
  • Add cupy.fabs API. (#6282)
  • Add discrete cosine and sine transforms to cupyx.scipy.fft (#6288)
  • Add logit, expit and log_expit to cupyx.scipy.special (#6300)
  • Add xlogy and xlog1py to cupyx.scipy.special(#6301)
  • Add tril_indices and tril_indices_from API. (#6305)
  • Add cupy.format_float_positional (#6308)
  • Add cupy.row_stack API. (#6312)
  • Add triu_indices and triu_indices_from API. (#6316)

Enhancements

  • Raise better message when importing CPU array via DLPack (#6051)
  • Borrow more non-GPU APIs from NumPy (#6074)
  • Add more aliases for compatibility with NumPy (#6075)
  • Import more dtype aliases from NumPy (#6076)
  • Borrow indexing APIs from NumPy (#6077)
  • Apply upstream patch to cupy.array_api (#6086)
  • Compile cub/thrust with no unique symbol (#6106)
  • Support cuDNN 8.3.0 (#6108)
  • Support all advanced indexing (#6127)
  • Support CUDA 11.5.1 (#6166)
  • Support lambda function in cupy.vectorize (#6170)
  • Support eigenvalue solver 64bit API (#6178)
  • Support cuTENSOR 1.4.0 (#6187)
  • Make matmul support ufunc kwargs (#6195)
  • Alias NumPy error classes (#6212)
  • Support comparison to None and Ellipsis (#6222)
  • JIT: Fix if expr typing rule (#6234)
  • Support comparison with more objects (#6250)
  • JIT: Support __device__ function (#6265)
  • More clear warning message (#6283)
  • Make streams hashable (#6285)
  • Check isinstance before comparison in __eq__ (#6287)
  • Support cuDNN 8.3.2 (#6314)
  • Deprecate MachAr (support NumPy 1.22) (#6188)
  • Fix cupy.linalg.qr to align with NumPy 1.22 (#6225)
  • Change a parameter name in percentile and quantile to support NumPy 1.22 (#6228)

Performance Improvements

  • Avoid 64bit division to reduce register consumption (#6019)
  • Remove memory copy in matmul (#6179)

Bug Fixes

  • Detect repeated axis in reduction (#5964)
  • Fix __all__ in cupyx.scipy.fft (#6071)
  • Fix __getitem__ on Ellipsis and advanced indexing dimension (#6081)
  • Allow leading unit dimensions in copy source (#6118)
  • Always test broadcast in copyto (#6121)
  • Fix overloading ambiguity in ndimage filters (#6162)
  • Fix empty Cholesky (#6164)
  • Fix empty solve (#6167)
  • Allow flip ()-shaped array (#6169)
  • Handles infinities of the same sign in logaddexp and logaddexp2 (#6172)
  • Fix #4675 on resolving TODO in #4198 (#6197)
  • Eigenvalue solver 64bit API on CUDA 11.1 (#6201)
  • Fix edge case compatibility in cupy.eye() (#6208)
  • Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6210)
  • Fix overlapping out in matmul and (tensor)dot (#6216)
  • Fix compile_with_cache returning None (#6232)
  • Fixing index calculation for random constructor (#6257)
  • BUG: Fix the .T attribute in the array_api namespace (#6289)
  • Fix stream capture in ROCm (#6296)
  • Fix cuDNN installer not working (#6337)

Code Fixes

  • Remove __all__ from cupyx/scipy/* (#6149)
  • Delete from os import path (#6152)
  • Remove legacy cp.linalg.solve() implementation (#6161)

Documentation

  • Add link to compatibility matrix (#6055)
  • Update upgrade guide (#6058)
  • Add v11 to compatibility matrix (#6067)
  • Exclude kernel_version from comparison table (#6072)
  • Doc: Add more footnotes to comparison table (#6073)
  • Add polynomial modules to comparison table (#6082)
  • Add CITATION.bib and update README (#6091)
  • Remove LLVM_PATH note on document (#6093)
  • Docs: Update linkcode implementation (#6126)
  • Update footnotes in comparison table (#6142)
  • Update conda-forge installation guide (#6186)
  • Revise Overview for CuPy v10 (#6209)
  • Docs: CentOS installation from source (#6218)
  • Fix cupy.trapz docstring (#6239)
  • Fix eigsh doc (#6266)
  • Add cupy.positive in API Reference (#6274)

Installation

  • Replace distutils with setuptools in Windows cl.exe detection (#6025)
  • Fix for cuDNN directory structure in Windows (#6342)

Tests

  • Fix testing.multi_gpu to add pytest marker (#6015)
  • CI: add link to ROCm projects in CI coverage matrix (#6037)
  • CI: use separate project for multi-GPU tests (#6050)
  • Fix CI result notification message format (#6066)
  • Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6084)
  • Workaround DeprecationWarning raised from pkg_resources (#6094)
  • Fix missing multi_gpu annotation in tests (#6098)
  • Fix exception handling in cupyx.distributed (#6114)
  • Improve FlexCI test scripts (#6117)
  • CI: Add timeout to show_config (#6120)
  • Trigger FlexCI from GitHub Actions (#6130)
  • CI: Fix package override sometimes fails in CentOS (#6141)
  • CI: Need to update CUDA driver in cuda115.multi (#6144)
  • Add tests for convolve2d (#6171)
  • CI: Update limits to reduce cache size (#6174)
  • CI: Fix unquoted specifiers (#6175)
  • Support pre-release NumPy version in tests (#6190)
  • Remove XFAIL for XPASS tests on ROCm (#6259)
  • Tentatively pin to setuptools<60 in Windows CI (#6260)
  • Fix cache key for github actions (#6281)
  • Use NVIDIA docker images for CUDA 11.5 (#6303)
  • Tentatively pin to CUDA Driver 495 (#6310)
  • Remove unused dtype parameterizing in tril_indices test (#6322)
  • Use get_include instead of array_equiv for fallback test (#6333)
  • CI: Add cuda-slow test in FlexCI (#6335)
  • CI: use CUDA docker images for CUDA Python CI (#6336)

Others

  • Add doc issue template (#6294)
  • Bump version to v11.0.0a1 (#6344)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@akochepasov @amanchhaparia @asi1024 @ColmTalbot @emcastillo @eternalphane @grlee77 @haesleinhuepf @khushi-411 @kmaehashi @leofang @okuta @ptim0626 @SauravMaheshkar @shwina @takagi @thomasjpfan @tom24d @toslunar @twmht @WiseroOrb @Yutaro-Sanada

v10.1.0

20 Jan 08:11
14e6413

This is the release note of v10.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

  • Remove memory copy in matmul (#6241)
  • Fix cupy.linalg.qr to align with NumPy 1.22 (#6263)

Bug Fixes

  • Fix edge case compatibility in cupy.eye() (#6213)
  • Fix compile_with_cache returning None (#6236)
  • Allow flip ()-shaped array (#6237)
  • Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6238)
  • Fix overloading ambiguity in ndimage filters (#6242)
  • Fixing index calculation for random constructor (#6267)
  • BUG: Fix the .T attribute in the array_api namespace (#6291)

Code Fixes

  • Remove legacy cp.linalg.solve() implementation (#6235)

Documentation

  • Docs: CentOS installation from source (#6230)
  • Add cupy.positive in API Reference (#6276)
  • Fix eigsh doc (#6292)

Tests

  • Add tests for convolve2d (#6194)
  • Change a parameter name in percentile and quantile to support NumPy 1.22 (#6247)
  • Tentatively pin to setuptools<60 in Windows CI (#6270)
  • Fix cache key for github actions (#6286)
  • Remove XFAIL for XPASS tests on ROCm (#6297)
  • Use NVIDIA docker images for CUDA 11.5 (#6304)
  • Tentatively pin to CUDA Driver 495 (#6311)
  • CI: Add cuda-slow test in FlexCI (#6339)

Others

  • Bump version to v10.1.0 (#6345)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @kmaehashi @leofang @ptim0626 @SauravMaheshkar @takagi @thomasjpfan @toslunar @WiseroOrb

v10.0.0

09 Dec 06:46
fddaf5e

This is the release note of v10.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since v10.0.0rc1 release. Check out our blog for highlights in the v10 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support all advanced indexing (#6196)

Support for advanced indexing using boolean masks has been completed in CuPy v10.
It is now possible to index arrays using combinations of Ellipsis, boolean masks and regular indexes, such as a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]] and a[..., [[False, True]]].
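
The two expressions above follow NumPy's indexing semantics. The sketch below evaluates them with NumPy so that it runs without a GPU. The array shapes are chosen for illustration only, and the same expressions are expected to behave the same way on cupy arrays in v10.

```python
import numpy as np  # stand-in for cupy; the indexing semantics are NumPy's

a = np.arange(12).reshape(3, 4)
# Integer index array on axis 0 combined with a boolean mask on axis 1.
# The mask selects columns 0, 2 and 3, which broadcast against the (2, 3) rows.
picked = a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]]
print(picked.shape)  # (2, 3)

b = np.arange(6).reshape(3, 1, 2)
# Ellipsis followed by a 2-D boolean mask over the last two axes.
masked = b[..., [[False, True]]]
print(masked.shape)  # (3, 1)
```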

Support lambda functions in cupy.vectorize (#6217)

A long-awaited feature to ensure compatibility with NumPy vectorize has been implemented. In this release, it is now possible to transpile lambda functions. This is especially handy when using JIT in conjunction with cupy.vectorize:

import cupy

a = cupy.array([0.4, -0.2, 1.8, -1.2])
relu = cupy.vectorize(lambda x: (x > 0.0) * x)
print(relu(a))  # [ 0.4 -0.   1.8 -0. ]

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that is supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 is NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP29, NumPy 1.17 support has been dropped on July 26, 2021.
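
The minimum CUDA and NCCL versions stated in these announcements can be checked with a small helper. This is only a sketch: MINIMUMS and meets_minimum are hypothetical names, not part of CuPy, and the table covers just the two minimums given explicitly above.

```python
# Minimum versions stated in the announcements above.
MINIMUMS = {"cuda": (10, 2), "nccl": (2, 8)}


def meets_minimum(component: str, version: str) -> bool:
    """Return True if a dotted version string satisfies the CuPy v10 minimum."""
    parsed = tuple(int(part) for part in version.split(".")[:2])
    return parsed >= MINIMUMS[component]


print(meets_minimum("cuda", "10.1"))  # False
print(meets_minimum("nccl", "2.8"))   # True
```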

Changes

New Features

  • Add cupy.array_api.linalg (#6199)

Enhancements

  • Add more aliases for compatibility with NumPy (#6080)
  • Raise better message when importing CPU array via DLPack (#6097)
  • Apply upstream patch to cupy.array_api (#6105)
  • Borrow more non-GPU APIs from NumPy (#6109)
  • Import more dtype aliases from NumPy (#6110)
  • Borrow indexing APIs from NumPy (#6111)
  • Compile cub/thrust with no unique symbol (#6140)
  • Support cuDNN 8.3.0 (#6150)
  • Support eigenvalue solver 64bit API (#6192)
  • Support all advanced indexing (#6196)
  • Support lambda functions in cupy.vectorize (#6217)
  • Deprecate MachAr (support NumPy 1.22) (#6189)

Performance Improvements

  • Avoid 64bit division to reduce register consumption (#6102)

Bug Fixes

  • Fix __all__ in cupyx.scipy.fft (#6083)
  • Detect repeated axis in reduction (#6103)
  • Fix __getitem__ on Ellipsis and advanced indexing dimension (#6113)
  • Allow leading unit dimensions in copy source (#6153)
  • Always test broadcast in copyto (#6155)
  • Handles infinities of the same sign in logaddexp and logaddexp2 (#6176)
  • Fix empty solve (#6183)
  • Fix empty Cholesky (#6184)
  • Fix #4675 on resolving TODO in #4198 (#6204)
  • Eigenvalue solver 64bit API on CUDA 11.1 (#6220)

Code Fixes

  • Avoid from os import path (#6165)

Documentation

  • Update stable branch (#6065)
  • Update labels of Docs column (#6068)
  • Add more footnotes to comparison table (#6079)
  • Exclude kernel_version from comparison table (#6090)
  • Remove LLVM_PATH note on document (#6101)
  • Add polynomial modules to comparison table (#6122)
  • Add link to compatibility matrix (#6135)
  • Update footnotes in comparison table (#6143)
  • Update conda-forge installation guide (#6200)
  • Update upgrade guide (#6203)
  • Update linkcode implementation (#6206)
  • Revise Overview for CuPy v10 (#6215)

Installation

  • Replace distutils with setuptools in Windows cl.exe detection (#6138)
  • Bump version to v10.0.0 (#6224)

Tests

  • Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6087)
  • Workaround DeprecationWarning raised from pkg_resources (#6095)
  • Fix testing.multi_gpu to add pytest marker (#6096)
  • Fix missing multi_gpu annotation in tests (#6100)
  • Fix exception handling in cupyx.distributed (#6116)
  • Improve FlexCI test scripts (#6119)
  • Fix CI result notification message format (#6124)
  • CI: Add timeout to show_config (#6132)
  • CI: use separate project for multi-GPU tests (#6145)
  • CI: Need to update CUDA driver in cuda115.multi (#6146)
  • CI: Fix package override sometimes fails in CentOS (#6147)
  • CI: add link to ROCm projects in CI coverage matrix (#6148)
  • CI: Fix unquoted specifiers (#6182)
  • CI: Update limits to reduce cache size (#6185)
  • Trigger FlexCI from GitHub Actions (#6191)
  • Support pre-release NumPy version in tests (#6193)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar @twmht @Yutaro-Sanada

v10.0.0rc1

11 Nov 07:00
8f1b39c
Pre-release

This is the release note of v10.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Add cupyx.distributed (#5590)

This new module provides a wrapper over NVIDIA's NCCL library for MPI-style communication. Point-to-point and collective communication primitives are currently supported. See the documentation for a complete reference of the functions.

CuPy now supports CUDA 11.5, Python 3.10, and NVIDIA Jetson

Wheels for CUDA 11.5 (cupy-cuda115) are now available.
Python 3.10 wheels are also available for all supported CUDA / ROCm versions.

Wheels for Jetson can be found in the attached artifacts (pip install cupy-cuda112 -f https://pip.cupy.dev/pre).

Enable Generator random API in ROCm 4.3 (#5895)

ROCm 4.3 fixes a series of issues that prevented the Generator random API (#4177) from running on AMD devices.

Changes without compatibility

Refer to the Upgrade Guide for the detailed description.

Automatically enable peer access (#5496)

Peer access is now enabled by default when a CuPy ndarray is stored on a different device, as long as the machine topology allows it.

Change Device.use() semantics to align with Stream.use() (#5853)

When exiting a device context, the current device is now reverted to the device of the parent context scope, not the device that was last use()d.

Automatically convert big-endian numpy.ndarray to little-endian in cupy.array() and its variants (#5828)

Previously, CuPy copied the given numpy.ndarray to the GPU as-is, regardless of its endianness. In CuPy v10, big-endian arrays are converted to little-endian, the native byte order on GPUs, before the transfer. This change eliminates the need to manually change the array endianness before creating the CuPy array.
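For readers unfamiliar with byte order, the NumPy-only sketch below shows what such a conversion amounts to: the logical values stay the same while the in-memory byte layout changes. It illustrates the manual step that the note above says is no longer needed, and is not CuPy's internal implementation. The variable names are chosen for illustration only.

```python
import numpy as np

# A big-endian int32 array, as might be loaded from a big-endian file format.
big = np.arange(4, dtype=">i4")

# Convert to little-endian while keeping the logical values unchanged.
# Before CuPy v10, this had to be done by hand prior to calling cupy.array().
little = big.astype(big.dtype.newbyteorder("<"))

print(np.array_equal(big, little))        # values are compared, not raw bytes
print(big.tobytes() == little.tobytes())  # raw byte layouts are compared
```

Under the behavior described above, passing `big` directly to cupy.array() in CuPy v10 should give the same values as passing `little`.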

Add cupyx.profiler module (#5940)

A new module, cupyx.profiler, has been added to host all profiling-related APIs in CuPy. Accordingly, the following APIs have been relocated to this module:

  • cupy.prof.TimeRangeDecorator() -> cupyx.profiler.time_range()
  • cupy.prof.time_range() -> cupyx.profiler.time_range()
  • cupy.cuda.profile() -> cupyx.profiler.profile()
  • cupyx.time.repeat() -> cupyx.profiler.benchmark()

The old routines are deprecated.
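The relocation can be applied to existing code with a simple text rewrite. The sketch below is a hypothetical helper, not part of CuPy: `RELOCATED` and `migrate_profiling_names` are names invented here, and the mapping is copied from the list above. It only handles fully qualified names, so code that uses aliases such as `import cupy as cp` would need manual changes.

```python
import re

# Old -> new names, copied from the relocation list above.
RELOCATED = {
    "cupy.prof.TimeRangeDecorator": "cupyx.profiler.time_range",
    "cupy.prof.time_range": "cupyx.profiler.time_range",
    "cupy.cuda.profile": "cupyx.profiler.profile",
    "cupyx.time.repeat": "cupyx.profiler.benchmark",
}

# Match only complete dotted names, so that a longer name such as
# ``cupy.cuda.profiler`` (a different module) is left untouched.
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(old) for old in RELOCATED) + r")(?!\w)"
)


def migrate_profiling_names(source: str) -> str:
    """Rewrite fully qualified deprecated names to their cupyx.profiler names."""
    return _PATTERN.sub(lambda match: RELOCATED[match.group(1)], source)


old_code = (
    "with cupy.prof.time_range('step'):\n"
    "    cupy.cuda.profiler.start()\n"
    "timing = cupyx.time.repeat(f, (x,), n_repeat=10)\n"
)
new_code = migrate_profiling_names(old_code)
print(new_code)
```

A plain text rewrite like this is only a starting point; reviewing the resulting diff by hand is still advisable.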

Deprecate cupy.cuda.compile_with_cache (#5858)

An internal API cupy.cuda.compile_with_cache() has been marked as deprecated as there are better alternatives (RawModule, RawKernel). While it has a long-standing history, this API has never been meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs.
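As a minimal sketch of the public alternative, the example below compiles and launches a small kernel through cupy.RawKernel. The kernel source and the names `ADD_SOURCE`, `launch_config`, and `add_on_gpu` are hypothetical and chosen for illustration. CuPy is imported inside the function so that the launch-size helper can be used on a machine without a GPU.

```python
# CUDA C source for an elementwise addition kernel (illustrative only).
ADD_SOURCE = r'''
extern "C" __global__
void add_arrays(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = x[i] + y[i];
    }
}
'''


def launch_config(n, block=256):
    """Return (grid, block) tuples with enough threads to cover n elements."""
    grid = (n + block - 1) // block  # ceiling division
    return (grid,), (block,)


def add_on_gpu(n=1000):
    """Compile the kernel with RawKernel and run it on the current device."""
    import cupy as cp  # imported lazily; requires CuPy and a GPU

    kernel = cp.RawKernel(ADD_SOURCE, "add_arrays")
    x = cp.arange(n, dtype=cp.float32)
    y = cp.ones(n, dtype=cp.float32)
    out = cp.empty_like(x)
    grid, block = launch_config(n)
    # The scalar is passed as a 32-bit integer to match ``int n`` in the kernel.
    kernel(grid, block, (x, y, out, cp.int32(n)))
    return out
```

Calling add_on_gpu() requires a working CuPy installation and a supported GPU.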

Announcements

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for NCCL 2.6 and 2.7 (#5855)

The minimum supported version for CuPy v10 will be NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 sunset in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.

Drop support for NumPy 1.17 (#5857)

As per NEP 29, NumPy 1.17 support was dropped on July 26, 2021.

Alpha/Beta/RC wheels no longer distributed through PyPI

  • As per the discussion in #5671, we have stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package remains available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes of supported cuSPARSELt version

We are planning to drop cuSPARSELt v0.1.0 support in the CuPy v10 final release (#6045).

Changes

New Features

  • Add cupyx.distributed (#5590)
  • Add cupy.positive() (#5774)
  • Update cupy.array_api (#5783)
  • Update cupy.array_api typing (#5821)
  • Add trim_mean from scipy.stats to cupyx (#5900)
  • Implement more array creation & serialization methods (#5925)

Enhancements

  • Automatically enable peer access (#5496)
  • Update DLPack header to v0.6 to support exchanging arrays backed by managed memory (#5512)
  • Lazy-preload cuDNN (#5677)
  • Support ROCm managed memory (#5685)
  • Fix import failure when pytest namespace is available (#5703) (#5707)
  • Support cuTENSOR 1.3.3 (#5732)
  • Add dtype and casting arguments to cupy.concatenate() (#5759)
  • Automatically convert big-endian data to little-endian in cupy.array() and its variants (#5828)
  • Use pylibcugraph for connected_components (#5830)
  • Make show_config runnable without GPU (#5835)
  • NotImplementedError clarity (#5841)
  • Change Device.use() semantics to align with Stream.use() (#5853)
  • Drop support for NumPy 1.17 (#5857)
  • Deprecate cupy.cuda.compile_with_cache (#5858)
  • Show error when importing cupy.array_api with Python 3.7 (#5873)
  • Enable new random api in ROCm 4.3 (#5895)
  • Add bitorder option to cupy.packbits (#5898)
  • Support using cuTENSOR in elementwise ufuncs (#5902)
  • Workaround ROCm 4.3 LLVM_PATH issue in hipRTC (#5933)
  • Update the Array API module (#5939)
  • Add cupyx.profiler module (#5940)
  • Use SHA1 hash for kernel cache key to support Linux in FIPS-compliant mode (#5988)
  • Merge fp16 headers for CUDA 11.2+ (#5993)
  • Support CUDA 11.5 for library installer (#5996)
  • Add cupy-cuda115 to duplicate detection (#5999)
  • Suggest using binary packages when build failed (#6028)
  • Improve import error message (#6029)
  • Display license terms when downloading libraries (#6032)
  • Fix error type/message for duplicate value in axis (#5953)

Performance Improvements

  • Use index_t for faster address calculation (#5981)

Bug Fixes

  • Use cudaRuntimeGetVersion instead of CUDA_VERSION for CUDA Python support (#5723)
  • Allow generating cubins for the max known CC (#5779)
  • Fix hypergeometric distribution implementation to use int (#5785)
  • Fix non-deterministic behavior in cupy.random.shuffle (#5838)
  • Avoid using driver.get_build_version (#5861)
  • Fix nan_to_num to comply with NumPy API (#5870)
  • Do not use cuTENSOR unless available (#5872)
  • Fix _get_cuda_build_version for ROCm (#5888)
  • Fix __repr__ of mode and scalar in cuTENSOR (#5901)
  • Fix to push device after setDevice succeed (#5904)
  • Fix ndarray.clip to match numpy (#5910)
  • Fix copyto with non-contiguous multidevice (#5913)
  • Avoid use of setDevice in CuPy codebase (#5915)
  • Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5921)
  • Do not use with device in code base (#5963)
  • Fix __dlpack__ protocol (#5970)
  • Fix cupyx.tools.install_library for windows (#5977)
  • Fix ravel for strides 0 (#5978)
  • Avoid using with context for streams (#5985)
  • Fix cuTENSOR installation on Windows (#6007)
  • Fix hash length for SHA1 (#6023)
  • Fix: Add missing output dtype check for direct correlate/convolve (#6046)
  • Fix cuDNN version not displayed in wheel installation (#6054)

Code Fixes

  • Code-fix on cupy.array() (#5842)
  • Successive code fix on cupy.array() (#5844)
  • Fix kernel name of cupyx.scipy.ndimage.interpolation.map_coordinates (#5845)
  • Replace addAddNameExpression with addNameExpression in NVRTC binding (#5938)
  • Split loop testing helpers into _loops (#5967)
  • Make CUPY_DLPACK_EXPORT_VERSION consistent (#5982)
  • Fix comment in device switching (#5984)
  • Avoid using deprecated setDaemon method (#6059)

Documentation

  • Update upgrade guide (#5824)
  • Update list of supported OS (#5854)
  • Drop support for NCCL 2.6 and 2.7 (#5855)
  • Add docs for driver.get_build_version (#5860)
  • Document ppc64le and aarch64 are supported on conda-forge (#5865)
  • Mention deprecation of compile_with_cache() in upgrade guide (#5883)
  • Add docs for scipy.sparse.csgraph module (#5903)
  • Refine SciPy-compatible API documentation (#5905)
  • Improve the comparison table (#5907)
  • Remove CUDA 10.0 / 10.1 from README (#5924)
  • Improve some docs on interoperability and cupy.linalg.cholesky (#5941)
  • Add footnotes for functions unimplemented in CuPy (#5942)
  • Document CUPY_ACCELERATORS (#5948)
  • Fix section heading level (#5962)
  • Mention np.matrix in the difference section (#5966)
  • Add PyTorch with RawKernel example to docs (#5973)
  • Add sphinx-copybutton (#5976)
  • Add favicon to docs (#5980)
  • Replace favicon with high resolution one (#5986)
  • Update upgrade guide for v10 (#5994)
  • Cover a bit more of cuTENSOR in perf guide (#5995)
  • Support CUDA 11.5 on documents (#5997)
  • Fix typo in copyright line (#6030)
  • Add Python 3.10.0 to support list (#6038)
  • Added Compatibility Matrix to Upgrade Guide (#6053)

Installation

  • Bump CUDA/ROCm version in docker images (#5859)
  • Fix library installer to limit architecture (#5926)

Tests

  • Introduce new toolset for CI (#5474)
  • Simplify legacy ROCm test script for FlexCI (#5753)
  • Use pytest in TestJoin (#5764)
  • Clean up plan cache in a FFT slow test (#5811)
  • Improve handling of FlexCI test runs (#5814)
  • Tentatively disable pytest-xdist (#5826)
  • Ad...