Releases: cupy/cupy
v13.0.0a1
This is the release note of v13.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CuPy v13 Roadmap and Revised Release Schedule
- We have published the feature roadmap for CuPy v13, which is planned for release in October 2023. See #7555 for the details.
- Starting in the CuPy v13 development cycle, we have adjusted our release frequency to once every two months. Mid-term or hot-fix releases may be provided depending on necessity, such as for new CUDA/Python version support or critical bug fixes. This new policy also applies to v12 releases.
- RFC: We plan to drop CUDA 10.2/11.0/11.1 support in CuPy v13. Please leave a comment on #7557 if you have any suggestions.
- RFC: We are thinking of improving PyTorch interoperability features in CuPy. If you are interested, please join the discussion in #7556.
Improved Coverage of cupyx.scipy.signal and cupyx.scipy.interpolate APIs (#7442, #7496 and others)
The lfilter, lfilter_zi, filtfilt, and sosfilt APIs are now included in cupyx.scipy.signal, and NdPPoly in the cupyx.scipy.interpolate module.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
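As a quick illustration, here is a minimal sketch of the new filtering APIs; the filter coefficients and input signal below are arbitrary examples, not taken from the release itself.
import cupy
import cupyx.scipy.signal as signal

# Illustrative 3-tap moving-average FIR filter: b holds the numerator
# coefficients, a the denominator (1.0 for an FIR filter).
b = cupy.full(3, 1.0 / 3.0)
a = cupy.asarray([1.0])
x = cupy.random.standard_normal(1000)

y = signal.lfilter(b, a, x)        # one-pass IIR/FIR filtering on the GPU
zi = signal.lfilter_zi(b, a)       # steady-state initial conditions
y2 = signal.filtfilt(b, a, x)      # zero-phase forward-backward filtering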
Random number generator performance improved (#7517)
Sampling using cupy.random.Generator.* methods was slower than the corresponding cupy.random.* function calls from the old random API. This regression is now resolved, and performance has improved by more than 4x when using the cupy.random.Generator API.
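For reference, a minimal sketch of the Generator-based API (the seed and shapes below are arbitrary):
import cupy

rng = cupy.random.default_rng(0)            # new Generator-based API
samples = rng.standard_normal((1000, 1000))
uniform = rng.uniform(0.0, 1.0, size=1000)

# Old-style module-level API shown for comparison.
legacy = cupy.random.standard_normal((1000, 1000))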
Changes without compatibility
Drop support for Python 3.8
In line with NumPy NEP 29, Python 3.8 is no longer supported as of CuPy v13.
Changes
New Features
- Add NdPPoly to cupyx.scipy.interpolate (#7357)
- Implement delete function, add documentation (#7359)
- Add array_api.take function (#7432)
- Add lfilter/IIR utilities to cupyx.scipy.signal (#7442)
- Added scipy.special.binom functionality to CuPy (#7463)
- cupyx/scipy/signal: add savgol_coeffs and savgol_filter (#7469)
- Add scipy.special.zetac to cupyx (#7470)
- Add cupyx.scipy.special.exprel (#7474)
- Add lfiltic and lfilter_zi to cupyx.scipy.signal (#7477)
- Add filtfilt to cupyx.scipy.signal (#7496)
- Add deconvolve to cupyx.scipy.signal (#7509)
- Add symiirorder1 to cupyx.scipy.signal (#7511)
- Add symiirorder2 to cupyx.scipy.signal (#7518)
- Add scipy.special.spherical_yn (#7520)
- Add sosfilt to cupyx.scipy.signal (#7528)
- ENH: scipy.signal: add detrend (#7536)
- cupyx.scipy.signal: add bilinear & bilinear_zpk (#7541)
Enhancements
- Support SciPy 1.10 (#7367)
- ROCm 5.3.0+ rocPrim C++14 extension requirement (#7412)
- Support cuDNN 8.8 (#7472)
- Support CUDA 12.1 (#7473)
- Support NumPy 1.24: dtype and casting keyword arguments for hstack, vstack, stack (#7490)
- Replace concatenate by slice manipulation in lfilter (#7522)
- Support NumPy 1.24: Adding strict option to testing.assert_array_equal (#7481)
Performance Improvements
Bug Fixes
- Fix new strides when array is both C and F-contiguous (#7438)
- Fixup array/asarray call to prefer C order on plain NumPy arrays (#7457)
- Fix cudart errors raised by texture APIs swallowed by Cython (#7540)
- Dispatch ufunc methods (#7572)
Code Fixes
Documentation
- Add comparison table for scipy.interpolate module (#7433)
- Update list of supported libraries (#7478)
- Update aarch64 install instructions (#7500)
- Fix RTD build failure (#7547)
Installation
- Bump version to v13.0.0a1 (#7494)
- Use -Xfatbin=-compress-all (#7497)
- Fix _depends.json not included in wheel (#7578)
Tests
- Remove unused test decorators (#7453)
- Remove xfail for invh (#7476)
- Bump platform versions used in actions (#7488)
- Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7521)
- Mark scipy required in a test (#7523)
- Require newer SciPy in a test (#7524)
- Import SciPy in tests (#7531)
- Restore GitHub Actions cache with prefix match (#7546)
- Try to fix nan value mismatches in filtfilt tests (#7567)
- Fix CUDA Python CI failure (#7574)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@AdrianAbeyta @Anas20001 @andfoy @arogozhnikov @asi1024 @chettub @emcastillo @ev-br @kmaehashi @KyanCheung @leofang @pri1311 @Raghav323 @seberg @takagi @tysonwu
v12.1.0
This is the release note of v12.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Changes
New Features
- Add array_api.take function (#7513)
Enhancements
- Support SciPy 1.10 (#7586)
Bug Fixes
- Fixup array/asarray call to prefer C order on plain NumPy arrays (#7493)
- Fix cudart errors raised by texture APIs swallowed by Cython (#7566)
- Dispatch ufunc methods (#7583)
Code Fixes
- Fix cythonize warnings (#7502)
Documentation
Installation
Tests
- Bump platform versions used in actions (#7501)
- Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7525)
- Remove unused test decorators (#7535)
- Restore GitHub Actions cache with prefix match (#7571)
- Fix CUDA Python CI failure (#7582)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @arogozhnikov @asi1024 @kmaehashi @leofang @seberg @takagi
v12.0.0
This is the release note of v12.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since the v12.0.0rc1 release. Check out our blog for highlights of the v12 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support for CUDA 12.1 & cuDNN 8.8 (#7484 & #7475)
CuPy now supports CUDA 12.1 and cuDNN 8.8. Binary packages are available for Linux (x86_64/aarch64) and Windows as cupy-cuda12x
.
$ pip install cupy-cuda12x
Announcements
Arm packages available in PyPI
Binary packages for aarch64 (Jetson and Arm servers) can now be installed from PyPI.
$ pip install cupy-cuda102
$ pip install cupy-cuda11x
$ pip install cupy-cuda12x
Note: At the time of the release, the Arm wheel of cupy-cuda11x for Python 3.8 (cupy_cuda11x-12.0.0-cp38-cp38-manylinux2014_aarch64.whl) was not available on PyPI; this issue was resolved on 2023-04-03. Until then, the wheel could be installed from the CuPy index: $ pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
Changes
For all changes in v12, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).
Enhancements
- ROCm 5.3.0+ rocPrim C++14 extension requirement (#7454)
- Support cuDNN 8.8 (#7475)
- Support CUDA 12.1 (#7484)
Bug Fixes
- Fix new strides when array is both C and F-contiguous (#7451)
Code Fixes
- Rename type_test to type_testing (#7461)
Documentation
- Add comparison table for scipy.interpolate module (#7450)
- Update list of supported libraries (#7486)
Tests
- Remove xfail for invh (#7485)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0rc1
This is the release note of v12.0.0rc1. See here for the complete list of solved issues and merged PRs.
This is a release candidate of the CuPy v12 series. Please start testing your workload with this release to prepare for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Improved Coverage of cupyx.scipy.interpolate
The following interpolators have been implemented: BPoly, Akima1DInterpolator, and PchipInterpolator.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
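A minimal usage sketch of the new interpolators, assuming the SciPy-compatible signatures; the sample data below is illustrative only.
import cupy
from cupyx.scipy.interpolate import Akima1DInterpolator, PchipInterpolator

x = cupy.linspace(0.0, 10.0, 11)
y = cupy.sin(x)

pchip = PchipInterpolator(x, y)       # shape-preserving cubic interpolation
akima = Akima1DInterpolator(x, y)

xs = cupy.linspace(0.0, 10.0, 101)
ys_pchip = pchip(xs)
ys_akima = akima(xs)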
DLPack v0.8 Support
CuPy is now compatible with DLPack v0.8 to allow importing/exporting bool arrays.
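For example, a bool array can now make a round trip through the DLPack protocol. This is a minimal sketch within CuPy itself; exchanging data with other DLPack-aware libraries works the same way.
import cupy

mask = cupy.asarray([True, False, True])

# cupy.ndarray implements __dlpack__, so it can be consumed directly.
roundtrip = cupy.from_dlpack(mask)
assert roundtrip.dtype == cupy.bool_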
Fixed Performance Issue with CUDA 12.0
This release fixes a critical performance regression under CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
Changes without compatibility
Change cupy.cuda.Device Behavior (#7427)
The CUDA current device (set via cupy.cuda.Device.use() or the underlying CUDA API cudaSetDevice()) will now be reactivated when exiting a cupy.cuda.Device context manager. This reverts the change introduced in CuPy v10, making the behavior identical to that of CuPy v9 or earlier. Please refer to the Upgrade Guide for the background of this decision.
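A minimal sketch of the restored behavior, assuming a machine with at least two GPUs:
import cupy

cupy.cuda.Device(1).use()        # make GPU 1 the current device

with cupy.cuda.Device(0):
    a = cupy.arange(10)          # allocated on GPU 0

# On exit, the previously active device (GPU 1) is current again,
# matching CuPy v9 and earlier.
assert cupy.cuda.Device().id == 1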
Requirement Changes (#7405)
As per NEP 29, CuPy v12 drops support for Python 3.7 and NumPy 1.20. Support for SciPy 1.6 has been dropped as well.
Remove Texture Reference APIs (#7308)
Texture reference features (RawModule.get_texref() and TextureReference), which were marked deprecated in CUDA 10.1 and removed in CUDA 12.0, have been removed from CuPy.
Changes
New Features
- Initial experimental & private cupyx.distributed._array implementation (#7040)
- Add PchipInterpolator to cupyx.scipy.interpolate (#7255)
- Add Akima1DInterpolator to cupyx.scipy.interpolate (#7260)
- Add cached_code to ElementwiseKernel and ReductionKernel (#7265)
- Enable spline methods on RegularGridInterpolator (#7334)
- Add BPoly to cupyx.scipy.interpolate module (#7343)
Enhancements
- Use NumPy 1.24 in CI and bump baseline API (#7248)
- Use warp size from runtime.getDeviceProperties (#7302)
- Update DLPack to v0.8 to support bool arrays (#7307)
- Remove texture reference completely (#7308)
- Work around a potential OOM error raised by CUB histogram (#7316)
- Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7377)
- Drop support for Python 3.7, NumPy 1.20, and SciPy 1.6 (#7405)
- Raise RuntimeError if pylibraft is unavailable (#7411)
- Revert cupy.cuda.Device behavior to v9 (#7427)
- Fix ndarray.fill to raise ComplexWarning (#7393)
- Fix arange() to raise TypeError in boolean case (#7394)
Performance Improvements
- Change implementation of fftshift and ifftshift (#7399)
Bug Fixes
- Fix kernel cache not working in CUDA 12.0 (#7345)
- Improve stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7356)
- Do not test NumPy version for private APIs (#7368)
Code Fixes
- Small fixes and refactor of casting related things (#7322)
Documentation
- Doc: fix wrong time unit (#7312)
- Doc: add docs for contiguity policy (#7344)
- Doc: downgrade pydata-sphinx-theme to v0.11.0 (#7375)
- Fix typo in docstring (#7402)
- DOC: cupyx.interpolate: document limitations on ROCm (#7419)
- Add upgrade guide for v12 (#7430)
Installation
- Add CUPY_INCLUDE_PATH and CUPY_LIBRARY_PATH env vars (#7305)
- Bump docker image to CUDA 11.8.0 (#7429)
- Bump version to v12.0.0rc1 (#7434)
Tests
- CI: tentatively use SciPy 1.9 in Windows (#7326)
- CI: Add optuna 3.0 (#7333)
- Avoid int8 overflow warning in TestRoundHalfway (#7338)
- Avoid int8 overflow in some tests (#7339)
- Fix int8 overflow in vectorize tests (#7340)
- Avoid casting nan value to integer type in nanargmin/max tests (#7341)
- Add CI for CUDA 12.0 on Windows (#7349)
- Remove invalid pytest markers and turn on strict mode (#7350)
- Drop support for Optuna v2 (#7363)
- Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7364)
- Fix pre-commit configuration error (#7369)
- Avoid int8 overflow in core test (#7387)
- Fix sumprod test to avoid uint overflow (#7395)
- Avoid fillvalue overflow in cupyx.scipy.signal test (#7397)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
Contributors:
@andfoy @asi1024 @emcastillo @ev-br @kmaehashi @leofang @Nordicus @Raghav323 @RisaKirisu @seberg @wstolp
v11.6.0
This is the release note of v11.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for the CuPy v11 series. Please start testing your workload with the v12 release candidate to get ready for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Fixed Performance Issue with CUDA 12.0
This release fixes a critical performance regression under CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
Changes
Enhancements
- Use warp size from runtime.getDeviceProperties (#7353)
- Update DLPack to v0.8 to support bool arrays (#7376)
- Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7379)
- Work around a potential OOM error raised by CUB histogram (#7388)
- Use NumPy 1.24 in CI and bump baseline API (#7423)
- Fix arange() to raise TypeError in boolean case (#7407)
Bug Fixes
- Fix kernel cache not working in CUDA 12.0 (#7348)
- Improve stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7361)
- Do not test NumPy version for private APIs (#7370)
Documentation
- Downgrade pydata-sphinx-theme to v0.11.0 (#7380)
Installation
- Bump version to v11.6.0 (#7435)
Tests
- CI: tentatively use SciPy 1.9 in Windows (#7336)
- CI: Add optuna 3.0 (#7337)
- Remove invalid pytest markers and turn on strict mode (#7354)
- Avoid int8 overflow warning in TestRoundHalfway (#7362)
- Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7365)
- Add CI for CUDA 12.0 on Windows (#7371)
- Fix pre-commit configuration error (#7373)
- Avoid casting nan value to integer type in nanargmin/max tests (#7381)
- Avoid int8 overflow in some tests (#7382)
- Fix int8 overflow in vectorize tests (#7384)
- Fix sumprod test to avoid uint overflow (#7398)
- Avoid fillvalue overflow in cupyx.scipy.signal test (#7401)
- Fix ndarray.fill to raise ComplexWarning (#7408)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b3
This is the release note of v12.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CUDA 12 & H100 Support
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x --pre -f https://pip.cupy.dev/pre
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
NVTX3
NVTX support in CuPy is now backed by NVTX3 instead of the legacy NVTX1.
Changes
New Features
- Add cupyx.scipy.interpolate.make_interp_spline (#7195)
- Implementing RegularGridInterpolator and interpn from scipy.interpolate (#7197)
- Add PPoly to cupyx.scipy.interpolate (#7204)
- Add uniform() to random generator (#7205)
- Implement make_interp_spline(..., bc_type="periodic") (#7206)
- JIT: Enhance thrust functions coverage (#7233)
- Add CubicHermiteSpline to cupyx.scipy.interpolate (#7242)
Enhancements
- Conditionally change identifiers for ROCm (#7079)
- cupyx.scipy.sparse.linalg.spsolve: allow two-dimensional right-hand sides in A @ X = B (#7219)
- Support CUDA 12.0 (#7235)
- Extra fixes for CUDA 12.0 (#7236)
- Adding smaller eigenvalues option in cupyx.scipy.sparse.linalg.eigsh (#7269)
- Performance optimization of RegularGridInterpolator (#7275)
- Add function to diagnose Windows DLL load issue (#7279)
- Support NCCL 2.16 (#7283)
- Bump to cuTENSOR 1.6.2 (#7284)
- Support cuDNN 8.7 (#7285)
- Add cupy-cuda12x to cupy-wheel (#7300)
- Migrate to NVTX3 (#7304)
- Update for deprecations in NumPy 1.24 (#7245)
- Check if the slice does not have inhomogeneous shape before converting it to array (#7286)
- Update array_api (#7313)
Bug Fixes
- Fix interpreting Sparse init arguments (#7222)
- Fix race condition in Jitify (#7259)
- Support passing int as shape to broadcast_to (#7271)
- Update cuTENSOR installer for CUDA 12.x (#7298)
Documentation
- Bump docs requirements (#7247)
- Add explanation for JIT kernel. (#7252)
- Doc: Add interop example using raw pointers (#7278)
- Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7310)
Installation
- Bump version to v12.0.0b3 (#7323)
Tests
- CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7237)
- Skip tests if SciPy is unavailable (#7239)
- Fix CI failures related to cupyx.scipy.interpolate (#7262)
- Filter SQLAlchemy's warning on which optuna depends in test (#7276)
- Add CI for CUDA 12.0 (#7299)
- CI: Use NVTX1 in FlexCI image (#7311)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @ev-br @hubertlu-tw @ideasrule @kmaehashi @leofang @mandal-saswata @oishigyunyu @takagi
v11.5.0
This is the release note of v11.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CUDA 12 & H100 Support
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x
For aarch64:
$ pip install cupy-cuda12x -f https://pip.cupy.dev/aarch64
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
Changes
Enhancements
- Support CUDA 12.0 (#7238)
- Conditionally change identifiers for ROCm (#7244)
- Extra fixes for CUDA 12.0 (#7257)
- Support NCCL 2.16 (#7288)
- Bump to cuTENSOR 1.6.2 (#7290)
- Support cuDNN 8.7 (#7296)
- Lazy load dtypes deprecated in NumPy 1.24 (#7297)
- Add cupy-cuda12x to cupy-wheel (#7327)
- Update for deprecations in NumPy 1.24 (#7263)
- Update array_api (#7321)
Bug Fixes
- Fix interpreting Sparse init arguments (#7230)
- Fix race condition in Jitify (#7266)
- Support passing int as shape to broadcast_to (#7291)
- Update cuTENSOR installer for CUDA 12.x (#7301)
Documentation
- Bump docs requirements (#7258)
- Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7320)
Installation
- Bump version to v11.5.0 (#7324)
Tests
- CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7241)
- Filter SQLAlchemy's warning on which optuna depends in test (#7277)
- Fix tests for NumPy 1.24 (c.f. #7286) (#7287)
- Add CI for CUDA 12.0 (#7317)
- CI: Use NVTX1 in FlexCI image (#7325)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b2
This is the release note of v12.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
More cupyx.scipy.interpolate APIs (#7086, #7190 and #7215)
Increased coverage of cupyx.scipy.interpolate APIs, which now includes BSpline, RBFInterpolator, splantider and splder.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
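A minimal sketch of the RBF interpolator, assuming the SciPy-compatible signature; the scattered data below is illustrative only.
import cupy
from cupyx.scipy.interpolate import RBFInterpolator

# 50 scattered 2-D observation points and their values (illustrative data).
points = cupy.random.uniform(-1.0, 1.0, size=(50, 2))
values = cupy.sum(points ** 2, axis=1)

rbf = RBFInterpolator(points, values)

queries = cupy.asarray([[0.0, 0.0], [0.5, -0.5]])
estimates = rbf(queries)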
Use CUB reduction classes in cupyx.jit (#7145)
Now it is possible to use the CUB reduction classes, cub::WarpReduce and cub::BlockReduce, in kernels written using CuPy JIT.
import cupy, cupyx
from cupy.cuda import runtime
from cupyx import jit

@jit.rawkernel()
def warp_reduce_sum(x, y):
    # Each block reduces one row of x within a single warp.
    WarpReduce = jit.cub.WarpReduce[cupy.int32]
    temp_storage = jit.shared_memory(
        dtype=WarpReduce.TempStorage, size=1)
    i, j = jit.blockIdx.x, jit.threadIdx.x
    value = x[i, j]
    aggregator = WarpReduce(temp_storage[0])
    aggregate = aggregator.Reduce(value, jit.cub.Sum())
    if j == 0:
        y[i] = aggregate

# Launch one block per row; the block size equals the warp size.
warp_size = 64 if runtime.is_hip else 32
h, w = (32, warp_size)
x = cupy.arange(h * w, dtype=cupy.int32).reshape(h, w)
cupy.random.shuffle(x)
y = cupy.zeros(h, dtype=cupy.int32)
warp_reduce_sum[h, w](x, y)
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
Changes
New Features
- Add 1-D BSpline to interpolate module (#7086)
- JIT: Support cub::WarpReduce and cub::BlockReduce (#7145)
- Add cupyx.scipy.interpolate.RBFInterpolator (#7190)
- Expose splder and splantider (#7215)
Enhancements
- Use cuSPARSE Generic API instead of older one documented to be removed (#7052)
- Improve _PerfCaseResult.to_str format (#7152)
Bug Fixes
- Split inputs to random routines (#7173)
- Fix 1-dim lexsort (#7178)
- Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7192)
- Fix wrong argument in warnings.warn() (#7194)
- Use list(kwargs) instead of list(kwargs.keys) (#7203)
- Fix cusparseSpSM compatibility (#7214)
- Remove scipy import (#7218)
- Use naive comb() for Python 3.7 (#7221)
Tests
- CI: Generate coverage count just after the parameter axis in table (#7175)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @emcastillo @ev-br @hadipash @jjmortensen @kmaehashi @takagi @TsutsuiMasayoshi
v11.4.0
This is the release note of v11.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Changes
Enhancements
- Use cuSPARSE Generic API instead of older one documented to be removed (#7209)
Bug Fixes
- Fix 1-dim lexsort (#7191)
- Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7202)
- Split inputs to random routines (#7207)
- Use list(kwargs) instead of list(kwargs.keys) (#7213)
- Fix cusparseSpSM compatibility (#7220)
Tests
- CI: Generate coverage count just after the parameter axis in table (#7188)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b1
This is the release note of v12.0.0b1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support for CUDA 11.8 & NVIDIA H100 GPUs
This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.
Support for Python 3.11
Wheels are now available for Python 3.11.
ufunc Methods
This release adds ufunc.reduce, ufunc.accumulate, ufunc.reduceat, and ufunc.at methods. See the documentation for more details.
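A minimal sketch of the new ufunc methods (the arrays below are arbitrary examples):
import cupy

a = cupy.arange(12).reshape(3, 4)

col_sums = cupy.add.reduce(a, axis=0)       # like a.sum(axis=0)
running = cupy.add.accumulate(a, axis=1)    # like cupy.cumsum(a, axis=1)
segments = cupy.add.reduceat(a.ravel(), cupy.asarray([0, 4, 8]))

b = cupy.zeros(5)
cupy.add.at(b, cupy.asarray([0, 0, 2]), 1)  # unbuffered in-place update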
Use Thrust in cupyx.jit (#7054, #7139)
Now it is possible to use the Thrust library device functions in kernels written using CuPy JIT.
import cupy, cupyx

@cupyx.jit.rawkernel()
def sort_by_key(x, y):
    # Each thread sorts one row of x, applying the same permutation to y.
    i = cupyx.jit.threadIdx.x
    x_array = x[i]
    y_array = y[i]
    cupyx.jit.thrust.sort_by_key(
        cupyx.jit.thrust.device,
        x_array.begin(),
        x_array.end(),
        y_array.begin(),
    )

h, w = (256, 256)
x = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(x)
x = x.reshape(h, w)
y = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(y)
y = y.reshape(h, w)
sort_by_key[1, 256](x, y)
Currently supported Thrust functions are count, copy, find, mismatch, sort, and sort_by_key.
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
Changes without compatibility
Deprecation of ndarray.scatter_{add,max,min} (#7097)
The cupy.ndarray.scatter_{add,max,min} methods are marked as deprecated. Use the corresponding ufunc methods (cupy.{add,maximum,minimum}.at) instead, as sketched below.
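A minimal migration sketch (the array and indices are illustrative):
import cupy

a = cupy.zeros(5)
idx = cupy.asarray([0, 1, 1, 3])

# Deprecated: a.scatter_add(idx, 1)
# Preferred ufunc method:
cupy.add.at(a, idx, 1)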
CUDA library wrappers now live in cupyx (#7013)
Previously, CuPy provided high-level wrappers for CUDA libraries as cupy.cudnn, cupy.cusolver, cupy.cusparse, and cupy.cutensor. These modules have now moved to cupyx as part of the cupy namespace cleanup. The old modules are still available but marked as deprecated. Note that these modules are still undocumented and may be subject to change.
Changes
New Features
- Add axis to cupy.logspace (#6797)
- Support thrust::count, device in CuPy JIT (#7054)
- Add cupy.ndarray.searchsorted (#7059)
- Support add.at, maximum.at, minimum.at (#7077)
- Add pdist implementation to distance functions (#7078)
- Support subtract.at, bitwise_and.at, bitwise_or.at, bitwise_xor.at (#7099)
- Add ufunc.reduce and ufunc.accumulate (#7105)
- Add cupy.add.reduceat (#7115)
- Implement cupy.min_scalar_type (#7136)
- JIT: Support more thrust functions (#7139)
Enhancements
- Move cupy.cudnn, cupy.cusolver, cupy.cutensor, cupy.cusparse to cupyx (#7013)
- Allow randint to support array bounds (#7051)
- Deprecate ndarray.scatter_{add, max, min} (#7097)
- Support CUDA 11.8 H100 GPUs (#7100)
- Support CUDA 11.8 (#7117)
- Add CUDA 11.8 on documents (#7119)
- Fix compile error from inf/nan in cupy.fuse (#7122)
- Raise TypeError instead of ValueError in cupy.from_dlpack when CPU tensor is passed (#7133)
- Support NCCL 2.15 (#7153)
- Support Python 3.11 (#7159)
- Fix indexing sparse matrix with empty index arguments (#7143)
Bug Fixes
- Make sure that cupy (array-api) Array objects can be composed using asarray (#6874)
- Don't use __del__ in TCPStore (#6989)
- JIT: Fix compile error for op.routine including in0_type (#7076)
- Fix cupy.nansum in fusing (#7102)
- Fusion TypeError in cupy._core.fusion._call_ufunc() (#7113)
- Fix a typo (#7163)
- JIT: Fix compile error of minmax function (#7167)
Code Fixes
Documentation
- Docs: Add missing functions (#7103)
- Docs: ufunc methods (#7104)
- Improve benchmark documentation (#7176)
Installation
- Bump version to v12.0.0b1 (#7181)
Examples
Tests
- CI: Add ROCm 5.3 (#7124)
- CI: Allow /test jenkins to trigger Jenkins only (#7126)
- Install zlib for CUDA 11.8 Windows CI (#7137)
- CI: improve use of cache in GitHub Actions (#7141)
- Fix for pytest 7.2 (#7147)
- CI: Add support for the latest FlexCI Windows image (#7161)
- JIT: Skip HIP thrust::sort test (#7162)
- CI: use pre-commit in GitHub Actions (#7123)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @Diwakar-Gupta @emcastillo @IncubatorShokuhou @kmaehashi @MarcoGorelli @takagi @TsutsuiMasayoshi