Releases: cupy/cupy
v13.0.0a1
This is the release note of v13.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CuPy v13 Roadmap and Revised Release Schedule
- We have published the feature roadmap for CuPy v13, which is planned for release in October 2023. See #7555 for the details.
- Starting in the CuPy v13 development cycle, we have adjusted our release frequency to once every two months. Mid-term or hot-fix releases may be provided depending on necessity, such as for new CUDA/Python version support or critical bug fixes. This new policy also applies to v12 releases.
- RFC: We plan to drop CUDA 10.2/11.0/11.1 support in CuPy v13. Please leave a comment on #7557 if you have any suggestions.
- RFC: We are thinking of improving PyTorch interoperability features in CuPy. If you are interested, please join the discussion in #7556.
Improved Coverage of cupyx.scipy.signal and cupyx.scipy.interpolate APIs (#7442, #7496 and others)
The lfilter, lfilter_zi, filtfilt, and sosfilt APIs are now included in cupyx.scipy.signal, and NdPPoly in the cupyx.scipy.interpolate module.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
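As a quick illustration, here is a minimal sketch of the new filtering APIs; the filter coefficients and input signal below are arbitrary examples, not taken from the release itself.
import cupy
import cupyx.scipy.signal as signal

# Illustrative 3-tap moving-average FIR filter: b holds the numerator
# coefficients, a the denominator (1.0 for an FIR filter).
b = cupy.full(3, 1.0 / 3.0)
a = cupy.asarray([1.0])
x = cupy.random.standard_normal(1000)

y = signal.lfilter(b, a, x)        # one-pass IIR/FIR filtering on the GPU
zi = signal.lfilter_zi(b, a)       # steady-state initial conditions
y2 = signal.filtfilt(b, a, x)      # zero-phase forward-backward filtering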
Random number generator performance improved (#7517)
Sampling using cupy.random.Generator.* methods was slower than the corresponding cupy.random.* function calls from the old random API. This regression is now resolved, and performance has improved by more than 4x when using the cupy.random.Generator API.
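For reference, a minimal sketch of the Generator-based API (the seed and shapes below are arbitrary):
import cupy

rng = cupy.random.default_rng(0)            # new Generator-based API
samples = rng.standard_normal((1000, 1000))
uniform = rng.uniform(0.0, 1.0, size=1000)

# Old-style module-level API shown for comparison.
legacy = cupy.random.standard_normal((1000, 1000))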
Changes without compatibility
Drop support for Python 3.8
In line with NumPy NEP 29, Python 3.8 is no longer supported as of CuPy v13.
Changes
New Features
- Add NdPPoly to cupyx.scipy.interpolate (#7357)
- Implement delete function, add documentation (#7359)
- Add array_api.take function (#7432)
- Add lfilter/IIR utilities to cupyx.scipy.signal (#7442)
- Added scipy.special.binom functionality to CuPy (#7463)
- cupyx/scipy/signal: add savgol_coeffs and savgol_filter (#7469)
- Add scipy.special.zetac to cupyx (#7470)
- Add cupyx.scipy.special.exprel (#7474)
- Add lfiltic and lfilter_zi to cupyx.scipy.signal (#7477)
- Add filtfilt to cupyx.scipy.signal (#7496)
- Add deconvolve to cupyx.scipy.signal (#7509)
- Add symiirorder1 to cupyx.scipy.signal (#7511)
- Add symiirorder2 to cupyx.scipy.signal (#7518)
- Add scipy.special.spherical_yn (#7520)
- Add sosfilt to cupyx.scipy.signal (#7528)
- ENH: scipy.signal: add detrend (#7536)
- cupyx.scipy.signal: add bilinear & bilinear_zpk (#7541)
Enhancements
- Support SciPy 1.10 (#7367)
- ROCm 5.3.0+ rocPrim C++14 extension requirement (#7412)
- Support cuDNN 8.8 (#7472)
- Support CUDA 12.1 (#7473)
- Support NumPy 1.24: dtype and casting keyword arguments for hstack, vstack, stack (#7490)
- Replace concatenate by slice manipulation in lfilter (#7522)
- Support NumPy 1.24: Adding strict option to testing.assert_array_equal (#7481)
Performance Improvements
Bug Fixes
- Fix new strides when array is both C and F-contiguous (#7438)
- Fixup array/asarray call to prefer C order on plain NumPy arrays (#7457)
- Fix cudart errors raised by texture APIs swallowed by Cython (#7540)
- Dispatch ufunc methods (#7572)
Code Fixes
Documentation
- Add comparison table for scipy.interpolate module (#7433)
- Update list of supported libraries (#7478)
- Update aarch64 install instructions (#7500)
- Fix RTD build failure (#7547)
Installation
- Bump version to v13.0.0a1 (#7494)
- Use -Xfatbin=-compress-all (#7497)
- Fix _depends.json not included in wheel (#7578)
Tests
- Remove unused test decorators (#7453)
- Remove xfail for invh (#7476)
- Bump platform versions used in actions (#7488)
- Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7521)
- Mark scipy required in a test (#7523)
- Require newer SciPy in a test (#7524)
- Import SciPy in tests (#7531)
- Restore GitHub Actions cache with prefix match (#7546)
- Try to fix nan value mismatches in filtfilt tests (#7567)
- Fix CUDA Python CI failure (#7574)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@AdrianAbeyta @Anas20001 @andfoy @arogozhnikov @asi1024 @chettub @emcastillo @ev-br @kmaehashi @KyanCheung @leofang @pri1311 @Raghav323 @seberg @takagi @tysonwu
v12.1.0
This is the release note of v12.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Changes
New Features
- Add array_api.take function (#7513)
Enhancements
- Support SciPy 1.10 (#7586)
Bug Fixes
- Fixup array/asarray call to prefer C order on plain NumPy arrays (#7493)
- Fix cudart errors raised by texture APIs swallowed by Cython (#7566)
- Dispatch ufunc methods (#7583)
Code Fixes
- Fix cythonize warnings (#7502)
Documentation
Installation
Tests
- Bump platform versions used in actions (#7501)
- Fix TestBSpline::test_design_matrix_same_as_BSpline_call (#7525)
- Remove unused test decorators (#7535)
- Restore GitHub Actions cache with prefix match (#7571)
- Fix CUDA Python CI failure (#7582)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @arogozhnikov @asi1024 @kmaehashi @leofang @seberg @takagi
v12.0.0
This is the release note of v12.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since the v12.0.0rc1 release. Check out our blog for highlights of the v12 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support for CUDA 12.1 & cuDNN 8.8 (#7484 & #7475)
CuPy now supports CUDA 12.1 and cuDNN 8.8. Binary packages are available for Linux (x86_64/aarch64) and Windows as cupy-cuda12x
.
$ pip install cupy-cuda12x
Announcements
Arm packages available in PyPI
Binary packages for aarch64 (Jetson and Arm servers) can now be installed from PyPI.
$ pip install cupy-cuda102
$ pip install cupy-cuda11x
$ pip install cupy-cuda12x
Note: At the time of the release, the Arm wheel of cupy-cuda11x for Python 3.8 (cupy_cuda11x-12.0.0-cp38-cp38-manylinux2014_aarch64.whl) was not available on PyPI; this issue was resolved on 2023-04-03. Until then, the wheel could be installed from the CuPy index: $ pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
Changes
For all changes in v12, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).
Enhancements
- ROCm 5.3.0+ rocPrim C++14 extension requirement (#7454)
- Support cuDNN 8.8 (#7475)
- Support CUDA 12.1 (#7484)
Bug Fixes
- Fix new strides when array is both C and F-contiguous (#7451)
Code Fixes
- Rename type_test to type_testing (#7461)
Documentation
- Add comparison table for scipy.interpolate module (#7450)
- Update list of supported libraries (#7486)
Tests
- Remove xfail for invh (#7485)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0rc1
This is the release note of v12.0.0rc1. See here for the complete list of solved issues and merged PRs.
This is a release candidate of the CuPy v12 series. Please start testing your workload with this release to prepare for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Improved Coverage of cupyx.scipy.interpolate
The following interpolators have been implemented: BPoly, Akima1DInterpolator, and PchipInterpolator.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
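A minimal usage sketch of the new interpolators, assuming the SciPy-compatible signatures; the sample data below is illustrative only.
import cupy
from cupyx.scipy.interpolate import Akima1DInterpolator, PchipInterpolator

x = cupy.linspace(0.0, 10.0, 11)
y = cupy.sin(x)

pchip = PchipInterpolator(x, y)       # shape-preserving cubic interpolation
akima = Akima1DInterpolator(x, y)

xs = cupy.linspace(0.0, 10.0, 101)
ys_pchip = pchip(xs)
ys_akima = akima(xs)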
DLPack v0.8 Support
CuPy is now compatible with DLPack v0.8 to allow importing/exporting bool arrays.
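For example, a bool array can now make a round trip through the DLPack protocol. This is a minimal sketch within CuPy itself; exchanging data with other DLPack-aware libraries works the same way.
import cupy

mask = cupy.asarray([True, False, True])

# cupy.ndarray implements __dlpack__, so it can be consumed directly.
roundtrip = cupy.from_dlpack(mask)
assert roundtrip.dtype == cupy.bool_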
Fixed Performance Issue with CUDA 12.0
This release fixes a critical performance regression under CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
Changes without compatibility
Change cupy.cuda.Device Behavior (#7427)
The CUDA current device (set via cupy.cuda.Device.use() or the underlying CUDA API cudaSetDevice()) will now be reactivated when exiting a cupy.cuda.Device context manager. This reverts the change introduced in CuPy v10, making the behavior identical to that of CuPy v9 or earlier. Please refer to the Upgrade Guide for the background of this decision.
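A minimal sketch of the restored behavior, assuming a machine with at least two GPUs:
import cupy

cupy.cuda.Device(1).use()        # make GPU 1 the current device

with cupy.cuda.Device(0):
    a = cupy.arange(10)          # allocated on GPU 0

# On exit, the previously active device (GPU 1) is current again,
# matching CuPy v9 and earlier.
assert cupy.cuda.Device().id == 1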
Requirement Changes (#7405)
As per NEP 29, CuPy v12 drops support for Python 3.7 and NumPy 1.20. Support for SciPy 1.6 has been dropped as well.
Remove Texture Reference APIs (#7308)
Texture reference features (RawModule.get_texref() and TextureReference), which were marked deprecated in CUDA 10.1 and removed in CUDA 12.0, have been removed from CuPy.
Changes
New Features
- Initial experimental & private cupyx.distributed._array implementation (#7040)
- Add PchipInterpolator to cupyx.scipy.interpolate (#7255)
- Add Akima1DInterpolator to cupyx.scipy.interpolate (#7260)
- Add cached_code to ElementwiseKernel and ReductionKernel (#7265)
- Enable spline methods on RegularGridInterpolator (#7334)
- Add BPoly to cupyx.scipy.interpolate module (#7343)
Enhancements
- Use NumPy 1.24 in CI and bump baseline API (#7248)
- Use warp size from runtime.getDeviceProperties (#7302)
- Update DLPack to v0.8 to support bool arrays (#7307)
- Remove texture reference completely (#7308)
- Work around a potential OOM error raised by CUB histogram (#7316)
- Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7377)
- Drop support for Python 3.7, NumPy 1.20, and SciPy 1.6 (#7405)
- Raise RuntimeError if pylibraft is unavailable (#7411)
- Revert cupy.cuda.Device behavior to v9 (#7427)
- Fix ndarray.fill to raise ComplexWarning (#7393)
- Fix arange() to raise TypeError in boolean case (#7394)
Performance Improvements
- Change implementation of fftshift and ifftshift (#7399)
Bug Fixes
- Fix kernel cache not working in CUDA 12.0 (#7345)
- Improve stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7356)
- Do not test NumPy version for private APIs (#7368)
Code Fixes
- Small fixes and refactor of casting related things (#7322)
Documentation
- Doc: fix wrong time unit (#7312)
- Doc: add docs for contiguity policy (#7344)
- Doc: downgrade pydata-sphinx-theme to v0.11.0 (#7375)
- Fix typo in docstring (#7402)
- DOC: cupyx.interpolate: document limitations on ROCm (#7419)
- Add upgrade guide for v12 (#7430)
Installation
- Add CUPY_INCLUDE_PATH and CUPY_LIBRARY_PATH env vars (#7305)
- Bump docker image to CUDA 11.8.0 (#7429)
- Bump version to v12.0.0rc1 (#7434)
Tests
- CI: tentatively use SciPy 1.9 in Windows (#7326)
- CI: Add optuna 3.0 (#7333)
- Avoid int8 overflow warning in TestRoundHalfway (#7338)
- Avoid int8 overflow in some tests (#7339)
- Fix int8 overflow in vectorize tests (#7340)
- Avoid casting nan value to integer type in nanargmin/max tests (#7341)
- Add CI for CUDA 12.0 on Windows (#7349)
- Remove invalid pytest markers and turn on strict mode (#7350)
- Drop support for Optuna v2 (#7363)
- Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7364)
- Fix pre-commit configuration error (#7369)
- Avoid int8 overflow in core test (#7387)
- Fix sumprod test to avoid uint overflow (#7395)
- Avoid fillvalue overflow in cupyx.scipy.signal test (#7397)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
Contributors:
@andfoy @asi1024 @emcastillo @ev-br @kmaehashi @leofang @Nordicus @Raghav323 @RisaKirisu @seberg @wstolp
v11.6.0
This is the release note of v11.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for the CuPy v11 series. Please start testing your workload with the v12 release candidate to get ready for the final v12 release. To install: pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Fixed Performance Issue with CUDA 12.0
This release fixes a critical performance regression under CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
Changes
Enhancements
- Use warp size from runtime.getDeviceProperties (#7353)
- Update DLPack to v0.8 to support bool arrays (#7376)
- Mark cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7379)
- Work around a potential OOM error raised by CUB histogram (#7388)
- Use NumPy 1.24 in CI and bump baseline API (#7423)
- Fix arange() to raise TypeError in boolean case (#7407)
Bug Fixes
- Fix kernel cache not working in CUDA 12.0 (#7348)
- Improve stability of orthogonalization step in cupyx.scipy.sparse.eigsh (#7361)
- Do not test NumPy version for private APIs (#7370)
Documentation
- Downgrade pydata-sphinx-theme to v0.11.0 (#7380)
Installation
- Bump version to v11.6.0 (#7435)
Tests
- CI: tentatively use SciPy 1.9 in Windows (#7336)
- CI: Add optuna 3.0 (#7337)
- Remove invalid pytest markers and turn on strict mode (#7354)
- Avoid int8 overflow warning in TestRoundHalfway (#7362)
- Filter SQLAlchemy 2.0 warnings raised from Optuna v2 (#7365)
- Add CI for CUDA 12.0 on Windows (#7371)
- Fix pre-commit configuration error (#7373)
- Avoid casting nan value to integer type in nanargmin/max tests (#7381)
- Avoid int8 overflow in some tests (#7382)
- Fix int8 overflow in vectorize tests (#7384)
- Fix sumprod test to avoid uint overflow (#7398)
- Avoid fillvalue overflow in cupyx.scipy.signal test (#7401)
- Fix ndarray.fill to raise ComplexWarning (#7408)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b3
This is the release note of v12.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CUDA 12 & H100 Support
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x --pre -f https://pip.cupy.dev/pre
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
NVTX3
NVTX support in CuPy is now backed by NVTX3 instead of the legacy NVTX1.
Changes
New Features
- Add cupyx.scipy.interpolate.make_interp_spline (#7195)
- Implementing RegularGridInterpolator and interpn from scipy.interpolate (#7197)
- Add PPoly to cupyx.scipy.interpolate (#7204)
- Add uniform() to random generator (#7205)
- Implement make_interp_spline(..., bc_type="periodic") (#7206)
- JIT: Enhance thrust functions coverage (#7233)
- Add CubicHermiteSpline to cupyx.scipy.interpolate (#7242)
Enhancements
- Conditionally change identifiers for ROCm (#7079)
- cupyx.scipy.sparse.linalg.spsolve: allow two-dimensional right-hand sides in A @ X = B (#7219)
- Support CUDA 12.0 (#7235)
- Extra fixes for CUDA 12.0 (#7236)
- Adding smaller eigenvalues option in cupyx.scipy.sparse.linalg.eigsh (#7269)
- Performance optimization of RegularGridInterpolator (#7275)
- Add function to diagnose Windows DLL load issue (#7279)
- Support NCCL 2.16 (#7283)
- Bump to cuTENSOR 1.6.2 (#7284)
- Support cuDNN 8.7 (#7285)
- Add cupy-cuda12x to cupy-wheel (#7300)
- Migrate to NVTX3 (#7304)
- Update for deprecations in NumPy 1.24 (#7245)
- Check if the slice does not have inhomogeneous shape before converting it to array (#7286)
- Update array_api (#7313)
Bug Fixes
- Fix interpreting Sparse init arguments (#7222)
- Fix race condition in Jitify (#7259)
- Support passing int as shape to broadcast_to (#7271)
- Update cuTENSOR installer for CUDA 12.x (#7298)
Documentation
- Bump docs requirements (#7247)
- Add explanation for JIT kernel. (#7252)
- Doc: Add interop example using raw pointers (#7278)
- Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7310)
Installation
- Bump version to v12.0.0b3 (#7323)
Tests
- CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7237)
- Skip tests if SciPy is unavailable (#7239)
- Fix CI failures related to cupyx.scipy.interpolate (#7262)
- Filter SQLAlchemy's warning on which optuna depends in test (#7276)
- Add CI for CUDA 12.0 (#7299)
- CI: Use NVTX1 in FlexCI image (#7311)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @ev-br @hubertlu-tw @ideasrule @kmaehashi @leofang @mandal-saswata @oishigyunyu @takagi
v11.5.0
This is the release note of v11.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CUDA 12 & H100 Support
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x
For aarch64:
$ pip install cupy-cuda12x -f https://pip.cupy.dev/aarch64
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
Changes
Enhancements
- Support CUDA 12.0 (#7238)
- Conditionally change identifiers for ROCm (#7244)
- Extra fixes for CUDA 12.0 (#7257)
- Support NCCL 2.16 (#7288)
- Bump to cuTENSOR 1.6.2 (#7290)
- Support cuDNN 8.7 (#7296)
- Lazy load dtypes deprecated in NumPy 1.24 (#7297)
- Add cupy-cuda12x to cupy-wheel (#7327)
- Update for deprecations in NumPy 1.24 (#7263)
- Update array_api (#7321)
Bug Fixes
- Fix interpreting Sparse init arguments (#7230)
- Fix race condition in Jitify (#7266)
- Support passing int as shape to broadcast_to (#7291)
- Update cuTENSOR installer for CUDA 12.x (#7301)
Documentation
- Bump docs requirements (#7258)
- Doc: Bump supported environments (CUDA 12 / cuDNN 8.7 / NCCL 2.16) (#7320)
Installation
- Bump version to v11.5.0 (#7324)
Tests
- CI: Support cuTENSOR 1.6.2 which defaults to CUDA 12 (#7241)
- Filter SQLAlchemy's warning on which optuna depends in test (#7277)
- Fix tests for NumPy 1.24 (c.f. #7286) (#7287)
- Add CI for CUDA 12.0 (#7317)
- CI: Use NVTX1 in FlexCI image (#7325)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b2
This is the release note of v12.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
More cupyx.scipy.interpolate APIs (#7086, #7190 and #7215)
Increased coverage of cupyx.scipy.interpolate APIs, which now includes BSpline, RBFInterpolator, splantider and splder.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
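A minimal sketch of the RBF interpolator, assuming the SciPy-compatible signature; the scattered data below is illustrative only.
import cupy
from cupyx.scipy.interpolate import RBFInterpolator

# 50 scattered 2-D observation points and their values (illustrative data).
points = cupy.random.uniform(-1.0, 1.0, size=(50, 2))
values = cupy.sum(points ** 2, axis=1)

rbf = RBFInterpolator(points, values)

queries = cupy.asarray([[0.0, 0.0], [0.5, -0.5]])
estimates = rbf(queries)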
Use CUB reduction classes in cupyx.jit (#7145)
Now it is possible to use the CUB reduction classes, cub::WarpReduce and cub::BlockReduce, in kernels written using CuPy JIT.
import cupy, cupyx
from cupy.cuda import runtime
from cupyx import jit

@jit.rawkernel()
def warp_reduce_sum(x, y):
    # Each block reduces one row of x within a single warp.
    WarpReduce = jit.cub.WarpReduce[cupy.int32]
    temp_storage = jit.shared_memory(
        dtype=WarpReduce.TempStorage, size=1)
    i, j = jit.blockIdx.x, jit.threadIdx.x
    value = x[i, j]
    aggregator = WarpReduce(temp_storage[0])
    aggregate = aggregator.Reduce(value, jit.cub.Sum())
    if j == 0:
        y[i] = aggregate

# Launch one block per row; the block size equals the warp size.
warp_size = 64 if runtime.is_hip else 32
h, w = (32, warp_size)
x = cupy.arange(h * w, dtype=cupy.int32).reshape(h, w)
cupy.random.shuffle(x)
y = cupy.zeros(h, dtype=cupy.int32)
warp_reduce_sum[h, w](x, y)
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
Changes
New Features
- Add 1-D BSpline to interpolate module (#7086)
- JIT: Support cub::WarpReduce and cub::BlockReduce (#7145)
- Add cupyx.scipy.interpolate.RBFInterpolator (#7190)
- Expose splder and splantider (#7215)
Enhancements
- Use cuSPARSE Generic API instead of older one documented to be removed (#7052)
- Improve _PerfCaseResult.to_str format (#7152)
Bug Fixes
- Split inputs to random routines (#7173)
- Fix 1-dim lexsort (#7178)
- Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7192)
- Fix wrong argument in warnings.warn() (#7194)
- Use list(kwargs) instead of list(kwargs.keys) (#7203)
- Fix cusparseSpSM compatibility (#7214)
- Remove scipy import (#7218)
- Use naive comb() for Python 3.7 (#7221)
Tests
- CI: Generate coverage count just after the parameter axis in table (#7175)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @emcastillo @ev-br @hadipash @jjmortensen @kmaehashi @takagi @TsutsuiMasayoshi
v11.4.0
This is the release note of v11.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Changes
Enhancements
- Use cuSPARSE Generic API instead of older one documented to be removed (#7209)
Bug Fixes
- Fix 1-dim lexsort (#7191)
- Fix cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7202)
- Split inputs to random routines (#7207)
- Use list(kwargs) instead of list(kwargs.keys) (#7213)
- Fix cusparseSpSM compatibility (#7220)
Tests
- CI: Generate coverage count just after the parameter axis in table (#7188)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0b1
This is the release note of v12.0.0b1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support for CUDA 11.8 & NVIDIA H100 GPUs
This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.
Support for Python 3.11
Wheels are now available for Python 3.11.
ufunc Methods
This release adds ufunc.reduce, ufunc.accumulate, ufunc.reduceat, and ufunc.at methods. See the documentation for more details.
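A minimal sketch of the new ufunc methods (the arrays below are arbitrary examples):
import cupy

a = cupy.arange(12).reshape(3, 4)

col_sums = cupy.add.reduce(a, axis=0)       # like a.sum(axis=0)
running = cupy.add.accumulate(a, axis=1)    # like cupy.cumsum(a, axis=1)
segments = cupy.add.reduceat(a.ravel(), cupy.asarray([0, 4, 8]))

b = cupy.zeros(5)
cupy.add.at(b, cupy.asarray([0, 0, 2]), 1)  # unbuffered in-place update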
Use Thrust in cupyx.jit (#7054, #7139)
Now it is possible to use the Thrust library device functions in kernels written using CuPy JIT.
import cupy, cupyx

@cupyx.jit.rawkernel()
def sort_by_key(x, y):
    # Each thread sorts one row of x, applying the same permutation to y.
    i = cupyx.jit.threadIdx.x
    x_array = x[i]
    y_array = y[i]
    cupyx.jit.thrust.sort_by_key(
        cupyx.jit.thrust.device,
        x_array.begin(),
        x_array.end(),
        y_array.begin(),
    )

h, w = (256, 256)
x = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(x)
x = x.reshape(h, w)
y = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(y)
y = y.reshape(h, w)
sort_by_key[1, 256](x, y)
Currently supported Thrust functions are count, copy, find, mismatch, sort, and sort_by_key.
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
Changes without compatibility
Deprecation of ndarray.scatter_{add,max,min} (#7097)
The cupy.ndarray.scatter_{add,max,min} methods are marked as deprecated. Use the corresponding ufunc methods (cupy.{add,maximum,minimum}.at) instead, as sketched below.
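A minimal migration sketch (the array and indices are illustrative):
import cupy

a = cupy.zeros(5)
idx = cupy.asarray([0, 1, 1, 3])

# Deprecated: a.scatter_add(idx, 1)
# Preferred ufunc method:
cupy.add.at(a, idx, 1)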
CUDA library wrappers now live in cupyx (#7013)
Previously, CuPy provided high-level wrappers for CUDA libraries as cupy.cudnn, cupy.cusolver, cupy.cusparse, and cupy.cutensor. These modules have now moved to cupyx as part of the cupy namespace cleanup. The old modules are still available but marked as deprecated. Note that these modules are still undocumented and may be subject to change.
Changes
New Features
- Add axis to cupy.logspace (#6797)
- Support thrust::count, device in CuPy JIT (#7054)
- Add cupy.ndarray.searchsorted (#7059)
- Support add.at, maximum.at, minimum.at (#7077)
- Add pdist implementation to distance functions (#7078)
- Support subtract.at, bitwise_and.at, bitwise_or.at, bitwise_xor.at (#7099)
- Add ufunc.reduce and ufunc.accumulate (#7105)
- Add cupy.add.reduceat (#7115)
- Implement cupy.min_scalar_type (#7136)
- JIT: Support more thrust functions (#7139)
Enhancements
- Move cupy.cudnn, cupy.cusolver, cupy.cutensor, cupy.cusparse to cupyx (#7013)
- Allow randint to support array bounds (#7051)
- Deprecate ndarray.scatter_{add, max, min} (#7097)
- Support CUDA 11.8 H100 GPUs (#7100)
- Support CUDA 11.8 (#7117)
- Add CUDA 11.8 on documents (#7119)
- Fix compile error from inf/nan in cupy.fuse (#7122)
- Raise TypeError instead of ValueError in cupy.from_dlpack when CPU tensor is passed (#7133)
- Support NCCL 2.15 (#7153)
- Support Python 3.11 (#7159)
- Fix indexing sparse matrix with empty index arguments (#7143)
Bug Fixes
- Make sure that cupy (array-api) Array objects can be composed using asarray (#6874)
- Don't use __del__ in TCPStore (#6989)
- JIT: Fix compile error for op.routine including in0_type (#7076)
- Fix cupy.nansum in fusing (#7102)
- Fusion TypeError in cupy._core.fusion._call_ufunc() (#7113)
- Fix a typo (#7163)
- JIT: Fix compile error of minmax function (#7167)
Code Fixes
Documentation
- Docs: Add missing functions (#7103)
- Docs: ufunc methods (#7104)
- Improve benchmark documentation (#7176)
Installation
- Bump version to v12.0.0b1 (#7181)
Examples
Tests
- CI: Add ROCm 5.3 (#7124)
- CI: Allow /test jenkins to trigger Jenkins only (#7126)
- Install zlib for CUDA 11.8 Windows CI (#7137)
- CI: improve use of cache in GitHub Actions (#7141)
- Fix for pytest 7.2 (#7147)
- CI: Add support for the latest FlexCI Windows image (#7161)
- JIT: Skip HIP thrust::sort test (#7162)
- CI: use pre-commit in GitHub Actions (#7123)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @Diwakar-Gupta @emcastillo @IncubatorShokuhou @kmaehashi @MarcoGorelli @takagi @TsutsuiMasayoshi