Releases: cupy/cupy
v10.4.0
This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Announcements
Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)
We have added a new package to PyPI called cupy-wheel. This meta package allows other libraries to add a dependency on CuPy with the ability to transparently install the exact CuPy binary wheel matching the user environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.
pip install cupy-wheel
This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.
This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Suggestions and comments are very welcome (please visit #6688).
Changes
Enhancements
- Add missing cudaDevAttrMemoryPoolsSupported to hip (#6626)
- Add CC 3.2 to Tegra arch list (#6647)
- Add a few driver/runtime/nvrtc API wrappers (#6651)
Bug Fixes
- Define float16::operator-() only for ROCm 5.0+ (#6629)
- JIT: fix access to cached codes (#6642)
- [v10] Fix Mempool attr for Cuda Python (#6654)
- Fix int64 overflow in cupy.polyval (#6666)
Documentation
- Documentation update for ROCm 5.0 (#6607)
- Add --pre option to instructions installing pre-releases (#6614)
- Fix typo in performance guide (#6659)
- JIT: fix function signatures in the docs (#6660)
Installation
- Add universal CuPy package (#6683)
Tests
- Remove jenkins requirements (#6634)
- CI: Trigger FlexCI for hotfix branches (#6636)
- Fix TestIncludesCompileCUDA for HEAD tests (#6650)
- Trigger CUDA Python tests with /test mini (#6655)
- Fix missing f prefix on f-strings fix (#6679)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi
v10.3.1
This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.
This is a hot-fix release for v10.3.0 which contained a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).
Changes
Bug Fixes
- Define float16::operator-() only for ROCm 5.0+ (#6630)
Installation
- Bump version to v10.3.1 (#6633)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v11.0.0b1
This is the release note of v11.0.0b1. See here for the complete list of solved issues and merged PRs.
Notice (2022-04-05)
We have identified that this release contains a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier). We are planning to fix this issue in the next pre-release. See #6615 for the details.
Highlights
Increase coverage of cupyx.scipy.special APIs (#6461, #6582, #6571)
A series of scipy.special routines have been added to cupyx with optimized CUDA raw kernel implementations. loggamma, multigammaln, fast Hankel transformations and several other utility special functions were added in this series of PRs by @grlee77 and @khushi-411.
Support for CUDA 11.6
Full support for CUDA 11.6 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda116 -f https://pip.cupy.dev/pre
Support for ROCm 5.0
Full support for ROCm 5.0 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-rocm-5-0 -f https://pip.cupy.dev/pre
Changes without compatibility
Use CUB by default (#6549)
CUB support in CuPy is now enabled by default. This speeds up general reductions and routines such as sum, argmax, and argmin. Note that CUB may introduce some non-deterministic behavior; it can be disabled by setting the CUPY_ACCELERATORS="" environment variable.
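As a minimal sketch of opting out from Python, the variable can be set before CuPy is imported. This assumes the value is read when CuPy is imported; the NumPy fallback is not part of this release and is only there so the snippet also runs on a machine without CuPy or a usable GPU.

```python
import os

# Assumption: CUPY_ACCELERATORS is read at import time, so set it first.
# An empty string means "no accelerators", which disables CUB.
os.environ["CUPY_ACCELERATORS"] = ""

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # illustrative fallback, not part of CuPy

x = xp.arange(10, dtype=xp.float32)
total = float(x.sum())  # a general reduction, computed without CUB
print(total)
```

Setting the variable in the shell that launches the process has the same effect and avoids depending on import order.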
Drop support for ROCm 4.0 (#6420)
CuPy v11 will drop support for ROCm 4.0. We recommend users to use ROCm 4.3 or 5.0 instead.
Changes
New Features
- Add cupyx.scipy.special statistical distributions (#6461)
- Add cupy.real_if_close API (#6475)
- Add cupyx.scipy.special loggamma, multigammaln and fast Hankel transforms (#6528)
- Add cupyx.scipy.special.{i0e, i1e} (#6571)
Enhancements
- Update cupy.array_api (#6486)
- Fix for supporting ROCm 5.0 (#6524)
- Use CUB by default (#6549)
- Fix cupy.copyto to take NumPy array scalars (#6584)
- Implement ndarray.ravel(order="K") (#6585)
- Make einsum accept subscripts in numpy int (#6506)
Performance Improvements
- Support cusparseSpGEMM() (#6511)
- eigsh: Prefer gemv over gemm (#6570)
- Performance improvement of cupy.in1d (#6583)
Bug Fixes
- Fix cupy.fill to properly take zero-dim cupy.ndarray (#6481)
- Fix error message in vectorize (#6499)
- Fix cupy.cumsum on ROCm 5.0 (#6520)
- Fix coo_matrix.diagonal (#6522)
- Fix array creation shape (#6545)
- Fix out args parser of ufunc (#6546)
- Fix may_share_memory algorithm (#6560)
- Avoid using the same kernel from different devices in JIT (#6575)
- Fix cupy.full and cupy.full_like to make unsafe casting (#6587)
- Fix device context management in MemoryAsyncPool (#6590)
Code Fixes
Documentation
- Fix documents for CUDA 11.6 (#6405)
- Remove description about issues from contribution guide (#6497)
- Documentation update for ROCm 5.0 (#6530)
Installation
- Skip appending --compiler-bindir if cl.exe is already on PATH (#6510)
- Bump version to v11.0.0b1 (#6601)
Tests
- Add FlexCI projects for Windows (#5889)
- Run cupy-benchmark on CI (#6417)
- Disable CentOS 8 test (#6492)
- Fix Dockerfile broken for array-api tests (#6508)
- CI: Trigger push event of FlexCI via GitHub Actions (#6538)
- Skip async_malloc tests on unsupported device (#6541)
- Fix flaky test_inverse_indices_shape (#6551)
- Trigger CUDA 11.6 Windows CI when push/pull-request (#6553)
- CI: Fix event name in dispatcher (#6555)
- CI: Fix rule name in dispatcher (#6556)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @asi1024 @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @Onkar627 @peterbell10 @pri1311 @Smit-create @takagi @toslunar @tushxr16
v10.3.0
This is the release note of v10.3.0. See here for the complete list of solved issues and merged PRs.
Notice (2022-04-08)
We have published a hot-fix release v10.3.1 which addresses a regression that prevents CuPy from working on older CUDA GPUs (Maxwell or earlier).
Highlights
Support for CUDA 11.6
Full support for CUDA 11.6 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-cuda116
Support for ROCm 5.0
Full support for ROCm 5.0 has been added as of this release. Binary packages are available in PyPI and can be installed with the following command: pip install cupy-rocm-5-0
Changes
Enhancements
- Support ROCm 5.0 (#6496)
- Support cuSPARSELt 0.2.0 (repost) (#6507)
- Update cupy.array_api (#6550)
- Fix cupy.copyto to take NumPy array scalars (#6593)
- Fix for supporting ROCm 5.0 (#6599)
- Make einsum accept subscripts in numpy int (#6516)
Bug Fixes
- Fix error message in vectorize (#6515)
- Fix cupy.cumsum on ROCm 5.0 (#6525)
- Fix coo_matrix.diagonal (#6533)
- Fix out args parser of ufunc (#6547)
- Fix cupy.fill to properly take zero-dim cupy.ndarray (#6548)
- Fix cuSPARSELt 0.1.0 support in v10 (#6563)
- Fix may_share_memory algorithm (#6565)
- Avoid using the same kernel from different devices in JIT (#6581)
- Fix array creation shape (#6592)
- Fix cupy.full and cupy.full_like to make unsafe casting (#6595)
- Fix device context management in MemoryAsyncPool (#6596)
Code Fixes
- mypy: array_api (#6552)
Documentation
Installation
- Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6488)
- Skip appending --compiler-bindir if cl.exe is already on PATH (#6514)
- Bump version to v10.3.0 (#6602)
Tests
- Ignore warnings from Optuna 3.0 pre-releases (#6490)
- Disable CentOS 8 test (#6519)
- Add FlexCI projects for Windows (#6540)
- Skip async_malloc tests on unsupported device (#6544)
- CI: Trigger push event of FlexCI via GitHub Actions (#6554)
- CI: regenerate matrix (#6557)
- CI: Fix rule name in dispatcher (#6558)
- CI: Fix event name in dispatcher (#6559)
- Fix flaky test_inverse_indices_shape (#6573)
- Trigger CUDA 11.6 Windows CI when push/pull-request (#6578)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @asi1024 @kmaehashi @leofang @Onkar627 @takagi @toslunar @tushxr16
v11.0.0a2
This is the release note of v11.0.0a2. See here for the complete list of solved issues and merged PRs.
Highlights
Improved NumPy functions coverage (#6078)
A series of NumPy routines has been proposed as a good-first-issue and, as a result, an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented issues is available at #6078.
Initial support for cupy.typing (#6251)
An API equivalent to numpy.typing has been added to allow type annotations in CuPy and in user code.
Support for CUDA 11.6 (#6349)
Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.
Support for ROCm 5.0 (#6466)
Initial support for ROCm 5.0 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.
Changes without compatibility
Drop support for ROCm 4.0 (#6420)
CuPy v11 will drop support for ROCm 4.0. We recommend users to use ROCm 4.2/4.3 instead.
Changes
New Features
- Add cupy.isneginf and cupy.isposinf (#6089)
- Add cupy.typing (#6251)
- Add asarray_chkfinite API. (#6275)
- Add Box-Cox transformations to cupyx.scipy.special (#6302)
- Use CUDA's log1p for cupyx.scipy.special.log1p (#6315)
- Add special functions from the CUDA Math API (#6317)
- Add beta functions to cupyx.scipy.special (#6318)
- Add cupy.union1d API. (#6357)
- Add cupy.float_power (#6371)
- Add cupy.intersect1d API. (#6402)
- Add cupy.setdiff1d api. (#6433)
- Add cupy.format_float_scientific API (#6474)
Enhancements
- First step of mypy introduction (#4955)
- Fix CI failure to support SciPy 1.8.0 (#6249)
- implement overwrite_input in cupy.{percentile,quantile} (#6298)
- avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6321)
- Support NumPy 1.22 (#6323)
- Remove batched QR solver's experimental mark (#6327)
- Make scipy.special ufuncs work with CuPy inputs (#6341)
- Fix thrust related build issue with CUDA 11.6 (#6346)
- Support CUDA 11.6 (#6349)
- Fix CI failure to support SciPy 1.8.0 (#6362)
- Fix type annotations in installer (#6382)
- Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6414)
- Bump Jitify version to fix memory leak (#6430)
- Support cuSPARSELt 0.2.0 (repost) (#6436)
- Support ROCm 5.0 (#6466)
- Warn if unexpectedly failed to detect device count in cupy.show_config() (#6472)
- Fix verbose LOBPCG for SciPy 1.8 (#6388)
Performance Improvements
- Reduce memory usage in cupy.sort (#6392)
Bug Fixes
- Fix JIT to support notebook environment (#6329)
- Fix cupyx.ndimage.spline_filter1d for HIP (#6406)
- Fix cupy.nan_to_num (#6408)
- Fix cupyx.special.gammainc, lpmv and sph_harm for hip (#6409)
- Fix boolean views for HIP (#6412)
- Fix reduction contiguous size calculation (#6457)
Code Fixes
- Remove global use_hip flag in setup (#6391)
- Hide private names in cupyx.scipy.linalg (#6449)
- Hide private names in cupyx.scipy.ndimage (#6450)
- Hide private names in cupyx.scipy.signal (#6451)
- Hide private names in cupyx.scipy.sparse (#6454)
- Hide private names in cupyx.scipy.stats (#6456)
Documentation
- Use cupy.__version__ instead of pkg_resources (#6332)
- Tentatively pin intersphinx to SciPy 1.7.1 docs (#6440)
- Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6479)
Installation
- Avoid monkeypatching distutils (#6273)
- Eliminate unnecessary configuration pass in setup (#6389)
- Remove CUPY_SETUP_ENABLE_THRUST=0 environment variable (#6390)
- Drop support for ROCm 4.0 (#6420)
- Bump version to v11.0.0a2 (#6501)
Tests
- CI: allow discarding docker image cache manually (#6269)
- Add slow tests for stable branch (#6340)
- Parameterize library installer tests (#6343)
- Fix tests for eigh() for CUDA 11.6 (#6347)
- Avoid empty notification message for scheduled tests (#6363)
- Support SciPy 1.8 (#6365)
- Add cupy.testing.installed (#6381)
- Mark XFAIL for SciPy 1.8 release candidate (#6385)
- CI: Bump ROCm version from 4.3 to 4.3.1 (#6415)
- CI: build docs in parallel (#6416)
- CI: Add HEAD tests for stable branch (#6423)
- CI: Use default schema/matrix path in generate.py (#6424)
- Skip hfft related tests in HIP (#6427)
- CI: Manage test tags in yaml (#6429)
- CI: coverage in reST (#6445)
- CI: fix NCCL 2.10 unit test not covered (#6448)
- CI: Fix CUDA 11.6 driver update steps (#6467)
- Ignore warnings from Optuna 3.0 pre-releases (#6470)
- Fix failing tests in ROCm (#6482)
Others
- CI: allow specifying special skip tag (#6468)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@amanchhaparia @anaruse @asi1024 @emcastillo @grlee77 @IvanYashchuk @khushi-411 @kmaehashi @pri1311 @saswatpp @takagi
v10.2.0
This is the release note of v10.2.0. See here for the complete list of solved issues and merged PRs.
Highlights
Support for CUDA 11.6 (#6349)
Initial support for CUDA 11.6 has been added as of this release. However, binary wheels are not yet distributed, and users are expected to build CuPy from source in the meantime.
Changes
Enhancements
- Support cuDNN 8.3.2 (#6328)
- Support cuTENSOR 1.4.0 (#6330)
- Support CUDA 11.5.1 (#6331)
- Support NumPy 1.22 (#6354)
- avoid DeprecationWarning from SciPy 1.8 (cupyx.scipy.sparse) (#6379)
- Fix thrust related build issue with CUDA 11.6 (#6386)
- Fix type annotations in installer (#6395)
- Support CUDA 11.6 (#6422)
- Bump Jitify version to fix memory leak (#6432)
- Add __cupy_get_ndarray__ dunder method to transform objects to arrays (#6465)
- Warn if unexpectedly failed to detect device count in cupy.show_config() (#6476)
- Fix verbose LOBPCG for SciPy 1.8 (#6394)
Bug Fixes
- Fix JIT to support notebook environment (#6356)
- Fix cuDNN installer not working (#6368)
- Fix cupyx.ndimage.spline_filter1d for HIP (#6411)
- Fix boolean views for HIP (#6418)
- Fix cupy.nan_to_num (#6431)
- Fix reduction contiguous size calculation (#6464)
Code Fixes
- Remove global use_hip flag in setup (#6398)
Documentation
- Use cupy.__version__ instead of pkg_resources (#6380)
- Tentatively pin intersphinx to SciPy 1.7.1 docs (#6442)
- Revert "Tentatively pin intersphinx to SciPy 1.7.1 docs" (#6480)
Installation
- Fix for cuDNN directory structure in Windows (#6369)
- Install lib directory on Windows in cuDNN installer (#6370)
- Avoid monkeypatching distutils (#6373)
- Eliminate unnecessary configuration pass in setup (#6399)
- Bump version to v10.2.0 (#6502)
Tests
- CI: use CUDA docker images for CUDA Python CI (#6338)
- Avoid empty notification message for scheduled tests (#6364)
- CI: allow discarding docker image cache manually (#6372)
- Parameterize library installer tests (#6374)
- Fix tests for eigh() for CUDA 11.6 (#6376)
- Add cupy.testing.installed (#6387)
- Mark XFAIL for SciPy 1.8 release candidate (#6396)
- CI: build docs in parallel (#6419)
- CI: Bump ROCm version from 4.3 to 4.3.1 (#6421)
- CI: Use default schema/matrix path in generate.py (#6428)
- CI: Manage test tags in yaml (#6441)
- Support SciPy 1.8 (#6444)
- CI: coverage in reST (#6447)
- CI: fix NCCL 2.10 unit test not covered (#6452)
- Skip hfft related tests in HIP (#6458)
- CI: Fix CUDA 11.6 driver update steps (#6471)
Others
- CI: allow specifying special skip tag (#6477)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v11.0.0a1
This is the release note of v11.0.0a1. See here for the complete list of solved issues and merged PRs.
Highlights
Improved NumPy functions coverage (#6078)
A series of NumPy routines has been proposed as a good-first-issue and, as a result, an increasing number of contributors have sent pull requests to help increase the number of available APIs. An issue tracker with the currently implemented issues is available at #6078.
Add cupyx.scipy.special functions (#5687)
Spherical harmonics, Legendre and Gamma functions are implemented using highly performant specific CUDA kernels. Thanks to @grlee77!
Initial support for CUDA Graph API by means of stream capture API (#4567)
This PR adds the ability to use the CUDA Graph API to greatly reduce the overhead of kernel launching. This is done by using the stream capture API; an example follows.
Thanks to @leofang!
import cupy as cp
import numpy as np  # needed for the np.int32 dtype below
a = cp.random.randint(0, 10, 100, dtype=np.int32)
s = cp.cuda.Stream(non_blocking=True)
with s:
    s.begin_capture()
    a += 3
    a = cp.abs(a)
    g = s.end_capture()  # work is queued, but not yet launched
    g.launch()
    s.synchronize()
Support __device__ function in CuPy JIT (#6265)
The new interface cupyx.jit.rawkernel(device=True) is supported to define a CUDA device function.
from cupyx import jit
@jit.rawkernel(device=True)
def getitem(x, tid):
    return x[tid]
@jit.rawkernel()
def elementwise_copy(x, y):
    tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
    y[tid] = getitem(x, tid)
The following CUDA code is generated from the above Python code.
__device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
    return x[tid];
}
extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
    unsigned int tid;
    tid = (threadIdx.x + (blockDim.x * blockIdx.x));
    y[tid] = getitem_1(x, tid);
}
Changes
New Features
- Support stream capture (#4567)
- Add additional special functions (spherical harmonics, Legendre, Gamma functions) (#5687)
- Add cupy.asfarray (#6085)
- Add cupy.trapz (#6107)
- Add cupy.array_api.linalg (#6131)
- Add cupy.mask_indices (#6156)
- Add cupy.array_equiv API. (#6254)
- Add cupy.cublas.syrk and cupy.cublas.sbmv (#6278)
- Add cupy.vander API. (#6279)
- Add cupy.ediff1d API. (#6280)
- Add cupy.fabs API. (#6282)
- Add discrete cosine and sine transforms to cupyx.scipy.fft (#6288)
- Add logit, expit and log_expit to cupyx.scipy.special (#6300)
- Add xlogy and xlog1py to cupyx.scipy.special (#6301)
- Add tril_indices and tril_indices_from API. (#6305)
- Add cupy.format_float_positional (#6308)
- Add cupy.row_stack API. (#6312)
- Add triu_indices and triu_indices_from API. (#6316)
Enhancements
- Raise better message when importing CPU array via DLPack (#6051)
- Borrow more non-GPU APIs from NumPy (#6074)
- Add more aliases for compatibility with NumPy (#6075)
- Import more dtype aliases from NumPy (#6076)
- Borrow indexing APIs from NumPy (#6077)
- Apply upstream patch to cupy.array_api (#6086)
- Compile cub/thrust with no unique symbol (#6106)
- Support cuDNN 8.3.0 (#6108)
- Support all advanced indexing (#6127)
- Support CUDA 11.5.1 (#6166)
- Support lambda function in cupy.vectorize (#6170)
- Support eigenvalue solver 64bit API (#6178)
- Support cuTENSOR 1.4.0 (#6187)
- Make matmul support ufunc kwargs (#6195)
- Alias NumPy error classes (#6212)
- Support comparison to None and Ellipsis (#6222)
- JIT: Fix if expr typing rule (#6234)
- Support comparison with more objects (#6250)
- JIT: Support __device__ function (#6265)
- More clear warning message (#6283)
- Make streams hashable (#6285)
- Check isinstance before comparison in __eq__ (#6287)
- Support cuDNN 8.3.2 (#6314)
- Deprecate MachAr (support NumPy 1.22) (#6188)
- Fix cupy.linalg.qr to align with NumPy 1.22 (#6225)
- Change a parameter name in percentile and quantile to support NumPy 1.22 (#6228)
Performance Improvements
Bug Fixes
- Detect repeated axis in reduction (#5964)
- Fix __all__ in cupyx.scipy.fft (#6071)
- Fix __getitem__ on Ellipsis and advanced indexing dimension (#6081)
- Allow leading unit dimensions in copy source (#6118)
- Always test broadcast in copyto (#6121)
- Fix overloading ambiguity in ndimage filters (#6162)
- Fix empty Cholesky (#6164)
- Fix empty solve (#6167)
- Allow flip ()-shaped array (#6169)
- Handles infinities of the same sign in logaddexp and logaddexp2 (#6172)
- Fix #4675 on resolving TODO in #4198 (#6197)
- Eigenvalue solver 64bit API on CUDA 11.1 (#6201)
- Fix edge case compatibility in cupy.eye() (#6208)
- Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6210)
- Fix overlapping out in matmul and (tensor)dot (#6216)
- Fix compile_with_cache returning None (#6232)
- Fixing index calculation for random constructor (#6257)
- BUG: Fix the .T attribute in the array_api namespace (#6289)
- Fix stream capture in ROCm (#6296)
- Fix cuDNN installer not working (#6337)
Code Fixes
- Remove __all__ from cupyx/scipy/* (#6149)
- Delete from os import path (#6152)
- Remove legacy cp.linalg.solve() implementation (#6161)
Documentation
- Add link to compatibility matrix (#6055)
- Update upgrade guide (#6058)
- Add v11 to compatibility matrix (#6067)
- Exclude kernel_version from comparison table (#6072)
- Doc: Add more footnotes to comparison table (#6073)
- Add polynomial modules to comparison table (#6082)
- Add CITATION.bib and update README (#6091)
- Remove LLVM_PATH note on document (#6093)
- Docs: Update linkcode implementation (#6126)
- Update footnotes in comparison table (#6142)
- Update conda-forge installation guide (#6186)
- Revise Overview for CuPy v10 (#6209)
- Docs: CentOS installation from source (#6218)
- Fix cupy.trapz docstring (#6239)
- Fix eigsh doc (#6266)
- Add cupy.positive in API Reference (#6274)
Installation
- Replace distutils with setuptools in Windows cl.exe detection (#6025)
- Fix for cuDNN directory structure in Windows (#6342)
Tests
- Fix testing.multi_gpu to add pytest marker (#6015)
- CI: add link to ROCm projects in CI coverage matrix (#6037)
- CI: use separate project for multi-GPU tests (#6050)
- Fix CI result notification message format (#6066)
- Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6084)
- Workaround DeprecationWarning raised from pkg_resources (#6094)
- Fix missing multi_gpu annotation in tests (#6098)
- Fix exception handling in cupyx.distributed (#6114)
- Improve FlexCI test scripts (#6117)
- CI: Add timeout to show_config (#6120)
- Trigger FlexCI from GitHub Actions (#6130)
- CI: Fix package override sometimes fails in CentOS (#6141)
- CI: Need to update CUDA driver in cuda115.multi (#6144)
- Add tests for convolve2d (#6171)
- CI: Update limits to reduce cache size (#6174)
- CI: Fix unquoted specifiers (#6175)
- Support pre-release NumPy version in tests (#6190)
- Remove XFAIL for XPASS tests on ROCm (#6259)
- Tentatively pin to setuptools<60 in Windows CI (#6260)
- Fix cache key for github actions (#6281)
- Use NVIDIA docker images for CUDA 11.5 (#6303)
- Tentatively pin to CUDA Driver 495 (#6310)
- Remove unused dtype parameterizing in tril_indices test (#6322)
- Use get_include instead of array_equiv for fallback test (#6333)
- CI: Add cuda-slow test in FlexCI (#6335)
- CI: use CUDA docker images for CUDA Python CI (#6336)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@akochepasov @amanchhaparia @asi1024 @ColmTalbot @emcastillo @eternalphane @grlee77 @haesleinhuepf @khushi-411 @kmaehashi @leofang @okuta @ptim0626 @SauravMaheshkar @shwina @takagi @thomasjpfan @tom24d @toslunar @twmht @WiseroOrb @Yutaro-Sanada
v10.1.0
This is the release note of v10.1.0. See here for the complete list of solved issues and merged PRs.
Changes
Enhancements
Bug Fixes
- Fix edge case compatibility in cupy.eye() (#6213)
- Fix compile_with_cache returning None (#6236)
- Allow flip ()-shaped array (#6237)
- Fix linalg.eigh and linalg.eigvalsh on empty inputs (#6238)
- Fix overloading ambiguity in ndimage filters (#6242)
- Fixing index calculation for random constructor (#6267)
- BUG: Fix the .T attribute in the array_api namespace (#6291)
Code Fixes
- Remove legacy cp.linalg.solve() implementation (#6235)
Documentation
- Docs: CentOS installation from source (#6230)
- Add cupy.positive in API Reference (#6276)
- Fix eigsh doc (#6292)
Tests
- Add tests for convolve2d (#6194)
- Change a parameter name in percentile and quantile to support NumPy 1.22 (#6247)
- Tentatively pin to setuptools<60 in Windows CI (#6270)
- Fix cache key for github actions (#6286)
- Remove XFAIL for XPASS tests on ROCm (#6297)
- Use NVIDIA docker images for CUDA 11.5 (#6304)
- Tentatively pin to CUDA Driver 495 (#6311)
- CI: Add cuda-slow test in FlexCI (#6339)
Others
- Bump version to v10.1.0 (#6345)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @kmaehashi @leofang @ptim0626 @SauravMaheshkar @takagi @thomasjpfan @toslunar @WiseroOrb
v10.0.0
This is the release note of v10.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since v10.0.0rc1 release. Check out our blog for highlights in the v10 release!
Highlights
Support all advanced indexing (#6196)
The support for advanced indexing using boolean masks has been completed in CuPy v10.
Now it is possible to index arrays using combinations of Ellipsis, boolean flags and regular indexes such as a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]] and a[..., [[False, True]]].
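To make the two index expressions above concrete, here is a small sketch that applies them to arrays whose shapes fit the indices. The array shapes and the NumPy fallback are assumptions chosen for illustration; CuPy v10 is expected to follow NumPy's results for these expressions.

```python
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    import numpy as xp  # illustrative fallback with the same indexing rules

a = xp.arange(12).reshape(3, 4)
# Integer index array of shape (2, 3) on axis 0, boolean mask of length 4 on axis 1.
# The mask selects columns 0, 2 and 3, which broadcast against the row indices.
first = a[[[1, 1, -3], [0, 2, 2]], [True, False, True, True]]

b = xp.arange(6).reshape(3, 1, 2)
# Ellipsis keeps the leading axis; the (1, 2) boolean mask consumes the last two axes.
second = b[..., [[False, True]]]

print(first.shape, first.tolist())
print(second.shape, second.tolist())
```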
Support lambda functions in cupy.vectorize (#6217)
A long-awaited feature to ensure compatibility with NumPy vectorize has been implemented. In this release, it is now possible to transpile lambda functions. This is especially handy when using JIT in conjunction with cupy.vectorize:
import cupy
a = cupy.array([0.4, -0.2, 1.8, -1.2])
relu = cupy.vectorize(lambda x: (x > 0.0) * x)
print(relu(a)) # [ 0.4 -0. 1.8 -0. ]
Announcements
Drop support for CUDA 10.1 or earlier (#5770)
As per the RFC in #5717 and Twitter, the minimum CUDA version that is supported by CuPy v10 is CUDA 10.2.
Drop support for NCCL 2.6 and 2.7 (#5855)
The minimum supported version for CuPy v10 is NCCL 2.8 as it implements the required primitives for cupyx.distributed to work.
Drop support for Python 3.6 (#5771)
Following the Python 3.6 sunset in December 2021 and the compatibility lines with NumPy, Python 3.6 will no longer be supported starting with CuPy v10.
Drop support for NumPy 1.17 (#5857)
As per NEP29, NumPy 1.17 support has been dropped on July 26, 2021.
Changes
New Features
- Add cupy.array_api.linalg (#6199)
Enhancements
- Add more aliases for compatibility with NumPy (#6080)
- Raise better message when importing CPU array via DLPack (#6097)
- Apply upstream patch to cupy.array_api (#6105)
- Borrow more non-GPU APIs from NumPy (#6109)
- Import more dtype aliases from NumPy (#6110)
- Borrow indexing APIs from NumPy (#6111)
- Compile cub/thrust with no unique symbol (#6140)
- Support cuDNN 8.3.0 (#6150)
- Support eigenvalue solver 64bit API (#6192)
- Support all advanced indexing (#6196)
- Support lambda functions in cupy.vectorize (#6217)
- Deprecate MachAr (support NumPy 1.22) (#6189)
Performance Improvements
- Avoid 64bit division to reduce register consumption (#6102)
Bug Fixes
- Fix __all__ in cupyx.scipy.fft (#6083)
- Detect repeated axis in reduction (#6103)
- Fix __getitem__ on Ellipsis and advanced indexing dimension (#6113)
- Allow leading unit dimensions in copy source (#6153)
- Always test broadcast in copyto (#6155)
- Handles infinities of the same sign in logaddexp and logaddexp2 (#6176)
- Fix empty solve (#6183)
- Fix empty Cholesky (#6184)
- Fix #4675 on resolving TODO in #4198 (#6204)
- Eigenvalue solver 64bit API on CUDA 11.1 (#6220)
Code Fixes
- Avoid from os import path (#6165)
Documentation
- Update stable branch (#6065)
- Update labels of Docs column (#6068)
- Add more footnotes to comparison table (#6079)
- Exclude kernel_version from comparison table (#6090)
- Remove LLVM_PATH note on document (#6101)
- Add polynomial modules to comparison table (#6122)
- Add link to compatibility matrix (#6135)
- Update footnotes in comparison table (#6143)
- Update conda-forge installation guide (#6200)
- Update upgrade guide (#6203)
- Update linkcode implementation (#6206)
- Revise Overview for CuPy v10 (#6215)
Installation
- Replace distutils with setuptools in Windows cl.exe detection (#6138)
- Bump version to v10.0.0 (#6224)
Tests
- Fix CI cannot override cuSPARSELt/cuTENSOR version preinstalled (#6087)
- Workaround DeprecationWarning raised from pkg_resources (#6095)
- Fix testing.multi_gpu to add pytest marker (#6096)
- Fix missing multi_gpu annotation in tests (#6100)
- Fix exception handling in cupyx.distributed (#6116)
- Improve FlexCI test scripts (#6119)
- Fix CI result notification message format (#6124)
- CI: Add timeout to show_config (#6132)
- CI: use separate project for multi-GPU tests (#6145)
- CI: Need to update CUDA driver in cuda115.multi (#6146)
- CI: Fix package override sometimes fails in CentOS (#6147)
- CI: add link to ROCm projects in CI coverage matrix (#6148)
- CI: Fix unquoted specifiers (#6182)
- CI: Update limits to reduce cache size (#6185)
- Trigger FlexCI from GitHub Actions (#6191)
- Support pre-release NumPy version in tests (#6193)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar @twmht @Yutaro-Sanada
v10.0.0rc1
This is the release note of v10.0.0rc1. See here for the complete list of solved issues and merged PRs.
Highlights
Add cupyx.distributed (#5590)
This new version provides a wrapper over NVIDIA’s NCCL library to perform communication in an MPI-like style. Currently, point-to-point and collective communication primitives are supported. Check the documentation for a complete reference of the functions.
CuPy now supports CUDA 11.5, Python 3.10, and NVIDIA Jetson
Wheels for CUDA 11.5 (cupy-cuda115) are now available.
Python 3.10 wheels are also available for all supported CUDA / ROCm versions.
Wheels for Jetson can be found in the attached artifacts (pip install cupy-cuda112 -f https://pip.cupy.dev/pre).
Enable Generator random API in ROCm 4.3 (#5895)
ROCm 4.3 fixes a series of issues that prevented the Generator random API (#4177) from running on AMD devices.
Changes without compatibility
Refer to the Upgrade Guide for the detailed description.
Automatically enable peer access (#5496)
Peer access is enabled by default when a CuPy ndarray is stored on a different device, as long as the machine topology allows it.
Change Device.use() semantics to align with Stream.use() (#5853)
When exiting a context, the current device is now reverted back to the device of the parent's context scope, not the device last use()d.
Automatically convert big-endian `numpy.ndarray` to little-endian in `cupy.array()` and its variants (#5828)
Previously, CuPy copied the given `numpy.ndarray` to the GPU as-is, regardless of its endianness. In CuPy v10, big-endian arrays are converted to little-endian, the native byte order on GPUs, before the transfer. This change eliminates the need to manually change the array endianness before creating the CuPy array.
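To see why the byte order matters, the sketch below uses only the standard library `struct` module (not CuPy) to show what happens when big-endian bytes are read by a little-endian consumer such as a GPU, and what the conversion amounts to. The values and variable names are purely illustrative.

```python
import struct

values = (0, 1, 2, 3)

# Four 32-bit integers serialized in big-endian byte order.
big_endian_bytes = struct.pack(">4i", *values)

# Copying the bytes as-is and reading them as little-endian scrambles the data.
misread = struct.unpack("<4i", big_endian_bytes)

# Converting first: decode as big-endian, then re-encode as little-endian.
little_endian_bytes = struct.pack("<4i", *struct.unpack(">4i", big_endian_bytes))
converted = struct.unpack("<4i", little_endian_bytes)

print(misread)    # (0, 16777216, 33554432, 50331648)
print(converted)  # (0, 1, 2, 3)
```

According to the description above, CuPy v10 performs the equivalent conversion automatically when a big-endian `numpy.ndarray` is passed to `cupy.array()`.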
Add `cupyx.profiler` module (#5940)
A new module `cupyx.profiler` is added to host all profiling related APIs in CuPy. Accordingly, the following APIs are relocated to this module:
- `cupy.prof.TimeRangeDecorator()` -> `cupyx.profiler.time_range()`
- `cupy.prof.time_range()` -> `cupyx.profiler.time_range()`
- `cupy.cuda.profile()` -> `cupyx.profiler.profile()`
- `cupyx.time.repeat()` -> `cupyx.profiler.benchmark()`
The old routines are deprecated.
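When migrating a codebase, a small script can help locate uses of the relocated names. The sketch below is one possible approach, not a tool shipped with CuPy: `RELOCATED` and `find_relocated_apis` are hypothetical names, and the scan is a plain-text search over fully qualified names, so it will not catch aliased imports such as `import cupy as cp`.

```python
import re

# Old name -> new name, taken from the relocation list above.
RELOCATED = {
    "cupy.prof.TimeRangeDecorator": "cupyx.profiler.time_range",
    "cupy.prof.time_range": "cupyx.profiler.time_range",
    "cupy.cuda.profile": "cupyx.profiler.profile",
    "cupyx.time.repeat": "cupyx.profiler.benchmark",
}


def find_relocated_apis(source):
    """Return (line_number, old_name, new_name) for each deprecated name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RELOCATED.items():
            # Require whole dotted names, so that cupy.cuda.profile does not
            # match the separate cupy.cuda.profiler module.
            pattern = r"(?<![\w.])" + re.escape(old) + r"\b"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits


example = (
    "import cupy, cupyx\n"
    "timing = cupyx.time.repeat(f, (x,))\n"
    "with cupy.cuda.profile():\n"
    "    f(x)\n"
)
for hit in find_relocated_apis(example):
    print(hit)
```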
Deprecate `cupy.cuda.compile_with_cache` (#5858)
An internal API `cupy.cuda.compile_with_cache()` has been marked as deprecated as there are better alternatives (`RawModule`, `RawKernel`). While it has a long-standing history, this API has never been meant to be public. We encourage downstream libraries and users to migrate to the aforementioned public APIs.
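For code that currently compiles CUDA source through the deprecated internal API, a migration to `cupy.RawKernel` might look like the sketch below. The kernel, the names `ADD_SOURCE` and `add_on_gpu`, and the launch configuration are illustrative assumptions; the function needs CuPy and a GPU to run, so CuPy is imported lazily inside it.

```python
# CUDA C source for an element-wise add; extern "C" keeps the symbol unmangled
# so that it can be looked up by name.
ADD_SOURCE = r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
'''


def add_on_gpu(n=1024):
    """Compile ADD_SOURCE with cupy.RawKernel and launch it (needs CuPy and a GPU)."""
    import cupy  # lazy import: this module still loads without CuPy installed

    kernel = cupy.RawKernel(ADD_SOURCE, "my_add")
    x1 = cupy.arange(n, dtype=cupy.float32)
    x2 = cupy.ones(n, dtype=cupy.float32)
    y = cupy.empty_like(x1)

    threads = 128
    blocks = (n + threads - 1) // threads  # enough blocks to cover n elements
    # Arguments are (grid, block, kernel_args); n is passed as a 32-bit int
    # to match the `int n` parameter in the CUDA source.
    kernel((blocks,), (threads,), (x1, x2, y, cupy.int32(n)))
    return y
```

`cupy.RawModule` plays a similar role when several kernels live in one source string.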
Announcements
Drop support for CUDA 10.1 or earlier (#5770)
As per the RFC in #5717 and Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.
Drop support for NCCL 2.6 and 2.7 (#5855)
The minimum supported version for CuPy v10 will be NCCL 2.8 as it implements the required primitives for `cupyx.distributed` to work.
Drop support for Python 3.6 (#5771)
Following the Python 3.6 sunset in December 2021, and to stay in line with NumPy compatibility, Python 3.6 will no longer be supported starting with CuPy v10.
Drop support for NumPy 1.17 (#5857)
As per NEP 29, NumPy 1.17 support was dropped on July 26, 2021.
Alpha/Beta/RC wheels no longer distributed through PyPI
- As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., `pip install cupy-cudaXXX -f https://pip.cupy.dev/pre`). Note that the sdist package is available in PyPI for all versions.
- Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
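For scripts that provision environments, the pre-release command quoted above can be assembled programmatically. The helper below is a hypothetical sketch: it assumes the `cupy-cudaXXX` naming seen in this release note (for example `cupy-cuda115` for CUDA 11.5) and does not cover ROCm wheels.

```python
PRE_RELEASE_INDEX = "https://pip.cupy.dev/pre"  # index URL quoted in the note above


def prerelease_install_command(cuda_version):
    """Build the pip command for a pre-release wheel, e.g. "11.5" -> cupy-cuda115."""
    major, minor = cuda_version.split(".")
    package = f"cupy-cuda{major}{minor}"
    return ["pip", "install", package, "-f", PRE_RELEASE_INDEX]


print(" ".join(prerelease_install_command("11.5")))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues.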
Changes of supported cuSPARSELt version
We are planning to drop cuSPARSELt v0.1.0 support in the CuPy v10 final release (#6045).
Changes
New Features
- Add `cupyx.distributed` (#5590)
- Add `cupy.positive()` (#5774)
- Update `cupy.array_api` (#5783)
- Update `cupy.array_api` typing (#5821)
- Add `trim_mean` from scipy.stats to cupyx (#5900)
- Implement more array creation & serialization methods (#5925)
Enhancements
- Automatically enable peer access (#5496)
- Update DLPack header to v0.6 to support exchanging arrays backed by managed memory (#5512)
- Lazy-preload cuDNN (#5677)
- Support ROCm managed memory (#5685)
- Fix import failure when pytest namespace is available (#5703) (#5707)
- Support cuTENSOR 1.3.3 (#5732)
- Add `dtype` and `casting` arguments to `cupy.concatenate()` (#5759)
- Automatically convert big-endian data to little-endian in `cupy.array()` and its variants (#5828)
- Use pylibcugraph for `connected_components` (#5830)
- Make `show_config` runnable without GPU (#5835)
- `NotImplementedError` clarity (#5841)
- Change `Device.use()` semantics to align with `Stream.use()` (#5853)
- Drop support for NumPy 1.17 (#5857)
- Deprecate `cupy.cuda.compile_with_cache` (#5858)
- Show error when importing `cupy.array_api` with Python 3.7 (#5873)
- Enable new random api in ROCm 4.3 (#5895)
- Add `bitorder` option to `cupy.packbits` (#5898)
- Support using cuTENSOR in elementwise ufuncs (#5902)
- Workaround ROCm 4.3 `LLVM_PATH` issue in hipRTC (#5933)
- Update the Array API module (#5939)
- Add `cupyx.profiler` module (#5940)
module (#5940) - Use SHA1 hash for kernel cache key to support Linux in FIPS-compliant mode (#5988)
- Merge fp16 headers for CUDA 11.2+ (#5993)
- Support CUDA 11.5 for library installer (#5996)
- Add cupy-cuda115 to duplicate detection (#5999)
- Suggest using binary packages when build failed (#6028)
- Improve import error message (#6029)
- Display license terms when downloading libraries (#6032)
- Fix error type/message for duplicate value in axis (#5953)
Performance Improvements
- Use `index_t` for faster address calculation (#5981)
Bug Fixes
- Use `cudaRuntimeGetVersion` instead of `CUDA_VERSION` for CUDA Python support (#5723)
- Allow generating cubins for the max known CC (#5779)
- Fix hypergeometric distribution implementation to use `int` (#5785)
- Fix non-deterministic behavior in `cupy.random.shuffle` (#5838)
- Avoid using `driver.get_build_version` (#5861)
- Fix `nan_to_num` to comply with NumPy API (#5870)
- Do not use cuTENSOR unless available (#5872)
- Fix `_get_cuda_build_version` for ROCm (#5888)
- Fix `__repr__` of mode and scalar in cuTENSOR (#5901)
- Fix to push device after `setDevice` succeed (#5904)
- Fix `ndarray.clip` to match numpy (#5910)
- Fix `copyto` with non-contiguous multidevice (#5913)
- Avoid use of `setDevice` in CuPy codebase (#5915)
- Fix max `blocksize` used in `cupyx.optimizing.optimize` for HIP (#5921)
- Do not use `with device` in code base (#5963)
- Fix `__dlpack__` protocol (#5970)
- Fix `cupyx.tools.install_library` for windows (#5977)
- Fix `ravel` for strides 0 (#5978)
- Avoid using `with` context for streams (#5985)
- Fix cuTENSOR installation on Windows (#6007)
- Fix hash length for SHA1 (#6023)
- Fix: Add missing output dtype check for direct `correlate/convolve` (#6046)
- Fix cuDNN version not displayed in wheel installation (#6054)
Code Fixes
- Code-fix on `cupy.array()` (#5842)
- Successive code fix on `cupy.array()` (#5844)
- Fix kernel name of `cupyx.scipy.ndimage.interpolation.map_coordinates` (#5845)
- Replace `addAddNameExpression` with `addNameExpression` in NVRTC binding (#5938)
- Split loop testing helpers into `_loops` (#5967)
- Make `CUPY_DLPACK_EXPORT_VERSION` consistent (#5982)
- Fix comment in device switching (#5984)
- Avoid using deprecated `setDaemon` method (#6059)
Documentation
- Update upgrade guide (#5824)
- Update list of supported OS (#5854)
- Drop support for NCCL 2.6 and 2.7 (#5855)
- Add docs for `driver.get_build_version` (#5860)
- Document `ppc64le` and `aarch64` are supported on conda-forge (#5865)
- Mention deprecation of `compile_with_cache()` in upgrade guide (#5883)
- Add docs for `scipy.sparse.csgraph` module (#5903)
- Refine SciPy-compatible API documentation (#5905)
- Improve the comparison table (#5907)
- Remove CUDA 10.0 / 10.1 from README (#5924)
- Improve some docs on interoperability and `cupy.linalg.cholesky` (#5941)
- Add footnotes for functions unimplemented in CuPy (#5942)
- Document `CUPY_ACCELERATORS` (#5948)
- Fix section heading level (#5962)
- Mention `np.matrix` in the difference section (#5966)
- Add PyTorch with `RawKernel` example to docs (#5973)
- Add `sphinx-copybutton` (#5976)
- Add favicon to docs (#5980)
- Replace favicon with high resolution one (#5986)
- Update upgrade guide for v10 (#5994)
- Cover a bit more of cuTENSOR in perf guide (#5995)
- Support CUDA 11.5 on documents (#5997)
- Fix typo in copyright line (#6030)
- Add Python 3.10.0 to support list (#6038)
- Added Compatibility Matrix to Upgrade Guide (#6053)