Releases: cupy/cupy
v9.6.0
This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Announcements
Final release for v9.x series
This is expected to be the last release of the CuPy v9 series. Please start trying your workflow with CuPy v10.0.0rc1 and let us know if you have any feedback!
CuPy now supports CUDA 11.5
Wheels for CUDA 11.5 (cupy-cuda115) are now available.
Removal of Alpha/Beta/RC Wheels from PyPI
- As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
- Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
Changes
Enhancements
- Make show_config runnable without GPU (#5839)
- Merge fp16 headers for CUDA 11.2+ (#6004)
- Support cuTENSOR 1.3.3 (#6005)
- Support CUDA 11.5 for library installer (#6010)
- Display license terms when downloading libraries (#6041)
- Fix error type/message for duplicate value in axis (#5987)
Bug Fixes
- Do not use cuTENSOR unless available (#5885)
- Fix non-deterministic behavior in cupy.random.shuffle (#5887)
- Fix ndarray.clip to match numpy (#5916)
- Fix __repr__ of mode and scalar in cuTENSOR (#5917)
- Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5931)
- Fix ravel for strides 0 (#5998)
- Fix cuTENSOR installation on Windows (#6022)
- Allow generating cubins for the max known CC (#6024)
Documentation
- Update upgrade guide (#5834)
- Document ppc64le and aarch64 are supported on conda-forge (#5869)
- Improve the comparison table (#5911)
- Add footnotes for functions unimplemented in CuPy (#5954)
- Update the docstring for cholesky (#5960)
- Document CUPY_ACCELERATORS (#5975)
- Add favicon to docs (#5983)
- Support CUDA 11.5 on documents (#6006)
- Replace favicon with high resolution one (#6008)
- Fix typo in copyright line (#6035)
Tests
- Clean up plan cache in a FFT slow test (#5825)
- Copy source directory to support pip 21.3 (#5896)
- Simplify legacy ROCm test script for FlexCI (#5936)
- Relax sparse linalg testing tolerance (#5958)
- CI: Fix ROCm build test (FlexCI) failing (#5965)
- Improve handling of FlexCI test runs (#6002)
- Upload cache even when test failed in FlexCI (#6003)
- CI: Increase timeout for CUDA 11.4 / 11.5 tests (#6040)
- CI: Do not run full combination test even for branch tests for ROCm (#5974)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar
v9.5.0
This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Announcements
Removal of Alpha/Beta/RC Wheels from PyPI
- As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
- Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
Changes
Enhancements
- Support cuDNN 8.2.4 (#5744)
- Support NCCL 2.11.4 (#5747)
- Fix cupyx.optimize to save file when no optimization ran (#5760)
Bug Fixes
- Fix spline filter with large array (#5686)
- Fix exception for indexing with multiple ellipses (#5739)
- Fix docstring for fallback modules (#5742)
- Include stdexcept in hip headers (#5777)
- Fixed typo in error message in sparse.csr_matrix (#5788)
- Fix MAX_NDIM and add guards/tests (#5798)
- Disable spmm on Windows CUDA 10.2 (#5805)
Documentation
- Fix random docstring (#5708)
- Remove --pre from ROCm source build instructions (#5782)
- Use custom index for pre-release wheels (#5793)
Installation
Tests
- Update test_eigenvalue.py (#5643)
- Improve performance of TestSplineFilter1dLargeArray (#5694)
- Stop inheriting unittest.TestCase for performance (#5710)
- TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5729)
- Make testing helpers support non-methods (#5731)
- Make test parameter names static (#5733)
- Update pip and setuptools in Windows CI (#5738)
- Improve FlexCI output (#5796)
- Fix error message comparison (#5806)
Others
- Add workflow to test/build/push docker images on pull-request/release (#5752)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar
v10.0.0b3
This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Array API initial support (#5698)
This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.
Changes without compatibility
Drop support for CUDA 10.1 or earlier (#5770)
As per the RFC in #5717 and on Twitter, the minimum CUDA version supported by CuPy v10 is CUDA 10.2.
Drop support for Python 3.6 (#5771)
Following the Python 3.6 end of life scheduled for December 2021, and in line with the Python versions supported by NumPy, CuPy v10 no longer supports Python 3.6.
Alpha/Beta/RC wheels no longer distributed through PyPI
- As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
- Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
Changes
New Features
- Add binomial distribution to new Generator (#5429)
- Adopt the numpy.array_api module as cupy.array_api (#5698)
Enhancements
- Improve stream mismatch error message (#5706)
- Support cuDNN 8.2.4 (#5726)
- Support NCCL 2.11.4 (#5734)
- Fix cupyx.optimize to save file when no optimization ran (#5757)
- Adding bitorder support to cupy.unpackbits (#5765)
- Drop support for CUDA 10.1 or earlier (#5770)
- Drop support for Python 3.6 (#5771)
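The bitorder argument for cupy.unpackbits is meant to mirror numpy.unpackbits: with "big" (the default) each byte is expanded most-significant bit first, and with "little" least-significant bit first. The pure-Python sketch below only illustrates those semantics on the host; the helper name unpack_bits is hypothetical and says nothing about how CuPy implements it on the GPU.

```python
def unpack_bits(data, bitorder="big"):
    """Expand each byte of `data` into 8 bits (ints 0/1), like numpy.unpackbits.

    Hypothetical helper for illustration only.
    """
    if bitorder not in ("big", "little"):
        raise ValueError("bitorder must be 'big' or 'little'")
    # 'big': most significant bit first; 'little': least significant bit first.
    positions = range(7, -1, -1) if bitorder == "big" else range(8)
    bits = []
    for byte in data:
        bits.extend((byte >> i) & 1 for i in positions)
    return bits


print(unpack_bits(bytes([2])))                     # [0, 0, 0, 0, 0, 0, 1, 0]
print(unpack_bits(bytes([2]), bitorder="little"))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

With CuPy itself, the corresponding call would presumably be cupy.unpackbits(x, bitorder="little") on a uint8 array x, assuming a version that includes #5765.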
Bug Fixes
- Fix spline filter with large array (#5673)
- Fix exception for indexing with multiple ellipses (#5718)
- Fix docstring for fallback modules (#5728)
- Fix MAX_NDIM and add guards/tests (#5749)
- Fixed typo in error message in sparse.csr_matrix (#5767)
- Include stdexcept in hip headers (#5769)
- Disable spmm on Windows CUDA 10.2 (#5802)
Code Fixes
- Prefix Cython compile_time_env with CUPY_ (#5740)
Documentation
- Use custom index for pre-release wheels (#5772)
- Remove --pre from ROCm source build instructions (#5773)
Installation
- Reorganize build scripts, part 1 (#5730)
- Reorganize build scripts, part 2: separate modules (#5743)
- Reorganize build scripts, part 3: simplify setup.py (#5745)
- Reorganize build scripts, part 4: remove global cupy_setup_options (#5754)
- Reorganize build scripts, part 5: remove Cython version check (#5755)
- Add maintainers in setup.py (#5756)
- Bump version to v10.0.0b3 (#5807)
Tests
- Make testing helpers support non-methods (#5594)
- Stop inheriting unittest.TestCase for performance (#5599)
- Eliminate random test ids (#5659)
- Improve performance of TestSplineFilter1dLargeArray (#5693)
- TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5724)
- Make test parameter names static (#5727)
- Update pip and setuptools in Windows CI (#5735)
- Improve FlexCI output (#5786)
- Skip tests for bug cases (FFT on CUDA 10.2 + Pascal) (#5791)
- Fix error message comparison (#5799)
- Fix test skip issue (#5801)
Others
- Update auto-notify bot for array-api label (#5725)
- Fix backport trigger (#5741)
- Add workflow to test/build/push docker images on pull-request/release (#5746)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar
v10.0.0b2
This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support for CUDA Python (#5638)
CuPy is one of the first libraries providing support for the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy, as described in the documentation.
Support for AMD ROCm 4.3
Support for ROCm 4.3 has been added in the latest release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH environment variable to the llvm directory included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
Announcements
Removal of Alpha/Beta/RC Wheels from PyPI
- As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.
- We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.
Changes
New Features
- Support batched QR solver (#5583)
- Add cupyx.scipy.sparse.linalg.minres (#5585)
- Add Log Series distribution to cupy.random.Generator (#5618)
- Add Power distribution to cupy.random.Generator (#5624)
- Add support for CUDA Python (#5638)
- Add Chi-square distribution to cupy.random.Generator (#5645)
- Add Dirichlet distribution to cupy.random.Generator (#5648)
- Add F distribution to cupy.random.Generator (#5655)
Enhancements
- Add ncclAvg and ncclBfloat16 for NCCL (#5545)
- Add new eigensolvers from rocSOLVER (#5555)
- Add support for array input in beta distribution of cupy.random.Generator (#5573)
- Release the GIL for several NCCL ops (#5574)
- Allow to compile using PTX with an envvar (#5622)
- Show CUDA Python version (#5651)
- Fix version check for new ROCm version definition (#5657)
- Rest of version check fix for new ROCm version definition (#5660)
- Add ROCm 4.3 in duplicate detection (#5669)
Bug Fixes
- Fix compute capability check (#5600)
- Fix FFT convolve for shapes containing 1 (#5609)
- Fix squareness checks (#5642)
- Fix unique for empty array (#5654)
Code Fixes
Documentation
- Update Sphinx to 4.1.2 (#5612)
- Fix random docstring (#5628)
- Support ROCm v4.3 in document (#5633)
__array_function__ feature by default (#5644)
Tests
- Fix skipTest in test_decomp_lu (#5593)
- Mark lsmr tests xfail for CSR matrices on HIP (#5597)
- Increase test timeout (#5601)
- Fix cubic for_all_dtypes_combination tests (#5629)
- Add CI for ROCm 4.3 (#5630)
- Reload GPG key for ROCm 4.2 test (#5636)
- Fix branch name of cuda-python (#5650)
- Add a workaround for ROCm 4.3.0 for testing (#5662)
Others
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v9.4.0
This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)
This changes the NVRTC compilation process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run with older CUDA drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information. We believe most users will not be affected by this change, but if needed you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable.
Support for AMD ROCm 4.3
Support for ROCm 4.3 has been added in the latest release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH environment variable to the llvm directory included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
Changes
Enhancements
- Compile with SASS for CUDA versions >= 11.1 (#5611)
- Allow to compile using PTX with an envvar (#5634)
- Add ncclAvg and ncclBfloat16 for NCCL (#5656)
- Fix version check for new ROCm version definition (#5661)
- Rest of version check fix for new ROCm version definition (#5670)
Bug Fixes
- Fix FFT convolve for shapes containing 1 (#5613)
- Fix the RTC call path for HIP (#5620)
- Fix compute capability check (#5646)
- Fix squareness checks (#5652)
- Fix unique for empty array (#5658)
Code Fixes
Documentation
- Update Sphinx to 4.1.2 (#5616)
__array_function__ feature by default (#5653)
- Support ROCm v4.3 in document (#5674)
Tests
- Increase test timeout (#5615)
- Increase timeout for CUDA 11.4 tests (#5617)
- Add CI for ROCm 4.3 (#5632)
- Reload GPG key for ROCm 4.2 test (#5637)
- Fix cubic for_all_dtypes_combination tests (#5639)
- Add a workaround for ROCm 4.3.0 for testing (#5663)
- Fix skipTest in test_decomp_lu (#5672)
Others
- Bump version to v9.4.0 (#5680)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v10.0.0b1
This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CuPy now supports CUDA 11.4 (cupy-cuda114)
Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.
Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.
Google Summer of Code
CuPy is participating in Google Summer of Code under the NumFOCUS organization.
Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.
Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)
This changes the NVRTC compilation process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run with older CUDA drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information.
Changes without compatibility
Support the new DLPack exchange protocol (#5306)
With the adoption of the new DLPack exchange protocol defined in the Python array API standard, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
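Code that has to run on both CuPy v9 and v10 may want to pick the DLPack entry point at runtime. A minimal sketch, assuming the producer object implements the array API __dlpack__() method: the new-style from_dlpack receives the producer object itself, while the deprecated fromDlpack receives a DLPack capsule. The helper name import_dlpack is hypothetical.

```python
def import_dlpack(xp, producer):
    """Import `producer` into the array module `xp` via DLPack.

    Hypothetical compatibility helper: prefers the v10+ `from_dlpack`
    and falls back to the deprecated `fromDlpack` on older versions.
    """
    from_dlpack = getattr(xp, "from_dlpack", None)
    if from_dlpack is not None:
        # New protocol: hand over the producer object itself.
        return from_dlpack(producer)
    # Legacy protocol: hand over a DLPack capsule obtained from the producer.
    return xp.fromDlpack(producer.__dlpack__())
```

Usage would look like import_dlpack(cupy, t) for some tensor t from another library that supports __dlpack__; the fallback branch assumes the installed CuPy accepts such a capsule in fromDlpack.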
Known Issues
cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes
New Features
- Texture memory 2D/3D affine transformations (#5171)
- Support the new DLPack exchange protocol (#5306)
- Add cupyx.scipy.sparse.linalg.lsmr (#5331)
- JIT: Support all atomic intrinsics (#5387)
- Expose _GUFunc through cupyx (#5408)
- Add geometric distribution to new Generator (#5443)
- Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)
- Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
- Add cupyx.scipy.sparse.linalg.cgs (#5524)
- Add hypergeometric distribution to new Generator (#5560)
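The Numba-like jit.grid() and jit.gridsize() helpers are typically combined in a grid-stride loop: each thread starts at its global index and advances by the total number of threads in the grid. The host-side simulation below, in plain Python so it runs without a GPU, checks that this pattern visits every element exactly once. It assumes the Numba definitions grid(1) = threadIdx.x + blockIdx.x * blockDim.x and gridsize(1) = blockDim.x * gridDim.x; the function names are hypothetical.

```python
def grid_stride_indices(thread_idx, block_idx, block_dim, grid_dim, n):
    """Indices handled by one thread in a 1-D grid-stride loop over n elements."""
    # Equivalent of jit.grid(1): the thread's global index.
    start = thread_idx + block_idx * block_dim
    # Equivalent of jit.gridsize(1): total number of threads in the grid.
    stride = block_dim * grid_dim
    return list(range(start, n, stride))


def simulate_launch(grid_dim, block_dim, n):
    """Count how often each of the n elements is visited by all simulated threads."""
    visits = [0] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            for i in grid_stride_indices(thread_idx, block_idx, block_dim, grid_dim, n):
                visits[i] += 1
    return visits


# 2 blocks of 4 threads cover 21 elements; every count should be 1.
print(simulate_launch(grid_dim=2, block_dim=4, n=21))
```

Inside a CuPy JIT kernel the same loop would read roughly as for i in range(jit.grid(1), n, jit.gridsize(1)), with the loop body operating on element i.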
Enhancements
- Compile with SASS for CUDA versions >= 11.1 (#5097)
- Support NCCL v2.9.9 (#5268)
- Support CUDA 11.4 and compute_86 (#5434)
- Update NumPy/SciPy pinning in setup.py (#5453)
- Make matrix_power support stacked matrices (#5458)
- Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)
- Add cudaDeviceDisablePeerAccess wrapper (#5495)
- Support cuDNN v8.2.2 (#5516)
- Support NCCL v2.10.3: library installer and document (#5521)
Bug Fixes
- JIT: Fix supported dtype of atomic_add on HIP (#5383)
- Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)
- Fix astype from boolean (#5410)
- Fix compatibility issues of ndarray.view (#5428)
- Fix types attribute of ufunc (#5448)
- Fix new DLPack protocol error messages and tests (#5449)
texture_memory option in affine_transform not supported by HIP (#5464)
- Fix linalg.lstsq for empty matrix (#5467)
- Fix reshape (#5470)
- Fix random generator output not being raveled (#5478)
- Fix random integers (#5479)
- Fix availability tests in cuSOLVER and cuSPARSE (#5492)
- Add missing hipSPARSE include to builder (#5515)
- prune cuFFT static lib by major cc ver (#5531)
- Fix casts from bool in ufunc inputs (#5539)
- Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)
- Fix casts in ufunc outputs (#5550)
- Code fix for {cu, roc}SOLVER (#5558)
- Fix CUDA API call on module initialization (#5561)
- Fix the RTC call path for HIP (#5569)
- Fix broadcast error messages (#5579)
Code Fixes
- Do not call cudnnGetVersion on import (#5326)
- JIT: Fix __call__() for built-in functions (#5361)
- Add HIP symbol redefinitions (#5362)
- Remove the data member use_32bit_indexing from CArray (#5376)
- Use dtype.name instead of dtype.char (#5444)
- Try to use -I in hipRTC (#5486)
- Hide modules from public APIs (#5522)
- consistent kernel names (#5551)
- Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)
Documentation
- Add upgrade guide for v10 (#5278)
- Update tag lines in package description and docs index (#5399)
- Fix typo in apply_along_axis (#5432)
- Fix indent of Returns section (#5433)
- Update user_guide/basic.rst device agnostic section (#5435)
- Support CUDA 11.4 on documents (#5447)
- Update install guide with new NumPy/SciPy versions (#5454)
- Use from_dlpack instead of fromDlpack (#5488)
- Use Sphinx 4.1.0 (#5489)
- Bump ReadTheDocs configuration to version 2 (#5491)
- Fix docs of eigh and eigvalsh (#5494)
- Add a lingering doc page for fromDlpack() (#5509)
- Document scipy.fft backend usage (#5514)
- Replaced the links for NumPy docs as per issue #3418 (#5548)
- Use Sphinx's envvar construct (#5570)
- Fix intersphinx for SciPy 1.7.1 docs (#5587)
Installation
Tests
- Add tests for num_to_num's optional parameters (#5337)
- Add script for ROCm CI on Jenkins (#5378)
- Skip unwrap tests for numpy<1.21 (#5384)
- Enable strict xfail in pytest (#5407)
- Remove xfail in windows jitify test (#5409)
- Fix preloading slow tests (#5440)
- Add script for CUDA 11.4 CI on FlexCI (#5457)
- Increase memory for CUDA 11.4 tests (#5477)
- Fix DLPack test for ROCm/HIP (#5485)
- Fix "Revert test decorators order" (#5498)
- Fix some tests for HIP (#5501)
- Fix FlexCI Linux tests (#5505)
- Add CUDA 11.4 for FlexCI helper script (#5528)
- Increase timeout for CUDA 11.4 tests (#5575)
- Update tests to install all requirements and add PATH (#5576)
- Add Cython to all requirements (#5577)
Others
- Notify conflict by mergify (#5371)
- Fix mergify to only comment when pull-request is open (#5439)
- Fix mergify condition (#5513)
- Add auto notify bot for hip label (#5538)
- Use pull_request_target instead for auto notify bot (#5541)
- Fix auto notify bot for issues (#5546)
- Disable Mergify's auto-merge (#5556)
- Bump version to v10.0.0b1 (#5595)
- Fix signal tests for scipy 1.7.0 (#5368)
- Fix numpy.unwrap for NumPy 1.21 (#5385)
- Fix signaltools medfilt for scipy>=1.7.0 (#5386)
- Fix deprecated numpy.typeDict utilization (#5388)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay
v9.3.0
This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CuPy now supports CUDA 11.4 (cupy-cuda114)
Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.
Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.
Known Issues
cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes
Enhancements
- Support NCCL v2.9.9 (#5402)
- Update NumPy/SciPy pinning in setup.py (#5471)
- Support CUDA 11.4 and support compute_86 (#5519)
- Support cuDNN v8.2.2 (#5523)
- Make matrix_power support stacked matrices (#5525)
- Support NCCL v2.10.3: library installer and document (#5526)
Bug Fixes
- JIT: Fix supported dtype of atomic_add on HIP (#5405)
- Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5416)
- Fix compatibility issues of ndarray.view (#5442)
- Fix types attribute of ufunc (#5455)
- Fix random integers (#5484)
- Fix random generator output not being raveled (#5487)
- Fix astype from boolean (#5490)
- Fix reshape (#5504)
- Fix linalg.lstsq for empty matrix (#5506)
- Add missing checks and _setStream() (#5507)
- Fix availability tests in cuSOLVER and cuSPARSE (#5534)
- prune cufft static lib by major cc ver (#5536)
- Fix casts from bool in ufunc inputs (#5549)
- Code fix for {cu, roc}SOLVER (#5566)
- Access cudaMemoryType in the pointer attributes and fix for HIP (#5571)
- Fix broadcast error messages (#5584)
- Fix casts in ufunc outputs (#5589)
- Fix broken build on CUDA 9.2 (#5598)
Code Fixes
- Remove the data member use_32bit_indexing from CArray (#5414)
- JIT: Fix __call__() for built-in functions (#5422)
- Do not call cudnnGetVersion on import (#5446)
- Add HIP symbol redefinitions (#5475)
- Try to use -I in hipRTC (#5502)
- Hide modules from public APIs (#5533)
- Use the new macro __HIP_PLATFORM_AMD__ at build time (#5565)
Documentation
- Update tag lines in package description and docs index (#5415)
- Fix typo in apply_along_axis (#5441)
- Fix indent of Returns section (#5452)
- Update user_guide/basic.rst device agnostic section (#5456)
- Update install guide with new NumPy/SciPy versions (#5465)
- Bump ReadTheDocs configuration to version 2 (#5497)
- Fix docs of eigh and eigvalsh (#5499)
- Use Sphinx 4.1.0 (#5500)
- Document scipy.fft backend usage (#5532)
- Support CUDA 11.4 on documents (#5535)
- Replaced the links for NumPy docs as per issue #3418 (#5553)
- Use Sphinx's envvar construct (#5586)
- Fix intersphinx for SciPy 1.7.1 docs (#5588)
Installation
Examples
Tests
- Skip unwrap tests for numpy<1.21 (#5412)
- Remove xfail in windows jitify test (#5418)
- Enable strict xfail in pytest (#5423)
- Add missing DLPack test for complex numbers (#5425)
- Fix unwrap tests for v9 (#5426)
- Fix preloading slow tests (#5445)
- Add script for ROCm CI on Jenkins (#5468)
- Add script for CUDA 11.4 CI on FlexCI (#5473)
- Increase memory for CUDA 11.4 tests (#5480)
- Fix "Revert test decorators order" (#5518)
- Fix FlexCI Linux tests (#5520)
- Add CUDA 11.4 for FlexCI helper script (#5543)
- Fix scipy requirement in tests (#5563)
- Fix some tests for HIP (#5578)
- Update tests to install all requirements and add PATH (#5581)
- Add Cython to all requirements (#5582)
Others
- Notify conflict by mergify (#5419)
- Fix mergify to only comment when pull-request is open (#5510)
- Fix mergify condition (#5517)
- Add auto notify bot for hip label (#5540)
- Use pull_request_target instead for auto notify bot (#5542)
- Fix auto notify bot for issues (#5547)
- Disable Mergify's auto-merge (#5562)
- Bump version to v9.3.0 (#5596)
- Fix deprecated numpy.typeDict utilization (#5403)
- Fix signal tests for SciPy 1.7.0 (#5413)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v10.0.0a2
This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
- CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are now available on PyPI.
- The following Python syntax and new APIs can now be used in JIT target functions:
  - Calling the len, min, and max Python built-ins.
    - len(arr): Equivalent to arr.shape[0].
    - min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
    - max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
  - Accessing the .ndim and .size attributes of ndarray.
  - Unpacking nested tuples, e.g., (x, y), z = ...
  - The jit.grid() API, similar to numba.cuda.grid, e.g., x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x).
  - Warp shuffle and sync functions, e.g., cupyx.jit.shfl_down_sync(mask, var, val_id) (corresponding to __shfl_down_sync(mask, var, val_id)).
- cupyx.scipy.sparse.{coo,csr,csc}_matrix now provide the reshape method.
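Warp shuffles are commonly used for warp-level reductions: at each step every lane adds the value held by the lane delta positions above it, halving delta until lane 0 holds the sum of all 32 lanes. The sketch below models __shfl_down_sync on the host with plain Python lists, assuming a full mask and a 32-lane warp; as in CUDA, lanes whose source lane would fall outside the warp keep their own value. The function names are hypothetical.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Host model of __shfl_down_sync(0xffffffff, var, delta) across one warp.

    Lane i receives the value of lane i + delta; lanes whose source lane
    would be outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree reduction: after log2(WARP_SIZE) shuffle steps, lane 0 holds the total."""
    if len(values) != WARP_SIZE:
        raise ValueError("expected one value per lane")
    vals = list(values)
    delta = WARP_SIZE // 2
    while delta > 0:
        shifted = shfl_down(vals, delta)
        vals = [v + s for v, s in zip(vals, shifted)]
        delta //= 2
    return vals[0]


print(warp_reduce_sum(list(range(WARP_SIZE))))  # 496
```

In a CuPy JIT kernel, each loop step would correspond to something like val += cupyx.jit.shfl_down_sync(0xffffffff, val, delta).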
Changes without compatibility
Drop CUDA 9.2 & NCCL 2.4 Support (#5214)
CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.
Changes in Stream behavior (#5251)
The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 does not destroy a stream (i.e., call cudaStreamDestroy) while it is the current stream of any thread.
Known Issues
- cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
- cupy-cuda110 and cupy-cuda111 wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes
New Features
- Add reshape method for COO, CSR and CSC matrices (#5301)
- Support len, min, max, .ndim, .size in jit (#5319)
- Support nested tuple unpack in CuPy JIT (#5332)
- Support Numba-like jit.grid() syntax in CuPy JIT (#5334)
- Support warp shuffle and sync functions in CuPy JIT (#5335)
Enhancements
- Do not use handles unless requested in cupy.show_config() (#5073)
- Fix to allow sharing a Stream instance between threads (#5251)
- Adding GUFunc order, dtype and casting kwarg support (#5260)
- Support nan, posinf, neginf in cupy.nan_to_num (#5295)
- Use independent version of hipFFT for ROCm 4.1 and later (#5318)
- Support cuTENSOR v1.3.1 (#5338)
- Support cuDNN v8.2.1 (#5357)
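The nan, posinf and neginf arguments of cupy.nan_to_num follow numpy.nan_to_num: NaN is replaced by nan (default 0.0), positive infinity by posinf, and negative infinity by neginf; when posinf or neginf is None, the largest positive or most negative finite value of the dtype is used instead. The scalar sketch below covers float64 only (hence sys.float_info.max); nan_to_num_scalar is a hypothetical name, not a CuPy function.

```python
import math
import sys


def nan_to_num_scalar(x, nan=0.0, posinf=None, neginf=None):
    """Scalar model of numpy.nan_to_num / cupy.nan_to_num for float64 inputs."""
    if math.isnan(x):
        return nan
    if math.isinf(x):
        if x > 0:
            # Default: largest finite float64.
            return sys.float_info.max if posinf is None else posinf
        # Default: most negative finite float64.
        return -sys.float_info.max if neginf is None else neginf
    return x  # finite values pass through unchanged


values = [float("nan"), float("inf"), float("-inf"), 2.5]
print([nan_to_num_scalar(v, nan=-1.0, posinf=9.0, neginf=-9.0) for v in values])
# [-1.0, 9.0, -9.0, 2.5]
```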
Performance Improvements
- Make cuTENSOR available in cupy.einsum (#5203)
Bug Fixes
- Fix check_availability for cupy.cusolver (#5207)
- Fix MemoryAsync to keep a weakref to stream (#5264)
- Fix cuFFT callback for sm_61 etc (#5304)
- Fix cuDNN preloading (#5327)
- Fix large arrays assignment (#5330)
- Ensure source array is C-contiguous before copying to CUDAArray (#5342)
- Increase test coverage for Generalized Universal Functions (#5344)
- Remove unnecessary print (#5374)
Code Fixes
- Fix cub repository url (#5236)
- Code and comment fixes for stream (#5243)
- Use cdef instead of cpdef where appropriate (#5274)
Documentation
- Fix matmul docstring (#5174)
- Update list of wheels in README (#5267)
- Add user guide for FFT (#5272)
- Bump CuPy version in docs (#5277)
- Add user guide for streams & events (#5283)
- Fix deadlink to tutorial and reorder in README (#5287)
- Document ExternalStream (#5305)
- Add ROCm 4.2 support to install docs (#5354)
- user_guide/basic.rst: various improvements (#5356)
Installation
- Drop support for CUDA 9.2 & NCCL 2.4 (#5214)
- Add upper restrictions to NumPy/SciPy versions (#5225)
- Exclude Cython 3 from setup_requires (#5273)
Tests
- Fix threading memory pool tests (#5263)
- Temporarily remove the async pool test from TestAllocator (#5308)
- Fix Windows CI kernel cache (#5310)
- Tentatively skip unstable MemoryPoolAsync tests (#5350)
- Xfail random generator tests for HIP (#5355)
- Tentatively pin to SciPy 1.6 in Windows CI (#5366)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909
v9.2.0
This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
- CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are now available on PyPI.
Known Issues
- cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
- cupy-cuda110 and cupy-cuda111 wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes
Enhancements
- Add CUDA 11.3 headers (#5232)
- Do not use handles unless requested in cupy.show_config() (#5285)
- Use independent version of hipFFT for ROCm 4.1 and later (#5351)
- Support cuTENSOR v1.3.1 (#5370)
- Support cuDNN v8.2.1 (#5372)
Bug Fixes
- MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
- Fix MemoryAsync to keep a weakref to stream (#5307)
- Fix cuFFT callback for sm_61 etc (#5325)
- Fix large arrays assignment (#5333)
- Fix check_availability for cupy.cusolver (#5336)
- Fix cuDNN preloading (#5365)
- Ensure source array is C-contiguous before copying to CUDAArray (#5375)
- Remove unnecessary print (#5377)
Code Fixes
Documentation
- Fix matmul docstring (#5281)
- Update list of wheels in README (#5284)
- Add user guide for FFT (#5286)
- Fix deadlink to tutorial and reorder in README (#5291)
- Add user guide for streams & events (#5302)
- Document ExternalStream (#5312)
- user_guide/basic.rst: various improvements (#5356)
- Add ROCm 4.2 support to install docs (#5360)
Installation
Tests
- Fix threading memory pool tests (#5289)
- Fix Windows CI kernel cache (#5317)
- Xfail random generator tests for HIP (#5359)
- Tentatively pin to SciPy 1.6 in Windows CI (#5369)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v10.0.0a1
This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CUDA 11.0 and 11.1 wheels not yet available on PyPI (#4971)
In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes without compatibility
Current stream is now managed per device (#5172)
CuPy now automatically switches the current stream when the current device changes, so users no longer need to switch streams themselves.
This pull request also includes a bug fix for #5143. Existing code that mixes with stream: blocks and stream.use() may get different results, because a stream set via the use() API is no longer reactivated when exiting a stream context.
import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.
Make cupy.cuda.Device context manager interface thread safe (#5083)
Since the first versions of CuPy, using a single cupy.cuda.Device context manager object from multiple threads led to incorrect behavior when restoring the previous device. The correct device is now restored, so user code relying on the old, incorrect behavior may need to be updated.
Deprecate cupyx.allow_synchronize
and cupyx.DeviceSynchronized
APIs (#5226)
These APIs, which were used to detect when device synchronization occurred, have been deprecated because their behavior is not reliable.
Changes
Note: many of these PRs have been backported to the v9 series and are already available in its releases.
New Features
- CUDA 11.2: Add `MemoryAsyncPool` to support `malloc_async` (#4592)
- Add APIs for creating NumPy arrays backed by pinned memory (#4870)
- Support cuSPARSELt (#4883)
- Add gamma distributions to random API (#4905)
- Add `random` for uniform [0, 1) generation (#4906)
- Add `poisson` distribution to random API (#4927)
- Add SciPy-compatible `connected_components` (#4940)
- Support shared memory in CuPy JIT (#4950)
- Add cupyx.scipy.sparse.kronsum() (#4968)
- Add `hfft2`, `ihfft2`, `hfftn`, and `ihfftn` to `cupyx.scipy.fft` (#4996)
- CuPy JIT: Print kernel code (#5017)
- Add `cupyx.jit.atomic_add` (#5169)
- CUDA 11.2/11.3: Support `MemoryAsyncPool` statistics and limits (#5177)
Enhancements
- Ability to pass structured data types by value as kernel parameters (#4829)
- Move the NVTX module to `cupy_backends.cuda.libs` (#4930)
- Disable CUB SpMV on CUDA 11.x (#4949)
- CuPy JIT: Readable compile error messages (#4991)
- Fix JIT test failures on ROCm (#4998)
- Mark `cupyx.jit.rawkernel` as experimental (#5005)
- HIP: add `-ftz=true` (#5007)
- Give gufunc a name (#5013)
- CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5028)
- Add PCI Bus ID to show_config (#5037)
- Print cuSPARSELt version in `show_config` (#5054)
- Support custom getsource option in CuPy JIT (#5071)
- Make `cupy.cuda.Device` context manager interface thread safe (#5083)
- Add a new argument `out` to `cupy.asnumpy()` (#5155)
- Support cuSPARSELt v0.1.0 (#5158)
- Per device stream (#5172)
- cuTENSOR v1.3.0 for library installer (#5192)
- Add `sum_labels` to `cupyx.scipy.ndimage.measure` (#5200)
- Support NCCL v2.9.8 (#5201)
- Fix thrust compilation for ROCm 4.2.0 (#5209)
- Add NVCC path and Python version to `show_config` (#5215)
- Add CUDA 11.3 headers (#5218)
- Add libraries for CUDA 11.3 (#5219)
- Remove `syncdetect` APIs (#5226)
Bug Fixes
- Use `THRUST_OPTIONAL_CPP11_CONSTEXPR` (#5002)
- Use async memcpy in `ndarray.copy` (#5004)
- Fix DLPack `lanes` (#5045)
- Disable cuFFT plan cache on CUDA 11.1 (#5046)
- Support PTDS in CuPy memory pool (#5072)
- CuPy JIT: Fix range type (#5077)
- Fix `poisson` to support lam array (#5087)
- Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5103)
- Bugfix for typing rule of CuPy JIT (#5125)
- Fix TypeError in `svds` (#5140)
- Properly handle non-contiguous RHS in `cupyx.scipy.sparse.linalg.spsolve` (#5168)
- Fix integer `scatter_add` failure on Windows (#5173)
- `MemoryAsyncPool`: Use the "current" mempool instead of the "default" one (#5191)
- Fix `matmul` for input with relaxed strides (#5205)
- Add `check_availability` for cuTensor routines (#5206)
- Fix `constexpr` on Windows (#5233)
- Remove duplicated subtraction in `cupy.random.Generator.integers` (#5247)
Code Fixes
- Rename `cupy.core` submodule to `cupy._core` (#3820)
- Fix some internal `cpdef` functions to `cdef` in `_kernel.pyx` (#5084)
- Remove `cupy.cupy` (#5121)
- Cosmetic change in cuSPARSELt stub header (#5149)
- Cosmetic changes of CuPy JIT implementation (#5152)
Documentation
- Follow the latest NumPy/SciPy docs style (#4945)
- Fix docs: cupy-cuda112 now on PyPI (#4957)
- Update installation guide for Conda-Forge (#4985)
- CuPy JIT documentation (#5012)
- Document `cupyx.time.repeat` (#5015)
- Document `cupy.cuda.runtime.getDeviceProperties` (#5016)
- More documentation on the supported backends (#5019)
- Add links to Anaconda, Gitter, StackOverflow (#5020)
- Improve the documentation on interoperability (#5023)
- Document `CFunctionAllocator` and `ManagedMemory` (#5025)
- Fix code block in installation guide (#5033)
- Improve comments for memory and stream API usage (#5060)
- Point to the correct numpy random docs (#5088)
- Add user guide (#5093)
- Add ROCm limitations to docs (#5107)
- Reorganize API reference pages (#5108)
- Revise ROCm doc (#5122)
- Fix docs of `scatter_add` (#5129)
- Mention baseline API change in upgrade guide (#5131)
- Fix ROCm wheel install steps (#5133)
- Fix docstring in `coo.py` (#5139)
- Fix docs in `stream.pyx` (#5144)
- cuDNN v8.2 on documentation (#5148)
- Mention PTDS in ROCm Limitation (#5159)
- Use Sphinx 4 (#5188)
- cuTENSOR v1.3 on documentation (#5196)
- Fix cuSPARSELt not covered in docs (#5221)
- Add `cupyx.scipy.ndimage.sum_labels` to docs (#5223)
- Improve README (#5254)
- Update logo image (#5255)
- Tentatively remove CUDA 11.3 from support list (#5256)
Installation
- Fix Windows dll loading for Conda (#4974)
- Add warnings for duplicate installation (#5032)
- cuDNN v8.2.0 for library installer (#5146)
- Bump version to v10.0.0a1 (#5269)
Examples
- Fix cuSPARSELt example not to use internal function (#4995)
- Update examples for current version of CuPy (#4999)
Tests
- Refactor random tests (#4907)
- Tentatively pin CI to ROCm 4.0.1 (#4961)
- Fix `cutensor` import in the test (#4965)
- Make `install_tests` runnable without depending on current path (#4969)
- Avoid using `pip install -e` on Windows CI for performance (#4970)
- Update known base branches in flexCI config (#4973)
- Update list of known branches (#4982)
- Fix `TestStream` cleanup (#5042)
- Mark some memory tests as `testing.slow` (#5061)
- Fix stream usage on D2D copy test under HIP (#5091)
- Xfail tests for random distribution generator under HIP/ROCm (#5096)
- Adjust testing tolerance for `hfftn` for HIP/ROCm (#5099)
- Use current device in tests (#5127)
- Fix for updated FlexCI base image (#5164)
- Relax tolerance of `cupyx.jit.atomic_add` test (#5186)
- Test build for ROCm 4.0 and latest (#5224)
- Fix mergify configuration (#5248)
Others
- Use bot mode in automatic backport (#5051)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce