11 Nov 07:02

3a1d844

v9.6.0

This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Final release for v9.x series

This is expected to be the last release of the CuPy v9 series. Please start trying your workflow with CuPy v10.0.0rc1 and let us know if you have any feedback!

CuPy now supports CUDA 11.5

Wheels for CUDA 11.5 (cupy-cuda115) are now available.

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
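
For scripted environments, the install command quoted above can be assembled programmatically. The following is only a sketch: the helper name pre_release_install_command and its allow_pre parameter are hypothetical, and the wheel name is assumed to follow the cupy-cudaXXX pattern used in these notes (for example cupy-cuda115 for CUDA 11.5).

```python
import sys

# Index URL quoted in the announcement above.
PRE_RELEASE_INDEX = "https://pip.cupy.dev/pre"


def pre_release_install_command(cuda_version, allow_pre=True):
    """Return the pip argv for installing a pre-release CuPy wheel.

    cuda_version is a string such as "11.5"; the wheel name is assumed to
    follow the cupy-cudaXXX pattern (here: cupy-cuda115).
    """
    package = "cupy-cuda" + cuda_version.replace(".", "")
    command = [sys.executable, "-m", "pip", "install", package,
               "-f", PRE_RELEASE_INDEX]
    if allow_pre:
        # --pre is pip's standard switch for accepting pre-release versions.
        # It is an addition in this sketch, not part of the quoted command.
        command.insert(4, "--pre")
    return command
```

The returned list can be passed to subprocess.run(command, check=True); pass allow_pre=False to reproduce the quoted command exactly.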

Changes

Enhancements

Make show_config runnable without GPU (#5839)
Merge fp16 headers for CUDA 11.2+ (#6004)
Support cuTENSOR 1.3.3 (#6005)
Support CUDA 11.5 for library installer (#6010)
Display license terms when downloading libraries (#6041)
Fix error type/message for duplicate value in axis (#5987)

Bug Fixes

Do not use cuTENSOR unless available (#5885)
Fix non-deterministic behavior in cupy.random.shuffle (#5887)
Fix ndarray.clip to match numpy (#5916)
Fix __repr__ of mode and scalar in cuTENSOR (#5917)
Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5931)
Fix ravel for strides 0 (#5998)
Fix cuTENSOR installation on Windows (#6022)
Allow generating cubins for the max known CC (#6024)

Documentation

Update upgrade guide (#5834)
Document ppc64le and aarch64 are supported on conda-forge (#5869)
Improve the comparison table (#5911)
Add footnotes for functions unimplemented in CuPy (#5954)
Update the docstring for cholesky (#5960)
Document CUPY_ACCELERATORS (#5975)
Add favicon to docs (#5983)
Support CUDA 11.5 on documents (#6006)
Replace favicon with high resolution one (#6008)
Fix typo in copyright line (#6035)

Tests

Clean up plan cache in a FFT slow test (#5825)
Copy source directory to support pip 21.3 (#5896)
Simplify legacy ROCm test script for FlexCI (#5936)
Relax sparse linalg testing tolerance (#5958)
CI: Fix ROCm build test (FlexCI) failing (#5965)
Improve handling of FlexCI test runs (#6002)
Upload cache even when test failed in FlexCI (#6003)
CI: Increase timeout for CUDA 11.4 / 11.5 tests (#6040)
CI: Do not run full combination test even for branch tests for ROCm (#5974)

Others

Avoid triggering docker workflow on release of forked repos (#5886)
Bump version to v9.6.0 (#6043)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar

Assets 82

30 Sep 08:29

kmaehashi

v9.5.0

aa82a99

v9.5.0

This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

Enhancements

Support cuDNN 8.2.4 (#5744)
Support NCCL 2.11.4 (#5747)
Fix cupyx.optimize to save file when no optimization ran (#5760)

Bug Fixes

Fix spline filter with large array (#5686)
Fix exception for indexing with multiple ellipses (#5739)
Fix docstring for fallback modules (#5742)
Include stdexcept in hip headers (#5777)
Fixed typo in error message in sparse.csr_matrix (#5788)
Fix MAX_NDIM and add guards/tests (#5798)
Disable spmm on Windows CUDA 10.2 (#5805)

Documentation

Fix random docstring (#5708)
Remove --pre from ROCm source build instructions (#5782)
Use custom index for pre-release wheels (#5793)

Installation

Add maintainers in setup.py (#5758)
Bump version to v9.5.0 (#5808)

Tests

Update test_eigenvalue.py (#5643)
Improve performance of TestSplineFilter1dLargeArray (#5694)
Stop inheriting unittest.TestCase for performance (#5710)
TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5729)
Make testing helpers support non-methods (#5731)
Make test parameter names static (#5733)
Update pip and setuptools in Windows CI (#5738)
Improve FlexCI output (#5796)
Fix error message comparison (#5806)

Others

Add workflow to test/build/push docker images on pull-request/release (#5752)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar

Assets 74

30 Sep 08:28

kmaehashi

v10.0.0b3

4053fa9

v10.0.0b3 Pre-release

This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Array API initial support (#5698)

This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.

Changes without compatibility

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and on Twitter, the minimum CUDA version that will be supported by CuPy v10 is CUDA 10.2.

Drop support for Python 3.6 (#5771)

Following the Python 3.6 end of life in December 2021, and in line with NumPy's compatibility policy, Python 3.6 is no longer supported starting with CuPy v10.

Alpha/Beta/RC wheels no longer distributed through PyPI

As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.
Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

New Features

Add binomial distribution to new Generator (#5429)
Adopt the numpy.array_api module as cupy.array_api (#5698)

Enhancements

Improve stream mismatch error message (#5706)
Support cuDNN 8.2.4 (#5726)
Support NCCL 2.11.4 (#5734)
Fix cupyx.optimize to save file when no optimization ran (#5757)
Adding bitorder support to cupy.unpackbits (#5765)
Drop support for CUDA 10.1 or earlier (#5770)
Drop support for Python 3.6 (#5771)

Bug Fixes

Fix spline filter with large array (#5673)
Fix exception for indexing with multiple ellipses (#5718)
Fix docstring for fallback modules (#5728)
Fix MAX_NDIM and add guards/tests (#5749)
Fixed typo in error message in sparse.csr_matrix (#5767)
Include stdexcept in hip headers (#5769)
Disable spmm on Windows CUDA 10.2 (#5802)

Code Fixes

Prefix Cython compile_time_env with CUPY_ (#5740)

Documentation

Use custom index for pre-release wheels (#5772)
Remove --pre from ROCm source build instructions (#5773)

Installation

Reorganize build scripts, part 1 (#5730)
Reorganize build scripts, part 2: separate modules (#5743)
Reorganize build scripts, part 3: simplify setup.py (#5745)
Reorganize build scripts, part 4: remove global cupy_setup_options (#5754)
Reorganize build scripts, part 5: remove Cython version check (#5755)
Add maintainers in setup.py (#5756)
Bump version to v10.0.0b3 (#5807)

Tests

Make testing helpers support non-methods (#5594)
Stop inheriting unittest.TestCase for performance (#5599)
Eliminate random test ids (#5659)
Improve performance of TestSplineFilter1dLargeArray (#5693)
TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5724)
Make test parameter names static (#5727)
Update pip and setuptools in Windows CI (#5735)
Improve FlexCI output (#5786)
Skip tests for bug cases (FFT on CUDA 10.2 + Pascal) (#5791)
Fix error message comparison (#5799)
Fix test skip issue (#5801)

Others

Update auto-notify bot for array-api label (#5725)
Fix backport trigger (#5741)
Add workflow to test/build/push docker images on pull-request/release (#5746)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar

Assets 47

26 Aug 07:39

emcastillo

v10.0.0b2

836f41b

v10.0.0b2 Pre-release

This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA Python (#5638)

CuPy is one of the first libraries providing support for the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy as written in the documentation.

Support for AMD ROCm 4.3

Support for ROCm 4.3 has been added in this release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH environment variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.
We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.

Changes

New Features

Support batched QR solver (#5583)
Add cupyx.scipy.sparse.linalg.minres (#5585)
Add Log Series distribution to cupy.random.Generator (#5618)
Add Power distribution to cupy.random.Generator (#5624)
Add support for CUDA Python (#5638)
Add Chi-square distribution to cupy.random.Generator (#5645)
Add Dirichlet distribution to cupy.random.Generator (#5648)
Add F distribution to cupy.random.Generator (#5655)

Enhancements

Add ncclAvg and ncclBfloat16 for NCCL (#5545)
Add new eigensolvers from rocSOLVER (#5555)
Add support for array input in beta distribution of cupy.random.Generator (#5573)
Release the GIL for several NCCL ops (#5574)
Allow to compile using PTX with an envvar (#5622)
Show CUDA Python version (#5651)
Fix version check for new ROCm version definition (#5657)
Rest of version check fix for new ROCm version definition (#5660)
Add ROCm 4.3 in duplicate detection (#5669)

Bug Fixes

Fix compute capability check (#5600)
Fix FFT convolve for shapes containing 1 (#5609)
Fix squareness checks (#5642)
Fix unique for empty array (#5654)

Code Fixes

Add batch_identity helper (#5614)
Remove unnecessary comments (#5631)

Documentation

Update Sphinx to 4.1.2 (#5612)
Fix random docstring (#5628)
Support ROCm v4.3 in document (#5633)
__array_function__ feature by default (#5644)

Tests

Fix skipTest in test_decomp_lu (#5593)
Mark lsmr tests xfail for CSR matrices on HIP (#5597)
Increase test timeout (#5601)
Fix cubic for_all_dtypes_combination tests (#5629)
Add CI for ROCm 4.3 (#5630)
Reload GPG key for ROCm 4.2 test (#5636)
Fix branch name of cuda-python (#5650)
Add a workaround for ROCm 4.3.0 for testing (#5662)

Others

Add cupy-cuda114 to duplicate detection (#5621)
Bump version to v10.0.0b2 (#5679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@hauntsaninja @leofang @povinsahu1909 @yashasvimisra2798

Assets 78

26 Aug 07:39

kmaehashi

v9.4.0

58f3db2

v9.4.0

This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

This release changes the NVRTC compile process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a new CUDA Toolkit version can be run with earlier CUDA Drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information. We believe most users will not be affected by this change, but you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable if needed.
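
When the variable cannot be set in the shell, it can also be set from Python. The sketch below uses a hypothetical helper, request_ptx_compilation. It assumes that setting the variable before CuPy is imported is the safe choice, since this note does not state when CuPy reads it.

```python
import os
import sys
import warnings


def request_ptx_compilation():
    """Ask CuPy to emit PTX instead of SASS via CUPY_COMPILE_WITH_PTX=1."""
    if "cupy" in sys.modules:
        # Assumption: the variable should be in place before CuPy is imported;
        # if CuPy is already loaded, the setting may come too late.
        warnings.warn("cupy is already imported; the setting may be ignored")
    os.environ["CUPY_COMPILE_WITH_PTX"] = "1"


request_ptx_compilation()
# import cupy  # import CuPy only after the variable has been set
```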

Support for AMD ROCm 4.3

Changes

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5611)
Allow to compile using PTX with an envvar (#5634)
Add ncclAvg and ncclBfloat16 for NCCL (#5656)
Fix version check for new ROCm version definition (#5661)
Rest of version check fix for new ROCm version definition (#5670)

Bug Fixes

Fix FFT convolve for shapes containing 1 (#5613)
Fix the RTC call path for HIP (#5620)
Fix compute capability check (#5646)
Fix squareness checks (#5652)
Fix unique for empty array (#5658)

Code Fixes

Fix kernel names to be consistent (#5625)
Remove unnecessary comments (#5635)

Documentation

Update Sphinx to 4.1.2 (#5616)
__array_function__ feature by default (#5653)
Support ROCm v4.3 in document (#5674)

Tests

Increase test timeout (#5615)
Increase timeout for CUDA 11.4 tests (#5617)
Add CI for ROCm 4.3 (#5632)
Reload GPG key for ROCm 4.2 test (#5637)
Fix cubic for_all_dtypes_combination tests (#5639)
Add a workaround for ROCm 4.3.0 for testing (#5663)
Fix skipTest in test_decomp_lu (#5672)

Others

Bump version to v9.4.0 (#5680)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang @yashasvimisra2798

Assets 74

05 Aug 08:20

kmaehashi

v10.0.0b1

4ebc827

v10.0.0b1 Pre-release

This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.4 (`cupy-cuda114`)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.

Google Summer of Code

CuPy is participating in Google Summer of Code under the NumFOCUS organization.

Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

Changes without compatibility

Support the new DLPack exchange protocol (#5306)

By adopting the new DLPack exchange protocol proposed in the Python array API standard, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
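
Code that has to run on both older and newer CuPy releases may want a small shim around the rename. The following is a sketch only: from_dlpack_compat is a hypothetical helper, xp stands for the array module (for example cupy), and the fallback assumes the producer object implements __dlpack__ and that the legacy fromDlpack accepts the capsule it returns.

```python
def from_dlpack_compat(xp, obj):
    """Convert obj into an array of module xp through DLPack.

    xp is the consuming array module (for example cupy); obj is any object
    that implements the __dlpack__ protocol.
    """
    new_api = getattr(xp, "from_dlpack", None)
    if new_api is not None:
        # New protocol: the consumer receives the producer object itself.
        return new_api(obj)
    # Assumed fallback for older releases: the deprecated function takes a
    # DLPack capsule, which the producer exports via __dlpack__().
    return xp.fromDlpack(obj.__dlpack__())
```

A call such as from_dlpack_compat(cupy, tensor) then uses cupy.from_dlpack where it exists and only falls back to the deprecated name otherwise.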

Known Issues

cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

Texture memory 2D/3D affine transformations (#5171)
Support the new DLPack exchange protocol (#5306)
Add cupyx.scipy.sparse.linalg.lsmr (#5331)
JIT: Support all atomic intrinsics (#5387)
Expose _GUFunc through cupyx (#5408)
Add geometric distribution to new Generator (#5443)
Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)
Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
Add cupyx.scipy.sparse.linalg.cgs (#5524)
Add hypergeometric distribution to new Generator (#5560)

Enhancements

Compile with SASS for CUDA versions >= 11.1 (#5097)
Support NCCL v2.9.9 (#5268)
Support CUDA 11.4 and compute_86 (#5434)
Update NumPy/SciPy pinning in setup.py (#5453)
Make matrix_power support stacked matrices (#5458)
Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)
Add cudaDeviceDisablePeerAccess wrapper (#5495)
Support cuDNN v8.2.2 (#5516)
Support NCCL v2.10.3: library installer and document (#5521)

Bug Fixes

JIT: Fix supported dtype of atomic_add on HIP (#5383)
Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)
Fix astype from boolean (#5410)
Fix compatibility issues of ndarray.view (#5428)
Fix types attribute of ufunc (#5448)
Fix new DLPack protocol error messages and tests (#5449)
texture_memory option in affine_transform not supported by HIP (#5464)
Fix linalg.lstsq for empty matrix (#5467)
Fix reshape (#5470)
Fix random generator output not being raveled (#5478)
Fix random integers (#5479)
Fix availability tests in cuSOLVER and cuSPARSE (#5492)
Add missing hipSPARSE include to builder (#5515)
prune cuFFT static lib by major cc ver (#5531)
Fix casts from bool in ufunc inputs (#5539)
Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)
Fix casts in ufunc outputs (#5550)
Code fix for {cu, roc}SOLVER (#5558)
Fix CUDA API call on module initialization (#5561)
Fix the RTC call path for HIP (#5569)
Fix broadcast error messages (#5579)

Code Fixes

Do not call cudnnGetVersion on import (#5326)
JIT: Fix __call__() for built-in functions (#5361)
Add HIP symbol redefinitions (#5362)
Remove the data member use_32bit_indexing from CArray (#5376)
Use dtype.name instead dtype.char (#5444)
Try to use -I in hipRTC (#5486)
Hide modules from public APIs (#5522)
consistent kernel names (#5551)
Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)

Documentation

Add upgrade guide for v10 (#5278)
Update tag lines in package description and docs index (#5399)
Fix typo in apply_along_axis (#5432)
Fix indent of Returns section (#5433)
Update user_guide/basic.rst device agnostic section (#5435)
Support CUDA 11.4 on documents (#5447)
Update install guide with new NumPy/SciPy versions (#5454)
Use from_dlpack instead of fromDlpack (#5488)
Use Sphinx 4.1.0 (#5489)
Bump ReadTheDocs configuration to version 2 (#5491)
Fix docs of eigh and eigvalsh (#5494)
Add a lingering doc page for fromDlpack() (#5509)
Document scipy.fft backend usage (#5514)
Replaced the links for NumPy docs as per issue #3418 (#5548)
Use Sphinx's envvar construct (#5570)
Fix intersphinx for SciPy 1.7.1 docs (#5587)

Installation

Fix license_file option in setup.cfg (#5406)
Import numpy before Cython (#5482)

Tests

Add tests for num_to_num's optional parameters (#5337)
Add script for ROCm CI on Jenkins (#5378)
Skip unwrap tests for numpy<1.21 (#5384)
Enable strict xfail in pytest (#5407)
Remove xfail in windows jitify test (#5409)
Fix preloading slow tests (#5440)
Add script for CUDA 11.4 CI on FlexCI (#5457)
Increase memory for CUDA 11.4 tests (#5477)
Fix DLPack test for ROCm/HIP (#5485)
Fix "Revert test decorators order" (#5498)
Fix some tests for HIP (#5501)
Fix FlexCI Linux tests (#5505)
Add CUDA 11.4 for FlexCI helper script (#5528)
Increase timeout for CUDA 11.4 tests (#5575)
Update tests to install all requirements and add PATH (#5576)
Add Cython to all requirements (#5577)

Others

Notify conflict by mergify (#5371)
Fix mergify to only comment when pull-request is open (#5439)
Fix mergify condition (#5513)
Add auto notify bot for hip label (#5538)
Use pull_request_target instead for auto notify bot (#5541)
Fix auto notify bot for issues (#5546)
Disable Mergify's auto-merge (#5556)
Bump version to v10.0.0b1 (#5595)
Fix signal tests for scipy 1.7.0 (#5368)
Fix numpy.unwrap for NumPy 1.21 (#5385)
Fix signaltools medfilt for scipy>=1.7.0 (#5386)
Fix deprecated numpy.typeDict utilization (#5388)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay

Assets 66

05 Aug 08:20

kmaehashi

v9.3.0

c8a3cc9

v9.3.0

This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.4 (`cupy-cuda114`)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.

Known Issues

cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

Support NCCL v2.9.9 (#5402)
Update NumPy/SciPy pinning in setup.py (#5471)
Support CUDA 11.4 and support compute_86 (#5519)
Support cuDNN v8.2.2 (#5523)
Make matrix_power support stacked matrices (#5525)
Support NCCL v2.10.3: library installer and document (#5526)

Bug Fixes

JIT: Fix supported dtype of atomic_add on HIP (#5405)
Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5416)
Fix compatibility issues of ndarray.view (#5442)
Fix types attribute of ufunc (#5455)
Fix random integers (#5484)
Fix random generator output not being raveled (#5487)
Fix astype from boolean (#5490)
Fix reshape (#5504)
Fix linalg.lstsq for empty matrix (#5506)
Add missing checks and _setStream() (#5507)
Fix availability tests in cuSOLVER and cuSPARSE (#5534)
prune cufft static lib by major cc ver (#5536)
Fix casts from bool in ufunc inputs (#5549)
Code fix for {cu, roc}SOLVER (#5566)
Access cudaMemoryType in the pointer attributes and fix for HIP (#5571)
Fix broadcast error messages (#5584)
Fix casts in ufunc outputs (#5589)
Fix broken build on CUDA 9.2 (#5598)

Code Fixes

Remove the data member use_32bit_indexing from CArray (#5414)
JIT: Fix __call__() for built-in functions (#5422)
Do not call cudnnGetVersion on import (#5446)
Add HIP symbol redefinitions (#5475)
Try to use -I in hipRTC (#5502)
Hide modules from public APIs (#5533)
Use the new macro __HIP_PLATFORM_AMD__ at build time (#5565)

Documentation

Update tag lines in package description and docs index (#5415)
Fix typo in apply_along_axis (#5441)
Fix indent of Returns section (#5452)
Update user_guide/basic.rst device agnostic section (#5456)
Update install guide with new NumPy/SciPy versions (#5465)
Bump ReadTheDocs configuration to version 2 (#5497)
Fix docs of eigh and eigvalsh (#5499)
Use Sphinx 4.1.0 (#5500)
Document scipy.fft backend usage (#5532)
Support CUDA 11.4 on documents (#5535)
Replaced the links for NumPy docs as per issue #3418 (#5553)
Use Sphinx's envvar construct (#5586)
Fix intersphinx for SciPy 1.7.1 docs (#5588)

Installation

Fix license_file option in setup.cfg (#5411)
Import numpy before Cython (#5483)

Tests

Skip unwrap tests for numpy<1.21 (#5412)
Remove xfail in windows jitify test (#5418)
Enable strict xfail in pytest (#5423)
Add missing DLPack test for complex numbers (#5425)
Fix unwrap tests for v9 (#5426)
Fix preloading slow tests (#5445)
Add script for ROCm CI on Jenkins (#5468)
Add script for CUDA 11.4 CI on FlexCI (#5473)
Increase memory for CUDA 11.4 tests (#5480)
Fix "Revert test decorators order" (#5518)
Fix FlexCI Linux tests (#5520)
Add CUDA 11.4 for FlexCI helper script (#5543)
Fix scipy requirement in tests (#5563)
Fix some tests for HIP (#5578)
Update tests to install all requirements and add PATH (#5581)
Add Cython to all requirements (#5582)

Others

Notify conflict by mergify (#5419)
Fix mergify to only comment when pull-request is open (#5510)
Fix mergify condition (#5517)
Add auto notify bot for hip label (#5540)
Use pull_request_target instead for auto notify bot (#5542)
Fix auto notify bot for issues (#5547)
Disable Mergify's auto-merge (#5562)
Bump version to v9.3.0 (#5596)
Fix deprecated numpy.typeDict utilization (#5403)
Fix signal tests for SciPy 1.7.0 (#5413)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @leofang @maxim-belkin @Palash-Vishnani

Assets 74

24 Jun 08:32

asi1024

v10.0.0a2

827dfba

v10.0.0a2 Pre-release

This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2); binary wheels for both are now available on PyPI.
The following Python syntax and new APIs can now be used in JIT target functions.
- Calling len, min, max Python built-ins.
  - len(arr): Equivalent to arr.shape[0].
  - min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
  - max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
- Accessing .ndim, .size attributes of ndarray.
- Unpacking nested tuples.
  - (x, y), z = ...
- jit.grid() API, similar to numba.cuda.grid.
  - x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x.)
- Warp shuffle and sync functions.
  - cupyx.jit.shfl_down_sync(mask, var, val_id) (equivalent to CUDA's __shfl_down_sync(mask, var, val_id))
cupyx.scipy.sparse.{coo,csr,csc}_matrix now provides the reshape method.
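
The index arithmetic behind jit.grid() can be checked on the host. The sketch below is plain Python, not a JIT kernel: grid_model and first_axis_length are hypothetical names that only model the formulas quoted in the list above (threadIdx + blockIdx * blockDim per axis, and len(arr) as arr.shape[0]).

```python
def grid_model(thread_idx, block_idx, block_dim):
    """Host-side model of the per-axis index returned by jit.grid(3).

    Each argument is an (x, y, z) tuple; every axis follows the formula
    threadIdx + blockIdx * blockDim quoted in the highlights.
    """
    return tuple(t + b * d for t, b, d in zip(thread_idx, block_idx, block_dim))


def first_axis_length(shape):
    """Model of len(arr) in a JIT target function: arr.shape[0]."""
    return shape[0]


# Thread (1, 2, 3) of block (2, 0, 1) with blocks of 32 x 4 x 2 threads.
x, y, z = grid_model(thread_idx=(1, 2, 3), block_idx=(2, 0, 1), block_dim=(32, 4, 2))
print(x, y, z)  # 65 2 5
```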

Changes without compatibility

Drop CUDA 9.2 & NCCL 2.4 Support (#5214)

CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.

Changes in Stream behavior (#5251)

The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call cudaStreamDestroy) if the stream is the current stream of any thread.
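
As an illustration of the sharing pattern, the sketch below passes one stream object to several threads, each of which enters it as a context manager. The helper run_on_shared_stream is hypothetical; it assumes that the stream argument is a cupy.cuda.Stream (or any object usable in a with statement) and that work(i) enqueues the per-thread GPU work.

```python
import threading


def run_on_shared_stream(stream, work, n_threads):
    """Run work(i) in n_threads threads that all use the same stream object."""
    results = [None] * n_threads

    def worker(i):
        # Each thread enters the shared stream; with CuPy this is expected to
        # make it the current stream for the calling thread.
        with stream:
            results[i] = work(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With CuPy installed, a call could look like run_on_shared_stream(cupy.cuda.Stream(), work, 4), where work is a user-supplied function.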

Known Issues

cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

Add reshape method for COO, CSR and CSC matrices (#5301)
Support len, min, max, .ndim, .size in jit (#5319)
Support nested tuple unpack in CuPy JIT (#5332)
Support Numba-like jit.grid() syntax in CuPy JIT (#5334)
Support warp shuffle and sync functions in CuPy JIT (#5335)

Enhancements

Do not use handles unless requested in cupy.show_config() (#5073)
Fix to allow sharing a Stream instance between threads (#5251)
Adding GUFunc order, dtype and casting kwarg support (#5260)
Support nan, posinf, neginf in cupy.nan_to_num (#5295)
Use independent version of hipFFT for ROCm 4.1 and later (#5318)
Support cuTENSOR v1.3.1 (#5338)
Support cuDNN v8.2.1 (#5357)

Performance Improvements

Make cuTENSOR available in cupy.einsum (#5203)

Bug Fixes

Fix check_availability for cupy.cusolver (#5207)
Fix MemoryAsync to keep a weakref to stream (#5264)
Fix cuFFT callback for sm_61 etc (#5304)
Fix cuDNN preloading (#5327)
Fix large arrays assignment (#5330)
Ensure source array is C-contiguous before copying to CUDAArray (#5342)
Increase test coverage for Generalized Universal Functions (#5344)
Remove unnecessary print (#5374)

Code Fixes

Fix cub repository url (#5236)
Code and comment fixes for stream (#5243)
Use cdef instead of cpdef where appropriate (#5274)

Documentation

Fix matmul docstring (#5174)
Update list of wheels in README (#5267)
Add user guide for FFT (#5272)
Bump CuPy version in docs (#5277)
Add user guide for streams & events (#5283)
Fix deadlink to tutorial and reorder in README (#5287)
Document ExternalStream (#5305)
Add ROCm 4.2 support to install docs (#5354)
user_guide/basic.rst: various improvements (#5356)

Installation

Drop support for CUDA 9.2 & NCCL 2.4 (#5214)
Add upper restrictions to NumPy/SciPy versions (#5225)
Exclude Cython 3 from setup_requires (#5273)

Tests

Fix threading memory pool tests (#5263)
Temporarily remove the async pool test from TestAllocator (#5308)
Fix Windows CI kernel cache (#5310)
Tentatively skip unstable MemoryPoolAsync tests (#5350)
Xfail random generator tests for HIP (#5355)
Tentatively pin to SciPy 1.6 in Windows CI (#5366)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909

Assets 58

24 Jun 08:32

asi1024

v9.2.0

83d5e6d

v9.2.0

This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2); binary wheels for both are now available on PyPI.

Known Issues

cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

Add CUDA 11.3 headers (#5232)
Do not use handles unless requested in cupy.show_config() (#5285)
Use independent version of hipFFT for ROCm 4.1 and later (#5351)
Support cuTENSOR v1.3.1 (#5370)
Support cuDNN v8.2.1 (#5372)

Bug Fixes

MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
Fix MemoryAsync to keep a weakref to stream (#5307)
Fix cuFFT callback for sm_61 etc (#5325)
Fix large arrays assignment (#5333)
Fix check_availability for cupy.cusolver (#5336)
Fix cuDNN preloading (#5365)
Ensure source array is C-contiguous before copying to CUDAArray (#5375)
Remove unnecessary print (#5377)

Code Fixes

Use cdef instead of cpdef where appropriate (#5274)
Fix cub repository url (#5288)

Documentation

Fix matmul docstring (#5281)
Update list of wheels in README (#5284)
Add user guide for FFT (#5286)
Fix deadlink to tutorial and reorder in README (#5291)
Add user guide for streams & events (#5302)
Document ExternalStream (#5312)
user_guide/basic.rst: various improvements (#5356)
Add ROCm 4.2 support to install docs (#5360)

Installation

Exclude Cython 3 from setup_requires (#5273)
Add upper restrictions to NumPy/SciPy versions (#5321)

Tests

Fix threading memory pool tests (#5289)
Fix Windows CI kernel cache (#5317)
Xfail random generator tests for HIP (#5359)
Tentatively pin to SciPy 1.6 in Windows CI (#5369)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@leofang @maxim-belkin

Assets 66

27 May 07:50

emcastillo

v10.0.0a1

b01641d

v10.0.0a1 Pre-release

This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

Current stream is now managed per device (#5172)

CuPy now switches the current stream automatically when the active device changes, so users no longer need to switch streams themselves.

This pull request also includes a bug fix for #5143. Existing code that mixes with stream: blocks with stream.use() may get different results, because a stream set via the use() API is not reactivated when a stream context exits.

import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()  # set outside of a `with` block, so it is not restored later
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.
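The per-device behavior described above can be sketched as follows. This is an illustrative sketch rather than code from the pull request: the helper name current_streams_per_device is hypothetical, it assumes CuPy v10 and a machine with at least two GPUs, and cupy is imported inside the function so that the definition itself loads without a GPU.

```python
def current_streams_per_device():
    """Check that each device keeps its own current stream (needs >= 2 GPUs).

    Returns a pair of booleans, or None when fewer than two devices exist.
    """
    import cupy  # imported lazily; requires a working CUDA/ROCm setup

    if cupy.cuda.runtime.getDeviceCount() < 2:
        return None

    with cupy.cuda.Device(0):
        s0 = cupy.cuda.Stream()
        s0.use()  # s0 becomes the current stream of device 0
    with cupy.cuda.Device(1):
        s1 = cupy.cuda.Stream()
        s1.use()  # s1 becomes the current stream of device 1

    # Switching devices should bring back the stream set for that device,
    # without calling use() again.
    with cupy.cuda.Device(0):
        on_dev0 = cupy.cuda.get_current_stream().ptr == s0.ptr
    with cupy.cuda.Device(1):
        on_dev1 = cupy.cuda.get_current_stream().ptr == s1.ptr

    # Restore the default (null) stream on both devices before returning.
    for dev in (0, 1):
        with cupy.cuda.Device(dev):
            cupy.cuda.Stream.null.use()

    return on_dev0, on_dev1
```

Under the behavior described in this release note, both values are expected to be True. In earlier versions the current stream was not tracked per device, so the stream had to be switched manually after every device change.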

Make `cupy.cuda.Device` context manager interface thread safe (#5083)

Sharing a single cupy.cuda.Device context manager object across multiple threads has led to incorrect behavior when restoring the previous device ever since the first versions of CuPy. The correct device is now restored, so user code that relied on the old, incorrect behavior might need to be updated.
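To make this class of bug concrete, here is a toy model in pure Python. It does not use CuPy and is not CuPy's actual implementation: UnsafeDevice, SafeDevice, get_device, set_device and run are hypothetical names, and the "current device" is simulated with a thread-local variable. The sketch only shows why keeping the previous device on a shared context manager object breaks across threads, and how a per-thread stack avoids it.

```python
import threading

_state = threading.local()  # simulated per-thread "current device"


def get_device():
    return getattr(_state, "device", 0)


def set_device(device_id):
    _state.device = device_id


class UnsafeDevice:
    """Stores the previous device on the instance, which threads share."""

    def __init__(self, device_id):
        self.id = device_id
        self._prev = None

    def __enter__(self):
        self._prev = get_device()  # overwritten by whichever thread enters last
        set_device(self.id)
        return self

    def __exit__(self, *exc):
        set_device(self._prev)


class SafeDevice:
    """Stores previous devices on a per-thread stack."""

    def __init__(self, device_id):
        self.id = device_id

    def __enter__(self):
        stack = getattr(_state, "stack", None)
        if stack is None:
            stack = _state.stack = []
        stack.append(get_device())
        set_device(self.id)
        return self

    def __exit__(self, *exc):
        set_device(_state.stack.pop())


def run(ctx_cls):
    """Share one context manager between two threads with a fixed interleaving."""
    shared = ctx_cls(1)
    a_entered, b_entered = threading.Event(), threading.Event()
    restored = {}

    def thread_a():
        set_device(0)
        with shared:
            a_entered.set()
            b_entered.wait()  # leave the block only after B has entered it
        restored["a"] = get_device()

    def thread_b():
        set_device(2)
        a_entered.wait()
        with shared:
            b_entered.set()
        restored["b"] = get_device()

    threads = [threading.Thread(target=t) for t in (thread_a, thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return restored


print(run(UnsafeDevice))  # thread A ends up on device 2 instead of 0
print(run(SafeDevice))    # each thread returns to its own previous device
```

In the unsafe variant thread B overwrites the saved device before thread A leaves the block, so A is "restored" to B's device. Code that happened to depend on that outcome is the kind of code the note above says may need updating.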

Deprecate `cupyx.allow_synchronize` and `cupyx.DeviceSynchronized` APIs (#5226)

These APIs, which were used to detect when synchronization with a device happens, have been deprecated because they do not provide reliable behavior.

Changes

Note: many of these PRs are backported to the v9 series and available since the release.

New Features

CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#4592)
Add APIs for creating NumPy arrays backed by pinned memory (#4870)
Support cuSPARSELt (#4883)
Add gamma distributions to random API (#4905)
Add random for uniform [0, 1) generation (#4906)
Add poisson distribution to random API (#4927)
Add SciPy compatible connected_components (#4940)
Support shared memory in CuPy JIT (#4950)
Add cupyx.scipy.sparse.kronsum() (#4968)
Add hfft2, ihfft2, hfftn, and ihfftn to cupyx.scipy.fft (#4996)
CuPy JIT: Print kernel code (#5017)
Add cupyx.jit.atomic_add (#5169)
CUDA 11.2/11.3: Support MemoryAsyncPool statistics and limits (#5177)

Enhancements

Ability to pass structured data types by value as kernel parameters (#4829)
Move the NVTX module to cupy_backends.cuda.libs (#4930)
Disable CUB SpMV on CUDA 11.x (#4949)
CuPy JIT: Readable compile error messages (#4991)
Fix JIT test failures on ROCm (#4998)
Mark cupyx.jit.rawkernel as experimental (#5005)
HIP: add -ftz=true (#5007)
Give gufunc a name (#5013)
CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5028)
Add PCI Bus ID to show_config (#5037)
Print cuSPARSELt version in show_config (#5054)
Support custom getsource option in CuPy JIT (#5071)
Make cupy.cuda.Device context manager interface thread safe (#5083)
Add a new argument out to cupy.asnumpy() (#5155)
Support cuSPARSELt v0.1.0 (#5158)
Per device stream (#5172)
cuTENSOR v1.3.0 for library installer (#5192)
Add sum_labels to cupyx.scipy.ndimage.measure (#5200)
Support NCCL v2.9.8 (#5201)
Fix thrust compilation for ROCm 4.2.0 (#5209)
Add NVCC path and Python version to show_config (#5215)
Add CUDA 11.3 headers (#5218)
Add libraries for CUDA 11.3 (#5219)
Remove syncdetect APIs (#5226)

Bug Fixes

Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5002)
Use async memcpy in ndarray.copy (#5004)
Fix DLPack lanes (#5045)
Disable cuFFT plan cache on CUDA 11.1 (#5046)
Support PTDS in CuPy memory pool (#5072)
CuPy JIT: Fix range type (#5077)
Fix poisson to support lam array (#5087)
Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5103)
Bugfix for typing rule of CuPy JIT (#5125)
Fix TypeError in svds (#5140)
Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5168)
Fix integer scatter_add failure on Windows (#5173)
MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5191)
Fix matmul for input with relaxed strides (#5205)
Add check_availability for cuTENSOR routines (#5206)
Fix Windows constexpr (#5233)
Remove duplicated subtraction in cupy.random.Generator.integers (#5247)

Code Fixes

Rename cupy.core submodule to cupy._core (#3820)
Fix some internal cpdef functions to cdef in _kernel.pyx (#5084)
Remove cupy.cupy (#5121)
Cosmetic change in cuSPARSELt stub header (#5149)
Cosmetic changes of CuPy JIT implementation (#5152)

Documentation

Follow the latest NumPy/SciPy docs style (#4945)
Fix docs: cupy-cuda112 now on PyPI (#4957)
Update installation guide for Conda-Forge (#4985)
CuPy JIT documentation (#5012)
Document cupyx.time.repeat (#5015)
Document cupy.cuda.runtime.getDeviceProperties (#5016)
More documentation on the supported backends (#5019)
Add links to Anaconda, Gitter, StackOverflow (#5020)
Improve the documentation on interoperability (#5023)
Document CFunctionAllocator and ManagedMemory (#5025)
Fix code block in installation guide (#5033)
Improve comments for memory and stream API usage (#5060)
Point to the correct numpy random docs (#5088)
Add user guide (#5093)
Add ROCm limitations to docs (#5107)
Reorganize API reference pages (#5108)
Revise ROCm doc (#5122)
Fix docs of scatter_add (#5129)
Mention baseline API change in upgrade guide (#5131)
Fix ROCm wheel install steps (#5133)
Fix docstring in coo.py (#5139)
Fix docs in stream.pyx (#5144)
cuDNN v8.2 on documentation (#5148)
Mention PTDS in ROCm Limitation (#5159)
Use Sphinx 4 (#5188)
cuTENSOR v1.3 on documentation (#5196)
Fix cuSPARSELt not covered in docs (#5221)
Add cupyx.scipy.ndimage.sum_labels to docs (#5223)
Improve README (#5254)
Update logo image (#5255)
Tentatively remove CUDA 11.3 from support list (#5256)

Installation

Fix Windows dll loading for Conda (#4974)
Add warnings for duplicate installation (#5032)
cuDNN v8.2.0 for library installer (#5146)
Bump version to v10.0.0a1 (#5269)

Examples

Fix cuSPARSELt example not to use internal function (#4995)
Update examples for current version of CuPy (#4999)

Tests

Refactor random tests (#4907)
Tentatively pin CI to ROCm 4.0.1 (#4961)
Fix cutensor import in the test (#4965)
Make install_tests runnable without depending on current path (#4969)
Avoid using pip install -e on Windows CI for performance (#4970)
Update known base branches in flexCI config (#4973)
Update list of known branches (#4982)
Fix TestStream cleanup (#5042)
Mark some memory tests as testing.slow (#5061)
Fix stream usage on D2D copy test under HIP (#5091)
Xfail tests for random distribution generator under HIP/ROCm (#5096)
Adjust testing tolerance for hfftn for HIP/ROCm (#5099)
Use current device in tests (#5127)
Fix for updated FlexCI base image (#5164)
Relax tolerance of cupyx.jit.atomic_add test (#5186)
Test build for ROCm 4.0 and latest (#5224)
Fix mergify configuration (#5248)

Others

Use bot mode in automatic backport (#5051)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce

