
Releases: cupy/cupy

v9.6.0

11 Nov 07:02
3a1d844

This is the release note of v9.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Final release for v9.x series

This is expected to be the last release of the CuPy v9 series. Please start testing your workflows with CuPy v10.0.0rc1 and let us know if you have any feedback!

CuPy now supports CUDA 11.5

Wheels for CUDA 11.5 (cupy-cuda115) are now available.

Removal of Alpha/Beta/RC Wheels from PyPI

  • As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.
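A minimal sketch of building that install command from Python. The package name cupy-cuda115 is only an example (substitute the wheel matching your CUDA version), and the --pre flag is an addition to the command quoted above: by default pip skips alpha/beta/rc versions unless --pre is given or a pre-release version is pinned explicitly.

```python
import sys
from typing import List

# Custom index quoted in the release note
PRE_RELEASE_INDEX = "https://pip.cupy.dev/pre"


def prerelease_install_cmd(package: str) -> List[str]:
    """Build a pip command that searches the CuPy pre-release index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--pre",                  # let pip select alpha/beta/rc versions
        package,
        "-f", PRE_RELEASE_INDEX,  # -f/--find-links: extra location to look for wheels
    ]


if __name__ == "__main__":
    cmd = prerelease_install_cmd("cupy-cuda115")
    print(" ".join(cmd))
    # To actually install: subprocess.run(cmd, check=True)
```

Using sys.executable ties the install to the interpreter running the script, which avoids installing into a different environment than the one that will import CuPy.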

Changes

Enhancements

  • Make show_config runnable without GPU (#5839)
  • Merge fp16 headers for CUDA 11.2+ (#6004)
  • Support cuTENSOR 1.3.3 (#6005)
  • Support CUDA 11.5 for library installer (#6010)
  • Display license terms when downloading libraries (#6041)
  • Fix error type/message for duplicate value in axis (#5987)

Bug Fixes

  • Do not use cuTENSOR unless available (#5885)
  • Fix non-deterministic behavior in cupy.random.shuffle (#5887)
  • Fix ndarray.clip to match numpy (#5916)
  • Fix __repr__ of mode and scalar in cuTENSOR (#5917)
  • Fix max blocksize used in cupyx.optimizing.optimize for HIP (#5931)
  • Fix ravel for strides 0 (#5998)
  • Fix cuTENSOR installation on Windows (#6022)
  • Allow generating cubins for the max known CC (#6024)

Documentation

  • Update upgrade guide (#5834)
  • Document ppc64le and aarch64 are supported on conda-forge (#5869)
  • Improve the comparison table (#5911)
  • Add footnotes for functions unimplemented in CuPy (#5954)
  • Update the docstring for cholesky (#5960)
  • Document CUPY_ACCELERATORS (#5975)
  • Add favicon to docs (#5983)
  • Support CUDA 11.5 on documents (#6006)
  • Replace favicon with high resolution one (#6008)
  • Fix typo in copyright line (#6035)

Tests

  • Clean up plan cache in a FFT slow test (#5825)
  • Copy source directory to support pip 21.3 (#5896)
  • Simplify legacy ROCm test script for FlexCI (#5936)
  • Relax sparse linalg testing tolerance (#5958)
  • CI: Fix ROCm build test (FlexCI) failing (#5965)
  • Improve handling of FlexCI test runs (#6002)
  • Upload cache even when test failed in FlexCI (#6003)
  • CI: Increase timeout for CUDA 11.4 / 11.5 tests (#6040)
  • CI: Do not run full combination test even for branch tests for ROCm (#5974)

Others

  • Avoid triggering docker workflow on release of forked repos (#5886)
  • Bump version to v9.6.0 (#6043)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @drbeh @emcastillo @kmaehashi @leofang @takagi @toslunar

v9.5.0

30 Sep 08:29
aa82a99

This is the release note of v9.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

  • As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

Enhancements

  • Support cuDNN 8.2.4 (#5744)
  • Support NCCL 2.11.4 (#5747)
  • Fix cupyx.optimize to save file when no optimization ran (#5760)

Bug Fixes

  • Fix spline filter with large array (#5686)
  • Fix exception for indexing with multiple ellipses (#5739)
  • Fix docstring for fallback modules (#5742)
  • Include stdexcept in hip headers (#5777)
  • Fixed typo in error message in sparse.csr_matrix (#5788)
  • Fix MAX_NDIM and add guards/tests (#5798)
  • Disable spmm on Windows CUDA 10.2 (#5805)

Documentation

  • Fix random docstring (#5708)
  • Remove --pre from ROCm source build instructions (#5782)
  • Use custom index for pre-release wheels (#5793)

Installation

  • Add maintainers in setup.py (#5758)
  • Bump version to v9.5.0 (#5808)

Tests

  • Update test_eigenvalue.py (#5643)
  • Improve performance of TestSplineFilter1dLargeArray (#5694)
  • Stop inheriting unittest.TestCase for performance (#5710)
  • TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5729)
  • Make testing helpers support non-methods (#5731)
  • Make test parameter names static (#5733)
  • Update pip and setuptools in Windows CI (#5738)
  • Improve FlexCI output (#5796)
  • Fix error message comparison (#5806)

Others

  • Add workflow to test/build/push docker images on pull-request/release (#5752)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @kmaehashi @leofang @takagi @toslunar

v10.0.0b3

30 Sep 08:28
4053fa9
Pre-release

This is the release note of v10.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Array API initial support (#5698)

This release starts implementing the Array API standard for interoperability with other tensor libraries. Please check the CuPy documentation to see a list of the currently available features.

Changes without compatibility

Drop support for CUDA 10.1 or earlier (#5770)

As per the RFC in #5717 and on Twitter, the minimum CUDA version supported by CuPy v10 is CUDA 10.2.

Drop support for Python 3.6 (#5771)

Following the end of life of Python 3.6 in December 2021, and in line with the Python versions supported by NumPy, CuPy v10 no longer supports Python 3.6.
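To check whether an environment meets the two new minimums, a small helper can compare the interpreter version and the CUDA runtime version. This is a hypothetical sketch, not a CuPy API: it assumes the CUDA version is given as the runtime's encoded integer (1000 * major + 10 * minor), which is the form returned by, for example, cupy.cuda.runtime.runtimeGetVersion().

```python
import sys

MIN_PYTHON = (3, 7)  # CuPy v10 drops Python 3.6
MIN_CUDA = (10, 2)   # CuPy v10 drops CUDA 10.1 and earlier


def decode_cuda_version(encoded: int):
    """Decode a CUDA runtime version integer (1000 * major + 10 * minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def meets_cupy_v10_minimums(cuda_version: int, python_version=None) -> bool:
    """cuda_version: encoded integer, e.g. 10020 for CUDA 10.2."""
    if python_version is None:
        python_version = sys.version_info[:2]
    return (tuple(python_version) >= MIN_PYTHON
            and decode_cuda_version(cuda_version) >= MIN_CUDA)


if __name__ == "__main__":
    print(decode_cuda_version(11050))
    print(meets_cupy_v10_minimums(10010))
```

Comparing (major, minor) tuples rather than raw integers keeps the check readable and avoids mistakes with the encoding.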

Alpha/Beta/RC wheels no longer distributed through PyPI

  • As per the discussion in #5671, we stopped uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the recently introduced custom index (e.g., pip install cupy-cudaXXX -f https://pip.cupy.dev/pre). Note that the sdist package is available on PyPI for all versions.

  • Outdated (v8.0.0rc1 or earlier) pre-release binaries have been removed from PyPI. See #5667 for details.

Changes

New Features

  • Add binomial distribution to new Generator (#5429)
  • Adopt the numpy.array_api module as cupy.array_api (#5698)

Enhancements

  • Improve stream mismatch error message (#5706)
  • Support cuDNN 8.2.4 (#5726)
  • Support NCCL 2.11.4 (#5734)
  • Fix cupyx.optimize to save file when no optimization ran (#5757)
  • Adding bitorder support to cupy.unpackbits (#5765)
  • Drop support for CUDA 10.1 or earlier (#5770)
  • Drop support for Python 3.6 (#5771)

Bug Fixes

  • Fix spline filter with large array (#5673)
  • Fix exception for indexing with multiple ellipses (#5718)
  • Fix docstring for fallback modules (#5728)
  • Fix MAX_NDIM and add guards/tests (#5749)
  • Fixed typo in error message in sparse.csr_matrix (#5767)
  • Include stdexcept in hip headers (#5769)
  • Disable spmm on Windows CUDA 10.2 (#5802)

Code Fixes

  • Prefix Cython compile_time_env with CUPY_ (#5740)

Documentation

  • Use custom index for pre-release wheels (#5772)
  • Remove --pre from ROCm source build instructions (#5773)

Installation

  • Reorganize build scripts, part 1 (#5730)
  • Reorganize build scripts, part 2: separate modules (#5743)
  • Reorganize build scripts, part 3: simplify setup.py (#5745)
  • Reorganize build scripts, part 4: remove global cupy_setup_options (#5754)
  • Reorganize build scripts, part 5: remove Cython version check (#5755)
  • Add maintainers in setup.py (#5756)
  • Bump version to v10.0.0b3 (#5807)

Tests

  • Make testing helpers support non-methods (#5594)
  • Stop inheriting unittest.TestCase for performance (#5599)
  • Eliminate random test ids (#5659)
  • Improve performance of TestSplineFilter1dLargeArray (#5693)
  • TestSplineFilter1dLargeArray marked slow and reduced memory usage (#5724)
  • Make test parameter names static (#5727)
  • Update pip and setuptools in Windows CI (#5735)
  • Improve FlexCI output (#5786)
  • Skip tests for bug cases (FFT on CUDA 10.2 + Pascal) (#5791)
  • Fix error message comparison (#5799)
  • Fix test skip issue (#5801)

Others

  • Update auto-notify bot for array-api label (#5725)
  • Fix backport trigger (#5741)
  • Add workflow to test/build/push docker images on pull-request/release (#5746)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@christinahedges @emcastillo @iskode @kmaehashi @leofang @povinsahu1909 @takagi @toslunar

v10.0.0b2

26 Aug 07:39
836f41b
Pre-release

This is the release note of v10.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support for CUDA Python (#5638)

CuPy is one of the first libraries to support the newly released CUDA Python bindings. To try it, install cuda-python manually and set the CUPY_USE_CUDA_PYTHON=1 environment variable when building CuPy, as described in the documentation.
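A minimal sketch of that build, driven from Python. It assumes the source build goes through the sdist-only cupy package on PyPI and adds two pip options that are not part of the documented procedure: --pre (this is a pre-release) and --no-cache-dir, so that a wheel cached from an earlier build without the variable is not silently reused.

```python
import os
import sys


def cuda_python_build(base_env=None):
    """Return (commands, env) for building CuPy from source with CUDA Python enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUPY_USE_CUDA_PYTHON"] = "1"  # build-time switch named in the release note
    pip = [sys.executable, "-m", "pip", "install"]
    commands = [
        pip + ["cuda-python"],                      # the bindings must be installed manually
        pip + ["--pre", "--no-cache-dir", "cupy"],  # sdist: compiled with env applied
    ]
    return commands, env


if __name__ == "__main__":
    commands, env = cuda_python_build()
    for cmd in commands:
        print(" ".join(cmd))
        # subprocess.run(cmd, env=env, check=True)  # uncomment to run for real
```

Passing a copied environment to each subprocess keeps the variable scoped to the build instead of changing the parent process.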

Support for AMD ROCm 4.3

This release adds support for ROCm 4.3, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. As a workaround, set the LLVM_PATH environment variable to the llvm directory included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).
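The same workaround can be applied from Python when exporting the variable in the shell is inconvenient. This is a sketch under assumptions: the helper name is hypothetical, /opt/rocm-4.3 is only the default install prefix from the example above, and running it before importing CuPy is a conservative choice rather than a documented requirement.

```python
import os


def apply_rocm43_llvm_workaround(rocm_root="/opt/rocm-4.3", environ=os.environ):
    """Point LLVM_PATH at the llvm directory bundled with ROCm 4.3.

    An LLVM_PATH that is already set is left untouched. Call this before
    importing CuPy so the value is in place before any kernel is compiled.
    """
    llvm_dir = os.path.join(rocm_root, "llvm")
    return environ.setdefault("LLVM_PATH", llvm_dir)


if __name__ == "__main__":
    print(apply_rocm43_llvm_workaround())
    # import cupy  # import only after the variable is set
```

Using setdefault means a user-provided LLVM_PATH always wins over the hard-coded default.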

Announcements

Removal of Alpha/Beta/RC Wheels from PyPI

  • As per the discussion in #5671, we will stop uploading pre-release binary wheels to PyPI for the health of the ecosystem. Pre-release wheels can now be downloaded from the assets section of each GitHub release page (e.g., pip install cupy-cudaXXX -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2). Note that the sdist package is available on PyPI for all versions.

  • We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th. See #5667 for details.

Changes

New Features

  • Support batched QR solver (#5583)
  • Add cupyx.scipy.sparse.linalg.minres (#5585)
  • Add Log Series distribution to cupy.random.Generator (#5618)
  • Add Power distribution to cupy.random.Generator (#5624)
  • Add support for CUDA Python (#5638)
  • Add Chi-square distribution to cupy.random.Generator (#5645)
  • Add Dirichlet distribution to cupy.random.Generator (#5648)
  • Add F distribution to cupy.random.Generator (#5655)

Enhancements

  • Add ncclAvg and ncclBfloat16 for NCCL (#5545)
  • Add new eigensolvers from rocSOLVER (#5555)
  • Add support for array input in beta distribution of cupy.random.Generator (#5573)
  • Release the GIL for several NCCL ops (#5574)
  • Allow to compile using PTX with an envvar (#5622)
  • Show CUDA Python version (#5651)
  • Fix version check for new ROCm version definition (#5657)
  • Rest of version check fix for new ROCm version definition (#5660)
  • Add ROCm 4.3 in duplicate detection (#5669)

Bug Fixes

  • Fix compute capability check (#5600)
  • Fix FFT convolve for shapes containing 1 (#5609)
  • Fix squareness checks (#5642)
  • Fix unique for empty array (#5654)

Code Fixes

  • Add batch_identity helper (#5614)
  • Remove unnecessary comments (#5631)

Documentation

  • Update Sphinx to 4.1.2 (#5612)
  • Fix random docstring (#5628)
  • Support ROCm v4.3 in document (#5633)
  • __array_function__ feature by default (#5644)

Tests

  • Fix skipTest in test_decomp_lu (#5593)
  • Mark lsmr tests xfail for CSR matrices on HIP (#5597)
  • Increase test timeout (#5601)
  • Fix cubic for_all_dtypes_combination tests (#5629)
  • Add CI for ROCm 4.3 (#5630)
  • Reload GPG key for ROCm 4.2 test (#5636)
  • Fix branch name of cuda-python (#5650)
  • Add a workaround for ROCm 4.3.0 for testing (#5662)

Others

  • Add cupy-cuda114 to duplicate detection (#5621)
  • Bump version to v10.0.0b2 (#5679)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@hauntsaninja @leofang @povinsahu1909 @yashasvimisra2798

v9.4.0

26 Aug 07:39
58f3db2

This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

This release changes the NVRTC compilation process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit can run on older CUDA drivers. See the CUDA Compatibility Guide and the NVRTC documentation for details. We believe most users will not be affected by this change, but if needed you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable.
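One way to opt back into PTX for a single run, without changing the parent process, is to pass a modified environment to a child process. A minimal sketch; the helper name and the script name train.py are hypothetical.

```python
import os


def ptx_env(base=None):
    """Copy of the environment that asks CuPy to emit PTX instead of CUBIN."""
    env = dict(os.environ if base is None else base)
    env["CUPY_COMPILE_WITH_PTX"] = "1"
    return env


if __name__ == "__main__":
    env = ptx_env()
    print(env["CUPY_COMPILE_WITH_PTX"])
    # Run one script with the pre-SASS behavior (hypothetical script name):
    # subprocess.run(["python", "train.py"], env=env, check=True)
```

Setting the variable for the whole child process ensures it is present before CuPy is imported there, so no kernel is compiled with the default setting first.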

Support for AMD ROCm 4.3

This release adds support for ROCm 4.3, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. As a workaround, set the LLVM_PATH environment variable to the llvm directory included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).

Changes

Enhancements

  • Compile with SASS for CUDA versions >= 11.1 (#5611)
  • Allow to compile using PTX with an envvar (#5634)
  • Add ncclAvg and ncclBfloat16 for NCCL (#5656)
  • Fix version check for new ROCm version definition (#5661)
  • Rest of version check fix for new ROCm version definition (#5670)

Bug Fixes

  • Fix FFT convolve for shapes containing 1 (#5613)
  • Fix the RTC call path for HIP (#5620)
  • Fix compute capability check (#5646)
  • Fix squareness checks (#5652)
  • Fix unique for empty array (#5658)

Code Fixes

  • Fix kernel names to be consistent (#5625)
  • Remove unnecessary comments (#5635)

Documentation

  • Update Sphinx to 4.1.2 (#5616)
  • __array_function__ feature by default (#5653)
  • Support ROCm v4.3 in document (#5674)

Tests

  • Increase test timeout (#5615)
  • Increase timeout for CUDA 11.4 tests (#5617)
  • Add CI for ROCm 4.3 (#5632)
  • Reload GPG key for ROCm 4.2 test (#5637)
  • Fix cubic for_all_dtypes_combination tests (#5639)
  • Add a workaround for ROCm 4.3.0 for testing (#5663)
  • Fix skipTest in test_decomp_lu (#5672)

Others

  • Bump version to v9.4.0 (#5680)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang @yashasvimisra2798

v10.0.0b1

05 Aug 08:20
4ebc827
Pre-release

This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.4 (cupy-cuda114)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Support for compute capability 8.6 (GPUs of the RTX 30X0 and AX000 series) is also added.

Google Summer of Code

CuPy is participating in Google Summer of Code under the NumFOCUS organization.

Our student @povinsahu1909 is working hard to add sparse linear algebra solvers and to improve the compatibility of the new random number generation API.

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

This release changes the NVRTC compilation process to produce SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit can run on older CUDA drivers. See the CUDA Compatibility Guide and the NVRTC documentation for details.

Changes without compatibility

Support the new DLPack exchange protocol (#5306)

CuPy now adopts the new DLPack exchange protocol proposed in the Python array API standard. Accordingly, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
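The new protocol changes the call pattern: instead of exporting a capsule first and passing it to fromDlpack, the consumer receives the producer object itself. NumPy (1.23 or later) implements the same consumer function on the CPU, so the sketch below uses it as a GPU-free stand-in; with CuPy the consumer call would be cupy.from_dlpack(obj) on an object exposing __dlpack__ and __dlpack_device__.

```python
import numpy as np

# Producer: any array object implementing __dlpack__ / __dlpack_device__.
src = np.arange(6, dtype=np.float32).reshape(2, 3)

# Consumer: the new-style call takes the producer object directly.
# The legacy CuPy style exported a capsule first (obj.toDlpack()) and then
# called cupy.fromDlpack(capsule).
dst = np.from_dlpack(src)

print(dst.shape, dst.dtype)
```

Because the exchange is zero-copy, the consumer's array is a view of the producer's buffer rather than a duplicate.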

Known Issues

  • cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

  • Texture memory 2D/3D affine transformations (#5171)
  • Support the new DLPack exchange protocol (#5306)
  • Add cupyx.scipy.sparse.linalg.lsmr (#5331)
  • JIT: Support all atomic intrinsics (#5387)
  • Expose _GUFunc through cupyx (#5408)
  • Add geometric distribution to new Generator (#5443)
  • Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)
  • Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
  • Add cupyx.scipy.sparse.linalg.cgs (#5524)
  • Add hypergeometric distribution to new Generator (#5560)

Enhancements

  • Compile with SASS for CUDA versions >= 11.1 (#5097)
  • Support NCCL v2.9.9 (#5268)
  • Support CUDA 11.4 and compute_86 (#5434)
  • Update NumPy/SciPy pinning in setup.py (#5453)
  • Make matrix_power support stacked matrices (#5458)
  • Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)
  • Add cudaDeviceDisablePeerAccess wrapper (#5495)
  • Support cuDNN v8.2.2 (#5516)
  • Support NCCL v2.10.3: library installer and document (#5521)

Bug Fixes

  • JIT: Fix supported dtype of atomic_add on HIP (#5383)
  • Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)
  • Fix astype from boolean (#5410)
  • Fix compatibility issues of ndarray.view (#5428)
  • Fix types attribute of ufunc (#5448)
  • Fix new DLPack protocol error messages and tests (#5449)
  • texture_memory option in affine_transform not supported by HIP (#5464)
  • Fix linalg.lstsq for empty matrix (#5467)
  • Fix reshape (#5470)
  • Fix random generator output not being raveled (#5478)
  • Fix random integers (#5479)
  • Fix availability tests in cuSOLVER and cuSPARSE (#5492)
  • Add missing hipSPARSE include to builder (#5515)
  • prune cuFFT static lib by major cc ver (#5531)
  • Fix casts from bool in ufunc inputs (#5539)
  • Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)
  • Fix casts in ufunc outputs (#5550)
  • Code fix for {cu, roc}SOLVER (#5558)
  • Fix CUDA API call on module initialization (#5561)
  • Fix the RTC call path for HIP (#5569)
  • Fix broadcast error messages (#5579)

Code Fixes

  • Do not call cudnnGetVersion on import (#5326)
  • JIT: Fix __call__() for built-in functions (#5361)
  • Add HIP symbol redefinitions (#5362)
  • Remove the data member use_32bit_indexing from CArray (#5376)
  • Use dtype.name instead of dtype.char (#5444)
  • Try to use -I in hipRTC (#5486)
  • Hide modules from public APIs (#5522)
  • consistent kernel names (#5551)
  • Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)

Documentation

  • Add upgrade guide for v10 (#5278)
  • Update tag lines in package description and docs index (#5399)
  • Fix typo in apply_along_axis (#5432)
  • Fix indent of Returns section (#5433)
  • Update user_guide/basic.rst device agnostic section (#5435)
  • Support CUDA 11.4 on documents (#5447)
  • Update install guide with new NumPy/SciPy versions (#5454)
  • Use from_dlpack instead of fromDlpack (#5488)
  • Use Sphinx 4.1.0 (#5489)
  • Bump ReadTheDocs configuration to version 2 (#5491)
  • Fix docs of eigh and eigvalsh (#5494)
  • Add a lingering doc page for fromDlpack() (#5509)
  • Document scipy.fft backend usage (#5514)
  • Replaced the links for NumPy docs as per issue #3418 (#5548)
  • Use Sphinx's envvar construct (#5570)
  • Fix intersphinx for SciPy 1.7.1 docs (#5587)

Installation

  • Fix license_file option in setup.cfg (#5406)
  • Import numpy before Cython (#5482)

Tests

  • Add tests for num_to_num's optional parameters (#5337)
  • Add script for ROCm CI on Jenkins (#5378)
  • Skip unwrap tests for numpy<1.21 (#5384)
  • Enable strict xfail in pytest (#5407)
  • Remove xfail in windows jitify test (#5409)
  • Fix preloading slow tests (#5440)
  • Add script for CUDA 11.4 CI on FlexCI (#5457)
  • Increase memory for CUDA 11.4 tests (#5477)
  • Fix DLPack test for ROCm/HIP (#5485)
  • Fix "Revert test decorators order" (#5498)
  • Fix some tests for HIP (#5501)
  • Fix FlexCI Linux tests (#5505)
  • Add CUDA 11.4 for FlexCI helper script (#5528)
  • Increase timeout for CUDA 11.4 tests (#5575)
  • Update tests to install all requirements and add PATH (#5576)
  • Add Cython to all requirements (#5577)

Others

  • Notify conflict by mergify (#5371)
  • Fix mergify to only comment when pull-request is open (#5439)
  • Fix mergify condition (#5513)
  • Add auto notify bot for hip label (#5538)
  • Use pull_request_target instead for auto notify bot (#5541)
  • Fix auto notify bot for issues (#5546)
  • Disable Mergify's auto-merge (#5556)
  • Bump version to v10.0.0b1 (#5595)
  • Fix signal tests for scipy 1.7.0 (#5368)
  • Fix numpy.unwrap for NumPy 1.21 (#5385)
  • Fix signaltools medfilt for scipy>=1.7.0 (#5386)
  • Fix deprecated numpy.typeDict utilization (#5388)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay

v9.3.0

05 Aug 08:20
c8a3cc9

This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy now supports CUDA 11.4 (cupy-cuda114)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Support for compute capability 8.6 (GPUs of the RTX 30X0 and AX000 series) is also added.

Known Issues

  • cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

  • Support NCCL v2.9.9 (#5402)
  • Update NumPy/SciPy pinning in setup.py (#5471)
  • Support CUDA 11.4 and support compute_86 (#5519)
  • Support cuDNN v8.2.2 (#5523)
  • Make matrix_power support stacked matrices (#5525)
  • Support NCCL v2.10.3: library installer and document (#5526)

Bug Fixes

  • JIT: Fix supported dtype of atomic_add on HIP (#5405)
  • Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5416)
  • Fix compatibility issues of ndarray.view (#5442)
  • Fix types attribute of ufunc (#5455)
  • Fix random integers (#5484)
  • Fix random generator output not being raveled (#5487)
  • Fix astype from boolean (#5490)
  • Fix reshape (#5504)
  • Fix linalg.lstsq for empty matrix (#5506)
  • Add missing checks and _setStream() (#5507)
  • Fix availability tests in cuSOLVER and cuSPARSE (#5534)
  • prune cufft static lib by major cc ver (#5536)
  • Fix casts from bool in ufunc inputs (#5549)
  • Code fix for {cu, roc}SOLVER (#5566)
  • Access cudaMemoryType in the pointer attributes and fix for HIP (#5571)
  • Fix broadcast error messages (#5584)
  • Fix casts in ufunc outputs (#5589)
  • Fix broken build on CUDA 9.2 (#5598)

Code Fixes

  • Remove the data member use_32bit_indexing from CArray (#5414)
  • JIT: Fix __call__() for built-in functions (#5422)
  • Do not call cudnnGetVersion on import (#5446)
  • Add HIP symbol redefinitions (#5475)
  • Try to use -I in hipRTC (#5502)
  • Hide modules from public APIs (#5533)
  • Use the new macro __HIP_PLATFORM_AMD__ at build time (#5565)

Documentation

  • Update tag lines in package description and docs index (#5415)
  • Fix typo in apply_along_axis (#5441)
  • Fix indent of Returns section (#5452)
  • Update user_guide/basic.rst device agnostic section (#5456)
  • Update install guide with new NumPy/SciPy versions (#5465)
  • Bump ReadTheDocs configuration to version 2 (#5497)
  • Fix docs of eigh and eigvalsh (#5499)
  • Use Sphinx 4.1.0 (#5500)
  • Document scipy.fft backend usage (#5532)
  • Support CUDA 11.4 on documents (#5535)
  • Replaced the links for NumPy docs as per issue #3418 (#5553)
  • Use Sphinx's envvar construct (#5586)
  • Fix intersphinx for SciPy 1.7.1 docs (#5588)

Installation

  • Fix license_file option in setup.cfg (#5411)
  • Import numpy before Cython (#5483)

Tests

  • Skip unwrap tests for numpy<1.21 (#5412)
  • Remove xfail in windows jitify test (#5418)
  • Enable strict xfail in pytest (#5423)
  • Add missing DLPack test for complex numbers (#5425)
  • Fix unwrap tests for v9 (#5426)
  • Fix preloading slow tests (#5445)
  • Add script for ROCm CI on Jenkins (#5468)
  • Add script for CUDA 11.4 CI on FlexCI (#5473)
  • Increase memory for CUDA 11.4 tests (#5480)
  • Fix "Revert test decorators order" (#5518)
  • Fix FlexCI Linux tests (#5520)
  • Add CUDA 11.4 for FlexCI helper script (#5543)
  • Fix scipy requirement in tests (#5563)
  • Fix some tests for HIP (#5578)
  • Update tests to install all requirements and add PATH (#5581)
  • Add Cython to all requirements (#5582)

Others

  • Notify conflict by mergify (#5419)
  • Fix mergify to only comment when pull-request is open (#5510)
  • Fix mergify condition (#5517)
  • Add auto notify bot for hip label (#5540)
  • Use pull_request_target instead for auto notify bot (#5542)
  • Fix auto notify bot for issues (#5547)
  • Disable Mergify's auto-merge (#5562)
  • Bump version to v9.3.0 (#5596)
  • Fix deprecated numpy.typeDict utilization (#5403)
  • Fix signal tests for SciPy 1.7.0 (#5413)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @leofang @maxim-belkin @Palash-Vishnani

v10.0.0a2

24 Jun 08:32
827dfba
Pre-release

This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

  • CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2) and binary wheels are now available on PyPI.
  • The following Python syntax and new APIs can now be used in JIT target functions.
    • Calling len, min, max Python built-ins.
      • len(arr): Equivalent to arr.shape[0].
      • min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
      • max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
    • Accessing .ndim, .size attributes of ndarray.
    • Unpacking nested tuples.
      • (x, y), z = ...
    • jit.grid() API, similar to numba.cuda.grid.
      • x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x.)
    • Warp shuffle and sync functions.
      • cupyx.jit.shfl_down_sync(mask, var, val_id) (__shfl_down_sync(mask, var, val_id))
  • cupyx.scipy.sparse.{coo,csr,csc}_matrix now provides the reshape method.
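The index arithmetic behind jit.grid() and the semantics of a warp shuffle-down can be checked on the CPU. The sketch below is a pure-Python simulation (no GPU, no CuPy) with hypothetical helper names, not CuPy's implementation; inside a real cupyx.jit kernel the corresponding calls would be cupyx.jit.grid(1) and cupyx.jit.shfl_down_sync(0xffffffff, val, offset). It assumes all 32 lanes of the warp are active.

```python
WARP_SIZE = 32


def grid_1d(thread_idx: int, block_idx: int, block_dim: int) -> int:
    """Value jit.grid(1) yields for one thread: threadIdx.x + blockIdx.x * blockDim.x."""
    return thread_idx + block_idx * block_dim


def shfl_down(values, delta, width=WARP_SIZE):
    """Simulate __shfl_down_sync with a full mask over one warp.

    Lane i reads the value of lane i + delta. The source lane does not wrap
    around `width`, so the upper `delta` lanes of each segment keep their own value.
    """
    out = []
    for lane, own in enumerate(values):
        if lane % width + delta < width:
            out.append(values[lane + delta])
        else:
            out.append(own)
    return out


def warp_sum(values):
    """Shuffle-down tree reduction; lane 0 ends up holding the warp total."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Kernel equivalent: val += jit.shfl_down_sync(0xffffffff, val, offset)
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    print(grid_1d(thread_idx=3, block_idx=2, block_dim=128))
    print(warp_sum(list(range(WARP_SIZE))))
```

After log2(32) = 5 shuffle steps only lane 0 is guaranteed to hold the full sum; the other lanes contain partial or duplicated values, which is why kernels typically write the result from lane 0 alone.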

Changes without compatibility

Drop CUDA 9.2 & NCCL 2.4 Support (#5214)

CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.

Changes in Stream behavior (#5251)

The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call cudaStreamDestroy) if the stream is the current stream of any thread.

Known Issues

  • cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
  • cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

  • Add reshape method for COO, CSR and CSC matrices (#5301)
  • Support len, min, max, .ndim, .size in jit (#5319)
  • Support nested tuple unpack in CuPy JIT (#5332)
  • Support Numba-like jit.grid() syntax in CuPy JIT (#5334)
  • Support warp shuffle and sync functions in CuPy JIT (#5335)

Enhancements

  • Do not use handles unless requested in cupy.show_config() (#5073)
  • Fix to allow sharing a Stream instance between threads (#5251)
  • Adding GUFunc order, dtype and casting kwarg support (#5260)
  • Support nan, posinf, neginf in cupy.nan_to_num (#5295)
  • Use independent version of hipFFT for ROCm 4.1 and later (#5318)
  • Support cuTENSOR v1.3.1 (#5338)
  • Support cuDNN v8.2.1 (#5357)

Performance Improvements

  • Make cuTENSOR available in cupy.einsum (#5203)

Bug Fixes

  • Fix check_availability for cupy.cusolver (#5207)
  • Fix MemoryAsync to keep a weakref to stream (#5264)
  • Fix cuFFT callback for sm_61 etc (#5304)
  • Fix cuDNN preloading (#5327)
  • Fix large arrays assignment (#5330)
  • Ensure source array is C-contiguous before copying to CUDAArray (#5342)
  • Increase test coverage for Generalized Universal Functions (#5344)
  • Remove unnecessary print (#5374)

Code Fixes

  • Fix cub repository url (#5236)
  • Code and comment fixes for stream (#5243)
  • Use cdef instead of cpdef where appropriate (#5274)

Documentation

  • Fix matmul docstring (#5174)
  • Update list of wheels in README (#5267)
  • Add user guide for FFT (#5272)
  • Bump CuPy version in docs (#5277)
  • Add user guide for streams & events (#5283)
  • Fix deadlink to tutorial and reorder in README (#5287)
  • Document ExternalStream (#5305)
  • Add ROCm 4.2 support to install docs (#5354)
  • user_guide/basic.rst: various improvements (#5356)

Installation

  • Drop support for CUDA 9.2 & NCCL 2.4 (#5214)
  • Add upper restrictions to NumPy/SciPy versions (#5225)
  • Exclude Cython 3 from setup_requires (#5273)

Tests

  • Fix threading memory pool tests (#5263)
  • Temporarily remove the async pool test from TestAllocator (#5308)
  • Fix Windows CI kernel cache (#5310)
  • Tentatively skip unstable MemoryPoolAsync tests (#5350)
  • Xfail random generator tests for HIP (#5355)
  • Tentatively pin to SciPy 1.6 in Windows CI (#5366)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909

v9.2.0

24 Jun 08:32
83d5e6d

This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

  • CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2) and binary wheels are now available on PyPI.

Known Issues

  • cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
  • cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

  • Add CUDA 11.3 headers (#5232)
  • Do not use handles unless requested in cupy.show_config() (#5285)
  • Use independent version of hipFFT for ROCm 4.1 and later (#5351)
  • Support cuTENSOR v1.3.1 (#5370)
  • Support cuDNN v8.2.1 (#5372)

Bug Fixes

  • MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
  • Fix MemoryAsync to keep a weakref to stream (#5307)
  • Fix cuFFT callback for sm_61 etc (#5325)
  • Fix large arrays assignment (#5333)
  • Fix check_availability for cupy.cusolver (#5336)
  • Fix cuDNN preloading (#5365)
  • Ensure source array is C-contiguous before copying to CUDAArray (#5375)
  • Remove unnecessary print (#5377)

Code Fixes

  • Use cdef instead of cpdef where appropriate (#5274)
  • Fix cub repository url (#5288)

Documentation

  • Fix matmul docstring (#5281)
  • Update list of wheels in README (#5284)
  • Add user guide for FFT (#5286)
  • Fix deadlink to tutorial and reorder in README (#5291)
  • Add user guide for streams & events (#5302)
  • Document ExternalStream (#5312)
  • user_guide/basic.rst: various improvements (#5356)
  • Add ROCm 4.2 support to install docs (#5360)

Installation

  • Exclude Cython 3 from setup_requires (#5273)
  • Add upper restrictions to NumPy/SciPy versions (#5321)

Tests

  • Fix threading memory pool tests (#5289)
  • Fix Windows CI kernel cache (#5317)
  • Xfail random generator tests for HIP (#5359)
  • Tentatively pin to SciPy 1.6 in Windows CI (#5369)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@leofang @maxim-belkin

v10.0.0a1

27 May 07:50
b01641d
Pre-release

This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CUDA 11.0 and 11.1 wheels not yet available on PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes without compatibility

Current stream is now managed per device (#5172)

CuPy now manages the current stream per device and switches it automatically when the current device changes, so users no longer need to switch streams manually.
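
To make "per device" concrete, here is a toy pure-Python model of the bookkeeping, assuming each device simply keeps its own current stream and switching devices selects that device's entry. `StreamRegistry` and its methods are hypothetical stand-ins (streams are plain strings), not CuPy internals; in real code the same idea corresponds to calling `stream.use()` inside a `with cupy.cuda.Device(n):` block.

```python
class StreamRegistry:
    """Toy model of per-device current streams (not CuPy's implementation)."""

    DEFAULT = "null-stream"  # placeholder for a device's default stream

    def __init__(self):
        self.current_device = 0
        self._streams = {}  # device id -> current stream for that device

    def set_device(self, dev):
        # Switching devices does not touch any stream; it only changes
        # which per-device entry get_current_stream() reads.
        self.current_device = dev

    def use(self, stream):
        # Like Stream.use(): makes `stream` current for the current device only.
        self._streams[self.current_device] = stream

    def get_current_stream(self):
        return self._streams.get(self.current_device, self.DEFAULT)
```

With this model, a stream made current on device 0 stays current there even after device 1 is selected and given its own stream.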

This pull request also includes a bug fix for #5143. Existing code that mixes `with stream:` blocks and `stream.use()` may behave differently, because a stream set via the use() API is no longer reactivated when exiting a stream context.

import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()  # make s2 current without entering a stream context
    with s3:
        pass
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.
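
One way to see why `s1` comes back is a toy model in which `with stream:` pushes onto a stack of context streams while `use()` only changes the current stream and never touches that stack. The sketch below is an assumption that reproduces the documented case, not CuPy's code; in particular, what the model restores after the outermost block exits (a placeholder default stream) is a guess.

```python
from contextlib import contextmanager


class StreamState:
    """Toy model of the v10 stream-context semantics (not CuPy's code)."""

    DEFAULT = "null-stream"

    def __init__(self):
        self.current = self.DEFAULT
        self._ctx_stack = []  # streams activated via `with`, innermost last

    def use(self, stream):
        # Changes the current stream but is not recorded on the context stack.
        self.current = stream

    @contextmanager
    def activate(self, stream):
        self._ctx_stack.append(stream)
        self.current = stream
        try:
            yield stream
        finally:
            # Leaving a block restores the enclosing *context* stream,
            # so a stream set with use() in between is not reactivated.
            self._ctx_stack.pop()
            self.current = self._ctx_stack[-1] if self._ctx_stack else self.DEFAULT


state = StreamState()
with state.activate("s1"):
    state.use("s2")
    with state.activate("s3"):
        pass
    after_inner = state.current  # "s1", mirroring the CuPy v10 example
```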

Make cupy.cuda.Device context manager interface thread safe (#5083)

Since the earliest versions of CuPy, using a single cupy.cuda.Device context manager object from multiple threads could restore the wrong device on exit. The correct device is now restored, so user code that relies on the old, incorrect behavior may need to be updated.
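
The sketch below illustrates the class of bug and one way to avoid it. A naive context manager that saves the previous device in an instance attribute lets two threads entering the same object overwrite each other's saved value; keeping the saved devices in thread-local storage gives each thread its own. `DeviceContext`, `get_device`, and `set_device` are hypothetical stand-ins (the current device is modeled with `threading.local`, since the CUDA runtime's current device is a per-host-thread setting), not CuPy's actual implementation.

```python
import threading

# Stand-in for the CUDA runtime's current device, which is per host thread.
_runtime = threading.local()


def get_device() -> int:
    return getattr(_runtime, "device", 0)


def set_device(dev: int) -> None:
    _runtime.device = dev


class DeviceContext:
    """Hypothetical device context manager that is safe to share across threads.

    A naive version would store `self.prev = get_device()` in __enter__,
    so a second thread entering the same object would clobber the first
    thread's saved device.
    """

    def __init__(self, device_id: int):
        self.device_id = device_id
        self._saved = threading.local()  # per-thread stack of previous devices

    def __enter__(self):
        if not hasattr(self._saved, "stack"):
            self._saved.stack = []
        self._saved.stack.append(get_device())
        set_device(self.device_id)
        return self

    def __exit__(self, exc_type, exc, tb):
        set_device(self._saved.stack.pop())
        return False
```

In CuPy terms, the shared object corresponds to something like `dev = cupy.cuda.Device(1)` entered as `with dev:` from several threads; with #5083 each thread gets its own previous device back.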

Deprecate cupyx.allow_synchronize and cupyx.DeviceSynchronized APIs (#5226)

These APIs, which were used to detect when device synchronization occurred, have been deprecated because they do not behave reliably.

Changes

Note: many of these PRs have been backported to the v9 series and are already available in v9 releases.

New Features

  • CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#4592)
  • Add APIs for creating NumPy arrays backed by pinned memory (#4870)
  • Support cuSPARSELt (#4883)
  • Add gamma distributions to random API (#4905)
  • Add random for uniform [0, 1) generation (#4906)
  • Add poisson distribution to random API (#4927)
  • Add SciPy compatible connected_components (#4940)
  • Support shared memory in CuPy JIT (#4950)
  • Add cupyx.scipy.sparse.kronsum() (#4968)
  • Add hfft2, ihfft2, hfftn, and ihfftn to cupyx.scipy.fft (#4996)
  • CuPy JIT: Print kernel code (#5017)
  • Add cupyx.jit.atomic_add (#5169)
  • CUDA 11.2/11.3: Support MemoryAsyncPool statistics and limits (#5177)
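
For reference, SciPy documents scipy.sparse.kronsum(A, B) as the Kronecker sum below, where A is a square m × m matrix, B is a square n × n matrix, and I_m, I_n are identity matrices; the result is mn × mn. The SciPy-compatible cupyx.scipy.sparse.kronsum() added in #4968 is assumed here to follow the same definition.

```latex
\operatorname{kronsum}(A, B) = I_n \otimes A + B \otimes I_m,
\qquad A \text{ is } m \times m,\quad B \text{ is } n \times n
```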

Enhancements

  • Ability to pass structured data types by value as kernel parameters (#4829)
  • Move the NVTX module to cupy_backends.cuda.libs (#4930)
  • Disable CUB SpMV on CUDA 11.x (#4949)
  • CuPy JIT: Readable compile error messages (#4991)
  • Fix JIT test failures on ROCm (#4998)
  • Mark cupyx.jit.rawkernel as experimental (#5005)
  • HIP: add -ftz=true (#5007)
  • Give gufunc a name (#5013)
  • CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5028)
  • Add PCI Bus ID to show_config (#5037)
  • Print cuSPARSELt version in show_config (#5054)
  • Support custom getsource option in CuPy JIT (#5071)
  • Make cupy.cuda.Device context manager interface thread safe (#5083)
  • Add a new argument out to cupy.asnumpy() (#5155)
  • Support cuSPARSELt v0.1.0 (#5158)
  • Per device stream (#5172)
  • cuTENSOR v1.3.0 for library installer (#5192)
  • Add sum_labels to cupyx.scipy.ndimage.measure (#5200)
  • Support NCCL v2.9.8 (#5201)
  • Fix thrust compilation for ROCm 4.2.0 (#5209)
  • Add NVCC path and Python version to show_config (#5215)
  • Add CUDA 11.3 headers (#5218)
  • Add libraries for CUDA 11.3 (#5219)
  • Remove syncdetect APIs (#5226)

Bug Fixes

  • Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5002)
  • Use async memcpy in ndarray.copy (#5004)
  • Fix DLPack lanes (#5045)
  • Disable cuFFT plan cache on CUDA 11.1 (#5046)
  • Support PTDS in CuPy memory pool (#5072)
  • CuPy JIT: Fix range type (#5077)
  • Fix poisson to support lam array (#5087)
  • Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5103)
  • Bugfix for typing rule of CuPy JIT (#5125)
  • Fix TypeError in svds (#5140)
  • Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5168)
  • Fix integer scatter_add failure on Windows (#5173)
  • MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5191)
  • Fix matmul for input with relaxed strides (#5205)
  • Add check_availability for cuTensor routines (#5206)
  • Fix Windows constexpr (#5233)
  • Remove duplicated subtraction in cupy.random.Generator.integers (#5247)

Code Fixes

  • Rename cupy.core submodule to cupy._core (#3820)
  • Fix some internal cpdef functions to cdef in _kernel.pyx (#5084)
  • Remove cupy.cupy (#5121)
  • Cosmetic change in cuSPARSELt stub header (#5149)
  • Cosmetic changes of CuPy JIT implementation (#5152)
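
Code that imported the `cupy.core` submodule directly is affected by the rename to `cupy._core` (#3820). Whether the old name keeps working as an alias is not stated in these notes, and `_core` is private, so public APIs are preferable where they exist. Where a direct import is unavoidable, a fallback such as the hypothetical `import_first` helper below is one option; it relies only on the standard-library `importlib.import_module`.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(names: Sequence[str]) -> ModuleType:
    """Import and return the first module in `names` that can be imported.

    Hypothetical helper, e.g. for supporting CuPy releases from both
    before and after the cupy.core -> cupy._core rename.
    """
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as e:  # includes ModuleNotFoundError
            errors.append(f"{name}: {e}")
    raise ImportError("no candidate module could be imported: " + "; ".join(errors))


# Prefer the new name, fall back to the old one:
# core = import_first(["cupy._core", "cupy.core"])
```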

Documentation

  • Follow the latest NumPy/SciPy docs style (#4945)
  • Fix docs: cupy-cuda112 now on PyPI (#4957)
  • Update installation guide for Conda-Forge (#4985)
  • CuPy JIT documentation (#5012)
  • Document cupyx.time.repeat (#5015)
  • Document cupy.cuda.runtime.getDeviceProperties (#5016)
  • More documentation on the supported backends (#5019)
  • Add links to Anaconda, Gitter, StackOverflow (#5020)
  • Improve the documentation on interoperability (#5023)
  • Document CFunctionAllocator and ManagedMemory (#5025)
  • Fix code block in installation guide (#5033)
  • Improve comments for memory and stream API usage (#5060)
  • Point to the correct numpy random docs (#5088)
  • Add user guide (#5093)
  • Add ROCm limitations to docs (#5107)
  • Reorganize API reference pages (#5108)
  • Revise ROCm doc (#5122)
  • Fix docs of scatter_add (#5129)
  • Mention baseline API change in upgrade guide (#5131)
  • Fix ROCm wheel install steps (#5133)
  • Fix docstring in coo.py (#5139)
  • Fix docs in stream.pyx (#5144)
  • cuDNN v8.2 on documentation (#5148)
  • Mention PTDS in ROCm Limitation (#5159)
  • Use Sphinx 4 (#5188)
  • cuTENSOR v1.3 on documentation (#5196)
  • Fix cuSPARSELt not covered in docs (#5221)
  • Add cupyx.scipy.ndimage.sum_labels to docs (#5223)
  • Improve README (#5254)
  • Update logo image (#5255)
  • Tentatively remove CUDA 11.3 from support list (#5256)

Installation

  • Fix Windows dll loading for Conda (#4974)
  • Add warnings for duplicate installation (#5032)
  • cuDNN v8.2.0 for library installer (#5146)
  • Bump version to v10.0.0a1 (#5269)
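
As its title indicates, #5032 adds a warning for duplicate installations, i.e. more than one CuPy package (for example both cupy-cuda110 and cupy-cuda113) installed in the same environment. A rough way to check an environment yourself is sketched below with the standard-library `importlib.metadata`. The helper names and the regular expression (which covers only the name patterns seen in these notes) are assumptions for illustration, not the check CuPy performs.

```python
import re
from importlib import metadata
from typing import Iterable, List

# Matches names like cupy, cupy-cuda113, cupy-rocm-4-2 (assumed patterns).
_CUPY_DIST = re.compile(r"^cupy(-cuda\d+|-rocm-\d+-\d+)?$")


def cupy_distributions(names: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated CuPy distribution names in `names`."""
    found = set()
    for name in names:
        # Normalize case and underscores so "CuPy" and "cupy_cuda110" match.
        normalized = name.lower().replace("_", "-")
        if _CUPY_DIST.match(normalized):
            found.add(normalized)
    return sorted(found)


def installed_cupy_distributions() -> List[str]:
    # metadata.distributions() iterates over every installed distribution.
    return cupy_distributions(
        d.metadata.get("Name") or "" for d in metadata.distributions()
    )
```

More than one entry in `installed_cupy_distributions()` suggests the kind of duplicate installation that the new warning is meant to flag.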

Examples

  • Fix cuSPARSELt example not to use internal function (#4995)
  • Update examples for current version of CuPy (#4999)

Tests

  • Refactor random tests (#4907)
  • Tentatively pin CI to ROCm 4.0.1 (#4961)
  • Fix cutensor import in the test (#4965)
  • Make install_tests runnable without depending on current path (#4969)
  • Avoid using pip install -e on Windows CI for performance (#4970)
  • Update known base branches in flexCI config (#4973)
  • Update list of known branches (#4982)
  • Fix TestStream cleanup (#5042)
  • Mark some memory tests as testing.slow (#5061)
  • Fix stream usage on D2D copy test under HIP (#5091)
  • Xfail tests for random distribution generator under HIP/ROCm (#5096)
  • Adjust testing tolerance for hfftn for HIP/ROCm (#5099)
  • Use current device in tests (#5127)
  • Fix for updated FlexCI base image (#5164)
  • Relax tolerance of cupyx.jit.atomic_add test (#5186)
  • Test build for ROCm 4.0 and latest (#5224)
  • Fix mergify configuration (#5248)

Others

  • Use bot mode in automatic backport (#5051)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce