Releases: cupy/cupy
v2.1.0
This is the release of v2.1.0. See here for the complete list of solved issues and merged PRs.
New features
- Add argpartition (#608)
- Add window functions (#612, thanks @ishihara1989!)
  - blackman, hamming, hanning
- Support sparse.coo_matrix initialization with other types of sparse matrices (#626)
- Line memory profiler using memory hook and traceback (#630)
- Support dtype argument in random.randint (#706)
- cuDNN grouped convolution (#721, thanks @anaruse!)
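As a rough illustration of the array additions listed above, the sketch below exercises argpartition and the three window functions. It assumes these calls follow the NumPy signatures, and it falls back to NumPy when CuPy or a usable GPU is not present. The helper name kth_smallest is hypothetical and not part of CuPy.

```python
# Use CuPy when it is importable and a GPU is usable; otherwise fall back to
# NumPy, whose API CuPy is designed to mirror (an assumption of this sketch).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device so that setup failures surface here
except Exception:
    import numpy as xp


def kth_smallest(values, k):
    """Return the k-th smallest element (0-based) using argpartition."""
    a = xp.asarray(values)
    idx = xp.argpartition(a, k)  # idx[k] is the index of the k-th smallest value
    return float(a[idx[k]])


# Each window function takes the number of points and returns a 1-D array.
windows = {name: getattr(xp, name)(8) for name in ("blackman", "hamming", "hanning")}
```

Only position k of the argpartition result is guaranteed to be in its sorted place, which is why the helper reads idx[k] and nothing else.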
Improvements
- Performance improvements
- Support uint32 sampling up to 0xffffffff in random.RandomState.interval (#633)
- Fix random.RandomState.seed to only accept integer types (#709)
- Fix typo in IndexError error message (#683)
- Fix interface for cuDNN find algorithm APIs (#664)
Bug fixes
- Fix OverflowError passing large integer to elementwise operation (#615)
- Fix indexing zero-dimensional array with boolean mask (#645)
- Set up Python's builtin random state in testing.fix_random (#648)
- Use v6 RNN API when using cuDNN7 to avoid incompatibility (#665, thanks @anaruse!)
- Set arch option for NVRTC, as the option is necessary on some GPUs (#696, thanks @grafi-tt!)
- Fix memory pool for multi-threaded applications (#697)
- Fix var and std to correctly handle ddof argument (#711, thanks @stevendbrown!)
- Fix advanced indexing to not alter the indices (#723, thanks @yuyu2172!)
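For readers unfamiliar with the ddof argument mentioned in the var and std fix, here is a minimal sketch of what it controls. It assumes NumPy semantics (the divisor is N - ddof) and falls back to NumPy when CuPy or a GPU is unavailable. It does not reproduce the bug that #711 fixed.

```python
# Use CuPy when available; otherwise fall back to NumPy (assumed API-compatible).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device so that setup failures surface here
except Exception:
    import numpy as xp

data = xp.asarray([1.0, 2.0, 3.0, 4.0])

# ddof=0 divides by N (population variance); ddof=1 divides by N - 1 (sample variance).
population_var = float(data.var(ddof=0))
sample_var = float(data.var(ddof=1))
sample_std = float(data.std(ddof=1))
```

The squared deviations from the mean 2.5 sum to 5, so the two variances are 5/4 and 5/3 respectively.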
Documentation
- Fix a link in README.md to the contribution guide (#629)
- Remove unrelated "see also" from testing.numpy_cupy_raises (#637, thanks @Hakuyume!)
- Write note about environment variables for installation (#641)
- Fix reference page of linalg (#651)
- Fix doctest for Python 3.5 (#663)
- Add intersphinx mapping to Chainer (#666)
- Fix typo and heading in documentation (#667)
- Update testing section in the contribution guide (#716)
- Fix a link in README.md to the forum (#754, thanks @muupan!)
- Fix incorrect heading "CuPy" instead of "NumPy" in license page (#674)
Test
- Use the latest Cython in Travis CI (#636)
- Fix typo (#647, thanks @Hakuyume!)
- Move to PyTest
- Fix doctest for Python 3.5 (#663)
- Allow filtering test cases by number of GPUs with CUPY_TEST_GPU_LIMIT environment variable (#677)
- Ignore ComplexWarning in numpy.pad for NumPy 1.11 or older (#690)
- Fix NumPy warning for bool and complex operations (#708)
- Fix test of where to use different seeds for different arrays (#710)
- Skip some dtypes in test_einsum (#740)
- Skip some tests for old NumPy (#746)
Others
- Improve version embedding (#652)
v3.0.0a1
This is the release of CuPy v3.0.0a1. See here for the complete list of solved issues and merged PRs.
New features
- Memory pool is now used as the default allocator even if CuPy is used without Chainer (#472).
- Add line memory profiler using memory hook and traceback (#265)
- Add cuDNN support for dropout. (#479)
- Add cudnnGetTensor4dDescriptor for fp16 BatchNormalization support in Chainer (#492, thanks @anaruse!)
- Add Tensor-Core support (cuDNN and cuBLAS) (#494 and #495, thanks @anaruse!)
- Add window functions (#555, thanks @ishihara1989!)
- Add cupy.sparse.random (#557)
- Add cupy.argpartition (#294)
Bug fixes
- Fix multithread problem in PooledMemory (#480)
- Resolve dealloc problem and multithread problem in PinnedMemory (#481)
- Fix cupy.nonzero for corner cases (#498)
- Fix simple reduction for corner cases (#499)
- Fix broadcast for corner cases (#543)
- Fix broadcast_arrays return type (#545)
- Avoid using global state in RandomState.choice (#556)
- Fix csrmm2 to support transa (#565)
- Fix csrmv (#571)
- Avoid using the dtype option in numpy.random.randint, which was introduced in NumPy v1.11 (#574)
Improvements
- Fix get_array_module to be aware of spmatrix (#568)
- Use vector to improve free memory searching in malloc (#476)
- Fix Cython warning on variable declaration (#491)
- Check kernel name validity (#522)
- Show NVRTC error code (#531)
- Optimize RandomState.interval (#559)
- Fix random.normal double memory consumption (#562)
Installation
- Import memory_hooks (#502)
- Avoid Cython 0.27.0 (#550)
- Change minimum Cython version to 0.26.1 (#365, #530, #548)
- Support NVCC environment variable (#501)
Documentation
- Fix documentation of fusion functions (#497)
- Add documentation of cupy.all and cupy.any functions (#511)
- Correct URLs in documentation (#547)
- Fix typo (#614, thanks @fukatani!)
Examples
- Add an example of option pricing using Black-Scholes equation (#473)
v2.0.0
This is a major release of CuPy v2.0.0. All of the updates since the previous major version (v1.0.0) can be found in the release notes below:
- v2.0.0a1 (https://github.com/cupy/cupy/releases/tag/v2.0.0a1)
- v2.0.0b1 (https://github.com/cupy/cupy/releases/tag/v2.0.0b1)
- v2.0.0rc1 (https://github.com/cupy/cupy/releases/tag/v2.0.0rc1)
Important Updates
Supports the latest versions of the following libraries
- CUDA9 support (#353, thanks @anaruse!)
- cuDNN7 support (#362, thanks @anaruse!)
- NCCL2 support (#363, thanks @anaruse!)
- NumPy 1.13 (#347)
In v2.0.0a1
- We started using NVRTC instead of NVCC for kernel compilation. This change enables CuPy to run in an environment where CUDA is installed but NVCC is not available. Note that some features depending on Thrust (e.g. sorting functions) cannot be used if NVCC is not available at installation time.
- Many functions for sorting, linear algebra, and others are added.
In v2.0.0b1
- Sparse matrix. cupy.sparse is a module that implements the scipy.sparse API using CUDA and cuSPARSE. We now have basic features for using sparse matrices on GPU.
- New memory allocator (#168). The memory pool implementation is greatly updated. It is based on best-fit allocation with coalescing. When there are a large number of allocations with different sizes (e.g. NLP applications), the memory usage is improved and the number of re-allocations is reduced (which also reduces the running time).
In v2.0.0rc1
- Complex numbers (#232)
- Many new functions.
Bug fixes
- Fix cupy.nonzero for corner cases (#504)
- Fix simple reduction for corner cases (#505)
- Fix multithread problem in PooledMemory (#507)
- Resolve dealloc problem and multithread problem in PinnedMemory (#510)
- Avoid using global state in RandomState.choice (#560)
- Fix broadcast for corner cases (#577)
- Fix csrmm2 to support transa (#601)
- Fix csrmv (#607)
Improvements
- Fix get_array_module to be aware of spmatrix (#586)
- Show NVRTC error code (#538)
- Optimize RandomState.interval (#585)
- Fix random.normal double memory consumption (#592)
- Check kernel name validity (#596)
Installation
Documentation
- Fix warnings (#535)
- Add documentation of cupy.all and cupy.any functions (#514)
- Fix documentation of fusion functions (#517)
- Treat sphinx warnings as errors (#519)
- Correct URLs in documentation (#561, thanks @aonotas!)
- Fix Cython requirement for documentation build (#566)
Tests
- Fix doctest warnings (#500)
- Use mock.patch instead of directly replacing function with Mock (#610)
- Remove print() in tests (#509)
- Travis fails with Cython 0.27. Use Cython 0.26.1 for a while (#539)
- Add corner test cases for indexing (#576)
- Add unit tests for csrgemm (#602)
Others
- Avoid duplicate loop index (#520)
v2.0.0rc1
This is the release of CuPy v2.0.0rc1. See here for the complete list of solved issues and merged PRs.
Changes that break compatibility
- Change the default value of the order argument of copy from 'C' to 'K' (#159)
- Add order and subok arguments to array (#167). It breaks the compatibility of positional arguments.
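The two compatibility notes above can be sketched as follows. This is an illustration under assumptions rather than a statement of CuPy's exact behavior in this release: it assumes the order values follow NumPy's meaning ('K' keeps the source layout, 'C' forces row-major), and it falls back to NumPy when CuPy or a GPU is unavailable.

```python
# Use CuPy when available; otherwise fall back to NumPy (assumed API-compatible).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device so that setup failures surface here
except Exception:
    import numpy as xp

src = xp.asfortranarray(xp.arange(6).reshape(2, 3))  # column-major source

keep_layout = src.copy(order='K')  # 'K' keeps the memory layout of the source
force_c = src.copy(order='C')      # 'C' always produces a row-major copy

# Passing dtype and order by keyword keeps the call independent of the
# positional order of array()'s parameters, which changed in this release.
arr = xp.array([[1, 2], [3, 4]], dtype='float32', order='F')
```

Code that relied on the old 'C' default and needs row-major output can pass order='C' explicitly.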
New features
- Complex numbers (#232)
- Memory hook (#264). It can be used to observe the memory allocation/deallocation events.
- New functions
- New features in sparse matrices
  - Support dia_matrix (#313, #321, #320, #450)
  - Sparse matrix creation methods: eye (#399), spdiags (#388) and identity (#358)
  - csr_matrix and csc_matrix are improved: __mul__ (#239), __rmul__ (#300), __getitem__ (#240, #301, #302), dot (#351, #352)
  - Initializers of csr_matrix, csc_matrix, and coo_matrix support shape argument (#316, #375)
  - Sparse matrices can have duplicated elements (#326, #371)
  - order argument in toarray method of csc and csr (#311)
  - __pow__ (#359)
  - Conversion from a dense array to a sparse matrix (#335)
  - Support conversion from scipy.sparse matrix to cupy.sparse (#370)
- Added support for new libraries
- argsort for arrays of rank two or more (#288)
- Fix race-condition on memory pool (#382)
- Implemented copy option of array conversion methods and wrote tests (#408)
- Enable saving CUDA source with environment variable (#415)
- Basic support of CUDA unified memory (#447)
- Use original function name as fusion kernel name (#448)
- Support replace=False in random.choice (#453)
- Add a sync option to time_range (#474, thanks @anaruse!)
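As a small illustration of the replace=False support in random.choice listed above, the sketch below draws a sample without replacement. It assumes the call matches numpy.random.choice and falls back to NumPy when CuPy or a GPU is unavailable.

```python
# Use CuPy when available; otherwise fall back to NumPy (assumed API-compatible).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device so that setup failures surface here
except Exception:
    import numpy as xp

# Draw 5 distinct integers from range(10); replace=False forbids repeats.
sample = xp.random.choice(10, size=5, replace=False)
drawn = sample.tolist()
```

Requesting more items than the population holds with replace=False is an error, since no draw without repeats is possible.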
Bug fixes
- Fix bug of empty coo_matrix (#328)
- Fix default behavior of methods in spmatrix (#356)
- Made dummy implementation to prevent infinite loop (#364)
- Avoid calling Python methods in __dealloc__(); use __del__() instead. (#381)
- Fix race-condition on memory pool (#382)
- Fix view when the itemsize of the dtype changes (#406, thanks @boeddeker!)
- Use double backslash in str literal (#418)
- Improved pow test (#421)
- Use randint instead of random_integer, which is deprecated (#425)
- Fix diagonal (#428, thanks @fukatani!)
- Use six.assertRegex (#432)
- Fix for NumPy 1.13 (#445)
- Fix tocsc behavior for an empty dia matrix (#451)
Improvements
- Tell the memory size when cudaErrorMemoryAllocation occurred (#314)
- Simplify nogil (#164)
- Skip cross compile on setup.py develop to build faster (#309)
- Remove device memory allocation out of memory pool (#337)
- Avoid importing NumPy docstring (#355)
- Improve header handling (#367)
- Remove redundant code in cupy_thrust.cu (#369)
- Improve _tril() and _triu() with an ElementwiseKernel (#377)
- Remove unnecessary condition (#383)
- Add semicolons to the reduction kernel template (#386)
- Remove redundant transpose (#390)
- Fix usage of ElementwiseKernel (#391)
- Remove duplicated preamble definition. (#402)
- Fix cumsum (#414)
- Use AxisError to maintain compatibility with multiple versions of NumPy (#437)
- doc: Sort out navigation menu (#444)
- Improve tensordot_core (#465)
- Simplify flip (#468)
- Use None instead of set() to improve memory allocation performance (#475)
Installation
- Skip cross compile on setup.py develop to build faster (#309)
- Fix double declaration of tuple_less (#368)
- Made a customized version of sdist command to use Cython (#446)
Documentation
- Fix a grammatical error in tutorial (#267)
- Add cupy.sparse reference (#299, #303)
- Cleanup README.md (#334)
- Hide source link for alien objects (#354)
- Avoid importing NumPy docstring (#355)
- Remove unsupported strides argument from docstring (#361)
- Fix matmul arguments (#384, thanks @hvy!)
- Add link to our contribution guide (#392)
- Update docstring of linalg.einsum (#405)
- Write docstring of A property and its test (#407)
- Use double backslash in str literal (#418)
- Fix typo in sparse.spdiags docstring. (#426)
- Remove "Edit on GitHub" link (#434)
- Reorganize navigation menu (#444)
- Clear doctest warnings (#455)
- Add documents of linalg (#456)
- Write docstring of sparse.issparse (#470)
Examples
- Conjugate Gradient (#94, thanks @KotaroSetoyama!)
Tests
- Example test (#297)
- Add test for cuda.cusolver_enabled flag (#374)
- Write tests for operators for sparse matrices (#401)
- Write docstring of A property and its test (#407)
- Fix test for random generator (#413)
- Fix cumsum test (#414)
- Add test for transpose when axes is not None (#420)
- Improved pow test (#421)
- Changed order argument for unknown order test as SciPy causes DeprecationWarning (#422)
- Add tests for asfptype (#423)
- Add assert_warns (#424)
- Use randint instead of random_integer that is deprecated (#425)
- Use six.assertRegex (#432)
- Show error message when an error occurs on example test (#433)
- Fix tests on Windows (#435)
- Fix tolerance of arithmetic tests (#443)
- Added test for __iter__ of csr_matrix (#449)
- Fix tocsc behavior for an empty dia matrix (#451)
- Fix test for tensorsolve (#454)
- Skip NumPy clip tests in Windows (#467)
- Fix typo in test function names (#394)
Others
- Configure flake8 to ignore the .git directory (#339)
v1.0.3
This release includes bug fixes and improvements to the documentation and tests. See the list for the complete list of solved issues and merged PRs.
Bug fixes
- Avoid decoding nvcc output with UTF-8 to remove UnicodeDecodeError. (#378, #379)
- Bug in view with different itemsize. (#403, thanks @boeddeker!)
- Avoid calling Python methods in __dealloc__ and use __del__ instead. (#411)
- Fix ndarray.view when the itemsize of the dtype changes. (#416)
- Fix inconsistency of ndarray.diagonal between NumPy and CuPy. (#436)
Improvements
Documentation
- Remove unsupported strides argument from docstring. (#366)
- Hide source link for alien objects. (#373)
- Fix the document of matmul. (#412)
- Use double backslashes in str literals. (#429)
- Clear doctest warnings. (#457)
- Sort out navigation menu. (#460)
- Fix a grammatical error in tutorial. (#463)
Tests
- Use randint instead of random_integer that is deprecated. (#430)
- Add testing.assert_warns and test deprecation warning of Memory.free_all_free. (#431)
- Skip some tests for RandomState when NumPy < 1.11.0. (#438)
- Loosen the tolerance of tests for binary operators. (#461)
- Fix typo in test names. (#395)
v2.0.0b1
This is a minor release. See https://github.com/cupy/cupy/milestone/8?closed=1 for the complete list of solved issues and merged PRs.
New features
Sparse matrix
cupy.sparse is a module that implements the scipy.sparse API using CUDA and cuSPARSE. We now have basic features for using sparse matrices on GPU.
- CSR and CSC (#226)
- COO matrix (#234)
- Conversion method from compressed matrix (csr, csc) to coordinate format (coo) (#235)
- CSR and CSC copy (#236)
- __add__, __radd__, __sub__ and __rsub__ for CSR and CSC (#238)
- Fix toarray in cupy.sparse.spmatrix (#312)
- Return NotImplemented instead of NotImplementedError (#330)
- Use csc2dense to convert csr-matrix to dense (#305)
We are planning to add more features to cupy.sparse in upcoming releases.
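To make the relationship between the compressed (CSR) and coordinate (COO) formats in the list above concrete, here is a plain-Python sketch of the conversion. It only illustrates the data layout and is not how cupy.sparse performs the conversion; the function name csr_to_coo and the sample matrix are hypothetical.

```python
def csr_to_coo(data, indices, indptr):
    """Expand CSR row pointers into explicit row indices (COO layout)."""
    rows = []
    for row in range(len(indptr) - 1):
        # indptr[row]:indptr[row + 1] is the slice of entries stored for this row.
        rows.extend([row] * (indptr[row + 1] - indptr[row]))
    return list(data), rows, list(indices)


# The 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]] in CSR form.
data = [1, 2, 3, 4, 5, 6]
indices = [0, 2, 2, 0, 1, 2]  # column index of each stored value
indptr = [0, 2, 3, 6]         # row i owns entries indptr[i]:indptr[i + 1]

coo_data, coo_rows, coo_cols = csr_to_coo(data, indices, indptr)
```

The values and column indices carry over unchanged; only the row pointer array is expanded into one row index per stored value.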
New memory allocator (#168)
The memory pool implementation is greatly updated. It is based on best-fit allocation with coalescing. When there are a large number of allocations with different sizes (e.g. NLP applications), the memory usage is improved and the number of re-allocations is reduced (which also reduces the running time).
For example, the memory usage of the sequence-to-sequence code using Chainer (chainer/chainer#2070) is reduced from 12GiB (which means the process is using all of the available GPU memory) to 3GiB, and the number of memory reallocations from 20 times to 0 times.
It may increase the memory usage in some cases, although the amount of additional usage is small in practice (see the benchmark results in #168).
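The idea of best-fit allocation with coalescing described above can be sketched with a toy allocator in plain Python. This is a simplified illustration under stated assumptions, not CuPy's implementation: the class ToyPool, its single contiguous arena, and its method names are all hypothetical.

```python
import bisect


class ToyPool:
    """Toy best-fit allocator over one contiguous arena of `size` units."""

    def __init__(self, size):
        self.free = [(0, size)]  # sorted list of (offset, length) free chunks
        self.used = {}           # offset -> length of live allocations

    def malloc(self, n):
        # Best fit: choose the smallest free chunk that can hold n units.
        fits = [chunk for chunk in self.free if chunk[1] >= n]
        if not fits:
            raise MemoryError("no free chunk large enough")
        offset, length = min(fits, key=lambda chunk: chunk[1])
        self.free.remove((offset, length))
        if length > n:
            # Split the chunk and keep the remainder on the free list.
            bisect.insort(self.free, (offset + n, length - n))
        self.used[offset] = n
        return offset

    def free_chunk(self, offset):
        n = self.used.pop(offset)
        bisect.insort(self.free, (offset, n))
        # Coalesce: merge neighbouring free chunks that touch each other.
        merged = [self.free[0]]
        for start, length in self.free[1:]:
            prev_start, prev_length = merged[-1]
            if prev_start + prev_length == start:
                merged[-1] = (prev_start, prev_length + length)
            else:
                merged.append((start, length))
        self.free = merged


pool = ToyPool(100)
a, b, c = pool.malloc(30), pool.malloc(20), pool.malloc(50)
pool.free_chunk(a)
pool.free_chunk(b)   # a and b are adjacent, so they merge into one 50-unit chunk
d = pool.malloc(45)  # succeeds only because the two freed chunks were coalesced
```

Without the coalescing step, the 30-unit and 20-unit chunks would stay separate and the 45-unit request would fail, which is the kind of re-allocation the release note says is reduced.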
You can use this memory allocator by calling cupy.cuda.set_allocator(cupy.cuda.MemoryPool().malloc) (when using Chainer, it is called by default).
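The call quoted above can be wrapped as in the following sketch. The wrapper enable_memory_pool is a hypothetical helper, and it returns None when CuPy is not installed so that the snippet stays importable on machines without CuPy.

```python
def enable_memory_pool():
    """Route CuPy allocations through a MemoryPool.

    Returns the pool, or None when CuPy is not installed.
    """
    try:
        import cupy
    except ImportError:
        return None  # CuPy is not installed; nothing to configure
    pool = cupy.cuda.MemoryPool()
    cupy.cuda.set_allocator(pool.malloc)  # the call quoted in the release note
    return pool


pool = enable_memory_pool()
```

Keeping a reference to the pool object, rather than constructing it inline, makes it possible to inspect or release the pooled memory later.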
Other features
- Implement cupy.linalg.det (#96)
- Support cupy.sort to sort arrays along arbitrary axis (#229)
- Implemented RangeStart and RangeEnd for NVIDIA visual profiler (nvvp) (#246)
- Introduce cupy.is_available() which takes account of device availability (#247)
- Implement cupy.msort (#251, #329)
Bug fixes
- Fix cupy.copyto function to treat multiple GPUs correctly (#220)
- Restore kernel type check (#253)
- Fix deepcopy with multiple devices (#254)
- Fix cupy.argsort for non-contiguous arrays (#284)
- Fix ldexp on Windows (#278)
Improvements
- Improve cupy.argsort performance (#285)
Installation
- Remove old cuDNN support (#219)
- Add compile options to build on Windows (#244)
- Remove duplicated build options (#280)
- Avoid creating garbage file on setup (#287)
- Fix setup for cusolver (#292)
- Use cupy.cuda.thrust_enabled to check Thrust enabled (#224)
Documentation
- Updated difference with NumPy on reduction function behavior (#144)
- Fix spelling in tutorial (#268)
- Fix test instruction in README (#310)
- Fix links to GitHub source pages (#332)
Examples
- Add Gaussian Mixture Model (GMM) example (#29, thanks @KotaroSetoyama!)
- Make grid size an integer for SGEMM example (#289, thanks @yuyu2172!)
- Use absolute path in SGEMM example (#291)
- Updated README for SGEMM example (#245, thanks @yuyu2172!)
Tests
- Use cupy.testing.for_all_dtypes (#269)
- Enable style check for Python code in Travis (#273)
- Refactor cupy.argsort tests (#282)
Others
- Small fixes for cupy.argsort (#223)
v1.0.2
This release includes bug fixes and improvements to the documentation and tests. See the list for the complete list of solved issues and merged PRs.
Enhancement
- Change allocation_unit_size from 256 to 512 (#256)
- Avoid synchronize in array function (#257)
- Deterministic test (#217)
  - Note that this change includes an additional public function; we prioritized stabilizing tests more than keeping the rule of not introducing new features in stable updates.
Bug fixes
- Fix out argument in fusion ufunc (#242)
- Fix array method on multi GPU (#258)
- Fix deepcopy with multiple devices (#263)
- Fix multi-device copyto (#275)
- Fix link args for cusolver (#315)
Installation
- Add compile option to build on Windows (#279)
- Do not create a.out on running python setup.py develop (#293)
- Fix link args for cusolver (#315)
Documentation
Tests
- Make tests deterministic when possible (#217)
- Add unit tests for cupy.array (#259)
- Fix NumPy VisibleDeprecationWarning in indexing tests (#261)
- Add retry to unit tests of decomposition functions (#262)
- Fix travis test to enable style check for normal Python code (#290)
- Skip bool unary negative test (#341)
Other
v2.0.0a1
This is the release of CuPy v2.0.0a1. See here for the complete list of solved issues and merged PRs.
Release Notes
Important updates
- We now use NVRTC instead of NVCC for kernel compilation. This change enables CuPy to run in an environment where CUDA is installed but NVCC is not available. Note that some features depending on Thrust (e.g. sorting functions) cannot be used if NVCC is not available at installation time.
- Many functions for sorting, linear algebra, and others are added
New features
- Use NVRTC instead of NVCC to compile kernels (#33, #62)
- Sorting functions
- Linear algebra functions
- Preliminary features to support sparse matrices
- cupy.mgrid and cupy.ogrid (#145, thanks @iory!)
- cupy.random.multinomial (#85)
- cupy.cumprod (#110, thanks @ronekko!)
- Support cuDNN v6 dilated convolution (#133, thanks @anaruse!)
- Add total_bytes(), free_bytes(), and used_bytes() methods to memory pool (#184)
- Support order option in astype (#111) and copy (#112)
- cupy.fuse now does not require parentheses (#43)
- Add ndim to CArray and CIndexer (#160, #161)
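A few of the array functions added above can be tried with the sketch below. It assumes cupy.mgrid, cupy.ogrid, and cupy.cumprod behave like their NumPy counterparts, and it falls back to NumPy when CuPy or a GPU is unavailable.

```python
# Use CuPy when available; otherwise fall back to NumPy (assumed API-compatible).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device so that setup failures surface here
except Exception:
    import numpy as xp

# Cumulative product: each element is the product of all elements up to it.
running_product = xp.cumprod(xp.asarray([1, 2, 3, 4])).tolist()

# mgrid returns a dense coordinate grid; ogrid returns broadcastable axes.
dense = xp.mgrid[0:2, 0:3]
rows, cols = xp.ogrid[0:2, 0:3]
```

The open grid from ogrid uses far less memory than the dense one, and the two axes broadcast against each other to cover the same coordinates.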
Enhancement
- Improve memory deallocation (#174)
- Skip installing thrust support in case nvcc not found in PATH. (#91)
- Improve asynchronous host to device copy (#123)
- Change the allocation unit size from 256 to 512 (#176)
- Workaround to "No supported gcc/g++ host compiler found" error in Ubuntu 17.04 (#198)
- Avoid synchronization in cupy.array for 0-dim values (#157)
- Make cupy.count_nonzero return an array instead of int to avoid device-to-host synchronization (#154)
- Check type in assert_array_list_equal (#205)
- Improve performance (#169, #171, #172, #193, #206)
- Improve testing utility (#218, #231)
- Refactor cupy.atleast_nd functions (#142)
Bug fixes
- Fix out argument in fusion (#209, #213)
- Fix cupy.array on multiple GPU environment (#122, #135)
- Fix usages of copy argument of ndarray.astype (#118, #121)
- Make memory pool thread-safe (#105, thanks @kmaehashi!)
- Fix fusion to reject NumPy arrays (#151)
- Fix thread safety of cupy.random.get_random_state (#77, #78)
Documents
- Fix tutorial (#93, thanks @hvy!)
- Add links to GitHub source pages (#131)
- Fix typo (#148, thanks @ignisan!)
- Write about advanced indexing support (#88, thanks @yuyu2172!)
- Remove description about discrepancy with NumPy regarding exponential of boolean arrays, which was resolved in NumPy 1.13.0 (#140)
- Add missing documentation of cupy.cumsum (#90, thanks @ronekko!)
- Add documentation of __getitem__ and __setitem__ for ndarray (#89, thanks @yuyu2172!)
- Minor improvement for README and the document (#45, #49, #117, #134, #138, #155 thanks @ClimbsRocks!, #165, #177, #166)
Examples
Tests
- Stabilize cupy.random.choice test (#98, #104)
- Fix NumPy VisibleDeprecationWarning in indexing tests (#202)
- Make random tests deterministic (#81, #82)
- Retry unit tests of decomposition functions (#129)
- Fix bug of histogram in RandomState.interval test (#175)
Others
v1.0.1
This release includes bug fixes and improvements on documents and tests. See the list for the complete list of solved issues and merged PRs.
Release Notes
Enhancement
- Workaround to "No supported gcc/g++ host compiler found" error in Ubuntu 17.04 (#243)
Bug fixes
- Make memory pool thread-safe (#109, thanks @kmaehashi!)
- Fix fusion to reject NumPy arrays (#241)
- Fix thread safety of cupy.random.get_random_state (#77, #99)
Documents
- Fix markdown in the tutorial (#106, thanks @hvy!)
- Write about advanced indexing support (#126, thanks @yuyu2172!)
- Remove description about discrepancy with NumPy regarding exponential of boolean arrays, which was resolved in NumPy 1.13.0 (#146)
- Fix typo in the tutorial (#153, thanks @ignisan!)
- Other documentation improvements (#125, #189, #173, #210)
Examples
- Fix color argument in the k-means example (#107)
Install
- Skip installing thrust support in case nvcc not found in PATH. (#116)
- Other install improvements (#143)
Others
v1.0.0
This is the release of CuPy v1.
This release also contains updates of CuPy included in Chainer v1.23.0 and v1.24.0. See the release note of Chainer v1.23.0 and the release note of Chainer v1.24.0 for the details.
Announcements
The set of supported versions of CUDA and cuDNN is changed from Chainer v1.x as follows.
- CUDA 7.0 and later
- cuDNN 4.0 and later
Release Notes
Note: We had originally planned to include NVRTC support for the just-in-time compilation of kernels via pynvrtc, but we found that there is no guarantee on pynvrtc being compatible with old versions of CUDA, so we decided to make our own wrapper instead. Unfortunately, it cannot be included in this version. We are planning to add NVRTC support in the next version.
New features
- Add cupy.sort function (#55, #66, #68)
- 64-bit address support on CUDA (#31)
- Support CUPY_SEED environment variable (#44)
Enhancement
- Refactor carray.cuh file (#53, #56, #57)
- Support lock-free cache of compiled nvcc binary (#37)
- Allow cupy.copyto from Python scalar (#38)
- Improve setup process (#65, #69, #70, #73, #76, #80)
Bug fixes
- Fix cupy.random.choice (#84)
Documents
Examples
- Add KMeans example (#35)