From f29a880003aa3c7d64b1dc98948e2758bea8133e Mon Sep 17 00:00:00 2001 From: Mark Harris Date: Tue, 2 Apr 2024 03:58:57 +0000 Subject: [PATCH 1/3] Fix ordering / heading levels in readme and python example in guide --- README.md | 70 ++++++++++++++++++++++---------------------- python/docs/guide.md | 10 +++---- 2 files changed, 40 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index 9ec8cbf47..13d6651a8 100644 --- a/README.md +++ b/README.md @@ -207,37 +207,6 @@ alignment argument. All allocations are required to be aligned to at least 256B. `device_memory_resource` adds an additional `cuda_stream_view` argument to allow specifying the stream on which to perform the (de)allocation. -## `cuda_stream_view` and `cuda_stream` - -`rmm::cuda_stream_view` is a simple non-owning wrapper around a CUDA `cudaStream_t`. This wrapper's -purpose is to provide strong type safety for stream types. (`cudaStream_t` is an alias for a pointer, -which can lead to ambiguity in APIs when it is assigned `0`.) All RMM stream-ordered APIs take a -`rmm::cuda_stream_view` argument. - -`rmm::cuda_stream` is a simple owning wrapper around a CUDA `cudaStream_t`. This class provides -RAII semantics (constructor creates the CUDA stream, destructor destroys it). An `rmm::cuda_stream` -can never represent the CUDA default stream or per-thread default stream; it only ever represents -a single non-default stream. `rmm::cuda_stream` cannot be copied, but can be moved. - -## `cuda_stream_pool` - -`rmm::cuda_stream_pool` provides fast access to a pool of CUDA streams. This class can be used to -create a set of `cuda_stream` objects whose lifetime is equal to the `cuda_stream_pool`. Using the -stream pool can be faster than creating the streams on the fly. The size of the pool is configurable. -Depending on this size, multiple calls to `cuda_stream_pool::get_stream()` may return instances of -`rmm::cuda_stream_view` that represent identical CUDA streams. - -### Thread Safety - -All current device memory resources are thread safe unless documented otherwise. More specifically, -calls to memory resource `allocate()` and `deallocate()` methods are safe with respect to calls to -either of these functions from other threads. They are _not_ thread safe with respect to -construction and destruction of the memory resource object. - -Note that a class `thread_safe_resource_adapter` is provided which can be used to adapt a memory -resource that is not thread safe to be thread safe (as described above). This adapter is not needed -with any current RMM device memory resources. - ### Stream-ordered Memory Allocation `rmm::mr::device_memory_resource` is a base class that provides stream-ordered memory allocation. @@ -386,17 +355,48 @@ line of the error comment. } ``` -### Allocators +## `cuda_stream_view` and `cuda_stream` + +`rmm::cuda_stream_view` is a simple non-owning wrapper around a CUDA `cudaStream_t`. This wrapper's +purpose is to provide strong type safety for stream types. (`cudaStream_t` is an alias for a pointer, +which can lead to ambiguity in APIs when it is assigned `0`.) All RMM stream-ordered APIs take a +`rmm::cuda_stream_view` argument. + +`rmm::cuda_stream` is a simple owning wrapper around a CUDA `cudaStream_t`. This class provides +RAII semantics (constructor creates the CUDA stream, destructor destroys it). An `rmm::cuda_stream` +can never represent the CUDA default stream or per-thread default stream; it only ever represents +a single non-default stream. 
`rmm::cuda_stream` cannot be copied, but can be moved. + +## `cuda_stream_pool` + +`rmm::cuda_stream_pool` provides fast access to a pool of CUDA streams. This class can be used to +create a set of `cuda_stream` objects whose lifetime is equal to the `cuda_stream_pool`. Using the +stream pool can be faster than creating the streams on the fly. The size of the pool is configurable. +Depending on this size, multiple calls to `cuda_stream_pool::get_stream()` may return instances of +`rmm::cuda_stream_view` that represent identical CUDA streams. + +## Thread Safety + +All current device memory resources are thread safe unless documented otherwise. More specifically, +calls to memory resource `allocate()` and `deallocate()` methods are safe with respect to calls to +either of these functions from other threads. They are _not_ thread safe with respect to +construction and destruction of the memory resource object. + +Note that a class `thread_safe_resource_adapter` is provided which can be used to adapt a memory +resource that is not thread safe to be thread safe (as described above). This adapter is not needed +with any current RMM device memory resources. + +## Allocators C++ interfaces commonly allow customizable memory allocation through an [`Allocator`](https://en.cppreference.com/w/cpp/named_req/Allocator) object. RMM provides several `Allocator` and `Allocator`-like classes. -#### `polymorphic_allocator` +### `polymorphic_allocator` A [stream-ordered](#stream-ordered-memory-allocation) allocator similar to [`std::pmr::polymorphic_allocator`](https://en.cppreference.com/w/cpp/memory/polymorphic_allocator). Unlike the standard C++ `Allocator` interface, the `allocate` and `deallocate` functions take a `cuda_stream_view` indicating the stream on which the (de)allocation occurs. -#### `stream_allocator_adaptor` +### `stream_allocator_adaptor` `stream_allocator_adaptor` can be used to adapt a stream-ordered allocator to present a standard `Allocator` interface to consumers that may not be designed to work with a stream-ordered interface. @@ -415,7 +415,7 @@ auto p = adapted.allocate(100); adapted.deallocate(p,100); ``` -#### `thrust_allocator` +### `thrust_allocator` `thrust_allocator` is a device memory allocator that uses the strongly typed `thrust::device_ptr`, making it usable with containers like `thrust::device_vector`. diff --git a/python/docs/guide.md b/python/docs/guide.md index c06135ca8..aee01118a 100644 --- a/python/docs/guide.md +++ b/python/docs/guide.md @@ -181,9 +181,9 @@ You can configure for memory allocations using their by configuring the current allocator. -```python -from rmm.allocators.torch import rmm_torch_allocator -import torch + ```python + >>> from rmm.allocators.torch import rmm_torch_allocator + >>> import torch -torch.cuda.memory.change_current_allocator(rmm_torch_allocator) -``` + >>>torch.cuda.memory.change_current_allocator(rmm_torch_allocator) + ``` From 134a87e29155ea3581dae3f13714aaa3d950b20f Mon Sep 17 00:00:00 2001 From: Mark Harris Date: Wed, 3 Apr 2024 01:45:22 +0000 Subject: [PATCH 2/3] Fix heading levels better --- README.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index 13d6651a8..0fe848fea 100644 --- a/README.md +++ b/README.md @@ -207,7 +207,7 @@ alignment argument. All allocations are required to be aligned to at least 256B. `device_memory_resource` adds an additional `cuda_stream_view` argument to allow specifying the stream on which to perform the (de)allocation. 
-### Stream-ordered Memory Allocation +## Stream-ordered Memory Allocation `rmm::mr::device_memory_resource` is a base class that provides stream-ordered memory allocation. This allows optimizations such as re-using memory deallocated on the same stream without the @@ -239,16 +239,16 @@ For further information about stream-ordered memory allocation semantics, read Allocator](https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/) on the NVIDIA Developer Blog. -### Available Resources +## Available Device Resources RMM provides several `device_memory_resource` derived classes to satisfy various user requirements. For more detailed information about these resources, see their respective documentation. -#### `cuda_memory_resource` +### `cuda_memory_resource` Allocates and frees device memory using `cudaMalloc` and `cudaFree`. -#### `managed_memory_resource` +### `managed_memory_resource` Allocates and frees device memory using `cudaMallocManaged` and `cudaFree`. @@ -256,22 +256,22 @@ Note that `managed_memory_resource` cannot be used with NVIDIA Virtual GPU Softw with virtual machines or hypervisors) because [NVIDIA CUDA Unified Memory is not supported by NVIDIA vGPU](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu). -#### `pool_memory_resource` +### `pool_memory_resource` A coalescing, best-fit pool sub-allocator. -#### `fixed_size_memory_resource` +### `fixed_size_memory_resource` A memory resource that can only allocate a single fixed size. Average allocation and deallocation cost is constant. -#### `binning_memory_resource` +### `binning_memory_resource` Configurable to use multiple upstream memory resources for allocations that fall within different bin sizes. Often configured with multiple bins backed by `fixed_size_memory_resource`s and a single `pool_memory_resource` for allocations larger than the largest bin size. -### Default Resources and Per-device Resources +## Default Resources and Per-device Resources RMM users commonly need to configure a `device_memory_resource` object to use for all allocations where another resource has not explicitly been provided. A common example is configuring a @@ -296,7 +296,7 @@ Accessing and modifying the default resource is done through two functions: `get_current_device_resource()` - For more explicit control, you can use `set_per_device_resource()`, which takes a device ID. -#### Example +### Example ```c++ rmm::mr::cuda_memory_resource cuda_mr; @@ -308,7 +308,7 @@ rmm::mr::set_current_device_resource(&pool_mr); // Updates the current device re rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource(); // Points to `pool_mr` ``` -#### Multiple Devices +### Multiple Devices A `device_memory_resource` should only be used when the active CUDA device is the same device that was active when the `device_memory_resource` was created. Otherwise behavior is undefined. @@ -497,13 +497,13 @@ Similar to `device_memory_resource`, it has two key functions for (de)allocation Unlike `device_memory_resource`, the `host_memory_resource` interface and behavior is identical to `std::pmr::memory_resource`. -### Available Resources +## Available Host Resources -#### `new_delete_resource` +### `new_delete_resource` Uses the global `operator new` and `operator delete` to allocate host memory. -#### `pinned_memory_resource` +### `pinned_memory_resource` Allocates "pinned" host memory using `cuda(Malloc/Free)Host`. 
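[Editor's note] The host resources described in the hunk above mirror the device resource interface minus the stream argument. A minimal usage sketch follows; the header paths and the single-argument `allocate`/`deallocate` calls are assumptions inferred from the class names in this README, not something this patch adds — check the API docs for your RMM version.

```c++
#include <rmm/mr/host/new_delete_resource.hpp>
#include <rmm/mr/host/pinned_memory_resource.hpp>

int main()
{
  // Pageable host memory allocated with the global operator new / operator delete.
  rmm::mr::new_delete_resource pageable_mr;
  void* pageable = pageable_mr.allocate(1024);
  pageable_mr.deallocate(pageable, 1024);

  // Page-locked ("pinned") host memory, typically used to speed up
  // host <-> device copies.
  rmm::mr::pinned_memory_resource pinned_mr;
  void* pinned = pinned_mr.allocate(1024);
  pinned_mr.deallocate(pinned, 1024);

  return 0;
}
```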
@@ -611,7 +611,7 @@ resources are detectable with Compute Sanitizer Memcheck. It may be possible in the future to add support for memory bounds checking with other memory resources using NVTX APIs. -## Using RMM in Python Code +# Using RMM in Python There are two ways to use RMM in Python code: @@ -622,7 +622,7 @@ There are two ways to use RMM in Python code: RMM provides a `MemoryResource` abstraction to control _how_ device memory is allocated in both the above uses. -### DeviceBuffers +## DeviceBuffer A DeviceBuffer represents an **untyped, uninitialized device memory allocation**. DeviceBuffers can be created by providing the @@ -662,7 +662,7 @@ host: array([1., 2., 3.]) ``` -### MemoryResource objects +## MemoryResource objects `MemoryResource` objects are used to configure how device memory allocations are made by RMM. From 7c90edc800b12e8099990eebd6bc2a87377273d3 Mon Sep 17 00:00:00 2001 From: Mark Harris Date: Wed, 3 Apr 2024 01:46:44 +0000 Subject: [PATCH 3/3] Indentation --- python/docs/guide.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/python/docs/guide.md b/python/docs/guide.md index aee01118a..968be8586 100644 --- a/python/docs/guide.md +++ b/python/docs/guide.md @@ -181,9 +181,9 @@ You can configure for memory allocations using their by configuring the current allocator. - ```python - >>> from rmm.allocators.torch import rmm_torch_allocator - >>> import torch +```python +>>> from rmm.allocators.torch import rmm_torch_allocator +>>> import torch - >>>torch.cuda.memory.change_current_allocator(rmm_torch_allocator) - ``` +>>> torch.cuda.memory.change_current_allocator(rmm_torch_allocator) +```
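[Editor's note] As a companion to the guide changes above, a sketch of configuring a `MemoryResource` before allocating might look like the following. The choice of `PoolMemoryResource` and the 1 GiB pool size are illustrative assumptions, not part of this patch series.

```python
>>> import rmm
>>> # Illustrative: back a suballocating pool with a plain CUDA resource
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.CudaMemoryResource(),
...     initial_pool_size=2**30,  # 1 GiB, chosen here only for illustration
... )
>>> rmm.mr.set_current_device_resource(pool)
>>> # Subsequent allocations, e.g. DeviceBuffer, are now served from the pool
>>> buf = rmm.DeviceBuffer(size=100)
>>> buf.size
100
```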