
Require explicit pool size in pool_memory_resource and move some things out of detail namespace #1417

Merged
31 commits
c43a8c1
Add new util to get a fraction of available device mem, move availabl…
harrism Dec 19, 2023
d238daa
Deprecate old pool_mr ctors (optional initial size) and add new ctors…
harrism Dec 19, 2023
3d65d4c
Update all tests and resources to use new pool ctors and util
harrism Dec 19, 2023
66d85b4
Rename fraction_of_free_device_memory to percent_of_free_device_memory
harrism Dec 20, 2023
265de9b
clang-tidy Ignore 50 and 100 magic numbers
harrism Dec 20, 2023
0be364b
Remove straggler includes of removed file.
harrism Dec 20, 2023
266afa9
Merge branch 'branch-24.02' into fea-explicit-initial-pool-size
harrism Dec 20, 2023
5d66f40
Another missed include.
harrism Dec 20, 2023
fae5b73
Add detail::available_device_memory back as an alias of rmm::availabl…
harrism Jan 9, 2024
92c0653
merge branch 24.02
harrism Jan 9, 2024
2acf759
copyright
harrism Jan 9, 2024
782ff55
document (and deprecate) available_device_memory alias
harrism Jan 9, 2024
0b4c968
Respond to feedback from @wence-
harrism Jan 9, 2024
4f91478
Include doxygen deprecated output in docs
wence- Jan 9, 2024
f581809
Minor docstring fixes
wence- Jan 9, 2024
bafd70a
Don't use zero for default size in test.
harrism Jan 10, 2024
a77d215
Add non-detail alignment utilities
harrism Jan 10, 2024
07dffa3
Duplicate (for now) alignment utilities in rmm:: namespace since outs…
harrism Jan 10, 2024
8afff2d
Don't deprecate anything just yet (until cuDF/cuGraph updated)
harrism Jan 10, 2024
0140bd4
Merge branch 'fea-explicit-initial-pool-size' of github.com:harrism/r…
harrism Jan 10, 2024
91752c8
Make percent_of_free_device_memory do what it says on the tin.
harrism Jan 10, 2024
baf429c
Fix remaining uses of pool ctor in docs and code
harrism Jan 10, 2024
c90e81c
Fix overflow in percent_of_free_device_memory
harrism Jan 10, 2024
c2843be
Fix Cython to provide explicit initial size
harrism Jan 10, 2024
6e0aeaa
Respond to review suggestions in aligned.hpp
harrism Jan 10, 2024
c3c61e1
Fix quoted auto includes
harrism Jan 10, 2024
014ac5b
missed file for detail changes
harrism Jan 10, 2024
909b733
Add utilities doxygen group
harrism Jan 11, 2024
0fc3fba
Add utilities to sphinx docs
harrism Jan 11, 2024
6f9b0bd
Minimal changes to squash doc build warnings
wence- Jan 11, 2024
4ae13fc
docs: Fix custom handler for missing references
wence- Jan 11, 2024
4 changes: 2 additions & 2 deletions .clang-tidy
@@ -62,8 +62,8 @@ CheckOptions:
value: 'alignment'
- key: cppcoreguidelines-avoid-magic-numbers.IgnorePowersOf2IntegerValues
value: '1'
- key: readability-magic-numbers.IgnorePowersOf2IntegerValues
value: '1'
- key: cppcoreguidelines-avoid-magic-numbers.IgnoredIntegerValues
value: "0;1;2;3;4;50;100"
- key: cppcoreguidelines-avoid-do-while.IgnoreMacros
value: 'true'
...
10 changes: 7 additions & 3 deletions README.md
@@ -332,7 +332,9 @@ Accessing and modifying the default resource is done through two functions:
```c++
rmm::mr::cuda_memory_resource cuda_mr;
// Construct a resource that uses a coalescing best-fit pool allocator
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr};
// With the pool initially half of available device memory
auto initial_size = rmm::percent_of_free_device_memory(50);
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr, initial_size};
rmm::mr::set_current_device_resource(&pool_mr); // Updates the current device resource pointer to `pool_mr`
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource(); // Points to `pool_mr`
```
@@ -351,11 +353,13 @@ per-device resources. Here is an example loop that creates `unique_ptr`s to `poo
objects for each device and sets them as the per-device resource for that device.

```c++
std::vector<unique_ptr<pool_memory_resource>> per_device_pools;
using pool_mr = rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>;
std::vector<unique_ptr<pool_mr>> per_device_pools;
for(int i = 0; i < N; ++i) {
cudaSetDevice(i); // set device i before creating MR
// Use a vector of unique_ptr to maintain the lifetime of the MRs
per_device_pools.push_back(std::make_unique<pool_memory_resource>());
// Note: for brevity, omitting creation of upstream and computing initial_size
per_device_pools.push_back(std::make_unique<pool_mr>(upstream, initial_size));
// Set the per-device resource for device i
set_per_device_resource(cuda_device_id{i}, &per_device_pools.back());
}
9 changes: 6 additions & 3 deletions benchmarks/device_uvector/device_uvector_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
* Copyright (c) 2020-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,6 +16,7 @@

#include "../synchronization/synchronization.hpp"

#include <rmm/cuda_device.hpp>
#include <rmm/cuda_stream.hpp>
#include <rmm/detail/error.hpp>
#include <rmm/device_uvector.hpp>
@@ -38,7 +39,8 @@
void BM_UvectorSizeConstruction(benchmark::State& state)
{
rmm::mr::cuda_memory_resource cuda_mr{};
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> mr{&cuda_mr};
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> mr{
&cuda_mr, rmm::percent_of_free_device_memory(50)};
rmm::mr::set_current_device_resource(&mr);

for (auto _ : state) { // NOLINT(clang-analyzer-deadcode.DeadStores)
@@ -59,7 +61,8 @@ BENCHMARK(BM_UvectorSizeConstruction)
void BM_ThrustVectorSizeConstruction(benchmark::State& state)
{
rmm::mr::cuda_memory_resource cuda_mr{};
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> mr{&cuda_mr};
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> mr{
&cuda_mr, rmm::percent_of_free_device_memory(50)};
rmm::mr::set_current_device_resource(&mr);

for (auto _ : state) { // NOLINT(clang-analyzer-deadcode.DeadStores)
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,6 +16,7 @@

#include <benchmarks/utilities/cxxopts.hpp>

#include <rmm/cuda_device.hpp>
#include <rmm/cuda_stream.hpp>
#include <rmm/cuda_stream_pool.hpp>
#include <rmm/device_uvector.hpp>
@@ -100,7 +101,8 @@ inline auto make_cuda_async() { return std::make_shared<rmm::mr::cuda_async_memo

inline auto make_pool()
{
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_cuda());
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(
make_cuda(), rmm::percent_of_free_device_memory(50));
}

inline auto make_arena()
8 changes: 5 additions & 3 deletions benchmarks/random_allocations/random_allocations.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,6 +16,7 @@

#include <benchmarks/utilities/cxxopts.hpp>

#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/arena_memory_resource.hpp>
#include <rmm/mr/device/binning_memory_resource.hpp>
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
@@ -165,12 +166,13 @@ inline auto make_cuda_async() { return std::make_shared<rmm::mr::cuda_async_memo

inline auto make_pool()
{
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_cuda());
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(
make_cuda(), rmm::percent_of_free_device_memory(50));
}

inline auto make_arena()
{
auto free = rmm::detail::available_device_memory().first;
auto free = rmm::available_device_memory().first;
constexpr auto reserve{64UL << 20}; // Leave some space for CUDA overhead.
return rmm::mr::make_owning_wrapper<rmm::mr::arena_memory_resource>(make_cuda(), free - reserve);
}
4 changes: 2 additions & 2 deletions benchmarks/replay/replay.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2021, NVIDIA CORPORATION.
* Copyright (c) 2020-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -61,7 +61,7 @@ inline auto make_pool(std::size_t simulated_size)
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(
make_simulated(simulated_size), simulated_size, simulated_size);
}
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_cuda());
return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_cuda(), 0);
}

inline auto make_arena(std::size_t simulated_size)
2 changes: 1 addition & 1 deletion doxygen/Doxyfile
@@ -504,7 +504,7 @@ EXTRACT_PACKAGE = NO
# included in the documentation.
# The default value is: NO.

EXTRACT_STATIC = NO
EXTRACT_STATIC = YES

# If the EXTRACT_LOCAL_CLASSES tag is set to YES, classes (and structs) defined
# locally in source files will be included in the documentation. If set to NO,
3 changes: 2 additions & 1 deletion include/doxygen_groups.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -41,4 +41,5 @@
* @defgroup errors Errors
* @defgroup logging Logging
* @defgroup thrust_integrations Thrust Integrations
* @defgroup utilities Utilities
*/
119 changes: 119 additions & 0 deletions include/rmm/aligned.hpp
@@ -0,0 +1,119 @@
/*
* Copyright (c) 2020-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cassert>
#include <cstddef>
#include <cstdint>

namespace rmm {

/**
* @addtogroup utilities
* @{
* @file
*/

/**
* @brief Default alignment used for host memory allocated by RMM.
*
*/
static constexpr std::size_t RMM_DEFAULT_HOST_ALIGNMENT{alignof(std::max_align_t)};

/**
* @brief Default alignment used for CUDA memory allocation.
*
*/
static constexpr std::size_t CUDA_ALLOCATION_ALIGNMENT{256};

/**
* @brief Returns whether or not `value` is a power of 2.
*
* @param[in] value to check.
*
 * @return Whether the input is a power of two with non-negative exponent
*/
constexpr bool is_pow2(std::size_t value) { return (value != 0U) && ((value & (value - 1)) == 0U); }

/**
* @brief Returns whether or not `alignment` is a valid memory alignment.
*
* @param[in] alignment to check
*
* @return Whether the alignment is valid
*/
constexpr bool is_supported_alignment(std::size_t alignment) { return is_pow2(alignment); }

/**
* @brief Align up to nearest multiple of specified power of 2
*
* @param[in] value value to align
* @param[in] alignment amount, in bytes, must be a power of 2
*
* @return Return the aligned value, as one would expect
*/
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) noexcept
{
assert(is_supported_alignment(alignment));
return (value + (alignment - 1)) & ~(alignment - 1);
}

/**
* @brief Align down to the nearest multiple of specified power of 2
*
* @param[in] value value to align
* @param[in] alignment amount, in bytes, must be a power of 2
*
* @return Return the aligned value, as one would expect
*/
constexpr std::size_t align_down(std::size_t value, std::size_t alignment) noexcept
{
assert(is_supported_alignment(alignment));
return value & ~(alignment - 1);
}

/**
* @brief Checks whether a value is aligned to a multiple of a specified power of 2
*
* @param[in] value value to check for alignment
* @param[in] alignment amount, in bytes, must be a power of 2
*
* @return true if aligned
*/
constexpr bool is_aligned(std::size_t value, std::size_t alignment) noexcept
{
assert(is_supported_alignment(alignment));
return value == align_down(value, alignment);
}

/**
* @brief Checks whether the provided pointer is aligned to a specified @p alignment
*
* @param[in] ptr pointer to check for alignment
* @param[in] alignment required alignment in bytes, must be a power of 2
*
* @return true if the pointer is aligned
*/
inline bool is_pointer_aligned(void* ptr, std::size_t alignment = CUDA_ALLOCATION_ALIGNMENT)
{
// NOLINTNEXTLINE(cppcoreguidelines-pro-type-reinterpret-cast)
return is_aligned(reinterpret_cast<std::uintptr_t>(ptr), alignment);
}

/** @} */ // end of group

} // namespace rmm
46 changes: 45 additions & 1 deletion include/rmm/cuda_device.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand All @@ -15,6 +15,7 @@
*/
#pragma once

#include <rmm/aligned.hpp>
#include <rmm/detail/error.hpp>

#include <cuda_runtime_api.h>
Expand Down Expand Up @@ -102,6 +103,49 @@ inline int get_num_cuda_devices()
return num_dev;
}

/**
* @brief Returns the available and total device memory in bytes for the current device
*
* @return The available and total device memory in bytes for the current device as a std::pair.
*/
inline std::pair<std::size_t, std::size_t> available_device_memory()
{
std::size_t free{};
std::size_t total{};
RMM_CUDA_TRY(cudaMemGetInfo(&free, &total));
return {free, total};
}

namespace detail {

/**
* @brief Returns the available and total device memory in bytes for the current device
*
* @deprecated Use rmm::available_device_memory() instead.
*
* @return The available and total device memory in bytes for the current device as a std::pair.
*/
//[[deprecated("Use `rmm::available_device_memory` instead.")]] //
const auto available_device_memory = rmm::available_device_memory;

} // namespace detail

/**
* @brief Returns the approximate specified percent of available device memory on the current CUDA
* device, aligned (down) to the nearest CUDA allocation size.
*
* @param percent The percent of free memory to return.
*
* @return The recommended initial device memory pool size in bytes.
*/
inline std::size_t percent_of_free_device_memory(int percent)
{
[[maybe_unused]] auto const [free, total] = rmm::available_device_memory();
auto fraction = static_cast<double>(percent) / 100.0;
return rmm::align_down(static_cast<std::size_t>(static_cast<double>(free) * fraction),
rmm::CUDA_ALLOCATION_ALIGNMENT);
}

/**
* @brief RAII class that sets the current CUDA device to the specified device on construction
* and restores the previous device on destruction.
Expand Down