Fix Arena MR to support simultaneous access by PTDS and other streams #1395

Merged (12 commits), Nov 29, 2023
include/rmm/mr/device/arena_memory_resource.hpp: 21 changes (20 additions, 1 deletion)
@@ -235,7 +235,26 @@ class arena_memory_resource final : public device_memory_resource {
       }
     }
 
-    if (!global_arena_.deallocate(ptr, bytes)) { RMM_FAIL("allocation not found"); }
+    if (!global_arena_.deallocate(ptr, bytes)) {
+      // It's possible to use per thread default streams along with another pool of streams.
+      // This means that it's possible for an arena to move from a thread or stream arena back

Review (Contributor): for an allocation to move ...
Reply (Contributor Author): updated

+      // into the global arena during a defragmentation and then move down into another arena
+      // type. For instance, thread arena -> global arena -> stream arena. If this happens and
+      // there was an allocation from it while it was a thread arena, we now have to check to
+      // see if the allocation is part of a stream arena, and vice versa.
+      // Only do this in exceptional cases to not affect performance and have to check all

wence- marked this conversation as resolved.

+      // arenas all the time.
+      if (use_per_thread_arena(stream)) {
+        for (auto& stream_arena : stream_arenas_) {
+          if (stream_arena.second.deallocate(ptr, bytes)) { return; }
+        }
+      } else {
+        for (auto const& thread_arena : thread_arenas_) {
+          if (thread_arena.second->deallocate(ptr, bytes)) { return; }
+        }
+      }
+      RMM_FAIL("allocation not found");
+    }
   }
 
   /**

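The fallback described in the code comment above can be modelled on the host without CUDA. This is a simplified sketch, not RMM's implementation: `toy_arena` and `toy_arena_mr` are hypothetical types whose arenas only record the pointers they handed out, a plain `bool` stands in for `use_per_thread_arena(stream)`, an `int` stands in for the stream handle used as the `stream_arenas_` key, and any lookups performed above this hunk (not shown here) are omitted, so the model starts at the global-arena check.

```cpp
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for an arena: it only records which pointers it handed out.
struct toy_arena {
  std::set<void const*> live;

  // Returns false when the pointer did not come from this arena.
  bool deallocate(void const* ptr) { return live.erase(ptr) == 1; }
};

// Hypothetical model of the deallocation fallback in the hunk above.
// Stream arenas are held by value and thread arenas by shared_ptr, matching the
// `.` and `->` accesses in the diff; an int replaces the CUDA stream key.
struct toy_arena_mr {
  toy_arena global_arena;
  std::map<int, toy_arena> stream_arenas;
  std::map<std::thread::id, std::shared_ptr<toy_arena>> thread_arenas;

  void deallocate(void const* ptr, bool per_thread_stream)
  {
    if (global_arena.deallocate(ptr)) { return; }
    // The superblock holding ptr may have migrated, via the global arena,
    // into an arena of the other kind, so search those before giving up.
    if (per_thread_stream) {
      for (auto& stream_arena : stream_arenas) {
        if (stream_arena.second.deallocate(ptr)) { return; }
      }
    } else {
      for (auto const& thread_arena : thread_arenas) {
        if (thread_arena.second->deallocate(ptr)) { return; }
      }
    }
    throw std::runtime_error("allocation not found");
  }
};
```

For example, if a pointer whose superblock once belonged to a thread arena now sits in a stream arena, `deallocate(ptr, /*per_thread_stream=*/true)` misses in the global arena and then finds it while scanning `stream_arenas`. Because the scan only runs after the global-arena lookup fails, the common path is unaffected, which is the trade-off the code comment describes.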
tests/mr/device/arena_mr_tests.cpp: 33 changes (33 additions, 0 deletions)
@@ -533,6 +533,39 @@ TEST_F(ArenaTest, Defragment) // NOLINT
   }());
 }
 
+TEST_F(ArenaTest, PerThreadToStreamDealloc) // NOLINT
+{
+  // This is testing that deallocation of a ptr still works when
+  // it was originally allocated in a superblock that was in a thread
+  // arena that then moved to global arena during a defragmentation
+  // and then moved to a stream arena.
+  auto const arena_size = superblock::minimum_size * 2;
+  arena_mr mr(rmm::mr::get_current_device_resource(), arena_size);
+  auto per_thread_stream = rmm::cuda_stream_per_thread;
+  // Create an allocation from a per thread arena
+  void* thread_ptr = mr.allocate(256, per_thread_stream);

Review (Member): Nit [non-blocking]: No need to store this constant in a variable.
Suggested change, replacing
    auto per_thread_stream = rmm::cuda_stream_per_thread;
    // Create an allocation from a per thread arena
    void* thread_ptr = mr.allocate(256, per_thread_stream);
with
    // Create an allocation from a per thread arena
    void* thread_ptr = mr.allocate(256, rmm::cuda_stream_per_thread);

+  // Create an allocation in a stream arena to force global arena
+  // to be empty
+  cuda_stream stream{};
+  void* ptr = mr.allocate(32_KiB, stream);
+  mr.deallocate(ptr, 32_KiB, stream);
+  // at this point the global arena doesn't have any superblocks so
+  // the next allocation causes defrag. Defrag causes all superblocks
+  // from the thread and stream arena allocated above to go back to
+  // global arena and it allocates one superblock to the stream arena.
+  auto* ptr1 = mr.allocate(superblock::minimum_size);

Review (Member):
Question: What stream will this use?
Suggestion: Can you make the stream explicit?

Review (Member): e.g.
Suggested change, replacing
    auto* ptr1 = mr.allocate(superblock::minimum_size);
with
    auto* ptr1 = mr.allocate(superblock::minimum_size, rmm::cuda_stream_view{});

+  // Allocate again to make sure all superblocks from
+  // global arena are owned by a stream arena instead of a thread arena
+  // or the global arena.
+  auto* ptr2 = mr.allocate(32_KiB);

Review (Member):
Question: What stream will this use?
Suggestion: Can you make the stream explicit?

+  // The original thread ptr is now owned by a stream arena so make
+  // sure deallocation works.
+  // NOLINTNEXTLINE(cppcoreguidelines-avoid-goto)
+  EXPECT_NO_THROW(mr.deallocate(thread_ptr, 256, per_thread_stream));

Review (Member): Nit [non-blocking]:
Suggested change, replacing
    EXPECT_NO_THROW(mr.deallocate(thread_ptr, 256, per_thread_stream));
with
    EXPECT_NO_THROW(mr.deallocate(thread_ptr, 256, rmm::cuda_stream_per_thread));

Review (Member): Nit [non-blocking]: Tests will fail if there is an exception whether or not you use EXPECT_NO_THROW. Not throwing is expected by default, so you can remove the macro and the NOLINTNEXTLINE.

+  mr.deallocate(ptr1, superblock::minimum_size);
+  mr.deallocate(ptr2, 32_KiB);

Review (Member):
Question: What stream will this use?
Suggestion: Can you make the stream explicit?

+}
+
 TEST_F(ArenaTest, DumpLogOnFailure) // NOLINT
 {
   arena_mr mr{rmm::mr::get_current_device_resource(), 1_MiB, true};
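The ownership chain that `PerThreadToStreamDealloc` sets up (thread arena -> global arena -> stream arena) can be replayed with an even smaller model. `owner_kind`, `toy_superblock_pool`, and `replay_per_thread_to_stream` are hypothetical names; the model tracks only which kind of arena owns each superblock, treats defragmentation exactly as the test's comments describe it (every superblock returns to the global arena), and ignores sizes, streams, and free lists. It illustrates those comments, not RMM's actual defragmentation code.

```cpp
#include <map>

// Hypothetical toy model (not RMM's data layout): which kind of arena
// currently owns each superblock, keyed by a small integer id.
enum class owner_kind { global, thread, stream };

struct toy_superblock_pool {
  std::map<int, owner_kind> owner;

  explicit toy_superblock_pool(int count)
  {
    for (int id = 0; id < count; ++id) { owner[id] = owner_kind::global; }
  }

  // Move the first globally owned superblock to an arena of the given kind.
  // Returns its id, or -1 when the global arena has nothing left.
  int acquire(owner_kind kind)
  {
    for (auto& [id, who] : owner) {
      if (who == owner_kind::global) {
        who = kind;
        return id;
      }
    }
    return -1;
  }

  // Defragmentation as the test comments describe it: every superblock
  // goes back to the global arena.
  void defragment()
  {
    for (auto& entry : owner) { entry.second = owner_kind::global; }
  }
};

// Replays the test's steps and reports which kind of arena ends up owning
// the superblock that served thread_ptr.
owner_kind replay_per_thread_to_stream()
{
  toy_superblock_pool pool(2);                             // arena_size = 2 superblocks
  int const thread_sb = pool.acquire(owner_kind::thread);  // thread_ptr
  pool.acquire(owner_kind::stream);                        // 32_KiB on `stream`; global now empty
  if (pool.acquire(owner_kind::stream) == -1) {            // ptr1 finds nothing...
    pool.defragment();                                     // ...so defragmentation runs
    pool.acquire(owner_kind::stream);                      // ptr1
  }
  pool.acquire(owner_kind::stream);                        // ptr2
  return pool.owner.at(thread_sb);
}
```

In this model the superblock behind `thread_ptr` finishes in a stream arena, so a deallocation issued on `rmm::cuda_stream_per_thread` can only succeed if the per-thread path also searches the stream arenas.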