[FEA] Provide device_resources_manager for easy generation of device_resources #1716
Conversation
I took a cursory look through the code. The intent is pretty clear and I think the implementation looks good. I'll take a closer look tomorrow and read through the description more thoroughly. I suggest we name this `device_resources_manager`.
Also implement a destructor to ensure that resources created by the manager are appropriately destroyed before the CUDA context is destroyed.
Edit: Figured it out!
Thanks for this PR! It's shaping up really well, and I am very happy with your descriptions as well as with how well-commented the code is.
I finally got a chance to review this more thoroughly. Mostly, I'd like to separate the concerns a little better between individual `device_resources` and the `device_resources_manager`. From the RAFT perspective, `device_resources` is not just an API for invoking RAFT's APIs: it's supposed to be responsible for maintaining the state assigned to an individual device. In many ways, this `device_resources_manager` implementation is side-stepping the individual resource management when we could be using this as an opportunity to reuse the existing centralized resource management infrastructure.
Other than that, I think the design is great. RAFT has been in need of something that can manage multiple devices (e.g., instances of `device_resources` for different devices) across different threads.
I'm even okay with stepping down a layer and using `raft::resources` directly, if that makes the experience cleaner, but I'd like to integrate and reuse the `raft::resource` APIs as much as possible.
Could you be more specific? The intention of this object is to ensure that …
Refactored to cache the generated `device_resources`.
Marking my comment about splitting the logger code into `logger-inl` and `logger-ext` as a request for changes 👍
https://github.com/rapidsai/raft/pull/1716/files#r1286897448
Updated everything requested except the docs. I'll finish those up tomorrow.
One more move of `log_level_setter` requested. Sorry, should have given this a bit more thought in the previous comment.
Thanks for moving `log_level_setter` to `logger-ext.cuh`. LGTM!
LGTM. Thanks @wphicks!
/merge
Edit: I have renamed the object to `raft::device_resources_manager` on the (excellent) suggestion of @cjnolet. Substitute the name accordingly in the description below.
Summary
This PR introduces a new utility (`raft::resource_manager`) to RAFT that helps downstream applications correctly generate the `raft::device_resources` they need to interact with the RAFT API.

Purpose
As more vector search applications have begun integrating RAFT, it has become apparent that correctly managing CUDA resources like streams, stream pools, and device memory can be a challenge in a codebase that has previously focused exclusively on CPU execution. As a specific example, these applications are generally highly multi-threaded on the host. As they begin to use RAFT, they typically make use of the default `device_resources` constructor to generate the requisite `device_resources` object for the API. Because this default constructor makes use of the default stream per thread, the application uses as many streams as there are host threads. This behavior can lead to exhaustion of device resources, because all of those host threads simultaneously launch work on independent streams.

In a CUDA-aware codebase, we might expect the application to manage its own limited pool of streams, but requiring this creates an unnecessary barrier to RAFT adoption. Instead, the `resource_manager` provides a straightforward method for limiting streams and other CUDA resources to sensible values in a highly multi-threaded application.

Usage
To use the `resource_manager`, the host application makes use of setters which provide control over various device resources. For instance, to limit the total number of streams used by the application to 16 per device, the application would call the following during its startup:
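(A minimal sketch of what this might look like, written against the renamed `raft::device_resources_manager`; the header path and the setter name `set_streams_per_device` are assumptions based on the description above.)

```cpp
#include <raft/core/device_resources_manager.hpp>

int main() {
  // During startup, before any thread calls get_device_resources():
  // draw all primary streams from a shared pool of 16 per device.
  raft::device_resources_manager::set_streams_per_device(16);
  // ... remainder of application setup and execution ...
}
```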
After startup, if the application wishes to make a RAFT API call using a `raft::device_resources` object, it may call the following:
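(Again a sketch; the accessor name `get_device_resources` comes from the description, while the return-by-reference signature is an assumption.)

```cpp
#include <raft/core/device_resources_manager.hpp>

void invoke_raft_api() {
  // Returns device_resources built on the stream assigned to this host
  // thread; repeated calls from this thread reuse the same stream.
  auto const& res = raft::device_resources_manager::get_device_resources();
  // ... pass `res` to any RAFT API call ...
}
```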
If the same host thread calls `get_device_resources()` again in another function, it will retrieve a `device_resources` object based on the exact same stream it got with the previous call. This is similar in spirit to the way that the default CUDA stream per thread is used, but it draws from a limited pool of streams. This also means that while each host thread is associated with one CUDA stream, that stream may be associated with multiple host threads.

In addition to drawing the `device_resources` primary stream from a limited pool, we can share a pool of stream pools among host threads:
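(A sketch of the described call; the setter name `set_stream_pools_per_device` and its two-argument form, pools per device and streams per pool, are assumptions.)

```cpp
// During startup: maintain 4 stream pools per device, each containing
// 8 streams, shared among host threads like the primary streams.
raft::device_resources_manager::set_stream_pools_per_device(4, 8);
```

Besides streams and stream pools, the resource manager optionally allows initialization of an RMM memory pool for device allocations:

```cpp
// During startup: request an RMM memory pool on each device with an
// initial size of 1 GiB, allowed to grow to at most 4 GiB. The setter
// names set_init_mem_pool_size/set_max_mem_pool_size are assumed here.
raft::device_resources_manager::set_init_mem_pool_size(std::size_t{1} << 30);
raft::device_resources_manager::set_max_mem_pool_size(std::size_t{4} << 30);
```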
For downstream consumers who know they want a memory pool but are not sure what sizes to pick, the following convenience function just sets up the memory pool with RMM defaults:
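(A sketch, assuming that convenience function is named `set_mem_pool`.)

```cpp
// During startup: set up the per-device memory pool using RMM's
// default initial and maximum sizes.
raft::device_resources_manager::set_mem_pool();
```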
If no memory pool related options are set or if the maximum memory pool size is set to 0, no memory pool will be created or used. Furthermore, if the type of the current device memory resource is non-default, no memory pool will be created or used, and a warning will be emitted. We assume that if the application has already set a non-default memory resource, this was done intentionally and should not be overwritten.
Design
This object is designed with the following priorities:
- ease of use for codebases that are not yet CUDA-aware
- performance: fast, non-blocking retrieval of resources
- exposing as many `device_resources` options as possible

If a downstream application needs complete control over `device_resources` creation and memory resource initialization, that codebase is probably already CUDA-aware and will not benefit from the resource manager. Therefore, we do not insist that every possible configuration of resources be available through the manager. Nevertheless, codebases may grow more CUDA-aware with time, so we provide access to as many options as possible (with sensible defaults) to provide an on-ramp to more sophisticated manipulation of device resources.

In terms of performance, the goal is to make retrieval of `device_resources` as fast as possible and to avoid blocking other host threads. To that end, the design of `resource_manager` includes a layer of indirection that ensures that each host thread needs to acquire a lock only on its first `get_device_resources` call for any given device. The `resource_manager` singleton maintains an internal configuration object that may be updated using the setter methods until the first call to `get_device_resources` on any thread. After that, a warning will be emitted and no changes will be made on any subsequent setter calls.
Within the `get_device_resources` call, each thread keeps track of which devices it has already retrieved resources for. If it has not yet retrieved resources for that device, it acquires a lock. It then marks the configuration as finalized and checks to see if any thread has initialized the shared resources for that device. If no other thread has, it initializes those resources itself. It then updates its own thread_local list of devices it has retrieved resources for, so that it does not need to reacquire the lock on subsequent calls. A simplified sketch of this scheme follows.
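(Illustrative only: every name below is invented, and the real implementation differs in its details. The sketch shows how a thread_local record of already-initialized devices limits locking to each thread's first call per device.)

```cpp
#include <array>
#include <mutex>
#include <optional>
#include <set>
#include <string>

namespace sketch {
constexpr int max_devices = 16;
std::mutex resource_lock;
// Fixed-size storage so lock-free reads of an already-initialized slot
// are safe; the real code uses a layer of indirection to similar effect.
std::array<std::optional<std::string>, max_devices> shared_resources;

std::string const& get_device_resources(int device_id) {
  // Each thread tracks the devices it has already retrieved resources for.
  thread_local std::set<int> initialized_devices;
  if (initialized_devices.count(device_id) == 0) {
    std::lock_guard<std::mutex> guard{resource_lock};
    // First call from this thread for this device: finalize configuration
    // (omitted here) and initialize the shared resources only if no other
    // thread has done so yet.
    if (!shared_resources[device_id]) {
      shared_resources[device_id] =
        "resources for device " + std::to_string(device_id);
    }
    initialized_devices.insert(device_id);
  }
  // Subsequent calls from this thread reach here without taking the lock.
  return *shared_resources[device_id];
}
}  // namespace sketch
```

Because each slot is written only once, under the lock, and each thread's first access to a slot happens inside its own locked section, the later lock-free reads are safe.

Questions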
This PR is still a WIP. It is being posted here now primarily to gather feedback on its public API. A thorough review of the implementation can happen once tests have been added and it is moved out of WIP.

Edit: This PR is now ready for review. Implementation feedback welcome!

One question I have is about two convenience functions I added right at the end: `synchronize_work_from_this_thread` and `synchronize_work_from_all_threads`. The idea behind these functions is that, for the target audience of this feature, it may be helpful to provide synchronization helpers that clearly indicate what their execution means in relation to the more familiar synchronization requirements of the host code. The first is intended to communicate that it is guaranteed to block host execution on this thread until all work submitted to the device from that thread is completed. The hope is that this can increase confidence around questions of synchronicity that developers unfamiliar with CUDA streams sometimes have. Because this helper is not strictly required for the core functionality of `resource_manager`, however, it would be useful to have feedback on whether others think it is worthwhile.

I am even less certain about `synchronize_work_from_all_threads`. The idea behind this is that it provides a way to block host thread execution until all work submitted on streams under the `resource_manager`'s control has completed. A natural question is why we would not just perform a device synchronization at that point. My justification is that the application may also be integrating other libraries which have their own stream management and synchronization infrastructure. In that case, it may be desirable not to synchronize work submitted to the device by calls to the other library. It would be useful to hear if others think this might be useful for the target audience of this PR, or if we should just push them toward a device synchronization to avoid unpleasant edge cases.
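(For concreteness, a hypothetical usage sketch of the two helpers; their names come from this PR, but the `raft::device_resources_manager::` scoping and zero-argument signatures are assumptions.)

```cpp
#include <raft/core/device_resources_manager.hpp>

void after_submitting_work() {
  // Block this host thread until all device work submitted from this
  // thread (on manager-provided streams) has completed.
  raft::device_resources_manager::synchronize_work_from_this_thread();
}

void before_global_checkpoint() {
  // Block until device work submitted from *all* host threads on
  // manager-controlled streams has completed, without synchronizing
  // work submitted by other libraries on their own streams.
  raft::device_resources_manager::synchronize_work_from_all_threads();
}
```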