Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to enable rmm pool on cudf.pandas import #15628

Merged
merged 10 commits into from
Jun 13, 2024

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented May 1, 2024

Description

This PR enables allocating of rmm memory pool on cudf.pandas import using the following environment variables:

export CUDF_PANDAS_RMM_MODE="pool"

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 1, 2024
@galipremsagar galipremsagar self-assigned this May 1, 2024
@galipremsagar galipremsagar requested a review from a team as a code owner May 1, 2024 15:12
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 1, 2024
python/cudf/cudf/pandas/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/pandas/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/pandas/__init__.py Outdated Show resolved Hide resolved
if cudf_pandas_mr is not None:
from rmm.mr import PoolMemoryResource

mr = PoolMemoryResource(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want the user's specified MR to be wrapped in a pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, open to suggestions here. Or we can change this as we know which works the best from experimenting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly not. In the example in the PR description you specify CudeAsyncMemoryResource. That already has a pool, you probably wouldn't always want to wrap that in an RMM PoolMemoryResource.

I thought in the meeting we discussed just providing a limited set of choices here. More sophisticated users should be able to import RMM first and configure the current device resource to override if they have very specific MR needs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do a similar thing in libcudf NVBench benchmarks. Here are the options we expose there, for reference:

inline std::shared_ptr<rmm::mr::device_memory_resource> create_memory_resource(
std::string const& mode)
{
if (mode == "cuda") return make_cuda();
if (mode == "pool") return make_pool();
if (mode == "async") return make_async();
if (mode == "arena") return make_arena();
if (mode == "managed") return make_managed();
if (mode == "managed_pool") return make_managed_pool();
CUDF_FAIL("Unknown rmm_mode parameter: " + mode +
"\nExpecting: cuda, pool, async, arena, managed, or managed_pool");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism Do you feel comfortable having a similar option set for cudf.pandas?

Copy link
Member

@madsbk madsbk May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism Do you feel comfortable having a similar option set for cudf.pandas?

I think this option set would be good, it covers the typical use cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need arena?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need arena?

Maybe, I haven't tried it, but its fragmentation avoidance might be useful.
However, to limit the user complexity/documentation of cudf.pandas, I am also fine skipping arena. We can always add it later if it turns out to be useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also manually enable it by importing RMM and enabling it before importing cudf.pandas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR to have simpler rmm mode selection and added pytests for the same.

@harrism @bdice This should be ready for review.

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should always wrap with a PoolMemoryResource

if cudf_pandas_mr is not None:
from rmm.mr import PoolMemoryResource

mr = PoolMemoryResource(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly not. In the example in the PR description you specify CudeAsyncMemoryResource. That already has a pool, you probably wouldn't always want to wrap that in an RMM PoolMemoryResource.

I thought in the meeting we discussed just providing a limited set of choices here. More sophisticated users should be able to import RMM first and configure the current device resource to override if they have very specific MR needs.


if (rmm_mode := os.getenv("CUDF_PANDAS_RMM_MODE", None)) is not None:
import rmm.mr
from rmm._lib.memory_resource import get_free_device_memory
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is being added in: rapidsai/rmm#1567

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

python/cudf/cudf_pandas_tests/test_cudf_pandas.py Outdated Show resolved Hide resolved
rapids-bot bot pushed a commit to rapidsai/rmm that referenced this pull request Jun 4, 2024
…#1567)

This PR adds `get_free_device_memory` that returns free GPU memory necessary for rapidsai/cudf#15628

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1567
@galipremsagar galipremsagar changed the base branch from branch-24.06 to branch-24.08 June 12, 2024 14:22
@github-actions github-actions bot added the cudf.pandas Issues specific to cudf.pandas label Jun 12, 2024
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions/questions but none are blocking. Resolve as you see fit.

@@ -19,6 +22,46 @@ def install():
loader = ModuleAccelerator.install("pandas", "cudf", "pandas")
global LOADED
LOADED = loader is not None
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this import to the top of the file.


if (rmm_mode := os.getenv("CUDF_PANDAS_RMM_MODE", None)) is not None:
import rmm.mr
from rmm.mr import available_device_memory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we import rmm at the top, too? I don’t think we have much benefit to lazy import here.

Also we should use the fully qualified name for rmm.mr.available_device_memory so we only have one import statement for rmm.mr.

initial_pool_size=free_memory,
)
else:
raise TypeError(f"Unsupported rmm mode: {rmm_mode}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise TypeError(f"Unsupported rmm mode: {rmm_mode}")
raise ValueError(f"Unsupported rmm mode: {rmm_mode}")

mr = rmm.mr.ManagedMemoryResource()
rmm.mr.set_current_device_resource(mr)
elif rmm_mode == "managed_pool":
rmm.reinitialize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this call reinitialize while other modes create and set the current device resource?

@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jun 13, 2024
@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 3cb3df3 into rapidsai:branch-24.08 Jun 13, 2024
75 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jun 13, 2024
This PR addresses review comments made by @bdice here: #15628 (review)

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #16021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cudf.pandas Issues specific to cudf.pandas improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants