Support multithreaded reading of compressed buffers in JSON reader #17670

Open
wants to merge 20 commits into
base: branch-25.02


shrshi
Contributor

@shrshi shrshi commented Jan 2, 2025

Description

Addresses #17638

This PR introduces multithreaded host-side decompression of compressed input buffers passed to the JSON reader, and uses a stream pool to transfer the uncompressed buffers to device.
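For readers unfamiliar with the pattern, the host-side half of this idea can be sketched with the standard library alone. This is an illustration, not the code in this PR: `decompress_rle` is a hypothetical stand-in for a real codec, `decompress_all` is an invented helper name, and `std::async` stands in for the thread pool. The device transfer through a stream pool is omitted because it needs CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical stand-in for a real codec: decodes (count, byte) run-length pairs.
std::vector<std::uint8_t> decompress_rle(std::vector<std::uint8_t> const& src)
{
  assert(src.size() % 2 == 0);
  std::vector<std::uint8_t> out;
  for (std::size_t i = 0; i + 1 < src.size(); i += 2) {
    out.insert(out.end(), static_cast<std::size_t>(src[i]), src[i + 1]);
  }
  return out;
}

// Decompress each input buffer on its own task and return the results in input order.
// Assumption: buffers are independent, so the tasks share no mutable state.
std::vector<std::vector<std::uint8_t>> decompress_all(
  std::vector<std::vector<std::uint8_t>> const& compressed)
{
  std::vector<std::future<std::vector<std::uint8_t>>> tasks;
  tasks.reserve(compressed.size());
  for (auto const& buf : compressed) {
    // `buf` stays alive until every future has been waited on below.
    tasks.push_back(std::async(std::launch::async, [&buf] { return decompress_rle(buf); }));
  }

  std::vector<std::vector<std::uint8_t>> result;
  result.reserve(tasks.size());
  for (auto& task : tasks) {
    result.push_back(task.get());
  }
  return result;
}
```

Collecting the futures in submission order keeps the output aligned with the input buffers regardless of which task finishes first.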

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Jan 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the label libcudf (Affects libcudf (C++/CUDA) code.) on Jan 2, 2025
shrshi added the labels feature request (New feature or request), cuIO (cuIO issue), and non-breaking (Non-breaking change) on Jan 2, 2025

shrshi commented Jan 2, 2025

/ok to test


shrshi commented Jan 2, 2025

/ok to test


shrshi commented Jan 2, 2025

/ok to test


shrshi commented Jan 2, 2025

/ok to test


shrshi commented Jan 3, 2025

/ok to test


shrshi commented Jan 3, 2025

/ok to test


shrshi commented Jan 4, 2025

/ok to test


shrshi commented Jan 4, 2025

/ok to test

@shrshi shrshi marked this pull request as ready for review January 6, 2025 18:53
@shrshi shrshi requested a review from a team as a code owner January 6, 2025 18:53
Comment on lines +119 to +125
  std::future<size_t> device_read_async(size_t offset,
                                        size_t size,
                                        uint8_t* dst,
                                        rmm::cuda_stream_view stream) override
  {
    auto& thread_pool = pools::tpool();
    return thread_pool.submit_task([this, offset, size, dst, stream] {
Contributor

@ttnghia ttnghia Jan 7, 2025

Will this be called by multiple threads? If so, we may have a race condition issue.

Contributor Author

@shrshi shrshi Jan 7, 2025

No, device_read_async is only called by the primary thread. Each worker thread executes only the task passed to thread_pool.submit_task(..)
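The calling pattern described in this reply can be sketched in standard C++ as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `host_buffer_source` and `read_async` are hypothetical names, `std::async` stands in for the thread pool, and a plain `memcpy` replaces the stream-ordered copy to device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <future>
#include <utility>
#include <vector>

// Hypothetical host-only source used for illustration.
class host_buffer_source {
 public:
  explicit host_buffer_source(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Intended to be called from a single (primary) thread. Only the lambda body
  // runs on a worker thread, and it only reads from `data_`.
  std::future<std::size_t> read_async(std::size_t offset,
                                      std::size_t size,
                                      std::uint8_t* dst) const
  {
    return std::async(std::launch::async, [this, offset, size, dst] {
      if (offset >= data_.size()) { return std::size_t{0}; }
      auto const count = std::min(size, data_.size() - offset);
      std::memcpy(dst, data_.data() + offset, count);
      return count;  // number of bytes actually copied
    });
  }

 private:
  std::vector<std::uint8_t> data_;
};
```

In this sketch the tasks are free of data races only as long as concurrent reads write to non-overlapping destination ranges and the source object outlives every returned future; both are responsibilities of the caller.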


shrshi commented Jan 7, 2025

/ok to test

3 participants