Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement the prefetcher using backpressure mechanism #980

Merged
merged 4 commits into from
Aug 15, 2024

Conversation

monthonk
Copy link
Contributor

@monthonk monthonk commented Aug 10, 2024

Description of change

The prefetcher now uses only one GetObject request to fetch data in advance. This request has a range of entire object but use backpressure mechanism to control how much data it wants to fetch into the part queue instead of spawning up to two requests in parallel.

This should make the throughput more stable because previously the two request tasks could compete with each other when fetching data from S3. Also, it will be easier to control how much data we want to store in the part queue.

Related: #987

Does this change impact existing behavior?

No, the change only for internal interface.

Does this change need a changelog entry in any of the crates?

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

The prefetcher now uses only one GetObject request to fetch data in advance.
This request has a range of entire object but use backpressure mechanism
to control how much data it wants to fetch into the part queue instead of
spawning up to two requests in parallel.

This should make the throughput more stable because previously the two
request tasks could compete with each other when fetching data from S3.
Also, it will be easier to control how much data we want to store in the
part queue.

Signed-off-by: Monthon Klongklaew <[email protected]>
@monthonk monthonk had a problem deploying to PR integration tests August 10, 2024 08:43 — with GitHub Actions Failure
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 08:43 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 10, 2024 09:38 — with GitHub Actions Inactive
@monthonk monthonk added the performance PRs to run benchmarks on label Aug 10, 2024
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/part_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
@monthonk monthonk temporarily deployed to PR integration tests August 14, 2024 19:28 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 14, 2024 19:28 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 14, 2024 19:28 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 14, 2024 19:28 — with GitHub Actions Inactive
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/prefetch/caching_stream.rs Outdated Show resolved Hide resolved
…st is already completed

Signed-off-by: Monthon Klongklaew <[email protected]>
Signed-off-by: Monthon Klongklaew <[email protected]>
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
@monthonk monthonk temporarily deployed to PR integration tests August 15, 2024 12:19 — with GitHub Actions Inactive
// This should not block since the channel is unbounded
self.read_window_updater
let _ = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since we are not returning the error, I'd rather use an explicit if let Err.. to trace instead of inspect_err. Not blocking on this though.

@monthonk monthonk added this pull request to the merge queue Aug 15, 2024
Merged via the queue into awslabs:main with commit 7f78cc4 Aug 15, 2024
26 checks passed
@monthonk monthonk deleted the backpressure_prefetch branch August 15, 2024 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance PRs to run benchmarks on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants