feat: Cache S3 data in database to improve backfill speed and reduce s3 costs #5627

Open: wants to merge 1 commit into base branch main


keefertaylor

Description

While monitoring metrics in #3, Tessellated discovered that backfills often took a long time to complete. For instance, some chains like Optimism would spend days backfilling.

This is because, during backfill, the validator client makes a network request to S3 to verify that each checkpoint is present. If it isn't, the client signs and uploads it. This happens on every binary restart, so checkpoints that were already uploaded are checked again each time.

This is slow. It also increases the cost of storing signatures on AWS (since validators pay for transfer bandwidth) and the network egress costs for the requesting machine. To solve this, we cache every checkpoint that we've written to or fetched from S3 in the local database, so later backfills can skip the S3 request for those checkpoints.
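A minimal sketch of the cached backfill check, in Rust. Everything here is hypothetical: `MockS3`, `LocalDb`, `sign_checkpoint`, and `backfill` are illustrative stand-ins, not the validator client's actual types, and no real S3 or database calls are made. Requests are counted so the effect of the cache across restarts is visible.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the S3 bucket. It only counts requests so the
/// effect of the cache is visible; no real AWS SDK calls are made.
#[derive(Default)]
struct MockS3 {
    objects: HashMap<u32, Vec<u8>>,
    requests: usize,
}

impl MockS3 {
    /// One network round trip to check whether a checkpoint object exists.
    fn exists(&mut self, index: u32) -> bool {
        self.requests += 1;
        self.objects.contains_key(&index)
    }

    /// One network round trip to upload a signed checkpoint.
    fn upload(&mut self, index: u32, signed: Vec<u8>) {
        self.requests += 1;
        self.objects.insert(index, signed);
    }
}

/// Hypothetical stand-in for the validator's local database. A real client
/// would persist this to disk so it survives binary restarts.
#[derive(Default)]
struct LocalDb {
    in_s3: HashSet<u32>,
}

/// Placeholder for signing; a real validator would sign the checkpoint.
fn sign_checkpoint(index: u32) -> Vec<u8> {
    index.to_be_bytes().to_vec()
}

/// Backfill checkpoints `0..up_to`. With `use_cache`, indices already
/// recorded in the local DB skip the S3 round trip entirely.
fn backfill(s3: &mut MockS3, db: &mut LocalDb, use_cache: bool, up_to: u32) {
    for index in 0..up_to {
        if use_cache && db.in_s3.contains(&index) {
            continue; // known to be in S3 already: no network request
        }
        if !s3.exists(index) {
            s3.upload(index, sign_checkpoint(index));
        }
        if use_cache {
            // Record checkpoints we just uploaded and ones already in S3.
            db.in_s3.insert(index);
        }
    }
}
```

Backfilling 1,000 checkpoints into an empty bucket costs 2,000 requests (one existence check and one upload each). Running the same backfill again with `use_cache = true` costs no further requests, while running it with `use_cache = false` still costs 1,000 existence checks; that repeated per-restart cost is what the cache removes.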

Benchmarks (anecdotal; happy to run more rigorous tests):

  • Backfill time for Optimism: 1.5 days -> 90 seconds
  • Disk usage: +15 GB with this enabled for 24 chains
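For scale, the anecdotal figures above work out as follows (taking 1.5 days as 36 hours and, for the per-chain figure, assuming disk usage is spread evenly across the 24 chains, which is only an approximation):

```math
\begin{aligned}
1.5\ \text{days} &= 1.5 \times 86{,}400\ \text{s} = 129{,}600\ \text{s} \\
\frac{129{,}600\ \text{s}}{90\ \text{s}} &= 1{,}440 \quad (\text{speedup for Optimism}) \\
\frac{15\ \text{GB}}{24\ \text{chains}} &\approx 0.625\ \text{GB per chain on average}
\end{aligned}
```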

This is a proof-of-concept PR that shows how we can store signed or fetched checkpoints in the local database. If we decide to upstream it, I'd be happy to clean it up with better abstractions. Specifically, I'd want to:

  • Separate the caching functionality from S3 storage, likely using the decorator pattern (which would also make it applicable to all storage types, now and in the future)
  • Potentially hide this functionality behind a feature flag so it is opt-in
  • Fix DB methods to use higher-level abstractions
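A minimal sketch of the decorator idea, in Rust, under stated assumptions: `CheckpointStorage`, `RemoteStorage`, `CachingStorage`, `build_storage`, and the `cache_enabled` flag are all hypothetical names, not the project's real trait or config, and an in-memory `HashMap` stands in for the local database. The wrapper implements the same trait as the backend it wraps, so callers don't need to know whether caching is on.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface standing in for S3 or any other backend.
trait CheckpointStorage {
    fn fetch(&mut self, index: u32) -> Option<Vec<u8>>;
    fn write(&mut self, index: u32, signed: Vec<u8>);
}

/// Remote backend stub that counts requests instead of calling S3.
#[derive(Default)]
struct RemoteStorage {
    objects: HashMap<u32, Vec<u8>>,
    requests: usize,
}

impl CheckpointStorage for RemoteStorage {
    fn fetch(&mut self, index: u32) -> Option<Vec<u8>> {
        self.requests += 1;
        self.objects.get(&index).cloned()
    }
    fn write(&mut self, index: u32, signed: Vec<u8>) {
        self.requests += 1;
        self.objects.insert(index, signed);
    }
}

/// Decorator: wraps any `CheckpointStorage` and puts a local cache in front.
struct CachingStorage<S: CheckpointStorage> {
    inner: S,
    cache: HashMap<u32, Vec<u8>>, // stands in for the local database
}

impl<S: CheckpointStorage> CachingStorage<S> {
    fn new(inner: S) -> Self {
        Self { inner, cache: HashMap::new() }
    }
}

impl<S: CheckpointStorage> CheckpointStorage for CachingStorage<S> {
    fn fetch(&mut self, index: u32) -> Option<Vec<u8>> {
        if let Some(hit) = self.cache.get(&index) {
            return Some(hit.clone()); // cache hit: no backend request
        }
        // Miss: ask the backend; only successful fetches are cached.
        let fetched = self.inner.fetch(index)?;
        self.cache.insert(index, fetched.clone());
        Some(fetched)
    }

    fn write(&mut self, index: u32, signed: Vec<u8>) {
        // Write through to the backend, then remember it locally.
        self.inner.write(index, signed.clone());
        self.cache.insert(index, signed);
    }
}

/// Opt-in wiring: `cache_enabled` is a hypothetical feature flag.
fn build_storage(remote: RemoteStorage, cache_enabled: bool) -> Box<dyn CheckpointStorage> {
    if cache_enabled {
        return Box::new(CachingStorage::new(remote));
    }
    Box::new(remote)
}
```

Because `CachingStorage` is generic over any `CheckpointStorage`, the same wrapper could sit in front of other backends, and `build_storage` shows one way a flag could decide whether to wrap at all. Misses are deliberately not cached, so a checkpoint that is absent now will still be looked up in the backend later.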


changeset-bot commented Mar 7, 2025

⚠️ No Changeset found

Latest commit: 7496fb6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@keefertaylor changed the title from "cache messages" to "feat: Cache S3 data in database to improve backfill speed and reduce s3 costs" on Mar 7, 2025
Project status: In Review