feat: Cache S3 data in database to improve backfill speed and reduce s3 costs #5627

keefertaylor · 2025-03-07T02:00:36Z

Description

While monitoring metrics in #3, Tessellated found that backfills often took a long time to complete. For instance, some chains, such as Optimism, would spend days backfilling.

This is because, during backfill, the validator client makes a network request to S3 to verify that the checkpoint is there. If it isn't, the validator signs and uploads the checkpoint. This happens on every binary restart.

This is slow. It also increases the cost of using AWS to store signatures (since validators pay for transfer bandwidth) and the network egress costs for the requesting machine. To solve this, we cache checkpoints that we've written to S3 or fetched from S3 in the local database.
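The flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeRemoteStore`, `backfill`, `local_db`, and `sign` are hypothetical names, a plain dict stands in for the validator's database, and an in-memory object that counts round trips stands in for S3.

```python
class FakeRemoteStore:
    """Hypothetical stand-in for S3 that counts network round trips."""

    def __init__(self):
        self.objects = {}
        self.requests = 0

    def fetch(self, index):
        self.requests += 1
        return self.objects.get(index)

    def upload(self, index, checkpoint):
        self.requests += 1
        self.objects[index] = checkpoint


def backfill(remote, local_db, latest_index, sign):
    """Ensure checkpoints 0..latest_index exist remotely, consulting the local cache first."""
    for index in range(latest_index + 1):
        if index in local_db:
            # Already written or fetched on an earlier run: no network request.
            continue
        checkpoint = remote.fetch(index)
        if checkpoint is None:
            # Not in remote storage yet, so sign and upload it.
            checkpoint = sign(index)
            remote.upload(index, checkpoint)
        # Record both fetched and freshly written checkpoints locally.
        local_db[index] = checkpoint


remote = FakeRemoteStore()
local_db = {}  # assumed to persist across restarts in a real deployment


def sign(index):
    # Placeholder for real checkpoint signing.
    return f"signed-{index}"


backfill(remote, local_db, 99, sign)
first_run_requests = remote.requests

# Simulate a binary restart: the remote store and the local cache both persist.
backfill(remote, local_db, 99, sign)
second_run_requests = remote.requests - first_run_requests
```

In this toy setup the first run pays one fetch and one upload per checkpoint, while the second run makes no remote requests at all, which is the effect the cache is meant to have on restarts.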

Benchmarks (anecdotal, would be happy to run better tests):

Backfill time for Optimism: 1.5 days -> 90 seconds
Disk usage: +15GB with this enabled for 24 chains

This is a proof-of-concept PR that shows how we can store signed / fetched checkpoints in the local database. If we end up wanting to upstream this PR, I'd be happy to clean it up with better abstractions. Specifically, I'd want to:

Separate caching functionality from S3 storage, likely using a decorator pattern (which in turn would make this applicable to all storage types, now and in the future)
Potentially hide this functionality behind a feature flag so it is opt-in
Fix DB methods to use higher-level abstractions
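One way the decorator idea and the opt-in flag could fit together is sketched below. Everything here is an assumption made for illustration: `CheckpointStorage`, `CountingStorage`, `CachingStorage`, `build_storage`, and the `cache_enabled` flag are hypothetical names, not the project's actual interfaces, and a dict stands in for the database.

```python
from typing import Dict, Optional, Protocol


class CheckpointStorage(Protocol):
    """Hypothetical storage interface that any backend could satisfy."""

    def fetch(self, index: int) -> Optional[str]: ...

    def write(self, index: int, checkpoint: str) -> None: ...


class CountingStorage:
    """In-memory stand-in for a remote backend; counts fetch calls."""

    def __init__(self) -> None:
        self.objects: Dict[int, str] = {}
        self.fetch_calls = 0

    def fetch(self, index: int) -> Optional[str]:
        self.fetch_calls += 1
        return self.objects.get(index)

    def write(self, index: int, checkpoint: str) -> None:
        self.objects[index] = checkpoint


class CachingStorage:
    """Decorator: same interface as the wrapped backend, plus a local cache."""

    def __init__(self, inner: CheckpointStorage) -> None:
        self._inner = inner
        self._cache: Dict[int, str] = {}  # stands in for the local database

    def fetch(self, index: int) -> Optional[str]:
        cached = self._cache.get(index)
        if cached is not None:
            return cached
        checkpoint = self._inner.fetch(index)
        if checkpoint is not None:
            # Only cache hits; a missing checkpoint may be written later.
            self._cache[index] = checkpoint
        return checkpoint

    def write(self, index: int, checkpoint: str) -> None:
        self._inner.write(index, checkpoint)
        self._cache[index] = checkpoint


def build_storage(inner: CheckpointStorage, cache_enabled: bool) -> CheckpointStorage:
    """Wrap the backend only when the (hypothetical) feature flag is on."""
    return CachingStorage(inner) if cache_enabled else inner


backend = CountingStorage()
backend.write(7, "checkpoint-7")

storage = build_storage(backend, cache_enabled=True)
first = storage.fetch(7)   # goes to the backend and populates the cache
second = storage.fetch(7)  # served from the cache
```

Because the wrapper only depends on the shared interface, the same caching layer would apply to any backend that implements it, and callers do not need to know whether the flag is on.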

changeset-bot · 2025-03-07T02:00:40Z

⚠️ No Changeset found

Latest commit: 7496fb6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


cache messages

7496fb6

keefertaylor requested review from tkporter and daniel-savu as code owners March 7, 2025 02:00

keefertaylor changed the title ~~cache messages~~ feat: Cache S3 data in database to improve backfill speed and reduce s3 costs Mar 7, 2025
