Db sync should possibly avoid trying to fetch non-current pool metadata #1929

Open
hodlonaut opened this issue Jan 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@hodlonaut

OS
Your OS: Ubuntu

Versions
The db-sync version (eg cardano-db-sync --version): 13.6.0.4
PostgreSQL version: 17

Build/Install Method
The method you use to build or install cardano-db-sync: downloaded binaries

Run method
The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): systemd

Problem Report
After analysing the records in the off_chain_pool_fetch_error table, I was perplexed for a while by some metadata hash mismatch messages in there. A colleague pointed out that some of the records I was looking at came from DB sync trying to fetch a previous (not the most recent) version of the metadata, using a now-outdated URL, hash, or both. After I manually purged the off_chain_pool_fetch_error table on my DB sync instances, the earlier errors seem to have gone away. Here are two examples of the records in question:

230668 | 3085 | 2024-12-31 00:58:47.28307 | 24111 | Hash mismatch when fetching metadata from https://public.bladepool.com/metadata.json. Expected "2738e2233800ab7f82bd2212a9a55f52d4851f9147f161684c63e6655bedb562" but got "d7c25ea70f63c45413d56c35a80293e7dd859233c43c25e1b0cad2738cdfc037". | 51

230652 | 1498 | 2024-12-30 12:04:28.439982 | 27276 | Hash mismatch when fetching metadata from https://raw.githubusercontent.com/Bmtxs/sp/master/na.json. Expected "48cbb69c4384c9847369e89fd693e637236afb174813e05b6464e1cf2aea037d" but got "1df6e0d2b80ba684fbcca263fde20cfe8b5aa7a30ce15ff1fd79a8df2c5840a7". | 49

In both cases the pmr_id column value refers to a pool update record that is not the most recent one, which in my case caused a bit of confusion. This ticket is primarily meant to prompt consideration of whether, once a new pmr_id is established for a pool, retries for previous pmr_ids can be cancelled and possibly some table cleanup performed at that point (unless there is value in retaining the complete retry history for previous iterations of the metadata).
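
To make the idea concrete, here is a minimal sketch of how stale fetch-error rows could be identified. It uses an in-memory SQLite database with a heavily simplified, hypothetical version of the two tables (only the columns needed here) and toy data; it is not db-sync's actual schema or logic. It also assumes that, for a given pool, a higher pool_metadata_ref id means a more recent pool update.

```python
import sqlite3

# Toy in-memory database; the table layouts are simplified assumptions,
# not the real db-sync schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pool_metadata_ref (
    id INTEGER PRIMARY KEY, pool_id INTEGER, url TEXT, hash TEXT);
CREATE TABLE off_chain_pool_fetch_error (
    id INTEGER PRIMARY KEY, pool_id INTEGER, pmr_id INTEGER, fetch_error TEXT);
""")

# Pool 10 has two metadata references (1 is outdated, 2 is current);
# pool 20 has a single one. URLs and hashes are placeholders.
conn.executemany("INSERT INTO pool_metadata_ref VALUES (?, ?, ?, ?)", [
    (1, 10, "https://example.invalid/old.json", "aa"),
    (2, 10, "https://example.invalid/new.json", "bb"),
    (3, 20, "https://example.invalid/only.json", "cc"),
])
conn.executemany("INSERT INTO off_chain_pool_fetch_error VALUES (?, ?, ?, ?)", [
    (100, 10, 1, "Hash mismatch"),  # refers to the outdated reference
    (101, 10, 2, "Timeout"),        # refers to the current reference
    (102, 20, 3, "Timeout"),        # refers to the current reference
])

# A fetch error is "stale" when its pmr_id is older than the newest
# pool_metadata_ref row of the same pool (assumption: higher id = newer).
STALE_ERRORS_SQL = """
SELECT e.id, e.pool_id, e.pmr_id
FROM off_chain_pool_fetch_error AS e
WHERE e.pmr_id < (
    SELECT MAX(r.id) FROM pool_metadata_ref AS r WHERE r.pool_id = e.pool_id
)
ORDER BY e.id
"""
stale = conn.execute(STALE_ERRORS_SQL).fetchall()
print(stale)  # [(100, 10, 1)]
```

The same correlated subquery could, in principle, drive either a "stop retrying" policy or a cleanup step; which of the two is appropriate is exactly the open question of this ticket.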

@hodlonaut hodlonaut added the bug Something isn't working label Jan 3, 2025
@hodlonaut
Author

I'm also seeing metadata fetch attempts for a pool that retired in epoch 210 (i.e. back in 2020). This is possibly another small optimisation opportunity, unless it is by design, to keep the database representation of all pools as thorough as possible.

@rdlrt

rdlrt commented Jan 3, 2025

While the immediate task, as aptly put by @kderme, is 'to have a policy that stops fetching attempts when there is a newer pool update', a side question here is whether there has been any thought about adding control over pool metadata refresh in dbsync itself:

  1. Ability to blacklist a pool
  2. Ability to manually refresh a given pool's metadata

I think that for years the best practice for SPOs has been to change the metadata contents (and thus the metadata hash) whenever a pool makes any metadata change. For example, CNTools already adds a nonce field to ensure users don't submit multiple update entries with the same URL/hash combination. For those who do not follow this practice, perhaps the above could be managed by adding a status column to the pool_metadata_ref table (allowing us to apply a blacklist or refresh to specific pools):

  • If status is success, we already have an entry in the pool offline data; do not re-attempt the fetch in future for that id
  • If status is failed, there will be a re-attempt for that id (using the current logic with sleeps) until a successful fetch or a new pool update entry
  • If status is skip, dbsync will not attempt fetching the URL for this pool metadata reference ID
  • If status is blacklist [manual override], same as skip, but this allows manual control (cannot be overridden) for notorious / bad ops folks
  • If status is refresh, dbsync will re-attempt fetching metadata, allowing manual control of refresh for a given entry (e.g. if the next polling for this entry is too far out)
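
The proposed statuses can be sketched as a small decision function. This is only an illustration of the list above, not an existing db-sync feature: the names FetchStatus, should_fetch and status_after_fetch are hypothetical, and the handling of a superseded entry (one for which a newer pool update exists) is an assumption, namely that only failed entries stop retrying once superseded while a manual refresh is always honoured.

```python
from enum import Enum


class FetchStatus(Enum):
    """Hypothetical values for the proposed status column."""
    SUCCESS = "success"
    FAILED = "failed"
    SKIP = "skip"
    BLACKLIST = "blacklist"
    REFRESH = "refresh"


def should_fetch(status: FetchStatus, superseded: bool) -> bool:
    """Decide whether a metadata reference should be fetched.

    `superseded` is True when a newer pool update entry exists for the pool.
    """
    if status in (FetchStatus.SKIP, FetchStatus.BLACKLIST):
        return False  # never fetch; blacklist is the manually set variant
    if status is FetchStatus.REFRESH:
        return True   # manual request to re-fetch this entry (assumed to win)
    if status is FetchStatus.SUCCESS:
        return False  # offline data already stored for this id
    # FAILED: keep retrying until success or until a newer update appears
    return not superseded


def status_after_fetch(fetch_ok: bool) -> FetchStatus:
    """Status to record once a fetch attempt has completed."""
    return FetchStatus.SUCCESS if fetch_ok else FetchStatus.FAILED


print(should_fetch(FetchStatus.FAILED, superseded=False))  # True
print(should_fetch(FetchStatus.FAILED, superseded=True))   # False
```

Rollbacks are deliberately left out of this sketch; a real design would also need to say which status an entry returns to when the pool update that superseded it is rolled back.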

@sgillespie
Contributor

sgillespie commented Jan 3, 2025

Perhaps above could be managed as an addition of a column status into pool_metadata_ref table (allowing us to put a blacklist or refresh to specific pools):

  • If status is success, we already have an entry in pool offline data, do not re-attempt fetch in future for that id
  • If status is failed, there will be a re-attempt for that id (using current logic with sleeps) until a successful fetch or a new pool update entry
  • If status is skip , dbsync will not attempt fetching URL for this pool.
  • If status is blacklist [manually overridden] , dbsync will skip fetching too...this allows manual control for notorious / bad ops folks
  • If status is refresh , dbsync will re-attempt fetching metadata - allowing manual control for refresh for a given entry

I like the idea of a status field. Currently, we look for corresponding rows in off_chain_pool_data and off_chain_pool_fetch_error, which I don't think will scale much farther.

@kderme
Contributor

kderme commented Jan 7, 2025

Since the problem only appears while DBSync is still syncing, I think it's not a big issue.

Using a status field to separate the current pool update from previous ones would be quite useful. For the next major DBSync release we're trying to focus more on live data and separate it from historic data; this is similar to #1798, for example.

Designing the state machine to cope with rollbacks and manual intervention could be tricky.
Delisting pools is already supported in table delisted_pool, but it may only affect the smash server.
