Db sync should possibly avoid trying to fetch non-current pool metadata #1929

hodlonaut · 2025-01-03T02:50:40Z

OS
Your OS: Ubuntu

Versions
The db-sync version (eg cardano-db-sync --version): 13.6.0.4
PostgreSQL version: 17

Build/Install Method
The method you use to build or install cardano-db-sync: downloaded binaries

Run method
The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): systemd

Problem Report
While analysing records in the off_chain_pool_fetch_error table, I was puzzled for a while by some metadata hash mismatch messages. A colleague pointed out that some of the records I was looking at came from DB Sync trying to fetch a previous (not the most recent) version of the metadata, using a URL, a hash, or both that are now outdated. After I manually purged the off_chain_pool_fetch_error table on my DB Sync instances, the earlier errors seem to have gone away. Here are examples of the records in question:

230668 | 3085 | 2024-12-31 00:58:47.28307 | 24111 | Hash mismatch when fetching metadata from https://public.bladepool.com/metadata.json. Expected "2738e2233800ab7f82bd2212a9a55f52d4851f9147f161684c63e6655bedb562" but got "d7c25ea70f63c45413d56c35a80293e7dd859233c43c25e1b0cad2738cdfc037". | 51

230652 | 1498 | 2024-12-30 12:04:28.439982 | 27276 | Hash mismatch when fetching metadata from https://raw.githubusercontent.com/Bmtxs/sp/master/na.json. Expected "48cbb69c4384c9847369e89fd693e637236afb174813e05b6464e1cf2aea037d" but got "1df6e0d2b80ba684fbcca263fde20cfe8b5aa7a30ce15ff1fd79a8df2c5840a7". | 49

In both cases the pmr_id column value refers to a pool update record that is not the most recent one, which caused me some confusion. This ticket is primarily meant to prompt consideration of whether, once a new pmr_id is established for a pool, retries for previous pmr_ids can be cancelled and possibly some table cleanup performed at that point (unless there is value in retaining the full retry history for previous iterations of the metadata).
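As a rough illustration of the cleanup idea, the sketch below finds fetch-error rows whose pmr_id is no longer the newest metadata reference for their pool. It runs against an in-memory SQLite database with a simplified, hypothetical subset of the two tables, and it assumes that the highest pool_metadata_ref.id per pool is the current one; both are assumptions for illustration, not a description of how db-sync works internally. All ids and URLs are made up.

```python
import sqlite3

# Simplified, hypothetical subset of the db-sync tables, just enough for the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pool_metadata_ref (
    id INTEGER PRIMARY KEY, pool_id INTEGER, url TEXT, hash TEXT);
CREATE TABLE off_chain_pool_fetch_error (
    id INTEGER PRIMARY KEY, pool_id INTEGER, pmr_id INTEGER,
    fetch_error TEXT, retry_count INTEGER);
""")

# Made-up data: pool 1 has an old ref (10) and a newer ref (11); pool 2 has one ref (20).
conn.executemany("INSERT INTO pool_metadata_ref VALUES (?, ?, ?, ?)", [
    (10, 1, "https://example.invalid/old.json", "aa"),
    (11, 1, "https://example.invalid/new.json", "bb"),
    (20, 2, "https://example.invalid/only.json", "cc"),
])
conn.executemany("INSERT INTO off_chain_pool_fetch_error VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, "Hash mismatch", 51),  # refers to the superseded ref
    (101, 1, 11, "Hash mismatch", 3),   # refers to the current ref
    (102, 2, 20, "Timeout", 7),         # refers to the current ref
])

# Assumption: the ref with the highest id for a pool is its current one.
STALE_ERRORS_SQL = """
SELECT e.id
FROM off_chain_pool_fetch_error e
WHERE e.pmr_id <> (
    SELECT MAX(r.id) FROM pool_metadata_ref r WHERE r.pool_id = e.pool_id
)
ORDER BY e.id
"""

stale_ids = [row[0] for row in conn.execute(STALE_ERRORS_SQL)]
print(stale_ids)
```

The same SELECT uses only standard SQL, so it could likely be adapted for PostgreSQL to inspect (rather than delete) candidate rows before any manual purge.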


hodlonaut · 2025-01-03T02:56:51Z

I'm also seeing metadata fetch attempts for a pool that retired in epoch 210 (i.e. back in 2020). This is possibly another small optimisation opportunity, unless it is by design, to keep the database representation of all pools as thorough as possible.

rdlrt · 2025-01-03T03:12:11Z

While the immediate task, as aptly put by @kderme, is 'to have a policy that stops fetching attempts when there is a newer pool update', a side question is whether there has been any thought about adding control over pool metadata refresh from dbsync itself:

Ability to blacklist a pool
Ability to manually refresh a given pool's metadata

I think that for years the best practice for SPOs has been to change the metadata contents (and thus the metadata hash) whenever a pool makes any metadata change. For example, CNTools already adds a nonce field to ensure users don't submit multiple update entries with the same URL/hash combination. For those who do not follow this practice, perhaps the above could be managed by adding a status column to the pool_metadata_ref table (allowing us to blacklist or refresh specific pools):

If status is success, we already have an entry in pool offline data, so do not re-attempt the fetch in future for that id
If status is failed, there will be a re-attempt for that id (using current logic with sleeps) until a successful fetch or a new pool update entry
If status is skip, dbsync will not attempt fetching the URL for this pool metadata reference ID
If status is blacklist [manually overridden], same as skip, but this allows manual control (cannot be overridden) for notorious / bad ops folks
If status is refresh, dbsync will re-attempt fetching metadata, allowing manual control of refresh for a given entry (e.g. if the next polling for this entry is too far out)
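The status values above could be modelled roughly as follows. This is only a sketch of the proposal, not existing db-sync behaviour: the names FetchStatus, should_fetch and apply_automatic_update are hypothetical, and the reading that a manually set blacklist is never changed by db-sync itself is an assumption about what "cannot be overridden" means.

```python
from enum import Enum


class FetchStatus(Enum):
    """Hypothetical values for the proposed pool_metadata_ref.status column."""
    SUCCESS = "success"      # metadata already stored, no further fetches
    FAILED = "failed"        # retry with the existing back-off logic
    SKIP = "skip"            # do not fetch this reference
    BLACKLIST = "blacklist"  # like skip, but set manually by an operator
    REFRESH = "refresh"      # manually requested re-fetch


# Statuses for which a fetch attempt would be made, per the list above.
_FETCHABLE = {FetchStatus.FAILED, FetchStatus.REFRESH}


def should_fetch(status: FetchStatus) -> bool:
    """Return True if a fetch should be attempted for a reference with this status."""
    return status in _FETCHABLE


def apply_automatic_update(current: FetchStatus, proposed: FetchStatus) -> FetchStatus:
    """Apply a status change made by db-sync itself (not by an operator).

    Assumption: a manually set blacklist is never overwritten automatically.
    """
    if current is FetchStatus.BLACKLIST:
        return current
    return proposed


if __name__ == "__main__":
    for status in FetchStatus:
        print(status.value, should_fetch(status))
```

Keeping the fetch decision in one function means the fetcher would only need to read the status column, rather than infer state from the presence of rows in other tables.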

sgillespie · 2025-01-03T15:51:53Z

> Perhaps above could be managed as an addition of a column status into pool_metadata_ref table (allowing us to put a blacklist or refresh to specific pools):
>
> If status is success, we already have an entry in pool offline data, do not re-attempt fetch in future for that id
>
> If status is failed, there will be a re-attempt for that id (using current logic with sleeps) until a successful fetch or a new pool update entry
>
> If status is skip , dbsync will not attempt fetching URL for this pool.
>
> If status is blacklist [manually overriden] , dbsync will skip fetching too...this allows manual control for notorious / bad ops folks
>
> If status is refresh , dbsync will re-attempt fetching metadata - allowing manual control for refresh for a given entry

I like the idea of a status field. Currently, we look for corresponding rows in off_chain_pool_data and off_chain_pool_fetch_error, which I don't think will scale much farther.

kderme · 2025-01-07T10:48:19Z

Since the problem only appears while DBSync is still syncing, I think it's not a big issue.

Using a status field to separate the current pool update from previous ones would be quite useful. For the next major DBSync release we're trying to focus more on live data and separate it from historic data, e.g. this is similar to #1798

Designing the state machine to cope with rollbacks and manual intervention could be tricky.
Delisting pools is already supported in table delisted_pool, but it may only affect the smash server.
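One way to see why rollbacks complicate a stored status: if a rollback removes the newest pool update, the previous metadata reference becomes current again, so any "superseded" marking on it would have to be undone. The sketch below sidesteps that by deriving the current reference from the remaining rows instead of storing it. MetadataRef, its block_no field, current_refs and rollback are all hypothetical names used for illustration, not db-sync types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetadataRef:
    pmr_id: int
    pool_id: int
    block_no: int  # block in which the pool update appeared (hypothetical field)


def current_refs(refs: List[MetadataRef]) -> Dict[int, int]:
    """Map each pool_id to the pmr_id of its latest ref (highest block_no, then pmr_id)."""
    latest: Dict[int, MetadataRef] = {}
    for ref in refs:
        best = latest.get(ref.pool_id)
        if best is None or (ref.block_no, ref.pmr_id) > (best.block_no, best.pmr_id):
            latest[ref.pool_id] = ref
    return {pool_id: ref.pmr_id for pool_id, ref in latest.items()}


def rollback(refs: List[MetadataRef], to_block_no: int) -> List[MetadataRef]:
    """Drop refs that appeared after the rollback point."""
    return [ref for ref in refs if ref.block_no <= to_block_no]


if __name__ == "__main__":
    refs = [MetadataRef(10, 1, 100), MetadataRef(11, 1, 200)]
    print(current_refs(refs))                 # ref 11 is current for pool 1
    print(current_refs(rollback(refs, 150)))  # after the rollback, ref 10 is current again
```

A design that stores a per-row status instead would need an explicit transition for this case, which is part of what makes the state machine harder to get right.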

hodlonaut added the bug (Something isn't working) label Jan 3, 2025
