PVF: consider adding a checksum for artifacts #5441
Comments
My first thought is that checksumming the entire PVF every time could prove expensive. However, I don't see any reason why we can't do it periodically and clean up the corrupted artifact. That way the validator recovers quickly if we hit such a condition, and we don't pay the price of checksumming all the time. |
Agreed, there is overhead, but let's measure it. Assuming nodes do at most 10-12 validations on average per RCB, it shouldn't be much overhead IMO. |
The largest Kusama PVF is around 50 MiB (the smallest is 20 MiB). SHA-1 on it on reference hardware seems to take around … I wouldn't want to pay this price all the time to fix this edge case; maybe we could just check it for PVFs that fail validation, as a way to recover the node as fast as possible. |
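The "only check PVFs that fail validation" idea could be sketched roughly as below. This is a minimal illustration, not the node's actual code: `Verdict`, `validate_with_lazy_check`, and the `hash` and `validate` callbacks are hypothetical names introduced here, and the hash function is deliberately left as a parameter since the thread has not settled on one.

```rust
/// Hypothetical result of one validation attempt.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Validation succeeded; the checksum was never computed.
    Valid,
    /// Validation failed and the artifact still matches its recorded checksum.
    Invalid,
    /// Validation failed and the artifact no longer matches its checksum.
    ArtifactCorrupted,
}

/// Runs `validate` first and only pays for `hash` on the failure path.
fn validate_with_lazy_check(
    artifact: &[u8],
    recorded_checksum: u64,
    hash: impl Fn(&[u8]) -> u64,
    validate: impl Fn(&[u8]) -> bool,
) -> Verdict {
    // Happy path: no hashing at all.
    if validate(artifact) {
        return Verdict::Valid;
    }
    // Failure path: distinguish a bad candidate from a damaged artifact.
    if hash(artifact) != recorded_checksum {
        Verdict::ArtifactCorrupted
    } else {
        Verdict::Invalid
    }
}
```

With this shape the hashing cost is only incurred when a validation has already failed, which is the trade-off described in the comment above.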
SHA-1 is quite expensive, wouldn't a good old |
I think we had an issue for this already and the idea to not pay the overhead on the happy path was:
|
Checked the performance of https://docs.rs/crc-catalog/latest/crc_catalog/algorithm/constant.CRC_32_BZIP2.html & https://docs.rs/crc-catalog/latest/crc_catalog/algorithm/constant.CRC_32_CKSUM.html. I'm a bit surprised, but on this 50 MiB file it actually seems to perform worse than SHA-1: it takes around 100 ms. |
Yeah, this is more efficient. However I am surprised by the CRC32 results. |
I've noticed remarks that CRC32 winds up slow in practice.
Yes, this makes sense. We're likely happy if we lower latency here but have all CPU cores work hard on this, given we're only running the check once validation fails, right? I'd think Blake3 checks the boxes well enough: it's extremely fast thanks to being a Merkle tree, at the cost of using all available CPU cores. We do not need a cryptographic hash for disk corruption, but who knows, maybe something stranger becomes possible with compiler toolchains. |
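To illustrate why a tree-structured hash can trade CPU cores for latency, here is a simplified two-level sketch using only the standard library. It is not Blake3 and makes no claim about Blake3's internals: `tree_checksum`, `chunk_size`, and `leaf_hash` are hypothetical names, and spawning one thread per chunk is a simplification that a real implementation would replace with a bounded worker pool.

```rust
use std::thread;

/// Hashes each chunk of `data` on its own scoped thread, then hashes the
/// concatenated leaf hashes to obtain a single root value.
fn tree_checksum(
    data: &[u8],
    chunk_size: usize,
    leaf_hash: impl Fn(&[u8]) -> u64 + Sync,
) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let hash_ref = &leaf_hash;

    // Leaves are independent of each other, so they can be computed in parallel.
    let leaves: Vec<u64> = thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || hash_ref(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("leaf hashing thread panicked"))
            .collect()
    });

    // The root combines the leaves in order, so any changed leaf changes its input.
    let mut root_input = Vec::with_capacity(leaves.len() * 8);
    for leaf in leaves {
        root_input.extend_from_slice(&leaf.to_le_bytes());
    }
    leaf_hash(&root_input)
}
```

The wall-clock time is then bounded by the slowest chunk plus the small root step, rather than by one pass over the whole artifact, while total CPU work stays roughly the same.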
There was a closely related discussion in #3139. I remember Jan saying that the |
closing in favor of #677 |
... related to #5413 (comment)
The checksum should only be stored after successful validation of candidates. It should then be checked before the PVF artifact is used to validate a candidate. If it differs, we recompile the artifact and then validate the candidate. If validation still fails after recompilation, we emit an error and stop validating using that artifact.
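The proposed flow could look roughly like the sketch below. Everything here is an assumption made for illustration: `ArtifactStore`, `Outcome`, and the `compile` and `validate` callbacks are hypothetical, the in-memory maps stand in for on-disk artifacts, and the 64-bit FNV-1a function is only a placeholder for whichever checksum is eventually chosen.

```rust
use std::collections::HashMap;

/// Placeholder checksum (64-bit FNV-1a), standing in for the real choice.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Valid,
    ValidAfterRecompile,
    /// Checksum matched, so the candidate itself is considered invalid.
    Invalid,
    /// Recompilation did not help; the artifact is dropped.
    ArtifactRejected,
}

#[derive(Default)]
struct ArtifactStore {
    artifacts: HashMap<String, Vec<u8>>,
    checksums: HashMap<String, u64>,
}

impl ArtifactStore {
    fn validate_candidate(
        &mut self,
        id: &str,
        compile: impl Fn() -> Vec<u8>,
        validate: impl Fn(&[u8]) -> bool,
    ) -> Outcome {
        // Check the stored checksum before the artifact is used.
        let corrupted = match (self.artifacts.get(id), self.checksums.get(id)) {
            (Some(bytes), Some(expected)) => checksum(bytes) != *expected,
            _ => false,
        };
        // Compile on first use, recompile on checksum mismatch.
        if corrupted || !self.artifacts.contains_key(id) {
            self.artifacts.insert(id.to_string(), compile());
            self.checksums.remove(id);
        }
        let bytes = &self.artifacts[id];
        if validate(bytes) {
            // Store the checksum only after a successful validation.
            let sum = checksum(bytes);
            self.checksums.insert(id.to_string(), sum);
            return if corrupted { Outcome::ValidAfterRecompile } else { Outcome::Valid };
        }
        if corrupted {
            // Still failing after recompilation: stop using this artifact.
            self.artifacts.remove(id);
            return Outcome::ArtifactRejected;
        }
        Outcome::Invalid
    }
}
```

Note that this variant computes the checksum before every use, so it pays the hashing cost on the happy path; that cost is exactly what the comments discuss.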
Any thoughts about this @s0me0ne-unkn0wn @alexggh ?