Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip serializing blocks when persisting to db #3657

Closed
twoeths opened this issue Jan 22, 2022 · 7 comments · Fixed by #5573
Closed

Skip serializing blocks when persisting to db #3657

twoeths opened this issue Jan 22, 2022 · 7 comments · Fixed by #5573
Labels
prio-medium Resolve this some time soon (tm). scope-performance Performance issue and ideas to improve performance.

Comments

@twoeths
Copy link
Contributor

twoeths commented Jan 22, 2022

Is your feature request related to a problem? Please describe.

A profile from contabo-17 which has low peer count shows that it takes 8% to serialize blocks due to Backfill sync:
Screen Shot 2022-01-22 at 17 09 02

Describe the solution you'd like

When fetching blocks from p2p, we already have binary data, we should be able to use that to persist to db without having to serialize() again

@twoeths
Copy link
Contributor Author

twoeths commented Jan 22, 2022

it also takes 9% of cpu time to do hashTreeRoot()
Screen Shot 2022-01-22 at 17 17 23

0122_contabo_17_backfill_serialize.cpuprofile.zip

@dapplion
Copy link
Contributor

dapplion commented Jan 22, 2022

To mitigate this issue we could slow down Backfill sync, such that only N blocks are hashed every X seconds to even out the load.

Also keeping the binary blob around to prevent re-serialization is a very nice trick easy to implement now

@g11tech
Copy link
Contributor

g11tech commented Jan 22, 2022

To mitigate this issue we could slow down Backfill sync, such that only N blocks are hashed every X seconds to even out the load.

should we expose this as a configurable param with a hidden cli arg?

@twoeths
Copy link
Contributor Author

twoeths commented Jan 24, 2022

should we expose this as a configurable param with a hidden cli arg?

year I prefer that. Also should we run this in a separate worker thread?

@dapplion
Copy link
Contributor

dapplion commented Feb 4, 2022

Also should we run this in a separate worker thread?

Actually maybe! I think this is a completely independent process that given the initial initial conditions it does not require communication with the main thread at all. However, is our database thread safe? Can it handle multiple writes from different workers?

@twoeths
Copy link
Contributor Author

twoeths commented Feb 8, 2022

there are other Backfill Sync performance issues mentioned in #3732 (comment)

@g11tech
Copy link
Contributor

g11tech commented Feb 8, 2022

should i remove the bytes caching bit from this PR (which we can add once ssz v2 PR is in): #3669, it will save the costs of doing hashTreeRoot for parent/child relationship validation (about 8% CPU as previously profiled by @tuyennhv ).
Also one will be able to specify the batchSize on the terminal to reduce any adverse impact of backfill

@dapplion dapplion added prio-medium Resolve this some time soon (tm). scope-performance Performance issue and ideas to improve performance. labels May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio-medium Resolve this some time soon (tm). scope-performance Performance issue and ideas to improve performance.
Projects
None yet
3 participants