core: improve shutdown synchronization in BlockChain #22853
core/blockchain.go
    if bc.insertStopped() {
        return errInsertionInterrupted
    }
This was placed on the wrong method, though :)
We should consider taking the chain mutex in BlockChain.Stop.
on
With this PR (plus the same miner reward on clique), I never got an error, but instead it detected that it's already shutting down:
Tested this with the clique stress-test, where I added a signal to shut down all the stacks on
With this PR:
(The panic is part of the stress-test, so that's fine, but the error is different.)
Rebased.
@fjl now would be a good time to merge this, to give it a lot of time on master before the next release.
Rebased again, cc @fjl
Hi @holiman, I've had a look at this and think it's definitely a neater approach compared to #23673. I think there might still be some shutdown sync issues with goroutines started from within the blockchain code. This line starts a goroutine that tries to access the rawdb, and there doesn't appear to be any sync control around it, so I think this code could again result in trying to access the db after it has been closed. Then there are also these 2 lines, which I don't think would access the db in an unsafe way, but it's not easy to really verify that, and that could change when code changes are made, so it would seem safer to me to also wait for completion of these with the waitgroup.
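The suggestion above (wait for background goroutines before the database is closed) could look roughly like the sketch below. It is only an illustration under assumptions: fakeDB, chain, newChain and loop are hypothetical names, not code from the PR, and the real BlockChain has more moving parts. Note that wg.Add is called before the goroutine starts, so it can never race with Wait.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeDB is a hypothetical stand-in for the database. It records whether
// anything touched it after Close.
type fakeDB struct {
	mu         sync.Mutex
	closed     bool
	lateAccess bool
}

func (db *fakeDB) Write() {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.closed {
		db.lateAccess = true
	}
}

func (db *fakeDB) Close() {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.closed = true
}

func (db *fakeDB) LateAccess() bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.lateAccess
}

// chain is a hypothetical, heavily reduced model of a type that owns a
// background loop and a database handle.
type chain struct {
	db   *fakeDB
	quit chan struct{}
	wg   sync.WaitGroup
}

func newChain(db *fakeDB) *chain {
	c := &chain{db: db, quit: make(chan struct{})}
	c.wg.Add(1) // registered before the goroutine starts and before any Wait
	go c.loop()
	return c
}

// loop periodically touches the database until quit is closed.
func (c *chain) loop() {
	defer c.wg.Done()
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.db.Write()
		case <-c.quit:
			return
		}
	}
}

// Stop signals the loop, waits for it to exit, and only then closes the db.
func (c *chain) Stop() {
	close(c.quit)
	c.wg.Wait()
	c.db.Close()
}

func main() {
	db := &fakeDB{}
	c := newChain(db)
	time.Sleep(5 * time.Millisecond)
	c.Stop()
	fmt.Println("access after close:", db.LateAccess())
}
```

This pattern only covers goroutines that never need a lock held by Stop itself; a loop that calls back into locked code needs more care.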
@piersy Thanks for your additional review! I have checked the goroutines you mentioned. The last one you mentioned is a goroutine created for the state prefetcher. It's not possible to track it properly at this time. We should probably move the goroutine creation/tracking into the prefetcher itself. For now, I think we can live with this one not being tracked, since the prefetcher does not write to the database.
Hmm, so this won't work just yet. Adding futureBlocksLoop into the WaitGroup creates a potential deadlock because procFutureBlocks calls InsertChain, which may take the chain mutex.

Here's the challenge with shutdown sync: within Stop, we want to ensure that all calls to InsertChain and related methods have left the critical section, and new calls cannot enter it. When we discussed removing the wg.Add calls, we were hoping this could be achieved by simply taking the chain mutex in Stop. Since this mutex is held during all chain modifications, it would give us the exclusion we need.

However, simple exclusion is not all we need here. While Stop is running, we also need to deflect all attempts to insert new chain data immediately. I think what we need is some kind of closable mutex. All the chain mutations would attempt to take this mutex, and return an error if it is closed. We'd close the mutex in Stop.
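One way to picture the closable mutex described above is the channel-based sketch below. It is a minimal illustration, not necessarily what was merged: closableMutex, TryLock, Close, insertChain and errChainStopped are hypothetical names chosen for this example. The channel holds one token while the mutex is free. Closing takes the token (so it waits for the current holder) and then closes the channel, after which every lock attempt fails immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// errChainStopped is a hypothetical error for writes attempted after Stop.
var errChainStopped = errors.New("blockchain is stopped")

// closableMutex is a hypothetical mutex that can be closed for good.
// The buffered channel holds one token while unlocked and is empty
// while locked.
type closableMutex struct {
	ch chan struct{}
}

func newClosableMutex() *closableMutex {
	ch := make(chan struct{}, 1)
	ch <- struct{}{} // start in the unlocked state
	return &closableMutex{ch: ch}
}

// TryLock blocks until the mutex is acquired or has been closed.
// It returns false if the mutex is closed.
func (m *closableMutex) TryLock() bool {
	_, ok := <-m.ch
	return ok
}

// Unlock hands the token back. It panics if the mutex is not locked.
func (m *closableMutex) Unlock() {
	select {
	case m.ch <- struct{}{}:
	default:
		panic("Unlock of unlocked closableMutex")
	}
}

// Close waits for the current holder to unlock, then closes the mutex
// permanently so that all later TryLock calls return false.
func (m *closableMutex) Close() {
	if _, ok := <-m.ch; !ok {
		panic("Close of closed closableMutex")
	}
	close(m.ch)
}

// insertChain models a chain mutation guarded by the closable mutex.
func insertChain(m *closableMutex) error {
	if !m.TryLock() {
		return errChainStopped
	}
	defer m.Unlock()
	// ... chain modifications would happen here ...
	return nil
}

func main() {
	mu := newClosableMutex()
	fmt.Println(insertChain(mu)) // succeeds while the chain is running
	mu.Close()                   // what Stop would do
	fmt.Println(insertChain(mu)) // rejected after Close
}
```

With this shape, Stop gets both properties from the comment: it excludes in-flight writers by waiting for the token, and it deflects later writers without any WaitGroup bookkeeping.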
You missed this:
    func (bc *BlockChain) ResetWithGenesisBlock(genesis *types.Block) error {
        // Dump the entire block chain and purge the caches
        if err := bc.SetHead(0); err != nil {
            return err
        }
        bc.chainmu.Lock()
and
and inside
…hereum#22853) This change removes misuses of sync.WaitGroup in BlockChain. Before this change, block insertion modified the WaitGroup counter in order to ensure that Stop would wait for pending operations to complete. This was racy and could even lead to crashes if Stop was called at an unfortunate time. The issue is resolved by adding a specialized 'closable' mutex, which prevents chain modifications after stopping while also synchronizing writers with each other. Co-authored-by: Felix Lange <[email protected]>