-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set completion mark in correct timing #229
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on PR description, before accessing dbm.dbs[StateTrieMigrationDB]
, isn't the read lock checked? Could you show me which line?
Plus, this PR seems correct because the lock is not held in removeOldDB
.
@ian0371 https://github.com/kaiachain/kaia/blob/dev/storage/database/db_manager.go#L1841. And the lock is not held in |
If the completion mark is forwarded back, previous bad ux may remain (klaytn/klaytn#1412). I think these three lines https://github.com/kaiachain/kaia/pull/229/files#diff-3f4e98143300d776da515f7e8784f500929242d2fd1705aa237cdb76aa84d638R836-R838 should be invoked in one operation. What do you think if the other two lines are moved together in L851? |
@hyunsooda Thanks for supplying contexts, and I think moving all three lines into defer in |
We should figure out the dependencies of each resource, and figure out the deallocation order.
So the invariant is that MigrationDB can be accessed only when If the program halts during step 3, resume from 3 on restart. Maybe we need to add "ShouldRemoveOldDB" to indicate that removing DB is not finished. |
@ian0371 @hyunsooda Currently, the node restarts the state migration if the program exits/halts before deleting the old DB. And if we call The fundamental solution would be storing deletion info into DB as Ian's suggestion, and it can be handled in separate PR. I'll remain Before that, I patched to make dbs nil after |
Yes the migration restarts if I belive that this commit 95afbd6 will solve the issue because the delete operation will be called at restart again. |
@hyunsooda Reading state trie from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, got the point. Following PR expects that recorgnize the end of migration work and directly jump to remove database if it has not been deleted.
I think reverting file to pre-#88 is correct (for both If |
@ian0371 In pre-#88, it doesn't continue on restart since So the call flow of pre-88 is:
I also think it's more appropriate to be pre-88 except for the restart issue, so I'll try to introduce solution (add flag to continue deleting DB) in a separate PR. |
Proposed changes
When state migration has been finished, it removes old DB and dbm for state migration. But setting the completion mark
inMigration = false
is done afterdbm.dbs[StateTrieMigrationDB] = nil
in separate go-routine (no lock held), which leads to potential nil dereference due to timing issue while opening a new trie (since it accesses todbm.dbs[StateTrieMigrationDB]
ifinMigration = true
).I strongly suspect it's the root cause of the random test failure of
TestMigration_StartMigrationByMiscDBOnRestart
in CI.Types of changes
Please put an x in the boxes related to your change.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
I have read the CLA Document and I hereby sign the CLA
in first time contribute$ make test
)Related issues
The CI failure always happens in #225.
Further comments