Don't clear metadata before transitioning chunk to owned state #1015
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
While addressing last-minute review feedback, I got confused and forgot why I had used
chunk_manager_t::load()
instead ofchunk_manager_t::initialize()
inmemory_manager_t::allocate_used_chunk()
. The reason was thatchunk_manager_t::initialize()
clears chunk metadata, which we can't safely do until we know we own the chunk. This resulted in a race where a thread might initially try to acquire a used chunk and then get preempted, losing the race to acquire the chunk to another concurrent thread. Then the first thread would wake up and clear the now-in-use chunk's metadata while the second thread was already using it for allocations! This would clear any allocation bits that the second thread had already set, and the missing allocation bits would eventually trigger an assert when those allocations were freed during GC.The fix was simple: just use
chunk_manager_t::load()
to access the metadata before acquiring the chunk, as I originally did. The metadata will be cleared anyway whengaia::db::allocate_object()
(the original caller ofmemory_manager_t::allocate_chunk()
) callschunk_manager_t::initialize()
, which callschunk_manager_metadata_t::clear()
.Note that the above initialization hazard doesn't apply to
memory_manager_t::allocate_unused_chunk()
, but there's no reason to do things any differently, so I made the same change there.The fact that I got confused about initialization responsibility is evidence that this code needs further refactoring; the existing interfaces have to be contorted to fit the new chunk lifecycle protocol and should be replaced by a new interface design that starts from the protocol itself.