Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't clear metadata before transitioning chunk to owned state #1015

Merged
merged 1 commit into from
Oct 14, 2021

Conversation

senderista
Copy link
Contributor

While addressing last-minute review feedback, I got confused and forgot why I had used chunk_manager_t::load() instead of chunk_manager_t::initialize() in memory_manager_t::allocate_used_chunk(). The reason was that chunk_manager_t::initialize() clears chunk metadata, which we can't safely do until we know we own the chunk. This resulted in a race where a thread might initially try to acquire a used chunk and then get preempted, losing the race to acquire the chunk to another concurrent thread. Then the first thread would wake up and clear the now-in-use chunk's metadata while the second thread was already using it for allocations! This would clear any allocation bits that the second thread had already set, and the missing allocation bits would eventually trigger an assert when those allocations were freed during GC.

The fix was simple: just use chunk_manager_t::load() to access the metadata before acquiring the chunk, as I originally did. The metadata will be cleared anyway when gaia::db::allocate_object() (the original caller of memory_manager_t::allocate_chunk()) calls chunk_manager_t::initialize(), which calls chunk_manager_metadata_t::clear().

Note that the above initialization hazard doesn't apply to memory_manager_t::allocate_unused_chunk(), but there's no reason to do things any differently, so I made the same change there.

The fact that I got confused about initialization responsibility is evidence that this code needs further refactoring; the existing interfaces have to be contorted to fit the new chunk lifecycle protocol and should be replaced by a new interface design that starts from the protocol itself.

@@ -148,6 +148,7 @@ void chunk_manager_metadata_t::synchronize_allocation_metadata()

void chunk_manager_metadata_t::clear()
{
// NB: We cannot clear the chunk state and version!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the "NB" part is needed.

@senderista senderista merged commit 67b393c into master Oct 14, 2021
@senderista senderista deleted the tobin/debug_double_free branch October 14, 2021 17:47
@@ -153,7 +154,8 @@ chunk_offset_t memory_manager_t::allocate_used_chunk()

auto available_chunk_offset = static_cast<chunk_offset_t>(found_index);
chunk_manager_t chunk_manager;
chunk_manager.initialize(available_chunk_offset);
// NB: We cannot call initialize() here because we don't own the chunk yet!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I'm not sure if you have any valid use for initialize() anymore, so if that's the case, you should just remove it. The reason I asked you why you were not using it was because you still kept it, so I thought you still needed its semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm still calling it from gaia::db::allocate_object(), but as I said, this code is a mess and the control flow is quite convoluted. Really I just need to call chunk_manager_metadata_t::clear() somewhere, and only the chunk manager has access to chunk metadata, so that's why I still need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants