Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add missing barrier to fix reads in the coordinator fiber #1009

Merged
merged 1 commit into from
Mar 30, 2023

Conversation

romange
Copy link
Collaborator

@romange romange commented Mar 29, 2023

  1. Adds memory barriers around run_count_ so that the coordinator thread won't read stale data.
  2. (unrelated to the fix): Replaces transaction callback with FunctionRef that does not allocate like std::function.

Fixes #997.

@romange romange requested a review from dranikpg March 29, 2023 13:37
@romange
Copy link
Collaborator Author

romange commented Mar 29, 2023

My regression tests on ARM64 pass.


cb_ = std::move(cb);
cb_ptr_ = &cb;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something here, this is unsafe. You're taking a ref to a local var.
Perhaps you intended to copy it to a local member of a non-pointer type?
I realize you'd like to know if it's initialized, so you can have the member be of type std::optional

BTW note that this is somewhat risky, as now callers to this API must make sure that their callable outlive the inner object, but I don't know this API so it might be OK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifetime of cb_/cb_ptr_ is only for during of ScheduleSingleHop/Execute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that we could use std::optional here but at the end I use FunctionRef to accept lambdas which by itself is valid only during the call of these functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if the lifetime of cb_ptr_ is only for the duration of these methods then there's no issue. I didn't realize that, but reading the code now - it makes sense. Sorry about the noise :)

BTW, in the previous version (with std::function) the lambdas were moved into the Transaction object, so they could have theoretically had a longer lifetime than the function call. I'm not sure if that's what you implied in "... lambdas which by itself is valid only during the call of these functions", but anyway, that's obviously irrelevant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they could - and we paid for it with unnecessary allocation, which I discovered when fixing the bug.
These allocations by themselves are not awful, but their pattern is - to always malloc in one thread and free in another. This puts an unnecessary load on malloc library to move free lists from thread to thread.
those cb_ = nullptr are the assignments that reset the function object.

@@ -471,7 +478,7 @@ class Transaction {
// Reverse argument mapping for ReverseArgIndex to convert from shard index to original index.
std::vector<uint32_t> reverse_index_;

RunnableType cb_; // Run on shard threads
RunnableType* cb_ptr_ = nullptr; // Run on shard threads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::FunctionRef already acts as a lightweight reference, so there is no need for the pointer in the general context.

However its always used as a pointer to the argument, so it makes sense like this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not pissible to "reset" it or to clear it. I could use optional but decided to use a pointer instead.

src/server/transaction.cc Show resolved Hide resolved
@romange romange merged commit 8e00801 into main Mar 30, 2023
@romange romange deleted the FixMemoryOrder branch March 30, 2023 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash due to data race
3 participants