-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: remove locks from acl validation in the hot path #1765
Conversation
0a16135
to
627581f
Compare
this is tested via the already existing |
src/server/acl/acl_family.cc
Outdated
@@ -193,4 +225,8 @@ void AclFamily::Register(dfly::CommandRegistry* registry) { | |||
|
|||
#undef HFUNC | |||
|
|||
void AclFamily::Init(std::vector<facade::Listener*> listeners) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not const& and than copy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to delegate this to the callers. An example is:
std::vector<...> tmp;
Init(std::move(tmp)); //init moves internally so in total 2 moves
If we do what you suggest we end up with 1 move(fine const& will work) and then a deep copy.
Generally, I like to:
- Accept by value an delegate the choice to the user.
- Accept by && only if the function is supposed to be used as a sink.
src/server/acl/acl_family.cc
Outdated
ServerState::tlocal()->user_registry->MaybeAddAndUpdate(username, std::move(req)); | ||
auto cred = ServerState::tlocal()->user_registry->GetCredentials(username); | ||
StreamUpdatesToAllProactorConnections(username, cred.acl_categories); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe you want to do this only if this is not a new user?
- do we want to have a lock here so no 2 set user commands are done at the same time because if we have 2 set commands for the same user at the same time what will be set in the connections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe you want to do this only if this is not a new user?
That's true we can do this - That's a good catch! I already have a pattern for this 😜
void AclFamily::StreamUpdatesToAllProactorConnections(std::string_view user, uint32_t update_cat) { | ||
auto update_cb = [user, update_cat]([[maybe_unused]] size_t id, util::Connection* conn) { | ||
DCHECK(conn); | ||
auto connection = static_cast<facade::Connection*>(conn); | ||
auto ctx = static_cast<ConnectionContext*>(connection->cntx()); | ||
if (ctx && user == ctx->authed_username) { | ||
ctx->acl_categories = update_cat; | ||
} | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetUser is called so rarely that we can neglect the cost of traversing all connections? (of which there are also just a few usually?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn... What if the context is performing squashing? 😭
We can copy the value (instead of referencing the parent), but then how do we propagate changes? On each batch? 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want you can drop support for squashing and I will think how to enable support for your features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, because Redis is single-threaded it's impossible for ACL rules to change while its executing a script / a multi transaction, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetUser is called so rarely that we can neglect the cost of traversing all connections? (of which there are also just a few usually?)
Yes that's true but it doesn't hurt to not traverse if the user was just created :)
Damn... What if the context is performing squashing? 😭 We can copy the value (instead of referencing the parent), but then how do we propagate changes? On each batch? 😕
I already thought of this 👨🍳 ! It won't affect us. Why? Because, sure we update the "stub" contexts as well BUT validation is done on the parent
and not on the stub
context. Therefore, we don't really care, since the update done to stub contexts won't affect anything(think of it a stub
update -- we mocked the mock 🔪 ). That is before a batch executes we call ValidateCommandExecution
which will see the updated changes since it promts the parent and not the stub context
If you want you can drop support for squashing and I will think how to enable support for your features
It's all fine and if it wasn't we would fix it 🚀 No rush here
Btw, because Redis is single-threaded it's impossible for ACL rules to change while its executing a script / a multi transaction, right?
Redis is single threaded indeed but connections can be multiplexed just like on a single thread you can run multiple tasks interleaved with each other (to the user this looks like multihreading, in reality it's just a single thread that interleaves task execution). That being said, this example works on redis:
time client 1 client 2
t1 redis-cli
t2 auth user user
t3 multi
t4 set key 23
t5 redis-cli
t6 setuser kostas -@string
t7 exec --> ERROR, categories changed between `multi/exec`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t1 to t7 is monotonically increasing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the case when acl rules change while exec is running
BUT validation is done on the parent and not on the stub context
Exactly. When you change a connection from its own thread, you assume no other thread has access to it. So it is safe to mutate variables. BUT with squashing, the parent connection is accessed from multiple threads for validation, so its not possible to mutate it (because ValidateCommandExecution
might be running on another thread)
src/server/main_service.cc
Outdated
@@ -665,6 +665,7 @@ void Service::Init(util::AcceptServer* acceptor, std::vector<facade::Listener*> | |||
|
|||
shard_set->Init(shard_num, !opts.disable_time_update); | |||
|
|||
acl_family_.Init(listeners); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we enforce ACL on all listeners? I think we should only do it for the main listener (and not uds and admin).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, you are right.
@@ -665,7 +665,7 @@ void Service::Init(util::AcceptServer* acceptor, std::vector<facade::Listener*> | |||
|
|||
shard_set->Init(shard_num, !opts.disable_time_update); | |||
const auto tcp_disabled = GetFlag(FLAGS_port) == 0u; | |||
if (!tcp_disabled) { | |||
if (!tcp_disabled && !listeners.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a comment that we assume here that the main interface comes first. Also, please CHECK that it's not an admin interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did add a comment, in dfly_main
but I will add one here as well
src/server/acl/acl_family.cc
Outdated
|
||
pp_.AwaitFiberOnAll([this, update_cb](util::ProactorBase* pb) { | ||
if (main_listener_) { | ||
main_listener_->TraverseConnections(update_cb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// traverses all client connections in all threads. cb must be thread safe.
// cb should not keep Connection* pointers beyond the run of this function because
// Connection* are valid only during the call to cb.
void TraverseConnections(TraverseCB cb);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna approve this but see my comment, I'm not sure the issue is resolved fully
void AclFamily::StreamUpdatesToAllProactorConnections(std::string_view user, uint32_t update_cat) { | ||
auto update_cb = [user, update_cat]([[maybe_unused]] size_t id, util::Connection* conn) { | ||
DCHECK(conn); | ||
auto connection = static_cast<facade::Connection*>(conn); | ||
auto ctx = static_cast<ConnectionContext*>(connection->cntx()); | ||
if (ctx && user == ctx->authed_username) { | ||
ctx->acl_categories = update_cat; | ||
} | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the case when acl rules change while exec is running
BUT validation is done on the parent and not on the stub context
Exactly. When you change a connection from its own thread, you assume no other thread has access to it. So it is safe to mutate variables. BUT with squashing, the parent connection is accessed from multiple threads for validation, so its not possible to mutate it (because ValidateCommandExecution
might be running on another thread)
I didn't see you have auto merge 😄 |
Currently df acquires a
ReadLock
over theUserRegistry
to validate ACL categories, therefore any changes to theUserRegistry
are immediately available when promt. This is not very performant for the hot path since the atomic increments on the lock force the cores to synchronize (caches + penalty of accessing memory that is outside of the core for NUMA).This behavior is replaced by deep copying the acl categories in the
ConnectionContext
. When there is an update to that user's ACL categories, it is streamlined via the proactor to the already opened connections.Therefore this PR: