Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: remove locks from acl validation in the hot path #1765

Merged
merged 7 commits into from
Aug 31, 2023

Conversation

kostasrim
Copy link
Contributor

@kostasrim kostasrim commented Aug 29, 2023

Currently df acquires a ReadLock over the UserRegistry to validate ACL categories, therefore any changes to the UserRegistry are immediately available when promt. This is not very performant for the hot path since the atomic increments on the lock force the cores to synchronize (caches + penalty of accessing memory that is outside of the core for NUMA).

This behavior is replaced by deep copying the acl categories in the ConnectionContext. When there is an update to that user's ACL categories, it is streamlined via the proactor to the already opened connections.

Therefore this PR:

  • removes locking the registry when Validating users
  • deep copies the acl categories in the connection context
  • streamline updates to the acl categories via the proactor pool

@kostasrim kostasrim force-pushed the stream_acl_user_updates branch from 0a16135 to 627581f Compare August 30, 2023 12:46
@kostasrim kostasrim changed the title (WIP - Do not review) chore: remove locks from acl validation in the hot path chore: remove locks from acl validation in the hot path Aug 30, 2023
@kostasrim kostasrim marked this pull request as ready for review August 30, 2023 12:47
@kostasrim kostasrim self-assigned this Aug 30, 2023
@kostasrim
Copy link
Contributor Author

this is tested via the already existing acl_family_test.py which passes with the latest changes

@@ -193,4 +225,8 @@ void AclFamily::Register(dfly::CommandRegistry* registry) {

#undef HFUNC

void AclFamily::Init(std::vector<facade::Listener*> listeners) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not const& and than copy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to delegate this to the callers. An example is:

std::vector<...> tmp;
Init(std::move(tmp)); //init moves internally so in total 2 moves

If we do what you suggest we end up with 1 move(fine const& will work) and then a deep copy.

Generally, I like to:

  1. Accept by value an delegate the choice to the user.
  2. Accept by && only if the function is supposed to be used as a sink.

ServerState::tlocal()->user_registry->MaybeAddAndUpdate(username, std::move(req));
auto cred = ServerState::tlocal()->user_registry->GetCredentials(username);
StreamUpdatesToAllProactorConnections(username, cred.acl_categories);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. maybe you want to do this only if this is not a new user?
  2. do we want to have a lock here so no 2 set user commands are done at the same time because if we have 2 set commands for the same user at the same time what will be set in the connections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. maybe you want to do this only if this is not a new user?
    That's true we can do this
  2. That's a good catch! I already have a pattern for this 😜

Comment on lines +166 to +175
void AclFamily::StreamUpdatesToAllProactorConnections(std::string_view user, uint32_t update_cat) {
auto update_cb = [user, update_cat]([[maybe_unused]] size_t id, util::Connection* conn) {
DCHECK(conn);
auto connection = static_cast<facade::Connection*>(conn);
auto ctx = static_cast<ConnectionContext*>(connection->cntx());
if (ctx && user == ctx->authed_username) {
ctx->acl_categories = update_cat;
}
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetUser is called so rarely that we can neglect the cost of traversing all connections? (of which there are also just a few usually?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn... What if the context is performing squashing? 😭

We can copy the value (instead of referencing the parent), but then how do we propagate changes? On each batch? 😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want you can drop support for squashing and I will think how to enable support for your features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, because Redis is single-threaded it's impossible for ACL rules to change while its executing a script / a multi transaction, right?

Copy link
Contributor Author

@kostasrim kostasrim Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetUser is called so rarely that we can neglect the cost of traversing all connections? (of which there are also just a few usually?)

Yes that's true but it doesn't hurt to not traverse if the user was just created :)

Damn... What if the context is performing squashing? 😭 We can copy the value (instead of referencing the parent), but then how do we propagate changes? On each batch? 😕

I already thought of this 👨‍🍳 ! It won't affect us. Why? Because, sure we update the "stub" contexts as well BUT validation is done on the parent and not on the stub context. Therefore, we don't really care, since the update done to stub contexts won't affect anything(think of it a stub update -- we mocked the mock 🔪 ). That is before a batch executes we call ValidateCommandExecution which will see the updated changes since it promts the parent and not the stub context

If you want you can drop support for squashing and I will think how to enable support for your features

It's all fine and if it wasn't we would fix it 🚀 No rush here

Btw, because Redis is single-threaded it's impossible for ACL rules to change while its executing a script / a multi transaction, right?

Redis is single threaded indeed but connections can be multiplexed just like on a single thread you can run multiple tasks interleaved with each other (to the user this looks like multihreading, in reality it's just a single thread that interleaves task execution). That being said, this example works on redis:

time     client 1                          client 2
t1         redis-cli
t2         auth user user
t3         multi
t4         set key 23
t5                                              redis-cli
t6                                              setuser kostas -@string
t7         exec --> ERROR, categories changed between `multi/exec`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t1 to t7 is monotonically increasing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the case when acl rules change while exec is running


BUT validation is done on the parent and not on the stub context

Exactly. When you change a connection from its own thread, you assume no other thread has access to it. So it is safe to mutate variables. BUT with squashing, the parent connection is accessed from multiple threads for validation, so its not possible to mutate it (because ValidateCommandExecution might be running on another thread)

@@ -665,6 +665,7 @@ void Service::Init(util::AcceptServer* acceptor, std::vector<facade::Listener*>

shard_set->Init(shard_num, !opts.disable_time_update);

acl_family_.Init(listeners);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we enforce ACL on all listeners? I think we should only do it for the main listener (and not uds and admin).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, you are right.

@kostasrim kostasrim enabled auto-merge (squash) August 31, 2023 16:25
@@ -665,7 +665,7 @@ void Service::Init(util::AcceptServer* acceptor, std::vector<facade::Listener*>

shard_set->Init(shard_num, !opts.disable_time_update);
const auto tcp_disabled = GetFlag(FLAGS_port) == 0u;
if (!tcp_disabled) {
if (!tcp_disabled && !listeners.empty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment that we assume here that the main interface comes first. Also, please CHECK that it's not an admin interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did add a comment, in dfly_main but I will add one here as well


pp_.AwaitFiberOnAll([this, update_cb](util::ProactorBase* pb) {
if (main_listener_) {
main_listener_->TraverseConnections(update_cb);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  // traverses all client connections in all threads. cb must be thread safe.
  // cb should not keep Connection* pointers beyond the run of this function because
  // Connection* are valid only during the call to cb.
  void TraverseConnections(TraverseCB cb);

@kostasrim kostasrim requested a review from romange August 31, 2023 17:25
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna approve this but see my comment, I'm not sure the issue is resolved fully

Comment on lines +166 to +175
void AclFamily::StreamUpdatesToAllProactorConnections(std::string_view user, uint32_t update_cat) {
auto update_cb = [user, update_cat]([[maybe_unused]] size_t id, util::Connection* conn) {
DCHECK(conn);
auto connection = static_cast<facade::Connection*>(conn);
auto ctx = static_cast<ConnectionContext*>(connection->cntx());
if (ctx && user == ctx->authed_username) {
ctx->acl_categories = update_cat;
}
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the case when acl rules change while exec is running


BUT validation is done on the parent and not on the stub context

Exactly. When you change a connection from its own thread, you assume no other thread has access to it. So it is safe to mutate variables. BUT with squashing, the parent connection is accessed from multiple threads for validation, so its not possible to mutate it (because ValidateCommandExecution might be running on another thread)

@kostasrim kostasrim merged commit 866b9af into main Aug 31, 2023
@kostasrim kostasrim deleted the stream_acl_user_updates branch August 31, 2023 18:59
@dranikpg
Copy link
Contributor

I didn't see you have auto merge 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants