Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deduplicate mget response #4175

Merged
merged 4 commits into from
Nov 25, 2024
Merged

fix: deduplicate mget response #4175

merged 4 commits into from
Nov 25, 2024

Conversation

romange
Copy link
Collaborator

@romange romange commented Nov 24, 2024

In case of duplicate mget keys, skips fetching the same key twice.
We add transactional flag DUPLICATE_KEYS_HINT that tells us for free
whether shard has duplicate keys.
The optimization is straighforward - we just copy the response for the original key,
since the response is a shallow object, we potentially save lots of memory with this
deduplication.

@romange romange changed the title chore: deduplicate mget response fix: deduplicate mget response Nov 24, 2024
@romange romange requested a review from adiholden November 24, 2024 16:06
@@ -1104,10 +1104,13 @@ bool Transaction::ScheduleInShard(EngineShard* shard, bool execute_optimistic) {

// We need to acquire the fp locks because the executing callback
// within RunCallback below might preempt.
bool keys_unlocked = GetDbSlice(shard->shard_id()).Acquire(mode, lock_args);
auto [keys_unlocked, has_duplicates] = GetDbSlice(shard->shard_id()).Acquire(mode, lock_args);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when running dragonfly with lock on hash tag will this always be true for multi keys?
if so do you see any downside?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing some tests that check the response for mget for the same key
lets add this test before merging this pr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, i did not think of it. yes, there is a downside

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am gonna drop the transaction optimization for now, too bad it does not work.

In case of duplicate mget keys, skips fetching the same key twice.
We add transactional flag DUPLICATE_KEYS_HINT that tells us for free
whether shard has duplicate keys.
The optimization is straighforward - we just copy the response for the original key,
since the response is a shallow object, we potentially save lots of memory with this
deduplication.

Signed-off-by: Roman Gershman <[email protected]>
Always deduplicate inside OpMGet.

Signed-off-by: Roman Gershman <[email protected]>
@adiholden
Copy link
Collaborator

@romange please add unit test for checking the output of mget with same key

@romange
Copy link
Collaborator Author

romange commented Nov 25, 2024

@adiholden MGetCachingModeBug2465 that you added already covers it.
Also SetWithHashtagsWithEmulatedCluster covers the hashtags scenario.

@adiholden
Copy link
Collaborator

MGetCachingModeBug2465

As I wrote above we do not check the mget response, we just run the mget command and check the stats
you can change this test to check the mget response

absl::InlinedVector<Item, 32> items(keys.Size());
static thread_local absl::flat_hash_map<string_view, unsigned> key_index;

FiberAtomicGuard guard;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this is not preemptive
If you run in cache mode and do bumpups for example you will need to serialize the bucket

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we can not use thread_local.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you good you.

@romange romange requested a review from adiholden November 25, 2024 14:19
@@ -596,6 +623,7 @@ MGetResponse OpMGet(util::fb2::BlockingCounter wait_bc, uint8_t fetch_mask, cons
}
}
}
key_index.clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is redundant now

@romange romange merged commit 7ac1631 into main Nov 25, 2024
9 checks passed
@romange romange deleted the DeduplicateMgt branch November 25, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants