Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big value serialization in migration #4100

Closed
adiholden opened this issue Nov 10, 2024 · 0 comments · Fixed by #4197
Closed

Big value serialization in migration #4100

adiholden opened this issue Nov 10, 2024 · 0 comments · Fixed by #4197
Assignees

Comments

@adiholden
Copy link
Collaborator

No description provided.

@adiholden adiholden added this to the dfly cluster v4 milestone Nov 10, 2024
chakaz added a commit that referenced this issue Nov 12, 2024
Before this PR:

We serialized a `RESTORE` command for each entry into a string, and then
push that string to the wire.
This means that, when serializing an entry of size X, we consume 2X
memory during the migration.

This PR:

Instead of serializing into a string, we serialize into the wire
directly. Luckily, only a small modification is needed in the way we
interact with `crc64`, which works well even in chunks.

Fixes #4100
chakaz added a commit that referenced this issue Nov 18, 2024
Before this PR we used `RESTORE` commands for transferring data between
source and target nodes in cluster slots migration.

While this _works_, it has a side effect of consuming 2x memory for huge
values (i.e. if a single key's value takes 10gb, serializing it will
take 20gb or even 30gb).

With this PR we break down huge keys into multiple commands (`RPUSH`,
`HSET`, etc), respecting the existing `--serialization_max_chunk_size`
flag.

Fixes #4100
chakaz added a commit that referenced this issue Nov 25, 2024
* feat: Huge values breakdown in cluster migration

Before this PR we used `RESTORE` commands for transferring data between
source and target nodes in cluster slots migration.

While this _works_, it has a side effect of consuming 2x memory for huge
values (i.e. if a single key's value takes 10gb, serializing it will
take 20gb or even 30gb).

With this PR we break down huge keys into multiple commands (`RPUSH`,
`HSET`, etc), respecting the existing `--serialization_max_chunk_size`
flag.

Part of #4100
chakaz added a commit that referenced this issue Nov 25, 2024
This actually yields between invocations, and tests that modifying huge
values while migrating slots to other nodes works.

Still TODO: assert on RSS size during migration (will probably need to
create bigger containers)

Fixes #4100
chakaz added a commit that referenced this issue Nov 26, 2024
With #4144 we break huge values slot migration into multiple commands.
This PR now adds yield between those commands.
It also adds a test that checks that modifying huge values while doing a
migration works well, and that RSS doesn't grow too much.

Fixes #4100
chakaz added a commit that referenced this issue Jan 5, 2025
* feat: Yield inside huge values migration serialization

With #4144 we break huge values slot migration into multiple commands.
This PR now adds yield between those commands.
It also adds a test that checks that modifying huge values while doing a
migration works well, and that RSS doesn't grow too much.

Fixes #4100
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants