Epic: tenant relocation between pageserver nodes #886
Some more notes. What is the sequence of actions to test moving a tenant to a different pageserver?
What bothers me the most is that it becomes possible for two pageservers to write checkpoints concurrently to the same S3 path. They'll probably write the same data, but I wouldn't rely on that: there might be a version upgrade and the format might diverge. So I think we shouldn't allow that. This case can be guarded with a flag, e.g. when we attach a timeline to a different pageserver, it shouldn't try to upload anything to S3. But this is still quite tricky. We shouldn't crash with an OOM here, because InMemoryLayers can now be swapped to disk, but we still need to manage this somehow. I'll investigate possible solutions.
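A minimal sketch of that guard-flag idea (all names here are hypothetical, not the actual pageserver API): the tenant attached on the destination starts with remote uploads disabled, and only enables them once the source pageserver has confirmed detach.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-tenant state guarding remote storage writes.
struct TenantRemoteState {
    /// False while another pageserver may still own the S3 prefix.
    uploads_enabled: AtomicBool,
}

impl TenantRemoteState {
    /// Attach during relocation: start read-only with respect to S3.
    fn attach_for_relocation() -> Self {
        Self { uploads_enabled: AtomicBool::new(false) }
    }

    /// Called once the source pageserver has confirmed detach.
    fn enable_uploads(&self) {
        self.uploads_enabled.store(true, Ordering::SeqCst);
    }

    /// The upload path checks the flag instead of writing unconditionally.
    fn maybe_upload(&self, layer_name: &str) {
        if !self.uploads_enabled.load(Ordering::SeqCst) {
            println!("skipping upload of {layer_name}: source may still own the S3 prefix");
            return;
        }
        // ... the actual S3 PUT would go here ...
        println!("uploading {layer_name}");
    }
}

fn main() {
    let state = TenantRemoteState::attach_for_relocation();
    state.maybe_upload("layer-a"); // skipped: uploads disabled during relocation
    state.enable_uploads();        // source confirmed detach
    state.maybe_upload("layer-a"); // now allowed
}
```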
I think for a v0 you can just go from step 7 to step 10 directly. It would be a bigger performance hiccup for compute, but should be okay from the correctness standpoint. After that we can add more synchronization between steps 7-10. Note that you can connect the pageserver to the safekeeper without callmemaybe.
OOM shouldn't be an issue here, since the pageserver can spill files to disk anyway. We can have an unconditional check: if we are trying to overwrite existing valid files on S3, we compare a CRC/hash sum and do nothing if they match. That should help with your case, IIUC.
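A sketch of that checksum guard, using the `crc32fast` crate; how the remote checksum is obtained (here a plain `Option<u32>` parameter) is an assumption for illustration, e.g. it could come from S3 object metadata:

```rust
/// CRC32 of a byte buffer, via the crc32fast crate.
fn crc32_of(data: &[u8]) -> u32 {
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(data);
    hasher.finalize()
}

/// Upload only if the remote file is missing or its contents differ.
/// Returns true if an upload was (or would be) performed.
fn upload_if_changed(local: &[u8], remote_crc32: Option<u32>) -> bool {
    match remote_crc32 {
        // Remote file exists with identical contents: do nothing.
        Some(crc) if crc == crc32_of(local) => false,
        // Missing or different: perform the upload.
        _ => {
            // ... the actual put_object(...) would go here ...
            true
        }
    }
}

fn main() {
    let data = b"layer file bytes";
    assert!(upload_if_changed(data, None));                  // no remote file: upload
    assert!(!upload_if_changed(data, Some(crc32_of(data)))); // identical: skip
}
```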
Yeah, currently I just insert sleep calls where something needs waiting, with TODOs to replace them with API calls for proper synchronization.
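Once a status API exists, the usual replacement for those bare sleeps is a poll-with-timeout helper; a small sketch, with the polled condition left as a placeholder:

```rust
use std::time::{Duration, Instant};

/// Poll `check` until it returns true or `timeout` elapses.
/// Replaces bare sleep() calls once a proper status API is available.
fn wait_until(timeout: Duration, poll_interval: Duration, mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if check() {
            return true;
        }
        std::thread::sleep(poll_interval);
    }
    false
}

fn main() {
    // Example: wait up to 30s for a (hypothetical) "tenant attached" condition.
    let attached = wait_until(Duration::from_secs(30), Duration::from_millis(200), || {
        // ... query the pageserver status API here ...
        true
    });
    assert!(attached);
}
```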
What if they don't? There is an archiving mechanism about to land in #874, and I've raised similar concerns there too. Can there be some non-determinism in the file layout? Can there be a different version of the pageserver with changes to the file layout? So I think for now we might want to avoid overwriting files in S3. I've created #971, so maybe it is better to continue the discussion there.
Let me summarize what's left here:
This can be mitigated by suspending background operations before relocation (corresponding issue: #2740). I attempted to implement that, but with the current state of tenant management it's hard to do reliably; see PR #2665 for that attempt. This approach has some downsides, and I wrote an RFC with a better proposal, but it is a much heavier change: #2676. The RFC also addresses future problems that will come up when we start thinking about scaling one tenant to multiple pageservers. So I think it's better to start with suspending background operations now.
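One way to picture "suspend background operations before relocation" (a sketch under assumed names, not the implementation from #2665 or #2676): background loops take a shared guard before each iteration, and relocation takes it exclusively, so no GC or compaction can run while the tenant is being moved.

```rust
use std::sync::{Arc, RwLock};

// Background loops acquire the lock shared; relocation acquires it
// exclusively, which waits for in-flight iterations to finish and
// blocks new ones from starting.
fn main() {
    let gate = Arc::new(RwLock::new(()));

    // Background task (e.g. a GC or compaction loop), one iteration shown.
    {
        let _running = gate.read().unwrap();
        // ... do one round of GC/compaction ...
    } // guard dropped: relocation may proceed

    // Relocation path: suspend all background work for the duration.
    {
        let _suspended = gate.write().unwrap();
        // ... detach here, attach on the destination pageserver ...
    }
}
```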
Another bug that needs to be resolved: #3478
…cessing (#4235)

With this patch, the attach handler now follows the same pattern as tenant create with regards to instantiation of the new tenant:

1. Prepare on-disk state using `create_tenant_files`.
2. Use the same code path as pageserver startup to load it into memory and start background loops (`schedule_local_tenant_processing`).

It's a bit sad we can't use the `PageServerConfig::tenant_attaching_mark_file_path` method inside `create_tenant_files`, because it operates in a temporary directory. However, it's a small price to pay for the gained simplicity.

During implementation, I noticed that we don't handle failures after `create_tenant_files` well. I left TODO comments in the code linking to the issue that I created for this [^1]. Also, I'll dedupe the spawn_load and spawn_attach code in a future commit.

refs #1555
part of #886 (Tenant Relocation)

[^1]: #4233
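The shape of that two-step flow, as a skeleton: `create_tenant_files` and `schedule_local_tenant_processing` are the function names from the PR, but the signatures and types below are assumptions for illustration, not the real ones.

```rust
use std::path::{Path, PathBuf};

struct TenantId(String);
struct Tenant;

fn create_tenant_files(_tenant_id: &TenantId) -> std::io::Result<PathBuf> {
    // 1. Prepare on-disk state in a temporary directory, then move it
    //    into place (body elided).
    todo!()
}

fn schedule_local_tenant_processing(_tenant_dir: &Path) -> Tenant {
    // 2. Same code path as pageserver startup: load the tenant into
    //    memory and start its background loops (body elided).
    todo!()
}

fn attach_handler(tenant_id: TenantId) -> std::io::Result<Tenant> {
    let dir = create_tenant_files(&tenant_id)?;
    Ok(schedule_local_tenant_processing(&dir))
}
```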
This PR adds support for supplying the tenant config upon `/attach`.

Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR interval: if the tenant's config on the source was `30 days` and the default config on the attach side is `7 days`, then the first GC run would eradicate 23 days' worth of PITR capability.

The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (#4282 keeps track of this).

unblocks neondatabase/cloud#5092
fixes #1555
part of #886 (Tenant Relocation)

Implementation
==============

The preliminary PRs for this work were (most recent to least recent):

* #4279
* #4267
* #4252
* #4235
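A hedged sketch of what supplying the config at attach time looks like from a client's perspective; the endpoint path and field names below are assumptions for illustration, not the verbatim API:

```rust
// Sketch of an attach request carrying the tenant config, so the
// destination does not fall back to pageserver.toml defaults.
// Endpoint path and field names are assumed, not the real schema.
fn main() {
    let tenant_id = "<tenant id>"; // elided
    let body = serde_json::json!({
        "config": {
            // Preserve the source's PITR window instead of inheriting
            // the destination's (possibly shorter) default.
            "pitr_interval": "30 days",
        }
    });
    println!(
        "POST /v1/tenant/{tenant_id}/attach\n{}",
        serde_json::to_string_pretty(&body).unwrap()
    );
    // An empty body keeps the backwards-compatible behaviour:
    // the destination's default config is used.
}
```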
This ticket is quite old and described an earlier migration approach with downtime, which has already been implemented. Closing in favour of #5199, which tracks the ongoing work to enable seamless migration.
Motivation
We want to be able to assign tenants and timelines to pageservers running on appropriate EC2 instances, so that we can distribute tenants and their workloads fairly across instances. This helps achieve stable user query latencies, avoids noisy-neighbor issues, and, when necessary, lets us perform maintenance and upgrades on the nodes where the pageserver runs.
See #985 for a similar issue on safekeepers
DoD
Add the ability to move a tenant from one storage node to another. For now it is okay to require a compute node restart to do that.
There should be a button in the control plane to trigger the migration. Aside from the compute restart, there shouldn't be any downtime.
Tasks
For a more in-depth description, see the latest RFC on the topic: #3868
For actual progress see the project board: https://github.com/orgs/neondatabase/projects/27/views/2
As described in the RFC, we are implementing the simpler approach first. The downside is a short downtime. The approach can evolve into one without downtime once we resolve the immediate tech-debt issues.