Epic: tenant relocation between pageserver nodes #886
Some more notes. What is the sequence of actions to test moving a tenant to a different pageserver?
What bothers me the most is that it becomes possible for two pageservers to write checkpoints concurrently to the same S3 path. They'll probably write the same data, but I wouldn't rely on that: there might be a version upgrade and the format might diverge. So I think we shouldn't allow that. This case can be guarded with a flag, e.g. when we attach a timeline to a different pageserver, it shouldn't try to upload anything to S3. But this is still quite tricky. We shouldn't crash with an OOM here, because InMemoryLayers can now be swapped to disk, but we still need to manage this somehow. I'll investigate possible solutions.
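A minimal sketch of that guard-flag idea (all names here are hypothetical, not the actual pageserver API): the tenant attached on the destination starts with remote uploads disabled, and only enables them once the source pageserver has confirmed detach.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-tenant state guarding remote storage writes.
struct TenantRemoteState {
    /// False while another pageserver may still own the S3 prefix.
    uploads_enabled: AtomicBool,
}

impl TenantRemoteState {
    /// Attach during relocation: start read-only with respect to S3.
    fn attach_for_relocation() -> Self {
        Self { uploads_enabled: AtomicBool::new(false) }
    }

    /// Called once the source pageserver has confirmed detach.
    fn enable_uploads(&self) {
        self.uploads_enabled.store(true, Ordering::SeqCst);
    }

    /// The upload path checks the flag instead of writing unconditionally.
    fn maybe_upload(&self, layer_name: &str) {
        if !self.uploads_enabled.load(Ordering::SeqCst) {
            println!("skipping upload of {layer_name}: source may still own the S3 prefix");
            return;
        }
        // ... the actual S3 PUT would go here ...
        println!("uploading {layer_name}");
    }
}

fn main() {
    let state = TenantRemoteState::attach_for_relocation();
    state.maybe_upload("layer-a"); // skipped: uploads disabled during relocation
    state.enable_uploads();        // source confirmed detach
    state.maybe_upload("layer-a"); // now allowed
}
```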
I think for a v0 you can just go from step 7 to step 10 directly. It would be a bigger performance hiccup for compute, but should be okay from the correctness standpoint. After that we can add more synchronization between steps 7-10. Note that you can connect the pageserver to the safekeeper without callmemaybe.
OOM shouldn't be an issue here, since the pageserver can spill files to disk anyway. We can have an unconditional check: if we are trying to overwrite existing valid files on S3, we compare a CRC/hash sum and do nothing if they match. That should help with your case, IIUC.
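A sketch of that checksum guard, using the `crc32fast` crate; how the remote checksum is obtained (here a plain `Option<u32>` parameter) is an assumption for illustration, e.g. it could come from S3 object metadata:

```rust
/// CRC32 of a byte buffer, via the crc32fast crate.
fn crc32_of(data: &[u8]) -> u32 {
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(data);
    hasher.finalize()
}

/// Upload only if the remote file is missing or its contents differ.
/// Returns true if an upload was (or would be) performed.
fn upload_if_changed(local: &[u8], remote_crc32: Option<u32>) -> bool {
    match remote_crc32 {
        // Remote file exists with identical contents: do nothing.
        Some(crc) if crc == crc32_of(local) => false,
        // Missing or different: perform the upload.
        _ => {
            // ... the actual put_object(...) would go here ...
            true
        }
    }
}

fn main() {
    let data = b"layer file bytes";
    assert!(upload_if_changed(data, None));                  // no remote file: upload
    assert!(!upload_if_changed(data, Some(crc32_of(data)))); // identical: skip
}
```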
Yeah, currently I just insert sleep calls where something needs waiting, with TODOs to replace them with API calls for proper synchronization.
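Once a status API exists, the usual replacement for those bare sleeps is a poll-with-timeout helper; a small sketch, with the polled condition left as a placeholder:

```rust
use std::time::{Duration, Instant};

/// Poll `check` until it returns true or `timeout` elapses.
/// Replaces bare sleep() calls once a proper status API is available.
fn wait_until(timeout: Duration, poll_interval: Duration, mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if check() {
            return true;
        }
        std::thread::sleep(poll_interval);
    }
    false
}

fn main() {
    // Example: wait up to 30s for a (hypothetical) "tenant attached" condition.
    let attached = wait_until(Duration::from_secs(30), Duration::from_millis(200), || {
        // ... query the pageserver status API here ...
        true
    });
    assert!(attached);
}
```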
What if they don't? There is an archiving mechanism about to land in #874, and I've raised similar concerns there too. Can there be some non-determinism in the file layout? Can there be a different version of the pageserver with changes to the file layout? So I think for now we might want to avoid overwriting files in S3. I've created #971, so maybe it is better to continue the discussion there.
Let me summarize what's left here:
This can be mitigated by suspending background operations before relocation (corresponding issue: #2740). I attempted to implement that, but with the current state of tenant management it's hard to do reliably; see PR #2665 for that attempt. This approach has some downsides, and I wrote an RFC with a better proposal, but it is a much heavier change: #2676. The RFC also addresses future problems that will come up when we start thinking about scaling one tenant to multiple pageservers. So I think it's better to start with suspending background operations now.
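One way to picture "suspend background operations before relocation" (a sketch under assumed names, not the implementation from #2665 or #2676): background loops take a shared guard before each iteration, and relocation takes it exclusively, so no GC or compaction can run while the tenant is being moved.

```rust
use std::sync::{Arc, RwLock};

// Background loops acquire the lock shared; relocation acquires it
// exclusively, which waits for in-flight iterations to finish and
// blocks new ones from starting.
fn main() {
    let gate = Arc::new(RwLock::new(()));

    // Background task (e.g. a GC or compaction loop), one iteration shown.
    {
        let _running = gate.read().unwrap();
        // ... do one round of GC/compaction ...
    } // guard dropped: relocation may proceed

    // Relocation path: suspend all background work for the duration.
    {
        let _suspended = gate.write().unwrap();
        // ... detach here, attach on the destination pageserver ...
    }
}
```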
Another bug that needs to be resolved: #3478
…cessing (#4235)

With this patch, the attach handler now follows the same pattern as tenant create with regards to instantiation of the new tenant:

1. Prepare on-disk state using `create_tenant_files`.
2. Use the same code path as pageserver startup to load it into memory and start background loops (`schedule_local_tenant_processing`).

It's a bit sad we can't use the `PageServerConfig::tenant_attaching_mark_file_path` method inside `create_tenant_files`, because it operates in a temporary directory. However, it's a small price to pay for the gained simplicity.

During implementation, I noticed that we don't handle failures after `create_tenant_files` well. I left TODO comments in the code linking to the issue that I created for this [^1]. Also, I'll dedupe the spawn_load and spawn_attach code in a future commit.

refs #1555
part of #886 (Tenant Relocation)

[^1]: #4233
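The shape of that two-step flow, as a skeleton: `create_tenant_files` and `schedule_local_tenant_processing` are the function names from the PR, but the signatures and types below are assumptions for illustration, not the real ones.

```rust
use std::path::{Path, PathBuf};

struct TenantId(String);
struct Tenant;

fn create_tenant_files(_tenant_id: &TenantId) -> std::io::Result<PathBuf> {
    // 1. Prepare on-disk state in a temporary directory, then move it
    //    into place (body elided).
    todo!()
}

fn schedule_local_tenant_processing(_tenant_dir: &Path) -> Tenant {
    // 2. Same code path as pageserver startup: load the tenant into
    //    memory and start its background loops (body elided).
    todo!()
}

fn attach_handler(tenant_id: TenantId) -> std::io::Result<Tenant> {
    let dir = create_tenant_files(&tenant_id)?;
    Ok(schedule_local_tenant_processing(&dir))
}
```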
This PR adds support for supplying the tenant config upon `/attach`.

Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR interval: if the tenant's config on the source was `30 days` and the default config on the attach side is `7 days`, then the first GC run would eradicate 23 days' worth of PITR capability.

The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (#4282 keeps track of this).

unblocks neondatabase/cloud#5092
fixes #1555
part of #886 (Tenant Relocation)

Implementation
==============

The preliminary PRs for this work were (most recent to least recent):

* #4279
* #4267
* #4252
* #4235
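A hedged sketch of what supplying the config at attach time looks like from a client's perspective; the endpoint path and field names below are assumptions for illustration, not the verbatim API:

```rust
// Sketch of an attach request carrying the tenant config, so the
// destination does not fall back to pageserver.toml defaults.
// Endpoint path and field names are assumed, not the real schema.
fn main() {
    let tenant_id = "<tenant id>"; // elided
    let body = serde_json::json!({
        "config": {
            // Preserve the source's PITR window instead of inheriting
            // the destination's (possibly shorter) default.
            "pitr_interval": "30 days",
        }
    });
    println!(
        "POST /v1/tenant/{tenant_id}/attach\n{}",
        serde_json::to_string_pretty(&body).unwrap()
    );
    // An empty body keeps the backwards-compatible behaviour:
    // the destination's default config is used.
}
```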
This ticket is quite old and described an earlier migration approach with downtime, which has already been implemented. Closing in favour of #5199, which tracks the ongoing work to enable seamless migration.
Motivation
We want to be able to assign tenants and timelines to pageservers running on appropriate EC2 instances, so that we can distribute tenants and their workloads fairly across instances. This helps achieve stable user query latencies, avoids noisy-neighbor issues, and, when necessary, lets us perform maintenance and upgrades on the nodes where the pageserver runs.
See #985 for a similar issue on safekeepers
DoD
Add the ability to move a tenant from one storage node to another. For now it is okay to require a compute node restart to do that.
There should be a button in the control plane to trigger the migration. Aside from the compute restart, there shouldn't be any downtime.
Tasks
For a more in-depth description, see the latest RFC on the topic: #3868
For actual progress see the project board: https://github.com/orgs/neondatabase/projects/27/views/2
As described in the RFC, we are implementing the simpler approach first. The downside is a short downtime. The approach can evolve into one without downtime once we resolve the immediate tech-debt issues.