Deal with concurrent checkpointing from both old and new pageserver during timeline migration #971
Comments
The current vision is that we should allow concurrent checkpointing without any problems. I created a test for that, but ran into unrelated errors that are currently being fixed on main, so I'll continue my investigation.
Note: while concurrent checkpointing shouldn't lead to correctness issues, we still might want to avoid it in future scenarios where a timeline is attached to two pageservers, e.g. to spread get page requests or to support failover. Currently, concurrent checkpointing can happen during tenant migration, when the old and new pageservers are active simultaneously.
This is waiting for #1396.
Things have changed and we decided to introduce etcd. The new plan is to use it to prevent concurrent uploads from happening.
I think we can close this, given that we're set to implement relocation as specified in RFC #3868.
I think this is still relevant: the issue describes a problem that the RFC should solve in one way or another. In the first iteration we decided to avoid the problem by detaching before attaching, so there is never concurrent background activity from more than one pageserver at a time. The project currently covers only that first stage, so I'm not sure whether we should keep the issue in the project (maybe with a separate label?). WDYT @problame?
Fixed by generations.
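A minimal sketch of why generation numbers remove the conflict, assuming each attach of a tenant is issued a strictly increasing generation and every remote object key carries that generation as a suffix. The key layout, `RemoteStorage`, `upload_index`, and `latest_index` are hypothetical illustrations, not the pageserver's actual code or S3 API. Because the old and new pageservers write under different keys, neither can overwrite the other's metadata, and a reader simply picks the highest generation.

```python
class RemoteStorage:
    """Hypothetical in-memory stand-in for an S3 bucket (key -> object)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value


def index_key(timeline_id, generation):
    # Hypothetical layout: the generation is encoded as a fixed-width hex suffix.
    return f"{timeline_id}/index_part.json-{generation:08x}"


def upload_index(storage, timeline_id, generation, metadata):
    # Each pageserver only ever writes under its own generation's key.
    storage.put(index_key(timeline_id, generation), metadata)


def latest_index(storage, timeline_id):
    # Readers pick the index written by the highest generation.
    prefix = f"{timeline_id}/index_part.json-"
    keys = [k for k in storage.objects if k.startswith(prefix)]
    if not keys:
        return None
    newest = max(keys, key=lambda k: int(k[len(prefix):], 16))
    return storage.objects[newest]
```

For example, if the old pageserver (generation 1) keeps checkpointing after the new one (generation 2) has uploaded, the bucket ends up with two separate index objects and `latest_index` still returns the generation 2 one.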
Started from the discussion in #874 (comment).
There is a dangerous possibility of two pageservers writing data concurrently to the same underlying S3 storage. This is scary even if it happens to work, given the incremental format we use.
There are some open questions here, and invariants we need to uphold. We must not overwrite the local metadata of an active timeline with something older or newer from S3. As far as I understand, the opposite direction is not a problem, because we upload metadata with each checkpoint. Currently a local overwrite shouldn't happen, because we do not schedule downloads for timelines that are present locally, so this is fine.
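A minimal sketch of the guard described above, assuming a pageserver keeps a map of locally present timelines and only schedules a download when the timeline is absent. `TimelineMetadata`, `disk_consistent_lsn`, `Pageserver`, and `schedule_download` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class TimelineMetadata:
    # Hypothetical checkpoint position recorded in the metadata.
    disk_consistent_lsn: int


@dataclass
class Pageserver:
    # timeline_id -> metadata of timelines that already exist locally
    local_timelines: dict = field(default_factory=dict)

    def schedule_download(self, timeline_id, remote_metadata):
        """Return True if a download was scheduled, False if it was skipped."""
        if timeline_id in self.local_timelines:
            # Invariant: remote metadata (older or newer) never replaces the
            # local metadata of a timeline that is already present here.
            return False
        # Timeline is absent locally, so adopting the remote state is safe.
        self.local_timelines[timeline_id] = remote_metadata
        return True
```

With a local timeline at LSN 200, a remote copy at LSN 100 is ignored and the local metadata stays untouched; only timelines missing locally are downloaded.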
Feel free to correct me; maybe I'm missing something.