very first version of tenant migration from one pageserver to another via remote storage #995
Conversation
Force-pushed from 9c430b5 to 5b5ca4f
Looks nice, actually, despite all the WIP stuff.
I'm a bit concerned by the sleep(n) calls, but this is not a problem right now. IMO later, we could call some HTTP method periodically and limit the number of calls instead.
new_pageserver_config_file_path = new_pageserver_dir / 'pageserver.toml'
# add remote storage mock to pageserver config
Maybe worth a TODO: IMO we would like to enable remote storage for all Python integration tests with one env var/command, so the file-appending approach is rather crude and temporary.
Yeah, I think when your config changes land it will be convenient to support multiple pageservers directly from the zenith CLI.
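For context, here is a minimal sketch of the crude file-appending approach being discussed, assuming a local-filesystem remote storage mock; the `[remote_storage]` section name, `local_path` key, and paths are illustrative assumptions, not necessarily the exact pageserver config keys:

```python
from pathlib import Path

# Hypothetical paths used only for this sketch.
new_pageserver_dir = Path("test_output/new_pageserver")
new_pageserver_dir.mkdir(parents=True, exist_ok=True)
new_pageserver_config_file_path = new_pageserver_dir / "pageserver.toml"
remote_storage_mock_path = new_pageserver_dir / "local_fs_remote_storage"

# Crudely append a remote storage mock section to the generated config --
# exactly the kind of hack a single env var/command could replace.
with new_pageserver_config_file_path.open("a") as config_file:
    config_file.write(
        f"\n[remote_storage]\nlocal_path = '{remote_storage_mock_path}'\n"
    )
```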
@SomeoneToIgnore Thanks for the review!
Yes, I fully agree with that; we can have a utility in our Python tests to wait for a certain predicate to become true, with a timeout on the waiting period. When all the bits are in place I'll remove these sleep calls.
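A minimal sketch of such a waiting utility, assuming the predicate polls some observable pageserver state; the helper name, default values, and the commented usage line are hypothetical:

```python
import time


def wait_until(predicate, timeout_sec: float = 30.0, interval_sec: float = 0.5):
    """Poll `predicate` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval_sec)
    raise TimeoutError(f"condition not met within {timeout_sec}s")


# Instead of a bare sleep(n), the test could wait for an observable condition:
# wait_until(lambda: remote_upload_finished(tenant_id, timeline_id))
```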
#1079 helps with hacks around the remote storage.
Force-pushed from ac3c7d2 to 189924e
I think this patch can be accepted in this form, with the TODOs closed in follow-ups; I linked them to the corresponding issues. @arssher @lubennikovaav please take a look at the safekeeper changes, this mostly touches the callmemaybe stuff. There is also a small document in docs/, so if you have any questions I'll be glad to cover them.
I have not looked thoroughly at the Python code, but LGTM overall
Force-pushed from b2c1ca3 to b10dc00
I've rebased this on top of the shutdown improvements patch and added margins for some exact checks to reduce possible flakiness.
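For illustration, the kind of tolerant check such margins imply; the helper and the 10% default are assumptions, not the test's actual code:

```python
def assert_close(actual: int, expected: int, margin: float = 0.1):
    """Fail only if `actual` deviates from `expected` by more than `margin`."""
    allowed = abs(expected) * margin
    assert abs(actual - expected) <= allowed, (
        f"{actual} differs from {expected} by more than {margin:.0%}"
    )


# e.g. comparing a value observed on the new pageserver under load:
# assert_close(rows_after_migration, rows_before_migration, margin=0.1)
```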
Force-pushed from 443194c to 1d60d95
If I compile the walkeeper package alone, I get an error:
❯ cargo check --package walkeeper --all-targets
Checking walkeeper v0.1.0 (/Users/someonetoignore/Work/zenith/zenith/walkeeper)
error[E0433]: failed to resolve: could not find `select` in `tokio`
--> walkeeper/src/callmemaybe.rs:220:16
|
220 | tokio::select! {
| ^^^^^^ could not find `select` in `tokio`
Feels like tokio is missing a feature there?
Otherwise, the same "LGTM but not sure about Python magic code" impression.
Thanks for the review @SomeoneToIgnore!
It seems that the error appears on main too. I wonder if we should pin the tokio dependency somewhere (in zenith_utils?) with all the needed features. This should simplify things a bit, and AFAIK cargo uses the union of all features needed by a crate and its dependencies, so this shouldn't affect compile times, but it's worth checking.
docs/pageserver-tenant-migration.md (outdated)

### Implementation details

Now safekeeper needs to track which pageserver it is replicating to. This introduces complications into replication code and callmemaybe subscription state handling.
Could you elaborate on the "complications in the replication code"?
I wonder if we need special treatment of the pageserver's feedback in get_replicas_state() on the safekeeper, given that two streams can exist for the same timeline?
By complications I mean the additional handling of the pageserver connection string and the requirement to track which pageserver is actually the primary one, to avoid reconnections to it (this is not currently implemented but is present in the epic's follow-ups section).
given that two streams can exist for the same timeline?
At first glance everything looks OK because we keep a Vec of replicas for a particular timeline. Do you have some invariants in mind that should be checked?
let timelineid = spg.timeline.get().timelineid;
// TODO this expect is ugly, will it be better with node id?
// can we guarantee that this code is not executed during walproposer recovery?
let pageserver_connstr = pageserver_connstr
@lubennikovaav what do you think about this expect call? It was not triggered in tests, so I assume this code path is not followed during walproposer recovery, but can we guarantee somehow that recovery won't touch this?
If the walproposer is in recovery, stop_lsn is set, and this is checked in the code above this line.
Force-pushed from 1d60d95 to 3b8494c
### Implementation details

Now safekeeper needs to track which pageserver it is replicating to. This introduces complications into replication code:
BTW, do you think we can address both of those with neondatabase/rfcs#16?
For the first one, if callmemaybe goes away there is no need to tweak it for these changes. The second one can go away too if the safekeeper doesn't initiate connections (and doesn't reconnect) to the pageserver. As you've answered, this is exactly what the RFC proposes, so yeah, both problems are absent once the RFC is implemented.
Force-pushed from 3b8494c to 17300a0
This patch includes attach/detach HTTP endpoints in the pageserver, some changes in callmemaybe handling inside the safekeeper, and an integration test to check migration with and without load. There are still some rough edges that will be addressed in follow-up patches.
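As a rough sketch of the migration flow these endpoints enable (the URL paths, ports, and tenant id placeholder are assumptions and may not match the exact pageserver HTTP API):

```python
import requests

tenant_id = "0123456789abcdef0123456789abcdef"  # placeholder tenant id
old_pageserver_api = "http://127.0.0.1:9898"    # assumed HTTP addresses
new_pageserver_api = "http://127.0.0.1:9899"

# 1. Ask the new pageserver to attach the tenant, pulling its data from remote storage.
requests.post(f"{new_pageserver_api}/v1/tenant/{tenant_id}/attach").raise_for_status()

# 2. Once the new pageserver has caught up, detach the tenant from the old one,
#    so compute and safekeepers talk only to the new pageserver.
requests.post(f"{old_pageserver_api}/v1/tenant/{tenant_id}/detach").raise_for_status()
```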
Force-pushed from 17300a0 to ae9bc4a
Currently this contains a Python test which makes some assumptions and uses a quite hacky setup with a second pageserver, without zenith CLI support (a sketch of that setup follows the issue list below).
Resolves #896
Resolves #897
Resolves #898
Resolves #900
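For readers curious what that hacky setup looks like, here is a very rough sketch of starting a second pageserver by hand; the binary path, command-line flags, ports, and directory layout are all assumptions for illustration, not the test's actual code:

```python
import shutil
import subprocess
from pathlib import Path

# Give the second pageserver its own working directory and config.
new_pageserver_dir = Path("test_output/new_pageserver")
new_pageserver_dir.mkdir(parents=True, exist_ok=True)
shutil.copy("test_output/pageserver_1/pageserver.toml",
            new_pageserver_dir / "pageserver.toml")

# Launch the second pageserver binary directly, on ports that do not clash
# with the one started through the zenith CLI (flags are assumed, not exact).
new_pageserver = subprocess.Popen([
    "target/debug/pageserver",
    "--workdir", str(new_pageserver_dir),
    "-c", "listen_pg_addr='127.0.0.1:64001'",
    "-c", "listen_http_addr='127.0.0.1:9899'",
])
```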