-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bucket priorities #192
base: main
Are you sure you want to change the base?
Bucket priorities #192
Conversation
🦋 Changeset detectedLatest commit: d7065b1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@@ -0,0 +1,5 @@ | |||
--- | |||
'@powersync/service-sync-rules': minor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've set this to minor because breaking bumps it to 1.0.0 - there are breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall! Added some minor comments on the implementation.
If I understand the implementation correctly, it will effectively change nothing if bucket priorities are not specified, which is great.
For specifying bucket priorities, could you also add support for this syntax?
bucket_definitions:
bulk_data:
priority: 2
parameters: select id as project_id from projects where ...
I think this would be simpler for the majority of cases, leaving the as _priority
form only for cases where developers need more control.
let firstBucketInSamePriority = 0; | ||
let currentPriority = bucketsToFetch[0]!.priority; | ||
const lowestPriority = bucketsToFetch.at(-1)!.priority; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't double-check, but I think bucketsToFetch
could be empty, which would break this.
for (let i = 0; i < bucketsToFetch.length; i++) { | ||
if (bucketsToFetch[i].priority == currentPriority) { | ||
continue; | ||
} | ||
|
||
// This incrementally updates dataBuckets with each individual bucket position. | ||
// At the end of this, we can be sure that all buckets have data up to the checkpoint. | ||
yield* bucketDataWithPriority(i); | ||
firstBucketInSamePriority = i; | ||
currentPriority = bucketsToFetch[i].priority; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic could perhaps be more clear if you group by priority first.
Map.groupBy
would be great for that, but would need an upgrade to Node 22 first.
After looking at the consistency implications in the client implementation, I think it would be better to make the default priority 3 instead of 1. If the client gets a partial checkpoint, we only sync PUT operations, ignoring REMOVEs, which changes the consistency properties a little. I think that's a decent trade-off in many cases, but the developer needs to be aware of it. Now imagine a developer starts with only having a priority 1 bucket (the default). Now they want to sync some bulk data as well, so they add a new bucket with priority 2. The issue is that this now affected the consistency properties of bucket 1, despite not touching it in the sync rules. While if we instead have the default as priority 3 (the lowest), it means the developer will have to explicitly modify the original bucket to assign a higher priority, making the implications more obvious. Thoughts? |
I agree that making the lowest priority the default is a good choice to make sure changing the priority of some buckets doesn't affect unrelated buckets 👍 |
b8b7836
to
7b1fcde
Compare
This implements the bucket priorities proposal. When buckets with different priorities are involved:
Priorities are currently declared with a static literal column named
_priority
in a data query, e.g.At the moment, the implementation simply groups buckets into their priorities and then synchronizes each priority in a batch (instead of using a batch for all buckets like before).
An interesting improvement might be to interrupt lower-priority sync work when higher-priorities have new data. For instance, in a flow like:
todos
to send to client, immediately send partial complete message for priority 0.lists
rows.todos
comes in.Here, we might want to stop sending rows for 4 (it's not that they're lost, the client has received the operations and we can later resume from that state) so that we can send the new checkpoint first.