Support buckets with different priorities #55

Open
simolus3 wants to merge 11 commits into base: main
Conversation

simolus3

No description provided.

@simolus3 (Author) left a comment

Adding some of my own comments for a discussion on this.

Review threads on:
crates/core/src/operations.rs
crates/core/src/sync_local.rs (multiple threads)
crates/core/src/view_admin.rs
crates/core/src/bucket_priority.rs
crates/core/src/checkpoint.rs
@rkistner (Contributor) commented on Feb 4, 2025

After checking the implementation, I realized that a row being present in multiple buckets with different priorities has a lot of potential for edge cases - both in the spec and in this specific implementation. This is particularly relevant for the r.bucket IN (SELECT id FROM involved_buckets) clause, for example.
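
For context, that clause scopes a partial sync roughly like this (a simplified sketch; the ps_buckets/ps_oplog names and the priority column are assumptions for illustration, not the exact schema):

```sql
-- Simplified sketch: limit a partial sync to buckets at or above the
-- completed priority level (schema names assumed for illustration).
WITH involved_buckets (id) AS (
  SELECT id FROM ps_buckets
  WHERE priority <= 1 -- priority level of the partial checkpoint
)
SELECT r.row_type, r.row_id, r.data
FROM ps_oplog r
WHERE r.bucket IN (SELECT id FROM involved_buckets);
```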

A potential use case could be one where a user has a lot of data and "stars" specific items to prioritize syncing them.

For these examples, suppose we have two buckets: bucket1 and bucket2, with priorities 1 and 2 respectively. The same row could be in either or both of the buckets.
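
As a concrete setup for the cases below (hypothetical local state; table and column names assumed):

```sql
-- Hypothetical bucket metadata for the examples below (schema assumed):
INSERT INTO ps_buckets (id, name, priority) VALUES
  (1, 'bucket1', 1), -- higher priority, completes first
  (2, 'bucket2', 2); -- lower priority, completes with the full checkpoint
```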

Current state

Case 1: A row is present in bucket2, then added to bucket1. We get a partial_checkpoint_complete for bucket1. The row will be included here, getting the latest version. ✅

Case 2: A row is removed from bucket2, and added to bucket1 at the same time ("starring" a row to move it to a higher priority). We get a partial_checkpoint_complete for bucket1. The row will be included here, and whether or not we got the REMOVE on bucket2 already doesn't make a difference. When we sync the rest of the checkpoint, the updated row will stay present. ✅

Case 3: A row is removed from bucket1, and added to bucket2 at the same time (removing a star to move to lower priority). We get a partial_checkpoint_complete for bucket1. We don't track removes per bucket, so this does nothing. ✅ When we sync the rest of the checkpoint, the row will be added again. ✅

Case 4: A row is present in bucket1 and bucket2, then removed from bucket1. We get a partial_checkpoint_complete for bucket1. We don't track removes per bucket, so this does nothing. ✅ When we sync the rest of the checkpoint, the row is updated with the state from bucket2. ✅
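
So under the current behavior, deletes effectively only happen at the full checkpoint, and only for rows that no bucket still has a PUT for - roughly like this (a sketch, with assumed names):

```sql
-- Sketch of the current full-checkpoint delete semantics (names assumed):
-- a row is only removed once no bucket in the checkpoint still has data
-- for it, which is why Cases 3 and 4 end in a consistent state.
DELETE FROM items
WHERE NOT EXISTS (
  SELECT 1 FROM ps_oplog o
  WHERE o.row_type = 'items' AND o.row_id = items.id
);
```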

Hypothetical: if we tracked REMOVE operations per bucket

Case 1: A row is present in bucket2, then added to bucket1. We get a partial_checkpoint_complete for bucket1. The row will be included here, getting the latest version. ✅

Case 2: A row is removed from bucket2, and added to bucket1 at the same time ("starring" a row to move it to a higher priority). We get a partial_checkpoint_complete for bucket1. The row will be included here, and whether or not we got the REMOVE on bucket2 already doesn't make a difference. When we sync the rest of the checkpoint, the updated row will stay present. ✅

Case 3: A row is removed from bucket1, and added to bucket2 at the same time (removing a star to move it to a lower priority). We get a partial_checkpoint_complete for bucket1. The row will be removed here, whether or not we got the PUT for bucket2. This works "according to the spec", but I'm not sure whether this is the desired behavior. ❓ When we sync the rest of the checkpoint, the row will be added again. ✅

Case 4: A row is present in bucket1 and bucket2, then removed from bucket1. We get a partial_checkpoint_complete for bucket1. The row will be removed here, despite it still being present in bucket2. ❓ When we sync the rest of the checkpoint, the row will not be added again, resulting in an inconsistency. ❌
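
To make Case 4 concrete: tracking REMOVEs per bucket would amount to something like the following at the bucket1 partial checkpoint (a hypothetical sketch; names assumed):

```sql
-- Hypothetical per-bucket REMOVE handling (names assumed). In Case 4 this
-- deletes the row at the bucket1 partial checkpoint even though bucket2
-- still has a PUT for it; since bucket2's data was already synced, the
-- rest of the checkpoint has no new operation to re-add the row.
DELETE FROM items
WHERE id IN (
  SELECT o.row_id
  FROM ps_oplog o
  JOIN ps_buckets b ON b.id = o.bucket
  WHERE b.name = 'bucket1' AND o.op = 'REMOVE'
);
```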

My summary from this is that adding support for REMOVE operations in partial checkpoints does not actually give the improved consistency I hoped for - it just creates more weird edge cases. The current behavior of only applying REMOVE operations in the final checkpoint gives better results.

@simolus3 marked this pull request as ready for review on February 10, 2025 at 15:24