feat!: expose device-specific sync state #757

EvanHahn · 2024-08-20T21:41:21Z

myProject.$sync.getState() and the sync-state event now return objects like this:

{
  initial: { isSyncEnabled: true },
  data: { isSyncEnabled: true },
  remoteDeviceSyncState: {
    abc123: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 2, wanted: 0 },
    },
    def456: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 0, wanted: 1 },
    },
  },
}

In English, this means:

We now have sync information for individual peers, instead of aggregated info.
want and wanted have slightly different meaning than before. want is the number of blocks a peer wants from us, and wanted is the number of blocks we want from a specific peer.

For example, imagine peers A, B, and C are connected and syncing. Peer A has observation 1, B has observation 2 and 3, and C has nothing. Each of these peers has a relationship:
- From A's perspective...
  - B wants 1 observation from us and has 2 observations it knows we want.
  - C wants 1 observation from us.
- From B's perspective... - A wants 2 observations from us and has 1 observation is knows we want. - C wants 2 observations from us.
- From C's perspective...
  - A has 1 observation is knows we want, and wants nothing from us.
  - A has 2 observations is knows we want, and wants nothing from us.
Some data should now be inferred. For example, the number of peers is now the number of keys in remoteDeviceSyncState, not a calculated value.
have and missing were removed from remote states, because they are unused.

Closes #743.

`myProject.$sync.getState()` and the `sync-state` event now return objects like this: ```javascript { initial: { isSyncEnabled: true }, data: { isSyncEnabled: true }, remoteDeviceSyncState: { abc123: { initial: { isSyncEnabled: true, want: 0, wanted: 0 }, data: { isSyncEnabled: true, want: 2, wanted: 0 }, }, def456: { initial: { isSyncEnabled: true, want: 0, wanted: 0 }, data: { isSyncEnabled: true, want: 0, wanted: 1 }, }, }, } ``` In English, this means: - We now have sync information for individual peers, instead of aggregated info. - `want` and `wanted` have slightly different meaning than before. `want` is the number of blocks a peer wants from us, and `wanted` is the number of blocks we want from a specific peer. For example, imagine peers A, B, and C are connected and syncing. Peer A has observation 1, B has observation 2 and 3, and C has nothing. Each of these peers has a relationship: - From A's perspective... - B wants 1 observation from us and has 2 observations it knows we want. - C wants 1 observation from us. - From B's perspective... - A wants 2 observations from us and has 1 observation is knows we want. - C wants 2 observations from us. - From C's perspective... - A has 1 observation is knows we want, and wants nothing from us. - A has 2 observations is knows we want, and wants nothing from us. - Some data should now be inferred. For example, the number of peers is now the number of keys in `remoteDeviceSyncState`, not a calculated value. - `have` and `missing` were removed from remote states, because they are unused. Closes [#743]. [#743]: #743 Co-authored-by: Tomás Ciccola <[email protected]>

EvanHahn · 2024-08-20T21:48:10Z

package.json

@@ -128,7 +128,6 @@
    "cpy-cli": "^5.0.0",
    "drizzle-kit": "^0.20.14",
    "eslint": "^8.57.0",
-    "filter-obj": "^6.0.0",


No longer used.

EvanHahn · 2024-08-20T21:48:22Z

src/sync/peer-sync-controller.js

-export const DATA_NAMESPACES = NAMESPACES.filter(
-  (ns) => !PRESYNC_NAMESPACES.includes(ns)
-)


No longer used.

tomasciccola · 2024-08-20T23:21:31Z

tests/sync/core-sync-state.js

ufff, fixing all of this 😰

tomasciccola

Awesomee 🚀

gmaclennan · 2024-08-22T09:16:52Z

@EvanHahn unfortunately this won't give us quite the information we want for sync progress. wants and wanted don't really make sense unless they are aggregated, due to the way sync works in a p2p system. E.g. if the device is connected to 5 peers, and they all want the block at index 1 of a specific hypercore, this does not mean that 5 blocks are sent. Like with BitTorrent, blocks are shared as soon as they are received, so a device could only send one block, and this is shared amongst the 5 connected peers themselves. A similar scenario applies with wants: we could want just one block, and all 5 connected peers have it. When we are not sharing the aggregated sync state, there is no way of distinguishing between wanting the same block from multiple peers and wanting different blocks from all peers. This means that sync state can be wildly inaccurate and/or unreliable, especially when connecting to multiple peers.

I'm also not sure how you are handling missing blocks here? I know they weren't used by the front-end, but I thought they were necessary so that missing blocks are not included in the wants or wanted counts. This was one of the main bugs that we had with sync with previous Mapeo versions - peers never completed sync because they were waiting for blocks which were missing from all connected peers.

I think the way to solve the front-end requirement for sync state for both syncing and non-syncing peers is to aggregate syncing and non-syncing cores separately, and to pass the aggregated values as the sync state, with the same wanted, wants, haves and missing as before, but two separate values, one for all peers, and one for syncing peers.

gmaclennan · 2024-08-22T16:04:02Z

ok I've been able to review this in more detail. I think dropping missing is fine - the code looks like it handles that correctly. The issue remains about counting the same block multiple times, but I think that is a minor issue, which just affects the estimates for sync progress - sync completion should be accurate whatever. One non-urgent change to do later is to consider doing array indexed access instead of .entries() when iterating over peers, because this code path runs multiple times, and using .entries() involves creating lots of additional arrays that then need garbage collected. I vaguely remember running some benchmarks to notice the difference - I don't think performance is hugely different, but I think memory and GC is different. I wouldn't worry about this if it were not for large projects with hundreds of blocks in each core, and multiple sync events, this code is run lots of times.

EvanHahn requested a review from tomasciccola August 20, 2024 21:44

This was referenced Aug 20, 2024

Update tests to (maybe?) expose bug in sync state #754

Closed

feat!: expose device specific sync state #753

Closed

EvanHahn commented Aug 20, 2024

View reviewed changes

tomasciccola reviewed Aug 20, 2024

View reviewed changes

tests/sync/core-sync-state.js

Copy link

Contributor

tomasciccola Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ufff, fixing all of this 😰

tomasciccola approved these changes Aug 21, 2024

View reviewed changes

EvanHahn merged commit 2b36edd into main Aug 21, 2024
7 checks passed

EvanHahn deleted the device-specific-sync-state branch August 21, 2024 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: expose device-specific sync state #757

feat!: expose device-specific sync state #757

EvanHahn commented Aug 20, 2024

EvanHahn Aug 20, 2024

EvanHahn Aug 20, 2024

tomasciccola Aug 20, 2024

tomasciccola left a comment

gmaclennan commented Aug 22, 2024

gmaclennan commented Aug 22, 2024

feat!: expose device-specific sync state #757

feat!: expose device-specific sync state #757

Conversation

EvanHahn commented Aug 20, 2024

EvanHahn Aug 20, 2024

Choose a reason for hiding this comment

EvanHahn Aug 20, 2024

Choose a reason for hiding this comment

tomasciccola Aug 20, 2024

Choose a reason for hiding this comment

tomasciccola left a comment

Choose a reason for hiding this comment

gmaclennan commented Aug 22, 2024

gmaclennan commented Aug 22, 2024