Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: expose device-specific sync state #757

Merged
merged 1 commit into from
Aug 21, 2024
Merged

Conversation

EvanHahn
Copy link
Contributor

myProject.$sync.getState() and the sync-state event now return objects like this:

{
  initial: { isSyncEnabled: true },
  data: { isSyncEnabled: true },
  remoteDeviceSyncState: {
    abc123: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 2, wanted: 0 },
    },
    def456: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 0, wanted: 1 },
    },
  },
}

In English, this means:

  • We now have sync information for individual peers, instead of aggregated info.

  • want and wanted have slightly different meaning than before. want is the number of blocks a peer wants from us, and wanted is the number of blocks we want from a specific peer.

    For example, imagine peers A, B, and C are connected and syncing. Peer A has observation 1, B has observation 2 and 3, and C has nothing. Each of these peers has a relationship:

    • From A's perspective...
      • B wants 1 observation from us and has 2 observations it knows we want.
      • C wants 1 observation from us.
    • From B's perspective... - A wants 2 observations from us and has 1 observation is knows we want. - C wants 2 observations from us.
    • From C's perspective...
      • A has 1 observation is knows we want, and wants nothing from us.
      • A has 2 observations is knows we want, and wants nothing from us.
  • Some data should now be inferred. For example, the number of peers is now the number of keys in remoteDeviceSyncState, not a calculated value.

  • have and missing were removed from remote states, because they are unused.

Closes #743.

`myProject.$sync.getState()` and the `sync-state` event now return
objects like this:

```javascript
{
  initial: { isSyncEnabled: true },
  data: { isSyncEnabled: true },
  remoteDeviceSyncState: {
    abc123: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 2, wanted: 0 },
    },
    def456: {
      initial: { isSyncEnabled: true, want: 0, wanted: 0 },
      data: { isSyncEnabled: true, want: 0, wanted: 1 },
    },
  },
}
```

In English, this means:

- We now have sync information for individual peers, instead of
  aggregated info.

- `want` and `wanted` have slightly different meaning than before.
  `want` is the number of blocks a peer wants from us, and `wanted` is
  the number of blocks we want from a specific peer.

  For example, imagine peers A, B, and C are connected and syncing. Peer
  A has observation 1, B has observation 2 and 3, and C has nothing.
  Each of these peers has a relationship:

  - From A's perspective...
    - B wants 1 observation from us and has 2 observations it knows we
      want.
    - C wants 1 observation from us.
  - From B's perspective...
    - A wants 2 observations from us and has 1 observation is knows we
      want.
    - C wants 2 observations from us.
  - From C's perspective...
    - A has 1 observation is knows we want, and wants nothing from us.
    - A has 2 observations is knows we want, and wants nothing from us.

- Some data should now be inferred. For example, the number of peers is
  now the number of keys in `remoteDeviceSyncState`, not a calculated
  value.

- `have` and `missing` were removed from remote states, because they are
  unused.

Closes [#743].

[#743]: #743

Co-authored-by: Tomás Ciccola <[email protected]>
@@ -128,7 +128,6 @@
"cpy-cli": "^5.0.0",
"drizzle-kit": "^0.20.14",
"eslint": "^8.57.0",
"filter-obj": "^6.0.0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used.

Comment on lines -15 to -17
export const DATA_NAMESPACES = NAMESPACES.filter(
(ns) => !PRESYNC_NAMESPACES.includes(ns)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ufff, fixing all of this 😰

Copy link
Contributor

@tomasciccola tomasciccola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesomee 🚀

@EvanHahn EvanHahn merged commit 2b36edd into main Aug 21, 2024
7 checks passed
@EvanHahn EvanHahn deleted the device-specific-sync-state branch August 21, 2024 17:53
@gmaclennan
Copy link
Member

@EvanHahn unfortunately this won't give us quite the information we want for sync progress. wants and wanted don't really make sense unless they are aggregated, due to the way sync works in a p2p system. E.g. if the device is connected to 5 peers, and they all want the block at index 1 of a specific hypercore, this does not mean that 5 blocks are sent. Like with BitTorrent, blocks are shared as soon as they are received, so a device could only send one block, and this is shared amongst the 5 connected peers themselves. A similar scenario applies with wants: we could want just one block, and all 5 connected peers have it. When we are not sharing the aggregated sync state, there is no way of distinguishing between wanting the same block from multiple peers and wanting different blocks from all peers. This means that sync state can be wildly inaccurate and/or unreliable, especially when connecting to multiple peers.

I'm also not sure how you are handling missing blocks here? I know they weren't used by the front-end, but I thought they were necessary so that missing blocks are not included in the wants or wanted counts. This was one of the main bugs that we had with sync with previous Mapeo versions - peers never completed sync because they were waiting for blocks which were missing from all connected peers.

I think the way to solve the front-end requirement for sync state for both syncing and non-syncing peers is to aggregate syncing and non-syncing cores separately, and to pass the aggregated values as the sync state, with the same wanted, wants, haves and missing as before, but two separate values, one for all peers, and one for syncing peers.

@gmaclennan
Copy link
Member

ok I've been able to review this in more detail. I think dropping missing is fine - the code looks like it handles that correctly. The issue remains about counting the same block multiple times, but I think that is a minor issue, which just affects the estimates for sync progress - sync completion should be accurate whatever. One non-urgent change to do later is to consider doing array indexed access instead of .entries() when iterating over peers, because this code path runs multiple times, and using .entries() involves creating lots of additional arrays that then need garbage collected. I vaguely remember running some benchmarks to notice the difference - I don't think performance is hugely different, but I think memory and GC is different. I wouldn't worry about this if it were not for large projects with hundreds of blocks in each core, and multiple sync events, this code is run lots of times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose sync state of syncing peers
3 participants