Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync can unexpectedly occur with devices that have not enabled it #762

Closed
achou11 opened this issue Aug 22, 2024 · 3 comments · Fixed by #764
Closed

Sync can unexpectedly occur with devices that have not enabled it #762

achou11 opened this issue Aug 22, 2024 · 3 comments · Fixed by #764
Assignees
Labels
bug Something isn't working

Comments

@achou11
Copy link
Member

achou11 commented Aug 22, 2024

Description

I'm running into a situation where I can get data sync to occur between two devices even though only one of them has data sync enabled. Using myself and Evan as an example, here are the reproduction steps:

  1. Andrew invites Evan to a project as a participant.
  2. Andrew and Evan collect some data independently.
  3. Andrew and Evan go to the sync screen and both press the button to enable data sync.
  4. Andrew and Evan leave the sync screen, which disables data sync in this case. EDIT: in this case, it's important that Andrew leaves first (see Sync can unexpectedly occur with devices that have not enabled it #762 (comment)). And that's how I've been doing this in practice
  5. Andrew or Evan (or both) collect more data independently.
  6. Andrew goes to the sync screen and presses the button to enable sync. Notice that syncing starts between Andrew and Evan, even though Evan hasn't pressed the button to enable sync again.

The expectation here is that in 6, Andrew should be in some pending state, waiting for any other connected devices (in this case, Evan) to also enable sync. Instead, sync immediately happens without Evan opting in.

Important notes:

  • In step 1, not sure if it matters what role is chosen for Evan.

    • I've only tried this when Evan is a participant, but might be worth trying to reproduce the issue for when Evan is added as a coordinator. EDIT: being a coordinator did not make a difference.
  • Steps 2,3, and 4 are important. If we do this without those steps, the issue does not seem to occur. In technical terms, it seems like a prior data sync has to have happened before encountering this issue.

  • In step 6, it's vital that Andrew enables sync again. If it were Evan that did that instead, this issue does not occur.

    • This suggests that something about being the project creator somehow bypasses the expected sync "etiquette". EDIT: This is probably wrong, see Sync can unexpectedly occur with devices that have not enabled it #762 (comment) for a more likely explanation
    • Going back to the first note, wondering if the issue could be reproduced if Evan were a coordinator and re-enabled sync instead of Andrew. EDIT: being a coordinator did not make a difference

Tasks

  • [ ]
@achou11 achou11 added the bug Something isn't working label Aug 22, 2024
@gmaclennan
Copy link
Member

Took a quick look. There's a possibility that here: https://github.com/digidem/mapeo-core-next/blob/72709284c9f37a2559e28b80d0ebec5e7726af49/src/sync/peer-sync-controller.js#L266 whichever side closes first will result in the peer being removed from core.peers for the other side, but maybe the protomux channel remains open, and because the other peer doesn't find it, then it skips closing the protomux channel because it doesn't find the peer in core.peers. If this is the case, the issue would depend on who leaves the sync screen (e.g. stops sync) first. Could you maybe check that Andrew?

@achou11
Copy link
Member Author

achou11 commented Aug 22, 2024

Took a quick look. There's a possibility that here:

https://github.com/digidem/mapeo-core-next/blob/72709284c9f37a2559e28b80d0ebec5e7726af49/src/sync/peer-sync-controller.js#L266
whichever side closes first will result in the peer being removed from core.peers for the other side, but maybe the protomux channel remains open, and because the other peer doesn't find it, then it skips closing the protomux channel because it doesn't find the peer in core.peers. If this is the case, the issue would depend on who leaves the sync screen (e.g. stops sync) first. Could you maybe check that Andrew?

ah very interesting. so, i can confirm that there's different behavior based on who stopped sync first. It seems like whomever stops sync first is also the one who will cause the bug to occur when they re-enable sync. For example, after step 3 in the original description:

  1. Evan leaves the sync screen first, then Andrew.
  2. Evan collects more data
  3. Evan and Andrew return to the sync screen.
  4. Andrew presses start sync button. Nothing happens, which is expected. Andrew presses stop sync button.
  5. Evan presses start sync button. Sync occurs unexpectedly!

Does this align with what you'd suspect?

@EvanHahn EvanHahn self-assigned this Aug 22, 2024
@EvanHahn
Copy link
Contributor

Working on this now. See this work-in-progress branch, which currently just has a failing test and some TODOs.

EvanHahn added a commit that referenced this issue Aug 27, 2024
This change:

- Fixes core unreplication, closing issue [#762].

- Adds "fuzz testing" for sync, which tries a bunch of random actions
  and makes sure things work as expected.

[#762]: #762

Co-Authored-By: Gregor MacLennan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants