fix: sync should only happen when both sides are enabled #764

EvanHahn · 2024-08-25T19:56:17Z

Closes #762.

gmaclennan · 2024-08-27T10:50:19Z

I added a commit which I think fixes this. I don't particularly like it, and I feel this code is fragile given that we are doing something hypercore does not support, but I have now written some more robust unit tests for it.

I extracted unreplicate into its own function and added more comprehensive unit tests for the different ways that cores could be unreplicated, e.g. in different orders and with delays. I found that a delay between unreplicating (e.g. if one peer unreplicates after the other) caused errors: when the other side closes the channel, the peer is removed from core.peers, so the channel that needs to be closed can no longer be found.

The solution I found is to iterate through the channels of a protomux to get the channel reference that needs closing, rather than iterating through the peers attached to a core. I'm not sure what the consequences are if there are multiple channels with the same id on a protomux, but I don't think we do that, so I think it's fine. With the unit tests for unreplicate and the sync fuzz tests (thanks @EvanHahn!), I think this is OK, even though I don't like how I'm implementing it!
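To illustrate the approach described above, here is a minimal sketch that closes channels by walking a multiplexer-like iterable instead of a core's peer list. It is not the code from this PR: `closeMatchingChannels`, `makeFakeChannel`, the `'example/protocol'` name, and the channel shape (`protocol`, `id` as a Buffer, `close()`) are all assumptions made for illustration, and a real protomux instance may behave differently.

```javascript
// Sketch only: `mux` is any iterable of channel-like objects that expose
// `protocol`, `id` (a Buffer) and `close()`. These shapes are assumptions.

/**
 * Close every channel on `mux` that matches the given protocol and id.
 * Returns the number of channels that were closed.
 */
function closeMatchingChannels(mux, protocol, id) {
  let closedCount = 0
  // Copy first, in case close() mutates the underlying collection.
  for (const channel of [...mux]) {
    if (channel.protocol !== protocol) continue
    if (!channel.id.equals(id)) continue
    channel.close()
    closedCount++
  }
  return closedCount
}

// Hypothetical fake channel, used only to exercise the helper.
function makeFakeChannel(protocol, id) {
  return {
    protocol,
    id,
    closed: false,
    close() {
      this.closed = true
    },
  }
}

const exampleKey = Buffer.from('aa', 'hex')
const fakeMux = [
  makeFakeChannel('example/protocol', exampleKey),
  makeFakeChannel('example/protocol', Buffer.from('bb', 'hex')),
  makeFakeChannel('other/protocol', exampleKey),
]

// Only the first fake channel matches both the protocol and the id.
closeMatchingChannels(fakeMux, 'example/protocol', exampleKey)
```

Because the lookup does not depend on the peer list, it still finds the local channel after the remote side has closed first. If several channels shared the same protocol and id, this sketch would close all of them.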

gmaclennan · 2024-08-27T10:56:48Z

Correction: not yet fixed! The fuzz tests are still showing errors, with re-sync not working as expected. I am unable to reproduce this locally. I wonder if there is a way we can make the fuzz tests reproducible?
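One common way to make a fuzz run repeatable is to draw every random choice from a seeded generator and log the seed. The sketch below assumes that approach and is not taken from this PR: `generateActions`, the action shape, and the `FUZZ_SEED` environment variable are hypothetical, and `mulberry32` is a small, well-known 32-bit PRNG used here as a stand-in for `Math.random()`.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed | 0
  return function next() {
    state = (state + 0x6d2b79f5) | 0
    let t = Math.imul(state ^ (state >>> 15), state | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

const ACTIONS = ['start', 'stop']

/**
 * Build a deterministic list of fuzz actions from a seed.
 * The same seed always yields the same list.
 */
function generateActions(seed, count) {
  const random = mulberry32(seed)
  const actions = []
  for (let i = 0; i < count; i++) {
    const peer = random() < 0.5 ? 'a' : 'b'
    const action = ACTIONS[Math.floor(random() * ACTIONS.length)]
    const delayMs = Math.floor(random() * 10)
    actions.push({ peer, action, delayMs })
  }
  return actions
}

// FUZZ_SEED is a hypothetical variable name: set it to replay a failing run.
const fuzzSeed = Number(process.env.FUZZ_SEED ?? (Date.now() % (2 ** 31)))
console.log(`fuzz seed: ${fuzzSeed}`)
const plannedActions = generateActions(fuzzSeed, 20)
```

A seed only fixes the sequence of actions. Failures that depend on timing, such as the ordering of asynchronous channel events, may still not reproduce from the seed alone.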

gmaclennan · 2024-08-27T12:39:48Z

The issue could be caused by edge cases (race conditions) around starting and stopping sync in the same tick. Opening a protomux channel (which happens when replicating a core) is asynchronous, and I'm not sure what happens when we close a protomux channel in the same tick in which it is opening. Our sync API's start() should probably be async and implement start-stop-state-machine, so that stop() waits for start() to finish before attempting to stop.
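A rough sketch of the "stop waits for start" idea follows. It is a hand-rolled illustration, not the start-stop-state-machine package's API and not the sync API from this repository: the `SerializedStartStop` class and its `onStart`/`onStop` hooks are hypothetical names.

```javascript
// Sketch: serialize async start/stop so they never overlap.
class SerializedStartStop {
  #state = 'stopped'
  #queue = Promise.resolve()
  #onStart
  #onStop

  constructor({ onStart, onStop }) {
    this.#onStart = onStart
    this.#onStop = onStop
  }

  get state() {
    return this.#state
  }

  // Resolves once the instance is started. No-op if already started.
  start() {
    return this.#enqueue(async () => {
      if (this.#state === 'started') return
      await this.#onStart()
      this.#state = 'started'
    })
  }

  // Resolves once the instance is stopped. Runs only after any pending start.
  stop() {
    return this.#enqueue(async () => {
      if (this.#state === 'stopped') return
      await this.#onStop()
      this.#state = 'stopped'
    })
  }

  // Chain each task after the previous one, whether it succeeded or failed.
  #enqueue(task) {
    const result = this.#queue.then(task)
    this.#queue = result.catch(() => {})
    return result
  }
}
```

With this shape, calling `start()` and then `stop()` in the same tick runs `onStop` only after `onStart` has resolved, so a stop never lands on a channel that is still opening. The trade-off is that both methods become asynchronous for callers.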

EvanHahn · 2024-08-27T19:14:57Z

> The issue could be caused by edge cases (race conditions) around starting and stopping sync in the same tick. Opening a protomux channel (which happens when replicating a core) is asynchronous, and I'm not sure what happens when we close a protomux channel in the same tick in which it is opening. Our sync API's start() should probably be async and implement start-stop-state-machine, so that stop() waits for start() to finish before attempting to stop.

Filed #786.

EvanHahn · 2024-08-27T20:20:52Z

Gregor and I paired on this so I will merge without additional review.

EvanHahn and others added 14 commits August 25, 2024 18:04

Add failing test

4ef0871

Fix test (?)

866e2ce

WIP: messily trying to fix other tests

d2eb629

Tidying

713e97c

Test cleanup

12683a8

No .only

cb05512

Basic, buggy fuzzer

c3193b5

Organize imports

8b21536

Fuzzer cleanup

fbad430

Trade one failure for another

46a833f

Merge branch 'main' into 2024-08-22-sync-bug-exploration

128dc57

Merge branch 'main' into 2024-08-22-sync-bug-exploration

bb48eb8

Fuzz cleanup

389f0b4

extract & fix unreplicate function

99063d7

gmaclennan and others added 13 commits August 27, 2024 13:43

test adding 10ms wait after sync start/stop in fuzz tests

ead73a4

Merge branch 'main' into 2024-08-22-sync-bug-exploration

e38d99e

Ready for Evan to clean up

2ec8936

Merge branch 'main' into 2024-08-22-sync-bug-exploration

7aeb3ad

Remove why-is-node-running commented-out import

47a9fe3

sync fuzz: use delay

8db6106

Sync fuzz: parallelize, wait for expectation rather than "finish"

afbc1d7

Sync fuzz test cleanup

2dd3813

More sync fuzz test cleanup

f78bb01

More sync fuzz test cleanup

f3ad1ee

Minor: use single quotes for type import

ee49924

Revert a type rename

014cb8b

Merge branch 'main' into 2024-08-22-sync-bug-exploration

6c15b21

EvanHahn added 4 commits August 27, 2024 18:49

Remove a TODO we've now addressed

e2d4647

Merge branch 'main' into 2024-08-22-sync-bug-exploration

a5ecb06

Remove another addressed TODO

34ec93f

Revert an unused type change

fb4860b

EvanHahn mentioned this pull request Aug 27, 2024

Make SyncApi.prototype.start and .stop asynchronous #786

Open

EvanHahn added 4 commits August 27, 2024 19:22

Use shared utilities

2926762

Assert mins and maxes are correct

c68f3f2

Wait for ready before unreplicating

47648ae

Fuzz tests should do subtests properly

b74fd2e

EvanHahn marked this pull request as ready for review August 27, 2024 20:20

EvanHahn merged commit b61e132 into main Aug 27, 2024
7 checks passed

EvanHahn deleted the 2024-08-22-sync-bug-exploration branch August 27, 2024 20:21
