
Browser-to-server libp2p reliability #2529

Closed
raykyri opened this issue May 6, 2024 · 5 comments
Labels
need/triage Needs initial labeling and prioritization

Comments


raykyri commented May 6, 2024

We've been running a browser-to-server libp2p mesh for chat applications at https://play.skystrife.xyz. It uses gossipsub to distribute messages and our own service, based on GossipLog and a Prolly tree, to sync past messages. We monitor logs and Prometheus metrics, and run separate instances that spin up libp2p nodes and connect to our mesh to perform health checks.

Since last week, there have been tens of players online at the same time (occasionally even 100+). We've noticed reliability issues even at smaller scales: libp2p server nodes randomly stop accepting messages, or stop listening on their port after a few hours. The cause isn't an OOM or anything else readily apparent from libp2p:*:error logging.

[screenshot attached]

What's the state of reliability for browser-to-server libp2p right now? We're considering switching to a separate websocket service and using libp2p exclusively for server-to-server sync, since it's unclear how others have deployed this stack in a browser environment.

We are currently on:

    "@libp2p/bootstrap": "^10.0.7",
    "@libp2p/fetch": "^1.0.5",
    "@libp2p/identify": "^1.0.6",
    "@libp2p/interface": "^1.0.2",
    "@libp2p/logger": "^4.0.2",
    "@libp2p/mplex": "^10.0.7",
    "@libp2p/peer-id": "^4.0.2",
    "@libp2p/peer-id-factory": "^4.0.1",
    "@libp2p/ping": "^1.0.6",
    "@libp2p/prometheus-metrics": "^3.0.7",
    "@libp2p/utils": "^5.0.3",
    "@libp2p/websockets": "^8.0.7",
  • Platform: Linux d891112f4e1d68 5.15.98-fly #gfb1e5e454b SMP Tue Apr 23 17:55:23 UTC 2024 x86_64 GNU/Linux
  • Subsystem: Primarily dialer (addresses disappearing), gossipsub/relay, and others.
  • Severity: High
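
For reference, here is a rough sketch of the kind of server-side node configuration this dependency list implies. The gossipsub router (@chainsafe/libp2p-gossipsub) and noise encryption (@chainsafe/libp2p-noise) aren't in the list and are assumed here; the listen address and topic name are placeholders.

```ts
// Minimal sketch of a websocket-listening server node based on the packages
// listed above. @chainsafe/libp2p-noise and @chainsafe/libp2p-gossipsub are
// assumptions; addresses and topic names are illustrative only.
import { createLibp2p } from 'libp2p'
import { webSockets } from '@libp2p/websockets'
import { mplex } from '@libp2p/mplex'
import { noise } from '@chainsafe/libp2p-noise'
import { identify } from '@libp2p/identify'
import { ping } from '@libp2p/ping'
import { gossipsub } from '@chainsafe/libp2p-gossipsub'
import { prometheusMetrics } from '@libp2p/prometheus-metrics'

const node = await createLibp2p({
  addresses: {
    listen: ['/ip4/0.0.0.0/tcp/8080/ws'] // placeholder listen address
  },
  transports: [webSockets()],
  streamMuxers: [mplex()],
  connectionEncryption: [noise()],
  metrics: prometheusMetrics(),
  services: {
    identify: identify(),
    ping: ping(),
    pubsub: gossipsub()
  }
})

// Browser peers dial the /ws (or /wss behind TLS) address and join the same topic.
node.services.pubsub.subscribe('chat') // placeholder topic
```
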
raykyri added the need/triage label May 6, 2024

abuvanth commented May 9, 2024

Check this silkroadnomad/libp2p-relay#3


raykyri commented May 12, 2024

Manually patching the autodialer retry threshold solved most of our problems. Someone else caught this last week and a fix is already on main: 767b23e. (We're mostly running with gossipsub penalties off, so no issues there. Tuning how many peers are grafted helped only marginally.)
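
In case it helps others, here's a rough sketch of the kind of gossipsub tuning mentioned above, assuming the @chainsafe/libp2p-gossipsub options D/Dlo/Dhi (mesh degree, i.e. how many peers get grafted) and scoreThresholds (pushed far negative to effectively turn penalties off). The numbers are illustrative, not recommendations, and the autodial retry threshold we patched is a separate js-libp2p connection-manager setting that isn't shown here.

```ts
// Illustrative gossipsub options (assumed @chainsafe/libp2p-gossipsub API);
// numbers are examples, not recommendations.
import { gossipsub } from '@chainsafe/libp2p-gossipsub'

const pubsub = gossipsub({
  // Mesh degree: how many peers the router keeps grafted per topic.
  D: 6,    // target mesh size
  Dlo: 4,  // graft more peers when the mesh falls below this
  Dhi: 12, // prune peers when the mesh grows above this
  // Very low thresholds effectively disable score-based penalties
  // (gossip suppression, publish filtering, graylisting).
  scoreThresholds: {
    gossipThreshold: -1e9,
    publishThreshold: -2e9,
    graylistThreshold: -3e9,
    acceptPXThreshold: 0,
    opportunisticGraftThreshold: 0
  }
})
```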

We're still encountering occasional SIGILL crashes, which were propagating up our stack and causing issues with our container host, but this may not be a js-libp2p problem at all and could be something lower level (filecoin-lotus users appear to be seeing it too), so I'll close this issue now. If anyone else reads this while testing their mesh: invest in headless browser network tests using something like docker-compose; it's not as hard as it sounds and well worth it.
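
For anyone setting that up, here's a hypothetical probe in the spirit of the health-check nodes mentioned at the top of this issue: a bare js-libp2p client that dials the server's websocket address and measures ping latency, which you can run inside docker-compose (or alongside a headless browser) against the real network path. The environment variable and the assumption that @libp2p/ping accepts a multiaddr directly are illustrative, not guaranteed for every version.

```ts
// Hypothetical health-check probe: dial a websocket-listening libp2p server
// and measure ping latency. The target address is supplied via an
// environment variable; all names here are placeholders.
import { createLibp2p } from 'libp2p'
import { webSockets } from '@libp2p/websockets'
import { mplex } from '@libp2p/mplex'
import { noise } from '@chainsafe/libp2p-noise'
import { identify } from '@libp2p/identify'
import { ping } from '@libp2p/ping'
import { multiaddr } from '@multiformats/multiaddr'

const probe = await createLibp2p({
  transports: [webSockets()],
  streamMuxers: [mplex()],
  connectionEncryption: [noise()],
  services: { identify: identify(), ping: ping() }
})

// e.g. TARGET_MULTIADDR=/dns4/chat.example.com/tcp/443/wss/p2p/<peer-id>
const addr = process.env.TARGET_MULTIADDR
if (addr == null) {
  throw new Error('set TARGET_MULTIADDR to the server node multiaddr')
}

const latency = await probe.services.ping.ping(multiaddr(addr))
console.log(`ping ${latency} ms`)

await probe.stop()
```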

@raykyri raykyri closed this as completed May 12, 2024
SgtPooki (Member) commented

@raykyri Some of the browser interop work is being covered by a demo app at https://github.com/libp2p/universal-connectivity. Browser reliability should improve once WebRTC is released in go-libp2p; see libp2p/go-libp2p#2778.

2color (Contributor) commented Jun 3, 2024

@raykyri Are you still encountering SIGILL crashes?

Are you using js-libp2p for the server host on Node.js?


raykyri commented Jun 3, 2024

We're using js-libp2p for the server host, yep.

We haven't seen any more SIGILL issues; we eventually traced those to somewhere else.
