
Browser-to-server libp2p reliability #2529

Closed
raykyri opened this issue May 6, 2024 · 5 comments
Labels
need/triage Needs initial labeling and prioritization

Comments


raykyri commented May 6, 2024

We've been running a browser-to-server libp2p mesh for chat applications at https://play.skystrife.xyz. It uses gossipsub to distribute messages and our own service, based on GossipLog and a Prolly tree, to sync past messages. We monitor logs and Prometheus metrics, and run separate instances that spin up libp2p nodes and connect to our mesh to perform health checks.

Since last week, there have been tens of players online at the same time (occasionally even 100+). We've noticed reliability issues even at smaller scales: libp2p server nodes randomly stop accepting messages, or stop listening on their port after a few hours. The cause isn't an OOM or anything else readily apparent from libp2p:*:error logging.

[screenshot attached]

What's the state of reliability for browser-to-server libp2p right now? We're considering switching to a separate websocket service and using libp2p exclusively for server-to-server sync, since it's unclear how others have deployed this stack in a browser environment.

We are currently on:

    "@libp2p/bootstrap": "^10.0.7",
    "@libp2p/fetch": "^1.0.5",
    "@libp2p/identify": "^1.0.6",
    "@libp2p/interface": "^1.0.2",
    "@libp2p/logger": "^4.0.2",
    "@libp2p/mplex": "^10.0.7",
    "@libp2p/peer-id": "^4.0.2",
    "@libp2p/peer-id-factory": "^4.0.1",
    "@libp2p/ping": "^1.0.6",
    "@libp2p/prometheus-metrics": "^3.0.7",
    "@libp2p/utils": "^5.0.3",
    "@libp2p/websockets": "^8.0.7",
  • Platform: Linux d891112f4e1d68 5.15.98-fly #gfb1e5e454b SMP Tue Apr 23 17:55:23 UTC 2024 x86_64 GNU/Linux
  • Subsystem: Primarily dialer (addresses disappearing), gossipsub/relay, and others.
  • Severity: High
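
For reference, here is a rough sketch of the kind of server-side node configuration this dependency list implies. The gossipsub router (@chainsafe/libp2p-gossipsub) and noise encryption (@chainsafe/libp2p-noise) aren't in the list and are assumed here; the listen address and topic name are placeholders.

```ts
// Minimal sketch of a websocket-listening server node based on the packages
// listed above. @chainsafe/libp2p-noise and @chainsafe/libp2p-gossipsub are
// assumptions; addresses and topic names are illustrative only.
import { createLibp2p } from 'libp2p'
import { webSockets } from '@libp2p/websockets'
import { mplex } from '@libp2p/mplex'
import { noise } from '@chainsafe/libp2p-noise'
import { identify } from '@libp2p/identify'
import { ping } from '@libp2p/ping'
import { gossipsub } from '@chainsafe/libp2p-gossipsub'
import { prometheusMetrics } from '@libp2p/prometheus-metrics'

const node = await createLibp2p({
  addresses: {
    listen: ['/ip4/0.0.0.0/tcp/8080/ws'] // placeholder listen address
  },
  transports: [webSockets()],
  streamMuxers: [mplex()],
  connectionEncryption: [noise()],
  metrics: prometheusMetrics(),
  services: {
    identify: identify(),
    ping: ping(),
    pubsub: gossipsub()
  }
})

// Browser peers dial the /ws (or /wss behind TLS) address and join the same topic.
node.services.pubsub.subscribe('chat') // placeholder topic
```
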
raykyri added the need/triage label May 6, 2024

abuvanth commented May 9, 2024

Check this silkroadnomad/libp2p-relay#3


raykyri commented May 12, 2024

Manually patching the autodialer retry threshold solved most of our problems. Someone else caught this last week and a fix is already on main: 767b23e. (We're mostly running with gossipsub penalties off, so no issues there. Tuning how many peers are grafted helped only marginally.)
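
In case it helps others, here's a rough sketch of the kind of gossipsub tuning mentioned above, assuming the @chainsafe/libp2p-gossipsub options D/Dlo/Dhi (mesh degree, i.e. how many peers get grafted) and scoreThresholds (pushed far negative to effectively turn penalties off). The numbers are illustrative, not recommendations, and the autodial retry threshold we patched is a separate js-libp2p connection-manager setting that isn't shown here.

```ts
// Illustrative gossipsub options (assumed @chainsafe/libp2p-gossipsub API);
// numbers are examples, not recommendations.
import { gossipsub } from '@chainsafe/libp2p-gossipsub'

const pubsub = gossipsub({
  // Mesh degree: how many peers the router keeps grafted per topic.
  D: 6,    // target mesh size
  Dlo: 4,  // graft more peers when the mesh falls below this
  Dhi: 12, // prune peers when the mesh grows above this
  // Very low thresholds effectively disable score-based penalties
  // (gossip suppression, publish filtering, graylisting).
  scoreThresholds: {
    gossipThreshold: -1e9,
    publishThreshold: -2e9,
    graylistThreshold: -3e9,
    acceptPXThreshold: 0,
    opportunisticGraftThreshold: 0
  }
})
```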

We're still encountering occasional SIGILL crashes, which were propagating up our stack and causing issues with our container host, but this may not be a js-libp2p problem at all and could be something lower level (filecoin-lotus users appear to be seeing it too), so I'll close this issue now. If anyone else reads this while testing their mesh: invest in headless browser network tests using something like docker-compose; it's not as hard as it sounds and well worth it.
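
For anyone setting that up, here's a hypothetical probe in the spirit of the health-check nodes mentioned at the top of this issue: a bare js-libp2p client that dials the server's websocket address and measures ping latency, which you can run inside docker-compose (or alongside a headless browser) against the real network path. The environment variable and the assumption that @libp2p/ping accepts a multiaddr directly are illustrative, not guaranteed for every version.

```ts
// Hypothetical health-check probe: dial a websocket-listening libp2p server
// and measure ping latency. The target address is supplied via an
// environment variable; all names here are placeholders.
import { createLibp2p } from 'libp2p'
import { webSockets } from '@libp2p/websockets'
import { mplex } from '@libp2p/mplex'
import { noise } from '@chainsafe/libp2p-noise'
import { identify } from '@libp2p/identify'
import { ping } from '@libp2p/ping'
import { multiaddr } from '@multiformats/multiaddr'

const probe = await createLibp2p({
  transports: [webSockets()],
  streamMuxers: [mplex()],
  connectionEncryption: [noise()],
  services: { identify: identify(), ping: ping() }
})

// e.g. TARGET_MULTIADDR=/dns4/chat.example.com/tcp/443/wss/p2p/<peer-id>
const addr = process.env.TARGET_MULTIADDR
if (addr == null) {
  throw new Error('set TARGET_MULTIADDR to the server node multiaddr')
}

const latency = await probe.services.ping.ping(multiaddr(addr))
console.log(`ping ${latency} ms`)

await probe.stop()
```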

@raykyri raykyri closed this as completed May 12, 2024
SgtPooki (Member) commented

@raykyri Some of the browser interop work is being covered by a demo app at https://github.com/libp2p/universal-connectivity. Browser reliability should improve once WebRTC is released in go-libp2p; see libp2p/go-libp2p#2778.

2color (Contributor) commented Jun 3, 2024

@raykyri Are you still encountering SIGILL crashes?

Are you using js-libp2p for the server host on Node.js?


raykyri commented Jun 3, 2024

We're using js-libp2p for the server host, yep.

We haven't seen any more SIGILL issues; we eventually traced those to somewhere else.
