Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

panic: concurrent write to websocket connection #56

Closed
bonedaddy opened this issue Nov 10, 2019 · 25 comments · Fixed by #65
Closed

panic: concurrent write to websocket connection #56

bonedaddy opened this issue Nov 10, 2019 · 25 comments · Fixed by #65
Assignees

Comments

@bonedaddy
Copy link

bonedaddy commented Nov 10, 2019

Using the latest commit to the masterthis happens. I have also confirmed it present on the 0.1.2 version.

panic: concurrent write to websocket connection

goroutine 88169580 [running]:
github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc011df6d20, 0xc009445901, 0x0, 0x0, 0x0, 0x7f12f5486e10, 0x0)
	/home/solidity/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:610 +0x612
github.com/gorilla/websocket.(*messageWriter).Close(0xc011df6d20, 0x8, 0xc053db3d00)
	/home/solidity/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:724 +0x62
github.com/gorilla/websocket.(*Conn).beginMessage(0xc02b0aa2c0, 0xc0094459b0, 0x2, 0x442401, 0xc053db3ee8)
	/home/solidity/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:473 +0x284
github.com/gorilla/websocket.(*Conn).NextWriter(0xc02b0aa2c0, 0x2, 0x0, 0x0, 0xa55, 0x100000000000000)
	/home/solidity/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:513 +0x53
github.com/gorilla/websocket.(*Conn).WriteMessage(0xc02b0aa2c0, 0x2, 0xc0463d7b00, 0x20, 0x40, 0x10, 0xc04d8f44e0)
	/home/solidity/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:766 +0x73
github.com/libp2p/go-ws-transport.(*Conn).Write(0xc011df69c0, 0xc0463d7b00, 0x20, 0x40, 0x1178340, 0x12eee40, 0x7f12f00d9320)
	/home/solidity/go/pkg/mod/github.com/libp2p/[email protected]/conn_native.go:70 +0x59
bytes.(*Buffer).WriteTo(0xc009445980, 0x7f12a01a9480, 0xc011df6cf0, 0x7f12f00d9320, 0xc009445980, 0xcd4101)
	/usr/local/go/src/bytes/buffer.go:239 +0xb5
io.copyBuffer(0x7f12a01a9480, 0xc011df6cf0, 0x16068a0, 0xc009445980, 0x0, 0x0, 0x0, 0x8, 0x439416, 0x14a6618)
	/usr/local/go/src/io/io.go:384 +0x33f
io.Copy(...)
	/usr/local/go/src/io/io.go:364
github.com/multiformats/go-multistream.SelectProtoOrFail.func1(0x132712c, 0xa, 0x7f12a01a93e8, 0xc011df6cf0, 0xc04a438480)
	/home/solidity/go/pkg/mod/github.com/multiformats/[email protected]/client.go:27 +0x163
created by github.com/multiformats/go-multistream.SelectProtoOrFail
	/home/solidity/go/pkg/mod/github.com/multiformats/[email protected]/client.go:23 +0x95

I previously opened this on the go-multistream repo multiformats/go-multistream#49. Failure is happening here. I suspect we need to protect the connection by a mutex?

@bonedaddy
Copy link
Author

ipfs/kubo#6197

Is a report of this from go-ipfs

@Stebalien
Copy link
Member

@vyzo could you look into this and ipfs/kubo#6197? They've stumped me in the past.

We can always fix it with a lock (maybe we should just do that) but it would be nice to know why this is happening.

@Stebalien
Copy link
Member

@bonedaddy updating go-mplex, go-yamux, and github.com/multiformats/go-multistream may help. Basically, go-ws-transport doesn't support concurrent writes and we may be doing that somewhere else in libp2p.

I'm suggesting upgrading because upgrading libp2p has stopped this bug from appearing on the our gateways.

@bonedaddy
Copy link
Author

I'm on v0.1.0 for go-mplex, v1.2.4 for go-yamu and latest available version of go-multistream. The commits to master on go-multistream don't seem to do anything that would impact this. I've been at a hackathon all weekend so i haven't had much time to experiment, but I've made these changes to a fork RTradeLtd@14bcce0 that I think should solve this issue.

However one thing I'm unsured about is if Read should use a write mutex lock as opposed to a read mutex lock.

@Jorropo
Copy link

Jorropo commented Nov 12, 2019

go-ws-transport doesn't support concurrent writes and we may be doing that somewhere else in libp2p

Why not just wrap writer in an lock defer unlock write return object ?

@bonedaddy
Copy link
Author

go-ws-transport doesn't support concurrent writes and we may be doing that somewhere else in libp2p

I didn't notice this bit, but that is a bit concerning. If something doesn't support concurrent writes it should be quite clearly explained. One CTRL+F on the home page of this repository doesn't show any hits for concurrent writes. The problem is most definitely some libp2p component that is attempting to perform concurrent writes.

While it is doubtful a clear warning would prevent this, it would at least make it much less likely to happen 😞

@marten-seemann
Copy link
Contributor

Concurrent writes don't make sense with most (all?) io.Writers, since the writer will write data in an ordered manner. Writing concurrently will inevitably destroy that ordering guarantee.
Thus, I'd consider this a bug in the application using the transport.

@bonedaddy
Copy link
Author

Concurrent writes don't make sense with most (all?) io.Writers, since the writer will write data in an ordered manner. Writing concurrently will inevitably destroy that ordering guarantee.
Thus, I'd consider this a bug in the application using the transport.

go-libp2p probably has a ton of bugs then 😭

@Stebalien
Copy link
Member

I don't see how that follows. Writing to a single stream/connection from two goroutines at the same time will always yield garbage results as the writes can be arbitrarily interleaved and merged.

Note: You can obviously still write to multiple streams on the same connection from multiple goroutines. The stream muxer takes care of muxing/demuxing.

@marten-seemann
Copy link
Contributor

@Stebalien Concurrent writes to the same stream (or the connection, if you're not running a stream muxer) was what I meant. Did I misunderstand the issue here?

@vyzo
Copy link
Contributor

vyzo commented Nov 12, 2019

@Stebalien I think we'd have to bite the bullet and use a lock.

@Jorropo
Copy link

Jorropo commented Nov 12, 2019

Reproduced with very few nodes ~ 200 :

nov. 12 10:12:29 Radus ipfs[27860]: 10:12:29.009 ERROR core/comma: context canceled pin.go:132
nov. 12 11:22:47 Radus ipfs[27860]: panic: concurrent write to websocket connection
nov. 12 11:22:47 Radus ipfs[27860]: goroutine 3249429 [running]:
nov. 12 11:22:47 Radus ipfs[27860]: github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc008958540, 0x1, 0x0, 0x0, 0x0, 0x22, 0x22)
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:591 +0x77a
nov. 12 11:22:47 Radus ipfs[27860]: github.com/gorilla/websocket.(*messageWriter).Close(0xc008958540, 0xc002b56140, 0x22)
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:709 +0x56
nov. 12 11:22:47 Radus ipfs[27860]: github.com/gorilla/websocket.(*Conn).WriteMessage(0xc002b34dc0, 0x2, 0xc002b56140, 0x22, 0x40, 0x10, 0xc00
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:762 +0x10e
nov. 12 11:22:47 Radus ipfs[27860]: github.com/libp2p/go-ws-transport.(*Conn).Write(0xc007183560, 0xc002b56140, 0x22, 0x40, 0x168b420, 0x17e67
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/libp2p/[email protected]/conn.go:74 +0x59
nov. 12 11:22:47 Radus ipfs[27860]: bytes.(*Buffer).WriteTo(0xc008958510, 0x7f8bfc0c3390, 0xc0071836b0, 0x7f8bfc0bd620, 0xc008958510, 0xe67c01
nov. 12 11:22:47 Radus ipfs[27860]:         /usr/lib/go/src/bytes/buffer.go:242 +0xb8
nov. 12 11:22:47 Radus ipfs[27860]: io.copyBuffer(0x7f8bfc0c3390, 0xc0071836b0, 0x1aebf80, 0xc008958510, 0x0, 0x0, 0x0, 0x6fc23ac00, 0x0, 0x0)
nov. 12 11:22:47 Radus ipfs[27860]:         /usr/lib/go/src/io/io.go:384 +0x34e
nov. 12 11:22:47 Radus ipfs[27860]: io.Copy(...)
nov. 12 11:22:47 Radus ipfs[27860]:         /usr/lib/go/src/io/io.go:364
nov. 12 11:22:47 Radus ipfs[27860]: github.com/multiformats/go-multistream.SelectProtoOrFail.func1(0x1817e73, 0xc, 0x7f8bfc0c32f8, 0xc0071836b
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/multiformats/[email protected]/client.go:27 +0x164
nov. 12 11:22:47 Radus ipfs[27860]: created by github.com/multiformats/go-multistream.SelectProtoOrFail
nov. 12 11:22:47 Radus ipfs[27860]:         /home/hugo/go/pkg/mod/github.com/multiformats/[email protected]/client.go:23 +0x95

Also it was dial out because I'm not listening on ws.

@Stebalien
Copy link
Member

@Stebalien Concurrent writes to the same stream (or the connection, if you're not running a stream muxer) was what I meant. Did I misunderstand the issue here?

No. I'm just clarifying.


@Stebalien I think we'd have to bite the bullet and use a lock.

Maybe but that doesn't actually fix the issue. Either:

  1. We're not performing concurrent writes and go-ws-transport has a bug.
  2. We are performing concurrent writes and something in libp2p has a bug.

Basically, there's a logic bug somewhere.

@Stebalien
Copy link
Member

@Jorropo or @bonedaddy are you using github.com/libp2p/go-conn-security-multistream v0.0.2?

@vyzo
Copy link
Contributor

vyzo commented Nov 12, 2019

I would say that go-ws-transport needs to be audited first, because it's not as well exercised as the rest of the stack.
Is it possible that the dialer reuses connections when dialing the same url?

@bonedaddy
Copy link
Author

@Stebalien no I'm using v0.1.0.

@Stebalien
Copy link
Member

@bonedaddy go-conn-security-multistream, not go-multistream. There was a bug in previous releases that could, maybe, have caused this.

(of course, users are seeing this issue with go-ipfs 0.4.22 which has the latest go-conn-security-multistream so there must be more issues)

@Stebalien
Copy link
Member

@vyzo

Is it possible that the dialer reuses connections when dialing the same url?

It shouldn't. And even if it did, this error is at a higher layer. Re-using the same TCP conn shouldn't cause this error.

@Jorropo
Copy link

Jorropo commented Nov 13, 2019

@Stebalien, its just IPFS v0.4.22, also my arm node on v0.5.0 was never having that.

@bonedaddy
Copy link
Author

@bonedaddy go-conn-security-multistream, not go-multistream. There was a bug in previous releases that could, maybe, have caused this.

(of course, users are seeing this issue with go-ipfs 0.4.22 which has the latest go-conn-security-multistream so there must be more issues)

Using v0.1.0 of go-conn-security-multistream

@Stebalien
Copy link
Member

Stebalien commented Nov 13, 2019 via email

@bonedaddy
Copy link
Author

bonedaddy commented Nov 13, 2019

FYIW I'm experimenting with wrapping the Write method in mutex locks and I haven't had this error surface since opening the bug report.

I have made these changes to a fork already so I can port them upstream (here) if desired.

@Jorropo
Copy link

Jorropo commented Nov 14, 2019

@bonedaddy that seems an ugly hacks (even if I'm totaly okay with that if that works).
But :

  • it should not hapend (the muxer should take care of not doing that (but if that a bug of the muxer the tcp transport should be affected)).
  • other transports doesn't have this problem
  • I have an experimental node running for more than a week node with 10k peers and it never happend, so its maybe already patched but we don't realised yet.

But you got me an idea, I'll for each write check if the lock is already taken it will log the stack for both goroutine (holder and taker).

@vyzo
Copy link
Contributor

vyzo commented Nov 14, 2019

But you got me an idea, I'll for each write check if the lock is already taken it will log the stack for both goroutine (holder and taker).

@Jorropo that's exactly what I was planning to do too! That should give us the contenting threads so that we can debug it.

@vyzo
Copy link
Contributor

vyzo commented Nov 15, 2019

See #57 for instrumentation. Can you try this patch?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants