-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/poll: Close should not return until the fd is really closed #21856
Comments
Sorry --- did anyone have thoughts on this? |
Proposal reviews are done approximately weekly. When this one will be reviewed depends on the overall load. There have been a bunch of proposals recently, for some reason. My initial reaction is that this seems reasonable, but what is special about |
Thanks for the clarification --- I was confused because the latency of the review process in practice often outperforms the specification :-) (This seems to be a common issue in systems design, e.g., In my opinion, This was my intention with the original proposal title of "Listener.Close() should not return until the underlying address is freed". The goal is to have |
It's not clear to me if this should be restricted to listeners. It's plausible that other special fds might also exist and that we should instead make sure that when any fd Close returns, the underlying close(2) system call has returned. It's then not clear to me if this would potentially cause any deadlocks by making Close wait for pending I/O to be aborted, and that pending I/O might take a long time. Or maybe we should do this for all fds we poll, since at least there we (think we) know how to abort pending I/O promptly. |
I think it would definitely help to enumerate any other cases that might be "special"; thus far I can only think of listeners. It seems like an implementation at the poll / |
Per discussion with @ianlancetaylor let's just try making internal/poll's Close do this for all file descriptors and back down to Listeners only if we find some reason to do so. |
Change https://golang.org/cl/66150 mentions this issue: |
Change https://golang.org/cl/66334 mentions this issue: |
Updates #21856 Change-Id: I9baa51fe23e6dd2fcf9dd14f7acfaf7457571e1d Reviewed-on: https://go-review.googlesource.com/66334 Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: David Crawshaw <[email protected]> Reviewed-by: Mikio Hara <[email protected]>
Change https://golang.org/cl/83715 mentions this issue: |
Updates #7970 Updates #21856 Updates #23111 Change-Id: I0cd0151fcca740c40c3c976f941b04e98e67b0bf Reviewed-on: https://go-review.googlesource.com/83715 Reviewed-by: Russ Cox <[email protected]>
Change https://golang.org/cl/83995 mentions this issue: |
os.NewFile doesn't put the fd into non-blocking mode. In most cases, an *os.File returned by os.NewFile is in blocking mode. Updates #7970 Updates #21856 Updates #23111 Change-Id: Iab08432e41f7ac1b5e25aaa8855d478adb7f98ed Reviewed-on: https://go-review.googlesource.com/83995 Reviewed-by: Ian Lance Taylor <[email protected]>
What version of Go are you using (
go version
)?1.9
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?linux/amd64
Proposal
I'm taking
TCPListener
as an example, but I'm pretty sure the issue applies to UDP and Unix domain sockets as well.TCPListener.Close
is documented as "Close stops listening on the TCP address." However,Close()
does not guarantee that the address is safe to listen on again. This is because if a concurrent goroutine is in the middle of anAccept()
call on the same listener, it will hold an additional reference, via anfdmutex
readLock, to the underlying file descriptor. Consequently, in the typical case,Close()
will merely decrement the reference count and return without actually callingclose(2)
on the file descriptor.Because the runtime sets
SO_REUSEADDR
on all its listen sockets, the address will be free forbind(2)
again immediately after theclose(2)
system call returns. But sinceclose(2)
will not execute until bothClose()
andAccept()
return, a synchronization primitive (e.g., achan
or async.WaitGroup
) is currently required in order to produce the necessary happens-before relationship to safely create a newTCPListener
.Here's a test case (adapted from one suggested by @davecheney on the golang-nuts list):
https://gist.github.com/slingamn/3b0f81169de6578c732ec279828c4866
This case will reliably panic with:
panic: listen tcp :6502: bind: address already in use
.The proposal is that all the
Listener
types affected by this issue should perform this synchronization automatically in theirClose
methods. This would (at least on POSIX platforms) make it safe to re-listen on an address immediately afterClose
.Here are some reasons in support of this change:
SO_REUSEADDR
set, the address is available immediately afterclose(2)
returns; any concurrentaccept(2)
call will fail withEBADF
. This behavior is arguably being simulated, inaccurately, by the polling layer, which does not actually do a blockingaccept(2)
.)time.Sleep()
call is deleted from the above test case, the program reliably executes without a panic, even when built with-race
. It's easy to write code that appears to work, but has a lurking race.Accept()
andClose()
return" is currently a sufficient condition to safely listen on the address again, this is not clearly specified in the documentation; it seems more like an implementation detail than anything else. It is not clear that future versions of the runtime will behave the same way.Listener
is a relatively uncommon operation, relative to, e.g., accepting new connections).Thanks very much for your time.
The text was updated successfully, but these errors were encountered: