-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to make Windows non-blocking I/O #1525
Conversation
@jackc this PR works in real app, but I can't run tests for nbconn ( |
It looks like |
I fixed |
Looks good to me. I'm inclined to merge this as is. Regarding the If you get a chance could you help me with that error by:
|
@jackc I tested the master branch,
P.S. If you need to test something on WIndows, I'm ready to help (and I'll look at this bug) |
@jackc I have the same errors as you in this test repo - https://github.com/jackc/golang-windows-dupreads |
@codercms please read the latest comment in the upstream issue. Can you check if anything changes using Go 1.19.5? If you can't, I'll try it myself next weekend (I don't have a Windows test environment right now) |
@drakkan same error with Go 1.19.5 & 1.18.10 & 1.16 & 1.15 |
Thanks for your testing. Not sure how to try to help. Based on this comment What are the pratical implications for pgx? Can we have sporadic errors in Windows with TLS enabled? |
See golang/go#58764 for my longer description of the problem. But I'm pretty sure the problem is not in the tls package. I made wrapper conns that sit below the tls conn and record the traffic. They are seeing the erroneous duplication of traffic. That is what breaks the tls connection. This also suggests that the problem is not exclusive to tls connections, it's just that's the only way I have managed to trigger the issue. As far as its affect for pgx on Windows, it's concerning but I think it would be very difficult to trigger in real world usage. We can only duplicate this issue when at least 2.7 MB of writes are buffered on the client side and the server has simultaneously written a large amount of data. The only way I can foresee that happening is if a |
I've decided to merge this now. The failing test predates this change and appears to be a Go or Windows issue. This change should be an improvement regardless. |
if c.nonblockWriteFunc == nil { | ||
c.nonblockWriteFunc = func(fd uintptr) (done bool) { | ||
// Make sock non-blocking | ||
if err := setSockMode(fd, sockModeNonBlocking); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a test case so feel free to ignore this comment.
What if another goroutine ends a write and set the socket to blocking mode after this method ends and before calling WSASend
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drakkan Connection isn't considered as thread (goroutine) safe , it should be managed through connection pool, that guarantees that only one goroutine can work with connection.
Here it is - https://github.com/jackc/pgx/wiki/Getting-started-with-pgx#using-a-connection-pool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drakkan Connection isn't considered as thread (goroutine) safe , it should be managed through connection pool, that guarantees that only one goroutine can work with connection.
Here it is - https://github.com/jackc/pgx/wiki/Getting-started-with-pgx#using-a-connection-pool
Ok thanks. I was just trying to figure out if it could be related to #1552, #1548
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drakkan hmm, #1552 looks really strange. I started debugging this case and found that there is inconsistent setSockMode
calls:
var currSockMode = sockModeBlocking
func setSockMode(fd uintptr, mode sockMode) error {
if mode == currSockMode {
fmt.Printf("double setSockMode=%d\n", mode)
}
currSockMode = mode
res, _, err := ioctlsocket.Call(fd, uintptr(FIONBIO), uintptr(unsafe.Pointer(&mode)))
// Upon successful completion, the ioctlsocket returns zero.
if res != 0 && err != nil {
return err
}
return nil
}
Here is output when using CopyFrom with large amount of data:
double setSockMode=1
double setSockMode=0
double setSockMode=1
double setSockMode=0
double setSockMode=1
double setSockMode=0
double setSockMode=0
double setSockMode=0
double setSockMode=0
double setSockMode=0
double setSockMode=1
double setSockMode=0
double setSockMode=0
double setSockMode=0
double setSockMode=0
double setSockMode=1
(hangs forever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added goroutine ids to output:
double setSockMode=1 (id=1)
double setSockMode=0 (id=1)
double setSockMode=0 (id=25)
double setSockMode=0 (id=1)
double setSockMode=0 (id=25)
double setSockMode=1 (id=25)
Looks like there is race condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much expertise with Windows network programming, but a bit of doc and Google searching seems to indicate that it should be safe to for one thread to read and one thread to write from the safe socket. However, it appears that some of the winsock API is not thread safe. Perhaps that includes the ioctlsocket
function.
...
But now that I think about it a bit more I think I have a more likely cause for the deadlock. pgx will read and write concurrently under certain conditions (such as bufferNonblockingRead
when a write is blocked). Both of these functions set non-blocking mode, read or write, and set blocking mode.
What happens if the following sequence of events occurs:
- Writer sets non-blocking mode
- Reader sets non-blocking mode
- Writer writes
- Writer sets blocking mode
- Reader reads
The socket is unexpectedly in blocking mode for the read. If nothing is ever received it hangs.
As far as Unix goes, I think that since we are not changing the socket options, we are only reading and writing, it is safe.
I really wish the Go stdlib included non-blocking I/O. This system is so complicated...
I'm half tempted to try using goroutines and normal Go I/O again. I tried it a few years ago and the performance hit was significant, but maybe modern Go has higher performance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackc You're absolutely right about that socket can be used for concurrent read/write (socket is full duplex).
I think implementing critical section (e.g. mutex) for Windows non-blocking I/O will be simplest solution. What do you think will this fix the problem? But additional benchmark is needed, because I/O actually will be not full duplex (you can only write or only read at the same time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be the simplest approach.
But it seems it should be possible to switch to non-blocking mode at the beginning of a sequence of read or writes and turn it off at the end. Not sure how messy that would be though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackc switching to non-blocking mode looks good too, because it doesn't hit performance.
I think non-blocking conn could expose setBlocking(blocking bool)
method or something like this, for Windows it will call ioctlsocket
, for Unix it will do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created based on discussion here: jackc#1525 (review) Fixes jackc#1552
Created based on discussion here: #1525 (review) Fixes #1552
Resolves #1523 #1482
Do not merge: Possible broken tests