Replace raw syscalls for non-blocking reads #1629
Conversation
The non-blocking IO system was designed to solve three problems:

1. Deadlock that can occur when both sides of a connection are blocked writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing the connection.
3. Efficiently check if a connection has been closed before writing. This reduces the cases where the application doesn't know if a query that does an INSERT/UPDATE/DELETE was actually sent to the server or not.

However, the nbconn package is extraordinarily complex, has been a source of very tricky bugs, and has OS-specific code paths. It also does not work at all with underlying net.Conn implementations that do not have platform-specific non-blocking IO syscall support and do not properly implement deadlines. In particular, this is the case with golang.org/x/crypto/ssh.

I believe the deadlock problem can be solved with a combination of a goroutine for CopyFrom, like v4 used, and a watchdog for regular queries that uses time.AfterFunc.

The write deadline problem should actually be ignorable. We check for context cancellation before sending a query, and the actual Write should be almost instant as long as the underlying connection is not blocked. (We should only have to wait until it is accepted by the OS, not until it is fully sent.)

Efficiently checking if a connection has been closed is probably the hardest problem to solve without non-blocking reads. However, the existing code only solves part of the problem. It can detect a closed or broken connection the OS knows about, but it won't detect other types of broken connections such as a network interruption. This is currently implemented in CheckConn and called automatically when checking a connection out of the pool that has been idle for over one second. I think that changing CheckConn to a very short deadline read and changing the pool to do an actual Ping would be an acceptable solution.

Remove nbconn and non-blocking code. This does not leave the system in an entirely working state. In particular, CopyFrom is broken, deadlocks can occur for extremely large queries or batches, and PgConn.CheckConn is now a `select 1` ping. These will be resolved in subsequent commits.
This approach uses an extra goroutine to write while the main goroutine continues to read. This avoids the need to use non-blocking I/O.
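Roughly, the pattern looks like this. This is a simplified sketch, not the actual pgx code; copyWithConcurrentReader is a name made up for illustration:

```go
package copysketch

import (
	"io"
	"net"
)

// copyWithConcurrentReader writes payload from a separate goroutine while the
// caller keeps reading on the main goroutine. Even if the write blocks because
// every buffer between the two peers is full, we still drain whatever the
// server sends, so neither side can deadlock.
func copyWithConcurrentReader(conn net.Conn, payload []byte, sink io.Writer) error {
	writeErr := make(chan error, 1)
	go func() {
		_, err := conn.Write(payload) // may block until the OS accepts the data
		writeErr <- err
	}()

	// In pgx this would be the normal protocol message loop; io.Copy stands in
	// for "keep reading until the conversation is over".
	_, readErr := io.Copy(sink, conn)

	if err := <-writeErr; err != nil {
		return err
	}
	return readErr
}
```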
This commit adds a background reader that can optionally buffer reads. It is used whenever a potentially blocking write is made to the server. The background reader is started on a slight delay so there should be no meaningful performance impact as it doesn't run for quick queries and its overhead is minimal relative to slower queries.
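A simplified sketch of the combined watchdog-plus-background-reader mechanism. backgroundReader and writeWithWatchdog are illustrative names, not the actual pgx types, and the real background reader keeps running while active and hands buffered bytes back to the normal read path:

```go
package watchdogsketch

import (
	"net"
	"sync"
	"time"
)

// backgroundReader drains the connection into a buffer so that a server which
// is blocked writing to us can always make progress. Returning the buffered
// bytes to the normal read path is omitted here.
type backgroundReader struct {
	conn net.Conn
	mu   sync.Mutex
	buf  []byte
}

func (r *backgroundReader) start() {
	go func() {
		chunk := make([]byte, 8192)
		n, _ := r.conn.Read(chunk) // a real implementation loops while active and tracks errors
		if n > 0 {
			r.mu.Lock()
			r.buf = append(r.buf, chunk[:n]...)
			r.mu.Unlock()
		}
	}()
}

// writeWithWatchdog arms a timer before a potentially blocking write. Quick
// writes stop the timer before it fires, so the background reader never runs
// and the overhead is just the timer itself. If the write stalls, the timer
// fires and the reader starts draining the server's output, which eventually
// unblocks our write.
func writeWithWatchdog(conn net.Conn, r *backgroundReader, data []byte) error {
	watchdog := time.AfterFunc(15*time.Millisecond, r.start)
	defer watchdog.Stop()
	_, err := conn.Write(data)
	return err
}
```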
This PR is an overall reduction of over 1,000 lines. The new system is simpler, but still complex in places, and it is a substantial change. I would appreciate some review and testing.
@jackc thank you! I did a quick test
sftpgo test cases pass on both Linux and Windows. I just use the database/sql driver so nothing complex. cc @codercms
pgconn/pgconn.go
```go
// enterPotentialWriteReadDeadlock must be called before a write that could deadlock if the server is simultaneously
// blocked writing to us.
func (pgConn *PgConn) enterPotentialWriteReadDeadlock() {
	pgConn.slowWriteTimer.Reset(10 * time.Millisecond)
}
```
10 milliseconds seems like a pretty arbitrary value. Since in most cases this timer shouldn't be needed, I was wondering if we could use a higher value, e.g. 100 milliseconds. Does using a higher value create performance issues with batch queries? In any case, I would use at least 15 milliseconds, as this is the current minimum resolution of timers on Windows, so that pgx behaves the same on all supported operating systems.
10 milliseconds is basically an educated guess. A Write should only take as long as the syscall and memcpy to the OS outbound network buffer unless the buffer is full (which potentially is a block). It seemed like long enough for the normal case, but short enough not to kill performance if a block occurs.
And yes, large batch queries are the primary place this would be an issue.
I don't mind increasing it to 15ms, but I wonder whether Windows would still behave differently. Is that resolution per timer or is it some sort of system-wide tick? i.e. could we end up waiting almost 30ms in the worst case?
I think this is the resolution per timer. Windows supports high resolution timers, but the Go implementation doesn't use them. There is a WIP patch here.
@jackc Default timer resolution in Windows is 15.6 ms, so in theory the worst case would be waiting 15.6 ms.
Ref: https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
But it's possible to set the timer resolution to 1 ms or even to 0.5 ms.
Ref: https://stackoverflow.com/a/22862989/2585154 , but this setting is system-wide: it will change the timer resolution for the entire system and affect all processes.
P.S. Some process may set the timer resolution to e.g. 50 ms; then the worst case would be waiting 50 ms.
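If anyone wants to see the effective timer granularity on their own machine, a quick standalone snippet (not part of this PR) is enough to show the difference:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Ask for a 1ms timer and measure how long it actually takes to fire.
	// On a Windows machine running at the default 15.6ms system timer
	// resolution this tends to print values around 15-16ms, while on Linux
	// or macOS it is usually close to the requested 1ms.
	for i := 0; i < 5; i++ {
		start := time.Now()
		<-time.After(1 * time.Millisecond)
		fmt.Println(time.Since(start))
	}
}
```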
@jackc here is more detailed information about system timers in Windows - http://windowstimestamp.com/description
Just pushed a change to 15ms and added docs explaining how 15ms was chosen.
I put the branch into our staging environment and tested it over a number of SSH connections. So far I haven't noticed any problems, which is good, because previously we were able to observe locking issues ~immediately. So I'm hopeful it's working as expected! I'll follow up if we notice anything amiss. Thanks for the quick work!
This change looks great! We are using a custom DialFunc to connect to Google Cloud SQL databases using the dialer from the Google Cloud SQL proxy library, and we currently need this workaround:

```go
func afterConnect(ctx context.Context, c *pgx.Conn) error {
	/*
		Because of our use of the Google Cloud SQL proxy library to talk to the database in cloud environments,
		the switch to non-blocking I/O in pgx/v5 started causing us a 100ms delay on acquiring a connection
		from the connection pool. This doesn't happen locally because the connection is more "direct", and pgx
		can use real non-blocking I/O. However, with the proxy tunnel involved, it falls back to a fake non-blocking
		implementation, which by default waits 100ms to prove the connection hasn't been closed. See `CheckConn`
		and chase it all the way down to `nbconn.go`.

		We can work around this by doing the only operation that reduces this 100ms delay - a COPY FROM. The current
		fake non-blocking implementation will time and measure a read operation in this code path, and reduce the
		delay to 2x the measured delay. In a cloud environment, this should be 1-5 ms at most. While it would be
		great to get this back to 0ms, it is orders of magnitude better than 100ms.

		This one line of code took us about two days to figure out, and many other ideas were tried and ruled out
		along the way. Please don't change this unless you really know what you're doing.
	*/
	_, err := c.CopyFrom(ctx, pgx.Identifier{"schema_migrations"}, []string{"version"}, pgx.CopyFromRows([][]any{}))
	return err
}
```

I'm going to try and find some time to try a build with the changes from this branch, and see if it lets us remove our workaround. I'll report back with the results.
I was able to run this branch in a Google Cloud K8S deployment today with our workaround removed, connecting via a custom dialer, and it worked great: no slowdown on first use of the connection from the pool. Let me know if there is any additional testing we can do!
@jackc @drakkan I ran tests on a real Windows machine; pgconn_test fails on my machine with these errors (I use PostgreSQL 15 for tests):
But these failures don't seem to be affected by this PR, so I guess it doesn't matter.
@codercms you need to create the uint64 domain like this. After executing the initialization SQL, all tests should pass.
@drakkan thx, all tests passed:
All the feedback has been positive so I'm going to merge this. Thanks for testing it everyone!
The non-blocking IO system was designed to solve three problems:

1. Deadlock that can occur when both sides of a connection are blocked writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing the connection.
3. Efficiently check if a connection has been closed before writing. This reduces the cases where the application doesn't know if a query that does an INSERT/UPDATE/DELETE was actually sent to the server or not.

However, the nbconn package is extraordinarily complex, has been a source of very tricky bugs, and has OS-specific code paths. It also does not work at all with underlying net.Conn implementations that do not have platform-specific non-blocking IO syscall support and do not properly implement deadlines. In particular, this is the case with golang.org/x/crypto/ssh. See #1600, #1605, and #1618. In addition, there appears to be a Windows platform bug that is impossible to work around in pgx. See golang/go#58764.

The deadlock problem is now solved with a combination of a goroutine for CopyFrom like v4 used and a watchdog timer for other writes that starts a background reader. Moving back to the v4 style CopyFrom implementation should also resolve any potential remaining Windows performance issues. See #1481.

The write deadline problem is actually moot. We check for context cancellation before sending a query, and the actual Write should be almost instant as long as the underlying connection is not blocked. (We should only have to wait until it is accepted by the OS, not until it is fully sent.) So the write deadline is almost never encountered.

Efficiently checking if a connection has been closed is probably the hardest problem to solve without non-blocking reads. However, the existing code only solves part of the problem. It can detect a closed or broken connection the OS knows about, but it won't actually detect other types of broken connections such as a network interruption. This is currently implemented in CheckConn and called automatically when checking a connection out of the pool that has been idle for over one second. This is resolved by changing CheckConn to a very short deadline read, deprecating it, and replacing internal calls to CheckConn with Ping.
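For illustration, the "very short deadline read" check is roughly the following. This is a simplified sketch rather than the actual CheckConn implementation, and a real version must preserve any byte the read happens to return:

```go
package checksketch

import (
	"errors"
	"net"
	"time"
)

// connSeemsAlive performs a read with a near-immediate deadline. A timeout
// means no data was waiting and, as far as the OS knows, the connection is
// still open; any other error means it is closed or broken. This still cannot
// detect failures the OS has not noticed (such as a silent network
// interruption), which is why pool health checks move to a real Ping.
func connSeemsAlive(conn net.Conn) bool {
	if err := conn.SetReadDeadline(time.Now().Add(time.Millisecond)); err != nil {
		return false
	}
	defer conn.SetReadDeadline(time.Time{}) // clear the deadline afterwards

	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	if err == nil {
		// Data arrived; a real implementation must buffer this byte so the
		// protocol stream is not corrupted.
		return true
	}
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	return false
}
```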