Replace raw syscalls for non-blocking reads #1629
Conversation
The non-blocking IO system was designed to solve three problems:

1. Deadlock that can occur when both sides of a connection are blocked writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing the connection.
3. Efficiently check if a connection has been closed before writing. This reduces the cases where the application doesn't know if a query that does an INSERT/UPDATE/DELETE was actually sent to the server or not.

However, the nbconn package is extraordinarily complex, has been a source of very tricky bugs, and has OS-specific code paths. It also does not work at all with underlying net.Conn implementations that do not have platform-specific non-blocking IO syscall support and do not properly implement deadlines. In particular, this is the case with golang.org/x/crypto/ssh.

I believe the deadlock problem can be solved with a combination of a goroutine for CopyFrom, like v4 used, and a watchdog for regular queries that uses time.AfterFunc.

The write deadline problem should actually be ignorable. We check for context cancellation before sending a query, and the actual Write should be almost instant as long as the underlying connection is not blocked. (We should only have to wait until it is accepted by the OS, not until it is fully sent.)

Efficiently checking if a connection has been closed is probably the hardest problem to solve without non-blocking reads. However, the existing code only solves part of the problem. It can detect a closed or broken connection the OS knows about, but it won't detect other types of broken connections such as a network interruption. This is currently implemented in CheckConn and called automatically when checking a connection out of the pool that has been idle for over one second. I think that changing CheckConn to a very short deadline read and changing the pool to do an actual Ping would be an acceptable solution.

Remove nbconn and non-blocking code. This does not leave the system in an entirely working state. In particular, CopyFrom is broken, deadlocks can occur for extremely large queries or batches, and PgConn.CheckConn is now a `select 1` ping. These will be resolved in subsequent commits.
This approach uses an extra goroutine to write while the main goroutine continues to read. This avoids the need to use non-blocking I/O.
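Roughly, the pattern looks like this. This is a simplified sketch, not the actual pgx code; copyWithConcurrentReader is a name made up for illustration:

```go
package copysketch

import (
	"io"
	"net"
)

// copyWithConcurrentReader writes payload from a separate goroutine while the
// caller keeps reading on the main goroutine. Even if the write blocks because
// every buffer between the two peers is full, we still drain whatever the
// server sends, so neither side can deadlock.
func copyWithConcurrentReader(conn net.Conn, payload []byte, sink io.Writer) error {
	writeErr := make(chan error, 1)
	go func() {
		_, err := conn.Write(payload) // may block until the OS accepts the data
		writeErr <- err
	}()

	// In pgx this would be the normal protocol message loop; io.Copy stands in
	// for "keep reading until the conversation is over".
	_, readErr := io.Copy(sink, conn)

	if err := <-writeErr; err != nil {
		return err
	}
	return readErr
}
```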
This commit adds a background reader that can optionally buffer reads. It is used whenever a potentially blocking write is made to the server. The background reader is started on a slight delay so there should be no meaningful performance impact as it doesn't run for quick queries and its overhead is minimal relative to slower queries.
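A simplified sketch of the combined watchdog-plus-background-reader mechanism. backgroundReader and writeWithWatchdog are illustrative names, not the actual pgx types, and the real background reader keeps running while active and hands buffered bytes back to the normal read path:

```go
package watchdogsketch

import (
	"net"
	"sync"
	"time"
)

// backgroundReader drains the connection into a buffer so that a server which
// is blocked writing to us can always make progress. Returning the buffered
// bytes to the normal read path is omitted here.
type backgroundReader struct {
	conn net.Conn
	mu   sync.Mutex
	buf  []byte
}

func (r *backgroundReader) start() {
	go func() {
		chunk := make([]byte, 8192)
		n, _ := r.conn.Read(chunk) // a real implementation loops while active and tracks errors
		if n > 0 {
			r.mu.Lock()
			r.buf = append(r.buf, chunk[:n]...)
			r.mu.Unlock()
		}
	}()
}

// writeWithWatchdog arms a timer before a potentially blocking write. Quick
// writes stop the timer before it fires, so the background reader never runs
// and the overhead is just the timer itself. If the write stalls, the timer
// fires and the reader starts draining the server's output, which eventually
// unblocks our write.
func writeWithWatchdog(conn net.Conn, r *backgroundReader, data []byte) error {
	watchdog := time.AfterFunc(15*time.Millisecond, r.start)
	defer watchdog.Stop()
	_, err := conn.Write(data)
	return err
}
```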
This PR is an overall reduction of over 1,000 lines. The new system is simpler, but still complex in places, and it is a substantial change. I would appreciate some review and testing.
@jackc thank you! I did a quick test
sftpgo test cases pass on both Linux and Windows. I just use the database/sql driver so nothing complex. cc @codercms
pgconn/pgconn.go
```go
// enterPotentialWriteReadDeadlock must be called before a write that could deadlock if the server is simultaneously
// blocked writing to us.
func (pgConn *PgConn) enterPotentialWriteReadDeadlock() {
	pgConn.slowWriteTimer.Reset(10 * time.Millisecond)
}
```
10 milliseconds seems like a pretty arbitrary value. Since in most cases this timer shouldn't be needed, I was wondering if we could use a higher value, e.g. 100 milliseconds. Does using a higher value create performance issues with batch queries? In any case, I would use at least 15 milliseconds, as this is the current minimum resolution of timers on Windows, so that pgx behaves the same on all supported operating systems.
10 milliseconds is basically an educated guess. A Write should only take as long as the syscall and memcpy to the OS outbound network buffer unless the buffer is full (which potentially is a block). It seemed like long enough for the normal case, but short enough not to kill performance if a block occurs.
And yes, large batch queries are the primary place this would be an issue.
I don't mind increasing it to 15ms, but I wonder whether Windows would still behave differently. Is that resolution per timer or is it some sort of system-wide tick? i.e. could we end up waiting almost 30ms in the worst case?
I think this is the resolution per timer. Windows supports high resolution timers, but the Go implementation doesn't use them. There is a WIP patch here.
@jackc Default timer resolution in Windows is 15.6 ms, so in theory the worst case would be waiting 15.6 ms.
Ref: https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
But it's possible to set the timer resolution to 1 ms or even to 0.5 ms.
Ref: https://stackoverflow.com/a/22862989/2585154 , but this setting is system-wide: it will change the timer resolution for the entire system and affect all processes.
P.S. Some process may set the timer resolution to e.g. 50 ms; then the worst case would be waiting 50 ms.
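If anyone wants to see the effective timer granularity on their own machine, a quick standalone snippet (not part of this PR) is enough to show the difference:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Ask for a 1ms timer and measure how long it actually takes to fire.
	// On a Windows machine running at the default 15.6ms system timer
	// resolution this tends to print values around 15-16ms, while on Linux
	// or macOS it is usually close to the requested 1ms.
	for i := 0; i < 5; i++ {
		start := time.Now()
		<-time.After(1 * time.Millisecond)
		fmt.Println(time.Since(start))
	}
}
```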
@jackc here is more detailed information about system timers in Windows - http://windowstimestamp.com/description
Just pushed a change to 15ms and added docs explaining how 15ms was chosen.
I put the branch into our staging environment and tested it over a number of SSH connections. So far I haven't noticed any problems, which is good, because previously we were able to observe locking issues ~immediately. So I'm hopeful it's working as expected! I'll follow up if we notice anything amiss. Thanks for the quick work!
This change looks great! We are using a custom DialFunc to connect to Google Cloud SQL databases using the dialer from the Google Cloud SQL proxy library, and we currently need this workaround:

```go
func afterConnect(ctx context.Context, c *pgx.Conn) error {
	/*
		Because of our use of the Google Cloud SQL proxy library to talk to the database in cloud environments,
		the switch to non-blocking I/O in pgx/v5 started causing us a 100ms delay on acquiring a connection
		from the connection pool. This doesn't happen locally because the connection is more "direct", and pgx
		can use real non-blocking I/O. However, with the proxy tunnel involved, it falls back to a fake non-blocking
		implementation, which by default waits 100ms to prove the connection hasn't been closed. See `CheckConn`
		and chase it all the way down to `nbconn.go`.

		We can work around this by doing the only operation that reduces this 100ms delay - a COPY FROM. The current
		fake non-blocking implementation will time and measure a read operation in this code path, and reduce the
		delay to 2x the measured delay. In a cloud environment, this should be 1-5 ms at most. While it would be
		great to get this back to 0ms, it is orders of magnitude better than 100ms.

		This one line of code took us about two days to figure out, and many other ideas were tried and ruled out
		along the way. Please don't change this unless you really know what you're doing.
	*/
	_, err := c.CopyFrom(ctx, pgx.Identifier{"schema_migrations"}, []string{"version"}, pgx.CopyFromRows([][]any{}))
	return err
}
```

I'm going to try and find some time to try a build with the changes from this branch, and see if it lets us remove our workaround. I'll report back with the results.
I was able to run this branch in a Google Cloud K8S deployment today with our workaround removed, connecting via a custom dialer, and it worked great: no slowdown on first use of the connection from the pool. Let me know if there is any additional testing we can do!
@jackc @drakkan I ran tests on a real Windows machine; pgconn_test fails on my machine with these errors (I use PostgreSQL 15 for tests):
But these failures don't seem to be affected by this PR, so I guess it doesn't matter.
@codercms you need to create the uint64 domain like this. After executing the initialization SQL, all tests should pass.
@drakkan thx, all tests passed:
All the feedback has been positive so I'm going to merge this. Thanks for testing it everyone!
The non-blocking IO system was designed to solve three problems:

1. Deadlock that can occur when both sides of a connection are blocked writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing the connection.
3. Efficiently check if a connection has been closed before writing. This reduces the cases where the application doesn't know if a query that does an INSERT/UPDATE/DELETE was actually sent to the server or not.

However, the nbconn package is extraordinarily complex, has been a source of very tricky bugs, and has OS-specific code paths. It also does not work at all with underlying net.Conn implementations that do not have platform-specific non-blocking IO syscall support and do not properly implement deadlines. In particular, this is the case with golang.org/x/crypto/ssh. See #1600, #1605, and #1618. In addition, there appears to be a Windows platform bug that is impossible to work around in pgx. See golang/go#58764.

The deadlock problem is now solved with a combination of a goroutine for CopyFrom like v4 used and a watchdog timer for other writes that starts a background reader. Moving back to the v4 style CopyFrom implementation should also resolve any potential remaining Windows performance issues. See #1481.

The write deadline problem is actually moot. We check for context cancellation before sending a query, and the actual Write should be almost instant as long as the underlying connection is not blocked. (We should only have to wait until it is accepted by the OS, not until it is fully sent.) So the write deadline is almost never encountered.

Efficiently checking if a connection has been closed is probably the hardest problem to solve without non-blocking reads. However, the existing code only solves part of the problem. It can detect a closed or broken connection the OS knows about, but it won't actually detect other types of broken connections such as a network interruption. This is currently implemented in CheckConn and called automatically when checking a connection out of the pool that has been idle for over one second. This is resolved by changing CheckConn to a very short deadline read, deprecating it, and replacing internal calls to CheckConn with Ping.
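For illustration, the "very short deadline read" check is roughly the following. This is a simplified sketch rather than the actual CheckConn implementation, and a real version must preserve any byte the read happens to return:

```go
package checksketch

import (
	"errors"
	"net"
	"time"
)

// connSeemsAlive performs a read with a near-immediate deadline. A timeout
// means no data was waiting and, as far as the OS knows, the connection is
// still open; any other error means it is closed or broken. This still cannot
// detect failures the OS has not noticed (such as a silent network
// interruption), which is why pool health checks move to a real Ping.
func connSeemsAlive(conn net.Conn) bool {
	if err := conn.SetReadDeadline(time.Now().Add(time.Millisecond)); err != nil {
		return false
	}
	defer conn.SetReadDeadline(time.Time{}) // clear the deadline afterwards

	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	if err == nil {
		// Data arrived; a real implementation must buffer this byte so the
		// protocol stream is not corrupted.
		return true
	}
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	return false
}
```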