Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace raw syscalls for non-blocking reads #1629

Merged
merged 7 commits into from
Jun 12, 2023
Merged

Replace raw syscalls for non-blocking reads #1629

merged 7 commits into from
Jun 12, 2023

Conversation

jackc
Copy link
Owner

@jackc jackc commented Jun 4, 2023

The non-blocking IO system was designed to solve three problems:

  1. Deadlock that can occur when both sides of a connection are blocked writing because all buffers between are full.
  2. The inability to use a write deadline with a TLS.Conn without killing the connection.
  3. Efficiently check if a connection is broken before writing. Due to fun characteristics of TCP it is possible a connection broken but writes still to succeed. This can be detected by doing a read. This reduces the cases where the application doesn't know if a query that does a INSERT/UPDATE/DELETE was actually sent to the server or not.

However, the nbconn package is extraordinarily complex, has been a source of very tricky bugs, and has OS specific code paths. It also does not work at all with underlying net.Conn implementations that do not have platform specific non-blocking IO syscall support and do not properly implement deadlines. In particular, this is the case with golang.org/x/crypto/ssh. See #1600, #1605, and #1618. In addition, there appears to be a Windows platform bug that is impossible to work around in pgx. See golang/go#58764.

The deadlock problem is now solved with a combination of a goroutine for CopyFrom like v4 used and a watchdog timer for other writes that starts a background reader. Moving back to the v4 style CopyFrom implementation should also resolve any potential remaining Windows performance issues. See #1481.

The write deadline problem is actually moot. We check for context cancellation before sending a query and the actual Write should be almost instant as long as the underlying connection is not blocked. (We should only have to wait until it is accepted by the OS, not until it is fully sent.) So the write deadline is almost never encountered.

Efficiently checking if a connection has been closed is probably the hardest to solve without non-blocking reads. However, the existing code only solves part of the problem. It can detect a closed or broken connection the OS knows about, but it won't actually detect other types of broken connections such as a network interruption. This is currently implemented in CheckConn and called automatically when checking a connection out of the pool that has been idle for over one second. This is resolved by changing CheckConn to a very short deadline read, deprecating it, and replacing internal calls to CheckConn with Ping.

jackc added 6 commits June 3, 2023 08:42
The non-blocking IO system was designed to solve three problems:

1. Deadlock that can occur when both sides of a connection are blocked
   writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing
   the connection.
3. Efficiently check if a connection has been closed before writing.
   This reduces the cases where the application doesn't know if a query
   that does a INSERT/UPDATE/DELETE was actually sent to the server or
   not.

However, the nbconn package is extraordinarily complex, has been a
source of very tricky bugs, and has OS specific code paths. It also does
not work at all with underlying net.Conn implementations that do not
have platform specific non-blocking IO syscall support and do not
properly implement deadlines. In particular, this is the case with
golang.org/x/crypto/ssh.

I believe the deadlock problem can be solved with a combination of a
goroutine for CopyFrom like v4 used and a watchdog for regular queries
that uses time.AfterFunc.

The write deadline problem actually should be ignorable. We check for
context cancellation before sending a query and the actual Write should
be almost instant as long as the underlying connection is not blocked.
(We should only have to wait until it is accepted by the OS, not until
it is fully sent.)

Efficiently checking if a connection has been closed is probably the
hardest to solve without non-blocking reads. However, the existing code
only solves part of the problem. It can detect a closed or broken
connection the OS knows about, but it won't actually detect other types
of broken connections such as a network interruption. This is currently
implemented in CheckConn and called automatically when checking a
connection out of the pool that has been idle for over one second. I
think that changing CheckConn to a very short deadline read and changing
the pool to do an actual Ping would be an acceptable solution.

Remove nbconn and non-blocking code. This does not leave the system in
an entirely working state. In particular, CopyFrom is broken, deadlocks
can occur for extremely large queries or batches, and PgConn.CheckConn
is now a `select 1` ping. These will be resolved in subsequent commits.
This approach uses an extra goroutine to write while the main goroutine
continues to read. This avoids the need to use non-blocking I/O.
This commit adds a background reader that can optionally buffer reads.
It is used whenever a potentially blocking write is made to the server.
The background reader is started on a slight delay so there should be no
meaningful performance impact as it doesn't run for quick queries and
its overhead is minimal relative to slower queries.
@jackc
Copy link
Owner Author

jackc commented Jun 4, 2023

This PR is an overall reduction of over 1,000 lines. The new system is simpler, but still complex in places, and it is a substantial change. I would appreciate some review and testing.

@drakkan
Copy link
Contributor

drakkan commented Jun 4, 2023

@jackc thank you!

I did a quick test

cat go.mod | grep pgx
	github.com/jackc/pgx/v5 v5.3.2-0.20230604013921-d9633608d0fe

sftpgo test cases pass on both Linux and Windows. I just use the database/sql driver so nothing complex.
Honestly this no raw syscalls approach makes me feel much more relaxed, this way pgx should work the same on any platform 😄

cc @codercms

pgconn/pgconn.go Outdated
// enterPotentialWriteReadDeadlock must be called before a write that could deadlock if the server is simultaneously
// blocked writing to us.
func (pgConn *PgConn) enterPotentialWriteReadDeadlock() {
pgConn.slowWriteTimer.Reset(10 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 milliseconds seems like a pretty arbitrary value. Since in most cases this timer shouldn't be needed I was wondering if we could use a higher value, e.g. 100 milliseconds. Does using a higher value create performance issues with batch queries? In any case I would use at least 15 milliseconds as this is the current minimum resolution of timers on Windows so that pgx behaves the same on all supported operating systems

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 milliseconds is basically an educated guess. A Write should only take as long as the syscall and memcpy to the OS outbound network buffer unless the buffer is full (which potentially is a block). It seemed like long enough for the normal case, but short enough not to kill performance if a block occurs.

And yes, large batch queries are the primary place this would be an issue.

I don't mind increasing it to 15ms, but I wonder Windows would still behave differently. Is that resolution per timer or is it some sort of system wide tick? i.e. Could we end up waiting almost 30ms in the worst case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 milliseconds is basically an educated guess. A Write should only take as long as the syscall and memcpy to the OS outbound network buffer unless the buffer is full (which potentially is a block). It seemed like long enough for the normal case, but short enough not to kill performance if a block occurs.

And yes, large batch queries are the primary place this would be an issue.

I don't mind increasing it to 15ms, but I wonder Windows would still behave differently. Is that resolution per timer or is it some sort of system wide tick? i.e. Could we end up waiting almost 30ms in the worst case?

I think this is the resolution per timer. Windows supports high resolution timers but the Go implementation don't use them. There is a WIP patch here

Copy link
Contributor

@codercms codercms Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackc Default timer resolution in Windows is 15.6 ms, so in theory the worst case would be awaiting for 15.6 ms.
Ref: https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/

But it's possible to set timer resolution to 1ms or even to 0.5 ms.
Ref: https://stackoverflow.com/a/22862989/2585154 , but this settings are system-wide, it will change timer resolution for entire system and affect all processes.

P.S. Some process may set timer resolution e.g. to 50ms, then the worst case would be awaiting for 50ms.

Copy link
Contributor

@codercms codercms Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackc here is more detailed information about system timers in Windows - http://windowstimestamp.com/description

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a change to 15ms and added docs explaining how 15ms was chosen.

@codercms
Copy link
Contributor

codercms commented Jun 5, 2023

@jackc @drakkan It looks good to me, I'll test it on real Windows machine

@brandur
Copy link

brandur commented Jun 5, 2023

I put the branch into our staging environment and tested it over a number of SSH connections. So far I haven't noticed any problems, which is good, because previously we were able to observe locking issues ~immediately.

So I'm hopeful it's working as expected! I'll follow up if we notice anything amiss. Thanks for the quick work!

@danmcgee-soda
Copy link
Contributor

This change looks great!

We are using a custom DialFunc to connect to Google Cloud SQL databases using the dialer from the cloudsqlconn library, and we were seeing a 100ms delay on the first query issued, each time an idle connection was pulled from the pool. This is because the custom dialer forced the code down the fake non-blocking path, instead of having direct access to the underlying net.Conn object. We had to work around it with this bit of code as an afterConnect function, to shorten the hardcoded 100ms timeout value that could only be reduced if you triggered certain code paths:

func afterConnect(ctx context.Context, c *pgx.Conn) error {
	/*
		Because of our use of the Google Cloud SQL proxy library to talk to the database in cloud environments,
		the switch to non-blocking I/O in pgx/v5 started causing us a 100ms delay on acquiring a connection
		from the connection pool. This doesn't happen locally because the connection is more "direct", and pgx
		can use real non-blocking I/O. However, with the proxy tunnel involved, it falls back to a fake non-blocking
		implementation, which by default waits 100ms to prove the connection hasn't been closed. See `CheckConn`
		and chase it all the way down to `nbconn.go`.

		We can work around this by doing the only operation that reduces this 100ms delay - a COPY FROM. The current
		fake non-blocking implementation will time and measure a read operation in this code path, and reduce the
		delay to 2x the measured delay. In a cloud environment, this should be 1-5 ms at most. While it would be
		great to get this back to 0ms, it is orders of magnitude better than 100ms.

		This one line of code took us about two days to figure out, and many other ideas were tried and ruled out
		along the way. Please don't change this unless you really know what you're doing.
	*/
	_, err := c.CopyFrom(ctx, pgx.Identifier{"schema_migrations"}, []string{"version"}, pgx.CopyFromRows([][]any{}))
	return err
}

I'm going to try and find some time to try a build with the changes from this branch, and see if it lets us remove our workaround. I'll report back with the results.

@danmcgee-soda
Copy link
Contributor

I'm going to try and find some time to try a build with the changes from this branch, and see if it lets us remove our workaround. I'll report back with the results.

I was able to run this branch in a Google Cloud K8S deployment today with our workaround removed, connecting via a custom dialer, and it worked great- no slowdown on first usage of the connection from the pool. Let me know if there is any additional testing we can do!

@codercms
Copy link
Contributor

codercms commented Jun 11, 2023

@jackc @drakkan I ran tests on a real Windows machine, pgconn_test fails on my machine with these errors (I use PostgreSQL 15 for tests):

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult
--- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult (0.24s)
=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_statement
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_statement (0.03s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_describe
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_describe (0.02s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/describe_exec
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/describe_exec (0.14s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/exec
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/exec (0.02s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/simple_protocol
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/simple_protocol (0.02s)

But it doesn't seem to be affected by this PR, so I guess it doesn't matter.
All other tests passed successfully:

go test ./...

...

FAIL    github.com/jackc/pgx/v5 10.766s
ok      github.com/jackc/pgx/v5/internal/iobufpool      0.226s
ok      github.com/jackc/pgx/v5/internal/pgio   0.042s
ok      github.com/jackc/pgx/v5/internal/pgmock 0.070s
ok      github.com/jackc/pgx/v5/internal/sanitize       0.041s
ok      github.com/jackc/pgx/v5/pgconn  9.859s
ok      github.com/jackc/pgx/v5/pgconn/internal/bgreader        1.585s
ok      github.com/jackc/pgx/v5/pgconn/internal/ctxwatch        4.008s
ok      github.com/jackc/pgx/v5/pgproto3        0.471s
ok      github.com/jackc/pgx/v5/pgtype  10.166s
ok      github.com/jackc/pgx/v5/pgtype/zeronull 0.713s
ok      github.com/jackc/pgx/v5/pgxpool 7.709s
ok      github.com/jackc/pgx/v5/stdlib  10.982s
ok      github.com/jackc/pgx/v5/tracelog        0.682s
FAIL

@drakkan
Copy link
Contributor

drakkan commented Jun 11, 2023

@jackc @drakkan I ran tests of PR on a real Windows machine, pgconn_test fails on my machine with these errors (I use PostgreSQL 15 for tests):

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult
--- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult (0.24s)
=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_statement
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_statement (0.03s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_describe
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/cache_describe (0.02s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/describe_exec
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/describe_exec (0.14s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/exec
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/exec (0.02s)

=== RUN   TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/simple_protocol
    conn_test.go:870: ERROR: type "uint64" does not exist (SQLSTATE 42704)
    --- FAIL: TestUnregisteredTypeUsableAsStringArgumentAndBaseResult/simple_protocol (0.02s)

But it doesn't seem to be affected by this pull request, so I guess it doesn't matter. All other tests passed successfully:

@codercms you need to create to uint64 domain like this. After executing the initialization sql all tests should pass

go test ./...

...

FAIL    github.com/jackc/pgx/v5 10.766s
ok      github.com/jackc/pgx/v5/internal/iobufpool      0.226s
ok      github.com/jackc/pgx/v5/internal/pgio   0.042s
ok      github.com/jackc/pgx/v5/internal/pgmock 0.070s
ok      github.com/jackc/pgx/v5/internal/sanitize       0.041s
ok      github.com/jackc/pgx/v5/pgconn  9.859s
ok      github.com/jackc/pgx/v5/pgconn/internal/bgreader        1.585s
ok      github.com/jackc/pgx/v5/pgconn/internal/ctxwatch        4.008s
ok      github.com/jackc/pgx/v5/pgproto3        0.471s
ok      github.com/jackc/pgx/v5/pgtype  10.166s
ok      github.com/jackc/pgx/v5/pgtype/zeronull 0.713s
ok      github.com/jackc/pgx/v5/pgxpool 7.709s
ok      github.com/jackc/pgx/v5/stdlib  10.982s
ok      github.com/jackc/pgx/v5/tracelog        0.682s
FAIL

@codercms
Copy link
Contributor

@drakkan thx, all tests passed:

go test ./...
?       github.com/jackc/pgx/v5/examples/chat   [no test files]
?       github.com/jackc/pgx/v5/examples/todo   [no test files]
?       github.com/jackc/pgx/v5/examples/url_shortener  [no test files]
?       github.com/jackc/pgx/v5/internal/anynil [no test files]
?       github.com/jackc/pgx/v5/internal/stmtcache      [no test files]
?       github.com/jackc/pgx/v5/pgproto3/example/pgfortune      [no test files]
?       github.com/jackc/pgx/v5/log/testingadapter      [no test files]
?       github.com/jackc/pgx/v5/pgxtest [no test files]
ok      github.com/jackc/pgx/v5 10.184s
ok      github.com/jackc/pgx/v5/internal/iobufpool      0.210s
ok      github.com/jackc/pgx/v5/internal/pgio   0.043s
ok      github.com/jackc/pgx/v5/internal/pgmock 0.067s
ok      github.com/jackc/pgx/v5/internal/sanitize       0.042s
ok      github.com/jackc/pgx/v5/pgconn  10.277s
ok      github.com/jackc/pgx/v5/pgconn/internal/bgreader        1.473s
ok      github.com/jackc/pgx/v5/pgconn/internal/ctxwatch        4.332s
ok      github.com/jackc/pgx/v5/pgproto3        0.610s
ok      github.com/jackc/pgx/v5/pgtype  10.857s
ok      github.com/jackc/pgx/v5/pgtype/zeronull 0.605s
ok      github.com/jackc/pgx/v5/pgxpool 7.655s
ok      github.com/jackc/pgx/v5/stdlib  10.346s
ok      github.com/jackc/pgx/v5/tracelog        0.519s

@codercms
Copy link
Contributor

codercms commented Jun 11, 2023

@jackc @drakkan I also installed pgx version from that PR into my real app, it seems to work fine, and sometimes I see even better query times.

@jackc
Copy link
Owner Author

jackc commented Jun 12, 2023

All the feedback has been positive so I'm going to merge this. Thanks for testing it everyone!

@jackc jackc merged commit 34eddf9 into master Jun 12, 2023
@hothanhloc68
Copy link

@jackc thank you!

I did a quick test

cat go.mod | grep pgx
	github.com/jackc/pgx/v5 v5.3.2-0.20230604013921-d9633608d0fe

sftpgo test cases pass on both Linux and Windows. I just use the database/sql driver so nothing complex.
Honestly this no raw syscalls approach makes me feel much more relaxed, this way pgx should work the same on any platform 😄

cc @codercms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants