Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sync/semaphore: skipping acquisition if the context is done is a breaking change #67723

Open
baganokodo2022 opened this issue May 30, 2024 · 6 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@baganokodo2022
Copy link

Go version

go version go1.21.1 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/xinyu.liu/Library/Caches/go-build'
GOENV='/Users/xinyu.liu/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS='-count=1'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/xinyu.liu/go/pkg/mod'
GONOPROXY=''
GONOSUMDB='github.cbhq.net'
GOOS='darwin'
GOPATH='/Users/xinyu.liu/go'
GOPRIVATE=''
GOPROXY='https://gomodules.cbhq.net/'
GOROOT='/opt/homebrew/opt/go/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/opt/go/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/xinyu.liu/src/streaming-sdk/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/hh/kbt1yk9d3y56yxzxr382jth80000gn/T/go-build912487987=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

The PR below introduced a breaking change - a regression, and broke our services deployed in production environments.
golang/sync@14be23e

What did you see happen?

We explicitly make use of a zero-timeout context,

ctx, cancel := context.WithTimeout(context.Background(), 0)
defer cancel()

and pass the ctx to sem.Acquire(ctx, 1) to minimize the wait time on a critical path. The goal is to avoid latencies causing by the wait timeouts due to upstream back pressure.

For instance, we use sem to control in-flight messages sent to Kafka. When Kafka brokers experience back pressure, we would like the call to sem.Acquire failed immediately to unblock the main thread, otherwise, it would introduce extra latencies to the critical, user-facing threads.

With this change in the PR, ctx with 0 timeout immediately returns a context cancelled error from sem.Acquire(ctx, 1) even if the semaphore still has a full capacity.

What did you expect to see?

Passing ctx with 0 timeout to sem.Acquire(ctx, int) should successfully acquire the semaphore counts as long as there are still capacity immediately available on the semaphore.

baganokodo2022 referenced this issue in golang/sync May 30, 2024
When acquiring from a semaphore could proceed without contention, the
previous behavior was to always do so, even when the provided context
was done. This was the documented behavior, but it could lead to
confusion. It isn't much more expensive to check the context error, so
cancel acquisition if it's done.

Fixes golang/go#63615.

goos: linux
goarch: amd64
pkg: golang.org/x/sync/semaphore
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
                                        │  old.bench  │             new.bench              │
                                        │   sec/op    │   sec/op     vs base               │
AcquireSeq/Weighted-acquire-1-1-1-12      26.45n ± 2%   27.25n ± 3%  +3.04% (p=0.001 n=20)
AcquireSeq/Weighted-acquire-2-1-1-12      26.96n ± 1%   27.12n ± 1%       ~ (p=0.104 n=20)
AcquireSeq/Weighted-acquire-16-1-1-12     26.07n ± 3%   27.48n ± 1%  +5.45% (p=0.000 n=20)
AcquireSeq/Weighted-acquire-128-1-1-12    26.19n ± 2%   27.24n ± 1%  +4.01% (p=0.000 n=20)
AcquireSeq/Weighted-acquire-2-2-1-12      25.61n ± 1%   25.99n ± 2%       ~ (p=0.066 n=20)
AcquireSeq/Weighted-acquire-16-2-8-12     209.6n ± 2%   211.0n ± 3%       ~ (p=0.280 n=20)
AcquireSeq/Weighted-acquire-128-2-64-12   1.669µ ± 1%   1.721µ ± 2%  +3.09% (p=0.000 n=20)
AcquireSeq/Weighted-acquire-2-1-2-12      51.08n ± 1%   53.03n ± 2%  +3.82% (p=0.000 n=20)
AcquireSeq/Weighted-acquire-16-8-2-12     52.48n ± 2%   53.66n ± 2%  +2.26% (p=0.028 n=20)
AcquireSeq/Weighted-acquire-128-64-2-12   52.27n ± 1%   53.71n ± 2%  +2.75% (p=0.000 n=20)
geomean                                   60.06n        61.69n       +2.71%

Change-Id: I0ae1a0bb6c027461ac1a9ee71c51efd8427ab308
Reviewed-on: https://go-review.googlesource.com/c/sync/+/536275
Auto-Submit: Bryan Mills <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@ianlancetaylor ianlancetaylor changed the title import/path: golang/sync/semaphore/semaphore.go breaking change in v0.7.0 x/sync: skipping acquisition if the context is done is a breaking change May 30, 2024
@gopherbot gopherbot added this to the Unreleased milestone May 30, 2024
@ianlancetaylor
Copy link
Member

The change in question is https://go.dev/cl/536275, written to address issue #63615. CC @szuecs @zephyrtronium

@ianlancetaylor
Copy link
Member

From the description of your code, it sounds like you should be using the TryAcquire method rather than the Acquire method.

@seankhliao seankhliao changed the title x/sync: skipping acquisition if the context is done is a breaking change x/sync/semaphore: skipping acquisition if the context is done is a breaking change May 30, 2024
@baganokodo2022
Copy link
Author

baganokodo2022 commented May 30, 2024

In that case, I need to figure out the timeout on the context, and call TryAcquire when the timeout is set to zero, and call Acquire when the timeout is not zero.

Since the timeout is set by client programs, it doesn't seem to be viable. I have to say, the vast majority of our clients don't set zero timeout on the context, while there are still very few performing fire-and-forget message publishing and expecting minimum latencies on back pressure.

Thoughts?

@ianlancetaylor
Copy link
Member

You can call the context.Context.Deadline method to find out whether the context has timed out.

@ruyi789
Copy link

ruyi789 commented May 31, 2024

The 0 timeout that triggers done depends only on the context, so you should fix the context

@mknyszek mknyszek added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jun 3, 2024
@gabyhelp gabyhelp added the Bug label Jan 4, 2025
@neild
Copy link
Contributor

neild commented Jan 4, 2025

I think there's a reasonable argument to be made that CL 536275 was not a backwards-compatible change.

There are three things a function can do when passed a pre-canceled context: Succeed, fail, or pick one at random. Weighted.Acquire was documented as possibly doing any of these ("If ctx is already done, Acquire may still succeed..."), but the implementation was to consistently succeed. CL 536275 changed it to consistently fail. The new behavior is consistent with the documentation, but so was the previous behavior. I also don't think there's a clear right choice on always-fail vs. always-succeed here.

In general, we try to avoid making substantive changes that may cause existing programs to break, even if the documentation permits the change, except when the change is a clear improvement. I'm not convinced this change meets that bar.

That said, the change was made about nine months ago. Reverting it at this time would be just more churn and opportunity for breakage.

@jba jba added BugReport Issues describing a possible bug in the Go implementation. and removed Bug labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

8 participants