Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Race with Incorrect Username or Password #2382

Closed
PatrickDuncan opened this issue Nov 9, 2022 · 4 comments · Fixed by #2698
Closed

Data Race with Incorrect Username or Password #2382

PatrickDuncan opened this issue Nov 9, 2022 · 4 comments · Fixed by #2698
Assignees
Labels
bug needs-investigation Issues that require followup from maintainers

Comments

@PatrickDuncan
Copy link
Member

PatrickDuncan commented Nov 9, 2022

I tracked down the root cause to this PR that was merged in 1.34.0 https://github.com/Shopify/sarama/pull/2234/files

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
v1.37.2 2.13-2.7.1 go version go1.18.2 darwin/amd64
Configuration

What configuration values are you using for Sarama and Kafka?

config.Producer.Flush.Bytes = 0
config.Producer.Flush.Frequency = 5 * time.Second
config.Producer.Flush.MaxMessages = 0
config.Producer.Idempotent = true
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Retry.Max = 3
config.Producer.Retry.Backoff = 100 * time.Millisecond
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
Logs
kafka server: SASL Authentication failed: Authentication failed: Invalid username or password
==================
WARNING: DATA RACE
Write at 0x00c00012e768 by goroutine 74:
  github.com/Shopify/sarama.(*Broker).Open.func1()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:248 +0x1a44
  github.com/Shopify/sarama.withRecover()
      /go/pkg/mod/github.com/!shopify/[email protected]/utils.go:43 +0x48
  github.com/Shopify/sarama.(*Broker).Open.func2()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:178 +0x39

Previous read at 0x00c00012e768 by goroutine 72:
  github.com/Shopify/sarama.(*Broker).responseReceiver()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:1160 +0x555
  github.com/Shopify/sarama.(*Broker).responseReceiver-fm()
      <autogenerated>:1 +0x39
  github.com/Shopify/sarama.withRecover()
      /go/pkg/mod/github.com/!shopify/[email protected]/utils.go:43 +0x48
  github.com/Shopify/sarama.(*Broker).Open.func1.2()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:251 +0x39

Goroutine 74 (running) created at:
  github.com/Shopify/sarama.(*Broker).Open()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:178 +0x566
  github.com/Shopify/sarama.(*client).LeastLoadedBroker()
      /go/pkg/mod/github.com/!shopify/[email protected]/client.go:776 +0x1a7
  github.com/Shopify/sarama.(*transactionManager).initProducerId.func2()
      /go/pkg/mod/github.com/!shopify/[email protected]/transaction_manager.go:514 +0xe5
  github.com/Shopify/sarama.(*transactionManager).initProducerId.func1()
      /go/pkg/mod/github.com/!shopify/[email protected]/transaction_manager.go:496 +0x139
  github.com/Shopify/sarama.(*transactionManager).initProducerId()
      /go/pkg/mod/github.com/!shopify/[email protected]/transaction_manager.go:508 +0x7aa
  github.com/Shopify/sarama.newTransactionManager()
      /go/pkg/mod/github.com/!shopify/[email protected]/transaction_manager.go:872 +0x3cb
  github.com/Shopify/sarama.newAsyncProducer()
      /go/pkg/mod/github.com/!shopify/[email protected]/async_producer.go:117 +0x92
  github.com/Shopify/sarama.NewAsyncProducer()
      /go/pkg/mod/github.com/!shopify/[email protected]/async_producer.go:99 +0x84

Goroutine 72 (finished) created at:
  github.com/Shopify/sarama.(*Broker).Open.func1()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:251 +0x1c09
  github.com/Shopify/sarama.withRecover()
      /go/pkg/mod/github.com/!shopify/[email protected]/utils.go:43 +0x48
  github.com/Shopify/sarama.(*Broker).Open.func2()
      /go/pkg/mod/github.com/!shopify/[email protected]/broker.go:178 +0x39
Problem Description

Our unit tests validate the behaviour of invalid usernames and passwords. It's running into a data race with go test -race. Which was not the case with v1.30.0

@PatrickDuncan PatrickDuncan changed the title v1.37.2 Data Race with Incorrect Username or Password Data Race with Incorrect Username or Password Nov 11, 2022
@github-actions

This comment was marked as outdated.

@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Jul 18, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 17, 2023
@dnwe dnwe reopened this Aug 17, 2023
@dnwe dnwe added needs-investigation Issues that require followup from maintainers and removed stale Issues and pull requests without any recent activity labels Aug 17, 2023
@jeremyroberts0
Copy link

I can reproduce this in v1.41.2

@dnwe dnwe added the bug label Oct 11, 2023
@prestona prestona self-assigned this Oct 30, 2023
@prestona
Copy link
Member

The lines of code identified by the race detector are:

b.done = make(chan bool)

And

sarama/broker.go

Line 1160 in 610514e

close(b.done)

So it seems the problem relates to the Broker's done channel.

I was initially puzzled as to how this could be a race, as the go-routine that reads from the done channel (by closing it) is launched from the go-routine that initializes the value of the done channel. From The Go Memory Model:

The go statement that starts a new goroutine is synchronized before the start of the goroutine's execution.

It then occurred to me that the instance of the Broker struct could be being re-used. So closing the done channel as a result of a failed first connect request could be run at the same time as a second call to Broker.Open made on the same instance of the Broker struct.

If I hack the broker code to always go down the V1 SASL authentication failure path then the race can be re-produced with a short test-case:

func TestRace(t *testing.T) {
	b := NewBroker("localhost:9092")
	cfg := NewConfig()
	_ = b.Open(cfg)

	// Block until first open completes
	for atomic.LoadInt32(&b.opened) == 1 {
	}

	_ = b.Open(cfg)
}

I suspect the fix is to update the SASL authentication failure path to try and read from the done channel after closing the responses channel - mimicking the way that Broker.Close() ensures the go-routine running responseReceiver() completes before proceeding. However, first I'd like to try and write a test-case that re-produces this behavior using the MockBroker.

@prestona
Copy link
Member

Frustratingly, I'm unable to write a testcase that consistently re-produces the issue. If I add a milliseconds sleep just before the call to close(b.done) here, then I can reliable re-produce in a test. I wonder if the data race occurs very infrequently, or perhaps is affected by system load?

With the millisecond sleep hacked in to the code, I've been able to confirm that adding a <-done just after the call to close(b.responses) here reliably prevents the data race from occurring.

prestona added a commit to prestona/sarama that referenced this issue Oct 31, 2023
The underlying case was not waiting for the goroutine running the
`responseReceiver()` method to fully complete if SASL authentication
failed. This created a window where a further call to `Broker.Open()`
could overwrite the `Broker.done` channel value while the goroutine
still running `responseReceiver()` was trying to close the same channel.

Fixes: IBM#2382
Signed-off-by: Adrian Preston <[email protected]>
@dnwe dnwe closed this as completed in #2698 Nov 1, 2023
dnwe pushed a commit that referenced this issue Nov 1, 2023
The underlying case was not waiting for the goroutine running the
`responseReceiver()` method to fully complete if SASL authentication
failed. This created a window where a further call to `Broker.Open()`
could overwrite the `Broker.done` channel value while the goroutine
still running `responseReceiver()` was trying to close the same channel.

Fixes: #2382

Signed-off-by: Adrian Preston <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs-investigation Issues that require followup from maintainers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants