Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use correct mutex to guard confirms.published #240

Merged
merged 3 commits into from
Jan 31, 2024

Conversation

hjr265
Copy link
Contributor

@hjr265 hjr265 commented Jan 23, 2024

The confirms type uses the publishedMut to guard access to the published field:

amqp091-go/confirms.go

Lines 43 to 47 in a6fa7f7

c.publishedMut.Lock()
defer c.publishedMut.Unlock()
c.published++
return c.deferredConfirmations.Add(c.published)

amqp091-go/confirms.go

Lines 53 to 56 in a6fa7f7

c.publishedMut.Lock()
defer c.publishedMut.Unlock()
c.deferredConfirmations.remove(c.published)
c.published--

etc.

However, the Channel type uses the mutex m (on the confirms type) instead:

amqp091-go/channel.go

Lines 1829 to 1832 in a6fa7f7

ch.confirms.m.Lock()
defer ch.confirms.m.Unlock()
return ch.confirms.published + 1

This can lead to race if the GetNextPublishSeqNo function is used while other functions on the confirms type is used concurrently.

@hjr265
Copy link
Contributor Author

hjr265 commented Jan 23, 2024

I have added a test which, if run without the fix, results in data race warning:

» make tests GO_TEST_FLAGS='-run ^TestIntegrationGetNextPublishSeqNoRace$$ -count 1'
go test -race -v -tags integration -run ^TestIntegrationGetNextPublishSeqNoRace$ -count 1
environment variable envAMQPURLName undefined or empty, using default: "amqp://guest:[email protected]:5672/"
=== RUN   TestIntegrationGetNextPublishSeqNoRace
==================
WARNING: DATA RACE
Write at 0x00c000096260 by goroutine 12:
  github.com/rabbitmq/amqp091-go.(*confirms).publish()
      /home/hjr265/Projects/contrib/amqp091-go/confirms.go:46 +0xea
  github.com/rabbitmq/amqp091-go.(*Channel).PublishWithDeferredConfirmWithContext()
      /home/hjr265/Projects/contrib/amqp091-go/channel.go:1564 +0x18a
  github.com/rabbitmq/amqp091-go.(*Channel).PublishWithContext()
      /home/hjr265/Projects/contrib/amqp091-go/channel.go:1528 +0x106
  github.com/rabbitmq/amqp091-go.TestIntegrationGetNextPublishSeqNoRace.func2()
      /home/hjr265/Projects/contrib/amqp091-go/integration_test.go:2065 +0x107

Previous read at 0x00c000096260 by goroutine 11:
  github.com/rabbitmq/amqp091-go.(*Channel).GetNextPublishSeqNo()
      /home/hjr265/Projects/contrib/amqp091-go/channel.go:1832 +0xfc
  github.com/rabbitmq/amqp091-go.TestIntegrationGetNextPublishSeqNoRace.func1()
      /home/hjr265/Projects/contrib/amqp091-go/integration_test.go:2057 +0xb9

Goroutine 12 (running) created at:
  github.com/rabbitmq/amqp091-go.TestIntegrationGetNextPublishSeqNoRace()
      /home/hjr265/Projects/contrib/amqp091-go/integration_test.go:2063 +0x5f5
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1595 +0x261
  testing.(*T).Run.func1()
      /usr/lib/go/src/testing/testing.go:1648 +0x44

Goroutine 11 (finished) created at:
  github.com/rabbitmq/amqp091-go.TestIntegrationGetNextPublishSeqNoRace()
      /home/hjr265/Projects/contrib/amqp091-go/integration_test.go:2055 +0x4e4
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1595 +0x261
  testing.(*T).Run.func1()
      /usr/lib/go/src/testing/testing.go:1648 +0x44
==================
    testing.go:1465: race detected during execution of test
--- FAIL: TestIntegrationGetNextPublishSeqNoRace (0.01s)
=== NAME  
    testing.go:1465: race detected during execution of test
FAIL
exit status 1
FAIL    github.com/rabbitmq/amqp091-go  0.023s
make: *** [Makefile:20: tests] Error 1

With the fix, there is no data race.

» make tests GO_TEST_FLAGS='-run ^TestIntegrationGetNextPublishSeqNoRace$$ -count 1'
go test -race -v -tags integration -run ^TestIntegrationGetNextPublishSeqNoRace$ -count 1
environment variable envAMQPURLName undefined or empty, using default: "amqp://guest:[email protected]:5672/"
=== RUN   TestIntegrationGetNextPublishSeqNoRace
--- PASS: TestIntegrationGetNextPublishSeqNoRace (0.01s)
PASS
ok      github.com/rabbitmq/amqp091-go  1.022s

@hjr265 hjr265 changed the title Use correct mutex to guard confirms published Use correct mutex to guard confirms.published Jan 23, 2024
@lukebakken lukebakken self-assigned this Jan 23, 2024
@lukebakken lukebakken added this to the 1.9.1 milestone Jan 23, 2024
@lukebakken
Copy link
Contributor

Thank you very much! If you have time to review the use of these and other mutexes for similar errors, we would appreciate it. Obviously, this is a subtle bug 😸

@hjr265
Copy link
Contributor Author

hjr265 commented Jan 23, 2024

@lukebakken It was a subtle bug indeed. I came across it while investing an issue I am facing on production. The only clues I had were that the production bug started happening when I implemented Publisher Confirms and the code was publishing a lot of messages in bursts.

I will keep an eye out for more. 🙂

@hjr265 hjr265 force-pushed the correct-confirms-published-mutex branch from ddb7218 to 977c4d2 Compare January 23, 2024 17:51
@Zerpet Zerpet self-assigned this Jan 31, 2024
@Zerpet
Copy link
Contributor

Zerpet commented Jan 31, 2024

Thank you for fixing this! I've approved the CI run and I'll merge after CI passes 👍

@Zerpet Zerpet merged commit 4a009c7 into rabbitmq:main Jan 31, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants