Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race in params.Entries chan #75

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kpp
Copy link

@kpp kpp commented Aug 23, 2019

A solution from whyrusleeping#4 (comment)

Other solutions can be:

  1. Create new ServiceEntry in func ensureName if inp from inprogress is already sent
  2. Remove entries with value==inp from map inprogress before sending to channel

Before:

$ go test -race -v
=== RUN   TestServer_StartStop
--- PASS: TestServer_StartStop (0.00s)
=== RUN   TestServer_Lookup
==================
WARNING: DATA RACE
Write at 0x00c0001f6050 by goroutine 9:
  github.com/hashicorp/mdns.(*client).query()
      /home/humbug/ipfs/mdns/client.go:252 +0x17e3
  github.com/hashicorp/mdns.Query()
      /home/humbug/ipfs/mdns/client.go:86 +0xf4
  github.com/hashicorp/mdns.TestServer_Lookup()
      /home/humbug/ipfs/mdns/server_test.go:52 +0x2c9
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163

Previous read at 0x00c0001f6050 by goroutine 12:
  github.com/hashicorp/mdns.TestServer_Lookup.func1()
      /home/humbug/ipfs/mdns/server_test.go:33 +0x223

Goroutine 9 (running) created at:
  testing.(*T).Run()
      /usr/lib/go-1.12/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      /usr/lib/go-1.12/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163
  testing.runTests()
      /usr/lib/go-1.12/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      /usr/lib/go-1.12/src/testing/testing.go:1072 +0x2eb
  main.main()
      _testmain.go:64 +0x222

Goroutine 12 (finished) created at:
  github.com/hashicorp/mdns.TestServer_Lookup()
      /home/humbug/ipfs/mdns/server_test.go:27 +0x226
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163
==================
==================
WARNING: DATA RACE
Write at 0x00c0001f6058 by goroutine 9:
  github.com/hashicorp/mdns.(*client).query()
      /home/humbug/ipfs/mdns/client.go:257 +0xbdd
  github.com/hashicorp/mdns.Query()
      /home/humbug/ipfs/mdns/client.go:86 +0xf4
  github.com/hashicorp/mdns.TestServer_Lookup()
      /home/humbug/ipfs/mdns/server_test.go:52 +0x2c9
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163

Previous read at 0x00c0001f6058 by goroutine 12:
  github.com/hashicorp/mdns.TestServer_Lookup.func1()
      /home/humbug/ipfs/mdns/server_test.go:36 +0x245

Goroutine 9 (running) created at:
  testing.(*T).Run()
      /usr/lib/go-1.12/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      /usr/lib/go-1.12/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163
  testing.runTests()
      /usr/lib/go-1.12/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      /usr/lib/go-1.12/src/testing/testing.go:1072 +0x2eb
  main.main()
      _testmain.go:64 +0x222

Goroutine 12 (finished) created at:
  github.com/hashicorp/mdns.TestServer_Lookup()
      /home/humbug/ipfs/mdns/server_test.go:27 +0x226
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163
==================
2019/08/23 19:18:27 [INFO] mdns: Closing client {0xc0000a0050 0xc0000a0058 0xc0000a0060 0xc0000a0068 1 0xc000088480}
--- FAIL: TestServer_Lookup (0.05s)
    testing.go:809: race detected during execution of test
=== RUN   TestNewMDNSService_BadParams
--- PASS: TestNewMDNSService_BadParams (0.00s)
=== RUN   TestMDNSService_BadAddr
--- PASS: TestMDNSService_BadAddr (0.00s)
=== RUN   TestMDNSService_ServiceAddr
--- PASS: TestMDNSService_ServiceAddr (0.00s)
=== RUN   TestMDNSService_InstanceAddr_ANY
--- PASS: TestMDNSService_InstanceAddr_ANY (0.00s)
=== RUN   TestMDNSService_InstanceAddr_SRV
--- PASS: TestMDNSService_InstanceAddr_SRV (0.00s)
=== RUN   TestMDNSService_InstanceAddr_A
--- PASS: TestMDNSService_InstanceAddr_A (0.00s)
=== RUN   TestMDNSService_InstanceAddr_AAAA
--- PASS: TestMDNSService_InstanceAddr_AAAA (0.00s)
=== RUN   TestMDNSService_InstanceAddr_TXT
--- PASS: TestMDNSService_InstanceAddr_TXT (0.00s)
=== RUN   TestMDNSService_HostNameQuery
--- PASS: TestMDNSService_HostNameQuery (0.00s)
=== RUN   TestMDNSService_serviceEnum_PTR
--- PASS: TestMDNSService_serviceEnum_PTR (0.00s)
FAIL
exit status 1
FAIL	github.com/hashicorp/mdns	0.068s

After:

$ go test -race -v
=== RUN   TestServer_StartStop
--- PASS: TestServer_StartStop (0.00s)
=== RUN   TestServer_Lookup
2019/08/23 19:16:50 [INFO] mdns: Closing client {0xc0000a0050 0xc0000a0058 0xc0000a0060 0xc0000a0068 1 0xc000088480}
--- PASS: TestServer_Lookup (0.05s)
=== RUN   TestNewMDNSService_BadParams
--- PASS: TestNewMDNSService_BadParams (0.00s)
=== RUN   TestMDNSService_BadAddr
--- PASS: TestMDNSService_BadAddr (0.00s)
=== RUN   TestMDNSService_ServiceAddr
--- PASS: TestMDNSService_ServiceAddr (0.00s)
=== RUN   TestMDNSService_InstanceAddr_ANY
--- PASS: TestMDNSService_InstanceAddr_ANY (0.00s)
=== RUN   TestMDNSService_InstanceAddr_SRV
--- PASS: TestMDNSService_InstanceAddr_SRV (0.00s)
=== RUN   TestMDNSService_InstanceAddr_A
--- PASS: TestMDNSService_InstanceAddr_A (0.00s)
=== RUN   TestMDNSService_InstanceAddr_AAAA
--- PASS: TestMDNSService_InstanceAddr_AAAA (0.00s)
=== RUN   TestMDNSService_InstanceAddr_TXT
--- PASS: TestMDNSService_InstanceAddr_TXT (0.00s)
=== RUN   TestMDNSService_HostNameQuery
--- PASS: TestMDNSService_HostNameQuery (0.00s)
=== RUN   TestMDNSService_serviceEnum_PTR
--- PASS: TestMDNSService_serviceEnum_PTR (0.00s)
PASS
ok  	github.com/hashicorp/mdns	1.068s

@hashicorp-cla
Copy link

hashicorp-cla commented Aug 23, 2019

CLA assistant check
All committers have signed the CLA.

@kpp kpp force-pushed the hashicorp_fix_data_race branch from 51bc674 to d7f8d26 Compare August 24, 2019 04:09
@jefferai
Copy link
Member

According to whyrusleeping#4 (comment) this isn't the right fix for this problem?

@jefferai
Copy link
Member

By copying the other thread won't be able to reference sent. It seems like the right solution would be to replace the bool with an atomic check-and-set.

@kpp
Copy link
Author

kpp commented Aug 26, 2019

It seems like the right solution would be to replace the bool with an atomic check-and-set.

Well no. Other fields are affected by data race too. So you either have to use mutex to access fields (like in whyrusleeping#4) or copy the value and send it.

@jefferai
Copy link
Member

Only one field is called out in this PR though. If other fields are affected too, shouldn't they also be addressed?

@kpp
Copy link
Author

kpp commented Aug 26, 2019

If other fields are affected too, shouldn't they also be addressed?

No because I copy the value and send it to another thread. So having sent field is useless but the other fields do not have to be protected by mutex.

@jefferai
Copy link
Member

What's wrong with protecting the fields with a mutex like they did in whyrusleeping? I don't really know the code, but that seems to have less potential knock on effects than changing whether the original object or a copy are sent. Especially given some fields (e.g. InfoFields, and potentially other fields in the future) are reference values.

@kpp
Copy link
Author

kpp commented Aug 26, 2019

What's wrong with protecting the fields with a mutex like they did in whyrusleeping?

You have to ensure the end user to lock the mutex (via protected fields and accessors). It will break the existent code for your users.

but that seems to have less potential knock on effects than changing whether the original object or a copy are sent. Especially given some fields (e.g. InfoFields, and potentially other fields in the future) are reference values.

It will have an opposite effect. You store InfoFields* nowhere else but in the heap however you store pointers to inp in both params.Entries and inprogress which live in different threads.

Please read -race traces I provided you carefully and try to track data races on your own.

@jefferai
Copy link
Member

You have to ensure the end user to lock the mutex (via protected fields and accessors). It will break the existent code for your users.

Will it? At worst if they are accessing things directly they will be no worse off than they are now. At best they'll use the mutexes.

@kpp
Copy link
Author

kpp commented Aug 27, 2019

At best they'll use the mutexes.

Or in my case they don't need mutexes at all. Isn't it perfect?

@jefferai
Copy link
Member

Depends on the knock-on effects of using a copy instead of the original values, which I'm unclear on.

@kpp
Copy link
Author

kpp commented Aug 27, 2019

https://golang.org/ref/mem#tmp_6

the assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine.

Which means that no matter what logic you have in your code having data race makes it incorrect. Fix the data race. And then fix your logic.

@jefferai
Copy link
Member

Sorry, I'm not following what you mean there.

@kpp
Copy link
Author

kpp commented Aug 27, 2019

I am telling you that a code without data race is better than a code with a data race.

@jefferai
Copy link
Member

We can agree on that, yes.

@kpp
Copy link
Author

kpp commented Aug 27, 2019

So let's merge this PR.

@jefferai
Copy link
Member

Sorry, I can't. I don't know the ramifications of making that change. Someone that knows the library better than me would have to evaluate that. Or it can be fixed in a way that doesn't change the object being sent down the channel, in which case I'd be comfortable merging.

@ecksun
Copy link

ecksun commented May 3, 2021

I don't know the ramifications of making that change

Is not the only ramification that clients will get multiple messages on the channel for the same discovered service?

That might be considered a breaking change, but I don't think this can be solved without making a breaking change.

Or it can be fixed in a way that doesn't change the object being sent down the channel

Are you referring to the sent field? Note that its not exported.

If you still would like to keep it, would be it be acceptable to just set it to true unconditionally? Thats what a consumer of the API would see currently anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants