-
Notifications
You must be signed in to change notification settings - Fork 31
{cmd,statsd,writer}: fix race conditions in tests #372
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ones on the writers make sense but for the ServiceMapper I have some qualms with adding locks to production code unnecessarily.
cmd/trace-agent/service_mapper.go
Outdated
cacheEntry, ok := s.cache[service] | ||
s.mu.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like adding all these locks just for the tests. Locks may have a significant impact in performance and in production this code would be perfectly thread-safe since all changes/accesses to cache are done by the same coroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. But I think we should definitely re-enable the -race
flag for tests. We could, of course see how we can improve the tests so that these locks are not necessary in a subsequent PR imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but I'd just remove the intermediate length checks from the test and just do a final one after the Stop (that should not race I think?) instead of adding the locks.
cmd/trace-agent/service_mapper.go
Outdated
@@ -56,7 +58,9 @@ func (s *ServiceMapper) Run() { | |||
case metadata := <-s.in: | |||
s.update(metadata) | |||
case <-telemetryTicker.C: | |||
s.mu.RLock() | |||
log.Infof("total number of tracked services: %d", len(s.cache)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to use cacheSize
here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to take @AlexJF suggestion from above and remove the locks, and the checks. This structure is already thread-safe (considering nobody accesses private variables within the same package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right very good point. +1 with adjusting test to run with the same thread-safety as the actual code.
Also, maybe add comments in the code when we do the assumption that a given struct will be executed in a thread-safe environment? Might be too much though.
Removed locks and redundant checks. A separate unit test can be made separately for the cache if needed. PTAL |
@LotharSee happy? |
Running the tests with
go test -race /....
reveals several race conditions. This PR fixes them and enables the flag. It slows CI down a bit but adds more thread-safety.