-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#7514 Fix channel deadlock in meta sync fetcher #7933
base: main
Are you sure you want to change the base?
Conversation
@fpetkovski Could you please take a look when you have some time? |
7d6817c
to
03a7c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way we could add a unit test here? I'm afraid that otherwise we will have this problem again
@GiedriusS Sure, I will try to implement a test which simulates store errors. it should be running into the deadlock with the original code, but not with the patch. I hopefully can have something by the end of the week. |
e9b250d
to
3f7b3aa
Compare
@GiedriusS I created a PR with just the test case to show the error by itself with the code on main: https://github.com/thanos-io/thanos/pull/7966/files The same test case is passing with the provided fix. |
94bc785
to
2562632
Compare
pkg/block/fetcher.go
Outdated
multiError = multierror.Append(eg.Wait(), multiError) | ||
|
||
if multiError != nil { | ||
if multiErrorUnwrap, ok := err.(interface{ Unwrap() []error }); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always succeed unless we change the library, right? Perhaps in the !ok
clause we could put a panic("BUG: multierr doesn't implement unwrap")
to make sure that in the future we would notice this part if suddenly err
wouldn't implement this interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the panic with a comment. Seeing that Unwrap is basically the core feature that go-multierror adds to the mix this should never happen. The panic therefore would be absolutely warranted if it still happens. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into a bunch of issues mostly due to the panic. I then realised that this could be so much simpler, and I refactored it so that the panic is no longer necessary, it would just give a compile error if something major changed in the imported package.
I also realised that my test case's success condition was incorrect. I cannot assert on no errors occuring, since the whole point is that simulated errors are occurring. What I should (and now I am) testing for is that I get exactly the expected number of errors. If there was a deadlock, it would be missing some of them.
I ended up squashing the entire change-set as I forgot to sign the commits. ( :/ )
2562632
to
b14bf1c
Compare
Squashed and rebased the reviewed version, added the new changes in a new commit. |
5c3fe50
to
2a31948
Compare
Errors no longer take out the thread with them, instead are collected into a multierror. Signed-off-by: Miklós Földényi <[email protected]>
Rebased on main. Please let me know if there is anything else for this to be merged! |
Worker threads no longer exit on error, instead they collect them into a multiError and keep working.
Fix proposal for #7514
Changes
Changed concurrent worker behavior in case of error in Exists call, they no longer exit on error, instead collect errors in a multiError and continue working.
Verification
Added unit test simulating errors in the objstore. It is sending a sufficient amount of ULIDs to check and simulates failure on all of them. In the old scenario this would have resulted in a deadlock.