Flaky cache tests #993
Comments
@RolandasRazma @AnthonyMDev Would either of you have a second to take a look at this? Obviously flakiness is obnoxious to diagnose but we're getting intermittent and weird failures on a bunch of tests on CI. |
Yes, it's probably related. I don't like that tight loop |
It's not failing consistently on one particular test though - that's what makes this so obnoxious to try to figure out |
If the tests are running in parallel, maybe slowing them down is enough to knock them off |
I don't think that's a great solution - we do want the tests to reflect potentially very high loads, and if we just slow them down in order to make them pass, we're absolutely going to miss something. |
This is a good example of some of the failures we're seeing: There's no change except to our documentation, but it's randomly having a bunch of tests time out. |
If that test is failing intermittently, it almost certainly means that there are bugs in the locking. That's what it was designed to test. It has to loop a large number of times to trigger the edge case of the deadlock. If you reduce the loop to 10, the test becomes useless. If you reduce it to 500, it will maybe sometimes deadlock, just not quite as often. Because it's a race condition we are testing for, the test will not ALWAYS fail if the bug exists. It's near impossible to make that happen. But that test shouldn't be failing randomly. It most likely indicates something is wrong.
|
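For illustration, here's a minimal sketch (not the actual Apollo test) of the kind of tight-loop concurrency test being described: hammer a lock-protected cache from many threads and rely on the test timing out if the locking ever deadlocks. `SimpleCache`, its recursive lock, and the test names are hypothetical stand-ins for the real in-memory cache and its tests.

```swift
import Foundation
import XCTest

// Hypothetical stand-in for the in-memory cache: a dictionary guarded by a recursive lock.
final class SimpleCache {
    private let lock = NSRecursiveLock()
    private var storage: [String: String] = [:]

    func merge(_ records: [String: String]) {
        lock.lock()
        defer { lock.unlock() }
        storage.merge(records) { _, new in new }
    }

    func read(_ key: String) -> String? {
        lock.lock()
        defer { lock.unlock() }
        return storage[key]
    }
}

final class CacheConcurrencyTests: XCTestCase {
    func testConcurrentReadsAndWritesDoNotDeadlock() {
        let cache = SimpleCache()
        // A large iteration count is what actually surfaces the race;
        // dropping it to 10 would let the test pass without proving anything.
        let iterations = 1000

        // Each iteration runs on the concurrent global pool. If the locking is
        // broken, some iteration eventually deadlocks and the test times out.
        DispatchQueue.concurrentPerform(iterations: iterations) { i in
            cache.merge(["key-\(i)": "value-\(i)"])
            _ = cache.read("key-\(i)")
        }
    }
}
```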
Does seem like we're getting some repeated crashes. Going to keep updating links as these repeat:
And also some one-offs (so far):
|
I would love to help look into this, but I am slammed at work this week. I'll be able to look into this more next week. But I'm pretty confident that if that test is failing or crashing, it is indicative of a problem in the code not just in the test... I could be wrong, but I'd have to look at it next week if this isn't resolved by then. |
@AnthonyMDev No worries, totally understand! I'm just throwing my notes up to try and see if it helps anyone think of something, because I'm kind of mystified by this. The recursive lock was only added to the in-memory cache, but this test was added to the SQLite tests as well since it's in the shared test suite. It's crashing intermittently in the same place on iOS and tvOS when not using SQLite, both on CI and locally. |
Yeah, this should be in … This comment from before might give you something to think about to solve this? This could be due to the way the … |
Oof, yeah that's a bit of a mess. Definitely seems like something weird is going on with the … I've tried the solution proffered in this article about using dispatch groups, but it doesn't appear to have made any difference - still seeing that same wraparound error. Will keep futzing with this. |
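As a rough sketch of the dispatch-group approach mentioned above (the article's exact code isn't reproduced here): kick off a batch of asynchronous operations, block with a timeout until every completion handler has fired, and only then assert. `performAsyncFetch` is a hypothetical placeholder for whatever async cache or network call the real tests exercise.

```swift
import Foundation
import XCTest

// Hypothetical async operation standing in for the real fetch being tested.
func performAsyncFetch(index: Int, completion: @escaping (Result<Int, Error>) -> Void) {
    DispatchQueue.global().async {
        completion(.success(index))
    }
}

final class DispatchGroupTests: XCTestCase {
    func testAllFetchesComplete() {
        let group = DispatchGroup()
        let resultsQueue = DispatchQueue(label: "results") // serializes access to `results`
        var results: [Int] = []

        for i in 0..<100 {
            group.enter()
            performAsyncFetch(index: i) { result in
                if case .success(let value) = result {
                    resultsQueue.sync { results.append(value) }
                }
                group.leave()
            }
        }

        // Fail loudly instead of hanging forever if something never completes.
        if case .timedOut = group.wait(timeout: .now() + 10) {
            XCTFail("Timed out waiting for all fetches to complete")
        }

        let finalResults = resultsQueue.sync { results }
        XCTAssertEqual(finalResults.count, 100)
    }
}
```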
Yeah, I really think that the … |
It looks like … I'm not sure I see where the loader is accessing the store directly, though. |
Looking at the master history (https://github.com/apollographql/apollo-ios/commits/master), it looks like it started after the weak delegate change was merged. Are we looking in the correct place? |
Those are builds that are failing after the merge to master - builds that were failing prior to the merge started earlier, but you're correct that it does seem to have increased the frequency. I'll take a look. |
The flakiness has been resolved. Underlying issues are now being tracked in #1011. Closing this out. |
We're seeing some really flaky cache tests, particularly since #552 was merged - it may or may not be related to the changes in that PR.