Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tests less flaky on CI #716

Merged
merged 3 commits into from
Apr 24, 2023
Merged

Make tests less flaky on CI #716

merged 3 commits into from
Apr 24, 2023

Conversation

StephenSorriaux
Copy link
Member

@StephenSorriaux StephenSorriaux commented Mar 24, 2023

Why is this needed?

This makes the tests way less flaky, I hope (and first results confirmed it... but you never know).

Proposed Changes

  • renamed back test__connection.py to test_connection.py, where it belongs
  • added client_cluster_health in the setup_zookeeper step that is there to "guarantee" the cluster is up and running, without even considering the configuration that will be used by the test itself. I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)
  • now performing a ZK cluster nuking at the end of each test case (not each test, it's different) since keeping the same ZK state from test cases to test cases actually led to some tests randomly failing
  • fixed the test_read_only test based on what is done in the official java client (https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/test/java/org/apache/zookeeper/test/ReadOnlyModeTest.java), we were missing some things and that led the test to fail from time to time (see comments in the code)
  • activated the pytest log_cli configuration, as it gives more logs for each tests and I find it useful

Does this PR introduce any breaking change?

Nope.

@StephenSorriaux StephenSorriaux force-pushed the fix/flaky-tests branch 2 times, most recently from 8a21392 to 93b7337 Compare March 24, 2023 21:45
@codecov
Copy link

codecov bot commented Mar 24, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.36 🎉

Comparison is base (ffa0ae9) 96.28% compared to head (a41f564) 96.65%.

❗ Current head a41f564 differs from pull request most recent head 64af698. Consider uploading reports for the commit 64af698 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
+ Coverage   96.28%   96.65%   +0.36%     
==========================================
  Files          27       27              
  Lines        3554     3554              
==========================================
+ Hits         3422     3435      +13     
+ Misses        132      119      -13     

see 7 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@StephenSorriaux StephenSorriaux force-pushed the fix/flaky-tests branch 2 times, most recently from 092eabd to 8e2e933 Compare March 24, 2023 22:06
@StephenSorriaux StephenSorriaux force-pushed the fix/flaky-tests branch 24 times, most recently from 84a34a4 to 53bc8c1 Compare March 26, 2023 05:35
kazoo/tests/conftest.py Outdated Show resolved Hide resolved
kazoo/tests/test_connection.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
kazoo/tests/test_connection.py Show resolved Hide resolved
@StephenSorriaux StephenSorriaux force-pushed the fix/flaky-tests branch 2 times, most recently from cde26c3 to a41f564 Compare April 10, 2023 22:21
Copy link
Member

@jeffwidman jeffwidman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments. Overall, I'm still 👍 on this, but realized there's a few more sections where it'd be helpful to have more of the history/rationale of the change documented, not only for me but also future git blame spelunkers.

Would also appreciate if @ceache takes a look, although IMO we shouldn't wait for him before merging this if he doesn't have time.

@jeffwidman
Copy link
Member

  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

@ceache
Copy link
Contributor

ceache commented Apr 11, 2023 via email

Fixed the `test_read_only` test based on what is done in the official Java client, we were missing some things and that led the test to fail from time to time (see comments in the code for full story).
Also renamed `test__connection.py` to `test_connection.py`, where it belongs.
Now performing a ZK cluster nuking at the end of each test case since keeping the same ZK state from test cases to test cases actually led to some tests randomly failing.
Added `client_cluster_health` ZK client in the `setup_zookeeper()` step that is there to "guarantee" the cluster is up and running, without even considering the configuration that will be used by the test itself.
Activate `pytest`'s `logcli` setting to better read the client lifecycle during testing.
Add `log()` function to KazooTestHarness class so that it is possible to log like crazy when something is not working
Display the ZK client port in logs when starting a ZK server (useful for test "debug")
Be able to get more than the last 100 lines of ZK logs (can be useful, believe me)
@StephenSorriaux
Copy link
Member Author

  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

Yes, I think we should consider looking into it, especially what to expect of the retry when used as a connection_retry parameter for the client

@StephenSorriaux
Copy link
Member Author

I'll be all in for a docker based harness! I personally have java installed on my desktop just for kazoo...

@jeffwidman
Copy link
Member

I would 110% support a docker-based harness. No problem adding that as a test/dev dependency, it's almost certainly more accessible to drive-by devs as well who are going to be familiar with that from other libraries doing similar things.

Copy link
Member

@jeffwidman jeffwidman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, thank you so much for the detailed explanations, not only for me but also on behalf of other folks who stumble across this later.

Probably best to wait to merge til Charles get a chance to look at it too.

Copy link
Contributor

@ceache ceache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks a lot @StephenSorriaux !

@StephenSorriaux StephenSorriaux merged commit 2c36d69 into master Apr 24, 2023
@StephenSorriaux StephenSorriaux deleted the fix/flaky-tests branch April 24, 2023 18:04
@StephenSorriaux
Copy link
Member Author

  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

Yes, I think we should consider looking into it, especially what to expect of the retry when used as a connection_retry parameter for the client

Update: I think #685 tried to tackle the "issue" I mentionned

@ceache
Copy link
Contributor

ceache commented Apr 24, 2023 via email

@StephenSorriaux
Copy link
Member Author

Sorry for the rebase, test_connection.py might give you some conflict...

I think I owe you a review on #685, will take the time now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants