[grid] Fix flaky event bus tests by dedicated threading, reverting the polling loop logic and increasing poll timeout #9383

Merged: 6 commits, Apr 21, 2021

@pujagani pujagani commented Apr 15, 2021


Description

The event bus tests fail intermittently when run repeatedly with the following command:
bazel test --cache_test_results=no --runs_per_test=20 //java/server/test/org/openqa/selenium/events:EventBusTest --test_filter=org.openqa.selenium.events.EventBusTest#

Motivation and Context

The failures followed a pattern: intermittently, the countdown latch kept waiting and never received the expected messages, even after the wait times were increased.
This was primarily due to two errors:

  1. ClosedSelectorException
java.nio.channels.ClosedSelectorException
	at java.base/sun.nio.ch.SelectorImpl.ensureOpen(SelectorImpl.java:75)
	at java.base/sun.nio.ch.SelectorImpl.keys(SelectorImpl.java:80)
	at zmq.ZMQ.poll(ZMQ.java:647)
	at org.zeromq.ZMQ$Poller.poll(ZMQ.java:3892)
  2. CancelledKeyException
java.nio.channels.CancelledKeyException
	at java.base/sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:71)
	at java.base/sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:90)
	at zmq.ZMQ.poll(ZMQ.java:663)
	at org.zeromq.ZMQ$Poller.poll(ZMQ.java:3892)
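
To make the two stack traces easier to read, here is a minimal sketch of the JDK conditions under which these exceptions are raised. It uses only `java.nio` and does not reproduce the race inside the ZeroMQ poller; the class and method names (`SelectorFailureSketch`, `closedSelectorThrows`, `cancelledKeyThrows`) are hypothetical and are not part of the Selenium code base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Illustration only: shows when the JDK throws the two exceptions seen in the
// traces. It is not the code path used by the event bus tests.
public class SelectorFailureSketch {

  // Asking a closed selector for its key set throws ClosedSelectorException.
  public static boolean closedSelectorThrows() {
    try {
      Selector selector = Selector.open();
      selector.close();
      try {
        selector.keys();
        return false;
      } catch (ClosedSelectorException expected) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reading the interest set of a cancelled key throws CancelledKeyException.
  public static boolean cancelledKeyThrows() {
    try {
      Pipe pipe = Pipe.open();
      try (Selector selector = Selector.open()) {
        // A channel must be non-blocking before it can be registered.
        pipe.source().configureBlocking(false);
        SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
        key.cancel();
        try {
          key.interestOps();
          return false;
        } catch (CancelledKeyException expected) {
          return true;
        }
      } finally {
        pipe.source().close();
        pipe.sink().close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("closed selector throws: " + closedSelectorThrows());
    System.out.println("cancelled key throws: " + cancelledKeyThrows());
  }
}
```

Both conditions arise when a selector or key is torn down while something else still tries to use it, which matches the `ensureOpen` and `ensureValid` frames at the top of the traces.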

This was observed when the TCP transport layer was used to create the socket and the poll wait time was set to 0, i.e. to return immediately. Once the timeout was raised to a higher value, the tests were no longer flaky, but they ran very slowly because a scheduled thread ran and waited for a bit in each run.
Reverting the polling loop to

while (!Thread.currentThread().isInterrupted()) {
significantly improved the test run speed and fixed this issue.

Additionally, the poller is now closed during resource cleanup, and a separate dedicated thread publishes messages.
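
The overall shape of these changes can be sketched with JDK classes alone. This is a rough illustration under stated assumptions, not the code in this PR: a `BlockingQueue` stands in for the ZeroMQ socket and poller, and `PollingLoopSketch`, `deliver`, and `POLL_TIMEOUT_MS` are hypothetical names with an arbitrary timeout value.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a BlockingQueue stands in for the ZeroMQ socket and poller.
public class PollingLoopSketch {

  // Assumed non-zero poll timeout; the value used in the PR is not shown here.
  private static final long POLL_TIMEOUT_MS = 150;

  public static int deliver(int messageCount) {
    BlockingQueue<String> transport = new LinkedBlockingQueue<>();
    CountDownLatch latch = new CountDownLatch(messageCount);
    AtomicInteger received = new AtomicInteger();

    ExecutorService poller = Executors.newSingleThreadExecutor();
    // Dedicated thread that does nothing but publish messages.
    ExecutorService publisher = Executors.newSingleThreadExecutor();

    poller.submit(() -> {
      // Keep polling until the thread is interrupted during cleanup.
      while (!Thread.currentThread().isInterrupted()) {
        try {
          String message = transport.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
          if (message != null) {
            received.incrementAndGet();
            latch.countDown();
          }
        } catch (InterruptedException e) {
          // Restore the flag so the loop condition sees the interrupt.
          Thread.currentThread().interrupt();
        }
      }
    });

    publisher.submit(() -> {
      for (int i = 0; i < messageCount; i++) {
        transport.add("event-" + i);
      }
    });

    try {
      latch.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      // Cleanup: stop publishing, then interrupt the polling loop.
      publisher.shutdown();
      poller.shutdownNow();
    }
    return received.get();
  }

  public static void main(String[] args) {
    System.out.println("received: " + deliver(5));
  }
}
```

The interrupt check gives the loop a single, explicit exit path, so cleanup can stop polling before the underlying resources are released.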

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@pujagani pujagani changed the title [grid] Fix flaky event bus tests by dedicated threading and reverting the polling loop logic [grid] Fix flaky event bus tests by dedicated threading, reverting the polling loop logic and increasing poll timeout Apr 15, 2021
@diemol diemol left a comment

Thank you, @pujagani!

@diemol diemol merged commit dd8741a into SeleniumHQ:trunk Apr 21, 2021
@pujagani pujagani mentioned this pull request Apr 22, 2021
8 tasks