Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: stabilize eventloopdelay test #41716

Closed

Conversation

benjamingr
Copy link
Member

This change makes the eventloopdelay test less flakey on some platforms
by replacing a blocking-wait with a busy-wait which means the eventloop
will always be positive increase.

This is a partial revert of #30787 that returns the test to the state it
had originally.

Example failure:

https://ci.nodejs.org/job/node-test-commit-linuxone/30460/nodes=rhel7-lto-s390x/testReport/junit/(root)/test/sequential_test_performance_eventloopdelay_/

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(histogram.min > 0)

    at Timeout.spinAWhile [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-linuxone/test/sequential/test-performance-eventloopdelay.js:65:7)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Node.js v18.0.0-pre

This change makes the eventloopdelay test less flakey on some platforms
by replacing a blocking-wait with a busy-wait which means the eventloop
will always be positive increase.

This is a partial revert of nodejs#30787 that returns the test to the state it
had originally.
@benjamingr benjamingr added test Issues and PRs related to the tests. async_hooks Issues and PRs related to the async hooks subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Jan 27, 2022
@benjamingr benjamingr requested review from jasnell and cjihrig January 27, 2022 10:58
@benjamingr benjamingr added the request-ci Add this label to start a Jenkins CI on a PR. label Jan 27, 2022
@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Jan 27, 2022
@Trott
Copy link
Member

Trott commented Jan 27, 2022

@nodejs/testing, especially @cjihrig

Trott
Trott previously approved these changes Jan 27, 2022
Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jan 27, 2022
@nodejs-github-bot

This comment has been minimized.

@targos
Copy link
Member

targos commented Jan 27, 2022

should we do a stress test?

@nodejs-github-bot
Copy link
Collaborator

@cjihrig
Copy link
Contributor

cjihrig commented Jan 27, 2022

I'm not so sure about this fix. It's good if we can get the test passing, but I think either Node's event loop monitoring needs to be able to handle this scenario, or something else in this test is incorrect.

should we do a stress test?

I think so, yes.

@benjamingr
Copy link
Member Author

I'm not so sure about this fix.

Me neither - if there is a way to synthetically get the histogram to report event loop delay that would be preferable.

@cjihrig
Copy link
Contributor

cjihrig commented Jan 27, 2022

I'm curious how long this test has been flaky for. I've definitely noticed it lately, but the change being partially reverted here landed over two years ago. Maybe I just missed the failures in the past, but is it possible that a recent change introduced the flakiness?

@richardlau
Copy link
Member

I'm curious how long this test has been flaky for. I've definitely noticed it lately, but the change being partially reverted here landed over two years ago. Maybe I just missed the failures in the past, but is it possible that a recent change introduced the flakiness?

It seemed to start appearing around 22 December 2021 #41286 (comment).

@cjihrig
Copy link
Contributor

cjihrig commented Jan 27, 2022

It seemed to start appearing around 22 December 2021

23637e9 landed three days prior to that and could be worth investigating.

@richardlau
Copy link
Member

It seemed to start appearing around 22 December 2021

23637e9 landed three days prior to that and could be worth investigating.

#41286 (comment)

Walking backwards from commits landed on 22 December 2021, I've started two stress runs around 23637e9:

665b404: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/306/nodes=rhel7-s390x/ (✔️ no failures in 1000 runs) 23637e9: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/307/nodes=rhel7-s390x/ (❌ 100 failures in 1000 runs)

#41286 (comment) added some debug and it looks like in the failing case we end up with histogram.min being 0 which trips an assertion in the test. And at that point I don't know if it's the test or the implementation that's at fault.

@Flarna Flarna added perf_hooks Issues and PRs related to the implementation of the Performance Timing API. and removed async_hooks Issues and PRs related to the async hooks subsystem. labels Jan 27, 2022
@Trott Trott dismissed their stale review January 28, 2022 05:41

dismissing my review while this is discussed

@benjamingr benjamingr added the blocked PRs that are blocked by other issues or PRs. label Jan 28, 2022
@benjamingr
Copy link
Member Author

Adding 'blocked' so this doesn't land before other avenues have been explored.

@benjamingr benjamingr closed this Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked PRs that are blocked by other issues or PRs. flaky-test Issues and PRs related to the tests with unstable failures on the CI. needs-ci PRs that need a full CI run. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants