Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timeouts tests #30374

Closed

Conversation

jenmwms
Copy link
Collaborator

@jenmwms jenmwms commented Dec 1, 2020

Descrition with context
Sanitycheck uses a default timeout for these tests (listed below).
On emulator (qemu) the tests work well during default time.
On physical hardware (up2) using sanitycheck yields timeout
errors and appears to clip the results. See #30275 for more detail.

This occurs in the following:

  • tests/kernel/common (all of the test_sys_put_)
  • tests/kernel/mem_protect/mem_protect (including test_permission_inheritance, test_mem_domain_remove_add_partition)
  • tests/kernel/device (pm - test_dummy_device)

Using west to build and flash a test instead of sanitycheck,
on Up2 it is observed by manual inspection using
stopwatch that several tests each need more time
to complete (START to PASS/FAIL).

This commit adds a timeout sufficient to prevent
sanitycheck from moving on to the next test too quickly.
This PR includes a commit to fix #30275. The others have
not yet been reported as bugs but suggest to address them here if agreed appropriate.

This is happening for many tests in these specific test suites (above) on this platform. This does not appear to be due to setup/flashing/teardown/bootup time between tests, it is time during each test (after starting the suite, and then for a test between 'START - test_mem_domain_remove_add_partition' prints and 'PASS- test_mem_domain_remove_add_partition').

Why now?
This is yielding errors (see #30275) despite passing tests if relieved of default timeout. This is a workaround to inform sanitycheck to run long enough to observe the full duration of the test case and reflect passing tests correctly in sanitycheck reports.

What platforms might be affected?
This is intended for x86/up2, however currently not aware of a way to do board level timeout. So, the change (extended test timeout) will apply across all platforms that use the tests. If the tests pass, there is no observed effect, but if the tests do not complete sanitycheck will take the longer timeout to report the fail.

Continued work:
This is a workaround for sanitycheck to observe the full results of a test in the suite. Don't want to hide a potential bug. Kicking off investigation to verify that the added delay is not due to some bug or issue in the test or the platform itself.

@jenmwms jenmwms requested a review from andrewboie as a code owner December 1, 2020 20:23
@github-actions github-actions bot added area: Kernel area: Tests Issues related to a particular existing or missing test labels Dec 1, 2020
@@ -1,3 +1,6 @@
common:
timeout: 120000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt this a bit of a large number? this translates to ~1.5 days! We do not want that, if something else goes wrong, the test will just block everything else and CI will never complete.
Please provide additional context and why this is required now and what platforms might be affected and what has been done to verify that the added delay is not due to some bug or issue in the test or the platform itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nashif! I thought the timeout was in milliseconds, I will fix all to 120 (seconds).
I will add more details in the PR description and commit messages if needed there too. This is a 'workaround' to allow sanitycheck to complete for up2 on these tests. I'll investigate more too so we can decide if we open a separate bug or enhancement to address the root issue(s).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this extra time coming from?

if this is something that is happening for all tests on this platform, due to some unavoidable setup/flashing/teardown/bootup time or whatever, it would be better to add an option in board yaml to apply a constant offset to all timeouts instead of modifying individual test timeouts to work around it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My next step is to add timestamps printing out for the test(s) as Anas suggested. Maybe this could help us find out if its related to setup/flashing/bootup time? I'll also try bisecting to see when/why this changed behavior to start failing in sanitycheck.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description, hoping it helps give context. Please let me know your thoughts.

Sanitycheck uses a default timeout for these tests.
On physical hardware (up2) it is observed by
manual inspection that the sys_put_*
tests need more time to complete (START to PASS/FAIL)
than default setting. This commit adds a timeout sufficient
to prevent sanitycheck from moving on to the next test
too quickly.

Fixes zephyrproject-rtos#30275

Signed-off-by: Jennifer Williams <[email protected]>
Sanitycheck uses a default timeout for these tests.
On physical hardware (up2) it is observed by
manual inspection that the pm tests
need more time to complete (START to PASS/FAIL) than
default. This commit adds a timeout sufficient to
prevent sanitycheck from moving on to the next test
too quickly.

Signed-off-by: Jennifer Williams <[email protected]>
Sanitycheck uses a default timeout for these tests.
On physical hardware (up2) it is observed by
manual inspection that some tests (ex:
permission_inheritance, mem_domain_remove_add_partition)
need more time to complete (START to PASS/FAIL) than default.
This commit adds a timeout sufficient to prevent sanitycheck
from moving on to the next test too quickly.

Signed-off-by: Jennifer Williams <[email protected]>
@jenmwms jenmwms force-pushed the fix_timeouts_tests_incl_30275 branch from e669f26 to 9471f14 Compare December 1, 2020 21:57
Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but I'm going to have to NACK this, this needs more root-cause analysis and I really suspect the real issue is that there are inherent delays on testing with upsquared that need to be applied globally. this approach doesn't scale.

@jenmwms
Copy link
Collaborator Author

jenmwms commented Dec 1, 2020

sorry, but I'm going to have to NACK this, this needs more root-cause analysis and I really suspect the real issue is that there are inherent delays on testing with upsquared that need to be applied globally. this approach doesn't scale.

Understood. I'll keep working to analyze and how to resolve the issue properly. Maybe we should close this PR then?

@maksimmasalski
Copy link
Collaborator

sorry, but I'm going to have to NACK this, this needs more root-cause analysis and I really suspect the real issue is that there are inherent delays on testing with upsquared that need to be applied globally. this approach doesn't scale.

Understood. I'll keep working to analyze and how to resolve the issue properly. Maybe we should close this PR then?

Better to close that patch, I had a freeze during 3min35sec in some cases, that is 215 seconds. According to the logic of the patch you have to increase time more and more. That is not correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel area: Tests Issues related to a particular existing or missing test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

up_squared: tests/kernel/common failed (timeout error)
5 participants