Fix timeouts tests #30374

jenmwms · 2020-12-01T20:23:00Z

Description with context
Sanitycheck applies a default timeout to the tests listed below.
On the emulator (qemu) the tests complete within the default timeout.
On physical hardware (up2), running them under sanitycheck yields timeout
errors and appears to clip the results. See #30275 for more detail.

This occurs in the following:

tests/kernel/common (all of the test_sys_put_)
tests/kernel/mem_protect/mem_protect (including test_permission_inheritance, test_mem_domain_remove_add_partition)
tests/kernel/device (pm - test_dummy_device)

When west is used to build and flash a test instead of sanitycheck,
manual timing with a stopwatch on Up2 shows that several tests
each need more time to complete (START to PASS/FAIL)
than the default timeout allows.

This commit adds a timeout sufficient to prevent
sanitycheck from moving on to the next test too quickly.
This PR includes a commit to fix #30275. The other failures have
not yet been reported as bugs, but I suggest addressing them here if that is agreed to be appropriate.

This happens for many tests in the test suites listed above on this platform. It does not appear to be caused by setup/flashing/teardown/bootup time between tests. The time is spent during each test, after the suite has started: for example, between the 'START - test_mem_domain_remove_add_partition' and 'PASS - test_mem_domain_remove_add_partition' prints.

Why now?
Sanitycheck reports errors (see #30275) even though the tests pass when the default timeout is lifted. This is a workaround that tells sanitycheck to run long enough to observe the full duration of each test case, so that passing tests are reported correctly in sanitycheck reports.

What platforms might be affected?
This is intended for x86/up2, but I am not currently aware of a way to set a board-level timeout, so the extended test timeout will apply to all platforms that run these tests. If the tests pass, there is no observable effect; if a test does not complete, sanitycheck will wait for the longer timeout before reporting the failure.

Continued work:
This is a workaround that lets sanitycheck observe the full results of each test in the suite. I do not want to hide a potential bug, so I am starting an investigation to verify that the added delay is not due to a bug or issue in the tests or the platform itself.

nashif · 2020-12-01T20:25:20Z

tests/kernel/common/testcase.yaml

@@ -1,3 +1,6 @@
+common:
+  timeout: 120000


Isn't this a rather large number? It translates to ~1.5 days! We do not want that: if something else goes wrong, the test will just block everything else and CI will never complete.
Please provide additional context: why this is required now, what platforms might be affected, and what has been done to verify that the added delay is not due to some bug or issue in the test or the platform itself.

Thanks @nashif! I thought the timeout was in milliseconds, I will fix all to 120 (seconds).
I will add more details in the PR description and commit messages if needed there too. This is a 'workaround' to allow sanitycheck to complete for up2 on these tests. I'll investigate more too so we can decide if we open a separate bug or enhancement to address the root issue(s).
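
To make the unit mix-up concrete, here is a small Python sketch (not part of the PR) that renders a timeout given in seconds as days, hours, minutes, and seconds. The helper name `describe_timeout` is made up for illustration; the only assumption taken from the thread is that the `timeout` value in testcase.yaml is read as seconds.

```python
from datetime import timedelta


def describe_timeout(seconds: int) -> str:
    """Render a timeout given in seconds as 'Nd Nh Nm Ns' (illustrative helper)."""
    td = timedelta(seconds=seconds)
    # timedelta splits the value into whole days plus leftover seconds.
    hours, rem = divmod(td.seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{td.days}d {hours}h {minutes}m {secs}s"


print(describe_timeout(120000))  # value from the first revision: 1d 9h 20m 0s
print(describe_timeout(120))     # corrected value: 0d 0h 2m 0s
```

Read as seconds, 120000 comes to a little under a day and a half, which is why the reviewer flagged it; 120 is two minutes.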

where is this extra time coming from?

if this is something that is happening for all tests on this platform, due to some unavoidable setup/flashing/teardown/bootup time or whatever, it would be better to add an option in board yaml to apply a constant offset to all timeouts instead of modifying individual test timeouts to work around it
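
As a rough illustration of that suggestion, the Python sketch below applies a per-board constant offset on top of whatever timeout a test requests. Everything in it is hypothetical: the `BOARD_TIMEOUT_OFFSET_S` table, the `effective_timeout` helper, the 30-second offset, and the 60-second default are placeholders chosen for the example, not existing sanitycheck options or measured values.

```python
# Assumed default per-test timeout in seconds (placeholder for illustration).
DEFAULT_TIMEOUT_S = 60

# Hypothetical per-board offsets in seconds; in practice this would come from
# a board-level setting rather than a hard-coded table.
BOARD_TIMEOUT_OFFSET_S = {"up_squared": 30}


def effective_timeout(test_timeout_s, board):
    """Return the timeout to enforce for one test on one board.

    test_timeout_s is the per-test value, or None when the test does not
    set one and the default applies.
    """
    base = DEFAULT_TIMEOUT_S if test_timeout_s is None else test_timeout_s
    # Boards without an entry get no extra time.
    return base + BOARD_TIMEOUT_OFFSET_S.get(board, 0)


print(effective_timeout(None, "up_squared"))  # default plus board offset
print(effective_timeout(None, "qemu_x86"))    # default only
```

The point of this shape is that individual testcase.yaml files stay untouched and other platforms keep their current timeouts.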

My next step is to add timestamp printing to the test(s), as Anas suggested. Maybe this could help us find out whether it's related to setup/flashing/bootup time. I'll also try bisecting to see when and why the behavior changed so that these tests started failing in sanitycheck.
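
One possible way to do that measurement on the host side, sketched in Python: pair each 'START - name' marker with the matching 'PASS - name' or 'FAIL - name' marker and subtract the timestamps. The function name `measure_durations`, the (timestamp, line) input format, and the sample timestamps are all assumptions made up for this example; they are not real measurements from up2.

```python
import re

# Matches the ztest-style markers quoted in the description; whitespace
# around the dash is optional so 'PASS- name' is accepted too.
MARKER = re.compile(r"\b(START|PASS|FAIL)\s*-\s*(\w+)")


def measure_durations(stamped_lines):
    """Map test name -> seconds between its START and PASS/FAIL markers.

    stamped_lines is an iterable of (timestamp_seconds, console_line) pairs.
    """
    started = {}
    durations = {}
    for stamp, line in stamped_lines:
        match = MARKER.search(line)
        if not match:
            continue
        kind, name = match.groups()
        if kind == "START":
            started[name] = stamp
        elif name in started:
            durations[name] = stamp - started.pop(name)
    return durations


# Made-up sample data, only to show the input shape.
sample = [
    (0.0, "START - test_permission_inheritance"),
    (4.5, "PASS - test_permission_inheritance"),
    (4.5, "START - test_mem_domain_remove_add_partition"),
    (12.0, "PASS- test_mem_domain_remove_add_partition"),
]
print(measure_durations(sample))
```

Durations measured this way cover only the time inside each test, so comparing them with the wall-clock time of the whole run would help separate test time from setup, flashing, and boot time.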

I updated the description, hoping it helps give context. Please let me know your thoughts.

Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that the sys_put_* tests need more time to complete (START to PASS/FAIL) than default setting. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly.

Fixes zephyrproject-rtos#30275

Signed-off-by: Jennifer Williams <[email protected]>

Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that the pm tests need more time to complete (START to PASS/FAIL) than default. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly.

Signed-off-by: Jennifer Williams <[email protected]>

Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that some tests (ex: permission_inheritance, mem_domain_remove_add_partition) need more time to complete (START to PASS/FAIL) than default. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly.

Signed-off-by: Jennifer Williams <[email protected]>

andrewboie

sorry, but I'm going to have to NACK this, this needs more root-cause analysis and I really suspect the real issue is that there are inherent delays on testing with upsquared that need to be applied globally. this approach doesn't scale.

jenmwms · 2020-12-01T23:58:33Z

Understood. I'll keep working to analyze the issue and find out how to resolve it properly. Maybe we should close this PR then?

maksimmasalski · 2020-12-02T11:25:43Z

Better to close this patch. In some cases I saw a freeze lasting 3 min 35 s, which is 215 seconds. By the logic of this patch, the timeout would have to be raised again and again, which is not the right approach.

jenmwms requested review from nashif, andyross, enjiamai, yerabolu, aasthagr and maksimmasalski December 1, 2020 20:23

jenmwms requested a review from andrewboie as a code owner December 1, 2020 20:23

github-actions bot added the area: Kernel and area: Tests labels Dec 1, 2020

nashif requested changes Dec 1, 2020

jenmwms added 3 commits December 1, 2020 13:51

jenmwms force-pushed the fix_timeouts_tests_incl_30275 branch from e669f26 to 9471f14 December 1, 2020 21:57

andrewboie suggested changes Dec 1, 2020

jenmwms mentioned this pull request Dec 1, 2020

up_squared: tests/kernel/common failed (timeout error) #30275

Closed

jenmwms mentioned this pull request Dec 2, 2020

up_squared: tests/kernel/mem_protect/mem_protect failed. #30305

Closed

LeiW000 approved these changes Dec 2, 2020

jenmwms closed this Dec 2, 2020

jenmwms mentioned this pull request Dec 10, 2020

up_squared: slowdown on test execution and timing out on multiple tests #30573

Closed
