-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timeouts tests #30374
Fix timeouts tests #30374
Conversation
tests/kernel/common/testcase.yaml
Outdated
@@ -1,3 +1,6 @@ | |||
common: | |||
timeout: 120000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isnt this a bit of a large number? this translates to ~1.5 days! We do not want that, if something else goes wrong, the test will just block everything else and CI will never complete.
Please provide additional context and why this is required now and what platforms might be affected and what has been done to verify that the added delay is not due to some bug or issue in the test or the platform itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nashif! I thought the timeout was in milliseconds, I will fix all to 120 (seconds).
I will add more details in the PR description and commit messages if needed there too. This is a 'workaround' to allow sanitycheck to complete for up2 on these tests. I'll investigate more too so we can decide if we open a separate bug or enhancement to address the root issue(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this extra time coming from?
if this is something that is happening for all tests on this platform, due to some unavoidable setup/flashing/teardown/bootup time or whatever, it would be better to add an option in board yaml to apply a constant offset to all timeouts instead of modifying individual test timeouts to work around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My next step is to add timestamps printing out for the test(s) as Anas suggested. Maybe this could help us find out if its related to setup/flashing/bootup time? I'll also try bisecting to see when/why this changed behavior to start failing in sanitycheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the description, hoping it helps give context. Please let me know your thoughts.
Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that the sys_put_* tests need more time to complete (START to PASS/FAIL) than default setting. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly. Fixes zephyrproject-rtos#30275 Signed-off-by: Jennifer Williams <[email protected]>
Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that the pm tests need more time to complete (START to PASS/FAIL) than default. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly. Signed-off-by: Jennifer Williams <[email protected]>
Sanitycheck uses a default timeout for these tests. On physical hardware (up2) it is observed by manual inspection that some tests (ex: permission_inheritance, mem_domain_remove_add_partition) need more time to complete (START to PASS/FAIL) than default. This commit adds a timeout sufficient to prevent sanitycheck from moving on to the next test too quickly. Signed-off-by: Jennifer Williams <[email protected]>
e669f26
to
9471f14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but I'm going to have to NACK this, this needs more root-cause analysis and I really suspect the real issue is that there are inherent delays on testing with upsquared that need to be applied globally. this approach doesn't scale.
Understood. I'll keep working to analyze and how to resolve the issue properly. Maybe we should close this PR then? |
Better to close that patch, I had a freeze during 3min35sec in some cases, that is 215 seconds. According to the logic of the patch you have to increase time more and more. That is not correct. |
Descrition with context
Sanitycheck uses a default timeout for these tests (listed below).
On emulator (qemu) the tests work well during default time.
On physical hardware (up2) using sanitycheck yields timeout
errors and appears to clip the results. See #30275 for more detail.
This occurs in the following:
Using west to build and flash a test instead of sanitycheck,
on Up2 it is observed by manual inspection using
stopwatch that several tests each need more time
to complete (START to PASS/FAIL).
This commit adds a timeout sufficient to prevent
sanitycheck from moving on to the next test too quickly.
This PR includes a commit to fix #30275. The others have
not yet been reported as bugs but suggest to address them here if agreed appropriate.
This is happening for many tests in these specific test suites (above) on this platform. This does not appear to be due to setup/flashing/teardown/bootup time between tests, it is time during each test (after starting the suite, and then for a test between 'START - test_mem_domain_remove_add_partition' prints and 'PASS- test_mem_domain_remove_add_partition').
Why now?
This is yielding errors (see #30275) despite passing tests if relieved of default timeout. This is a workaround to inform sanitycheck to run long enough to observe the full duration of the test case and reflect passing tests correctly in sanitycheck reports.
What platforms might be affected?
This is intended for x86/up2, however currently not aware of a way to do board level timeout. So, the change (extended test timeout) will apply across all platforms that use the tests. If the tests pass, there is no observed effect, but if the tests do not complete sanitycheck will take the longer timeout to report the fail.
Continued work:
This is a workaround for sanitycheck to observe the full results of a test in the suite. Don't want to hide a potential bug. Kicking off investigation to verify that the added delay is not due to some bug or issue in the test or the platform itself.