Intermittent timerfd failure on Android #124873

Closed
mhsmith opened this issue Oct 1, 2024 · 6 comments
Labels
OS-android, type-bug

Comments

@mhsmith
Member

mhsmith commented Oct 1, 2024

Bug report

This happens quite often, but usually doesn't cause a buildbot failure because it passes on the second attempt. However, it's now double-failed twice:

======================================================================
FAIL: test_timerfd_poll (test.test_os.TimerfdTests.test_timerfd_poll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_os.py", line 4390, in test_timerfd_poll
    self.check_timerfd_poll(False)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_os.py", line 4378, in check_timerfd_poll
    self.assertEqual(self.read_count_signaled(fd), 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1

----------------------------------------------------------------------
Ran 367 tests in 12.334s

FAILED (failures=1, skipped=120)
test test_os failed


mhsmith added the type-bug and OS-android labels Oct 1, 2024
@mhsmith
Member Author

mhsmith commented Nov 20, 2024

I was able to reproduce this by doing something else on the emulator while running the test repeatedly in -F mode; starting Chrome was usually enough. My initial understanding of the test was that the process would need to be suspended for at least 1/8 of a second for it to fail this way, but if that were happening, lots of other time-sensitive tests would surely fail as well. More detailed investigation is needed.
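To make the "suspended for at least 1/8 of a second" reasoning concrete, here is a minimal sketch that models the counter a timerfd read returns: the number of expirations accumulated since the timer was armed. It is a pure-Python model, not the test code and not a call into os.timerfd_create. The function name expirations_reported is hypothetical, and the assumption that both the initial expiration and the interval are 125 ms is inferred from the 1/8 second figure above.

```python
def expirations_reported(elapsed_ns: int, initial_ns: int, interval_ns: int) -> int:
    """Model the expiration count returned by the first read of a periodic timerfd.

    elapsed_ns is the time between arming the timer and reading it.
    Assumes a periodic timer, i.e. interval_ns > 0.
    """
    if elapsed_ns < initial_ns:
        # The timer has not fired yet; a real non-blocking read would fail here.
        return 0
    # One expiration at initial_ns, plus one for every full interval after it.
    return 1 + (elapsed_ns - initial_ns) // interval_ns


INTERVAL_NS = 10**9 // 8  # 125 ms, assumed from the "1/8 of a second" above

# Read promptly when the timer first fires: one expiration.
print(expirations_reported(INTERVAL_NS, INTERVAL_NS, INTERVAL_NS))               # 1
# Read 50 ms late: still one expiration.
print(expirations_reported(INTERVAL_NS + 50_000_000, INTERVAL_NS, INTERVAL_NS))  # 1
# Read a full interval late: two expirations, matching "AssertionError: 2 != 1".
print(expirations_reported(2 * INTERVAL_NS, INTERVAL_NS, INTERVAL_NS))           # 2
```

Under this model, a count of 2 requires the read to be delayed by at least one whole interval after the timer fires, which is why a suspension of that length would be expected to disturb other time-sensitive tests too.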

If this is caused by the emulator suspending the tests to run some higher-priority task, we might be able to increase the priority of the tests by making them start a visible activity.

Since this is a Linux-only feature exposed via a BSD-based libc, it’s also possible Android's implementation is buggy, or at least different enough that it breaks assumptions in the test.

For now I'll disable the test to avoid the buildbots posting useless messages on GitHub. It would be no great loss to disable the entire timerfd feature on Android, since it's a Linux-only feature which isn't widely used. But since it's already been released as part of 3.13, that would technically be a breaking change rather than a bug fix.

@vstinner
Member

vstinner commented Nov 21, 2024

Maybe CLOCK_RES_PLACES = 2 can be replaced with CLOCK_RES_PLACES = 1 to tolerate 100 ms difference on Android? See test_os.py:

class TimerfdTests(unittest.TestCase):
    # 1 ms accuracy is reliably achievable on every platform except Android
    # emulators, where we allow 10 ms (gh-108277).
    if sys.platform == "android" and platform.android_ver().is_emulator:
        CLOCK_RES_PLACES = 2
    else:
        CLOCK_RES_PLACES = 3
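As a rough illustration of what changing the number of places does, the sketch below assumes the constant is passed as the places argument of unittest's assertAlmostEqual, which passes when the difference rounded to that many decimal places is zero. The helper passes_at and the sample timings are hypothetical and only serve to show the effect.

```python
import unittest


def passes_at(places: int, measured: float, expected: float) -> bool:
    """Return True if assertAlmostEqual accepts the pair at the given places."""
    case = unittest.TestCase()
    try:
        case.assertAlmostEqual(measured, expected, places=places)
    except AssertionError:
        return False
    return True


# A 4 ms error is rejected at places=3 but accepted at places=2.
print(passes_at(3, 0.254, 0.25))  # False
print(passes_at(2, 0.254, 0.25))  # True
# A 30 ms error is rejected at places=2 but accepted at places=1.
print(passes_at(2, 0.28, 0.25))   # False
print(passes_at(1, 0.28, 0.25))   # True
```

Because the check rounds the difference, the effective tolerance at each setting is roughly half of the nominal 10 ms or 100 ms figure.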

vstinner added a commit to vstinner/cpython that referenced this issue Nov 21, 2024
On Android, TimerfdTests of test_os now uses 100 ms accuracy instead
of 10 ms.
vstinner added a commit that referenced this issue Nov 21, 2024
On Android, TimerfdTests of test_os now uses 100 ms accuracy instead
of 10 ms.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 21, 2024
…-127101)

On Android, TimerfdTests of test_os now uses 100 ms accuracy instead
of 10 ms.
(cherry picked from commit bab4b04)

Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Member

Maybe CLOCK_RES_PLACES = 2 can be replaced with CLOCK_RES_PLACES = 1 to tolerate 100 ms difference on Android?

I made the change bab4b04 since tests failed recently: #126730 (comment). Let's see how it goes with 100 ms instead of 10 ms.

vstinner added a commit that referenced this issue Nov 21, 2024
…) (#127105)

gh-124873: Tolerate 100 ms in TimerfdTests on Android (GH-127101)

On Android, TimerfdTests of test_os now uses 100 ms accuracy instead
of 10 ms.
(cherry picked from commit bab4b04)

Co-authored-by: Victor Stinner <[email protected]>
mhsmith added a commit to mhsmith/cpython that referenced this issue Nov 26, 2024
@mhsmith
Member Author

mhsmith commented Nov 26, 2024

Sorry for the slow reply. Unfortunately this won't fix the failure, because it happens at a point in the test before any of the CLOCK_RES variables are involved.

This issue hasn't caused any full buildbot failures since #127101 was merged, but there have been failures which passed on a rerun and are therefore reported as warnings.

So I've created #127279 to revert #127101 and skip these tests until we have time to investigate this further.
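For readers unfamiliar with how a whole test class is skipped on one platform, the following is a minimal sketch using unittest.skipIf. It is not the code from #127279: the class name TimerfdTestsSketch, the IS_ANDROID flag, and the skip message are illustrative assumptions.

```python
import sys
import unittest

# Assumption: Android is detected through sys.platform, as in the
# TimerfdTests snippet quoted earlier in this thread.
IS_ANDROID = sys.platform == "android"


@unittest.skipIf(IS_ANDROID, "gh-124873: timerfd tests are unreliable on Android")
class TimerfdTestsSketch(unittest.TestCase):
    """Placeholder standing in for the real timerfd test class."""

    def test_placeholder(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

Skipping at class level keeps the tests running on other platforms while recording the issue number in the skip reason, so the skip is easy to find and remove later.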

@mhsmith mhsmith reopened this Nov 26, 2024
vstinner added a commit that referenced this issue Nov 26, 2024
* Revert "[3.13] gh-124873: Tolerate 100 ms in TimerfdTests on Android (GH-127101) (#127105)"

This reverts commit c09366b.

* Skip timerfd tests on Android.

Co-authored-by: Victor Stinner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 26, 2024
* Revert "[3.13] pythongh-124873: Tolerate 100 ms in TimerfdTests on Android (pythonGH-127101) (pythonGH-127105)"

This reverts commit c09366b.

* Skip timerfd tests on Android.

(cherry picked from commit 4ca2c82)

Co-authored-by: Malcolm Smith <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Member

The test is now skipped on Android: 4ca2c82.

vstinner added a commit that referenced this issue Nov 29, 2024
gh-124873: Skip timerfd tests on Android (GH-127279)

* Revert "[3.13] gh-124873: Tolerate 100 ms in TimerfdTests on Android (GH-127101) (GH-127105)"

This reverts commit c09366b.

* Skip timerfd tests on Android.

(cherry picked from commit 4ca2c82)

Co-authored-by: Malcolm Smith <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
@mhsmith
Member Author

mhsmith commented Dec 13, 2024

Similar report on Gentoo: #126112.

ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…27101)

On Android, TimerfdTests of test_os now uses 100 ms accuracy instead
of 10 ms.
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
* Revert "[3.13] pythongh-124873: Tolerate 100 ms in TimerfdTests on Android (pythonGH-127101) (python#127105)"

This reverts commit c09366b.

* Skip timerfd tests on Android.

Co-authored-by: Victor Stinner <[email protected]>