Flaky test_quiet_client_close #6540

Open
gjoseph92 opened this issue Jun 9, 2022 · 1 comment · Fixed by #6541
Labels
flaky test Intermittent failures on CI.

Comments

@gjoseph92
Collaborator

This hasn't actually failed yet on the test report, but I know it can, as it did here: #6504 (comment)

This is caused by #6390 in fallout from #6361.

    def test_quiet_client_close(loop):
        with captured_logger(logging.getLogger("distributed")) as logger:
            with Client(
                loop=loop,
                processes=False,
                dashboard_address=":0",
                threads_per_worker=4,
            ) as c:
                futures = c.map(slowinc, range(1000), delay=0.01)
                sleep(0.200)  # stop part-way
            sleep(0.1)  # let things settle
    
            out = logger.getvalue()
            lines = out.strip().split("\n")
            assert len(lines) <= 2
            for line in lines:
>               assert (
                    not line
                    or "Reconnecting" in line
                    or "garbage" in line
                    or set(line) == {"-"}
                ), line
E               AssertionError: Received heartbeat from unregistered worker 'inproc://10.213.1.205/15971/24'.

I think we can just swap "Reconnecting" for "unregistered worker" in the acceptable output. It's not great that this message gets logged, but the only real fix for it is #6390. So in the interim, we can just accept that it may happen.
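For illustration, here is a minimal sketch of what the relaxed check could look like, pulled out of the test so it runs on its own. The helper names `is_acceptable_line` and `unexpected_lines` and the `ACCEPTABLE_SUBSTRINGS` tuple are hypothetical and do not exist in distributed; the actual change in #6541 may differ.

```python
from __future__ import annotations

# Hypothetical allow-list: "Reconnecting" swapped for "unregistered worker",
# as proposed above. Not taken from the distributed code base.
ACCEPTABLE_SUBSTRINGS = ("unregistered worker", "garbage")


def is_acceptable_line(line: str) -> bool:
    """Return True if a captured log line is tolerated by the test."""
    return (
        not line  # empty line
        or any(s in line for s in ACCEPTABLE_SUBSTRINGS)
        or set(line) == {"-"}  # separator made only of dashes
    )


def unexpected_lines(out: str) -> list[str]:
    """Return the captured log lines that the test should still reject."""
    return [
        line for line in out.strip().split("\n") if not is_acceptable_line(line)
    ]
```

In the test, the loop over `lines` would then reduce to something like `assert not unexpected_lines(out)`.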

gjoseph92 added the "flaky test" label Jun 9, 2022
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Jun 9, 2022
@gjoseph92
Collaborator Author

@graingert saw this again with a slightly different error. You can have many more than 2 lines when the heartbeat fails :)

___________________________ test_quiet_client_close ____________________________

loop = <tornado.platform.asyncio.AsyncIOMainLoop object at 0x13e44e110>

    def test_quiet_client_close(loop):
        with captured_logger(logging.getLogger("distributed")) as logger:
            with Client(
                loop=loop,
                processes=False,
                dashboard_address=":0",
                threads_per_worker=4,
            ) as c:
                futures = c.map(slowinc, range(1000), delay=0.01)
                sleep(0.200)  # stop part-way
            sleep(0.1)  # let things settle
    
            out = logger.getvalue()
            lines = out.strip().split("\n")
>           assert len(lines) <= 2
E           assert 16 <= 2
E            +  where 16 = len(["Received heartbeat from unregistered worker 'inproc://10.79.8.226/15050/24'.", 'Heartbeat to scheduler failed', 'Traceback (most recent call last):', '  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1159, in heartbeat', '    response = await retry_operation(', '  File "/Users/runner/work/distributed/distributed/distributed/utils_comm.py", line 383, in retry_operation', ...])

distributed/tests/test_client.py:4934: AssertionError

https://github.com/dask/distributed/runs/7672977739?check_suite_focus=true#step:11:1346
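The line count blows up here because a single logged exception is formatted as many lines of traceback. As a rough sketch of one way around that (not what the test or #6541 does), log records could be counted instead of text lines, so that a record carrying a traceback counts once. `RecordCollector`, `log_simulated_heartbeat_failure`, and the logger name `record_count_demo` are all hypothetical names; only the standard-library `logging` module is used.

```python
import logging


class RecordCollector(logging.Handler):
    """Hypothetical handler that stores raw LogRecord objects."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def log_simulated_heartbeat_failure():
    """Log one exception and return the records that were captured."""
    logger = logging.getLogger("record_count_demo")  # not the real logger
    logger.propagate = False  # keep the demo output off the root logger
    collector = RecordCollector()
    logger.addHandler(collector)
    try:
        try:
            raise OSError("simulated comm failure")
        except OSError:
            # One record, even though its formatted text spans several lines.
            logger.exception("Heartbeat to scheduler failed")
    finally:
        logger.removeHandler(collector)
    return collector.records


records = log_simulated_heartbeat_failure()
formatted = logging.Formatter().format(records[0])
```

With this approach `len(records)` stays at 1 for the simulated failure, while `formatted.split("\n")` has several entries because of the traceback.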
