
Flaky Test Teardown: test_matrix_multi_user_roaming #4605

Closed
Dominik1999 opened this issue Aug 14, 2019 · 5 comments
Dominik1999 (Contributor) commented Aug 14, 2019

We seem to have a flaky test in

raiden/tests/integration/network/transport/test_matrix_transport.py::test_matrix_multi_user_roaming

It happened on CircleCI, see https://circleci.com/gh/raiden-network/raiden/71825#tests/containers/4

raiden/tests/integration/network/transport/test_matrix_transport.py::test_matrix_multi_user_roaming[matrix-private_rooms0-6-3] FAILED [ 62%]
raiden/tests/integration/network/transport/test_matrix_transport.py::test_matrix_multi_user_roaming[matrix-private_rooms0-6-3] ERROR [ 62%]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10] PASSED [ 75%]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2] PASSED [ 87%]
raiden/tests/integration/api/test_restapi.py::test_hex_converter PASSED  [100%]

==================================== ERRORS ====================================
_ ERROR at teardown of test_matrix_multi_user_roaming[matrix-private_rooms0-6-3] _

tp = <class 'Failed'>, value = None, tb = None

    def reraise(tp, value, tb=None):
        try:
            if value is None:
                value = tp()
            if value.__traceback__ is not tb:
                raise value.with_traceback(tb)
>           raise value

../venv-3.7-LINUX/lib/python3.7/site-packages/six.py:693: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv-3.7-LINUX/lib/python3.7/site-packages/six.py:693: in reraise
    raise value
../venv-3.7-LINUX/lib/python3.7/site-packages/six.py:693: in reraise
    raise value
raiden/tests/integration/fixtures/transport.py:82: in matrix_transports
    transport.stop()
raiden/network/transport/matrix/transport.py:473: in stop
    retrier.notify()
raiden/network/transport/matrix/transport.py:169: in notify
    with self._lock:
src/gevent/_semaphore.py:252: in gevent.__semaphore.Semaphore.__enter__
    ???
src/gevent/_semaphore.py:253: in gevent.__semaphore.Semaphore.__enter__
    ???
src/gevent/_semaphore.py:239: in gevent.__semaphore.Semaphore.acquire
    ???
src/gevent/_semaphore.py:179: in gevent.__semaphore.Semaphore._do_wait
    ???
src/gevent/_greenlet_primitives.py:59: in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:59: in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:63: in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   Failed: Teardown timeout >540.0s. This must not happen, when the teardown times out not all finalizers got a chance to run. This means not all fixtures are cleaned up, which can make subsequent tests flaky. This would be the case for pending greenlets which are not cleared by previous run.

src/gevent/__greenlet_primitives.pxd:35: Failed
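The teardown hangs because `retrier.notify()` blocks forever on a gevent semaphore (`with self._lock:`) that is never released, so the 540s teardown timeout fires before the remaining finalizers run. A minimal sketch of the defensive pattern, a bounded lock acquire so a stuck holder surfaces as a clear error instead of an open-ended hang. It uses the stdlib `threading.Lock` for portability (gevent's `Semaphore.acquire` also accepts a `timeout` argument); the helper name `notify_with_timeout` is hypothetical, not part of the Raiden codebase:

```python
import threading


def notify_with_timeout(lock, timeout=5.0):
    """Acquire ``lock`` within ``timeout`` seconds or raise.

    Hypothetical sketch: if the notify path used a bounded acquire
    instead of blocking indefinitely, a leftover greenlet holding the
    lock would show up as a TimeoutError at the call site rather than
    as a teardown timeout that skips the remaining fixture finalizers.
    """
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"could not take lock within {timeout:.1f}s")
    try:
        pass  # ... wake the retrier / do the notification work ...
    finally:
        lock.release()
```

With a free lock the call completes immediately; with the lock held elsewhere it fails fast with a diagnosable exception instead of stalling the whole test session.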
=================================== FAILURES ===================================
__________ test_matrix_multi_user_roaming[matrix-private_rooms0-6-3] ___________

matrix_transports = [<MatrixTransport node:0x25912d70117A032CbF1477678b7448aBbeaF33CE id:4d8e294d-4db3-4b3b-a27f-4daa1a580482>, <MatrixTra...9bad27d553>, <MatrixTransport node:0xF47fA4e9bfB69307CF38E85A3f91b2243B562c2f id:0301887d-f5f0-46ab-b3db-a89a23b9659b>]

    @pytest.mark.parametrize("matrix_server_count", [3])
    @pytest.mark.parametrize("number_of_transports", [6])
    def test_matrix_multi_user_roaming(matrix_transports):
        # 6 transports on 3 servers, where (0,3), (1,4), (2,5) are on the same server
        (
            transport_rs0_0,
            transport_rs0_1,
            transport_rs0_2,
            transport_rs1_0,
            transport_rs1_1,
            transport_rs1_2,
        ) = matrix_transports
        received_messages0 = set()
        received_messages1 = set()
    
        message_handler0 = MessageHandler(received_messages0)
        message_handler1 = MessageHandler(received_messages1)
    
        raiden_service0 = MockRaidenService(message_handler0)
        raiden_service1 = MockRaidenService(message_handler1)
    
        # Both nodes on the same server
        transport_rs0_0.start(raiden_service0, message_handler0, "")
        transport_rs1_0.start(raiden_service1, message_handler1, "")
    
        transport_rs0_0.start_health_check(raiden_service1.address)
        transport_rs1_0.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_0, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_0, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_0, transport_rs1_0)
    
        # Node two switches to second server
        transport_rs1_0.stop()
        wait_for_peer_unreachable(transport_rs0_0, raiden_service1.address)
    
        transport_rs1_1.start(raiden_service1, message_handler1, "")
        transport_rs1_1.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_0, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_1, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_0, transport_rs1_1)
    
        # Node two switches to third server
        transport_rs1_1.stop()
        wait_for_peer_unreachable(transport_rs0_0, raiden_service1.address)
    
        transport_rs1_2.start(raiden_service1, message_handler1, "")
        transport_rs1_2.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_0, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_2, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_0, transport_rs1_2)
        # Node one switches to second server, Node two back to first
        transport_rs0_0.stop()
        transport_rs1_2.stop()
    
        transport_rs0_1.start(raiden_service0, message_handler0, "")
        transport_rs0_1.start_health_check(raiden_service1.address)
        transport_rs1_0.start(raiden_service1, message_handler1, "")
        transport_rs1_0.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_1, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_0, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_1, transport_rs1_0)
    
        # Node two joins on second server again
        transport_rs1_0.stop()
        wait_for_peer_unreachable(transport_rs0_1, raiden_service1.address)
    
        transport_rs1_1.start(raiden_service1, message_handler1, "")
        transport_rs1_1.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_1, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_1, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_1, transport_rs1_1)
    
        # Node two switches to third server
        transport_rs1_1.stop()
        wait_for_peer_unreachable(transport_rs0_1, raiden_service1.address)
    
        transport_rs1_2.start(raiden_service1, message_handler1, "")
        transport_rs1_2.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_1, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_2, raiden_service0.address)
    
        assert ping_pong_message_success(transport_rs0_1, transport_rs1_2)
    
        # Node one switches to third server, node two switches to first server
        transport_rs0_1.stop()
        transport_rs1_2.stop()
    
        transport_rs0_2.start(raiden_service0, message_handler0, "")
        transport_rs0_2.start_health_check(raiden_service1.address)
        transport_rs1_0.start(raiden_service1, message_handler1, "")
        transport_rs1_0.start_health_check(raiden_service0.address)
    
        wait_for_peer_reachable(transport_rs0_2, raiden_service1.address)
        wait_for_peer_reachable(transport_rs1_0, raiden_service0.address)
    
>       assert ping_pong_message_success(transport_rs0_2, transport_rs1_0)

raiden/tests/integration/network/transport/test_matrix_transport.py:1022: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

transport0 = <MatrixTransport node:0x25912d70117A032CbF1477678b7448aBbeaF33CE id:80f7466b-28e8-46d7-bac0-5869a32259a8>
transport1 = <MatrixTransport node:0xF47fA4e9bfB69307CF38E85A3f91b2243B562c2f id:76ea6e3e-284b-4f6f-ab64-385f73307cd1>

    def ping_pong_message_success(transport0, transport1):
        queueid0 = QueueIdentifier(
            recipient=transport0._raiden_service.address,
            canonical_identifier=CANONICAL_IDENTIFIER_GLOBAL_QUEUE,
        )
    
        queueid1 = QueueIdentifier(
            recipient=transport1._raiden_service.address,
            canonical_identifier=CANONICAL_IDENTIFIER_GLOBAL_QUEUE,
        )
    
        transport0_raiden_queues = views.get_all_messagequeues(
            views.state_from_raiden(transport0._raiden_service)
        )
        transport1_raiden_queues = views.get_all_messagequeues(
            views.state_from_raiden(transport1._raiden_service)
        )
    
        transport0_raiden_queues[queueid1] = []
        transport1_raiden_queues[queueid0] = []
    
        received_messages0 = transport0._raiden_service.message_handler.bag
        received_messages1 = transport1._raiden_service.message_handler.bag
    
        msg_id = random.randint(1e5, 9e5)
    
        ping_message = Processed(message_identifier=msg_id, signature=EMPTY_SIGNATURE)
        pong_message = Delivered(delivered_message_identifier=msg_id, signature=EMPTY_SIGNATURE)
    
        transport0_raiden_queues[queueid1].append(ping_message)
    
        transport0._raiden_service.sign(ping_message)
        transport1._raiden_service.sign(pong_message)
        transport0.send_async(queueid1, ping_message)
    
        with Timeout(TIMEOUT_MESSAGE_RECEIVE, exception=False):
            all_messages_received = False
            while not all_messages_received:
                all_messages_received = (
                    ping_message in received_messages1 and pong_message in received_messages0
                )
                gevent.sleep(0.1)
>       assert ping_message in received_messages1
E       assert <Processed [msghash=-0x4ec1fdbf362f3a45]> in {<Processed [msghash=0x21ba299444774a80]>, <Delivered [msghash=0x149ac7d22dea0ba1]>, <Processed [msghash=0x701cd353a53...msghash=-0x4cd9e1c3fceda000]>, <Processed [msghash=0x7f093e4ebbc7eee4]>, <Delivered [msghash=0x7a698de1d740c8e9]>, ...}

raiden/tests/integration/network/transport/test_matrix_transport.py:89: AssertionError
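The failing assertion sits after a poll loop wrapped in `gevent.Timeout(..., exception=False)`, which silently swallows the timeout so the subsequent `assert` can report the actual received-message state. The same wait-for-predicate pattern can be sketched with the stdlib alone; the helper name `wait_until` and its parameters are illustrative, not from the Raiden test suite:

```python
import time


def wait_until(predicate, timeout=5.0, poll=0.1):
    """Poll ``predicate`` until it returns True or ``timeout`` elapses.

    Mirrors the gevent Timeout + sleep loop in the test above: the loop
    never raises on timeout; it returns the final predicate value so the
    caller can assert on it and get a meaningful failure message.
    """
    deadline = time.monotonic() + timeout
    result = predicate()
    while not result and time.monotonic() < deadline:
        time.sleep(poll)
        result = predicate()
    return result
```

Returning the last predicate value (rather than raising) keeps the assertion, and thus the diagnostic output, in the test body, which is exactly why the original loop uses `exception=False`.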
=============================== warnings summary ===============================
raiden/tests/integration/network/proxies/test_token_network.py::test_token_network_deposit_race
raiden/tests/integration/network/proxies/test_token_network.py::test_token_network_deposit_race
raiden/tests/integration/network/proxies/test_token_network.py::test_token_network_deposit_race
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/api/test_restapi.py::test_api_channel_state_change_errors[matrix-False-0-1]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/transfer/test_refundtransfer.py::test_different_view_of_last_bp_during_unlock[matrix-False-channels_per_node0-3-test_different_view_of_last_bp_during_unlock:{}]
raiden/tests/integration/rpc/assumptions/test_rpc_transaction_assumptions.py::test_transact_opcode
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/test_recovery.py::test_recovery_happy_case[matrix-False-3-channels_per_node0-10]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
raiden/tests/integration/long_running/test_integration_events.py::test_query_events[matrix-False-0-2]
  /home/circleci/venv-3.7-LINUX/lib/python3.7/site-packages/web3/utils/transactions.py:28: DeprecationWarning: chainId to be deprecated in v5, according to EIP 1474
    'chainId': lambda web3, tx: web3.net.chainId,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
- generated xml file: /home/circleci/raiden/test-reports/test-integration-matrix-3.7/results.xml -
== 1 failed, 7 passed, 698 deselected, 32 warnings, 1 error in 567.65 seconds ==
Exited with code 1

@palango palango added this to the Alderaan milestone Aug 14, 2019
@Dominik1999 Dominik1999 modified the milestone: Alderaan Aug 14, 2019
ulope (Collaborator) commented Aug 14, 2019

The test first failed halfway through, and then the teardown failed, so I suspect a Synapse bug. Since we are still on quite an old version, that isn't too unlikely.

Related: #3387, #4646

@Dominik1999 Dominik1999 removed this from the Alderaan milestone Aug 29, 2019
@ulope ulope changed the title Flaky Test: test_matrix_multi_user_roaming Flaky Test Teardown: test_matrix_multi_user_roaming Aug 29, 2019
@rakanalh rakanalh added this to the Bespin milestone Sep 6, 2019
LefterisJP (Contributor)
@Dominik1999 @ulope after switching to synapse 1.0 has this flakiness been observed in the wild again? If not I would close this issue.

Dominik1999 (Contributor, Author)
I don't know. Good point, though: do we still have integration tests disabled because of flakiness?

LefterisJP (Contributor)
This test was never skipped afaik.

karlb (Contributor) commented Sep 25, 2020

This has not happened for a long time, so it is probably no longer an issue.

@karlb karlb closed this as completed Sep 25, 2020