Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3588 #3598

Merged
merged 5 commits into from
Mar 14, 2019
Merged

Fix #3588 #3598

merged 5 commits into from
Mar 14, 2019

Conversation

err508
Copy link
Contributor

@err508 err508 commented Mar 7, 2019

Adds a check with retries to the _get_room_for_address method, to prevent sending messages to empty rooms. After discussion with @andrevmatos, we will not leave the room but assume the peer will join eventually.

Fixes #3588 race condition, where we sent messages to a room, where no peer has joined yet or never will.

@err508 err508 requested review from ulope and andrevmatos March 7, 2019 17:37
@codecov
Copy link

codecov bot commented Mar 8, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@3073e65). Click here to learn what that means.
The diff coverage is 78.94%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #3598   +/-   ##
=========================================
  Coverage          ?   77.25%           
=========================================
  Files             ?      103           
  Lines             ?    13723           
  Branches          ?     1931           
=========================================
  Hits              ?    10602           
  Misses            ?     2461           
  Partials          ?      660
Impacted Files Coverage Δ
raiden/network/transport/matrix/client.py 70.22% <100%> (ø)
raiden/network/transport/matrix/utils.py 76.77% <100%> (ø)
raiden/network/transport/matrix/transport.py 76.6% <72.41%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3073e65...ee06934. Read the comment docs.

room_is_empty = not bool(peer_ids & member_ids)
if room_is_empty or last_ex:
if last_ex:
raise last_ex
Copy link
Contributor

@andrevmatos andrevmatos Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of raising last_ex inside for retry loop? It's the same as not handling the exception at all. The raise last_ex inside else of for loop (or in a if last_ex later) is a pattern to allow swallowing exceptions a given number of times, and re-raising them iff for couldn't succeed (break) on the retries. Re-raising inside the loop is the same as not catching it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to only raise the exception once the retries have been exhausted, as the server response might change in the meantime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. That's the pattern, but the code there raises on the first exception, which is the same as not handling it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah sorry, didn't see it was there twice

if room_is_empty or last_ex:
if last_ex:
raise last_ex
gevent.sleep(retry_interval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if not going for a separate function, I'd still suggest to use stop_event.wait instead of gevent.sleep and breaking early in case it returns True (was set), so it properly return in case client is stopping, instead of lying around way longer than any Runnable's child is supposed to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed it _handle_invite as well.

@err508 err508 force-pushed the #3588 branch 2 times, most recently from aea11b9 to e2be88e Compare March 8, 2019 16:17
last_ex = e
room_is_empty = not bool(peer_ids & member_ids)
if room_is_empty or last_ex:
self._stop_event.wait(retry_interval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need also to break if return of wait is True, for the loop to exit.

@err508 err508 force-pushed the #3588 branch 3 times, most recently from 467a9df to 4207b7c Compare March 8, 2019 17:47
Copy link
Contributor

@andrevmatos andrevmatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@ulope ulope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

else:
# Inform the client, that currently no one listens:
self.log.error(
'Peer has not joined from invite yet, should join in the eventually',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the

len(received_messages1) == number_of_received_messages1 + 1
)
gevent.sleep(.1)
message = Processed(message_identifier=number_of_received_messages1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in #3385: This looks to me as if an assert all_messages_received is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to happen in the tests as here, I found it cleaner to assert directly in the tests, not in the helper function

@err508
Copy link
Contributor Author

err508 commented Mar 13, 2019

  • has been tested for regression of hanging transfers
  • has been tested for users roaming with
  - transport01.raiden.network
  - transport02.raiden.network
  - transport03.raiden.network

err508 and others added 4 commits March 14, 2019 13:22
condition:
- Client A invites
- The invite triggers _handle_invite in Client B's transport
- Client A starts sending messages to Client B
- Messages are lost, as the invite was not processed yet

The race condition will be fixed in another PR.
Appeared during raiden-network#3124, related raiden-network#2779, raiden-network#3123.
if someone joins the room we created. If no peer joins, we
leave the room as well.
This fixes a race condition, where we sent messages to a room,
where no peer has joined yet or never will.
to speed up the sync if necessary.
[skip tests]
@err508 err508 merged commit 61bdaff into raiden-network:master Mar 14, 2019
This was referenced Mar 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component / Transport Transport related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants