
Payment couldn't be completed because: there is no route available PFS5 #4927

Closed
agatsoh opened this issue Sep 13, 2019 · 21 comments
@agatsoh
Contributor

agatsoh commented Sep 13, 2019

Description

It seems the node is not getting any routes from the PFS for the target node.

f'HTTP status code "{resp.status_code}" while fetching {url}. '
scenario_player.exceptions.legacy.RESTAPIStatusMismatchError: HTTP status code "409" while fetching http://127.0.0.1:41163/api/v1/payments/0x62083c80353Df771426D209eF578619EE68D5C7A/0x536b7368575301950A00E0B945f75D22AD64534e. Expected 200: {"errors": "Payment couldn't be completed because: there is no route available"}
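For reference, a minimal sketch of the request that produced the 409 above, reconstructed from the error URL; the amount is a placeholder, not the value used in the scenario:

import requests

# Addresses and port copied from the failing request above; the port changes on
# every scenario run. The amount is a placeholder, not the scenario's value.
url = (
    "http://127.0.0.1:41163/api/v1/payments/"
    "0x62083c80353Df771426D209eF578619EE68D5C7A/"
    "0x536b7368575301950A00E0B945f75D22AD64534e"
)
resp = requests.post(url, json={"amount": 1})
print(resp.status_code, resp.json())
# 409 with "there is no route available" means the node could not obtain a
# usable route (via the PFS) to the target.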

scenario-player-run_pfs5_too_low_capacity_2019-09-13T13:08:38.log

node_logs_0_1_2_3_4.zip

A snapshot of the SP console from my local machine:
https://gist.github.com/agatsoh/f53f16372f87e672c21ec4989b8ea845

Edit:
Raiden commit

(raiden_env) [krishna@krishna-pc raiden]$ git log
commit bb0d6563ba5328399e303c74c00ed68bb210535b (HEAD -> develop, upstream/develop)
Author: Yoichi Hirai <[email protected]>
Date:   Thu Sep 12 18:36:43 2019 +0200

    Fixes on @LefterisJP's comments

SP commit

(raiden_env) [krishna@krishna-pc scenario-player]$ git log
commit 272da93ef08d53d176e18ef261a0952185912244 (HEAD -> dev, upstream/dev)
Merge: f28b481 c54466e
Author: Nils Diefenbach <[email protected]>
Date:   Fri Sep 13 08:58:44 2019 +0200

    [FIX-#319] Respect token.address scenario setting

Expected Behavior

The node should be able to find the correct path, and the PFS should reply with a valid route.

@czepluch
Contributor

czepluch commented Sep 13, 2019

For me this scenario passes locally. Will try to run it some more times and see if I can reproduce. Maybe it's flaky. Might be related to https://github.com/orgs/raiden-network/projects/12#card-26353251 that also seems to have shown flaky behaviour.

@czepluch
Contributor

It was indeed the first assert_pfs_history that failed. So my assumption would be that the PFS acted wrongly. I will look into it.

@agatsoh
Contributor Author

agatsoh commented Sep 13, 2019

Just checked; for me it fails all the time. I run the scenario like this:

scenario_player --chain=goerli:http://10.104.6.13:8545 run --no-ui --keystore-file=/home/krishna/.ethereum/keystore/UTC--2018-10-12T07-01-18.438476520Z--8f2e0940bed6f90f1cb14feb37f045bb79c41b2d --password=${KEYSTORE_PW} /home/krishna/raidenforked/raiden/raiden/tests/scenarios/pfs5_too_low_capacity.yaml

@czepluch
Contributor

czepluch commented Sep 13, 2019

I took a look at the logs and also had @palango take a look. It looks like nodes are UNREACHABLE or offline when the payment takes place. I have also managed to run into this problem locally myself now. It looks like a Matrix problem. @ulope, can you take a look when you're back from vacation? Or maybe @err508 has time?

@czepluch
Contributor

This doesn't just happen for this specific scenario. I have experienced it for several different ones, which points even further towards a transport/Matrix problem.

@christianbrb transferred this issue from raiden-network/scenario-player on Sep 19, 2019
@christianbrb added the Component / Transport label on Sep 19, 2019
@czepluch
Contributor

@ulope may I assign this to you? I know you're looking into it.

@christianbrb
Contributor

@andrevmatos also experienced this issue. A restart of the Raiden node fixed it.

@czepluch
Contributor

@ulope Here are all the logs from a run that failed.

ms1_node21_fails_logs.zip

@christianbrb assigned ulope and unassigned czepluch and agatsoh on Sep 20, 2019
@christianbrb added the State / Investigating label on Sep 20, 2019
@palango
Contributor

palango commented Sep 20, 2019

I think I ran into this as well yesterday, so if the PFS logs help you I can upload them.

@andrevmatos
Contributor

andrevmatos commented Sep 20, 2019

This seems related to the Matrix transport. I've seen a Python client I had online appear as offline to the light client, even though it continued to operate normally. I couldn't notice anything in the logs, but it seems like the sync stopped and the servers set the presence of the node to offline. If that's the cause, the PFS is also not going to find the route, because it will see the node as offline in Matrix. Restarting the node seems to fix it because it restarts the client/transport, and the presence event is picked up again by the server and broadcast to the peers.
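If that's what happens, the server-side view can be checked directly with the standard Matrix client-server presence endpoint. A minimal sketch, assuming you have a valid access token and the affected node's Matrix user ID (both are placeholders below):

import requests

# Placeholders: the affected node's Matrix user ID (Raiden user IDs are based
# on the node's Ethereum address) and a valid access token for any account on
# the homeserver.
HOMESERVER = "https://transport01.raiden.network"
USER_ID = "@0x<node-address>:transport01.raiden.network"
ACCESS_TOKEN = "<access token>"

resp = requests.get(
    f"{HOMESERVER}/_matrix/client/r0/presence/{USER_ID}/status",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
# The response contains "presence" ("online"/"offline"/"unavailable") and
# "last_active_ago"; an unexpected "offline" here would support the theory.
print(resp.status_code, resp.json())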

@ulope
Collaborator

ulope commented Sep 24, 2019

The current working theory is that some sort of problem is happening with federation requests between (at least) transport03 and transport01.

As mentioned in chat, to help track down the cause, please run the following command if you see this issue while running scenarios:

cat ~/.raiden/scenario-player/scenarios/<scenario-name>/node_<highest-run-number>_*/run-*.log | jq 'select(.event == "Using Matrix server") | .server_url'

(please substitute the correct values in the angle brackets).

@palango
Contributor

palango commented Sep 24, 2019

cat ~/.raiden/scenario-player/scenarios/mfee1_flat_fee/node_9_*/run-*.log | jq 'select(.event == "Using Matrix server") | .server_url'
"https://transport01.raiden.network"
"https://transport03.raiden.network"
"https://transport03.raiden.network"
"https://transport01.raiden.network"

@Dominik1999
Contributor

Dominik1999 commented Sep 24, 2019

➜  BF1-test cat node_3_*/run-*.log | jq 'select(.event == "Using Matrix server") | .server_url'
"https://transport01.raiden.network"
"https://transport03.raiden.network"
"https://transport01.raiden.network"
"https://transport01.raiden.network"
"https://transport01.raiden.network"
"https://transport01.raiden.network"

@LefterisJP
Contributor

Got the same problem running PFS1.

cat ~/.raiden/scenario-player/scenarios/pfs1_get_a_simple_path/node_0_003/run-000.log  | jq 'select(.event == "Using Matrix server") | .server_url'
"https://transport03.raiden.network"

@rakanalh added this to the Alderaan milestone on Sep 24, 2019
@konradkonrad
Contributor

Same problem for MS3.

LOGPATH="$HOME/.raiden/scenario-player/scenarios/ms3_simple_monitoring/"; cat $LOGPATH/node_$(cat $LOGPATH/run_number.txt)_*/*.log | jq 'select(.event == "Using Matrix server") | .server_url'
"https://transport01.raiden.network"
"https://transport01.raiden.network"

@agatsoh
Contributor Author

agatsoh commented Sep 24, 2019

I tested this today on my local machine with the latest SP and latest Raiden:

(raiden_env) [krishna@krishna-pc scenario-player]$ git log
commit 8ded944959e6d469056cbce268e0c4aa64d4c66a (HEAD -> dev, upstream/dev)
Author: Nils Diefenbach <[email protected]>
Date:   Tue Sep 24 11:35:57 2019 +0200

    Update scenario-example-v2.yaml

scenario-player-run_pfs5_too_low_capacity_2019-09-24T16:48:32.log

pfs5_too_low_capacity_node_logs.zip

@err508
Contributor

err508 commented Sep 24, 2019

I think these three logs are interesting because all nodes are on transport3 here.
They are from a failed run of mfee1_flat_fee.
The user that causes problems is Alice; cb64 and 16f0 are neither any of the other nodes nor the scenario player.

Timeline:

  • Scenario run starts at 10:10
  • at 10:11:30.75 Alice is online at transport3
  • at 10:11:30.76 transport1 learns that Alice is online
  • at 10:12:10.07156 transport1 receives a presence update that Alice is offline from 16f0@transport1
  • at 10:12:10.0716 transport3 receives a presence update that Alice is offline from cb64@transport1
  • at 10:17:10 Alice syncs with transport03, noticing that Bob is online
  • at 10:17:19.7250 transport3 receives another presence update that Alice is offline from cb64@transport1
  • at 10:17:19.740495 transport1 receives a presence update that Alice is offline from 16f0@transport1
  • at 10:17:50 the scenario fails, as Alice is assumed to be offline

shortened_html_logs.zip

@err508
Contributor

err508 commented Sep 24, 2019

@palango could you let me know what cb64 and 16f0 are doing in the scenario? Could this be the PFS/MS?

@palango
Contributor

palango commented Sep 24, 2019

@err508 What are the full addresses?

@err508
Contributor

err508 commented Sep 24, 2019

@palango 0x16f090b46c9eea72478a21f00764e2798d966d0b and 0xcb645ac5a359fb2c6dd2971cb5f556f6ceb1ea06, but never mind: these are two clients used for the presence logging.

@ulope
Collaborator

ulope commented Sep 25, 2019

After I purged the test transport servers' databases, the situation seems to have improved for many of the testers, but unfortunately not all (I know @czepluch is still seeing the issue).

What I've discovered so far:

  • Raiden nodes that experience this issue appear to go offline because:
    • They perform the normal Matrix /sync long-polling HTTP request
    • This triggers a federation request server-side
    • This request stalls for some as yet unknown reason (eventually getting aborted with "Connection reset by peer" or a similar error)
    • This leads to a 500 response from the /sync endpoint
      • Crucially, 500 responses don't seem to update the 'last seen' mechanism on the server
    • Raiden retries Matrix requests that return 500 responses
    • If this goes on long enough, the 30 s timeout in synapse expires (see two bullets above) and the node is assumed offline (which gets broadcast as normal to all listening peers)

It doesn't seem to be a networking issue. I've run networking tests in the background all day yesterday and haven't seen any indications that this could be the reason.

I'm going ahead with upgrading the test servers to a recent Synapse release (with a workaround for #4634) anyway, in the hope that this fixes the federation issue.
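As a rough illustration of the failure sequence in the bullet list above (this is not Raiden's actual transport code; the URL, timeout values and missing authentication are simplified placeholders):

import time
import requests

# Placeholder values; authentication is omitted for brevity.
SYNC_URL = "https://transport03.raiden.network/_matrix/client/r0/sync"
PRESENCE_TIMEOUT = 30  # seconds after which synapse considers the user offline

last_successful_sync = time.monotonic()

for _ in range(100):
    # Normal long-polling /sync request. A stalled federation request on the
    # server side can make this come back as a 500 instead of a sync response.
    resp = requests.get(SYNC_URL, params={"timeout": 30000}, timeout=35)

    if resp.status_code == 500:
        # The client simply retries, but the failed request apparently does
        # not refresh the server's 'last seen' marker for this user ...
        if time.monotonic() - last_successful_sync > PRESENCE_TIMEOUT:
            # ... so once the retries have gone on long enough, the server has
            # already broadcast this user as offline to all listening peers.
            print("presence for this node has likely flipped to offline")
        continue

    # A successful sync is what keeps the node marked as online.
    last_successful_sync = time.monotonic()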
