Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow payments using Breez SDK #502

Open
afterburn opened this issue Aug 20, 2024 · 15 comments
Open

Slow payments using Breez SDK #502

afterburn opened this issue Aug 20, 2024 · 15 comments

Comments

@afterburn
Copy link

We're using Breez SDK and while testing paying lightning invoices we experience anywhere from 30s to multiple minutes of delay before a payment completes fully where the latter is an unacceptable UX to us. I've tried to get to the bottom of this with the Breez team but they've told me to open an issue here, so here it is. Attached you'll find two logs, one for a relatively fast payment and one for what we experience to be a slow payment.

breez-payment-fast.log
breez-payment-slow.log

Any help/guidance as to what might be causing this and how to resolve this would be greatly appreciated.

@kingonly
Copy link
Collaborator

There's 5 minutes log in the slow one...

What's the node pubkey?
What the payment timestamp?
What the payment destination?

@afterburn
Copy link
Author

@kingonly Okay, attaching a new log file. This time I took note of the details you requested.

breez.log

Node Pubkey/Id

02d3cbfaf44dbb8636bd12354f042b19015ecd65162ea5bd1d4aeb99bb16c7ad72

Payment Timestamp

20:10:45

Payment Destination

lnbc1u1pnvfhqhpp5qfz0se9lfqakv74wpalty00n85ezmv45tnve33r029zep477ch9qcqpjsp5gp5dc08p92hmrn3h9r2dx4n8g96cn5rmrvhjjtcwquvnce23jw2q9q7sqqqqqqqqqqqqqqqqqqqsqqqqqysgqdqqmqz9gxqyjw5qrzjqwryaup9lh50kkranzgcdnn2fgvx390wgj5jd07rwr3vxeje0glcll6r2vqt00symuqqqqlgqqqqqeqqjq7ngrt8ftkweytv34dnfpt55nlyj6la2m26swerrguzxvpkk8pmgkxqlmdvm9ed0dxcart8lxvfc8g99z3f6cv6dpaerymxdlgh422lgpxkpxr9

@kingonly
Copy link
Collaborator

This is a payment to Phoenix. Are you testing it on the same device? Can you please send to another device? Sending between two non custodial wallets on the same device can explain the latency.

@afterburn
Copy link
Author

@kingonly I used two separate devices, a physical Samsung phone running Phoenix on android and the other is an iOS simulator running our app. They were on the same network.

@kingonly
Copy link
Collaborator

@cdecker could you help me debug this payment? I can pay Phoenix quickly in my environment fwiw

@afterburn
Copy link
Author

breez.log

FWIW here is another log where we pay an invoice to WoS from our app, two separate devices as before. Same pubkey/node id. Payment timestamp 22:33:51. Payment destination lnbc10u1pnvflvqpp5quwkpcuaw5vm98gpgqn0tgxksrne4rspq4689suctwha5xcmyxasdqqcqzzsxqyz5vqsp59stlvtzazvx58u99rygxezqsqjdwjlcc747xxhc0ztwtr3jj89xq9qxpqysgqp24a25mhgvhu3mvrzq5w9pzxyxuf0yn77rf4n3td6petg2p00k9n4cum3sfyahqqjqxas6fckfcv2qljygwxkyyzhsl0h3z5n3yyegcq5a80ay.

@cdecker
Copy link
Collaborator

cdecker commented Aug 21, 2024

For future reference the sessions are:

  • 7231591378063360
  • 7231722781413377
  • 7231756495228928

Looking at the slow payments only here

7231591378063360

Timeline

  • 2024-08-20T11:22:21+02:00 Scheduled
  • 2024-08-20T11:22:23+02:00 Startup completed
  • 2024-08-20T11:22:24+02:00 Signer attached
  • 2024-08-20T11:22:25+02:00 Gossip sync complete

The first successful payment

  • 2024-08-20T11:49:39+02:00 Call to pay
  • 2024-08-20T11:49:39+02:00 Pre-flight checks completed (by this time several signers had registered, all of them disconnect, potentially due to the app opening and closing. We do get a response for our request, so that's not a problem)
  • 2024-08-20T11:51:15+02:00 Payment succeeds

The second payment:

  • 2024-08-20T11:51:18+02:00 Call to pay
  • 2024-08-20T11:51:18+02:00 Pre-flight check completed
  • 2024-08-20T11:51:18+02:00 Created outgoing onion for route: 844981x3824x0 -> 840936x1057x0 -> 856647x1703x0 -> 845136x1684x1 -> 854798x1031x6 -> 848375x2725x0 -> 836841x2426x1 -> 762578x410x1 -> 852303x2834x0
  • 2024-08-20T11:51:37+02:00 partid 0: Intermediate node 02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924 reported 1007 (WIRE_TEMPORARY_CHANNEL_FAILURE) at 836841x2426x1
  • 2024-08-20T11:51:38+02:00 partid 1: Created outgoing onion for route: 844981x3824x0 -> 840936x1057x0 -> 856647x1703x0 -> 845136x1684x1 -> 854798x1031x6 -> 848375x2648x0 -> 836841x2419x1 -> 762578x410x1 -> 852303x2834x0
  • 2024-08-20T11:51:50+02:00 Intermediate node 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3 reported 1007 (WIRE_TEMPORARY_CHANNEL_FAILURE) at 836841x2419x1

So as you can see we get unlucky and try a channel 836841x2426x1, which has insufficient capacity. This attempt alone cost 19s to execute. The problem here is twofold:

  • Our scoring does not dampen the chances of these very slow attempts out. Scoring TOR and known slow nodes lower should help reduce our reliance on them, skipping expensive attempts. The right parametrization here is key: if we eliminate to eagerly we may end up excluding routes that would have worked, but we were prejudiced, on the other hand not avoiding these nodes hard enough results in us running into these traps.
  • We don't remember scoring across payment attempts. That means that every payments starts with a blank slate, so even if we had learned something interesting, by the time we could reuse it, we've already discarded the information. We're working on this in pay: Remember and update channel_hints across payments ElementsProject/lightning#7494 which will allow us to a) remember inferred information across payments, b) share that information across several plugins such as keysend and pay and c) we can even share that information across nodes (useful for nodes that don't do many payments).

The remaining payments in the logs are aggregated as this:

ParserContext(
│   payments=[
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857836', amount_msat='1644164', htlc_id='169')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='170'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='171'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='172'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='843228x418x0',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='173'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857837', amount_msat='1643164', htlc_id='174')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='175'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='176'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='177'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857838', amount_msat='1501150', htlc_id='178')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='179'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='180'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='181'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='843228x418x0',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='182'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='4',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '838710x1842x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838710x1842x0',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='183'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857838', amount_msat='1645164', htlc_id='184')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='185'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='186'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='187'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error=None,
│   │   │   │   │   code=None,
│   │   │   │   │   failchan=None,
│   │   │   │   │   failnode=None,
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='188'
│   │   │   │   )
│   │   │   ]
│   │   )
│   ]
)

(This is the output of a log slicer that I'm building, since analyzing
these logs is getting tricky, and rather time-consuming, apologies for
the lack of timestamps, which would be useful here, but I pulled apart
the first payment to give an idea on timings)

As you can see we're losing a lot of time talking to
02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924,
a.k.a., that first failing hop in the second payment, costing us
dozens of seconds. A correct scoring and remembering prior information
would de-emphasize that node, and boost other, more successful one.

If desired I can analyse the other logs too, but I think this
highlights the issue rather nicely.

@kingonly
Copy link
Collaborator

@yaslama please take a look at our channels with this node. Perhaps we should close our channels, although it seems like large routing node.

Meanwhile, we've opened direct channels to Acinq to mitigate this issue. Please report if this problem persists.

@afterburn
Copy link
Author

@cdecker @kingonly is there any ETA on the change @cdecker is suggesting?

@yaslama
Copy link

yaslama commented Aug 26, 2024

We have new channels to Acinq and it should be sufficient to mitigate the issue. But i also closed the channel to 02a98e8c

@afterburn
Copy link
Author

afterburn commented Aug 28, 2024

We are still experiencing times well above 15s, anywhere from 30s - 1m30s both to non-custodial and custodial recipients. What else can be done to get this resolved? Happy to assist wherever possible.

@kingonly
Copy link
Collaborator

Provide the details of a specific payment and we'll take a look

@dallasirushing
Copy link

Hey guys

Summary of the problem:

  • We're facing 30 seconds or more for all payments to reach finality whether we pay custodial wallets or non-custodial wallets.

I'm concerned that this discussion has gotten to a point where the resolution was to fix a route with Phoenix, but I can't see how this solves the "general slowness" that we're facing.

@kingonly
Copy link
Collaborator

The problem is that we're not aware of a general issue. I don't see any other way than debugging specific payments. Please share more info in the issue re specific payments so we could follow up with the GL team.

We're also working on a trampoline payments implementation that might mitigate this issue but we're still a few weeks away from delivering it.

@cdecker
Copy link
Collaborator

cdecker commented Aug 28, 2024

Apologies for not providing the requested ETA for general enhancements of payments. We were heads down implementing it, and we're getting close with CLN#7494, but sadly we missed the release window for v24.08. As soon as we've finalized the PR we will start back porting it onto v24.02 which corresponds to the latest Breez SDK, and from there on we can fine tune in the servers quickly. So a couple of weeks at most for general improvements. In the meantime the trampoline progress is looking promising, and that'll alleviate the issue maybe even a bit quicker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants