Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User can't send payment, always pending #482

Open
roeierez opened this issue Jul 18, 2024 · 4 comments
Open

User can't send payment, always pending #482

roeierez opened this issue Jul 18, 2024 · 4 comments
Assignees

Comments

@roeierez
Copy link
Contributor

roeierez commented Jul 18, 2024

Relai user tries to send payment that results in pending.
The logs shows that there is a timeout waiting for the channel to re-established. Also full logs are attached.

1721292295613.app.log

[BreezSdk] {INFO} (2024-07-18T08:45:36.502607Z) : node-logs: UNUSUAL plugin-gl-plugin-internal: Failing payment with: Timeout while waiting for channels to re-establish\n\nStack backtrace:\n   0: <gl_plugin_internal::node::InternalNodeServer<R> as cln_grpc::pb::node_server::Node>::pay::{{closure}}\n   1: <<cln_grpc::pb::node_server::NodeServer<T> as tower_service::Service<http::request::Request<B>>>::call::PaySvc<T> as tonic::server::service::UnaryService<cln_grpc::pb::PayRequest>>::call::{{closure}}\n   2: <cln_grpc::pb::node_server::NodeServer<T> as tower_service::Service<http::request::Request<B>>>::call::{{closure}}\n   3: <gl_plugin::node::rpcwait::RpcWaitService<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}}\n   4: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll\n   5: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll\n   6: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll\n   7: <tonic::transport::service::router::RoutesFuture as core::future::future::Future>::poll\n   8: <gl_plugin::node::SignatureContextService<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}}\n   9: <tower::util::either::Either<A,B> as core::future::future::Future>::poll\n  10: <tonic::transport::server::recover_error::ResponseFuture<F> as core::future::future::Future>::poll\n  11: <tonic::transport::server::SvcFuture<F> as core::future::future::Future>::poll\n  12: <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll\n  13: tokio::runtime::task::core::Core<T,S>::poll\n  14: tokio::runtime::task::harness::Harness<T,S>::poll\n  15: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\n  16: tokio::runtime::scheduler::multi_thread::worker::Context::run\n  17: tokio::runtime::context::set_scheduler\n  18: tokio::runtime::context::runtime::enter_runtime\n  19: tokio::runtime::scheduler::multi_thread::worker::run\n  20: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll\n  21: tokio::runtime::task::core::Core<T,S>::poll\n  22: tokio::runtime::task::harness::Harness<T,S>::poll\n  23: tokio::runtime::blocking::pool::Inner::run\n  24: std::sys_common::backtrace::__rust_begin_short_backtrace\n  25: core::ops::function::FnOnce::call_once{{vtable.shim}}\n  26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9\n  27: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9\n  28: std::sys::unix::thread::Thread::new::thread_start\n             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17\n  29: start_thread\n  30: clone
@cdecker
Copy link
Collaborator

cdecker commented Jul 18, 2024

Yes, we used to treat these as a faillible pre-flight test, but as the error being returned was the usual no route found error, that did not really tell us much, we made it infallible, i.e., it is now required to pass.

The usual cause for these is a slow signer, a CLN <> VLS desync issue, or a slow peer.

Failing here is correct by the way, as we do not stand a chance to complete the payments without enough capacity back online. This just happens to tell us more than the catch-all no route found error, so this is a symptom of us getting closer to the actual root cause :-)

@cdecker cdecker self-assigned this Jul 18, 2024
@cdecker
Copy link
Collaborator

cdecker commented Jul 18, 2024

Tracing through the logs we can reconstruct the following timeline:

  • 2024-07-18T10:38:41+02:00 Node scheduled
  • 2024-07-18T10:38:43+02:00 Startup complete
  • 2024-07-18T10:38:44+02:00 Signer 0 connects
  • 2024-07-18T10:38:45+02:00 Issuing reconnection command
  • 2024-07-18T10:39:31+02:00 Block sync completes (no longer necessary since pay: Use the networkheight as current height ElementsProject/lightning#7190)
  • 2024-07-18T10:39:47+02:00 Call to glclient.Node/Pay start of pre-flight checks
  • 2024-07-18T10:39:47+02:00 Checking if channel 841259x3019x0 is ready

So from what we can see here, we either fail to detect a channel becoming active, or we are not re-activating the channel correctly.

I'll dig further.

@cdecker
Copy link
Collaborator

cdecker commented Jul 18, 2024

If desired we can make the pre-flight infallible again for this node, and see if that helps. That'd mean that the pre-flight checks are not detecting activation correctly.

Longer term we want to make the channel status (active / inactive) an explicit part of listpeerchannels so we can skip some of the guesswork.

@roeierez
Copy link
Contributor Author

@cdecker I am in favor of anything that will help us improve here. We can update relai that we want to do that for this node if you want to go for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants