Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Counterintuitive warp error for Unsupported endpoint version #3404

Closed
michaelsproul opened this issue Aug 2, 2022 · 15 comments
Closed

Counterintuitive warp error for Unsupported endpoint version #3404

michaelsproul opened this issue Aug 2, 2022 · 15 comments
Labels
bug Something isn't working good first issue Good for newcomers HTTP-API

Comments

@michaelsproul
Copy link
Member

Description

When hitting the /eth/v2/validators/blocks/{slot} endpoint with an invalid randao_reveal (and verify_randao=true or empty) Lighthouse will return this error:

$ curl "http://localhost:5052/eth/v2/validator/blocks/1000000"
{"code":400,"message":"BAD_REQUEST: Unsupported endpoint version: v2","stacktraces":[]}

This makes zero sense, as the endpoint is cleared declared supporting v2 here:

let get_validator_blocks = any_version
.and(warp::path("validator"))
.and(warp::path("blocks"))
.and(warp::path::param::<Slot>().or_else(|_| async {
Err(warp_utils::reject::custom_bad_request(
"Invalid slot".to_string(),
))
}))

Adding a panic to unsupported_endpoint_version to get a backtrace shows some funky warp stuff going on:

thread 'tokio-runtime-worker' panicked at 'unsupported_version_rejection: v2', beacon_node/http_api/src/version.rs:60:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: http_api::version::unsupported_version_rejection
   3: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
   4: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   5: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   6: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   7: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   8: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
   9: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
  10: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  11: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  12: <F as futures_core::future::TryFuture>::try_poll
  13: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  14: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
  15: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  16: <warp::filter::recover::RecoverFuture<T,F> as core::future::future::Future>::poll
  17: <warp::filters::log::internal::WithLogFuture<FN,F> as core::future::future::Future>::poll
  18: <warp::filters::cors::internal::WrappedFuture<F> as core::future::future::Future>::poll
  19: scoped_tls::ScopedKey<T>::set
  20: <warp::filter::service::FilteredFuture<F> as core::future::future::Future>::poll
  21: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
  22: <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
  23: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
  24: tokio::runtime::task::core::CoreStage<T>::poll
  25: tokio::runtime::task::harness::Harness<T,S>::poll
  26: std::thread::local::LocalKey<T>::with
  27: tokio::runtime::thread_pool::worker::Context::run_task
  28: tokio::runtime::thread_pool::worker::Context::run
  29: tokio::macros::scoped_tls::ScopedKey<T>::set
  30: tokio::runtime::thread_pool::worker::run
  31: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  32: tokio::runtime::task::harness::Harness<T,S>::poll
  33: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

i.e. it seems that warp is doing some counterintuitive backtracking through a few and_then/and/or until it hits the unsupported_version_endpoint for a different endpoint (one that is v1 only). This makes little sense to me, and would probably require someone to dig pretty deep into warp to understand why this happens. I think this is similar to another open issue (#3112) related to our apparent misuse of warp combinators.

Finally, if verify_randao=false is set or the randao reveal is valid then no error occurs:

$ curl "http://localhost:5052/eth/v2/validator/blocks/3574858?verify_randao=false"
{ .. }

Version

v2.5.0

Related issues

@michaelsproul michaelsproul added bug Something isn't working HTTP-API labels Aug 2, 2022
@michaelsproul
Copy link
Member Author

It may be that we want to avoid and_then for our handlers and instead use then. The warp docs state:

Rejections are meant to say “this filter didn’t accept the request, maybe another can”. So for application-level errors, consider using Filter::then instead.

@michaelsproul michaelsproul added the good first issue Good for newcomers label Nov 3, 2022
@muang0
Copy link

muang0 commented Apr 8, 2023

I'll look into this one and follow up, thx

@michaelsproul
Copy link
Member Author

It may be that we want to avoid and_then for our handlers and instead use then.

@jmcph4 and I got this working on #4316. Our approach avoids returning a warp::reject::Reject for errors in the handlers themselves. For now we're converting the rejects into error responses using handle_rejection, like this:

let post_beacon_blocks_v2 = eth_v2
.and(warp::path("beacon"))
.and(warp::path("blocks"))
.and(warp::query::<api_types::BroadcastValidationQuery>())
.and(warp::path::end())
.and(warp::body::json())
.and(chain_filter.clone())
.and(network_tx_filter.clone())
.and(log_filter.clone())
.then(
|validation_level: api_types::BroadcastValidationQuery,
block: Arc<SignedBeaconBlock<T::EthSpec>>,
chain: Arc<BeaconChain<T>>,
network_tx: UnboundedSender<NetworkMessage<T::EthSpec>>,
log: Logger| async move {
match publish_blocks::publish_block(
None,
ProvenancedBlock::Local(block),
chain,
&network_tx,
log,
validation_level.broadcast_validation,
)
.await
{
Ok(()) => warp::reply().into_response(),
Err(e) => warp_utils::reject::handle_rejection(e)
.await
.unwrap()
.into_response(),
}
},
);

Jack will fix just the new v2/blocks endpoints in that PR, and this issue can remain open until we roll out the fix across the codebase. We might want to write a small abstraction to minimise the amount of code duplication (something like then_or_reject, then_or_error).

@muang0
Copy link

muang0 commented Jul 15, 2023

@michaelsproul Thanks for helping get this one moving. Been a bit busy lately but should be freeing up more in the immediate future.

I'm not sure how a rejection handler solves our initial issue of warp mixing up endpoint specific logic... If you've got any advice/guidance there it would be much appreciated.

Also, do you mind clarifying how this would look? We might want to write a small abstraction to minimise the amount of code duplication (something like then_or_reject, then_or_error).

@michaelsproul
Copy link
Member Author

Hey @muang0, the pattern we found that works is this one:

.then(
|validation_level: api_types::BroadcastValidationQuery,
block: Arc<SignedBeaconBlock<T::EthSpec>>,
chain: Arc<BeaconChain<T>>,
network_tx: UnboundedSender<NetworkMessage<T::EthSpec>>,
log: Logger| async move {
match publish_blocks::publish_block(
None,
ProvenancedBlock::local(block),
chain,
&network_tx,
log,
validation_level.broadcast_validation,
)
.await
{
Ok(()) => warp::reply().into_response(),
Err(e) => match warp_utils::reject::handle_rejection(e).await {
Ok(reply) => reply.into_response(),
Err(_) => warp::reply::with_status(
StatusCode::INTERNAL_SERVER_ERROR,
eth2::StatusCode::INTERNAL_SERVER_ERROR,
)
.into_response(),
},
}
},

i.e. change the and_then to a then so that there's no possibility of returning a Rejection. This prevents warp from backtracking when a handler errors.

The complication is that, then expects a Response as the return value, not a Result<Response, Rejection> so we need to convert the Rejections that were previously returned into responses, which is what that warp_utils::reject::handle_rejection accomplishes. The abstraction I'm thinking of is something like:

then_or_error(|| {
   // this function `f` returns a `Result<T, Rejection>`, which `then_or_error` will convert to a
   // `Response`. We can hopefully do a find and replace of `and_then` for `then_or_error`,
   // with minimal other changes.
   f()
})

The implementation of then_or_error might look like this:

use warp::{filters::BoxedFilter, Filter, reply::Reply};

// Mixin trait (add methods to all warp types implementing Filter)
pub trait ThenOrError {
   fn then_or_error<G, R>(self, g: G) -> BoxedFilter<(Response,)> 
      where G: Future<Item = Result<R, Rejection>>,
            R: Reply,
   {
       self.then(|| {
           match f.await {
               Ok(result) => result.into_response(),
               Err(e) => match warp_utils::reject::handle_rejection(e).await { 
                 Ok(reply) => reply.into_response(), 
                 Err(_) => warp::reply::with_status( 
                     StatusCode::INTERNAL_SERVER_ERROR, 
                     eth2::StatusCode::INTERNAL_SERVER_ERROR, 
                 )
                 .into_response(), 
             },
           }
       })
       .boxed()
   }
}

impl<F> ThenOrError for F: Filter {}

This might require some tweaks to get working.

We are also working on a big overhaul of the HTTP API here: #4462, so it might make sense for that to stabilise before rolling out this change (or we can raise a PR based on #4462 and merge this afterwards). I'm away for the next two weeks, but if you feel like working on it in the meantime I hope this is enough info to make progress

@muang0
Copy link

muang0 commented Jul 18, 2023

Thanks for the clarification @michaelsproul

I think we may be able to solve this using warp-native constructs (see example here). I'll play around with it a bit tomorrow.

In the meantime, can anyone provide context around the choice of a custom uor rather than or (beyond the in-line comment below)?
This is a shorthand for self.or(other).unify().boxed(), which is useful because it keeps the filter type simple and prevents type-checker explosions.
Is now an appropriate time for us to face the 'type-checker explosions?'

@michaelsproul
Copy link
Member Author

michaelsproul commented Jul 18, 2023

Hey @muang0, I don't think recover on its own is sufficient. We are already using it here:

.recover(warp_utils::reject::handle_rejection),

The problem is the and_then: it will backtrack and try a different handler if the handler returns a Rejection, which is exactly what we don't want. We could maybe achieve the same thing as then_or_error by sprinkling recovers immediately after every and_then, is that what you had in mind?

Is now an appropriate time for us to face the 'type-checker explosions?'

I don't think there's a better solution, but willing to be proved wrong. The boxing greatly simplifies the types and prevents us from hitting the compiler's recursion limit on massive types like Or<Or<Or<Or<Or<Or<....>>>>>>. It mostly worked fine without the boxing before, except that new compiler releases would subtly change the recursion limit behaviour and cause compilation to fail. We haven't had any compilation issues since adopting the boxing.

@muang0
Copy link

muang0 commented Jul 18, 2023

We could maybe achieve the same thing as then_or_error by sprinkling recovers immediately after every and_then, is that what you had in mind?
Wasn't thinking of this approach in particular, but I think this would be more optimal than writing a custom Mixin.

new compiler releases would subtly change the recursion limit behaviour and cause compilation to fail. We haven't had any compilation issues since adopting the boxing.
I'm following you, neat workaround for the compiler issues.

Assuming no concerns, I'll update an endpoint to use recover() after the and_then; then we can see how it looks/behaves

@muang0
Copy link

muang0 commented Jul 18, 2023

Testing blocked on an interesting cloudfront error. I'll give it some time before retrying & the subsequent Crates/AWS support tickets unblocked

bender@Jamess-MacBook-Air lighthouse % make install-lcli
cargo install --path lcli --force --locked \
		--features "jemalloc" \
		--profile "release" \
		
  Installing lcli v4.3.0 (/Users/bender/Projects/fun/lighthouse/lcli)
    Updating crates.io index
error: download of ha/sh/hashers failed

Caused by:
  failed to get successful HTTP response from `https://index.crates.io/ha/sh/hashers` (18.165.83.65), got 421
  debug headers:
  x-amz-cf-pop: IAD12-P1
  x-cache: Error from cloudfront
  x-amz-cf-pop: IAD55-P3
  x-amz-cf-id: bK5gURNzbur0Xk8kh7XHscLP0xLRes6LhH_hq4Ez1R5gcSHN8ir2cQ==
  body:
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  <HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
  <TITLE>ERROR: The request could not be satisfied</TITLE>
  </HEAD><BODY>
  <H1>421 ERROR</H1>
  <H2>The request could not be satisfied.</H2>
  <HR noshade size="1px">
  The distribution does not match the certificate for which the HTTPS connection was established with.
  We can't connect to the server for this app or website at this …
make: *** [install-lcli] Error 101

@muang0
Copy link

muang0 commented Jul 19, 2023

Running into an issue starting the local testnet, looks like the clients are expecting a deploy_block.txt file that's not being created. Will investigate more tmrw. unblocked

@muang0
Copy link

muang0 commented Jul 25, 2023

Update: Adding recover() to each filter didn't solve the backtracking issue. I'll try the Mixin approach next.

@michaelsproul
Copy link
Member Author

@muang0 Are you still working on this? If you're stuck I'd like to give it a try, particularly now that #4462 is merged.

@michaelsproul
Copy link
Member Author

Sorry @muang0 I ended up just going ahead and doing it. We need this fix quite soon for our next release, and we were getting some stack overflows running debug tests and thought this might help them as well.

PR #4595

@michaelsproul
Copy link
Member Author

I realised we need to repeat the same fix for the VC API, maybe you could tackle that @muang0? See #4597

bors bot pushed a commit that referenced this issue Aug 14, 2023
## Issue Addressed

Closes #3404 (mostly)

## Proposed Changes

- Remove all uses of Warp's `and_then` (which backtracks) in favour of `then` (which doesn't).
- Bump the priority of the `POST` method for `v2/blocks` to `P0`. Publishing a block needs to happen quickly.
- Run the new SSZ POST endpoints on the beacon processor. I think this was missed in between merging #4462 and #4504/#4479.
- Fix a minor issue in the validator registrations endpoint whereby an error from spawning the task on the beacon processor would be dropped.

## Additional Info

I've tested this manually and can confirm that we no longer get the dreaded `Unsupported endpoint version` errors for queries like:

```
$ curl -X POST -H "Content-Type: application/json" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: WeakSubjectivityConflict",
  "stacktraces": []
}
```

```
$ curl -X POST -H "Content-Type: application/octet-stream" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: invalid SSZ: OffsetOutOfBounds(572530811)",
  "stacktraces": []
}
```

```
$ curl "http://localhost:5052/eth/v2/validator/blocks/7067595"
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```

However, I can still trigger it by leaving off the `Content-Type`. We can re-test this aspect with #4575.
bors bot pushed a commit that referenced this issue Aug 14, 2023
## Issue Addressed

Closes #3404 (mostly)

## Proposed Changes

- Remove all uses of Warp's `and_then` (which backtracks) in favour of `then` (which doesn't).
- Bump the priority of the `POST` method for `v2/blocks` to `P0`. Publishing a block needs to happen quickly.
- Run the new SSZ POST endpoints on the beacon processor. I think this was missed in between merging #4462 and #4504/#4479.
- Fix a minor issue in the validator registrations endpoint whereby an error from spawning the task on the beacon processor would be dropped.

## Additional Info

I've tested this manually and can confirm that we no longer get the dreaded `Unsupported endpoint version` errors for queries like:

```
$ curl -X POST -H "Content-Type: application/json" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: WeakSubjectivityConflict",
  "stacktraces": []
}
```

```
$ curl -X POST -H "Content-Type: application/octet-stream" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: invalid SSZ: OffsetOutOfBounds(572530811)",
  "stacktraces": []
}
```

```
$ curl "http://localhost:5052/eth/v2/validator/blocks/7067595"
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```

However, I can still trigger it by leaving off the `Content-Type`. We can re-test this aspect with #4575.
@muang0
Copy link

muang0 commented Aug 24, 2023

@michaelsproul
I'm very very sorry for the delayed response from my end, I've had a rough summer with a couple things that came up unexpectedly that needed my full attention for the past month. I'm just now back in my normal environment and getting my legs underneath me again so to speak. I noticed #4597 has someone working on it already. If there's anything else lightweight and low-priority that could use help I'm happy to take a look; else I'll keep an eye on the backlog for a good 1st issue to pop up.

Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Closes sigp#3404 (mostly)

- Remove all uses of Warp's `and_then` (which backtracks) in favour of `then` (which doesn't).
- Bump the priority of the `POST` method for `v2/blocks` to `P0`. Publishing a block needs to happen quickly.
- Run the new SSZ POST endpoints on the beacon processor. I think this was missed in between merging sigp#4462 and sigp#4504/sigp#4479.
- Fix a minor issue in the validator registrations endpoint whereby an error from spawning the task on the beacon processor would be dropped.

I've tested this manually and can confirm that we no longer get the dreaded `Unsupported endpoint version` errors for queries like:

```
$ curl -X POST -H "Content-Type: application/json" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: WeakSubjectivityConflict",
  "stacktraces": []
}
```

```
$ curl -X POST -H "Content-Type: application/octet-stream" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: invalid SSZ: OffsetOutOfBounds(572530811)",
  "stacktraces": []
}
```

```
$ curl "http://localhost:5052/eth/v2/validator/blocks/7067595"
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```

However, I can still trigger it by leaving off the `Content-Type`. We can re-test this aspect with sigp#4575.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Closes sigp#3404 (mostly)

- Remove all uses of Warp's `and_then` (which backtracks) in favour of `then` (which doesn't).
- Bump the priority of the `POST` method for `v2/blocks` to `P0`. Publishing a block needs to happen quickly.
- Run the new SSZ POST endpoints on the beacon processor. I think this was missed in between merging sigp#4462 and sigp#4504/sigp#4479.
- Fix a minor issue in the validator registrations endpoint whereby an error from spawning the task on the beacon processor would be dropped.

I've tested this manually and can confirm that we no longer get the dreaded `Unsupported endpoint version` errors for queries like:

```
$ curl -X POST -H "Content-Type: application/json" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: WeakSubjectivityConflict",
  "stacktraces": []
}
```

```
$ curl -X POST -H "Content-Type: application/octet-stream" --data @block.json "http://localhost:5052/eth/v2/beacon/blocks" | jq
{
  "code": 400,
  "message": "BAD_REQUEST: invalid SSZ: OffsetOutOfBounds(572530811)",
  "stacktraces": []
}
```

```
$ curl "http://localhost:5052/eth/v2/validator/blocks/7067595"
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```

However, I can still trigger it by leaving off the `Content-Type`. We can re-test this aspect with sigp#4575.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers HTTP-API
Projects
None yet
Development

No branches or pull requests

2 participants