Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: check error returned by WriteField to catch illegal characters in metadata #1134

Closed
egoldschmidt opened this issue Mar 21, 2017 · 21 comments

Comments

@egoldschmidt
Copy link

We frequently see failed tests with the following error:

rpc error: code = Canceled desc = stream terminated by RST_STREAM with error code: 8

We don't see logs on the server or client that would indicate that the request failed due to app code. Is there any further explanation for why this type of error would emerge?

Environment:

  • Client: Dockerized Alpine 3.4 / go version go1.8rc3 linux/amd64 / grpc-go @ 0713829b980f4ddd276689a36235c5fcc82a21bf
  • Servier: Dockerized Ubuntu Yakkety / Python 3.6 / grpc 1.1.3
@dfawley
Copy link
Member

dfawley commented Mar 21, 2017

Are you setting deadlines on your RPCs, and are you attempting 100+ concurrent RPCs? If so, this may be a known issue fixed by #1124. Please give it a try.

@egoldschmidt
Copy link
Author

We still see this from time to time even with the patch. It's possibly less frequent but I can't be sure.

@egoldschmidt
Copy link
Author

Also I don't believe we are setting deadlines nor doing 100+ concurrent RPCs in our tests, where we most frequently see this bug.

@MakMukhi
Copy link
Contributor

  • Does the grpc python server log anything out about sending RST_STREAMS?
  • Is there any proxy between grpc-go client and grpc-python server?

You are getting a RST_STREAM with error code 8. According to the spec you should get this when the call is canceled: https://github.com/grpc/grpc/blob/c2bc59b2b0de079c64a2e544354b699685ea2853/doc/PROTOCOL-HTTP2.md
or deadline set on RPC exceeded:
https://github.com/grpc/grpc/blob/46357c882df1afc28f7a5228c40fde522093fa32/src/core/lib/transport/status_conversion.c#L40

If there's a deadline set on the RPC it's unlikely but possible that the server sees this deadline and sends RST_STREAM(with error code 8) before the client does.
Can you think of anything about the tests that might be leading to these situations?
Also, it'd be worthwhile to include gRPC-python team to get their input; If the python server sends RST_STREAM in any other case too.

  • Are these unary or streaming RPCs?

@tiziano88
Copy link

tiziano88 commented Jun 16, 2017

This started happening to me as well. I noticed that it seems related to setting a large value for a metadata value. The value happens to be a JWT token, so plain ASCII, but it seems that changing the metadata key to have the -bin suffix fixes the issue, even though the value is not actually binary. Will try and come up with a reproducible test case shortly.

@tiziano88
Copy link

Turns out, it was an extra newline in one of the metadata values (which I was reading from a file, and the editor was helpfully adding a trailing newline at the end and not making it obvious).

@dfawley
Copy link
Member

dfawley commented Jun 19, 2017

Thanks for the follow-up.

It's unlikely the original issue was seen because of malformed metadata, because it sounds like it was happening sporadically. Is this still happening? If so, we could use some help in reproducing it (please see @MakMukhi's comment: #1134 (comment)). Thanks!

@egoldschmidt
Copy link
Author

egoldschmidt commented Jun 20, 2017 via email

@zllak
Copy link
Contributor

zllak commented Jun 26, 2017

I can confirm we see it too in production from time to time. We end up being called with a canceled context. We initially thought it was due to the client context being canceled, but even with a test where the client call with a non cancelable context, the server-side context ends up canceled.
Any idea what could cause this issue ? this is quite a huge problem for us.
We'll try to do some digging on our side to try and understand what's happening.

We have seen it mostly on unary calls, sometimes on streams.

@MakMukhi
Copy link
Contributor

@zllak Can you provide more information please?

  • Is the error code you get with RST_STREAM on client side same as above (i.e. 8)?
  • Are both you server and client on golang?
  • What exactly is the scenario: client makes a unary RPC and the server instead of responding back send a RST_STREAM with error code 8? Is the RST_STREAM sent immediately or the RPC calls hangs for a while and eventually gets the RST_STREAM?
  • Since you could reproduce this in a test, would you be able to share some code (ideally a repro)?

@zllak
Copy link
Contributor

zllak commented Jun 28, 2017

@MakMukhi

  • Client side error is a "context canceled", not a "stream terminated by RST_STREAM with error code: 8"
  • Both server and client in golang (also, client is running on arm, might change nothing)
  • Unfortunately no, we still can't reproduce it 100%, it pops randomly in production. Once we can put together a real test case, I'll share it.

@mehrdada
Copy link
Member

@kahuang seems to have run into a similar issue here: grpc/grpc#11586

@philipithomas
Copy link

philipithomas commented Aug 10, 2017

Instead of doing a JWT, I'm trying to use protobuf for a token. It's triggering this same issue while in the "authorization" metadata. I suspect it has to do with newlines in the encoded message.

Switching the md key to "authorization-bin" seems to have fixed the issue.

@dfawley
Copy link
Member

dfawley commented Aug 14, 2017

Switching the md key to "authorization-bin" seems to have fixed the issue.

We think that checking the return value of WriteField on the client side (ref) may be able to surface these types of issues before attempting to send the RPC that contains illegal characters in metadata.

@raliste
Copy link

raliste commented Dec 29, 2017

We are seeing the same error, but in our case is rather stable for messages larger than 5 MB even with:

grpcOpts := []grpc.DialOption{
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(64<<20),
			grpc.MaxCallSendMsgSize(64<<20),
		),
	}

@dfawley
Copy link
Member

dfawley commented Jan 8, 2018

@sslavian812
Copy link

I got nearly the same error about RST_STREAM with error code 2.
Can't figure out what is it exactly about, but I believe that it's close to this issue.
I described the problem in a SO question, but not one answered for a while.
https://stackoverflow.com/questions/48174240/grpc-rendezvous-terminated-with-statuscode-internal-received-rst-stream-with

It would be very helpful to get some hints what to look at.

@dfawley
Copy link
Member

dfawley commented Jan 12, 2018

@sslavian812, it looks like you're working in Python, but this is the Go repo. Please be patient and someone should respond to your question on SO. If not, you can also email the grpc.io group to ask for help, or if you think it's a bug, file an issue against python in the main grpc repo.

@btc
Copy link
Contributor

btc commented Mar 7, 2018

@egoldschmidt Have you found the root cause of your disconnects?

@nmccready
Copy link

I got nearly the same error about RST_STREAM with error code 2.
Can't figure out what is it exactly about, but I believe that it's close to this issue.
I described the problem in a SO question, but not one answered for a while.
https://stackoverflow.com/questions/48174240/grpc-rendezvous-terminated-with-statuscode-internal-received-rst-stream-with

For whatever it is worth, I am seeing this same error in Node.js and the stack overflow hint at fork made me think it was something with jest. I removed jest and went to mocha and the error went away.

@menghanl menghanl changed the title gRPC requests mysteriously being canceled transport: check error returned by WriteField to catch illegal characters in metadata Jun 13, 2018
@stale stale bot added the stale label Sep 6, 2019
@dfawley dfawley removed the stale label Sep 6, 2019
@grpc grpc deleted a comment from stale bot May 3, 2021
@menghanl
Copy link
Contributor

Closing due to lack of activity.

Also, if metadata was the cause, #4886 should fix the problem.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests