Just add a retry before really throwing a FastCgiCommunicationFailed exception. #1758
Conversation
Here is something interesting about this: hollodotme/fast-cgi-client#68 (comment)
What if the request is updating something in the database, or sending emails, for example? Retrying could run the same request/action twice, which might not be a good thing 🤔 How often do you get these errors?
Retrying seems like a bad idea to me. It's usual for clients to retry safe requests, not for proxy layers to do it, unless the error is definitely retriable, such as an edge load balancer retrying a TLS handshake.
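(For illustration only, here is a minimal PHP sketch of that distinction, assuming the retry were at least restricted to safe, idempotent HTTP methods. The function names are made up and the generic exception stands in for the FastCgiCommunicationFailed exception from the PR title; this is not Bref's actual handler code.)

```php
<?php

// Hypothetical sketch (not Bref's code): only retry "safe" HTTP methods,
// so a failed POST that may already have written to the database or sent
// an email is never replayed.
function isSafeMethod(string $method): bool
{
    return in_array(strtoupper($method), ['GET', 'HEAD', 'OPTIONS'], true);
}

function handleWithOptionalRetry(callable $sendRequest, string $method)
{
    try {
        return $sendRequest();
    } catch (\RuntimeException $e) {
        if (!isSafeMethod($method)) {
            throw $e; // non-idempotent request: fail immediately
        }
        return $sendRequest(); // safe request: retry once
    }
}
```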
@mnapoli I can't really tell how often it appears in the long run, but when I do some manual testing I sometimes get a 50% error rate... so that's a lot... I know that retrying is a bit crappy, but in all the errors I see in the logs our code is never executed at all; everything happens on the Bref side. So for us a retry is not a problem, and with it we get a 0% error rate in manual testing.
A 50% error rate smells like something else is borked. Is it always failing after the first invoke?
That's only my own feeling, but yes, I'm pretty sure it's always after the first invoke...
And to add more context, we did NOT see this behavior on workers or console lambdas (working with Symfony).
I just looked at the numbers for the last few days in the Bref dashboard, and I can see a 6-7% error rate over entire days.
That is really weird; something else must be at play here. A 6-7% error rate would be affecting all Bref users if it were a global Bref problem. I'd start looking at ways to pinpoint the problem:
Here is a first list of answers; I'll keep you posted with the other answers when I get them:
-> Yes, on this project we have Redis and MongoDB.
-> I'm pretty sure this is not the case, as the error happens very early in the execution (a few milliseconds in). We are using a 2048 MB Lambda memory size on this project.
-> This project is an API using the Bref PHP 8.2 FPM layer and Symfony, so this is not the case for me here.
-> Like I said, the error happens very early in the execution, so this is not the case either...
-> Will do more testing, but I saw it on every HTTP route (mainly GET).
-> Will try and keep you posted.
It would also be interesting to see if this happens on cold starts. If not, is the previous request successful? Does it time out? Could it fill the memory? (Or any other reason it could leave the environment in a broken state.) Also, nothing specific/exotic, like using Symfony Runtime, setting a non-standard handler, etc.?
As far as I've manually tested, this is not happening on cold starts, and the previous request is always successful. No timeout, no full memory... nothing visible, at least.
This kind of problem only happens in API mode, so nothing fancy outside of classic Symfony; Symfony Runtime is NOT used in this project.
Here is the raw log we got when this problem occurs:
The 424 seconds here is really weird, as the behavior seen from an API client is really fast...
@mnapoli sorry this is not the right
So I'm trying to delete the
Closing for now, as we know this is unfortunately not a solution we can use, since it may occasionally retry valid non-idempotent requests.
Hello there!
This PR is only meant to start a discussion, to try to find a solution to this kind of random error:
I'm pretty sure something smarter can be achieved, but I've been manually testing this for a few minutes now and have only gotten 200 responses, no more 500s...
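(For context, a minimal sketch of the kind of single-retry wrapper being discussed, assuming a callable that performs the FastCGI request to PHP-FPM. The function name is made up and this is not the actual Bref FpmHandler code.)

```php
<?php

// Illustrative only: wrap the FastCGI call and retry once before letting
// the failure bubble up as a 500, as the PR proposes.
function proxyWithSingleRetry(callable $performRequest)
{
    try {
        return $performRequest();
    } catch (\Exception $e) {
        // First attempt failed: retry once. If the retry also fails,
        // the exception propagates and the error surfaces as before.
        return $performRequest();
    }
}
```

As the discussion above concludes, this hides the visible 500s but does not address the underlying cause, and it replays the request, which is why the approach was not merged.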