Feature flags detection sometimes triggers erpc,noconnection
#8346
Comments
I think the expected behavior should be "the operation is retried N times" :)
[Why]
There could be a transient network issue. Let's give a few more chances to perform the requested RPC call.

[How]
We retry until the given timeout is reached, if any. To honor that timeout, we measure the time taken by the RPC call itself. We also sleep between retries. Before each retry, the timeout is reduced by the total of the time taken by the RPC call and the sleep.

References #8346.

V2: Treat `infinity` timeout differently. In this case, we never retry following a `noconnection` error. The reason is that this timeout is used specifically for callbacks executed remotely. We don't know how long they take (for instance if there is a lot of data to migrate). We don't want an infinite retry loop either, so in this case, we don't retry.

(cherry picked from commit 8749c60)
(cherry picked from commit 47b1596)
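For illustration, here is a minimal sketch of the retry strategy described in the commit message above: retry the `erpc` call on `noconnection`, deduct the elapsed call time and the inter-retry sleep from the remaining timeout, and never retry when the timeout is `infinity`. This is not the actual `rabbit_ff_controller` code; the module name and the 1-second sleep are assumptions.

```erlang
-module(rpc_retry_sketch).
-export([call_with_retry/5]).

%% Sleep between retries; the value is an illustrative assumption.
-define(RETRY_SLEEP_MS, 1000).

%% Retry an erpc call on noconnection until the timeout budget is spent.
%% With an infinity timeout we never retry, to avoid an endless retry loop
%% around remote callbacks of unknown duration.
call_with_retry(Node, Mod, Fun, Args, Timeout) ->
    T0 = erlang:monotonic_time(millisecond),
    try
        erpc:call(Node, Mod, Fun, Args, Timeout)
    catch
        error:{erpc, noconnection} when Timeout =:= infinity ->
            error({erpc, noconnection});
        error:{erpc, noconnection} ->
            %% Reduce the remaining budget by the time the failed call took
            %% plus the sleep we are about to perform.
            Elapsed = erlang:monotonic_time(millisecond) - T0,
            Remaining = Timeout - Elapsed - ?RETRY_SLEEP_MS,
            if
                Remaining > 0 ->
                    timer:sleep(?RETRY_SLEEP_MS),
                    call_with_retry(Node, Mod, Fun, Args, Remaining);
                true ->
                    error({erpc, noconnection})
            end
    end.
```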
We stumbled over this by user error in #10100 and, as requested, here is the step-by-step to get the same error message. Bear in mind that this happened to me only because I forgot the "rabbit@" prefix when trying to call
It's not clear to me from this log what exactly logs this message: the node or the shell where In any case,
I don't know if you checked the log on the node that is running when you try to connect, but it's worth checking. What may be wrong is your
@CarvalhoRod thank you for chiming in, but this is RabbitMQ 101 and @lukebakken is a core team engineer. You can be sure such basics were accounted for. That said, with #8411 this can probably be closed. If we get more details or observe more specific failure scenarios that are specific to the code and not the setup, we can always file a new issue.
Setting the milestone to
Note that the relevant PR was reverted in #11507; I will unset the milestone to reduce confusion.
@lukebakken @michaelklishin just to advise, I am experiencing the exact same issue when trying to cluster on AWS. The Erlang cookie is the same. Both instances/nodes are on the same VPC in different AZs. Security groups are all set up correctly. Running version 4.0.4. I have redacted the Erlang cookie except the last 2 letters to show this has been accounted for. Both nodes are up and running before node 2 runs join_cluster to join node 1. That is, systemctl start rabbitmq-server followed by rabbitmqctl start_app both run fine. I'm not sure what I'm missing.
@Hussain-f our team does not appreciate when issues or PRs are used for questions. Start a discussion; they have been around for a few years now. We cannot suggest much based on the logs of the connecting node. Logs from all nodes must be collected and inspected: when a node with a mismatching shared secret connects, the connection target will log a message after refusing the connection.
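As a quick sanity check (illustrative only, not an official troubleshooting procedure), `net_adm:ping/1` from an Erlang shell started with the same cookie shows whether distributed Erlang can reach the target node at all; the node name below is just an example:

```erlang
%% Illustrative connectivity check; 'rabbit@node1' is an example node name.
%% pong means distributed-Erlang connectivity and the shared cookie are fine;
%% pang means the connection could not be established or was refused
%% (e.g. closed ports, wrong node name, or a cookie mismatch).
case net_adm:ping('rabbit@node1') of
    pong -> io:format("rabbit@node1 is reachable~n");
    pang -> io:format("rabbit@node1 is NOT reachable~n")
end.
```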
Describe the bug
Logs
Reproduction steps
See above.
Expected behavior
No `erpc` error - either it is re-tried, or it is not tried until disterl is definitely up and running.

Additional context
Observed in the following situations: