Skip to content

Commit

Permalink
Revert "rabbit_feature_flags: Retry after erpc:call() fails with `noc…
Browse files Browse the repository at this point in the history
…onnection`"

This reverts commit 8749c60.

[Why]
The patch was supposed to solve an issue that we didn't understand and
that was likely a network/DNS problem outside of RabbitMQ. We know it
didn't solve that issue because it was reported again 6 months after the
initial pull request (#8411).

What we are sure however is that it increased the testing of RabbitMQ
significantly because the code loops for 10+ minutes if the remote node
is not running.

The retry in the Feature flags subsystem was not the right place either.
The `noconnection` error is visible there because it runs earlier during
RabbitMQ startup. But retrying there won't solve a network issue
magically.

There are two ways to create a cluster:
1. peer discovery and this subsystem takes care of retries if necessary
   and appropriate
2. manually using the CLI, in which case the user is responsible for
   starting RabbitMQ nodes and clustering them

Let's revert it until the root cause is really understood.
  • Loading branch information
dumbbell committed Jun 20, 2024
1 parent 7146274 commit d0c13b4
Showing 1 changed file with 0 additions and 23 deletions.
23 changes: 0 additions & 23 deletions deps/rabbit/src/rabbit_ff_controller.erl
Original file line number Diff line number Diff line change
Expand Up @@ -1390,32 +1390,9 @@ this_node_first(Nodes) ->
Ret :: term() | {error, term()}.

rpc_call(Node, Module, Function, Args, Timeout) ->
SleepBetweenRetries = 5000,
T0 = erlang:monotonic_time(),
try
erpc:call(Node, Module, Function, Args, Timeout)
catch
%% In case of `noconnection' with `Timeout'=infinity, we don't retry
%% at all. This is because the infinity "timeout" is used to run
%% callbacks on remote node and they can last an indefinite amount of
%% time, for instance, if there is a lot of data to migrate.
error:{erpc, noconnection} = Reason
when is_integer(Timeout) andalso Timeout > SleepBetweenRetries ->
?LOG_WARNING(
"Feature flags: no connection to node `~ts`; "
"retrying in ~b milliseconds",
[Node, SleepBetweenRetries],
#{domain => ?RMQLOG_DOMAIN_FEAT_FLAGS}),
timer:sleep(SleepBetweenRetries),
T1 = erlang:monotonic_time(),
TDiff = erlang:convert_time_unit(T1 - T0, native, millisecond),
Remaining = Timeout - TDiff,
Timeout1 = erlang:max(Remaining, 0),
case Timeout1 of
0 -> {error, Reason};
_ -> rpc_call(Node, Module, Function, Args, Timeout1)
end;

Class:Reason:Stacktrace ->
Message0 = erl_error:format_exception(Class, Reason, Stacktrace),
Message1 = lists:flatten(Message0),
Expand Down

0 comments on commit d0c13b4

Please sign in to comment.