Out-of-date connection strings when running three_data_hall #1958
Comments
That issue probably happens for clients outside of the FDB Pods? I believe the issue is that the client cannot write to the cluster file mounted from the ConfigMap. It's better to copy the cluster file to a location the client can write to, e.g. with an init container. If the client can write to the cluster file, it should automatically update the file when the coordinators change. So the issue is not necessarily that the connection strings differ for some time, but rather that the clients are not able to update their cluster file.
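The copy step suggested above could look roughly like the following. This is a minimal sketch of what an init step might run, not the operator's own mechanism; the function name `copy_cluster_file` and all paths are hypothetical.

```python
import os
import shutil


def copy_cluster_file(source: str, target_dir: str) -> str:
    """Copy a read-only cluster file into a writable directory.

    Returns the path of the writable copy.
    """
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(source))
    # copyfile copies the contents only, so the copy does not inherit
    # read-only permission bits from the mounted source file.
    shutil.copyfile(source, target)
    # Check both the file and its parent directory, since the client is
    # expected to be able to write to both.
    if not (os.access(target, os.W_OK) and os.access(target_dir, os.W_OK)):
        raise PermissionError(f"{target} or its directory is not writable")
    return target
```

For example, `copy_cluster_file("/mnt/config/cluster-file", "/var/fdb")` (both paths are assumptions) would return `/var/fdb/cluster-file`. The client then has to be pointed at the copy instead of the ConfigMap mount, for instance through the `FDB_CLUSTER_FILE` environment variable.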
Correct. We mount the cluster-file from the ConfigMap, which has worked excellently with a single
If the cluster file in the ConfigMap is out-of-date, there might be a risk that new pods, which copy the cluster-file from the ConfigMap, get an invalid/unusable connection string. I cannot tell whether this is improbable or impossible.
As long as at least one of the old coordinators is reachable, clients can still connect to the cluster, so that risk should be minimal. I'll add a different label to this issue, as this is a deficit in the current design rather than a bug. I'm going to update the documentation for clients (not sure if we already have something like that in place) with the hint that the cluster file should be moved to a location where the application can write.
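To make the "at least one old coordinator" condition concrete, here is a small sketch that compares a stale connection string with the current one. It assumes the usual `description:id@address,address,...` connection string layout; the function names and sample addresses are made up for illustration. A shared address only means a stale client still has a coordinator to try, not that the coordinator is actually reachable.

```python
from __future__ import annotations


def parse_coordinators(connection_string: str) -> set[str]:
    """Return the coordinator addresses from 'description:id@addr1,addr2,...'."""
    _, sep, addresses = connection_string.strip().partition("@")
    if not sep or not addresses:
        raise ValueError(f"not a connection string: {connection_string!r}")
    return {addr.strip() for addr in addresses.split(",") if addr.strip()}


def shares_coordinator(stale: str, current: str) -> bool:
    """True if the stale string lists at least one current coordinator."""
    return bool(parse_coordinators(stale) & parse_coordinators(current))


# Hypothetical values: one coordinator (10.0.0.3) survived the change.
stale = "test_cluster:abc123@10.0.0.1:4500,10.0.0.2:4500,10.0.0.3:4500"
current = "test_cluster:def456@10.0.0.3:4500,10.0.0.4:4500,10.0.0.5:4500"
print(shares_coordinator(stale, current))
```

If the two sets are disjoint, a pod that copies the stale cluster file from the ConfigMap has no address left to bootstrap from, which is the case the previous comment was worried about.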
What happened?
When running with multiple FoundationDBCluster k8s resources managing the same cluster, as in a three_data_hall setup, a change to the connection string does not propagate to all FoundationDBCluster resources, leaving the connection string in the FoundationDBCluster resource status and the associated ConfigMap (cluster-file) out-of-date.
FoundationDB clients use the cluster-file (connection string) mounted from the ConfigMap. When the connection string is stale, status json reports client.database_status.healthy: false, with the message "Cluster file contents do not match current cluster connection string. Verify the cluster file and its parent directory are writable and that the cluster file has not been overwritten externally."
The issue resolves whenever the reconciliation loop of the out-of-date FoundationDBCluster is run again. This seems to require a trigger, so there is no bound on how long the resources stay out-of-date.
What did you expect to happen?
An update to the connection string should eventually (within a few minutes) propagate to all FoundationDBCluster resources managing the cluster, and their associated ConfigMap.
How can we reproduce it (as minimally and precisely as possible)?
Create a cluster with multiple FoundationDBCluster resources, e.g. by following the three_data_hall example.
Wait for it to reconcile.
Apply a dummy change to one of the FoundationDBCluster resources that triggers a change of coordinators (e.g. updating the node selector).
The connection strings should now be out-of-sync between the FoundationDBCluster resources (and ConfigMaps).
Anything else we need to know?
No response
FDB Kubernetes operator
v1.33.0
Kubernetes version
1.26
Cloud provider
Azure, GCP