Configure Cassandra nodes to lookup their IP address by their fully qualified domain name #334

Conversation
When a pod is deleted, in 99% of cases it should have a new IP address. Can we make a test that deletes the pod and verifies the IP has changed? (Perhaps we delete the pod again if it hasn't changed the first time?) We definitely need an e2e test for this - ideally the test would be labelled "Resiliency" too, as the first of a suite of resiliency tests.
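A minimal sketch of such a resiliency check, assuming a client-go clientset and a Cassandra pod that is recreated under the same name by its StatefulSet; the namespace, timeouts, and error handling here are illustrative and not taken from this PR:

    package resiliency

    import (
        "context"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // VerifyPodIPChangesOnDelete deletes the named pod and waits for the
    // replacement pod to appear under the same name with a different UID and IP.
    func VerifyPodIPChangesOnDelete(ctx context.Context, cs kubernetes.Interface, namespace, podName string) error {
        before, err := cs.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
        if err != nil {
            return err
        }
        oldIP, oldUID := before.Status.PodIP, before.UID

        if err := cs.CoreV1().Pods(namespace).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
            return err
        }

        deadline := time.Now().Add(5 * time.Minute)
        for time.Now().Before(deadline) {
            after, err := cs.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
            if err == nil && after.UID != oldUID && after.Status.PodIP != "" {
                if after.Status.PodIP == oldIP {
                    // The rare case mentioned above: the new pod got the same IP.
                    // A real test could delete the pod again here and retry.
                    return fmt.Errorf("replacement pod %q was assigned the same IP %q", podName, oldIP)
                }
                return nil
            }
            time.Sleep(5 * time.Second)
        }
        return fmt.Errorf("timed out waiting for pod %q to be recreated with a new IP", podName)
    }

The wrapping e2e test could then carry the suggested "Resiliency" label so later resiliency tests join the same suite.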
pkg/cassandra/nodetool/nodetool.go
Outdated
leavingNodes, joiningNodes, movingNodes, mappedNodes,
)
}
I'm confused - isn't this meant to be a part of another PR?
If yes and this is just stacked, it'd be great if we can stop stacking PRs too much, as it makes it far harder to know which to review first. If we keep them separate, PRs that depend on [some other] PR will fail and/or we can add /hold and comment to inform a reviewer that "this PR depends on PR xyz being merged first".
{
    Name: "CASSANDRA_RPC_ADDRESS",
    Value: " ",
},
Do we use the Cassandra RPC interface, and do we want to listen on the default pod IP for RPC connections? It seems to me like we'd only want to listen on 127.0.0.1?
We don't use it, but I'm not sure whether Cassandra nodes talk to each other with this protocol.
The default is to listen on all addresses, so setting it here may not be ideal, but I can improve or remove it later.
👍
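For reference, if RPC only ever needs to be reachable from inside the pod, the loopback-only option discussed above could look like the sketch below. This assumes the image entrypoint copies CASSANDRA_RPC_ADDRESS straight into rpc_address; it is not the change made in this PR.

    package cassandra

    import corev1 "k8s.io/api/core/v1"

    // rpcLoopbackEnv is an illustrative EnvVar that would bind Cassandra's RPC
    // interface to loopback only, rather than the image default of all addresses.
    var rpcLoopbackEnv = corev1.EnvVar{
        Name:  "CASSANDRA_RPC_ADDRESS",
        Value: "127.0.0.1",
    }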
// Set a non-existent default seed.
// The Kubernetes Seed Provider will fall back to a default seed host if it can't look up seeds via the CASSANDRA_SERVICE.
// And if the CASSANDRA_SEEDS environment variable is not set, it defaults to localhost.
// Which could cause confusion if a non-seed node is temporarily unable to lookup the seed nodes from the service.
What would happen if we were to set this to localhost instead? Would the node fail to boot, or would it cause a split brain of some sort?
I think it would mean that new nodes might consider themselves seeds in the event that the KubernetesSeedProvider plugin encounters an error.
So yeah, a split brain situation I think.
// https://github.com/kubernetes/examples/blob/cabf8b8e4739e576837111e156763d19a64a3591/cassandra/go/main.go#L51
{
    Name: "CASSANDRA_SEEDS",
    Value: "black-hole-dns-name",
I'm slightly concerned this could become an attack vector. What if someone creates a service named black-hole-dns-name? Could an attacker gain access to the contents of the Cassandra DB as a result?
If possible, I'd rather set this to a value that causes Cassandra to fail hard/crash (as this value should only be referenced in the event of a DNS resolution failure)
I think the answer is to remove the fallback code from the Kubernetes Seed Provider.
If it can't look up the seeds from the Kubernetes API, it should log an error and return an empty list,
rather than returning a default.
I'll do this in a follow up branch, where I remove the reliance on the Docker image entrypoint and instead create a Navigator default config.
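A minimal sketch of that proposed behaviour, in the spirit of the Go seed lookup linked from the diff: on any failure, log and return an empty seed list instead of a default. The real KubernetesSeedProvider resolves seeds differently (via the Kubernetes API rather than plain DNS), so the CASSANDRA_SERVICE handling and function name here are illustrative only.

    package seeds

    import (
        "log"
        "net"
        "os"
    )

    // lookupSeeds returns the seed addresses resolved from the seed Service, or an
    // empty list if the lookup fails. There is deliberately no fallback value: an
    // empty seed list is safer than a node silently electing itself a seed.
    func lookupSeeds() []string {
        service := os.Getenv("CASSANDRA_SERVICE")
        if service == "" {
            log.Println("CASSANDRA_SERVICE is not set; returning no seeds")
            return nil
        }
        ips, err := net.LookupHost(service)
        if err != nil {
            log.Printf("failed to look up seeds via %q: %v; returning no seeds", service, err)
            return nil
        }
        return ips
    }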
…ualified domain name
* Also ensure that they only ever use the Kubernetes seed provider service
* And ensure that that service publishes the seed IP addresses as soon as they are known,
* so that the seeds themselves can find themselves when they are starting up.
Fixes: jetstack#319
Force-pushed from 726cd02 to 310bc07
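The squashed commit message above also says the seed service should publish the seed IP addresses as soon as they are known, so that starting seeds can find themselves. A minimal sketch of a headless Service doing that via PublishNotReadyAddresses; whether this PR uses that field or the older tolerate-unready-endpoints annotation is not shown here, and the name and labels are placeholders.

    package cassandra

    import (
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // seedService is a headless Service whose endpoints are published even while
    // the seed pods are not yet Ready, so a booting seed can resolve its own address.
    var seedService = &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: "cassandra-seeds"},
        Spec: corev1.ServiceSpec{
            ClusterIP:                corev1.ClusterIPNone, // headless: DNS returns pod IPs directly
            Selector:                 map[string]string{"app": "cassandra"},
            PublishNotReadyAddresses: true,
        },
    }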
/retest
I rebased and answered your comments @munnerz
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
This is a cut-down version of #330.
Fixes: #319
Release note: