
consul in segmented network #1116

Closed
ask0n opened this issue Jul 16, 2015 · 7 comments

Comments

@ask0n

ask0n commented Jul 16, 2015

I'm trying to adapt consul to segmented L3 network. Network segmentation looks like this:

srv-A: 10.0.1.0/24
srv-B: 10.10.1.0/24
agent-A: 10.0.2.0/24
agent-B: 10.10.2.0/24

Communication between agent-A and agent-B is prohibited. Communication between srv-A and srv-B is allowed on ports 8300 and 8302. Communication inside segment A and inside segment B is allowed on all ports Consul may use. Consul agents will live in the agent-* subnets, Consul servers in the srv-* subnets. This looks very similar to #882.
The only way I've found in the documentation to disallow communication between agent-A and agent-B is to use separate datacenter names for them.
The configuration docs say "Nodes in the same data center should be on a single LAN." What does "single LAN" mean: the same L2 segment, the same L3 subnet, or just that all necessary ports must be open between agents/servers in a routable network?
Anyway, I have started the servers in subnet A as "dc-A" and the servers in subnet B as "dc1". [BTW, I tried to rename the latter to "dc-B" via config without success, even after cleaning the working directory; is such renaming possible?]
Now I can resolve services and nodes in the agent-B subnet from srv-A. I must use the DC name, like test.service.dc1.consul, which is the expected behavior, but I haven't found any way to ask for something like test.service.ALL.consul to get nodes from both datacenters.
I think this should be the expected behavior for HA: when we ask Consul about a service and there are no local nodes providing it in the local DC, but there are live nodes for it in a remote DC, use them.
Maybe I've missed something and such segmentation should be done another way?

@armon
Member

armon commented Jul 22, 2015

Each of the gossip pools assumes a fully connected mesh: every node must be able to talk to every other node. This is really what we mean by "single LAN". So in this case, if you had 3 gossip pools (DC1, DC2, and WAN), this would work: srv-A and agent-A would be DC1, srv-B and agent-B would be DC2, and the servers would be in the WAN pool.
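A minimal sketch of that layout (hypothetical addresses taken from the subnets above; 0.5-era CLI flags, one node shown per pool):

```shell
# srv-A: server in DC1 (LAN pool 1)
consul agent -server -dc dc1 -bind 10.0.1.10 -data-dir /var/consul

# agent-A: client in DC1, joins its local server
consul agent -dc dc1 -bind 10.0.2.10 -join 10.0.1.10 -data-dir /var/consul

# srv-B: server in DC2 (LAN pool 2)
consul agent -server -dc dc2 -bind 10.10.1.10 -data-dir /var/consul

# finally, link the two server sets over the WAN pool (run on srv-B):
consul join -wan 10.0.1.10
```

With this shape, only the srv-* nodes need ports 8300/8302 open between segments; each LAN pool stays fully meshed inside its own segment.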

In terms of improving the discovery between datacenters, that is something we are working on and have a number of existing tickets around. The idea is that the DC specifier would be overloaded to allow things like "meta" DCs (e.g. EU = FR, GE, UK; US = West, East), etc.

@ChristianKniep

Hey @armon,

I have a somewhat similar question. I would like to make the deployment agnostic to the underlying Docker machines, and therefore I thought of having

  • an inter-machine DC (dc1), for which all Consul ports are pinned to the physical eth0 address
  • an intra-machine DC (vmX), comprised of the containers running on top of the physical machine.

Registrator exposes all internal services to all the other nodes in the dc1 cluster, and the internal containers are able to access services exposed on different machines by using the dc1 namespace.
So far, so good.
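A rough sketch of that two-DCs-per-host pinning (addresses are hypothetical, matching the member lists below):

```shell
# dc1 server on the physical host, pinned to eth0 (hypothetical IP)
consul agent -server -dc dc1 -bind 192.168.1.1 -advertise 192.168.1.1 \
  -data-dir /var/consul

# intra-machine server inside a container on the same host (docker0 range)
consul agent -server -dc vm1 -bind 172.17.0.88 -data-dir /consul

# link the internal DC's server to the host DC over the WAN pool
# (run from the vm1 container):
consul join -wan 192.168.1.1
```

The intent is that only the WAN pool crosses the host/container boundary, while each DC's LAN gossip stays on its own interface.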

[screenshot, 2015-08-31 15:31:30: deployment diagram of the two machines and their Consul servers]

But the Consul servers (the green ones) expect to talk to the DCs on top of the other machines as well.
The log from the Consul server on top of VM2:

2015/08/31 15:15:27 [INFO] serf: EventMemberJoin: consulVM2.vm2 172.17.0.44
2015/08/31 15:15:27 [INFO] consul: adding server consulVM2.vm2 (Addr: 172.17.0.44:8300) (DC: vm2)
2015/08/31 15:15:47 [INFO] agent.rpc: Accepted client: 127.0.0.1:48551

The internal Consul server is hooking itself in: all good.
But...

2015/08/31 15:16:42 [INFO] serf: EventMemberJoin: kibana4.vm1 172.17.0.89
2015/08/31 15:16:42 [INFO] consul: adding server kibana4.vm1 (Addr: 172.17.0.89:8300) (DC: vm1)
2015/08/31 15:16:45 [INFO] serf: EventMemberJoin: 718ef6d3e6ba.vm1 172.17.0.90
2015/08/31 15:16:45 [INFO] consul: adding server 718ef6d3e6ba.vm1 (Addr: 172.17.0.90:8300) (DC: vm1)
2015/08/31 15:16:51 [INFO] memberlist: Suspect 718ef6d3e6ba.vm1 has failed, no acks received
2015/08/31 15:17:01 [INFO] memberlist: Suspect kibana4.vm1 has failed, no acks received

... kibana4 was started on vm1; I would have expected that a Consul server only listens to its own datacenter and the DCs of its WAN servers.

But maybe that's the problem: the agent listens to all DCs of the WAN server?

Cheers
Christian

UPDATE:
Hmm... looking at the memberships, they present themselves differently:

  • VM1 seems to care only about itself (as I would expect).
    This one is bootstrapped.
[root@consul /]# consul members
Node  Address           Status  Type    Build  Protocol  DC
vm1   192.168.1.1:8301  alive   server  0.5.2  2         dc1
vm2   192.168.1.2:8301  alive   server  0.5.2  2         dc1
[root@consul /]# consul members -wan
Node           Address           Status  Type    Build  Protocol  DC
vm1.dc1        192.168.1.1:8302  alive   server  0.5.2  2         dc1
consulVM1.vm1  172.17.0.88:8302  alive   server  0.5.2  2         vm1
[root@consul /]#
  • VM2 also looks out for the first VM-DC.
    This one joins the first one.
[root@consul /]# consul members
Node  Address           Status  Type    Build  Protocol  DC
vm2   192.168.1.2:8301  alive   server  0.5.2  2         dc1
vm1   192.168.1.1:8301  alive   server  0.5.2  2         dc1
[root@consul /]# consul members -wan
Node              Address           Status  Type    Build  Protocol  DC
vm2.dc1           192.168.1.2:8302  alive   server  0.5.2  2         dc1
consulVM2.vm2     172.17.0.44:8302  alive   server  0.5.2  2         vm2
kibana4.vm1       172.17.0.89:8302  alive   server  0.5.2  2         vm1
718ef6d3e6ba.vm1  172.17.0.90:8302  alive   server  0.5.2  2         vm1

Is there a way to prevent agents from looking beyond their own DC and a common one?
I also skipped the registrator, without a change in behavior. The second dc1 Consul server sees the local containers; the first one doesn't. :(

@slackpad
Contributor

Hi @ChristianKniep - it's weird that kibana4.vm1 would bleed into the WAN pool, so I'm a little confused about your setup. I'd imagine that all the green Consul servers would be in one DC (as you've got), and that you'd see the red Consul servers in your -wan query (consulVM1.vm1, consulVM2.vm2, etc.). In your example is kibana4.vm1 also another Consul server, or just a registered service running on that machine?

@ChristianKniep

Hey James,

I described the setup and problem here: https://github.com/ChristianKniep/orchestra/tree/master/distributed-consul
It seems that using the Consul DNS on all nodes confuses the agents, and they end up communicating with all nodes. Maybe it's because the services are named identically and the DNS round-robins? Not sure...
If I use a local DNS only on the internal nodes, it works fine, but I could polish it so that the eth0 IP is advertised. If I can get this working, it should be fine for me...
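One common way to wire "local DNS on the internal nodes" is to forward only the .consul zone to the local agent and send everything else upstream. A sketch, assuming dnsmasq (which is not stated in this thread) and the agent's default DNS port:

```shell
# /etc/dnsmasq.d/10-consul
# Forward *.consul lookups to the local Consul agent's DNS endpoint
# (default port 8600); all other queries use the normal resolvers.
server=/consul/127.0.0.1#8600
```

Because each container asks only its own local agent, unqualified names like elasticsearch.service.consul always resolve within that agent's DC, which avoids the cross-DC round-robin confusion described above.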

@ChristianKniep

OK, I'm getting closer...

  • Only the internal service nodes use the local DNS server.
  • I had missed the order in registrator's config, so -ip was ignored, which gave me the internal IP.
    With that fixed, I can query elasticsearch.service.dc1.consul and I get the external IP addresses of the service across the cluster.
[root@es2 /]# dig +short SRV elasticsearch.service.dc1.consul
1 1 9200 consul2.node.dc1.consul.
1 1 9200 consul1.node.dc1.consul.
[root@es2 /]# dig +short A elasticsearch.service.dc1.consul
192.168.99.100
192.168.99.101

If I want to stay within my own DC, I get the internal IPs.

[root@es2 /]# dig +short A elasticsearch.service.consul
172.17.0.16
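For completeness, the registrator invocation implied above would look roughly like this (hypothetical host IP; as noted, the -ip flag must precede the registry URI or it is ignored):

```shell
# -ip sets the address registered for each service (the host's external IP
# here, rather than the container-internal docker0 address)
docker run -d --name registrator \
  -v /var/run/docker.sock:/tmp/docker.sock \
  gliderlabs/registrator -ip 192.168.99.100 consul://192.168.99.100:8500
```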

@slackpad
Contributor

slackpad commented Sep 4, 2015

Ah ok - it makes sense that you'd use the local agent to resolve DNS within the container if you don't want to qualify the names with the DC. There's nothing Consul-wise that should round robin across DCs, so I'm not sure what configuration you were in before, but maybe you were sometimes asking the other DC's server for elasticsearch.service.consul.

@slackpad
Contributor

Hi @ChristianKniep, the WAN address translation feature shipped in 0.6.4 (#1698) should solve this by allowing you to have separate DCs with differing internal/external addresses. Some other discussions about asymmetric WAN connectivity are happening under #1871.
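As a sketch, the 0.6.4-era server config for the containerized setup above might look like this (example addresses reused from the logs in this thread):

```json
{
  "translate_wan_addrs": true,
  "advertise_addr": "172.17.0.44",
  "advertise_addr_wan": "192.168.1.2"
}
```

With translate_wan_addrs enabled, queries arriving from another datacenter are answered with the WAN advertise address, while local queries keep getting the internal one.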

Closing this out in favor of those. Please re-open and/or let me know if you need anything else!
