This repository has been archived by the owner on Jul 28, 2021. It is now read-only.
Intermittent DNS failures when running Alpine containers in user-defined docker-compose network #303
underscorgan pushed a commit to underscorgan/pupperware that referenced this issue on May 3, 2019
When testing with the `puppet/puppet-agent-alpine` image on Windows systems with LCOW, we regularly hit intermittent DNS resolution failures. The problem appears to be specific to the interaction between the base Alpine images (3.8 and 3.9) and Windows/LCOW. Two related issues are moby/libnetwork#2371 and microsoft/opengcs#303.
underscorgan pushed a commit to underscorgan/puppetserver that referenced this issue on May 3, 2019
This updates the puppetserver tests to use `docker-compose` instead of `docker run`, and switches them to the shared testing gem from github.com/puppetlabs/pupperware. It also moves testing from the puppet-agent-alpine image to puppet-agent-ubuntu, because we were seeing many intermittent network failures with the Alpine container on Windows (LCOW); see moby/libnetwork#2371 and microsoft/opengcs#303 for more information on this issue. This should hopefully clear up the intermittent name resolution failures we were seeing.
Iristyle added a commit to Iristyle/pupperware that referenced this issue on May 3, 2019
- Remove the domain introspection / setting of the AZURE_DOMAIN env var, as this does not work as originally thought. Instead, hardcode the DNS suffix `.internal` on each service in the compose stack, and make sure that `dns_search` for `internal` will use the Docker DNS resolver when dealing with these hosts. Note that these compose file settings only affect the configuration of the DNS resolver, *not* resolv.conf. This is different from the `docker run` behavior, which *does* modify resolv.conf. Also note that config file locations vary depending on whether or not systemd is running in the container.

  It's not "safe" to refer to services in the cluster by only their short service names like `puppet`, `puppetdb` or `postgres`, as they can conflict with hosts of the same name on the external network when `resolv.conf` appends DNS search suffixes. When docker compose creates the user-defined network, it copies the DNS settings from the host into the `resolv.conf` of each container, which often carries search domains from the outside network into the containers. When names are resolved, the default search suffix is applied to short names unless the ndots option is set to 0. For instance, given a `resolv.conf` that contains:

    search delivery.puppetlabs.net

  a DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`, which fails to resolve in the Docker DNS resolver and is then sent to the next DNS server in the `nameserver` list, which may resolve it to a different host on the external network. This happens because `resolv.conf` also lists secondary DNS servers taken from the host. While it is possible to try to service requests for an external domain like `delivery.puppetlabs.net` with the embedded Docker DNS resolver, it's better to choose a domain suffix to use inside the cluster instead. There are some good details on how the various network types are configured in docker/for-linux#488 (comment).
- Note that the .internal domain is typically not recommended for production, given that the only IANA reserved domains are .example, .test, .invalid and .localhost. However, since the DNS resolver is set to own the resolution of .internal, this is a compromise. In production it's recommended to use a subdomain of a domain that you own, but that's not yet configurable in this compose file. A future commit will make this configurable.
- Another workaround for this problem would be to set the ndots option in resolv.conf to 0, per the documentation at http://man7.org/linux/man-pages/man5/resolv.conf.5.html. However, that can't be done for two reasons:
  - the docker-compose schema doesn't actually support setting DNS options (docker/cli#1557)
  - k8s sets ndots to 5 by default, so we don't want to be at odds with it
- A further, but implausible, workaround would be to modify the host DNS settings to remove any search suffixes.
- The original FQDN change being reverted in this commit was introduced in 2549f19:

  "Lastly, the Windows specific docker-compose.windows.yml sets up a custom alias in the "default" network so that an extra DNS name for puppetserver can be set based on the FQDN that Facter determines. Without this additional DNS reservation, the `puppetserver ca` command will be unable to connect to the REST endpoint. A better long-term solution is making sure puppetserver is setup to point to `puppet` as the host instead of an FQDN."

  With the PUPPETSERVER_HOSTNAME value set on the puppetserver container, both certname and server are set to puppet.internal, removing the need to synchronize a domain name.
- Note that at this time there is also a discrepancy in how Facter 3 behaves vs Facter 2. The Facter 2 gem is used by the gem-based `puppetserver ca` application and may return a different value for Facter.value('domain') than calling `facter domain` at the command line. Such is the case inside the puppet network, where Facter 2 returns `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`. This discrepancy means the `puppetserver ca` application cannot find the client-side cert on disk and fails outright. Facter 2 should not be included in the puppetserver packages, and changes have been made to packaging for future releases. For now, setting the PUPPETSERVER_HOSTNAME configuration value in the puppetserver container sets the `puppet.conf` values explicitly to the desired DNS name to work around this problem.
- Resolution of `postgres.internal` seems to rely on having the `hostname` value explicitly defined in the docker-compose file, even though hostname values supposedly don't interact with DNS in Docker.
- This PR is also made possible by switching from the Alpine-based container to the Ubuntu-based container (performed in a prior commit), due to DNS resolution problems with Alpine inside LCOW: moby/libnetwork#2371, microsoft/opengcs#303
- Another avenue investigated to resolve the DNS problem in Alpine was to feed host:ip mappings in through --add-host, but it turns out that Windows doesn't yet support that feature, per docker/for-win#1455.
- Finally, these changes are also made in preparation for switching the pupperware-commercial repo over to a private builder.
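The ordering problem described in this commit message can be illustrated with a small model of the resolver's search list. The Python sketch below is a simplification, assuming glibc-style behavior as described in resolv.conf(5): a name with at least `ndots` dots is tried as-is first, a name with fewer dots gets the search domains appended first, and a trailing dot disables the search list. musl, which Alpine uses, differs in some details, and the helper names `parse_resolv_conf` and `candidate_names` are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], int]:
    """Hypothetical helper: extract the search list and ndots from resolv.conf text.

    Only `search` and `options ndots:N` are handled. ndots defaults to 1
    and is capped at 15, as described in resolv.conf(5).
    """
    search: List[str] = []
    ndots = 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        key, *values = line.split()
        if key == "search":
            search = values  # only the last `search` line is used
        elif key == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    ndots = min(int(opt.split(":", 1)[1]), 15)
    return search, ndots


def candidate_names(name: str, search: Sequence[str], ndots: int) -> List[str]:
    """Hypothetical helper: simplified glibc-style order in which names are tried."""
    if name.endswith("."):
        return [name]  # fully qualified: search domains are never appended
    suffixed = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + suffixed  # enough dots: try the name as-is first
    return suffixed + [name]  # too few dots: search domains first


conf = "nameserver 127.0.0.11\nsearch delivery.puppetlabs.net\n"
search, ndots = parse_resolv_conf(conf)
print(candidate_names("puppet", search, ndots))
# ['puppet.delivery.puppetlabs.net', 'puppet']
print(candidate_names("puppet.internal", search, ndots))
# ['puppet.internal', 'puppet.internal.delivery.puppetlabs.net']
print(candidate_names("puppet", search, 0))
# ['puppet', 'puppet.delivery.puppetlabs.net']
```

With the default `ndots` of 1, the short name `puppet` is expanded with the host's search domain before being tried as-is, which is how a query can leave the Docker DNS resolver and be answered by an external server. `puppet.internal` already has one dot, so it is tried as-is first; under a Kubernetes-style `ndots:5`, even `puppet.internal` would have the search domains tried first.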
Iristyle added a commit to Iristyle/pupperware that referenced this issue on May 4, 2019
Iristyle added a commit to Iristyle/puppetdb that referenced this issue on May 4, 2019
- Unexpectedly, a Travis failure was also encountered where `host postgres.internal` kept failing for 30 seconds, but the immediately subsequent call to `dig postgres.internal` succeeded. Running dig seems to prime a local cache, so perform a dig prior to host in an effort to help fix this problem, given that the PDB container is based on Alpine (microsoft/opengcs#303).
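The commit works around flaky lookups with shell tools (`dig` before `host`). As an illustration only, and not the PDB container's actual entrypoint logic, the Python sketch below shows a bounded wait that tries several lookup functions in turn on each attempt until one returns addresses or a deadline passes; the 30-second default mirrors the window mentioned above. `wait_for_name` and `system_lookup` are hypothetical helpers, and the clock and sleep functions are parameters so the loop can be exercised without a network.

```python
import socket
import time
from typing import Callable, List, Optional, Sequence

Lookup = Callable[[str], List[str]]


def system_lookup(name: str) -> List[str]:
    """Hypothetical helper: resolve via the C library; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})


def wait_for_name(
    name: str,
    lookups: Sequence[Lookup],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[str]]:
    """Hypothetical helper: retry each lookup in turn until one answers or time runs out."""
    deadline = clock() + timeout
    while True:
        for lookup in lookups:
            try:
                addresses = lookup(name)
            except OSError:  # socket.gaierror is a subclass of OSError
                continue
            if addresses:
                return addresses
        if clock() >= deadline:
            return None  # give up; the caller decides how to report it
        sleep(interval)
```

For example, `wait_for_name("postgres.internal", [system_lookup])` would poll the container's resolver for up to 30 seconds. Trying more than one lookup path per attempt loosely mirrors the dig-then-host ordering, but it does not reproduce any cache priming.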
Iristyle added a commit to Iristyle/puppetdb that referenced this issue on May 4, 2019
Iristyle added a commit to puppetlabs/puppetdb that referenced this issue on May 4, 2019
Iristyle added a commit to Iristyle/pupperware that referenced this issue on May 4, 2019
Iristyle added a commit to Iristyle/pupperware that referenced this issue on May 6, 2019
- Remove the domain introspection / setting of the AZURE_DOMAIN env var, as this does not work as originally thought. Instead, hardcode the DNS suffix `.internal` on each service in the compose stack, and make sure that `dns_search` for `internal` will use the Docker DNS resolver when dealing with these hosts. Note that these compose file settings only affect the configuration of the DNS resolver, *not* resolv.conf. This is different from the `docker run` behavior, which *does* modify resolv.conf. Also note that config file locations vary depending on whether or not systemd is running in the container.

  It's not "safe" to refer to services in the cluster by only their short service names like `puppet`, `puppetdb` or `postgres`, as they can conflict with hosts of the same name on the external network when `resolv.conf` appends DNS search suffixes. When docker compose creates the user-defined network, it copies the DNS settings from the host into the `resolv.conf` of each container, which often carries search domains from the outside network into the containers. When names are resolved, the default search suffix is applied to short names unless the ndots option is set to 0. For instance, given a `resolv.conf` that contains:

    search delivery.puppetlabs.net

  a DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`, which fails to resolve in the Docker DNS resolver and is then sent to the next DNS server in the `nameserver` list, which may resolve it to a different host on the external network. This happens because `resolv.conf` also lists secondary DNS servers taken from the host. While it is possible to try to service requests for an external domain like `delivery.puppetlabs.net` with the embedded Docker DNS resolver, it's better to choose a domain suffix to use inside the cluster instead. There are some good details on how the various network types are configured in docker/for-linux#488 (comment).
- Note that the .internal domain is typically not recommended for production, given that the only IANA reserved domains are .example, .test, .invalid and .localhost. However, since the DNS resolver is set to own the resolution of .internal, this is a compromise. In production it's recommended to use a subdomain of a domain that you own, but that's not yet configurable in this compose file. A future commit will make this configurable.
- Another workaround for this problem would be to set the ndots option in resolv.conf to 0, per the documentation at http://man7.org/linux/man-pages/man5/resolv.conf.5.html. However, that can't be done for two reasons:
  - the docker-compose schema doesn't actually support setting DNS options (docker/cli#1557)
  - k8s sets ndots to 5 by default, so we don't want to be at odds with it
- A further, but implausible, workaround would be to modify the host DNS settings to remove any search suffixes.
- The original FQDN change being reverted in this commit was introduced in 2549f19:

  "Lastly, the Windows specific docker-compose.windows.yml sets up a custom alias in the "default" network so that an extra DNS name for puppetserver can be set based on the FQDN that Facter determines. Without this additional DNS reservation, the `puppetserver ca` command will be unable to connect to the REST endpoint. A better long-term solution is making sure puppetserver is setup to point to `puppet` as the host instead of an FQDN."

  With the PUPPETSERVER_HOSTNAME value set on the puppetserver container, both certname and server are set to puppet.internal in puppet.conf, removing the need to inject a domain name as was done previously.

  This is necessary because of a discrepancy in how Facter 3 behaves vs Facter 2, which creates a mismatch between how the host cert is initially generated (using Facter 3) and how `puppetserver ca` finds the files on disk (using Facter 2); setting PUPPETSERVER_HOSTNAME explicitly works around this mismatch. Specifically, Facter 2 may return a different Facter.value('domain') than calling `facter domain` using Facter 3 at the command line. Such is the case inside the puppet network, where Facter 2 returns `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`. Without explicitly setting PUPPETSERVER_HOSTNAME, cert files are written to disk as *.delivery.puppetlabs.net, yet the `puppetserver ca` application looks for the client certs on disk as *.ops.puppetlabs.net, which causes `puppetserver ca` to fail.
- Facter 2 should not be included in the puppetserver packages, and changes have been made to packaging for future releases, which may remove the need for the above.
- This PR is also made possible by switching from the Alpine-based container to the Ubuntu-based container (performed in a prior commit), due to DNS resolution problems with Alpine inside LCOW: moby/libnetwork#2371, microsoft/opengcs#303
- Another avenue investigated to resolve the DNS problem in Alpine was to feed host:ip mappings in through --add-host, but it turns out that Windows doesn't yet support that feature, per docker/for-win#1455.
- Finally, these changes are also made in preparation for switching the pupperware-commercial repo over to a private builder.
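To make the Facter 2 / Facter 3 mismatch concrete, here is a toy Python model of how a certname built from hostname plus domain determines the cert file path. It is not Puppet's implementation: `certname` and `cert_path` are hypothetical helpers, the `/etc/puppetlabs/puppet/ssl` ssldir is an assumed default, and the hostname `puppet` is illustrative. The `override` argument stands in for what PUPPETSERVER_HOSTNAME does by pinning certname in puppet.conf.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed default ssldir for Puppet AIO packages; other layouts differ.
SSLDIR = PurePosixPath("/etc/puppetlabs/puppet/ssl")


def certname(hostname: str, domain: Optional[str], override: Optional[str] = None) -> str:
    """Hypothetical: an explicit override wins, else hostname + domain, lowercased."""
    if override:
        return override.lower()
    if domain:
        return f"{hostname}.{domain}".lower()
    return hostname.lower()


def cert_path(name: str) -> PurePosixPath:
    """Hypothetical: assumed location of the signed cert for a given certname."""
    return SSLDIR / "certs" / f"{name}.pem"


# Facter 3 view (cert generation) vs Facter 2 view (`puppetserver ca` lookup).
written = cert_path(certname("puppet", "delivery.puppetlabs.net"))
looked_up = cert_path(certname("puppet", "ops.puppetlabs.net"))
print(written)    # /etc/puppetlabs/puppet/ssl/certs/puppet.delivery.puppetlabs.net.pem
print(looked_up)  # /etc/puppetlabs/puppet/ssl/certs/puppet.ops.puppetlabs.net.pem

# Pinning the name makes both sides agree regardless of the domain fact.
pinned_a = cert_path(certname("puppet", "delivery.puppetlabs.net", "puppet.internal"))
pinned_b = cert_path(certname("puppet", "ops.puppetlabs.net", "puppet.internal"))
print(pinned_a == pinned_b)  # True
```

The point of the model is only that any name derived from a domain fact inherits disagreements between the two Facter versions, while an explicitly configured name does not.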
Iristyle added a commit to Iristyle/pupperware that referenced this issue on May 6, 2019
- Remove the domain introspection / setting of the AZURE_DOMAIN env var, as this does not work as originally thought. Instead, hardcode the DNS suffix `.internal` on each service in the compose stack, and make sure that `dns_search` for `internal` will use the Docker DNS resolver when dealing with these hosts. Note that these compose file settings only affect the configuration of the DNS resolver, *not* resolv.conf. This is different from the `docker run` behavior, which *does* modify resolv.conf. Also note that config file locations vary depending on whether or not systemd is running in the container.

  It's not "safe" to refer to services in the cluster by only their short service names like `puppet`, `puppetdb` or `postgres`, as they can conflict with hosts of the same name on the external network when `resolv.conf` appends DNS search suffixes. When docker compose creates the user-defined network, it copies the DNS settings from the host into the `resolv.conf` of each container, which often carries search domains from the outside network into the containers. When names are resolved, the default search suffix is applied to short names unless the ndots option is set to 0. For instance, given a `resolv.conf` that contains:

    search delivery.puppetlabs.net

  a DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`, which fails to resolve in the Docker DNS resolver and is then sent to the next DNS server in the `nameserver` list, which may resolve it to a different host on the external network. This happens because `resolv.conf` also lists secondary DNS servers taken from the host. While it is possible to try to service requests for an external domain like `delivery.puppetlabs.net` with the embedded Docker DNS resolver, it's better to choose a domain suffix to use inside the cluster instead. There are some good details on how the various network types are configured in docker/for-linux#488 (comment).
- Note that the .internal domain is typically not recommended for production, given that the only IANA reserved domains are .example, .test, .invalid and .localhost. However, since the DNS resolver is set to own the resolution of .internal, this is a compromise. In production it's recommended to use a subdomain of a domain that you own, but that's not yet configurable in this compose file. A future commit will make this configurable.
- Another workaround for this problem would be to set the ndots option in resolv.conf to 0, per the documentation at http://man7.org/linux/man-pages/man5/resolv.conf.5.html. However, that can't be done for two reasons:
  - the docker-compose schema doesn't actually support setting DNS options (docker/cli#1557)
  - k8s sets ndots to 5 by default, so we don't want to be at odds with it
- A further, but implausible, workaround would be to modify the host DNS settings to remove any search suffixes.
- The original FQDN change being reverted in this commit was introduced in 2549f19:

  "Lastly, the Windows specific docker-compose.windows.yml sets up a custom alias in the "default" network so that an extra DNS name for puppetserver can be set based on the FQDN that Facter determines. Without this additional DNS reservation, the `puppetserver ca` command will be unable to connect to the REST endpoint. A better long-term solution is making sure puppetserver is setup to point to `puppet` as the host instead of an FQDN."

  With the PUPPETSERVER_HOSTNAME value set on the puppetserver container, both certname and server are set to puppet.internal in puppet.conf, removing the need to inject a domain name as was done previously.

  This is necessary because of a discrepancy in how Facter 3 behaves vs Facter 2, which creates a mismatch between how the host cert is initially generated (using Facter 3) and how `puppetserver ca` finds the files on disk (using Facter 2); setting PUPPETSERVER_HOSTNAME explicitly works around this mismatch. Specifically, Facter 2 may return a different Facter.value('domain') than calling `facter domain` using Facter 3 at the command line. Such is the case inside the puppet network, where Facter 2 returns `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`. Without explicitly setting PUPPETSERVER_HOSTNAME, cert files are written to disk as *.delivery.puppetlabs.net, yet the `puppetserver ca` application looks for the client certs on disk as *.ops.puppetlabs.net, which causes `puppetserver ca` to fail.
- Facter 2 should not be included in the puppetserver packages, and changes have been made to packaging for future releases, which may remove the need for the above.
- This PR is also made possible by switching from the Alpine-based container to the Ubuntu-based container (performed in a prior commit), due to DNS resolution problems with Alpine inside LCOW: moby/libnetwork#2371, microsoft/opengcs#303
- Another avenue investigated to resolve the DNS problem in Alpine was to feed host:ip mappings in through --add-host, but it turns out that Windows doesn't yet support that feature, per docker/for-win#1455.
- Finally, these changes are also made in preparation for switching the pupperware-commercial repo over to a private builder.
- Additionally, update the k8s / Bolt specs to be consistent with the updated naming.
Iristyle added a commit to Iristyle/puppetdb that referenced this issue on Oct 28, 2019
Iristyle pushed a commit to puppetlabs/pupperware that referenced this issue on Apr 9, 2021
Iristyle added a commit to puppetlabs/pupperware that referenced this issue on Apr 9, 2021
- Remove the domain introspection / setting of AZURE_DOMAIN env var as this does not work as originally thought. Instead, hardcode the DNS suffix `.internal` to each service in the compose stack, and make sure that `dns_search` for `internal` will use the Docker DNS resolver when dealing with these hosts. Note that these compose file settings only affect the configuration of the DNS resolver, *not* resolv.conf. This is different from the docker run behavior, which *does* modify resolv.conf. Also note, config file locations vary depending on whether or not systemd is running in the container. It's not "safe" to refer to services in the cluster by only their short service names like `puppet`, `puppetdb` or `postgres` as they can conflict with hosts on the external network with these names when `resolv.conf` appends DNS search suffixes. When docker compose creates the user defined network, it copies the DNS settings from the host to the `resolv.conf` in each of the containers. This often takes search domains from the outside network and applies them to containers. When network resolutions happen, any default search suffix will be applied to short names when the dns option for ndots is not set to 0. So for instance, given a `resolv.conf` that contains: search delivery.puppetlabs.net A DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net` which will fail to resolve in the Docker DNS resolver, then be sent to the next DNS server in the `nameserver` list, which may resolve it to a different host in the external network. This behaves this way because `resolv.conf` also sets secondary DNS servers from the host. While it is possible to try and service requests for an external domain like `delivery.puppetlabs.net` with the embedded Docker DNS resolver, it's better to instead choose a domain suffix to use inside the cluster. 
There are some good details on how various network types configure DNS in docker/for-linux#488 (comment).

- Note that the `.internal` domain is typically not recommended for production, given that the only IANA-reserved domains are `.example`, `.test`, `.invalid` and `.localhost`. However, since the DNS resolver is set to own the resolution of `.internal`, this is a compromise. In production it's recommended to use a subdomain of a domain that you own, but that's not yet configurable in this compose file; a future commit will make it configurable.
- Another workaround for this problem would be to set the `ndots` option in resolv.conf to 0, per the documentation at http://man7.org/linux/man-pages/man5/resolv.conf.5.html. However, that can't be done for two reasons:
  - the docker-compose schema doesn't actually support setting DNS options (docker/cli#1557)
  - k8s sets `ndots` to 5 by default, so we don't want to be at odds with it
- A further, but implausible, workaround would be to modify the host DNS settings to remove any search suffixes.
- The original FQDN change being reverted in this commit was introduced in 8b38620:

  "Lastly, the Windows specific docker-compose.windows.yml sets up a custom alias in the "default" network so that an extra DNS name for puppetserver can be set based on the FQDN that Facter determines. Without this additional DNS reservation, the `puppetserver ca` command will be unable to connect to the REST endpoint. A better long-term solution is making sure puppetserver is setup to point to `puppet` as the host instead of an FQDN."

  With the PUPPETSERVER_HOSTNAME value set on the puppetserver container, both `certname` and `server` are set to `puppet.internal` inside puppet.conf, so there is no longer a need to inject a domain name as was done previously.
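To see which search domains and `ndots` value a container actually ended up with, one could parse its `/etc/resolv.conf`. This is a minimal sketch of a hypothetical `parse_resolv_conf` helper that handles only the `nameserver`, `search`/`domain` and `options ndots:` keywords, following the rules in resolv.conf(5): `ndots` defaults to 1 and is capped at 15, and the last `search` or `domain` line wins.

```python
from typing import Dict, List, Union


def parse_resolv_conf(text: str) -> Dict[str, Union[List[str], int]]:
    """Extract nameservers, search domains and ndots from resolv.conf content."""
    nameservers: List[str] = []
    search: List[str] = []
    ndots = 1  # resolv.conf(5) default
    for raw in text.splitlines():
        line = raw.strip()
        # Comment lines start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            nameservers.append(args[0])
        elif keyword in ("search", "domain"):
            # The two keywords are mutually exclusive; the last one wins.
            search = args
        elif keyword == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    # Values above 15 are silently capped to 15.
                    ndots = min(int(opt.split(":", 1)[1]), 15)
    return {"nameservers": nameservers, "search": search, "ndots": ndots}
```

Inside a container, passing the contents of `/etc/resolv.conf` to this function shows whether a host search domain such as `delivery.puppetlabs.net` was copied in and whether `ndots` is still at its default.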
This is necessary because of a discrepancy between Facter 3 and Facter 2 behavior. It creates a mismatch between how the host cert is initially generated (using Facter 3) and how `puppetserver ca` finds the files on disk (using Facter 2), and setting PUPPETSERVER_HOSTNAME explicitly works around it. Specifically, Facter 2 may return a different `Facter.value('domain')` than calling `facter domain` with Facter 3 at the command line. Such is the case inside the Puppet network, where Facter 2 returns `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`. Without explicitly setting PUPPETSERVER_HOSTNAME, cert files are written to disk as `*.delivery.puppetlabs.net`, yet the `puppetserver ca` application looks for client certs on disk as `*.ops.puppetlabs.net`, which causes `puppetserver ca` to fail.

- Facter 2 should not be included in the puppetserver packages, and packaging changes have been made for future releases that may remove the need for the above.
- This PR is also made possible by switching from the Alpine-based container to the Ubuntu-based one (performed in a prior commit), due to DNS resolution problems with Alpine inside LCOW: moby/libnetwork#2371, microsoft/opengcs#303
- Another avenue investigated to resolve the DNS problem in Alpine was to feed host:ip mappings in through `--add-host`, but it turns out that Windows doesn't yet support that feature, per docker/for-win#1455.
- Finally, these changes are also made in preparation for switching the pupperware-commercial repo over to a private builder.
- Additionally, update the k8s / Bolt specs to be consistent with the updated naming.
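The cert-path mismatch described above can be illustrated with a toy model. This sketch assumes, for illustration only, that the certname defaults to `<hostname>.<domain>` and that certs live at `<ssldir>/certs/<certname>.pem` under an assumed ssldir; `cert_path` is hypothetical and is not Puppet's or puppetserver's code.

```python
import posixpath
from typing import Optional


def cert_path(ssldir: str, hostname: str, domain: str,
              certname: Optional[str] = None) -> str:
    """Toy model: use an explicit certname if given, else hostname + domain."""
    name = certname or f"{hostname}.{domain}"
    return posixpath.join(ssldir, "certs", f"{name}.pem")


SSLDIR = "/etc/puppetlabs/puppet/ssl"  # assumed ssldir for this illustration

# Facter 3 and Facter 2 report different domains inside the same network.
written = cert_path(SSLDIR, "puppet", "delivery.puppetlabs.net")
looked_up = cert_path(SSLDIR, "puppet", "ops.puppetlabs.net")
print(written == looked_up)  # False: the lookup targets a different file

# An explicit name (the role PUPPETSERVER_HOSTNAME plays) makes the domain irrelevant.
print(cert_path(SSLDIR, "puppet", "delivery.puppetlabs.net", "puppet.internal")
      == cert_path(SSLDIR, "puppet", "ops.puppetlabs.net", "puppet.internal"))  # True
```

The point of the model is only that any path derived from a Facter-reported domain depends on which Facter answered, while a fixed name does not.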
This is a cross-post from moby/libnetwork#2371 as I don't know where the bug lies.
In my environment, I can reproduce the DNS resolution failures under LCOW with the following minimal compose file.
Running `docker-compose up` yields output in which failures such as `bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve` and `foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve` are mixed in with successful resolutions.

I can run this compose stack on OSX and it does not fail. If I switch from the Alpine container to an Ubuntu container, the resolutions don't fail.
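Because the failures are intermittent, a single lookup says little; counting failures over many attempts makes runs comparable (Alpine vs Ubuntu, LCOW vs OSX). This is a hypothetical Python helper, not the compose file from this issue. It assumes Python is available in the container and uses `socket.gethostbyname`, which goes through the C library's resolver; the resolver is injectable so the counting logic can be exercised without a network.

```python
import socket
from typing import Callable, Dict, Iterable


def count_failures(names: Iterable[str], attempts: int,
                   resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, int]:
    """Resolve each name `attempts` times and count the failed lookups per name."""
    failures: Dict[str, int] = {}
    for name in names:
        failures[name] = 0
        for _ in range(attempts):
            try:
                resolve(name)
            except OSError:  # socket.gaierror is a subclass of OSError
                failures[name] += 1
    return failures
```

For example, running `count_failures(["foo.internal", "bar.internal"], 100)` inside a container on the compose network gives a per-name failure count; a count that stays at zero on one image but not another points at the image rather than the network.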
I can at least partially work around the problem by modifying the compose file to first perform a `dig` against the host like this:

The `nslookup: can't resolve '(null)': Name does not resolve` message in the original case is reported to be unnecessary per gliderlabs/docker-alpine#476 (comment), but after performing a `dig` that message changes and the resolutions look like this:

My host is as follows:
The LCOW image is built from linuxkit/lcow@d5dfdbc. I tried the latest merged PR, but it didn't launch containers and I had to revert (more info in linuxkit/lcow#45 (comment)).
There are some further details in the original issue I filed at moby/libnetwork#2371