LCOW: Intermittent DNS resolution failures with Alpine containers #2371

Open
Iristyle opened this issue May 3, 2019 · 8 comments

Comments

Iristyle commented May 3, 2019

Preface - I haven't yet debugged this issue enough to know precisely where the issue lies. I do know that I can very trivially reproduce the problem and wanted to at least get the ticket filed / conversation going. It may be related to some combination of:

  • LCOW (or LCOW image / kernel / opengcs / etc)
  • Alpine 3.9
  • Environment - containers are running inside a Server 2019 Hyper-V VM that has nested virtualization enabled
  • Docker version / some nuance of the Docker DNS resolver

I'm pretty sure this has something to do with Alpine in particular, since running the failing scenario with Ubuntu containers instead does not fail.

docker info

Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0-beta2)
  buildx: Build with BuildKit (Docker Inc., v0.2.0-6-g509c4b6-tp)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 138
 Server Version: master-dockerproject-2019-04-28
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows 10 Enterprise Version 1809 (OS Build 17763.437)
 OSType: windows
 Architecture: x86_64
 CPUs: 2
 Total Memory: 16GiB
 Name: ci-lcow-prod-1
 ID: 0ac02c9d-aaba-42f4-8749-5a64af3068d8
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The LCOW image is built from linuxkit/lcow@d5dfdbc, which includes kernel 4.19.27 among other bits. A PR with an updated kernel image (newer versions of OpenGCS, Alpine, the kernel and runc) has since been merged, but when I built it, it didn't launch containers and I had to revert (more info in linuxkit/lcow#45 (comment)).

compose file to demonstrate the problem

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - bar.internal

Output from compose up

The problem is that DNS resolution fails fairly regularly: foo cannot resolve bar.internal, and vice versa. The log shows some successes, but also a number of failures, and the pattern varies from run to run.

PS C:\source\alpine-test> docker-compose -f .\docker-compose-bad.yml up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
bar_1  |
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
foo_1  |
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
Gracefully stopping... (press Ctrl+C again to force)
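
To put a number on the failure rate rather than eyeballing the log, the output above can be tallied per service. This is a minimal sketch in Python, assuming the docker-compose log format shown above; the `tally` function and its regular expressions are illustrative and not part of any Docker tooling. Lines for '(null)' are deliberately ignored, since that reverse-lookup message also appears on runs where the forward lookup succeeds.

```python
import re
from collections import Counter

# A failed forward lookup, e.g.
#   foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
# The negative lookahead skips the '(null)' reverse-lookup noise.
FAIL = re.compile(r"^(\w+)\s+\|\s+nslookup: can't resolve '(?!\(null\))[^']+'")

# A successful forward lookup, e.g.
#   bar_1  | Name:      foo.internal
OK = re.compile(r"^(\w+)\s+\|\s+Name:\s+\S+")


def tally(log_text):
    """Count successful and failed lookups per compose service."""
    counts = Counter()
    for line in log_text.splitlines():
        if m := FAIL.match(line):
            counts[(m.group(1), "fail")] += 1
        elif m := OK.match(line):
            counts[(m.group(1), "ok")] += 1
    return counts
```

Saving the compose output to a file and passing its contents to `tally` gives a per-service count that can be compared between the Alpine and Ubuntu runs.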

Workaround

One way to work around the problem is to have the Alpine container perform a dig against the host, which presumably caches the DNS record for future nslookup calls.

compose file

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig bar.internal; while true; do nslookup bar.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig foo.internal; while true; do nslookup foo.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

Output from compose up

The nslookup results have changed quite a bit from:

bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25

to:

bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |

Here's a longer run from the above compose file showing that nslookup no longer fails intermittently.

PS C:\source\alpine-test> docker-compose up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
bar_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
foo_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
bar_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
foo_1  | (1/10) Installing libgcc (8.3.0-r0)
bar_1  | (1/10) Installing libgcc (8.3.0-r0)
bar_1  | (2/10) Installing krb5-conf (1.0-r1)
foo_1  | (2/10) Installing krb5-conf (1.0-r1)
bar_1  | (3/10) Installing libcom_err (1.44.5-r0)
foo_1  | (3/10) Installing libcom_err (1.44.5-r0)
bar_1  | (4/10) Installing keyutils-libs (1.6-r0)
foo_1  | (4/10) Installing keyutils-libs (1.6-r0)
bar_1  | (5/10) Installing libverto (0.3.0-r1)
bar_1  | (6/10) Installing krb5-libs (1.15.5-r0)
foo_1  | (5/10) Installing libverto (0.3.0-r1)
foo_1  | (6/10) Installing krb5-libs (1.15.5-r0)
bar_1  | (7/10) Installing json-c (0.13.1-r0)
bar_1  | (8/10) Installing libxml2 (2.9.9-r1)
foo_1  | (7/10) Installing json-c (0.13.1-r0)
foo_1  | (8/10) Installing libxml2 (2.9.9-r1)
bar_1  | (9/10) Installing bind-libs (9.12.4_p1-r1)
foo_1  | (9/10) Installing bind-libs (9.12.4_p1-r1)
foo_1  | (10/10) Installing bind-tools (9.12.4_p1-r1)
bar_1  | (10/10) Installing bind-tools (9.12.4_p1-r1)
foo_1  | Executing busybox-1.29.3-r10.trigger
bar_1  | Executing busybox-1.29.3-r10.trigger
bar_1  | OK: 12 MiB in 24 packages
foo_1  | OK: 12 MiB in 24 packages
foo_1  |
foo_1  | ; <<>> DiG 9.12.4-P1 <<>> bar.internal
foo_1  | ;; global options: +cmd
foo_1  | ;; Got answer:
foo_1  | ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62166
foo_1  | ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
foo_1  |
foo_1  | ;; QUESTION SECTION:
foo_1  | ;bar.internal.                 IN      A
foo_1  |
foo_1  | ;; ANSWER SECTION:
foo_1  | bar.internal.          600     IN      A       172.25.137.174
foo_1  |
foo_1  | ;; Query time: 0 msec
foo_1  | ;; SERVER: 172.25.128.1#53(172.25.128.1)
foo_1  | ;; WHEN: Fri May 03 18:26:29 UTC 2019
foo_1  | ;; MSG SIZE  rcvd: 58
foo_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  |
bar_1  | ; <<>> DiG 9.12.4-P1 <<>> foo.internal
bar_1  | ;; global options: +cmd
bar_1  | ;; Got answer:
bar_1  | ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34929
bar_1  | ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
bar_1  |
bar_1  | ;; QUESTION SECTION:
bar_1  | ;foo.internal.                 IN      A
bar_1  |
bar_1  | ;; ANSWER SECTION:
bar_1  | foo.internal.          600     IN      A       172.25.139.149
bar_1  |
bar_1  | ;; Query time: 0 msec
bar_1  | ;; SERVER: 172.25.128.1#53(172.25.128.1)
bar_1  | ;; WHEN: Fri May 03 18:26:29 UTC 2019
bar_1  | ;; MSG SIZE  rcvd: 58
bar_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |

Ubuntu results

Compose file

version: '3'

services:
  foo:
    image: ubuntu:latest
    dns_search: internal
    entrypoint: sh -c "apt-get update && apt-get install -y dnsutils; while true; do nslookup 'bar.internal'; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: ubuntu:latest
    dns_search: internal
    entrypoint: sh -c "apt-get update && apt-get install -y dnsutils; while true; do nslookup 'foo.internal'; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

I'll spare the full log here, but after switching to an Ubuntu container, nslookup succeeds from the outset:

foo_1  | Server:                172.30.16.1
foo_1  | Address:       172.30.16.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.30.18.190
foo_1  |
bar_1  | Server:                172.30.16.1
bar_1  | Address:       172.30.16.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.30.28.25
bar_1  |

Iristyle commented May 3, 2019

I just verified that I'm not seeing the same behavior with Alpine 3.9 on Docker for Mac. I still get the reverse-pointer failure (nslookup: can't resolve '(null)': Name does not resolve), but queries always resolve.

Not sure who the right MS folks are to contact - @jhowardmsft or @jterry75?

This might end up being a ticket to file in the opengcs project.

docker info

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.125-linuxkit
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952GiB
Name: linuxkit-025000000001
ID: WOHB:ZTHF:LEYI:UJM6:XU5Y:KRRI:2TLV:Z352:WLSD:HYPI:IUBB:K27H
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 53
 System Time: 2019-05-03T18:48:28.685819469Z
 EventsListeners: 2
HTTP Proxy: gateway.docker.internal:3128
HTTPS Proxy: gateway.docker.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

underscorgan pushed a commit to underscorgan/puppetserver that referenced this issue May 3, 2019
This updates the puppetserver tests to use `docker-compose` instead of
`docker-run`. This also updates the tests to use the shared testing
gem from github.com/puppetlabs/pupperware.

This also includes a move from the puppet-agent-alpine to
puppet-agent-ubuntu for testing. We were seeing a lot of intermittent
network failures with the alpine container on windows (LCOW). See
moby/libnetwork#2371 for the bug report.
This should hopefully clear up the intermittent failures we were seeing.
underscorgan pushed a commit to underscorgan/pupperware that referenced this issue May 3, 2019
When testing with the `puppet/puppet-agent-alpine` image on windows
systems with LCOW we had intermittent failures in DNS resolution that
occurred fairly regularly. It seems to be specifically an interaction
between the base alpine (3.8 and 3.9) images and windows/LCOW.

Two issues related to this problem are
moby/libnetwork#2371 and
microsoft/opengcs#303
underscorgan pushed a commit to underscorgan/puppetserver that referenced this issue May 3, 2019
This updates the puppetserver tests to use `docker-compose` instead of
`docker-run`. This also updates the tests to use the shared testing
gem from github.com/puppetlabs/pupperware.

This also includes a move from the puppet-agent-alpine to
puppet-agent-ubuntu for testing. We were seeing a lot of intermittent
network failures with the alpine container on windows (LCOW).
moby/libnetwork#2371 and
microsoft/opengcs#303 have more information on
this issue. This should hopefully clear up the intermittent name
resolution failures we were seeing.
Iristyle added a commit to Iristyle/pupperware that referenced this issue May 3, 2019
 - Remove the domain introspection / setting of AZURE_DOMAIN env var
   as this does not work as originally thought.

   Instead, hardcode the DNS suffix `.internal` to each service in the
   compose stack, and make sure that `dns_search` for `internal` will
   use the Docker DNS resolver when dealing with these hosts. Note that
   these compose file settings only affect the configuration of the
   DNS resolver, *not* resolv.conf. This is different from the
   docker run behavior, which *does* modify resolv.conf. Also note,
   config file locations vary depending on whether or not systemd is
   running in the container.

   It's not "safe" to refer to services in the cluster by only their
   short service names like `puppet`, `puppetdb` or `postgres` as they
   can conflict with hosts on the external network with these names
   when `resolv.conf` appends DNS search suffixes.

   When docker compose creates the user defined network, it copies the
   DNS settings from the host to the `resolv.conf` in each of the
   containers. This often takes search domains from the outside network
   and applies them to containers.

   When network resolutions happen, any default search suffix will be
   applied to short names when the dns option for ndots is not set to 0.
   So for instance, given a `resolv.conf` that contains:

   search delivery.puppetlabs.net

   A DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`
   which will fail to resolve in the Docker DNS resolver, then be sent
   to the next DNS server in the `nameserver` list, which may resolve it
   to a different host in the external network. It behaves this way
   because `resolv.conf` also sets secondary DNS servers from the host.

   While it is possible to try and service requests for an external
   domain like `delivery.puppetlabs.net` with the embedded Docker DNS
   resolver, it's better to instead choose a domain suffix to use inside
   the cluster.

   There are some good details on how various network types configure DNS:
   docker/for-linux#488 (comment)

 - Note that the .internal domain is typically not recommended for
   production given the only IANA reserved domains are .example, .test,
   .invalid or .localhost. However, given the DNS resolver is set to
   own the resolution of .internal, this is a compromise.

   In production it's recommended to use a subdomain of a domain that
   you own, but that's not yet configurable in this compose file. A
   future commit will make this configurable.

 - Another workaround for this problem would be to set the ndots option
   in resolv.conf to 0 per the documentation at
   http://man7.org/linux/man-pages/man5/resolv.conf.5.html

   However that can't be done for two reasons:

   - docker-compose schema doesn't actually support setting DNS options
     docker/cli#1557

   - k8s sets ndots to 5 by default, so we don't want to be at odds

 - A further, but implausible workaround would be to modify the host DNS
   settings to remove any search suffixes.

 - The original FQDN change being reverted in this commit was introduced
   in 2549f19

   "
   Lastly, the Windows specific docker-compose.windows.yml sets up a
   custom alias in the "default" network so that an extra DNS name for
   puppetserver can be set based on the FQDN that Facter determines.
   Without this additional DNS reservation, the `puppetserver ca`
   command will be unable to connect to the REST endpoint.

   A better long-term solution is making sure puppetserver is setup to
   point to `puppet` as the host instead of an FQDN.
   "

   With the PUPPETSERVER_HOSTNAME value set on the puppetserver
   container, both certname and server are set to puppet.internal,
   preventing a need to synchronize a domain name.

 - Note that at this time there is also a discrepancy in how Facter 3
   behaves vs Facter 2.

   The Facter 2 gem is being used by the `puppetserver ca` gem based
   application, and may return a different value for
   Facter.value('domain') than calling `facter domain` at the command
   line.  Such is the case inside the puppet network, where Facter 2
   returns `ops.puppetlabs.net` while Facter 3 returns the value
   `delivery.puppetlabs.net`

   This discrepancy makes it so that the `puppetserver ca` application
   cannot find the client side cert on disk and fails outright.

   Facter 2 should not be included in the puppetserver packages, and
   changes have been made to packaging for future releases.

   For now, setting PUPPETSERVER_HOSTNAME configuration value in the
   puppetserver container will set the `puppet.conf` values explicitly
   to the desired DNS name to work around this problem.

 - Resolution of `postgres.internal` seems to rely on having the
   `hostname` value explicitly defined in the docker-compose file, even
   though hostname values supposedly don't interact with DNS in docker

 - This PR is also made possible by switching over to using the Ubuntu
   based container from the Alpine container (performed in a prior
   commit), due to DNS resolution problems with Alpine inside LCOW:

   moby/libnetwork#2371
   microsoft/opengcs#303

 - Another avenue that was investigated to resolve the DNS problem in
   Alpine was to feed host:ip mappings in through --add-host, but it
   turns out that Windows doesn't yet support that feature per

   docker/for-win#1455

 - Finally, these changes are also made in preparation of switching the
   pupperware-commercial repo over to a private builder
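
The search-suffix behaviour described in the commit message above can be modelled roughly as follows. This is a simplified sketch in Python of the candidate ordering documented in resolv.conf(5) for the glibc resolver; `candidate_names` is a hypothetical helper written for illustration, not a real resolver API, and other resolver implementations (musl on Alpine, for example) may order or skip candidates differently.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a resolver would try, in order.

    Simplified model of resolv.conf(5): a name with fewer than `ndots`
    dots gets the search suffixes applied first; otherwise the name is
    tried as-is first. A trailing dot marks the name as absolute, so no
    search suffix is ever applied to it.
    """
    if name.endswith("."):
        return [name]

    absolute = name + "."
    suffixed = [f"{name}.{domain}." for domain in search]

    if name.count(".") >= ndots:
        return [absolute] + suffixed
    return suffixed + [absolute]


if __name__ == "__main__":
    # With the host's search domain copied into the container, the short
    # name is first expanded to a name on the external network.
    print(candidate_names("puppet", ["delivery.puppetlabs.net"]))
    # A name under the cluster's own suffix is tried as-is first.
    print(candidate_names("puppet.internal", ["delivery.puppetlabs.net"]))
```

Under this model, the short name `puppet` is queried as `puppet.delivery.puppetlabs.net.` before `puppet.`, which is the conflict the commit avoids by giving every service a `.internal` name.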
Iristyle added a commit to Iristyle/pupperware that referenced this issue May 4, 2019
 - Remove the domain introspection / setting of AZURE_DOMAIN env var
   as this does not work as originally thought.

   Instead, hardcode the DNS suffix `.internal` to each service in the
   compose stack, and make sure that `dns_search` for `internal` will
   use the Docker DNS resolver when dealing with these hosts. Note that
   these compose file settings only affect the configuration of the
   DNS resolver, *not* resolv.conf. This is different from the
   docker run behavior, which *does* modify resolv.conf. Also note,
   config file locations vary depending on whether or not systemd is
   running in the container.

   It's not "safe" to refer to services in the cluster by only their
   short service names like `puppet`, `puppetdb` or `postgres` as they
   can conflict with hosts on the external network with these names
   when `resolv.conf` appends DNS search suffixes.

   When docker compose creates the user defined network, it copies the
   DNS settings from the host to the `resolv.conf` in each of the
   containers. This often takes search domains from the outside network
   and applies them to containers.

   When network resolutions happen, any default search suffix will be
   applied to short names when the dns option for ndots is not set to 0.
   So for instance, given a `resolv.conf` that contains:

   search delivery.puppetlabs.net

   A DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`
   which will fail to resolve in the Docker DNS resolver, then be sent
   to the next DNS server in the `nameserver` list, which may resolve it
   to a different host in the external network. This happens because
   `resolv.conf` also lists secondary DNS servers from the host.

   While it is possible to try and service requests for an external
   domain like `delivery.puppetlabs.net` with the embedded Docker DNS
   resolver, it's better to instead choose a domain suffix to use inside
   the cluster.

   There are some good details on how various network types configure:
   docker/for-linux#488 (comment)

 - Note that the .internal domain is typically not recommended for
   production given the only IANA reserved domains are .example, .test,
   .invalid or .localhost. However, given the DNS resolver is set to
   own the resolution of .internal, this is a compromise.

   In production it's recommended to use a subdomain of a domain that
   you own, but that's not yet configurable in this compose file. A
   future commit will make this configurable.

 - Another workaround for this problem would be to set the ndots option
   in resolv.conf to 0 per the documentation at
   http://man7.org/linux/man-pages/man5/resolv.conf.5.html

   However that can't be done for two reasons:

   - docker-compose schema doesn't actually support setting DNS options
     docker/cli#1557

   - k8s sets ndots to 5 by default, so we don't want to be at odds

 - A further, but implausible workaround would be to modify the host DNS
   settings to remove any search suffixes.

 - The original FQDN change being reverted in this commit was introduced
   in 2549f19

   "
   Lastly, the Windows specific docker-compose.windows.yml sets up a
   custom alias in the "default" network so that an extra DNS name for
   puppetserver can be set based on the FQDN that Facter determines.
   Without this additional DNS reservation, the `puppetserver ca`
   command will be unable to connect to the REST endpoint.

   A better long-term solution is making sure puppetserver is setup to
   point to `puppet` as the host instead of an FQDN.
   "

   With the PUPPETSERVER_HOSTNAME value set on the puppetserver
   container, both certname and server are set to puppet.internal,
   preventing a need to synchronize a domain name.

 - Note that at this time there is also a discrepancy in how Facter 3
   behaves vs Facter 2.

   The Facter 2 gem is being used by the `puppetserver ca` gem based
   application, and may return a different value for
   Facter.value('domain') than calling `facter domain` at the command
   line.  Such is the case inside the puppet network, where Facter 2
   returns `ops.puppetlabs.net` while Facter 3 returns the value
   `delivery.puppetlabs.net`

   This discrepancy makes it so that the `puppetserver ca` application
   cannot find the client side cert on disk and fails outright.

   Facter 2 should not be included in the puppetserver packages, and
   changes have been made to packaging for future releases.

   For now, setting the PUPPETSERVER_HOSTNAME configuration value in the
   puppetserver container will set the `puppet.conf` values explicitly
   to the desired DNS name to work around this problem.

 - Resolution of `postgres.internal` seems to rely on having the
   `hostname` value explicitly defined in the docker-compose file, even
   though hostname values supposedly don't interact with DNS in docker

 - This PR is also made possible by switching over to using the Ubuntu
   based container from the Alpine container (performed in a prior
   commit), due to DNS resolution problems with Alpine inside LCOW:

   moby/libnetwork#2371
   microsoft/opengcs#303

 - Another avenue that was investigated to resolve the DNS problem in
   Alpine was to feed host:ip mappings in through --add-host, but it
   turns out that Windows doesn't yet support that feature per

   docker/for-win#1455

 - Finally, these changes are also made in preparation of switching the
   pupperware-commercial repo over to a private builder
Iristyle added a commit to Iristyle/pupperware that referenced this issue May 6, 2019
 - Remove the domain introspection / setting of AZURE_DOMAIN env var
   as this does not work as originally thought.

   Instead, hardcode the DNS suffix `.internal` to each service in the
   compose stack, and make sure that `dns_search` for `internal` will
   use the Docker DNS resolver when dealing with these hosts. Note that
   these compose file settings only affect the configuration of the
   DNS resolver, *not* resolv.conf. This is different from the
   docker run behavior, which *does* modify resolv.conf. Also note,
   config file locations vary depending on whether or not systemd is
   running in the container.

   It's not "safe" to refer to services in the cluster by only their
   short service names like `puppet`, `puppetdb` or `postgres` as they
   can conflict with hosts on the external network with these names
   when `resolv.conf` appends DNS search suffixes.

   When docker compose creates the user defined network, it copies the
   DNS settings from the host to the `resolv.conf` in each of the
   containers. This often takes search domains from the outside network
   and applies them to containers.

   When network resolutions happen, any default search suffix will be
   applied to short names when the dns option for ndots is not set to 0.
   So for instance, given a `resolv.conf` that contains:

   search delivery.puppetlabs.net

   A DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`
   which will fail to resolve in the Docker DNS resolver, then be sent
   to the next DNS server in the `nameserver` list, which may resolve it
   to a different host in the external network. This happens because
   `resolv.conf` also lists secondary DNS servers from the host.

   While it is possible to try and service requests for an external
   domain like `delivery.puppetlabs.net` with the embedded Docker DNS
   resolver, it's better to instead choose a domain suffix to use inside
   the cluster.

   There are some good details on how various network types configure:
   docker/for-linux#488 (comment)

 - Note that the .internal domain is typically not recommended for
   production given the only IANA reserved domains are .example, .test,
   .invalid or .localhost. However, given the DNS resolver is set to
   own the resolution of .internal, this is a compromise.

   In production it's recommended to use a subdomain of a domain that
   you own, but that's not yet configurable in this compose file. A
   future commit will make this configurable.

 - Another workaround for this problem would be to set the ndots option
   in resolv.conf to 0 per the documentation at
   http://man7.org/linux/man-pages/man5/resolv.conf.5.html

   However that can't be done for two reasons:

   - docker-compose schema doesn't actually support setting DNS options
     docker/cli#1557

   - k8s sets ndots to 5 by default, so we don't want to be at odds

 - A further, but implausible workaround would be to modify the host DNS
   settings to remove any search suffixes.

 - The original FQDN change being reverted in this commit was introduced
   in 2549f19

   "
   Lastly, the Windows specific docker-compose.windows.yml sets up a
   custom alias in the "default" network so that an extra DNS name for
   puppetserver can be set based on the FQDN that Facter determines.
   Without this additional DNS reservation, the `puppetserver ca`
   command will be unable to connect to the REST endpoint.

   A better long-term solution is making sure puppetserver is setup to
   point to `puppet` as the host instead of an FQDN.
   "

   With the PUPPETSERVER_HOSTNAME value set on the puppetserver
   container, both certname and server are set to puppet.internal,
   inside of puppet.conf, preventing a need to inject a domain name as
   was done previously.

   This is necessary because of a discrepancy in how Facter 3 behaves vs
   Facter 2, which creates a mismatch between how the host cert is
   initially generated (using Facter 3) and how `puppetserver ca`
   finds the files on disk (using Facter 2), that setting
   PUPPETSERVER_HOSTNAME will explicitly work around.

   Specifically, Facter 2 may return a different Facter.value('domain')
   than calling `facter domain` using Facter 3 at the command line.
   Such is the case inside the puppet network, where Facter 2 returns
   `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`

   Without explicitly setting PUPPETSERVER_HOSTNAME, this makes cert
   files on disk get written as *.delivery.puppetlabs.net, yet the
   `puppetserver ca` application looks for the client certs on disk as
   *.ops.puppetlabs.net, which causes `puppetserver ca` to fail.

 - Facter 2 should not be included in the puppetserver packages, and
   changes have been made to packaging for future releases, which may
   remove the need for the above.

 - This PR is also made possible by switching over to using the Ubuntu
   based container from the Alpine container (performed in a prior
   commit), due to DNS resolution problems with Alpine inside LCOW:

   moby/libnetwork#2371
   microsoft/opengcs#303

 - Another avenue that was investigated to resolve the DNS problem in
   Alpine was to feed host:ip mappings in through --add-host, but it
   turns out that Windows doesn't yet support that feature per

   docker/for-win#1455

 - Finally, these changes are also made in preparation of switching the
   pupperware-commercial repo over to a private builder

 - Additionally update k8s / Bolt specs to be consistent with updated
   naming
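
The search-suffix behaviour described in the commit message above can be sketched in a few lines of Python. This is a simplified model of the ordering documented in resolv.conf(5) for glibc-style resolvers, not code from Docker or pupperware; `candidate_names` is a hypothetical helper, and other resolvers (for example musl in Alpine) may differ in the details.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is applied.
        return [name[:-1]]
    with_suffix = [f"{name}.{suffix}" for suffix in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + with_suffix
    # Too few dots: the search list is tried first, the bare name last.
    return with_suffix + [name]


search = ["delivery.puppetlabs.net"]
print(candidate_names("puppet", search))
print(candidate_names("puppet.internal", search))
print(candidate_names("puppet", search, ndots=0))
print(candidate_names("puppet.internal", search, ndots=5))
```

With the default `ndots` of 1, the short name `puppet` is first tried as `puppet.delivery.puppetlabs.net`, while `puppet.internal` is tried as given before any suffix is appended. With `ndots` set to 5 (the k8s default mentioned above), even `puppet.internal` gets the search suffix first.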
Iristyle added a commit to Iristyle/puppetdb that referenced this issue Aug 13, 2019
 - Alpine seems to still be having issues with DNS resolutions inside
   an LCOW environment. In an effort to reduce these transient
   problems, switch the base container to a non-Alpine platform.

   A ticket has been filed with a repro at:

   moby/libnetwork#2371

 - While this may increase the image size a bit, the goal here is
   reliability and robustness

 - Ubuntu 18.04 shares a lineage with debian buster, which should
   be a well supported platform for PDB
Iristyle added a commit to Iristyle/puppetdb that referenced this issue Aug 13, 2019
 - Alpine seems to still be having issues with DNS resolutions inside
   an LCOW environment. In an effort to reduce these transient
   problems, switch the base container to a non-Alpine platform.

   A ticket has been filed with a repro for Alpine DNS issues under LCOW
   moby/libnetwork#2371

 - While this may increase the image size by about 100MB, the goal here
   is reliability and robustness

   for the builder container:
   clojure:lein-alpine was about 142MB
   clojure:openjdk-8-lein is about 507MB

   for the target container:
   openjdk:8-jre-alpine was about 85MB
   openjdk:8-buster-slim is about 184MB

 - Ubuntu 18.04 shares a lineage with debian buster, which should
   be a well supported platform for PDB

   All OpenJDK container variants are listed at:
   https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links
Iristyle added a commit to puppetlabs/puppetdb that referenced this issue Aug 14, 2019
Iristyle added a commit to Iristyle/puppetdb that referenced this issue Oct 28, 2019
@3dbrows

3dbrows commented Feb 6, 2020

@Iristyle Thank you very much for this detailed description of the problem, and workaround. In case the following is useful context for anyone looking to fix this issue, I see this issue while running:

  • Windows Server 2019 (1809)
  • Docker EE 17.10.0-ee-preview-3
  • LCOW containers
  • docker-compose using a nat network it spins up automatically

(I am running this antique docker version because to the best of my knowledge it is the only one that supports LCOW on Win Server.)

I have a sneaking suspicion that the culprit is busybox, and that using version 1.28 will work; 1.29 is broken in terms of nslookup.

@arkodg
Contributor

arkodg commented Feb 6, 2020

can you ptal @pradipd

@pradipd
Contributor

pradipd commented Feb 7, 2020

@daschott - Would you mind triaging?

@daschott

@mamezgeb what is the latest supported Docker version with LCOW on Server? @3dbrows is running an old 17.10 preview version. I know we have https://docs.docker.com/docker-for-windows/wsl-tech-preview/ for Desktop and an experimental feature on Docker-CE, but what is the current recommendation for Server?

@3dbrows did trying an older busybox image work? Do other container images also fail?

@3dbrows

3dbrows commented Feb 10, 2020

@daschott Busybox 1.28 works (wider discussion here: docker-library/busybox#48). I've seen this nslookup problem in any image containing the affected version of busybox (1.29). My workaround is (on container startup) to use a command that installs dig and uses it to find the other containers that I'm interested in, and write out a hosts file with their IPs. I am aware that this is brittle in case the target IPs change, but it's all I can think of for now. Example docker-compose script to achieve this:

nsq_create_topic:
    image: nsqio/nsq:v1.2.0
    dns: "8.8.8.8"
    command: >
      sh -c "
        apk add bind-tools; echo \"$$(dig nsqd +short) nsqd\" >> /etc/hosts; cat /etc/hosts;
        wget -qO- --post-data='' 'nsqd:4151/topic/create?topic=worker'"

I specify my DNS resolver (for apk to use) because my target machine is an Azure VM with Private DNS running on 127.7.7.7 (this is the resolver that DHCP in Azure specifies), which the container cannot access, therefore won't resolve anything much if I don't supply this. (I have no control over the Azure environment.)
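
For anyone who would rather generate the hosts entries from a script than from an inline `sh -c`, here is a minimal Python sketch of the same idea. It is not part of the workaround above: `hosts_lines` is a hypothetical helper, and its default resolver is the libc one (`socket.gethostbyname`), which this thread has not established to be reliable under LCOW. The resolver is therefore injectable, so a dig-based lookup could be plugged in instead.

```python
import socket


def hosts_lines(names, resolve=socket.gethostbyname):
    """Build /etc/hosts style lines ("<ip> <name>") for names that resolve."""
    lines = []
    for name in names:
        try:
            ip = resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; skip unresolved names.
            continue
        lines.append(f"{ip} {name}")
    return lines
```

Appending the result of `hosts_lines(["nsqd"])` to `/etc/hosts` at container startup mirrors the `echo ... >> /etc/hosts` step, and is just as brittle if the target IPs change later.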

@daschott

Thanks @3dbrows for confirming. Is it possible at all to rebuild on top of busybox 1.28?

@Iristyle it is interesting that this appears to work reliably on the Ubuntu image, and that on Alpine dig works while only nslookup fails. Can you confirm which busybox version Alpine is using, and whether dig works reliably?

@3dbrows

3dbrows commented Feb 10, 2020

@daschott Could try that, best way might be to obtain latest busybox by upgrading the Alpine base image - the version numbers are as follows:

Alpine 3.10 has busybox 1.30.1-r3: https://pkgs.alpinelinux.org/packages?name=busybox&branch=v3.10
Alpine 3.9 has busybox 1.29.3-r10: https://pkgs.alpinelinux.org/packages?name=busybox&branch=v3.9

I do not right this minute have access to my LCOW/WinServer box, but I imagine a good test would be to take @Iristyle 's script above and modify like this:

Expected to fail:

services:
  foo:
    image: alpine:3.9
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:3.9
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - bar.internal

Expected to work:
Copy-paste the above but change 3.9 to 3.10. Bear in mind that at the time of @Iristyle 's initial post, latest meant 3.9. So, in fact, the repro script he wrote might now work as-is, given that latest now means 3.11.
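
To put a number on "intermittent" when comparing the two Alpine versions, a small probe along these lines could be run inside each container. This is only a sketch: `count_failures` is a hypothetical helper, and it resolves through the libc resolver (`socket.gethostbyname`) rather than busybox `nslookup`, so it may not reproduce the failure if the fault lies in the nslookup applet itself.

```python
import socket
import time


def count_failures(name, attempts, resolve=socket.gethostbyname, delay=0.0):
    """Resolve `name` repeatedly and return how many attempts failed."""
    failures = 0
    for _ in range(attempts):
        try:
            resolve(name)
        except OSError:
            # Resolution errors (socket.gaierror) are subclasses of OSError.
            failures += 1
        time.sleep(delay)
    return failures
```

For example, `count_failures("bar.internal", 100, delay=1.0)` run in the `foo` container would roughly match the one-second loop in the compose file above.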

Iristyle pushed a commit to puppetlabs/pupperware that referenced this issue Apr 9, 2021
When testing with the `puppet/puppet-agent-alpine` image on windows
systems with LCOW we had intermittent failures in DNS resolution that
occurred fairly regularly. It seems to be specifically interaction
between the base alpine (3.8 and 3.9) images with windows/LCOW.

Two issues related to this issue are
moby/libnetwork#2371 and
microsoft/opengcs#303
Iristyle added a commit to puppetlabs/pupperware that referenced this issue Apr 9, 2021
 - Remove the domain introspection / setting of AZURE_DOMAIN env var
   as this does not work as originally thought.

   Instead, hardcode the DNS suffix `.internal` to each service in the
   compose stack, and make sure that `dns_search` for `internal` will
   use the Docker DNS resolver when dealing with these hosts. Note that
   these compose file settings only affect the configuration of the
   DNS resolver, *not* resolv.conf. This is different from the
   docker run behavior, which *does* modify resolv.conf. Also note,
   config file locations vary depending on whether or not systemd is
   running in the container.

   It's not "safe" to refer to services in the cluster by only their
   short service names like `puppet`, `puppetdb` or `postgres` as they
   can conflict with hosts on the external network with these names
   when `resolv.conf` appends DNS search suffixes.

   When docker compose creates the user defined network, it copies the
   DNS settings from the host to the `resolv.conf` in each of the
   containers. This often takes search domains from the outside network
   and applies them to containers.

   When network resolutions happen, any default search suffix will be
   applied to short names when the dns option for ndots is not set to 0.
   So for instance, given a `resolv.conf` that contains:

   search delivery.puppetlabs.net

   A DNS request for `puppet` becomes `puppet.delivery.puppetlabs.net`
   which will fail to resolve in the Docker DNS resolver, then be sent
   to the next DNS server in the `nameserver` list, which may resolve it
   to a different host in the external network. This happens because
   `resolv.conf` also lists secondary DNS servers from the host.

   While it is possible to try and service requests for an external
   domain like `delivery.puppetlabs.net` with the embedded Docker DNS
   resolver, it's better to instead choose a domain suffix to use inside
   the cluster.

   There are some good details on how various network types configure:
   docker/for-linux#488 (comment)

 - Note that the .internal domain is typically not recommended for
   production given the only IANA reserved domains are .example, .test,
   .invalid or .localhost. However, given the DNS resolver is set to
   own the resolution of .internal, this is a compromise.

   In production it's recommended to use a subdomain of a domain that
   you own, but that's not yet configurable in this compose file. A
   future commit will make this configurable.

 - Another workaround for this problem would be to set the ndots option
   in resolv.conf to 0 per the documentation at
   http://man7.org/linux/man-pages/man5/resolv.conf.5.html

   However that can't be done for two reasons:

   - docker-compose schema doesn't actually support setting DNS options
     docker/cli#1557

   - k8s sets ndots to 5 by default, so we don't want to be at odds

 - A further, but implausible workaround would be to modify the host DNS
   settings to remove any search suffixes.

 - The original FQDN change being reverted in this commit was introduced
   in 8b38620

   "
   Lastly, the Windows specific docker-compose.windows.yml sets up a
   custom alias in the "default" network so that an extra DNS name for
   puppetserver can be set based on the FQDN that Facter determines.
   Without this additional DNS reservation, the `puppetserver ca`
   command will be unable to connect to the REST endpoint.

   A better long-term solution is making sure puppetserver is setup to
   point to `puppet` as the host instead of an FQDN.
   "

   With the PUPPETSERVER_HOSTNAME value set on the puppetserver
   container, both certname and server are set to puppet.internal,
   inside of puppet.conf, preventing a need to inject a domain name as
   was done previously.

   This is necessary because of a discrepancy in how Facter 3 behaves vs
   Facter 2, which creates a mismatch between how the host cert is
   initially generated (using Facter 3) and how `puppetserver ca`
   finds the files on disk (using Facter 2), that setting
   PUPPETSERVER_HOSTNAME will explicitly work around.

   Specifically, Facter 2 may return a different Facter.value('domain')
   than calling `facter domain` using Facter 3 at the command line.
   Such is the case inside the puppet network, where Facter 2 returns
   `ops.puppetlabs.net` while Facter 3 returns `delivery.puppetlabs.net`

   Without explicitly setting PUPPETSERVER_HOSTNAME, this makes cert
   files on disk get written as *.delivery.puppetlabs.net, yet the
   `puppetserver ca` application looks for the client certs on disk as
   *.ops.puppetlabs.net, which causes `puppetserver ca` to fail.

 - Facter 2 should not be included in the puppetserver packages, and
   changes have been made to packaging for future releases, which may
   remove the need for the above.

 - This PR is also made possible by switching over to using the Ubuntu
   based container from the Alpine container (performed in a prior
   commit), due to DNS resolution problems with Alpine inside LCOW:

   moby/libnetwork#2371
   microsoft/opengcs#303

 - Another avenue that was investigated to resolve the DNS problem in
   Alpine was to feed host:ip mappings in through --add-host, but it
   turns out that Windows doesn't yet support that feature per

   docker/for-win#1455

 - Finally, these changes are also made in preparation of switching the
   pupperware-commercial repo over to a private builder

 - Additionally update k8s / Bolt specs to be consistent with updated
   naming
binford2k pushed a commit to voxpupuli/container-puppetdb that referenced this issue Nov 1, 2022