Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Desktop on Windows: 'docker context inspect' is very slow #2420

Closed
karolz-ms opened this issue Apr 6, 2020 · 30 comments · Fixed by #2424
Closed

Docker Desktop on Windows: 'docker context inspect' is very slow #2420

karolz-ms opened this issue Apr 6, 2020 · 30 comments · Fixed by #2424

Comments

@karolz-ms
Copy link

karolz-ms commented Apr 6, 2020

Steps to reproduce the issue:

  1. Install Docker Desktop on Windows (tested on Windows 10 Enterpirse version 1909 build 18363.720, Docker Desktop 2.2.0.4)
  2. Start Docker Desktop.
  3. Open command prompt and do docker context inspect

Describe the results you received:
It consistently takes more than 1 second to run this command, often more than 2 seconds.

Describe the results you expected:
Should take 100 ms or less. The default context, pointing to Docker Desktop engine, is the only context on the machine and it should be really straightforward to "inspect" it.

Additional information you deem important (e.g. issue happens only occasionally):

I took a trace (available upon request) and noticed that the majority of time is spent waiting on results from NetUserGetInfo Windows API. On my machine this API is called for a default Docker Desktop context inspection, which has no user information. Ultimately this accounts for more than 2 seconds of latency (for that run of the command)

May be related to #1739

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b
 Built:             Wed Mar 11 01:23:10 2020
 OS/Arch:           windows/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b
  Built:            Wed Mar 11 01:29:16 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0)
  buildx: Build with BuildKit (Docker Inc., v0.3.1-tp-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 25
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.76-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.846GiB
 Name: docker-desktop
 ID: KLND:UUTX:OIZ2:IUWS:KHMX:VVTK:SBMI:GHS3:VCM4:WPUY:4TJM:SXMY
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 33
  Goroutines: 49
  System Time: 2020-04-06T15:18:37.131345023Z
  EventsListeners: 3
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.):

@bwateratmsft
Copy link

It may not be limited to just docker context inspect calls. On my system, even doing something as simple as docker help takes 7-8 seconds.

@karolz-ms
Copy link
Author

Yep, I have confirmed that docker help also somehow ends up calling NetUserGetInfo... 😕

@bwateratmsft
Copy link

For some context, my machine is on VPN, a good 3000 miles away from the corporate network. So, any network calls that cross that gap are gonna end up being pretty slow--I'm sure that my observed 7-8 seconds is well above average.

I looked up the global averages in our telemetry, docker context inspect takes roughly 600 ms on Mac/Linux and roughly 1300ms on Windows. More here: microsoft/vscode-docker#1804 (comment)

@karolz-ms
Copy link
Author

karolz-ms commented Apr 15, 2020

@silvin-lubecki @gtardif looks like the issue is still not completely fixed. I have built the binary from the master branch and am observing a lot of variance between the runs. The first run of the context inspect command is almost always is very slow. The following commands were executed from a freshly open Windows command window:

C:\code>\tools\timecmd.bat \scratch\docker-windows-amd64.exe context inspect
[
    {
        "Name": "default",
        "Metadata": {
            "StackOrchestrator": "swarm"
        },
        "Endpoints": {
            "docker": {
                "Host": "npipe:////./pipe/docker_engine",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "\u003cIN MEMORY\u003e",
            "TLSPath": "\u003cIN MEMORY\u003e"
        }
    }
]
command took 0:0:2.41 (2.41s total)
C:\code>\tools\timecmd.bat \scratch\docker-windows-amd64.exe context inspect
[
    {
        "Name": "default",
        "Metadata": {
            "StackOrchestrator": "swarm"
        },
        "Endpoints": {
            "docker": {
                "Host": "npipe:////./pipe/docker_engine",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "\u003cIN MEMORY\u003e",
            "TLSPath": "\u003cIN MEMORY\u003e"
        }
    }
]
command took 0:0:0.10 (0.10s total)

It is enough to wait about 30 seconds between docker context inspect invocations to observe the slow execution again. I bet the Windows user info is still being retrieved over the network, but it is cached for a short while by Windows, that is why quick successful invocations are fast.

The docker.exe binary that comes with Docker Desktop for Windows 2.2.0.5 behaves the same way BTW.

@thaJeztah
Copy link
Member

@karolz-ms curious; where do those MetadataPaths (<IN MEMORY>) come from?

FWIW, with #2424 applied, I tried reproducing in a Linux container (I was curious if the difference was between docker context ls, which I tried before, and docker context inspect), on Linux I didn't reproduce the problem though;

docker context create faulty --docker host=tcp://42.42.42.41:4242
docker context create local --docker host=unix:///var/run/docker.sock
docker context use faulty

time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT   ORCHESTRATOR
default             Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                         swarm
faulty *                                                      tcp://42.42.42.41:4242
local                                                         unix:///var/run/docker.sock
real	0m 0.33s
user	0m 0.04s
sys	0m 0.04s


time docker context inspect local
[
    {
        "Name": "local",
        "Metadata": {},
        "Endpoints": {
            "docker": {
                "Host": "unix:///var/run/docker.sock",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "/root/.docker/contexts/meta/25bf8e1a2393f1108d37029b3df5593236c755742ec93465bbafa9b290bddcf6",
            "TLSPath": "/root/.docker/contexts/tls/25bf8e1a2393f1108d37029b3df5593236c755742ec93465bbafa9b290bddcf6"
        }
    }
]
real	0m 0.36s
user	0m 0.05s
sys	0m 0.03s

time docker context inspect faulty
[
    {
        "Name": "faulty",
        "Metadata": {},
        "Endpoints": {
            "docker": {
                "Host": "tcp://42.42.42.41:4242",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "/root/.docker/contexts/meta/9ec1600e60d2aba399d84db8163ec37fc67cd60096cb32a66f0a52280797ee95",
            "TLSPath": "/root/.docker/contexts/tls/9ec1600e60d2aba399d84db8163ec37fc67cd60096cb32a66f0a52280797ee95"
        }
    }
]
real	0m 0.34s
user	0m 0.03s
sys	0m 0.05s

@thaJeztah
Copy link
Member

thaJeztah commented Apr 20, 2020

Oh, never mind; looks like the <IN MEMORY> is a default that's set here;

return store.StorageInfo{MetadataPath: "<IN MEMORY>", TLSPath: "<IN MEMORY>"}

Same result when using / inspecting the default

docker context use default
time docker context inspect
[
    {
        "Name": "default",
        "Metadata": {
            "StackOrchestrator": "swarm"
        },
        "Endpoints": {
            "docker": {
                "Host": "unix:///var/run/docker.sock",
                "SkipTLSVerify": false
            }
        },
        "TLSMaterial": {},
        "Storage": {
            "MetadataPath": "\u003cIN MEMORY\u003e",
            "TLSPath": "\u003cIN MEMORY\u003e"
        }
    }
]
real	0m 0.29s
user	0m 0.04s
sys	0m 0.05s

@bwateratmsft
Copy link

I can still consistently get long times on Windows with the current build. docker context inspect took an average of 3028 ms over 100 runs; docker info took an average of 7297 ms over 50 runs.

@karolz-ms
Copy link
Author

@thaJeztah @gtardif any news on this issue?

You know, we are a Windows shop here ☺️ so let us know if we can help in any way; run some experiments or whatever.

@chris-crone chris-crone added this to the 20.03.0 milestone May 4, 2020
@simonferquel
Copy link
Contributor

The NetGetUserInfo call probably comes for getting the HomeDir of the user... we recently changed the way we get the homedir to the golang base library implementation.
@thaJeztah can we try to revert this PR in some branch to see if it fixes the issue ?

@thaJeztah
Copy link
Member

@simonferquel we went through a couple of changes there; #2101, which was later (partially?) reverted in (#2111); could you check which is the problematic one?

I also have #2161 pending review, which should at least not call it by default

@simonferquel
Copy link
Contributor

Hmm, maybe we should ask for Microsoft guys here what is the correct (and most efficient) way to get the home dir on Windows using win32 APIs and plug homedir.Get onto that for Windows ?

@karolz-ms
Copy link
Author

The most reliable way is to call SHGetKnownFolderPath with FOLDERID_Profile. USERPROFILE environment variable should point to the same thing, but might be set to something else by a user looking for trouble.

@bwateratmsft
Copy link

Somebody overwriting that environment variable is asking for so much trouble that I think it's an acceptable risk to use it. 😄

@thaJeztah
Copy link
Member

thaJeztah commented May 7, 2020

So config.go seems to be using both "our" (github.com/docker/docker/pkg/homedir) and Golang's os.UserHomeDir();

cli/cli/config/config.go

Lines 30 to 32 in d05846d

if configDir == "" {
configDir = filepath.Join(homedir.Get(), configFileDir)
}

cli/cli/config/config.go

Lines 108 to 112 in d05846d

// Can't find latest config file so check for the old one
homedir, err := os.UserHomeDir()
if err != nil {
return configFile, errors.Wrap(err, oldConfigfile)
}

The github.com/docker/docker/pkg/homedir package for windows looks to do just that (get USERPROFILE env var, and nothing else);

// Key returns the env var name for the user's home dir based on
// the platform being run on
func Key() string {
return "USERPROFILE"
}
// Get returns the home directory of the current user with the help of
// environment variables depending on the target operating system.
// Returned path should be used with "path/filepath" to form new paths.
func Get() string {
return os.Getenv(Key())
}

os.UserHomeDir() also seems to do the same 🤔 https://github.com/golang/go/blob/go1.13.10/src/os/file.go#L462-L472

func UserHomeDir() (string, error) {
	env, enverr := "HOME", "$HOME"
	switch runtime.GOOS {
	case "windows":
		env, enverr = "USERPROFILE", "%userprofile%"
	case "plan9":
		env, enverr = "home", "$home"
	}
	if v := Getenv(env); v != "" {
		return v, nil
	}

@simonferquel
Copy link
Contributor

Hmm so do we have tooling to see the go stacktrace calling the problematic win32 API?

@karolz-ms
Copy link
Author

I will try to find out and report back here

@karolz-ms
Copy link
Author

@thaJeztah @simonferquel I was able to find the issue: it is in the klog logging package that you guys are vendoring. This package calls os/user.Current() upon initialization, which on Windows does all sorts of domain lookups.

These lookups causes problems on some versions of Windows OS, but much more importantly (and this is what this bug is about), this way of obtaining current user information for logging purposes is just too damn slow for a CLI utility, and probably not really necessary either.

Hope this helps!
@gtardif FYI

@karolz-ms
Copy link
Author

How about pulling this fix?

@thaJeztah
Copy link
Member

Great find; It's vendored as a dependency of the docker stack deploy to kubernetes, so we matched the version that's used with the version of kubernetes we vendor; let me check if vendoring a more current version would be problematic

@thaJeztah
Copy link
Member

I see the fix was back ported to v1.x; kubernetes/klog#124, but is not yet in a release; kubernetes/klog@v1.0.0...release-1.x

I may have one more suggestion for that fix; think it would make sense to lazily look up the username, given that on Linux looking up the user may also make network connections for external users (if cgo is enabled and the osusergo build-tag is not set); see golang/go#23265 and https://github.com/golang/go/blob/go1.13.10/src/os/user/user.go#L8-L15

@dims any chance on tagging a v1.0.1 release?

@thaJeztah
Copy link
Member

Actually perhaps it would be good to have a change in Go itself, and add support for the osusergo build-tag for Windows, similar to golang/go@62f0127 implemented for Unix/Linux

@karolz-ms perhaps there's interest from Microsoft's side to contribute such a change to Golang? (looking at the PR you linked, it looks like user.Current() is not supported on Nano Server, so possibly other Golang projects on Windows hit the same issue

@thaJeztah
Copy link
Member

opened kubernetes/klog#143 to perform userName lookup lazily instead of on init()

@thaJeztah
Copy link
Member

opened #2515 and #2516 to (temporarily) vendor tip of release-1.x from k8s.io/klog

@dims
Copy link

dims commented May 10, 2020

@thaJeztah Ack on 143, let me try to get some reviews on that first.

@karolz-ms
Copy link
Author

@thaJeztah thanks for quick reply, much appreciated

Actually perhaps it would be good to have a change in Go itself, and add support for the osusergo build-tag for Windows, similar to golang/go@62f0127 implemented for Unix/Linux

@karolz-ms perhaps there's interest from Microsoft's side to contribute such a change to Golang? (looking at the PR you linked, it looks like user.Current() is not supported on Nano Server, so possibly other Golang projects on Windows hit the same issue

This sounds like a good idea, although such work would be outside of my area of expertise. I will ask around, but do you happen to know about any Microsoft people who did contribute to Golang runtime in the past?

@thaJeztah
Copy link
Member

do you happen to know about any Microsoft people who did contribute to Golang runtime in the past?

Hmmm, no, not from the top of my head. I made a contribution myself once, but their flow is not a regular "github" flow, so I need to re-read the instructions again 😅. In this case, it'd probably be good to look if a reasonable "fallback" exists (that's generic enough), which probably should be done by someone with a better understanding of the Windows mechanisms than I have

@thaJeztah
Copy link
Member

This should be fixed through #2516

@bwateratmsft
Copy link

There are definitely substantial improvements in the latest build from master in my testing. I know that my results before were atypically slow so take the % improvements with a grain of salt.

  • docker context inspect averaged 111 ms over 50 calls (down 96% from 3028 ms before).
  • docker context ls averaged 102 ms over 50 calls (down 95% from 2210 ms before).
  • docker -v averaged 124 ms over 50 calls (down 95% from 2255 ms before).
  • docker info is still a little sluggish at 6525 ms over 50 calls (down 11% from 7297 ms before). However, the VSCode Docker extension only calls this in very rare circumstances, so this isn't much of a concern for us.

Globally, the average for docker context inspect on Windows looked to be around 1300 ms, so this marks a 91% drop from that. That's amazing!

Thanks everyone! This is a huge help for us!

@thaJeztah
Copy link
Member

Thanks @bwateratmsft!

docker info is still a little sluggish at 6525 ms over 50 calls (down 11% from 7297 ms before)

The /info endpoint has become a bit of a "kitchen-sink" of any information related to the daemon. While it contains a lot of useful information, unfortunately it also means the endpoint itself has become way more heavy than it should be (especially if you're only looking for a subset of the info)

We're aware of this, but (admittedly) haven't come round to design a better approach (I was thinking of having an optional query parameter to limit the returned information to specific parts, but ideally the struct that's returned would also be more structured).

Is there specific information that VSCode uses from the info output? (Perhaps there's alternatives).

Globally, the average for docker context inspect on Windows looked to be around 1300 ms, so this marks a 91% drop from that. That's amazing!

Thanks! Erm, so it turns out there's a regression (#2533) related to the first change (#2424), I think that fix was not the "main contributor" to the docker context inspect issue on Windows, but it may mean that for 19.03 we may have to do a partial revert (but continue looking at re-adding the initial version negotiation for the next release)

@bwateratmsft
Copy link

bwateratmsft commented May 19, 2020

That makes sense. I looked at where we call info, turns out we use Dockerode for that, not the CLI. There's three places, all user-initiated:

  1. Attaching the shell to a container
  2. Running the Azure CLI image
  3. Debugging .NET Core via attach configuration. In this case, the info endpoint is definitely not the choke point, since we have to copy vsdbg into the container.

Only the first of those three has any significant usage. We're querying the Docker server OSType, indirectly to find out the container OS, which we use to decide whether we should launch cmd or bash.

Curiously, Dockerode takes only ~10-20 ms to call that endpoint. Any ideas on the discrepancy?

thaJeztah added a commit to thaJeztah/cli that referenced this issue Jan 28, 2022
…ng/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

compose-on-kubernetes has been archived, so (for now) vendoring from my fork, but
we should consider to temporarily un-archive it, and get this fix in upstream.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Jan 28, 2022
…ng/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

compose-on-kubernetes has been archived, so (for now) vendoring from my fork, but
we should consider to temporarily un-archive it, and get this fix in upstream.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Jan 28, 2022
…ng/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

compose-on-kubernetes has been archived, so (for now) vendoring from my fork, but
we should consider to temporarily un-archive it, and get this fix in upstream.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Jan 28, 2022
…ng/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Feb 3, 2022
…ang/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Feb 14, 2022
…ang/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah added a commit to thaJeztah/cli that referenced this issue Feb 14, 2022
…ng/glog

glog has the same issue as k8s.io/klog, and is calling `user.Current()`
inside an `init()`; see kubernetes/klog@466fbb6

Calling `user.Current()` on Windows can result in remove connections being
made to get the user's information, which can be a heavy call. See docker#2420

glog was only used in a single location in compose-on-kubernetes, so we may as
well remove it.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants