[Inputs.vsphere] Error in plugin: ServerFaultCode: XML document element count exceeds configured maximum 500000 #5041

Closed
PLColuccio opened this issue Nov 26, 2018 · 35 comments
Labels
area/vsphere discussion Topics for discussion

@PLColuccio

I am receiving this when the plugin receives metrics from the vCenter servers.

Any idea on what is wrong / how to fix?

2018-11-26T22:24:47Z E! [inputs.vsphere]: Error in plugin: ServerFaultCode: XML document element count exceeds configured maximum 500000

while parsing serialized DataObject of type vim.PerformanceManager.MetricId
at line 2, column 19637665

while parsing property "metricId" of static type ArrayOfPerfMetricId

while parsing serialized DataObject of type vim.PerformanceManager.QuerySpec
at line 2, column 19598059

while parsing call information for method QueryPerf
at line 2, column 66

while parsing SOAP body
at line 2, column 60

while parsing SOAP envelope
at line 2, column 0

while parsing HTTP request for method queryStats
on object of type vim.PerformanceManager
at line 1, column 0

@prydin
Contributor

prydin commented Nov 27, 2018

It looks like the plugin is trying to send a huge query to the server. Try lowering max_query_objects and/or max_query_metrics. For example:

max_query_objects = 100
max_query_metrics = 100

@glinton added the area/vsphere and discussion (Topics for discussion) labels Nov 28, 2018
@prydin
Contributor

prydin commented Nov 28, 2018

The next release will also limit queries to 100,000 metrics at a time, regardless of settings. This should prevent this from happening again.

As a side note, @phreak2599, I'd be interested in knowing a bit more about your configuration. Assuming you have the default 256 objects per query, 500,000 metrics sounds incredibly high. How many VMs/hosts are in that vCenter, if you don't mind sharing?

@PLColuccio
Author

PLColuccio commented Nov 28, 2018

It was actually set to 64, since we are currently running 5.5, although we are in the process of upgrading to 6.5. I have since set it to 32 to see if that helps.

This plugin is currently running in one of our data centers against 4 vCenter hosts.
Per vCenter:
host count 29 vm count 548
host count 131 vm count 2343
host count 60 vm count 1733
host count 87 vm count 2168

Not sure which vCenter was causing the issue though.

Update: Seems to be working well with both the query settings set to 32.

Thanks for the help @prydin !

@PLColuccio
Author

Spoke too soon. Looks like now I am getting:

2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:59Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:04Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:09Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:14Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:19Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:24Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:29Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:34Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:39Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:44Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:49Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval

@PLColuccio reopened this Nov 28, 2018
@prydin
Contributor

prydin commented Nov 28, 2018

What's your collect_concurrency setting? Try to increase it to, say, 5.

Also, if you don't need instance-level (per CPU etc) metrics, you can turn that off per resource type, which should save you a lot of collection time.

Another thing you can try is to reduce the number of metrics collected to only those you need.

We're just at the tail end of a large scale-testing and performance-tuning effort and should be providing an update soon with some performance tweaks. In our lab, we're collecting metrics for 7000 VMs, including instance data, in about 6 seconds.
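
For readers following along, the suggestions in this comment could be combined roughly as in the fragment below. This is only a sketch under assumptions: collect_concurrency and its value come from the comment itself, while vm_instances and host_instances are, as far as I know, the plugin's per-resource-type switches for instance-level metrics. The vCenter URL, the credentials, and the short metric list are placeholders that do not come from this thread. Check the plugin README for your Telegraf version before relying on any of it.

```toml
[[inputs.vsphere]]
  ## Placeholder connection details (hypothetical, not from this thread)
  vcenters = [ "https://vcenter.example.com/sdk" ]
  username = "user@vsphere.local"
  password = "secret"

  ## More parallel collection workers, as suggested above
  collect_concurrency = 5

  ## Assumed option names: skip instance-level (per CPU, per NIC, ...)
  ## metrics for VMs and hosts
  vm_instances = false
  host_instances = false

  ## Collect only the metrics you need (example selection)
  vm_metric_include = [
    "cpu.usage.average",
    "mem.usage.average",
  ]
```

Each of the three changes reduces the amount of data requested per collection cycle or spreads it over more workers, so they can be tried one at a time to see which one matters for a given vCenter.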

@PLColuccio
Author

PLColuccio commented Nov 28, 2018

I'm not sure I understand exactly what is happening, but it seems the initial discovery runs and the plugin then works fine until the next discovery. That second discovery never completes, and afterwards the plugin stops sending metrics, most likely because the discovery failed.

Does that sound plausible?

If I raise the concurrency settings, I think I will have to give more CPU to my vCenter DB servers. They pegged out when I experimented with those settings in the past.

@prydin
Contributor

prydin commented Nov 28, 2018

Try increasing the discovery interval to 30 minutes. The discovery logic is greatly improved in the version we're about to release. Should run 50-100 times faster!

I can post a binary if you feel like testing it out.
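
As a minimal sketch of the first suggestion, a 30-minute discovery interval could be written as below. The option name object_discovery_interval appears in a configuration posted later in this thread; the value format simply follows that example, and the rest of the plugin block is omitted.

```toml
[[inputs.vsphere]]
  ## Re-discover the vCenter inventory every 30 minutes
  object_discovery_interval = "1800s"
```

A longer interval means newly created VMs and hosts take longer to show up in the collected metrics, which is the trade-off for running the expensive discovery less often.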

@prydin
Contributor

prydin commented Nov 28, 2018

BTW, the concurrency settings for metric collection shouldn't have a huge impact on database servers, at least not for VM and host metrics, since they are scraped from ESXi memory.

@PLColuccio
Author

I can try the latest repo. Let me see how difficult it is to compile.

@prydin
Contributor

prydin commented Nov 28, 2018 via email

@PLColuccio
Author

I had the same issue with the new version. The discovery finishes and the initial metric collection appears to finish, but I don't think it actually does. I think the plugin hangs and never completes the initial collection.

I am just getting:

2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 2 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 8 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 8 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 0 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 4 metrics
2018-11-29T15:46:40Z D! [outputs.influxdb] wrote batch of 12 metrics in 3.165227ms
2018-11-29T15:46:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:45Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:55Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:05Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:10Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:15Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:20Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:20Z W! [agent] input "inputs.vsphere" did not complete within its interval

and the last part keeps repeating. It never starts collecting metrics again.

@prydin
Contributor

prydin commented Nov 29, 2018

Are you collecting datastore metrics? Try disabling that.

datastore_metric_exclude = ["*"]

If that solves the problem, move the datastore collection to a separate instance of [inputs.vsphere] with an interval >= 300s.

Collection of datastore metrics can take a VERY long time due to the way vCenter manages that data. If it doesn't complete within the interval, you'll see these kinds of problems.

Also, let me point you to the very latest version that has some pretty radical performance improvements. Stand by!
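
The split described in this comment might look like the following sketch. The exclude options and the 300s interval are taken from this thread; the vCenter URL and credentials are placeholders, and excluding every non-datastore resource type in the second instance is my assumption about how to keep it limited to datastores.

```toml
## Instance 1: VM and host metrics at the agent's normal interval
[[inputs.vsphere]]
  ## Placeholder connection details (hypothetical)
  vcenters = [ "https://vcenter.example.com/sdk" ]
  username = "user@vsphere.local"
  password = "secret"

  ## Leave the slow datastore collection to the second instance
  datastore_metric_exclude = ["*"]

## Instance 2: datastore metrics only, on a slower schedule
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "https://vcenter.example.com/sdk" ]
  username = "user@vsphere.local"
  password = "secret"

  ## Assumption: exclude everything except datastores here
  vm_metric_exclude = ["*"]
  host_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
```

With this layout a slow datastore query can only delay the second instance, so the VM and host collection should no longer miss its interval because of it.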

@PLColuccio
Author

Cool, that seemed to get things going on this current release. Do you know the timeframe for the new release?

@prydin
Contributor

prydin commented Nov 29, 2018

The actual release timing is up to the influx team, but I can get you a snapshot from my branch today. Use at your own risk and all that, of course...

@prydin
Contributor

prydin commented Nov 30, 2018

Here's a snapshot that's been tested in our lab for a few days without any issues. You're welcome to try it (at your own risk). I attached a compiled binary for Linux. Let me know if you need any other flavors.

https://github.com/prydin/telegraf/releases/tag/PR-SCALE-IMPROVEMENT-BETA1

@ghost

ghost commented Dec 3, 2018

I still have the same issue with the binary you provided.
Even with the agent interval set to 300s, I still get no data after an hour of collecting metrics.

2018-12-03T13:50:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-03T13:52:00Z D! [outputs.influxdb] wrote batch of 34 metrics in 6.219342ms
2018-12-03T13:52:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:54:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:55:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-03T13:56:00Z D! [outputs.influxdb] wrote batch of 34 metrics in 5.141758ms
2018-12-03T13:56:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:58:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T14:00:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T14:00:00Z W! [agent] input "inputs.vsphere" did not complete within its interval

On the InfluxDB side, the write API returns HTTP status 204.

Dec 03 14:42:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:42:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 35891433-f701-11e8-846a-005056bc0ddf 5270
Dec 03 14:46:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:46:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" c4963448-f701-11e8-846b-005056bc0ddf 7489
Dec 03 14:52:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:52:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 9b29d8c6-f702-11e8-8488-005056bc0ddf 5070
Dec 03 14:56:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:56:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 2a36e2c8-f703-11e8-849e-005056bc0ddf 4267
Dec 03 15:02:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:15:02:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 00ca9e50-f704-11e8-84a6-005056bc0ddf 4848

I am only trying to collect a few metrics from a vCenter that contains 7669 VMs. Here is my config:

  vm_metric_include = [
    "cpu.usage.average",
    "mem.usage.average",
    "net.received.average",
    "net.transmitted.average",
    "virtualDisk.read.average",
    "virtualDisk.write.average",
    "virtualDisk.writeOIO.latest"
  ]
  host_metric_include = [
    "cpu.usage.average",
    "disk.read.average",
    "disk.write.average",
    "disk.totalReadLatency.average",
    "disk.totalWriteLatency.average",
    "mem.usage.average",
    "net.received.average",
    "net.transmitted.average"
  ]
  cluster_metric_exclude = ["*"]
  datastore_metric_exclude = ["*"]
  datacenter_metric_exclude = [ "*" ]
  max_query_objects = 256
  max_query_metrics = 256
  collect_concurrency = 24
  discover_concurrency = 24
  object_discovery_interval = "600s"
  timeout = "120s"
  insecure_skip_verify = true

@prydin
Contributor

prydin commented Dec 3, 2018

The "exclude" statements should read:

datastore_metric_exclude = [ "*" ]

@prydin
Contributor

prydin commented Dec 3, 2018

Also, do you get any debug statements starting with [input.vsphere]? You should at least see some statements saying that it's attempting to collect.

@ghost

ghost commented Dec 3, 2018

datastore_metric_exclude = [ "*" ]

Sorry, I forgot to format my config as markdown.

And yes, I do get debug entries with [input.vsphere].
Latest log:

2018-12-03T13:31:41Z D! [input.vsphere] Discovering resources for datastore

After that, no further entries of this kind appear in my telegraf.log.

@prydin
Contributor

prydin commented Dec 3, 2018

I'd need to see all the [input.vsphere] log lines to troubleshoot this. It looks like discovering the datastores takes a really long time. How many datastores do you have?

@prydin
Contributor

prydin commented Dec 3, 2018

Also, what is the output of telegraf -version?

@ghost

ghost commented Dec 3, 2018

Telegraf version:
Telegraf unknown (git: prydin-scale-improvement aaa6754)

For security reasons I can't give you the complete log, but the last interval didn't show any errors tagged [input.vsphere]. The output only says:

2018-12-03T13:31:40Z D! [input.vsphere] Skipped powered off VM: xxxxx  <= this shows up more than a thousand times, each with a different hostname
2018-12-03T13:31:41Z D! [input.vsphere] Found 11 metrics for foo.bar.io <= this shows up more than a thousand times, each with a different hostname
2018-12-03T13:31:41Z D! [input.vsphere] Discovering resources for datastore

After this last line the process keeps running and sends requests to InfluxDB, but without data (HTTP 204).

@prydin
Contributor

prydin commented Dec 3, 2018

@bashrc666 If possible, could you run telegraf with -pprof-addr 0.0.0.0:6060 appended to the command line? Then, once the agent becomes unresponsive, you can get a complete goroutine dump using this command (the URL is quoted so the shell does not interpret the ?):

curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'

Copy and paste the output to this thread. The output doesn't contain any application data, so it should be safe to share. This will tell me exactly where the code locks up.

@ghost

ghost commented Dec 4, 2018

Context

I am trying to collect simple VM metrics from a vCenter that manages:

259 hosts
8129 VMs

BUG
After about 25 minutes, Telegraf is still running but no data is being inserted into InfluxDB.

GO DUMP :

8 @ 0x42e14b 0x43e12d 0x8893ce 0x88c330 0x45c551
#       0x8893cd        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
#       0x88c32f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

5 @ 0x42e14b 0x43e12d 0x17ec6dc 0x17ee29d 0x45c551
#       0x17ec6db       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut+0xeb        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56
#       0x17ee29c       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1+0xcc    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80

1 @ 0x40ae87 0x4431dc 0x737382 0x45c551
#       0x4431db        os/signal.signal_recv+0x9b      /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
#       0x737381        os/signal.loop+0x21             /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x69da8a 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178                  /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a143e        net.(*netFD).Read+0x4e                          /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5647        net.(*conn).Read+0x67                           /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x69da89        net/http.(*connReader).backgroundRead+0x59      /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x6ba7e5 0x559cd6 0x559e2f 0x6bb332 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65     /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99     /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178          /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a143e        net.(*netFD).Read+0x4e                  /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5647        net.(*conn).Read+0x67                   /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x6ba7e4        net/http.(*persistConn).Read+0x74       /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
#       0x559cd5        bufio.(*Reader).fill+0x105              /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
#       0x559e2e        bufio.(*Reader).Peek+0x3e               /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
#       0x6bb331        net/http.(*persistConn).readLoop+0x1a1  /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1d82 0x5c026e 0x5be797 0x6a819f 0x6c8c7c 0x6a6fcf 0x6a6c86 0x6a7c74 0x1a5ac1f 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x49a58f        internal/poll.(*FD).Accept+0x19f                /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
#       0x5a1d81        net.(*netFD).accept+0x41                        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
#       0x5c026d        net.(*TCPListener).accept+0x2d                  /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
#       0x5be796        net.(*TCPListener).AcceptTCP+0x46               /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
#       0x6a819e        net/http.tcpKeepAliveListener.Accept+0x2e       /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
#       0x6a6fce        net/http.(*Server).Serve+0x22e                  /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
#       0x6a6c85        net/http.(*Server).ListenAndServe+0xb5          /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
#       0x6a7c73        net/http.ListenAndServe+0x73                    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
#       0x1a5ac1e       main.main.func2+0x17e                           /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274

1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a13c 0x88bde4 0x45c551
#       0x88a13b        github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
#       0x88bde3        github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e6a7c 0x17edd08 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17e6a7b       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab         /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611
#       0x17edd07       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x87      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec378 0x77f66d 0x88c40f 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ec377       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0x167   /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282
#       0x77f66c        github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c      /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
#       0x88c40e        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e             /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec908 0x17e81de 0x17ed4fe 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ec907       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain+0x87          /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117
#       0x17e81dd       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x99d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758
#       0x17ed4fd       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ee4dd 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ee4dc       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1+0xec      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8887c1 0x1a590a0 0x1a587a8 0x1a59f4a 0x42dd57 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x8887c0        github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
#       0x1a5909f       main.runAgent+0x85f                                     /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
#       0x1a587a7       main.reloadLoop+0x247                                   /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
#       0x1a59f49       main.main+0x4b9                                         /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
#       0x42dd56        runtime.main+0x206                                      /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8891d8 0x88b9d4 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x8891d7        github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287   /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
#       0x88b9d3        github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69

1 @ 0x42e14b 0x43e12d 0x17ecb5a 0x45c551
#       0x17ecb59       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0xd9     /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:230

1 @ 0x42e14b 0x43e12d 0x19fa24d 0x45c551
#       0x19fa24c       github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc  /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150

1 @ 0x42e14b 0x43e12d 0x1a5a8cf 0x45c551
#       0x1a5a8ce       main.reloadLoop.func1+0xae      /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88

1 @ 0x42e14b 0x43e12d 0x6bc8f3 0x45c551
#       0x6bc8f2        net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885

1 @ 0x42e14b 0x43e12d 0x8896b3 0x889326 0x88c330 0x45c551
#       0x8896b2        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232          /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
#       0x889325        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
#       0x88c32f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

1 @ 0x42e14b 0x43e12d 0x88a3d2 0x88c8c3 0x45c551
#       0x88a3d1        github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1               /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
#       0x88c8c2        github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2     /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447

1 @ 0x42e14b 0x43e12d 0x88b82b 0x45c551
#       0x88b82a        github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46

1 @ 0x72d048 0x72ce50 0x7298b4 0x735cf0 0x7365c3 0x6a3f24 0x6a5bb7 0x6a6b5b 0x6a2f86 0x45c551
#       0x72d047        runtime/pprof.writeRuntimeProfile+0x97  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
#       0x72ce4f        runtime/pprof.writeGoroutine+0x9f       /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
#       0x7298b3        runtime/pprof.(*Profile).WriteTo+0x3e3  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
#       0x735cef        net/http/pprof.handler.ServeHTTP+0x20f  /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
#       0x7365c2        net/http/pprof.Index+0x722              /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
#       0x6a3f23        net/http.HandlerFunc.ServeHTTP+0x43     /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
#       0x6a5bb6        net/http.(*ServeMux).ServeHTTP+0x126    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
#       0x6a6b5a        net/http.serverHandler.ServeHTTP+0xaa   /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
#       0x6a2f85        net/http.(*conn).serve+0x645            /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847

Latest Log :

2018-12-04T09:56:00Z D! [outputs.influxdb] wrote batch of 136 metrics in 7.138475ms
2018-12-04T09:56:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-04T09:56:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:57:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:57:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:58:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:58:00Z D! [outputs.influxdb] wrote batch of 136 metrics in 10.431907ms
2018-12-04T09:58:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-04T09:58:30Z W! [agent] input "inputs.vsphere" did not complete within its interval

@prydin
Contributor

prydin commented Dec 4, 2018

THANK YOU!!!! This gives me a pretty good idea what's wrong!

@prydin
Contributor

prydin commented Dec 6, 2018

@bashrc666 Thanks again for the detailed information. It was extremely helpful.

Here's a pre-release of what's on PR #5113

https://github.com/prydin/telegraf/releases/tag/PR-SCALE-IMPROVEMENT-RC1

Try it if you like. As always with a pre-release, you use it at your own risk.

@ghost

ghost commented Dec 12, 2018

Hello,

I still have the same issue with the same vcenter.

version :

Telegraf unknown (git: prydin-scale-improvement 646c5960)

GO DUMP

goroutine profile: total 30
8 @ 0x42e14b 0x43e12d 0x88941e 0x88c380 0x45c551
#       0x88941d        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
#       0x88c37f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

2 @ 0x42e14b 0x43e12d 0x6bc943 0x45c551
#       0x6bc942        net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885

1 @ 0x40ae87 0x4431dc 0x7373d2 0x45c551
#       0x4431db        os/signal.signal_recv+0x9b      /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
#       0x7373d1        os/signal.loop+0x21             /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x603f29 0x60442d 0x6079b1 0x6ba835 0x559d26 0x559e7f 0x6bb382 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65     /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99     /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178          /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a148e        net.(*netFD).Read+0x4e                  /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5697        net.(*conn).Read+0x67                   /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x603f28        crypto/tls.(*block).readFromUntil+0x88  /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:492
#       0x60442c        crypto/tls.(*Conn).readRecord+0xdc      /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:593
#       0x6079b0        crypto/tls.(*Conn).Read+0xf0            /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:1145
#       0x6ba834        net/http.(*persistConn).Read+0x74       /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
#       0x559d25        bufio.(*Reader).fill+0x105              /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
#       0x559e7e        bufio.(*Reader).Peek+0x3e               /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
#       0x6bb381        net/http.(*persistConn).readLoop+0x1a1  /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x69dada 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178                  /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a148e        net.(*netFD).Read+0x4e                          /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5697        net.(*conn).Read+0x67                           /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x69dad9        net/http.(*connReader).backgroundRead+0x59      /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x6ba835 0x559d26 0x559e7f 0x6bb382 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65     /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99     /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178          /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a148e        net.(*netFD).Read+0x4e                  /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5697        net.(*conn).Read+0x67                   /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x6ba834        net/http.(*persistConn).Read+0x74       /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
#       0x559d25        bufio.(*Reader).fill+0x105              /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
#       0x559e7e        bufio.(*Reader).Peek+0x3e               /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
#       0x6bb381        net/http.(*persistConn).readLoop+0x1a1  /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1dd2 0x5c02be 0x5be7e7 0x6a81ef 0x6c8ccc 0x6a701f 0x6a6cd6 0x6a7cc4 0x1a5a9ff 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x49a58f        internal/poll.(*FD).Accept+0x19f                /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
#       0x5a1dd1        net.(*netFD).accept+0x41                        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
#       0x5c02bd        net.(*TCPListener).accept+0x2d                  /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
#       0x5be7e6        net.(*TCPListener).AcceptTCP+0x46               /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
#       0x6a81ee        net/http.tcpKeepAliveListener.Accept+0x2e       /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
#       0x6a701e        net/http.(*Server).Serve+0x22e                  /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
#       0x6a6cd5        net/http.(*Server).ListenAndServe+0xb5          /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
#       0x6a7cc3        net/http.ListenAndServe+0x73                    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
#       0x1a5a9fe       main.main.func2+0x17e                           /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274

1 @ 0x42e14b 0x42e1f3 0x404ead 0x404c85 0x17ec185 0x17e7689 0x17e7e05 0x17e8b60 0x17eda7e 0x45c551
#       0x17ec184       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*ThrottledExecutor).Run+0x54     /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/throttled_exec.go:25
#       0x17e7688       github.com/influxdata/telegraf/plugins/inputs/vsphere.submitChunkJob+0x88               /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:667
#       0x17e7e04       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).chunkify+0x734        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:732
#       0x17e8b5f       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x7cf /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:789
#       0x17eda7d       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:651

1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a18c 0x88be34 0x45c551
#       0x88a18b        github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
#       0x88be33        github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e74dc 0x17ee225 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17e74db       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab         /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:658
#       0x17ee224       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x84      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:268

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ecd22 0x77f6bd 0x88c45f 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ecd21       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0xe1    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:280
#       0x77f6bc        github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c      /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
#       0x88c45e        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e             /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x888811 0x1a58e80 0x1a58588 0x1a59d2a 0x42dd57 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x888810        github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
#       0x1a58e7f       main.runAgent+0x85f                                     /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
#       0x1a58587       main.reloadLoop+0x247                                   /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
#       0x1a59d29       main.main+0x4b9                                         /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
#       0x42dd56        runtime.main+0x206                                      /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x889228 0x88ba24 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x889227        github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287   /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
#       0x88ba23        github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69

1 @ 0x42e14b 0x43e12d 0x17ecffa 0x45c551
#       0x17ecff9       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0xd9     /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:237

1 @ 0x42e14b 0x43e12d 0x19fa02d 0x45c551
#       0x19fa02c       github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc  /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150

1 @ 0x42e14b 0x43e12d 0x1a5a6af 0x45c551
#       0x1a5a6ae       main.reloadLoop.func1+0xae      /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88

1 @ 0x42e14b 0x43e12d 0x6bd36a 0x6b3a01 0x69c165 0x6617db 0x6614fa 0x662b88 0x6628a5 0x172ca14 0x172d4a3 0x173f650 0x1738e18 0x17d419a 0x17e1785 0x17e93b7 0x17edc27 0x17edb23 0x17ee0ad 0x45c551
#       0x6bd369        net/http.(*persistConn).roundTrip+0x569                                                                 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:2101
#       0x6b3a00        net/http.(*Transport).roundTrip+0x9b0                                                                   /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:465
#       0x69c164        net/http.(*Transport).RoundTrip+0x34                                                                    /usr/local/Cellar/go/1.11/libexec/src/net/http/roundtrip.go:17
#       0x6617da        net/http.send+0x14a                                                                                     /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:250
#       0x6614f9        net/http.(*Client).send+0xf9                                                                            /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:174
#       0x662b87        net/http.(*Client).do+0x2a7                                                                             /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:641
#       0x6628a4        net/http.(*Client).Do+0x34                                                                              /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:509
#       0x172ca13       github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap.(*Client).do+0x113           /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap/client.go:442
#       0x172d4a2       github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap.(*Client).RoundTrip+0x882    /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap/client.go:524
#       0x173f64f       github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25.(*Client).RoundTrip+0x7f          /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/client.go:89
#       0x1738e17       github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/methods.QueryPerf+0xb7            /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/methods/methods.go:9899
#       0x17d4199       github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/performance.(*Manager).Query+0x1a9      /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/performance/manager.go:276
#       0x17e1784       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Client).QueryMetrics+0x104                      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/client.go:268
#       0x17e93b6       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectChunk+0x2c6                    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:830
#       0x17edc26       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func1+0xe6            /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:791
#       0x17edb22       github.com/influxdata/telegraf/plugins/inputs/vsphere.submitChunkJob.func1+0x42                         /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:668
#       0x17ee0ac       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*ThrottledExecutor).Run.func1+0x7c               /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/throttled_exec.go:31

1 @ 0x42e14b 0x43e12d 0x6bf3df 0x45c551
#       0x6bf3de        net/http.setRequestCancel.func3+0xce    /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:321

1 @ 0x42e14b 0x43e12d 0x889703 0x889376 0x88c380 0x45c551
#       0x889702        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232          /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
#       0x889375        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
#       0x88c37f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

1 @ 0x42e14b 0x43e12d 0x88a422 0x88c913 0x45c551
#       0x88a421        github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1               /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
#       0x88c912        github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2     /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447

1 @ 0x42e14b 0x43e12d 0x88b87b 0x45c551
#       0x88b87a        github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46

1 @ 0x72d098 0x72cea0 0x729904 0x735d40 0x736613 0x6a3f74 0x6a5c07 0x6a6bab 0x6a2fd6 0x45c551
#       0x72d097        runtime/pprof.writeRuntimeProfile+0x97  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
#       0x72ce9f        runtime/pprof.writeGoroutine+0x9f       /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
#       0x729903        runtime/pprof.(*Profile).WriteTo+0x3e3  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
#       0x735d3f        net/http/pprof.handler.ServeHTTP+0x20f  /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
#       0x736612        net/http/pprof.Index+0x722              /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
#       0x6a3f73        net/http.HandlerFunc.ServeHTTP+0x43     /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
#       0x6a5c06        net/http.(*ServeMux).ServeHTTP+0x126    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
#       0x6a6baa        net/http.serverHandler.ServeHTTP+0xaa   /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
#       0x6a2fd5        net/http.(*conn).serve+0x645            /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847

STDOUT

panic: runtime error: index out of range

goroutine 2061 [running]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectChunk(0xc000206900, 0x25051a0, 0xc00003e098, 0xc0016a4000, 0x13, 0x100, 0x21d7320, 0x9, 0xc000c3a090, 0x2512c40, ...)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:831 +0x17c2
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func1(0x25051a0, 0xc00003e098, 0x1c50cc0, 0xc000af96e0, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:734 +0xff
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc001376b80, 0xc000c675a0, 0x25051a0, 0xc00003e098, 0xc000155310)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0x8e
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd

@prydin
Contributor

prydin commented Dec 12, 2018

@bashrc666 That output doesn't match the thread dump. The WorkerPool type doesn't exist anymore. Are you sure that's the right output?

As for the dump, it looks like it's stuck on a slow call to vCenter. What's your concurrency setting? Is the vCenter slow in general?

@ghost

ghost commented Dec 12, 2018

My config:

[[inputs.vsphere]]
  vcenters = [ 'http://foo.bar/sdk' ]
  username = 'ADUSER'
  password = "supersecurepassword"

  vm_metric_include = [
    "cpu.usage.average",
    "mem.usage.average",
    "net.received.average",
    "net.transmitted.average",
    "virtualDisk.read.average",
    "virtualDisk.write.average",
    "virtualDisk.writeOIO.latest"
  ]
  host_metric_include = [
    "cpu.usage.average",
    "disk.read.average",
    "disk.write.average",
    "disk.totalReadLatency.average",
    "disk.totalWriteLatency.average",
    "mem.usage.average",
    "net.received.average",
    "net.transmitted.average"
  ]
  cluster_metric_exclude = []
  datastore_metric_exclude = [] 
  datacenter_metric_exclude = [ "*" ]
  collect_concurrency = 10
  discover_concurrency = 4
  object_discovery_interval = "3000s"
  insecure_skip_verify = true

It only happens on this one very large vCenter, which contains 29 clusters, 259 hosts, 8129 VMs, and a large number of datastores.

Maybe there is something I could improve in this config?

@prydin Thanks so much for the help.

@prydin
Contributor

prydin commented Dec 12, 2018

@bashrc666 It's probably the datastore collection that takes a long time. Break it out into a separate declaration of [[inputs.vsphere]] and set the interval for that instance to 300s. Also, you're collecting every metric on the datastores. You can save some collection time by specifying a smaller set.
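
As a rough sketch of that split, the configuration could be divided into two instances along these lines. The URL and credentials are the placeholders from the config posted above. The three datastore metric names are only an illustrative subset and an assumption on my part, so pick the ones you actually need. `interval` is the standard Telegraf per-input override of the agent's collection interval.

```toml
## Instance 1: hosts, VMs and clusters, collected at the agent's default interval.
## Datastores are excluded here so a slow datastore query cannot hold up this instance.
[[inputs.vsphere]]
  vcenters = [ "http://foo.bar/sdk" ]   # placeholder URL from the config above
  username = "ADUSER"
  password = "supersecurepassword"
  vm_metric_include = [
    "cpu.usage.average",
    "mem.usage.average"
  ]
  host_metric_include = [
    "cpu.usage.average",
    "mem.usage.average"
  ]
  cluster_metric_exclude = []
  datastore_metric_exclude = [ "*" ]
  datacenter_metric_exclude = [ "*" ]
  collect_concurrency = 10
  discover_concurrency = 4
  insecure_skip_verify = true

## Instance 2: datastores only, collected every 300s.
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "http://foo.bar/sdk" ]   # same placeholder vCenter
  username = "ADUSER"
  password = "supersecurepassword"
  vm_metric_exclude = [ "*" ]
  host_metric_exclude = [ "*" ]
  cluster_metric_exclude = [ "*" ]
  datacenter_metric_exclude = [ "*" ]
  ## Assumed subset for illustration; an empty include list collects every datastore metric.
  datastore_metric_include = [
    "disk.capacity.latest",
    "disk.used.latest",
    "disk.provisioned.latest"
  ]
  insecure_skip_verify = true
```

The VM and host metric lists are shortened here for readability; the full lists from the original config can be kept unchanged in the first instance.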

@ghost

ghost commented Dec 13, 2018

@prydin I've decided to drop the datastore metrics for now and come back to them once I'm sure that VM and host collection works on that vCenter. However, Telegraf still stops working after 10 to 20 minutes.

CONFIG

[[inputs.vsphere]]
  vcenters = [ 'https://foor.bar/sdk' ]
  username = 'ADUSER'
  password = "SUPERSTRONGPASSWORD"
  vm_metric_include = []
  host_metric_include = []
  cluster_metric_exclude = ["*"]
  datastore_metric_exclude = ["*"]
  datacenter_metric_exclude = [ "*" ]
  collect_concurrency = 10
  discover_concurrency = 4
  object_discovery_interval = "300s"
  insecure_skip_verify = true

GO DUMP

goroutine profile: total 37
10 @ 0x42e14b 0x43e12d 0x17ec6dc 0x17ee29d 0x45c551
#       0x17ec6db       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut+0xeb        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56
#       0x17ee29c       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1+0xcc    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80

8 @ 0x42e14b 0x43e12d 0x8893ce 0x88c330 0x45c551
#       0x8893cd        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
#       0x88c32f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

1 @ 0x40ae87 0x4431dc 0x737382 0x45c551
#       0x4431db        os/signal.signal_recv+0x9b      /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
#       0x737381        os/signal.loop+0x21             /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x69da8a 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178                  /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a143e        net.(*netFD).Read+0x4e                          /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5647        net.(*conn).Read+0x67                           /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x69da89        net/http.(*connReader).backgroundRead+0x59      /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x6ba7e5 0x559cd6 0x559e2f 0x6bb332 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65     /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99     /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x498fe8        internal/poll.(*FD).Read+0x178          /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
#       0x5a143e        net.(*netFD).Read+0x4e                  /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
#       0x5b5647        net.(*conn).Read+0x67                   /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
#       0x6ba7e4        net/http.(*persistConn).Read+0x74       /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
#       0x559cd5        bufio.(*Reader).fill+0x105              /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
#       0x559e2e        bufio.(*Reader).Peek+0x3e               /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
#       0x6bb331        net/http.(*persistConn).readLoop+0x1a1  /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645

1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1d82 0x5c026e 0x5be797 0x6a819f 0x6c8c7c 0x6a6fcf 0x6a6c86 0x6a7c74 0x1a5ac1f 0x45c551
#       0x428b35        internal/poll.runtime_pollWait+0x65             /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
#       0x498189        internal/poll.(*pollDesc).wait+0x99             /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
#       0x49829c        internal/poll.(*pollDesc).waitRead+0x3c         /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
#       0x49a58f        internal/poll.(*FD).Accept+0x19f                /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
#       0x5a1d81        net.(*netFD).accept+0x41                        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
#       0x5c026d        net.(*TCPListener).accept+0x2d                  /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
#       0x5be796        net.(*TCPListener).AcceptTCP+0x46               /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
#       0x6a819e        net/http.tcpKeepAliveListener.Accept+0x2e       /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
#       0x6a6fce        net/http.(*Server).Serve+0x22e                  /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
#       0x6a6c85        net/http.(*Server).ListenAndServe+0xb5          /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
#       0x6a7c73        net/http.ListenAndServe+0x73                    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
#       0x1a5ac1e       main.main.func2+0x17e                           /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274

1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a13c 0x88bde4 0x45c551
#       0x88a13b        github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
#       0x88bde3        github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e6a7c 0x17edd08 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17e6a7b       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab         /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611
#       0x17edd07       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x87      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec378 0x77f66d 0x88c40f 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ec377       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0x167   /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282
#       0x77f66c        github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c      /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
#       0x88c40e        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e             /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec908 0x17e81de 0x17ed4fe 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ec907       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain+0x87          /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117
#       0x17e81dd       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x99d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758
#       0x17ed4fd       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ee4dd 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x17ee4dc       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1+0xec      /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8887c1 0x1a590a0 0x1a587a8 0x1a59f4a 0x42dd57 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                            /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                             /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x8887c0        github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
#       0x1a5909f       main.runAgent+0x85f                                     /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
#       0x1a587a7       main.reloadLoop+0x247                                   /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
#       0x1a59f49       main.main+0x4b9                                         /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
#       0x42dd56        runtime.main+0x206                                      /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8891d8 0x88b9d4 0x45c551
#       0x43ed68        sync.runtime_Semacquire+0x38                                    /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
#       0x474d53        sync.(*WaitGroup).Wait+0x63                                     /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
#       0x8891d7        github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287   /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
#       0x88b9d3        github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69

1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ee5d 0x4749e4 0x17e4960 0x17ecb93 0x45c551
#       0x43ee5c        sync.runtime_SemacquireMutex+0x3c                                                               /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:71
#       0x4749e3        sync.(*RWMutex).Lock+0x73                                                                       /usr/local/Cellar/go/1.11/libexec/src/sync/rwmutex.go:98
#       0x17e495f       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).discover+0xd8f                /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:452
#       0x17ecb92       github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0x112    /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:232

1 @ 0x42e14b 0x43e12d 0x19fa24d 0x45c551
#       0x19fa24c       github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc  /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150

1 @ 0x42e14b 0x43e12d 0x1a5a8cf 0x45c551
#       0x1a5a8ce       main.reloadLoop.func1+0xae      /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88

1 @ 0x42e14b 0x43e12d 0x6bc8f3 0x45c551
#       0x6bc8f2        net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885

1 @ 0x42e14b 0x43e12d 0x8896b3 0x889326 0x88c330 0x45c551
#       0x8896b2        github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232          /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
#       0x889325        github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125    /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
#       0x88c32f        github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf      /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229

1 @ 0x42e14b 0x43e12d 0x88a3d2 0x88c8c3 0x45c551
#       0x88a3d1        github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1               /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
#       0x88c8c2        github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2     /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447

1 @ 0x42e14b 0x43e12d 0x88b82b 0x45c551
#       0x88b82a        github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a  /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46

1 @ 0x72d048 0x72ce50 0x7298b4 0x735cf0 0x7365c3 0x6a3f24 0x6a5bb7 0x6a6b5b 0x6a2f86 0x45c551
#       0x72d047        runtime/pprof.writeRuntimeProfile+0x97  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
#       0x72ce4f        runtime/pprof.writeGoroutine+0x9f       /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
#       0x7298b3        runtime/pprof.(*Profile).WriteTo+0x3e3  /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
#       0x735cef        net/http/pprof.handler.ServeHTTP+0x20f  /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
#       0x7365c2        net/http/pprof.Index+0x722              /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
#       0x6a3f23        net/http.HandlerFunc.ServeHTTP+0x43     /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
#       0x6a5bb6        net/http.(*ServeMux).ServeHTTP+0x126    /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
#       0x6a6b5a        net/http.serverHandler.ServeHTTP+0xaa   /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
#       0x6a2f85        net/http.(*conn).serve+0x645            /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847

@danielnelson
Contributor

Can you grab the full goroutine stack dump from here: http://localhost:6060/debug/pprof/goroutine?debug=2

@ghost

ghost commented Dec 14, 2018

TELEGRAF VERSION

~# /usr/bin/telegraf --version
Telegraf unknown (git: prydin-scale-improvement aaa67547)

CONTEXT

I am trying to collect simple VM metrics from a vCenter that manages:

259 hosts
8129 VMs

Telegraf stops working 20 to 30 minutes after it starts.

  • Here is the debug log Telegraf produces after it stops working:
2018-12-14T12:50:30Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:50:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:50:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-14T12:51:10Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:20Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:30Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:52:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:52:00Z W! [agent] input "inputs.vsphere" did not complete within its interval

CONFIG

[global_tags]

[agent]

interval = "60s"
round_interval = true
metric_batch_size = 10000
metric_buffer_limit = 100000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = false

[[outputs.influxdb]]

urls = ["http://10.x.x.x:8086"]
database = "vcenter"

[[inputs.vsphere]]
  vcenters = [ 'https://foo.bar/sdk' ]
  username = 'ADUSER'
  password = "SUPERSTRONGPASSWORD"
  vm_metric_include = []
  host_metric_include = []
  cluster_metric_exclude = ["*"] 
  datastore_metric_exclude = ["*"]
  datacenter_metric_exclude = [ "*" ]
  collect_concurrency = 2
  discover_concurrency = 2
  object_discovery_interval = "600s"
  insecure_skip_verify = true

GO DUMP LEVEL 2

goroutine 12632 [running]:
runtime/pprof.writeGoroutineStacks(0x24e9000, 0xc01931c0e0, 0x40be5f, 0xc022c4e240)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:678 +0xa7
runtime/pprof.writeGoroutine(0x24e9000, 0xc01931c0e0, 0x2, 0xc0004e4700, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:667 +0x44
runtime/pprof.(*Profile).WriteTo(0x3ca45e0, 0x24e9000, 0xc01931c0e0, 0x2, 0xc01931c0e0, 0x21dec75)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328 +0x3e4
net/http/pprof.handler.ServeHTTP(0xc0102a4011, 0x9, 0x2502020, 0xc01931c0e0, 0xc000128100)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245 +0x210
net/http/pprof.Index(0x2502020, 0xc01931c0e0, 0xc000128100)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268 +0x723
net/http.HandlerFunc.ServeHTTP(0x22cf9c0, 0x2502020, 0xc01931c0e0, 0xc000128100)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x3cd89a0, 0x2502020, 0xc01931c0e0, 0xc000128100)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc0000a6c30, 0x2502020, 0xc01931c0e0, 0xc000128100)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00787c500, 0x2505160, 0xc02047a000)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2851 +0x2f5

goroutine 1 [semacquire, 101 minutes]:
sync.runtime_Semacquire(0xc00053aea8)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00053aea0)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/agent.(*Agent).Run(0xc0002025f0, 0x2505160, 0xc000042d00, 0x1, 0x1)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x471
main.runAgent(0x2505160, 0xc000042d00, 0x3cfde20, 0x0, 0x0, 0x3cfde20, 0x0, 0x0, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185 +0x860
main.reloadLoop(0xc0002e0120, 0x3cfde20, 0x0, 0x0, 0x3cfde20, 0x0, 0x0, 0xc0007add58, 0x0, 0x0, ...)
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101 +0x248
main.main()
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381 +0x4ba

goroutine 17 [syscall, 101 minutes]:
os/signal.signal_recv(0x0)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
        /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
        /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:29 +0x41

goroutine 13 [select]:
github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start(0xc000133b80)
        /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150 +0xdd
created by github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.init.0
        /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:29 +0x57

goroutine 14 [IO wait]:
internal/poll.runtime_pollWait(0x7f074bd93f00, 0x72, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00020c018, 0x72, 0xc0002a4200, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00020c018, 0xffffffffffffff00, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc00020c000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc00020c000, 0x50, 0x1fa58e0, 0xc0004bfd01)
        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00013e018, 0xc0004bfd88, 0xc009ec73b0, 0xe25aac92949344a6)
        /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc00013e018, 0xc0004bfdb0, 0x48f726, 0x5c13a732)
        /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc00013e018, 0xc0004bfe00, 0x18, 0xc0001ee600, 0x6a7095)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc0000a6c30, 0x2503060, 0xc00013e018, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc0000a6c30, 0xc0000a6c30, 0x41)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0x7ffcb30abf4e, 0xc, 0x0, 0x0, 0x1, 0x21dc8c0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004 +0x74
main.main.func2()
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274 +0x17f
created by main.main
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:264 +0xa1b

goroutine 37 [select, 101 minutes]:
main.reloadLoop.func1(0xc0002e02a0, 0xc0003042a0, 0xc00007b650, 0xc0002e0120)
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88 +0xaf
created by main.reloadLoop
        /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:87 +0x1e2

goroutine 82 [IO wait, 82 minutes]:
internal/poll.runtime_pollWait(0x7f074bd93e30, 0x72, 0xc000414a88)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00020c398, 0x72, 0xffffffffffffff00, 0x24eb300, 0x3bc37f0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00020c398, 0xc00042b000, 0x1000, 0x1000)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00020c380, 0xc00042b000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc00020c380, 0xc00042b000, 0x1000, 0x1000, 0x1, 0x0, 0xc0002b2ce0)
        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00013e058, 0xc00042b000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177 +0x68
net/http.(*persistConn).Read(0xc0000ba6c0, 0xc00042b000, 0x1000, 0x1000, 0xc0000ba480, 0xc0000ba6c0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497 +0x75
bufio.(*Reader).fill(0xc000134420)
        /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100 +0x106
bufio.(*Reader).Peek(0xc000134420, 0x1, 0x2, 0x0, 0x0, 0xc0002e1ec0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132 +0x3f
net/http.(*persistConn).readLoop(0xc0000ba6c0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645 +0x1a2
created by net/http.(*Transport).dialConn
        /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1338 +0x941

goroutine 83 [select, 82 minutes]:
net/http.(*persistConn).writeLoop(0xc0000ba6c0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885 +0x113
created by net/http.(*Transport).dialConn
        /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1339 +0x966

goroutine 20 [semacquire, 101 minutes]:
sync.runtime_Semacquire(0xc00053b938)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00053b930)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/agent.(*Agent).runInputs(0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232 +0x288
github.com/influxdata/telegraf/agent.(*Agent).Run.func1(0xc00053aea0, 0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).Run
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:66 +0x3bb

goroutine 21 [chan receive, 82 minutes]:
github.com/influxdata/telegraf/agent.(*Agent).runOutputs(0xc0002025f0, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0, 0x4500000000, 0x201)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451 +0x2ac
github.com/influxdata/telegraf/agent.(*Agent).Run.func4(0xc00053aea0, 0xc0002025f0, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123 +0x84
created by github.com/influxdata/telegraf/agent.(*Agent).Run
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:120 +0x460

goroutine 22 [select]:
github.com/influxdata/telegraf/agent.(*Agent).flush(0xc0002025f0, 0x2505160, 0xc0002a4900, 0xc000483290, 0x2540be400, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496 +0x1a2
github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1(0xc00053af30, 0xc0002025f0, 0x2505160, 0xc0002a4900, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0x2540be400, 0x0, 0xc000483290)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447 +0xa3
created by github.com/influxdata/telegraf/agent.(*Agent).runOutputs
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:436 +0x1b9

goroutine 27 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0xc0002025f0, 0x2512c40, 0xc0002ebd80, 0xc000043240, 0xdf8475800, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287 +0x233
github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval(0xc0002025f0, 0x2505160, 0xc000042d00, 0x2512c40, 0xc0002ebd80, 0xc000043240, 0xdf8475800, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257 +0x126
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0xc00053b930, 0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xdf8475800, 0x2512c40, 0xc0002ebd80, ...)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229 +0xc0
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:218 +0x171

goroutine 39 [select]:
github.com/influxdata/telegraf/agent.(*Ticker).relayTime(0xc000bec000, 0x2505160, 0xc000be8000)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46 +0x12b
created by github.com/influxdata/telegraf/agent.NewTicker
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:33 +0x135

goroutine 6098 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut(0xc02253d640, 0x25051a0, 0xc00003c048, 0x0, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc002a51520, 0xc02253d640, 0x25051a0, 0xc00003c048, 0xc001312f00)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0xcd
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd

goroutine 6081 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc02253d648)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc02253d640)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain(0xc02253d640, 0x25051a0, 0xc00003c048, 0xc02253dcc0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117 +0x88
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource(0xc000146380, 0x25051a0, 0xc00003c048, 0x21ccf4e, 0x2, 0x2512c40, 0xc0002ebd80, 0x36cc0d000, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758 +0x99e
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1(0xc01b1f5170, 0xc000146380, 0x25051a0, 0xc00003c048, 0x2512c40, 0xc0002ebd80, 0x21ccf4e, 0x2)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604 +0x9e
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:602 +0x299

goroutine 6079 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).push(0xc02253d640, 0x25051a0, 0xc00003c048, 0x1c50cc0, 0xc00ef1ca20, 0xc00ef1ca20)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:47 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).push-fm(0x25051a0, 0xc00003c048, 0x1c50cc0, 0xc00ef1ca20, 0x7)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:100 +0x52
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).chunker(0xc000146380, 0x25051a0, 0xc00003c048, 0xc0159054d0, 0xc00f5a79e0, 0x81d260, 0xed3a58ac0, 0x0, 0x0, 0xed3a58a84, ...)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:677 +0x708
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func2(0x25051a0, 0xc00003c048, 0xc0159054d0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:752 +0x8b
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Fill.func1(0xc02253d640, 0xc001312f50, 0x25051a0, 0xc00003c048)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:100 +0xa0
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Fill
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:98 +0x76

goroutine 1177 [semacquire, 81 minutes]:
sync.runtime_SemacquireMutex(0xc0001463d8, 0xc00ca49100)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).Lock(0xc0001463d0)
        /usr/local/Cellar/go/1.11/libexec/src/sync/rwmutex.go:98 +0x74
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).discover(0xc000146380, 0x2505160, 0xc0002a4740, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:452 +0xd90
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1(0xc000146380, 0x2505160, 0xc0002a4740)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:232 +0x113
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:228 +0x81

goroutine 12633 [IO wait]:
internal/poll.runtime_pollWait(0x7f074bd93bc0, 0x72, 0xc000c2de58)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc0020de098, 0x72, 0xffffffffffffff00, 0x24eb300, 0x3bc37f0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc0020de098, 0xc022c4e000, 0x1, 0x1)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc0020de080, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc0020de080, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x24ad6, 0x259a9)
        /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000202aa0, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177 +0x68
net/http.(*connReader).backgroundRead(0xc022c4e0c0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676 +0x5a
created by net/http.(*connReader).startBackgroundRead
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:672 +0xd2

goroutine 6076 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc01b1f5178)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc01b1f5170)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect(0xc000146380, 0x25051a0, 0xc00003c048, 0x2512c40, 0xc0002ebd80, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611 +0x2ac
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1(0xc00dc1bf90, 0x2512c40, 0xc0002ebd80, 0xc009476cc0, 0xc000146380)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269 +0x88
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:267 +0x13e

goroutine 6078 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc002a51528)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc002a51520)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1(0xc02253d640, 0x2, 0x25051a0, 0xc00003c048, 0xc001312f00)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88 +0xed
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:67 +0x84

goroutine 6075 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc00dc1bf98)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00dc1bf90)
        /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather(0xc0002aefc0, 0x2512c40, 0xc0002ebd80, 0x2710, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282 +0x168
github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather(0xc000043240, 0x2512c40, 0xc0002ebd80, 0xc001637fc0, 0x88ca67)
        /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86 +0x6d
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1(0xc0001786c0, 0xc000043240, 0x2512c40, 0xc0002ebd80)
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283 +0x3f
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
        /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:282 +0xdc

goroutine 6097 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut(0xc02253d640, 0x25051a0, 0xc00003c048, 0x0, 0x0, 0x0)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc002a51520, 0xc02253d640, 0x25051a0, 0xc00003c048, 0xc001312f00)
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0xcd
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
        /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd
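
A dump this long is easier to read once the goroutines are grouped by how long they have been blocked. Below is a minimal sketch of a helper that does that for the `debug=2` text format shown above. The names `goroutine`, `parseDump` and `blockedAtLeast` are made up for this illustration; they are not part of Telegraf or the Go standard library. It assumes every goroutine starts with a header line of the form `goroutine N [state, M minutes]:`, where the wait time is optional.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// header matches lines such as "goroutine 6081 [semacquire, 83 minutes]:".
// The wait time is optional: the runtime omits it for short waits.
var header = regexp.MustCompile(`^goroutine (\d+) \[([^\],]+)(?:, (\d+) minutes)?[^\]]*\]:`)

// goroutine is a hypothetical record type used only in this sketch.
type goroutine struct {
	ID      int
	State   string
	Minutes int    // 0 when the dump shows no wait time
	Top     string // first non-empty line after the header
}

// parseDump extracts one record per goroutine header found in the dump text.
func parseDump(dump string) []goroutine {
	var out []goroutine
	needTop := false
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := header.FindStringSubmatch(line); m != nil {
			id, _ := strconv.Atoi(m[1])
			mins, _ := strconv.Atoi(m[3]) // empty capture leaves mins at 0
			out = append(out, goroutine{ID: id, State: m[2], Minutes: mins})
			needTop = true
			continue
		}
		if needTop && line != "" {
			out[len(out)-1].Top = line
			needTop = false
		}
	}
	return out
}

// blockedAtLeast returns the goroutines that have waited at least the given
// number of minutes, longest wait first.
func blockedAtLeast(gs []goroutine, minutes int) []goroutine {
	var out []goroutine
	for _, g := range gs {
		if g.Minutes >= minutes {
			out = append(out, g)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Minutes > out[j].Minutes })
	return out
}

func main() {
	// A shortened excerpt in the same format as the dump above.
	sample := `goroutine 12632 [running]:
runtime/pprof.writeGoroutineStacks(0x24e9000, 0xc01931c0e0, 0x40be5f, 0xc022c4e240)

goroutine 6081 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc02253d648)

goroutine 6079 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).push(0xc02253d640, 0x25051a0, 0xc00003c048, 0x1c50cc0, 0xc00ef1ca20, 0xc00ef1ca20)
`
	for _, g := range blockedAtLeast(parseDump(sample), 60) {
		fmt.Printf("goroutine %d: %s for %d min at %s\n", g.ID, g.State, g.Minutes, g.Top)
	}
}
```

Fed the full dump above, a filter like this should pick out, among others, the vsphere goroutines 6075, 6076, 6078 and 6081 (semacquire, 83 minutes) and 6079, 6097 and 6098 (select, 82 minutes). Long waits alone do not prove a deadlock, since the agent's own goroutines legitimately wait for the lifetime of the process.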

NOTE

I just noticed that when I run Telegraf as a systemd unit, it fails as described above, but when I run it as a Linux job with the same parameters as the systemd unit, it works properly for more than 2 hours. I really don't get it. Right now I'm trying to set up a proper InfluxDB Enterprise cluster to check whether this collection failure is caused by using a standalone InfluxDB.

@ghost

ghost commented Dec 18, 2018

Update

My bad. The plugin is working fine in release Telegraf unknown (git: prydin-scale-improvement 646c596). I just forgot to tell Grafana to connect metric points that are more than 1 min apart. I apologize for my huge mistake.

I have increased my interval to 120s and it's working like a charm with all my vCenters.

@danielnelson
Contributor

I believe this is working, and now available, in 1.10.0

@danielnelson danielnelson added this to the 1.10.0 milestone Mar 6, 2019