
Telegraf 1.10.1 transforms vSphere data from 20s to 60s samples by default #5619

Closed
radekhuda opened this issue Mar 21, 2019 · 6 comments · Fixed by #5726

Comments

@radekhuda

radekhuda commented Mar 21, 2019

Relevant telegraf.conf:

# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "20s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 10000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 1000000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "5s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""
  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = true
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = "/var/log/telegraf/telegraf.log"

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false


###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
   urls = ["http://127.0.0.1:8086"]

  ## The target database for metrics; will be created as needed.
  ## For UDP url endpoint database needs to be configured on server side.
   database = "VC"

  ## The value of this tag will be used to determine the database.  If this
  ## tag is not set the 'database' option is used as the default.
  # database_tag = ""

  ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  # skip_database_creation = false

  ## Name of existing retention policy to write to.  Empty string writes to
  ## the default retention policy.  Only takes effect when using HTTP.
  # retention_policy = ""

  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  # write_consistency = "any"

  ## Timeout for HTTP messages.
   timeout = "10s"

  ## HTTP Basic Auth
   username = "username"
   password = "password"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## UDP payload size is the maximum packet size to send.
  # udp_payload = "512B"

# Read metrics from VMware vCenter
 [[inputs.vsphere]]
#   ## List of vCenter URLs to be monitored. These three lines must be uncommented
#   ## and edited for the plugin to work.
   vcenters = [ "https://vcenter/sdk" ]
   username = "username"
   password = "password"
#
#   ## VMs
#   ## Typical VM metrics (if omitted or empty, all metrics are collected)
#   vm_include = [ "/*/vm/**"] # Inventory path to VMs to collect (by default all are collected)
   vm_metric_include = [
        "cpu.usage.average",
        "cpu.ready.summation",
        "cpu.costop.summation",
        "mem.usage.average",
        "mem.swapped.average",
        "net.bytesTx.average",
        "net.bytesRx.average",
        "virtualDisk.totalWriteLatency.average",
        "virtualDisk.totalReadLatency.average",
        "disk.numberReadAveraged.average",
        "disk.numberWriteAveraged.average",
        "disk.read.average",
        "disk.write.average",
        "sys.osUptime.latest",
   ]

#   # vm_metric_exclude = [] ## Nothing is excluded by default
#   # vm_instances = true ## true by default
#   ## whether or not to force discovery of new objects on initial gather call before collecting metrics
#   ## when true for large environments this may cause errors for time elapsed while collecting metrics
#   ## when false (default) the first collection cycle may result in no or limited metrics while objects are discovered
    force_discover_on_init = true
#
#   ## the interval before (re)discovering objects subject to metrics collection (default: 300s)
#   # object_discovery_interval = "300s"
#
#   ## timeout applies to any of the api request made to vcenter
#   # timeout = "60s"
#
#   ## Optional SSL Config
#   # ssl_ca = "/path/to/cafile"
#   # ssl_cert = "/path/to/certfile"
#   # ssl_key = "/path/to/keyfile"
#   ## Use SSL but skip chain & host verification
    insecure_skip_verify = true

System info:

Telegraf: 1.10.1
OS: Ubuntu 18.04.2
vSphere: 6.5

Steps to reproduce:

Expected behavior:

The ability to select the sample interval, or to leave the native 20s samples unchanged.

Actual behavior:

[screenshot attached]

Additional info:

After upgrading Telegraf from version 1.9.5 to 1.10.1, all collected real-time vSphere metrics are transformed from 20s to 60s samples. I tried adding the parameter interval = "20s", but it does not work. Is it possible to change a setting to turn off the default transformation to 60s samples?

How can I get back metrics with 20s samples?
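
For reference, a minimal sketch of the per-plugin override that was attempted (this assumes the standard Telegraf per-input interval option and is not a confirmed workaround; on 1.10.1 it did not restore 20s samples):

  ## Minimal sketch (assumption): the standard per-input "interval" override,
  ## placed inside the vSphere input block from the config above.
  [[inputs.vsphere]]
    interval = "20s"                       ## per-plugin collection interval
    vcenters = [ "https://vcenter/sdk" ]
    username = "username"
    password = "password"
    insecure_skip_verify = true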

@prydin
Contributor

prydin commented Mar 22, 2019

Investigating...

@prydin
Contributor

prydin commented Mar 22, 2019

There appears to be a bug in the code that estimates the sampling interval. I'll provide a fix. It's already running in my lab, but I want to give it a few more hours to make sure it works in all cases.

(@danielnelson it would be really nice to have access to the interval. It's needed when we have to deal with late arriving data from vCenter)

@radekhuda
Author

Yes, it would be nice to have the option to choose the sample length.

@ZHumphries

@prydin Do you have a rough ETA for this?

@danielnelson
Contributor

@ZHumphries @radekhuda I linked to some builds on #5726 that contain a fix; it would be great if you could try them out.

@ZHumphries

@danielnelson This appears to have fixed the issue.
