azure_monitor support for VM scale sets #5819

Closed
johncrim opened this issue May 8, 2019 · 1 comment · Fixed by #5821
Labels: area/azure (Azure plugins including eventhub_consumer, azure_storage_queue, azure_monitor), bug (unexpected problem or unintended behavior)

Comments

johncrim commented May 8, 2019

Currently, the azure_monitor output plugin doesn't resolve the correct resource ID for VMs that are part of a Virtual Machine Scale Set. It works automatically only for single, individually configured Virtual Machines. By default, Telegraf on a scale set VM logs repeated 404 errors, and the cause is difficult to diagnose.

Proposal:

The resourceIDTemplate unnecessarily constrains the resource ID to non-scaleset VMs:

resourceIDTemplate    = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s"

Fixing this template is easy enough. The harder part is adding logic to determine whether the VM is running on its own or as part of a scale set.

Scale set VMs do run the Instance Metadata service, and managed service identity works the same way. The only change needed for this plugin to work is to determine the correct resource ID, using a template like:

vmssResourceIDTemplate    = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"
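
The detection logic could be sketched roughly as follows, in Go since that is the plugin's language. This is a minimal sketch, not the change merged in #5821: `computeMetadata`, `parseCompute`, and `resourceID` are hypothetical names, the JSON field names (in particular `vmScaleSetName`) are assumed from the Instance Metadata service's compute document, and the sample values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Templates quoted in this issue; the scale set one is the proposed addition.
const (
	resourceIDTemplate     = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s"
	vmssResourceIDTemplate = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"
)

// computeMetadata holds the subset of the Instance Metadata "compute"
// document used here. The JSON field names are an assumption.
type computeMetadata struct {
	Name              string `json:"name"`
	SubscriptionID    string `json:"subscriptionId"`
	ResourceGroupName string `json:"resourceGroupName"`
	VMScaleSetName    string `json:"vmScaleSetName"`
}

// parseCompute decodes a compute metadata JSON document.
func parseCompute(doc []byte) (computeMetadata, error) {
	var c computeMetadata
	err := json.Unmarshal(doc, &c)
	return c, err
}

// resourceID returns the scale set's resource ID when the VM reports a
// scale set name, and the single-VM resource ID otherwise.
func resourceID(c computeMetadata) string {
	if c.VMScaleSetName != "" {
		return fmt.Sprintf(vmssResourceIDTemplate, c.SubscriptionID, c.ResourceGroupName, c.VMScaleSetName)
	}
	return fmt.Sprintf(resourceIDTemplate, c.SubscriptionID, c.ResourceGroupName, c.Name)
}

func main() {
	// Hypothetical metadata for a scale set instance; all values are made up.
	doc := []byte(`{"name":"myvmss_0","subscriptionId":"sub-id","resourceGroupName":"my-rg","vmScaleSetName":"myvmss"}`)
	c, err := parseCompute(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(resourceID(c))
}
```

On a single VM the scale set name is expected to be empty, so the original template would still apply there.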

Current behavior:

Currently, if this setup is performed on a VM in a scaleset:

apt install telegraf -y
telegraf --input-filter cpu:mem:diskio:net --output-filter azure_monitor config > /etc/telegraf/telegraf.conf
systemctl restart telegraf

The telegraf service log then fills with 404 errors, and the log entries don't include the URL that was requested.

If resource_id is manually set to the VM scale set's resource ID in telegraf.conf, the metrics are sent as expected.

Desired behavior:

Installing and configuring the azure_monitor output plugin as specified (and as documented in the Microsoft and Influx docs) should just work.

Use case:

Many cloud-native use cases on Azure rely on VM scale sets (e.g. Kubernetes or Service Fabric). Adding this support makes telegraf usable in Azure for VMs that aren't individually configured.

anildesai61 commented Nov 14, 2023

Hello @johncrim

I have an Ubuntu 18.04 LTS VMSS on Azure. These are the steps I followed:

apt install telegraf -y
telegraf --input-filter cpu:mem:diskio:net --output-filter azure_monitor config > /etc/telegraf/telegraf.conf
systemctl restart telegraf

  1. Added a line to /etc/telegraf/telegraf.conf:
    resource_id = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"

The Telegraf metrics still don't show up under the Metrics option of the VMSS. Could you please help?
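
Note that the `%s` markers in the template quoted above are Go format placeholders taken from the plugin source, so the value written to telegraf.conf needs a concrete subscription ID, resource group, and scale set name in their place. A quick sanity check might look like the sketch below; `checkResourceID` is a hypothetical helper, not part of Telegraf, and the filled-in values are made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Template as quoted in this thread.
const vmssTemplate = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"

// checkResourceID is a hypothetical helper, not part of Telegraf. It rejects
// values that still contain a format placeholder or lack the expected prefix.
func checkResourceID(id string) error {
	if strings.Contains(id, "%s") {
		return errors.New("resource_id still contains an unfilled placeholder")
	}
	if !strings.HasPrefix(id, "/subscriptions/") {
		return errors.New("resource_id should start with /subscriptions/")
	}
	return nil
}

func main() {
	// The raw template is rejected because its placeholders are unfilled.
	fmt.Println(checkResourceID(vmssTemplate))

	// Made-up values filled in for illustration; this one passes.
	filled := fmt.Sprintf(vmssTemplate, "sub-id", "my-rg", "myvmss")
	fmt.Println(checkResourceID(filled))
}
```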

telegraf status

telegraf.service - Telegraf
Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-11-14 18:01:09 UTC; 2min 6s ago
Docs: https://github.com/influxdata/telegraf
Main PID: 3505 (telegraf)
Tasks: 7 (limit: 4915)
CGroup: /system.slice/telegraf.service
└─3505 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Nov 14 18:01:09 waf-teleg000000 telegraf[3505]: 2023-11-14T18:01:09Z I! Tags enabled: host=waf-teleg000000
Nov 14 18:01:09 waf-teleg000000 systemd[1]: Started Telegraf.
Nov 14 18:01:09 waf-teleg000000 telegraf[3505]: 2023-11-14T18:01:09Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"waf-teleg000000", Flush Interval:10s
Nov 14 18:02:09 waf-teleg000000 telegraf[3505]: 2023-11-14T18:02:09Z E! [agent] Error writing to outputs.azure_monitor: unable to fetch authentication credentials: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://CentralIndia.monitoring.azure.com/subscriptions/xxxx-xx-xx-xx-xxx-xxx-xxx-xxx-xxxx/resourceGroups/TELEGRAPH/providers/Microsoft.Compute/virtualMachineScaleSets/waf-telegraph1/metrics: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmonitoring.azure.com%2F

Telegraf config file.

# Send aggregate metrics to Azure Monitor
[[outputs.azure_monitor]]
# Timeout for HTTP writes.
timeout = "20s"

# Set the namespace prefix, defaults to "Telegraf/".
namespace_prefix = "Telegraf/Apache"

# Azure Monitor doesn't have a string value type, so convert string
# fields to dimensions (a.k.a. tags) if enabled. Azure Monitor allows
# a maximum of 10 dimensions so Telegraf will only send the first 10
# alphanumeric dimensions.
strings_as_dimensions = false

# Both region and resource_id must be set or be available via the
# Instance Metadata service on Azure Virtual Machines.
# Azure Region to publish metrics against
region = "centralindia"

# The Azure Resource ID against which metric will be logged, e.g.
resource_id = "/subscriptions/xxx-xx-x-xxxxx-xxx-xx/resourceGroups/TELEGRAPH/providers/Microsoft.Compute/virtualMachineScaleSets/waf-telegraph1"

Please share an example Telegraf config file. I want to scale the VMSS up based on the Apache request metrics that Telegraf collects.
