Fix resource id in virtual machine scale sets with azure_monitor output #5821
@johncrim Do you have any available time to help test? We should verify this fix on both a scaleset and non-scaleset virtual machine, if you could do either of these it would be very appreciated. |
Yes, I'd be happy to. My use cases are Ubuntu, so I'm afraid it's not easy
for me to help test the RPM and Windows build.
I can test the scaleset and non-scaleset hosts on Ubuntu.
…On Wed, May 8, 2019 at 4:58 PM Daniel Nelson ***@***.***> wrote:
@johncrim <https://github.com/johncrim> Do you have any available time to
help test? We should verify this fix on both a scaleset and non-scaleset
virtual machine, if you could do either of these it would be very
appreciated.
- telegraf-1.11.0~6e504a4c-0.x86_64.rpm
<https://32499-33258973-gh.circle-artifacts.com/0/build/linux/amd64/telegraf-1.11.0~6e504a4c-0.x86_64.rpm>
- telegraf_1.11.0~6e504a4c-0_amd64.deb
<https://32499-33258973-gh.circle-artifacts.com/0/build/linux/amd64/telegraf_1.11.0~6e504a4c-0_amd64.deb>
- telegraf-1.11.0~6e504a4c_windows_amd64.zip
<https://32499-33258973-gh.circle-artifacts.com/0/build/windows/amd64/telegraf-1.11.0~6e504a4c_windows_amd64.zip>
|
Great, testing the .deb should definitely be sufficient on this issue. |
@danielnelson : I'm still getting the same error as before on servers in the VM ScaleSet. My hunch is that the logic to detect whether the VM is in a scaleset or a standalone VM isn't working. I'll review the changes and try to troubleshoot a bit more to see if I can help.
vm000:~$ systemctl status telegraf --all -n 20
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2019-05-14 16:17:17 UTC; 1min 43s ago
Docs: https://github.com/influxdata/telegraf
Main PID: 72690 (telegraf)
Tasks: 10
Memory: 22.4M
CPU: 223ms
CGroup: /system.slice/telegraf.service
└─72690 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
May 14 16:17:17 vm000 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
May 14 16:17:17 vm000 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Starting Telegraf
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Loaded inputs: cpu diskio mem net
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Loaded aggregators:
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Loaded processors:
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Loaded outputs: azure_monitor
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! Tags enabled: host=vm000
May 14 16:17:17 vm000 telegraf[72690]: 2019-05-14T16:17:17Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"vm000", Flush Interval:10s
May 14 16:18:00 vm000 telegraf[72690]: 2019-05-14T16:18:00Z E! [agent] Error writing to output [azure_monitor]: failed to write batch: [404] 404 Not Found
May 14 16:18:10 vm000 telegraf[72690]: 2019-05-14T16:18:10Z E! [agent] Error writing to output [azure_monitor]: failed to write batch: [404] 404 Not Found
May 14 16:18:20 vm000 telegraf[72690]: 2019-05-14T16:18:20Z E! [agent] Error writing to output [azure_monitor]: failed to write batch: [404] 404 Not Found
May 14 16:18:30 vm000 telegraf[72690]: 2019-05-14T16:18:30Z E! [agent] Error writing to output [azure_monitor]: failed to write batch: [404] 404 Not Found
It still works normally on the standalone VM. |
A little more diagnostic info:
Querying the metadata service on a VM in the scaleset:
vm000:~$ curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2017-12-01"
{"compute":{"location":"westus2","name":"vm0_0","offer":"UbuntuServer","osType":"Linux","placementGroupId":"<guid>","platformFaultDomain":"0","platformUpdateDomain":"0","publisher":"Canonical","resourceGroupName":"rg","sku":"16.04-LTS","subscriptionId":"<guid>","tags":"","version":"16.04.201904240","vmId":"<guid>","vmScaleSetName":"vm0","vmSize":"Standard_B2s","zone":""},"network":...}
Querying the metadata service on a standalone VM:
jcdev:~$ curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2017-12-01"
{"compute":{"location":"westus2","name":"jcdev","offer":"UbuntuServer","osType":"Linux","placementGroupId":"","platformFaultDomain":"0","platformUpdateDomain":"0","publisher":"Canonical","resourceGroupName":"rg","sku":"16.04-LTS","subscriptionId":"<guid>","tags":"","version":"16.04.201904240","vmId":"<guid>","vmScaleSetName":"","vmSize":"Standard_B1ms","zone":""},"network":...}
In both cases I edited the response to remove any potentially sensitive info. |
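The two metadata responses above differ in the vmScaleSetName field, which is empty on the standalone VM. A minimal Go sketch of telling the two cases apart from that field follows. The type and function names (instanceMetadata, parseMetadata, inScaleSet) are illustrative and are not taken from the Telegraf source; the JSON is trimmed to the fields used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// instanceMetadata mirrors only the fields of the metadata response used here.
// The type name is hypothetical, not Telegraf's own.
type instanceMetadata struct {
	Compute struct {
		Name              string `json:"name"`
		ResourceGroupName string `json:"resourceGroupName"`
		SubscriptionID    string `json:"subscriptionId"`
		VMScaleSetName    string `json:"vmScaleSetName"`
	} `json:"compute"`
}

// parseMetadata decodes a metadata response body into instanceMetadata.
func parseMetadata(body []byte) (instanceMetadata, error) {
	var m instanceMetadata
	err := json.Unmarshal(body, &m)
	return m, err
}

// inScaleSet reports whether the instance belongs to a scale set,
// judged by a non-empty vmScaleSetName.
func inScaleSet(m instanceMetadata) bool {
	return m.Compute.VMScaleSetName != ""
}

func main() {
	// Trimmed copies of the two responses shown in the thread.
	bodies := [][]byte{
		[]byte(`{"compute":{"name":"vm0_0","resourceGroupName":"rg","subscriptionId":"<guid>","vmScaleSetName":"vm0"}}`),
		[]byte(`{"compute":{"name":"jcdev","resourceGroupName":"rg","subscriptionId":"<guid>","vmScaleSetName":""}}`),
	}
	for _, body := range bodies {
		m, err := parseMetadata(body)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("name=%s inScaleSet=%t\n", m.Compute.Name, inScaleSet(m))
	}
}
```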
@danielnelson : I'm pretty confident that I've identified the bug: The last segment of the resourceId on a VM ScaleSet needs to be the VM ScaleSet name. The current code is using the computer name. E.g., in the example above, the resource ID is currently:
And it should be:
Note that it would be a bit easier to troubleshoot if the URL were logged when a 404 occurs. I don't know if that idea violates any security standards in the telegraf code base, but it certainly would have saved me a bunch of time. |
Thanks for the testing, I believe I have fixed the issue in these new builds, can you give them a try? If you still have problems, run Telegraf with |
@danielnelson - Thank you for the fix. Unfortunately, it's still not working on the VM in a scaleset. With
If I manually set the resourceId in
Then, as before, metric reporting works as expected:
Would it be possible to debug log the evaluated resource ID? I'll take a look at your changes again (I'm not a Go developer, but it's pretty easy to read). |
@danielnelson : This looks like the problem:
if m.Compute.VMScaleSetName == "" {
	return fmt.Sprintf(
		resourceIDScaleSetTemplate,
		m.Compute.SubscriptionID,
		m.Compute.ResourceGroupName,
		m.Compute.VMScaleSetName,
	)
} else {
	return fmt.Sprintf(
		resourceIDTemplate,
		m.Compute.SubscriptionID,
		m.Compute.ResourceGroupName,
		m.Compute.Name,
	)
}
The if/else bodies are switched. If the VMScaleSetName is empty, use the VM template. |
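For reference, a self-contained sketch of the selection logic with the branches the right way round. The two template strings are assumptions that follow the general Azure resource ID layout, since the plugin's real constants are not shown in this thread. The compute struct and resourceID function are illustrative names.

```go
package main

import "fmt"

// Assumed templates following the general Azure resource ID layout.
// The plugin's actual constants are not shown in this thread.
const (
	resourceIDTemplate         = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s"
	resourceIDScaleSetTemplate = "/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s"
)

// compute holds the metadata fields needed to build a resource ID
// (illustrative type, not Telegraf's own).
type compute struct {
	Name              string
	ResourceGroupName string
	SubscriptionID    string
	VMScaleSetName    string
}

// resourceID uses the scale set template only when a scale set name is
// present; otherwise it falls back to the VM template and the VM name.
func resourceID(c compute) string {
	if c.VMScaleSetName != "" {
		return fmt.Sprintf(resourceIDScaleSetTemplate,
			c.SubscriptionID, c.ResourceGroupName, c.VMScaleSetName)
	}
	return fmt.Sprintf(resourceIDTemplate,
		c.SubscriptionID, c.ResourceGroupName, c.Name)
}

func main() {
	// Values taken from the edited metadata responses earlier in the thread.
	fmt.Println(resourceID(compute{Name: "vm0_0", ResourceGroupName: "rg", SubscriptionID: "<guid>", VMScaleSetName: "vm0"}))
	fmt.Println(resourceID(compute{Name: "jcdev", ResourceGroupName: "rg", SubscriptionID: "<guid>"}))
}
```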
if m.Compute.VMScaleSetName == "" {
	template = resourceIDTemplate
	return fmt.Sprintf(
if m.Compute.VMScaleSetName != ""
Thanks, the resource ID is essentially the Azure Monitor URL in the debug output, but I think it would make sense to print it explicitly. I'll try to circle back on this later today, but here are the builds with the fixed logic: |
Thanks @danielnelson. This .deb build works as expected on both types of Azure VM resources. I think you're good to go. |
Good news, thanks again for the testing |
Hello @johncrim @danielnelson, I have an Ubuntu 18.04 LTS VMSS on Azure. Below are the steps I followed:
apt install telegraf -y
Added a line in the /etc/telegraf/telegraf.conf file as:
telegraf status
telegraf.service - Telegraf
Nov 14 18:01:09 waf-teleg000000 telegraf[3505]: 2023-11-14T18:01:09Z I! Tags enabled: host=waf-teleg000000
Telegraf config file:
#Send aggregate metrics to Azure Monitor
#Set the namespace prefix, defaults to "Telegraf/".
#Azure Monitor doesn't have a string value type, so convert string
#Both region and resource_id must be set or be available via the
The Azure Resource ID against which metric will be logged, e.g.
Please share an example Telegraf config file for VMSS. I want to scale up the VMSS based on the Apache request metrics collected by Telegraf. |
@anildesai61 please stop commenting on closed PRs and issues. If you want support or help, please use Slack or the community forums. |
Sure, I will open a new PR. |
Use alternate resource-id for virtual machine scale sets.
closes #5819
Required for all PRs: