[resourcedetection] windows: Error 'failed getting host cpuinfo: context deadline exceeded' #33768
Comments
Since these cpu info attributes are not enabled by default we shouldn't call the /cc @mx-psi
Yeah. That's one thing that needs fixing indeed. It would be great if
I also just opened one more bug about this new detector: #33771. That other bug is MUCH harder to reproduce though, because I cannot rely on the alternative WMI/COM error case of simply disabling the winmgmt service. I would actually need to find a way to artificially slow down COM calls in a dev/test system.
**Description:** This PR changes the system resource detector so that it only tries to fetch the CPU info when required. The CPU info attributes are disabled by default, so this information should only be fetched when at least one of them is enabled.

**Link to tracking Issue:** #33768

**Testing:** Added unit tests

**Documentation:** ~

/cc @mx-psi

---------

Signed-off-by: ChrsMark <[email protected]>
Component(s)
processor/resourcedetection, processor/resourcedetection/internal/system
What happened?
Description
After the introduction of the host cpuinfo attributes in #26533, the `system` resource detection can fail catastrophically on Windows hosts, leaving ALL configured `system` resource attributes (Host name, Host ID, OS type, OS description ...) unavailable in all pipelines where the instance of the `resourcedetection` processor is used.
The cause is a combination of:
- `cpuinfo` attribute collection ALWAYS runs in the processor's `Start()` phase, regardless of whether the cpuinfo attributes are configured to be added to the resource attributes.
- The `cpuinfo` work in [processor/resourcedetection] Add support for host cpuinfo attributes #26533 uses a mechanism (WMI [1]) for retrieving the CPU info that can often fail with a timeout (hence the `context deadline exceeded` error).
The issue is more likely to happen when the Otel collector starts up during host boot (e.g. as a service launched by a service manager) than when it is launched on demand after the Windows host is already running.
This is due to the parallelization of startup tasks (services) in the operating system.
Steps to Reproduce
1. Disable the `winmgmt` Windows service to simulate the failure condition of not being able to collect the CPU info: `sc config winmgmt start= disabled` and `net stop winmgmt`
2. Configure the `system` resourcedetection with at least one of the configs enabled (e.g. Host ID) in a pipeline.
Expected Result
The Otel Collector's logs contain the requested attribute in the resourcedetection processor's logs (e.g. Host ID)
e.g.
Actual Result
Otel Collector's logs contain the error message
'failed getting host cpuinfo:
(NOTE: the simulated failure condition of completely disabling `winmgmt` produces a slightly different exception instead of the 'context deadline exceeded' error from a production system)
The requested attributes (e.g. Host ID) are missing from the resourcedetection processor's logs
e.g.
Collector version
v0.103.1
Environment information
Environment
OS: Windows (10, 11, 2019, 2022)
OpenTelemetry Collector configuration
Log output
Additional context
There should have been a Breaking Change note in the ChangeLog that makes all users of the `resourcedetection` processor aware of the newly introduced hard dependency on the `winmgmt` service.
Footnotes
1. https://github.com/shirou/gopsutil/blob/e74324b6a726997ce756b8f79dbbd7a3a0999ba0/cpu/cpu_windows.go#L98-L127