fix: Enhance GPU metrics collection and error handling in vGPU monitor #827

Open
wants to merge 1 commit into
base: master
@haitwang-cloud haitwang-cloud commented Jan 22, 2025

What type of PR is this?

/kind flake
This pull request restructures the vGPUmonitor application. The main changes are context and signal handling in the entry point, a metrics collection process split into smaller functions, and a watchAndFeedback function refactored to support context-based cancellation so the application can shut down gracefully.

Context and Signal Handling:

  • cmd/vGPUmonitor/main.go: Added context and signal handling to enable graceful shutdown of the application. This includes capturing system signals and using a context to manage the lifecycle of goroutines.

Metrics Collection:

  • cmd/vGPUmonitor/metrics.go: Refactored the metrics collection process by splitting it into multiple functions (collectGPUInfo, collectPodAndContainerInfo, collectContainerMetrics, etc.) to improve readability and maintainability.
  • cmd/vGPUmonitor/metrics.go: Introduced the sendMetric helper function to streamline sending metrics to Prometheus.
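
To illustrate the shape of this refactor, here is a dependency-free sketch. It is not the PR's implementation: `desc` and `sample` are stand-ins for the Prometheus client types, the `sendMetric` signature is assumed (only the name comes from the description), and `collect` is a hypothetical driver for the smaller collection steps.

```go
package main

import "fmt"

// desc and sample are stand-ins for the Prometheus client's descriptor and
// metric types. They exist only to keep this sketch self-contained.
type desc struct {
	name       string
	labelNames []string
}

type sample struct {
	name   string
	value  float64
	labels map[string]string
}

// sendMetric checks the label values against the descriptor and pushes one
// sample onto ch. It returns an error on a mismatch instead of panicking.
// The signature is an assumption, not the one used in the PR.
func sendMetric(ch chan<- sample, d desc, value float64, labelValues ...string) error {
	if len(labelValues) != len(d.labelNames) {
		return fmt.Errorf("metric %s: got %d label values, want %d",
			d.name, len(labelValues), len(d.labelNames))
	}
	labels := make(map[string]string, len(labelValues))
	for i, n := range d.labelNames {
		labels[n] = labelValues[i]
	}
	ch <- sample{name: d.name, value: value, labels: labels}
	return nil
}

// collect (hypothetical) runs each collection step in order. A failing step
// does not stop the remaining ones; the first error is reported at the end.
func collect(ch chan<- sample, steps ...func(chan<- sample) error) error {
	var first error
	failed := 0
	for _, step := range steps {
		if err := step(ch); err != nil {
			failed++
			if first == nil {
				first = err
			}
		}
	}
	if failed > 0 {
		return fmt.Errorf("%d of %d collection steps failed, first error: %w",
			failed, len(steps), first)
	}
	return nil
}
```

Routing every emission through one helper keeps validation and error reporting in a single place, so the individual collection functions only need to gather values.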

Refactoring watchAndFeedback:

  • cmd/vGPUmonitor/feedback.go: Refactored the watchAndFeedback function to support context-based cancellation, improving the application's ability to shut down gracefully.

Code Cleanup:

These changes collectively enhance the robustness and maintainability of the vGPUmonitor application.

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
No
