-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak in App Core RP - number of goroutines is increasing #5237
Comments
What immediately comes to mind is the async background processor. Not currently used in UCP but heavily used in appcore. Other than that there's very little concurrency in our server-side APIs since webservers are already "embarassingly parallel". |
Yeah nested go routines in async worker may have the bug or it could be Go GC behavior. We need to do some load test in order to see the root cause. |
# Description Close opened buffers. Potential memory leak cause #5237 <!-- We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation. --> Fixes: #issue_number ## Checklist Please make sure you've completed the relevant tasks for this PR, out of the following list: * [ ] Code compiles correctly * [ ] Adds necessary unit tests for change * [ ] Adds necessary E2E tests for change * [ ] Unit tests passing * [ ] Extended the documentation / Created issue for it
# Description otelhttp has the high cardinality problem which record all remote address from client. This issue is tracked by open-telemetry/opentelemetry-go-contrib#3765 . This PR is adding workaround to prevent otelhttp from recording remoteaddr as an attribute. ## Issue reference <!-- We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation. --> #5299 #5237 ## Checklist Please make sure you've completed the relevant tasks for this PR, out of the following list: * [ ] Code compiles correctly * [ ] Adds necessary unit tests for change * [ ] Adds necessary E2E tests for change * [ ] Unit tests passing * [ ] Extended the documentation / Created issue for it --------- Co-authored-by: Yetkin Timocin <[email protected]>
# Description Making some improvements found while investigating the memory leak issue. #5237 <!-- We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation. --> Fixes: #5237 ## Checklist Please make sure you've completed the relevant tasks for this PR, out of the following list: * [ ] Code compiles correctly * [ ] Adds necessary unit tests for change * [ ] Adds necessary E2E tests for change * [ ] Unit tests passing * [ ] Extended the documentation / Created issue for it
@vinayada1 UCP has been fixed, but memory and goroutines of 0.21 appcore-rp are still leaking now. |
# Description Fixes goroutine leak in kubernetes handler. The informerfactory which was started in a goroutine was invoked with an option wait.NeverStop which caused the informer to keep the associated goroutines lingering around #5237 <!-- We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation. --> Fixes: #5237 ## Checklist Please make sure you've completed the relevant tasks for this PR, out of the following list: * [ ] Code compiles correctly * [ ] Adds necessary unit tests for change * [ ] Adds necessary E2E tests for change * [ ] Unit tests passing * [ ] Extended the documentation / Created issue for it ## Auto-generated summary <!-- GitHub Copilot for docs will auto-generate a summary of the PR --> <!-- copilot:all --> ### <samp>🤖 Generated by Copilot at d1fc1d4</samp> ### Summary 🚀🐛🧹 <!-- 1. 🚀 - This emoji can represent the improvement in performance and concurrency by using go routines and tracking their IDs. 2. 🐛 - This emoji can represent the bug fix and error handling in case of a panic, which can prevent crashes and data loss. 3. 🧹 - This emoji can represent the cleanup and refactoring of unused code and better logging and shutdown logic. --> Improved logging and error handling for async operations and Kubernetes informer in `armrpc` and `corerp` packages. Added `goroutineID` variable to `worker.go` to identify go routines. > _`goroutineID` logs_ > _panic and async tasks - /_ > _improved error handling_ ### Walkthrough * Generate and log go routine ID for async operation worker ([link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-5d9b9964242d77606f4868e3fe049978ea1c3f1903afcfeb60c76e19d9785fc5L232-R235), [link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-5d9b9964242d77606f4868e3fe049978ea1c3f1903afcfeb60c76e19d9785fc5L245-R246)) * Use context for informer factory and logger in `WatchUntilReady` function ([link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-033dc8a7c9b970cc30f7420ce8cf2f12a26258b276779683ba3f761c481bf0a7L188-R189), [link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-033dc8a7c9b970cc30f7420ce8cf2f12a26258b276779683ba3f761c481bf0a7L242-R244)) --------- Co-authored-by: Young Bu Park <[email protected]>
Nice to see that graph go dooooowwwwwn |
# Description Fixes goroutine leak in kubernetes handler. The informerfactory which was started in a goroutine was invoked with an option wait.NeverStop which caused the informer to keep the associated goroutines lingering around #5237 <!-- We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation. --> Fixes: #5237 ## Checklist Please make sure you've completed the relevant tasks for this PR, out of the following list: * [ ] Code compiles correctly * [ ] Adds necessary unit tests for change * [ ] Adds necessary E2E tests for change * [ ] Unit tests passing * [ ] Extended the documentation / Created issue for it ## Auto-generated summary <!-- GitHub Copilot for docs will auto-generate a summary of the PR --> <!-- copilot:all --> ### <samp>🤖 Generated by Copilot at d1fc1d4</samp> ### Summary 🚀🐛🧹 <!-- 1. 🚀 - This emoji can represent the improvement in performance and concurrency by using go routines and tracking their IDs. 2. 🐛 - This emoji can represent the bug fix and error handling in case of a panic, which can prevent crashes and data loss. 3. 🧹 - This emoji can represent the cleanup and refactoring of unused code and better logging and shutdown logic. --> Improved logging and error handling for async operations and Kubernetes informer in `armrpc` and `corerp` packages. Added `goroutineID` variable to `worker.go` to identify go routines. > _`goroutineID` logs_ > _panic and async tasks - /_ > _improved error handling_ ### Walkthrough * Generate and log go routine ID for async operation worker ([link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-5d9b9964242d77606f4868e3fe049978ea1c3f1903afcfeb60c76e19d9785fc5L232-R235), [link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-5d9b9964242d77606f4868e3fe049978ea1c3f1903afcfeb60c76e19d9785fc5L245-R246)) * Use context for informer factory and logger in `WatchUntilReady` function ([link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-033dc8a7c9b970cc30f7420ce8cf2f12a26258b276779683ba3f761c481bf0a7L188-R189), [link](https://github.com/project-radius/radius/pull/5693/files?diff=unified&w=0#diff-033dc8a7c9b970cc30f7420ce8cf2f12a26258b276779683ba3f761c481bf0a7L242-R244)) --------- Co-authored-by: Young Bu Park <[email protected]>
Bug information
Steps to reproduce (required)
Observed behavior (required)
Link: https://radius-longhaul-wus3-awejb5drbjabh8h0.wus3.grafana.azure.com/d/Wh5JpyxVz/radius-overview?orgId=1&from=1677877199000&to=1677887999000
Number of goroutines of appcore-rp keeps growing while memory usage is increasing.
Desired behavior (required)
When GC executes, memory usage should be decreased and gorouine should be decresed.
Action items
We need to run load tests and find the memory leak by increased go routine.
AB#6467
The text was updated successfully, but these errors were encountered: