Create SOS Report to Troubleshoot customer environments #14778
Comments
@gtanzillo @blomquisg @dmetzger57 @Fryguy @agrare Talk amongst yourselves, discuss. |
Not sure if appropriate, but first occurrence of an error in the log? |
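One way to picture the "first occurrence of an error" idea is the minimal Ruby sketch below. It is only an illustration: the helper name `first_error` and the severity pattern are assumptions, not something taken from the existing tooling, and the real log format may need a different pattern.

```ruby
# Hypothetical helper: find the first log line at ERROR or FATAL severity.
# The severity pattern is an assumption about the log format.
ERROR_PATTERN = /\b(ERROR|FATAL)\b/

# Returns [line_number, line] for the first match, or nil if none is found.
# Accepts any enumerable of lines, so a file can be streamed rather than loaded.
def first_error(lines, pattern = ERROR_PATTERN)
  lines.each_with_index do |line, index|
    return [index + 1, line.chomp] if line.match?(pattern)
  end
  nil
end

sample = [
  "INFO -- : worker started\n",
  "WARN -- : slow query\n",
  "ERROR -- : connection refused\n",
  "ERROR -- : connection refused\n"
]
p first_error(sample)
```

For a real log, passing `File.foreach(path)` as `lines` would stop reading at the first match instead of parsing the whole file.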
I think this PR, #14107, answers the appliances, roles, and workers part of the bullet points. If it's helpful, we can add the VERSION and memory information to the existing rake task, or add new ones that we log... The memory information should be available and updated fairly often in
|
We should investigate whether that sosreport tool is worth formally plugging into, or whether it makes more sense to create a simple tools script to get what we need as a first pass, or just reuse an existing command like As a further explanation of the request here, it really isn't about large environments, but is for ANY environment where support is requested. In other words, this report would be something that support must run or ask for as a first response in every ticket they respond to. Right now, we always "ask for logs", but that is a very heavy-handed request, since the logs are huge and need to be parsed and reviewed to find what we are looking for. Instead, this report is meant to be a single, simple command with a simple one-page text output that can be dumped to a file or copied into an email or bug ticket, and that gives support and/or developers the information they almost always need and always ask for (we even ask for this stuff when we already have the logs, which is kind of silly, but the logs are so onerous). If rake |
I definitely like the idea of And, if it's a one-liner, we could always have sosreport call that if that ends up being the end goal. |
Oh, something else to add: Region, Zone, Provider landscape
|
@gtanzillo @blomquisg @dmetzger57 @Fryguy @agrare Where are we with this? I do not want to get to the point where we hit another set of issues and wonder what happened to the SOS report. |
There are already the tools/db_printers that do something similar that we can include into the single report. |
Things that would be useful:
|
@bronaghs Can you make progress on a first cut of this? Once in place, we can iterate and improve it. |
@chessbyte - will do. |
@miq-bot assign @juliancheal |
Some indication of which features are in use, to help understand the complexity and load in the environment, around:
|
@dmetzger57 help |
@mfeifer I've done some work on this, but I keep getting delayed. |
@juliancheal I'll reach out to you and set up a time when we can chat. I'm going to be dedicating part of my time to helping @mfeifer's effort to make field engineering faster/better/stronger |
Perhaps a simple start is implementing a CLI tool for gathering high-level configuration / health information, initially providing the following information:
Of course, taking into account multi-appliance, multi-zone, multi-region fun 😄 An SOS Report contains a massive amount of information; this suggestion aims to provide a lightweight tool to begin gaining a perspective on the environment being supported. It can be added to an SOS Report if desired. |
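As a rough illustration of the lightweight tool suggested above, the Ruby sketch below collects a few facts and prints them as a short plain-text report. Everything here is an assumption for discussion: `collect_facts` and `render_report` are hypothetical names, and only host-level facts from the Ruby standard library are gathered. Application-specific sections (version, roles, workers, regions, zones) would have to come from the project's own code and are deliberately left out.

```ruby
require "socket"

# Collect a few host-level facts using only the Ruby standard library.
# A real tool would add application-specific sections (not sketched here).
def collect_facts
  {
    "Host" => { "hostname" => Socket.gethostname, "platform" => RUBY_PLATFORM },
    "Ruby" => { "version" => RUBY_VERSION }
  }
end

# Render nested {section => {key => value}} data as aligned plain text,
# small enough to paste into an email or a ticket.
def render_report(sections)
  blocks = sections.map do |title, facts|
    width = facts.keys.map(&:length).max || 0
    rows = facts.map { |key, value| "  #{key.ljust(width)} : #{value}" }
    ["== #{title} ==", *rows].join("\n")
  end
  blocks.join("\n\n")
end

puts render_report(collect_facts)
```

Keeping collection and rendering separate means the same data could later be handed to a larger SOS Report without changing how it is gathered.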
I think the whole point of an SOS Report is something that is relatively small that can be cut/paste in an email to get clarity on the user's environment. This would precede the set of ManageIQ logs that tend to be massive and are typically shared via attachment or a link to an available storage location. As @ohadlevy mentioned via email, perhaps we can borrow some ideas from the Foreman project here and here. |
This issue has been automatically marked as stale because it has not been updated for at least 6 months. If you can still reproduce this issue on the current release or on Thank you for all your contributions! |
@dmetzger57 Is this still a valid issue? If not, can you close it? |
Closing issue. If you feel the issue needs to remain open, please let me know and it will be reopened. |
When troubleshooting in large environments, we would like some kind of health report (or another name) to get the vital signs of an environment. This is not the same as getting all the logs.
For discussion: